Sample records for rule mining algorithms

  1. Using an improved association rules mining optimization algorithm in web-based mobile-learning system

    NASA Astrophysics Data System (ADS)

    Huang, Yin; Chen, Jianhua; Xiong, Shaojun

    2009-07-01

    Mobile-Learning (M-learning) gives many learners the advantages of both traditional learning and E-learning. Currently, Web-based Mobile-Learning Systems have created many new ways and defined new relationships between educators and learners. Association rule mining is one of the most important fields in data mining and knowledge discovery in databases. Rule explosion is a serious problem of great concern, as conventional mining algorithms often produce too many rules for decision makers to digest. Since a Web-based Mobile-Learning System collects vast amounts of student profile data, data mining and knowledge discovery techniques can be applied to find interesting relationships between attributes of learners, assessments, the solution strategies adopted by learners and so on. Therefore, this paper focuses on a new data-mining algorithm, called ARGSA (Association Rules based on an improved Genetic Simulated Annealing Algorithm), which combines the advantages of the genetic algorithm and the simulated annealing algorithm to mine association rules. This paper first takes advantage of the Parallel Genetic Algorithm and Simulated Annealing Algorithm designed specifically for discovering association rules. Moreover, analysis and experiments show that the proposed method is superior to the Apriori algorithm in this Mobile-Learning system.

  2. Mining algorithm for association rules in big data based on Hadoop

    NASA Astrophysics Data System (ADS)

    Fu, Chunhua; Wang, Xiaojing; Zhang, Lijun; Qiao, Liying

    2018-04-01

    Traditional association rule mining algorithms can no longer meet the efficiency and scalability needs of mining large amounts of data. To address this problem, the FP-Growth algorithm is taken as an example and parallelized on the Hadoop framework using the MapReduce model. On this basis, it is improved with a transaction reduction method to further enhance the algorithm's mining efficiency. Experiments on a Hadoop cluster verify the parallel mining results, compare the efficiency of serial and parallel execution, and examine how mining time varies with the number of nodes and with the amount of data. The experiments show that the parallelized FP-Growth implementation mines frequent itemsets accurately, with better performance and scalability. It can better meet the requirements of big data mining and efficiently mine frequent itemsets and association rules from large datasets.
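
    The first counting pass and a transaction-reduction step can be sketched in plain Python, with ordinary functions standing in for Hadoop mappers and reducers. This is a minimal sketch, not the authors' implementation: the function names, the toy partitions, and the reading of "transaction reduction" as "drop infrequent items, then drop transactions left with fewer than two items" are all assumptions made for illustration.

```python
from collections import Counter
from itertools import chain


def map_phase(partition):
    """Mapper stand-in: emit (item, 1) once per distinct item per transaction."""
    return [(item, 1) for tx in partition for item in set(tx)]


def reduce_phase(pairs):
    """Reducer stand-in: sum the counts emitted for each item."""
    counts = Counter()
    for item, n in pairs:
        counts[item] += n
    return counts


def reduce_transactions(transactions, counts, min_count):
    """Assumed transaction reduction: keep frequent items only, ordered by
    descending count (ties broken alphabetically), and drop transactions
    that are left with fewer than two items."""
    frequent = {item for item, c in counts.items() if c >= min_count}
    reduced = []
    for tx in transactions:
        kept = sorted((i for i in set(tx) if i in frequent),
                      key=lambda i: (-counts[i], i))
        if len(kept) >= 2:
            reduced.append(kept)
    return reduced


# Hypothetical data: two partitions, as if stored on two cluster nodes.
partitions = [
    [["a", "b", "c"], ["a", "b"]],
    [["a", "d"], ["b", "c"], ["e"]],
]
pairs = list(chain.from_iterable(map_phase(p) for p in partitions))
item_counts = reduce_phase(pairs)
all_tx = [tx for p in partitions for tx in p]
reduced_tx = reduce_transactions(all_tx, item_counts, min_count=2)
print(reduced_tx)
```

    The reduced transactions are what a later FP-tree construction step would consume; fewer and shorter transactions mean less data shuffled between nodes.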

  3. RANWAR: rank-based weighted association rule mining from gene expression and methylation data.

    PubMed

    Mallik, Saurav; Mukhopadhyay, Anirban; Maulik, Ujjwal

    2015-01-01

    Ranking of association rules is currently an interesting topic in data mining and bioinformatics. The huge number of rules of items (or genes) evolved by association rule mining (ARM) algorithms confuses the decision maker. To bypass this problem, in this article we propose a weighted rule-mining technique (RANWAR, or rank-based weighted association rule mining) that ranks the rules using two novel rule-interestingness measures, viz., rank-based weighted condensed support (wcs) and weighted condensed confidence (wcc). These measures basically depend on the rank of items (genes). Using the rank, we assign a weight to each item. RANWAR generates far fewer frequent itemsets than the state-of-the-art association rule mining algorithms and thus saves execution time. We run RANWAR on gene expression and methylation datasets. The genes of the top rules are biologically validated by Gene Ontology (GO) and KEGG pathway analyses. Many top-ranked rules extracted by RANWAR that hold poor ranks in traditional Apriori are highly biologically significant to the related diseases. Finally, the top rules evolved from RANWAR that are not in Apriori are reported.

  4. A novel approach for incremental uncertainty rule generation from databases with missing values handling: application to dynamic medical databases.

    PubMed

    Konias, Sokratis; Chouvarda, Ioanna; Vlahavas, Ioannis; Maglaveras, Nicos

    2005-09-01

    Current approaches for mining association rules usually assume that the mining is performed in a static database, where the problem of missing attribute values does not practically exist. However, these assumptions are not preserved in some medical databases, like in a home care system. In this paper, a novel uncertainty rule algorithm is illustrated, namely URG-2 (Uncertainty Rule Generator), which addresses the problem of mining dynamic databases containing missing values. This algorithm requires only one pass from the initial dataset in order to generate the item set, while new metrics corresponding to the notion of Support and Confidence are used. URG-2 was evaluated over two medical databases, introducing randomly multiple missing values for each record's attribute (rate: 5-20% by 5% increments) in the initial dataset. Compared with the classical approach (records with missing values are ignored), the proposed algorithm was more robust in mining rules from datasets containing missing values. In all cases, the difference in preserving the initial rules ranged between 30% and 60% in favour of URG-2. Moreover, due to its incremental nature, URG-2 saved over 90% of the time required for thorough re-mining. Thus, the proposed algorithm can offer a preferable solution for mining in dynamic relational databases.

  5. Boosting association rule mining in large datasets via Gibbs sampling.

    PubMed

    Qian, Guoqi; Rao, Calyampudi Radhakrishna; Sun, Xiaoying; Wu, Yuehua

    2016-05-03

    Current algorithms for association rule mining from transaction data are mostly deterministic and enumerative. They can be computationally intractable even for mining a dataset containing just a few hundred transaction items, if no action is taken to constrain the search space. In this paper, we develop a Gibbs-sampling-induced stochastic search procedure to randomly sample association rules from the itemset space, and perform rule mining from the reduced transaction dataset generated by the sample. Also a general rule importance measure is proposed to direct the stochastic search so that, as a result of the randomly generated association rules constituting an ergodic Markov chain, the overall most important rules in the itemset space can be uncovered from the reduced dataset with probability 1 in the limit. In the simulation study and a real genomic data example, we show how to boost association rule mining by an integrated use of the stochastic search and the Apriori algorithm.

  6. An Incremental High-Utility Mining Algorithm with Transaction Insertion

    PubMed Central

    Gan, Wensheng; Zhang, Binbin

    2015-01-01

    Association-rule mining is commonly used to discover useful and meaningful patterns from a very large database. It only considers the occurrence frequencies of items to reveal the relationships among itemsets. Traditional association-rule mining is, however, not suitable in real-world applications, since the items purchased by a customer may involve various factors, such as profit or quantity. High-utility mining was designed to overcome the limitations of association-rule mining by considering both the quantity and profit measures. Most high-utility mining algorithms are designed to handle a static database. Fewer studies address dynamic high-utility mining with transaction insertion, which requires rescanning the database and suffers from the combinatorial explosion of the pattern-growth mechanism. In this paper, an efficient incremental algorithm with transaction insertion is designed, based on utility-list structures, to reduce computation without candidate generation. The enumeration tree and the relationships between 2-itemsets are also adopted in the proposed algorithm to speed up the computations. Several experiments are conducted to show the performance of the proposed algorithm in terms of runtime, memory consumption, and number of generated patterns. PMID:25811038
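
    The idea of weighting items by quantity and profit, and of updating a result when transactions are inserted instead of rescanning, can be illustrated with a small sketch. It is not the paper's utility-list algorithm: the class name `IncrementalUtility`, the profit table, and the toy transactions are hypothetical, and only a single itemset is tracked.

```python
def transaction_utility(itemset, transaction, profit):
    """Utility of an itemset in one transaction: sum of quantity * unit profit.

    Returns 0 when the transaction does not contain every item of the itemset.
    """
    if not all(item in transaction for item in itemset):
        return 0
    return sum(transaction[item] * profit[item] for item in itemset)


class IncrementalUtility:
    """Keeps a running utility total for one itemset as transactions arrive."""

    def __init__(self, itemset, profit):
        self.itemset = itemset
        self.profit = profit
        self.total = 0

    def insert(self, transactions):
        # Only the newly inserted transactions are scanned.
        for tx in transactions:
            self.total += transaction_utility(self.itemset, tx, self.profit)
        return self.total


# Hypothetical unit profits and transactions (item -> purchased quantity).
profit = {"a": 5, "b": 2, "c": 1}
original_db = [{"a": 1, "b": 2}, {"a": 2, "c": 3}, {"a": 1, "b": 1, "c": 1}]

tracker = IncrementalUtility(("a", "b"), profit)
utility_before = tracker.insert(original_db)
utility_after = tracker.insert([{"a": 3, "b": 1}])
print(utility_before, utility_after)
```

    The second call touches only the inserted transaction, which is the property an incremental algorithm relies on to avoid a full database rescan.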

  7. An Algorithm of Association Rule Mining for Microbial Energy Prospection

    PubMed Central

    Shaheen, Muhammad; Shahbaz, Muhammad

    2017-01-01

    The presence of hydrocarbons beneath the earth's surface produces some microbiological anomalies in soils and sediments. The detection of such microbial populations involves purely biochemical processes that are specialized, expensive and time consuming. This paper proposes a new algorithm for context-based association rule mining on non-spatial data. The algorithm is a modified form of a previously developed algorithm that was designed for spatial databases only. The algorithm is applied to mine context-based association rules on a microbial database to extract interesting and useful associations of microbial attributes with the existence of a hydrocarbon reserve. The surface and soil manifestations caused by the presence of hydrocarbon-oxidizing microbes are selected from existing literature and stored in a shared database. The algorithm is applied to this database to generate direct and indirect associations among the stored microbial indicators. These associations are then correlated with the probability of the existence of hydrocarbons. The numerical evaluation shows better accuracy for non-spatial data, compared with conventional algorithms, at generating reliable and robust rules. PMID:28393846

  8. Attribute index and uniform design based multiobjective association rule mining with evolutionary algorithm.

    PubMed

    Zhang, Jie; Wang, Yuping; Feng, Junhong

    2013-01-01

    In association rule mining, evaluating an association rule requires repeatedly scanning the database to compare every record with the antecedent, the consequent, and the whole rule. In order to decrease the number of comparisons and the time consumed, we present an attribute index strategy. It needs to scan the database only once to create the attribute index of each attribute. After that, computing the metric values used to evaluate an association rule requires no further database scans; the data are acquired solely from the attribute indices. The paper views association rule mining as a multiobjective problem rather than a single-objective one. In order to make the acquired solutions scatter uniformly toward the Pareto frontier in the objective space, an elitism policy and uniform design are introduced. The paper presents the algorithm of attribute index and uniform design based multiobjective association rule mining with evolutionary algorithm, abbreviated as IUARMMEA. It no longer requires user-specified minimum support and minimum confidence, but uses a simple attribute index. It uses a well-designed real encoding so as to extend its application scope. Experiments performed on several databases demonstrate that the proposed algorithm has excellent performance, and it can significantly reduce the number of comparisons and time consumption.
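
    The attribute index strategy described above can be sketched as an inverted index from each item to the set of record IDs that contain it, built in one scan; rule metrics then come from set intersections. This is a minimal sketch under that assumption, not the IUARMMEA implementation, and the function names and toy database are hypothetical.

```python
from collections import defaultdict


def build_index(database):
    """Single scan: map each item to the set of record IDs containing it."""
    index = defaultdict(set)
    for tid, record in enumerate(database):
        for item in record:
            index[item].add(tid)
    return index


def cover(items, index, n_records):
    """Record IDs that contain every item, computed from the index alone."""
    tids = set(range(n_records))
    for item in items:
        tids &= index.get(item, set())
    return tids


def evaluate(antecedent, consequent, index, n_records):
    """Support and confidence of antecedent -> consequent without rescanning."""
    a_tids = cover(antecedent, index, n_records)
    both = a_tids & cover(consequent, index, n_records)
    support = len(both) / n_records
    confidence = len(both) / len(a_tids) if a_tids else 0.0
    return support, confidence


# Hypothetical market-basket records.
database = [
    {"milk", "bread"},
    {"milk", "eggs"},
    {"milk", "bread", "eggs"},
    {"bread"},
]
index = build_index(database)
support, confidence = evaluate({"milk"}, {"bread"}, index, len(database))
print(support, confidence)
```

    Once the index exists, every candidate rule an evolutionary search proposes can be scored with a few set intersections, which is where the saving in comparisons comes from.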

  9. Attribute Index and Uniform Design Based Multiobjective Association Rule Mining with Evolutionary Algorithm

    PubMed Central

    Wang, Yuping; Feng, Junhong

    2013-01-01

    In association rule mining, evaluating an association rule requires repeatedly scanning the database to compare every record with the antecedent, the consequent, and the whole rule. In order to decrease the number of comparisons and the time consumed, we present an attribute index strategy. It needs to scan the database only once to create the attribute index of each attribute. After that, computing the metric values used to evaluate an association rule requires no further database scans; the data are acquired solely from the attribute indices. The paper views association rule mining as a multiobjective problem rather than a single-objective one. In order to make the acquired solutions scatter uniformly toward the Pareto frontier in the objective space, an elitism policy and uniform design are introduced. The paper presents the algorithm of attribute index and uniform design based multiobjective association rule mining with evolutionary algorithm, abbreviated as IUARMMEA. It no longer requires user-specified minimum support and minimum confidence, but uses a simple attribute index. It uses a well-designed real encoding so as to extend its application scope. Experiments performed on several databases demonstrate that the proposed algorithm has excellent performance, and it can significantly reduce the number of comparisons and time consumption. PMID:23766683

  10. Java implementation of Class Association Rule algorithms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tamura, Makio

    2007-08-30

    Java implementation of three Class Association Rule mining algorithms: NETCAR, CARapriori, and clustering-based rule mining. The NETCAR algorithm is a novel algorithm developed by Makio Tamura. The algorithm is discussed in a paper, UCRL-JRNL-232466-DRAFT, and would be published in a peer-reviewed scientific journal. The software is used to extract combinations of genes relevant to a phenotype from a phylogenetic profile and a phenotype profile. The phylogenetic profile is represented by a binary matrix and the phenotype profile is represented by a binary vector. The present application of this software will be in genome analysis; however, it could be applied more generally.

  11. Highly scalable and robust rule learner: performance evaluation and comparison.

    PubMed

    Kurgan, Lukasz A; Cios, Krzysztof J; Dick, Scott

    2006-02-01

    Business intelligence and bioinformatics applications increasingly require the mining of datasets consisting of millions of data points, or crafting real-time enterprise-level decision support systems for large corporations and drug companies. In all cases, there needs to be an underlying data mining system, and this mining system must be highly scalable. To this end, we describe a new rule learner called DataSqueezer. The learner belongs to the family of inductive supervised rule extraction algorithms. DataSqueezer is a simple, greedy, rule builder that generates a set of production rules from labeled input data. In spite of its relative simplicity, DataSqueezer is a very effective learner. The rules generated by the algorithm are compact, comprehensible, and have accuracy comparable to rules generated by other state-of-the-art rule extraction algorithms. The main advantages of DataSqueezer are very high efficiency, and missing data resistance. DataSqueezer exhibits log-linear asymptotic complexity with the number of training examples, and it is faster than other state-of-the-art rule learners. The learner is also robust to large quantities of missing data, as verified by extensive experimental comparison with the other learners. DataSqueezer is thus well suited to modern data mining and business intelligence tasks, which commonly involve huge datasets with a large fraction of missing data.

  12. Data mining for multiagent rules, strategies, and fuzzy decision tree structure

    NASA Astrophysics Data System (ADS)

    Smith, James F., III; Rhyne, Robert D., II; Fisher, Kristin

    2002-03-01

    A fuzzy logic based resource manager (RM) has been developed that automatically allocates electronic attack resources in real-time over many dissimilar platforms. Two different data mining algorithms have been developed to determine rules, strategies, and fuzzy decision tree structure. The first data mining algorithm uses a genetic algorithm as a data mining function and is called from an electronic game. The game allows a human expert to play against the resource manager in a simulated battlespace with each of the defending platforms being exclusively directed by the fuzzy resource manager and the attacking platforms being controlled by the human expert or operating autonomously under their own logic. This approach automates the data mining problem. The game automatically creates a database reflecting the domain expert's knowledge. It calls a data mining function, a genetic algorithm, for data mining of the database as required and allows easy evaluation of the information mined in the second step. The criterion for re-optimization is discussed as well as experimental results. Then a second data mining algorithm that uses a genetic program as a data mining function is introduced to automatically discover fuzzy decision tree structures. Finally, a fuzzy decision tree generated through this process is discussed.

  13. Genetic Algorithm Calibration of Probabilistic Cellular Automata for Modeling Mining Permit Activity

    USGS Publications Warehouse

    Louis, S.J.; Raines, G.L.

    2003-01-01

    We use a genetic algorithm to calibrate a spatially and temporally resolved cellular automaton to model mining activity on public land in Idaho and western Montana. The genetic algorithm searches through a space of transition rule parameters of a two-dimensional cellular automaton model to find rule parameters that fit observed mining activity data. Previous work by one of the authors in calibrating the cellular automaton took weeks; the genetic algorithm takes a day and produces rules leading to about the same (or better) fit to observed data. These preliminary results indicate that genetic algorithms are a viable tool in calibrating cellular automata for this application. Experience gained during the calibration of this cellular automaton suggests that mineral resource information is a critical factor in the quality of the results. With automated calibration, further refinements of how the mineral-resource information is provided to the cellular automaton will probably improve our model.

  14. On Interestingness Measures for Mining Statistically Significant and Novel Clinical Associations from EMRs

    PubMed Central

    Abar, Orhan; Charnigo, Richard J.; Rayapati, Abner

    2017-01-01

    Association rule mining has received significant attention from both the data mining and machine learning communities. While data mining researchers focus more on designing efficient algorithms to mine rules from large datasets, the learning community has explored applications of rule mining to classification. A major problem with rule mining algorithms is the explosion of rules even for moderate sized datasets making it very difficult for end users to identify both statistically significant and potentially novel rules that could lead to interesting new insights and hypotheses. Researchers have proposed many domain independent interestingness measures using which, one can rank the rules and potentially glean useful rules from the top ranked ones. However, these measures have not been fully explored for rule mining in clinical datasets owing to the relatively large sizes of the datasets often encountered in healthcare and also due to limited access to domain experts for review/analysis. In this paper, using an electronic medical record (EMR) dataset of diagnoses and medications from over three million patient visits to the University of Kentucky medical center and affiliated clinics, we conduct a thorough evaluation of dozens of interestingness measures proposed in data mining literature, including some new composite measures. Using cumulative relevance metrics from information retrieval, we compare these interestingness measures against human judgments obtained from a practicing psychiatrist for association rules involving the depressive disorders class as the consequent. Our results not only surface new interesting associations for depressive disorders but also indicate classes of interestingness measures that weight rule novelty and statistical strength in contrasting ways, offering new insights for end users in identifying interesting rules. PMID:28736771
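
    To make the notion of an interestingness measure concrete, the following sketch computes four textbook measures (support, confidence, lift, leverage) from raw counts and ranks two rules by different measures. The counts and rule names are invented for illustration and are unrelated to the EMR dataset or to the specific measures evaluated in the paper.

```python
def rule_measures(n, n_a, n_b, n_ab):
    """Standard measures for a rule A -> B from raw counts.

    n: total records, n_a: records with A, n_b: records with B,
    n_ab: records with both A and B.
    """
    p_a, p_b, p_ab = n_a / n, n_b / n, n_ab / n
    return {
        "support": p_ab,
        "confidence": p_ab / p_a,
        "lift": p_ab / (p_a * p_b),
        "leverage": p_ab - p_a * p_b,
    }


# Hypothetical counts: (n, n_a, n_b, n_ab) for each rule.
rules = {"A -> B": (100, 40, 50, 30), "C -> D": (100, 20, 80, 18)}
scores = {name: rule_measures(*counts) for name, counts in rules.items()}

ranked_by_lift = sorted(scores, key=lambda r: scores[r]["lift"], reverse=True)
ranked_by_conf = sorted(scores, key=lambda r: scores[r]["confidence"], reverse=True)
print(ranked_by_lift, ranked_by_conf)
```

    The two rankings disagree here: "C -> D" has the higher confidence only because D is common overall, while lift discounts that baseline. Disagreements of this kind are why comparing measures against expert judgment is informative.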

  15. A novel artificial immune clonal selection classification and rule mining with swarm learning model

    NASA Astrophysics Data System (ADS)

    Al-Sheshtawi, Khaled A.; Abdul-Kader, Hatem M.; Elsisi, Ashraf B.

    2013-06-01

    Metaheuristic optimisation algorithms have become a popular choice for solving complex problems. By integrating the Artificial Immune clonal selection algorithm (CSA) and the particle swarm optimisation (PSO) algorithm, a novel hybrid Clonal Selection Classification and Rule Mining with Swarm Learning Algorithm (CS2) is proposed. The main goal of the approach is to exploit and explore the parallel computation merit of Clonal Selection and the speed and self-organisation merits of Particle Swarm by sharing information between the clonal selection population and the particle swarm. Hence, we employed the advantages of PSO to improve the mutation mechanism of the artificial immune CSA and to mine classification rules within datasets. Consequently, our proposed algorithm required less training time and fewer memory cells in comparison to other AIS algorithms. In this paper, classification rule mining has been modelled as a multiobjective optimisation problem with predictive accuracy. The multiobjective approach is intended to allow the PSO algorithm to return an approximation to the accuracy and comprehensibility border, containing solutions that are spread across the border. We compared the classification accuracy of our proposed algorithm, CS2, with five commonly used CSAs, namely: AIRS1, AIRS2, AIRS-Parallel, CLONALG, and CSCA, using eight benchmark datasets. We also compared the classification accuracy of CS2 with five other methods, namely: Naïve Bayes, SVM, MLP, CART, and RFB. The results show that the proposed algorithm is comparable to the 10 studied algorithms. As a result, the hybridisation, built of CSA and PSO, can develop respective merit, compensate opponent defect, and make search-optimal effect and speed better.

  16. Effect of Temporal Relationships in Associative Rule Mining for Web Log Data

    PubMed Central

    Mohd Khairudin, Nazli; Mustapha, Aida

    2014-01-01

    The advent of web-based applications and services has created diverse and voluminous web log data stored in web servers, proxy servers, client machines, or organizational databases. This paper investigates the effect of a temporal attribute in relational rule mining for web log data. We incorporated the characteristics of time in the rule mining process and analysed the effect of various temporal parameters. The rules generated from temporal relational rule mining are then compared against the rules generated from classical rule mining approaches such as the Apriori and FP-Growth algorithms. The results showed that by incorporating the temporal attribute via time, the number of rules generated is smaller but is comparable in terms of quality. PMID:24587757

  17. A Bayesian Scoring Technique for Mining Predictive and Non-Spurious Rules

    PubMed Central

    Batal, Iyad; Cooper, Gregory; Hauskrecht, Milos

    2015-01-01

    Rule mining is an important class of data mining methods for discovering interesting patterns in data. The success of a rule mining method heavily depends on the evaluation function that is used to assess the quality of the rules. In this work, we propose a new rule evaluation score - the Predictive and Non-Spurious Rules (PNSR) score. This score relies on Bayesian inference to evaluate the quality of the rules and considers the structure of the rules to filter out spurious rules. We present an efficient algorithm for finding rules with high PNSR scores. The experiments demonstrate that our method is able to cover and explain the data with a much smaller rule set than existing methods. PMID:25938136

  18. A Bayesian Scoring Technique for Mining Predictive and Non-Spurious Rules.

    PubMed

    Batal, Iyad; Cooper, Gregory; Hauskrecht, Milos

    Rule mining is an important class of data mining methods for discovering interesting patterns in data. The success of a rule mining method heavily depends on the evaluation function that is used to assess the quality of the rules. In this work, we propose a new rule evaluation score - the Predictive and Non-Spurious Rules (PNSR) score. This score relies on Bayesian inference to evaluate the quality of the rules and considers the structure of the rules to filter out spurious rules. We present an efficient algorithm for finding rules with high PNSR scores. The experiments demonstrate that our method is able to cover and explain the data with a much smaller rule set than existing methods.

  19. Effective application of improved profit-mining algorithm for the interday trading model.

    PubMed

    Hsieh, Yu-Lung; Yang, Don-Lin; Wu, Jungpin

    2014-01-01

    Many real world applications of association rule mining from large databases help users make better decisions. However, they do not work well in financial markets at this time. In addition to a high profit, an investor also looks for a low risk trading with a better rate of winning. The traditional approach of using minimum confidence and support thresholds needs to be changed. Based on an interday model of trading, we proposed effective profit-mining algorithms which provide investors with profit rules including information about profit, risk, and winning rate. Since profit-mining in the financial market is still in its infant stage, it is important to detail the inner working of mining algorithms and illustrate the best way to apply them. In this paper we go into details of our improved profit-mining algorithm and showcase effective applications with experiments using real world trading data. The results show that our approach is practical and effective with good performance for various datasets.

  20. Effective Application of Improved Profit-Mining Algorithm for the Interday Trading Model

    PubMed Central

    Wu, Jungpin

    2014-01-01

    Many real world applications of association rule mining from large databases help users make better decisions. However, they do not work well in financial markets at this time. In addition to a high profit, an investor also looks for a low risk trading with a better rate of winning. The traditional approach of using minimum confidence and support thresholds needs to be changed. Based on an interday model of trading, we proposed effective profit-mining algorithms which provide investors with profit rules including information about profit, risk, and winning rate. Since profit-mining in the financial market is still in its infant stage, it is important to detail the inner working of mining algorithms and illustrate the best way to apply them. In this paper we go into details of our improved profit-mining algorithm and showcase effective applications with experiments using real world trading data. The results show that our approach is practical and effective with good performance for various datasets. PMID:24688442

  1. Big data mining analysis method based on cloud computing

    NASA Astrophysics Data System (ADS)

    Cai, Qing Qiu; Cui, Hong Gang; Tang, Hao

    2017-08-01

    In the era of information explosion, the very large scale and the discrete, non-structured or semi-structured nature of big data have gone far beyond what traditional data management methods can handle. With the arrival of the cloud computing era, cloud computing provides a new technical way to analyze and mine massive data, which can effectively solve the problem that traditional data mining methods cannot adapt to massive data mining. This paper introduces the meaning and characteristics of cloud computing, analyzes the advantages of using cloud computing technology to realize data mining, designs a mining algorithm for association rules based on the MapReduce parallel processing architecture, and carries out experimental verification. The parallel association rule mining algorithm based on a cloud computing platform can greatly improve the execution speed of data mining.

  2. Software tool for data mining and its applications

    NASA Astrophysics Data System (ADS)

    Yang, Jie; Ye, Chenzhou; Chen, Nianyi

    2002-03-01

    A software tool for data mining is introduced, which integrates pattern recognition (PCA, Fisher, clustering, hyperenvelop, regression), artificial intelligence (knowledge representation, decision trees), statistical learning (rough set, support vector machine), and computational intelligence (neural network, genetic algorithm, fuzzy systems). It consists of nine function models: pattern recognition, decision trees, association rule, fuzzy rule, neural network, genetic algorithm, Hyper Envelop, support vector machine, and visualization. The principle and knowledge representation of some function models of data mining are described. The software tool is implemented in Visual C++ under Windows 2000. Nonmonotony in data mining is dealt with by concept hierarchy and layered mining. The software tool has been satisfactorily applied in the prediction of regularities of the formation of ternary intermetallic compounds in alloy systems, and in the diagnosis of brain glioma.

  3. Techniques of Acceleration for Association Rule Induction with Pseudo Artificial Life Algorithm

    NASA Astrophysics Data System (ADS)

    Kanakubo, Masaaki; Hagiwara, Masafumi

    Frequent pattern mining is one of the important problems in data mining. Generally, the number of potential rules grows rapidly as the size of the database increases. It is therefore hard for a user to extract the association rules. To avoid this difficulty, we propose a new method for association rule induction with a pseudo artificial life approach. The proposed method decides whether there exists an item set containing N or more items that is shared by two transactions. If it exists, a series of item sets contained in that part of the transactions is recorded. Iterating this step contributes to the extraction of association rules. It is not necessary to calculate the huge number of candidate rules. In the evaluation test, we compared the association rules extracted using our method with the rules extracted using other algorithms such as the Apriori algorithm. As a result of the evaluation using huge retail market basket data, our method is approximately 10 and 20 times faster than the Apriori algorithm and many of its variants.
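
    The core test described above, whether two transactions share an item set of N or more items, can be sketched as a pairwise intersection. This is only an illustration of that single step: the abstract does not specify how transactions are paired or how the artificial-life agents behave, so the exhaustive pairing, the function name `shared_itemsets`, and the toy data are assumptions.

```python
from collections import Counter
from itertools import combinations


def shared_itemsets(transactions, n_min):
    """Record every item set of at least n_min items shared by a pair of
    transactions, counting how many pairs produced it."""
    found = Counter()
    for t1, t2 in combinations(transactions, 2):
        common = frozenset(t1) & frozenset(t2)
        if len(common) >= n_min:
            found[common] += 1
    return found


# Hypothetical market-basket transactions.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "b", "c", "e"},
    {"d", "e"},
]
found = shared_itemsets(transactions, n_min=2)
for itemset, pairs in found.most_common():
    print(sorted(itemset), pairs)
```

    No candidate item sets are enumerated in advance; only item sets that actually occur together in the data are ever recorded, which is the saving the abstract points to.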

  4. DTFP-Growth: Dynamic Threshold-Based FP-Growth Rule Mining Algorithm Through Integrating Gene Expression, Methylation, and Protein-Protein Interaction Profiles.

    PubMed

    Mallik, Saurav; Bhadra, Tapas; Mukherji, Ayan

    2018-04-01

    Association rule mining is an important technique for identifying interesting relationships between gene pairs in a biological data set. Earlier methods basically work for a single biological data set, and, in most cases, a single minimum support cutoff is applied globally, i.e., across all genesets/itemsets. To overcome this limitation, in this paper, we propose a dynamic threshold-based FP-growth rule mining algorithm that integrates gene expression, methylation and protein-protein interaction profiles based on weighted shortest distance to find novel associations among different pairs of genes in multi-view data sets. For this purpose, we introduce three new thresholds, namely, Distance-based Variable/Dynamic Supports (DVS), Distance-based Variable Confidences (DVC), and Distance-based Variable Lifts (DVL) for each rule by integrating the co-expression, co-methylation, and protein-protein interactions existing in the multi-omics data set. We develop the proposed algorithm utilizing these three novel multiple threshold measures. In the proposed algorithm, the values of DVS, DVC, and DVL are computed for each rule separately, and subsequently it is verified whether the support, confidence, and lift of each evolved rule are greater than or equal to the corresponding individual DVS, DVC, and DVL values, respectively. If all three conditions hold for a rule, the rule is treated as a resultant rule. One of the major advantages of the proposed method compared with other related state-of-the-art methods is that it considers both the quantitative and interactive significance among all pairwise genes belonging to each rule. Moreover, the proposed method generates fewer rules, takes less running time, and provides greater biological significance for the resultant top-ranking rules compared to previous methods.
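
    The acceptance test with per-rule thresholds can be sketched as follows. How DVS, DVC, and DVL are derived from the distance-weighted profiles is not given in the abstract, so here they are simply supplied as numbers; the `Rule` class, the gene names, and all values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    support: float
    confidence: float
    lift: float
    dvs: float  # per-rule support threshold (assumed given)
    dvc: float  # per-rule confidence threshold (assumed given)
    dvl: float  # per-rule lift threshold (assumed given)


def is_resultant(rule):
    """A rule is kept only if all three measures reach its own thresholds."""
    return (rule.support >= rule.dvs
            and rule.confidence >= rule.dvc
            and rule.lift >= rule.dvl)


# Hypothetical rules with individually assigned thresholds.
candidates = [
    Rule("geneA -> geneB", 0.30, 0.80, 1.6, dvs=0.25, dvc=0.70, dvl=1.2),
    Rule("geneC -> geneD", 0.30, 0.80, 1.6, dvs=0.35, dvc=0.70, dvl=1.2),
    Rule("geneE -> geneF", 0.10, 0.90, 2.0, dvs=0.10, dvc=0.90, dvl=2.0),
]
resultant = [r.name for r in candidates if is_resultant(r)]
print(resultant)
```

    The first two rules have identical measures but different thresholds, so only one survives; a single global cutoff could not make that distinction.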

  5. The association rules search of Indonesian university graduate’s data using FP-growth algorithm

    NASA Astrophysics Data System (ADS)

    Faza, S.; Rahmat, R. F.; Nababan, E. B.; Arisandi, D.; Effendi, S.

    2018-02-01

    The variety of attributes in university graduate data has made it difficult for the institution to find the combinations of attributes that often emerge and have high integration between attributes. Association rules mining is a data mining technique to determine the integration of the data, or the way one data set affects another set of data. In other words, it makes it possible to find the integration of data on a large scale. The Frequent Pattern-Growth (FP-Growth) algorithm is one of the association rules mining techniques; it determines frequent itemsets using an FP-Tree data structure. From the research on the search of university graduates' association rules, it can be concluded that the most common attributes that have high integration between them are the combination of State-owned High School outside Medan, regular university entrance exam, GPA of 3.00 to 3.49 and over 4-year-long study duration.

  6. Inferring Intra-Community Microbial Interaction Patterns from Metagenomic Datasets Using Associative Rule Mining Techniques

    PubMed Central

    Mande, Sharmila S.

    2016-01-01

    The nature of inter-microbial metabolic interactions defines the stability of microbial communities residing in any ecological niche. Deciphering these interaction patterns is crucial for understanding the mode/mechanism(s) through which an individual microbial community transitions from one state to another (e.g. from a healthy to a diseased state). Statistical correlation techniques have been traditionally employed for mining microbial interaction patterns from taxonomic abundance data corresponding to a given microbial community. In spite of their efficiency, these correlation techniques can capture only 'pair-wise interactions'. Moreover, their emphasis on statistical significance can potentially result in missing out on several interactions that are relevant from a biological standpoint. This study explores the applicability of one of the earliest association rule mining algorithms, i.e. the 'Apriori algorithm', for deriving 'microbial association rules' from the taxonomic profile of a given microbial community. The classical Apriori approach derives association rules by analysing patterns of co-occurrence/co-exclusion between various '(subsets of) features/items' across various samples. Using real-world microbiome data, the efficiency/utility of this rule mining approach in deciphering multiple (biologically meaningful) association patterns between 'subsets/subgroups' of microbes (constituting microbiome samples) is demonstrated. As an example, association rules derived from publicly available gut microbiome datasets indicate an association between a group of microbes (Faecalibacterium, Dorea, and Blautia) that are known to have mutualistic metabolic associations among themselves. Application of the rule mining approach on gut microbiomes (sourced from the Human Microbiome Project) further indicated similar microbial association patterns in gut microbiomes irrespective of the gender of the subjects.
    A Linux implementation of the Association Rule Mining (ARM) software (customised for deriving 'microbial association rules' from microbiome data) is freely available for download from the following link: http://metagenomics.atc.tcs.com/arm. PMID:27124399
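
The level-wise Apriori search that this abstract builds on can be sketched as below. This is a generic textbook-style sketch, not the ARM software linked above: the function names are hypothetical, each sample is assumed to be reduced to the set of genera detected in it, and only co-occurrence is handled (co-exclusion, which the abstract also mentions, is left out).

```python
from itertools import combinations


def apriori(samples, min_support):
    """Return {itemset: support} for every itemset present in at least
    a min_support fraction of the samples."""
    transactions = [frozenset(sample) for sample in samples]
    n = len(transactions)
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        level = {}
        for candidate in candidates:
            support = sum(1 for t in transactions if candidate <= t) / n
            if support >= min_support:
                level[candidate] = support
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Derive (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                # The antecedent is a subset of a frequent itemset, so it is
                # guaranteed to be in `frequent` (downward closure).
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, support, confidence))
    return rules
```

Because itemsets of any size are kept, rules such as {Dorea} -> {Faecalibacterium, Blautia} can be reported, which is the kind of multi-member pattern that pair-wise correlation cannot express.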

  7. Inferring Intra-Community Microbial Interaction Patterns from Metagenomic Datasets Using Associative Rule Mining Techniques.

    PubMed

    Tandon, Disha; Haque, Mohammed Monzoorul; Mande, Sharmila S

    2016-01-01

    The nature of inter-microbial metabolic interactions defines the stability of microbial communities residing in any ecological niche. Deciphering these interaction patterns is crucial for understanding the mode/mechanism(s) through which an individual microbial community transitions from one state to another (e.g. from a healthy to a diseased state). Statistical correlation techniques have been traditionally employed for mining microbial interaction patterns from taxonomic abundance data corresponding to a given microbial community. In spite of their efficiency, these correlation techniques can capture only 'pair-wise interactions'. Moreover, their emphasis on statistical significance can potentially result in missing out on several interactions that are relevant from a biological standpoint. This study explores the applicability of one of the earliest association rule mining algorithms, i.e. the 'Apriori algorithm', for deriving 'microbial association rules' from the taxonomic profile of a given microbial community. The classical Apriori approach derives association rules by analysing patterns of co-occurrence/co-exclusion between various '(subsets of) features/items' across various samples. Using real-world microbiome data, the efficiency/utility of this rule mining approach in deciphering multiple (biologically meaningful) association patterns between 'subsets/subgroups' of microbes (constituting microbiome samples) is demonstrated. As an example, association rules derived from publicly available gut microbiome datasets indicate an association between a group of microbes (Faecalibacterium, Dorea, and Blautia) that are known to have mutualistic metabolic associations among themselves. Application of the rule mining approach on gut microbiomes (sourced from the Human Microbiome Project) further indicated similar microbial association patterns in gut microbiomes irrespective of the gender of the subjects.
    A Linux implementation of the Association Rule Mining (ARM) software (customised for deriving 'microbial association rules' from microbiome data) is freely available for download from the following link: http://metagenomics.atc.tcs.com/arm.

  8. Empirical evaluation of interest-level criteria

    NASA Astrophysics Data System (ADS)

    Sahar, Sigal; Mansour, Yishay

    1999-02-01

    Efficient association rule mining algorithms already exist, however, as the size of databases increases, the number of patterns mined by the algorithms increases to such an extent that their manual evaluation becomes impractical. Automatic evaluation methods are, therefore, required in order to sift through the initial list of rules, which the datamining algorithm outputs. These evaluation methods, or criteria, rank the association rules mined from the dataset. We empirically examined several such statistical criteria: new criteria, as well as previously known ones. The empirical evaluation was conducted using several databases, including a large real-life dataset, acquired from an order-by-phone grocery store, a dataset composed from www proxy logs, and several datasets from the UCI repository. We were interested in discovering whether the ranking performed by the various criteria is similar or easily distinguishable. Our evaluation detected, when significant differences exist, three patterns of behavior in the eight criteria we examined. There is an obvious dilemma in determining how many association rules to choose (in accordance with support and confidence parameters). The tradeoff is between having stringent parameters and, therefore, few rules, or lenient parameters and, thus, a multitude of rules. In many cases, our empirical evaluation revealed that most of the rules found by the comparably strict parameters ranked highly according to the interestingness criteria, when using lax parameters (producing significantly more association rules). Finally, we discuss the association rules that ranked highest, explain why these results are sound, and how they direct future research.

  9. Knowledge-guided mutation in classification rules for autism treatment efficacy.

    PubMed

    Engle, Kelley; Rada, Roy

    2017-03-01

    Data mining methods in biomedical research might benefit by combining genetic algorithms with domain-specific knowledge. The objective of this research is to show how the evolution of treatment rules for autism might be guided. The semantic distance between two concepts in the taxonomy is measured by the number of relationships separating the concepts in the taxonomy. The hypothesis is that replacing a concept in a treatment rule will change the accuracy of the rule in direct proportion to the semantic distance between the concepts. The method uses a patient database and autism taxonomies. Treatment rules are developed with an algorithm that exploits the taxonomies. The results support the hypothesis. This research should both advance the understanding of autism data mining in particular and of knowledge-guided evolutionary search in biomedicine in general.
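
The semantic distance used in this abstract, the number of relationships separating two concepts in a taxonomy, can be sketched as a shortest-path count. The representation is an assumption: the taxonomy is given as a hypothetical child-to-parent mapping, and `semantic_distance` is an illustrative name rather than anything from the paper.

```python
from collections import deque


def semantic_distance(parent_of, a, b):
    """Number of taxonomy links on the shortest path between concepts a and b.

    parent_of maps each concept to its parent (assumed representation).
    Returns None when the two concepts are not connected.
    """
    if a == b:
        return 0
    # Treat each child-parent relationship as an undirected link.
    neighbours = {}
    for child, parent in parent_of.items():
        neighbours.setdefault(child, set()).add(parent)
        neighbours.setdefault(parent, set()).add(child)
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, distance = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt == b:
                return distance + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, distance + 1))
    return None
```

With a toy taxonomy in which D and E are children of B, and B and C are children of A, swapping D for its sibling E is a distance-2 mutation while swapping D for C is distance 3; under the stated hypothesis the second swap would be expected to change rule accuracy more.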

  10. Predicting missing values in a home care database using an adaptive uncertainty rule method.

    PubMed

    Konias, S; Gogou, G; Bamidis, P D; Vlahavas, I; Maglaveras, N

    2005-01-01

    Contemporary literature illustrates an abundance of adaptive algorithms for mining association rules. However, most literature is unable to deal with the peculiarities, such as missing values and dynamic data creation, that are frequently encountered in fields like medicine. This paper proposes an uncertainty rule method that uses an adaptive threshold for filling missing values in newly added records. A new approach for mining uncertainty rules and filling missing values is proposed, which is in turn particularly suitable for dynamic databases, like the ones used in home care systems. In this study, a new data mining method named FiMV (Filling Missing Values) is illustrated based on the mined uncertainty rules. Uncertainty rules have quite a similar structure to association rules and are extracted by an algorithm proposed in previous work, namely AURG (Adaptive Uncertainty Rule Generation). The main target was to implement an appropriate method for recovering missing values in a dynamic database, where new records are continuously added, without needing to specify any kind of thresholds beforehand. The method was applied to a home care monitoring system database. Randomly, multiple missing values for each record's attributes (rate 5-20% by 5% increments) were introduced in the initial dataset. FiMV demonstrated 100% completion rates with over 90% success in each case, while usual approaches, where all records with missing values are ignored or thresholds are required, experienced significantly reduced completion and success rates. It is concluded that the proposed method is appropriate for the data-cleaning step of the Knowledge Discovery process in databases. The latter, containing much significance for the output efficiency of any data mining technique, can improve the quality of the mined information.

  11. Urinary metabolic profiling of asymptomatic acute intermittent porphyria using a rule-mining-based algorithm.

    PubMed

    Luck, Margaux; Schmitt, Caroline; Talbi, Neila; Gouya, Laurent; Caradeuc, Cédric; Puy, Hervé; Bertho, Gildas; Pallet, Nicolas

    2018-01-01

    Metabolomic profiling combines Nuclear Magnetic Resonance spectroscopy with supervised statistical analysis, which might allow a better understanding of the mechanisms of a disease. In this study, the urinary metabolic profiling of individuals with porphyrias was performed to predict different types of disease, and to propose new pathophysiological hypotheses. Urine 1H-NMR spectra of 73 patients with asymptomatic acute intermittent porphyria (aAIP) and familial or sporadic porphyria cutanea tarda (f/sPCT) were compared using a supervised rule-mining algorithm. NMR spectrum buckets (bins), corresponding to rules, were extracted and a logistic regression was trained. The results generated by our rule-mining algorithm were consistent with those obtained using partial least squares discriminant analysis (PLS-DA), and the predictive performance of the model was significant. Buckets that were identified by the algorithm corresponded to metabolites involved in glycolysis and energy-conversion pathways, notably acetate, citrate, and pyruvate, which were found in higher concentrations in the urine of aAIP patients compared with PCT patients. Metabolic profiling did not discriminate sPCT from fPCT patients. These results suggest that metabolic reprogramming occurs in aAIP individuals, even in the absence of overt symptoms, and support the relationship between heme synthesis and mitochondrial energetic metabolism.

  12. Exploration of the association rules mining technique for the signal detection of adverse drug events in spontaneous reporting systems.

    PubMed

    Wang, Chao; Guo, Xiao-Jing; Xu, Jin-Fang; Wu, Cheng; Sun, Ya-Lin; Ye, Xiao-Fei; Qian, Wei; Ma, Xiu-Qiang; Du, Wen-Min; He, Jia

    2012-01-01

    The detection of signals of adverse drug events (ADEs) has increased because of the use of data mining algorithms in spontaneous reporting systems (SRSs). However, different data mining algorithms have different traits and conditions for application. The objective of our study was to explore the application of association rule (AR) mining in ADE signal detection and to compare its performance with that of other algorithms. Monte Carlo simulation was applied to generate drug-ADE reports randomly according to the characteristics of SRS datasets. One thousand simulated datasets were mined by AR and other algorithms. On average, 108,337 reports were generated by the Monte Carlo simulation. Based on the predefined criterion that 10% of the drug-ADE combinations were true signals, with RR equal to 10, 4.9, 1.5, and 1.2, AR detected, on average, 284 suspected associations with a minimum support of 3 and a minimum lift of 1.2. The area under the receiver operating characteristic (ROC) curve of the AR was 0.788, which was equivalent to that shown for other algorithms. Additionally, AR was applied to reports submitted to the Shanghai SRS in 2009. Five hundred seventy combinations were detected using AR from 24,297 SRS reports, and they were compared with recognized ADEs identified by clinical experts and various other sources. AR appears to be an effective method for ADE signal detection, both in simulated and real SRS datasets. The limitations of this method exposed in our study, i.e., non-uniform threshold settings and redundant rules, require further research.
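
The screening step with a minimum support of 3 and a minimum lift of 1.2 can be sketched as below. Several things are assumptions rather than details from the study: support is read as the raw number of reports containing both the drug and the event, lift is computed in its standard form P(drug, event) / (P(drug) P(event)), each report is modelled as a pair of sets, and `detect_signals` is a hypothetical name.

```python
from collections import Counter
from itertools import product


def detect_signals(reports, min_support=3, min_lift=1.2):
    """Return {(drug, event): (support_count, lift)} for pairs passing both cutoffs.

    reports is an iterable of (drugs, events) pairs, one per spontaneous report.
    """
    reports = [(set(drugs), set(events)) for drugs, events in reports]
    n = len(reports)
    drug_counts, event_counts, pair_counts = Counter(), Counter(), Counter()
    for drugs, events in reports:
        drug_counts.update(drugs)
        event_counts.update(events)
        # Every drug in a report is paired with every event in the same report.
        pair_counts.update(product(drugs, events))
    signals = {}
    for (drug, event), both in pair_counts.items():
        # lift = P(drug, event) / (P(drug) * P(event)), written with counts.
        lift = both * n / (drug_counts[drug] * event_counts[event])
        if both >= min_support and lift >= min_lift:
            signals[(drug, event)] = (both, lift)
    return signals
```

A lift above 1 means the pair is reported together more often than independence would predict; the support floor guards against pairs whose high lift rests on one or two reports.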

  13. Quantum algorithm for association rules mining

    NASA Astrophysics Data System (ADS)

    Yu, Chao-Hua; Gao, Fei; Wang, Qing-Le; Wen, Qiao-Yan

    2016-10-01

    Association rules mining (ARM) is one of the most important problems in knowledge discovery and data mining. Given a transaction database that has a large number of transactions and items, the task of ARM is to acquire consumption habits of customers by discovering the relationships between itemsets (sets of items). In this paper, we address ARM in the quantum settings and propose a quantum algorithm for the key part of ARM, finding frequent itemsets from the candidate itemsets and acquiring their supports. Specifically, for the case in which there are $M_f^{(k)}$ frequent $k$-itemsets in the $M_c^{(k)}$ candidate $k$-itemsets ($M_f^{(k)} \le M_c^{(k)}$), our algorithm can efficiently mine these frequent $k$-itemsets and estimate their supports by using parallel amplitude estimation and amplitude amplification with complexity $O\left(k\sqrt{M_c^{(k)} M_f^{(k)}}/\epsilon\right)$, where $\epsilon$ is the error for estimating the supports. Compared with the classical counterpart, i.e., the classical sampling-based algorithm, whose complexity is $O\left(k M_c^{(k)}/\epsilon^{2}\right)$, our quantum algorithm quadratically improves the dependence on both $\epsilon$ and $M_c^{(k)}$ in the best case when $M_f^{(k)} \ll M_c^{(k)}$ and on $\epsilon$ alone in the worst case when $M_f^{(k)} \approx M_c^{(k)}$.
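
Taking the quantum bound as $O(k\sqrt{M_c^{(k)} M_f^{(k)}}/\epsilon)$ and the classical bound as $O(k M_c^{(k)}/\epsilon^{2})$, the two limiting cases named in the abstract follow by substitution. The worked form below is only a reading aid; treating "much smaller" as $M_f^{(k)} = O(1)$ is an assumption made here to keep the best case concrete.

```latex
\begin{align*}
T_{\mathrm{quantum}} &= O\!\left(\frac{k\sqrt{M_c^{(k)} M_f^{(k)}}}{\epsilon}\right),
&
T_{\mathrm{classical}} &= O\!\left(\frac{k\,M_c^{(k)}}{\epsilon^{2}}\right) \\[4pt]
\text{Best case } M_f^{(k)} = O(1):\quad
T_{\mathrm{quantum}} &= O\!\left(\frac{k\sqrt{M_c^{(k)}}}{\epsilon}\right)
&&\text{(square root in both } M_c^{(k)} \text{ and } 1/\epsilon^{2}\text{)} \\[4pt]
\text{Worst case } M_f^{(k)} \approx M_c^{(k)}:\quad
T_{\mathrm{quantum}} &= O\!\left(\frac{k\,M_c^{(k)}}{\epsilon}\right)
&&\text{(gain in } \epsilon \text{ only)}
\end{align*}
```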

  14. Mining Hesitation Information by Vague Association Rules

    NASA Astrophysics Data System (ADS)

    Lu, An; Ng, Wilfred

    In many online shopping applications, such as Amazon and eBay, traditional Association Rule (AR) mining has limitations as it only deals with the items that are sold but ignores the items that are almost sold (for example, those items that are put into the basket but not checked out). We say that those almost sold items carry hesitation information, since customers are hesitating to buy them. The hesitation information of items is valuable knowledge for the design of good selling strategies. However, there is no conceptual model that is able to capture different statuses of hesitation information. Herein, we apply and extend vague set theory in the context of AR mining. We define the concepts of attractiveness and hesitation of an item, which represent the overall information of a customer's intent on an item. Based on the two concepts, we propose the notion of Vague Association Rules (VARs). We devise an efficient algorithm to mine the VARs. Our experiments show that our algorithm is efficient and the VARs capture more specific and richer information than do the traditional ARs.

  15. A New Data Mining Scheme Using Artificial Neural Networks

    PubMed Central

    Kamruzzaman, S. M.; Jehad Sarkar, A. M.

    2011-01-01

    Classification is one of the data mining problems receiving enormous attention in the database community. Although artificial neural networks (ANNs) have been successfully applied in a wide range of machine learning applications, they are however often regarded as black boxes, i.e., their predictions cannot be explained. To enhance the explanation of ANNs, a novel algorithm to extract symbolic rules from ANNs has been proposed in this paper. ANN methods have not been effectively utilized for data mining tasks because how the classifications were made is not explicitly stated as symbolic rules that are suitable for verification or interpretation by human experts. With the proposed approach, concise symbolic rules with high accuracy, that are easily explainable, can be extracted from the trained ANNs. Extracted rules are comparable with other methods in terms of number of rules, average number of conditions for a rule, and the accuracy. The effectiveness of the proposed approach is clearly demonstrated by the experimental results on a set of benchmark data mining classification problems. PMID:22163866

  16. Co-evolutionary data mining for fuzzy rules: automatic fitness function creation phase space, and experiments

    NASA Astrophysics Data System (ADS)

    Smith, James F., III; Blank, Joseph A.

    2003-03-01

    An approach is being explored that involves embedding a fuzzy logic based resource manager in an electronic game environment. Game agents can function under their own autonomous logic or human control. This approach automates the data mining problem. The game automatically creates a cleansed database reflecting the domain expert's knowledge, it calls a data mining function, a genetic algorithm, for data mining of the data base as required and allows easy evaluation of the information extracted. The co-evolutionary fitness functions, chromosomes and stopping criteria for ending the game are discussed. Genetic algorithm and genetic program based data mining procedures are discussed that automatically discover new fuzzy rules and strategies. The strategy tree concept and its relationship to co-evolutionary data mining are examined as well as the associated phase space representation of fuzzy concepts. The overlap of fuzzy concepts in phase space reduces the effective strategies available to adversaries. Co-evolutionary data mining alters the geometric properties of the overlap region known as the admissible region of phase space significantly enhancing the performance of the resource manager. Procedures for validation of the information data mined are discussed and significant experimental results provided.

  17. Data Mining and Privacy of Social Network Sites' Users: Implications of the Data Mining Problem.

    PubMed

    Al-Saggaf, Yeslam; Islam, Md Zahidul

    2015-08-01

    This paper explores the potential of data mining as a technique that could be used by malicious data miners to threaten the privacy of social network sites (SNS) users. It applies a data mining algorithm to a real dataset to provide empirically-based evidence of the ease with which characteristics about the SNS users can be discovered and used in a way that could invade their privacy. One major contribution of this article is the use of the decision forest data mining algorithm (SysFor) to the context of SNS, which does not only build a decision tree but rather a forest allowing the exploration of more logic rules from a dataset. One logic rule that SysFor built in this study, for example, revealed that anyone having a profile picture showing just the face or a picture showing a family is less likely to be lonely. Another contribution of this article is the discussion of the implications of the data mining problem for governments, businesses, developers and the SNS users themselves.

  18. Comparison of rule induction, decision trees and formal concept analysis approaches for classification

    NASA Astrophysics Data System (ADS)

    Kotelnikov, E. V.; Milov, V. R.

    2018-05-01

    Rule-based learning algorithms have higher transparency and easiness to interpret in comparison with neural networks and deep learning algorithms. These properties make it possible to effectively use such algorithms to solve descriptive tasks of data mining. The choice of an algorithm depends also on its ability to solve predictive tasks. The article compares the quality of the solution of the problems with binary and multiclass classification based on the experiments with six datasets from the UCI Machine Learning Repository. The authors investigate three algorithms: Ripper (rule induction), C4.5 (decision trees), In-Close (formal concept analysis). The results of the experiments show that In-Close demonstrates the best quality of classification in comparison with Ripper and C4.5, however the latter two generate more compact rule sets.

  19. Real-time intelligent decision making with data mining

    NASA Astrophysics Data System (ADS)

    Gupta, Deepak P.; Gopalakrishnan, Bhaskaran

    2004-03-01

    Database mining, widely known as knowledge discovery and data mining (KDD), has attracted lot of attention in recent years. With the rapid growth of databases in commercial, industrial, administrative and other applications, it is necessary and interesting to extract knowledge automatically from huge amount of data. Almost all the organizations are generating data and information at an unprecedented rate and they need to get some useful information from this data. Data mining is the extraction of non-trivial, previously unknown and potentially useful patterns, trends, dependence and correlation known as association rules among data values in large databases. In last ten to fifteen years, data mining spread out from one company to the other to help them understand more about customers' aspect of quality and response and also distinguish the customers they want from those they do not. A credit-card company found that customers who complete their applications in pencil rather than pen are more likely to default. There is a program that identifies callers by purchase history. The bigger the spender, the quicker the call will be answered. If you feel your call is being answered in the order in which it was received, think again. Many algorithms assume that data is static in nature and mine the rules and relations in that data. But for a dynamic database e.g. in most of the manufacturing industries, the rules and relations thus developed among the variables/items no longer hold true. A simple approach may be to mine the associations among the variables after every fixed period of time. But again, how much the length of this period should be, is a question to be answered. The next problem with the static data mining is that some of the relationships that might be of interest from one period to the other may be lost after a new set of data is used. 
    To reflect the effect of a new data set and the current status of the association rules, where some of the strong rules might become weak and vice versa, there is a need to develop an efficient algorithm that adapts to the current patterns and associations. Some work has been done in developing association rules for incremental databases, but to the best of the author's knowledge no work has been done to do the same for periodic cause and effect analysis for online association rules in manufacturing industries. The present research attempts to answer these questions and develop an algorithm that can display the association rules online, find the periodic patterns in the data and detect the root cause of the problem.

  20. A fuzzy hill-climbing algorithm for the development of a compact associative classifier

    NASA Astrophysics Data System (ADS)

    Mitra, Soumyaroop; Lam, Sarah S.

    2012-02-01

    Classification, a data mining technique, has widespread applications including medical diagnosis, targeted marketing, and others. Knowledge discovery from databases in the form of association rules is one of the important data mining tasks. An integrated approach, classification based on association rules, has drawn the attention of the data mining community over the last decade. While attention has been mainly focused on increasing classifier accuracies, not much efforts have been devoted towards building interpretable and less complex models. This paper discusses the development of a compact associative classification model using a hill-climbing approach and fuzzy sets. The proposed methodology builds the rule-base by selecting rules which contribute towards increasing training accuracy, thus balancing classification accuracy with the number of classification association rules. The results indicated that the proposed associative classification model can achieve competitive accuracies on benchmark datasets with continuous attributes and lend better interpretability, when compared with other rule-based systems.

  1. Mining Research on Vibration Signal Association Rules of Quayside Container Crane Hoisting Motor Based on Apriori Algorithm

    NASA Astrophysics Data System (ADS)

    Yang, Chencheng; Tang, Gang; Hu, Xiong

    2017-07-01

    In daily operation, the hoisting motor of a quayside container crane produces a large amount of vibration signal data. To analyze the correlations among these data and to discover faults and potential safety hazards of the motor, the data are first discretized, and then the Apriori algorithm is used to mine strong association rules among them. The results show that day 1 and day 16 are the most closely related, which can guide staff to examine the motor's operation on these two days in order to find and resolve fault and safety problems.

  2. Evolutionary Data Mining Approach to Creating Digital Logic

    DTIC Science & Technology

    2010-01-01

    To deal with this problem a genetic program (GP) based data mining ( DM ) procedure has been invented (Smith 2005). A genetic program is an algorithm...that can operate on the variables. When a GP was used as a DM function in the past to automatically create fuzzy decision trees, the Report...rules represents an approach to the determining the effect of linguistic imprecision, i.e., the inability of experts to provide crisp rules. The

  3. Negative and Positive Association Rules Mining from Text Using Frequent and Infrequent Itemsets

    PubMed Central

    Mahmood, Sajid; Shahbaz, Muhammad; Guergachi, Aziz

    2014-01-01

    Association rule mining research typically focuses on positive association rules (PARs), generated from frequently occurring itemsets. However, in recent years, there has been a significant research focused on finding interesting infrequent itemsets leading to the discovery of negative association rules (NARs). The discovery of infrequent itemsets is far more difficult than their counterparts, that is, frequent itemsets. These problems include infrequent itemsets discovery and generation of accurate NARs, and their huge number as compared with positive association rules. In medical science, for example, one is interested in factors which can either adjudicate the presence of a disease or write-off of its possibility. The vivid positive symptoms are often obvious; however, negative symptoms are subtler and more difficult to recognize and diagnose. In this paper, we propose an algorithm for discovering positive and negative association rules among frequent and infrequent itemsets. We identify associations among medications, symptoms, and laboratory results using state-of-the-art data mining technology. PMID:24955429

  4. The Weather Forecast Using Data Mining Research Based on Cloud Computing.

    NASA Astrophysics Data System (ADS)

    Wang, ZhanJie; Mazharul Mujib, A. B. M.

    2017-10-01

    Weather forecasting has been an important application in meteorology and is one of the most scientifically and technologically challenging problems around the world. In this study, we analyzed the use of data mining techniques in forecasting weather. This paper proposes a modern method to develop a service-oriented architecture for weather information systems which forecast weather using these data mining techniques. This can be carried out by using Artificial Neural Network and Decision Tree algorithms and meteorological data collected over a specific time period. The algorithm presented the best results in generating classification rules for the mean weather variables. The results showed that these data mining techniques can be sufficient for weather forecasting.

  5. Dynamic association rules for gene expression data analysis.

    PubMed

    Chen, Shu-Chuan; Tsai, Tsung-Hsien; Chung, Cheng-Han; Li, Wen-Hsiung

    2015-10-14

    The purpose of gene expression analysis is to look for the association between regulation of gene expression levels and phenotypic variations. This association based on gene expression profile has been used to determine whether the induction/repression of genes correspond to phenotypic variations including cell regulations, clinical diagnoses and drug development. Statistical analyses on microarray data have been developed to resolve gene selection issue. However, these methods do not inform us of causality between genes and phenotypes. In this paper, we propose the dynamic association rule algorithm (DAR algorithm) which helps ones to efficiently select a subset of significant genes for subsequent analysis. The DAR algorithm is based on association rules from market basket analysis in marketing. We first propose a statistical way, based on constructing a one-sided confidence interval and hypothesis testing, to determine if an association rule is meaningful. Based on the proposed statistical method, we then developed the DAR algorithm for gene expression data analysis. The method was applied to analyze four microarray datasets and one Next Generation Sequencing (NGS) dataset: the Mice Apo A1 dataset, the whole genome expression dataset of mouse embryonic stem cells, expression profiling of the bone marrow of Leukemia patients, Microarray Quality Control (MAQC) data set and the RNA-seq dataset of a mouse genomic imprinting study. A comparison of the proposed method with the t-test on the expression profiling of the bone marrow of Leukemia patients was conducted. We developed a statistical way, based on the concept of confidence interval, to determine the minimum support and minimum confidence for mining association relationships among items. With the minimum support and minimum confidence, one can find significant rules in one single step. The DAR algorithm was then developed for gene expression data analysis. 
    Four gene expression datasets showed that the proposed DAR algorithm not only was able to identify a set of differentially expressed genes that largely agreed with that of other methods, but also provided an efficient and accurate way to find influential genes of a disease. In the paper, the well-established association rule mining technique from marketing has been successfully modified to determine the minimum support and minimum confidence based on the concept of confidence interval and hypothesis testing. It can be applied to gene expression data to mine significant association rules between gene regulation and phenotype. The proposed DAR algorithm provides an efficient way to find influential genes that underlie the phenotypic variance.

  6. A Swarm Optimization approach for clinical knowledge mining.

    PubMed

    Christopher, J Jabez; Nehemiah, H Khanna; Kannan, A

    2015-10-01

    Rule-based classification is a typical data mining task that is being used in several medical diagnosis and decision support systems. The rules stored in the rule base have an impact on classification efficiency. Rule sets that are extracted with data mining tools and techniques are optimized using heuristic or meta-heuristic approaches in order to improve the quality of the rule base. In this work, a meta-heuristic approach called Wind-driven Swarm Optimization (WSO) is used. The uniqueness of this work lies in the biological inspiration that underlies the algorithm. WSO uses Jval, a new metric, to evaluate the efficiency of a rule-based classifier. Rules are extracted from decision trees. WSO is used to obtain different permutations and combinations of rules whereby the optimal ruleset that satisfies the requirement of the developer is used for predicting the test data. The performance of various extensions of decision trees, namely, RIPPER, PART, FURIA and Decision Tables, is analyzed. The efficiency of WSO is also compared with the traditional Particle Swarm Optimization. Experiments were carried out with six benchmark medical datasets. The traditional C4.5 algorithm yields 62.89% accuracy with 43 rules for the liver disorders dataset, whereas WSO yields 64.60% with 19 rules. For the heart disease dataset, C4.5 is 68.64% accurate with 98 rules, whereas WSO is 77.8% accurate with 34 rules. The normalized standard deviations for accuracy of PSO and WSO are 0.5921 and 0.5846, respectively. WSO provides accurate and concise rulesets. PSO yields results similar to those of WSO, but the novelty of WSO lies in its biological motivation and its customization for rule base optimization. The trade-off between the prediction accuracy and the size of the rule base is optimized during the design and development of rule-based clinical decision support system. The efficiency of a decision support system relies on the content of the rule base and classification accuracy.
Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  7. Frequent Itemset Hiding Algorithm Using Frequent Pattern Tree Approach

    ERIC Educational Resources Information Center

    Alnatsheh, Rami

    2012-01-01

    A problem that has been the focus of much recent research in privacy preserving data-mining is the frequent itemset hiding (FIH) problem. Identifying itemsets that appear together frequently in customer transactions is a common task in association rule mining. Organizations that share data with business partners may consider some of the frequent…

  8. Analyzing Large Gene Expression and Methylation Data Profiles Using StatBicRM: Statistical Biclustering-Based Rule Mining

    PubMed Central

    Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

    2015-01-01

    Microarray and beadchip are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining) to identify a special type of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from the biological datasets. At first, a novel statistical strategy has been utilized to eliminate the insignificant/low-significant/redundant genes in such a way that significance level must satisfy the data distribution property (viz., either normal distribution or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. The corresponding special type of rules is then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time, and can work on big datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to know how much the evolved rules are able to describe accurately the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy, and other related factors with other rule-based classifiers. Statistical significance tests are also performed for verifying the statistical relevance of the comparative results.
Here, each of the other rule mining methods or rule-based classifiers is also starting with the same post-discretized data-matrix. Finally, we have also included the integrated analysis of gene expression and methylation for determining epigenetic effect (viz., effect of methylation) on gene expression level. PMID:25830807

  9. Analyzing large gene expression and methylation data profiles using StatBicRM: statistical biclustering-based rule mining.

    PubMed

    Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

    2015-01-01

    Microarray and beadchip are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining) to identify a special type of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from the biological datasets. At first, a novel statistical strategy has been utilized to eliminate the insignificant/low-significant/redundant genes in such a way that significance level must satisfy the data distribution property (viz., either normal distribution or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. The corresponding special type of rules is then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time, and can work on big datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to know how much the evolved rules are able to describe accurately the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy, and other related factors with other rule-based classifiers. Statistical significance tests are also performed for verifying the statistical relevance of the comparative results.
Here, each of the other rule mining methods or rule-based classifiers is also starting with the same post-discretized data-matrix. Finally, we have also included the integrated analysis of gene expression and methylation for determining epigenetic effect (viz., effect of methylation) on gene expression level.

  10. Prediction model for peninsular Indian summer monsoon rainfall using data mining and statistical approaches

    NASA Astrophysics Data System (ADS)

    Vathsala, H.; Koolagudi, Shashidhar G.

    2017-01-01

    In this paper we discuss a data mining application for predicting peninsular Indian summer monsoon rainfall, and propose an algorithm that combines data mining and statistical techniques. We select likely predictors based on association rules that have the highest confidence levels. We then cluster the selected predictors to reduce their dimensions and use cluster membership values for classification. We derive the predictors from local conditions in southern India, including mean sea level pressure, wind speed, and maximum and minimum temperatures. The global condition variables include southern oscillation and Indian Ocean dipole conditions. The algorithm predicts rainfall in five categories: Flood, Excess, Normal, Deficit and Drought. We use closed itemset mining, cluster membership calculations and a multilayer perceptron function in the algorithm to predict monsoon rainfall in peninsular India. Using Indian Institute of Tropical Meteorology data, we found the prediction accuracy of our proposed approach to be exceptionally good.

  11. Privacy Preserving Association Rule Mining Revisited: Privacy Enhancement and Resources Efficiency

    NASA Astrophysics Data System (ADS)

    Mohaisen, Abedelaziz; Jho, Nam-Su; Hong, Dowon; Nyang, Daehun

    Privacy preserving association rule mining algorithms have been designed for discovering the relations between variables in data while maintaining the data privacy. In this article we revise one of the recently introduced schemes for association rule mining using fake transactions (FS). In particular, our analysis shows that the FS scheme has exhaustive storage and high computation requirements for guaranteeing a reasonable level of privacy. We introduce a realistic definition of privacy that benefits from the average case privacy and motivates the study of a weakness in the structure of FS by fake transactions filtering. In order to overcome this problem, we improve the FS scheme by presenting a hybrid scheme that considers both privacy and resources as two concurrent guidelines. Analytical and empirical results show the efficiency and applicability of our proposed scheme.

  12. SCADA-based Operator Support System for Power Plant Equipment Fault Forecasting

    NASA Astrophysics Data System (ADS)

    Mayadevi, N.; Ushakumari, S. S.; Vinodchandra, S. S.

    2014-12-01

    Power plant equipment must be monitored closely to prevent failures from disrupting plant availability. Online monitoring technology integrated with hybrid forecasting techniques can be used to prevent plant equipment faults. A self-learning rule-based expert system is proposed in this paper for fault forecasting in power plants controlled by a supervisory control and data acquisition (SCADA) system. Self-learning utilizes associative data mining algorithms on the SCADA history database to form new rules that can dynamically update the knowledge base of the rule-based expert system. In this study, a number of popular associative learning algorithms are considered for rule formation. Data mining results show that the Tertius algorithm is best suited for developing a learning engine for power plants. For real-time monitoring of the plant condition, graphical models are constructed by K-means clustering. To build a time-series forecasting model, a multilayer perceptron (MLP) is used. Once created, the models are updated in the model library to provide an adaptive environment for the proposed system. A graphical user interface (GUI) illustrates the variation of all sensor values affecting a particular alarm/fault, as well as the step-by-step procedure for avoiding critical situations and consequent plant shutdown. The forecasting performance is evaluated by computing the mean absolute error and root mean square error of the predictions.
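
    The two error measures named at the end of the abstract have standard definitions, sketched below. The function names and the sample sensor readings are hypothetical and are not taken from the study.

```python
from math import sqrt


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences |actual - predicted|."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_square_error(actual, predicted):
    """Square root of the average squared difference."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0]   # hypothetical sensor values
    forecast = [1.0, 2.0, 5.0]   # hypothetical model output
    print(mean_absolute_error(observed, forecast))
    print(root_mean_square_error(observed, forecast))
```

    RMSE penalizes large individual errors more heavily than MAE, which is one reason the two are often reported together.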

  13. Applying Data Mining Techniques to Extract Hidden Patterns about Breast Cancer Survival in an Iranian Cohort Study.

    PubMed

    Khalkhali, Hamid Reza; Lotfnezhad Afshar, Hadi; Esnaashari, Omid; Jabbari, Nasrollah

    2016-01-01

    Breast cancer survival has been analyzed by many standard data mining algorithms. A group of these algorithms belongs to the decision tree category. The ability of decision tree algorithms to visualize and formulate hidden patterns among study variables was the main reason for applying, in the current study, an algorithm from the decision tree category that had not been studied before. The classification and regression trees (CART) algorithm was applied to a breast cancer database containing information on 569 patients in 2007-2010. The Gini impurity measure, which is used for categorical target variables, was utilized. The classification error, which is a function of tree size, was measured by 10-fold cross-validation experiments. The performance of the created model was evaluated by the criteria of accuracy, sensitivity and specificity. The CART model produced a decision tree with 17 nodes, 9 of which were associated with a set of rules. The rules were clinically meaningful. They showed in if-then format that Stage was the most important variable for predicting breast cancer survival. The scores of accuracy, sensitivity and specificity were 80.3%, 93.5% and 53%, respectively. The current study's model, the first one created by CART, was able to extract useful hidden rules from a relatively small dataset.
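
    The Gini impurity that CART uses for categorical targets can be sketched in a few lines. The class labels and the example split below are hypothetical and only illustrate the measure; they are not data from the study.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left, right):
    """Size-weighted impurity of a binary split; CART prefers lower values."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


if __name__ == "__main__":
    node = ["survived", "survived", "died", "died"]
    print(gini_impurity(node))                                        # mixed node
    print(split_gini(["survived", "survived"], ["died", "died"]))     # pure split
```

    A candidate split is attractive when its weighted impurity is well below the impurity of the parent node.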

  14. Predicting biomedical metadata in CEDAR: A study of Gene Expression Omnibus (GEO).

    PubMed

    Panahiazar, Maryam; Dumontier, Michel; Gevaert, Olivier

    2017-08-01

    A crucial and limiting factor in data reuse is the lack of accurate, structured, and complete descriptions of data, known as metadata. Towards improving the quantity and quality of metadata, we propose a novel metadata prediction framework to learn associations from existing metadata that can be used to predict metadata values. We evaluate our framework in the context of experimental metadata from the Gene Expression Omnibus (GEO). We applied four rule mining algorithms to the most common structured metadata elements (sample type, molecular type, platform, label type and organism) from over 1.3 million GEO records. We examined the quality of well supported rules from each algorithm and visualized the dependencies among metadata elements. Finally, we evaluated the performance of the algorithms in terms of accuracy, precision, recall, and F-measure. We found that PART is the best algorithm, outperforming Apriori, Predictive Apriori, and Decision Table. All algorithms perform significantly better in predicting class values than the majority vote classifier. We found that the performance of the algorithms is related to the dimensionality of the GEO elements. The average performance of all algorithms increases as the dimensionality of the unique values of these elements decreases (2697 platforms, 537 organisms, 454 labels, 9 molecules, and 5 types). Our work suggests that experimental metadata such as that present in GEO can be accurately predicted using rule mining algorithms. Our work has implications for both prospective and retrospective augmentation of metadata quality, which are geared towards making data easier to find and reuse. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
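
    The four evaluation measures named in the abstract can be computed from paired true and predicted values as in the sketch below. It treats one class as positive (a one-vs-rest view), which is an assumption; the abstract does not say how multi-class results were averaged. The function name and the sample values are hypothetical.

```python
def binary_metrics(actual, predicted, positive):
    """Accuracy, precision, recall and F-measure for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


if __name__ == "__main__":
    true_values = ["RNA", "RNA", "DNA", "DNA", "RNA"]   # hypothetical metadata values
    predictions = ["RNA", "DNA", "DNA", "RNA", "RNA"]   # hypothetical rule output
    print(binary_metrics(true_values, predictions, positive="RNA"))
```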

  15. Extracting Cross-Ontology Weighted Association Rules from Gene Ontology Annotations.

    PubMed

    Agapito, Giuseppe; Milano, Marianna; Guzzi, Pietro Hiram; Cannataro, Mario

    2016-01-01

    Gene Ontology (GO) is a structured repository of concepts (GO Terms) that are associated to one or more gene products through a process referred to as annotation. The analysis of annotated data is an important opportunity for bioinformatics. There are different approaches to analysis; among them is the use of association rules (AR), which provides useful knowledge by discovering biologically relevant, previously unknown associations between GO terms. In a previous work, we introduced GO-WAR (Gene Ontology-based Weighted Association Rules), a methodology for extracting weighted association rules from ontology-based annotated datasets. We here adapt the GO-WAR algorithm to mine cross-ontology association rules, i.e., rules that involve GO terms present in the three sub-ontologies of GO. We conduct a deep performance evaluation of GO-WAR by mining publicly available GO annotated datasets, showing how GO-WAR outperforms current state of the art approaches.

  16. Long-range prediction of Indian summer monsoon rainfall using data mining and statistical approaches

    NASA Astrophysics Data System (ADS)

    H, Vathsala; Koolagudi, Shashidhar G.

    2017-10-01

    This paper presents a hybrid model to better predict Indian summer monsoon rainfall. The algorithm considers suitable techniques for processing dense datasets. The proposed three-step algorithm comprises closed itemset generation-based association rule mining for feature selection, cluster membership for dimensionality reduction, and simple logistic function for prediction. The application of predicting rainfall into flood, excess, normal, deficit, and drought based on 36 predictors consisting of land and ocean variables is presented. Results show good accuracy in the considered study period of 37 years (1969-2005).

  17. Data Mining Methods for Recommender Systems

    NASA Astrophysics Data System (ADS)

    Amatriain, Xavier; Jaimes, Alejandro; Oliver, Nuria; Pujol, Josep M.

    In this chapter, we give an overview of the main Data Mining techniques used in the context of Recommender Systems. We first describe common preprocessing methods such as sampling or dimensionality reduction. Next, we review the most important classification techniques, including Bayesian Networks and Support Vector Machines. We describe the k-means clustering algorithm and discuss several alternatives. We also present association rules and related algorithms for an efficient training process. In addition to introducing these techniques, we survey their uses in Recommender Systems and present cases where they have been successfully applied.

  18. Multiagent data warehousing and multiagent data mining for cerebrum/cerebellum modeling

    NASA Astrophysics Data System (ADS)

    Zhang, Wen-Ran

    2002-03-01

    An algorithm named Neighbor-Miner is outlined for multiagent data warehousing and multiagent data mining. The algorithm is defined in an evolving dynamic environment with autonomous or semiautonomous agents. Instead of mining frequent itemsets from customer transactions, the new algorithm discovers new agents and mines agent associations in first-order logic from agent attributes and actions. While the Apriori algorithm uses frequency as an a priori threshold, the new algorithm uses agent similarity as a priori knowledge. The concept of agent similarity leads to the notions of agent cuboid, orthogonal multiagent data warehousing (MADWH), and multiagent data mining (MADM). Based on agent similarities and action similarities, Neighbor-Miner is proposed and illustrated in a MADWH/MADM approach to cerebrum/cerebellum modeling. It is shown that (1) semiautonomous neurofuzzy agents can be identified for uniped locomotion and gymnastic training based on attribute relevance analysis; (2) new agents can be discovered and agent cuboids can be dynamically constructed in an orthogonal MADWH, which resembles an evolving cerebrum/cerebellum system; and (3) dynamic motion laws can be discovered as association rules in first-order logic. Although examples in legged robot gymnastics are used to illustrate the basic ideas, the new approach is generally suitable for a broad category of data mining tasks where knowledge can be discovered collectively by a set of agents from a geographically or geometrically distributed but relevant environment, especially in scientific and engineering data environments.

  19. Association rule mining in the US Vaccine Adverse Event Reporting System (VAERS).

    PubMed

    Wei, Lai; Scott, John

    2015-09-01

    Spontaneous adverse event reporting systems are critical tools for monitoring the safety of licensed medical products. Commonly used signal detection algorithms identify disproportionate product-adverse event pairs and may not be sensitive to more complex potential signals. We sought to develop a computationally tractable multivariate data-mining approach to identify product-multiple adverse event associations. We describe an application of stepwise association rule mining (Step-ARM) to detect potential vaccine-symptom group associations in the US Vaccine Adverse Event Reporting System. Step-ARM identifies strong associations between one vaccine and one or more adverse events. To reduce the number of redundant association rules found by Step-ARM, we also propose a clustering method for the post-processing of association rules. In sample applications to a trivalent intradermal inactivated influenza virus vaccine and to measles, mumps, rubella, and varicella (MMRV) vaccine and in simulation studies, we find that Step-ARM can detect a variety of medically coherent potential vaccine-symptom group signals efficiently. In the MMRV example, Step-ARM appears to outperform univariate methods in detecting a known safety signal. Our approach is sensitive to potentially complex signals, which may be particularly important when monitoring novel medical countermeasure products such as pandemic influenza vaccines. The post-processing clustering algorithm improves the applicability of the approach as a screening method to identify patterns that may merit further investigation. Copyright © 2015 John Wiley & Sons, Ltd.

  20. The application of data mining to explore association rules between metabolic syndrome and lifestyles.

    PubMed

    Huang, Yi Chao

    This study used an efficient data mining algorithm, called DCIP (the data cutting and inner product method), to explore association rules between the lifestyles of factory workers in Taiwan and the metabolic syndrome. A total of 1,216 workers in four companies completed a lifestyle questionnaire. Results of the questionnaire survey were integrated into the workers' health examination reports to form an attribute database of the metabolic syndrome. Among the association rules derived by DCIP, 80% of those on the list of the top 15 highest support counts are corroborated by medical literature or by healthcare professionals. These findings prove that data mining is a valid and effective research method, and that larger sample sizes will likely produce more accurate associations connecting the metabolic syndrome to specific lifestyles. The rules already verified can serve as a reference guide for the health management of factory workers. The remaining 20%, while still lacking hard evidence, provide fertile ground for future research.

  1. Efficient discovery of risk patterns in medical data.

    PubMed

    Li, Jiuyong; Fu, Ada Wai-chee; Fahey, Paul

    2009-01-01

    This paper studies the problem of efficiently discovering risk patterns in medical data. Risk patterns are defined by a statistical metric, relative risk, which has been widely used in epidemiological research. To avoid fruitless search in the complete exploration of risk patterns, we define the optimal risk pattern set to exclude superfluous patterns, i.e. complicated patterns with lower relative risk than their corresponding simpler form patterns. We prove that mining optimal risk pattern sets conforms to an anti-monotone property that supports an efficient mining algorithm. We propose an efficient algorithm for mining optimal risk pattern sets based on this property. We also propose a hierarchical structure to present discovered patterns for easy perusal by domain experts. The proposed approach is compared with two well-known rule discovery methods, decision tree and association rule mining approaches, on benchmark data sets and applied to a real world application. The proposed method discovers more and better quality risk patterns than a decision tree approach. The decision tree method is not designed for such applications and is inadequate for pattern exploring. The proposed method does not discover a large number of uninteresting superfluous patterns as an association mining approach does. The proposed method is more efficient than an association rule mining method. A real world case study shows that the method reveals some interesting risk patterns to medical practitioners. The proposed method is an efficient approach to explore risk patterns. It quickly identifies cohorts of patients that are vulnerable to a risk outcome from a large data set. The proposed method is useful for exploratory study on large medical data to generate and refine hypotheses. The method is also useful for designing medical surveillance systems.
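
    Relative risk, the metric that defines a risk pattern here, compares the outcome rate among patients who match a pattern with the rate among those who do not. The sketch below uses the standard epidemiological definition. The `is_superfluous` helper is only an illustrative reading of the abstract's description of superfluous patterns, not the paper's algorithm, and all names and counts are hypothetical.

```python
def relative_risk(pattern_cases, pattern_total, other_cases, other_total):
    """Risk of the outcome with the pattern divided by risk without it."""
    if pattern_total <= 0 or other_total <= 0:
        raise ValueError("group sizes must be positive")
    risk_with = pattern_cases / pattern_total
    risk_without = other_cases / other_total
    if risk_without == 0:
        return float("inf")
    return risk_with / risk_without


def is_superfluous(rr_specific, rr_simpler):
    """Assumed reading: a more complicated pattern adds nothing if its
    relative risk is lower than that of its simpler form."""
    return rr_specific < rr_simpler


if __name__ == "__main__":
    # Hypothetical cohort: 30 of 100 patients matching the pattern have the
    # outcome, against 10 of 100 patients who do not match it.
    rr = relative_risk(30, 100, 10, 100)
    print(rr)
    print(is_superfluous(rr_specific=2.5, rr_simpler=rr))
```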

  2. A novel association rule mining approach using TID intermediate itemset.

    PubMed

    Aqra, Iyad; Herawan, Tutut; Abdul Ghani, Norjihan; Akhunzada, Adnan; Ali, Akhtar; Bin Razali, Ramdan; Ilahi, Manzoor; Raymond Choo, Kim-Kwang

    2018-01-01

    Designing an efficient association rule mining (ARM) algorithm for multilevel knowledge-based transactional databases that is appropriate for real-world deployments is of paramount concern. However, dynamic decision making that needs to modify the threshold either to minimize or maximize the output knowledge certainly necessitates the extant state-of-the-art algorithms to rescan the entire database. Subsequently, the process incurs heavy computation cost and is not feasible for real-time applications. The paper efficiently addresses the problem of dynamically updating the threshold for a given purpose. The paper contributes by presenting a novel ARM approach that creates an intermediate itemset and applies a threshold to extract categorical frequent itemsets with diverse threshold values. This improves the overall efficiency, as we no longer need to scan the whole database. After the entire itemset is built, we are able to obtain real support without the need of rebuilding the itemset (e.g., the itemset list is intersected to obtain the actual support). Moreover, the algorithm supports extracting many frequent itemsets according to a pre-determined minimum support with an independent purpose. Additionally, the experimental results of our proposed approach demonstrate the capability to be deployed in any mining system in a fully parallel mode; consequently, increasing the efficiency of the real-time association rules discovery process. The proposed approach outperforms the extant state-of-the-art and shows promising results that reduce computation cost, increase accuracy, and produce all possible itemsets.
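
    The remark that itemset lists are intersected to obtain the actual support matches the general vertical (TID-list) layout, which can be sketched as follows. This is a generic illustration under that assumption, not the paper's intermediate-itemset structure, and the function names and transactions are hypothetical.

```python
def build_tid_lists(transactions):
    """Map each item to the set of transaction IDs (TIDs) that contain it.

    The database is scanned once here; later support queries only touch
    the TID sets.
    """
    tid_lists = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tid_lists.setdefault(item, set()).add(tid)
    return tid_lists


def support(itemset, tid_lists):
    """Support count of an itemset = size of the intersection of its TID sets."""
    tid_sets = [tid_lists.get(item, set()) for item in itemset]
    if not tid_sets:
        return 0
    return len(set.intersection(*tid_sets))


if __name__ == "__main__":
    db = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
    tids = build_tid_lists(db)
    # Different thresholds can be applied without rescanning db.
    for min_support in (1, 2):
        print(min_support, support(["a", "b"], tids) >= min_support)
```

    Because supports come from set intersections, changing the minimum support only changes the comparison at the end, not the scan of the data.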

  3. A novel association rule mining approach using TID intermediate itemset

    PubMed Central

    Ali, Akhtar; Bin Razali, Ramdan; Ilahi, Manzoor; Raymond Choo, Kim-Kwang

    2018-01-01

    Designing an efficient association rule mining (ARM) algorithm for multilevel knowledge-based transactional databases that is appropriate for real-world deployments is of paramount concern. However, dynamic decision making that needs to modify the threshold either to minimize or maximize the output knowledge certainly necessitates the extant state-of-the-art algorithms to rescan the entire database. Subsequently, the process incurs heavy computation cost and is not feasible for real-time applications. The paper efficiently addresses the problem of dynamically updating the threshold for a given purpose. The paper contributes by presenting a novel ARM approach that creates an intermediate itemset and applies a threshold to extract categorical frequent itemsets with diverse threshold values. This improves the overall efficiency, as we no longer need to scan the whole database. After the entire itemset is built, we are able to obtain real support without the need of rebuilding the itemset (e.g., the itemset list is intersected to obtain the actual support). Moreover, the algorithm supports extracting many frequent itemsets according to a pre-determined minimum support with an independent purpose. Additionally, the experimental results of our proposed approach demonstrate the capability to be deployed in any mining system in a fully parallel mode; consequently, increasing the efficiency of the real-time association rules discovery process. The proposed approach outperforms the extant state-of-the-art and shows promising results that reduce computation cost, increase accuracy, and produce all possible itemsets. PMID:29351287

  4. Automatic detection of referral patients due to retinal pathologies through data mining.

    PubMed

    Quellec, Gwenolé; Lamard, Mathieu; Erginay, Ali; Chabouis, Agnès; Massin, Pascale; Cochener, Béatrice; Cazuguel, Guy

    2016-04-01

    With the increased prevalence of retinal pathologies, automating the detection of these pathologies is becoming more and more relevant. In the past few years, many algorithms have been developed for the automated detection of a specific pathology, typically diabetic retinopathy, using eye fundus photography. No matter how good these algorithms are, we believe many clinicians would not use automatic detection tools focusing on a single pathology and ignoring any other pathology present in the patient's retinas. To solve this issue, an algorithm for characterizing the appearance of abnormal retinas, as well as the appearance of the normal ones, is presented. This algorithm does not focus on individual images: it considers examination records consisting of multiple photographs of each retina, together with contextual information about the patient. Specifically, it relies on data mining in order to learn diagnosis rules from characterizations of fundus examination records. The main novelty is that the content of examination records (images and context) is characterized at multiple levels of spatial and lexical granularity: 1) spatial flexibility is ensured by an adaptive decomposition of composite retinal images into a cascade of regions, 2) lexical granularity is ensured by an adaptive decomposition of the feature space into a cascade of visual words. This multigranular representation allows for great flexibility in automatically characterizing normality and abnormality: it is possible to generate diagnosis rules whose precision and generalization ability can be traded off depending on data availability. A variation on usual data mining algorithms, originally designed to mine static data, is proposed so that contextual and visual data at adaptive granularity levels can be mined. This framework was evaluated in e-ophtha, a dataset of 25,702 examination records from the OPHDIAT screening network, as well as in the publicly-available Messidor dataset. 
It was successfully applied to the detection of patients that should be referred to an ophthalmologist and also to the specific detection of several pathologies. Copyright © 2016 Elsevier B.V. All rights reserved.

  5. Implementation of Data Mining to Analyze Drug Cases Using C4.5 Decision Tree

    NASA Astrophysics Data System (ADS)

    Wahyuni, Sri

    2018-03-01

    Data mining is the process of finding useful information in large databases. One of the existing techniques in data mining is classification. The method used here is the decision tree, built with the C4.5 algorithm. The decision tree method transforms a very large set of facts into a decision tree that represents rules. It is useful for exploring data, as well as for finding hidden relationships between a number of potential input variables and a target variable. A C4.5 decision tree is constructed in several stages: selecting an attribute as the root, creating a branch for each value, and dividing the cases among the branches. These stages are repeated for each branch until all the cases on the branch have the same class. Rules for a case can then be read from the resulting tree. In this study, the researcher classified data on prisoners at Labuhan Deli prison to identify the factors behind detainees committing drug offences. Applying the C4.5 algorithm yielded knowledge that can inform efforts to reduce drug offences. The findings indicate that the most influential factor in detainees committing drug offences was the address variable.
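
    The step of selecting an attribute as the root can be illustrated with the gain ratio criterion that C4.5 uses to rank categorical attributes. The sketch below covers only that scoring step, not tree construction or pruning, and the attribute values and class labels are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Information gain of splitting on `values`, divided by split information."""
    n = len(labels)
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    # Expected entropy of the class labels after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the attribute's own distribution
    return gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    outcome = ["yes", "yes", "no", "no"]          # hypothetical class labels
    address = ["north", "north", "south", "south"]  # separates the classes
    age_band = ["young", "old", "young", "old"]     # carries no information
    print(gain_ratio(address, outcome))
    print(gain_ratio(age_band, outcome))
```

    The attribute with the highest gain ratio would be placed at the root, and the same scoring would be repeated on each branch.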

  6. Association algorithm to mine the rules that govern enzyme definition and to classify protein sequences.

    PubMed

    Chiu, Shih-Hau; Chen, Chien-Chi; Yuan, Gwo-Fang; Lin, Thy-Hou

    2006-06-15

    The number of sequences compiled in many genome projects is growing exponentially, but most of them have not been characterized experimentally. An automatic annotation scheme is urgently needed to reduce the gap between the number of new sequences produced and reliable functional annotation. This work proposes rules for automatically classifying the fungus genes. The approach involves elucidating the enzyme classification rules hidden in the UniProt protein knowledgebase and then applying them for classification. The association algorithm Apriori is used to mine the relationship between the enzyme class and significant InterPro entries. The candidate rules are evaluated for their classificatory capacity. Five datasets were collected from Swiss-Prot for establishing the annotation rules. These were treated as the training sets. The TrEMBL entries were treated as the testing set. A correct enzyme classification rate of 70% was obtained for the prokaryote datasets and a similar rate of about 80% was obtained for the eukaryote datasets. The fungus training dataset, which lacks an enzyme class description, was also used to evaluate the fungus candidate rules. A total of 88 out of 5085 test entries were matched with the fungus rule set. These were otherwise poorly annotated using their functional descriptions. The feasibility of using the method presented here to classify enzyme classes based on the enzyme domain rules is evident. The rules may also be employed by protein annotators in manual annotation or implemented in an automatic annotation flowchart.

  7. Spatio-Temporal Pattern Mining on Trajectory Data Using ARM

    NASA Astrophysics Data System (ADS)

    Khoshahval, S.; Farnaghi, M.; Taleai, M.

    2017-09-01

    Mobile phones were initially regarded as devices that make human communication easier. Today their use has evolved into a platform for gaming, web surfing and GPS-enabled applications. Embedding GPS in handheld devices has turned them into significant sources of trajectory data. Raw GPS trajectory data is a series of points that contains hidden information, and trajectory data analysis is needed to reveal it. Among the most useful information concealed in trajectory data are user activity patterns. Each pattern contains multiple stops and moves, which identify the places a user visited and the tasks performed. This paper proposes an approach to discover users' daily activity patterns from GPS trajectories using association rules. Finding user patterns requires extracting the places a user visited from the stops and moves of GPS trajectories. To locate stops and moves, we implemented a place recognition algorithm. After the visited points were extracted, an association rule mining algorithm called Apriori was used to extract user activity patterns. This study shows that each trajectory contains useful patterns that can be extracted from raw GPS data using association rule mining techniques, in order to learn about the behaviour of multiple users in a system, and that these patterns can be utilized in various location-based applications.

  8. A Recommendation Algorithm for Automating Corollary Order Generation

    PubMed Central

    Klann, Jeffrey; Schadow, Gunther; McCoy, JM

    2009-01-01

    Manual development and maintenance of decision support content is time-consuming and expensive. We explore recommendation algorithms, e-commerce data-mining tools that use collective order history to suggest purchases, to assist with this. In particular, previous work shows corollary order suggestions are amenable to automated data-mining techniques. Here, an item-based collaborative filtering algorithm augmented with association rule interestingness measures mined suggestions from 866,445 orders made in an inpatient hospital in 2007, generating 584 potential corollary orders. Our expert physician panel evaluated the top 92 and agreed 75.3% were clinically meaningful. Also, at least one felt 47.9% would be directly relevant in guideline development. This automated generation of a rough-cut of corollary orders confirms prior indications about automated tools in building decision support content. It is an important step toward computerized augmentation to decision support development, which could increase development efficiency and content quality while automatically capturing local standards. PMID:20351875

  9. A recommendation algorithm for automating corollary order generation.

    PubMed

    Klann, Jeffrey; Schadow, Gunther; McCoy, J M

    2009-11-14

    Manual development and maintenance of decision support content is time-consuming and expensive. We explore recommendation algorithms, e-commerce data-mining tools that use collective order history to suggest purchases, to assist with this. In particular, previous work shows corollary order suggestions are amenable to automated data-mining techniques. Here, an item-based collaborative filtering algorithm augmented with association rule interestingness measures mined suggestions from 866,445 orders made in an inpatient hospital in 2007, generating 584 potential corollary orders. Our expert physician panel evaluated the top 92 and agreed 75.3% were clinically meaningful. Also, at least one felt 47.9% would be directly relevant in guideline development. This automated generation of a rough-cut of corollary orders confirms prior indications about automated tools in building decision support content. It is an important step toward computerized augmentation to decision support development, which could increase development efficiency and content quality while automatically capturing local standards.

  10. A primer to frequent itemset mining for bioinformatics

    PubMed Central

    Naulaerts, Stefan; Meysman, Pieter; Bittremieux, Wout; Vu, Trung Nghia; Vanden Berghe, Wim; Goethals, Bart

    2015-01-01

    Over the past two decades, pattern mining techniques have become an integral part of many bioinformatics solutions. Frequent itemset mining is a popular group of pattern mining techniques designed to identify elements that frequently co-occur. An archetypical example is the identification of products that often end up together in the same shopping basket in supermarket transactions. A number of algorithms have been developed to address variations of this computationally non-trivial problem. Frequent itemset mining techniques are able to efficiently capture the characteristics of (complex) data and succinctly summarize it. Owing to these and other interesting properties, these techniques have proven their value in biological data analysis. Nevertheless, information about the bioinformatics applications of these techniques remains scattered. In this primer, we introduce frequent itemset mining and their derived association rules for life scientists. We give an overview of various algorithms, and illustrate how they can be used in several real-life bioinformatics application domains. We end with a discussion of the future potential and open challenges for frequent itemset mining in the life sciences. PMID:24162173
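The shopping-basket example in this abstract can be made concrete with a small level-wise miner in the style of Apriori. This is a minimal sketch under stated assumptions: `min_support` is an absolute transaction count, the baskets are invented, and no attempt is made at the efficiency tricks that the algorithms surveyed in the primer rely on.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for every itemset found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into size-k candidates.
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent, from mined support counts."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical supermarket baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = apriori(baskets, min_support=2)
```

The prune step is what keeps the search tractable: an itemset can only be frequent if all of its subsets are, so candidates with an infrequent subset are discarded before any counting.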

  11. Biclustering Learning of Trading Rules.

    PubMed

    Huang, Qinghua; Wang, Ting; Tao, Dacheng; Li, Xuelong

    2015-10-01

    Technical analysis with numerous indicators and patterns has been regarded as important evidence for making trading decisions in financial markets. However, it is extremely difficult for investors to find useful trading rules based on numerous technical indicators. This paper innovatively proposes the use of biclustering mining to discover effective technical trading patterns that contain a combination of indicators from historical financial data series. This is the first attempt to use a biclustering algorithm on trading data. The mined patterns are regarded as trading rules and can be classified into three trading actions (i.e., the buy, the sell, and no-action signals) with respect to the maximum support. A modified K-nearest neighbor (K-NN) method is applied to classification of trading days in the testing period. The proposed method [called biclustering algorithm and the K-nearest neighbor (BIC-K-NN)] was implemented on four historical datasets and the average performance was compared with the conventional buy-and-hold strategy and three previously reported intelligent trading systems. Experimental results demonstrate that the proposed trading system outperforms its counterparts and will be useful for investment in various financial markets.

  12. Association algorithm to mine the rules that govern enzyme definition and to classify protein sequences

    PubMed Central

    Chiu, Shih-Hau; Chen, Chien-Chi; Yuan, Gwo-Fang; Lin, Thy-Hou

    2006-01-01

    Background: The number of sequences compiled in many genome projects is growing exponentially, but most of them have not been characterized experimentally. An automatic annotation scheme is urgently needed to reduce the gap between the number of new sequences produced and reliable functional annotation. This work proposes rules for automatically classifying the fungus genes. The approach involves elucidating the enzyme classification rules hidden in the UniProt protein knowledgebase and then applying them for classification. The association algorithm Apriori is used to mine the relationship between the enzyme class and significant InterPro entries. The candidate rules are evaluated for their classificatory capacity. Results: Five datasets were collected from Swiss-Prot for establishing the annotation rules. These were treated as the training sets. The TrEMBL entries were treated as the testing set. A correct enzyme classification rate of 70% was obtained for the prokaryote datasets and a similar rate of about 80% was obtained for the eukaryote datasets. The fungus training dataset, which lacks an enzyme class description, was also used to evaluate the fungus candidate rules. A total of 88 out of 5085 test entries were matched with the fungus rule set. These were otherwise poorly annotated using their functional descriptions. Conclusion: The feasibility of using the method presented here to classify enzyme classes based on the enzyme domain rules is evident. The rules may also be employed by protein annotators in manual annotation or implemented in an automatic annotation flowchart. PMID:16776838

  13. Analysis of North Atlantic tropical cyclone intensity change using data mining

    NASA Astrophysics Data System (ADS)

    Tang, Jiang

    Tropical cyclones (TC), especially when their intensity reaches hurricane scale, can become a costly natural hazard. Accurate prediction of tropical cyclone intensity is very difficult because of inadequate observations of TC structures, poor understanding of physical processes, coarse model resolution, inaccurate initial conditions, and other factors. This study aims to tackle two factors that account for the underperformance of current TC intensity forecasts: (1) inadequate observations of TC structures, and (2) deficient understanding of the underlying physical processes governing TC intensification. To tackle the problem of inadequate observations of TC structures, efforts have been made to extract vertical and horizontal structural parameters of latent heat release from Tropical Rainfall Measuring Mission (TRMM) Precipitation Radar (PR) data products. A case study of Hurricane Isabel (2003) was conducted first to explore the feasibility of using the 3D TC structure information in predicting TC intensification. Afterwards, several structural parameters were extracted from 53 TRMM PR 2A25 observations of 25 North Atlantic TCs during the period of 1998 to 2003. A new generation of multi-correlation data mining algorithms (Apriori and its variations) was applied to find the roles of the latent heat release structure in TC intensification. The results showed that the buildup of TC energy is indicated by the height of the convective tower, and by relatively low latent heat release at the core area and around the outer band. Adverse conditions which prevent TC intensification include the following: (1) the TC entering a higher latitude area where the underlying sea is relatively cold, (2) the TC moving too fast to absorb thermal energy from the underlying sea, or (3) strong energy loss at the outer band. When adverse and favorable conditions reach equilibrium, tropical cyclone intensity remains stable. 
The dataset from Statistical Hurricane Intensity Prediction Scheme (SHIPS) covering the period of 1982-2003 and the Apriori-based association rule mining algorithm were used to study the associations of underlying geophysical characteristics with the intensity change of tropical cyclones. The data have been stratified into 6 TC categories from tropical depression to category 4 hurricanes based on their strength. The result showed that the persistence of intensity change in the past and the strength of vertical shear in the environment are the most prevalent factors for all of the 6 TC categories. Hyper-edge searching had found 3 sets of parameters which showed strong intramural binds. Most of the parameters used in SHIPS model have a consistent "I-W" relation over different TC categories, indicating a consistent function of those parameters in TC development. However, the "I-W" relations of the relative momentum flux and the meridional motion change from tropical storm stage to hurricane stage, indicating a change in the role of those two parameters in TC development. Because rapid intensification (RI) is a major source of errors when predicting hurricane intensity, the association rule mining algorithm was performed on RI versus non-RI tropical cyclone cases using the same SHIPS dataset. The results had been compared with those from the traditional statistical analysis conducted by Kaplan and DeMaria (2003). The rapid intensification rule with 5 RI conditions proposed by the traditional statistical analysis was found by the association rule mining in this study as well. However, further analysis showed that the 5 RI conditions can be replaced by another association rule using fewer conditions but with a higher RI probability (RIP). This means that the rule with all 5 constraints found by Kaplan and DeMaria is not optimal, and the association rule mining technique can find a rule with fewer constraints yet fits more RI cases. 
The further analysis with the highest RIPs over different numbers of conditions has demonstrated that the interactions among multiple factors are responsible for the RI process of TCs. However, the influence of factors saturates at certain numbers. This study has shown successful data mining examples in studying tropical cyclone intensification using association rules. The higher RI probability with fewer conditions found by association rule technique is significant. This work demonstrated that data mining techniques can be used as an efficient exploration method to generate hypotheses, and that statistical analysis should be performed to confirm the hypotheses, as is generally expected for data mining applications.

  14. Integrated approach using data mining-based decision tree and object-based image analysis for high-resolution urban mapping of WorldView-2 satellite sensor data

    NASA Astrophysics Data System (ADS)

    Hamedianfar, Alireza; Shafri, Helmi Zulhaidi Mohd

    2016-04-01

    This paper integrates decision tree-based data mining (DM) and object-based image analysis (OBIA) to provide a transferable model for the detailed characterization of urban land-cover classes using WorldView-2 (WV-2) satellite images. Many articles have been published on OBIA in recent years based on DM for different applications. However, less attention has been paid to the generation of a transferable model for characterizing detailed urban land cover features. Three subsets of WV-2 images were used in this paper to generate transferable OBIA rule-sets. Many features were explored by using a DM algorithm, which created the classification rules as a decision tree (DT) structure from the first study area. The developed DT algorithm was applied to object-based classifications in the first study area. After this process, we validated the capability and transferability of the classification rules into second and third subsets. Detailed ground truth samples were collected to assess the classification results. The first, second, and third study areas achieved 88%, 85%, and 85% overall accuracies, respectively. Results from the investigation indicate that DM was an efficient method to provide the optimal and transferable classification rules for OBIA, which accelerates the rule-sets creation stage in the OBIA classification domain.

  15. Power System Transient Stability Based on Data Mining Theory

    NASA Astrophysics Data System (ADS)

    Cui, Zhen; Shi, Jia; Wu, Runsheng; Lu, Dan; Cui, Mingde

    2018-01-01

    In order to study the stability of power systems, a transient stability assessment method based on data mining theory is designed. By introducing association rule analysis from data mining theory, an association classification method for transient stability assessment is presented. A mathematical model of transient stability assessment based on data mining technology is established. Combining rule reasoning with classification prediction, the association classification method is used to perform transient stability assessment. The transient stability index is used to identify the samples that cannot be correctly classified by association classification. Then, according to the critical stability of each sample, the time domain simulation method is used to determine its state, so as to ensure the accuracy of the final results. The results show that this stability assessment system can improve the speed of operation while keeping the analysis result completely correct, and that the improved algorithm can find the inherent relation between changes in the power system operation mode and changes in the degree of transient stability.

  16. Simple, Scalable, Script-based, Science Processor for Measurements - Data Mining Edition (S4PM-DME)

    NASA Astrophysics Data System (ADS)

    Pham, L. B.; Eng, E. K.; Lynnes, C. S.; Berrick, S. W.; Vollmer, B. E.

    2005-12-01

    The S4PM-DME is the Goddard Earth Sciences Distributed Active Archive Center's (GES DAAC) web-based data mining environment. The S4PM-DME replaces the Near-line Archive Data Mining (NADM) system with a better web environment and a richer set of production rules. S4PM-DME enables registered users to submit and execute custom data mining algorithms. The S4PM-DME system uses the GES DAAC-developed Simple Scalable Script-based Science Processor for Measurements (S4PM) to automate tasks and perform the actual data processing. A web interface allows the user to access the S4PM-DME system. The user first develops a personalized data mining algorithm on his/her home platform and then uploads it to the S4PM-DME system. Algorithms written in C and FORTRAN are currently supported. The user-developed algorithm is automatically audited for potential security problems before it is installed within the S4PM-DME system and made available to the user. Once the algorithm has been installed, the user can promote it to the "operational" environment. From there the user can search and order the data available in the GES DAAC archive for his/her science algorithm. The user can also set up a processing subscription, which automatically processes new data as it becomes available in the GES DAAC archive. The generated mined data products are then made available for FTP pickup. The benefits of using S4PM-DME are 1) reducing the time a user typically spends downloading GES DAAC data to his/her system, thus off-loading heavy network traffic, 2) freeing up the load on the user's system, and 3) making use of the rich and abundant ocean and atmosphere data from the MODIS and AIRS instruments available from the GES DAAC.

  17. Microbial genotype-phenotype mapping by class association rule mining.

    PubMed

    Tamura, Makio; D'haeseleer, Patrik

    2008-07-01

    Microbial phenotypes are typically due to the concerted action of multiple gene functions, yet the presence of each gene may have only a weak correlation with the observed phenotype. Hence, it may be more appropriate to examine co-occurrence between sets of genes and a phenotype (multiple-to-one) instead of pairwise relations between a single gene and the phenotype. Here, we propose an efficient class association rule mining algorithm, netCAR, in order to extract sets of COGs (clusters of orthologous groups of proteins) associated with a phenotype from COG phylogenetic profiles and a phenotype profile. netCAR takes into account the phylogenetic co-occurrence graph between COGs to restrict hypothesis space, and uses mutual information to evaluate the biconditional relation. We examined the mining capability of pairwise and multiple-to-one association by using netCAR to extract COGs relevant to six microbial phenotypes (aerobic, anaerobic, facultative, endospore, motility and Gram negative) from 11,969 unique COG profiles across 155 prokaryotic organisms. With the same level of false discovery rate, multiple-to-one association can extract about 10 times more relevant COGs than one-to-one association. We also reveal various topologies of association networks among COGs (modules) from extracted multiple-to-one correlation rules relevant with the six phenotypes; including a well-connected network for motility, a star-shaped network for aerobic and intermediate topologies for the other phenotypes. netCAR outperforms a standard CAR mining algorithm, CARapriori, while requiring several orders of magnitude less computational time for extracting 3-COG sets. Source code of the Java implementation is available as Supplementary Material at the Bioinformatics online website, or upon request to the author. Supplementary data are available at Bioinformatics online.

  18. Educational Data Mining Application for Estimating Students Performance in Weka Environment

    NASA Astrophysics Data System (ADS)

    Gowri, G. Shiyamala; Thulasiram, Ramasamy; Amit Baburao, Mahindra

    2017-11-01

    Educational data mining (EDM) is a multi-disciplinary research area that applies artificial intelligence, statistical modeling and data mining to the data generated by an educational institution. EDM uses computational approaches to interpret educational information in order to examine educational questions. To make a country stand out among the other nations of the world, the education system has to undergo a major transition by redesigning its framework. Concealed patterns and data from various information repositories can be extracted by adopting data mining techniques. In order to summarize the performance of students along with their credentials, we examine the use of data mining in academics. The Apriori algorithm is applied extensively to the student database for a broad classification based on various categories. The K-means procedure is applied to the same database in order to group the students into specific categories. The Apriori algorithm mines rules in order to extract similar patterns, along with their associations, across various sets of records. The records can be extracted from academic information repositories. The parameters used in this study give more importance to psychological traits than to academic features. Undesirable student conduct can be clearly observed by using data mining frameworks. Thus, the algorithms prove efficient at profiling students in any educational environment. The ultimate objective of the study is to determine whether a student is prone to violence or not.

  19. Improve Data Mining and Knowledge Discovery Through the Use of MatLab

    NASA Technical Reports Server (NTRS)

    Shaykhian, Gholam Ali; Martin, Dawn (Elliott); Beil, Robert

    2011-01-01

    Data mining is widely used to mine business, engineering, and scientific data. Data mining uses pattern based queries, searches, or other analyses of one or more electronic databases/datasets in order to discover or locate a predictive pattern or anomaly indicative of system failure, criminal or terrorist activity, etc. There are various algorithms, techniques and methods used to mine data; including neural networks, genetic algorithms, decision trees, nearest neighbor method, rule induction association analysis, slice and dice, segmentation, and clustering. These algorithms, techniques and methods used to detect patterns in a dataset, have been used in the development of numerous open source and commercially available products and technology for data mining. Data mining is best realized when latent information in a large quantity of data stored is discovered. No one technique solves all data mining problems; challenges are to select algorithms or methods appropriate to strengthen data/text mining and trending within given datasets. In recent years, throughout industry, academia and government agencies, thousands of data systems have been designed and tailored to serve specific engineering and business needs. Many of these systems use databases with relational algebra and structured query language to categorize and retrieve data. In these systems, data analyses are limited and require prior explicit knowledge of metadata and database relations; lacking exploratory data mining and discoveries of latent information. This presentation introduces MatLab(R) (MATrix LABoratory), an engineering and scientific data analyses tool to perform data mining. MatLab was originally intended to perform purely numerical calculations (a glorified calculator). Now, in addition to having hundreds of mathematical functions, it is a programming language with hundreds built in standard functions and numerous available toolboxes. 
MatLab's ease of data processing and visualization, together with its wide range of built-in functions and toolboxes, makes it suitable for numerical computation and simulation as well as for data mining. Engineers and scientists can take advantage of the readily available functions and toolboxes to gain wider insight in their respective data mining experiments.

  20. Improve Data Mining and Knowledge Discovery through the use of MatLab

    NASA Technical Reports Server (NTRS)

    Shaykahian, Gholan Ali; Martin, Dawn Elliott; Beil, Robert

    2011-01-01

    Data mining is widely used to mine business, engineering, and scientific data. Data mining uses pattern based queries, searches, or other analyses of one or more electronic databases/datasets in order to discover or locate a predictive pattern or anomaly indicative of system failure, criminal or terrorist activity, etc. There are various algorithms, techniques and methods used to mine data; including neural networks, genetic algorithms, decision trees, nearest neighbor method, rule induction association analysis, slice and dice, segmentation, and clustering. These algorithms, techniques and methods used to detect patterns in a dataset, have been used in the development of numerous open source and commercially available products and technology for data mining. Data mining is best realized when latent information in a large quantity of data stored is discovered. No one technique solves all data mining problems; challenges are to select algorithms or methods appropriate to strengthen data/text mining and trending within given datasets. In recent years, throughout industry, academia and government agencies, thousands of data systems have been designed and tailored to serve specific engineering and business needs. Many of these systems use databases with relational algebra and structured query language to categorize and retrieve data. In these systems, data analyses are limited and require prior explicit knowledge of metadata and database relations; lacking exploratory data mining and discoveries of latent information. This presentation introduces MatLab(TradeMark)(MATrix LABoratory), an engineering and scientific data analyses tool to perform data mining. MatLab was originally intended to perform purely numerical calculations (a glorified calculator). Now, in addition to having hundreds of mathematical functions, it is a programming language with hundreds built in standard functions and numerous available toolboxes. 
MatLab's ease of data processing and visualization, together with its wide range of built-in functions and toolboxes, makes it suitable for numerical computation and simulation as well as for data mining. Engineers and scientists can take advantage of the readily available functions and toolboxes to gain wider insight in their respective data mining experiments.

  1. A Comparison of different learning models used in Data Mining for Medical Data

    NASA Astrophysics Data System (ADS)

    Srimani, P. K.; Koti, Manjula Sanjay

    2011-12-01

    The present study investigates different data mining learning models on different medical data sets and gives practical guidelines for selecting the most appropriate algorithm for a specific medical data set. In practical situations, it is necessary to decide on appropriate models and parameters for diagnosis and prediction problems. Learning models and algorithms are widely implemented for rule extraction and the prediction of system behavior. In this paper, some well-known machine learning (ML) systems are investigated across different methods and tested on five medical data sets. The practical criteria for evaluating different learning models are presented, and the potential benefits of the proposed methodology for diagnosis and learning are suggested.

  2. Information pricing based on trusted system

    NASA Astrophysics Data System (ADS)

    Liu, Zehua; Zhang, Nan; Han, Hongfeng

    2018-05-01

    Personal information has become a valuable commodity in today's society. Our goal is to develop realistic price points and a pricing system for it. First, we improve the existing BLP system to prevent cascading incidents and design a 7-layer model. From the cost of encryption in each layer, we derive PI price points. In addition, we use association rule mining algorithms to calculate the importance of information in order to optimize the information hierarchies of different attribute types within a multi-level trusted system. Finally, we use a normal distribution model to predict the distribution of encryption levels for users in different classes, and then calculate information prices through a linear programming model based on that distribution.

  3. Weighted Association Rule Mining for Item Groups with Different Properties and Risk Assessment for Networked Systems

    NASA Astrophysics Data System (ADS)

    Kim, Jungja; Ceong, Heetaek; Won, Yonggwan

    In market-basket analysis, weighted association rule (WAR) discovery can mine the rules that include more beneficial information by reflecting item importance for special products. In the point-of-sale database, each transaction is composed of items with similar properties, and item weights are pre-defined and fixed by a factor such as the profit. However, when items are divided into more than one group and the item importance must be measured independently for each group, traditional weighted association rule discovery cannot be used. To solve this problem, we propose a new weighted association rule mining methodology. The items should be first divided into subgroups according to their properties, and the item importance, i.e. item weight, is defined or calculated only with the items included in the subgroup. Then, transaction weight is measured by appropriately summing the item weights from each subgroup, and the weighted support is computed as the fraction of the transaction weights that contains the candidate items relative to the weight of all transactions. As an example, our proposed methodology is applied to assess the vulnerability to threats of computer systems that provide networked services. Our algorithm provides both quantitative risk-level values and qualitative risk rules for the security assessment of networked computer systems using WAR discovery. Also, it can be widely used for new applications with many data sets in which the data items are distinctly separated.
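The weighted support described in this abstract (item weights defined within each subgroup, transaction weights built from them, support as a ratio of transaction weights) can be sketched as below. The abstract does not give the exact formulas, so two choices here are assumptions: item weights are scaled by the largest raw weight in their own group, and a transaction's weight is the sum over groups of the mean weight of that group's items in the transaction. Group names, items, and raw weights are invented for illustration.

```python
def normalize_within_groups(raw_weights, groups):
    """Scale each item's raw weight by the largest raw weight in its own group (assumed scheme)."""
    normalized = {}
    for members in groups.values():
        top = max(raw_weights[i] for i in members)
        for i in members:
            normalized[i] = raw_weights[i] / top
    return normalized


def transaction_weight(transaction, weights, groups):
    """Sum, over groups, of the mean weight of that group's items present in the transaction."""
    total = 0.0
    for members in groups.values():
        present = [weights[i] for i in transaction if i in members]
        if present:
            total += sum(present) / len(present)
    return total


def weighted_support(itemset, transactions, weights, groups):
    """Weight of transactions containing `itemset`, relative to the weight of all transactions."""
    all_weights = [transaction_weight(t, weights, groups) for t in transactions]
    hit = sum(w for t, w in zip(transactions, all_weights) if set(itemset) <= set(t))
    total = sum(all_weights)
    return hit / total if total else 0.0


# Hypothetical groups: network services and threats, each weighted on its own scale.
groups = {"service": {"http", "ftp"}, "threat": {"scan", "overflow"}}
raw = {"http": 4, "ftp": 2, "scan": 1, "overflow": 5}
weights = normalize_within_groups(raw, groups)
log = [{"http", "overflow"}, {"ftp", "scan"}, {"http", "scan"}]
support_http = weighted_support({"http"}, log, weights, groups)
```

Normalizing inside each group keeps a high-valued threat score from being compared directly against a service score measured on a different scale, which is the situation the abstract says plain weighted rule discovery cannot handle.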

  4. Predicting mining activity with parallel genetic algorithms

    USGS Publications Warehouse

    Talaie, S.; Leigh, R.; Louis, S.J.; Raines, G.L.; Beyer, H.G.; O'Reilly, U.M.; Banzhaf, Arnold D.; Blum, W.; Bonabeau, C.; Cantu-Paz, E.W.

    2005-01-01

    We explore several different techniques in our quest to improve the overall model performance of a genetic algorithm calibrated probabilistic cellular automata. We use the Kappa statistic to measure correlation between ground truth data and data predicted by the model. Within the genetic algorithm, we introduce a new evaluation function sensitive to spatial correctness and we explore the idea of evolving different rule parameters for different subregions of the land. We reduce the time required to run a simulation from 6 hours to 10 minutes by parallelizing the code and employing a 10-node cluster. Our empirical results suggest that using the spatially sensitive evaluation function does indeed improve the performance of the model and our preliminary results also show that evolving different rule parameters for different regions tends to improve overall model performance. Copyright 2005 ACM.
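
    As a small illustration of the Kappa statistic mentioned above, the sketch below computes Cohen's kappa between two label sequences. The abstract does not say which Kappa variant was used, so Cohen's kappa is an assumption here, and the cell labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(truth, predicted):
    """Agreement between two label sequences, corrected for chance agreement."""
    n = len(truth)
    observed = sum(t == p for t, p in zip(truth, predicted)) / n
    truth_counts = Counter(truth)
    pred_counts = Counter(predicted)
    # Chance agreement: probability that both sequences pick the same class independently.
    expected = sum(truth_counts[c] * pred_counts[c] for c in truth_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences are constant and identical
    return (observed - expected) / (1.0 - expected)


# Hypothetical cells: 1 = mining activity, 0 = no activity.
ground_truth = [1, 1, 0, 0]
model_output = [1, 0, 0, 0]
kappa = cohen_kappa(ground_truth, model_output)
```

    Here the observed agreement is 0.75 and the chance agreement is 0.5, which gives a kappa of 0.5.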

  5. Mining Distance Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule

    NASA Technical Reports Server (NTRS)

    Bay, Stephen D.; Schwabacher, Mark

    2003-01-01

    Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested loop algorithm that in the worst case is quadratic can give near linear time performance when the data is in random order and a simple pruning rule is used. We test our algorithm on real high-dimensional data sets with millions of examples and show that the near linear scaling holds over several orders of magnitude. Our average case analysis suggests that much of the efficiency is because the time to process non-outliers, which are the majority of examples, does not depend on the size of the data set.
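
    The nested loop with randomization and a pruning rule can be sketched as below. This is a simplified reading of the abstract, not the authors' code: scoring an example by the distance to its k-th nearest neighbor, keeping the top n scores, and the function name `top_outliers` are assumptions. It also assumes the data set holds more than k points.

```python
import heapq
import math
import random


def top_outliers(points, k, n, seed=0):
    """Return the n points with the largest distance to their k-th nearest neighbor."""
    data = list(points)
    random.Random(seed).shuffle(data)  # random order is what makes pruning effective
    cutoff = 0.0                       # score of the weakest outlier kept so far
    top = []                           # min-heap of (score, point)
    for i, x in enumerate(data):
        neighbors = []                 # max-heap (negated) of the k smallest distances seen
        pruned = False
        for j, y in enumerate(data):
            if i == j:
                continue
            d = math.dist(x, y)
            if len(neighbors) < k:
                heapq.heappush(neighbors, -d)
            elif d < -neighbors[0]:
                heapq.heapreplace(neighbors, -d)
            # Pruning rule: the running k-th distance only shrinks, so once it
            # drops below the cutoff this point cannot be a top-n outlier.
            if len(neighbors) == k and -neighbors[0] < cutoff:
                pruned = True
                break
        if pruned:
            continue
        heapq.heappush(top, (-neighbors[0], x))
        if len(top) > n:
            heapq.heappop(top)
        if len(top) == n:
            cutoff = top[0][0]
    return sorted(top, reverse=True)


# Hypothetical 2-D data: four clustered points and one far away.
sample = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
result = top_outliers(sample, k=2, n=1)
```

    Most non-outliers find k close neighbors after scanning only a few examples and are dropped early, which is the intuition behind the near linear behavior reported in the abstract.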

  6. Effective and efficient analysis of spatio-temporal data

    NASA Astrophysics Data System (ADS)

    Zhang, Zhongnan

    Spatio-temporal data mining, i.e., mining knowledge from large amounts of spatio-temporal data, is a highly demanding field because huge amounts of spatio-temporal data have been collected in various applications, ranging from remote sensing to geographical information systems (GIS), computer cartography, environmental assessment and planning, etc. The collected data far exceed the human ability to analyze them, which makes it crucial to develop analysis tools. Recent studies have extended the scope of data mining from relational and transactional datasets to spatial and temporal datasets. Among the various forms of spatio-temporal data, remote sensing images play an important role, due to the growing number of satellites in outer space. In this dissertation, we propose two approaches to analyze remote sensing data. The first applies association rule mining to image processing. Each image was divided into a number of image blocks, and we built a spatial relationship for these blocks during the dividing process. Because the images were shot as a time series, this turned a large number of images into a spatio-temporal dataset. The second discovers co-occurrence patterns in these images. The generated patterns represent subsets of spatial features that are located together in space and time. A weather analysis is composed of individual analyses of several meteorological variables. These variables include temperature, pressure, dew point, wind, clouds, visibility and so on. Local-scale models provide detailed analysis and forecasts of meteorological phenomena ranging from a few kilometers to about 100 kilometers in size. When some of the above meteorological variables show particular trends, some kind of severe weather will follow in most cases. Using association rule discovery, we found that changes in certain meteorological variables are closely related to severe weather that will occur very soon afterwards. This dissertation is composed of three parts: an introduction, background and related work, and my three contributions to the development of approaches for spatio-temporal data mining: the DYSTAL algorithm, the STARSI algorithm, and the COSTCOP+ algorithm.

  7. A guided search genetic algorithm using mined rules for optimal affective product design

    NASA Astrophysics Data System (ADS)

    Fung, Chris K. Y.; Kwong, C. K.; Chan, Kit Yan; Jiang, H.

    2014-08-01

    Affective design is an important aspect of new product development, especially for consumer products, to achieve a competitive edge in the marketplace. It can help companies to develop new products that can better satisfy the emotional needs of customers. However, product designers usually encounter difficulties in determining the optimal settings of the design attributes for affective design. In this article, a novel guided search genetic algorithm (GA) approach is proposed to determine the optimal design attribute settings for affective design. The optimization model formulated based on the proposed approach applied constraints and guided search operators, which were formulated based on mined rules, to guide the GA search and to achieve desirable solutions. A case study on the affective design of mobile phones was conducted to illustrate the proposed approach and validate its effectiveness. Validation tests were conducted, and the results show that the guided search GA approach outperforms the GA approach without the guided search strategy in terms of GA convergence and computational time. In addition, the guided search optimization model is capable of improving GA to generate good solutions for affective design.

  8. Order Batching in Warehouses by Minimizing Total Tardiness: A Hybrid Approach of Weighted Association Rule Mining and Genetic Algorithms

    PubMed Central

    Taheri, Shahrooz; Mat Saman, Muhamad Zameri; Wong, Kuan Yew

    2013-01-01

    One of the cost-intensive issues in managing warehouses is the order picking problem, which deals with the retrieval of items from their storage locations in order to meet customer requests. Many solution approaches have been proposed to minimize traveling distance in the process of order picking. However, in practice, customer orders have to be completed by certain due dates in order to avoid tardiness, which is neglected in most of the related scientific papers. Consequently, we propose a novel four-phase solution approach to minimize tardiness. First, weighted association rule mining is used to calculate associations between orders with respect to their due dates. Next, a batching model based on binary integer programming is formulated to maximize the associations between orders within each batch. Subsequently, in the order picking phase, a Genetic Algorithm integrated with the Traveling Salesman Problem is used to identify the most suitable travel path. Finally, the Genetic Algorithm is applied to sequence the constructed batches in order to minimize tardiness. Illustrative examples and comparisons are presented to demonstrate the proficiency and solution quality of the proposed approach. PMID:23864823

  9. Order batching in warehouses by minimizing total tardiness: a hybrid approach of weighted association rule mining and genetic algorithms.

    PubMed

    Azadnia, Amir Hossein; Taheri, Shahrooz; Ghadimi, Pezhman; Saman, Muhamad Zameri Mat; Wong, Kuan Yew

    2013-01-01

    One of the cost-intensive issues in managing warehouses is the order picking problem, which deals with the retrieval of items from their storage locations in order to meet customer requests. Many solution approaches have been proposed to minimize traveling distance in the process of order picking. However, in practice, customer orders have to be completed by certain due dates in order to avoid tardiness, which is neglected in most of the related scientific papers. Consequently, we propose a novel four-phase solution approach to minimize tardiness. First, weighted association rule mining is used to calculate associations between orders with respect to their due dates. Next, a batching model based on binary integer programming is formulated to maximize the associations between orders within each batch. Subsequently, in the order picking phase, a Genetic Algorithm integrated with the Traveling Salesman Problem is used to identify the most suitable travel path. Finally, the Genetic Algorithm is applied to sequence the constructed batches in order to minimize tardiness. Illustrative examples and comparisons are presented to demonstrate the proficiency and solution quality of the proposed approach.

  10. A Study of Pattern Prediction in the Monitoring Data of Earthen Ruins with the Internet of Things.

    PubMed

    Xiao, Yun; Wang, Xin; Eshragh, Faezeh; Wang, Xuanhong; Chen, Xiaojiang; Fang, Dingyi

    2017-05-11

    An understanding of the changes of the rammed earth temperature of earthen ruins is important for protection of such ruins. To predict the rammed earth temperature pattern using the air temperature pattern of the monitoring data of earthen ruins, a pattern prediction method based on interesting pattern mining and correlation, called PPER, is proposed in this paper. PPER first finds the interesting patterns in the air temperature sequence and the rammed earth temperature sequence. To reduce the processing time, two pruning rules and a new data structure based on an R-tree are also proposed. Correlation rules between the air temperature patterns and the rammed earth temperature patterns are then mined. The correlation rules are merged into predictive rules for the rammed earth temperature pattern. Experiments were conducted to show the accuracy of the presented method and the power of the pruning rules. Moreover, the Ming Dynasty Great Wall dataset was used to examine the algorithm, and six predictive rules from the air temperature to rammed earth temperature based on the interesting patterns were obtained, with the average hit rate reaching 89.8%. The PPER and predictive rules will be useful for rammed earth temperature prediction in protection of earthen ruins.

  11. Research of Litchi Diseases Diagnosis Expertsystem Based on Rbr and Cbr

    NASA Astrophysics Data System (ADS)

    Xu, Bing; Liu, Liqun

    To overcome the bottlenecks of traditional rule-based reasoning disease diagnosis systems, such as low reasoning efficiency and lack of flexibility, this work studies the integration of case-based reasoning (CBR) and rule-based reasoning (RBR) and puts forward a litchi diseases diagnosis expert system (LDDES) with an integrated reasoning method. The method uses data mining and knowledge acquisition technology to establish the knowledge base and case library. It adopts rules to guide retrieval and matching for CBR, and uses association rule and decision tree algorithms to calculate case similarity. The experiment shows that the method can increase the system's flexibility and reasoning ability, and improve the accuracy of litchi disease diagnosis.

  12. Evolving optimised decision rules for intrusion detection using particle swarm paradigm

    NASA Astrophysics Data System (ADS)

    Sivatha Sindhu, Siva S.; Geetha, S.; Kannan, A.

    2012-12-01

    The aim of this article is to construct a practical intrusion detection system (IDS) that properly analyses the statistics of network traffic patterns and classifies them as normal or anomalous. The objective is to prove that the choice of effective network traffic features and a proficient machine-learning paradigm enhances the detection accuracy of an IDS. A rule-based approach with a family of six decision tree classifiers, namely Decision Stump, C4.5, Naive Bayes Tree, Random Forest, Random Tree and Representative Tree model, is introduced to detect anomalous network patterns. In particular, the proposed swarm optimisation-based approach selects the instances that compose the training set, and the optimised decision trees operate over this training set, producing classification rules with improved coverage, classification capability and generalisation ability. Experiments with the Knowledge Discovery and Data mining (KDD) data set, which has information on traffic patterns during normal and intrusive behaviour, show that the proposed algorithm produces optimised decision rules and outperforms other machine-learning algorithms.

  13. Rule-Mining for the Early Prediction of Chronic Kidney Disease Based on Metabolomics and Multi-Source Data

    PubMed Central

    Luck, Margaux; Bertho, Gildas; Bateson, Mathilde; Karras, Alexandre; Yartseva, Anastasia; Thervet, Eric

    2016-01-01

    1H Nuclear Magnetic Resonance (NMR)-based metabolic profiling is very promising for the diagnosis of the stages of chronic kidney disease (CKD). Because of the high dimension of NMR spectra datasets and the complex mixture of metabolites in biological samples, the identification of discriminant biomarkers of a disease is challenging. None of the widely used chemometric methods in NMR metabolomics performs a local exhaustive exploration of the data. We developed a descriptive and easily understandable approach that searches for discriminant local phenomena using an original exhaustive rule-mining algorithm in order to predict two groups of patients: 1) patients having low to mild CKD stages with no renal failure and 2) patients having moderate to established CKD stages with renal failure. Our predictive algorithm explores the m-dimensional variable space to capture the local overdensities of the two groups of patients in the form of easily interpretable rules. Afterwards, an L2-penalized logistic regression on the discriminant rules was used to build predictive models of the CKD stages. We explored a complex multi-source dataset that included the clinical, demographic, clinical chemistry, renal pathology and urine metabolomic data of a cohort of 110 patients. Given this multi-source dataset and the complex nature of metabolomic data, we analyzed 1- and 2-dimensional rules in order to integrate the information carried by the interactions between the variables. The results indicated that our local algorithm is a valuable analytical method for the precise characterization of multivariate CKD stage profiles and is as efficient as the classical global model using chi2 variable selection, with a correct classification rate of approximately 70%.
The resulting predictive models predominantly identify urinary metabolites (such as 3-hydroxyisovalerate, carnitine, citrate, dimethylsulfone, creatinine and N-methylnicotinamide) as relevant variables indicating that CKD significantly affects the urinary metabolome. In addition, the simple knowledge of the concentration of urinary metabolites classifies the CKD stage of the patients correctly. PMID:27861591

  14. Recommendation System Based On Association Rules For Distributed E-Learning Management Systems

    NASA Astrophysics Data System (ADS)

    Mihai, Gabroveanu

    2015-09-01

    Traditional Learning Management Systems are installed on a single server where learning materials and user data are kept. To increase performance, a Learning Management System can be installed on multiple servers, with learning materials and user data distributed across these servers, yielding a Distributed Learning Management System. This paper proposes a prototype of a recommendation system based on association rules for Distributed Learning Management Systems. Information from LMS databases is analyzed using distributed data mining algorithms in order to extract the association rules. The extracted rules are then used as inference rules to provide personalized recommendations. The quality of the recommendations is improved because the rules used to make the inferences are more accurate, since these rules aggregate knowledge from all e-Learning systems included in the Distributed Learning Management System.

  15. Adaptive process control using fuzzy logic and genetic algorithms

    NASA Technical Reports Server (NTRS)

    Karr, C. L.

    1993-01-01

    Researchers at the U.S. Bureau of Mines have developed adaptive process control systems in which genetic algorithms (GA's) are used to augment fuzzy logic controllers (FLC's). GA's are search algorithms that rapidly locate near-optimum solutions to a wide spectrum of problems by modeling the search procedures of natural genetics. FLC's are rule based systems that efficiently manipulate a problem environment by modeling the 'rule-of-thumb' strategy used in human decision making. Together, GA's and FLC's possess the capabilities necessary to produce powerful, efficient, and robust adaptive control systems. To perform efficiently, such control systems require a control element to manipulate the problem environment, and a learning element to adjust to the changes in the problem environment. Details of an overall adaptive control system are discussed. A specific laboratory acid-base pH system is used to demonstrate the ideas presented.

  16. Adaptive Process Control with Fuzzy Logic and Genetic Algorithms

    NASA Technical Reports Server (NTRS)

    Karr, C. L.

    1993-01-01

    Researchers at the U.S. Bureau of Mines have developed adaptive process control systems in which genetic algorithms (GA's) are used to augment fuzzy logic controllers (FLC's). GA's are search algorithms that rapidly locate near-optimum solutions to a wide spectrum of problems by modeling the search procedures of natural genetics. FLC's are rule based systems that efficiently manipulate a problem environment by modeling the 'rule-of-thumb' strategy used in human decision-making. Together, GA's and FLC's possess the capabilities necessary to produce powerful, efficient, and robust adaptive control systems. To perform efficiently, such control systems require a control element to manipulate the problem environment, an analysis element to recognize changes in the problem environment, and a learning element to adjust to the changes in the problem environment. Details of an overall adaptive control system are discussed. A specific laboratory acid-base pH system is used to demonstrate the ideas presented.

  17. Genetic algorithms in adaptive fuzzy control

    NASA Technical Reports Server (NTRS)

    Karr, C. Lucas; Harper, Tony R.

    1992-01-01

    Researchers at the U.S. Bureau of Mines have developed adaptive process control systems in which genetic algorithms (GA's) are used to augment fuzzy logic controllers (FLC's). GA's are search algorithms that rapidly locate near-optimum solutions to a wide spectrum of problems by modeling the search procedures of natural genetics. FLC's are rule based systems that efficiently manipulate a problem environment by modeling the 'rule-of-thumb' strategy used in human decision making. Together, GA's and FLC's possess the capabilities necessary to produce powerful, efficient, and robust adaptive control systems. To perform efficiently, such control systems require a control element to manipulate the problem environment, an analysis element to recognize changes in the problem environment, and a learning element to adjust fuzzy membership functions in response to the changes in the problem environment. Details of an overall adaptive control system are discussed. A specific computer-simulated chemical system is used to demonstrate the ideas presented.
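
    To make the idea of a learning element that adjusts fuzzy membership functions more concrete, here is a minimal sketch. The triangular shape, the pH values, and the Gaussian mutation step are all assumptions chosen for illustration; the abstract does not describe the membership functions or the genetic operators actually used.

```python
import random


def triangular(x, left, peak, right):
    """Degree of membership of x in a triangular fuzzy set (assumes left < peak < right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def mutate(params, sigma, rng):
    """Perturb (left, peak, right) as a GA mutation might, keeping the points ordered."""
    return tuple(sorted(p + rng.gauss(0.0, sigma) for p in params))


# Hypothetical fuzzy set "pH is neutral" and one mutated variant of it.
neutral = (6.0, 7.0, 8.0)
degree = triangular(6.5, *neutral)
variant = mutate(neutral, sigma=0.1, rng=random.Random(0))
```

    In a scheme like this, each candidate solution would encode the corner points of the controller's membership functions, and fitness would come from how well the resulting controller performs.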

  18. E-book recommender system design and implementation based on data mining

    NASA Astrophysics Data System (ADS)

    Wang, Zongjiang

    2011-12-01

    In the information age, with its explosion of knowledge, the problem addressed in this article is how to quickly deliver useful information to the users who are interested in it. Based on data mining, this paper combines an association rule model with a classification model to recommend to target users the e-books that their neighboring users are interested in. It introduces e-book recommendation and its key technologies, the system's implementation algorithms, and the implementation process. Experiments show that this system can help users quickly find the e-books they need.

  19. Managing the Big Data Avalanche in Astronomy - Data Mining the Galaxy Zoo Classification Database

    NASA Astrophysics Data System (ADS)

    Borne, Kirk D.

    2014-01-01

    We will summarize a variety of data mining experiments that have been applied to the Galaxy Zoo database of galaxy classifications, which were provided by volunteer citizen scientists. The goal of these exercises is to learn new and improved classification rules for diverse populations of galaxies, which can then be applied to much larger sky surveys of the future, such as the LSST (Large Synoptic Survey Telescope), which is proposed to obtain detailed photometric data for approximately 20 billion galaxies. The massive Big Data that astronomy projects will generate in the future demand greater application of data mining and data science algorithms, as well as greater training of astronomy students in the skills of data mining and data science. The project described here has involved several graduate and undergraduate research assistants at George Mason University.

  20. Occupancy schedules learning process through a data mining framework

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    D'Oca, Simona; Hong, Tianzhen

    Building occupancy is a paramount factor in building energy simulations. Specifically, lighting, plug loads, HVAC equipment utilization, fresh air requirements and internal heat gain or loss greatly depend on the level of occupancy within a building. Appropriate methodologies to describe and reproduce the intricate network responsible for human-building interactions are needed. Extrapolation of patterns from big data streams is a powerful analysis technique which will allow for a better understanding of energy usage in buildings. A three-step data mining framework is applied to discover occupancy patterns in office spaces. First, a data set of 16 offices with 10-minute interval occupancy data, over a two-year period, is mined through a decision tree model which predicts the occupancy presence. Then a rule induction algorithm is used to learn a pruned set of rules on the results from the decision tree model. Finally, a cluster analysis is employed in order to obtain consistent patterns of occupancy schedules. Furthermore, the identified occupancy rules and schedules are representative as four archetypal working profiles that can be used as input to current building energy modeling programs, such as EnergyPlus or IDA-ICE, to investigate impact of occupant presence on design, operation and energy use in office buildings.

  1. Occupancy schedules learning process through a data mining framework

    DOE PAGES

    D'Oca, Simona; Hong, Tianzhen

    2014-12-17

    Building occupancy is a paramount factor in building energy simulations. Specifically, lighting, plug loads, HVAC equipment utilization, fresh air requirements and internal heat gain or loss greatly depend on the level of occupancy within a building. Appropriate methodologies to describe and reproduce the intricate network responsible for human-building interactions are needed. Extrapolation of patterns from big data streams is a powerful analysis technique which will allow for a better understanding of energy usage in buildings. A three-step data mining framework is applied to discover occupancy patterns in office spaces. First, a data set of 16 offices with 10-minute interval occupancy data, over a two-year period, is mined through a decision tree model which predicts the occupancy presence. Then a rule induction algorithm is used to learn a pruned set of rules on the results from the decision tree model. Finally, a cluster analysis is employed in order to obtain consistent patterns of occupancy schedules. Furthermore, the identified occupancy rules and schedules are representative as four archetypal working profiles that can be used as input to current building energy modeling programs, such as EnergyPlus or IDA-ICE, to investigate impact of occupant presence on design, operation and energy use in office buildings.

  2. Mining association rule based on the diseases population for recommendation of medicine need

    NASA Astrophysics Data System (ADS)

    Harahap, M.; Husein, A. M.; Aisyah, S.; Lubis, F. R.; Wijaya, B. A.

    2018-04-01

    Inappropriate selection of medicines can leave a hospital's medicine stock empty, which affects medical services and economic value in the hospital. An appropriate medicine selection process therefore requires an automated way to select needs based on the development of patients' illnesses. In this study, we analyzed patient prescriptions to identify the relationship between the disease and the medicine used by the physician in treating the patient's illness. The analytical framework includes: (1) patient prescription data collection, (2) applying k-means clustering to classify the top 10 diseases, (3) applying the Apriori algorithm to find association rules based on support, confidence and lift values. In tests on patient prescription datasets from 2015-2016, applying the k-means algorithm to cluster the 10 dominant diseases significantly affected the confidence and support values of all association rules from the Apriori algorithm, making it more consistent in finding association rules between diseases and related medicines. The support, confidence and lift values of diseases and related medicines can be used as recommendations for appropriate medicine selection, so that medicine procurement can better match the disease conditions seen at the hospital.
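
    The three measures named in step (3) can be computed directly from transaction counts. The sketch below uses the standard definitions of support, confidence and lift; the prescription records and the item names `disease_A`, `medicine_X` and so on are hypothetical and are not taken from the study.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    s_rule = support(antecedent | consequent, transactions)
    s_ante = support(antecedent, transactions)
    s_cons = support(consequent, transactions)
    confidence = s_rule / s_ante
    lift = confidence / s_cons
    return s_rule, confidence, lift


# Hypothetical prescriptions, each a set containing a diagnosis and a medicine.
prescriptions = [
    frozenset({"disease_A", "medicine_X"}),
    frozenset({"disease_A", "medicine_X"}),
    frozenset({"disease_A", "medicine_Y"}),
    frozenset({"disease_B", "medicine_Z"}),
]
sup, conf, lift = rule_metrics(frozenset({"disease_A"}),
                               frozenset({"medicine_X"}),
                               prescriptions)
```

    For this toy data the rule disease_A -> medicine_X has support 0.5, confidence 2/3 and lift 4/3. A lift above 1 means the medicine appears with the disease more often than independence would predict.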

  3. Safety rules and regulations on mine sites - the problem and a solution.

    PubMed

    Laurence, David

    2005-01-01

    Many accidents and incidents on mine sites have a causal factor in the rules and regulations that supposedly are in place to prevent the incident from occurring. The causes involve a lack of awareness or understanding, ignorance, or deliberate violations. The issue of mine rules, procedures, and regulations is a central focus of this paper, highlighted by this recent comment - "very few people have accidents for which there is no procedure in place..." An attitudinal survey was conducted at 33 mines throughout NSW, Queensland and international mine sites involving almost 500 mineworkers. The survey was in the form of a self-completing questionnaire, consisting of approximately 65 questions. It aimed to seek the opinions of the mining workforce on safety rules and regulations generally, as well as how they apply to their specific jobs on a mine site. The research also aimed to investigate: (a) the level of awareness and understanding of mine rules and procedures such as manager's rules and safe work procedures (SWPs); (b) the level of awareness and understanding of mine safety regulations and legislation; (c) the extent of communication of and commitment to rules and regulations; (d) the extent of compliance with rules and regulations; and (e) attitudes regarding errors, risk-taking, and accidents and their interaction with rules and regulations. The sample consisted of a random selection of underground and open pit mines, extracting coal, metals, or industrial minerals. The insights provided by the mineworkers enabled a set of principles to be developed to guide mine management and regulators in the development of more effective rules and regulations. CONCLUSIONS AND IMPACT ON THE MINING INDUSTRY: (a) Management and regulators should not continue to produce more and more rules and regulations to cover every aspect of mining. 
(b) Detailed prescriptive regulations, detailed safe work procedures, and voluminous safety management plans will not "connect" with a miner. (c) Achieving more effective rules and regulations is not the only answer to a safer workplace.

  4. Traffic accident in Cuiabá-MT: an analysis through the data mining technology.

    PubMed

    Galvão, Noemi Dreyer; de Fátima Marin, Heimar

    2010-01-01

    Road traffic accidents (ATT) are non-intentional events of important magnitude worldwide, mainly in urban centers. This article aims to analyze data related to the victims of ATT recorded by the Justice Secretariat and Public Security (SEJUSP), together with hospital morbidity and mortality records, in the city of Cuiabá-MT during 2006, using data mining technology. An observational, retrospective and exploratory study of the secondary databases was carried out. The three selected databases were linked using the probabilistic method, through the free software RecLink. One hundred and thirty-nine (139) true pairs of victims of ATT were obtained. Data mining was applied to this linked database with the software WEKA using the Apriori algorithm. The result generated the 10 best rules, six of which were retained according to the established parameters as indicating useful and comprehensible knowledge to characterize the victims of accidents in Cuiabá. Finally, the association rules revealed peculiarities of the road traffic accident victims in Cuiabá and highlight the need for prevention measures for collision accidents involving males.

  5. Knowledge discovery with classification rules in a cardiovascular dataset.

    PubMed

    Podgorelec, Vili; Kokol, Peter; Stiglic, Milojka Molan; Hericko, Marjan; Rozman, Ivan

    2005-12-01

    In this paper we study an evolutionary machine learning approach to data mining and knowledge discovery based on the induction of classification rules. A method for automatic rule induction called AREX, using evolutionary induction of decision trees and automatic programming, is introduced. The proposed algorithm is applied to a cardiovascular dataset consisting of different groups of attributes which may reveal the presence of some specific cardiovascular problems in young patients. A case study is presented that shows the use of AREX for the classification of patients and for discovering possible new medical knowledge from the dataset. The defined knowledge discovery loop comprises a medical expert's assessment of induced rules to drive the evolution of rule sets towards more appropriate solutions. The final result is the discovery of possible new medical knowledge in the field of pediatric cardiology.

  6. An Analysis Pipeline with Statistical and Visualization-Guided Knowledge Discovery for Michigan-Style Learning Classifier Systems

    PubMed Central

    Urbanowicz, Ryan J.; Granizo-Mackenzie, Ambrose; Moore, Jason H.

    2014-01-01

    Michigan-style learning classifier systems (M-LCSs) represent an adaptive and powerful class of evolutionary algorithms which distribute the learned solution over a sizable population of rules. However their application to complex real world data mining problems, such as genetic association studies, has been limited. Traditional knowledge discovery strategies for M-LCS rule populations involve sorting and manual rule inspection. While this approach may be sufficient for simpler problems, the confounding influence of noise and the need to discriminate between predictive and non-predictive attributes calls for additional strategies. Additionally, tests of significance must be adapted to M-LCS analyses in order to make them a viable option within fields that require such analyses to assess confidence. In this work we introduce an M-LCS analysis pipeline that combines uniquely applied visualizations with objective statistical evaluation for the identification of predictive attributes, and reliable rule generalizations in noisy single-step data mining problems. This work considers an alternative paradigm for knowledge discovery in M-LCSs, shifting the focus from individual rules to a global, population-wide perspective. We demonstrate the efficacy of this pipeline applied to the identification of epistasis (i.e., attribute interaction) and heterogeneity in noisy simulated genetic association data. PMID:25431544

  7. A fuzzy classifier system for process control

    NASA Technical Reports Server (NTRS)

    Karr, C. L.; Phillips, J. C.

    1994-01-01

    A fuzzy classifier system that discovers rules for controlling a mathematical model of a pH titration system was developed by researchers at the U.S. Bureau of Mines (USBM). Fuzzy classifier systems successfully combine the strengths of learning classifier systems and fuzzy logic controllers. Learning classifier systems resemble familiar production rule-based systems, but they represent their IF-THEN rules by strings of characters rather than in the traditional linguistic terms. Fuzzy logic is a tool that allows for the incorporation of abstract concepts into rule-based systems, thereby allowing the rules to resemble the familiar 'rules-of-thumb' commonly used by humans when solving difficult process control and reasoning problems. Like learning classifier systems, fuzzy classifier systems employ a genetic algorithm to explore and sample new rules for manipulating the problem environment. Like fuzzy logic controllers, fuzzy classifier systems encapsulate knowledge in the form of production rules. The results presented in this paper demonstrate the ability of fuzzy classifier systems to generate a fuzzy logic-based process control system.

  8. Classification Based on Pruning and Double Covered Rule Sets for the Internet of Things Applications

    PubMed Central

    Zhou, Zhongmei; Wang, Weiping

    2014-01-01

    The Internet of things (IOT) has been a hot topic in recent years. IOT users generate large amounts of data, and mining useful knowledge from it is a great challenge. Classification is an effective strategy that can predict the needs of IOT users. However, many traditional rule-based classifiers cannot guarantee that every instance is covered by at least two classification rules, so these algorithms cannot achieve high accuracy on some datasets. In this paper, we propose a new rule-based classifier, CDCR-P (Classification based on the Pruning and Double Covered Rule sets). CDCR-P induces two different rule sets, A and B. Every instance in the training set is covered by at least one rule in rule set A and by at least one rule in rule set B. To improve the quality of rule set B, we prune the length of its rules. Our experimental results indicate that CDCR-P is feasible and can achieve high accuracy. PMID:24511304

  9. Classification based on pruning and double covered rule sets for the internet of things applications.

    PubMed

    Li, Shasha; Zhou, Zhongmei; Wang, Weiping

    2014-01-01

    The Internet of things (IOT) has been a hot topic in recent years. IOT users generate large amounts of data, and mining useful knowledge from it is a great challenge. Classification is an effective strategy that can predict the needs of IOT users. However, many traditional rule-based classifiers cannot guarantee that every instance is covered by at least two classification rules, so these algorithms cannot achieve high accuracy on some datasets. In this paper, we propose a new rule-based classifier, CDCR-P (Classification based on the Pruning and Double Covered Rule sets). CDCR-P induces two different rule sets, A and B. Every instance in the training set is covered by at least one rule in rule set A and by at least one rule in rule set B. To improve the quality of rule set B, we prune the length of its rules. Our experimental results indicate that CDCR-P is feasible and can achieve high accuracy.

  10. Generic framework for mining cellular automata models on protein-folding simulations.

    PubMed

    Diaz, N; Tischer, I

    2016-05-13

    Cellular automata model identification is an important way of building simplified simulation models. In this study, we describe a generic architectural framework to ease the development process of new metaheuristic-based algorithms for cellular automata model identification in protein-folding trajectories. Our framework was developed by a methodology based on design patterns that allow an improved experience for new algorithms development. The usefulness of the proposed framework is demonstrated by the implementation of four algorithms, able to obtain extremely precise cellular automata models of the protein-folding process with a protein contact map representation. Dynamic rules obtained by the proposed approach are discussed, and future use for the new tool is outlined.

  11. 76 FR 63238 - Proximity Detection Systems for Continuous Mining Machines in Underground Coal Mines

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-10-12

    ... Detection Systems for Continuous Mining Machines in Underground Coal Mines AGENCY: Mine Safety and Health... Agency's proposed rule addressing Proximity Detection Systems for Continuous Mining Machines in... proposed rule for Proximity Detection Systems on Continuous Mining Machines in Underground Coal Mines. Due...

  12. CARSVM: a class association rule-based classification framework and its application to gene expression data.

    PubMed

    Kianmehr, Keivan; Alhajj, Reda

    2008-09-01

    In this study, we aim at building a classification framework, namely the CARSVM model, which integrates association rule mining and support vector machine (SVM). The goal is to benefit from advantages of both, the discriminative knowledge represented by class association rules and the classification power of the SVM algorithm, to construct an efficient and accurate classifier model that improves the interpretability problem of SVM as a traditional machine learning technique and overcomes the efficiency issues of associative classification algorithms. In our proposed framework: instead of using the original training set, a set of rule-based feature vectors, which are generated based on the discriminative ability of class association rules over the training samples, are presented to the learning component of the SVM algorithm. We show that rule-based feature vectors present a high-qualified source of discrimination knowledge that can impact substantially the prediction power of SVM and associative classification techniques. They provide users with more conveniences in terms of understandability and interpretability as well. We have used four datasets from UCI ML repository to evaluate the performance of the developed system in comparison with five well-known existing classification methods. Because of the importance and popularity of gene expression analysis as real world application of the classification model, we present an extension of CARSVM combined with feature selection to be applied to gene expression data. Then, we describe how this combination will provide biologists with an efficient and understandable classifier model. The reported test results and their biological interpretation demonstrate the applicability, efficiency and effectiveness of the proposed model. 
    From the results, it can be concluded that a considerable increase in classification accuracy can be obtained when the rule-based feature vectors are integrated in the learning process of the SVM algorithm. In the context of applicability, according to the results obtained from gene expression analysis, we can conclude that the CARSVM system can be utilized in a variety of real world applications with some adjustments.

  13. Performance of Case-Based Reasoning Retrieval Using Classification Based on Associations versus Jcolibri and FreeCBR: A Further Validation Study

    NASA Astrophysics Data System (ADS)

    Aljuboori, Ahmed S.; Coenen, Frans; Nsaif, Mohammed; Parsons, David J.

    2018-05-01

    Case-Based Reasoning (CBR) plays a major role in expert system research. However, a critical problem can be met when a CBR system retrieves incorrect cases. Class Association Rules (CARs) have been utilized to offer a potential solution in a previous work. The aim of this paper was to perform further validation of Case-Based Reasoning using a Classification based on Association Rules (CBRAR) to enhance the performance of Similarity Based Retrieval (SBR). The CBRAR strategy uses a classed frequent pattern tree algorithm (FP-CAR) in order to disambiguate wrongly retrieved cases in CBR. The research reported in this paper makes contributions to both fields of CBR and Association Rules Mining (ARM) in that full target cases can be extracted from the FP-CAR algorithm without invoking P-trees and union operations. The dataset used in this paper provided more efficient results when the SBR retrieves unrelated answers. The accuracy of the proposed CBRAR system outperforms the results obtained by existing CBR tools such as Jcolibri and FreeCBR.

  14. A Hybrid Data Mining Approach for Credit Card Usage Behavior Analysis

    NASA Astrophysics Data System (ADS)

    Tsai, Chieh-Yuan

    Credit cards are one of the most popular e-payment approaches in current online e-commerce. To retain valuable customers, card issuers invest a lot of money in maintaining good relationships with them. Although several efforts have been made to study card usage motivation, few studies focus on analyzing how credit card usage behavior changes from time period t to t+1. To address this issue, an integrated data mining approach is proposed in this paper. First, customer profiles and their transaction data at time period t are retrieved from databases. Second, a LabelSOM neural network groups customers into segments and identifies critical characteristics for each group. Third, a fuzzy decision tree algorithm is used to construct usage behavior rules for the customer groups of interest. Finally, these rules are used to analyze the behavior changes between time periods t and t+1. An implementation case using a practical credit card database provided by a commercial bank in Taiwan is presented to show the benefits of the proposed framework.

  15. A rough set-based association rule approach implemented on a brand trust evaluation model

    NASA Astrophysics Data System (ADS)

    Liao, Shu-Hsien; Chen, Yin-Ju

    2017-09-01

    In commerce, businesses use branding to differentiate their product and service offerings from those of their competitors. The brand incorporates a set of product or service features that are associated with that particular brand name and identifies the product/service segmentation in the market. This study proposes a new data mining approach, a rough set-based association rule induction, implemented on a brand trust evaluation model. In addition, it offers one way to deal with data uncertainty when analysing ratio scale data, while creating predictive if-then rules that generalise data values to the retail region. As such, this study uses the analysis of algorithms to find alcoholic beverages brand trust recall. Finally, discussions and conclusions are presented for further managerial implications.

  16. Association Rule Analysis for Tour Route Recommendation and Application to Wctsnop

    NASA Astrophysics Data System (ADS)

    Fang, H.; Chen, C.; Lin, J.; Liu, X.; Fang, D.

    2017-09-01

    The growing number of E-tourism systems provides intelligent tour recommendations for tourists. In this sense, a recommender system can make personalized suggestions and provide satisfactory information associated with their tour cycle. Data mining is a suitable tool for extracting potential information from large databases to support strategic decisions. In this study, association rule analysis based on the FP-growth algorithm is applied to find the association relationships among scenic spots in different cities for tour route recommendation. To identify valuable rules, the Kulczynski interestingness measure is adopted and the imbalance ratio is computed. The proposed scheme was evaluated on the Wangluzhe cultural tourism service network operation platform (WCTSNOP), where it was shown to recommend tour routes quickly and to rapidly enhance recommendation quality.
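
    The two measures named in this abstract can be illustrated with a minimal sketch. It uses the textbook definitions of the Kulczynski measure and the imbalance ratio; the function names and the itinerary counts are hypothetical and are not taken from the paper.

```python
def kulczynski(sup_a, sup_b, sup_ab):
    """Kulczynski measure: the average of P(B|A) and P(A|B)."""
    return 0.5 * (sup_ab / sup_a + sup_ab / sup_b)


def imbalance_ratio(sup_a, sup_b, sup_ab):
    """Imbalance ratio: |sup(A) - sup(B)| / (sup(A) + sup(B) - sup(A and B))."""
    return abs(sup_a - sup_b) / (sup_a + sup_b - sup_ab)


# Hypothetical counts: scenic spot A appears in 1000 itineraries,
# spot B in 100, and both appear together in 90.
kulc = kulczynski(1000, 100, 90)
ir = imbalance_ratio(1000, 100, 90)
print(f"Kulc = {kulc:.3f}, IR = {ir:.3f}")
```

    A Kulczynski value near 0.5 combined with a high imbalance ratio, as in this made-up case, suggests that the association is strong in one direction only.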

  17. Implementation of hospital examination reservation system using data mining technique.

    PubMed

    Cha, Hyo Soung; Yoon, Tae Sik; Ryu, Ki Chung; Shin, Il Won; Choe, Yang Hyo; Lee, Kyoung Yong; Lee, Jae Dong; Ryu, Keun Ho; Chung, Seung Hyun

    2015-04-01

    New methods for obtaining appropriate information for users have been attempted with the development of information technology and the Internet. Among such methods, the demand for systems and services that can improve patient satisfaction has increased in hospital care environments. In this paper, we proposed the Hospital Exam Reservation System (HERS), which uses the data mining method. First, we focused on carrying clinical exam data and finding the optimal schedule for generating rules using the multi-examination pattern-mining algorithm. Then, HERS was applied by a rule master and recommending system with an exam log. Finally, HERS was designed as a user-friendly interface. HERS has been applied at the National Cancer Center in Korea since June 2014. As the number of scheduled exams increased, the time required to schedule more than a single condition decreased (from 398.67% to 168.67% and from 448.49% to 188.49%; p < 0.0001). As the number of tests increased, the difference between HERS and non-HERS increased (from 0.18 days to 0.81 days). It was possible to expand the efficiency of HERS studies using mining technology in not only exam reservations, but also the medical environment. The proposed system based on doctor prescription removes exams that were not executed in order to improve recommendation accuracy. In addition, we expect HERS to become an effective system in various medical environments.

  18. 26 CFR 1.611-2 - Rules applicable to mines, oil and gas wells, and other natural deposits.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 26 Internal Revenue 7 2013-04-01 2013-04-01 false Rules applicable to mines, oil and gas wells....611-2 Rules applicable to mines, oil and gas wells, and other natural deposits. (a) Computation of cost depletion of mines, oil and gas wells, and other natural deposits. (1) The basis upon which cost...

  19. 26 CFR 1.611-2 - Rules applicable to mines, oil and gas wells, and other natural deposits.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 26 Internal Revenue 7 2012-04-01 2012-04-01 false Rules applicable to mines, oil and gas wells....611-2 Rules applicable to mines, oil and gas wells, and other natural deposits. (a) Computation of cost depletion of mines, oil and gas wells, and other natural deposits. (1) The basis upon which cost...

  20. Runtime support for parallelizing data mining algorithms

    NASA Astrophysics Data System (ADS)

    Jin, Ruoming; Agrawal, Gagan

    2002-03-01

    With recent technological advances, shared memory parallel machines have become more scalable, and offer large main memories and high bus bandwidths. They are emerging as good platforms for data warehousing and data mining. In this paper, we focus on shared memory parallelization of data mining algorithms. We have developed a series of techniques for parallelization of data mining algorithms, including full replication, full locking, fixed locking, optimized full locking, and cache-sensitive locking. Unlike previous work on shared memory parallelization of specific data mining algorithms, all of our techniques apply to a large number of common data mining algorithms. In addition, we propose a reduction-object based interface for specifying a data mining algorithm. We show how our runtime system can apply any of the techniques we have developed starting from a common specification of the algorithm.

  1. CARIBIAM: constrained Association Rules using Interactive Biological IncrementAl Mining.

    PubMed

    Rahal, Imad; Rahhal, Riad; Wang, Baoying; Perrizo, William

    2008-01-01

    This paper analyses annotated genome data by applying a very central data-mining technique known as Association Rule Mining (ARM) with the aim of discovering rules and hypotheses capable of yielding deeper insights into this type of data. In the literature, ARM has been noted for producing an overwhelming number of rules. This work proposes a new technique capable of using domain knowledge in the form of queries in order to efficiently mine only the subset of the associations that are of interest to investigators in an incremental and interactive manner.

  2. The application of remote sensing image sea ice monitoring method in Bohai Bay based on C4.5 decision tree algorithm

    NASA Astrophysics Data System (ADS)

    Ye, Wei; Song, Wei

    2018-02-01

    In this paper, the problem of remote sensing monitoring of sea ice was turned into a classification problem in data mining. Based on statistics of the related band data of HJ1B remote sensing images, the main bands of HJ1B images related to the reflectance of seawater and sea ice were found. On this basis, decision tree rules for sea ice monitoring were constructed from the related bands found above, and the rules were then applied to the Liaodong Bay area, which is seriously covered by sea ice, for sea ice monitoring. The result proved that the method is effective.

  3. A Collaborative Educational Association Rule Mining Tool

    ERIC Educational Resources Information Center

    Garcia, Enrique; Romero, Cristobal; Ventura, Sebastian; de Castro, Carlos

    2011-01-01

    This paper describes a collaborative educational data mining tool based on association rule mining for the ongoing improvement of e-learning courses and allowing teachers with similar course profiles to share and score the discovered information. The mining tool is oriented to be used by non-expert instructors in data mining so its internal…

  4. Efficient hiding of confidential high-utility itemsets with minimal side effects

    NASA Astrophysics Data System (ADS)

    Lin, Jerry Chun-Wei; Hong, Tzung-Pei; Fournier-Viger, Philippe; Liu, Qiankun; Wong, Jia-Wei; Zhan, Justin

    2017-11-01

    Privacy preserving data mining (PPDM) is an emerging research problem that has become critical in the last decades. PPDM consists of hiding sensitive information to ensure that it cannot be discovered by data mining algorithms. Several PPDM algorithms have been developed. Most of them are designed for hiding sensitive frequent itemsets or association rules. Hiding sensitive information in a database can have several side effects such as hiding other non-sensitive information and introducing redundant information. Finding the set of itemsets or transactions to be sanitised that minimises side effects is an NP-hard problem. In this paper, a genetic algorithm (GA) using transaction deletion is designed to hide sensitive high-utility itemsets for PPUM. A flexible fitness function with three adjustable weights is used to evaluate the goodness of each chromosome for hiding sensitive high-utility itemsets. To speed up the evolution process, the pre-large concept is adopted in the designed algorithm. It reduces the number of database scans required for verifying the goodness of an evaluated chromosome. Substantial experiments are conducted to compare the performance of the designed GA approach (with/without the pre-large concept), with a GA-based approach relying on transaction insertion and a non-evolutionary algorithm, in terms of execution time, side effects, database integrity and utility integrity. Results demonstrate that the proposed algorithm hides sensitive high-utility itemsets with fewer side effects than previous studies, while preserving high database and utility integrity.

  5. MINEs: Open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics

    DOE PAGES

    Jeffryes, James G.; Colastani, Ricardo L.; Elbadawi-Sidhu, Mona; ...

    2015-08-28

    Metabolomics have proven difficult to execute in an untargeted and generalizable manner. Liquid chromatography–mass spectrometry (LC–MS) has made it possible to gather data on thousands of cellular metabolites. However, matching metabolites to their spectral features continues to be a bottleneck, meaning that much of the collected information remains uninterpreted and that new metabolites are seldom discovered in untargeted studies. These challenges require new approaches that consider compounds beyond those available in curated biochemistry databases. Here we present Metabolic In silico Network Expansions (MINEs), an extension of known metabolite databases to include molecules that have not been observed, but are likely to occur based on known metabolites and common biochemical reactions. We utilize an algorithm called the Biochemical Network Integrated Computational Explorer (BNICE) and expert-curated reaction rules based on the Enzyme Commission classification system to propose the novel chemical structures and reactions that comprise MINE databases. Starting from the Kyoto Encyclopedia of Genes and Genomes (KEGG) COMPOUND database, the MINE contains over 571,000 compounds, of which 93% are not present in the PubChem database. However, these MINE compounds have on average higher structural similarity to natural products than compounds from KEGG or PubChem. MINE databases were able to propose annotations for 98.6% of a set of 667 MassBank spectra, 14% more than KEGG alone and equivalent to PubChem while returning far fewer candidates per spectra than PubChem (46 vs. 1715 median candidates). Application of MINEs to LC–MS accurate mass data enabled the identity of an unknown peak to be confidently predicted. MINE databases are freely accessible for non-commercial use via user-friendly web-tools at http://minedatabase.mcs.anl.gov and developer-friendly APIs. MINEs improve metabolomics peak identification as compared to general chemical databases whose results include irrelevant synthetic compounds. MINEs complement and expand on previous in silico generated compound databases that focus on human metabolism. We are actively developing the database; future versions of this resource will incorporate transformation rules for spontaneous chemical reactions and more advanced filtering and prioritization of candidate structures.

  6. Handling Data Skew in MapReduce Cluster by Using Partition Tuning

    PubMed

    Gao, Yufei; Zhou, Yanjie; Zhou, Bing; Shi, Lei; Zhang, Jiacai

    2017-01-01

    The healthcare industry has generated large amounts of data, and analyzing these has emerged as an important problem in recent years. The MapReduce programming model has been successfully used for big data analytics. However, data skew invariably occurs in big data analytics and seriously affects efficiency. To overcome the data skew problem in MapReduce, we have in the past proposed a data processing algorithm called Partition Tuning-based Skew Handling (PTSH). In comparison with the one-stage partitioning strategy used in the traditional MapReduce model, PTSH uses a two-stage strategy and the partition tuning method to disperse key-value pairs in virtual partitions and recombines each partition in case of data skew. The robustness and efficiency of the proposed algorithm were tested on a wide variety of simulated datasets and real healthcare datasets. The results showed that PTSH algorithm can handle data skew in MapReduce efficiently and improve the performance of MapReduce jobs in comparison with the native Hadoop, Closer, and locality-aware and fairness-aware key partitioning (LEEN). We also found that the time needed for rule extraction can be reduced significantly by adopting the PTSH algorithm, since it is more suitable for association rule mining (ARM) on healthcare data. © 2017 Yufei Gao et al.
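
    The two-stage idea described in this abstract can be sketched roughly as follows. This is only one plausible reading, not the PTSH algorithm itself: keys are first hashed into many small virtual partitions, and those partitions are then recombined into reducers. The greedy largest-first recombination, the CRC32 hash, and all function names are assumptions made for illustration.

```python
import heapq
import zlib
from collections import Counter


def virtual_partition_loads(keys, num_virtual):
    """Stage 1: hash every key into one of many virtual partitions and count records."""
    loads = Counter()
    for key in keys:
        # CRC32 is used here only because it is deterministic across runs.
        loads[zlib.crc32(key.encode("utf-8")) % num_virtual] += 1
    return loads


def combine_partitions(loads, num_reducers):
    """Stage 2: assign virtual partitions to reducers, largest first, least-loaded reducer first."""
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer id)
    heapq.heapify(heap)
    assignment = {}
    for vp, load in sorted(loads.items(), key=lambda kv: -kv[1]):
        total, reducer = heapq.heappop(heap)
        assignment[vp] = reducer
        heapq.heappush(heap, (total + load, reducer))
    return assignment


# Hypothetical skewed loads for four virtual partitions, recombined into two reducers.
example_loads = {0: 10, 1: 1, 2: 1, 3: 8}
example_assignment = combine_partitions(example_loads, 2)
reducer_totals = Counter()
for vp, reducer in example_assignment.items():
    reducer_totals[reducer] += example_loads[vp]
print(dict(reducer_totals))
```

    Because every key maps to exactly one virtual partition, all records for a key still reach the same reducer after recombination, which a MapReduce job requires.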

  7. Handling Data Skew in MapReduce Cluster by Using Partition Tuning.

    PubMed

    Gao, Yufei; Zhou, Yanjie; Zhou, Bing; Shi, Lei; Zhang, Jiacai

    2017-01-01

    The healthcare industry has generated large amounts of data, and analyzing these has emerged as an important problem in recent years. The MapReduce programming model has been successfully used for big data analytics. However, data skew invariably occurs in big data analytics and seriously affects efficiency. To overcome the data skew problem in MapReduce, we have in the past proposed a data processing algorithm called Partition Tuning-based Skew Handling (PTSH). In comparison with the one-stage partitioning strategy used in the traditional MapReduce model, PTSH uses a two-stage strategy and the partition tuning method to disperse key-value pairs in virtual partitions and recombines each partition in case of data skew. The robustness and efficiency of the proposed algorithm were tested on a wide variety of simulated datasets and real healthcare datasets. The results showed that PTSH algorithm can handle data skew in MapReduce efficiently and improve the performance of MapReduce jobs in comparison with the native Hadoop, Closer, and locality-aware and fairness-aware key partitioning (LEEN). We also found that the time needed for rule extraction can be reduced significantly by adopting the PTSH algorithm, since it is more suitable for association rule mining (ARM) on healthcare data.

  8. Handling Data Skew in MapReduce Cluster by Using Partition Tuning

    PubMed Central

    Zhou, Yanjie; Zhou, Bing; Shi, Lei

    2017-01-01

    The healthcare industry has generated large amounts of data, and analyzing these has emerged as an important problem in recent years. The MapReduce programming model has been successfully used for big data analytics. However, data skew invariably occurs in big data analytics and seriously affects efficiency. To overcome the data skew problem in MapReduce, we have in the past proposed a data processing algorithm called Partition Tuning-based Skew Handling (PTSH). In comparison with the one-stage partitioning strategy used in the traditional MapReduce model, PTSH uses a two-stage strategy and the partition tuning method to disperse key-value pairs in virtual partitions and recombines each partition in case of data skew. The robustness and efficiency of the proposed algorithm were tested on a wide variety of simulated datasets and real healthcare datasets. The results showed that PTSH algorithm can handle data skew in MapReduce efficiently and improve the performance of MapReduce jobs in comparison with the native Hadoop, Closer, and locality-aware and fairness-aware key partitioning (LEEN). We also found that the time needed for rule extraction can be reduced significantly by adopting the PTSH algorithm, since it is more suitable for association rule mining (ARM) on healthcare data. PMID:29065568

  9. Association mining of mutated cancer genes in different clinical stages across 11 cancer types.

    PubMed

    Hu, Wangxiong; Li, Xiaofen; Wang, Tingzhang; Zheng, Shu

    2016-10-18

    Many studies have demonstrated that some genes (e.g. APC, BRAF, KRAS, PTEN, TP53) are frequently mutated in cancer; however, the underlying mechanism that contributes to their high mutation frequency remains unclear. Here we used the Apriori algorithm to find the frequent mutational gene sets (FMGSs) from 4,904 tumors across 11 cancer types as part of the TCGA Pan-Cancer effort and then mined the hidden association rules (ARs) within these FMGSs. Intriguingly, we found that well-known cancer driver genes such as BRAF, KRAS, PTEN, and TP53 often co-occurred with other driver genes and that FMGS size peaked at an itemset size of 3~4 genes. Besides, the number and constitution of FMGSs and ARs differed greatly among different cancers and stages. In addition, FMGSs and ARs were rare in endocrine-related cancers such as breast carcinoma, ovarian cystadenocarcinoma, and thyroid carcinoma, but abundant in cancers in direct contact with external environments such as skin melanoma and stomach adenocarcinoma. Furthermore, we observed more rules in stage IV than in other stages, indicating that distant metastasis needed a more sophisticated gene regulatory network.
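
    The workflow in this abstract (frequent itemsets first, association rules second) can be sketched with a small, unoptimized Apriori implementation. The toy tumor list, the thresholds, and the function names are hypothetical; this is an illustration of the general technique, not the authors' pipeline.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset whose count is >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules meeting min_confidence."""
    found = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((lhs, itemset - lhs, confidence))
    return found


# Hypothetical data: each tumor is the set of genes mutated in it.
tumors = [
    {"TP53", "KRAS"},
    {"TP53", "KRAS", "PTEN"},
    {"TP53", "BRAF"},
    {"KRAS", "PTEN"},
    {"TP53", "KRAS"},
]
gene_sets = apriori(tumors, min_support=3)
gene_rules = association_rules(gene_sets, min_confidence=0.7)
for lhs, rhs, confidence in gene_rules:
    print(sorted(lhs), "->", sorted(rhs), round(confidence, 2))
```

    The rule step can look up `frequent[lhs]` directly because every subset of a frequent itemset is itself frequent, which is the same property the prune step relies on.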

  10. Analysis of mesenchymal stem cell differentiation in vitro using classification association rule mining.

    PubMed

    Wang, Weiqi; Wang, Yanbo Justin; Bañares-Alcántara, René; Coenen, Frans; Cui, Zhanfeng

    2009-12-01

    In this paper, data mining is used to analyze the data on the differentiation of mammalian Mesenchymal Stem Cells (MSCs), aiming at discovering known and hidden rules governing MSC differentiation, following the establishment of a web-based public database containing experimental data on the MSC proliferation and differentiation. To this effect, a web-based public interactive database comprising the key parameters which influence the fate and destiny of mammalian MSCs has been constructed and analyzed using Classification Association Rule Mining (CARM) as a data-mining technique. The results show that the proposed approach is technically feasible and performs well with respect to the accuracy of (classification) prediction. Key rules mined from the constructed MSC database are consistent with experimental observations, indicating the validity of the method developed and the first step in the application of data mining to the study of MSCs.

  11. Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks

    NASA Astrophysics Data System (ADS)

    Esfandiar, Pooya; Bonchi, Francesco; Gleich, David F.; Greif, Chen; Lakshmanan, Laks V. S.; On, Byung-Won

    Motivated by social network data mining problems such as link prediction and collaborative filtering, significant research effort has been devoted to computing topological measures including the Katz score and the commute time. Existing approaches typically approximate all pairwise relationships simultaneously. In this paper, we are interested in computing: the score for a single pair of nodes, and the top-k nodes with the best scores from a given source node. For the pairwise problem, we apply an iterative algorithm that computes upper and lower bounds for the measures we seek. This algorithm exploits a relationship between the Lanczos process and a quadrature rule. For the top-k problem, we propose an algorithm that only accesses a small portion of the graph and is related to techniques used in personalized PageRank computing. To test the scalability and accuracy of our algorithms we experiment with three real-world networks and find that these algorithms run in milliseconds to seconds without any preprocessing.
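
    For orientation, the Katz score between nodes i and j is the sum over walk lengths l >= 1 of alpha^l times the number of length-l walks from i to j. The sketch below approximates a single pairwise score with a truncated power series. This is a simpler, swapped-in technique, not the Lanczos and quadrature based bounds the abstract describes; the function name, the toy graph, and the parameter values are assumptions.

```python
def katz_pair(adj, i, j, alpha, max_len=50):
    """Approximate sum_{l=1..max_len} alpha**l * (A**l)[i][j] with matrix-vector products."""
    n = len(adj)
    vec = [0.0] * n
    vec[j] = 1.0  # start from the unit vector e_j
    score = 0.0
    weight = 1.0
    for _ in range(max_len):
        # vec <- A @ vec, so vec[i] holds the number of walks of the current length from i to j.
        vec = [sum(adj[r][c] * vec[c] for c in range(n)) for r in range(n)]
        weight *= alpha
        score += weight * vec[i]
    return score


# Hypothetical three-node path graph 0 - 1 - 2.
path_graph = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
score_0_2 = katz_pair(path_graph, 0, 2, alpha=0.1)
print(score_0_2)
```

    The series converges only when alpha is smaller than the reciprocal of the largest eigenvalue of the adjacency matrix; for this path graph that eigenvalue is the square root of 2, so alpha = 0.1 is safely inside the range.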

  12. Fast katz and commuters : efficient estimation of social relatedness in large networks.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    On, Byung-Won; Lakshmanan, Laks V. S.; Greif, Chen

    Motivated by social network data mining problems such as link prediction and collaborative filtering, significant research effort has been devoted to computing topological measures including the Katz score and the commute time. Existing approaches typically approximate all pairwise relationships simultaneously. In this paper, we are interested in computing: the score for a single pair of nodes, and the top-k nodes with the best scores from a given source node. For the pairwise problem, we apply an iterative algorithm that computes upper and lower bounds for the measures we seek. This algorithm exploits a relationship between the Lanczos process and a quadrature rule. For the top-k problem, we propose an algorithm that only accesses a small portion of the graph and is related to techniques used in personalized PageRank computing. To test the scalability and accuracy of our algorithms we experiment with three real-world networks and find that these algorithms run in milliseconds to seconds without any preprocessing.

  13. 76 FR 2617 - Lowering Miners' Exposure to Respirable Coal Mine Dust, Including Continuous Personal Dust Monitors

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-01-14

    ... 1219-AB64 Lowering Miners' Exposure to Respirable Coal Mine Dust, Including Continuous Personal Dust... comment period on the proposed rule addressing Lowering Miners' Exposure to Respirable Coal Mine Dust...), MSHA published a proposed rule, Lowering Miners' Exposure to Respirable Coal Mine Dust, Including...

  14. Analysis of Occupational Accidents in Underground and Surface Mining in Spain Using Data-Mining Techniques.

    PubMed

    Sanmiquel, Lluís; Bascompta, Marc; Rossell, Josep M; Anticoi, Hernán Francisco; Guash, Eduard

    2018-03-07

    An analysis of occupational accidents in the mining sector was conducted using the data from the Spanish Ministry of Employment and Social Safety between 2005 and 2015, and data-mining techniques were applied. Data was processed with the software Weka. Two scenarios were chosen from the accidents database: surface and underground mining. The most important variables involved in occupational accidents and their association rules were determined. These rules are composed of several predictor variables that cause accidents, defining its characteristics and context. This study exposes the 20 most important association rules in the sector, for either surface or underground mining, based on the statistical confidence levels of each rule as obtained by Weka. The outcomes display the most typical immediate causes, along with the percentage of accidents with a basis in each association rule. The most important immediate cause is body movement with physical effort or overexertion, and the type of accident is physical effort or overexertion. On the other hand, the second most important immediate cause and type of accident are different between the two scenarios. Data-mining techniques were chosen as a useful tool to find out the root cause of the accidents.

  15. Use HypE to Hide Association Rules by Adding Items

    PubMed Central

    Cheng, Peng; Lin, Chun-Wei; Pan, Jeng-Shyang

    2015-01-01

    During business collaboration, partners may benefit through sharing data. People may use data mining tools to discover useful relationships from shared data. However, some relationships are sensitive to the data owners and they hope to conceal them before sharing. In this paper, we address this problem in forms of association rule hiding. A hiding method based on evolutionary multi-objective optimization (EMO) is proposed, which performs the hiding task by selectively inserting items into the database to decrease the confidence of sensitive rules below specified thresholds. The side effects generated during the hiding process are taken as optimization goals to be minimized. HypE, a recently proposed EMO algorithm, is utilized to identify promising transactions for modification to minimize side effects. Results on real datasets demonstrate that the proposed method can effectively perform sanitization with fewer damages to the non-sensitive knowledge in most cases. PMID:26070130
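
    The effect this abstract relies on, that inserting items can push a rule's confidence below a threshold, can be seen on a toy database. The sketch below is not the HypE-based method of the paper (it performs no multi-objective search and ignores side effects); the helper names and the transactions are hypothetical. It only shows that adding the antecedent item to transactions lacking the consequent raises supp(A) while leaving supp(A ∪ B) unchanged, so conf(A -> B) = supp(A ∪ B) / supp(A) falls.

```python
def support_count(transactions, itemset):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def confidence(transactions, antecedent, consequent):
    """conf(A -> B) = supp(A union B) / supp(A); 0.0 if A never occurs."""
    a_count = support_count(transactions, antecedent)
    if a_count == 0:
        return 0.0
    both = support_count(transactions, set(antecedent) | set(consequent))
    return both / a_count


# Toy database (hypothetical data, not taken from the paper).
db = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "butter"},
    {"butter"},
    {"eggs"},
]

before = confidence(db, {"bread"}, {"milk"})  # 2 of 3 bread transactions

# Insert the antecedent item into two transactions that lack the consequent.
sanitized = [set(t) for t in db]
sanitized[3].add("bread")
sanitized[4].add("bread")

after = confidence(sanitized, {"bread"}, {"milk"})  # 2 of 5 bread transactions
print(before, after)
```

    Which transactions to modify is exactly the choice the paper delegates to an evolutionary optimizer; the fixed indices used here are arbitrary.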

  16. Using GO-WAR for mining cross-ontology weighted association rules.

    PubMed

    Agapito, Giuseppe; Cannataro, Mario; Guzzi, Pietro Hiram; Milano, Marianna

    2015-07-01

    The Gene Ontology (GO) is a structured repository of concepts (GO terms) that are associated with one or more gene products. The process of association is referred to as annotation. The relevance and the specificity of both GO terms and annotations are evaluated by a measure defined as information content (IC). The analysis of annotated data is thus an important challenge for bioinformatics. Different approaches to analysis exist. Among them, the use of association rules (AR) may provide useful knowledge, and it has been used in some applications, e.g. improving the quality of annotations. Nevertheless, classical association rule algorithms take into account neither the source of annotation nor the importance, leading to the generation of candidate rules with low IC. This paper presents GO-WAR (Gene Ontology-based Weighted Association Rules), a methodology for extracting weighted association rules. GO-WAR can extract association rules with a high level of IC without loss of support and confidence from a dataset of annotated data. A case study on the use of GO-WAR on publicly available GO annotation datasets is used to demonstrate that our method outperforms current state-of-the-art approaches. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  17. MINEs: open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics.

    PubMed

    Jeffryes, James G; Colastani, Ricardo L; Elbadawi-Sidhu, Mona; Kind, Tobias; Niehaus, Thomas D; Broadbelt, Linda J; Hanson, Andrew D; Fiehn, Oliver; Tyo, Keith E J; Henry, Christopher S

    2015-01-01

    In spite of its great promise, metabolomics has proven difficult to execute in an untargeted and generalizable manner. Liquid chromatography-mass spectrometry (LC-MS) has made it possible to gather data on thousands of cellular metabolites. However, matching metabolites to their spectral features continues to be a bottleneck, meaning that much of the collected information remains uninterpreted and that new metabolites are seldom discovered in untargeted studies. These challenges require new approaches that consider compounds beyond those available in curated biochemistry databases. Here we present Metabolic In silico Network Expansions (MINEs), an extension of known metabolite databases to include molecules that have not been observed, but are likely to occur based on known metabolites and common biochemical reactions. We utilize an algorithm called the Biochemical Network Integrated Computational Explorer (BNICE) and expert-curated reaction rules based on the Enzyme Commission classification system to propose the novel chemical structures and reactions that comprise MINE databases. Starting from the Kyoto Encyclopedia of Genes and Genomes (KEGG) COMPOUND database, the MINE contains over 571,000 compounds, of which 93% are not present in the PubChem database. However, these MINE compounds have on average higher structural similarity to natural products than compounds from KEGG or PubChem. MINE databases were able to propose annotations for 98.6% of a set of 667 MassBank spectra, 14% more than KEGG alone and equivalent to PubChem while returning far fewer candidates per spectra than PubChem (46 vs. 1715 median candidates). Application of MINEs to LC-MS accurate mass data enabled the identity of an unknown peak to be confidently predicted. MINE databases are freely accessible for non-commercial use via user-friendly web-tools at http://minedatabase.mcs.anl.gov and developer-friendly APIs. 
MINEs improve metabolomics peak identification as compared to general chemical databases whose results include irrelevant synthetic compounds. Furthermore, MINEs complement and expand on previous in silico generated compound databases that focus on human metabolism. We are actively developing the database; future versions of this resource will incorporate transformation rules for spontaneous chemical reactions and more advanced filtering and prioritization of candidate structures. Graphical abstract: MINE database construction and access methods. The process of constructing a MINE database from the curated source databases is depicted on the left. The methods for accessing the database are shown on the right.

  18. 77 FR 34894 - Wyoming Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-06-12

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 950... Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; withdrawal. SUMMARY: We, the Office of Surface Mining Reclamation and Enforcement (OSM), are announcing the withdrawal of a proposed rule...

  19. Exploring Characterizations of Learning Object Repositories Using Data Mining Techniques

    NASA Astrophysics Data System (ADS)

    Segura, Alejandra; Vidal, Christian; Menendez, Victor; Zapata, Alfredo; Prieto, Manuel

    Learning object repositories provide a platform for the sharing of Web-based educational resources. As these repositories evolve independently, it is difficult for users to have a clear picture of the kind of contents they give access to. Metadata can be used to automatically extract a characterization of these resources by using machine learning techniques. This paper presents an exploratory study carried out in the contents of four public repositories that uses clustering and association rule mining algorithms to extract characterizations of repository contents. The results of the analysis include potential relationships between different attributes of learning objects that may be useful to gain an understanding of the kind of resources available and eventually develop search mechanisms that consider repository descriptions as a criteria in federated search.

  20. Location Prediction Based on Transition Probability Matrices Constructing from Sequential Rules for Spatial-Temporal K-Anonymity Dataset

    PubMed Central

    Liu, Zhao; Zhu, Yunhong; Wu, Chenxue

    2016-01-01

    Spatial-temporal k-anonymity has become a mainstream approach among techniques for protection of users’ privacy in location-based services (LBS) applications, and has been applied to several variants such as LBS snapshot queries and continuous queries. Analyzing large-scale spatial-temporal anonymity sets may benefit several LBS applications. In this paper, we propose two location prediction methods based on transition probability matrices constructed from sequential rules for spatial-temporal k-anonymity datasets. First, we define single-step sequential rules mined from sequential spatial-temporal k-anonymity datasets generated from continuous LBS queries for multiple users. We then construct transition probability matrices from the mined single-step sequential rules, and normalize the transition probabilities in the transition matrices. Next, we regard a mobility model for an LBS requester as a stationary stochastic process and compute the n-step transition probability matrices by raising the normalized transition probability matrices to the power n. Furthermore, we propose two location prediction methods: rough prediction and accurate prediction. The former obtains the probabilities of arriving at target locations along simple paths that include only current locations, target locations and transition steps. By iteratively combining the probabilities for simple paths with n steps and the probabilities for detailed paths with n-1 steps, the latter method calculates transition probabilities for detailed paths with n steps from current locations to target locations. Finally, we conduct extensive experiments, and the correctness and flexibility of our proposed algorithm have been verified. PMID:27508502
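
    The normalization and n-step computation described in this abstract can be sketched in a few lines. The code below is a minimal illustration under stated assumptions: the rule weights in `counts` are invented, rows with no outgoing rules are left as zeros, and the function names are hypothetical. It does not reproduce the paper's rough or accurate prediction methods, only the step of raising a row-normalized matrix to the power n.

```python
def normalize_rows(counts):
    """Scale each row of non-negative weights so that it sums to 1."""
    result = []
    for row in counts:
        total = sum(row)
        # Assumption: a row with no outgoing rules stays all zeros.
        result.append([x / total if total else 0.0 for x in row])
    return result


def mat_mul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def mat_pow(p, n):
    """Return P**n for a square matrix P and an integer n >= 0."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, p)
    return result


# Hypothetical weights of single-step rules between three locations.
counts = [
    [0, 3, 1],
    [2, 0, 2],
    [1, 1, 0],
]
P = normalize_rows(counts)
P2 = mat_pow(P, 2)  # two-step transition probabilities
print(P2[0])
```

    Each row of `P2` still sums to 1, so entry (i, j) can be read as the probability of being at location j two steps after leaving location i.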

  1. Target-Based Maintenance of Privacy Preserving Association Rules

    ERIC Educational Resources Information Center

    Ahluwalia, Madhu V.

    2011-01-01

    In the context of association rule mining, the state-of-the-art in privacy preserving data mining provides solutions for categorical and Boolean association rules but not for quantitative association rules. This research fills this gap by describing a method based on discrete wavelet transform (DWT) to protect input data privacy while preserving…

  2. Interestingness measures and strategies for mining multi-ontology multi-level association rules from gene ontology annotations for the discovery of new GO relationships.

    PubMed

    Manda, Prashanti; McCarthy, Fiona; Bridges, Susan M

    2013-10-01

    The Gene Ontology (GO), a set of three sub-ontologies, is one of the most popular bio-ontologies used for describing gene product characteristics. GO annotation data containing terms from multiple sub-ontologies and at different levels in the ontologies is an important source of implicit relationships between terms from the three sub-ontologies. Data mining techniques such as association rule mining that are tailored to mine from multiple ontologies at multiple levels of abstraction are required for effective knowledge discovery from GO annotation data. We present a data mining approach, Multi-ontology data mining at All Levels (MOAL) that uses the structure and relationships of the GO to mine multi-ontology multi-level association rules. We introduce two interestingness measures: Multi-ontology Support (MOSupport) and Multi-ontology Confidence (MOConfidence) customized to evaluate multi-ontology multi-level association rules. We also describe a variety of post-processing strategies for pruning uninteresting rules. We use publicly available GO annotation data to demonstrate our methods with respect to two applications (1) the discovery of co-annotation suggestions and (2) the discovery of new cross-ontology relationships. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.

  3. 75 FR 22723 - Stream Protection Rule; Environmental Impact Statement

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-04-30

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Parts 780... of Surface Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; notice of intent to prepare an environmental impact statement. SUMMARY: We, the Office of Surface Mining Reclamation and...

  4. Analysis of Occupational Accidents in Underground and Surface Mining in Spain Using Data-Mining Techniques

    PubMed Central

    Sanmiquel, Lluís; Bascompta, Marc; Rossell, Josep M.; Anticoi, Hernán Francisco; Guash, Eduard

    2018-01-01

    An analysis of occupational accidents in the mining sector was conducted using the data from the Spanish Ministry of Employment and Social Safety between 2005 and 2015, and data-mining techniques were applied. Data was processed with the software Weka. Two scenarios were chosen from the accidents database: surface and underground mining. The most important variables involved in occupational accidents and their association rules were determined. These rules are composed of several predictor variables that cause accidents, defining its characteristics and context. This study exposes the 20 most important association rules in the sector—either surface or underground mining—based on the statistical confidence levels of each rule as obtained by Weka. The outcomes display the most typical immediate causes, along with the percentage of accidents with a basis in each association rule. The most important immediate cause is body movement with physical effort or overexertion, and the type of accident is physical effort or overexertion. On the other hand, the second most important immediate cause and type of accident are different between the two scenarios. Data-mining techniques were chosen as a useful tool to find out the root cause of the accidents. PMID:29518921

  5. [Analysis of on medication rules for Qi-deficiency and blood-stasis syndrome of chronic heart failure based on data mining technology].

    PubMed

    Wang, Qian; Yao, Geng-Zhen; Pan, Guang-Ming; Huang, Jing-Yi; An, Yi-Pei; Zou, Xu

    2017-01-01

    To analyze the medication features and the regularity of prescriptions of traditional Chinese medicine in treating patients with Qi-deficiency and blood-stasis syndrome of chronic heart failure based on modern literature. In this article, CNKI Chinese academic journal database, Wanfang Chinese academic journal database and VIP Chinese periodical database were all searched from January 2000 to December 2015 for the relevant literature on traditional Chinese medicine treatment for Qi-deficiency and blood-stasis syndrome of chronic heart failure. Then a normalized database was established for further data mining and analysis. Subsequently, the medication features and the regularity of prescriptions were mined by using traditional Chinese medicine inheritance support system(V2.5), association rules, improved mutual information algorithm, complex system entropy clustering and other mining methods. Finally, a total of 171 articles were included, involving 171 prescriptions, 140 kinds of herbs, with a total frequency of 1 772 for the herbs. As a result, 19 core prescriptions and 7 new prescriptions were mined. The most frequently used herbs included Huangqi(Astragali Radix), Danshen(Salviae Miltiorrhizae Radix et Rhizoma), Fuling(Poria), Renshen(Ginseng Radix et Rhizoma), Tinglizi(Semen Lepidii), Baizhu(Atractylodis Macrocephalae Rhizoma), and Guizhi(Cinnamomum Ramulus). The core prescriptions were composed of Huangqi(Astragali Radix), Danshen(Salviae Miltiorrhizae Radix et Rhizoma) and Fuling(Poria), etc. The high frequent herbs and core prescriptions not only highlight the medication features of Qi-invigorating and blood-circulating therapy, but also reflect the regularity of prescriptions of blood-circulating, Yang-warming, and urination-promoting therapy based on syndrome differentiation. Moreover, the mining of the new prescriptions provide new reference and inspiration for clinical treatment of various accompanying symptoms of chronic heart failure. 
In conclusion, this article provides new reference for traditional Chinese medicine in the treatment of chronic heart failure. Copyright© by the Chinese Pharmaceutical Association.

  6. Data Mining for Financial Applications

    NASA Astrophysics Data System (ADS)

    Kovalerchuk, Boris; Vityaev, Evgenii

    This chapter describes Data Mining in finance by discussing financial tasks, specifics of methodologies and techniques in this Data Mining area. It includes time dependence, data selection, forecast horizon, measures of success, quality of patterns, hypothesis evaluation, problem ID, method profile, attribute-based and relational methodologies. The second part of the chapter discusses Data Mining models and practice in finance. It covers use of neural networks in portfolio management, design of interpretable trading rules and discovering money laundering schemes using decision rules and relational Data Mining methodology.

  7. Analysis 320 coal mine accidents using structural equation modeling with unsafe conditions of the rules and regulations as exogenous variables.

    PubMed

    Zhang, Yingyu; Shao, Wei; Zhang, Mengjia; Li, Hejun; Yin, Shijiu; Xu, Yingjun

    2016-07-01

    Mining has been historically considered as a naturally high-risk industry worldwide. Deaths caused by coal mine accidents are more than the sum of all other accidents in China. Statistics of 320 coal mine accidents in Shandong province show that all accidents contain indicators of "unsafe conditions of the rules and regulations" with a frequency of 1590, accounting for 74.3% of the total frequency of 2140. "Unsafe behaviors of the operator" is another important contributory factor, which mainly includes "operator error" and "venturing into dangerous places." A systems analysis approach was applied by using structural equation modeling (SEM) to examine the interactions between the contributory factors of coal mine accidents. The analysis of results leads to three conclusions. (i) "Unsafe conditions of the rules and regulations," affect the "unsafe behaviors of the operator," "unsafe conditions of the equipment," and "unsafe conditions of the environment." (ii) The three influencing factors of coal mine accidents (with the frequency of effect relation in descending order) are "lack of safety education and training," "rules and regulations of safety production responsibility," and "rules and regulations of supervision and inspection." (iii) The three influenced factors (with the frequency in descending order) of coal mine accidents are "venturing into dangerous places," "poor workplace environment," and "operator error." Copyright © 2016 Elsevier Ltd. All rights reserved.

  8. 5 CFR 5201.105 - Additional rules for Mine Safety and Health Administration employees.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... Health Administration employees. 5201.105 Section 5201.105 Administrative Personnel DEPARTMENT OF LABOR... for Mine Safety and Health Administration employees. The rules in this section apply to employees of the Mine Safety and Health Administration (MSHA) and are in addition to §§ 5201.101, 5201.102, and...

  9. From data mining rules to medical logical modules and medical advices.

    PubMed

    Gomoi, Valentin; Vida, Mihaela; Robu, Raul; Stoicu-Tivadar, Vasile; Bernad, Elena; Lupşe, Oana

    2013-01-01

    Using data mining in collaboration with Clinical Decision Support Systems adds new knowledge as support for medical diagnosis. The current work presents a tool which translates data mining rules supporting generation of medical advices to Arden Syntax formalism. The developed system was tested with data related to 2326 births that took place in 2010 at the Bega Obstetrics - Gynaecology Hospital, Timişoara. Based on processing these data, 14 medical rules regarding the Apgar score were generated and then translated in Arden Syntax language.

  10. Dynamic Task Optimization in Remote Diabetes Monitoring Systems.

    PubMed

    Suh, Myung-Kyung; Woodbridge, Jonathan; Moin, Tannaz; Lan, Mars; Alshurafa, Nabil; Samy, Lauren; Mortazavi, Bobak; Ghasemzadeh, Hassan; Bui, Alex; Ahmadi, Sheila; Sarrafzadeh, Majid

    2012-09-01

    Diabetes is the seventh leading cause of death in the United States, but careful symptom monitoring can prevent adverse events. A real-time patient monitoring and feedback system is one of the solutions to help patients with diabetes and their healthcare professionals monitor health-related measurements and provide dynamic feedback. However, data-driven methods to dynamically prioritize and generate tasks are not well investigated in the domain of remote health monitoring. This paper presents a wireless health project (WANDA) that leverages sensor technology and wireless communication to monitor the health status of patients with diabetes. The WANDA dynamic task management function applies data analytics in real-time to discretize continuous features, applying data clustering and association rule mining techniques to manage a sliding window size dynamically and to prioritize required user tasks. The developed algorithm minimizes the number of daily action items required by patients with diabetes using association rules that satisfy a minimum support, confidence and conditional probability thresholds. Each of these tasks maximizes information gain, thereby improving the overall level of patient adherence and satisfaction. Experimental results from applying EM-based clustering and Apriori algorithms show that the developed algorithm can predict further events with higher confidence levels and reduce the number of user tasks by up to 76.19 %.

  11. Dynamic Task Optimization in Remote Diabetes Monitoring Systems

    PubMed Central

    Suh, Myung-kyung; Woodbridge, Jonathan; Moin, Tannaz; Lan, Mars; Alshurafa, Nabil; Samy, Lauren; Mortazavi, Bobak; Ghasemzadeh, Hassan; Bui, Alex; Ahmadi, Sheila; Sarrafzadeh, Majid

    2016-01-01

    Diabetes is the seventh leading cause of death in the United States, but careful symptom monitoring can prevent adverse events. A real-time patient monitoring and feedback system is one of the solutions to help patients with diabetes and their healthcare professionals monitor health-related measurements and provide dynamic feedback. However, data-driven methods to dynamically prioritize and generate tasks are not well investigated in the domain of remote health monitoring. This paper presents a wireless health project (WANDA) that leverages sensor technology and wireless communication to monitor the health status of patients with diabetes. The WANDA dynamic task management function applies data analytics in real-time to discretize continuous features, applying data clustering and association rule mining techniques to manage a sliding window size dynamically and to prioritize required user tasks. The developed algorithm minimizes the number of daily action items required by patients with diabetes using association rules that satisfy a minimum support, confidence and conditional probability thresholds. Each of these tasks maximizes information gain, thereby improving the overall level of patient adherence and satisfaction. Experimental results from applying EM-based clustering and Apriori algorithms show that the developed algorithm can predict further events with higher confidence levels and reduce the number of user tasks by up to 76.19 %. PMID:27617297

  12. Artificial neural network, genetic algorithm, and logistic regression applications for predicting renal colic in emergency settings.

    PubMed

    Eken, Cenker; Bilge, Ugur; Kartal, Mutlu; Eray, Oktay

    2009-06-03

    Logistic regression is the most common statistical model for processing multivariate data in the medical literature. Artificial intelligence models like an artificial neural network (ANN) and genetic algorithm (GA) may also be useful to interpret medical data. The purpose of this study was to perform artificial intelligence models on a medical data sheet and compare to logistic regression. ANN, GA, and logistic regression analysis were carried out on a data sheet of a previously published article regarding patients presenting to an emergency department with flank pain suspicious for renal colic. The study population was composed of 227 patients: 176 patients had a diagnosis of urinary stone, while 51 ultimately had no calculus. The GA found two decision rules in predicting urinary stones. Rule 1 consisted of being male, pain not spreading to back, and no fever. In rule 2, pelvicaliceal dilatation on bedside ultrasonography replaced no fever. ANN, GA rule 1, GA rule 2, and logistic regression had a sensitivity of 94.9, 67.6, 56.8, and 95.5%, a specificity of 78.4, 76.47, 86.3, and 47.1%, a positive likelihood ratio of 4.4, 2.9, 4.1, and 1.8, and a negative likelihood ratio of 0.06, 0.42, 0.5, and 0.09, respectively. The area under the curve was found to be 0.867, 0.720, 0.715, and 0.713 for all applications, respectively. Data mining techniques such as ANN and GA can be used for predicting renal colic in emergency settings and to constitute clinical decision rules. They may be an alternative to conventional multivariate analysis applications used in biostatistics.
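
    The likelihood ratios quoted in this abstract follow from the reported sensitivities and specificities through the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The short sketch below recomputes them from the abstract's figures; the function name is an assumption, and the recomputed values agree with the published ones only up to rounding.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity.

    Both arguments are fractions in (0, 1), not percentages.
    """
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Sensitivity and specificity as reported in the abstract, as fractions.
reported = {
    "ANN": (0.949, 0.784),
    "GA rule 1": (0.676, 0.7647),
    "GA rule 2": (0.568, 0.863),
    "logistic regression": (0.955, 0.471),
}

for name, (sens, spec) in reported.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{name}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```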

  13. On-Demand Associative Cross-Language Information Retrieval

    NASA Astrophysics Data System (ADS)

    Geraldo, André Pinto; Moreira, Viviane P.; Gonçalves, Marcos A.

    This paper proposes the use of algorithms for mining association rules as an approach for Cross-Language Information Retrieval. These algorithms have been widely used to analyse market basket data. The idea is to map the problem of finding associations between sales items to the problem of finding term translations over a parallel corpus. The proposal was validated by means of experiments using queries in two distinct languages: Portuguese and Finnish to retrieve documents in English. The results show that the performance of our proposed approach is comparable to the performance of the monolingual baseline and to query translation via machine translation, even though these systems employ more complex Natural Language Processing techniques. The combination between machine translation and our approach yielded the best results, even outperforming the monolingual baseline.

  14. Research on parallel algorithm for sequential pattern mining

    NASA Astrophysics Data System (ADS)

    Zhou, Lijuan; Qin, Bai; Wang, Yu; Hao, Zhongxiao

    2008-03-01

    Sequential pattern mining is the mining of frequent sequences related to time or other orders from the sequence database. Its initial motivation is to discover the laws of customer purchasing in a time section by finding the frequent sequences. In recent years, sequential pattern mining has become an important direction of data mining, and its application field is no longer confined to business databases but has extended to new data sources such as the Web and advanced science fields such as DNA analysis. The data used in sequential pattern mining has the following characteristics: massive volume and distributed storage. Most existing sequential pattern mining algorithms have not considered these characteristics together. Based on these traits and on parallel theory, this paper puts forward a new distributed parallel algorithm, SPP (Sequential Pattern Parallel). The algorithm abides by the principle of pattern reduction and utilizes the divide-and-conquer strategy for parallelization. The first parallel task is to construct frequent item sets applying the frequent concept and search space partition theory, and the second task is to build frequent sequences using the depth-first search method at each processor. The algorithm only needs to access the database twice and doesn't generate candidate sequences, which reduces the access time and improves the mining efficiency. Based on the random data generation procedure and the different information structures designed, this paper simulated the SPP algorithm in a concrete parallel environment and implemented the AprioriAll algorithm. The experiments demonstrate that, compared with AprioriAll, the SPP algorithm had an excellent speedup factor and efficiency.

  15. Handling Dynamic Weights in Weighted Frequent Pattern Mining

    NASA Astrophysics Data System (ADS)

    Ahmed, Chowdhury Farhan; Tanbeer, Syed Khairuzzaman; Jeong, Byeong-Soo; Lee, Young-Koo

    Even though weighted frequent pattern (WFP) mining is more effective than traditional frequent pattern mining because it can consider different semantic significances (weights) of items, existing WFP algorithms assume that each item has a fixed weight. But in real world scenarios, the weight (price or significance) of an item can vary with time. Reflecting these changes in item weight is necessary in several mining applications, such as retail market data analysis and web click stream analysis. In this paper, we introduce the concept of a dynamic weight for each item, and propose an algorithm, DWFPM (dynamic weighted frequent pattern mining), that makes use of this concept. Our algorithm can address situations where the weight (price or significance) of an item varies dynamically. It exploits a pattern growth mining technique to avoid the level-wise candidate set generation-and-test methodology. Furthermore, it requires only one database scan, so it is eligible for use in stream data mining. An extensive performance analysis shows that our algorithm is efficient and scalable for WFP mining using dynamic weights.

  16. In Brief: Coal mining regulations

    NASA Astrophysics Data System (ADS)

    Showstack, Randy

    2009-12-01

    The U.S. Department of the Interior (DOI) announced on 18 November measures to strengthen the oversight of state surface coal mining programs and to promulgate federal regulations to protect streams affected by surface coal mining operations. DOI's Office of Surface Mining Reclamation and Enforcement (OSM) is publishing an advance notice of a proposed rule about protecting streams from adverse impacts of surface coal mining operations. A rule issued by the Bush administration in December 2008 allows coal mine operators to place excess excavated materials into streams if they can show it is not reasonably possible to avoid doing so. “We are moving as quickly as possible under the law to gather public input for a new rule, based on sound science, that will govern how companies handle fill removed from mountaintop coal seams,” according to Wilma Lewis, assistant secretary for Land and Minerals Management at DOI.

  17. Development of an evolutionary fuzzy expert system for estimating future behavior of stock price

    NASA Astrophysics Data System (ADS)

    Mehmanpazir, Farhad; Asadi, Shahrokh

    2017-03-01

    The stock market has always been an attractive area for researchers, since no method has yet been found to predict stock price behavior precisely. Because of its high uncertainty and volatility, it carries a higher risk than any other investment area, and stock price behavior is therefore difficult to simulate. This paper presents a "data mining-based evolutionary fuzzy expert system" (DEFES) approach to estimate the behavior of stock price. The tool is developed in a seven-stage architecture. Data mining is used in three stages to reduce the complexity of the whole data space. The first stage, noise filtering, makes the raw data clean and smooth. Variable selection is the second stage; stepwise regression analysis is used to choose the key variables to be considered in the model. In the third stage, K-means is used to divide the data into sub-populations to decrease the effects of noise and reduce the complexity of the patterns. In the fourth stage, a Mamdani-type fuzzy rule-based system is extracted for each cluster by means of a genetic algorithm and an evolutionary strategy. In the fifth stage, a binary genetic algorithm is used for rule filtering, removing redundant rules in order to counter overlearning. In the sixth stage, a genetic tuning process slightly adjusts the shape of the membership functions. The last stage tests the performance of the tool and adjusts its parameters. This is the first study to use an approximate fuzzy rule-based system and evolutionary strategy with the ability to extract the whole knowledge base of a fuzzy expert system for stock price forecasting problems. The superiority and applicability of DEFES are shown for International Business Machines Corporation, and the outcome is compared with the results of other methods. Results with the MAPE metric and the Wilcoxon signed ranks test indicate that DEFES provides more accuracy and outperforms all previous methods, so it can be considered a superior tool for stock price forecasting problems.

  18. Image Information Mining Utilizing Hierarchical Segmentation

    NASA Technical Reports Server (NTRS)

    Tilton, James C.; Marchisio, Giovanni; Koperski, Krzysztof; Datcu, Mihai

    2002-01-01

    The Hierarchical Segmentation (HSEG) algorithm is an approach for producing high quality, hierarchically related image segmentations. The VisiMine image information mining system utilizes clustering and segmentation algorithms for reducing visual information in multispectral images to a manageable size. The project discussed herein seeks to enhance the VisiMine system through incorporating hierarchical segmentations from HSEG into the VisiMine system.

  19. Soil quality assessment using weighted fuzzy association rules

    USGS Publications Warehouse

    Xue, Yue-Ju; Liu, Shu-Guang; Hu, Yue-Ming; Yang, Jing-Feng

    2010-01-01

    Fuzzy association rules (FARs) can be powerful in assessing regional soil quality, a critical step prior to land planning and utilization; however, traditional FARs mined from soil quality database, ignoring the importance variability of the rules, can be redundant and far from optimal. In this study, we developed a method applying different weights to traditional FARs to improve accuracy of soil quality assessment. After the FARs for soil quality assessment were mined, redundant rules were eliminated according to whether the rules were significant or not in reducing the complexity of the soil quality assessment models and in improving the comprehensibility of FARs. The global weights, each representing the importance of a FAR in soil quality assessment, were then introduced and refined using a gradient descent optimization method. This method was applied to the assessment of soil resources conditions in Guangdong Province, China. The new approach had an accuracy of 87%, when 15 rules were mined, as compared with 76% from the traditional approach. The accuracy increased to 96% when 32 rules were mined, in contrast to 88% from the traditional approach. These results demonstrated an improved comprehensibility of FARs and a high accuracy of the proposed method.

  20. 77 FR 43721 - Examinations of Work Areas in Underground Coal Mines for Violations of Mandatory Health or Safety...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-07-26

    ... Examinations of Work Areas in Underground Coal Mines for Violations of Mandatory Health or Safety Standards... effectiveness of information collection requirements contained in the final rule on Examinations of Work Areas... requirements in MSHA's final rule on Examinations of Work Areas in Underground Coal Mines for Violations of...

  1. 75 FR 28227 - National Emission Standards for Hazardous Air Pollutants: Gold Mine Ore Processing and Production...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-05-20

    ... published a proposed rule for mercury emissions from the gold mine ore processing and production area source... proposed rule (75 FR 22470). Several parties requested that EPA extend the comment period. EPA has granted...-AP48 National Emission Standards for Hazardous Air Pollutants: Gold Mine Ore Processing and Production...

  2. Using association rule mining to identify risk factors for early childhood caries.

    PubMed

    Ivančević, Vladimir; Tušek, Ivan; Tušek, Jasmina; Knežević, Marko; Elheshk, Salaheddin; Luković, Ivan

    2015-11-01

    Early childhood caries (ECC) is a potentially severe disease affecting children all over the world. The available findings are mostly based on a logistic regression model, but data mining, in particular association rule mining, could be used to extract more information from the same data set. ECC data were collected in a cross-sectional analytical study of a 10% sample of preschool children in the South Bačka area (Vojvodina, Serbia). Association rules were extracted from the data by association rule mining. Risk factors were extracted from the highly ranked association rules. Discovered dominant risk factors include male gender, frequent breastfeeding (with other risk factors), high birth order, language, and low body weight at birth. Low health awareness of parents was significantly associated with ECC only in male children. The discovered risk factors are mostly confirmed by the literature, which corroborates the value of the methods. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  3. Herbal compatibility of traditional Chinese medical formulas for acquired immunodeficiency syndrome.

    PubMed

    Cui, Meng; Li, Jinghua; Li, Haiyan; Song, Chunxin

    2012-09-01

    Because herbal compatibility is one of the most important reasons why Traditional Chinese Medicine (TCM) formulas are effective for acquired immunodeficiency syndrome (AIDS), our study aimed to determine the compatibility of herbs based on published AIDS clinical research in Chinese periodicals. To achieve this aim, we designed a new data-mining algorithm according to TCM data characteristics. We found 25 clinical AIDS studies, all using Chinese herbs for treatment, in the Traditional Chinese Medicine Database System, and information on diagnosis and treatment was extracted. To determine herbal compatibility, especially the formulae for herbal combinations, we proposed an improved association rule algorithm based on the frequency of combinations. In this algorithm, all the compatibility relationships were displayed in a tree structure, from which the relationship between formulas and their derivation could be clearly inferred. Data analysis showed that approximately 100 herbs have been used for treating AIDS. Based on the whole herb compatibility tree, we calculated a basic formula for AIDS: Huang Qi combined with Ren Shen, Fu Ling, Bai Zhu, Bai Zhu, Dang Gui, and Bai Shao. This formula was derived from most of the clinical prescriptions and was chosen by most clinicians for AIDS treatment. From data mining we found that Qi replenishment and detoxification were the main treatment principles, which coincided with the AIDS pathological mechanism in which immune function is destroyed by human immunodeficiency virus (HIV). Our data-mining results suggest that the core TCM treatment of AIDS is replenishing Qi and detoxification, by which AIDS patients' immune system may be enhanced. Compatibility of Huang Qi with some frequently used herbs has shown real efficacy in clinical practice, which warrants pharmacological research in the future.

  4. Semisupervised GDTW kernel-based fuzzy c-means algorithm for mapping vegetation dynamics in mining region using normalized difference vegetation index time series

    NASA Astrophysics Data System (ADS)

    Jia, Duo; Wang, Cangjiao; Lei, Shaogang

    2018-01-01

    Mapping vegetation dynamic types in mining areas is significant for revealing the mechanisms of environmental damage and for guiding ecological construction. Dynamic types of vegetation can be identified by applying interannual normalized difference vegetation index (NDVI) time series. However, phase differences and time shifts in interannual time series decrease mapping accuracy in mining regions. To overcome these problems and to increase the accuracy of mapping vegetation dynamics, an interannual Landsat time series for optimum vegetation growing status was constructed first by using the enhanced spatial and temporal adaptive reflectance fusion model algorithm. We then proposed a Markov random field optimized semisupervised Gaussian dynamic time warping kernel-based fuzzy c-means (FCM) cluster algorithm for interannual NDVI time series to map dynamic vegetation types in mining regions. The proposed algorithm has been tested in the Shengli mining region and Shendong mining region, which are typical representatives of China's open-pit and underground mining regions, respectively. Experiments show that the proposed algorithm can solve the problems of phase differences and time shifts to achieve better performance when mapping vegetation dynamic types. The overall accuracies for the Shengli and Shendong mining regions were 93.32% and 89.60%, respectively, with improvements of 7.32% and 25.84% when compared with the original semisupervised FCM algorithm.
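
    The role of dynamic time warping (DTW) in handling time shifts can be illustrated with a short sketch. The DTW recurrence below is the textbook one; the Gaussian kernel form exp(-d^2 / (2 sigma^2)) and the parameter name sigma are assumptions made for the example and may differ from the kernel defined in the paper.

```python
import math


def dtw_distance(x, y):
    """Classic dynamic time warping distance with absolute difference as local cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def gaussian_dtw_kernel(x, y, sigma=1.0):
    """Similarity in (0, 1]; the exact kernel form is an assumption for illustration."""
    return math.exp(-dtw_distance(x, y) ** 2 / (2 * sigma ** 2))


# Two hypothetical NDVI-like series with the same shape, one delayed by a step.
SERIES_A = [0.2, 0.5, 0.8, 0.5]
SERIES_B = [0.2, 0.2, 0.5, 0.8, 0.5]
```

    Because DTW may repeat a sample when aligning, the delayed series is treated as identical to the original, which is the property that makes such a kernel attractive for phase-shifted time series.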

  5. A modeling of dynamic storage assignment for order picking in beverage warehousing with Drive-in Rack system

    NASA Astrophysics Data System (ADS)

    Hadi, M. Z.; Djatna, T.; Sugiarto

    2018-04-01

    This paper develops a dynamic storage assignment model to solve the storage assignment problem (SAP) for beverage order picking in a drive-in rack warehousing system. The model dynamically determines the appropriate storage location and space for each beverage product so that the performance of the system can be improved. The study constructs a graph model to represent drive-in rack storage positions and then combines association rule mining, class-based storage policies and an arrangement rule algorithm to determine an appropriate storage location and arrangement of the products according to dynamic orders from customers. The performance of the proposed model is measured as rule adjacency accuracy, travel distance (for the picking process) and the probability that a product expires, using a Last Come First Serve (LCFS) queue approach. Finally, the proposed model is implemented through computer simulation and its performance is compared with that of different storage assignment methods. The results indicate that the proposed model outperforms the other storage assignment methods.

  6. An improved association-mining research for exploring Chinese herbal property theory: based on data of the Shennong's Classic of Materia Medica.

    PubMed

    Jin, Rui; Lin, Zhi-jian; Xue, Chun-miao; Zhang, Bing

    2013-09-01

    Knowledge Discovery in Databases is gaining attention and raising new hopes for traditional Chinese medicine (TCM) researchers. It is a useful tool in understanding and deciphering TCM theories. Aiming for a better understanding of Chinese herbal property theory (CHPT), this paper applied an improved association rule learning method to analyze semistructured text in the book entitled Shennong's Classic of Materia Medica. The text was first annotated and transformed into well-structured multidimensional data. Subsequently, an Apriori algorithm was employed to produce association rules after a sensitivity analysis of the parameters. From the 120 confirmed rules that described the intrinsic relationships between herbal property (qi, flavor and their combinations) and herbal efficacy, two novel fundamental principles underlying CHPT were acquired and further elucidated: (1) the many-to-one mapping of herbal efficacy to herbal property; (2) the nonrandom overlap between the related efficacy of qi and flavor. This work provides new knowledge about CHPT, which should be helpful for its modern research.
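
    For readers unfamiliar with Apriori, a minimal level-wise sketch follows. It is a generic textbook version, not the improved procedure used in the paper; the function name, the abstract item labels and the threshold are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item that appears in the data
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count each candidate with one pass over the transactions
        level = {}
        for candidate in current:
            support = sum(1 for t in transactions if candidate <= t) / n
            if support >= min_support:
                level[candidate] = support
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        current = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Abstract toy records; the labels carry no domain meaning.
RECORDS = [
    {"P1", "P2", "E1"},
    {"P1", "P2", "E1"},
    {"P3", "P4", "E2"},
    {"P1", "P5", "E3"},
]
FREQUENT = apriori(RECORDS, min_support=0.5)
```

    The prune step is what distinguishes Apriori from brute-force enumeration: a candidate is counted only if all of its smaller subsets already passed the support threshold.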

  7. 75 FR 34666 - Stream Protection Rule; Environmental Impact Statement

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-06-18

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Chapter VII RIN 1029-AC63 Stream Protection Rule; Environmental Impact Statement AGENCY: Office of Surface Mining... impact statement. [[Page 34667

  8. Recommending Learning Activities in Social Network Using Data Mining Algorithms

    ERIC Educational Resources Information Center

    Mahnane, Lamia

    2017-01-01

    In this paper, we show how data mining algorithms (e.g. Apriori Algorithm (AP) and Collaborative Filtering (CF)) are useful in New Social Network (NSN-AP-CF). "NSN-AP-CF" processes the clusters based on different learning styles. Next, it analyzes the habits and the interests of the users through mining the frequent episodes by the…

  9. 26 CFR 1.611-2 - Rules applicable to mines, oil and gas wells, and other natural deposits.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... Rules applicable to mines, oil and gas wells, and other natural deposits. (a) Computation of cost depletion of mines, oil and gas wells, and other natural deposits. (1) The basis upon which cost depletion... for the taxable year, the cost depletion for that year shall be computed by dividing such amount by...

  10. GraDit: graph-based data repair algorithm for multiple data edits rule violations

    NASA Astrophysics Data System (ADS)

    Ode Zuhayeni Madjida, Wa; Gusti Bagus Baskara Nugraha, I.

    2018-03-01

    Constraint-based data cleaning captures violations of a set of rules called data quality rules. The rules consist of integrity constraints and data edits. Structurally, the two are similar, in that each rule contains a left-hand side and a right-hand side. Previous research proposed a data repair algorithm for integrity constraint violations. That algorithm uses an undirected hypergraph to represent rule violations. Nevertheless, it cannot be applied to data edits because of their different rule characteristics. This study proposes GraDit, a repair algorithm for data edit rules. First, we use a bipartite directed hypergraph as the model representation of all defined rules. This representation is used to capture the interaction between violated rules and clean rules. In addition, we propose an undirected graph as the violation representation. Our experimental study showed that the algorithm with an undirected graph as the violation representation model gave better data quality than the algorithm with an undirected hypergraph as the representation model.

  11. 76 FR 70075 - Proximity Detection Systems for Continuous Mining Machines in Underground Coal Mines

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-11-10

    ... Detection Systems for Continuous Mining Machines in Underground Coal Mines AGENCY: Mine Safety and Health... proposed rule addressing Proximity Detection Systems for Continuous Mining Machines in Underground Coal... Detection Systems for Continuous Mining Machines in Underground Coal Mines. MSHA conducted hearings on...

  12. Fuzzy Modelling for Human Dynamics Based on Online Social Networks

    PubMed Central

    Cuenca-Jara, Jesus; Valdes-Vela, Mercedes; Skarmeta, Antonio F.

    2017-01-01

    Human mobility mining has attracted a lot of attention in the research community due to its multiple implications in the provisioning of innovative services for large metropolises. In this scope, Online Social Networks (OSN) have arisen as a promising source of location data to come up with new mobility models. However, the human nature of this data makes it rather noisy and inaccurate. In order to deal with such limitations, the present work introduces a framework for human mobility mining based on fuzzy logic. Firstly, a fuzzy clustering algorithm extracts the most active OSN areas at different time periods. Next, such clusters are the building blocks to compose mobility patterns. Furthermore, a location prediction service based on a fuzzy rule classifier has been developed on top of the framework. Finally, both the framework and the predictor have been tested with a Twitter and Flickr dataset in two large cities. PMID:28837120

  13. Fuzzy Modelling for Human Dynamics Based on Online Social Networks.

    PubMed

    Cuenca-Jara, Jesus; Terroso-Saenz, Fernando; Valdes-Vela, Mercedes; Skarmeta, Antonio F

    2017-08-24

    Human mobility mining has attracted a lot of attention in the research community due to its multiple implications in the provisioning of innovative services for large metropolises. In this scope, Online Social Networks (OSN) have arisen as a promising source of location data to come up with new mobility models. However, the human nature of this data makes it rather noisy and inaccurate. In order to deal with such limitations, the present work introduces a framework for human mobility mining based on fuzzy logic. Firstly, a fuzzy clustering algorithm extracts the most active OSN areas at different time periods. Next, such clusters are the building blocks to compose mobility patterns. Furthermore, a location prediction service based on a fuzzy rule classifier has been developed on top of the framework. Finally, both the framework and the predictor have been tested with a Twitter and Flickr dataset in two large cities.

  14. SPMBR: a scalable algorithm for mining sequential patterns based on bitmaps

    NASA Astrophysics Data System (ADS)

    Xu, Xiwei; Zhang, Changhai

    2013-12-01

    Some existing sequential pattern mining algorithms generate too many candidate sequences, which increases the processing cost of support counting. We therefore present an effective and scalable algorithm called SPMBR (Sequential Patterns Mining based on Bitmap Representation) to solve the problem of mining sequential patterns in large databases. Our method differs from previous related work on mining sequential patterns. The main difference is that the database of sequential patterns is represented by bitmaps, and a simplified bitmap structure is presented first. The algorithm first generates candidate sequences by SE (Sequence Extension) and IE (Item Extension), and then obtains all frequent sequences by comparing the original bitmap and the extended item bitmap. This method simplifies the problem of mining sequential patterns and avoids the high processing cost of support counting. Both theory and experiments indicate that the performance of SPMBR is superior for large transaction databases, that the memory required for storing temporal data during mining is much smaller, and that all sequential patterns can be mined.
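
    The bitmap idea can be sketched as follows. This is a generic illustration of sequence extension and item extension on per-sequence bitmaps, in the style commonly used by bitmap-based sequential miners; it is not the simplified bitmap structure of SPMBR itself, and all function names are hypothetical. Bit j of an item's bitmap is set when the item occurs in the j-th itemset of a sequence.

```python
def build_bitmaps(sequence):
    """Map each item to an integer whose bit j is set if the item is in itemset j."""
    bitmaps = {}
    for pos, itemset in enumerate(sequence):
        for item in itemset:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << pos)
    return bitmaps


def s_transform(bits, length):
    """Set every position strictly after the first set bit and clear the rest."""
    if bits == 0:
        return 0
    first = (bits & -bits).bit_length() - 1  # index of the lowest set bit
    all_positions = (1 << length) - 1
    return all_positions & ~((1 << (first + 1)) - 1)


def s_extend(prefix_bits, item_bits, length):
    """Sequence extension: the item must occur in a later itemset than the prefix."""
    return s_transform(prefix_bits, length) & item_bits


def i_extend(prefix_bits, item_bits):
    """Item extension: the item must occur in the same itemset as the prefix."""
    return prefix_bits & item_bits


def support_s_extension(database, first, second):
    """Number of sequences that contain `first` followed later by `second`."""
    count = 0
    for sequence in database:
        bitmaps = build_bitmaps(sequence)
        if s_extend(bitmaps.get(first, 0), bitmaps.get(second, 0), len(sequence)):
            count += 1
    return count


# Toy database: each sequence is a list of itemsets.
DATABASE = [
    [{"a"}, {"b"}, {"a", "c"}],
    [{"b"}, {"a"}],
    [{"a", "b"}, {"c"}],
]
```

    Support counting reduces to a bitwise AND and a non-zero test per sequence, which is why no candidate has to be matched against the raw data again.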

  15. Data mining and visualization techniques

    DOEpatents

    Wong, Pak Chung [Richland, WA; Whitney, Paul [Richland, WA; Thomas, Jim [Richland, WA

    2004-03-23

    Disclosed are association rule identification and visualization methods, systems, and apparatus. An association rule in data mining is an implication of the form X → Y where X is a set of antecedent items and Y is the consequent item. A unique visualization technique that provides multiple antecedent, consequent, confidence, and support information is disclosed to facilitate better presentation of large quantities of complex association rules.
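
    The support and confidence of a rule X → Y follow directly from this definition; a minimal sketch with hypothetical function names and toy transactions is given here. Support is the fraction of transactions containing all items of a set, and confidence is support(X ∪ {Y}) divided by support(X).

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(X ∪ {Y}) / support(X) for the rule X → Y with a single consequent item."""
    x = frozenset(antecedent)
    support_x = support(transactions, x)
    if support_x == 0:
        return 0.0  # the rule never fires, so report zero rather than divide by zero
    return support(transactions, x | {consequent}) / support_x


# Toy market-basket data, invented for the example.
TRANSACTIONS = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
```

    For the rule {bread} → milk on these four transactions, the support of the full item set is 0.5 and the confidence is two thirds.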

  16. A data mining approach to optimize pellets manufacturing process based on a decision tree algorithm.

    PubMed

    Ronowicz, Joanna; Thommes, Markus; Kleinebudde, Peter; Krysiński, Jerzy

    2015-06-20

    The present study is focused on the thorough analysis of cause-effect relationships between pellet formulation characteristics (pellet composition as well as process parameters) and the selected quality attribute of the final product. The quality of the pellets was expressed by their shape, measured as the aspect ratio. A data matrix for chemometric analysis consisted of 224 pellet formulations prepared with eight different active pharmaceutical ingredients and several excipients, using different extrusion/spheronization process conditions. The data set contained 14 input variables (both formulation and process variables) and one output variable (pellet aspect ratio). A tree regression algorithm consistent with the Quality by Design concept was applied to obtain deeper understanding and knowledge of formulation and process parameters affecting the final pellet sphericity. A clear, interpretable set of decision rules was generated. The spheronization speed, spheronization time, number of holes and water content of extrudate have been recognized as the key factors influencing pellet aspect ratio. The most spherical pellets were achieved by using a large number of holes during extrusion, a high spheronizer speed and a longer spheronization time. The described data mining approach enhances knowledge about the pelletization process and simultaneously facilitates searching for the optimal process conditions which are necessary to achieve ideal spherical pellets, resulting in good flow characteristics. This data mining approach can be taken into consideration by industrial formulation scientists to support rational decision making in the field of pellets technology. Copyright © 2015 Elsevier B.V. All rights reserved.

  17. Machine Learning and Data Mining Methods in Diabetes Research.

    PubMed

    Kavakiotis, Ioannis; Tsave, Olga; Salifoglou, Athanasios; Maglaveras, Nicos; Vlahavas, Ioannis; Chouvarda, Ioanna

    2017-01-01

    The remarkable advances in biotechnology and health sciences have led to a significant production of data, such as high throughput genetic data and clinical information, generated from large Electronic Health Records (EHRs). To this end, application of machine learning and data mining methods in biosciences is presently, more than ever before, vital and indispensable in efforts to transform intelligently all available information into valuable knowledge. Diabetes mellitus (DM) is defined as a group of metabolic disorders exerting significant pressure on human health worldwide. Extensive research in all aspects of diabetes (diagnosis, etiopathophysiology, therapy, etc.) has led to the generation of huge amounts of data. The aim of the present study is to conduct a systematic review of the applications of machine learning, data mining techniques and tools in the field of diabetes research with respect to a) Prediction and Diagnosis, b) Diabetic Complications, c) Genetic Background and Environment, and e) Health Care and Management with the first category appearing to be the most popular. A wide range of machine learning algorithms were employed. In general, 85% of those used were characterized by supervised learning approaches and 15% by unsupervised ones, and more specifically, association rules. Support vector machines (SVM) arise as the most successful and widely used algorithm. Concerning the type of data, clinical datasets were mainly used. The title applications in the selected articles project the usefulness of extracting valuable knowledge leading to new hypotheses targeting deeper understanding and further investigation in DM.

  18. Applying data mining techniques to determine important parameters in chronic kidney disease and the relations of these parameters to each other.

    PubMed

    Tahmasebian, Shahram; Ghazisaeedi, Marjan; Langarizadeh, Mostafa; Mokhtaran, Mehrshad; Mahdavi-Mazdeh, Mitra; Javadian, Parisa

    2017-01-01

    Introduction: Chronic kidney disease (CKD) includes a wide range of pathophysiological processes which will be observed along with abnormal function of kidneys and progressive decrease in glomerular filtration rate (GFR). According to the definition decreasing GFR must have been present for at least three months. CKD will eventually result in end-stage kidney disease. In this process different factors play role and finding the relations between effective parameters in this regard can help to prevent or slow progression of this disease. There are always a lot of data being collected from the patients' medical records. This huge array of data can be considered a valuable source for analyzing, exploring and discovering information. Objectives: Using the data mining techniques, the present study tries to specify the effective parameters and also aims to determine their relations with each other in Iranian patients with CKD. Material and Methods: The study population includes 31996 patients with CKD. First, all of the data is registered in the database. Then data mining tools were used to find the hidden rules and relationships between parameters in collected data. Results: After data cleaning based on CRISP-DM (Cross Industry Standard Process for Data Mining) methodology and running mining algorithms on the data in the database the relationships between the effective parameters was specified. Conclusion: This study was done using the data mining method pertaining to the effective factors on patients with CKD.

  19. Applying data mining techniques to determine important parameters in chronic kidney disease and the relations of these parameters to each other

    PubMed Central

    Tahmasebian, Shahram; Ghazisaeedi, Marjan; Langarizadeh, Mostafa; Mokhtaran, Mehrshad; Mahdavi-Mazdeh, Mitra; Javadian, Parisa

    2017-01-01

    Introduction: Chronic kidney disease (CKD) includes a wide range of pathophysiological processes which will be observed along with abnormal function of kidneys and progressive decrease in glomerular filtration rate (GFR). According to the definition decreasing GFR must have been present for at least three months. CKD will eventually result in end-stage kidney disease. In this process different factors play role and finding the relations between effective parameters in this regard can help to prevent or slow progression of this disease. There are always a lot of data being collected from the patients’ medical records. This huge array of data can be considered a valuable source for analyzing, exploring and discovering information. Objectives: Using the data mining techniques, the present study tries to specify the effective parameters and also aims to determine their relations with each other in Iranian patients with CKD. Material and Methods: The study population includes 31996 patients with CKD. First, all of the data is registered in the database. Then data mining tools were used to find the hidden rules and relationships between parameters in collected data. Results: After data cleaning based on CRISP-DM (Cross Industry Standard Process for Data Mining) methodology and running mining algorithms on the data in the database the relationships between the effective parameters was specified. Conclusion: This study was done using the data mining method pertaining to the effective factors on patients with CKD. PMID:28497080

  20. Prediction of pork quality parameters by applying fractals and data mining on MRI.

    PubMed

    Caballero, Daniel; Pérez-Palacios, Trinidad; Caro, Andrés; Amigo, José Manuel; Dahl, Anders B; Ersbøll, Bjarne K; Antequera, Teresa

    2017-09-01

    This work is the first to investigate the use of MRI, fractal algorithms and data mining techniques to determine pork quality parameters non-destructively. The main objective was to evaluate the capability of fractal algorithms (Classical Fractal algorithm, CFA; Fractal Texture Algorithm, FTA and One Point Fractal Texture Algorithm, OPFTA) to analyse MRI in order to predict quality parameters of loin. In addition, the effect of the sequence acquisition of MRI (Gradient echo, GE; Spin echo, SE and Turbo 3D, T3D) and the predictive technique of data mining (Isotonic regression, IR and Multiple linear regression, MLR) were analysed. Both fractal algorithms, FTA and OPFTA, are appropriate to analyse MRI of loins. The sequence acquisition, the fractal algorithm and the data mining technique seem to influence the prediction results. For most physico-chemical parameters, prediction equations with moderate to excellent correlation coefficients were achieved by using the following combinations of acquisition sequences of MRI, fractal algorithms and data mining techniques: SE-FTA-MLR, SE-OPFTA-IR, GE-OPFTA-MLR, SE-OPFTA-MLR, with the last one offering the best prediction results. Thus, SE-OPFTA-MLR could be proposed as an alternative technique to determine physico-chemical traits of fresh and dry-cured loins in a non-destructive way with high accuracy. Copyright © 2017. Published by Elsevier Ltd.

  1. Apriori Versions Based on MapReduce for Mining Frequent Patterns on Big Data.

    PubMed

    Luna, Jose Maria; Padillo, Francisco; Pechenizkiy, Mykola; Ventura, Sebastian

    2017-09-27

    Pattern mining is one of the most important tasks to extract meaningful and useful information from raw data. This task aims to extract item-sets that represent any type of homogeneity and regularity in data. Although many efficient algorithms have been developed in this regard, the growing interest in data has caused the performance of existing pattern mining techniques to drop. The goal of this paper is to propose new efficient pattern mining algorithms to work in big data. To this aim, a series of algorithms based on the MapReduce framework and the Hadoop open-source implementation have been proposed. The proposed algorithms can be divided into three main groups. First, two algorithms [Apriori MapReduce (AprioriMR) and iterative AprioriMR] with no pruning strategy are proposed, which extract any existing item-set in data. Second, two algorithms (space pruning AprioriMR and top AprioriMR) that prune the search space by means of the well-known anti-monotone property are proposed. Finally, a last algorithm (maximal AprioriMR) is also proposed for mining condensed representations of frequent patterns. To test the performance of the proposed algorithms, a varied collection of big data datasets has been considered, comprising up to 3 · 10¹⁸ transactions and more than 5 million distinct single items. The experimental stage includes comparisons against highly efficient and well-known pattern mining algorithms. Results reveal the interest of applying MapReduce versions when complex problems are considered, and also the unsuitability of this paradigm when dealing with small data.
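
    The map and reduce roles in itemset counting can be illustrated with a single-process sketch. It imitates the two phases with plain Python functions and does not use Hadoop; the function names and the partitioning are hypothetical, and the code loosely mirrors only the unpruned variant in which every existing item-set of a given size is emitted.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction, k):
    """Emit (itemset, 1) for every k-item subset of one transaction."""
    for itemset in combinations(sorted(transaction), k):
        yield itemset, 1


def reduce_phase(pairs):
    """Sum the emitted counts per itemset key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def count_itemsets(partitions, k):
    """Run the map phase over every partition, then reduce all emitted pairs."""
    emitted = []
    for partition in partitions:  # each partition stands in for one mapper's input split
        for transaction in partition:
            emitted.extend(map_phase(transaction, k))
    return reduce_phase(emitted)


# Toy data split into two partitions, as if handled by two mappers.
PARTITIONS = [
    [{"a", "b"}, {"a", "c"}],
    [{"a", "b", "c"}],
]
PAIR_COUNTS = count_itemsets(PARTITIONS, k=2)
```

    Sorting the items before forming subsets gives every itemset one canonical key, so counts emitted by different mappers for the same itemset meet in the same reducer bucket.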

  2. A Node Linkage Approach for Sequential Pattern Mining

    PubMed Central

    Navarro, Osvaldo; Cumplido, René; Villaseñor-Pineda, Luis; Feregrino-Uribe, Claudia; Carrasco-Ochoa, Jesús Ariel

    2014-01-01

    Sequential Pattern Mining is a widely addressed problem in data mining, with applications such as analyzing Web usage, examining purchase behavior, and text mining, among others. Nevertheless, with the dramatic increase in data volume, the current approaches prove inefficient when dealing with large input datasets, a large number of different symbols and low minimum supports. In this paper, we propose a new sequential pattern mining algorithm, which follows a pattern-growth scheme to discover sequential patterns. Unlike most pattern growth algorithms, our approach does not build a data structure to represent the input dataset, but instead accesses the required sequences through pseudo-projection databases, achieving better runtime and reducing memory requirements. Our algorithm traverses the search space in a depth-first fashion and only preserves in memory a pattern node linkage and the pseudo-projections required for the branch being explored at the time. Experimental results show that our new approach, the Node Linkage Depth-First Traversal algorithm (NLDFT), has better performance and scalability in comparison with state of the art algorithms. PMID:24933123
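
    Pseudo-projection can be sketched briefly. The code below is a generic depth-first pattern-growth miner for sequences of single items in which a projected database is stored only as (sequence index, start offset) pairs; it is not the NLDFT algorithm and omits its node linkage. All names are hypothetical.

```python
def mine_sequences(sequences, min_count):
    """Return {pattern tuple: number of supporting sequences}."""
    results = {}

    def grow(prefix, projections):
        # Count each item once per sequence, scanning only the projected suffix
        counts = {}
        for sid, start in projections:
            seen = set()
            seq = sequences[sid]
            for pos in range(start, len(seq)):
                item = seq[pos]
                if item not in seen:
                    seen.add(item)
                    counts[item] = counts.get(item, 0) + 1
        for item, count in sorted(counts.items()):
            if count < min_count:
                continue
            pattern = prefix + (item,)
            results[pattern] = count
            # Pseudo-projection: keep only offsets, never copy the suffixes
            next_projections = []
            for sid, start in projections:
                seq = sequences[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        next_projections.append((sid, pos + 1))
                        break
            # Depth-first: this branch is finished before its sibling starts
            grow(pattern, next_projections)

    grow((), [(sid, 0) for sid in range(len(sequences))])
    return results


SEQUENCES = [("a", "b", "c"), ("a", "c", "b"), ("a", "b")]
PATTERNS = mine_sequences(SEQUENCES, min_count=2)
```

    Only the offset lists of the branch currently being explored are alive at any time, which is the memory saving that pseudo-projection is meant to provide.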

  3. Privacy Preserving Nearest Neighbor Search

    NASA Astrophysics Data System (ADS)

    Shaneck, Mark; Kim, Yongdae; Kumar, Vipin

    Data mining is frequently obstructed by privacy concerns. In many cases data is distributed, and bringing the data together in one place for analysis is not possible due to privacy laws (e.g. HIPAA) or policies. Privacy preserving data mining techniques have been developed to address this issue by providing mechanisms to mine the data while giving certain privacy guarantees. In this chapter we address the issue of privacy preserving nearest neighbor search, which forms the kernel of many data mining applications. To this end, we present a novel algorithm based on secure multiparty computation primitives to compute the nearest neighbors of records in horizontally distributed data. We show how this algorithm can be used in three important data mining algorithms, namely LOF outlier detection, SNN clustering, and kNN classification. We prove the security of these algorithms under the semi-honest adversarial model, and describe methods that can be used to optimize their performance. Keywords: Privacy Preserving Data Mining, Nearest Neighbor Search, Outlier Detection, Clustering, Classification, Secure Multiparty Computation

  4. Efficient frequent pattern mining algorithm based on node sets in cloud computing environment

    NASA Astrophysics Data System (ADS)

    Billa, V. N. Vinay Kumar; Lakshmanna, K.; Rajesh, K.; Reddy, M. Praveen Kumar; Nagaraja, G.; Sudheer, K.

    2017-11-01

    The ultimate goal of Data Mining is to determine the hidden information which is useful in making decisions using the large databases collected by an organization. Data Mining involves many tasks that are to be performed during the process. Mining frequent itemsets is one of the most important tasks in the case of transactional databases. These transactional databases contain data at very large scale, and mining them consumes physical memory and time in proportion to the size of the database. A frequent pattern mining algorithm is said to be efficient only if it consumes less memory and time to mine the frequent itemsets from the given large database. With these points in mind, in this thesis we propose a system which mines frequent itemsets in an optimized way in terms of memory and time by using cloud computing as an important factor to make the process parallel, and the application is provided as a service. The complete framework uses a proven efficient algorithm called the FIN algorithm. The FIN algorithm works on Nodesets and a POC (pre-order coding) tree. In order to evaluate the performance of the system we conduct experiments to compare the efficiency of the same algorithm applied in a standalone manner and in a cloud computing environment on a real time data set, namely the traffic accidents data set. The results show that the memory consumption and execution time of the process in the proposed system are much lower than those of the standalone system.

  5. DMET-Miner: Efficient discovery of association rules from pharmacogenomic data.

    PubMed

    Agapito, Giuseppe; Guzzi, Pietro H; Cannataro, Mario

    2015-08-01

    Microarray platforms enable the investigation of allelic variants that may be correlated to phenotypes. Among those, the Affymetrix DMET (Drug Metabolism Enzymes and Transporters) platform enables the simultaneous investigation of all the genes that are related to drug absorption, distribution, metabolism and excretion (ADME). Although recent studies demonstrated the effectiveness of the use of DMET data for studying drug response or toxicity in clinical studies, there is a lack of tools for the automatic analysis of DMET data. In a previous work we developed DMET-Analyzer, a methodology and a supporting platform able to automate the statistical study of allelic variants, which has been validated in several clinical studies. Although DMET-Analyzer is able to correlate a single variant for each probe (related to a portion of a gene) through the use of the Fisher test, it is unable to discover multiple associations among allelic variants, due to its underlying statistical analysis strategy that focuses on a single variant at a time. To overcome those limitations, here we propose a new analysis methodology for DMET data based on Association Rules mining, and an efficient implementation of this methodology, named DMET-Miner. DMET-Miner extends the DMET-Analyzer tool with data mining capabilities and correlates the presence of a set of allelic variants with the conditions of patients' samples by exploiting association rules. To face the high number of frequent itemsets generated when considering large clinical studies based on DMET data, DMET-Miner uses an efficient data structure and implements an optimized search strategy that reduces the search space and the execution time. Preliminary experiments on synthetic DMET datasets show that DMET-Miner outperforms off-the-shelf data mining suites such as the FP-Growth algorithms available in Weka and RapidMiner.
To demonstrate the biological relevance of the extracted association rules and the effectiveness of the proposed approach from a medical point of view, some preliminary studies on a real clinical dataset are currently under medical investigation. Copyright © 2015 Elsevier Inc. All rights reserved.

  6. Imaging spectroscopy: Earth and planetary remote sensing with the USGS Tetracorder and expert systems

    USGS Publications Warehouse

    Clark, Roger N.; Swayze, Gregg A.; Livo, K. Eric; Kokaly, Raymond F.; Sutley, Steve J.; Dalton, J. Brad; McDougal, Robert R.; Gent, Carol A.

    2003-01-01

    Imaging spectroscopy is a tool that can be used to spectrally identify and spatially map materials based on their specific chemical bonds. Spectroscopic analysis requires significantly more sophistication than has been employed in conventional broadband remote sensing analysis. We describe a new system that is effective at material identification and mapping: a set of algorithms within an expert system decision‐making framework that we call Tetracorder. The expertise in the system has been derived from scientific knowledge of spectral identification. The expert system rules are implemented in a decision tree where multiple algorithms are applied to spectral analysis, additional expert rules and algorithms can be applied based on initial results, and more decisions are made until spectral analysis is complete. Because certain spectral features are indicative of specific chemical bonds in materials, the system can accurately identify and map those materials. In this paper we describe the framework of the decision making process used for spectral identification, describe specific spectral feature analysis algorithms, and give examples of what analyses and types of maps are possible with imaging spectroscopy data. We also present the expert system rules that describe which diagnostic spectral features are used in the decision making process for a set of spectra of minerals and other common materials. We demonstrate the applications of Tetracorder to identify and map surface minerals, to detect sources of acid rock drainage, and to map vegetation species, ice, melting snow, water, and water pollution, all with one set of expert system rules. Mineral mapping can aid in geologic mapping and fault detection and can provide a better understanding of weathering, mineralization, hydrothermal alteration, and other geologic processes. 
Environmental site assessment, such as mapping source areas of acid mine drainage, has resulted in the acceleration of site cleanup, saving millions of dollars and years in cleanup time. Imaging spectroscopy data and Tetracorder analysis can be used to study both terrestrial and planetary science problems. Imaging spectroscopy can be used to probe planetary systems, including their atmospheres, oceans, and land surfaces.

  7. A hybrid, auto-adaptive and rule-based multi-agent approach using evolutionary algorithms for improved searching

    NASA Astrophysics Data System (ADS)

    Izquierdo, Joaquín; Montalvo, Idel; Campbell, Enrique; Pérez-García, Rafael

    2016-08-01

    Selecting the most appropriate heuristic for solving a specific problem is not easy, for many reasons. This article focuses on one of these reasons: traditionally, the solution search process has operated in a given manner regardless of the specific problem being solved, and the process has been the same regardless of the size, complexity and domain of the problem. To cope with this situation, search processes should mould the search into areas of the search space that are meaningful for the problem. This article builds on previous work in the development of a multi-agent paradigm using techniques derived from knowledge discovery (data-mining techniques) on databases of so-far visited solutions. The aim is to improve the search mechanisms, increase computational efficiency and use rules to enrich the formulation of optimization problems, while reducing the search space and catering to realistic problems.

  8. Data Mining Citizen Science Results

    NASA Astrophysics Data System (ADS)

    Borne, K. D.

    2012-12-01

    Scientific discovery from big data is enabled through multiple channels, including data mining (through the application of machine learning algorithms) and human computation (commonly implemented through citizen science tasks). We will describe the results of new data mining experiments on the results from citizen science activities. Discovering patterns, trends, and anomalies in data are among the powerful contributions of citizen science. Establishing scientific algorithms that can subsequently re-discover the same types of patterns, trends, and anomalies in automatic data processing pipelines will ultimately result from the transformation of those human algorithms into computer algorithms, which can then be applied to much larger data collections. Scientific discovery from big data is thus greatly amplified through the marriage of data mining with citizen science.

  9. 77 FR 5740 - Tennessee Abandoned Mine Land Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-02-06

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 942... Mining Reclamation and Enforcement (OSM), Interior. ACTION: Proposed rule; public comment period and... amendment to the Tennessee Abandoned Mine Land (AML) Reclamation Plan under the Surface Mining Control and...

  10. 30 CFR 784.200 - Interpretive rules related to General Performance Standards.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... RECLAMATION AND OPERATION PLAN § 784.200 Interpretive rules related to General Performance Standards. The... ENFORCEMENT, DEPARTMENT OF THE INTERIOR SURFACE COAL MINING AND RECLAMATION OPERATIONS PERMITS AND COAL... Surface Mining Reclamation and Enforcement. (a) Interpretation of § 784.15: Reclamation plan: Postmining...

  11. Improving the Interpretability of Classification Rules Discovered by an Ant Colony Algorithm: Extended Results.

    PubMed

    Otero, Fernando E B; Freitas, Alex A

    2016-01-01

    Most ant colony optimization (ACO) algorithms for inducing classification rules use an ACO-based procedure to create a rule in a one-at-a-time fashion. An improved search strategy has been proposed in the cAnt-Miner[Formula: see text] algorithm, where an ACO-based procedure is used to create a complete list of rules (ordered rules), i.e., the ACO search is guided by the quality of a list of rules instead of an individual rule. In this paper we propose an extension of the cAnt-Miner[Formula: see text] algorithm to discover a set of rules (unordered rules). The main motivations for this work are to improve the interpretation of individual rules by discovering a set of rules and to evaluate the impact on the predictive accuracy of the algorithm. We also propose a new measure to evaluate the interpretability of the discovered rules to mitigate the fact that the commonly used model size measure ignores how the rules are used to make a class prediction. Comparisons with state-of-the-art rule induction algorithms, support vector machines, and the cAnt-Miner[Formula: see text] producing ordered rules are also presented.

  12. Advances in algorithm fusion for automated sea mine detection and classification

    NASA Astrophysics Data System (ADS)

    Dobeck, Gerald J.; Cobb, J. Tory

    2002-11-01

    Along with other sensors, the Navy uses high-resolution sonar to detect and classify sea mines in mine-hunting operations. Scientists and engineers have devoted substantial effort to the development of automated detection and classification (D/C) algorithms for these high-resolution systems. Several factors spurred these efforts, including: (1) aids for operators to reduce work overload; (2) more optimal use of all available data; and (3) the introduction of unmanned minehunting systems. The environments where sea mines are typically laid (harbor areas, shipping lanes, and the littorals) give rise to many false alarms caused by natural, biologic, and manmade clutter. The objective of the automated D/C algorithms is to eliminate most of these false alarms while maintaining a very high probability of mine detection and classification (PdPc). In recent years, the benefits of fusing the outputs of multiple D/C algorithms (Algorithm Fusion) have been studied. To date, the results have been remarkable, including reliable robustness to new environments. In this paper a brief history of existing Algorithm Fusion technology and some techniques recently used to improve performance are presented. An exploration of new developments is presented in conclusion.

  13. Hybrid analysis for indicating patients with breast cancer using temperature time series.

    PubMed

    Silva, Lincoln F; Santos, Alair Augusto S M D; Bravo, Renato S; Silva, Aristófanes C; Muchaluat-Saade, Débora C; Conci, Aura

    2016-07-01

    Breast cancer is the most common cancer among women worldwide. Diagnosis and treatment in early stages increase cure chances. The temperature of cancerous tissue is generally higher than that of healthy surrounding tissues, making thermography an option to be considered in screening strategies of this cancer type. This paper proposes a hybrid methodology for analyzing dynamic infrared thermography in order to indicate patients with risk of breast cancer, using unsupervised and supervised machine learning techniques, which characterizes the methodology as hybrid. The dynamic infrared thermography monitors or quantitatively measures temperature changes on the examined surface, after a thermal stress. In the dynamic infrared thermography execution, a sequence of breast thermograms is generated. In the proposed methodology, this sequence is processed and analyzed by several techniques. First, the region of the breasts is segmented and the thermograms of the sequence are registered. Then, temperature time series are built and the k-means algorithm is applied on these series using various values of k. Clustering formed by k-means algorithm, for each k value, is evaluated using clustering validation indices, generating values treated as features in the classification model construction step. A data mining tool was used to solve the combined algorithm selection and hyperparameter optimization (CASH) problem in classification tasks. Besides the classification algorithm recommended by the data mining tool, classifiers based on Bayesian networks, neural networks, decision rules and decision tree were executed on the data set used for evaluation. Test results support that the proposed analysis methodology is able to indicate patients with breast cancer. Among 39 tested classification algorithms, K-Star and Bayes Net presented 100% classification accuracy. 
Furthermore, among the Bayes Net, multi-layer perceptron, decision table and random forest classification algorithms, an average accuracy of 95.38% was obtained. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  14. Photometric redshift estimation based on data mining with PhotoRApToR

    NASA Astrophysics Data System (ADS)

    Cavuoti, S.; Brescia, M.; De Stefano, V.; Longo, G.

    2015-03-01

    Photometric redshifts (photo-z) are crucial to the scientific exploitation of modern panchromatic digital surveys. In this paper we present PhotoRApToR (Photometric Research Application To Redshift): a Java/C++ based desktop application capable of solving non-linear regression and multi-variate classification problems, specialized in particular for photo-z estimation. It embeds a machine learning algorithm, namely a multi-layer neural network trained by the Quasi Newton learning rule, and special tools dedicated to pre- and post-processing data. PhotoRApToR has been successfully tested on several scientific cases. The application is available for free download from the DAME Program web site.

  15. Learning Behavior Characterization with Multi-Feature, Hierarchical Activity Sequences

    ERIC Educational Resources Information Center

    Ye, Cheng; Segedy, James R.; Kinnebrew, John S.; Biswas, Gautam

    2015-01-01

    This paper discusses Multi-Feature Hierarchical Sequential Pattern Mining, MFH-SPAM, a novel algorithm that efficiently extracts patterns from students' learning activity sequences. This algorithm extends an existing sequential pattern mining algorithm by dynamically selecting the level of specificity for hierarchically-defined features…

  16. Application of the pessimistic pruning to increase the accuracy of C4.5 algorithm in diagnosing chronic kidney disease

    NASA Astrophysics Data System (ADS)

    Muslim, M. A.; Herowati, A. J.; Sugiharti, E.; Prasetiyo, B.

    2018-03-01

    Data mining is a technique for uncovering valuable, previously unknown patterns buried in large data collections. Data mining has been applied in the healthcare industry. One data mining technique is classification. Decision trees are one family of classification methods, and C4.5 is a decision tree algorithm. A classifier is designed by applying pessimistic pruning in the C4.5 algorithm for diagnosing chronic kidney disease. Pessimistic pruning is used to identify and remove branches that are not needed, in order to avoid overfitting the decision tree generated by the C4.5 algorithm. In this paper, the results obtained using these classifiers are presented and discussed. Using pessimistic pruning increases the accuracy of the C4.5 algorithm by 1.5 percentage points, from 95% to 96.5%, in diagnosing chronic kidney disease.
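
    The pruning decision described in this abstract can be illustrated with a pessimistic error estimate: an upper confidence bound on the error rate observed at a node. The sketch below uses the normal-approximation form of that bound found in textbook presentations of C4.5-style pruning, with a confidence factor of 0.25 assumed as the default. The paper's exact procedure is not given in the abstract, and the function names pessimistic_error and should_prune are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def pessimistic_error(errors, n, cf=0.25):
    """Upper confidence bound on the true error rate of a node.

    errors: training instances misclassified at the node
    n:      training instances reaching the node
    cf:     confidence factor (assumed default 0.25); smaller values
            give a more pessimistic bound and therefore more pruning
    """
    z = NormalDist().inv_cdf(1.0 - cf)
    f = errors / n
    numerator = f + z * z / (2 * n) + z * sqrt(f / n - f * f / n + z * z / (4 * n * n))
    return numerator / (1 + z * z / n)


def should_prune(leaf_counts, children_counts, cf=0.25):
    """Decide whether to replace a subtree with a single leaf.

    leaf_counts:     (errors, n) if the subtree were collapsed to one leaf
    children_counts: list of (errors, n), one pair per current leaf
    """
    total = sum(n for _, n in children_counts)
    subtree_estimate = sum(
        n * pessimistic_error(e, n, cf) for e, n in children_counts
    ) / total
    leaf_errors, leaf_n = leaf_counts
    return pessimistic_error(leaf_errors, leaf_n, cf) <= subtree_estimate
```

    The estimate penalizes small leaves heavily, so a subtree that splits 14 instances into leaves of 6, 2 and 6 can look worse than the single leaf that replaces it even when the raw training error is the same.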

  17. Spectral methods to detect surface mines

    NASA Astrophysics Data System (ADS)

    Winter, Edwin M.; Schatten Silvious, Miranda

    2008-04-01

    Over the past five years, advances have been made in the spectral detection of surface mines under minefield detection programs at the U. S. Army RDECOM CERDEC Night Vision and Electronic Sensors Directorate (NVESD). The problem of detecting surface land mines ranges from the relatively simple, the detection of large anti-vehicle mines on bare soil, to the very difficult, the detection of anti-personnel mines in thick vegetation. While spatial and spectral approaches can be applied to the detection of surface mines, spatial-only detection requires many pixels-on-target such that the mine is actually imaged and shape-based features can be exploited. This method is unreliable in vegetated areas because only part of the mine may be exposed, while spectral detection is possible without the mine being resolved. At NVESD, hyperspectral and multi-spectral sensors throughout the reflection and thermal spectral regimes have been applied to the mine detection problem. Data has been collected on mines in forest and desert regions and algorithms have been developed both to detect the mines as anomalies and to detect the mines based on their spectral signature. In addition to the detection of individual mines, algorithms have been developed to exploit the similarities of mines in a minefield to improve their detection probability. In this paper, the types of spectral data collected over the past five years will be summarized along with the advances in algorithm development.

  18. Data Mining: The Art of Automated Knowledge Extraction

    NASA Astrophysics Data System (ADS)

    Karimabadi, H.; Sipes, T.

    2012-12-01

    Data mining algorithms are used routinely in a wide variety of fields and they are gaining adoption in the sciences. The realities of real world data analysis are that (a) data has flaws, and (b) the models and assumptions that we bring to the data are inevitably flawed, and/or biased and misspecified in some way. Data mining can improve data analysis by detecting anomalies in the data, checking for consistency of the user model assumptions, and deciphering complex patterns and relationships that could not be uncovered otherwise. The common form of data collected from in situ spacecraft measurements is multi-variate time series, which represents one of the most challenging problems in data mining. We have successfully developed algorithms to deal with such data and have extended the algorithms to handle streaming data. In this talk, we illustrate the utility of our algorithms through several examples including automated detection of reconnection exhausts in the solar wind and flux ropes in the magnetotail. We also show examples from successful applications of our technique to analysis of 3D kinetic simulations. With an eye to the future, we provide an overview of our upcoming plans that include collaborative data mining, expert outsourcing data mining, computer vision for image analysis, among others. Finally, we discuss the integration of data mining algorithms with web-based services such as VxOs and other Heliophysics data centers and the resulting capabilities that it would enable.

  19. Global Optimization Ensemble Model for Classification Methods

    PubMed Central

    Anwar, Hina; Qamar, Usman; Muzaffar Qureshi, Abdul Wahab

    2014-01-01

    Supervised learning is the process of data mining for deducing rules from training datasets. A broad array of supervised learning algorithms exists, every one of them with its own advantages and drawbacks. There are some basic issues that affect the accuracy of classifier while solving a supervised learning problem, like bias-variance tradeoff, dimensionality of input space, and noise in the input data space. All these problems affect the accuracy of classifier and are the reason that there is no global optimal method for classification. There is not any generalized improvement method that can increase the accuracy of any classifier while addressing all the problems stated above. This paper proposes a global optimization ensemble model for classification methods (GMC) that can improve the overall accuracy for supervised learning problems. The experimental results on various public datasets showed that the proposed model improved the accuracy of the classification models from 1% to 30% depending upon the algorithm complexity. PMID:24883382

  20. Data Streams: An Overview and Scientific Applications

    NASA Astrophysics Data System (ADS)

    Aggarwal, Charu C.

    In recent years, advances in hardware technology have facilitated the ability to collect data continuously. Simple transactions of everyday life such as using a credit card, a phone, or browsing the web lead to automated data storage. Similarly, advances in information technology have led to large flows of data across IP networks. In many cases, these large volumes of data can be mined for interesting and relevant information in a wide variety of applications. When the volume of the underlying data is very large, it leads to a number of computational and mining challenges: With increasing volume of the data, it is no longer possible to process the data efficiently by using multiple passes. Rather, one can process a data item at most once. This leads to constraints on the implementation of the underlying algorithms. Therefore, stream mining algorithms typically need to be designed so that the algorithms work with one pass of the data. In most cases, there is an inherent temporal component to the stream mining process. This is because the data may evolve over time. This behavior of data streams is referred to as temporal locality. Therefore, a straightforward adaptation of one-pass mining algorithms may not be an effective solution to the task. Stream mining algorithms need to be carefully designed with a clear focus on the evolution of the underlying data. Another important characteristic of data streams is that they are often mined in a distributed fashion. Furthermore, the individual processors may have limited processing and memory. Examples of such cases include sensor networks, in which it may be desirable to perform in-network processing of data streams with limited processing and memory [1, 2]. This chapter will provide an overview of the key challenges in stream mining algorithms which arise from the unique setup in which these problems are encountered. This chapter is organized as follows.
In the next section, we will discuss the generic challenges that stream mining poses to a variety of data management and data mining problems. The next section also deals with several issues which arise in the context of data stream management. In Sect. 3, we discuss several mining algorithms on the data stream model. Section 4 discusses various scientific applications of data streams. Section 5 discusses the research directions and conclusions.
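
    The one-pass, limited-memory constraint described in this abstract can be made concrete with a classic frequent-items summary. The sketch below uses the Misra-Gries algorithm, chosen here purely as an illustration and not taken from the chapter; the function name misra_gries and the parameter k are assumptions.

```python
def misra_gries(stream, k):
    """Summarize frequent items in one pass using at most k - 1 counters.

    Every item that occurs more than n / k times in a stream of length n
    is guaranteed to be present in the returned summary. The stored counts
    are lower bounds on the true frequencies.
    """
    counters = {}
    for item in stream:  # each element is seen exactly once
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those reaching zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

    Passing a generator or iterator as stream makes the single-pass property explicit, since the data cannot be revisited. The summary may also contain items below the n / k threshold, so a second pass (when one is possible) is the usual way to confirm exact counts.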

  1. Collaborative Data Mining Tool for Education

    ERIC Educational Resources Information Center

    Garcia, Enrique; Romero, Cristobal; Ventura, Sebastian; Gea, Miguel; de Castro, Carlos

    2009-01-01

    This paper describes a collaborative educational data mining tool based on association rule mining for the continuous improvement of e-learning courses allowing teachers with similar course's profile sharing and scoring the discovered information. This mining tool is oriented to be used by instructors non experts in data mining such that, its…

  2. 77 FR 44155 - Administration of Mining Claims and Sites

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-07-27

    ... 1004-AE27 Administration of Mining Claims and Sites AGENCY: Bureau of Land Management, Interior. ACTION... on locating, recording, and maintaining mining claims or sites. In this rule, the BLM amends its... placer mining claims. The law specifies that the holder of an unpatented placer mining claim must pay the...

  3. 43 CFR 3487.1 - Logical mining units.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 43 Public Lands: Interior 2 2011-10-01 2011-10-01 false Logical mining units. 3487.1 Section 3487..., DEPARTMENT OF THE INTERIOR MINERALS MANAGEMENT (3000) COAL EXPLORATION AND MINING OPERATIONS RULES Logical Mining Unit § 3487.1 Logical mining units. (a) An LMU shall become effective only upon approval of the...

  4. Application of Three Existing Stope Boundary Optimisation Methods in an Operating Underground Mine

    NASA Astrophysics Data System (ADS)

    Erdogan, Gamze; Yavuz, Mahmut

    2017-12-01

    The underground mine planning and design optimisation process has received little attention because of the complexity and variability of problems in underground mines. Although a number of optimisation studies and software tools are available and some of them, in particular, have been implemented effectively to determine the ultimate-pit limits in an open pit mine, there is still a lack of studies on the optimisation of ultimate stope boundaries in underground mines. The proposed approaches for this purpose aim at maximizing the economic profit by selecting the best possible layout under operational, technical and physical constraints. In this paper, three existing heuristic techniques, namely the Floating Stope Algorithm, the Maximum Value Algorithm and the Mineable Shape Optimiser (MSO), are examined for optimisation of stope layout in a case study. Each technique is assessed in terms of applicability, algorithm capabilities and limitations considering the underground mine planning challenges. Finally, the results are evaluated and compared.

  5. Health-Mining: a Disease Management Support Service based on Data Mining and Rule Extraction.

    PubMed

    Bei, Andrea; De Luca, Stefano; Ruscitti, Giancarlo; Salamon, Diego

    2005-01-01

    Disease management is the collection of processes aimed at controlling health care and improving its quality while reducing the overall cost of procedures. Our system, Health-Mining, is a Decision Support System with the objective of controlling the adequacy of hospitalization and therapies, determining the effective use of standard guidelines and eventually identifying better ones emerging from medical practice (Evidence Based Medicine). In realizing the system, we aim to create a path to the construction of admission-appropriateness criteria that are valid at an international level. A main goal of the project is rule extraction and the identification of rules that are adequate in terms of efficacy, quality and cost reduction, especially in view of fast-changing technologies and medicines. We tested Health-Mining in a real test case for an Italian Region, Regione Veneto, on the installation of pacemakers and ICDs.

  6. Mining Rare Associations between Biological Ontologies

    PubMed Central

    Benites, Fernando; Simon, Svenja; Sapozhnikova, Elena

    2014-01-01

    The constantly increasing volume and complexity of available biological data requires new methods for their management and analysis. An important challenge is the integration of information from different sources in order to discover possible hidden relations between already known data. In this paper we introduce a data mining approach which relates biological ontologies by mining cross and intra-ontology pairwise generalized association rules. Its advantage is sensitivity to rare associations, for these are important for biologists. We propose a new class of interestingness measures designed for hierarchically organized rules. These measures allow one to select the most important rules and to take into account rare cases. They favor rules with an actual interestingness value that exceeds the expected value. The latter is calculated taking into account the parent rule. We demonstrate this approach by applying it to the analysis of data from Gene Ontology and GPCR databases. Our objective is to discover interesting relations between two different ontologies or parts of a single ontology. The association rules that are thus discovered can provide the user with new knowledge about underlying biological processes or help improve annotation consistency. The obtained results show that produced rules represent meaningful and quite reliable associations. PMID:24404165

  7. Mining rare associations between biological ontologies.

    PubMed

    Benites, Fernando; Simon, Svenja; Sapozhnikova, Elena

    2014-01-01

    The constantly increasing volume and complexity of available biological data requires new methods for their management and analysis. An important challenge is the integration of information from different sources in order to discover possible hidden relations between already known data. In this paper we introduce a data mining approach which relates biological ontologies by mining cross and intra-ontology pairwise generalized association rules. Its advantage is sensitivity to rare associations, for these are important for biologists. We propose a new class of interestingness measures designed for hierarchically organized rules. These measures allow one to select the most important rules and to take into account rare cases. They favor rules with an actual interestingness value that exceeds the expected value. The latter is calculated taking into account the parent rule. We demonstrate this approach by applying it to the analysis of data from Gene Ontology and GPCR databases. Our objective is to discover interesting relations between two different ontologies or parts of a single ontology. The association rules that are thus discovered can provide the user with new knowledge about underlying biological processes or help improve annotation consistency. The obtained results show that produced rules represent meaningful and quite reliable associations.

  8. pubmed.mineR: an R package with text-mining algorithms to analyse PubMed abstracts.

    PubMed

    Rani, Jyoti; Shah, A B Rauf; Ramachandran, Srinivasan

    2015-10-01

    The PubMed literature database is a valuable source of information for scientific research. It is rich in biomedical literature with more than 24 million citations. Data-mining of voluminous literature is a challenging task. Although several text-mining algorithms have been developed in recent years with a focus on data visualization, they are limited in speed, are rigid, and are not available as open source. We have developed an R package, pubmed.mineR, wherein we have combined the advantages of existing algorithms, overcome their limitations, and offer user flexibility and links with other packages in Bioconductor and the Comprehensive R Archive Network (CRAN) in order to expand the user capabilities for executing multifaceted approaches. Three case studies are presented, namely, 'Evolving role of diabetes educators', 'Cancer risk assessment' and 'Dynamic concepts on disease and comorbidity' to illustrate the use of pubmed.mineR. The package generally runs fast with small elapsed times on regular workstations even on large corpus sizes and with compute intensive functions. The pubmed.mineR package is available at http://cran.r-project.org/web/packages/pubmed.mineR.

  9. 76 FR 35801 - Examinations of Work Areas in Underground Coal Mines and Pattern of Violations

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-06-20

    ..., 1219-AB73 Examinations of Work Areas in Underground Coal Mines and Pattern of Violations AGENCY: Mine... public hearings on the Agency's proposed rules for Examinations of Work Areas in Underground Coal Mines... Underground Coal Mines' submissions, and with ``RIN 1219-AB73'' for Pattern of Violations' submissions...

  10. 78 FR 48591 - Refuge Alternatives for Underground Coal Mines

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-08-08

    ... Administration 30 CFR Parts 7 and 75 Refuge Alternatives for Underground Coal Mines; Proposed Rules Federal... Underground Coal Mines AGENCY: Mine Safety and Health Administration, Labor. ACTION: Limited reopening of the... for miners to deploy and use refuge alternatives in underground coal mines. The U.S. Court of Appeals...

  11. 75 FR 20918 - High-Voltage Continuous Mining Machine Standard for Underground Coal Mines

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-04-22

    ... DEPARTMENT OF LABOR Mine Safety and Health Administration 30 CFR Parts 18 and 75 RIN 1219-AB34 High-Voltage Continuous Mining Machine Standard for Underground Coal Mines Correction In rule document 2010-7309 beginning on page 17529 in the issue of Tuesday, April 6, 2010, make the following correction...

  12. Learner Typologies Development Using OIndex and Data Mining Based Clustering Techniques

    ERIC Educational Resources Information Center

    Luan, Jing

    2004-01-01

    This explorative data mining project used a distance-based clustering algorithm to study 3 indicators, called OIndex, of student behavioral data and stabilized at a 6-cluster scenario following an exhaustive explorative study of 4, 5, and 6 cluster scenarios produced by K-Means and TwoStep algorithms. Using principles in data mining, the study…

  13. ConGEMs: Condensed Gene Co-Expression Module Discovery Through Rule-Based Clustering and Its Application to Carcinogenesis.

    PubMed

    Mallik, Saurav; Zhao, Zhongming

    2017-12-28

    For transcriptomic analysis, there are numerous microarray-based genomic data, especially those generated for cancer research. The typical analysis measures the difference between a cancer sample-group and a matched control group for each transcript or gene. Association rule mining is used to discover interesting item sets through rule-based methodology. Thus, it has advantages for finding cause-effect relationships between the transcripts. In this work, we introduce two new rule-based similarity measures, the weighted rank-based Jaccard and Cosine measures, and then propose a novel computational framework to detect condensed gene co-expression modules (ConGEMs) through the association rule-based learning system and the weighted similarity scores. In practice, the list of evolved condensed markers that consists of both singular and complex markers in nature depends on the corresponding condensed gene sets in either antecedent or consequent of the rules of the resultant modules. In our evaluation, these markers could be supported by literature evidence, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway and Gene Ontology annotations. Specifically, we preliminarily identified differentially expressed genes using an empirical Bayes test. A recently developed algorithm, RANWAR, was then utilized to determine the association rules from these genes. Based on that, we computed the integrated similarity scores of these rule-based similarity measures between each rule-pair, and the resultant scores were used for clustering to identify the co-expressed rule-modules. We applied our method to a gene expression dataset for lung squamous cell carcinoma and a genome methylation dataset for uterine cervical carcinogenesis. Our proposed module discovery method produced better results than the traditional gene-module discovery measures. In summary, our proposed rule-based method is useful for exploring biomarker modules from transcriptomic data.
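    The rule-to-rule similarity idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the weighting or ranking scheme, so the code below uses the plain, unweighted Jaccard and cosine similarities over the item sets of two rules; the rule representation (an antecedent set and a consequent set) and all function names are assumptions made for illustration only.

```python
import math

def rule_items(rule):
    """Return all items (e.g. genes) in a rule given as (antecedent, consequent)."""
    antecedent, consequent = rule
    return set(antecedent) | set(consequent)

def jaccard(rule_a, rule_b):
    """Unweighted Jaccard similarity: shared items over all items."""
    a, b = rule_items(rule_a), rule_items(rule_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine(rule_a, rule_b):
    """Unweighted set cosine similarity: shared items over the
    geometric mean of the two set sizes."""
    a, b = rule_items(rule_a), rule_items(rule_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

# Hypothetical rules over made-up gene identifiers.
r1 = ({"g1", "g2"}, {"g3"})
r2 = ({"g2"}, {"g3", "g4"})
```

    A pairwise matrix of such scores could then be handed to any standard clustering routine; how the published method weights and combines the two measures is not stated here.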

  14. Using fuzzy data mining to diagnose patients' degrees of melancholia

    NASA Astrophysics Data System (ADS)

    Huang, Yo-Ping; Kuo, Wen-Lin

    2011-06-01

    The common treatments for melancholia are psychotherapy and medication. The psychotherapy treatment, which this study focuses on, is limited by time and location. It is easy for psychiatrists to grasp information from clinical manifestations, but difficult for them to collect information from patients' daily conversations or emotions. A system that enables psychiatrists to capture patients' daily symptoms would be of great help in treatment. This study proposes to use a fuzzy data mining algorithm to find association rules among keywords segmented from patients' daily voice/text messages to help psychiatrists extract useful information before outpatient service. Patients with melancholia can use devices such as mobile phones or computers to record their own emotions anytime and anywhere and then upload the recorded files to the back-end server for further analysis. The analytical results can be used by psychiatrists to diagnose patients' degrees of melancholia. Experimental results will be given to verify the effectiveness of the proposed methodology.

  15. Privacy Is Become with, Data Perturbation

    NASA Astrophysics Data System (ADS)

    Singh, Er. Niranjan; Singhai, Niky

    2011-06-01

    Privacy is becoming an increasingly important issue in many data mining applications that deal with health care, security, finance, behavior and other types of sensitive data. It is becoming particularly important in counterterrorism and homeland security-related applications. We touch upon several techniques of masking the data, namely random distortion, including uniform and Gaussian noise, applied to the data in order to protect it. These perturbation schemes are equivalent to additive perturbation after the logarithmic transformation. Due to the large volume of research in deriving private information from additive-noise-perturbed data, the security of these perturbation schemes is questionable. Many artificial intelligence and statistical methods exist for data analysis and interpretation. Identifying and measuring the interestingness of patterns and rules discovered, or to be discovered, is essential for the evaluation of the mined knowledge and the KDD process as a whole. While some concrete measurements exist, assessing the interestingness of discovered knowledge is still an important research issue. As the tool for the algorithm implementations we chose MATLAB, the language of choice in the industrial world.

  16. Comparison analysis for classification algorithm in data mining and the study of model use

    NASA Astrophysics Data System (ADS)

    Chen, Junde; Zhang, Defu

    2018-04-01

    As a key technique in data mining, the classification algorithm has received extensive attention. Through an experiment with classification algorithms on UCI data sets, we give a comparison analysis method for the different algorithms, using a statistical test. In addition, an adaptive diagnosis model for preventing electricity stealing and leakage is given as a specific case in the paper.

  17. Multi-Level Sequential Pattern Mining Based on Prime Encoding

    NASA Astrophysics Data System (ADS)

    Lianglei, Sun; Yun, Li; Jiang, Yin

    Encoding serves not only to express the hierarchical relationship but also to facilitate the identification of the relationship between different levels, which directly affects the efficiency of algorithms for mining multi-level sequential patterns. In this paper, we prove that a single division operation can decide the parent-child relationship between different levels by using prime encoding, and we present the PMSM and CROSS-PMSM algorithms, which are based on prime encoding, for mining multi-level and cross-level sequential patterns respectively. Experimental results show that the algorithms can effectively extract multi-level and cross-level sequential patterns from the sequence database.
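    The claim that one division decides the parent-child relationship can be illustrated with a small sketch. The abstract does not spell out the encoding, so the scheme below is an assumption: each node receives its own fresh prime, and its code is that prime multiplied by its parent's code. Under that assumption, one node is an ancestor of another exactly when its code divides the other's code. The function names and the example hierarchy are hypothetical.

```python
from itertools import count

def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for small hierarchies)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n

def encode(tree, root):
    """Give every node the product of its parent's code and a fresh prime.

    `tree` maps a node to the list of its children.
    """
    gen = primes()
    codes = {root: next(gen)}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in tree.get(node, []):
            codes[child] = codes[node] * next(gen)
            stack.append(child)
    return codes

def is_ancestor(codes, a, b):
    """True if `a` is a proper ancestor of `b`, decided by one remainder test."""
    return a != b and codes[b] % codes[a] == 0

# Hypothetical item hierarchy.
TREE = {"food": ["drink", "snack"], "drink": ["milk", "juice"]}
CODES = encode(TREE, "food")
```

    Because every node contributes a prime that no other node uses, divisibility cannot hold by accident between unrelated branches. The price is that codes grow quickly with depth, which Python's arbitrary-precision integers absorb but fixed-width integers would not.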

  18. 43 CFR 3482.3 - Mining operations maps.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 43 Public Lands: Interior 2 2011-10-01 2011-10-01 false Mining operations maps. 3482.3 Section... MANAGEMENT, DEPARTMENT OF THE INTERIOR MINERALS MANAGEMENT (3000) COAL EXPLORATION AND MINING OPERATIONS RULES Exploration and Resource Recovery and Protection Plans § 3482.3 Mining operations maps. (a...

  19. Novel approach for identification of influenza virus host range and zoonotic transmissible sequences by determination of host-related associative positions in viral genome segments.

    PubMed

    Kargarfard, Fatemeh; Sami, Ashkan; Mohammadi-Dehcheshmeh, Manijeh; Ebrahimie, Esmaeil

    2016-11-16

    Recent (2013 and 2009) zoonotic transmission of avian or porcine influenza to humans highlights an increase in host range by evading species barriers. Gene reassortment or antigenic shift between viruses from two or more hosts can generate a new life-threatening virus when the new shuffled virus is no longer recognized by antibodies existing within human populations. There is no large scale study to help understand the underlying mechanisms of host transmission. Furthermore, there is no clear understanding of how different segments of the influenza genome contribute to the final determination of host range. To obtain insight into the rules underpinning host range determination, various supervised machine learning algorithms were employed to mine reassortment changes in different viral segments in a range of hosts. Our multi-host dataset contained whole segments of 674 influenza strains organized into three host categories: avian, human, and swine. Some of the sequences were assigned to multiple hosts. The data therefore form a multi-labeled dataset, and we utilized a multi-label learning method to identify discriminative sequence sites. Then algorithms such as CBA, Ripper, and decision tree were applied to extract informative and descriptive association rules for each viral protein segment. We found informative rules in all segments that are common within the same host class but varied between different hosts. For example, for infection of an avian host, HA14V and NS1230S were the most important discriminative and combinatorial positions. Host range identification is facilitated by high-support combined rules in this study. Our major goal was to detect discriminative genomic positions that were able to identify multi-host viruses, because such viruses are likely to cause pandemics or disastrous epidemics.

  20. Effective Diagnosis of Alzheimer's Disease by Means of Association Rules

    NASA Astrophysics Data System (ADS)

    Chaves, R.; Ramírez, J.; Górriz, J. M.; López, M.; Salas-Gonzalez, D.; Illán, I.; Segovia, F.; Padilla, P.

    In this paper we present a novel classification method of SPECT images for the early diagnosis of Alzheimer's disease (AD). The proposed method is based on Association Rules (ARs) aiming to discover interesting associations between attributes contained in the database. The system first uses voxel-as-features (VAF) and Activation Estimation (AE) to find three-dimensional activated brain regions of interest (ROIs) for each patient. These ROIs then act as inputs for mining ARs between activated blocks for controls, with a specified minimum support and minimum confidence. ARs are mined in supervised mode, using information previously extracted from the most discriminant rules for centering interest in the relevant brain areas, reducing the computational requirement of the system. Finally, the classification process is performed depending on the number of previously mined rules verified by each subject, yielding a classification accuracy of up to 95.87%, thus outperforming recently developed methods for AD diagnosis.
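    The minimum support and minimum confidence thresholds mentioned in this abstract follow the standard association rule definitions, which the sketch below computes directly. It is a generic illustration, not the authors' pipeline: the transactions, the item names and the function names are all made up, and a real miner would enumerate candidate rules rather than test a single one.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    s_antecedent = support(transactions, antecedent)
    if s_antecedent == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / s_antecedent

def passes(transactions, antecedent, consequent, min_support, min_confidence):
    """Keep a rule only if it meets both thresholds."""
    both = set(antecedent) | set(consequent)
    return (support(transactions, both) >= min_support
            and confidence(transactions, antecedent, consequent) >= min_confidence)

# Hypothetical transactions: each set lists the blocks activated for one subject.
TRANSACTIONS = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"b", "c"}]
```

    With these four transactions, the rule a -> b appears in two of four sets and holds in two of the three sets that contain a, so it passes a 0.5 support threshold but fails a 0.7 confidence threshold.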

  1. [Exploring the clinical characters of Shugan Jieyu capsule through text mining].

    PubMed

    Pu, Zheng-Ping; Xia, Jiang-Ming; Xie, Wei; He, Jin-Cai

    2017-09-01

    The study aimed to explore the clinical characteristics of Shugan Jieyu capsule through text mining. The data sets on Shugan Jieyu capsule were downloaded from the CMCC database by literature retrieval covering May 2009 to Jan 2016. Rules of Chinese medical patterns, diseases, symptoms and combination treatment were mined out by a data slicing algorithm, and they were demonstrated in frequency tables and a two-dimension based network. A total of 190 articles were included. The outcomes suggested that SC was most frequently correlated with liver Qi stagnation. Primary depression, depression due to brain disease, concomitant depression followed by physical diseases, concomitant depression followed by schizophrenia and functional dyspepsia were the main diseases treated by Shugan Jieyu capsule. Symptoms like low mood, psychic anxiety, somatic anxiety and autonomic nerve dysfunction were mainly relieved by Shugan Jieyu capsule. For combination treatment, Shugan Jieyu capsule was most commonly used with paroxetine, sertraline and fluoxetine. The research suggested that syndrome types and mining results of Shugan Jieyu capsule were almost the same as its instructions. Syndrome of malnutrition of heart spirit was the potential Chinese medical pattern of Shugan Jieyu capsule. Primary comorbid anxiety and depression, concomitant comorbid anxiety and depression followed by physical diseases, and postpartum depression were potential diseases treated by Shugan Jieyu capsule. Copyright© by the Chinese Pharmaceutical Association.

  2. Community detection in complex networks by using membrane algorithm

    NASA Astrophysics Data System (ADS)

    Liu, Chuang; Fan, Linan; Liu, Zhou; Dai, Xiang; Xu, Jiamei; Chang, Baoren

    Community detection in complex networks is a key problem of network analysis. In this paper, a new membrane algorithm is proposed to solve community detection in complex networks. The proposed algorithm is based on membrane systems, which consist of objects, reaction rules, and a membrane structure. Each object represents a candidate partition of a complex network, and the quality of objects is evaluated according to network modularity. The reaction rules include evolutionary rules and communication rules. Evolutionary rules are responsible for improving the quality of objects and employ the differential evolutionary algorithm to evolve objects. Communication rules implement the information exchange among membranes. Finally, the proposed algorithm is evaluated on synthetic networks, real-world networks with known real partitions, and large-scale networks with unknown real partitions. The experimental results indicate the superior performance of the proposed algorithm in comparison with other experimental algorithms.
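    Network modularity, the quality score used here to rate candidate partitions, can be computed with a few lines. The sketch below assumes the standard Newman modularity for an undirected, unweighted graph without self-loops, Q = sum over communities of (internal edges / m) - (total degree / 2m)^2, where m is the number of edges; the abstract does not state which variant the authors use, and the function name and example graph are hypothetical.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity of a partition.

    `edges` is a list of (u, v) pairs of an undirected graph with no
    self-loops; `community` maps each node to its community label.
    """
    m = len(edges)
    internal = defaultdict(int)  # edges with both ends in the community
    degree = defaultdict(int)    # sum of node degrees per community
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Hypothetical graph: two triangles joined by a single bridge edge.
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
SPLIT = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
MERGED = {node: "A" for node in range(6)}
```

    Splitting the example graph at the bridge scores higher than placing all nodes in one community, which always scores zero; a search procedure such as the one described in the abstract would use this kind of comparison to prefer one candidate partition over another.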

  3. 76 FR 12852 - Louisiana Regulatory Program/Abandoned Mine Land Reclamation Plan

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-03-09

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 918... Reclamation Plan AGENCY: Office of Surface Mining Reclamation and Enforcement, Interior. ACTION: Final rule; approval of amendment. SUMMARY: We, the Office of Surface Mining Reclamation and Enforcement (OSM), are...

  4. 75 FR 60373 - Louisiana Regulatory Program/Abandoned Mine Land Reclamation Plan

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-09-30

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 918... Reclamation Plan AGENCY: Office of Surface Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule... of Surface Mining Reclamation and Enforcement (OSM), are announcing receipt of a proposed amendment...

  5. 26 CFR 1.614-3 - Rules relating to separate operating mineral interests in the case of mines.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 26 Internal Revenue 7 2011-04-01 2009-04-01 true Rules relating to separate operating mineral interests in the case of mines. 1.614-3 Section 1.614-3 Internal Revenue INTERNAL REVENUE SERVICE, DEPARTMENT OF THE TREASURY (CONTINUED) INCOME TAX (CONTINUED) INCOME TAXES (CONTINUED) Natural Resources § 1...

  6. 26 CFR 1.611-2 - Rules applicable to mines, oil and gas wells, and other natural deposits.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... other natural deposits. 1.611-2 Section 1.611-2 Internal Revenue INTERNAL REVENUE SERVICE, DEPARTMENT OF THE TREASURY (CONTINUED) INCOME TAX (CONTINUED) INCOME TAXES (CONTINUED) Natural Resources § 1.611-2 Rules applicable to mines, oil and gas wells, and other natural deposits. (a) Computation of cost...

  7. An evaluation and implementation of rule-based Home Energy Management System using the Rete algorithm.

    PubMed

    Kawakami, Tomoya; Fujita, Naotaka; Yoshihisa, Tomoki; Tsukamoto, Masahiko

    2014-01-01

    In recent years, sensors have become popular and Home Energy Management Systems (HEMS) play an important role in saving energy without decreasing QoL (Quality of Life). Many rule-based HEMSs have been proposed, and almost all of them assume "IF-THEN" rules. The Rete algorithm is a typical pattern matching algorithm for IF-THEN rules. We have proposed a rule-based HEMS using the Rete algorithm. In the proposed system, rules for managing energy are processed by smart taps in the network, and the loads for processing rules and collecting data are distributed to smart taps. In addition, the amount of rule processing and data collection is reduced by processing rules based on the Rete algorithm. In this paper, we evaluated the proposed system by simulation. In the simulation environment, rules are processed by a smart tap that relates to the action part of each rule. In addition, we implemented the proposed system as a HEMS using smart taps.
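    To make the IF-THEN rule format concrete, the sketch below evaluates rules naively by testing every condition of every rule against the current facts. This is deliberately not the Rete algorithm: Rete avoids this repeated work by sharing and caching condition tests across rules. The rule contents, fact names and function name are all hypothetical.

```python
def matching_actions(rules, facts):
    """Return the action of every rule whose IF-conditions all hold for `facts`.

    Naive matcher: re-evaluates every condition on every call.
    """
    return [action for conditions, action in rules
            if all(condition(facts) for condition in conditions)]

# Hypothetical rules: (list of condition predicates, action name).
RULES = [
    ([lambda f: not f["room_occupied"], lambda f: f["light_on"]], "turn_off_light"),
    ([lambda f: f["power_watts"] > 1500], "warn_high_load"),
]

FACTS = {"room_occupied": False, "light_on": True, "power_watts": 900}
```

    Calling `matching_actions(RULES, FACTS)` selects only the first rule, since the power reading is below the second rule's threshold.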

  8. Analyzing injury severity factors at highway railway grade crossing accidents involving vulnerable road users: A comparative study.

    PubMed

    Ghomi, Haniyeh; Bagheri, Morteza; Fu, Liping; Miranda-Moreno, Luis F

    2016-11-16

    The main objective of this study is to identify the main factors associated with injury severity of vulnerable road users (VRUs) involved in accidents at highway railroad grade crossings (HRGCs) using data mining techniques. This article applies an ordered probit model, association rules, and classification and regression tree (CART) algorithms to the U.S. Federal Railroad Administration's (FRA) HRGC accident database for the period 2007-2013 to identify VRU injury severity factors at HRGCs. The results show that train speed is a key factor influencing injury severity. Further analysis illustrated that the presence of illumination does not reduce the severity of accidents for high-speed trains. In addition, there is a greater propensity toward fatal accidents for elderly road users compared to younger individuals. Interestingly, at night, injury accidents involving female road users are more severe compared to those involving males. The ordered probit model was the primary technique, and CART and association rules act as the supporter and identifier of interactions between variables. All 3 algorithms' results consistently show that the most influential accident factors are train speed, VRU age, and gender. The findings of this research could be applied for identifying high-risk hotspots and developing cost-effective countermeasures targeting VRUs at HRGCs.

  9. Data Mining and Machine Learning in Astronomy

    NASA Astrophysics Data System (ADS)

    Ball, Nicholas M.; Brunner, Robert J.

    We review the current state of data mining and machine learning in astronomy. Data Mining can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black box application of complex computing algorithms that may give little physical insight, and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines, applications from a broad range of astronomy, emphasizing those in which data mining techniques directly contributed to improving science, and important current and future directions, including probability density functions, parallel algorithms, Peta-Scale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.

  10. Revealing Significant Relations between Chemical/Biological Features and Activity: Associative Classification Mining for Drug Discovery

    ERIC Educational Resources Information Center

    Yu, Pulan

    2012-01-01

    Classification, clustering and association mining are major tasks of data mining and have been widely used for knowledge discovery. Associative classification mining, the combination of both association rule mining and classification, has emerged as an indispensable way to support decision making and scientific research. In particular, it offers a…

  11. 30 CFR 49.60 - Requirements for a local mine rescue contest.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... EDUCATION AND TRAINING MINE RESCUE TEAMS Mine Rescue Teams for Underground Coal Mines § 49.60 Requirements... United States; (2) Uses MSHA-recognized rules; (3) Has a minimum of three mine rescue teams competing; (4) Has one or more problems conducted on one or more days with a determined winner; (5) Includes team...

  12. 30 CFR 49.60 - Requirements for a local mine rescue contest.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... EDUCATION AND TRAINING MINE RESCUE TEAMS Mine Rescue Teams for Underground Coal Mines § 49.60 Requirements... United States; (2) Uses MSHA-recognized rules; (3) Has a minimum of three mine rescue teams competing; (4) Has one or more problems conducted on one or more days with a determined winner; (5) Includes team...

  13. 30 CFR 49.60 - Requirements for a local mine rescue contest.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... EDUCATION AND TRAINING MINE RESCUE TEAMS Mine Rescue Teams for Underground Coal Mines § 49.60 Requirements... United States; (2) Uses MSHA-recognized rules; (3) Has a minimum of three mine rescue teams competing; (4) Has one or more problems conducted on one or more days with a determined winner; (5) Includes team...

  14. 30 CFR 49.60 - Requirements for a local mine rescue contest.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... EDUCATION AND TRAINING MINE RESCUE TEAMS Mine Rescue Teams for Underground Coal Mines § 49.60 Requirements... United States; (2) Uses MSHA-recognized rules; (3) Has a minimum of three mine rescue teams competing; (4) Has one or more problems conducted on one or more days with a determined winner; (5) Includes team...

  15. 30 CFR 49.60 - Requirements for a local mine rescue contest.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... EDUCATION AND TRAINING MINE RESCUE TEAMS Mine Rescue Teams for Underground Coal Mines § 49.60 Requirements... United States; (2) Uses MSHA-recognized rules; (3) Has a minimum of three mine rescue teams competing; (4) Has one or more problems conducted on one or more days with a determined winner; (5) Includes team...

  16. A parallel algorithm for finding the shortest exit paths in mines

    NASA Astrophysics Data System (ADS)

    Jastrzab, Tomasz; Buchcik, Agata

    2017-11-01

    In the paper we study the problem of finding the shortest exit path in an underground mine in case of emergency. Since emergency situations, such as underground fires, can put the miners' lives at risk, the ability to quickly determine the safest exit path is crucial. We propose a parallel algorithm capable of finding the shortest path between the safe exit point and any other point in the mine. The algorithm is also able to take into account the characteristics of individual miners, to make the path determination more reliable.
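    The idea of computing shortest paths between the exit and every other point can be sketched with a single-source search started at the exit. The code below is a sequential Dijkstra search, not the parallel algorithm of the paper, and it ignores the per-miner characteristics the abstract mentions; the graph layout, corridor lengths and function names are assumptions for illustration.

```python
import heapq

def distances_from_exit(graph, exit_node):
    """Dijkstra from the exit over an undirected weighted graph.

    `graph` maps a node to a list of (neighbor, length) pairs, with each
    corridor listed in both directions. Returns (dist, toward_exit), where
    toward_exit[v] is the next node on a shortest path from v to the exit.
    """
    dist = {exit_node: 0.0}
    toward_exit = {}
    heap = [(0.0, exit_node)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, length in graph.get(u, []):
            candidate = d + length
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                toward_exit[v] = u
                heapq.heappush(heap, (candidate, v))
    return dist, toward_exit

def path_to_exit(toward_exit, start, exit_node):
    """Follow the stored pointers from `start` to the exit.

    Raises KeyError if `start` cannot reach the exit.
    """
    path = [start]
    while path[-1] != exit_node:
        path.append(toward_exit[path[-1]])
    return path

# Hypothetical mine layout with corridor lengths.
MINE = {
    "exit": [("A", 2.0), ("B", 5.0)],
    "A": [("exit", 2.0), ("B", 2.0)],
    "B": [("A", 2.0), ("exit", 5.0), ("C", 1.0)],
    "C": [("B", 1.0)],
}
DIST, TOWARD_EXIT = distances_from_exit(MINE, "exit")
```

    One search from the exit answers the query for every location at once, which is why the search is rooted at the exit rather than at each miner's position.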

  17. Data quality enhancement and knowledge discovery from relevant signals in acoustic emission

    NASA Astrophysics Data System (ADS)

    Mejia, Felipe; Shyu, Mei-Ling; Nanni, Antonio

    2015-10-01

    The increasing popularity of structural health monitoring has brought with it a growing need for automated data management and data analysis tools. Of great importance are filters that can systematically detect unwanted signals in acoustic emission datasets. This study presents a semi-supervised data mining scheme that detects data belonging to unfamiliar distributions. This type of outlier detection scheme is useful for detecting the presence of new acoustic emission sources, given a training dataset of unwanted signals. In addition to classifying new observations (herein referred to as "outliers") within a dataset, the scheme generates a decision tree that classifies sub-clusters within the outlier context set. The obtained tree can be interpreted as a series of characterization rules for newly-observed data, and they can potentially describe the basic structure of different modes within the outlier distribution. The data mining scheme is first validated on a synthetic dataset, and an attempt is made to confirm the algorithms' ability to discriminate outlier acoustic emission sources from a controlled pencil-lead-break experiment. Finally, the scheme is applied to data from two fatigue crack-growth steel specimens, where it is shown that extracted rules can adequately describe crack-growth related acoustic emission sources while filtering out background "noise." Results show promising performance in filter generation, thereby allowing analysts to extract, characterize, and focus only on meaningful signals.

  18. The LSST Data Mining Research Agenda

    NASA Astrophysics Data System (ADS)

    Borne, K.; Becla, J.; Davidson, I.; Szalay, A.; Tyson, J. A.

    2008-12-01

    We describe features of the LSST science database that are amenable to scientific data mining, object classification, outlier identification, anomaly detection, image quality assurance, and survey science validation. The data mining research agenda includes: scalability (at petabyte scales) of existing machine learning and data mining algorithms; development of grid-enabled parallel data mining algorithms; designing a robust system for brokering classifications from the LSST event pipeline (which may produce 10,000 or more event alerts per night); multi-resolution methods for exploration of petascale databases; indexing of multi-attribute multi-dimensional astronomical databases (beyond spatial indexing) for rapid querying of petabyte databases; and more.

  19. Fusion of multiple quadratic penalty function support vector machines (QPFSVM) for automated sea mine detection and classification

    NASA Astrophysics Data System (ADS)

    Dobeck, Gerald J.; Cobb, J. Tory

    2002-08-01

    The high-resolution sonar is one of the principal sensors used by the Navy to detect and classify sea mines in minehunting operations. For such sonar systems, substantial effort has been devoted to the development of automated detection and classification (D/C) algorithms. These have been spurred by several factors including (1) aids for operators to reduce work overload, (2) more optimal use of all available data, and (3) the introduction of unmanned minehunting systems. The environments where sea mines are typically laid (harbor areas, shipping lanes, and the littorals) give rise to many false alarms caused by natural, biologic, and man-made clutter. The objective of the automated D/C algorithms is to eliminate most of these false alarms while still maintaining a very high probability of mine detection and classification (PdPc). In recent years, the benefits of fusing the outputs of multiple D/C algorithms have been studied. We refer to this as Algorithm Fusion. The results have been remarkable, including reliable robustness to new environments. The Quadratic Penalty Function Support Vector Machine (QPFSVM) algorithm to aid in the automated detection and classification of sea mines is introduced in this paper. The QPFSVM algorithm is easy to train, simple to implement, and robust to feature space dimension. Outputs of successive SVM algorithms are cascaded in stages (fused) to improve the Probability of Classification (Pc) and reduce the number of false alarms. Even though our experience has been gained in the area of sea mine detection and classification, the principles described herein are general and can be applied to fusion of any D/C problem (e.g., automated medical diagnosis or automatic target recognition for ballistic missile defense).

  20. 76 FR 76104 - Arkansas Regulatory Program and Abandoned Mine Land Reclamation Plan

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-12-06

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 904... Reclamation Plan AGENCY: Office of Surface Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; public comment period on proposed amendment. SUMMARY: We, the Office of Surface Mining Reclamation and...

  1. 77 FR 55430 - Arkansas Regulatory Program and Abandoned Mine Land Reclamation Plan

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-09-10

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 904... Reclamation Plan AGENCY: Office of Surface Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; public comment period on proposed amendment. SUMMARY: We, the Office of Surface Mining Reclamation and...

  2. Association Rule Mining from an Intelligent Tutor

    ERIC Educational Resources Information Center

    Dogan, Buket; Camurcu, A. Yilmaz

    2008-01-01

    Educational data mining is a very novel research area, offering fertile ground for many interesting data mining applications. Educational data mining can extract useful information from educational activities for better understanding and assessment of the student learning process. In this way, it is possible to explore how students learn topics in…

  3. Applying Suffix Rules to Organization Name Recognition

    NASA Astrophysics Data System (ADS)

    Inui, Takashi; Murakami, Koji; Hashimoto, Taiichi; Utsumi, Kazuo; Ishikawa, Masamichi

    This paper presents a method for boosting the performance of organization name recognition, which is a part of named entity recognition (NER). Although gazetteers (lists of NEs) are known to be among the effective features for supervised machine learning approaches to the NER task, previous methods that applied gazetteers to NER were very simple: the gazetteers were used only to search for exact matches between the input text and the NEs they contain. The proposed method generates regular expression rules from gazetteers and, with these rules, realizes high-coverage searches based on looser matches between input text and NEs. To generate these rules, we focus on two well-known characteristics of NE expressions: 1) most NE expressions can be divided into two parts, a class-reference part and an instance-reference part; 2) for most NE expressions, the class-reference part is located at the suffix position. A pattern mining algorithm runs on the set of NEs in the gazetteers and finds frequent word sequences from which NEs are constructed. Then, we employ as suffix rules only the word sequences that have the class-reference part at the suffix position. Experimental results showed that the proposed method improved the performance of organization name recognition, achieving an F-value of 84.58 on the evaluation data.
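
    The suffix-rule idea in this abstract can be illustrated with a minimal sketch. It is not the authors' pattern mining algorithm: it assumes whitespace-separated, English-style names, treats only the final word of each gazetteer entry as the candidate class-reference part, and uses hypothetical names (`mine_suffix_rules`, `compile_rule`, the toy gazetteer) chosen purely for illustration.

```python
import re
from collections import Counter

def mine_suffix_rules(gazetteer, min_count=2):
    """Return final words that end at least `min_count` gazetteer entries."""
    counts = Counter(name.split()[-1] for name in gazetteer if name.split())
    return sorted(word for word, n in counts.items() if n >= min_count)

def compile_rule(suffixes):
    """Build a regex: one or more capitalized words followed by a mined suffix."""
    alternatives = "|".join(re.escape(s) for s in suffixes)
    return re.compile(r"\b(?:[A-Z][\w&.-]*\s+)+(?:" + alternatives + r")\b")

# Toy gazetteer (hypothetical entries, not data from the paper).
gazetteer = ["Acme Corporation", "Globex Corporation", "Kyoto University",
             "Osaka University", "Initech"]
suffixes = mine_suffix_rules(gazetteer)
rule = compile_rule(suffixes)

# Neither organization below appears in the gazetteer, yet both are matched
# because they end in a frequent suffix.
text = "She left Hooli Corporation to join Tohoku University last year."
matches = rule.findall(text)
print(suffixes)
print(matches)
```

    The looser match is what gives the higher coverage: an exact gazetteer lookup would find neither name in the sample sentence, while the suffix rule finds both.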

  4. Mining of high utility-probability sequential patterns from uncertain databases

    PubMed Central

    Zhang, Binbin; Fournier-Viger, Philippe; Li, Ting

    2017-01-01

    High-utility sequential pattern mining (HUSPM) has become an important issue in the field of data mining. Several HUSPM algorithms have been designed to mine high-utility sequential patterns (HUPSPs). They have been applied in several real-life situations such as for consumer behavior analysis and event detection in sensor networks. Nonetheless, most studies on HUSPM have focused on mining HUPSPs in precise data. In real life, however, uncertainty is an important factor as data is collected using various types of sensors that are more or less accurate. Hence, data collected in a real-life database can be annotated with existing probabilities. This paper presents a novel pattern mining framework called high utility-probability sequential pattern mining (HUPSPM) for mining high utility-probability sequential patterns (HUPSPs) in uncertain sequence databases. A baseline algorithm with three optional pruning strategies is presented to mine HUPSPs. Moreover, to speed up the mining process, a projection mechanism is designed to create a database projection for each processed sequence, which is smaller than the original database. Thus, the number of unpromising candidates can be greatly reduced, as well as the execution time for mining HUPSPs. Substantial experiments both on real-life and synthetic datasets show that the designed algorithm performs well in terms of runtime, number of candidates, memory usage, and scalability for different minimum utility and minimum probability thresholds. PMID:28742847

  5. A review of reverse vaccinology approaches for the development of vaccines against ticks and tick borne diseases.

    PubMed

    Lew-Tabor, A E; Rodriguez Valle, M

    2016-06-01

    The field of reverse vaccinology developed as an outcome of the genome sequence revolution. Following the introduction of live vaccinations in the western world by Edward Jenner in 1798 and the coining of the phrase 'vaccine', in 1881 Pasteur developed a rational design for vaccines. Pasteur proposed that in order to make a vaccine that one should 'isolate, inactivate and inject the microorganism' and these basic rules of vaccinology were largely followed for the next 100 years leading to the elimination of several highly infectious diseases. However, new technologies were needed to conquer many pathogens which could not be eliminated using these traditional technologies. Thus increasingly, computers were used to mine genome sequences to rationally design recombinant vaccines. Several vaccines for bacterial and viral diseases (i.e. meningococcus and HIV) have been developed, however the on-going challenge for parasite vaccines has been due to their comparatively larger genomes. Understanding the immune response is important in reverse vaccinology studies as this knowledge will influence how the genome mining is to be conducted. Vaccine candidates for anaplasmosis, cowdriosis, theileriosis, leishmaniasis, malaria, schistosomiasis, and the cattle tick have been identified using reverse vaccinology approaches. Some challenges for parasite vaccine development include the ability to address antigenic variability as well the understanding of the complex interplay between antibody, mucosal and/or T cell immune responses. To understand the complex parasite interactions with the livestock host, there is the limitation where algorithms for epitope mining using the human genome cannot directly be adapted for bovine, for example the prediction of peptide binding to major histocompatibility complex motifs. 
    As the number of genomes for both hosts and parasites increases, the development of new algorithms for pan-genomic mining will continue to impact the future of parasite and rickettsial (and other tick-borne pathogen) disease vaccine development. Copyright © 2015 Elsevier GmbH. All rights reserved.

  6. Buried landmine detection using multivariate normal clustering

    NASA Astrophysics Data System (ADS)

    Duston, Brian M.

    2001-10-01

    A Bayesian classification algorithm is presented for discriminating buried land mines from buried and surface clutter in Ground Penetrating Radar (GPR) signals. This algorithm is based on multivariate normal (MVN) clustering, where feature vectors are used to identify populations (clusters) of mines and clutter objects. The features are extracted from two-dimensional images created from ground penetrating radar scans. MVN clustering is used to determine the number of clusters in the data and to create probability density models for target and clutter populations, producing the MVN clustering classifier (MVNCC). The Bayesian Information Criteria (BIC) is used to evaluate each model to determine the number of clusters in the data. An extension of the MVNCC allows the model to adapt to local clutter distributions by treating each of the MVN cluster components as a Poisson process and adaptively estimating the intensity parameters. The algorithm is developed using data collected by the Mine Hunter/Killer Close-In Detector (MH/K CID) at prepared mine lanes. The Mine Hunter/Killer is a prototype mine detecting and neutralizing vehicle developed for the U.S. Army to clear roads of anti-tank mines.

  7. Analysis on composition rules of Chinese patent drugs treating pain-related diseases based on data mining method.

    PubMed

    Tang, Shi-Huan; Shen, Dan; Yang, Hong-Jun

    2017-08-24

    To analyze the composition rules of oral prescriptions for the treatment of headache, stomachache and dysmenorrhea recorded in the National Standard for Chinese Patent Drugs (NSCPD) enacted by the Ministry of Public Health of China, and to compare them in order to better understand pain treatment in different regions of the human body. The NSCPD database was constructed in 2014. Prescriptions treating the three pain-related diseases were searched and screened from the database. Data mining methods such as association rules analysis and the complex system entropy method, integrated in the data mining software Traditional Chinese Medicine Inheritance Support System (TCMISS), were then applied to process the data. The top 25 high-frequency drugs in the treatment of each disease were selected, and 51, 33 and 22 core combinations treating headache, stomachache and dysmenorrhea, respectively, were mined. The composition rules of the oral prescriptions for treating headache, stomachache and dysmenorrhea recorded in the NSCPD have been summarized. Although there were similarities between them, formulas varied according to the location of pain. The results can serve as evidence and reference for clinical treatment and new drug development.

  8. 76 FR 64047 - Montana Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-10-17

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 926... Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; public comment period and... amendment to the Montana regulatory program (hereinafter, the ``Montana program'') under the Surface Mining...

  9. 76 FR 36040 - Wyoming Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-06-21

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 950... Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; public comment period and... amendment to the Wyoming regulatory program (hereinafter, the ``Wyoming program'') under the Surface Mining...

  10. 78 FR 16204 - Wyoming Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-03-14

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 950... Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; public comment period and... amendment to the Wyoming regulatory program (hereinafter, the ``Wyoming program'') under the Surface Mining...

  11. 76 FR 80310 - Wyoming Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-12-23

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 950... Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; public comment period and... amendment to the Wyoming regulatory program (hereinafter, the ``Wyoming program'') under the Surface Mining...

  12. 76 FR 67635 - Alaska Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-11-02

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 902... Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; public comment period and... amendment to the Alaska regulatory program (hereinafter, the ``Alaska program'') under the Surface Mining...

  13. 76 FR 64045 - Montana Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-10-17

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 926... Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; public comment period and... amendment to the Montana regulatory program (hereinafter, the ``Montana program'') under the Surface Mining...

  14. 76 FR 76111 - Montana Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-12-06

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 926... Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; public comment period and... amendment to the Montana regulatory program (hereinafter, the ``Montana program'') under the Surface Mining...

  15. 77 FR 25874 - Pennsylvania Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-05-02

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 938... Mining Reclamation and Enforcement (OSM), Interior. ACTION: Final rule; removal of required amendment... regulatory program (the ``Pennsylvania program'') regulations under the Surface Mining Control and...

  16. TSCA Chemical Data Reporting Fact Sheet: Reporting Manufactured Chemical Substances from Metal Mining and Related Activities

    EPA Pesticide Factsheets

    This fact sheet provides guidance on the Chemical Data Reporting (CDR) rule requirements related to the reporting of mined metals, intermediates, and byproducts manufactured during metal mining and related activities.

  17. 77 FR 1430 - Maryland Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-01-10

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 920... Mining Reclamation and Enforcement (OSM), Interior. ACTION: Proposed rule; extension of the comment... the Maryland regulatory program (the ``Maryland program'') under the Surface Mining Control and...

  18. Multivariate Spatial Condition Mapping Using Subtractive Fuzzy Cluster Means

    PubMed Central

    Sabit, Hakilo; Al-Anbuky, Adnan

    2014-01-01

    Wireless sensor networks are usually deployed for monitoring given physical phenomena taking place in a specific space and over a specific duration of time. The spatio-temporal distribution of these phenomena often correlates to certain physical events. To appropriately characterise these events-phenomena relationships over a given space for a given time frame, we require continuous monitoring of the conditions. WSNs are perfectly suited for these tasks, due to their inherent robustness. This paper presents a subtractive fuzzy cluster means algorithm and its application in data stream mining for wireless sensor systems over a cloud-computing-like architecture, which we call sensor cloud data stream mining. Benchmarking on standard mining algorithms, the k-means and the FCM algorithms, we have demonstrated that the subtractive fuzzy cluster means model can perform high quality distributed data stream mining tasks comparable to centralised data stream mining. PMID:25313495

  19. Finding Frequent Closed Itemsets in Sliding Window in Linear Time

    NASA Astrophysics Data System (ADS)

    Chen, Junbo; Zhou, Bo; Chen, Lu; Wang, Xinyu; Ding, Yiqun

    One of the most well-studied problems in data mining is computing the collection of frequent itemsets in large transactional databases. Since the introduction of the famous Apriori algorithm [14], many others have been proposed to find the frequent itemsets. Among such algorithms, the approach of mining closed itemsets has raised much interest in the data mining community. The algorithms taking this approach include TITANIC [8], CLOSET+ [6], DCI-Closed [4], FCI-Stream [3], GC-Tree [15], TGC-Tree [16], etc. Among these algorithms, FCI-Stream, GC-Tree and TGC-Tree are online algorithms that work in sliding window environments. According to the performance evaluation in [16], GC-Tree [15] is the fastest one. In this paper, an improved algorithm based on GC-Tree is proposed, the computational complexity of which is proved to be a linear combination of the average transaction size and the average closed itemset size. The algorithm is based on the essential theorem presented in Sect. 4.2. Empirically, the new algorithm is several orders of magnitude faster than the state-of-the-art algorithm, GC-Tree.
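
    To make the problem statement concrete, the sketch below computes the closed frequent itemsets of each full sliding window by brute force. It is only a reference definition, assuming small transactions: it enumerates every subset (exponential cost) and recomputes each window from scratch, so it is neither GC-Tree nor the improved algorithm described in the abstract. The function names and the toy stream are illustrative assumptions.

```python
from collections import deque
from itertools import combinations

def closed_frequent_itemsets(window, min_support):
    """Brute-force closed frequent itemsets of the transactions in `window`."""
    support = {}
    for transaction in window:
        items = sorted(transaction)
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                key = frozenset(subset)
                support[key] = support.get(key, 0) + 1
    frequent = {s: c for s, c in support.items() if c >= min_support}
    # An itemset is closed if no proper superset has the same support.
    return {s: c for s, c in frequent.items()
            if not any(s < t and c == frequent[t] for t in frequent)}

def sliding_closed_itemsets(stream, window_size, min_support):
    """Yield the closed frequent itemsets of every full window over `stream`."""
    window = deque(maxlen=window_size)  # the oldest transaction drops out automatically
    for transaction in stream:
        window.append(frozenset(transaction))
        if len(window) == window_size:
            yield closed_frequent_itemsets(window, min_support)

# Toy stream of four transactions, window of three, absolute support threshold 2.
stream = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}]
results = list(sliding_closed_itemsets(stream, window_size=3, min_support=2))
for closed in results:
    print({tuple(sorted(s)): c for s, c in closed.items()})
```

    In the first window, {b} is frequent but not closed, because {a, b} has the same support; closed itemsets drop such redundant entries without losing support information.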

  20. Detection of pseudosinusoidal epileptic seizure segments in the neonatal EEG by cascading a rule-based algorithm with a neural network.

    PubMed

    Karayiannis, Nicolaos B; Mukherjee, Amit; Glover, John R; Ktonas, Periklis Y; Frost, James D; Hrachovy, Richard A; Mizrahi, Eli M

    2006-04-01

    This paper presents an approach to detect epileptic seizure segments in the neonatal electroencephalogram (EEG) by characterizing the spectral features of the EEG waveform using a rule-based algorithm cascaded with a neural network. A rule-based algorithm screens out short segments of pseudosinusoidal EEG patterns as epileptic based on features in the power spectrum. The output of the rule-based algorithm is used to train and compare the performance of conventional feedforward neural networks and quantum neural networks. The results indicate that the trained neural networks, cascaded with the rule-based algorithm, improved the performance of the rule-based algorithm acting by itself. The evaluation of the proposed cascaded scheme for the detection of pseudosinusoidal seizure segments reveals its potential as a building block of the automated seizure detection system under development.

  1. Evaluation of the influence of dominance rules for the assembly line design problem under consideration of product design alternatives

    NASA Astrophysics Data System (ADS)

    Oesterle, Jonathan; Lionel, Amodeo

    2018-06-01

    The current competitive situation increases the importance of realistically estimating product costs during the early phases of product and assembly line planning projects. In this article, several multi-objective algorithms using different dominance rules are proposed to solve the problem associated with the selection of the most effective combination of product and assembly lines. The list of developed algorithms includes variants of ant colony algorithms, evolutionary algorithms and imperialist competitive algorithms. The performance of each algorithm and dominance rule is analysed using five multi-objective quality indicators and fifty problem instances. The algorithms and dominance rules are ranked using a non-parametric statistical test.

  2. 75 FR 69617 - Lowering Miners' Exposure to Respirable Coal Mine Dust, Including Continuous Personal Dust Monitors

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-11-15

    ... 1219-AB64 Lowering Miners' Exposure to Respirable Coal Mine Dust, Including Continuous Personal Dust... hearings on the proposed rule addressing Lowering Miners' Exposure to Respirable Coal Mine Dust, Including... miners' exposure to respirable coal mine dust by revising the Agency's existing standards on miners...

  3. 76 FR 11187 - Examinations of Work Areas in Underground Coal Mines for Violations of Mandatory Health or Safety...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-03-01

    ... Examinations of Work Areas in Underground Coal Mines for Violations of Mandatory Health or Safety Standards... rule addressing Examinations of Work Areas in Underground Coal Mines for Violations of Mandatory Health..., and weekly examinations of underground coal mines. This extension gives commenters an additional 30...

  4. H2RM: A Hybrid Rough Set Reasoning Model for Prediction and Management of Diabetes Mellitus.

    PubMed

    Ali, Rahman; Hussain, Jamil; Siddiqi, Muhammad Hameed; Hussain, Maqbool; Lee, Sungyoung

    2015-07-03

    Diabetes is a chronic disease characterized by high blood glucose level that results either from a deficiency of insulin produced by the body, or the body's resistance to the effects of insulin. Accurate and precise reasoning and prediction models greatly help physicians to improve diagnosis, prognosis and treatment procedures of different diseases. Though numerous models have been proposed to solve issues of diagnosis and management of diabetes, they have the following drawbacks: (1) restriction to one type of diabetes; (2) lack of understandability and explanatory power of the techniques and decisions; (3) limitation either to prediction or to management over structured contents; and (4) lack of competence in handling the dimensionality and vagueness of patient data. To overcome these issues, this paper proposes a novel hybrid rough set reasoning model (H2RM) that resolves problems of inaccurate prediction and management of type-1 diabetes mellitus (T1DM) and type-2 diabetes mellitus (T2DM). For verification of the proposed model, experimental data from fifty patients, acquired from a local hospital in semi-structured format, is used. First, the data is transformed into structured format and then used for mining prediction rules. Rough set theory (RST) based techniques and algorithms are used to mine the prediction rules. During the online execution phase of the model, these rules are used to predict T1DM and T2DM for new patients. Furthermore, the proposed model assists physicians to manage diabetes using knowledge extracted from online diabetes guidelines. Correlation-based trend analysis techniques are used to manage diabetic observations. Experimental results demonstrate that the proposed model outperforms the existing methods with 95.9% average and balanced accuracies.

  5. H2RM: A Hybrid Rough Set Reasoning Model for Prediction and Management of Diabetes Mellitus

    PubMed Central

    Ali, Rahman; Hussain, Jamil; Siddiqi, Muhammad Hameed; Hussain, Maqbool; Lee, Sungyoung

    2015-01-01

    Diabetes is a chronic disease characterized by high blood glucose level that results either from a deficiency of insulin produced by the body, or the body’s resistance to the effects of insulin. Accurate and precise reasoning and prediction models greatly help physicians to improve diagnosis, prognosis and treatment procedures of different diseases. Though numerous models have been proposed to solve issues of diagnosis and management of diabetes, they have the following drawbacks: (1) restriction to one type of diabetes; (2) lack of understandability and explanatory power of the techniques and decisions; (3) limitation either to prediction or to management over structured contents; and (4) lack of competence in handling the dimensionality and vagueness of patient data. To overcome these issues, this paper proposes a novel hybrid rough set reasoning model (H2RM) that resolves problems of inaccurate prediction and management of type-1 diabetes mellitus (T1DM) and type-2 diabetes mellitus (T2DM). For verification of the proposed model, experimental data from fifty patients, acquired from a local hospital in semi-structured format, is used. First, the data is transformed into structured format and then used for mining prediction rules. Rough set theory (RST) based techniques and algorithms are used to mine the prediction rules. During the online execution phase of the model, these rules are used to predict T1DM and T2DM for new patients. Furthermore, the proposed model assists physicians to manage diabetes using knowledge extracted from online diabetes guidelines. Correlation-based trend analysis techniques are used to manage diabetic observations. Experimental results demonstrate that the proposed model outperforms the existing methods with 95.9% average and balanced accuracies. PMID:26151207

  6. Validity of association rules extracted by healthcare-data-mining.

    PubMed

    Takeuchi, Hiroshi; Kodama, Naoki

    2014-01-01

    A personal healthcare system used with cloud computing has been developed. It enables a daily time-series of personal health and lifestyle data to be stored in the cloud through mobile devices. The cloud automatically extracts personally useful information, such as rules and patterns concerning the user's lifestyle and health condition embedded in their personal big data, by using healthcare-data-mining. This study has verified that the rules extracted from daily time-series data stored over a half-year period by volunteer users of this system are valid.

  7. [Traditional Chinese medicine inheritance system analysis of professor Ding Yuanqing in treating tic disorder medication based on experience].

    PubMed

    Sun, Lu-yan; Li, Qing-peng; Zhao, Li-li; Ding, Yuan-qing

    2015-08-01

    In recent years, the incidence of tic disorders has increased, and patients seeking treatment for the disease are not uncommon. The etiology and pathogenesis are not yet clear in Western medicine, the Western medicines commonly used in clinical practice have many adverse reactions, and traditional Chinese medicine (TCM) research is increasingly valued. Based on the TCM inheritance assistant system software, this paper discusses Professor Ding Yuanqing's experience in treating tic disorder. Medical records of Professor Ding Yuanqing's treatment of tic disorder were collected, and the 800 selected prescriptions were analyzed using the Apriori association rules algorithm, unsupervised complex system entropy clustering and other data mining methods to determine the frequency of use of the prescribed drugs and the association rules between drugs, and to mine 12 core combinations and six new prescriptions. The medication focused on pacifying the liver and extinguishing wind, cooling blood and relieving convulsion, and clearing the heart and soothing the nerves, with modification according to the syndrome, flexible application and strict compatibility.

  8. Prediction of Disease Case Severity Level To Determine INA CBGs Rate

    NASA Astrophysics Data System (ADS)

    Puspitorini, Sukma; Kusumadewi, Sri; Rosita, Linda

    2017-03-01

    Indonesian Case-Based Groups (INA CBGs) is a case-mix payment system that uses a software grouper application. An INA CBGs code consists of four digits, of which the last indicates the severity level of the disease case. Severity level is influenced by secondary diagnoses (complications and co-morbidity) related to the resource intensity level, that is, the medical resources used to treat a hospitalized patient. The objective of this research is to develop a decision support system that predicts the severity level of disease cases and illustrates the INA CBGs rate using a data mining decision tree classification model. Primary diagnosis (DU), first secondary diagnosis (DS 1), and second secondary diagnosis (DS 2) are the attributes used as input for predicting severity level. The training process uses the C4.5 algorithm, and the rules are represented in IF-THEN form. The credibility of the system is analyzed through a testing process, and a confusion matrix presents the results. The outcome of this research shows that the first secondary diagnosis significantly influences the formation of severity level prediction rules for new disease cases and the INA CBGs rate illustration.
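
    C4.5 chooses the attribute to split on by gain ratio, that is, information gain divided by the split information of the attribute. The sketch below computes that criterion on a four-row toy table; the rows, the attribute values and the function names are hypothetical and are not taken from the study's data. Only the gain ratio formula itself is standard C4.5.

```python
import math
from collections import Counter, defaultdict

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())

def gain_ratio(rows, attribute, target):
    """C4.5 split criterion: information gain divided by split information."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    gain = entropy([row[target] for row in rows]) - remainder
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical toy rows: DS1 fully determines severity, DS2 carries no information.
rows = [
    {"DS1": "absent",  "DS2": "x", "severity": "I"},
    {"DS1": "absent",  "DS2": "y", "severity": "I"},
    {"DS1": "present", "DS2": "x", "severity": "II"},
    {"DS1": "present", "DS2": "y", "severity": "II"},
]
ratio_ds1 = gain_ratio(rows, "DS1", "severity")
ratio_ds2 = gain_ratio(rows, "DS2", "severity")
print(ratio_ds1, ratio_ds2)
```

    On this toy table the tree would split on DS1 first, and the resulting branches read directly as IF-THEN rules such as "IF DS1 = present THEN severity = II".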

  9. Granular support vector machines with association rules mining for protein homology prediction.

    PubMed

    Tang, Yuchun; Jin, Bo; Zhang, Yan-Qing

    2005-01-01

    Protein homology prediction between protein sequences is one of critical problems in computational biology. Such a complex classification problem is common in medical or biological information processing applications. How to build a model with superior generalization capability from training samples is an essential issue for mining knowledge to accurately predict/classify unseen new samples and to effectively support human experts to make correct decisions. A new learning model called granular support vector machines (GSVM) is proposed based on our previous work. GSVM systematically and formally combines the principles from statistical learning theory and granular computing theory and thus provides an interesting new mechanism to address complex classification problems. It works by building a sequence of information granules and then building support vector machines (SVM) in some of these information granules on demand. A good granulation method to find suitable granules is crucial for modeling a GSVM with good performance. In this paper, we also propose an association rules-based granulation method. For the granules induced by association rules with high enough confidence and significant support, we leave them as they are because of their high "purity" and significant effect on simplifying the classification task. For every other granule, a SVM is modeled to discriminate the corresponding data. In this way, a complex classification problem is divided into multiple smaller problems so that the learning task is simplified. The proposed algorithm, here named GSVM-AR, is compared with SVM by KDDCUP04 protein homology prediction data. The experimental results show that finding the splitting hyperplane is not a trivial task (we should be careful to select the association rules to avoid overfitting) and GSVM-AR does show significant improvement compared to building one single SVM in the whole feature space. 
    Another advantage is that GSVM-AR is practical to use because it is easy to implement. More importantly and more interestingly, GSVM provides a new mechanism to address complex classification problems.
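
    The two measures used to select granules, support and confidence, can be written down in a few lines. This is a generic sketch of the standard definitions, not the GSVM-AR granulation procedure; the transactions, the item names (`f1`, `f2`, `pos`, `neg`) and the function names are illustrative assumptions.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    base = support(transactions, antecedent)
    both = support(transactions, frozenset(antecedent) | frozenset(consequent))
    return both / base if base else 0.0

# Hypothetical samples: a discretized feature value plus a class label.
transactions = [
    frozenset({"f1", "pos"}),
    frozenset({"f1", "pos"}),
    frozenset({"f1", "neg"}),
    frozenset({"f2", "neg"}),
]
sup_f1 = support(transactions, {"f1"})
conf_f1_pos = confidence(transactions, {"f1"}, {"pos"})
conf_f2_neg = confidence(transactions, {"f2"}, {"neg"})
print(sup_f1, conf_f1_pos, conf_f2_neg)
```

    In the abstract's terms, a rule whose confidence and support are both high enough marks a "pure" granule that can be left as it is, while the remaining granules each get their own SVM.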

  10. Exploring context and content links in social media: a latent space method.

    PubMed

    Qi, Guo-Jun; Aggarwal, Charu; Tian, Qi; Ji, Heng; Huang, Thomas S

    2012-05-01

    Social media networks contain both content and context-specific information. Most existing methods work with either of the two for the purpose of multimedia mining and retrieval. In reality, both content and context information are rich sources of information for mining, and the full power of mining and processing algorithms can be realized only with the use of a combination of the two. This paper proposes a new algorithm which mines both context and content links in social media networks to discover the underlying latent semantic space. This mapping of the multimedia objects into latent feature vectors enables the use of any off-the-shelf multimedia retrieval algorithms. Compared to the state-of-the-art latent methods in multimedia analysis, this algorithm effectively solves the problem of sparse context links by mining the geometric structure underlying the content links between multimedia objects. Specifically for multimedia annotation, we show that an effective algorithm can be developed to directly construct annotation models by simultaneously leveraging both context and content information based on latent structure between correlated semantic concepts. We conduct experiments on the Flickr data set, which contains user tags linked with images. We illustrate the advantages of our approach over the state-of-the-art multimedia retrieval techniques.

  11. Combined mine tremors source location and error evaluation in the Lubin Copper Mine (Poland)

    NASA Astrophysics Data System (ADS)

    Leśniak, Andrzej; Pszczoła, Grzegorz

    2008-08-01

    A modified method of mine tremor location used in the Lubin Copper Mine is presented in the paper. In mines where intensive exploration is carried out, a high-accuracy source location technique is usually required. The flatness of the geophone array, the complex geological structure of the rock mass and intense exploitation make the location results ambiguous in such mines. In the present paper, an effective method of source location and location error evaluation is presented, combining data from two different arrays of geophones. The first consists of uniaxial geophones spaced over the whole mine area. The second is installed in one of the mining panels and consists of triaxial geophones. The use of data from the triaxial geophones makes it possible to increase the precision of the hypocenter vertical coordinate. The presented two-step location procedure combines standard location methods: P-wave directions and P-wave arrival times. The efficiency of the resulting algorithm was tested using computer simulations. The designed algorithm is fully non-linear and was tested on the multilayered rock mass model of the Lubin Copper Mine, showing better computational efficiency than the traditional P-wave arrival time location algorithm. In this paper we present the complete procedure that effectively solves the non-linear location problems, i.e. the mine tremor location and measurement of the error propagation.
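
    For readers unfamiliar with arrival-time location, the sketch below shows the traditional P-wave arrival time approach in its simplest form: a grid search for the point that minimizes squared arrival-time residuals. It is not the two-step procedure of the paper. It assumes a homogeneous velocity (the paper uses a multilayered model), ignores P-wave directions, and uses a synthetic geophone layout; all coordinates, the velocity and the function name are illustrative assumptions.

```python
import math
from itertools import product

def locate(geophones, arrivals, velocity, grid):
    """Grid search for the source point minimizing squared arrival-time residuals."""
    best, best_misfit = None, math.inf
    for point in grid:
        travel = [math.dist(point, g) / velocity for g in geophones]
        # The best-fitting unknown origin time is the mean of (arrival - travel time).
        origin = sum(a - t for a, t in zip(arrivals, travel)) / len(arrivals)
        misfit = sum((a - origin - t) ** 2 for a, t in zip(arrivals, travel))
        if misfit < best_misfit:
            best, best_misfit = point, misfit
    return best, best_misfit

# Synthetic setup (metres, seconds): four geophones in one plane plus two at depth.
geophones = [(0, 0, 0), (1000, 0, 0), (0, 1000, 0), (1000, 1000, 0),
             (500, 500, -300), (200, 800, -400)]
velocity = 5000.0
true_source, origin_time = (400, 600, -200), 0.1
arrivals = [origin_time + math.dist(true_source, g) / velocity for g in geophones]

grid = product(range(0, 1001, 100), range(0, 1001, 100), range(-500, 1, 100))
best, misfit = locate(geophones, arrivals, velocity, grid)
print(best, misfit)
```

    If only the four coplanar geophones were used, a source above the plane and its mirror image below would give identical arrival times, which is one way to see why a flat array leaves the vertical coordinate poorly constrained.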

  12. 77 FR 58056 - Mississippi Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-09-19

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 924... Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; public comment period and opportunity for public hearing. SUMMARY: We, the Office of Surface Mining Reclamation and Enforcement (OSM...

  13. 76 FR 36039 - Colorado Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-06-21

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 906... Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; public comment period and... Mining Control and Reclamation Act of 1977 (``SMCRA'' or ``the Act''). Colorado proposes both additions...

  14. 77 FR 34890 - Oklahoma Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-06-12

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 936... Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; public comment period and opportunity for public hearing on proposed amendment. SUMMARY: We, the Office of Surface Mining Reclamation...

  15. 76 FR 50708 - Texas Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-08-16

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 943... AGENCY: Office of Surface Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; public comment period and opportunity for public hearing. SUMMARY: We, the Office of Surface Mining Reclamation...

  16. 75 FR 60371 - Alabama Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-09-30

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 901... Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; public comment period and opportunity for public hearing on proposed amendment. SUMMARY: We, the Office of Surface Mining Reclamation...

  17. 77 FR 41680 - Indiana Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-07-16

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 914... Mining Reclamation and Enforcement, Interior. ACTION: Final rule; approval of amendment. SUMMARY: We, the Office of Surface Mining Reclamation and Enforcement (OSM), are approving amendments to the Indiana...

  18. 77 FR 25949 - Texas Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-05-02

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 943... Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; public comment period and opportunity for public hearing on proposed amendment. SUMMARY: We, the Office of Surface Mining Reclamation...

  19. 76 FR 76109 - Colorado Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-12-06

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 906... Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; reopening and extension of public...'') under the Surface Mining Control and Reclamation Act of 1977 (``SMCRA'' or ``the Act''). Colorado...

  20. 77 FR 66574 - Texas Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-11-06

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 943... Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; public comment period and opportunity for public hearing on proposed amendment. SUMMARY: We, the Office of Surface Mining Reclamation...

  1. 77 FR 18149 - Montana Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-03-27

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 926... Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; reopening and extension of public... receipt of Montana's response to the Office of Surface Mining Reclamation and Enforcement's (OSM) November...

  2. 77 FR 24661 - North Dakota Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-04-25

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 934... Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; public comment period and... Surface Mining Control and Reclamation Act of 1977 (``SMCRA'' or ``the Act''). North Dakota proposes...

  3. 76 FR 23522 - Oklahoma Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-04-27

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 936... Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; public comment period and opportunity for public hearing. SUMMARY: We, the Office of Surface Mining Reclamation and Enforcement (OSM...

  4. 75 FR 21534 - Texas Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-04-26

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 943... Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; public comment period and opportunity for public hearing on proposed amendment. SUMMARY: We, the Office of Surface Mining Reclamation...

  5. 77 FR 34892 - Utah Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-06-12

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 944... Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; public comment period and opportunity for public hearing on proposed amendment. SUMMARY: We, the Office of Surface Mining Reclamation...

  6. 77 FR 18738 - Texas Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-03-28

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 943... Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; public comment period and opportunity for public hearing on proposed amendment. SUMMARY: We, the Office of Surface Mining Reclamation...

  7. 76 FR 9700 - Alabama Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-02-22

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 901... Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; public comment period and opportunity for public hearing on proposed amendment. SUMMARY: We, the Office of Surface Mining Reclamation...

  8. 77 FR 40796 - Wyoming Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-07-11

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 950... Mining Reclamation and Enforcement, Interior. ACTION: Final rule. SUMMARY: We, the Office of Surface Mining Reclamation and Enforcement (OSM), are removing a disapproval codified in OSM regulations...

  9. 76 FR 12857 - Montana Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-03-09

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 926... of Surface Mining Reclamation and Enforcement, Interior. ACTION: Final rule; approval of amendment... the Surface Mining Control and Reclamation Act of 1977 (``SMCRA'' or ``the Act''). Montana proposed...

  10. 78 FR 11617 - Pennsylvania Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-02-19

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 938... Surface Mining Reclamation and Enforcement (OSM), Interior. ACTION: Proposed rule; reopening of comment... regulatory program (the ``Pennsylvania program'') under the Surface Mining Control and Reclamation Act of...

  11. Finding novel relationships with integrated gene-gene association network analysis of Synechocystis sp. PCC 6803 using species-independent text-mining.

    PubMed

    Kreula, Sanna M; Kaewphan, Suwisa; Ginter, Filip; Jones, Patrik R

    2018-01-01

    The increasing move towards open access full-text scientific literature enhances our ability to utilize advanced text-mining methods to construct information-rich networks that no human will be able to grasp simply from 'reading the literature'. The utility of text-mining for well-studied species is obvious though the utility for less studied species, or those with no prior track-record at all, is not clear. Here we present a concept for how advanced text-mining can be used to create information-rich networks even for less well studied species and apply it to generate an open-access gene-gene association network resource for Synechocystis sp. PCC 6803, a representative model organism for cyanobacteria and first case-study for the methodology. By merging the text-mining network with networks generated from species-specific experimental data, network integration was used to enhance the accuracy of predicting novel interactions that are biologically relevant. A rule-based algorithm (filter) was constructed in order to automate the search for novel candidate genes with a high degree of likely association to known target genes by (1) ignoring established relationships from the existing literature, as they are already 'known', and (2) demanding multiple independent evidences for every novel and potentially relevant relationship. Using selected case studies, we demonstrate the utility of the network resource and filter to (i) discover novel candidate associations between different genes or proteins in the network, and (ii) rapidly evaluate the potential role of any one particular gene or protein. The full network is provided as an open-source resource.

  12. A software tool for determination of breast cancer treatment methods using data mining approach.

    PubMed

    Cakır, Abdülkadir; Demirel, Burçin

    2011-12-01

    In this work, breast cancer treatment methods are determined using data mining. For this purpose, software was developed to help oncologists suggest treatment methods for breast cancer patients. Data from 462 breast cancer patients, obtained from Ankara Oncology Hospital, are used to determine treatment methods for new patients. This dataset is processed with the Weka data mining tool. Classification algorithms are applied one by one to the dataset and the results are compared to find the proper treatment method. The developed software, called "Treatment Assistant", has a Java NetBeans interface and uses different algorithms (IB1, Multilayer Perceptron and Decision Table) to find out which one gives the better result for each attribute to be predicted. Treatment methods for breast cancer patients after surgery are determined using this software tool. At the modeling step of the data mining process, different Weka algorithms are used for the output attributes. IB1 shows the best accuracy for the hormonotherapy output, Multilayer Perceptron for the tamoxifen and radiotherapy outputs, and the Decision Table algorithm for the chemotherapy output. In conclusion, this work shows that a data mining approach can be a useful tool for medical applications, particularly at the treatment decision step. Data mining helps the doctor decide in a short time.

  13. A Comparative Study of Frequent and Maximal Periodic Pattern Mining Algorithms in Spatiotemporal Databases

    NASA Astrophysics Data System (ADS)

    Obulesu, O.; Rama Mohan Reddy, A., Dr; Mahendra, M.

    2017-08-01

    Detecting regular and efficient cyclic models is a demanding task for data analysts because of the unstructured, vigorous and enormous raw information produced from the web. Many existing approaches generate large numbers of candidate patterns when databases are huge and complex. In this work, two novel algorithms are proposed and a comparative examination is performed by considering scalability and performance parameters. The first algorithm, EFPMA (Extended Regular Model Detection Algorithm), is used to find frequent sequential patterns in the spatiotemporal dataset, and the second, ETMA (Enhanced Tree-based Mining Algorithm), detects effective cyclic models with a symbolic database representation. EFPMA grows models from both ends (prefixes and suffixes) of detected patterns, which results in faster pattern growth because it needs fewer levels of database projection than existing approaches such as PrefixSpan and SPADE. ETMA uses distinct notions, such as segment, sequence and individual symbols, to store and manage transaction data horizontally. ETMA exploits a partition-and-conquer method to find maximal patterns by using symbolic notations. Using this algorithm, cyclic models can be mined in full-series sequential patterns, including subsection series. ETMA reduces memory consumption and makes use of efficient symbolic operations. Furthermore, ETMA only records time-series instances dynamically, in terms of character, series and section approaches respectively. The extent of the pattern and proving efficiency of the reducing and retrieval techniques from synthetic and actual datasets is a really open and challenging mining problem. These techniques are useful in data streams, traffic risk analysis, medical diagnosis, DNA sequence mining and earthquake prediction applications. Extensive experimental results illustrate that the algorithms outperform the ECLAT, STNR and MAFIA approaches in efficiency and scalability.

  14. Statistically significant performance results of a mine detector and fusion algorithm from an x-band high-resolution SAR

    NASA Astrophysics Data System (ADS)

    Williams, Arnold C.; Pachowicz, Peter W.

    2004-09-01

    Current mine detection research indicates that no single sensor or single look from a sensor will detect mines/minefields in a real-time manner at a performance level suitable for a forward maneuver unit. Hence, the integrated development of detectors and fusion algorithms is of primary importance. A problem in this development process has been the evaluation of these algorithms with relatively small data sets, leading to anecdotal and frequently overtrained results. These anecdotal results are often unreliable and conflicting among various sensors and algorithms. Consequently, the physical phenomena that ought to be exploited and the performance benefits of this exploitation are often ambiguous. The Army RDECOM CERDEC Night Vision Laboratory and Electron Sensors Directorate has collected large amounts of multisensor data such that statistically significant evaluations of detection and fusion algorithms can be obtained. Even with these large data sets, care must be taken in algorithm design and data processing to achieve statistically significant performance results for combined detectors and fusion algorithms. This paper discusses statistically significant detection and combined multilook fusion results for the Ellipse Detector (ED) and the Piecewise Level Fusion Algorithm (PLFA). These statistically significant performance results are characterized by ROC curves that have been obtained through processing this multilook data for the high resolution SAR data of the Veridian X-Band radar. We discuss the implications of these results on mine detection and the importance of statistical significance, sample size, ground truth, and algorithm design in performance evaluation.
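
    The abstract characterizes detector performance by ROC curves. The sketch below shows one generic way such a curve can be computed from detector scores and ground-truth labels; it is not the ED or PLFA evaluation code, and the function names and sample scores are made up for illustration. It assumes both mines and non-mines are present in the labels.

```python
def roc_points(scores, labels):
    """Return (false_positive_rate, true_positive_rate) pairs, one per threshold.

    scores : detector confidence per candidate (higher means more mine-like)
    labels : 1 if ground truth marks the candidate as a mine, else 0
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Sweep the decision threshold from the highest score downwards.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last candidate sharing this score.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoidal area under the ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical scores for six candidates, three of which are true mines.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
auc = area_under_curve(curve)
```

    With only six candidates the curve has very few steps, which illustrates the abstract's concern: small data sets give coarse, easily overfitted performance estimates.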

  15. 78 FR 6062 - North Dakota Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-01-29

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 934... Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; public comment period and... Surface Mining Control and Reclamation Act of 1977 (``SMCRA'' or ``the Act''). North Dakota intends to...

  16. 76 FR 4266 - New Mexico Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-01-25

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 931... Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; public comment period and... Mining Control and Reclamation Act of 1977 (``SMCRA'' or ``the Act''). New Mexico proposes revisions to...

  17. 76 FR 9642 - Alabama Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-02-22

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 901... Mining Reclamation and Enforcement, Interior. ACTION: Final rule; approval of amendment. SUMMARY: We, the Office of Surface Mining Reclamation and Enforcement (OSM), are approving an amendment to the Alabama...

  18. 78 FR 13002 - Pennsylvania Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-02-26

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 938... Mining Reclamation and Enforcement (``OSM''), Interior. ACTION: Proposed rule; public comment period and... regulatory program under the Surface Mining Control and Reclamation Act of 1977 (``SMCRA'' or the ``Act...

  19. 78 FR 11579 - Texas Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-02-19

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 943... Mining Reclamation and Enforcement, Interior. ACTION: Final rule; approval of amendment. SUMMARY: We, the Office of Surface Mining Reclamation and Enforcement (OSM), are approving an amendment to the Texas...

  20. 76 FR 40649 - Indiana Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-07-11

    ... at 312 IAC 25-6-30 Surface mining; explosives; general requirements. The full text of the program... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 914... Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; public comment period on proposed...

  1. 78 FR 10512 - Wyoming Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-02-14

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 950... Mining Reclamation and Enforcement, Interior. ACTION: Final rule; approval of amendment with certain... ``Wyoming program'') under the Surface Mining Control and Reclamation Act of 1977 (``SMCRA'' or ``the Act...

  2. 77 FR 8144 - Texas Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-02-14

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 943... AGENCY: Office of Surface Mining Reclamation and Enforcement, Interior. ACTION: Final rule; approval of amendment. SUMMARY: We, the Office of Surface Mining Reclamation and Enforcement (OSM), are approving three...

  3. 78 FR 9807 - Utah Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-02-12

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 944... Mining Reclamation and Enforcement, Interior. ACTION: Final rule; approval of amendment. SUMMARY: We are approving an amendment to the Utah regulatory program (the ``Utah program'') under the Surface Mining...

  4. 76 FR 30008 - Alabama Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-05-24

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 901... Mining Reclamation and Enforcement, Interior. ACTION: Final rule; approval of amendment. SUMMARY: We, the Office of Surface Mining Reclamation and Enforcement (OSM), are approving an amendment to the Alabama...

  5. 75 FR 43476 - Montana Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-07-26

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 926... Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; reopening and extension of public...'') under the Surface Mining Control and Reclamation Act of 1977 (``SMCRA'' or ``the Act''). Montana revised...

  6. 75 FR 81122 - Texas Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-12-27

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 943... Mining Reclamation and Enforcement, Interior. ACTION: Final rule; approval of amendment. SUMMARY: We, the Office of Surface Mining Reclamation and Enforcement (OSM), are approving an amendment to the Texas...

  7. 77 FR 58025 - Texas Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-09-19

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 943... Mining Reclamation and Enforcement, Interior. ACTION: Final rule; approval of amendment. SUMMARY: We, the Office of Surface Mining Reclamation and Enforcement (OSM), are approving an amendment to the Texas...

  8. 76 FR 25277 - Examinations of Work Areas in Underground Coal Mines and Pattern of Violations

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-05-04

    ..., 1219-AB73 Examinations of Work Areas in Underground Coal Mines and Pattern of Violations AGENCY: Mine... four public hearings on the Agency's proposed rules for Examinations of Work Areas in Underground Coal... 1219-AB75'' for Examinations of Work Areas in Underground Coal Mines' submissions, and with ``RIN 1219...

  9. 78 FR 49079 - Lease Modifications, Lease and Logical Mining Unit Diligence, Advance Royalty, Royalty Rates, and...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-08-12

    ... Management 43 CFR Parts 3000, 3400, 3430, et al. Lease Modifications, Lease and Logical Mining Unit Diligence... Lease Modifications, Lease and Logical Mining Unit Diligence, Advance Royalty, Royalty Rates, and Bonds... leases and logical mining units (LMUs). The proposed rule would implement Title IV, Subtitle D of the...

  10. Mining Co-Location Patterns with Clustering Items from Spatial Data Sets

    NASA Astrophysics Data System (ADS)

    Zhou, G.; Li, Q.; Deng, G.; Yue, T.; Zhou, X.

    2018-05-01

    The explosive growth of spatial data and the widespread use of spatial databases emphasize the need for spatial data mining. Co-location pattern discovery is an important branch of spatial data mining. Spatial co-locations represent subsets of features that are frequently located together in geographic space. However, the appearance of a spatial feature C is often determined not by a single spatial feature A or B but by the two features together; that is, where A and B appear together, C often appears. This kind of co-location pattern differs from the traditional co-location pattern. This paper therefore presents a new concept called clustering items, and calls such a pattern a co-location pattern with clustering items. Because traditional algorithms cannot mine this pattern, the related concepts are introduced in detail and a novel algorithm is proposed. The algorithm extends the join-based approach proposed by Huang. Finally, the performance of the algorithm is evaluated.
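
    The idea that "where A and B appear together, C often appears" can be made concrete with a small sketch. The measure below is an illustrative confidence-style ratio, not the paper's definition and not the participation index of the join-based approach; the function name, the distance threshold and the sample points are assumptions.

```python
import math

def clustered_rule_confidence(points, antecedent, consequent, radius):
    """Fraction of neighbouring antecedent pairs that also have a consequent
    instance within `radius` of both members of the pair.

    points     : list of (feature, x, y) instances
    antecedent : pair of feature names, e.g. ("A", "B")
    consequent : feature name, e.g. "C"
    """
    first, second = antecedent

    def near(p, q):
        return math.hypot(p[1] - q[1], p[2] - q[2]) <= radius

    # All instance pairs where the two antecedent features are neighbours.
    pairs = [(p, q) for p in points if p[0] == first
             for q in points if q[0] == second and near(p, q)]
    if not pairs:
        return 0.0
    targets = [c for c in points if c[0] == consequent]
    hits = sum(1 for p, q in pairs
               if any(near(p, c) and near(q, c) for c in targets))
    return hits / len(pairs)

# Hypothetical data: two A-B neighbour pairs, only one with a C nearby,
# plus a lone A with a C that must not count towards the rule.
points = [("A", 0, 0), ("B", 1, 0), ("C", 0.5, 0.5),
          ("A", 10, 10), ("B", 11, 10),
          ("A", 20, 20), ("C", 20.5, 20)]
confidence = clustered_rule_confidence(points, ("A", "B"), "C", radius=1.5)
```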

  11. Modular Algorithm Testbed Suite (MATS): A Software Framework for Automatic Target Recognition

    DTIC Science & Technology

    2017-01-01

    004 OFFICE OF NAVAL RESEARCH ATTN JASON STACK MINE WARFARE & OCEAN ENGINEERING PROGRAMS CODE 32, SUITE 1092 875 N RANDOLPH ST ARLINGTON VA 22203 ONR...naval mine countermeasures (MCM) operations by automating a large portion of the data analysis. Successful long-term implementation of ATR requires a...Modular Algorithm Testbed Suite; MATS; Mine Countermeasures Operations U U U SAR 24 Derek R. Kolacinski (850) 230-7218 THIS PAGE INTENTIONALLY LEFT

  12. 76 FR 64048 - Pennsylvania Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-10-17

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 938... Surface Mining Reclamation and Enforcement (OSM), Interior. ACTION: Proposed rule; reopening and extension... Mining Control and Reclamation Act of 1977 (SMCRA or the Act) published on February 7, 2011. In response...

  13. 30 CFR 301.1 - Cross reference.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... within the jurisdiction of administrative law judges and the Interior Board of Surface Mining and... Resources BOARD OF SURFACE MINING AND RECLAMATION APPEALS, DEPARTMENT OF THE INTERIOR PROCEDURES UNDER SURFACE MINING CONTROL AND RECLAMATION ACT OF 1977 § 301.1 Cross reference. For special rules applicable...

  14. 75 FR 60271 - Technical Amendments 2010

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-09-29

    ... Part VI Department of the Interior Office of Surface Mining Reclamation and Enforcement 30 CFR... INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Parts 740, 761, 773, 795, 816, 817...: Office of Surface Mining Reclamation and Enforcement, Interior. ACTION: Final rule. SUMMARY: We, the...

  15. 30 CFR 921.700 - Massachusetts Federal program.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 921.700 Mineral Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR PROGRAMS FOR THE CONDUCT OF SURFACE MINING OPERATIONS WITHIN EACH STATE MASSACHUSETTS § 921.700 Massachusetts Federal program. (a) This part contains all rules that are applicable to surface coal mining...

  16. 77 FR 58053 - Kentucky Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-09-19

    ... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 917... Mining Reclamation and Enforcement (OSM), Interior. ACTION: Proposed rule; Removal of Required Amendments... program'') under the Surface Mining Control and Reclamation Act of 1977 (SMCRA or the Act). As a result of...

  17. 30 CFR 937.700 - Oregon Federal program.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... Federal program. (c) The rules in this part apply to all surface coal mining operations in Oregon... more stringent environmental control and regulation of surface coal mining operations than do the... extent they provide for regulation of surface coal mining and reclamation operations which are exempt...

  18. 30 CFR 912.700 - Idaho Federal program.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... seq. and Rules 1 through 20 promulgated thereunder pertaining to regulation of dredge mining. (6... Mineral Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR PROGRAMS FOR THE CONDUCT OF SURFACE MINING OPERATIONS WITHIN EACH STATE IDAHO § 912.700 Idaho Federal...

  19. 75 FR 64411 - Lowering Miners' Exposure to Respirable Coal Mine Dust, Including Continuous Personal Dust Monitors

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-10-19

    ...The Mine Safety and Health Administration (MSHA) proposes to lower miners' exposure to respirable coal mine dust by revising the Agency's existing standards on miners' occupational exposure to respirable coal mine dust. The major provisions of the proposal would lower the existing exposure limit; provide for full-shift sampling; redefine the term ``normal production shift; '' and add reexamination and decertification requirements for persons certified to sample, and maintain and calibrate sampling devices. In addition, the proposed rule would provide for single shift compliance sampling under the mine operator and MSHA's inspector sampling programs, and would establish sampling requirements for use of the Continuous Personal Dust Monitor (CPDM) and expanded requirements for medical surveillance. The proposed rule would significantly improve health protections for this Nation's coal miners by reducing their occupational exposure to respirable coal mine dust and lowering the risk that they will suffer material impairment of health or functional capacity over their working lives.

  20. MINE: Module Identification in Networks

    PubMed Central

    2011-01-01

    Background: Graphical models of network associations are useful for both visualizing and integrating multiple types of association data. Identifying modules, or groups of functionally related gene products, is an important challenge in analyzing biological networks. However, existing tools to identify modules are insufficient when applied to dense networks of experimentally derived interaction data. To address this problem, we have developed an agglomerative clustering method that is able to identify highly modular sets of gene products within highly interconnected molecular interaction networks. Results: MINE outperforms MCODE, CFinder, NEMO, SPICi, and MCL in identifying non-exclusive, high modularity clusters when applied to the C. elegans protein-protein interaction network. The algorithm generally achieves superior geometric accuracy and modularity for annotated functional categories. In comparison with the most closely related algorithm, MCODE, the top clusters identified by MINE are consistently of higher density and MINE is less likely to designate overlapping modules as a single unit. MINE offers a high level of granularity with a small number of adjustable parameters, enabling users to fine-tune cluster results for input networks with differing topological properties. Conclusions: MINE was created in response to the challenge of discovering high quality modules of gene products within highly interconnected biological networks. The algorithm allows a high degree of flexibility and user-customisation of results with few adjustable parameters. MINE outperforms several popular clustering algorithms in identifying modules with high modularity and obtains good overall recall and precision of functional annotations in protein-protein interaction networks from both S. cerevisiae and C. elegans. PMID:21605434

  1. Improved mine blast algorithm for optimal cost design of water distribution systems

    NASA Astrophysics Data System (ADS)

    Sadollah, Ali; Guen Yoo, Do; Kim, Joong Hoon

    2015-12-01

    The design of water distribution systems is a large class of combinatorial, nonlinear optimization problems with complex constraints such as conservation of mass and energy equations. Since feasible solutions are often extremely complex, traditional optimization techniques are insufficient. Recently, metaheuristic algorithms have been applied to this class of problems because they are highly efficient. In this article, a recently developed optimizer called the mine blast algorithm (MBA) is considered. The MBA is improved and coupled with the hydraulic simulator EPANET to find the optimal cost design for water distribution systems. The performance of the improved mine blast algorithm (IMBA) is demonstrated using the well-known Hanoi, New York tunnels and Balerma benchmark networks. Optimization results obtained using IMBA are compared to those using MBA and other optimizers in terms of their minimum construction costs and convergence rates. For the complex Balerma network, IMBA offers the cheapest network design compared to other optimization algorithms.

  2. Mining subspace clusters from DNA microarray data using large itemset techniques.

    PubMed

    Chang, Ye-In; Chen, Jiun-Rung; Tsai, Yueh-Chi

    2009-05-01

    Mining subspace clusters from DNA microarrays could help researchers identify genes that commonly contribute to a disease, where a subspace cluster is a subset of genes whose expression levels are similar under a subset of conditions. Because the number of genes in a DNA microarray is far larger than the number of conditions, previously proposed algorithms that compute the maximum dimension sets (MDSs) for every pair of genes take a long time to mine subspace clusters. In this article, we propose the Large Itemset-Based Clustering (LISC) algorithm for mining subspace clusters. Instead of constructing MDSs for every pair of genes, we construct MDSs only for every pair of conditions. We then transform the task of finding the maximal possible gene sets into the problem of mining large itemsets from the condition-pair MDSs. Since we are only interested in subspace clusters with gene sets as large as possible, it is desirable to pay attention to gene sets that have reasonably large support values in the condition-pair MDSs. Our simulation results show that the proposed algorithm needs less processing time than previously proposed algorithms that construct gene-pair MDSs.
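
    The step of "mining large itemsets from the condition-pair MDSs" can be pictured with a generic level-wise (Apriori-style) miner. The sketch below is not the LISC algorithm itself: it assumes that each condition-pair MDS is treated as a transaction whose items are genes, and the function name, gene labels and support threshold are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise search for itemsets contained in at least min_support
    transactions. Returns a dict mapping each frequent itemset to its support."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support]
    result = {}
    while level:
        for itemset in level:
            result[itemset] = support(itemset)
        # Join step: merge frequent k-itemsets that differ in exactly one item.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1}
        # Prune step: keep a candidate only if all its k-subsets are frequent
        # and the candidate itself reaches the support threshold.
        level = [c for c in candidates
                 if all(frozenset(s) in result
                        for s in combinations(c, len(c) - 1))
                 and support(c) >= min_support]
    return result

# Hypothetical gene sets, one per condition pair (assumed transaction layout).
condition_pair_sets = [
    {"g1", "g2", "g3"},
    {"g1", "g2"},
    {"g1", "g2", "g4"},
    {"g3", "g4"},
]
found = frequent_itemsets(condition_pair_sets, min_support=3)
```

    Here the gene set {g1, g2} is the largest itemset that reaches the threshold, which corresponds to the abstract's interest in gene sets that are as large as possible while keeping reasonably large support.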

  3. Depth data research of GIS based on clustering analysis algorithm

    NASA Astrophysics Data System (ADS)

    Xiong, Yan; Xu, Wenli

    2018-03-01

    GIS data are spatially distributed. Geographic data have both spatial and attribute characteristics and also change over time, so the amount of data is very large. Nowadays, many industries and departments use GIS. However, without a proper data analysis and mining scheme, GIS cannot be used to its full effect and a lot of data is wasted. In this paper, we take the geographic information demands of a national security department as the experimental object and, combining the characteristics of GIS data and taking into account time, space and attribute characteristics, apply a cluster analysis algorithm. We further study the mining scheme for depth data and obtain the algorithm model. This algorithm can automatically classify sample data and then carry out exploratory analysis. The research shows that the algorithm model and the information mining scheme can quickly find hidden depth information from the surface data of GIS, thus improving the efficiency of the security department. This algorithm can also be extended to other fields.

  4. Bayesian Analysis of High Dimensional Classification

    NASA Astrophysics Data System (ADS)

    Mukhopadhyay, Subhadeep; Liang, Faming

    2009-12-01

    Modern data mining and bioinformatics have presented an important playground for statistical learning techniques, where the number of input variables is possibly much larger than the sample size of the training data. In supervised learning, logistic regression or probit regression can be used to model a binary output and form perceptron classification rules based on Bayesian inference. In these cases, there is a lot of interest in searching for sparse models in the high-dimensional regression (or classification) setup. We first discuss two common challenges in analyzing high-dimensional data. The first is the curse of dimensionality: the complexity of many existing algorithms scales exponentially with the dimensionality of the space, so the algorithms soon become computationally intractable and therefore inapplicable in many real applications. The second is multicollinearity among the predictors, which severely slows down the algorithm. To make Bayesian analysis operational in high dimensions, we propose a novel 'Hierarchical stochastic approximation Monte Carlo algorithm' (HSAMC), which overcomes the curse of dimensionality and the multicollinearity of predictors in high dimensions, and also possesses a self-adjusting mechanism to avoid local minima separated by high energy barriers. Models and methods are illustrated by simulations inspired by the field of genomics. Numerical results indicate that HSAMC can work as a general model selection sampler in high-dimensional complex model spaces.

  5. Rare itemsets mining algorithm based on RP-Tree and spark framework

    NASA Astrophysics Data System (ADS)

    Liu, Sainan; Pan, Haoan

    2018-05-01

    To address rare itemset mining in big data, this paper proposes a rare itemset mining algorithm based on RP-Tree and the Spark framework. First, it arranges the data vertically by transaction identifier; to avoid scanning the entire data set, the vertical data are divided into frequent vertical datasets and rare vertical datasets. Then, it adopts the RP-Tree algorithm to construct a frequent pattern tree that contains rare items and to generate rare 1-itemsets. After that, it calculates the support of the itemsets by scanning the two vertical datasets. Finally, it uses an iterative process to generate rare itemsets. The experimental results show that the algorithm can effectively mine rare itemsets and has a clear advantage in execution time.
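
    The vertical layout and support counting described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: it uses plain Python rather than RP-Tree or Spark, it enumerates candidates by brute force, and the function names and the two thresholds (min_rare, min_freq) are hypothetical, not taken from the paper.

```python
from itertools import combinations


def vertical_layout(transactions):
    """Map each item to the set of transaction IDs (tidset) that contain it."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def support(itemset, tidsets):
    """Support count = size of the intersection of the items' tidsets."""
    sets = [tidsets.get(item, set()) for item in itemset]
    return len(set.intersection(*sets)) if sets else 0


def rare_itemsets(transactions, min_rare, min_freq, max_size=2):
    """Return itemsets whose support s satisfies min_rare <= s < min_freq.

    Assumption: "rare" means below the frequent threshold but above a
    noise floor; candidates are enumerated exhaustively up to max_size.
    """
    tidsets = vertical_layout(transactions)
    items = sorted(tidsets)
    found = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            s = support(candidate, tidsets)
            if min_rare <= s < min_freq:
                found[candidate] = s
    return found


# Hypothetical toy data for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "caviar"},
    {"bread", "milk"},
    {"bread", "caviar"},
    {"milk"},
]
result = rare_itemsets(transactions, min_rare=1, min_freq=3)
```

    With the toy data above, only itemsets containing "caviar" fall between the two thresholds, because intersecting tidsets gives each support count without rescanning the transactions.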

  6. VLSI implementation of flexible architecture for decision tree classification in data mining

    NASA Astrophysics Data System (ADS)

    Sharma, K. Venkatesh; Shewandagn, Behailu; Bhukya, Shankar Nayak

    2017-07-01

    Data mining algorithms have become vital to researchers in science, engineering, medicine, business, search and security domains. In recent years, there has been a tremendous rise in the size of the data being collected and analyzed. Classification is the main difficulty faced in data mining. Among the solutions developed for this problem, the most widely accepted is Decision Tree Classification (DTC), which gives high precision while handling very large amounts of data. This paper presents a VLSI implementation of a flexible architecture for decision tree classification in data mining using the C4.5 algorithm.

  7. Economic Considerations of Early Rule-In/Rule-Out Algorithms for The Diagnosis of Myocardial Infarction in The Emergency Department Using Cardiac Troponin and Glycemic Biomarkers.

    PubMed

    Shortt, Colleen; Xie, Feng; Whitlock, Richard; Ma, Jinhui; Clayton, Natasha; Sherbino, Jonathan; Hill, Stephen A; Pare, Guillaume; McQueen, Matthew; Mehta, Shamir R; Devereaux, P J; Worster, Andrew; Kavsak, Peter

    2017-02-01

    We have previously demonstrated the utility of a rule-in/rule-out strategy for myocardial infarction (MI) using glycemic biomarkers in combination with cardiac troponin in the emergency department (ED). Given that the cost of assessing patients with possible MI in the ED is increasing, we sought to compare the health services cost of our previously identified early rule-in/rule-out approaches for MI among patients who present to the ED with symptoms suggestive of acute coronary syndrome (ACS). We compared the cost differences between different rule-in/rule-out strategies for MI using presentation cardiac troponin I (cTnI), high-sensitivity cTnI (hs-cTnI), high-sensitivity cardiac troponin T (hs-cTnT), glucose, and/or hemoglobin A 1c (Hb A 1c ) in 1137 ED patients (7-day MI n = 133) as per our previously defined algorithms and compared them with the European Society of Cardiology (ESC) 0-h algorithm-cutoffs. Costs associated with each decision model were obtained from site-specific sources (length of stay) and provincial sources (Ontario Case Costing Initiative). Algorithms incorporating cardiac troponin and glucose for early rule-in/rule-out were the most cost effective and clinically safest methods (i.e., ≤1 MI missed) for early decision making, with hs-cTnI and glucose yielding lower costs compared to cTnI and glucose, despite the higher price for the hs-cTnI test. The addition of Hb A 1c to the algorithms increased the cost of these algorithms but did not miss any additional patients with MI. Applying the ESC 0-h algorithm-cutoffs for hs-cTnI and hs-cTnT were the most costly. Rule-in/rule-out algorithms incorporating presentation glucose with high-sensitivity cardiac troponin are the safest and most cost-effective options as compared to the ESC 0-h algorithm-cutoffs. © 2016 American Association for Clinical Chemistry.

  8. 76 FR 41411 - West Virginia Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-07-14

    ... of Environmental Protection (WVDEP). The interim rule provided an opportunity for public comment and... 30 CFR Part 948 Intergovernmental relations, Surface mining, Underground mining. Dated: July 5, 2011...

  9. Comparative analysis of chemical similarity methods for modular natural products with a hypothetical structure enumeration algorithm.

    PubMed

    Skinnider, Michael A; Dejong, Chris A; Franczak, Brian C; McNicholas, Paul D; Magarvey, Nathan A

    2017-08-16

    Natural products represent a prominent source of pharmaceutically and industrially important agents. Calculating the chemical similarity of two molecules is a central task in cheminformatics, with applications at multiple stages of the drug discovery pipeline. Quantifying the similarity of natural products is a particularly important problem, as the biological activities of these molecules have been extensively optimized by natural selection. The large and structurally complex scaffolds of natural products distinguish their physical and chemical properties from those of synthetic compounds. However, no analysis of the performance of existing methods for molecular similarity calculation specific to natural products has been reported to date. Here, we present LEMONS, an algorithm for the enumeration of hypothetical modular natural product structures. We leverage this algorithm to conduct a comparative analysis of molecular similarity methods within the unique chemical space occupied by modular natural products using controlled synthetic data, and comprehensively investigate the impact of diverse biosynthetic parameters on similarity search. We additionally investigate a recently described algorithm for natural product retrobiosynthesis and alignment, and find that when rule-based retrobiosynthesis can be applied, this approach outperforms conventional two-dimensional fingerprints, suggesting it may represent a valuable approach for the targeted exploration of natural product chemical space and microbial genome mining. Our open-source algorithm is an extensible method of enumerating hypothetical natural product structures with diverse potential applications in bioinformatics.

  10. Temporal data mining for the quality assessment of hemodialysis services.

    PubMed

    Bellazzi, Riccardo; Larizza, Cristiana; Magni, Paolo; Bellazzi, Roberto

    2005-05-01

    This paper describes the temporal data mining aspects of a research project that deals with the definition of methods and tools for the assessment of the clinical performance of hemodialysis (HD) services, on the basis of the time series automatically collected during hemodialysis sessions. Intelligent data analysis and temporal data mining techniques are applied to gain insight and to discover knowledge on the causes of unsatisfactory clinical results. In particular, two new methods for association rule discovery and temporal rule discovery are applied to the time series. Such methods exploit several pre-processing techniques, comprising data reduction, multi-scale filtering and temporal abstractions. We have analyzed the data of more than 5800 dialysis sessions coming from 43 different patients monitored for 19 months. The qualitative rules associating the outcome parameters and the measured variables were examined by the domain experts, who were able to distinguish between rules confirming available background knowledge and unexpected but plausible rules. The new methods proposed in the paper are suitable tools for knowledge discovery in clinical time series. Their use in the context of an auditing system for dialysis management helped clinicians to improve their understanding of the patients' behavior.

  11. Graph Mining Meets the Semantic Web

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, Sangkeun; Sukumar, Sreenivas R; Lim, Seung-Hwan

    The Resource Description Framework (RDF) and SPARQL Protocol and RDF Query Language (SPARQL) were introduced about a decade ago to enable flexible schema-free data interchange on the Semantic Web. Today, data scientists use the framework as a scalable graph representation for integrating, querying, exploring and analyzing data sets hosted at different sources. With increasing adoption, the need for graph mining capabilities for the Semantic Web has emerged. We address that need through implementation of three popular iterative Graph Mining algorithms (Triangle count, Connected component analysis, and PageRank). We implement these algorithms as SPARQL queries, wrapped within Python scripts. We evaluate the performance of our implementation on 6 real world data sets and show graph mining algorithms (that have a linear-algebra formulation) can indeed be unleashed on data represented as RDF graphs using the SPARQL query interface.
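
    Two of the algorithms named in this abstract can be sketched directly over (subject, predicate, object) triples. The sketch below is plain in-memory Python, not the SPARQL-based implementation the authors describe; the function names, the damping factor of 0.85 and the iteration count are assumptions chosen for illustration.

```python
from collections import defaultdict


def undirected_adjacency(triples):
    """Build an undirected neighbour map, ignoring predicates and self-loops."""
    adj = defaultdict(set)
    for s, _p, o in triples:
        if s != o:
            adj[s].add(o)
            adj[o].add(s)
    return adj


def triangle_count(triples):
    """Count each triangle once by requiring the ordering u < v < w."""
    adj = undirected_adjacency(triples)
    count = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                count += sum(1 for w in adj[u] & adj[v] if w > v)
    return count


def pagerank(triples, damping=0.85, iterations=50):
    """Power-iteration PageRank; rank held by dangling nodes is spread evenly."""
    out_links = defaultdict(set)
    nodes = set()
    for s, _p, o in triples:
        out_links[s].add(o)
        nodes.update((s, o))
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank


# Hypothetical triples for illustration only.
triples = [
    ("a", "knows", "b"),
    ("b", "knows", "c"),
    ("c", "knows", "a"),
    ("c", "knows", "d"),
]
triangles = triangle_count(triples)
ranks = pagerank(triples)
```

    Each PageRank iteration is a sparse matrix-vector product written as a loop over edges, which is the linear-algebra formulation the abstract refers to.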

  12. 30 CFR 912.700 - Idaho Federal program.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... Mineral Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR PROGRAMS FOR THE CONDUCT OF SURFACE MINING OPERATIONS WITHIN EACH STATE IDAHO § 912.700 Idaho Federal program. (a) This part contains all rules that are applicable to surface coal mining operations in Idaho...

  13. 30 CFR 905.700 - California Federal Program.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ....700 Mineral Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR PROGRAMS FOR THE CONDUCT OF SURFACE MINING OPERATIONS WITHIN EACH STATE CALIFORNIA § 905.700 California Federal Program. (a) This part contains all rules that are applicable to surface coal mining operations in...

  14. 30 CFR 947.700 - Washington Federal program.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ....700 Mineral Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR PROGRAMS FOR THE CONDUCT OF SURFACE MINING OPERATIONS WITHIN EACH STATE WASHINGTON § 947.700 Washington Federal program. (a) This part contains all rules that are applicable to surface coal mining operations in...

  15. 30 CFR 922.700 - Michigan Federal program.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ....700 Mineral Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR PROGRAMS FOR THE CONDUCT OF SURFACE MINING OPERATIONS WITHIN EACH STATE MICHIGAN § 922.700 Michigan Federal program. (a) This part contains all rules that are applicable to surface coal mining operations in...

  16. 30 CFR 910.700 - Georgia Federal program.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ....700 Mineral Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR PROGRAMS FOR THE CONDUCT OF SURFACE MINING OPERATIONS WITHIN EACH STATE GEORGIA § 910.700 Georgia Federal program. (a) This part contains all rules that are applicable to surface coal mining operations in Georgia...

  17. 30 CFR 937.700 - Oregon Federal program.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... Mineral Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR PROGRAMS FOR THE CONDUCT OF SURFACE MINING OPERATIONS WITHIN EACH STATE OREGON § 937.700 Oregon Federal program. (a) This part contains all rules that are applicable to surface coal mining operations in Oregon...

  18. 30 CFR 942.700 - Tennessee Federal program.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ....700 Mineral Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR PROGRAMS FOR THE CONDUCT OF SURFACE MINING OPERATIONS WITHIN EACH STATE TENNESSEE § 942.700 Tennessee Federal program. (a) This part contains all rules that are applicable to surface coal mining operations in...

  19. A Novel Hybrid Intelligent Indoor Location Method for Mobile Devices by Zones Using Wi-Fi Signals

    PubMed Central

    Castañón–Puga, Manuel; Salazar, Abby Stephanie; Aguilar, Leocundo; Gaxiola-Pacheco, Carelia; Licea, Guillermo

    2015-01-01

    The increasing use of mobile devices in indoor spaces brings challenges to location methods. This work presents a hybrid intelligent method based on data mining and Type-2 fuzzy logic to locate mobile devices in an indoor space by zones using Wi-Fi signals from selected access points (APs). This approach takes advantage of wireless local area networks (WLANs) over other types of architectures and implements the complete method in a mobile application using the developed tools. Besides, the proposed approach is validated by experimental data obtained from case studies and the cross-validation technique. For the purpose of generating the fuzzy rules that conform to the Takagi–Sugeno fuzzy system structure, a semi-supervised data mining technique called subtractive clustering is used. This algorithm finds centers of clusters from the radius map given by the collected signals from APs. Measurements of Wi-Fi signals can be noisy due to several factors mentioned in this work, so this method proposed the use of Type-2 fuzzy logic for modeling and dealing with such uncertain information. PMID:26633417

  20. A Novel Hybrid Intelligent Indoor Location Method for Mobile Devices by Zones Using Wi-Fi Signals.

    PubMed

    Castañón-Puga, Manuel; Salazar, Abby Stephanie; Aguilar, Leocundo; Gaxiola-Pacheco, Carelia; Licea, Guillermo

    2015-12-02

    The increasing use of mobile devices in indoor spaces brings challenges to location methods. This work presents a hybrid intelligent method based on data mining and Type-2 fuzzy logic to locate mobile devices in an indoor space by zones using Wi-Fi signals from selected access points (APs). This approach takes advantage of wireless local area networks (WLANs) over other types of architectures and implements the complete method in a mobile application using the developed tools. Besides, the proposed approach is validated by experimental data obtained from case studies and the cross-validation technique. For the purpose of generating the fuzzy rules that conform to the Takagi-Sugeno fuzzy system structure, a semi-supervised data mining technique called subtractive clustering is used. This algorithm finds centers of clusters from the radius map given by the collected signals from APs. Measurements of Wi-Fi signals can be noisy due to several factors mentioned in this work, so this method proposed the use of Type-2 fuzzy logic for modeling and dealing with such uncertain information.

  1. Object-Driven and Temporal Action Rules Mining

    ERIC Educational Resources Information Center

    Hajja, Ayman

    2013-01-01

    In this thesis, I present my complete research work in the field of action rules, more precisely object-driven and temporal action rules. The drive behind the introduction of object-driven and temporally based action rules is to bring forth an adapted approach to extract action rules from a subclass of systems that have a specific nature, in which…

  2. Data Mining.

    ERIC Educational Resources Information Center

    Benoit, Gerald

    2002-01-01

    Discusses data mining (DM) and knowledge discovery in databases (KDD), taking the view that KDD is the larger view of the entire process, with DM emphasizing the cleaning, warehousing, mining, and visualization of knowledge discovery in databases. Highlights include algorithms; users; the Internet; text mining; and information extraction.…

  3. Intertransaction Class Association Rule Mining Based on Genetic Network Programming and Its Application to Stock Market Prediction

    NASA Astrophysics Data System (ADS)

    Yang, Yuchen; Mabu, Shingo; Shimada, Kaoru; Hirasawa, Kotaro

    Intertransaction association rules have been reported to be useful in many fields such as stock market prediction, but there are still few efficient methods for mining them from large data sets. Furthermore, how to use and measure these more complex rules should be considered carefully. In this paper, we propose a new intertransaction class association rule mining method based on Genetic Network Programming (GNP), which can overcome some shortcomings of Apriori-like intertransaction association methods. Moreover, a general classifier model for intertransaction rules is also introduced. In experiments on the real-world application of stock market prediction, the method shows its efficiency and ability to obtain good results, and can bring more benefits with a suitable classifier considering a larger interval span.

  4. Determinants and development of a web-based child mortality prediction model in resource-limited settings: A data mining approach.

    PubMed

    Tesfaye, Brook; Atique, Suleman; Elias, Noah; Dibaba, Legesse; Shabbir, Syed-Abdul; Kebede, Mihiretu

    2017-03-01

    Improving child health and reducing child mortality rate are key health priorities in developing countries. This study aimed to identify determinants and develop a web-based child mortality prediction model in Ethiopian local language using classification data mining algorithms. Decision tree (using the J48 algorithm) and rule induction (using the PART algorithm) techniques were applied on 11,654 records of Ethiopian demographic and health survey data. Waikato Environment for Knowledge Analysis (WEKA) for Windows version 3.6.8 was used to develop optimal models. 8157 (70%) records were randomly allocated to the training group for model building, while the remaining 3496 (30%) records were allocated as the test group for model validation. The validation of the model was assessed using accuracy, sensitivity, specificity and area under the Receiver Operating Characteristics (ROC) curve. Using Statistical Package for Social Sciences (SPSS) version 20.0, logistic regression and Odds Ratios (OR) with 95% Confidence Intervals (CI) were used to identify determinants of child mortality. The child mortality rate was 72 deaths per 1000 live births. Breast-feeding (AOR= 1.46, 95% CI [1.22, 1.75]), maternal education (AOR= 1.40, 95% CI [1.11, 1.81]), family planning (AOR= 1.21, [1.08, 1.43]), preceding birth interval (AOR= 4.90, [2.94, 8.15]), presence of diarrhea (AOR= 1.54, 95% CI [1.32, 1.66]), father's education (AOR= 1.4, 95% CI [1.04, 1.78]), low birth weight (AOR= 1.2, 95% CI [0.98, 1.51]) and age of the mother at first birth (AOR= 1.42, [1.01, 1.89]) were found to be determinants for child mortality. The J48 model had better performance: accuracy (94.3%), sensitivity (93.8%), specificity (94.3%), Positive Predictive Value (PPV) (92.2%), Negative Predictive Value (NPV) (94.5%) and area under ROC (94.8%). After developing an optimal prediction model, we relied on this model to develop a web-based application system for child mortality prediction. In this study, nearly accurate results were obtained by employing decision tree and rule induction techniques. Determinants were identified and a web-based child mortality prediction model in Ethiopian local language was developed. Thus, the result obtained could support child health intervention programs in Ethiopia, where trained human resources for health are limited. Advanced classification algorithms need to be tested to come up with optimal models. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
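
    The validation measures listed in this abstract all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name and the counts are hypothetical and are not the study's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts.

    tp/fp/tn/fn = true positives, false positives, true negatives,
    false negatives.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)
```

    Reporting sensitivity and specificity alongside accuracy matters when the outcome is rare, since a model can reach high accuracy by always predicting the majority class.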

  5. Rescue complex for coal mines

    NASA Astrophysics Data System (ADS)

    Yungmeyster, D. A.; Urazbakhtin, R. Yu

    2017-10-01

    The mining industry has always been potentially dangerous; even with the use of modern equipment in mines, accidents continue to occur, including catastrophic ones. Accidents in mines are due to the presence of specific features in the conduct of mining operations. These include the inconsistency of mining and geological conditions, the contamination of the mine atmosphere due to the release of gases from minerals, and the presence of self-igniting coal strata, which creates the danger of underground fires and gas explosions. The main cause of accidents is the irresponsibility of both the manager and the personnel who violate the safety rules during mining operations.

  6. Performance analysis of a multispectral system for mine detection in the littoral zone

    NASA Astrophysics Data System (ADS)

    Hargrove, John T.; Louchard, Eric

    2004-09-01

    Science & Technology International (STI) has developed, under contract with the Office of Naval Research, a system of multispectral airborne sensors and processing algorithms capable of detecting mine-like objects in the surf zone. STI has used this system to detect mine-like objects in a littoral environment as part of blind tests at Kaneohe Marine Corps Base Hawaii, and Panama City, Florida. The airborne and ground subsystems are described. The detection algorithm is graphically illustrated. We report on the performance of the system configured to operate without a human in the loop. A subsurface (underwater bottom proud mine in the surf zone and moored mine in shallow water) mine detection capability is demonstrated in the surf zone, and in shallow water with wave spillage and foam. Our analysis demonstrates that this STI-developed multispectral airborne mine detection system provides a technical foundation for a viable mine counter-measures system for use prior to an amphibious assault.

  7. Graphics-based intelligent search and abstracting using Data Modeling

    NASA Astrophysics Data System (ADS)

    Jaenisch, Holger M.; Handley, James W.; Case, Carl T.; Songy, Claude G.

    2002-11-01

    This paper presents an autonomous text and context-mining algorithm that converts text documents into point clouds for visual search cues. This algorithm is applied to the task of data-mining a scriptural database comprised of the Old and New Testaments from the Bible and the Book of Mormon, Doctrine and Covenants, and the Pearl of Great Price. Results are generated which graphically show the scripture that represents the average concept of the database and the mining of the documents down to the verse level.

  8. Cart'Eaux: an automatic mapping procedure for wastewater networks using machine learning and data mining

    NASA Astrophysics Data System (ADS)

    Bailly, J. S.; Delenne, C.; Chahinian, N.; Bringay, S.; Commandré, B.; Chaumont, M.; Derras, M.; Deruelle, L.; Roche, M.; Rodriguez, F.; Subsol, G.; Teisseire, M.

    2017-12-01

    In France, local government institutions must establish a detailed description of wastewater networks. The information should be available, but it remains fragmented (different formats held by different stakeholders) and incomplete. In the "Cart'Eaux" project, a multidisciplinary team, including an industrial partner, develops a global methodology using Machine Learning and Data Mining approaches applied to various types of large data to recover information in the aim of mapping urban sewage systems for hydraulic modelling. Deep-learning is first applied using a Convolution Neural Network to localize manhole covers on 5 cm resolution aerial RGB images. The detected manhole covers are then automatically connected using a tree-shaped graph constrained by industry rules. Based on a Delaunay triangulation, connections are chosen to minimize a cost function depending on pipe length, slope and possible intersection with roads or buildings. A stochastic version of this algorithm is currently being developed to account for positional uncertainty and detection errors, and generate sets of probable networks. As more information is required for hydraulic modeling (slopes, diameters, materials, etc.), text data mining is used to extract network characteristics from data posted on the Web or available through governmental or specific databases. Using an appropriate list of keywords, the web is scoured for documents which are saved in text format. The thematic entities are identified and linked to the surrounding spatial and temporal entities. The methodology is developed and tested on two towns in southern France. The primary results are encouraging: 54% of manhole covers are detected with few false detections, enabling the reconstruction of probable networks. The data mining results are still being investigated. It is clear at this stage that getting numerical values on specific pipes will be challenging. Thus, when no information is found, decision rules will be used to assign admissible numerical values to enable the final hydraulic modelling. Consequently, sensitivity analysis of the hydraulic model will be performed to take into account the uncertainty associated with each piece of information. Project funded by the European Regional Development Fund and the Occitanie Region.
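
    The step of connecting detected manhole covers with a tree that minimizes a cost function can be illustrated with a minimum spanning tree. The sketch below makes several simplifying assumptions: it considers every pair of points rather than only Delaunay edges, the cost is Euclidean length plus an optional penalty supplied by the caller (a stand-in for slope or road-crossing terms), and the name connect_manholes is hypothetical.

```python
import math
from itertools import combinations


def connect_manholes(points, extra_cost=lambda a, b: 0.0):
    """Return the edges of a minimum-cost spanning tree (Kruskal's algorithm).

    points     -- list of (x, y) coordinates of detected manhole covers
    extra_cost -- hypothetical penalty for linking point indices a and b
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    candidates = sorted(
        (math.dist(points[a], points[b]) + extra_cost(a, b), a, b)
        for a, b in combinations(range(len(points)), 2)
    )
    tree = []
    for _cost, a, b in candidates:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # adding this edge creates no cycle
            parent[root_a] = root_b
            tree.append((a, b))
    return tree


# Hypothetical coordinates in metres for illustration only.
manholes = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 11.0)]
network = connect_manholes(manholes)
```

    Restricting the candidate edges to a Delaunay triangulation, as the abstract describes, would reduce the number of pairs to sort from quadratic to linear in the number of points without changing the resulting Euclidean minimum spanning tree.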

  9. Finding novel relationships with integrated gene-gene association network analysis of Synechocystis sp. PCC 6803 using species-independent text-mining

    PubMed Central

    Kreula, Sanna M.; Kaewphan, Suwisa; Ginter, Filip

    2018-01-01

    The increasing move towards open access full-text scientific literature enhances our ability to utilize advanced text-mining methods to construct information-rich networks that no human will be able to grasp simply from ‘reading the literature’. The utility of text-mining for well-studied species is obvious though the utility for less studied species, or those with no prior track-record at all, is not clear. Here we present a concept for how advanced text-mining can be used to create information-rich networks even for less well studied species and apply it to generate an open-access gene-gene association network resource for Synechocystis sp. PCC 6803, a representative model organism for cyanobacteria and first case-study for the methodology. By merging the text-mining network with networks generated from species-specific experimental data, network integration was used to enhance the accuracy of predicting novel interactions that are biologically relevant. A rule-based algorithm (filter) was constructed in order to automate the search for novel candidate genes with a high degree of likely association to known target genes by (1) ignoring established relationships from the existing literature, as they are already ‘known’, and (2) demanding multiple independent evidences for every novel and potentially relevant relationship. Using selected case studies, we demonstrate the utility of the network resource and filter to (i) discover novel candidate associations between different genes or proteins in the network, and (ii) rapidly evaluate the potential role of any one particular gene or protein. The full network is provided as an open-source resource. PMID:29844966

  10. Toward a Progress Indicator for Machine Learning Model Building and Data Mining Algorithm Execution: A Position Paper.

    PubMed

    Luo, Gang

    2017-12-01

    For user-friendliness, many software systems offer progress indicators for long-duration tasks. A typical progress indicator continuously estimates the remaining task execution time as well as the portion of the task that has been finished. Building a machine learning model often takes a long time, but no existing machine learning software supplies a non-trivial progress indicator. Similarly, running a data mining algorithm often takes a long time, but no existing data mining software provides a nontrivial progress indicator. In this article, we consider the problem of offering progress indicators for machine learning model building and data mining algorithm execution. We discuss the goals and challenges intrinsic to this problem. Then we describe an initial framework for implementing such progress indicators and two advanced, potential uses of them, with the goal of inspiring future research on this topic.
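
    The two quantities a progress indicator reports, the fraction finished and the estimated remaining time, can be sketched as follows. This is the trivial kind of estimator, assuming every work unit costs the same; the function name and parameters are hypothetical, and the framework discussed in the paper targets harder cases where that assumption fails.

```python
def progress_estimate(units_done, units_total, elapsed_seconds):
    """Return (fraction_done, estimated_seconds_remaining).

    Assumes a constant cost per work unit, so the remaining time is the
    average time per finished unit multiplied by the units still to do.
    Returns None for the remaining time before any unit has finished.
    """
    if units_total <= 0:
        raise ValueError("units_total must be positive")
    if units_done <= 0:
        return 0.0, None
    fraction = units_done / units_total
    remaining = elapsed_seconds * (units_total - units_done) / units_done
    return fraction, remaining


# Hypothetical example: 25 of 100 training iterations done after 10 seconds.
fraction, remaining = progress_estimate(25, 100, 10.0)
```

    For model building, the hard part is that units_total is often unknown in advance (for example, the number of iterations until convergence), which is why a non-trivial indicator needs to revise its estimate as the task runs.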

  11. Toward a Progress Indicator for Machine Learning Model Building and Data Mining Algorithm Execution: A Position Paper

    PubMed Central

    Luo, Gang

    2017-01-01

    For user-friendliness, many software systems offer progress indicators for long-duration tasks. A typical progress indicator continuously estimates the remaining task execution time as well as the portion of the task that has been finished. Building a machine learning model often takes a long time, but no existing machine learning software supplies a non-trivial progress indicator. Similarly, running a data mining algorithm often takes a long time, but no existing data mining software provides a nontrivial progress indicator. In this article, we consider the problem of offering progress indicators for machine learning model building and data mining algorithm execution. We discuss the goals and challenges intrinsic to this problem. Then we describe an initial framework for implementing such progress indicators and two advanced, potential uses of them, with the goal of inspiring future research on this topic. PMID:29177022

  12. Redundancy checking algorithms based on parallel novel extension rule

    NASA Astrophysics Data System (ADS)

    Liu, Lei; Yang, Yang; Li, Guangli; Wang, Qi; Lü, Shuai

    2017-05-01

    Redundancy checking (RC) is a key knowledge reduction technology. Extension rule (ER) is a new reasoning method, first presented in 2003 and well received by experts at home and abroad. Novel extension rule (NER) is an improved ER-based reasoning method, presented in 2009. In this paper, we first analyse the characteristics of the extension rule, and then present a simple algorithm for redundancy checking based on extension rule (RCER). In addition, we introduce MIMF, a type of heuristic strategy. Using the aforementioned rule and strategy, we design and implement the RCHER algorithm, which relies on MIMF. Next we design and implement an RCNER (redundancy checking based on NER) algorithm based on NER. Parallel computing greatly accelerates the NER algorithm, which has weak dependence among tasks when executed. Considering this, we present PNER (parallel NER) and apply it to redundancy checking and necessity checking. Furthermore, we design and implement the RCPNER (redundancy checking based on PNER) and NCPPNER (necessary clause partition based on PNER) algorithms as well. The experimental results show that MIMF significantly accelerates the RCER algorithm on large-scale formulae with high redundancy. Comparing PNER with NER and RCPNER with RCNER, the average speedup can reach up to the number of task decompositions when executed. Comparing NCPPNER with the RCNER-based algorithm on separating redundant formulae, speedup increases steadily as the scale of the formulae increases. Finally, we describe the challenges that the extension rule will face and suggest possible solutions.

  13. Differentially Private Frequent Subgraph Mining

    PubMed Central

    Xu, Shengzhi; Xiong, Li; Cheng, Xiang; Xiao, Ke

    2016-01-01

    Mining frequent subgraphs from a collection of input graphs is an important topic in data mining research. However, if the input graphs contain sensitive information, releasing frequent subgraphs may pose considerable threats to individuals' privacy. In this paper, we study the problem of frequent subgraph mining (FGM) under the rigorous differential privacy model. We introduce a novel differentially private FGM algorithm, which is referred to as DFG. In this algorithm, we first privately identify frequent subgraphs from input graphs, and then compute the noisy support of each identified frequent subgraph. In particular, to privately identify frequent subgraphs, we present a frequent subgraph identification approach which can improve the utility of frequent subgraph identification through candidate pruning. Moreover, to compute the noisy support of each identified frequent subgraph, we devise a lattice-based noisy support derivation approach, where a series of methods has been proposed to improve the accuracy of the noisy supports. Through formal privacy analysis, we prove that our DFG algorithm satisfies ε-differential privacy. Extensive experimental results on real datasets show that the DFG algorithm can privately find frequent subgraphs with high data utility. PMID:27616876
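
    The idea of a "noisy support" can be illustrated with the standard Laplace mechanism. The sketch below is not the lattice-based derivation used by DFG; it only shows the generic building block, under the assumption that adding or removing one input graph changes a subgraph's support count by at most 1 (sensitivity 1). The function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_support(true_support, epsilon, sensitivity=1.0, rng=None):
    """Return the support count perturbed for epsilon-differential privacy.

    Assumption: `sensitivity` bounds how much one input graph can change
    the count (1 for a plain support count).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    return true_support + laplace_noise(sensitivity / epsilon, rng)
```

    A smaller ε gives stronger privacy but a larger noise scale, which is the utility trade-off the abstract refers to.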

  14. A gossip based information fusion protocol for distributed frequent itemset mining

    NASA Astrophysics Data System (ADS)

    Sohrabi, Mohammad Karim

    2018-07-01

    The computational complexity, huge memory space requirement, and time-consuming nature of the frequent pattern mining process are the most important motivations for distributing and parallelizing this mining process. On the other hand, the emergence of distributed computational and operational environments, which causes the production and maintenance of data on different distributed data sources, makes the parallelization and distribution of the knowledge discovery process inevitable. In this paper, a gossip based distributed itemset mining (GDIM) algorithm is proposed to extract frequent itemsets, which are special types of frequent patterns, in a wireless sensor network environment. In this algorithm, local frequent itemsets of each sensor are extracted using a bit-wise horizontal approach (LHPM) from the nodes, which are clustered using a LEACH-based protocol. Heads of clusters exploit a gossip based protocol to communicate with each other and find the patterns whose global support is equal to or more than the specified support threshold. Experimental results show that the proposed algorithm outperforms the best existing gossip based algorithm in terms of execution time.
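
    The notions of local support, global support, and the support threshold can be made concrete with a small bit-wise sketch. It is neither LHPM nor the gossip protocol itself: it assumes a horizontal layout in which each transaction is an integer bitmask over items, and it simply sums the local supports reported by each node. All function names are hypothetical.

```python
def encode(items, index):
    """Encode a collection of items as a bitmask using an item -> bit position map."""
    mask = 0
    for item in items:
        mask |= 1 << index[item]
    return mask


def local_support(transactions, itemset_mask):
    """Count the transactions (bitmasks) on one node that contain the itemset."""
    return sum(1 for t in transactions if (t & itemset_mask) == itemset_mask)


def globally_frequent(local_supports, min_support):
    """An itemset is globally frequent if its summed support meets the threshold."""
    return sum(local_supports) >= min_support
```

    With bitmasks, the containment test is a single AND and comparison per transaction, which is the usual motivation for bit-wise representations in this setting.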

  15. Neural network explanation using inversion.

    PubMed

    Saad, Emad W; Wunsch, Donald C

    2007-01-01

    An important drawback of many artificial neural networks (ANN) is their lack of explanation capability [Andrews, R., Diederich, J., & Tickle, A. B. (1996). A survey and critique of techniques for extracting rules from trained artificial neural networks. Knowledge-Based Systems, 8, 373-389]. This paper starts with a survey of algorithms which attempt to explain the ANN output. We then present HYPINV, a new explanation algorithm which relies on network inversion; i.e. calculating the ANN input which produces a desired output. HYPINV is a pedagogical algorithm, that extracts rules, in the form of hyperplanes. It is able to generate rules with arbitrarily desired fidelity, maintaining a fidelity-complexity tradeoff. To our knowledge, HYPINV is the only pedagogical rule extraction method, which extracts hyperplane rules from continuous or binary attribute neural networks. Different network inversion techniques, involving gradient descent as well as an evolutionary algorithm, are presented. An information theoretic treatment of rule extraction is presented. HYPINV is applied to example synthetic problems, to a real aerospace problem, and compared with similar algorithms using benchmark problems.

  16. Formulations and algorithms for problems on rock mass and support deformation during mining

    NASA Astrophysics Data System (ADS)

    Seryakov, VM

    2018-03-01

    The analysis of problem formulations to calculate the stress-strain state of mine support and the surrounding rock mass in rock mechanics shows that such formulations incompletely describe the mechanical features of joint deformation in the rock mass–support system. The present paper proposes an algorithm to take into account the actual conditions of rock mass and support interaction, and a method of implementing the algorithm that ensures efficient calculation of stresses in rocks and support.

  17. NIOSH comments to DOL on the Mine Safety and Health Administration's proposed rule on air quality, chemical substances, and respiratory protection standards by J. D. Millar, March 1, 1990

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    The testimony concerns the views of NIOSH regarding the Mine Safety and Health Administration (MSHA) proposed rule on permissible exposure limits; exposure monitoring, abrasive blasting; drill dust control; dangerous atmospheres; and prohibited areas for food and beverages. NIOSH continues to endorse the recommended exposure limit of 1 part per million (ppm) as a 15 minute short term exposure limit for nitrogen-dioxide (10102440). NIOSH supports MSHA in proposing an 8 hour time weighted average of 25ppm for nitric-oxide (10102439). NIOSH supports MSHA in proposing a limit of 35ppm as an 8 hour time weighted average (TWA) for carbon-monoxide (630080) and recommends that sulfur-dioxide (7446095) exposure be limited to 0.5ppm as an 8 hour TWA. NIOSH recommends that routine air monitoring be required on a periodic basis. NIOSH recommends that mine operators be required to establish a written exposure monitoring plan for each facility that outlines where area and personal samples should be taken, how many samples should be taken, and the implementation of the remaining portions of the proposed rule change. NIOSH supports the rules for abrasive blasting for both coal and metal/nonmetal mines and has identified several substitutive materials for silica sand that could be used in abrasive blasting.

  18. 30 CFR 77.1600 - Loading and haulage; general.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... permitted on haulage roads and at loading or dumping locations. (b) Traffic rules, signals, and warning signs shall be standardized at each mine and posted. (c) Where side or overhead clearances on any haulage road or at any loading or dumping location at the mine are hazardous to mine workers, such areas...

  19. 30 CFR 77.1600 - Loading and haulage; general.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... permitted on haulage roads and at loading or dumping locations. (b) Traffic rules, signals, and warning signs shall be standardized at each mine and posted. (c) Where side or overhead clearances on any haulage road or at any loading or dumping location at the mine are hazardous to mine workers, such areas...

  20. 30 CFR 77.1600 - Loading and haulage; general.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... permitted on haulage roads and at loading or dumping locations. (b) Traffic rules, signals, and warning signs shall be standardized at each mine and posted. (c) Where side or overhead clearances on any haulage road or at any loading or dumping location at the mine are hazardous to mine workers, such areas...

  1. 30 CFR 77.1600 - Loading and haulage; general.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... permitted on haulage roads and at loading or dumping locations. (b) Traffic rules, signals, and warning signs shall be standardized at each mine and posted. (c) Where side or overhead clearances on any haulage road or at any loading or dumping location at the mine are hazardous to mine workers, such areas...

  2. 30 CFR 77.1600 - Loading and haulage; general.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... permitted on haulage roads and at loading or dumping locations. (b) Traffic rules, signals, and warning signs shall be standardized at each mine and posted. (c) Where side or overhead clearances on any haulage road or at any loading or dumping location at the mine are hazardous to mine workers, such areas...

  3. 30 CFR 944.30 - State-Federal Cooperative Agreement.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... Division of Oil, Gas, and Mining (DOGM) will be responsible for administering this Agreement on behalf of..., Final Rules of the Board of Oil, Gas and Mining, UMC/SMC 700 et seq. [52 FR 7850, Mar. 13, 1987] ... INTERIOR PROGRAMS FOR THE CONDUCT OF SURFACE MINING OPERATIONS WITHIN EACH STATE UTAH § 944.30 State...

  4. 30 CFR 944.30 - State-Federal Cooperative Agreement.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... Division of Oil, Gas, and Mining (DOGM) will be responsible for administering this Agreement on behalf of..., Final Rules of the Board of Oil, Gas and Mining, UMC/SMC 700 et seq. [52 FR 7850, Mar. 13, 1987] ... INTERIOR PROGRAMS FOR THE CONDUCT OF SURFACE MINING OPERATIONS WITHIN EACH STATE UTAH § 944.30 State...

  5. 30 CFR 944.30 - State-Federal Cooperative Agreement.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... Division of Oil, Gas, and Mining (DOGM) will be responsible for administering this Agreement on behalf of..., Final Rules of the Board of Oil, Gas and Mining, UMC/SMC 700 et seq. [52 FR 7850, Mar. 13, 1987] ... INTERIOR PROGRAMS FOR THE CONDUCT OF SURFACE MINING OPERATIONS WITHIN EACH STATE UTAH § 944.30 State...

  6. 30 CFR 944.30 - State-Federal Cooperative Agreement.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... Division of Oil, Gas, and Mining (DOGM) will be responsible for administering this Agreement on behalf of..., Final Rules of the Board of Oil, Gas and Mining, UMC/SMC 700 et seq. [52 FR 7850, Mar. 13, 1987] ... INTERIOR PROGRAMS FOR THE CONDUCT OF SURFACE MINING OPERATIONS WITHIN EACH STATE UTAH § 944.30 State...

  7. Promoter Sequences Prediction Using Relational Association Rule Mining

    PubMed Central

    Czibula, Gabriela; Bocicor, Maria-Iuliana; Czibula, Istvan Gergely

    2012-01-01

    In this paper we approach, from a computational perspective, the problem of promoter sequence prediction, an important problem within the field of bioinformatics. As the conditions for a DNA sequence to function as a promoter are not known, machine learning based classification models are still being developed to approach the problem of promoter identification in DNA. We propose a classification model based on relational association rule mining. Relational association rules are a particular type of association rules and describe numerical orderings between attributes that commonly occur over a data set. Our classifier is based on the discovery of relational association rules for predicting whether a DNA sequence contains a promoter region. An experimental evaluation of the proposed model and a comparison with similar existing approaches are provided. The obtained results show that our classifier outperforms the existing techniques for identifying promoter sequences, confirming the potential of our proposal. PMID:22563233
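
    To illustrate what "numerical orderings between attributes that commonly occur over a data set" means, the following sketch mines only the simplest case: binary rules of the form "attribute i < attribute j" whose support (the fraction of records satisfying the ordering) reaches a threshold. It is a simplification for illustration, not the authors' classifier; the function name and the restriction to the "<" relation are assumptions.

```python
from itertools import permutations


def ordering_rules(records, min_support):
    """Return {(i, j): support} for every rule 'attr i < attr j' that holds
    in at least `min_support` (a fraction in [0, 1]) of the records.

    `records` is a non-empty list of equal-length numeric tuples.
    """
    n_attrs = len(records[0])
    rules = {}
    for i, j in permutations(range(n_attrs), 2):
        hits = sum(1 for r in records if r[i] < r[j])
        support = hits / len(records)
        if support >= min_support:
            rules[(i, j)] = support
    return rules
```

    A classifier in this spirit could mine one rule set per class and score a new sequence by how many of each class's rules it satisfies; that scoring step is left out here.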

  8. Rule Mining Techniques to Predict Prokaryotic Metabolic Pathways.

    PubMed

    Saidi, Rabie; Boudellioua, Imane; Martin, Maria J; Solovyev, Victor

    2017-01-01

    It is becoming more evident that computational methods are needed for the identification and the mapping of pathways in new genomes. We introduce an automatic annotation system (ARBA4Path: Association Rule-Based Annotator for Pathways) that utilizes rule mining techniques to predict metabolic pathways across a wide range of prokaryotes. It was demonstrated that specific combinations of protein domains (recorded in our rules) strongly determine the pathways in which proteins are involved and thus provide information that lets us very accurately assign pathway membership (with precision of 0.999 and recall of 0.966) to proteins of a given prokaryotic taxon. Our system can be used to enhance the quality of automatically generated annotations as well as to annotate proteins with unknown function. The prediction models are represented in the form of human-readable rules, and they can be used effectively to add absent pathway information to many proteins in the UniProtKB/TrEMBL database.

  9. Real-time dispatching modelling for trucks with different capacities in open pit mines / Modelowanie w czasie rzeczywistym przewozów ciężarówek o różnej ładowności w kopalni odkrywkowej

    NASA Astrophysics Data System (ADS)

    Ahangaran, Daryoush Kaveh; Yasrebi, Amir Bijan; Wetherelt, Andy; Foster, Patrick

    2012-10-01

    Application of fully automated systems for truck dispatching plays a major role in decreasing transportation costs, which often represent the majority of costs spent on open pit mining. Consequently, the application of a truck dispatching system has become fundamentally important in most of the world's open pit mines. Recent experience indicates that a truck dispatching system considerably improves the rate of production by decreasing a truck's travelling time and the waiting time of its associated shovel. Computer-based truck dispatching systems that use algorithms and advanced, accurate software are examples of these innovations. Developing an algorithm for a computer-based program suited to a specific mine's conditions is considered one of the most important activities in connection with computer-based dispatching in open pit mines. In this paper the changing trend of programming and dispatching control algorithms and automation conditions will be discussed. Furthermore, since the transportation fleet of most mines uses trucks with different capacities, innovative methods, operational optimisation techniques and the best possible methods for developing the required algorithm for real-time dispatching are selected by conducting research on mathematical-based planning methods. Finally, a real-time dispatching model compatible with the requirement of trucks with different capacities is developed by using two techniques: flow networks and integer programming.

  10. An application of data mining in district heating substations for improving energy performance

    NASA Astrophysics Data System (ADS)

    Xue, Puning; Zhou, Zhigang; Chen, Xin; Liu, Jing

    2017-11-01

    Automatic meter reading systems are capable of collecting and storing a huge amount of district heating (DH) data. However, the data obtained are rarely fully utilized. Data mining is a promising technology to discover potentially interesting knowledge from vast data. This paper applies data mining methods to analyse the massive data for improving the energy performance of a DH substation. The technical approach contains three steps: data selection, cluster analysis and association rule mining (ARM). Data from two heating seasons of a substation are used for a case study. Cluster analysis identifies six distinct heating patterns based on the primary heat of the substation. ARM reveals that secondary pressure difference and secondary flow rate have a strong correlation. Using the discovered rules, a fault occurring in a remote flow meter installed in the secondary network is detected accurately. The application demonstrates that data mining techniques can effectively extract potentially useful knowledge to better understand substation operation strategies and improve substation energy performance.

  11. Anchor-Free Localization Method for Mobile Targets in Coal Mine Wireless Sensor Networks

    PubMed Central

    Pei, Zhongmin; Deng, Zhidong; Xu, Shuo; Xu, Xiao

    2009-01-01

    Severe natural conditions and complex terrain make it difficult to apply precise localization in underground mines. In this paper, an anchor-free localization method for mobile targets is proposed based on non-metric multi-dimensional scaling (Multi-dimensional Scaling: MDS) and rank sequence. Firstly, a coal mine wireless sensor network is constructed in underground mines based on the ZigBee technology. Then a non-metric MDS algorithm is imported to estimate the reference nodes’ location. Finally, an improved sequence-based localization algorithm is presented to complete precise localization for mobile targets. The proposed method is tested through simulations with 100 nodes, outdoor experiments with 15 ZigBee physical nodes, and the experiments in the mine gas explosion laboratory with 12 ZigBee nodes. Experimental results show that our method has better localization accuracy and is more robust in underground mines. PMID:22574048

  12. Anchor-free localization method for mobile targets in coal mine wireless sensor networks.

    PubMed

    Pei, Zhongmin; Deng, Zhidong; Xu, Shuo; Xu, Xiao

    2009-01-01

    Severe natural conditions and complex terrain make it difficult to apply precise localization in underground mines. In this paper, an anchor-free localization method for mobile targets is proposed based on non-metric multi-dimensional scaling (Multi-dimensional Scaling: MDS) and rank sequence. Firstly, a coal mine wireless sensor network is constructed in underground mines based on the ZigBee technology. Then a non-metric MDS algorithm is imported to estimate the reference nodes' location. Finally, an improved sequence-based localization algorithm is presented to complete precise localization for mobile targets. The proposed method is tested through simulations with 100 nodes, outdoor experiments with 15 ZigBee physical nodes, and the experiments in the mine gas explosion laboratory with 12 ZigBee nodes. Experimental results show that our method has better localization accuracy and is more robust in underground mines.

  13. Convalescing Cluster Configuration Using a Superlative Framework

    PubMed Central

    Sabitha, R.; Karthik, S.

    2015-01-01

    Competent data mining methods are vital to discover knowledge from databases which are built as a result of enormous growth of data. Various techniques of data mining are applied to obtain knowledge from these databases. Data clustering is one such descriptive data mining technique which guides in partitioning data objects into disjoint segments. K-means algorithm is a versatile algorithm among the various approaches used in data clustering. The algorithm and its diverse adaptation methods suffer certain problems in their performance. To overcome these issues a superlative algorithm has been proposed in this paper to perform data clustering. The specific feature of the proposed algorithm is discretizing the dataset, thereby improving the accuracy of clustering, and also adopting the binary search initialization method to generate cluster centroids. The generated centroids are fed as input to K-means approach which iteratively segments the data objects into respective clusters. The clustered results are measured for accuracy and validity. Experiments conducted by testing the approach on datasets from the UC Irvine Machine Learning Repository evidently show that the accuracy and validity measure is higher than the other two approaches, namely, simple K-means and Binary Search method. Thus, the proposed approach proves that discretization process will improve the efficacy of descriptive data mining tasks. PMID:26543895
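
    The final stage described above, feeding pre-computed centroids into K-means, which then iteratively reassigns data objects to clusters, can be sketched in plain Python. The discretization and binary search initialization steps are not reproduced; the initial centroids are simply passed in by the caller, and the function name and signature are assumptions for this example.

```python
def kmeans(points, centroids, max_iter=100):
    """Standard K-means (Lloyd's iteration) from caller-supplied centroids.

    `points` and `centroids` are sequences of equal-length numeric tuples.
    Returns (final_centroids, clusters), where clusters[k] lists the points
    assigned to centroid k in the last assignment step.
    """
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[k]
            for k, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters
```

    Because K-means only converges to a local optimum, the quality of the supplied centroids matters, which is why initialization schemes such as the one the abstract describes are studied.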

  14. EAGLE: 'EAGLE 'Is an' Algorithmic Graph Library for Exploration'

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2015-01-16

    The Resource Description Framework (RDF) and SPARQL Protocol and RDF Query Language (SPARQL) were introduced about a decade ago to enable flexible schema-free data interchange on the Semantic Web. Today data scientists use the framework as a scalable graph representation for integrating, querying, exploring and analyzing data sets hosted at different sources. With increasing adoption, the need for graph mining capabilities for the Semantic Web has emerged. Today there are no tools to conduct "graph mining" on RDF standard data sets. We address that need through implementation of popular iterative Graph Mining algorithms (Triangle count, Connected component analysis, degree distribution, diversity degree, PageRank, etc.). We implement these algorithms as SPARQL queries, wrapped within Python scripts, and call our software tool EAGLE. In RDF style, EAGLE stands for "EAGLE 'Is an' algorithmic graph library for exploration." EAGLE is like 'MATLAB' for 'Linked Data.'
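
    As an illustration of one of the listed algorithms, the sketch below counts triangles in an undirected graph given as an edge list. EAGLE itself expresses such algorithms as SPARQL queries over RDF data; this plain-Python version is only an assumption-laden stand-in that shows the computation, and the function name is hypothetical.

```python
from itertools import combinations


def triangle_count(edges):
    """Count triangles in an undirected graph given as (u, v) pairs.

    Self-loops are ignored and duplicate edges are collapsed.
    """
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    closed_wedges = 0
    for node, neighbours in adjacency.items():
        # Every pair of neighbours that is itself connected closes a triangle.
        for v, w in combinations(neighbours, 2):
            if w in adjacency[v]:
                closed_wedges += 1
    # Each triangle is found once from each of its three vertices.
    return closed_wedges // 3
```

    In an RDF setting each edge would correspond to a triple, and the same wedge-closing test would be written as a three-pattern graph match.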

  15. 43 CFR 4.1272 - Interlocutory appeals.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... PROCEDURES Special Rules Applicable to Surface Coal Mining Hearings and Appeals Appeals to the Board from... modification of the administrative law judge's interlocutory ruling or order, the jurisdiction of the Board...

  16. Anytime synthetic projection: Maximizing the probability of goal satisfaction

    NASA Technical Reports Server (NTRS)

    Drummond, Mark; Bresina, John L.

    1990-01-01

    A projection algorithm is presented for incremental control rule synthesis. The algorithm synthesizes an initial set of goal achieving control rules using a combination of situation probability and estimated remaining work as a search heuristic. This set of control rules has a certain probability of satisfying the given goal. The probability is incrementally increased by synthesizing additional control rules to handle 'error' situations the execution system is likely to encounter when following the initial control rules. By using situation probabilities, the algorithm achieves a computationally effective balance between the limited robustness of triangle tables and the absolute robustness of universal plans.

  17. Predicting the survival of diabetes using neural network

    NASA Astrophysics Data System (ADS)

    Mamuda, Mamman; Sathasivam, Saratha

    2017-08-01

    Data mining techniques are currently used to predict diseases in the health care industry. Neural networks are among the prevailing data mining methods for predicting diseases in this field. This paper presents a study on the prediction of diabetes survival using different supervised learning algorithms for neural networks. Three learning algorithms are considered in this study: (i) the Levenberg-Marquardt learning algorithm, (ii) the Bayesian regularization learning algorithm and (iii) the scaled conjugate gradient learning algorithm. The network is trained using the Pima Indian Diabetes Dataset with the help of MATLAB R2014a software. The performance of each algorithm is further discussed through regression analysis. The prediction accuracy of the best algorithm is then computed to validate its predictions.

  18. Trend Motif: A Graph Mining Approach for Analysis of Dynamic Complex Networks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jin, R; McCallen, S; Almaas, E

    2007-05-28

    Complex networks have been used successfully in scientific disciplines ranging from sociology to microbiology to describe systems of interacting units. Until recently, studies of complex networks have mainly focused on their network topology. However, in many real world applications, the edges and vertices have associated attributes that are frequently represented as vertex or edge weights. Furthermore, these weights are often not static, instead changing with time and forming a time series. Hence, to fully understand the dynamics of the complex network, we have to consider both network topology and related time series data. In this work, we propose a motif mining approach to identify trend motifs for such purposes. Simply stated, a trend motif describes a recurring subgraph where each of its vertices or edges displays similar dynamics over a user-defined period. Given this, each trend motif occurrence can help reveal significant events in a complex system; frequent trend motifs may aid in uncovering dynamic rules of change for the system, and the distribution of trend motifs may characterize the global dynamics of the system. Here, we have developed efficient mining algorithms to extract trend motifs. Our experimental validation using three disparate empirical datasets, ranging from the stock market and world trade to a protein interaction network, has demonstrated the efficiency and effectiveness of our approach.

  19. Data Mining Research with the LSST

    NASA Astrophysics Data System (ADS)

    Borne, Kirk D.; Strauss, M. A.; Tyson, J. A.

    2007-12-01

    The LSST catalog database will exceed 10 petabytes, comprising several hundred attributes for 5 billion galaxies, 10 billion stars, and over 1 billion variable sources (optical variables, transients, or moving objects), extracted from over 20,000 square degrees of deep imaging in 5 passbands with thorough time domain coverage: 1000 visits over the 10-year LSST survey lifetime. The opportunities are enormous for novel scientific discoveries within this rich time-domain ultra-deep multi-band survey database. Data Mining, Machine Learning, and Knowledge Discovery research opportunities with the LSST are now under study, with a potential for new collaborations to develop to contribute to these investigations. We will describe features of the LSST science database that are amenable to scientific data mining, object classification, outlier identification, anomaly detection, image quality assurance, and survey science validation. We also give some illustrative examples of current scientific data mining research in astronomy, and point out where new research is needed. In particular, the data mining research community will need to address several issues in the coming years as we prepare for the LSST data deluge. The data mining research agenda includes: scalability (at petabytes scales) of existing machine learning and data mining algorithms; development of grid-enabled parallel data mining algorithms; designing a robust system for brokering classifications from the LSST event pipeline (which may produce 10,000 or more event alerts per night); multi-resolution methods for exploration of petascale databases; visual data mining algorithms for visual exploration of the data; indexing of multi-attribute multi-dimensional astronomical databases (beyond RA-Dec spatial indexing) for rapid querying of petabyte databases; and more. 
    Finally, we will identify opportunities for synergistic collaboration between the data mining research group and the LSST Data Management and Science Collaboration teams.

  20. A New Framework for Textual Information Mining over Parse Trees. CRESST Report 805

    ERIC Educational Resources Information Center

    Mousavi, Hamid; Kerr, Deirdre; Iseli, Markus R.

    2011-01-01

    Textual information mining is a challenging problem that has resulted in the creation of many different rule-based linguistic query languages. However, these languages generally are not optimized for the purpose of text mining. In other words, they usually consider queries as individuals and only return raw results for each query. Moreover they…

  1. Privacy Preserving Sequential Pattern Mining in Data Stream

    NASA Astrophysics Data System (ADS)

    Huang, Qin-Hua

    Research on privacy preserving data mining techniques has gained much attention in recent years. For data stream systems, wireless networks and mobile devices, research on the related stream data mining techniques is still in its early stage. In this paper, a data mining algorithm dealing with the privacy preserving problem in data streams is presented.

  2. Documents for SBAR Panel: CERCLA 108(b) Hard Rock Mining Financial Assurance Rule

    EPA Pesticide Factsheets

    SBAR panel documents for small business advocacy review panel on the financial responsibilities of the hard rock mining industry under Section 108(b) of the Comprehensive Environmental Response, Compensation, and Liability Act

  3. Preference Mining Using Neighborhood Rough Set Model on Two Universes.

    PubMed

    Zeng, Kai

    2016-01-01

    Preference mining plays an important role in e-commerce and video websites for enhancing user satisfaction and loyalty. Some classical methods are not available for the cold-start problem when the user or the item is new. In this paper, we propose a new model, called parametric neighborhood rough set on two universes (NRSTU), to describe the user and item data structures. Furthermore, the neighborhood lower approximation operator is used for defining the preference rules. Then, we provide the means for recommending items to users by using these rules. Finally, we give an experimental example to show the details of NRSTU-based preference mining for cold-start problem. The parameters of the model are also discussed. The experimental results show that the proposed method presents an effective solution for preference mining. In particular, NRSTU improves the recommendation accuracy by about 19% compared to the traditional method.

  4. Using redescription mining to relate clinical and biological characteristics of cognitively impaired and Alzheimer's disease patients.

    PubMed

    Mihelčić, Matej; Šimić, Goran; Babić Leko, Mirjana; Lavrač, Nada; Džeroski, Sašo; Šmuc, Tomislav

    2017-01-01

    Based on a set of subjects and a collection of attributes obtained from the Alzheimer's Disease Neuroimaging Initiative database, we used redescription mining to find interpretable rules revealing associations between those determinants that provide insights into Alzheimer's disease (AD). We extended the CLUS-RM redescription mining algorithm to a constraint-based redescription mining (CBRM) setting, which enables several modes of targeted exploration of specific, user-constrained associations. Redescription mining enabled finding specific constructs of clinical and biological attributes that describe many groups of subjects of different size, homogeneity and levels of cognitive impairment. We confirmed some previously known findings. However, in some instances, as with the attributes: testosterone, ciliary neurotrophic factor, brain natriuretic peptide, Fas ligand, the imaging attribute Spatial Pattern of Abnormalities for Recognition of Early AD, as well as the levels of leptin and angiopoietin-2 in plasma, we corroborated previously debatable findings or provided additional information about these variables and their association with AD pathogenesis. Moreover, applying redescription mining on ADNI data resulted in the discovery of one largely unknown attribute: the Pregnancy-Associated Protein-A (PAPP-A), which we found highly associated with cognitive impairment in AD. Statistically significant correlations (p ≤ 0.01) were found between PAPP-A and clinical tests: Alzheimer's Disease Assessment Scale, Clinical Dementia Rating Sum of Boxes, Mini Mental State Examination, etc. The high importance of this finding lies in the fact that PAPP-A is a metalloproteinase, known to cleave insulin-like growth factor binding proteins.
    Since it also shares similar substrates with A Disintegrin and the Metalloproteinase family of enzymes that act as α-secretase to physiologically cleave amyloid precursor protein (APP) in the non-amyloidogenic pathway, it could be directly involved in the metabolism of APP very early during the disease course. Therefore, further studies should investigate the role of PAPP-A in the development of AD more thoroughly.

  5. Using redescription mining to relate clinical and biological characteristics of cognitively impaired and Alzheimer’s disease patients

    PubMed Central

    Mihelčić, Matej; Šimić, Goran; Babić Leko, Mirjana; Lavrač, Nada; Džeroski, Sašo; Šmuc, Tomislav

    2017-01-01

    Based on a set of subjects and a collection of attributes obtained from the Alzheimer’s Disease Neuroimaging Initiative database, we used redescription mining to find interpretable rules revealing associations between those determinants that provide insights about Alzheimer’s disease (AD). We extended the CLUS-RM redescription mining algorithm to a constraint-based redescription mining (CBRM) setting, which enables several modes of targeted exploration of specific, user-constrained associations. Redescription mining enabled finding specific constructs of clinical and biological attributes that describe many groups of subjects of different size, homogeneity and levels of cognitive impairment. We confirmed some previously known findings. However, in some instances, as with the attributes testosterone, ciliary neurotrophic factor, brain natriuretic peptide, Fas ligand, the imaging attribute Spatial Pattern of Abnormalities for Recognition of Early AD, as well as the levels of leptin and angiopoietin-2 in plasma, we corroborated previously debatable findings or provided additional information about these variables and their association with AD pathogenesis. Moreover, applying redescription mining on ADNI data resulted in the discovery of one largely unknown attribute: the Pregnancy-Associated Protein-A (PAPP-A), which we found highly associated with cognitive impairment in AD. Statistically significant correlations (p ≤ 0.01) were found between PAPP-A and clinical tests: Alzheimer’s Disease Assessment Scale, Clinical Dementia Rating Sum of Boxes, Mini Mental State Examination, etc. The high importance of this finding lies in the fact that PAPP-A is a metalloproteinase, known to cleave insulin-like growth factor binding proteins. Since it also shares similar substrates with A Disintegrin and the Metalloproteinase family of enzymes that act as α-secretase to physiologically cleave amyloid precursor protein (APP) in the non-amyloidogenic pathway, it could be directly involved in the metabolism of APP very early during the disease course. Therefore, further studies should investigate the role of PAPP-A in the development of AD more thoroughly. PMID:29088293

  6. Does Tumor Development Follow a Programmed Path?

    NASA Astrophysics Data System (ADS)

    Austin, Robert

    2011-03-01

    The initiation and progression of a tumor is a complex process, resembling the growth of an embryo in terms of the stages of development and increasing differentiation and somatic evolution of constituent cells in the community of cells that constitute the tumor. Typically we view cancer cells as rogue individuals violating the rules of the games played within an organism, but I would suggest that what we see is a programmed and algorithmic process. I will then question whether tumor progression is dominated by the random acquisition of successive survival traits, or by a systematic and sequential unpacking of "weapons" from a pre-adapted "toolkit" of genetic and epigenetic potentialities. Can we then address this hypothesis by data mining solid tumors layer by layer? Support of the NSF and the NCI is gratefully acknowledged.

  7. A multiobjective optimization model and an orthogonal design-based hybrid heuristic algorithm for regional urban mining management problems.

    PubMed

    Wu, Hao; Wan, Zhong

    2018-02-01

    In this paper, a multiobjective mixed-integer piecewise nonlinear programming model (MOMIPNLP) is built to formulate the management problem of an urban mining system, where the decision variables are associated with buy-back pricing, choices of sites, transportation planning, and adjustment of production capacity. Unlike existing approaches, our model minimizes the negative social effect generated by structural optimization of the recycling system while jointly maximizing the total recycling profit and the utility from environmental improvement. For solving the problem, the MOMIPNLP model is first transformed into an ordinary mixed-integer nonlinear programming model by variable substitution such that the piecewise feature of the model is removed. Then, based on the technique of orthogonal design, a hybrid heuristic algorithm is developed to find an approximate Pareto-optimal solution, where a genetic algorithm is used to optimize the structure of the search neighborhood, and both a local branching algorithm and a relaxation-induced neighborhood search algorithm are employed to cut the searching branches and reduce the number of variables in each branch. Numerical experiments indicate that this algorithm spends less CPU (central processing unit) time in solving large-scale regional urban mining management problems, especially in comparison with similar ones available in the literature. By case study and sensitivity analysis, a number of practical managerial implications are revealed from the model. Since the metal stocks in society are reliable overground mineral sources, urban mining has received great attention as an emerging strategic resource in an era of resource shortage. By mathematical modeling and development of efficient algorithms, this paper provides decision makers with useful suggestions on the optimal design of the recycling system in urban mining. For example, this paper can answer how to encourage enterprises to join the recycling activities through government support and subsidies, whether the existing recycling system can meet the developmental requirements or not, and what is a reasonable adjustment of production capacity.
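
    The notion of a Pareto-optimal solution used in this abstract can be illustrated with a minimal sketch. It is not the authors' hybrid heuristic: the objective tuples below are invented for illustration, all objectives are assumed to be maximized (the negative social effect is entered with a minus sign), and the function names are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly
    better in at least one (all objectives are maximized)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates: (profit, environmental utility, -social negative effect)
candidates = [(10, 5, -3), (8, 6, -2), (7, 4, -4), (10, 5, -5)]
front = pareto_front(candidates)
print(front)
```

    In this toy data set the last two candidates are dominated by the first one, so only the first two remain on the front.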

  8. Evaluation of rational nonsteroidal anti-inflammatory drugs and gastro-protective agents use; association rule data mining using outpatient prescription patterns.

    PubMed

    Pattanaprateep, Oraluck; McEvoy, Mark; Attia, John; Thakkinstian, Ammarin

    2017-07-04

    Nonsteroidal anti-inflammatory drugs (NSAIDs) and gastro-protective agents should be co-prescribed following a standard clinical practice guideline; however, adherence to this guideline in routine practice is unknown. This study applied an association rule model (ARM) to estimate rational NSAIDs and gastro-protective agents use in an outpatient prescriptions dataset. A database of hospital outpatients from October 1st, 2013 to September 30th, 2015 was searched for any of the following drugs: oral antacids (A02A), peptic ulcer and gastro-oesophageal reflux disease drugs (GORD, A02B), and anti-inflammatory and anti-rheumatic products, non-steroids or NSAIDs (M01A). Data including patient demographics, diagnoses, and drug utilization were also retrieved. An association rule model was used to analyze co-prescription of the same drug class (i.e., prescriptions within A02A-A02B, M01A) and between drug classes (A02A-A02B & M01A) using the Apriori algorithm in R. The lift value was calculated as the ratio of confidence to expected confidence, which gave information about the association between drugs in the prescription. We identified a total of 404,273 patients with 2,575,331 outpatient visits in 2 fiscal years. Mean age was 48 years and 34% were male. Among A02A, A02B and M01A drug classes, 12 rules of associations were discovered with support and confidence thresholds of 1% and 50%. The highest lift was between Omeprazole and Ranitidine (340 visits); about one-third of these visits (118) were prescriptions to non-GORD patients, contrary to guidelines. Another finding was the concomitant use of COX-2 inhibitors (Etoricoxib or Celecoxib) and PPIs. 35.6% of these were for patients aged less than 60 years with no GI complication and no Aspirin, inconsistent with guidelines. Around one-third of occasions where these medications were co-prescribed were inconsistent with guidelines. With the rapid growth of health datasets, data mining methods may help assess quality of care and concordance with guidelines and best evidence.
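
    The lift described above, the ratio of confidence to expected confidence, can be computed as in the following minimal sketch. The study itself used the Apriori algorithm in R; this Python fragment only illustrates the three rule measures, and the visit data and the function name `rule_metrics` are invented for illustration.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    transactions is a list of sets; antecedent and consequent are sets of items.
    """
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)         # visits containing the antecedent
    n_c = sum(1 for t in transactions if c <= t)         # visits containing the consequent
    n_ac = sum(1 for t in transactions if (a | c) <= t)  # visits containing both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    expected_confidence = n_c / n                        # confidence if the two were independent
    lift = confidence / expected_confidence if expected_confidence else 0.0
    return support, confidence, lift


# Hypothetical outpatient visits, each reduced to the drug classes prescribed
visits = [
    {"M01A", "A02B"},
    {"M01A", "A02B"},
    {"M01A"},
    {"A02B"},
    {"A02A"},
]
support, confidence, lift = rule_metrics(visits, {"M01A"}, {"A02B"})
print(support, confidence, lift)
```

    A lift above 1 indicates that the two drug classes are co-prescribed more often than would be expected if they were independent.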

  9. On the fusion of tuning parameters of fuzzy rules and neural network

    NASA Astrophysics Data System (ADS)

    Mamuda, Mamman; Sathasivam, Saratha

    2017-08-01

    Learning a fuzzy rule-based system with a neural network can lead to a precise and valuable understanding of several problems. Fuzzy logic offers a simple way to arrive at a definite conclusion based upon vague, ambiguous, imprecise, noisy or missing input information. Conventional learning algorithms for tuning the parameters of fuzzy rules using training input-output data usually end in a weak firing state; this certainly powers the fuzzy rule and makes it insecure for a multiple-input fuzzy system. In this paper, we introduce a new learning algorithm for tuning the parameters of the fuzzy rules alongside a radial basis function neural network (RBFNN) in training input-output data based on the gradient descent method. The new learning algorithm addresses the problem of weak firing that arises with the conventional method. We illustrated the efficiency of our new learning algorithm by means of numerical examples. MATLAB R2014(a) software was used in simulating our results. The results show that the new learning method has the advantage of training the fuzzy rules without tampering with the fuzzy rule table, which allows a membership function of the rule to be used more than once in the fuzzy rule base.

  10. Developing image processing meta-algorithms with data mining of multiple metrics.

    PubMed

    Leung, Kelvin; Cunha, Alexandre; Toga, A W; Parker, D Stott

    2014-01-01

    People often use multiple metrics in image processing, but here we take a novel approach of mining the values of batteries of metrics on image processing results. We present a case for extending image processing methods to incorporate automated mining of multiple image metric values. Here by a metric we mean any image similarity or distance measure, and in this paper we consider intensity-based and statistical image measures and focus on registration as an image processing problem. We show how it is possible to develop meta-algorithms that evaluate different image processing results with a number of different metrics and mine the results in an automated fashion so as to select the best results. We show that the mining of multiple metrics offers a variety of potential benefits for many image processing problems, including improved robustness and validation.

  11. Exploiting Sequential Patterns Found in Users' Solutions and Virtual Tutor Behavior to Improve Assistance in ITS

    ERIC Educational Resources Information Center

    Fournier-Viger, Philippe; Faghihi, Usef; Nkambou, Roger; Nguifo, Engelbert Mephu

    2010-01-01

    We propose to mine temporal patterns in Intelligent Tutoring Systems (ITSs) to uncover useful knowledge that can enhance their ability to provide assistance. To discover patterns, we suggest using a custom, sequential pattern-mining algorithm. Two ways of applying the algorithm to enhance an ITS's capabilities are addressed. The first is to…

  12. Efficiently hiding sensitive itemsets with transaction deletion based on genetic algorithms.

    PubMed

    Lin, Chun-Wei; Zhang, Binbin; Yang, Kuo-Tung; Hong, Tzung-Pei

    2014-01-01

    Data mining is used to mine meaningful and useful information or knowledge from a very large database. Some secure or private information can be discovered by data mining techniques, thus resulting in an inherent risk of threats to privacy. Privacy-preserving data mining (PPDM) has thus arisen in recent years to sanitize the original database for hiding sensitive information, which can be regarded as an NP-hard problem in the sanitization process. In this paper, a compact prelarge GA-based (cpGA2DT) algorithm to delete transactions for hiding sensitive itemsets is thus proposed. It solves the limitations of the evolutionary process by adopting both the compact GA-based (cGA) mechanism and the prelarge concept. A flexible fitness function with three adjustable weights is thus designed to find the appropriate transactions to be deleted in order to hide sensitive itemsets with minimal side effects of hiding failure, missing cost, and artificial cost. Experiments are conducted to show the performance of the proposed cpGA2DT algorithm compared to the simple GA-based (sGA2DT) algorithm and the greedy approach in terms of execution time and three side effects.
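
    A fitness function with three adjustable weights, of the kind described above, might look like the following minimal sketch. It is not the cpGA2DT definition from the paper: the way each side effect is counted here (sensitive itemsets still frequent, non-sensitive frequent itemsets lost, itemsets that newly become frequent) is an assumption, restricted to a fixed candidate list, and all names and data are hypothetical.

```python
def frequent(itemsets, db, min_ratio):
    """Itemsets whose support ratio in db is at least min_ratio."""
    if not db:
        return set()
    return {s for s in itemsets
            if sum(1 for t in db if s <= t) / len(db) >= min_ratio}


def fitness(db, deleted, candidates, sensitive, min_ratio, w=(1.0, 1.0, 1.0)):
    """Weighted sum of the three side effects after deleting transactions.

    deleted holds the indices of the transactions removed from db.
    Lower values are better.
    """
    before = frequent(candidates, db, min_ratio)
    kept = [t for i, t in enumerate(db) if i not in deleted]
    after = frequent(candidates, kept, min_ratio)
    hiding_failure = len(after & sensitive)           # sensitive itemsets still frequent
    missing_cost = len((before - sensitive) - after)  # non-sensitive itemsets lost
    artificial_cost = len(after - before)             # itemsets that became frequent
    return w[0] * hiding_failure + w[1] * missing_cost + w[2] * artificial_cost


# Hypothetical toy database of five transactions
db = [frozenset("ab"), frozenset("ab"), frozenset("ac"),
      frozenset("c"), frozenset("bc")]
candidates = {frozenset("a"), frozenset("b"), frozenset("c"), frozenset("ab")}
sensitive = {frozenset("ab")}

for deleted in (set(), {0}, {0, 1}):
    print(sorted(deleted), fitness(db, deleted, candidates, sensitive, 0.4))
```

    In this toy case, deleting only the first transaction hides the sensitive itemset with no side effects, while deleting the first two also loses two non-sensitive itemsets.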

  13. Data Mining and Optimization Tools for Developing Engine Parameters Tools

    NASA Technical Reports Server (NTRS)

    Dhawan, Atam P.

    1998-01-01

    This project was awarded for understanding the problem and developing a plan for Data Mining tools for use in designing and implementing an Engine Condition Monitoring System. Tricia Erhardt and I studied the problem domain for developing an Engine Condition Monitoring system using the sparse and non-standardized datasets to be available through a consortium at NASA Lewis Research Center. We visited NASA three times to discuss additional issues related to the dataset, which was not made available to us. We discussed and developed a general framework of data mining and optimization tools to extract useful information from sparse and non-standard datasets. These discussions led to the training of Tricia Erhardt to develop Genetic Algorithm based search programs, which were written in C++ and used to demonstrate the capability of the GA in searching for an optimal solution in noisy datasets. From the study and discussion with NASA LeRC personnel, we then prepared a proposal, which is being submitted to NASA for future work for the development of data mining algorithms for engine conditional monitoring. The proposed set of algorithms uses wavelet processing for creating multi-resolution pyramid of tile data for GA based multi-resolution optimal search.

  14. Collaborative mining and transfer learning for relational data

    NASA Astrophysics Data System (ADS)

    Levchuk, Georgiy; Eslami, Mohammed

    2015-06-01

    Many real-world problems, including human knowledge, communication, biological, and cyber network analysis, deal with data entities for which the essential information is contained in the relations among those entities. Such data must be modeled and analyzed as graphs, with attributes on both objects and relations that encode and differentiate their semantics. Traditional data mining algorithms were originally designed for analyzing discrete objects for which a set of features can be defined, and thus cannot be easily adapted to deal with graph data. This gave rise to the relational data mining field of research, of which graph pattern learning is a key sub-domain [11]. In this paper, we describe a model for learning graph patterns in a collaborative distributed manner. Distributed pattern learning is challenging due to dependencies between the nodes and relations in the graph, and variability across graph instances. We present three algorithms that trade off benefits of parallelization and data aggregation, compare their performance to centralized graph learning, and discuss individual benefits and weaknesses of each model. The presented algorithms are designed for linear speedup in distributed computing environments, and learn graph patterns that are both closer to ground truth and provide higher detection rates than the centralized mining algorithm.

  15. MISAGA: An Algorithm for Mining Interesting Subgraphs in Attributed Graphs.

    PubMed

    He, Tiantian; Chan, Keith C C

    2018-05-01

    An attributed graph contains vertices that are associated with a set of attribute values. Mining clusters or communities, which are interesting subgraphs in the attributed graph, is one of the most important tasks of graph analytics. Many problems can be defined as the mining of interesting subgraphs in attributed graphs. Algorithms that discover subgraphs based on predefined topologies cannot be used to tackle these problems. To discover interesting subgraphs in the attributed graph, we propose an algorithm called mining interesting subgraphs in attributed graph algorithm (MISAGA). MISAGA performs its tasks by first using a probabilistic measure to determine whether the strength of association between a pair of attribute values is strong enough to be interesting. Given the interesting pairs of attribute values, the degree of association is then computed for each pair of vertices using an information theoretic measure. Based on the edge structure and degree of association between each pair of vertices, MISAGA identifies interesting subgraphs by formulating the task as a constrained optimization problem and solving it by identifying the optimal affiliation of subgraphs for the vertices in the attributed graph. MISAGA has been tested with several large-sized real graphs and is found to be potentially very useful for various applications.

  16. Analysis of data mining classification by comparison of C4.5 and ID algorithms

    NASA Astrophysics Data System (ADS)

    Sudrajat, R.; Irianingsih, I.; Krisnawan, D.

    2017-01-01

    The rapid development of information technology has been triggered by its intensive use. For example, data mining is widely used in investment. Among the many techniques that can assist in investment, the method used for classification is the decision tree. The decision tree has a variety of algorithms, such as C4.5 and ID3. The two algorithms can generate different models, with different accuracy, for similar data sets. With discrete data, the C4.5 and ID3 algorithms give accuracies of 87.16% and 99.83%, and the C4.5 algorithm with numerical data gives 89.69%. With discrete data, the C4.5 and ID3 algorithms yield 520 and 598 customers, and the C4.5 algorithm with numerical data yields 546 customers. The analysis shows that both algorithms classify quite well, because the error rate is less than 15%.
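
    One reason the two algorithms can build different trees from the same data is their split criterion: ID3 selects attributes by information gain, while C4.5 uses the gain ratio, which divides the gain by the entropy of the split itself. The sketch below illustrates only that difference; the toy attributes and labels are invented and have nothing to do with the investment data in the abstract.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of discrete values."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """ID3 criterion: reduction in label entropy after splitting on the attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """C4.5 criterion: information gain divided by the entropy of the split."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info else 0.0


# Hypothetical data: a unique-ID attribute and a genuinely useful attribute
labels = ["yes", "yes", "no", "no"]
customer_id = ["1", "2", "3", "4"]
segment = ["a", "a", "b", "b"]

print(information_gain(customer_id, labels), information_gain(segment, labels))
print(gain_ratio(customer_id, labels), gain_ratio(segment, labels))
```

    Both attributes have the same information gain here, but the gain ratio penalizes the many-valued ID attribute, so C4.5 would prefer the two-valued one.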

  17. Astrophysical data mining with GPU. A case study: Genetic classification of globular clusters

    NASA Astrophysics Data System (ADS)

    Cavuoti, S.; Garofalo, M.; Brescia, M.; Paolillo, M.; Pescape', A.; Longo, G.; Ventre, G.

    2014-01-01

    We present a multi-purpose genetic algorithm, designed and implemented with GPGPU/CUDA parallel computing technology. The model was derived from our CPU serial implementation, named GAME (Genetic Algorithm Model Experiment). It was successfully tested and validated on the detection of candidate Globular Clusters in deep, wide-field, single band HST images. The GPU version of GAME will be made available to the community by integrating it into the web application DAMEWARE (DAta Mining Web Application REsource, http://dame.dsf.unina.it/beta_info.html), a public data mining service specialized on massive astrophysical data. Since genetic algorithms are inherently parallel, the GPGPU computing paradigm leads to a speedup of a factor of 200× in the training phase with respect to the CPU based version.

  18. Distributed Function Mining for Gene Expression Programming Based on Fast Reduction.

    PubMed

    Deng, Song; Yue, Dong; Yang, Le-chan; Fu, Xiong; Feng, Ya-zhou

    2016-01-01

    For high-dimensional and massive data sets, traditional centralized gene expression programming (GEP) or improved algorithms lead to increased run-time and decreased prediction accuracy. To solve this problem, this paper proposes a new improved algorithm called distributed function mining for gene expression programming based on fast reduction (DFMGEP-FR). In DFMGEP-FR, fast attribution reduction in binary search algorithms (FAR-BSA) is proposed to quickly find the optimal attribution set, and the function consistency replacement algorithm is given to solve integration of the local function model. Thorough comparative experiments for DFMGEP-FR, centralized GEP and the parallel gene expression programming algorithm based on simulated annealing (parallel GEPSA) are included in this paper. For the waveform, mushroom, connect-4 and musk datasets, the comparative results show that the average time-consumption of DFMGEP-FR drops by 89.09%, 88.85%, 85.79% and 93.06%, respectively, in contrast to centralized GEP and by 12.5%, 8.42%, 9.62% and 13.75%, respectively, compared with parallel GEPSA. Six well-studied UCI test data sets demonstrate the efficiency and capability of our proposed DFMGEP-FR algorithm for distributed function mining.

  19. The Usage of Association Rule Mining to Identify Influencing Factors on Deafness After Birth.

    PubMed

    Shahraki, Azimeh Danesh; Safdari, Reza; Gahfarokhi, Hamid Habibi; Tahmasebian, Shahram

    2015-12-01

    Providing complete, high-quality health care services plays a very important role in enabling people to understand the factors related to personal and social health and to choose suitable healthy behaviors in order to achieve a healthy life. For this reason, demographic and clinical data of individuals are collected, and this huge volume of data is a valuable resource for analyzing, exploring and discovering valuable information and communication. This study used association rule techniques in data mining to identify the factors affecting hearing loss after birth in Iran. The survey is a data-oriented study. The study population consisted of questionnaires from several provinces of the country. First, all questionnaire data were stored as information tables in SQL Server, with data entry performed using software written in C# .NET; then the Association algorithm in SQL Server Data Tools and Clementine software was applied to determine the rules and hidden patterns in the gathered data. Two factors, the number of deaf brothers and the degree of consanguinity of the parents, have a significant impact on the severity of deafness of individuals. Also, when the severity of hearing loss is greater than or equal to moderately severe hearing loss, people use hearing aids, and men are less interested in using hearing aids. In fact, it can be said that in families with consanguineous marriage of parents who are first-degree (girl/boy cousins) or 2nd-degree relatives (girl/boy cousins), and especially first-degree, the number of people with severe hearing loss or deafness is higher, and that in the use of hearing aids the gender of the patient is more important than the severity of the hearing loss.

  20. Developing and Implementing the Data Mining Algorithms in RAVEN

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sen, Ramazan Sonat; Maljovec, Daniel Patrick; Alfonsi, Andrea

    The RAVEN code is becoming a comprehensive tool to perform probabilistic risk assessment, uncertainty quantification, and verification and validation. The RAVEN code is being developed to support many programs and to provide a set of methodologies and algorithms for advanced analysis. Scientific computer codes can generate enormous amounts of data. To post-process and analyze such data might, in some cases, take longer than the initial software runtime. Data mining algorithms/methods help in recognizing and understanding patterns in the data, and thus discover knowledge in databases. The methodologies used in the dynamic probabilistic risk assessment or in uncertainty and error quantification analysis couple system/physics codes with simulation controller codes, such as RAVEN. RAVEN introduces both deterministic and stochastic elements into the simulation while the system/physics code model the dynamics deterministically. A typical analysis is performed by sampling values of a set of parameter values. A major challenge in using dynamic probabilistic risk assessment or uncertainty and error quantification analysis for a complex system is to analyze the large number of scenarios generated. Data mining techniques are typically used to better organize and understand data, i.e. recognizing patterns in the data. This report focuses on development and implementation of Application Programming Interfaces (APIs) for different data mining algorithms, and the application of these algorithms to different databases.

  1. Function Clustering Self-Organization Maps (FCSOMs) for mining differentially expressed genes in Drosophila and its correlation with the growth medium.

    PubMed

    Liu, L L; Liu, M J; Ma, M

    2015-09-28

    The central task of this study was to mine the gene-to-medium relationship. Adequate knowledge of this relationship could potentially improve the accuracy of differentially expressed gene mining. One of the approaches to differentially expressed gene mining uses conventional clustering algorithms to identify the gene-to-medium relationship. Compared to conventional clustering algorithms, self-organization maps (SOMs) identify the nonlinear aspects of the gene-to-medium relationships by mapping the input space into another higher dimensional feature space. However, SOMs are not suitable for huge datasets consisting of millions of samples. Therefore, a new computational model, the Function Clustering Self-Organization Maps (FCSOMs), was developed. FCSOMs take advantage of the theory of granular computing as well as advanced statistical learning methodologies, and are built specifically for each information granule (a function cluster of genes), which are intelligently partitioned by the clustering algorithm provided by the DAVID_6.7 software platform. However, only the gene functions, and not their expression values, are considered in the fuzzy clustering algorithm of DAVID. Compared to the clustering algorithm of DAVID, these experimental results show a marked improvement in the accuracy of classification with the application of FCSOMs. FCSOMs can handle huge datasets and their complex classification problems, as each FCSOM (modeled for each function cluster) can be easily parallelized.

  2. Knowledge discovery through games and game theory

    NASA Astrophysics Data System (ADS)

    Smith, James F., III; Rhyne, Robert D.

    2001-03-01

    A fuzzy logic based expert system has been developed that automatically allocates electronic attack (EA) resources in real-time over many dissimilar platforms. The platforms can be very general, e.g., ships, planes, robots, land based facilities, etc. Potential foes the platforms deal with can also be general. The initial version of the algorithm was optimized using a genetic algorithm employing fitness functions constructed based on expertise. A new approach is being explored that involves embedding the resource manager in an electronic game environment. The game allows a human expert to play against the resource manager in a simulated battlespace with each of the defending platforms being exclusively directed by the fuzzy resource manager and the attacking platforms being controlled by the human expert or operating autonomously under their own logic. This approach automates the data mining problem. The game automatically creates a database reflecting the domain expert's knowledge; it calls a data mining function, a genetic algorithm, for data mining of the database as required. The game allows easy evaluation of the information mined in the second step. The measure of effectiveness (MOE) for re-optimization is discussed. The mined information is extremely valuable as shown through demanding scenarios.

  3. Physics Mining of Multi-Source Data Sets

    NASA Technical Reports Server (NTRS)

    Helly, John; Karimabadi, Homa; Sipes, Tamara

    2012-01-01

    Powerful new parallel data mining algorithms can produce diagnostic and prognostic numerical models and analyses from observational data. These techniques yield higher-resolution measures than ever before of environmental parameters by fusing synoptic imagery and time-series measurements. These techniques are general and relevant to observational data, including raster, vector, and scalar, and can be applied in all Earth- and environmental science domains. Because they can be highly automated and are parallel, they scale to large spatial domains and are well suited to change and gap detection. This makes it possible to analyze spatial and temporal gaps in information, and facilitates within-mission replanning to optimize the allocation of observational resources. The basis of the innovation is the extension of a recently developed set of algorithms packaged into MineTool to multi-variate time-series data. MineTool is unique in that it automates the various steps of the data mining process, thus making it amenable to autonomous analysis of large data sets. Unlike techniques such as Artificial Neural Nets, which yield a blackbox solution, MineTool's outcome is always an analytical model in parametric form that expresses the output in terms of the input variables. This has the advantage that the derived equation can then be used to gain insight into the physical relevance and relative importance of the parameters and coefficients in the model. This is referred to as physics-mining of data. The capabilities of MineTool are extended to include both supervised and unsupervised algorithms, handle multi-type data sets, and parallelize it.

  4. Health Terrain: Visualizing Large Scale Health Data

    DTIC Science & Technology

    2015-12-01

    Text mining; Data mining. ... text mining algorithms to construct a concept space. A browser-based user interface is developed to ... Public health data, Notifiable condition detector, Text mining, Data mining. Disease Patient Location Term

  5. Pattern Discovery and Change Detection of Online Music Query Streams

    NASA Astrophysics Data System (ADS)

    Li, Hua-Fu

    In this paper, an efficient stream mining algorithm, called FTP-stream (Frequent Temporal Pattern mining of streams), is proposed to find the frequent temporal patterns over melody sequence streams. In the framework of our proposed algorithm, an effective bit-sequence representation is used to reduce the time and memory needed to slide the windows. The FTP-stream algorithm can calculate the support threshold in only a single pass based on the concept of bit-sequence representation. It takes advantage of the "left" and "and" operations of the representation. Experiments show that the proposed algorithm scans the music query stream only once, and runs significantly faster and consumes less memory than existing algorithms, such as SWFI-stream and Moment.
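
    A bit-sequence representation of a sliding window can be sketched as follows. This is not the FTP-stream algorithm itself: it assumes that the "left" and "and" operations refer to a bitwise left shift and a bitwise AND, and the class name, window size and stream items are invented for illustration.

```python
class BitWindow:
    """Sliding window over a transaction stream, one bit-sequence per item.

    Bit 0 of an item's sequence is the newest transaction in the window;
    the highest bit is the oldest.
    """

    def __init__(self, size):
        self.size = size
        self.mask = (1 << size) - 1  # keeps only the last `size` transactions
        self.bits = {}

    def slide(self, transaction):
        # Left shift: age every item by one position and drop the oldest bit.
        for item in self.bits:
            self.bits[item] = (self.bits[item] << 1) & self.mask
        # Mark the items that occur in the newest transaction.
        for item in transaction:
            self.bits[item] = self.bits.get(item, 0) | 1

    def support(self, itemset):
        # AND: a bit survives only where every item of the itemset occurs.
        acc = self.mask
        for item in itemset:
            acc &= self.bits.get(item, 0)
        return bin(acc).count("1")


# Hypothetical stream of four transactions with a window of three
window = BitWindow(3)
for transaction in [{"C", "E"}, {"C", "G"}, {"E"}, {"C", "E"}]:
    window.slide(transaction)

print(window.support({"C"}), window.support({"E"}), window.support({"C", "E"}))
```

    Each incoming transaction is processed once, and sliding the window costs one shift per tracked item rather than a rescan of the stored transactions.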

  6. Customizing FP-growth algorithm to parallel mining with Charm++ library

    NASA Astrophysics Data System (ADS)

    Puścian, Marek

    2017-08-01

    This paper presents a frequent item mining algorithm that was customized to handle growing data repositories. The proposed solution applies a master-slave scheme to the frequent pattern growth technique. Efficient utilization of available computation units is achieved by dynamic reallocation of tasks. Conditional frequent trees are assigned to parallel workers based on their workload. The proposed enhancements have been successfully implemented using the Charm++ library. This paper discusses the performance of the parallelized FP-growth algorithm on different datasets. The approach has been illustrated with many experiments and measurements performed using a multiprocessor and multithreaded computer.

  7. Study on key technologies of optimization of big data for thermal power plant performance

    NASA Astrophysics Data System (ADS)

    Mao, Mingyang; Xiao, Hong

    2018-06-01

    Thermal power generation accounts for 70% of China's power generation, and its pollutants account for 40% of emissions of the same kind. Optimizing thermal power efficiency requires monitoring and understanding the whole process of coal combustion and pollutant migration, and power system performance data show an explosive growth trend. The purpose is to study the integration of numerical simulation with big data technology and to develop a thermal power plant efficiency data optimization platform and a nitrogen oxide emission reduction system, providing reliable technical support for improving efficiency, saving energy and reducing emissions. The method is big data technology, represented by "multi-source heterogeneous data integration", "large data distributed storage" and "high-performance real-time and off-line computing", which can greatly enhance the energy consumption capacity of thermal power plants and the level of intelligent decision-making. A data mining algorithm is then used to establish a mathematical model of boiler combustion and to mine power plant boiler efficiency data; combined with numerical simulation technology, this finds the rules of boiler combustion and pollutant generation and the influence of combustion parameters on them. The result is optimized boiler combustion parameters, which can achieve energy saving.

  8. A comprehensive review on privacy preserving data mining.

    PubMed

    Aldeen, Yousra Abdul Alsahib S; Salleh, Mazleena; Razzaque, Mohammad Abdur

    2015-01-01

    Preservation of privacy in data mining has emerged as an absolute prerequisite for exchanging confidential information in terms of data analysis, validation, and publishing. Ever-escalating internet phishing poses a severe threat to the widespread propagation of sensitive information over the web. Conversely, the doubts and contentions of various information providers about the reliable protection of data from disclosure often result in outright refusal to share data or in the sharing of incorrect information. This article provides a panoramic overview, with a new perspective and systematic interpretation, of the published literature, meticulously organized into subcategories. The fundamental notions of the existing privacy preserving data mining methods, their merits, and shortcomings are presented. The current privacy preserving data mining techniques are classified based on distortion, association rule, hide association rule, taxonomy, clustering, associative classification, outsourced data mining, distributed, and k-anonymity, where their notable advantages and disadvantages are emphasized. This careful scrutiny reveals the past development, present research challenges, future trends, and the gaps and weaknesses. Further significant enhancements for more robust privacy protection and preservation are affirmed to be mandatory.

  9. [Research of bleeding volume and method in blood-letting acupuncture therapy based on data mining].

    PubMed

    Liu, Xin; Jia, Chun-Sheng; Wang, Jian-Ling; Du, Yu-Zhu; Zhang, Xiao-Xu; Shi, Jing; Li, Xiao-Feng; Sun, Yan-Hui; Zhang, Shen; Zhang, Xuan-Ping; Gang, Wei-Juan

    2014-03-01

    Through computer-based technology and data mining method, with treatment in cases of bloodletting acupuncture therapy in collected literature as sample data, the association rule in data mining was applied. According to self-built database platform, the data was input, arranged and summarized, and eventually required data was acquired to perform the data mining of bleeding volume and method in blood-letting acupuncture therapy, which summarized its application rules and clinical values to provide better guide for clinical practice. There were 9 kinds of blood-letting tools in the literature, in which the frequency of three-edge needle was the highest, accounting for 84.4% (1239/1468). The bleeding volume was classified into six levels, in which less volume (less than 0.1 mL) had the highest frequency (401 times). According to the results of the data mining, blood-letting acupuncture therapy was widely applied in clinical practice of acupuncture, in which use of three-edge needle and less volume (less than 0.1 mL) of blood were the most common, however, there was no central tendency in general.

  10. 20 CFR 410.681 - Change of ruling or legal precedent.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 20 Employees' Benefits 2 2011-04-01 2011-04-01 false Change of ruling or legal precedent. 410.681 Section 410.681 Employees' Benefits SOCIAL SECURITY ADMINISTRATION FEDERAL COAL MINE HEALTH AND SAFETY ACT..., Administrative Review, Finality of Decisions, and Representation of Parties § 410.681 Change of ruling or legal...

  11. 20 CFR 410.681 - Change of ruling or legal precedent.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 20 Employees' Benefits 2 2010-04-01 2010-04-01 false Change of ruling or legal precedent. 410.681 Section 410.681 Employees' Benefits SOCIAL SECURITY ADMINISTRATION FEDERAL COAL MINE HEALTH AND SAFETY ACT..., Administrative Review, Finality of Decisions, and Representation of Parties § 410.681 Change of ruling or legal...

  12. A method of extracting impervious surface based on rule algorithm

    NASA Astrophysics Data System (ADS)

    Peng, Shuangyun; Hong, Liang; Xu, Quanli

    2018-02-01

    The impervious surface has become an important index to evaluate the urban environmental quality and measure the development level of urbanization. At present, remote sensing has become the main way to extract impervious surface. In this paper, a method to extract impervious surface based on a rule algorithm is proposed. The main idea of the method is to use the rule-based algorithm to extract impervious surface based on the characteristics of, and the differences between, the impervious surface and the other three types of objects (water, soil and vegetation) in the seven original bands, NDWI and NDVI. The method has three steps: 1) first, the vegetation is extracted according to the principle that the vegetation is higher in the near-infrared band than in the other bands; 2) then, the water is extracted according to the characteristic that water has the highest NDWI and the lowest NDVI; 3) finally, the impervious surface is extracted based on the fact that the impervious surface has a higher NDWI value and a lower NDVI value than the soil. In order to test the accuracy of the rule algorithm, this paper uses the linear spectral mixed decomposition algorithm, the CART algorithm, and the NDII index algorithm to extract the impervious surface from six remote sensing images of the Dianchi Lake Basin from 1999 to 2014. Then, the accuracy of these three methods is compared with the accuracy of the rule algorithm by using the overall classification accuracy method. It is found that the accuracy of the extraction method based on the rule algorithm is clearly higher than that of the other three methods.
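
    The three-step rule order described above can be sketched per pixel as follows. This is an illustration under stated assumptions rather than the authors' method: it uses the common definitions NDVI = (NIR - Red) / (NIR + Red) and NDWI = (Green - NIR) / (Green + NIR), works on three reflectance bands instead of the seven original bands, and all three thresholds are hypothetical values chosen only to make the example run.

```python
def ndvi(red, nir):
    """Normalized Difference Vegetation Index."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def ndwi(green, nir):
    """Normalized Difference Water Index (green / near-infrared form)."""
    total = green + nir
    return (green - nir) / total if total else 0.0


def classify_pixel(green, red, nir,
                   vegetation_ndvi=0.4,    # hypothetical threshold
                   water_ndwi=0.2,         # hypothetical threshold
                   impervious_ndwi=-0.15): # hypothetical threshold
    """Apply the rules in order: vegetation, then water, then impervious vs. soil."""
    v = ndvi(red, nir)
    w = ndwi(green, nir)
    if v > vegetation_ndvi:
        return "vegetation"   # step 1: strong near-infrared response
    if w > water_ndwi:
        return "water"        # step 2: high NDWI, low NDVI
    if w > impervious_ndwi:
        return "impervious"   # step 3: NDWI higher than soil
    return "soil"


print(classify_pixel(green=0.25, red=0.27, nir=0.30))
```

    Ordering the rules matters: each step removes one class, so the later, weaker contrasts (impervious surface against soil) are only evaluated on the pixels that remain.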

  13. Developing Image Processing Meta-Algorithms with Data Mining of Multiple Metrics

    PubMed Central

    Cunha, Alexandre; Toga, A. W.; Parker, D. Stott

    2014-01-01

    People often use multiple metrics in image processing, but here we take a novel approach of mining the values of batteries of metrics on image processing results. We present a case for extending image processing methods to incorporate automated mining of multiple image metric values. Here by a metric we mean any image similarity or distance measure, and in this paper we consider intensity-based and statistical image measures and focus on registration as an image processing problem. We show how it is possible to develop meta-algorithms that evaluate different image processing results with a number of different metrics and mine the results in an automated fashion so as to select the best results. We show that the mining of multiple metrics offers a variety of potential benefits for many image processing problems, including improved robustness and validation. PMID:24653748

  14. Japanese Aggression in Asia (1895-1930). Japan’s Dream of ’Hakko Ichuo’ (Eight Corners of the World under Japanese Rule).

    DTIC Science & Technology

    1980-12-01

    the British Navy was also of significant value, for then Britannia still ruled the waves. The huge indemnity received from the Chinese played an...11 among the sons, the eldest took all and the second and third sons became either factory or mine workers or apprentices of a merchant. When...warehouses, spinning, paper and sugar mills, all based on the large profits which came from banking, mining and foreign trade. Mitsubishi had its

  15. Performance analysis of a multispectral framing camera for detecting mines in the littoral zone and beach zone

    NASA Astrophysics Data System (ADS)

    Louchard, Eric; Farm, Brian; Acker, Andrew

    2008-04-01

    BAE Systems Sensor Systems Identification & Surveillance (IS) has developed, under contract with the Office of Naval Research, a multispectral airborne sensor system and processing algorithms capable of detecting mine-like objects in the surf zone and land mines in the beach zone. BAE Systems has used this system in a blind test at a test range established by the Naval Surface Warfare Center - Panama City Division (NSWC-PCD) at Eglin Air Force Base. The airborne and ground subsystems used in this test are described, with graphical illustrations of the detection algorithms. We report on the performance of the system configured to operate with a human operator analyzing data on a ground station. A subsurface (underwater bottom proud mine in the surf zone and moored mine in shallow water) mine detection capability is demonstrated in the surf zone. Surface float detection and proud land mine detection capability is also demonstrated. Our analysis shows that this BAE Systems-developed multispectral airborne sensor provides a robust technical foundation for a viable system for mine counter-measures, and would be a valuable asset for use prior to an amphibious assault.

  16. British Defense Policy: A New Approach?

    DTIC Science & Technology

    1988-12-14

    inherent to their well-being, was also acknowledged by the remainder of the world in its attitude toward Britain. Is not "Rule Britannia, Britannia ...Castle Class 1 1 Island Class 7 43 Mine-Counter Minesweepers 2 2 Mine River Class 12 Ton Class 10 3 Hunt Class 12 1 Patrol Craft Bird Class 5 Coastal 15...submarine warfare carriers, assault ships, and mine-counter mine vessels. British naval aircraft is as depicted in Table 2. Table 2. Aircraft of the Royal

  17. Implementation of a Flexible Tool for Automated Literature-Mining and Knowledgebase Development (DevToxMine)

    EPA Science Inventory

    Deriving novel relationships from the scientific literature is an important adjunct to datamining activities for complex datasets in genomics and high-throughput screening activities. Automated text-mining algorithms can be used to extract relevant content from the literature and...

  18. Protective and control relays as coal-mine power-supply ACS subsystem

    NASA Astrophysics Data System (ADS)

    Kostin, V. N.; Minakova, T. E.

    2017-10-01

    The paper presents instantaneous selective short-circuit protection for the cabling of the underground part of a coal mine and central control algorithms as a Coal-Mine Power-Supply ACS Subsystem. In order to improve the reliability of electricity supply and reduce the mining equipment down-time, a dual channel relay protection and central control system is proposed as a subsystem of the coal-mine power-supply automated control system (PS ACS).

  19. 77 FR 54490 - Alabama Regulatory Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-09-05

    ... Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; public comment period and opportunity for public hearing on proposed amendment. SUMMARY: We, the Office of Surface Mining Reclamation... will follow for the public hearing, if one is requested. DATES: We will accept written comments on this...

  20. Quantifying Associations between Environmental Stressors and Demographic Factors

    EPA Science Inventory

    Association rule mining (ARM) [1-3], also known as frequent item set mining [4] or market basket analysis [1], has been widely applied in many different areas, such as business product portfolio planning [5], intrusion detection infrastructure design [6], gene expression analysis...
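
    The standard measures behind association rule mining can be sketched as follows. The function names and the toy transactions are hypothetical and are not data from the record above; the definitions of support, confidence, and lift are the usual ones for a rule "antecedent -> consequent".

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    cons = support(transactions, consequent)
    confidence = both / ante if ante else 0.0
    lift = confidence / cons if cons else 0.0
    return {"support": both, "confidence": confidence, "lift": lift}


# Made-up toy data, for illustration only.
toy = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(rule_metrics(toy, {"bread"}, {"milk"}))
```

    A lift below 1, as in this toy rule, means the consequent appears less often with the antecedent than it does overall, which is why lift is often reported alongside confidence.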

  1. A Genetic Algorithm That Exchanges Neighboring Centers for Fuzzy c-Means Clustering

    ERIC Educational Resources Information Center

    Chahine, Firas Safwan

    2012-01-01

    Clustering algorithms are widely used in pattern recognition and data mining applications. Due to their computational efficiency, partitional clustering algorithms are better suited for applications with large datasets than hierarchical clustering algorithms. K-means is among the most popular partitional clustering algorithms, but has a major…

  2. Evaluating data mining algorithms using molecular dynamics trajectories.

    PubMed

    Tatsis, Vasileios A; Tjortjis, Christos; Tzirakis, Panagiotis

    2013-01-01

    Molecular dynamics simulations provide a sample of a molecule's conformational space. Experiments on the μs time scale, resulting in large amounts of data, are nowadays routine. Data mining techniques such as classification provide a way to analyse such data. In this work, we evaluate and compare several classification algorithms using three data sets which resulted from computer simulations, of a potential enzyme mimetic biomolecule. We evaluated 65 classifiers available in the well-known data mining toolkit Weka, using 'classification' errors to assess algorithmic performance. Results suggest that: (i) 'meta' classifiers perform better than the other groups, when applied to molecular dynamics data sets; (ii) Random Forest and Rotation Forest are the best classifiers for all three data sets; and (iii) classification via clustering yields the highest classification error. Our findings are consistent with bibliographic evidence, suggesting a 'roadmap' for dealing with such data.

  3. Application of ant colony Algorithm and particle swarm optimization in architectural design

    NASA Astrophysics Data System (ADS)

    Song, Ziyi; Wu, Yunfa; Song, Jianhua

    2018-02-01

    By studying the development of the ant colony algorithm and the particle swarm algorithm, this paper expounds the core ideas of the algorithms, explores the combination of algorithms and architectural design, sums up the application rules of intelligent algorithms in architectural design, and, combining the characteristics of the two algorithms, obtains a research route and a way of realizing intelligent algorithms in architectural design, establishing algorithm rules to assist architectural design. Taking intelligent algorithms as the starting point of architectural design research, the authors provide the theoretical foundation for the ant colony algorithm and the particle swarm algorithm in architectural design, broaden the application range of intelligent algorithms in architectural design, and provide a new idea for architects.

  4. Electro-Optic Identification (EOID) Research Program

    DTIC Science & Technology

    2002-09-30

    The goal of this research is to provide computer-assisted identification of underwater mines in electro-optic imagery. Identification algorithms will...greatly reduce the time and risk to reacquire mine-like-objects for positive classification and identification. The objectives are to collect electro ... optic data under a wide range of operating and environmental conditions and develop precise algorithms that can provide accurate target recognition on this data for all possible conditions.

  5. Fast Algorithms for Mining Co-evolving Time Series

    DTIC Science & Technology

    2011-09-01

    Keogh et al., 2001, 2004] and (b) forecasting, like an autoregressive integrated moving average model (ARIMA) and related methods [Box et al., 1994...computing hardware? We develop models to mine time series with missing values, to extract compact representation from time sequences, to segment the...sequences, and to do forecasting. For large scale data, we propose algorithms for learning time series models, in particular, including Linear Dynamical

  6. A Contextualized, Differential Sequence Mining Method to Derive Students' Learning Behavior Patterns

    ERIC Educational Resources Information Center

    Kinnebrew, John S.; Loretz, Kirk M.; Biswas, Gautam

    2013-01-01

    Computer-based learning environments can produce a wealth of data on student learning interactions. This paper presents an exploratory data mining methodology for assessing and comparing students' learning behaviors from these interaction traces. The core algorithm employs a novel combination of sequence mining techniques to identify differentially…

  7. An AK-LDMeans algorithm based on image clustering

    NASA Astrophysics Data System (ADS)

    Chen, Huimin; Li, Xingwei; Zhang, Yongbin; Chen, Nan

    2018-03-01

    Clustering is an effective analytical technique for handling unlabeled data for value mining. Its ultimate goal is to label unclassified data quickly and correctly. We use the roadmap for the current image processing as the experimental background. In this paper, we propose an AK-LDMeans algorithm that automatically locks the K value by designing the Kcost fold line, and then uses the long-distance high-density method to select the clustering centers, replacing the traditional initial clustering center selection method, which further improves the efficiency and accuracy of the traditional K-Means algorithm. The experimental results are compared with those of current clustering algorithms. The algorithm can provide effective reference value in the fields of image processing, machine vision and data mining.

  8. Medication regularity of pulmonary fibrosis treatment by contemporary traditional Chinese medicine experts based on data mining.

    PubMed

    Zhang, Suxian; Wu, Hao; Liu, Jie; Gu, Huihui; Li, Xiujuan; Zhang, Tiansong

    2018-03-01

    Treatment of pulmonary fibrosis by traditional Chinese medicine (TCM) has accumulated important experience. Our interest is in exploring the medication regularity of contemporary Chinese medical specialists treating pulmonary fibrosis. Through literature search, medical records from TCM experts who treat pulmonary fibrosis, which were published in Chinese and English medical journals, were selected for this study. As the object of study, a database was established after analysing the records. After data cleaning, the rules of medicine in the treatment of pulmonary fibrosis in medical records of TCM were explored by using data mining technologies such as frequency analysis, association rule analysis, and link analysis. A total of 124 medical records from 60 doctors were selected in this study; 263 types of medicinals were used a total of 5,455 times; the herbs that were used more than 30 times can be grouped into 53 species and were used a total of 3,681 times. Using main medicinals cluster analysis, medicinals were divided into qi-tonifying, yin-tonifying, blood-activating, phlegm-resolving, cough-suppressing, panting-calming, and ten other major medicinal categories. According to the set conditions, a total of 62 drug compatibility rules have been obtained, involving mainly qi-tonifying, yin-tonifying, blood-activating, phlegm-resolving, qi-descending, and panting-calming medicinals, as well as other medicinals used in combination. The results of data mining are consistent with clinical practice and it is feasible to explore the medical rules applicable to the treatment of pulmonary fibrosis in medical records of TCM by data mining.

  9. 26 CFR 1.614-3 - Rules relating to separate operating mineral interests in the case of mines.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... method of mining the mineral, the location of the excavations or other workings in relation to the mineral deposit or deposits, and the topography of the area. The determination of the taxpayer as to the...

  10. 30 CFR 56.18006 - New employees.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... New employees. New employees shall be indoctrinated in safety rules and safe work procedures. ... 30 Mineral Resources 1 2010-07-01 2010-07-01 false New employees. 56.18006 Section 56.18006 Mineral Resources MINE SAFETY AND HEALTH ADMINISTRATION, DEPARTMENT OF LABOR METAL AND NONMETAL MINE...

  11. Combined rule extraction and feature elimination in supervised classification.

    PubMed

    Liu, Sheng; Patel, Ronak Y; Daga, Pankaj R; Liu, Haining; Fu, Gang; Doerksen, Robert J; Chen, Yixin; Wilkins, Dawn E

    2012-09-01

    There are a vast number of biology related research problems involving a combination of multiple sources of data to achieve a better understanding of the underlying problems. It is important to select and interpret the most important information from these sources. Thus it will be beneficial to have a good algorithm to simultaneously extract rules and select features for better interpretation of the predictive model. We propose an efficient algorithm, Combined Rule Extraction and Feature Elimination (CRF), based on 1-norm regularized random forests. CRF simultaneously extracts a small number of rules generated by random forests and selects important features. We applied CRF to several drug activity prediction and microarray data sets. CRF is capable of producing performance comparable with state-of-the-art prediction algorithms using a small number of decision rules. Some of the decision rules are biologically significant.

  12. Mining the National Career Assessment Examination Result Using Clustering Algorithm

    NASA Astrophysics Data System (ADS)

    Pagudpud, M. V.; Palaoag, T. T.; Padirayon, L. M.

    2018-03-01

    Education is an essential process today which elicits authorities to discover and establish innovative strategies for educational improvement. This study applied data mining using clustering technique for knowledge extraction from the National Career Assessment Examination (NCAE) result in the Division of Quirino. The NCAE is an examination given to all grade 9 students in the Philippines to assess their aptitudes in the different domains. Clustering the students is helpful in identifying students’ learning considerations. With the use of the RapidMiner tool, clustering algorithms such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN), k-means, k-medoid, expectation maximization clustering, and support vector clustering algorithms were analyzed. The silhouette indexes of the said clustering algorithms were compared, and the result showed that the k-means algorithm with k = 3 and silhouette index equal to 0.196 is the most appropriate clustering algorithm to group the students. Three groups were formed having 477 students in the determined group (cluster 0), 310 proficient students (cluster 1) and 396 developing students (cluster 2). The data mining technique used in this study is essential in extracting useful information from the NCAE result to better understand the abilities of students which in turn is a good basis for adopting teaching strategies.
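
    The silhouette index used above to compare clusterings can be sketched as follows. This is a plain-Python illustration of the standard definition, not the RapidMiner implementation the study used; the function name and the toy points are hypothetical. For each point, a is the mean distance to the other members of its own cluster, b is the lowest mean distance to any other cluster, and the point's score is (b - a) / max(a, b).

```python
from math import dist


def silhouette_index(points, labels):
    """Mean silhouette score over all points; needs at least two clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # the distance from a point to itself is 0, so divide by len(own) - 1
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        b = min(
            sum(dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) else 0.0)
    return sum(scores) / len(scores)


# Made-up one-dimensional points forming two obvious groups.
toy_points = [(0.0,), (1.0,), (10.0,), (11.0,)]
print(silhouette_index(toy_points, [0, 0, 1, 1]))
```

    Values near 1 indicate compact, well-separated clusters and values near 0 indicate overlapping ones, so a reported index such as 0.196 points to only weakly separated groups.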

  13. Analysis of design characteristics of a V-type support using an advanced engineering environment

    NASA Astrophysics Data System (ADS)

    Gwiazda, A.; Banaś, W.; Sękala, A.; Cwikla, G.; Topolska, S.; Foit, K.; Monica, Z.

    2017-08-01

    Modern mining support is, for the entire period of its use, an important part of the mining complex, which includes all the devices in the excavation during its normal use. Therefore, during the design of the support, it is an important task to choose the shape and to select the dimensions of a support as well as its strength characteristics. According to the rules, the design process of a support must take into account, inter alia, the type and the dimensions of the expected means of transport, the number and size of pipelines, and the type of additional equipment used in the excavation area. The support design must ensure the functionality of the excavation process and job security, while maintaining the economic viability of the entire project. Among other things, it should ensure the selection of a support for specific natural conditions. It is also important to take into consideration the economic characteristics of the project. The article presents an algorithm of an integrative approach and its formalized description in the form of integrating the areas of optimization of different construction characteristics of a V-type mining support. The paper includes an example of its application for developing the construction of this support. The paper also describes the results of the characteristics analysis and the changes that were introduced afterwards. The support models were prepared in a CAD-class computer environment (Siemens NX PLM). The analyses were also conducted in this graphical design environment.

  14. A novel method for predicting kidney stone type using ensemble learning.

    PubMed

    Kazemi, Yassaman; Mirroshandel, Seyed Abolghasem

    2018-01-01

    The high morbidity rate associated with kidney stone disease, which is a silent killer, is one of the main concerns in healthcare systems all over the world. Advanced data mining techniques such as classification can help in the early prediction of this disease and reduce its incidence and associated costs. The objective of the present study is to derive a model for the early detection of the type of kidney stone and the most influential parameters with the aim of providing a decision-support system. Information was collected from 936 patients with nephrolithiasis at the kidney center of the Razi Hospital in Rasht from 2012 through 2016. The prepared dataset included 42 features. Data pre-processing was the first step toward extracting the relevant features. The collected data was analyzed with Weka software, and various data mining models were used to prepare a predictive model. Various data mining algorithms such as the Bayesian model, different types of Decision Trees, Artificial Neural Networks, and Rule-based classifiers were used in these models. We also proposed four models based on ensemble learning to improve the accuracy of each learning algorithm. In addition, a novel technique for combining individual classifiers in ensemble learning was proposed. In this technique, for each individual classifier, a weight is assigned based on our proposed genetic algorithm based method. The generated knowledge was evaluated using a 10-fold cross-validation technique based on standard measures. However, the assessment of each feature for building a predictive model was another significant challenge. The predictive strength of each feature for creating a reproducible outcome was also investigated. Regarding the applied models, parameters such as sex, uric acid condition, calcium level, hypertension, diabetes, nausea and vomiting, flank pain, and urinary tract infection (UTI) were the most vital parameters for predicting the chance of nephrolithiasis. The final ensemble-based model (with an accuracy of 97.1%) was a robust one and could be safely applied to future studies to predict the chances of developing nephrolithiasis. This model provides a novel way to study stone disease by deciphering the complex interaction among different biological variables, thus helping in an early identification and reduction in diagnosis time. Copyright © 2017 Elsevier B.V. All rights reserved.
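
    The combination step described above, in which each classifier's vote carries a weight, can be sketched as follows. This shows only generic weighted voting: the genetic-algorithm search that the authors use to find the weights is not reproduced, and the function name, labels, and weight values are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: one predicted label per base classifier
    weights:     one non-negative weight per base classifier
                 (assumed here to come from some external search)
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Three hypothetical base classifiers disagree on one sample.
votes = ["type_a", "type_b", "type_b"]
print(weighted_vote(votes, [0.60, 0.25, 0.15]))  # weighted vote
print(weighted_vote(votes, [1.0, 1.0, 1.0]))     # plain majority vote
```

    With unequal weights a single trusted classifier can outvote a majority of weaker ones, which is the behavior a learned weighting is meant to exploit.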

  15. Mining association patterns of drug-interactions using post marketing FDA's spontaneous reporting data.

    PubMed

    Ibrahim, Heba; Saad, Amr; Abdo, Amany; Sharaf Eldin, A

    2016-04-01

    Pharmacovigilance (PhV) is an important clinical activity with strong implications for population health and clinical research. The main goal of PhV is the timely detection of adverse drug events (ADEs) that are novel in their clinical nature, severity and/or frequency. Drug interactions (DI) pose an important problem in the development of new drugs and post marketing PhV that contribute to 6-30% of all unexpected ADEs. Therefore, the early detection of DI is vital. Spontaneous reporting systems (SRS) have served as the core data collection system for post marketing PhV since the 1960s. The main objective of our study was to particularly identify signals of DI from SRS. In addition, we are presenting an optimized tailored mining algorithm called "hybrid Apriori". The proposed algorithm is based on an optimized and modified association rule mining (ARM) approach. A hybrid Apriori algorithm has been applied to the SRS of the United States Food and Drug Administration's (U.S. FDA) adverse events reporting system (FAERS) in order to extract significant association patterns of drug interaction-adverse event (DIAE). We have assessed the resulting DIAEs qualitatively and quantitatively using two different triage features: a three-element taxonomy and three performance metrics. These features were applied on two random samples of 100 interacting and 100 non-interacting DIAE patterns. Additionally, we have employed the logistic regression (LR) statistical method to quantify the magnitude and direction of interactions in order to test for confounding by co-medication in unknown interacting DIAE patterns. Hybrid Apriori extracted 2933 interacting DIAE patterns (including 1256 serious ones) and 530 non-interacting DIAE patterns. Referring to the current knowledge using four different reliable resources of DI, the results showed that the proposed method can extract signals of serious interacting DIAEs. Various association patterns could be identified based on the relationships among the elements which composed a pattern. The average performance of the method showed 85% precision, 80% negative predictive value, 81% sensitivity and 84% specificity. The LR modeling could provide the statistical context to guard against spurious DIAEs. The proposed method could efficiently detect DIAE signals from SRS data as well as identify rare adverse drug reactions (ADRs). Copyright © 2016 Elsevier Inc. All rights reserved.
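
    The four performance figures quoted above are all ratios taken from a confusion matrix, as the sketch below shows. The function name and the counts in the usage line are hypothetical and are not the study's data.

```python
def triage_metrics(tp, fp, fn, tn):
    """Standard confusion-matrix ratios.

    tp / fp: patterns flagged as interacting that are / are not known interactions
    fn / tn: patterns flagged as non-interacting that are / are not known interactions
    """
    return {
        "precision": tp / (tp + fp),
        "negative_predictive_value": tn / (tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Made-up counts, for illustration only.
print(triage_metrics(tp=40, fp=10, fn=5, tn=45))
```

    Precision and negative predictive value are computed over what the method reported, while sensitivity and specificity are computed over what is actually true, so the two pairs can differ even on the same sample.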

  16. Application of the EVEX resource to event extraction and network construction: Shared Task entry and result analysis

    PubMed Central

    2015-01-01

    Background Modern methods for mining biomolecular interactions from literature typically make predictions based solely on the immediate textual context, in effect a single sentence. No prior work has been published on extending this context to the information automatically gathered from the whole biomedical literature. Thus, our motivation for this study is to explore whether mutually supporting evidence, aggregated across several documents can be utilized to improve the performance of the state-of-the-art event extraction systems. In this paper, we describe our participation in the latest BioNLP Shared Task using the large-scale text mining resource EVEX. We participated in the Genia Event Extraction (GE) and Gene Regulation Network (GRN) tasks with two separate systems. In the GE task, we implemented a re-ranking approach to improve the precision of an existing event extraction system, incorporating features from the EVEX resource. In the GRN task, our system relied solely on the EVEX resource and utilized a rule-based conversion algorithm between the EVEX and GRN formats. Results In the GE task, our re-ranking approach led to a modest performance increase and resulted in the first rank of the official Shared Task results with 50.97% F-score. Additionally, in this paper we explore and evaluate the usage of distributed vector representations for this challenge. In the GRN task, we ranked fifth in the official results with a strict/relaxed SER score of 0.92/0.81 respectively. To try and improve upon these results, we have implemented a novel machine learning based conversion system and benchmarked its performance against the original rule-based system. Conclusions For the GRN task, we were able to produce a gene regulatory network from the EVEX data, warranting the use of such generic large-scale text mining data in network biology settings. A detailed performance and error analysis provides more insight into the relatively low recall rates. In the GE task we demonstrate that both the re-ranking approach and the word vectors can provide slight performance improvement. A manual evaluation of the re-ranking results pinpoints some of the challenges faced in applying large-scale text mining knowledge to event extraction. PMID:26551766

  17. Application-Specific Graph Sampling for Frequent Subgraph Mining and Community Detection

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Purohit, Sumit; Choudhury, Sutanay; Holder, Lawrence B.

    Graph mining is an important data analysis methodology, but struggles as the input graph size increases. The scalability and usability challenges posed by such large graphs make it imperative to sample the input graph and reduce its size. The critical challenge in sampling is to identify the appropriate algorithm to ensure the resulting analysis does not suffer heavily from the data reduction. Predicting the expected performance degradation for a given graph and sampling algorithm is also useful. In this paper, we present different sampling approaches for graph mining applications such as Frequent Subgraph Mining (FSM) and Community Detection (CD). We explore graph metrics such as PageRank, Triangles, and Diversity to sample a graph and conclude that for heterogeneous graphs Triangles and Diversity perform better than degree-based metrics. We also present two new sampling variations for targeted graph mining applications. We present empirical results to show that knowledge of the target application, along with input graph properties, can be used to select the best sampling algorithm. We also conclude that performance degradation is an abrupt, rather than gradual, phenomenon as the sample size decreases. We present the empirical results to show that the performance degradation follows a logistic function.

  18. Comparison of dermatoscopic diagnostic algorithms based on calculation: The ABCD rule of dermatoscopy, the seven-point checklist, the three-point checklist and the CASH algorithm in dermatoscopic evaluation of melanocytic lesions.

    PubMed

    Unlu, Ezgi; Akay, Bengu N; Erdem, Cengizhan

    2014-07-01

    Dermatoscopic analysis of melanocytic lesions using the CASH algorithm has rarely been described in the literature. The purpose of this study was to compare the sensitivity, specificity, and diagnostic accuracy rates of the ABCD rule of dermatoscopy, the seven-point checklist, the three-point checklist, and the CASH algorithm in the diagnosis and dermatoscopic evaluation of melanocytic lesions on the hairy skin. One hundred and fifteen melanocytic lesions of 115 patients were examined retrospectively using dermatoscopic images and compared with the histopathologic diagnosis. Four dermatoscopic algorithms were carried out for all lesions. The ABCD rule of dermatoscopy showed sensitivity of 91.6%, specificity of 60.4%, and diagnostic accuracy of 66.9%. The seven-point checklist showed sensitivity, specificity, and diagnostic accuracy of 87.5, 65.9, and 70.4%, respectively; the three-point checklist 79.1, 62.6, 66%; and the CASH algorithm 91.6, 64.8, and 70.4%, respectively. To our knowledge, this is the first study that compares the sensitivity, specificity and diagnostic accuracy of the ABCD rule of dermatoscopy, the three-point checklist, the seven-point checklist, and the CASH algorithm for the diagnosis of melanocytic lesions on the hairy skin. In our study, the ABCD rule of dermatoscopy and the CASH algorithm showed the highest sensitivity for the diagnosis of melanoma. © 2014 Japanese Dermatological Association.
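
The three rates reported in this abstract are standard confusion-matrix quantities. The sketch below shows how they are computed; the function name and the counts are hypothetical, chosen for illustration only, and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # share of true positives detected
    specificity = tn / (tn + fp)                 # share of true negatives cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # share of all cases classified correctly
    return sensitivity, specificity, accuracy


# Hypothetical counts, not data from the study.
sens, spec, acc = diagnostic_metrics(tp=18, fp=30, tn=60, fn=2)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

A checklist can combine high sensitivity with modest specificity, as here, when it flags most true positives but also many benign cases.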

  19. A Segment-Based Trajectory Similarity Measure in the Urban Transportation Systems.

    PubMed

    Mao, Yingchi; Zhong, Haishi; Xiao, Xianjian; Li, Xiaofang

    2017-03-06

    With the rapid spread of handheld smart devices with built-in GPS, the trajectory data from GPS sensors has grown explosively. Trajectory data has spatio-temporal characteristics and rich information. Trajectory data processing techniques can mine the patterns of human activities and the moving patterns of vehicles in intelligent transportation systems. A trajectory similarity measure is one of the most important issues in trajectory data mining (clustering, classification, frequent pattern mining, etc.). Unfortunately, the main similarity measure algorithms for trajectory data have been found to be inaccurate, highly sensitive to sampling methods, and not robust to noisy data. To solve the above problems, three distances and their corresponding computation methods are proposed in this paper. The point-segment distance can decrease the sensitivity to the point sampling methods. The prediction distance optimizes the temporal distance with the features of trajectory data. The segment-segment distance introduces the trajectory shape factor into the similarity measurement to improve the accuracy. The three kinds of distance are integrated with the traditional dynamic time warping (DTW) algorithm to propose a new segment-based dynamic time warping (SDTW) algorithm. The experimental results show that the SDTW algorithm can exhibit about 57%, 86%, and 31% better accuracy than the longest common subsequence (LCSS) algorithm, the edit distance on real sequence (EDR) algorithm, and DTW, respectively, and that its sensitivity to noisy data is lower than that of those algorithms.
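
For orientation, the traditional DTW baseline named in this abstract can be sketched as below. This is the classic point-to-point formulation with Euclidean distance, not the paper's SDTW; the function name and sample trajectories are hypothetical.

```python
import math


def dtw_distance(traj_a, traj_b):
    """Classic dynamic time warping cost between two sequences of (x, y) points."""
    n, m = len(traj_a), len(traj_b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning the first i points
    # of traj_a with the first j points of traj_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(traj_a[i - 1], traj_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in traj_a only
                                 cost[i][j - 1],      # advance in traj_b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


# Same path sampled at different rates: the repeated point is absorbed by warping.
same_path = dtw_distance([(0, 0), (1, 0), (2, 0)],
                         [(0, 0), (1, 0), (1, 0), (2, 0)])
# Two stationary trajectories 5 units apart, two samples each.
offset = dtw_distance([(0, 0), (0, 0)], [(3, 4), (3, 4)])
print(same_path, offset)
```

Because the cost is accumulated over sampled points only, results depend on how each trajectory was sampled, which is the sensitivity the abstract says the segment-based distances aim to reduce.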

  20. 20 CFR 410.703 - Adjudicatory rules for determining entitlement to benefits.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... COAL MINE HEALTH AND SAFETY ACT OF 1969, TITLE IV-BLACK LUNG BENEFITS (1969- ) Rules for the Review of Denied and Pending Claims Under the Black Lung Benefits Reform Act (BLBRA) of 1977 § 410.703 Adjudicatory...

  1. 20 CFR 410.703 - Adjudicatory rules for determining entitlement to benefits.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... COAL MINE HEALTH AND SAFETY ACT OF 1969, TITLE IV-BLACK LUNG BENEFITS (1969- ) Rules for the Review of Denied and Pending Claims Under the Black Lung Benefits Reform Act (BLBRA) of 1977 § 410.703 Adjudicatory...

  2. Empirical study of seven data mining algorithms on different characteristics of datasets for biomedical classification applications.

    PubMed

    Zhang, Yiyan; Xin, Yi; Li, Qin; Ma, Jianshe; Li, Shuai; Lv, Xiaodan; Lv, Weiqi

    2017-11-02

    Various kinds of data mining algorithms are continuously being proposed with the development of related disciplines. The applicable scopes and performances of these algorithms differ. Hence, finding a suitable algorithm for a dataset is becoming an important emphasis for biomedical researchers to solve practical problems promptly. In this paper, seven kinds of sophisticated active algorithms, namely, C4.5, support vector machine, AdaBoost, k-nearest neighbor, naïve Bayes, random forest, and logistic regression, were selected as the research objects. The seven algorithms were applied to the 12 top-click UCI public datasets with the task of classification, and their performances were compared through induction and analysis. The sample size, number of attributes, number of missing values, the sample size of each class, correlation coefficients between variables, class entropy of the task variable, and the ratio of the sample size of the largest class to the smallest class were calculated to characterize the 12 research datasets. The two ensemble algorithms reach high classification accuracy on most datasets. Moreover, random forest performs better than AdaBoost on the unbalanced dataset of the multi-class task. Simple algorithms, such as the naïve Bayes and logistic regression models, are suitable for a small dataset with high correlation between the task and other non-task attribute variables. K-nearest neighbor and C4.5 decision tree algorithms perform well on binary- and multi-class task datasets. Support vector machine is more adept on the balanced small dataset of the binary-class task. No algorithm can maintain the best performance in all datasets. The applicability of the seven data mining algorithms on the datasets with different characteristics was summarized to provide a reference for biomedical researchers or beginners in different fields.

  3. 43 CFR 4.1383 - Hearing.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... 43 Public Lands: Interior 1 2014-10-01 2014-10-01 false Hearing. 4.1383 Section 4.1383 Public Lands: Interior Office of the Secretary of the Interior DEPARTMENT HEARINGS AND APPEALS PROCEDURES Special Rules Applicable to Surface Coal Mining Hearings and Appeals Review of Office of Surface Mining...

  4. 30 CFR 48.6 - Experienced miner training.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    .... (b) Experienced miners must complete the training prescribed in this section before beginning work... to work environment. The course shall include a visit and tour of the mine. The methods of mining... responsibilities of such supervisors and miners' representatives; and an introduction to the operator's rules and...

  5. 43 CFR 3483.6 - Special logical mining unit rules.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... the LMU, of either Federal or non-Federal recoverable coal reserves or a combination thereof, shall be... Section 3483.6 Public Lands: Interior Regulations Relating to Public Lands (Continued) BUREAU OF LAND MANAGEMENT, DEPARTMENT OF THE INTERIOR MINERALS MANAGEMENT (3000) COAL EXPLORATION AND MINING OPERATIONS...

  6. 30 CFR 939.700 - Rhode Island Federal program.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... Rhode Island Federal program. (a) This part contains all rules that are applicable to surface coal... to all surface coal mining and reclamation operations in Rhode Island conducted on non-Federal and... stringent environmental control and regulation of surface coal mining and reclamation operations than do the...

  7. 43 CFR 3483.6 - Special logical mining unit rules.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... the LMU, of either Federal or non-Federal recoverable coal reserves or a combination thereof, shall be... Section 3483.6 Public Lands: Interior Regulations Relating to Public Lands (Continued) BUREAU OF LAND MANAGEMENT, DEPARTMENT OF THE INTERIOR MINERALS MANAGEMENT (3000) COAL EXPLORATION AND MINING OPERATIONS...

  8. 43 CFR 3483.6 - Special logical mining unit rules.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... the LMU, of either Federal or non-Federal recoverable coal reserves or a combination thereof, shall be... Section 3483.6 Public Lands: Interior Regulations Relating to Public Lands (Continued) BUREAU OF LAND MANAGEMENT, DEPARTMENT OF THE INTERIOR MINERALS MANAGEMENT (3000) COAL EXPLORATION AND MINING OPERATIONS...

  9. 43 CFR 4.1351 - Preliminary finding by OSM.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... APPEALS PROCEDURES Special Rules Applicable to Surface Coal Mining Hearings and Appeals Request for...(c) of the Act, 30 U.S.C. 1260(c) (Federal Program; Federal Lands Program; Federal Program for Indian... or has controlled surface coal mining and reclamation operations with a demonstrated pattern of...

  10. 43 CFR 3483.6 - Special logical mining unit rules.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... the LMU, of either Federal or non-Federal recoverable coal reserves or a combination thereof, shall be... Section 3483.6 Public Lands: Interior Regulations Relating to Public Lands (Continued) BUREAU OF LAND MANAGEMENT, DEPARTMENT OF THE INTERIOR MINERALS MANAGEMENT (3000) COAL EXPLORATION AND MINING OPERATIONS...

  11. 43 CFR 4.1383 - Hearing.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 43 Public Lands: Interior 1 2010-10-01 2010-10-01 false Hearing. 4.1383 Section 4.1383 Public Lands: Interior Office of the Secretary of the Interior DEPARTMENT HEARINGS AND APPEALS PROCEDURES Special Rules Applicable to Surface Coal Mining Hearings and Appeals Review of Office of Surface Mining...

  12. Association rule mining on grid monitoring data to detect error sources

    NASA Astrophysics Data System (ADS)

    Maier, Gerhild; Schiffers, Michael; Kranzlmueller, Dieter; Gaidioz, Benjamin

    2010-04-01

    Error handling is a crucial task in an infrastructure as complex as a grid. There are several monitoring tools put in place, which report failing grid jobs including exit codes. However, the exit codes do not always denote the actual fault, which caused the job failure. Human time and knowledge is required to manually trace back errors to the real fault underlying an error. We perform association rule mining on grid job monitoring data to automatically retrieve knowledge about the grid components' behavior by taking dependencies between grid job characteristics into account. Therewith, problematic grid components are located automatically and this information - expressed by association rules - is visualized in a web interface. This work achieves a decrease in time for fault recovery and yields an improvement of a grid's reliability.
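
Association rules of the kind described in this abstract are usually ranked by support and confidence. The sketch below computes both on a toy set of job records; the attribute names and records are hypothetical, and the code illustrates the measures only, not the authors' mining pipeline.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical grid job records, each a set of attribute=value items.
jobs = [
    {"site=A", "status=failed"},
    {"site=A", "status=failed"},
    {"site=A", "status=ok"},
    {"site=B", "status=ok"},
]

rule_support = support(jobs, {"site=A", "status=failed"})
rule_confidence = confidence(jobs, {"site=A"}, {"status=failed"})
print(rule_support, rule_confidence)
```

A rule such as `site=A -> status=failed` with high confidence would point at a site as a likely fault source, even when the job's exit code alone does not.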

  13. External validation of heart-type fatty acid binding protein, high-sensitivity cardiac troponin, and electrocardiography as rule-out for acute myocardial infarction.

    PubMed

    Van Hise, Christopher B; Greenslade, Jaimi H; Parsonage, William; Than, Martin; Young, Joanna; Cullen, Louise

    2018-02-01

    To externally validate a clinical decision rule incorporating heart fatty acid binding protein (h-FABP), high-sensitivity troponin (hs-cTn) and electrocardiogram (ECG) for the detection of acute myocardial infarction (AMI) on presentation to the Emergency Department. We also investigated whether this clinical decision rule improved identification of AMI over algorithms incorporating hs-cTn and ECG only. This study included data from 789 patients from the Brisbane ADAPT cohort and 441 patients from the Christchurch TIMI RCT cohort. The primary outcome was index AMI. Sensitivity, specificity, positive predictive value and negative predictive value were used to assess the diagnostic accuracy of the algorithms. 1230 patients were recruited, including 112 (9.1%) with AMI. The algorithm including h-FABP and hs-cTnT had 100% sensitivity and 32.4% specificity. The algorithm utilising h-FABP and hs-cTnI had similar sensitivity (99.1%) and higher specificity (43.4%). The hs-cTnI and hs-cTnT algorithms without h-FABP both had a sensitivity of 98.2%; a result that was not significantly different from either algorithm incorporating h-FABP. Specificity was higher for the hs-cTnI algorithm (68.1%) compared to the hs-cTnT algorithm (33.0%). The specificity of the algorithm incorporating hs-cTnI alone was also significantly higher than both of the algorithms incorporating h-FABP (p<0.01). For patients presenting to the Emergency Department with chest pain, an algorithm incorporating h-FABP, hs-cTn and ECG has high accuracy and can rule out up to 40% of patients. An algorithm incorporating only hs-cTn and ECG has similar sensitivity and may rule out a higher proportion of patients. Each of the algorithms can be used to safely identify patients as low risk for AMI on presentation to the Emergency Department. Copyright © 2017 The Canadian Society of Clinical Chemists. All rights reserved.

  14. COBRA ATD minefield detection model initial performance analysis

    NASA Astrophysics Data System (ADS)

    Holmes, V. Todd; Kenton, Arthur C.; Hilton, Russell J.; Witherspoon, Ned H.; Holloway, John H., Jr.

    2000-08-01

    A statistical performance analysis of the USMC Coastal Battlefield Reconnaissance and Analysis (COBRA) Minefield Detection (MFD) Model has been performed in support of the COBRA ATD Program under execution by the Naval Surface Warfare Center/Dahlgren Division/Coastal Systems Station. This analysis uses the Veridian ERIM International MFD model from the COBRA Sensor Performance Evaluation and Computational Tools for Research Analysis modeling toolbox and a collection of multispectral mine detection algorithm response distributions for mines and minelike clutter objects. These mine detection response distributions were generated from actual COBRA ATD test missions over littoral zone minefields. This analysis serves to validate both the utility and effectiveness of the COBRA MFD Model as a predictive MFD performance tool. COBRA ATD minefield detection model algorithm performance results based on a simulated baseline minefield detection scenario are presented, as well as results of an MFD model algorithm parametric sensitivity study.

  15. Improving Classification of Cancer and Mining Biomarkers from Gene Expression Profiles Using Hybrid Optimization Algorithms and Fuzzy Support Vector Machine

    PubMed Central

    Moteghaed, Niloofar Yousefi; Maghooli, Keivan; Garshasbi, Masoud

    2018-01-01

    Background: Gene expression data are characteristically high dimensional with a small sample size in contrast to the feature size and variability inherent in biological processes that contribute to difficulties in analysis. Selection of highly discriminative features decreases the computational cost and complexity of the classifier and improves its reliability for prediction of a new class of samples. Methods: The present study used hybrid particle swarm optimization and genetic algorithms for gene selection and a fuzzy support vector machine (SVM) as the classifier. Fuzzy logic is used to infer the importance of each sample in the training phase and decrease the outlier sensitivity of the system to increase the ability to generalize the classifier. A decision-tree algorithm was applied to the most frequent genes to develop a set of rules for each type of cancer. This improved the abilities of the algorithm by finding the best parameters for the classifier during the training phase without the need for trial-and-error by the user. The proposed approach was tested on four benchmark gene expression profiles. Results: Good results have been demonstrated for the proposed algorithm. The classification accuracy for leukemia data is 100%, for colon cancer is 96.67% and for breast cancer is 98%. The results show that the best kernel used in training the SVM classifier is the radial basis function. Conclusions: The experimental results show that the proposed algorithm can decrease the dimensionality of the dataset, determine the most informative gene subset, and improve classification accuracy using the optimal parameters of the classifier with no user interface. PMID:29535919

  16. Advances in Machine Learning and Data Mining for Astronomy

    NASA Astrophysics Data System (ADS)

    Way, Michael J.; Scargle, Jeffrey D.; Ali, Kamal M.; Srivastava, Ashok N.

    2012-03-01

    Advances in Machine Learning and Data Mining for Astronomy documents numerous successful collaborations among computer scientists, statisticians, and astronomers who illustrate the application of state-of-the-art machine learning and data mining techniques in astronomy. Due to the massive amount and complexity of data in most scientific disciplines, the material discussed in this text transcends traditional boundaries between various areas in the sciences and computer science. The book's introductory part provides context to issues in the astronomical sciences that are also important to health, social, and physical sciences, particularly probabilistic and statistical aspects of classification and cluster analysis. The next part describes a number of astrophysics case studies that leverage a range of machine learning and data mining technologies. In the last part, developers of algorithms and practitioners of machine learning and data mining show how these tools and techniques are used in astronomical applications. With contributions from leading astronomers and computer scientists, this book is a practical guide to many of the most important developments in machine learning, data mining, and statistics. It explores how these advances can solve current and future problems in astronomy and looks at how they could lead to the creation of entirely new algorithms within the data mining community.

  17. Identifying Learning Behaviors by Contextualizing Differential Sequence Mining with Action Features and Performance Evolution

    ERIC Educational Resources Information Center

    Kinnebrew, John S.; Biswas, Gautam

    2012-01-01

    Our learning-by-teaching environment, Betty's Brain, captures a wealth of data on students' learning interactions as they teach a virtual agent. This paper extends an exploratory data mining methodology for assessing and comparing students' learning behaviors from these interaction traces. The core algorithm employs sequence mining techniques to…

  18. Adaptive semantic tag mining from heterogeneous clinical research texts.

    PubMed

    Hao, T; Weng, C

    2015-01-01

    To develop an adaptive approach to mine frequent semantic tags (FSTs) from heterogeneous clinical research texts. We develop a "plug-n-play" framework that integrates replaceable unsupervised kernel algorithms with formatting, functional, and utility wrappers for FST mining. Temporal information identification and semantic equivalence detection were two example functional wrappers. We first compared this approach's recall and efficiency for mining FSTs from ClinicalTrials.gov to that of a recently published tag-mining algorithm. Then we assessed this approach's adaptability to two other types of clinical research texts: clinical data requests and clinical trial protocols, by comparing the prevalence trends of FSTs across three texts. Our approach increased the average recall and speed by 12.8% and 47.02% respectively upon the baseline when mining FSTs from ClinicalTrials.gov, and maintained an overlap in relevant FSTs with the baseline ranging between 76.9% and 100% for varying FST frequency thresholds. The FSTs saturated when the data size reached 200 documents. Consistent trends in the prevalence of FST were observed across the three texts as the data size or frequency threshold changed. This paper contributes an adaptive tag-mining framework that is scalable and adaptable without sacrificing its recall. This component-based architectural design can be potentially generalizable to improve the adaptability of other clinical text mining methods.

  19. Air Pollution Monitoring and Mining Based on Sensor Grid in London

    PubMed Central

    Ma, Yajie; Richards, Mark; Ghanem, Moustafa; Guo, Yike; Hassard, John

    2008-01-01

    In this paper, we present a distributed infrastructure based on wireless sensors network and Grid computing technology for air pollution monitoring and mining, which aims to develop low-cost and ubiquitous sensor networks to collect real-time, large scale and comprehensive environmental data from road traffic emissions for air pollution monitoring in urban environment. The main informatics challenges in respect to constructing the high-throughput sensor Grid are discussed in this paper. We present a two-layer network framework, a P2P e-Science Grid architecture, and the distributed data mining algorithm as the solutions to address the challenges. We simulated the system in TinyOS to examine the operation of each sensor as well as the networking performance. We also present the distributed data mining result to examine the effectiveness of the algorithm. PMID:27879895

  20. Air Pollution Monitoring and Mining Based on Sensor Grid in London.

    PubMed

    Ma, Yajie; Richards, Mark; Ghanem, Moustafa; Guo, Yike; Hassard, John

    2008-06-01

    In this paper, we present a distributed infrastructure based on wireless sensors network and Grid computing technology for air pollution monitoring and mining, which aims to develop low-cost and ubiquitous sensor networks to collect real-time, large scale and comprehensive environmental data from road traffic emissions for air pollution monitoring in urban environment. The main informatics challenges in respect to constructing the high-throughput sensor Grid are discussed in this paper. We present a two-layer network framework, a P2P e-Science Grid architecture, and the distributed data mining algorithm as the solutions to address the challenges. We simulated the system in TinyOS to examine the operation of each sensor as well as the networking performance. We also present the distributed data mining result to examine the effectiveness of the algorithm.

  1. 75 FR 26828 - Self-Regulatory Organizations; Chicago Board Options Exchange, Incorporated; Notice of Filing and...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-05-12

    ... amend [sic] its rules relating to the Penny Pilot Program. The text of the rule proposal is available on... proposed rule change. The text of those statements may be examined at the places specified in Item IV below... Technology Select Sector XME SPDR S&P Metals & Mining SPDR Fund. ETF. AKS AK Steel Holding Corp... KGC...

  2. Algorithms and data structures for automated change detection and classification of sidescan sonar imagery

    NASA Astrophysics Data System (ADS)

    Gendron, Marlin Lee

    During Mine Warfare (MIW) operations, MIW analysts perform change detection by visually comparing historical sidescan sonar imagery (SSI) collected by a sidescan sonar with recently collected SSI in an attempt to identify objects (which might be explosive mines) placed at sea since the last time the area was surveyed. This dissertation presents a data structure and three algorithms, developed by the author, that are part of an automated change detection and classification (ACDC) system. MIW analysts at the Naval Oceanographic Office, to reduce the amount of time to perform change detection, are currently using ACDC. The dissertation introductory chapter gives background information on change detection, ACDC, and describes how SSI is produced from raw sonar data. Chapter 2 presents the author's Geospatial Bitmap (GB) data structure, which is capable of storing information geographically and is utilized by the three algorithms. This chapter shows that a GB data structure used in a polygon-smoothing algorithm ran between 1.3--48.4x faster than a sparse matrix data structure. Chapter 3 describes the GB clustering algorithm, which is the author's repeatable, order-independent method for clustering. Results from tests performed in this chapter show that the time to cluster a set of points is not affected by the distribution or the order of the points. In Chapter 4, the author presents his real-time computer-aided detection (CAD) algorithm that automatically detects mine-like objects on the seafloor in SSI. The author ran his GB-based CAD algorithm on real SSI data, and results of these tests indicate that his real-time CAD algorithm performs comparably to or better than other non-real-time CAD algorithms. The author presents his computer-aided search (CAS) algorithm in Chapter 5. CAS helps MIW analysts locate mine-like features that are geospatially close to previously detected features. 
A comparison between the CAS and a great circle distance algorithm shows that the CAS performs geospatial searching 1.75x faster on large data sets. Finally, the concluding chapter of this dissertation gives important details on how the completed ACDC system will function, and discusses the author's future research to develop additional algorithms and data structures for ACDC.
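
The great circle distance baseline mentioned in this abstract is commonly computed with the haversine formula. The sketch below assumes a spherical Earth with a mean radius of 6371 km; the function name and that radius are illustrative choices, and nothing here reflects the dissertation's own CAS implementation.

```python
import math


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine great-circle distance in kilometres; inputs in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


# A quarter of the equator: 90 degrees of longitude at latitude 0.
quarter_equator = great_circle_km(0.0, 0.0, 0.0, 90.0)
print(round(quarter_equator, 2))
```

Evaluating this formula for every pair of features costs several trigonometric calls per comparison, which is one reason a spatial index can be faster for nearest-feature searches on large data sets.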

  3. Federal Register Notice for the Mining Waste Exclusion Final Rule, September 1, 1989

    EPA Pesticide Factsheets

    Final rule responding to a federal Appeals Court directive to narrow the exclusion of solid waste from the extraction, beneficiation, and processing of ores and minerals from regulation as hazardous waste as it applies to mineral processing wastes.

  4. Data Mining Web Services for Science Data Repositories

    NASA Astrophysics Data System (ADS)

    Graves, S.; Ramachandran, R.; Keiser, K.; Maskey, M.; Lynnes, C.; Pham, L.

    2006-12-01

    The maturation of web services standards and technologies sets the stage for a distributed "Service-Oriented Architecture" (SOA) for NASA's next generation science data processing. This architecture will allow members of the scientific community to create and combine persistent distributed data processing services and make them available to other users over the Internet. NASA has initiated a project to create a suite of specialized data mining web services designed specifically for science data. The project leverages the Algorithm Development and Mining (ADaM) toolkit as its basis. The ADaM toolkit is a robust, mature and freely available science data mining toolkit that is being used by several research organizations and educational institutions worldwide. These mining services will give the scientific community a powerful and versatile data mining capability that can be used to create higher order products such as thematic maps from current and future NASA satellite data records with methods that are not currently available. The package of mining and related services are being developed using Web Services standards so that community-based measurement processing systems can access and interoperate with them. These standards-based services allow users different options for utilizing them, from direct remote invocation by a client application to deployment of a Business Process Execution Language (BPEL) solutions package where a complex data mining workflow is exposed to others as a single service. The ability to deploy and operate these services at a data archive allows the data mining algorithms to be run where the data are stored, a more efficient scenario than moving large amounts of data over the network. This will be demonstrated in a scenario in which a user uses a remote Web-Service-enabled clustering algorithm to create cloud masks from satellite imagery at the Goddard Earth Sciences Data and Information Services Center (GES DISC).

  5. 43 CFR 4.1109 - Service.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... Special Rules Applicable to Surface Coal Mining Hearings and Appeals General Provisions § 4.1109 Service.... Department of the Interior, representing OSMRE in the state in which the mining operation at issue is located, and on any other statutory parties specified under § 4.1105 of this part. (2) The jurisdictions...

  6. 78 FR 37404 - Small Business Size Standards: Support Activities for Mining

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-06-20

    ... SMALL BUSINESS ADMINISTRATION 13 CFR Part 121 RIN 3245-AG44 Small Business Size Standards: Support Activities for Mining AGENCY: U.S. Small Business Administration. ACTION: Final rule. SUMMARY: The United States Small Business Administration (SBA) is increasing the small business size standards for three of...

  7. 26 CFR 1.611-5 - Depreciation of improvements.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... (CONTINUED) INCOME TAXES (CONTINUED) Natural Resources § 1.611-5 Depreciation of improvements. (a) In general. Section 611 provides in the case of mines, oil and gas wells, other natural deposits, and timber that...). (b) Special rules for mines, oil and gas wells, other natural deposits and timber. (1) For principles...

  8. 75 FR 21987 - Penalty Settlement Procedure

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-04-27

    ... and Health Act of 1977, or Mine Act. Hearings are held before the Commission's Administrative Law... settling civil penalties assessed under the Mine Act. DATES: The interim rule takes effect on May 27, 2010... Commission has explored is to simplify how it processes civil penalty settlements. Under section 110(k) of...

  9. 30 CFR 906.1 - Scope.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 30 Mineral Resources 3 2010-07-01 2010-07-01 false Scope. 906.1 Section 906.1 Mineral Resources... OF SURFACE MINING OPERATIONS WITHIN EACH STATE COLORADO § 906.1 Scope. This part contains all rules applicable only within Colorado that have been adopted under the Surface Mining Control and Reclamation Act...

  10. 75 FR 52980 - Submission for OMB Review; Comment Request

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-08-30

    .../maintaining): $303,512. Description: The Safety Standards for Underground Coal Mine Ventilation Belt Entry rule provides safety requirements for the use of the conveyor belt entry as a ventilation intake to... Underground Coal Mine Ventilation--Belt Entry Used as an Intake Air Course to Ventilate Working Sections and...

  11. Data Mining in Health and Medical Information.

    ERIC Educational Resources Information Center

    Bath, Peter A.

    2004-01-01

    Presents a literature review that covers the following topics related to data mining (DM) in health and medical information: the potential of DM in health and medicine; statistical methods; evaluation of methods; DM tools for health and medicine; inductive learning of symbolic rules; application of DM tools in diagnosis and prognosis; and…

  12. Army Needs to Identify Government Purchase Card High-Risk Transactions

    DTIC Science & Technology

    2012-01-20

    Purchase Card Program Data Mining Process Needs Improvement 11...Mining Process Needs Improvement The 17 transactions that were noncompliant occurred because cardholders ignored the GPC business rules so the...Scope and Methodology 16 Use of Computer- Processed Data 16 Use of Technical Assistance 17 Prior Coverage

  13. Rule Extraction Based on Extreme Learning Machine and an Improved Ant-Miner Algorithm for Transient Stability Assessment.

    PubMed

    Li, Yang; Li, Guoqing; Wang, Zhenhao

    2015-01-01

    In order to overcome the problems of poor understandability of the pattern recognition-based transient stability assessment (PRTSA) methods, a new rule extraction method based on extreme learning machine (ELM) and an improved Ant-miner (IAM) algorithm is presented in this paper. First, the basic principles of ELM and Ant-miner algorithm are respectively introduced. Then, based on the selected optimal feature subset, an example sample set is generated by the trained ELM-based PRTSA model. And finally, a set of classification rules are obtained by IAM algorithm to replace the original ELM network. The novelty of this proposal is that transient stability rules are extracted from an example sample set generated by the trained ELM-based transient stability assessment model by using IAM algorithm. The effectiveness of the proposed method is shown by the application results on the New England 39-bus power system and a practical power system--the southern power system of Hebei province.

  14. Sex-specific performance of pre-imaging diagnostic algorithms for pulmonary embolism.

    PubMed

    van Mens, T E; van der Pol, L M; van Es, N; Bistervels, I M; Mairuhu, A T A; van der Hulle, T; Klok, F A; Huisman, M V; Middeldorp, S

    2018-05-01

    Essentials: Decision rules for pulmonary embolism are used indiscriminately despite possible sex differences. Various pre-imaging diagnostic algorithms have been investigated in several prospective studies. When analysed at an individual patient data level the algorithms perform similarly in both sexes. Estrogen use and male sex were associated with a higher prevalence in suspected pulmonary embolism. Background: In patients suspected of pulmonary embolism (PE), clinical decision rules are combined with D-dimer testing to rule out PE, avoiding the need for imaging in those at low risk. Despite sex differences in several aspects of the disease, including its diagnosis, these algorithms are used indiscriminately in women and men. Objectives: To compare the performance, defined as efficiency and failure rate, of three pre-imaging diagnostic algorithms for PE between women and men: the Wells rule with fixed or with age-adjusted D-dimer cut-off, and a recently validated algorithm (YEARS). A secondary aim was to determine the sex-specific prevalence of PE. Methods: Individual patient data were obtained from six studies using the Wells rule (fixed D-dimer, n = 5; age adjusted, n = 1) and from one study using the YEARS algorithm. All studies prospectively enrolled consecutive patients with suspected PE. Main outcomes were efficiency (proportion of patients in which the algorithm ruled out PE without imaging) and failure rate (proportion of patients with PE not detected by the algorithm). Outcomes were estimated using (multilevel) logistic regression models. Results: The main outcomes showed no sex differences in any of the separate algorithms. With all three, the prevalence of PE was lower in women (OR, 0.66, 0.68 and 0.74). In women, estrogen use, adjusted for age, was associated with lower efficiency and higher prevalence and D-dimer levels. Conclusions: The investigated pre-imaging diagnostic algorithms for patients suspected of PE show no sex differences in performance. Male sex and estrogen use are both associated with a higher probability of having the disease. © 2018 International Society on Thrombosis and Haemostasis.

  15. Economic evaluation of the one-hour rule-out and rule-in algorithm for acute myocardial infarction using the high-sensitivity cardiac troponin T assay in the emergency department.

    PubMed

    Ambavane, Apoorva; Lindahl, Bertil; Giannitsis, Evangelos; Roiz, Julie; Mendivil, Joan; Frankenstein, Lutz; Body, Richard; Christ, Michael; Bingisser, Roland; Alquezar, Aitor; Mueller, Christian

    2017-01-01

    The 1-hour (h) algorithm triages patients presenting with suspected acute myocardial infarction (AMI) to the emergency department (ED) towards "rule-out," "rule-in," or "observation," depending on baseline and 1-h levels of high-sensitivity cardiac troponin (hs-cTn). The economic consequences of applying the accelerated 1-h algorithm are unknown. We performed a post-hoc economic analysis in a large, diagnostic, multicenter study of hs-cTnT using central adjudication of the final diagnosis by two independent cardiologists. Length of stay (LoS), resource utilization (RU), and predicted diagnostic accuracy of the 1-h algorithm compared to standard of care (SoC) in the ED were estimated. The ED LoS, RU, and accuracy of the 1-h algorithm were compared to those achieved by the SoC at ED discharge. Expert opinion was sought to characterize clinical implementation of the 1-h algorithm, which required blood draws at ED presentation and 1 h, after which "rule-in" patients were transferred for coronary angiography, "rule-out" patients underwent outpatient stress testing, and "observation" patients received SoC. Unit costs were for the United Kingdom, Switzerland, and Germany. The sensitivity and specificity for the 1-h algorithm were 87% and 96%, respectively, compared to 69% and 98% for SoC. The mean ED LoS for the 1-h algorithm was 4.3 h, compared with 6.5 h for SoC, a reduction of 33%. The 1-h algorithm was associated with reductions in RU, driven largely by the shorter LoS in the ED for patients with a diagnosis other than AMI. The estimated total costs per patient were £2,480 for the 1-h algorithm compared to £4,561 for SoC, a reduction of up to 46%. The analysis shows that the use of the 1-h algorithm is associated with a reduction in overall AMI diagnostic costs, provided it is carefully implemented in clinical practice. These results need to be prospectively validated in the future.
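
    The percentage reductions quoted in the abstract above are relative reductions against the standard-of-care baseline. As a minimal sketch (the helper name relative_reduction is our own and does not come from the study), the arithmetic can be rechecked from the reported means. Because those means are rounded, the recomputed figures only approximate the published 33% and 46%.

```python
def relative_reduction(new_value, baseline):
    """Fractional reduction of new_value relative to baseline."""
    return (baseline - new_value) / baseline


# Figures quoted in the abstract: mean ED length of stay (hours) and cost per patient (GBP).
los_reduction = relative_reduction(4.3, 6.5)
cost_reduction = relative_reduction(2480, 4561)

print(f"Length-of-stay reduction: {los_reduction:.1%}")
print(f"Cost reduction: {cost_reduction:.1%}")
```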

  16. Comparative performance between compressed and uncompressed airborne imagery

    NASA Astrophysics Data System (ADS)

    Phan, Chung; Rupp, Ronald; Agarwal, Sanjeev; Trang, Anh; Nair, Sumesh

    2008-04-01

    The US Army's RDECOM CERDEC Night Vision and Electronic Sensors Directorate (NVESD), Countermine Division is evaluating the compressibility of airborne multi-spectral imagery for mine and minefield detection applications. Of particular interest is the highest image data compression rate that can be afforded without loss of image quality for war fighters in the loop and of performance of the near-real-time mine detection algorithm. The JPEG-2000 compression standard is used to perform data compression. Both lossless and lossy compression are considered. A multi-spectral anomaly detector such as RX (Reed & Xiaoli), which is widely used as a core algorithm baseline in airborne mine and minefield detection on different mine types, minefields, and terrains to identify potential individual targets, is used to compare the mine detection performance. This paper presents the compression scheme and compares detection performance results between compressed and uncompressed imagery for various levels of compression. The compression efficiency is evaluated, and its dependence upon different backgrounds and other factors is documented and presented using multi-spectral data.

  17. Detecting Malicious Tweets in Twitter Using Runtime Monitoring With Hidden Information

    DTIC Science & Technology

    2016-06-01

    text mining using Twitter streaming API and python [Online]. Available: http://adilmoujahid.com/posts/2014/07/twitter-analytics/ [22] M. Singh, B...sites with 645,750,000 registered users [3] and has open source public tweets for data mining. 2. Malicious Users and Tweets In the modern world...want to data mine in Twitter, and presents the natural language assertions and corresponding rule patterns. It then describes the steps performed using

  18. Bayesian design of decision rules for failure detection

    NASA Technical Reports Server (NTRS)

    Chow, E. Y.; Willsky, A. S.

    1984-01-01

    The formulation of the decision making process of a failure detection algorithm as a Bayes sequential decision problem provides a simple conceptualization of the decision rule design problem. As the optimal Bayes rule is not computable, a methodology that is based on the Bayesian approach and aimed at a reduced computational requirement is developed for designing suboptimal rules. A numerical algorithm is constructed to facilitate the design and performance evaluation of these suboptimal rules. The result of applying this design methodology to an example shows that this approach is potentially a useful one.

  19. Algorithm Diversity for Resilient Systems

    DTIC Science & Technology

    2016-06-27

    data structures. 15. SUBJECT TERMS computer security, software diversity, program transformation 16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF 18...systematic method for transforming Datalog rules with general universal and existential quantification into efficient algorithms with precise complexity...worst case in the size of the ground rules. There are numerous choices during the transformation that lead to diverse algorithms and different

  20. 75 FR 73995 - Lowering Miners' Exposure to Respirable Coal Mine Dust, Including Continuous Personal Dust Monitors

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-11-30

    ... http://www.msha.gov/REGS/FEDREG/PROPOSED/2010PROP/2010-25249.pdf . The proposed rule would revise the.../PROPOSED/2010PROP/2010-25249.pdf . The following error in the preamble to the proposed rule is corrected to...

  1. Integrated mined-area reclamation and land-use planning. Volume 3C. A case study of surface mining and reclamation planning: Georgia Kaolin Company Clay Mines, Washington County, Georgia

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Guernsey, J L; Brown, L A; Perry, A O

    1978-02-01

    This case study examines the reclamation practices of the Georgia Kaolin's American Industrial Clay Company Division, a kaolin producer centered in Twiggs, Washington, and Wilkinson Counties, Georgia. The State of Georgia accounts for more than one-fourth of the world's kaolin production and about three-fourths of U.S. kaolin output. The mining of kaolin in Georgia illustrates the effects of mining and reclaiming lands disturbed by area surface mining. The disturbed areas are reclaimed under the rules and regulations of the Georgia Surface Mining Act of 1968. The natural conditions influencing the reclamation methodologies and techniques are markedly unique from those of other mining operations. The environmental disturbances and procedures used in reclaiming the kaolin mined lands are reviewed and implications for planners are noted.

  2. 78 FR 5055 - Pattern of Violations

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-01-23

    ...The Mine Safety and Health Administration (MSHA) is revising the Agency's existing regulation for pattern of violations (POV). MSHA has determined that the existing regulation does not adequately achieve the intent of the Federal Mine Safety and Health Act of 1977 (Mine Act) that the POV provision be used to address mine operators who have demonstrated a disregard for the health and safety of miners. Congress included the POV provision in the Mine Act so that mine operators would manage health and safety conditions at mines and find and fix the root causes of significant and substantial (S&S) violations, protecting the health and safety of miners. The final rule simplifies the existing POV criteria, improves consistency in applying the POV criteria, and more effectively achieves the Mine Act's statutory intent. It also encourages chronic safety violators to comply with the Mine Act and MSHA's health and safety standards.

  3. Data mining: sophisticated forms of managed care modeling through artificial intelligence.

    PubMed

    Borok, L S

    1997-01-01

    Data mining is a recent development in computer science that combines artificial intelligence algorithms and relational databases to discover patterns automatically, without the use of traditional statistical methods. Work with data mining tools in health care is in a developmental stage that holds great promise, given the combination of demographic and diagnostic information.

  4. Computer-aided visual assessment in mine planning and design

    Treesearch

    Michael Hatfield; A. J. LeRoy Balzer; Roger E. Nelson

    1979-01-01

    A computer modeling technique is described for evaluating the visual impact of a proposed surface mine located within the viewshed of a national park. A computer algorithm analyzes digitized USGS baseline topography and identifies areas subject to surface disturbance visible from the park. Preliminary mine and reclamation plan information is used to describe how the...

  5. An algorithm for rule-in and rule-out of acute myocardial infarction using a novel troponin I assay.

    PubMed

    Lindahl, Bertil; Jernberg, Tomas; Badertscher, Patrick; Boeddinghaus, Jasper; Eggers, Kai M; Frick, Mats; Rubini Gimenez, Maria; Linder, Rickard; Ljung, Lina; Martinsson, Arne; Melki, Dina; Nestelberger, Thomas; Rentsch, Katharina; Reichlin, Tobias; Sabti, Zaid; Schubera, Marie; Svensson, Per; Twerenbold, Raphael; Wildi, Karin; Mueller, Christian

    2017-01-15

    To derive and validate a hybrid algorithm for rule-out and rule-in of acute myocardial infarction based on measurements at presentation and after 2 hours with a novel cardiac troponin I (cTnI) assay. The algorithm was derived and validated in two cohorts (605 and 592 patients) from multicentre studies enrolling chest pain patients presenting to the emergency department (ED) with onset of last episode within 12 hours. The index diagnosis and cardiovascular events up to 30 days were adjudicated by independent reviewers. In the validation cohort, 32.6% of the patients were ruled out on ED presentation, 6.1% were ruled in and 61.3% remained undetermined. A further 22% could be ruled out and 9.8% ruled in, after 2 hours. In total, 54.6% of the patients were ruled out with a negative predictive value (NPV) of 99.4% (95% CI 97.8% to 99.9%) and a sensitivity of 97.7% (95% CI 91.9% to 99.7%); 15.8% were ruled in with a positive predictive value (PPV) of 74.5% (95% CI 64.8% to 82.2%) and a specificity of 95.2% (95% CI 93.0% to 96.9%); and 29.6% remained undetermined after 2 hours. No patient in the rule-out group died during the 30-day follow-up in the two cohorts. This novel two-step algorithm based on cTnI measurements enabled just over a third of the patients with acute chest pain to be ruled in or ruled out already at presentation and an additional third after 2 hours. This strategy maximises the speed of rule-out and rule-in while maintaining a high NPV and PPV, respectively. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/.
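
    The abstract above reports sensitivity, specificity, negative predictive value (NPV) and positive predictive value (PPV). As a minimal sketch of how such measures follow from a 2x2 table, the function below uses the standard definitions. The counts are hypothetical and are not taken from the study, and the sketch treats the test as a simple positive/negative classifier, whereas the published algorithm also has an "undetermined" group.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard accuracy measures from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased patients correctly flagged
        "specificity": tn / (tn + fp),  # healthy patients correctly cleared
        "ppv": tp / (tp + fp),          # flagged patients who are diseased
        "npv": tn / (tn + fn),          # cleared patients who are healthy
    }


# Hypothetical counts, invented purely for illustration.
metrics = diagnostic_metrics(tp=90, fp=30, fn=10, tn=870)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

    A high NPV is what makes a rule-out pathway safe, while PPV governs how many rule-in patients are sent on for further work-up unnecessarily.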

  6. Rules of meridians and acupoints selection in treatment of Parkinson's disease based on data mining techniques.

    PubMed

    Li, Zhe; Hu, Ying-Yu; Zheng, Chun-Ye; Su, Qiao-Zhen; An, Chang; Luo, Xiao-Dong; Liu, Mao-Cai

    2018-01-15

    To help select appropriate meridians and acupoints in clinical practice and experimental study for Parkinson's disease (PD), the rules of meridians and acupoints selection of acupuncture and moxibustion were analyzed in domestic and foreign clinical treatment for PD based on data mining techniques. Literature about PD treated by acupuncture and moxibustion in China and abroad was searched and selected from China National Knowledge Infrastructure and MEDLINE. Then the data from all eligible articles were extracted to establish the database of acupuncture-moxibustion for PD. The association rules of data mining techniques were used to analyze the rules of meridians and acupoints selection. In total, 168 eligible articles were included and 184 acupoints were applied. The total frequency of acupoints application was 1,090 times. Those acupoints were mainly distributed in the head and neck and the extremities. Among all, Taichong (LR 3), Baihui (DU 20), Fengchi (GB 20), Hegu (LI 4) and Chorea-tremor Controlled Zone were the top five acupoints that had been used. Superior-inferior acupoints matching was utilized the most. As to involved meridians, Du Meridian, Dan (Gallbladder) Meridian, Dachang (Large Intestine) Meridian, and Gan (Liver) Meridian were the most popular meridians. The application of meridians and acupoints for PD treatment lays emphasis on the acupoints on the head, attaches importance to extinguishing Gan wind, tonifying qi and blood, and nourishing sinews, and makes good use of superior-inferior acupoints matching.

  7. Association Rule Based Feature Extraction for Character Recognition

    NASA Astrophysics Data System (ADS)

    Dua, Sumeet; Singh, Harpreet

    Association rules that represent isomorphisms among data have gained importance in exploratory data analysis because they can find inherent, implicit, and interesting relationships among data. They are also commonly used in data mining to extract the conditions among attribute values that occur together frequently in a dataset [1]. These rules have a wide range of applications, namely in the financial and retail sectors of marketing, sales, and medicine.
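
    The idea of attribute values that "occur together frequently" is usually quantified with support and confidence. The sketch below is a brute-force illustration of those two measures, not the method of the paper above. The transactions, thresholds and function names are hypothetical, and only single-item rules are enumerated.

```python
from itertools import permutations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) over the transactions."""
    base = support(transactions, antecedent)
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base if base else 0.0


def mine_pair_rules(transactions, min_support, min_confidence):
    """Enumerate single-item rules a -> b that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        s = support(transactions, {a, b})
        c = confidence(transactions, {a}, {b})
        if s >= min_support and c >= min_confidence:
            rules.append((a, b, s, c))
    return rules


# Hypothetical toy transactions, invented for illustration only.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

for a, b, s, c in mine_pair_rules(transactions, min_support=0.5, min_confidence=0.9):
    print(f"{a} -> {b} (support={s:.2f}, confidence={c:.2f})")
```

    Enumerating every candidate like this scales poorly with the number of items, which is the cost that algorithms such as Apriori and FP-Growth are designed to reduce.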

  8. Simulation-Based Rule Generation Considering Readability

    PubMed Central

    Yahagi, H.; Shimizu, S.; Ogata, T.; Hara, T.; Ota, J.

    2015-01-01

    A rule generation method is proposed for an aircraft control problem in an airport. Designing appropriate rules for motion coordination of taxiing aircraft in the airport, which is conducted by ground control, is important. However, previous studies did not consider the readability of rules, which matters because the rules must be operated and maintained by humans. Therefore, in this study, using an indicator of readability, we propose a method of rule generation based on parallel algorithm discovery and orchestration (PADO). By applying our proposed method to the aircraft control problem, the proposed algorithm can generate more readable and more robust rules and is found to be superior to previous methods. PMID:27347501

  9. Knowledge discovery from patients' behavior via clustering-classification algorithms based on weighted eRFM and CLV model: An empirical study in public health care services.

    PubMed

    Zare Hosseini, Zeinab; Mohammadzadeh, Mahdi

    2016-01-01

    The rapid growth of information technology (IT) motivates and creates competitive advantages in the health care industry. Nowadays, many hospitals try to build a successful customer relationship management (CRM) to recognize target and potential patients, increase patient loyalty and satisfaction and finally maximize their profitability. Many hospitals have large data warehouses containing customer demographic and transaction information. Data mining techniques can be used to analyze this data and discover hidden knowledge of customers. This research develops an extended RFM model, namely RFML (added parameter: Length) based on health care services for a public sector hospital in Iran with the idea that there is contrast between patient and customer loyalty, to estimate customer life time value (CLV) for each patient. We used Two-step and K-means algorithms as clustering methods and Decision tree (CHAID) as classification technique to segment the patients to find out target, potential and loyal customers in order to implement a strengthened CRM. Two approaches are used for classification: first, the result of clustering is considered as Decision attribute in classification process and second, the result of segmentation based on CLV value of patients (estimated by RFML) is considered as Decision attribute. Finally the results of CHAID algorithm show the significant hidden rules and identify existing patterns of hospital consumers.
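
    The clustering step described above can be pictured with a small K-means sketch. This is a plain Lloyd-style implementation written for illustration and not the authors' pipeline. The patient vectors, their scaling and the choice of k = 2 are assumptions, with each tuple standing in for hypothetical (recency, frequency, monetary, length) scores.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd-style K-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to its cluster mean; keep it if the cluster is empty.
        centroids = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical, already-scaled (recency, frequency, monetary, length) vectors.
patients = [(1, 1, 1, 1), (2, 2, 2, 2), (10, 10, 10, 10), (11, 11, 11, 11)]
centroids, labels = kmeans(patients, k=2)
print(labels)
```

    In a workflow like the one the abstract describes, the resulting cluster labels could then serve as the decision attribute for a downstream classifier.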

  10. Knowledge discovery from patients’ behavior via clustering-classification algorithms based on weighted eRFM and CLV model: An empirical study in public health care services

    PubMed Central

    Zare Hosseini, Zeinab; Mohammadzadeh, Mahdi

    2016-01-01

    The rapid growth of information technology (IT) motivates and creates competitive advantages in the health care industry. Nowadays, many hospitals try to build a successful customer relationship management (CRM) to recognize target and potential patients, increase patient loyalty and satisfaction and finally maximize their profitability. Many hospitals have large data warehouses containing customer demographic and transaction information. Data mining techniques can be used to analyze this data and discover hidden knowledge of customers. This research develops an extended RFM model, namely RFML (added parameter: Length) based on health care services for a public sector hospital in Iran with the idea that there is contrast between patient and customer loyalty, to estimate customer life time value (CLV) for each patient. We used Two-step and K-means algorithms as clustering methods and Decision tree (CHAID) as classification technique to segment the patients to find out target, potential and loyal customers in order to implement a strengthened CRM. Two approaches are used for classification: first, the result of clustering is considered as Decision attribute in classification process and second, the result of segmentation based on CLV value of patients (estimated by RFML) is considered as Decision attribute. Finally the results of CHAID algorithm show the significant hidden rules and identify existing patterns of hospital consumers. PMID:27610177

  11. Mind your crossings: Mining GIS imagery for crosswalk localization.

    PubMed

    Ahmetovic, Dragan; Manduchi, Roberto; Coughlan, James M; Mascetti, Sergio

    2017-04-01

    For blind travelers, finding crosswalks and remaining within their borders while traversing them is a crucial part of any trip involving street crossings. While standard Orientation & Mobility (O&M) techniques allow blind travelers to safely negotiate street crossings, additional information about crosswalks and other important features at intersections would be helpful in many situations, resulting in greater safety and/or comfort during independent travel. For instance, in planning a trip a blind pedestrian may wish to be informed of the presence of all marked crossings near a desired route. We have conducted a survey of several O&M experts from the United States and Italy to determine the role that crosswalks play in travel by blind pedestrians. The results show stark differences between survey respondents from the U.S. compared with Italy: the former group emphasized the importance of following standard O&M techniques at all legal crossings (marked or unmarked), while the latter group strongly recommended crossing at marked crossings whenever possible. These contrasting opinions reflect differences in the traffic regulations of the two countries and highlight the diversity of needs that travelers in different regions may have. To address the challenges faced by blind pedestrians in negotiating street crossings, we devised a computer vision-based technique that mines existing spatial image databases for discovery of zebra crosswalks in urban settings. Our algorithm first searches for zebra crosswalks in satellite images; all candidates thus found are validated against spatially registered Google Street View images. This cascaded approach enables fast and reliable discovery and localization of zebra crosswalks in large image datasets. While fully automatic, our algorithm can be improved by a final crowdsourcing validation. 
    To this end, we developed a Pedestrian Crossing Human Validation (PCHV) web service, which supports crowdsourcing to rule out false positives and identify false negatives.

  12. Mind your crossings: Mining GIS imagery for crosswalk localization

    PubMed Central

    Ahmetovic, Dragan; Manduchi, Roberto; Coughlan, James M.; Mascetti, Sergio

    2017-01-01

    For blind travelers, finding crosswalks and remaining within their borders while traversing them is a crucial part of any trip involving street crossings. While standard Orientation & Mobility (O&M) techniques allow blind travelers to safely negotiate street crossings, additional information about crosswalks and other important features at intersections would be helpful in many situations, resulting in greater safety and/or comfort during independent travel. For instance, in planning a trip a blind pedestrian may wish to be informed of the presence of all marked crossings near a desired route. We have conducted a survey of several O&M experts from the United States and Italy to determine the role that crosswalks play in travel by blind pedestrians. The results show stark differences between survey respondents from the U.S. compared with Italy: the former group emphasized the importance of following standard O&M techniques at all legal crossings (marked or unmarked), while the latter group strongly recommended crossing at marked crossings whenever possible. These contrasting opinions reflect differences in the traffic regulations of the two countries and highlight the diversity of needs that travelers in different regions may have. To address the challenges faced by blind pedestrians in negotiating street crossings, we devised a computer vision-based technique that mines existing spatial image databases for discovery of zebra crosswalks in urban settings. Our algorithm first searches for zebra crosswalks in satellite images; all candidates thus found are validated against spatially registered Google Street View images. This cascaded approach enables fast and reliable discovery and localization of zebra crosswalks in large image datasets. While fully automatic, our algorithm can be improved by a final crowdsourcing validation. 
    To this end, we developed a Pedestrian Crossing Human Validation (PCHV) web service, which supports crowdsourcing to rule out false positives and identify false negatives. PMID:28757907

  13. Data mining in soft computing framework: a survey.

    PubMed

    Mitra, S; Pal, S K; Mitra, P

    2002-01-01

    The present article provides a survey of the available literature on data mining using soft computing. A categorization has been provided based on the different soft computing tools and their hybridizations used, the data mining function implemented, and the preference criterion selected by the model. The utility of the different soft computing methodologies is highlighted. Generally fuzzy sets are suitable for handling the issues related to understandability of patterns, incomplete/noisy data, mixed media information and human interaction, and can provide approximate solutions faster. Neural networks are nonparametric, robust, and exhibit good learning and generalization capabilities in data-rich environments. Genetic algorithms provide efficient search algorithms to select a model, from mixed media data, based on some preference criterion/objective function. Rough sets are suitable for handling different types of uncertainty in data. Some challenges to data mining and the application of soft computing methodologies are indicated. An extensive bibliography is also included.

  14. Application of data mining in science and technology management information system based on WebGIS

    NASA Astrophysics Data System (ADS)

    Wu, Xiaofang; Xu, Zhiyong; Bao, Shitai; Chen, Feixiang

    2009-10-01

    With the rapid development of science and technology and the quick increase of information, a great deal of data has accumulated in science and technology management departments. Usually, much knowledge and many rules are contained and concealed in the data. Therefore, how to excavate and fully use this knowledge is very important in the management of science and technology. It will help to examine and approve science and technology projects more scientifically and make it easier to transform achievements into realistic productive forces. Therefore, in this paper data mining technology is researched and applied to the science and technology management information system to find and excavate the knowledge. After analyzing the disadvantages of the traditional science and technology management information system, database technology, data mining and web geographic information systems (WebGIS) technology are introduced to develop and construct the science and technology management information system based on WebGIS. Key problems such as data mining and statistical analysis are researched in detail. What's more, the prototype system is developed and validated based on the project data of the National Natural Science Foundation Committee. Spatial data mining is done along the axes of time, space and other factors. A variety of knowledge and rules is then excavated by using data mining technology, which helps to provide effective support for decision-making.

  15. Non-Convex Sparse and Low-Rank Based Robust Subspace Segmentation for Data Mining.

    PubMed

    Cheng, Wenlong; Zhao, Mingbo; Xiong, Naixue; Chui, Kwok Tai

    2017-07-15

    Parsimony, including sparsity and low-rank, has shown great importance for data mining in social networks, particularly in tasks such as segmentation and recognition. Traditionally, such modeling approaches rely on an iterative algorithm that minimizes an objective function with convex ℓ₁-norm or nuclear norm constraints. However, the obtained results by convex optimization are usually suboptimal to solutions of original sparse or low-rank problems. In this paper, a novel robust subspace segmentation algorithm has been proposed by integrating ℓp-norm and Schatten p-norm constraints. Our so-obtained affinity graph can better capture local geometrical structure and the global information of the data. As a consequence, our algorithm is more generative, discriminative and robust. An efficient linearized alternating direction method is derived to realize our model. Extensive segmentation experiments are conducted on public datasets. The proposed algorithm is revealed to be more effective and robust compared to five existing algorithms.

  16. 76 FR 6110 - Mine Safety Disclosure

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-02-03

    ... Comments Use the Commission's Internet comment form ( http://www.sec.gov/rules/proposed.shtml ); Send an e... all comments on the Commission's Internet Web site ( http://www.sec.gov/rules/proposed.shtml... on the proposal to, among other things, allow for the collection of information and improve the...

  17. 30 CFR 937.700 - Oregon Federal program.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 30 Mineral Resources 3 2012-07-01 2012-07-01 false Oregon Federal program. 937.700 Section 937.700... PROGRAMS FOR THE CONDUCT OF SURFACE MINING OPERATIONS WITHIN EACH STATE OREGON § 937.700 Oregon Federal program. (a) This part contains all rules that are applicable to surface coal mining operations in Oregon...

  18. 30 CFR 937.700 - Oregon Federal program.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 30 Mineral Resources 3 2011-07-01 2011-07-01 false Oregon Federal program. 937.700 Section 937.700... PROGRAMS FOR THE CONDUCT OF SURFACE MINING OPERATIONS WITHIN EACH STATE OREGON § 937.700 Oregon Federal program. (a) This part contains all rules that are applicable to surface coal mining operations in Oregon...

  19. 30 CFR 937.700 - Oregon Federal program.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 30 Mineral Resources 3 2014-07-01 2014-07-01 false Oregon Federal program. 937.700 Section 937.700... PROGRAMS FOR THE CONDUCT OF SURFACE MINING OPERATIONS WITHIN EACH STATE OREGON § 937.700 Oregon Federal program. (a) This part contains all rules that are applicable to surface coal mining operations in Oregon...

  20. 76 FR 12648 - Lowering Miners' Exposure to Respirable Coal Mine Dust, Including Continuous Personal Dust Monitors

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-03-08

    ... be appropriate to use on a short-term basis. 13. The proposed rule addresses (1) which occupations... for respirable coal mine dust, provide for full- shift sampling, redefine the term ``normal production... respect to their availability. If shorter or longer timeframes are recommended, please provide the...
