Sample records for mining frequent patterns

  1. Handling Dynamic Weights in Weighted Frequent Pattern Mining

    NASA Astrophysics Data System (ADS)

    Ahmed, Chowdhury Farhan; Tanbeer, Syed Khairuzzaman; Jeong, Byeong-Soo; Lee, Young-Koo

    Even though weighted frequent pattern (WFP) mining is more effective than traditional frequent pattern mining because it can consider different semantic significances (weights) of items, existing WFP algorithms assume that each item has a fixed weight. But in real world scenarios, the weight (price or significance) of an item can vary with time. Reflecting these changes in item weight is necessary in several mining applications, such as retail market data analysis and web click stream analysis. In this paper, we introduce the concept of a dynamic weight for each item, and propose an algorithm, DWFPM (dynamic weighted frequent pattern mining), that makes use of this concept. Our algorithm can address situations where the weight (price or significance) of an item varies dynamically. It exploits a pattern growth mining technique to avoid the level-wise candidate set generation-and-test methodology. Furthermore, it requires only one database scan, so it is eligible for use in stream data mining. An extensive performance analysis shows that our algorithm is efficient and scalable for WFP mining using dynamic weights.
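
    As a rough illustration of the dynamic-weight idea, the sketch below sums, over successive batches of transactions, a pattern's frequency in the batch multiplied by its weight in that batch. The batch layout, the use of the average item weight as the pattern weight, and the name `dynamic_weighted_support` are assumptions made for illustration; the abstract does not give DWFPM's exact definitions.

```python
from typing import Dict, FrozenSet, List

Batch = List[FrozenSet[str]]


def dynamic_weighted_support(
    pattern: FrozenSet[str],
    batches: List[Batch],
    batch_weights: List[Dict[str, float]],
) -> float:
    """Sum of (pattern weight in batch) * (pattern frequency in batch).

    Assumption: the weight of a pattern in a batch is the average of the
    weights its items carry in that batch.
    """
    total = 0.0
    for transactions, weights in zip(batches, batch_weights):
        # Frequency: number of transactions in this batch containing the pattern.
        freq = sum(1 for t in transactions if pattern <= t)
        if freq == 0:
            continue
        pattern_weight = sum(weights[i] for i in pattern) / len(pattern)
        total += pattern_weight * freq
    return total


# Hypothetical data: item "a" becomes more significant in the second batch.
batches = [
    [frozenset({"a", "b"}), frozenset({"a"})],
    [frozenset({"a", "b"}), frozenset({"b"})],
]
batch_weights = [{"a": 0.5, "b": 0.3}, {"a": 0.8, "b": 0.3}]
```

    With a fixed weight, the two batches would contribute to a pattern's score in the same way; here the second batch counts more for patterns containing "a" because its weight rose.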

  2. Research on parallel algorithm for sequential pattern mining

    NASA Astrophysics Data System (ADS)

    Zhou, Lijuan; Qin, Bai; Wang, Yu; Hao, Zhongxiao

    2008-03-01

    Sequential pattern mining is the mining of frequent sequences related to time or other orders from a sequence database. Its initial motivation was to discover the regularities of customer purchasing over a period of time by finding the frequent sequences. In recent years, sequential pattern mining has become an important direction of data mining, and its application field is no longer confined to business databases but has extended to new data sources such as the Web and to advanced science fields such as DNA analysis. The data used in sequential pattern mining have two characteristics: massive volume and distributed storage. Most existing sequential pattern mining algorithms have not considered these characteristics together. Based on these traits and on parallel computing theory, this paper puts forward a new distributed parallel algorithm, SPP (Sequential Pattern Parallel). The algorithm abides by the principle of pattern reduction and uses a divide-and-conquer strategy for parallelization. The first parallel task is to construct frequent item sets by applying the frequent concept and search space partition theory, and the second task is to construct frequent sequences using depth-first search at each processor. The algorithm needs to access the database only twice and does not generate candidate sequences, which reduces the access time and improves the mining efficiency. Using a random data generation procedure and the different information structures designed, this paper simulated the SPP algorithm in a concrete parallel environment and also implemented the AprioriAll algorithm. The experiments demonstrate that, compared with AprioriAll, the SPP algorithm had an excellent speedup factor and efficiency.

  3. Association mining of dependency between time series

    NASA Astrophysics Data System (ADS)

    Hafez, Alaaeldin

    2001-03-01

    Time series analysis is considered a crucial component of strategic control over a broad variety of disciplines in business, science and engineering. Time series data is a sequence of observations collected over intervals of time. Each time series describes a phenomenon as a function of time. Analysis of time series data includes discovering trends (or patterns) in a time series sequence. In the last few years, data mining has emerged and been recognized as a new technology for data analysis. Data mining is the process of discovering potentially valuable patterns, associations, trends, sequences and dependencies in data. Data mining techniques can discover information that many traditional business analysis and statistical techniques fail to deliver. In this paper, we adapt and innovate data mining techniques to analyze time series data. By using data mining techniques, maximal frequent patterns are discovered and used in predicting future sequences or trends, where trends describe the behavior of a sequence. In order to include different types of time series (e.g. irregular and non-systematic), we consider past frequent patterns of the same time sequences (local patterns) and of other dependent time sequences (global patterns). We use the word 'dependent' instead of the word 'similar' to emphasize real-life time series where two time series sequences could be completely different (in values, shapes, etc.), but still react to the same conditions in a dependent way. In this paper, we propose the Dependence Mining Technique, which can be used in predicting time series sequences. The proposed technique consists of three phases: (a) for all time series sequences, generate their trend sequences; (b) discover maximal frequent trend patterns and generate pattern vectors (to keep information of frequent trend patterns); and (c) use trend pattern vectors to predict future time series sequences.
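
    The first phase, turning a time series into a trend sequence, can be pictured with the minimal sketch below. The three-symbol alphabet (U for up, D for down, S for steady) and the `tolerance` parameter are assumptions chosen for illustration; the abstract does not specify how trends are encoded.

```python
from typing import Sequence


def trend_sequence(values: Sequence[float], tolerance: float = 0.0) -> str:
    """Encode each step between consecutive observations as U, D or S."""
    symbols = []
    for prev, curr in zip(values, values[1:]):
        diff = curr - prev
        if diff > tolerance:
            symbols.append("U")  # value rose
        elif diff < -tolerance:
            symbols.append("D")  # value fell
        else:
            symbols.append("S")  # value stayed within the tolerance band
    return "".join(symbols)


# Two hypothetical series with very different values but the same behavior.
series_a = [1, 2, 2, 1, 3]
series_b = [10, 50, 50, 20, 90]
```

    Both series map to the same trend string, which is one way to read the abstract's point that 'dependent' series may differ in values and shape yet react to the same conditions.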

  4. A primer to frequent itemset mining for bioinformatics

    PubMed Central

    Naulaerts, Stefan; Meysman, Pieter; Bittremieux, Wout; Vu, Trung Nghia; Vanden Berghe, Wim; Goethals, Bart

    2015-01-01

    Over the past two decades, pattern mining techniques have become an integral part of many bioinformatics solutions. Frequent itemset mining is a popular group of pattern mining techniques designed to identify elements that frequently co-occur. An archetypical example is the identification of products that often end up together in the same shopping basket in supermarket transactions. A number of algorithms have been developed to address variations of this computationally non-trivial problem. Frequent itemset mining techniques are able to efficiently capture the characteristics of (complex) data and succinctly summarize it. Owing to these and other interesting properties, these techniques have proven their value in biological data analysis. Nevertheless, information about the bioinformatics applications of these techniques remains scattered. In this primer, we introduce frequent itemset mining and its derived association rules for life scientists. We give an overview of various algorithms, and illustrate how they can be used in several real-life bioinformatics application domains. We end with a discussion of the future potential and open challenges for frequent itemset mining in the life sciences. PMID:24162173
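
    The shopping-basket example can be made concrete with a small level-wise (Apriori-style) sketch. It is a minimal illustration under the assumption of an absolute support threshold, not any specific algorithm surveyed in the primer; the function name and data are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set


def frequent_itemsets(
    transactions: Iterable[Set[str]], min_support: int
) -> Dict[FrozenSet[str], int]:
    """Return every itemset contained in at least min_support transactions."""
    baskets = [frozenset(t) for t in transactions]

    def support(itemset: FrozenSet[str]) -> int:
        return sum(1 for b in baskets if itemset <= b)

    items = {i for b in baskets for i in b}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    result: Dict[FrozenSet[str], int] = {}
    k = 1
    while level:
        for itemset in level:
            result[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        level = {
            c
            for c in candidates
            if all(frozenset(s) in result for s in combinations(c, k - 1))
            and support(c) >= min_support
        }
    return result


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
found = frequent_itemsets(baskets, min_support=2)
```

    The prune step relies on the fact that a superset can never be more frequent than any of its subsets, which is what keeps the search tractable on larger data.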

  5. Combined mining: discovering informative knowledge in complex data.

    PubMed

    Cao, Longbing; Zhang, Huaifeng; Zhao, Yanchang; Luo, Dan; Zhang, Chengqi

    2011-06-01

    Enterprise data mining applications often involve complex data such as multiple large heterogeneous data sources, user preferences, and business impact. In such situations, a single method or one-step mining is often limited in discovering informative knowledge. It would also be very time and space consuming, if not impossible, to join relevant large data sources for mining patterns consisting of multiple aspects of information. It is crucial to develop effective approaches for mining patterns combining necessary information from multiple relevant business lines, catering for real business settings and decision-making actions rather than just providing a single line of patterns. The recent years have seen increasing efforts on mining more informative patterns, e.g., integrating frequent pattern mining with classifications to generate frequent pattern-based classifiers. Rather than presenting a specific algorithm, this paper builds on our existing works and proposes combined mining as a general approach to mining for informative patterns combining components from either multiple data sets or multiple features or by multiple methods on demand. We summarize general frameworks, paradigms, and basic processes for multifeature combined mining, multisource combined mining, and multimethod combined mining. Novel types of combined patterns, such as incremental cluster patterns, can result from such frameworks, which cannot be directly produced by the existing methods. A set of real-world case studies has been conducted to test the frameworks, with some of them briefed in this paper. They identify combined patterns for informing government debt prevention and improving government service objectives, which show the flexibility and instantiation capability of combined mining in discovering informative knowledge in complex data.

  6. A gossip based information fusion protocol for distributed frequent itemset mining

    NASA Astrophysics Data System (ADS)

    Sohrabi, Mohammad Karim

    2018-07-01

    The computational complexity, huge memory requirement, and time-consuming nature of the frequent pattern mining process are the most important motivations for distributing and parallelizing it. In addition, the emergence of distributed computational and operational environments, in which data are produced and maintained on different distributed data sources, makes the parallelization and distribution of the knowledge discovery process inevitable. In this paper, a gossip based distributed itemset mining (GDIM) algorithm is proposed to extract frequent itemsets, which are special types of frequent patterns, in a wireless sensor network environment. In this algorithm, the local frequent itemsets of each sensor are extracted using a bit-wise horizontal approach (LHPM) from nodes that are clustered using a LEACH-based protocol. Cluster heads use a gossip based protocol to communicate with each other and find the patterns whose global support is equal to or greater than the specified support threshold. Experimental results show that the proposed algorithm outperforms the best existing gossip based algorithm in terms of execution time.

  7. Frequent Itemset Hiding Algorithm Using Frequent Pattern Tree Approach

    ERIC Educational Resources Information Center

    Alnatsheh, Rami

    2012-01-01

    A problem that has been the focus of much recent research in privacy preserving data-mining is the frequent itemset hiding (FIH) problem. Identifying itemsets that appear together frequently in customer transactions is a common task in association rule mining. Organizations that share data with business partners may consider some of the frequent…

  8. Personalized Privacy-Preserving Frequent Itemset Mining Using Randomized Response

    PubMed Central

    Sun, Chongjing; Fu, Yan; Zhou, Junlin; Gao, Hui

    2014-01-01

    Frequent itemset mining is the important first step of association rule mining, which discovers interesting patterns in massive data. There are increasing concerns about privacy in frequent itemset mining, and several methods have been proposed to address it. In this paper, we introduce a personalized privacy problem, in which different attributes may need different levels of privacy protection. To solve this problem, we give a personalized privacy-preserving method that uses the randomized response technique. By providing different privacy levels for different attributes, this method can achieve higher accuracy in frequent itemset mining than the traditional method, which provides the same privacy level for all attributes. Finally, our experimental results show that our method gives better results in frequent itemset mining while preserving personalized privacy. PMID:25143989
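
    The randomized response idea behind this method can be sketched for a single binary attribute. Each true bit is reported unchanged with probability p and flipped otherwise, so an observed fraction λ of ones relates to the true fraction π by λ = pπ + (1 − p)(1 − π), which can be inverted to estimate π. This is the classic single-attribute scheme, shown under that assumption; the paper's personalized, multi-attribute method is not reproduced here, and the function names are hypothetical.

```python
import random


def randomize(bit: int, p: float, rng: random.Random) -> int:
    """Report the true bit with probability p, the flipped bit otherwise."""
    return bit if rng.random() < p else 1 - bit


def estimate_support(observed_fraction: float, p: float) -> float:
    """Invert lambda = p*pi + (1 - p)*(1 - pi) to recover an estimate of pi."""
    if p == 0.5:
        raise ValueError("p = 0.5 destroys all information; no estimate exists")
    return (observed_fraction - (1.0 - p)) / (2.0 * p - 1.0)


# A less sensitive attribute could use a p close to 1 (little noise), while a
# more sensitive attribute could use a p closer to 0.5 (more noise).
rng = random.Random(0)
true_bits = [1] * 700 + [0] * 300
reported = [randomize(b, 0.8, rng) for b in true_bits]
estimate = estimate_support(sum(reported) / len(reported), 0.8)
```

    Choosing a separate p per attribute is one simple way to picture "different privacy levels for different attributes": the closer p is to 0.5, the stronger the protection and the noisier the estimate.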

  9. Personalized privacy-preserving frequent itemset mining using randomized response.

    PubMed

    Sun, Chongjing; Fu, Yan; Zhou, Junlin; Gao, Hui

    2014-01-01

    Frequent itemset mining is the important first step of association rule mining, which discovers interesting patterns in massive data. There are increasing concerns about privacy in frequent itemset mining, and several methods have been proposed to address it. In this paper, we introduce a personalized privacy problem, in which different attributes may need different levels of privacy protection. To solve this problem, we give a personalized privacy-preserving method that uses the randomized response technique. By providing different privacy levels for different attributes, this method can achieve higher accuracy in frequent itemset mining than the traditional method, which provides the same privacy level for all attributes. Finally, our experimental results show that our method gives better results in frequent itemset mining while preserving personalized privacy.

  10. Unravelling associations between unassigned mass spectrometry peaks with frequent itemset mining techniques.

    PubMed

    Vu, Trung Nghia; Mrzic, Aida; Valkenborg, Dirk; Maes, Evelyne; Lemière, Filip; Goethals, Bart; Laukens, Kris

    2014-01-01

    Mass spectrometry-based proteomics experiments generate spectra that are rich in information. Often only a fraction of this information is used for peptide/protein identification, whereas a significant proportion of the peaks in a spectrum remain unexplained. In this paper we explore how a specific class of data mining techniques termed "frequent itemset mining" can be employed to discover patterns in the unassigned data, and how such patterns can help us interpret the origin of the unexpected/unexplained peaks. First a model is proposed that describes the origin of the observed peaks in a mass spectrum. For this purpose we use the classical correlative database search algorithm. Peaks that support a positive identification of the spectrum are termed explained peaks. Next, frequent itemset mining techniques are introduced to infer which unexplained peaks are associated in a spectrum. The method is validated on two types of experimental proteomic data. First, peptide mass fingerprint data is analyzed to explain the unassigned peaks in a full scan mass spectrum. Interestingly, a large number of experimental spectra reveal several highly frequent unexplained masses, and pattern mining on these frequent masses demonstrates that subsets of these peaks frequently co-occur. Further evaluation shows that several of these co-occurring peaks indeed have a known common origin, and other patterns are promising hypothesis generators for further analysis. Second, the proposed methodology is validated on tandem mass spectrometry data using a public spectral library, where associations within the mass differences of unassigned peaks and peptide modifications are explored. The investigation of the found patterns illustrates that meaningful patterns can be discovered that can be explained by features of the employed technology and found modifications. This simple approach offers opportunities to monitor accumulating unexplained mass spectrometry data for emerging new patterns, with possible applications for the development of mass exclusion lists, for the refinement of quality control strategies and for a further interpretation of unexplained spectral peaks in mass spectrometry and tandem mass spectrometry.

  11. An Adaptive Sensor Mining Framework for Pervasive Computing Applications

    NASA Astrophysics Data System (ADS)

    Rashidi, Parisa; Cook, Diane J.

    Analyzing sensor data in pervasive computing applications brings unique challenges to the KDD community. The challenge is heightened when the underlying data source is dynamic and the patterns change. We introduce a new adaptive mining framework that detects patterns in sensor data, and more importantly, adapts to the changes in the underlying model. In our framework, the frequent and periodic patterns of data are first discovered by the Frequent and Periodic Pattern Miner (FPPM) algorithm; and then any changes in the discovered patterns over the lifetime of the system are discovered by the Pattern Adaptation Miner (PAM) algorithm, in order to adapt to the changing environment. This framework also captures vital context information present in pervasive computing applications, such as the startup triggers and temporal information. In this paper, we present a description of our mining framework and validate the approach using data collected in the CASAS smart home testbed.

  12. Supporting Solar Physics Research via Data Mining

    NASA Astrophysics Data System (ADS)

    Angryk, Rafal; Banda, J.; Schuh, M.; Ganesan Pillai, K.; Tosun, H.; Martens, P.

    2012-05-01

    In this talk we will briefly introduce three pillars of data mining (i.e. frequent patterns discovery, classification, and clustering), and discuss some possible applications of known data mining techniques which can directly benefit solar physics research. In particular, we plan to demonstrate applicability of frequent patterns discovery methods for the verification of hypotheses about co-occurrence (in space and time) of filaments and sigmoids. We will also show how classification/machine learning algorithms can be utilized to verify human-created software modules to discover individual types of solar phenomena. Finally, we will discuss applicability of clustering techniques to image data processing.

  13. Binary Coded Web Access Pattern Tree in Education Domain

    ERIC Educational Resources Information Center

    Gomathi, C.; Moorthi, M.; Duraiswamy, K.

    2008-01-01

    Web Access Pattern (WAP), which is the sequence of accesses pursued by users frequently, is a kind of interesting and useful knowledge in practice. Sequential Pattern mining is the process of applying data mining techniques to a sequential database for the purposes of discovering the correlation relationships that exist among an ordered list of…

  14. Efficient frequent pattern mining algorithm based on node sets in cloud computing environment

    NASA Astrophysics Data System (ADS)

    Billa, V. N. Vinay Kumar; Lakshmanna, K.; Rajesh, K.; Reddy, M. Praveen Kumar; Nagaraja, G.; Sudheer, K.

    2017-11-01

    The ultimate goal of data mining is to uncover hidden information that is useful for decision making from the large databases collected by an organization. Data mining involves many tasks, and mining frequent itemsets is one of the most important for transactional databases. These databases hold very large volumes of data, so mining them consumes physical memory and time in proportion to the size of the database. A frequent pattern mining algorithm is considered efficient only if it consumes little memory and time to mine the frequent itemsets from a given large database. With these points in mind, in this thesis we propose a system that mines frequent itemsets in a way that is optimized for memory and time, using cloud computing to parallelize the process and to provide the application as a service. The complete framework uses a proven efficient algorithm called FIN, which works on Nodesets and a POC (pre-order coding) tree. To evaluate the performance of the system, we conduct experiments comparing the efficiency of the same algorithm applied in a standalone manner and in a cloud computing environment on a real-time data set, namely the traffic accidents data set. The results show that the memory consumption and execution time of the proposed system are much lower than those of the standalone system.

  15. SPMBR: a scalable algorithm for mining sequential patterns based on bitmaps

    NASA Astrophysics Data System (ADS)

    Xu, Xiwei; Zhang, Changhai

    2013-12-01

    Some current sequential pattern mining algorithms generate too many candidate sequences and so increase the processing cost of support counting. Therefore, we present an effective and scalable algorithm called SPMBR (Sequential Patterns Mining based on Bitmap Representation) to solve the problem of mining sequential patterns in large databases. Our method differs from previous related work on mining sequential patterns. The main difference is that the sequence database is represented by bitmaps, and a simplified bitmap structure is presented first. The algorithm first generates candidate sequences by SE (Sequence Extension) and IE (Item Extension), and then obtains all frequent sequences by comparing the original bitmap and the extended item bitmap. This method simplifies the problem of mining sequential patterns and avoids the high processing cost of support counting. Both theory and experiments indicate that the performance of SPMBR is superior for large transaction databases, that the memory required for storing temporal data during the mining process is much smaller, and that all sequential patterns can be mined feasibly.
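
    How bitmaps can stand in for support counting during item extension and sequence extension is sketched below. This follows the general bitmap idea for sequence mining, with one Python integer per sequence and bit j set when the item occurs in the j-th itemset; it is an assumption-laden illustration and not SPMBR's simplified bitmap structure, which the abstract does not describe.

```python
from typing import Dict, List, Set

Sequence_ = List[Set[str]]


def build_bitmaps(sequences: List[Sequence_]) -> Dict[str, List[int]]:
    """One integer per sequence; bit j is set if the item is in itemset j."""
    bitmaps: Dict[str, List[int]] = {}
    for sid, seq in enumerate(sequences):
        for pos, itemset in enumerate(seq):
            for item in itemset:
                bitmaps.setdefault(item, [0] * len(sequences))[sid] |= 1 << pos
    return bitmaps


def item_extension(a: List[int], b: List[int]) -> List[int]:
    """Both patterns end in the same itemset: plain bitwise AND."""
    return [x & y for x, y in zip(a, b)]


def sequence_extension(a: List[int], b: List[int], max_len: int) -> List[int]:
    """b must occur strictly after the first occurrence of a."""
    full = (1 << max_len) - 1
    out = []
    for x, y in zip(a, b):
        if x == 0:
            out.append(0)
            continue
        first = (x & -x).bit_length() - 1          # index of lowest set bit
        after = full & ~((1 << (first + 1)) - 1)   # all positions after it
        out.append(after & y)
    return out


def support(bitmap: List[int]) -> int:
    """Number of sequences in which the pattern occurs at least once."""
    return sum(1 for x in bitmap if x)


db = [[{"a"}, {"b"}], [{"a", "b"}], [{"b"}, {"a"}, {"b"}]]
bm = build_bitmaps(db)
```

    Support counting then reduces to checking which integers are non-zero, with no rescan of the original sequences.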

  16. An novel frequent probability pattern mining algorithm based on circuit simulation method in uncertain biological networks.

    PubMed

    He, Jieyue; Wang, Chunyan; Qiu, Kunpu; Zhong, Wei

    2014-01-01

    Motif mining has always been a hot research topic in bioinformatics. Most current research on biological networks focuses on exact motif mining. However, due to inevitable experimental error and noisy data, biological network data represented as a probability model could better reflect the authenticity and biological significance. Therefore, it is more biologically meaningful to discover probability motifs in uncertain biological networks. One of the key steps in probability motif mining is frequent pattern discovery, which is usually based on the possible world model and has a relatively high computational complexity. In this paper, we present a novel method for detecting frequent probability patterns based on circuit simulation in uncertain biological networks. First, partition-based efficient search is applied to non-tree-like subgraph mining, where the probability of occurrence in random networks is small. Then, a probability isomorphism algorithm based on circuit simulation is proposed. It combines the analysis of circuit topology structure with related physical properties of voltage in order to evaluate the probability isomorphism between probability subgraphs. The circuit simulation based probability isomorphism algorithm avoids using the traditional possible world model. Finally, based on the probability subgraph isomorphism algorithm, a two-step hierarchical clustering method is used to cluster subgraphs and discover frequent probability patterns from the clusters. The experimental results on data sets of Protein-Protein Interaction (PPI) networks and the transcriptional regulatory networks of E. coli and S. cerevisiae show that the proposed method can efficiently discover the frequent probability subgraphs. The discovered subgraphs in our study contain all probability motifs reported in the experiments published in other related papers. The probability graph isomorphism evaluation algorithm based on circuit simulation excludes most of the subgraphs that are not probability isomorphic and reduces the search space of probability isomorphic subgraphs using the mismatch values in the node voltage set. It is an innovative way to find frequent probability patterns, which can be efficiently applied to probability motif discovery problems in further studies.

  17. An novel frequent probability pattern mining algorithm based on circuit simulation method in uncertain biological networks

    PubMed Central

    2014-01-01

    Background: Motif mining has always been a hot research topic in bioinformatics. Most current research on biological networks focuses on exact motif mining. However, due to inevitable experimental error and noisy data, biological network data represented as a probability model could better reflect the authenticity and biological significance. Therefore, it is more biologically meaningful to discover probability motifs in uncertain biological networks. One of the key steps in probability motif mining is frequent pattern discovery, which is usually based on the possible world model and has a relatively high computational complexity. Methods: In this paper, we present a novel method for detecting frequent probability patterns based on circuit simulation in uncertain biological networks. First, partition-based efficient search is applied to non-tree-like subgraph mining, where the probability of occurrence in random networks is small. Then, a probability isomorphism algorithm based on circuit simulation is proposed. It combines the analysis of circuit topology structure with related physical properties of voltage in order to evaluate the probability isomorphism between probability subgraphs. The circuit simulation based probability isomorphism algorithm avoids using the traditional possible world model. Finally, based on the probability subgraph isomorphism algorithm, a two-step hierarchical clustering method is used to cluster subgraphs and discover frequent probability patterns from the clusters. Results: The experimental results on data sets of Protein-Protein Interaction (PPI) networks and the transcriptional regulatory networks of E. coli and S. cerevisiae show that the proposed method can efficiently discover the frequent probability subgraphs. The discovered subgraphs in our study contain all probability motifs reported in the experiments published in other related papers. Conclusions: The probability graph isomorphism evaluation algorithm based on circuit simulation excludes most of the subgraphs that are not probability isomorphic and reduces the search space of probability isomorphic subgraphs using the mismatch values in the node voltage set. It is an innovative way to find frequent probability patterns, which can be efficiently applied to probability motif discovery problems in further studies. PMID:25350277

  18. Tree-based approach for exploring marine spatial patterns with raster datasets.

    PubMed

    Liao, Xiaohan; Xue, Cunjin; Su, Fenzhen

    2017-01-01

    From multiple raster datasets to spatial association patterns, the data-mining technique is divided into three subtasks, i.e., raster dataset pretreatment, mining algorithm design, and spatial pattern exploration from the mining results. Comparison with the former two subtasks reveals that the latter remains unresolved. Confronted with the interrelated marine environmental parameters, we propose a Tree-based Approach for eXploring Marine Spatial Patterns with multiple raster datasets called TAXMarSP, which includes two models. One is the Tree-based Cascading Organization Model (TCOM), and the other is the Spatial Neighborhood-based CAlculation Model (SNCAM). TCOM designs the "Spatial node→Pattern node" from top to bottom layers to store the table-formatted frequent patterns. Together with TCOM, SNCAM considers the spatial neighborhood contributions to calculate the pattern-matching degree between the specified marine parameters and the table-formatted frequent patterns and then explores the marine spatial patterns. Using the prevalent quantification Apriori algorithm and a real remote sensing dataset from January 1998 to December 2014, a successful application of TAXMarSP to marine spatial patterns in the Pacific Ocean is described, and the obtained marine spatial patterns present not only the well-known but also new patterns to Earth scientists.

  19. Integrating Entropy and Closed Frequent Pattern Mining for Social Network Modelling and Analysis

    NASA Astrophysics Data System (ADS)

    Adnan, Muhaimenul; Alhajj, Reda; Rokne, Jon

    The recent increase in explicitly available social networks has attracted the attention of the research community to investigate how it would be possible to benefit from such a powerful model in producing effective solutions for problems in other domains where the social network is implicit; we argue that social networks do exist around us but the key issue is how to realize and analyze them. This chapter presents a novel approach for constructing a social network model by an integrated framework that first prepares the data to be analyzed and then applies entropy and frequent closed pattern mining for network construction. For a given problem, we first prepare the data by identifying items and transactions, which are the basic ingredients for frequent closed pattern mining. Items are the main objects in the problem, and a transaction is a set of items that could exist together at one time (e.g., items purchased in one visit to the supermarket). Transactions can be analyzed to discover frequent closed patterns using any of the well-known techniques. Frequent closed patterns have the advantage that they successfully capture the inherent information content of the dataset and are applicable to a broader set of domains. Entropies of the frequent closed patterns are used to keep the dimensionality of the feature vectors to a reasonable size; it is a kind of feature reduction process. Finally, we analyze the dynamic behavior of the constructed social network. Experiments were conducted on a synthetic dataset and on the Enron corpus email dataset. The results presented in the chapter show that social networks extracted from a feature set as frequent closed patterns successfully carry the community structure information. Moreover, for the Enron email dataset, we present an analysis to dynamically indicate the deviations from each user's individual and community profile. These indications of deviations can be very useful to identify unusual events.

  20. Careflow Mining Techniques to Explore Type 2 Diabetes Evolution.

    PubMed

    Dagliati, Arianna; Tibollo, Valentina; Cogni, Giulia; Chiovato, Luca; Bellazzi, Riccardo; Sacchi, Lucia

    2018-03-01

    In this work we describe the application of a careflow mining algorithm to detect the most frequent patterns of care in a type 2 diabetes patients cohort. The applied method enriches the detected patterns with clinical data to define temporal phenotypes across the studied population. Novel phenotypes are discovered from heterogeneous data of 424 Italian patients, and compared in terms of metabolic control and complications. Results show that careflow mining can help to summarize the complex evolution of the disease into meaningful patterns, which are also significant from a clinical point of view.

  1. Apriori Versions Based on MapReduce for Mining Frequent Patterns on Big Data.

    PubMed

    Luna, Jose Maria; Padillo, Francisco; Pechenizkiy, Mykola; Ventura, Sebastian

    2017-09-27

    Pattern mining is one of the most important tasks to extract meaningful and useful information from raw data. This task aims to extract item-sets that represent any type of homogeneity and regularity in data. Although many efficient algorithms have been developed in this regard, the growing interest in data has caused the performance of existing pattern mining techniques to drop. The goal of this paper is to propose new efficient pattern mining algorithms to work in big data. To this aim, a series of algorithms based on the MapReduce framework and the Hadoop open-source implementation have been proposed. The proposed algorithms can be divided into three main groups. First, two algorithms [Apriori MapReduce (AprioriMR) and iterative AprioriMR] with no pruning strategy are proposed, which extract any existing item-set in data. Second, two algorithms (space pruning AprioriMR and top AprioriMR) that prune the search space by means of the well-known anti-monotone property are proposed. Finally, a last algorithm (maximal AprioriMR) is also proposed for mining condensed representations of frequent patterns. To test the performance of the proposed algorithms, a varied collection of big data datasets has been considered, comprising up to 3 · 10¹⁸ transactions and more than 5 million distinct single items. The experimental stage includes comparisons against highly efficient and well-known pattern mining algorithms. Results reveal the interest of applying MapReduce versions when complex problems are considered, and also the unsuitability of this paradigm when dealing with small data.
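
    The no-pruning variant described above, which extracts every item-set present in the data, maps naturally onto a map step that emits each sub-itemset of a transaction with a count of 1 and a reduce step that sums the counts per item-set. The sketch below simulates that flow in a single Python process; it is a hedged illustration of the map/reduce split, does not use the Hadoop API, and its function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Iterator, Set, Tuple

Key = Tuple[str, ...]


def map_phase(transaction: Set[str]) -> Iterator[Tuple[Key, int]]:
    """Emit (item-set, 1) for every non-empty subset of the transaction."""
    items = sorted(transaction)  # canonical order so equal sets share a key
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield combo, 1


def reduce_phase(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, int]:
    """Sum the emitted counts per item-set key."""
    counts: Dict[Key, int] = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def mine(transactions: Iterable[Set[str]]) -> Dict[Key, int]:
    """Chain the map and reduce steps over all transactions."""
    pairs = (pair for t in transactions for pair in map_phase(t))
    return reduce_phase(pairs)


counts = mine([{"a", "b"}, {"a", "c"}, {"a", "b", "c"}])
```

    Because a transaction with n items emits 2^n − 1 keys, this form only makes sense when the work is spread over many workers, which is consistent with the abstract's remark that the paradigm is unsuitable for small data.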

  2. Mining Productive-Associated Periodic-Frequent Patterns in Body Sensor Data for Smart Home Care

    PubMed Central

    Ismail, Walaa N.; Hassan, Mohammad Mehedi

    2017-01-01

    The understanding of various health-oriented vital sign data generated from body sensor networks (BSNs) and discovery of the associations between the generated parameters is an important task that may assist and promote important decision making in healthcare. For example, in a smart home scenario where occupants’ health status is continuously monitored remotely, it is essential to provide the required assistance when an unusual or critical situation is detected in their vital sign data. In this paper, we present an efficient approach for mining the periodic patterns obtained from BSN data. In addition, we employ a correlation test on the generated patterns and introduce productive-associated periodic-frequent patterns as the set of correlated periodic-frequent items. The combination of these measures has the advantage of empowering healthcare providers and patients to raise the quality of diagnosis as well as improve treatment and smart care, especially for elderly people in smart homes. We develop an efficient algorithm named PPFP-growth (Productive Periodic-Frequent Pattern-growth) to discover all productive-associated periodic frequent patterns using these measures. PPFP-growth is efficient and the productiveness measure removes uncorrelated periodic items. An experimental evaluation on synthetic and real datasets shows the efficiency of the proposed PPFP-growth algorithm, which can filter a huge number of periodic patterns to reveal only the correlated ones. PMID:28445441

  3. Mining Productive-Associated Periodic-Frequent Patterns in Body Sensor Data for Smart Home Care.

    PubMed

    Ismail, Walaa N; Hassan, Mohammad Mehedi

    2017-04-26

    The understanding of various health-oriented vital sign data generated from body sensor networks (BSNs) and discovery of the associations between the generated parameters is an important task that may assist and promote important decision making in healthcare. For example, in a smart home scenario where occupants' health status is continuously monitored remotely, it is essential to provide the required assistance when an unusual or critical situation is detected in their vital sign data. In this paper, we present an efficient approach for mining the periodic patterns obtained from BSN data. In addition, we employ a correlation test on the generated patterns and introduce productive-associated periodic-frequent patterns as the set of correlated periodic-frequent items. The combination of these measures has the advantage of empowering healthcare providers and patients to raise the quality of diagnosis as well as improve treatment and smart care, especially for elderly people in smart homes. We develop an efficient algorithm named PPFP-growth (Productive Periodic-Frequent Pattern-growth) to discover all productive-associated periodic frequent patterns using these measures. PPFP-growth is efficient and the productiveness measure removes uncorrelated periodic items. An experimental evaluation on synthetic and real datasets shows the efficiency of the proposed PPFP-growth algorithm, which can filter a huge number of periodic patterns to reveal only the correlated ones.

  4. A Comparative Study of Frequent and Maximal Periodic Pattern Mining Algorithms in Spatiotemporal Databases

    NASA Astrophysics Data System (ADS)

    Obulesu, O.; Rama Mohan Reddy, A., Dr; Mahendra, M.

    2017-08-01

    Detecting regular and efficient cyclic models is a demanding activity for data analysts because of the unstructured, vigorous and enormous raw information produced from the web. Many existing approaches generate large numbers of candidate patterns when databases are huge and complex. In this work, two novel algorithms are proposed and a comparative examination is performed by considering scalability and performance parameters. The first algorithm, EFPMA (Extended Regular Model Detection Algorithm), is used to find frequent sequential patterns from the spatiotemporal dataset, and the second, ETMA (Enhanced Tree-based Mining Algorithm), detects effective cyclic models with a symbolic database representation. EFPMA grows models from both ends (prefixes and suffixes) of detected patterns, which results in faster pattern growth because it needs fewer levels of database projection than existing approaches such as PrefixSpan and SPADE. ETMA uses distinct notions, such as segment, sequence and individual symbols, to store and manage transaction data horizontally. ETMA exploits a partition-and-conquer method to find maximal patterns by using symbolic notations. Using this algorithm, we can mine cyclic models in full-series sequential patterns, including subsection series. ETMA reduces memory consumption and makes use of efficient symbolic operations. Furthermore, ETMA only records time-series instances dynamically, in terms of character, series and section approaches respectively. Determining the extent of the pattern and proving the efficiency of the reducing and retrieval techniques on synthetic and actual datasets is an open and challenging mining problem. These techniques are useful in data streams, traffic risk analysis, medical diagnosis, DNA sequence mining and earthquake prediction applications. Extensive experimental results illustrate that the algorithms outperform the ECLAT, STNR and MAFIA approaches in efficiency and scalability.

  5. Pattern Discovery and Change Detection of Online Music Query Streams

    NASA Astrophysics Data System (ADS)

    Li, Hua-Fu

    In this paper, an efficient stream mining algorithm, called FTP-stream (Frequent Temporal Pattern mining of streams), is proposed to find the frequent temporal patterns over melody sequence streams. In the framework of our proposed algorithm, an effective bit-sequence representation is used to reduce the time and memory needed to slide the windows. The FTP-stream algorithm can calculate the support threshold in only a single pass based on the concept of bit-sequence representation. It takes advantage of the "left" and "and" operations of the representation. Experiments show that the proposed algorithm only scans the music query stream once, and runs significantly faster and consumes less memory than existing algorithms, such as SWFI-stream and Moment.
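
The bit-sequence idea described in this abstract can be sketched as follows, assuming (this is an illustrative reading, not the FTP-stream implementation) that each item keeps one integer whose bits mark the window positions where it occurs. Sliding the window is then a left shift, and the support of an item set is the population count of the bitwise AND. The class name `BitWindow` and the toy stream are hypothetical.

```python
class BitWindow:
    """Sliding window over transactions, one bit sequence (int) per item."""

    def __init__(self, window_size):
        self.mask = (1 << window_size) - 1  # keeps only the last w positions
        self.bits = {}                      # item -> bit sequence

    def push(self, transaction):
        # "left" operation: shift every sequence, dropping the oldest position.
        for item in list(self.bits):
            self.bits[item] = (self.bits[item] << 1) & self.mask
        # Mark the newest position for the items that just arrived.
        for item in set(transaction):
            self.bits[item] = self.bits.get(item, 0) | 1
        # Forget items that no longer occur anywhere in the window.
        for item in [i for i, b in self.bits.items() if b == 0]:
            del self.bits[item]

    def support(self, itemset):
        # "and" operation: positions where all items of the set co-occur.
        acc = self.mask
        for item in itemset:
            acc &= self.bits.get(item, 0)
        return bin(acc).count("1")

# Toy stream, invented for illustration; the window keeps the last 3 entries.
window = BitWindow(window_size=3)
for txn in [{"C", "E"}, {"C", "G"}, {"E"}, {"C", "E"}]:
    window.push(txn)
```

After the four pushes the window holds the last three transactions, so `window.support({"C"})` counts two occurrences and `window.support({"C", "E"})` counts one.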

  6. Customizing FP-growth algorithm to parallel mining with Charm++ library

    NASA Astrophysics Data System (ADS)

    Puścian, Marek

    2017-08-01

    This paper presents a frequent item mining algorithm that was customized to handle growing data repositories. The proposed solution applies a master-slave scheme to the frequent pattern growth technique. Efficient utilization of available computation units is achieved by dynamic reallocation of tasks. Conditional frequent trees are assigned to parallel workers based on their workload. The proposed enhancements have been successfully implemented using the Charm++ library. This paper discusses the performance of the parallelized FP-growth algorithm on different datasets. The approach has been illustrated with many experiments and measurements performed on a multiprocessor and multithreaded computer.

  7. Mining Co-Location Patterns with Clustering Items from Spatial Data Sets

    NASA Astrophysics Data System (ADS)

    Zhou, G.; Li, Q.; Deng, G.; Yue, T.; Zhou, X.

    2018-05-01

    The explosive growth of spatial data and widespread use of spatial databases emphasize the need for the spatial data mining. Co-location patterns discovery is an important branch in spatial data mining. Spatial co-locations represent the subsets of features which are frequently located together in geographic space. However, the appearance of a spatial feature C is often not determined by a single spatial feature A or B but by the two spatial features A and B, that is to say where A and B appear together, C often appears. We note that this co-location pattern is different from the traditional co-location pattern. Thus, this paper presents a new concept called clustering terms, and this co-location pattern is called co-location patterns with clustering items. And the traditional algorithm cannot mine this co-location pattern, so we introduce the related concept in detail and propose a novel algorithm. This algorithm is extended by join-based approach proposed by Huang. Finally, we evaluate the performance of this algorithm.

  8. FraudMiner: A Novel Credit Card Fraud Detection Model Based on Frequent Itemset Mining

    PubMed Central

    Seeja, K. R.; Zareapoor, Masoumeh

    2014-01-01

    This paper proposes an intelligent credit card fraud detection model for detecting fraud from highly imbalanced and anonymous credit card transaction datasets. The class imbalance problem is handled by finding legal as well as fraud transaction patterns for each customer by using frequent itemset mining. A matching algorithm is also proposed to find to which pattern (legal or fraud) the incoming transaction of a particular customer is closer and a decision is made accordingly. In order to handle the anonymous nature of the data, no preference is given to any of the attributes and each attribute is considered equally for finding the patterns. The performance evaluation of the proposed model is done on UCSD Data Mining Contest 2009 Dataset (anonymous and imbalanced) and it is found that the proposed model has a very high fraud detection rate, balanced classification rate, and Matthews correlation coefficient, and a much lower false alarm rate than other state-of-the-art classifiers. PMID:25302317

  9. FraudMiner: a novel credit card fraud detection model based on frequent itemset mining.

    PubMed

    Seeja, K R; Zareapoor, Masoumeh

    2014-01-01

    This paper proposes an intelligent credit card fraud detection model for detecting fraud from highly imbalanced and anonymous credit card transaction datasets. The class imbalance problem is handled by finding legal as well as fraud transaction patterns for each customer by using frequent itemset mining. A matching algorithm is also proposed to find to which pattern (legal or fraud) the incoming transaction of a particular customer is closer and a decision is made accordingly. In order to handle the anonymous nature of the data, no preference is given to any of the attributes and each attribute is considered equally for finding the patterns. The performance evaluation of the proposed model is done on UCSD Data Mining Contest 2009 Dataset (anonymous and imbalanced) and it is found that the proposed model has a very high fraud detection rate, balanced classification rate, and Matthews correlation coefficient, and a much lower false alarm rate than other state-of-the-art classifiers.

  10. A New Approach for Mining Order-Preserving Submatrices Based on All Common Subsequences.

    PubMed

    Xue, Yun; Liao, Zhengling; Li, Meihang; Luo, Jie; Kuang, Qiuhua; Hu, Xiaohui; Li, Tiechen

    2015-01-01

    Order-preserving submatrices (OPSMs) have been applied in many fields, such as DNA microarray data analysis, automatic recommendation systems, and target marketing systems, as an important unsupervised learning model. Unfortunately, most existing methods are heuristic algorithms which are unable to reveal OPSMs entirely in NP-complete problem. In particular, deep OPSMs, corresponding to long patterns with few supporting sequences, incur explosive computational costs and are completely pruned by most popular methods. In this paper, we propose an exact method to discover all OPSMs based on frequent sequential pattern mining. First, an existing algorithm was adjusted to disclose all common subsequence (ACS) between every two row sequences, and therefore all deep OPSMs will not be missed. Then, an improved data structure for prefix tree was used to store and traverse ACS, and Apriori principle was employed to efficiently mine the frequent sequential pattern. Finally, experiments were implemented on gene and synthetic datasets. Results demonstrated the effectiveness and efficiency of this method.

  11. Dietary patterns analysis using data mining method. An application to data from the CYKIDS study.

    PubMed

    Lazarou, Chrystalleni; Karaolis, Minas; Matalas, Antonia-Leda; Panagiotakos, Demosthenes B

    2012-11-01

    Data mining is a computational method that permits the extraction of patterns from large databases. We applied the data mining approach in data from 1140 children (9-13 years), in order to derive dietary habits related to children's obesity status. Rules that emerged via the data mining approach revealed the detrimental influence of the increased consumption of soft drinks, delicatessen meat, sweets, fried and junk food. For example, frequent (3-5 times/week) consumption of all these foods increases the risk for being obese by 75%, whereas in children who have a similar dietary pattern, but eat >2 times/week fish and seafood the risk for obesity is reduced by 33%. In conclusion, patterns revealed from the data mining technique refer to specific groups of children and demonstrate the effect on the risk associated with obesity status when a single dietary habit might be modified. Thus, a more individualized approach when translating public health messages could be achieved. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.

  12. An efficient, versatile and scalable pattern growth approach to mine frequent patterns in unaligned protein sequences.

    PubMed

    Ye, Kai; Kosters, Walter A; Ijzerman, Adriaan P

    2007-03-15

    Pattern discovery in protein sequences is often based on multiple sequence alignments (MSA). The procedure can be computationally intensive and often requires manual adjustment, which may be particularly difficult for a set of deviating sequences. In contrast, two algorithms, PRATT2 (http://www.ebi.ac.uk/pratt/) and TEIRESIAS (http://cbcsrv.watson.ibm.com/) are used to directly identify frequent patterns from unaligned biological sequences without an attempt to align them. Here we propose a new algorithm with more efficiency and more functionality than both PRATT2 and TEIRESIAS, and discuss some of its applications to G protein-coupled receptors, a protein family of important drug targets. In this study, we designed and implemented six algorithms to mine three different pattern types from either one or two datasets using a pattern growth approach. We compared our approach to PRATT2 and TEIRESIAS in efficiency, completeness and the diversity of pattern types. Compared to PRATT2, our approach is faster, capable of processing large datasets and able to identify the so-called type III patterns. Our approach is comparable to TEIRESIAS in the discovery of the so-called type I patterns but has additional functionality such as mining the so-called type II and type III patterns and finding discriminating patterns between two datasets. The source code for pattern growth algorithms and their pseudo-code are available at http://www.liacs.nl/home/kosters/pg/.

  13. Discovering amino acid patterns on binding sites in protein complexes

    PubMed Central

    Kuo, Huang-Cheng; Ong, Ping-Lin; Lin, Jung-Chang; Huang, Jen-Peng

    2011-01-01

    Discovering amino acid (AA) patterns on protein binding sites has recently become popular. We propose a method to discover the association relationship among AAs on binding sites. Such knowledge of binding sites is very helpful in predicting protein-protein interactions. In this paper, we focus on protein complexes which have protein-protein recognition. The association rule mining technique is used to discover geographically adjacent amino acids on a binding site of a protein complex. When mining, instead of treating all AAs of binding sites as a transaction, we geographically partition AAs of binding sites in a protein complex. AAs in a partition are treated as a transaction. For the partition process, AAs on a binding site are projected from three-dimensional to two-dimensional. And then, assisted with a circular grid, AAs on the binding site are placed into grid cells. A circular grid has ten rings: a central ring, the second ring with 6 sectors, the third ring with 12 sectors, and later rings are added to four sectors in order. As for the radius of each ring, we examined the complexes and found that 10Å is a suitable range, which can be set by the user. After placing these recognition complexes on the circular grid, we obtain mining records (i.e. transactions) from each sector. A sector is regarded as a record. Finally, we use the association rule to mine these records for frequent AA patterns. If the support of an AA pattern is larger than the predetermined minimum support (i.e. threshold), it is called a frequent pattern. With these discovered patterns, we offer the biologists a novel point of view, which will improve the prediction accuracy of protein-protein recognition. In our experiments, we produced the AA patterns by data mining. As a result, we found that arginine (arg) most frequently appears on the binding sites of two proteins in the recognition protein complexes, while cysteine (cys) appears the fewest. 
    In addition, if we discriminate the shape of binding sites between concave and convex further, we discover that patterns {arg, glu, asp} and {arg, ser, asp} on the concave shape of binding sites in a protein more frequently (i.e. higher probability) make contact with {lys} or {arg} on the convex shape of binding sites in another protein. Thus, we can confidently achieve a rate of at least 78%. On the other hand {val, gly, lys} on the convex surface of binding sites in proteins is more frequently in contact with {asp} on the concave site of another protein, and the confidence achieved is over 81%. Applying data mining in biology can reveal more facts that may otherwise be ignored or not easily discovered by the naked eye. Furthermore, we can discover more relationships among AAs on binding sites by appropriately rotating these residues on binding sites from a three-dimension to two-dimension perspective. We designed a circular grid to deposit the data, which total to 463 records consisting of AAs. Then we used the association rules to mine these records for discovering relationships. The proposed method in this paper provides an insight into the characteristics of binding sites for recognition complexes. PMID:21464838
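
The support and confidence measures this abstract relies on can be written out in a few lines. The sketch below uses the standard association-rule definitions and treats each grid sector as one record, as the abstract describes. The records are invented for illustration and are not data from the paper, and the function names are hypothetical.

```python
def support(itemset, records):
    """Fraction of records that contain every item of the set."""
    itemset = frozenset(itemset)
    return sum(1 for r in records if itemset <= r) / len(records)

def confidence(antecedent, consequent, records):
    """Estimated P(consequent | antecedent) over the records."""
    a = frozenset(antecedent)
    support_a = support(a, records)
    if support_a == 0:
        return 0.0
    return support(a | frozenset(consequent), records) / support_a

# Hypothetical sector records (sets of amino acid codes).
records = [
    {"arg", "glu", "asp"},
    {"arg", "asp"},
    {"arg", "ser"},
    {"val", "gly"},
]
arg_support = support({"arg"}, records)
arg_to_asp = confidence({"arg"}, {"asp"}, records)
```

A pattern is called frequent when its support exceeds the chosen minimum support. Here {arg} occurs in three of the four records, and two of those three also contain asp.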

  14. Estimating the Importance of Terrorists in a Terror Network

    NASA Astrophysics Data System (ADS)

    Elhajj, Ahmed; Elsheikh, Abdallah; Addam, Omar; Alzohbi, Mohamad; Zarour, Omar; Aksaç, Alper; Öztürk, Orkun; Özyer, Tansel; Ridley, Mick; Alhajj, Reda

    While criminals may start their activities at individual level, the same is in general not true for terrorists who are mostly organized in well established networks. The effectiveness of a terror network could be realized by watching many factors, including the volume of activities accomplished by its members, the capabilities of its members to hide, and the ability of the network to grow and to maintain its influence even after the loss of some members, even leaders. Social network analysis, data mining and machine learning techniques could play important role in measuring the effectiveness of a network in general and in particular a terror network in support of the work presented in this chapter. We present a framework that employs clustering, frequent pattern mining and some social network analysis measures to determine the effectiveness of a network. The clustering and frequent pattern mining techniques start with the adjacency matrix of the network. For clustering, we utilize entries in the table by considering each row as an object and each column as a feature. Thus features of a network member are his/her direct neighbors. We maintain the weight of links in case of weighted network links. For frequent pattern mining, we consider each row of the adjacency matrix as a transaction and each column as an item. Further, we map entries into a 0/1 scale such that every entry whose value is greater than zero is assigned the value one; entries keep the value zero otherwise. This way we can apply frequent pattern mining algorithms to determine the most influential members in a network as well as the effect of removing some members or even links between members of a network. We also investigate the effect of adding some links between members. The target is to study how the various members in the network change role as the network evolves. This is measured by applying some social network analysis measures on the network at each stage during the development. 
    We report some interesting results related to two benchmark networks: the first is 9/11 and the second is Madrid bombing.
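
The mapping from an adjacency matrix to transactions described in this abstract (one transaction per row, one item per column, every positive entry mapped to 1) can be sketched as below. The function name, the member labels, and the toy matrix are assumptions for illustration. They are not the benchmark networks used in the chapter.

```python
from collections import Counter

def adjacency_to_transactions(matrix, labels):
    """Turn each row into the set of column labels with a positive entry."""
    return [
        {labels[j] for j, weight in enumerate(row) if weight > 0}
        for row in matrix
    ]

# Hypothetical weighted network with three members.
labels = ["A", "B", "C"]
matrix = [
    [0, 2, 1],  # A is linked to B (weight 2) and C (weight 1)
    [2, 0, 0],  # B is linked to A
    [1, 0, 0],  # C is linked to A
]
transactions = adjacency_to_transactions(matrix, labels)

# Support of single items: in how many members' neighbourhoods each one appears.
item_support = Counter(item for t in transactions for item in t)
```

The weights are discarded by the `weight > 0` test, which is the 0/1 mapping from the abstract. The resulting transactions can be passed to any frequent pattern mining routine.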

  15. Efficient Mining of Interesting Patterns in Large Biological Sequences

    PubMed Central

    Rashid, Md. Mamunur; Karim, Md. Rezaul; Jeong, Byeong-Soo

    2012-01-01

    Pattern discovery in biological sequences (e.g., DNA sequences) is one of the most challenging tasks in computational biology and bioinformatics. So far, in most approaches, the number of occurrences is a major measure of determining whether a pattern is interesting or not. In computational biology, however, a pattern that is not frequent may still be considered very informative if its actual support frequency exceeds the prior expectation by a large margin. In this paper, we propose a new interesting measure that can provide meaningful biological information. We also propose an efficient index-based method for mining such interesting patterns. Experimental results show that our approach can find interesting patterns within an acceptable computation time. PMID:23105928

  16. Efficient mining of interesting patterns in large biological sequences.

    PubMed

    Rashid, Md Mamunur; Karim, Md Rezaul; Jeong, Byeong-Soo; Choi, Ho-Jin

    2012-03-01

    Pattern discovery in biological sequences (e.g., DNA sequences) is one of the most challenging tasks in computational biology and bioinformatics. So far, in most approaches, the number of occurrences is a major measure of determining whether a pattern is interesting or not. In computational biology, however, a pattern that is not frequent may still be considered very informative if its actual support frequency exceeds the prior expectation by a large margin. In this paper, we propose a new interesting measure that can provide meaningful biological information. We also propose an efficient index-based method for mining such interesting patterns. Experimental results show that our approach can find interesting patterns within an acceptable computation time.

  17. Data mining and frequency analysis for licorice as a "Two-Face" herb in Chinese Formulae based on Chinese Formulae Database.

    PubMed

    Guo, Jianming; Shang, Erxin; Zhao, Jinlong; Fan, Xinsheng; Duan, Jinao; Qian, Dawei; Tao, Weiwei; Tang, Yuping

    2014-09-25

    Liquorice is the root of Glycyrrhiza uralensis Fisch. or Glycyrrhiza glabra L., Leguminosae. Licorice is described as 'National Venerable Master' in Chinese medicine and plays paradoxical roles, i.e. detoxification/strengthen efficacy and inducing/enhancing toxicity. Therefore, licorice was called "Two-Face" herb in this paper. The aim of this study is to discuss the paradoxical roles and the perspective usage of this "Two-Face" herb using data mining and frequency analysis. More than 96,000 prescriptions from Chinese Formulae Database were selected. The frequency and the prescription patterns were analyzed using Microsoft SQL Server 2000. Data mining methods (frequent itemsets) were used to analyze the regular patterns and compatibility laws of the constituent herbs in the selected prescriptions. The result showed that licorice (Radix glycyrrhizae) was the most frequently used herb in Chinese Formulae Database, other frequently used herbs including Radix Angelicae Sinensis (Dang gui), Radix et rhizoma ginseng (Ren shen), etc. Toxic herbs such as Radix aconiti lateralis praeparata (Fu zi), Rhizoma pinelliae (Ban xia) and Cinnabaris (Zhu sha) are top 3 herbs that most frequently used in combination with licorice. Radix et rhizoma ginseng (Ren shen), Poria (Fu ling), Radix Angelicae Sinensis (Dang gui) are top 3 nontoxic herbs that most frequently used in combination with licorice. Moreover, Licorice was seldom used with sargassum (Hai Zao), Herba Cirsii Japonici (Da Ji), Euphorbia kansui (Gan Sui) and Flos genkwa (Yuan Hua), which proved the description of contradictory effect of Radix glycyrrhizae and these herbs as recorded in Chinese medicine theory. This study showed the principle pattern of Chinese herbal drugs used in combination with licorice or not. The principle patterns and special compatibility laws reported here could be useful and instructive for scientific usage of licorice in clinic application. 
    Further pharmacological and chemical research is needed to evaluate the efficacy and the combination pattern of these Chinese herbs. The mechanism of the combination pattern of these prescriptions should also be investigated, using in vitro or in vivo models, to determine whether additive, synergistic or antagonistic effects exist. Copyright © 2014 Elsevier GmbH. All rights reserved.

  18. Mining High-Dimensional Data

    NASA Astrophysics Data System (ADS)

    Wang, Wei; Yang, Jiong

    With the rapid growth of computational biology and e-commerce applications, high-dimensional data becomes very common. Thus, mining high-dimensional data is an urgent problem of great practical importance. However, there are some unique challenges for mining data of high dimensions, including (1) the curse of dimensionality and more crucial (2) the meaningfulness of the similarity measure in the high dimension space. In this chapter, we present several state-of-art techniques for analyzing high-dimensional data, e.g., frequent pattern mining, clustering, and classification. We will discuss how these methods deal with the challenges of high dimensionality.

  19. Statistical evaluation of synchronous spike patterns extracted by frequent item set mining

    PubMed Central

    Torre, Emiliano; Picado-Muiño, David; Denker, Michael; Borgelt, Christian; Grün, Sonja

    2013-01-01

    We recently proposed frequent itemset mining (FIM) as a method to perform an optimized search for patterns of synchronous spikes (item sets) in massively parallel spike trains. This search outputs the occurrence count (support) of individual patterns that are not trivially explained by the counts of any superset (closed frequent item sets). The number of patterns found by FIM makes direct statistical tests infeasible due to severe multiple testing. To overcome this issue, we proposed to test the significance not of individual patterns, but instead of their signatures, defined as the pairs of pattern size z and support c. Here, we derive in detail a statistical test for the significance of the signatures under the null hypothesis of full independence (pattern spectrum filtering, PSF) by means of surrogate data. As a result, injected spike patterns that mimic assembly activity are well detected, yielding a low false negative rate. However, this approach is prone to additionally classify patterns resulting from chance overlap of real assembly activity and background spiking as significant. These patterns represent false positives with respect to the null hypothesis of having one assembly of given signature embedded in otherwise independent spiking activity. We propose the additional method of pattern set reduction (PSR) to remove these false positives by conditional filtering. By employing stochastic simulations of parallel spike trains with correlated activity in form of injected spike synchrony in subsets of the neurons, we demonstrate for a range of parameter settings that the analysis scheme composed of FIM, PSF and PSR allows to reliably detect active assemblies in massively parallel spike trains. PMID:24167487
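
The notion of a pattern signature (size z, support c) from this abstract can be illustrated with a small sketch. It assumes that closed frequent item sets have already been mined, and it estimates the significance of a signature as the fraction of surrogate data sets in which that signature occurs at least once. That estimate is a simplified reading of the surrogate-based test, not the authors' exact procedure, and all names and toy values are hypothetical.

```python
from collections import Counter

def pattern_spectrum(patterns):
    """Count patterns per signature (z, c); patterns are (itemset, support)."""
    return Counter((len(items), c) for items, c in patterns)

def signature_p_values(surrogate_pattern_sets):
    """Fraction of surrogates in which each signature appears at least once."""
    n = len(surrogate_pattern_sets)
    hits = Counter()
    for patterns in surrogate_pattern_sets:
        for signature in set(pattern_spectrum(patterns)):
            hits[signature] += 1
    return {signature: k / n for signature, k in hits.items()}

# Toy values, invented for illustration (neuron ids as items).
observed = [({1, 2, 3}, 5), ({4, 5, 6}, 5), ({1, 2}, 9)]
surrogates = [
    [({1, 2}, 9)],
    [({1, 2}, 9)],
    [({3, 4}, 9), ({1, 2, 3}, 5)],
    [],
]
spectrum = pattern_spectrum(observed)
p_values = signature_p_values(surrogates)
```

Testing signatures instead of individual patterns reduces the number of tests from the number of patterns to the number of distinct (z, c) pairs, which is the motivation given in the abstract.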

  20. Large Scale Frequent Pattern Mining using MPI One-Sided Model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vishnu, Abhinav; Agarwal, Khushbu

    In this paper, we propose a work-stealing runtime, Library for Work Stealing (LibWS), using the MPI one-sided model for designing scalable FP-Growth, the de facto frequent pattern mining algorithm, on large scale systems. LibWS provides locality efficient and highly scalable work-stealing techniques for load balancing on a variety of data distributions. We also propose a novel communication algorithm for FP-growth data exchange phase, which reduces the communication complexity from state-of-the-art O(p) to O(f + p/f) for p processes and f frequent attributed-ids. FP-Growth is implemented using LibWS and evaluated on several work distributions and support counts. An experimental evaluation of the FP-Growth on LibWS using 4096 processes on an InfiniBand Cluster demonstrates excellent efficiency for several work distributions (87% efficiency for Power-law and 91% for Poisson). The proposed distributed FP-Tree merging algorithm provides 38x communication speedup on 4096 cores.

  1. Discovering significant evolution patterns from satellite image time series.

    PubMed

    Petitjean, François; Masseglia, Florent; Gançarski, Pierre; Forestier, Germain

    2011-12-01

    Satellite Image Time Series (SITS) provide us with precious information on land cover evolution. By studying these series of images we can both understand the changes of specific areas and discover global phenomena that spread over larger areas. Changes that can occur throughout the sensing time can spread over very long periods and may have different start time and end time depending on the location, which complicates the mining and the analysis of series of images. This work focuses on frequent sequential pattern mining (FSPM) methods, since this family of methods fits the above-mentioned issues. This family of methods consists of finding the most frequent evolution behaviors, and is actually able to extract long-term changes as well as short term ones, whenever the change may start and end. However, applying FSPM methods to SITS implies confronting two main challenges, related to the characteristics of SITS and the domain's constraints. First, satellite images associate multiple measures with a single pixel (the radiometric levels of different wavelengths corresponding to infra-red, red, etc.), which makes the search space multi-dimensional and thus requires specific mining algorithms. Furthermore, the non evolving regions, which are the vast majority and overwhelm the evolving ones, challenge the discovery of these patterns. We propose a SITS mining framework that enables discovery of these patterns despite these constraints and characteristics. Our proposal is inspired from FSPM and provides a relevant visualization principle. Experiments carried out on 35 images sensed over 20 years show the proposed approach makes it possible to extract relevant evolution behaviors.

  2. Rare itemsets mining algorithm based on RP-Tree and spark framework

    NASA Astrophysics Data System (ADS)

    Liu, Sainan; Pan, Haoan

    2018-05-01

    To address rare itemset mining in big data, this paper proposes a rare itemset mining algorithm based on RP-Tree and the Spark framework. First, it arranges the data vertically according to the transaction identifier; to avoid scanning the entire data set, the vertical datasets are divided into frequent vertical datasets and rare vertical datasets. Then, it adopts the RP-Tree algorithm to construct the frequent pattern tree that contains rare items and to generate rare 1-itemsets. After that, it calculates the support of the itemsets by scanning the two vertical datasets. Finally, it uses an iterative process to generate rare itemsets. The experiments show that the algorithm can effectively mine rare itemsets and has a clear advantage in execution time.
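
The vertical data layout described in this abstract can be sketched as a mapping from each item to the set of transaction identifiers that contain it, so that the support of an itemset is the size of an intersection and no full scan is needed. The code below is a plain-Python illustration under that assumption. It does not use Spark or build an RP-Tree, and the function names and toy data are hypothetical.

```python
def to_vertical(transactions):
    """Map each item to the set of transaction ids (tids) that contain it."""
    vertical = {}
    for tid, items in enumerate(transactions):
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical

def split_by_support(vertical, min_support):
    """Divide the vertical data into frequent and rare parts."""
    frequent = {i: t for i, t in vertical.items() if len(t) >= min_support}
    rare = {i: t for i, t in vertical.items() if len(t) < min_support}
    return frequent, rare

def itemset_support(itemset, vertical):
    """Support of an itemset = size of the intersection of its tid sets."""
    tidsets = [vertical.get(item, set()) for item in itemset]
    return len(set.intersection(*tidsets)) if tidsets else 0

# Toy data, invented for illustration.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "d"}, {"b"}]
vertical = to_vertical(transactions)
frequent, rare = split_by_support(vertical, min_support=2)
```

Here `c` and `d` fall into the rare part, and the support of a mixed itemset such as {a, d} is obtained by intersecting one tid set from each part.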

  3. Assessing Lightning and Wildfire Hazard by Land Properties and Cloud to Ground Lightning Data with Association Rule Mining in Alberta, Canada.

    PubMed

    Cha, DongHwan; Wang, Xin; Kim, Jeong Woo

    2017-10-23

    Hotspot analysis was implemented to find regions in the province of Alberta (Canada) with high frequency Cloud to Ground (CG) lightning strikes clustered together. Generally, hotspot regions are located in the central, central east, and south central regions of the study region. About 94% of annual lightning occurred during warm months (June to August) and the daily lightning frequency was influenced by the diurnal heating cycle. The association rule mining technique was used to investigate frequent CG lightning patterns, which were verified by similarity measurement to check the patterns' consistency. The similarity coefficient values indicated that there were high correlations throughout the entire study period. Most wildfires (about 93%) in Alberta occurred in forests, wetland forests, and wetland shrub areas. It was also found that lightning and wildfires occur in two distinct areas: frequent wildfire regions with a high frequency of lightning, and frequent wild-fire regions with a low frequency of lightning. Further, the preference index (PI) revealed locations where the wildfires occurred more frequently than in other class regions. The wildfire hazard area was estimated with the CG lightning hazard map and specific land use types.

  4. Pattern mining of user interaction logs for a post-deployment usability evaluation of a radiology PACS client.

    PubMed

    Jorritsma, Wiard; Cnossen, Fokie; Dierckx, Rudi A; Oudkerk, Matthijs; van Ooijen, Peter M A

    2016-01-01

    To perform a post-deployment usability evaluation of a radiology Picture Archiving and Communication System (PACS) client based on pattern mining of user interaction log data, and to assess the usefulness of this approach compared to a field study. All user actions performed on the PACS client were logged for four months. A data mining technique called closed sequential pattern mining was used to automatically extract frequently occurring interaction patterns from the log data. These patterns were used to identify usability issues with the PACS. The results of this evaluation were compared to the results of a field study based usability evaluation of the same PACS client. The interaction patterns revealed four usability issues: (1) the display protocols do not function properly, (2) the line measurement tool stays active until another tool is selected, rather than being deactivated after one use, (3) the PACS's built-in 3D functionality does not allow users to effectively perform certain 3D-related tasks, (4) users underuse the PACS's customization possibilities. All usability issues identified based on the log data were also found in the field study, which identified 48 issues in total. Post-deployment usability evaluation based on pattern mining of user interaction log data provides useful insights into the way users interact with the radiology PACS client. However, it reveals few usability issues compared to a field study and should therefore not be used as the sole method of usability evaluation. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  5. Assembler: Efficient Discovery of Spatial Co-evolving Patterns in Massive Geo-sensory Data.

    PubMed

    Zhang, Chao; Zheng, Yu; Ma, Xiuli; Han, Jiawei

    2015-08-01

    Recent years have witnessed the wide proliferation of geo-sensory applications wherein a bundle of sensors are deployed at different locations to cooperatively monitor the target condition. Given massive geo-sensory data, we study the problem of mining spatial co-evolving patterns (SCPs), i.e., groups of sensors that are spatially correlated and co-evolve frequently in their readings. SCP mining is of great importance to various real-world applications, yet it is challenging because (1) the truly interesting evolutions are often flooded by numerous trivial fluctuations in the geo-sensory time series; and (2) the pattern search space is extremely large due to the spatiotemporal combinatorial nature of SCP. In this paper, we propose a two-stage method called Assembler. In the first stage, Assembler filters trivial fluctuations using wavelet transform and detects frequent evolutions for individual sensors via a segment-and-group approach. In the second stage, Assembler generates SCPs by assembling the frequent evolutions of individual sensors. Leveraging the spatial constraint, it conceptually organizes all the SCPs into a novel structure called the SCP search tree, which facilitates the effective pruning of the search space to generate SCPs efficiently. Our experiments on both real and synthetic data sets show that Assembler is effective, efficient, and scalable.

  6. Bit-Table Based Biclustering and Frequent Closed Itemset Mining in High-Dimensional Binary Data

    PubMed Central

    Király, András; Abonyi, János

    2014-01-01

    During the last decade various algorithms have been developed and proposed for discovering overlapping clusters in high-dimensional data. The two most prominent application fields in this research, proposed independently, are frequent itemset mining (developed for market basket data) and biclustering (applied to gene expression data analysis). The common limitation of both methodologies is the limited applicability for very large binary data sets. In this paper we propose a novel and efficient method to find both frequent closed itemsets and biclusters in high-dimensional binary data. The method is based on simple but very powerful matrix and vector multiplication approaches that ensure that all patterns can be discovered in a fast manner. The proposed algorithm has been implemented in the commonly used MATLAB environment and is freely available to researchers. PMID:24616651
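
    As a rough illustration of how matrix multiplication on a bit-table yields itemset supports, the sketch below computes the product of the transposed table with itself in plain Python. It is not the authors' MATLAB code: the function names, the toy table, and the threshold are hypothetical. Entry (i, j) of the product counts the rows in which items i and j are both set, so the diagonal holds single-item supports and the off-diagonal entries hold pair supports.

```python
def cooccurrence(bit_table):
    """Return B^T * B for a binary table B (rows = transactions, columns = items)."""
    n_items = len(bit_table[0])
    counts = [[0] * n_items for _ in range(n_items)]
    for row in bit_table:
        for i in range(n_items):
            if row[i]:
                for j in range(n_items):
                    counts[i][j] += row[j]
    return counts

def frequent_pairs(bit_table, min_support):
    """Keep item pairs (i < j) whose co-occurrence count reaches min_support."""
    counts = cooccurrence(bit_table)
    n = len(counts)
    return {
        (i, j): counts[i][j]
        for i in range(n)
        for j in range(i + 1, n)
        if counts[i][j] >= min_support
    }

# Hypothetical 4-transaction, 4-item bit-table.
bit_table = [
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
]
print(frequent_pairs(bit_table, min_support=2))
```

    Extending pairs to larger closed itemsets would need further steps that this sketch does not attempt.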

  7. The association rules search of Indonesian university graduate’s data using FP-growth algorithm

    NASA Astrophysics Data System (ADS)

    Faza, S.; Rahmat, R. F.; Nababan, E. B.; Arisandi, D.; Effendi, S.

    2018-02-01

    The variety of attributes in university graduate data makes it difficult for the institution to find combinations of attributes that occur frequently and are strongly associated (integrated) with each other. Association rule mining is a data mining technique that determines the associations within the data, that is, the way one set of data affects another. In other words, it makes it possible to find such associations in large-scale data. The Frequent Pattern-Growth (FP-Growth) algorithm is an association rule mining technique that determines frequent itemsets using an FP-Tree data structure. From the research on the search of university graduate’s association rules, it can be concluded that the most common strongly associated attributes are the combination of State-owned High School outside Medan, regular university entrance exam, GPA of 3.00 to 3.49 and over 4-year-long study duration.
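
    The FP-Tree that FP-Growth mines can be sketched as follows. This is a minimal illustration of tree construction only, not the study's implementation; the class `FPNode`, the function `build_fp_tree`, the attribute labels, and the threshold are hypothetical. Infrequent items are dropped, the remaining items of each record are sorted by descending support, and each record is inserted as a path so that shared prefixes share nodes.

```python
from collections import Counter

class FPNode:
    """One node of an FP-Tree: an item, its count, and its children."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_support):
    """Build an FP-Tree from transactions, keeping items with count >= min_support."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    root = FPNode(None)
    for t in transactions:
        # Order by descending support, then by name so ties are deterministic.
        ordered = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent

# Hypothetical graduate records; attribute labels are illustrative only.
records = [
    ["regular_exam", "gpa_3.00_3.49", "study_over_4y"],
    ["regular_exam", "gpa_3.00_3.49"],
    ["regular_exam", "study_over_4y"],
    ["invitation", "gpa_3.50_4.00"],
]
root, frequent = build_fp_tree(records, min_support=2)
exam = root.children["regular_exam"]
print(exam.count, sorted(exam.children))
```

    The mining phase of FP-Growth (conditional pattern bases and recursive projection) is left out of this sketch.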

  8. Determination of Abutment Pressure in Coal Mines with Extremely Thick Alluvium Stratum: A Typical Kind of Rockburst Mines in China

    NASA Astrophysics Data System (ADS)

    Zhu, Sitao; Feng, Yu; Jiang, Fuxing

    2016-05-01

    This paper investigates the abutment pressure distribution in coal mines with extremely thick alluvium stratum (ETAS), a typical kind of mine that encounters frequent intense rockbursts in China. These rockbursts occur due to a poor understanding of the abutment pressure distribution pattern and the consequent inappropriate mine design. In this study, a theoretical computational model of abutment pressure for ETAS longwall panels is proposed based on the analysis of load transfer mechanisms of key stratum (KS) and ETAS. The model was applied to determine the abutment pressure distribution of LW2302S in Xinjulong Coal Mine; the results of stress and microseismic monitoring verified the rationality of this model. The calculated abutment pressure of LW2302S was also used in the terminal mining line design of LW2301N for rockburst prevention, successfully protecting the main roadway from the adverse influence of the abutment pressure.

  9. Mining moving object trajectories in location-based services for spatio-temporal database update

    NASA Astrophysics Data System (ADS)

    Guo, Danhuai; Cui, Weihong

    2008-10-01

    Advances in wireless transmission and mobile technology applied to LBS (Location-based Services) flood us with large amounts of moving object data. Vast amounts of data gathered from position sensors of mobile phones, PDAs, or vehicles hide interesting and valuable knowledge and describe the behavior of moving objects. The correlation between temporal moving patterns of moving objects and geo-feature spatio-temporal attributes has been ignored, and the value of spatio-temporal trajectory data has not been fully exploited either. Urban expansion or frequent town plan changes bring about a large amount of outdated or imprecise data in the spatial databases of LBS, which cannot be updated in a timely and efficient manner by manual processing. In this paper we introduce a data mining approach to movement pattern extraction of moving objects, build a model to describe the relationship between movement patterns of LBS mobile objects and their environment, and put forward a spatio-temporal database update strategy for LBS databases based on spatio-temporal trajectory mining. Experimental evaluation reveals excellent performance of the proposed model and strategy. Our original contributions include the formulation of a model of interaction between a trajectory and its environment, the design of a spatio-temporal database update strategy based on moving object data mining, and the experimental application of spatio-temporal database update by mining moving object trajectories.

  10. PIPE: a protein–protein interaction passage extraction module for BioCreative challenge

    PubMed Central

    Chu, Chun-Han; Su, Yu-Chen; Chen, Chien Chin; Hsu, Wen-Lian

    2016-01-01

    Identifying the interactions between proteins mentioned in biomedical literatures is one of the frequently discussed topics of text mining in the life science field. In this article, we propose PIPE, an interaction pattern generation module used in the Collaborative Biocurator Assistant Task at BioCreative V (http://www.biocreative.org/) to capture frequent protein-protein interaction (PPI) patterns within text. We also present an interaction pattern tree (IPT) kernel method that integrates the PPI patterns with convolution tree kernel (CTK) to extract PPIs. Methods were evaluated on LLL, IEPA, HPRD50, AIMed and BioInfer corpora using cross-validation, cross-learning and cross-corpus evaluation. Empirical evaluations demonstrate that our method is effective and outperforms several well-known PPI extraction methods. Database URL: PMID:27524807

  11. Crime Pattern Analysis: A Spatial Frequent Pattern Mining Approach

    DTIC Science & Technology

    2012-05-10

    econometrics. A companion to theoretical econometrics, pages 310-330, 1988. [5] L. Anselin, J. Cohen, D. Cook, W. Gorr, and G. Tita . Spatial analyses...52] G. Mohler, M. Short, P. Brantingham, F. Schoenberg, and G. Tita . Self-exciting point process modeling of crime. Journal of the American...Systems, 9:462, 2010. [69] M. Short, P. Brantingham, A. Bertozzi, and G. Tita . Dissipation and displacement of hotspots in reaction-diffusion models

  12. A framework for periodic outlier pattern detection in time-series sequences.

    PubMed

    Rasheed, Faraz; Alhajj, Reda

    2014-05-01

    Periodic pattern detection in time-ordered sequences is an important data mining task, which discovers in the time series all patterns that exhibit temporal regularities. Periodic pattern mining has a large number of applications in real life; it helps understanding the regular trend of the data along time, and enables the forecast and prediction of future events. An interesting related and vital problem that has not received enough attention is to discover outlier periodic patterns in a time series. Outlier patterns are defined as those which are different from the rest of the patterns; outliers are not noise. While noise does not belong to the data and it is mostly eliminated by preprocessing, outliers are actual instances in the data but have exceptional characteristics compared with the majority of the other instances. Outliers are unusual patterns that rarely occur, and, thus, have lesser support (frequency of appearance) in the data. Outlier patterns may hint toward discrepancy in the data such as fraudulent transactions, network intrusion, change in customer behavior, recession in the economy, epidemic and disease biomarkers, severe weather conditions like tornados, etc. We argue that detecting the periodicity of outlier patterns might be more important in many sequences than the periodicity of regular, more frequent patterns. In this paper, we present a robust and time efficient suffix tree-based algorithm capable of detecting the periodicity of outlier patterns in a time series by giving more significance to less frequent yet periodic patterns. Several experiments have been conducted using both real and synthetic data; all aspects of the proposed approach are compared with the existing algorithm InfoMiner; the reported results demonstrate the effectiveness and applicability of the proposed approach.

  13. Fault Tolerant Frequent Pattern Mining

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shohdy, Sameh; Vishnu, Abhinav; Agrawal, Gagan

    FP-Growth algorithm is a Frequent Pattern Mining (FPM) algorithm that has been extensively used to study correlations and patterns in large scale datasets. While several researchers have designed distributed memory FP-Growth algorithms, it is pivotal to consider fault tolerant FP-Growth, which can address the increasing fault rates in large scale systems. In this work, we propose a novel parallel, algorithm-level fault-tolerant FP-Growth algorithm. We leverage algorithmic properties and MPI advanced features to guarantee an O(1) space complexity, achieved by using the dataset memory space itself for checkpointing. We also propose a recovery algorithm that can use in-memory and disk-based checkpointing, though in many cases the recovery can be completed without any disk access, and incurring no memory overhead for checkpointing. We evaluate our FT algorithm on a large scale InfiniBand cluster with several large datasets using up to 2K cores. Our evaluation demonstrates excellent efficiency for checkpointing and recovery in comparison to the disk-based approach. We have also observed 20x average speed-up in comparison to Spark, establishing that a well designed algorithm can easily outperform a solution based on a general fault-tolerant programming model.

  14. Mining co-occurrence and sequence patterns from cancer diagnoses in New York State.

    PubMed

    Wang, Yu; Hou, Wei; Wang, Fusheng

    2018-01-01

    The goal of this study is to discover disease co-occurrence and sequence patterns from large scale cancer diagnosis histories in New York State. In particular, we want to identify disparities among different patient groups. Our study will provide essential knowledge for clinical researchers to further investigate comorbidities and disease progression for improving the management of multiple diseases. We used inpatient discharge and outpatient visit records from the New York State Statewide Planning and Research Cooperative System (SPARCS) from 2011-2015. We grouped each patient's visit history to generate diagnosis sequences for seven most popular cancer types. We performed frequent disease co-occurrence mining using the Apriori algorithm, and frequent disease sequence patterns discovery using the cSPADE algorithm. Different types of cancer demonstrated distinct patterns. Disparities of both disease co-occurrence and sequence patterns were observed from patients within different age groups. There were also considerable disparities in disease co-occurrence patterns with respect to different claim types (i.e., inpatient, outpatient, emergency department and ambulatory surgery). Disparities regarding genders were mostly found where the cancer types were gender specific. Supports of most patterns were usually higher for males than for females. Compared with secondary diagnosis codes, primary diagnosis codes can convey more stable results. Two disease sequences consisting of the same diagnoses but in different orders were usually with different supports. Our results suggest that the methods adopted can generate potentially interesting and clinically meaningful disease co-occurrence and sequence patterns, and identify disparities among various patient groups. These patterns could imply comorbidities and disease progressions.
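
    The difference between co-occurrence support and sequence support noted in this abstract can be shown with a small sketch. It does not reproduce Apriori or cSPADE; the function names and the toy diagnosis histories (written with ICD-10-style codes chosen only for illustration) are assumptions. Co-occurrence ignores order, while a sequence pattern must appear in order within a patient's history, so two sequences over the same diagnoses can have different supports.

```python
def contains_in_order(history, pattern):
    """True if the codes in `pattern` appear in `history` in the same order."""
    it = iter(history)
    # `code in it` advances the iterator, so later codes must come later.
    return all(code in it for code in pattern)

def sequence_support(histories, pattern):
    """Fraction of histories that contain `pattern` as an ordered subsequence."""
    return sum(contains_in_order(h, pattern) for h in histories) / len(histories)

def cooccurrence_support(histories, codes):
    """Fraction of histories that contain all `codes`, in any order."""
    codes = set(codes)
    return sum(codes <= set(h) for h in histories) / len(histories)

# Hypothetical patient histories, not SPARCS data.
histories = [
    ["C50", "I10", "E11"],
    ["I10", "C50", "E11"],
    ["C50", "E11", "I10"],
    ["C50", "I10"],
]
print(cooccurrence_support(histories, ["C50", "I10"]))
print(sequence_support(histories, ["C50", "I10"]))
print(sequence_support(histories, ["I10", "C50"]))
```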

  15. Comparison between BIDE, PrefixSpan, and TRuleGrowth for Mining of Indonesian Text

    NASA Astrophysics Data System (ADS)

    Sa'adillah Maylawati, Dian; Irfan, Mohamad; Budiawan Zulfikar, Wildan

    2017-01-01

    The mining process for the Indonesian language is still an interesting research topic. Multiple-word representation has been claimed to preserve the meaning of text better than bag of words. In this paper, we compare several sequential pattern algorithms, namely BIDE (BIDirectional Extension), PrefixSpan, and TRuleGrowth. All of these algorithms produce frequent word sequences that keep the meaning of the text. The experimental results, with 14,006 Indonesian tweets from Twitter, show that BIDE can produce more efficient frequent word sequences than PrefixSpan and TRuleGrowth without losing the meaning of the text. The average processing time of PrefixSpan is shorter than that of BIDE and TRuleGrowth. On the other hand, PrefixSpan and TRuleGrowth are more memory-efficient than BIDE.
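
    To make the idea of frequent word sequences concrete, the sketch below is a simplified PrefixSpan for sequences of single words. It is an illustrative assumption, not the implementation compared in the paper; the function name, the toy tweets, and the threshold are hypothetical. Each frequent prefix is grown by one word at a time, and only the suffixes that follow the prefix (the projected database) are searched for the next word.

```python
def prefixspan(sequences, min_support):
    """Return (pattern, support) pairs for frequent word subsequences."""
    results = []

    def mine(prefix, projected):
        # `projected` holds (sequence index, start position) pairs, i.e. the
        # suffix of each sequence that follows the current prefix.
        counts = {}
        for sid, start in projected:
            for word in set(sequences[sid][start:]):
                counts[word] = counts.get(word, 0) + 1
        for word in sorted(counts):
            if counts[word] < min_support:
                continue
            new_prefix = prefix + [word]
            results.append((tuple(new_prefix), counts[word]))
            new_projected = []
            for sid, start in projected:
                try:
                    pos = sequences[sid].index(word, start)
                except ValueError:
                    continue  # this suffix does not contain the word
                new_projected.append((sid, pos + 1))
            mine(new_prefix, new_projected)

    mine([], [(i, 0) for i in range(len(sequences))])
    return results

# Hypothetical tokenized tweets.
tweets = [
    ["saya", "suka", "kopi"],
    ["saya", "suka", "teh"],
    ["dia", "suka", "kopi"],
]
for pattern, count in prefixspan(tweets, min_support=2):
    print(pattern, count)
```

    BIDE would additionally keep only closed patterns; this sketch returns every frequent sequence.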

  16. Understanding Human Motion Skill with Peak Timing Synergy

    NASA Astrophysics Data System (ADS)

    Ueno, Ken; Furukawa, Koichi

    The careful observation of motion phenomena is important in understanding the skillful human motion. However, this is a difficult task due to the complexities in timing when dealing with the skilful control of anatomical structures. To investigate the dexterity of human motion, we decided to concentrate on timing with respect to motion, and we have proposed a method to extract the peak timing synergy from multivariate motion data. The peak timing synergy is defined as a frequent ordered graph with time stamps, which has nodes consisting of turning points in motion waveforms. A proposed algorithm, PRESTO automatically extracts the peak timing synergy. PRESTO comprises the following 3 processes: (1) detecting peak sequences with polygonal approximation; (2) generating peak-event sequences; and (3) finding frequent peak-event sequences using a sequential pattern mining method, generalized sequential patterns (GSP). Here, we measured right arm motion during the task of cello bowing and prepared a data set of the right shoulder and arm motion. We successfully extracted the peak timing synergy on cello bowing data set using the PRESTO algorithm, which consisted of common skills among cellists and personal skill differences. To evaluate the sequential pattern mining algorithm GSP in PRESTO, we compared the peak timing synergy by using GSP algorithm and the one by using filtering by reciprocal voting (FRV) algorithm as a non time-series method. We found that the support is 95 - 100% in GSP, while 83 - 96% in FRV and that the results by GSP are better than the one by FRV in the reproducibility of human motion. Therefore we show that sequential pattern mining approach is more effective to extract the peak timing synergy than non-time series analysis approach.

  17. A Method of Cross-Level Frequent Pattern Mining for Web-Based Instruction

    ERIC Educational Resources Information Center

    Huang, Yueh-Min; Chen, Juei-Nan; Cheng, Shu-Chen

    2007-01-01

    Due to the rise of e-Learning, more and more useful learning materials are open to public access. Therefore, an appropriate learning suggestion mechanism is an important tool to enable learners to work more efficiently. A smoother learning process increases the learning effect, avoiding unnecessarily difficult concepts and disorientation during…

  18. Improving the Scalability of an Exact Approach for Frequent Item Set Hiding

    ERIC Educational Resources Information Center

    LaMacchia, Carolyn

    2013-01-01

    Technological advances have led to the generation of large databases of organizational data recognized as an information-rich, strategic asset for internal analysis and sharing with trading partners. Data mining techniques can discover patterns in large databases including relationships considered strategically relevant to the owner of the data.…

  19. Mining the Temporal Dimension of the Information Propagation

    NASA Astrophysics Data System (ADS)

    Berlingerio, Michele; Coscia, Michele; Giannotti, Fosca

    In the last decade, Social Network Analysis has been a field in which the effort devoted by researchers in the Data Mining area has increased very quickly. Among the possible related topics, the study of information propagation in a network has attracted the interest of many researchers, including those from the industrial world. However, only a few answers to the question “How does information propagate over a network, why and how fast?” have been discovered so far. On the other hand, these answers are of great interest, since they help in the tasks of finding experts in a network, assessing viral marketing strategies, and identifying fast or slow paths of information inside a collaborative network. In this paper we study the problem of finding frequent patterns in a network with the help of two different techniques: TAS (Temporally Annotated Sequences) mining, aimed at extracting sequential patterns where each transition between two events is annotated with a typical transition time that emerges from input data, and Graph Mining, which is helpful for locally analyzing the nodes of the networks with their properties. Finally we show preliminary results on mining the information propagation over a network, obtained on two well known email datasets, that show the power of the combination of these two approaches.

  20. Application of remote-sensing techniques to hydrologic studies in selected coal-mine areas of southeastern Kansas

    USGS Publications Warehouse

    Kenny, J.F.; McCauley, J.R.

    1983-01-01

    Disturbances resulting from intensive coal mining in the Cherry Creek basin of southeastern Kansas were investigated using color and color-infrared aerial photography in conjunction with water-quality data from simultaneously acquired samples. Imagery was used to identify the type and extent of vegetative cover on strip-mined lands and the extent and success of reclamation practices. Drainage patterns, point sources of acid mine drainage, and recharge areas for underground mines were located for onsite inspection. Comparison of these interpretations with water-quality data illustrated differences between the eastern and western parts of the Cherry Creek basin. Contamination in the eastern part is due largely to circulation of water from unreclaimed strip mines and collapse features through the network of underground mines and subsequent discharge of acidic drainage through seeps. Contamination in the western part is primarily caused by runoff and seepage from strip-mined lands in which surfaces have frequently been graded and limed but are generally devoid of mature stands of soil-anchoring vegetation. The successful use of aerial photography in the study of Cherry Creek basin indicates the potential of using remote-sensing techniques in studies of other coal-mined regions. (USGS)

  1. Study of the distribution patterns of the constituent herbs in classical Chinese medicine prescriptions treating respiratory disease by data mining methods.

    PubMed

    Fu, Xian-Jun; Song, Xu-Xia; Wei, Lin-Bo; Wang, Zhen-Guo

    2013-08-01

    To provide the distribution pattern and compatibility laws of the constituent herbs in prescriptions, so that doctors can more conveniently choose correct herbs and prescriptions for treating respiratory disease. Classical prescriptions treating respiratory disease were selected from authoritative prescription books. Data mining methods (frequent itemsets and association rules) were used to analyze the regular patterns and compatibility laws of the constituent herbs in the selected prescriptions. A total of 562 prescriptions were selected for study. The results showed that Radix glycyrrhizae was the most frequently used herb, appearing in 47.2% of prescriptions; other frequently used herbs were Semen armeniacae amarum, Fructus schisandrae Chinese, Herba ephedrae, and Radix ginseng. Herba ephedrae was always coupled with Semen armeniacae amarum with a confidence of 73.3%, and many herbs were always accompanied by Radix glycyrrhizae with high confidence. Moreover, besides Radix glycyrrhizae and Semen armeniacae amarum, Fructus schisandrae Chinese, Herba ephedrae and Rhizoma pinelliae were most commonly used to treat cough, dyspnoea and associated sputum, respectively. The prescriptions treating dyspnoea often used the double herb group of Herba ephedrae & Radix glycyrrhizae, while prescriptions treating sputum often used the double herb groups of Rhizoma pinelliae & Radix glycyrrhizae and Rhizoma pinelliae & Semen armeniacae amarum, and the triple herb groups of Rhizoma pinelliae & Semen armeniacae amarum & Radix glycyrrhizae and Pericarpium citri reticulatae & Rhizoma pinelliae & Radix glycyrrhizae. The prescriptions treating respiratory disease showed common compatibility laws in using herbs and special compatibility laws for treating different respiratory symptoms. The principal patterns and special compatibility laws reported here could be useful for doctors in choosing correct herbs and prescriptions for treating respiratory disease.
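
    The support and confidence figures quoted in this abstract follow the usual association rule definitions, which the sketch below spells out. The toy prescriptions, the shortened herb labels, and the function names are hypothetical and do not reproduce the 562 prescriptions of the study. The confidence of a rule A -> B is the number of prescriptions containing both A and B divided by the number containing A.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions)

def support(itemset, transactions):
    """Fraction of transactions that contain `itemset`."""
    return support_count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    base = support_count(antecedent, transactions)
    joint = support_count(set(antecedent) | set(consequent), transactions)
    return joint / base if base else 0.0

# Hypothetical prescriptions, each a set of herbs (labels shortened).
prescriptions = [
    {"ephedrae", "armeniacae", "glycyrrhizae"},
    {"ephedrae", "armeniacae"},
    {"ephedrae", "glycyrrhizae"},
    {"pinelliae", "glycyrrhizae"},
    {"ephedrae", "armeniacae", "pinelliae"},
]
print(support({"ephedrae", "armeniacae"}, prescriptions))
print(confidence({"ephedrae"}, {"armeniacae"}, prescriptions))
```

    In this toy data the rule ephedrae -> armeniacae holds in 3 of the 4 prescriptions that contain ephedrae.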

  2. Application of Frequent Itemsets Mining to Analyze Patterns of One-Stop Visits in Taiwan

    PubMed Central

    Tu, Chun-Yi; Chen, Tzeng-Ji; Chou, Li-Fang

    2011-01-01

    Background: The free choice of health care facilities without limitations on frequency of visits within the National Health Insurance in Taiwan gives rise to not only a high number of annual ambulatory visits per capita but also a unique “one-stop shopping” phenomenon, which refers to a patient's visits to several specialties of the same healthcare facility in one day. The visits to multiple physicians would increase the potential risk of polypharmacy. The aim of this study was to analyze the frequency and patterns of one-stop visits in Taiwan. Methodology/Principal Findings: The claims datasets of 1 million nationally representative people within Taiwan's National Health Insurance in 2005 were used to calculate the number of patients with one-stop visits. Frequent itemsets mining was applied to compute the combination patterns of specialties in the one-stop visits. Among the total 13,682,469 ambulatory care visits in 2005, one-stop visits occurred 144,132 times and involved 296,822 visits (2.2% of all visits) by 66,294 (6.6%) persons. This behavior became more common with age, and the percentage reached 27.5% (5,662 in 20,579) in the age group ≥80 years. In general, women were more likely to have one-stop visits than men (7.2% vs. 6.0%). Internal medicine plus ophthalmology was the most frequent combination with a visited frequency of 3,552 times (2.5%), followed by cardiology plus neurology with 3,183 times (2.2%). The most frequent three-specialty combination, cardiology plus neurology and gastroenterology, occurred only 111 times. Conclusions/Significance: Without the novel computational technique, it would be hardly possible to analyze the extremely diverse combination patterns of specialties in one-stop visits. The results of the study could provide useful information either for the hospital manager to set up integrated services or for the policymaker to rebuild the health care system. PMID:21747926
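
    Counting specialty combinations of a given size, as this abstract describes, can be sketched with the standard library. The function name and the toy visit records are hypothetical and unrelated to the study's claims data. Each one-stop visit is treated as a set of specialties, and every combination of the requested size is tallied.

```python
from collections import Counter
from itertools import combinations

def combination_counts(visits, size):
    """Count how often each combination of `size` specialties occurs together."""
    counts = Counter()
    for specialties in visits:
        # Sort so that the same combination always yields the same tuple.
        for combo in combinations(sorted(set(specialties)), size):
            counts[combo] += 1
    return counts

# Hypothetical one-stop visits: specialties seen by one patient in one day.
visits = [
    ["internal medicine", "ophthalmology"],
    ["cardiology", "neurology"],
    ["cardiology", "neurology", "gastroenterology"],
    ["internal medicine", "ophthalmology"],
    ["internal medicine", "ophthalmology", "cardiology"],
]
pairs = combination_counts(visits, 2)
triples = combination_counts(visits, 3)
print(pairs.most_common(2))
print(dict(triples))
```

    Enumerating all combinations is only practical for small itemsets; frequent itemset algorithms prune candidates that fall below a support threshold.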

  3. Design pattern mining using distributed learning automata and DNA sequence alignment.

    PubMed

    Esmaeilpour, Mansour; Naderifar, Vahideh; Shukur, Zarina

    2014-01-01

    Over the last decade, design patterns have been used extensively to generate reusable solutions to frequently encountered problems in software engineering and object-oriented programming. A design pattern is a repeatable software design solution that provides a template for solving various instances of a general problem. This paper describes a new method for pattern mining that isolates design patterns and the relationships between them, together with a related tool, DLA-DNA, for all implemented patterns and all projects used for evaluation. DLA-DNA, which is based on distributed learning automata (DLA) and deoxyribonucleic acid (DNA) sequence alignment, achieves acceptable precision and recall compared with the other evaluated tools. The proposed method mines structural design patterns in object-oriented source code and extracts the strong and weak relationships between them, enabling analyzers and programmers to determine the dependency rate of each object, component, and other section of the code for parameter passing and modular programming. The proposed model can detect design patterns, and the strengths of their relationships, better than the other available tools, namely Pinot, PTIDEJ and DPJF. The results demonstrate that whenever the source code is built in a standard or non-standard way based on the design patterns, the result of the proposed method is close to that of DPJF and better than those of Pinot and PTIDEJ. The proposed model was tested on several source codes and compared with other related models and available tools; the results show that the precision and recall of the proposed method are, on average, 20% and 9.6% higher than Pinot, 27% and 31% higher than PTIDEJ, and 3.3% and 2% higher than DPJF, respectively. The proposed method is organized in two steps: in the first step, elemental design patterns are identified, and in the second step, they are composed to recognize actual design patterns.

  4. Assessing Lightning and Wildfire Hazard by Land Properties and Cloud to Ground Lightning Data with Association Rule Mining in Alberta, Canada

    PubMed Central

    Cha, DongHwan; Wang, Xin; Kim, Jeong Woo

    2017-01-01

    Hotspot analysis was implemented to find regions in the province of Alberta (Canada) with high frequency Cloud to Ground (CG) lightning strikes clustered together. Generally, hotspot regions are located in the central, central east, and south central regions of the study region. About 94% of annual lightning occurred during warm months (June to August) and the daily lightning frequency was influenced by the diurnal heating cycle. The association rule mining technique was used to investigate frequent CG lightning patterns, which were verified by similarity measurement to check the patterns’ consistency. The similarity coefficient values indicated that there were high correlations throughout the entire study period. Most wildfires (about 93%) in Alberta occurred in forests, wetland forests, and wetland shrub areas. It was also found that lightning and wildfires occur in two distinct areas: frequent wildfire regions with a high frequency of lightning, and frequent wild-fire regions with a low frequency of lightning. Further, the preference index (PI) revealed locations where the wildfires occurred more frequently than in other class regions. The wildfire hazard area was estimated with the CG lightning hazard map and specific land use types. PMID:29065564

  5. COPS: Detecting Co-Occurrence and Spatial Arrangement of Transcription Factor Binding Motifs in Genome-Wide Datasets

    PubMed Central

    Lohmann, Ingrid

    2012-01-01

    In multi-cellular organisms, spatiotemporal activity of cis-regulatory DNA elements depends on their occupancy by different transcription factors (TFs). In recent years, genome-wide ChIP-on-Chip, ChIP-Seq and DamID assays have been extensively used to unravel the combinatorial interaction of TFs with cis-regulatory modules (CRMs) in the genome. Even though genome-wide binding profiles are increasingly becoming available for different TFs, single TF binding profiles are in most cases not sufficient for dissecting complex regulatory networks. Thus, potent computational tools detecting statistically significant and biologically relevant TF-motif co-occurrences in genome-wide datasets are essential for analyzing context-dependent transcriptional regulation. We have developed COPS (Co-Occurrence Pattern Search), a new bioinformatics tool based on a combination of association rules and Markov chain models, which detects co-occurring TF binding sites (BSs) on genomic regions of interest. COPS scans DNA sequences for frequent motif patterns using a Frequent-Pattern tree based data mining approach, which allows efficient performance of the software with respect to both data structure and implementation speed, in particular when mining large datasets. Since transcriptional gene regulation very often relies on the formation of regulatory protein complexes mediated by closely adjoining TF binding sites on CRMs, COPS additionally detects preferred short distance between co-occurring TF motifs. The performance of our software with respect to biological significance was evaluated using three published datasets containing genomic regions that are independently bound by several TFs involved in a defined biological process. In sum, COPS is a fast, efficient and user-friendly tool mining statistically and biologically significant TFBS co-occurrences and therefore allows the identification of TFs that combinatorially regulate gene expression. PMID:23272209

  6. Spatio-Temporal Mining of PolSAR Satellite Image Time Series

    NASA Astrophysics Data System (ADS)

    Julea, A.; Meger, N.; Trouve, E.; Bolon, Ph.; Rigotti, C.; Fallourd, R.; Nicolas, J.-M.; Vasile, G.; Gay, M.; Harant, O.; Ferro-Famil, L.

    2010-12-01

    This paper presents an original data mining approach for describing Satellite Image Time Series (SITS) spatially and temporally. It relies on pixel-based evolution and sub-evolution extraction. These evolutions, namely the frequent grouped sequential patterns, are required to cover a minimum surface and to affect pixels that are sufficiently connected. These spatial constraints are actively used to face large data volumes and to select evolutions making sense for end-users. In this paper, a specific application to fully polarimetric SAR image time series is presented. Preliminary experiments performed on a RADARSAT-2 SITS covering the Chamonix Mont-Blanc test-site are used to illustrate the proposed approach.

  7. Differentially Private Frequent Subgraph Mining

    PubMed Central

    Xu, Shengzhi; Xiong, Li; Cheng, Xiang; Xiao, Ke

    2016-01-01

    Mining frequent subgraphs from a collection of input graphs is an important topic in data mining research. However, if the input graphs contain sensitive information, releasing frequent subgraphs may pose considerable threats to individual's privacy. In this paper, we study the problem of frequent subgraph mining (FGM) under the rigorous differential privacy model. We introduce a novel differentially private FGM algorithm, which is referred to as DFG. In this algorithm, we first privately identify frequent subgraphs from input graphs, and then compute the noisy support of each identified frequent subgraph. In particular, to privately identify frequent subgraphs, we present a frequent subgraph identification approach which can improve the utility of frequent subgraph identifications through candidates pruning. Moreover, to compute the noisy support of each identified frequent subgraph, we devise a lattice-based noisy support derivation approach, where a series of methods has been proposed to improve the accuracy of the noisy supports. Through formal privacy analysis, we prove that our DFG algorithm satisfies ε-differential privacy. Extensive experimental results on real datasets show that the DFG algorithm can privately find frequent subgraphs with high data utility. PMID:27616876
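
    The idea of a "noisy support" can be illustrated with the standard Laplace mechanism. The sketch below is not the DFG algorithm or its lattice-based derivation; it only assumes that adding or removing one input graph changes a support count by at most 1 (sensitivity 1). The function names and parameters are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def noisy_support(true_support, epsilon, sensitivity=1.0, rng=None):
    """Return the support count perturbed with Laplace(sensitivity / epsilon) noise.

    Assumption: one input graph changes the count by at most `sensitivity`.
    """
    rng = rng or random.Random()
    return true_support + laplace_noise(sensitivity / epsilon, rng)

# Smaller epsilon means stronger privacy and noisier released counts.
rng = random.Random(0)
strict = [noisy_support(100, epsilon=0.1, rng=rng) for _ in range(20000)]
loose = [noisy_support(100, epsilon=1.0, rng=rng) for _ in range(20000)]
```

    The noise is zero-mean, so averages over many releases stay near the true count, while a single release at small epsilon can be far from it.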

  8. Analysis of Human Mobility Based on Cellular Data

    NASA Astrophysics Data System (ADS)

    Arifiansyah, F.; Saptawati, G. A. P.

    2017-01-01

    Nowadays not only adults but even teenagers and children have their own mobile phones. This phenomenon indicates that the mobile phone has become an important part of everyday life. As a result, the amount of cellular data has also increased rapidly. Cellular data is defined as data that records communication among mobile phone users. Cellular data is easy to obtain because telecommunications companies already record it for their billing systems. Billing data keeps a log of each user's cellular data usage over time. From this data, we can obtain information about communication between users. Through a data visualization process, interesting patterns can be seen in the raw cellular data, so that users can obtain prior knowledge before performing data analysis. Cellular data can be processed using data mining to find human mobility patterns in the existing data. In this paper, we use frequent pattern mining and association rule discovery to observe the relations between attributes in cellular data and then visualize them. We used the Weka tools to find the rules in the data mining stage. In general, the use of cellular data can provide supporting information for the decision-making process and supply the solutions and information needed by decision makers.
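
    The support and confidence measures behind association rules can be sketched in a few lines. The authors used Weka; the plain Python below is only an illustration, and the attribute values (time, type, cell) are hypothetical, not taken from their data.

```python
from itertools import permutations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def pair_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for single-item rules."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in permutations(items, 2):
        s = support({a, b}, transactions)
        if s < min_support:
            continue
        # confidence(a -> b) = support(a and b) / support(a)
        confidence = s / support({a}, transactions)
        if confidence >= min_confidence:
            found.append((a, b, s, confidence))
    return found

# Hypothetical billing records, one set of attribute=value items per record.
records = [
    {"time=evening", "type=call", "cell=A"},
    {"time=evening", "type=call", "cell=B"},
    {"time=morning", "type=sms", "cell=A"},
    {"time=evening", "type=call", "cell=A"},
]
rules = pair_rules(records, min_support=0.5, min_confidence=0.9)
```

    On this toy data the only rules that pass both thresholds link evening records with calls, in both directions.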

  9. Informatic innovations in glycobiology: relevance to drug discovery.

    PubMed

    Mamitsuka, Hiroshi

    2008-02-01

    The recent development and applications of tree-based informatics on glycans have accelerated the biological analysis on glycans, particularly from structural viewpoints. We review three major aspects of recent informatics innovations on glycan structures: maturity of well-organized databases on glycan structures linking with other biological information, implementation of glycan structure matching algorithms and extensive development of methods for mining frequent patterns from glycan structures.

  10. Template for preparation of papers for IEEE sponsored conferences & symposia.

    PubMed

    Sacchi, L; Dagliati, A; Tibollo, V; Leporati, P; De Cata, P; Cerra, C; Chiovato, L; Bellazzi, R

    2015-01-01

    To improve access to medical information, it is necessary to design and implement integrated informatics techniques aimed at gathering data from different and heterogeneous sources. This paper describes the technologies used to integrate data coming from the electronic medical record of the IRCCS Fondazione Maugeri (FSM) hospital of Pavia, Italy, and to combine them with administrative and pharmacy drug purchase data coming from the local healthcare agency (ASL) of the Pavia area and with environmental open data of the same region. The integration process is focused on data coming from a cohort of one thousand patients diagnosed with Type 2 Diabetes Mellitus (T2DM). Data analysis and temporal data mining techniques have been integrated to enhance the initial dataset, making it possible to stratify patients using further information coming from the mined data, such as behavioral patterns of prescription-related drug purchases and other frequent clinical temporal patterns, through the use of an intuitive dashboard-controlled system.

  11. Visualizing frequent patterns in large multivariate time series

    NASA Astrophysics Data System (ADS)

    Hao, M.; Marwah, M.; Janetzko, H.; Sharma, R.; Keim, D. A.; Dayal, U.; Patnaik, D.; Ramakrishnan, N.

    2011-01-01

    The detection of previously unknown, frequently occurring patterns in time series, often called motifs, has been recognized as an important task. However, it is difficult to discover and visualize these motifs as their numbers increase, especially in large multivariate time series. To find frequent motifs, we use several temporal data mining and event encoding techniques to cluster and convert a multivariate time series to a sequence of events. Then we quantify the efficiency of the discovered motifs by linking them with a performance metric. To visualize frequent patterns in a large time series with potentially hundreds of nested motifs on a single display, we introduce three novel visual analytics methods: (1) motif layout, using colored rectangles for visualizing the occurrences and hierarchical relationships of motifs in a multivariate time series, (2) motif distortion, for enlarging or shrinking motifs as appropriate for easy analysis and (3) motif merging, to combine a number of identical adjacent motif instances without cluttering the display. Analysts can interactively optimize the degree of distortion and merging to get the best possible view. A specific motif (e.g., the most efficient or least efficient motif) can be quickly detected from a large time series for further investigation. We have applied these methods to two real-world data sets: data center cooling and oil well production. The results provide important new insights into the recurring patterns.
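
    Motif merging, method (3) above, can be pictured as collapsing runs of adjacent identical motif instances. The sketch below is one simple reading of that idea, not the authors' implementation; the (label, start, end) tuples and the `max_gap` parameter are hypothetical.

```python
def merge_adjacent(instances, max_gap=0):
    """Merge adjacent instances that share a label.

    `instances` holds (label, start, end) tuples. The result holds
    (label, start, end, count), where count is the number of merged instances.
    """
    merged = []
    for label, start, end in sorted(instances, key=lambda inst: inst[1]):
        if merged and merged[-1][0] == label and start - merged[-1][2] <= max_gap:
            prev = merged[-1]
            # Extend the previous run instead of drawing a new rectangle.
            merged[-1] = (label, prev[1], max(prev[2], end), prev[3] + 1)
        else:
            merged.append((label, start, end, 1))
    return merged

motifs = [("A", 0, 5), ("A", 5, 10), ("B", 10, 12), ("A", 12, 17)]
merged_motifs = merge_adjacent(motifs)
```

    Only the first two "A" instances are merged here, because the third is separated from them by a "B" instance.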

  12. A construction scheme of web page comment information extraction system based on frequent subtree mining

    NASA Astrophysics Data System (ADS)

    Zhang, Xiaowen; Chen, Bingfeng

    2017-08-01

    Based on the frequent subtree mining algorithm, this paper proposes a construction scheme for a web page comment information extraction system, referred to as the FSM system. It briefly introduces the overall system architecture and its modules, then describes the core of the system in detail, and finally presents a system prototype.

  13. Design Pattern Mining Using Distributed Learning Automata and DNA Sequence Alignment

    PubMed Central

    Esmaeilpour, Mansour; Naderifar, Vahideh; Shukur, Zarina

    2014-01-01

    Context Over the last decade, design patterns have been used extensively to generate reusable solutions to frequently encountered problems in software engineering and object-oriented programming. A design pattern is a repeatable software design solution that provides a template for solving various instances of a general problem. Objective This paper describes a new method for pattern mining that isolates design patterns and the relationships between them, and a related tool, DLA-DNA, for all implemented patterns and all projects used for evaluation. DLA-DNA, which is based on distributed learning automata (DLA) and deoxyribonucleic acid (DNA) sequence alignment, achieves acceptable precision and recall compared with the other evaluated tools. Method The proposed method mines structural design patterns in object-oriented source code and extracts the strong and weak relationships between them, enabling analyzers and programmers to determine the dependency rate of each object, component, and other section of the code for parameter passing and modular programming. The proposed model detects design patterns, and the strengths of their relationships, better than the other available tools, namely Pinot, PTIDEJ and DPJF. Results The results demonstrate that whether the source code is built in a standard or non-standard way with respect to the design patterns, the result of the proposed method is close to that of DPJF and better than those of Pinot and PTIDEJ. The proposed model was tested on several source codes and compared with other related models and available tools; the results show that the precision and recall of the proposed method are, on average, 20% and 9.6% higher than Pinot, 27% and 31% higher than PTIDEJ, and 3.3% and 2% higher than DPJF, respectively. Conclusion The proposed method is organized in two steps: in the first step, elemental design patterns are identified, and in the second step, these are composed to recognize actual design patterns. PMID:25243670

  14. The Interaction Network Ontology-supported modeling and mining of complex interactions represented with multiple keywords in biomedical literature.

    PubMed

    Özgür, Arzucan; Hur, Junguk; He, Yongqun

    2016-01-01

    The Interaction Network Ontology (INO) logically represents biological interactions, pathways, and networks. INO has been demonstrated to be valuable in providing a set of structured ontological terms and associated keywords to support literature mining of gene-gene interactions from biomedical literature. However, previous work using INO focused on single keyword matching, while many interactions are represented with two or more interaction keywords used in combination. This paper reports our extension of INO to include combinatory patterns of two or more literature mining keywords co-existing in one sentence to represent specific INO interaction classes. Such keyword combinations and related INO interaction type information could be automatically obtained via SPARQL queries, formatted in Excel format, and used in an INO-supported SciMiner, an in-house literature mining program. We studied the gene interaction sentences from the commonly used benchmark Learning Logic in Language (LLL) dataset and one internally generated vaccine-related dataset to identify and analyze interaction types containing multiple keywords. Patterns obtained from the dependency parse trees of the sentences were used to identify the interaction keywords that are related to each other and collectively represent an interaction type. The INO ontology currently has 575 terms including 202 terms under the interaction branch. The relations between the INO interaction types and associated keywords are represented using the INO annotation relations: 'has literature mining keywords' and 'has keyword dependency pattern'. The keyword dependency patterns were generated via running the Stanford Parser to obtain dependency relation types. Out of the 107 interactions in the LLL dataset represented with two-keyword interaction types, 86 were identified by using the direct dependency relations. The LLL dataset contained 34 gene regulation interaction types, each of which was associated with multiple keywords. A hierarchical display of these 34 interaction types and their ancestor terms in INO resulted in the identification of specific gene-gene interaction patterns from the LLL dataset. The phenomenon of having multi-keyword interaction types was also frequently observed in the vaccine dataset. By modeling and representing multiple textual keywords for interaction types, the extended INO enabled the identification of complex biological gene-gene interactions represented with multiple keywords.

  15. Comparative data mining analysis for information retrieval of MODIS images: monitoring lake turbidity changes at Lake Okeechobee, Florida

    NASA Astrophysics Data System (ADS)

    Chang, Ni-Bin; Daranpob, Ammarin; Yang, Y. Jeffrey; Jin, Kang-Ren

    2009-09-01

    In the remote sensing field, a frequently recurring question is: Which computational intelligence or data mining algorithms are most suitable for the retrieval of essential information given that most natural systems exhibit very high non-linearity. Among potential candidates might be empirical regression, neural network model, support vector machine, genetic algorithm/genetic programming, analytical equation, etc. This paper compares three types of data mining techniques, including multiple non-linear regression, artificial neural networks, and genetic programming, for estimating multi-temporal turbidity changes following hurricane events at Lake Okeechobee, Florida. This retrospective analysis aims to identify how the major hurricanes impacted the water quality management in 2003-2004. The Moderate Resolution Imaging Spectroradiometer (MODIS) Terra 8-day composite imageries were used to retrieve the spatial patterns of turbidity distributions for comparison against the visual patterns discernible in the in-situ observations. By evaluating four statistical parameters, the genetic programming model was finally selected as the most suitable data mining tool for classification in which the MODIS band 1 image and wind speed were recognized as the major determinants by the model. The multi-temporal turbidity maps generated before and after the major hurricane events in 2003-2004 showed that turbidity levels were substantially higher after hurricane episodes. The spatial patterns of turbidity confirm that sediment-laden water travels to the shore where it reduces the intensity of the light necessary to submerged plants for photosynthesis. This reduction results in substantial loss of biomass during the post-hurricane period.

  16. Towards comprehensive structural motif mining for better fold annotation in the "twilight zone" of sequence dissimilarity

    PubMed Central

    Jia, Yi; Huan, Jun; Buhr, Vincent; Zhang, Jintao; Carayannopoulos, Leonidas N

    2009-01-01

    Background Automatic identification of structure fingerprints from a group of diverse protein structures is challenging, especially for proteins whose divergent amino acid sequences may fall into the "twilight-" or "midnight-" zones where pair-wise sequence identities to known sequences fall below 25% and sequence-based functional annotations often fail. Results Here we report a novel graph database mining method and demonstrate its application to protein structure pattern identification and structure classification. The biologic motivation of our study is to recognize common structure patterns in "immunoevasins", proteins mediating virus evasion of host immune defense. Our experimental study, using both viral and non-viral proteins, demonstrates the efficiency and efficacy of the proposed method. Conclusion We present a theoretic framework, offer a practical software implementation for incorporating prior domain knowledge, such as substitution matrices as studied here, and devise an efficient algorithm to identify approximate matched frequent subgraphs. By doing so, we significantly expanded the analytical power of sophisticated data mining algorithms in dealing with large volume of complicated and noisy protein structure data. And without loss of generality, choice of appropriate compatibility matrices allows our method to be easily employed in domains where subgraph labels have some uncertainty. PMID:19208148

  17. A review of contrast pattern based data mining

    NASA Astrophysics Data System (ADS)

    Zhu, Shiwei; Ju, Meilong; Yu, Junfeng; Cai, Binlei; Wang, Aiping

    2015-07-01

    Contrast pattern based data mining is concerned with the mining of patterns and models that contrast two or more datasets. Contrast patterns can describe similarities or differences between the datasets. They represent strong contrast knowledge and have been shown to be very successful for constructing accurate and robust clusters and classifiers. The increasing use of contrast pattern data mining has initiated a great deal of research and development in the field of data mining. This paper gives a comprehensive review of existing contrast pattern based data mining research. The work is broadly categorized into background and representation, definitions and mining algorithms, contrast pattern based classification, clustering and other applications, and future research trends. The primary aim of this paper is to serve as a glossary that gives interested researchers an overall picture of current contrast based data mining development and helps them identify potential research directions for future investigation.

  18. Large-Scale Constraint-Based Pattern Mining

    ERIC Educational Resources Information Center

    Zhu, Feida

    2009-01-01

    We studied the problem of constraint-based pattern mining for three different data formats, item-set, sequence and graph, and focused on mining patterns of large sizes. Colossal patterns in each data formats are studied to discover pruning properties that are useful for direct mining of these patterns. For item-set data, we observed robustness of…

  19. Data mining of space heating system performance in affordable housing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ren, Xiaoxin; Yan, Da; Hong, Tianzhen

    The space heating in residential buildings accounts for a considerable amount of the primary energy use. Therefore, understanding the operation and performance of space heating systems becomes crucial in improving occupant comfort while reducing energy use. This study investigated the behavior of occupants adjusting their thermostat settings and heating system operations in a 62-unit affordable housing complex in Revere, Massachusetts, USA. Data mining methods, including a clustering approach and decision trees, were used to ascertain occupant behavior patterns. Data tabulating ON/OFF space heating states was assessed to provide a better understanding of the intermittent operation of space heating systems in terms of system cycling frequency and the duration of each operation. The decision tree was used to verify the link between room temperature settings, house and heating system characteristics and the heating energy use. The results suggest that the majority of apartments show fairly constant room temperature profiles with limited variations during a day or between weekday and weekend. Data clustering results revealed six typical patterns of room temperature profiles during the heating season. Space heating systems cycled more frequently than anticipated due to a tight range of room thermostat settings and potentially oversized heating capacities. In conclusion, this study affirms that data mining techniques are an effective method to analyze large datasets and extract hidden patterns to inform design and improve operations.

  20. Data mining of space heating system performance in affordable housing

    DOE PAGES

    Ren, Xiaoxin; Yan, Da; Hong, Tianzhen

    2015-02-16

    The space heating in residential buildings accounts for a considerable amount of the primary energy use. Therefore, understanding the operation and performance of space heating systems becomes crucial in improving occupant comfort while reducing energy use. This study investigated the behavior of occupants adjusting their thermostat settings and heating system operations in a 62-unit affordable housing complex in Revere, Massachusetts, USA. Data mining methods, including a clustering approach and decision trees, were used to ascertain occupant behavior patterns. Data tabulating ON/OFF space heating states was assessed to provide a better understanding of the intermittent operation of space heating systems in terms of system cycling frequency and the duration of each operation. The decision tree was used to verify the link between room temperature settings, house and heating system characteristics and the heating energy use. The results suggest that the majority of apartments show fairly constant room temperature profiles with limited variations during a day or between weekday and weekend. Data clustering results revealed six typical patterns of room temperature profiles during the heating season. Space heating systems cycled more frequently than anticipated due to a tight range of room thermostat settings and potentially oversized heating capacities. In conclusion, this study affirms that data mining techniques are an effective method to analyze large datasets and extract hidden patterns to inform design and improve operations.

  1. [Exploring the clinical characters of Shugan Jieyu capsule through text mining].

    PubMed

    Pu, Zheng-Ping; Xia, Jiang-Ming; Xie, Wei; He, Jin-Cai

    2017-09-01

    The study aimed to explore the clinical characteristics of Shugan Jieyu capsule through text mining. The data sets on Shugan Jieyu capsule were downloaded from the CMCC database by retrieving literature published from May 2009 to Jan 2016. Rules relating Chinese medical patterns, diseases, symptoms and combination treatments were mined using a data slicing algorithm and presented in frequency tables and a two-dimensional network. A total of 190 articles were included. The outcomes suggested that SC was most frequently correlated with liver Qi stagnation. Primary depression, depression due to brain disease, concomitant depression followed by physical diseases, concomitant depression followed by schizophrenia and functional dyspepsia were the main diseases treated by Shugan Jieyu capsule. Symptoms such as low mood, psychic anxiety, somatic anxiety and autonomic nerve dysfunction were mainly relieved by Shugan Jieyu capsule. For combination treatment, Shugan Jieyu capsule was most commonly used with paroxetine, sertraline and fluoxetine. The research suggested that the syndrome types and mining results for Shugan Jieyu capsule were almost the same as those in its instructions. Syndrome of malnutrition of heart spirit was the potential Chinese medical pattern of Shugan Jieyu capsule. Primary comorbid anxiety and depression, concomitant comorbid anxiety and depression followed by physical diseases, and postpartum depression were potential diseases treated by Shugan Jieyu capsule. For combination treatment, Shugan Jieyu capsule was most commonly used with paroxetine, sertraline and fluoxetine. Copyright© by the Chinese Pharmaceutical Association.

  2. 76 FR 35801 - Examinations of Work Areas in Underground Coal Mines and Pattern of Violations

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-06-20

    ..., 1219-AB73 Examinations of Work Areas in Underground Coal Mines and Pattern of Violations AGENCY: Mine... public hearings on the Agency's proposed rules for Examinations of Work Areas in Underground Coal Mines... Underground Coal Mines' submissions, and with ``RIN 1219-AB73'' for Pattern of Violations' submissions...

  3. Mining algorithm for association rules in big data based on Hadoop

    NASA Astrophysics Data System (ADS)

    Fu, Chunhua; Wang, Xiaojing; Zhang, Lijun; Qiao, Liying

    2018-04-01

    Traditional association rule mining algorithms can no longer meet the efficiency and scalability needs of mining large amounts of data. To address this problem, this paper takes FP-Growth as an example and parallelizes the algorithm on the Hadoop framework with the MapReduce model. On this basis, the algorithm is improved with a transaction reduction method to further enhance its mining efficiency. The experiments, run on a Hadoop cluster, cover verification of the parallel mining results, an efficiency comparison between serial and parallel execution, and the relationships between mining time and node count and between mining time and data volume. The experiments show that the parallel FP-Growth implementation accurately mines frequent item sets, with better performance and scalability. It can better meet the requirements of big data mining and efficiently mine frequent item sets and association rules from large datasets.
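
    The counting step of a MapReduce-style FP-Growth can be sketched in plain Python without Hadoop. This is not the paper's implementation: each list in `partitions` stands in for the input split of one mapper, and reading "transaction reduce" as dropping infrequent items and emptied transactions is an assumption made here for illustration.

```python
from collections import Counter
from functools import reduce

def map_count(partition):
    """Map phase: count, per partition, how many transactions hold each item."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))
    return counts

def reduce_counts(partials):
    """Reduce phase: sum the per-partition counts into global item counts."""
    return reduce(lambda a, b: a + b, partials, Counter())

def reduce_transactions(partition, counts, min_count):
    """Drop infrequent items, order the rest by descending count, skip empties."""
    reduced = []
    for transaction in partition:
        kept = sorted(
            (item for item in set(transaction) if counts[item] >= min_count),
            key=lambda item: (-counts[item], item),
        )
        if kept:
            reduced.append(kept)
    return reduced

# Two hypothetical input splits.
partitions = [[["a", "b", "c"], ["a", "b"]], [["a", "d"], ["b", "c"]]]
counts = reduce_counts(map_count(p) for p in partitions)
reduced = [reduce_transactions(p, counts, min_count=2) for p in partitions]
```

    The reduced, frequency-ordered transactions are the form in which FP-Growth inserts them into an FP-tree, so shrinking them first lowers the work in the later tree-building step.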

  4. A Segment-Based Trajectory Similarity Measure in the Urban Transportation Systems.

    PubMed

    Mao, Yingchi; Zhong, Haishi; Xiao, Xianjian; Li, Xiaofang

    2017-03-06

    With the rapid spread of handheld smart devices with built-in GPS, the trajectory data from GPS sensors has grown explosively. Trajectory data has spatio-temporal characteristics and rich information. Trajectory data processing techniques can be used to mine the patterns of human activities and the moving patterns of vehicles in intelligent transportation systems. A trajectory similarity measure is one of the most important issues in trajectory data mining (clustering, classification, frequent pattern mining, etc.). Unfortunately, the main similarity measure algorithms for trajectory data have been found to be inaccurate, highly sensitive to sampling methods, and not robust to noisy data. To solve these problems, three distances and their corresponding computation methods are proposed in this paper. The point-segment distance decreases the sensitivity to point sampling methods. The prediction distance optimizes the temporal distance using the features of trajectory data. The segment-segment distance introduces the trajectory shape factor into the similarity measurement to improve accuracy. The three kinds of distance are integrated with the traditional dynamic time warping (DTW) algorithm to propose a new segment-based dynamic time warping (SDTW) algorithm. The experimental results show that the SDTW algorithm exhibits about 57%, 86%, and 31% better accuracy than the longest common subsequence (LCSS) algorithm, the edit distance on real sequence (EDR) algorithm, and DTW, respectively, and that its sensitivity to noisy data is lower than that of those algorithms.
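
    For reference, the classic DTW baseline that SDTW extends can be written as a short dynamic program. The sketch below is the textbook point-to-point DTW with Euclidean distance, not the paper's SDTW; the sample trajectories are hypothetical.

```python
import math

def dtw(a, b):
    """Classic dynamic time warping cost between two point sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the best alignment cost of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

track = [(0, 0), (1, 0), (2, 0)]
resampled = [(0, 0), (1, 0), (1, 0), (2, 0)]  # same path, one repeated sample
shifted = [(0, 1), (1, 1), (2, 1)]            # same shape, one unit away
```

    DTW absorbs the repeated sample at no cost, while the shifted track pays one unit per aligned point. Point-to-point matching of this kind is what makes the measure depend on where samples happen to fall along a path.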

  5. 30 CFR 104.1 - Purpose and scope.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... Resources MINE SAFETY AND HEALTH ADMINISTRATION, DEPARTMENT OF LABOR PATTERN OF VIOLATIONS PATTERN OF... whether a mine operator has established a pattern of significant and substantial (S&S) violations at a mine. It implements section 104(e) of the Federal Mine Safety and Health Act of 1977 (Act) by...

  6. Mining Spatiotemporal Patterns of the Elder's Daily Movement

    NASA Astrophysics Data System (ADS)

    Chen, C. R.; Chen, C. F.; Liu, M. E.; Tsai, S. J.; Son, N. T.; Kinh, L. V.

    2016-06-01

    With rapid developments in wearable device technology, a vast amount of spatiotemporal data, such as people's movement and physical activities, is generated. Information derived from the data reveals important knowledge that can contribute to long-term care and psychological assessment of elders' living conditions, especially in long-term care institutions. This study aims to develop a method to investigate the spatiotemporal movement patterns of elders using their outdoor trajectory information. To achieve this goal, GPS-based location data of elderly subjects from long-term care institutions are collected and analysed with a geographic information system (GIS). A GIS statistical model is developed to mine the elderly subjects' spatiotemporal patterns from the location data and represent their daily movement patterns at particular times. The proposed method first finds the meaningful trajectories and extracts the frequent patterns from the time-stamped location data. Then, a density-based clustering method is used to identify the major moving range and the gather/stay hotspots in both spatial and temporal dimensions. The preliminary results indicate that the major moving area of the elderly subjects encompasses their dorm, and that they move only short distances and often stay at the same site. Subjects' outdoor appearances correspond to their life routines. The results can be useful for understanding elders' social network construction, risky area identification and medical care monitoring.

  7. Mining of high utility-probability sequential patterns from uncertain databases

    PubMed Central

    Zhang, Binbin; Fournier-Viger, Philippe; Li, Ting

    2017-01-01

    High-utility sequential pattern mining (HUSPM) has become an important issue in the field of data mining. Several HUSPM algorithms have been designed to mine high-utility sequential patterns (HUSPs). They have been applied in several real-life situations such as consumer behavior analysis and event detection in sensor networks. Nonetheless, most studies on HUSPM have focused on mining HUSPs in precise data. But in real life, uncertainty is an important factor, as data is collected using various types of sensors that are more or less accurate. Hence, data collected in a real-life database can be annotated with existing probabilities. This paper presents a novel pattern mining framework called high utility-probability sequential pattern mining (HUPSPM) for mining high utility-probability sequential patterns (HUPSPs) in uncertain sequence databases. A baseline algorithm with three optional pruning strategies is presented to mine HUPSPs. Moreover, to speed up the mining process, a projection mechanism is designed to create a database projection for each processed sequence, which is smaller than the original database. Thus, the number of unpromising candidates can be greatly reduced, as well as the execution time for mining HUPSPs. Substantial experiments on both real-life and synthetic datasets show that the designed algorithm performs well in terms of runtime, number of candidates, memory usage, and scalability for different minimum utility and minimum probability thresholds. PMID:28742847

  8. Genetic Programming and Frequent Itemset Mining to Identify Feature Selection Patterns of iEEG and fMRI Epilepsy Data

    PubMed Central

    Smart, Otis; Burrell, Lauren

    2014-01-01

    Pattern classification for intracranial electroencephalogram (iEEG) and functional magnetic resonance imaging (fMRI) signals has furthered epilepsy research toward understanding the origin of epileptic seizures and localizing dysfunctional brain tissue for treatment. Prior research has demonstrated that implicitly selecting features with a genetic programming (GP) algorithm more effectively determined the proper features to discern biomarker and non-biomarker interictal iEEG and fMRI activity than conventional feature selection approaches. However, for each of the iEEG and fMRI modalities, it is still uncertain whether the stochastic properties of indirect feature selection with a GP yield (a) consistent results within a patient data set and (b) features that are specific or universal across multiple patient data sets. We examined the reproducibility of implicitly selecting features to classify interictal activity using a GP algorithm by performing several selection trials and subsequent frequent itemset mining (FIM) for separate iEEG and fMRI epilepsy patient data. We observed within-subject consistency and across-subject variability with some small similarity for selected features, indicating a clear need for patient-specific features and a possible need for patient-specific feature selection and/or classification. For the fMRI, using nearest-neighbor classification and 30 GP generations, we obtained over 60% median sensitivity and over 60% median selectivity. For the iEEG, using nearest-neighbor classification and 30 GP generations, we obtained over 65% median sensitivity and over 65% median selectivity, except for one patient. PMID:25580059

  9. Building associations between markers of environmental stressors and adverse human health impacts using frequent itemset mining

    EPA Science Inventory

    Building associations between markers of exposure and effect using frequent itemset mining. The human-health impact of environmental contaminant exposures is unclear. While some exposure-effect relationships are well studied, health effects are unknown for the vast majority of the...

  10. Mining Recent Temporal Patterns for Event Detection in Multivariate Time Series Data

    PubMed Central

    Batal, Iyad; Fradkin, Dmitriy; Harrison, James; Moerchen, Fabian; Hauskrecht, Milos

    2015-01-01

    Improving the performance of classifiers using pattern mining techniques has been an active topic of data mining research. In this work we introduce the recent temporal pattern mining framework for finding predictive patterns for monitoring and event detection problems in complex multivariate time series data. This framework first converts time series into time-interval sequences of temporal abstractions. It then constructs more complex temporal patterns backwards in time using temporal operators. We apply our framework to health care data of 13,558 diabetic patients and show its benefits by efficiently finding useful patterns for detecting and diagnosing adverse medical conditions that are associated with diabetes. PMID:25937993
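
    The first step described above, converting a time series into a time-interval sequence of temporal abstractions, can be illustrated with a minimal sketch. The three-state value abstraction (L/N/H), the thresholds, and the function name `value_abstraction` are assumptions made for illustration; the abstract does not specify which abstractions the framework uses.

```python
from itertools import groupby


def value_abstraction(series, low, high):
    """Turn (time, value) samples into (state, start, end) intervals.

    Hypothetical three-state abstraction: "L" below `low`, "H" above
    `high`, "N" otherwise. Consecutive samples that share a state are
    merged into a single interval.
    """
    def state(value):
        if value < low:
            return "L"
        if value > high:
            return "H"
        return "N"

    labelled = [(t, state(v)) for t, v in series]
    intervals = []
    for label, group in groupby(labelled, key=lambda pair: pair[1]):
        times = [t for t, _ in group]
        intervals.append((label, times[0], times[-1]))
    return intervals


# Toy readings (made up for illustration), sorted by time.
readings = [(0, 90), (1, 95), (2, 150), (3, 160), (4, 100)]
print(value_abstraction(readings, low=70, high=140))
```

    Temporal patterns would then be built over such intervals with temporal operators; that step is not sketched here.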

  11. Frequency and pattern of Chinese herbal medicine prescriptions for urticaria in Taiwan during 2009: analysis of the national health insurance database

    PubMed Central

    2013-01-01

    Background: Large-scale pharmaco-epidemiological studies of Chinese herbal medicine (CHM) for treatment of urticaria are few, even though clinical trials showed some CHM are effective. The purpose of this study was to explore the frequencies and patterns of CHM prescriptions for urticaria by analysing the population-based CHM database in Taiwan. Methods: This study was linked to and processed through the complete traditional CHM database of the National Health Insurance Research Database in Taiwan during 2009. We calculated the frequencies and patterns of CHM prescriptions used for treatment of urticaria, of which the diagnosis was defined as the single ICD-9 Code of 708. Frequent itemset mining, as applied to data mining, was used to analyse co-prescription of CHM for patients with urticaria. Results: There were 37,386 subjects who visited traditional Chinese Medicine clinics for urticaria in Taiwan during 2009 and received a total of 95,765 CHM prescriptions. Subjects between 18 and 35 years of age comprised the largest number of those treated (32.76%). In addition, women used CHM for urticaria more frequently than men (female:male = 1.94:1). There was an average of 5.54 items prescribed in the form of either individual Chinese herbs or a formula in a single CHM prescription for urticaria. Bai-Xian-Pi (Dictamnus dasycarpus Turcz) was the most commonly prescribed single Chinese herb while Xiao-Feng San was the most commonly prescribed Chinese herbal formula. The most commonly prescribed CHM drug combination was Xiao-Feng San plus Bai-Xian-Pi while the most commonly prescribed triple drug combination was Xiao-Feng San, Bai-Xian-Pi, and Di-Fu Zi (Kochia scoparia). Conclusions: In view of the popularity of CHM such as Xiao-Feng San prescribed for the wind-heat pattern of urticaria in this study, a large-scale, randomized clinical trial is warranted to research their efficacy and safety. PMID:23947955

  12. Frequency and pattern of Chinese herbal medicine prescriptions for urticaria in Taiwan during 2009: analysis of the national health insurance database.

    PubMed

    Chien, Pei-Shan; Tseng, Yu-Fang; Hsu, Yao-Chin; Lai, Yu-Kai; Weng, Shih-Feng

    2013-08-15

    Large-scale pharmaco-epidemiological studies of Chinese herbal medicine (CHM) for treatment of urticaria are few, even though clinical trials showed some CHM are effective. The purpose of this study was to explore the frequencies and patterns of CHM prescriptions for urticaria by analysing the population-based CHM database in Taiwan. This study was linked to and processed through the complete traditional CHM database of the National Health Insurance Research Database in Taiwan during 2009. We calculated the frequencies and patterns of CHM prescriptions used for treatment of urticaria, of which the diagnosis was defined as the single ICD-9 Code of 708. Frequent itemset mining, as applied to data mining, was used to analyse co-prescription of CHM for patients with urticaria. There were 37,386 subjects who visited traditional Chinese Medicine clinics for urticaria in Taiwan during 2009 and received a total of 95,765 CHM prescriptions. Subjects between 18 and 35 years of age comprised the largest number of those treated (32.76%). In addition, women used CHM for urticaria more frequently than men (female:male = 1.94:1). There was an average of 5.54 items prescribed in the form of either individual Chinese herbs or a formula in a single CHM prescription for urticaria. Bai-Xian-Pi (Dictamnus dasycarpus Turcz) was the most commonly prescribed single Chinese herb while Xiao-Feng San was the most commonly prescribed Chinese herbal formula. The most commonly prescribed CHM drug combination was Xiao-Feng San plus Bai-Xian-Pi while the most commonly prescribed triple drug combination was Xiao-Feng San, Bai-Xian-Pi, and Di-Fu Zi (Kochia scoparia). In view of the popularity of CHM such as Xiao-Feng San prescribed for the wind-heat pattern of urticaria in this study, a large-scale, randomized clinical trial is warranted to research their efficacy and safety.
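
    Counting the most common two- and three-item co-prescriptions, as reported above, amounts to frequent itemset mining restricted to a fixed itemset size. The following is a minimal sketch under that assumption; the function name `frequent_combinations` and the placeholder items "A" to "D" are hypothetical, and the toy prescriptions are not taken from the study's database.

```python
from collections import Counter
from itertools import combinations


def frequent_combinations(prescriptions, size, min_count):
    """Count every `size`-item combination that is co-prescribed.

    Returns (combination, count) pairs with count >= min_count,
    most frequent first.
    """
    counts = Counter()
    for items in prescriptions:
        # Sort so that the same combination always has the same key.
        for combo in combinations(sorted(set(items)), size):
            counts[combo] += 1
    return [(combo, n) for combo, n in counts.most_common() if n >= min_count]


# Toy prescriptions with placeholder item names (illustrative only).
toy = [["A", "B", "C"], ["A", "B"], ["B", "C"], ["A", "B", "D"]]
print(frequent_combinations(toy, size=2, min_count=2))
```

    Enumerating all combinations is only practical for small itemset sizes; level-wise or pattern-growth miners prune this search using the minimum count.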

  13. Discovering interesting molecular substructures for molecular classification.

    PubMed

    Lam, Winnie W M; Chan, Keith C C

    2010-06-01

    Given a set of molecular structure data preclassified into a number of classes, the molecular classification problem is concerned with discovering interesting structural patterns in the data so that "unseen" molecules not originally in the dataset can be accurately classified. To tackle the problem, interesting molecular substructures have to be discovered, and this is typically done by first representing molecular structures as molecular graphs and then using graph-mining algorithms to discover frequently occurring subgraphs in them. These subgraphs are then used to characterize different classes for molecular classification. While such an approach can be very effective, it should be noted that a substructure that occurs frequently in one class may also occur in another. Discovering frequent subgraphs for molecular classification may, therefore, not always be the most effective approach. In this paper, we propose a novel technique called mining interesting substructures in molecular data for classification (MISMOC) that can discover interesting frequent subgraphs not just for characterizing a molecular class but also for distinguishing it from the others. Using a test statistic, MISMOC screens each frequent subgraph to determine whether it is interesting. For those that are interesting, their degrees of interestingness are determined using an information-theoretic measure. When classifying an unseen molecule, its structure is matched against the interesting subgraphs in each class, and a total interestingness measure for classifying the unseen molecule into a particular class is determined, based on the interestingness of each matched subgraph. The performance of MISMOC is evaluated using both artificial and real datasets, and the results show that it can be an effective approach for molecular classification.

  14. Percolator: Scalable Pattern Discovery in Dynamic Graphs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Choudhury, Sutanay; Purohit, Sumit; Lin, Peng

    We demonstrate Percolator, a distributed system for graph pattern discovery in dynamic graphs. In contrast to conventional mining systems, Percolator advocates efficient pattern mining schemes that (1) support pattern detection with keywords; (2) integrate incremental and parallel pattern mining; and (3) support analytical queries such as trend analysis. The core idea of Percolator is to dynamically decide and verify a small fraction of patterns and their instances that must be inspected in response to buffered updates in dynamic graphs, with a total mining cost independent of graph size. We demonstrate a) the feasibility of incremental pattern mining by walking through each component of Percolator, b) the efficiency and scalability of Percolator over the sheer size of real-world dynamic graphs, and c) how the user-friendly GUI of Percolator interacts with users to support keyword-based queries that detect, browse and inspect trending patterns. We also demonstrate two use cases of Percolator, in social media trend analysis and academic collaboration analysis, respectively.

  15. A Node Linkage Approach for Sequential Pattern Mining

    PubMed Central

    Navarro, Osvaldo; Cumplido, René; Villaseñor-Pineda, Luis; Feregrino-Uribe, Claudia; Carrasco-Ochoa, Jesús Ariel

    2014-01-01

    Sequential Pattern Mining is a widely addressed problem in data mining, with applications such as analyzing Web usage, examining purchase behavior, and text mining, among others. Nevertheless, with the dramatic increase in data volume, the current approaches prove inefficient when dealing with large input datasets, a large number of different symbols and low minimum supports. In this paper, we propose a new sequential pattern mining algorithm, which follows a pattern-growth scheme to discover sequential patterns. Unlike most pattern growth algorithms, our approach does not build a data structure to represent the input dataset, but instead accesses the required sequences through pseudo-projection databases, achieving better runtime and reducing memory requirements. Our algorithm traverses the search space in a depth-first fashion and only preserves in memory a pattern node linkage and the pseudo-projections required for the branch being explored at the time. Experimental results show that our new approach, the Node Linkage Depth-First Traversal algorithm (NLDFT), has better performance and scalability in comparison with state of the art algorithms. PMID:24933123
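
    To illustrate the two ideas named above, depth-first pattern growth and pseudo-projection, here is a minimal PrefixSpan-style sketch. It is not the NLDFT algorithm and omits its node linkage; it assumes sequences of single items, and the function name `mine_sequences` is hypothetical. A pseudo-projection is stored as (sequence index, offset) pairs, so no projected database is ever copied.

```python
from collections import defaultdict


def mine_sequences(sequences, min_support):
    """Return {pattern tuple: support} for all frequent subsequences."""
    results = {}

    def grow(prefix, projection):
        # projection: (sequence index, offset where the suffix starts)
        extensions = defaultdict(list)
        for sid, start in projection:
            seen = set()
            seq = sequences[sid]
            for pos in range(start, len(seq)):
                item = seq[pos]
                if item not in seen:  # keep only the first occurrence
                    seen.add(item)
                    extensions[item].append((sid, pos + 1))
        for item in sorted(extensions):
            projected = extensions[item]
            if len(projected) >= min_support:
                pattern = prefix + (item,)
                results[pattern] = len(projected)
                grow(pattern, projected)  # depth-first: one branch at a time

    grow((), [(sid, 0) for sid in range(len(sequences))])
    return results


# Toy sequence database (illustrative only).
db = [list("abcb"), list("acb"), list("bca")]
for pattern, support in sorted(mine_sequences(db, min_support=2).items()):
    print("".join(pattern), support)
```

    Only the projections along the branch currently being explored are alive at any time, which is the memory behaviour the abstract attributes to depth-first traversal.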

  16. Method of locating underground mines fires

    DOEpatents

    Laage, Linneas; Pomroy, William

    1992-01-01

    An improved method of locating an underground mine fire by comparing the pattern of measured combustion product arrival times at detector locations with a real-time computer-generated array of simulated patterns. A number of electronic fire detection devices are linked through telemetry to a control station on the surface. The mine's ventilation is modeled on a digital computer using network analysis software. The time required to locate a fire consists of the time required to model the mine's ventilation, generate the arrival time array, scan the array, and match measured arrival time patterns to the simulated patterns.

  17. Efficient discovery of risk patterns in medical data.

    PubMed

    Li, Jiuyong; Fu, Ada Wai-chee; Fahey, Paul

    2009-01-01

    This paper studies the problem of efficiently discovering risk patterns in medical data. Risk patterns are defined by a statistical metric, relative risk, which has been widely used in epidemiological research. To avoid fruitless search in the complete exploration of risk patterns, we define an optimal risk pattern set to exclude superfluous patterns, i.e. complicated patterns with lower relative risk than their corresponding simpler form patterns. We prove that mining optimal risk pattern sets conforms to an anti-monotone property that supports an efficient mining algorithm. We propose an efficient algorithm for mining optimal risk pattern sets based on this property. We also propose a hierarchical structure to present discovered patterns for easy perusal by domain experts. The proposed approach is compared with two well-known rule discovery methods, decision tree and association rule mining approaches, on benchmark data sets and applied to a real world application. The proposed method discovers more and better quality risk patterns than a decision tree approach. The decision tree method is not designed for such applications and is inadequate for pattern exploring. The proposed method does not discover a large number of uninteresting superfluous patterns as an association mining approach does. The proposed method is more efficient than an association rule mining method. A real world case study shows that the method reveals some interesting risk patterns to medical practitioners. The proposed method is an efficient approach to explore risk patterns. It quickly identifies cohorts of patients that are vulnerable to a risk outcome from a large data set. The proposed method is useful for exploratory study on large medical data to generate and refine hypotheses. The method is also useful for designing medical surveillance systems.
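
    Relative risk, the metric that defines a risk pattern above, is the outcome rate among records matching a pattern divided by the outcome rate among the remaining records. The sketch below computes it for attribute-set patterns; the record layout, the function names, and the toy records are assumptions for illustration and do not reproduce the paper's mining algorithm.

```python
def relative_risk(exposed_cases, exposed_total, other_cases, other_total):
    """Risk in the exposed group divided by risk in the other group."""
    if exposed_total <= 0 or other_total <= 0:
        raise ValueError("both groups must be non-empty")
    if other_cases == 0:
        return float("inf")
    # Cross-multiplied form of (a / n1) / (c / n2).
    return (exposed_cases * other_total) / (other_cases * exposed_total)


def pattern_relative_risk(records, pattern):
    """records: list of (attribute set, has_outcome) pairs."""
    pattern = set(pattern)
    matched = [outcome for attrs, outcome in records if pattern <= attrs]
    rest = [outcome for attrs, outcome in records if not pattern <= attrs]
    return relative_risk(sum(matched), len(matched), sum(rest), len(rest))


# Toy cohort (made up for illustration).
cohort = [
    ({"smoker", "male"}, True), ({"smoker", "female"}, True),
    ({"smoker", "male"}, False), ({"male"}, False),
    ({"female"}, True), ({"female"}, False),
    ({"male"}, False), ({"female"}, False),
]
print(pattern_relative_risk(cohort, {"smoker"}))
print(pattern_relative_risk(cohort, {"smoker", "male"}))
```

    In this toy cohort the longer pattern {smoker, male} has a lower relative risk than its simpler form {smoker}, so it is the kind of superfluous pattern that an optimal risk pattern set would exclude.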

  18. A prospective cohort study among new Chinese coal miners: the early pattern of lung function change.

    PubMed

    Wang, M-L; Wu, Z-E; Du, Q-G; Petsonk, E L; Peng, K-L; Li, Y-D; Li, S-K; Han, G-H; Atffield, M D

    2005-11-01

    To investigate the early pattern of longitudinal change in forced expiratory volume in 1 second (FEV1) among new Chinese coal miners, and the relation between coal mine dust exposure and the decline of lung function. The early pattern of lung function changes in 317 newly hired Chinese underground coal miners was compared to 132 referents. This three year prospective cohort study involved a pre-employment and 15 follow up health surveys, including a questionnaire and spirometry tests. Twice a month, total and respirable dust area sampling was done. The authors used a two stage analysis and a linear mixed effects model approach to analyse the longitudinal spirometry data, and to investigate the changes in FEV1 over time, controlling for age, height, pack years of smoking, mean respirable dust concentration, the room temperature during testing, and the group × time interaction terms. FEV1 change over time in new miners is non-linear. New miners experience initial rapid FEV1 declines, primarily during the first year of mining, little change during the second year, and partial recovery during the third year. Both linear and quadratic time trends in FEV1 change are highly significant. Smoking miners lost more FEV1 than non-smokers. Referents, all aged less than 20 years, showed continued lung growth, whereas the miners who were under age 20 exhibited a decline in FEV1. Dust and smoking affect lung function in young, newly hired Chinese coal miners. FEV1 change over the first three years of employment is non-linear. The findings have implications for both methods and interpretation of medical screening in coal mining and other dusty work: during the first several years of employment more frequent testing may be desirable, and caution is required in interpreting early FEV1 declines.

  19. 76 FR 25277 - Examinations of Work Areas in Underground Coal Mines and Pattern of Violations

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-05-04

    ..., 1219-AB73 Examinations of Work Areas in Underground Coal Mines and Pattern of Violations AGENCY: Mine... four public hearings on the Agency's proposed rules for Examinations of Work Areas in Underground Coal... 1219-AB75'' for Examinations of Work Areas in Underground Coal Mines' submissions, and with ``RIN 1219...

  20. Monitoring coal mine changes and their impact on landscape patterns in an alpine region: a case study of the Muli coal mine in the Qinghai-Tibet Plateau.

    PubMed

    Qian, Dawen; Yan, Changzhen; Xing, Zanpin; Xiu, Lina

    2017-10-14

    The Muli coal mine is the largest open-cast coal mine in the Qinghai-Tibet Plateau, and it consists of two independent mining sites named Juhugeng and Jiangcang. It has received much attention due to the ecological problems caused by rapid expansion in recent years. The objective of this paper was to monitor the mining area and its surrounding land cover over the period 1976-2016 utilizing Landsat images, and the network structure of land cover changes was determined to visualize the relationships and pattern of the mining-induced land cover changes. In addition, the responses of the surrounding landscape pattern were analysed by constructing gradient transects. The results show that the mining area was increasing in size, especially after 2000 (increased by 71.68 km²), and this caused shrinkage of the surrounding lands, including alpine meadow wetland (53.44 km²), alpine meadow (6.28 km²) and water (6.24 km²). The network structure of the mining area revealed the changes in lands surrounding the mining area. The impact of mining development on landscape patterns was mainly distributed within a range of 1-6 km. Alpine meadow wetland was most affected in Juhugeng, while alpine meadow was most affected in Jiangcang. The results of this study provide a reference for the ecological assessment and restoration of the Muli coal mine land.

  1. Effects of host-plant population size and plant sex on a specialist leaf-miner

    NASA Astrophysics Data System (ADS)

    Bañuelos, María-José; Kollmann, Johannes

    2011-03-01

    Animal population density has been related to resource patch size through various hypotheses such as those derived from island biogeography and resource concentration theory. This theoretical framework can be also applied to plant-herbivore interactions, and it can be modified by the sex of the host-plant, and density-dependent relationships. Leaf-miners are specialised herbivores that leave distinct traces on infested leaves in the form of egg scars, mines, signs of predation and emergence holes. This allows the life cycle of the insect to be reconstructed and the success at the different stages to be estimated. The main stages of the leaf-miner Phytomyza ilicis were recorded in eleven populations of the evergreen host Ilex aquifolium in Denmark. Survival rates were calculated and related to population size, sex of the host plant, and egg and mine densities. Host population size was negatively related to leaf-miner prevalence, with larger egg and mine densities in small populations. Percentage of eggs hatching and developing into mines, and percentage of adult flies emerging from mines also differed among host populations, but were not related to population size or host cover. Feeding punctures left by adults were marginally more frequent on male plants, whereas egg scars and mines were more common on females. Overall survival rate from egg stage to adult emergence was higher on female plants. Egg density was negatively correlated with hatching, while mine density was positively correlated with emergence of the larvae. The inverse effects of host population size were not in line with predictions based on island biogeography and resource concentration theory. We discuss how a thorough knowledge of the immigration behaviour of this fly might help to understand the patterns found.

  2. Working with Data: Discovering Knowledge through Mining and Analysis; Systematic Knowledge Management and Knowledge Discovery; Text Mining; Methodological Approach in Discovering User Search Patterns through Web Log Analysis; Knowledge Discovery in Databases Using Formal Concept Analysis; Knowledge Discovery with a Little Perspective.

    ERIC Educational Resources Information Center

    Qin, Jian; Jurisica, Igor; Liddy, Elizabeth D.; Jansen, Bernard J; Spink, Amanda; Priss, Uta; Norton, Melanie J.

    2000-01-01

    These six articles discuss knowledge discovery in databases (KDD). Topics include data mining; knowledge management systems; applications of knowledge discovery; text and Web mining; text mining and information retrieval; user search patterns through Web log analysis; concept analysis; data collection; and data structure inconsistency. (LRW)

  3. Differential affinities of MinD and MinE to anionic phospholipid influence Min Patterning dynamics in vitro

    PubMed Central

    Vecchiarelli, Anthony G.; Li, Min; Mizuuchi, Michiyo; Mizuuchi, Kiyoshi

    2014-01-01

    The E. coli Min system forms a cell-pole-to-cell-pole oscillator that positions the divisome at mid-cell. The MinD ATPase binds the membrane and recruits the cell division inhibitor MinC. MinE interacts with and releases MinD (and MinC) from the membrane. The chase of MinD by MinE creates the in vivo oscillator that maintains a low level of the division inhibitor at mid-cell. In vitro reconstitution and visualization of Min proteins on a supported lipid bilayer has provided significant advances in understanding Min patterns in vivo. Here we studied the effects of flow, lipid composition, and salt concentration on Min patterning. Flow and no-flow conditions both supported Min protein patterns with somewhat different characteristics. Without flow, MinD and MinE formed spiraling waves. MinD and, to a greater extent MinE, have stronger affinities for anionic phospholipid. MinD-independent binding of MinE to anionic lipid resulted in slower and narrower waves. MinE binding to the bilayer was also more susceptible to changes in ionic strength than MinD. We find that modulating protein diffusion with flow, or membrane binding affinities with changes in lipid composition or salt concentration, can differentially affect the retention time of MinD and MinE, leading to spatiotemporal changes in Min patterning. PMID:24930948

  4. Data Mining in Social Media

    NASA Astrophysics Data System (ADS)

    Barbier, Geoffrey; Liu, Huan

    The rise of online social media is providing a wealth of social network data. Data mining techniques provide researchers and practitioners the tools needed to analyze large, complex, and frequently changing social media data. This chapter introduces the basics of data mining, reviews social media, discusses how to mine social media data, and highlights some illustrative examples with an emphasis on social networking sites and blogs.

  5. Automating Network Node Behavior Characterization by Mining Communication Patterns

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Carroll, Thomas E.; Chikkagoudar, Satish; Arthur-Durett, Kristine M.

    Enterprise networks of scale are complex, dynamic computing environments that respond to evolving business objectives and requirements. Characterizing system behaviors in these environments is essential for network management and cyber security operations. Characterization of system’s communication is typical and is supported using network flow information (NetFlow). Related work has characterized behavior using theoretical graph metrics; results are often difficult to interpret by enterprise staff. We propose a different approach, where flow information is mapped to sets of tags that contextualize the data in terms of network principals and enterprise concepts. Frequent patterns are then extracted and are expressed as behaviors. Behaviors can be compared, identifying systems expressing similar behaviors. We evaluate the approach using flow information collected by a third party.

  6. A Framework for Spatial Interaction Analysis Based on Large-Scale Mobile Phone Data

    PubMed Central

    Li, Weifeng; Cheng, Xiaoyun; Guo, Gaohua

    2014-01-01

    An overall understanding of spatial interaction and exact knowledge of its dynamic evolution are required in urban planning and transportation planning. This study aimed to analyze spatial interaction based on large-scale mobile phone data. The newly arisen mass dataset required a new methodology compatible with its peculiar characteristics. A three-stage framework was proposed in this paper, including data preprocessing, critical activity identification, and spatial interaction measurement. The proposed framework introduced frequent pattern mining and measured the spatial interaction by the obtained association. A case study of three communities in Shanghai was carried out as verification of the proposed method and demonstration of its practical application. The spatial interaction patterns and the representative features proved the rationality of the proposed framework. PMID:25435865

  7. Negative and Positive Association Rules Mining from Text Using Frequent and Infrequent Itemsets

    PubMed Central

    Mahmood, Sajid; Shahbaz, Muhammad; Guergachi, Aziz

    2014-01-01

    Association rule mining research typically focuses on positive association rules (PARs), generated from frequently occurring itemsets. However, in recent years, there has been significant research focused on finding interesting infrequent itemsets, leading to the discovery of negative association rules (NARs). The discovery of infrequent itemsets is far more difficult than that of their counterparts, that is, frequent itemsets. The difficulties include the discovery of infrequent itemsets, the generation of accurate NARs, and their huge number as compared with positive association rules. In medical science, for example, one is interested in factors which can either adjudicate the presence of a disease or rule out its possibility. The vivid positive symptoms are often obvious; however, negative symptoms are subtler and more difficult to recognize and diagnose. In this paper, we propose an algorithm for discovering positive and negative association rules among frequent and infrequent itemsets. We identify associations among medications, symptoms, and laboratory results using state-of-the-art data mining technology. PMID:24955429
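
    The distinction between positive and negative rules can be made concrete with the standard support and confidence definitions: for a positive rule X → Y the confidence is supp(X ∪ Y) / supp(X), and for the negative rule X → ¬Y it is (supp(X) − supp(X ∪ Y)) / supp(X). The sketch below uses only these textbook definitions; it is not the algorithm proposed in the paper, and the function names and toy transactions are hypothetical.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))


def rule_confidences(transactions, x, y):
    """Return (conf(X -> Y), conf(X -> not Y))."""
    n_x = count(transactions, x)
    if n_x == 0:
        raise ValueError("antecedent never occurs")
    n_xy = count(transactions, set(x) | set(y))
    return n_xy / n_x, (n_x - n_xy) / n_x


# Toy transactions (made up for illustration).
visits = [
    {"fever", "cough"}, {"fever", "rash"}, {"fever", "cough"},
    {"fever"}, {"cough"},
]
positive, negative = rule_confidences(visits, {"fever"}, {"rash"})
print(positive, negative)
```

    The two confidences always sum to 1, so a weak positive rule implies a strong negative one; interestingness measures beyond confidence are usually needed to keep the number of reported NARs manageable.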

  8. Exploring patterns of epigenetic information with data mining techniques.

    PubMed

    Aguiar-Pulido, Vanessa; Seoane, José A; Gestal, Marcos; Dorado, Julián

    2013-01-01

    Data mining, a part of the Knowledge Discovery in Databases process (KDD), is the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence with database management. Analyses of epigenetic data have evolved towards genome-wide and high-throughput approaches, thus generating great amounts of data for which data mining is essential. Part of these data may contain patterns of epigenetic information which are mitotically and/or meiotically heritable determining gene expression and cellular differentiation, as well as cellular fate. Epigenetic lesions and genetic mutations are acquired by individuals during their life and accumulate with ageing. Both defects, either together or individually, can result in losing control over cell growth and, thus, causing cancer development. Data mining techniques could be then used to extract the previous patterns. This work reviews some of the most important applications of data mining to epigenetics.

  9. Constraint-based Data Mining

    NASA Astrophysics Data System (ADS)

    Boulicaut, Jean-Francois; Jeudy, Baptiste

    Knowledge Discovery in Databases (KDD) is a complex interactive process. The promising theoretical framework of inductive databases considers this to be essentially a querying process. It is enabled by a query language which can deal either with raw data or with patterns which hold in the data. Mining patterns turns out to be the so-called inductive query evaluation process, for which constraint-based Data Mining techniques have to be designed. An inductive query specifies declaratively the desired constraints, and algorithms are used to compute the patterns satisfying the constraints in the data. We survey important results of this active research domain. This chapter emphasizes a real breakthrough for hard problems concerning local pattern mining under various constraints, and it points out the current directions of research as well.

  10. TOPTRAC: Topical Trajectory Pattern Mining

    PubMed Central

    Kim, Younghoon; Han, Jiawei; Yuan, Cangzhou

    2015-01-01

    With the increasing use of GPS-enabled mobile phones, geo-tagging, which refers to adding GPS information to media such as micro-blogging messages or photos, has seen a surge in popularity recently. This enables us to not only browse information based on locations, but also discover patterns in the location-based behaviors of users. Many techniques have been developed to find the patterns of people's movements using GPS data, but latent topics in text messages posted with local contexts have not been utilized effectively. In this paper, we present a latent topic-based clustering algorithm to discover patterns in the trajectories of geo-tagged text messages. We propose a novel probabilistic model to capture the semantic regions where people post messages with a coherent topic as well as the patterns of movement between the semantic regions. Based on the model, we develop an efficient inference algorithm to calculate model parameters. By exploiting the estimated model, we next devise a clustering algorithm to find the significant movement patterns that appear frequently in data. Our experiments on real-life data sets show that the proposed algorithm finds diverse and interesting trajectory patterns and identifies the semantic regions in a finer granularity than the traditional geographical clustering methods. PMID:26709365

  11. Multi-Level Sequential Pattern Mining Based on Prime Encoding

    NASA Astrophysics Data System (ADS)

    Lianglei, Sun; Yun, Li; Jiang, Yin

    Encoding serves not only to express the hierarchical relationship but also to facilitate identifying the relationship between different levels, which directly affects the efficiency of algorithms for mining multi-level sequential patterns. In this paper, we prove that a single division operation can decide the parent-child relationship between different levels when prime encoding is used, and we present the PMSM and CROSS-PMSM algorithms, both based on prime encoding, for mining multi-level and cross-level sequential patterns respectively. Experimental results show that the algorithms can effectively extract multi-level and cross-level sequential patterns from the sequence database.
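
    One way a single division can decide a hierarchical relationship is sketched below. It assumes a particular prime encoding, in which every node of the item hierarchy receives its own prime and a node's code is the product of the primes on the path from the root; the abstract does not state the exact scheme used by PMSM, so the scheme, the function names, and the toy taxonomy are all assumptions.

```python
from itertools import count


def primes():
    """Yield 2, 3, 5, ... by trial division (fine for small hierarchies)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


def prime_encode(children, root):
    """Assign code(child) = code(parent) * (a fresh prime); code(root) = 1."""
    fresh = primes()
    codes = {root: 1}
    queue = [root]
    while queue:
        node = queue.pop(0)
        for child in children.get(node, []):
            codes[child] = codes[node] * next(fresh)
            queue.append(child)
    return codes


def is_ancestor(codes, ancestor, node):
    """One modulo (division) step decides the relationship."""
    return ancestor != node and codes[node] % codes[ancestor] == 0


# Toy taxonomy (illustrative only).
taxonomy = {"food": ["drink", "snack"], "drink": ["milk", "juice"], "snack": ["chips"]}
codes = prime_encode(taxonomy, "food")
print(codes)
print(is_ancestor(codes, "drink", "milk"), is_ancestor(codes, "snack", "milk"))
```

    Because every node owns a distinct prime, a code divides another exactly when the first node lies on the path from the root to the second. The codes grow multiplicatively with depth, which is acceptable for shallow taxonomies with Python's arbitrary-precision integers.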

  12. A Novel Method for Discovering Fuzzy Sequential Patterns Using the Simple Fuzzy Partition Method.

    ERIC Educational Resources Information Center

    Chen, Ruey-Shun; Hu, Yi-Chung

    2003-01-01

    Discusses sequential patterns, data mining, knowledge acquisition, and fuzzy sequential patterns described by natural language. Proposes a fuzzy data mining technique to discover fuzzy sequential patterns by using the simple partition method which allows the linguistic interpretation of each fuzzy set to be easily obtained. (Author/LRW)

  13. Data Mining Techniques Applied to Hydrogen Lactose Breath Test.

    PubMed

    Rubio-Escudero, Cristina; Valverde-Fernández, Justo; Nepomuceno-Chamorro, Isabel; Pontes-Balanza, Beatriz; Hernández-Mendoza, Yoedusvany; Rodríguez-Herrera, Alfonso

    2017-01-01

    Analyze a set of data of hydrogen breath tests by use of data mining tools. Identify new patterns of H2 production. We applied k-means clustering as the data mining technique to hydrogen breath test data sets from 2571 patients. Six different patterns have been extracted upon analysis of the hydrogen breath test data. We have also shown the relevance of each of the samples taken throughout the test. Analysis of the hydrogen breath test data sets using data mining techniques has identified new patterns of hydrogen generation upon lactose absorption. We can see the potential of application of data mining techniques to clinical data sets. These results offer promising data for future research on the relations between gut microbiota produced hydrogen and its link to clinical symptoms.

  14. Learning Behavior Characterization with Multi-Feature, Hierarchical Activity Sequences

    ERIC Educational Resources Information Center

    Ye, Cheng; Segedy, James R.; Kinnebrew, John S.; Biswas, Gautam

    2015-01-01

    This paper discusses Multi-Feature Hierarchical Sequential Pattern Mining, MFH-SPAM, a novel algorithm that efficiently extracts patterns from students' learning activity sequences. This algorithm extends an existing sequential pattern mining algorithm by dynamically selecting the level of specificity for hierarchically-defined features…

  15. 78 FR 5055 - Pattern of Violations

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-01-23

    ...The Mine Safety and Health Administration (MSHA) is revising the Agency's existing regulation for pattern of violations (POV). MSHA has determined that the existing regulation does not adequately achieve the intent of the Federal Mine Safety and Health Act of 1977 (Mine Act) that the POV provision be used to address mine operators who have demonstrated a disregard for the health and safety of miners. Congress included the POV provision in the Mine Act so that mine operators would manage health and safety conditions at mines and find and fix the root causes of significant and substantial (S&S) violations, protecting the health and safety of miners. The final rule simplifies the existing POV criteria, improves consistency in applying the POV criteria, and more effectively achieves the Mine Act's statutory intent. It also encourages chronic safety violators to comply with the Mine Act and MSHA's health and safety standards.

  16. Recommending Learning Activities in Social Network Using Data Mining Algorithms

    ERIC Educational Resources Information Center

    Mahnane, Lamia

    2017-01-01

    In this paper, we show how data mining algorithms (e.g. Apriori Algorithm (AP) and Collaborative Filtering (CF)) are useful in a New Social Network (NSN-AP-CF). "NSN-AP-CF" processes the clusters based on different learning styles. Next, it analyzes the habits and the interests of the users through mining the frequent episodes by the…

  17. An Incremental High-Utility Mining Algorithm with Transaction Insertion

    PubMed Central

    Gan, Wensheng; Zhang, Binbin

    2015-01-01

    Association-rule mining is commonly used to discover useful and meaningful patterns from a very large database. It only considers the occurrence frequencies of items to reveal the relationships among itemsets. Traditional association-rule mining is, however, not suitable in real-world applications since the items purchased by a customer may involve various factors, such as profit or quantity. High-utility mining was designed to overcome the limitations of association-rule mining by considering both the quantity and profit measures. Most high-utility mining algorithms are designed for static databases. Fewer studies address dynamic high-utility mining with transaction insertion, which otherwise requires rescanning the database and suffers from the combinatorial explosion of the pattern-growth mechanism. In this paper, an efficient incremental algorithm with transaction insertion is designed to reduce computations without candidate generation based on the utility-list structures. The enumeration tree and the relationships between 2-itemsets are also adopted in the proposed algorithm to speed up the computations. Several experiments are conducted to show the performance of the proposed algorithm in terms of runtime, memory consumption, and number of generated patterns. PMID:25811038
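
    To make the notion of utility concrete, the following is a minimal brute-force sketch, assuming the common definition in which an itemset's utility is the sum of quantity times unit profit over the transactions that contain the whole itemset. It is not the incremental utility-list algorithm described in the abstract; the function names and the data layout (a transaction as a dict from item to purchased quantity) are assumptions for illustration.

```python
from itertools import combinations


def itemset_utility(itemset, transactions, profit):
    """Sum quantity * unit profit over transactions containing every item."""
    total = 0
    for tx in transactions:
        if all(item in tx for item in itemset):
            total += sum(tx[item] * profit[item] for item in itemset)
    return total


def high_utility_itemsets(transactions, profit, min_utility):
    """Enumerate every itemset and keep those reaching min_utility.

    Exhaustive enumeration is exponential in the number of items, so this
    is only meant for tiny illustrative inputs.
    """
    items = sorted(profit)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions, profit)
            if utility >= min_utility:
                result[itemset] = utility
    return result
```

    With the made-up data transactions = [{"a": 1, "b": 2}, {"a": 2, "c": 1}, {"b": 1, "c": 3}] and profit = {"a": 5, "b": 2, "c": 1}, item "b" occurs as often as item "a" but has a much lower utility, which is the kind of difference that frequency-only mining cannot see.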

  18. Software tool for data mining and its applications

    NASA Astrophysics Data System (ADS)

    Yang, Jie; Ye, Chenzhou; Chen, Nianyi

    2002-03-01

    A software tool for data mining is introduced, which integrates pattern recognition (PCA, Fisher, clustering, hyperenvelop, regression), artificial intelligence (knowledge representation, decision trees), statistical learning (rough set, support vector machine), and computational intelligence (neural network, genetic algorithm, fuzzy systems). It consists of nine function models: pattern recognition, decision trees, association rule, fuzzy rule, neural network, genetic algorithm, Hyper Envelop, support vector machine, and visualization. The principle and knowledge representation of some function models of data mining are described. The software tool is implemented in Visual C++ under Windows 2000. Nonmonotony in data mining is dealt with by concept hierarchy and layered mining. The software tool has been satisfactorily applied in the prediction of regularities of the formation of ternary intermetallic compounds in alloy systems, and in the diagnosis of brain glioma.

  19. Quantum algorithm for association rules mining

    NASA Astrophysics Data System (ADS)

    Yu, Chao-Hua; Gao, Fei; Wang, Qing-Le; Wen, Qiao-Yan

    2016-10-01

    Association rules mining (ARM) is one of the most important problems in knowledge discovery and data mining. Given a transaction database that has a large number of transactions and items, the task of ARM is to acquire consumption habits of customers by discovering the relationships between itemsets (sets of items). In this paper, we address ARM in the quantum settings and propose a quantum algorithm for the key part of ARM, finding frequent itemsets from the candidate itemsets and acquiring their supports. Specifically, for the case in which there are $M_f^{(k)}$ frequent $k$-itemsets in the $M_c^{(k)}$ candidate $k$-itemsets ($M_f^{(k)} \le M_c^{(k)}$), our algorithm can efficiently mine these frequent $k$-itemsets and estimate their supports by using parallel amplitude estimation and amplitude amplification with complexity $O\left(k\sqrt{M_c^{(k)} M_f^{(k)}}/\epsilon\right)$, where $\epsilon$ is the error for estimating the supports. Compared with the classical counterpart, i.e., the classical sampling-based algorithm, whose complexity is $O\left(k M_c^{(k)}/\epsilon^2\right)$, our quantum algorithm quadratically improves the dependence on both $\epsilon$ and $M_c^{(k)}$ in the best case when $M_f^{(k)} \ll M_c^{(k)}$ and on $\epsilon$ alone in the worst case when $M_f^{(k)} \approx M_c^{(k)}$.

  20. Site-specific climate analysis elucidates revegetation challenges for post-mining landscapes in eastern Australia

    NASA Astrophysics Data System (ADS)

    Audet, P.; Arnold, S.; Lechner, A. M.; Baumgartl, T.

    2013-10-01

    In eastern Australia, the availability of water is critical for the successful rehabilitation of post-mining landscapes and climatic characteristics of this diverse geographical region are closely defined by factors such as erratic rainfall and periods of drought and flooding. Despite this, specific metrics of climate patterning are seldom incorporated into the initial design of current post-mining land rehabilitation strategies. Our study proposes that a few common rainfall parameters can be combined and rated using arbitrary rainfall thresholds to characterise bioregional climate sensitivity relevant to the rehabilitation of these landscapes. This approach included assessments of annual rainfall depth, average recurrence interval of prolonged low intensity rainfall, average recurrence intervals of short or prolonged high intensity events, median period without rain (or water-deficit) and standard deviation for this period in order to address climatic factors such as total water availability, seasonality and intensity - which were selected as potential proxies of both short- and long-term biological sensitivity to climate within the context of post-disturbance ecological development and recovery. Following our survey of available climate data, we derived site "climate sensitivity" indexes and compared the performance of 9 ongoing mine sites: Weipa, Mt. Isa and Cloncurry, Eromanga, Kidston, the Bowen Basin (Curragh), Tarong, North Stradbroke Island, and the Newnes Plateau. The sites were then ranked from most-to-least sensitive and compared with natural bioregional patterns of vegetation density using mean NDVI. It was determined that regular rainfall and relatively short periods of water-deficit were key characteristics of sites having less sensitivity to climate - as found among the relatively more temperate inland mining locations. By contrast, high rainfall variability, frequently occurring high intensity events, and/or prolonged seasonal drought were primary indicators of sites having greater sensitivity to climate - as found among the semi-arid central-inland sites. Overall, the manner in which these climatic factors are identified and ultimately addressed by land managers and rehabilitation practitioners could be a key determinant of achievable success at given locations at the planning stages of rehabilitation design.

  1. Content-Aware DataGuide with Incremental Index Update using Frequently Used Paths

    NASA Astrophysics Data System (ADS)

    Sharma, A. K.; Duhan, Neelam; Khattar, Priyanka

    2010-11-01

    The size of the WWW is increasing day by day. Due to the absence of structured data on the Web, it becomes very difficult for information retrieval tools to fully utilize the Web information. As a solution to this problem, XML pages come into play, which provide structural information to the users to some extent. Without efficient indexes, query processing can be quite inefficient due to an exhaustive traversal of XML data. In this paper, an improved content-centric approach to the Content-Aware DataGuide, an indexing technique for XML databases, is proposed that uses frequently used paths from historical query logs to improve query performance. The index can be updated incrementally according to the changes in query workload and thus, the overhead of reconstruction can be minimized. Frequently used paths are extracted using any Sequential Pattern mining algorithm on subsequent queries in the query workload. After this, the data structures are incrementally updated. This indexing technique proves to be efficient as partial matching queries can be executed efficiently and users can now get more relevant documents in results.

  2. Exploiting Sequential Patterns Found in Users' Solutions and Virtual Tutor Behavior to Improve Assistance in ITS

    ERIC Educational Resources Information Center

    Fournier-Viger, Philippe; Faghihi, Usef; Nkambou, Roger; Nguifo, Engelbert Mephu

    2010-01-01

    We propose to mine temporal patterns in Intelligent Tutoring Systems (ITSs) to uncover useful knowledge that can enhance their ability to provide assistance. To discover patterns, we suggest using a custom, sequential pattern-mining algorithm. Two ways of applying the algorithm to enhance an ITS's capabilities are addressed. The first is to…

  3. Ecosystem Health Assessment of Mining Cities Based on Landscape Pattern

    NASA Astrophysics Data System (ADS)

    Yu, W.; Liu, Y.; Lin, M.; Fang, F.; Xiao, R.

    2017-09-01

    Ecosystem health assessment (EHA) is one of the most important aspects of ecosystem management. Nowadays, the ecological environment of mining cities is facing various problems. In this study, drawing on ecosystem health theory and remote sensing images from 2005, 2009 and 2013, landscape pattern analysis and the Vigor-Organization-Resilience (VOR) model were applied to set up an evaluation index system for the ecosystem health of a mining city and to assess the ecosystem health level of Panji District, Huainan city. Results showed a temporally stable but spatially highly heterogeneous landscape pattern during 2005-2013. The regional ecosystem health index experienced a rapid decline after a slight increase, and finally remained at an ordinary level. Significant differences were found among towns. The ecosystem health of Tianjijiedao town, the regional administrative centre, declined rapidly during the study period and fell to the worst level in the study area, while Hetuan Town, located in the northwestern suburban area of Panji District, stayed at a relatively better level than other towns. The impacts of coal mining collapse areas and land reclamation on the landscape pattern and ecosystem health status of mining cities were also discussed. As a result of underground coal mining, land subsidence has become an inevitable problem in the study area. In addition, the coal mining subsidence area has destroyed farmland, construction land and water bodies, which changes the regional landscape pattern and makes the evaluation of ecosystem health in mining areas more difficult. Therefore, this study provides an ecosystem health approach for relevant departments to make scientific decisions.

  4. An Efficient Pattern Mining Approach for Event Detection in Multivariate Temporal Data

    PubMed Central

    Batal, Iyad; Cooper, Gregory; Fradkin, Dmitriy; Harrison, James; Moerchen, Fabian; Hauskrecht, Milos

    2015-01-01

    This work proposes a pattern mining approach to learn event detection models from complex multivariate temporal data, such as electronic health records. We present Recent Temporal Pattern mining, a novel approach for efficiently finding predictive patterns for event detection problems. This approach first converts the time series data into time-interval sequences of temporal abstractions. It then constructs more complex time-interval patterns backward in time using temporal operators. We also present the Minimal Predictive Recent Temporal Patterns framework for selecting a small set of predictive and non-spurious patterns. We apply our methods for predicting adverse medical events in real-world clinical data. The results demonstrate the benefits of our methods in learning accurate event detection models, which is a key step for developing intelligent patient monitoring and decision support systems. PMID:26752800

  5. Implications of Emerging Data Mining

    NASA Astrophysics Data System (ADS)

    Kulathuramaiyer, Narayanan; Maurer, Hermann

    Data Mining describes a technology that discovers non-trivial hidden patterns in a large collection of data. Although this technology has a tremendous impact on our lives, the invaluable contributions of this invisible technology often go unnoticed. This paper discusses advances in data mining while focusing on the emerging data mining capability. Such data mining applications perform multidimensional mining on a wide variety of heterogeneous data sources, providing solutions to many unresolved problems. This paper also highlights the advantages and disadvantages arising from the ever-expanding scope of data mining. Data Mining augments human intelligence by equipping us with a wealth of knowledge and by empowering us to perform our daily tasks better. As the mining scope and capacity increases, users and organizations become more willing to compromise privacy. The huge data stores of the 'master miners' allow them to gain deep insights into individual lifestyles and their social and behavioural patterns. Data integration and analysis capability of combining business and financial trends together with the ability to deterministically track market changes will drastically affect our lives.

  6. DMT-TAFM: a data mining tool for technical analysis of futures market

    NASA Astrophysics Data System (ADS)

    Stepanov, Vladimir; Sathaye, Archana

    2002-03-01

    Technical analysis of financial markets describes many patterns of market behavior. For practical use, all these descriptions need to be adjusted for each particular trading session. In this paper, we develop a data mining tool for technical analysis of the futures markets (DMT-TAFM), which dynamically generates rules based on the notion of the price pattern similarity. The tool consists of three main components. The first component provides visualization of data series on a chart with different ranges, scales, and chart sizes and types. The second component constructs pattern descriptions using sets of polynomials. The third component specifies the training set for mining, defines the similarity notion, and searches for a set of similar patterns. DMT-TAFM is useful to prepare the data, and then reveal and systemize statistical information about similar patterns found in any type of historical price series. We performed experiments with our tool on three decades of trading data for a hundred types of futures. Our results for this data set show that we can prove or disprove many well-known patterns based on real data, as well as reveal new ones, and use the set of relatively consistent patterns found during data mining for developing better futures trading strategies.

  7. Trace metal depositional patterns from an open pit mining activity as revealed by archived avian gizzard contents.

    PubMed

    Bendell, L I

    2011-02-15

    Archived samples of blue grouse (Dendragapus obscurus) gizzard contents, inclusive of grit, collected yearly between 1959 and 1970 were analyzed for cadmium, lead, zinc, and copper content. Approximately halfway through the 12-year sampling period, an open-pit copper mine began activities, then ceased operations 2 years later. Thus the archived samples provided a unique opportunity to determine if avian gizzard contents, inclusive of grit, could reveal patterns in the anthropogenic deposition of trace metals associated with mining activities. Gizzard concentrations of cadmium and copper strongly coincided with the onset of opening and the closing of the pit mining activity. Gizzard zinc and lead demonstrated significant among year variation; however, maximum concentrations did not correlate to mining activity. The archived gizzard contents did provide a useful tool for documenting trends in metal depositional patterns related to an anthropogenic activity. Further, blue grouse ingesting grit particles during the time of active mining activity would have been exposed to toxicologically significant levels of cadmium. Gizzard lead concentrations were also of toxicological significance but not related to mining activity. This type of "pulse" toxic metal exposure as a consequence of open-pit mining activity would not necessarily have been revealed through a "snap-shot" of soil, plant or avian tissue trace metal analysis post-mining activity. Copyright © 2010 Elsevier B.V. All rights reserved.

  8. Mining with Rare Cases

    NASA Astrophysics Data System (ADS)

    Weiss, Gary M.

    Rare cases are often the most interesting cases. For example, in medical diagnosis one is typically interested in identifying relatively rare diseases, such as cancer, rather than more frequently occurring ones, such as the common cold. In this chapter we discuss the role of rare cases in Data Mining. Specific problems associated with mining rare cases are discussed, followed by a description of methods for addressing these problems.

  9. Exploration of geo-mineral compounds in granite mining soils using XRD pattern data analysis

    NASA Astrophysics Data System (ADS)

    Koteswara Reddy, G.; Yarakkula, Kiran

    2017-11-01

    The purpose of the study was to investigate the major minerals present in granite mining waste and in agricultural soils near and away from mining areas. The minerals in representative sub-soil samples were identified by X-Ray Diffractometer (XRD) pattern data analysis. The morphological features and quantitative elemental analysis were obtained by Scanning Electron Microscopy-Energy Dispersed Spectroscopy (SEM-EDS). The XRD pattern data revealed that the major minerals in granite waste are Quartz, Albite, Anorthite, K-Feldspars, Muscovite, Annite, Lepidolite, Illite, Enstatite and Ferrosilite. However, in agricultural farm soils the major minerals are Gypsum, Calcite, Magnetite, Hematite, Muscovite, K-Feldspars and Quartz. Moreover, the agricultural soils neighbouring mining areas were found to be enriched in Mica group minerals (Lepidolite and Illite) and Orthopyroxene group minerals (Ferrosilite and Enstatite). The Mica and Orthopyroxene group minerals are present in agricultural farm soils neighbouring mining areas and absent in agricultural farm soils away from mining areas. The study demonstrated that chemical migration takes place in agricultural farm lands in the vicinity of the granite mining areas.

  10. Differentially Private Frequent Sequence Mining via Sampling-based Candidate Pruning

    PubMed Central

    Xu, Shengzhi; Cheng, Xiang; Li, Zhengyi; Xiong, Li

    2016-01-01

    In this paper, we study the problem of mining frequent sequences under the rigorous differential privacy model. We explore the possibility of designing a differentially private frequent sequence mining (FSM) algorithm which can achieve both high data utility and a high degree of privacy. We found, in differentially private FSM, the amount of required noise is proportionate to the number of candidate sequences. If we could effectively reduce the number of unpromising candidate sequences, the utility and privacy tradeoff can be significantly improved. To this end, by leveraging a sampling-based candidate pruning technique, we propose a novel differentially private FSM algorithm, which is referred to as PFS2. The core of our algorithm is to utilize sample databases to further prune the candidate sequences generated based on the downward closure property. In particular, we use the noisy local support of candidate sequences in the sample databases to estimate which sequences are potentially frequent. To improve the accuracy of such private estimations, a sequence shrinking method is proposed to enforce the length constraint on the sample databases. Moreover, to decrease the probability of misestimating frequent sequences as infrequent, a threshold relaxation method is proposed to relax the user-specified threshold for the sample databases. Through formal privacy analysis, we show that our PFS2 algorithm is ε-differentially private. Extensive experiments on real datasets illustrate that our PFS2 algorithm can privately find frequent sequences with high accuracy. PMID:26973430
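
    The observation that the required noise grows with the number of candidate sequences can be illustrated with a minimal sketch of the standard Laplace mechanism. It assumes that each database sequence changes each candidate's support by at most one, so releasing m supports has L1 sensitivity at most m and Laplace noise of scale m/ε suffices. This is not the PFS2 algorithm (no sampling, sequence shrinking, or threshold relaxation), and all function names are hypothetical.

```python
import random


def is_subsequence(pattern, sequence):
    """True if pattern's items occur in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, database):
    """Number of database sequences that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def laplace(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_supports(candidates, database, epsilon, rng):
    """Release every candidate's support with epsilon-DP Laplace noise.

    One sequence affects each of the len(candidates) counts by at most 1,
    so the noise scale is len(candidates) / epsilon: fewer candidates
    means less noise per released support.
    """
    scale = len(candidates) / epsilon
    return {c: support(c, database) + laplace(scale, rng) for c in candidates}
```

    Halving the candidate set halves the noise scale at the same privacy budget, which is why pruning unpromising candidates before adding noise can improve the utility and privacy tradeoff.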

  11. Quantifying Associations between Environmental Stressors and Demographic Factors

    EPA Science Inventory

    Association rule mining (ARM) [1-3], also known as frequent item set mining [4] or market basket analysis [1], has been widely applied in many different areas, such as business product portfolio planning [5], intrusion detection infrastructure design [6], gene expression analysis...

  12. Data Mining and Pattern Recognition Models for Identifying Inherited Diseases: Challenges and Implications.

    PubMed

    Iddamalgoda, Lahiru; Das, Partha S; Aponso, Achala; Sundararajan, Vijayaraghava S; Suravajhala, Prashanth; Valadi, Jayaraman K

    2016-01-01

    Data mining and pattern recognition methods reveal interesting findings in genetic studies, especially on how the genetic makeup is associated with inherited diseases. Although researchers have proposed various data mining models for biomedical approaches, there remains a challenge in accurately prioritizing the single nucleotide polymorphisms (SNP) associated with the disease. In this commentary, we review the state-of-the-art data mining and pattern recognition models for identifying inherited diseases and deliberate the need for binary classification- and scoring-based prioritization methods in determining causal variants. While we discuss the known pros and cons of these methods, we argue that the gene prioritization methods and the protein interaction (PPI) methods in conjunction with K-nearest neighbors could be used in accurately categorizing the genetic factors in disease causation.

  13. The prevalence of selected potentially hazardous workplace exposures in the US: findings from the 2010 National Health Interview Survey.

    PubMed

    Calvert, Geoffrey M; Luckhaupt, Sara E; Sussell, Aaron; Dahlhamer, James M; Ward, Brian W

    2013-06-01

    Assess the national prevalence of current workplace exposure to potential skin hazards, secondhand smoke (SHS), and outdoor work among various industry and occupation groups. Also, assess the national prevalence of chronic workplace exposure to vapors, gas, dust, and fumes (VGDF) among these groups. Data were obtained from the 2010 National Health Interview Survey (NHIS). NHIS is a multistage probability sample survey of the civilian non-institutionalized population of the US. Prevalence rates and their variances were calculated using SUDAAN to account for the complex NHIS sample design. The data for 2010 were available for 17,524 adults who worked in the 12 months that preceded interview. The highest prevalence rates of hazardous workplace exposures were typically in agriculture, mining, and construction. The prevalence rate of frequent handling of or skin contact with chemicals, and of non-smokers frequently exposed to SHS at work was highest in mining and construction. Outdoor work was most common in agriculture (85%), construction (73%), and mining (65%). Finally, frequent occupational exposure to VGDF was most common among mining (67%), agriculture (53%), and construction workers (51%). We identified industries and occupations with the highest prevalence of potentially hazardous workplace exposures, and provided targets for investigation and intervention activities. Copyright © 2012 Wiley Periodicals, Inc.

  14. The Prevalence of Selected Potentially Hazardous Workplace Exposures in the US: Findings From the 2010 National Health Interview Survey

    PubMed Central

    Calvert, Geoffrey M.; Luckhaupt, Sara E.; Sussell, Aaron; Dahlhamer, James M.; Ward, Brian W.

    2015-01-01

    Objective Assess the national prevalence of current workplace exposure to potential skin hazards, secondhand smoke (SHS), and outdoor work among various industry and occupation groups. Also, assess the national prevalence of chronic workplace exposure to vapors, gas, dust, and fumes (VGDF) among these groups. Methods Data were obtained from the 2010 National Health Interview Survey (NHIS). NHIS is a multistage probability sample survey of the civilian non-institutionalized population of the US. Prevalence rates and their variances were calculated using SUDAAN to account for the complex NHIS sample design. Results The data for 2010 were available for 17,524 adults who worked in the 12 months that preceded interview. The highest prevalence rates of hazardous workplace exposures were typically in agriculture, mining, and construction. The prevalence rate of frequent handling of or skin contact with chemicals, and of non-smokers frequently exposed to SHS at work was highest in mining and construction. Outdoor work was most common in agriculture (85%), construction (73%), and mining (65%). Finally, frequent occupational exposure to VGDF was most common among mining (67%), agriculture (53%), and construction workers (51%). Conclusion We identified industries and occupations with the highest prevalence of potentially hazardous workplace exposures, and provided targets for investigation and intervention activities. PMID:22821700

  15. General health status of residents of the Selebi Phikwe Ni-Cu mine area, Botswana.

    PubMed

    Ekosse, Georges

    2005-10-01

    Residents of the Selebi Phikwe area, Botswana, where nickel-copper (Ni-Cu) is being exploited, often exhibit symptoms of varied degrees of ailments, sicknesses and diseases. A need to investigate their general health status was therefore evident. Primary data was obtained by means of a questionnaire and structured interviews conducted with individuals, health service providers, business enterprises and educational institutions. The generated data revealed common ailments, sicknesses and diseases in the area, with the four most frequent health complaints being frequent coughing, headaches, influenza/common colds and rampant chest pains. Research findings indicated that residents had respiratory tract-related problems, suspected to be linked to the effects of air pollution caused by the emission of sulphur dioxide (SO2) from mining and smelting activities. Residents were frequently in contact with SO2 and related gases and fumes, and with mineral and silica dust generated from the mining processes. No clearly demarcating differences were noticed in the health status of residents living in the control site from those in the main study area. However, sites most affected were those close to where Ni-Cu is exploited. Environmental factors resulting from mining and smelting activities, among others, could be contributory to the negative health effects occurring at Selebi Phikwe.

  16. A study on PubMed search tag usage pattern: association rule mining of a full-day PubMed query log.

    PubMed

    Mosa, Abu Saleh Mohammad; Yoo, Illhoi

    2013-01-09

    The practice of evidence-based medicine requires efficient biomedical literature search such as PubMed/MEDLINE. Retrieval performance relies highly on the efficient use of search field tags. The purpose of this study was to analyze PubMed log data in order to understand the usage pattern of search tags by the end user in PubMed/MEDLINE search. A PubMed query log file was obtained from the National Library of Medicine containing anonymous user identification, timestamp, and query text. Inconsistent records were removed from the dataset and the search tags were extracted from the query texts. A total of 2,917,159 queries were selected for this study issued by a total of 613,061 users. The analysis of frequent co-occurrences and usage patterns of the search tags was conducted using an association mining algorithm. The percentage of search tag usage was low (11.38% of the total queries) and only 2.95% of queries contained two or more tags. Three out of four users used no search tag and about two-thirds of them issued fewer than four queries. Among the queries containing at least one tagged search term, the average number of search tags was almost half of the number of total search terms. Navigational search tags are more frequently used than informational search tags. While no strong association was observed between informational and navigational tags, six (out of 19) informational tags and six (out of 29) navigational tags showed strong associations in PubMed searches. The low percentage of search tag usage implies that PubMed/MEDLINE users do not utilize the features of PubMed/MEDLINE widely or they are not aware of such features or solely depend on the high recall focused query translation by the PubMed's Automatic Term Mapping. The users need further education and interactive search application for effective use of the search tags in order to fulfill their biomedical information needs from PubMed/MEDLINE.
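
    The co-occurrence analysis described here can be sketched with the usual support and confidence measures of association rule mining. The sketch below assumes each query has already been reduced to the set of search tags it contains and only considers rules between pairs of tags; the study's actual algorithm and thresholds are not given in the abstract, and the function name and sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def tag_pair_rules(queries, min_support, min_confidence):
    """Return rules (lhs, rhs, support, confidence) between pairs of tags.

    support    = fraction of all queries containing both tags
    confidence = fraction of queries containing lhs that also contain rhs
    """
    n = len(queries)
    single = Counter()
    pair = Counter()
    for tags in queries:
        tags = set(tags)
        single.update(tags)
        pair.update(combinations(sorted(tags), 2))

    rules = []
    for (a, b), together in pair.items():
        if together / n < min_support:
            continue
        # Each frequent pair yields two candidate rules, one per direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = together / single[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, together / n, confidence))
    return sorted(rules)
```

    On the made-up log queries = [{"au", "dp"}, {"au", "dp"}, {"au"}, {"ti"}, set()], the pair of tags has support 0.4, and only the rule from "dp" to "au" reaches a confidence of 0.8, since "au" also appears on its own.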

  17. A Study on Pubmed Search Tag Usage Pattern: Association Rule Mining of a Full-day Pubmed Query Log

    PubMed Central

    2013-01-01

    Background: The practice of evidence-based medicine requires efficient biomedical literature search such as PubMed/MEDLINE. Retrieval performance relies highly on the efficient use of search field tags. The purpose of this study was to analyze PubMed log data in order to understand the usage pattern of search tags by the end user in PubMed/MEDLINE search. Methods: A PubMed query log file was obtained from the National Library of Medicine containing anonymous user identification, timestamp, and query text. Inconsistent records were removed from the dataset and the search tags were extracted from the query texts. A total of 2,917,159 queries, issued by a total of 613,061 users, were selected for this study. The analysis of frequent co-occurrences and usage patterns of the search tags was conducted using an association mining algorithm. Results: The percentage of search tag usage was low (11.38% of the total queries) and only 2.95% of queries contained two or more tags. Three out of four users used no search tag and about two-thirds of them issued fewer than four queries. Among the queries containing at least one tagged search term, the average number of search tags was almost half of the number of total search terms. Navigational search tags are more frequently used than informational search tags. While no strong association was observed between informational and navigational tags, six (out of 19) informational tags and six (out of 29) navigational tags showed strong associations in PubMed searches. Conclusions: The low percentage of search tag usage implies that PubMed/MEDLINE users do not utilize the features of PubMed/MEDLINE widely, or they are not aware of such features, or they solely depend on the high-recall-focused query translation by PubMed’s Automatic Term Mapping. The users need further education and interactive search application for effective use of the search tags in order to fulfill their biomedical information needs from PubMed/MEDLINE. PMID:23302604

  18. Quantification of Operational Risk Using A Data Mining

    NASA Technical Reports Server (NTRS)

    Perera, J. Sebastian

    1999-01-01

    What is Data Mining?
    - Data Mining is the process of finding actionable information hidden in raw data.
    - Data Mining helps find hidden patterns, trends, and important relationships often buried in a sea of data.
    - Typically, automated software tools based on advanced statistical analysis and data modeling technology can be utilized to automate the data mining process.

  19. A Temporal Pattern Mining Approach for Classifying Electronic Health Record Data

    PubMed Central

    Batal, Iyad; Valizadegan, Hamed; Cooper, Gregory F.; Hauskrecht, Milos

    2013-01-01

    We study the problem of learning classification models from complex multivariate temporal data encountered in electronic health record systems. The challenge is to define a good set of features that are able to represent well the temporal aspect of the data. Our method relies on temporal abstractions and temporal pattern mining to extract the classification features. Temporal pattern mining usually returns a large number of temporal patterns, most of which may be irrelevant to the classification task. To address this problem, we present the Minimal Predictive Temporal Patterns framework to generate a small set of predictive and non-spurious patterns. We apply our approach to the real-world clinical task of predicting patients who are at risk of developing heparin induced thrombocytopenia. The results demonstrate the benefit of our approach in efficiently learning accurate classifiers, which is a key step for developing intelligent clinical monitoring systems. PMID:25309815

  20. The Determination of Children's Knowledge of Global Lunar Patterns from Online Essays Using Text Mining Analysis

    ERIC Educational Resources Information Center

    Cheon, Jongpil; Lee, Sangno; Smith, Walter; Song, Jaeki; Kim, Yongjin

    2013-01-01

    The purpose of this study was to use text mining analysis of early adolescents' online essays to determine their knowledge of global lunar patterns. Australian and American students in grades five to seven wrote about global lunar patterns they had discovered by sharing observations with each other via the Internet. These essays were analyzed for…

  1. Using classification tree modelling to investigate drug prescription practices at health facilities in rural Tanzania.

    PubMed

    Kajungu, Dan K; Selemani, Majige; Masanja, Irene; Baraka, Amuri; Njozi, Mustafa; Khatib, Rashid; Dodoo, Alexander N; Binka, Fred; Macq, Jean; D'Alessandro, Umberto; Speybroeck, Niko

    2012-09-05

    Drug prescription practices depend on several factors related to the patient, health worker and health facilities. A better understanding of the factors influencing prescription patterns is essential to develop strategies to mitigate the negative consequences associated with poor practices in both the public and private sectors. A cross-sectional study was conducted in rural Tanzania among patients attending health facilities, and health workers. Patients, health workers and health facilities-related factors with the potential to influence drug prescription patterns were used to build a model of key predictors. Standard data mining methodology of classification tree analysis was used to define the importance of the different factors on prescription patterns. This analysis included 1,470 patients and 71 health workers practicing in 30 health facilities. Patients were mostly treated in dispensaries. Twenty two variables were used to construct two classification tree models: one for polypharmacy (prescription of ≥3 drugs) on a single clinic visit and one for co-prescription of artemether-lumefantrine (AL) with antibiotics. The most important predictor of polypharmacy was the diagnosis of several illnesses. Polypharmacy was also associated with little or no supervision of the health workers, administration of AL and private facilities. Co-prescription of AL with antibiotics was more frequent in children under five years of age and the other important predictors were transmission season, mode of diagnosis and the location of the health facility. Standard data mining methodology is an easy-to-implement analytical approach that can be useful for decision-making. Polypharmacy is mainly due to the diagnosis of multiple illnesses.

  2. Inferring Intra-Community Microbial Interaction Patterns from Metagenomic Datasets Using Associative Rule Mining Techniques

    PubMed Central

    Tandon, Disha; Haque, Mohammed Monzoorul; Mande, Sharmila S.

    2016-01-01

    The nature of inter-microbial metabolic interactions defines the stability of microbial communities residing in any ecological niche. Deciphering these interaction patterns is crucial for understanding the mode/mechanism(s) through which an individual microbial community transitions from one state to another (e.g. from a healthy to a diseased state). Statistical correlation techniques have been traditionally employed for mining microbial interaction patterns from taxonomic abundance data corresponding to a given microbial community. In spite of their efficiency, these correlation techniques can capture only 'pair-wise interactions'. Moreover, their emphasis on statistical significance can potentially result in missing out on several interactions that are relevant from a biological standpoint. This study explores the applicability of one of the earliest association rule mining algorithms, i.e. the 'Apriori algorithm', for deriving 'microbial association rules' from the taxonomic profile of a given microbial community. The classical Apriori approach derives association rules by analysing patterns of co-occurrence/co-exclusion between various '(subsets of) features/items' across various samples. Using real-world microbiome data, the efficiency/utility of this rule mining approach in deciphering multiple (biologically meaningful) association patterns between 'subsets/subgroups' of microbes (constituting microbiome samples) is demonstrated. As an example, association rules derived from publicly available gut microbiome datasets indicate an association between a group of microbes (Faecalibacterium, Dorea, and Blautia) that are known to have mutualistic metabolic associations among themselves. Application of the rule mining approach on gut microbiomes (sourced from the Human Microbiome Project) further indicated similar microbial association patterns in gut microbiomes irrespective of the gender of the subjects. A Linux implementation of the Association Rule Mining (ARM) software (customised for deriving 'microbial association rules' from microbiome data) is freely available for download from the following link: http://metagenomics.atc.tcs.com/arm. PMID:27124399
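
    The co-occurrence part of the Apriori approach described in this abstract can be illustrated with a minimal sketch. This is not the ARM software referenced above: the function names (apriori, rules), the thresholds, and the toy samples are all assumptions made for illustration, and co-exclusion patterns are not handled.

```python
from itertools import combinations


def apriori(samples, min_support):
    """Return {frozenset(itemset): support} for itemsets meeting min_support.

    samples: list of sets of items (here, hypothetical genus names per sample).
    min_support: minimum fraction of samples that must contain the itemset.
    """
    n = len(samples)
    # Level 1 candidates: every single item seen in any sample.
    candidates = {frozenset([item]) for s in samples for item in s}
    frequent = {}
    k = 1
    while candidates:
        # Keep only the candidates contained in enough samples.
        level = {}
        for c in candidates:
            support = sum(1 for s in samples if c <= s) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join step: build (k+1)-item candidates from frequent k-itemsets and
        # prune any candidate that has an infrequent k-item subset.
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, confidence) for rules meeting min_confidence."""
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is itself frequent,
                # so its support is always available here.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    yield antecedent, itemset - antecedent, confidence


# Invented toy data, not taken from the study.
toy_samples = [
    {"Faecalibacterium", "Dorea", "Blautia"},
    {"Faecalibacterium", "Dorea", "Blautia", "Bacteroides"},
    {"Faecalibacterium", "Blautia"},
    {"Bacteroides", "Prevotella"},
]
toy_frequent = apriori(toy_samples, min_support=0.5)
toy_rules = list(rules(toy_frequent, min_confidence=0.9))
```

    Because rules can have multi-item antecedents and consequents, this kind of mining can express associations among groups of microbes, which is the limitation of pair-wise correlation that the abstract points out.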

  3. Inferring Intra-Community Microbial Interaction Patterns from Metagenomic Datasets Using Associative Rule Mining Techniques.

    PubMed

    Tandon, Disha; Haque, Mohammed Monzoorul; Mande, Sharmila S

    2016-01-01

    The nature of inter-microbial metabolic interactions defines the stability of microbial communities residing in any ecological niche. Deciphering these interaction patterns is crucial for understanding the mode/mechanism(s) through which an individual microbial community transitions from one state to another (e.g. from a healthy to a diseased state). Statistical correlation techniques have been traditionally employed for mining microbial interaction patterns from taxonomic abundance data corresponding to a given microbial community. In spite of their efficiency, these correlation techniques can capture only 'pair-wise interactions'. Moreover, their emphasis on statistical significance can potentially result in missing out on several interactions that are relevant from a biological standpoint. This study explores the applicability of one of the earliest association rule mining algorithms, i.e. the 'Apriori algorithm', for deriving 'microbial association rules' from the taxonomic profile of a given microbial community. The classical Apriori approach derives association rules by analysing patterns of co-occurrence/co-exclusion between various '(subsets of) features/items' across various samples. Using real-world microbiome data, the efficiency/utility of this rule mining approach in deciphering multiple (biologically meaningful) association patterns between 'subsets/subgroups' of microbes (constituting microbiome samples) is demonstrated. As an example, association rules derived from publicly available gut microbiome datasets indicate an association between a group of microbes (Faecalibacterium, Dorea, and Blautia) that are known to have mutualistic metabolic associations among themselves. Application of the rule mining approach on gut microbiomes (sourced from the Human Microbiome Project) further indicated similar microbial association patterns in gut microbiomes irrespective of the gender of the subjects. A Linux implementation of the Association Rule Mining (ARM) software (customised for deriving 'microbial association rules' from microbiome data) is freely available for download from the following link: http://metagenomics.atc.tcs.com/arm.

  4. A Framework for Mining Actionable Navigation Patterns from In-Store RFID Datasets via Indoor Mapping

    PubMed Central

    Shen, Bin; Zheng, Qiuhua; Li, Xingsen; Xu, Libo

    2015-01-01

    With the quick development of RFID technology and the decreasing prices of RFID devices, RFID is becoming widely used in various intelligent services. Especially in the retail application domain, RFID is increasingly adopted to capture the shopping tracks and behavior of in-store customers. To further enhance the potential of this promising application, in this paper, we propose a unified framework for RFID-based path analytics, which uses both in-store shopping paths and RFID-based purchasing data to mine actionable navigation patterns. Four modules of this framework are discussed, which are: (1) mapping from the physical space to the cyber space, (2) data preprocessing, (3) pattern mining and (4) knowledge understanding and utilization. In the data preprocessing module, the critical problem of how to capture the mainstream shopping path sequences while wiping out unnecessary redundant and repeated details is addressed in detail. To solve this problem, two types of redundant patterns, i.e., loop repeat pattern and palindrome-contained pattern are recognized and the corresponding processing algorithms are proposed. The experimental results show that the redundant pattern filtering functions are effective and scalable. Overall, this work builds a bridge between indoor positioning and advanced data mining technologies, and provides a feasible way to study customers’ shopping behaviors via multi-source RFID data. PMID:25751076

  5. Long Creek Creek Mine Drainage Study: South Fork Reservation: Final Report

    EPA Science Inventory

    To characterize water quality in streams affected by historical mining it is necessary to determine the seasonal and spatial distribution patterns of trace metals concentrations. Identification of these patterns is used to identify the trace metals that are of ecological concern ...

  6. Anoxia stimulates microbially catalyzed metal release from Animas River sediments.

    PubMed

    Saup, Casey M; Williams, Kenneth H; Rodríguez-Freire, Lucía; Cerrato, José M; Johnston, Michael D; Wilkins, Michael J

    2017-04-19

    The Gold King Mine spill in August 2015 released 11 million liters of metal-rich mine waste to the Animas River watershed, an area that has been previously exposed to historical mining activity spanning more than a century. Although adsorption onto fluvial sediments was responsible for rapid immobilization of a significant fraction of the spill-associated metals, patterns of longer-term mobility are poorly constrained. Metals associated with river sediments collected downstream of the Gold King Mine in August 2015 exhibited distinct presence and abundance patterns linked to location and mineralogy. Simulating riverbed burial and development of anoxic conditions, sediment microcosm experiments amended with Animas River dissolved organic carbon revealed the release of specific metal pools coupled to microbial Fe- and SO₄²⁻-reduction. Results suggest that future sedimentation and burial of riverbed materials may drive longer-term changes in patterns of metal remobilization linked to anaerobic microbial metabolism, potentially driving decreases in downstream water quality. Such patterns emphasize the need for long-term water monitoring efforts in metal-impacted watersheds.

  7. 30 CFR 57.5002 - Exposure monitoring.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 30 Mineral Resources 1 2010-07-01 2010-07-01 false Exposure monitoring. 57.5002 Section 57.5002 Mineral Resources MINE SAFETY AND HEALTH ADMINISTRATION, DEPARTMENT OF LABOR METAL AND NONMETAL MINE... monitoring. Dust, gas, mist, and fume surveys shall be conducted as frequently as necessary to determine the...

  8. Analyzing Teaching Performance of Instructors Using Data Mining Techniques

    ERIC Educational Resources Information Center

    Mardikyan, Sona; Badur, Bertain

    2011-01-01

    Student evaluations to measure the teaching effectiveness of instructors have been very frequently applied in higher education for many years. This study investigates the factors associated with the assessment of instructors' teaching performance using two different data mining techniques: stepwise regression and decision trees. The data collected…

  9. Techniques of Acceleration for Association Rule Induction with Pseudo Artificial Life Algorithm

    NASA Astrophysics Data System (ADS)

    Kanakubo, Masaaki; Hagiwara, Masafumi

    Frequent pattern mining is one of the important problems in data mining. Generally, the number of potential rules grows rapidly as the size of the database increases. It is therefore hard for a user to extract the association rules. To avoid such a difficulty, we propose a new method for association rule induction with a pseudo artificial life approach. The proposed method decides whether there exists an item set which contains N or more items in two transactions. If it exists, a series of item sets which are contained in part of the transactions will be recorded. The iteration of this step contributes to the extraction of association rules. It is not necessary to calculate the huge number of candidate rules. In the evaluation test, we compared the association rules extracted using our method with the rules extracted using other algorithms such as the Apriori algorithm. In the evaluation using huge retail market basket data, our method is approximately 10 and 20 times faster than the Apriori algorithm and many of its variants.
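
    The basic step named in this abstract, checking whether two transactions share an item set of N or more items and recording it, might be sketched as follows. This is only one loose reading of that single step, not the authors' pseudo artificial life algorithm; the function name shared_itemsets, the parameter n_min, and the toy baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def shared_itemsets(transactions, n_min):
    """Count item sets of at least n_min items shared by pairs of transactions.

    Assumption: every pair of transactions is compared exhaustively. The
    abstract does not say how pairs are chosen, so this is illustration only.
    """
    counts = Counter()
    for a, b in combinations(transactions, 2):
        common = frozenset(a) & frozenset(b)
        if len(common) >= n_min:
            # Record the shared item set; no candidate rules are enumerated.
            counts[common] += 1
    return counts


# Invented toy market-basket data.
toy_baskets = [
    {"bread", "milk", "eggs"},
    {"bread", "milk", "tea"},
    {"bread", "milk", "eggs", "tea"},
    {"tea"},
]
toy_counts = shared_itemsets(toy_baskets, n_min=2)
```

    Intersecting transactions directly yields item sets that are known to occur together at least twice, which is why no separate candidate-generation pass is needed in this sketch.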

  10. Knowledge discovery in cardiology: A systematic literature review.

    PubMed

    Kadi, I; Idri, A; Fernandez-Aleman, J L

    2017-01-01

    Data mining (DM) provides the methodology and technology needed to transform huge amounts of data into useful information for decision making. It is a powerful process employed to extract knowledge and discover new patterns embedded in large data sets. Data mining has been increasingly used in medicine, particularly in cardiology. In fact, DM applications can greatly benefit all those involved in cardiology, such as patients, cardiologists and nurses. The purpose of this paper is to review papers concerning the application of DM techniques in cardiology so as to summarize and analyze evidence regarding: (1) the DM techniques most frequently used in cardiology; (2) the performance of DM models in cardiology; (3) comparisons of the performance of different DM models in cardiology. We performed a systematic literature review of empirical studies on the application of DM techniques in cardiology published in the period between 1 January 2000 and 31 December 2015. A total of 149 articles published between 2000 and 2015 were selected, studied and analyzed according to the following criteria: DM techniques and performance of the approaches developed. The results obtained showed that a significant number of the studies selected used classification and prediction techniques when developing DM models. Neural networks, decision trees and support vector machines were identified as being the techniques most frequently employed when developing DM models in cardiology. Moreover, neural networks and support vector machines achieved the highest accuracy rates and were proved to be more efficient than other techniques. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  11. Identification of Work-Related Musculoskeletal Disorders in Mining

    PubMed Central

    Weston, Eric; Pollard, Jonisha P.

    2016-01-01

    Work-related musculoskeletal disorder (WMSD) prevention measures have been studied in great depth throughout various industries. While the nature and causes of these disorders have been characterized in many industries, WMSDs occurring in the U.S. mining sector have not been characterized for several years. In this report, MSHA accident/injury/illness data from 2009 to 2013 were characterized to determine the most frequently reported WMSDs in the U.S. mining sector. WMSDs were most frequently reported in workers with less than 5 years or more than 20 years of mining experience. The number of days lost from work was the highest for shoulder and knee injuries and was found to increase with worker age. Underground and surface coal, surface stone and stone processing plants experienced the greatest number of WMSDs over the period studied. WMSDs were most commonly caused by an employee suffering from an overexertion, falls or being struck by an object while performing materials handling, maintenance and repair tasks, getting on or off equipment or machines, and walking or running. The injury trends presented should be used to help determine the focus of future WMSD prevention research in mining. PMID:27294012

  12. Finding Frequent Closed Itemsets in Sliding Window in Linear Time

    NASA Astrophysics Data System (ADS)

    Chen, Junbo; Zhou, Bo; Chen, Lu; Wang, Xinyu; Ding, Yiqun

    One of the most well-studied problems in data mining is computing the collection of frequent itemsets in large transactional databases. Since the introduction of the famous Apriori algorithm [14], many others have been proposed to find the frequent itemsets. Among such algorithms, the approach of mining closed itemsets has raised much interest in the data mining community. The algorithms taking this approach include TITANIC [8], CLOSET+ [6], DCI-Closed [4], FCI-Stream [3], GC-Tree [15], TGC-Tree [16], etc. Among these algorithms, FCI-Stream, GC-Tree and TGC-Tree are online algorithms that work under sliding window environments. According to the performance evaluation in [16], GC-Tree [15] is the fastest one. In this paper, an improved algorithm based on GC-Tree is proposed, the computational complexity of which is proved to be a linear combination of the average transaction size and the average closed itemset size. The algorithm is based on the essential theorem presented in Sect. 4.2. Empirically, the new algorithm is several orders of magnitude faster than the state-of-the-art algorithm, GC-Tree.
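
    To make the problem statement concrete, the sketch below computes closed frequent itemsets over a sliding window by brute force. It is a definition-level illustration only, not GC-Tree or the linear-time algorithm proposed in the paper: it recomputes everything for each window and is exponential in the number of distinct items. The names closed_frequent_itemsets, slide, window_size, and min_count are assumptions.

```python
from collections import deque
from itertools import combinations


def closed_frequent_itemsets(window, min_count):
    """Return {itemset: count} for closed frequent itemsets in the window.

    An itemset is frequent if at least min_count transactions contain it,
    and closed if no proper superset has the same count.
    """
    items = sorted(set().union(*window))
    support = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            count = sum(1 for t in window if itemset <= t)
            if count >= min_count:
                support[itemset] = count
    # A superset with equal count is itself frequent, so it is enough to
    # look for it among the frequent itemsets already collected.
    return {
        s: c
        for s, c in support.items()
        if not any(s < other and c == support[other] for other in support)
    }


def slide(stream, window_size, min_count):
    """Yield the closed frequent itemsets for each full window over the stream."""
    window = deque(maxlen=window_size)  # the oldest transaction drops out automatically
    for transaction in stream:
        window.append(frozenset(transaction))
        if len(window) == window_size:
            yield closed_frequent_itemsets(window, min_count)


# Invented toy stream of four transactions.
toy_stream = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}]
toy_results = list(slide(toy_stream, window_size=3, min_count=2))
```

    Closed itemsets are a lossless summary: the count of any frequent itemset equals the count of its smallest closed superset, which is why online algorithms maintain only the closed ones as the window moves.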

  13. Comparative Data Mining Analysis for Information Retrieval of MODIS Images: Monitoring Lake Turbidity Changes at Lake Okeechobee, Florida

    EPA Science Inventory

    In the remote sensing field, a frequently recurring question is: Which computational intelligence or data mining algorithms are most suitable for the retrieval of essential information given that most natural systems exhibit very high non-linearity. Among potential candidates mig...

  14. Methane Content Estimation in DuongHuy Coal Mine

    NASA Astrophysics Data System (ADS)

    Nguyen, Van Thinh; Mijał, Waldemar; Dang, Vu Chi; Nguyen, Thi Tuyet Mai

    2018-03-01

    Methane hazard has always been considered for underground coal mining as it can lead to methane explosion. In Quang Ninh province, several coal mines, such as Mạo Khe coal mine, Khe Cham coal mine, and especially Duong Huy mine, have high methane content. Experimental data on the methane contents of coal seams at different depths are not similar in Duong Huy coal mine. In order to ensure safety, this report has been undertaken to determine a pattern of changing methane contents of coal seams at different exploitation depths in Duong Huy underground coal mine.

  15. Seismic activity in the Sunnyside mining district, Utah, during 1967

    USGS Publications Warehouse

    Barnes, Barton K.; Dunrud, C. Richard; Hernandez, Jerome

    1969-01-01

    A seismic monitoring network near Sunnyside, Utah, consisting of a triangular array of seismometer stations that encompasses most of the mine workings in the district, recorded over 50,000 local earth tremors during 1967. About 540 of the tremors were of sufficient magnitude to be accurately located. Most of these were located within 2-3 miles of mine workings and were also near known or suspected faults. The district-wide seismic activity generally consisted of two different patterns--a periodic increase in the daily number of tremors at weekly intervals, and also a less regular and longer term increase and decrease of seismic activity that occurred over a period of weeks or even months. The shorter and more regular pattern can be correlated with the mine work week and seems to result from mining. The longer term activity, however, does not correlate with known mining causes and therefore seems to be caused by natural stresses.

  16. Finding Spatio-Temporal Patterns in Large Sensor Datasets

    ERIC Educational Resources Information Center

    McGuire, Michael Patrick

    2010-01-01

    Spatial or temporal data mining tasks are performed in the context of the relevant space, defined by a spatial neighborhood, and the relevant time period, defined by a specific time interval. Furthermore, when mining large spatio-temporal datasets, interesting patterns typically emerge where the dataset is most dynamic. This dissertation is…

  17. Examining Online Learning Patterns with Data Mining Techniques in Peer-Moderated and Teacher-Moderated Courses

    ERIC Educational Resources Information Center

    Hung, Jui-Long; Crooks, Steven M.

    2009-01-01

    The student learning process is important in online learning environments. If instructors can "observe" online learning behaviors, they can provide adaptive feedback, adjust instructional strategies, and assist students in establishing patterns of successful learning activities. This study used data mining techniques to examine and…

  18. Pattern Mining for Extraction of mentions of Adverse Drug Reactions from User Comments

    PubMed Central

    Nikfarjam, Azadeh; Gonzalez, Graciela H.

    2011-01-01

    Rapid growth of online health social networks has enabled patients to communicate more easily with each other. This exchange of opinions and experiences has provided a rich source of information about drugs and their effectiveness and, more importantly, their possible adverse reactions. We developed a system to automatically extract mentions of Adverse Drug Reactions (ADRs) from user reviews about drugs in social network websites by mining a set of language patterns. The system applied association rule mining on a set of annotated comments to extract the underlying patterns of colloquial expressions about adverse effects. The patterns were tested on a set of unseen comments to evaluate their performance. We achieved a precision of 70.01%, a recall of 66.32%, and an F-measure of 67.96%. PMID:22195162

  19. Privacy Preserving Sequential Pattern Mining in Data Stream

    NASA Astrophysics Data System (ADS)

    Huang, Qin-Hua

    Research on privacy preserving data mining techniques has gained much attention in recent years. For data stream systems, wireless networks and mobile devices, research on the related stream data mining techniques is still in its early stage. In this paper, a data mining algorithm dealing with the privacy preserving problem in data streams is presented.

  20. The Lure of Statistics in Data Mining

    ERIC Educational Resources Information Center

    Grover, Lovleen Kumar; Mehra, Rajni

    2008-01-01

    The field of Data Mining like Statistics concerns itself with "learning from data" or "turning data into information". For statisticians the term "Data mining" has a pejorative meaning. Instead of finding useful patterns in large volumes of data as in the case of Statistics, data mining has the connotation of searching for data to fit preconceived…

  1. Combining QSAR Modeling and Text-Mining Techniques to Link Chemical Structures and Carcinogenic Modes of Action.

    PubMed

    Papamokos, George; Silins, Ilona

    2016-01-01

    There is an increasing need for new reliable non-animal based methods to predict and test toxicity of chemicals. Quantitative structure-activity relationship (QSAR), a computer-based method linking chemical structures with biological activities, is used in predictive toxicology. In this study, we tested the approach to combine QSAR data with literature profiles of carcinogenic modes of action automatically generated by a text-mining tool. The aim was to generate data patterns to identify associations between chemical structures and biological mechanisms related to carcinogenesis. Using these two methods, individually and combined, we evaluated 96 rat carcinogens of the hematopoietic system, liver, lung, and skin. We found that skin and lung rat carcinogens were mainly mutagenic, while the group of carcinogens affecting the hematopoietic system and the liver also included a large proportion of non-mutagens. The automatic literature analysis showed that mutagenicity was a frequently reported endpoint in the literature of these carcinogens; however, less common endpoints such as immunosuppression and hormonal receptor-mediated effects were also found in connection with some of the carcinogens, results of potential importance for certain target organs. The combined approach, using QSAR and text-mining techniques, could be useful for identifying more detailed information on biological mechanisms and the relation with chemical structures. The method can be particularly useful in increasing the understanding of structure and activity relationships for non-mutagens.

  2. Combining QSAR Modeling and Text-Mining Techniques to Link Chemical Structures and Carcinogenic Modes of Action

    PubMed Central

    Papamokos, George; Silins, Ilona

    2016-01-01

    There is an increasing need for new reliable non-animal based methods to predict and test toxicity of chemicals. Quantitative structure-activity relationship (QSAR), a computer-based method linking chemical structures with biological activities, is used in predictive toxicology. In this study, we tested the approach to combine QSAR data with literature profiles of carcinogenic modes of action automatically generated by a text-mining tool. The aim was to generate data patterns to identify associations between chemical structures and biological mechanisms related to carcinogenesis. Using these two methods, individually and combined, we evaluated 96 rat carcinogens of the hematopoietic system, liver, lung, and skin. We found that skin and lung rat carcinogens were mainly mutagenic, while the group of carcinogens affecting the hematopoietic system and the liver also included a large proportion of non-mutagens. The automatic literature analysis showed that mutagenicity was a frequently reported endpoint in the literature of these carcinogens; however, less common endpoints such as immunosuppression and hormonal receptor-mediated effects were also found in connection with some of the carcinogens, results of potential importance for certain target organs. The combined approach, using QSAR and text-mining techniques, could be useful for identifying more detailed information on biological mechanisms and the relation with chemical structures. The method can be particularly useful in increasing the understanding of structure and activity relationships for non-mutagens. PMID:27625608

  3. Detecting causality from online psychiatric texts using inter-sentential language patterns

    PubMed Central

    2012-01-01

    Background: Online psychiatric texts are natural language texts expressing depressive problems, published by Internet users via community-based web services such as web forums, message boards and blogs. Understanding the cause-effect relations embedded in these psychiatric texts can provide insight into the authors’ problems, thus increasing the effectiveness of online psychiatric services. Methods: Previous studies have proposed the use of word pairs extracted from a set of sentence pairs to identify cause-effect relations between sentences. A word pair is made up of two words, with one coming from the cause text span and the other from the effect text span. Analysis of the relationship between these words can be used to capture individual word associations between cause and effect sentences. For instance, (broke up, life) and (boyfriend, meaningless) are two word pairs extracted from the sentence pair: “I broke up with my boyfriend. Life is now meaningless to me”. The major limitation of word pairs is that individual words in sentences usually cannot reflect the exact meaning of the cause and effect events, and thus may produce semantically incomplete word pairs, as the previous examples show. Therefore, this study proposes the use of inter-sentential language patterns such as <<broke up, boyfriend>,

  4. Source and path identification of metals pollution in a mining area by PMF and rare earth element patterns in road dust.

    PubMed

    Tian, Shuhan; Liang, Tao; Li, Kexin; Wang, Lingqing

    2018-08-15

    To better assess pollution and offer efficient protection for local residents, it is necessary to both conduct an exhaustive investigation into pollution levels and quantify its contributing sources and paths. As it is the biggest light rare earth element (REE) reserve in the world, Bayan Obo deposit releases large amounts of heavy metals into the surrounding environment. In this study, road dust from zones located at different distances to the mining area was collected and sieved using seven sizes. This allowed for subsequent analysis of size-dependent influences of mining activities. A receptor model was used to quantitatively assess mine contributions. REE distribution patterns and other REE parameters were compared with those in airborne particulates and the surrounding soil to analyze pollution paths. Results showed that 27 metals were rated as moderately to extremely polluted (2

  5. Occupational Malfunctioning and Fatigue Related Work Stress Disorders (FRWSDs): An Emerging Issue in Indian Underground Mine (UGM) Operations

    NASA Astrophysics Data System (ADS)

    Dey, Shibaji Ch.; Dey, Netai Chandra; Sharma, Gourab Dhara

    2018-04-01

    The Indian underground mining (UGM) transport system largely involves fore and back bearing work processes that are associated with occupational disorders and fatigue related work stress disorders (FRWSDs). This study therefore aims to determine fatigue related problems in general, and the Recovery Heart Rate (Rec HR) pattern and the exact cause of FRWSDs in particular. A group of twenty (N = 20) UGM operators was selected for the study. Heart rate profiles and work intensities of the selected workers were recorded continuously during their regular mine operation, and the same workers were tested on a treadmill on the surface at almost the same work intensity (%Maximal Heart Rate) as was earlier observed in the mine. Rec HR was recorded in both experimental settings. With almost the same work intensity, the recovery patterns after submaximal prolonged work in the mine differ from those on the treadmill. The study indicates that non-biomechanical muscle activity, along with environmental stressors, may influence the recovery pattern and FRWSDs.

  6. Data mining applications in the context of casemix.

    PubMed

    Koh, H C; Leong, S K

    2001-07-01

    In October 1999, the Singapore Government introduced casemix-based funding to public hospitals. The casemix approach to health care funding is expected to yield significant benefits, including equity and rationality in financing health care, the use of comparative casemix data for quality improvement activities, and the provision of information that enables hospitals to understand their cost behaviour and reinforces the drive for more cost-efficient services. However, there is some concern about the "quicker and sicker" syndrome (that is, the rapid discharge of patients with little regard for the quality of outcome). As it is likely that consequences of premature discharges will be reflected in the readmission data, an analysis of possible systematic patterns in readmission data can provide useful insight into the "quicker and sicker" syndrome. This paper explores potential data mining applications in the context of casemix by using readmission data as an illustration. In particular, it illustrates how data mining can be used to better understand readmission data and to detect systematic patterns, if any. From a technical perspective, data mining (which is capable of analysing complex non-linear and interaction relationships) supplements and complements traditional statistical methods in data analysis. From an applications perspective, data mining provides the technology and methodology to analyse mass volume of data to detect hidden patterns in data. Using readmission data as an illustrative data mining application, this paper explores potential data mining applications in the general casemix context.

  7. Complex Feeding Tracks of the Sessile Herbivorous Insect Ophiomyia maura as a Function of the Defense against Insect Parasitoids

    PubMed Central

    Ayabe, Yoshiko; Ueno, Takatoshi

    2012-01-01

    Because insect herbivores generally suffer from high mortality due to their natural enemies, reducing the risk of being located by natural enemies is of critical importance for them, forcing them to develop a variety of defensive measures. Larvae of leaf-mining insects lead a sedentary life inside a leaf and make conspicuous feeding tracks called mines, exposing themselves to the potential risk of parasitism. We investigated the defense strategy of the linear leafminer Ophiomyia maura Meigen (Diptera: Agromyzidae), by focusing on its mining patterns. We examined whether the leafminer could reduce the risk of being parasitized (1) by making cross structures in the inner area of a leaf to deter parasitoids from tracking the mines due to complex pathways, and (2) by mining along the edge of a leaf to hinder visually searching parasitoids from finding mined leaves due to effective background matching of the mined leaves among intact leaves. We quantified fractal dimension as mine complexity and area of mine in the inner area of the leaf as interior mine density for each sample mine, and analyzed whether these mine traits affected the susceptibility of O. maura to parasitism. Our results have shown that an increase in mine complexity with the development of occupying larvae decreases the probability of being parasitized, while interior mine density has no influence on parasitism. These results suggest that the larval development increases the host defense ability through increasing mine complexity. Thus the feeding pattern of these sessile insects has a defensive function by reducing the risk of parasitism. PMID:22393419

  8. Anoxia stimulates microbially catalyzed metal release from Animas River sediments

    DOE PAGES

    Saup, Casey M.; Williams, Kenneth H.; Rodríguez-Freire, Lucía; ...

    2017-03-06

    The Gold King Mine spill in August 2015 released 11 million liters of metal-rich mine waste to the Animas River watershed, an area that has been previously exposed to historical mining activity spanning more than a century. Although adsorption onto fluvial sediments was responsible for rapid immobilization of a significant fraction of the spill-associated metals, patterns of longer-term mobility are poorly constrained. Metals associated with river sediments collected downstream of the Gold King Mine in August 2015 exhibited distinct presence and abundance patterns linked to location and mineralogy. Simulating riverbed burial and development of anoxic conditions, sediment microcosm experiments amended with Animas River dissolved organic carbon revealed the release of specific metal pools coupled to microbial Fe- and SO₄²⁻-reduction. Results suggest that future sedimentation and burial of riverbed materials may drive longer-term changes in patterns of metal remobilization linked to anaerobic microbial metabolism, potentially driving decreases in downstream water quality. Such patterns emphasize the need for long-term water monitoring efforts in metal-impacted watersheds.

  9. Anoxia stimulates microbially catalyzed metal release from Animas River sediments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Saup, Casey M.; Williams, Kenneth H.; Rodríguez-Freire, Lucía

    The Gold King Mine spill in August 2015 released 11 million liters of metal-rich mine waste to the Animas River watershed, an area that has been previously exposed to historical mining activity spanning more than a century. Although adsorption onto fluvial sediments was responsible for rapid immobilization of a significant fraction of the spill-associated metals, patterns of longer-term mobility are poorly constrained. Metals associated with river sediments collected downstream of the Gold King Mine in August 2015 exhibited distinct presence and abundance patterns linked to location and mineralogy. Simulating riverbed burial and development of anoxic conditions, sediment microcosm experiments amended with Animas River dissolved organic carbon revealed the release of specific metal pools coupled to microbial Fe- and SO₄²⁻-reduction. Results suggest that future sedimentation and burial of riverbed materials may drive longer-term changes in patterns of metal remobilization linked to anaerobic microbial metabolism, potentially driving decreases in downstream water quality. Such patterns emphasize the need for long-term water monitoring efforts in metal-impacted watersheds.

  10. Mining Temporal Patterns to Improve Agents Behavior: Two Case Studies

    NASA Astrophysics Data System (ADS)

    Fournier-Viger, Philippe; Nkambou, Roger; Faghihi, Usef; Nguifo, Engelbert Mephu

    We propose two mechanisms for agent learning based on the idea of mining temporal patterns from agent behavior. The first one consists of extracting temporal patterns from the perceived behavior of other agents accomplishing a task, to learn the task. The second learning mechanism consists in extracting temporal patterns from an agent's own behavior. In this case, the agent then reuses patterns that brought self-satisfaction. In both cases, no assumption is made on how the observed agents' behavior is internally generated. A case study with a real application is presented to illustrate each learning mechanism.

  11. On mining complex sequential data by means of FCA and pattern structures

    NASA Astrophysics Data System (ADS)

    Buzmakov, Aleksey; Egho, Elias; Jay, Nicolas; Kuznetsov, Sergei O.; Napoli, Amedeo; Raïssi, Chedy

    2016-02-01

    Nowadays data-sets are available in very complex and heterogeneous ways. Mining of such data collections is essential to support many real-world applications ranging from healthcare to marketing. In this work, we focus on the analysis of "complex" sequential data by means of interesting sequential patterns. We approach the problem using the elegant mathematical framework of formal concept analysis and its extension based on "pattern structures". Pattern structures are used for mining complex data (such as sequences or graphs) and are based on a subsumption operation, which in our case is defined with respect to the partial order on sequences. We show how pattern structures along with projections (i.e. a data reduction of sequential structures) are able to enumerate more meaningful patterns and increase the computing efficiency of the approach. Finally, we show the applicability of the presented method for discovering and analysing interesting patient patterns from a French healthcare data-set on cancer. The quantitative and qualitative results (with annotations and analysis from a physician) are reported in this use-case which is the main motivation for this work.
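
    The subsumption relation on sequences mentioned above can be illustrated with a small sketch. It assumes (this is not stated in the abstract) that a sequence is a list of itemsets and that one sequence subsumes a pattern when the pattern's itemsets can be matched, in order, to supersets in the sequence. The names is_subsumed and support, and the toy database, are hypothetical.

```python
from typing import FrozenSet, Sequence

Itemset = FrozenSet[str]
Seq = Sequence[Itemset]


def is_subsumed(pattern: Seq, sequence: Seq) -> bool:
    """Return True if `pattern` is a subsequence of `sequence`.

    Assumed order: each itemset of the pattern must be contained in some
    itemset of the sequence, and the matches must keep their order.
    Greedy earliest matching is sufficient for this relation.
    """
    pos = 0  # index of the next pattern itemset still to be matched
    for itemset in sequence:
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)


def support(pattern: Seq, database: Sequence[Seq]) -> int:
    """Count the sequences in `database` that subsume `pattern`."""
    return sum(is_subsumed(pattern, s) for s in database)


# Hypothetical toy database of three sequences (not data from the paper).
database = [
    [frozenset({"a"}), frozenset({"b", "c"}), frozenset({"d"})],
    [frozenset({"a", "b"}), frozenset({"d"})],
    [frozenset({"b"}), frozenset({"a"})],
]
pattern = [frozenset({"a"}), frozenset({"d"})]
print(support(pattern, database))
```

    In this toy database the pattern is matched by the first two sequences only, since the third never contains "d" after "a".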

  12. Analyzing Large Gene Expression and Methylation Data Profiles Using StatBicRM: Statistical Biclustering-Based Rule Mining

    PubMed Central

    Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

    2015-01-01

    Microarray and beadchip are two most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining) to identify special type of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from the biological datasets. At first, a novel statistical strategy has been utilized to eliminate the insignificant/low-significant/redundant genes in such way that significance level must satisfy the data distribution property (viz., either normal distribution or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. Corresponding special type of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time, and can work on big dataset. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using David database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to know how much the evolved rules are able to describe accurately the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy, and other related factors with other rule-based classifiers. Statistical significance tests are also performed for verifying the statistical relevance of the comparative results. 
    Here, each of the other rule mining methods or rule-based classifiers also starts with the same post-discretized data-matrix. Finally, we have also included the integrated analysis of gene expression and methylation for determining epigenetic effect (viz., effect of methylation) on gene expression level. PMID:25830807
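
    The difference between frequent, closed, and maximal itemsets that the abstract relies on can be shown with a brute-force sketch. This is not the StatBicRM biclustering procedure; the gene labels g1 to g3, the toy transactions, and all function names are hypothetical and serve only to illustrate the definitions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Enumerate all itemsets contained in at least `min_support` transactions."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for cand in combinations(items, k):
            itemset = frozenset(cand)
            count = sum(itemset <= t for t in transactions)
            if count >= min_support:
                result[itemset] = count
                found = True
        if not found:
            break  # no frequent k-itemset, so no larger one can be frequent
    return result


def closed_itemsets(freq):
    """Keep itemsets that have no proper superset with the same support."""
    return {s: n for s, n in freq.items()
            if not any(s < t and n == m for t, m in freq.items())}


def maximal_itemsets(freq):
    """Keep itemsets that have no frequent proper superset at all."""
    return {s: n for s, n in freq.items() if not any(s < t for t in freq)}


# Hypothetical discretized samples: each set lists the genes that are "on".
samples = [{"g1", "g2", "g3"}, {"g1", "g2", "g3"}, {"g1", "g2"}, {"g3"}]
freq = frequent_itemsets(samples, min_support=2)
print(len(freq), len(closed_itemsets(freq)), len(maximal_itemsets(freq)))
```

    On this toy input there are seven frequent itemsets but only three closed ones and a single maximal one, which is why mining closed or maximal itemsets yields a smaller result than mining all frequent itemsets.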

  13. Analyzing large gene expression and methylation data profiles using StatBicRM: statistical biclustering-based rule mining.

    PubMed

    Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

    2015-01-01

    Microarray and beadchip are two most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining) to identify special type of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from the biological datasets. At first, a novel statistical strategy has been utilized to eliminate the insignificant/low-significant/redundant genes in such way that significance level must satisfy the data distribution property (viz., either normal distribution or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. Corresponding special type of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time, and can work on big dataset. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using David database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to know how much the evolved rules are able to describe accurately the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy, and other related factors with other rule-based classifiers. Statistical significance tests are also performed for verifying the statistical relevance of the comparative results. 
    Here, each of the other rule mining methods or rule-based classifiers also starts with the same post-discretized data-matrix. Finally, we have also included the integrated analysis of gene expression and methylation for determining epigenetic effect (viz., effect of methylation) on gene expression level.

  14. Application-Specific Graph Sampling for Frequent Subgraph Mining and Community Detection

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Purohit, Sumit; Choudhury, Sutanay; Holder, Lawrence B.

    Graph mining is an important data analysis methodology, but struggles as the input graph size increases. The scalability and usability challenges posed by such large graphs make it imperative to sample the input graph and reduce its size. The critical challenge in sampling is to identify the appropriate algorithm to ensure the resulting analysis does not suffer heavily from the data reduction. Predicting the expected performance degradation for a given graph and sampling algorithm is also useful. In this paper, we present different sampling approaches for graph mining applications such as Frequent Subgraph Mining (FSM) and Community Detection (CD). We explore graph metrics such as PageRank, Triangles, and Diversity to sample a graph and conclude that for heterogeneous graphs Triangles and Diversity perform better than degree based metrics. We also present two new sampling variations for targeted graph mining applications. We present empirical results to show that knowledge of the target application, along with input graph properties, can be used to select the best sampling algorithm. We also conclude that performance degradation is an abrupt, rather than gradual, phenomenon as the sample size decreases. We present the empirical results to show that the performance degradation follows a logistic function.

  15. Integrated field and laboratory tests to evaluate effects of metals-impacted wetlands on amphibians: A case study from Montana

    USGS Publications Warehouse

    Linder, G.

    2003-01-01

    Mining activities frequently impact wildlife habitats, and a wide range of habitats may require evaluations of the linkages between wildlife and environmental stressors common to mining activities (e.g., physical alteration of habitat, releases of chemicals such as metals and other inorganic constituents as part of the mining operation). Wetlands, for example, are frequently impacted by mining activities. Within an ecological assessment for a wetland, toxicity evaluations for representative species may be advantageous to the site evaluation, since these species could be exposed to complex chemical mixtures potentially released from the site. Amphibian species common to these transition zones between terrestrial and aquatic habitats are one key biological indicator of exposure, and integrated approaches which involve both field and laboratory methods focused on amphibians are critical to the assessment process. The laboratory and field evaluations of a wetland in western Montana illustrate the integrated approach to risk assessment and causal analysis. Here, amphibians were used to evaluate the potential toxicity associated with heavy metal-laden sediments deposited in a reservoir. Field and laboratory methods were applied to a toxicity assessment for metals characteristic of mine tailings to reduce potential "lab to field" extrapolation errors and provide adaptive management programs with critical site-specific information targeted on remediation.

  16. Porites corals as recorders of mining and environmental impacts: Misima Island, Papua New Guinea

    NASA Astrophysics Data System (ADS)

    Fallon, Stewart J.; White, Jamie C.; McCulloch, Malcolm T.

    2002-01-01

    In 1989 open-cut gold mining commenced on Misima Island in Papua New Guinea (PNG). Open-cut mining by its nature causes a significant increase in sedimentation via the exposure of soils to the erosive forces of rain and runoff. This increased sedimentation affected the nearby fringing coral reef to varying degrees, ranging from coral mortality (smothering) to relatively minor short-term impacts. The sediment associated with the mining operation consists of weathered quartz feldspar, greenstone, and schist. These rocks have distinct chemical characteristics (rare earth element patterns and high abundances of manganese, zinc, and lead) and are entering the near-shore environment in considerably higher than normal concentrations. Using laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS), we analyzed eight colonies (two from high sedimentation, two transitional, two minor, and two unaffected control sites) for Y, La, Ce, Mn, Zn, and Pb. All sites show low steady background levels prior to the commencement of mining in 1988. Subsequently, all sites apart from the control show dramatic increases of Y, La, and Ce associated with the increased sedimentation as well as rapid decreases following the cessation of mining. The elements Zn and Pb exhibit a different behavior, increasing in concentration after 1989 when ore processing began and one year after initial mining operations. Elevated levels of Zn and Pb in corals have continued well after the cessation of mining, indicating ongoing transport into the reef of these metals via sulfate-rich waters. Rare earth element (REE) abundance patterns measured in two corals show significant differences compared to Coral Sea seawater. The corals display enrichments in the light and middle REEs while the heavy REEs are depleted relative to the seawater pattern. This suggests that the nearshore seawater REE pattern is dominated by island sedimentation. Trace element abundances of Misima Island corals clearly record the dramatic changes in the environmental conditions at this site and provide a basis for identifying anthropogenic influences on coral reefs.

  17. Improve Data Mining and Knowledge Discovery Through the Use of MatLab

    NASA Technical Reports Server (NTRS)

    Shaykhian, Gholam Ali; Martin, Dawn (Elliott); Beil, Robert

    2011-01-01

    Data mining is widely used to mine business, engineering, and scientific data. Data mining uses pattern based queries, searches, or other analyses of one or more electronic databases/datasets in order to discover or locate a predictive pattern or anomaly indicative of system failure, criminal or terrorist activity, etc. There are various algorithms, techniques and methods used to mine data, including neural networks, genetic algorithms, decision trees, nearest neighbor method, rule induction association analysis, slice and dice, segmentation, and clustering. These algorithms, techniques and methods, used to detect patterns in a dataset, have been used in the development of numerous open source and commercially available products and technology for data mining. Data mining is best realized when latent information in a large quantity of stored data is discovered. No one technique solves all data mining problems; the challenge is to select algorithms or methods appropriate to strengthen data/text mining and trending within given datasets. In recent years, throughout industry, academia and government agencies, thousands of data systems have been designed and tailored to serve specific engineering and business needs. Many of these systems use databases with relational algebra and structured query language to categorize and retrieve data. In these systems, data analyses are limited and require prior explicit knowledge of metadata and database relations; they lack exploratory data mining and discovery of latent information. This presentation introduces MatLab(R) (MATrix LABoratory), an engineering and scientific data analysis tool, as a means to perform data mining. MatLab was originally intended to perform purely numerical calculations (a glorified calculator). Now, in addition to having hundreds of mathematical functions, it is a programming language with hundreds of built-in standard functions and numerous available toolboxes. MatLab's ease of data processing and visualization and its wide range of built-in functionalities and toolboxes make it suitable for numerical computations and simulations as well as for use as a data mining tool. Engineers and scientists can take advantage of the readily available functions/toolboxes to gain wider insight in their respective data mining experiments.

  18. Improve Data Mining and Knowledge Discovery through the use of MatLab

    NASA Technical Reports Server (NTRS)

    Shaykahian, Gholan Ali; Martin, Dawn Elliott; Beil, Robert

    2011-01-01

    Data mining is widely used to mine business, engineering, and scientific data. Data mining uses pattern based queries, searches, or other analyses of one or more electronic databases/datasets in order to discover or locate a predictive pattern or anomaly indicative of system failure, criminal or terrorist activity, etc. There are various algorithms, techniques and methods used to mine data, including neural networks, genetic algorithms, decision trees, nearest neighbor method, rule induction association analysis, slice and dice, segmentation, and clustering. These algorithms, techniques and methods, used to detect patterns in a dataset, have been used in the development of numerous open source and commercially available products and technology for data mining. Data mining is best realized when latent information in a large quantity of stored data is discovered. No one technique solves all data mining problems; the challenge is to select algorithms or methods appropriate to strengthen data/text mining and trending within given datasets. In recent years, throughout industry, academia and government agencies, thousands of data systems have been designed and tailored to serve specific engineering and business needs. Many of these systems use databases with relational algebra and structured query language to categorize and retrieve data. In these systems, data analyses are limited and require prior explicit knowledge of metadata and database relations; they lack exploratory data mining and discovery of latent information. This presentation introduces MatLab(TradeMark) (MATrix LABoratory), an engineering and scientific data analysis tool, as a means to perform data mining. MatLab was originally intended to perform purely numerical calculations (a glorified calculator). Now, in addition to having hundreds of mathematical functions, it is a programming language with hundreds of built-in standard functions and numerous available toolboxes. MatLab's ease of data processing and visualization and its wide range of built-in functionalities and toolboxes make it suitable for numerical computations and simulations as well as for use as a data mining tool. Engineers and scientists can take advantage of the readily available functions/toolboxes to gain wider insight in their respective data mining experiments.

  19. Study of acid mine drainage management with evaluating climate and rainfall in East Pit 3 West Banko coal mine

    NASA Astrophysics Data System (ADS)

    Rochyani, Neny

    2017-11-01

    Acid mine drainage is a major problem for the mining environment. The main factor in the formation of acid mine drainage is the volume of rainfall. It is therefore important to understand the dominant climate pattern of rainfall and season when managing acid mine drainage. This study focuses on the effects of rainfall on acid mine water management. The amount of rainfall in the East Pit 3 West Banko area was determined from daily rainfall data and from monthly and seasonal patterns using the Gumbel approach. The data give a highest maximum daily rainfall of 165 mm/day and a lowest of 76.4 mm/day. During the period 2007-2016, the high-rainfall period ran from November to April, when lime use is low, while the low-rainfall period ran from May to October, when lime use increases. Based on the calculated lime requirement for each return period, the total lime and financial requirements for treatment in each return period can be determined.
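
    The abstract does not give the fitting details, so the following is only a minimal sketch of one common way to apply a Gumbel approach: a method-of-moments fit to annual maximum daily rainfall, followed by return-level estimates. The rainfall values are invented for illustration (chosen only to lie between the 76.4 and 165 mm/day extremes reported) and are not the study's data; the function names are hypothetical.

```python
import math
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel(annual_maxima):
    """Method-of-moments estimates (location mu, scale beta) of a Gumbel fit."""
    beta = stdev(annual_maxima) * math.sqrt(6) / math.pi  # sample std. dev.
    mu = mean(annual_maxima) - EULER_GAMMA * beta
    return mu, beta


def return_level(mu, beta, period_years):
    """Rainfall exceeded on average once every `period_years` years."""
    non_exceedance = 1.0 - 1.0 / period_years
    return mu - beta * math.log(-math.log(non_exceedance))


# Invented annual maximum daily rainfall values in mm/day (illustration only).
annual_maxima = [76.4, 98.0, 110.5, 87.2, 165.0, 120.3, 93.1, 134.8, 101.6, 115.9]
mu, beta = fit_gumbel(annual_maxima)
for period in (2, 5, 10, 25):
    print(period, round(return_level(mu, beta, period), 1))
```

    A lime requirement per return period, as described in the abstract, would then be derived from each return level; that conversion depends on site-specific parameters the abstract does not state.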

  20. Safety Psychology Applicating on Coal Mine Safety Management Based on Information System

    NASA Astrophysics Data System (ADS)

    Hou, Baoyue; Chen, Fei

    In recent years, as the intensity of coal mining has increased, major accidents have occurred frequently. Most are attributable to human factors, and unsafe human behavior is in turn shaped by an insecure state of mind. To reduce accidents and improve safety management, we apply safety psychology to analyse the causes of unsafe psychological factors from the perspectives of human perception, personality development, motivation and incentives, reward and punishment mechanisms, and safety-related mental training. We then put forward countermeasures to promote safe coal mine production and to provide information that helps coal mines raise their level of safety management.

  1. Data Mining Citizen Science Results

    NASA Astrophysics Data System (ADS)

    Borne, K. D.

    2012-12-01

    Scientific discovery from big data is enabled through multiple channels, including data mining (through the application of machine learning algorithms) and human computation (commonly implemented through citizen science tasks). We will describe the results of new data mining experiments on the results from citizen science activities. Discovering patterns, trends, and anomalies in data are among the powerful contributions of citizen science. Establishing scientific algorithms that can subsequently re-discover the same types of patterns, trends, and anomalies in automatic data processing pipelines will ultimately result from the transformation of those human algorithms into computer algorithms, which can then be applied to much larger data collections. Scientific discovery from big data is thus greatly amplified through the marriage of data mining with citizen science.

  2. Data mining in pharma sector: benefits.

    PubMed

    Ranjan, Jayanthi

    2009-01-01

    The amount of data getting generated in any sector at present is enormous. The information flow in the pharma industry is huge. Pharma firms are progressing into increased technology-enabled products and services. Data mining, which is knowledge discovery from large sets of data, helps pharma firms to discover patterns in improving the quality of drug discovery and delivery methods. The paper aims to present how data mining is useful in the pharma industry, how its techniques can yield good results in pharma sector, and to show how data mining can really enhance in making decisions using pharmaceutical data. This conceptual paper is written based on secondary study, research and observations from magazines, reports and notes. The author has listed the types of patterns that can be discovered using data mining in pharma data. The paper shows how data mining is useful in the pharma industry and how its techniques can yield good results in pharma sector. Although much work can be produced for discovering knowledge in pharma data using data mining, the paper is limited to conceptualizing the ideas and view points at this stage; future work may include applying data mining techniques to pharma data based on primary research using the available, famous significant data mining tools. Research papers and conceptual papers related to data mining in Pharma industry are rare; this is the motivation for the paper.

  3. Process Mining Online Assessment Data

    ERIC Educational Resources Information Center

    Pechenizkiy, Mykola; Trcka, Nikola; Vasilyeva, Ekaterina; van der Aalst, Wil; De Bra, Paul

    2009-01-01

    Traditional data mining techniques have been extensively applied to find interesting patterns, build descriptive and predictive models from large volumes of data accumulated through the use of different information systems. The results of data mining can be used for getting a better understanding of the underlying educational processes, for…

  4. Mechanistic insights of the Min oscillator via cell-free reconstitution and imaging

    NASA Astrophysics Data System (ADS)

    Mizuuchi, Kiyoshi; Vecchiarelli, Anthony G.

    2018-05-01

    The MinD and MinE proteins of Escherichia coli self-organize into a standing-wave oscillator on the membrane to help align division at mid-cell. When unleashed from cellular confines, MinD and MinE form a spectrum of patterns on artificial bilayers—static amoebas, traveling waves, traveling mushrooms, and bursts with standing-wave dynamics. We recently focused our cell-free studies on bursts because their dynamics recapitulate many features of Min oscillation observed in vivo. The data unveiled a patterning mechanism largely governed by MinE regulation of MinD interaction with membrane. We proposed that the MinD to MinE ratio on the membrane acts as a toggle switch between MinE-stimulated recruitment and release of MinD from the membrane. In this review, we summarize cell-free data on the Min system and expand upon a molecular mechanism that provides a biochemical explanation as to how these two ‘simple’ proteins can form the remarkable spectrum of patterns.

  5. AprioriGWAS, a new pattern mining strategy for detecting genetic variants associated with disease through interaction effects.

    PubMed

    Zhang, Qingrun; Long, Quan; Ott, Jurg

    2014-06-01

    Identifying gene-gene interaction is a hot topic in genome-wide association studies. Two fundamental challenges are: (1) how to smartly identify combinations of variants that may be associated with the trait from the astronomical number of all possible combinations; and (2) how to test epistatic interaction when all potential combinations are available. We developed AprioriGWAS, which brings two innovations. (1) Based on Apriori, a successful method in the field of Frequent Itemset Mining (FIM) in which a pattern growth strategy is leveraged to effectively and accurately reduce the search space, AprioriGWAS can efficiently identify genetically associated genotype patterns. (2) To test the hypotheses of epistasis, we adopt a new conditional permutation procedure to obtain reliable statistical inference of Pearson's chi-square test for the [Formula: see text] contingency table generated by associated variants. By applying AprioriGWAS to age-related macular degeneration (AMD) data, we found that: (1) angiopoietin 1 (ANGPT1) and four retinal genes interact with Complement Factor H (CFH). (2) The GO term "glycosaminoglycan biosynthetic process" was enriched in AMD interacting genes. The epistatic interactions newly found by AprioriGWAS on AMD data are likely true interactions, since the genes interacting with CFH are retinal genes, and GO term enrichment also verified that the interaction between glycosaminoglycans (GAGs) and CFH plays an important role in the disease pathology of AMD. By applying AprioriGWAS to bipolar disorder in WTCCC data, we found that variants without marginal effect show significant interactions, for example, multiple-SNP genotype patterns inside the genes GABRB2 and GRIA1 (AMPA subunit 1 receptor gene). AMPARs are found in many parts of the brain and are the most commonly found receptor in the nervous system. GABRB2 mediates the fastest inhibitory synaptic transmission in the central nervous system. GRIA1 and GABRB2 are relevant to mental disorders, as supported by multiple lines of evidence.
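
    The level-wise search that the classic Apriori algorithm uses to shrink the space of candidate combinations can be sketched as follows. This is a minimal illustration of textbook Apriori only, not the AprioriGWAS implementation; the function name `apriori`, the item labels, and the support threshold are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    Hypothetical helper for illustration; min_support is an absolute count.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count step: one pass over the data per level.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Toy data: each "transaction" is a set of item labels (assumed values).
data = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
result = apriori(data, min_support=3)
```

    In this toy data every pair occurs three times and is kept, while the triple occurs only twice and is discarded, so no larger candidates are ever generated from it.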

  6. A Study on Environmental Research Trends Using Text-Mining Method - Focus on Spatial information and ICT -

    NASA Astrophysics Data System (ADS)

    Lee, M. J.; Oh, K. Y.; Joung-ho, L.

    2016-12-01

    Recently, much research in various fields has analysed the interactions between entities using text-mining analysis. In this paper, we aimed to quantitatively analyse research trends in environmental research relating to either spatial information or ICT (Information and Communications Technology) by text-mining analysis. To do this, we applied a low-dimensional embedding method, clustering analysis, and association rules to find meaningful associative patterns among keywords that frequently appeared in the articles. As the authors suppose that KCI (Korea Citation Index) articles reflect academic demands, a total of 1228 KCI articles published from 1996 to 2015 were reviewed and analysed by the text-mining method. First, we retrieved KCI articles from the NDSL (National Discovery for Science Leaders) site. We then pre-processed the keywords selected from the abstracts and classified them into separable sectors. We investigated the appearance rates and association rules of keywords for articles in the two fields: spatial information and ICT. In order to detect historic trends, the analysis was conducted separately for four periods: 1996-2000, 2001-2005, 2006-2010, and 2011-2015. These analyses were conducted using R software. As a result, we confirmed that environmental research relating to spatial information mainly focused on such fields as `GIS(35%)', `Remote-Sensing(25%)', and `environmental theme map(15.7%)'. Next, `ICT technology(23.6%)', `ICT service(5.4%)', `mobile(24%)', `big data(10%)', and `AI(7%)' are primarily emerging from environmental research relating to ICT. Thus, from the analysis results, this paper asserts that research trends and academic progress are well structured for reviewing recent spatial information and ICT technology, and that the outcomes of the analysis can serve as adequate guidelines for establishing environmental policies and strategies.
    KEY WORDS: Big data, Text-mining, Environmental research, Spatial-information, ICT. Acknowledgements: The authors appreciate the support that this study has received from `Building application frame of environmental issues, to respond to the latest ICT trends'.

  7. Information mining over heterogeneous and high-dimensional time-series data in clinical trials databases.

    PubMed

    Altiparmak, Fatih; Ferhatosmanoglu, Hakan; Erdal, Selnur; Trost, Donald C

    2006-04-01

    An effective analysis of clinical trials data involves analyzing different types of data, such as heterogeneous and high-dimensional time series data. Current time series analysis methods generally assume that the series at hand are long enough for statistical techniques to be applied to them. Other ideal-case assumptions are that data are collected at equal-length intervals and that, when comparing time series, the lengths are equal to each other. However, these assumptions are not valid for many real data sets, especially clinical trials data sets. In addition, the data sources differ from each other, the data are heterogeneous, and the sensitivity of the experiments varies by source. Approaches for mining time series data need to be revisited, keeping the wide range of requirements in mind. In this paper, we propose a novel approach for information mining that involves two major steps: applying a data mining algorithm over homogeneous subsets of data, and identifying common or distinct patterns over the information gathered in the first step. Our approach is implemented specifically for heterogeneous and high-dimensional time series clinical trials data. Using this framework, we propose a new way of utilizing frequent itemset mining, as well as clustering and declustering techniques with novel distance metrics for measuring similarity between time series data. By clustering the data, we find groups of analytes (substances in blood) that are most strongly correlated. Most of these relationships are already known and are verified by the clinical panels, and, in addition, we identify novel groups that need further biomedical analysis. A slight modification to our algorithm results in an effective declustering of high-dimensional time series data, which is then used for "feature selection." Using industry-sponsored clinical trials data sets, we are able to identify a small set of analytes that effectively models the state of normal health.

  8. A Contextualized, Differential Sequence Mining Method to Derive Students' Learning Behavior Patterns

    ERIC Educational Resources Information Center

    Kinnebrew, John S.; Loretz, Kirk M.; Biswas, Gautam

    2013-01-01

    Computer-based learning environments can produce a wealth of data on student learning interactions. This paper presents an exploratory data mining methodology for assessing and comparing students' learning behaviors from these interaction traces. The core algorithm employs a novel combination of sequence mining techniques to identify differentially…

  9. Collaborative mining of graph patterns from multiple sources

    NASA Astrophysics Data System (ADS)

    Levchuk, Georgiy; Colonna-Romanoa, John

    2016-05-01

    Intelligence analysts require automated tools to mine multi-source data, including answering queries, learning patterns of life, and discovering malicious or anomalous activities. Graph mining algorithms have recently attracted significant attention in intelligence community, because the text-derived knowledge can be efficiently represented as graphs of entities and relationships. However, graph mining models are limited to use-cases involving collocated data, and often make restrictive assumptions about the types of patterns that need to be discovered, the relationships between individual sources, and availability of accurate data segmentation. In this paper we present a model to learn the graph patterns from multiple relational data sources, when each source might have only a fragment (or subgraph) of the knowledge that needs to be discovered, and segmentation of data into training or testing instances is not available. Our model is based on distributed collaborative graph learning, and is effective in situations when the data is kept locally and cannot be moved to a centralized location. Our experiments show that proposed collaborative learning achieves learning quality better than aggregated centralized graph learning, and has learning time comparable to traditional distributed learning in which a knowledge of data segmentation is needed.

  10. Collaborative mining and transfer learning for relational data

    NASA Astrophysics Data System (ADS)

    Levchuk, Georgiy; Eslami, Mohammed

    2015-06-01

    Many real-world problems, including human knowledge, communication, biological, and cyber network analysis, deal with data entities for which the essential information is contained in the relations among those entities. Such data must be modeled and analyzed as graphs, with attributes on both objects and relations that encode and differentiate their semantics. Traditional data mining algorithms were originally designed for analyzing discrete objects for which a set of features can be defined, and thus cannot be easily adapted to deal with graph data. This gave rise to the relational data mining field of research, of which graph pattern learning is a key sub-domain [11]. In this paper, we describe a model for learning graph patterns in a collaborative distributed manner. Distributed pattern learning is challenging due to dependencies between the nodes and relations in the graph, and variability across graph instances. We present three algorithms that trade off the benefits of parallelization and data aggregation, compare their performance to centralized graph learning, and discuss the individual benefits and weaknesses of each model. The presented algorithms are designed for linear speedup in distributed computing environments, and learn graph patterns that are both closer to ground truth and provide higher detection rates than the centralized mining algorithm.

  11. SOMA: A Proposed Framework for Trend Mining in Large UK Diabetic Retinopathy Temporal Databases

    NASA Astrophysics Data System (ADS)

    Somaraki, Vassiliki; Harding, Simon; Broadbent, Deborah; Coenen, Frans

    In this paper, we present SOMA, a new trend mining framework; and Aretaeus, the associated trend mining algorithm. The proposed framework is able to detect different kinds of trends within longitudinal datasets. The prototype trends are defined mathematically so that they can be mapped onto the temporal patterns. Trends are defined and generated in terms of the frequency of occurrence of pattern changes over time. To evaluate the proposed framework the process was applied to a large collection of medical records, forming part of the diabetic retinopathy screening programme at the Royal Liverpool University Hospital.

  12. Web usage data mining agent

    NASA Astrophysics Data System (ADS)

    Madiraju, Praveen; Zhang, Yanqing

    2002-03-01

    When a user logs in to a website, behind the scenes the user leaves his/her impressions, usage patterns, and access patterns in the web server's log file. A web usage mining agent can analyze these web logs to help web developers improve the organization and presentation of their websites. It can also help system administrators improve system performance. Web logs provide invaluable help in creating adaptive web sites and in analyzing network traffic. This paper presents the design and implementation of a Web usage mining agent for digging into the web log files.
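
    As a rough illustration of the first step such an agent might take, the sketch below parses web server log lines and counts successful requests per page. The abstract does not describe the log format or the agent's design, so the Common Log Format layout, the regular expression, the function name `page_counts`, and the sample lines are all assumptions made for the example.

```python
import re
from collections import Counter

# Assumed layout: Common Log Format, e.g.
# host ident user [time] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def page_counts(lines):
    """Count successful (2xx) requests per requested path."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and match.group("status").startswith("2"):
            counts[match.group("path")] += 1
    return counts


# Hypothetical sample log lines.
sample = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] "GET /about.html HTTP/1.0" 200 512',
    '10.0.0.1 - - [10/Oct/2000:13:57:12 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.3 - - [10/Oct/2000:13:58:40 -0700] "GET /missing HTTP/1.0" 404 209',
]
counts = page_counts(sample)
```

    Per-page counts like these are one simple input for the kinds of decisions the abstract mentions, such as reorganizing a site around its most visited pages.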

  13. Data mining for blood glucose prediction and knowledge discovery in diabetic patients: the METABO diabetes modeling and management system.

    PubMed

    Georga, Eleni; Protopappas, Vasilios; Guillen, Alejandra; Fico, Giuseppe; Ardigo, Diego; Arredondo, Maria Teresa; Exarchos, Themis P; Polyzos, Demosthenes; Fotiadis, Dimitrios I

    2009-01-01

    METABO is a diabetes monitoring and management system that aims at recording and interpreting the patient's context, as well as at providing decision support to both the patient and the doctor. The METABO system consists of (a) a Patient's Mobile Device (PMD), (b) different types of unobtrusive biosensors, (c) a Central Subsystem (CS) located remotely at the hospital and (d) the Control Panel (CP) from which physicians can follow up their patients and also gain access to the CS. METABO provides a multi-parametric monitoring system which facilitates the efficient and systematic recording of dietary, physical activity, medication and medical information (continuous and discontinuous glucose measurements). Based on all recorded contextual information, data mining schemes that run in the PMD are responsible for modeling patients' metabolism, predicting hypo/hyper-glycaemic events, and providing the patient with short- and long-term alerts. In addition, all past and recently recorded data are analyzed to extract patterns of behavior, discover new knowledge and provide explanations to the physician through the CP. Advanced tools in the CP allow the physician to prescribe personalized treatment plans and frequently quantify the patient's adherence to treatment.

  14. Data Mining for Financial Applications

    NASA Astrophysics Data System (ADS)

    Kovalerchuk, Boris; Vityaev, Evgenii

    This chapter describes Data Mining in finance by discussing financial tasks, specifics of methodologies and techniques in this Data Mining area. It includes time dependence, data selection, forecast horizon, measures of success, quality of patterns, hypothesis evaluation, problem ID, method profile, attribute-based and relational methodologies. The second part of the chapter discusses Data Mining models and practice in finance. It covers use of neural networks in portfolio management, design of interpretable trading rules and discovering money laundering schemes using decision rules and relational Data Mining methodology.

  15. 76 FR 5719 - Pattern of Violations

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-02-02

    ... safety and health record of each mine rather than on a strictly quantitative comparison of mines to... several reservations, given the methodological difficulties involved in estimating the compensating wage...

  16. Algorithm to Identify Frequent Coupled Modules from Two-Layered Network Series: Application to Study Transcription and Splicing Coupling

    PubMed Central

    Li, Wenyuan; Dai, Chao; Liu, Chun-Chi

    2012-01-01

    Abstract Current network analysis methods all focus on one or multiple networks of the same type. However, cells are organized by multi-layer networks (e.g., transcriptional regulatory networks, splicing regulatory networks, protein-protein interaction networks), which interact and influence each other. Elucidating the coupling mechanisms among those different types of networks is essential in understanding the functions and mechanisms of cellular activities. In this article, we developed the first computational method for pattern mining across many two-layered graphs, with the two layers representing different types yet coupled biological networks. We formulated the problem of identifying frequent coupled clusters between the two layers of networks into a tensor-based computation problem, and proposed an efficient solution to solve the problem. We applied the method to 38 two-layered co-transcription and co-splicing networks, derived from 38 RNA-seq datasets. With the identified atlas of coupled transcription-splicing modules, we explored to what extent, for which cellular functions, and by what mechanisms transcription-splicing coupling takes place. PMID:22697243

  17. Constructing and Classifying Email Networks from Raw Forensic Images

    DTIC Science & Technology

    2016-09-01

    data mining for sequence and pattern mining; in medical imaging for image segmentation; and in computer vision for object recognition” [28]. 2.3.1...machine learning and data mining suite that is written in Python. It provides a platform for experiment selection, recommendation systems, and...predictive modeling. The Orange library is a hierarchically-organized toolbox of data mining components. Data filtering and probability assessment are at the

  18. Spatial and temporal relationships among watershed mining, water quality, and freshwater mussel status in an eastern USA river.

    PubMed

    Zipper, Carl E; Donovan, Patricia F; Jones, Jess W; Li, Jing; Price, Jennifer E; Stewart, Roger E

    2016-01-15

    The Powell River of southwestern Virginia and northeastern Tennessee, USA, drains a watershed with extensive coal surface mining, and it hosts exceptional biological richness, including at-risk species of freshwater mussels, downstream of mining-disturbed watershed areas. We investigated spatial and temporal patterns of watershed mining disturbance; their relationship to water quality change in the section of the river that connects mining areas to mussel habitat; and relationships of mining-related water constituents to measures of recent and past mussel status. Freshwater mussels in the Powell River have experienced significant declines over the past 3.5 decades. Over that same period, surface coal mining has influenced the watershed. Water-monitoring data collected by state and federal agencies demonstrate that dissolved solids and associated constituents that are commonly influenced by Appalachian mining (specific conductance, pH, hardness and sulfates) have experienced increasing temporal trends from the 1960s through ~2008; but, of those constituents, only dissolved solids concentrations are available widely within the Powell River since ~2008. Dissolved solids concentrations have stabilized in recent years. Dissolved solids, specific conductance, pH, and sulfates also exhibited spatial patterns that are consistent with dilution of mining influence with increasing distance from mined areas. Freshwater mussel status indicators are correlated negatively with dissolved solids concentrations, spatially and temporally, but the direct causal mechanisms responsible for mussel declines remain unknown. Copyright © 2015 Elsevier B.V. All rights reserved.

  19. 76 FR 51274 - Supplemental Nutrition Assistance Program: Major System Failures

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-08-18

    ... data mining as necessary to determine if losses are occurring in the process of issuing benefits. It is... further by using data mining techniques on States' data or analyzing QC data for error patterns that may... conjunction with an additional sample of cases. Data mining techniques may be employed when QC data cannot...

  20. Exploring the Integration of Data Mining and Data Visualization

    ERIC Educational Resources Information Center

    Zhang, Yi

    2011-01-01

    Due to the rapid advances in computing and sensing technologies, enormous amounts of data are being generated everyday in various applications. The integration of data mining and data visualization has been widely used to analyze these massive and complex data sets to discover hidden patterns. For both data mining and visualization to be…

  1. Selection of Most Proper Blasting Pattern in Mines Using Linear Assignment Method: Sungun Copper Mine / Wybór Najodpowiedniejszego Schematu Prowadzenia Prac Strzałowych W Kopalni Miedzi Sungun Z Użyciem Metody Przyporządkowania Liniowego

    NASA Astrophysics Data System (ADS)

    Yari, Mojtaba; Bagherpour, Raheb; Jamali, Saeed; Asadi, Fatemeh

    2015-03-01

    One of the most important operations in mining is blasting. Improper design of the blasting pattern causes technical and safety problems. Given the impact of blasting results on the subsequent steps of mining, selecting the correct pattern requires great care. In selecting a blasting pattern, technical, economic, and safety aspects should be considered. Thus, selecting the most appropriate pattern can be defined as a Multi Attribute Decision Making (MADM) problem. The linear assignment method is a highly applicable method for decision-making problems. In this paper, this method was used for the first time to evaluate blasting patterns in a mine. In this ranking, safety and technical parameters were considered. Finally, the blasting pattern with a burden of 3.5 m, spacing of 4.5 m, stemming of 3.8 m, and hole length of 12.1 m is presented as the most suitable pattern obtained from the linear assignment model for the Sungun Copper Mine.

  2. Data Mining in Cyber Operations

    DTIC Science & Technology

    2014-07-01

    information processing units intended to mimic the network of neurons in the human brain for performing pattern recognition Self-organizing maps (SOM...patterns are mined from in order to influence the learning model. An exploratory attack does not alter the training process, but rather uses other...New Jersey: Prentice Hall. 21) Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43, 59–69

  3. A novel approach for acid mine drainage pollution biomonitoring using rare earth elements bioaccumulated in the freshwater clam Corbicula fluminea.

    PubMed

    Bonnail, Estefanía; Pérez-López, Rafael; Sarmiento, Aguasanta M; Nieto, José Miguel; DelValls, T Ángel

    2017-09-15

    The lanthanide series has been used as a record of water-rock interaction and works as a tool for identifying the impacts of acid mine drainage (lixiviate residue derived from sulphide oxidation). Applying North American Shale Composite (NASC)-normalized rare earth element patterns to these minority elements allows the origin of the contamination to be determined. In the current study, geochemical patterns were applied to rare earth elements bioaccumulated in the soft tissue of the freshwater clam Corbicula fluminea after exposure to different acid mine drainage contaminated environments. Results show significant bioaccumulation of rare earth elements in the soft tissue of the clam after 14 days of exposure to acid mine drainage contaminated sediment (ΣREE=1.3-8μg/gdw). Furthermore, it was possible to biomonitor different degrees of contamination based on rare earth elements in tissue. The pattern of this type of contamination describes a particular curve characterized by an enrichment in the middle rare earth elements; a homologous pattern (E_MREE = 0.90) was also observed when NASC normalization was applied to clam tissues. Results for lanthanides found in clams were contrasted with the paucity of toxicity studies, determining the risk caused by light rare earth elements in the Odiel River close to the estuary. The current study proposes the use of the clam as an innovative "bio-tool" for the biogeochemical monitoring of pollution inputs that determines how river networks are affected by acid mine drainage. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. Data Mining and Homeland Security: An Overview

    DTIC Science & Technology

    2007-01-18

    originally collected. A fourth issue is privacy. Questions that may be considered include the degree to which government agencies should use and mix...commercial data with government data, whether data sources are being used for purposes other than those for which they were originally designed, and...unique or frequently represented. For example, a hardware...John Makulowich, “Government Data Mining Systems Defy Definition,” Washington

  5. Metal speciation in agricultural soils adjacent to the Irankuh Pb-Zn mining area, central Iran

    NASA Astrophysics Data System (ADS)

    Mokhtari, Ahmad Reza; Roshani Rodsari, Parisa; Cohen, David R.; Emami, Adel; Dehghanzadeh Bafghi, Ali Akbar; Khodaian Ghegeni, Ziba

    2015-01-01

    Mining activities are a significant potential source of metal contamination of soils in surrounding areas, with particular concern for metals dispersed into agricultural areas in forms that are bioavailable and that may affect human health. Soils in agricultural land adjacent to Pb-Zn mining operations in the southern part of the Irankuh Mountains contain elevated concentrations of a range of metals associated with the mineralization (including Pb, Zn and As). Total and partial geochemical extraction data from a suite of 137 soil samples are used to establish mineralogical controls on ore-related trace elements and to help differentiate spatial patterns that can be related to the effects of mining on the agricultural soils from general geological and environmental controls. Whereas the patterns for Pb, Zn and As are spatially related to the mining operations, they display little correlation with the distribution of secondary Fe + Mn oxyhydroxides or carbonates, suggesting dispersion as dust and in forms with limited bioavailability.

  6. Efficient Mining and Detection of Sequential Intrusion Patterns for Network Intrusion Detection Systems

    NASA Astrophysics Data System (ADS)

    Shyu, Mei-Ling; Huang, Zifang; Luo, Hongli

    In recent years, pervasive computing infrastructures have greatly improved the interaction between human and system. As we put more reliance on these computing infrastructures, we also face threats of network intrusion and/or any new forms of undesirable IT-based activities. Hence, network security has become an extremely important issue, which is closely connected with homeland security, business transactions, and people's daily life. Accurate and efficient intrusion detection technologies are required to safeguard the network systems and the critical information transmitted in the network systems. In this chapter, a novel network intrusion detection framework for mining and detecting sequential intrusion patterns is proposed. The proposed framework consists of a Collateral Representative Subspace Projection Modeling (C-RSPM) component for supervised classification, and an inter-transactional association rule mining method based on Layer Divided Modeling (LDM) for temporal pattern analysis. Experiments on the KDD99 data set and the traffic data set generated by a private LAN testbed show promising results with high detection rates, low processing time, and low false alarm rates in mining and detecting sequential intrusion detections.

  7. Data mining: sophisticated forms of managed care modeling through artificial intelligence.

    PubMed

    Borok, L S

    1997-01-01

    Data mining is a recent development in computer science that combines artificial intelligence algorithms and relational databases to discover patterns automatically, without the use of traditional statistical methods. Work with data mining tools in health care is in a developmental stage that holds great promise, given the combination of demographic and diagnostic information.

  8. Using Text Mining to Uncover Students' Technology-Related Problems in Live Video Streaming

    ERIC Educational Resources Information Center

    Abdous, M'hammed; He, Wu

    2011-01-01

    Because of their capacity to sift through large amounts of data, text mining and data mining are enabling higher education institutions to reveal valuable patterns in students' learning behaviours without having to resort to traditional survey methods. In an effort to uncover live video streaming (LVS) students' technology related-problems and to…

  9. An improved Pearson's correlation proximity-based hierarchical clustering for mining biological association between genes.

    PubMed

    Booma, P M; Prabhakaran, S; Dhanalakshmi, R

    2014-01-01

    Microarray gene expression datasets have attracted great attention among molecular biologists, statisticians, and computer scientists. Data mining that extracts the hidden and usual information from datasets fails to identify the most significant biological associations between genes. A heuristic search for standard biological processes measures only the gene expression level, threshold, and response time. Heuristic search identifies and mines the best biological solution, but the association process was not efficiently addressed. To monitor a higher rate of expression levels between genes, a hierarchical clustering model was proposed, in which the biological association between genes is measured simultaneously using a proximity measure based on an improved Pearson's correlation (PCPHC). Additionally, the Seed Augment algorithm adopts average linkage methods on rows and columns in order to expand a seed PCPHC model into a maximal global PCPHC (GL-PCPHC) model and to identify associations between the clusters. Moreover, GL-PCPHC applies a pattern growing method to mine the PCPHC patterns. Compared to existing gene expression analysis, the PCPHC model achieves better performance. Experimental evaluations are conducted for the GL-PCPHC model with standard benchmark gene expression datasets extracted from the UCI repository and GenBank database in terms of execution time, size of pattern, significance level, biological association efficiency, and pattern quality.
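
    The general idea of clustering genes by a Pearson-correlation proximity with average linkage can be sketched as follows. This uses the ordinary Pearson coefficient, not the paper's improved variant, and it is not the PCPHC or GL-PCPHC algorithm; the function names, gene labels, and expression values are assumptions made for the example.

```python
import math


def pearson(x, y):
    """Ordinary Pearson correlation of two equal-length profiles.

    Assumes neither profile is constant (non-zero variance).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering with distance = 1 - Pearson correlation."""
    names = list(profiles)
    dist = {(a, b): 1 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [[name] for name in names]

    def cluster_distance(c1, c2):
        # Average linkage: mean pairwise distance between two clusters.
        total = sum(dist[(a, b)] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: cluster_distance(clusters[p[0]],
                                                         clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Hypothetical expression profiles: g1/g2 rise together, g3/g4 fall together.
expression = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8.5],
    "g3": [4, 3, 2, 1],
    "g4": [8, 6, 4, 1],
}
groups = average_linkage(expression, n_clusters=2)
```

    With these toy profiles the two co-rising genes end up in one cluster and the two co-falling genes in the other, because correlation distance ignores the scale of the expression values.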

  10. An Improved Pearson's Correlation Proximity-Based Hierarchical Clustering for Mining Biological Association between Genes

    PubMed Central

    Booma, P. M.; Prabhakaran, S.; Dhanalakshmi, R.

    2014-01-01

    Microarray gene expression datasets have attracted great attention among molecular biologists, statisticians, and computer scientists. Data mining that extracts the hidden and usual information from datasets fails to identify the most significant biological associations between genes. A heuristic search for standard biological processes measures only the gene expression level, threshold, and response time. Heuristic search identifies and mines the best biological solution, but the association process was not efficiently addressed. To monitor a higher rate of expression levels between genes, a hierarchical clustering model was proposed, in which the biological association between genes is measured simultaneously using a proximity measure based on an improved Pearson's correlation (PCPHC). Additionally, the Seed Augment algorithm adopts average linkage methods on rows and columns in order to expand a seed PCPHC model into a maximal global PCPHC (GL-PCPHC) model and to identify associations between the clusters. Moreover, GL-PCPHC applies a pattern growing method to mine the PCPHC patterns. Compared to existing gene expression analysis, the PCPHC model achieves better performance. Experimental evaluations are conducted for the GL-PCPHC model with standard benchmark gene expression datasets extracted from the UCI repository and GenBank database in terms of execution time, size of pattern, significance level, biological association efficiency, and pattern quality. PMID:25136661

  11. Spatio-Temporal Pattern Mining on Trajectory Data Using Arm

    NASA Astrophysics Data System (ADS)

    Khoshahval, S.; Farnaghi, M.; Taleai, M.

    2017-09-01

    Initially, the mobile phone was considered a device for making human connections easier. Today, however, its use has evolved into a platform for gaming, web surfing, and GPS-enabled applications. Embedding GPS in handheld devices has turned them into significant trajectory data gathering facilities. Raw GPS trajectory data is a series of points that contains hidden information. Revealing the hidden information in traces requires trajectory data analysis. Among the most valuable hidden information in trajectory data are user activity patterns. Each pattern contains multiple stops and moves, which identify the places a user visited and the tasks performed. This paper proposes an approach to discover users' daily activity patterns from GPS trajectories using association rules. Finding user patterns requires extracting the user's visited places from the stops and moves of GPS trajectories. To locate stops and moves, we implemented a place recognition algorithm. After extraction of the visited points, an advanced association rule mining algorithm called Apriori was used to extract user activity patterns. This study shows that each trajectory contains useful patterns that can be extracted from raw GPS data using association rule mining techniques in order to learn about multiple users' behaviour in a system, and that these patterns can be utilized in various location-based applications.
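
    The Apriori step mentioned in the abstract above can be sketched as follows. This is a minimal, generic level-wise Apriori for frequent itemsets, not the authors' implementation; the place names, the daily visit sets, and the `min_support` threshold are all hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for itemsets seen in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}  # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        current = {
            c
            for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical visited places per day for one user.
days = [
    {"home", "work", "gym"},
    {"home", "work", "cafe"},
    {"home", "work", "gym"},
    {"home", "cafe"},
]
frequent = apriori(days, min_support=2)

# Confidence of the rule work -> gym: support(work, gym) / support(work).
confidence = frequent[frozenset({"work", "gym"})] / frequent[frozenset({"work"})]
```

    The prune step is what distinguishes Apriori from brute-force enumeration: a candidate such as {home, work, cafe} is discarded without counting because its subset {work, cafe} is already known to be infrequent.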

  12. Measures to restore metallurgical mine wasteland using ecological restoration technologies: A case study at Longnan Rare Earth Mine

    NASA Astrophysics Data System (ADS)

    Rao, Yunzhang; Gu, Ruizhi; Guo, Ruikai; Zhang, Xueyan

    2017-01-01

    Whereas mining activities produce the raw materials that are crucial to economic growth, such activities leave extensive scarring on the land, contributing to the waste of valuable land resources and upsetting the ecological environment. The aim of this study is therefore to investigate various ecological technologies to restore metallurgical mine wastelands. These technologies include measures such as soil amelioration, vegetation restoration, different vegetation planting patterns, and engineering technologies. The Longnan Rare Earth Mine in the Jiangxi Province of China is used as the case study. The ecological restoration process provides a favourable reference for the restoration of a metallurgical mine wasteland.

  13. Surface Mining and Reclamation Effects on Flood Response of Watersheds in the Central Appalachian Plateau Region

    NASA Technical Reports Server (NTRS)

    Ferrari, J. R.; Lookingbill, T. R.; McCormick, B.; Townsend, P. A.; Eshleman, K. N.

    2009-01-01

    Surface mining of coal and subsequent reclamation represent the dominant land use change in the central Appalachian Plateau (CAP) region of the United States. Hydrologic impacts of surface mining have been studied at the plot scale, but effects at broader scales have not been explored adequately. Broad-scale classification of reclaimed sites is difficult because standing vegetation makes them nearly indistinguishable from alternate land uses. We used a land cover data set that accurately maps surface mines for a 187-km2 watershed within the CAP. These land cover data, as well as plot-level data from within the watershed, are used with HSPF (Hydrologic Simulation Program-Fortran) to estimate changes in flood response as a function of increased mining. Results show that the rate at which flood magnitude increases due to increased mining is linear, with greater rates observed for less frequent return intervals. These findings indicate that mine reclamation leaves the landscape in a condition more similar to urban areas rather than does simple deforestation, and call into question the effectiveness of reclamation in terms of returning mined areas to the hydrological state that existed before mining.

  14. Data Mining and Homeland Security: An Overview

    DTIC Science & Technology

    2007-03-28

    originally collected. A fourth issue is privacy. Questions that may be considered include the degree to which government agencies should use and mix...commercial data with government data, whether data sources are being used for purposes other than those for which they were originally designed, and...frequently represented. For example, a hardware CRS-2 3 John Makulowich, “ Government Data Mining Systems Defy Definition,” Washington Technology, 22 February

  15. Assertions of Japanese Websites for and Against Cancer Screening: a Text Mining Analysis

    PubMed

    Okuhara, Tsuyoshi; Ishikawa, Hirono; Okada, Masahumi; Kato, Mio; Kiuchi, Takahiro

    2017-04-01

    Background: Cancer screening rates are lower in Japan than in Western countries such as the United States and the United Kingdom. While health professionals publish pro-cancer-screening messages online to encourage proactive seeking for screening, anti-screening activists use the same medium to warn readers against following guidelines. Contents of pro- and anti-cancer-screening sites may contribute to readers’ acceptance of one or the other position. We aimed to use a text-mining method to examine frequently appearing contents on sites for and against cancer screening. Methods: We conducted online searches in December 2016 using two major search engines in Japan (Google Japan and Yahoo! Japan). Targeted websites were classified as “pro”, “anti”, or “neutral” depending on their claims, with the author(s) classified as “health professional”, “mass media”, or “layperson”. Text-mining analyses were conducted, and statistical analysis was performed using the chi-square test. Results: Of the 169 websites analyzed, the top-three most frequently appearing content topics in pro sites were reducing mortality via cancer screening, benefits of early detection, and recommendations for obtaining detailed examination. The top three most frequent in anti-sites were harm from radiation exposure, non-efficacy of cancer screening, and lack of necessity of early detection. Anti-sites also frequently referred to a well-known Japanese radiologist, Makoto Kondo, who rejects the standard forms of cancer care. Conclusion: Our findings should enable authors of pro-cancer-screening sites to write to counter misleading anti-cancer-screening messages and facilitate dissemination of accurate information. Creative Commons Attribution License

  16. Study of the crater deformation of the CODELCO/Andina mine using the satellite and ground data

    NASA Astrophysics Data System (ADS)

    Caverlotti-Silva, M. A.; Arellano-Baeza, A. A.

    2011-12-01

    The correct monitoring of the subsidence of the craters related to the underground mine exploitation is one of the most important endeavors of the satellite remote sensing. The ASTER and LANDSAT satellite images have been used to study the deformation of the crater of the CODELCO/Andina mine, Valparaiso Region, Chile. The high-resolution satellite images were used to detect changes in the lineament patterns related to the subsidence. These results were compared with the ground deformation extracted from the GPS and topography station networks. It was found that sudden changes in the lineament patterns appear when the ground deformation overcomes a definite threshold.

  17. Privacy Preserving Nearest Neighbor Search

    NASA Astrophysics Data System (ADS)

    Shaneck, Mark; Kim, Yongdae; Kumar, Vipin

    Data mining is frequently obstructed by privacy concerns. In many cases data is distributed, and bringing the data together in one place for analysis is not possible due to privacy laws (e.g. HIPAA) or policies. Privacy preserving data mining techniques have been developed to address this issue by providing mechanisms to mine the data while giving certain privacy guarantees. In this chapter we address the issue of privacy preserving nearest neighbor search, which forms the kernel of many data mining applications. To this end, we present a novel algorithm based on secure multiparty computation primitives to compute the nearest neighbors of records in horizontally distributed data. We show how this algorithm can be used in three important data mining algorithms, namely LOF outlier detection, SNN clustering, and kNN classification. We prove the security of these algorithms under the semi-honest adversarial model, and describe methods that can be used to optimize their performance. Keywords: Privacy Preserving Data Mining, Nearest Neighbor Search, Outlier Detection, Clustering, Classification, Secure Multiparty Computation

  18. Protein classification using sequential pattern mining.

    PubMed

    Exarchos, Themis P; Papaloukas, Costas; Lampros, Christos; Fotiadis, Dimitrios I

    2006-01-01

    Protein classification in terms of fold recognition can be employed to determine the structural and functional properties of a newly discovered protein. In this work sequential pattern mining (SPM) is utilized for sequence-based fold recognition. One of the most efficient SPM algorithms, cSPADE, is employed for protein primary structure analysis. Then a classifier uses the extracted sequential patterns for classifying proteins of unknown structure in the appropriate fold category. The proposed methodology exhibited an overall accuracy of 36% in a multi-class problem of 17 candidate categories. The classification performance reaches up to 65% when the three most probable protein folds are considered.

  19. Differentiation of closely related isomers: application of data mining techniques in conjunction with variable wavelength infrared multiple photon dissociation mass spectrometry for identification of glucose-containing disaccharide ions.

    PubMed

    Stefan, Sarah E; Ehsan, Mohammad; Pearson, Wright L; Aksenov, Alexander; Boginski, Vladimir; Bendiak, Brad; Eyler, John R

    2011-11-15

    Data mining algorithms have been used to analyze the infrared multiple photon dissociation (IRMPD) patterns of gas-phase lithiated disaccharide isomers irradiated with either a line-tunable CO(2) laser or a free electron laser (FEL). The IR fragmentation patterns over the wavelength range of 9.2-10.6 μm have been shown in earlier work to correlate uniquely with the asymmetry at the anomeric carbon in each disaccharide. Application of data mining approaches for data analysis allowed unambiguous determination of the anomeric carbon configurations for each disaccharide isomer pair using fragmentation data at a single wavelength. In addition, the linkage positions were easily assigned. This combination of wavelength-selective IRMPD and data mining offers a powerful and convenient tool for differentiation of structurally closely related isomers, including those of gas-phase carbohydrate complexes.

  20. The design and implementation of web mining in web sites security

    NASA Astrophysics Data System (ADS)

    Li, Jian; Zhang, Guo-Yin; Gu, Guo-Chang; Li, Jian-Li

    2003-06-01

    The backdoor or information leak of Web servers can be detected by using Web Mining techniques on some abnormal Web log and Web application log data. The security of Web servers can be enhanced and the damage of illegal access can be avoided. Firstly, the system for discovering the patterns of information leakages in CGI scripts from Web log data was proposed. Secondly, those patterns for system administrators to modify their codes and enhance their Web site security were provided. The following aspects were described: one is to combine web application log with web log to extract more information, so web data mining could be used to mine web log for discovering the information that firewall and Information Detection System cannot find. Another approach is to propose an operation module of web site to enhance Web site security. In cluster server session, Density-Based Clustering technique is used to reduce resource cost and obtain better efficiency.

  1. Chronodes: Interactive Multifocus Exploration of Event Sequences

    PubMed Central

    POLACK, PETER J.; CHEN, SHANG-TSE; KAHNG, MINSUK; DE BARBARO, KAYA; BASOLE, RAHUL; SHARMIN, MOUSHUMI; CHAU, DUEN HORNG

    2018-01-01

    The advent of mobile health (mHealth) technologies challenges the capabilities of current visualizations, interactive tools, and algorithms. We present Chronodes, an interactive system that unifies data mining and human-centric visualization techniques to support explorative analysis of longitudinal mHealth data. Chronodes extracts and visualizes frequent event sequences that reveal chronological patterns across multiple participant timelines of mHealth data. It then combines novel interaction and visualization techniques to enable multifocus event sequence analysis, which allows health researchers to interactively define, explore, and compare groups of participant behaviors using event sequence combinations. Through summarizing insights gained from a pilot study with 20 behavioral and biomedical health experts, we discuss Chronodes’s efficacy and potential impact in the mHealth domain. Ultimately, we outline important open challenges in mHealth, and offer recommendations and design guidelines for future research. PMID:29515937

  2. Graph-based biomedical text summarization: An itemset mining and sentence clustering approach.

    PubMed

    Nasr Azadani, Mozhgan; Ghadiri, Nasser; Davoodijam, Ensieh

    2018-06-12

    Automatic text summarization offers an efficient solution to access the ever-growing amounts of both scientific and clinical literature in the biomedical domain by summarizing the source documents while maintaining their most informative contents. In this paper, we propose a novel graph-based summarization method that takes advantage of domain-specific knowledge and a well-established data mining technique called frequent itemset mining. Our summarizer exploits the Unified Medical Language System (UMLS) to construct a concept-based model of the source document and to map the document to the concepts. Then, it discovers frequent itemsets to take the correlations among multiple concepts into account. The method uses these correlations to propose a similarity function, based on which a representative graph is constructed. The summarizer then employs a minimum spanning tree based clustering algorithm to discover various subthemes of the document. Eventually, it generates the final summary by selecting the most informative and relevant sentences from all subthemes within the text. We perform an automatic evaluation over a large number of summaries using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics. The results demonstrate that the proposed summarization system outperforms various baselines and benchmark approaches. The research suggests that incorporating domain-specific knowledge and frequent itemset mining better equips the summarization system to measure the informativeness of sentences. Moreover, clustering the graph nodes (sentences) can enable the summarizer to target different main subthemes of a source document efficiently. The evaluation results show that the proposed approach can significantly improve the performance of summarization systems in the biomedical domain. Copyright © 2018. Published by Elsevier Inc.

  3. Leptospirosis Seroprevalence among Blue Metal Mine Workers of Tamil Nadu, India

    PubMed Central

    Parveen, Sakkarai Mohamed Asha; Suganyaa, Baskar; Sathya, Muthu Sri; Margreat, Alphonse Asirvatham Princy; Sivasankari, Karikalacholan; Shanmughapriya, Santhanam; Hoffman, Nicholas E.; Natarajaseenivasan, Kalimuthusamy

    2016-01-01

    Leptospirosis is mainly considered an occupational disease, prevalent among agriculture, sewage works, forestry, and animal slaughtering populations. However, putative risk to miners and their inclusion in the high-risk leptospirosis group remain in need of rigorous analysis. Therefore, a study was conducted with the objective to assess the leptospirosis seroprevalence among miners of two districts of Tamil Nadu, India. A total of 244 sera samples from Pudukkottai miners (124) and Karur miners (120) were analyzed by microscopic agglutination test. Antibodies to leptospires were detected in 94 samples giving an overall seroprevalence of 38.5%. The seroprevalence was higher among Pudukkottai miners (65.3%) when compared with Karur miners (10.8%). Seroprevalence among control population (13%) was significantly less than that of the Pudukkottai miners marking a possible high-risk population group distinction. Subject sera most commonly reacted with organisms of the serogroup Autumnalis, and the pattern was similar in carrier animals of the study areas. Two leptospires were isolated from kidney samples of rats. The prevalence of Autumnalis among rodents and humans source tracked human leptospirosis among the miners. The study also determined that Pudukkottai miners are subjected to high-risk challenges such as exposure to water bodies on the way to the mines (odds ratio [OR] = 10.6), wet mine areas (OR = 10.6), rat infestation (OR = 4.6), and cattle rearing (OR = 10.4) and are thus frequently exposed to leptospirosis compared with Karur miners. Hence, control strategies targeting these populations will likely prove to be effective remediation strategies benefiting Pudukkottai miners and workers in similar environments across occupations. PMID:27044567
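
    The percentages in the abstract above can be checked with simple arithmetic. In the sketch below, the overall figures (94 of 244) are taken from the abstract; the per-district positive counts (81 and 13) are not stated there and are inferred as the only whole numbers consistent with the reported 65.3% and 10.8%. The 2x2 table passed to `odds_ratio` is purely hypothetical and only illustrates the formula.

```python
def prevalence(positive, total):
    """Seroprevalence as a percentage of samples testing positive."""
    return 100.0 * positive / total


def odds_ratio(exposed_pos, exposed_neg, unexposed_pos, unexposed_neg):
    """Odds ratio of a 2x2 exposure-by-outcome table: (a * d) / (b * c)."""
    return (exposed_pos * unexposed_neg) / (exposed_neg * unexposed_pos)


overall = prevalence(94, 244)      # counts stated in the abstract
pudukkottai = prevalence(81, 124)  # 81 is inferred from the reported 65.3%
karur = prevalence(13, 120)        # 13 is inferred from the reported 10.8%

# Hypothetical exposure table, for illustration of the formula only.
example_or = odds_ratio(30, 10, 10, 30)
```

    The inferred district counts add up to the 94 positive samples reported overall (81 + 13 = 94), which supports the inference.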

  4. Japanese anti- versus pro-influenza vaccination websites: a text-mining analysis.

    PubMed

    Okuhara, Tsuyoshi; Ishikawa, Hirono; Okada, Masafumi; Kato, Mio; Kiuchi, Takahiro

    2018-03-23

    Anti-vaccination sentiment exists worldwide and Japan is no exception. Health professionals publish pro-influenza vaccination messages online to encourage proactive seeking of influenza vaccination. However, influenza vaccine coverage among the Japanese population is less than optimal. The contents of pro- and anti-influenza vaccination websites may contribute to readers' acceptance of one or the other position. We aimed to use a text-mining method to examine frequently appearing content on websites for and against influenza vaccination. We conducted online searches in January 2017 using two major Japanese search engines (Google Japan and Yahoo! Japan). Targeted websites were classified as 'pro', 'anti' or 'neutral' depending on their claims, with author(s) classified as 'health professionals', 'mass media' or 'laypersons'. Text-mining analysis was conducted, and statistical analysis was performed using a chi-squared test. Of the 334 websites analyzed, 13 content topics were identified. The three most frequently appearing content topics on pro-vaccination websites were vaccination effect for preventing serious cases of influenza, side effects of vaccination, and efficacy rate of vaccination. The three most frequent topics on anti-vaccination websites were ineffectiveness of influenza vaccination, toxicity of vaccination, and side effects of vaccination. The main disseminators of each topic, by author classification, were also revealed. We discuss possible tactics of online influenza vaccination promotion to counter anti-vaccination websites.
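
    The chi-squared test named in the abstract above compares observed topic counts with the counts expected if topic and site stance were independent. The following is a minimal sketch of Pearson's chi-squared statistic for a contingency table; the counts are hypothetical and are not taken from the study.

```python
def chi_square(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows = pro / anti sites, columns = topic present / absent.
table = [
    [30, 10],
    [10, 30],
]
stat, dof = chi_square(table)

# 3.841 is the standard critical value for 1 degree of freedom at alpha = 0.05.
significant = dof == 1 and stat > 3.841
```

    For this made-up table every expected count is 20, so each cell contributes 100 / 20 = 5 and the statistic is 20 with one degree of freedom.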

  5. Off-road truck-related accidents in U.S. mines

    PubMed Central

    Dindarloo, Saeid R.; Pollard, Jonisha P.; Siami-Irdemoosa, Elnaz

    2016-01-01

    Introduction: Off-road trucks are one of the major sources of equipment-related accidents in the U.S. mining industries. A systematic analysis of all off-road truck-related accidents, injuries, and illnesses, which are reported and published by the Mine Safety and Health Administration (MSHA), is expected to provide practical insights for identifying the accident patterns and trends in the available raw database. Therefore, appropriate safety management measures can be administered and implemented based on these accident patterns/trends. Methods: A hybrid clustering-classification methodology using K-means clustering and gene expression programming (GEP) is proposed for the analysis of severe and non-severe off-road truck-related injuries at U.S. mines. Using the GEP sub-model, a small subset of the 36 recorded attributes was found to be correlated to the severity level. Results: Given the set of specified attributes, the clustering sub-model was able to cluster the accident records into 5 distinct groups. For instance, the first cluster contained accidents related to minerals processing mills and coal preparation plants (91%). More than two-thirds of the victims in this cluster had less than 5 years of job experience. This cluster was associated with the highest percentage of severe injuries (22 severe accidents, 3.4%). Almost 50% of all accidents in this cluster occurred at stone operations. Similarly, the other four clusters were characterized to highlight important patterns that can be used to determine areas of focus for safety initiatives. Conclusions: The identified clusters of accidents may play a vital role in the prevention of severe injuries in mining. Further research into the cluster attributes and identified patterns will be necessary to determine how these factors can be mitigated to reduce the risk of severe injuries. Practical application: Analyzing injury data using data mining techniques provides some insight into attributes that are associated with high accuracies for predicting injury severity. PMID:27620937

  6. Off-road truck-related accidents in U.S. mines.

    PubMed

    Dindarloo, Saeid R; Pollard, Jonisha P; Siami-Irdemoosa, Elnaz

    2016-09-01

    Off-road trucks are one of the major sources of equipment-related accidents in the U.S. mining industries. A systematic analysis of all off-road truck-related accidents, injuries, and illnesses, which are reported and published by the Mine Safety and Health Administration (MSHA), is expected to provide practical insights for identifying the accident patterns and trends in the available raw database. Therefore, appropriate safety management measures can be administered and implemented based on these accident patterns/trends. A hybrid clustering-classification methodology using K-means clustering and gene expression programming (GEP) is proposed for the analysis of severe and non-severe off-road truck-related injuries at U.S. mines. Using the GEP sub-model, a small subset of the 36 recorded attributes was found to be correlated to the severity level. Given the set of specified attributes, the clustering sub-model was able to cluster the accident records into 5 distinct groups. For instance, the first cluster contained accidents related to minerals processing mills and coal preparation plants (91%). More than two-thirds of the victims in this cluster had less than 5 years of job experience. This cluster was associated with the highest percentage of severe injuries (22 severe accidents, 3.4%). Almost 50% of all accidents in this cluster occurred at stone operations. Similarly, the other four clusters were characterized to highlight important patterns that can be used to determine areas of focus for safety initiatives. The identified clusters of accidents may play a vital role in the prevention of severe injuries in mining. Further research into the cluster attributes and identified patterns will be necessary to determine how these factors can be mitigated to reduce the risk of severe injuries. Analyzing injury data using data mining techniques provides some insight into attributes that are associated with high accuracies for predicting injury severity. Copyright © 2016 Elsevier Ltd and National Safety Council. All rights reserved.

  7. Data Mining and Homeland Security: An Overview

    DTIC Science & Technology

    2006-01-27

    which government agencies should use and mix commercial data with government data, whether data sources are being used for purposes other than those...example, a hardware store may compare their customers’ tool purchases with home ownership, type of CRS-2 3 John Makulowich, “ Government Data Mining...cleaning, data integration, data selection, data transformation , (data mining), pattern evaluation, and knowledge presentation.4 A number of advances in

  8. Data mining of air traffic control operational errors

    DOT National Transportation Integrated Search

    2006-01-01

    In this paper we present the results of applying data mining techniques to identify patterns and anomalies in air traffic control operational errors (OEs). Reducing the OE rate is of high importance and remains a challenge in the aviation saf...

  9. Pattern extraction for high-risk accidents in the construction industry: a data-mining approach.

    PubMed

    Amiri, Mehran; Ardeshir, Abdollah; Fazel Zarandi, Mohammad Hossein; Soltanaghaei, Elahe

    2016-09-01

    Accidents involving falls and falling objects (group I) are highly frequent accidents in the construction industry. While being hit by a vehicle, electric shock, collapse in the excavation and fire or explosion accidents (group II) are much less frequent, they make up a considerable proportion of severe accidents. In this study, multiple-correspondence analysis, decision tree, ensembles of decision tree and association rules methods are employed to analyse a database of construction accidents throughout Iran between 2007 and 2011. The findings indicate that in group I, there is a significant correspondence among these variables: time of accident, place of accident, body part affected, final consequence of accident and lost workdays. Moreover, accidents are less frequent in the night shift than in other shifts, and injuries to the head, back, spine and limbs are more frequent. In group II, the variables time of accident and body part affected are mostly related, and accidents are more frequent among married and older workers than among single and young workers. There was a higher frequency in the evening, night shifts and weekends. The results of this study are totally in line with the previous research.

  10. Water spray ventilator system for continuous mining machines

    DOEpatents

    Page, Steven J.; Mal, Thomas

    1995-01-01

    The invention relates to a water spray ventilator system mounted on a continuous mining machine to streamline airflow and provide effective face ventilation of both respirable dust and methane in underground coal mines. This system has two side spray nozzles mounted one on each side of the mining machine and six spray nozzles disposed on a manifold mounted to the underside of the machine boom. The six spray nozzles are angularly and laterally oriented on the manifold so as to provide non-overlapping spray patterns along the length of the cutter drum.

  11. A Data Mining Approach to Identify Sexuality Patterns in a Brazilian University Population.

    PubMed

    Waleska Simões, Priscyla; Cesconetto, Samuel; Toniazzo de Abreu, Larissa Letieli; Côrtes de Mattos Garcia, Merisandra; Cassettari Junior, José Márcio; Comunello, Eros; Bisognin Ceretta, Luciane; Aparecida Manenti, Sandra

    2015-01-01

    This paper presents the profile and experience of sexuality generated from a data mining classification task. We used a database about sexuality and gender violence performed on a university population in southern Brazil. The data mining task identified two relationships between the variables, which enabled the distinction of subgroups that better detail the profile and experience of sexuality. The identification of the relationships between the variables define behavioral models and factors of risk that will help define the algorithms being implemented in the data mining classification task.

  12. Aircraft Mishap Fire Pattern Investigations

    DTIC Science & Technology

    1985-08-01

    AD-A161 094 AIRCRAFT MISHAP FIRE PATTERN INVESTIGATIONS. Joseph M. Kuchta Mining and Industrial Cadre 15143 Green International, Inc. 54 Sewickley...ORGANIZATION REPORT NUMBER(S) AFWAL-TR-85-2057 6a. NAME OF PERFORMING ORGANIZATION 6b. OFFICE SYMBOL 7a. NAME OF MONITORING ORGANIZATION Mining and Industrial...IS OBSOLETE. Unclassified SECURITY CLASSIFICATION OF THIS PAGE FOREWORD This report was prepared by the Mining and Industrial Cadre of Green

  13. Data Stream Mining

    NASA Astrophysics Data System (ADS)

    Gaber, Mohamed Medhat; Zaslavsky, Arkady; Krishnaswamy, Shonali

    Data mining is concerned with the process of computationally extracting hidden knowledge structures represented in models and patterns from large data repositories. It is an interdisciplinary field of study that has its roots in databases, statistics, machine learning, and data visualization. Data mining has emerged as a direct outcome of the data explosion that resulted from the success in database and data warehousing technologies over the past two decades (Fayyad, 1997; Fayyad, 1998; Kantardzic, 2003).

  14. Ground-water resources and potential hydrologic effects of surface coal mining in the northern Powder River basin, southeastern Montana

    USGS Publications Warehouse

    Slagle, Steven E.; Lewis, Barney D.; Lee, Roger W.

    1985-01-01

    The shallow ground-water system in the northern Powder River Basin consists of Upper Cretaceous to Holocene aquifers overlying the Bearpaw Shale--namely, the Fox Hills Sandstone; Hell Creek, Fort Union, and Wasatch Formations; terrace deposits; and alluvium. Ground-water flow above the Bearpaw Shale can be divided into two general flow patterns. An upper flow pattern occurs in aquifers at depths of less than about 200 feet and occurs primarily as localized flow controlled by the surface topography. A lower flow pattern occurs in aquifers at depths from about 200 to 1,200 feet and exhibits a more regional flow, which is generally northward toward the Yellowstone River with significant flow toward the Powder and Tongue Rivers. The chemical quality of water in the shallow ground-water system in the study area varies widely, and most of the ground water does not meet standards for dissolved constituents in public drinking water established by the U.S. Environmental Protection Agency. Water from depths less than 200 feet generally is a sodium sulfate type having an average dissolved-solids concentration of 2,100 milligrams per liter. Sodium bicarbonate water having an average dissolved-solids concentration of 1,400 milligrams per liter is typical from aquifers in the shallow ground-water system at depths between 200 and 1,200 feet. Effects of surface coal mining on the water resources in the northern Powder River Basin are dependent on the stratigraphic location of the mine cut. Where the cut lies above the water-yielding zone, the effects will be minimal. Where the mine cut intersects a water-yielding zone, effects on water levels and flow patterns can be significant locally, but water levels and flow patterns will return to approximate premining conditions after mining ceases. Ground water in and near active and former mines may become more mineralized, owing to the placement of spoil material from the reducing zone in the unsaturated zone where the minerals are subject to oxidation. Regional effects probably will be small because of the limited areal extent of ground-water flow systems where mining is feasible. Results of digital models are presented to illustrate the effects of varying hydraulic properties on water-level changes resulting from mine dewatering. The model simulations were designed to depict maximum-drawdown situations. One simulation indicates that after 20 years of continuous dewatering of an infinite, homogeneous, isotropic aquifer that is 10 feet thick and has an initial potentiometric surface 10 feet above the top of the aquifer, water-level declines greater than 1 foot would generally be limited to within 7.5 miles of the center of the mine excavation; declines greater than 2 feet to within about 6 miles; declines greater than 5 feet to within about 3.7 miles; declines greater than 10 feet to within about 1.7 miles; and declines greater than 15 feet to within 1.2 miles.

  15. Modeling Spatial Dependencies and Semantic Concepts in Data Mining

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vatsavai, Raju

    Data mining is the process of discovering new patterns and relationships in large datasets. However, several studies have shown that general data mining techniques often fail to extract meaningful patterns and relationships from the spatial data owing to the violation of fundamental geospatial principles. In this tutorial, we introduce basic principles behind explicit modeling of spatial and semantic concepts in data mining. In particular, we focus on modeling these concepts in the widely used classification, clustering, and prediction algorithms. Classification is the process of learning a structure or model (from user given inputs) and applying the known model to the new data. Clustering is the process of discovering groups and structures in the data that are "similar," without applying any known structures in the data. Prediction is the process of finding a function that models (explains) the data with least error. One common assumption among all these methods is that the data is independent and identically distributed. Such assumptions do not hold well in spatial data, where spatial dependency and spatial heterogeneity are a norm. In addition, spatial semantics are often ignored by the data mining algorithms. In this tutorial we cover recent advances in explicitly modeling of spatial dependencies and semantic concepts in data mining.

  16. Data mining in radiology

    PubMed Central

    Kharat, Amit T; Singh, Amarjit; Kulkarni, Vilas M; Shah, Digish

    2014-01-01

    Data mining facilitates the study of radiology data in various dimensions. It converts large patient image and text datasets into useful information that helps in improving patient care and provides informative reports. Data mining technology analyzes data within the Radiology Information System and Hospital Information System using specialized software which assesses relationships and agreement in available information. By using similar data analysis tools, radiologists can make informed decisions and predict the future outcome of a particular imaging finding. Data, information and knowledge are the components of data mining. Classes, Clusters, Associations, Sequential patterns, Classification, Prediction and Decision tree are the various types of data mining. Data mining has the potential to make delivery of health care affordable and ensure that the best imaging practices are followed. It is a tool for academic research. Data mining is considered to be ethically neutral; however, concerns regarding privacy and legality exist and need to be addressed to ensure the success of data mining. PMID:25024513

  17. [Basic Regularities and Characteristics of Compound Reinforcing-reducing Manipulation of Acupuncture Revealed by Data Mining].

    PubMed

    Yang, Qing-qing; Jia, Chun-sheng; Wang, Jian-ling; Li, Jun-lei; Feng, Xin-xin; Tan, Zhan-na; Li, Bo-ying; Zhu, Xue-liang; Shi, Jing; Sun, Yan-hui; Li, Xiao-feng; Xu, Jing; Zhang, Xuan-ping; Zhang, Xin; Du, Yu-zhu; Bao, Na; Wang, Qiong

    2016-04-01

    To explore the regularities and features of compound reinforcing-reducing manipulation of acupuncture filiform needles in the treatment of clinical conditions or diseases by using data mining techniques, so as to guide clinical practice. At first, the database about the compound reinforcing-reducing manipulation (CRRM) of filiform needles for different clinical problems was established by collection, sorting, screening, recording, collation, and data extraction of the related original papers published in journals and conferences and related academic dissertations from Jan. 1 of 1950 to Jan. 31 of 2015 by using key words of "acupuncture" "moxibustion" "needling" "filiform needle", and according to the inclusion and exclusion standards. A total of 130 835 papers that met the inclusion standards were collected. Outcomes of data mining in the present study showed that (1) the CRRM is most frequently applied in internal medicine, followed by surgery, gynecology, ophthalmology and otorhinolaryngology, dermatology, and pediatrics, successively, mostly for lumbago and leg pain; (2) the heat-producing needling manipulation is the most frequently applied technique, followed by cool-producing needling, dragon-tiger warring, yang occluding in yin, yin occluding in yang techniques; (3) the highest effective rate of CRRM is for problems of pediatrics, followed by those of internal medicine, surgery, ophthalmology and otorhinolaryngology, dermatology, and gynecology; (4) the most frequently used acupoints are Zusanli (ST 36), then Sanyinjiao (SP 6), stimulated by heat-producing needling, and Zusanli (ST 36), then Quchi (LI 11), stimulated by cool-producing needling, and Huantiao (GB 30), stimulated by dragon-tiger warring needling. The compound reinforcing-reducing manipulation of acupuncture is most frequently applied to problems in internal medicine, predominantly for lumbago and leg pain, and the best effectiveness is for pediatric conditions.
The heat-producing needling and cool-producing needling are most frequently applied at Zusanli (ST 36) and the dragon-tiger warring manipulation is most frequently applied at Huantiao (GB 30).

  18. The Distribution and Status of Bats at Fort Irwin National Training Center

    DTIC Science & Technology

    2012-12-01

    the Avawatz Mountains (Table 9) in the vicinity of Goat Mountain are more human accessible due to their close proximity to roads. Troops are currently...altitudinally, (Grinnell 1918, Krutzsch 1948, Cryan 2003) and are often the species most frequently killed at wind farms. For southern California...As noted in the results section, the current level of bat use was similar at the Desert King Mine and the Avawatz mines near Goat Mountain as was

  19. Biosorption of metal and salt tolerant microbial isolates from a former uranium mining area. Their impact on changes in rare earth element patterns in acid mine drainage.

    PubMed

    Haferburg, Götz; Merten, Dirk; Büchel, Georg; Kothe, Erika

    2007-12-01

    The concentration of metals in microbial habitats influenced by mining operations can reach enormous values. Worldwide, much emphasis is placed on the research of resistance and biosorptive capacities of microorganisms suitable for bioremediation purposes. Using a collection of isolates from a former uranium mining area in Eastern Thuringia, Germany, this study presents three Gram-positive bacterial strains with distinct metal tolerances. These strains were identified as members of the genera Bacillus, Micrococcus and Streptomyces. Acid mine drainage (AMD) originating from the same mining area is characterized by high metal concentrations of a broad range of elements and a very low pH. AMD was analyzed and used as incubation solution. The sorption of rare earth elements (REE), aluminum, cobalt, copper, manganese, nickel, strontium, and uranium through selected strains was studied during a time course of four weeks. Biosorption was investigated after one hour, one week and four weeks by analyzing the concentrations of metals in supernatant and biomass. Additionally, dead biomass was investigated after four weeks of incubation. The maximum of metal removal was reached after one week. Up to 80% of both Al and Cu, and more than 60% of U was shown to be removed from the solution. High concentrations of metals could be bound to the biomass, as for example 2.2 mg/g U. The strains could survive four weeks of incubation. Distinct and different patterns of rare earth elements of the inoculated and non-inoculated AMD water were observed. Changes in REE patterns hint at different binding types of heavy metals regarding incubation time and metabolic activity of the cells. (c) 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  20. The prescribing of Chinese herbal products in Taiwan: a cross-sectional analysis of the national health insurance reimbursement database.

    PubMed

    Hsieh, Shu-Ching; Lai, Jung-Nien; Lee, Chuan-Fang; Hu, Fu-Chang; Tseng, Wei-Lum; Wang, Jung-Der

    2008-06-01

    The consumption of Chinese herbal products (CHPs) is increasing exponentially. However, the scientific evidence is lacking and there is an urgent requirement for detailed pharmacoepidemiological information on CHP usage. This study was to investigate CHP prescription patterns in Taiwan. We carried out a cross-sectional analysis on a cohort of 200,000 patients based on 2004 data from the National Health Insurance (NHI) reimbursement database. Data mining techniques were applied to explore CHP co-prescription patterns. A total of 46,938 patients had been prescribed CHPs on at least one occasion in 2004. Patients using CHPs were generally female and middle-aged, made more outpatient visits, had fewer hospitalizations and consumed more medical resources than non-users of CHPs. A total of 1,073,030 CHPs were contained within 220,123 prescriptions, for which acute nasopharyngitis was the most common indication. Yan hu suo and Jia Wei Xiao Yao San were the most frequently prescribed single herb (SH) and herbal formula (HF), respectively. The results of the data mining showed that the best predictions were provided by co-prescriptions of 'Mo yao and Ru xiang', 'Ye jiao teng and Suan Zao Ren Tan' and 'Dang Gui Nian Tong Tang and Shu Jing Huo Xue Tang' in the groups of SH-SH, SH-HF and HF-HF, respectively. This study provides national-level CHP prescription profiles and utilization rates, and documents, for the first time, HF-HF prescription combinations in Chinese medicine (CM) practices in Taiwan. We conclude that more studies are needed to validate the safety and effectiveness of CHP prescriptions.
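
    The abstract does not say which data mining technique produced the co-prescription "predictions". One common way to score item pairs in a frequent-pattern setting is by support and confidence, and the following is a minimal sketch of that idea only, not the authors' method. The function name `pair_rules`, the `min_support` parameter, and the placeholder item names `h1` to `h3` are all assumptions made for illustration; the toy prescriptions are invented and do not come from the NHI database.

```python
from collections import Counter
from itertools import combinations


def pair_rules(prescriptions, min_support=0.0):
    """Score every co-prescribed pair by support and confidence.

    `prescriptions` is a list of item lists (hypothetical input format).
    Returns tuples (antecedent, consequent, support, confidence),
    sorted with the highest-confidence rules first.
    """
    n = len(prescriptions)
    item_counts = Counter()
    pair_counts = Counter()
    for prescription in prescriptions:
        items = sorted(set(prescription))  # ignore repeats within one prescription
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of prescriptions containing both items
        if support < min_support:
            continue
        # confidence(a -> b) = P(b in prescription | a in prescription)
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Invented toy data with placeholder item names.
toy = [["h1", "h2"], ["h1", "h2", "h3"], ["h1", "h3"], ["h2"]]
for antecedent, consequent, support, confidence in pair_rules(toy, min_support=0.5):
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

    Ranking by confidence favors rules whose antecedent rarely appears without the consequent; a real analysis would typically also check lift or a significance test before calling a pair a good predictor.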

  1. Spatial and temporal patterns in trace element deposition to lakes in the Athabasca oil sands region (Alberta, Canada)

    NASA Astrophysics Data System (ADS)

    Cooke, Colin A.; Kirk, Jane L.; Muir, Derek C. G.; Wiklund, Johan A.; Wang, Xiaowa; Gleason, Amber; Evans, Marlene S.

    2017-12-01

    The mining and processing of the Athabasca oil sands (Alberta, Canada) has been occurring for decades; however, a lack of consistent regional monitoring has obscured the long-term environmental impact. Here, we present sediment core results to reconstruct spatial and temporal patterns in trace element deposition to lakes in the Athabasca oil sands region. Early mining operations (during the 1970s and 1980s) led to elevated V and Pb inputs to lakes located <50 km from mining operations. Subsequent improvements to mining and upgrading technologies since the 1980s have reduced V and Pb loading to near background levels at many sites. In contrast, Hg deposition increased by a factor of ~3 to all 20 lakes over the 20th century, reflecting global-scale patterns in atmospheric Hg emissions. Base cation deposition (from fugitive dust emissions) has not measurably impacted regional lake sediments. Instead, results from a principal components analysis suggest that the presence of carbonate bedrock underlying lakes located close to development appears to exert a first-order control over lake sediment base cation concentrations and overall lake sediment geochemical composition. Trace element concentrations generally did not exceed Canadian sediment quality guidelines, and no spatial or temporal trends were observed in the frequency of guideline exceedence. Our results demonstrate that early mining efforts had an even greater impact on trace element cycling than has been appreciated previously, placing recent monitoring efforts in a critical long-term context.

  2. Mineralogical controls on mobility of rare earth elements in acid mine drainage environments.

    PubMed

    Soyol-Erdene, T O; Valente, T; Grande, J A; de la Torre, M L

    2018-08-01

    Rare earth elements (REE) were analyzed in river waters, acid mine waters, and extracts of secondary precipitates collected in the Iberian Pyrite Belt. The obtained concentrations of the REE in river water and mine waters (acid mine drainage, AMD) ranged from 0.57 μg/L (Lu) to 2579 μg/L (Ce), which is higher than previously reported in surface waters from the Iberian Pyrite Belt, but comparable with previous findings from AMD worldwide. Total REE concentrations in river waters ranged between 297 μg/L (Cobica River) and 7032 μg/L (Trimpancho River) with an average of 2468 μg/L. NASC (North American Shale Composite) normalized REE patterns for river and acid mine waters show clear convex curvatures in middle-REE (MREE) with respect to light- and heavy-REE. During the dissolution experiments of AMD-precipitates, heavy-REE and middle-REE generate the most enriched patterns in the solution. A small number of precipitates did not display MREE enrichment (an index Gd_n/Lu_n < 1.0) in NASC normalized pattern and produced relatively lower REE concentrations in extracts. Additionally, very few samples, which mainly contained aluminum sulfates, e.g., pickeringite and alunogen, displayed light-REE enrichment relative to heavy-REE (HREE). In general, the highest retention of REE occurs in samples enriched in magnesium (epsomite or hexahydrite) and aluminum sulfates, mainly pickeringite. Copyright © 2018 Elsevier Ltd. All rights reserved.

  3. VisualUrText: A Text Analytics Tool for Unstructured Textual Data

    NASA Astrophysics Data System (ADS)

    Zainol, Zuraini; Jaymes, Mohd T. H.; Nohuddin, Puteri N. E.

    2018-05-01

    The amount of unstructured text on the Internet is growing tremendously. Text repositories come from Web 2.0, business intelligence and social networking applications. It is also believed that 80-90% of future data growth will be in the form of unstructured text databases that may potentially contain interesting patterns and trends. Text Mining is a well-known technique for discovering interesting patterns and trends, which are non-trivial knowledge, from massive unstructured text data. Text Mining covers multidisciplinary fields involving information retrieval (IR), text analysis, natural language processing (NLP), data mining, machine learning, statistics and computational linguistics. This paper discusses the development of a text analytics tool that is proficient in extracting, processing, and analyzing unstructured text data and visualizing cleaned text data in multiple forms such as Document Term Matrix (DTM), Frequency Graph, Network Analysis Graph, Word Cloud and Dendrogram. This tool, VisualUrText, is developed to assist students and researchers in extracting interesting patterns and trends in document analyses.

  4. Reverse and forward engineering of protein pattern formation.

    PubMed

    Kretschmer, Simon; Harrington, Leon; Schwille, Petra

    2018-05-26

    Living systems employ protein pattern formation to regulate important life processes in space and time. Although pattern-forming protein networks have been identified in various prokaryotes and eukaryotes, their systematic experimental characterization is challenging owing to the complex environment of living cells. In turn, cell-free systems are ideally suited for this goal, as they offer defined molecular environments that can be precisely controlled and manipulated. Towards revealing the molecular basis of protein pattern formation, we outline two complementary approaches: the biochemical reverse engineering of reconstituted networks and the de novo design, or forward engineering, of artificial self-organizing systems. We first illustrate the reverse engineering approach by the example of the Escherichia coli Min system, a model system for protein self-organization based on the reversible and energy-dependent interaction of the ATPase MinD and its activating protein MinE with a lipid membrane. By reconstituting MinE mutants impaired in ATPase stimulation, we demonstrate how large-scale Min protein patterns are modulated by MinE activity and concentration. We then provide a perspective on the de novo design of self-organizing protein networks. Tightly integrated reverse and forward engineering approaches will be key to understanding and engineering the intriguing phenomenon of protein pattern formation. This article is part of the theme issue 'Self-organization in cell biology'. © 2018 The Author(s).

  5. Microbial diversity at the moderate acidic stage in three different sulfidic mine tailings dumps generating acid mine drainage.

    PubMed

    Korehi, Hananeh; Blöthe, Marco; Schippers, Axel

    2014-11-01

    In freshly deposited sulfidic mine tailings the pH is alkaline or circumneutral. Due to pyrite or pyrrhotite oxidation the pH is dropping over time to pH values <3 at which acidophilic iron- and sulfur-oxidizing prokaryotes prevail and accelerate the oxidation processes, well described for several mine waste sites. The microbial communities at the moderate acidic stage in mine tailings are only scarcely studied. Here we investigated the microbial diversity via 16S rRNA gene sequence analysis in eight samples (pH range 3.2-6.5) from three different sulfidic mine tailings dumps in Botswana, Germany and Sweden. In total 701 partial 16S rRNA gene sequences revealed a divergent microbial community between the three sites and at different tailings depths. Proteobacteria and Firmicutes were overall the most abundant phyla in the clone libraries. Acidobacteria, Actinobacteria, Bacteroidetes, and Nitrospira occurred less frequently. The found microbial communities were completely different to microbial communities in tailings at

  6. Process Mining for Individualized Behavior Modeling Using Wireless Tracking in Nursing Homes

    PubMed Central

    Fernández-Llatas, Carlos; Benedi, José-Miguel; García-Gómez, Juan M.; Traver, Vicente

    2013-01-01

    The analysis of human behavior patterns is increasingly used for several research fields. The individualized modeling of behavior using classical techniques requires too much time and resources to be effective. A possible solution would be the use of pattern recognition techniques to automatically infer models to allow experts to understand individual behavior. However, traditional pattern recognition algorithms infer models that are not readily understood by human experts. This limits the capacity to benefit from the inferred models. Process mining technologies can infer models as workflows, specifically designed to be understood by experts, enabling them to detect specific behavior patterns in users. In this paper, the eMotiva process mining algorithms are presented. These algorithms filter, infer and visualize workflows. The workflows are inferred from the samples produced by an indoor location system that stores the location of a resident in a nursing home. The visualization tool is able to compare and highlight behavior patterns in order to facilitate expert understanding of human behavior. This tool was tested with nine real users that were monitored for a 25-week period. The results achieved suggest that the behavior of users is continuously evolving and changing and that this change can be measured, allowing for behavioral change detection. PMID:24225907

  7. Topographic Maps and Coal Mining.

    ERIC Educational Resources Information Center

    Raitz, Karl B.

    1984-01-01

    Geography teachers can illustrate the patterns associated with mineral fuel production, especially coal, by using United States Geological Survey topographic maps, which are illustrated by symbols that indicate mine-related features, such as shafts and tailings. Map reading exercises are presented; an interpretative map key that can facilitate…

  8. Application and Exploration of Big Data Mining in Clinical Medicine.

    PubMed

    Zhang, Yue; Guo, Shu-Li; Han, Li-Na; Li, Tie-Ling

    2016-03-20

    To review theories and technologies of big data mining and their application in clinical medicine. Literature published in English or Chinese regarding theories and technologies of big data mining and the concrete applications of data mining technology in clinical medicine was obtained from PubMed and Chinese Hospital Knowledge Database from 1975 to 2015. Original articles regarding big data mining theory/technology and big data mining's application in the medical field were selected. This review characterized the basic theories and technologies of big data mining including fuzzy theory, rough set theory, cloud theory, Dempster-Shafer theory, artificial neural network, genetic algorithm, inductive learning theory, Bayesian network, decision tree, pattern recognition, high-performance computing, and statistical analysis. The application of big data mining in clinical medicine was analyzed in the fields of disease risk assessment, clinical decision support, prediction of disease development, guidance of rational use of drugs, medical management, and evidence-based medicine. Big data mining has the potential to play an important role in clinical medicine.

  9. Discovering Activities to Recognize and Track in a Smart Environment.

    PubMed

    Rashidi, Parisa; Cook, Diane J; Holder, Lawrence B; Schmitter-Edgecombe, Maureen

    2011-01-01

    The machine learning and pervasive sensing technologies found in smart homes offer unprecedented opportunities for providing health monitoring and assistance to individuals experiencing difficulties living independently at home. In order to monitor the functional health of smart home residents, we need to design technologies that recognize and track activities that people normally perform as part of their daily routines. Although approaches do exist for recognizing activities, the approaches are applied to activities that have been pre-selected and for which labeled training data is available. In contrast, we introduce an automated approach to activity tracking that identifies frequent activities that naturally occur in an individual's routine. With this capability we can then track the occurrence of regular activities to monitor functional health and to detect changes in an individual's patterns and lifestyle. In this paper we describe our activity mining and tracking approach and validate our algorithms on data collected in physical smart environments.

  10. Probabilistic Seeking Prediction in P2P VoD Systems

    NASA Astrophysics Data System (ADS)

    Wang, Weiwei; Xu, Tianyin; Gao, Yang; Lu, Sanglu

    In P2P VoD streaming systems, user behavior modeling is critical to help optimise user experience as well as system throughput. However, it still remains a challenging task due to the dynamic characteristics of user viewing behavior. In this paper, we consider the problem of user seeking prediction, which is to predict the user's next seeking position so that the system can respond proactively. We present a novel method for solving this problem. In our method, frequent sequential pattern mining is first performed to extract abstract states which do not overlap and together cover the whole video file. After mapping the raw training dataset to state transitions according to the abstract states, we use a simple probabilistic contingency table to build the prediction model. We design an experiment on the synthetic P2P VoD dataset. The results demonstrate the effectiveness of our method.
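
    The last modeling step described above, a contingency table over state transitions, can be sketched as follows. This is an illustrative reading of the abstract rather than the authors' implementation: the function names `build_contingency_table` and `predict_next`, the state labels `S0` to `S3`, and the toy sessions are all hypothetical, and the sketch assumes the abstract states have already been extracted and each session has been mapped to a sequence of them.

```python
from collections import Counter, defaultdict


def build_contingency_table(sessions):
    """Count observed transitions between abstract states.

    `sessions` is a list of state sequences, one per viewing session
    (hypothetical input format). Returns a mapping
    {current_state: Counter({next_state: count})}.
    """
    table = defaultdict(Counter)
    for states in sessions:
        for current, nxt in zip(states, states[1:]):
            table[current][nxt] += 1
    return table


def predict_next(table, current):
    """Return (most likely next state, estimated probability).

    Returns None when `current` was never seen as a source state.
    Ties are broken by state name so the result is deterministic.
    """
    counts = table.get(current)
    if not counts:
        return None
    total = sum(counts.values())
    state, count = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return state, count / total


# Invented toy sessions: each list is one user's sequence of seek targets.
sessions = [["S0", "S1", "S2"], ["S0", "S1", "S3"], ["S0", "S2"], ["S1", "S2"]]
table = build_contingency_table(sessions)
print(predict_next(table, "S0"))
print(predict_next(table, "S9"))
```

    The table only conditions on the current state; conditioning on longer histories would need a larger table and more training data.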

  11. Capturing coupled riparian and coastal disturbance from industrial mining using cloud-resilient satellite time series analysis.

    PubMed

    Alonzo, Michael; Van Den Hoek, Jamon; Ahmed, Nabil

    2016-10-11

    The socio-ecological impacts of large scale resource extraction are frequently underreported in underdeveloped regions. The open-pit Grasberg mine in Papua, Indonesia, is one of the world's largest copper and gold extraction operations. Grasberg mine tailings are discharged into the lowland Ajkwa River deposition area (ADA) leading to forest inundation and degradation of water bodies critical to indigenous peoples. The extent of the changes and temporal linkages with mining activities are difficult to establish given restricted access to the region and persistent cloud cover. Here, we introduce remote sensing methods to "peer through" atmospheric contamination using a dense Landsat time series to simultaneously quantify forest loss and increases in estuarial suspended particulate matter (SPM) concentration. We identified 138 km² of forest loss between 1987 and 2014, an area >42 times larger than the mine itself. Between 1987 and 1998, the rate of disturbance was highly correlated (Pearson's r = 0.96) with mining activity. Following mine expansion and levee construction along the ADA in the mid-1990s, we recorded significantly (p < 0.05) higher SPM in the Ajkwa Estuary compared to neighboring estuaries. This research provides a means to quantify multiple modes of ecological damage from mine waste disposal or other disturbance events.

  12. Capturing coupled riparian and coastal disturbance from industrial mining using cloud-resilient satellite time series analysis

    PubMed Central

    Alonzo, Michael; Van Den Hoek, Jamon; Ahmed, Nabil

    2016-01-01

    The socio-ecological impacts of large scale resource extraction are frequently underreported in underdeveloped regions. The open-pit Grasberg mine in Papua, Indonesia, is one of the world’s largest copper and gold extraction operations. Grasberg mine tailings are discharged into the lowland Ajkwa River deposition area (ADA) leading to forest inundation and degradation of water bodies critical to indigenous peoples. The extent of the changes and temporal linkages with mining activities are difficult to establish given restricted access to the region and persistent cloud cover. Here, we introduce remote sensing methods to “peer through” atmospheric contamination using a dense Landsat time series to simultaneously quantify forest loss and increases in estuarial suspended particulate matter (SPM) concentration. We identified 138 km2 of forest loss between 1987 and 2014, an area >42 times larger than the mine itself. Between 1987 and 1998, the rate of disturbance was highly correlated (Pearson’s r = 0.96) with mining activity. Following mine expansion and levee construction along the ADA in the mid-1990s, we recorded significantly (p < 0.05) higher SPM in the Ajkwa Estuary compared to neighboring estuaries. This research provides a means to quantify multiple modes of ecological damage from mine waste disposal or other disturbance events. PMID:27725748

  13. Capturing coupled riparian and coastal disturbance from industrial mining using cloud-resilient satellite time series analysis

    NASA Astrophysics Data System (ADS)

    Alonzo, Michael; van den Hoek, Jamon; Ahmed, Nabil

    2016-10-01

    The socio-ecological impacts of large scale resource extraction are frequently underreported in underdeveloped regions. The open-pit Grasberg mine in Papua, Indonesia, is one of the world’s largest copper and gold extraction operations. Grasberg mine tailings are discharged into the lowland Ajkwa River deposition area (ADA) leading to forest inundation and degradation of water bodies critical to indigenous peoples. The extent of the changes and temporal linkages with mining activities are difficult to establish given restricted access to the region and persistent cloud cover. Here, we introduce remote sensing methods to “peer through” atmospheric contamination using a dense Landsat time series to simultaneously quantify forest loss and increases in estuarial suspended particulate matter (SPM) concentration. We identified 138 km2 of forest loss between 1987 and 2014, an area >42 times larger than the mine itself. Between 1987 and 1998, the rate of disturbance was highly correlated (Pearson’s r = 0.96) with mining activity. Following mine expansion and levee construction along the ADA in the mid-1990s, we recorded significantly (p < 0.05) higher SPM in the Ajkwa Estuary compared to neighboring estuaries. This research provides a means to quantify multiple modes of ecological damage from mine waste disposal or other disturbance events.

  14. Adaptive semantic tag mining from heterogeneous clinical research texts.

    PubMed

    Hao, T; Weng, C

    2015-01-01

    To develop an adaptive approach to mine frequent semantic tags (FSTs) from heterogeneous clinical research texts. We develop a "plug-n-play" framework that integrates replaceable unsupervised kernel algorithms with formatting, functional, and utility wrappers for FST mining. Temporal information identification and semantic equivalence detection were two example functional wrappers. We first compared this approach's recall and efficiency for mining FSTs from ClinicalTrials.gov to that of a recently published tag-mining algorithm. Then we assessed this approach's adaptability to two other types of clinical research texts: clinical data requests and clinical trial protocols, by comparing the prevalence trends of FSTs across three texts. Our approach increased the average recall and speed by 12.8% and 47.02% respectively upon the baseline when mining FSTs from ClinicalTrials.gov, and maintained an overlap in relevant FSTs with the baseline ranging between 76.9% and 100% for varying FST frequency thresholds. The FSTs saturated when the data size reached 200 documents. Consistent trends in the prevalence of FST were observed across the three texts as the data size or frequency threshold changed. This paper contributes an adaptive tag-mining framework that is scalable and adaptable without sacrificing its recall. This component-based architectural design can be potentially generalizable to improve the adaptability of other clinical text mining methods.

  15. Automated Analysis of Renewable Energy Datasets ('EE/RE Data Mining')

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bush, Brian; Elmore, Ryan; Getman, Dan

    This poster illustrates methods to substantially improve the understanding of renewable energy data sets and the depth and efficiency of their analysis through the application of statistical learning methods ('data mining') in the intelligent processing of these often large and messy information sources. The six examples apply methods for anomaly detection, data cleansing, and pattern mining to time-series data (measurements from metering points in buildings) and spatiotemporal data (renewable energy resource datasets).

  16. Detecting Malicious Tweets in Twitter Using Runtime Monitoring With Hidden Information

    DTIC Science & Technology

    2016-06-01

    text mining using Twitter streaming API and python [Online]. Available: http://adilmoujahid.com/posts/2014/07/twitter-analytics/ [22] M. Singh, B...sites with 645,750,000 registered users [3] and has open source public tweets for data mining. 2. Malicious Users and Tweets In the modern world...want to data mine in Twitter, and presents the natural language assertions and corresponding rule patterns. It then describes the steps performed using

  17. Numerical linear algebra in data mining

    NASA Astrophysics Data System (ADS)

    Eldén, Lars

    Ideas and algorithms from numerical linear algebra are important in several areas of data mining. We give an overview of linear algebra methods in text mining (information retrieval), pattern recognition (classification of handwritten digits), and PageRank computations for web search engines. The emphasis is on rank reduction as a method of extracting information from a data matrix, low-rank approximation of matrices using the singular value decomposition and clustering, and on eigenvalue methods for network analysis.
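
    As a concrete instance of the eigenvalue methods for network analysis mentioned above, PageRank can be computed by power iteration on the link graph. The following is a minimal pure-Python sketch under common textbook assumptions, not code from the paper: the function name `pagerank`, the damping factor of 0.85, the uniform redistribution of rank from pages with no outgoing links, and the three-page toy graph are all choices made here for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Approximate PageRank scores by power iteration.

    `links` maps each node to the list of nodes it links to.
    Nodes that appear only as link targets are included automatically.
    Rank held by dangling nodes (no outgoing links) is spread uniformly.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        # Every node receives the teleportation share plus the dangling share.
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            for t in targets:
                new_rank[t] += damping * rank[v] / len(targets)
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:  # stop once the L1 change is negligible
            break
    return rank


# Invented toy web graph: a links to b and c, b links to c, c links to a.
scores = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

    Each iteration is one multiplication by the damped link matrix, so the scores converge to the dominant eigenvector of that matrix; page `c` ends up ranked highest in the toy graph because it receives links from both other pages.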

  18. Remote mineral mapping using AVIRIS data at Summitville, Colorado and the adjacent San Juan Mountains

    NASA Technical Reports Server (NTRS)

    King, Trude V. V.; Clark, Roger N.; Ager, Cathy; Swayze, Gregg A.

    1995-01-01

    We have demonstrated the unique utility of imaging spectroscopy in mapping mineral distribution. In the Summitville mining region we have shown that the mine site does not contribute clay minerals to the Alamosa River, but does contribute Fe-bearing minerals. Such minerals have the potential to carry heavy metals. This application illustrates only one specific environmental application of imaging spectroscopy data. For instance, the types of minerals we can map with confidence are those frequently associated with environmental problems related to active and abandoned mine lands. Thus, the potential utility of this technology to the field of environmental science has yet to be fully explored.

  19. Natural thorium resources and recovery: Options and impacts

    USGS Publications Warehouse

    Ault, Timothy; Van Gosen, Bradley S.; Krahn, Steven; Croff, Allen

    2016-01-01

    This paper reviews the front end of the thorium fuel cycle, including the extent and variety of thorium deposits, the potential sources of thorium production, and the physical and chemical technologies required to isolate and purify thorium. Thorium is frequently found within rare earth element–bearing minerals that exist in diverse types of mineral deposits, often in conjunction with other minerals mined for their commercial value. It may be possible to recover substantial quantities of thorium as a by-product from active titanium, uranium, tin, iron, and rare earth mines. Incremental physical and chemical processing is required to obtain a purified thorium product from thorium minerals, but documented experience with these processes is extensive, and incorporating thorium recovery should not be overly challenging. The anticipated environmental impacts of by-product thorium recovery are small relative to those of uranium recovery since existing mining infrastructure utilization avoids the opening and operation of new mines and thorium recovery removes radionuclides from the mining tailings.

  20. Data Mining and Complex Problems: Case Study in Composite Materials

    NASA Technical Reports Server (NTRS)

    Rabelo, Luis; Marin, Mario

    2009-01-01

    Data mining is defined as the discovery of useful, possibly unexpected, patterns and relationships in data using statistical and non-statistical techniques in order to develop schemes for decision and policy making. Data mining can be used to discover the sources and causes of problems in complex systems. In addition, data mining can support simulation strategies by finding the different constants and parameters to be used in the development of simulation models. This paper introduces a framework for data mining and its application to complex problems. To further explain some of the concepts outlined in this paper, the potential application to the NASA Shuttle Reinforced Carbon-Carbon structures and genetic programming is used as an illustration.

  1. Application of EREP imagery to fracture-related mine safety hazards in coal mining and mining-environmental problems in Indiana. [Indiana and Illinois]

    NASA Technical Reports Server (NTRS)

    Wier, C. E. (Principal Investigator); Powell, R. L.; Amato, R. V.; Russell, O. R.; Martin, K. R.

    1975-01-01

    The author has identified the following significant results. This investigation evaluated the applicability of a variety of sensor types, formats, and resolution capabilities to the study of both fuel and nonfuel mined lands. The image reinforcement provided by stereo viewing of the EREP images proved useful for identifying lineaments and for mined lands mapping. Skylab S190B color and color infrared transparencies were the most useful EREP imagery. New information on lineament and fracture patterns in the bedrock of Indiana and Illinois extracted from analysis of the Skylab imagery has contributed to furthering the geological understanding of this portion of the Illinois basin.

  2. Constructing Patient Specific Clinical Trajectories from Electronic Healthcare Reimbursement Claims using Sequential Pattern Mining

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pullum, Laura L; Hobson, Tanner C

    We examine the use of electronic healthcare reimbursement claims (EHRC) for analyzing healthcare delivery and practice patterns across the United States (US). By analyzing over 1 billion EHRCs, we track patterns of clinical procedures administered to patients with heart disease (HD) using sequential pattern mining algorithms. Our analyses reveal that the clinical procedures performed on HD patients are highly varied leading up to and after the primary diagnosis. The discovered clinical procedure sequences reveal significant differences in the overall costs incurred across different parts of the US, indicating significant heterogeneity in treating HD patients. We show that a data-driven approach to understand patient specific clinical trajectories constructed from EHRC can provide quantitative insights into how to better manage and treat patients.

  3. Educational Data Mining Application for Estimating Students Performance in Weka Environment

    NASA Astrophysics Data System (ADS)

    Gowri, G. Shiyamala; Thulasiram, Ramasamy; Amit Baburao, Mahindra

    2017-11-01

    Educational data mining (EDM) is a multi-disciplinary research area that applies artificial intelligence, statistical modeling and data mining to the data generated by an educational institution. EDM uses computational methods to interpret educational information in order to answer educational questions. For a country to stand out among the nations of the world, its education system has to undergo a major transition by redesigning its framework. Concealed patterns and data in various information repositories can be extracted by adopting the techniques of data mining. In order to summarize the performance of students together with their credentials, we examine the use of data mining in the field of academics. The Apriori algorithm is applied extensively to the student database for a wider classification based on various categories. The K-means procedure is applied to the same set of databases in order to group the records into specific categories. The Apriori algorithm mines rules in order to extract similar patterns, along with their associations, across various sets of records. The records can be extracted from academic information repositories. The parameters used in this study give more importance to psychological traits than to academic features. Undesirable student conduct can be clearly observed when data mining frameworks are used. Thus, the algorithms prove efficient at profiling students in any educational environment. The ultimate objective of the study is to predict whether a student is prone to violence.
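
    The Apriori procedure named in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Weka implementation used in the study: the function `apriori`, the attribute labels, the five toy records, and the support threshold of 3 are all hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        prev = list(frequent)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        # Count step: one pass over the records per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical student records; the attribute names are illustrative only.
records = [
    {"low_attendance", "late_submissions", "low_grade"},
    {"low_attendance", "low_grade"},
    {"late_submissions", "low_grade"},
    {"low_attendance", "late_submissions"},
    {"low_attendance", "late_submissions", "low_grade"},
]
itemsets = apriori(records, min_support=3)
```

    On this toy data every single attribute and every pair meets the threshold, while the three-attribute combination appears in only two records and is discarded, which is the level-wise pruning that gives Apriori its name.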

  4. Windblown Dust Deposition Forecasting and Spread of Contamination around Mine Tailings.

    PubMed

    Stovern, Michael; Guzmán, Héctor; Rine, Kyle P; Felix, Omar; King, Matthew; Ela, Wendell P; Betterton, Eric A; Sáez, Avelino Eduardo

    2016-02-01

    Wind erosion, transport and deposition of windblown dust from anthropogenic sources, such as mine tailings impoundments, can have significant effects on the surrounding environment. The lack of vegetation and the vertical protrusion of the mine tailings above the neighboring terrain make the tailings susceptible to wind erosion. Modeling the erosion, transport and deposition of particulate matter from mine tailings is a challenge for many reasons, including heterogeneity of the soil surface, vegetative canopy coverage, dynamic meteorological conditions and topographic influences. In this work, a previously developed Deposition Forecasting Model (DFM) that is specifically designed to model the transport of particulate matter from mine tailings impoundments is verified using dust collection and topsoil measurements. The DFM is initialized using data from an operational Weather Research and Forecasting (WRF) model. The forecast deposition patterns are compared to dust collected by inverted-disc samplers and determined through gravimetric, chemical composition and lead isotopic analysis. The DFM is capable of predicting dust deposition patterns from the tailings impoundment to the surrounding area. The methodology and approach employed in this work can be generalized to other contaminated sites from which dust transport to the local environment can be assessed as a potential route for human exposure.

  5. Windblown Dust Deposition Forecasting and Spread of Contamination around Mine Tailings

    PubMed Central

    Stovern, Michael; Guzmán, Héctor; Rine, Kyle P.; Felix, Omar; King, Matthew; Ela, Wendell P.; Betterton, Eric A.; Sáez, Avelino Eduardo

    2017-01-01

    Wind erosion, transport and deposition of windblown dust from anthropogenic sources, such as mine tailings impoundments, can have significant effects on the surrounding environment. The lack of vegetation and the vertical protrusion of the mine tailings above the neighboring terrain make the tailings susceptible to wind erosion. Modeling the erosion, transport and deposition of particulate matter from mine tailings is a challenge for many reasons, including heterogeneity of the soil surface, vegetative canopy coverage, dynamic meteorological conditions and topographic influences. In this work, a previously developed Deposition Forecasting Model (DFM) that is specifically designed to model the transport of particulate matter from mine tailings impoundments is verified using dust collection and topsoil measurements. The DFM is initialized using data from an operational Weather Research and Forecasting (WRF) model. The forecast deposition patterns are compared to dust collected by inverted-disc samplers and determined through gravimetric, chemical composition and lead isotopic analysis. The DFM is capable of predicting dust deposition patterns from the tailings impoundment to the surrounding area. The methodology and approach employed in this work can be generalized to other contaminated sites from which dust transport to the local environment can be assessed as a potential route for human exposure. PMID:29082035

  6. Video mining using combinations of unsupervised and supervised learning techniques

    NASA Astrophysics Data System (ADS)

    Divakaran, Ajay; Miyahara, Koji; Peker, Kadir A.; Radhakrishnan, Regunathan; Xiong, Ziyou

    2003-12-01

    We discuss the meaning and significance of the video mining problem, and present our work on some aspects of video mining. A simple definition of video mining is unsupervised discovery of patterns in audio-visual content. Such purely unsupervised discovery is readily applicable to video surveillance as well as to consumer video browsing applications. We interpret video mining as content-adaptive or "blind" content processing, in which the first stage is content characterization and the second stage is event discovery based on the characterization obtained in stage 1. We discuss the target applications and find that purely unsupervised approaches are too computationally complex to be implemented on our product platform. We then describe various combinations of unsupervised and supervised learning techniques that help discover patterns that are useful to the end-user of the application. We target consumer video browsing applications such as commercial message detection, sports highlights extraction etc. We employ both audio and video features. We find that supervised audio classification combined with unsupervised unusual event discovery enables accurate supervised detection of desired events. Our techniques are computationally simple and robust to common variations in production styles etc.

  7. Integration of Text- and Data-Mining Technologies for Use in Banking Applications

    NASA Astrophysics Data System (ADS)

    Maslankowski, Jacek

    Unstructured data, most of it in the form of text files, typically accounts for 85% of an organization's knowledge stores, but it's not always easy to find, access, analyze or use (Robb 2004). That is why it is important to use solutions based on both text and data mining, a combination known as duo mining. Such solutions improve management based on the knowledge held within an organization. The results are interesting. Data mining deals with structured data, usually supplied by data warehouses. Text mining, sometimes called web mining, looks for patterns in unstructured data: memos, documents and the web. Integrating text-based information with structured data enriches predictive modeling capabilities and provides new stores of insightful and valuable information for driving business and research initiatives forward.

  8. 43 CFR 4.1351 - Preliminary finding by OSM.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... APPEALS PROCEDURES Special Rules Applicable to Surface Coal Mining Hearings and Appeals Request for...(c) of the Act, 30 U.S.C. 1260(c) (Federal Program; Federal Lands Program; Federal Program for Indian... or has controlled surface coal mining and reclamation operations with a demonstrated pattern of...

  9. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shumway, R.H.; McQuarrie, A.D.

    Robust statistical approaches to the problem of discriminating between regional earthquakes and explosions are developed. We compare linear discriminant analysis using descriptive features like amplitude and spectral ratios with signal discrimination techniques using the original signal waveforms and spectral approximations to the log likelihood function. Robust information theoretic techniques are proposed and all methods are applied to 8 earthquakes and 8 mining explosions in Scandinavia and to an event from Novaya Zemlya of unknown origin. It is noted that signal discrimination approaches based on discrimination information and Renyi entropy perform better in the test sample than conventional methods based on spectral ratios involving the P and S phases. Two techniques for identifying the ripple-firing pattern for typical mining explosions are proposed and shown to work well on simulated data and on several Scandinavian earthquakes and explosions. We use both cepstral analysis in the frequency domain and a time domain method based on the autocorrelation and partial autocorrelation functions. The proposed approach strips off underlying smooth spectral and seasonal spectral components corresponding to the echo pattern induced by two simple ripple-fired models. For two mining explosions, a pattern is identified whereas for two earthquakes, no pattern is evident.
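
    The time-domain idea in this abstract, that a ripple-fired source leaves an echo visible in the autocorrelation function, can be shown on synthetic data. This is a minimal sketch and not the authors' procedure: the single-echo model, the delay of 12 samples, the gain of 0.6, the white-noise source and the helper `autocorrelation` are all assumptions chosen for illustration.

```python
import random


def autocorrelation(x, max_lag):
    """Normalized sample autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    var = sum(v * v for v in centered)
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / var
        for lag in range(max_lag + 1)
    ]


# Synthetic source: seeded white noise, so the run is reproducible.
rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# Assumed ripple-fire model: the source plus one delayed, scaled copy.
delay, gain = 12, 0.6
signal = [
    s + (gain * source[i - delay] if i >= delay else 0.0)
    for i, s in enumerate(source)
]

acf = autocorrelation(signal, max_lag=40)
# The largest non-zero-lag autocorrelation marks the estimated echo delay.
echo_lag = max(range(1, 41), key=lambda k: acf[k])
```

    For a white source with a single echo, the autocorrelation at the echo delay is approximately gain / (1 + gain^2), well above the sampling noise at other lags, which is why a simple peak search recovers the delay. An earthquake-like record without the delayed copy would show no comparable peak.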

  10. Moving Equipment and Workers to Mine Construction Site at a Logistically Challenged Area

    NASA Astrophysics Data System (ADS)

    Tikasz, Laszlo; Biroscak, Dennis; Pentiah, Scheale Duvah; McCulloch, Robert I.

    Social sensitivity of inhabitants, minimal impact on the environment, low-grade infrastructure, high altitude, and frequent rock slides, combined with expectations for the timely moving of equipment and workers, are some of the challenges emerging from the current construction of a mine. After traditional planning ran into issues in the early phase of the construction, the Procurement Department requested a traffic simulator in order to validate daily and weekly schedules and to predict likely delays or blockages over the long term.

  11. Land use-based landscape planning and restoration in mine closure areas.

    PubMed

    Zhang, Jianjun; Fu, Meichen; Hassani, Ferri P; Zeng, Hui; Geng, Yuhuan; Bai, Zhongke

    2011-05-01

    Landscape planning and restoration in mine closure areas is not only an inevitable choice to sustain mining areas but also an important path to maximize landscape resources and to improve ecological function in mine closure areas. The analysis of the present mine development shows that many mines are unavoidably facing closures in China. This paper analyzes the periodic impact of mining activities on landscapes and then proposes planning concepts and principles. According to the landscape characteristics in mine closure areas, this paper classifies available landscape resources in mine closure areas into the landscape for restoration, for limited restoration and for protection, and then summarizes directions for their uses. This paper establishes the framework of spatial control planning and design of landscape elements from "macro control, medium allocation and micro optimization" for the purpose of managing and using this kind of special landscape resources. Finally, this paper applies the theories and methods to a case study in Wu'an from two aspects: the construction of a sustainable land-use pattern on a large scale and the optimized allocation of typical mine landscape resources on a small scale.

  12. Quantifying the contribution of airborne lead (Pb) to surface waters in northeastern Oklahoma

    NASA Astrophysics Data System (ADS)

    Li, J. J.; McDonald, J.; Curtis, H.

    2017-12-01

    Northeastern Oklahoma, home to a number of Native American Tribes, is part of the well-known Tri-State Mining District (TSMD). One hundred years of mining production in this area has left numerous large chat piles in the surrounding environment, directly affecting the town of Picher and many other tribal communities. Byproducts of the mining, including lead (Pb)-containing dust, have been transported to the atmosphere and have seeped into groundwater, lakes, ponds and rivers. Due to this contamination, many children in the area have elevated levels of Pb in their bodies. Despite a substantial number of studies and efforts on the restoration of heavy metal contamination in this area (e.g. The Tar Creek Superfund Site, EPA), no studies have attempted to distinguish the contributions of different sources, particularly atmospheric deposition, of heavy metals to the aquatic environment. In this study, we analyzed the atmospheric deposition of Pb at 4 sites located close to the chat piles for the period of 2010 to 2016. Our preliminary analysis showed that atmospheric Pb has a strong seasonal pattern with two peak times in early spring and late fall, which largely correspond with the dry periods in this area. Atmospheric concentrations of Pb monitored at these sites frequently exceeded 0.15 μg/m³, the National Ambient Air Quality Standard (NAAQS) for ambient air Pb, and were generally 10 times higher than atmospheric Pb monitored in Tulsa, OK, a major metropolitan area 150 km southwest of the monitoring sites. With the known Pb flux to the sediments of the water bodies, we estimated that the contribution of Pb from atmospheric deposition to the surface waters is up to 25%, depending on the distance of the water bodies from the concentrated distribution of the chat piles.

  13. Leptospirosis Seroprevalence Among Blue Metal Mine Workers of Tamil Nadu, India.

    PubMed

    Parveen, Sakkarai Mohamed Asha; Suganyaa, Baskar; Sathya, Muthu Sri; Margreat, Alphonse Asirvatham Princy; Sivasankari, Karikalacholan; Shanmughapriya, Santhanam; Hoffman, Nicholas E; Natarajaseenivasan, Kalimuthusamy

    2016-07-06

    Leptospirosis is mainly considered an occupational disease, prevalent among agriculture, sewage works, forestry, and animal slaughtering populations. However, putative risk to miners and their inclusion in the high-risk leptospirosis group remain in need of rigorous analysis. Therefore, a study was conducted with the objective to assess the leptospirosis seroprevalence among miners of two districts of Tamil Nadu, India. A total of 244 sera samples from Pudukkottai miners (124) and Karur miners (120) were analyzed by microscopic agglutination test. Antibodies to leptospires were detected in 94 samples giving an overall seroprevalence of 38.5%. The seroprevalence was higher among Pudukkottai miners (65.3%) when compared with Karur miners (10.8%). Seroprevalence among control population (13%) was significantly less than that of the Pudukkottai miners marking a possible high-risk population group distinction. Subject sera most commonly reacted with organisms of the serogroup Autumnalis, and the pattern was similar in carrier animals of the study areas. Two leptospires were isolated from kidney samples of rats. The prevalence of Autumnalis among rodents and humans source tracked human leptospirosis among the miners. The study also determined that Pudukkottai miners are subjected to high-risk challenges such as exposure to water bodies on the way to the mines (odds ratio [OR] = 10.6), wet mine areas (OR = 10.6), rat infestation (OR = 4.6), and cattle rearing (OR = 10.4) and are thus frequently exposed to leptospirosis compared with Karur miners. Hence, control strategies targeting these populations will likely prove to be effective remediation strategies benefiting Pudukkottai miners and workers in similar environments across occupations. © The American Society of Tropical Medicine and Hygiene.
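
    The seroprevalence figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the abstract. The per-district positive counts are not reported there, so they are inferred here by rounding and should be read as an assumption.

```python
# Figures taken from the abstract.
total_samples = 244
positives = 94
pudukkottai_n, pudukkottai_rate = 124, 0.653
karur_n, karur_rate = 120, 0.108

# Overall seroprevalence: positive samples divided by samples tested.
overall = positives / total_samples

# Implied positive counts per district, rounded to whole samples
# (inferred values, not reported in the abstract).
pudukkottai_pos = round(pudukkottai_n * pudukkottai_rate)
karur_pos = round(karur_n * karur_rate)

print(f"overall seroprevalence: {overall:.1%}")
print(f"implied positives: {pudukkottai_pos} + {karur_pos}")
```

    The two district sample sizes add up to the 244 samples tested, and the implied district counts add up to the 94 positives reported, so the three percentages are mutually consistent.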

  14. Conserving relics from ancient underground worlds: assessing the influence of cave and landscape features on obligate iron cave dwellers from the Eastern Amazon

    PubMed Central

    Jaffé, Rodolfo; Prous, Xavier; Calux, Allan; Gastauer, Markus; Nicacio, Gilberto; Zampaulo, Robson; Souza-Filho, Pedro W.M.; Oliveira, Guilherme; Brandi, Iuri V.; Siqueira, José O.

    2018-01-01

    The degradation of subterranean habitats is believed to represent a serious threat for the conservation of obligate subterranean dwellers (troglobites), many of which are short-range endemics. However, while the factors influencing cave biodiversity remain largely unknown, the influence of the surrounding landscape and patterns of subterranean connectivity of terrestrial troglobitic communities have never been systematically assessed. Using spatial statistics to analyze the most comprehensive speleological database yet available for tropical caves, we first assess the influence of iron cave characteristics and the surrounding landscape on troglobitic communities from the Eastern Amazon. We then determine the spatial pattern of troglobitic community composition, species richness, phylogenetic diversity, and the occurrence of frequent troglobitic species, and finally quantify how different landscape features influence the connectivity between caves. Our results reveal the key importance of habitat amount, guano, water, lithology, geomorphology, and elevation in shaping iron cave troglobitic communities. While mining within 250 m from the caves influenced species composition, increasing agricultural land cover within 50 m from the caves reduced species richness and phylogenetic diversity. Troglobitic species composition, species richness, phylogenetic diversity, and the occurrence of frequent troglobites showed spatial autocorrelation for up to 40 km. Finally, our results suggest that the conservation of cave clusters should be prioritized, as geographic distance was the main factor determining connectivity between troglobitic communities. Overall, our work sheds important light onto one of the most overlooked terrestrial ecosystems, and highlights the need to shift conservation efforts from individual caves to subterranean habitats as a whole. PMID:29576987

  15. Conserving relics from ancient underground worlds: assessing the influence of cave and landscape features on obligate iron cave dwellers from the Eastern Amazon.

    PubMed

    Jaffé, Rodolfo; Prous, Xavier; Calux, Allan; Gastauer, Markus; Nicacio, Gilberto; Zampaulo, Robson; Souza-Filho, Pedro W M; Oliveira, Guilherme; Brandi, Iuri V; Siqueira, José O

    2018-01-01

    The degradation of subterranean habitats is believed to represent a serious threat for the conservation of obligate subterranean dwellers (troglobites), many of which are short-range endemics. However, while the factors influencing cave biodiversity remain largely unknown, the influence of the surrounding landscape and patterns of subterranean connectivity of terrestrial troglobitic communities have never been systematically assessed. Using spatial statistics to analyze the most comprehensive speleological database yet available for tropical caves, we first assess the influence of iron cave characteristics and the surrounding landscape on troglobitic communities from the Eastern Amazon. We then determine the spatial pattern of troglobitic community composition, species richness, phylogenetic diversity, and the occurrence of frequent troglobitic species, and finally quantify how different landscape features influence the connectivity between caves. Our results reveal the key importance of habitat amount, guano, water, lithology, geomorphology, and elevation in shaping iron cave troglobitic communities. While mining within 250 m from the caves influenced species composition, increasing agricultural land cover within 50 m from the caves reduced species richness and phylogenetic diversity. Troglobitic species composition, species richness, phylogenetic diversity, and the occurrence of frequent troglobites showed spatial autocorrelation for up to 40 km. Finally, our results suggest that the conservation of cave clusters should be prioritized, as geographic distance was the main factor determining connectivity between troglobitic communities. Overall, our work sheds important light onto one of the most overlooked terrestrial ecosystems, and highlights the need to shift conservation efforts from individual caves to subterranean habitats as a whole.

  16. Contents of Japanese pro- and anti-HPV vaccination websites: A text mining analysis.

    PubMed

    Okuhara, Tsuyoshi; Ishikawa, Hirono; Okada, Masahumi; Kato, Mio; Kiuchi, Takahiro

    2018-03-01

    In Japan, the human papillomavirus (HPV) vaccination rate has sharply fallen to nearly 0% due to sensational media reports of adverse events. Online anti-HPV-vaccination activists often warn readers of the vaccine's dangers. Here, we aimed to examine frequently appearing contents on pro- and anti-HPV vaccination websites. We conducted online searches via two major search engines (Google Japan and Yahoo! Japan). Targeted websites were classified as "pro," "anti," or "neutral" according to their claims, with the author(s) classified as "health professionals," "mass media," or "laypersons." We then conducted a text mining analysis. Of the 270 sites analyzed, 16 contents were identified. The most frequently appearing contents on pro websites were vaccine side effects, preventable effect of vaccination, and cause of cervical cancer. The most frequently appearing contents on anti websites were vaccine side effects, vaccine toxicity, and girls who suffer from vaccine side effects. Main disseminators of each content according to the author's expertise were also revealed. Pro-HPV vaccination websites should supplement deficient contents and respond to frequent contents on anti-HPV websites. Effective tactics are needed to better communicate susceptibility to cervical cancer, frequency of side effects, and responses to vaccine toxicity and conspiracy theories. Copyright © 2017 Elsevier B.V. All rights reserved.

  17. Solar Data Mining at Georgia State University

    NASA Astrophysics Data System (ADS)

    Angryk, R.; Martens, P. C.; Schuh, M.; Aydin, B.; Kempton, D.; Banda, J.; Ma, R.; Naduvil-Vadukootu, S.; Akkineni, V.; Küçük, A.; Filali Boubrahimi, S.; Hamdi, S. M.

    2016-12-01

    In this talk we give an overview of research projects related to solar data analysis that are conducted at Georgia State University. We will provide an update on multiple advances made by our research team in the analysis of image parameters, spatio-temporal pattern mining, and temporal data analysis, and on our experiences with the visualization, analysis, processing and storage of big, heterogeneous solar data. We will talk about up-to-date data mining methodologies and their importance for big data-driven solar physics research.

  18. Mining Longitudinal Web Queries: Trends and Patterns.

    ERIC Educational Resources Information Center

    Wang, Peiling; Berry, Michael W.; Yang, Yiheng

    2003-01-01

    Analyzed user queries submitted to an academic Web site during a four-year period, using a relational database, to examine users' query behavior, to identify problems they encounter, and to develop techniques for optimizing query analysis and mining. Linguistic analyses focus on query structures, lexicon, and word associations using statistical…

  19. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kargupta, H.; Stafford, B.; Hamzaoglu, I.

    This paper describes an experimental parallel/distributed data mining system PADMA (PArallel Data Mining Agents) that uses software agents for local data accessing and analysis and a web based interface for interactive data visualization. It also presents the results of applying PADMA for detecting patterns in unstructured texts of postmortem reports and laboratory test data for Hepatitis C patients.

  20. Restoring tropical forests on bauxite mined lands: lessons from the Brazilian Amazon

    Treesearch

    John A. Parrotta; Oliver H. Knowles

    2001-01-01

    Restoring self-sustaining tropical forest ecosystems on surface mined sites is a formidable challenge that requires the integration of proven reclamation techniques and reforestation strategies appropriate to specific site conditions, including landscape biodiversity patterns. Restorationists working in most tropical settings are usually hampered by lack of basic...

  1. [Analysis of on medication rules for Qi-deficiency and blood-stasis syndrome of chronic heart failure based on data mining technology].

    PubMed

    Wang, Qian; Yao, Geng-Zhen; Pan, Guang-Ming; Huang, Jing-Yi; An, Yi-Pei; Zou, Xu

    2017-01-01

    To analyze the medication features and the regularity of prescriptions of traditional Chinese medicine in treating patients with Qi-deficiency and blood-stasis syndrome of chronic heart failure based on modern literature. In this article, CNKI Chinese academic journal database, Wanfang Chinese academic journal database and VIP Chinese periodical database were all searched from January 2000 to December 2015 for the relevant literature on traditional Chinese medicine treatment for Qi-deficiency and blood-stasis syndrome of chronic heart failure. Then a normalized database was established for further data mining and analysis. Subsequently, the medication features and the regularity of prescriptions were mined by using traditional Chinese medicine inheritance support system (V2.5), association rules, improved mutual information algorithm, complex system entropy clustering and other mining methods. Finally, a total of 171 articles were included, involving 171 prescriptions, 140 kinds of herbs, with a total frequency of 1,772 for the herbs. As a result, 19 core prescriptions and 7 new prescriptions were mined. The most frequently used herbs included Huangqi (Astragali Radix), Danshen (Salviae Miltiorrhizae Radix et Rhizoma), Fuling (Poria), Renshen (Ginseng Radix et Rhizoma), Tinglizi (Semen Lepidii), Baizhu (Atractylodis Macrocephalae Rhizoma), and Guizhi (Cinnamomum Ramulus). The core prescriptions were composed of Huangqi (Astragali Radix), Danshen (Salviae Miltiorrhizae Radix et Rhizoma) and Fuling (Poria), etc. The high-frequency herbs and core prescriptions not only highlight the medication features of Qi-invigorating and blood-circulating therapy, but also reflect the regularity of prescriptions of blood-circulating, Yang-warming, and urination-promoting therapy based on syndrome differentiation. Moreover, the mining of the new prescriptions provides new reference and inspiration for clinical treatment of various accompanying symptoms of chronic heart failure. In conclusion, this article provides new reference for traditional Chinese medicine in the treatment of chronic heart failure. Copyright © by the Chinese Pharmaceutical Association.

  2. Mining in low coal. Volume 1. Biomechanics and work physiology. Open file report 15 Jun 78-15 Sep 81

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ayoub, M.M.; Bethea, N.J.; Bobo, M.

    1981-11-01

    The objectives of this research were (1) to evaluate the job demands associated with low coal mining, (2) to survey the anthropometry, strength, and aerobic capacity of low coal miners to determine if they differ from the U.S. population, and (3) to recommend, on the basis of available information, optimal job and work station design for low coal mining. The male and female anthropometry, except for weight and circumferential dimensions, was quite similar to the comparison populations. Back strength for male and female miners was significantly lower than the industrial worker population. This can be one of the contributing factors of low back problems in mining. Shoveling, timbering, and helper tasks were physiologically demanding activities. However, because of the frequent stoppage of work, adequate rest was usually available. If work stoppage is corrected, then better work and rest schedules are essential.

  3. Modeling Patterns of Total Dissolved Solids Release from Central Appalachia, USA, Mine Spoils.

    PubMed

    Clark, Elyse V; Zipper, Carl E; Daniels, W Lee; Orndorff, Zenah W; Keefe, Matthew J

    2017-01-01

    Surface mining in the central Appalachian coalfields (USA) influences water quality because the interaction of infiltrated waters and O₂ with freshly exposed mine spoils releases elevated levels of total dissolved solids (TDS) to streams. Modeling and predicting the short- and long-term TDS release potentials of mine spoils can aid in the management of current and future mining-influenced watersheds and landscapes. In this study, the specific conductance (SC, a proxy variable for TDS) patterns of 39 mine spoils during a sequence of 40 leaching events were modeled using a five-parameter nonlinear regression. Estimated parameter values were compared to six rapid spoil assessment techniques (RSATs) to assess predictive relationships between model parameters and RSATs. Spoil leachates reached maximum values, 1108 ± 161 μS cm⁻¹ on average, within the first three leaching events, then declined exponentially to a breakpoint at the 16th leaching event on average. After the breakpoint, SC release remained linear, with most spoil samples exhibiting declines in SC release with successive leaching events. The SC asymptote averaged 276 ± 25 μS cm⁻¹. Only three samples had SCs >500 μS cm⁻¹ at the end of the 40 leaching events. Model parameters varied with mine spoil rock and weathering type, and RSATs were predictive of four model parameters. Unweathered samples released higher SCs throughout the leaching period relative to weathered samples, and rock type influenced the rate of SC release. The RSATs for SC, total S, and neutralization potential may best predict certain phases of mine spoil TDS release. Copyright © by the American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America, Inc.

  4. Quality Assurance of NCI Thesaurus by Mining Structural-Lexical Patterns

    PubMed Central

    Abeysinghe, Rashmie; Brooks, Michael A.; Talbert, Jeffery; Licong, Cui

    2017-01-01

    Quality assurance of biomedical terminologies such as the National Cancer Institute (NCI) Thesaurus is an essential part of the terminology management lifecycle. We investigate a structural-lexical approach based on non-lattice subgraphs to automatically identify missing hierarchical relations and missing concepts in the NCI Thesaurus. We mine six structural-lexical patterns exhibited in non-lattice subgraphs: containment, union, intersection, union-intersection, inference-contradiction, and inference-union. Each pattern indicates a potential specific type of error and suggests a potential type of remediation. We found 809 non-lattice subgraphs with these patterns in the NCI Thesaurus (version 16.12d). Domain experts evaluated a random sample of 50 small non-lattice subgraphs, of which 33 were confirmed to contain errors and make correct suggestions (33/50 = 66%). Of the 25 evaluated subgraphs revealing multiple patterns, 22 were verified correct (22/25 = 88%). This shows the effectiveness of our structural-lexical-pattern-based approach in detecting errors and suggesting remediations in the NCI Thesaurus. PMID:29854100

  5. The Network Structure Underlying the Earth Observation Assessment

    NASA Astrophysics Data System (ADS)

    Vitkin, S.; Doane, W. E. J.; Mary, J. C.

    2017-12-01

    The Earth Observations Assessment (EOA 2016) is a multiyear project designed to assess the effectiveness of civil earth observation data sources (instruments, sensors, models, etc.) on societal benefit areas (SBAs) for the United States. Subject matter experts (SMEs) provided input and scored how data sources inform products, product groups, key objectives, SBA sub-areas, and SBAs in an attempt to quantify the relationships between data sources and SBAs. The resulting data were processed by Integrated Applications Incorporated (IAI) using MITRE's PALMA software to create normalized relative impact scores for each of these relationships. However, PALMA processing obscures the natural network representation of the data. Any network analysis that might identify patterns of interaction among data sources, products, and SBAs is therefore impossible. Collaborating with IAI, we cleaned and recreated a network from the original dataset. Using R and Python we explore the underlying structure of the network and apply frequent itemset mining algorithms to identify groups of data sources and products that interact. We reveal interesting patterns and relationships in the EOA dataset that were not immediately observable from the EOA 2016 report and provide a basis for further exploration of the EOA network dataset.
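
    The frequent itemset mining step mentioned in this abstract can be illustrated with a minimal Apriori-style sketch. The abstract does not say which algorithm or library the authors used, so the function name `frequent_itemsets`, the support threshold, and the toy "data source / product" labels below are all assumptions made purely for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.

    Illustrative level-wise (Apriori-style) search; not the authors' code.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    # Level 1 candidates: every distinct single item.
    current = [frozenset([i]) for i in sorted({i for b in baskets for i in b})]
    result = {}
    k = 1
    while current:
        # Support = fraction of baskets that contain the candidate.
        frequent = {}
        for cand in current:
            support = sum(1 for b in baskets if cand <= b) / n
            if support >= min_support:
                frequent[cand] = support
        result.update(frequent)
        # Join step: merge frequent k-itemsets into (k+1)-candidates and
        # keep only those whose k-subsets are all frequent (Apriori pruning).
        k += 1
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in frequent for s in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = list(candidates)
    return result


# Hypothetical baskets: which data sources and products co-occur.
toy = [
    {"sat_A", "prod_X"},
    {"sat_A", "prod_X", "prod_Y"},
    {"sat_B", "prod_Y"},
    {"sat_A", "prod_X"},
]
for itemset, support in frequent_itemsets(toy, 0.5).items():
    print(sorted(itemset), support)
```

    With a threshold of 0.5, the pair of `sat_A` and `prod_X` survives because it appears in three of the four toy baskets, while `sat_B` is pruned at the first level.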

  6. The impact of gold mining on the Witwatersrand on the rivers and karst system of Gauteng and North West Province, South Africa

    NASA Astrophysics Data System (ADS)

    Durand, J. F.

    2012-06-01

    The Witwatersrand has been subjected to geological exploration, mining activities, parallel industrial development and associated settlement patterns over the past century. The gold mines brought with them not only development, employment and wealth, but also the most devastating war in the history of South Africa, civil unrest, economical inequality, social uprooting, pollution, negative health impacts and ecological destruction. One of the most consistent and pressing problems caused by mining has been its impact on the water bodies in and adjacent to the Witwatersrand. The dewatering and rewatering of the karstic aquifer overlying and adjacent to the Witwatersrand Supergroup and the pollution caused by Acid Mine Drainage (AMD) are some of the most serious consequences of gold mining in South Africa and will affect the lives of many South Africans.

  7. Subnetwork mining on functional connectivity network for classification of minimal hepatic encephalopathy.

    PubMed

    Zhang, Daoqiang; Tu, Liyang; Zhang, Long-Jiang; Jie, Biao; Lu, Guang-Ming

    2018-06-01

    Hepatic encephalopathy (HE), as a complication of cirrhosis, is a serious brain disease, which may lead to death. Accurate diagnosis of HE and its intermediate stage, i.e., minimal HE (MHE), is very important for possibly early diagnosis and treatment. Brain connectivity network, as a simple representation of brain interaction, has been widely used for the brain disease (e.g., HE and MHE) analysis. However, those studies mainly focus on finding disease-related abnormal connectivity between brain regions, although a large number of studies have indicated that some brain diseases are usually related to local structure of brain connectivity network (i.e., subnetwork), rather than solely on some single brain regions or connectivities. Also, mining such disease-related subnetwork is a challenging task because of the complexity of brain network. To address this problem, we proposed a novel frequent-subnetwork-based method to mine disease-related subnetworks for MHE classification. Specifically, we first mine frequent subnetworks from both groups, i.e., MHE patients and non-HE (NHE) patients, respectively. Then we used the graph-kernel based method to select the most discriminative subnetworks for subsequent classification. We evaluate our proposed method on a MHE dataset with 77 cirrhosis patients, including 38 MHE patients and 39 NHE patients. The results demonstrate that our proposed method can not only obtain the improved classification performance in comparison with state-of-the-art network-based methods, but also identify disease-related subnetworks which can help us better understand the pathology of the brain diseases.

  8. Application and Exploration of Big Data Mining in Clinical Medicine

    PubMed Central

    Zhang, Yue; Guo, Shu-Li; Han, Li-Na; Li, Tie-Ling

    2016-01-01

    Objective: To review theories and technologies of big data mining and their application in clinical medicine. Data Sources: Literatures published in English or Chinese regarding theories and technologies of big data mining and the concrete applications of data mining technology in clinical medicine were obtained from PubMed and Chinese Hospital Knowledge Database from 1975 to 2015. Study Selection: Original articles regarding big data mining theory/technology and big data mining's application in the medical field were selected. Results: This review characterized the basic theories and technologies of big data mining including fuzzy theory, rough set theory, cloud theory, Dempster–Shafer theory, artificial neural network, genetic algorithm, inductive learning theory, Bayesian network, decision tree, pattern recognition, high-performance computing, and statistical analysis. The application of big data mining in clinical medicine was analyzed in the fields of disease risk assessment, clinical decision support, prediction of disease development, guidance of rational use of drugs, medical management, and evidence-based medicine. Conclusion: Big data mining has the potential to play an important role in clinical medicine. PMID:26960378

  9. Mining local climate data to assess spatiotemporal dengue fever epidemic patterns in French Guiana

    PubMed Central

    Flamand, Claude; Fabregue, Mickael; Bringay, Sandra; Ardillon, Vanessa; Quénel, Philippe; Desenclos, Jean-Claude; Teisseire, Maguelonne

    2014-01-01

    Objective: To identify local meteorological drivers of dengue fever in French Guiana, we applied an original data mining method to the available epidemiological and climatic data. Through this work, we also assessed the contribution of the data mining method to the understanding of factors associated with the dissemination of infectious diseases and their spatiotemporal spread. Methods: We applied contextual sequential pattern extraction techniques to epidemiological and meteorological data to identify the most significant climatic factors for dengue fever, and we investigated the relevance of the extracted patterns for the early warning of dengue outbreaks in French Guiana. Results: The maximum temperature, minimum relative humidity, global brilliance, and cumulative rainfall were identified as determinants of dengue outbreaks, and the precise intervals of their values and variations were quantified according to the epidemiologic context. The strongest significant correlations were observed between dengue incidence and meteorological drivers after a 4–6-week lag. Discussion: We demonstrated the use of contextual sequential patterns to better understand the determinants of the spatiotemporal spread of dengue fever in French Guiana. Future work should integrate additional variables and explore the notion of neighborhood for extracting sequential patterns. Conclusions: Dengue fever remains a major public health issue in French Guiana. The development of new methods to identify such specific characteristics becomes crucial in order to better understand and control spatiotemporal transmission. PMID:24549761

  10. Mining of Business-Oriented Conversations at a Call Center

    NASA Astrophysics Data System (ADS)

    Takeuchi, Hironori; Nasukawa, Tetsuya; Watanabe, Hideo

    Recently it has become feasible to transcribe textual records from telephone conversations at call centers by using automatic speech recognition. In this research, we extended a text mining system for call summary records and constructed a conversation mining system for the business-oriented conversations at the call center. To acquire useful business insights from the conversational data through the text mining system, it is critical to identify appropriate textual segments and expressions as the viewpoints to focus on. In the analysis of call summary data using a text mining system, some experts defined the viewpoints for the analysis by looking at some sample records and by preparing the dictionaries based on frequent keywords in the sample dataset. However with conversations it is difficult to identify such viewpoints manually and in advance because the target data consists of complete transcripts that are often lengthy and redundant. In this research, we defined a model of the business-oriented conversations and proposed a mining method to identify segments that have impacts on the outcomes of the conversations and can then extract useful expressions in each of these identified segments. In the experiment, we processed the real datasets from a car rental service center and constructed a mining system. With this system, we show the effectiveness of the method based on the defined conversation model.

  11. Clustering and Dimensionality Reduction to Discover Interesting Patterns in Binary Data

    NASA Astrophysics Data System (ADS)

    Palumbo, Francesco; D'Enza, Alfonso Iodice

    The attention towards binary data coding increased consistently in the last decade due to several reasons. The analysis of binary data characterizes several fields of application, such as market basket analysis, DNA microarray data, image mining, text mining and web-clickstream mining. The paper illustrates two different approaches exploiting a profitable combination of clustering and dimensionality reduction for the identification of non-trivial association structures in binary data. An application in the Association Rules framework supports the theory with the empirical evidence.

  12. Source Analysis of the Crandall Canyon, Utah, Mine Collapse

    DOE PAGES

    Dreger, D. S.; Ford, S. R.; Walter, W. R.

    2008-07-11

    Analysis of seismograms from a magnitude 3.9 seismic event on August 6, 2007 in central Utah reveals an anomalous radiation pattern that is contrary to that expected for a tectonic earthquake, and which is dominated by an implosive component. The results show the seismic event is best modeled as a shallow underground collapse. Interestingly, large transverse surface waves require a smaller additional non-collapse source component that represents either faulting in the rocks above the mine workings or deformation of the medium surrounding the mine.

  13. Geovisualization of Local and Regional Migration Using Web-mined Demographics

    NASA Astrophysics Data System (ADS)

    Schuermann, R. T.; Chow, T. E.

    2014-11-01

    The intent of this research was to augment and facilitate analyses that gauge the feasibility of web-mined demographics for studying the spatio-temporal dynamics of migration. As a case study, we explored the spatio-temporal dynamics of Vietnamese Americans (VA) in Texas through geovisualization of mined demographic microdata from the World Wide Web. Based on string matching across all demographic attributes, including full name, address, date of birth, age and phone number, multiple records of the same entity (i.e. person) over time were resolved and reconciled into a database. Migration trajectories were geovisualized through animated sprites by connecting the different addresses associated with the same person and segmenting the trajectory into small fragments. Intra-metropolitan migration patterns appeared at the local scale within many metropolitan areas. At the scale of metropolitan area, varying degrees of immigration and emigration manifest different types of migration clusters. This paper presents a methodology incorporating GIS methods and cartographic design to produce geovisualization animation, enabling the cognitive identification of migration patterns at multiple scales. Identification of spatio-temporal patterns often stimulates further research to better understand the phenomenon and enhance subsequent modeling.

  14. Study of application of ERTS-A imagery to fracture related mine safety hazards in the coal mining industry

    NASA Technical Reports Server (NTRS)

    Wier, C. E.; Wobber, F. J. (Principal Investigator); Russell, O. R.; Amato, R. V.

    1973-01-01

    The author has identified the following significant results. The 70mm black and white infrared photography acquired in March 1973 at an approximate scale of 1:115,000 permits the identification of areas of mine subsidence not readily evident on other films. This is largely due to the high contrast rendition of water and land by this film and the excessive surface moisture conditions prevalent in the area at the time of photography. Subsided areas consist of shallow depressions which have impounded water. Patterns with a regularity indicative of the room and pillar configuration used in subsurface coal mining are evident.

  15. Kinetics of bed fracturing around mine workings

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Veksler, Yu.A.

    1988-03-01

    A failure of the bed near the walls of the workings of a mine away from the face occurs gradually over time and in this paper the authors take a kinetic approach to evaluating its development. The influence of certain mine engineering factors on the pattern of bed fracturing is discussed. The effect of the depth of mining is shown. Cracking occurs in the portion of the seam at the face near the ground at some distance from it on the interface between soft and hard coal. The density of the fractured rocks and their response affect the bed fracturing near the stope face.

  16. RANWAR: rank-based weighted association rule mining from gene expression and methylation data.

    PubMed

    Mallik, Saurav; Mukhopadhyay, Anirban; Maulik, Ujjwal

    2015-01-01

    Ranking of association rules is currently an interesting topic in data mining and bioinformatics. The huge number of rules over items (or genes) produced by association rule mining (ARM) algorithms confuses the decision maker. In this article, we propose a weighted rule-mining technique (RANWAR, or rank-based weighted association rule mining) that ranks the rules using two novel rule-interestingness measures, viz., rank-based weighted condensed support (wcs) and weighted condensed confidence (wcc), to bypass the problem. These measures depend on the rank of the items (genes). Using the rank, we assign a weight to each item. RANWAR generates far fewer frequent itemsets than the state-of-the-art association rule mining algorithms and thus saves execution time. We run RANWAR on gene expression and methylation datasets. The genes of the top rules are biologically validated by Gene Ontologies (GOs) and KEGG pathway analyses. Many top-ranked rules extracted by RANWAR that hold poor ranks in traditional Apriori are highly biologically significant to the related diseases. Finally, the top rules produced by RANWAR that are not in Apriori are reported.

  17. A systematic review of data mining and machine learning for air pollution epidemiology.

    PubMed

    Bellinger, Colin; Mohomed Jabbar, Mohomed Shazan; Zaïane, Osmar; Osornio-Vargas, Alvaro

    2017-11-28

    Data measuring airborne pollutants, public health and environmental factors are increasingly being stored and merged. These big datasets offer great potential, but also challenge traditional epidemiological methods. This has motivated the exploration of alternative methods to make predictions, find patterns and extract information. To this end, data mining and machine learning algorithms are increasingly being applied to air pollution epidemiology. We conducted a systematic literature review on the application of data mining and machine learning methods in air pollution epidemiology. We carried out our search process in PubMed, the MEDLINE database and Google Scholar. Research articles applying data mining and machine learning methods to air pollution epidemiology were queried and reviewed. Our search queries resulted in 400 research articles. Our fine-grained analysis employed our inclusion/exclusion criteria to reduce the results to 47 articles, which we separate into three primary areas of interest: 1) source apportionment; 2) forecasting/prediction of air pollution/quality or exposure; and 3) generating hypotheses. Early applications had a preference for artificial neural networks. In more recent work, decision trees, support vector machines, k-means clustering and the APRIORI algorithm have been widely applied. Our survey shows that the majority of the research has been conducted in Europe, China and the USA, and that data mining is becoming an increasingly common tool in environmental health. For potential new directions, we have identified that deep learning and geo-spatial pattern mining are two burgeoning areas of data mining that have good potential for future applications in air pollution epidemiology. We carried out a systematic review identifying the current trends, challenges and new directions to explore in the application of data mining methods to air pollution epidemiology. This work shows that data mining is increasingly being applied in air pollution epidemiology. The potential to support air pollution epidemiology continues to grow with advancements in data mining related to temporal and geo-spatial mining, and deep learning. This is further supported by new sensors and storage mediums that enable larger, better quality data. This suggests that many more fruitful applications can be expected in the future.

  18. Student Consistency and Implications for Feedback in Online Assessment Systems

    ERIC Educational Resources Information Center

    Madhyastha, Tara M.; Tanimoto, Steven

    2009-01-01

    Most of the emphasis on mining online assessment logs has been to identify content-specific errors. However, the pattern of general "consistency" is domain independent, strongly related to performance, and can itself be a target of educational data mining. We demonstrate that simple consistency indicators are related to student outcomes,…

  19. Using Syntactic Patterns to Enhance Text Analytics

    ERIC Educational Resources Information Center

    Meyer, Bradley B.

    2017-01-01

    Large scale product and service reviews proliferate and are commonly found across the web. The ability to harvest, digest and analyze a large corpus of reviews from online websites is still however a difficult problem. This problem is referred to as "opinion mining." Opinion mining is an important area of research as advances in the…

  20. Data mining application in customer relationship management for hospital inpatients.

    PubMed

    Lee, Eun Whan

    2012-09-01

    This study aims to discover patients loyal to a hospital and model their medical service usage patterns. Consequently, this study proposes a data mining application in customer relationship management (CRM) for hospital inpatients. A recency, frequency, monetary (RFM) model has been applied toward 14,072 patients discharged from a university hospital. Cluster analysis was conducted to segment customers, and it modeled the patterns of the loyal customers' medical services usage via a decision tree. Patients were divided into two groups according to the variables of the RFM model and the group which had significantly high frequency of medical use and expenses was defined as loyal customers, a target market. As a result of the decision tree, the predictable factors of the loyal clients were: length of stay, certainty of selectable treatment, surgery, number of accompanying treatments, kind of patient room, and department from which they were discharged. Particularly, this research showed that when a patient within the internal medicine department who did not have surgery stayed for more than 13.5 days, their probability of being classified as a loyal customer was 70.0%. To discover a hospital's loyal patients and model their medical usage patterns, the application of data-mining has been suggested. This paper suggests practical use of combining segmentation, targeting, positioning (STP) strategy and the RFM model with data-mining in CRM.
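
    As a rough illustration of the recency, frequency, monetary (RFM) variables named in this abstract, the sketch below derives them from a list of discharge records. The record layout (patient id, discharge date, charge), the function name `rfm_table`, and the reference date are hypothetical; the abstract does not describe how the authors computed or scaled these variables before clustering.

```python
from datetime import date


def rfm_table(visits, as_of):
    """Summarise visits into per-patient RFM values.

    visits: iterable of (patient_id, discharge_date, charge) tuples
            (an assumed layout, for illustration only).
    as_of:  reference date used to measure recency.
    """
    acc = {}
    for pid, day, charge in visits:
        rec = acc.setdefault(pid, {"last": day, "frequency": 0, "monetary": 0.0})
        rec["last"] = max(rec["last"], day)   # most recent discharge
        rec["frequency"] += 1                 # number of admissions
        rec["monetary"] += charge             # total expenses
    return {
        pid: {
            "recency_days": (as_of - rec["last"]).days,
            "frequency": rec["frequency"],
            "monetary": rec["monetary"],
        }
        for pid, rec in acc.items()
    }


# Made-up records for two patients.
visits = [
    ("p1", date(2011, 1, 10), 100.0),
    ("p1", date(2011, 3, 1), 50.0),
    ("p2", date(2010, 12, 1), 20.0),
]
for pid, row in rfm_table(visits, as_of=date(2011, 3, 31)).items():
    print(pid, row)
```

    A table of this shape is what a clustering step would take as input; the segmentation and decision-tree stages described in the abstract are not reproduced here.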

  1. Data Mining Application in Customer Relationship Management for Hospital Inpatients

    PubMed Central

    2012-01-01

    Objectives: This study aims to discover patients loyal to a hospital and model their medical service usage patterns. Consequently, this study proposes a data mining application in customer relationship management (CRM) for hospital inpatients. Methods: A recency, frequency, monetary (RFM) model has been applied toward 14,072 patients discharged from a university hospital. Cluster analysis was conducted to segment customers, and it modeled the patterns of the loyal customers' medical services usage via a decision tree. Results: Patients were divided into two groups according to the variables of the RFM model and the group which had significantly high frequency of medical use and expenses was defined as loyal customers, a target market. As a result of the decision tree, the predictable factors of the loyal clients were: length of stay, certainty of selectable treatment, surgery, number of accompanying treatments, kind of patient room, and department from which they were discharged. Particularly, this research showed that when a patient within the internal medicine department who did not have surgery stayed for more than 13.5 days, their probability of being classified as a loyal customer was 70.0%. Conclusions: To discover a hospital's loyal patients and model their medical usage patterns, the application of data-mining has been suggested. This paper suggests practical use of combining segmentation, targeting, positioning (STP) strategy and the RFM model with data-mining in CRM. PMID:23115740

  2. The taxonomy statistic uncovers novel clinical patterns in a population of ischemic stroke patients.

    PubMed

    Tukiendorf, Andrzej; Kaźmierski, Radosław; Michalak, Sławomir

    2013-01-01

    In this paper, we describe a simple taxonomic approach for clinical data mining elaborated by Marczewski and Steinhaus (M-S), whose performance equals the advanced statistical methodology known as the expectation-maximization (E-M) algorithm. We tested these two methods on a cohort of ischemic stroke patients. The comparison of both methods revealed strong agreement. Direct agreement between M-S and E-M classifications reached 83%, while Cohen's coefficient of agreement was κ = 0.766 (P < 0.0001). The statistical analysis conducted and the outcomes obtained in this paper revealed novel clinical patterns in ischemic stroke patients. The aim of the study was to evaluate the clinical usefulness of Marczewski-Steinhaus' taxonomic approach as a tool for the detection of novel patterns of data in ischemic stroke patients and the prediction of disease outcome. In terms of the identification of fairly frequent types of stroke patients using their age, National Institutes of Health Stroke Scale (NIHSS), and diabetes mellitus (DM) status, when dealing with rough characteristics of patients, four particular types of patients are recognized, which cannot be identified by means of routine clinical methods. Following the obtained taxonomical outcomes, the strong correlation between the health status at moment of admission to emergency department (ED) and the subsequent recovery of patients is established. Moreover, popularization and simplification of the ideas of advanced mathematicians may provide an unconventional explorative platform for clinical problems.
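
    Both agreement figures reported in this abstract (direct agreement and Cohen's κ) compare two label assignments over the same patients. The sketch below shows the standard κ calculation, κ = (p_o − p_e) / (1 − p_e), on made-up labels; the function name `cohens_kappa` and the toy data are illustrative assumptions and do not reproduce the study's 83% or 0.766 values.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of class labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(labels_a)
    # Observed (direct) agreement: share of cases given the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each method's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both methods put every case in the same single class
    return (p_o - p_e) / (1.0 - p_e)


# Toy example: two hypothetical classifications of four patients.
method_1 = [1, 1, 2, 2]
method_2 = [1, 1, 2, 1]
print(cohens_kappa(method_1, method_2))
```

    Here three of four cases agree (p_o = 0.75) and chance agreement is 0.5, so κ is lower than the raw agreement, which is the same relationship seen between the two figures quoted in the abstract.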

  3. High contents of rare earth elements (REEs) in stream waters of a Cu-Pb-Zn mining area.

    PubMed

    Protano, G; Riccobono, F

    2002-01-01

    Stream waters draining an old mining area present very high rare earth element (REE) contents, reaching 928 μg/L as the maximum total value (ΣREE). The middle rare earth elements (MREEs) are usually enriched with respect to both the light (LREEs) and heavy (HREEs) elements of this group, producing a characteristic "roof-shaped" pattern of the Post-Archean Australian Shale-normalized concentrations. At the Fenice Capanne Mine (FCM), the most important base metal mine of the study area, the REE source coincides with the mine tailings, mostly the oldest ones composed of iron-rich materials. The geochemical history of the REEs released into Noni stream from wastes in the FCM area is strictly determined by the pH, which controls the REE speciation and in-stream processes. The formation of Al-rich and mainly Fe-rich flocs effectively scavenges the REEs, which are readily and drastically removed from the solution when the pH approaches neutrality. Leaching experiments performed on flocs and waste materials demonstrate that Fe-oxides/oxyhydroxides play a key role in the release of lanthanide elements into stream waters. The origin of the "roof-shaped" REE distribution pattern as well as the peculiar geochemical behavior of some lanthanide elements in the aqueous system are discussed.

  4. A novel water quality data analysis framework based on time-series data mining.

    PubMed

    Deng, Weihui; Wang, Guoyin

    2017-07-01

    The rapid development of time-series data mining provides an emerging method for water resource management research. In this paper, based on the time-series data mining methodology, we propose a novel and general analysis framework for water quality time-series data. It consists of two parts: implementation components and common tasks of time-series data mining in water quality data. In the first part, we propose to granulate the time series into several two-dimensional normal clouds and calculate the similarities in the granulated level. On the basis of the similarity matrix, the similarity search, anomaly detection, and pattern discovery tasks in the water quality time-series instance dataset can be easily implemented in the second part. We present a case study of this analysis framework on weekly dissolved oxygen (DO) time-series data collected from five monitoring stations on the upper reaches of the Yangtze River, China. It discovered the relationship of water quality in the mainstream and tributary as well as the main changing patterns of DO. The experimental results show that the proposed analysis framework is a feasible and efficient method to mine the hidden and valuable knowledge from water quality historical time-series data. Copyright © 2017 Elsevier Ltd. All rights reserved.

  5. A systems approach to accident causation in mining: an application of the HFACS method.

    PubMed

    Lenné, Michael G; Salmon, Paul M; Liu, Charles C; Trotter, Margaret

    2012-09-01

    This project aimed to provide a greater understanding of the systemic factors involved in mining accidents, and to examine those organisational and supervisory failures that are predictive of sub-standard performance at operator level. A sample of 263 significant mining incidents in Australia across 2007-2008 were analysed using the Human Factors Analysis and Classification System (HFACS). Two human factors specialists independently undertook the analysis. Incidents occurred more frequently in operations concerning the use of surface mobile equipment (38%) and working at heights (21%), however injury was more frequently associated with electrical operations and vehicles and machinery. Several HFACS categories appeared frequently: skill-based errors (64%) and violations (57%), issues with the physical environment (56%), and organisational processes (65%). Focussing on the overall system, several factors were found to predict the presence of failures in other parts of the system, including planned inappropriate operations and team resource management; inadequate supervision and team resource management; and organisational climate and inadequate supervision. It is recommended that these associations deserve greater attention in future attempts to develop accident countermeasures, although other significant associations should not be ignored. In accordance with findings from previous HFACS-based analyses of aviation and medical incidents, efforts to reduce the frequency of unsafe acts or operations should be directed to a few critical HFACS categories at the higher levels: organisational climate, planned inadequate operations, and inadequate supervision. While remedial strategies are proposed it is important that future efforts evaluate the utility of the measures proposed in studies of system safety. Copyright © 2011. Published by Elsevier Ltd.

  6. A Graph Approach to Mining Biological Patterns in the Binding Interfaces.

    PubMed

    Cheng, Wen; Yan, Changhui

    2017-01-01

    Protein-RNA interactions play important roles in the biological systems. Searching for regular patterns in the Protein-RNA binding interfaces is important for understanding how protein and RNA recognize each other and bind to form a complex. Herein, we present a graph-mining method for discovering biological patterns in the protein-RNA interfaces. We represented known protein-RNA interfaces using graphs and then discovered graph patterns enriched in the interfaces. Comparison of the discovered graph patterns with UniProt annotations showed that the graph patterns had a significant overlap with residue sites that had been proven crucial for the RNA binding by experimental methods. Using 200 patterns as input features, a support vector machine method was able to classify protein surface patches into RNA-binding sites and non-RNA-binding sites with 84.0% accuracy and 88.9% precision. We built a simple scoring function that calculated the total number of the graph patterns that occurred in a protein-RNA interface. That scoring function was able to discriminate near-native protein-RNA complexes from docking decoys with a performance comparable with that of a state-of-the-art complex scoring function. Our work also revealed possible patterns that might be important for binding affinity.

  7. Temporal variation and the effect of rainfall on metals flux from the historic Beatson mine, Prince William Sound, Alaska, USA

    USGS Publications Warehouse

    Stillings, L.L.; Foster, A.L.; Koski, R.A.; Munk, L.; Shanks, Wayne C.

    2008-01-01

    Several abandoned Cu mines are located along the shore of Prince William Sound, AK, where the effect of mining-related discharge upon shoreline ecosystems is unknown. To determine the magnitude of this effect at the former Beatson mine, the largest Cu mine in the region and a Besshi-type massive sulfide ore deposit, trace metal concentration and flux were measured in surface run-off from remnant, mineralized workings and waste. Samples were collected from seepage waters; a remnant glory hole which is now a pit lake; a braided stream draining an area of mineralized rock, underground mine workings, and waste piles; and a background location upstream of the mine workings and mineralized rock. In the background stream pH averaged ~7.3, specific conductivity (SC) was ~40 µS/cm, and the aqueous components indicative of sulfide mineral weathering, SO4 and trace metals, were at detection limits or lower. In the braided stream below the mine workings and waste piles, pH usually varied from 6.7 to 7.1, SC varied from 40 to 120 µS/cm, SO4 had maximum concentrations of 32 mg/L, and the trace metals Cu, Ni, Pb, and Zn showed maximum total acid extractable concentrations of 186, 5.9, 6.2 and 343 µg/L, respectively. With an annual rainfall of ~340 cm (estimated from the 2006 water year) it was expected that rain water would have a large effect on the chemistry of the braided stream draining the mine site. A linear mixing model with two end members, seepage water from mineralized rock and background water, estimated that the braided stream contained 10-35% mine drainage. After rain events the braided stream showed a decrease in pH, SC, Ca + Mg, SO4, and alkalinity, due to dilution. The trace metals Ni and Zn followed this same pattern. Sodium + K and Cl did not vary between the background and braided stream, nor did they vary with rainfall. At approximately 2 and 3 mg/L, respectively, these concentrations are similar to concentrations found in rainfall on the coasts of North America. High concentrations of total acid extractable Al and Fe were found at near-neutral pH in most of the waters collected at the site. Equilibrium solubility simulations, performed with PHREEQC, show that the stream waters are saturated with respect to Al, Fe and SiO2 solid phases. Because the "dissolved" sample fractions (acid preserved and filtered to 0.45 µm) show significant concentrations of Al and Fe it is presumed that these are present as colloids. The relationship between concentrations of Al and Fe, and rainfall was the opposite of that observed for the major ions Ca + Mg, SO4, and alkalinity, in that Al and Fe concentrations increased with increasing rainfall. Concentrations of Cu and Pb followed the same pattern. Adsorption calculations were performed with Visual MINTEQ, using the diffuse double layer electrostatic model and surface complexation constants for the ferrihydrite surface. These results suggest that 30-93% of Cu and 58-97% of Pb was adsorbed to ferrihydrite precipitates in the stream waters. Ni and Zn showed little adsorption in this pH range. Flux calculations show that the total mass of trace metals transported from the mine site, during the 60 day study period, was ranked as Zn (196 kg) > Cu (87 kg) > Pb (1.9 kg) ≈ Ni (1.9 kg). Nickel and Zn were transported mostly as dissolved species while Cu and Pb were transported mostly as adsorbed species. pH control on adsorption was evident when Cu and Pb isotherms were normalized by ferrihydrite flux. Decreased stream water pH due to periods of frequent and high volume rain events would cause desorption of Cu and Pb from the ferrihydrite surface, thus changing not only their speciation in solution but also their mechanism of transport. © 2007 Elsevier Ltd. All rights reserved.
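
    The two-end-member linear mixing model mentioned in this abstract can be illustrated with a minimal sketch. It assumes a single conservative tracer (for example sulfate) whose concentration in the stream is a linear blend of the seepage and background end members. The function name and all concentrations below are hypothetical and are not values reported by the study.

```python
def mine_drainage_fraction(c_stream, c_background, c_seepage):
    """Return the fraction of seepage water in a two-end-member mixture.

    Solves c_stream = f * c_seepage + (1 - f) * c_background for f.
    """
    if c_seepage == c_background:
        raise ValueError("end members must differ for the chosen tracer")
    return (c_stream - c_background) / (c_seepage - c_background)


# Hypothetical tracer concentrations in mg/L (illustrative only).
fraction = mine_drainage_fraction(c_stream=10.0, c_background=2.0, c_seepage=42.0)
print(f"{fraction:.0%}")
```

    A fraction outside the 0 to 1 range would suggest that the tracer is not conservative or that a third source contributes to the stream.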

  8. Statistically significant relational data mining

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berry, Jonathan W.; Leung, Vitus Joseph; Phillips, Cynthia Ann

    This report summarizes the work performed under the project "Statistically significant relational data mining." The goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concentrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second are statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor these models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.

  9. Modeling of the Nano- and Picoseismicity Rate Changes Resulting from Static Stress Triggering due to Small (MW2.2) Event Recorded at Mponeng Deep Gold Mine, South Africa

    NASA Astrophysics Data System (ADS)

    Kozlowska, M.; Orlecka-Sikora, B.; Kwiatek, G.; Boettcher, M. S.; Dresen, G. H.

    2014-12-01

    Static stress changes following large earthquakes are known to affect the rate and spatio-temporal distribution of the aftershocks. Here we utilize a unique dataset of M ≥ -3.4 earthquakes following a MW 2.2 earthquake in Mponeng gold mine, South Africa, to investigate this process for nano- and pico- scale seismicity at centimeter length scales in shallow, mining conditions. The aftershock sequence was recorded during a quiet interval in the mine and thus enabled us to perform the analysis using Dieterich's (1994) rate and state dependent friction law. The formulation for earthquake productivity requires estimation of Coulomb stress changes due to the mainshock, the reference seismicity rate, frictional resistance parameter, and the duration of aftershock relaxation time. We divided the area into six depth intervals and for each we estimated the parameters and modeled the spatio-temporal patterns of seismicity rates after the stress perturbation. Comparing the modeled patterns of seismicity with the observed distribution we found that while the spatial patterns match well, the rate of modeled aftershocks is lower than the observed rate. To test our model, we used four metrics of the goodness-of-fit evaluation. The testing procedure allowed the null hypothesis of no significant difference between seismicity rates to be rejected only for the one depth interval containing the mainshock; for the others, no significant differences were found. Results show that mining-induced earthquakes may be followed by a stress relaxation expressed through aftershocks located on the rupture plane and in regions of positive Coulomb stress change. Furthermore, we demonstrate that the main features of the temporal and spatial distribution of very small, mining-induced earthquakes at shallow depths can be successfully determined using rate- and state-based stress modeling.

  10. Evaluation of the environmental contamination at an abandoned mining site using multivariate statistical techniques--the Rodalquilar (Southern Spain) mining district.

    PubMed

    Bagur, M G; Morales, S; López-Chicano, M

    2009-11-15

    Unsupervised and supervised pattern recognition techniques such as hierarchical cluster analysis, principal component analysis, factor analysis and linear discriminant analysis have been applied to water samples collected in the Rodalquilar mining district (Southern Spain) in order to identify different sources of environmental pollution caused by the abandoned mining industry. The effect of the mining activity on waters was monitored by determining the concentration of eleven elements (Mn, Ba, Co, Cu, Zn, As, Cd, Sb, Hg, Au and Pb) by inductively coupled plasma mass spectrometry (ICP-MS). The Box-Cox transformation has been used to bring the data set closer to a normal distribution, since the geochemical data are not normally distributed. The environmental impact is affected mainly by the mining activity developed in the zone, the acid drainage and finally by the chemical treatment used for the beneficiation of gold.
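
    The Box-Cox transformation named in this abstract has a standard one-parameter form, sketched below for positive data. The abstract does not state how the parameter was chosen, so the value of `lam` and the sample concentrations here are purely illustrative assumptions.

```python
import math


def box_cox(values, lam):
    """Apply the one-parameter Box-Cox transform to strictly positive values.

    y = (x**lam - 1) / lam  for lam != 0
    y = ln(x)               for lam == 0
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


# Hypothetical, right-skewed concentrations (illustrative only).
concentrations = [1.0, 4.0, 9.0, 100.0]
transformed = box_cox(concentrations, lam=0.5)
print(transformed)
```

    In practice the parameter is usually estimated from the data, for example by maximum likelihood, before multivariate methods such as principal component analysis are applied.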

  11. A text-based data mining and toxicity prediction modeling system for a clinical decision support in radiation oncology: A preliminary study

    NASA Astrophysics Data System (ADS)

    Kim, Kwang Hyeon; Lee, Suk; Shim, Jang Bo; Chang, Kyung Hwan; Yang, Dae Sik; Yoon, Won Sup; Park, Young Je; Kim, Chul Yong; Cao, Yuan Jie

    2017-08-01

    The aim of this study is an integrated research for text-based data mining and toxicity prediction modeling system for clinical decision support system based on big data in radiation oncology as a preliminary research. The structured and unstructured data were prepared by treatment plans and the unstructured data were extracted by dose-volume data image pattern recognition of prostate cancer for research articles crawling through the internet. We modeled an artificial neural network to build a predictor model system for toxicity prediction of organs at risk. We used a text-based data mining approach to build the artificial neural network model for bladder and rectum complication predictions. The pattern recognition method was used to mine the unstructured toxicity data for dose-volume at the detection accuracy of 97.9%. The confusion matrix and training model of the neural network were achieved with 50 modeled plans (n = 50) for validation. The toxicity level was analyzed and the risk factors for 25% bladder, 50% bladder, 20% rectum, and 50% rectum were calculated by the artificial neural network algorithm. As a result, 32 plans could cause complication but 18 plans were designed as non-complication among 50 modeled plans. We integrated data mining and a toxicity modeling method for toxicity prediction using prostate cancer cases. It is shown that a preprocessing analysis using text-based data mining and prediction modeling can be expanded to personalized patient treatment decision support based on big data.

  12. Zoning method for environmental engineering geological patterns in underground coal mining areas.

    PubMed

    Liu, Shiliang; Li, Wenping; Wang, Qiqing

    2018-09-01

    Environmental engineering geological patterns (EEGPs) are used to express the trend and intensity of eco-geological environment caused by mining in underground coal mining areas, a complex process controlled by multiple factors. A new zoning method for EEGPs was developed based on the variable-weight theory (VWT), where the weights of factors vary with their value. The method was applied to the Yushenfu mining area, Shaanxi, China. First, the mechanism of the EEGPs caused by mining was elucidated, and four types of EEGPs were proposed. Subsequently, 13 key control factors were selected from mining conditions, lithosphere, hydrosphere, ecosphere, and climatic conditions; their thematic maps were constructed using ArcGIS software and remote-sensing technologies. Then, a stimulation-punishment variable-weight model derived from the partition of basic evaluation unit of study area, construction of partition state-variable-weight vector, and determination of variable-weight interval was built to calculate the variable weights of each factor. On this basis, a zoning mathematical model of EEGPs was established, and the zoning results were analyzed. For comparison, the traditional constant-weight theory (CWT) was also applied to divide the EEGPs. Finally, the zoning results obtained using VWT and CWT were compared. The verification of field investigation indicates that VWT is more accurate and reliable than CWT. The zoning results are consistent with the actual situations and the key of planning design for the rational development of coal resources and protection of eco-geological environment. Copyright © 2018 Elsevier B.V. All rights reserved.

  13. Spatiotemporal analysis of changes in lode mining claims around the McDermitt Caldera, northern Nevada and southern Oregon

    USGS Publications Warehouse

    Coyan, Joshua; Zientek, Michael L.; Mihalasky, Mark J.

    2017-01-01

    Resource managers and agencies involved with planning for future federal land needs are required to complete an assessment of and forecast for future land use every ten years. Predicting mining activities on federal lands is difficult as current regulations do not require disclosure of exploration results. In these cases, historic mining claims may serve as a useful proxy for determining where mining-related activities may occur. We assess the utility of using a space–time cube (STC) and associated analyses to evaluate and characterize mining claim activities around the McDermitt Caldera in northern Nevada and southern Oregon. The most significant advantage of arranging the mining claim data into a STC is the ability to visualize and compare the data, which allows scientists to better understand patterns and results. Additional analyses of the STC (i.e., Trend, Emerging Hot Spot, Hot Spot, and Cluster and Outlier Analyses) provide extra insights into the data and may aid in predicting future mining claim activities.

  14. Geological survey of Maryland using EREP flight data. [mining, mapping, Chesapeake Bay islands, coastal water features]

    NASA Technical Reports Server (NTRS)

    Weaver, K. N. (Principal Investigator)

    1973-01-01

    The author has identified the following significant results. Underflight photography has been used in the Baltimore County mined land inventory to determine areas of disturbed land where surface mining of sand and ground clay, or stone has taken place. Both active and abandoned pits and quarries were located. Aircraft data has been used to update cultural features of Calvert, Caroline, St. Mary's, Somerset, Talbot, and Wicomico Counties. Islands have been located and catalogued for comparison with older film and map data for erosion data. Strip mined areas are being mapped to obtain total area disturbed to aid in future mining and reclamation problems. Coastal estuarine and Atlantic Coast features are being studied to determine nearshore bedforms, sedimentary, and erosional patterns, and manmade influence on natural systems.

  15. Mining sequential patterns for protein fold recognition.

    PubMed

    Exarchos, Themis P; Papaloukas, Costas; Lampros, Christos; Fotiadis, Dimitrios I

    2008-02-01

    Protein data contain discriminative patterns that can be used in many beneficial applications if they are defined correctly. In this work sequential pattern mining (SPM) is utilized for sequence-based fold recognition. Protein classification in terms of fold recognition plays an important role in computational protein analysis, since it can contribute to the determination of the function of a protein whose structure is unknown. Specifically, one of the most efficient SPM algorithms, cSPADE, is employed for the analysis of protein sequence. A classifier uses the extracted sequential patterns to classify proteins in the appropriate fold category. For training and evaluating the proposed method we used the protein sequences from the Protein Data Bank and the annotation of the SCOP database. The method exhibited an overall accuracy of 25% in a classification problem with 36 candidate categories. The classification performance reaches up to 56% when the five most probable protein folds are considered.
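
    The core idea behind sequential pattern mining, counting how many sequences contain a pattern as an ordered subsequence, can be sketched as follows. This is not cSPADE, which adds constraints and an efficient vertical search; it is only a naive support count, and the sequences and the pattern are hypothetical.

```python
def contains_subsequence(sequence, pattern):
    """Return True if pattern occurs in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    # Each membership test consumes the iterator up to the matched symbol,
    # so later symbols must appear after earlier ones.
    return all(symbol in remaining for symbol in pattern)


def support(sequences, pattern):
    """Fraction of sequences that contain pattern as a subsequence."""
    hits = sum(contains_subsequence(s, pattern) for s in sequences)
    return hits / len(sequences)


# Hypothetical amino-acid strings (illustrative only).
sequences = ["MKVLA", "MAVKL", "KVLMA"]
print(support(sequences, "KVL"))
```

    A pattern is called frequent when its support reaches a user-chosen threshold, and frequent patterns can then serve as features for a classifier, as the abstract describes.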

  16. Synoptic sampling and principal components analysis to identify sources of water and metals to an acid mine drainage stream.

    PubMed

    Byrne, Patrick; Runkel, Robert L; Walton-Day, Katherine

    2017-07-01

    Combining the synoptic mass balance approach with principal components analysis (PCA) can be an effective method for discretising the chemistry of inflows and source areas in watersheds where contamination is diffuse in nature and/or complicated by groundwater interactions. This paper presents a field-scale study in which synoptic sampling and PCA are employed in a mineralized watershed (Lion Creek, Colorado, USA) under low flow conditions to (i) quantify the impacts of mining activity on stream water quality; (ii) quantify the spatial pattern of constituent loading; and (iii) identify inflow sources most responsible for observed changes in stream chemistry and constituent loading. Several of the constituents investigated (Al, Cd, Cu, Fe, Mn, Zn) fail to meet chronic aquatic life standards along most of the study reach. The spatial pattern of constituent loading suggests four primary sources of contamination under low flow conditions. Three of these sources are associated with acidic (pH <3.1) seeps that enter along the left bank of Lion Creek. Investigation of inflow water (trace metal and major ion) chemistry using PCA suggests a hydraulic connection between many of the left bank inflows and mine water in the Minnesota Mine shaft located to the north-east of the river channel. In addition, water chemistry data during a rainfall-runoff event suggests the spatial pattern of constituent loading may be modified during rainfall due to dissolution of efflorescent salts or erosion of streamside tailings. These data point to the complexity of contaminant mobilisation processes and constituent loading in mining-affected watersheds but the combined synoptic sampling and PCA approach enables a conceptual model of contaminant dynamics to be developed to inform remediation.

  17. Synoptic sampling and principal components analysis to identify sources of water and metals to an acid mine drainage stream

    USGS Publications Warehouse

    Byrne, Patrick; Runkel, Robert L.; Walton-Day, Katie

    2017-01-01

    Combining the synoptic mass balance approach with principal components analysis (PCA) can be an effective method for discretising the chemistry of inflows and source areas in watersheds where contamination is diffuse in nature and/or complicated by groundwater interactions. This paper presents a field-scale study in which synoptic sampling and PCA are employed in a mineralized watershed (Lion Creek, Colorado, USA) under low flow conditions to (i) quantify the impacts of mining activity on stream water quality; (ii) quantify the spatial pattern of constituent loading; and (iii) identify inflow sources most responsible for observed changes in stream chemistry and constituent loading. Several of the constituents investigated (Al, Cd, Cu, Fe, Mn, Zn) fail to meet chronic aquatic life standards along most of the study reach. The spatial pattern of constituent loading suggests four primary sources of contamination under low flow conditions. Three of these sources are associated with acidic (pH <3.1) seeps that enter along the left bank of Lion Creek. Investigation of inflow water (trace metal and major ion) chemistry using PCA suggests a hydraulic connection between many of the left bank inflows and mine water in the Minnesota Mine shaft located to the north-east of the river channel. In addition, water chemistry data during a rainfall-runoff event suggests the spatial pattern of constituent loading may be modified during rainfall due to dissolution of efflorescent salts or erosion of streamside tailings. These data point to the complexity of contaminant mobilisation processes and constituent loading in mining-affected watersheds but the combined synoptic sampling and PCA approach enables a conceptual model of contaminant dynamics to be developed to inform remediation.

  18. Comparsion analysis of data mining models applied to clinical research in traditional Chinese medicine.

    PubMed

    Zhao, Yufeng; Xie, Qi; He, Liyun; Liu, Baoyan; Li, Kun; Zhang, Xiang; Bai, Wenjing; Luo, Lin; Jing, Xianghong; Huo, Ruili

    2014-10-01

    To help researchers select appropriate data mining models to provide better evidence for the clinical practice of Traditional Chinese Medicine (TCM) diagnosis and therapy. Clinical issues based on data mining models were comprehensively summarized from four significant elements of the clinical studies: symptoms, symptom patterns, herbs, and efficacy. Existing problems were further generalized to determine the factors relevant to the performance of data mining models, e.g. data type, samples, parameters, variable labels. Combining these relevant factors, the TCM clinical data features were compared with regard to statistical characteristics and informatics properties. Data models were compared simultaneously in terms of applicable conditions and suitable scopes. The main application problems were inconsistent data types and small samples for the data mining models used, which led to inappropriate or even erroneous results. These features, i.e. advantages, disadvantages, suitable data types, tasks of data mining, and the TCM issues, were summarized and compared. By considering the special features of different data mining models, clinical doctors can select suitable data mining models to resolve TCM problems.

  19. The Development of Novel Chemical Fragment-Based Descriptors Using Frequent Common Subgraph Mining Approach and Their Application in QSAR Modeling.

    PubMed

    Khashan, Raed; Zheng, Weifan; Tropsha, Alexander

    2014-03-01

    We present a novel approach to generating fragment-based molecular descriptors. The molecules are represented by labeled undirected chemical graph. Fast Frequent Subgraph Mining (FFSM) is used to find chemical-fragments (subgraphs) that occur in at least a subset of all molecules in a dataset. The collection of frequent subgraphs (FSG) forms a dataset-specific descriptors whose values for each molecule are defined by the number of times each frequent fragment occurs in this molecule. We have employed the FSG descriptors to develop variable selection k Nearest Neighbor (kNN) QSAR models of several datasets with binary target property including Maximum Recommended Therapeutic Dose (MRTD), Salmonella Mutagenicity (Ames Genotoxicity), and P-Glycoprotein (PGP) data. Each dataset was divided into training, test, and validation sets to establish the statistical figures of merit reflecting the model validated predictive power. The classification accuracies of models for both training and test sets for all datasets exceeded 75 %, and the accuracy for the external validation sets exceeded 72 %. The model accuracies were comparable or better than those reported earlier in the literature for the same datasets. Furthermore, the use of fragment-based descriptors affords mechanistic interpretation of validated QSAR models in terms of essential chemical fragments responsible for the compounds' target property. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  20. Mining Interactions in Immersive Learning Environments for Real-Time Student Feedback

    ERIC Educational Resources Information Center

    Kennedy, Gregor; Ioannou, Ioanna; Zhou, Yun; Bailey, James; O'Leary, Stephen

    2013-01-01

    The analysis and use of data generated by students' interactions with learning systems or programs--learning analytics--has recently gained widespread attention in the educational technology community. Part of the reason for this interest is based on the potential of learning analytic techniques such as data mining to find hidden patterns in…

  1. Mineral resource of the month: barite

    USGS Publications Warehouse

    Miller, M. Michael

    2006-01-01

    Also called barytes, barite forms in various geologic environments and is frequently found with both metallic and nonmetallic minerals. Most barite is produced by open-pit mining techniques, and most crude barite requires some upgrading to meet minimum purity or specific gravity levels.

  2. Landfill mining from a deposit of the chlorine/organochlorine industry as source of dioxin contamination of animal feed and assessment of the responsible processes.

    PubMed

    Torres, João Paulo Machado; Leite, Claudio; Krauss, Thomas; Weber, Roland

    2013-04-01

    In 1997, the Polychlorinated dibenzo-para-dioxin (PCDD)/Polychlorinated dibenzofuran (PCDF) concentrations in dairy products in Germany and other European countries increased. The PCDD/PCDF source was contaminated lime used in Brazilian citrus pulp pellets. The contaminated lime was mined from an industrial dump site. However, the detailed origin of the PCDD/PCDFs in the lime was not revealed. This paper investigates the contamination origin and describes the link between lime milk from the dumpsite of a chlorine/organochlorine industry and the contaminated lime. The contaminated lime stemmed from mining at the corporate landfill of Solvay Indupa in Sao Paulo. The landfill was used for 40 years for deposition of production residues and closed in 1996. The factory operated/operates at least two processes with potentially high PCDD/PCDF releases, namely the oxychlorination process for production of ethylene dichloride (EDC) and the chlor-alkali process. The main landfilled waste was lime milk (1.4 million tons) from the vinyl chloride monomer production (via the acetylene process) along with residues from other processes. The PCDD/PCDF fingerprint revealed that most samples from the chemical landfill showed an EDC PCDD/PCDF pattern with a characteristic octachlorodibenzofuran dominance. The PCDD/PCDF pattern of a Rio Grande sediment sample downstream of the facility showed a chlor-alkali pattern with a minor impact of the EDC pattern. The case highlights that PCDD/PCDF- and persistent organic pollutants-contaminated sites need to be identified in a comprehensive manner as required by the Stockholm Convention (article 6) and controlled for their impact on the environment and human health. Landfill mining and reuse of materials from contaminated deposits should be prohibited.

  3. Characterization of gut microbiota profiles in coronary artery disease patients using data mining analysis of terminal restriction fragment length polymorphism: gut microbiota could be a diagnostic marker of coronary artery disease.

    PubMed

    Emoto, Takuo; Yamashita, Tomoya; Kobayashi, Toshio; Sasaki, Naoto; Hirota, Yushi; Hayashi, Tomohiro; So, Anna; Kasahara, Kazuyuki; Yodoi, Keiko; Matsumoto, Takuya; Mizoguchi, Taiji; Ogawa, Wataru; Hirata, Ken-Ichi

    2017-01-01

    The association between atherosclerosis and gut microbiota has been attracting increased attention. We previously demonstrated a possible link between gut microbiota and coronary artery disease. Our aim of this study was to clarify the gut microbiota profiles in coronary artery disease patients using data mining analysis of terminal restriction fragment length polymorphism (T-RFLP). This study included 39 coronary artery disease (CAD) patients and 30 age- and sex- matched no-CAD controls (Ctrls) with coronary risk factors. Bacterial DNA was extracted from their fecal samples and analyzed by T-RFLP and data mining analysis using the classification and regression algorithm. Five additional CAD patients were newly recruited to confirm the reliability of this analysis. Data mining analysis could divide the composition of gut microbiota into 2 characteristic nodes. The CAD group was classified into 4 CAD pattern nodes (35/39 = 90 %), while the Ctrl group was classified into 3 Ctrl pattern nodes (28/30 = 93 %). Five additional CAD samples were applied to the same dividing model, which could validate the accuracy to predict the risk of CAD by data mining analysis. We could demonstrate that operational taxonomic unit 853 (OTU853), OTU657, and OTU990 were determined important both by the data mining method and by the usual statistical comparison. We classified the gut microbiota profiles in coronary artery disease patients using data mining analysis of T-RFLP data and demonstrated the possibility that gut microbiota is a diagnostic marker of suffering from CAD.

  4. Extracting nursing practice patterns from structured labor and delivery data sets.

    PubMed

    Hall, Eric S; Thornton, Sidney N

    2007-10-11

    This study was designed to demonstrate the feasibility of a computerized care process model that provides real-time case profiling and outcome forecasting. A methodology was defined for extracting nursing practice patterns from structured point-of-care data collected using the labor and delivery information system at Intermountain Healthcare. Data collected during January 2006 were retrieved from Intermountain Healthcare's enterprise data warehouse for use in the study. The knowledge discovery in databases process provided a framework for data analysis including data selection, preprocessing, data-mining, and evaluation. Development of an interactive data-mining tool and construction of a data model for stratification of patient records into profiles supported the goals of the study. Five benefits of the practice pattern extraction capability, which extend to other clinical domains, are listed with supporting examples.

  5. A Study of Pattern Prediction in the Monitoring Data of Earthen Ruins with the Internet of Things.

    PubMed

    Xiao, Yun; Wang, Xin; Eshragh, Faezeh; Wang, Xuanhong; Chen, Xiaojiang; Fang, Dingyi

    2017-05-11

    An understanding of the changes of the rammed earth temperature of earthen ruins is important for protection of such ruins. To predict the rammed earth temperature pattern using the air temperature pattern of the monitoring data of earthen ruins, a pattern prediction method based on interesting pattern mining and correlation, called PPER, is proposed in this paper. PPER first finds the interesting patterns in the air temperature sequence and the rammed earth temperature sequence. To reduce the processing time, two pruning rules and a new data structure based on an R-tree are also proposed. Correlation rules between the air temperature patterns and the rammed earth temperature patterns are then mined. The correlation rules are merged into predictive rules for the rammed earth temperature pattern. Experiments were conducted to show the accuracy of the presented method and the power of the pruning rules. Moreover, the Ming Dynasty Great Wall dataset was used to examine the algorithm, and six predictive rules from the air temperature to rammed earth temperature based on the interesting patterns were obtained, with the average hit rate reaching 89.8%. The PPER and predictive rules will be useful for rammed earth temperature prediction in protection of earthen ruins.

  6. Rare earth elements (REE) as natural and applied tracers in the catchment area of Gessental valley, former uranium mining area of Eastern Thuringia, Germany

    NASA Astrophysics Data System (ADS)

    Buechel, G.; Merten, D.; Geletneky, J. W.; Kothe, E.

    2003-04-01

    Between 1947 and 1990 about 113,000 t of uranium were excavated at the former uranium mining site of Ronneburg (Eastern Thuringia, Germany). The legacy consists of more than 200 million m^3 of metasedimentary rocks rich in organic matter, sulfides and heavy metals originally deposited in mining heaps at the surface. The metasedimentary rocks, formed under anoxic conditions about 400 million years ago, are now exposed to oxic conditions. The oxidation of marcasite and pyrite results in the formation of H_2SO_4. The formation of acid mine drainage (AMD) leads to high concentrations of uranium, rare earth elements (REE) and other heavy metals in surface water, seepage water and groundwater. This mobilization is due to alteration enhanced by high microbial activity and low pH. The tolerance mechanisms towards heavy metal pollution of soil substrate and surface/groundwater have allowed the selection of microbes which have, e.g., specific transporter genes and which are associated with plants in symbiotic interactions like mycorrhiza. In order to follow the processes linking alteration of metasedimentary rocks to biological systems, the use of tracers is needed. One group of such tracers occurring in high concentrations in the water phase at the Ronneburg mining site are the REE (La-Lu), which are characterized by very similar chemical behaviour. They show smooth but continuous variations of their chemical behaviour as a function of atomic number. For seepage water of the waste rock dump Nordhalde, sampled over a period of two years, the shale-normalized REE patterns show enrichment of heavy REE and only minor variations, although the concentration differs. At sampling points in the surface water and in groundwater rather similar REE patterns were observed. Thus, REE can be used as tracers to identify diffuse inflow of REE-rich acid mine drainage of the dumps into the creek and the sediments. The absolute concentrations of REE in the creek and in groundwater are up to 1000 times less than in seepage water due to mixing and (co)precipitation of REE. Lu/La and Sm/La ratios show a significant decrease with increasing distance from the dump caused by preferential (co)precipitation of heavy REE with amorphous Fe-hydroxides along the Gessenbach. Thus, REE patterns can not only be used as tracers but also to study processes. In contrast to the patterns of the seepage, the REE patterns of the Silurian rocks as determined by LA-ICP-MS feature rather flat patterns with enrichment of middle REE (Sm - Dy). Results from batch experiments show preferential leaching of heavy REE for all investigated source rocks. The highest absolute concentrations of REE appear in the eluates of the Silurian 'Ockerkalk'. Since the REE pattern closely reflects the pattern found in the seepage water it is assumed to be the most important source for the occurrence of the REE pattern observed in seepage water. Studies of microbial heavy metal retention were performed by direct incubation of seepage water using well characterized fungal and bacterial strains. Using the bacterium Escherichia coli for incubation of seepage water, sorption of heavy metals to biomass was observed. Use of the fungus Schizophyllum commune for incubation, however, has a much more pronounced effect including significant fractionation of REE pointing to the possibility of a specific active uptake mechanism. Bioextraction with bacteria and fungal mycelia might be an alternative to plant growth and phytoextraction and might be preferable for AMD water treatment since no soil substrate is necessary. Future research must be directed towards genes for active transport, intra- or extracellular storage proteins and their application. Biotechnological use of such genes in, e.g., strains of E. coli, might yield highly useful bioremediation strains that can help to reduce the ecological effects of pollution resulting from former mining activities.

  7. Mining local climate data to assess spatiotemporal dengue fever epidemic patterns in French Guiana.

    PubMed

    Flamand, Claude; Fabregue, Mickael; Bringay, Sandra; Ardillon, Vanessa; Quénel, Philippe; Desenclos, Jean-Claude; Teisseire, Maguelonne

    2014-10-01

    To identify local meteorological drivers of dengue fever in French Guiana, we applied an original data mining method to the available epidemiological and climatic data. Through this work, we also assessed the contribution of the data mining method to the understanding of factors associated with the dissemination of infectious diseases and their spatiotemporal spread. We applied contextual sequential pattern extraction techniques to epidemiological and meteorological data to identify the most significant climatic factors for dengue fever, and we investigated the relevance of the extracted patterns for the early warning of dengue outbreaks in French Guiana. The maximum temperature, minimum relative humidity, global brilliance, and cumulative rainfall were identified as determinants of dengue outbreaks, and the precise intervals of their values and variations were quantified according to the epidemiologic context. The strongest significant correlations were observed between dengue incidence and meteorological drivers after a 4-6-week lag. We demonstrated the use of contextual sequential patterns to better understand the determinants of the spatiotemporal spread of dengue fever in French Guiana. Future work should integrate additional variables and explore the notion of neighborhood for extracting sequential patterns. Dengue fever remains a major public health issue in French Guiana. The development of new methods to identify such specific characteristics becomes crucial in order to better understand and control spatiotemporal transmission. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  8. A review on the mechanism, risk evaluation, and prevention of coal spontaneous combustion in China.

    PubMed

    Kong, Biao; Li, Zenghua; Yang, Yongliang; Liu, Zhen; Yan, Daocheng

    2017-10-01

    In recent years, the ecology, security, and sustainable development of modern mines have become the theme of coal mine development worldwide. However, spontaneous combustion of coal under conditions of oxygen supply and automatic exothermic heating during coal mining leads to coalfield fires. Coal spontaneous combustion (CSC) causes huge economic losses and casualties, with the toxic and harmful gases produced during coal combustion not only polluting the working environment, but also causing great damage to the ecological environment. China is the world's largest coal producer and consumer; however, coal production in Chinese mines is seriously threatened by the CSC risk. Because deep underground mining methods are commonly adopted in Chinese coal mines, coupling disasters are frequent in these mines with the coalfield fires becoming increasingly serious. Therefore, in this study, we analyzed the development mechanism of CSC. The CSC risk assessment was performed from the aspects of prediction, detection, and determination of the "dangerous area" in a coal mine (i.e., the area most susceptible to fire hazards). A new geophysical method for CSC determination is proposed and analyzed. Furthermore, the main methods for CSC fire prevention and control and their advantages and disadvantages are analyzed. Finally, future development directions for CSC research are outlined from five aspects, with the eventual aim of constructing an integrated CSC prevention and control system. Our results can serve as a reference for the development of CSC fire prevention and control technology and promote the protection of the ecological environment in China.

  9. Correlating microbial community profiles with geochemical conditions in a watershed heavily contaminated by an antimony tailing pond.

    PubMed

    Xiao, Enzong; Krumins, Valdis; Tang, Song; Xiao, Tangfu; Ning, Zengping; Lan, Xiaolong; Sun, Weimin

    2016-08-01

    Mining activities have introduced various pollutants to surrounding aquatic and terrestrial environments, causing adverse impacts to the environment. Indigenous microbial communities are responsible for the biogeochemical cycling of pollutants in diverse environments, indicating the potential for bioremediation of such pollutants. Antimony (Sb) has been extensively mined in China and Sb contamination in mining areas has been frequently encountered. To date, however, the microbial composition and structure in response to Sb contamination has remained overlooked. Sb and As frequently co-occur in sulfide-rich ores, and co-contamination of Sb and As is observed in some mining areas. We characterized, for the first time, the microbial community profiles and their responses to Sb and As pollution from a watershed heavily contaminated by an Sb tailing pond in Southwest China. The indigenous microbial communities were profiled by high-throughput sequencing from 16 sediment samples (535,390 valid reads). The comprehensive geochemical data (specifically, physical-chemical properties and different Sb and As extraction fractions) were obtained from river water and sediments at different depths as well. Canonical correspondence analysis (CCA) demonstrated that a suite of in situ geochemical and physical factors significantly structured the overall microbial community compositions. Further, we found significant correlations between individual phylotypes (bacterial genera) and the geochemical fractions of Sb and As by Spearman rank correlation. A number of taxonomic groups were positively correlated with the Sb and As extractable fractions and various Sb and As species in sediment, suggesting potential roles of these phylotypes in Sb biogeochemical cycling. Copyright © 2016 Elsevier Ltd. All rights reserved.
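
    The Spearman rank correlation mentioned in this abstract can be illustrated with a minimal sketch. The record does not describe the authors' software or data handling, so the function names below (average_ranks, spearman_rho) are hypothetical and the inputs are arbitrary numbers rather than study data. The sketch ranks each variable, giving tied values their mean rank, and then computes the Pearson correlation of the ranks.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

    Because only ranks enter the calculation, any strictly increasing relationship yields a coefficient of 1 and any strictly decreasing one yields -1, whether or not the relationship is linear.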

  10. Biomedical text mining and its applications in cancer research.

    PubMed

    Zhu, Fei; Patumcharoenpol, Preecha; Zhang, Cheng; Yang, Yang; Chan, Jonathan; Meechai, Asawin; Vongsangnak, Wanwipa; Shen, Bairong

    2013-04-01

    Cancer is a malignant disease that has caused millions of human deaths. Its study has a long history of well over 100 years. There have been an enormous number of publications on cancer research. This integrated but unstructured biomedical text is of great value for cancer diagnostics, treatment, and prevention. The immense body and rapid growth of biomedical text on cancer has led to the appearance of a large number of text mining techniques aimed at extracting novel knowledge from scientific text. Biomedical text mining on cancer research is computationally automatic and high-throughput in nature. However, it is error-prone due to the complexity of natural language processing. In this review, we introduce the basic concepts underlying text mining and examine some frequently used algorithms, tools, and data sets, as well as assessing how much these algorithms have been utilized. We then discuss the current state-of-the-art text mining applications in cancer research and we also provide some resources for cancer text mining. With the development of systems biology, researchers tend to understand complex biomedical systems from a systems biology viewpoint. Thus, the full utilization of text mining to facilitate cancer systems biology research is fast becoming a major concern. To address this issue, we describe the general workflow of text mining in cancer systems biology and each phase of the workflow. We hope that this review can (i) provide a useful overview of the current work of this field; (ii) help researchers to choose text mining tools and datasets; and (iii) highlight how to apply text mining to assist cancer systems biology research. Copyright © 2012 Elsevier Inc. All rights reserved.

  11. Using association rule mining to identify risk factors for early childhood caries.

    PubMed

    Ivančević, Vladimir; Tušek, Ivan; Tušek, Jasmina; Knežević, Marko; Elheshk, Salaheddin; Luković, Ivan

    2015-11-01

    Early childhood caries (ECC) is a potentially severe disease affecting children all over the world. The available findings are mostly based on a logistic regression model, but data mining, in particular association rule mining, could be used to extract more information from the same data set. ECC data was collected in a cross-sectional analytical study of a 10% sample of preschool children in the South Bačka area (Vojvodina, Serbia). Association rules were extracted from the data by association rule mining. Risk factors were extracted from the highly ranked association rules. Discovered dominant risk factors include male gender, frequent breastfeeding (with other risk factors), high birth order, language, and low body weight at birth. Low health awareness of parents was significantly associated with ECC only in male children. The discovered risk factors are mostly confirmed by the literature, which corroborates the value of the methods. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
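
    How association rules are typically ranked can be sketched with the standard support, confidence and lift measures. The abstract does not state which measures or software the authors used, so this is only an illustration under that assumption. The helper names (support, rule_metrics) and the toy records with placeholder items are invented here and are not study data.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(items <= record for record in records) / len(records)


def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    s_a = support(records, antecedent)
    s_c = support(records, consequent)
    s_ac = support(records, set(antecedent) | set(consequent))
    confidence = s_ac / s_a   # estimate of P(consequent | antecedent)
    lift = confidence / s_c   # values above 1 indicate positive association
    return s_ac, confidence, lift


# Toy records with placeholder item names (hypothetical, for illustration only).
toy_records = [
    {"factor_a", "outcome"},
    {"factor_a", "factor_b", "outcome"},
    {"factor_b"},
    {"factor_a"},
]

s, c, l = rule_metrics(toy_records, {"factor_a"}, {"outcome"})
print(f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

    In a risk-factor setting, rules whose consequent is the disease outcome would be sorted by such measures, and the antecedent items of the top-ranked rules read as candidate risk factors.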

  12. Riparian plants on mine runoff in Zimapan, Hidalgo, Mexico: Useful for phytoremediation?

    PubMed

    Carmona-Chit, Eréndira; Carrillo-González, Rogelio; González-Chávez, Ma Del Carmen A; Vibrans, Heike; Yáñez-Espinosa, Laura; Delgado-Alvarado, Adriana

    2016-09-01

    Dispersion and runoff of mine tailings have serious implications for human and ecosystem health in the surroundings of mines. Water, soils and plants were sampled in transects perpendicular to the Santiago stream in Zimapan, Hidalgo, which receives runoff sediments from two acidic and one alkaline mine tailing. Concentrations of potentially toxic elements (PTE) were measured in water, soils (rhizosphere and non-rhizosphere) and plants. Using diethylenetriaminepentaacetic acid (DTPA) extractable concentrations of Cu, Zn, Ni, Cd and Pb in rhizosphere soil, the bioconcentration and translocation factors were calculated. Ruderal annuals formed the principal element of the herbaceous vegetation. Accumulation was the most frequent strategy to deal with high concentrations of Zn, Cu, Ni, Cd and Pb. The order of concentration in plant tissue was Zn>Pb>Cu>Ni>Cd. Most plants contained concentrations of PTE considered as phytotoxic and behaved as metal tolerant species. Rorippa nasturtium-aquaticum accumulated particularly high concentrations of Cu. Parietaria pensylvanica and Commelina diffusa, common tropical weeds, behaved as Zn hyperaccumulators and should be studied further.
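
    The abstract does not give the formulas for the two factors it mentions. The sketch below assumes the definitions commonly used in phytoremediation studies: the bioconcentration factor as the ratio of the element concentration in plant tissue to the (here DTPA-extractable) concentration in rhizosphere soil, and the translocation factor as the ratio of shoot to root concentration. The function names and the example numbers are hypothetical.

```python
def bioconcentration_factor(plant_conc, soil_conc):
    """Plant-tissue concentration divided by soil concentration (same units)."""
    if soil_conc <= 0:
        raise ValueError("soil concentration must be positive")
    return plant_conc / soil_conc


def translocation_factor(shoot_conc, root_conc):
    """Shoot concentration divided by root concentration (same units)."""
    if root_conc <= 0:
        raise ValueError("root concentration must be positive")
    return shoot_conc / root_conc


# Arbitrary illustrative values in mg/kg, not measurements from the study.
print(bioconcentration_factor(50.0, 10.0))   # 5.0
print(translocation_factor(30.0, 60.0))      # 0.5
```

    Under these definitions, a translocation factor below 1 means the element is retained mainly in the roots, while a value above 1 means it is moved preferentially to the shoots.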

  13. Study of Internal Dump Stability of Dudhichua Open Cast Project, Northern Coalfields Limited, India

    NASA Astrophysics Data System (ADS)

    Sengupta, S.; Roy, I.

    2015-04-01

    Dudhichua Open Cast Project is one of the prestigious projects of Northern Coalfields Limited, India, with total mineable coal reserves of approximately 400 million tonnes and a corresponding 1,700 million m^3 volume of waste rock, i.e. overburden material. Accommodating these waste dump masses in the limited space of the de-coaled portion of the quarry is considered one of the major challenges to the mine operators. It has been reported that this mine is facing frequent slope failures of waste rock dumps, which is of great concern to the mine management in view of unsafe working conditions. To tackle the above problem, a detailed investigation was carried out to propose a stable dump profile which will cater to the land economics and safety aspects of the mine. A detailed investigation along with recommendation of optimum design for dragline dump profile along with shovel-dumper-dump profile is presented in this paper.

  14. Thermal infrared remote sensing in assessing groundwater and surface-water resources related to Hannukainen mining development site, northern Finland

    NASA Astrophysics Data System (ADS)

    Rautio, Anne B.; Korkka-Niemi, Kirsti I.; Salonen, Veli-Pekka

    2018-02-01

    Mining development sites occasionally host complicated aquifer systems with notable connections to natural surface water (SW) bodies. A low-altitude thermal infrared (TIR) imaging survey was conducted to identify hydraulic connections between aquifers and rivers and to map spatial surface temperature patterns along the subarctic rivers in the proximity of the Hannukainen mining development area, northern Finland. In addition to TIR data, stable isotopic compositions (δ18O, δD) and dissolved silica concentrations were used as tracers to verify the observed groundwater (GW) discharge into the river system. Based on the TIR survey, notable GW discharge into the main river channel and its tributaries (61 km altogether) was observed and over 500 GW discharge sites were located. On the basis of the survey, the longitudinal temperature patterns of the studied rivers were found to be highly variable. Hydrological and hydrogeological information is crucial in planning and siting essential mining operations, such as tailing areas, in order to prevent any undesirable environmental impacts. The observed notable GW discharge was taken into consideration in the planning of the Hannukainen mining development area. The results of this study support the use of TIR imagery in GW-SW interaction and environmental studies in extensive and remote areas with special concerns for water-related issues but lacking the baseline research.

  15. The role of conflict minerals, artisanal mining, and informal trading networks in African intrastate and regional conflicts

    USGS Publications Warehouse

    Chirico, Peter G.; Malpeli, Katherine C.

    2014-01-01

    The relationship between natural resources and armed conflict gained public and political attention in the 1990s, when it became evident that the mining and trading of diamonds were connected with brutal rebellions in several African nations. Easily extracted resources such as alluvial diamonds and gold have been and continue to be exploited by rebel groups to fund their activities. Artisanal and small-scale miners operating under a quasi-legal status often mine these mineral deposits. While many African countries have legalized artisanal mining and established flow chains through which production is intended to travel, informal trading networks frequently emerge in which miners seek to evade taxes and fees by selling to unauthorized buyers. These networks have the potential to become international in scope, with actors operating in multiple countries. The lack of government control over the artisanal mining sector and the prominence of informal trade networks can have severe social, political, and economic consequences. In the past, mineral extraction fuelled violent civil wars in Sierra Leone, Liberia, and Angola, and it continues to do so today in several other countries. The significant influence of the informal network that surrounds artisanal mining is therefore an important security concern that can extend across borders and have far-reaching impacts.

  16. Modeling N Cycling during Succession after Forest Disturbance: an Analysis of N Mining and Retention Hypothesis

    NASA Astrophysics Data System (ADS)

    Zhou, Z.; Ollinger, S. V.; Ouimette, A.; Lovett, G. M.; Fuss, C. B.; Goodale, C. L.

    2017-12-01

    Dissolved inorganic nitrogen (DIN) losses at the Hubbard Brook Experimental Forest (HBEF), New Hampshire, USA, have declined in recent decades, a pattern that counters expectations based on prevailing theory. An unbalanced ecosystem nitrogen (N) budget implies there is a missing component for N sink. Hypotheses to explain this discrepancy include increasing rates of denitrification and accumulation of N in mineral soil pools following N mining by plants. Here, we conducted a modeling analysis fused with field measurements of N cycling, specifically examining the hypothesis relevant to N mining and retention in mineral soils. We included simplified representations of both mechanisms, N mining and retention, in a revised ecosystem process model, PnET-SOM, to evaluate the dynamics of N cycling during succession after forest disturbance at the HBEF. The predicted N mining during the early succession was regulated by a metric representing a potential demand of extra soil N for large wood growth. The accumulation of nitrate in mineral soil pools was a function of the net aboveground biomass accumulation and soil N availability and parameterized based on field 15N tracer incubation data. The predicted patterns of forest N dynamics were consistent with observations. The addition of the new algorithms also improved the predicted DIN export in stream water with an R squared of 0.35 (P<0.01) against observations. Predicted mining processes had an average rate of 7.4 kg N ha^-1 yr^-1 and predicted rates of N retention processes were 5.2 kg N ha^-1 yr^-1, both of which were in line with estimates only based on field data. The predicted trend of low DIN export could continue for another 70 years to pay back the mined N in mineral soils. Predicted ecosystem N balance showed that N gas loss could account for 14-46% of the total N deposition, the soil mining about 103% during the early succession, and soil retention about 35% at the current forest stage at the HBEF.

  17. Information mining in remote sensing imagery

    NASA Astrophysics Data System (ADS)

    Li, Jiang

    The volume of remotely sensed imagery continues to grow at an enormous rate due to the advances in sensor technology, and our capability for collecting and storing images has greatly outpaced our ability to analyze and retrieve information from the images. This motivates us to develop image information mining techniques, which is very much an interdisciplinary endeavor drawing upon expertise in image processing, databases, information retrieval, machine learning, and software design. This dissertation proposes and implements an extensive remote sensing image information mining (ReSIM) system prototype for mining useful information implicitly stored in remote sensing imagery. The system consists of three modules: image processing subsystem, database subsystem, and visualization and graphical user interface (GUI) subsystem. Land cover and land use (LCLU) information corresponding to spectral characteristics is identified by supervised classification based on support vector machines (SVM) with automatic model selection, while textural features that characterize spatial information are extracted using Gabor wavelet coefficients. Within LCLU categories, textural features are clustered using an optimized k-means clustering approach to obtain an efficiently searchable space. The clusters are stored in an object-oriented database (OODB) with associated images indexed in an image database (IDB). A k-nearest neighbor search is performed using a query-by-example (QBE) approach. Furthermore, an automatic parametric contour tracing algorithm and an O(n) time piecewise linear polygonal approximation (PLPA) algorithm are developed for shape information mining of interesting objects within the image. A fuzzy object-oriented database based on the fuzzy object-oriented data (FOOD) model is developed to handle the fuzziness and uncertainty. Three specific applications are presented: integrated land cover and texture pattern mining, shape information mining for change detection of lakes, and fuzzy normalized difference vegetation index (NDVI) pattern mining. The study results show the effectiveness of the proposed system prototype and its potential for other applications in remote sensing.
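
    The query-by-example idea in this abstract, returning the stored images whose feature vectors lie closest to those of a query image, can be sketched as a brute-force k-nearest-neighbor search. This is only an illustration. The dissertation's cluster-accelerated search, its Gabor features and its database layer are not reproduced, and the function name k_nearest, the image names and the two-dimensional feature vectors are hypothetical.

```python
import math


def k_nearest(query, feature_index, k):
    """Return the k image names whose feature vectors are closest to `query`.

    `feature_index` maps an image name to its feature vector; distance is
    Euclidean, computed by brute force over every stored vector.
    """
    ranked = sorted(
        feature_index.items(),
        key=lambda item: math.dist(query, item[1]),
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical index with 2-D feature vectors, for illustration only.
toy_index = {
    "img_a": (0.0, 0.0),
    "img_b": (1.0, 1.0),
    "img_c": (5.0, 5.0),
}

print(k_nearest((0.9, 0.9), toy_index, 2))
```

    Clustering the feature vectors beforehand, as the abstract describes, would let a search compare the query only against members of the nearest clusters instead of scanning the whole index.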

  18. Using a Data Mining Approach to Develop a Student Engagement-Based Institutional Typology. IR Applications, Volume 18, February 8, 2009

    ERIC Educational Resources Information Center

    Luan, Jing; Zhao, Chun-Mei; Hayek, John C.

    2009-01-01

    Data mining provides both systematic and systemic ways to detect patterns of student engagement among students at hundreds of institutions. Using traditional statistical techniques alone, the task would be significantly difficult--if not impossible--considering the size and complexity in both data and analytical approaches necessary for this…

  19. Use of Data Mining to Reveal Body Mass Index (BMI): Patterns among Pennsylvania Schoolchildren, Pre-K to Grade 12

    ERIC Educational Resources Information Center

    YoussefAgha, Ahmed H.; Lohrmann, David K.; Jayawardene, Wasantha P.

    2013-01-01

    Background: Health eTools for Schools was developed to assist school nurses with routine entries, including height and weight, on student health records, thus providing a readily accessible data base. Data-mining techniques were applied to this database to determine if clinically significant results could be generated. Methods: Body mass index…

  20. Large Scale Data Mining to Improve Usability of Data: An Intelligent Archive Testbed

    NASA Technical Reports Server (NTRS)

    Ramapriyan, Hampapuram; Isaac, David; Yang, Wenli; Morse, Steve

    2005-01-01

    Research in certain scientific disciplines - including Earth science, particle physics, and astrophysics - continually faces the challenge that the volume of data needed to perform valid scientific research can at times overwhelm even a sizable research community. The desire to improve utilization of this data gave rise to the Intelligent Archives project, which seeks to make data archives active participants in a knowledge building system capable of discovering events or patterns that represent new information or knowledge. Data mining can automatically discover patterns and events, but it is generally viewed as unsuited for large-scale use in disciplines like Earth science that routinely involve very high data volumes. Dozens of research projects have shown promising uses of data mining in Earth science, but all of these are based on experiments with data subsets of a few gigabytes or less, rather than the terabytes or petabytes typically encountered in operational systems. To bridge this gap, the Intelligent Archives project is establishing a testbed with the goal of demonstrating the use of data mining techniques in an operationally-relevant environment. This paper discusses the goals of the testbed and the design choices surrounding critical issues that arose during testbed implementation.

  1. The Application of Data Mining Techniques to Create Promotion Strategy for Mobile Phone Shop

    NASA Astrophysics Data System (ADS)

    Khasanah, A. U.; Wibowo, K. S.; Dewantoro, H. F.

    2017-12-01

    The number of mobile phone shops is growing very fast in various regions of Indonesia, including Yogyakarta, due to the increasing demand for mobile phones. This leads to intense competition among mobile phone shops. Under these conditions, a mobile phone shop, especially a small one, needs a good promotion strategy in order to survive. To create an attractive promotion strategy, companies and shops should know their customer segmentation and the buying patterns of their target market. This kind of analysis can be done using data mining techniques. This study aims to segment customers using Agglomerative Hierarchical Clustering and to identify customer buying patterns using Association Rule Mining. The research was conducted in a mobile phone shop in Sleman, Yogyakarta. The clustering result shows that the biggest customer segment of the shop was male university students who come on weekends, and from association rule mining it can be concluded that tempered glass and smart phone “x”, as well as action camera and waterproof monopod and power bank, have strong relationships. These results were used to create promotion strategies, which are presented at the end of the study.

  2. Mining dynamic noteworthy functions in software execution sequences.

    PubMed

    Zhang, Bing; Huang, Guoyan; Wang, Yuqian; He, Haitao; Ren, Jiadong

    2017-01-01

    As the quality of crucial entities can directly affect that of software, their identification and protection become an important premise for effective software development, management, maintenance and testing, which thus contribute to improving the software quality and its attack-defending ability. Most analysis and evaluation on important entities like codes-based static structure analysis are on the destruction of the actual software running. In this paper, from the perspective of software execution process, we proposed an approach to mine dynamic noteworthy functions (DNFM) in software execution sequences. First, according to software decompiling and tracking stack changes, the execution traces composed of a series of function addresses were acquired. Then these traces were modeled as execution sequences and then simplified so as to get simplified sequences (SFS), followed by the extraction of patterns through pattern extraction (PE) algorithm from SFS. After that, evaluating indicators inner-importance and inter-importance were designed to measure the noteworthiness of functions in DNFM algorithm. Finally, these functions were sorted by their noteworthiness. Comparison and contrast were conducted on the experiment results from two traditional complex network-based node mining methods, namely PageRank and DegreeRank. The results show that the DNFM method can mine noteworthy functions in software effectively and precisely.
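
    PageRank, one of the two baseline methods named in this abstract, can be sketched as a power iteration over a function call graph. This illustrates the baseline only, not the DNFM algorithm, whose details the record does not give. The function name pagerank, the damping value of 0.85, the fixed iteration count and the three-function call graph are assumptions made for the example.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank.

    `graph` maps each node to the list of nodes it links to (here: callees).
    Every target must also appear as a key. Nodes with no outgoing links
    spread their rank evenly over all nodes.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = graph[v] if graph[v] else nodes  # dangling node case
            share = damping * rank[v] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical call graph: main calls parse and log, parse calls log.
toy_calls = {"main": ["parse", "log"], "parse": ["log"], "log": []}

for name, score in sorted(pagerank(toy_calls).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

    A static ranking of this kind depends only on the structure of the graph. The approach in the abstract instead derives its importance indicators from patterns mined in recorded execution sequences.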

  3. An empirical method for estimating instream pre-mining pH and dissolved Cu concentration in catchments with acidic drainage and ferricrete

    USGS Publications Warehouse

    Nimick, D.A.; Gurrieri, J.T.; Furniss, G.

    2009-01-01

    Methods for assessing natural background water quality of streams affected by historical mining are vigorously debated. An empirical method is proposed in which stream-specific estimation equations are generated from relationships between either pH or dissolved Cu concentration in stream water and the Fe/Cu concentration ratio in Fe-precipitates presently forming in the stream. The equations and Fe/Cu ratios for pre-mining deposits of alluvial ferricrete then were used to reconstruct estimated pre-mining longitudinal profiles for pH and dissolved Cu in three acidic streams in Montana, USA. Primary assumptions underlying the proposed method are that alluvial ferricretes and modern Fe-precipitates share a common origin, that the Cu content of Fe-precipitates remains constant during and after conversion to ferricrete, and that geochemical factors other than pH and dissolved Cu concentration play a lesser role in determining Fe/Cu ratios in Fe-precipitates. The method was evaluated by applying it in a fourth, naturally acidic stream unaffected by mining, where estimated pre-mining pH and Cu concentrations were similar to present-day values, and by demonstrating that inflows, particularly from unmined areas, had consistent effects on both the pre-mining and measured profiles of pH and Cu concentration. Using this method, it was estimated that mining has affected about 480 m of Daisy Creek, 1.8 km of Fisher Creek, and at least 1 km of Swift Gulch. Mean values of pH decreased by about 0.6 pH units to about 3.2 in Daisy Creek and by 1-1.5 pH units to about 3.5 in Fisher Creek. In Swift Gulch, mining appears to have decreased pH from about 5.5 to as low as 3.6. Dissolved Cu concentrations increased due to mining almost 40% in Daisy Creek to a mean of 11.7 mg/L and as much as 230% in Fisher Creek to 0.690 mg/L. Uncertainty in the fate of Cu during the conversion of Fe-precipitates to ferricrete translates to potential errors in pre-mining estimates of as much as 0.25 units for pH and 22% for dissolved Cu concentration. The method warrants further testing in other mined and unmined watersheds. Comparison of pre-mining water-quality estimates derived from the ferricrete and other methods in single watersheds would be particularly valuable. The method has potential for use in monitoring remedial efforts at mine sites with ferricrete deposits. A reasonable remediation objective might be realized when the downstream pattern of Fe/Cu ratios in modern streambed Fe-precipitates corresponds to the pattern in pre-mining alluvial ferricrete deposits along a stream valley.

  4. A cross-sectional survey on knowledge and perceptions of health risks associated with arsenic and mercury contamination from artisanal gold mining in Tanzania

    PubMed Central

    2013-01-01

    Background: An estimated 0.5 to 1.5 million informal miners, of whom 30-50% are women, rely on artisanal mining for their livelihood in Tanzania. Mercury, used in processing gold ore, and arsenic, which is a constituent of some ores, are common occupational exposures that frequently result in widespread environmental contamination. Frequently, the mining activities are conducted haphazardly without regard for environmental, occupational, or community exposure. The primary objective of this study was to assess community risk knowledge and perception of potential mercury and arsenic toxicity and/or exposure from artisanal gold mining in Rwamagasa in northwestern Tanzania. Methods: A cross-sectional survey of respondents in five sub-villages in the Rwamagasa Village located in Geita District in northwestern Tanzania near Lake Victoria was conducted. This area has a history of artisanal gold mining and many residents continue to work as miners. Using a clustered random selection approach for recruitment, a total of 160 individuals over 18 years of age completed a structured interview. Results: The interviews revealed wide variations in knowledge and risk perceptions concerning mercury and arsenic exposure, with 40.6% (n=65) and 89.4% (n=143) not aware of the health effects of mercury and arsenic exposure, respectively. Males were significantly more knowledgeable (n=59, 36.9%) than females (n=36, 22.5%) with regard to mercury (χ²=3.99, p<0.05). An individual’s occupation category was associated with level of knowledge (χ²=22.82, p<0.001). Individuals involved in mining (n=63, 73.2%) were more knowledgeable about the negative health effects of mercury than individuals in other occupations. Of the few individuals (n=17, 10.6%) who knew about arsenic toxicity, the majority (n=10, 58.8%) were miners. 
Conclusions: The knowledge of individuals living in Rwamagasa, Tanzania, an area with a history of artisanal gold mining, varied widely with regard to the health hazards of mercury and arsenic. In these communities there was limited awareness of the threats to health associated with exposure to mercury and arsenic. This lack of knowledge, combined with minimal environmental monitoring and controlled waste management practices, highlights the need for health education, surveillance, and policy changes. PMID:23351708
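
    The percentages quoted in the abstract above can be cross-checked against the reported counts. The short sketch below assumes that each percentage is taken over all 160 respondents unless the abstract names a different base (the 10 miners are expressed as a share of the 17 respondents aware of arsenic toxicity). The helper name `pct` is illustrative and does not come from the study.

```python
# Counts as reported in the abstract; 160 respondents completed the interview.
TOTAL_RESPONDENTS = 160


def pct(count, total=TOTAL_RESPONDENTS):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# label -> (reported count, percentage stated in the abstract)
reported = {
    "not aware of mercury health effects": (65, 40.6),
    "not aware of arsenic health effects": (143, 89.4),
    "males knowledgeable about mercury": (59, 36.9),
    "females knowledgeable about mercury": (36, 22.5),
    "aware of arsenic toxicity": (17, 10.6),
}

for label, (count, stated) in reported.items():
    print(f"{label}: {count}/{TOTAL_RESPONDENTS} = {pct(count)}% (stated {stated}%)")

# Miners as a share of the 17 respondents who knew about arsenic toxicity.
miners_share = pct(10, 17)
print(f"miners among arsenic-aware respondents: {miners_share}% (stated 58.8%)")
```

    The base for the 73.2% figure for individuals involved in mining is not stated in the abstract, so it is left out of the check.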

  5. A cross-sectional survey on knowledge and perceptions of health risks associated with arsenic and mercury contamination from artisanal gold mining in Tanzania.

    PubMed

    Charles, Elias; Thomas, Deborah S K; Dewey, Deborah; Davey, Mark; Ngallaba, Sospatro E; Konje, Eveline

    2013-01-25

    An estimated 0.5 to 1.5 million informal miners, of whom 30-50% are women, rely on artisanal mining for their livelihood in Tanzania. Mercury, used in processing gold ore, and arsenic, which is a constituent of some ores, are common occupational exposures that frequently result in widespread environmental contamination. Frequently, the mining activities are conducted haphazardly without regard for environmental, occupational, or community exposure. The primary objective of this study was to assess community risk knowledge and perception of potential mercury and arsenic toxicity and/or exposure from artisanal gold mining in Rwamagasa in northwestern Tanzania. A cross-sectional survey of respondents in five sub-villages in the Rwamagasa Village located in Geita District in northwestern Tanzania near Lake Victoria was conducted. This area has a history of artisanal gold mining and many residents continue to work as miners. Using a clustered random selection approach for recruitment, a total of 160 individuals over 18 years of age completed a structured interview. The interviews revealed wide variations in knowledge and risk perceptions concerning mercury and arsenic exposure, with 40.6% (n=65) and 89.4% (n=143) not aware of the health effects of mercury and arsenic exposure, respectively. Males were significantly more knowledgeable (n=59, 36.9%) than females (n=36, 22.5%) with regard to mercury (χ²=3.99, p<0.05). An individual's occupation category was associated with level of knowledge (χ²=22.82, p<0.001). Individuals involved in mining (n=63, 73.2%) were more knowledgeable about the negative health effects of mercury than individuals in other occupations. Of the few individuals (n=17, 10.6%) who knew about arsenic toxicity, the majority (n=10, 58.8%) were miners. The knowledge of individuals living in Rwamagasa, Tanzania, an area with a history of artisanal gold mining, varied widely with regard to the health hazards of mercury and arsenic. 
In these communities there was limited awareness of the threats to health associated with exposure to mercury and arsenic. This lack of knowledge, combined with minimal environmental monitoring and controlled waste management practices, highlights the need for health education, surveillance, and policy changes.

  6. FREQUENT SUBGRAPH MINING OF PERSONALIZED SIGNALING PATHWAY NETWORKS GROUPS PATIENTS WITH FREQUENTLY DYSREGULATED DISEASE PATHWAYS AND PREDICTS PROGNOSIS.

    PubMed

    Durmaz, Arda; Henderson, Tim A D; Brubaker, Douglas; Bebek, Gurkan

    2017-01-01

    Large scale genomics studies have generated comprehensive molecular characterization of numerous cancer types. Subtypes for many tumor types have been established; however, these classifications are based on molecular characteristics of small gene sets with limited power to detect dysregulation at the patient level. We hypothesize that frequent graph mining of pathways to gather pathways functionally relevant to tumors can characterize tumor types and provide opportunities for personalized therapies. In this study we present an integrative omics approach to group patients based on their altered pathway characteristics and show prognostic differences within breast cancer (p < 9.57E-10) and glioblastoma multiforme (p < 0.05) patients. We were able to validate this approach in secondary RNA-Seq datasets with p < 0.05 and p < 0.01, respectively. We also performed pathway enrichment analysis to further investigate the biological relevance of dysregulated pathways. We compared our approach with network-based classifier algorithms and showed that our unsupervised approach generates more robust and biologically relevant clustering whereas previous approaches failed to report specific functions for similar patient groups or classify patients into prognostic groups. These results could serve as a means to improve prognosis for future cancer patients, and to provide opportunities for improved treatment options and personalized interventions. The proposed novel graph mining approach is able to integrate PPI networks with gene expression in a biologically sound approach and cluster patients into clinically distinct groups. We have utilized breast cancer and glioblastoma multiforme datasets from microarray and RNA-Seq platforms and identified disease mechanisms differentiating samples. Supplementary methods, figures, tables and code are available at https://github.com/bebeklab/dysprog.

  7. Mercury and methylmercury concentrations and loads in the Cache Creek watershed, California

    USGS Publications Warehouse

    Domagalski, Joseph L.; Alpers, Charles N.; Slotton, D.G.; Suchanek, T.H.; Ayers, S.M.

    2004-01-01

    Concentrations and loads of total mercury and methylmercury were measured in streams draining abandoned mercury mines and in the proximity of geothermal discharge in the Cache Creek watershed of California during a 17-month period from January 2000 through May 2001. Rainfall and runoff were lower than long-term averages during the study period. The greatest loading of mercury and methylmercury from upstream sources to downstream receiving waters, such as San Francisco Bay, generally occurred during or after winter rainfall events. During the study period, loads of mercury and methylmercury from geothermal sources tended to be greater than those from abandoned mining areas, a pattern attributable to the lack of large precipitation events capable of mobilizing significant amounts of either mercury-laden sediment or dissolved mercury and methylmercury from mine waste. Streambed sediments of Cache Creek are a significant source of mercury and methylmercury to downstream receiving bodies of water. Much of the mercury in these sediments is the result of deposition over the last 100-150 years by either storm-water runoff from abandoned mines or continuous discharges from geothermal areas. Several geochemical constituents were useful as natural tracers for mining and geothermal areas, including the aqueous concentrations of boron, chloride, lithium and sulfate, and the stable isotopes of hydrogen and oxygen in water. Stable isotopes of water in areas draining geothermal discharges showed a distinct trend toward enrichment of 18O compared with meteoric waters, whereas much of the runoff from abandoned mines indicated a stable isotopic pattern more consistent with local meteoric water. © 2004 Elsevier B.V. All rights reserved.

  8. [Spatiotemporal patterns and driving forces of land use change in industrial relocation area: a case study of old industrial area in Tiexi of Shenyang, Northeast China].

    PubMed

    Wang, Mei-Ling; Bing, Long-Fei; Xi, Feng-Ming; Wu, Rui; Geng, Yong

    2013-07-01

    Based on QuickBird remote sensing images and with the support of GIS, this paper analyzed the spatiotemporal characteristics of land use change and its driving forces in the old industrial area of Tiexi, Shenyang City, Liaoning Province, in 2000-2010. During the study period, the industrial and mining warehouse land pattern changed the most, evolving from the historical pattern of residential land in the south and industrial land in the north into a pattern dominated by residential land. In the last decade, the residential land area increased by 9%, mainly transferred from the industrial and mining warehouse land located north of Jianshe Road, while the industrial and mining warehouse land area decreased by 20%. The land areas for commercial services and for administrative and public services increased by 1.3% and 3.1%, respectively. The land area for construction had a greater change, with an overall change rate of 76.9%. The land use change rate in 2000-2005 was greater than that in 2005-2010. National development strategies and policies, regional development planning, administrative reform, and industrial upgrading were the main driving forces of the land use change in the old industrial area of Tiexi.

  9. Identification of Shearer Cutting Patterns Using Vibration Signals Based on a Least Squares Support Vector Machine with an Improved Fruit Fly Optimization Algorithm

    PubMed Central

    Si, Lei; Wang, Zhongbin; Liu, Xinhua; Tan, Chao; Liu, Ze; Xu, Jing

    2016-01-01

    Shearers play an important role in the fully mechanized coal mining face, and accurately identifying their cutting pattern is very helpful for improving the automation level of shearers and ensuring the safety of coal mining. The least squares support vector machine (LSSVM) has been proven to offer strong potential in prediction and classification issues, particularly by employing an appropriate meta-heuristic algorithm to determine the values of its two parameters. However, these meta-heuristic algorithms have the drawbacks of being hard to understand and reaching the global optimal solution slowly. In this paper, an improved fruit fly optimization algorithm (IFOA) was presented to optimize the parameters of LSSVM, and the LSSVM coupled with IFOA (IFOA-LSSVM) was used to identify the shearer cutting pattern. The vibration acceleration signals of five cutting patterns were collected and the special state features were extracted based on the ensemble empirical mode decomposition (EEMD) and the kernel function. Some examples on the IFOA-LSSVM model were further presented and the results were compared with LSSVM, PSO-LSSVM, GA-LSSVM and FOA-LSSVM models in detail. The comparison results indicate that the proposed approach was feasible and efficient, and that it outperformed the others. Finally, an industrial application example at the coal mining face was presented to demonstrate the effect of the proposed system. PMID:26771615

  10. Discovering Activities to Recognize and Track in a Smart Environment

    PubMed Central

    Rashidi, Parisa; Cook, Diane J.; Holder, Lawrence B.; Schmitter-Edgecombe, Maureen

    2011-01-01

    The machine learning and pervasive sensing technologies found in smart homes offer unprecedented opportunities for providing health monitoring and assistance to individuals experiencing difficulties living independently at home. In order to monitor the functional health of smart home residents, we need to design technologies that recognize and track activities that people normally perform as part of their daily routines. Although approaches do exist for recognizing activities, the approaches are applied to activities that have been pre-selected and for which labeled training data is available. In contrast, we introduce an automated approach to activity tracking that identifies frequent activities that naturally occur in an individual’s routine. With this capability we can then track the occurrence of regular activities to monitor functional health and to detect changes in an individual’s patterns and lifestyle. In this paper we describe our activity mining and tracking approach and validate our algorithms on data collected in physical smart environments. PMID:21617742

  11. Spatial patterns of cadmium and lead deposition on and adjacent to National Park Service lands in the vicinity of Red Dog Mine, Alaska.

    PubMed

    Hasselbach, L; Ver Hoef, J M; Ford, J; Neitlich, P; Crecelius, E; Berryman, S; Wolk, B; Bohle, T

    2005-09-15

    Heavy metal escapement associated with ore trucks is known to occur along the DeLong Mountain Regional Transportation System (DMTS) haul road corridor in Cape Krusenstern National Monument, northwest Alaska. Heavy metal concentrations in Hylocomium splendens moss (n = 226) were used in geostatistical models to predict the extent and pattern of atmospheric deposition of Cd and Pb on Monument lands. A stratified grid-based sample design was used with more intensive sampling near mine-related activity areas. Spatial predictions were used to produce maps of concentration patterns, and to estimate the total area in 10 moss concentration categories. Heavy metal levels in moss were highest immediately adjacent to the DMTS haul road (Cd > 24 mg/kg dw; Pb > 900 mg/kg dw). Spatial regression analyses indicated that heavy metal deposition decreased with the log of distance from the DMTS haul road and the DMTS port site. Analysis of subsurface soil suggested that observed patterns of heavy metal deposition reflected in moss were not attributable to subsurface lithology at the sample points. Further, moss Pb concentrations throughout the northern half of the study area were high relative to concentrations previously reported from other Arctic Alaska sites. Collectively, these findings indicate the presence of mine-related heavy metal deposition throughout the northern portion of Cape Krusenstern National Monument. Geospatial analyses suggest that the Pb depositional area extends 25 km north of the haul road to the Kisimilot/Iyikrok hills, and possibly beyond. More study is needed to determine whether higher moss heavy metal concentrations in the northernmost portion of the study area reflect deposition from mining-related activities, weathering from mineralized Pb/Zn outcrops in the broader region, or a combination of the two. South of the DMTS haul road, airborne deposition appears to be constrained by the Tahinichok Mountains. 
Heavy metal levels continue to diminish south of the mountains, reaching a minimum in the southernmost portion of the study area near the Igichuk Hills (45 km from the haul road). The influence of the mine site was not studied.
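
    The reported decrease of heavy metal deposition with the log of distance from the haul road can be illustrated with a simple curve fit. The sketch below is not the study's geostatistical model: it is an ordinary least squares fit that ignores spatial autocorrelation, it assumes a base-10 logarithm (the abstract does not name the base), and the distance and concentration values are made up for illustration only.

```python
import math


def fit_log_distance(distances_km, concentrations):
    """Least squares fit of concentration = intercept + slope * log10(distance)."""
    xs = [math.log10(d) for d in distances_km]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(concentrations) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, concentrations))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic example values (NOT data from the study): moss Pb in mg/kg dw
# at increasing distance from a road, constructed to fall on a straight
# line in log10(distance).
distances = [0.1, 1.0, 10.0, 100.0]
moss_pb = [700.0, 500.0, 300.0, 100.0]

intercept, slope = fit_log_distance(distances, moss_pb)
print(f"intercept = {intercept:.1f}, slope per tenfold distance = {slope:.1f}")
```

    A negative slope means that each tenfold increase in distance lowers the predicted concentration by a fixed amount, which is the shape of relationship the abstract describes.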

  12. Historical archaeology at the Clarkson Mine, an eastern Ohio mining complex

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keener, C.S.

    2003-07-01

    This study examines the Clarkson Mine (33BL333), an eastern Ohio coal mine complex dating to the 1910s to 1920s, situated along Wheeling Creek. The results of preliminary surveys and the subsequent mitigation of four structures at the site are presented. The historical archaeology conducted at the site demonstrates the significant research possibilities inherent at many of these early industrial mine complexes. Of particular interest are the findings of depositional patterning around residential structures that revealed the influence of architecture on where and how items were deposited on the land surface. The ceramic and faunal assemblages were analyzed and provide significant details on socioeconomic attributes associated with the workers or staff. Artifacts recovered at the site provide an excellent diagnostic framework from which other similarly aged sites can be compared and dated. The findings at the Clarkson Mine are also placed into a more regional perspective and compared with other contemporary studies.

  13. A Bayesian Scoring Technique for Mining Predictive and Non-Spurious Rules

    PubMed Central

    Batal, Iyad; Cooper, Gregory; Hauskrecht, Milos

    2015-01-01

    Rule mining is an important class of data mining methods for discovering interesting patterns in data. The success of a rule mining method heavily depends on the evaluation function that is used to assess the quality of the rules. In this work, we propose a new rule evaluation score - the Predictive and Non-Spurious Rules (PNSR) score. This score relies on Bayesian inference to evaluate the quality of the rules and considers the structure of the rules to filter out spurious rules. We present an efficient algorithm for finding rules with high PNSR scores. The experiments demonstrate that our method is able to cover and explain the data with a much smaller rule set than existing methods. PMID:25938136

  14. A Bayesian Scoring Technique for Mining Predictive and Non-Spurious Rules.

    PubMed

    Batal, Iyad; Cooper, Gregory; Hauskrecht, Milos

    Rule mining is an important class of data mining methods for discovering interesting patterns in data. The success of a rule mining method heavily depends on the evaluation function that is used to assess the quality of the rules. In this work, we propose a new rule evaluation score - the Predictive and Non-Spurious Rules (PNSR) score. This score relies on Bayesian inference to evaluate the quality of the rules and considers the structure of the rules to filter out spurious rules. We present an efficient algorithm for finding rules with high PNSR scores. The experiments demonstrate that our method is able to cover and explain the data with a much smaller rule set than existing methods.

  15. EMRlog method for computer security for electronic medical records with logic and data mining.

    PubMed

    Martínez Monterrubio, Sergio Mauricio; Frausto Solis, Juan; Monroy Borja, Raúl

    2015-01-01

    Ensuring the proper functioning of a hospital computer system is arduous work for managers and staff. However, inconsistent policies are frequent and can produce enormous problems, such as stolen information, frequent failures, and loss of all or part of the hospital data. This paper presents a new method named EMRlog for computer security systems in hospitals. EMRlog is focused on two kinds of security policies: directive and implemented policies. Security policies are applied to computer systems that handle huge amounts of information such as databases, applications, and medical records. Firstly, a syntactic verification step is applied by using predicate logic. Then data mining techniques are used to detect which security policies have really been implemented by the computer systems staff. Subsequently, consistency is verified in both kinds of policies; in addition these subsets are contrasted and validated. This is performed by an automatic theorem prover. Thus, many kinds of vulnerabilities can be removed for achieving a safer computer system.

  16. EMRlog Method for Computer Security for Electronic Medical Records with Logic and Data Mining

    PubMed Central

    Frausto Solis, Juan; Monroy Borja, Raúl

    2015-01-01

    Ensuring the proper functioning of a hospital computer system is arduous work for managers and staff. However, inconsistent policies are frequent and can produce enormous problems, such as stolen information, frequent failures, and loss of all or part of the hospital data. This paper presents a new method named EMRlog for computer security systems in hospitals. EMRlog is focused on two kinds of security policies: directive and implemented policies. Security policies are applied to computer systems that handle huge amounts of information such as databases, applications, and medical records. Firstly, a syntactic verification step is applied by using predicate logic. Then data mining techniques are used to detect which security policies have really been implemented by the computer systems staff. Subsequently, consistency is verified in both kinds of policies; in addition these subsets are contrasted and validated. This is performed by an automatic theorem prover. Thus, many kinds of vulnerabilities can be removed for achieving a safer computer system. PMID:26495300

  17. Discovering weighted patterns in intron sequences using self-adaptive harmony search and back-propagation algorithms.

    PubMed

    Huang, Yin-Fu; Wang, Chia-Ming; Liou, Sing-Wu

    2013-01-01

    A hybrid self-adaptive harmony search and back-propagation mining system was proposed to discover weighted patterns in human intron sequences. By testing the weights under a lazy nearest neighbor classifier, the numerical results revealed the significance of these weighted patterns. Comparing these weighted patterns with the popular intron consensus model, it is clear that the discovered weighted patterns make the originally ambiguous 5SS and 3SS header patterns more specific and concrete.

  18. Discovering Weighted Patterns in Intron Sequences Using Self-Adaptive Harmony Search and Back-Propagation Algorithms

    PubMed Central

    Wang, Chia-Ming; Liou, Sing-Wu

    2013-01-01

    A hybrid self-adaptive harmony search and back-propagation mining system was proposed to discover weighted patterns in human intron sequences. By testing the weights under a lazy nearest neighbor classifier, the numerical results revealed the significance of these weighted patterns. Comparing these weighted patterns with the popular intron consensus model, it is clear that the discovered weighted patterns make the originally ambiguous 5SS and 3SS header patterns more specific and concrete. PMID:23737711

  19. Mapping Hydrothermal Alteration Zones at a Sediment-Hosted Gold Deposit - Goldstrike Mining District, Utah, Using Ground-Based Hyperspectral Imaging

    NASA Astrophysics Data System (ADS)

    Krupnik, D.; Khan, S.; Crockett, M.

    2017-12-01

    Understanding the origin, genesis, and depositional and structural mechanisms of gold mineralization, together with detailed mapping of gold-bearing mineral phases at centimeter scale, can be useful for exploration. This work was conducted in the Goldstrike mining district near St. George, UT, a structurally complex region which contains Carlin-style disseminated gold deposits in permeable sedimentary layers near high-angle fault zones. These fault zones are likely a conduit for gold-bearing hydrothermal fluids, are silicified, and are frequently gold-bearing. Alteration patterns are complex, difficult to distinguish visually, composed of several phases, and vary significantly over centimeter to meter scale distances. This makes identifying and quantifying the extent of the target zones costly, time consuming, and discontinuous with traditional geochemical methods. A ground-based hyperspectral scanning system with sensors collecting data in the Visible Near Infrared (VNIR) and Short-Wave Infrared (SWIR) portions of the electromagnetic spectrum is utilized for close-range outcrop scanning. Scans were taken of vertical exposures of both gold-bearing and barren silicified rocks (jasperoids), with the intent to produce images which delineate and quantify the extent of each phase of alteration, in combination with discrete geochemical data. This ongoing study produces mineralogical maps of surface minerals at centimeter scale, with the intent of mapping original and alteration minerals. This efficient method of outcrop characterization increases our understanding of fluid flow and alteration of economic deposits.

  20. Land Use on the Island of Oahu, Hawaii, 1998

    USGS Publications Warehouse

    Klasner, Frederick L.; Mikami, Clinton D.

    2003-01-01

    A hierarchical land-use classification system for Hawaii was developed, and land use on the island of Oahu was mapped. The land-use classification system emphasizes agriculture, developed (urban), and barren/mining uses. Areas with other land uses (conservation, forest reserve, natural areas, wetlands, water, and barren [sand, rock, or soil] regions, and unmanaged vegetation [native or exotic]) were defined as 'other.' Multiple sources of digital orthophotographs from 1998 and 1999 were used as source data. The 1998 island of Oahu land-use data are provided in digital format at http://water.usgs.gov/lookup/getspatial?oahu_lu98 for use in a Geographic Information System (GIS), at 1:24,000-scale with minimum mapping units of 2 hectares (4.9 acres) area and 30-meters (98.4 feet) feature width. In 1998, a total of 59,195 acres (15.4 percent) of the island of Oahu were classified as agricultural land use; 98,663 acres (25.7 percent) were classified as developed; 1,522 acres (0.4 percent) were classified as barren/mining; and 224,331 acres (58.5 percent) were classified as other. An accuracy assessment identified 98 percent accuracy for all land-use classes. In windward (moister) areas, dense vegetation and canopy cover along with rapid recolonization by vegetation potentially obscured land use from photo-interpretation. In leeward (drier) areas, by contrast, sparse vegetative cover and slower vegetation recolonization may have resulted in more frequent recognition of apparent land-use patterns.
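
    The acreage shares and unit conversions quoted in the abstract above can be reproduced with a few lines of arithmetic. In the sketch below the four acreages are taken from the abstract. The conversion factors (about 2.4710538 acres per hectare and 3.28084 feet per meter) are standard values supplied here, and the variable names are illustrative.

```python
# Acreages by land-use class, as reported in the abstract.
ACRES = {
    "agricultural": 59_195,
    "developed": 98_663,
    "barren/mining": 1_522,
    "other": 224_331,
}

total_acres = sum(ACRES.values())

# Share of each class, in percent of the mapped total, to one decimal place.
shares = {name: round(100.0 * acres / total_acres, 1) for name, acres in ACRES.items()}

for name, share in shares.items():
    print(f"{name}: {ACRES[name]:,} acres = {share} percent")
print(f"total mapped: {total_acres:,} acres")

# Standard conversion factors (supplied here, not stated in the abstract).
ACRES_PER_HECTARE = 2.4710538
FEET_PER_METER = 3.28084

min_unit_acres = round(2 * ACRES_PER_HECTARE, 1)   # 2-hectare minimum mapping unit
min_width_feet = round(30 * FEET_PER_METER, 1)     # 30-meter minimum feature width
print(f"2 ha = {min_unit_acres} acres; 30 m = {min_width_feet} feet")
```

    The four classes sum to 383,711 acres, and the computed shares agree with the percentages given in the abstract.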

  1. A Note on Interfacing Object Warehouses and Mass Storage Systems for Data Mining Applications

    NASA Technical Reports Server (NTRS)

    Grossman, Robert L.; Northcutt, Dave

    1996-01-01

    Data mining is the automatic discovery of patterns, associations, and anomalies in data sets. Data mining requires numerically and statistically intensive queries. Our assumption is that data mining requires a specialized data management infrastructure to support these intensive queries, but because of the sizes of data involved, this infrastructure is layered over a hierarchical storage system. In this paper, we discuss the architecture of a system which is layered for modularity, but exploits specialized lightweight services to maintain efficiency. Rather than use a full-function database, for example, we use lightweight object services specialized for data mining. We propose using information repositories between layers so that components on either side of the layer can access information in the repositories to assist in making decisions about data layout, the caching and migration of data, the scheduling of queries, and related matters.

  2. A comparison of Eichhornia crassipes (Pontederiaceae) and Sphagnum quinquefarium (Sphagnaceae) in treatment of acid mine water

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Falbo, M.B.; Weaks, T.E.

    Tests were conducted under greenhouse conditions to evaluate the ability of Eichhornia crassipes (Pontederiaceae) and Sphagnum quinquefarium (Sphagnaceae) to ameliorate acid mine water discharged from coal operations. In addition, the survivorship and growth rate of E. crassipes (water-hyacinth), cultured in toxic acid mine water, were determined. The results of both short- and long-term studies indicated that E. crassipes readily reduced levels of heavy metals in acid mine water while the plants exhibited few signs of toxicity. Patterns of reduction of pollutants for both E. crassipes and S. quinquefarium indicated that treatment efficiency could be improved by the periodic harvesting of plants. It is suggested that the ease with which water-hyacinths can be introduced into wetlands and harvested cannot be economically duplicated with other plants currently in use in treating acid mine water.

  3. Study on online community user motif using web usage mining

    NASA Astrophysics Data System (ADS)

    Alphy, Meera; Sharma, Ajay

    2016-04-01

    Web usage mining is the application of data mining to extract useful information from online communities. On Thursday, 6 August 2015, the World Wide Web contained at least 4.73 billion pages according to the Indexed Web and at least 228.52 million pages according to the Dutch Indexed Web. It is difficult to obtain needed data from these billions of web pages, which is why web usage mining is important. Personalizing the search engine helps web users identify the most used data easily. It reduces time consumption and supports automatic site search and automatic restoration of useful sites. This study reviews the techniques, from the oldest to the latest, used in pattern discovery and analysis in web usage mining from 1996 to 2015. Analyzing user motifs helps improve business, e-commerce, personalisation, and websites.

  4. Activity recognition from minimal distinguishing subsequence mining

    NASA Astrophysics Data System (ADS)

    Iqbal, Mohammad; Pao, Hsing-Kuo

    2017-08-01

    Human activity recognition is one of the most important research topics in the era of Internet of Things. To separate different activities given sensory data, we utilize a Minimal Distinguishing Subsequence (MDS) mining approach to efficiently find distinguishing patterns among different activities. We first transform the sensory data into a series of sensor triggering events and operate the MDS mining procedure afterwards. The gap constraints are also considered in the MDS mining. Given the multi-class nature of most activity recognition tasks, we modify the MDS mining approach from a binary case to a multi-class one to fit the need for multiple activity recognition. We also study how to select the best parameter set including the minimal and the maximal support thresholds in finding the MDSs for effective activity recognition. Overall, the prediction accuracy is 86.59% on the van Kasteren dataset which consists of four different activities for recognition.
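
    The idea of a distinguishing subsequence under a gap constraint can be sketched in a few functions. This is a simplified illustration, not the authors' algorithm: it only tests a given candidate pattern, it omits the minimality check that a full MDS miner performs, and it handles the multi-class case with a one-vs-rest comparison, which is an assumption. The sensor event names and the threshold values are hypothetical.

```python
def occurs(pattern, sequence, max_gap):
    """Return True if pattern is a subsequence of sequence with at most
    max_gap events skipped between consecutive matched items."""
    def search(p_idx, start, is_first):
        if p_idx == len(pattern):
            return True
        # The first item may match anywhere; each later item must appear
        # within max_gap skipped events of the previous match.
        end = len(sequence) if is_first else min(len(sequence), start + max_gap + 1)
        for i in range(start, end):
            if sequence[i] == pattern[p_idx] and search(p_idx + 1, i + 1, False):
                return True
        return False
    return search(0, 0, True)


def support(pattern, sequences, max_gap):
    """Fraction of sequences that contain the pattern under the gap constraint."""
    if not sequences:
        return 0.0
    return sum(occurs(pattern, s, max_gap) for s in sequences) / len(sequences)


def is_distinguishing(pattern, target, others, max_gap, min_sup, max_sup):
    """Frequent in the target class (>= min_sup), rare elsewhere (<= max_sup)."""
    return (support(pattern, target, max_gap) >= min_sup
            and support(pattern, others, max_gap) <= max_sup)


# Hypothetical sensor-event sequences for two activities.
cooking = [["door", "light", "stove"], ["light", "door", "stove"], ["door", "stove", "tap"]]
sleeping = [["door", "light", "bed"], ["stove", "door", "bed"]]

candidate = ("door", "stove")
print(support(candidate, cooking, max_gap=1))
print(support(candidate, sleeping, max_gap=1))
print(is_distinguishing(candidate, cooking, sleeping, max_gap=1, min_sup=0.6, max_sup=0.2))
```

    For more than two activities, the `others` argument would hold the sequences of every class except the target, which is one simple way to extend the binary comparison.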

  5. Survey of Analysis of Crime Detection Techniques Using Data Mining and Machine Learning

    NASA Astrophysics Data System (ADS)

    Prabakaran, S.; Mitra, Shilpa

    2018-04-01

    Data mining is the field containing procedures for finding designs or patterns in a huge dataset; it includes strategies at the convergence of machine learning and database frameworks. It can be applied to various fields such as future healthcare, market basket analysis, education, manufacturing engineering, and crime investigation. Among these, crime investigation is an interesting application that processes crime characteristics to help society achieve a better standard of living. This paper surveys various data mining techniques used in this domain. This study may be helpful in designing new strategies for crime prediction and analysis.

  6. Autumn olive (Elaeagnus umbellata) presence and proliferation on former surface coal mines in Eastern USA

    USGS Publications Warehouse

    Oliphant, Adam J.; Wynne, R.H.; Zipper, Carl E.; Ford, W. Mark; Donovan, P. F.; Li, Jing

    2017-01-01

    Invasive plants threaten native plant communities. Surface coal mines in the Appalachian Mountains are among the most disturbed landscapes in North America, but information about land cover characteristics of Appalachian mined lands is lacking. The invasive shrub autumn olive (Elaeagnus umbellata) occurs on these sites and interferes with ecosystem recovery by outcompeting native trees, thus inhibiting re-establishment of the native woody-plant community. We analyzed Landsat 8 satellite imagery to describe autumn olive’s distribution on post-mined lands in southwestern Virginia within the Appalachian coalfield. Eight images from April 2013 through January 2015 served as input data. Calibration and validation data obtained from high-resolution aerial imagery were used to develop a land cover classification model that identified areas where autumn olive was a primary component of land cover. Results indicate that autumn olive cover was sufficiently dense to enable detection on approximately 12.6 % of post-mined lands within the study area. The classified map had user’s and producer’s accuracies of 85.3 and 78.6 %, respectively, for the autumn olive coverage class. Overall accuracy was assessed in reference to an independent validation dataset at 96.8 %. Autumn olive was detected more frequently on mines disturbed prior to 2003, the last year of known plantings, than on lands disturbed by more recent mining. These results indicate that autumn olive growing on reclaimed coal mines in Virginia and elsewhere in eastern USA can be mapped using Landsat 8 Operational Land Imager imagery; and that autumn olive occurrence is a significant landscape vegetation feature on former surface coal mines in the southwestern Virginia segment of the Appalachian coalfield.
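
    The accuracy measures quoted above follow from a confusion matrix. The sketch below shows the standard definitions, assuming rows hold the mapped (classified) labels and columns hold the reference labels; the counts are hypothetical and are not the study's data.

```python
def accuracies(matrix, k):
    """Return (user's, producer's, overall) accuracy for class index k.

    matrix[i][j] = number of samples mapped as class i whose reference
    (validation) label is class j.
    """
    correct = matrix[k][k]
    mapped_total = sum(matrix[k])                    # everything mapped as k
    reference_total = sum(row[k] for row in matrix)  # everything truly k
    total = sum(sum(row) for row in matrix)
    overall = sum(matrix[i][i] for i in range(len(matrix))) / total
    users = correct / mapped_total         # 1 - commission error
    producers = correct / reference_total  # 1 - omission error
    return users, producers, overall


# Hypothetical counts: class 0 = autumn olive, class 1 = everything else.
matrix = [
    [40, 10],
    [5, 945],
]

users, producers, overall = accuracies(matrix, 0)
print(round(users, 3), round(producers, 3), round(overall, 3))
```

    A rare class can show modest user's and producer's accuracies while the overall accuracy stays high, because the overall figure is dominated by the abundant class.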

  7. Examining health and well-being outcomes associated with mining activity in rural communities of high-income countries: A systematic review.

    PubMed

    Mactaggart, Fiona; McDermott, Liane; Tynan, Anna; Gericke, Christian

    2016-08-01

    It is recognised internationally that rural communities often experience greater barriers to accessing services and have poorer health outcomes compared to urban communities. In some settings, health disparities may be further exacerbated by mining activity, which can affect the social, physical and economic environment in which rural communities reside. Direct environmental health impacts are often associated with mining activity and are frequently investigated. However, there is evidence of broader, indirect health and well-being implications emerging in the literature. This systematic review examines these health and well-being outcomes in communities living in proximity to mining in high-income countries, and, in doing so, discusses their possible determinants. Four databases were systematically searched. Articles were selected if adult residents in mining communities were studied and outcomes were related to health or individual or community-level well-being. A narrative synthesis was conducted. Sixteen publications were included. Evidence of increased prevalence of chronic diseases and poor self-reported health status was reported in the mining communities. Relationship breakdown and poor family health, lack of social connectedness and decreased access to health services were also reported. Changes to the physical landscape; risky health behaviours; shift work of partners in the mine industry; social isolation and cyclical nature of 'boom and bust' activity contributed to poorer outcomes in the communities. This review highlights the broader health and well-being outcomes associated with mining activity that should be monitored and addressed in addition to environmental health impacts to support co-existence of mining activities and rural communities. © 2016 National Rural Health Alliance Inc.

  8. Sleep Patterns of Naval Aviation Personnel Conducting Mine Hunting Operations

    DTIC Science & Technology

    2006-09-01

    Author: Bennett Solberg. Performing organization: Naval Postgraduate School, Monterey, CA 93943-5000. ...human performance, resulting in predictable changes not only on the individual level but also on the system as a whole. This descriptive study

  9. Population Change in West Virginia 1950-1970. West Virginia University Agricultural and Forestry Experiment Station Bulletin 658.

    ERIC Educational Resources Information Center

    Sizer, Leonard M.

    Growth patterns of the national economy during the 1950's and 1960's have not been shared by the state of West Virginia; towns and rural areas have lost population and job opportunities have declined. The switch to petroleum products and advanced mining technology displaced many coal mine workers. A national food surplus and the difficulty in…

  10. Practice-Relevant Pedagogy for Mining Software Engineering Curricula Assets

    DTIC Science & Technology

    2007-06-20

    permits the application of the Lean methods by virtually grouping shared services into eWorkcenters to which only non-routine requests are routed...engineering can be applied to IT shared services improvement and provide precise system improvement methods to complement the ITIL best practice. This...Vertical or internal service-chain of primary business functions and enabling shared services Framework results - Mined patterns that relate

  11. Development of floristic diversity in 10-year-old restoration forests on a bauxite mined site in Amazonia.

    Treesearch

    J. A. Parrotta; O. H. Knowles; J.M. Wunderle Jr.

    1997-01-01

    Patterns of plant and animal diversity were studied in a 10-year-old native species reforestation area at a bauxite-mined site at Porto Trombetas in western Para State, Brazil. Understorey and overstorey floristic composition and structure, understorey light conditions, forest floor development and soil properties were evaluated in a total of 38 78.5-m2

  12. Exploring Online Students' Self-Regulated Learning with Self-Reported Surveys and Log Files: A Data Mining Approach

    ERIC Educational Resources Information Center

    Cho, Moon-Heum; Yoo, Jin Soung

    2017-01-01

    Many researchers who are interested in studying students' online self-regulated learning (SRL) have heavily relied on self-reported surveys. Data mining is an alternative technique that can be used to discover students' SRL patterns from large data logs saved on a course management system. The purpose of this study was to identify students' online…

  13. Pattern formation in mass conserving reaction-diffusion systems

    NASA Astrophysics Data System (ADS)

    Brauns, Fridtjof; Halatek, Jacob; Frey, Erwin

    We present a rigorous theoretical framework able to generalize and unify pattern formation for quantitative mass conserving reaction-diffusion models. Mass redistribution controls chemical equilibria locally. Separation of diffusive mass redistribution on the level of conserved species provides a general mathematical procedure to decompose complex reaction-diffusion systems into effectively independent functional units, and to reveal the general underlying bifurcation scenarios. We apply this framework to Min protein pattern formation and identify the mechanistic roles of both involved protein species. MinD generates polarity through phase separation, whereas MinE takes the role of a control variable regulating the existence of MinD phases. Hence, polarization and not oscillations is the generic core dynamics of Min proteins in vivo. This establishes an intrinsic mechanistic link between the Min system and a broad class of intracellular pattern forming systems based on bistability and phase separation (wave-pinning). Oscillations are facilitated by MinE redistribution and can be understood mechanistically as relaxation oscillations of the polarization direction.

  14. Knowledge Discovery and Data Mining in Iran's Climatic Researches

    NASA Astrophysics Data System (ADS)

    Karimi, Mostafa

    2013-04-01

    Advances in measurement technology and data collection have made databases larger, and large databases require powerful tools for data analysis. The iterative process of acquiring knowledge from information obtained through data processing takes various forms in all scientific fields. However, when the data volume is large, traditional methods cannot cope with many of the problems. In recent years, the use of databases in various scientific fields, especially atmospheric databases in climatology, has expanded. In addition, the growing amount of data generated by climate models makes it a challenge to analyze these data and extract hidden patterns and knowledge. The approach taken to this problem in recent years uses the process of knowledge discovery and data mining techniques, drawing on concepts from machine learning, artificial intelligence and expert (professional) systems. Data mining is an analytical process for mining massive volumes of data. The ultimate goal of data mining is access to information and, finally, knowledge. Climatology is a branch of science that uses varied and massive volumes of data. The goal of climate data mining is to obtain information from varied and massive atmospheric and non-atmospheric data. In fact, knowledge discovery performs these activities in a logical, predetermined and almost automatic process. The goal of this research is to study the use of knowledge discovery and data mining techniques in Iranian climate research. To achieve this goal, the study applies content (descriptive) analysis and classifies the research by method and issue. The results show that, in climatic research on Iran, clustering (k-means and Ward's) is the most applied method and, in terms of issues, precipitation and atmospheric circulation patterns are the most studied. Although several studies on geography and climate issues have used statistical techniques such as clustering and pattern extraction, given the nature of statistics and data mining it cannot be said that data mining and knowledge discovery techniques are used in domestic climate studies. However, it is necessary to use the KDD approach and DM techniques in climatic studies, specifically for interpreting climate modeling results.

  15. Prescription pattern of traditional Chinese medicine for climacteric women in Taiwan.

    PubMed

    Yang, Y-H; Chen, P-C; Wang, J-D; Lee, C-H; Lai, J-N

    2009-12-01

    Traditional Chinese medicine (TCM) has become more popular as a therapy for symptom relief among menopause-aged women. The aim of this study was to analyze the utilization of TCM for climacteric women in Taiwan. The study analyzed frequency distributions among 19 379 women aged 45-55 years, recruited from a random-sampled cohort of 200 000 people from the National Health Insurance database. Data mining was conducted to explore the co-prescription patterns for finished herbal products (FHP). There were 19 379 women aged 45-55 years in the sample; of these, 12 572 (64.9%) utilized TCM services at least once. A total of 4078 (21.0%) of the 19 379 climacteric women utilized 145 200 (79.2%) TCM visits. Of these, 39 802 (21.7%) visits were because of diseases of the musculoskeletal system and connective tissue, of which more than half were treated with acupuncture and traumatology manipulative therapies. There were 28 154 visits with FHP prescriptions because of non-specific symptoms and ill-defined conditions, and Jia-wei-xiao-yao-san was the most frequent formula. Nearly two-thirds of FHP contained more than two herbal formulae. Women of climacteric age in Taiwan utilized TCM more often than other age groups. To deal with multiple symptoms and/or diseases among climacteric women, new prescription patterns of combining two or more herbal formulae have evolved. Studies on safety issues and drug-herb interactions are warranted for future research.

  16. Odiel River, acid mine drainage and current characterisation by means of univariate analysis.

    PubMed

    Sainz, A; Grande, J A; de la Torre, M L

    2003-04-01

    Water pollution caused by sulfide oxidation responds to two geochemical processes: a natural one of temporal patterns, and the 'acid mine drainage', an accelerated process derived from the extractive activity. The Odiel River is located in Southwestern Spain; it flows to the south and into the Atlantic Ocean after joining the Tinto River near its mouth, forming a common estuary. There are three kinds of metallic mining in the Odiel River Basin: manganese, gold and silver, and pyrite mining, the latter being the most important in this basin, which is the object of this study. The main objective of the present study is centred in the characterisation of the sources responsible for the 'acid mine drainage' processes in the Odiel River Basin, through the sampling and subsequent chemical and statistical analyses of water samples collected in three types of sources: mine dumps, active mines and abandoned mines. The main conclusion is that mean pH values in the target area are remarkably lower than those in other active and abandoned mines outside of the study zone. On the contrary, mean values for heavy metal sulfates are much higher. Regarding mine dumps, mean values for pH, sulfates and heavy metals are within a similar range to those data known for areas outside the study zone. Copyright 2003 Elsevier Science Ltd.

  17. Research on Occupational Safety, Health Management and Risk Control Technology in Coal Mines.

    PubMed

    Zhou, Lu-Jie; Cao, Qing-Gui; Yu, Kai; Wang, Lin-Lin; Wang, Hai-Bin

    2018-04-26

    This paper studies the occupational safety and health management methods as well as risk control technology associated with the coal mining industry, including daily management of occupational safety and health, identification and assessment of risks, early warning and dynamic monitoring of risks, etc.; also, a B/S mode software (Geting Coal Mine, Jining, Shandong, China), i.e., Coal Mine Occupational Safety and Health Management and Risk Control System, is developed to attain the aforementioned objectives, namely promoting the coal mine occupational safety and health management based on early warning and dynamic monitoring of risks. Furthermore, the practical effectiveness and the associated pattern for applying this software package to coal mining is analyzed. The study indicates that the presently developed coal mine occupational safety and health management and risk control technology and the associated software can support the occupational safety and health management efforts in coal mines in a standardized and effective manner. It can also control the accident risks scientifically and effectively; its effective implementation can further improve the coal mine occupational safety and health management mechanism, and further enhance the risk management approaches. Besides, its implementation indicates that the occupational safety and health management and risk control technology has been established based on a benign cycle involving dynamic feedback and scientific development, which can provide a reliable assurance to the safe operation of coal mines.

  18. Research on Occupational Safety, Health Management and Risk Control Technology in Coal Mines

    PubMed Central

    Zhou, Lu-jie; Cao, Qing-gui; Yu, Kai; Wang, Lin-lin; Wang, Hai-bin

    2018-01-01

    This paper studies the occupational safety and health management methods as well as risk control technology associated with the coal mining industry, including daily management of occupational safety and health, identification and assessment of risks, early warning and dynamic monitoring of risks, etc.; also, a B/S mode software (Geting Coal Mine, Jining, Shandong, China), i.e., Coal Mine Occupational Safety and Health Management and Risk Control System, is developed to attain the aforementioned objectives, namely promoting the coal mine occupational safety and health management based on early warning and dynamic monitoring of risks. Furthermore, the practical effectiveness and the associated pattern for applying this software package to coal mining is analyzed. The study indicates that the presently developed coal mine occupational safety and health management and risk control technology and the associated software can support the occupational safety and health management efforts in coal mines in a standardized and effective manner. It can also control the accident risks scientifically and effectively; its effective implementation can further improve the coal mine occupational safety and health management mechanism, and further enhance the risk management approaches. Besides, its implementation indicates that the occupational safety and health management and risk control technology has been established based on a benign cycle involving dynamic feedback and scientific development, which can provide a reliable assurance to the safe operation of coal mines. PMID:29701715

  19. Proceedings: Fourth Workshop on Mining Scientific Datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kamath, C

    Commercial applications of data mining in areas such as e-commerce, market-basket analysis, text-mining, and web-mining have taken on a central focus in the JCDD community. However, there is a significant amount of innovative data mining work taking place in the context of scientific and engineering applications that is not well represented in the mainstream KDD conferences. For example, scientific data mining techniques are being developed and applied to diverse fields such as remote sensing, physics, chemistry, biology, astronomy, structural mechanics, computational fluid dynamics etc. In these areas, data mining frequently complements and enhances existing analysis methods based on statistics, exploratory data analysis, and domain-specific approaches. On the surface, it may appear that data from one scientific field, say genomics, is very different from another field, such as physics. However, despite their diversity, there is much that is common across the mining of scientific and engineering data. For example, techniques used to identify objects in images are very similar, regardless of whether the images came from a remote sensing application, a physics experiment, an astronomy observation, or a medical study. Further, with data mining being applied to new types of data, such as mesh data from scientific simulations, there is the opportunity to apply and extend data mining to new scientific domains. This one-day workshop brings together data miners analyzing science data and scientists from diverse fields to share their experiences, learn how techniques developed in one field can be applied in another, and better understand some of the newer techniques being developed in the KDD community. This is the fourth workshop on the topic of Mining Scientific Data sets; for information on earlier workshops, see http://www.ahpcrc.org/conferences/. This workshop continues the tradition of addressing challenging problems in a field where the diversity of applications is matched only by the opportunities that await a practitioner.

  20. Mining and harnessing natural variation - a little MAGIC

    USDA-ARS's Scientific Manuscript database

    As has been frequently noted, exotic germplasm (lines unadapted to local conditions) can be sources of very beneficial genes. The trouble is that it's often difficult to identify these genes. We propose an approach in which mutations can be used to uncover useful variants of natural genes....

  1. The requirements for implementing Sustainable Development Goals (SDGs) and for planning and implementing Integrated Territorial Investments (ITI) in mining areas

    NASA Astrophysics Data System (ADS)

    Florkowska, Lucyna; Bryt-Nitarska, Izabela

    2018-04-01

    The notion of Integrated Territorial Investments (ITI) appears more and more frequently in contemporary regional development strategies. Formulating the main assumptions of ITI is a response to a growing need for co-ordinated, multi-dimensional regional development suited to the characteristics of a given area. Activities are mainly aimed at improving people's quality of life with their significant participation. These activities include implementing the Sustainable Development Goals (SDGs). Territorial investments include, among others, projects in areas where land and building use is governed not only by general regulations (Spatial Planning and Land Development Act) but also by separate legal acts. This issue also concerns areas with active mines and post-mining areas undergoing revitalization. For the areas specified above, land development, and in particular making building investments, is subject to the requirements set forth in the Geological and Mining Law and in the general regulations. In practice this means that factors connected with the present and future mining impacts must be taken into consideration in planning the investment process. This article discusses the role of proper assessment of local geological conditions as well as the current and future mining situation in the context of proper planning and performance of the Integrated Territorial Investment programme and also in the context of implementing the SDGs. It also describes the technical and legislative factors which need to be taken into consideration in areas where mining is planned or where it took place in the past.

  2. HC StratoMineR: A Web-Based Tool for the Rapid Analysis of High-Content Datasets.

    PubMed

    Omta, Wienand A; van Heesbeen, Roy G; Pagliero, Romina J; van der Velden, Lieke M; Lelieveld, Daphne; Nellen, Mehdi; Kramer, Maik; Yeong, Marley; Saeidi, Amir M; Medema, Rene H; Spruit, Marco; Brinkkemper, Sjaak; Klumperman, Judith; Egan, David A

    2016-10-01

    High-content screening (HCS) can generate large multidimensional datasets and when aligned with the appropriate data mining tools, it can yield valuable insights into the mechanism of action of bioactive molecules. However, easy-to-use data mining tools are not widely available, with the result that these datasets are frequently underutilized. Here, we present HC StratoMineR, a web-based tool for high-content data analysis. It is a decision-supportive platform that guides even non-expert users through a high-content data analysis workflow. HC StratoMineR is built by using My Structured Query Language for storage and querying, PHP: Hypertext Preprocessor as the main programming language, and jQuery for additional user interface functionality. R is used for statistical calculations, logic and data visualizations. Furthermore, C++ and graphical processor unit power is diffusely embedded in R by using the rcpp and rpud libraries for operations that are computationally highly intensive. We show that we can use HC StratoMineR for the analysis of multivariate data from a high-content siRNA knock-down screen and a small-molecule screen. It can be used to rapidly filter out undesirable data; to select relevant data; and to perform quality control, data reduction, data exploration, morphological hit picking, and data clustering. Our results demonstrate that HC StratoMineR can be used to functionally categorize HCS hits and, thus, provide valuable information for hit prioritization.

  3. Effects of coal mining, forestry, and road construction on southern Appalachian stream invertebrates and habitats.

    PubMed

    Gangloff, Michael M; Perkins, Michael; Blum, Peter W; Walker, Craig

    2015-03-01

    Coal has been extracted via surface and sub-surface mining for decades throughout the Appalachian Mountains. New interest in ridge-top mining has raised concerns about possible waterway impacts. We examined effects of forestry, mining, and road construction-based disturbance on physico-chemistry and macroinvertebrate communities in east-central Tennessee headwater streams. Although 11 of 30 sites failed Tennessee's biocriteria scoring system, invertebrate richness was moderately high and we did not find significant differences in any water chemistry or habitat parameters between sites with passing and failing scores. However, conductivity and dissolved solid concentrations appeared elevated in the majority of study streams. Principal components (PCs) analysis indicated that six PCs accounted for ~77 % of among-site habitat variability. One PC associated with dissolved oxygen and specific conductance explained the second highest proportion of among-site variability after catchment area. Specific conductance was not correlated with catchment area but was strongly correlated with mining activity. Composition and success of multivariate models using habitat PCs to predict macroinvertebrate metrics was highly variable. PC scores associated with water chemistry and substrate composition were most frequently included in significant models. These results suggest that impacts of historical and current coal mining remain a source of water quality and macroinvertebrate community impairment in this region, but effects are subtle. Our results suggest that surface mining may have chronic and system-wide effects on habitat conditions and invertebrate communities in Cumberland Plateau streams.

  4. Course-Taking Patterns of Community College Students Beginning in STEM: Using Data Mining Techniques to Reveal Viable STEM Transfer Pathways

    ERIC Educational Resources Information Center

    Wang, Xueli

    2016-01-01

    This research focuses on course-taking patterns of beginning community college students enrolled in one or more non-remedial science, technology, engineering, and mathematics (STEM) courses during their first year of college, and how these patterns are mapped against upward transfer in STEM fields of study. Drawing upon postsecondary transcript…

  5. Trend Motif: A Graph Mining Approach for Analysis of Dynamic Complex Networks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jin, R; McCallen, S; Almaas, E

    2007-05-28

    Complex networks have been used successfully in scientific disciplines ranging from sociology to microbiology to describe systems of interacting units. Until recently, studies of complex networks have mainly focused on their network topology. However, in many real world applications, the edges and vertices have associated attributes that are frequently represented as vertex or edge weights. Furthermore, these weights are often not static, instead changing with time and forming a time series. Hence, to fully understand the dynamics of the complex network, we have to consider both network topology and related time series data. In this work, we propose a motif mining approach to identify trend motifs for such purposes. Simply stated, a trend motif describes a recurring subgraph where each of its vertices or edges displays similar dynamics over a user-defined period. Given this, each trend motif occurrence can help reveal significant events in a complex system; frequent trend motifs may aid in uncovering dynamic rules of change for the system, and the distribution of trend motifs may characterize the global dynamics of the system. Here, we have developed efficient mining algorithms to extract trend motifs. Our experimental validation using three disparate empirical datasets, ranging from the stock market, world trade, to a protein interaction network, has demonstrated the efficiency and effectiveness of our approach.

  6. [Features of Professor Ma Kun's medication in treating ovulatory infertility].

    PubMed

    Tong, Ya-Jing; Zhang, Hui-Xian; Chen, Yan-Xia; Dong, Mei-Ling; Ma, Kun

    2017-12-01

    In order to analyze Professor Ma Kun's medication in treating anovulatory infertility, her prescriptions for treating anovulatory infertility in 2012-2015 were collected. The medication features and the regularity of prescriptions were mined by using the traditional Chinese medicine inheritance support system, association rules, complex system entropy clustering and other mining methods. Finally, a total of 684 prescriptions and 300 kinds of herbs were screened out, with a total frequency of 11 156 times; and 68 core combinations and 8 new prescriptions were mined. The top three frequently used herbs by effect were respectively tonic herbs, blood circulation promoting herbs, and Qi-circulation promoting herbs. The top three tastes were sweetness, bitterness and pungent flavor. The results showed 28 herbs with a high frequency of ≥100. The top 10 frequently used herbs were respectively Angelica Sinensis Radix, Cyperi Rhizoma, Chuanxiong Rhizome, Paeoniae Radix Rubra, Cyathulae Radix, Taxilli Herba, Cuscutae Semen, Codonopsis Radix, Ligustri Lucidi Fructus, Paeoniae Alba and Paeoniae Radix Alba. The association rules analysis showed commonly used herbal pairs, including Rehmanniae Radix Preparata-Chuanxiong Rhizome, Rehmanniae Radix Preparata-Angelica Sinensis Radix, Cuscutae Semen-Dipsaci Radix. In conclusion, Professor Ma has treated anovulatory infertility by nourishing the kidney and activating blood throughout the treatment course, and attached importance to the relationship between Qi and blood and the regulation of liver, spleen and kidney in treating anovulatory infertility. Copyright© by the Chinese Pharmaceutical Association.
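
    How association rules surface herbal pairs can be shown with a small sketch. It assumes the usual support and confidence definitions and does not reproduce the inheritance support system used in the study; the function name, the thresholds and the four prescriptions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(prescriptions, min_support, min_confidence):
    """Return rules (lhs, rhs, support, confidence) over herb pairs.

    support(lhs, rhs) = share of prescriptions containing both herbs
    confidence(lhs -> rhs) = support(lhs, rhs) / support(lhs)
    """
    n = len(prescriptions)
    herb_counts = Counter(h for p in prescriptions for h in set(p))
    pair_counts = Counter(
        pair for p in prescriptions for pair in combinations(sorted(set(p)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        if count / n < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / herb_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, count / n, confidence))
    return sorted(rules)


# Hypothetical prescriptions; herb names are borrowed from the abstract.
prescriptions = [
    ["Angelica Sinensis Radix", "Chuanxiong Rhizome", "Cyperi Rhizoma"],
    ["Angelica Sinensis Radix", "Chuanxiong Rhizome"],
    ["Angelica Sinensis Radix", "Cuscutae Semen"],
    ["Cuscutae Semen", "Dipsaci Radix"],
]

print(pair_rules(prescriptions, min_support=0.5, min_confidence=0.9))
# [('Chuanxiong Rhizome', 'Angelica Sinensis Radix', 0.5, 1.0)]
```

    Only one direction of the pair passes the confidence threshold here: every prescription with Chuanxiong Rhizome also contains Angelica Sinensis Radix, but not the reverse.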

  7. Prodromal signs and symptoms of serious infections with tocilizumab treatment for rheumatoid arthritis: Text mining of the Japanese postmarketing adverse event-reporting database.

    PubMed

    Atsumi, Tatsuya; Ando, Yoshiaki; Matsuda, Shinichi; Tomizawa, Shiho; Tanaka, Riwa; Takagi, Nobuhiro; Nakasone, Ayako

    2018-05-01

    To search for signs and symptoms before serious infection (SI) occurs in tocilizumab (TCZ)-treated rheumatoid arthritis (RA) patients. Individual case safety reports, including structured (age, sex, adverse event [AE]) and unstructured (clinical narratives) data, were analyzed by automated text mining from a Japanese post-marketing AE-reporting database (16 April 2008-10 April 2015) assuming the following: treated in Japan; TCZ RA treatment; ≥1 SI; unable to exclude causality between TCZ and SIs. The database included 7653 RA patients; 1221 reports met four criteria, encompassing 1591 SIs. Frequent SIs were pneumonia (15.9%), cellulitis (9.9%), and sepsis (5.0%). Reports for 782 patients included SI onset date; 60.7% of patients had signs/symptoms ≤28 days before SI diagnosis, 32.7% had signs/symptoms with date unidentified, 1.7% were asymptomatic, and 4.9% had unknown signs/symptoms. The most frequent signs/symptoms were for skin (swelling and pain) and respiratory (cough and pyrexia) infections. Among 68 patients who had normal laboratory results for C-reactive protein, body temperature, and white blood cell count, 94.1% had signs or symptoms of infection. This study identified prodromal signs and symptoms of SIs in RA patients receiving TCZ. Data mining clinical narratives from post-marketing AE databases may be beneficial in characterizing SIs.

  8. Changes in the Extent of Surface Mining and Reclamation in the Central Appalachians Detected Using a 1976-2006 Landsat Time Series

    NASA Technical Reports Server (NTRS)

    Townsend, Philip A.; Helmers, David P.; Kingdon, Clayton C.; McNeil, Brenden E.; de Beurs, Kirsten M.; Eshleman, Keith N.

    2009-01-01

    Surface mining and reclamation is the dominant driver of land cover land use change (LCLUC) in the Central Appalachian Mountain region of the Eastern U.S. Accurate quantification of the extent of mining activities is important for assessing how this LCLUC affects ecosystem services such as aesthetics, biodiversity, and mitigation of flooding. We used Landsat imagery from 1976, 1987, 1999 and 2006 to map the extent of surface mines and mine reclamation for eight large watersheds in the Central Appalachian region of West Virginia, Maryland and Pennsylvania. We employed standard image processing techniques in conjunction with a temporal decision tree and GIS maps of mine permits and wetlands to map active and reclaimed mines and track changes through time. For the entire study area, active surface mine extent was highest in 1976, prior to implementation of the Surface Mine Control and Reclamation Act in 1977, with 1.76% of the study area in active mines, declining to 0.44% in 2006. The most extensively mined watershed, Georges Creek in Maryland, was 5.45% active mines in 1976, declining to 1.83% in 2006. For the entire study area, the area of reclaimed mines increased from 1.35% to 4.99% from 1976 to 2006, and from 4.71% to 15.42% in Georges Creek. Land cover conversion to mines and then reclaimed mines after 1976 was almost exclusively from forest. Accuracy levels for mined and reclaimed cover were above 85% for all time periods, and were generally above 80% for mapping active and reclaimed mines separately, especially for the later time periods in which good accuracy assessment data were available. Among other implications, the mapped patterns of LCLUC are likely to significantly affect watershed hydrology, as mined and reclaimed areas have lower infiltration capacity and thus more rapid runoff than unmined forest watersheds, leading to greater potential for extreme flooding during heavy rainfall events.

  9. An investigation into heterogeneity in a single vein-type uranium ore deposit: Implications for nuclear forensics.

    PubMed

    Keatley, A C; Scott, T B; Davis, S; Jones, C P; Turner, P

    2015-12-01

    Minor element composition and rare earth element (REE) concentrations in nuclear materials are important as they are used within the field of nuclear forensics as an indicator of sample origin. However recent studies into uranium ores and uranium ore concentrates (UOCs) have shown significant elemental and isotopic heterogeneity from a single mine site such that some sites have shown higher variation within the mine site than that seen between multiple sites. The elemental composition of both uranium and gangue minerals within ore samples taken along a single mineral vein in South West England have been measured and reported here. The analysis of the samples was undertaken to determine the extent of the localised variation in key elements. Energy Dispersive X-ray spectroscopy (EDS) was used to analyse the gangue mineralogy and measure major element composition. Minor element composition and rare earth element (REE) concentrations were measured by Electron Probe Microanalysis (EPMA). The results confirm that a number of key elements, REE concentrations and patterns used for origin location do show significant variation within mine. Furthermore significant variation is also visible on a meter scale. In addition three separate uranium phases were identified within the vein which indicates multiple uranium mineralisation events. In light of these localised elemental variations it is recommended that representative sampling for an area is undertaken prior to establishing the REE pattern that may be used to identify the originating mine for an unknown ore sample and prior to investigating impact of ore processing on any arising REE patterns. Copyright © 2015 Elsevier Ltd. All rights reserved.

  10. Natural forest expansion on reclaimed coal mines in Northern Spain: the role of native shrubs as suitable microsites.

    PubMed

    Alday, Josu G; Zaldívar, Pilar; Torroba-Balmori, Paloma; Fernández-Santos, Belén; Martínez-Ruiz, Carolina

    2016-07-01

    The characterization of suitable microsites for tree seedling establishment and growth is one of the most important tasks to achieve the restoration of native forest using natural processes in disturbed sites. For that, we assessed the natural Quercus petraea forest expansion in a 20-year-old reclaimed open-cast mine under sub-Mediterranean climate in northern Spain, monitoring seedling survival, growth, and recruitment during 5 years in three contrasting environments (undisturbed forest, mine edge, and mine center). Seedling density and proportion of dead branches decreased greatly from undisturbed forest towards the center of the mine. There was a positive effect of shrubs on Q. petraea seedling establishment in both mine environments, which increase as the environment undergoes more stress (from the mine edge to the center of the mine), and it was produced by different shrub structural features in each mine environment. Seedling survival reduction through time in three environments did not lead to a density reduction because there was a yearly recruitment of new seedlings. Seedling survival, annual growth, and height through time were greater in mine sites than in the undisturbed forest. The successful colonization patterns and positive neighbor effect of shrubs on natural seedlings establishment found in this study during the first years support the use of shrubs as ecosystem engineers to increase heterogeneity in micro-environmental conditions on reclaimed mine sites, which improves late-successional Quercus species establishment.

  11. Occurrence and variability of mining-related lead and zinc in the Spring River flood plain and tributary flood plains, Cherokee County, Kansas, 2009--11

    USGS Publications Warehouse

    Juracek, Kyle E.

    2013-01-01

    Historical mining activity in the Tri-State Mining District (TSMD), located in parts of southeast Kansas, southwest Missouri, and northeast Oklahoma, has resulted in a substantial ongoing input of cadmium, lead, and zinc to the environment. To provide some of the information needed to support remediation efforts in the Cherokee County, Kansas, superfund site, a 4-year study was begun in 2009 by the U.S. Geological Survey that was requested and funded by the U.S. Environmental Protection Agency. A combination of surficial-soil sampling and coring was used to investigate the occurrence and variability of mining-related lead and zinc in the flood plains of the Spring River and several tributaries within the superfund site. Lead- and zinc-contaminated flood plains are a concern, in part, because they represent a long-term source of contamination to the fluvial environment. Lead and zinc contamination was assessed with reference to probable-effect concentrations (PECs), which represent the concentrations above which adverse aquatic biological effects are likely to occur. The general PECs for lead and zinc were 128 and 459 milligrams per kilogram, respectively. The TSMD-specific PECs for lead and zinc were 150 and 2,083 milligrams per kilogram, respectively. Typically, surficial soils in the Spring River flood plain had lead and zinc concentrations that were less than the general PECs. Lead and zinc concentrations in the surficial-soil samples were variable with distance downstream and with distance from the Spring River channel, and the largest lead and zinc concentrations usually were located near the channel. Lead and zinc concentrations larger than the general or TSMD-specific PECs, or both, were infrequent at depth in the Spring River flood plain. When present, such contamination typically was confined to the upper 2 feet of the core and frequently was confined to the upper 6 inches. 
Tributaries with few or no lead- and zinc-mined areas in the basin—Brush Creek, Cow Creek, and Shawnee Creek—generally had flood-plain lead and zinc concentrations (surficial soil, 6- and 12-inch depth) that were substantially less than the general PECs. Tributaries with extensive lead- and zinc-mined areas in the basin—Shoal Creek, Short Creek, Spring Branch, Tar Creek, Turkey Creek, and Willow Creek—had flood-plain lead concentrations (surficial soil, 6- and 12-inch depth) that frequently or typically exceeded the general and TSMD-specific PECs. Likewise, the tributaries with extensive lead- and zinc-mined areas in the basin had flood-plain zinc concentrations (surficial soil, 6- and 12-inch depth) that frequently or typically exceeded the general PEC. With the exception of Shoal and Willow Creeks, zinc concentrations typically exceeded the TSMD-specific PEC. The largest flood-plain lead and zinc concentrations (surficial soil, 6- and 12-inch depth) were measured for Short and Tar Creeks. Lead and zinc concentrations in the surficial-soil samples collected from the tributary flood plains varied longitudinally in relation to sources of mining-contaminated sediment in the basins. Lead and zinc concentrations also varied with distance from the channel; however, no consistent spatial trend was evident. For the surficial-soil samples collected from the Spring River flood plain and tributary flood plains, both the coarse (larger than 63 micrometers) and fine particles (less than 63 micrometers) contained substantial lead and zinc concentrations.
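
    The comparison against probable-effect concentrations described above can be sketched in a few lines of Python. This is only an illustration: the PEC values are those quoted in the abstract, while the function name `exceeded_pecs` and the sample concentrations are hypothetical and not taken from the report.

```python
# PEC values in milligrams per kilogram, as quoted in the abstract.
GENERAL_PEC = {"lead": 128.0, "zinc": 459.0}
TSMD_PEC = {"lead": 150.0, "zinc": 2083.0}


def exceeded_pecs(metal, concentration_mg_per_kg):
    """Return the names of the PECs that a concentration exceeds.

    Hypothetical helper for illustration; not part of the USGS study.
    """
    exceeded = []
    if concentration_mg_per_kg > GENERAL_PEC[metal]:
        exceeded.append("general")
    if concentration_mg_per_kg > TSMD_PEC[metal]:
        exceeded.append("TSMD-specific")
    return exceeded


# Made-up sample concentrations (mg/kg), not measurements from the report.
zinc_flags = exceeded_pecs("zinc", 1000.0)
lead_flags = exceeded_pecs("lead", 200.0)
```

    A zinc value of 1,000 mg/kg would exceed only the general PEC, which mirrors the abstract's observation that some tributaries exceeded the general zinc PEC without typically exceeding the TSMD-specific one.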

  12. Exploring the evolution of node neighborhoods in Dynamic Networks

    NASA Astrophysics Data System (ADS)

    Orman, Günce Keziban; Labatut, Vincent; Naskali, Ahmet Teoman

    2017-09-01

    Dynamic Networks are a popular way of modeling and studying the behavior of evolving systems. However, their analysis constitutes a relatively recent subfield of Network Science, and the number of available tools is consequently much smaller than for static networks. In this work, we propose a method specifically designed to take advantage of the longitudinal nature of dynamic networks. It characterizes each individual node by studying the evolution of its direct neighborhood, based on the assumption that the way this neighborhood changes reflects the role and position of the node in the whole network. For this purpose, we define the concept of neighborhood event, which corresponds to the various transformations such groups of nodes can undergo, and describe an algorithm for detecting such events. We demonstrate the interest of our method on three real-world networks: DBLP, LastFM and Enron. We apply frequent pattern mining to extract meaningful information from temporal sequences of neighborhood events. This results in the identification of behavioral trends emerging in the whole network, as well as the individual characterization of specific nodes. We also perform a cluster analysis, which reveals that, in all three networks, one can distinguish two types of nodes exhibiting different behaviors: a very small group of active nodes, whose neighborhood undergo diverse and frequent events, and a very large group of stable nodes.
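
    As a rough illustration of applying frequent pattern mining to temporal sequences of neighborhood events, the sketch below counts contiguous event patterns of a fixed length and keeps those that occur in enough node sequences. It is a simplified stand-in, not the authors' algorithm: the event names, the function `frequent_event_ngrams`, and the restriction to contiguous patterns are all assumptions made for the example.

```python
from collections import Counter


def frequent_event_ngrams(sequences, n, min_support):
    """Return contiguous length-n event patterns that appear in at
    least `min_support` sequences (support is counted per sequence)."""
    support = Counter()
    for seq in sequences:
        seq = list(seq)
        # A set ensures each pattern is counted once per sequence.
        seen = {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}
        support.update(seen)
    return {p: c for p, c in support.items() if c >= min_support}


# Hypothetical per-node event sequences (one list per node).
node_events = {
    "a": ["birth", "growth", "growth", "split"],
    "b": ["birth", "growth", "split"],
    "c": ["merge", "growth", "split"],
}
patterns = frequent_event_ngrams(node_events.values(), n=2, min_support=2)
```

    Nodes could then be characterized by which frequent patterns their own sequences contain, which is one plausible input for the kind of cluster analysis the abstract mentions.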

  13. Microarray data and gene expression statistics for Saccharomyces cerevisiae exposed to simulated asbestos mine drainage.

    PubMed

    Driscoll, Heather E; Murray, Janet M; English, Erika L; Hunter, Timothy C; Pivarski, Kara; Dolci, Elizabeth D

    2017-08-01

    Here we describe microarray expression data (raw and normalized), experimental metadata, and gene-level data with expression statistics from Saccharomyces cerevisiae exposed to simulated asbestos mine drainage from the Vermont Asbestos Group (VAG) Mine on Belvidere Mountain in northern Vermont, USA. For nearly 100 years (between the late 1890s and 1993), chrysotile asbestos fibers were extracted from serpentinized ultramafic rock at the VAG Mine for use in construction and manufacturing industries. Studies have shown that water courses and streambeds nearby have become contaminated with asbestos mine tailings runoff, including elevated levels of magnesium, nickel, chromium, and arsenic, elevated pH, and chrysotile asbestos-laden mine tailings, due to leaching and gradual erosion of massive piles of mine waste covering approximately 9 km². We exposed yeast to simulated VAG Mine tailings leachate to help gain insight on how eukaryotic cells exposed to VAG Mine drainage may respond in the mine environment. Affymetrix GeneChip® Yeast Genome 2.0 Arrays were utilized to assess gene expression after 24-h exposure to simulated VAG Mine tailings runoff. The chemistry of mine-tailings leachate, mine-tailings leachate plus yeast extract peptone dextrose media, and control yeast extract peptone dextrose media is also reported. To our knowledge this is the first dataset to assess global gene expression patterns in a eukaryotic model system simulating asbestos mine tailings runoff exposure. Raw and normalized gene expression data are accessible through the National Center for Biotechnology Information Gene Expression Omnibus (NCBI GEO) Database Series GSE89875 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE89875).

  14. Action recognition using mined hierarchical compound features.

    PubMed

    Gilbert, Andrew; Illingworth, John; Bowden, Richard

    2011-05-01

    The field of Action Recognition has seen a large increase in activity in recent years. Much of the progress has been through incorporating ideas from single-frame object recognition and adapting them for temporal-based action recognition. Inspired by the success of interest points in the 2D spatial domain, their 3D (space-time) counterparts typically form the basic components used to describe actions, and in action recognition the features used are often engineered to fire sparsely. This is to ensure that the problem is tractable; however, this can sacrifice recognition accuracy as it cannot be assumed that the optimum features in terms of class discrimination are obtained from this approach. In contrast, we propose to initially use an overcomplete set of simple 2D corners in both space and time. These are grouped spatially and temporally using a hierarchical process, with an increasing search area. At each stage of the hierarchy, the most distinctive and descriptive features are learned efficiently through data mining. This allows large amounts of data to be searched for frequently reoccurring patterns of features. At each level of the hierarchy, the mined compound features become more complex, discriminative, and sparse. This results in fast, accurate recognition with real-time performance on high-resolution video. As the compound features are constructed and selected based upon their ability to discriminate, their speed and accuracy increase at each level of the hierarchy. The approach is tested on four state-of-the-art data sets, the popular KTH data set to provide a comparison with other state-of-the-art approaches, the Multi-KTH data set to illustrate performance at simultaneous multiaction classification, despite no explicit localization information provided during training. Finally, the recent Hollywood and Hollywood2 data sets provide challenging complex actions taken from commercial movie sequences. 
For all four data sets, the proposed hierarchical approach outperforms all other methods reported thus far in the literature and can achieve real-time operation.

  15. Knowledge based word-concept model estimation and refinement for biomedical text mining.

    PubMed

    Jimeno Yepes, Antonio; Berlanga, Rafael

    2015-02-01

    Text mining of scientific literature has been essential for setting up large public biomedical databases, which are being widely used by the research community. In the biomedical domain, the existence of a large number of terminological resources and knowledge bases (KB) has enabled a myriad of machine learning methods for different text mining related tasks. Unfortunately, KBs have not been devised for text mining tasks but for human interpretation, thus performance of KB-based methods is usually lower when compared to supervised machine learning methods. The disadvantage of supervised methods though is they require labeled training data and therefore not useful for large scale biomedical text mining systems. KB-based methods do not have this limitation. In this paper, we describe a novel method to generate word-concept probabilities from a KB, which can serve as a basis for several text mining tasks. This method not only takes into account the underlying patterns within the descriptions contained in the KB but also those in texts available from large unlabeled corpora such as MEDLINE. The parameters of the model have been estimated without training data. Patterns from MEDLINE have been built using MetaMap for entity recognition and related using co-occurrences. The word-concept probabilities were evaluated on the task of word sense disambiguation (WSD). The results showed that our method obtained a higher degree of accuracy than other state-of-the-art approaches when evaluated on the MSH WSD data set. We also evaluated our method on the task of document ranking using MEDLINE citations. These results also showed an increase in performance over existing baseline retrieval approaches. Copyright © 2014 Elsevier Inc. All rights reserved.

  16. Distribution characteristics of rare earth elements in children's scalp hair from a rare earths mining area in southern China.

    PubMed

    Tong, Shi-Lu; Zhu, Wang-Zhao; Gao, Zhao-Hua; Meng, Yu-Xiu; Peng, Rui-Ling; Lu, Guo-Cheng

    2004-01-01

    In order to demonstrate the validity of using scalp hair rare earth elements (REEs) content as a biomarker of human REEs exposure, data were collected on REEs exposure levels from children aged 11-15 years old and living in an ion-adsorptive type light REEs (LREEs) mining and surrounding areas in southern China. Sixty scalp hair samples were analyzed by ICP-MS for 16 REEs (La-Lu, Y and Sc). Sixteen REEs contents in the samples from the mining area (e.g., range: La: 0.14-6.93 microg/g; Nd: 0.09-5.27 microg/g; Gd: 12.2-645.6 ng/g; Lu: 0.2-13.3 ng/g; Y: 0.03-1.27 microg/g; Sc: 0.05-0.30 microg/g) were significantly higher than those from the reference area (range: La: 0.04-0.40 microg/g; Nd: 0.04-0.32 microg/g; Gd: 8.3-64.6 ng/g; Lu: 0.4-3.3 ng/g; Y: 0.03-0.29 microg/g; Sc: 0.11-0.36 microg/g) and even much higher than those published in the literature. The distribution pattern of REEs in scalp hair from the mining area was very similar to that of REEs in the mine and the atmosphere shrouding that area. In conclusion, the scalp hair REEs contents may indicate not only quantitatively but also qualitatively (distribution pattern) the absorption of REEs from environmental exposure into human body. The children living in this mining area should be regarded as a high-risk group with REEs (especially LREEs) exposure, and their health status should be examined from a REEs health risk assessment perspective.

  17. Land mine detection using multispectral image fusion

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Clark, G.A.; Sengupta, S.K.; Aimonetti, W.D.

    1995-03-29

    Our system fuses information contained in registered images from multiple sensors to reduce the effects of clutter and improve the ability to detect surface and buried land mines. The sensor suite currently consists of a camera that acquires images in six bands (400nm, 500nm, 600nm, 700nm, 800nm and 900nm). Past research has shown that it is extremely difficult to distinguish land mines from background clutter in images obtained from a single sensor. It is hypothesized, however, that information fused from a suite of various sensors is likely to provide better detection reliability, because the suite of sensors detects a variety of physical properties that are more separable in feature space. The materials surrounding the mines can include natural materials (soil, rocks, foliage, water, etc.) and some artifacts. We use a supervised learning pattern recognition approach to detecting the metal and plastic land mines. The overall process consists of four main parts: Preprocessing, feature extraction, feature selection, and classification. These parts are used in a two step process to classify a subimage. We extract features from the images, and use feature selection algorithms to select only the most important features according to their contribution to correct detections. This allows us to save computational complexity and determine which of the spectral bands add value to the detection system. The most important features from the various sensors are fused using a supervised learning pattern classifier (the probabilistic neural network). We present results of experiments to detect land mines from real data collected from an airborne platform, and evaluate the usefulness of fusing feature information from multiple spectral bands.

  18. Nanoseismicity and picoseismicity rate changes from static stress triggering caused by a Mw 2.2 earthquake in Mponeng gold mine, South Africa

    NASA Astrophysics Data System (ADS)

    Kozłowska, Maria; Orlecka-Sikora, Beata; Kwiatek, Grzegorz; Boettcher, Margaret S.; Dresen, Georg

    2015-01-01

    Static stress changes following large earthquakes are known to affect the rate and distribution of aftershocks, yet this process has not been thoroughly investigated for nanoseismicity and picoseismicity at centimeter length scales. Here we utilize a unique data set of M ≥ -3.4 earthquakes following a Mw 2.2 earthquake in Mponeng gold mine, South Africa, that was recorded during a quiet interval in the mine to investigate if rate- and state-based modeling is valid for shallow, mining-induced seismicity. We use Dieterich's (1994) rate- and state-dependent formulation for earthquake productivity, which requires estimation of four parameters: (1) Coulomb stress changes due to the main shock, (2) the reference seismicity rate, (3) frictional resistance parameter, and (4) the duration of aftershock relaxation time. Comparisons of the modeled spatiotemporal patterns of seismicity based on two different source models with the observed distribution show that while the spatial patterns match well, the rate of modeled aftershocks is lower than the observed rate. To test our model, we used three metrics of the goodness-of-fit evaluation. The null hypothesis, of no significant difference between modeled and observed seismicity rates, was only rejected in the depth interval containing the main shock. Results show that mining-induced earthquakes may be followed by a stress relaxation expressed through aftershocks located on the rupture plane and in regions of positive Coulomb stress change. Furthermore, we demonstrate that the main features of the temporal and spatial distributions of very small, mining-induced earthquakes can be successfully determined using rate- and state-based stress modeling.
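
    For orientation, the four parameters listed above enter the commonly quoted form of Dieterich's (1994) seismicity-rate equation for a single Coulomb stress step under a constant background stressing rate. The symbols below are assumptions chosen for this note rather than notation taken from the abstract: $R(t)$ is the seismicity rate at time $t$ after the main shock, $r$ the reference rate, $\Delta\mathrm{CFS}$ the Coulomb stress change, $A\sigma$ the frictional resistance parameter, and $t_a$ the aftershock relaxation time.

```latex
R(t) = \frac{r}{\left[\exp\!\left(-\dfrac{\Delta\mathrm{CFS}}{A\sigma}\right) - 1\right]
       \exp\!\left(-\dfrac{t}{t_a}\right) + 1}
```

    At $t = 0$ this gives $R = r\,\exp(\Delta\mathrm{CFS}/A\sigma)$, so a positive stress change raises the rate immediately, and as $t \to \infty$ the rate relaxes back to $r$.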

  19. Benthic invertebrate communities and their responses to selected environmental factors in the Kanawha River basin, West Virginia, Virginia, and North Carolina

    USGS Publications Warehouse

    Chambers, Douglas B.; Messinger, Terence

    2001-01-01

    The effects of selected environmental factors on the composition and structure of benthic invertebrate communities in the Kanawha River Basin of West Virginia, Virginia and North Carolina were investigated in 1997 and 1998. Environmental factors investigated include physiography, land-use pattern, streamwater chemistry, streambed-sediment chemistry, and habitat characteristics. Land-use patterns investigated include coal mining, agriculture, and low intensity rural-residential patterns, at four main stem and seven tributary sites throughout the basin. Of the 37 sites sampled, basin size and physiography most strongly affected benthic invertebrate-community structure. Land-use practices also affected invertebrate community structure in these basins. The basins that differed most from the minimally affected reference condition were those basins in which coal mining was the dominant nonforest land use, as determined by comparing invertebrate-community metric values among sites. Basins in which agriculture was important were more similar to the reference condition. The effect of coal mining upon benthic invertebrate communities was further studied at 29 sites and the relations among invertebrate communities and the selected environmental factors of land use, streamwater chemistry, streambed-sediment chemistry, and habitat characteristics analyzed. Division of coal-mining synoptic-survey sites based on invertebrate-community composition resulted in two groups: one with more than an average production of 9,000 tons of coal per square mile per year since 1980, and one with lesser or no recent coal production. The group with significant recent coal production showed higher levels of community impairment than the group with little or no recent coal production. Median particle size of streambed sediment, and specific conductance and sulfate concentration of streamwater were most strongly correlated with effects on invertebrate communities. 
These characteristics were related to mining intensity, as measured by thousands of tons of coal produced per square mile of drainage area.

  20. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kartsaklis, Christos; Hernandez, Oscar R

    Interrogating the structure of a program for patterns of interest is attractive to the broader spectrum of software engineering. The very approach by which a pattern is constructed remains a concern for the source code mining community. This paper presents a pattern programming model, for the C and Fortran programming languages, using a compiler directives approach. We discuss our specification, called HERCULES/PL, throughout a number of examples and show how different patterns can be constructed, plus some preliminary results.

  1. Are environmental characteristics in the municipal eldercare, more closely associated with frequent short sick leave spells among employees than with total sick leave: a cross-sectional study.

    PubMed

    Stapelfeldt, Christina Malmose; Nielsen, Claus Vinther; Andersen, Niels Trolle; Krane, Line; Fleten, Nils; Borg, Vilhelm; Jensen, Chris

    2013-06-13

    It has been suggested that frequent-, short-term sick leave is associated with work environment factors, whereas long-term sick leave is associated mainly with health factors. However, studies of the hypothesis of an association between a poor working environment and frequent short spells of sick leave are few and results are inconsistent. Therefore, we aimed to explore associations between self-reported psychosocial work factors and workplace-registered frequency and length of sick leave in the eldercare sector. Employees from the municipal eldercare in Aarhus (N = 2,534) were included. In 2005, they responded to a work environment questionnaire. Sick leave records from 2005 were dichotomised into total sick leave days (0-14 and above 14 days) and into spell patterns (0-2 short, 3-9 short, and mixed spells and 1-3 long spells). Logistic regression models were used to analyse associations; adjusted for age, gender, occupation, and number of spells or sick leave length. The response rate was 76%; 96% of the respondents were women. Unfavourable mean scores in work pace, demands for hiding emotions, poor quality of leadership and bullying were best indicated by more than 14 sick leave days compared with 0-14 sick leave days. For work pace, the best indicator was a long-term sick leave pattern compared with a non-frequent short-term pattern. A frequent short-term sick leave pattern was a better indicator of emotional demands (1.62; 95% CI: 1.1-2.5) and role conflict (1.50; 95% CI: 1.2-1.9) than a short-term non-frequent pattern. Age (≤40 / >40 years) statistically significantly modified the association between the 1-3 long-term sick leave spell pattern and commitment to the workplace compared with the 3-9 frequent short-term pattern. Total sick leave length and a long-term sick leave spell pattern were just as good or even better indicators of unfavourable work factor scores than a frequent short-term sick leave pattern. 
Scores in commitment to the workplace and quality of leadership varied with sick leave pattern and age. Thus, different sick leave measures seem to be associated with different work environment factors. Further studies on these associations may inform interventions to improve occupational health care.

  2. Multiagent data warehousing and multiagent data mining for cerebrum/cerebellum modeling

    NASA Astrophysics Data System (ADS)

    Zhang, Wen-Ran

    2002-03-01

    An algorithm named Neighbor-Miner is outlined for multiagent data warehousing and multiagent data mining. The algorithm is defined in an evolving dynamic environment with autonomous or semiautonomous agents. Instead of mining frequent itemsets from customer transactions, the new algorithm discovers new agents and mines agent associations in first-order logic from agent attributes and actions. While the Apriori algorithm uses frequency as an a priori threshold, the new algorithm uses agent similarity as a priori knowledge. The concept of agent similarity leads to the notions of agent cuboid, orthogonal multiagent data warehousing (MADWH), and multiagent data mining (MADM). Based on agent similarities and action similarities, Neighbor-Miner is proposed and illustrated in a MADWH/MADM approach to cerebrum/cerebellum modeling. It is shown that (1) semiautonomous neurofuzzy agents can be identified for uniped locomotion and gymnastic training based on attribute relevance analysis; (2) new agents can be discovered and agent cuboids can be dynamically constructed in an orthogonal MADWH, which resembles an evolving cerebrum/cerebellum system; and (3) dynamic motion laws can be discovered as association rules in first-order logic. Although examples in legged robot gymnastics are used to illustrate the basic ideas, the new approach is generally suitable for a broad category of data mining tasks where knowledge can be discovered collectively by a set of agents from a geographically or geometrically distributed but relevant environment, especially in scientific and engineering data environments.
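
    For contrast with Neighbor-Miner, the classic Apriori idea the abstract refers to (frequency as the pruning threshold) can be sketched as below. This is a minimal textbook-style version, not code from the paper; the function name `apriori`, the absolute support count, and the toy transactions are assumptions for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all itemsets contained in
    at least `min_support` transactions (level-wise Apriori search)."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= min_support}
    return frequent


# Toy customer transactions (hypothetical).
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]
result = apriori(baskets, min_support=2)
```

    According to the abstract, Neighbor-Miner replaces the frequency threshold used here with agent similarity; how that similarity is computed is not specified in the abstract and is not modeled in this sketch.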

  3. Post-acquisition data mining techniques for LC-MS/MS-acquired data in drug metabolite identification.

    PubMed

    Dhurjad, Pooja Sukhdev; Marothu, Vamsi Krishna; Rathod, Rajeshwari

    2017-08-01

    Metabolite identification is a crucial part of the drug discovery process. LC-MS/MS-based metabolite identification has gained widespread use, but the data acquired by the LC-MS/MS instrument is complex, and thus the interpretation of data becomes troublesome. Fortunately, advancements in data mining techniques have simplified the process of data interpretation with improved mass accuracy and provide a potentially selective, sensitive, accurate and comprehensive way for metabolite identification. In this review, we have discussed the targeted (extracted ion chromatogram, mass defect filter, product ion filter, neutral loss filter and isotope pattern filter) and untargeted (control sample comparison, background subtraction and metabolomic approaches) post-acquisition data mining techniques, which facilitate the drug metabolite identification. We have also discussed the importance of integrated data mining strategy.
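
    Of the targeted techniques listed above, the mass defect filter is simple enough to sketch. The idea is that metabolites tend to keep a mass defect close to that of the parent drug, so ions whose defect falls outside a narrow window can be discarded. The code below is an illustrative approximation only: it estimates nominal mass by rounding, and the window widths, m/z values, and function names are hypothetical rather than taken from the review.

```python
def mass_defect(mz):
    """Approximate mass defect: measured m/z minus the nearest integer.

    Rounding is a simplification of the true nominal mass and is
    assumed adequate for small molecules in this sketch.
    """
    return mz - round(mz)


def mass_defect_filter(ions, parent_mz, defect_window=0.050, mass_window=50.0):
    """Keep ions whose mass defect lies within `defect_window` (Da) of the
    parent's defect and whose m/z lies within `mass_window` (Da) of the parent."""
    parent_defect = mass_defect(parent_mz)
    return [
        mz for mz in ions
        if abs(mass_defect(mz) - parent_defect) <= defect_window
        and abs(mz - parent_mz) <= mass_window
    ]


# Hypothetical parent drug and detected ions (m/z values are made up).
parent = 300.1500
detected = [316.1450, 316.4000, 284.1550, 500.1500]
kept = mass_defect_filter(detected, parent)
```

    In this toy run the ions near +16 Da and -16 Da with a similar defect are retained, while the ion with a very different defect and the ion far outside the mass window are removed.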

  4. Monitoring the growth or decline of vegetation on mine dumps

    NASA Technical Reports Server (NTRS)

    Gilbertson, B. P. (Principal Investigator)

    1975-01-01

    The author has identified the following significant results. It was established that particular mine dumps throughout the entire test area can be detected and identified. It was also established that patterns of vegetative growth on the mine dumps can be recognized from a simple visual analysis of photographic images. Because vegetation tends to occur in patches on many mine dumps, it is unsatisfactory to classify complete dumps into categories of percentage vegetative cover. A more desirable approach is to classify the patches of vegetation themselves. The coarse resolution of conventional densitometers restricts the accuracy of this procedure, and consequently a direct analysis of ERTS CCT's is preferred. A set of computer programs was written to perform the data reading and manipulating functions required for basic CCT analysis.

  5. An intelligent knowledge mining model for kidney cancer using rough set theory.

    PubMed

    Durai, M A Saleem; Acharjya, D P; Kannan, A; Iyengar, N Ch Sriman Narayana

    2012-01-01

    Medical diagnosis processes vary in the degree to which they attempt to deal with different complicating aspects of diagnosis such as relative importance of symptoms, varied symptom pattern and the relation between diseases themselves. Rough set approach has two major advantages over the other methods. First, it can handle different types of data such as categorical, numerical etc. Secondly, it does not make any assumption like probability distribution function in stochastic modeling or membership grade function in fuzzy set theory. It involves pattern recognition through logical computational rules rather than approximating them through smooth mathematical functional forms. In this paper we use rough set theory as a data mining tool to derive useful patterns and rules for kidney cancer faulty diagnosis. In particular, the historical data of twenty five research hospitals and medical college is used for validation and the results show the practical viability of the proposed approach.

  6. A Recommendation System to Facilitate Business Process Modeling.

    PubMed

    Deng, Shuiguang; Wang, Dongjing; Li, Ying; Cao, Bin; Yin, Jianwei; Wu, Zhaohui; Zhou, Mengchu

    2017-06-01

    This paper presents a system that utilizes process recommendation technology to help design new business processes from scratch in an efficient and accurate way. The proposed system consists of two phases: 1) offline mining and 2) online recommendation. At the first phase, it mines relations among activity nodes from existing processes in repository, and then stores the extracted relations as patterns in a database. At the second phase, it compares the new process under construction with the premined patterns, and recommends proper activity nodes of the most matching patterns to help build a new process. Specifically, there are three different online recommendation strategies in this system. Experiments on both real and synthetic datasets are conducted to compare the proposed approaches with the other state-of-the-art ones, and the results show that the proposed approaches outperform them in terms of accuracy and efficiency.
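
    The two phases described above can be illustrated with a minimal sketch. It is not the authors' system: it assumes each stored process is a simple linear sequence of activity names, and it treats a "pattern" as nothing more than a count of which activity directly follows which. The function names and the sample repository are hypothetical.

```python
from collections import Counter, defaultdict


def mine_successor_patterns(processes):
    """Offline phase (simplified): count how often one activity directly follows another."""
    patterns = defaultdict(Counter)
    for process in processes:
        for current, following in zip(process, process[1:]):
            patterns[current][following] += 1
    return patterns


def recommend_next(patterns, partial_process, k=2):
    """Online phase (simplified): suggest up to k activities that most often followed the last node."""
    if not partial_process:
        return []
    last = partial_process[-1]
    # .get avoids adding empty entries for activities never seen during mining.
    return [activity for activity, _ in patterns.get(last, Counter()).most_common(k)]


# Hypothetical repository of existing processes.
repository = [
    ["receive_order", "check_stock", "ship", "invoice"],
    ["receive_order", "check_stock", "reorder", "ship"],
    ["receive_order", "check_credit", "check_stock", "ship"],
]
patterns = mine_successor_patterns(repository)
suggestions = recommend_next(patterns, ["receive_order", "check_stock"])
print(suggestions)
```

    Separating the counting step from the lookup mirrors the offline/online split in the abstract: the expensive pass over the repository happens once, and each recommendation is a cheap lookup against the stored counts.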

  7. Introducing Artificial Neural Networks through a Spreadsheet Model

    ERIC Educational Resources Information Center

    Rienzo, Thomas F.; Athappilly, Kuriakose K.

    2012-01-01

    Business students taking data mining classes are often introduced to artificial neural networks (ANN) through point and click navigation exercises in application software. Even if correct outcomes are obtained, students frequently do not obtain a thorough understanding of ANN processes. This spreadsheet model was created to illuminate the roles of…

  8. Semi-Supervised Clustering for High-Dimensional and Sparse Features

    ERIC Educational Resources Information Center

    Yan, Su

    2010-01-01

    Clustering is one of the most common data mining tasks, used frequently for data organization and analysis in various application domains. Traditional machine learning approaches to clustering are fully automated and unsupervised where class labels are unknown a priori. In real application domains, however, some "weak" form of side…

  9. Big Data: You Are Adding to . . . and Using It

    ERIC Educational Resources Information Center

    Makela, Carole J.

    2016-01-01

    "Big data" prompts a whole lexicon of terms--data flow; analytics; data mining; data science; smart you name it (cars, houses, cities, wearables, etc.); algorithms; learning analytics; predictive analytics; data aggregation; data dashboards; digital tracks; and big data brokers. New terms are being coined frequently. Are we paying…

  10. Occupancy schedules learning process through a data mining framework

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    D'Oca, Simona; Hong, Tianzhen

    Building occupancy is a paramount factor in building energy simulations. Specifically, lighting, plug loads, HVAC equipment utilization, fresh air requirements and internal heat gain or loss greatly depend on the level of occupancy within a building. Developing the appropriate methodologies to describe and reproduce the intricate network responsible for human-building interactions is needed. Extrapolation of patterns from big data streams is a powerful analysis technique which will allow for a better understanding of energy usage in buildings. A three-step data mining framework is applied to discover occupancy patterns in office spaces. First, a data set of 16 offices with 10-minute interval occupancy data, over a two-year period, is mined through a decision tree model which predicts the occupancy presence. Then a rule induction algorithm is used to learn a pruned set of rules on the results from the decision tree model. Finally, a cluster analysis is employed in order to obtain consistent patterns of occupancy schedules. Furthermore, the identified occupancy rules and schedules are representative as four archetypal working profiles that can be used as input to current building energy modeling programs, such as EnergyPlus or IDA-ICE, to investigate impact of occupant presence on design, operation and energy use in office buildings.

  11. Succession on regraded placer mine spoil in Alaska, USA, in relation to initial site characteristics

    USGS Publications Warehouse

    Densmore, R.V.

    1994-01-01

    This study evaluated the rate and pattern of natural succession on regraded placer mine spoil in relation to initial substrate characteristics. The study site was the Glen Creek watershed of the Kantishna mining area of Denali National Park and Preserve, Alaska. After regrading, twelve 0.01-ha plots were established and substrate characteristics were measured. Natural plant succession was evaluated after five growing seasons. Three successional patterns were identified on the basis of plant community characteristics using cluster analysis, and were related to substrate characteristics. First, a riparian plant community with vigorous Salix alaxensis and Alnus crispa grew rapidly on topsoil that had been spread over the regraded spoil. Second, a similar plant community with less vigorous S. alaxensis developed more slowly on unprocessed spoil and spoil amended with a small amount of topsoil. Third, processed spoil remained almost bare of vegetation, although S. alaxensis was able to establish and persist in a stunted growth form. In contrast, Alnus crispa had difficulty establishing on processed spoil, but the few established seedlings grew well. Several substrate variables, including the proportion of silt and clay vs. sand, total nitrogen, and water retention capacity, were good predictors of the rate and pattern of succession. Total nitrogen was the best single predictor for the number of vigorous S. alaxensis.

  12. Occupancy schedules learning process through a data mining framework

    DOE PAGES

    D'Oca, Simona; Hong, Tianzhen

    2014-12-17

    Building occupancy is a paramount factor in building energy simulations. Specifically, lighting, plug loads, HVAC equipment utilization, fresh air requirements and internal heat gain or loss greatly depend on the level of occupancy within a building. Developing the appropriate methodologies to describe and reproduce the intricate network responsible for human-building interactions is needed. Extrapolation of patterns from big data streams is a powerful analysis technique which will allow for a better understanding of energy usage in buildings. A three-step data mining framework is applied to discover occupancy patterns in office spaces. First, a data set of 16 offices with 10-minute interval occupancy data, over a two-year period, is mined through a decision tree model which predicts the occupancy presence. Then a rule induction algorithm is used to learn a pruned set of rules on the results from the decision tree model. Finally, a cluster analysis is employed in order to obtain consistent patterns of occupancy schedules. Furthermore, the identified occupancy rules and schedules are representative as four archetypal working profiles that can be used as input to current building energy modeling programs, such as EnergyPlus or IDA-ICE, to investigate impact of occupant presence on design, operation and energy use in office buildings.

  13. Geochemistry of rare earth elements in minesoils from São Domingos mining district (Iberian Pyrite Belt)

    NASA Astrophysics Data System (ADS)

    Delgado, Joaquin; Perez-Lopez, Rafael; Nieto, Jose Miguel; Ayora, Carles

    2010-05-01

    The São Domingos mine is one of the most emblematic mining districts in the lower part of the Guadiana River Basin (SW of Iberian Peninsula). It is located in Portugal (about 5 km from the Spanish border), in the northern sector of the Iberian Pyrite Belt (IPB), one of the largest metallogenetic provinces of massive sulphides in the world. Although mining activity has ceased at present, the large-scale exploitation of this deposit between the second half of the XIX century and the first half of the XX century, has favoured the production of enormous waste dumps, where oxidation of pyrite and associated sulphides is resulting in the production of acid mine drainage (AMD). Mining wastes, minesoils, and acid mine drainage have been analyzed for their major ions and rare earth elements (REE) with the aim of understanding the REE mobility during sulphide weathering so that lanthanoid series can be used both as a proxy for the extent of water-rock interaction and as a tool for identifying impacts of AMD on natural ecosystems. Chemical speciation of REE in extracts from minesoils indicates that REE sulphate complexes (mainly LnSO4+) are the primary aqueous form (60-90%), and free ionic species (Ln3+, 10-40%) are the next most abundant form of soil water-soluble fraction and controls the REE speciation model. The REE from this fraction have NASC-normalized patterns with middle-REE (MREE) enriched signature compared to the light-REE (LREE) and heavy-REE (HREE), showing convex MREE-signatures and convexity index values of +1.29 +/- 1.13. These results are consistent with the typical REE fractionation patterns reported for AMD. Poorly crystalline iron oxyhydroxysulphates act as a source of labile MREE by dissolution and/or desorption processes and could explain the MREE-enriched signatures in solution.

  14. Mining dynamic noteworthy functions in software execution sequences

    PubMed Central

    Huang, Guoyan; Wang, Yuqian; He, Haitao; Ren, Jiadong

    2017-01-01

    As the quality of crucial entities can directly affect that of software, their identification and protection become an important premise for effective software development, management, maintenance and testing, which thus contribute to improving the software quality and its attack-defending ability. Most analysis and evaluation on important entities like codes-based static structure analysis are on the destruction of the actual software running. In this paper, from the perspective of the software execution process, we propose an approach to mine dynamic noteworthy functions (DNFM) in software execution sequences. First, according to software decompiling and tracking stack changes, the execution traces composed of a series of function addresses were acquired. Then these traces were modeled as execution sequences and simplified so as to get simplified sequences (SFS), followed by the extraction of patterns from SFS through the pattern extraction (PE) algorithm. After that, the evaluating indicators inner-importance and inter-importance were designed to measure the noteworthiness of functions in the DNFM algorithm. Finally, these functions were sorted by their noteworthiness. Comparison and contrast were conducted on the experiment results from two traditional complex network-based node mining methods, namely PageRank and DegreeRank. The results show that the DNFM method can mine noteworthy functions in software effectively and precisely. PMID:28278276

  15. Applying soil science for restoration of post mining degraded landscapes in semi-arid Australia: challenges and opportunities

    NASA Astrophysics Data System (ADS)

    Muñoz-Rojas, Miriam; Martini, Dylan; Erickson, Todd; Merritt, David; Dixon, Kingsley

    2015-04-01

    Introduction: Current challenges in ecological restoration of post mining environments include the deficit of original topsoil which is frequently lost or damaged, and the lack of soil forming materials. A comprehensive knowledge of soil properties and processes and an adequate management of soil resources are critical to improve the restoration success of these degraded areas. In particular, understanding soil physical, chemical and biological parameters is decisive in environments where water is a limiting factor for seedling establishment and plant survival. To improve the restoration success of biodiverse semi-arid areas disturbed by mining activities (Pilbara region, Western Australia), we conducted experiments to (i) analyse changes in soil physico-chemical properties and soil microbial activity of topsoil stockpiles to optimise its handling and minimise deterioration of nutrients and soil biota, (ii) test climate effects on seedling emergence of native plant species and (iii) assess the potential of mine waste materials as a suitable growth medium for seedling emergence of native plant species under various water regimes. Methods: The experimental studies were conducted in controlled environment facilities where air temperature, relative humidity and soil moisture were monitored routinely. Watering regimes were selected to represent rainfall patterns of the area. As a growth media we used material obtained from topsoil stockpiles and waste materials from an active mine site, which were mixed at different ratios. Samples were collected from different parts of the topsoil stockpiles and analysed to determine physical, chemical and biological properties. Results: No large discrepancies in physical and chemical values were detected at different positions of the stockpiles. However, microbial activity was highly variable, particularly inside the stockpiles.
    Seedling emergence on topsoil growth media was highly dependent on climate factors with emergence rates varying significantly (P < 0.001) across species. Highest emergence rates were obtained for Acacia adoxa and Grevillea pyramidalis in the 30°C scenario and adequate soil moisture levels (mean % ± SE 71±5.3 and 80±3.8 respectively). With available water, emergence was above 30% for all species and growth media types (topsoil, waste and mixes of topsoil and waste at 50:50 and 25:75 ratios). However, under drought conditions, emergence severely decreased for all species. In particular, Gossypium robinsonii and Grevillea pyramidalis did not show any response with less than 50% of topsoil in the composition of growth media. Our results suggest that changes in precipitation regimes can have a critical effect on seedling emergence of native plant species from the Pilbara. Understanding soil physico-chemical properties of soil materials and changes in soil moisture related to rainfall patterns and growth media blends are crucial to predict the success of seedling emergence and ultimately achieve biodiverse restoration in semiarid areas. This research is part of a broader multi-study approach, the Restoration Seedbank Initiative project, a partnership between The University of Western Australia, BHP Billiton Iron Ore, and Kings Park and Botanic Garden. Keywords: Pilbara region, biodiverse ecosystems, soil microbial activity, topsoil stockpile, dry environments, land rehabilitation.

  16. Biogeochemical behaviour and bioremediation of uranium in waters of abandoned mines.

    PubMed

    Mkandawire, Martin

    2013-11-01

    The discharges of uranium and associated radionuclides as well as heavy metals and metalloids from waste and tailing dumps in abandoned uranium mining and processing sites pose contamination risks to surface and groundwater. Although many more are being planned for nuclear energy purposes, most of the abandoned uranium mines are a legacy of uranium production that fuelled arms race during the cold war of the last century. Since the end of cold war, there have been efforts to rehabilitate the mining sites, initially, using classical remediation techniques based on high chemical and civil engineering. Recently, bioremediation technology has been sought as alternatives to the classical approach due to reasons, which include: (a) high demand of sites requiring remediation; (b) the economic implication of running and maintaining the facilities due to high energy and work force demand; and (c) the pattern and characteristics of contaminant discharges in most of the former uranium mining and processing sites prevents the use of classical methods. This review discusses risks of uranium contamination from abandoned uranium mines from the biogeochemical point of view and the potential and limitation of uranium bioremediation technique as alternative to classical approach in abandoned uranium mining and processing sites.

  17. The Hazards of Data Mining in Healthcare.

    PubMed

    Househ, Mowafa; Aldosari, Bakheet

    2017-01-01

    From the mid-1990s, data mining methods have been used to explore and find patterns and relationships in healthcare data. During the 1990s and early 2000's, data mining was a topic of great interest to healthcare researchers, as data mining showed some promise in the use of its predictive techniques to help model the healthcare system and improve the delivery of healthcare services. However, it was soon discovered that mining healthcare data had many challenges relating to the veracity of healthcare data and limitations around predictive modelling leading to failures of data mining projects. As the Big Data movement has gained momentum over the past few years, there has been a reemergence of interest in the use of data mining techniques and methods to analyze healthcare generated Big Data. Much has been written on the positive impacts of data mining on healthcare practice relating to issues of best practice, fraud detection, chronic disease management, and general healthcare decision making. Little has been written about the limitations and challenges of data mining use in healthcare. In this review paper, we explore some of the limitations and challenges in the use of data mining techniques in healthcare. Our results show that the limitations of data mining in healthcare include reliability of medical data, data sharing between healthcare organizations, inappropriate modelling leading to inaccurate predictions. We conclude that there are many pitfalls in the use of data mining in healthcare and more work is needed to show evidence of its utility in facilitating healthcare decision-making for healthcare providers, managers, and policy makers and more evidence is needed on data mining's overall impact on healthcare services and patient care.

  18. Renewed mining and reclamation: Impacts on bats and potential mitigation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown, P.E.; Berry, R.D.

    Historic mining created new roosting habitat for many bat species. Now the same industry has the potential to adversely impact bats. Contemporary mining operations usually occur in historic districts; consequently the old workings are destroyed by open pit operations. Occasionally, underground techniques are employed, resulting in the enlargement or destruction of the original workings. Even during exploratory operations, historic mine openings can be covered as drill roads are bulldozed, or drills can penetrate and collapse underground workings. Nearby blasting associated with mine construction and operation can disrupt roosting bats. Bats can also be disturbed by the entry of mine personnel to collect ore samples or by recreational mine explorers, since the creation of roads often results in easier access. In addition to roost disturbance, other aspects of renewed mining can have adverse impacts on bat populations, and affect even those bats that do not live in mines. Open cyanide ponds, or other water in which toxic chemicals accumulate, can poison bats and other wildlife. The creation of the pits, roads and processing areas often destroys critical foraging habitat, or changes drainage patterns. Finally, at the completion of mining, any historic mines still open may be sealed as part of closure and reclamation activities. The net result can be a loss of bats and bat habitat. Conversely, in some contemporary underground operations, future roosting habitat for bats can be fabricated. An experimental approach to the creation of new roosting habitat is to bury culverts or old tires beneath waste rock. Mining companies can mitigate for impacts to bats by surveying to identify bat-roosting habitat, removing bats prior to renewed mining or closure, protecting non-impacted roost sites with gates and fences, researching to identify habitat requirements and creating new artificial roosts.

  19. Citation-related reliability analysis for a pilot sample of underground coal mines.

    PubMed

    Kinilakodi, Harisha; Grayson, R Larry

    2011-05-01

    The scrutiny of underground coal mine safety was heightened because of the disasters that occurred in 2006-2007, and more recently in 2010. In the aftermath of the 2006 incidents, the U.S. Congress passed the Mine Improvement and New Emergency Response Act of 2006 (MINER Act), which strengthened the existing regulations and mandated new laws to address various issues related to emergency preparedness and response, escape from an emergency situation, and protection of miners. The National Mining Association-sponsored Mine Safety Technology and Training Commission study highlighted the role of risk management in identifying and controlling major hazards, which are elements that could come together and cause a mine disaster. In 2007 MSHA revised its approach to the "Pattern of Violations" (POV) process in order to target unsafe mines and then force them to remediate conditions in their mines. The POV approach has certain limitations that make it difficult for it to be enforced. One very understandable way to focus on removing threats from major-hazard conditions is to use citation-related reliability analysis. The citation reliability approach, which focuses on the probability of not getting a citation on a given inspector day, is considered an analogue to the maintenance reliability approach, which many mine operators understand and use. In this study, the citation reliability approach was applied to a stratified random sample of 31 underground coal mines to examine its potential for broader application. The results clearly show the best-performing and worst-performing mines for compliance with mine safety standards, and they highlight differences among different mine sizes. Copyright © 2010 Elsevier Ltd. All rights reserved.
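
    The citation reliability idea above, the probability of not getting a citation on a given inspector day, can be illustrated with a small sketch. The abstract does not give the formula, so the code below assumes citations arrive as a Poisson process with a constant rate per inspector day, by analogy with the exponential model common in maintenance reliability. The function name and the mine figures are hypothetical and are not data from the study.

```python
import math


def citation_reliability(citations, inspector_days):
    """Probability of zero citations on one inspector day, assuming a Poisson citation rate."""
    if inspector_days <= 0:
        raise ValueError("inspector_days must be positive")
    rate = citations / inspector_days  # mean citations per inspector day
    return math.exp(-rate)


# Hypothetical (citations, inspector days) pairs for three mines.
mines = {"mine_a": (12, 120), "mine_b": (90, 100), "mine_c": (0, 80)}

# Rank mines from most to least reliable under the assumed model.
ranking = sorted(mines, key=lambda name: citation_reliability(*mines[name]), reverse=True)
for name in ranking:
    print(name, round(citation_reliability(*mines[name]), 3))
```

    Normalizing by inspector days rather than using raw citation counts lets mines that receive different amounts of inspection be compared on the same scale.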

  20. Geo-Spatial Characterization of Soil Mercury and Arsenic at a High-Altitude Bolivian Gold Mine.

    PubMed

    Johnson, Glen D; Pavilonis, Brian; Caravanos, Jack; Grassman, Jean

    2018-02-01

    Soil mercury concentrations at a typical small-scale mine site in the Bolivian Andes were elevated (28-737 mg/kg or ppm) in localized areas where mercury amalgams were either formed or vaporized to release gold, but was not detectable beyond approximately 10 m from its sources. Arsenic was measurable, exceeding known background levels throughout the mine site (77-137,022 ppm), and was also measurable through the local village of Ingenio (36-1803 ppm). Although arsenic levels were high at all surveyed locations, its spatial pattern followed mercury, being highest where mercury was high.

  1. Mercury Levels in Human Hair and Farmed Fish near Artisanal and Small-Scale Gold Mining Communities in the Madre de Dios River Basin, Peru

    PubMed Central

    Langeland, Aubrey L.; Hardin, Rebecca D.; Neitzel, Richard L.

    2017-01-01

    Artisanal and small-scale gold mining (ASGM) has been an important source of income for communities in the Madre de Dios River Basin in Peru for hundreds of years. However, in recent decades, the scale of ASGM activities in the region has increased dramatically, and exposures to a variety of occupational and environmental hazards related to ASGM, including mercury, are becoming more widespread. The aims of our study were to: (1) examine patterns in the total hair mercury level of human participants in several communities in the region and compare these results to the 2.2 µg/g total hair mercury level equivalent to the World Health Organization (WHO) Expert Committee of Food Additives (JECFA)’s Provisional Tolerable Weekly Intake (PTWI); and (2), to measure the mercury levels of paco (Piaractus brachypomus) fish raised in local aquaculture ponds, in order to compare these levels to the EPA Fish Tissue Residue Criterion of 0.3 µg Hg/g fish (wet weight). We collected hair samples from 80 participants in four communities (one control and three where ASGM activities occurred) in the region, and collected 111 samples from fish raised in 24 local aquaculture farms. We then analyzed the samples for total mercury. Total mercury levels in hair were statistically significantly higher in the mining communities than in the control community, and increased with increasing geodesic distance from the Madre de Dios headwaters, did not differ by sex, and frequently exceeded the reference level. Regression analyses indicated that higher hair mercury levels were associated with residence in ASGM communities. The analysis of paco fish samples found no samples that exceeded the EPA tissue residue criterion. Collectively, these results align with other recent studies showing that ASGM activities are associated with elevated human mercury exposure. 
    The fish farmed through the relatively new process of aquaculture in ASGM areas appeared to have little potential to contribute to human mercury exposure. More research is needed on human health risks associated with ASGM to discern occupational, residential, and nutritional exposure, especially through tracking temporal changes in mercury levels as fish ponds age, and assessing levels in different farmed fish species. Additionally, research is needed to definitively determine that elevated mercury levels in humans and fish result from the elemental mercury from mining, rather than from a different source, such as the mercury released from soil erosion during deforestation events from mining or other activities. PMID:28335439

  2. Mercury Levels in Human Hair and Farmed Fish near Artisanal and Small-Scale Gold Mining Communities in the Madre de Dios River Basin, Peru.

    PubMed

    Langeland, Aubrey L; Hardin, Rebecca D; Neitzel, Richard L

    2017-03-14

    Artisanal and small-scale gold mining (ASGM) has been an important source of income for communities in the Madre de Dios River Basin in Peru for hundreds of years. However, in recent decades, the scale of ASGM activities in the region has increased dramatically, and exposures to a variety of occupational and environmental hazards related to ASGM, including mercury, are becoming more widespread. The aims of our study were to: (1) examine patterns in the total hair mercury level of human participants in several communities in the region and compare these results to the 2.2 µg/g total hair mercury level equivalent to the World Health Organization (WHO) Expert Committee of Food Additives (JECFA)'s Provisional Tolerable Weekly Intake (PTWI); and (2), to measure the mercury levels of paco ( Piaractus brachypomus ) fish raised in local aquaculture ponds, in order to compare these levels to the EPA Fish Tissue Residue Criterion of 0.3 µg Hg/g fish (wet weight). We collected hair samples from 80 participants in four communities (one control and three where ASGM activities occurred) in the region, and collected 111 samples from fish raised in 24 local aquaculture farms. We then analyzed the samples for total mercury. Total mercury levels in hair were statistically significantly higher in the mining communities than in the control community, and increased with increasing geodesic distance from the Madre de Dios headwaters, did not differ by sex, and frequently exceeded the reference level. Regression analyses indicated that higher hair mercury levels were associated with residence in ASGM communities. The analysis of paco fish samples found no samples that exceeded the EPA tissue residue criterion. Collectively, these results align with other recent studies showing that ASGM activities are associated with elevated human mercury exposure. 
    The fish farmed through the relatively new process of aquaculture in ASGM areas appeared to have little potential to contribute to human mercury exposure. More research is needed on human health risks associated with ASGM to discern occupational, residential, and nutritional exposure, especially through tracking temporal changes in mercury levels as fish ponds age, and assessing levels in different farmed fish species. Additionally, research is needed to definitively determine that elevated mercury levels in humans and fish result from the elemental mercury from mining, rather than from a different source, such as the mercury released from soil erosion during deforestation events from mining or other activities.

  3. A Data Preparation Methodology in Data Mining Applied to Mortality Population Databases.

    PubMed

    Pérez, Joaquín; Iturbide, Emmanuel; Olivares, Víctor; Hidalgo, Miguel; Martínez, Alicia; Almanza, Nelva

    2015-11-01

    It is known that the data preparation phase is the most time consuming in the data mining process, using between 50% and 70% of the total project time. Currently, data mining methodologies are general purpose, and one of their limitations is that they do not provide a guide about what particular tasks to develop in a specific domain. This paper shows a new data preparation methodology oriented to the epidemiological domain in which we have identified two sets of tasks: General Data Preparation and Specific Data Preparation. For both sets, the Cross-Industry Standard Process for Data Mining (CRISP-DM) is adopted as a guideline. The main contribution of our methodology is fourteen specialized tasks concerning this domain. To validate the proposed methodology, we developed a data mining system and the entire process was applied to real mortality databases. The results were encouraging because it was observed that the use of the methodology reduced some of the time consuming tasks and the data mining system showed findings of unknown and potentially useful patterns for the public health services in Mexico.

  4. Is Blast Injury a Modern Phenomenon?: Early Historical Descriptions of Mining and Volcanic Traumatic Brain Injury With Relevance to Modern Terrorist Attacks and Military Warfare.

    PubMed

    Bowen, Lauren N; Moore, David F; Okun, Michael S

    2016-03-01

    Given the recent interest in blast injury spurred by returning soldiers from overseas conflicts, we sought to research the early historical descriptions of blast injuries and their treatments. Consideration was given to specific descriptions of survivors of closed head injury and their treatment. A review of the medical and nonmedical literature was undertaken, with particular emphasis on pre-1800 descriptions of volcanic eruptions and mining accidents. Compilations of accounts of the Etna eruptions dating from 126 BC were translated into English, and early mining texts from the 1600s and 1700s were reviewed. Accumulations of flammable gases were recorded in many medieval sources and this knowledge of toxic gas which could lead to blast injury was known in the mining community by 1316. No direct attribution of injuries to blast forces was present in the historical record examined before the 1300s, although mining accounts in the 1600s detail deaths due to blast. No specific descriptions of survivors of a closed head injury were found in the mining and volcanic eruption literature. Descriptions and warnings of blast forces were commonly written about in the medieval and Renaissance mining communities. Personal narratives as early as 1316 recognize the traumatic effects of blast injury. No mining or volcanic blast descriptions before 1800 detailed severe closed head injury survivors, suggesting greater mortality than morbidity from blast injury in the premodern era. This review also uncovered that there was no historical treatment or remedy recommended to survivors of blast injury. Blast explosions resulting in injury or death were frequently described, although in simplistic terminology.

  5. Controls on the Mobility of Antimony in Mine Waste from Three Deposit Types

    NASA Astrophysics Data System (ADS)

    Jamieson, H.; Radková, A. B.; Fawcett, S.

    2017-12-01

    Antimony can be considered both a critical metal and an environmental hazard, with a toxicity similar to arsenic. It is concentrated in stibnite deposits, but also present in polymetallic and precious metal ores, frequently accompanied by arsenic. We have studied the mineralogical controls on the mobility of antimony in three types of mine waste: stibnite tailings from an antimony mine, tetrahedrite-bearing waste rock from copper mining, and gold mine tailings and ore roaster waste. Our results demonstrate that the tendency of antimony to leach into the aqueous environment or remain sequestered in solid phases depends on the primary host minerals and conditions governing the precipitation of secondary antimony-hosting phases. In tailings at the Beaver Brook antimony mine in Newfoundland, Canada, stibnite oxidizes rapidly, and secondary minerals such as the relatively insoluble Sb-Fe tripuhyite-like phase and Sb-bearing goethite form. However, under dry conditions, the most important secondary Sb host is the Mg-Sb hydroxide brandholzite, but this easily soluble mineral disappears when it rains. Antimony that was originally hosted in tetrahedrite, a complex multi-element sulfosalt, in the historic waste rock piles at Špania Dolina-Piesky, Slovakia, is not as mobile as Cu and As during weathering but reprecipitates to a mixture of tripuhyite and romeite. Finally, the original antimony-hosting minerals, both stibnite and sulphosalts, in the gold ore at Giant Mine, Yellowknife, Canada were completely destroyed during ore roasting. In tailings-contaminated sediments, antimony persists in roaster-generated iron oxide phases, except under reducing conditions where some of the antimony forms a Sb-S phase. The combined presence of antimony and arsenic in mine waste complicates risk assessment but in general, our findings suggest that antimony is less mobile than arsenic in the environment.

  6. Clustering XML Documents Using Frequent Subtrees

    NASA Astrophysics Data System (ADS)

    Kutty, Sangeetha; Tran, Tien; Nayak, Richi; Li, Yuefeng

    This paper presents an experimental study conducted over the INEX 2008 Document Mining Challenge corpus using both the structure and the content of XML documents for clustering them. The concise common substructures known as the closed frequent subtrees are generated using the structural information of the XML documents. The closed frequent subtrees are then used to extract the constrained content from the documents. A matrix containing the term distribution of the documents in the dataset is developed using the extracted constrained content. The k-way clustering algorithm is applied to the matrix to obtain the required clusters. In spite of the large number of documents in the INEX 2008 Wikipedia dataset, the proposed frequent subtree-based clustering approach was successful in clustering the documents. This approach significantly reduces the dimensionality of the terms used for clustering without much loss in accuracy.

  7. PDBj Mine: design and implementation of relational database interface for Protein Data Bank Japan

    PubMed Central

    Kinjo, Akira R.; Yamashita, Reiko; Nakamura, Haruki

    2010-01-01

    This article is a tutorial for PDBj Mine, a new database and its interface for Protein Data Bank Japan (PDBj). In PDBj Mine, data are loaded from files in the PDBMLplus format (an extension of PDBML, PDB's canonical XML format, enriched with annotations), which are then served for the user of PDBj via the worldwide web (WWW). We describe the basic design of the relational database (RDB) and web interfaces of PDBj Mine. The contents of PDBMLplus files are first broken into XPath entities, and these paths and data are indexed in the way that reflects the hierarchical structure of the XML files. The data for each XPath type are saved into the corresponding relational table that is named as the XPath itself. The generation of table definitions from the PDBMLplus XML schema is fully automated. For efficient search, frequently queried terms are compiled into a brief summary table. Casual users can perform simple keyword search, and 'Advanced Search' which can specify various conditions on the entries. More experienced users can query the database using SQL statements which can be constructed in a uniform manner. Thus, PDBj Mine achieves a combination of the flexibility of XML documents and the robustness of the RDB. Database URL: http://www.pdbj.org/ PMID:20798081
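
    The XPath-to-table idea described above can be illustrated with a minimal sketch. It assumes a simplified scheme in which each distinct element path becomes one SQLite table holding (doc_id, value) rows. The function name, the two-column layout and the sample XML tags are hypothetical and are not taken from PDBj Mine or the PDBMLplus schema, which also uses XML namespaces that this sketch ignores.

```python
import sqlite3
import xml.etree.ElementTree as ET


def load_xml_by_path(conn, doc_id, xml_text):
    """Store every non-empty element text in a table named after its path.

    Hypothetical layout: one table per path, columns (doc_id, value).
    """
    root = ET.fromstring(xml_text)

    def walk(elem, path):
        text = (elem.text or "").strip()
        if text:
            # Quote the path so that it is a valid SQL identifier.
            table = '"' + path.replace('"', '""') + '"'
            conn.execute(
                f"CREATE TABLE IF NOT EXISTS {table} (doc_id TEXT, value TEXT)"
            )
            conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (doc_id, text))
        for child in elem:
            walk(child, f"{path}/{child.tag}")

    walk(root, "/" + root.tag)


# Illustrative document; the tag names are invented for this example.
sample = (
    "<entry><struct><title>Lysozyme</title></struct>"
    "<cell><length_a>78.5</length_a></cell></entry>"
)
conn = sqlite3.connect(":memory:")
load_xml_by_path(conn, "demo1", sample)

# A query is built in the same way for any path, which is the point of the scheme.
rows = conn.execute(
    'SELECT value FROM "/entry/struct/title" WHERE doc_id = ?', ("demo1",)
).fetchall()
print(rows)
```

    Naming each table after its path means a query for any item has the same shape, which is one plausible reading of the abstract's remark that SQL statements "can be constructed in a uniform manner".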

  8. PDBj Mine: design and implementation of relational database interface for Protein Data Bank Japan.

    PubMed

    Kinjo, Akira R; Yamashita, Reiko; Nakamura, Haruki

    2010-08-25

    This article is a tutorial for PDBj Mine, a new database and its interface for Protein Data Bank Japan (PDBj). In PDBj Mine, data are loaded from files in the PDBMLplus format (an extension of PDBML, PDB's canonical XML format, enriched with annotations), which are then served for the user of PDBj via the worldwide web (WWW). We describe the basic design of the relational database (RDB) and web interfaces of PDBj Mine. The contents of PDBMLplus files are first broken into XPath entities, and these paths and data are indexed in the way that reflects the hierarchical structure of the XML files. The data for each XPath type are saved into the corresponding relational table that is named as the XPath itself. The generation of table definitions from the PDBMLplus XML schema is fully automated. For efficient search, frequently queried terms are compiled into a brief summary table. Casual users can perform simple keyword search, and 'Advanced Search' which can specify various conditions on the entries. More experienced users can query the database using SQL statements which can be constructed in a uniform manner. Thus, PDBj Mine achieves a combination of the flexibility of XML documents and the robustness of the RDB. Database URL: http://www.pdbj.org/

  9. Trace Metal Content of Sediments Close to Mine Sites in the Andean Region

    PubMed Central

    Yacoub, Cristina; Pérez-Foguet, Agustí; Miralles, Nuria

    2012-01-01

    This study is a preliminary examination of heavy metal pollution in sediments close to two mine sites in the upper part of the Jequetepeque River Basin, Peru. Sediment concentrations of Al, As, Cd, Cu, Cr, Fe, Hg, Ni, Pb, Sb, Sn, and Zn were analyzed. A comparative study of the trace metal content of sediments shows that the highest concentrations are found at the closest points to the mine sites in both cases. The sediment quality analysis was performed using the threshold effect level of the Canadian guidelines (TEL). The sediment samples analyzed show that potential ecological risk is caused frequently at both sites by As, Cd, Cu, Hg, Pb, and Zn. The long-term influence of sediment metals in the environment is also assessed by sequential extraction scheme analysis (SES). The availability of metals in sediments is assessed, and it is considered a significant threat to the environment for As, Cd, and Sb close to one mine site and Cr and Hg close to the other mine site. Statistical analysis of sediment samples provides a characterization of both subbasins, showing low concentrations of a specific set of metals and identifies the main characteristics of the different pollution sources. A tentative relationship between pollution sources and possible ecological risk is established. PMID:22606058

  10. Potential synergy: the thorium fuel cycle and rare earths processing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ault, T.; Wymer, R.; Croff, A.

    2013-07-01

    The use of thorium in nuclear power programs has been evaluated on a recurring basis. A concern often raised is the lack of 'thorium infrastructure'; however, for at least a part of a potential thorium fuel cycle, this may be less of a problem than previously thought. Thorium is frequently encountered in association with rare earth elements and, since the U.S. last systematically evaluated the large-scale use of thorium (the 1970s), the use of rare earth elements has increased ten-fold to approximately 200,000 metric tons per year. Integration of thorium extraction with rare earth processing has been previously described and top-level estimates have been done on thorium resource availability; however, since ores and mining operations differ markedly, what is needed is process flowsheet analysis to determine whether a specific mining operation can feasibly produce thorium as a by-product. Also, the collocation of thorium with rare earths means that, even if a thorium product stream is not developed, its presence in mining waste streams needs to be addressed and there are previous instances where this has caused issues. This study analyzes several operational mines, estimates the mines' ability to produce a thorium by-product stream, and discusses some waste management implications of recovering thorium. (authors)

  11. Development and implementation of the Good Neighbor Agreement (GNA) practice in the USA sustainable mining development.

    NASA Astrophysics Data System (ADS)

    Masaitis, Alexandra

    2014-05-01

    New economic, environmental and social challenges for the mining industry in the USA show the need to implement "responsible" mining practices that include improved community involvement. Conflicts which occur in the US territory and with US mining companies around the world are now common between the mining proponents, NGO's and communities. These conflicts can sometimes be alleviated by early development of modes of communication, and a formal discussion format that allows airing of concerns and potential resolution of problems. One of the methods that can formalize this process is to establish a Good Neighbor Agreement (GNA), which deals specifically with challenges in relationships between mining operations and the local communities. It is a new practice related to mining operations that are oriented toward social needs and concerns of local communities that arise during the normal life of a mine, which can achieve sustainable mining practices. The GNA project being currently developed at the University of Nevada, USA in cooperation with the Newmont Mining Corporation has a goal of creating an open company/community dialog that will help identify and address sociological and environmental concerns associated with mining. Discussion: The Good Neighbor Agreement currently evolving will address the following: 1. Identify spheres of possible cooperation between mining companies, government organizations, and NGO's. 2. Provide an economically viable mechanism for developing a partnership between mining operations and the local communities that will increase mining industry's accountability and provide higher levels of confidence for the community that a mine is operated in a safe and sustainable manner. 
Implementation of the GNA can help identify and evaluate conflict criteria in mining/community relationships; determine the status of concerns; determine the role and responsibilities of stakeholders; analyze problem resolution feasibility; maintain the community involvement and support through economic benefits and environmental safeguards; develop options for resolving concerns. Difficulties in establishing the GNA standards include the lack of insurance/bonding policies and the lack of audit and monitoring that could determine the level of exposure of the local community and the environment to the contaminants released at the mine sites. Since many problems of mines can occur during closure and post-closure, GNA's should address those issues also. The goal of the GNA is to have open access for the public to the safety, health, and environmental information pertaining to the mining operation, as well as to educate the local communities about mining practices that promote mutual acknowledgment of the need to build a relationship amenable to each other's needs. Frequent conflicts between mining companies and surrounding communities lead to work disruptions or even mine closures and show the necessity of a less confrontational approach to environmental and social justice. The Good Neighbor Agreement is a unique way to benefit both mining operations and the local community by providing a mechanism for risk reduction and communication that offers the potential to protect both mining and community interests.

  12. Real-time intelligent decision making with data mining

    NASA Astrophysics Data System (ADS)

    Gupta, Deepak P.; Gopalakrishnan, Bhaskaran

    2004-03-01

    Database mining, widely known as knowledge discovery and data mining (KDD), has attracted a lot of attention in recent years. With the rapid growth of databases in commercial, industrial, administrative and other applications, it is necessary and interesting to extract knowledge automatically from huge amounts of data. Almost all organizations are generating data and information at an unprecedented rate and they need to get some useful information from this data. Data mining is the extraction of non-trivial, previously unknown and potentially useful patterns, trends, dependence and correlation known as association rules among data values in large databases. In the last ten to fifteen years, data mining spread out from one company to the other to help them understand more about customers' aspect of quality and response and also distinguish the customers they want from those they do not. A credit-card company found that customers who complete their applications in pencil rather than pen are more likely to default. There is a program that identifies callers by purchase history. The bigger the spender, the quicker the call will be answered. If you feel your call is being answered in the order in which it was received, think again. Many algorithms assume that data is static in nature and mine the rules and relations in that data. But for a dynamic database e.g. in most of the manufacturing industries, the rules and relations thus developed among the variables/items no longer hold true. A simple approach may be to mine the associations among the variables after every fixed period of time. But how long this period should be remains a question to be answered. The next problem with static data mining is that some of the relationships that might be of interest from one period to the other may be lost after a new set of data is used. 
To reflect the effect of a new data set and the current status of the association rules, where some of the strong rules might become weak and vice versa, there is a need to develop an efficient algorithm to adapt to the current patterns and associations. Some work has been done in developing the association rules for incremental databases but to the best of the author's knowledge no work has been done to do the same for periodic cause and effect analysis for online association rules in manufacturing industries. The present research attempts to answer these questions and develop an algorithm that can display the association rules online, find the periodic patterns in the data and detect the root cause of the problem.

  13. Information and communication technology and climate change adaptation: Evidence from selected mining companies in South Africa

    PubMed Central

    Nhamo, Godwell

    2016-01-01

    The mining sector is a significant contributor to the gross domestic product of many global economies. Given the increasing trends in climate-induced disasters and the growing desire to find lasting solutions, information and communication technology (ICT) has been introduced into the climate change adaptation mix. Climate change-induced extreme weather events such as flooding, drought, excessive fog, and cyclones have compounded the environmental challenges faced by the mining sector. This article presents the adoption of ICT innovation as part of the adaptation strategies towards reducing the mining sector’s vulnerability and exposure to climate change disaster risks. Document analysis and systematic literature review were adopted as the methodology. Findings from the study reflect how ICT intervention orchestrated changes in communication patterns which are tailored towards the reduction in climate change vulnerability and exposure. The research concludes with a proposition that ICT intervention must be part of the bigger and ongoing climate change adaptation agenda in the mining sector.

  14. An application of data mining in district heating substations for improving energy performance

    NASA Astrophysics Data System (ADS)

    Xue, Puning; Zhou, Zhigang; Chen, Xin; Liu, Jing

    2017-11-01

    Automatic meter reading system is capable of collecting and storing a huge number of district heating (DH) data. However, the data obtained are rarely fully utilized. Data mining is a promising technology to discover potential interesting knowledge from vast data. This paper applies data mining methods to analyse the massive data for improving energy performance of DH substation. The technical approach contains three steps: data selection, cluster analysis and association rule mining (ARM). Two-heating-season data of a substation are used for case study. Cluster analysis identifies six distinct heating patterns based on the primary heat of the substation. ARM reveals that secondary pressure difference and secondary flow rate have a strong correlation. Using the discovered rules, a fault occurring in remote flow meter installed at secondary network is detected accurately. The application demonstrates that data mining techniques can effectively extrapolate potential useful knowledge to better understand substation operation strategies and improve substation energy performance.

  15. RADSS: an integration of GIS, spatial statistics, and network service for regional data mining

    NASA Astrophysics Data System (ADS)

    Hu, Haitang; Bao, Shuming; Lin, Hui; Zhu, Qing

    2005-10-01

    Regional data mining, which aims at the discovery of knowledge about spatial patterns, clusters or association between regions, has wide applications nowadays in the social sciences, such as sociology, economics, epidemiology, crime, and so on. Many applications in the regional or other social sciences are more concerned with the spatial relationship, rather than the precise geographical location. Based on the spatial continuity rule derived from Tobler's first law of geography: observations at two sites tend to be more similar to each other if the sites are close together than if far apart, spatial statistics, as an important means for spatial data mining, allow the users to extract the interesting and useful information like spatial pattern, spatial structure, spatial association, spatial outlier and spatial interaction, from the vast amount of spatial data or non-spatial data. Therefore, by integrating with the spatial statistical methods, the geographical information systems will become more powerful in gaining further insights into the nature of spatial structure of regional system, and help the researchers to be more careful when selecting appropriate models. However, the lack of such tools holds back the application of spatial data analysis techniques and development of new methods and models (e.g., spatio-temporal models). Herein, we make an attempt to develop such integrated software and apply it to the complex system analysis for the Poyang Lake Basin. This paper presents a framework for integrating GIS, spatial statistics and network service in regional data mining, as well as their implementation. After discussing the spatial statistics methods involved in regional complex system analysis, we introduce RADSS (Regional Analysis and Decision Support System), our new regional data mining tool, by integrating GIS, spatial statistics and network service. 
RADSS includes the functions of spatial data visualization, exploratory spatial data analysis, and spatial statistics. The tool also includes some fundamental spatial and non-spatial databases on regional population and environment, which can be updated by external database via CD or network. Utilizing this data mining and exploratory analytical tool, the users can easily and quickly analyse the huge amount of interrelated regional data, and better understand the spatial patterns and trends of the regional development, so as to make a credible and scientific decision. Moreover, it can be used as an educational tool for spatial data analysis and environmental studies. In this paper, we also present a case study on Poyang Lake Basin as an application of the tool and spatial data mining in complex environmental studies. Finally, several concluding remarks are given.

  16. Sensor feature fusion for detecting buried objects

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Clark, G.A.; Sengupta, S.K.; Sherwood, R.J.

    1993-04-01

    Given multiple registered images of the earth's surface from dual-band sensors, our system fuses information from the sensors to reduce the effects of clutter and improve the ability to detect buried or surface target sites. The sensor suite currently includes two sensors (5 micron and 10 micron wavelengths) and one ground penetrating radar (GPR) of the wide-band pulsed synthetic aperture type. We use a supervised learning pattern recognition approach to detect metal and plastic land mines buried in soil. The overall process consists of four main parts: Preprocessing, feature extraction, feature selection, and classification. These parts are used in a two-step process to classify a subimage. The first step, referred to as feature selection, determines the features of sub-images which result in the greatest separability among the classes. The second step, image labeling, uses the selected features and the decisions from a pattern classifier to label the regions in the image which are likely to correspond to buried mines. We extract features from the images, and use feature selection algorithms to select only the most important features according to their contribution to correct detections. This allows us to save computational complexity and determine which of the sensors add value to the detection system. The most important features from the various sensors are fused using supervised learning pattern classifiers (including neural networks). We present results of experiments to detect buried land mines from real data, and evaluate the usefulness of fusing feature information from multiple sensor types, including dual-band infrared and ground penetrating radar. The novelty of the work lies mostly in the combination of the algorithms and their application to the very important and currently unsolved operational problem of detecting buried land mines from an airborne standoff platform.

  17. Data Mining for Understanding and Improving Decision-Making Affecting Ground Delay Programs

    NASA Technical Reports Server (NTRS)

    Kulkarni, Deepak; Wang, Yao Xun; Sridhar, Banavar

    2013-01-01

    The continuous growth in the demand for air transportation results in an imbalance between airspace capacity and traffic demand. The airspace capacity of a region depends on the ability of the system to maintain safe separation between aircraft in the region. In addition to growing demand, the airspace capacity is severely limited by convective weather. During such conditions, traffic managers at the FAA's Air Traffic Control System Command Center (ATCSCC) and dispatchers at various Airlines' Operations Center (AOC) collaborate to mitigate the demand-capacity imbalance caused by weather. The end result is the implementation of a set of Traffic Flow Management (TFM) initiatives such as ground delay programs, reroute advisories, flow metering, and ground stops. Data Mining is the automated process of analyzing large sets of data and then extracting patterns in the data. Data mining tools are capable of predicting behaviors and future trends, allowing an organization to benefit from past experience in making knowledge-driven decisions. The work reported in this paper is focused on ground delay programs. Data mining algorithms have the potential to develop associations between weather patterns and the corresponding ground delay program responses. If successful, they can be used to improve and standardize TFM decision resulting in better predictability of traffic flows on days with reliable weather forecasts. The approach here seeks to develop a set of data mining and machine learning models and apply them to historical archives of weather observations and forecasts and TFM initiatives to determine the extent to which the theory can predict and explain the observed traffic flow behaviors.

  18. Development of rapid methods for measuring stream ecosystem functions in the Appalachian coal mining region: preliminary results

    EPA Science Inventory

    Headwater streams represent the majority of U.S. stream miles. As a consequence of being abundant and widespread, the alteration and loss of headwater streams may have impacts on downstream waterbodies. These streams are frequently the subject of proposed dredge and fill projects...

  19. Mycobacteria in water used for personal hygiene in heavy industry and collieries: a potential risk for employees.

    PubMed

    Ulmann, Vit; Kracalikova, Anna; Dziedzinska, Radka

    2015-03-04

    Environmental mycobacteria (EM) constitute a health risk, particularly for immunocompromised people. Workers in heavy industry and in collieries represent an at-risk group of people as their immunity is often weakened by long-term employment in dusty environments, frequent smoking and an increased occurrence of pulmonary diseases. This study was concerned with the presence of EM in non-drinking water used for the hygiene of employees in six large industrial companies and collieries. Over a period of ten years, 1096 samples of surface water treated for hygiene purposes (treated surface water) and treated surface water diluted with mining water were examined. EM were detected in 63.4 and 41.5% samples of treated surface water and treated surface water diluted with mining water, respectively. Mycobacterium gordonae, M. avium-intracellulare and M. kansasii were the most frequently detected species. Adoption of suitable precautions should be enforced to reduce the incidence of mycobacteria in shower water and to decrease the infectious pressure on employees belonging to an at-risk group of people.

  20. Association mining of mutated cancer genes in different clinical stages across 11 cancer types.

    PubMed

    Hu, Wangxiong; Li, Xiaofen; Wang, Tingzhang; Zheng, Shu

    2016-10-18

    Many studies have demonstrated that some genes (e.g. APC, BRAF, KRAS, PTEN, TP53) are frequently mutated in cancer; however, the underlying mechanism that contributes to their high mutation frequency remains unclear. Here we used the Apriori algorithm to find the frequent mutational gene sets (FMGSs) from 4,904 tumors across 11 cancer types as part of the TCGA Pan-Cancer effort and then mined the hidden association rules (ARs) within these FMGSs. Intriguingly, we found that well-known cancer driver genes such as BRAF, KRAS, PTEN, and TP53 often co-occurred with other driver genes and that FMGS size peaked at an itemset size of 3~4 genes. Besides, the number and constitution of FMGSs and ARs differed greatly among different cancers and stages. In addition, FMGSs and ARs were rare in endocrine-related cancers such as breast carcinoma, ovarian cystadenocarcinoma, and thyroid carcinoma, but abundant in cancers in direct contact with external environments such as skin melanoma and stomach adenocarcinoma. Furthermore, we observed more rules in stage IV than in other stages, indicating that distant metastasis needed a more sophisticated gene regulatory network.
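
    As a rough illustration of the two steps named in this abstract (frequent itemsets via Apriori, then association rules), the following is a minimal textbook-style sketch. The toy "tumors", the thresholds and the function names are assumptions made for illustration only; they are not the study's data, parameters or implementation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item seen in the data.
    current = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        level = {}
        for cand in current:
            count = sum(1 for t in transactions if cand <= t)
            if count / n >= min_support:
                level[cand] = count / n
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, keeping only those
        # whose k-subsets are all frequent (the Apriori pruning property).
        k += 1
        current = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                current.add(union)
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # confidence(lhs -> rhs) = support(itemset) / support(lhs)
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules


# Hypothetical toy input: each set lists the mutated genes in one tumor.
tumors = [
    {"TP53", "KRAS", "APC"},
    {"TP53", "KRAS"},
    {"TP53", "PTEN"},
    {"BRAF"},
    {"TP53", "KRAS", "APC"},
]
frequent_sets = apriori(tumors, min_support=0.4)
for lhs, rhs, sup, conf in association_rules(frequent_sets, min_confidence=0.9):
    print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f}")
```

    On real mutation data, the support and confidence thresholds determine how many gene sets and rules survive, so comparisons across cancer types or stages depend on how those thresholds are chosen.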

  1. Discovering Visual Scanning Patterns in a Computerized Cancellation Test

    ERIC Educational Resources Information Center

    Huang, Ho-Chuan; Wang, Tsui-Ying

    2013-01-01

    The purpose of this study was to develop an attention sequential mining mechanism for investigating the sequential patterns of children's visual scanning process in a computerized cancellation test. Participants had to locate and cancel the target amongst other non-targets in a structured form, and a random form with Chinese stimuli. Twenty-three…

  2. Using temporal mining to examine the development of lymphedema in breast cancer survivors.

    PubMed

    Green, Jason M; Paladugu, Sowjanya; Shuyu, Xu; Stewart, Bob R; Shyu, Chi-Ren; Armer, Jane M

    2013-01-01

    Secondary lymphedema is a lifetime risk for breast cancer survivors and can severely affect quality of life. Early detection and treatment are crucial for successful lymphedema management. Limb volume measurements can be utilized not only to diagnose lymphedema but also to track progression of limb volume changes before lymphedema, which has the potential to provide insight into the development of this condition. This study aims to identify commonly occurring patterns in limb volume changes in breast cancer survivors before the development of lymphedema and to determine if there were differences in these patterns between certain patient subgroups. Furthermore, pattern differences were studied between patients who developed lymphedema quickly and those whose onset was delayed. A temporal data mining technique was used to identify and compare common patterns in limb volume measurements in patient subgroups of study participants (n = 232). Patterns were filtered initially by support and confidence values, and then t tests were used to determine statistical significance of the remaining patterns. Higher body mass index and the presence of postoperative swelling are supported as risk factors for lymphedema. In addition, a difference in trajectory to the lymphedema state was observed. The results have potential to guide clinical guidelines for assessment of latent and early-onset lymphedema.

  3. Are environmental characteristics in the municipal eldercare, more closely associated with frequent short sick leave spells among employees than with total sick leave: a cross-sectional study

    PubMed Central

    2013-01-01

    Background It has been suggested that frequent, short-term sick leave is associated with work environment factors, whereas long-term sick leave is associated mainly with health factors. However, studies of the hypothesis of an association between a poor working environment and frequent short spells of sick leave are few and results are inconsistent. Therefore, we aimed to explore associations between self-reported psychosocial work factors and workplace-registered frequency and length of sick leave in the eldercare sector. Methods Employees from the municipal eldercare in Aarhus (N = 2,534) were included. In 2005, they responded to a work environment questionnaire. Sick leave records from 2005 were dichotomised into total sick leave days (0–14 and above 14 days) and into spell patterns (0–2 short, 3–9 short, and mixed spells and 1–3 long spells). Logistic regression models were used to analyse associations; adjusted for age, gender, occupation, and number of spells or sick leave length. Results The response rate was 76%; 96% of the respondents were women. Unfavourable mean scores in work pace, demands for hiding emotions, poor quality of leadership and bullying were best indicated by more than 14 sick leave days compared with 0–14 sick leave days. For work pace, the best indicator was a long-term sick leave pattern compared with a non-frequent short-term pattern. A frequent short-term sick leave pattern was a better indicator of emotional demands (1.62; 95% CI: 1.1-2.5) and role conflict (1.50; 95% CI: 1.2-1.9) than a short-term non-frequent pattern. Age (≤40 / >40 years) statistically significantly modified the association between the 1–3 long-term sick leave spell pattern and commitment to the workplace compared with the 3–9 frequent short-term pattern. Conclusions Total sick leave length and a long-term sick leave spell pattern were just as good or even better indicators of unfavourable work factor scores than a frequent short-term sick leave pattern. 
Scores in commitment to the workplace and quality of leadership varied with sick leave pattern and age. Thus, different sick leave measures seem to be associated with different work environment factors. Further studies on these associations may inform interventions to improve occupational health care. PMID:23764253

  4. Social cost of land mines in four countries: Afghanistan, Bosnia, Cambodia, and Mozambique.

    PubMed Central

    Andersson, N.; da Sousa, C. P.; Paredes, S.

    1995-01-01

    OBJECTIVES--To document the effects of land mines on the health and social conditions of communities in four affected countries. DESIGN--A cross design of cluster survey and rapid appraisal methods including a household questionnaire and qualitative data from key informants, institutional reviews, and focus groups of survivors of land mines from the same communities. SETTING--206 communities, 37 in Afghanistan, 66 in Bosnia, 38 in Cambodia, and 65 in Mozambique. SUBJECTS--174,489 people living in 32,904 households in the selected communities. MAIN OUTCOME MEASURES--Effects of land mines on food security, residence, livestock, and land use; risk factors: extent of individual land mine injuries; physical, psychological, social, and economic costs of injuries during medical care and rehabilitation. RESULTS--Between 25% and 87% of households had daily activities affected by land mines. Based on expected production without the mines, agricultural production could increase by 88-200% in different regions of Afghanistan, 11% in Bosnia, 135% in Cambodia, and 3.6% in Mozambique. A total of 54,554 animals was lost because of land mines, with a minimum cash value of $6.5m, or nearly $200 per household. Overall, 6% of households (1964) reported a land mine victim; a third of victims died in the blast. One in 10 of the victims was a child. The most frequent activities associated with land mine incidents were agricultural or pastoral, except in Bosnia where more than half resulted from military activities, usually during patrols. Incidences have more than doubled between 1980-3 and 1990-3, excluding the incidents in Bosnia. Some 22% of victims (455/2100) were from households reporting attempts to remove land mines; in these households there was a greatly increased risk of injury (odds ratio 4.2 and risk difference 19% across the four countries). 
Lethality of the mines varied; in Bosnia each blast killed an average of 0.54 people and injured 1.4, whereas in Mozambique each blast killed 1.45 people and wounded 1.27. Households with a land mine victim were 40% more likely to experience difficulty in providing food for the family. Family relationships were affected for around one in every four victims and relationships with colleagues in 40%. CONCLUSIONS--Land mines seriously undermine the economy and food security in affected countries; they kill and maim civilians at an increasing rate. The expense of medical care and rehabilitation adds economic disability to the physical burden. Awareness of land mines can be targeted at high risk attitudes, such as those associated with tampering with mines. PMID:7549685

  5. Social cost of land mines in four countries: Afghanistan, Bosnia, Cambodia, and Mozambique.

    PubMed

    Andersson, N; da Sousa, C P; Paredes, S

    1995-09-16

    To document the effects of land mines on the health and social conditions of communities in four affected countries. A cross design of cluster survey and rapid appraisal methods including a household questionnaire and qualitative data from key informants, institutional reviews, and focus groups of survivors of land mines from the same communities. 206 communities, 37 in Afghanistan, 66 in Bosnia, 38 in Cambodia, and 65 in Mozambique. 174,489 people living in 32,904 households in the selected communities. Effects of land mines on food security, residence, livestock, and land use; risk factors: extent of individual land mine injuries; physical, psychological, social, and economic costs of injuries during medical care and rehabilitation. Between 25% and 87% of households had daily activities affected by land mines. Based on expected production without the mines, agricultural production could increase by 88-200% in different regions of Afghanistan, 11% in Bosnia, 135% in Cambodia, and 3.6% in Mozambique. A total of 54,554 animals was lost because of land mines, with a minimum cash value of $6.5m, or nearly $200 per household. Overall, 6% of households (1964) reported a land mine victim; a third of victims died in the blast. One in 10 of the victims was a child. The most frequent activities associated with land mine incidents were agricultural or pastoral, except in Bosnia where more than half resulted from military activities, usually during patrols. Incidences have more than doubled between 1980-3 and 1990-3, excluding the incidents in Bosnia. Some 22% of victims (455/2100) were from households reporting attempts to remove land mines; in these households there was a greatly increased risk of injury (odds ratio 4.2 and risk difference 19% across the four countries). Lethality of the mines varied; in Bosnia each blast killed an average of 0.54 people and injured 1.4, whereas in Mozambique each blast killed 1.45 people and wounded 1.27. 
Households with a land mine victim were 40% more likely to experience difficulty in providing food for the family. Family relationships were affected for around one in every four victims and relationships with colleagues in 40%. Land mines seriously undermine the economy and food security in affected countries; they kill and maim civilians at an increasing rate. The expense of medical care and rehabilitation adds economic disability to the physical burden. Awareness of land mines can be targeted at high risk attitudes, such as those associated with tampering with mines.

  6. Comparison of coseismic near-field and off-fault surface deformation patterns of the 1992 Mw 7.3 Landers and 1999 Mw 7.1 Hector Mine earthquakes: Implications for controls on the distribution of surface strain

    NASA Astrophysics Data System (ADS)

    Milliner, C. W. D.; Dolan, J. F.; Hollingsworth, J.; Leprince, S.; Ayoub, F.

    2016-10-01

    Subpixel correlation of preevent and postevent air photos reveal the complete near-field, horizontal surface deformation patterns of the 1992 Mw 7.3 Landers and 1999 Mw 7.1 Hector Mine ruptures. Total surface displacement values for both earthquakes are systematically larger than "on-fault" displacements from geologic field surveys, indicating significant distributed, inelastic deformation occurred along these ruptures. Comparison of these two data sets shows that 46 ± 10% and 39 ± 22% of the total surface deformation were distributed over fault zones averaging 154 m and 121 m in width for the Landers and Hector Mine events, respectively. Spatial variations of distributed deformation along both ruptures show correlations with the type of near-surface lithology and degree of fault complexity; larger amounts of distributed shear occur where the rupture propagated through loose unconsolidated sediments and areas of more complex fault structure. These results have basic implications for geologic-geodetic rate comparisons and probabilistic seismic hazard analysis.

  7. Sexual dimorphism in digital dermatoglyphic traits among Sinhalese people in Sri Lanka

    PubMed Central

    2013-01-01

    Background The purpose of this study was to evaluate gender-wise diversity of digital dermatoglyphic traits in a sample of Sinhalese people in Sri Lanka. Findings Four thousand three hundred and forty digital prints of 434 Sinhalese individuals (217 males and 217 females) were examined for their digital dermatoglyphic pattern distribution. The mean age for the entire group was 23.66 years (standard deviation = 4.93 years). The loop pattern is observed more frequently (n = 2,592, 59.72%) compared to whorl (n = 1,542, 35.53%) and arch (n = 206, 4.75%) in the Sinhalese population. Females (n = 1,274, 58.71%) have a more ulnar loop pattern than males (n = 1,231, 56.73%). The plain whorl pattern is observed more frequently in males (n = 560, 25.81%) compared to females (n = 514, 23.69%). The double loop pattern is observed more frequently on the right and left thumb (digit 1) of both males and females. Pattern intensity index, Dankmeijer index and Furuhata index are higher in males. Conclusions Ulnar loop is the most frequently occurring digital dermatoglyphic pattern among the Sinhalese. All pattern indices are higher in males. To some extent, dermatoglyphic patterns of Sinhalese are similar to North Indians and other Caucasoid populations. Further studies with larger sample sizes are recommended to confirm our findings. PMID:24377367

  8. A novel association rule mining approach using TID intermediate itemset.

    PubMed

    Aqra, Iyad; Herawan, Tutut; Abdul Ghani, Norjihan; Akhunzada, Adnan; Ali, Akhtar; Bin Razali, Ramdan; Ilahi, Manzoor; Raymond Choo, Kim-Kwang

    2018-01-01

    Designing an efficient association rule mining (ARM) algorithm for multilevel knowledge-based transactional databases that is appropriate for real-world deployments is of paramount concern. However, dynamic decision making that needs to modify the threshold, either to minimize or maximize the output knowledge, requires existing state-of-the-art algorithms to rescan the entire database. This process incurs heavy computation cost and is not feasible for real-time applications. The paper addresses the problem of dynamically updating the threshold for a given purpose. It presents a novel ARM approach that creates an intermediate itemset and applies a threshold to extract categorical frequent itemsets with diverse threshold values, thus improving overall efficiency because the whole database no longer needs to be scanned. After the entire itemset is built, real support can be obtained without rebuilding the itemset (e.g., the itemset lists are intersected to obtain the actual support). Moreover, the algorithm supports extracting many frequent itemsets according to a pre-determined minimum support with an independent purpose. Additionally, the experimental results of our proposed approach demonstrate the capability to be deployed in any mining system in a fully parallel mode, consequently increasing the efficiency of the real-time association rules discovery process. The proposed approach outperforms the extant state-of-the-art and shows promising results that reduce computation cost, increase accuracy, and produce all possible itemsets.
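
    The abstract above mentions intersecting itemset lists to obtain support without rescanning the database. The sketch below illustrates that general idea with plain transaction-ID (TID) sets. It is not the authors' algorithm: the function names (build_tid_lists, support, frequent_itemsets), the brute-force candidate enumeration, and the toy transactions are all assumptions made for illustration.

```python
from itertools import combinations


def build_tid_lists(transactions):
    """Scan the transactions once and map each item to the set of TIDs containing it."""
    tid_lists = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tid_lists.setdefault(item, set()).add(tid)
    return tid_lists


def support(itemset, tid_lists):
    """Support count of an itemset: size of the intersection of its items' TID sets."""
    tid_sets = [tid_lists.get(item, set()) for item in itemset]
    if not tid_sets:
        return 0
    return len(set.intersection(*tid_sets))


def frequent_itemsets(tid_lists, min_support, max_size=2):
    """Enumerate itemsets up to max_size and keep those meeting min_support.

    Only the TID lists are consulted, so the threshold can be changed
    without going back to the original transactions.
    """
    items = sorted(tid_lists)
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            count = support(candidate, tid_lists)
            if count >= min_support:
                result[candidate] = count
    return result


# Hypothetical toy data, used only to exercise the functions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
tid_lists = build_tid_lists(transactions)

# The same TID lists serve two different thresholds.
strict = frequent_itemsets(tid_lists, min_support=3)
loose = frequent_itemsets(tid_lists, min_support=2)
```

    In this sketch, changing min_support only repeats the set intersections; the transactions themselves are read once, in build_tid_lists. A practical miner would prune candidates rather than enumerate every combination.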

  9. A novel association rule mining approach using TID intermediate itemset

    PubMed Central

    Ali, Akhtar; Bin Razali, Ramdan; Ilahi, Manzoor; Raymond Choo, Kim-Kwang

    2018-01-01

    Designing an efficient association rule mining (ARM) algorithm for multilevel knowledge-based transactional databases that is appropriate for real-world deployments is of paramount concern. However, dynamic decision making that needs to modify the threshold, either to minimize or maximize the output knowledge, requires existing state-of-the-art algorithms to rescan the entire database. This process incurs heavy computation cost and is not feasible for real-time applications. The paper addresses the problem of dynamically updating the threshold for a given purpose. It presents a novel ARM approach that creates an intermediate itemset and applies a threshold to extract categorical frequent itemsets with diverse threshold values, thus improving overall efficiency because the whole database no longer needs to be scanned. After the entire itemset is built, real support can be obtained without rebuilding the itemset (e.g., the itemset lists are intersected to obtain the actual support). Moreover, the algorithm supports extracting many frequent itemsets according to a pre-determined minimum support with an independent purpose. Additionally, the experimental results of our proposed approach demonstrate the capability to be deployed in any mining system in a fully parallel mode, consequently increasing the efficiency of the real-time association rules discovery process. The proposed approach outperforms the extant state-of-the-art and shows promising results that reduce computation cost, increase accuracy, and produce all possible itemsets. PMID:29351287

  10. Reclamation technology development for western Arkansas coal refuse waste materials

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    King, J.R.; Veith, D.L.

    Coal mining has been an important industry in the Arkansas River Valley Major Land Resource Area (MLRA) of western Arkansas for more than 100 yr., most of it with little regard for environmental concerns. Almost 3,640 ha. of land affected by surface coal mines cover the seven-county area, with less than 1,200 ha. currently in various stages of operation or reclamation. Since only the active mining sites must now be reclaimed by law, the remaining 2,440 ha. of abandoned land remains at the mercy of natural forces. Little topsoil exists on these sites and the coal wastes are generally acidic with a pH in the 4.0-5.5 range. Revegetation attempts under these conditions generally require continued maintenance and retreatment until an acceptable cover is achieved. If and when an acceptable vegetative cover is established, the cost frequently approaches $7,400/ha. ($3,000/acre). In an effort to resolve these issues and provide some direction for stabilizing coal waste lands, the US Department of Agriculture through its Soil Conservation Service Plant Materials Center at Boonville, Arkansas, received a Congressional Pass through administered by the US Bureau of Mines, to support a 5-yr. revegetation study on the coal mine spoils of western Arkansas. This paper reports the results through the spring of 1994 on that portion of the study dealing with the establishment of blackberries as a cash crop on coal mine spoils.

  11. Systematic Review of Data Mining Applications in Patient-Centered Mobile-Based Information Systems.

    PubMed

    Fallah, Mina; Niakan Kalhori, Sharareh R

    2017-10-01

    Smartphones represent a promising technology for patient-centered healthcare. It is claimed that data mining techniques have improved mobile apps to address patients' needs at subgroup and individual levels. This study reviewed the current literature regarding data mining applications in patient-centered mobile-based information systems. We systematically searched PubMed, Scopus, and Web of Science for original studies reported from 2014 to 2016. After screening 226 records at the title/abstract level, the full texts of 92 relevant papers were retrieved and checked against inclusion criteria. Finally, 30 papers were included in this study and reviewed. Data mining techniques have been reported in development of mobile health apps for three main purposes: data analysis for follow-up and monitoring, early diagnosis and detection for screening purposes, classification/prediction of outcomes, and risk calculation (n = 27); data collection (n = 3); and provision of recommendations (n = 2). The most accurate and frequently applied data mining method was support vector machine; however, decision tree has shown superior performance to enhance mobile apps applied for patients' self-management. Embedded data-mining-based features in mobile apps, such as case detection, prediction/classification, risk estimation, or collection of patient data, particularly during self-management, would save, apply, and analyze patient data during and after care. More intelligent methods, such as artificial neural networks, fuzzy logic, and genetic algorithms, and even hybrid methods, may result in more patient-centered recommendations, providing education, guidance, alerts, and awareness of personalized output.

  12. Optimising post-mining soil conditions to maximise restoration success in a biodiverse semiarid environment

    NASA Astrophysics Data System (ADS)

    Muñoz-Rojas, Miriam; Erickson, Todd; Merritt, David; Dixon, Kingsley

    2014-05-01

    The original topsoil of mine degraded areas is frequently lost or damaged, which together with the absence of soil forming materials is a major constraint for seed germination and establishment in post-mining restoration. Thus, management of the available topsoil and the use of alternative growth media are critical to improve restoration areas disturbed through mining. Here we are developing laboratory and field trials to define the optimal range for physical and chemical properties of potentially suitable natural and 're-made' soil substrates and growth medium for 20 selected native plant species from the mining intensive Pilbara region of Western Australia. In this semiarid area, water is a limiting factor for seedling establishment, which is compounded by the lack of organic matter of post-disturbance soils. Therefore, particular attention is given to indicators of soil biological activity such as soil respiration, and hydrological soil properties such as water holding capacity, infiltration, hydraulic conductivity and soil water repellence. This research is part of a broader multi-study approach, the Restoration Seedbank Initiative project, a partnership between The University of Western Australia, BHP Billiton Iron Ore, and Kings Park and Botanic Garden to develop the science and underpinning knowledge to achieve biodiverse restoration in the Pilbara region, where land areas disturbed by mining exceed 40,000 ha. Achieving restoration success is critical as the Pilbara region is an ancient landscape with diverse geology and high levels of regional and local endemism in plants and animals.

  13. Integrating multiple immunogenetic data sources for feature extraction and mining somatic hypermutation patterns: the case of "towards analysis" in chronic lymphocytic leukaemia.

    PubMed

    Kavakiotis, Ioannis; Xochelli, Aliki; Agathangelidis, Andreas; Tsoumakas, Grigorios; Maglaveras, Nicos; Stamatopoulos, Kostas; Hadzidimitriou, Anastasia; Vlahavas, Ioannis; Chouvarda, Ioanna

    2016-06-06

    Somatic Hypermutation (SHM) refers to the introduction of mutations within rearranged V(D)J genes, a process that increases the diversity of Immunoglobulins (IGs). The analysis of SHM has offered critical insight into the physiology and pathology of B cells, leading to strong prognostication markers for clinical outcome in chronic lymphocytic leukaemia (CLL), the most frequent adult B-cell malignancy. In this paper we present a methodology for integrating multiple immunogenetic and clinicobiological data sources in order to extract features and create high quality datasets for SHM analysis in IG receptors of CLL patients. This dataset is used as the basis for a higher level integration procedure, inspired by social choice theory. This is applied in the Towards Analysis, our attempt to investigate the potential ontogenetic transformation of genes belonging to specific stereotyped CLL subsets towards other genes or gene families, through SHM. The data integration process, followed by feature extraction, resulted in the generation of a dataset containing information about mutations occurring through SHM. The Towards analysis, performed on the integrated dataset by applying voting techniques, revealed the distinct behaviour of subset #201 compared to other subsets, as regards SHM related movements among gene clans, both in allele-conserved and non-conserved gene areas. With respect to movement between genes, a high percentage movement towards pseudo genes was found in all CLL subsets. This data integration and feature extraction process can set the basis for exploratory analysis or a fully automated computational data mining approach on many as yet unanswered, clinically relevant biological questions.

  14. [Vegetation spatial and temporal dynamic characteristics based on NDVI time series trajectories in grassland opencast coal mining].

    PubMed

    Jia, Duo; Wang, Cang Jiao; Mu, Shou Guo; Zhao, Hua

    2017-06-18

    The spatiotemporal dynamic patterns of vegetation in mining areas are still unclear. This study used a time series trajectory segmentation algorithm to fit Landsat NDVI time series generated from fusion images at the most prosperous period of growth, based on the ESTARFM algorithm. Combining the shape features of the fitted trajectory, this paper extracted five vegetation dynamic patterns: pre-disturbance type, continuous disturbance type, stabilization after disturbance type, stabilization between disturbance and recovery type, and recovery after disturbance type. The results indicated that the recovery after disturbance type was the dominant vegetation change pattern among the five types, accounting for 55.2% of the total number of pixels. It was followed by the stabilization after disturbance type and the continuous disturbance type, accounting for 25.6% and 11.0%, respectively. The pre-disturbance type and the stabilization between disturbance and recovery type accounted for 3.5% and 4.7%, respectively. Vegetation disturbance mainly occurred from 2004 to 2009 in the Shengli mining area. The onset time of the stable state was 2008, and its spatial locations were mainly distributed in the open-pit stope and waste dump. The recovery state mainly started in 2008 and 2010, while the areas were small and mainly distributed at the periphery of the open-pit stope and waste dump. The duration of disturbance was mainly 1 year. The stable period usually lasted 7 years. The recovery state of the stabilization between disturbance and recovery type lasted 2 to 5 years, while that of the recovery after disturbance type often lasted 8 years.

  15. Forming artificial soils from waste materials for mine site rehabilitation

    NASA Astrophysics Data System (ADS)

    Yellishetty, Mohan; Wong, Vanessa; Taylor, Michael; Li, Johnson

    2014-05-01

    Surface mining activities often produce large volumes of solid waste and invariably require the removal of significant quantities of waste rock (overburden). As mines expand, larger volumes of waste rock need to be moved, which also require extensive areas for their safe disposal and containment. The erosion of these dumps may result in landform instability, which in turn may result in exposure of contaminants such as trace metals, elevated sediment delivery in adjacent waterways, and the subsequent degradation of downstream water quality. The management of solid waste materials from industrial operations is also a key component for a sustainable economy. For example, in addition to overburden, coal mines produce large amounts of waste in the form of fly ash while sewage treatment plants require disposal of large amounts of compost. Similarly, paper mills produce large volumes of alkaline rejected wood chip waste which is usually disposed of in landfill. These materials, therefore, present a challenge in their use and re-use in the rehabilitation of mine sites and provide a number of opportunities for innovative waste disposal. The combination of solid wastes sourced from mines, which are frequently nutrient poor and acidic, with nutrient-rich composted material produced from sewage treatment and alkaline wood chip waste has the potential to lead to a soil suitable for mine rehabilitation and successful seed germination and plant growth. This paper presents findings from two pilot projects which investigated the potential of artificial soils to support plant growth for mine site rehabilitation. We found that pH increased in all the artificial soil mixtures, which were able to support plant establishment. Plant growth was greatest in those soils with the greatest proportion of compost due to the higher nutrient content. 
These pot trials suggest that the use of different waste streams to form an artificial soil can potentially be used in mine site rehabilitation where there is a nutrient-rich source of waste.

  16. Metal contamination and post-remediation recovery in the Boulder River watershed, Jefferson County, Montana

    USGS Publications Warehouse

    Unruh, Daniel M.; Church, Stanley E; Nimick, David A.; Fey, David L.

    2009-01-01

    The legacy of acid mine drainage and toxic trace metals left in streams by historical mining is being addressed by many important yet costly remediation efforts. Monitoring of environmental conditions frequently is not performed but is essential to evaluate remediation effectiveness, determine whether clean-up goals have been met, and assess which remediation strategies are most effective. Extensive pre- and post-remediation data for water and sediment quality for the Boulder River watershed in southwestern Montana provide an unusual opportunity to demonstrate the importance of monitoring. The most extensive restoration in the watershed occurred at the Comet mine on High Ore Creek and resulted in the most dramatic improvement in aquatic habitat. Removal of contaminated sediment and tailings, and stream-channel reconstruction reduced Cd and Zn concentrations in water such that fish are now present, and reduced metal concentrations in streambed sediment by a factor of c. 10, the largest improvement in the district. Waste removals at the Buckeye/Enterprise and Bullion mine sites produced limited or no improvement in water and sediment quality, and acidic drainage from mine adits continues to degrade stream aquatic habitat. Recontouring of hillslopes that had funnelled runoff into the workings of the Crystal mine substantially reduced metal concentrations in Uncle Sam Gulch, but did not eliminate all of the acidic adit drainage. Lead isotopic evidence suggests that the Crystal mine rather than the Comet mine is now the largest source of metals in streambed sediment of the Boulder River. The completed removal actions prevent additional contaminants from entering the stream, but it may take many years for erosional processes to diminish the effects of contaminated sediment already in streams. 
Although significant strides have been made, additional efforts to seal draining adits or treat the adit effluent at the Bullion and Crystal mines would need to be completed to achieve the desired restoration.

  17. Hospitalization patterns associated with Appalachian coal mining.

    PubMed

    Hendryx, Michael; Ahern, Melissa M; Nurkiewicz, Timothy R

    2007-12-01

    The goal of this study was to test whether the volume of coal mining was related to population hospitalization risk for diseases postulated to be sensitive or insensitive to coal mining by-products. The study was a retrospective analysis of 2001 adult hospitalization data (n = 93,952) for West Virginia, Kentucky, and Pennsylvania, merged with county-level coal production figures. Hospitalization data were obtained from the Health Care Utilization Project National Inpatient Sample. Diagnoses postulated to be sensitive to coal mining by-product exposure were contrasted with diagnoses postulated to be insensitive to exposure. Data were analyzed using hierarchical nonlinear models, controlling for patient age, gender, insurance, comorbidities, hospital teaching status, county poverty, and county social capital. Controlling for covariates, the volume of coal mining was significantly related to hospitalization risk for two conditions postulated to be sensitive to exposure: hypertension and chronic obstructive pulmonary disease (COPD). The odds for a COPD hospitalization increased 1% for each 1462 tons of coal, and the odds for a hypertension hospitalization increased 1% for each 1873 tons of coal. Other conditions were not related to mining volume. Exposure to particulates or other pollutants generated by coal mining activities may be linked to increased risk of COPD and hypertension hospitalizations. Limitations in the data likely result in an underestimate of associations.

  18. Characterization and Modeling of Dust Emissions from an Instrumented Mine Tailings Site

    NASA Astrophysics Data System (ADS)

    Betterton, E. A.; Stovern, M.; Saez, A.; Csavina, J. L.; Felix Villar, O. I.; Field, J. P.; Rine, K. P.; Russell, M. R.; Saliba, P.

    2012-12-01

    Mining operations are potential sources of airborne particulate metal and metalloid contaminants through both direct smelter emissions and wind erosion of mine tailings. The warmer, drier conditions predicted for the Southwestern US by climate models may make contaminated atmospheric dust and aerosols increasingly important, due to potential deleterious effects on human health and ecology. Dust emissions and dispersion of contaminants from the Iron King Mine tailings in Dewey-Humboldt, Arizona, a Superfund site, are currently being investigated through in situ field measurements and computational fluid dynamics modeling. These tailings are heavily contaminated with lead and arsenic. We report on the chemical characterization of atmospheric dust and aerosol sampled near the mine tailings. Instrumented eddy flux towers were also set up on the mine tailings to give both spatial and temporal dust observations. The eddy flux towers have multiple DUSTTRAK monitors as well as weather stations. These in situ observations allow us to assess the spatial distribution of suspended particulate. Using the DUSTTRAK flux tower observations at 10-second resolution in conjunction with a computational fluid dynamics model, we have been able to model dust transport from the mine tailings to downwind areas. In order to improve the accuracy of the dust transport simulations, both regional topographical features and local weather patterns have been incorporated into the model simulations.

  19. Drilling and blasting parameters in sublevel caving in Sheregesh mine

    NASA Astrophysics Data System (ADS)

    Eremenko, AA; Filippov, VN; Konurin, AI; Khmelinin, AP; Baryshnikov, DV; Khristolyubov, EA

    2018-03-01

    The factors that influence geomechanical state of rock mass in Sheregesh Mine are determined. The authors discuss a variant of geotechnology with fan drilling. The drill-hole patterns and drilling-and-blasting parameters are presented. The revealed causes of low-quality fragmentation of rocks include the presence of closed and open fractures at different distances from drill-hole mouths, both in case of rings and fans, as well as the blocking of drill-holes with rocks.

  20. International strategic mineral issues summary report: tungsten

    USGS Publications Warehouse

    Werner, Antony B.T.; Sinclair, W. David; Amey, Earle B.

    1998-01-01

    In 1995, China and the former Soviet Union accounted for over three-fourths of the world's mine production of tungsten. China alone produced about two-thirds of world output. Given its vast resources, China will likely maintain its prominent role in world tungsten supply. By the year 2020, changes in supply patterns are likely to result from declining output from individual deposits in Australia, Austria, and Portugal and the opening of new mines in Canada, China, and the United Kingdom.

  1. Developing and Implementing the Data Mining Algorithms in RAVEN

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sen, Ramazan Sonat; Maljovec, Daniel Patrick; Alfonsi, Andrea

    The RAVEN code is becoming a comprehensive tool to perform probabilistic risk assessment, uncertainty quantification, and verification and validation. The RAVEN code is being developed to support many programs and to provide a set of methodologies and algorithms for advanced analysis. Scientific computer codes can generate enormous amounts of data. To post-process and analyze such data might, in some cases, take longer than the initial software runtime. Data mining algorithms/methods help in recognizing and understanding patterns in the data, and thus discover knowledge in databases. The methodologies used in the dynamic probabilistic risk assessment or in uncertainty and error quantification analysis couple system/physics codes with simulation controller codes, such as RAVEN. RAVEN introduces both deterministic and stochastic elements into the simulation while the system/physics code model the dynamics deterministically. A typical analysis is performed by sampling values of a set of parameter values. A major challenge in using dynamic probabilistic risk assessment or uncertainty and error quantification analysis for a complex system is to analyze the large number of scenarios generated. Data mining techniques are typically used to better organize and understand data, i.e. recognizing patterns in the data. This report focuses on development and implementation of Application Programming Interfaces (APIs) for different data mining algorithms, and the application of these algorithms to different databases.

  2. Analysis of In-Flight Collision Process During V-Type Firing Pattern in Surface Blasting Using Simple Physics

    NASA Astrophysics Data System (ADS)

    Chouhan, Lalit Singh; Raina, Avtar K.

    2015-10-01

    Blasting is a unit operation in Mine-Mill Fragmentation System (MMFS) and plays a vital role in mining cost. One of the goals of MMFS is to achieve optimum fragment size at minimal cost. Blast fragmentation optimization is known to result in better explosive energy utilization. Fragmentation depends on the rock, explosive and blast design variables. If burden, spacing and type of explosive used in a mine are kept constant, the firing sequence of blast-holes plays a vital role in rock fragmentation. To obtain smaller fragmentation size, mining professionals and relevant publications recommend V- or extended V-pattern of firing sequence. In doing so, it is assumed that the in-flight air collision breaks larger rock fragments into smaller ones, thus aiding further fragmentation. There is very little support in published literature for the phenomenon of breakage during in-flight collision of fragments during blasting. In order to assess the breakage of in-flight fragments due to collision, a mathematical simulation was carried out using basic principles of physics. The calculations revealed that the collision breakage is dependent on velocity of fragments, mass of fragments, the strength of the rock and the area of fragments over which collision takes place. For higher strength rocks, the in-flight collision breakage is very difficult to achieve. This leads to the conclusion that the concept demands an in-depth investigation and validation.

  3. In vivo drug metabolite identification in preclinical ADME studies by means of UPLC/TWIMS/high resolution-QTOF MS(E) and control comparison: cost and benefit of vehicle-dosed control samples.

    PubMed

    Fiebig, Lukas; Laux, Ralf; Binder, Rudolf; Ebner, Thomas

    2016-10-01

    1. Liquid chromatography (LC)-high resolution mass spectrometry (HRMS) techniques proved to be well suited for the identification of predicted and unexpected drug metabolites in complex biological matrices. 2. To efficiently discriminate between drug-related and endogenous matrix compounds, however, sophisticated postacquisition data mining tools, such as control comparison techniques are needed. For preclinical absorption, distribution, metabolism and excretion (ADME) studies that usually lack a placebo-dosed control group, the question arises how high-quality control data can be yielded using only a minimum number of control animals. 3. In the present study, the combination of LC-traveling wave ion mobility separation (TWIMS)-HRMS(E) and multivariate data analysis was used to study the polymer patterns of the frequently used formulation constituents polyethylene glycol 400 and polysorbate 80 in rat plasma and urine after oral and intravenous administration, respectively. 4. Complex peak patterns of both constituents were identified underlining the general importance of a vehicle-dosed control group in ADME studies for control comparison. Furthermore, the detailed analysis of administration route, blood sampling time and gender influences on both vehicle peak pattern as well as endogenous matrix background revealed that high-quality control data is obtained when (i) control animals receive an intravenous dose of the vehicle, (ii) the blood sampling time point is the same for analyte and control sample and (iii) analyte and control samples of the same gender are compared.

  4. Suspended sediment load below open-cast mines for ungauged river basin

    NASA Astrophysics Data System (ADS)

    Kuksina, L.

    2011-12-01

    Placer mines are located in river valleys along river benches or ancient river channels. Existing mining sites frequently make little use of environmental technologies. As a result, open-pit mining alters stream hydrology and sediment processes and enhances sediment transport. The most serious environmental consequences of increased sediment yield occur in rivers populated by salmon communities, because salmon species prefer clean water with low turbidity. For instance, placer mining on the Kamchatka peninsula (Far East of Russia), which is regarded as the last global gene pool of wild salmon Oncorhynchus, significantly threatens river ecosystems. Impact assessment is limited by the scarcity of hydrological observations. The gauging network is sparse, and in many cases whole basins up to 200 km in length lack any hydrological data. The main purpose of the work is to elaborate methods for estimating sediment yield in rivers under mining impact and to carry out the corresponding calculations. The subjects of the study are rivers of the Vivenka river basin, where an open-cast platinum mine is situated. It is one of the largest platinum mines in the Russian Federation and in the world. This mine is the best studied in Kamchatka (research covers the period from 2003 to 2011). An empirical-analytical model of suspended sediment yield estimation was elaborated for rivers draining the mine's territory. Sediment delivery at the open-cast mine is due to the following processes: (1) erosion in the channel diversions; (2) soil erosion on the exposed hillsides; (3) effluent from settling ponds; (4) mine waste water inflow; (5) accidental escape of mine waste water into rivers. Sediment washout caused by erosion was estimated by repeated measurements of the channel profiles in 2003, 2006 and 2008. Horizontal deformation rates were estimated on the basis of the dependence of erosion on water discharge rates, slopes and sediment composition. Soil erosion on the exposed hillsides was estimated taking into account precipitation of various intensities and solid material washout during this period. Effluent from settling ponds was calculated on the basis of minimum anthropogenic turbidity, defined as the difference between background turbidity and the minimal turbidity caused by effluent and waste water overflow. Mine waste water inflow was estimated from actual data on the water balance of the purification system. Accidental escape of mine waste water into rivers was estimated from the duration of accidents and the material washout measured during the observation period. The total suspended sediment yield of rivers draining the mine's territory is the sum of these components. Total sediment supply from the mining site is 24.7 % of the Vivenka sediment yield. Polluted placer-mined rivers contribute about 35.4 % of the whole sediment yield of the Vivenka river. At the same time, the catchment area of these rivers is less than 0.2 % of the whole Vivenka catchment area.
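
    The closing figures of this abstract imply a large imbalance between the area of the placer-mined rivers and their share of sediment yield. The short sketch below works that arithmetic out. The two percentages are quoted from the abstract; the ratio and the function name `enrichment_ratio` are illustrative assumptions and not a quantity reported by the author. Because the catchment share is stated only as "less than 0.2 %", the result is a lower bound.

```python
# Percentages quoted in the abstract; the ratio is an illustrative derivation only.
yield_share_pct = 35.4  # share of Vivenka sediment yield from placer-mined rivers
area_share_pct = 0.2    # upper bound on their share of the Vivenka catchment area


def enrichment_ratio(yield_share: float, area_share: float) -> float:
    """Sediment yield per unit area relative to the basin-wide average."""
    return yield_share / area_share


ratio = enrichment_ratio(yield_share_pct, area_share_pct)
print(f"at least {ratio:.0f} times the basin-average yield per unit area")
```

    Under these assumptions the placer-mined rivers deliver at least about 177 times the basin-average sediment yield per unit of catchment area.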

  5. 42 CFR 455.2 - Definitions.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... any source, including but not limited to the following: (1) Fraud hotline complaints. (2) Claims data mining. (3) Patterns identified through provider audits, civil false claims cases, and law enforcement...

  6. 42 CFR 455.2 - Definitions.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... any source, including but not limited to the following: (1) Fraud hotline complaints. (2) Claims data mining. (3) Patterns identified through provider audits, civil false claims cases, and law enforcement...

  7. 42 CFR 455.2 - Definitions.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... any source, including but not limited to the following: (1) Fraud hotline complaints. (2) Claims data mining. (3) Patterns identified through provider audits, civil false claims cases, and law enforcement...

  8. Fluvial transport and surface enrichment of arsenic in semi-arid mining regions: examples from the Mojave Desert, California.

    PubMed

    Kim, Christopher S; Stack, David H; Rytuba, James J

    2012-07-01

    As a result of extensive gold and silver mining in the Mojave Desert, southern California, mine wastes and tailings containing highly elevated arsenic (As) concentrations remain exposed at a number of former mining sites. Decades of weathering and erosion have contributed to the mobilization of As-enriched tailings, which now contaminate surrounding communities. Fluvial transport plays an intermittent yet important and relatively undocumented role in the migration and dispersal of As-contaminated mine wastes in semi-arid climates. Assessing the contribution of fluvial systems to tailings mobilization is critical in order to assess the distribution and long-term exposure potential of tailings in a mining-impacted environment. Extensive sampling, chemical analysis, and geospatial mapping of dry streambed (wash) sediments, tailings piles, alluvial fans, and rainwater runoff at multiple mine sites have aided the development of a conceptual model to explain the fluvial migration of mine wastes in semi-arid climates. Intense and episodic precipitation events mobilize mine wastes downstream and downslope as a series of discrete pulses, causing dispersion both down and lateral to washes with exponential decay behavior as distance from the source increases. Accordingly a quantitative model of arsenic concentrations in wash sediments, represented as a series of overlapping exponential power-law decay curves, results in the acceptable reproducibility of observed arsenic concentration patterns. Such a model can be transferable to other abandoned mine lands as a predictive tool for monitoring the fate and transport of arsenic and related contaminants in similar settings. Effective remediation of contaminated mine wastes in a semi-arid environment requires addressing concurrent changes in the amounts of potential tailings released through fluvial processes and the transport capacity of a wash.
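
    The quantitative model described above, a series of overlapping decay curves along a wash, can be sketched as follows. This is a minimal illustration and not the authors' model: the abstract gives neither the functional form nor any parameters, so the pure exponential form, the function names, and every number in `pulses` are hypothetical.

```python
import math


def pulse_concentration(x, source_x, peak, decay_rate):
    """Contribution of one pulse: zero upstream of its source, exponential decay downstream."""
    if x < source_x:
        return 0.0
    return peak * math.exp(-decay_rate * (x - source_x))


def wash_concentration(x, pulses, background=0.0):
    """Background level plus the sum of all overlapping pulse contributions at distance x."""
    return background + sum(pulse_concentration(x, *p) for p in pulses)


# Hypothetical pulses: (source position in m, peak As in ppm, decay rate in 1/m)
pulses = [(0.0, 1000.0, 0.005), (200.0, 400.0, 0.01)]

for x in (0.0, 100.0, 200.0, 400.0):
    print(x, round(wash_concentration(x, pulses, background=20.0), 1))
```

    Each pulse contributes only downstream of its own source, so the summed profile can rise again wherever a new source enters the wash and then decays with distance.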

  9. Fluvial transport and surface enrichment of arsenic in semi-arid mining regions: examples from the Mojave Desert, California

    USGS Publications Warehouse

    Kim, Christopher S.; Slack, David H.; Rytuba, James J.

    2012-01-01

    As a result of extensive gold and silver mining in the Mojave Desert, southern California, mine wastes and tailings containing highly elevated arsenic (As) concentrations remain exposed at a number of former mining sites. Decades of weathering and erosion have contributed to the mobilization of As-enriched tailings, which now contaminate surrounding communities. Fluvial transport plays an intermittent yet important and relatively undocumented role in the migration and dispersal of As-contaminated mine wastes in semi-arid climates. Assessing the contribution of fluvial systems to tailings mobilization is critical in order to assess the distribution and long-term exposure potential of tailings in a mining-impacted environment. Extensive sampling, chemical analysis, and geospatial mapping of dry streambed (wash) sediments, tailings piles, alluvial fans, and rainwater runoff at multiple mine sites have aided the development of a conceptual model to explain the fluvial migration of mine wastes in semi-arid climates. Intense and episodic precipitation events mobilize mine wastes downstream and downslope as a series of discrete pulses, causing dispersion both down and lateral to washes with exponential decay behavior as distance from the source increases. Accordingly a quantitative model of arsenic concentrations in wash sediments, represented as a series of overlapping exponential power-law decay curves, results in the acceptable reproducibility of observed arsenic concentration patterns. Such a model can be transferable to other abandoned mine lands as a predictive tool for monitoring the fate and transport of arsenic and related contaminants in similar settings. Effective remediation of contaminated mine wastes in a semi-arid environment requires addressing concurrent changes in the amounts of potential tailings released through fluvial processes and the transport capacity of a wash.

  10. Data mining in soft computing framework: a survey.

    PubMed

    Mitra, S; Pal, S K; Mitra, P

    2002-01-01

    The present article provides a survey of the available literature on data mining using soft computing. A categorization has been provided based on the different soft computing tools and their hybridizations used, the data mining function implemented, and the preference criterion selected by the model. The utility of the different soft computing methodologies is highlighted. Generally fuzzy sets are suitable for handling the issues related to understandability of patterns, incomplete/noisy data, mixed media information and human interaction, and can provide approximate solutions faster. Neural networks are nonparametric, robust, and exhibit good learning and generalization capabilities in data-rich environments. Genetic algorithms provide efficient search algorithms to select a model, from mixed media data, based on some preference criterion/objective function. Rough sets are suitable for handling different types of uncertainty in data. Some challenges to data mining and the application of soft computing methodologies are indicated. An extensive bibliography is also included.

  11. Blasting for abandoned-mine land reclamation (closure of individual subsidence features and erratic, undocumented underground coal-mine workings). Final report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Workman, J.L.; Thompson, J.

    1991-01-01

    The study has examined the feasibility of blasting for mitigating various abandoned mine land features on AML sites. The investigation included extensive field trial blasts at sites in North Dakota and Montana. A blasting technique was used that was based on spherical cratering concepts. At the Beulah, North Dakota site thirteen individual vertical openings (sinkholes) were blasted with the intent to fill the voids. The blasts were designed to displace material laterally into the void. Good success was had in filling the sinkholes. At the White site in Montana erratic underground rooms with no available documentation were collapsed. An adit leading into the mine was also blasted. Both individual room blasting and area pattern blasting were studied. A total of eight blasts were fired on the one acre area. Exploration requirements and costs were found to be extensive.

  12. Mineralogy from Cores in Prospect Gulch, San Juan County, Colorado

    USGS Publications Warehouse

    Bove, Dana J.; Johnson, Raymond H.; Yager, Douglas B.

    2007-01-01

    In the late nineteenth century, San Juan County, Colorado, was the center of a metal mining boom in the San Juan Mountains. Although most mining activity ceased by the 1990s, the effects of historical mining continue to contribute metals to ground water and surface water. Previous research by the U.S. Geological Survey identified ground-water discharge as a significant pathway for the loading of metals to surface water from both acid-mine drainage and acid-rock drainage. In an effort to understand the ground-water flow system in the upper Animas River watershed, Prospect Gulch was selected for further study because of the amount of previous data provided in and around that particular watershed. In support of this ground-water research effort, data was collected from drill core, which included: (1) detailed descriptions of the subsurface geology and hydrothermal alteration patterns, (2) depth of sulfide oxidation, and (3) quantitative mineralogy.

  13. Modeling the emission, transport and deposition of contaminated dust from a mine tailing site.

    PubMed

    Stovern, Michael; Betterton, Eric A; Sáez, A Eduardo; Villar, Omar Ignacio Felix; Rine, Kyle P; Russell, Mackenzie R; King, Matt

    2014-01-01

    Mining operations are potential sources of airborne particulate metal and metalloid contaminants through both direct smelter emissions and wind erosion of mine tailings. The warmer, drier conditions predicted for the Southwestern US by climate models may make contaminated atmospheric dust and aerosols increasingly important, due to potential deleterious effects on human health and ecology. Dust emissions and dispersion of contaminants from the Iron King Mine tailings in Dewey-Humboldt, Arizona, a Superfund site, are currently being investigated through in situ field measurements and computational fluid dynamics modeling. These tailings are significantly contaminated with lead and arsenic with an average soil concentration of 1616 and 1420 ppm, respectively. Similar levels of these contaminants have also been measured in soil samples taken from the area surrounding the mine tailings. Using a computational fluid dynamics model, we have been able to model dust transport from the mine tailings to the surrounding region. The model includes a distributed Eulerian model to simulate fine aerosol transport and a Lagrangian approach to model fate and transport of larger particles. In order to improve the accuracy of the dust transport simulations both regional topographical features and local weather patterns have been incorporated into the model simulations.

  14. String Mining in Bioinformatics

    NASA Astrophysics Data System (ADS)

    Abouelhoda, Mohamed; Ghanem, Moustafa

    Sequence analysis is a major area in bioinformatics encompassing the methods and techniques for studying the biological sequences, DNA, RNA, and proteins, on the linear structure level. The focus of this area is generally on the identification of intra- and inter-molecular similarities. Identifying intra-molecular similarities boils down to detecting repeated segments within a given sequence, while identifying inter-molecular similarities amounts to spotting common segments among two or multiple sequences. From a data mining point of view, sequence analysis is nothing but string or pattern mining specific to biological strings. For a long time, however, this point of view has been explicitly embraced neither in the data mining nor in the sequence analysis textbooks, which may be attributed to the co-evolution of the two apparently independent fields. In other words, although the word "data-mining" is almost missing in the sequence analysis literature, its basic concepts have been implicitly applied. Interestingly, recent research in biological sequence analysis introduced efficient solutions to many problems in data mining, such as querying and analyzing time series [49,53], extracting information from web pages [20], fighting spam mails [50], detecting plagiarism [22], and spotting duplications in software systems [14].

  15. Performance of Case-Based Reasoning Retrieval Using Classification Based on Associations versus Jcolibri and FreeCBR: A Further Validation Study

    NASA Astrophysics Data System (ADS)

    Aljuboori, Ahmed S.; Coenen, Frans; Nsaif, Mohammed; Parsons, David J.

    2018-05-01

    Case-Based Reasoning (CBR) plays a major role in expert system research. However, a critical problem can be met when a CBR system retrieves incorrect cases. Class Association Rules (CARs) have been utilized to offer a potential solution in a previous work. The aim of this paper was to perform further validation of Case-Based Reasoning using a Classification based on Association Rules (CBRAR) to enhance the performance of Similarity Based Retrieval (SBR). The CBRAR strategy uses a classed frequent pattern tree algorithm (FP-CAR) in order to disambiguate wrongly retrieved cases in CBR. The research reported in this paper makes contributions to both fields of CBR and Association Rules Mining (ARM) in that full target cases can be extracted from the FP-CAR algorithm without invoking P-trees and union operations. The dataset used in this paper provided more efficient results when the SBR retrieves unrelated answers. The accuracy of the proposed CBRAR system outperforms the results obtained by existing CBR tools such as Jcolibri and FreeCBR.

  16. Molecular Networking and Pattern-Based Genome Mining Improves Discovery of Biosynthetic Gene Clusters and their Products from Salinispora Species

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Duncan, Katherine R.; Crüsemann, Max; Lechner, Anna

    Genome sequencing has revealed that bacteria contain many more biosynthetic gene clusters than predicted based on the number of secondary metabolites discovered to date. While this biosynthetic reservoir has fostered interest in new tools for natural product discovery, there remains a gap between gene cluster detection and compound discovery. In this paper, we apply molecular networking and the new concept of pattern-based genome mining to 35 Salinispora strains, including 30 for which draft genome sequences were either available or obtained for this study. The results provide a method to simultaneously compare large numbers of complex microbial extracts, which facilitated the identification of media components, known compounds and their derivatives, and new compounds that could be prioritized for structure elucidation. Finally, these efforts revealed considerable metabolite diversity and led to several molecular family-gene cluster pairings, of which the quinomycin-type depsipeptide retimycin A was characterized and linked to gene cluster NRPS40 using pattern-based bioinformatic approaches.

  17. Molecular Networking and Pattern-Based Genome Mining Improves Discovery of Biosynthetic Gene Clusters and their Products from Salinispora Species

    DOE PAGES

    Duncan, Katherine R.; Crüsemann, Max; Lechner, Anna; ...

    2015-04-09

    Genome sequencing has revealed that bacteria contain many more biosynthetic gene clusters than predicted based on the number of secondary metabolites discovered to date. While this biosynthetic reservoir has fostered interest in new tools for natural product discovery, there remains a gap between gene cluster detection and compound discovery. In this paper, we apply molecular networking and the new concept of pattern-based genome mining to 35 Salinispora strains, including 30 for which draft genome sequences were either available or obtained for this study. The results provide a method to simultaneously compare large numbers of complex microbial extracts, which facilitated the identification of media components, known compounds and their derivatives, and new compounds that could be prioritized for structure elucidation. Finally, these efforts revealed considerable metabolite diversity and led to several molecular family-gene cluster pairings, of which the quinomycin-type depsipeptide retimycin A was characterized and linked to gene cluster NRPS40 using pattern-based bioinformatic approaches.

  18. A Typology of Communication Dynamics in Families Living a Slow-Motion Technological Disaster.

    PubMed

    Orom, Heather; Cline, Rebecca J W; Hernandez, Tanis; Berry-Bobovski, Lisa; Schwartz, Ann G; Ruckdeschel, John C

    2012-10-01

    With increasing numbers of communities harmed by exposures to toxic substances, greater understanding of the psychosocial consequences of these technological disasters is needed. One community living the consequences of a slow-motion technological disaster is Libby, Montana, where, for nearly 70 years, amphibole asbestos-contaminated vermiculite was mined and processed. Former mine employees and Libby area residents continue to cope with the health consequences of occupational and environmental asbestos exposure and with the psychosocial challenges accompanying chronic and often fatal asbestos-related diseases (ARD). Nine focus groups were conducted with Libby area residents. Transcripts were analyzed to explore patterns of family communication about ARD. The following five patterns emerged: Open/Supportive, Silent/Supportive, Open/Conflictual, Silent/Conflictual, and Silent/Denial. Open/Supportive communication included encouragement to be screened for ARD, information about ARD and related disaster topics, and emotional support for people with ARD. In contrast, communication patterns characterized by silence or conflict have the potential to hinder health-promoting communication and increase psychological distress.

  19. Brain-computer interface using wavelet transformation and naïve bayes classifier.

    PubMed

    Bassani, Thiago; Nievola, Julio Cesar

    2010-01-01

    The main purpose of this work is to establish an exploratory approach using the electroencephalographic (EEG) signal, analyzing the patterns in the time-frequency plane. This work also aims to optimize the EEG signal analysis through the improvement of classifiers and, eventually, of the BCI performance. In this paper a novel exploratory approach for data mining of the EEG signal based on continuous wavelet transformation (CWT) and wavelet coherence (WC) statistical analysis is introduced and applied. The CWT allows the representation of time-frequency patterns of the signal's information content by WC qualitative analysis. Results suggest that the proposed methodology is capable of identifying regions in the time-frequency spectrum during the specified task of BCI. Furthermore, an example of a region is identified, and the patterns are classified using a Naïve Bayes Classifier (NBC). This innovative characteristic of the process supports the feasibility of applying the proposed approach to other data mining applications. It can open new physiological research in this field and in nonstationary time series analysis.

  20. Molecular Networking and Pattern-Based Genome Mining Improves discovery of biosynthetic gene clusters and their products from Salinispora species

    PubMed Central

    Duncan, Katherine R.; Crüsemann, Max; Lechner, Anna; Sarkar, Anindita; Li, Jie; Ziemert, Nadine; Wang, Mingxun; Bandeira, Nuno; Moore, Bradley S.; Dorrestein, Pieter C.; Jensen, Paul R.

    2015-01-01

    Summary Genome sequencing has revealed that bacteria contain many more biosynthetic gene clusters than predicted based on the number of secondary metabolites discovered to date. While this biosynthetic reservoir has fostered interest in new tools for natural product discovery, there remains a gap between gene cluster detection and compound discovery. Here we apply molecular networking and the new concept of pattern-based genome mining to 35 Salinispora strains including 30 for which draft genome sequences were either available or obtained for this study. The results provide a method to simultaneously compare large numbers of complex microbial extracts, which facilitated the identification of media components, known compounds and their derivatives, and new compounds that could be prioritized for structure elucidation. These efforts revealed considerable metabolite diversity and led to several molecular family-gene cluster pairings, of which the quinomycin-type depsipeptide retimycin A was characterized and linked to gene cluster NRPS40 using pattern-based bioinformatic approaches. PMID:25865308

  1. A data mining technique for discovering distinct patterns of hand signs: implications in user training and computer interface design.

    PubMed

    Ye, Nong; Li, Xiangyang; Farley, Toni

    2003-01-15

    Hand signs are considered one of the important ways to enter information into computers for certain tasks. Computers receive sensor data of hand signs for recognition. When using hand signs as computer inputs, we need to (1) train computer users in the sign language so that their hand signs can be easily recognized by computers, and (2) design the computer interface to avoid the use of confusing signs, improving user input performance and user satisfaction. For user training and computer interface design, it is important to know which signs can be easily recognized by computers and which signs computers cannot distinguish. This paper presents a data mining technique to discover distinct patterns of hand signs from sensor data. Based on these patterns, we derive a group of signs that computers cannot distinguish. Such information can in turn assist in user training and computer interface design.

  2. Parallel object-oriented data mining system

    DOEpatents

    Kamath, Chandrika; Cantu-Paz, Erick

    2004-01-06

    A data mining system uncovers patterns, associations, anomalies and other statistically significant structures in data. Data files are read and displayed. Objects in the data files are identified. Relevant features for the objects are extracted. Patterns among the objects are recognized based upon the features. Data from the Faint Images of the Radio Sky at Twenty Centimeters (FIRST) sky survey was used to search for bent doubles. This test was conducted on data from the Very Large Array in New Mexico which seeks to locate a special type of quasar (radio-emitting stellar object) called bent doubles. The FIRST survey has generated more than 32,000 images of the sky to date. Each image is 7.1 megabytes, yielding more than 100 gigabytes of image data in the entire data set.

  3. Learning System of Web Navigation Patterns through Hypertext Probabilistic Grammars

    ERIC Educational Resources Information Center

    Cortes Vasquez, Augusto

    2015-01-01

    One issue of real interest in the area of web data mining is to capture users' activities during connection and extract behavior patterns that help define their preferences in order to improve the design of future pages adapting websites interfaces to individual users. This research is intended to provide, first of all, a presentation of the…

  4. Online Learners' Navigational Patterns Based on Data Mining in Terms of Learning Achievement

    ERIC Educational Resources Information Center

    Keskin, Sinan; Sahin, Muhittin; Ozgur, Adem; Yurdugul, Halil

    2016-01-01

    The aim of this study is to determine navigational patterns of university students in a learning management system (LMS). It also investigates whether online learners' navigational behaviors differ in terms of their academic achievement (pass, fail). The data for the study comes from 65 third grade students enrolled in online Computer Network and…

  5. The numerical simulation on the stability of steep rock slope by DDA

    NASA Astrophysics Data System (ADS)

    Zhu, Jianye; Xue, Yiguo; Tao, Yufan; Zhang, Kai; Li, Zhiqiang; Zhang, Xuedong; Yang, Ying

    2017-05-01

    China is a mountainous country, especially in the southwest. Recently, the variety of geological disasters such as landslides caused by roadway excavation has become a growing concern for our society. Blindly pursuing mining interests without regard for either the environment or residents in the surrounding areas has created a dangerous situation. In recent years, frequent collapses have occurred at Zengzi Rock in Chongqing, especially after torrential rains [1]. This landslide site is a typical example of collapse caused by mine roadway excavations. To study the mechanism of mining slope stability, we conducted a numerical simulation by DDA based on Zengzi Rock in Chongqing, China. The numerical simulation analyzes the slopes under different engineering conditions and rainfall conditions. The results show that the slope has already been changed under the action of its own joints and fissures. After the excavation of the roadway and the action of rainfall, this change increases drastically and the effect is obvious. The result graphs show an obvious change in the displacement and stress distribution, and the simulation results can be of great significance to mining and support under similar mountain conditions.

  6. Mineral saturation states in natural waters and their sensitivity to thermodynamic and analytical errors

    USGS Publications Warehouse

    Nordstrom, D. Kirk; Ball, James W.

    1989-01-01

    Saturation indices computed with WATEQ4F chemical analyses from a groundwater in crystalline bedrock and a surface water receiving acid mine drainage are frequently at or above saturation with respect to calcite, fluorite, barite, gibbsite and ferrihydrite. Deep granitic groundwaters from Stripa, Sweden, are supersaturated with respect to calcite and fluorite. Acid mine waters from the Leviathan Mine drainage basin in California are supersaturated with respect to barite by about a factor of three. These mine waters also are 10 times supersaturated with respect to the most soluble form of ferric hydroxide but are near saturation with respect to microcrystalline gibbsite. A sensitivity analysis has been performed by varying the analytic and thermodynamic parameters for which the saturation indices are most sensitive. For calcite, fluorite and barite, the supersaturation effect appears to be real because it is only slightly decreased by sources of uncertainty. Apparent supersaturation for gibbsite is most likely caused by the degree of crystallinity on solubility behavior. Apparent supersaturation for ferric hydroxide is likely caused by small colloidal particles (< 0.1 µm) in the water sample that cannot be removed by standard field filtration, although several other possible explanations cannot be easily excluded.
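
    The supersaturation factors quoted above can be restated as saturation indices. The sketch below uses the standard definition SI = log10(IAP/K), where IAP is the ion activity product and K the solubility product. It assumes that "supersaturated by a factor of three" and "10 times supersaturated" refer to the ratio IAP/K; the abstract does not say so explicitly, and the function name is illustrative.

```python
import math


def saturation_index(supersaturation_ratio: float) -> float:
    """SI = log10(IAP / K); the argument is the ratio IAP/K (assumed interpretation)."""
    return math.log10(supersaturation_ratio)


print(round(saturation_index(3.0), 2))   # barite, "about a factor of three"
print(round(saturation_index(10.0), 2))  # ferric hydroxide, "10 times supersaturated"
print(round(saturation_index(1.0), 2))   # exactly at saturation
```

    Under that reading, the barite result corresponds to SI of about 0.48 and the ferric hydroxide result to SI of 1.0, with SI = 0 marking equilibrium.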

  7. Environmental consequences of the Retsof Salt Mine roof collapse

    USGS Publications Warehouse

    Yager, Richard M.

    2013-01-01

    In 1994, the largest salt mine in North America, which had been in operation for more than 100 years, catastrophically flooded when the mine ceiling collapsed. In addition to causing the loss of the mine and the mineral resources it provided, this event formed sinkholes, caused widespread subsidence to land, caused structures to crack and subside, and changed stream flow and erosion patterns. Subsequent flooding of the mine drained overlying aquifers, changed the groundwater salinity distribution (rendering domestic wells unusable), and allowed locally present natural gas to enter dwellings through water wells. Investigations including exploratory drilling, hydrologic and water-quality monitoring, geologic and geophysical studies, and numerical simulation of groundwater flow, salinity, and subsidence have been effective tools in understanding the environmental consequences of the mine collapse and informing decisions about management of those consequences for the future. Salt mines are generally dry, but are susceptible to leaks and can become flooded if groundwater from overlying aquifers or surface water finds a way downward into the mined cavity through hundreds of feet of rock. With its potential to flood the entire mine cavity, groundwater is a constant source of concern for mine operators. The problem is compounded by the viscous nature of salt and the fact that salt mines commonly lie beneath water-bearing aquifers. Salt (for example halite or potash) deforms and “creeps” into the mined openings over time spans that range from years to centuries. This movement of salt can destabilize the overlying rock layers and lead to their eventual sagging and collapse, creating permeable pathways for leakage of water and depressions or openings at land surface, such as sinkholes. Salt is also highly soluble in water; therefore, whenever water begins to flow into a salt mine, the channels through which it flows increase in diameter as the surrounding salt dissolves. 
Some mines leak at a slow rate for decades before a section of rock gives way, allowing what initially was a trickle of water to suddenly become a cascade and finally a torrent. Other mines become flooded and are destroyed when an errant drill hole punctures the mine ceiling, allowing water from overlying sources to flow into the mine. Either scenario can cause catastrophic flooding and permanent loss of the mine. Occasionally, a mine that has remained dry for a century will undergo a roof collapse that results in flooding.

  8. Pathogens, patterns of pneumonia, and epidemiologic risk factors associated with respiratory disease in recently weaned cattle in Ireland.

    PubMed

    Murray, Gerard M; More, Simon J; Sammin, Dónal; Casey, Mìcheàl J; McElroy, Máire C; O'Neill, Rónan G; Byrne, William J; Earley, Bernadette; Clegg, Tracy A; Ball, Hywel; Bell, Colin J; Cassidy, Joseph P

    2017-01-01

    We examined the pathogens, morphologic patterns, and risk factors associated with bovine respiratory disease (BRD) in 136 recently weaned cattle ("weanlings"), 6-12 mo of age, that were submitted for postmortem examination to regional veterinary laboratories in Ireland. A standardized sampling protocol included routine microbiologic investigations as well as polymerase chain reaction and immunohistochemistry. Lungs with histologic lesions were categorized into 1 of 5 morphologic patterns of pneumonia. Fibrinosuppurative bronchopneumonia (49%) and interstitial pneumonia (48%) were the morphologic patterns recorded most frequently. The various morphologic patterns of pulmonary lesions suggest the involvement of variable combinations of initiating and compounding infectious agents that hindered any simple classification of the etiopathogenesis of the pneumonias. Dual infections were detected in 58% of lungs, with Mannheimia haemolytica and Histophilus somni most frequently recorded in concert. M. haemolytica (43%) was the most frequently detected respiratory pathogen; H. somni was also shown to be frequently implicated in pneumonia in this age group of cattle. Bovine parainfluenza virus 3 (BPIV-3) and Bovine respiratory syncytial virus (16% each) were the viral agents detected most frequently. Potential respiratory pathogens (particularly Pasteurella multocida, BPIV-3, and H. somni) were frequently detected (64%) in lungs that had neither gross nor histologic pulmonary lesions, raising questions regarding their role in the pathogenesis of BRD. The breadth of respiratory pathogens detected in bovine lungs by various detection methods highlights the diagnostic value of parallel analyses in respiratory disease postmortem investigation.

  9. Monitoring, analyzing and simulating of spatial-temporal changes of landscape pattern over mining area

    NASA Astrophysics Data System (ADS)

    Liu, Pei; Han, Ruimei; Wang, Shuangting

    2014-11-01

    Building on the strengths of remotely sensed data in depicting regional land cover and land change, multi-objective information processing is applied to remote sensing images to analyze and simulate land cover in mining areas. In this paper, multi-temporal remotely sensed data were selected to monitor the pattern, distribution and trend of LUCC and to predict its impacts on the ecological environment and human settlement in a mining area. The monitoring, analysis and simulation of LUCC in this coal mining area are divided into five steps: information integration of optical and SAR data, LULC type extraction with an SVM classifier, LULC trend simulation with a CA Markov model, landscape temporal change monitoring and analysis with confusion matrices and landscape indices. The results demonstrate that the improved data fusion algorithm makes full use of the information extracted from optical and SAR data; the SVM classifier is efficient and stable in producing land cover maps, which provide a good basis for both land cover change analysis and trend simulation; and the CA Markov model predicts LULC trends with good performance, offering an effective way to integrate remotely sensed data with a spatial-temporal model for analysis of land use/cover change and the corresponding environmental impacts in mining areas. Evaluation and analysis combining confusion matrices with landscape indices show a sustained downward trend in agricultural land and bare land, continuous growth in water bodies, forest and other land, and a wave-like change in building area, which first increased and then decreased. The mining landscape underwent fragmentation that went from small to large and then from large to small; agricultural land is the most strongly affected landscape type in this area, and human activities are the primary cause, so the problem deserves more attention from government and other organizations.
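
The Markov part of the trend simulation named in this abstract can be illustrated with a minimal sketch. It assumes two co-registered classified maps flattened to lists of class labels; the function names and the toy labels are hypothetical, and the cellular automata (spatial neighbourhood) component of a CA Markov model is deliberately left out. It is not the authors' implementation.

```python
def transition_matrix(before, after, classes):
    """Estimate class-to-class transition probabilities from two label lists."""
    counts = {a: {b: 0 for b in classes} for a in classes}
    for a, b in zip(before, after):
        counts[a][b] += 1
    probs = {}
    for a in classes:
        total = sum(counts[a].values())
        # A class absent from the earlier map is assumed to stay unchanged.
        probs[a] = {
            b: (counts[a][b] / total if total else float(a == b))
            for b in classes
        }
    return probs


def project(shares, probs):
    """Advance the class area shares by one time step of the Markov chain."""
    return {b: sum(shares[a] * probs[a][b] for a in shares) for b in probs}


# Hypothetical toy maps with four pixels and two classes.
before = ["farm", "farm", "water", "farm"]
after = ["farm", "water", "water", "water"]
probs = transition_matrix(before, after, ["farm", "water"])
next_shares = project({"farm": 0.75, "water": 0.25}, probs)
```

The same cross-tabulation of two dates, kept as raw counts instead of probabilities, is the change-detection confusion matrix the abstract refers to.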

  10. The ACODEA Framework: Developing Segmentation and Classification Schemes for Fully Automatic Analysis of Online Discussions

    ERIC Educational Resources Information Center

    Mu, Jin; Stegmann, Karsten; Mayfield, Elijah; Rose, Carolyn; Fischer, Frank

    2012-01-01

    Research related to online discussions frequently faces the problem of analyzing huge corpora. Natural Language Processing (NLP) technologies may allow automating this analysis. However, the state-of-the-art in machine learning and text mining approaches yields models that do not transfer well between corpora related to different topics. Also,…

  11. The canary in the coal mine: Sprouts as a rapid indicator of browse impact in managed forests

    Treesearch

    Alex Royo; David W. Kramer; Karl V. Miller; Nathan P. Nibbelink; Susan L. Stout

    2016-01-01

    Forest managers are frequently confronted with sustaining vegetation diversity and structure in landscapes experiencing high ungulate browsing pressure. Often, managers monitor browse damage and risk to plant communities using vegetation as indicators (i.e., phytoindicators). Although useful, the efficacy of traditional phytoindicators is sometimes hampered by limited...

  12. Data mining for the identification of metabolic syndrome status

    PubMed Central

    Worachartcheewan, Apilak; Schaduangrat, Nalini; Prachayasittikul, Virapong; Nantasenamat, Chanin

    2018-01-01

    Metabolic syndrome (MS) is a condition associated with metabolic abnormalities that are characterized by central obesity (e.g. waist circumference or body mass index), hypertension (e.g. systolic or diastolic blood pressure), hyperglycemia (e.g. fasting plasma glucose) and dyslipidemia (e.g. triglyceride and high-density lipoprotein cholesterol). It is also associated with the development of diabetes mellitus (DM) type 2 and cardiovascular disease (CVD). Therefore, the rapid identification of MS is required to prevent the occurrence of such diseases. Herein, we review the utilization of data mining approaches for MS identification. Furthermore, the concept of quantitative population-health relationship (QPHR) is also presented, which can be defined as the elucidation/understanding of the relationship that exists between health parameters and health status. The QPHR modeling uses data mining techniques such as artificial neural network (ANN), support vector machine (SVM), principal component analysis (PCA), decision tree (DT), random forest (RF) and association analysis (AA) for modeling and construction of predictive models for MS characterization. The DT method has been found to outperform other data mining techniques in the identification of MS status. Moreover, the AA technique has proved useful in the discovery of in-depth as well as frequently occurring health parameters that can be used for revealing the rules of MS development. This review presents the potential benefits on the applications of data mining as a rapid identification tool for classifying MS. PMID:29383020

  13. Data mining for the identification of metabolic syndrome status.

    PubMed

    Worachartcheewan, Apilak; Schaduangrat, Nalini; Prachayasittikul, Virapong; Nantasenamat, Chanin

    2018-01-01

    Metabolic syndrome (MS) is a condition associated with metabolic abnormalities that are characterized by central obesity (e.g. waist circumference or body mass index), hypertension (e.g. systolic or diastolic blood pressure), hyperglycemia (e.g. fasting plasma glucose) and dyslipidemia (e.g. triglyceride and high-density lipoprotein cholesterol). It is also associated with the development of diabetes mellitus (DM) type 2 and cardiovascular disease (CVD). Therefore, the rapid identification of MS is required to prevent the occurrence of such diseases. Herein, we review the utilization of data mining approaches for MS identification. Furthermore, the concept of quantitative population-health relationship (QPHR) is also presented, which can be defined as the elucidation/understanding of the relationship that exists between health parameters and health status. The QPHR modeling uses data mining techniques such as artificial neural network (ANN), support vector machine (SVM), principal component analysis (PCA), decision tree (DT), random forest (RF) and association analysis (AA) for modeling and construction of predictive models for MS characterization. The DT method has been found to outperform other data mining techniques in the identification of MS status. Moreover, the AA technique has proved useful in the discovery of in-depth as well as frequently occurring health parameters that can be used for revealing the rules of MS development. This review presents the potential benefits on the applications of data mining as a rapid identification tool for classifying MS.

  14. Post-mining deterioration of bauxite overburdens in Jamaica: storage methods or subsoil dilution?

    NASA Astrophysics Data System (ADS)

    Harris, Mark A.; Omoregie, Samson N.

    2008-03-01

    Rapid degradation of disturbed soil from a karst bauxite mine in Jamaica was recorded. Substantial macronutrient losses were incurred during a short (1 month) or a long (12 months) storage of the replaced topsoils during frequent wet/dry changes. The results suggested very high rates (>70% in the first year) of soil degradation from storage, alongside moderate rates (30%) within the same storage dump. However, higher levels of soil organic matter (SOM) were indicated just below the surface, compared with the surface horizons. It was unlikely that under a high leaching humid tropical rainfall regime, natural degradation processes could have re-emplaced such material firmly intact in the 15-30 cm zone. It was therefore concluded that these SOM anomalies were due to mechanical dilution of surface soil with subsoil material during overburden removal and emplacement rather than from long storage. Increasing the soil organic content during storage could be one corrective approach. However, it is far less costly to exercise greater care to apply more precise overburden removal and emplacement techniques initially, than it is to correct the results of topsoil contamination with subsoil. Although this study was limited to one mine, in the context of imminent large-scale mining expansion and current practices, further investigations are needed to accurately ascertain the proportion of similar subsoil contamination in other bauxite-mined sites.

  15. Measuring Two-Event Structural Correlations on Graphs

    DTIC Science & Technology

    2012-08-01

    ...by event simulation on the DBLP graph. Then we examine the efficiency and scalability of the framework with a Twitter network. The third part of...correlation pattern mining for large graphs. In Proc. of the 8th Workshop on Mining and Learning with Graphs, pages 119–126, 2010. [23] T. Smith. A

  16. Comparison of Automated and Manual Recording of Brief Episodes of Intracranial Hypertension and Cerebral Hypoperfusion and Their Association with Outcome After Severe Traumatic Brain Injury

    DTIC Science & Technology

    2017-03-01

    neuro ICP care beyond trauma care. 15. SUBJECT TERMS Advanced machine learning techniques, intracranial pressure, vital signs, monitoring...death and disability in combat casualties [1,2]. Approximately 2 million head injuries occur annually in the United States, resulting in more than...editor. Machine learning and data mining in pattern recognition. Proceedings of the 8th International Workshop on Machine Learning and Data Mining in

  17. Production and precipitation of rare earth elements in acidic to alkaline coal mine discharges, Appalachian Basin, USA

    NASA Astrophysics Data System (ADS)

    Stewart, B. W.; Capo, R. C.; Hedin, B. C.; Wallrich, I. L. R.; Hedin, R. S.

    2016-12-01

    Abandoned coal mine discharges are a serious threat to ground and surface waters due to their high metal content and often high acidity. However, these discharges represent a potential source of rare earth elements (REE), many of which are considered to be critical resources. Trace element data from 18 coal mine drainage (CMD) sites within the Appalachian Basin suggest CMD is enriched in total REE by 1-4 orders of magnitude relative to concentrations expected in unaffected surface or ground waters. When normalized to the North American Shale Composite (NASC), the discharges generally show a pattern of enrichment in the middle REE, including several identified as critical resources (Nd, Eu, Dy, Tb). In contrast, shale, sandstone and coal samples from Appalachian Basin coal-bearing units have concentrations and patterns similar to NASC, indicating that the REE in CMD are fractionated during interaction with rock in the mine pool. The highest total REE contents (up to 2800 mg/L) are found in low-pH discharges (acid mine drainage, or AMD). A precipitous drop in REE concentration in CMD with pH ≥6.6 suggests adsorption or precipitation of REE in the mine pool at circumneutral pH. Precipitated solids from 21 CMD active and passive treatment sites in the Appalachian Basin, including Fe oxy-hydroxides, Ca-Mg lime slurries, and Si- and Al-rich precipitates, are enriched in total REE content relative to the average CMD discharges by about four orders of magnitude. Similar REE trends in the discharges and precipitates, including MREE enrichment, suggest minimal fractionation of REE during precipitation; direct comparisons over multiple seasonal cycles are needed to confirm this. Although the data are limited, Al-rich precipitates generally have high REE concentrations, while those in iron oxy-hydroxides tend to be lower. 
Based on the area of mined coal in the Appalachian Basin, estimated infiltration rates, and the mean REE flux from discharges analyzed in this study and that of Cravotta and Brady (2015, Appl. Geochem. 62, 108-130), we estimate that coal mine drainage outflows in this region generate approximately 450 metric tons of dissolved REE per year, a portion of which could be targeted for resource recovery during CMD treatment.

  18. Possibilities of Effective Inertisation of Self-Heating Places in Goaf of Longwall in Hard Coal Mines

    NASA Astrophysics Data System (ADS)

    Szlązak, Nikodem; Piergies, Kazimierz

    2016-12-01

    Underground fires are among the most common hazards in coal mines, and exposure to them frequently requires long-term and costly rescue operations. This is mainly due to the specific character of underground excavations, which have limited volume. As a result, the maximum permissible concentration of harmful gases is rapidly exceeded, and the direction of air flow may also change. The most certain way of improving safety in the Polish coal mining industry is to take early prevention steps. One of the prevention methods is inertisation of the atmosphere in longwall goaf. This relies on partial or total replacement of air or a combustible atmosphere by inert gas, which decreases the risk of spontaneous fire and gas explosion. The main reason for using inert gases is to reduce the oxygen content to a limit that prevents further development of fire. This article presents methods for assessing the use of inert gas to replace oxygen in the goaf atmosphere.

  19. The horse-collar aurora - A frequent pattern of the aurora in quiet times

    NASA Technical Reports Server (NTRS)

    Hones, E. W., Jr.; Craven, J. D.; Frank, L. A.; Evans, D. S.; Newell, P. T.

    1989-01-01

    The frequent appearance of the 'horse-collar aurora' pattern in quiet-time DE 1 images is reported, presenting a two-hour image sequence that displays the basic features and shows that it sometimes evolves toward the theta configuration. There is some evidence for interplanetary magnetic field B(y) influence on the temporal development of the pattern. A preliminary statistical analysis finds the pattern appearing in one-third or more of the image sequences recorded during quiet times.

  20. Discovering biomedical semantic relations in PubMed queries for information retrieval and database curation

    PubMed Central

    Huang, Chung-Chi; Lu, Zhiyong

    2016-01-01

    Identifying relevant papers from the literature is a common task in biocuration. Most current biomedical literature search systems primarily rely on matching user keywords. Semantic search, on the other hand, seeks to improve search accuracy by understanding the entities and contextual relations in user keywords. However, past research has mostly focused on semantically identifying biological entities (e.g. chemicals, diseases and genes) with little effort on discovering semantic relations. In this work, we aim to discover biomedical semantic relations in PubMed queries in an automated and unsupervised fashion. Specifically, we focus on extracting and understanding the contextual information (or context patterns) that is used by PubMed users to represent semantic relations between entities such as ‘CHEMICAL-1 compared to CHEMICAL-2.’ With the advances in automatic named entity recognition, we first tag entities in PubMed queries and then use tagged entities as knowledge to recognize pattern semantics. More specifically, we transform PubMed queries into context patterns involving participating entities, which are subsequently projected to latent topics via latent semantic analysis (LSA) to avoid the data sparseness and specificity issues. Finally, we mine semantically similar contextual patterns or semantic relations based on LSA topic distributions. Our two separate evaluation experiments of chemical-chemical (CC) and chemical–disease (CD) relations show that the proposed approach significantly outperforms a baseline method, which simply measures pattern semantics by similarity in participating entities. The highest performance achieved by our approach is nearly 0.9 and 0.85 respectively for the CC and CD task when compared against the ground truth in terms of normalized discounted cumulative gain (nDCG), a standard measure of ranking quality. 
These results suggest that our approach can effectively identify and return related semantic patterns in a ranked order covering diverse bio-entity relations. To assess the potential utility of our automated top-ranked patterns of a given relation in semantic search, we performed a pilot study on frequently sought semantic relations in PubMed and observed improved literature retrieval effectiveness based on post-hoc human relevance evaluation. Further investigation in larger tests and in real-world scenarios is warranted. PMID:27016698
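
The ranking measure cited in this abstract, normalized discounted cumulative gain (nDCG), can be sketched as follows. This is a minimal illustration that assumes the common linear-gain, log2-discount form; the abstract does not say which nDCG variant the authors used, and the function names and relevance grades are hypothetical.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of graded relevances in ranked order."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades for a ranked list of retrieved patterns.
score = ndcg([3, 2, 3, 0, 1])
```

A score of 1.0 means the list is already in ideal order, so values such as the reported 0.9 and 0.85 indicate rankings close to the ground-truth ordering.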
