Science.gov

Sample records for mining approach identifies

  1. An Integrative data mining approach to identifying Adverse ...

    EPA Pesticide Factsheets

    The Adverse Outcome Pathway (AOP) framework is a tool for making biological connections and summarizing key information across different levels of biological organization to connect biological perturbations at the molecular level to adverse outcomes for an individual or population. Computational approaches to explore and determine these connections can accelerate the assembly of AOPs. By leveraging the wealth of publicly available data covering chemical effects on biological systems, computationally-predicted AOPs (cpAOPs) were assembled via data mining of high-throughput screening (HTS) in vitro data, in vivo data and other disease phenotype information. Frequent Itemset Mining (FIM) was used to find associations between the gene targets of ToxCast HTS assays and disease data from Comparative Toxicogenomics Database (CTD) by using the chemicals as the common aggregators between datasets. The method was also used to map gene expression data to disease data from CTD. A cpAOP network was defined by considering genes and diseases as nodes and FIM associations as edges. This network contained 18,283 gene to disease associations for the ToxCast data and 110,253 for CTD gene expression. Two case studies show the value of the cpAOP network by extracting subnetworks focused either on fatty liver disease or the Aryl Hydrocarbon Receptor (AHR). The subnetwork surrounding fatty liver disease included many genes known to play a role in this disease. When querying the cpAOP
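
    The chemical-as-aggregator idea in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' pipeline: it assumes each gene target and each disease has already been reduced to the set of chemicals linked to it, and it keeps a gene-disease edge when the number of shared chemicals reaches a threshold (a support count over two-item itemsets). The function name, the `min_support` parameter, and all gene, disease, and chemical labels are hypothetical.

```python
from itertools import product


def gene_disease_edges(gene_to_chems, disease_to_chems, min_support=2):
    """Return {(gene, disease): support} for pairs sharing enough chemicals.

    Support here is simply the count of chemicals linked to both the gene
    and the disease; this is an assumed, simplified stand-in for a full
    frequent itemset mining run.
    """
    edges = {}
    for (gene, g_chems), (disease, d_chems) in product(
        gene_to_chems.items(), disease_to_chems.items()
    ):
        shared = g_chems & d_chems  # chemicals acting as the common aggregator
        if len(shared) >= min_support:
            edges[(gene, disease)] = len(shared)
    return edges


# Toy, made-up inputs (not ToxCast or CTD data).
gene_to_chems = {
    "GENE_A": {"chem1", "chem2", "chem3"},
    "GENE_B": {"chem3", "chem4"},
}
disease_to_chems = {
    "disease_X": {"chem1", "chem2", "chem4"},
    "disease_Y": {"chem3"},
}

edges = gene_disease_edges(gene_to_chems, disease_to_chems, min_support=2)
print(edges)
```

    In a network view, the keys of `edges` are the gene-disease links and the values could serve as edge weights.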

  2. A novel pattern mining approach for identifying cognitive activity in EEG based functional brain networks.

    PubMed

    Thilaga, M; Vijayalakshmi, R; Nadarajan, R; Nandagopal, D

    2016-06-01

    The complex nature of neuronal interactions of the human brain has posed many challenges to the research community. To explore the underlying mechanisms of neuronal activity of cohesive brain regions during different cognitive activities, many innovative mathematical and computational models are required. This paper presents a novel Common Functional Pattern Mining approach to demonstrate the similar patterns of interactions due to common behavior of certain brain regions. The electrode sites of EEG-based functional brain network are modeled as a set of transactions and node-based complex network measures as itemsets. These itemsets are transformed into a graph data structure called Functional Pattern Graph. By mining this Functional Pattern Graph, the common functional patterns due to specific brain functioning can be identified. The empirical analyses show the efficiency of the proposed approach in identifying the extent to which the electrode sites (transactions) are similar during various cognitive load states.

  3. Computational Approaches for Mining GRO-Seq Data to Identify and Characterize Active Enhancers.

    PubMed

    Nagari, Anusha; Murakami, Shino; Malladi, Venkat S; Kraus, W Lee

    2017-01-01

    Transcriptional enhancers are DNA regulatory elements that are bound by transcription factors and act to positively regulate the expression of nearby or distally located target genes. Enhancers have many features that have been discovered using genomic analyses. Recent studies have shown that active enhancers recruit RNA polymerase II (Pol II) and are transcribed, producing enhancer RNAs (eRNAs). GRO-seq, a method for identifying the location and orientation of all actively transcribing RNA polymerases across the genome, is a powerful approach for monitoring nascent enhancer transcription. Furthermore, the unique pattern of enhancer transcription can be used to identify enhancers in the absence of any information about the underlying transcription factors. Here, we describe the computational approaches required to identify and analyze active enhancers using GRO-seq data, including data pre-processing, alignment, and transcript calling. In addition, we describe protocols and computational pipelines for mining GRO-seq data to identify active enhancers, as well as known transcription factor binding sites that are transcribed. Furthermore, we discuss approaches for integrating GRO-seq-based enhancer data with other genomic data, including target gene expression and function. Finally, we describe molecular biology assays that can be used to confirm and explore further the function of enhancers that have been identified using genomic assays. Together, these approaches should allow the user to identify and explore the features and biological functions of new cell type-specific enhancers.

  4. A Data Mining Approach to Identify Sexuality Patterns in a Brazilian University Population.

    PubMed

    Waleska Simões, Priscyla; Cesconetto, Samuel; Toniazzo de Abreu, Larissa Letieli; Côrtes de Mattos Garcia, Merisandra; Cassettari Junior, José Márcio; Comunello, Eros; Bisognin Ceretta, Luciane; Aparecida Manenti, Sandra

    2015-01-01

    This paper presents the profile and experience of sexuality generated from a data mining classification task. We used a database on sexuality and gender violence collected from a university population in southern Brazil. The data mining task identified two relationships between the variables, which enabled the distinction of subgroups that better detail the profile and experience of sexuality. The identification of the relationships between the variables defines behavioral models and risk factors that will help define the algorithms being implemented in the data mining classification task.

  5. Identifying functional connectivity in large-scale neural ensemble recordings: a multiscale data mining approach.

    PubMed

    Eldawlatly, Seif; Jin, Rong; Oweiss, Karim G

    2009-02-01

    Identifying functional connectivity between neuronal elements is an essential first step toward understanding how the brain orchestrates information processing at the single-cell and population levels to carry out biological computations. This letter suggests a new approach to identify functional connectivity between neuronal elements from their simultaneously recorded spike trains. In particular, we identify clusters of neurons that exhibit functional interdependency over variable spatial and temporal patterns of interaction. We represent neurons as objects in a graph and connect them using arbitrarily defined similarity measures calculated across multiple timescales. We then use a probabilistic spectral clustering algorithm to cluster the neurons in the graph by solving a minimum graph cut optimization problem. Using point process theory to model population activity, we demonstrate the robustness of the approach in tracking a broad spectrum of neuronal interaction, from synchrony to rate co-modulation, by systematically varying the length of the firing history interval and the strength of the connecting synapses that govern the discharge pattern of each neuron. We also demonstrate how activity-dependent plasticity can be tracked and quantified in multiple network topologies built to mimic distinct behavioral contexts. We compare the performance to classical approaches to illustrate the substantial gain in performance.

  6. Using PharmGKB to train text mining approaches for identifying potential gene targets for pharmacogenomic studies.

    PubMed

    Pakhomov, S; McInnes, B T; Lamba, J; Liu, Y; Melton, G B; Ghodke, Y; Bhise, N; Lamba, V; Birnbaum, A K

    2012-10-01

    The main objective of this study was to investigate the feasibility of using PharmGKB, a pharmacogenomic database, as a source of training data in combination with text of MEDLINE abstracts for a text mining approach to identification of potential gene targets for pathway-driven pharmacogenomics research. We used the manually curated relations between drugs and genes in the PharmGKB database to train a support vector machine predictive model and applied this model prospectively to MEDLINE abstracts. The gene targets suggested by this approach were subsequently manually reviewed. Our quantitative analysis showed that support vector machine classifiers trained on MEDLINE abstracts, with single words (unigrams) used as features and PharmGKB relations used for supervision, achieved an overall sensitivity of 85% and specificity of 69%. The subsequent qualitative analysis showed that gene targets "suggested" by the automatic classifier were not anticipated by expert reviewers but were subsequently found to be relevant to the three drugs that were investigated: carbamazepine, lamivudine and zidovudine. Our results show that this approach is not only feasible but may also find new gene targets not identifiable by other methods, thus making it a valuable tool for pathway-driven pharmacogenomics research. Copyright © 2012 Elsevier Inc. All rights reserved.
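
    For readers unfamiliar with the two figures quoted in this abstract, the sketch below shows how sensitivity and specificity are computed from binary labels. It is a generic illustration only: the function name and the toy label vectors are assumptions and have nothing to do with the study's actual data or classifier.

```python
def sensitivity_specificity(y_true, y_pred):
    """Compute (sensitivity, specificity) for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # share of true positives that were found
    specificity = tn / (tn + fp)  # share of true negatives correctly rejected
    return sensitivity, specificity


# Toy labels: 4 true drug-gene relations, 4 non-relations (made up).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)
```

    A classifier tuned to suggest candidate targets for later manual review, as described here, would typically favor sensitivity over specificity.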

  7. Data mining approach identifies research priorities and data requirements for resolving the red algal tree of life

    PubMed Central

    2010-01-01

    Background: The assembly of the tree of life has seen significant progress in recent years but algae and protists have been largely overlooked in this effort. Many groups of algae and protists have ancient roots and it is unclear how much data will be required to resolve their phylogenetic relationships for incorporation in the tree of life. The red algae, a group of primary photosynthetic eukaryotes more than a billion years old, provide the earliest fossil evidence for eukaryotic multicellularity and sexual reproduction. Despite this evolutionary significance, their phylogenetic relationships are understudied. This study aims to infer a comprehensive red algal tree of life at the family level from a supermatrix containing data mined from GenBank. We aim to locate remaining regions of low support in the topology, evaluate their causes and estimate the amount of data required to resolve them. Results: Phylogenetic analysis of a supermatrix of 14 loci and 98 red algal families yielded the most complete red algal tree of life to date. Visualization of statistical support showed the presence of five poorly supported regions. Causes for low support were identified with statistics about the age of the region, data availability and node density, showing that poor support has different origins in different parts of the tree. Parametric simulation experiments yielded optimistic estimates of how much data will be needed to resolve the poorly supported regions (ca. 10^3 to ca. 10^4 nucleotides for the different regions). Nonparametric simulations gave a markedly more pessimistic image, some regions requiring more than 2.8 × 10^5 nucleotides or not achieving the desired level of support at all. The discrepancies between parametric and nonparametric simulations are discussed in light of our dataset and known attributes of both approaches. Conclusions: Our study takes the red algae one step closer to meaningful inclusion in the tree of life. In addition to the recovery of stable

  8. SPINE: an integrated tracking database and data mining approach for identifying feasible targets in high-throughput structural proteomics.

    PubMed

    Bertone, P; Kluger, Y; Lan, N; Zheng, D; Christendat, D; Yee, A; Edwards, A M; Arrowsmith, C H; Montelione, G T; Gerstein, M

    2001-07-01

    High-throughput structural proteomics is expected to generate considerable amounts of data on the progress of structure determination for many proteins. For each protein this includes information about cloning, expression, purification, biophysical characterization and structure determination via NMR spectroscopy or X-ray crystallography. It will be essential to develop specifications and ontologies for standardizing this information to make it amenable to retrospective analysis. To this end we created the SPINE database and analysis system for the Northeast Structural Genomics Consortium. SPINE, which is available at bioinfo.mbb.yale.edu/nesg or nesg.org, is specifically designed to enable distributed scientific collaboration via the Internet. It was designed not just as an information repository but as an active vehicle to standardize proteomics data in a form that would enable systematic data mining. The system features an intuitive user interface for interactive retrieval and modification of expression construct data, query forms designed to track global project progress and external links to many other resources. Currently the database contains experimental data on 985 constructs, of which 740 are drawn from Methanobacterium thermoautotrophicum, 123 from Saccharomyces cerevisiae, 93 from Caenorhabditis elegans and the remainder from other organisms. We developed a comprehensive set of data mining features for each protein, including several related to experimental progress (e.g. expression level, solubility and crystallization) and 42 based on the underlying protein sequence (e.g. amino acid composition, secondary structure and occurrence of low complexity regions). We demonstrate in detail the application of a particular machine learning approach, decision trees, to the tasks of predicting a protein's solubility and propensity to crystallize based on sequence features. We are able to extract a number of key rules from our trees, in particular that soluble

  9. Target discovery from data mining approaches.

    PubMed

    Yang, Yongliang; Adelstein, S James; Kassis, Amin I

    2009-02-01

    Data mining of available biomedical data and information has greatly boosted target discovery in the 'omics' era. Target discovery is the key step in the biomarker and drug discovery pipeline to diagnose and fight human diseases. In biomedical science, the 'target' is a broad concept ranging from molecular entities (such as genes, proteins and miRNAs) to biological phenomena (such as molecular functions, pathways and phenotypes). Within the context of biomedical science, data mining refers to a bioinformatics approach that combines biological concepts with computer tools or statistical methods that are mainly used to discover, select and prioritize targets. In response to the huge demand of data mining for target discovery in the 'omics' era, this review explicates various data mining approaches and their applications to target discovery with emphasis on text and microarray data analysis. Two emerging data mining approaches, chemogenomic data mining and proteomic data mining, are briefly introduced. Also discussed are the limitations of various data mining approaches found in the level of database integration, the quality of data annotation, sample heterogeneity and the performance of analytical and mining tools. Tentative strategies of integrating different data sources for target discovery, such as integrated text mining with high-throughput data analysis and integrated mining with pathway databases, are introduced.

  10. Target discovery from data mining approaches.

    PubMed

    Yang, Yongliang; Adelstein, S James; Kassis, Amin I

    2012-02-01

    Data mining of available biomedical data and information has greatly boosted target discovery in the 'omics' era. Target discovery is the key step in the biomarker and drug discovery pipeline to diagnose and fight human diseases. In biomedical science, the 'target' is a broad concept ranging from molecular entities (such as genes, proteins and miRNAs) to biological phenomena (such as molecular functions, pathways and phenotypes). Within the context of biomedical science, data mining refers to a bioinformatics approach that combines biological concepts with computer tools or statistical methods that are mainly used to discover, select and prioritize targets. In response to the huge demand of data mining for target discovery in the 'omics' era, this review explicates various data mining approaches and their applications to target discovery with emphasis on text and microarray data analysis. Two emerging data mining approaches, chemogenomic data mining and proteomic data mining, are briefly introduced. Also discussed are the limitations of various data mining approaches found in the level of database integration, the quality of data annotation, sample heterogeneity and the performance of analytical and mining tools. Tentative strategies of integrating different data sources for target discovery, such as integrated text mining with high-throughput data analysis and integrated mining with pathway databases, are introduced. Published by Elsevier Ltd.

  11. Developing Isotope Tools for Identifying Mercury Mining Sources

    NASA Astrophysics Data System (ADS)

    Koster van Groos, P. G.; Esser, B. K.; Williams, R. W.; Hunt, J. R.

    2009-12-01

    Mining operations in California during the past two centuries have resulted in widespread mercury contamination. Source control strategies are difficult and expensive to implement, in part because links between specific mercury sources and exposures are often uncertain. Examination of mercury’s stable isotopes can help resolve this issue. Sources with distinct isotope compositions may be traced through the environment. Mercury mining operations are predicted to have led to waste tailings, mercury metal products, and air emissions with different isotope compositions as a result of inefficient mercury extraction and recovery from ores. The predicted differences in isotope composition, based on estimated kinetic and diffusion isotope effects, are greater than the precision of current analytical methods using multi-collector inductively coupled plasma mass-spectrometers (MC-ICP-MS). As such, mercury isotope measurements may help identify mercury originating from different mining operations. To support a mechanistic approach to mercury isotope fractionation, the isotope effects of diffusion through solids and gases are being investigated experimentally. Besides demonstrating the utility of mercury isotope analysis for source identification, this work is providing a mechanistic basis for differences in isotope compositions.

  12. A data mining approach for identifying pathway-gene biomarkers for predicting clinical outcome: A case study of erlotinib and sorafenib

    PubMed Central

    2017-01-01

    A novel data mining procedure is proposed for identifying potential pathway-gene biomarkers from preclinical drug sensitivity data for predicting clinical responses to erlotinib or sorafenib. The analysis applies linear ridge regression modeling to generate a small (N~1000) set of baseline gene expressions that jointly yield quality predictions of preclinical drug sensitivity data and clinical responses. Standard clustering of the pathway-gene combinations from gene set enrichment analysis of this initial gene set, according to their shared appearance in molecular function pathways, yields a reduced (N~300) set of potential pathway-gene biomarkers. A modified method for quantifying pathway fitness is used to determine smaller numbers of over and under expressed genes that correspond with favorable and unfavorable clinical responses. Detailed literature-based evidence is provided in support of the roles of these under and over expressed genes in compound efficacy. RandomForest analysis of potential pathway-gene biomarkers finds average treatment prediction errors of 10% and 22%, respectively, for patients receiving erlotinib or sorafenib that had a favorable clinical response. Higher errors were found for both compounds when predicting an unfavorable clinical response. Collectively these results suggest complementary roles for biomarker genes and biomarker pathways when predicting clinical responses from preclinical data. PMID:28792525

  13. Identifying novel biomarkers through data mining-a realistic scenario?

    PubMed

    Griss, Johannes; Perez-Riverol, Yasset; Hermjakob, Henning; Vizcaíno, Juan Antonio

    2015-04-01

    In this article we discuss the requirements to use data mining of published proteomics datasets to assist proteomics-based biomarker discovery, the use of external data integration to solve the issue of inadequate small sample sizes and finally, we try to estimate the probability that new biomarkers will be identified through data mining alone. © 2014 The Authors. PROTEOMICS - Clinical Applications Published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  14. A data mining approach to intelligence operations

    NASA Astrophysics Data System (ADS)

    Memon, Nasrullah; Hicks, David L.; Harkiolakis, Nicholas

    2008-03-01

    In this paper we examine the latest thinking, approaches and methodologies in use for finding the nuggets of information and subliminal (and perhaps intentionally hidden) patterns and associations that private and government security agencies need in order to identify criminal activity and suspects. An emphasis in the paper is placed on Social Network Analysis and Investigative Data Mining, and the use of these technologies in the counterterrorism domain. Tools and techniques from both areas are described, along with the important tasks for which they can be used to assist with the investigation and analysis of terrorist organizations. The process of collecting data about these organizations is also considered along with the inherent difficulties that are involved.

  15. Edu-mining: A Machine Learning Approach

    NASA Astrophysics Data System (ADS)

    Srimani, P. K.; Patil, Malini M.

    2011-12-01

    Mining educational data is an emerging interdisciplinary research area that mainly deals with the development of methods to explore the data stored in educational institutions. The educational data is referred to as Edu-DATA. Queries related to Edu-DATA are of practical interest because an SQL approach is insufficient and the problem needs to be addressed in a different way. The paper aims at developing a technique called Edu-MINING, which uses data mining techniques to convert raw data coming from educational institutions into useful information. The discovered knowledge will have a great impact on educational research and practices. Edu-MINING explores Edu-DATA, discovers new knowledge and suggests useful methods to improve the quality of education with regard to the teaching-learning process. This is illustrated through a case study.

  16. Implementation of an original approach on the Mines-Douai Comparative Reactivity Method (MD-CRM) instrument to identify part of the missing OH reactivity at an urban site

    NASA Astrophysics Data System (ADS)

    Dusanter, S.; Michoud, V.; Leonardis, T.; Riffault, V.; Zhang, S.; Locoge, N.

    2015-12-01

    Due to the large number of Volatile Organic Compounds (VOCs) expected in the atmosphere (10^4-10^5) (Goldstein and Galbally, ES&T, 2007), exhaustive measurements of VOCs appear to be currently unfeasible using common analytical techniques. In this context, measurements of the total sink of OH, referred to as total OH reactivity, can provide a critical test to assess the completeness of trace gas measurements during field campaigns. This can be done by comparing the measured total OH reactivity to values calculated from trace gas measurements. Indeed, large discrepancies are usually found between measured and calculated OH reactivity values, revealing the presence of important unmeasured reactive species, which have yet to be identified. A Comparative Reactivity Method (CRM) instrument has been set up at Mines Douai to allow sequential measurements of VOCs and OH reactivity using the same Proton Transfer Reaction-Time of Flight Mass Spectrometer. This approach aims at identifying unmeasured reactive VOCs based on a method proposed by Kato et al. (Atmos. Environ., 2011), taking advantage of VOC oxidations occurring in the CRM sampling reactor. MD-CRM was deployed at an urban site in Dunkirk (France) during July 2014 to test this new approach. During this campaign, a large fraction of the OH reactivity was not explained by collocated measurements of trace gases (67% on average). In this presentation, we will first describe the approach that was implemented in the CRM instrument to identify part of the observed missing OH reactivity and we will then discuss the OH reactivity budget regarding the origin of air masses reaching the measurement site.
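
    The comparison described in this abstract rests on simple arithmetic: calculated OH reactivity is the sum over measured species of the OH rate constant times the species concentration, and the missing fraction is the share of measured reactivity that this sum does not account for. The sketch below illustrates that bookkeeping under stated assumptions; the function names and all numeric values are hypothetical toy inputs, not campaign data.

```python
def calculated_oh_reactivity(species):
    """Sum k_i * [X_i] over measured species.

    species: iterable of (k, conc) pairs, with k in cm^3 molecule^-1 s^-1
    and conc in molecule cm^-3, so the result is in s^-1.
    """
    return sum(k * conc for k, conc in species)


def missing_fraction(measured, calculated):
    """Fraction of measured OH reactivity not explained by measured species."""
    if measured <= 0:
        raise ValueError("measured OH reactivity must be positive")
    return (measured - calculated) / measured


# Toy inputs (made up): two trace gases and a measured total of 12 s^-1.
toy_species = [(1.0e-11, 2.0e11), (5.0e-12, 4.0e11)]
calc = calculated_oh_reactivity(toy_species)
frac = missing_fraction(12.0, calc)
print(round(calc, 3), round(frac, 3))
```

    A missing fraction near two thirds, as in this toy case, means most of the OH sink comes from species outside the measured set.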

  17. Mining for Murder-Suicide: An Approach to Identifying Cases of Murder-Suicide in the National Violent Death Reporting System Restricted Access Database.

    PubMed

    McNally, Matthew R; Patton, Christina L; Fremouw, William J

    2016-01-01

    The National Violent Death Reporting System (NVDRS) is a United States Centers for Disease Control and Prevention (CDC) database of violent deaths from 2003 to the present. The NVDRS collects information from 32 states on several types of violent deaths, including suicides, homicides, homicides followed by suicides, and deaths resulting from child maltreatment or intimate partner violence, as well as legal intervention and accidental firearm deaths. Despite the availability of data from police narratives, medical examiner reports, and other sources, reliably finding the cases of murder-suicide in the NVDRS has proven problematic due to the lack of a unique code for murder-suicide incidents and outdated descriptions of case-finding procedures from previous researchers. By providing a description of the methods used to access the NVDRS and the coding procedures used to decipher these data, the authors seek to assist future researchers in correctly identifying cases of murder-suicide deaths while avoiding false positives.

  18. Identifying Engineering Students' English Sentence Reading Comprehension Errors: Applying a Data Mining Technique

    ERIC Educational Resources Information Center

    Tsai, Yea-Ru; Ouyang, Chen-Sen; Chang, Yukon

    2016-01-01

    The purpose of this study is to propose a diagnostic approach to identify engineering students' English reading comprehension errors. Student data were collected during the process of reading texts of English for science and technology on a web-based cumulative sentence analysis system. For the analysis, the association-rule, data mining technique…

  20. Identifying the Cause of Toxicity of a Saline Mine Water

    PubMed Central

    van Dam, Rick A.; Harford, Andrew J.; Lunn, Simon A.; Gagnon, Marthe M.

    2014-01-01

    Elevated major ions (or salinity) are recognised as being a key contributor to the toxicity of many mine waste waters, but the complex interactions between the major ions and large inter-species variability in response to salinity make it difficult to relate toxicity to causal factors. This study aimed to determine if the toxicity of a typical saline seepage water was solely due to its major ion constituents, and to determine which major ions were the leading contributors to the toxicity. Standardised toxicity tests using two tropical freshwater species, Chlorella sp. (alga) and Moinodaphnia macleayi (cladoceran), were used to compare the toxicity of 1) mine and synthetic seepage water; 2) key major ions (e.g. Na, Cl, SO4 and HCO3); 3) synthetic seepage waters that were modified by excluding key major ions. For Chlorella sp., the toxicity of the seepage water was not solely due to its major ion concentrations because there were differences in effects caused by the mine seepage and synthetic seepage. However, for M. macleayi this hypothesis was supported because similar effects were caused by mine seepage and synthetic seepage. Sulfate was identified as a major ion that could predict the toxicity of the synthetic waters, which might be expected as it was the dominant major ion in the seepage water. However, sulfate was not the primary cause of toxicity in the seepage water and electrical conductivity was a better predictor of effects. Ultimately, the results show that specific major ions do not clearly drive the toxicity of saline seepage waters and the effects are probably due to the electrical conductivity of the mine waste waters. PMID:25180579

  1. Identifying the cause of toxicity of a saline mine water.

    PubMed

    van Dam, Rick A; Harford, Andrew J; Lunn, Simon A; Gagnon, Marthe M

    2014-01-01

    Elevated major ions (or salinity) are recognised as being a key contributor to the toxicity of many mine waste waters, but the complex interactions between the major ions and large inter-species variability in response to salinity make it difficult to relate toxicity to causal factors. This study aimed to determine if the toxicity of a typical saline seepage water was solely due to its major ion constituents, and to determine which major ions were the leading contributors to the toxicity. Standardised toxicity tests using two tropical freshwater species, Chlorella sp. (alga) and Moinodaphnia macleayi (cladoceran), were used to compare the toxicity of 1) mine and synthetic seepage water; 2) key major ions (e.g. Na, Cl, SO4 and HCO3); 3) synthetic seepage waters that were modified by excluding key major ions. For Chlorella sp., the toxicity of the seepage water was not solely due to its major ion concentrations because there were differences in effects caused by the mine seepage and synthetic seepage. However, for M. macleayi this hypothesis was supported because similar effects were caused by mine seepage and synthetic seepage. Sulfate was identified as a major ion that could predict the toxicity of the synthetic waters, which might be expected as it was the dominant major ion in the seepage water. However, sulfate was not the primary cause of toxicity in the seepage water and electrical conductivity was a better predictor of effects. Ultimately, the results show that specific major ions do not clearly drive the toxicity of saline seepage waters and the effects are probably due to the electrical conductivity of the mine waste waters.

  2. Data Mining and Pattern Recognition Models for Identifying Inherited Diseases: Challenges and Implications

    PubMed Central

    Iddamalgoda, Lahiru; Das, Partha S.; Aponso, Achala; Sundararajan, Vijayaraghava S.; Suravajhala, Prashanth; Valadi, Jayaraman K.

    2016-01-01

    Data mining and pattern recognition methods reveal interesting findings in genetic studies, especially on how the genetic makeup is associated with inherited diseases. Although researchers have proposed various data mining models for biomedical approaches, there remains a challenge in accurately prioritizing the single nucleotide polymorphisms (SNPs) associated with the disease. In this commentary, we review the state-of-the-art data mining and pattern recognition models for identifying inherited diseases and deliberate on the need for binary classification- and scoring-based prioritization methods in determining causal variants. While we discuss the pros and cons associated with these known methods, we argue that the gene prioritization methods and the protein-protein interaction (PPI) methods in conjunction with K nearest neighbors could be used in accurately categorizing the genetic factors in disease causation. PMID:27559342

  3. Pennsylvania's approach to underground coal mine permitting and long-term mine pool management

    SciTech Connect

    Callaghan, T.; Koricich, J.

    1999-07-01

    Pennsylvania's underground coal mine permitting process has two goals: first, to ensure that the mining and reclamation plan is designed to minimize adverse environmental impacts; and second, to minimize interference with the applicant's recovery of coal. A successful review process includes the consistent evaluation of mine site hydrology through scrutiny of key indicators of mining-induced, adverse hydrologic consequences. This allows the regulatory agency to assess the potential for mining-related impacts as well as cumulative impacts throughout the proposed mine area and adjacent area. General trends have been identified regarding quality of underground mine drainage versus coal seam mined. However, the large number of factors controlling the final mine pool chemistry along with the lack of focused research have combined to stunt the development of reliable methodologies for the prediction of postmining water quality. Absent reliable predictive methodologies, mine layout has become the best demonstrated technology for pollution prevention. Strategies include: (1) promotion of postmining inundation by down-dip development with proper location of mine openings and sizing and location of barriers; (2) restriction of mining to zones within the groundwater system where flow is relatively lethargic and time of travel is great when compared to natural mine pool amelioration time frames; and (3) mining in zones remote from groundwater discharge areas and features which may serve to short-circuit mine water to nearby existing water-supply aquifers or to the surface. This paper discusses Pennsylvania's application process for underground bituminous coal mines. It briefly outlines Pennsylvania's statutory history relating to mine discharges, touches on some of the tools permit reviewers use to evaluate the hydrology of proposed underground mining sites, and discusses the key factors that permit reviewers consider in assessing potential postmining mine pool levels.

  4. APPLYING DATA MINING APPROACHES TO FURTHER ...

    EPA Pesticide Factsheets

    This dataset will be used to illustrate various data mining techniques to biologically profile the chemical space.

  5. Data Mining Approaches for Intrusion Detection

    DTIC Science & Technology

    2000-10-12

    In this paper we discuss our research in developing general and systematic methods for intrusion detection. The key ideas are to use data mining techniques...two general data mining algorithms that we have implemented: the association rules algorithm and the frequent episodes algorithm. These algorithms can

  6. Integrating text mining, data mining, and network analysis for identifying genetic breast cancer trends.

    PubMed

    Jurca, Gabriela; Addam, Omar; Aksac, Alper; Gao, Shang; Özyer, Tansel; Demetrick, Douglas; Alhajj, Reda

    2016-04-26

    Breast cancer is a serious disease which affects many women and may lead to death. It has received considerable attention from the research community. Thus, biomedical researchers aim to find genetic biomarkers indicative of the disease. Novel biomarkers can be elucidated from the existing literature. However, the vast number of scientific publications on breast cancer makes this a daunting task. This paper presents a framework which investigates existing literature data for informative discoveries. It integrates text mining and social network analysis in order to identify new potential biomarkers for breast cancer. We utilized PubMed for the testing. We investigated gene-gene interactions, as well as novel interactions such as gene-year, gene-country, and abstract-country, to find out how the discoveries varied over time and how much the discoveries and interests of research groups in different countries overlap or diverge. Interesting trends have been identified and discussed, e.g., different genes are highlighted in relationship to different countries though the various genes were found to share functionality. Some text analysis based results have been validated against results from other tools that predict gene-gene relations and gene functions.

  7. Screening and prioritisation of chemical risks from metal mining operations, identifying exposure media of concern.

    PubMed

    Pan, Jilang; Oates, Christopher J; Ihlenfeld, Christian; Plant, Jane A; Voulvoulis, Nikolaos

    2010-04-01

    Metals have been central to the development of human civilisation from the Bronze Age to modern times, although in the past, metal mining and smelting have been the cause of serious environmental pollution with the potential to harm human health. Despite problems from artisanal mining in some developing countries, modern mining to Western standards now uses the best available mining technology combined with environmental monitoring, mitigation and remediation measures to limit emissions to the environment. This paper develops risk screening and prioritisation methods previously used for contaminated land on military and civilian sites and engineering systems for the analysis and prioritisation of chemical risks from modern metal mining operations. It uses hierarchical holographic modelling and multi-criteria decision making to analyse and prioritise the risks from potentially hazardous inorganic chemical substances released by mining operations. A case study of an active platinum group metals mine in South Africa is used to demonstrate the potential of the method. This risk-based methodology for identifying, filtering and ranking mining-related environmental and human health risks can be used to identify exposure media of greatest concern to inform risk management. It also provides a practical decision-making tool for mine acquisition and helps to communicate risk to all members of mining operation teams.

  8. Bayesian network approach to spatial data mining: a case study

    NASA Astrophysics Data System (ADS)

    Huang, Jiejun; Wan, Youchuan

    2006-10-01

    Spatial data mining is a process of discovering interesting, novel, and potentially useful information or knowledge hidden in spatial data sets. It involves different techniques and different methods from various areas of research. A Bayesian network is a graphical model that encodes causal probabilistic relationships among variables of interest; it has a powerful ability for representation and reasoning and provides an effective approach to spatial data mining. In this paper we give an introduction to Bayesian networks and discuss using Bayesian networks for spatial data mining. We propose a framework of spatial data mining based on Bayesian networks. Then we show a case study and use the experimental results to validate the practical viability of the proposed approach to spatial data mining. Finally, the paper gives a summary and some remarks.

  9. Design approaches in quarrying and pit-mining reclamation

    USGS Publications Warehouse

    Arbogast, Belinda F.

    1999-01-01

    Reclaimed mine sites have been evaluated so that the public, industry, and land planners may recognize there are innovative designs available for consideration and use. People tend to see cropland, range, and road cuts as a necessary part of their everyday life, not as disturbed areas despite their high visibility. Mining also generates a disturbed landscape, unfortunately one that many consider waste until reclaimed by human beings. The development of mining provides an economic base and use of a natural resource to improve the quality of human life. Equally important is a sensitivity to the geologic origin and natural pattern of the land. Wisely shaping our environment requires a design plan and product that responds to a site's physiography, ecology, function, artistic form, and public perception. An examination of selected sites for their landscape design suggested nine approaches for mining reclamation. The oldest design approach around is nature itself. Humans may sometimes do more damage going to an area in the attempt to repair it. Given enough geologic time, a small-site area, and stable adjacent ecosystems, disturbed areas recover without mankind's input. Visual screens and buffer zones conceal the facility in a camouflage approach. Typically, earth berms, fences, and plantings are used to disguise the mining facility. Restoration targets social or economic benefits by reusing the site for public amenities, most often in urban centers with large populations. A mitigation approach attempts to protect the environment and return mined areas to use with scientific input. The reuse of cement, building rubble, and macadam meets only about 10% of the demand for aggregate. Recognizing the limited supply of mineral resources and encouraging recycling efforts are steps in a renewable resource approach. An educative design approach effectively communicates mining information through outreach, land stewardship, and community service. Mine sites used for

  10. A MeSH-based text mining method for identifying novel prebiotics

    PubMed Central

    Shan, Guangyu; Lu, Yiming; Min, Bo; Qu, Wubin; Zhang, Chenggang

    2016-01-01

    Prebiotics contribute to the well-being of their host by altering the composition of the gut microbiota. Discovering new prebiotics is a challenging and arduous task due to strict inclusion criteria; thus, highly limited numbers of prebiotic candidates have been identified. Notably, the large numbers of published studies may contain substantial information attached to various features of known prebiotics that can be used to predict new candidates. In this paper, we propose a medical subject headings (MeSH)-based text mining method for identifying new prebiotics with structured texts obtained from PubMed. We defined an optimal feature set for prebiotics prediction using a systematic feature-ranking algorithm with which a variety of carbohydrates can be accurately classified into different clusters in accordance with their chemical and biological attributes. The optimal feature set was used to separate positive prebiotics from other carbohydrates, and a cross-validation procedure was employed to assess the prediction accuracy of the model. Our method achieved a specificity of 0.876 and a sensitivity of 0.838. Finally, we identified a high-confidence list of candidates of prebiotics that are strongly supported by the literature. Our study demonstrates that text mining from high-volume biomedical literature is a promising approach in searching for new prebiotics. PMID:27930574

  11. Systematic evaluation of satellite remote sensing for identifying uranium mines and mills.

    SciTech Connect

    Blair, Dianna Sue; Stork, Christopher Lyle; Smartt, Heidi Anne; Smith, Jody Lynn

    2006-01-01

    In this report, we systematically evaluate the ability of current-generation, satellite-based spectroscopic sensors to distinguish uranium mines and mills from other mineral mining and milling operations. We perform this systematic evaluation by (1) outlining the remote, spectroscopic signal generation process, (2) documenting the capabilities of current commercial satellite systems, (3) systematically comparing the uranium mining and milling process to other mineral mining and milling operations, and (4) identifying the most promising observables associated with uranium mining and milling that can be identified using satellite remote sensing. The Ranger uranium mine and mill in Australia serves as a case study where we apply and test the techniques developed in this systematic analysis. Based on literature research of mineral mining and milling practices, we develop a decision tree which utilizes the information contained in one or more observables to determine whether uranium is possibly being mined and/or milled at a given site. Promising observables associated with uranium mining and milling at the Ranger site included in the decision tree are uranium ore, sulfur, the uranium pregnant leach liquor, ammonia, and uranyl compounds and sulfate ion disposed of in the tailings pond. Based on the size, concentration, and spectral characteristics of these promising observables, we then determine whether these observables can be identified using current commercial satellite systems, namely Hyperion, ASTER, and Quickbird. We conclude that the only promising observables at Ranger that can be uniquely identified using a current commercial satellite system (notably Hyperion) are magnesium chlorite in the open pit mine and the sulfur stockpile. Based on the identified magnesium chlorite and sulfur observables, the decision tree narrows the possible mineral candidates at Ranger to uranium, copper, zinc, manganese, vanadium, the rare earths, and phosphorus, all of which are

  12. A vector space model approach to identify genetically related diseases

    PubMed Central

    2012-01-01

    Objective: The relationship between diseases and their causative genes can be complex, especially in the case of polygenic diseases. Further exacerbating the challenges in their study is that many genes may be causally related to multiple diseases. This study explored the relationship between diseases through the adaptation of an approach pioneered in the context of information retrieval: vector space models. Materials and Methods: A vector space model approach was developed that bridges gene-disease knowledge inferred across three knowledge bases: Online Mendelian Inheritance in Man, GenBank, and Medline. The approach was then used to identify potentially related diseases for two target diseases: Alzheimer Disease and Prader-Willi Syndrome. Results: In the case of both Alzheimer Disease and Prader-Willi Syndrome, a set of plausible diseases was identified that may warrant further exploration. Discussion: This study furthers seminal work by Swanson, et al. that demonstrated the potential for mining literature for putative correlations. Using a vector space modeling approach, information from both biomedical literature and genomic resources (like GenBank) can be combined towards identification of putative correlations of interest. To this end, the relevance of the predicted diseases of interest in this study using the vector space modeling approach was validated based on supporting literature. Conclusion: The results of this study suggest that a vector space model approach may be a useful means to identify potential relationships between complex diseases, and thereby enable the coordination of gene-based findings across multiple complex diseases. PMID:22227640
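
    As a rough illustration of the vector space idea in this record, the sketch below represents each disease as a sparse vector of gene weights and ranks the other diseases by cosine similarity to a target. The disease names, gene labels, and weights are hypothetical placeholders, not data from the study, and the abstract does not state which weighting or similarity measure the authors used; cosine similarity is assumed here because it is the usual choice in information retrieval.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_related(target, vectors):
    """Return (name, score) pairs for all other diseases, best match first."""
    scores = [
        (name, cosine_similarity(vectors[target], vec))
        for name, vec in vectors.items()
        if name != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical gene weights (assumption: 1.0 means "gene linked to disease").
disease_vectors = {
    "disease_A": {"GENE1": 1.0, "GENE2": 1.0},
    "disease_B": {"GENE1": 1.0, "GENE3": 1.0},
    "disease_C": {"GENE4": 1.0},
}

ranking = rank_related("disease_A", disease_vectors)
```

    In this toy setup, disease_B shares one of two genes with disease_A and so ranks above disease_C, which shares none. A real pipeline would build the vectors from the knowledge bases named in the abstract.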

  13. Graduates employment classification using data mining approach

    NASA Astrophysics Data System (ADS)

    Aziz, Mohd Tajul Rizal Ab; Yusof, Yuhanis

    2016-08-01

    Data mining is a platform for extracting hidden knowledge from a collection of data. This study investigates a suitable classification model for classifying graduate employment at one of the MARA Professional Colleges (KPM) in Malaysia. The aim is to classify graduates as employed, unemployed, or pursuing further study. Five data mining algorithms offered in WEKA were used: Naïve Bayes, logistic regression, multilayer perceptron, k-nearest neighbor, and the J48 decision tree. Logistic regression produced the highest classification accuracy, 92.5%, when 80% of the data was used for training and 20% for testing. The resulting classification model will benefit the management of the college by providing insight into the quality of the graduates it produces and into how the curriculum can be improved to meet the needs of industry.

  14. Text mining applied to electronic cardiovascular procedure reports to identify patients with trileaflet aortic stenosis and coronary artery disease.

    PubMed

    Small, Aeron M; Kiss, Daniel H; Zlatsin, Yevgeny; Birtwell, David L; Williams, Heather; Guerraty, Marie A; Han, Yuchi; Anwaruddin, Saif; Holmes, John H; Chirinos, Julio A; Wilensky, Robert L; Giri, Jay; Rader, Daniel J

    2017-08-01

    Interrogation of the electronic health record (EHR) using billing codes as a surrogate for diagnoses of interest has been widely used for clinical research. However, the accuracy of this methodology is variable, as it reflects billing codes rather than severity of disease, and depends on the disease and the accuracy of the coding practitioner. Systematic application of text mining to the EHR has had variable success for the detection of cardiovascular phenotypes. We hypothesize that the application of text mining algorithms to cardiovascular procedure reports may be a superior method to identify patients with cardiovascular conditions of interest. We adapted the Oracle product Endeca, which utilizes text mining to identify terms of interest from a NoSQL-like database, for purposes of searching cardiovascular procedure reports and termed the tool "PennSeek". We imported 282,569 echocardiography reports representing 81,164 individuals and 27,205 cardiac catheterization reports representing 14,567 individuals from non-searchable databases into PennSeek. We then applied clinical criteria to these reports in PennSeek to identify patients with trileaflet aortic stenosis (TAS) and coronary artery disease (CAD). Accuracy of patient identification by text mining through PennSeek was compared with ICD-9 billing codes. Text mining identified 7115 patients with TAS and 9247 patients with CAD. ICD-9 codes identified 8272 patients with TAS and 6913 patients with CAD. 4346 patients with AS and 6024 patients with CAD were identified by both approaches. A randomly selected sample of 200-250 patients uniquely identified by text mining was compared with 200-250 patients uniquely identified by billing codes for both diseases. We demonstrate that text mining was superior, with a positive predictive value (PPV) of 0.95 compared to 0.53 by ICD-9 for TAS, and a PPV of 0.97 compared to 0.86 for CAD. These results highlight the superiority of text mining algorithms applied to electronic
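
    The positive predictive value reported in this record is the share of flagged patients who are confirmed on review. A minimal sketch of that calculation follows; the counts are hypothetical and chosen only to illustrate the formula, since the abstract reports the resulting PPVs and a sample size of 200-250 but not the underlying confirmed and unconfirmed counts.

```python
def positive_predictive_value(true_positives, false_positives):
    """PPV = TP / (TP + FP): the fraction of flagged cases that are real."""
    total_flagged = true_positives + false_positives
    if total_flagged == 0:
        raise ValueError("no flagged patients to evaluate")
    return true_positives / total_flagged


# Hypothetical review outcome (assumption, not study data):
# 200 flagged patients sampled, 190 confirmed on manual review.
ppv = positive_predictive_value(true_positives=190, false_positives=10)
```

    With these illustrative counts the function returns 0.95. PPV depends on how common the condition is in the searched population, so values from one report database may not transfer to another.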

  15. Autonomous decision-making: a data mining approach.

    PubMed

    Kusiak, A; Kern, J A; Kernstine, K H; Tseng, B T

    2000-12-01

    The researchers and practitioners of today create models, algorithms, functions, and other constructs defined in abstract spaces. The research of the future will likely be data driven. Symbolic and numeric data that are becoming available in large volumes will define the need for new data analysis techniques and tools. Data mining is an emerging area of computational intelligence that offers new theories, techniques, and tools for analysis of large data sets. In this paper, a novel approach for autonomous decision-making is developed based on the rough set theory of data mining. The approach has been tested on a medical data set for patients with lung abnormalities referred to as solitary pulmonary nodules (SPNs). The two independent algorithms developed in this paper either generate an accurate diagnosis or make no decision. The methodology discussed in the paper departs from the developments in data mining as well as current medical literature, thus creating a viable approach for autonomous decision-making.

  16. A proactive approach to sustainable management of mine tailings

    NASA Astrophysics Data System (ADS)

    Edraki, Mansour; Baumgartl, Thomas

    2015-04-01

    Reactive strategies to manage mine tailings, i.e. containment of tailings slurries in tailings storage facilities (TSFs) and remediation of tailings solids or tailings seepage water after the decommissioning of those facilities, can be technically inefficient at eliminating environmental risks (e.g. preventing dispersion of contaminants and catastrophic dam wall failures), pose a long-term economic burden for companies, governments and society after mine closure, and often fail to meet community expectations. Most preventive environmental management practices promote proactive integrated approaches to waste management whereby the sources of environmental issues are identified to help make more informed decisions. They often use life cycle assessment to find the "hot spots" of environmental burdens. This kind of approach is often based on generic data and has rarely been used for tailings. Besides, life cycle assessments are less useful for designing operations or simulating changes in the process and consequent environmental outcomes. It is evident that an integrated approach for tailings research linked to better processing options is needed. A literature review revealed that there are only a few examples of integrated approaches. The aim of this project is to develop new tailings management models by streamlining orebody characterization, process optimization and rehabilitation. The approach is based on continuous fingerprinting of geochemical processes from orebody to tailings storage facility, and on benchmarking the success of such proactive initiatives by evidence of no impacts and no future projected impacts on receiving environments. We present an approach for developing such a framework and preliminary results from a case study where combined grinding and flotation models developed using geometallurgical data from the orebody were constructed to predict the properties of tailings produced under various processing scenarios. The modelling scenarios based on the

  17. An Empirical Approach to Identifying Effective Schools.

    ERIC Educational Resources Information Center

    Webster, William J.; Olson, George H.

    One approach to identifying effective schools defines effectiveness in terms of student achievement in reading, mathematics, and language usage. Exceptional school achievement is indicated by performance above or below the level expected if students were merely to maintain their previous rate of growth. Regression analysis is used to compute…

  18. EST mining and functional expression assays identify extracellular effector proteins from the plant pathogen Phytophthora.

    PubMed

    Torto, Trudy A; Li, Shuang; Styer, Allison; Huitema, Edgar; Testa, Antonino; Gow, Neil A R; van West, Pieter; Kamoun, Sophien

    2003-07-01

    Plant pathogenic microbes have the remarkable ability to manipulate biochemical, physiological, and morphological processes in their host plants. These manipulations are achieved through a diverse array of effector molecules that can either promote infection or trigger defense responses. We describe a general functional genomics approach aimed at identifying extracellular effector proteins from plant pathogenic microorganisms by combining data mining of expressed sequence tags (ESTs) with virus-based high-throughput functional expression assays in plants. PexFinder, an algorithm for automated identification of extracellular proteins from EST data sets, was developed and applied to 2147 ESTs from the oomycete plant pathogen Phytophthora infestans. The program identified 261 ESTs (12.2%) corresponding to a set of 142 nonredundant Pex (Phytophthora extracellular protein) cDNAs. Of these, 78 (55%) Pex cDNAs were novel with no significant matches in public databases. Validation of PexFinder was performed using proteomic analysis of secreted protein of P. infestans. To identify which of the Pex cDNAs encode effector proteins that manipulate plant processes, high-throughput functional expression assays in plants were performed on 63 of the identified cDNAs using an Agrobacterium tumefaciens binary vector carrying the potato virus X (PVX) genome. This led to the discovery of two novel necrosis-inducing cDNAs, crn1 and crn2, encoding extracellular proteins that belong to a large and complex protein family in Phytophthora. Further characterization of the crn genes indicated that they are both expressed in P. infestans during colonization of the host plant tomato and that crn2 induced defense-response genes in tomato. Our results indicate that combining data mining using PexFinder with PVX-based functional assays can facilitate the discovery of novel pathogen effector proteins. In principle, this strategy can be applied to a variety of eukaryotic plant pathogens, including

  19. Current approaches for mitigating acid mine drainage.

    PubMed

    Sahoo, Prafulla Kumar; Kim, Kangjoo; Equeenuddin, Sk Md; Powell, Michael A

    2013-01-01

    Acid mine drainage (AMD) is one of the critical environmental problems that causes acidification and metal contamination of surface and ground water bodies when mine materials and/or overburden containing metal sulfides are exposed to oxidizing conditions. The best option to limit AMD is early avoidance of sulfide oxidation. Several techniques are available to achieve this. In this paper, we review all of the major methods now used to limit sulfide oxidation. These fall into five categories: (1) physical barriers, (2) bacterial inhibition, (3) chemical passivation, (4) electrochemical, and (5) desulfurization. We describe the processes underlying each method by category and then address aspects relating to effectiveness, cost, and environmental impact. This paper may help researchers and environmental engineers to select suitable methods for addressing site-specific AMD problems. Irrespective of the mechanism by which each method works, all share one common feature, i.e., they delay or prevent oxidation. In addition, all have limitations. Physical barriers such as wet or dry cover have retarded sulfide oxidation in several studies; however, both wet and dry barriers exhibit only short-term effectiveness. Wet cover is suitable at specific sites where complete inundation is established, but this approach requires high maintenance costs. When employing dry cover, plastic liners are expensive and rarely used for large volumes of waste. Bactericides can suppress oxidation, but are only effective on fresh tailings and short-lived, and do not serve as a permanent solution to AMD. In addition, application of bactericides may be toxic to aquatic organisms. Encapsulation or passivation of sulfide surfaces (applying organic and/or inorganic coatings) is simple and effective in preventing AMD. Among inorganic coatings, silica is the most promising, stable, acid-resistant and long lasting, as compared to phosphate and other inorganic coatings. Permanganate passivation is also promising because it

  20. Distributed design approach in persistent identifiers systems

    NASA Astrophysics Data System (ADS)

    Golodoniuc, Pavel; Car, Nicholas; Klump, Jens

    2017-04-01

    The need to identify both digital and physical objects is ubiquitous in our society. Past and present persistent identifier (PID) systems, of which there is a great variety in terms of technical and social implementations, have evolved with the advent of the Internet, which has allowed for globally unique and globally resolvable identifiers. PID systems have catered for identifier uniqueness, integrity, persistence, and trustworthiness, regardless of the identifier's application domain, the scope of which has expanded significantly in the past two decades. Since many PID systems have been largely conceived and developed by small communities, or even a single organisation, they have faced challenges in gaining widespread adoption and, most importantly, the ability to survive change of technology. This has left a legacy of identifiers that still exist and are being used but which have lost their resolution service. We believe that one of the causes of once successful PID systems fading is their reliance on a centralised technical infrastructure or a governing authority. Golodoniuc et al. (2016) proposed an approach to the development of PID systems that combines the use of (a) the Handle system, as a distributed system for the registration and first-degree resolution of persistent identifiers, and (b) the PID Service (Golodoniuc et al., 2015), to enable fine-grained resolution to different information object representations. The proposed approach solved the problem of guaranteed first-degree resolution of identifiers, but left fine-grained resolution and information delivery under the control of a single authoritative source, posing risk to the long-term availability of information resources. Herein, we develop these approaches further and explore the potential of large-scale decentralisation at all levels: (i) persistent identifiers and information resources registration; (ii) identifier resolution; and (iii) data delivery. To achieve large-scale decentralisation

  1. Identifying Emerging Trends in Medical Informatics: A Synthesis Approach.

    PubMed

    Van Kasteren, Yasmin; Williams, Patricia A H; Maeder, Anthony

    2017-01-01

    Medical informatics is a young and rapidly evolving field, influenced by and impacting on many different knowledge domains. Recent contributions on scoping the associated body of knowledge are confounded both by variations in popular use of terminology for established areas, and by the advent of new areas without yet established terminology. Determining the scope of a topic through online bibliographic search filters is a well-established approach in scientific research and has been developed as a human-directed task. Establishing the best approach and automating the process has proved a difficult problem. This paper explores the use of text analysis of bibliographic information using available search engines and NVIVO text analysis tools to test the potential for dynamic word-based filters based on data mining. Results show that word searches of abstracts are more effective than topic searches for identifying health informatics papers; however, more work is required to refine search terms to improve generalisability. Using data mining to track changes in word use in medical informatics journals may make it possible to establish a more dynamic search filter to match the evolving nature of the field of health informatics.

  2. Detection of antipersonnel (AP) mines using mechatronics approach

    NASA Astrophysics Data System (ADS)

    Shahri, Ali M.; Naghdy, Fazel

    1998-09-01

    At present there are approximately 110 million land-mines scattered around the world in 64 countries. The clearance of these mines takes place manually. Unfortunately, on average, for every 5000 mines cleared one mine clearer is killed. A Mine Detector Arm (MDA) using a mechatronics approach is under development in this work. The robot arm imitates the manual hand-prodding technique for mine detection. It inserts a bayonet into the soil and models the dynamics of the manipulator and environment parameters, such as stiffness variation in the soil, to control the impact caused by contacting a stiff object. An explicit impact control scheme is applied as the main control scheme, while two different intelligent control methods are designed to deal with uncertainties and varying environmental parameters. Firstly, a neuro-fuzzy adaptive gain controller (NFAGC) is designed to adapt the force gain control according to the estimated environment stiffness. Then, an adaptive neuro-fuzzy plus PID controller is employed to switch from a conventional PID controller to neuro-fuzzy impact control (NFIC) when an impact is detected. The developed control schemes are validated through computer simulation and experimental work.
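
    For readers unfamiliar with the conventional PID controller this record mentions as the baseline, a minimal discrete-time sketch is given below. It is a textbook form only: the gains, time step, and force values are hypothetical, the abstract does not give the authors' control law, and the neuro-fuzzy components (NFAGC, NFIC) are not reproduced here.

```python
class PID:
    """Textbook discrete PID controller (illustrative, not the paper's design)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp = kp          # proportional gain
        self.ki = ki          # integral gain
        self.kd = kd          # derivative gain
        self.dt = dt          # sample period in seconds
        self.integral = 0.0   # accumulated error * dt
        self.prev_error = None

    def update(self, setpoint, measurement):
        """Return the control output for one sample."""
        error = setpoint - measurement
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0  # no history yet on the first sample
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical gains and contact-force readings (assumptions for illustration).
controller = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.1)
first_output = controller.update(setpoint=1.0, measurement=0.0)
second_output = controller.update(setpoint=1.0, measurement=0.5)
```

    The derivative term responds sharply when the measured force jumps, which is one reason a fixed-gain PID can behave poorly at the moment of impact with a stiff object.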

  3. Phytoremediation of coal mine spoil dump through integrated biotechnological approach.

    PubMed

    Juwarkar, Asha A; Jambhulkar, Hemlata P

    2008-07-01

    A field experiment was conducted on a mine spoil dump over an area of 10 ha to restore the fertility and productivity of the coal mine spoil dump using an integrated biotechnological approach. The approach involves use of effluent treatment plant sludge (ETP sludge) as an organic amendment, biofertilizers and mycorrhizal fungi along with suitable plant species. The results of the study indicated that amendment with ETP sludge at 50 ton/ha improved the physico-chemical properties of coal mine spoil. Due to biofertilizer inoculation, different microbial groups such as Rhizobium, Azotobacter and VAM spores, which were practically absent in mine spoil, improved greatly. Inoculation of biofertilizer and application of ETP sludge helped in reducing the toxicity of heavy metals such as chromium, zinc, copper, iron, manganese, lead, nickel and cadmium, which were significantly reduced to 41%, 43%, 37%, 37%, 34%, 39%, 37% and 40%, respectively, due to the increased organic matter content in the ETP sludge and its alkaline pH (8.10-8.28), at which the metals get immobilized and translocation of metals is arrested. Thus, amendment and biofertilizer application provided better supportive material for anchorage and growth of the plant on coal mine spoil dump.

  4. Text mining electronic health records to identify hospital adverse events.

    PubMed

    Gerdes, Lars Ulrik; Hardahl, Christian

    2013-01-01

    Manual reviews of health records to identify possible adverse events are time consuming. We are developing a method based on natural language processing to quickly search electronic health records for common triggers and adverse events. Our results agree fairly well with those obtained using manual reviews, and we therefore believe that it is possible to develop automatic tools for monitoring aspects of patient safety.

  5. Mining genomic databases to identify novel hydrogen producers.

    PubMed

    Kalia, Vipin C; Lal, Sadhana; Ghai, Rohit; Mandal, Manabendra; Chauhan, Ashwini

    2003-04-01

    The realization that fossil fuel reserves are limited and their adverse effect on the environment has forced us to look into alternative sources of energy. Hydrogen is a strong contender as a future fuel. Biological hydrogen production ranges from 0.37 to 3.3 moles H(2) per mole of glucose and, considering the high theoretical values of production (4.0 moles H(2) per mole of glucose), it is worth exploring approaches to increase hydrogen yields. Screening the untapped microbial population is a promising possibility. Sequence analysis and pathway alignment of hydrogen metabolism in complete and incomplete genomes has led to the identification of potential hydrogen producers.
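
    The gap between reported and theoretical yields in this record can be made concrete with a short calculation. The sketch below uses only the figures stated in the abstract (0.37 to 3.3 moles H2 per mole of glucose against a theoretical 4.0); the function name is a hypothetical helper.

```python
THEORETICAL_MAX = 4.0  # moles H2 per mole glucose, as stated in the abstract


def fraction_of_theoretical(observed_yield, theoretical=THEORETICAL_MAX):
    """Observed hydrogen yield as a fraction of the theoretical maximum."""
    if theoretical <= 0:
        raise ValueError("theoretical yield must be positive")
    return observed_yield / theoretical


low = fraction_of_theoretical(0.37)   # lower end of the reported range
high = fraction_of_theoretical(3.3)   # upper end of the reported range
```

    The reported range therefore spans roughly 9% to 83% of the theoretical maximum, which is the headroom the abstract refers to when it suggests exploring ways to increase yields.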

  6. Using text-mining techniques in electronic patient records to identify ADRs from medicine use.

    PubMed

    Warrer, Pernille; Hansen, Ebba Holme; Juhl-Jensen, Lars; Aagaard, Lise

    2012-05-01

    This literature review included studies that use text-mining techniques in narrative documents stored in electronic patient records (EPRs) to investigate ADRs. We searched PubMed, Embase, Web of Science and International Pharmaceutical Abstracts without restrictions from origin until July 2011. We included empirically based studies on text mining of electronic patient records (EPRs) that focused on detecting ADRs, excluding those that investigated adverse events not related to medicine use. We extracted information on study populations, EPR data sources, frequencies and types of the identified ADRs, medicines associated with ADRs, text-mining algorithms used and their performance. Seven studies, all from the United States, were eligible for inclusion in the review. Studies were published from 2001, the majority between 2009 and 2010. Text-mining techniques varied over time from simple free text searching of outpatient visit notes and inpatient discharge summaries to more advanced techniques involving natural language processing (NLP) of inpatient discharge summaries. Performance appeared to increase with the use of NLP, although many ADRs were still missed. Due to differences in study design and populations, various types of ADRs were identified and thus we could not make comparisons across studies. The review underscores the feasibility and potential of text mining to investigate narrative documents in EPRs for ADRs. However, more empirical studies are needed to evaluate whether text mining of EPRs can be used systematically to collect new information about ADRs. © 2011 The Authors. British Journal of Clinical Pharmacology © 2011 The British Pharmacological Society.

  7. Using text-mining techniques in electronic patient records to identify ADRs from medicine use

    PubMed Central

    Warrer, Pernille; Hansen, Ebba Holme; Juhl-Jensen, Lars; Aagaard, Lise

    2012-01-01

    This literature review included studies that use text-mining techniques in narrative documents stored in electronic patient records (EPRs) to investigate ADRs. We searched PubMed, Embase, Web of Science and International Pharmaceutical Abstracts without restrictions from origin until July 2011. We included empirically based studies on text mining of electronic patient records (EPRs) that focused on detecting ADRs, excluding those that investigated adverse events not related to medicine use. We extracted information on study populations, EPR data sources, frequencies and types of the identified ADRs, medicines associated with ADRs, text-mining algorithms used and their performance. Seven studies, all from the United States, were eligible for inclusion in the review. Studies were published from 2001, the majority between 2009 and 2010. Text-mining techniques varied over time from simple free text searching of outpatient visit notes and inpatient discharge summaries to more advanced techniques involving natural language processing (NLP) of inpatient discharge summaries. Performance appeared to increase with the use of NLP, although many ADRs were still missed. Due to differences in study design and populations, various types of ADRs were identified and thus we could not make comparisons across studies. The review underscores the feasibility and potential of text mining to investigate narrative documents in EPRs for ADRs. However, more empirical studies are needed to evaluate whether text mining of EPRs can be used systematically to collect new information about ADRs. PMID:22122057

  8. Wastewater treatment polymers identified as the toxic component of a diamond mine effluent.

    PubMed

    De Rosemond, Simone J C; Liber, Karsten

    2004-09-01

    The Ekati Diamond Mine, located approximately 300 km northeast of Yellowknife in Canada's Northwest Territories, uses mechanical crushing and washing processes to extract diamonds from kimberlite ore. The processing plant's effluent contains kimberlite ore particles (≤0.5 mm), wastewater, and two wastewater treatment polymers, a cationic polydiallyldimethylammonium chloride (DADMAC) polymer and an anionic sodium acrylate polyacrylamide (PAM) polymer. A series of acute (48-h) and chronic (7-d) toxicity tests determined the processed kimberlite effluent (PKE) was chronically, but not acutely, toxic to Ceriodaphnia dubia. Reproduction of C. dubia was inhibited significantly at concentrations as low as 12.5% PKE. Toxicity identification evaluations (TIE) were initiated to identify the toxic component of PKE. Ethylenediaminetetraacetic acid (EDTA), sodium thiosulfate, aeration, and solid phase extraction with C-18 manipulations failed to reduce PKE toxicity. Toxicity was reduced significantly by pH adjustments to pH 3 or 11 followed by filtration. Toxicity testing with C. dubia determined that the cationic DADMAC polymer had a 48-h median lethal concentration (LC50) of 0.32 mg/L and 7-d median effective concentration (EC50) of 0.014 mg/L. The anionic PAM polymer had a 48-h LC50 of 218 mg/L. A weight-of-evidence approach, using the data obtained from the TIE, the polymer toxicity experiments, the estimated concentration of the cationic polymer in the kimberlite effluent, and the behavior of kimberlite minerals in pH-adjusted solutions provided sufficient evidence to identify the cationic DADMAC polymer as the toxic component of the diamond mine PKE.

  9. An integrative approach for biological data mining and visualisation.

    PubMed

    Gopalacharyulu, Peddinti V; Lindfors, Erno; Miettinen, Jarkko; Bounsaythip, Catherine K; Oresic, Matej

    2008-01-01

    The emergence of systems biology necessitates the development of platforms to organise and interpret the plentitude of biological data. We present a system to integrate data across multiple bioinformatics databases and enable mining across various conceptual levels of biological information. The results are represented as complex networks. Context dependent mining of these networks is achieved by use of distances. Our approach is demonstrated with three applications: full metabolic network retrieval with network topology study, exploration of properties and relationships of a set of selected proteins, and combined visualisation and exploration of gene expression data with related pathways and ontologies.

  10. Data mining approach to model the diagnostic service management.

    PubMed

    Lee, Sun-Mi; Lee, Ae-Kyung; Park, Il-Su

    2006-01-01

    Korea has a National Health Insurance Program operated by the government-owned National Health Insurance Corporation, and diagnostic services are provided every two years for the insured and their family members. Developing a customer relationship management (CRM) system using data mining technology would be useful to improve the performance of diagnostic service programs. Under these circumstances, this study developed a model for diagnostic service management taking into account the characteristics of subjects using a data mining approach. This study could be further used to develop an automated CRM system contributing to the increase in the rate of receiving diagnostic services.

  11. A node linkage approach for sequential pattern mining.

    PubMed

    Navarro, Osvaldo; Cumplido, René; Villaseñor-Pineda, Luis; Feregrino-Uribe, Claudia; Carrasco-Ochoa, Jesús Ariel

    2014-01-01

    Sequential Pattern Mining is a widely addressed problem in data mining, with applications such as analyzing Web usage, examining purchase behavior, and text mining, among others. Nevertheless, with the dramatic increase in data volume, the current approaches prove inefficient when dealing with large input datasets, a large number of different symbols and low minimum supports. In this paper, we propose a new sequential pattern mining algorithm, which follows a pattern-growth scheme to discover sequential patterns. Unlike most pattern growth algorithms, our approach does not build a data structure to represent the input dataset, but instead accesses the required sequences through pseudo-projection databases, achieving better runtime and reducing memory requirements. Our algorithm traverses the search space in a depth-first fashion and only preserves in memory a pattern node linkage and the pseudo-projections required for the branch being explored at the time. Experimental results show that our new approach, the Node Linkage Depth-First Traversal algorithm (NLDFT), has better performance and scalability in comparison with state of the art algorithms.
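
    The pseudo-projection idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the NLDFT algorithm itself, whose node linkage structure is not described here; it is a generic PrefixSpan-style depth-first pattern growth, assuming sequences of single items. The function name and the toy data in the usage note are hypothetical.

```python
from collections import defaultdict


def mine_sequential_patterns(sequences, min_support):
    """Depth-first pattern growth over sequences of single items.

    A projected database is kept as (sequence index, start offset) pairs
    that point into the original input, instead of copied suffixes.
    Returns a dict mapping each frequent pattern (a tuple) to its support.
    """
    results = {}

    def grow(prefix, projection):
        # For every item, record the first occurrence at or after the
        # current offset in each projected sequence.
        occurrences = defaultdict(list)
        for seq_id, start in projection:
            seq = sequences[seq_id]
            seen = set()
            for pos in range(start, len(seq)):
                item = seq[pos]
                if item not in seen:
                    seen.add(item)
                    occurrences[item].append((seq_id, pos + 1))
        # Extend the prefix with each frequent item and recurse, so only
        # the projections of the current branch are alive at any time.
        for item, next_projection in sorted(occurrences.items()):
            if len(next_projection) >= min_support:
                pattern = prefix + (item,)
                results[pattern] = len(next_projection)
                grow(pattern, next_projection)

    grow((), [(i, 0) for i in range(len(sequences))])
    return results
```

    For example, calling mine_sequential_patterns([list("abc"), list("acb"), list("ab")], 2) on this toy input finds the single items plus the two-item patterns (a, b) and (a, c). Storing offsets rather than suffix copies is what keeps memory use low in this style of algorithm.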

  12. A Node Linkage Approach for Sequential Pattern Mining

    PubMed Central

    Navarro, Osvaldo; Cumplido, René; Villaseñor-Pineda, Luis; Feregrino-Uribe, Claudia; Carrasco-Ochoa, Jesús Ariel

    2014-01-01

    Sequential Pattern Mining is a widely addressed problem in data mining, with applications such as analyzing Web usage, examining purchase behavior, and text mining, among others. Nevertheless, with the dramatic increase in data volume, the current approaches prove inefficient when dealing with large input datasets, a large number of different symbols and low minimum supports. In this paper, we propose a new sequential pattern mining algorithm, which follows a pattern-growth scheme to discover sequential patterns. Unlike most pattern growth algorithms, our approach does not build a data structure to represent the input dataset, but instead accesses the required sequences through pseudo-projection databases, achieving better runtime and reducing memory requirements. Our algorithm traverses the search space in a depth-first fashion and only preserves in memory a pattern node linkage and the pseudo-projections required for the branch being explored at the time. Experimental results show that our new approach, the Node Linkage Depth-First Traversal algorithm (NLDFT), has better performance and scalability in comparison with state of the art algorithms. PMID:24933123

  13. Mining the Metabiome: Identifying Novel Natural Products from Microbial Communities

    PubMed Central

    Milshteyn, Aleksandr; Schneider, Jessica S.; Brady, Sean F.

    2014-01-01

    Summary Microbial-derived natural products provide the foundation for most of the chemotherapeutic arsenal available to contemporary medicine. In the face of a dwindling pipeline of new lead structures identified by traditional culturing techniques and an increasing need for new therapeutics, surveys of microbial biosynthetic diversity across environmental metabiomes have revealed enormous reservoirs of as yet untapped natural products chemistry. In this review we touch on the historical context of microbial natural product discovery and discuss innovations and technological advances that are facilitating culture-dependent and culture-independent access to new chemistry from environmental microbiomes with the goal of re-invigorating the small molecule therapeutics discovery pipeline. We highlight the successful strategies that have emerged and some of the challenges that must be overcome to enable the development of high-throughput methods for natural product discovery from complex microbial communities. PMID:25237864

  14. Data Mining Approaches for Modeling Complex Electronic Circuit Design Activities

    SciTech Connect

    Kwon, Yongjin; Omitaomu, Olufemi A; Wang, Gi-Nam

    2008-01-01

    A printed circuit board (PCB) is an essential part of modern electronic circuits. It is made of a flat panel of insulating materials with patterned copper foils that act as electric pathways for various components such as ICs, diodes, capacitors, resistors, and coils. The size of PCBs has been shrinking over the years, while the number of components mounted on these boards has increased considerably. This trend makes the design and fabrication of PCBs ever more difficult. At the beginning of design cycles, it is important to accurately estimate the time needed to complete the required steps, based on many factors such as the required parts, approximate board size and shape, and a rough sketch of schematics. The current approach uses the multiple linear regression (MLR) technique for time and cost estimations. However, the need for accurate predictive models continues to grow as the technology becomes more advanced. In this paper, we analyze a large volume of historical PCB design data, extract some important variables, and develop predictive models based on the extracted variables using a data mining approach. The data mining approach uses an adaptive support vector regression (ASVR) technique; the benchmark model used is the MLR technique currently being used in the industry. The strengths of SVR for this data include its ability to represent data in high-dimensional space through kernel functions. The computational results show that a data mining approach is a better prediction technique for this data. Our approach reduces computation time and enhances the practical applications of the SVR technique.

  15. Application of data mining approaches to drug delivery.

    PubMed

    Ekins, Sean; Shimada, Jun; Chang, Cheng

    2006-11-30

    Computational approaches play a key role in all areas of the pharmaceutical industry from data mining, experimental and clinical data capture to pharmacoeconomics and adverse events monitoring. They will likely continue to be indispensable assets along with a growing library of software applications. This is primarily due to the increasingly massive amount of biology, chemistry and clinical data, which is now entering the public domain mainly as a result of NIH and commercially funded projects. We are therefore in need of new methods for mining this mountain of data in order to enable new hypothesis generation. The computational approaches include, but are not limited to, database compilation, quantitative structure activity relationships (QSAR), pharmacophores, network visualization models, decision trees, machine learning algorithms and multidimensional data visualization software that could be used to improve drug delivery after mining public and/or proprietary data. We will discuss some areas of unmet needs in the area of data mining for drug delivery that can be addressed with new software tools or databases of relevance to future pharmaceutical projects.

  16. Selecting Proper Plant Species for Mine Reclamation Using Fuzzy AHP Approach (Case Study: Chadormaloo Iron Mine of Iran)

    NASA Astrophysics Data System (ADS)

    Ebrahimabadi, Arash

    2016-12-01

    This paper describes an effective approach to select suitable plant species for reclamation of mined lands in Chadormaloo iron mine, which is located in the central part of Iran, near the city of Bafgh in Yazd province. After the mine's total reserves are excavated, the mine must be permanently closed and reclaimed. Mine reclamation and post-mining land-use are the main issues in the phase of mine closure. In general, among various scenarios for the mine reclamation process, i.e. planting, agriculture, forestry, residency, tourist attraction, etc., planting is the oldest and most commonly used technology for the reclamation of lands damaged by mining activities. Planting and vegetation play a major role in restoring productivity, ecosystem stability and biological diversity to degraded areas; therefore the main goal of this research work is to choose proper and suitable plants compatible with the conditions of the Chadormaloo mined area, providing consistent conditions for future use. To ensure the sustainability of the reclaimed landscape, the most suitable plant species adapted to the mine conditions are selected. Plant species selection is a Multi Criteria Decision Making (MCDM) problem. In this paper, a fuzzy MCDM technique, namely Fuzzy Analytic Hierarchy Process (FAHP), is developed to assist Chadormaloo iron mine managers and designers in the process of plant type selection for reclamation of the mine under a fuzzy environment where the vagueness and uncertainty are taken into account with linguistic variables parameterized by triangular fuzzy numbers. The results achieved from using the FAHP approach demonstrate that the most proper plant species are ranked as Artemisia sieberi, Salsola yazdiana, Halophytes types, and Zygophyllum, respectively, for reclamation of Chadormaloo iron mine.
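
    How triangular fuzzy numbers turn pairwise judgments into a ranking can be sketched as follows. The abstract does not say which FAHP variant was used, so this sketch assumes the geometric-mean method with centroid defuzzification; the function name and the example matrix in the usage note are hypothetical and are not data from the study.

```python
import math


def fuzzy_ahp_weights(matrix):
    """Crisp priority weights from a fuzzy pairwise comparison matrix.

    matrix[i][j] is a triangular fuzzy number (low, mid, high) stating
    how strongly alternative i is preferred over alternative j.
    """
    n = len(matrix)
    # Fuzzy geometric mean of each row, taken component by component.
    geo = [
        tuple(math.prod(row[j][k] for j in range(n)) ** (1.0 / n) for k in range(3))
        for row in matrix
    ]
    total_low = sum(g[0] for g in geo)
    total_mid = sum(g[1] for g in geo)
    total_high = sum(g[2] for g in geo)
    # Fuzzy weights: the lower bound is divided by the largest total and
    # the upper bound by the smallest, which keeps low <= mid <= high.
    fuzzy = [(g[0] / total_high, g[1] / total_mid, g[2] / total_low) for g in geo]
    # Centroid defuzzification, then normalisation so the weights sum to 1.
    crisp = [sum(w) / 3.0 for w in fuzzy]
    total = sum(crisp)
    return [c / total for c in crisp]
```

    With a two-alternative matrix such as [[(1, 1, 1), (2, 3, 4)], [(1/4, 1/3, 1/2), (1, 1, 1)]], the first alternative receives the larger weight, reflecting the judgment that it is "about three times" preferable.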

  17. Mining Functional Modules in Heterogeneous Biological Networks Using Multiplex PageRank Approach

    PubMed Central

    Li, Jun; Zhao, Patrick X.

    2016-01-01

    Identification of functional modules/sub-networks in large-scale biological networks is one of the important research challenges in current bioinformatics and systems biology. Approaches have been developed to identify functional modules in single-class biological networks; however, methods for systematically and interactively mining multiple classes of heterogeneous biological networks are lacking. In this paper, we present a novel algorithm (called mPageRank) that utilizes the Multiplex PageRank approach to mine functional modules from two classes of biological networks. We demonstrate the capabilities of our approach by successfully mining functional biological modules through integrating expression-based gene-gene association networks and protein-protein interaction networks. We first compared the performance of our method with that of other methods using simulated data. We then applied our method to identify the cell division cycle related functional module and plant signaling defense-related functional module in the model plant Arabidopsis thaliana. Our results demonstrated that the mPageRank method is effective for mining sub-networks in both expression-based gene-gene association networks and protein-protein interaction networks, and has the potential to be adapted for the discovery of functional modules/sub-networks in other heterogeneous biological networks. The mPageRank executable program, source code, the datasets and results of the presented two case studies are publicly and freely available at http://plantgrn.noble.org/MPageRank/. PMID:27446133

  18. Using Helicopter Electromagnetic Surveys to Identify Potential Hazards at Mine Waste Impoundments

    SciTech Connect

    Hammack, R.W.

    2008-01-01

    In July 2003, helicopter electromagnetic surveys were conducted at 14 coal waste impoundments in southern West Virginia. The purpose of the surveys was to detect conditions that could lead to impoundment failure either by structural failure of the embankment or by the flooding of adjacent or underlying mine works. Specifically, the surveys attempted to: 1) identify saturated zones within the mine waste, 2) delineate filtrate flow paths through the embankment or into adjacent strata and receiving streams, and 3) identify flooded mine workings underlying or adjacent to the waste impoundment. Data from the helicopter surveys were processed to generate conductivity/depth images. Conductivity/depth images were then spatially linked to georeferenced air photos or topographic maps for interpretation. Conductivity/depth images were found to provide a snapshot of the hydrologic conditions that exist within the impoundment. This information can be used to predict potential areas of failure within the embankment because of its ability to image the phreatic zone. Also, the electromagnetic survey can identify areas of unconsolidated slurry in the decant basin and beneath the embankment. Although shallow, flooded mineworks beneath the impoundment were identified by this survey, it cannot be assumed that electromagnetic surveys can detect all underlying mines. A preliminary evaluation of the data implies that helicopter electromagnetic surveys can provide a better understanding of the phreatic zone than the piezometer arrays that are typically used.

  19. A Pattern Mining Approach for Classifying Multivariate Temporal Data.

    PubMed

    Batal, Iyad; Valizadegan, Hamed; Cooper, Gregory F; Hauskrecht, Milos

    2011-11-12

    We study the problem of learning classification models from complex multivariate temporal data encountered in electronic health record systems. The challenge is to define a good set of features that are able to represent well the temporal aspect of the data. Our method relies on temporal abstractions and temporal pattern mining to extract the classification features. Temporal pattern mining usually returns a large number of temporal patterns, most of which may be irrelevant to the classification task. To address this problem, we present the minimal predictive temporal patterns framework to generate a small set of predictive and non-spurious patterns. We apply our approach to the real-world clinical task of predicting patients who are at risk of developing heparin induced thrombocytopenia. The results demonstrate the benefit of our approach in learning accurate classifiers, which is a key step for developing intelligent clinical monitoring systems.

  20. Mining Clinicians' Electronic Documentation to Identify Heart Failure Patients with Ineffective Self-Management: A Pilot Text-Mining Study.

    PubMed

    Topaz, Maxim; Radhakrishnan, Kavita; Lei, Victor; Zhou, Li

    2016-01-01

    Effective self-management can decrease up to 50% of heart failure hospitalizations. Unfortunately, self-management by patients with heart failure remains poor. This pilot study aimed to explore the use of text-mining to identify heart failure patients with ineffective self-management. We first built a comprehensive self-management vocabulary based on the literature and clinical notes review. We then randomly selected 545 heart failure patients treated within Partners Healthcare hospitals (Boston, MA, USA) and conducted a regular expression search with the compiled vocabulary within 43,107 interdisciplinary clinical notes of these patients. We found that 38.2% (n = 208) of patients had documentation of ineffective heart failure self-management in the domains of poor diet adherence (28.4%), missed medical encounters (26.4%), poor medication adherence (20.2%) and non-specified self-management issues (e.g., "compliance issues", 34.6%). We showed the feasibility of using text-mining to identify patients with ineffective self-management. More natural language processing algorithms are needed to help busy clinicians identify these patients.
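
    A regular expression search against a compiled vocabulary, as described in this abstract, might look like the sketch below. The study's actual vocabulary is not given, so the terms, domain labels and function name here are hypothetical placeholders.

```python
import re

# Hypothetical vocabulary: a few illustrative terms per domain.
VOCABULARY = {
    "diet": [r"poor diet", r"high[- ]salt"],
    "medication": [r"missed doses?", r"not taking (?:his|her|their) med(?:ication)?s?"],
    "encounters": [r"missed appointments?", r"no[- ]show"],
}

# One case-insensitive pattern per domain, anchored on word boundaries.
PATTERNS = {
    domain: re.compile(r"\b(?:" + "|".join(terms) + r")\b", re.IGNORECASE)
    for domain, terms in VOCABULARY.items()
}


def flag_domains(note):
    """Return the sorted list of domains whose terms appear in a note."""
    return sorted(domain for domain, pattern in PATTERNS.items() if pattern.search(note))
```

    Calling flag_domains on each clinical note and aggregating the results per patient would give domain counts of the kind reported above. A plain keyword search of this sort does not handle negation ("denies missed doses"), which is one reason the authors call for more natural language processing.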

  1. Data Mining for Identifying Novel Associations and Temporal Relationships with Charcot Foot

    PubMed Central

    Munson, Michael E.; Wrobel, James S.; Holmes, Crystal M.; Hanauer, David A.

    2014-01-01

    Introduction. Charcot foot is a rare and devastating complication of diabetes. While some risk factors are known, debate continues regarding etiology. Elucidating other associated disorders and their temporal occurrence could lead to a better understanding of its pathogenesis. We applied a large data mining approach to Charcot foot for elucidating novel associations. Methods. We conducted an association analysis using ICD-9 diagnosis codes for every patient in our health system (n = 1.6 million with 41.2 million time-stamped ICD-9 codes). For the current analysis, we focused on the 388 patients with Charcot foot (ICD-9 713.5). Results. We found 710 associations, 676 (95.2%) of which had a P value for the association less than 1.0 × 10⁻⁵ and 603 (84.9%) of which had an odds ratio > 5.0. There were 111 (15.6%) associations with a significant temporal relationship (P < 1.0 × 10⁻³). The three novel associations with the strongest temporal component were cardiac dysrhythmia, pulmonary eosinophilia, and volume depletion disorder. Conclusion. We identified novel associations with Charcot foot in the context of pathogenesis models that include neurotrophic, neurovascular, and microtraumatic factors mediated through inflammatory cytokines. Future work should focus on confirmatory analyses. These novel areas of investigation could lead to prevention or earlier diagnosis. PMID:24868558
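
    The odds ratio used to rank associations in this abstract can be computed from a 2x2 table of diagnosis co-occurrence. The sketch below is a generic illustration; the abstract does not state how the P values were obtained, the Wald interval is an added assumption, and the counts in the usage note are invented.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table of patient counts.

    a: index diagnosis and co-diagnosis   b: index diagnosis only
    c: co-diagnosis only                  d: neither diagnosis
    """
    return (a * d) / (b * c)


def wald_interval(a, b, c, d, z=1.96):
    """Approximate 95% confidence interval on the odds ratio (Wald method)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    std_err = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * std_err), math.exp(log_or + z * std_err)
```

    With invented counts a = 20, b = 80, c = 10, d = 390, the odds ratio is (20 × 390) / (80 × 10) = 9.75, which would clear the odds ratio > 5.0 level mentioned above.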

  2. A geomorphological approach to the management of rivers contaminated by metal mining

    NASA Astrophysics Data System (ADS)

    Macklin, M. G.; Brewer, P. A.; Hudson-Edwards, K. A.; Bird, G.; Coulthard, T. J.; Dennis, I. A.; Lechler, P. J.; Miller, J. R.; Turner, J. N.

    2006-09-01

    As the result of current and historical metal mining, river channels and floodplains in many parts of the world have become contaminated by metal-rich waste in concentrations that may pose a hazard to human livelihoods and sustainable development. Environmental and human health impacts commonly arise because of the prolonged residence time of heavy metals in river sediments and alluvial soils and their bioaccumulatory nature in plants and animals. This paper considers how an understanding of the processes of sediment-associated metal dispersion in rivers, and the space and timescales over which they operate, can be used in a practical way to help river basin managers more effectively control and remediate catchments affected by current and historical metal mining. A geomorphological approach to the management of rivers contaminated by metals is outlined and four emerging research themes are highlighted and critically reviewed. These are: (1) response and recovery of river systems following the failures of major tailings dams; (2) effects of flooding on river contamination and the sustainable use of floodplains; (3) new developments in isotopic fingerprinting, remote sensing and numerical modelling for identifying the sources of contaminant metals and for mapping the spatial distribution of contaminants in river channels and floodplains; and (4) current approaches to the remediation of river basins affected by mining, appraised in light of the European Union's Water Framework Directive (2000/60/EC). Future opportunities for geomorphologically-based assessments of mining-affected catchments are also identified.

  3. Identifying Bully Victims: Definitional versus Behavioral Approaches

    PubMed Central

    Green, Jennifer Greif; Felix, Erika D.; Sharkey, Jill D.; Furlong, Michael J.; Kras, Jennifer E.

    2013-01-01

    Schools frequently assess bullying and the Olweus Bully/Victimization Questionnaire (BVQ; Olweus, 1996) is the most widely adopted tool for this purpose. The BVQ is a self-report survey that uses a definitional measurement method — describing “bullying” as involving repeated, intentional aggression in a relationship where there is an imbalance of power, and then asking respondents to indicate how frequently they experienced this type of victimization. Few studies have examined BVQ validity and whether this definitional method truly identifies the repetition and power differential that distinguish bullying from other forms of peer victimization. This study examined concurrent validity of the BVQ definitional question among 435 students reporting peer victimization. BVQ definitional responses were compared with responses to a behavioral measure that did not use the term “bullying,” but instead included items that asked about its defining characteristics (repetition, intentionality, power imbalance). Concordance between the two approaches was moderate, with an area under the receiver-operating curve of .72. BVQ responses were more strongly associated with students indicating repeated victimization and multiple forms of victimization, than with power imbalance in their relationship with the bully. Findings indicate that the BVQ is a valid measure of repeated victimization and a broad range of victimization experiences, but may not detect the more subtle and complex power imbalances that distinguish bullying from other forms of peer victimization. PMID:23244644

  4. Synoptic sampling and principal components analysis to identify sources of water and metals to an acid mine drainage stream

    USGS Publications Warehouse

    Byrne, Patrick; Runkel, Robert L.; Walton-Day, Katie

    2017-01-01

    Combining the synoptic mass balance approach with principal components analysis (PCA) can be an effective method for discretising the chemistry of inflows and source areas in watersheds where contamination is diffuse in nature and/or complicated by groundwater interactions. This paper presents a field-scale study in which synoptic sampling and PCA are employed in a mineralized watershed (Lion Creek, Colorado, USA) under low flow conditions to (i) quantify the impacts of mining activity on stream water quality; (ii) quantify the spatial pattern of constituent loading; and (iii) identify inflow sources most responsible for observed changes in stream chemistry and constituent loading. Several of the constituents investigated (Al, Cd, Cu, Fe, Mn, Zn) fail to meet chronic aquatic life standards along most of the study reach. The spatial pattern of constituent loading suggests four primary sources of contamination under low flow conditions. Three of these sources are associated with acidic (pH <3.1) seeps that enter along the left bank of Lion Creek. Investigation of inflow water (trace metal and major ion) chemistry using PCA suggests a hydraulic connection between many of the left bank inflows and mine water in the Minnesota Mine shaft located to the north-east of the river channel. In addition, water chemistry data during a rainfall-runoff event suggests the spatial pattern of constituent loading may be modified during rainfall due to dissolution of efflorescent salts or erosion of streamside tailings. These data point to the complexity of contaminant mobilisation processes and constituent loading in mining-affected watersheds but the combined synoptic sampling and PCA approach enables a conceptual model of contaminant dynamics to be developed to inform remediation.

  5. Magnetic signature of overbank sediment in industry impacted floodplains identified by data mining methods

    NASA Astrophysics Data System (ADS)

    Chudaničová, Monika; Hutchinson, Simon M.

    2016-11-01

    Our study attempts to identify a characteristic magnetic signature of overbank sediments exhibiting anthropogenically induced magnetic enhancement and thereby to distinguish them from unenhanced sediments with weak magnetic background values, using a novel approach based on data mining methods, thus providing a means of rapid pollution determination. Data were obtained from 539 bulk samples from vertical profiles through overbank sediment, collected on seven rivers in the eastern Czech Republic and three rivers in northwest England. k-Means clustering and hierarchical clustering methods, paired group (UPGMA) and Ward's method, were used to divide the samples into natural groups according to their attributes. Interparametric ratios: SIRM/χ; SIRM/ARM; and S-0.1T were chosen as attributes for analyses, making the resultant model more widely applicable as magnetic concentration values can differ by two orders of magnitude. Division into three clusters appeared to be optimal and corresponded to inherent clusters in the data scatter. Clustering managed to separate samples with relatively weak anthropogenically induced enhancement, relatively strong anthropogenically induced enhancement and samples lacking enhancement. To describe the clusters explicitly and thus obtain a discrete magnetic signature, classification rules (JRip method) and decision trees (J4.8 and Simple Cart methods) were used. Samples lacking anthropogenic enhancement typically exhibited an S-0.1T < c. 0.5, SIRM/ARM < c. 150 and SIRM/χ < c. 6000 A m⁻¹. Samples with magnetic enhancement all exhibited an S-0.1T > 0.5. Samples with relatively stronger anthropogenic enhancement were unequivocally distinguished from the samples with weaker enhancement by an SIRM/ARM > c. 150. Samples with SIRM/ARM in a range c. 126-150 were classified as relatively strongly enhanced when their SIRM/χ > 18 000 A m⁻¹ and relatively less enhanced when their SIRM/χ < 18 000 A m⁻¹. An additional rule was arbitrarily added to exclude samples with
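
    The threshold rules quoted in this abstract can be read as a small decision procedure, sketched below. The thresholds are given as approximate ("c.") in the source, so the handling of boundary values is an assumption, as is treating enhanced samples with SIRM/ARM below about 126 as weakly enhanced. The additional exclusion rule is cut off in this record and is left out. The function name and labels are hypothetical.

```python
def classify_sample(s_ratio, sirm_arm, sirm_chi):
    """Assign a sample to one of the three clusters described above.

    s_ratio:  the S-0.1T ratio (dimensionless)
    sirm_arm: the SIRM/ARM ratio (dimensionless)
    sirm_chi: the SIRM/chi ratio in A/m
    """
    # Enhanced samples all showed S-0.1T above roughly 0.5.
    if s_ratio <= 0.5:
        return "unenhanced"
    # SIRM/ARM above roughly 150 marks the stronger enhancement.
    if sirm_arm > 150:
        return "strongly enhanced"
    # In the 126-150 band, SIRM/chi decides between the two classes.
    if sirm_arm >= 126:
        return "strongly enhanced" if sirm_chi > 18000 else "weakly enhanced"
    # Assumption: enhanced samples below the band count as weakly enhanced.
    return "weakly enhanced"
```

    Because the rules use ratios rather than concentration values, the same function could in principle be applied to samples from rivers with very different magnetic concentrations, which is the motivation the abstract gives for choosing these attributes.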

  6. Building a glaucoma interaction network using a text mining approach.

    PubMed

    Soliman, Maha; Nasraoui, Olfa; Cooper, Nigel G F

    2016-01-01

    The volume of biomedical literature and its underlying knowledge base is rapidly expanding, making it beyond the ability of a single human being to read through all the literature. Several automated methods have been developed to help make sense of this dilemma. The present study reports on the results of a text mining approach to extract gene interactions from the data warehouse of published experimental results which are then used to benchmark an interaction network associated with glaucoma. To the best of our knowledge, there is, as yet, no glaucoma interaction network derived solely from text mining approaches. The presence of such a network could provide a useful summative knowledge base to complement other forms of clinical information related to this disease. A glaucoma corpus was constructed from PubMed Central and a text mining approach was applied to extract genes and their relations from this corpus. The extracted relations between genes were checked using reference interaction databases and classified generally as known or new relations. The extracted genes and relations were then used to construct a glaucoma interaction network. Analysis of the resulting network indicated that it bears the characteristics of a small world interaction network. Our analysis showed the presence of seven glaucoma linked genes that defined the network modularity. A web-based system for browsing and visualizing the extracted glaucoma related interaction networks is made available at http://neurogene.spd.louisville.edu/GlaucomaINViewer/Form1.aspx. This study has reported the first version of a glaucoma interaction network using a text mining approach. The power of such an approach is in its ability to cover a wide range of glaucoma related studies published over many years. Hence, a bigger picture of the disease can be established. To the best of our knowledge, this is the first glaucoma interaction network to summarize the known literature. 
    The major findings were a set of

  7. Clustering-based approaches to SAGE data mining

    PubMed Central

    Wang, Haiying; Zheng, Huiru; Azuaje, Francisco

    2008-01-01

    Serial analysis of gene expression (SAGE) is one of the most powerful tools for global gene expression profiling. It has led to several biological discoveries and biomedical applications, such as the prediction of new gene functions and the identification of biomarkers in human cancer research. Clustering techniques have become fundamental approaches in these applications. This paper reviews relevant clustering techniques specifically designed for this type of data. It places an emphasis on current limitations and opportunities in this area for supporting biologically-meaningful data mining and visualisation. PMID:18822151

  8. Efflorescent sulfates from Baia Sprie mining area (Romania)--Acid mine drainage and climatological approach.

    PubMed

    Buzatu, Andrei; Dill, Harald G; Buzgar, Nicolae; Damian, Gheorghe; Maftei, Andreea Elena; Apopei, Andrei Ionuț

    2016-01-15

    The Baia Sprie epithermal system, a deposit well known for its impressive mineralogical associations, shows the proper conditions for acid mine drainage and can be considered a general example for affected mining areas around the globe. Efflorescent samples from the abandoned open pit Minei Hill have been analyzed by X-ray diffraction (XRD), scanning electron microscopy (SEM), Raman and near-infrared (NIR) spectrometry. The identified phases represent mostly iron sulfates with different hydration degrees (szomolnokite, rozenite, melanterite, coquimbite, ferricopiapite), and Zn and Al sulfates (gunningite, alunogen, halotrichite). The samples were heated at different temperatures in order to establish the phase transformations among the studied sulfates. The dehydration temperatures and intermediate phases upon decomposition were successfully identified for each of the mineral phases. Gunningite was the only sulfate that showed no transformations during the heating experiment. All the other sulfates started to dehydrate within the 30-90 °C temperature range. Acid mine drainage is the main cause of sulfate formation, triggered by pyrite oxidation as the major source of the abundant iron sulfates. Based on the dehydration temperatures, the climatological interpretation indicated that melanterite formation and long-term presence are related to continental and temperate climates. Coquimbite and rozenite are also attributed to dry arid/semi-arid areas, in addition to the above-mentioned ones. The more stable sulfates, alunogen, halotrichite, szomolnokite, ferricopiapite and gunningite, can form and persist in all climate regimes, from dry continental to even tropical humid. Copyright © 2015 Elsevier B.V. All rights reserved.

  9. Using cloud association rule data mining approach in optical networks

    NASA Astrophysics Data System (ADS)

    Ma, Bin

    2007-11-01

    In current DWDM networks, one of the critical design issues in network utilization is careful planning to minimize burst dropping resulting from resource contention. Suitable planning before metadata are sent is critical to improving the rate of successful transmission. In this paper, we adopt a novel data mining approach to determine a suitable routing path in the OBS network. Instead of using label switching techniques in DWDM, we propose hybrid OBS routing planning on the basis of the Cloud Association Rules Algorithm, thus reducing the transmission collision rate in OBS routing. This paper searches for the optimal routing path among all possible routing paths using a cloud association rule approach with the Apriori-gen algorithm, based on the PACNet topology. The heuristic rules discovered by the Apriori-gen algorithm are stored in the Knowledge Base (KB) as references for determining the most suitable routing path. The Knowledge Base of routing paths is set up by means of optimal path routing with the highest success rate, which is mined from the database of historical routing paths using cloud association rules. The experimental results show that the routing paths obtained by the proposed routing planning approach can effectively improve the success rate of transmission.
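
    For readers unfamiliar with the Apriori-gen step named in this abstract, the Python sketch below shows the classical candidate-generation procedure (join, then prune) on frequent itemsets. It covers only that textbook step: the cloud-model part of the paper's method and its routing data are not reproduced, and the link names in the example are hypothetical.

```python
from itertools import combinations


def apriori_gen(frequent_prev):
    """Build candidate k-itemsets from frequent (k-1)-itemsets.

    Join: unite two frequent (k-1)-itemsets whose union has exactly k items.
    Prune: keep a candidate only if every (k-1)-subset is itself frequent.
    """
    prev = {frozenset(s) for s in frequent_prev}
    if not prev:
        return set()
    k = len(next(iter(prev))) + 1
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) != k:
                continue
            if all(frozenset(sub) in prev for sub in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


if __name__ == "__main__":
    # Hypothetical link identifiers standing in for items in routing records.
    frequent_pairs = [{"link1", "link2"}, {"link1", "link3"}, {"link2", "link3"}]
    print(apriori_gen(frequent_pairs))
```

    Pruning before counting support keeps the number of candidates that must be checked against the historical database small, which is the usual reason for using this step.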

  10. WHAT INNOVATIVE APPROACHES CAN BE DEVELOPED FOR MINING SITES?

    EPA Science Inventory

    Mining is essential to maintain our way of life. However, based upon industry's reporting in the most recent Toxic Release Inventory (TRI), the primary sources of heavy metal releases to the environment are mining and mining related activities. The hard rock mining industry rel...

  12. A Practical Approach for Content Mining of Tweets

    PubMed Central

    Yoon, Sunmoo; Elhadad, Noémie; Bakken, Suzanne

    2013-01-01

    Use of data generated through social media for health studies is gradually increasing. Twitter is a short-text message system developed 6 years ago, now with more than 100 million users generating over 300 million Tweets every day. Twitter may be used to gain real-world insights to promote healthy behaviors. The purposes of this paper are to describe a practical approach to analyzing Tweet contents and to illustrate an application of the approach to the topic of physical activity. The approach includes five steps: (1) selecting keywords to gather an initial set of Tweets to analyze; (2) importing data; (3) preparing data; (4) analyzing data (topic, sentiment, and ecologic context); and (5) interpreting data. The steps are implemented using tools that are publicly available and free of charge and designed for use by researchers with limited programming skills. Content mining of Tweets can contribute to addressing challenges in health behavior research. PMID:23790998

  13. A practical approach for content mining of Tweets.

    PubMed

    Yoon, Sunmoo; Elhadad, Noémie; Bakken, Suzanne

    2013-07-01

    Use of data generated through social media for health studies is gradually increasing. Twitter is a short-text message system developed 6 years ago, now with more than 100 million users generating over 300 million Tweets every day. Twitter may be used to gain real-world insights to promote healthy behaviors. The purposes of this paper are to describe a practical approach to analyzing Tweet contents and to illustrate an application of the approach to the topic of physical activity. The approach includes five steps: (1) selecting keywords to gather an initial set of Tweets to analyze; (2) importing data; (3) preparing data; (4) analyzing data (topic, sentiment, and ecologic context); and (5) interpreting data. The steps are implemented using tools that are publicly available and free of charge and designed for use by researchers with limited programming skills. Content mining of Tweets can contribute to addressing challenges in health behavior research.

  14. Approaches to Post-Mining Land Reclamation in Polish Open-Cast Lignite Mining

    NASA Astrophysics Data System (ADS)

    Kasztelewicz, Zbigniew

    2014-06-01

    The paper presents the state of post-mining land reclamation at particular lignite mines in Poland up to 2012, against the background of opencast mining as a whole. It discusses the process of land purchase for mining operations and the sale of land after reclamation. It presents the achievements of mines in the reclamation and regeneration of post-mining land, as a result of which, after development processes carried out according to European standards, the land now serves the inhabitants as a recreational area that increases the attractiveness of the regions.

  15. Different medical data mining approaches based prediction of ischemic stroke.

    PubMed

    Arslan, Ahmet Kadir; Colak, Cemil; Sarihan, Mehmet Ediz

    2016-07-01

    Medical data mining (also called the knowledge discovery process in medicine) comprises processes for extracting patterns from large datasets. In the current study, we intend to assess different medical data mining approaches to predict ischemic stroke. The collected dataset from Turgut Ozal Medical Centre, Inonu University, Malatya, Turkey, comprised the medical records of 80 patients and 112 healthy individuals with 17 predictors and a target variable. As data mining approaches, support vector machine (SVM), stochastic gradient boosting (SGB) and penalized logistic regression (PLR) were employed. The 10-fold cross validation resampling method was utilized, and model performance evaluation metrics were accuracy, area under ROC curve (AUC), sensitivity, specificity, positive predictive value and negative predictive value. The grid search method was used for optimizing tuning parameters of the models. The accuracy values with 95% CI were 0.9789 (0.9470-0.9942) for SVM, 0.9737 (0.9397-0.9914) for SGB and 0.8947 (0.8421-0.9345) for PLR. The AUC values with 95% CI were 0.9783 (0.9569-0.9997) for SVM, 0.9757 (0.9543-0.9970) for SGB and 0.8953 (0.8510-0.9396) for PLR. The results of the current study demonstrated that the SVM produced the best predictive performance compared to the other models according to the majority of evaluation metrics. SVM and SGB models explained in the current study could yield remarkable predictive performance in the classification of ischemic stroke. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
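
    Most of the evaluation metrics listed in this abstract follow directly from a binary confusion matrix. The Python sketch below shows the standard definitions; the function name and the counts in the example are hypothetical and are not the study's results. AUC is left out because it needs ranked scores rather than counts.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,      # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),      # true positive rate
        "specificity": tn / (tn + fp),      # true negative rate
        "ppv": tp / (tp + fp),              # positive predictive value
        "npv": tn / (tn + fn),              # negative predictive value
    }


if __name__ == "__main__":
    # Made-up counts for illustration only.
    for name, value in classification_metrics(tp=8, fp=1, tn=9, fn=2).items():
        print(f"{name}: {value:.4f}")
```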

  16. Identifying and Describing a Seismogenic Zone in a Sublevel Caving Mine

    NASA Astrophysics Data System (ADS)

    Abolfazlzadeh, Yousef; Hudyma, Marty

    2016-09-01

    Analysis of caving-induced seismicity can aid in the understanding of rock mass behaviour in the different stages of the caving process. A detailed analysis of caving-induced seismicity at the Telfer sublevel caving mine was undertaken. Interpretation of seismic data in the Telfer mine showed the influence of the major geological features on cave behaviour and helped to identify the phases of cave evolution. Two geological zones with unique seismic characteristics (the M50 and M30 stiff reefs) and four key caving phases (initial undercut blasting, cave initiation, cave propagation and breakthrough) were defined through seismic data analysis. Movement of the seismogenic zone was significantly affected by the stiff reefs within the cave column. Seismic source parameter analysis was used to investigate caving mechanisms at Telfer.

  17. An efficacy driven approach for medication recommendation in type 2 diabetes treatment using data mining techniques.

    PubMed

    Liu, Haifeng; Xie, Guotong; Mei, Jing; Shen, Weijia; Sun, Wen; Li, Xiang

    2013-01-01

    We demonstrate how data mining techniques can help recommend effective medications when physicians need to control the glucose level of patients with type 2 diabetes. We first identify the factors that may affect physicians' medication decisions and then develop a patient-similarity based approach to automatically recommend medications for a patient with the specific condition so that his blood glucose level (measured by HbA1C value) can be well controlled. The approach is validated through experiments on real data sets and compared with the recommendations by following a clinical guideline.

  18. An Integrated Assessment Approach to Address Artisanal and Small-Scale Gold Mining in Ghana

    PubMed Central

    Basu, Niladri; Renne, Elisha P.; Long, Rachel N.

    2015-01-01

    Artisanal and small-scale gold mining (ASGM) is growing in many regions of the world including Ghana. The problems in these communities are complex and multi-faceted. To help increase understanding of such problems, and to enable consensus-building and effective translation of scientific findings to stakeholders, help inform policies, and ultimately improve decision making, we utilized an Integrated Assessment approach to study artisanal and small-scale gold mining activities in Ghana. Though Integrated Assessments have been used in the fields of environmental science and sustainable development, their use in addressing specific matter in public health, and in particular, environmental and occupational health is quite limited despite their many benefits. The aim of the current paper was to describe specific activities undertaken and how they were organized, and the outputs and outcomes of our activity. In brief, three disciplinary workgroups (Natural Sciences, Human Health, Social Sciences and Economics) were formed, with 26 researchers from a range of Ghanaian institutions plus international experts. The workgroups conducted activities in order to address the following question: What are the causes, consequences and correctives of small-scale gold mining in Ghana? More specifically: What alternatives are available in resource-limited settings in Ghana that allow for gold-mining to occur in a manner that maintains ecological health and human health without hindering near- and long-term economic prosperity? Several response options were identified and evaluated, and are currently being disseminated to various stakeholders within Ghana and internationally. PMID:26393627

  19. An Integrated Assessment Approach to Address Artisanal and Small-Scale Gold Mining in Ghana.

    PubMed

    Basu, Niladri; Renne, Elisha P; Long, Rachel N

    2015-09-17

    Artisanal and small-scale gold mining (ASGM) is growing in many regions of the world including Ghana. The problems in these communities are complex and multi-faceted. To help increase understanding of such problems, and to enable consensus-building and effective translation of scientific findings to stakeholders, help inform policies, and ultimately improve decision making, we utilized an Integrated Assessment approach to study artisanal and small-scale gold mining activities in Ghana. Though Integrated Assessments have been used in the fields of environmental science and sustainable development, their use in addressing specific matter in public health, and in particular, environmental and occupational health is quite limited despite their many benefits. The aim of the current paper was to describe specific activities undertaken and how they were organized, and the outputs and outcomes of our activity. In brief, three disciplinary workgroups (Natural Sciences, Human Health, Social Sciences and Economics) were formed, with 26 researchers from a range of Ghanaian institutions plus international experts. The workgroups conducted activities in order to address the following question: What are the causes, consequences and correctives of small-scale gold mining in Ghana? More specifically: What alternatives are available in resource-limited settings in Ghana that allow for gold-mining to occur in a manner that maintains ecological health and human health without hindering near- and long-term economic prosperity? Several response options were identified and evaluated, and are currently being disseminated to various stakeholders within Ghana and internationally.

  20. Development and application of the Safe Performance Index as a risk-based methodology for identifying major hazard-related safety issues in underground coal mines

    NASA Astrophysics Data System (ADS)

    Kinilakodi, Harisha

    The underground coal mining industry has been under constant watch due to the high risk involved in its activities, and scrutiny increased because of the disasters that occurred in 2006-07. In the aftermath of the incidents, the U.S. Congress passed the Mine Improvement and New Emergency Response Act of 2006 (MINER Act), which strengthened the existing regulations and mandated new laws to address the various issues related to a safe working environment in the mines. Risk analysis in any form should be done on a regular basis to tackle the possibility of unwanted major hazard-related events such as explosions, outbursts, airbursts, inundations, spontaneous combustion, and roof fall instabilities. One of the responses by the Mine Safety and Health Administration (MSHA) in 2007 involved a new pattern of violations (POV) process to target mines with a poor safety performance, specifically to improve their safety. However, the 2010 disaster (worst in 40 years) gave an impression that the collective effort of the industry, federal/state agencies, and researchers to achieve the goal of zero fatalities and serious injuries has gone awry. The Safe Performance Index (SPI) methodology developed in this research is a straight-forward, effective, transparent, and reproducible approach that can help in identifying and addressing some of the existing issues while targeting (poor safety performance) mines which need help. It combines three injury and three citation measures that are scaled to have an equal mean (5.0) in a balanced way with proportionate weighting factors (0.05, 0.15, 0.30) and overall normalizing factor (15) into a mine safety performance evaluation tool. It can be used to assess the relative safety-related risk of mines, including by mine-size category. Using 2008 and 2009 data, comparisons were made of SPI-associated, normalized safety performance measures across mine-size categories, with emphasis on small-mine safety performance as compared to large- and

  1. Identifying Understudied Nuclear Reactions by Text-mining the EXFOR Experimental Nuclear Reaction Library

    SciTech Connect

    Hirdt, J.A.; Brown, D.A.

    2016-01-15

    The EXFOR library contains the largest collection of experimental nuclear reaction data available as well as the data's bibliographic information and experimental details. We text-mined the REACTION and MONITOR fields of the ENTRYs in the EXFOR library in order to identify understudied reactions and quantities. Using the results of the text-mining, we created an undirected graph from the EXFOR datasets with each graph node representing a single reaction and quantity and graph links representing the various types of connections between these reactions and quantities. This graph is an abstract representation of the connections in EXFOR, similar to graphs of social networks, authorship networks, etc. We use various graph theoretical tools to identify important yet understudied reactions and quantities in EXFOR. Although we identified a few cross sections relevant for shielding applications and isotope production, mostly we identified charged particle fluence monitor cross sections. As a side effect of this work, we learn that our abstract graph is typical of other real-world graphs.
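
    The graph construction described in this abstract can be illustrated with a small adjacency structure. The Python sketch below builds an undirected graph from pairs of connected nodes and ranks nodes by degree. The node labels are hypothetical, and degree is used only as one simple example of a graph-theoretical measure; the abstract does not say which measures the authors applied.

```python
from collections import defaultdict


def build_graph(links):
    """Build an undirected graph as a dict mapping each node to its neighbours."""
    adjacency = defaultdict(set)
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def degree_ranking(adjacency):
    """Return (node, degree) pairs, highest degree first, ties broken by name."""
    pairs = ((node, len(neighbours)) for node, neighbours in adjacency.items())
    return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Hypothetical reaction/quantity labels and connections.
    links = [("rxn_A", "rxn_B"), ("rxn_A", "rxn_C"), ("rxn_B", "rxn_C"), ("rxn_A", "rxn_D")]
    for node, degree in degree_ranking(build_graph(links)):
        print(node, degree)
```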

  2. Identifying Understudied Nuclear Reactions by Text-mining the EXFOR Experimental Nuclear Reaction Library

    NASA Astrophysics Data System (ADS)

    Hirdt, J. A.; Brown, D. A.

    2016-01-01

    The EXFOR library contains the largest collection of experimental nuclear reaction data available as well as the data's bibliographic information and experimental details. We text-mined the REACTION and MONITOR fields of the ENTRYs in the EXFOR library in order to identify understudied reactions and quantities. Using the results of the text-mining, we created an undirected graph from the EXFOR datasets with each graph node representing a single reaction and quantity and graph links representing the various types of connections between these reactions and quantities. This graph is an abstract representation of the connections in EXFOR, similar to graphs of social networks, authorship networks, etc. We use various graph theoretical tools to identify important yet understudied reactions and quantities in EXFOR. Although we identified a few cross sections relevant for shielding applications and isotope production, mostly we identified charged particle fluence monitor cross sections. As a side effect of this work, we learn that our abstract graph is typical of other real-world graphs.

  3. Identifying MMORPG Bots: A Traffic Analysis Approach

    NASA Astrophysics Data System (ADS)

    Chen, Kuan-Ta; Jiang, Jhih-Wei; Huang, Polly; Chu, Hao-Hua; Lei, Chin-Laung; Chen, Wen-Chin

    2008-12-01

    Massively multiplayer online role playing games (MMORPGs) have become extremely popular among network gamers. Despite their success, one of MMORPG's greatest challenges is the increasing use of game bots, that is, autoplaying game clients. The use of game bots is considered unsportsmanlike and is therefore forbidden. To keep games in order, game police, played by actual human players, often patrol game zones and question suspicious players. This practice, however, is labor-intensive and ineffective. To address this problem, we analyze the traffic generated by human players versus game bots and propose general solutions to identify game bots. Taking Ragnarok Online as our subject, we study the traffic generated by human players and game bots. We find that their traffic is distinguishable by 1) the regularity in the release time of client commands, 2) the trend and magnitude of traffic burstiness in multiple time scales, and 3) the sensitivity to different network conditions. Based on these findings, we propose four strategies and two ensemble schemes to identify bots. Finally, we discuss the robustness of the proposed methods against countermeasures of bot developers, and consider a number of possible ways to manage the increasingly serious bot problem.
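
    One way to picture the first cue in this abstract, the regularity of command release times, is the coefficient of variation of the gaps between consecutive commands. The Python sketch below is an assumption-laden illustration and not the authors' detection scheme: the function name, the choice of statistic, and the timestamps are all hypothetical.

```python
from statistics import mean, pstdev


def interval_cv(timestamps):
    """Coefficient of variation of the gaps between consecutive timestamps.

    Values near zero mean almost perfectly periodic commands; larger values
    mean more irregular timing.
    """
    intervals = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    if len(intervals) < 2:
        raise ValueError("need at least three timestamps")
    average = mean(intervals)
    if average <= 0:
        raise ValueError("timestamps must be increasing")
    return pstdev(intervals) / average


if __name__ == "__main__":
    # Hypothetical command times in seconds.
    periodic_client = [0.0, 1.0, 2.0, 3.0, 4.0]
    irregular_client = [0.0, 0.5, 2.5, 3.0, 7.0]
    print(interval_cv(periodic_client), interval_cv(irregular_client))
```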

  4. Evolutionary Data Mining Approach to Creating Digital Logic

    DTIC Science & Technology

    2010-01-01

    A data mining based procedure for automated reverse engineering has been developed. The data mining algorithm for reverse engineering uses a genetic...program (GP) as a data mining function. A genetic program is an algorithm based on the theory of evolution that automatically evolves populations of...based data mining is then conducted. This procedure incorporates not only the experts' rules into the fitness function, but also the information in the

  5. A Data Mining approach for building cost-sensitive and light intrusion detection models

    DTIC Science & Technology

    2004-03-01

    AFRL-IF-RS-TR-2004-84 Final Technical Report March 2004 A DATA MINING APPROACH FOR BUILDING COST-SENSITIVE AND LIGHT INTRUSION...4. TITLE AND SUBTITLE A DATA MINING APPROACH FOR BUILDING COST-SENSITIVE AND LIGHT INTRUSION DETECTION MODELS 6. AUTHOR(S) Wenke Lee...NUMBER OF PAGES 31 14. SUBJECT TERMS Intrusion Detection System, IDS, Data Mining, Anomaly Detection Algorithms, Alert Correlation, Light-Weight

  6. Identifying Catchment-Scale Predictors of Coal Mining Impacts on New Zealand Stream Communities.

    PubMed

    Clapcott, Joanne E; Goodwin, Eric O; Harding, Jon S

    2016-03-01

    Coal mining activities can have severe and long-term impacts on freshwater ecosystems. At the individual stream scale, these impacts have been well studied; however, few attempts have been made to determine the predictors of mine impacts at a regional scale. We investigated whether catchment-scale measures of mining impacts could be used to predict biological responses. We collated data from multiple studies and analyzed algae, benthic invertebrate, and fish community data from 186 stream sites, including un-mined streams, and those associated with 620 mines on the West Coast of the South Island, New Zealand. Algal, invertebrate, and fish richness responded to mine impacts and were significantly higher in un-mined compared to mine-impacted streams. Changes in community composition toward more acid- and metal-tolerant species were evident for algae and invertebrates, whereas changes in fish communities were significant and driven by a loss of nonmigratory native species. Consistent catchment-scale predictors of mining activities affecting biota included the time post mining (years), mining density (the number of mines upstream per catchment area), and mining intensity (tons of coal production per catchment area). Mining was associated with a decline in stream biodiversity irrespective of catchment size, and recovery was not evident until at least 30 years after mining activities have ceased. These catchment-scale predictors can provide managers and regulators with practical metrics to focus on management and remediation decisions.
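
    Two of the catchment-scale predictors named in this abstract are simple ratios, as defined in its parentheses. The Python sketch below computes them. The function names, the choice of square kilometres as the area unit, and the example figures are assumptions made for illustration.

```python
def mining_density(n_mines_upstream: int, catchment_area_km2: float) -> float:
    """Number of mines upstream per unit catchment area (mines per km2)."""
    if catchment_area_km2 <= 0:
        raise ValueError("catchment area must be positive")
    return n_mines_upstream / catchment_area_km2


def mining_intensity(coal_tons: float, catchment_area_km2: float) -> float:
    """Coal production per unit catchment area (tons per km2)."""
    if catchment_area_km2 <= 0:
        raise ValueError("catchment area must be positive")
    return coal_tons / catchment_area_km2


if __name__ == "__main__":
    # Hypothetical catchment: 4 upstream mines, 50,000 tons of coal, 20 km2.
    print(mining_density(4, 20.0), mining_intensity(50_000, 20.0))
```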

  7. Identifying Catchment-Scale Predictors of Coal Mining Impacts on New Zealand Stream Communities

    NASA Astrophysics Data System (ADS)

    Clapcott, Joanne E.; Goodwin, Eric O.; Harding, Jon S.

    2016-03-01

    Coal mining activities can have severe and long-term impacts on freshwater ecosystems. At the individual stream scale, these impacts have been well studied; however, few attempts have been made to determine the predictors of mine impacts at a regional scale. We investigated whether catchment-scale measures of mining impacts could be used to predict biological responses. We collated data from multiple studies and analyzed algae, benthic invertebrate, and fish community data from 186 stream sites, including un-mined streams, and those associated with 620 mines on the West Coast of the South Island, New Zealand. Algal, invertebrate, and fish richness responded to mine impacts and were significantly higher in un-mined compared to mine-impacted streams. Changes in community composition toward more acid- and metal-tolerant species were evident for algae and invertebrates, whereas changes in fish communities were significant and driven by a loss of nonmigratory native species. Consistent catchment-scale predictors of mining activities affecting biota included the time post mining (years), mining density (the number of mines upstream per catchment area), and mining intensity (tons of coal production per catchment area). Mining was associated with a decline in stream biodiversity irrespective of catchment size, and recovery was not evident until at least 30 years after mining activities have ceased. These catchment-scale predictors can provide managers and regulators with practical metrics to focus on management and remediation decisions.

  8. Using machine vision and data mining techniques to identify cell properties via microfluidic flow analysis

    NASA Astrophysics Data System (ADS)

    Horowitz, Geoffrey; Bowie, Samuel; Liu, Anna; Stone, Nicholas; Sulchek, Todd; Alexeev, Alexander

    2016-11-01

    In order to quickly identify the wide range of mechanistic properties that are seen in cell populations, a coupled machine vision and data mining analysis is developed to examine high speed videos of cells flowing through a microfluidic device. The microfluidic device contains a microchannel decorated with a periodic array of diagonal ridges. The ridges compress flowing cells, which results in complex cell trajectories and induces cross-channel drift; both depend on the cells' intrinsic mechanical properties, which can be used to characterize specific cell lines. Thus, the cell trajectory analysis can yield a parameter set that can serve as a unique identifier of a cell's membership in a specific cell population. By using the correlations between the cell populations and measured cell trajectories in the ridged microchannel, mechanical properties of individual cells and their specific populations can be identified using only information captured by video analysis. Financial support provided by National Science Foundation (NSF) Grant No. CMMI 1538161.

  9. A data-mining approach to predict influent quality.

    PubMed

    Kusiak, Andrew; Verma, Anoop; Wei, Xiupeng

    2013-03-01

    In wastewater treatment plants, predicting influent water quality is important for energy management. The influent water quality is measured by metrics such as carbonaceous biochemical oxygen demand (CBOD), potential of hydrogen (pH), and total suspended solids. In this paper, a data-driven approach for time-ahead prediction of CBOD is presented. Due to limitations in the industrial data acquisition system, CBOD is not recorded at regular time intervals, which causes gaps in the time-series data. Numerous experiments have been performed to approximate the functional relationship between the input and output parameters and thereby fill in the missing CBOD data. Models incorporating seasonality effects are investigated. Four data-mining algorithms (multilayered perceptron, classification and regression tree, multivariate adaptive regression spline, and random forest) are employed to construct prediction models with a maximum prediction horizon of 5 days.

  10. A lazy data mining approach for protein classification.

    PubMed

    Merschmann, Luiz; Plastino, Alexandre

    2007-03-01

    In this work, we propose a new computational technique to solve the protein classification problem. The goal is to predict the functional family of novel protein sequences based on their motif composition. In order to improve the results obtained with other known approaches, we propose a new data mining technique for protein classification based on Bayes' theorem, called highest subset probability (HiSP). To evaluate our proposal, datasets extracted from Prosite, a curated protein family database, are used as experimental datasets. The computational results have shown that the proposed method outperforms other known methods for all tested datasets and looks very promising for problems with characteristics similar to the problem addressed here. In addition, our experiments suggest that HiSP performs well on highly imbalanced datasets.

  11. An efficient, versatile and scalable pattern growth approach to mine frequent patterns in unaligned protein sequences.

    PubMed

    Ye, Kai; Kosters, Walter A; Ijzerman, Adriaan P

    2007-03-15

    Pattern discovery in protein sequences is often based on multiple sequence alignments (MSA). The procedure can be computationally intensive and often requires manual adjustment, which may be particularly difficult for a set of deviating sequences. In contrast, two algorithms, PRATT2 (http://www.ebi.ac.uk/pratt/) and TEIRESIAS (http://cbcsrv.watson.ibm.com/) are used to directly identify frequent patterns from unaligned biological sequences without an attempt to align them. Here we propose a new algorithm with more efficiency and more functionality than both PRATT2 and TEIRESIAS, and discuss some of its applications to G protein-coupled receptors, a protein family of important drug targets. In this study, we designed and implemented six algorithms to mine three different pattern types from either one or two datasets using a pattern growth approach. We compared our approach to PRATT2 and TEIRESIAS in efficiency, completeness and the diversity of pattern types. Compared to PRATT2, our approach is faster, capable of processing large datasets and able to identify the so-called type III patterns. Our approach is comparable to TEIRESIAS in the discovery of the so-called type I patterns but has additional functionality such as mining the so-called type II and type III patterns and finding discriminating patterns between two datasets. The source code for pattern growth algorithms and their pseudo-code are available at http://www.liacs.nl/home/kosters/pg/.

  12. Using Frequent Item Set Mining and Feature Selection Methods to Identify Interacted Risk Factors - The Atrial Fibrillation Case Study.

    PubMed

    Li, Xiang; Liu, Haifeng; Du, Xin; Hu, Gang; Xie, Guotong; Zhang, Ping

    2016-01-01

    Disease risk prediction is highly important for early intervention and treatment, and identification of predictive risk factors is the key point to achieve accurate prediction. In addition to original independent features in a dataset, some interacted features, such as comorbidities and combination therapies, may have non-additive influence on the disease outcome and can also be used in risk prediction to improve the prediction performance. However, it is usually difficult to manually identify the possible interacted risk factors due to the combinatorial explosion of features. In this paper, we propose an automatic approach to identify predictive risk factors with interactions using frequent item set mining and feature selection methods. The proposed approach was applied in the real world case study of predicting ischemic stroke and thromboembolism (TE) for atrial fibrillation (AF) patients on the Chinese atrial fibrillation registry dataset, and the results show that our approach can not only improve the prediction performance, but also identify the comorbidities and combination therapies that have potential influences on TE occurrence for AF.
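
    The frequent item set step described above can be illustrated with a minimal Python sketch. The patient records, factor names, and the min_count threshold are hypothetical, and the brute-force enumeration stands in for whatever mining algorithm the authors used, which the abstract does not specify.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_count, max_size=3):
    """Count every item combination up to max_size and keep those
    that occur in at least min_count transactions."""
    counts = Counter()
    for record in transactions:
        items = sorted(set(record))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_count}


# Hypothetical toy records: each set lists the factors present for one patient.
patients = [
    {"hypertension", "diabetes", "warfarin"},
    {"hypertension", "diabetes"},
    {"hypertension", "warfarin"},
    {"diabetes", "heart_failure"},
    {"hypertension", "diabetes", "heart_failure"},
]

result = frequent_itemsets(patients, min_count=3)
for itemset, n in sorted(result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), n)
```

    Item sets of size two or more that pass the threshold are candidate interacted features; in the approach summarized above they would then be passed to feature selection.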

  13. Online discourse on fibromyalgia: text-mining to identify clinical distinction and patient concerns.

    PubMed

    Park, Jungsik; Ryu, Young Uk

    2014-10-07

    The purpose of this study was to evaluate the possibility of using text-mining to identify clinical distinctions and patient concerns in online memoirs posted by patients with fibromyalgia (FM). A total of 399 memoirs were collected from an FM group website. The unstructured data of memoirs associated with FM were collected through a crawling process and converted into structured data with a concordance, parts of speech tagging, and word frequency. We also conducted a lexical analysis and phrase pattern identification. After examining the data, a set of FM-related keywords were obtained and phrase net relationships were set through a web-based visualization tool. The clinical distinction of FM was verified. Pain is the biggest issue to the FM patients. The pains were affecting body parts including 'muscles,' 'leg,' 'neck,' 'back,' 'joints,' and 'shoulders' with accompanying symptoms such as 'spasms,' 'stiffness,' and 'aching,' and were described as 'sever,' 'chronic,' and 'constant.' This study also demonstrated that it was possible to understand the interests and concerns of FM patients through text-mining. FM patients wanted to escape from the pain and symptoms, so they were interested in medical treatment and help. Also, they seemed to have interest in their work and occupation, and hope to continue to live life through the relationships with the people around them. This research shows the potential for extracting keywords to confirm the clinical distinction of a certain disease, and text-mining can help objectively understand the concerns of patients by generalizing their large number of subjective illness experiences. However, it is believed that there are limitations to the processes and methods for organizing and classifying large amounts of text, so these limits have to be considered when analyzing the results. The development of research methodology to overcome these limitations is greatly needed.

  14. Semi-automated literature mining to identify putative biomarkers of disease from multiple biofluids

    PubMed Central

    2014-01-01

    Background: Computational methods for mining of biomedical literature can be useful in augmenting manual searches of the literature using keywords for disease-specific biomarker discovery from biofluids. In this work, we develop and apply a semi-automated literature mining method to mine abstracts obtained from PubMed to discover putative biomarkers of breast and lung cancers in specific biofluids. Methodology: A positive set of abstracts was defined by the terms ‘breast cancer’ and ‘lung cancer’ in conjunction with 14 separate ‘biofluids’ (bile, blood, breastmilk, cerebrospinal fluid, mucus, plasma, saliva, semen, serum, synovial fluid, stool, sweat, tears, and urine), while a negative set of abstracts was defined by the terms ‘(biofluid) NOT breast cancer’ or ‘(biofluid) NOT lung cancer.’ More than 5.3 million total abstracts were obtained from PubMed and examined for biomarker-disease-biofluid associations (34,296 positive and 2,653,396 negative for breast cancer; 28,355 positive and 2,595,034 negative for lung cancer). Biological entities such as genes and proteins were tagged using ABNER, and processed using Python scripts to produce a list of putative biomarkers. Z-scores were calculated, ranked, and used to determine significance of putative biomarkers found. Manual verification of relevant abstracts was performed to assess our method’s performance. Results: Biofluid-specific markers were identified from the literature, assigned relevance scores based on frequency of occurrence, and validated using known biomarker lists and/or databases for lung and breast cancer [NCBI’s On-line Mendelian Inheritance in Man (OMIM), Cancer Gene annotation server for cancer genomics (CAGE), NCBI’s Genes & Disease, NCI’s Early Detection Research Network (EDRN), and others]. The specificity of each marker for a given biofluid was calculated, and the performance of our semi-automated literature mining method assessed for breast and lung cancer

  15. Precursor-centric genome-mining approach for lasso peptide discovery

    PubMed Central

    Maksimov, Mikhail O.; Pelczer, István; Link, A. James

    2012-01-01

    Lasso peptides are a class of ribosomally synthesized posttranslationally modified natural products found in bacteria. Currently known lasso peptides have a diverse set of pharmacologically relevant activities, including inhibition of bacterial growth, receptor antagonism, and enzyme inhibition. The biosynthesis of lasso peptides is specified by a cluster of three genes encoding a precursor protein and two enzymes. Here we develop a unique genome-mining algorithm to identify lasso peptide gene clusters in prokaryotes. Our approach involves pattern matching to a small number of conserved amino acids in precursor proteins, and thus allows for a more global survey of lasso peptide gene clusters than does homology-based genome mining. Of more than 3,000 currently sequenced prokaryotic genomes, we found 76 organisms that are putative lasso peptide producers. These organisms span nine bacterial phyla and an archaeal phylum. To provide validation of the genome-mining method, we focused on a single lasso peptide predicted to be produced by the freshwater bacterium Asticcacaulis excentricus. Heterologous expression of an engineered, minimal gene cluster in Escherichia coli led to the production of a unique lasso peptide, astexin-1. At 23 aa, astexin-1 is the largest lasso peptide isolated to date. It is also highly polar, in contrast to many lasso peptides that are primarily hydrophobic. Astexin-1 has modest antimicrobial activity against its phylogenetic relative Caulobacter crescentus. The solution structure of astexin-1 was determined revealing a unique topology that is stabilized by hydrogen bonding between segments of the peptide. PMID:22949633

  16. Precursor-centric genome-mining approach for lasso peptide discovery.

    PubMed

    Maksimov, Mikhail O; Pelczer, István; Link, A James

    2012-09-18

    Lasso peptides are a class of ribosomally synthesized posttranslationally modified natural products found in bacteria. Currently known lasso peptides have a diverse set of pharmacologically relevant activities, including inhibition of bacterial growth, receptor antagonism, and enzyme inhibition. The biosynthesis of lasso peptides is specified by a cluster of three genes encoding a precursor protein and two enzymes. Here we develop a unique genome-mining algorithm to identify lasso peptide gene clusters in prokaryotes. Our approach involves pattern matching to a small number of conserved amino acids in precursor proteins, and thus allows for a more global survey of lasso peptide gene clusters than does homology-based genome mining. Of more than 3,000 currently sequenced prokaryotic genomes, we found 76 organisms that are putative lasso peptide producers. These organisms span nine bacterial phyla and an archaeal phylum. To provide validation of the genome-mining method, we focused on a single lasso peptide predicted to be produced by the freshwater bacterium Asticcacaulis excentricus. Heterologous expression of an engineered, minimal gene cluster in Escherichia coli led to the production of a unique lasso peptide, astexin-1. At 23 aa, astexin-1 is the largest lasso peptide isolated to date. It is also highly polar, in contrast to many lasso peptides that are primarily hydrophobic. Astexin-1 has modest antimicrobial activity against its phylogenetic relative Caulobacter crescentus. The solution structure of astexin-1 was determined revealing a unique topology that is stabilized by hydrogen bonding between segments of the peptide.

  17. A New Approach in Coal Mine Exploration Using Cosmic Ray Muons

    NASA Astrophysics Data System (ADS)

    Darijani, Reza; Negarestani, Ali; Rezaie, Mohammad Reza; Fatemi, Syed Jalil; Akhond, Ahmad

    2016-08-01

    Muon radiography is a technique that uses cosmic ray muons to image the interior of large scale geological structures. The muon absorption in matter is the most important parameter in cosmic ray muon radiography. Cosmic ray muon radiography is similar to X-ray radiography. The main aim of this survey is the simulation of muon radiography for the exploration of mines. To this end, the production source, tracking, and detection of cosmic ray muons were simulated with the MCNPX code. The input data of the source card in the MCNPX code were extracted from the muon energy spectrum at sea level. In addition, the other input data used in this code, such as the average density and thickness of layers, are measured data from the Pabdana (Kerman, Iran) coal mines. The average thickness and density of these layers in the coal mines are 2 to 4 m and 1.3 g/cm3, respectively. To increase the spatial resolution, a detector was placed inside the mountain. The results indicated that, using this approach, layers with a minimum thickness of about 2.5 m can be identified.

  18. A data-mining approach for multiple structural alignment of proteins.

    PubMed

    Siu, Wing-Yan; Mamoulis, Nikos; Yiu, Siu-Ming; Chan, Ho-Leung

    2010-02-28

    Comparing the 3D structures of proteins is an important but computationally hard problem in bioinformatics. In this paper, we propose studying the problem when much less information or assumptions are available. We model the structural alignment of proteins as a combinatorial problem. In the problem, each protein is simply a set of points in the 3D space, without sequence order information, and the objective is to discover all large enough alignments for any subset of the input. We propose a data-mining approach for this problem. We first perform geometric hashing of the structures such that points with similar locations in the 3D space are hashed into the same bin in the hash table. The novelty is that we consider each bin as a coincidence group and mine for frequent patterns, which is a well-studied technique in data mining. We observe that these frequent patterns are already potentially large alignments. Then a simple heuristic is used to extend the alignments if possible. We implemented the algorithm and tested it using real protein structures. The results were compared with existing tools. They showed that the algorithm is capable of finding conserved substructures that do not preserve sequence order, especially those existing in protein interfaces. The algorithm can also identify conserved substructures of functionally similar structures within a mixture with dissimilar ones. The running time of the program was smaller or comparable to that of the existing tools.
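
    The hashing step described above can be sketched in a few lines of Python. This is a simplified illustration only: it assumes the structures are already expressed in a common coordinate frame, whereas full geometric hashing also handles rotation and translation. The structure names, coordinates, and bin size are hypothetical.

```python
from collections import defaultdict
from math import floor


def hash_points(structures, bin_size):
    """Map each 3D point to a cubic grid bin; return bin -> {(structure, point index)}."""
    table = defaultdict(set)
    for name, points in structures.items():
        for idx, (x, y, z) in enumerate(points):
            key = (floor(x / bin_size), floor(y / bin_size), floor(z / bin_size))
            table[key].add((name, idx))
    return table


def coincidence_groups(table, min_structures=2):
    """Keep bins that contain points from at least min_structures different structures."""
    groups = {}
    for key, members in table.items():
        names = {name for name, _ in members}
        if len(names) >= min_structures:
            groups[key] = names
    return groups


# Hypothetical toy structures (coordinates in arbitrary units).
structures = {
    "A": [(0.1, 0.2, 0.3), (5.2, 5.1, 5.0)],
    "B": [(0.4, 0.9, 0.2), (9.0, 9.0, 9.0)],
}

table = hash_points(structures, bin_size=2.0)
shared = coincidence_groups(table)
print(shared)
```

    Each shared bin plays the role of a coincidence group; treating the set of bins a structure occupies as a transaction is one way to hand the result to a frequent pattern miner, as the abstract outlines.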

  19. A Tools-Based Approach to Teaching Data Mining Methods

    ERIC Educational Resources Information Center

    Jafar, Musa J.

    2010-01-01

    Data mining is an emerging field of study in Information Systems programs. Although the course content has been streamlined, the underlying technology is still in a state of flux. The purpose of this paper is to describe how we utilized Microsoft Excel's data mining add-ins as a front-end to Microsoft's Cloud Computing and SQL Server 2008 Business…

  1. Risk evaluation of uranium mining: A geochemical inverse modelling approach

    NASA Astrophysics Data System (ADS)

    Rillard, J.; Zuddas, P.; Scislewski, A.

    2011-12-01

    It is well known that uranium extraction operations can increase risks linked to radiation exposure. The toxicity of uranium and associated heavy metals is the main environmental concern regarding exploitation and processing of U-ore. In areas where U mining is planned, a careful assessment of toxic and radioactive element concentrations is recommended before the start of mining activities. A background evaluation of harmful elements is important in order to prevent and/or quantify future water contamination resulting from possible migration of toxic metals coming from ore and waste water interaction. Controlled leaching experiments were carried out to investigate processes of ore and waste (leached ore) degradation, using samples from the uranium exploitation site located in Caetité-Bahia, Brazil. In experiments in which the reaction of waste with water was tested, we found that the water had low pH and high levels of sulphates and aluminium. On the other hand, in experiments in which ore was tested, the water had a chemical composition comparable to natural water found in the region of Caetité. On the basis of our experiments, we suggest that waste resulting from sulphuric acid treatment can induce acidification and salinization of surface and ground water. For this reason proper storage of waste is imperative. As a tool to evaluate the risks, a geochemical inverse modelling approach was developed to estimate the water-mineral interaction involving the presence of toxic elements. We used a method earlier described by Scislewski and Zuddas 2010 (Geochim. Cosmochim. Acta 74, 6996-7007) in which the reactive surface area of mineral dissolution can be estimated. We found that the reactive surface area of rock parent minerals is not constant over time but varies by several orders of magnitude in only two months of interaction. We propose that parent mineral heterogeneity and, particularly, neogenic phase formation may explain the observed variation of the

  2. A data mining approach to finding relationships between reservoir properties and oil production for CHOPS

    NASA Astrophysics Data System (ADS)

    Cai, Yongxiang; Wang, Xin; Hu, Kezhen; Dong, Mingzhe

    2014-12-01

    Cold heavy oil production with sand (CHOPS) is a primary oil extraction process for heavy crude oil, and reservoir properties are key factors that contribute to the effectiveness of CHOPS. However, identification of the key reservoir properties and quantification of the relationships between the reservoir properties and the oil production are still challenging tasks. In this paper, we propose the use of a data mining approach for finding quantitative relationships between various reservoir properties and oil production for CHOPS. The approach includes four steps. First, a set of reservoir properties are identified to describe reservoir characteristics through a petrophysical analysis. In addition to common parameters, such as porosity and permeability, two new parameters - a fluid mobility factor and the maximum inscribed rectangular of net pay (MIRNP) - are proposed. Second, three new parameters to describe the production performance of wells are proposed: the peak value, effective life cycle and effective yield. Third, the fuzzy ranking method is used to rank the importance of the identified reservoir properties in terms of oil production. Finally, association rule mining is used to obtain quantitative relationships between reservoir property variables and the production performance of wells. The proposed methods have been applied to 118 wells in the Sparky Formation of the Lloydminster heavy oil field in Alberta. The result shows that the production performance of wells in the area could be described and predicted by using the found quantitative relations.
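
    The association rule step can be illustrated with a short Python sketch that scores one candidate rule by its support, confidence, and lift. The well records and the discretized labels (for example "mobility=high") are hypothetical and are not taken from the study.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_consequent = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    lift = confidence / (n_consequent / n) if n_consequent else 0.0
    return support, confidence, lift


# Hypothetical wells, each described by discretized reservoir and production labels.
wells = [
    {"porosity=high", "mobility=high", "yield=high"},
    {"porosity=high", "mobility=high", "yield=high"},
    {"porosity=high", "mobility=low", "yield=low"},
    {"porosity=low", "mobility=high", "yield=high"},
    {"porosity=low", "mobility=low", "yield=low"},
    {"porosity=low", "mobility=low", "yield=low"},
    {"porosity=high", "mobility=high", "yield=low"},
    {"porosity=low", "mobility=low", "yield=high"},
]

support, confidence, lift = rule_metrics(wells, {"mobility=high"}, {"yield=high"})
print(f"support={support}, confidence={confidence}, lift={lift}")
```

    A lift above 1 means the consequent is more frequent among wells matching the antecedent than among all wells, which is the kind of quantitative relationship the abstract refers to.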

  3. TOXICITY APPROACHES TO ASSESSING MINING IMPACTS AND MINE WASTE TREATMENT EFFECTIVENESS

    EPA Science Inventory

    The USEPA Office of Research and Development's National Exposure Research Laboratory and National Risk Management Research Laboratory have been evaluating the impact of mining sites on receiving streams and the effectiveness of waste treatment technologies in removing toxicity fo...

  5. National Conference on Mining-Influenced Waters: Approaches for Characterization, Source Control and Treatment

    EPA Science Inventory

    The conference goal was to provide a forum for the exchange of scientific information on current and emerging approaches to assessing characterization, monitoring, source control, treatment and/or remediation on mining-influenced waters. The conference was aimed at mining remedi...

  7. Using text mining for study identification in systematic reviews: a systematic review of current approaches.

    PubMed

    O'Mara-Eves, Alison; Thomas, James; McNaught, John; Miwa, Makoto; Ananiadou, Sophia

    2015-01-14

    The large and growing number of published studies, and their increasing rate of publication, make the task of identifying relevant studies in an unbiased way for inclusion in systematic reviews both complex and time consuming. Text mining has been offered as a potential solution: through automating some of the screening process, reviewer time can be saved. The evidence base around the use of text mining for screening has not yet been pulled together systematically; this systematic review fills that research gap. Focusing mainly on non-technical issues, the review aims to increase awareness of the potential of these technologies and promote further collaborative research between the computer science and systematic review communities. Five research questions led our review: what is the state of the evidence base; how has workload reduction been evaluated; what are the purposes of semi-automation and how effective are they; how have key contextual problems of applying text mining to the systematic review field been addressed; and what challenges to implementation have emerged? We answered these questions using standard systematic review methods: systematic and exhaustive searching, quality-assured data extraction and a narrative synthesis to synthesise findings. The evidence base is active and diverse; there is almost no replication between studies or collaboration between research teams and, whilst it is difficult to establish any overall conclusions about best approaches, it is clear that efficiencies and reductions in workload are potentially achievable. On the whole, most suggested that a saving in workload of between 30% and 70% might be possible, though sometimes the saving in workload is accompanied by the loss of 5% of relevant studies (i.e. a 95% recall). Using text mining to prioritise the order in which items are screened should be considered safe and ready for use in 'live' reviews. The use of text mining as a 'second screener' may also be used cautiously
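
    The trade-off between workload saving and recall mentioned above comes down to two ratios, sketched here in Python. The function name and the counts are hypothetical and chosen only to reproduce a 50% saving at 95% recall.

```python
def screening_metrics(total, screened, relevant_total, relevant_found):
    """Return (workload_saving, recall) for a semi-automated screening run."""
    workload_saving = 1 - screened / total      # share of records not screened by hand
    recall = relevant_found / relevant_total    # share of relevant studies retrieved
    return workload_saving, recall


# Hypothetical review: 10,000 records, half screened manually, 190 of 200 relevant studies found.
saving, recall = screening_metrics(
    total=10_000, screened=5_000, relevant_total=200, relevant_found=190
)
print(f"workload saving: {saving:.0%}, recall: {recall:.0%}")
```

    Reporting both numbers together matters because a large saving is only useful if recall stays at a level the review team is prepared to accept.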

  8. Acid mine drainage risks - A modeling approach to siting mine facilities in Northern Minnesota USA

    NASA Astrophysics Data System (ADS)

    Myers, Tom

    2016-02-01

    Most watershed-scale planning for mine-caused contamination concerns remediation of past problems while future planning relies heavily on engineering controls. As an alternative, a watershed scale groundwater fate and transport model for the Rainy Headwaters, a northeastern Minnesota watershed, has been developed to examine the risks of leaks or spills to a pristine downstream watershed. The model shows that the risk depends on the location and whether the source of the leak is on the surface or from deeper underground facilities. Underground sources cause loads that last longer but arrive at rivers after a longer travel time and have lower concentrations due to dilution and attenuation. Surface contaminant sources could cause much more short-term damage to the resource. Because groundwater dominates baseflow, mine contaminant seepage would cause the most damage during low flow periods. Groundwater flow and transport modeling is a useful tool for decreasing the risk to downgradient sources by aiding in the placement of mine facilities. Although mines are located based on the minerals, advance planning and analysis could avoid siting mine facilities where failure or leaks would cause too much natural resource damage. Watershed scale transport modeling could help locate the facilities or decide in advance that the mine should not be constructed due to the risk to downstream resources.

  9. Identifying Trustworthiness Deficit in Legacy Systems Using the NFR Approach

    DTIC Science & Technology

    2014-01-01

    Architecture Adaptability - An NFR Approach”, Proceedings of the International Workshop on Principles of Software Evolution (IWPSE 2001), ACM...identify solutions necessary to make that system trustworthy for a specified time-scale. In this paper we apply the NFR Approach to a selected software ...Phoenix by using the NFR Approach and developed a process for applying this approach to other software systems. Our study identified an 89% shortfall

  10. Identifying Liver Cancer and Its Relations with Diseases, Drugs, and Genes: A Literature-Based Approach

    PubMed Central

    Song, Min

    2016-01-01

    In biomedicine, scientific literature is a valuable source for knowledge discovery. Mining knowledge from textual data has become an ever important task as the volume of scientific literature is growing unprecedentedly. In this paper, we propose a framework for examining a certain disease based on existing information provided by scientific literature. Disease-related entities that include diseases, drugs, and genes are systematically extracted and analyzed using a three-level network-based approach. A paper-entity network and an entity co-occurrence network (macro-level) are explored and used to construct six entity specific networks (meso-level). Important diseases, drugs, and genes as well as salient entity relations (micro-level) are identified from these networks. Results obtained from the literature-based literature mining can serve to assist clinical applications. PMID:27195695

  11. Theoretical approaches to creation of robotic coal mines based on the synthesis of simulation technologies

    NASA Astrophysics Data System (ADS)

    Fryanov, V. N.; Pavlova, L. D.; Temlyantsev, M. V.

    2017-09-01

    Methodological approaches to theoretical substantiation of the structure and parameters of robotic coal mines are outlined. The results of mathematical and numerical modeling revealed the features of manifestation of geomechanical and gas dynamic processes in the conditions of robotic mines. Technological solutions for the design and manufacture of technical means for robotic mine are adopted using the method of economic and mathematical modeling and in accordance with the current regulatory documents. For a comparative performance evaluation of technological schemes of traditional and robotic mines, methods of cognitive modeling and matrix search for subsystem elements in the synthesis of a complex geotechnological system are applied. It is substantiated that the process of technical re-equipment of a traditional mine with a phased transition to a robotic mine will reduce unit costs by almost 1.5 times with a significant social effect due to a reduction in the number of personnel engaged in hazardous work.

  12. Analysis of biological processes and diseases using text mining approaches.

    PubMed

    Krallinger, Martin; Leitner, Florian; Valencia, Alfonso

    2010-01-01

    A number of biomedical text mining systems have been developed to extract biologically relevant information directly from the literature, complementing bioinformatics methods in the analysis of experimentally generated data. We provide a short overview of the general characteristics of natural language data, existing biomedical literature databases, and lexical resources relevant in the context of biomedical text mining. A selected number of practically useful systems are introduced together with the type of user queries supported and the results they generate. The extraction of biological relationships, such as protein-protein interactions as well as metabolic and signaling pathways using information extraction systems, will be discussed through example cases of cancer-relevant proteins. Basic strategies for detecting associations of genes to diseases together with literature mining of mutations, SNPs, and epigenetic information (methylation) are described. We provide an overview of disease-centric and gene-centric literature mining methods for linking genes to phenotypic and genotypic aspects. Moreover, we discuss recent efforts for finding biomarkers through text mining and for gene list analysis and prioritization. Some relevant issues for implementing a customized biomedical text mining system will be pointed out. To demonstrate the usefulness of literature mining for the molecular oncology domain, we implemented two cancer-related applications. The first tool consists of a literature mining system for retrieving human mutations together with supporting articles. Specific gene mutations are linked to a set of predefined cancer types. The second application consists of a text categorization system supporting breast cancer-specific literature search and document-based breast cancer gene ranking. Future trends in text mining emphasize the importance of community efforts such as the BioCreative challenge for the development and integration of multiple systems into

  13. Data Mining Approaches for Genome-Wide Association of Mood Disorders

    PubMed Central

    Pirooznia, Mehdi; Seifuddin, Fayaz; Judy, Jennifer; Mahon, Pamela B.; Potash, James B.; Zandi, Peter P.

    2012-01-01

    Mood disorders are highly heritable forms of major mental illness. A major breakthrough in elucidating the genetic architecture of mood disorders was anticipated with the advent of genome-wide association studies (GWAS). However, to date few susceptibility loci have been conclusively identified. The genetic etiology of mood disorders appears to be quite complex, and as a result, alternative approaches for analyzing GWAS data are needed. Recently, a polygenic scoring approach that captures the effects of alleles across multiple loci was successfully applied to the analysis of GWAS data in schizophrenia and bipolar disorder (BP). However, this method may be overly simplistic in its approach to the complexity of genetic effects. Data mining methods are available that may be applied to analyze the high dimensional data generated by GWAS of complex psychiatric disorders. We sought to compare the performance of five data mining methods, namely, Bayesian Networks (BN), Support Vector Machine (SVM), Random Forest (RF), Radial Basis Function network (RBF), and Logistic Regression (LR), against the polygenic scoring approach in the analysis of GWAS data on BP. The different classification methods were trained on GWAS datasets from the Bipolar Genome Study (2,191 cases with BP and 1,434 controls) and their ability to accurately classify case/control status was tested on a GWAS dataset from the Wellcome Trust Case Control Consortium. The performance of the classifiers in the test dataset was evaluated by comparing area under the receiver operating characteristic curves (AUC). BN performed the best of all the data mining classifiers, but none of these did significantly better than the polygenic score approach. We further examined a subset of SNPs in genes that are expressed in the brain, under the hypothesis that these might be most relevant to BP susceptibility, but all the classifiers performed worse with this reduced set of SNPs. The discriminative accuracy of all of these
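
    Comparing classifiers by AUC, as described above, can be sketched in plain Python using the pairwise-ranking definition of AUC: the probability that a randomly chosen case is scored above a randomly chosen control. The labels and scores are hypothetical, and "classifier A" and "classifier B" are placeholders rather than any of the methods named in the abstract.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases (1) and controls (0).
    Ties between a case and a control count as half a win."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical test set: three cases followed by three controls.
labels = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]  # placeholder classifier A
scores_b = [0.6, 0.3, 0.4, 0.5, 0.7, 0.2]  # placeholder classifier B

auc_a = auc(scores_a, labels)
auc_b = auc(scores_b, labels)
print(f"AUC A = {auc_a:.3f}, AUC B = {auc_b:.3f}")
```

    An AUC of 0.5 corresponds to chance-level discrimination, so classifier B in this toy example does slightly worse than random guessing, while classifier A separates cases from controls reasonably well.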

  14. Drug-target networks for Tanshinone IIA identified by data mining.

    PubMed

    Chen, Shao-Jun

    2015-10-01

    Tanshinone IIA is a pharmacologically active compound isolated from Danshen (Salvia miltiorrhiza), a traditional Chinese herbal medicine for the management of cardiac diseases and other disorders. However, its underlying molecular mechanisms of action are still unclear. The present investigation utilized a data mining approach based on network pharmacology to uncover the potential protein targets of Tanshinone IIA. Network pharmacology, an integrated multidisciplinary study, incorporates systems biology, network analysis, connectivity, redundancy, and pleiotropy, providing powerful new tools and insights into elucidating the fine details of drug-target interactions. In the present study, two separate drug-target networks for Tanshinone IIA were constructed using the Agilent Literature Search (ALS) and STITCH (search tool for interactions of chemicals) methods. Analysis of the ALS-constructed network revealed a target network with a scale-free topology and five top nodes (protein targets) corresponding to Fos, Jun, Src, phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit alpha (PIK3CA), and mitogen-activated protein kinase kinase 1 (MAP2K1), whereas analysis of the STITCH-constructed network revealed three top nodes corresponding to cytochrome P450 3A4 (CYP3A4), cytochrome P450 1A1 (CYP1A1), and nuclear factor kappa B1 (NFκB1). The discrepancies were probably due to the differences in the divergent computer mining tools and databases employed by the two methods. However, it is conceivable that all eight proteins mediate important biological functions of Tanshinone IIA, contributing to its overall drug-target network. In conclusion, the current results may assist in developing a comprehensive understanding of the molecular mechanisms and signaling pathways of Tanshinone IIA in a simple, compact, and visual manner. Copyright © 2015 China Pharmaceutical University. Published by Elsevier B.V. All rights reserved.

  15. Online Discourse on Fibromyalgia: Text-Mining to Identify Clinical Distinction and Patient Concerns

    PubMed Central

    Park, Jungsik; Ryu, Young Uk

    2014-01-01

    Background: The purpose of this study was to evaluate the possibility of using text-mining to identify clinical distinctions and patient concerns in online memoirs posted by patients with fibromyalgia (FM). Material/Methods: A total of 399 memoirs were collected from an FM group website. The unstructured data of memoirs associated with FM were collected through a crawling process and converted into structured data with a concordance, parts of speech tagging, and word frequency. We also conducted a lexical analysis and phrase pattern identification. After examining the data, a set of FM-related keywords was obtained and phrase net relationships were set through a web-based visualization tool. Results: The clinical distinction of FM was verified. Pain is the biggest issue for FM patients. The pain affected body parts including ‘muscles,’ ‘leg,’ ‘neck,’ ‘back,’ ‘joints,’ and ‘shoulders,’ with accompanying symptoms such as ‘spasms,’ ‘stiffness,’ and ‘aching,’ and was described as ‘severe,’ ‘chronic,’ and ‘constant.’ This study also demonstrated that it was possible to understand the interests and concerns of FM patients through text-mining. FM patients wanted to escape from the pain and symptoms, so they were interested in medical treatment and help. They also seemed to be interested in their work and occupation, and hoped to continue to live life through their relationships with the people around them. Conclusions: This research shows the potential for extracting keywords to confirm the clinical distinction of a certain disease, and text-mining can help objectively understand the concerns of patients by generalizing their large number of subjective illness experiences. However, it is believed that there are limitations to the processes and methods for organizing and classifying large amounts of text, so these limits have to be considered when analyzing the results. The development of research methodology to overcome

  16. Non-lexical approaches to identifying associative relations in the gene ontology.

    PubMed

    Bodenreider, Olivier; Aubry, Marc; Burgun, Anita

    2005-01-01

    The Gene Ontology (GO) is a controlled vocabulary widely used for the annotation of gene products. GO is organized in three hierarchies for molecular functions, cellular components, and biological processes but no relations are provided among terms across hierarchies. The objective of this study is to investigate three non-lexical approaches to identifying such associative relations in GO and compare them among themselves and to lexical approaches. The three approaches are: computing similarity in a vector space model, statistical analysis of co-occurrence of GO terms in annotation databases, and association rule mining. Five annotation databases (FlyBase, the Human subset of GOA, MGI, SGD, and WormBase) are used in this study. A total of 7,665 associations were identified by at least one of the three non-lexical approaches. Of these, 12% were identified by more than one approach. While there are almost 6,000 lexical relations among GO terms, only 203 associations were identified by both non-lexical and lexical approaches. The associations identified in this study could serve as the starting point for adding associative relations across hierarchies to GO, but would require manual curation. The application to quality assurance of annotation databases is also discussed.

  17. Identifying candidates with favorable prognosis following liver transplantation for hepatocellular carcinoma: Data mining analysis.

    PubMed

    Tanaka, Tomohiro; Kurosaki, Masayuki; Lilly, Leslie B; Izumi, Namiki; Sherman, Morris

    2015-07-01

    The optimal cutoff of each value in configuring selection criteria for pre-transplant assessment of hepatocellular carcinoma (HCC) remains uncertain. To build a predictive model for recurrent HCC, we performed data mining analysis on patients who underwent liver transplantation (LT) for HCC at University Health Network (n = 246). The model was externally validated using a cohort from the Scientific Registry of Transplant Recipients (SRTR) database (n = 9,769). Among 246 patients, 14.6% (n = 36) experienced recurrent HCC within 2.5 years post-LT. The risk prediction model for recurrent HCC identified two subgroups, one with low risk (total tumor diameter [TTD] <4 cm and serum alpha-fetoprotein [AFP] <73 ng/ml, n = 135) and one with high risk (TTD >4 cm and/or AFP >73 ng/ml, n = 111). The reproducibility of the model was validated through the SRTR database; the overall patient survival rate was significantly better in the low-risk group than in the high-risk group (P < 0.0001). Using a Cox regression model, this yardstick, but not the Milan criteria, was shown to efficiently predict post-transplant survival independent of underlying characteristics (P < 0.0001). Grouping LT candidates with pre-LT HCC by the cutoffs of TTD 4 cm and AFP 73 ng/ml, which were unearthed by data mining analysis, efficiently classifies patients according to their post-transplant prognosis. © 2015 Wiley Periodicals, Inc.
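
    The two-subgroup rule described in this abstract can be written as a short function. This is only a minimal sketch: the function name and the handling of values exactly equal to a cutoff (treated here as high risk) are assumptions, since the abstract does not say how boundary values were assigned, and the sketch is not a clinical tool.

```python
# Cutoffs quoted in the abstract.
TTD_CUTOFF_CM = 4.0       # total tumor diameter, cm
AFP_CUTOFF_NG_ML = 73.0   # serum alpha-fetoprotein, ng/ml


def hcc_recurrence_risk_group(ttd_cm: float, afp_ng_ml: float) -> str:
    """Return 'low' when both values are below their cutoffs, else 'high'.

    Assumption: values exactly at a cutoff fall in the high-risk group;
    the abstract leaves this case unspecified.
    """
    if ttd_cm < TTD_CUTOFF_CM and afp_ng_ml < AFP_CUTOFF_NG_ML:
        return "low"
    return "high"


# Hypothetical inputs, for illustration only.
print(hcc_recurrence_risk_group(3.0, 20.0))   # low
print(hcc_recurrence_risk_group(5.5, 20.0))   # high
print(hcc_recurrence_risk_group(3.0, 150.0))  # high
```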

  18. Using a Data Mining Approach to Discover Behavior Correlates of Chronic Disease: A Case Study of Depression

    PubMed Central

    YOON, Sunmoo; TAHA, Basirah; BAKKEN, Suzanne

    2015-01-01

    The purposes of this methodological paper are: 1) to describe data mining methods for building a classification model for a chronic disease using a U.S. behavior risk factor data set, and 2) to illustrate application of the methods using a case study of depressive disorder. Methods described include: 1) six steps of data mining to build a disease model using classification techniques, 2) an innovative approach to analyzing high-dimensionality data, and 3) a visualization strategy to communicate with clinicians who are unfamiliar with advanced statistics. Our application of data mining strategies identified childhood experiences of living with a mentally ill person and of sexual abuse, together with limited usual activity, as the strongest correlates of depression among hundreds of variables. The methods that we applied may be useful to others wishing to build a classification model from complex, large volume datasets for other health conditions. PMID:24943527

  19. Using a data mining approach to discover behavior correlates of chronic disease: a case study of depression.

    PubMed

    Yoon, Sunmoo; Taha, Basirah; Bakken, Suzanne

    2014-01-01

    The purposes of this methodological paper are: 1) to describe data mining methods for building a classification model for a chronic disease using a U.S. behavior risk factor data set, and 2) to illustrate application of the methods using a case study of depressive disorder. Methods described include: 1) six steps of data mining to build a disease model using classification techniques, 2) an innovative approach to analyzing high-dimensionality data, and 3) a visualization strategy to communicate with clinicians who are unfamiliar with advanced statistics. Our application of data mining strategies identified childhood experiences of living with a mentally ill person and of sexual abuse, together with limited usual activity, as the strongest correlates of depression among hundreds of variables. The methods that we applied may be useful to others wishing to build a classification model from complex, large volume datasets for other health conditions.

  20. A novel approach to generating CER hypotheses based on mining clinical data.

    PubMed

    Zhang, Shuo; Li, Lin; Yu, Yiqin; Sun, Xingzhi; Xu, Linhao; Zhao, Wei; Teng, Xiaofei; Pan, Yue

    2013-01-01

    Comparative effectiveness research (CER) is a scientific method of investigating the effectiveness of alternative intervention methods. In a CER study, clinical researchers typically start with a CER hypothesis and aim to evaluate it by applying a series of medical statistical methods. Traditionally, CER hypotheses are defined manually by clinical researchers. This makes the task of hypothesis generation very time-consuming and the quality of a hypothesis heavily dependent on the researchers' skills. Recently, with more electronic medical data being collected, it has become highly promising to apply computerized methods for discovering CER hypotheses from clinical data sets. In this poster, we propose a novel approach to automatically generating CER hypotheses based on mining clinical data, and present a case study showing that the approach can help clinical researchers identify potentially valuable hypotheses and eventually define high quality CER studies.

  1. Identifying woody vegetation on coal surface mines using phenological indicators with multitemporal Landsat imagery

    NASA Astrophysics Data System (ADS)

    Oliphant, A. J.; Li, J.; Wynne, R. H.; Donovan, P. F.; Zipper, C. E.

    2014-11-01

    Surface mining for coal has disturbed large land areas in the Appalachian Mountains. Better information on mined lands' ecosystem recovery status is necessary for effective environmental management in mining-impacted regions. Because record quality varies between state mining agencies and much mining occurred prior to widespread use of geospatial technologies, accurate maps of mining extents, durations, and land cover effects are often not available. Landsat data are well suited to mapping and characterizing land cover and forest recovery on former coal surface mines. Past mine reclamation techniques have often failed to restore premining forest vegetation, but natural processes may enable native forests to re-establish on mined areas with time. However, the invasive species autumn olive (Elaeagnus umbellata) is proliferating widely on former coal surface mines, often inhibiting reestablishment of native forests. Autumn olive outcompetes native vegetation because it fixes atmospheric nitrogen and benefits from a longer growing season than native deciduous trees. This longer growing season, along with Landsat 8's high signal-to-noise ratio, has enabled species-level classification of autumn olive using multitemporal Landsat 8 data at accuracy levels usually only obtainable using higher spatial or spectral resolution sensors. We have used classification and regression tree (CART®) and support vector machine (SVM) models to classify five counties in the coal mining region of Virginia for presence and absence of autumn olive. The best model found was a CART® model with 36 nodes, which had an overall accuracy of 84% and a kappa of 0.68. Autumn olive had a conditional kappa of 0.65 and producer's and user's accuracies of 86% and 83%, respectively. The best SVM model used a second-order polynomial kernel and had an overall accuracy of 77%, an overall kappa of 0.54, and producer's and user's accuracies of 60% and 90%, respectively.
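
    The accuracy measures reported above (overall accuracy, kappa, producer's and user's accuracy) are all derived from a confusion matrix. The sketch below shows the standard formulas on a made-up 2x2 matrix; the counts are hypothetical and are not the study's data, and the row/column convention (rows = reference, columns = classified) is an assumption.

```python
def accuracy_measures(matrix, class_index):
    """Compute overall accuracy, Cohen's kappa, and producer's/user's
    accuracy for one class.

    Assumed convention: matrix[i][j] counts samples whose reference class
    is i and whose classified (map) class is j.
    """
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]        # reference totals
    col_totals = [sum(col) for col in zip(*matrix)]  # classified totals
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    correct = matrix[class_index][class_index]
    producers = correct / row_totals[class_index]  # 1 - omission error
    users = correct / col_totals[class_index]      # 1 - commission error
    return observed, kappa, producers, users


# Hypothetical counts: class 0 = "present", class 1 = "absent".
example = [[45, 5],
           [15, 35]]
overall, kappa, producers, users = accuracy_measures(example, class_index=0)
print(f"{overall:.2f} {kappa:.2f} {producers:.2f} {users:.2f}")  # 0.80 0.60 0.90 0.75
```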

  2. SMM-system: A mining tool to identify specific markers in Salmonella enterica.

    PubMed

    Yu, Shuijing; Liu, Weibing; Shi, Chunlei; Wang, Dapeng; Dan, Xianlong; Li, Xiao; Shi, Xianming

    2011-03-01

    This report presents SMM-system, a software package that implements various personalized pre- and post-BLASTN tasks for mining specific markers of microbial pathogens. The main functionalities of SMM-system are summarized as follows: (i) converting multi-FASTA files, (ii) cutting genomic sequences of interest, (iii) automatic high-throughput BLASTN searches, and (iv) screening target sequences. The utility of SMM-system was demonstrated by using it to identify 214 Salmonella enterica-specific protein-coding sequences (CDSs). Eighteen primer pairs were designed based on eighteen S. enterica-specific CDSs, respectively. Seven of these primer pairs were validated with a PCR assay, which showed 100% inclusivity for the 101 S. enterica genomes and 100% exclusivity for the 30 non-S. enterica genomes. Three specific primer pairs were chosen to develop a multiplex PCR assay, which generated specific amplicons with sizes of 180 bp (SC1286), 238 bp (SC1598) and 405 bp (SC4361), respectively. This study demonstrates that SMM-system is a high-throughput specific marker generation tool that can be used to identify genus-, species-, serogroup- and even serovar-specific DNA sequences of microbial pathogens, and that it has the potential to be applied in food industries, diagnostics and taxonomic studies. SMM-system is freely available and can be downloaded from http://foodsafety.sjtu.edu.cn/SMM-system.html.

  3. Identifying the Uncertainty in Physician Practice Location through Spatial Analytics and Text Mining

    PubMed Central

    Shi, Xuan; Xue, Bowei; Xierali, Imam M.

    2016-01-01

    In response to the widespread concern about the adequacy, distribution, and disparity of access to a health care workforce, the correct identification of physicians’ practice locations is critical to access public health services. In prior literature, little effort has been made to detect and resolve the uncertainty about whether the address provided by a physician in the survey is a practice address or a home address. This paper introduces how to identify the uncertainty in a physician’s practice location through spatial analytics, text mining, and visual examination. While land use and zoning codes, embedded within the parcel datasets, help to differentiate residential areas from other types, spatial analytics may have certain limitations in matching and comparing physician and parcel datasets with different uncertainty issues, which may lead to unforeseen results. Handling and matching the string components between physicians’ addresses and the addresses of the parcels could identify the spatial uncertainty and instability to derive a more reasonable relationship between different datasets. Visual analytics and examination further help to clarify the undetectable patterns. This research will have a broader impact on federal and state initiatives and policies to address both insufficiency and maldistribution of a health care workforce to improve the accessibility to public health services. PMID:27657100

  4. IDENTIFYING RECENT SURFACE MINING ACTIVITIES USING A NORMALIZED DIFFERENCE VEGETATION INDEX (NDVI) CHANGE DETECTION METHOD

    EPA Science Inventory



    Coal mining is a major resource extraction activity in the Appalachian Mountains. The increased size and frequency of a specific type of surface mining, known as mountain top removal-valley fill, has in recent years raised various environmental concerns. During mountainto...

  5. Identifying Learning Behaviors by Contextualizing Differential Sequence Mining with Action Features and Performance Evolution

    ERIC Educational Resources Information Center

    Kinnebrew, John S.; Biswas, Gautam

    2012-01-01

    Our learning-by-teaching environment, Betty's Brain, captures a wealth of data on students' learning interactions as they teach a virtual agent. This paper extends an exploratory data mining methodology for assessing and comparing students' learning behaviors from these interaction traces. The core algorithm employs sequence mining techniques to…

  6. IDENTIFYING RECENT SURFACE MINING ACTIVITIES USING A NORMALIZED DIFFERENCE VEGETATION INDEX (NDVI) CHANGE DETECTION METHOD

    EPA Science Inventory



    Coal mining is a major resource extraction activity in the Appalachian Mountains. The increased size and frequency of a specific type of surface mining, known as mountain top removal-valley fill, has in recent years raised various environmental concerns. During mountainto...

  7. Trend Motif: A Graph Mining Approach for Analysis of Dynamic Complex Networks

    SciTech Connect

    Jin, R; McCallen, S; Almaas, E

    2007-05-28

    Complex networks have been used successfully in scientific disciplines ranging from sociology to microbiology to describe systems of interacting units. Until recently, studies of complex networks have mainly focused on their network topology. However, in many real world applications, the edges and vertices have associated attributes that are frequently represented as vertex or edge weights. Furthermore, these weights are often not static, instead changing with time and forming a time series. Hence, to fully understand the dynamics of the complex network, we have to consider both network topology and related time series data. In this work, we propose a motif mining approach to identify trend motifs for such purposes. Simply stated, a trend motif describes a recurring subgraph where each of its vertices or edges displays similar dynamics over a user-defined period. Given this, each trend motif occurrence can help reveal significant events in a complex system; frequent trend motifs may aid in uncovering dynamic rules of change for the system, and the distribution of trend motifs may characterize the global dynamics of the system. Here, we have developed efficient mining algorithms to extract trend motifs. Our experimental validation using three disparate empirical datasets, ranging from the stock market and world trade to a protein interaction network, has demonstrated the efficiency and effectiveness of our approach.
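
    To make the idea of "a subgraph whose vertices display similar dynamics over a user-defined period" concrete, the sketch below finds triangles whose three vertex time series all move in the same direction across a chosen window. It is a deliberately simplified illustration, not the authors' mining algorithm: the trend definition (sign of last minus first value), the restriction to triangles, and all names and data are assumptions.

```python
from itertools import combinations


def vertex_trend(series, start, end):
    """Classify a series as 'up', 'down', or 'flat' over [start, end].

    Assumption: the trend is just the sign of series[end] - series[start].
    """
    delta = series[end] - series[start]
    if delta > 0:
        return "up"
    if delta < 0:
        return "down"
    return "flat"


def same_trend_triangles(edges, series_by_vertex, start, end):
    """Return (triangle, trend) pairs where all three vertices share a
    non-flat trend over the window."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    trends = {v: vertex_trend(s, start, end) for v, s in series_by_vertex.items()}
    found = []
    for a, b, c in combinations(sorted(adjacency), 3):
        is_triangle = b in adjacency[a] and c in adjacency[a] and c in adjacency[b]
        if is_triangle and trends[a] == trends[b] == trends[c] != "flat":
            found.append(((a, b, c), trends[a]))
    return found


# Hypothetical graph and vertex weights over three time steps.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
series = {"A": [1, 2, 3], "B": [2, 2, 5], "C": [0, 1, 4], "D": [5, 4, 1]}
print(same_trend_triangles(edges, series, start=0, end=2))
# [(('A', 'B', 'C'), 'up')]
```

    Exhaustive enumeration of vertex triples is only workable for small graphs; it is used here to keep the sketch short.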

  8. Determining the familial risk distribution of colorectal cancer: a data mining approach.

    PubMed

    Chau, Rowena; Jenkins, Mark A; Buchanan, Daniel D; Ait Ouakrim, Driss; Giles, Graham G; Casey, Graham; Gallinger, Steven; Haile, Robert W; Le Marchand, Loic; Newcomb, Polly A; Lindor, Noralane M; Hopper, John L; Win, Aung Ko

    2016-04-01

    This study aimed to characterize the distribution of colorectal cancer risk using family history of cancers by data mining. Family histories for 10,066 colorectal cancer cases recruited to population cancer registries of the Colon Cancer Family Registry were analyzed using a data mining framework. A novel index was developed to quantify familial cancer aggregation. An artificial neural network was used to identify distinct categories of familial risk. Standardized incidence ratios (SIRs) and corresponding 95% confidence intervals (CIs) of colorectal cancer were calculated for each category. We identified five major and 66 minor categories of familial risk for developing colorectal cancer. The distribution of the major risk categories was: (1) 7% of families (SIR = 7.11; 95% CI 6.65-7.59) had a strong family history of colorectal cancer; (2) 13% of families (SIR = 2.94; 95% CI 2.78-3.10) had a moderate family history of colorectal cancer; (3) 11% of families (SIR = 1.23; 95% CI 1.12-1.36) had a strong family history of breast cancer and a weak family history of colorectal cancer; (4) 9% of families (SIR = 1.06; 95% CI 0.96-1.18) had a strong family history of prostate cancer and a weak family history of colorectal cancer; and (5) 60% of families (SIR = 0.61; 95% CI 0.57-0.65) had a weak family history of all cancers. There is a wide variation of colorectal cancer risk that can be categorized by family history of cancer, with a strong gradient of colorectal cancer risk between the highest and lowest risk categories. The risk of colorectal cancer for people in the highest risk category of family history (7% of the population) was 12 times that for people in the lowest risk category (60% of the population). Data mining proved to be an effective approach for gaining insight into the underlying cancer aggregation patterns and for categorizing familial risk of colorectal cancer.
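
    The roughly 12-fold risk gradient quoted in this abstract can be checked from the two reported SIRs. The sketch below assumes that the ratio of the two SIRs is how the figure was obtained, which the abstract does not state explicitly.

```python
# SIRs reported in the abstract for the highest and lowest risk categories.
sir_highest = 7.11  # strong family history of colorectal cancer (7% of families)
sir_lowest = 0.61   # weak family history of all cancers (60% of families)

# Assumption: the "12 times" figure is the ratio of these two SIRs.
risk_ratio = sir_highest / sir_lowest
print(f"{risk_ratio:.1f}")  # 11.7, i.e. about 12 when rounded
```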

  9. Identifying candidate drivers of drug response in heterogeneous cancer by mining high throughput genomics data.

    PubMed

    Nabavi, Sheida

    2016-08-15

    With advances in technologies, huge amounts of multiple types of high-throughput genomics data are available. These data have tremendous potential to identify new and clinically valuable biomarkers to guide the diagnosis, assessment of prognosis, and treatment of complex diseases, such as cancer. Integrating, analyzing, and interpreting big and noisy genomics data to obtain biologically meaningful results, however, remains highly challenging. Mining genomics datasets by utilizing advanced computational methods can help to address these issues. To facilitate the identification of a short list of biologically meaningful genes as candidate drivers of anti-cancer drug resistance from an enormous amount of heterogeneous data, we employed statistical machine-learning techniques and integrated genomics datasets. We developed a computational method that integrates gene expression, somatic mutation, and copy number aberration data of sensitive and resistant tumors. In this method, an integrative method based on module network analysis is applied to identify potential driver genes. This is followed by cross-validation and a comparison of the results of the sensitive and resistant groups to obtain the final list of candidate biomarkers. We applied this method to the ovarian cancer data from The Cancer Genome Atlas. The final result contains biologically relevant genes, such as COL11A1, which has been reported as a cis-platinum resistance biomarker for epithelial ovarian carcinoma in several recent studies. The described method yields a short list of aberrant genes that also control the expression of their co-regulated genes. The results suggest that the unbiased, data-driven computational method can identify biologically relevant candidate biomarkers. It can be utilized in a wide range of applications that compare two conditions with highly heterogeneous datasets.

  10. Practical Approaches for Mining Frequent Patterns in Molecular Datasets

    PubMed Central

    Naulaerts, Stefan; Moens, Sandy; Engelen, Kristof; Berghe, Wim Vanden; Goethals, Bart; Laukens, Kris; Meysman, Pieter

    2016-01-01

    Pattern detection is an inherent task in the analysis and interpretation of complex and continuously accumulating biological data. Numerous itemset mining algorithms have been developed in the last decade to efficiently detect specific pattern classes in data. Although many of these have proven their value for addressing bioinformatics problems, several factors still keep promising algorithms from gaining popularity in the life science community. Many of these issues stem from the low user-friendliness of these tools and the complexity of their output, which is often large, static, and consequently hard to interpret. Here, we apply three software implementations to common bioinformatics problems and illustrate some of the advantages and disadvantages of each, as well as inherent pitfalls of biological data mining. Frequent itemset mining exists in many different flavors, and users should base their software choice on their research question, programming proficiency, and the added value of extra features. PMID:27168722
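
    For readers unfamiliar with frequent itemset mining, the following minimal sketch counts every item combination up to a fixed size and keeps those whose support (fraction of transactions containing them) meets a threshold. It uses brute-force enumeration rather than Apriori, FP-growth, or any of the software implementations compared in the paper, and the item names and transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Return {itemset (sorted tuple): support} for itemsets of size
    1..max_size with support >= min_support."""
    counts = Counter()
    for items in transactions:
        unique = sorted(set(items))  # ignore repeated items in a transaction
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[combo] += 1
    n = len(transactions)
    return {combo: c / n for combo, c in counts.items() if c / n >= min_support}


# Hypothetical transactions: each is the set of features observed in a sample.
transactions = [
    ["geneA", "geneB", "geneC"],
    ["geneA", "geneB"],
    ["geneA", "geneC"],
    ["geneB", "geneC"],
]
result = frequent_itemsets(transactions, min_support=0.5)
for itemset in sorted(result, key=lambda s: (len(s), s)):
    print(itemset, result[itemset])
```

    With these toy transactions every single item and every pair meets the 0.5 threshold, while the full triple (support 0.25) is dropped. The number of candidate combinations grows quickly with transaction length, which is why dedicated algorithms prune the search.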

  11. Practical Approaches for Mining Frequent Patterns in Molecular Datasets.

    PubMed

    Naulaerts, Stefan; Moens, Sandy; Engelen, Kristof; Berghe, Wim Vanden; Goethals, Bart; Laukens, Kris; Meysman, Pieter

    2016-01-01

    Pattern detection is an inherent task in the analysis and interpretation of complex and continuously accumulating biological data. Numerous itemset mining algorithms have been developed in the last decade to efficiently detect specific pattern classes in data. Although many of these have proven their value for addressing bioinformatics problems, several factors still keep promising algorithms from gaining popularity in the life science community. Many of these issues stem from the low user-friendliness of these tools and the complexity of their output, which is often large, static, and consequently hard to interpret. Here, we apply three software implementations to common bioinformatics problems and illustrate some of the advantages and disadvantages of each, as well as inherent pitfalls of biological data mining. Frequent itemset mining exists in many different flavors, and users should base their software choice on their research question, programming proficiency, and the added value of extra features.

  12. A systematic approach to identify cellular auxetic materials

    NASA Astrophysics Data System (ADS)

    Körner, Carolin; Liebold-Ribeiro, Yvonne

    2015-02-01

    Auxetics are materials showing a negative Poisson’s ratio. This characteristic leads to unusual mechanical properties that make this an interesting class of materials. So far no systematic approach for generating auxetic cellular materials has been reported. In this contribution, we present a systematic approach to identifying auxetic cellular materials based on eigenmode analysis. The fundamental mechanism generating auxetic behavior is identified as rotation. With this knowledge, a variety of complex two-dimensional (2D) and three-dimensional (3D) auxetic structures based on simple unit cells can be identified.

  13. Comparison of approaches for parameter identifiability analysis of biological systems.

    PubMed

    Raue, Andreas; Karlsson, Johan; Saccomani, Maria Pia; Jirstrand, Mats; Timmer, Jens

    2014-05-15

    Modeling of dynamical systems using ordinary differential equations is a popular approach in the field of Systems Biology. The amount of experimental data used to build and calibrate these models is often limited. In this setting, the model parameters may not be uniquely determinable. Structural or a priori identifiability is a property of the system equations that indicates whether, in principle, the unknown model parameters can be determined from the available data. We performed a case study using three current approaches for structural identifiability analysis for an application from cell biology. The approaches are conceptually different and were developed independently. The results of the three approaches are in agreement. We discuss the strengths and weaknesses of each of them and illustrate how they can be applied to real world problems. To support use of the approaches in further applications, code representations (DAISY, Mathematica and MATLAB) for the benchmark model and data are provided on the authors' webpage. andreas.raue@fdm.uni-freiburg.de.
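
    A toy example may help illustrate what structural non-identifiability means. In the sketch below, a hypothetical one-state model dx/dt = -(k1 + k2) x with observed output y = x depends on the parameters only through their sum, so different (k1, k2) pairs with the same sum produce exactly the same output and cannot be distinguished from data. The model is an assumption chosen for illustration and is not the cell biology application or any of the three approaches studied in the paper.

```python
import math


def observed_output(k1, k2, times, x0=1.0):
    """Closed-form solution of dx/dt = -(k1 + k2) * x, x(0) = x0,
    evaluated at the given time points (the observable is x itself)."""
    rate = k1 + k2
    return [x0 * math.exp(-rate * t) for t in times]


times = [0.0, 0.5, 1.0, 2.0]
# Two different parameter sets with the same sum k1 + k2 = 1.0.
y_first = observed_output(0.25, 0.75, times)
y_second = observed_output(0.5, 0.5, times)

# The outputs coincide, so k1 and k2 are not individually identifiable;
# only their sum is.
print(y_first == y_second)  # True
```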

  14. Exploring factors associated with pressure ulcers: a data mining approach.

    PubMed

    Raju, Dheeraj; Su, Xiaogang; Patrician, Patricia A; Loan, Lori A; McCarthy, Mary S

    2015-01-01

    Pressure ulcers are associated with a nearly three-fold increase in in-hospital mortality. It is essential to investigate how factors other than the Braden scale could enhance the prediction of pressure ulcers. Data mining modeling techniques can be beneficial for conducting this type of analysis. Data mining techniques have been applied extensively in health care, but are not widely used in nursing research. To remedy this methodological gap, this paper reviews, explains, and compares several data mining models to examine patient-level factors associated with pressure ulcers, based on a four-year study from military hospitals in the United States. The variables included in the analysis are easily accessible demographic information and medical measurements. Logistic regression, decision trees, random forests, and multivariate adaptive regression splines were compared based on their performance and interpretability. The random forests model had the highest accuracy (C-statistic), with the following variables, in order of importance, ranked highest in predicting pressure ulcers: days in the hospital, serum albumin, age, blood urea nitrogen, and total Braden score. Data mining techniques, particularly random forests, are useful in predictive modeling. It is important for hospitals and health care systems to use their own data over time for pressure ulcer risk prediction, and to develop risk models that are based upon more than the total Braden score and are specific to their patient population. Copyright © 2014 Elsevier Ltd. All rights reserved.
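
    The C-statistic used above to compare models is the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. A minimal sketch of the pairwise definition follows; the scores and labels are hypothetical and unrelated to the study's data.

```python
def c_statistic(scores, labels):
    """Concordance statistic (equivalent to the area under the ROC curve).

    Counts positive/negative pairs in which the positive case has the
    higher score; ties count as half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Hypothetical predicted risks and outcomes (1 = event occurred).
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
print(f"{c_statistic(scores, labels):.3f}")  # 0.833
```

    A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from non-cases.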

  15. Diagnosis of cardiovascular abnormalities from compressed ECG: a data mining-based approach.

    PubMed

    Sufi, Fahim; Khalil, Ibrahim

    2011-01-01

    The use of compressed ECG is crucial for fast and efficient telecardiology applications, as ECG signals are enormously large in size. However, conventional ECG diagnosis algorithms require the compressed ECG packets to be decompressed before diagnosis can be performed. This added step of decompression before performing diagnosis for every ECG packet introduces unnecessary delay, which is undesirable for patients with cardiovascular disease (CVD). In this paper, we demonstrate an innovative technique that performs real-time classification of CVD. With the help of this real-time classification of CVD, the emergency personnel or the hospital can automatically be notified via SMS/MMS/e-mail when a life-threatening cardiac abnormality of the CVD-affected patient is detected. Our proposed system initially uses data mining techniques, such as attribute selection (i.e., selecting only a few features from the compressed ECG) and expectation maximization (EM)-based clustering. These data mining techniques, running on a hospital server, generate a set of constraints for representing each of the abnormalities. Then, the patient's mobile phone receives this set of constraints and employs a rule-based system that can identify each of the abnormal beats in real time. Our experimental results on 50 MIT-BIH ECG entries reveal that the proposed approach can successfully detect cardiac abnormalities (e.g., ventricular flutter/fibrillation, premature ventricular contraction, atrial fibrillation, etc.) with 97% accuracy on average. This innovative data mining technique on compressed ECG packets enables faster identification of cardiac abnormality directly from the compressed ECG, helping to build an efficient telecardiology diagnosis system.

  16. Hazards identified and the need for health risk assessment in the South African mining industry.

    PubMed

    Utembe, W; Faustman, E M; Matatiele, P; Gulumian, M

    2015-12-01

    Although mining plays a prominent role in the economy of South Africa, it is associated with many chemical hazards. Exposure to dust from mining can lead to many pathological effects depending on mineralogical composition, size, shape, and levels and duration of exposure. Mining and processing of minerals also result in occupational exposure to toxic substances such as platinum, chromium, vanadium, manganese, mercury, cyanide and diesel particulate. South Africa has set occupational exposure limits (OELs) for some hazards, but mine workers are still at risk. Since the hazard posed by a mineral depends on its physicochemical properties, it is recommended that South Africa should not simply adopt OELs from other countries but rather set her own standards based on local toxicity studies. The limits should take into account the issue of mixtures to which workers could be exposed as well as the health status of the workers. The mining industry is also a source of contamination of the environment, due inter alia to the large areas of tailings dams and dumps left behind. Therefore, there is a need to develop guidelines for safe land uses of contaminated lands after mine closure.

  17. A data mining approach to evolutionary optimisation of noisy multi-objective problems

    NASA Astrophysics Data System (ADS)

    Chia, J. Y.; Goh, C. K.; Shim, V. A.; Tan, K. C.

    2012-07-01

    Many real world optimisation problems have opposing objective functions which are subject to the influence of noise. Noise in the objective functions can adversely affect the stability, performance and convergence of evolutionary optimisers. This article proposes a Bayesian frequent data mining (DM) approach to identify optimal regions to guide the population in the presence of noise. The aggregated information provided by all the solutions helps to average out the effects of noise. This article proposes a DM crossover operator to make use of the rules mined. After implementation of this operator, a better convergence to the true Pareto front is achieved at the expense of the diversity of the solutions. Consequently, an ExtremalExploration operator is proposed later in this article to help curb the loss in diversity caused by the DM operator. The result is a more directive search with a faster convergence rate. The search is effective in decision spaces where the Pareto set is in a tight cluster. The performance of the proposed algorithm in noisy and noiseless environments is also investigated with respect to non-convexity, discontinuity, multi-modality and uniformity. The proposed algorithm is evaluated on ZDT and other benchmark problems. The results of the simulations indicate that the proposed method is effective in handling noise and is competitive with other noise-tolerant algorithms.

  18. A predictive approach to identify genes differentially expressed

    NASA Astrophysics Data System (ADS)

    Saraiva, Erlandson F.; Louzada, Francisco; Milan, Luís A.; Meira, Silvana; Cobre, Juliana

    2012-10-01

    The main objective of gene expression data analysis is to identify genes that present significant changes in expression levels between a treatment and a control biological condition. In this paper, we propose a Bayesian approach to identify differentially expressed genes by calculating credibility intervals from predictive densities, which are constructed using the sampled mean treatment effect from all genes in the study, excluding the treatment effect of genes previously identified with statistical evidence for difference. We compare our Bayesian approach with the standard ones based on the t-test and modified t-tests via a simulation study, using small sample sizes, which are common in gene expression data analysis. The results indicate that the proposed approach performs better than the standard ones, especially for cases with mean differences and increases in treatment variance relative to control variance. We also apply the methodologies to a publicly available data set on Escherichia coli bacteria.
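
    For orientation, the t-test baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of Welch's t statistic for one gene, not the authors' Bayesian procedure; the function name `welch_t` and the expression values are hypothetical and chosen only for the example.

```python
import math
from statistics import mean, variance

def welch_t(treatment, control):
    """Welch's t statistic for one gene (unequal variances allowed).

    Each argument is a list of expression values with at least two
    replicates; the sample variances must not both be zero.
    """
    var_t = variance(treatment) / len(treatment)  # squared standard error, treatment
    var_c = variance(control) / len(control)      # squared standard error, control
    return (mean(treatment) - mean(control)) / math.sqrt(var_t + var_c)

# Hypothetical expression values for a single gene (not from the paper).
treatment = [5.0, 6.0, 7.0]
control = [1.0, 2.0, 3.0]
print(round(welch_t(treatment, control), 3))
```

    With only a few replicates per condition, the variance estimates in the denominator are themselves noisy, which is the small-sample setting the abstract's simulation study targets.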

  19. The Usage of Association Rule Mining to Identify Influencing Factors on Deafness After Birth.

    PubMed

    Shahraki, Azimeh Danesh; Safdari, Reza; Gahfarokhi, Hamid Habibi; Tahmasebian, Shahram

    2015-12-01

    Providing complete, high-quality health care services plays an important role in enabling people to understand the factors related to personal and social health and to choose suitable healthy behaviors in order to achieve a healthy life. For this reason, demographic and clinical data about individuals are being collected, and this huge volume of data is a valuable resource for analysis, exploration and the discovery of valuable information. This study used association rule techniques from data mining to identify the factors affecting hearing loss after birth in Iran. The survey is a data-oriented study. The study population consisted of questionnaires from several provinces of the country. First, all questionnaire data were stored as an information table in SQL Server, with data entry performed using software written in C# .NET; then the Association algorithm in SQL Server Data Tools and Clementine software was applied to determine the rules and hidden patterns in the gathered data. Two factors, the number of deaf brothers and the degree of consanguinity of the parents, have a significant impact on the severity of deafness of individuals. Also, when the severity of hearing loss is greater than or equal to moderately severe, people use hearing aids, and men are less interested in using hearing aids. In fact, it can be said that in families where the parents are in a consanguineous marriage as first-degree (girl/boy cousins) or 2nd-degree relatives (girl/boy cousins), and especially first-degree, there are more people with severe hearing loss or deafness, and that for the use of hearing aids, the gender of the patient is more important than the severity of the hearing loss.

  20. The Usage of Association Rule Mining to Identify Influencing Factors on Deafness After Birth

    PubMed Central

    Shahraki, Azimeh Danesh; Safdari, Reza; Gahfarokhi, Hamid Habibi; Tahmasebian, Shahram

    2015-01-01

    Background: Providing complete, high-quality health care services plays an important role in enabling people to understand the factors related to personal and social health and to choose suitable healthy behaviors in order to achieve a healthy life. For this reason, demographic and clinical data about individuals are being collected, and this huge volume of data is a valuable resource for analysis, exploration and the discovery of valuable information. This study used association rule techniques from data mining to identify the factors affecting hearing loss after birth in Iran. Materials and Methods: The survey is a data-oriented study. The study population consisted of questionnaires from several provinces of the country. First, all questionnaire data were stored as an information table in SQL Server, with data entry performed using software written in C# .NET; then the Association algorithm in SQL Server Data Tools and Clementine software was applied to determine the rules and hidden patterns in the gathered data. Findings: Two factors, the number of deaf brothers and the degree of consanguinity of the parents, have a significant impact on the severity of deafness of individuals. Also, when the severity of hearing loss is greater than or equal to moderately severe, people use hearing aids, and men are less interested in using hearing aids. Conclusion: In fact, it can be said that in families where the parents are in a consanguineous marriage as first-degree (girl/boy cousins) or 2nd-degree relatives (girl/boy cousins), and especially first-degree, there are more people with severe hearing loss or deafness, and that for the use of hearing aids, the gender of the patient is more important than the severity of the hearing loss. PMID:26862245
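
    The two quantities that association rule mining typically ranks rules by, support and confidence, can be sketched as below. This is a generic illustration, not the SQL Server or Clementine implementation used in the study; the item names (`factor_a`, `outcome_x`, and so on) and the records are invented placeholders and carry no clinical meaning.

```python
# Hypothetical toy records: each record is the set of items observed together.
records = [
    {"factor_a", "factor_b", "outcome_x"},
    {"factor_a", "outcome_x"},
    {"factor_b"},
    {"factor_a", "factor_b", "outcome_x"},
    {"factor_b", "outcome_y"},
]

def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= record for record in records) / len(records)

def confidence(antecedent, consequent, records):
    """Estimate of P(consequent | antecedent) over the records."""
    base = support(antecedent, records)
    if base == 0:
        return 0.0  # the rule never fires, so its confidence is undefined; report 0
    return support(set(antecedent) | set(consequent), records) / base

# Rule "factor_a -> outcome_x": how often it applies and how often it holds.
print(support({"factor_a", "outcome_x"}, records))
print(confidence({"factor_a"}, {"outcome_x"}, records))
```

    A rule miner enumerates candidate itemsets and keeps the rules whose support and confidence exceed user-chosen thresholds; the thresholds used in the study are not stated in the abstract.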

  1. An efficient data preprocessing approach for large scale medical data mining.

    PubMed

    Hu, Ya-Han; Lin, Wei-Chao; Tsai, Chih-Fong; Ke, Shih-Wen; Chen, Chih-Wen

    2015-01-01

    The size of medical datasets is usually very large, which directly affects the computational cost of the data mining process. Instance selection is a data preprocessing step in the knowledge discovery process, which can be employed to reduce storage requirements while also maintaining the mining quality. This process aims to filter out outliers (or noisy data) from a given (training) dataset. However, when the dataset is very large, more time is required to accomplish the instance selection task. In this paper, we introduce an efficient data preprocessing approach (EDP), which is composed of two steps. The first step is based on training a model over a small amount of training data after performing instance selection. In the second step, the model is used to identify the rest of the large amount of training data. Experiments are conducted on two medical datasets for breast cancer and protein homology prediction problems that contain over 100000 data samples. In addition, three well-known instance selection algorithms are used: IB3, DROP3, and genetic algorithms. Three popular classification techniques are used to construct the learning models for comparison, namely the CART decision tree, k-nearest neighbor (k-NN), and support vector machine (SVM). The results show that our proposed approach not only reduces the computational cost by nearly a factor of two or three over three other state-of-the-art algorithms, but also maintains the final classification accuracy. Directly executing existing instance selection algorithms over large scale medical datasets requires a large computational cost. Our proposed EDP approach solves this problem by training a learning model to recognize good and noisy data. Considering both computational complexity and final classification accuracy, the proposed EDP has demonstrated its efficiency and effectiveness in the large scale instance selection problem.
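
    The two-step idea described above can be sketched roughly as follows. This is an illustrative reading of the abstract, not the authors' implementation: an edited-nearest-neighbour rule stands in for IB3/DROP3/genetic selection, a 1-NN lookup stands in for the learned "good vs noisy" model, and the names `enn_flags` and `edp_filter` are hypothetical.

```python
import math

def knn_indices(point, data, k, skip=None):
    """Indices of the k instances in `data` closest to `point` (Euclidean)."""
    order = sorted(
        (i for i in range(len(data)) if i != skip),
        key=lambda i: math.dist(point, data[i][0]),
    )
    return order[:k]

def enn_flags(data, k=3):
    """Stand-in instance selection: an instance is 'good' (True) when its
    label matches the majority label of its k nearest neighbours."""
    flags = []
    for i, (x, y) in enumerate(data):
        votes = [data[j][1] for j in knn_indices(x, data, k, skip=i)]
        majority = max(set(votes), key=votes.count)
        flags.append(majority == y)
    return flags

def edp_filter(data, subset_size, k=3):
    """Step 1: run instance selection on a small subset only.
    Step 2: a 1-NN 'good vs noisy' model, trained on that subset,
    decides which of the remaining instances to keep."""
    subset, rest = data[:subset_size], data[subset_size:]
    flags = enn_flags(subset, k)
    kept = [inst for inst, ok in zip(subset, flags) if ok]
    for x, y in rest:
        # Each instance is described by its features plus its numeric label,
        # so the model can tell plausible from implausible feature/label pairs.
        nearest = min(
            range(len(subset)),
            key=lambda i: math.dist(x + (y,), subset[i][0] + (subset[i][1],)),
        )
        if flags[nearest]:
            kept.append((x, y))
    return kept

# Hypothetical 2-D data: (features as a tuple, numeric class label).
data = [
    ((0.0, 0.0), 0), ((0.1, 0.0), 0), ((0.0, 0.1), 0), ((0.1, 0.1), 1),
    ((5.0, 5.0), 1), ((5.1, 5.0), 1), ((5.0, 5.1), 1), ((5.1, 5.1), 1),
    ((0.05, 0.05), 0), ((0.05, 0.06), 1), ((5.05, 5.05), 1),
]
cleaned = edp_filter(data, subset_size=8)
```

    The expensive neighbour search inside the selection step touches only the small subset; every remaining instance costs a single pass over that subset, which is where the saving in this sketch comes from.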

  2. An enhanced stream mining approach for network anomaly detection

    NASA Astrophysics Data System (ADS)

    Bellaachia, Abdelghani; Bhatt, Rajat

    2005-03-01

    Network anomaly detection is one of the hot topics in the market today. Currently, researchers are trying to find a way in which machines could automatically learn both normal and anomalous behavior and thus detect anomalies if and when they occur. The most important applications that could spring from these systems are intrusion detection and spam mail detection. In this paper, the primary focus is on the problem and solution of "real time" network intrusion detection, although the underlying theory discussed may also be used for other applications of anomaly detection (such as spam detection or spyware detection). Since a machine needs a learning process of its own, data mining has been chosen as the preferred technique. The objective of this paper is to present a real-time clustering system, which we call Enhanced Stream Mining (ESM), that can analyze packet information (headers and data) to determine intrusions.

  3. An Approach to Realizing Process Control for Underground Mining Operations of Mobile Machines.

    PubMed

    Song, Zhen; Schunnesson, Håkan; Rinne, Mikael; Sturgul, John

    2015-01-01

    The excavation and production in underground mines are complicated processes which consist of many different operations. The process of underground mining is considerably constrained by the geometry and geology of the mine. The various mining operations are normally performed in series at each working face. The delay of a single operation will lead to a domino effect, thus delaying the starting time of the next process and the completion time of the entire process. This paper presents a new approach to process control for underground mining operations, e.g. drilling, bolting, mucking. This approach can estimate the working time and its probability for each operation more efficiently and objectively by improving the existing PERT (Program Evaluation and Review Technique) and CPM (Critical Path Method). If the delay of the critical operation (which is on a critical path) inevitably affects the productivity of mined ore, the approach can rapidly assign mucking machines new jobs to increase this amount to a maximum level by using a new mucking algorithm under external constraints.
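
    As background, the classical PERT three-point estimate that the paper builds on can be sketched as below. This shows only the textbook formulas, not the authors' improved method; the operation names come from the abstract, but the durations are invented for illustration, and treating the serial cycle as normally distributed with independent operations is an assumption of this sketch.

```python
import math

def pert_estimate(optimistic, most_likely, pessimistic):
    """Classical PERT expected duration and variance for one operation."""
    expected = (optimistic + 4 * most_likely + pessimistic) / 6
    variance = ((pessimistic - optimistic) / 6) ** 2
    return expected, variance

def serial_cycle(operations):
    """Operations performed in series: expected times and variances add
    (assuming the operations are independent)."""
    estimates = [pert_estimate(*op) for op in operations]
    return sum(e for e, _ in estimates), sum(v for _, v in estimates)

def prob_finished_by(deadline, expected, variance):
    """Normal-approximation probability that the cycle ends by `deadline`."""
    if variance == 0:
        return 1.0 if deadline >= expected else 0.0
    z = (deadline - expected) / math.sqrt(variance)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical (optimistic, most likely, pessimistic) durations in hours.
cycle = [
    (2, 3, 4),  # drilling
    (1, 2, 3),  # bolting
    (2, 4, 6),  # mucking
]
expected, variance = serial_cycle(cycle)
print(expected, prob_finished_by(10, expected, variance))
```

    Because the operations run in series at a working face, a late finish in any one of them shifts the whole cycle, which is the domino effect the abstract describes.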

  4. An Approach to Realizing Process Control for Underground Mining Operations of Mobile Machines

    PubMed Central

    Song, Zhen; Schunnesson, Håkan; Rinne, Mikael; Sturgul, John

    2015-01-01

    The excavation and production in underground mines are complicated processes which consist of many different operations. The process of underground mining is considerably constrained by the geometry and geology of the mine. The various mining operations are normally performed in series at each working face. The delay of a single operation will lead to a domino effect, thus delaying the starting time of the next process and the completion time of the entire process. This paper presents a new approach to process control for underground mining operations, e.g. drilling, bolting, mucking. This approach can estimate the working time and its probability for each operation more efficiently and objectively by improving the existing PERT (Program Evaluation and Review Technique) and CPM (Critical Path Method). If the delay of the critical operation (which is on a critical path) inevitably affects the productivity of mined ore, the approach can rapidly assign mucking machines new jobs to increase this amount to a maximum level by using a new mucking algorithm under external constraints. PMID:26062092

  5. Data mining approaches for information retrieval from genomic databases

    NASA Astrophysics Data System (ADS)

    Liu, Donglin; Singh, Gautam B.

    2000-04-01

    Sequence retrieval in genomic databases is used for finding sequences related to a query sequence specified by a user. Comparison is the main part of the retrieval system in genomic databases. An efficient sequence comparison algorithm is critical in bioinformatics. There are several different algorithms to perform sequence comparison, such as suffix array based database search, divergence measurement, methods that rely upon the existence of a local similarity between the query sequence and sequences in the database, or common mutual information between the query and sequences in the database. In this paper we describe a new method for DNA sequence retrieval based on data mining techniques. Data mining tools generally find patterns among data and have been successfully applied in industry to improve marketing, sales, and customer support operations. We have applied descriptive data mining techniques to find relevant patterns that are significant for comparing genetic sequences. A relevance feedback score based on common patterns is developed and employed to compute the distance between sequences. The contigs of human chromosomes are used to test the retrieval accuracy, and the experimental results are presented.

  6. Identifying the Educationally Influential Physician: A Systematic Review of Approaches

    ERIC Educational Resources Information Center

    Kronberger, Matthew P.; Bakken, Lori L.

    2011-01-01

    Introduction: Previous studies have indicated that educationally influential physicians' (EIPs) interactions with peers can lead to practice changes and improved patient outcomes. However, multiple approaches have been used to identify and investigate EIPs' informal or formal influence on practice, which creates study outcomes that are difficult…

  8. Identifying novel biomarkers in sarcoidosis using genome-based approaches

    PubMed Central

    Knox, Kenneth S.; Garcia, Joe G.N.

    2015-01-01

    Synopsis: We briefly review conventional biomarkers used clinically to 1) support a diagnosis and 2) monitor disease progression in patients with sarcoidosis. We describe potential new biomarkers identified by genome-wide screening and the approaches to discover these biomarkers. PMID:26593137

  9. A Feature Mining Based Approach for the Classification of Text Documents into Disjoint Classes.

    ERIC Educational Resources Information Center

    Nieto Sanchez, Salvador; Triantaphyllou, Evangelos; Kraft, Donald

    2002-01-01

    Proposes a new approach for classifying text documents into two disjoint classes. Highlights include a brief overview of document clustering; a data mining approach called the One Clause at a Time (OCAT) algorithm which is based on mathematical logic; vector space model (VSM); and comparing the OCAT to the VSM. (Author/LRW)

  11. THE FUTURE OF COMPUTER-BASED TOXICITY PREDICTION: MECHANISM-BASED MODELS VS. INFORMATION MINING APPROACHES

    EPA Science Inventory


    When we speak of computer-based toxicity prediction, we are generally referring to a broad array of approaches which rely primarily upon chemical structure ...

  12. Dual-band, infrared buried mine detection using a statistical pattern recognition approach

    SciTech Connect

    Buhl, M.R.; Hernandez, J.E.; Clark, G.A.; Sengupta, S.K.

    1993-08-01

    The main objective of this work was to detect surrogate land mines, which were buried in clay and sand, using dual-band, infrared images. A statistical pattern recognition approach was used to achieve this objective. This approach is discussed and results of applying it to real images are given.

  14. Genetic and proteomic approaches to identify cancer drug targets

    PubMed Central

    Roti, G; Stegmaier, K

    2012-01-01

    While target-based small-molecule discovery has taken centre-stage in the pharmaceutical industry, there are many cancer-promoting proteins not easily addressed with a traditional target-based screening approach. In order to address this problem, as well as to identify modulators of biological states in the absence of knowing the protein target of the state switch, alternative phenotypic screening approaches, such as gene expression-based and high-content imaging, have been developed. With this renewed interest in phenotypic screening, however, comes the challenge of identifying the binding protein target(s) of small-molecule hits. Emerging technologies have the potential to improve the process of target identification. In this review, we discuss the application of genomic (gene expression-based), genetic (short hairpin RNA and open reading frame screening), and proteomic approaches to protein target identification. PMID:22166799

  15. Approaches to identify kinase dependencies in cancer signalling networks.

    PubMed

    Dermit, Maria; Dokal, Arran; Cutillas, Pedro R

    2017-09-01

    Cells integrate extracellular signals into appropriate responses through a complex network of biochemical reactions driven by the activity of protein and lipid kinases, among other proteins. In order to understand this complexity, new approaches, both experimental and computational, have recently been developed with the aim to identify regulatory kinases and infer their activation status in the context of their signalling network. Here, we review such approaches with particular focus on those based on phosphoproteomics. Integration of kinase activity measurements inferred from phosphoproteomics data with other 'omics' datasets is starting to be used to identify regulatory nodes in biochemical networks. These methodologies may in the future be used to identify patient-specific targets and thus advance personalised cancer medicine. © 2017 Federation of European Biochemical Societies.

  16. Genomic approaches to identifying targets for treating β hemoglobinopathies.

    PubMed

    Ngo, Duyen A; Steinberg, Martin H

    2015-07-29

    Sickle cell disease and β thalassemia are common severe diseases with little effective pathophysiologically-based treatment. Their phenotypic heterogeneity prompted genomic approaches to identify modifiers that ultimately might be exploited therapeutically. Fetal hemoglobin (HbF) is the major modulator of the phenotype of the β hemoglobinopathies. HbF inhibits deoxyHbS polymerization and in β thalassemia compensates for the reduction of HbA. The major success of genomics has been a better understanding of the genetic regulation of HbF through identification of the major quantitative trait loci for this trait. If the targets identified can lead to means of increasing HbF to therapeutic levels in sufficient numbers of sickle or β-thalassemia erythrocytes, the pathophysiology of these diseases would be reversed. The availability of new target loci, high-throughput drug screening, and recent advances in genome editing provide the opportunity for new approaches to therapeutically increasing HbF production.

  17. A Computer Vision Approach to Identify Einstein Rings and Arcs

    NASA Astrophysics Data System (ADS)

    Lee, Chien-Hsiu

    2017-03-01

    Einstein rings are rare gems of strong lensing phenomena; the ring images can be used to probe the underlying lens gravitational potential at every position angle, tightly constraining the lens mass profile. In addition, the magnified images also enable us to probe high-z galaxies with enhanced resolution and signal-to-noise ratios. However, only a handful of Einstein rings have been reported, either from serendipitous discoveries or from visual inspections of hundreds of thousands of massive galaxies or galaxy clusters. In the era of large sky surveys, an automated approach to identifying ring patterns in the big data to come is in high demand. Here, we present an Einstein ring recognition approach based on computer vision techniques. The workhorse is the circle Hough transform, which recognises circular patterns or arcs in the images. We propose a two-tier approach: first pre-selecting massive galaxies associated with multiple blue objects as possible lenses, then using the Hough transform to identify circular patterns. As a proof of concept, we apply our approach to SDSS, with high completeness, albeit with low purity. We also apply our approach to other lenses in the DES, HSC-SSP, and UltraVISTA surveys, illustrating the versatility of our approach.
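
    The voting step at the heart of a circle Hough transform can be sketched as below. This is a minimal illustration for a single known radius on integer pixel coordinates, not the pipeline used in the paper; the function names and the synthetic "ring" points are assumptions made for the example.

```python
import math
from collections import Counter

def hough_circle_votes(edge_points, radius, n_angles=360):
    """Accumulate centre votes: each edge point votes once for every pixel
    that lies (approximately) at distance `radius` from it."""
    votes = Counter()
    for x, y in edge_points:
        cells = set()
        for step in range(n_angles):
            theta = 2 * math.pi * step / n_angles
            cells.add((round(x - radius * math.cos(theta)),
                       round(y - radius * math.sin(theta))))
        for cell in cells:  # a set, so one vote per pixel per edge point
            votes[cell] += 1
    return votes

def best_circle(edge_points, radius):
    """Return the most-voted centre and its vote count."""
    centre, count = hough_circle_votes(edge_points, radius).most_common(1)[0]
    return centre, count

# Synthetic ring: integer pixels lying exactly on a circle of radius 5
# centred at (10, 10). These points are made up for the illustration.
offsets = [(5, 0), (-5, 0), (0, 5), (0, -5),
           (3, 4), (3, -4), (-3, 4), (-3, -4),
           (4, 3), (4, -3), (-4, 3), (-4, -3)]
ring = [(10 + dx, 10 + dy) for dx, dy in offsets]
print(best_circle(ring, radius=5))
```

    Edge points from an incomplete arc still vote for the same centre, only with a lower peak, which is why the transform suits partial rings as well as full ones. A practical search would also scan over candidate radii and apply a vote threshold.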

  18. Fuzzy OLAP association rules mining-based modular reinforcement learning approach for multiagent systems.

    PubMed

    Kaya, Mehmet; Alhajj, Reda

    2005-04-01

    Multiagent systems and data mining have recently attracted considerable attention in the field of computing. Reinforcement learning is the most commonly used learning process for multiagent systems. However, it still has some drawbacks: other learning agents present in the domain are modeled as part of the state of the environment, some states are experienced much less than others, and some state-action pairs are never visited during the learning phase. Further, before completing the learning process, an agent cannot exhibit a certain behavior in some states that may have been experienced sufficiently. In this study, we propose a novel multiagent learning approach to handle these problems. Our approach is based on utilizing the mining process for modular cooperative learning systems. It incorporates fuzziness and online analytical processing (OLAP) based mining to effectively process the information reported by agents. First, we describe a fuzzy data cube OLAP architecture which facilitates effective storage and processing of the state information reported by agents. This way, the action of the other agent, even if it is not in the visual environment of the agent under consideration, can simply be predicted by extracting online association rules, a well-known data mining technique, from the constructed data cube. Second, we present a new action selection model, which is also based on association rules mining. Finally, we generalize states that have not been sufficiently experienced by mining multilevel association rules from the proposed fuzzy data cube. Experimental results obtained on two different versions of a well-known pursuit domain show the robustness and effectiveness of the proposed fuzzy OLAP mining based modular learning approach. We also tested the scalability of the approach presented in this paper and compared it with our previous work on modular-fuzzy Q-learning and ordinary Q-learning.

  19. PM2: a partitioning-mining-measuring method for identifying progressive changes in older adults' sleeping activity.

    PubMed

    Lin, Qiang; Zhang, Daqing; Connelly, Kay; Zhou, Xingshe; Ni, Hongbo

    2014-01-01

    As people age, their health typically declines, resulting in difficulty in performing daily activities. Sleep-related problems are common issues with older adults, including shifts in circadian rhythms. A detection method is proposed to identify progressive changes in sleeping activity using a three-step process: partitioning, mining, and measuring. Specifically, the original spatiotemporal representation of each sleeping activity instance was first transformed into a sequence of equal-sized segments, or symbols, via a partitioning process. A data-mining-based algorithm was proposed to find symbols that are not present in all instances of a sleeping activity. Finally, a measuring process was responsible for evaluating the changes in these symbols. Experimental evaluation conducted on a group of datasets of older adults showed that the proposed method is able to identify progressive changes in sleeping activity.

  20. A Visualization System Using Data Mining Techniques for Identifying Information Sources.

    ERIC Educational Resources Information Center

    Fowler, Richard H.; Karadayi, Tarkan; Chen, Zhixiang; Meng, Xiannong; Fowler, Wendy A. Lawrence

    The Visual Analysis System (VAS) was developed to couple emerging successes in data mining with information visualization techniques in order to create a richly interactive environment for information retrieval from the World Wide Web. VAS's retrieval strategy operates by first using a conventional search engine to form a core set of retrieved…

  1. Similarity transformation approach to identifiability analysis of nonlinear compartmental models.

    PubMed

    Vajda, S; Godfrey, K R; Rabitz, H

    1989-04-01

    Through use of the local state isomorphism theorem instead of the algebraic equivalence theorem of linear systems theory, the similarity transformation approach is extended to nonlinear models, resulting in finitely verifiable sufficient and necessary conditions for global and local identifiability. The approach requires testing of certain controllability and observability conditions, but in many practical examples these conditions prove very easy to verify. In principle the method also involves nonlinear state variable transformations, but in all of the examples presented in the paper the transformations turn out to be linear. The method is applied to an unidentifiable nonlinear model and a locally identifiable nonlinear model, and these are the first nonlinear models other than bilinear models where the reason for lack of global identifiability is nontrivial. The method is also applied to two models with Michaelis-Menten elimination kinetics, both of considerable importance in pharmacokinetics, and for both of which the complicated nature of the algebraic equations arising from the Taylor series approach has hitherto defeated attempts to establish identifiability results for specific input functions.

  2. A text mining approach to the prediction of disease status from clinical discharge summaries.

    PubMed

    Yang, Hui; Spasic, Irena; Keane, John A; Nenadic, Goran

    2009-01-01

    OBJECTIVE: The authors present a system developed for the Challenge in Natural Language Processing for Clinical Data (the i2b2 obesity challenge), whose aim was to automatically identify the status of obesity and 15 related co-morbidities in patients using their clinical discharge summaries. The challenge consisted of two tasks, textual and intuitive. The textual task was to identify explicit references to the diseases, whereas the intuitive task focused on the prediction of the disease status when the evidence was not explicitly asserted. DESIGN: The authors assembled a set of resources to lexically and semantically profile the diseases and their associated symptoms, treatments, etc. These features were explored in a hybrid text mining approach, which combined dictionary look-up, rule-based, and machine-learning methods. MEASUREMENTS: The methods were applied on a set of 507 previously unseen discharge summaries, and the predictions were evaluated against a manually prepared gold standard. The overall ranking of the participating teams was primarily based on the macro-averaged F-measure. RESULTS: The implemented method achieved a macro-averaged F-measure of 81% for the textual task (which was the highest achieved in the challenge) and 63% for the intuitive task (ranked 7th out of 28 teams; the highest was 66%). The micro-averaged F-measure showed an average accuracy of 97% for textual and 96% for intuitive annotations. CONCLUSIONS: The performance achieved was in line with the agreement between human annotators, indicating the potential of text mining for accurate and efficient prediction of disease statuses from clinical discharge summaries.
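
    The gap between the macro- and micro-averaged F-measures reported above follows from how the two averages weight classes, which the sketch below illustrates. It is a generic single-label computation, not the challenge's official scorer; the class labels and predictions are hypothetical.

```python
def f1(tp, fp, fn):
    """F-measure from true positives, false positives and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_micro_f1(gold, predicted):
    """Macro average: mean of per-class F1, so every class counts equally.
    Micro average: F1 over pooled counts, so every document counts equally."""
    labels = sorted(set(gold) | set(predicted))
    counts = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        counts.append((tp, fp, fn))
    macro = sum(f1(*c) for c in counts) / len(labels)
    micro = f1(*(sum(column) for column in zip(*counts)))
    return macro, micro

# Hypothetical annotations: one frequent class and two rare ones.
gold = ["present", "present", "present", "present", "absent", "unknown"]
pred = ["present", "present", "present", "present", "absent", "absent"]
macro, micro = macro_micro_f1(gold, pred)
print(round(macro, 3), round(micro, 3))
```

    A single error on a rare class pulls the macro average down sharply while barely moving the micro average, which in this single-label setting equals plain accuracy.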

  3. Assessing Weather-Yield Relationships in Rice at Local Scale Using Data Mining Approaches

    PubMed Central

    Delerce, Sylvain; Dorado, Hugo; Grillon, Alexandre; Rebolledo, Maria Camila; Prager, Steven D.; Patiño, Victor Hugo; Garcés Varón, Gabriel; Jiménez, Daniel

    2016-01-01

    Seasonal and inter-annual climate variability have become important issues for farmers, and climate change has been shown to increase them. Simultaneously farmers and agricultural organizations are increasingly collecting observational data about in situ crop performance. Agriculture thus needs new tools to cope with changing environmental conditions and to take advantage of these data. Data mining techniques make it possible to extract embedded knowledge associated with farmer experiences from these large observational datasets in order to identify best practices for adapting to climate variability. We introduce new approaches through a case study on irrigated and rainfed rice in Colombia. Preexisting observational datasets of commercial harvest records were combined with in situ daily weather series. Using Conditional Inference Forest and clustering techniques, we assessed the relationships between climatic factors and crop yield variability at the local scale for specific cultivars and growth stages. The analysis showed clear relationships in the various location-cultivar combinations, with climatic factors explaining 6 to 46% of spatiotemporal variability in yield, and with crop responses to weather being non-linear and cultivar-specific. Climatic factors affected cultivars differently during each stage of development. For instance, one cultivar was affected by high nighttime temperatures in the reproductive stage but responded positively to accumulated solar radiation during the ripening stage. Another was affected by high nighttime temperatures during both the vegetative and reproductive stages. Clustering of the weather patterns corresponding to individual cropping events revealed different groups of weather patterns for irrigated and rainfed systems with contrasting yield levels. Best-suited cultivars were identified for some weather patterns, making weather-site-specific recommendations possible. This study illustrates the potential of data mining for

  4. Assessing Weather-Yield Relationships in Rice at Local Scale Using Data Mining Approaches.

    PubMed

    Delerce, Sylvain; Dorado, Hugo; Grillon, Alexandre; Rebolledo, Maria Camila; Prager, Steven D; Patiño, Victor Hugo; Garcés Varón, Gabriel; Jiménez, Daniel

    2016-01-01

    Seasonal and inter-annual climate variability have become important issues for farmers, and climate change has been shown to increase them. Simultaneously farmers and agricultural organizations are increasingly collecting observational data about in situ crop performance. Agriculture thus needs new tools to cope with changing environmental conditions and to take advantage of these data. Data mining techniques make it possible to extract embedded knowledge associated with farmer experiences from these large observational datasets in order to identify best practices for adapting to climate variability. We introduce new approaches through a case study on irrigated and rainfed rice in Colombia. Preexisting observational datasets of commercial harvest records were combined with in situ daily weather series. Using Conditional Inference Forest and clustering techniques, we assessed the relationships between climatic factors and crop yield variability at the local scale for specific cultivars and growth stages. The analysis showed clear relationships in the various location-cultivar combinations, with climatic factors explaining 6 to 46% of spatiotemporal variability in yield, and with crop responses to weather being non-linear and cultivar-specific. Climatic factors affected cultivars differently during each stage of development. For instance, one cultivar was affected by high nighttime temperatures in the reproductive stage but responded positively to accumulated solar radiation during the ripening stage. Another was affected by high nighttime temperatures during both the vegetative and reproductive stages. Clustering of the weather patterns corresponding to individual cropping events revealed different groups of weather patterns for irrigated and rainfed systems with contrasting yield levels. Best-suited cultivars were identified for some weather patterns, making weather-site-specific recommendations possible. This study illustrates the potential of data mining for

  5. What are the important flood damage-influencing parameters? A data mining approach

    NASA Astrophysics Data System (ADS)

    Merz, B.; Kreibich, H.; Lall, U.

    2012-04-01

    Today's approaches for assessing and modeling direct flood damages are not very advanced. The usual approach consists of stage-damage functions which relate the relative or absolute damage for a certain class of objects to the inundation depth. Other characteristics of the flooding situation and of the flooded object are rarely taken into account, although flood damage is influenced by a variety of factors. In this contribution we apply a group of data-mining techniques, known as tree-structured models, to flood damage assessment. Tree-structured models are attractive candidates for identifying important damage-influencing parameters in large damage data sets and for describing quantitatively the non-linear interactions between damage and damage-influencing parameters. A very comprehensive data set of more than 2000 damage records of private households in Germany is used. Each record contains details about a variety of potential damage-influencing characteristics, such as hydrological and hydraulic aspects of the flooding situation, state of precaution of the household, early warning and emergency measures undertaken, and socio-economic status of the household. Tree-structured models are used to derive the dominating damage-influencing variables and their (non-linear) interactions. We show that they are a flexible and powerful alternative to traditional damage assessment approaches.
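
    The core step of a tree-structured model, choosing the split that best separates damage values, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the records, the feature names (`depth_m`, `precaution`) and the helper functions are hypothetical, and only a single split per feature is evaluated rather than a full tree.

```python
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(records, feature, target):
    """Return (threshold, total SSE) of the best binary split on `feature`."""
    values = sorted({r[feature] for r in records})
    best = None
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r[target] for r in records if r[feature] <= threshold]
        right = [r[target] for r in records if r[feature] > threshold]
        score = sse(left) + sse(right)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


def rank_features(records, features, target):
    """Rank features by the SSE reduction achieved by their best single split."""
    total = sse([r[target] for r in records])
    gains = {}
    for f in features:
        split = best_split(records, f, target)
        gains[f] = 0.0 if split is None else total - split[1]
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy records (invented for illustration, not from the study).
RECORDS = [
    {"depth_m": 0.2, "precaution": 1, "damage": 0.05},
    {"depth_m": 0.4, "precaution": 0, "damage": 0.08},
    {"depth_m": 0.5, "precaution": 1, "damage": 0.07},
    {"depth_m": 1.5, "precaution": 0, "damage": 0.40},
    {"depth_m": 1.8, "precaution": 1, "damage": 0.35},
    {"depth_m": 2.2, "precaution": 0, "damage": 0.50},
]

ranking = rank_features(RECORDS, ["depth_m", "precaution"], "damage")
print(ranking)
```

    In this toy data the inundation depth yields the larger error reduction, which is the sense in which a tree "identifies" a dominant damage-influencing variable; a full tree would repeat the search recursively inside each branch to capture interactions.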

  6. GTA: a game theoretic approach to identifying cancer subnetwork markers.

    PubMed

    Farahmand, S; Goliaei, S; Ansari-Pour, N; Razaghi-Moghadam, Z

    2016-03-01

    The identification of genetic markers (e.g. genes, pathways and subnetworks) for cancer has been one of the most challenging research areas in recent years. A subset of these studies attempts to analyze genome-wide expression profiles to identify markers with high reliability and reusability across independent whole-transcriptome microarray datasets. Therefore, the functional relationships of genes are integrated with their expression data. However, for a more accurate representation of the functional relationships among genes, utilization of the protein-protein interaction network (PPIN) seems to be necessary. Herein, a novel game theoretic approach (GTA) is proposed for the identification of cancer subnetwork markers by integrating genome-wide expression profiles and PPIN. The GTA method was applied to three distinct whole-transcriptome breast cancer datasets to identify the subnetwork markers associated with metastasis. To evaluate the performance of our approach, the identified subnetwork markers were compared with gene-based, pathway-based and network-based markers. We show that GTA is not only capable of identifying robust metastatic markers but also provides a higher classification performance. In addition, based on these GTA-based subnetworks, we identified a new bona fide candidate gene for breast cancer susceptibility.

  7. An automatic and effective approach in identifying tower cranes

    NASA Astrophysics Data System (ADS)

    Yu, Bo; Niu, Zheng; Wang, Li; Liu, Yaqi

    2012-04-01

    A method which can distinguish tower cranes from other objects in an image is proposed in this paper. It combines the advantages of both morphological theory and geometrical characters to identify tower cranes accurately. The algorithm uses morphological theory to remove noise and segment images. Moreover, geometrical characters are adopted to extract tower cranes with thresholds. To test the algorithm's practical applicability, we apply it to another image to check the result. The experiments show that the approach can locate the position of tower cranes precisely and count the cranes with a 100% accuracy rate. It can be applied to identifying tower cranes in small regions.

  8. Determining the familial risk distribution of colorectal cancer: A data mining approach

    PubMed Central

    Chau, Rowena; Jenkins, Mark A.; Buchanan, Daniel D.; Ouakrim, Driss Ait; Giles, Graham G.; Casey, Graham; Gallinger, Steven; Haile, Robert W.; Le Marchand, Loic; Newcomb, Polly A.; Lindor, Noralane M.; Hopper, John L.; Win, Aung Ko

    2016-01-01

    This study aimed to characterize the distribution of colorectal cancer risk using family history of cancers by data mining. Family histories for 10,066 colorectal cancer cases recruited to population cancer registries of the Colon Cancer Family Registry were analyzed using a data mining framework. A novel index was developed to quantify familial cancer aggregation. An artificial neural network was used to identify distinct categories of familial risk. Standardized incidence ratios (SIRs) and corresponding 95% confidence intervals (CIs) of colorectal cancer were calculated for each category. We identified five major and sixty-six minor categories of familial risk for developing colorectal cancer. The distribution of the major risk categories was: (i) 7% of families (SIR=7.11; 95%CI=6.65–7.59) had a strong family history of colorectal cancer; (ii) 13% of families (SIR=2.94; 95%CI=2.78–3.10) had a moderate family history of colorectal cancer; (iii) 11% of families (SIR=1.23; 95%CI=1.12–1.36) had a strong family history of breast cancer and weak family history of colorectal cancer; (iv) 9% of families (SIR=1.06; 95%CI=0.96–1.18) had a strong family history of prostate cancer and a weak family history of colorectal cancer; and (v) 60% of families (SIR=0.61; 95%CI=0.57–0.65) had a weak family history of all cancers. There is a wide variation of colorectal cancer risk that can be categorized by family history of cancer, with a strong gradient of colorectal cancer risk between the highest and lowest risk categories. The risk of colorectal cancer for people in the highest risk category of family history (7% of the population) was 12 times that for people in the lowest risk category (60% of the population). Data mining proved to be an effective approach for gaining insight into the underlying cancer aggregation patterns and for categorizing familial risk of colorectal cancer. PMID:26681340
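
    The "12 times" figure can be checked from the numbers quoted in the abstract. The short sketch below assumes that the comparison is simply the ratio of the two SIRs (both being relative to the same reference population); the category labels and the `relative_risk` helper are illustrative names, not taken from the paper.

```python
# Figures quoted in the abstract: (share of families in %, SIR) per major category.
CATEGORIES = {
    "strong CRC history": (7, 7.11),
    "moderate CRC history": (13, 2.94),
    "strong breast, weak CRC history": (11, 1.23),
    "strong prostate, weak CRC history": (9, 1.06),
    "weak history of all cancers": (60, 0.61),
}


def relative_risk(sir_a, sir_b):
    """Ratio of two SIRs, assuming both use the same reference population."""
    return sir_a / sir_b


# The five major categories should account for all families.
total_share = sum(share for share, _ in CATEGORIES.values())

highest = max(sir for _, sir in CATEGORIES.values())
lowest = min(sir for _, sir in CATEGORIES.values())
ratio = relative_risk(highest, lowest)

print(f"shares sum to {total_share}%")
print(f"highest vs lowest category: {ratio:.2f}x")
```

    The ratio 7.11 / 0.61 is roughly 11.7, which rounds to the 12-fold gradient reported, and the five shares add up to 100% of families.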

  9. Mining the human complexome database identifies RBM14 as an XPO1-associated protein involved in HIV-1 Rev function.

    PubMed

    Budhiraja, Sona; Liu, Hongbing; Couturier, Jacob; Malovannaya, Anna; Qin, Jun; Lewis, Dorothy E; Rice, Andrew P

    2015-04-01

    By recruiting the host protein XPO1 (CRM1), the HIV-1 Rev protein mediates the nuclear export of incompletely spliced viral transcripts. We mined data from the recently described human nuclear complexome to identify a host protein, RBM14, which associates with XPO1 and Rev and is involved in Rev function. Using a Rev-dependent p24 reporter plasmid, we found that RBM14 depletion decreased Rev activity and Rev-mediated enhancement of the cytoplasmic levels of unspliced viral transcripts. RBM14 depletion also reduced p24 expression during viral infection, indicating that RBM14 is limiting for Rev function. RBM14 has previously been shown to localize to nuclear paraspeckles, a structure implicated in retaining unspliced HIV-1 transcripts for either Rev-mediated nuclear export or degradation. We found that depletion of NEAT1 RNA, a long noncoding RNA required for paraspeckle integrity, abolished the ability of overexpressed RBM14 to enhance Rev function, indicating the dependence of RBM14 function on paraspeckle integrity. Our study extends the known host cell interactome of Rev and XPO1 and further substantiates a critical role for paraspeckles in the mechanism of action of Rev. Our study also validates the nuclear complexome as a database from which viral cofactors can be mined. This study mined a database of nuclear protein complexes to identify a cellular protein named RBM14 that is associated with XPO1 (CRM1), a nuclear protein that binds to the HIV-1 Rev protein and mediates nuclear export of incompletely spliced viral RNAs. Functional assays demonstrated that RBM14, a protein found in paraspeckle structures in the nucleus, is involved in HIV-1 Rev function. This study validates the nuclear complexome database as a reference that can be mined to identify viral cofactors. Copyright © 2015, American Society for Microbiology. All Rights Reserved.

  10. Data-mining the FlyAtlas online resource to identify core functional motifs across transporting epithelia

    PubMed Central

    2013-01-01

    Background Comparative analysis of tissue-specific transcriptomes is a powerful technique to uncover tissue functions. Our FlyAtlas.org provides authoritative gene expression levels for multiple tissues of Drosophila melanogaster (1). Although the main use of such resources is single gene lookup, there is the potential for powerful meta-analysis to address questions that could not easily be framed otherwise. Here, we illustrate the power of data-mining of FlyAtlas data by comparing epithelial transcriptomes to identify a core set of highly-expressed genes, across the four major epithelial tissues (salivary glands, Malpighian tubules, midgut and hindgut) of both adults and larvae. Method Parallel hypothesis-led and hypothesis-free approaches were adopted to identify core genes that underpin insect epithelial function. In the former, gene lists were created from transport processes identified in the literature, and their expression profiles mapped from the flyatlas.org online dataset. In the latter, gene enrichment lists were prepared for each epithelium, and genes (both transport related and unrelated) consistently enriched in transporting epithelia identified. Results A key set of transport genes, comprising V-ATPases, cation exchangers, aquaporins, potassium and chloride channels, and carbonic anhydrase, was found to be highly enriched across the epithelial tissues, compared with the whole fly. Additionally, a further set of genes that had not been predicted to have epithelial roles were co-expressed with the core transporters, extending our view of what makes a transporting epithelium work. Further insights were obtained by studying the genes uniquely overexpressed in each epithelium; for example, the salivary gland expresses lipases, the midgut organic solute transporters, the tubules specialize for purine metabolism and the hindgut overexpresses still unknown genes. Conclusion Taken together, these data provide a unique insight into epithelial function in this

  11. North American Bats and Mines Project: A cooperative approach for integrating bat conservation and mine-land reclamation

    SciTech Connect

    Ducummon, S.L.

    1997-12-31

    Inactive underground mines now provide essential habitat for more than half of North America's 44 bat species, including some of the largest remaining populations. Thousands of abandoned mines have already been closed or are slated for safety closures, and many are destroyed during renewed mining in historic districts. The available evidence suggests that millions of bats have already been lost due to these closures. Bats are primary predators of night-flying insects that cost American farmers and foresters billions of dollars annually; therefore, threats to bat survival are cause for serious concern. Fortunately, mine closure methods exist that protect both bats and humans. Bat Conservation International (BCI) and the USDI-Bureau of Land Management founded the North American Bats and Mines Project to provide national leadership and coordination to minimize the loss of mine-roosting bats. This partnership has involved federal and state mine-land and wildlife managers and the mining industry. BCI has trained hundreds of mine-land and wildlife managers nationwide in mine assessment techniques for bats and bat-compatible closure methods, published technical information on bats and mine-land management, presented papers on bats and mines at national mining and wildlife conferences, and collaborated with numerous federal, state, and private partners to protect some of the most important mine-roosting bat populations. Our new mining industry initiative, Mining for Habitat, is designed to develop bat habitat conservation and enhancement plans for active mining operations. It includes the creation of cost-effective artificial underground bat roosts using surplus mining materials such as old mine-truck tires and culverts buried beneath waste rock.

  12. Large screen approaches to identify novel malaria vaccine candidates

    PubMed Central

    Davies, D. Huw; Duffy, Patrick; Bodmer, Jean-Luc; Felgner, Philip L.; Doolan, Denise L.

    2016-01-01

    Until recently, malaria vaccine development efforts have focused almost exclusively on a handful of well characterized Plasmodium falciparum antigens. Despite dedicated work by many researchers on different continents spanning more than half a century, a successful malaria vaccine remains elusive. Sequencing of the P. falciparum genome has revealed more than five thousand genes, providing the foundation for systematic approaches to discover candidate vaccine antigens. We are taking advantage of this wealth of information to discover new antigens that may be more effective vaccine targets. Herein, we describe different approaches to large-scale screening of the P. falciparum genome to identify targets of either antibody responses or T cell responses using human specimens collected in Controlled Human Malaria Infections (CHMI) or under conditions of natural exposure in the field. These genome, proteome and transcriptome based approaches offer enormous potential for the development of an efficacious malaria vaccine. PMID:26428458

  13. Systems biology approaches to identify developmental bases for lung diseases.

    PubMed

    Bhattacharya, Soumyaroop; Mariani, Thomas J

    2013-04-01

    A greater understanding of the regulatory processes contributing to lung development could be helpful to identify strategies to ameliorate morbidity and mortality in premature infants and to identify individuals at risk for congenital and/or chronic lung diseases. Over the past decade, genomics technologies have enabled the production of rich gene expression databases providing information for all genes across developmental time or in diseased tissue. These data sets facilitate systems biology approaches for identifying underlying biological modules and programs contributing to the complex processes of normal development and those that may be associated with disease states. The next decade will undoubtedly see rapid and significant advances in redefining both lung development and disease at the systems level.

  14. A Hybrid Approach for Efficient Modeling of Medium-Frequency Propagation in Coal Mines

    PubMed Central

    Brocker, Donovan E.; Sieber, Peter E.; Waynert, Joseph A.; Li, Jingcheng; Werner, Pingjuan L.; Werner, Douglas H.

    2015-01-01

    An efficient procedure for modeling medium frequency (MF) communications in coal mines is introduced. In particular, a hybrid approach is formulated and demonstrated utilizing ideal transmission line equations to model MF propagation in combination with full-wave sections used for accurate simulation of local antenna-line coupling and other near-field effects. This work confirms that the hybrid method accurately models signal propagation from a source to a load for various system geometries and material compositions, while significantly reducing computation time. With such dramatic improvement to solution times, it becomes feasible to perform large-scale optimizations with the primary motivation of improving communications in coal mines both for daily operations and emergency response. Furthermore, it is demonstrated that the hybrid approach is suitable for modeling and optimizing large communication networks in coal mines that may otherwise be intractable to simulate using traditional full-wave techniques such as moment methods or finite-element analysis. PMID:26478686

  15. Social Network Analysis and Mining to Monitor and Identify Problems with Large-Scale Information and Communication Technology Interventions.

    PubMed

    da Silva, Aleksandra do Socorro; de Brito, Silvana Rossy; Vijaykumar, Nandamudi Lankalapalli; da Rocha, Cláudio Alex Jorge; Monteiro, Maurílio de Abreu; Costa, João Crisóstomo Weyl Albuquerque; Francês, Carlos Renato Lisboa

    2016-01-01

    The published literature reveals several arguments concerning the strategic importance of information and communication technology (ICT) interventions for developing countries where the digital divide is a challenge. Large-scale ICT interventions can be an option for countries whose regions, both urban and rural, present a high number of digitally excluded people. Our goal was to monitor and identify problems in interventions aimed at certification for a large number of participants in different geographical regions. Our case study is the training at the Telecentros.BR, a program created in Brazil to install telecenters and certify individuals to use ICT resources. We propose an approach that applies social network analysis and mining techniques to data collected from Telecentros.BR dataset and from the socioeconomics and telecommunications infrastructure indicators of the participants' municipalities. We found that (i) the analysis of interactions in different time periods reflects the objectives of each phase of training, highlighting the increased density in the phase in which participants develop and disseminate their projects; (ii) analysis according to the roles of participants (i.e., tutors or community members) reveals that the interactions were influenced by the center (or region) to which the participant belongs (that is, a community contained mainly members of the same region and always with the presence of tutors, contradicting expectations of the training project, which aimed for intense collaboration of the participants, regardless of the geographic region); (iii) the social network of participants influences the success of the training: that is, given evidence that the degree of the community member is in the highest range, the probability of this individual concluding the training is 0.689; (iv) the North region presented the lowest probability of participant certification, whereas the Northeast, which served municipalities with similar

  16. Social Network Analysis and Mining to Monitor and Identify Problems with Large-Scale Information and Communication Technology Interventions

    PubMed Central

    da Silva, Aleksandra do Socorro; de Brito, Silvana Rossy; Vijaykumar, Nandamudi Lankalapalli; da Rocha, Cláudio Alex Jorge; Monteiro, Maurílio de Abreu; Costa, João Crisóstomo Weyl Albuquerque; Francês, Carlos Renato Lisboa

    2016-01-01

    The published literature reveals several arguments concerning the strategic importance of information and communication technology (ICT) interventions for developing countries where the digital divide is a challenge. Large-scale ICT interventions can be an option for countries whose regions, both urban and rural, present a high number of digitally excluded people. Our goal was to monitor and identify problems in interventions aimed at certification for a large number of participants in different geographical regions. Our case study is the training at the Telecentros.BR, a program created in Brazil to install telecenters and certify individuals to use ICT resources. We propose an approach that applies social network analysis and mining techniques to data collected from Telecentros.BR dataset and from the socioeconomics and telecommunications infrastructure indicators of the participants’ municipalities. We found that (i) the analysis of interactions in different time periods reflects the objectives of each phase of training, highlighting the increased density in the phase in which participants develop and disseminate their projects; (ii) analysis according to the roles of participants (i.e., tutors or community members) reveals that the interactions were influenced by the center (or region) to which the participant belongs (that is, a community contained mainly members of the same region and always with the presence of tutors, contradicting expectations of the training project, which aimed for intense collaboration of the participants, regardless of the geographic region); (iii) the social network of participants influences the success of the training: that is, given evidence that the degree of the community member is in the highest range, the probability of this individual concluding the training is 0.689; (iv) the North region presented the lowest probability of participant certification, whereas the Northeast, which served municipalities with similar

  17. Integrated approach of environmental impact and risk assessment of Rosia Montana Mining Area, Romania.

    PubMed

    Stefănescu, Lucrina; Robu, Brînduşa Mihaela; Ozunu, Alexandru

    2013-11-01

    The environmental impact assessment of mining sites is currently a topic of great interest in Romania. Historical pollution in the Rosia Montana mining area of Romania caused extensive damage to environmental media. This paper has two goals: to investigate the environmental pollution induced by mining activities in the Rosia Montana area and to quantify the environmental impacts and associated risks by means of an integrated approach. Thus, a new method was developed and applied for quantifying the impact of mining activities, taking account of the quality of environmental media in the mining area, and used as a case study in the present paper. The associated risks are a function of the environmental impacts and the probability of their occurrence. The results show that the environmental impacts and quantified risks, based on quality indicators to characterize the environmental quality, are of a higher order, and thus measures for pollution remediation and control need to be considered in the investigated area. The conclusion drawn is that an integrated approach for the assessment of environmental impact and associated risks is a valuable and more objective method, and is an important tool that can be applied in the decision-making process for national authorities in the prioritization of emergency action.
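
    The abstract states only that risk is a function of impact and probability of occurrence; it does not give the functional form. As a hedged illustration, the sketch below assumes the common product form (risk = impact × probability). The environmental media, impact scores and probabilities are invented for the example and do not come from the study.

```python
def risk_score(impact, probability):
    """Risk as impact times probability of occurrence (an assumed form)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return impact * probability


# Hypothetical (impact score, probability) pairs per environmental medium.
MEDIA = {
    "surface water": (8.0, 0.9),
    "soil": (6.0, 0.5),
    "air": (3.0, 0.2),
}

# Rank media from highest to lowest risk to prioritize remediation.
ranked = sorted(MEDIA, key=lambda m: risk_score(*MEDIA[m]), reverse=True)

for medium in ranked:
    print(f"{medium}: {risk_score(*MEDIA[medium]):.2f}")
```

    Under these assumed inputs the ranking gives an ordering for prioritizing action; any other monotone combination of impact and probability could be substituted for the product.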

  18. Quantitative risk-based approach for improving water quality management in mining.

    PubMed

    Liu, Wenying; Moran, Chris J; Vink, Sue

    2011-09-01

    The potential environmental threats posed by freshwater withdrawal and mine water discharge are some of the main drivers for the mining industry to improve water management. The use of multiple sources of water supply and the introduction of water reuse into the mine site water system have been part of the operating philosophies employed by the mining industry to realize these improvements. However, a barrier to implementation of such good water management practices is concomitant water quality variation and the resulting impacts on the efficiency of mineral separation processes, and an increased environmental consequence of noncompliant discharge events. There is an increasing appreciation that conservative water management practices, production efficiency, and environmental consequences are intimately linked through the site water system. It is therefore essential to consider water management decisions and their impacts as an integrated system as opposed to dealing with each impact separately. This paper proposes an approach that could assist mine sites to manage water quality issues in a systematic manner at the system level. This approach can quantitatively forecast the risk associated with water quality and evaluate the effectiveness of management strategies in mitigating the risk by quantifying implications for production and hence economic viability.

  19. Meta-control of combustion performance with a data mining approach

    NASA Astrophysics Data System (ADS)

    Song, Zhe

    Large-scale combustion processes are complex and pose challenges for optimizing their performance. Traditional approaches based on thermal dynamics have limitations in finding optimal operational regions due to the time-shift nature of the process. Recent advances in information technology enable people to collect large volumes of process data easily and continuously. The collected process data contain rich information about the process and, to some extent, represent a digital copy of the process over time. Although large volumes of data exist in industrial combustion processes, they are not fully utilized to the level where the process can be optimized. Data mining is an emerging science which finds patterns or models from large data sets. It has found many successful applications in business marketing, medical and manufacturing domains. The focus of this dissertation is on applying data mining to industrial combustion processes, and ultimately optimizing the combustion performance. However, the philosophy, methods and frameworks discussed in this research can also be applied to other industrial processes. Optimizing an industrial combustion process has two major challenges. One is that the underlying process model changes over time and obtaining an accurate process model is nontrivial. The other is that a process model with high fidelity is usually highly nonlinear, so solving the optimization problem needs efficient heuristics. This dissertation sets out to solve these two major challenges. The major contribution of this 4-year research is the data-driven solution to optimize the combustion process, where process model or knowledge is identified based on the process data, then optimization is executed by evolutionary algorithms to search for optimal operating regions.

  20. A Study of the Physical and Mechanical Properties of Lutetium Compared with Those of Transition Metals: A Data Mining Approach

    NASA Astrophysics Data System (ADS)

    Settouti, Nadera; Aourag, Hafid

    2015-01-01

    In this article, we study the physical and mechanical properties of lutetium, which will be compared with the elements of the third-row transition metals (Cs, Ba, Hf, Ta, W, Re, Os, Ir, Pt, Au, Tl, Pb, and Bi). Data mining is an ideal approach for analyzing the information and exploring the hidden knowledge among the data. The purpose of the data mining scheme is to identify and classify the effects of the relationships existing between properties. The results of the investigation are presented by means of multivariate modeling methods, such as the principal component analysis and the partial least squares regression to discover the implicit, yet meaningful, relationship between the elements of the data set, and to locate correlations between the properties of the materials. In this study, we present a data mining approach to discover such unusual correlations between properties of the elements. When comparing the properties of the transition metals with those of lutetium, our results show that lutetium shares many properties and similarities with the transition metals of the sixth row in the periodic table and can be well described as a transition metal.

  1. Identifying heterogeneity among injection drug users: a cluster analysis approach.

    PubMed

    Shaw, Souradet Y; Shah, Lena; Jolly, Ann M; Wylie, John L

    2008-08-01

    We used cluster analysis to subdivide a population of injection drug users and identify previously unknown behavioral heterogeneity within that population. We applied cluster analysis techniques to data collected in a cross-sectional survey of injection drug users in Winnipeg, Manitoba. The clustering variables we used were based on receptive syringe sharing, ethnicity, and types of drugs injected. Seven clusters were identified for both male and female injection drug users. Some relationships previously revealed in our study setting, such as the known relationship between Talwin (pentazocine) and Ritalin (methylphenidate) use, injection in hotels, and hepatitis C virus prevalence, were confirmed through our cluster analysis approach. Also, relationships between drug use and infection risk not previously observed in our study setting were identified, an example being a cluster of female crystal methamphetamine users who exhibited high-risk behaviors but an absence or low prevalence of blood-borne pathogens. Cluster analysis was useful in both confirming relationships previously identified and identifying new ones relevant to public health research and interventions.

  2. Reverse Pathway Genetic Approach Identifies Epistasis in Autism Spectrum Disorders

    PubMed Central

    Traglia, Michela; Tsang, Kathryn; Bearden, Carrie E.; Rauen, Katherine A.

    2017-01-01

    Although gene-gene interaction, or epistasis, plays a large role in complex traits in model organisms, genome-wide by genome-wide searches for two-way interaction have limited power in human studies. We thus used knowledge of a biological pathway in order to identify a contribution of epistasis to autism spectrum disorders (ASDs) in humans, a reverse-pathway genetic approach. Based on previous observation of increased ASD symptoms in Mendelian disorders of the Ras/MAPK pathway (RASopathies), we showed that common SNPs in RASopathy genes show enrichment for association signal in GWAS (P = 0.02). We then screened genome-wide for interactors with RASopathy gene SNPs and showed strong enrichment in ASD-affected individuals (P < 2.2 × 10^−16), with a number of pairwise interactions meeting genome-wide criteria for significance. Finally, we utilized quantitative measures of ASD symptoms in RASopathy-affected individuals to perform modifier mapping via GWAS. One top region overlapped between these independent approaches, and we showed dysregulation of a gene in this region, GPR141, in a RASopathy neural cell line. We thus used orthogonal approaches to provide strong evidence for a contribution of epistasis to ASDs, confirm a role for the Ras/MAPK pathway in idiopathic ASDs, and to identify a convergent candidate gene that may interact with the Ras/MAPK pathway. PMID:28076348

  3. A Hybrid Data Mining Approach for Credit Card Usage Behavior Analysis

    NASA Astrophysics Data System (ADS)

    Tsai, Chieh-Yuan

    Credit cards are one of the most popular e-payment approaches in current online e-commerce. To consolidate valuable customers, card issuers invest a lot of money to maintain good relationships with their customers. Although several efforts have been made to study card usage motivation, few studies emphasize credit card usage behavior analysis when time periods change from t to t+1. To address this issue, an integrated data mining approach is proposed in this paper. First, the customer profiles and their transaction data at time period t are retrieved from databases. Second, a LabelSOM neural network groups customers into segments and identifies critical characteristics for each group. Third, a fuzzy decision tree algorithm is used to construct usage behavior rules of interesting customer groups. Finally, these rules are used to analyze the behavior changes between time periods t and t+1. An implementation case using a practical credit card database provided by a commercial bank in Taiwan is illustrated to show the benefits of the proposed framework.

  4. Mining 3D genome structure populations identifies major factors governing the stability of regulatory communities

    PubMed Central

    Dai, Chao; Li, Wenyuan; Tjong, Harianto; Hao, Shengli; Zhou, Yonggang; Li, Qingjiao; Chen, Lin; Zhu, Bing; Alber, Frank; Jasmine Zhou, Xianghong

    2016-01-01

    Three-dimensional (3D) genome structures vary from cell to cell even in an isogenic sample. Unlike protein structures, genome structures are highly plastic, posing a significant challenge for structure-function mapping. Here we report an approach to comprehensively identify 3D chromatin clusters that each occur frequently across a population of genome structures, either deconvoluted from ensemble-averaged Hi-C data or from a collection of single-cell Hi-C data. Applying our method to a population of genome structures (at the macrodomain resolution) of lymphoblastoid cells, we identify an atlas of stable inter-chromosomal chromatin clusters. A large number of these clusters are enriched in binding of specific regulatory factors and are therefore defined as 'Regulatory Communities'. We reveal two major factors, centromere clustering and transcription factor binding, which significantly stabilize such communities. Finally, we show that the regulatory communities differ substantially from cell to cell, indicating that expression variability could be impacted by genome structures. PMID:27240697

  5. Kinomic profiling approach identifies Trk as a novel radiation modulator

    PubMed Central

    Jarboe, John S.; Jaboin, Jerry J.; Anderson, Joshua C.; Nowsheen, Somaira; Stanley, Jennifer A.; Naji, Faris; Ruijtenbeek, Rob; Tu, Tianxiang; Hallahan, Dennis E.; Yang, Eddy S.; Bonner, James A.; Willey, Christopher D.

    2012-01-01

    Background: Ionizing radiation treatment is used in over half of all cancer patients, thus determining the mechanisms of response or resistance is critical for the development of novel treatment approaches. Materials and methods: In this report, we utilize a high-content peptide array platform that performs multiplex kinase assays with real-time kinetic readout to investigate the mechanism of radiation response in vascular endothelial cells. We applied this technology to irradiated human umbilical vein endothelial cells (HUVEC). Results: We identified 49 specific tyrosine phosphopeptides that were differentially affected by irradiation over a time course of one hour. In one example, the Tropomyosin receptor kinase (Trk) family members, TrkA and TrkB, showed transient activation between 2–15 minutes following irradiation. When we targeted TrkA and TrkB using small molecule inhibitors, HUVEC were protected from radiation damage. Conversely, stimulation of TrkA using gambogic amide promoted radiation enhancement. Conclusions: Thus, we show that our approach not only can identify rapid changes in kinase activity but also identify novel targets such as TrkA. TrkA inhibition resulted in radioprotection that correlated with enhanced repair of radiation-induced damage while TrkA stimulation by gambogic amide produced radiation sensitization. PMID:22561027

  6. Kinomic profiling approach identifies Trk as a novel radiation modulator.

    PubMed

    Jarboe, John S; Jaboin, Jerry J; Anderson, Joshua C; Nowsheen, Somaira; Stanley, Jennifer A; Naji, Faris; Ruijtenbeek, Rob; Tu, Tianxiang; Hallahan, Dennis E; Yang, Eddy S; Bonner, James A; Willey, Christopher D

    2012-06-01

    Ionizing radiation treatment is used in over half of all cancer patients, thus determining the mechanisms of response or resistance is critical for the development of novel treatment approaches. In this report, we utilize a high-content peptide array platform that performs multiplex kinase assays with real-time kinetic readout to investigate the mechanism of radiation response in vascular endothelial cells. We applied this technology to irradiated human umbilical vein endothelial cells (HUVEC). We identified 49 specific tyrosine phosphopeptides that were differentially affected by irradiation over a time course of 1h. In one example, the Tropomyosin receptor kinase (Trk) family members, TrkA and TrkB, showed transient activation between 2 and 15 min following irradiation. When we targeted TrkA and TrkB using small molecule inhibitors, HUVEC were protected from radiation damage. Conversely, stimulation of TrkA using gambogic amide promoted radiation enhancement. Thus, we show that our approach not only can identify rapid changes in kinase activity but also identify novel targets such as TrkA. TrkA inhibition resulted in radioprotection that correlated with enhanced repair of radiation-induced damage while TrkA stimulation by gambogic amide produced radiation sensitization. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  7. A multidimensional proteomic approach to identify hypertrophy-associated proteins.

    PubMed

    Lindsey, Merry L; Goshorn, Danielle K; Comte-Walters, Susana; Hendrick, Jennifer W; Hapke, Elizabeth; Zile, Michael R; Schey, Kevin

    2006-04-01

    Left ventricular hypertrophy (LVH) is a leading cause of congestive heart failure. The exact mechanisms that control cardiac growth and regulate the transition to failure are not fully understood, in part due to the lack of a complete inventory of proteins associated with LVH. We investigated the proteomic basis of LVH using the transverse aortic constriction model of pressure overload in mice coupled with a multidimensional approach to identify known and novel proteins that may be relevant to the development and maintenance of LVH. We identified 123 proteins that were differentially expressed during LVH, including LIM proteins, thioredoxin, myoglobin, fatty acid binding protein 3, the abnormal spindle-like microcephaly protein (ASPM), and cytoskeletal proteins such as actin and myosin. In addition, proteins with unknown functions were identified, providing new directions for future research in this area. We also discuss common pitfalls and strategies to overcome the limitations of current proteomic technologies. Together, the multidimensional approach provides insight into the proteomic changes that occur in the LV during hypertrophy.

  8. Performance-based approach to improve skills, safety, and training in the mining industry. Research report, September 1984-October 1989 (Final)

    SciTech Connect

    Klishis, M.J.; Althouse, R.C.; Grayson, R.L.; Lies, G.M.

    1989-10-01

    The Mining Extension Service of West Virginia University has developed an approach for identifying the specific training needs of mining operations (production, maintenance and support tasks) that can be used to upgrade on-the-job training, annual refresher training and task training. It can also be directed at systematically correcting performance discrepancies of an individual, a crew, or the mining operation, or to challenge workers and management toward attaining improved performances. Built on the systematic use of operational data, the Training in Operations Process (TOP) approach is designed to integrate features such as diligence in monitoring and evaluating performances into operational performance decisions. The approach may also be applied to other (non-training) interventions by mine management. Based on research on the state of training and worker performance in longwall mining, coal preparation plants, and underground haulage operations, this approach provides a practical five-step process for managers to implement and focus training so that it coincides with the organization's productivity and safety goals. The system permits management to plan, organize, and schedule task training, cross training, annual refresher training, and specialized skills training for miners' regular job assignments, and for their back-up, or fill-in-roles.

  9. EVALUATION OF A TWO-STAGE PASSIVE TREATMENT APPROACH FOR MINING INFLUENCE WATERS

    EPA Science Inventory

    A two-stage passive treatment approach was assessed at bench-scale using two Colorado Mining Influenced Waters (MIWs). The first-stage was a limestone drain with the purpose of removing iron and aluminum and mitigating the potential effects of mineral acidity. The second stage w...

  11. Assessing the effectiveness of sustainable land management policies for combating desertification: A data mining approach.

    PubMed

    Salvati, L; Kosmas, C; Kairis, O; Karavitis, C; Acikalin, S; Belgacem, A; Solé-Benet, A; Chaker, M; Fassouli, V; Gokceoglu, C; Gungor, H; Hessel, R; Khatteli, H; Kounalaki, A; Laouina, A; Ocakoglu, F; Ouessar, M; Ritsema, C; Sghaier, M; Sonmez, H; Taamallah, H; Tezcan, L; de Vente, J; Kelly, C; Colantoni, A; Carlucci, M

    2016-12-01

    This study investigates the relationship between fine resolution, local-scale biophysical and socioeconomic contexts within which land degradation occurs, and the human responses to it. The research draws on experimental data collected under different territorial and socioeconomic conditions at 586 field sites in five Mediterranean countries (Spain, Greece, Turkey, Tunisia and Morocco). We assess the level of desertification risk under various land management practices (terracing, grazing control, prevention of wildland fires, soil erosion control measures, soil water conservation measures, sustainable farming practices, land protection measures and financial subsidies) taken as possible responses to land degradation. A data mining approach, incorporating principal component analysis, non-parametric correlations, multiple regression and canonical analysis, was developed to identify the spatial relationship between land management conditions, the socioeconomic and environmental context (described using 40 biophysical and socioeconomic indicators) and desertification risk. Our analysis identified a number of distinct relationships between the level of desertification experienced and the underlying socioeconomic context, suggesting that the effectiveness of responses to land degradation is strictly dependent on the local biophysical and socioeconomic context. Assessing the latent relationship between land management practices and the biophysical/socioeconomic attributes characterizing areas exposed to different levels of desertification risk proved to be an indirect measure of the effectiveness of field actions contrasting land degradation. Copyright © 2016 Elsevier Ltd. All rights reserved.

  12. A spatio-temporal mining approach towards summarizing and analyzing protein folding trajectories.

    PubMed

    Yang, Hui; Parthasarathy, Srinivasan; Ucar, Duygu

    2007-04-04

    Understanding the protein folding mechanism remains a grand challenge in structural biology. In the past several years, computational theories in molecular dynamics have been employed to shed light on the folding process. Coupled with high computing power and large scale storage, researchers now can computationally simulate the protein folding process in atomistic details at femtosecond temporal resolution. Such simulation often produces a large number of folding trajectories, each consisting of a series of 3D conformations of the protein under study. As a result, effectively managing and analyzing such trajectories is becoming increasingly important. In this article, we present a spatio-temporal mining approach to analyze protein folding trajectories. It exploits the simplicity of contact maps, while also integrating 3D structural information in the analysis. It characterizes the dynamic folding process by first identifying spatio-temporal association patterns in contact maps, then studying how such patterns evolve along a folding trajectory. We demonstrate that such patterns can be leveraged to summarize folding trajectories, and to facilitate the detection and ordering of important folding events along a folding path. We also show that such patterns can be used to identify a consensus partial folding pathway across multiple folding trajectories. Furthermore, we argue that such patterns can capture both local and global structural topology in a 3D protein conformation, thereby facilitating effective structural comparison amongst conformations. We apply this approach to analyze the folding trajectories of two small synthetic proteins-BBA5 and GSGS (or Beta3S). We show that this approach is promising towards addressing the above issues, namely, folding trajectory summarization, folding events detection and ordering, and consensus partial folding pathway identification across trajectories.

  13. Ultrabroadband photonic Internet: data mining approach to security aspects

    NASA Astrophysics Data System (ADS)

    Kalicki, Arkadiusz

    2009-06-01

    Web applications have become the most popular medium on the Internet. The popularity and ease of use of web application frameworks, together with careless development, result in a high number of vulnerabilities and attacks. Several types of attack are possible because of improper input validation. SQL injection is the ability to execute arbitrary SQL queries in a database through an existing application. Cross-site scripting is a vulnerability that allows malicious web users to inject code into the web pages viewed by other users. Cross-Site Request Forgery (CSRF) is an attack that tricks the victim into loading a page that contains a malicious request. Web spam in blogs is another example. To secure web applications, intrusion detection systems (IDS) and intrusion prevention systems (IPS) are used. Intrusion detection systems fall into two groups: misuse detection (traditional IDS) and anomaly detection. Misuse detection systems are signature based and detect many kinds of known attacks with high accuracy, but they cannot detect unknown and emerging attacks. They can be complemented with anomaly-based intrusion detection and prevention systems. This paper presents an anomaly-driven proxy acting as an IPS and a data mining based algorithm used to detect anomalies. The method compares incoming HTTP traffic with a previously built profile that contains a representation of the "normal" or expected sequence patterns of web application usage. The frequent sequence patterns are found with the GSP algorithm. Some basic tests show that the software catches malicious requests.

  14. Data mining approach to web application intrusions detection

    NASA Astrophysics Data System (ADS)

    Kalicki, Arkadiusz

    2011-10-01

    Web applications have become the most popular medium on the Internet. The popularity and ease of use of web application script languages and frameworks, together with careless development, result in a high number of web application vulnerabilities and a high number of attacks. Several types of attack are possible because of improper input validation: SQL injection, cross-site scripting, Cross-Site Request Forgery (CSRF), web spam in blogs, and others. To secure web applications, intrusion detection systems (IDS) and intrusion prevention systems (IPS) are used. Intrusion detection systems fall into two groups: misuse detection (traditional IDS) and anomaly detection. This paper presents a data mining based algorithm for anomaly detection. The method compares incoming HTTP traffic with a previously built profile that contains a representation of the "normal" or expected sequence patterns of web application usage. The frequent sequence patterns are found with the GSP algorithm. The previously presented detection method was rewritten and improved. Tests show that the software catches malicious requests, especially long attack sequences; results are quite good for medium-length sequences, while for short sequences the method must be complemented with other methods.
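
    The profile comparison described in this record can be illustrated with a small sketch. It is not the author's implementation: GSP mines general (possibly non-contiguous) sequential patterns, whereas this simplified stand-in only counts contiguous request windows. The function names, the window length, and the support threshold are all hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple

Pattern = Tuple[str, ...]


def frequent_patterns(sessions: Sequence[Sequence[str]],
                      length: int, min_support: int) -> Set[Pattern]:
    """Return contiguous request windows seen in at least min_support sessions.

    Simplification: real GSP also finds non-contiguous subsequences.
    """
    counts: Counter = Counter()
    for session in sessions:
        # A set, so each window is counted at most once per session.
        windows = {tuple(session[i:i + length])
                   for i in range(len(session) - length + 1)}
        counts.update(windows)
    return {p for p, c in counts.items() if c >= min_support}


def anomaly_score(session: Sequence[str], profile: Set[Pattern],
                  length: int) -> float:
    """Fraction of the session's windows that are absent from the profile."""
    windows: List[Pattern] = [tuple(session[i:i + length])
                              for i in range(len(session) - length + 1)]
    if not windows:
        return 0.0  # session too short to judge with this window length
    unknown = sum(1 for w in windows if w not in profile)
    return unknown / len(windows)


# Hypothetical "normal" traffic used to build the profile.
normal_sessions = [
    ["GET /", "GET /login", "POST /login", "GET /home"],
    ["GET /", "GET /login", "POST /login", "GET /profile"],
    ["GET /", "GET /login", "POST /login", "GET /home"],
]
profile = frequent_patterns(normal_sessions, length=2, min_support=2)
suspicious = ["GET /", "GET /admin", "GET /admin?id=1' OR '1'='1"]
score = anomaly_score(suspicious, profile, length=2)
```

    A session shorter than the window length yields no windows and scores 0.0 here, which mirrors the record's remark that short sequences need to be complemented with other methods.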

  15. Quantiles Regression Approach to Identifying the Determinant of Breastfeeding Duration

    NASA Astrophysics Data System (ADS)

    Mahdiyah; Norsiah Mohamed, Wan; Ibrahim, Kamarulzaman

    In this study, a quantile regression approach is applied to data from the Malaysian Family Life Survey (MFLS) to identify factors that are significantly related to different conditional quantiles of breastfeeding duration. Classical linear regression methods are based on minimizing the residual sum of squares, whereas quantile regression is based on the conditional median function and the full range of other conditional quantile functions. Overall, the duration of breastfeeding is found to be significantly related to place of residence, religion, and the total number of children in the family.
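
    The contrast with least squares can be made concrete with the standard quantile ("pinball") loss. The sketch below is only an illustration of that loss for the simplest model, a constant with no covariates; the durations are made-up numbers, not MFLS data, and the function names are hypothetical.

```python
from typing import Sequence


def pinball_loss(residual: float, tau: float) -> float:
    """Quantile loss: positive residuals weigh tau, negative ones (1 - tau)."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def total_loss(values: Sequence[float], q: float, tau: float) -> float:
    """Sum of pinball losses when every observation is predicted as q."""
    return sum(pinball_loss(v - q, tau) for v in values)


def best_constant(values: Sequence[float], tau: float) -> float:
    """Sample value that minimises the total pinball loss for quantile tau."""
    return min(sorted(values), key=lambda q: total_loss(values, q, tau))


# Hypothetical breastfeeding durations in months (illustrative only).
durations = [1, 2, 3, 4, 6, 9, 12, 18, 24]
lower_quartile = best_constant(durations, 0.25)
median = best_constant(durations, 0.50)
upper_quartile = best_constant(durations, 0.75)
```

    With tau = 0.5 the loss is half the absolute error, so the minimiser is the sample median; other values of tau tilt the loss and recover other quantiles. A full quantile regression replaces the constant q with a linear function of the covariates.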

  16. Identifying technology innovations for marginalized smallholders-A conceptual approach.

    PubMed

    Malek, Mohammad Abdul; Gatzweiler, Franz W; Von Braun, Joachim

    2017-05-01

    This paper contributes to the existing literature a theoretical and conceptual background for identifying the idle potentials of marginal rural areas and people by means of technological and institutional innovations. The approach uses ex-ante assessment to identify suitable technological and institutional innovations for marginalized smallholders in marginal areas, and is divided into three main parts (mapping, surveying and evaluating) and several steps. Finally, it contributes to the inclusion of marginalized smallholders by improving the understanding of the interactions between technology needs, farming systems, ecological resources and poverty characteristics in the different segments of the poor, and by linking these insights with productivity-enhancing technologies.

  17. Genome-wide analysis of regulatory proteases sequences identified through bioinformatics data mining in Taenia solium.

    PubMed

    Yan, Hong-Bin; Lou, Zhong-Zi; Li, Li; Brindley, Paul J; Zheng, Yadong; Luo, Xuenong; Hou, Junling; Guo, Aijiang; Jia, Wan-Zhong; Cai, Xuepeng

    2014-06-04

    . Phylogenetic analysis using Bayes approach provided support for inferring functional divergence among regulatory cysteine and serine proteases. Numerous putative proteases were identified for the first time in T. solium, and important regulatory proteases have been predicted. This comprehensive analysis not only complements the growing knowledge base of proteolytic enzymes, but also provides a platform from which to expand knowledge of cestode proteases and to explore their biochemistry and potential as intervention targets.

  18. Chemical Topic Modeling: Exploring Molecular Data Sets Using a Common Text-Mining Approach.

    PubMed

    Schneider, Nadine; Fechner, Nikolas; Landrum, Gregory A; Stiefl, Nikolaus

    2017-08-28

    Big data is one of the key transformative factors which increasingly influences all aspects of modern life. Although this transformation brings vast opportunities it also generates novel challenges, not the least of which is organizing and searching this data deluge. The field of medicinal chemistry is not different: more and more data are being generated, for instance, by technologies such as DNA encoded libraries, peptide libraries, text mining of large literature corpora, and new in silico enumeration methods. Handling those huge sets of molecules effectively is quite challenging and requires compromises that often come at the expense of the interpretability of the results. In order to find an intuitive and meaningful approach to organizing large molecular data sets, we adopted a probabilistic framework called "topic modeling" from the text-mining field. Here we present the first chemistry-related implementation of this method, which allows large molecule sets to be assigned to "chemical topics" and investigating the relationships between those. In this first study, we thoroughly evaluate this novel method in different experiments and discuss both its disadvantages and advantages. We show very promising results in reproducing human-assigned concepts using the approach to identify and retrieve chemical series from sets of molecules. We have also created an intuitive visualization of the chemical topics output by the algorithm. This is a huge benefit compared to other unsupervised machine-learning methods, like clustering, which are commonly used to group sets of molecules. Finally, we applied the new method to the 1.6 million molecules of the ChEMBL22 data set to test its robustness and efficiency. In about 1 h we built a 100-topic model of this large data set in which we could identify interesting topics like "proteins", "DNA", or "steroids". Along with this publication we provide our data sets and an open-source implementation of the new method (CheTo) which

  19. Functional epigenetic approach identifies frequently methylated genes in Ewing sarcoma.

    PubMed

    Alholle, Abdullah; Brini, Anna T; Gharanei, Seley; Vaiyapuri, Sumathi; Arrigoni, Elena; Dallol, Ashraf; Gentle, Dean; Kishida, Takeshi; Hiruma, Toru; Avigad, Smadar; Grimer, Robert; Maher, Eamonn R; Latif, Farida

    2013-11-01

    Using a candidate gene approach we recently identified frequent methylation of the RASSF2 gene associated with poor overall survival in Ewing sarcoma (ES). To identify effective biomarkers in ES on a genome-wide scale, we used a functionally proven epigenetic approach, in which gene expression was induced in ES cell lines by treatment with a demethylating agent followed by hybridization onto high density gene expression microarrays. After applying strict selection criteria, 34 genes were selected for expression and methylation analysis in ES cell lines and primary ES. Eight genes (CTHRC1, DNAJA4, ECHDC2, NEFH, NPTX2, PHF11, RARRES2, TSGA14) showed methylation frequencies of >20% in ES tumors (range 24-71%); these genes were expressed in human bone marrow derived mesenchymal stem cells (hBMSC), and hypermethylation was associated with transcriptional silencing. Methylation of NPTX2 or PHF11 was associated with poorer prognosis in ES. In addition, six of the above genes also showed methylation frequencies of >20% (range 36-50%) in osteosarcomas. Identification of these genes may provide insights into bone cancer tumorigenesis and development of epigenetic biomarkers for prognosis and detection of these rare tumor types.

  20. Prediction model for peninsular Indian summer monsoon rainfall using data mining and statistical approaches

    NASA Astrophysics Data System (ADS)

    Vathsala, H.; Koolagudi, Shashidhar G.

    2017-01-01

    In this paper we discuss a data mining application for predicting peninsular Indian summer monsoon rainfall, and propose an algorithm that combines data mining and statistical techniques. We select likely predictors based on association rules that have the highest confidence levels. We then cluster the selected predictors to reduce their dimensions and use cluster membership values for classification. We derive the predictors from local conditions in southern India, including mean sea level pressure, wind speed, and maximum and minimum temperatures. The global condition variables include southern oscillation and Indian Ocean dipole conditions. The algorithm predicts rainfall in five categories: Flood, Excess, Normal, Deficit and Drought. We use closed itemset mining, cluster membership calculations and a multilayer perceptron function in the algorithm to predict monsoon rainfall in peninsular India. Using Indian Institute of Tropical Meteorology data, we found the prediction accuracy of our proposed approach to be exceptionally good.
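
    Selecting predictors by rule confidence, as this record describes, rests on a simple ratio: the confidence of a rule A -> C is the number of records containing both A and C divided by the number containing A. The sketch below shows only that calculation. The discretised predictor labels and the five "seasons" are invented for illustration and are not the paper's data.

```python
from typing import FrozenSet, Sequence, Set


def count_containing(transactions: Sequence[Set[str]],
                     items: FrozenSet[str]) -> int:
    """Number of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if items <= t)


def confidence(transactions: Sequence[Set[str]],
               antecedent: Set[str], consequent: Set[str]) -> float:
    """Confidence of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    n_antecedent = count_containing(transactions, a)
    if n_antecedent == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    return count_containing(transactions, both) / n_antecedent


# Hypothetical seasons: discretised predictor states plus a rainfall label.
seasons = [
    {"low_pressure", "strong_wind", "rain=Excess"},
    {"low_pressure", "strong_wind", "rain=Excess"},
    {"low_pressure", "weak_wind", "rain=Normal"},
    {"high_pressure", "weak_wind", "rain=Deficit"},
    {"high_pressure", "strong_wind", "rain=Normal"},
]
conf_pressure = confidence(seasons, {"low_pressure"}, {"rain=Excess"})
conf_pressure_wind = confidence(
    seasons, {"low_pressure", "strong_wind"}, {"rain=Excess"})
```

    In this toy data the two-predictor rule has the higher confidence, so under a highest-confidence criterion it would be preferred over the single-predictor rule.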

  1. Image Mining in Remote Sensing for Coastal Wetlands Mapping: from Pixel Based to Object Based Approach

    NASA Astrophysics Data System (ADS)

    Farda, N. M.; Danoedoro, P.; Hartono; Harjoko, A.

    2016-11-01

    Remote sensing image data are now abundant, and this large volume of data creates a “knowledge gap” in the extraction of selected information, especially for coastal wetlands. Coastal wetlands provide ecosystem services essential to people and the environment. The aim of this research is to extract coastal wetlands information from satellite data using pixel-based and object-based image mining approaches. Landsat MSS, Landsat 5 TM, Landsat 7 ETM+, and Landsat 8 OLI images of the Segara Anakan lagoon were selected to represent multi-temporal data. The inputs for image mining are visible and near-infrared bands, PCA band, inverse PCA bands, mean shift segmentation bands, bare soil index, vegetation index, wetness index, elevation from SRTM and ASTER GDEM, and GLCM (Haralick) or variability texture. Three methods were applied to extract coastal wetlands using image mining: pixel-based Decision Tree C4.5, pixel-based Back Propagation Neural Network, and object-based Mean Shift segmentation with Decision Tree C4.5. The results show that remote sensing image mining can be used to map coastal wetland ecosystems. Decision Tree C4.5 achieved the highest mapping accuracy (0.75 overall kappa). The availability of remote sensing image mining for mapping coastal wetlands is very important for better understanding the spatiotemporal dynamics and distribution of coastal wetlands.
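
    The accuracy figure in this record is a kappa statistic. As a reminder of what that number measures, the sketch below computes Cohen's kappa from a confusion matrix: observed agreement corrected for the agreement expected by chance. The two-class matrix is a made-up example, not the study's results, and the function name is hypothetical.

```python
from typing import Sequence


def cohen_kappa(confusion: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa for a square confusion matrix (rows: reference, cols: map)."""
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Chance agreement: product of marginal proportions, summed over classes.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical wetland / non-wetland validation counts.
example = [
    [40, 10],  # reference wetland: 40 mapped wetland, 10 mapped non-wetland
    [5, 45],   # reference non-wetland: 5 mapped wetland, 45 mapped non-wetland
]
kappa = cohen_kappa(example)
```

    Here 85% of samples agree, but half of that agreement would be expected by chance from the marginals, so kappa is lower than the raw overall accuracy.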

  2. A software tool for determination of breast cancer treatment methods using data mining approach.

    PubMed

    Cakır, Abdülkadir; Demirel, Burçin

    2011-12-01

    In this work, breast cancer treatment methods are determined using data mining. For this purpose, software was developed to help oncologists suggest treatment methods for breast cancer patients. Data from 462 breast cancer patients, obtained from Ankara Oncology Hospital, are used to determine treatment methods for new patients. This dataset is processed with the Weka data mining tool. Classification algorithms are applied to the dataset one by one, and the results are compared to find the proper treatment method. The developed software, called "Treatment Assistant", has a Java NetBeans interface and uses different algorithms (IB1, Multilayer Perceptron and Decision Table) to find out which one gives the better result for each attribute to be predicted. Treatment methods are determined for post-surgical breast cancer patients using this software tool. At the modeling step of the data mining process, different Weka algorithms are used for the output attributes. IB1 shows the best accuracy for the hormonotherapy output, Multilayer Perceptron for the tamoxifen and radiotherapy outputs, and Decision Table for the chemotherapy output. In conclusion, this work shows that a data mining approach can be a useful tool for medical applications, particularly at the treatment decision step. Data mining helps the doctor to decide in a short time.

  3. Evaluation of an efficient approach for identifying genetic disease loci

    SciTech Connect

    Sheffield, V.C.; Kwitek-Black, A.E.; Rokhlina, T.

    1994-09-01

    Identification of disease loci by genetic linkage analysis has been enhanced by the availability of highly polymorphic short tandem repeat polymorphic markers (STRPs). The development of high quality tri- and tetranucleotide STRPs allows new strategies to increase the efficiency of genotyping, resulting in streamlined linkage studies. We have tested a strategy using pooled DNA samples from affected individuals from large Bedouin pedigrees segregating recessive disorders. Equal molar amounts of DNA from affected individuals are pooled and used as a template for PCR of STRPs. Pooled DNA from unaffected siblings is used as a control. STRPs linked to the disorder show a shift in allele frequency in the affected pool compared to the control pool, whereas unlinked markers show an identical allele distribution in affected and control pools. We have demonstrated the sensitivity of this approach for identifying STRPs giving positive lod scores in recessive kindreds. We have also modelled this approach with dominant pedigrees. Application of this approach to polygenic disorders should be possible by using methods to quantitate allele frequencies in pooled samples. The high quality tri- and tetranucleotide repeat markers developed by the Cooperative Human Linkage Center (CHLC) facilitate the use of this method.

  4. New approach for identifying boundary characteristics using transmissibility

    NASA Astrophysics Data System (ADS)

    Joo, Kyung-Hoon; Min, Dongwoo; Kim, Jun-Gu; Kang, Yeon June

    2017-04-01

    A novel approach is proposed for identifying boundary properties as a response model using transmissibility. This approach differs from those proposed in previous studies dealing with frequency response functions (FRFs) for joint identification. Transmissibility includes only response data, unlike FRFs that include force measurements. The boundary properties can be estimated by comparing the characteristics of the components under the free condition and connected to boundary conditions. When analyzing the components assembled compactly in the system for setting the shaker or measuring the impact force exerted on the component correctly, the proposed method could reduce the errors caused by an incorrectly measured force. The derived equation is verified using a discrete multiple degrees of freedom system with single boundary and multiple boundary conditions and by application to a beam, which is the simplest continuous structural form to validate the feasibility of the theory. The transmissibility defined by the apparent mass matrix is used for verifying the derived equation for identifying the boundary properties in the discrete system. However, when applying the equation to practical cases, as is the purpose of this research, the transmissibility matrix should be defined using only the response data. For this purpose, the accelerance matrix is modified slightly to the response matrix using the input as a unit force. This transmissibility matrix composed of response data is used for validating the equation in a continuous system. Furthermore, the effects of measurement noise are also investigated to assess the robustness of the method for application under practical conditions. Consequently, the proposed method could show reliable results by properly extracting the boundary properties in both cases. In many practical cases, this research is expected to contribute toward identifying the boundary properties in a complex system more conveniently compared to the method

  5. A computational approach for identifying pathogenicity islands in prokaryotic genomes

    PubMed Central

    Yoon, Sung Ho; Hur, Cheol-Goo; Kang, Ho-Young; Kim, Yeoun Hee; Oh, Tae Kwang; Kim, Jihyun F

    2005-01-01

    Background: Pathogenicity islands (PAIs), distinct genomic segments of pathogens encoding virulence factors, represent a subgroup of genomic islands (GIs) that have been acquired through horizontal gene transfer events. Up to now, computational approaches for identifying PAIs have focused on the detection of genomic regions that differ from the rest of the genome only in their base composition and codon usage. These approaches often lead to the identification of genomic islands, rather than PAIs. Results: We present a computational method for detecting potential PAIs in complete prokaryotic genomes by combining sequence similarities and abnormalities in genomic composition. We first collected 207 GenBank accessions containing either part or all of the reported PAI loci. In sequenced genomes, strips of PAI-homologs were defined based on the proximity of the homologs of genes in the same PAI accession. An algorithm reminiscent of a sequence-assembly procedure was then devised to merge overlapping or adjacent genomic strips into a large genomic region. Among the defined genomic regions, PAI-like regions were identified by the presence of homolog(s) of virulence genes. Also, GIs were postulated by calculating G+C content anomalies and codon usage bias. Of 148 prokaryotic genomes examined, 23 pathogenic and 6 non-pathogenic bacteria contained 77 candidate PAIs that partly or entirely overlap GIs. Conclusion: Supporting the validity of our method, the list of candidate PAIs included thirty-four PAIs previously identified in genome sequencing papers. Furthermore, in some instances, our method was able to detect entire PAIs in cases where only partial sequences are available. Our method proved to be efficient for demarcating the potential PAIs in our study. Also, the function(s) and origin(s) of a candidate PAI can be inferred by investigating the PAI queries comprising it. Identification and analysis of potential PAIs in prokaryotic genomes will broaden our
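
    Two steps named in this record, merging overlapping or adjacent genomic strips and measuring G+C content, can be sketched in a few lines. This is a generic interval-merging routine under assumed conventions, not the authors' algorithm: strips are taken to be (start, end) coordinate pairs, and the `max_gap` parameter that defines "adjacent" is hypothetical.

```python
from typing import List, Sequence, Tuple

Strip = Tuple[int, int]  # (start, end) genome coordinates, start <= end


def merge_strips(strips: Sequence[Strip], max_gap: int = 0) -> List[Strip]:
    """Merge strips that overlap or lie within max_gap bases of each other."""
    merged: List[Strip] = []
    for start, end in sorted(strips):
        if merged and start <= merged[-1][1] + max_gap:
            # Extend the current region; keep the farther end coordinate.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def gc_content(sequence: str) -> float:
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical strips of PAI-homologs on one genome.
strips = [(1000, 1100), (150, 300), (100, 200), (305, 400)]
regions = merge_strips(strips, max_gap=10)
```

    A region whose G+C content differs markedly from the genome-wide value would then be a candidate genomic island; how large a difference counts as anomalous is a threshold this sketch leaves open.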

  6. A clustering approach to identify severe bronchiolitis profiles in children.

    PubMed

    Dumas, Orianne; Mansbach, Jonathan M; Jartti, Tuomas; Hasegawa, Kohei; Sullivan, Ashley F; Piedra, Pedro A; Camargo, Carlos A

    2016-08-01

    Although bronchiolitis is generally considered a single disease, recent studies suggest heterogeneity. We aimed to identify severe bronchiolitis profiles using a clustering approach. We analysed data from two prospective, multicentre cohorts of children younger than 2 years hospitalised with bronchiolitis, one in the USA (2007-2010 winter seasons, n=2207) and one in Finland (2008-2010 winter seasons, n=408). Severe bronchiolitis profiles were determined by latent class analysis, classifying children based on clinical factors and viral aetiology. In the US study, four profiles were identified. Profile A (12%) was characterised by history of wheezing and eczema, wheezing at the emergency department (ED) presentation and rhinovirus infection. Profile B (36%) included children with wheezing at the ED presentation, but, in contrast to profile A, most did not have history of wheezing or eczema; this profile had the largest probability of respiratory syncytial virus infection. Profile C (34%) was the most severely ill group, with longer hospital stay and moderate-to-severe retractions. Profile D (17%) had the least severe illness, including non-wheezing children with shorter length of stay. Two of these profiles (A and D) were replicated in the Finnish cohort; a third group ('BC') included Finnish children with characteristics of profiles B and/or C in the US population. Several distinct clinical profiles (phenotypes) were identified by a clustering approach in two multicentre studies of children hospitalised for bronchiolitis. The observed heterogeneity has important implications for future research on the aetiology, management and long-term outcomes of bronchiolitis, such as future risk of childhood asthma. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/

  7. Data mining approaches to high-throughput crystal structure and compound prediction.

    PubMed

    Hautier, Geoffroy

    2014-01-01

    Predicting unknown inorganic compounds and their crystal structure is a critical step of high-throughput computational materials design and discovery. One way to achieve efficient compound prediction is to use data mining or machine learning methods. In this chapter we present a few algorithms for data mining compound prediction and their applications to different materials discovery problems. In particular, the patterns or correlations governing phase stability for experimental or computational inorganic compound databases are statistically learned and used to build probabilistic or regression models to identify novel compounds and their crystal structures. The stability of those compound candidates is then assessed using ab initio techniques. Finally, we report a few cases where data mining driven computational predictions were experimentally confirmed through inorganic synthesis.

  8. A comparative genomics approach to identifying the plasticity transcriptome

    PubMed Central

    Pfenning, Andreas R; Schwartz, Russell; Barth, Alison L

    2007-01-01

    Background Neuronal activity regulates gene expression to control learning and memory, homeostasis of neuronal function, and pathological disease states such as epilepsy. A great deal of experimental evidence supports the involvement of two particular transcription factors in shaping the genomic response to neuronal activity and mediating plasticity: CREB and zif268 (egr-1, krox24, NGFI-A). The gene targets of these two transcription factors are of considerable interest, since they may help develop hypotheses about how neural activity is coupled to changes in neural function. Results We have developed a computational approach for identifying binding sites for these transcription factors within the promoter regions of annotated genes in the mouse, rat, and human genomes. By combining a robust search algorithm to identify discrete binding sites, a comparison of targets across species, and an analysis of binding site locations within promoter regions, we have defined a group of candidate genes that are strong CREB- or zif268 targets and are thus regulated by neural activity. Our analysis revealed that CREB and zif268 share a disproportionate number of targets in common and that these common targets are dominated by transcription factors. Conclusion These observations may enable a more detailed understanding of the regulatory networks that are induced by neural activity and contribute to the plasticity transcriptome. The target genes identified in this study will be a valuable resource for investigators who hope to define the functions of specific genes that underlie activity-dependent changes in neuronal properties. PMID:17355637
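
    The abstract does not spell out the search algorithm, so the following is only a minimal sketch of the general idea: scan promoter sequences for a binding-site motif and keep the genes that have a hit in every species. It assumes exact matching against the canonical palindromic CRE site TGACGTCA, which is far simpler than a robust search. The function names, gene names and sequences are hypothetical placeholders.

```python
# Canonical full cAMP response element; it is palindromic, so the reverse
# complement is the same string and one strand scan is enough.
CRE_FULL_SITE = "TGACGTCA"


def find_sites(promoter, motif=CRE_FULL_SITE):
    """Return the 0-based start positions of exact motif matches."""
    promoter = promoter.upper()
    hits = []
    start = promoter.find(motif)
    while start != -1:
        hits.append(start)
        start = promoter.find(motif, start + 1)
    return hits


def conserved_targets(promoters_by_species, motif=CRE_FULL_SITE):
    """Return genes whose promoter contains the motif in every species."""
    gene_sets = [
        {gene for gene, seq in promoters.items() if find_sites(seq, motif)}
        for promoters in promoters_by_species.values()
    ]
    return sorted(set.intersection(*gene_sets)) if gene_sets else []


# Toy input: made-up gene names and sequences, not real promoter data.
TOY_PROMOTERS = {
    "mouse": {"geneA": "ccTGACGTCAtt", "geneB": "ccccgggg"},
    "rat": {"geneA": "aTGACGTCAg", "geneB": "TGACGTCA"},
    "human": {"geneA": "ggggTGACGTCA", "geneB": "acgtacgt"},
}
```

    Requiring a hit in all species is the cross-species comparison in its simplest form; a real pipeline would also weigh where in the promoter each site falls.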

  9. Identifying Subgroups among Hardcore Smokers: a Latent Profile Approach

    PubMed Central

    Bommelé, Jeroen; Kleinjan, Marloes; Schoenmakers, Tim M.; Burk, William J.; van den Eijnden, Regina; van de Mheen, Dike

    2015-01-01

    Introduction Hardcore smokers are smokers who have little to no intention to quit. Previous research suggests that there are distinct subgroups among hardcore smokers and that these subgroups vary in the perceived pros and cons of smoking and quitting. Identifying these subgroups could help to develop individualized messages for the group of hardcore smokers. In this study we therefore used the perceived pros and cons of smoking and quitting to identify profiles among hardcore smokers. Methods A sample of 510 hardcore smokers completed an online survey on the perceived pros and cons of smoking and quitting. We used these perceived pros and cons in a latent profile analysis to identify possible subgroups among hardcore smokers. To validate the profiles identified among hardcore smokers, we analysed data from a sample of 338 non-hardcore smokers in a similar way. Results We found three profiles among hardcore smokers. ‘Receptive’ hardcore smokers (36%) perceived many cons of smoking and many pros of quitting. ‘Ambivalent’ hardcore smokers (59%) were rather undecided towards quitting. ‘Resistant’ hardcore smokers (5%) saw few cons of smoking and few pros of quitting. Among non-hardcore smokers, we found similar groups of ‘receptive’ smokers (30%) and ‘ambivalent’ smokers (54%). However, a third group consisted of ‘disengaged’ smokers (16%), who saw few pros and cons of both smoking and quitting. Discussion Among hardcore smokers, we found three distinct profiles based on perceived pros and cons of smoking. This indicates that hardcore smokers are not a homogenous group. Each profile might require a different tobacco control approach. Our findings may help to develop individualized tobacco control messages for the particularly hard-to-reach group of hardcore smokers. PMID:26207829

  10. An Efficient Pattern Mining Approach for Event Detection in Multivariate Temporal Data.

    PubMed

    Batal, Iyad; Cooper, Gregory; Fradkin, Dmitriy; Harrison, James; Moerchen, Fabian; Hauskrecht, Milos

    2016-01-01

    This work proposes a pattern mining approach to learn event detection models from complex multivariate temporal data, such as electronic health records. We present Recent Temporal Pattern mining, a novel approach for efficiently finding predictive patterns for event detection problems. This approach first converts the time series data into time-interval sequences of temporal abstractions. It then constructs more complex time-interval patterns backward in time using temporal operators. We also present the Minimal Predictive Recent Temporal Patterns framework for selecting a small set of predictive and non-spurious patterns. We apply our methods for predicting adverse medical events in real-world clinical data. The results demonstrate the benefits of our methods in learning accurate event detection models, which is a key step for developing intelligent patient monitoring and decision support systems.
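
    The first step described above, converting a time series into time-interval sequences of temporal abstractions, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the three value states, the thresholds and the function names are all hypothetical, and only a single "before" temporal operator is shown.

```python
def abstract_series(times, values, low, high):
    """Collapse consecutive samples that share a value state into intervals.

    Each interval is a tuple (state, start_time, end_time). The thresholds
    `low` and `high` are hypothetical cut-offs chosen by the caller.
    """
    intervals = []
    for t, v in zip(times, values):
        state = "Low" if v < low else "High" if v > high else "Normal"
        if intervals and intervals[-1][0] == state:
            # Same state as the previous sample: extend the open interval.
            prev_state, start, _ = intervals[-1]
            intervals[-1] = (prev_state, start, t)
        else:
            intervals.append((state, t, t))
    return intervals


def before(a, b):
    """Temporal operator: interval a ends strictly before interval b starts."""
    return a[2] < b[1]


# Toy lab-value series (made-up numbers).
TOY_INTERVALS = abstract_series([0, 1, 2, 3, 4], [5, 6, 12, 13, 7], low=4, high=10)
```

    Patterns such as "Normal before High" can then be built by combining intervals with operators like `before`, working backward from the most recent interval.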

  11. An Efficient Pattern Mining Approach for Event Detection in Multivariate Temporal Data

    PubMed Central

    Batal, Iyad; Cooper, Gregory; Fradkin, Dmitriy; Harrison, James; Moerchen, Fabian; Hauskrecht, Milos

    2015-01-01

    This work proposes a pattern mining approach to learn event detection models from complex multivariate temporal data, such as electronic health records. We present Recent Temporal Pattern mining, a novel approach for efficiently finding predictive patterns for event detection problems. This approach first converts the time series data into time-interval sequences of temporal abstractions. It then constructs more complex time-interval patterns backward in time using temporal operators. We also present the Minimal Predictive Recent Temporal Patterns framework for selecting a small set of predictive and non-spurious patterns. We apply our methods for predicting adverse medical events in real-world clinical data. The results demonstrate the benefits of our methods in learning accurate event detection models, which is a key step for developing intelligent patient monitoring and decision support systems. PMID:26752800

  12. Data Mining: A Systems Approach to Formative Assessment

    ERIC Educational Resources Information Center

    Schmid, Dale

    2012-01-01

    This article describes how using raw data and information from reliable assessments can inform teachers' decisions leading to improved instruction. The primary aim is to use a systems approach to provide evidence of what students know and how they demonstrate mastery. Such evidence can empower teachers to reach all students. The pedagogic…

  13. Discovering drug-drug interactions: a text-mining and reasoning approach based on properties of drug metabolism.

    PubMed

    Tari, Luis; Anwar, Saadat; Liang, Shanshan; Cai, James; Baral, Chitta

    2010-09-15

    Identifying drug-drug interactions (DDIs) is a critical process in drug administration and drug development. Clinical support tools often provide comprehensive lists of DDIs, but they usually lack the supporting scientific evidence and different tools can return inconsistent results. In this article, we propose a novel approach that integrates text mining and automated reasoning to derive DDIs. Through the extraction of various facts of drug metabolism, not only the DDIs that are explicitly mentioned in text can be extracted but also the potential interactions that can be inferred by reasoning. Our approach was able to find several potential DDIs that are not present in DrugBank. We manually evaluated these interactions based on their supporting evidence, and our analysis revealed that 81.3% of these interactions were determined to be correct. This suggests that our approach can uncover potential DDIs with scientific evidence explaining the mechanism of the interactions.
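
    To illustrate how interactions can be inferred from extracted metabolism facts rather than read directly from text, here is a minimal sketch. It assumes a single hand-written rule (a drug that inhibits an enzyme may interact with any other drug metabolized by that enzyme); the authors' reasoner is not described at this level of detail. The fact format and the drug and enzyme names are hypothetical placeholders.

```python
def infer_interactions(facts):
    """Infer potential DDIs from (relation, drug, enzyme) facts.

    Assumed rule: if drug X inhibits enzyme E and drug Y is metabolized
    by E, then (X, Y, E) is a potential interaction with E as the mechanism.
    """
    inhibitors = {}
    substrates = {}
    for relation, drug, enzyme in facts:
        if relation == "inhibits":
            inhibitors.setdefault(enzyme, set()).add(drug)
        elif relation == "metabolized_by":
            substrates.setdefault(enzyme, set()).add(drug)

    inferred = set()
    for enzyme, drugs in inhibitors.items():
        for inhibitor in drugs:
            for substrate in substrates.get(enzyme, ()):
                if inhibitor != substrate:
                    inferred.add((inhibitor, substrate, enzyme))
    return sorted(inferred)


# Placeholder facts standing in for text-mined statements.
FACTS = [
    ("inhibits", "drug_a", "enzyme_x"),
    ("metabolized_by", "drug_b", "enzyme_x"),
    ("metabolized_by", "drug_c", "enzyme_y"),
]
```

    Each inferred triple carries the shared enzyme, which is what lets a prediction be traced back to evidence explaining the mechanism.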

  14. Discovering drug–drug interactions: a text-mining and reasoning approach based on properties of drug metabolism

    PubMed Central

    Tari, Luis; Anwar, Saadat; Liang, Shanshan; Cai, James; Baral, Chitta

    2010-01-01

    Motivation: Identifying drug–drug interactions (DDIs) is a critical process in drug administration and drug development. Clinical support tools often provide comprehensive lists of DDIs, but they usually lack the supporting scientific evidence and different tools can return inconsistent results. In this article, we propose a novel approach that integrates text mining and automated reasoning to derive DDIs. Through the extraction of various facts of drug metabolism, not only the DDIs that are explicitly mentioned in text can be extracted but also the potential interactions that can be inferred by reasoning. Results: Our approach was able to find several potential DDIs that are not present in DrugBank. We manually evaluated these interactions based on their supporting evidence, and our analysis revealed that 81.3% of these interactions were determined to be correct. This suggests that our approach can uncover potential DDIs with scientific evidence explaining the mechanism of the interactions. Contact: luis.tari@roche.com PMID:20823320

  15. A review of approaches to identifying patient phenotype cohorts using electronic health records

    PubMed Central

    Shivade, Chaitanya; Raghavan, Preethi; Fosler-Lussier, Eric; Embi, Peter J; Elhadad, Noemie; Johnson, Stephen B; Lai, Albert M

    2014-01-01

    Objective To summarize literature describing approaches aimed at automatically identifying patients with a common phenotype. Materials and methods We performed a review of studies describing systems or reporting techniques developed for identifying cohorts of patients with specific phenotypes. Every full text article published in (1) Journal of American Medical Informatics Association, (2) Journal of Biomedical Informatics, (3) Proceedings of the Annual American Medical Informatics Association Symposium, and (4) Proceedings of Clinical Research Informatics Conference within the past 3 years was assessed for inclusion in the review. Only articles using automated techniques were included. Results Ninety-seven articles met our inclusion criteria. Forty-six used natural language processing (NLP)-based techniques, 24 described rule-based systems, 41 used statistical analyses, data mining, or machine learning techniques, while 22 described hybrid systems. Nine articles described the architecture of large-scale systems developed for determining cohort eligibility of patients. Discussion We observe that there is a rise in the number of studies associated with cohort identification using electronic medical records. Statistical analyses or machine learning, followed by NLP techniques, are gaining popularity over the years in comparison with rule-based systems. Conclusions There are a variety of approaches for classifying patients into a particular phenotype. Different techniques and data sources are used, and good performance is reported on datasets at respective institutions. However, no system makes comprehensive use of electronic medical records addressing all of their known weaknesses. PMID:24201027

  16. A review of approaches to identifying patient phenotype cohorts using electronic health records.

    PubMed

    Shivade, Chaitanya; Raghavan, Preethi; Fosler-Lussier, Eric; Embi, Peter J; Elhadad, Noemie; Johnson, Stephen B; Lai, Albert M

    2014-01-01

    To summarize literature describing approaches aimed at automatically identifying patients with a common phenotype. We performed a review of studies describing systems or reporting techniques developed for identifying cohorts of patients with specific phenotypes. Every full text article published in (1) Journal of American Medical Informatics Association, (2) Journal of Biomedical Informatics, (3) Proceedings of the Annual American Medical Informatics Association Symposium, and (4) Proceedings of Clinical Research Informatics Conference within the past 3 years was assessed for inclusion in the review. Only articles using automated techniques were included. Ninety-seven articles met our inclusion criteria. Forty-six used natural language processing (NLP)-based techniques, 24 described rule-based systems, 41 used statistical analyses, data mining, or machine learning techniques, while 22 described hybrid systems. Nine articles described the architecture of large-scale systems developed for determining cohort eligibility of patients. We observe that there is a rise in the number of studies associated with cohort identification using electronic medical records. Statistical analyses or machine learning, followed by NLP techniques, are gaining popularity over the years in comparison with rule-based systems. There are a variety of approaches for classifying patients into a particular phenotype. Different techniques and data sources are used, and good performance is reported on datasets at respective institutions. However, no system makes comprehensive use of electronic medical records addressing all of their known weaknesses.

  17. Systematic approaches to identify E3 ligase substrates

    PubMed Central

    Iconomou, Mary; Saunders, Darren N.

    2016-01-01

    Protein ubiquitylation is a widespread post-translational modification, regulating cellular signalling with many outcomes, such as protein degradation, endocytosis, cell cycle progression, DNA repair and transcription. E3 ligases are a critical component of the ubiquitin proteasome system (UPS), determining the substrate specificity of the cascade by the covalent attachment of ubiquitin to substrate proteins. Currently, there are over 600 putative E3 ligases, but many are poorly characterized, particularly with respect to individual protein substrates. Here, we highlight systematic approaches to identify and validate UPS targets and discuss how they are underpinning rapid advances in our understanding of the biochemistry and biology of the UPS. The integration of novel tools, model systems and methods for target identification is driving significant interest in drug development, targeting various aspects of UPS function and advancing the understanding of a diverse range of disease processes. PMID:27834739

  18. A data mining based approach to predict spatiotemporal changes in satellite images

    NASA Astrophysics Data System (ADS)

    Boulila, W.; Farah, I. R.; Ettabaa, K. Saheb; Solaiman, B.; Ghézala, H. Ben

    2011-06-01

    The interpretation of remotely sensed images in a spatiotemporal context is becoming a valuable research topic. However, the constant growth of data volume in remote sensing imaging makes reaching conclusions based on collected data a challenging task. Recently, data mining has emerged as a promising research field leading to several interesting discoveries in various areas such as marketing, surveillance, fraud detection and scientific discovery. By integrating data mining and image interpretation techniques, accurate and relevant information (i.e. functional relation between observed parcels and a set of informational contents) can be automatically elicited. This study presents a new approach to predict spatiotemporal changes in satellite image databases. The proposed method exploits fuzzy sets and data mining concepts to build predictions and decisions for several remote sensing fields. It takes into account imperfections related to the spatiotemporal mining process in order to provide more accurate and reliable information about land cover changes in satellite images. The proposed approach is validated using SPOT images representing the Saint-Denis region, capital of Reunion Island. Results show good performance of the proposed framework in predicting change for the urban zone.

  19. Novel approaches to identify protective malaria vaccine candidates

    PubMed Central

    Chia, Wan Ni; Goh, Yun Shan; Rénia, Laurent

    2014-01-01

    Efforts to develop vaccines against malaria have been the focus of substantial research activities for decades. Several categories of candidate vaccines are currently being developed for protection against malaria, based on antigens corresponding to the pre-erythrocytic, blood stage, or sexual stages of the parasite. Long-lasting sterile protection from Plasmodium falciparum sporozoite challenge has been observed in humans following vaccination with whole parasite formulations, clearly demonstrating that a protective immune response targeting predominantly the pre-erythrocytic stages can develop against malaria. However, most of the vaccine candidates currently being investigated, which are mostly subunit vaccines, have not been able to induce substantial (>50%) protection thus far. This is because the antigens responsible for protection against the different parasite stages are not yet known and relevant correlates of protection have remained elusive. For a vaccine to be developed in a timely manner, novel approaches are required. In this article, we review the novel approaches that have been developed to identify the antigens for the development of an effective malaria vaccine. PMID:25452745

  20. An implicit shape model based approach to identify armed persons

    NASA Astrophysics Data System (ADS)

    Becker, Stefan; Jüngling, Kai

    2011-06-01

    In addition to detecting and tracking persons via video surveillance in public spaces like airports and train stations, another important aspect of a situation analysis is the appearance of objects in the periphery of a person. Not only from a military perspective, in certain environments, an unidentified armed person can be an indicator of a potential threat. In order to become aware of an unidentified armed person and to initiate counteractive measures, the ability to identify persons carrying weapons is needed. In this paper we present a classification approach which fits into an Implicit Shape Model (ISM) based person detection and is capable of differentiating between unarmed persons and persons in an aiming body posture. The approach relies on SIFT features and thus is completely independent of sensor-specific features which might only be perceivable in the visible spectrum. For person representation and detection, a generalized appearance codebook is used. Compared to a stand-alone person detection strategy with ISM, an additional training step is introduced that allows interpretation of a person hypothesis delivered by the ISM. During training, the codebook activations and positions of participating features are stored for the desired classes, in this case, persons in an aiming posture and unarmed persons. With the stored information, one is able to calculate weight factors for every feature participating in a person hypothesis in order to derive a specific classification model. The introduced model is validated using an infrared dataset which shows persons in aiming and non-aiming body postures from different angles.

  1. An effective data mining approach for structure damage identification

    NASA Astrophysics Data System (ADS)

    Hong, Soonyoung

    An efficient, neural network based, online nondestructive structural damage identification procedure is developed for determining the damage characteristics (the damage locations and the corresponding severity) from dynamic measurements in near real-time. The procedure utilizes unique data processing techniques to track the most useful modal information based on modal strain energy and to calculate the associated data based on principal component analysis for further processing in a neural network based identification scheme. With two unique features, this approach is significantly different from currently available damage identification procedures for real-time structural integrity monitoring/diagnostics. First, the most sensitive mode for the specific damage is selected in an automatic process which increases the accuracy of damage identification and decreases time spent on neural network training. Second, the approach creates unique data that extracts core characteristics from modal information for a number of different damage cases; and consequently, the accuracy of the damage identification improves significantly. This approach can be operated online providing real time structural damage identification. The method is tested for simulated damage cases, including situations of single and multiple damage in the closely-spaced frequencies of Kabe's model. The philosophy behind the proposed research is to provide a means to online and nondestructively predict the degradation of a structure's integrity (i.e. damage location and the corresponding severity, strength loss).

  2. Using Bioinformatic Approaches to Identify Pathways Targeted by Human Leukemogens

    PubMed Central

    Thomas, Reuben; Phuong, Jimmy; McHale, Cliona M.; Zhang, Luoping

    2012-01-01

    We have applied bioinformatic approaches to identify pathways common to chemical leukemogens and to determine whether leukemogens could be distinguished from non-leukemogenic carcinogens. From all known and probable carcinogens classified by IARC and NTP, we identified 35 carcinogens that were associated with leukemia risk in human studies and 16 non-leukemogenic carcinogens. Using data on gene/protein targets available in the Comparative Toxicogenomics Database (CTD) for 29 of the leukemogens and 11 of the non-leukemogenic carcinogens, we analyzed for enrichment of all 250 human biochemical pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The top pathways targeted by the leukemogens included metabolism of xenobiotics by cytochrome P450, glutathione metabolism, neurotrophin signaling pathway, apoptosis, MAPK signaling, Toll-like receptor signaling and various cancer pathways. The 29 leukemogens formed 18 distinct clusters comprising 1 to 3 chemicals that did not correlate with known mechanism of action or with structural similarity as determined by 2D Tanimoto coefficients in the PubChem database. Unsupervised clustering and one-class support vector machines, based on the pathway data, were unable to distinguish the 29 leukemogens from 11 non-leukemogenic known and probable IARC carcinogens. However, using two-class random forests to estimate leukemogen and non-leukemogen patterns, we estimated a 76% chance of distinguishing a random leukemogen/non-leukemogen pair from each other. PMID:22851955
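
    The structural similarity measure mentioned above, the Tanimoto coefficient, is the size of the intersection of two fingerprints divided by the size of their union. The sketch below represents each fingerprint as a set of "on" bit positions. The bit positions are made up for illustration and are not PubChem fingerprints, and returning 0.0 for two empty fingerprints is a convention chosen here.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient |A & B| / |A | B| of two fingerprint bit sets."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        return 0.0  # assumed convention for two empty fingerprints
    return len(a & b) / len(a | b)


# Hypothetical fingerprints: positions of the bits that are set.
FP_CHEMICAL_1 = {1, 2, 3, 4}
FP_CHEMICAL_2 = {3, 4, 5}

# Two bits are shared out of five distinct bits overall, i.e. 2/5.
SIMILARITY = tanimoto(FP_CHEMICAL_1, FP_CHEMICAL_2)
```

    A value of 1.0 means identical fingerprints and 0.0 means no shared bits, so pairs of chemicals within a cluster can be compared on a common scale.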

  3. Integrative biology approach identifies cytokine targeting strategies for psoriasis.

    PubMed

    Perera, Gayathri K; Ainali, Chrysanthi; Semenova, Ekaterina; Hundhausen, Christian; Barinaga, Guillermo; Kassen, Deepika; Williams, Andrew E; Mirza, Muddassar M; Balazs, Mercedesz; Wang, Xiaoting; Rodriguez, Robert Sanchez; Alendar, Andrej; Barker, Jonathan; Tsoka, Sophia; Ouyang, Wenjun; Nestle, Frank O

    2014-02-12

    Cytokines are critical checkpoints of inflammation. The treatment of human autoimmune disease has been revolutionized by targeting inflammatory cytokines as key drivers of disease pathogenesis. Despite this, there exist numerous pitfalls when translating preclinical data into the clinic. We developed an integrative biology approach combining human disease transcriptome data sets with clinically relevant in vivo models in an attempt to bridge this translational gap. We chose interleukin-22 (IL-22) as a model cytokine because of its potentially important proinflammatory role in epithelial tissues. Injection of IL-22 into normal human skin grafts produced marked inflammatory skin changes resembling human psoriasis. Injection of anti-IL-22 monoclonal antibody in a human xenotransplant model of psoriasis, developed specifically to test potential therapeutic candidates, efficiently blocked skin inflammation. Bioinformatic analysis integrating both the IL-22 and anti-IL-22 cytokine transcriptomes and mapping them onto a psoriasis disease gene coexpression network identified key cytokine-dependent hub genes. Using knockout mice and small-molecule blockade, we show that one of these hub genes, the so far unexplored serine/threonine kinase PIM1, is a critical checkpoint for human skin inflammation and potential future therapeutic target in psoriasis. Using in silico integration of human data sets and biological models, we were able to identify a new target in the treatment of psoriasis.

  4. Identifying new human oocyte marker genes: a microarray approach

    PubMed Central

    Gasca, Stephan; Pellestor, Franck; Assou, Said; Loup, Vanessa; Anahory, Tal; Dechaud, Hervé; De Vos, John; Hamamah, Samir

    2007-01-01

    Efficiency in classical IVF (cIVF) techniques is still impaired by poor implantation and pregnancy rates after embryo transfer. This is mostly due to a lack of reliable criteria for the selection of embryos with sufficient development potential. Several studies have provided evidence that the expression levels of some genes could be used as objective markers of oocyte and embryo competence and of their capacity to sustain a successful pregnancy. These analyses usually used reverse transcription-polymerase chain reaction to look at small sets of pre-selected genes. However, microarray approaches make it possible to identify a wider range of cellular marker genes. Thus they allow the identification of additional and perhaps better-suited genes that could serve as embryo selection markers. Microarray screenings of circa 30 000 genes on U133P Affymetrix™ gene chips made it possible to establish the expression profile of these genes as well as other related genes in human oocytes and cumulus cells. In this study, we identified new potential regulators and marker genes such as BARD1, RBL2, RBBP7, BUB3 or BUB1B, which are involved in oocyte maturation. PMID:17298719

  5. A new approach to estimate fugitive methane emissions from coal mining in China.

    PubMed

    Ju, Yiwen; Sun, Yue; Sa, Zhanyou; Pan, Jienan; Wang, Jilin; Hou, Quanlin; Li, Qingguang; Yan, Zhifeng; Liu, Jie

    2016-02-01

    Developing a more accurate greenhouse gas (GHG) emissions inventory has drawn considerable attention. Because of its resource endowment and technical status, China has made coal-related GHG emissions a big part of its inventory. Lacking a stoichiometric carbon conversion coefficient and influenced by geological conditions and mining technologies, previous efforts to estimate fugitive methane emissions from coal mining in China have led to inconsistent results. This paper proposes a new calculation methodology to determine fugitive methane emissions from coal mining based on the domestic analysis of gas geology, gas emission features, and the merits and demerits of existing estimation methods. This new approach involves four main parameters: in-situ original gas content, gas remaining post-desorption, raw coal production, and mining influence coefficient. The case studies in Huaibei-Huainan Coalfield and Jincheng Coalfield show that the new method yields the smallest errors, +9.59% and 7.01% respectively, compared with the other methods in this study, Tier 1 and Tier 2 (with two samples), which resulted in +140.34%, +138.90%, and -18.67% in Huaibei-Huainan Coalfield and +64.36%, +47.07%, and -14.91% in Jincheng Coalfield. Compared with the predominantly used methods, the new method is not only a comparatively simpler process with lower uncertainty than the "emission factor method" (IPCC recommended Tier 1 and Tier 2), but also offers easier data accessibility, similar uncertainty, and accounts for additional post-mining emissions compared with the "absolute gas emission method" (IPCC recommended Tier 3). Therefore, methane emissions dissipated from most of the producing coal mines worldwide could be more accurately and more easily estimated. Copyright © 2015 Elsevier B.V. All rights reserved.

  6. Crime Pattern Analysis: A Spatial Frequent Pattern Mining Approach

    DTIC Science & Technology

    2012-05-10

    feature type, Bars, and two crime types, Assaults and Drunk Driving. Red circles represent bars; blue triangles represent assaults, green squares...represent drunk driving. Given the input in Figure 4(a), the RFCP discovery process identifies RFCPs as shown in Figure 4(b). In the Figure, the...like assault or drunk driving. This chance of occurrence can be interpreted as a local fraction of instances corresponding to a crime type or feature

  7. Text mining for identifying topics in the literatures about adolescent substance use and depression.

    PubMed

    Wang, Shi-Heng; Ding, Yijun; Zhao, Weizhong; Huang, Yung-Hsiang; Perkins, Roger; Zou, Wen; Chen, James J

    2016-03-19

    Both adolescent substance use and adolescent depression are major public health problems, and have the tendency to co-occur. Thousands of articles on adolescent substance use or depression have been published. It is labor intensive and time consuming to extract huge amounts of information from the cumulated collections. Topic modeling offers a computational tool to find relevant topics by capturing meaningful structure among collections of documents. In this study, a total of 17,723 abstracts from PubMed published from 2000 to 2014 on adolescent substance use and depression were downloaded as objects, and Latent Dirichlet allocation (LDA) was applied to perform text mining on the dataset. Word clouds were used to visually display the content of topics and demonstrate the distribution of vocabularies over each topic. The LDA topics recaptured the search keywords in PubMed, and further discovered relevant issues, such as intervention program, association links between adolescent substance use and adolescent depression, such as sexual experience and violence, and risk factors of adolescent substance use, such as family factors and peer networks. Using trend analysis to explore the dynamics of proportion of topics, we found that brain research was assessed as a hot issue by the coefficient of the trend test. Topic modeling has the ability to segregate a large collection of articles into distinct themes, and it could be used as a tool to understand the literature, not only by recapturing known facts but also by discovering other relevant topics.

  8. Text-mining approach to evaluate terms for ontology development.

    PubMed

    Tsoi, Lam C; Patel, Ravi; Zhao, Wenle; Zheng, W Jim

    2009-10-01

    Developing ontologies to account for the complexity of biological systems requires the time intensive collaboration of many participants with expertise in various fields. While each participant may contribute to construct a list of terms for ontology development, no objective methods have been developed to evaluate how relevant each of these terms is to the intended domain. We have developed a computational method based on a hypergeometric enrichment test to evaluate the relevance of such terms to the intended domain. The proposed method uses the PubMed literature database to evaluate whether each potential term for ontology development is overrepresented in the abstracts that discuss the particular domain. This evaluation provides an objective approach to assess terms and prioritize them for ontology development.
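
    A hypergeometric enrichment test of this kind asks how likely it is to see at least the observed number of term-containing abstracts in the domain set if abstracts were drawn at random from the whole corpus. The sketch below computes that upper-tail probability with the standard library. The parameter names and the toy counts are assumptions for illustration, not values from the paper.

```python
from math import comb


def hypergeom_enrichment_p(total, with_term, domain, domain_with_term):
    """Upper-tail hypergeometric p-value P(X >= domain_with_term).

    total            -- abstracts in the corpus
    with_term        -- corpus abstracts that contain the candidate term
    domain           -- abstracts that discuss the domain of interest
    domain_with_term -- domain abstracts that contain the term
    """
    upper = min(with_term, domain)
    denominator = comb(total, domain)
    numerator = sum(
        comb(with_term, i) * comb(total - with_term, domain - i)
        for i in range(domain_with_term, upper + 1)
    )
    return numerator / denominator


# Toy counts (made up): 10 abstracts, 4 contain the term, 5 are in the
# domain, and 3 of those 5 contain the term.
TOY_P = hypergeom_enrichment_p(10, 4, 5, 3)
```

    A small p-value suggests the term is overrepresented in the domain abstracts; candidate terms can then be ranked by this value to prioritize them.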

  9. A Temporal Pattern Mining Approach for Classifying Electronic Health Record Data.

    PubMed

    Batal, Iyad; Valizadegan, Hamed; Cooper, Gregory F; Hauskrecht, Milos

    2013-09-01

    We study the problem of learning classification models from complex multivariate temporal data encountered in electronic health record systems. The challenge is to define a good set of features that are able to represent well the temporal aspect of the data. Our method relies on temporal abstractions and temporal pattern mining to extract the classification features. Temporal pattern mining usually returns a large number of temporal patterns, most of which may be irrelevant to the classification task. To address this problem, we present the Minimal Predictive Temporal Patterns framework to generate a small set of predictive and non-spurious patterns. We apply our approach to the real-world clinical task of predicting patients who are at risk of developing heparin induced thrombocytopenia. The results demonstrate the benefit of our approach in efficiently learning accurate classifiers, which is a key step for developing intelligent clinical monitoring systems.

  10. A novel meta-analytic approach: Mining frequent co-activation patterns in neuroimaging databases

    PubMed Central

    Caspers, Julian; Zilles, Karl; Beierle, Christoph; Rottschy, Claudia; Eickhoff, Simon B.

    2016-01-01

    In recent years, coordinate-based meta-analyses have become a powerful and widely used tool to study coactivity across neuroimaging experiments, a development that was supported by the emergence of large-scale neuroimaging databases like BrainMap. However, the evaluation of co-activation patterns is constrained by the fact that previous coordinate-based meta-analysis techniques like Activation Likelihood Estimation (ALE) and Multilevel Kernel Density Analysis (MKDA) reveal all brain regions that show convergent activity within a dataset without taking into account actual within-experiment co-occurrence patterns. To overcome this issue we here propose a novel meta-analytic approach named PaMiNI that utilizes a combination of two well-established data-mining techniques, Gaussian mixture modeling and the Apriori algorithm. By this, PaMiNI enables a data-driven detection of frequent co-activation patterns within neuroimaging datasets. The feasibility of the method is demonstrated by means of several analyses on simulated data as well as a real application. The analyses of the simulated data show that PaMiNI identifies the brain regions underlying the simulated activation foci and perfectly separates the co-activation patterns of the experiments in the simulations. Furthermore, PaMiNI still yields good results when activation foci of distinct brain regions become closer together or if they are non-Gaussian distributed. For the further evaluation, a real dataset on working memory experiments is used, which was previously examined in an ALE meta-analysis and hence allows a cross-validation of both methods. In this latter analysis, PaMiNI revealed a fronto-parietal “core” network of working memory and furthermore indicates a left-lateralization in this network. Finally, to encourage a widespread usage of this new method, the PaMiNI approach was implemented into a publicly available software system. PMID:24365675
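
    The Apriori step named above can be sketched in a few lines. This is a generic textbook Apriori under stated assumptions, not the PaMiNI software: each experiment is treated as a transaction whose items are region labels, the region labels stand in for the output of the Gaussian mixture modeling step, and the labels, function name, and support threshold are all hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count support of the surviving candidates against all transactions.
        current = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                current[c] = support
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical experiments, each listing the regions it activated.
experiments = [
    {"IFG", "IPL", "SMA"},
    {"IFG", "IPL"},
    {"IFG", "IPL", "V1"},
    {"V1", "SMA"},
    {"IFG", "SMA"},
]
patterns = apriori(experiments, min_support=3)
```

    In this toy input, the only multi-region pattern that reaches the threshold is the pair IFG and IPL, which co-occur in three experiments. Counting within-experiment co-occurrence in this way is what separates the approach from methods that only pool convergent activity across a dataset.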

  11. Combustion efficiency optimization and virtual testing: A data-mining approach

    SciTech Connect

    Kusiak, A.; Song, Z.

    2006-08-15

    In this paper, a data-mining approach is applied to optimize combustion efficiency of a coal-fired boiler. The combustion process is complex, nonlinear, and nonstationary. A virtual testing procedure is developed to validate the results produced by the optimization methods. The developed procedure quantifies improvements in the combustion efficiency without performing live testing, which is expensive and time consuming. The ideas introduced in this paper are illustrated with an industrial case study.

  12. A Data-Mining Scheme for Identifying Peptide Structural Motifs Responsible for Different MS/MS Fragmentation Intensity Patterns

    PubMed Central

    Huang, Yingying; Tseng, George C.; Yuan, Shinsheng; Pasa-Tolic, Ljiljana; Lipton, Mary S.; Smith, Richard D.; Wysocki, Vicki H.

    2008-01-01

    Although tandem mass spectrometry (MS/MS) has become an integral part of proteomics, intensity patterns in MS/MS spectra are rarely weighted heavily in most widely used algorithms because they are not yet fully understood. Here a knowledge mining approach is demonstrated to discover fragmentation intensity patterns and elucidate the chemical factors behind such patterns. Fragmentation intensity information from 28 330 ion trap peptide MS/MS spectra of different charge states and sequences went through unsupervised clustering using a penalized K-means algorithm. Without any prior chemistry assumptions, four clusters with distinctive fragmentation patterns were obtained. A decision tree was generated to investigate peptide sequence motif and charge state status that caused these fragmentation patterns. This data-mining scheme is generally applicable for any large data sets. It bypasses the common prior knowledge constraints and reports on the overall peptide fragmentation behavior. It improves the understanding of gas-phase peptide dissociation and provides a foundation for new or improved protein identification algorithms. PMID:18052120

  13. New Seasonal Shift in In-Stream Diurnal Nitrate Cycles Identified by Mining High-Frequency Data

    PubMed Central

    2016-01-01

    The recent development of in-situ monitoring devices, such as UV-spectrometers, makes the study of short-term stream chemistry variation relevant, especially the study of diurnal cycles, which are not yet fully understood. Our study is based on high-frequency data from an agricultural catchment (Studienlandschaft Schwingbachtal, Germany). We propose a novel approach, i.e. the combination of cluster analysis and Linear Discriminant Analysis, to mine nitrate behavior patterns from these data. As a result, we observe a seasonality of nitrate diurnal cycles that differs from the most common cycle seasonality described in the literature, i.e. pre-dawn peaks in spring. Our cycles appear in summer, and the maximum and minimum shift to a later time in late summer/autumn. This is observed both for water- and energy-limited years, thus potentially stressing the role of evapotranspiration. This concluding hypothesis on the role of evapotranspiration in nitrate stream concentration, which was obtained through data mining, broadens the perspective on the diurnal cycling of stream nitrate concentrations. PMID:27073838

  14. New Seasonal Shift in In-Stream Diurnal Nitrate Cycles Identified by Mining High-Frequency Data.

    PubMed

    Aubert, Alice H; Breuer, Lutz

    2016-01-01

    The recent development of in-situ monitoring devices, such as UV-spectrometers, makes the study of short-term stream chemistry variation relevant, especially the study of diurnal cycles, which are not yet fully understood. Our study is based on high-frequency data from an agricultural catchment (Studienlandschaft Schwingbachtal, Germany). We propose a novel approach, i.e. the combination of cluster analysis and Linear Discriminant Analysis, to mine nitrate behavior patterns from these data. As a result, we observe a seasonality of nitrate diurnal cycles that differs from the most common cycle seasonality described in the literature, i.e. pre-dawn peaks in spring. Our cycles appear in summer, and the maximum and minimum shift to a later time in late summer/autumn. This is observed both for water- and energy-limited years, thus potentially stressing the role of evapotranspiration. This concluding hypothesis on the role of evapotranspiration in nitrate stream concentration, which was obtained through data mining, broadens the perspective on the diurnal cycling of stream nitrate concentrations.

  15. An ecosystem approach to evaluate restoration measures in the lignite mining district of Lusatia/Germany

    NASA Astrophysics Data System (ADS)

    Schaaf, Wolfgang

    2015-04-01

    Lignite mining in Lusatia has a history of over 100 years. Open-cast mining directly affected an area of 1000 km². Over the past 20 years, we have established an ecosystem-oriented approach to evaluate the development and site characteristics of post-mining areas mainly restored for agricultural and silvicultural land use. Water and element budgets of afforested sites were studied under different geochemical settings in a chronosequence approach (Schaaf 2001), as well as the effect of soil amendments like sewage sludge or compost in restoration (Schaaf & Hüttl 2006). For the past 10 years, we have also studied the development of natural site regeneration in the constructed catchment Chicken Creek at the watershed scale (Schaaf et al. 2011, 2013). One of the striking characteristics of post-mining sites is a very large small-scale soil heterogeneity that has to be taken into account with respect to soil-forming processes and element cycling. Results from these studies, in combination with smaller-scale process studies, make it possible to evaluate the long-term effect of restoration measures and adapted land use options. In addition, it is crucial to compare these results with data from undisturbed, i.e. non-mined, sites. Schaaf, W., 2001: What can element budgets of false-time series tell us about ecosystem development on post-lignite mining sites? Ecological Engineering 17, 241-252. Schaaf, W. and Hüttl, R. F., 2006: Direct and indirect effects of soil pollution by lignite mining. Water, Air and Soil Pollution - Focus 6, 253-264. Schaaf, W., Bens, O., Fischer, A., Gerke, H.H., Gerwin, W., Grünewald, U., Holländer, H.M., Kögel-Knabner, I., Mutz, M., Schloter, M., Schulin, R., Veste, M., Winter, S. & Hüttl, R.F., 2011: Patterns and processes of initial terrestrial-ecosystem development. Journal of Plant Nutrition and Soil Science, 174, 229-239. Schaaf, W., Elmer, M., Fischer, A., Gerwin, W., Nenov, R., Pretsch, H. and Zaplate, M.K., 2013: Feedbacks between vegetation, surface structures and hydrology

  16. An integrated approach for identifying priority contaminant in ...

    EPA Pesticide Factsheets

    Environmental assessment of complex mixtures typically requires integration of chemical and biological measurements. This study demonstrates the use of a combination of instrumental chemical analyses, effects-based monitoring, and bio-effects prediction approaches to help identify potential hazards and priority contaminants in two Great Lakes Areas of Concern (AOCs), the Lower Green Bay/Fox River located near Green Bay, WI, USA and the Milwaukee River Estuary, located near Milwaukee, WI, USA. Fathead minnows were caged at four sites within each AOC (eight sites total). Following 4 d of in situ exposure, tissues and biofluids were sampled and used for targeted biological effects analyses. Additionally, 4 d composite water samples were collected concurrently at each caged fish site and analyzed for 134 analytes as well as evaluated for total estrogenic and androgenic activity using cell-based bioassays. Of the analytes examined, 75 were detected in composite samples from at least one site. Based on multiple analyses, one site in the East River and another site near a paper mill discharge from lower Green Bay/Fox River AOC, were prioritized due to their estrogenic and androgenic activity, respectively. The water samples from other sites generally did not exhibit significant estrogenic or androgenic activity, nor was there evidence for endocrine disruption in the fish exposed at these sites as indicated by the lack of alterations in ex vivo steroid production, c

  17. Identifying heterogeneous anisotropic properties in cerebral aneurysms: a pointwise approach.

    PubMed

    Zhao, Xuefeng; Raghavan, Madhavan L; Lu, Jia

    2011-04-01

    The traditional approaches of estimating heterogeneous properties in a soft tissue structure using optimization-based inverse methods often face difficulties because of the large number of unknowns to be simultaneously determined. This article proposes a new method for identifying the heterogeneous anisotropic nonlinear elastic properties in cerebral aneurysms. In this method, the local properties are determined directly from the pointwise stress-strain data, thus avoiding the need for simultaneously optimizing for the property values at all points/regions in the aneurysm. The stress distributions needed for a pointwise identification are computed using an inverse elastostatic method without invoking the material properties in question. This paradigm is tested numerically through simulated inflation tests on an image-based cerebral aneurysm sac. The wall tissue is modeled as an eight-ply laminate whose constitutive behavior is described by an anisotropic hyperelastic strain energy function containing four parameters. The parameters are assumed to vary continuously in the sac. Deformed configurations generated from forward finite element analysis are taken as input to inversely establish the parameter distributions. The delineated and the assigned distributions are in excellent agreement. A forward verification is conducted by comparing the displacement solutions obtained from the delineated and the assigned material parameters at a different pressure. The deviations in nodal displacements are found to be within 0.2% in most parts of the sac. The study highlights some distinct features of the proposed method, and demonstrates the feasibility of organ level identification of the distributive anisotropic nonlinear properties in cerebral aneurysms.

  18. Reverse Vaccinology: An Approach for Identifying Leptospiral Vaccine Candidates

    PubMed Central

    Dellagostin, Odir A.; Grassmann, André A.; Rizzi, Caroline; Schuch, Rodrigo A.; Jorge, Sérgio; Oliveira, Thais L.; McBride, Alan J. A.; Hartwig, Daiane D.

    2017-01-01

    Leptospirosis is a major public health problem with an incidence of over one million human cases each year. It is a globally distributed, zoonotic disease and is associated with significant economic losses in farm animals. Leptospirosis is caused by pathogenic Leptospira spp. that can infect a wide range of domestic and wild animals. Given the inability to control the cycle of transmission among animals and humans, there is an urgent demand for a new vaccine. Inactivated whole-cell vaccines (bacterins) are routinely used in livestock and domestic animals, however, protection is serovar-restricted and short-term only. To overcome these limitations, efforts have focused on the development of recombinant vaccines, with partial success. Reverse vaccinology (RV) has been successfully applied to many infectious diseases. A growing number of leptospiral genome sequences are now available in public databases, providing an opportunity to search for prospective vaccine antigens using RV. Several promising leptospiral antigens were identified using this approach, although only a few have been characterized and evaluated in animal models. In this review, we summarize the use of RV for leptospirosis and discuss the need for potential improvements for the successful development of a new vaccine towards reducing the burden of human and animal leptospirosis. PMID:28098813

  19. An integrated approach for identifying priority contaminant in ...

    EPA Pesticide Factsheets

    Environmental assessment of complex mixtures typically requires integration of chemical and biological measurements. This study demonstrates the use of a combination of instrumental chemical analyses, effects-based monitoring, and bio-effects prediction approaches to help identify potential hazards and priority contaminants in two Great Lakes Areas of Concern (AOCs), the Lower Green Bay/Fox River located near Green Bay, WI, USA and the Milwaukee River Estuary, located near Milwaukee, WI, USA. Fathead minnows were caged at four sites within each AOC (eight sites total). Following 4 d of in situ exposure, tissues and biofluids were sampled and used for targeted biological effects analyses. Additionally, 4 d composite water samples were collected concurrently at each caged fish site and analyzed for 134 analytes as well as evaluated for total estrogenic and androgenic activity using cell-based bioassays. Of the analytes examined, 75 were detected in composite samples from at least one site. Based on multiple analyses, one site in the East River and another site near a paper mill discharge from lower Green Bay/Fox River AOC, were prioritized due to their estrogenic and androgenic activity, respectively. The water samples from other sites generally did not exhibit significant estrogenic or androgenic activity, nor was there evidence for endocrine disruption in the fish exposed at these sites as indicated by the lack of alterations in ex vivo steroid production, c

  20. Proteomic approaches to identifying carbonylated proteins in brain tissue.

    PubMed

    Linares, María; Marín-García, Patricia; Méndez, Darío; Puyet, Antonio; Diez, Amalia; Bautista, José M

    2011-04-01

    Oxidative stress plays a critical role in the pathogenesis of a number of diseases. The carbonyl end products of protein oxidation are among the most commonly measured markers of oxidation in biological samples. Protein carbonyl functional groups may be derivatized with 2,4-dinitrophenylhydrazine (DNPH) to render a stable 2,4-dinitrophenylhydrazone-protein (DNP-protein) and the carbonyl contents of individual proteins then determined by two-dimensional electrophoresis followed by immunoblotting using specific anti-DNP antibodies. Unfortunately, derivatization of proteins with DNPH could affect their mass spectrometry (MS) identification. This problem can be overcome using nontreated samples for protein identification. Nevertheless, derivatization could also affect their mobility, which might be solved by performing the derivatization step after the initial electrophoresis. Here, we compare two-dimensional redox proteome maps of mouse cerebellum acquired by performing the DNPH derivatization step before or after electrophoresis and detect differences in protein patterns. When the same approach is used for protein detection and identification, both methods were found to be useful to identify carbonylated proteins. However, whereas pre-DNPH derivatized proteins were successfully analyzed, high background staining complicated the analysis when the DNPH reaction was performed after transblotting. Comparative data on protein identification using both methods are provided.

  1. Reverse Vaccinology: An Approach for Identifying Leptospiral Vaccine Candidates.

    PubMed

    Dellagostin, Odir A; Grassmann, André A; Rizzi, Caroline; Schuch, Rodrigo A; Jorge, Sérgio; Oliveira, Thais L; McBride, Alan J A; Hartwig, Daiane D

    2017-01-14

    Leptospirosis is a major public health problem with an incidence of over one million human cases each year. It is a globally distributed, zoonotic disease and is associated with significant economic losses in farm animals. Leptospirosis is caused by pathogenic Leptospira spp. that can infect a wide range of domestic and wild animals. Given the inability to control the cycle of transmission among animals and humans, there is an urgent demand for a new vaccine. Inactivated whole-cell vaccines (bacterins) are routinely used in livestock and domestic animals, however, protection is serovar-restricted and short-term only. To overcome these limitations, efforts have focused on the development of recombinant vaccines, with partial success. Reverse vaccinology (RV) has been successfully applied to many infectious diseases. A growing number of leptospiral genome sequences are now available in public databases, providing an opportunity to search for prospective vaccine antigens using RV. Several promising leptospiral antigens were identified using this approach, although only a few have been characterized and evaluated in animal models. In this review, we summarize the use of RV for leptospirosis and discuss the need for potential improvements for the successful development of a new vaccine towards reducing the burden of human and animal leptospirosis.

  2. A quantitative approach to identifying predators from nest remains

    USGS Publications Warehouse

    Anthony, R. Michael; Grand, J.B.; Fondell, T.F.; Manly, B.F.

    2004-01-01

    Nesting success of Dusky Canada Geese (Branta canadensis occidentalis) has declined greatly since a major earthquake affected southern Alaska in 1964. To identify nest predators, we collected predation data at goose nests and photographs of predators at natural nests containing artificial eggs in 1997-2000. To document feeding behavior by nest predators, we compiled the evidence from destroyed nests with known predators on our study site and from previous studies. We constructed a profile for each predator group and compared the evidence from 895 nests with unknown predators to our predator profiles using mixture-model analysis. This analysis indicated that 72% of destroyed nests were depredated by Bald Eagles and 13% by brown bears, and also yielded the probability that each nest was correctly assigned to a predator group based on model fit. Model testing using simulations indicated that the proportion estimated for eagle predation was unbiased and the proportion for bear predation was slightly overestimated. This approach may have application whenever there are adequate data on nests destroyed by known predators and predators exhibit different feeding behavior at nests.

  3. Identifying new human oocyte marker genes: a microarray approach.

    PubMed

    Gasca, Stéphan; Pellestor, Franck; Assou, Saïd; Loup, Vanessa; Anahory, Tal; Dechaud, Hervé; De Vos, John; Hamamah, Samir

    2007-02-01

    The efficacy of classical IVF techniques is still impaired by poor implantation and pregnancy rates after embryo transfer. This is mainly due to a lack of reliable criteria for the selection of embryos with sufficient development potential. Several studies have provided evidence that some gene expression levels could be used as objective markers of oocyte and embryo competence and capacity to sustain a successful pregnancy. These analyses usually use reverse transcription-polymerase chain reaction to look at small sets of pre-selected genes. However, microarray approaches allow the identification of a wider range of cellular marker genes which could include additional and perhaps more suitable genes that could serve as embryo selection markers. Microarray screenings of around 30,000 genes on U133P Affymetrix gene chips made it possible to establish the expression profile of these genes as well as other related genes in human oocytes and cumulus cells. This study identifies new potential regulators and marker genes such as BARD1, RBL2, RBBP7, BUB3 or BUB1B, which are involved in oocyte maturation.

  4. A Novel Approach for Mining Polymorphic Microsatellite Markers In Silico

    PubMed Central

    Hoffman, Joseph I.; Nichols, Hazel J.

    2011-01-01

    An important emerging application of high-throughput 454 sequencing is the isolation of molecular markers such as microsatellites from genomic DNA. However, few studies have developed microsatellites from cDNA despite the added potential for targeting candidate genes. Moreover, to develop microsatellites usually requires the evaluation of numerous primer pairs for polymorphism in the focal species. This can be time-consuming and wasteful, particularly for taxa with low genetic diversity where the majority of primers often yield monomorphic polymerase chain reaction (PCR) products. Transcriptome assemblies provide a convenient solution, functional annotation of transcripts allowing markers to be targeted towards candidate genes, while high sequence coverage in principle permits the assessment of variability in silico. Consequently, we evaluated fifty primer pairs designed to amplify microsatellites, primarily residing within transcripts related to immunity and growth, identified from an Antarctic fur seal (Arctocephalus gazella) transcriptome assembly. In silico visualization was used to classify each microsatellite as being either polymorphic or monomorphic and to quantify the number of distinct length variants, each taken to represent a different allele. The majority of loci (n = 36, 76.0%) yielded interpretable PCR products, 23 of which were polymorphic in a sample of 24 fur seal individuals. Loci that appeared variable in silico were significantly more likely to yield polymorphic PCR products, even after controlling for microsatellite length measured in silico. We also found a significant positive relationship between inferred and observed allele number. This study not only demonstrates the feasibility of generating modest panels of microsatellites targeted towards specific classes of gene, but also suggests that in silico microsatellite variability may provide a useful proxy for PCR product polymorphism. PMID:21853104

  5. Text Mining.

    ERIC Educational Resources Information Center

    Trybula, Walter J.

    1999-01-01

    Reviews the state of research in text mining, focusing on newer developments. The intent is to describe the disparate investigations currently included under the term text mining and provide a cohesive structure for these efforts. A summary of research identifies key organizations responsible for pushing the development of text mining. A section…

  6. Novel LanT Associated Lantibiotic Clusters Identified by Genome Database Mining

    PubMed Central

    Singh, Mangal; Sareen, Dipti

    2014-01-01

    Background: Frequent use of antibiotics has led to the emergence of antibiotic resistance in bacteria. Lantibiotic compounds are ribosomally synthesized antimicrobial peptides against which bacteria are not able to develop resistance, making them a good alternative to antibiotics. Nisin is the oldest and most widely used lantibiotic in food preservation, and no significant resistance against it has developed. Given their antimicrobial potential and limited number, there is a need to identify novel lantibiotics. Methodology/Findings: Identification of novel lantibiotic biosynthetic clusters from an ever-increasing database of bacterial genomes can provide a major lead in this direction. To achieve this, a strategy was adopted to identify novel lantibiotic biosynthetic clusters by screening the sequenced genomes for homologs of LanT, a conserved lantibiotic transporter specific to type IB clusters. This strategy resulted in the identification of 54 bacterial strains that contain LanT homologs and are not known lantibiotic producers. Of these, 24 strains were subjected to a detailed bioinformatic analysis to identify genes encoding precursor peptides, modification enzymes, immunity proteins and quorum sensing proteins. Eight clusters having two LanM determinants, similar to haloduracin and lichenicidin, were identified, along with 13 clusters having a single LanM determinant, as in the mersacidin biosynthetic cluster. Besides these, orphan LanT homologs were also identified, which might be associated with novel bacteriocins encoded elsewhere in the genome. Three identified gene clusters had a C39 domain-containing LanT transporter associated with the LanBC proteins and double-glycine-type precursor peptides; the only known example of such a cluster is that of salivaricin. Conclusion: This study led to the identification of 8 novel putative two-component lantibiotic clusters along with 13 having a single LanM and 3 with LanBC genes

  7. Evaluation of the approach to respirable quartz exposure control in U.S. coal mines.

    PubMed

    Joy, Gerald J

    2012-01-01

    Occupational exposure to high levels of respirable quartz can result in respiratory and other diseases in humans. The Mine Safety and Health Administration (MSHA) regulates exposure to respirable quartz in coal mines indirectly through reductions in the respirable coal mine dust exposure limit based on the content of quartz in the airborne respirable dust. This reduction is implemented when the quartz content of airborne respirable dust exceeds 5% by weight. The intent of this dust standard reduction is to restrict miners' exposure to respirable quartz to a time-weighted average concentration of 100 μg/m³. The effectiveness of this indirect approach to control quartz exposure was evaluated by analyzing respirable dust samples collected by MSHA inspectors from 1995 through 2008. The performance of the current regulatory approach was found to be lacking due to the use of a variable property (quartz content in airborne dust) to establish a standard for subsequent exposures. In one situation, 11.7% (4370/37,346) of samples that were below the applicable respirable coal mine dust exposure limit exceeded 100 μg/m³ quartz. In a second situation, 4.4% (895/20,560) of samples with 5% or less quartz content in the airborne respirable dust exceeded 100 μg/m³ quartz. In these two situations, the samples exceeding 100 μg/m³ quartz were not subject to any potential compliance action. Therefore, the current respirable quartz exposure control approach does not reliably maintain miner exposure below 100 μg/m³ quartz. A separate and specific respirable quartz exposure standard may improve control of coal miners' occupational exposure to respirable quartz.
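
    The arithmetic behind the first situation can be sketched as follows. This is an illustration under stated assumptions, not the paper's analysis: the abstract does not give the reduction formula, so the sketch assumes the commonly cited rule that the reduced dust limit equals 10 divided by the percent quartz (in mg/m³) and assumes a base limit of 2.0 mg/m³. The sample values and function names are hypothetical.

```python
def reduced_dust_limit_mg_m3(quartz_percent, base_limit_mg_m3=2.0):
    """Respirable dust limit in mg/m3 for a given quartz content.

    Assumption (not stated in the abstract): above 5% quartz by weight the
    limit is 10 / (% quartz); otherwise the assumed base limit applies.
    """
    if quartz_percent > 5.0:
        return 10.0 / quartz_percent
    return base_limit_mg_m3


def quartz_ug_m3(dust_mg_m3, quartz_percent):
    """Quartz concentration in ug/m3 implied by a dust sample."""
    return dust_mg_m3 * 1000.0 * quartz_percent / 100.0


# Hypothetical case: the limit was set from an earlier sample with 8% quartz...
limit = reduced_dust_limit_mg_m3(8.0)        # 1.25 mg/m3
# ...but a later sample contains 10% quartz at a dust level below that limit.
later_dust = 1.2                             # mg/m3, compliant with the dust limit
later_quartz = quartz_ug_m3(later_dust, 10.0)  # about 120 ug/m3, above the 100 target
```

    Because the limit is fixed from an earlier quartz percentage while the quartz content of later samples varies, a sample can comply with the dust limit and still exceed 100 μg/m³ quartz, which is the failure mode the abstract reports.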

  8. A fluorescent approach for identifying P2X1 ligands

    PubMed Central

    Ruepp, Marc-David; Brozik, James A.; de Esch, Iwan J.P.; Farndale, Richard W.; Murrell-Lagnado, Ruth D.; Thompson, Andrew J.

    2015-01-01

    There are no commercially available, small, receptor-specific P2X1 ligands. There are several synthetic derivatives of the natural agonist ATP and some structurally-complex antagonists including compounds such as PPADS, NTP-ATP, suramin and its derivatives (e.g. NF279, NF449). NF449 is the most potent and selective ligand, but potencies of many others are not particularly high and they can also act at other P2X, P2Y and non-purinergic receptors. While there is clearly scope for further work on P2X1 receptor pharmacology, screening can be difficult owing to rapid receptor desensitisation. To reduce desensitisation, substitutions can be made within the N-terminus of the P2X1 receptor, but these could also affect ligand properties. An alternative is the use of fluorescent voltage-sensitive dyes that respond to membrane potential changes resulting from channel opening. Here we utilised this approach in conjunction with fragment-based drug-discovery. Using a single concentration (300 μM) we identified 46 novel leads from a library of 1443 fragments (hit rate = 3.2%). These hits were independently validated by measuring concentration-dependence with the same voltage-sensitive dye, and by visualising the competition of hits with an Alexa-647-ATP fluorophore using confocal microscopy; confocal yielded kon (1.142 × 10⁶ M⁻¹ s⁻¹) and koff (0.136 s⁻¹) for Alexa-647-ATP (Kd = 119 nM). The identified hit fragments had promising structural diversity. In summary, the measurement of functional responses using voltage-sensitive dyes was flexible and cost-effective because labelled competitors were not needed, effects were independent of a specific binding site, and both agonist and antagonist actions were probed in a single assay. The method is widely applicable and could be applied to all P2X family members, as well as other voltage-gated and ligand-gated ion channels. This article is part of the Special Issue entitled ‘Fluorescent Tools in Neuropharmacology
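
    The reported numbers can be cross-checked with a short calculation. This is a sketch that assumes simple one-site binding, so that the dissociation constant is the ratio of the off-rate to the on-rate (Kd = koff / kon). The variable names are illustrative; the rate constants and library counts are the ones quoted in the abstract.

```python
# Rate constants quoted in the abstract for Alexa-647-ATP.
k_on = 1.142e6    # association rate constant, M^-1 s^-1
k_off = 0.136     # dissociation rate constant, s^-1

# Assuming one-site binding: Kd = k_off / k_on, converted from M to nM.
kd_nM = k_off / k_on * 1e9          # about 119 nM

# Fragment screen: 46 leads from a library of 1443 fragments.
hit_rate_percent = 46 / 1443 * 100  # about 3.2 %
```

    Both results agree with the values stated in the abstract (Kd = 119 nM, hit rate = 3.2%).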

  9. A fluorescent approach for identifying P2X1 ligands.

    PubMed

    Ruepp, Marc-David; Brozik, James A; de Esch, Iwan J P; Farndale, Richard W; Murrell-Lagnado, Ruth D; Thompson, Andrew J

    2015-11-01

    There are no commercially available, small, receptor-specific P2X1 ligands. There are several synthetic derivatives of the natural agonist ATP and some structurally-complex antagonists including compounds such as PPADS, NTP-ATP, suramin and its derivatives (e.g. NF279, NF449). NF449 is the most potent and selective ligand, but potencies of many others are not particularly high and they can also act at other P2X, P2Y and non-purinergic receptors. While there is clearly scope for further work on P2X1 receptor pharmacology, screening can be difficult owing to rapid receptor desensitisation. To reduce desensitisation, substitutions can be made within the N-terminus of the P2X1 receptor, but these could also affect ligand properties. An alternative is the use of fluorescent voltage-sensitive dyes that respond to membrane potential changes resulting from channel opening. Here we utilised this approach in conjunction with fragment-based drug-discovery. Using a single concentration (300 μM) we identified 46 novel leads from a library of 1443 fragments (hit rate = 3.2%). These hits were independently validated by measuring concentration-dependence with the same voltage-sensitive dye, and by visualising the competition of hits with an Alexa-647-ATP fluorophore using confocal microscopy; confocal yielded kon (1.142 × 10^6 M^-1 s^-1) and koff (0.136 s^-1) for Alexa-647-ATP (Kd = 119 nM). The identified hit fragments had promising structural diversity. In summary, the measurement of functional responses using voltage-sensitive dyes was flexible and cost-effective because labelled competitors were not needed, effects were independent of a specific binding site, and both agonist and antagonist actions were probed in a single assay. The method is widely applicable and could be applied to all P2X family members, as well as other voltage-gated and ligand-gated ion channels. This article is part of the Special Issue entitled 'Fluorescent Tools in Neuropharmacology'. Copyright
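
    The reported Kd and hit rate can be checked from the other figures in the abstract. The sketch below assumes the standard relation Kd = koff / kon for a simple one-step binding model; the function names are illustrative and not taken from the paper.

```python
# Values reported in the abstract for Alexa-647-ATP.
K_ON = 1.142e6   # association rate constant, M^-1 s^-1
K_OFF = 0.136    # dissociation rate constant, s^-1


def dissociation_constant_nM(k_on: float, k_off: float) -> float:
    """Equilibrium dissociation constant Kd = koff / kon, converted from M to nM."""
    return k_off / k_on * 1e9


def hit_rate_percent(hits: int, library_size: int) -> float:
    """Percentage of screened fragments that were scored as hits."""
    return 100.0 * hits / library_size


kd_nM = dissociation_constant_nM(K_ON, K_OFF)
rate = hit_rate_percent(46, 1443)
print(f"Kd = {kd_nM:.0f} nM, hit rate = {rate:.1f}%")
```

    Both results round to the values quoted in the abstract (119 nM and 3.2%).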

  10. Identifying predictors of physics item difficulty: A linear regression approach

    NASA Astrophysics Data System (ADS)

    Mesic, Vanes; Muratovic, Hasnija

    2011-06-01

    Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinarily difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model the item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained quantitative test results. In this study, we conducted a secondary analysis of data that came from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. First, we explored the concept of “physics competence” and performed a content analysis of 123 physics items that were included within the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables which reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure had to be implemented. Finally, a regression model of physics item difficulty was created. It has been shown that 61.2% of item difficulty variance can be explained by factors which reflect the automaticity, complexity, and modality of the knowledge structure that is relevant for generating the most probable correct solution, as well as by the divergence of required thinking and interference effects between intuitive and formal physics knowledge

  11. Text mining approach to predict hospital admissions using early medical records from the emergency department.

    PubMed

    Lucini, Filipe R; S Fogliatto, Flavio; C da Silveira, Giovani J; L Neyeloff, Jeruza; Anzanello, Michel J; de S Kuchenbecker, Ricardo; D Schaan, Beatriz

    2017-04-01

    Emergency department (ED) overcrowding is a serious issue for hospitals. Early information on short-term inward bed demand from patients receiving care at the ED may reduce the overcrowding problem, and optimize the use of hospital resources. In this study, we use text mining methods to process data from early ED patient records using the SOAP framework, and predict future hospitalizations and discharges. We try different approaches for pre-processing of text records and to predict hospitalization. Sets-of-words are obtained via binary representation, term frequency, and term frequency-inverse document frequency. Unigrams, bigrams and trigrams are tested for feature formation. Feature selection is based on χ² and F-score metrics. In the prediction module, eight text mining methods are tested: Decision Tree, Random Forest, Extremely Randomized Tree, AdaBoost, Logistic Regression, Multinomial Naïve Bayes, Support Vector Machine (linear kernel) and Nu-Support Vector Machine (linear kernel). Prediction performance is evaluated by F1-scores. Precision and Recall values are also reported for all text mining methods tested. Nu-Support Vector Machine was the text mining method with the best overall performance. Its average F1-score in predicting hospitalization was 77.70%, with a standard deviation (SD) of 0.66%. The method could be used to manage daily routines in EDs such as capacity planning and resource allocation. Text mining could provide valuable information and facilitate decision-making by inward bed management teams. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
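
    The feature-formation and evaluation steps named in the abstract can be illustrated with a minimal standard-library sketch. It is not the authors' pipeline: the tokenisation, the unsmoothed TF-IDF variant (raw count times log(N / df)), the function names and the toy notes are all assumptions, and no classifier is included.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=3):
    """Return all n-grams of length 1..n_max as space-joined strings."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def tfidf(docs, n_max=3):
    """Per-document weights: raw term frequency times log(N / document frequency)."""
    bags = [Counter(ngrams(d.lower().split(), n_max)) for d in docs]
    # Iterating a Counter yields each distinct term once, so this counts documents.
    df = Counter(term for bag in bags for term in bag)
    n_docs = len(docs)
    return [{t: tf * math.log(n_docs / df[t]) for t, tf in bag.items()}
            for bag in bags]


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy notes, not data from the study.
notes = ["chest pain and fever", "mild fever", "chest pain"]
weights = tfidf(notes)
```

    In this toy corpus the bigram "chest pain" occurs in two of three notes and so receives a lower weight than the trigram "chest pain and", which occurs in only one.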

  12. Web Mining

    NASA Astrophysics Data System (ADS)

    Fürnkranz, Johannes

    The World-Wide Web provides every internet citizen with access to an abundance of information, but it becomes increasingly difficult to identify the relevant pieces of information. Research in web mining tries to address this problem by applying techniques from data mining and machine learning to Web data and documents. This chapter provides a brief overview of web mining techniques and research areas, most notably hypertext classification, wrapper induction, recommender systems and web usage mining.

  13. A Lagrangian approach to identifying vortex pinch-off.

    PubMed

    O'Farrell, Clara; Dabiri, John O

    2010-03-01

    A criterion for identifying vortex ring pinch-off based on the Lagrangian coherent structures (LCSs) in the flow is proposed and demonstrated for a piston-cylinder arrangement with a piston stroke to diameter (L/D) ratio of approximately 12. It is found that the appearance of a new disconnected LCS and the termination of the original LCS are indicative of the initiation of vortex pinch-off. The subsequent growth of new LCSs, which tend to roll into spirals, indicates the formation of new vortex cores in the trailing shear layer. Using this criterion, the formation number is found to be 4.1 ± 0.1, which is consistent with the predicted formation number of approximately 4 of Gharib et al. [Gharib et al. J. Fluid Mech. 360, 121 (1998)]. The results obtained using the proposed LCS criterion are compared with those obtained using the circulation criterion of Gharib et al. and are found to be in excellent agreement. The LCS approach is also compared against other metrics, both Lagrangian and Eulerian, and is found to yield insight into the pinch-off process that these do not. Furthermore, the LCS analysis reveals a consistent pattern of coalescing or "pairing" of adjacent vortices in the trailing shear layer, a process which has been extensively documented in circular jets. Given that LCSs are objective and insensitive to local errors in the velocity field, the proposed criterion has the potential to be a robust tool for pinch-off identification. In particular, it may prove useful in the study of unsteady and low Reynolds number flows, where conventional methods based on vorticity prove difficult to use.

  14. Identifying Topographic Factors of Observed Landslides Based on GIS Approach.

    NASA Astrophysics Data System (ADS)

    Rohmaneo, Mohammad; Chu, Hone-Jay

    2017-04-01

    Martin et al. (2015) used a statistical model to estimate the spatial probabilities of landslides, applying the software tool r.randomwalk to calculate the impact probability distribution of observed landslides. This study aims to identify topographic factors from the impact probability result along the riverbank in the landslide area by combining the statistical model with a GIS approach: (1) we derive the distance of each pixel from the river and the height of each pixel above it. (2) The distance and the height are used to obtain the average slope of each pixel. (3) The average slope is used as a strong indicator of the tendency for erosion by the river, shown as a predictor map, which makes a slope more susceptible to landsliding. (4) A wetness index is derived to indicate where water content occurs both near the river and at a certain distance from it, as some landslides occur far from the riverside. We demonstrate the model on a 242 km² study area in the Kaoping River basin in southern Taiwan, using a 2011 landslide inventory and a 30 m DEM. Because of the pixel size, we use only observed landslides larger than 10,000 m² in the study area. From the results we draw several conclusions: (1) the average slope in the study area varies from 0 to 47 degrees. (2) Observed landslides covering wide areas occur in the meander area, with average slopes from 30 to 47 degrees. (3) Most observed landslides on the riverbank are located on steep average slopes, indicating where erosion by the river occurs.
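
    Step (2) of the method can be illustrated with a short sketch. The abstract does not give the formula, so the code assumes the average slope is the angle whose tangent is height above the river divided by horizontal distance from it; the function name and the sample values are hypothetical.

```python
import math


def average_slope_deg(height_above_river_m: float, distance_from_river_m: float) -> float:
    """Average slope, in degrees, of the straight line from the river to a pixel.

    Assumes slope = arctan(height / distance); this is an illustrative
    definition, not one confirmed by the abstract.
    """
    if distance_from_river_m <= 0:
        raise ValueError("distance from the river must be positive")
    return math.degrees(math.atan(height_above_river_m / distance_from_river_m))


# Hypothetical pixel: 30 m above the river and 30 m away from it.
slope = average_slope_deg(30.0, 30.0)
print(f"average slope: {slope:.1f} degrees")
```

    Under this definition, a pixel as high above the river as it is distant from it has an average slope of about 45 degrees, near the upper end of the 0 to 47 degree range reported for the study area.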

  15. A data mining approach to simulating farmers' crop choices for integrated water resources management.

    PubMed

    Ekasingh, B; Ngamsomsuke, K; Letcher, R A; Spate, J

    2005-12-01

    Water and land resources in Thailand are increasingly under pressure from development. In particular, there are many resource conflicts associated with agricultural production in northern Thailand. Communities in these areas are significantly constrained in the land and water management decisions they are able to make. This paper describes the application of a data mining approach to describing and simulating farmers' decision rules in a catchment in northern Thailand. This approach is being applied to simulate social, economic and biophysical constraints on farmers' decisions in these areas as part of an integrated water management model.

  16. Identifying the Factors Affecting Science and Mathematics Achievement Using Data Mining Methods

    ERIC Educational Resources Information Center

    Kiray, S. Ahmet; Gok, Bilge; Bozkir, A. Selman

    2015-01-01

    The purpose of this article is to identify the order of significance of the variables that affect science and mathematics achievement in middle school students. For this aim, the study deals with the relationship between science and math in terms of different angles using the perspectives of multiple causes-single effect and of multiple…

  18. Using Data Mining to Identify Actionable Information: Breaking New Ground in Data-Driven Decision Making

    ERIC Educational Resources Information Center

    Streifer, Philip A.; Schumann, Jeffrey A.

    2005-01-01

    The implementation of No Child Left Behind (NCLB) presents important challenges for schools across the nation to identify problems that lead to poor performance. Yet schools must intervene with instructional programs that can make a difference and evaluate the effectiveness of such programs. New advances in artificial intelligence (AI) data-mining…

  20. Mining a Written Values Affirmation Intervention to Identify the Unique Linguistic Features of Stigmatized Groups

    ERIC Educational Resources Information Center

    Riddle, Travis; Bhagavatula, Sowmya Sree; Guo, Weiwei; Muresan, Smaranda; Cohen, Geoff; Cook, Jonathan E.; Purdie-Vaughns, Valerie

    2015-01-01

    Social identity threat refers to the process through which an individual underperforms in some domain due to their concern with confirming a negative stereotype held about their group. Psychological research has identified this as one contributor to the underperformance and underrepresentation of women, Blacks, and Latinos in STEM fields. Over the…

  1. Intrusion detection: a novel approach that combines boosting genetic fuzzy classifier and data mining techniques

    NASA Astrophysics Data System (ADS)

    Ozyer, Tansel; Alhajj, Reda; Barker, Ken

    2005-03-01

    This paper proposes an intelligent intrusion detection system (IDS), an integrated approach that employs fuzziness and two well-known data mining techniques: classification and association rule mining. Using these two techniques, we adopt iterative rule learning to extract rules from the data set. Our ultimate aim is to predict different behaviors in networked computers. To achieve this, we propose a fuzzy rule-based genetic classifier. Our approach has two main stages. First, fuzzy association rule mining is applied and a large number of candidate rules are generated for each class. The rules then pass through a pre-screening mechanism that reduces the fuzzy rule search space. Candidate rules obtained after pre-screening are used in the genetic fuzzy classifier to generate rules for the specified classes. The classes are Normal, PRB (probe), DOS (denial of service), U2R (user to root) and R2L (remote to local). Second, an iterative rule learning mechanism is employed for each class to find the fuzzy rules required to classify its data; at each iteration a fuzzy rule is extracted and included in the system. A boosting mechanism evaluates the weight of each data item so that the rule extraction mechanism focuses more on data with relatively higher weight. Finally, the extracted fuzzy rules, with their corresponding weight values, are aggregated by class to find the vote of each class label for each data item.

  2. Mining of relations between proteins over biomedical scientific literature using a deep-linguistic approach.

    PubMed

    Rinaldi, Fabio; Schneider, Gerold; Kaljurand, Kaarel; Hess, Michael; Andronis, Christos; Konstandi, Ourania; Persidis, Andreas

    2007-02-01

    The number of new discoveries (as published in the scientific literature) in the biomedical area is growing at an exponential rate. This growth makes it very difficult to filter the most relevant results, and thus the extraction of the core information becomes very expensive. Therefore, there is a growing interest in text processing approaches that can deliver selected information from scientific publications, which can limit the amount of human intervention normally needed to gather those results. This paper presents and evaluates an approach aimed at automating the process of extracting functional relations (e.g. interactions between genes and proteins) from scientific literature in the biomedical domain. The approach, using a novel dependency-based parser, is based on a complete syntactic analysis of the corpus. We have implemented a state-of-the-art text mining system for biomedical literature, based on a deep-linguistic, full-parsing approach. The results are validated on two different corpora: the manually annotated genomics information access (GENIA) corpus and the automatically annotated Arabidopsis thaliana circadian rhythms (ATCR) corpus. We show how a deep-linguistic approach (contrary to common belief) can be used in a real-world text mining application, offering high-precision relation extraction while retaining sufficient recall.

  3. A new approach to preserve privacy data mining based on fuzzy theory in numerical database

    NASA Astrophysics Data System (ADS)

    Cui, Run; Kim, Hyoung Joong

    2014-01-01

    With the rapid development of information technology, data mining has become one of the most important tools for discovering deep associations among tuples in large-scale databases. Protecting private information is therefore a major challenge, especially during the data mining procedure. In this paper, a new method for privacy protection based on fuzzy theory is proposed. The traditional fuzzy approach in this area applies fuzzification to the data without considering its readability. A new style of obscured data expression is introduced to provide more details of the subsets without reducing readability. We also balance the privacy level against utility when forming suitable subgroups. An experiment shows that this approach is suitable for classification without lowering accuracy. In the future, with suitable modification, this approach can be adapted to data streams, owing to the low computational complexity of the fuzzy function.

  4. Forecasting Precipitation over the MENA Region: A Data Mining and Remote Sensing Based Approach

    NASA Astrophysics Data System (ADS)

    Elkadiri, R.; Sultan, M.; Elbayoumi, T.; Chouinard, K.

    2015-12-01

    We developed and applied an integrated approach to construct predictive tools with lead times of 1 to 12 months to forecast precipitation amounts over the Middle East and North Africa (MENA) region. The following steps were conducted: (1) acquire and analyze temporal remote sensing-based precipitation datasets (i.e. Tropical Rainfall Measuring Mission [TRMM]) over five main water source regions in the MENA area (i.e. Atlas Mountains in Morocco, Southern Sudan, Red Sea Hills of Yemen, and Blue Nile and White Nile source areas) throughout the investigation period (1998 to 2015), (2) acquire and extract monthly values for all of the climatic indices that are likely to influence the climatic patterns over the MENA region (e.g., Northern Atlantic Oscillation [NOI], Southern Oscillation Index [SOI], and Tropical North Atlantic Index [TNA]); and (3) apply data mining methods to extract relationships between the observed precipitation and the controlling factors (climatic indices) and use predictive tools to forecast monthly precipitation over each of the identified pilot study areas. Preliminary results indicate that by using the period from January 1998 until August 2012 for model training and the period from September 2012 to January 2015 for testing, precipitation can be successfully predicted with a three-months lead over South West Yemen, Atlas Mountains in Morocco, Southern Sudan, Blue Nile sources and White Nile sources with confidence (Pearson correlation coefficient: 0.911, 0.823, 0.807, 0.801 and 0.895 respectively). Future work will focus on applying this technique for prediction of precipitation over each of the climatically contiguous areas of the MENA region. If our efforts are successful, our findings will lead the way to the development and implementation of sound water management scenarios for the MENA countries.
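
    The skill scores quoted in the abstract are Pearson correlation coefficients. The sketch below shows how such a coefficient is computed and how a climatic index could be paired with precipitation observed a fixed number of months later. The pairing scheme, function names and toy series are assumptions for illustration; the abstract does not describe the authors' predictive model in this detail.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def lagged_pearson(index, precip, lead_months):
    """Correlate the index at month t with precipitation at month t + lead_months."""
    if not 0 < lead_months < len(index) - 1:
        raise ValueError("lead must leave at least two paired months")
    return pearson(index[:-lead_months], precip[lead_months:])


# Hypothetical monthly series: precipitation follows the index three months later.
climate_index = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0]
precipitation = [9.0, 9.0, 9.0] + [2.0 * v for v in climate_index[:5]]
r = lagged_pearson(climate_index, precipitation, lead_months=3)
```

    The same `pearson` function can score forecasts against observations on a held-out period, which mirrors the training (January 1998 to August 2012) and testing (September 2012 to January 2015) split described above.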

  5. Determine the therapeutic role of radiotherapy in administrative data: a data mining approach.

    PubMed

    Zhang-Salomons, Jina; Salomons, Greg

    2015-02-03

    Clinical data gathered for administrative purposes often lack sufficient information to separate the records of radiotherapy given for palliation from those given for cure. An absence, incompleteness, or inaccuracy of such information could hinder or bias the study of the utilization and outcome of radiotherapy. This study has three specific purposes: 1) develop a method to determine the therapeutic role of radiotherapy (TRR); 2) assess the accuracy of the method; 3) report the quality of the information on treatment "intent" recorded in the clinical data in Ontario, Canada. A general purpose is to use this study as a prototype to demonstrate and test a method to assess the quality of administrative data. This is a population-based retrospective study. A random sample was drawn from the treatment records with "intent" assigned in treating hospitals. A decision tree was grown using treatment parameters as predictors and "intent" as the outcome variable to classify the treatments as curative or palliative. The tree classifier was applied to the entire dataset, and the classification results were compared with those identified by "intent". A manual audit was conducted to assess the accuracy of the classification. The following parameters predicted the TRR, from the strongest to the weakest: radiation dose per fraction, treated body-region, disease site, and time of treatment. When applied to the records of treatments given between 1990 and 2008 in Ontario, Canada, the classification rules correctly classified 96.1% of the records. The quality of the "intent" variable was as follows: 77.5% correctly classified, 3.7% misclassified, and 18.8% did not have an "intent" assigned. The classification rules derived in this study can be used to determine the TRR when such information is unavailable, incomplete, or inaccurate in administrative data. The study demonstrates that a data mining approach can be used to effectively assess and improve the quality of large administrative
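
    A decision tree is built from single-variable splits, so the idea can be illustrated with a one-level tree (a decision stump) on the strongest predictor named above, dose per fraction. This is a simplified sketch and not the study's classifier: the split direction (higher dose per fraction labelled palliative), the function name and the toy records are assumptions, and the values carry no clinical meaning.

```python
def best_threshold(values, labels, positive="palliative"):
    """Return (cut, accuracy) for the rule 'value > cut -> positive'.

    Candidate cut points lie midway between consecutive distinct values,
    and the cut with the highest training accuracy is kept.
    """
    best_cut, best_acc = None, 0.0
    distinct = sorted(set(values))
    for low, high in zip(distinct, distinct[1:]):
        cut = (low + high) / 2
        correct = sum((v > cut) == (lab == positive)
                      for v, lab in zip(values, labels))
        acc = correct / len(values)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut, best_acc


# Hypothetical toy records (dose per fraction in Gy), not data from the study.
dose_per_fraction = [1.8, 2.0, 2.0, 3.0, 4.0, 8.0]
intent = ["curative", "curative", "curative",
          "palliative", "palliative", "palliative"]
cut, acc = best_threshold(dose_per_fraction, intent)
```

    A full tree would repeat this search on each resulting subset using the remaining predictors (treated body region, disease site, time of treatment).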

  6. A Data Mining Approach for Examining Predictors of Physical Activity Among Urban Older Adults.

    PubMed

    Yoon, Sunmoo; Suero-Tejeda, Niurka; Bakken, Suzanne

    2015-07-01

    The current study applied innovative data mining techniques to a community survey dataset to develop prediction models for two aspects of physical activity (i.e., active transport and screen time) in a sample of urban, primarily Hispanic, older adults (N=2,514). Main predictors for active transport (accuracy=69.29%, precision=0.67, recall=0.69) were immigrant status, high level of anxiety, having a place for physical activity, and willingness to make time for physical activity. The main predictors for screen time (accuracy=63.13%, precision=0.60, recall=0.63) were willingness to make time for exercise, having a place for exercise, age, and availability of family support to access health information on the Internet. Data mining methods were useful to identify intervention targets and inform design of customized interventions.

  7. Identifying Similarities in Cognitive Subtest Functional Requirements: An Empirical Approach

    ERIC Educational Resources Information Center

    Frisby, Craig L.; Parkin, Jason R.

    2007-01-01

    In the cognitive test interpretation literature, a Rational/Intuitive, Indirect Empirical, or Combined approach is typically used to construct conceptual taxonomies of the functional (behavioral) similarities between subtests. To address shortcomings of these approaches, the functional requirements for 49 subtests from six individually…

  9. Biogeometallurgical pre-mining characterization of ore deposits: an approach to increase sustainability in the mining process.

    PubMed

    Dold, Bernhard; Weibel, Leyla

    2013-11-01

    Based on the knowledge obtained from acid mine drainage formation in mine waste environments (tailings impoundments and waste rock dumps), a new methodology is applied to characterize new ore deposits before exploitation starts. This gives the opportunity to design optimized processes for metal recovery from the different mineral assemblages in an ore deposit and, at the same time, to minimize the environmental impact and downstream costs of mine waste management. Additionally, the whole economic potential is evaluated, including strategic elements. The methodology integrates high-resolution geochemistry by sequential extractions and quantitative mineralogy in combination with kinetic bioleach tests. The resulting data set makes it possible to define biogeometallurgical units in the ore deposit and to predict the behavior of each economically or environmentally relevant element along the mining process.

  10. A multi-isotope approach to characterize acid mine drainage in a hardrock alpine mine, Chaffee Co., Colorado.

    NASA Astrophysics Data System (ADS)

    Cordalis, D.; Williams, M. W.; Wireman, M.; Michel, R. L.; Manning, A.

    2004-12-01

    Here we present information from an innovative suite of stable, radiogenic, and cosmogenic isotopes to better understand groundwater flowpaths and groundwater-surface water interactions in an applied acid mine drainage system. Stable water isotopes, tritium, helium-tritium, sulfur-35, and uranium 234/238 ratios were analyzed from precipitation, groundwater wells, interior mine drainages, and surface waters at the Mary Murphy Mine in Colorado to determine hydrologic transport mechanisms responsible for contaminated zinc releases. Hydrometric measurements suggested a snowmelt-driven pulse of elevated zinc in adit outflow. However, mixing models using stable water isotopes showed a regional groundwater signal in the adit outflow. Tritium values of 11 to 13 TU showed a slight enrichment of bomb spike water compared to snow values of about 9 TU, suggesting an older water source as well. Helium/tritium ratios on a subset of groundwater wells suggested that average residence times of alluvial wells ranged from 2.5 to 8 years. The combination of stable water isotopes and sulfur-35 (half-life of 87 days) showed that zinc-rich waters within the mine derived from infiltrating snowmelt more than a year old. However, measurement of sulfur-35 using low-level scintillation counts was compromised at times by the presence of uranium. We were able to remove the uranium through wet chemistry procedures, improving the accuracy of S-35 measurements. The U234/U238 ratio shows promise in discriminating between acid mine drainage and acid rock drainage. Acid rock drainage shows an unaltered ratio of 1:1, while acid mine drainage is enriched relative to the 1:1 equilibrium ratio. The combination of cosmogenic and stable isotopes within and near the Mary Murphy Mine may provide a useful tool for studying interactions between groundwater and surface water in a fractured rock setting. Remediation techniques can be directed more appropriately, and cost effectively, by the characterization of
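
    Sulfur-35 is useful as an age tracer for young water because of its short half-life. The sketch below applies the standard radioactive decay law to the 87-day half-life quoted in the abstract. It assumes a closed system with no additional sources or sinks of S-35, and the function names are illustrative.

```python
import math

HALF_LIFE_S35_DAYS = 87.0  # half-life of sulfur-35 as quoted in the abstract


def fraction_remaining(age_days: float, half_life_days: float = HALF_LIFE_S35_DAYS) -> float:
    """Fraction of the initial activity left after age_days of decay."""
    return 0.5 ** (age_days / half_life_days)


def apparent_age_days(activity_ratio: float, half_life_days: float = HALF_LIFE_S35_DAYS) -> float:
    """Age implied by the ratio of sample activity to input activity.

    Assumes closed-system decay only, which is a simplification.
    """
    if not 0.0 < activity_ratio <= 1.0:
        raise ValueError("activity ratio must be in (0, 1]")
    return -half_life_days * math.log2(activity_ratio)


one_year = fraction_remaining(365.0)
print(f"S-35 remaining after one year: {one_year:.1%}")
```

    After one year, roughly 5% of the original S-35 activity remains, which is why very low S-35 activity is consistent with water more than a year old.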

  11. Phylogeny-guided (meta)genome mining approach for the targeted discovery of new microbial natural products.

    PubMed

    Kang, Hahk-Soo

    2017-02-01

    Genomics-based methods are now commonplace in natural products research. A phylogeny-guided mining approach provides a means to quickly screen a large number of microbial genomes or metagenomes in search of new biosynthetic gene clusters of interest. In this approach, biosynthetic genes serve as molecular markers, and phylogenetic trees built with known and unknown marker gene sequences are used to quickly prioritize biosynthetic gene clusters for metabolite characterization. Use of this approach has increased over the last couple of years, along with the emergence of low-cost sequencing technologies. The aim of this review is to discuss the basic concept of a phylogeny-guided mining approach, and also to provide examples in which this approach was successfully applied to discover new natural products from microbial genomes and metagenomes. I believe that the phylogeny-guided mining approach will continue to play an important role in genomics-based natural products research.

  12. Using social connection information to improve opinion mining: Identifying negative sentiment about HPV vaccines on Twitter.

    PubMed

    Zhou, Xujuan; Coiera, Enrico; Tsafnat, Guy; Arachi, Diana; Ong, Mei-Sing; Dunn, Adam G

    2015-01-01

    The manner in which people preferentially interact with others like themselves suggests that information about social connections may be useful in the surveillance of opinions for public health purposes. We examined if social connection information from tweets about human papillomavirus (HPV) vaccines could be used to train classifiers that identify anti-vaccine opinions. From 42,533 tweets posted between October 2013 and March 2014, 2,098 were sampled at random and two investigators independently identified anti-vaccine opinions. Machine learning methods were used to train classifiers using the first three months of data, including content (8,261 text fragments) and social connections (10,758 relationships). Connection-based classifiers performed similarly to content-based classifiers on the first three months of training data, and performed more consistently than content-based classifiers on test data from the subsequent three months. The most accurate classifier achieved an accuracy of 88.6% on the test data set, and used only social connection features. Information about how people are connected, rather than what they write, may be useful for improving public health surveillance methods on Twitter.

  13. DNA enrichment approaches to identify unauthorized genetically modified organisms (GMOs).

    PubMed

    Arulandhu, Alfred J; van Dijk, Jeroen P; Dobnik, David; Holst-Jensen, Arne; Shi, Jianxin; Zel, Jana; Kok, Esther J

    2016-07-01

    With the increased global production of different genetically modified (GM) plant varieties, chances increase that unauthorized GM organisms (UGMOs) may enter the food chain. At the same time, the detection of UGMOs is a challenging task because of the limited sequence information that will generally be available. PCR-based methods are available to detect and quantify known UGMOs in specific cases. If this approach is not feasible, DNA enrichment of the unknown adjacent sequences of known GMO elements is one way to detect the presence of UGMOs in a food or feed product. These enrichment approaches are also known as chromosome walking or gene walking (GW). In recent years, enrichment approaches have been coupled with next generation sequencing (NGS) analysis and implemented in, amongst others, the medical and microbiological fields. The present review will provide an overview of these approaches and an evaluation of their applicability in the identification of UGMOs in complex food or feed samples.

  14. MEDICI: Mining Essentiality Data to Identify Critical Interactions for Cancer Drug Target Discovery and Development

    PubMed Central

    Moran, Josue D.; Giuste, Felipe O.; Du, Yuhong; Ivanov, Andrei A.; Johns, Margaret A.; Khuri, Fadlo R.; Fu, Haian

    2017-01-01

    Protein-protein interactions (PPIs) mediate the transmission and regulation of oncogenic signals that are essential to cellular proliferation and survival, and thus represent potential targets for anti-cancer therapeutic discovery. Despite their significance, there is no method to experimentally disrupt and interrogate the essentiality of individual endogenous PPIs. The ability to computationally predict or infer PPI essentiality would help prioritize PPIs for drug discovery and help advance understanding of cancer biology. Here we introduce a computational method (MEDICI) to predict PPI essentiality by combining gene knockdown studies with network models of protein interaction pathways in an analytic framework. Our method uses network topology to model how gene silencing can disrupt PPIs, relating the unknown essentialities of individual PPIs to experimentally observed protein essentialities. This model is then deconvolved to recover the unknown essentialities of individual PPIs. We demonstrate the validity of our approach via prediction of sensitivities to compounds based on PPI essentiality and differences in essentiality based on genetic mutations. We further show that lung cancer patients have improved overall survival when specific PPIs are no longer present, suggesting that these PPIs may be potentially new targets for therapeutic development. Software is freely available at https://github.com/cooperlab/MEDICI. Datasets are available at https://ctd2.nci.nih.gov/dataPortal. PMID:28118365

  15. MEDICI: Mining Essentiality Data to Identify Critical Interactions for Cancer Drug Target Discovery and Development.

    PubMed

    Harati, Sahar; Cooper, Lee A D; Moran, Josue D; Giuste, Felipe O; Du, Yuhong; Ivanov, Andrei A; Johns, Margaret A; Khuri, Fadlo R; Fu, Haian; Moreno, Carlos S

    2017-01-01

    Protein-protein interactions (PPIs) mediate the transmission and regulation of oncogenic signals that are essential to cellular proliferation and survival, and thus represent potential targets for anti-cancer therapeutic discovery. Despite their significance, there is no method to experimentally disrupt and interrogate the essentiality of individual endogenous PPIs. The ability to computationally predict or infer PPI essentiality would help prioritize PPIs for drug discovery and help advance understanding of cancer biology. Here we introduce a computational method (MEDICI) to predict PPI essentiality by combining gene knockdown studies with network models of protein interaction pathways in an analytic framework. Our method uses network topology to model how gene silencing can disrupt PPIs, relating the unknown essentialities of individual PPIs to experimentally observed protein essentialities. This model is then deconvolved to recover the unknown essentialities of individual PPIs. We demonstrate the validity of our approach via prediction of sensitivities to compounds based on PPI essentiality and differences in essentiality based on genetic mutations. We further show that lung cancer patients have improved overall survival when specific PPIs are no longer present, suggesting that these PPIs may be potentially new targets for therapeutic development. Software is freely available at https://github.com/cooperlab/MEDICI. Datasets are available at https://ctd2.nci.nih.gov/dataPortal.

  16. Integration of data mining classification techniques and ensemble learning to identify risk factors and diagnose ovarian cancer recurrence.

    PubMed

    Tseng, Chih-Jen; Lu, Chi-Jie; Chang, Chi-Chang; Chen, Gin-Den; Cheewakriangkrai, Chalong

    2017-05-01

    Ovarian cancer is the second leading cause of death among gynecologic cancers in the world. Approximately 90% of women with ovarian cancer reported having symptoms long before a diagnosis was made. The literature shows that recurrence should be predicted with regard to patients' personal risk factors and the clinical symptoms of this devastating cancer. In this study, ensemble learning and five data mining approaches, including support vector machine (SVM), C5.0, extreme learning machine (ELM), multivariate adaptive regression splines (MARS), and random forest (RF), were integrated to rank the importance of risk factors and diagnose the recurrence of ovarian cancer. The medical records and pathologic status were extracted from the Chung Shan Medical University Hospital Tumor Registry. Experimental results illustrated that the integrated C5.0 model is a superior approach in predicting the recurrence of ovarian cancer. Moreover, the classification accuracies of C5.0, ELM, MARS, RF, and SVM increased after using the selected important risk factors as predictors. Our findings suggest that the International Federation of Gynecology and Obstetrics (FIGO), Pathologic M, Age, and Pathologic T were the four most critical risk factors for ovarian cancer recurrence. In summary, the above information can support the important influence of personality and clinical symptom representations on all phases of guide interventions, with the complexities of multiple symptoms associated with ovarian cancer in all phases of the recurrent trajectory. Copyright © 2017 Elsevier B.V. All rights reserved.

  17. Metal dispersion resulting from mining activities in coastal environments: A pathways approach

    USGS Publications Warehouse

    Koski, Randolph A.

    2012-01-01

    Acid rock drainage (ARD) and disposal of tailings that result from mining activities impact coastal areas in many countries. The dispersion of metals from mine sites that are both proximal and distal to the shoreline can be examined using a pathways approach in which physical and chemical processes guide metal transport in the continuum from sources (sulfide minerals) to bioreceptors (marine biota). Large amounts of metals can be physically transported to the coastal environment by intentional or accidental release of sulfide-bearing mine tailings. Oxidation of sulfide minerals results in elevated dissolved metal concentrations in surface waters on land (producing ARD) and in pore waters of submarine tailings. Changes in pH, adsorption by insoluble secondary minerals (e.g., Fe oxyhydroxides), and precipitation of soluble salts (e.g., sulfates) affect dissolved metal fluxes. Evidence for bioaccumulation includes anomalous metal concentrations in bivalves and reef corals, and overlapping Pb isotope ratios for sulfides, shellfish, and seaweed in contaminated environments. Although bioavailability and potential toxicity are, to a large extent, functions of metal speciation, specific uptake pathways, such as adsorption from solution and ingestion of particles, also play important roles. Recent emphasis on broader ecological impacts has led to complementary methodologies involving laboratory toxicity tests and field studies of species richness and diversity.

  18. Stochastic Modeling Approach for the Evaluation of Backbreak due to Blasting Operations in Open Pit Mines

    NASA Astrophysics Data System (ADS)

    Sari, Mehmet; Ghasemi, Ebrahim; Ataei, Mohammad

    2014-03-01

    Backbreak is an undesirable side effect of bench blasting operations in open pit mines. A large number of parameters affect backbreak, including controllable parameters (such as blast design parameters and explosive characteristics) and uncontrollable parameters (such as rock and discontinuity properties). The complexity of the backbreak phenomenon and the uncertainty in terms of the impact of various parameters make its prediction very difficult. The aim of this paper is to determine the suitability of the stochastic modeling approach for the prediction of backbreak and to assess the influence of controllable parameters on the phenomenon. To achieve this, a database containing actual measured backbreak occurrences and the major effective controllable parameters on backbreak (i.e., burden, spacing, stemming length, powder factor, and geometric stiffness ratio) was created from 175 blasting events in the Sungun copper mine, Iran. From this database, first, a new site-specific empirical equation for predicting backbreak was developed using multiple regression analysis. Then, the backbreak phenomenon was simulated by the Monte Carlo (MC) method. The results reveal that stochastic modeling is a good means of modeling and evaluating the effects of the variability of blasting parameters on backbreak. Thus, the developed model is suitable for practical use in the Sungun copper mine. Finally, a sensitivity analysis showed that stemming length is the most important parameter in controlling backbreak.
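
    The two-stage workflow described in this abstract (a regression equation, then Monte Carlo simulation over its inputs) can be illustrated with a minimal Python sketch. Everything numeric here is an assumption made for illustration: the coefficients in COEFFS, the normal input distributions, and the reduced set of three parameters are hypothetical and are not taken from the paper or the Sungun data.

```python
import random
import statistics

# Hypothetical regression coefficients (illustrative only, not from the paper).
COEFFS = {"intercept": 0.5, "burden": 0.6, "spacing": 0.3, "stemming": -0.4}


def predict_backbreak(burden, spacing, stemming):
    """Evaluate the assumed linear equation for one blast design (metres)."""
    return (COEFFS["intercept"]
            + COEFFS["burden"] * burden
            + COEFFS["spacing"] * spacing
            + COEFFS["stemming"] * stemming)


def simulate(n_trials=10_000, seed=0):
    """Draw blast parameters from assumed normal distributions and
    return the list of simulated backbreak values."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        burden = rng.gauss(4.0, 0.5)    # assumed mean and spread, metres
        spacing = rng.gauss(5.0, 0.5)   # assumed mean and spread, metres
        stemming = rng.gauss(3.5, 0.4)  # assumed mean and spread, metres
        results.append(predict_backbreak(burden, spacing, stemming))
    return results


if __name__ == "__main__":
    samples = simulate()
    print("mean backbreak:", round(statistics.mean(samples), 2))
    print("std deviation:", round(statistics.stdev(samples), 2))
```

    Replacing the assumed distributions with ones fitted to measured blast data, and then varying one input at a time, is one simple way to approach the kind of sensitivity analysis the abstract mentions.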

  20. Development of a data-mining algorithm to identify ages at reproductive milestones in electronic medical records.

    PubMed

    Malinowski, Jennifer; Farber-Eger, Eric; Crawford, Dana C

    2014-01-01

    Electronic medical records (EMRs) are becoming more widely implemented following directives from the federal government and incentives for supplemental reimbursements for Medicare and Medicaid claims. Replete with rich phenotypic data, EMRs offer a unique opportunity for clinicians and researchers to identify potential research cohorts and perform epidemiologic studies. Notable limitations to the traditional epidemiologic study include cost, time to complete the study, and limited ancestral diversity; EMR-based epidemiologic studies offer an alternative. The Epidemiologic Architecture for Genes Linked to Environment (EAGLE) Study, as part of the Population Architecture using Genomics and Epidemiology (PAGE) I Study, has genotyped more than 15,000 patients of diverse ancestry in BioVU, the Vanderbilt University Medical Center's biorepository linked to the EMR (EAGLE BioVU). We report here the development and performance of data-mining techniques used to identify the age at menarche (AM) and age at menopause (AAM), important milestones in the reproductive lifespan, in women from EAGLE BioVU for genetic association studies. In addition, we demonstrate the ability to discriminate age at naturally-occurring menopause (ANM) from medically-induced menopause. Unusual timing of these events may indicate underlying pathologies and increased risk for some complex diseases and cancer; however, they are not consistently recorded in the EMR. Our algorithm offers a mechanism by which to extract these data for clinical and research goals.

  1. Soil quality assessment using GIS-based chemometric approach and pollution indices: Nakhlak mining district, Central Iran.

    PubMed

    Moore, Farid; Sheykhi, Vahideh; Salari, Mohammad; Bagheri, Adel

    2016-04-01

    This paper is a comprehensive assessment of the quality of soil in the Nakhlak mining district in Central Iran with special reference to potentially toxic metals. In this regard, an integrated approach involving geostatistical, correlation matrix, pollution indices, and chemical fractionation measurement is used to evaluate selected potentially toxic metals in soil samples. The fractionation of metals indicated a relatively high variability. Some metals (Mo, Ag, and Pb) showed important enrichment in the bioavailable fractions (i.e., exchangeable and carbonate), whereas the residual fraction mostly comprised Sb and Cr. The Cd, Zn, Co, Ni, Mo, Cu, and As were retained in Fe-Mn oxide and oxidizable fractions, suggesting that they may be released to the environment by changes in physicochemical conditions. The spatial variability patterns of 11 soil heavy metals (Ag, As, Cd, Co, Cr, Cu, Mo, Ni, Pb, Sb, and Zn) were identified and mapped. The results demonstrated that Ag, As, Cd, Mo, Cu, Pb, Sb, and Zn pollution are associated with mineralized veins and mining operations in this area. Further environmental monitoring and remedial actions are required for management of soil heavy metals in the study area. The present study not only enhanced our knowledge regarding soil pollution in the study area but also introduced a better technique to analyze pollution indices by multivariate geostatistical methods.

  2. Knowledge Discovery using Domain-Concept Mining Approach for the Behavioral Risk Factor Surveillance System (BRFSS) Data

    PubMed Central

    Mahamaneerat, Wannapa Kay; Shyu, Chi-Ren

    2006-01-01

    The publicly available Behavioral Risk Factor Surveillance System (BRFSS) data is the largest telephone survey data set in the world. The data set is often under-utilized because of its size and the difficulty of comprehending and exploring the relationships among its variables. With a traditional data mining approach, such as association rule (AR) mining, it is still not possible to discover valuable information with existing computational power. To promote the usefulness of this rich data set efficiently, we propose a novel data mining approach called Domain-Concept Mining (DCM) that partitions data into groups of relevant domain-concepts, then extracts associations among variables from each partition. The findings from the DCM show that it can efficiently discover relevant information from the BRFSS with respect to the previously published literature. PMID:17238640

  3. A ligand-based approach to mining the chemogenomic space of drugs.

    PubMed

    Gregori-Puigjané, Elisabet; Mestres, Jordi

    2008-09-01

    The practical implementation and validation of a ligand-based approach to mining the chemogenomic space of drugs is presented and applied to the in silico target profiling of 767 drugs against 684 targets of therapeutic relevance. The results reveal that drugs targeting aminergic G protein-coupled receptors (GPCRs) show the most promiscuous pharmacological profiles. The detection of cross-pharmacologies between aminergic GPCRs and the opioid, sigma, NMDA, and 5-HT3 receptors aggravates the potential promiscuity of those drugs, predominantly including analgesics, antidepressants, and antipsychotics.

  4. An Approach for Identifying Benefit Segments among Prospective College Students.

    ERIC Educational Resources Information Center

    Miller, Patrick; And Others

    1990-01-01

    A study investigated the importance to 578 applicants of various benefits offered by a moderately selective private university. Applicants rated the institution on 43 academic, social, financial, religious, and curricular attributes. The objective was to test the efficacy of one approach to college market segmentation. Results support the utility…

  5. Identifying Predictors of Physics Item Difficulty: A Linear Regression Approach

    ERIC Educational Resources Information Center

    Mesic, Vanes; Muratovic, Hasnija

    2011-01-01

    Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinary…

  8. The adaptive approach for storage assignment by mining data of warehouse management system for distribution centres

    NASA Astrophysics Data System (ADS)

    Ming-Huang Chiang, David; Lin, Chia-Ping; Chen, Mu-Chen

    2011-05-01

    Among distribution centre operations, order picking has been reported to be the most labour-intensive activity. Sophisticated storage assignment policies adopted to reduce the travel distance of order picking have been explored in the literature. Unfortunately, previous research has been devoted to locating entire products from scratch. Instead, this study proposes an adaptive approach, a Data Mining-based Storage Assignment approach (DMSA), to find the optimal storage assignment for newly delivered products that need to be put away when there is vacant shelf space in a distribution centre. In the DMSA, a new association index (AIX) is developed to evaluate the fitness between the put away products and the unassigned storage locations by applying association rule mining. With AIX, the storage location assignment problem (SLAP) can be formulated and solved as a binary integer programming problem. To evaluate the performance of DMSA, a real-world order database of a distribution centre is obtained and used to compare the results from DMSA with a random assignment approach. It turns out that DMSA outperforms random assignment as the number of put away products and the proportion of put away products with high turnover rates increase.

  9. A Control Chart Approach for Representing and Mining Data Streams with Shape Based Similarity

    SciTech Connect

    Omitaomu, Olufemi A

    2014-01-01

    The mining of data streams for online condition monitoring is a challenging task in several domains including (electric) power grid systems, intelligent manufacturing, and consumer science. Consider a power grid application in which thousands of sensors, called phasor measurement units, are deployed on the power grid network to continuously collect streams of digital data for real-time situational awareness and system management. Depending on design, each sensor could stream between ten and sixty data samples per second. The myriad of sensory data captured could convey deeper insights about the sequence of events in real time and before major damage is done. However, the timely processing and analysis of these high-velocity and high-volume data streams is a challenge. Hence, a new data processing and transformation approach, based on the concept of control charts, for representing sequences of data streams from sensors is proposed. In addition, an application of the proposed approach for enhancing data mining tasks such as clustering using real-world power grid data streams is presented. The results indicate that the proposed approach is very efficient for data stream storage and manipulation.
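
    As a rough illustration of how a control chart can turn a raw sensor stream into a compact representation, the Python sketch below computes Shewhart-style limits (mean plus or minus k standard deviations) from a baseline window and encodes each new sample as below, within, or above the limits. This is an assumed, generic construction: the abstract does not specify the transformation actually used, and the function names and sample readings are hypothetical.

```python
import statistics


def control_limits(baseline, k=3.0):
    """Return (lower, center, upper) limits from a baseline window,
    using the population standard deviation of that window."""
    center = statistics.mean(baseline)
    sigma = statistics.pstdev(baseline)
    return center - k * sigma, center, center + k * sigma


def encode(stream, limits):
    """Map each sample to -1 (below lower limit), 0 (in control),
    or +1 (above upper limit)."""
    lower, _, upper = limits
    return [-1 if x < lower else 1 if x > upper else 0 for x in stream]


if __name__ == "__main__":
    # Hypothetical frequency readings in Hz, for illustration only.
    baseline = [60.0, 60.1, 59.9, 60.0, 60.2, 59.8]
    limits = control_limits(baseline)
    print(encode([60.0, 61.5, 58.0, 60.1], limits))
```

    A symbol sequence of this kind takes far less space than the raw samples and can be compared across sensors, which is one plausible reason such a representation helps with tasks like clustering.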

  10. VALUING ACID MINE DRAINAGE REMEDIATION OF IMPAIRED WATERWAYS IN WEST VIRGINIA: A HEDONIC MODELING APPROACH

    EPA Science Inventory

    States with active and abandoned mines face large private and public costs to remediate damage to streams and rivers from acid mine drainage (AMD), the metal rich runoff flowing primarily from abandoned mines and surface deposits of mine waste. AMD can lower stream and river pH ...

  12. A Data Mining Approach for Exploring Correlates of Self-Reported Comparative Physical Activity Levels of Urban Latinos

    PubMed Central

    YOON, Sunmoo; CO, Manuel C.; SUERO-TEJEDA, Niurka; BAKKEN, Suzanne

    2017-01-01

    We applied data mining techniques to a community-based behavioral dataset to build prediction models to gain insights about physical activity levels as the foundation for future interventions for urban Latinos. Our application of data mining strategies identified environment factors including having a convenient location for physical activity and psychological factors including depression as the strongest correlates of self-reported comparative physical activity among hundreds of variables. The data mining methods were useful to build prediction models to gain insights about perceptions of physical activity behavior as compared to peers. PMID:27332262

  13. A Data Mining Approach for Exploring Correlates of Self-Reported Comparative Physical Activity Levels of Urban Latinos.

    PubMed

    Yoon, Sunmoo; Co, Manuel C; Suero-Tejeda, Niurka; Bakken, Suzanne

    2016-01-01

    We applied data mining techniques to a community-based behavioral dataset to build prediction models to gain insights about physical activity levels as the foundation for future interventions for urban Latinos. Our application of data mining strategies identified environment factors including having a convenient location for physical activity and psychological factors including depression as the strongest correlates of self-reported comparative physical activity among hundreds of variables. The data mining methods were useful to build prediction models to gain insights about perceptions of physical activity behavior as compared to peers.

  14. Coconut matting bezoar identified by a combined analytical approach.

    PubMed Central

    Levison, D A; Crocker, P R; Boxall, T A; Randall, K J

    1986-01-01

    A rare type of bezoar composed of coconut matting was found in the stomach of a Caucasian man. The exact identity of the fibres was established by scanning electron microscopy, x-ray energy spectroscopy, and microscopic infrared spectroscopy. This report illustrates the importance of these techniques for identifying the nature of foreign material. PMID:3950038

  15. Enhanced approaches for identifying Amadori products: application to peanut allergens

    USDA-ARS's Scientific Manuscript database

    The dry roasting of peanuts is suggested to influence allergenic sensitization due to formation of advanced glycation end products (AGE) on peanut proteins. Identifying AGEs is technically challenging. The AGE composition of peanut proteins was probed with nanoLC-ESI-MS and MS/MS analyses. Amadori ...

  16. Tiered High-Throughput Screening Approach to Identify ...

    EPA Pesticide Factsheets

    High-throughput screening (HTS) for potential thyroid-disrupting chemicals requires a system of assays to capture multiple molecular-initiating events (MIEs) that converge on perturbed thyroid hormone (TH) homeostasis. Screening for MIEs specific to TH-disrupting pathways is limited in the US EPA ToxCast screening assay portfolio. To fill one critical screening gap, the Amplex UltraRed-thyroperoxidase (AUR-TPO) assay was developed to identify chemicals that inhibit TPO, as decreased TPO activity reduces TH synthesis. The ToxCast Phase I and II chemical libraries, comprised of 1,074 unique chemicals, were initially screened using a single, high concentration to identify potential TPO inhibitors. Chemicals positive in the single concentration screen were retested in concentration-response. Due to high false positive rates typically observed with loss-of-signal assays such as AUR-TPO, we also employed two additional assays in parallel to identify possible sources of nonspecific assay signal loss, enabling stratification of roughly 300 putative TPO inhibitors based upon selective AUR-TPO activity. A cell-free luciferase inhibition assay was used to identify nonspecific enzyme inhibition among the putative TPO inhibitors, and a cytotoxicity assay using a human cell line was used to estimate the cellular tolerance limit. Additionally, the TPO inhibition activities of 150 chemicals were compared between the AUR-TPO and an orthogonal peroxidase oxidation assay using

  17. Identifying the "Truly Disadvantaged": A Comprehensive Biosocial Approach

    ERIC Educational Resources Information Center

    Barnes, J. C.; Beaver, Kevin M.; Connolly, Eric J.; Schwartz, Joseph A.

    2016-01-01

    There has been significant interest in examining the developmental factors that predispose individuals to chronic criminal offending. This body of research has identified some social-environmental risk factors as potentially important. At the same time, the research producing these results has generally failed to employ genetically sensitive…

  18. Genomic approaches to identifying transcriptional regulators of osteoblast differentiation

    NASA Technical Reports Server (NTRS)

    Stains, Joseph P.; Civitelli, Roberto

    2003-01-01

    Recent microarray studies of mouse and human osteoblast differentiation in vitro have identified novel transcription factors that may be important in the establishment and maintenance of differentiation. These findings help unravel the pattern of gene-expression changes that underlie the complex process of bone formation.

  19. Identifying the Determinants of Chronic Absenteeism: A Bioecological Systems Approach

    ERIC Educational Resources Information Center

    Gottfried, Michael A.; Gee, Kevin A.

    2017-01-01

    Background/Context: Chronic school absenteeism is a pervasive problem across the US; in early education, it is most rampant in kindergarten and its consequences are particularly detrimental, often leading to poorer academic, behavioral and developmental outcomes later in life. Though prior empirical research has identified a broad range of…

  2. Systems Approaches to Identifying Gene Regulatory Networks in Plants

    PubMed Central

    Long, Terri A.; Brady, Siobhan M.; Benfey, Philip N.

    2009-01-01

    Complex gene regulatory networks are composed of genes, noncoding RNAs, proteins, metabolites, and signaling components. The availability of genome-wide mutagenesis libraries; large-scale transcriptome, proteome, and metabolome data sets; and new high-throughput methods that uncover protein interactions underscores the need for mathematical modeling techniques that better enable scientists to synthesize these large amounts of information and to understand the properties of these biological systems. Systems biology approaches can allow researchers to move beyond a reductionist approach and to both integrate and comprehend the interactions of multiple components within these systems. Descriptive and mathematical models for gene regulatory networks can reveal emergent properties of these plant systems. This review highlights methods that researchers are using to obtain large-scale data sets, and examples of gene regulatory networks modeled with these data. Emergent properties revealed by the use of these network models and perspectives on the future of systems biology are discussed. PMID:18616425

  3. A Computational Approach for Identifying Synergistic Drug Combinations

    PubMed Central

    Gayvert, Kaitlyn M.; Aly, Omar; Bosenberg, Marcus W.; Stern, David F.; Elemento, Olivier

    2017-01-01

    A promising alternative to address the problem of acquired drug resistance is to rely on combination therapies. Identification of the right combinations is often accomplished through trial and error, a labor- and resource-intensive process whose scale quickly escalates as more drugs can be combined. To address this problem, we present a broad computational approach for predicting synergistic combinations using easily obtainable single drug efficacy, no detailed mechanistic understanding of drug function, and limited drug combination testing. When applied to mutant BRAF melanoma, we found that our approach exhibited significant predictive power. Additionally, we validated previously untested synergy predictions involving anticancer molecules. As additional large combinatorial screens become available, this methodology could prove to be impactful for identification of drug synergy in the context of other types of cancers. PMID:28085880

  4. Bioinformatic approaches to identifying and classifying Rab proteins.

    PubMed

    Diekmann, Yoan; Pereira-Leal, José B

    2015-01-01

    The bioinformatic annotation of Rab GTPases is important, for example, to understand the evolution of the endomembrane system. However, Rabs are particularly challenging for standard annotation pipelines because they are similar to other small GTPases and form a large family with many paralogous subfamilies. Here, we describe a bioinformatic annotation pipeline specifically tailored to Rab GTPases. It proceeds in two steps: first, Rabs are distinguished from other proteins based on GTPase-specific motifs, overall sequence similarity to other Rabs, and the occurrence of Rab-specific motifs. Second, Rabs are classified taking either a more accurate but slower phylogenetic approach or a slightly less accurate but much faster bioinformatic approach. All necessary steps can either be performed locally or using the referenced online tools. An implementation of a slightly more involved version of the pipeline presented here is available at RabDB.org.

  5. Systematic analysis of molecular mechanisms for HCC metastasis via text mining approach.

    PubMed

    Zhen, Cheng; Zhu, Caizhong; Chen, Haoyang; Xiong, Yiru; Tan, Junyuan; Chen, Dong; Li, Jin

    2017-02-21

    The aim was to systematically explore the molecular mechanism for hepatocellular carcinoma (HCC) metastasis and to identify regulatory genes with text mining methods. Genes with highest frequencies and significant pathways related to HCC metastasis were listed. A handful of proteins, such as EGFR, MDM2, TP53 and APP, were identified as hub nodes in the PPI (protein-protein interaction) network. Fewer genes were unique to HCV-HCCs than to HBV-HCCs, but they may participate in more extensive signaling processes. VEGFA, PI3KCA, MAPK1, MMP9 and other genes may play important roles in multiple phenotypes of metastasis. Genes in abstracts of the HCC-metastasis literature were identified. Word frequency analysis, KEGG pathway and PPI network analysis were performed. Then co-occurrence analysis between genes and metastasis-related phenotypes was carried out. Text mining is effective for revealing potential regulators or pathways, but its purpose should be specific, and combining various methods will be more useful.
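
    The co-occurrence analysis mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `cooccurrence_counts`, the whole-word matching, and the placeholder sentences are all assumptions made for illustration. It counts, for each gene/phenotype pair, how many texts mention both terms.

```python
from collections import Counter
from itertools import product
import re


def cooccurrence_counts(abstracts, genes, phenotypes):
    """Count texts that mention both a gene and a phenotype term (whole-word match)."""
    counts = Counter()
    for text in abstracts:
        # Lower-cased set of alphanumeric tokens found in this text.
        tokens = set(re.findall(r"[A-Za-z0-9]+", text.lower()))
        found_genes = [g for g in genes if g.lower() in tokens]
        found_phenotypes = [p for p in phenotypes if p.lower() in tokens]
        for pair in product(found_genes, found_phenotypes):
            counts[pair] += 1
    return counts


# Placeholder sentences invented for illustration; they are not real abstracts.
toy_abstracts = [
    "Text mentioning VEGFA, angiogenesis and invasion.",
    "Text mentioning MMP9 and invasion.",
    "Text mentioning VEGFA, MMP9 and migration.",
]
counts = cooccurrence_counts(
    toy_abstracts,
    genes=["VEGFA", "MMP9", "TP53"],
    phenotypes=["invasion", "migration", "angiogenesis"],
)
for (gene, phenotype), n in sorted(counts.items()):
    print(gene, phenotype, n)
```

    A real analysis would also need gene-name normalization (synonyms, aliases) and a significance test for each pair, neither of which is attempted here.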

  6. A Data Mining Approach to Predict In Situ Detoxification Potential of Chlorinated Ethenes.

    PubMed

    Lee, Jaejin; Im, Jeongdae; Kim, Ungtae; Löffler, Frank E

    2016-05-17

    Despite advances in physicochemical remediation technologies, in situ bioremediation treatment based on Dehalococcoides mccartyi (Dhc) reductive dechlorination activity remains a cornerstone approach to remedy sites impacted with chlorinated ethenes. Selecting the best remedial strategy is challenging due to uncertainties and complexity associated with biological and geochemical factors influencing Dhc activity. Guidelines based on measurable biogeochemical parameters have been proposed, but contemporary efforts fall short of meaningfully integrating the available information. Extensive groundwater monitoring data sets have been collected for decades, but have not been systematically analyzed and used for developing tools to guide decision-making. In the present study, geochemical and microbial data sets collected from 35 wells at five contaminated sites were used to demonstrate that a data mining prediction model using the classification and regression tree (CART) algorithm can provide improved predictive understanding of a site's reductive dechlorination potential. The CART model successfully predicted the 3-month-ahead reductive dechlorination potential with 75.8% and 69.5% true positive rate (i.e., sensitivity) for the training set and the test set, respectively. The machine learning algorithm ranked parameters by relative importance for assessing in situ reductive dechlorination potential. The abundance of Dhc 16S rRNA genes, CH4, Fe(2+), NO3(-), NO2(-), and SO4(2-) concentrations, total organic carbon (TOC) amounts, and oxidation-reduction potential (ORP) displayed significant correlations (p < 0.01) with dechlorination potential, with NO3(-), NO2(-), and Fe(2+) concentrations exhibiting precedence over other parameters. Contrary to prior efforts, the power of data mining approaches lies in the ability to discern synergetic effects between multiple parameters that affect reductive dechlorination activity. Overall, these findings demonstrate that data mining
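
    The core step of a CART classification tree is choosing the threshold on one variable that best separates the classes. The sketch below shows that single step under stated assumptions: the abstract does not say which splitting criterion was used, so Gini impurity (a common choice) is assumed, and the variable name and values are hypothetical, not data from the study.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)  # equals 1 - p^2 - (1 - p)^2 for two classes


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best split 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        threshold = (lo + hi) / 2.0
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical concentrations and outcomes (1 = dechlorination observed).
toy_nitrate = [0.1, 0.2, 0.3, 2.0, 3.0, 4.0]
toy_outcome = [1, 1, 1, 0, 0, 0]
threshold, score = best_split(toy_nitrate, toy_outcome)
print(threshold, score)
```

    A full tree repeats this search over every variable and recurses on each side of the chosen split until a stopping rule is met.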

  7. Identifying high-cost patients using data mining techniques and a small set of non-trivial attributes.

    PubMed

    Izad Shenas, Seyed Abdolmotalleb; Raahemi, Bijan; Hossein Tekieh, Mohammad; Kuziemsky, Craig

    2014-10-01

    In this paper, we use data mining techniques, namely neural networks and decision trees, to build predictive models to identify very high-cost patients in the top 5 percent of the general population. A large empirical dataset from the Medical Expenditure Panel Survey with 98,175 records was used in our study. After pre-processing, partitioning and balancing the data, the refined dataset of 31,704 records was modeled by Decision Trees (including C5.0 and CHAID) and Neural Networks. The performance of the models is analyzed using various measures including accuracy, G-mean, and Area under ROC curve. We concluded that the CHAID classifier returns the best G-mean and AUC measures for top performing predictive models, ranging from 76% to 85% and 0.812 to 0.942 units, respectively. We also identify a small set of 5 non-trivial attributes among a primary set of 66 attributes to identify the top 5% of the high cost population. The attributes are the individual's overall health perception, age, history of blood cholesterol check, history of physical/sensory/mental limitations, and history of colonic prevention measures. We call this small set of attributes non-trivial because it does not include visits to care providers, doctors or hospitals, which are highly correlated with expenditures and offer no new insight into the data. The results of this study can be used by healthcare data analysts, policy makers, insurers, and healthcare planners to improve the delivery of health services.
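
    For readers unfamiliar with the G-mean measure named above, here is a minimal sketch using its usual binary-classification definition, the geometric mean of sensitivity and specificity. The abstract does not spell out the formula the authors used, so this definition is an assumption, and the labels below are invented toy data.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for binary 0/1 labels."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return math.sqrt(sensitivity * specificity)


# Toy labels: 1 = high-cost patient, 0 = everyone else (hypothetical).
toy_truth = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 1, 1]
print(round(g_mean(toy_truth, toy_pred), 4))
```

    Unlike plain accuracy, this measure drops to zero when a model ignores either class, which is why it is often preferred when the positive class is rare.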

  8. Multidisciplinary approach to identify aquifer-peatland connectivity

    NASA Astrophysics Data System (ADS)

    Larocque, Marie; Pellerin, Stéphanie; Cloutier, Vincent; Ferlatte, Miryane; Munger, Julie; Quillet, Anne; Paniconi, Claudio

    2015-04-01

    In southern Quebec (Canada), wetlands sustain increasing pressures from agriculture, urban development, and peat exploitation. To protect both groundwater and ecosystems, it is important to be able to identify how, where, and to what extent shallow aquifers and wetlands are connected. This study focuses on peatlands which are especially abundant in Quebec. The objective of this research was to better understand aquifer-peatland connectivity and to identify easily measured indicators of this connectivity. Geomorphology, hydrogeochemistry, and vegetation were selected as key indicators of connectivity. Twelve peatland transects were instrumented and monitored in the Abitibi (slope peatlands associated with eskers) and Centre-du-Quebec (depression peatlands) regions of Quebec (Canada). Geomorphology, geology, water levels, water chemistry, and vegetation species were identified/measured on all transects. Flow conditions were simulated numerically on two typical transects. Results show that a majority of peatland transects receives groundwater from a shallow aquifer. In slope peatlands, groundwater flows through the organic deposits towards the peatland center. In depression peatlands, groundwater flows only 100-200 m within the peatland before being redirected through surface routes towards the outlet. Flow modeling and sensitivity analysis have identified that the thickness and hydraulic conductivity of permeable deposits close to the peatland and beneath the organic deposits influence flow directions within the peatland. Geochemical data have confirmed the usefulness of total dissolved solids (TDS) exceeding 14 mg/L as an indicator of the presence of groundwater within the peatland. Vegetation surveys have allowed the identification of species and groups of species that occur mostly when groundwater is present, for instance Carex limosa and Sphagnum russowii. Geomorphological conditions (slope or depression peatland), TDS, and vegetation can be measured

  9. Proteomics approach to identify biomarkers for upper gastrointestinal cancer.

    PubMed

    Harada, Kazuto; Mizrak Kaya, Dilsa; Shimodaira, Yusuke; Song, Shumei; Baba, Hideo; Ajani, Jaffer A

    2016-10-10

    The prognosis for patients with upper gastrointestinal cancers remains dismal despite the development of multimodality therapies that incorporate surgery, chemotherapy, and radiotherapy. Early diagnosis and personalized treatment should lead to better prognosis. Given the advances in proteomic technologies over the past decades, proteomics promises to be the most effective technique to identify novel diagnostics and therapeutic targets. Areas covered: For this review, keywords were searched in combination with "proteomics" and "gastric cancer" or "esophageal cancer" in PubMed. Studies that evaluated proteomics associated with upper gastrointestinal cancer were identified through reading, with several studies quoted at second hand. We summarize the proteomics involved in upper gastrointestinal cancer and discuss potential biomarkers and therapeutic targets. Expert commentary: In particular, the development of mass spectrometry has enabled detection of multiple proteins and peptides in more biological samples over a shorter time period and at lower cost than was previously possible. In addition, more sophisticated protein databases have allowed a wider variety of proteins in samples to be quantified. Novel biomarkers that have been identified by new proteomic technologies should be applied in a clinical setting.

  10. Data mining with molecular design rules identifies new class of dyes for dye-sensitised solar cells.

    PubMed

    Cole, Jacqueline M; Low, Kian Sing; Ozoe, Hiroaki; Stathi, Panagiota; Kitamura, Chitoshi; Kurata, Hiroyuki; Rudolf, Petra; Kawase, Takeshi

    2014-12-28

    A major deficit in suitable dyes is stifling progress in the dye-sensitised solar cell (DSC) industry. Materials discovery strategies have afforded numerous new dyes; yet, corresponding solution-based DSC device performance has little improved upon 11% efficiency, achieved using the N719 dye over two decades ago. Research on these dyes has nevertheless revealed relationships between the molecular structure of dyes and their associated DSC efficiency. Here, such structure-property relationships have been codified in the form of molecular dye design rules, which have been judiciously sequenced in an algorithm to enable large-scale data mining of dye structures with optimal DSC performance. This affords, for the first time, a DSC-specific dye-discovery strategy that predicts new classes of dyes from surveying a representative set of chemical space. A lead material from these predictions is experimentally validated, showing DSC efficiency that is comparable to many well-known organic dyes. This demonstrates the power of this approach.

  11. Implementation of Predictive Data Mining Techniques for Identifying Risk Factors of Early AVF Failure in Hemodialysis Patients

    PubMed Central

    Rezapour, Mohammad; Khavanin Zadeh, Morteza; Sepehri, Mohammad Mehdi

    2013-01-01

    Arteriovenous fistula (AVF) is an important vascular access for hemodialysis (HD) treatment but has a 20–60% rate of early failure. Detecting associations between patient parameters and early AVF failure is important for reducing its prevalence and related costs. Predicting the incidence of this complication in new patients is also a beneficial control measure. Patient safety and prevention of early AVF failure is the ultimate goal. Our study population comes from Hasheminejad Kidney Center (HKC) of Tehran, which is one of Iran's largest renal hospitals. We analyzed data of 193 HD patients using supervised data mining techniques. There were 137 male (70.98%) and 56 female (29.02%) patients in this study. The average age of all patients was 53.87 ± 17.47 years. Twenty-eight patients had smoked, and the numbers of diabetic and nondiabetic patients were 87 and 106, respectively. A significant relationship was found between “diabetes mellitus,” “smoking,” and “hypertension” and early AVF failure in this study. We found that these risk factors have important roles in the outcome of vascular surgery, compared with other parameters such as “age.” We then predicted this complication in future AVF surgeries and evaluated our prediction methods, with accuracy rates of 61.66%–75.13%. PMID:23861725

  12. Genetic heterogeneity of asthma phenotypes identified by a clustering approach.

    PubMed

    Siroux, Valérie; González, Juan R; Bouzigon, Emmanuelle; Curjuric, Ivan; Boudier, Anne; Imboden, Medea; Anto, Josep Maria; Gut, Ivo; Jarvis, Deborah; Lathrop, Mark; Omenaas, Ernst Reidar; Pin, Isabelle; Wjst, Mathias; Demenais, Florence; Probst-Hensch, Nicole; Kogevinas, Manolis; Kauffmann, Francine

    2014-02-01

    The aim of the study was to identify genetic variants associated with refined asthma phenotypes enabling multiple features of the disease to be taken into account. Latent class analysis (LCA) was applied in 3001 adults ever having asthma recruited in the frame of three epidemiological surveys (the European Community Respiratory Health Survey (ECRHS), the Swiss Study on Air Pollution and Lung Disease in Adults (SAPALDIA) and the Epidemiological Study on the Genetics and Environment of Asthma (EGEA)). 14 personal and phenotypic characteristics, gathered from questionnaires and clinical examination, were used. A genome-wide association study was conducted for each LCA-derived asthma phenotype, compared to subjects without asthma (n=3474). The LCA identified four adult asthma phenotypes, mainly characterised by disease activity, age of asthma onset and atopic status. Associations of genome-wide significance (p<1.25 × 10(-7)) were observed between "active adult-onset nonallergic asthma" and rs9851461 flanking CD200 (3q13.2) and between "inactive/mild nonallergic asthma" and rs2579931 flanking GRIK2 (6q16.3). Borderline significant results (2.5 × 10(-7) < p <8.2 × 10(-7)) were observed between three single nucleotide polymorphisms (SNPs) in the ALCAM region (3q13.11) and "active adult-onset nonallergic asthma". These results were consistent across studies. 15 SNPs identified in previous genome-wide association studies of asthma have been replicated with at least one asthma phenotype, most of them with the "active allergic asthma" phenotype. Our results provide evidence that a better understanding of asthma phenotypic heterogeneity helps to disentangle the genetic heterogeneity of asthma.

  13. Using Data Mining and Computational Approaches to Study Intermediate Filament Structure and Function.

    PubMed

    Parry, David A D

    2016-01-01

    Experimental and theoretical research aimed at determining the structure and function of the family of intermediate filament proteins has made significant advances over the past 20 years. Much of this has either contributed to or relied on the amino acid sequence databases that are now available online, and the data mining approaches that have been developed to analyze these sequences. As the quality of sequence data is generally high, it follows that it is the design of the computational and graphical methodologies that are of especial importance to researchers who aspire to gain a greater understanding of those sequence features that specify both function and structural hierarchy. However, these techniques are necessarily subject to limitations and it is important that these be recognized. In addition, no single method is likely to be successful in solving a particular problem, and a coordinated approach using a suite of methods is generally required. A final step in the process involves the interpretation of the results obtained and the construction of a working model or hypothesis that suggests further experimentation. While such methods allow meaningful progress to be made it is still important that the data are interpreted correctly and conservatively. New data mining methods are continually being developed, and it can be expected that even greater understanding of the relationship between structure and function will be gleaned from sequence data in the coming years. Copyright © 2016 Elsevier Inc. All rights reserved.

  14. A data mining approach to optimize pellets manufacturing process based on a decision tree algorithm.

    PubMed

    Ronowicz, Joanna; Thommes, Markus; Kleinebudde, Peter; Krysiński, Jerzy

    2015-06-20

    The present study focuses on a thorough analysis of cause-effect relationships between pellet formulation characteristics (pellet composition as well as process parameters) and the selected quality attribute of the final product. Pellet quality was expressed by shape, measured as the aspect ratio. The data matrix for chemometric analysis consisted of 224 pellet formulations prepared with eight different active pharmaceutical ingredients and several excipients, using different extrusion/spheronization process conditions. The data set contained 14 input variables (both formulation and process variables) and one output variable (pellet aspect ratio). A tree regression algorithm consistent with the Quality by Design concept was applied to obtain deeper understanding and knowledge of formulation and process parameters affecting the final pellet sphericity. A clear, interpretable set of decision rules was generated. The spheronization speed, spheronization time, number of holes and water content of the extrudate were recognized as the key factors influencing pellet aspect ratio. The most spherical pellets were achieved by using a large number of holes during extrusion, a high spheronizer speed and a longer spheronization time. The described data mining approach enhances knowledge about the pelletization process and simultaneously facilitates searching for the optimal process conditions which are necessary to achieve ideal spherical pellets, resulting in good flow characteristics. This data mining approach can be taken into consideration by industrial formulation scientists to support rational decision making in the field of pellets technology. Copyright © 2015 Elsevier B.V. All rights reserved.

  15. Timely approaches to identify probiotic species of the genus Lactobacillus

    PubMed Central

    2013-01-01

    Over the past decades the use of probiotics in food has increased largely due to the manufacturer’s interest in placing “healthy” food on the market based on the consumer’s ambitions to live healthy. Due to this trend, health benefits of products containing probiotic strains such as lactobacilli are promoted and probiotic strains have been established in many different products with their numbers increasing steadily. Probiotics are used as starter cultures in dairy products such as cheese or yoghurts and in addition they are also utilized in non-dairy products such as fermented vegetables, fermented meat and pharmaceuticals, thereby, covering a large variety of products. To assure quality management, several pheno-, physico- and genotyping methods have been established to unambiguously identify probiotic lactobacilli. These methods are often specific enough to identify the probiotic strains at genus and species levels. However, the probiotic ability is often strain dependent and it is impossible to distinguish strains by basic microbiological methods. Therefore, this review aims to critically summarize and evaluate conventional identification methods for the genus Lactobacillus, complemented by techniques that are currently being developed. PMID:24063519

  16. Proteomic Approach to Identify Nuclear Proteins in Wheat Grain.

    PubMed

    Bancel, Emmanuelle; Bonnot, Titouan; Davanture, Marlène; Branlard, Gérard; Zivy, Michel; Martre, Pierre

    2015-10-02

    The nuclear proteome of the grain of the two cultivated wheat species Triticum aestivum (hexaploid wheat; genomes A, B, and D) and T. monococcum (diploid wheat; genome A) was analyzed in two early stages of development using shotgun-based proteomics. A procedure was optimized to purify nuclei, and an improved protein sample preparation was developed to efficiently remove nonprotein substances (starch and nucleic acids). A total of 797 proteins corresponding to 528 unique proteins were identified, 36% of which were classified in functional groups related to DNA and RNA metabolism. A large number (107 proteins) of unknown functions and hypothetical proteins were also found. Some identified proteins may be multifunctional and may present multiple localizations. On the basis of the MS/MS analysis, 368 proteins were present in both species and in both stages of development; some qualitative differences between species and stages of development were also found. All of these data illustrate the dynamic function of the grain nucleus in the early stages of development.

  17. Targeted Approach to Identify Genetic Loci Associated with ...

    EPA Pesticide Factsheets

    Extreme tolerance to highly toxic dioxin-like contaminants (DLCs) has evolved independently and contemporaneously in (at least) four populations of Atlantic killifish (Fundulus heteroclitus). Surprisingly, the magnitude and phenotype of DLC tolerance is similar among these killifish populations that have adapted to varied, but highly contaminated urban/industrialized estuaries of the US Atlantic coast. We hypothesized that comparisons among tolerant populations and in contrast to their sensitive neighboring killifish might reveal genetic loci associated with DLC tolerance. Since the aryl hydrocarbon receptor (AHR) pathway partly or fully mediates DLC toxicity in vertebrates, we identified single nucleotide polymorphisms (SNPs) from 43 genes associated with the AHR to serve as targeted markers. Wild fish from the four highly tolerant killifish populations and four nearby sensitive populations were genotyped using 59 SNP markers. Consistent with other killifish population genetic analyses, our results revealed strong genetic differentiation among populations, consistent with isolation by distance models. Pairwise comparisons of nearby tolerant and sensitive populations revealed differentiation among these loci: AHR 1 and 2, cathepsin Z, the cytochrome P450s (CYP) 1A and 3A30, and the NADH ubiquinone oxidoreductase MLRQ subunit. By grouping tolerant versus sensitive populations, we also identified cytochrome P450 1A and the AHR2 loci as under selection, lend

  18. Multimodal Approach to Identifying Malingered Posttraumatic Stress Disorder: A Review

    PubMed Central

    Jabeen, Shagufta; Alam, Farzana

    2015-01-01

    The primary aim of this article is to aid clinicians in differentiating true posttraumatic stress disorder from malingered posttraumatic stress disorder. Posttraumatic stress disorder and malingering are defined, and prevalence rates are explored. Similarities and differences in diagnostic criteria between the fourth and fifth editions of the Diagnostic and Statistical Manual of Mental Disorders are described for posttraumatic stress disorder. Possible motivations for malingering posttraumatic stress disorder are discussed, and common characteristics of malingered posttraumatic stress disorder are described. A multimodal approach is described for evaluating posttraumatic stress disorder, including interview techniques, collection of collateral data, and psychometric and physiologic testing, that should allow clinicians to distinguish between those patients who are truly suffering from posttraumatic disorder and those who are malingering the illness. PMID:25852974

  19. Utilizing Soize's Approach to Identify Parameter and Model Uncertainties

    SciTech Connect

    Bonney, Matthew S.; Brake, Matthew Robert

    2014-10-01

    Quantifying uncertainty in model parameters is a challenging task for analysts. Soize has derived a method that is able to characterize both model and parameter uncertainty independently. This method is explained with the assumption that some experimental data is available, and is divided into seven steps. Monte Carlo analyses are performed to select the optimal dispersion variable to match the experimental data. Along with the nominal approach, an alternative distribution can be used along with corrections that can be utilized to expand the scope of this method. This method is one of a very few methods that can quantify uncertainty in the model form independently of the input parameters. Two examples are provided to illustrate the methodology, and example code is provided in the Appendix.

  20. Omics Approach to Identify Factors Involved in Brassica Disease Resistance.

    PubMed

    Francisco, Marta; Soengas, Pilar; Velasco, Pablo; Bhadauria, Vijai; Cartea, Maria E; Rodríguez, Victor M

    2016-01-01

    Understanding plants' defense mechanisms and their response to biotic stresses is of fundamental importance for the development of resistant crop varieties and more productive agriculture. The Brassica genus includes a large variety of economically important species and cultivars used as vegetables, oilseeds, forage and ornamentals. Damage caused by pathogen attack negatively affects various aspects of plant growth, development, and crop productivity. Over the last few decades, advances in plant physiology, genetics, and molecular biology have greatly improved our understanding of plant responses to biotic stress conditions. In this regard, various 'omics' technologies enable qualitative and quantitative monitoring of the abundance of various biological molecules in a high-throughput manner, and thus allow determination of their variation between different biological states on a genomic scale. In this review, we describe advances in 'omic' tools (genomics, transcriptomics, proteomics and metabolomics) in light of the conventional and modern approaches being used to elucidate the molecular mechanisms that underlie Brassica disease resistance.

  1. Multimodal approach to identifying malingered posttraumatic stress disorder: a review.

    PubMed

    Ali, Shahid; Jabeen, Shagufta; Alam, Farzana

    2015-01-01

    The primary aim of this article is to aid clinicians in differentiating true posttraumatic stress disorder from malingered posttraumatic stress disorder. Posttraumatic stress disorder and malingering are defined, and prevalence rates are explored. Similarities and differences in diagnostic criteria between the fourth and fifth editions of the Diagnostic and Statistical Manual of Mental Disorders are described for posttraumatic stress disorder. Possible motivations for malingering posttraumatic stress disorder are discussed, and common characteristics of malingered posttraumatic stress disorder are described. A multimodal approach is described for evaluating posttraumatic stress disorder, including interview techniques, collection of collateral data, and psychometric and physiologic testing, that should allow clinicians to distinguish between those patients who are truly suffering from posttraumatic disorder and those who are malingering the illness.

  2. A comprehensive approach to identifying and authenticating botanical products.

    PubMed

    Smillie, T J; Khan, I A

    2010-02-01

    Whether they are being taken as dietary supplements by the general public or being evaluated in a clinical study, the authenticity of botanical products is a matter of paramount concern. Botanical specimens and the dietary supplements derived from them can vary in quality and in chemical constituent profiles because of a number of factors. Subtle variations in botanical specimens are known to have profound effects on the quality, efficacy, and safety of botanical dietary supplements and can potentially alter the results of clinical studies that rely on these materials. A complete array of authentication and evaluation tools can be utilized to provide a well-rounded scientific approach to the authentication of botanical products. It is vital that the authenticity of botanical supplements be established using appropriate analysis tools regardless of whether the end products are being considered for evaluation in clinical studies or are being developed for the consumer market.

  3. A genomics approach identifies senescence-specific gene expression regulation.

    PubMed

    Lackner, Daniel H; Hayashi, Makoto T; Cesare, Anthony J; Karlseder, Jan

    2014-10-01

    Replicative senescence is a fundamental tumor-suppressive mechanism triggered by telomere erosion that results in a permanent cell cycle arrest. To understand the impact of telomere shortening on gene expression, we analyzed the transcriptome of diploid human fibroblasts as they progressed toward and entered into senescence. We distinguished novel transcription regulation due to replicative senescence by comparing senescence-specific expression profiles to profiles from cells arrested by DNA damage or serum starvation. Only a small specific subset of genes was identified that was truly senescence-regulated and changes in gene expression were exacerbated from presenescent to senescent cells. The majority of gene expression regulation in replicative senescence was shown to occur due to telomere shortening, as exogenous telomerase activity reverted most of these changes.

  4. Identifying Hosts of Families of Viruses: A Machine Learning Approach

    PubMed Central

    Raj, Anil; Dewar, Michael; Palacios, Gustavo; Rabadan, Raul; Wiggins, Christopher H.

    2011-01-01

    Identifying emerging viral pathogens and characterizing their transmission is essential to developing effective public health measures in response to an epidemic. Phylogenetics, though currently the most popular tool used to characterize the likely host of a virus, can be ambiguous when studying species very distant to known species and when there is very little reliable sequence information available in the early stages of the outbreak of disease. Motivated by an existing framework for representing biological sequence information, we learn sparse, tree-structured models, built from decision rules based on subsequences, to predict viral hosts from protein sequence data using popular discriminative machine learning tools. Furthermore, the predictive motifs robustly selected by the learning algorithm are found to show strong host-specificity and occur in highly conserved regions of the viral proteome. PMID:22174744
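
    The idea of building decision rules from subsequences can be illustrated with a small sketch. It is not the authors' model: the function names, the choice of overlapping k-mers as the subsequence representation, and the toy sequence are assumptions made here for illustration.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping length-k subsequence (k-mer) in a sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def rule_fires(sequence, motif):
    """A single decision rule: does the sequence contain the motif?"""
    return motif in sequence


# Hypothetical toy sequence, not a real viral protein.
toy_sequence = "MKVMKVA"
counts = kmer_counts(toy_sequence, 3)
print(counts.most_common(1))
print(rule_fires(toy_sequence, "VMK"))
```

    A learning algorithm would then select a small number of such rules and combine them, for example in a tree, to predict a host label from the sequence.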

  5. A genomics approach identifies senescence-specific gene expression regulation

    PubMed Central

    Lackner, Daniel H; Hayashi, Makoto T; Cesare, Anthony J; Karlseder, Jan

    2014-01-01

    Replicative senescence is a fundamental tumor-suppressive mechanism triggered by telomere erosion that results in a permanent cell cycle arrest. To understand the impact of telomere shortening on gene expression, we analyzed the transcriptome of diploid human fibroblasts as they progressed toward and entered into senescence. We distinguished novel transcription regulation due to replicative senescence by comparing senescence-specific expression profiles to profiles from cells arrested by DNA damage or serum starvation. Only a small specific subset of genes was identified that was truly senescence-regulated and changes in gene expression were exacerbated from presenescent to senescent cells. The majority of gene expression regulation in replicative senescence was shown to occur due to telomere shortening, as exogenous telomerase activity reverted most of these changes. PMID:24863242

  6. Enhanced approaches for identifying Amadori products: application to peanut allergens

    PubMed Central

    Johnson, Katina L.; Williams, Jason G.; Maleki, Soheila J.; Hurlburt, Barry K.; London, Robert E.; Mueller, Geoffrey A.

    2016-01-01

    The dry roasting of peanuts is suggested to influence allergenic sensitization due to formation of advanced glycation end products (AGE) on peanut proteins. Identifying AGEs is technically challenging. The AGEs of a peanut allergen were probed with nanoLC-ESI-MS and MS/MS analyses. Amadori product ions matched to expected peptides and yielded fragments that included a loss of 3 waters and HCHO. Due to the paucity of b- and y-ions in the MS/MS spectrum, standard search algorithms do not perform well. Reactions with isotopically labeled sugars confirmed that the peptides contained Amadori products. An algorithm was developed based upon information content (Shannon entropy) and the loss of water and HCHO. Results with test data show that the algorithm finds the correct spectra with high precision, reducing the time needed to manually inspect data. Computational and technical improvements allowed better identification of the chemical differences between modified and unmodified proteins. PMID:26811263
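
    As a rough illustration of the information-content idea, the sketch below computes the Shannon entropy of a list of peak intensities after normalizing them to sum to one. How the published algorithm actually combines entropy with the water and HCHO losses is not described in the abstract, so this shows only the entropy calculation; the function name and the toy intensities are assumptions.

```python
import math


def shannon_entropy(intensities):
    """Shannon entropy (in bits) of non-negative intensities normalized to sum to 1."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("intensities must have a positive sum")
    entropy = 0.0
    for value in intensities:
        if value > 0:  # zero-intensity peaks contribute nothing
            p = value / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical peak lists: one dominated by a single peak, one evenly spread.
dominated = [100.0, 1.0, 1.0, 1.0]
even = [25.0, 25.0, 25.0, 25.0]
print(shannon_entropy(dominated) < shannon_entropy(even))
```

    A spectrum dominated by a few intense fragments has low entropy, while one with intensity spread evenly across many peaks has high entropy.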

  7. Constraint-based control of boiler efficiency: A data-mining approach

    SciTech Connect

    Song, Z.; Kusiak, A.

    2007-02-15

    In this paper, a data-mining approach is used to develop a model for optimizing the efficiency of an electric-utility boiler subject to operating constraints. Selection of process variables to optimize combustion efficiency is discussed. The selected variables are critical for control of combustion efficiency of a coal-fired boiler in the presence of operating constraints. Two schemes of generating control settings and updating control variables are evaluated. One scheme is based on the controllable and noncontrollable variables. The second one incorporates response variables into the clustering process. The process control scheme based on the response variables produces the smallest variance of the target variable due to reduced coupling among the process variables. An industrial case study and its implementation illustrate the control approach developed in this paper.

  8. A multitrophic approach to monitoring the effects of metal mining in otherwise pristine and ecologically sensitive rivers in northern Canada.

    PubMed

    Spencer, Paula; Bowman, Michelle F; Dubé, Monique G

    2008-07-01

    It is not known if current chemical and biological monitoring methods are appropriate for assessing the impacts of growing industrial development on ecologically sensitive northern waters. We used a multitrophic level approach to evaluate current monitoring methods and to determine whether metal-mining activities had affected 2 otherwise pristine rivers that flow into the South Nahanni River, Northwest Territories, a World Heritage Site. We compared upstream reference conditions in the rivers to sites downstream and further downstream of mines. The endpoints we evaluated included concentrations of metals in river water, sediments, and liver and flesh of slimy sculpin (Cottus cognatus); benthic algal and macroinvertebrate abundance, richness, diversity, and community composition; and various slimy sculpin measures, our sentinel forage fish species. Elevated concentrations of copper and iron in liver tissue of sculpin from the Flat River were associated with high concentrations of mine-derived iron in river water and copper in sediments that were above national guidelines. In addition, sites downstream of the mine on the Flat River had increased algal abundances and altered benthic macroinvertebrate communities, whereas the sites downstream of the mine on Prairie Creek had increased benthic macroinvertebrate taxa richness and improved sculpin condition. Biological differences in both rivers were consistent with mild enrichment of the rivers downstream of current and historical mining activity. We recommend that monitoring in these northern rivers focus on indicators in epilithon and benthic macroinvertebrate communities due to their responsiveness and as alternatives to lethal fish sampling in habitats with low fish abundance. We also recommend monitoring of metal burdens in periphyton and benthic invertebrates for assessment of exposure to mine effluent and causal association. Although the effects of mining activities on riverine biota currently are limited, our

  9. Citation Mining: Integrating Text Mining and Bibliometrics for Research User Profiling.

    ERIC Educational Resources Information Center

    Kostoff, Ronald N.; del Rio, J. Antonio; Humenik, James A.; Garcia, Esther Ofilia; Ramirez, Ana Maria

    2001-01-01

    Discusses the importance of identifying the users and impact of research, and describes an approach for identifying the pathways through which research can impact other research, technology development, and applications. Describes a study that used citation mining, an integration of citation bibliometrics and text mining, on articles from the…

  11. A proteomic approach to identify endosomal cargoes controlling cancer invasiveness

    PubMed Central

    Diaz-Vera, Jesica; Palmer, Sarah; Hernandez-Fernaud, Juan Ramon; Dornier, Emmanuel; Mitchell, Louise E.; Macpherson, Iain; Edwards, Joanne; Zanivan, Sara

    2017-01-01

    We have previously shown that Rab17, a small GTPase associated with epithelial polarity, is specifically suppressed by ERK2 (also known as MAPK1) signalling to promote an invasive phenotype. However, the mechanisms through which Rab17 loss permits invasiveness, and the endosomal cargoes that are responsible for mediating this, are unknown. Using quantitative mass spectrometry-based proteomics, we have found that knockdown of Rab17 leads to a highly selective reduction in the cellular levels of a v-SNARE (Vamp8). Moreover, proteomics and immunofluorescence indicate that Vamp8 is associated with Rab17 at late endosomes. Reduced levels of Vamp8 promote transition between ductal carcinoma in situ (DCIS) and a more invasive phenotype. We developed an unbiased proteomic approach to elucidate the complement of receptors that redistributes between endosomes and the plasma membrane, and have pin-pointed neuropilin-2 (NRP2) as a key pro-invasive cargo of Rab17- and Vamp8-regulated trafficking. Indeed, reduced Rab17 or Vamp8 levels lead to increased mobilisation of NRP2-containing late endosomes and upregulated cell surface expression of NRP2. Finally, we show that NRP2 is required for the basement membrane disruption that accompanies the transition between DCIS and a more invasive phenotype. PMID:28062852

  12. A generalised approach for identifying influential data in hydrological modelling

    NASA Astrophysics Data System (ADS)

    Wright, David; Thyer, Mark; Westra, Seth; Renard, Benjamin; McInerney, David

    2017-04-01

    Influence diagnostics identify data points that have a disproportionate impact on model calibration, and are therefore useful to identify possible erroneous data points or scrutinise the sensitivity of the model results to a small portion of the overall calibration dataset. Case-deletion Cook's distance calculates influence; however, it has a large computational demand due to the requirement for recalibration of the model parameters for every data point in the calibration data. Regression based Cook's distance provides an approximation of case-deletion Cook's distance by combining two regression components for each observed data point: 1) the leverage which is used to assess the potential importance of individual observations, and 2) the standardised residuals. By combining these two components the regression based Cook's distance requires only a relatively small number of additional runs and is therefore an attractive alternative to the computationally demanding case-deletion Cook's distance. The objective of this study is to develop generalised regression based influence diagnostics that can be applied across a wide range of models and objective functions in a computationally efficient manner. This overcomes the limitations of the current suite of influence diagnostics. For example, the regression based Cook's distance has two assumptions that are not satisfied in hydrological modelling: 1) the hydrological model is linear; 2) the objective function applied is limited to standard least squares. In addition, although the case-deletion diagnostics overcome these assumptions, they require high performance computing and therefore are not computationally feasible for most hydrological model applications. In this study we generalise regression based Cook's distance to be applied beyond linear models and to the vast majority of objective functions currently applied in hydrological model calibration. The improvements from the new formulation are then examined by comparing
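
    For orientation, the classical regression based Cook's distance that the abstract starts from (leverage combined with standardised residuals) can be sketched for a simple linear least-squares fit. This is the textbook linear case only, not the generalised diagnostic developed in the study; the function name and the restriction to one predictor are assumptions chosen to keep the example self-contained.

```python
def cooks_distance(x, y):
    """Regression based Cook's distance for y = a + b*x fitted by least squares.

    D_i = (r_i**2 / p) * h_i / (1 - h_i), where h_i is the leverage and
    r_i = e_i / (s * sqrt(1 - h_i)) is the standardised residual.
    """
    n = len(x)
    p = 2  # fitted parameters: intercept and slope
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual variance
    if s2 == 0:
        raise ValueError("perfect fit: Cook's distance is undefined")
    leverage = [1 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    # Algebraically equal to (r_i**2 / p) * h_i / (1 - h_i).
    return [e * e / (p * s2) * h / (1 - h) ** 2
            for e, h in zip(residuals, leverage)]
```

    No refitting is needed per data point, which is why this form is cheap compared with case-deletion; the trade-off, as the abstract notes, is that it assumes a linear model and a standard least-squares objective.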

  13. Correlation approach to identify coding regions in DNA sequences

    NASA Technical Reports Server (NTRS)

    Ossadnik, S. M.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1994-01-01

    Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.

  14. Newer Approaches to Identify Potential Untoward Effects in Functional Foods.

    PubMed

    Marone, Palma Ann; Birkenbach, Victoria L; Hayes, A Wallace

    2016-01-01

    Globalization has greatly accelerated the numbers and variety of food and beverage products available worldwide. The exchange among greater numbers of countries, manufacturers, and products in the United States and worldwide has necessitated enhanced quality measures for nutritional products for larger populations increasingly reliant on functionality. These functional foods, those that provide benefit beyond basic nutrition, are increasingly being used for their potential to alleviate food insufficiency while enhancing quality and longevity of life. In the United States alone, a steady import increase of greater than 15% per year or 24 million shipments, over 70% products of which are food related, is regulated under the Food and Drug Administration (FDA). This unparalleled growth has resulted in the need for faster, cheaper, and better safety and efficacy screening methods in the form of harmonized guidelines and recommendations for product standardization. In an effort to meet this need, the in vitro toxicology testing market has similarly grown with an anticipatory 15% increase between 2010 and 2015 of US$1.3 to US$2.7 billion. Although traditionally occupying a small fraction of the market behind pharmaceuticals and cosmetic/household products, the scope of functional food testing, including additives/supplements, ingredients, residues, contact/processing, and contaminants, is potentially expansive. Similarly, as functional food testing has progressed, so has the need to identify potential adverse factors that threaten the safety and quality of these products.

  15. Phenotypic Approaches to Identify Inhibitors of B Cell Activation

    PubMed Central

    Kim, Suzie; Wiener, Jake; Rao, Navin L.; Milla, Marcos E.; DiSepio, Daniel

    2015-01-01

    An EPIC label-free phenotypic platform was developed to explore B cell receptor (BCR) and CD40R-mediated B cell activation. The phenotypic assay measured the association of RL non-Hodgkin’s lymphoma B cells expressing lymphocyte function-associated antigen 1 (LFA-1) to intercellular adhesion molecule 1 (ICAM-1)-coated EPIC plates. Anti-IgM (immunoglobulin M) mediated BCR activation elicited a response that was blocked by LFA-1/ICAM-1 specific inhibitors and a panel of Bruton’s tyrosine kinase (BTK) inhibitors. LFA-1/ICAM-1 association was further increased on coapplication of anti-IgM and mega CD40L when compared to individual application of either. Anti-IgM, mega CD40L, or the combination of both displayed distinct kinetic profiles that were inhibited by treatment with a BTK inhibitor. We also established a FLIPR-based assay to measure B cell activation in Ramos Burkitt’s lymphoma B cells and an RL cell line. Anti-IgM-mediated BCR activation elicited a robust calcium response that was inhibited by a panel of BTK inhibitors. Conversely, CD40R activation did not elicit a calcium response in the FLIPR assay. Compared to the FLIPR, the EPIC assay has the propensity to identify inhibitors of both BCR and CD40R-mediated B cell activation and may provide more pharmacological depth or novel mechanisms of action for inhibition of B cell activation. PMID:25948491

  16. Identifying problematic concepts in SNOMED CT using a lexical approach.

    PubMed

    Agrawal, Ankur; Perl, Yehoshua; Elhanan, Gai

    2013-01-01

    SNOMED CT (SCT) has been endorsed as a premier clinical terminology by many organizations with a perceived use within electronic health records and clinical information systems. However, there are indications that, at the moment, SCT is not optimally structured for its intended use by healthcare practitioners. A study is conducted to investigate the extent of inconsistencies among the concepts in SCT. A group auditing technique to improve the quality of SCT is introduced that can help identify problematic concepts with a high probability. Positional similarity sets are defined as groups of lexically similar concepts whose fully specified names differ by a single word at the same position. A manual auditing of a sample of such sets found 38% of the sets exhibiting one or more inconsistent concepts. Group auditing techniques such as this can thus be very helpful to assure the quality of SCT, which will help expedite its adoption as a reference terminology for clinical purposes.
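
    One way to read the definition of positional similarity sets is sketched below: names are grouped when they have the same words everywhere except at one shared position. The grouping key, the lower-casing, and the whitespace tokenisation are assumptions, and the example names are made up rather than taken from SNOMED CT.

```python
from collections import defaultdict


def positional_similarity_sets(names):
    """Group names that differ by exactly one word at the same position.

    Returns a dict mapping (position, masked word tuple) to the sorted
    list of names sharing that mask; singleton groups are dropped.
    """
    groups = defaultdict(set)
    for name in names:
        words = name.lower().split()
        for i in range(len(words)):
            # Mask the word at position i so that names differing only
            # there fall under the same key.
            mask = tuple(words[:i] + ["*"] + words[i + 1:])
            groups[(i, mask)].add(name)
    return {key: sorted(members)
            for key, members in groups.items() if len(members) > 1}
```

    Each resulting set could then be handed to an auditor, who checks whether its members are modelled consistently.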

  18. Configurational approach to identifying the earliest hominin butchers

    PubMed Central

    Domínguez-Rodrigo, Manuel; Pickering, Travis Rayne; Bunn, Henry T.

    2010-01-01

    The announcement of two approximately 3.4-million-y-old purportedly butchered fossil bones from the Dikika paleoanthropological research area (Lower Awash Valley, Ethiopia) could profoundly alter our understanding of human evolution. Butchering damage on the Dikika bones would imply that tool-assisted meat-eating began approximately 800,000 y before previously thought, based on butchered bones from 2.6- to 2.5-million-y-old sites at the Ethiopian Gona and Bouri localities. Further, the only hominin currently known from Dikika at approximately 3.4 Ma is Australopithecus afarensis, a temporally and geographically widespread species unassociated previously with any archaeological evidence of butchering. Our taphonomic configurational approach to assess the claims of A. afarensis butchery at Dikika suggests the claims of unexpectedly early butchering at the site are not warranted. The Dikika research group focused its analysis on the morphology of the marks in question but failed to demonstrate, through recovery of similarly marked in situ fossils, the exact provenience of the published fossils, and failed to note occurrences of random striae on the cortices of the published fossils (incurred through incidental movement of the defleshed specimens across and/or within their abrasive encasing sediments). The occurrence of such random striae (sometimes called collectively “trampling” damage) on the two fossils provides the configurational context for rejection of the claimed butchery marks. The earliest best evidence for hominin butchery thus remains at 2.6 to 2.5 Ma, presumably associated with more derived species than A. afarensis. PMID:21078985

  19. A landscape ecology approach identifies important drivers of urban biodiversity.

    PubMed

    Turrini, Tabea; Knop, Eva

    2015-04-01

    Cities are growing rapidly worldwide, yet a mechanistic understanding of the impact of urbanization on biodiversity is lacking. We assessed the impact of urbanization on arthropod diversity (species richness and evenness) and abundance in a study of six cities and nearby intensively managed agricultural areas. Within the urban ecosystem, we disentangled the relative importance of two key landscape factors affecting biodiversity, namely the amount of vegetated area and patch isolation. To do so, we a priori selected sites that independently varied in the amount of vegetated area in the surrounding landscape at the 500-m scale and patch isolation at the 100-m scale, and we held local patch characteristics constant. As indicator groups, we used bugs, beetles, leafhoppers, and spiders. Compared to intensively managed agricultural ecosystems, urban ecosystems supported a higher abundance of most indicator groups, a higher number of bug species, and a lower evenness of bug and beetle species. Within cities, a high amount of vegetated area increased species richness and abundance of most arthropod groups, whereas evenness showed no clear pattern. Patch isolation played only a limited role in urban ecosystems, which contrasts with findings from agro-ecological studies. Our results show that urban areas can harbor a similar arthropod diversity and abundance compared to intensively managed agricultural ecosystems. Further, negative consequences of urbanization on arthropod diversity can be mitigated by providing sufficient vegetated space in the urban area, while patch connectivity is less important in an urban context. This highlights the need for applying a landscape ecological approach to understand the mechanisms shaping urban biodiversity and underlines the potential of appropriate urban planning for mitigating biodiversity loss.

  20. Configurational approach to identifying the earliest hominin butchers.

    PubMed

    Domínguez-Rodrigo, Manuel; Pickering, Travis Rayne; Bunn, Henry T

    2010-12-07

    The announcement of two approximately 3.4-million-y-old purportedly butchered fossil bones from the Dikika paleoanthropological research area (Lower Awash Valley, Ethiopia) could profoundly alter our understanding of human evolution. Butchering damage on the Dikika bones would imply that tool-assisted meat-eating began approximately 800,000 y before previously thought, based on butchered bones from 2.6- to 2.5-million-y-old sites at the Ethiopian Gona and Bouri localities. Further, the only hominin currently known from Dikika at approximately 3.4 Ma is Australopithecus afarensis, a temporally and geographically widespread species unassociated previously with any archaeological evidence of butchering. Our taphonomic configurational approach to assess the claims of A. afarensis butchery at Dikika suggests the claims of unexpectedly early butchering at the site are not warranted. The Dikika research group focused its analysis on the morphology of the marks in question but failed to demonstrate, through recovery of similarly marked in situ fossils, the exact provenience of the published fossils, and failed to note occurrences of random striae on the cortices of the published fossils (incurred through incidental movement of the defleshed specimens across and/or within their abrasive encasing sediments). The occurrence of such random striae (sometimes called collectively "trampling" damage) on the two fossils provides the configurational context for rejection of the claimed butchery marks. The earliest best evidence for hominin butchery thus remains at 2.6 to 2.5 Ma, presumably associated with more derived species than A. afarensis.

  1. A Bayesian Approach to Identifying New Risk Factors for Dementia

    PubMed Central

    Wen, Yen-Hsia; Wu, Shihn-Sheng; Lin, Chun-Hung Richard; Tsai, Jui-Hsiu; Yang, Pinchen; Chang, Yang-Pei; Tseng, Kuan-Hua

    2016-01-01

    Dementia is one of the most disabling and burdensome health conditions worldwide. In this study, we identified new potential risk factors for dementia from nationwide longitudinal population-based data by using Bayesian statistics. We first tested the consistency of the results obtained using Bayesian statistics with those obtained using classical frequentist probability for 4 recognized risk factors for dementia, namely severe head injury, depression, diabetes mellitus, and vascular diseases. Then, we used Bayesian statistics to verify 2 new potential risk factors for dementia, namely hearing loss and senile cataract, determined from Taiwan's National Health Insurance Research Database. We included a total of 6546 (6.0%) patients diagnosed with dementia. We observed older age, female sex, and lower income as independent risk factors for dementia. Moreover, we verified the 4 recognized risk factors for dementia in the older Taiwanese population; their odds ratios (ORs) ranged from 3.469 to 1.207. Furthermore, we observed that hearing loss (OR = 1.577) and senile cataract (OR = 1.549) were associated with an increased risk of dementia. We found that the results obtained using Bayesian statistics for assessing risk factors for dementia, such as head injury, depression, DM, and vascular diseases, were consistent with those obtained using classical frequentist probability. Moreover, hearing loss and senile cataract were found to be potential risk factors for dementia in the older Taiwanese population. Bayesian statistics could help clinicians explore other potential risk factors for dementia and develop appropriate treatment strategies for these patients. PMID:27227925

  2. A normalized lexical lookup approach to identifying UMLS concepts in free text.

    PubMed

    Bashyam, Vijayaraghavan; Divita, Guy; Bennett, David B; Browne, Allen C; Taira, Ricky K

    2007-01-01

    The National Library of Medicine has developed a tool to identify medical concepts from the Unified Medical Language System in free text. This tool, MetaMap (and its Java version, MMTx), has been used extensively for biomedical text mining applications. We have developed a module for MetaMap that achieves high processing speed. We evaluated our module independently against MetaMap for the task of identifying UMLS concepts in free-text clinical radiology reports. A set of 1000 sentences from neuro-radiology reports was collected and processed using our technique and the MMTx program. An evaluation showed that our technique was able to identify 91% of the concepts found by MMTx in 14% of the time taken by MMTx. An error analysis showed that the missing concepts were largely those which were not direct lexical matches but inferential matches of multiple concepts. Our method also identified multi-phrase concepts which MMTx failed to identify. We suggest that this module be implemented as an option in MMTx for real-time text mining applications where single concepts found in the UMLS need to be identified.
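
    The general idea of a normalized lexical lookup can be sketched as a dictionary keyed on normalized strings. This is not the module described in the abstract and does not use MetaMap or MMTx: the normalization steps (lower-casing, stop-word removal, word sorting), the stop-word list, and the concept identifiers `CX1` and `CX2` are hypothetical placeholders.

```python
import re

STOP_WORDS = {"a", "an", "and", "is", "of", "the", "there", "with"}


def tokenize(text):
    """Lower-case alphanumeric tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOP_WORDS]


def build_index(concepts):
    """Map each term's normalized form (sorted tokens) to its concept id."""
    return {" ".join(sorted(tokenize(term))): cid
            for cid, term in concepts.items()}


def find_concepts(sentence, index, max_len=4):
    """Greedy longest-match lookup of normalized word windows."""
    tokens = tokenize(sentence)
    hits, start = [], 0
    while start < len(tokens):
        for length in range(min(max_len, len(tokens) - start), 0, -1):
            window = tokens[start:start + length]
            key = " ".join(sorted(window))
            if key in index:
                hits.append((index[key], window))
                start += length
                break
        else:
            start += 1  # no match starting here; move on one token
    return hits
```

    Because each candidate window costs one hash lookup, this style of matching is fast, but it only finds direct lexical matches, which is consistent with the error analysis reported in the abstract.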

  3. Order Batching in Warehouses by Minimizing Total Tardiness: A Hybrid Approach of Weighted Association Rule Mining and Genetic Algorithms

    PubMed Central

    Taheri, Shahrooz; Mat Saman, Muhamad Zameri; Wong, Kuan Yew

    2013-01-01

    One of the cost-intensive issues in managing warehouses is the order picking problem which deals with the retrieval of items from their storage locations in order to meet customer requests. Many solution approaches have been proposed in order to minimize traveling distance in the process of order picking. However, in practice, customer orders have to be completed by certain due dates to avoid tardiness, which is neglected in most of the related scientific papers. Consequently, we propose a novel four-phase solution approach to minimize tardiness. First, weighted association rule mining is used to calculate associations between orders with respect to their due dates. Next, a batching model based on binary integer programming is formulated to maximize the associations between orders within each batch. Subsequently, in the order picking phase, a Genetic Algorithm integrated with the Traveling Salesman Problem is used to identify the most suitable travel path. Finally, the Genetic Algorithm is applied to sequence the constructed batches in order to minimize tardiness. Illustrative examples and comparisons are presented to demonstrate the proficiency and solution quality of the proposed approach. PMID:23864823
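
    The objective named in the final phase, total tardiness of a batch sequence, can be sketched as follows. The batch structure (`pick_time`, `due_dates`) is a hypothetical representation, and the exhaustive search over permutations is a stand-in used only because the example is tiny; it is not the Genetic Algorithm described in the abstract.

```python
from itertools import permutations


def total_tardiness(batch_sequence):
    """Sum over all orders of max(0, completion time - due date).

    Every order in a batch is assumed to complete when its batch does.
    """
    clock = 0.0
    total = 0.0
    for batch in batch_sequence:
        clock += batch["pick_time"]
        total += sum(max(0.0, clock - due) for due in batch["due_dates"])
    return total


def best_sequence(batches):
    """Brute-force the sequence with minimum total tardiness (small inputs only)."""
    return min(permutations(batches), key=total_tardiness)
```

    For realistic numbers of batches the permutation count grows factorially, which is the usual motivation for a metaheuristic such as a genetic algorithm.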

  4. Order batching in warehouses by minimizing total tardiness: a hybrid approach of weighted association rule mining and genetic algorithms.

    PubMed

    Azadnia, Amir Hossein; Taheri, Shahrooz; Ghadimi, Pezhman; Saman, Muhamad Zameri Mat; Wong, Kuan Yew

    2013-01-01

    One of the cost-intensive issues in managing warehouses is the order picking problem which deals with the retrieval of items from their storage locations in order to meet customer requests. Many solution approaches have been proposed in order to minimize traveling distance in the process of order picking. However, in practice, customer orders have to be completed by certain due dates to avoid tardiness, which is neglected in most of the related scientific papers. Consequently, we propose a novel four-phase solution approach to minimize tardiness. First, weighted association rule mining is used to calculate associations between orders with respect to their due dates. Next, a batching model based on binary integer programming is formulated to maximize the associations between orders within each batch. Subsequently, in the order picking phase, a Genetic Algorithm integrated with the Traveling Salesman Problem is used to identify the most suitable travel path. Finally, the Genetic Algorithm is applied to sequence the constructed batches in order to minimize tardiness. Illustrative examples and comparisons are presented to demonstrate the proficiency and solution quality of the proposed approach.

  5. An Approach to Identify Site Response Directivity of Accelerometer Sites and Application to the Iranian Area

    NASA Astrophysics Data System (ADS)

    Del Gaudio, Vincenzo; Pierri, Pierpaolo; Rajabi, Ali M.

    2015-06-01

    In recent years, several workers have found numerous cases of sites characterised by significant azimuthal variation of dynamic response to seismic shaking. The causes of this phenomenon are still unclear, but are possibly related to combinations of geological and geomorphological factors determining a polarisation of resonance effects. To better understand these causes, it would be desirable to extend the database of observations on this phenomenon. Thus, considering that unrevealed cases of site response directivity can be "hidden" among the sites of accelerometer networks, we developed a two-stage approach of data mining from existing strong motion databases to identify sites affected by directional amplification. The proposed procedure first calculates Arias Intensity tensor components from accelerometer recordings of each site to determine mean directional variations of total shaking energy. Then, at the sites where a significant anisotropy appears in ground motion, azimuthal variations of HVSR values (spectral ratios between horizontal and vertical components of recordings) are analysed to confirm the occurrence of site resonance conditions. We applied this technique to a database of recordings acquired by accelerometer stations in the Iranian area. The results of this investigation pointed out some sites affected by directional resonance that appear to be correlated to the orientation of local tectonic lineaments, these being mostly transversal to the direction of maximum shaking. Comparing Arias Intensities observed at these sites with theoretical estimates provided by ground motion prediction equations, the presence of significant site amplifications was confirmed. The magnitude of the amplification factors appears to be correlated to the results of HVSR analysis, even though the pattern of dispersion of HVSR values suggests that while high peak values of spectral ratios are indicative of strong amplifications, lower values do not necessarily imply lower
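
    The first stage, computing Arias Intensity tensor components and their variation with azimuth, can be illustrated with a minimal sketch for two horizontal components. It uses the standard definition I = π/(2g) ∫ a(t)² dt with a simple rectangle-rule sum; the function names, the restriction to the horizontal plane, and the 1-degree azimuth scan are assumptions, not details from the paper.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def arias_tensor(acc_north, acc_east, dt):
    """Horizontal Arias Intensity tensor components (I_nn, I_ne, I_ee), in m/s.

    acc_north and acc_east are equally sampled accelerations in m/s^2.
    """
    k = math.pi / (2 * G) * dt
    i_nn = k * sum(a * a for a in acc_north)
    i_ne = k * sum(a * b for a, b in zip(acc_north, acc_east))
    i_ee = k * sum(b * b for b in acc_east)
    return i_nn, i_ne, i_ee


def directional_intensity(tensor, azimuth_deg):
    """Arias Intensity of the motion projected onto a given azimuth."""
    i_nn, i_ne, i_ee = tensor
    theta = math.radians(azimuth_deg)
    c, s = math.cos(theta), math.sin(theta)
    return i_nn * c * c + 2 * i_ne * s * c + i_ee * s * s


def max_direction(tensor, step_deg=1):
    """Azimuth in [0, 180) degrees with the largest directional intensity."""
    return max(range(0, 180, step_deg),
               key=lambda az: directional_intensity(tensor, az))
```

    Comparing the maximum and minimum of `directional_intensity` over azimuth gives one simple measure of anisotropy that could flag a site for the second-stage HVSR analysis.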

  6. Assessing the pollution potential of non-point mine wastes on surface water using a geo-spatial modeling approach

    NASA Astrophysics Data System (ADS)

    Xiao, Huaguo

    Abandoned mine lands (or inactive and abandoned mines) have received increasing attention because they may cause severe environmental and public health problems. Most previous studies characterizing mine waste pollution potential focused on screening-level investigations. The issues related to pollution potential of mine waste were poorly addressed from the perspective of non-point source pollution, and few efforts have been made to study the effect of spatial characteristics of mine wastes on water quality using spatial technology such as GIS, remote sensing and spatial modeling. This research develops a geo-spatial approach to assessing mine waste pollution on surface water, which integrates GIS, remote sensing and watershed modeling techniques in order to effectively address the effects of spatial characteristics of pollutants. The study area is the Tri-State Mining District, which is located at the junction of Missouri, Kansas and Oklahoma. This district was the most important lead and zinc mining area in the U.S. The historic mining left behind a huge area of mine wastes. Satellite remote sensing data (Landsat MSS and TM) were acquired, processed and classified in a decadal interval to generate land use/land cover (LULC) data for the entire district. Watersheds within the district were delineated by using USGS DEM data and a newly-developed GIS tool. Water quality indicators were selected and relevant water quality data between 1970 and 2002 was retrieved from USGS and USEPA databases. With the classified LULC data as a data source, landscape metrics (composition and spatial configuration indices) for each water quality station in mine waste-located watersheds were calculated. Statistical analyses were performed to quantify the relationship between landscape and surface water quality and to evaluate the impacts of landscape characteristics on surface water quality. Related GIS data layers were then created and a cell-based watershed modeling was conducted

  7. Identifying and overcoming the constraints that prevent the full implementation of decommissioning and remediation programs in uranium mining sites.

    PubMed

    Franklin, Mariza Ramalho; Fernandes, Horst Monken

    2013-05-01

    Environmental remediation of radioactive contamination is about achieving appropriate reduction of exposures to ionizing radiation. This goal can be achieved by means of isolation or removal of the contamination source(s) or by breaking the exposure pathways. Ideally, environmental remediation is part of the planning phase of any industrial operation with the potential to cause environmental contamination. This concept is even more important in mining operations due to the significant impacts produced. This approach has not been considered in several operations developed in the past. Therefore many legacy sites face the challenge to implement appropriate remediation plans. One of the first barriers to remediation works is the lack of financial resources as environmental issues used to be taken in the past as marginal costs and were not included in the overall budget of the company. This paper analyses the situation of the former uranium production site of Poços de Caldas in Brazil. It is demonstrated that in addition to the lack of resources, other barriers such as the lack of information on site characteristics, appropriate regulatory framework, funding mechanisms, stakeholder involvement, policy and strategy, technical experience and mechanism for the appropriation of adequate technical expertise will play key roles in preventing the implementation of remediation programs. All these barriers are discussed and some solutions are suggested. It is expected that lessons learned from the Poços de Caldas legacy site may stimulate advancement of more sustainable options in the development of future uranium production centers. Copyright © 2011 Elsevier Ltd. All rights reserved.

  8. Development and testing of a text-mining approach to analyse patients' comments on their experiences of colorectal cancer care.

    PubMed

    Wagland, Richard; Recio-Saucedo, Alejandra; Simon, Michael; Bracher, Michael; Hunt, Katherine; Foster, Claire; Downing, Amy; Glaser, Adam; Corner, Jessica

    2016-08-01

    Quality of cancer care may greatly impact on patients' health-related quality of life (HRQoL). Free-text responses to patient-reported outcome measures (PROMs) provide rich data but analysis is time and resource-intensive. This study developed and tested a learning-based text-mining approach to facilitate analysis of patients' experiences of care and develop an explanatory model illustrating impact on HRQoL. Respondents to a population-based survey of colorectal cancer survivors provided free-text comments regarding their experience of living with and beyond cancer. An existing coding framework was tested and adapted, which informed learning-based text mining of the data. Machine-learning algorithms were trained to identify comments relating to patients' specific experiences of service quality, which were verified by manual qualitative analysis. Comparisons between coded retrieved comments and a HRQoL measure (EQ5D) were explored. The survey response rate was 63.3% (21 802/34 467), of which 25.8% (n=5634) participants provided free-text comments. Of retrieved comments on experiences of care (n=1688), over half (n=1045, 62%) described positive care experiences. Most negative experiences concerned a lack of post-treatment care (n=191, 11% of retrieved comments) and insufficient information concerning self-management strategies (n=135, 8%) or treatment side effects (n=160, 9%). Associations existed between HRQoL scores and coded algorithm-retrieved comments. Analysis indicated that the mechanism by which service quality impacted on HRQoL was the extent to which services prevented or alleviated challenges associated with disease and treatment burdens. Learning-based text mining techniques were found useful and practical tools to identify specific free-text comments within a large dataset, facilitating resource-efficient qualitative analysis. This method should be considered for future PROM analysis to inform policy and practice. Study findings indicated that

  9. The Forestry Reclamation Approach: guide to successful reforestation of mined lands

    Treesearch

    Mary Beth Adams

    2017-01-01

    Appalachian forests are among the most productive and diverse in the world. The land underlying them is also rich in coal, and surface mines operated on more than 2.4 million acres in the region from 1977, when the federal Surface Mining Control and Reclamation Act was passed, through 2015. Many efforts to reclaim mined lands most often resulted in the establishment of...

  10. Use of lead isotopes to identify sources of metal and metalloid contaminants in atmospheric aerosol from mining operations.

    PubMed

    Félix, Omar I; Csavina, Janae; Field, Jason; Rine, Kyle P; Sáez, A Eduardo; Betterton, Eric A

    2015-03-01

    Mining operations are a potential source of metal and metalloid contamination by atmospheric particulate generated from smelting activities, as well as from erosion of mine tailings. In this work, we show how lead isotopes can be used for source apportionment of metal and metalloid contaminants from the site of an active copper mine. Analysis of atmospheric aerosol shows two distinct isotopic signatures: one prevalent in fine particles (<1μm aerodynamic diameter) while the other corresponds to coarse particles as well as particles in all size ranges from a nearby urban environment. The lead isotopic ratios found in the fine particles are equal to those of the mine that provides the ore to the smelter. Topsoil samples at the mining site show concentrations of Pb and As decreasing with distance from the smelter. Isotopic ratios for the sample closest to the smelter (650m) and from topsoil at all sample locations, extending to more than 1km from the smelter, were similar to those found in fine particles in atmospheric dust. The results validate the use of lead isotope signatures for source apportionment of metal and metalloid contaminants transported by atmospheric particulate. Copyright © 2014 Elsevier Ltd. All rights reserved.

  11. Use of Lead Isotopes to Identify Sources of Metal and Metalloid Contaminants in Atmospheric Aerosol from Mining Operations

    PubMed Central

    Félix, Omar I.; Csavina, Janae; Field, Jason; Rine, Kyle P.; Sáez, A. Eduardo; Betterton, Eric A.

    2014-01-01

    Mining operations are a potential source of metal and metalloid contamination by atmospheric particulate generated from smelting activities, as well as from erosion of mine tailings. In this work, we show how lead isotopes can be used for source apportionment of metal and metalloid contaminants from the site of an active copper mine. Analysis of atmospheric aerosol shows two distinct isotopic signatures: one prevalent in fine particles (< 1 μm aerodynamic diameter) while the other corresponds to coarse particles as well as particles in all size ranges from a nearby urban environment. The lead isotopic ratios found in the fine particles are equal to those of the mine that provides the ore to the smelter. Topsoil samples at the mining site show concentrations of Pb and As decreasing with distance from the smelter. Isotopic ratios for the sample closest to the smelter (650 m) and from topsoil at all sample locations, extending to more than 1 km from the smelter, were similar to those found in fine particles in atmospheric dust. The results validate the use of lead isotope signatures for source apportionment of metal and metalloid contaminants transported by atmospheric particulate. PMID:25496740

  12. Using a Text-Mining Approach to Evaluate the Quality of Nursing Records.

    PubMed

    Chang, Hsiu-Mei; Chiou, Shwu-Fen; Liu, Hsiu-Yun; Yu, Hui-Chu

    2016-01-01

    Nursing records in Taiwan have been computerized, but their quality has rarely been discussed. Therefore, this study employed a text-mining approach and a cross-sectional retrospective research design to evaluate the quality of electronic nursing records at a medical center in Northern Taiwan. SAS Text Miner software Version 13.2 was employed to analyze unstructured nursing event records. The results show that SAS Text Miner is suitable for developing a text-mining model for validating nursing records. The sensitivity of SAS Text Miner was approximately 0.94, and the specificity and accuracy were 0.99. Thus, SAS Text Miner software is an effective tool for auditing unstructured electronic nursing records.
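
    The sensitivity, specificity and accuracy figures quoted above are all derived from a confusion matrix. The following is a minimal sketch of that arithmetic; the function name and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of true problem records that were flagged
    specificity = tn / (tn + fp)   # share of acceptable records that were left unflagged
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Hypothetical counts chosen only for illustration (not the study's data).
sens, spec, acc = classification_metrics(tp=94, fp=9, tn=891, fn=6)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```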

  13. A data mining approach to predict in situ chlorinated ethene detoxification potential

    NASA Astrophysics Data System (ADS)

    Lee, J.; Im, J.; Kim, U.; Loeffler, F. E.

    2015-12-01

    Despite major advances in physicochemical remediation technologies, in situ biostimulation and bioaugmentation treatment aimed at stimulating Dehalococcoides mccartyi (Dhc) reductive dechlorination activity remains a cornerstone approach to remedy sites impacted with chlorinated ethenes. In practice, selecting the best remedial strategy is challenging due to uncertainties associated with the microbiology (e.g., presence and activity of Dhc) and geochemical factors influencing Dhc activity. Extensive groundwater datasets collected over decades of monitoring exist, but have not been systematically analyzed. In the present study, geochemical and microbial data sets collected from 35 wells at 5 contaminated sites were used to develop a predictive empirical model using a machine learning algorithm (i) to rank the relative importance of parameters that affect in situ reductive dechlorination potential, and (ii) to provide recommendations for selecting the optimal remediation strategy at a specific site. Classification and regression tree (CART) analysis was applied, and a representative classification tree model was developed that allowed short-term prediction of dechlorination potential. Indirect indicators for low dissolved oxygen (e.g., low NO3- and NO2-, high Fe2+ and CH4) were the most influential factors for predicting dechlorination potential, followed by total organic carbon content (TOC) and Dhc cell abundance. These findings indicate that machine learning-based data mining techniques applied to groundwater monitoring data can lead to the development of predictive groundwater remediation models. A major need for improving the predictive capabilities of the data mining approach is a curated, up-to-date and comprehensive collection of groundwater monitoring data.
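
    A classification tree of the CART type is grown by repeatedly choosing the split that most reduces class impurity. The sketch below shows that single step for one numeric predictor using Gini impurity. The function names, the nitrate values and the class labels are hypothetical, and the abstract does not say which impurity measure or software the authors used.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split on one feature."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate identical values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [lab for val, lab in pairs if val <= threshold]
        right = [lab for val, lab in pairs if val > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical well data: nitrate (mg/L) versus observed dechlorination potential.
nitrate = [0.2, 0.4, 0.5, 2.0, 3.5, 4.0]
potential = ["high", "high", "high", "low", "low", "low"]
print(best_split(nitrate, potential))
```

    A full tree applies the same search to every predictor at every node and recurses on the two resulting subsets until a stopping rule is met.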

  14. A Comparison of Educational Statistics and Data Mining Approaches to Identify Characteristics That Impact Online Learning

    ERIC Educational Resources Information Center

    Miller, L. Dee; Soh, Leen-Kiat; Samal, Ashok; Kupzyk, Kevin; Nugent, Gwen

    2015-01-01

    Learning objects (LOs) are important online resources for both learners and instructors and usage for LOs is growing. Automatic LO tracking collects large amounts of metadata about individual students as well as data aggregated across courses, learning objects, and other demographic characteristics (e.g. gender). The challenge becomes identifying…

  15. An Integrative data mining approach to identifying Adverse Outcome Pathway (AOP) Signatures

    EPA Science Inventory

    The Adverse Outcome Pathway (AOP) framework is a tool for making biological connections and summarizing key information across different levels of biological organization to connect biological perturbations at the molecular level to adverse outcomes for an individual or populatio...

  17. Three-dimensional organic Dirac-line materials due to nonsymmorphic symmetry: A data mining approach

    NASA Astrophysics Data System (ADS)

    Geilhufe, R. Matthias; Bouhon, Adrien; Borysov, Stanislav S.; Balatsky, Alexander V.

    2017-01-01

    A data mining study of electronic Kohn-Sham band structures was performed to identify Dirac materials within the Organic Materials Database. Out of that, the three-dimensional organic crystal 5,6-bis(trifluoromethyl)-2-methoxy-1H-1,3-diazepine was found to host different Dirac-line nodes within the band structure. From a group theoretical analysis, it is possible to distinguish between Dirac-line nodes occurring due to twofold degenerate energy levels protected by the monoclinic crystalline symmetry and twofold degenerate accidental crossings protected by the topology of the electronic band structure. The obtained results can be generalized to all materials having the space group P2₁/c (No. 14, C₂ₕ⁵) by introducing three distinct topological classes.

  18. Multinomial modeling and an evaluation of common data-mining algorithms for identifying signals of disproportionate reporting in pharmacovigilance databases.

    PubMed

    Johnson, Kjell; Guo, Cen; Gosink, Mark; Wang, Vicky; Hauben, Manfred

    2012-12-01

    A principal objective of pharmacovigilance is to detect adverse drug reactions that are unknown or novel in terms of their clinical severity or frequency. One method is through inspection of spontaneous reporting system databases, which consist of millions of reports of patients experiencing adverse effects while taking one or more drugs. For such large databases, there is an increasing need for quantitative and automated screening tools to assist drug safety professionals in identifying drug-event combinations (DECs) worthy of further investigation. Existing algorithms can effectively identify problematic DECs when the frequencies are high. However, these algorithms perform differently for low-frequency DECs. In this work, we provide a method based on the multinomial distribution that identifies signals of disproportionate reporting, especially for low-frequency combinations. In addition, we comprehensively compare the performance of commonly used algorithms with the new approach. Simulation results demonstrate the advantages of the proposed method, and analysis of the Adverse Event Reporting System data shows that the proposed method can help detect interesting signals. Furthermore, we suggest that these methods be used to identify DECs that occur significantly less frequently than expected, thus identifying potential alternative indications for these drugs. We provide an empirical example that demonstrates the importance of exploring underexpected DECs. Code to implement the proposed method is available in R on request from the corresponding authors. Contact: kjell@arboranalytics.com or Mark.M.Gosink@Pfizer.com. Supplementary data are available at Bioinformatics online.
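
    To make "disproportionate reporting" concrete, the sketch below computes the proportional reporting ratio (PRR), one widely used disproportionality measure, from a 2x2 table of report counts. This is not the multinomial method proposed in the paper, and the abstract does not name the algorithms it compares; the function names and counts are hypothetical.

```python
import math


def prr(a, b, c, d):
    """Proportional reporting ratio from a 2x2 table of report counts.

    a: drug of interest, event of interest
    b: drug of interest, all other events
    c: all other drugs, event of interest
    d: all other drugs, all other events
    """
    return (a / (a + b)) / (c / (c + d))


def prr_confidence_interval(a, b, c, d, z=1.96):
    """Approximate 95% interval computed on the log scale."""
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    log_prr = math.log(prr(a, b, c, d))
    return math.exp(log_prr - z * se), math.exp(log_prr + z * se)


# Hypothetical counts, for illustration only.
print(prr(20, 980, 100, 98900))
print(prr_confidence_interval(20, 980, 100, 98900))
```

    When `a` is small the standard error grows and the interval widens, which is one way to see why low-frequency DECs are harder to screen. A ratio well below 1 corresponds to the under-reported combinations the abstract mentions.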

  19. The first step in the development of Text Mining technology for Cancer Risk Assessment: identifying and organizing scientific evidence in risk assessment literature.

    PubMed

    Korhonen, Anna; Silins, Ilona; Sun, Lin; Stenius, Ulla

    2009-09-22

    One of the most neglected areas of biomedical Text Mining (TM) is the development of systems based on carefully assessed user needs. We have recently investigated the user needs of an important task yet to be tackled by TM -- Cancer Risk Assessment (CRA). Here we take the first step towards the development of TM technology for the task: identifying and organizing the scientific evidence required for CRA in a taxonomy which is capable of supporting extensive data gathering from biomedical literature. The taxonomy is based on expert annotation of 1297 abstracts downloaded from relevant PubMed journals. It classifies 1742 unique keywords found in the corpus to 48 classes which specify core evidence required for CRA. We report promising results with inter-annotator agreement tests and automatic classification of PubMed abstracts to taxonomy classes. A simple user test is also reported in a near real-world CRA scenario which demonstrates along with other evaluation that the resources we have built are well-defined, accurate, and applicable in practice. We present our annotation guidelines and a tool which we have designed for expert annotation of PubMed abstracts. A corpus annotated for keywords and document relevance is also presented, along with the taxonomy which organizes the keywords into classes defining core evidence for CRA. As demonstrated by the evaluation, the materials we have constructed provide a good basis for classification of CRA literature along multiple dimensions. They can support current manual CRA as well as facilitate the development of an approach based on TM. We discuss extending the taxonomy further via manual and machine learning approaches and the subsequent steps required to develop TM technology for the needs of CRA.

  20. The first step in the development of text mining technology for cancer risk assessment: identifying and organizing scientific evidence in risk assessment literature

    PubMed Central

    Korhonen, Anna; Silins, Ilona; Sun, Lin; Stenius, Ulla

    2009-01-01

    Background: One of the most neglected areas of biomedical Text Mining (TM) is the development of systems based on carefully assessed user needs. We have recently investigated the user needs of an important task yet to be tackled by TM -- Cancer Risk Assessment (CRA). Here we take the first step towards the development of TM technology for the task: identifying and organizing the scientific evidence required for CRA in a taxonomy which is capable of supporting extensive data gathering from biomedical literature. Results: The taxonomy is based on expert annotation of 1297 abstracts downloaded from relevant PubMed journals. It classifies 1742 unique keywords found in the corpus to 48 classes which specify core evidence required for CRA. We report promising results with inter-annotator agreement tests and automatic classification of PubMed abstracts to taxonomy classes. A simple user test is also reported in a near real-world CRA scenario which demonstrates along with other evaluation that the resources we have built are well-defined, accurate, and applicable in practice. Conclusion: We present our annotation guidelines and a tool which we have designed for expert annotation of PubMed abstracts. A corpus annotated for keywords and document relevance is also presented, along with the taxonomy which organizes the keywords into classes defining core evidence for CRA. As demonstrated by the evaluation, the materials we have constructed provide a good basis for classification of CRA literature along multiple dimensions. They can support current manual CRA as well as facilitate the development of an approach based on TM. We discuss extending the taxonomy further via manual and machine learning approaches and the subsequent steps required to develop TM technology for the needs of CRA. PMID:19772619

  1. Integrating Communication into Engineering Curricula: An Interdisciplinary Approach to Facilitating Transfer at New Mexico Institute of Mining and Technology

    ERIC Educational Resources Information Center

    Ford, Julie Dyke

    2012-01-01

    This program profile describes a new approach towards integrating communication within Mechanical Engineering curricula. The author, who holds a joint appointment between Technical Communication and Mechanical Engineering at New Mexico Institute of Mining and Technology, has been collaborating with Mechanical Engineering colleagues to establish a…

  2. An Approach to Developing Independent Learning and Non-Technical Skills Amongst Final Year Mining Engineering Students

    ERIC Educational Resources Information Center

    Knobbs, C. G.; Grayson, D. J.

    2012-01-01

    There is mounting evidence to show that engineers need more than technical skills to succeed in industry. This paper describes a curriculum innovation in which so-called "soft" skills, specifically inter-personal and intra-personal skills, were integrated into a final year mining engineering course. The instructional approach was…

  4. GWA study data mining and independent replication identify cardiomyopathy-associated 5 (CMYA5) as a risk gene for schizophrenia.

    PubMed

    Chen, X; Lee, G; Maher, B S; Fanous, A H; Chen, J; Zhao, Z; Guo, A; van den Oord, E; Sullivan, P F; Shi, J; Levinson, D F; Gejman, P V; Sanders, A; Duan, J; Owen, M J; Craddock, N J; O'Donovan, M C; Blackman, J; Lewis, D; Kirov, G K; Qin, W; Schwab, S; Wildenauer, D; Chowdari, K; Nimgaonkar, V; Straub, R E; Weinberger, D R; O'Neill, F A; Walsh, D; Bronstein, M; Darvasi, A; Lencz, T; Malhotra, A K; Rujescu, D; Giegling, I; Werge, T; Hansen, T; Ingason, A; Nöethen, M M; Rietschel, M; Cichon, S; Djurovic, S; Andreassen, O A; Cantor, R M; Ophoff, R; Corvin, A; Morris, D W; Gill, M; Pato, C N; Pato, M T; Macedo, A; Gurling, H M D; McQuillin, A; Pimm, J; Hultman, C; Lichtenstein, P; Sklar, P; Purcell, S M; Scolnick, E; St Clair, D; Blackwood, D H R; Kendler, K S

    2011-11-01

    We conducted data-mining analyses using the Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) and molecular genetics of schizophrenia genome-wide association study supported by the genetic association information network (MGS-GAIN) schizophrenia data sets and performed bioinformatic prioritization for all the markers with P-values ≤0.05 in both data sets. In this process, we found that in the CMYA5 gene, there were two non-synonymous markers, rs3828611 and rs10043986, showing nominal significance in both the CATIE and MGS-GAIN samples. In a combined analysis of both the CATIE and MGS-GAIN samples, rs4704591 was identified as the most significant marker in the gene. Linkage disequilibrium analyses indicated that these markers were in low LD (rs3828611-rs10043986, r²=0.008; rs10043986-rs4704591, r²=0.204). In addition, CMYA5 was reported to be physically interacting with the DTNBP1 gene, a promising candidate for schizophrenia, suggesting that CMYA5 may be involved in the same biological pathway and process. On the basis of this information, we performed replication studies for these three single-nucleotide polymorphisms. The rs3828611 was found to have conflicting results in our Irish samples and was dropped out without further investigation. The other two markers were verified in 23 other independent data sets. In a meta-analysis of all 23 replication samples (family samples, 912 families with 4160 subjects; case-control samples, 11 380 cases and 15 021 controls), we found that both markers are significantly associated with schizophrenia (rs10043986, odds ratio (OR)=1.11, 95% confidence interval (CI)=1.04-1.18, P=8.2 × 10⁻⁴ and rs4704591, OR=1.07, 95% CI=1.03-1.11, P=3.0 × 10⁻⁴). The results were also significant for the 22 Caucasian replication samples (rs10043986, OR=1.11, 95% CI=1.03-1.17, P=0.0026 and rs4704591, OR=1.07, 95% CI=1.02-1.11, P=0.0015). Furthermore, haplotype conditioned analyses indicated that the association

  5. GWA study data mining and independent replication identify cardiomyopathy-associated 5 (CMYA5) as a risk gene for schizophrenia

    PubMed Central

    Chen, X; Lee, G; Maher, BS; Fanous, AH; Chen, J; Zhao, Z; Guo, A; van den Oord, E; Sullivan, PF; Shi, J; Levinson, DF; Gejman, PV; Sanders, A; Duan, J; Owen, MJ; Craddock, NJ; O’Donovan, MC; Blackman, J; Lewis, D; Kirov, GK; Qin, W; Schwab, S; Wildenauer, D; Chowdari, K; Nimgaonkar, V; Straub, RE; Weinberger, DR; O’Neill, FA; Walsh, D; Bronstein, M; Darvasi, A; Lencz, T; Malhotra, AK; Rujescu, D; Giegling, I; Werge, T; Hansen, T; Ingason, A; Nöethen, MM; Rietschel, M; Cichon, S; Djurovic, S; Andreassen, OA; Cantor, RM; Ophoff, R; Corvin, A; Morris, DW; Gill, M; Pato, CN; Pato, MT; Macedo, A; Gurling, HMD; McQuillin, A; Pimm, J; Hultman, C; Lichtenstein, P; Sklar, P; Purcell, SM; Scolnick, E; St Clair, D; Blackwood, DHR; Kendler, KS

    2012-01-01

    We conducted data-mining analyses using the Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) and molecular genetics of schizophrenia genome-wide association study supported by the genetic association information network (MGS-GAIN) schizophrenia data sets and performed bioinformatic prioritization for all the markers with P-values ≤0.05 in both data sets. In this process, we found that in the CMYA5 gene, there were two non-synonymous markers, rs3828611 and rs10043986, showing nominal significance in both the CATIE and MGS-GAIN samples. In a combined analysis of both the CATIE and MGS-GAIN samples, rs4704591 was identified as the most significant marker in the gene. Linkage disequilibrium analyses indicated that these markers were in low LD (rs3828611–rs10043986, r² = 0.008; rs10043986–rs4704591, r² = 0.204). In addition, CMYA5 was reported to be physically interacting with the DTNBP1 gene, a promising candidate for schizophrenia, suggesting that CMYA5 may be involved in the same biological pathway and process. On the basis of this information, we performed replication studies for these three single-nucleotide polymorphisms. The rs3828611 was found to have conflicting results in our Irish samples and was dropped out without further investigation. The other two markers were verified in 23 other independent data sets. In a meta-analysis of all 23 replication samples (family samples, 912 families with 4160 subjects; case–control samples, 11 380 cases and 15 021 controls), we found that both markers are significantly associated with schizophrenia (rs10043986, odds ratio (OR) = 1.11, 95% confidence interval (CI) = 1.04–1.18, P = 8.2 × 10⁻⁴ and rs4704591, OR = 1.07, 95% CI = 1.03–1.11, P = 3.0 × 10⁻⁴). The results were also significant for the 22 Caucasian replication samples (rs10043986, OR = 1.11, 95% CI = 1.03–1.17, P = 0.0026 and rs4704591, OR = 1.07, 95% CI = 1.02–1.11, P = 0.0015). Furthermore, haplotype conditioned analyses

  6. The Voice of Chinese Health Consumers: A Text Mining Approach to Web-Based Physician Reviews

    PubMed Central

    Zhang, Kunpeng

    2016-01-01

    experience of finding doctors, doctors’ technical skills and bedside manner, general appreciation from patients, and description of various symptoms. Conclusions: To the best of our knowledge, our work is the first study using an automated text-mining approach to analyze a large amount of unstructured textual data of Web-based physician reviews in China. Based on our analysis, we found that Chinese reviewers mainly concentrate on a few popular topics. This is consistent with the goal of Chinese online health platforms and demonstrates the health care focus in China’s health care system. Our text-mining approach reveals a new research area on how to use big data to help health care providers, health care administrators, and policy makers hear patient voices, target patient concerns, and improve the quality of care in this age of patient-centered care. Also, on the health care consumer side, our text mining technique helps patients make more informed decisions about which specialists to see without reading thousands of reviews, which is simply not feasible. In addition, our comparison analysis of Web-based physician reviews in China and the United States also indicates some cultural differences. PMID:27165558

  7. The Voice of Chinese Health Consumers: A Text Mining Approach to Web-Based Physician Reviews.

    PubMed

    Hao, Haijing; Zhang, Kunpeng

    2016-05-10

    skills and bedside manner, general appreciation from patients, and description of various symptoms. To the best of our knowledge, our work is the first study using an automated text-mining approach to analyze a large amount of unstructured textual data of Web-based physician reviews in China. Based on our analysis, we found that Chinese reviewers mainly concentrate on a few popular topics. This is consistent with the goal of Chinese online health platforms and demonstrates the health care focus in China's health care system. Our text-mining approach reveals a new research area on how to use big data to help health care providers, health care administrators, and policy makers hear patient voices, target patient concerns, and improve the quality of care in this age of patient-centered care. Also, on the health care consumer side, our text mining technique helps patients make more informed decisions about which specialists to see without reading thousands of reviews, which is simply not feasible. In addition, our comparison analysis of Web-based physician reviews in China and the United States also indicates some cultural differences.

  8. Model-based approach to the detection and classification of mines in sidescan sonar.

    PubMed

    Reed, Scott; Petillot, Yvan; Bell, Judith

    2004-01-10

    This paper presents a model-based approach to mine detection and classification by use of sidescan sonar. Advances in autonomous underwater vehicle technology have increased the interest in automatic target recognition systems in an effort to automate a process that is currently carried out by a human operator. Current automated systems generally require training and thus produce poor results when the test data set is different from the training set. This has led to research into unsupervised systems, which are able to cope with the large variability in conditions and terrains seen in sidescan imagery. The system presented in this paper first detects possible minelike objects using a Markov random field model, which operates well on noisy images, such as sidescan, and allows a priori information to be included through the use of priors. The highlight and shadow regions of the object are then extracted with a cooperating statistical snake, which assumes these regions are statistically separate from the background. Finally, a classification decision is made using Dempster-Shafer theory, where the extracted features are compared with synthetic realizations generated with a sidescan sonar simulator model. Results for the entire process are shown on real sidescan sonar data. Similarities between the sidescan sonar and synthetic aperture radar (SAR) imaging processes ensure that the approach outlined here could be applied to SAR image analysis.

  9. Stream Response to Storm Events Downstream of Mine Tailings: Identifying Contaminant Sources Using Hydrograph Separation and Stream Chemistry

    NASA Astrophysics Data System (ADS)

    Holmes, J.; Renshaw, C. E.; Feng, X.

    2001-05-01

    Quantifying sources of contamination is paramount to good remediation plans at abandoned mine sites. We collected surface water samples from Copperas Brook, a second order stream draining over 16 ha (40 acres) of mine tailings from the abandoned Elizabeth Copper Mine in east central Vermont. Streamflow exhibits a rapid response to rain events. Hydrograph separations using oxygen isotopes consistently indicate considerably higher percentages of new water during rain events compared to a nearby control catchment and to other northeastern U.S. catchments. We attribute most of the new water to direct precipitation on low-infiltration hardpans at the base of the mine tailings, as well as to direct precipitation on to the stream channel itself. In stormflow, base cations (Ca, Mg, Na, K) are diluted, consistent with other studies. By contrast, heavy metal concentrations (Cu, Zn, Cd, Co) increase by up to an order of magnitude. Other studies have suggested that the increased metals in stormflow may be the result of rapid dissolution and transport of the soluble efflorescent sulfate minerals coating the hardpans. Copperas Brook could be highly susceptible to this process given the high percentage of new water in its stormflow. However, multiple regression of stormflow chemical source end-members shows that neither dissolved sulfur salts nor groundwater seeps from the major tailings pile are primarily responsible for the increased metals concentrations at this site. Rather, the majority of heavy metals derive from an isolated 2 ha (5 acres) tailings pile via a pathway that is not connected with the major tailings. This may have profound implications for prioritizing the remediation of this site.
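
    Isotope-based hydrograph separation of the kind described above is commonly done with a two-component mixing model, in which the new-water fraction is (C_stream - C_old) / (C_new - C_old). The sketch below assumes that standard model; the abstract does not give the authors' exact formulation, and the function name and δ18O values are hypothetical.

```python
def new_water_fraction(c_stream, c_old, c_new):
    """Fraction of streamflow made up of new (event) water.

    c_stream: tracer value in the stream during the storm
    c_old:    tracer value of pre-event water (for example, baseflow)
    c_new:    tracer value of event water (for example, rainfall)
    """
    if c_new == c_old:
        raise ValueError("end-members must differ for the separation to be defined")
    return (c_stream - c_old) / (c_new - c_old)


# Hypothetical delta-18O values in per mil, for illustration only.
fraction = new_water_fraction(c_stream=-8.0, c_old=-10.0, c_new=-6.0)
print(f"new water fraction: {fraction:.2f}")
```

    The model assumes only two sources with distinct, constant tracer signatures. A result outside the range 0 to 1 usually signals that this assumption does not hold for the sample.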

  10. Occupational safety risk management in Australian mining.

    PubMed

    Joy, J

    2004-08-01

    In the past 15 years, there has been a major safety improvement in the Australian mining industry. Part of this change can be attributed to the development and application of risk assessment methods. These systematic, team-based techniques identify, assess and control unacceptable risks to people, assets, the environment and production. The outcomes have improved mine management systems. This paper discusses the risk assessment approach applied to equipment design and mining operations, as well as the specific risk assessment methodology. The paper also discusses the reactive side of risk management, incident and accident investigation. Systematic analytical methods have also been adopted by regulatory authorities and mining companies to investigate major losses.

  11. Optimizing data collection for public health decisions: a data mining approach

    PubMed Central

    2014-01-01

    Background: Collecting data can be cumbersome and expensive. Lack of relevant, accurate and timely data for research to inform policy may negatively impact public health. The aim of this study was to test if the careful removal of items from two community nutrition surveys guided by a data mining technique called feature selection, can (a) identify a reduced dataset, while (b) not damaging the signal inside that data. Methods: The Nutrition Environment Measures Surveys for stores (NEMS-S) and restaurants (NEMS-R) were completed on 885 retail food outlets in two counties in West Virginia between May and November of 2011. A reduced dataset was identified for each outlet type using feature selection. Coefficients from linear regression modeling were used to weight items in the reduced datasets. Weighted item values were summed with the error term to compute reduced item survey scores. Scores produced by the full survey were compared to the reduced item scores using a Wilcoxon rank-sum test. Results: Feature selection identified 9 store and 16 restaurant survey items as significant predictors of the score produced from the full survey. The linear regression models built from the reduced feature sets had R² values of 92% and 94% for restaurant and grocery store data, respectively. Conclusions: While there are many potentially important variables in any domain, the most useful set may only be a small subset. The use of feature selection in the initial phase of data collection to identify the most influential variables may be a useful tool to greatly reduce the amount of data needed thereby reducing cost. PMID:24919484
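
    The final comparison step, full-survey scores versus reduced-item scores, can be sketched as a Wilcoxon rank-sum test. This is an illustration only: it uses the large-sample normal approximation without a tie correction to the variance, the function names are hypothetical, and the scores are made up rather than taken from the NEMS data.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (z, p). Assumption: no tie correction is applied to the variance.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                       # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2
    var_w = n1 * n2 * (n1 + n2 + 1) / 12
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided tail probability
    return z, p


# Hypothetical survey scores for the same kind of outlets.
full_scores = [12.0, 18.5, 22.0, 9.5, 30.0, 15.0]
reduced_scores = [11.5, 19.0, 21.0, 10.0, 29.0, 16.0]
z, p = rank_sum_test(full_scores, reduced_scores)
print(f"z = {z:.3f}, p = {p:.3f}")
```

    A large p-value here would be read as no detectable shift between the two score distributions, which is the outcome a reduced survey aims for.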

  12. Standards-based metadata procedures for retrieving data for display or mining utilizing persistent (data-DOI) identifiers.

    PubMed

    Harvey, Matthew J; Mason, Nicholas J; McLean, Andrew; Rzepa, Henry S

    2015-01-01

    We describe three different procedures based on metadata standards for enabling automated retrieval of scientific data from digital repositories utilising the persistent identifier of the dataset with optional specification of the attributes of the data document such as filename or media type. The procedures are demonstrated using the JSmol molecular visualizer as a component of a web page and Avogadro as a stand-alone modelling program. We compare our methods for automated retrieval of data from a standards-compliant data repository with those currently in operation for a selection of existing molecular databases and repositories. Our methods illustrate the importance of adopting a standards-based approach of using metadata declarations to increase access to and discoverability of repository-based data. Graphical abstract.

  13. VALUING ACID MINE DRAINAGE REMEDIATION IN WEST VIRGINIA: A HEDONIC MODELING APPROACH

    EPA Science Inventory

    States with active and abandoned mines face large private and public costs to remediate damage to streams and rivers from acid mine drainage (AMD). Appalachian states have an especially large number of contaminated streams and rivers, and the USGS places AMD as the primary source...

  14. VALUING ACID MINE DRAINAGE REMEDIATION IN WEST VIRGINIA: A HEDONIC MODELING APPROACH INCORPORATING GEOGRAPHIC INFORMATION SYSTEMS

    EPA Science Inventory

    States with active and abandoned mines face large private and public costs to remediate damage to streams and rivers from acid mine drainage (AMD). Appalachian states have an especially large number of contaminated streams and rivers, and the USGS places AMD as the primary source...

  15. A Data Mining Approach to Reveal Representative Collaboration Indicators in Open Collaboration Frameworks

    ERIC Educational Resources Information Center

    Anaya, Antonio R.; Boticario, Jesus G.

    2009-01-01

    Data mining methods are successful in educational environments to discover new knowledge or learner skills or features. Unfortunately, they have not been used in depth with collaboration. We have developed a scalable data mining method, whose objective is to infer information on the collaboration during the collaboration process in a…

  18. Identifying diagnostically-relevant resting state brain functional connectivity in the ventral posterior complex via genetic data mining in autism spectrum disorder.

    PubMed

    Baldwin, Philip R; Curtis, Kaylah N; Patriquin, Michelle A; Wolf, Varina; Viswanath, Humsini; Shaw, Chad; Sakai, Yasunari; Salas, Ramiro

    2016-05-01

    Exome sequencing and copy number variation analyses continue to provide novel insight to the biological bases of autism spectrum disorder (ASD). The growing speed at which massive genetic data are produced causes serious lags in analysis and interpretation of the data. Thus, there is a need to develop systematic genetic data mining processes that facilitate efficient analysis of large datasets. We report a new genetic data mining system, ProcessGeneLists, and integrate a list of ASD-related genes with currently available resources in gene expression and functional connectivity of the human brain. Our data-mining program successfully identified three primary regions of interest (ROIs) in the mouse brain: inferior colliculus, ventral posterior complex of the thalamus (VPC), and parafascicular nucleus (PFn). To understand their pathogenic relevance in ASD, we examined the resting state functional connectivity (RSFC) of the homologous ROIs in human brain with other brain regions that were previously implicated in the neuro-psychiatric features of ASD. Among them, the RSFC of the VPC with the medial frontal gyrus (MFG) was significantly more anticorrelated, whereas the RSFC of the PFn with the globus pallidus was significantly increased in children with ASD compared with healthy children. Moreover, greater values of RSFC between VPC and MFG were correlated with severity index and repetitive behaviors in children with ASD. No significant RSFC differences were detected in adults with ASD. Together, these data demonstrate the utility of our data-mining program through identifying the aberrant connectivity of thalamo-cortical circuits in children with ASD. Autism Res 2016, 9: 553-562. © 2015 International Society for Autism Research, Wiley Periodicals, Inc.

  19. Combined Proteomic-Molecular Epidemiology Approach to Identify Precision Targets in Brain Cancer.

    PubMed

    Mostovenko, Ekaterina; Liu, Yanhong; Amirian, E Susan; Tsavachidis, Spiridon; Armstrong, Georgina N; Bondy, Melissa L; Nilsson, Carol L

    2017-07-11

    Primary brain tumors are predominantly malignant gliomas. Grade IV astrocytomas (glioblastomas, GBM) are among the most deadly of all tumors; most patients will succumb to their disease within 2 years of diagnosis despite standard of care. The grim outlook for brain tumor patients indicates that novel precision therapeutic targets must be identified. Our hypothesis is that the cancer proteomes of glioma tumors may contain protein variants that are linked to the aggressive pathology of the disease. To this end, we devised a novel workflow that combined variant proteomics with molecular epidemiological mining of public cancer data sets to identify 10 previously unrecognized variants linked to the risk of death in low grade glioma or GBM. We hypothesize that a subset of the protein variants may be successfully developed in the future as novel targets for malignant gliomas.

  20. The impact of vascular diameter ratio on hemodialysis maturation time: Evidence from data mining approaches and thermodynamics law

    PubMed Central

    Rezapour, Mohammad; Taran, Somayeh; Balin Parast, Mahmood; Khavanin Zadeh, Morteza

    2016-01-01

    Background: Vascular access (VA) is essential for blood circulation in hemodialysis (HD). An arteriovenous fistula (AVF) is a suitable way to gain VA. An AVF is mature when it can be cannulated for HD. This study aimed to discover the parameters that effectively reduce the time between VA and the start of HD, that is, the maturation time (MT). Methods: Ninety-six patients who underwent AVF creation were selected for this study. A decision tree method based on the CART/C4.5 algorithm, a data mining approach for classification, was used. The vascular diameter ratio (VDR) was computed as VDR = artery diameter / vein diameter. Results: MT was inversely related to VDR in elderly patients, whereas the relation was direct in younger patients. Conclusion: The analysis revealed a Spearman's correlation between vein diameter and MT. MT decreases when the diameters of the vein and artery are close to one another. This study can help surgeons identify high-risk patients likely to have a prolonged MT for HD. PMID:27453889
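
    The two quantities named in this abstract, the vascular diameter ratio and a Spearman correlation with maturation time, can be sketched as follows. The patient values are invented for illustration and the helper names are hypothetical; the study's decision tree step is not reproduced here.

```python
def vascular_diameter_ratio(artery_mm, vein_mm):
    """VDR as defined in the abstract: artery diameter over vein diameter."""
    return artery_mm / vein_mm


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical patients: (artery diameter mm, vein diameter mm, maturation time days).
patients = [(2.4, 2.0, 62), (3.0, 2.0, 55), (2.2, 2.1, 70), (3.3, 1.9, 41), (2.8, 2.3, 58)]
vdr = [vascular_diameter_ratio(a, v) for a, v, _ in patients]
mt = [days for _, _, days in patients]
print(f"Spearman rho between VDR and MT: {spearman(vdr, mt):.2f}")
```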

  1. A network-based approach for semi-quantitative knowledge mining and its application to yield variability

    NASA Astrophysics Data System (ADS)

    Schauberger, Bernhard; Rolinski, Susanne; Müller, Christoph

    2016-12-01

    Variability of crop yields is detrimental for food security. Under climate change its amplitude is likely to increase, thus it is essential to understand the underlying causes and mechanisms. Crop models are the primary tool to project future changes in crop yields under climate change. A systematic overview of drivers and mechanisms of crop yield variability (YV) can thus inform crop model development and facilitate improved understanding of climate change impacts on crop yields. Yet there is a vast body of literature on crop physiology and YV, which makes a prioritization of mechanisms for implementation in models challenging. Therefore this paper takes a novel approach to systematically mine and organize existing knowledge from the literature. The aim is to identify important mechanisms lacking in models, which can help to set priorities in model improvement. We structure knowledge from the literature in a semi-quantitative network. This network consists of complex interactions between growing conditions, plant physiology and crop yield. We utilize the resulting network structure to assign relative importance to causes of YV and related plant physiological processes. As expected, our findings confirm existing knowledge, in particular on the dominant role of temperature and precipitation, but also highlight other important drivers of YV. More importantly, our method allows for identifying the relevant physiological processes that transmit variability in growing conditions to variability in yield. We can identify explicit targets for the improvement of crop models. The network can additionally guide model development by outlining complex interactions between processes and by easily retrieving quantitative information for each of the 350 interactions. We show the validity of our network method as a structured, consistent and scalable dictionary of literature. The method can easily be applied to many other research fields.

  2. Text Influenced Molecular Indexing (TIMI): a literature database mining approach that handles text and chemistry.

    PubMed

    Singh, Suresh B; Hull, Richard D; Fluder, Eugene M

    2003-01-01

    We present an application of a novel methodology called Text Influenced Molecular Indexing (TIMI) to mine the information in the scientific literature. TIMI is an extension of two existing methodologies: (1) Latent Semantic Structure Indexing (LaSSI), a method for calculating chemical similarity using two-dimensional topological descriptors, and (2) Latent Semantic Indexing (LSI), a method for generating correlations between textual terms. The singular value decomposition (SVD) of a feature/object matrix is the fundamental mathematical operation underlying LSI, LaSSI, and TIMI and is used in the identification of associations between textual and chemical descriptors. We present the results of our studies with a database containing 11,571 PubMed/MEDLINE abstracts which show the advantages of merging textual and chemical descriptors over using either text or chemistry alone. Our work demonstrates that searching text-only databases limits retrieved documents to those that explicitly mention compounds by name in the text. Similarly, searching chemistry-only databases can only retrieve those documents that have chemical structures in them. TIMI, however, enables search and retrieval of documents with textual, chemical, and/or text- and chemistry-based queries. Thus, the TIMI system offers a powerful new approach to uncovering the contextual scientific knowledge sought by the medical research community.

  3. Application of machine learning algorithms for clinical predictive modeling: a data-mining approach in SCT.

    PubMed

    Shouval, R; Bondi, O; Mishan, H; Shimoni, A; Unger, R; Nagler, A

    2014-03-01

    Data collected from hematopoietic SCT (HSCT) centers are becoming more abundant and complex owing to the formation of organized registries and incorporation of biological data. Typically, conventional statistical methods are used for the development of outcome prediction models and risk scores. However, these analyses carry inherent properties limiting their ability to cope with large data sets with multiple variables and samples. Machine learning (ML), a field stemming from artificial intelligence, is part of a wider approach for data analysis termed data mining (DM). It enables prediction in complex data scenarios, familiar to practitioners and researchers. Technological and commercial applications are all around us, gradually entering clinical research. In the following review, we would like to expose hematologists and stem cell transplanters to the concepts, clinical applications, strengths and limitations of such methods and discuss current research in HSCT. The aim of this review is to encourage utilization of the ML and DM techniques in the field of HSCT, including prediction of transplantation outcome and donor selection.

  4. MetricForensics: A Multi-Level Approach for Mining Volatile Graphs

    SciTech Connect

    Henderson, Keith; Eliassi-Rad, Tina; Faloutsos, Christos; Akoglu, Leman; Li, Lei; Maruhashi, Koji; Prakash, B. Aditya; Tong, H

    2010-02-08

    Advances in data collection and storage capacity have made it increasingly possible to collect highly volatile graph data for analysis. Existing graph analysis techniques are not appropriate for such data, especially in cases where streaming or near-real-time results are required. An example that has drawn significant research interest is the cyber-security domain, where internet communication traces are collected and real-time discovery of events, behaviors, patterns and anomalies is desired. We propose MetricForensics, a scalable framework for analysis of volatile graphs. MetricForensics combines a multi-level "drill down" approach, a collection of user-selected graph metrics and a collection of analysis techniques. At each successive level, more sophisticated metrics are computed and the graph is viewed at a finer temporal resolution. In this way, MetricForensics scales to highly volatile graphs by only allocating resources for computationally expensive analysis when an interesting event is discovered at a coarser resolution first. We test MetricForensics on three real-world graphs: an enterprise IP trace, a trace of legitimate and malicious network traffic from a research institution, and the MIT Reality Mining proximity sensor data. Our largest graph has ~3M vertices and ~32M edges, spanning 4.5 days. The results demonstrate the scalability and capability of MetricForensics in analyzing volatile graphs; and highlight four novel phenomena in such graphs: elbows, broken correlations, prolonged spikes, and strange stars.
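
    The multi-level "drill down" idea can be sketched in a few lines. This is not the MetricForensics implementation: the choice of metrics (edge count as the cheap metric, maximum vertex degree as the costlier one), the mean-plus-k-standard-deviations trigger, and every name below are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def edge_counts(edges, start, end, width):
    """Cheap metric: number of edges in each time window of the given width."""
    n_windows = (end - start + width - 1) // width
    counts = [0] * n_windows
    for t, _, _ in edges:
        if start <= t < end:
            counts[(t - start) // width] += 1
    return counts


def max_degree(edges, lo, hi):
    """Costlier metric, computed only on demand: largest degree in [lo, hi)."""
    degree = Counter()
    for t, u, v in edges:
        if lo <= t < hi:
            degree[u] += 1
            degree[v] += 1
    return max(degree.values(), default=0)


def drill_down(edges, start, end, coarse, fine, k=2.0):
    """Flag coarse windows with unusual edge counts, then inspect them finely.

    Returns a list of (fine_window_start, max_degree) pairs for flagged windows.
    """
    counts = edge_counts(edges, start, end, coarse)
    threshold = mean(counts) + k * pstdev(counts)
    findings = []
    for i, count in enumerate(counts):
        if count > threshold:                      # interesting at coarse level
            lo = start + i * coarse
            hi = min(lo + coarse, end)
            for fine_lo in range(lo, hi, fine):    # finer temporal resolution
                fine_hi = min(fine_lo + fine, hi)
                findings.append((fine_lo, max_degree(edges, fine_lo, fine_hi)))
    return findings


# Hypothetical trace: one background edge per time step, plus a burst at t = 55
# in which a single "hub" vertex contacts 20 distinct vertices.
trace = [(t, f"n{t}", f"n{t + 1}") for t in range(100)]
trace += [(55, "hub", f"x{i}") for i in range(20)]

for window_start, degree in drill_down(trace, 0, 100, coarse=10, fine=5):
    print(f"window starting at t={window_start}: max degree {degree}")
```

    Only the one coarse window containing the burst is examined at the finer resolution; the other nine windows never incur the cost of the degree computation.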

  5. New approach for reduction of diesel consumption by comparing different mining haulage configurations.

    PubMed

    Rodovalho, Edmo da Cunha; Lima, Hernani Mota; de Tomi, Giorgio

    2016-05-01

    The mining operations of loading and haulage have an energy source that is highly dependent on fossil fuels. In mining companies that select trucks for haulage, this input is the main component of mining costs. How can the impact of the operational aspects on the diesel consumption of haulage operations in surface mines be assessed? There are many studies relating the fuel consumption of trucks to several variables, but a methodology that prioritizes higher-impact variables under each specific condition is not available. Generic models may not apply to all operational settings presented in the mining industry. This study aims to create a method of analysis, identification, and prioritization of variables related to fuel consumption of haul trucks in open pit mines. For this purpose, statistical analysis techniques and mathematical modelling tools using multiple linear regressions will be applied. The model is shown to be suitable because the results generate a good description of the fuel consumption behaviour. In the practical application of the method, the reduction of diesel consumption reached 10%. The implementation requires no large-scale investments or very long deadlines and can be applied to mining haulage operations in other settings.

  6. A Critical Study on the Underground Environment of Coal Mines in India-an Ergonomic Approach

    NASA Astrophysics Data System (ADS)

    Dey, Netai Chandra; Sharma, Gourab Dhara

    2013-04-01

    Applying ergonomics to the health of underground miners plays a major role in miners' efficiency. Work in underground mines is still physically demanding, and continuous stress from certain postures or movements during work leads to localized muscle fatigue and musculoskeletal disorders. A good working environment can change the degree of job heaviness, and thermal stress (WBGT values) can directly affect the stretch of work of miners. Among the many unit operations in an underground mine, roof bolting makes an important contribution to the safety of the mine and miners. The paper discusses the occupational stress of roof bolters from an ergonomic standpoint.

  7. Application of techniques to identify coal-mine and power-generation effects on surface-water quality, San Juan River basin, New Mexico and Colorado

    USGS Publications Warehouse

    Goetz, C.L.; Abeyta, Cynthia G.; Thomas, E.V.

    1987-01-01

    Numerous analytical techniques were applied to determine water quality changes in the San Juan River basin upstream of Shiprock, New Mexico. Eight techniques were used to analyze hydrologic data such as precipitation, water quality, and streamflow. The eight methods used are: (1) Piper diagram, (2) time-series plot, (3) frequency distribution, (4) box-and-whisker plot, (5) seasonal Kendall test, (6) Wilcoxon rank-sum test, (7) SEASRS procedure, and (8) analysis of flow adjusted, specific conductance data and smoothing. Post-1963 changes in dissolved solids concentration, dissolved potassium concentration, specific conductance, suspended sediment concentration, or suspended sediment load in the San Juan River downstream from the surface coal mines were examined to determine if coal mining was having an effect on the quality of surface water. None of the analytical methods used to analyze the data showed any increase in dissolved solids concentration, dissolved potassium concentration, or specific conductance in the river downstream from the mines; some of the analytical methods used showed a decrease in dissolved solids concentration and specific conductance. Chaco River, an ephemeral stream tributary to the San Juan River, undergoes changes in water quality due to effluent from a power generation facility. The discharge in the Chaco River contributes about 1.9% of the average annual discharge at the downstream station, San Juan River at Shiprock, NM. The changes in water quality detected at the Chaco River station were not detected at the downstream Shiprock station. It was not possible, with the available data, to identify any effects of the surface coal mines on water quality that were separable from those of urbanization, agriculture, and other cultural and natural changes. In order to determine the specific causes of changes in water quality, it would be necessary to collect additional data at strategically located stations. (Author's abstract)

  8. Characterizing user engagement with health app data: a data mining approach.

    PubMed

    Serrano, Katrina J; Coa, Kisha I; Yu, Mandi; Wolff-Hughes, Dana L; Atienza, Audie A

    2017-06-01

    The use of mobile health applications (apps) especially in the area of lifestyle behaviors has increased, thus providing unprecedented opportunities to develop health programs that can engage people in real-time and in the real-world. Yet, relatively little is known about which factors relate to the engagement of commercially available apps for health behaviors. This exploratory study examined behavioral engagement with a weight loss app, Lose It! and characterized higher versus lower engaged groups. Cross-sectional, anonymized data from Lose It! were analyzed (n = 12,427,196). This dataset was randomly split into 24 subsamples and three were used for this study (total n = 1,011,008). Classification and regression tree methods were used to identify subgroups of user engagement with one subsample, and descriptive analyses were conducted to examine other group characteristics associated with engagement. Data mining validation methods were conducted with two separate subsamples. On average, users engaged with the app for 29 days. Six unique subgroups were identified, and engagement for each subgroup varied, ranging from 3.5 to 172 days. Highly engaged subgroups were primarily distinguished by the customization of diet and exercise. Those less engaged were distinguished by weigh-ins and the customization of diet. Results were replicated in further analyses. Commercially-developed apps can reach large segments of the population, and data from these apps can provide insights into important app features that may aid in user engagement. Getting users to engage with a mobile health app is critical to the success of apps and interventions that are focused on health behavior change.

  9. DEVELOPMENT AND PERFORMANCE OF TEXT-MINING ALGORITHMS TO EXTRACT SOCIOECONOMIC STATUS FROM DE-IDENTIFIED ELECTRONIC HEALTH RECORDS.

    PubMed

    Hollister, Brittany M; Restrepo, Nicole A; Farber-Eger, Eric; Crawford, Dana C; Aldrich, Melinda C; Non, Amy

    2016-01-01

    Socioeconomic status (SES) is a fundamental contributor to health, and a key factor underlying racial disparities in disease. However, SES data are rarely included in genetic studies due in part to the difficulty of collecting these data when studies were not originally designed for that purpose. The emergence of large clinic-based biobanks linked to electronic health records (EHRs) provides research access to large patient populations with longitudinal phenotype data captured in structured fields as billing codes, procedure codes, and prescriptions. SES data, however, are often not explicitly recorded in structured fields, but rather recorded in the free text of clinical notes and communications. The content and completeness of these data vary widely by practitioner. To enable gene-environment studies that consider SES as an exposure, we sought to extract SES variables from racial/ethnic minority adult patients (n=9,977) in BioVU, the Vanderbilt University Medical Center biorepository linked to de-identified EHRs. We developed several measures of SES using information available within the de-identified EHR, including broad categories of occupation, education, insurance status, and homelessness. Two hundred patients were randomly selected for manual review to develop a set of seven algorithms for extracting SES information from de-identified EHRs. The algorithms consist of 15 categories of information, with 830 unique search terms. SES data extracted from manual review of 50 randomly selected records were compared to data produced by the algorithm, resulting in positive predictive values of 80.0% (education), 85.4% (occupation), 87.5% (unemployment), 63.6% (retirement), 23.1% (uninsured), 81.8% (Medicaid), and 33.3% (homelessness), suggesting some categories of SES data are easier to extract in this EHR than others. The SES data extraction approach developed here will enable future EHR-based genetic studies to integrate SES information into statistical analyses.
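
    The evaluation described above, search-term extraction checked against manual review, can be sketched with a toy positive predictive value (PPV) calculation. The search terms, clinical notes and manual labels below are invented for illustration; they are not the study's 830 terms or its records, and the function names are hypothetical.

```python
def flag_by_search_terms(note, terms):
    """Return True if any search term occurs in the note (case-insensitive)."""
    text = note.lower()
    return any(term in text for term in terms)


def positive_predictive_value(algorithm_flags, manual_flags):
    """PPV = true positives / (true positives + false positives).

    Returns None when the algorithm flags no records at all.
    """
    pairs = list(zip(algorithm_flags, manual_flags))
    true_pos = sum(1 for algo, manual in pairs if algo and manual)
    false_pos = sum(1 for algo, manual in pairs if algo and not manual)
    if true_pos + false_pos == 0:
        return None
    return true_pos / (true_pos + false_pos)


# Hypothetical search terms for an "unemployment" category.
UNEMPLOYMENT_TERMS = ("unemployed", "out of work")

# Hypothetical de-identified notes with manual-review labels
# (True = the patient is unemployed according to the reviewer).
records = [
    ("Patient has been unemployed since last year.", True),
    ("Father was unemployed; patient works as a nurse.", False),
    ("Currently out of work after plant closure.", True),
    ("Works full time as a teacher.", False),
]

algorithm = [flag_by_search_terms(note, UNEMPLOYMENT_TERMS) for note, _ in records]
manual = [label for _, label in records]
ppv = positive_predictive_value(algorithm, manual)
print(f"PPV for unemployment: {ppv:.1%}")
```

    The second note shows why plain term matching can lower PPV: the term appears, but it refers to someone other than the patient.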

  10. Study on perception and control layer of mine CPS with mixed logic dynamic approach

    NASA Astrophysics Data System (ADS)

    Li, Jingzhao; Ren, Ping; Yang, Dayu

    2017-01-01

    The inclined roadway transportation system of a mine cyber-physical system (CPS) is a hybrid system consisting of a continuous-time system and a discrete-time system. It can be divided into an inclined roadway signal subsystem, error-proofing channel subsystems, anti-car subsystems, and frequency control subsystems. First, to ensure stable operation and improve efficiency and production safety, a hybrid system model with n inputs and m outputs is constructed and analyzed in detail, and its steady schedule state is solved. Second, on the basis of formal modeling for real-time systems, we use the hybrid toolbox for system security verification. Third, a practical application shows that the method for real-time simulation of the mine CPS is effective.

  11. A novel approach for acid mine drainage pollution biomonitoring using rare earth elements bioaccumulated in the freshwater clam Corbicula fluminea.

    PubMed

    Bonnail, Estefanía; Pérez-López, Rafael; Sarmiento, Aguasanta M; Nieto, José Miguel; DelValls, T Ángel

    2017-09-15

    The lanthanide series has been used as a record of the water-rock interaction and works as a tool for identifying impacts of acid mine drainage (lixiviate residue derived from sulphide oxidation). The application of North-American Shale Composite-normalized rare earth elements patterns to these minority elements allows determining the origin of the contamination. In the current study, geochemical patterns were applied to rare earth elements bioaccumulated in the soft tissue of the freshwater clam Corbicula fluminea after exposure to different acid mine drainage contaminated environments. Results show significant bioaccumulation of rare earth elements in soft tissue of the clam after 14 days of exposure to acid mine drainage contaminated sediment (ΣREE = 1.3-8 μg/g dw). Furthermore, it was possible to biomonitor different degrees of contamination based on rare earth elements in tissue. The pattern of this type of contamination describes a particular curve characterized by an enrichment in the middle rare earth elements; a homologous pattern (EMREE=0.90) has also been observed when NASC normalization was applied to clam tissues. Results of lanthanides found in clams were contrasted with the paucity of toxicity studies, determining risk caused by light rare earth elements in the Odiel River close to the Estuary. The current study proposes the use of the clam as an innovative "bio-tool" for the biogeochemical monitoring of pollution inputs and of the extent to which acid mine drainage affects river networks. Copyright © 2017 Elsevier B.V. All rights reserved.

  12. Stochastic dynamic optimization approach for revegetation of reclaimed mine soils under uncertain weather regime

    SciTech Connect

    Mustafa, G.

    1989-01-01

    This study presents a comprehensive physically based stochastic dynamic optimization model to assist planners in making decisions concerning mine soil depths and soil mixture ratios required to achieve successful revegetation of mine lands at different probability levels of success, subject to an uncertain weather regime. A perennial grass growth model was modified and validated for predicting vegetation growth in reclaimed mine soils. The plant growth model is based on continuous relationships between plant growth, air temperature, dry length, leaf area, photoperiod and plant-soil-moisture stresses. A plant available soil moisture model was adopted to estimate daily soil moisture for mine soils. A general probability model was developed to estimate the probability of successful revegetation in a 5-year bond release period. The probability model considers five possible bond release criteria in mine soil reclamation planning. A stochastic dynamic optimization model (SDOM) was developed to find the optimum combination of soil depth and soil mixture ratios that met the successful vegetation standard under non-irrigated conditions with weather as the only random element of the system. The SDOM was applied for Wise County, Virginia, and the model found that 2:1 sandstone/siltstone soil mixture required the minimum soil depth to achieve successful revegetation. These results were also supported by field data. The developed model allows the planners to better manage lands drastically disturbed by surface mining.

  13. Identifying Low-Effort Examinees on Student Learning Outcomes Assessment: A Comparison of Two Approaches

    ERIC Educational Resources Information Center

    Rios, Joseph A.; Liu, Ou Lydia; Bridgeman, Brent

    2014-01-01

    This chapter describes a study that compares two approaches (self-reported effort [SRE] and response time effort [RTE]) for identifying low-effort examinees in student learning outcomes assessment. Although both approaches equally discriminated from measures of ability (e.g., SAT scores), RTE was found to have a stronger relationship with test…

  14. Mining DNA microarray data using a novel approach based on graph theory.

    PubMed

    del Rio, G; Bartley, T F; del-Rio, H; Rao, R; Jin, K L; Greenberg, D A; Eshoo, M; Bredesen, D E

    2001-12-07

    The recent demonstration that biochemical pathways from diverse organisms are arranged in scale-free, rather than random, systems [Jeong et al., Nature 407 (2000) 651-654], emphasizes the importance of developing methods for the identification of biochemical nexuses--the nodes within biochemical pathways that serve as the major input/output hubs, and therefore represent potentially important targets for modulation. Here we describe a bioinformatics approach that identifies candidate nexuses for biochemical pathways without requiring functional gene annotation; we also provide proof-of-principle experiments to support this technique. This approach, called Nexxus, may lead to the identification of new signal transduction pathways and targets for drug design.

  15. A Network Biology Approach Identifies Molecular Cross-Talk between Normal Prostate Epithelial and Prostate Carcinoma Cells.

    PubMed

    Trevino, Victor; Cassese, Alberto; Nagy, Zsuzsanna; Zhuang, Xiaodong; Herbert, John; Antczak, Philipp; Clarke, Kim; Davies, Nicholas; Rahman, Ayesha; Campbell, Moray J; Guindani, Michele; Bicknell, Roy; Vannucci, Marina; Falciani, Francesco

    2016-04-01

    The advent of functional genomics has enabled the genome-wide characterization of the molecular state of cells and tissues, virtually at every level of biological organization. The difficulty in organizing and mining this unprecedented amount of information has stimulated the development of computational methods designed to infer the underlying structure of regulatory networks from observational data. These important developments had a profound impact in biological sciences since they triggered the development of a novel data-driven investigative approach. In cancer research, this strategy has been particularly successful. It has contributed to the identification of novel biomarkers, to a better characterization of disease heterogeneity and to a more in depth understanding of cancer pathophysiology. However, so far these approaches have not explicitly addressed the challenge of identifying networks representing the interaction of different cell types in a complex tissue. Since these interactions represent an essential part of the biology of both diseased and healthy tissues, it is of paramount importance that this challenge is addressed. Here we report the definition of a network reverse engineering strategy designed to infer directional signals linking adjacent cell types within a complex tissue. The application of this inference strategy to prostate cancer genome-wide expression profiling data validated the approach and revealed that normal epithelial cells exert an anti-tumour activity on prostate carcinoma cells. Moreover, by using a Bayesian hierarchical model integrating genetics and gene expression data and combining this with survival analysis, we show that the expression of putative cell communication genes related to focal adhesion and secretion is affected by epistatic gene copy number variation and it is predictive of patient survival. 
    Ultimately, this study represents a generalizable approach to the challenge of deciphering cell communication networks

  16. A Network Biology Approach Identifies Molecular Cross-Talk between Normal Prostate Epithelial and Prostate Carcinoma Cells

    PubMed Central

    Trevino, Victor; Cassese, Alberto; Nagy, Zsuzsanna; Zhuang, Xiaodong; Herbert, John; Antczak, Philipp; Clarke, Kim; Davies, Nicholas; Rahman, Ayesha; Campbell, Moray J.; Bicknell, Roy; Vannucci, Marina; Falciani, Francesco

    2016-01-01

    The advent of functional genomics has enabled the genome-wide characterization of the molecular state of cells and tissues, virtually at every level of biological organization. The difficulty in organizing and mining this unprecedented amount of information has stimulated the development of computational methods designed to infer the underlying structure of regulatory networks from observational data. These important developments had a profound impact in biological sciences since they triggered the development of a novel data-driven investigative approach. In cancer research, this strategy has been particularly successful. It has contributed to the identification of novel biomarkers, to a better characterization of disease heterogeneity and to a more in depth understanding of cancer pathophysiology. However, so far these approaches have not explicitly addressed the challenge of identifying networks representing the interaction of different cell types in a complex tissue. Since these interactions represent an essential part of the biology of both diseased and healthy tissues, it is of paramount importance that this challenge is addressed. Here we report the definition of a network reverse engineering strategy designed to infer directional signals linking adjacent cell types within a complex tissue. The application of this inference strategy to prostate cancer genome-wide expression profiling data validated the approach and revealed that normal epithelial cells exert an anti-tumour activity on prostate carcinoma cells. Moreover, by using a Bayesian hierarchical model integrating genetics and gene expression data and combining this with survival analysis, we show that the expression of putative cell communication genes related to focal adhesion and secretion is affected by epistatic gene copy number variation and it is predictive of patient survival.
    Ultimately, this study represents a generalizable approach to the challenge of deciphering cell communication

  17. Nuclear waste repositories in salt mines: a new approach to safety assessment.

    PubMed

    Memmert, G

    1996-08-01

    The long-term safety of radioactive waste repositories in rock-salt mines in the deep underground benefits significantly from the barrier effect of overlying rocks. The concentrations of radioactive substances released from the repository and migrating in the aquifer up to the biosphere are greatly reduced during passage through these rocks. In former safety analyses of waste repositories this transport has generally been modelled as a combination of the involved phenomena, e.g. convection, dispersion, adsorption, etc. The data required for a numerical evaluation of the overall effect are obtained either as (conservative) estimates based on experience or are empirical, based mainly on laboratory experiments. The approach presented here is much simpler and entirely empirical, and therefore more transparent. It makes use of the fact that the groundwater in the overlying rocks always contains dissolved salt from the salt formation and carries it continuously into the receiving channels or the drainage system. The relation between the total amount of dissolved solids present in a certain subsurface catchment area and their steady-state concentration in the receiving channels is assumed to be equivalent to the relation between the given amount of radionuclides released from the repository and their concentration in the receiving channels, the latter leading to a certain radiation exposure of the population. Two versions of this approach are discussed: version (a) assumes a continuous stream of radionuclides released from the repository, and version (b) assumes a pulse release of radionuclides from the repository. A simple calculation using data from the Gorleben exploration leads to the inequality [equation: see text] where Cmax is the maximum radionuclide concentration (with respect to time) in the receiving channels and W (Bq) is the amount of radionuclides released from the repository in a very short time.
    Cmax, obtained from (1), is supposed to be an upper limit of

  18. Improvement Evaluation on Ceramic Roof Extraction Using WORLDVIEW-2 Imagery and Geographic Data Mining Approach

    NASA Astrophysics Data System (ADS)

    Brum-Bastos, V. S.; Ribeiro, B. M. G.; Pinho, C. M. D.; Korting, T. S.; Fonseca, L. M. G.

    2016-06-01

    Advances in geotechnologies and in remote sensing have improved analysis of urban environments. The new sensors are increasingly suited to urban studies, due to the enhancement in spatial, spectral and radiometric resolutions. Urban environments present high heterogeneity, which cannot be tackled using pixel-based approaches on high resolution images. Geographic Object-Based Image Analysis (GEOBIA) has been consolidated as a methodology for urban land use and cover monitoring; however, classification of high resolution images is still troublesome. This study aims to assess the improvement in ceramic roof classification using WorldView-2 images due to the addition of 4 new bands to the standard "Blue-Green-Red-Near Infrared" bands. Our methodology combines GEOBIA, the C4.5 classification tree algorithm, Monte Carlo simulation and statistical tests for classification accuracy. Two sample groups were considered: 1) eight multispectral and panchromatic bands, and 2) four multispectral and panchromatic bands, representing previous high-resolution sensors. The C4.5 algorithm generates a decision tree that can be used for classification; smaller decision trees are closer to the semantic networks produced by experts on GEOBIA, while bigger trees are not straightforward to implement manually but are more accurate. The choice between a big and a small tree depends on the user's skill in implementing it. This study aims to determine for what kind of user the addition of the 4 new bands might be beneficial: 1) the common user (smaller trees) or 2) a more skilled user with coding and/or data mining abilities (bigger trees). Overall, the classification was improved by the addition of the four new bands for both types of users.

  19. Novel approaches to global mining of aberrantly methylated promoter sites in squamous head and neck cancer.

    PubMed

    Worsham, Maria J; Chen, Kang Mei; Stephen, Josena K; Havard, Shaleta; Benninger, Michael S

    2010-07-01

    Promoter hypermethylation is emerging as a promising molecular strategy for early detection of cancer. We examined promoter methylation status of 1143 cancer-associated genes to perform a global but unbiased inspection of methylated regions in head and neck squamous cell carcinoma (HNSCC). Laboratory-based study. Integrated health care system. Five samples, two frozen primary HNSCC biopsies and three HNSCC cell lines, were examined. Whole genomic DNA was interrogated using a combination of DNA immunoprecipitation (IP) and Affymetrix whole-genome tiling arrays. Of the 1143 unique cancer genes on the array, 265 were recorded across five samples. Of the 265 genes, 55 were present in all five samples, and 36 were common to four of five samples, 46 to three of five, 56 to two of five, and 72 to one of five samples. Hypermethylated genes in the five samples were cross-examined against those in PubMeth, a cancer methylation database combining text mining and expert annotation (http://www.pubmeth.org). Of the 441 genes in PubMeth, only 33 are referenced to HNSCC. We matched 34 genes in our samples to the 441 genes in the PubMeth database. Of the 34 genes, eight are reported in PubMeth as HNSCC associated. This pilot study examined the contribution of global DNA hypermethylation to the pathogenesis of HNSCC. The whole-genome methylation approach indicated 231 new genes with methylated promoter regions not yet reported in HNSCC. Examination of this comprehensive gene panel in a larger HNSCC cohort should advance selection of HNSCC-specific candidate genes for further validation as biomarkers in HNSCC. 2010 American Academy of Otolaryngology-Head and Neck Surgery Foundation. Published by Mosby, Inc. All rights reserved.
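
    The tally of genes by the number of samples in which they occur can be sketched with a simple counter. The gene identifiers and sample contents below are made up (hypothetical placeholders, not the study's data), and the helper names are assumptions; the last lines only check that the figures quoted in the abstract are mutually consistent, on the assumption that the 231 new genes are the 265 recorded genes minus the 34 matched to PubMeth.

```python
from collections import Counter

def sample_recurrence(samples):
    """Map each gene to the number of samples in which it was called methylated."""
    counts = Counter()
    for genes in samples.values():
        counts.update(set(genes))  # set() guards against duplicates within a sample
    return counts

def recurrence_histogram(counts):
    """Map 'number of samples' to 'number of genes seen in exactly that many samples'."""
    return Counter(counts.values())

# Hypothetical calls for three samples; gene names are placeholders.
toy_samples = {
    "s1": ["g1", "g2", "g3"],
    "s2": ["g1", "g3", "g4"],
    "s3": ["g1", "g5"],
}
hist = recurrence_histogram(sample_recurrence(toy_samples))
print(dict(hist))

# Consistency check on the counts reported in the abstract.
reported = {5: 55, 4: 36, 3: 46, 2: 56, 1: 72}
total_genes = sum(reported.values())  # 265
novel_genes = total_genes - 34        # 231
```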

  20. Smart-card-based automatic meal record system intervention tool for analysis using data mining approach.

    PubMed

    Zenitani, Satoko; Nishiuchi, Hiromu; Kiuchi, Takahiro

    2010-04-01

    The Smart-card-based Automatic Meal Record system for company cafeterias (AutoMealRecord system) was recently developed and used to monitor employee eating habits. The system could be a unique nutrition assessment tool for automatically monitoring the meal purchases of all employees, although it only focuses on company cafeterias and has never been validated. Before starting an interventional study, we tested the reliability of the data collected by the system using the data mining approach. The AutoMealRecord data were examined to determine if it could predict current obesity. All data used in this study (n = 899) were collected by a major electric company based in Tokyo, which has been operating the AutoMealRecord system for several years. We analyzed dietary patterns by principal component analysis using data from the system and extracted 5 major dietary patterns: healthy, traditional Japanese, Chinese, Japanese noodles, and pasta. The ability to predict current body mass index (BMI) with dietary preference was assessed with multiple linear regression analyses, and in the current study, BMI was positively correlated with male gender, preference for "Japanese noodles," mean energy intake, protein content, and frequency of body measurement at a body measurement booth in the cafeteria. There was a negative correlation with age, dietary fiber, and lunchtime cafeteria use (R² = 0.22). This regression model predicted "would-be obese" participants (BMI ≥ 23) with 68.8% accuracy by leave-one-out cross validation. This shows that there was sufficient predictability of BMI based on data from the AutoMealRecord System. We conclude that the AutoMealRecord system is valuable for further consideration as a health care intervention tool. Copyright 2010 Elsevier Inc. All rights reserved.
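
    Leave-one-out cross validation of a regression used as a threshold classifier can be sketched as follows. This is a simplified illustration, not the study's model: it uses a single predictor instead of multiple linear regression, the data are invented, and the names `fit_line` and `loocv_accuracy` are assumptions. Only the BMI cut-off of 23 comes from the abstract.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def loocv_accuracy(xs, ys, threshold):
    """Leave-one-out accuracy of classifying y >= threshold from the fitted line."""
    hits = 0
    for i in range(len(xs)):
        # Refit on every observation except the i-th, then predict the i-th.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        predicted = a + b * xs[i]
        hits += (predicted >= threshold) == (ys[i] >= threshold)
    return hits / len(xs)

# Hypothetical data: mean energy intake (kcal) and BMI for six people.
energy = [550.0, 600.0, 650.0, 700.0, 750.0, 800.0]
bmi = [20.5, 21.0, 22.4, 23.1, 24.0, 23.8]
accuracy = loocv_accuracy(energy, bmi, threshold=23.0)
print(f"LOOCV accuracy: {accuracy:.3f}")
```

    Because each prediction is made by a model that never saw the held-out person, the resulting accuracy is a less optimistic estimate than one computed on the training data.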

  1. Shifting species ranges and changing phenology: A new approach to mining social media for ecosystems observations

    NASA Astrophysics Data System (ADS)

    Fuka, M. Z.; Osborne-Gowey, J. D.; Fuka, D. R.

    2013-12-01

    Geoscientists & ecologists are increasingly using social media to solicit 'citizen scientists' to participate in the data collection process. However, social media users are also a largely untapped resource of spontaneous, unsolicited observations of the natural world. Of particular interest are observations of species phenology & range to better develop a predictive understanding of how ecosystems are affected by a changing climate and human-mediated influences. Social media users' observations include information on phenological & biological phenomena such as flowers blooming, native & invasive species sightings, unusual behaviors, animal tracks, droppings, damage, feeding, nesting, etc. Our AGU2011 pilot study on the North American armadillo suggests that useful observational data can be extracted from Twitter to map current species ranges to compare with past ranges. We have expanded that work by mining Twitter for a number of North American species and ecosystem observations to determine usefulness for environmental applications such as: 1) supplementing existing databases, 2) identifying outlier phenomena, 3) guiding additional crowd-sourced studies and data collection efforts, 4) recruiting citizen scientists, 5) gauging sentiment about the observations and 6) informing ecosystems policy-making and education. We present the results for our evaluation of a representative sample from a list of 200+ species for which we've collected data since August 2011. Our results include frequency of reports and sightings by day, week and month, where the number of observations range from a few per month to ten or more per day. We discuss challenges, best practices and tools for distilling information from crowd-sourced observations gathered via Twitter in the form of 140-character 'tweets'. For example, geolocation is a critical issue. Despite the prevalence of smart phones, specific latitudinal and longitudinal coordinates are included in fewer than 10% of the

  2. Mining and biodiversity offsets: a transparent and science-based approach to measure "no-net-loss".

    PubMed

    Virah-Sawmy, Malika; Ebeling, Johannes; Taplin, Roslyn

    2014-10-01

    Mining and associated infrastructure developments can present themselves as economic opportunities that are difficult to forego for developing and industrialised countries alike. Almost inevitably, however, they lead to biodiversity loss. This trade-off can be greatest in economically poor but highly biodiverse regions. Biodiversity offsets have, therefore, increasingly been promoted as a mechanism to help achieve both the aims of development and biodiversity conservation. Accordingly, this mechanism is emerging as a key tool for multinational mining companies to demonstrate good environmental stewardship. Relying on offsets to achieve "no-net-loss" of biodiversity, however, requires certainty in their ecological integrity where they are used to sanction habitat destruction. Here, we discuss real-world practices in biodiversity offsetting by assessing how well some leading initiatives internationally integrate critical aspects of biodiversity attributes, net loss accounting and project management. With the aim of improving, rather than merely critiquing the approach, we analyse different aspects of biodiversity offsetting. Further, we analyse the potential pitfalls of developing counterfactual scenarios of biodiversity loss or gains in a project's absence. In this, we draw on insights from experience with carbon offsetting. This informs our discussion of realistic projections of project effectiveness and permanence of benefits to ensure no net losses, and the risk of displacing, rather than avoiding biodiversity losses ("leakage"). We show that the most prominent existing biodiversity offset initiatives employ broad and somewhat arbitrary parameters to measure habitat value and do not sufficiently consider real-world challenges in compensating losses in an effective and lasting manner. We propose a more transparent and science-based approach, supported with a new formula, to help design biodiversity offsets to realise their potential in enabling more responsible

  3. Microbial populations identified by fluorescence in situ hybridization in a constructed wetland treating acid coal mine drainage

    SciTech Connect

    Nicomrat, D.; Dick, W.A.; Tuovinen, O.H.

    2006-07-15

    Microorganisms are an integral part of the biogeochemical processes in wetlands, yet microbial communities in sediments within constructed wetlands receiving acid mine drainage (AMD) are only poorly understood. The purpose of this study was to characterize the microbial diversity and abundance in a wetland receiving AMD using fluorescence in situ hybridization (FISH) analysis. Seasonal samples of oxic surface sediments, comprised of Fe(III) precipitates, were collected from two treatment cells of the constructed wetland system. The pH of the bulk samples ranged between pH 2.1 and 3.9. Viable counts of acidophilic Fe and S oxidizers and heterotrophs were determined with a most probable number (MPN) method. The MPN counts were only a fraction of the corresponding FISH counts. The sediment samples contained microorganisms in the Bacteria (including the subgroups of acidophilic Fe- and S-oxidizing bacteria and Acidiphilium spp.) and Eukarya domains. Archaea were present in the sediment surface samples at < 0.01% of the total microbial community. The most numerous bacterial species in this wetland system was Acidithiobacillus ferrooxidans, comprising up to 37% of the bacterial population. Acidithiobacillus thiooxidans was also abundant.

  4. Geochemistry and mercury contamination in receiving environments of artisanal mining wastes and identified concerns for food safety.

    PubMed

    Reichelt-Brushett, Amanda J; Stone, Jane; Howe, Pelli; Thomas, Bernard; Clark, Malcolm; Male, Yusthinus; Nanlohy, Albert; Butcher, Paul

    2017-01-01

    Artisanal small-scale gold mining (ASGM) using mercury (Hg) amalgamation has been occurring on Buru Island, Indonesia since early 2012, and has caused rapid accumulation of high Hg concentrations in river, estuary and marine sediments. In this study, sediment samples were collected from several sites downstream of the Mount Botak ASGM site, as well as in the vicinity of the more recently established site at Gogrea where no sampling had previously been completed. All sediment samples had total Hg (THg) concentrations exceeding Indonesian sediment quality guidelines and were up to 82 times this limit at one estuary site. The geochemistry of sediments in receiving environments indicates the potential for Hg-methylation to form highly bioavailable Hg species. To assess the current contamination threat from consumption of local seafood, samples of fish, molluscs and crustaceans were collected from the Namlea fish market and analysed for THg concentrations. The majority of edible tissue samples had elevated THg concentrations, which raises concerns for food safety. This study shows that river, estuary and marine ecosystems downstream of ASGM operations on Buru Island are exposed to dangerously high Hg concentrations, which are impacting aquatic food chains and fisheries resources. Considering the high dietary dependence on marine protein in the associated community and across the Moluccas Province, and the short time period since ASGM operations commenced in this region, the results warrant urgent further investigation, risk mitigation, and community education. Copyright © 2016 Elsevier Inc. All rights reserved.

  5. A Novel Four-step Approach for Systematic Identification of Naphthoquinones in Juglans cathayensis Dode using Various Scan Functions of Liquid Chromatography-Tandem Mass Spectrometry along with Data Mining Strategies.

    PubMed

    Gan, Yuan; Zhang, Yang; Li, Aiqian; Song, Chengwu; Chen, Chang; Xu, Yong; Ruan, Hanli; Jiang, Hongliang

    2015-01-01

    Systematic analyses of naphthoquinones in Juglans cathayensis have not yet been reported. It is very challenging to identify naphthoquinones with various structural diversities, especially those at trace levels. To develop an efficient analytical approach for systematic discovery and identification of naphthoquinones in Juglans cathayensis. A novel four-step approach was evaluated by utilizing various scan functions of liquid chromatography-triple quadrupole-linear ion trap mass spectrometry (LC-QTRAP-MS/MS) and liquid chromatography-quadrupole time-of-flight tandem mass spectrometry (LC-QTOF-MS/MS) along with data mining strategies. First, MS/MS fragmentation behaviors of naphthoquinones were investigated. Second, multiple ion monitoring triggered enhanced product ion scan (MIM-EPI) with specified ions was conducted to identify targeted naphthoquinones. Third, other scan functions of QTRAP-MS/MS and data mining strategies were explored to identify untargeted naphthoquinones. Fourth, structural rationalization and confirmation of naphthoquinones were performed using QTOF-MS/MS via its accurate mass measurement and MS/MS fragmentation functions. Optimal scan methods and data mining strategies using QTRAP-MS/MS were obtained for identification of targeted and untargeted naphthoquinones. Consequently, 48 naphthoquinones including 24 novel ones were identified or tentatively identified from Juglans cathayensis. A novel four-step approach for efficient discovery and identification of naphthoquinones was developed by exploring various scan functions of current LC-MS/MS technologies and data mining strategies, providing an example for systematic characterization of certain classes of phytochemicals, especially trace analytes in complex samples. Copyright © 2015 John Wiley & Sons, Ltd.

  6. An Empirical Bayesian Approach for Identifying Differential Coexpression in High-Throughput Experiments

    PubMed Central

    Dawson, John A.; Kendziorski, Christina

    2012-01-01

    A common goal of microarray and related high-throughput genomic experiments is to identify genes that vary across biological conditions. Most often this is accomplished by identifying genes with changes in mean expression level, so-called differentially expressed (DE) genes, and a number of effective methods for identifying DE genes have been developed. Although useful, these approaches do not accommodate other types of differential regulation. An important example concerns differential coexpression (DC). Investigations of this class of genes are hampered by the large cardinality of the space to be interrogated as well as by influential outliers. As a result, existing DC approaches are often underpowered, exceedingly prone to false discoveries, and/or computationally intractable for even a moderately large number of pairs. To address this, an empirical Bayesian approach for identifying DC gene pairs is developed. The approach provides a false discovery rate controlled list of significant DC gene pairs without sacrificing power. It is applicable within a single study as well as across multiple studies. Computations are greatly facilitated by a modification to the expectation–maximization algorithm and a procedural heuristic. Simulations suggest that the proposed approach outperforms existing methods in far less computational time; and case study results suggest that the approach will likely prove to be a useful complement to current DE methods in high-throughput genomic studies. PMID:22004327
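
    The distinction between differential expression and differential coexpression can be made concrete with a toy pair of genes. The sketch below is not the empirical Bayesian method of the paper; it simply compares the Pearson correlation of one gene pair in two conditions. The expression values and the function names are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def coexpression_shift(pair_cond1, pair_cond2):
    """Return (r1, r2, r2 - r1) for one gene pair measured in two conditions."""
    r1 = pearson(*pair_cond1)
    r2 = pearson(*pair_cond2)
    return r1, r2, r2 - r1

# Hypothetical expression values for genes A and B in four samples per condition.
# Both genes have the same mean in each condition, so neither is DE,
# yet their correlation flips sign, which is what DC refers to.
cond1 = ([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
cond2 = ([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0])
r1, r2, shift = coexpression_shift(cond1, cond2)
print(r1, r2, shift)
```

    With thousands of genes the number of pairs grows quadratically, which is the cardinality problem the abstract mentions; a raw correlation difference like this one also offers no false discovery rate control.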

  7. The use of data-mining to identify indicators of health-related quality of life in patients with irritable bowel syndrome.

    PubMed

    Penny, Kay I; Smith, Graeme D

    2012-10-01

    To examine the health-related quality of life in a cohort of individuals with irritable bowel syndrome and to explore the use of several data-mining methods to identify which socio-demographic and irritable bowel syndrome symptoms are most highly associated with impaired health-related quality of life. Health-related quality of life can be adversely affected by irritable bowel syndrome. Little is presently known about the predictive factors that may influence the quality of life in these patients. Cross-sectional survey design involving the general population of the UK. Individuals with symptoms of irritable bowel syndrome were recruited to a longitudinal cohort survey via a UK-wide newspaper advert. Health-related quality of life was measured using a battery of validated questionnaires. Several data-mining models to determine which factors are associated with impaired health-related quality of life are considered in this study and include logistic regression, a classification tree and artificial neural networks. As well as irritable bowel syndrome symptom severity, results indicate that psychological morbidity and socio-demographic factors such as marital status and employment status also have a major influence on health-related quality of life in irritable bowel syndrome. Health-related quality of life is impaired in community-based individuals in the UK with irritable bowel syndrome. Although not always as easily interpreted as logistic regression, data-mining techniques indicate subsets of factors that are highly associated with impaired quality of life. These models tend to include subsets of irritable bowel syndrome symptoms and psychosocial factors. Identification of the role of psychological and socio-demographic factors on health-related quality of life may provide more insight into the nature of irritable bowel syndrome.
    Greater understanding of these factors will facilitate more flexible and efficient nursing assessment and management of this

  8. Noninvasive intracranial pressure assessment based on a data-mining approach using a nonlinear mapping function.

    PubMed

    Kim, Sunghan; Scalzo, Fabien; Bergsneider, Marvin; Vespa, Paul; Martin, Neil; Hu, Xiao

    2012-03-01

    The current gold standard to determine intracranial pressure (ICP) involves an invasive procedure for direct access to the intracranial compartment. The risks associated with this invasive procedure include intracerebral hemorrhage, infection, and discomfort. We previously proposed an innovative data-mining framework of noninvasive ICP (NICP) assessment. The performance of the proposed framework relies on designing a good mapping function. We attempt to achieve performance gain by adopting various linear and nonlinear mapping functions. Our results demonstrate that a nonlinear mapping function based on the kernel spectral regression technique significantly improves the performance of the proposed data-mining framework for NICP assessment in comparison to other linear mapping functions.

  9. Geotechnical approaches to coal ash content control in mining of complex structure deposits

    NASA Astrophysics Data System (ADS)

    Batugin, SA; Gavrilov, VL; Khoyutanov, EA

    2017-02-01

    Coal deposits with complex structure and coal reserves of nonuniform quality require improved processes of production quality control. The paper proposes a method to present coal ash content as components of natural and technological dilution. The studies were carried out at the western site of the Elginsk coal deposit, which is composed of four coal beds of complex structure. The reported estimates of coal ash content in the beds with respect to five components point to the need to account for such data in confirmation exploration, mine planning and actual mining. Basic means of analysis and control of overall ash content and its components are discussed.

  10. Demonstrating a Market-Based Approach to the Reclamation of Mined Lands in West Virginia

    SciTech Connect

    John W. Goodrich-Mahoney; Paul Ziemkiewicz

    2006-07-19

    This is the third quarter progress report of Phase II of a three-phase project to develop and evaluate the efficacy of developing multiple environmental market trading credits on a partially reclaimed surface mined site near Valley Point, Preston County, WV. Construction of the passive acid mine drainage (AMD) treatment system was completed but several modifications from the original design had to be made following the land survey and during construction to compensate for unforeseen circumstances. We continued to collect baseline quality data from the Conner Run AMD seeps to confirm the conceptual and final design for the passive AMD treatment system.

  11. Rate of occupational accidents in the mining industry since 1950--a successful approach to prevention policy.

    PubMed

    Breuer, Joachim; Höffer, Eva-Marie; Hummitzsch, Walter

    2002-01-01

    This paper deals with the decrease in the rate of accident insurance claims in the German mining industry over the last five decades. It intends to show that this process is above all the result of a prevention policy where companies and the body responsible for the legal accident insurance in the mining industry, the Bergbau-Berufsgenossenschaft (BBG), work hand in hand. A system like the German accident insurance scheme, combining prevention, rehabilitation, and compensation, enables successful and modern safety and health measures.

  12. Data mining the NCI cancer cell line compound GI(50) values: identifying quinone subtypes effective against melanoma and leukemia cell classes.

    PubMed

    Marx, Kenneth A; O'Neil, Philip; Hoffman, Patrick; Ujwal, M L

    2003-01-01

    Using data mining techniques, we have studied a subset (1400) of compounds from the large public National Cancer Institute (NCI) compounds data repository. We first carried out a functional class identity assignment for the 60 NCI cancer testing cell lines via hierarchical clustering of gene expression data. Comprising nine clinical tissue types, the 60 cell lines were placed into six classes: melanoma, leukemia, renal, lung, and colorectal, plus a sixth class of mixed tissue cell lines not found in any of the other five classes. We then carried out supervised machine learning, using the GI(50) values tested on a panel of 60 NCI cancer cell lines. For separate 3-class and 2-class problem clustering, we successfully carried out clear cell line class separation at high stringency, p < 0.01 (Bonferroni corrected t-statistic), using feature reduction clustering algorithms embedded in RadViz, an integrated high dimensional analytic and visualization tool. We started with the 1400 compound GI(50) values as input and selected only those compounds, or features, significant in carrying out the classification. With this approach, we identified two small sets of compounds that were most effective in carrying out complete class separation of the melanoma, non-melanoma classes and leukemia, non-leukemia classes. To validate these results, we showed that these two compound sets' GI(50) values were highly accurate classifiers using five standard analytical algorithms. One compound set was most effective against the melanoma class cell lines (14 compounds), and the other set was most effective against the leukemia class cell lines (30 compounds). The two compound classes were both significantly enriched in two different types of substituted p-quinones.
The melanoma cell line class of 14 compounds was comprised of 11 compounds that were internal substituted p-quinones, and the leukemia cell line class of 30 compounds was comprised of 6 compounds that were external

  13. Microbial populations identified by fluorescence in situ hybridization in a constructed wetland treating acid coal mine drainage.

    PubMed

    Nicomrat, Duongruitai; Dick, Warren A; Tuovinen, Olli H

    2006-01-01

    Microorganisms are an integral part of the biogeochemical processes in wetlands, yet microbial communities in sediments within constructed wetlands receiving acid mine drainage (AMD) are only poorly understood. The purpose of this study was to characterize the microbial diversity and abundance in a wetland receiving AMD using fluorescence in situ hybridization (FISH) analysis. Seasonal samples of oxic surface sediments, comprised of Fe(III) precipitates, were collected from two treatment cells of the constructed wetland system. The pH of the bulk samples ranged between pH 2.1 and 3.9. Viable counts of acidophilic Fe and S oxidizers and heterotrophs were determined with a most probable number (MPN) method. The MPN counts were only a fraction of the corresponding FISH counts. The sediment samples contained microorganisms in the Bacteria (including the subgroups of acidophilic Fe- and S-oxidizing bacteria and Acidiphilium spp.) and Eukarya domains. Archaea were present in the sediment surface samples at < 0.01% of the total microbial community. The most numerous bacterial species in this wetland system was Acidithiobacillus ferrooxidans, comprising up to 37% of the bacterial population. Acidithiobacillus thiooxidans was also abundant. Heterotrophs in the Acidiphilium genus totaled 20% of the bacterial population. Leptospirillum ferrooxidans was below the level of detection in the bacterial community. The results from the FISH technique from this field study are consistent with results from other experiments involving enumeration by most probable number, dot-blot hybridization, and denaturing gradient gel electrophoresis analyses and with the geochemistry of the site.

  14. Detecting surface coal mining areas from remote sensing imagery: an approach based on object-oriented decision trees

    NASA Astrophysics Data System (ADS)

    Zeng, Xiaoji; Liu, Zhifeng; He, Chunyang; Ma, Qun; Wu, Jianguo

    2017-01-01

    Detecting surface coal mining areas (SCMAs) using remote sensing data in a timely and accurate manner is necessary for coal industry management and environmental assessment. We developed an approach to effectively extract SCMAs from remote sensing imagery based on object-oriented decision trees (OODT). This OODT approach involves three main steps: object-oriented segmentation, calculation of spectral characteristics, and extraction of SCMAs. The advantage of this approach lies in its effective integration of the spectral and spatial characteristics of SCMAs so as to distinguish the mining areas (i.e., the extracting areas, stripped areas, and dumping areas) from other areas that exhibit similar spectral features (e.g., bare soils and built-up areas). We implemented this method to extract SCMAs in the eastern part of Ordos City in Inner Mongolia, China. Our results had an overall accuracy of 97.07% and a kappa coefficient of 0.80. Compared with three other spectral information-based methods, our OODT approach is more accurate in quantifying the amount and spatial pattern of SCMAs in dryland regions.
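
    The two accuracy figures quoted in this record, overall accuracy and the kappa coefficient, are both derived from a classification confusion matrix. The following is a minimal sketch of that calculation, not the authors' code; the function name `accuracy_and_kappa` and the 2x2 matrix are hypothetical and chosen only for illustration.

```python
def accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    Rows are reference classes, columns are predicted classes.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical counts (not from the paper): a rare "mining" class
# and a dominant "other" class.
example = [[40, 10],
           [5, 945]]
overall_accuracy, kappa = accuracy_and_kappa(example)
```

    Because kappa discounts agreement expected by chance, a map dominated by one class can show a very high overall accuracy alongside a noticeably lower kappa.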

  15. Identification of gefitinib off-targets using a structure-based systems biology approach; their validation with reverse docking and retrospective data mining

    PubMed Central

    Verma, Nidhi; Rai, Amit Kumar; Kaushik, Vibha; Brünnert, Daniela; Chahar, Kirti Raj; Pandey, Janmejay; Goyal, Pankaj

    2016-01-01

    Gefitinib, an EGFR tyrosine kinase inhibitor, is used as an FDA-approved drug in breast cancer and non-small cell lung cancer treatment. However, this drug has certain side effects and complications for which the underlying molecular mechanisms are not well understood. Using systems biology based in silico analysis, we identified off-targets of gefitinib that might explain the side effects of this drug. The crystal structure of the EGFR-gefitinib complex was used for binding pocket similarity searches on a druggable proteome database (Sc-PDB) using IsoMIF Finder. The top 128 hits of putative off-targets were validated by a reverse docking approach. The results showed that the identified off-targets bind gefitinib efficiently. The identified human-specific off-targets were confirmed and further analyzed for their links with biological processes and clinical disease pathways using retrospective studies and literature mining, respectively. Notably, many of the off-targets identified in this study were reported in previous high-throughput screenings. Interestingly, the present study reveals that gefitinib may have positive effects in reducing brain and bone metastasis, and may be useful in defining novel gefitinib-based treatment regimens. We propose that a system-wide approach could be useful during new drug development and to minimize side effects of the prospective drug. PMID:27653775

  16. Identification of gefitinib off-targets using a structure-based systems biology approach; their validation with reverse docking and retrospective data mining.

    PubMed

    Verma, Nidhi; Rai, Amit Kumar; Kaushik, Vibha; Brünnert, Daniela; Chahar, Kirti Raj; Pandey, Janmejay; Goyal, Pankaj

    2016-09-22

    Gefitinib, an EGFR tyrosine kinase inhibitor, is used as an FDA-approved drug in breast cancer and non-small cell lung cancer treatment. However, this drug has certain side effects and complications for which the underlying molecular mechanisms are not well understood. Using systems biology based in silico analysis, we identified off-targets of gefitinib that might explain the side effects of this drug. The crystal structure of the EGFR-gefitinib complex was used for binding pocket similarity searches on a druggable proteome database (Sc-PDB) using IsoMIF Finder. The top 128 hits of putative off-targets were validated by a reverse docking approach. The results showed that the identified off-targets bind gefitinib efficiently. The identified human-specific off-targets were confirmed and further analyzed for their links with biological processes and clinical disease pathways using retrospective studies and literature mining, respectively. Notably, many of the off-targets identified in this study were reported in previous high-throughput screenings. Interestingly, the present study reveals that gefitinib may have positive effects in reducing brain and bone metastasis, and may be useful in defining novel gefitinib-based treatment regimens. We propose that a system-wide approach could be useful during new drug development and to minimize side effects of the prospective drug.

  17. A mutation-centric approach to identifying pharmacogenomic relations in text.

    PubMed

    Rance, Bastien; Doughty, Emily; Demner-Fushman, Dina; Kann, Maricel G; Bodenreider, Olivier

    2012-10-01

    To explore the notion of mutation-centric pharmacogenomic relation extraction and to evaluate our approach against reference pharmacogenomic relations. From a corpus of MEDLINE abstracts relevant to genetic variation, we identify co-occurrences between drug mentions extracted using MetaMap and RxNorm, and genetic variants extracted by EMU. The recall of our approach is evaluated against reference relations curated manually in PharmGKB. We also reviewed a random sample of 180 relations in order to evaluate its precision. One crucial aspect of our strategy is the use of biological knowledge for identifying specific genetic variants in text, not simply gene mentions. On the 104 reference abstracts from PharmGKB, the recall of our mutation-centric approach is 33-46%. Applied to 282,000 abstracts from MEDLINE, our approach identifies pharmacogenomic relations in 4534 abstracts, with a precision of 65%. Compared to a relation-centric approach, our mutation-centric approach shows similar recall, but slightly lower precision. We show that both approaches have limited overlap in their results, but are complementary and can be used in combination. Rather than a solution for the automatic curation of pharmacogenomic knowledge, we see these high-throughput approaches as tools to assist biocurators in the identification of pharmacogenomic relations of interest from the published literature. This investigation also identified three challenging aspects of the extraction of pharmacogenomic relations, namely processing full-text articles, sequence validation of DNA variants and resolution of genetic variants to reference databases, such as dbSNP. Published by Elsevier Inc.

  18. Data mining for water resource management part 2 - methods and approaches to solving contemporary problems

    USGS Publications Warehouse

    Roehl, Edwin A.; Conrads, Paul A.

    2010-01-01

    This is the second of two papers that describe how data mining can aid natural-resource managers with the difficult problem of controlling the interactions between hydrologic and man-made systems. Data mining is a new science that assists scientists in converting large databases into knowledge, and is uniquely able to leverage the large amounts of real-time, multivariate data now being collected for hydrologic systems. Part 1 gives a high-level overview of data mining, and describes several applications that have addressed major water resource issues in South Carolina. This Part 2 paper describes how various data mining methods are integrated to produce predictive models for controlling surface- and groundwater hydraulics and quality. The methods include:
    - signal processing to remove noise and decompose complex signals into simpler components;
    - time series clustering that optimally groups hundreds of signals into "classes" that behave similarly for data reduction and (or) divide-and-conquer problem solving;
    - classification which optimally matches new data to behavioral classes;
    - artificial neural networks which optimally fit multivariate data to create predictive models;
    - model response surface visualization that greatly aids in understanding data and physical processes; and,
    - decision support systems that integrate data, models, and graphics into a single package that is easy to use.

  19. Early Prediction of Students' Grade Point Averages at Graduation: A Data Mining Approach

    ERIC Educational Resources Information Center

    Tekin, Ahmet

    2014-01-01

    Problem Statement: There has recently been interest in educational databases containing a variety of valuable but sometimes hidden data that can be used to help less successful students to improve their academic performance. The extraction of hidden information from these databases often implements aspects of the educational data mining (EDM)…

  20. A data mining approach for prediction of cultivated land demand in land use planning

    NASA Astrophysics Data System (ADS)

    Liu, Yaolin; Miao, Zuohua; Chen, Wenfei

    2005-10-01

    Although data mining is a relatively young technique, it has been used in a wide range of problem domains over the past few decades. In this paper, the authors present a new model that adopts data mining techniques to forecast cultivated land demand. The new model, called the fuzzy Markov chain model with weights, improves on the traditional time-homogeneous finite Markov chain model to predict the future value of cultivated land demand in land use planning. The new model applies data mining techniques to extract useful information from a large volume of historical data and then applies a fuzzy sequential clustering method to set up the dissimilitude fuzzy clustering sections. The new model regards the standardized autocorrelation coefficients as weights, based on the special characteristics of correlation among the historical stochastic variables. The transition probability matrix of the new model was obtained using fuzzy logic theory and statistical analysis. The experimental results show that the improved model combined with data mining techniques is more scientific and practical than traditional predictive models.

  1. Identifying driver mutations in sequenced cancer genomes: computational approaches to enable precision medicine

    PubMed Central

    2014-01-01

    High-throughput DNA sequencing is revolutionizing the study of cancer and enabling the measurement of the somatic mutations that drive cancer development. However, the resulting sequencing datasets are large and complex, obscuring the clinically important mutations in a background of errors, noise, and random mutations. Here, we review computational approaches to identify somatic mutations in cancer genome sequences and to distinguish the driver mutations that are responsible for cancer from random, passenger mutations. First, we describe approaches to detect somatic mutations from high-throughput DNA sequencing data, particularly for tumor samples that comprise heterogeneous populations of cells. Next, we review computational approaches that aim to predict driver mutations according to their frequency of occurrence in a cohort of samples, or according to their predicted functional impact on protein sequence or structure. Finally, we review techniques to identify recurrent combinations of somatic mutations, including approaches that examine mutations in known pathways or protein-interaction networks, as well as de novo approaches that identify combinations of mutations according to statistical patterns of mutual exclusivity. These techniques, coupled with advances in high-throughput DNA sequencing, are enabling precision medicine approaches to the diagnosis and treatment of cancer. PMID:24479672

  2. A semiautomated approach to gene discovery through expressed sequence tag data mining: discovery of new human transporter genes.

    PubMed

    Brown, Shoshana; Chang, Jean L; Sadée, Wolfgang; Babbitt, Patricia C

    2003-01-01

    Identification and functional characterization of the genes in the human genome remain a major challenge. A principal source of publicly available information used for this purpose is the National Center for Biotechnology Information database of expressed sequence tags (dbEST), which contains over 4 million human ESTs. To extract the information buried in this data more effectively, we have developed a semiautomated method to mine dbEST for uncharacterized human genes. Starting with a single protein input sequence, a family of related proteins from all species is compiled. This entire family is then used to mine the human EST database for new gene candidates. Evaluation of putative new gene candidates in the context of a family of characterized proteins provides a framework for inference of the structure and function of the new genes. When applied to a test data set of 28 families within the major facilitator superfamily (MFS) of membrane transporters, our protocol found 73 previously characterized human MFS genes and 43 new MFS gene candidates. Development of this approach provided insights into the problems and pitfalls of automated data mining using public databases.

  3. Beyond the biomedical and behavioural: towards an integrated approach to HIV prevention in the southern African mining industry.

    PubMed

    Campbell, C; Williams, B

    1999-06-01

    While migrant labour is believed to play an important role in the dynamics of HIV-transmission in many of the countries of southern Africa, little has been written about the way in which HIV/AIDS has been dealt with in the industrial settings in which many migrant workers are employed. This paper takes the gold mining industry in the countries of the Southern African Development Community (SADC) as a case study. While many mines made substantial efforts to establish HIV-prevention programmes relatively early on in the epidemic, these appear to have had little impact. The paper analyses the response of key players in the mining industry, in the interests of highlighting the limitations of the way in which both managements and trade unions have responded to HIV. It will be argued that the energy that has been devoted either to biomedical or behavioural prevention programmes or to human rights issues has served to obscure the social and developmental dimensions of HIV-transmission. This argument is supported by means of a case study which seeks to highlight the complexity of the dynamics of disease transmission in this context, a complexity which is not reflected in individualistic responses. An account is given of a new intervention which seeks to develop a more integrated approach to HIV management in an industrial setting.

  4. Deformation Prediction and Geometrical Modeling of Head and Neck Cancer Tumor: A Data Mining Approach

    NASA Astrophysics Data System (ADS)

    Azimi, Maryam

    Radiation therapy has been used in the treatment of cancer tumors for several years and many cancer patients receive radiotherapy. It may be used as primary therapy or in combination with surgery or other kinds of therapy such as chemotherapy, hormone therapy, or some mixture of the three. The treatment objective is to destroy cancer cells or shrink the tumor by planning an adequate radiation dose to the desired target without damaging the normal tissues. Using pre-treatment Computed Tomography (CT) images, most radiotherapy planning systems design the target and assume that the size of the tumor will not change throughout the treatment course, which takes 5 to 7 weeks. Based on this assumption, the total amount of radiation is planned and fractionated for the daily dose required to be delivered to the patient's body. However, this assumption is flawed because the patients receiving radiotherapy have marked changes in tumor geometry during the treatment period. Therefore, there is a critical need to understand the changes of the tumor shape and size over time during the course of radiotherapy in order to prevent significant effects of inaccuracy in the planning. In this research, a methodology is proposed in order to monitor and predict daily (fraction day) tumor volume and surface changes of head and neck cancer tumors during the entire treatment period. In the proposed method, geometrical modeling and data mining techniques will be used rather than repeated CT scan data to predict the tumor deformation for radiation planning. Clinical patient data were obtained from the University of Texas-MD Anderson Cancer Center (MDACC). In the first step, by using CT scan data, the tumor's progressive geometric changes during the treatment period are quantified.
The next step relates to using regression analysis in order to develop predictive models for tumor geometry based on the geometric analysis results and the patients' selected attributes (age, weight

  5. A data mining approach to in vivo classification of psychopharmacological drugs.

    PubMed

    Kafkafi, Neri; Yekutieli, Daniel; Elmer, Greg I

    2009-02-01

    Data mining is a powerful bioinformatics strategy that has been successfully applied in vitro to screen for gene-expression profiles predicting toxicological or carcinogenic response ('class predictors'). In this report we used a data mining algorithm named Pattern Array (PA) in vivo to analyze mouse open-field behavior and characterize the psychopharmacological effects of three drug classes: psychomotor stimulant, opioid, and psychotomimetic. PA represents rodent movement with approximately 100,000 complex patterns, defined as multiple combinations of several ethologically relevant variables, and mines them for those that maximize any effect of interest, such as the difference between drug classes. We show that PA can discover behavioral predictors of all three drug classes, thus developing a reliable drug-classification scheme in small group sizes. The discovered predictors showed orderly dose dependency despite being explicitly mined only for class differences, with the high doses scoring 4-10 standard deviations from the vehicle group. Furthermore, these predictors correctly classified in a dose-dependent manner four 'unknown' drugs (i.e., drugs that were not used in the training process), and scored a mixture of a psychomotor stimulant and an opioid as being intermediate between these two classes. The isolated behaviors were highly heritable (h² > 50%) and replicable as determined in 10 inbred strains across three laboratories. PA can in principle be applied for mining behaviors predicting additional properties, such as within-class differences between drugs and within-drug dose-response, all of which can be measured automatically in a single session per animal in an open-field arena, suggesting a high potential as a tool in psychotherapeutic drug discovery.

  6. An Improved Approach to Estimate Methane Emissions from Coal Mining in China.

    PubMed

    Zhu, Tao; Bian, Wenjing; Zhang, Shuqing; Di, Pingkuan; Nie, Baisheng

    2017-10-10

    China, the largest coal producer in the world, is responsible for over 50% of the total global methane (CH4) emissions from coal mining. However, the current emission inventory of CH4 from coal mining has large uncertainties because of the lack of localized emission factors (EFs). In this study, province-level CH4 EFs from coal mining in China were developed based on the data analysis of coal production and corresponding discharged CH4 emissions from 787 coal mines distributed in 25 provinces with different geological and operation conditions. Results show that the spatial distribution of CH4 EFs is highly variable with values as high as 36 m3/t and as low as 0.74 m3/t. Based on newly developed CH4 EFs and activity data, an inventory of the province-level CH4 emissions was built for 2005-2010. Results reveal that the total CH4 emissions in China increased from 11.5 Tg in 2005 to 16.0 Tg in 2010. By constructing a gray forecasting model for CH4 EFs and a regression model for activity, the province-level CH4 emissions from coal mining in China are forecasted for the years of 2011-2020. The estimates are compared with other published inventories. Our results have a reasonable agreement with USEPA's inventory and are lower by a factor of 1-2 than those estimated using the IPCC default EFs. This study could help guide CH4 mitigation policies and practices in China.
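
    The inventory described in this record follows the usual emission-factor pattern: emissions equal the emission factor multiplied by activity data (coal production), converted from gas volume to mass. The following is a minimal sketch of that arithmetic, not the authors' method. It assumes a methane density of about 0.67 kg/m³ (20 °C, 1 atm), the value commonly used in IPCC-style calculations; the abstract does not state which conversion the authors used. The two emission factors are the extremes quoted in the abstract, while the province names and production figures are hypothetical.

```python
# Assumed conversion: methane density at 20 degrees C and 1 atm,
# expressed in Gg per cubic metre (about 0.67 kg/m3).
CH4_DENSITY_GG_PER_M3 = 0.67e-6


def ch4_emissions_tg(emission_factor_m3_per_t, coal_production_t):
    """Convert an emission factor (m3 CH4 per tonne of coal) and coal
    production (tonnes) into CH4 emissions in teragrams (Tg)."""
    volume_m3 = emission_factor_m3_per_t * coal_production_t
    mass_gg = volume_m3 * CH4_DENSITY_GG_PER_M3
    return mass_gg / 1000.0  # 1 Tg = 1000 Gg


# Emission factors are the extremes from the abstract; the production
# figures and province names are made up for illustration.
provinces = {
    "province_a": (36.0, 50e6),
    "province_b": (0.74, 200e6),
}
inventory = {
    name: ch4_emissions_tg(ef, production)
    for name, (ef, production) in provinces.items()
}
total_tg = sum(inventory.values())
```

    Summing such per-province values over all provinces and years gives a national inventory of the kind the abstract reports; the sketch leaves out any recovery or utilization of methane, which a real inventory may subtract.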

  7. Improving mine safety technology and training: establishing US global leadership

    SciTech Connect

    2006-12-15

    In 2006, the USA's record of mine safety was interrupted by fatalities that rocked the industry and caused the National Mining Association and its members to recommit to returning the US underground coal mining industry to a global mine safety leadership role. This report details a comprehensive approach to increase the odds of survival for miners in emergency situations and to create a culture of prevention of accidents. Among its 75 recommendations are a need to improve communications, mine rescue training, and escape and protection of miners. Section headings of the report are: Introduction; Review of mine emergency situations in the past 25 years: identifying and addressing the issues and complexities; Risk-based design and management; Communications technology; Escape and protection strategies; Emergency response and mine rescue procedures; Training for preparedness; Summary of recommendations; and Conclusions. 37 refs., 3 figs., 5 apps.

  8. GlycoMine: a machine learning-based approach for predicting N-, C- and O-linked glycosylation in the human proteome.

    PubMed

    Li, Fuyi; Li, Chen; Wang, Mingjun; Webb, Geoffrey I; Zhang, Yang; Whisstock, James C; Song, Jiangning

    2015-05-01

    Glycosylation is a ubiquitous type of protein post-translational modification (PTM) in eukaryotic cells, which plays vital roles in various biological processes (BPs) such as cellular communication, ligand recognition and subcellular recognition. It is estimated that >50% of the entire human proteome is glycosylated. However, it is still a significant challenge to identify glycosylation sites, which requires expensive/laborious experimental research. Thus, bioinformatics approaches that can predict the glycan occupancy at specific sequons in protein sequences would be useful for understanding and utilizing this important PTM. In this study, we present a novel bioinformatics tool called GlycoMine, which is a comprehensive tool for the systematic in silico identification of C-linked, N-linked, and O-linked glycosylation sites in the human proteome. GlycoMine was developed using the random forest algorithm and evaluated based on a well-prepared up-to-date benchmark dataset that encompasses all three types of glycosylation sites, which was curated from multiple public resources. Heterogeneous sequences and functional features were derived from various sources, and subjected to further two-step feature selection to characterize a condensed subset of optimal features that contributed most to the type-specific prediction of glycosylation sites. Five-fold cross-validation and independent tests show that this approach significantly improved the prediction performance compared with four existing prediction tools: NetNGlyc, NetOGlyc, EnsembleGly and GPP. We demonstrated that this tool could identify candidate glycosylation sites in case study proteins and applied it to identify many high-confidence glycosylation target proteins by screening the entire human proteome. The webserver, Java Applet, user instructions, datasets, and predicted glycosylation sites in the human proteome are freely available at http://www.structbioinfor.org/Lab/GlycoMine/. 
Jiangning.Song@monash.edu or

  9. Rehabilitation prioritization of abandoned mines and its application to Nyala Magnesite Mine

    NASA Astrophysics Data System (ADS)

    Mhlongo, Sphiwe Emmanuel; Amponsah-Dacosta, Francis; Mphephu, Nndweleni Fredrick

    2013-12-01

    The issue of abandoned mine sites is a major environmental and social problem for the mining industry, communities and governments. Historical mine sites are characterized by significant environmental, health and safety problems. The aim of this study was to develop hazard maps that can assist in the prioritization of rehabilitation at Nyala Mine. The approach used involved site examination and characterization to establish the environmental conditions of the mine. Hazards at the mine were identified, scored, and rated using a modified Historic Mine Site Scoring System. The scoring focused on source and exposure pathways. The developed hazard maps showed that the best approach to effectively reducing the physical and environmental hazards at Nyala Mine was to give priority to extremely and moderately hazardous pits, surface infrastructure and spoil dumps, and then to tailings dumps characterized by fewer physical hazards but extremely high environmental hazards. Pits and spoil materials that were found to be relatively less problematic in terms of physical hazards were to receive the least attention. The use of this hazard-scoring and risk-ranking methodology, coupled with the hazard maps, would provide a more robust scientific basis for making sound decisions and prioritizing actions that need to be taken to minimize or manage risks associated with various areas of the mine site.

  10. Identifying Useful Auxiliary Variables for Incomplete Data Analyses: A Note on a Group Difference Examination Approach

    ERIC Educational Resources Information Center

    Raykov, Tenko; Marcoulides, George A.

    2014-01-01

    This research note contributes to the discussion of methods that can be used to identify useful auxiliary variables for analyses of incomplete data sets. A latent variable approach is discussed, which is helpful in finding auxiliary variables with the property that if included in subsequent maximum likelihood analyses they may enhance considerably…

  11. A Comprehensive Approach to Identifying Intervention Targets for Patient-Safety Improvement in a Hospital Setting

    ERIC Educational Resources Information Center

    Cunningham, Thomas R.; Geller, E. Scott

    2012-01-01

    Despite differences in approaches to organizational problem solving, healthcare managers and organizational behavior management (OBM) practitioners share a number of practices, and connecting healthcare management with OBM may lead to improvements in patient safety. A broad needs-assessment methodology was applied to identify patient-safety…

  12. The Baby TALK Model: An Innovative Approach to Identifying High-Risk Children and Families

    ERIC Educational Resources Information Center

    Villalpando, Aimee Hilado; Leow, Christine; Hornstein, John

    2012-01-01

    This research report examines the Baby TALK model, an innovative early childhood intervention approach used to identify, recruit, and serve young children who are at-risk for developmental delays, mental health needs, and/or school failure, and their families. The report begins with a description of the model. This description is followed by an…

  14. Identifying Core Mobile Learning Faculty Competencies Based Integrated Approach: A Delphi Study

    ERIC Educational Resources Information Center

    Elbarbary, Rafik Said

    2015-01-01

    This study is based on the integrated approach as a conceptual framework to identify, categorize, and rank the key components of mobile learning core competencies for Egyptian faculty members in higher education. The field investigation framework used a four-round Delphi technique to determine the importance rate of each component of core competencies…

  15. Doing the Work of Extension: Three Approaches to Identify, Amplify, and Implement Outreach

    ERIC Educational Resources Information Center

    Raison, Brian

    2014-01-01

    This article explores the literature and practice of how the Cooperative Extension Service does its work and asks if traditional outreach and engagement models have room for innovative delivery mechanisms that may identify emerging trends and help meet community needs. It considers three innovative approaches to the educational mission:…

  16. A Function-First Approach to Identifying Formulaic Language in Academic Writing

    ERIC Educational Resources Information Center

    Durrant, Philip; Mathews-Aydinli, Julie

    2011-01-01

    There is currently much interest in creating pedagogically-oriented descriptions of formulaic language. Research in this area has typically taken what we call a "form-first" approach, in which formulas are identified as the most frequent recurrent forms in a relevant corpus. While this research continues to yield valuable results, the present…

  17. A Comprehensive Approach to Identifying Intervention Targets for Patient-Safety Improvement in a Hospital Setting

    ERIC Educational Resources Information Center

    Cunningham, Thomas R.; Geller, E. Scott

    2012-01-01

    Despite differences in approaches to organizational problem solving, healthcare managers and organizational behavior management (OBM) practitioners share a number of practices, and connecting healthcare management with OBM may lead to improvements in patient safety. A broad needs-assessment methodology was applied to identify patient-safety…

  19. Ab initio thermodynamic approach to identify mixed solid sorbents for CO2 capture technology

    DOE PAGES

    Duan, Yuhua

    2015-10-15

    Because the current technologies for capturing CO2 are still too energy intensive, new materials must be developed that can capture CO2 reversibly with acceptable energy costs. At a given CO2 pressure, the turnover temperature (Tt) of the reaction of an individual solid that can capture CO2 is fixed. Such Tt may be outside the operating temperature range (ΔTo) for a practical capture technology. To adjust Tt to fit the practical ΔTo, in this study, three scenarios of mixing schemes are explored by combining thermodynamic database mining with first principles density functional theory and phonon lattice dynamics calculations. Our calculated results demonstrate that by mixing different types of solids, it’s possible to shift Tt to the range of practical operating temperature conditions. According to the requirements imposed by the pre- and post- combustion technologies and based on our calculated thermodynamic properties for the CO2 capture reactions by the mixed solids of interest, we were able to identify the mixing ratios of two or more solids to form new sorbent materials for which lower capture energy costs are expected at the desired pressure and temperature conditions.

  20. Impacts of mountaintop mining on terrestrial ecosystem integrity: Identifying landscape thresholds for avian species in the central Appalachians, United States

    USGS Publications Warehouse

    Becker, Douglas A.; Wood, Petra Bohall; Strager, Michael P.; Mazzarella, Christine

    2014-01-01

    Because of little overlap in habitat requirements, managing landscapes simultaneously to maximally benefit both guilds may not be possible. Our avian thresholds identify single community management targets accounting for scarce species. Guild or individual species thresholds allow for species-specific management.

  1. Identifying seizure onset zone from electrocorticographic recordings: A machine learning approach based on phase locking value.

    PubMed

    Elahian, Bahareh; Yeasin, Mohammed; Mudigoudar, Basanagoud; Wheless, James W; Babajani-Feremi, Abbas

    2017-10-01

    Using a novel technique based on phase locking value (PLV), we investigated the potential for features extracted from electrocorticographic (ECoG) recordings to serve as biomarkers to identify the seizure onset zone (SOZ). We computed the PLV between the phase of the amplitude of high gamma activity (80-150Hz) and the phase of lower frequency rhythms (4-30Hz) from ECoG recordings obtained from 10 patients with epilepsy (21 seizures). We extracted five features from the PLV and used a machine learning approach based on logistic regression to build a model that classifies electrodes as SOZ or non-SOZ. More than 96% of electrodes identified as the SOZ by our algorithm were within the resected area in six seizure-free patients. In four non-seizure-free patients, more than 31% of the identified SOZ electrodes by our algorithm were outside the resected area. In addition, we observed that the seizure outcome in non-seizure-free patients correlated with the number of non-resected SOZ electrodes identified by our algorithm. This machine learning approach, based on features extracted from the PLV, effectively identified electrodes within the SOZ. The approach has the potential to assist clinicians in surgical decision-making when pre-surgical intracranial recordings are utilized. Copyright © 2017 British Epilepsy Association. Published by Elsevier Ltd. All rights reserved.
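
    The phase locking value described above can be illustrated with a minimal sketch. It assumes the two phase series have already been extracted (the filtering and phase-extraction steps of the study are not reproduced here), uses the standard definition PLV = |mean(exp(i(φa − φb)))|, and runs on synthetic data; the function name and the signals are hypothetical, and the feature extraction and logistic regression steps are not shown.

```python
import numpy as np


def phase_locking_value(phase_a, phase_b):
    """Return the PLV of two phase series (radians); the result lies in [0, 1]."""
    diff = np.asarray(phase_a) - np.asarray(phase_b)
    # Average the unit phasors of the phase difference and take the magnitude.
    return float(np.abs(np.mean(np.exp(1j * diff))))


# Synthetic illustration (not ECoG data): one second sampled at 1000 points.
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
low_phase = 2 * np.pi * 8 * t            # phase of a hypothetical 8 Hz rhythm
locked = low_phase + 0.3                 # constant phase lag: perfectly locked
rng = np.random.default_rng(0)
unlocked = rng.uniform(-np.pi, np.pi, size=t.size)  # no phase relationship

plv_locked = phase_locking_value(locked, low_phase)
plv_unlocked = phase_locking_value(unlocked, low_phase)
print(plv_locked, plv_unlocked)
```

    A constant phase lag gives a PLV of 1, while unrelated phases give a value near 0.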

  2. Identifying Bioaccumulative Halogenated Organic Compounds Using a Nontargeted Analytical Approach: Seabirds as Sentinels

    PubMed Central

    Millow, Christopher J.; Mackintosh, Susan A.; Lewison, Rebecca L.; Dodder, Nathan G.; Hoh, Eunha

    2015-01-01

    Persistent organic pollutants (POPs) are typically monitored via targeted mass spectrometry, which potentially identifies only a fraction of the contaminants actually present in environmental samples. With new anthropogenic compounds continuously introduced to the environment, novel and proactive approaches that provide a comprehensive alternative to targeted methods are needed in order to more completely characterize the diversity of known and unknown compounds likely to cause adverse effects. Nontargeted mass spectrometry attempts to extensively screen for compounds, providing a feasible approach for identifying contaminants that warrant future monitoring. We employed a nontargeted analytical method using comprehensive two-dimensional gas chromatography coupled to time-of-flight mass spectrometry (GC×GC/TOF-MS) to characterize halogenated organic compounds (HOCs) in California Black skimmer (Rynchops niger) eggs. Our study identified 111 HOCs; 84 of these compounds were regularly detected via targeted approaches, while 27 were classified as typically unmonitored or unknown. Typically unmonitored compounds of note in bird eggs included tris(4-chlorophenyl)methane (TCPM), tris(4-chlorophenyl)methanol (TCPMOH), triclosan, permethrin, heptachloro-1'-methyl-1,2'-bipyrrole (MBP), as well as four halogenated unknown compounds that could not be identified through database searching or the literature. The presence of these compounds in Black skimmer eggs suggests they are persistent, bioaccumulative, potentially biomagnifying, and maternally transferring. Our results highlight the utility and importance of employing nontargeted analytical tools to assess true contaminant burdens in organisms, as well as to demonstrate the value in using environmental sentinels to proactively identify novel contaminants. PMID:26020245

  3. A novel approach of mining strong jumping emerging patterns based on BSC-tree

    NASA Astrophysics Data System (ADS)

    Liu, Quanzhong; Shi, Peng; Hu, Zhengguo; Zhang, Yang

    2014-03-01

    It is a great challenge to discover strong jumping emerging patterns (SJEPs) from a high-dimensional dataset because of the huge pattern space. In this article, we propose a dynamically growing contrast pattern tree (DGCP-tree) structure to store grown patterns and their path codes arrays with 1-bit counts, which are from the constructed bit string compression tree. A method of mining SJEPs based on DGCP-tree is developed. In order to reduce the pattern search space, we introduce a novel pattern pruning method, which dramatically reduces non-minimal jumping emerging patterns (JEPs) during the mining process. Experiments are performed on three real cancer datasets and three datasets from the University of California, Irvine machine-learning repository. Compared with the well-known CP-tree method, the results show that the proposed method is substantially faster, able to handle higher-dimensional datasets and to prune more non-minimal JEPs.
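
    To make the notion of a jumping emerging pattern concrete, the sketch below enumerates minimal JEPs by brute force: itemsets that occur in a target class, never occur in a background class, and have no proper subset with the same property. This is not the DGCP-tree method of the article; it only illustrates the definition on a toy dataset, and the function names and the `min_count` threshold are assumptions made for the example.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def minimal_jeps(target, background, min_count=1):
    """Brute-force minimal JEPs: support >= min_count in target, 0 in background."""
    items = sorted(set().union(*target))
    found = []
    # Enumerate by increasing size so that subsets are always tested first.
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            cand = frozenset(combo)
            if any(prev <= cand for prev in found):
                continue  # a smaller JEP is contained in cand, so cand is not minimal
            if support(cand, target) >= min_count and support(cand, background) == 0:
                found.append(cand)
    return found


# Toy data (hypothetical): each transaction is a set of items.
target = [{"a", "b", "c"}, {"a", "b"}, {"b", "d"}]
background = [{"a", "c"}, {"b", "c"}, {"c", "d"}]

print(minimal_jeps(target, background))               # patterns {a, b} and {b, d}
print(minimal_jeps(target, background, min_count=2))  # only {a, b}
```

    The enumeration is exponential in the number of items, which is the pattern-space problem that tree-based methods with pruning are designed to avoid.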

  4. A New Approach for Mining Order-Preserving Submatrices Based on All Common Subsequences.

    PubMed

    Xue, Yun; Liao, Zhengling; Li, Meihang; Luo, Jie; Kuang, Qiuhua; Hu, Xiaohui; Li, Tiechen

    2015-01-01

    Order-preserving submatrices (OPSMs) have been applied in many fields, such as DNA microarray data analysis, automatic recommendation systems, and target marketing systems, as an important unsupervised learning model. Unfortunately, most existing methods are heuristic algorithms that cannot reveal all OPSMs, because the problem is NP-complete. In particular, deep OPSMs, corresponding to long patterns with few supporting sequences, incur explosive computational costs and are completely pruned by most popular methods. In this paper, we propose an exact method to discover all OPSMs based on frequent sequential pattern mining. First, an existing algorithm was adjusted to disclose all common subsequences (ACS) between every two row sequences, so that no deep OPSMs are missed. Then, an improved data structure for the prefix tree was used to store and traverse ACS, and the Apriori principle was employed to efficiently mine the frequent sequential patterns. Finally, experiments were implemented on gene and synthetic datasets. Results demonstrated the effectiveness and efficiency of this method.
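
    The link between OPSMs and subsequences can be sketched as follows: each row is turned into the sequence of its column indices sorted by value, and a column pattern is supported by a row when the pattern is a subsequence of that sequence. The code is a minimal illustration of this idea on a toy matrix with distinct values per row (an assumption); it is not the exact ACS algorithm of the paper, and all names are hypothetical.

```python
def column_sequence(row):
    """Column indices of a row, ordered by increasing value."""
    return sorted(range(len(row)), key=lambda j: row[j])


def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order (not necessarily contiguously)."""
    it = iter(sequence)
    return all(col in it for col in pattern)


def supporting_rows(matrix, pattern):
    """Rows whose values strictly increase along the columns listed in pattern."""
    return [i for i, row in enumerate(matrix)
            if is_subsequence(pattern, column_sequence(row))]


# Toy expression matrix (hypothetical values): rows are genes, columns are conditions.
matrix = [
    [1, 5, 3, 9],
    [2, 8, 4, 7],
    [9, 1, 6, 2],
    [0, 6, 2, 8],
]

print(supporting_rows(matrix, (0, 2, 1)))     # rows 0, 1 and 3
print(supporting_rows(matrix, (0, 2, 1, 3)))  # rows 0 and 3
```

    The second call shows the "deep" case from the abstract: a longer pattern supported by fewer rows.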

  5. A fuzzy approach for mining association rules in a probabilistic database

    NASA Astrophysics Data System (ADS)

    Pei, Bin; Chen, Dingjie; Zhao, Suyun; Chen, Hong

    2013-07-01

    Association rule mining is an essential knowledge discovery method that can find associations in a database. Previous studies on association rule mining focus on finding quantitative association rules from certain data, or finding Boolean association rules from uncertain data. Unfortunately, due to instrument errors, imprecision of sensor monitoring systems and so on, real-world data tend to be quantitative data with inherent uncertainty. In our paper, we study the discovery of association rules from a probabilistic database with quantitative attributes. Once we convert quantitative attributes into fuzzy sets, we get a probabilistic database with fuzzy sets in the database. This is theoretically challenging, since we need to give appropriate interest measures to define the support and confidence degree of fuzzy events with probability. We propose a Shannon-like entropy to measure the information of such events. After that, an algorithm is proposed to find fuzzy association rules from a probabilistic database. Finally, an illustrative example is given to demonstrate the procedure of the algorithm.

  6. A reclamation approach for mined prime farmland by adding organic wastes and lime to the subsoil

    SciTech Connect

    Zhai, Qiang; Barnhisel, R.I.

    1996-12-31

    Surface mined prime farmland may be reclaimed by adding organic wastes and lime to the subsoil, thus improving conditions in the root zone. In this study, sewage sludge, poultry manure, horse bedding, and lime were applied to the subsoil (15-30 cm) during reclamation. Soil properties and plant growth were measured over two years. All organic amendments tended to lower the subsoil bulk density and increase organic matter and total nitrogen. Liming raised exchangeable calcium and slightly increased pH, but decreased exchangeable magnesium and potassium. Corn ear-leaf and forage tissue nitrogen, yields, and nitrogen removal increased in treatments amended with sewage sludge and poultry manure, but not horse bedding. Subsoil application of sewage sludge or poultry manure appears to be a promising method for reclaiming surface mined prime farmland, based on the improvements observed in the root zone environment.

  7. Perspective: NanoMine: A material genome approach for polymer nanocomposites analysis and design

    NASA Astrophysics Data System (ADS)

    Zhao, He; Li, Xiaolin; Zhang, Yichi; Schadler, Linda S.; Chen, Wei; Brinson, L. Catherine

    2016-05-01

    Polymer nanocomposites are a designer class of materials where nanoscale particles, functional chemistry, and polymer resin combine to provide materials with unprecedented combinations of physical properties. In this paper, we introduce NanoMine, a data-driven web-based platform for analysis and design of polymer nanocomposite systems under the material genome concept. This open data resource strives to curate experimental and computational data on nanocomposite processing, structure, and properties, as well as to provide analysis and modeling tools that leverage curated data for material property prediction and design. With a continuously expanding dataset and toolkit, NanoMine encourages community feedback and input to construct a sustainable infrastructure that benefits nanocomposite material research and development.

  8. Quantitative and qualitative approaches to identifying migration chronology in a continental migrant

    USGS Publications Warehouse

    Beatty, William S.; Kesler, Dylan C.; Webb, Elisabeth B.; Raedeke, Andrew H.; Naylor, Luke W.; Humburg, Dale D.

    2013-01-01

    The degree to which extrinsic factors influence migration chronology in North American waterfowl has not been quantified, particularly for dabbling ducks. Previous studies have examined waterfowl migration using various methods; however, quantitative approaches to define avian migration chronology over broad spatio-temporal scales are limited, and the implications for using different approaches have not been assessed. We used movement data from 19 female adult mallards (Anas platyrhynchos) equipped with solar-powered global positioning system satellite transmitters to evaluate two individual level approaches for quantifying migration chronology. The first approach defined migration based on individual movements among geopolitical boundaries (state, provincial, international), whereas the second method modeled net displacement as a function of time using nonlinear models. Differences in migration chronologies identified by each of the approaches were examined with analysis of variance. The geopolitical method identified mean autumn migration midpoints at 15 November 2010 and 13 November 2011, whereas the net displacement method identified midpoints at 15 November 2010 and 14 November 2011. The mean midpoints for spring migration were 3 April 2011 and 20 March 2012 using the geopolitical method and 31 March 2011 and 22 March 2012 using the net displacement method. The duration, initiation date, midpoint, and termination date for both autumn and spring migration did not differ between the two individual level approaches. Although we did not detect differences in migration parameters between the different approaches, the net displacement metric offers broad potential to address questions in movement ecology for migrating species. Ultimately, an objective definition of migration chronology will allow researchers to obtain a comprehensive understanding of the extrinsic factors that drive migration at the individual and population levels. As a result, targeted

  9. Quantitative and qualitative approaches to identifying migration chronology in a continental migrant.

    PubMed

    Beatty, William S; Kesler, Dylan C; Webb, Elisabeth B; Raedeke, Andrew H; Naylor, Luke W; Humburg, Dale D

    2013-01-01

    The degree to which extrinsic factors influence migration chronology in North American waterfowl has not been quantified, particularly for dabbling ducks. Previous studies have examined waterfowl migration using various methods; however, quantitative approaches to define avian migration chronology over broad spatio-temporal scales are limited, and the implications for using different approaches have not been assessed. We used movement data from 19 female adult mallards (Anas platyrhynchos) equipped with solar-powered global positioning system satellite transmitters to evaluate two individual level approaches for quantifying migration chronology. The first approach defined migration based on individual movements among geopolitical boundaries (state, provincial, international), whereas the second method modeled net displacement as a function of time using nonlinear models. Differences in migration chronologies identified by each of the approaches were examined with analysis of variance. The geopolitical method identified mean autumn migration midpoints at 15 November 2010 and 13 November 2011, whereas the net displacement method identified midpoints at 15 November 2010 and 14 November 2011. The mean midpoints for spring migration were 3 April 2011 and 20 March 2012 using the geopolitical method and 31 March 2011 and 22 March 2012 using the net displacement method. The duration, initiation date, midpoint, and termination date for both autumn and spring migration did not differ between the two individual level approaches. Although we did not detect differences in migration parameters between the different approaches, the net displacement metric offers broad potential to address questions in movement ecology for migrating species. Ultimately, an objective definition of migration chronology will allow researchers to obtain a comprehensive understanding of the extrinsic factors that drive migration at the individual and population levels. As a result, targeted

  10. Quantitative and Qualitative Approaches to Identifying Migration Chronology in a Continental Migrant

    PubMed Central

    Beatty, William S.; Kesler, Dylan C.; Webb, Elisabeth B.; Raedeke, Andrew H.; Naylor, Luke W.; Humburg, Dale D.

    2013-01-01

    The degree to which extrinsic factors influence migration chronology in North American waterfowl has not been quantified, particularly for dabbling ducks. Previous studies have examined waterfowl migration using various methods; however, quantitative approaches to define avian migration chronology over broad spatio-temporal scales are limited, and the implications for using different approaches have not been assessed. We used movement data from 19 female adult mallards (Anas platyrhynchos) equipped with solar-powered global positioning system satellite transmitters to evaluate two individual level approaches for quantifying migration chronology. The first approach defined migration based on individual movements among geopolitical boundaries (state, provincial, international), whereas the second method modeled net displacement as a function of time using nonlinear models. Differences in migration chronologies identified by each of the approaches were examined with analysis of variance. The geopolitical method identified mean autumn migration midpoints at 15 November 2010 and 13 November 2011, whereas the net displacement method identified midpoints at 15 November 2010 and 14 November 2011. The mean midpoints for spring migration were 3 April 2011 and 20 March 2012 using the geopolitical method and 31 March 2011 and 22 March 2012 using the net displacement method. The duration, initiation date, midpoint, and termination date for both autumn and spring migration did not differ between the two individual level approaches. Although we did not detect differences in migration parameters between the different approaches, the net displacement metric offers broad potential to address questions in movement ecology for migrating species. Ultimately, an objective definition of migration chronology will allow researchers to obtain a comprehensive understanding of the extrinsic factors that drive migration at the individual and population levels. As a result, targeted

  11. Comparative Characterization of Crofelemer Samples Using Data Mining and Machine Learning Approaches With Analytical Stability Data Sets.

    PubMed

    Nariya, Maulik K; Kim, Jae Hyun; Xiong, Jian; Kleindl, Peter A; Hewarathna, Asha; Fisher, Adam C; Joshi, Sangeeta B; Schöneich, Christian; Forrest, M Laird; Middaugh, C Russell; Volkin, David B; Deeds, Eric J

    2017-07-22

    There is growing interest in generating physicochemical and biological analytical data sets to compare complex mixture drugs, for example, products from different manufacturers. In this work, we compare various crofelemer samples prepared from a single lot by filtration with varying molecular weight cutoffs combined with incubation for different times at different temperatures. The 2 preceding articles describe experimental data sets generated from analytical characterization of fractionated and degraded crofelemer samples. In this work, we use data mining techniques such as principal component analysis and mutual information scores to help visualize the data and determine discriminatory regions within these large data sets. The mutual information score identifies chemical signatures that differentiate crofelemer samples. These signatures, in many cases, would likely be missed by traditional data analysis tools. We also found that supervised learning classifiers robustly discriminate samples with around 99% classification accuracy, indicating that mathematical models of these physicochemical data sets are capable of identifying even subtle differences in crofelemer samples. Data mining and machine learning techniques can thus identify fingerprint-type attributes of complex mixture drugs that may be used for comparative characterization of products. Copyright © 2017 American Pharmacists Association®. All rights reserved.
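
    The mutual information score mentioned above can be illustrated for a single discretized measurement and a class label. The sketch below computes I(X;Y) = Σ p(x,y) log2[p(x,y) / (p(x)p(y))] from empirical counts; the binning of the measurements, the sample labels, and the function name are assumptions made for the example and do not come from the study.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        # p(x,y) / (p(x) p(y)) simplifies to count * n / (count_x * count_y).
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


# Hypothetical sample labels and two binned measurement channels.
labels = ["A", "A", "B", "B"]
informative = [0, 0, 1, 1]    # tracks the label exactly
uninformative = [0, 1, 0, 1]  # independent of the label

print(mutual_information(informative, labels))    # 1.0 bit
print(mutual_information(uninformative, labels))  # 0.0 bits
```

    Ranking measurement channels by such a score is one simple way to locate regions of a data set that discriminate between samples.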

  12. A cross-species bi-clustering approach to identifying conserved co-regulated genes

    PubMed Central

    Sun, Jiangwen; Jiang, Zongliang; Tian, Xiuchun; Bi, Jinbo

    2016-01-01

    Motivation: A growing number of studies have explored the process of pre-implantation embryonic development of multiple mammalian species. However, the conservation and variation among different species in their developmental programming are poorly defined due to the lack of effective computational methods for detecting co-regulated genes that are conserved across species. The most sophisticated method to date for identifying conserved co-regulated genes is a two-step approach. This approach first identifies gene clusters for each species by a cluster analysis of gene expression data, and subsequently computes the overlaps of clusters identified from different species to reveal common subgroups. This approach is ineffective at dealing with the noise in the expression data introduced by the complicated procedures in quantifying gene expression. Furthermore, due to the sequential nature of the approach, the gene clusters identified in the first step may have little overlap among different species in the second step, making it difficult to detect conserved co-regulated genes. Results: We propose a cross-species bi-clustering approach which first denoises the gene expression data of each species into a data matrix. The rows of the data matrices of different species represent the same set of genes that are characterized by their expression patterns over the developmental stages of each species as columns. A novel bi-clustering method is then developed to cluster genes into subgroups by a joint sparse rank-one factorization of all the data matrices. This method decomposes a data matrix into a product of a column vector and a row vector where the column vector is a consistent indicator across the matrices (species) to identify the same gene cluster and the row vector specifies for each species the developmental stages that the clustered genes co-regulate. An efficient optimization algorithm has been developed with convergence analysis. This approach was first validated on

  13. Using an interdisciplinary approach to identify factors that affect clinicians' compliance with evidence-based guidelines.

    PubMed

    Gurses, Ayse P; Marsteller, Jill A; Ozok, A Ant; Xiao, Yan; Owens, Sharon; Pronovost, Peter J

    2010-08-01

    Our objective was to identify factors that affect clinicians' compliance with the evidence-based guidelines using an interdisciplinary approach and develop a conceptual framework that can provide a comprehensive and practical guide for designing effective interventions. A literature review and a brainstorming session with 11 researchers from a variety of scientific disciplines were used to identify theoretical and conceptual models describing clinicians' guideline compliance. MEDLINE, EMBASE, CINAHL, and the bibliographies of the papers identified were used as data sources for identifying the relevant theoretical and conceptual models. Thirteen different models that originated from various disciplines including medicine, rural sociology, psychology, human factors and systems engineering, organizational management, marketing, and health education were identified. Four main categories of factors that affect compliance emerged from our analysis: clinician characteristics, guideline characteristics, system characteristics, and implementation characteristics. Based on these findings, we developed an interdisciplinary conceptual framework that specifies the expected interrelationships among these four categories of factors and their impact on clinicians' compliance. An interdisciplinary approach is needed to improve clinicians' compliance with evidence-based guidelines. The conceptual framework from this research can provide a comprehensive and systematic guide to identify barriers to guideline compliance and design effective interventions to improve patient safety.

  14. Identifying inhibitory compounds in lignocellulosic biomass hydrolysates using an exometabolomics approach

    PubMed Central

    2014-01-01

    Background: Inhibitors are formed that reduce the fermentation performance of fermenting yeast during the pretreatment process of lignocellulosic biomass. An exometabolomics approach was applied to systematically identify inhibitors in lignocellulosic biomass hydrolysates. Results: We studied the composition and fermentability of 24 different biomass hydrolysates. To create diversity, the 24 hydrolysates were prepared from six different biomass types, namely sugar cane bagasse, corn stover, wheat straw, barley straw, willow wood chips and oak sawdust, and with four different pretreatment methods, i.e. dilute acid, mild alkaline, alkaline/peracetic acid and concentrated acid. Their composition and that of fermentation samples generated with these hydrolysates were analyzed with two GC-MS methods. Either ethyl acetate extraction or ethyl chloroformate derivatization was used before conducting GC-MS to prevent sugars from overloading the chromatograms, which would obscure the detection of less abundant compounds. Using multivariate PLS-2CV and nPLS-2CV data analysis models, potential inhibitors were identified by establishing relationships between fermentability and composition of the hydrolysates. These identified compounds were tested for their effects on the growth of the model yeast, Saccharomyces cerevisiae CEN.PK 113-7D, confirming that the majority of the identified compounds were indeed inhibitors. Conclusion: Inhibitory compounds in lignocellulosic biomass hydrolysates were successfully identified using a non-targeted systematic approach: metabolomics. The identified inhibitors include both known ones, such as furfural, HMF and vanillin, and novel inhibitors, namely sorbic acid and phenylacetaldehyde. PMID:24655423

  15. Missing defects? A comparison of microscopic and macroscopic approaches to identifying linear enamel hypoplasia.

    PubMed

    Hassett, Brenna R

    2014-03-01

    Linear enamel hypoplasia (LEH), the presence of linear defects of dental enamel formed during periods of growth disruption, is frequently analyzed in physical anthropology as evidence for childhood health in the past. However, a wide variety of methods for identifying and interpreting these defects in archaeological remains exists, preventing easy cross-comparison of results from disparate studies. This article compares a standard approach to identifying LEH using the naked eye to the evidence of growth disruption observed microscopically from the enamel surface. This comparison demonstrates that what is interpreted as evidence of growth disruption microscopically is not uniformly identified with the naked eye, and provides a reference for the level of consistency between the number and timing of defects identified using microscopic versus macroscopic approaches. This is done for different tooth types using a large sample of unworn permanent teeth drawn from several post-medieval London burial assemblages. The resulting schematic diagrams showing where macroscopic methods achieve more or less similar results to microscopic methods are presented here and clearly demonstrate that "naked-eye" methods of identifying growth disruptions do not identify LEH as often as microscopic methods in areas where perikymata are more densely packed.

  16. Identifying inhibitory compounds in lignocellulosic biomass hydrolysates using an exometabolomics approach.

    PubMed

    Zha, Ying; Westerhuis, Johan A; Muilwijk, Bas; Overkamp, Karin M; Nijmeijer, Bernadien M; Coulier, Leon; Smilde, Age K; Punt, Peter J

    2014-03-21

    Inhibitors are formed that reduce the fermentation performance of fermenting yeast during the pretreatment process of lignocellulosic biomass. An exometabolomics approach was applied to systematically identify inhibitors in lignocellulosic biomass hydrolysates. We studied the composition and fermentability of 24 different biomass hydrolysates. To create diversity, the 24 hydrolysates were prepared from six different biomass types, namely sugar cane bagasse, corn stover, wheat straw, barley straw, willow wood chips and oak sawdust, and with four different pretreatment methods, i.e. dilute acid, mild alkaline, alkaline/peracetic acid and concentrated acid. Their composition and that of fermentation samples generated with these hydrolysates were analyzed with two GC-MS methods. Either ethyl acetate extraction or ethyl chloroformate derivatization was used before conducting GC-MS to prevent sugars from overloading the chromatograms, which would obscure the detection of less abundant compounds. Using multivariate PLS-2CV and nPLS-2CV data analysis models, potential inhibitors were identified by establishing relationships between fermentability and composition of the hydrolysates. These identified compounds were tested for their effects on the growth of the model yeast, Saccharomyces cerevisiae CEN.PK 113-7D, confirming that the majority of the identified compounds were indeed inhibitors. Inhibitory compounds in lignocellulosic biomass hydrolysates were successfully identified using a non-targeted systematic approach: metabolomics. The identified inhibitors include both known ones, such as furfural, HMF and vanillin, and novel inhibitors, namely sorbic acid and phenylacetaldehyde.

  17. A comparison of approaches for finding minimum identifying codes on graphs

    NASA Astrophysics Data System (ADS)

    Horan, Victoria; Adachi, Steve; Bak, Stanley

    2016-05-01

    In order to formulate mathematical conjectures likely to be true, a number of base cases must be determined. However, many combinatorial problems are NP-hard and the computational complexity makes this research approach difficult using a standard brute force approach on a typical computer. One sample problem explored is that of finding a minimum identifying code. To work around the computational issues, a variety of methods are explored and consist of a parallel computing approach using MATLAB, an adiabatic quantum optimization approach using a D-Wave quantum annealing processor, and lastly using satisfiability modulo theory (SMT) and corresponding SMT solvers. Each of these methods requires the problem to be formulated in a unique manner. In this paper, we address the challenges of computing solutions to this NP-hard problem with respect to each of these methods.
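
    For reference, the brute-force baseline mentioned above can be sketched in a few lines. A vertex subset C is an identifying code when the sets N[v] ∩ C (closed neighborhood of v intersected with C) are non-empty and pairwise distinct for all vertices v. The code below searches subsets by increasing size; it is an illustrative sketch, not one of the MATLAB, D-Wave, or SMT formulations compared in the paper, and the function names and the example graph are hypothetical.

```python
from itertools import combinations


def is_identifying_code(adj, code):
    """Check that every vertex has a non-empty, unique signature N[v] & code."""
    code = set(code)
    seen = set()
    for v, neighbors in adj.items():
        signature = frozenset((set(neighbors) | {v}) & code)
        if not signature or signature in seen:
            return False
        seen.add(signature)
    return True


def minimum_identifying_code(adj):
    """Exhaustive search by increasing size; returns None if no code exists."""
    vertices = sorted(adj)
    for size in range(1, len(vertices) + 1):
        for candidate in combinations(vertices, size):
            if is_identifying_code(adj, candidate):
                return set(candidate)
    return None  # e.g. graphs with two vertices sharing a closed neighborhood


# Path graph on four vertices: 0 - 1 - 2 - 3 (adjacency lists).
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(minimum_identifying_code(path4))  # a code of size 3

# A single edge has no identifying code: both vertices have N[v] = {0, 1}.
edge = {0: [1], 1: [0]}
print(minimum_identifying_code(edge))   # None
```

    The search visits up to 2^n subsets, which is why the exhaustive approach becomes impractical beyond small graphs.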

  18. In silico and in vitro drug screening identifies new therapeutic approaches for Ewing sarcoma

    PubMed Central

    Pessetto, Ziyan Y.; Chen, Bin; Alturkmani, Hani; Hyter, Stephen; Flynn, Colleen A.; Baltezor, Michael; Ma, Yan; Rosenthal, Howard G.; Neville, Kathleen A.; Weir, Scott J.; Butte, Atul J.; Godwin, Andrew K.

    2017-01-01

    The long-term overall survival of Ewing sarcoma (EWS) patients remains poor; less than 30% of patients with metastatic or recurrent disease survive despite aggressive combinations of chemotherapy, radiation and surgery. To identify new therapeutic options, we employed a multi-pronged approach using in silico predictions of drug activity via an integrated bioinformatics approach in parallel with an in vitro screen of FDA-approved drugs. Twenty-seven drugs and forty-six drugs were identified, respectively, to have anti-proliferative effects for EWS, including several classes of drugs in both screening approaches. Among these drugs, 30 were extensively validated as mono-therapeutic agents and 9 in 14 various combinations in vitro. Two drugs, auranofin, a thioredoxin reductase inhibitor, and ganetespib, an HSP90 inhibitor, were predicted to have anti-cancer activities in silico and were confirmed active across a panel of genetically diverse EWS cells. When given in combination, the survival rate in vivo was superior compared to auranofin or ganetespib alone. Importantly, extensive formulations, dose tolerance, and pharmacokinetics studies demonstrated that auranofin requires alternative delivery routes to achieve therapeutically effective levels of the gold compound. These combined screening approaches provide a rapid means to identify new treatment options for patients with a rare and often-fatal disease. PMID:27863422

  19. Support for information management in critical care: a new approach to identify needs.

    PubMed Central

    Rosenal, T. W.; Forsythe, D. E.; Musen, M. A.; Seiver, A.

    1995-01-01

    Managing information is necessary to support clinical decision making and action in critical care. By understanding the nature of information management and its relationship to sound clinical practice, we should come to use technology more wisely. We demonstrated that a new approach inspired by ethnographic research methods could identify useful and unexpected findings about clinical information management. In this approach, a clinician experienced in a specific domain (critical care), with advice from a medical anthropologist, made short-term observations of information management in that domain. We identified 8 areas in a critical care Unit in which information management was seriously in need of better support. We also found interesting differences in how these needs were viewed by nurses and physicians. Our interest in this approach was at two levels: 1. Identify and describe representative instances of sub-optimal information management in a critical care Unit. 2. Investigate the effectiveness of such short-term observations by clinicians. Our long-range goal is to explore the use of this approach and the information it reveals to optimize the process of developing and selecting new information support tools, preparing for their introduction, and optimizing clinical outcomes. PMID:8563267

  20. Alternative approaches for identifying acute systemic toxicity: Moving from research to regulatory testing.

    PubMed

    Hamm, Jon; Sullivan, Kristie; Clippinger, Amy J; Strickland, Judy; Bell, Shannon; Bhhatarai, Barun; Blaauboer, Bas; Casey, Warren; Dorman, David; Forsby, Anna; Garcia-Reyero, Natàlia; Gehen, Sean; Graepel, Rabea; Hotchkiss, Jon; Lowit, Anna; Matheson, Joanna; Reaves, Elissa; Scarano, Louis; Sprankle, Catherine; Tunkel, Jay; Wilson, Dan; Xia, Menghang; Zhu, Hao; Allen, David

    2017-01-06

    Acute systemic toxicity testing provides the basis for hazard labeling and risk management of chemicals. A number of international efforts have been directed at identifying non-animal alternatives for in vivo acute systemic toxicity tests. A September 2015 workshop, Alternative Approaches for Identifying Acute Systemic Toxicity: Moving from Research to Regulatory Testing, reviewed the state-of-the-science of non-animal alternatives for this testing and explored ways to facilitate implementation of alternatives. Workshop attendees included representatives from international regulatory agencies, academia, nongovernmental organizations, and industry. Resources identified as necessary for meaningful progress in implementing alternatives included compiling and making available high-quality reference data, training on use and interpretation of in vitro and in silico approaches, and global harmonization of testing requirements. Attendees particularly noted the need to characterize variability in reference data to evaluate new approaches. They also noted the importance of understanding the mechanisms of acute toxicity, which could be facilitated by the development of adverse outcome pathways. Workshop breakout groups explored different approaches to reducing or replacing animal use for acute toxicity testing, with each group crafting a roadmap and strategy to accomplish near-term progress. The workshop steering committee has organized efforts to implement the recommendations of the workshop participants.

  1. An information-theoretic approach to assess practical identifiability of parametric dynamical systems.

    PubMed

    Pant, Sanjay; Lombardi, Damiano

    2015-10-01

    A new approach for assessing parameter identifiability of dynamical systems in a Bayesian setting is presented. The concept of Shannon entropy is employed to measure the inherent uncertainty in the parameters. The expected reduction in this uncertainty is seen as the amount of information one expects to gain about the parameters due to the availability of noisy measurements of the dynamical system. Such expected information gain is interpreted in terms of the variance of a hypothetical measurement device that can measure the parameters directly, and is related to practical identifiability of the parameters. If the individual parameters are unidentifiable, correlation between parameter combinations is assessed through conditional mutual information to determine which sets of parameters can be identified together. The information theoretic quantities of entropy and information are evaluated numerically through a combination of Monte Carlo and k-nearest neighbour methods in a non-parametric fashion. Unlike many methods to evaluate identifiability proposed in the literature, the proposed approach takes the measurement-noise into account and is not restricted to any particular noise-structure. Whilst computationally intensive for large dynamical systems, it is easily parallelisable and is non-intrusive as it does not necessitate re-writing of the numerical solvers of the dynamical system. The application of such an approach is presented for a variety of dynamical systems--ranging from systems governed by ordinary differential equations to partial differential equations--and, where possible, validated against results previously published in the literature.
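
    For orientation, the expected information gain described in this abstract can be written in standard information-theoretic notation. This is a generic sketch rather than the authors' exact formulation; the symbols \theta (parameters), y (noisy measurements), and the densities p(\cdot) are assumed notation.

```latex
\begin{aligned}
H(\Theta) &= -\int p(\theta)\,\log p(\theta)\,\mathrm{d}\theta, \\
I(\Theta; Y) &= H(\Theta) - \mathbb{E}_{y \sim p(y)}\!\left[ H(\Theta \mid Y = y) \right]
  = \iint p(\theta, y)\,\log\frac{p(\theta, y)}{p(\theta)\,p(y)}\,\mathrm{d}\theta\,\mathrm{d}y .
\end{aligned}
```

    Under this reading, a value of I(\Theta; Y) close to zero means the measurements are expected to reduce parameter uncertainty very little, which is the situation the abstract associates with poor practical identifiability.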

  2. Referred by Google: mining Trends data to identify patterns in and correlates to searches for dermatologic concerns and providers.

    PubMed

    Ransohoff, J D; Sarin, K Y

    2017-03-24

    Google Trends is a powerful tool that provides population-level insight into search volumes by time and geography. Since 2004, Google Trends has been profiled across studies of public interest, disease surveillance, prevention, and compliance.(1,2) Dermatologists have used Trends data to identify seasonal peaks in skin cancer and tanning searches.(3,4) Other Google tools, including Health Cards(5) and Reverse Image Searching(6), have been explored by dermatologists for generating differential diagnoses. This article is protected by copyright. All rights reserved.

  3. A novel approach to identify driver genes involved in androgen-independent prostate cancer

    PubMed Central

    2014-01-01

    Background: Insertional mutagenesis screens have been used with great success to identify oncogenes and tumor suppressor genes. Typically, these screens use gammaretroviruses (γRV) or transposons as insertional mutagens. However, insertional mutations from replication-competent γRVs or transposons that occur later during oncogenesis can produce passenger mutations that do not drive cancer progression. Here, we utilized a replication-incompetent lentiviral vector (LV) to perform an insertional mutagenesis screen to identify genes in the progression to androgen-independent prostate cancer (AIPC). Methods: Prostate cancer cells were mutagenized with an LV to enrich for clones with a selective advantage in an androgen-deficient environment provided by a dysregulated gene(s) near the vector integration site. We performed our screen using an in vitro AIPC model and also an in vivo xenotransplant model for AIPC. Our approach identified proviral integration sites utilizing a shuttle vector that allows for rapid rescue of plasmids in E. coli that contain LV long terminal repeat (LTR)-chromosome junctions. This shuttle vector approach does not require PCR amplification and has several advantages over PCR-based techniques. Results: Proviral integrations were enriched near prostate cancer susceptibility loci in cells grown in androgen-deficient medium (p < 0.001), and five candidate genes that influence AIPC were identified: ATPAF1, GCOM1, MEX3D, PTRF, and TRPM4. Additionally, we showed that RNAi knockdown of ATPAF1 significantly reduces growth (p < 0.05) in androgen-deficient conditions. Conclusions: Our approach has proven effective for use in PCa, identifying a known prostate cancer gene, PTRF, and also several genes not previously associated with prostate cancer. The replication-incompetent shuttle vector approach has broad potential applications for cancer gene discovery, and for interrogating diverse biological and disease processes. PMID:24885513

  4. Three novel approaches to structural identifiability analysis in mixed-effects models.

    PubMed

    Janzén, David L I; Jirstrand, Mats; Chappell, Michael J; Evans, Neil D

    2016-05-06

    Structural identifiability is a concept that considers whether the structure of a model together with a set of input-output relations uniquely determines the model parameters. In the mathematical modelling of biological systems, structural identifiability is an important concept since biological interpretations are typically made from the parameter estimates. For a system defined by ordinary differential equations, several methods have been developed to analyse whether the model is structurally identifiable or otherwise. Another well-used modelling framework, which is particularly useful when the experimental data are sparsely sampled and the population variance is of interest, is mixed-effects modelling. However, established identifiability analysis techniques for ordinary differential equations are not directly applicable to such models. In this paper, we present and apply three different methods that can be used to study structural identifiability in mixed-effects models. The first method, called the repeated measurement approach, is based on applying a set of previously established statistical theorems. The second method, called the augmented system approach, is based on augmenting the mixed-effects model to an extended state-space form. The third method, called the Laplace transform mixed-effects extension, is based on considering the moment invariants of the system's transfer function as functions of random variables. To illustrate, compare and contrast the application of the three methods, they are applied to a set of mixed-effects models. Three structural identifiability analysis methods applicable to mixed-effects models have been presented in this paper. As method development of structural identifiability techniques for mixed-effects models has been given very little attention, despite mixed-effects models being widely used, the methods presented in this paper provide a way of handling structural identifiability in mixed-effects models previously not

  5. Identifying Predictors of Arsenic Bioavailability in Low-Sulfide, Quartz-Hosted Gold Deposits: Case Study at the Empire Mine State Historic Park, CA, USA

    NASA Astrophysics Data System (ADS)

    Foster, A. L.; Alpers, C. N.; Burlak Regnier, T.; Blum, A.; Petersen, E. U.; Basta, N. T.; Whitacre, S.; Casteel, S. W.; Kim, C. S.

    2016-12-01

    Introduction: This study addressed a need to identify geochemical and mineralogical parameters that are significantly correlated with arsenic bioavailability at historically-mined, low-sulfide, quartz-hosted ("lode") gold deposits. The study location was the Empire Mine State Historic Park (EMSHP), a site that is typical of many lode deposits in California in that arsenic is a primary contaminant of concern. Methods: A total of 25 large-volume sediment/mine waste samples were collected from sites in the EMSHP, homogenized, and dry sieved (< 250 micron). The following datasets were collected from the 25 samples (or a subset thereof as indicated): (1) in vivo relative As bioavailability (juvenile swine; n = 12); (2) in vitro relative As bioaccessibility (n = 25); (3) solid-phase chemistry (XRF; n = 25); (4) quantitative mineralogy (n = 25); (5) bulk As and iron (Fe) speciation (synchrotron X-ray absorption spectroscopy, XAS; n = 19); (6) point-based micron-scale composition (electron microprobe; n = 12); and (7) micron-scale mineralogical and compositional mapping (QEMSCAN; n = 12). The matrix of bivariate correlations among these datasets was evaluated using a cutoff criterion for significance of p < 0.05. Results: Arsenic bioavailability was positively and significantly correlated with the abundance of Fe (hydr)oxide, the relative abundance of As-bearing hydroxide and As concentration in Fe hydroxide (datasets 4, 5, and 6, respectively). The relative abundance of As associated with Al-bearing secondary minerals (determined by As-XAS) was also positively and significantly correlated with datasets (1) and (2), but the correlation quality was lower. The relative abundance of other arsenic-bearing secondary minerals (e.g., jarosite, calcium arsenate, arseniosiderite) as determined by As XAS had positive correlations with bioaccessibility and/or bioavailability, but the correlations were not statistically significant. We ascribe this result to the fact that these phases

  6. Objective definition of rosette shape variation using a combined computer vision and data mining approach.

    PubMed

    Camargo, Anyela; Papadopoulou, Dimitra; Spyropoulou, Zoi; Vlachonasios, Konstantinos; Doonan, John H; Gay, Alan P

    2014-01-01

    Computer-vision based measurements of phenotypic variation have implications for crop improvement and food security because they are intrinsically objective. It should be possible therefore to use such approaches to select robust genotypes. However, plants are morphologically complex and identification of meaningful traits from automatically acquired image data is not straightforward. Bespoke algorithms can be designed to capture and/or quantitate specific features but this approach is inflexible and is not generally applicable to a wide range of traits. In this paper, we have used industry-standard computer vision techniques to extract a wide range of features from images of genetically diverse Arabidopsis rosettes growing under non-stimulated conditions, and then used statistical analysis to identify those features that provide good discrimination between ecotypes. This analysis indicates that almost all the observed shape variation can be described by 5 principal components. We describe an easily implemented pipeline including image segmentation, feature extraction and statistical analysis. This pipeline provides a cost-effective and inherently scalable method to parameterise and analyse variation in rosette shape. The acquisition of images does not require any specialised equipment and the computer routines for image processing and data analysis have been implemented using open source software. Source code for data analysis is written using the R package. The equations to calculate image descriptors have been also provided.

  7. Objective Definition of Rosette Shape Variation Using a Combined Computer Vision and Data Mining Approach

    PubMed Central

    Camargo, Anyela; Papadopoulou, Dimitra; Spyropoulou, Zoi; Vlachonasios, Konstantinos; Doonan, John H.; Gay, Alan P.

    2014-01-01

    Computer-vision based measurements of phenotypic variation have implications for crop improvement and food security because they are intrinsically objective. It should be possible therefore to use such approaches to select robust genotypes. However, plants are morphologically complex and identification of meaningful traits from automatically acquired image data is not straightforward. Bespoke algorithms can be designed to capture and/or quantitate specific features but this approach is inflexible and is not generally applicable to a wide range of traits. In this paper, we have used industry-standard computer vision techniques to extract a wide range of features from images of genetically diverse Arabidopsis rosettes growing under non-stimulated conditions, and then used statistical analysis to identify those features that provide good discrimination between ecotypes. This analysis indicates that almost all the observed shape variation can be described by 5 principal components. We describe an easily implemented pipeline including image segmentation, feature extraction and statistical analysis. This pipeline provides a cost-effective and inherently scalable method to parameterise and analyse variation in rosette shape. The acquisition of images does not require any specialised equipment and the computer routines for image processing and data analysis have been implemented using open source software. Source code for data analysis is written using the R package. The equations to calculate image descriptors have been also provided. PMID:24804972

  8. An Integrated Approach Identifies Mediators of Local Recurrence in Head and Neck Squamous Carcinoma.

    PubMed

    Citron, Francesca; Armenia, Joshua; Franchin, Giovanni; Polesel, Jerry; Talamini, Renato; D'Andrea, Sara; Sulfaro, Sandro; Croce, Carlo M; Klement, William; Otasek, David; Pastrello, Chiara; Tokar, Tomas; Jurisica, Igor; French, Deborah; Bomben, Riccardo; Vaccher, Emanuela; Serraino, Diego; Belletti, Barbara; Vecchione, Andrea; Barzan, Luigi; Baldassarre, Gustavo

    2017-07-15

    Purpose: Head and neck squamous cell carcinomas (HNSCCs) cause more than 300,000 deaths worldwide each year. Locoregional and distant recurrences represent worse prognostic events and accepted surrogate markers of patients' overall survival. No valid biomarker and salvage therapy exist to identify and treat patients at high risk of recurrence. We aimed to verify if selected miRNAs could be used as biomarkers of recurrence in HNSCC. Experimental Design: A NanoString array was used to identify miRNAs associated with locoregional recurrence in 44 patients with HNSCC. Bioinformatic approaches validated the signature and identified potential miRNA targets. Validation experiments were performed using an independent cohort of primary HNSCC samples and a panel of HNSCC cell lines. In vivo experiments validated the in vitro results. Results: Our data identified a four-miRNA signature that classified HNSCC patients at high or low risk of recurrence. These miRNAs collectively impinge on the epithelial-mesenchymal transition process. In silico and wet lab approaches showed that miR-9, expressed at high levels in recurrent HNSCC, targets SASH1 and KRT13, whereas miR-1, miR-133, and miR-150, expressed at low levels in recurrent HNSCC, collectively target SP1 and TGFβ pathways. A six-gene signature comprising these targets identified patients at high risk of recurrences, as well. Combined pharmacological inhibition of SP1 and TGFβ pathways induced HNSCC cell death and, when timely administered, prevented recurrence formation in a preclinical model of HNSCC recurrence. Conclusions: By integrating different experimental approaches and competences, we identified critical mediators of recurrence formation in HNSCC that may merit consideration for future clinical development. Clin Cancer Res; 23(14); 3769-80. ©2017 American Association for Cancer Research.

  9. A Systematic Approach to Determining the Identifiability of Multistage Carcinogenesis Models.

    PubMed

    Brouwer, Andrew F; Meza, Rafael; Eisenberg, Marisa C

    2016-09-09

    Multistage clonal expansion (MSCE) models of carcinogenesis are continuous-time Markov process models often used to relate cancer incidence to biological mechanism. Identifiability analysis determines what model parameter combinations can, theoretically, be estimated from given data. We use a systematic approach, based on differential algebra methods traditionally used for deterministic ordinary differential equation (ODE) models, to determine identifiable combinations for a generalized subclass of MSCE models with any number of preinitiation stages and one clonal expansion. Additionally, we determine the identifiable combinations of the generalized MSCE model with up to four clonal expansion stages, and conjecture the results for any number of clonal expansion stages. The results improve upon previous work in a number of ways and provide a framework to find the identifiable combinations for further variations on the MSCE models. Finally, our approach, which takes advantage of the Kolmogorov backward equations for the probability generating functions of the Markov process, demonstrates that identifiability methods used in engineering and mathematics for systems of ODEs can be applied to continuous-time Markov processes.

  10. Exposures from mining and mine tailings

    NASA Astrophysics Data System (ADS)

    Chambers, Douglas B.; Cassaday, Valerie J.; Lowe, Leo M.

    The mining, milling and tailings management of uranium ores result in environmental radiation exposures. This paper describes the sources of radioactive emissions to the environment associated with these activities, reviews the basic approach used to estimate the resultant radiation exposures and presents examples of typical uranium mine and mill facilities. Similar concepts apply to radiation exposures associated with the mining of non-radioactive ores, although the magnitudes of the exposures would normally be smaller than those associated with uranium mining.

  11. An integrated remote sensing approach for identifying ecological range sites. [parker mountain

    NASA Technical Reports Server (NTRS)

    Jaynes, R. A.

    1983-01-01

    A model approach for identifying ecological range sites was applied to high elevation sagebrush-dominated rangelands on Parker Mountain, in south-central Utah. The approach utilizes map information derived from both high altitude color infrared photography and LANDSAT digital data, integrated with soils, geological, and precipitation maps. Identification of the ecological range site for a given area requires an evaluation of all relevant environmental factors which combine to give that site the potential to produce characteristic types and amounts of vegetation. A table is presented which allows the user to determine ecological range site based upon an integrated use of the maps which were prepared. The advantages of identifying ecological range sites through an integrated photo interpretation/LANDSAT analysis are discussed.

  12. Online-Based Approaches to Identify Real Journals and Publishers from Hijacked Ones.

    PubMed

    Asadi, Amin; Rahbar, Nader; Asadi, Meisam; Asadi, Fahime; Khalili Paji, Kokab

    2017-02-01

    The aim of the present paper was to introduce some online-based approaches to evaluate scientific journals and publishers and to differentiate them from the hijacked ones, regardless of their disciplines. With the advent of open-access journals, many hijacked journals and publishers have deceitfully assumed the mantle of authenticity in order to take advantage of researchers and students. Although these hijacked journals and publishers can be identified by checking their advertisement techniques and their websites, these checks do not always result in their identification. There exist certain online-based approaches, such as using the Master Journal List provided by Thomson Reuters, using the Scopus database, and using the DOI of a paper, to certify the realness of a journal or publisher. It is indispensable that inexperienced students and researchers know these methods so as to identify hijacked journals and publishers with a higher level of probability.

  13. A network-based feature selection approach to identify metabolic signatures in disease.

    PubMed

    Netzer, Michael; Kugler, Karl G; Müller, Laurin A J; Weinberger, Klaus M; Graber, Armin; Baumgartner, Christian; Dehmer, Matthias

    2012-10-07

    The identification and interpretation of metabolic biomarkers is a challenging task. In this context, network-based approaches have increasingly become a key technology in systems biology, allowing complex interactions in biological systems to be captured. In this work, we introduce a novel network-based method to identify highly predictive biomarker candidates for disease. First, we infer two different types of networks: (i) correlation networks, and (ii) a new type of network called ratio networks. Based on these networks, we introduce scores to prioritize features using topological descriptors of the vertices. To evaluate our method we use an example dataset where quantitative targeted MS/MS analysis was applied to a total of 52 blood samples from 22 persons with obesity (BMI >30) and 30 healthy controls. Using our network-based feature selection approach we identified highly discriminating metabolites for obesity (F-score >0.85, accuracy >85%), some of which could be verified by the literature.

  14. An Approach to Identify and Characterize a Subunit Candidate Shigella Vaccine Antigen.

    PubMed

    Pore, Debasis; Chakrabarti, Manoj K

    2016-01-01

    Shigellosis remains a serious issue throughout developing countries, particularly in children under the age of 5. Numerous strategies have been tested to develop vaccines targeting shigellosis; unfortunately, despite several years of extensive research, no safe, effective, and inexpensive vaccine against shigellosis is available so far. Here, we illustrate in detail an approach to identify and establish immunogenic outer membrane proteins from Shigella flexneri 2a as subunit vaccine candidates.

  15. FRIGA, A New Approach To Identify Isotopes and Hypernuclei In N-Body Transport Models

    NASA Astrophysics Data System (ADS)

    Le Févre, A.; Leifels, Y.; Aichelin, J.; Hartnack, Ch; Kireyev, V.; Bratkovskaya, E.

    2016-01-01

    We present a new algorithm to identify fragments in computer simulations of relativistic heavy ion collisions. It is based on the simulated annealing technique and can be applied to n-body transport models such as Quantum Molecular Dynamics. This new approach is able to predict isotope yields as well as hyper-nucleus production. In order to illustrate its predictive power, we compare this new method with experimental data, and show its sensitivity to the parameters which govern the cluster formation.

  16. Demonstrating a Market-Based Approach to the Reclamation of Mined Lands in West Virginia

    SciTech Connect

    Goodrich-Mahoney, John; Donnelly, Ellen

    2009-12-31

    This project demonstrated that developing environmental credits on private land, including abandoned mined lands, is dependent on a number of factors, some of them beyond the control of the project team. In this project, acid mine drainage (AMD) was successfully remediated through the construction of a passive AMD treatment system. Extensive water quality sampling both before and after the installation of the passive AMD treatment system showed that the system achieved removal efficiencies and pollutant loading reductions for acidity, iron, aluminum and manganese that were consistent with systems of similar size and design. The success of the passive AMD treatment system should have resulted in water credits if the project had not been terminated. Developing carbon sequestration credits, however, was much more complex and was not achieved in this project. The primary challenge that the project team encountered in meeting the full project objectives was the unsuccessful attempt to have the landowner sign a conservation easement for his property. This would have allowed the project team to clear and reforest the site, monitor the progress of the newly planted trees, and eventually realize carbon sequestration credits once the forest was mature. The delays caused by the lack of a conservation easement, as well as other factors, eventually resulted in the reforestation portion of the project being cancelled. The information in this report will help the public make more informed decisions regarding the potential of using water, carbon, and other credits to support the remediation of mined lands throughout the United States. The hope is that, by using credits, more mined lands will be remediated.

  17. Long-range prediction of Indian summer monsoon rainfall using data mining and statistical approaches

    NASA Astrophysics Data System (ADS)

    H, Vathsala; Koolagudi, Shashidhar G.

    2016-07-01

    This paper presents a hybrid model to better predict Indian summer monsoon rainfall. The algorithm considers suitable techniques for processing dense datasets. The proposed three-step algorithm comprises closed itemset generation-based association rule mining for feature selection, cluster membership for dimensionality reduction, and a simple logistic function for prediction. The application of predicting rainfall into flood, excess, normal, deficit, and drought classes based on 36 predictors consisting of land and ocean variables is presented. Results show good accuracy in the considered study period of 37 years (1969-2005).

  18. Mining environmental high-throughput sequence data sets to identify divergent amplicon clusters for phylogenetic reconstruction and morphotype visualization.

    PubMed

    Gimmler, Anna; Stoeck, Thorsten

    2015-08-01

    Environmental high-throughput sequencing (envHTS) is a very powerful tool, which in protistan ecology is predominantly used for the exploration of diversity and its geographic and local patterns. We here used a pyrosequenced V4-SSU rDNA data set from a solar saltern pond as a test case to exploit such massive protistan amplicon data sets beyond this descriptive purpose. To this end, we combined a Swarm-based blastn network including 11 579 ciliate V4 amplicons to identify divergent amplicon clusters with targeted polymerase chain reaction (PCR) primer design for full-length small subunit of the ribosomal DNA retrieval and probe design for fluorescence in situ hybridization (FISH). This powerful strategy makes it possible to use envHTS data sets to (i) reveal the phylogenetic position of the taxon behind divergent amplicons; (ii) improve phylogenetic resolution and evolutionary history of specific taxon groups; (iii) solidly assess an amplicon's (species') degree of similarity to its closest described relative; (iv) visualize the morphotype behind a divergent amplicon cluster; (v) rapidly FISH screen many environmental samples for geographic/habitat distribution and abundances of the respective organism; and (vi) monitor the success of enrichment strategies in live samples for cultivation and isolation of the respective organisms.

  19. A Chemical Screening Approach to Identify Novel Key Mediators of Erythroid Enucleation.

    PubMed

    Wölwer, Christina B; Pase, Luke B; Pearson, Helen B; Gödde, Nathan J; Lackovic, Kurt; Huang, David C S; Russell, Sarah M; Humbert, Patrick O

    2015-01-01

    Erythroid enucleation is critical for terminal differentiation of red blood cells, and involves extrusion of the nucleus by orthochromatic erythroblasts to produce reticulocytes. Due to the difficulty of synchronizing erythroblasts, the molecular mechanisms underlying the enucleation process remain poorly understood. To elucidate the cellular program governing enucleation, we utilized a novel chemical screening approach whereby orthochromatic cells primed for enucleation were enriched ex vivo and subjected to a functional drug screen using a 324-compound library consisting of structurally diverse, medicinally active and cell-permeable drugs. Using this approach, we have confirmed the role of HDACs, proteasomal regulators and MAPK in erythroid enucleation and introduce a new role for cyclin-dependent kinases, in particular CDK9, in this process. Importantly, we demonstrate that when coupled with imaging analysis, this approach provides a powerful means to identify and characterize rate-limiting steps involved in the erythroid enucleation process.

  20. Systems approaches in osteoarthritis: Identifying routes to novel diagnostic and therapeutic strategies.

    PubMed

    Mueller, Alan J; Peffers, Mandy J; Proctor, Carole J; Clegg, Peter D

    2017-03-20

    Systems orientated research offers the possibility of identifying novel therapeutic targets and relevant diagnostic markers for complex diseases such as osteoarthritis. This review demonstrates that the osteoarthritis research community has been slow to incorporate systems orientated approaches into research studies, although a number of key studies reveal novel insights into the regulatory mechanisms that contribute both to joint tissue homeostasis and its dysfunction. The review introduces both top-down and bottom-up approaches employed in the study of osteoarthritis. A holistic and multiscale approach, where clinical measurements may predict dysregulation and progression of joint degeneration, should be a key objective in future research. The review concludes with suggestions for further research and emerging trends not least of which is the coupled development of diagnostic tests and therapeutics as part of a concerted effort by the osteoarthritis research community to meet clinical needs. This article is protected by copyright. All rights reserved.

  1. Identifying Key Performance Indicators for Holistic Hospital Management with a Modified DEMATEL Approach

    PubMed Central

    Si, Sheng-Li; You, Xiao-Yue; Huang, Jia

    2017-01-01

    Performance analysis is an important way for hospitals to achieve higher efficiency and effectiveness in providing services to their customers. The performance of the healthcare system can be measured by many indicators, but it is difficult to improve them simultaneously due to limited resources. A feasible way is to identify the central and influential indicators to improve healthcare performance in a stepwise manner. In this paper, we propose a hybrid multiple criteria decision making (MCDM) approach to identify key performance indicators (KPIs) for holistic hospital management. First, by integrating the evidential reasoning approach and interval 2-tuple linguistic variables, various assessments of performance indicators provided by healthcare experts are modeled. Then, the decision making trial and evaluation laboratory (DEMATEL) technique is adopted to build an interactive network and visualize the causal relationships between the performance indicators. Finally, an empirical case study is provided to demonstrate the proposed approach for improving the efficiency of healthcare management. The results show that “accidents/adverse events”, “nosocomial infection”, “incidents/errors”, and “number of operations/procedures” are significant influential indicators. Also, the indicators of “length of stay”, “bed occupancy” and “financial measures” play important roles in performance evaluation of the healthcare organization. The proposed decision making approach could be considered as a reference for healthcare administrators to enhance the performance of their healthcare institutions. PMID:28825613
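
    The DEMATEL step named in this abstract can be illustrated with a minimal sketch of the classic (unmodified) technique. This is not the authors' method: their evidential-reasoning and interval 2-tuple stages are omitted, and the three indicators "A", "B", "C" and their influence scores are hypothetical values chosen only for illustration. The function names `invert`, `matmul` and `dematel` are likewise assumptions of this sketch.

```python
# Classic DEMATEL sketch (assumption: crisp expert scores, no 2-tuple modeling).
# T = D (I - D)^-1, where D is the normalized direct-influence matrix.

def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def dematel(direct):
    """Return (total-relation matrix, prominence R+C, relation R-C)."""
    n = len(direct)
    row_sums = [sum(r) for r in direct]
    col_sums = [sum(c) for c in zip(*direct)]
    scale = max(row_sums + col_sums)            # normalization constant
    d = [[v / scale for v in row] for row in direct]
    i_minus_d = [[(1.0 if i == j else 0.0) - d[i][j] for j in range(n)]
                 for i in range(n)]
    t = matmul(d, invert(i_minus_d))            # total (direct + indirect) influence
    r = [sum(row) for row in t]                 # influence given by each indicator
    c = [sum(col) for col in zip(*t)]           # influence received by each indicator
    prominence = [ri + ci for ri, ci in zip(r, c)]
    relation = [ri - ci for ri, ci in zip(r, c)]
    return t, prominence, relation


# Hypothetical scores: entry [i][j] is how strongly indicator i influences j.
labels = ["A", "B", "C"]
direct = [[0, 3, 2],
          [1, 0, 3],
          [1, 1, 0]]
t, prominence, relation = dematel(direct)
for name, p, q in zip(labels, prominence, relation):
    role = "net cause" if q > 1e-9 else "net effect" if q < -1e-9 else "neutral"
    print(f"{name}: prominence={p:.3f} relation={q:.3f} ({role})")
```

    In this reading, indicators with a positive relation value act mainly as causes and those with a negative value mainly as effects, which is the kind of causal grouping the abstract describes visualizing.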

  2. Identifying Key Performance Indicators for Holistic Hospital Management with a Modified DEMATEL Approach.

    PubMed

    Si, Sheng-Li; You, Xiao-Yue; Liu, Hu-Chen; Huang, Jia

    2017-08-19

    Performance analysis is an important way for hospitals to achieve higher efficiency and effectiveness in providing services to their customers. The performance of the healthcare system can be measured by many indicators, but it is difficult to improve them simultaneously due to limited resources. A feasible way is to identify the central and influential indicators to improve healthcare performance in a stepwise manner. In this paper, we propose a hybrid multiple criteria decision making (MCDM) approach to identify key performance indicators (KPIs) for holistic hospital management. First, by integrating the evidential reasoning approach and interval 2-tuple linguistic variables, various assessments of performance indicators provided by healthcare experts are modeled. Then, the decision making trial and evaluation laboratory (DEMATEL) technique is adopted to build an interactive network and visualize the causal relationships between the performance indicators. Finally, an empirical case study is provided to demonstrate the proposed approach for improving the efficiency of healthcare management. The results show that "accidents/adverse events", "nosocomial infection", "incidents/errors", and "number of operations/procedures" are significant influential indicators. Also, the indicators of "length of stay", "bed occupancy" and "financial measures" play important roles in performance evaluation of the healthcare organization. The proposed decision making approach could be considered as a reference for healthcare administrators to enhance the performance of their healthcare institutions.

  3. An Approach for Identifying Cytokines Based on a Novel Ensemble Classifier

    PubMed Central

    Zou, Quan; Wang, Zhen; Guan, Xinjun; Liu, Bin; Wu, Yunfeng; Lin, Ziyu

    2013-01-01

    It is biologically meaningful and important to identify cytokines and investigate their various functions and biochemical mechanisms. However, several issues remain, including the large scale of benchmark datasets, serious imbalance of data, and discovery of new gene families. In this paper, we employ a machine learning approach based on a novel ensemble classifier to predict cytokines. We directly selected amino acid sequences as research objects. First, we pretreated the benchmark data accurately. Next, we analyzed the physicochemical properties and distribution of whole amino acids and then extracted a group of 120-dimensional (120D) valid features to represent sequences. Third, in view of the serious imbalance in benchmark datasets, we utilized a sampling approach based on the synthetic minority oversampling technique algorithm and K-means clustering undersampling algorithm to rebuild the training set. Finally, we built a library for dynamic selection and circulating combination based on clustering (LibD3C) and employed the new training set to realize cytokine classification. Experiments showed that the geometric mean of sensitivity and specificity obtained through our approach is as high as 93.3%, which proves that our approach is effective for identifying