Science.gov

Sample records for mining approach identifies

  1. A novel pattern mining approach for identifying cognitive activity in EEG based functional brain networks.

    PubMed

    Thilaga, M; Vijayalakshmi, R; Nadarajan, R; Nandagopal, D

    2016-06-01

    The complex nature of neuronal interactions of the human brain has posed many challenges to the research community. To explore the underlying mechanisms of neuronal activity of cohesive brain regions during different cognitive activities, many innovative mathematical and computational models are required. This paper presents a novel Common Functional Pattern Mining approach to demonstrate the similar patterns of interactions due to common behavior of certain brain regions. The electrode sites of an EEG-based functional brain network are modeled as a set of transactions and node-based complex network measures as itemsets. These itemsets are transformed into a graph data structure called a Functional Pattern Graph. By mining this Functional Pattern Graph, the common functional patterns due to specific brain functioning can be identified. The empirical analyses show the efficiency of the proposed approach in identifying the extent to which the electrode sites (transactions) are similar during various cognitive load states.
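
    The transaction/itemset framing in this abstract can be illustrated with a minimal sketch. It uses brute-force frequent-itemset counting, not the authors' Functional Pattern Graph, whose construction the abstract does not specify. The electrode names, the discretized measures, and the `min_support` threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Count every itemset of up to max_size items and keep those that
    occur in at least min_support transactions."""
    counts = Counter()
    for items in transactions.values():
        ordered = sorted(items)  # canonical order so each itemset has one key
        for size in range(1, max_size + 1):
            for combo in combinations(ordered, size):
                counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}


# Hypothetical data: electrode sites (transactions) mapped to
# discretized node-based network measures (items).
sites = {
    "Fz": {"degree=high", "clustering=high", "betweenness=low"},
    "Cz": {"degree=high", "clustering=high", "betweenness=high"},
    "Pz": {"degree=high", "clustering=low", "betweenness=low"},
    "Oz": {"degree=low", "clustering=high", "betweenness=low"},
}

patterns = frequent_itemsets(sites, min_support=2)
for itemset, support in sorted(patterns.items()):
    print(itemset, support)
```

    Itemsets shared by several sites play the role of the "common functional patterns"; a graph-based miner would avoid enumerating every combination, which this sketch does only for clarity.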

  2. Computational Approaches for Mining GRO-Seq Data to Identify and Characterize Active Enhancers.

    PubMed

    Nagari, Anusha; Murakami, Shino; Malladi, Venkat S; Kraus, W Lee

    2017-01-01

    Transcriptional enhancers are DNA regulatory elements that are bound by transcription factors and act to positively regulate the expression of nearby or distally located target genes. Enhancers have many features that have been discovered using genomic analyses. Recent studies have shown that active enhancers recruit RNA polymerase II (Pol II) and are transcribed, producing enhancer RNAs (eRNAs). GRO-seq, a method for identifying the location and orientation of all actively transcribing RNA polymerases across the genome, is a powerful approach for monitoring nascent enhancer transcription. Furthermore, the unique pattern of enhancer transcription can be used to identify enhancers in the absence of any information about the underlying transcription factors. Here, we describe the computational approaches required to identify and analyze active enhancers using GRO-seq data, including data pre-processing, alignment, and transcript calling. In addition, we describe protocols and computational pipelines for mining GRO-seq data to identify active enhancers, as well as known transcription factor binding sites that are transcribed. Furthermore, we discuss approaches for integrating GRO-seq-based enhancer data with other genomic data, including target gene expression and function. Finally, we describe molecular biology assays that can be used to confirm and explore further the function of enhancers that have been identified using genomic assays. Together, these approaches should allow the user to identify and explore the features and biological functions of new cell type-specific enhancers.

  3. A Data Mining Approach to Identify Sexuality Patterns in a Brazilian University Population.

    PubMed

    Waleska Simões, Priscyla; Cesconetto, Samuel; Toniazzo de Abreu, Larissa Letieli; Côrtes de Mattos Garcia, Merisandra; Cassettari Junior, José Márcio; Comunello, Eros; Bisognin Ceretta, Luciane; Aparecida Manenti, Sandra

    2015-01-01

    This paper presents the profile and experience of sexuality generated from a data mining classification task. We used a database about sexuality and gender violence collected from a university population in southern Brazil. The data mining task identified two relationships between the variables, which enabled the distinction of subgroups that better detail the profile and experience of sexuality. The identification of the relationships between the variables defines behavioral models and risk factors that will help define the algorithms being implemented in the data mining classification task.

  4. Identifying functional connectivity in large-scale neural ensemble recordings: a multiscale data mining approach.

    PubMed

    Eldawlatly, Seif; Jin, Rong; Oweiss, Karim G

    2009-02-01

    Identifying functional connectivity between neuronal elements is an essential first step toward understanding how the brain orchestrates information processing at the single-cell and population levels to carry out biological computations. This letter suggests a new approach to identify functional connectivity between neuronal elements from their simultaneously recorded spike trains. In particular, we identify clusters of neurons that exhibit functional interdependency over variable spatial and temporal patterns of interaction. We represent neurons as objects in a graph and connect them using arbitrarily defined similarity measures calculated across multiple timescales. We then use a probabilistic spectral clustering algorithm to cluster the neurons in the graph by solving a minimum graph cut optimization problem. Using point process theory to model population activity, we demonstrate the robustness of the approach in tracking a broad spectrum of neuronal interaction, from synchrony to rate co-modulation, by systematically varying the length of the firing history interval and the strength of the connecting synapses that govern the discharge pattern of each neuron. We also demonstrate how activity-dependent plasticity can be tracked and quantified in multiple network topologies built to mimic distinct behavioral contexts. We compare the performance to classical approaches to illustrate the substantial gain in performance.
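
    The idea of computing a similarity between spike trains at several timescales can be sketched as follows. This is only an illustration under stated assumptions: Pearson correlation of binned spike counts stands in for the "arbitrarily defined similarity measures" of the abstract, the probabilistic spectral clustering step is omitted, and the spike times, bin widths, and function names are hypothetical.

```python
from math import sqrt


def bin_spikes(spike_times_ms, bin_width_ms, duration_ms):
    """Count spikes per bin; times and widths are integer milliseconds."""
    n_bins = duration_ms // bin_width_ms
    counts = [0] * n_bins
    for t in spike_times_ms:
        idx = t // bin_width_ms
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def multiscale_similarity(train_a, train_b, bin_widths_ms, duration_ms):
    """Similarity of two spike trains at each bin width (timescale)."""
    return {
        width: pearson(
            bin_spikes(train_a, width, duration_ms),
            bin_spikes(train_b, width, duration_ms),
        )
        for width in bin_widths_ms
    }


# Hypothetical neurons: same slow rate pattern, spikes offset by 50 ms.
neuron_a = [10, 110, 210]
neuron_b = [60, 160, 260]
sim = multiscale_similarity(neuron_a, neuron_b, bin_widths_ms=(10, 100), duration_ms=400)
print(sim)
```

    With these toy trains the fine timescale shows no synchrony while the coarse timescale shows strong rate co-modulation, which is the kind of distinction a multiscale similarity is meant to expose.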

  5. SPINE: an integrated tracking database and data mining approach for identifying feasible targets in high-throughput structural proteomics.

    PubMed

    Bertone, P; Kluger, Y; Lan, N; Zheng, D; Christendat, D; Yee, A; Edwards, A M; Arrowsmith, C H; Montelione, G T; Gerstein, M

    2001-07-01

    High-throughput structural proteomics is expected to generate considerable amounts of data on the progress of structure determination for many proteins. For each protein this includes information about cloning, expression, purification, biophysical characterization and structure determination via NMR spectroscopy or X-ray crystallography. It will be essential to develop specifications and ontologies for standardizing this information to make it amenable to retrospective analysis. To this end we created the SPINE database and analysis system for the Northeast Structural Genomics Consortium. SPINE, which is available at bioinfo.mbb.yale.edu/nesg or nesg.org, is specifically designed to enable distributed scientific collaboration via the Internet. It was designed not just as an information repository but as an active vehicle to standardize proteomics data in a form that would enable systematic data mining. The system features an intuitive user interface for interactive retrieval and modification of expression construct data, query forms designed to track global project progress and external links to many other resources. Currently the database contains experimental data on 985 constructs, of which 740 are drawn from Methanobacterium thermoautotrophicum, 123 from Saccharomyces cerevisiae, 93 from Caenorhabditis elegans and the remainder from other organisms. We developed a comprehensive set of data mining features for each protein, including several related to experimental progress (e.g. expression level, solubility and crystallization) and 42 based on the underlying protein sequence (e.g. amino acid composition, secondary structure and occurrence of low complexity regions). We demonstrate in detail the application of a particular machine learning approach, decision trees, to the tasks of predicting a protein's solubility and propensity to crystallize based on sequence features. We are able to extract a number of key rules from our trees, in particular that soluble

  6. Developing Isotope Tools for Identifying Mercury Mining Sources

    NASA Astrophysics Data System (ADS)

    Koster van Groos, P. G.; Esser, B. K.; Williams, R. W.; Hunt, J. R.

    2009-12-01

    Mining operations in California during the past two centuries have resulted in widespread mercury contamination. Source control strategies are difficult and expensive to implement, in part because links between specific mercury sources and exposures are often uncertain. Examination of mercury’s stable isotopes can help resolve this issue. Sources with distinct isotope compositions may be traced through the environment. Mercury mining operations are predicted to have led to waste tailings, mercury metal products, and air emissions with different isotope compositions as a result of inefficient mercury extraction and recovery from ores. The predicted differences in isotope composition, based on estimated kinetic and diffusion isotope effects, are greater than the precision of current analytical methods using multi-collector inductively coupled plasma mass-spectrometers (MC-ICP-MS). As such, mercury isotope measurements may help identify mercury originating from different mining operations. To support a mechanistic approach to mercury isotope fractionation, the isotope effects of diffusion through solids and gases are being investigated experimentally. Besides demonstrating the utility of mercury isotope analysis for source identification, this work is providing a mechanistic basis for differences in isotope compositions.

  7. A data mining approach to intelligence operations

    NASA Astrophysics Data System (ADS)

    Memon, Nasrullah; Hicks, David L.; Harkiolakis, Nicholas

    2008-03-01

    In this paper we examine the latest thinking, approaches and methodologies in use for finding the nuggets of information and subliminal (and perhaps intentionally hidden) patterns and associations that are critical to identify criminal activity and suspects to private and government security agencies. An emphasis in the paper is placed on Social Network Analysis and Investigative Data Mining, and the use of these technologies in the counterterrorism domain. Tools and techniques from both areas are described, along with the important tasks for which they can be used to assist with the investigation and analysis of terrorist organizations. The process of collecting data about these organizations is also considered along with the inherent difficulties that are involved.

  8. Implementation of an original approach on the Mines-Douai Comparative Reactivity Method (MD-CRM) instrument to identify part of the missing OH reactivity at an urban site

    NASA Astrophysics Data System (ADS)

    Dusanter, S.; Michoud, V.; Leonardis, T.; Riffault, V.; Zhang, S.; Locoge, N.

    2015-12-01

    Due to the large number of Volatile Organic Compounds (VOCs) expected in the atmosphere (10^4 to 10^5) (Goldstein and Galbally, ES&T, 2007), exhaustive measurements of VOCs appear to be currently unfeasible using common analytical techniques. In this context, measurements of the total sink of OH, referred to as total OH reactivity, can provide a critical test to assess the completeness of trace gas measurements during field campaigns. This can be done by comparing the measured total OH reactivity to values calculated from trace gas measurements. Indeed, large discrepancies are usually found between measured and calculated OH reactivity values, revealing the presence of important unmeasured reactive species, which have yet to be identified. A Comparative Reactivity Method (CRM) instrument has been set up at Mines Douai to allow sequential measurements of VOCs and OH reactivity using the same Proton Transfer Reaction-Time of Flight Mass Spectrometer. This approach aims at identifying unmeasured reactive VOCs based on a method proposed by Kato et al. (Atmos. Environ., 2011), taking advantage of VOC oxidations occurring in the CRM sampling reactor. MD-CRM has been deployed at an urban site in Dunkirk (France) during July 2014 to test this new approach. During this campaign, a large fraction of the OH reactivity was not explained by collocated measurements of trace gases (67% on average). In this presentation, we will first describe the approach that was implemented in the CRM instrument to identify part of the observed missing OH reactivity and we will then discuss the OH reactivity budget regarding the origin of air masses reaching the measurement site.
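
    The comparison between measured and calculated OH reactivity described here can be sketched as below. Calculated reactivity is taken as the sum over measured species of rate constant times concentration, and the missing fraction is the unexplained share of the measured value. The species names, rate constants, concentrations, and the measured value are placeholders chosen for illustration, not data from the campaign.

```python
def calculated_oh_reactivity(species):
    """Sum k_i * [X_i] over measured species.

    species maps a name to (k, concentration), with k in
    cm^3 molecule^-1 s^-1 and concentration in molecule cm^-3,
    so the result is in s^-1.
    """
    return sum(k * conc for k, conc in species.values())


def missing_fraction(measured, calculated):
    """Share of the measured OH reactivity not explained by the calculation."""
    return (measured - calculated) / measured


# Placeholder inputs (not campaign data).
trace_gases = {
    "species_A": (1.0e-11, 1.0e11),
    "species_B": (2.0e-12, 5.0e11),
}
calculated = calculated_oh_reactivity(trace_gases)  # about 2.0 s^-1
measured = 6.0                                      # s^-1, placeholder
print(f"missing OH reactivity: {missing_fraction(measured, calculated):.0%}")
```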

  9. Edu-mining: A Machine Learning Approach

    NASA Astrophysics Data System (ADS)

    Srimani, P. K.; Patil, Malini M.

    2011-12-01

    Mining educational data is an emerging interdisciplinary research area that mainly deals with the development of methods to explore the data stored in educational institutions. The educational data is referred to as Edu-DATA. Queries related to Edu-DATA are of practical interest because the SQL approach is insufficient and a different approach is needed. The paper aims at developing a technique called Edu-MINING, which uses data mining techniques to convert raw data coming from educational institutions into useful information. The discovered knowledge will have a great impact on educational research and practices. Edu-MINING explores Edu-DATA, discovers new knowledge and suggests useful methods to improve the quality of education with regard to the teaching-learning process. This is illustrated through a case study.

  10. Mining for Murder-Suicide: An Approach to Identifying Cases of Murder-Suicide in the National Violent Death Reporting System Restricted Access Database.

    PubMed

    McNally, Matthew R; Patton, Christina L; Fremouw, William J

    2016-01-01

    The National Violent Death Reporting System (NVDRS) is a United States Centers for Disease Control and Prevention (CDC) database of violent deaths from 2003 to the present. The NVDRS collects information from 32 states on several types of violent deaths, including suicides, homicides, homicides followed by suicides, and deaths resulting from child maltreatment or intimate partner violence, as well as legal intervention and accidental firearm deaths. Despite the availability of data from police narratives, medical examiner reports, and other sources, reliably finding the cases of murder-suicide in the NVDRS has proven problematic due to the lack of a unique code for murder-suicide incidents and outdated descriptions of case-finding procedures from previous researchers. By providing a description of the methods used to access the NVDRS and the coding procedures used to decipher these data, the authors seek to assist future researchers in correctly identifying cases of murder-suicide deaths while avoiding false positives.

  11. Identifying Engineering Students' English Sentence Reading Comprehension Errors: Applying a Data Mining Technique

    ERIC Educational Resources Information Center

    Tsai, Yea-Ru; Ouyang, Chen-Sen; Chang, Yukon

    2016-01-01

    The purpose of this study is to propose a diagnostic approach to identify engineering students' English reading comprehension errors. Student data were collected during the process of reading texts of English for science and technology on a web-based cumulative sentence analysis system. For the analysis, the association-rule, data mining technique…

  12. Identifying the Cause of Toxicity of a Saline Mine Water

    PubMed Central

    van Dam, Rick A.; Harford, Andrew J.; Lunn, Simon A.; Gagnon, Marthe M.

    2014-01-01

    Elevated major ions (or salinity) are recognised as being a key contributor to the toxicity of many mine waste waters, but the complex interactions between the major ions and large inter-species variability in response to salinity make it difficult to relate toxicity to causal factors. This study aimed to determine if the toxicity of a typical saline seepage water was solely due to its major ion constituents, and to determine which major ions were the leading contributors to the toxicity. Standardised toxicity tests using two tropical freshwater species, Chlorella sp. (alga) and Moinodaphnia macleayi (cladoceran), were used to compare the toxicity of 1) mine and synthetic seepage water; 2) key major ions (e.g. Na, Cl, SO4 and HCO3); 3) synthetic seepage waters that were modified by excluding key major ions. For Chlorella sp., the toxicity of the seepage water was not solely due to its major ion concentrations because there were differences in effects caused by the mine seepage and synthetic seepage. However, for M. macleayi this hypothesis was supported because similar effects were caused by mine seepage and synthetic seepage. Sulfate was identified as a major ion that could predict the toxicity of the synthetic waters, which might be expected as it was the dominant major ion in the seepage water. However, sulfate was not the primary cause of toxicity in the seepage water and electrical conductivity was a better predictor of effects. Ultimately, the results show that specific major ions do not clearly drive the toxicity of saline seepage waters and the effects are probably due to the electrical conductivity of the mine waste waters. PMID:25180579

  13. Identifying the cause of toxicity of a saline mine water.

    PubMed

    van Dam, Rick A; Harford, Andrew J; Lunn, Simon A; Gagnon, Marthe M

    2014-01-01

    Elevated major ions (or salinity) are recognised as being a key contributor to the toxicity of many mine waste waters, but the complex interactions between the major ions and large inter-species variability in response to salinity make it difficult to relate toxicity to causal factors. This study aimed to determine if the toxicity of a typical saline seepage water was solely due to its major ion constituents, and to determine which major ions were the leading contributors to the toxicity. Standardised toxicity tests using two tropical freshwater species, Chlorella sp. (alga) and Moinodaphnia macleayi (cladoceran), were used to compare the toxicity of 1) mine and synthetic seepage water; 2) key major ions (e.g. Na, Cl, SO4 and HCO3); 3) synthetic seepage waters that were modified by excluding key major ions. For Chlorella sp., the toxicity of the seepage water was not solely due to its major ion concentrations because there were differences in effects caused by the mine seepage and synthetic seepage. However, for M. macleayi this hypothesis was supported because similar effects were caused by mine seepage and synthetic seepage. Sulfate was identified as a major ion that could predict the toxicity of the synthetic waters, which might be expected as it was the dominant major ion in the seepage water. However, sulfate was not the primary cause of toxicity in the seepage water and electrical conductivity was a better predictor of effects. Ultimately, the results show that specific major ions do not clearly drive the toxicity of saline seepage waters and the effects are probably due to the electrical conductivity of the mine waste waters.

  14. Data Mining and Pattern Recognition Models for Identifying Inherited Diseases: Challenges and Implications

    PubMed Central

    Iddamalgoda, Lahiru; Das, Partha S.; Aponso, Achala; Sundararajan, Vijayaraghava S.; Suravajhala, Prashanth; Valadi, Jayaraman K.

    2016-01-01

    Data mining and pattern recognition methods reveal interesting findings in genetic studies, especially on how the genetic makeup is associated with inherited diseases. Although researchers have proposed various data mining models for biomedical approaches, there remains a challenge in accurately prioritizing the single nucleotide polymorphisms (SNPs) associated with the disease. In this commentary, we review the state-of-the-art data mining and pattern recognition models for identifying inherited diseases and deliberate on the need for binary classification- and scoring-based prioritization methods in determining causal variants. While we discuss the known pros and cons associated with these methods, we argue that the gene prioritization methods and the protein interaction (PPI) methods, in conjunction with K nearest neighbors, could be used in accurately categorizing the genetic factors in disease causation. PMID:27559342

  15. Mines and human casualties: a robotics approach toward mine clearing

    NASA Astrophysics Data System (ADS)

    Ghaffari, Masoud; Manthena, Dinesh; Ghaffari, Alireza; Hall, Ernest L.

    2004-10-01

    An estimated 100 million landmines, which have been planted in more than 60 countries, kill or maim thousands of civilians every year. Millions of people live in vast dangerous areas and are not able to access basic human services because of the threat of landmines. This problem has affected many third world countries and poor nations which are not able to afford high-cost solutions. This paper tries to present some experiences with land mine victims and solutions for mine clearing. It studies the current situation of this crisis as well as state-of-the-art robotics technology for mine clearing. It also introduces a survey robot which is suitable for mine clearing applications. The results show that in addition to technical aspects, this problem has many socio-economic issues. The significance of this study is to persuade robotics researchers toward this topic and to peruse the technical and humanitarian facets of this issue.

  16. APPLYING DATA MINING APPROACHES TO FURTHER ...

    EPA Pesticide Factsheets

    This dataset will be used to illustrate various data mining techniques to biologically profile the chemical space.

  17. A MeSH-based text mining method for identifying novel prebiotics

    PubMed Central

    Shan, Guangyu; Lu, Yiming; Min, Bo; Qu, Wubin; Zhang, Chenggang

    2016-01-01

    Prebiotics contribute to the well-being of their host by altering the composition of the gut microbiota. Discovering new prebiotics is a challenging and arduous task due to strict inclusion criteria; thus, highly limited numbers of prebiotic candidates have been identified. Notably, the large numbers of published studies may contain substantial information attached to various features of known prebiotics that can be used to predict new candidates. In this paper, we propose a medical subject headings (MeSH)-based text mining method for identifying new prebiotics with structured texts obtained from PubMed. We defined an optimal feature set for prebiotics prediction using a systematic feature-ranking algorithm with which a variety of carbohydrates can be accurately classified into different clusters in accordance with their chemical and biological attributes. The optimal feature set was used to separate positive prebiotics from other carbohydrates, and a cross-validation procedure was employed to assess the prediction accuracy of the model. Our method achieved a specificity of 0.876 and a sensitivity of 0.838. Finally, we identified a high-confidence list of candidates of prebiotics that are strongly supported by the literature. Our study demonstrates that text mining from high-volume biomedical literature is a promising approach in searching for new prebiotics. PMID:27930574
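
    For readers unfamiliar with the two metrics quoted in this abstract, the sketch below shows how sensitivity and specificity follow from a binary confusion matrix. The labels are hypothetical (1 = prebiotic, 0 = other carbohydrate) and do not reproduce the study's data or its reported values.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fn, tn, fp) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp


def sensitivity(tp, fn):
    """True positive rate: share of actual positives that were detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of actual negatives that were rejected."""
    return tn / (tn + fp)


# Hypothetical labels for ten carbohydrates.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
tp, fn, tn, fp = confusion_counts(y_true, y_pred)
print(f"sensitivity={sensitivity(tp, fn):.3f} specificity={specificity(tn, fp):.3f}")
```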

  18. Systematic evaluation of satellite remote sensing for identifying uranium mines and mills.

    SciTech Connect

    Blair, Dianna Sue; Stork, Christopher Lyle; Smartt, Heidi Anne; Smith, Jody Lynn

    2006-01-01

    In this report, we systematically evaluate the ability of current-generation, satellite-based spectroscopic sensors to distinguish uranium mines and mills from other mineral mining and milling operations. We perform this systematic evaluation by (1) outlining the remote, spectroscopic signal generation process, (2) documenting the capabilities of current commercial satellite systems, (3) systematically comparing the uranium mining and milling process to other mineral mining and milling operations, and (4) identifying the most promising observables associated with uranium mining and milling that can be identified using satellite remote sensing. The Ranger uranium mine and mill in Australia serves as a case study where we apply and test the techniques developed in this systematic analysis. Based on literature research of mineral mining and milling practices, we develop a decision tree which utilizes the information contained in one or more observables to determine whether uranium is possibly being mined and/or milled at a given site. Promising observables associated with uranium mining and milling at the Ranger site included in the decision tree are uranium ore, sulfur, the uranium pregnant leach liquor, ammonia, and uranyl compounds and sulfate ion disposed of in the tailings pond. Based on the size, concentration, and spectral characteristics of these promising observables, we then determine whether these observables can be identified using current commercial satellite systems, namely Hyperion, ASTER, and Quickbird. We conclude that the only promising observables at Ranger that can be uniquely identified using a current commercial satellite system (notably Hyperion) are magnesium chlorite in the open pit mine and the sulfur stockpile. Based on the identified magnesium chlorite and sulfur observables, the decision tree narrows the possible mineral candidates at Ranger to uranium, copper, zinc, manganese, vanadium, the rare earths, and phosphorus, all of which are

  19. Design approaches in quarrying and pit-mining reclamation

    USGS Publications Warehouse

    Arbogast, Belinda F.

    1999-01-01

    Reclaimed mine sites have been evaluated so that the public, industry, and land planners may recognize there are innovative designs available for consideration and use. People tend to see cropland, range, and road cuts as a necessary part of their everyday life, not as disturbed areas despite their high visibility. Mining also generates a disturbed landscape, unfortunately one that many consider waste until reclaimed by human beings. The development of mining provides an economic base and use of a natural resource to improve the quality of human life. Equally important is a sensitivity to the geologic origin and natural pattern of the land. Wisely shaping our environment requires a design plan and product that responds to a site's physiography, ecology, function, artistic form, and public perception. An examination of selected sites for their landscape design suggested nine approaches for mining reclamation. The oldest design approach around is nature itself. Humans may sometimes do more damage going to an area in the attempt to repair it. Given enough geologic time, a small-site area, and stable adjacent ecosystems, disturbed areas recover without mankind's input. Visual screens and buffer zones conceal the facility in a camouflage approach. Typically, earth berms, fences, and plantings are used to disguise the mining facility. Restoration targets social or economic benefits by reusing the site for public amenities, most often in urban centers with large populations. A mitigation approach attempts to protect the environment and return mined areas to use with scientific input. The reuse of cement, building rubble, and macadam meets only about 10% of the demand for aggregate. Recognizing the limited supply of mineral resources and encouraging recycling efforts are steps in a renewable resource approach. An educative design approach effectively communicates mining information through outreach, land stewardship, and community service. Mine sites used for

  20. Bayesian network approach to spatial data mining: a case study

    NASA Astrophysics Data System (ADS)

    Huang, Jiejun; Wan, Youchuan

    2006-10-01

    Spatial data mining is a process of discovering interesting, novel, and potentially useful information or knowledge hidden in spatial data sets. It involves different techniques and different methods from various areas of research. A Bayesian network is a graphical model that encodes causal probabilistic relationships among variables of interest; it has a powerful ability for representation and reasoning and provides an effective approach to spatial data mining. In this paper we give an introduction to Bayesian networks, and discuss using Bayesian networks for spatial data mining. We propose a framework of spatial data mining based on Bayesian networks. Then we show a case study and use the experimental results to validate the practical viability of the proposed approach to spatial data mining. Finally, the paper gives a summary and some remarks.
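
    As a minimal illustration of reasoning with a Bayesian network, the sketch below performs inference by enumeration on a three-node network. The variables (rain, steep slope, landslide) and every probability are hypothetical and are not taken from the paper's case study.

```python
from itertools import product

# Hypothetical network: Rain -> Landslide <- SteepSlope
P_RAIN = 0.3
P_STEEP = 0.4
# P(landslide = True | rain, steep), keyed by (rain, steep)
P_SLIDE = {
    (True, True): 0.8,
    (True, False): 0.3,
    (False, True): 0.2,
    (False, False): 0.05,
}


def joint(rain, steep, slide):
    """Joint probability from the network's factorization:
    P(rain) * P(steep) * P(slide | rain, steep)."""
    p = (P_RAIN if rain else 1 - P_RAIN) * (P_STEEP if steep else 1 - P_STEEP)
    p_slide = P_SLIDE[(rain, steep)]
    return p * (p_slide if slide else 1 - p_slide)


def prob_slide():
    """Marginal P(landslide) by summing out the parent variables."""
    return sum(joint(r, s, True) for r, s in product((True, False), repeat=2))


def posterior_rain_given_slide():
    """P(rain | landslide) by enumeration and normalization."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    return numerator / prob_slide()


print(f"P(landslide) = {prob_slide():.3f}")
print(f"P(rain | landslide) = {posterior_rain_given_slide():.3f}")
```

    Enumeration is exponential in the number of variables, so it is only practical for small networks like this one.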

  1. Graduates employment classification using data mining approach

    NASA Astrophysics Data System (ADS)

    Aziz, Mohd Tajul Rizal Ab; Yusof, Yuhanis

    2016-08-01

    Data Mining is a platform to extract hidden knowledge in a collection of data. This study investigates the suitable classification model to classify graduates' employment for one of the MARA Professional Colleges (KPM) in Malaysia. The aim is to classify the graduates as employed, unemployed or pursuing further study. Five data mining algorithms offered in WEKA were used: Naïve Bayes, Logistic regression, Multilayer perceptron, k-nearest neighbor and Decision tree J48. Based on the obtained results, Logistic regression produces the highest classification accuracy, at 92.5%. This result was obtained using 80% of the data for training and 20% for testing. The produced classification model will benefit the management of the college as it provides insight into the quality of graduates that it produces and how its curriculum can be improved to cater to the needs of industry.
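
    The 80/20 hold-out evaluation mentioned in this abstract can be sketched in a few lines. The study used WEKA; the helper names, the seed, and the toy labels below are hypothetical, and no classifier is trained here: only the split and the accuracy computation are shown.

```python
import random


def train_test_split(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records and split it into train and test parts."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical record identifiers standing in for graduate records.
records = list(range(100))
train, test = train_test_split(records)
print(len(train), len(test))

# Hypothetical true and predicted classes for four test records.
y_true = ["employed", "unemployed", "further study", "employed"]
y_pred = ["employed", "employed", "further study", "employed"]
print(accuracy(y_true, y_pred))
```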

  2. EST mining and functional expression assays identify extracellular effector proteins from the plant pathogen Phytophthora.

    PubMed

    Torto, Trudy A; Li, Shuang; Styer, Allison; Huitema, Edgar; Testa, Antonino; Gow, Neil A R; van West, Pieter; Kamoun, Sophien

    2003-07-01

    Plant pathogenic microbes have the remarkable ability to manipulate biochemical, physiological, and morphological processes in their host plants. These manipulations are achieved through a diverse array of effector molecules that can either promote infection or trigger defense responses. We describe a general functional genomics approach aimed at identifying extracellular effector proteins from plant pathogenic microorganisms by combining data mining of expressed sequence tags (ESTs) with virus-based high-throughput functional expression assays in plants. PexFinder, an algorithm for automated identification of extracellular proteins from EST data sets, was developed and applied to 2147 ESTs from the oomycete plant pathogen Phytophthora infestans. The program identified 261 ESTs (12.2%) corresponding to a set of 142 nonredundant Pex (Phytophthora extracellular protein) cDNAs. Of these, 78 (55%) Pex cDNAs were novel with no significant matches in public databases. Validation of PexFinder was performed using proteomic analysis of secreted protein of P. infestans. To identify which of the Pex cDNAs encode effector proteins that manipulate plant processes, high-throughput functional expression assays in plants were performed on 63 of the identified cDNAs using an Agrobacterium tumefaciens binary vector carrying the potato virus X (PVX) genome. This led to the discovery of two novel necrosis-inducing cDNAs, crn1 and crn2, encoding extracellular proteins that belong to a large and complex protein family in Phytophthora. Further characterization of the crn genes indicated that they are both expressed in P. infestans during colonization of the host plant tomato and that crn2 induced defense-response genes in tomato. Our results indicate that combining data mining using PexFinder with PVX-based functional assays can facilitate the discovery of novel pathogen effector proteins. In principle, this strategy can be applied to a variety of eukaryotic plant pathogens, including

  3. A proactive approach to sustainable management of mine tailings

    NASA Astrophysics Data System (ADS)

    Edraki, Mansour; Baumgartl, Thomas

    2015-04-01

    The reactive strategies to manage mine tailings, i.e. containment of tailings slurries in tailings storage facilities (TSFs) and remediation of tailings solids or tailings seepage water after the decommissioning of those facilities, can be technically inefficient at eliminating environmental risks (e.g. preventing dispersion of contaminants and catastrophic dam wall failures), pose a long-term economic burden for companies, governments and society after mine closure, and often fail to meet community expectations. Most preventive environmental management practices promote proactive integrated approaches to waste management whereby the sources of environmental issues are identified to help make more informed decisions. They often use life cycle assessment to find the "hot spots" of environmental burdens. This kind of approach is often based on generic data and has rarely been used for tailings. Besides, life cycle assessments are less useful for designing operations or simulating changes in the process and consequent environmental outcomes. It is evident that an integrated approach for tailings research linked to better processing options is needed. A literature review revealed that there are only a few examples of integrated approaches. The aim of this project is to develop new tailings management models by streamlining orebody characterization, process optimization and rehabilitation. The approach is based on continuous fingerprinting of geochemical processes from orebody to tailings storage facility, and on benchmarking the success of such proactive initiatives by evidence of no impacts and no future projected impacts on receiving environments. We present an approach for developing such a framework and preliminary results from a case study where combined grinding and flotation models developed using geometallurgical data from the orebody were constructed to predict the properties of tailings produced under various processing scenarios. The modelling scenarios based on the

  4. Autonomous decision-making: a data mining approach.

    PubMed

    Kusiak, A; Kern, J A; Kernstine, K H; Tseng, B T

    2000-12-01

    The researchers and practitioners of today create models, algorithms, functions, and other constructs defined in abstract spaces. The research of the future will likely be data driven. Symbolic and numeric data that are becoming available in large volumes will define the need for new data analysis techniques and tools. Data mining is an emerging area of computational intelligence that offers new theories, techniques, and tools for analysis of large data sets. In this paper, a novel approach for autonomous decision-making is developed based on the rough set theory of data mining. The approach has been tested on a medical data set for patients with lung abnormalities referred to as solitary pulmonary nodules (SPNs). The two independent algorithms developed in this paper either generate an accurate diagnosis or make no decision. The methodology discussed in the paper departs from the developments in data mining as well as current medical literature, thus creating a viable approach for autonomous decision-making.

  5. Text mining electronic health records to identify hospital adverse events.

    PubMed

    Gerdes, Lars Ulrik; Hardahl, Christian

    2013-01-01

    Manual reviews of health records to identify possible adverse events are time consuming. We are developing a method based on natural language processing to quickly search electronic health records for common triggers and adverse events. Our results agree fairly well with those obtained using manual reviews, and we therefore believe that it is possible to develop automatic tools for monitoring aspects of patient safety.

  6. Current approaches for mitigating acid mine drainage.

    PubMed

    Sahoo, Prafulla Kumar; Kim, Kangjoo; Equeenuddin, Sk Md; Powell, Michael A

    2013-01-01

    AMD is one of the critical environmental problems that causes acidification and metal contamination of surface and ground water bodies when mine materials and/or overburden containing metal sulfides are exposed to oxidizing conditions. The best option to limit AMD is early avoidance of sulfide oxidation. Several techniques are available to achieve this. In this paper, we review all of the major methods now used to limit sulfide oxidation. These fall into five categories: (1) physical barriers, (2) bacterial inhibition, (3) chemical passivation, (4) electrochemical, and (5) desulfurization. We describe the processes underlying each method by category and then address aspects relating to effectiveness, cost, and environmental impact. This paper may help researchers and environmental engineers to select suitable methods for addressing site-specific AMD problems. Irrespective of the mechanism by which each method works, all share one common feature, i.e., they delay or prevent oxidation. In addition, all have limitations. Physical barriers such as wet or dry cover have retarded sulfide oxidation in several studies; however, both wet and dry barriers exhibit only short-term effectiveness. Wet cover is suitable at specific sites where complete inundation is established, but this approach requires high maintenance costs. When employing dry cover, plastic liners are expensive and rarely used for large volumes of waste. Bactericides can suppress oxidation, but are only effective on fresh tailings and short-lived, and do not serve as a permanent solution to AMD. In addition, application of bactericides may be toxic to aquatic organisms. Encapsulation or passivation of sulfide surfaces (applying organic and/or inorganic coatings) is simple and effective in preventing AMD. Among inorganic coatings, silica is the most promising, stable, acid-resistant and long lasting, as compared to phosphate and other inorganic coatings. Permanganate passivation is also promising because it

  7. Mining genomic databases to identify novel hydrogen producers.

    PubMed

    Kalia, Vipin C; Lal, Sadhana; Ghai, Rohit; Mandal, Manabendra; Chauhan, Ashwini

    2003-04-01

    The realization that fossil fuel reserves are limited and their adverse effect on the environment has forced us to look into alternative sources of energy. Hydrogen is a strong contender as a future fuel. Biological hydrogen production ranges from 0.37 to 3.3 moles H(2) per mole of glucose and, considering the high theoretical values of production (4.0 moles H(2) per mole of glucose), it is worth exploring approaches to increase hydrogen yields. Screening the untapped microbial population is a promising possibility. Sequence analysis and pathway alignment of hydrogen metabolism in complete and incomplete genomes has led to the identification of potential hydrogen producers.

  8. Wastewater treatment polymers identified as the toxic component of a diamond mine effluent.

    PubMed

    De Rosemond, Simone J C; Liber, Karsten

    2004-09-01

    The Ekati Diamond Mine, located approximately 300 km northeast of Yellowknife in Canada's Northwest Territories, uses mechanical crushing and washing processes to extract diamonds from kimberlite ore. The processing plant's effluent contains kimberlite ore particles (≤0.5 mm), wastewater, and two wastewater treatment polymers, a cationic polydiallyldimethylammonium chloride (DADMAC) polymer and an anionic sodium acrylate polyacrylamide (PAM) polymer. A series of acute (48-h) and chronic (7-d) toxicity tests determined the processed kimberlite effluent (PKE) was chronically, but not acutely, toxic to Ceriodaphnia dubia. Reproduction of C. dubia was inhibited significantly at concentrations as low as 12.5% PKE. Toxicity identification evaluations (TIE) were initiated to identify the toxic component of PKE. Ethylenediaminetetraacetic acid (EDTA), sodium thiosulfate, aeration, and solid phase extraction with C-18 manipulations failed to reduce PKE toxicity. Toxicity was reduced significantly by pH adjustments to pH 3 or 11 followed by filtration. Toxicity testing with C. dubia determined that the cationic DADMAC polymer had a 48-h median lethal concentration (LC50) of 0.32 mg/L and 7-d median effective concentration (EC50) of 0.014 mg/L. The anionic PAM polymer had a 48-h LC50 of 218 mg/L. A weight-of-evidence approach, using the data obtained from the TIE, the polymer toxicity experiments, the estimated concentration of the cationic polymer in the kimberlite effluent, and the behavior of kimberlite minerals in pH-adjusted solutions, provided sufficient evidence to identify the cationic DADMAC polymer as the toxic component of the diamond mine PKE.

  9. Using text-mining techniques in electronic patient records to identify ADRs from medicine use.

    PubMed

    Warrer, Pernille; Hansen, Ebba Holme; Juhl-Jensen, Lars; Aagaard, Lise

    2012-05-01

    This literature review included studies that use text-mining techniques in narrative documents stored in electronic patient records (EPRs) to investigate ADRs. We searched PubMed, Embase, Web of Science and International Pharmaceutical Abstracts without restrictions from origin until July 2011. We included empirically based studies on text mining of electronic patient records (EPRs) that focused on detecting ADRs, excluding those that investigated adverse events not related to medicine use. We extracted information on study populations, EPR data sources, frequencies and types of the identified ADRs, medicines associated with ADRs, text-mining algorithms used and their performance. Seven studies, all from the United States, were eligible for inclusion in the review. Studies were published from 2001, the majority between 2009 and 2010. Text-mining techniques varied over time from simple free text searching of outpatient visit notes and inpatient discharge summaries to more advanced techniques involving natural language processing (NLP) of inpatient discharge summaries. Performance appeared to increase with the use of NLP, although many ADRs were still missed. Due to differences in study design and populations, various types of ADRs were identified and thus we could not make comparisons across studies. The review underscores the feasibility and potential of text mining to investigate narrative documents in EPRs for ADRs. However, more empirical studies are needed to evaluate whether text mining of EPRs can be used systematically to collect new information about ADRs.

  10. Mining the Metabiome: Identifying Novel Natural Products from Microbial Communities

    PubMed Central

    Milshteyn, Aleksandr; Schneider, Jessica S.; Brady, Sean F.

    2014-01-01

    Summary Microbial-derived natural products provide the foundation for most of the chemotherapeutic arsenal available to contemporary medicine. In the face of a dwindling pipeline of new lead structures identified by traditional culturing techniques and an increasing need for new therapeutics, surveys of microbial biosynthetic diversity across environmental metabiomes have revealed enormous reservoirs of as yet untapped natural products chemistry. In this review we touch on the historical context of microbial natural product discovery and discuss innovations and technological advances that are facilitating culture-dependent and culture-independent access to new chemistry from environmental microbiomes with the goal of re-invigorating the small molecule therapeutics discovery pipeline. We highlight the successful strategies that have emerged and some of the challenges that must be overcome to enable the development of high-throughput methods for natural product discovery from complex microbial communities. PMID:25237864

  11. Data mining approach to model the diagnostic service management.

    PubMed

    Lee, Sun-Mi; Lee, Ae-Kyung; Park, Il-Su

    2006-01-01

    Korea has a National Health Insurance Program operated by the government-owned National Health Insurance Corporation, and diagnostic services are provided every two years for the insured and their family members. Developing a customer relationship management (CRM) system using data mining technology would be useful to improve the performance of diagnostic service programs. Under these circumstances, this study developed a model for diagnostic service management taking into account the characteristics of subjects using a data mining approach. This study could be further used to develop an automated CRM system contributing to the increase in the rate of receiving diagnostic services.

  12. Using Helicopter Electromagnetic Surveys to Identify Potential Hazards at Mine Waste Impoundments

    SciTech Connect

    Hammack, R.W.

    2008-01-01

    In July 2003, helicopter electromagnetic surveys were conducted at 14 coal waste impoundments in southern West Virginia. The purpose of the surveys was to detect conditions that could lead to impoundment failure either by structural failure of the embankment or by the flooding of adjacent or underlying mine works. Specifically, the surveys attempted to: 1) identify saturated zones within the mine waste, 2) delineate filtrate flow paths through the embankment or into adjacent strata and receiving streams, and 3) identify flooded mine workings underlying or adjacent to the waste impoundment. Data from the helicopter surveys were processed to generate conductivity/depth images. Conductivity/depth images were then spatially linked to georeferenced air photos or topographic maps for interpretation. Conductivity/depth images were found to provide a snapshot of the hydrologic conditions that exist within the impoundment. This information can be used to predict potential areas of failure within the embankment because of its ability to image the phreatic zone. Also, the electromagnetic survey can identify areas of unconsolidated slurry in the decant basin and beneath the embankment. Although shallow, flooded mineworks beneath the impoundment were identified by this survey, it cannot be assumed that electromagnetic surveys can detect all underlying mines. A preliminary evaluation of the data implies that helicopter electromagnetic surveys can provide a better understanding of the phreatic zone than the piezometer arrays that are typically used.

  13. Application of data mining approaches to drug delivery.

    PubMed

    Ekins, Sean; Shimada, Jun; Chang, Cheng

    2006-11-30

    Computational approaches play a key role in all areas of the pharmaceutical industry from data mining, experimental and clinical data capture to pharmacoeconomics and adverse events monitoring. They will likely continue to be indispensable assets along with a growing library of software applications. This is primarily due to the increasingly massive amount of biology, chemistry and clinical data, which is now entering the public domain mainly as a result of NIH and commercially funded projects. We are therefore in need of new methods for mining this mountain of data in order to enable new hypothesis generation. The computational approaches include, but are not limited to, database compilation, quantitative structure activity relationships (QSAR), pharmacophores, network visualization models, decision trees, machine learning algorithms and multidimensional data visualization software that could be used to improve drug delivery after mining public and/or proprietary data. We will discuss some areas of unmet needs in the area of data mining for drug delivery that can be addressed with new software tools or databases of relevance to future pharmaceutical projects.

  14. Mining Clinicians' Electronic Documentation to Identify Heart Failure Patients with Ineffective Self-Management: A Pilot Text-Mining Study.

    PubMed

    Topaz, Maxim; Radhakrishnan, Kavita; Lei, Victor; Zhou, Li

    2016-01-01

    Effective self-management can decrease heart failure hospitalizations by up to 50%. Unfortunately, self-management by patients with heart failure remains poor. This pilot study aimed to explore the use of text-mining to identify heart failure patients with ineffective self-management. We first built a comprehensive self-management vocabulary based on the literature and clinical notes review. We then randomly selected 545 heart failure patients treated within Partners Healthcare hospitals (Boston, MA, USA) and conducted a regular expression search with the compiled vocabulary within 43,107 interdisciplinary clinical notes of these patients. We found that 38.2% (n = 208) of patients had documentation of ineffective heart failure self-management in the domains of poor diet adherence (28.4%), missed medical encounters (26.4%), poor medication adherence (20.2%) and non-specified self-management issues (e.g., "compliance issues", 34.6%). We showed the feasibility of using text-mining to identify patients with ineffective self-management. More natural language processing algorithms are needed to help busy clinicians identify these patients.
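
    The regular expression search described in this abstract can be sketched roughly as follows. This is a minimal illustration only: the vocabulary terms, domain keys, and function names are hypothetical, since the abstract does not list the study's actual vocabulary or implementation.

```python
import re

# Hypothetical vocabulary: the abstract does not list the study's actual terms.
VOCABULARY = {
    "diet": [r"dietary indiscretion", r"not following (?:a )?low[- ]sodium diet"],
    "missed_encounters": [r"missed (?:an? )?appointments?", r"no[- ]show"],
    "medication": [r"not taking (?:his|her|their) medications?", r"ran out of medications?"],
    "non_specified": [r"compliance issues?", r"non-?complian\w+"],
}

# One compiled, case-insensitive pattern per domain.
PATTERNS = {
    domain: re.compile("|".join(f"(?:{term})" for term in terms), re.IGNORECASE)
    for domain, terms in VOCABULARY.items()
}


def flag_domains(note: str) -> set[str]:
    """Return the self-management domains whose terms appear in one note."""
    return {domain for domain, pattern in PATTERNS.items() if pattern.search(note)}


def flag_patients(notes_by_patient: dict[str, list[str]]) -> dict[str, set[str]]:
    """Union the flagged domains over all notes of each patient.

    Patients with no matching note are omitted from the result.
    """
    flagged: dict[str, set[str]] = {}
    for patient_id, notes in notes_by_patient.items():
        domains: set[str] = set()
        for note in notes:
            domains |= flag_domains(note)
        if domains:
            flagged[patient_id] = domains
    return flagged
```

    Keyword matching of this kind cannot tell negated mentions ("no compliance issues") from affirmed ones, which is one reason the authors call for further natural language processing.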

  15. Mining Functional Modules in Heterogeneous Biological Networks Using Multiplex PageRank Approach

    PubMed Central

    Li, Jun; Zhao, Patrick X.

    2016-01-01

    Identification of functional modules/sub-networks in large-scale biological networks is one of the important research challenges in current bioinformatics and systems biology. Approaches have been developed to identify functional modules in single-class biological networks; however, methods for systematically and interactively mining multiple classes of heterogeneous biological networks are lacking. In this paper, we present a novel algorithm (called mPageRank) that utilizes the Multiplex PageRank approach to mine functional modules from two classes of biological networks. We demonstrate the capabilities of our approach by successfully mining functional biological modules through integrating expression-based gene-gene association networks and protein-protein interaction networks. We first compared the performance of our method with that of other methods using simulated data. We then applied our method to identify the cell division cycle related functional module and plant signaling defense-related functional module in the model plant Arabidopsis thaliana. Our results demonstrated that the mPageRank method is effective for mining sub-networks in both expression-based gene-gene association networks and protein-protein interaction networks, and has the potential to be adapted for the discovery of functional modules/sub-networks in other heterogeneous biological networks. The mPageRank executable program, source code, the datasets and results of the presented two case studies are publicly and freely available at http://plantgrn.noble.org/MPageRank/. PMID:27446133

  16. Selecting Proper Plant Species for Mine Reclamation Using Fuzzy AHP Approach (Case Study: Chadormaloo Iron Mine of Iran)

    NASA Astrophysics Data System (ADS)

    Ebrahimabadi, Arash

    2016-12-01

    This paper describes an effective approach to selecting suitable plant species for reclamation of mined lands at the Chadormaloo iron mine, which is located in the central part of Iran, near the city of Bafgh in Yazd province. After a mine's total reserves are excavated, the mine needs to be permanently closed and reclaimed. Mine reclamation and post-mining land-use are the main issues in the phase of mine closure. In general, among various scenarios for the mine reclamation process, i.e. planting, agriculture, forestry, residency, tourist attraction, etc., planting is the oldest and most commonly used technology for the reclamation of lands damaged by mining activities. Planting and vegetation play a major role in restoring productivity, ecosystem stability and biological diversity to degraded areas; therefore, the main goal of this research work is to choose proper and suitable plants compatible with the conditions of the Chadormaloo mined area, providing consistent conditions for future use. To ensure the sustainability of the reclaimed landscape, the most suitable plant species adapted to the mine conditions are selected. Plant species selection is a Multi Criteria Decision Making (MCDM) problem. In this paper, a fuzzy MCDM technique, namely the Fuzzy Analytic Hierarchy Process (FAHP), is developed to assist Chadormaloo iron mine managers and designers in the process of plant type selection for reclamation of the mine under a fuzzy environment, where vagueness and uncertainty are taken into account with linguistic variables parameterized by triangular fuzzy numbers. The results achieved using the FAHP approach demonstrate that the most suitable plant species are ranked as Artemisia sieberi, Salsola yazdiana, Halophytes types, and Zygophyllum, respectively, for reclamation of the Chadormaloo iron mine.
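
    To make the idea of pairwise comparisons expressed as triangular fuzzy numbers concrete, here is a small sketch of one common FAHP variant, the fuzzy geometric mean method with centroid defuzzification. The abstract does not say which FAHP variant the authors used, so the method choice, the function names, and the example judgments are all assumptions made for illustration.

```python
from math import prod

# A triangular fuzzy number (TFN) as (lower, modal, upper).
TFN = tuple[float, float, float]

ONE: TFN = (1.0, 1.0, 1.0)


def reciprocal(t: TFN) -> TFN:
    """Reciprocal of a TFN: the bounds swap so the result stays ordered."""
    low, mid, up = t
    return (1.0 / up, 1.0 / mid, 1.0 / low)


def fuzzy_ahp_weights(matrix: list[list[TFN]]) -> list[float]:
    """Crisp priority weights from a fuzzy pairwise comparison matrix.

    Uses the geometric mean method (an assumed variant, not necessarily the
    paper's): row-wise fuzzy geometric means, fuzzy normalization, centroid
    defuzzification, then renormalization so the weights sum to 1.
    """
    n = len(matrix)
    # Component-wise geometric mean of each row.
    means = [
        tuple(prod(row[j][k] for j in range(n)) ** (1.0 / n) for k in range(3))
        for row in matrix
    ]
    total_low = sum(m[0] for m in means)
    total_mid = sum(m[1] for m in means)
    total_up = sum(m[2] for m in means)
    # Dividing by a fuzzy total pairs lower bounds with the upper total.
    fuzzy_weights = [
        (m[0] / total_up, m[1] / total_mid, m[2] / total_low) for m in means
    ]
    crisp = [sum(w) / 3.0 for w in fuzzy_weights]  # centroid of each TFN
    scale = sum(crisp)
    return [c / scale for c in crisp]


# Hypothetical judgment: option A is "moderately preferred" over option B.
moderately = (2.0, 3.0, 4.0)
example = [[ONE, moderately], [reciprocal(moderately), ONE]]
weights = fuzzy_ahp_weights(example)
```

    In a full study one such matrix would be built per criterion and the resulting weights aggregated across the hierarchy; the sketch covers a single matrix only.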

  17. A Pattern Mining Approach for Classifying Multivariate Temporal Data.

    PubMed

    Batal, Iyad; Valizadegan, Hamed; Cooper, Gregory F; Hauskrecht, Milos

    2011-11-12

    We study the problem of learning classification models from complex multivariate temporal data encountered in electronic health record systems. The challenge is to define a good set of features that are able to represent well the temporal aspect of the data. Our method relies on temporal abstractions and temporal pattern mining to extract the classification features. Temporal pattern mining usually returns a large number of temporal patterns, most of which may be irrelevant to the classification task. To address this problem, we present the minimal predictive temporal patterns framework to generate a small set of predictive and non-spurious patterns. We apply our approach to the real-world clinical task of predicting patients who are at risk of developing heparin induced thrombocytopenia. The results demonstrate the benefit of our approach in learning accurate classifiers, which is a key step for developing intelligent clinical monitoring systems.

  18. Efflorescent sulfates from Baia Sprie mining area (Romania)--Acid mine drainage and climatological approach.

    PubMed

    Buzatu, Andrei; Dill, Harald G; Buzgar, Nicolae; Damian, Gheorghe; Maftei, Andreea Elena; Apopei, Andrei Ionuț

    2016-01-15

    The Baia Sprie epithermal system, a deposit well known for its impressive mineralogical associations, shows the proper conditions for acid mine drainage and can be considered a general example for affected mining areas around the globe. Efflorescent samples from the abandoned open pit Minei Hill have been analyzed by X-ray diffraction (XRD), scanning electron microscopy (SEM), Raman and near-infrared (NIR) spectrometry. The identified phases represent mostly iron sulfates with different hydration degrees (szomolnokite, rozenite, melanterite, coquimbite, ferricopiapite), Zn and Al sulfates (gunningite, alunogen, halotrichite). The samples were heated at different temperatures in order to establish the phase transformations among the studied sulfates. The dehydration temperatures and intermediate phases upon decomposition were successfully identified for each of the mineral phases. Gunningite was the single sulfate that showed no transformations during the heating experiment. All the other sulfates started to dehydrate within the 30-90 °C temperature range. Acid mine drainage is the main cause of sulfate formation, triggered by pyrite oxidation as the major source for the abundant iron sulfates. Based on the dehydration temperatures, the climatological interpretation indicated that melanterite formation and long-term presence is related to continental and temperate climates. Coquimbite and rozenite are attributed also to the dry arid/semi-arid areas, in addition to the above mentioned ones. The more stable sulfates, alunogen, halotrichite, szomolnokite, ferricopiapite and gunningite, can form and persist in all climate regimes, from dry continental to even tropical humid.

  19. A geomorphological approach to the management of rivers contaminated by metal mining

    NASA Astrophysics Data System (ADS)

    Macklin, M. G.; Brewer, P. A.; Hudson-Edwards, K. A.; Bird, G.; Coulthard, T. J.; Dennis, I. A.; Lechler, P. J.; Miller, J. R.; Turner, J. N.

    2006-09-01

    As the result of current and historical metal mining, river channels and floodplains in many parts of the world have become contaminated by metal-rich waste in concentrations that may pose a hazard to human livelihoods and sustainable development. Environmental and human health impacts commonly arise because of the prolonged residence time of heavy metals in river sediments and alluvial soils and their bioaccumulatory nature in plants and animals. This paper considers how an understanding of the processes of sediment-associated metal dispersion in rivers, and the space and timescales over which they operate, can be used in a practical way to help river basin managers more effectively control and remediate catchments affected by current and historical metal mining. A geomorphological approach to the management of rivers contaminated by metals is outlined and four emerging research themes are highlighted and critically reviewed. These are: (1) response and recovery of river systems following the failures of major tailings dams; (2) effects of flooding on river contamination and the sustainable use of floodplains; (3) new developments in isotopic fingerprinting, remote sensing and numerical modelling for identifying the sources of contaminant metals and for mapping the spatial distribution of contaminants in river channels and floodplains; and (4) current approaches to the remediation of river basins affected by mining, appraised in light of the European Union's Water Framework Directive (2000/60/EC). Future opportunities for geomorphologically-based assessments of mining-affected catchments are also identified.

  20. Magnetic signature of overbank sediment in industry impacted floodplains identified by data mining methods

    NASA Astrophysics Data System (ADS)

    Chudaničová, Monika; Hutchinson, Simon M.

    2016-11-01

    Our study attempts to identify a characteristic magnetic signature of overbank sediments exhibiting anthropogenically induced magnetic enhancement and thereby to distinguish them from unenhanced sediments with weak magnetic background values, using a novel approach based on data mining methods, thus providing a means of rapid pollution determination. Data were obtained from 539 bulk samples from vertical profiles through overbank sediment, collected on seven rivers in the eastern Czech Republic and three rivers in northwest England. k-Means clustering and hierarchical clustering methods, paired group (UPGMA) and Ward's method, were used to divide the samples into natural groups according to their attributes. Interparametric ratios (SIRM/χ, SIRM/ARM and S-0.1T) were chosen as attributes for analyses, making the resultant model more widely applicable, as magnetic concentration values can differ by two orders of magnitude. Division into three clusters appeared to be optimal and corresponded to inherent clusters in the data scatter. Clustering managed to separate samples with relatively weak anthropogenically induced enhancement, relatively strong anthropogenically induced enhancement and samples lacking enhancement. To describe the clusters explicitly and thus obtain a discrete magnetic signature, classification rules (JRip method) and decision trees (J4.8 and Simple Cart methods) were used. Samples lacking anthropogenic enhancement typically exhibited an S-0.1T < c. 0.5, SIRM/ARM < c. 150 and SIRM/χ < c. 6000 A m⁻¹. Samples with magnetic enhancement all exhibited an S-0.1T > 0.5. Samples with relatively stronger anthropogenic enhancement were unequivocally distinguished from the samples with weaker enhancement by an SIRM/ARM > c. 150. Samples with SIRM/ARM in a range c. 126-150 were classified as relatively strongly enhanced when their SIRM/χ > 18 000 A m⁻¹ and relatively less enhanced when their SIRM/χ < 18 000 A m⁻¹. An additional rule was arbitrarily added to exclude samples with
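
    The thresholds quoted in this abstract can be written down as a simple rule-based classifier, sketched below. Several points are assumptions: the thresholds are given only approximately ("c."), the handling of values that fall exactly on a boundary is a choice made here, enhanced samples with SIRM/ARM below about 126 are treated as weakly enhanced, and the additional exclusion rule is truncated in the record and therefore not implemented. The function and label names are hypothetical.

```python
def classify_sample(s_ratio: float, sirm_arm: float, sirm_chi: float) -> str:
    """Classify an overbank sediment sample from three interparametric ratios.

    s_ratio  -- the S-0.1T ratio (dimensionless)
    sirm_arm -- SIRM/ARM (dimensionless)
    sirm_chi -- SIRM/chi in A/m

    Thresholds follow the approximate values quoted in the abstract; the
    treatment of exact boundary values is an assumption of this sketch.
    """
    # All enhanced samples had S-0.1T > 0.5, so anything else is unenhanced.
    if s_ratio <= 0.5:
        return "unenhanced"
    # SIRM/ARM above about 150 marks the relatively strong enhancement.
    if sirm_arm > 150:
        return "strongly enhanced"
    # In the overlap range (about 126-150), SIRM/chi decides.
    if sirm_arm >= 126 and sirm_chi > 18000:
        return "strongly enhanced"
    # Assumed: every remaining enhanced sample is relatively weakly enhanced.
    return "weakly enhanced"
```

    For example, a sample with S-0.1T = 0.8, SIRM/ARM = 140 and SIRM/χ = 20 000 A/m falls in the overlap range and is labelled strongly enhanced by the SIRM/χ rule.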

  1. Identifying MMORPG Bots: A Traffic Analysis Approach

    NASA Astrophysics Data System (ADS)

    Chen, Kuan-Ta; Jiang, Jhih-Wei; Huang, Polly; Chu, Hao-Hua; Lei, Chin-Laung; Chen, Wen-Chin

    2008-12-01

    Massively multiplayer online role playing games (MMORPGs) have become extremely popular among network gamers. Despite their success, one of MMORPG's greatest challenges is the increasing use of game bots, that is, autoplaying game clients. The use of game bots is considered unsportsmanlike and is therefore forbidden. To keep games in order, game police, played by actual human players, often patrol game zones and question suspicious players. This practice, however, is labor-intensive and ineffective. To address this problem, we analyze the traffic generated by human players versus game bots and propose general solutions to identify game bots. Taking Ragnarok Online as our subject, we study the traffic generated by human players and game bots. We find that their traffic is distinguishable by 1) the regularity in the release time of client commands, 2) the trend and magnitude of traffic burstiness in multiple time scales, and 3) the sensitivity to different network conditions. Based on these findings, we propose four strategies and two ensemble schemes to identify bots. Finally, we discuss the robustness of the proposed methods against countermeasures of bot developers, and consider a number of possible ways to manage the increasingly serious bot problem.

  2. Identifying and Describing a Seismogenic Zone in a Sublevel Caving Mine

    NASA Astrophysics Data System (ADS)

    Abolfazlzadeh, Yousef; Hudyma, Marty

    2016-09-01

    Analysis of caving-induced seismicity can aid in the understanding of rock mass behaviour in the different stages of the caving process. A detailed analysis of caving-induced seismicity at the Telfer sublevel caving mine was undertaken. Interpretation of seismic data in the Telfer mine showed the influence of the major geological features on cave behaviour and helped to identify the phases of cave evolution. Two geological zones with unique seismic characteristics (the M50 and M30 stiff reefs) and four key caving phases (initial undercut blasting, cave initiation, cave propagation and breakthrough) were defined through seismic data analysis. Movement of the seismogenic zone was significantly affected by the stiff reefs within the cave column. Seismic source parameter analysis was used to investigate caving mechanisms at Telfer.

  3. Clustering-based approaches to SAGE data mining

    PubMed Central

    Wang, Haiying; Zheng, Huiru; Azuaje, Francisco

    2008-01-01

    Serial analysis of gene expression (SAGE) is one of the most powerful tools for global gene expression profiling. It has led to several biological discoveries and biomedical applications, such as the prediction of new gene functions and the identification of biomarkers in human cancer research. Clustering techniques have become fundamental approaches in these applications. This paper reviews relevant clustering techniques specifically designed for this type of data. It places an emphasis on current limitations and opportunities in this area for supporting biologically-meaningful data mining and visualisation. PMID:18822151

  4. Using cloud association rule data mining approach in optical networks

    NASA Astrophysics Data System (ADS)

    Ma, Bin

    2007-11-01

    In current DWDM networks, one of the critical design issues in network utilization is careful planning to minimize burst dropping resulting from resource contention. Suitable planning before metadata are sent is critical to improving the rate of successful transmission. In this paper, we adopt a novel data mining approach to determine a suitable routing path in the OBS network. Instead of using label switching techniques in DWDM, we propose hybrid OBS routing planning on the basis of the Cloud Association Rules Algorithm, thus reducing the transmission collision rate in OBS routing. This paper searches for the optimal routing path among all possible routing paths using a cloud association rule approach with the Apriori-gen algorithm, based on the PACNet topology. The heuristic rules discovered by the Apriori-gen algorithm are stored in the Knowledge Base (KB) as references for determining the most suitable routing path. The Knowledge Base of routing paths is built from the optimal routing paths with the highest success rates, which are mined from the database of historical routing paths using cloud association rules. The experimental results show that the routing paths obtained by the proposed routing planning approach can effectively improve the success rate of transmission.
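
    As background for the Apriori-gen step mentioned in this abstract, the sketch below shows generic Apriori frequent-itemset mining with join-and-prune candidate generation. It does not implement the cloud model, the OBS routing logic, or the PACNet topology from the paper; treating each historical route as a set of link labels is purely an illustrative assumption, as are the function names.

```python
from itertools import combinations


def apriori_gen(frequent_k: set[frozenset], k: int) -> set[frozenset]:
    """Join frequent k-itemsets into (k+1)-candidates, then prune.

    A candidate survives only if every one of its k-subsets is frequent
    (the Apriori property).
    """
    candidates = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(subset) in frequent_k
                for subset in combinations(union, k)
            ):
                candidates.add(union)
    return candidates


def frequent_itemsets(transactions, min_support: int) -> dict[frozenset, int]:
    """Return {itemset: support count} for itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    counts: dict[frozenset, int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)
    k = 1
    while current:
        candidates = apriori_gen(set(current), k)
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
        k += 1
    return result


# Hypothetical history: each successful route is a set of link labels.
routes = [
    {"A-B", "B-C", "C-D"},
    {"A-B", "B-C"},
    {"A-B", "C-D"},
    {"B-C", "C-D"},
]
frequent = frequent_itemsets(routes, min_support=2)
```

    Association rules would then be derived from these frequent itemsets by comparing support counts; that step, and any scoring by transmission success, is left out of the sketch.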

  5. A practical approach for content mining of Tweets.

    PubMed

    Yoon, Sunmoo; Elhadad, Noémie; Bakken, Suzanne

    2013-07-01

    Use of data generated through social media for health studies is gradually increasing. Twitter is a short-text message system developed 6 years ago, now with more than 100 million users generating over 300 million Tweets every day. Twitter may be used to gain real-world insights to promote healthy behaviors. The purposes of this paper are to describe a practical approach to analyzing Tweet contents and to illustrate an application of the approach to the topic of physical activity. The approach includes five steps: (1) selecting keywords to gather an initial set of Tweets to analyze; (2) importing data; (3) preparing data; (4) analyzing data (topic, sentiment, and ecologic context); and (5) interpreting data. The steps are implemented using tools that are publicly available and free of charge and designed for use by researchers with limited programming skills. Content mining of Tweets can contribute to addressing challenges in health behavior research.

  6. A Practical Approach for Content Mining of Tweets

    PubMed Central

    Yoon, Sunmoo; Elhadad, Noémie; Bakken, Suzanne

    2013-01-01

    Use of data generated through social media for health studies is gradually increasing. Twitter is a short-text message system developed 6 years ago, now with more than 100 million users generating over 300 million Tweets every day. Twitter may be used to gain real-world insights to promote healthy behaviors. The purposes of this paper are to describe a practical approach to analyzing Tweet contents and to illustrate an application of the approach to the topic of physical activity. The approach includes five steps: (1) selecting keywords to gather an initial set of Tweets to analyze; (2) importing data; (3) preparing data; (4) analyzing data (topic, sentiment, and ecologic context); and (5) interpreting data. The steps are implemented using tools that are publicly available and free of charge and designed for use by researchers with limited programming skills. Content mining of Tweets can contribute to addressing challenges in health behavior research. PMID:23790998

  7. WHAT INNOVATIVE APPROACHES CAN BE DEVELOPED FOR MINING SITES?

    EPA Science Inventory

    Mining is essential to maintain our way of life. However, based upon industry's reporting in the most recent Toxic Release Inventory (TRI), the primary sources of heavy metal releases to the environment are mining and mining related activities. The hard rock mining industry rel...

  8. An efficacy driven approach for medication recommendation in type 2 diabetes treatment using data mining techniques.

    PubMed

    Liu, Haifeng; Xie, Guotong; Mei, Jing; Shen, Weijia; Sun, Wen; Li, Xiang

    2013-01-01

    We demonstrate how data mining techniques can help recommend effective medications when physicians need to control the glucose level of patients with type 2 diabetes. We first identify the factors that may affect physicians' medication decisions and then develop a patient-similarity based approach to automatically recommend medications for a patient with the specific condition so that his blood glucose level (measured by HbA1C value) can be well controlled. The approach is validated through experiments on real data sets and compared with the recommendations by following a clinical guideline.

  9. Identifying Understudied Nuclear Reactions by Text-mining the EXFOR Experimental Nuclear Reaction Library

    NASA Astrophysics Data System (ADS)

    Hirdt, J. A.; Brown, D. A.

    2016-01-01

    The EXFOR library contains the largest collection of experimental nuclear reaction data available as well as the data's bibliographic information and experimental details. We text-mined the REACTION and MONITOR fields of the ENTRYs in the EXFOR library in order to identify understudied reactions and quantities. Using the results of the text-mining, we created an undirected graph from the EXFOR datasets with each graph node representing a single reaction and quantity and graph links representing the various types of connections between these reactions and quantities. This graph is an abstract representation of the connections in EXFOR, similar to graphs of social networks, authorship networks, etc. We use various graph theoretical tools to identify important yet understudied reactions and quantities in EXFOR. Although we identified a few cross sections relevant for shielding applications and isotope production, mostly we identified charged particle fluence monitor cross sections. As a side effect of this work, we learn that our abstract graph is typical of other real-world graphs.
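
    One way to picture the graph analysis described in this abstract is the sketch below. The abstract does not say which graph-theoretical tools were used, so degree centrality stands in as an assumed importance measure; the node labels, dataset counts, and the `max_datasets` threshold are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Fraction of the other nodes each node is linked to (undirected graph)."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


def understudied(edges, dataset_counts, max_datasets):
    """Well-connected nodes with few datasets, most central first."""
    centrality = degree_centrality(edges)
    sparse = [node for node in centrality
              if dataset_counts.get(node, 0) <= max_datasets]
    return sorted(sparse, key=lambda node: centrality[node], reverse=True)


# Hypothetical reaction/quantity nodes and the links between them.
edges = [("R1", "R2"), ("R1", "R3"), ("R1", "R4"), ("R2", "R3")]
dataset_counts = {"R1": 1, "R2": 40, "R3": 2, "R4": 1}
ranking = understudied(edges, dataset_counts, max_datasets=2)
```

    Here a node that many other reactions connect to, but that has few measurements of its own, rises to the top of the ranking.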

  10. Identifying Understudied Nuclear Reactions by Text-mining the EXFOR Experimental Nuclear Reaction Library

    SciTech Connect

    Hirdt, J.A.; Brown, D.A.

    2016-01-15

    The EXFOR library contains the largest collection of experimental nuclear reaction data available as well as the data's bibliographic information and experimental details. We text-mined the REACTION and MONITOR fields of the ENTRYs in the EXFOR library in order to identify understudied reactions and quantities. Using the results of the text-mining, we created an undirected graph from the EXFOR datasets with each graph node representing a single reaction and quantity and graph links representing the various types of connections between these reactions and quantities. This graph is an abstract representation of the connections in EXFOR, similar to graphs of social networks, authorship networks, etc. We use various graph theoretical tools to identify important yet understudied reactions and quantities in EXFOR. Although we identified a few cross sections relevant for shielding applications and isotope production, mostly we identified charged particle fluence monitor cross sections. As a side effect of this work, we learn that our abstract graph is typical of other real-world graphs.

  11. Development and application of the Safe Performance Index as a risk-based methodology for identifying major hazard-related safety issues in underground coal mines

    NASA Astrophysics Data System (ADS)

    Kinilakodi, Harisha

    The underground coal mining industry has been under constant watch due to the high risk involved in its activities, and scrutiny increased because of the disasters that occurred in 2006-07. In the aftermath of the incidents, the U.S. Congress passed the Mine Improvement and New Emergency Response Act of 2006 (MINER Act), which strengthened the existing regulations and mandated new laws to address the various issues related to a safe working environment in the mines. Risk analysis in any form should be done on a regular basis to tackle the possibility of unwanted major hazard-related events such as explosions, outbursts, airbursts, inundations, spontaneous combustion, and roof fall instabilities. One of the responses by the Mine Safety and Health Administration (MSHA) in 2007 involved a new pattern of violations (POV) process to target mines with a poor safety performance, specifically to improve their safety. However, the 2010 disaster (worst in 40 years) gave an impression that the collective effort of the industry, federal/state agencies, and researchers to achieve the goal of zero fatalities and serious injuries has gone awry. The Safe Performance Index (SPI) methodology developed in this research is a straight-forward, effective, transparent, and reproducible approach that can help in identifying and addressing some of the existing issues while targeting (poor safety performance) mines which need help. It combines three injury and three citation measures that are scaled to have an equal mean (5.0) in a balanced way with proportionate weighting factors (0.05, 0.15, 0.30) and overall normalizing factor (15) into a mine safety performance evaluation tool. It can be used to assess the relative safety-related risk of mines, including by mine-size category. Using 2008 and 2009 data, comparisons were made of SPI-associated, normalized safety performance measures across mine-size categories, with emphasis on small-mine safety performance as compared to large- and

  12. An Integrated Assessment Approach to Address Artisanal and Small-Scale Gold Mining in Ghana

    PubMed Central

    Basu, Niladri; Renne, Elisha P.; Long, Rachel N.

    2015-01-01

    Artisanal and small-scale gold mining (ASGM) is growing in many regions of the world including Ghana. The problems in these communities are complex and multi-faceted. To help increase understanding of such problems, and to enable consensus-building and effective translation of scientific findings to stakeholders, help inform policies, and ultimately improve decision making, we utilized an Integrated Assessment approach to study artisanal and small-scale gold mining activities in Ghana. Though Integrated Assessments have been used in the fields of environmental science and sustainable development, their use in addressing specific matter in public health, and in particular, environmental and occupational health is quite limited despite their many benefits. The aim of the current paper was to describe specific activities undertaken and how they were organized, and the outputs and outcomes of our activity. In brief, three disciplinary workgroups (Natural Sciences, Human Health, Social Sciences and Economics) were formed, with 26 researchers from a range of Ghanaian institutions plus international experts. The workgroups conducted activities in order to address the following question: What are the causes, consequences and correctives of small-scale gold mining in Ghana? More specifically: What alternatives are available in resource-limited settings in Ghana that allow for gold-mining to occur in a manner that maintains ecological health and human health without hindering near- and long-term economic prosperity? Several response options were identified and evaluated, and are currently being disseminated to various stakeholders within Ghana and internationally. PMID:26393627

  13. An Integrated Assessment Approach to Address Artisanal and Small-Scale Gold Mining in Ghana.

    PubMed

    Basu, Niladri; Renne, Elisha P; Long, Rachel N

    2015-09-17

    Artisanal and small-scale gold mining (ASGM) is growing in many regions of the world including Ghana. The problems in these communities are complex and multi-faceted. To help increase understanding of such problems, and to enable consensus-building and effective translation of scientific findings to stakeholders, help inform policies, and ultimately improve decision making, we utilized an Integrated Assessment approach to study artisanal and small-scale gold mining activities in Ghana. Though Integrated Assessments have been used in the fields of environmental science and sustainable development, their use in addressing specific matter in public health, and in particular, environmental and occupational health is quite limited despite their many benefits. The aim of the current paper was to describe specific activities undertaken and how they were organized, and the outputs and outcomes of our activity. In brief, three disciplinary workgroups (Natural Sciences, Human Health, Social Sciences and Economics) were formed, with 26 researchers from a range of Ghanaian institutions plus international experts. The workgroups conducted activities in order to address the following question: What are the causes, consequences and correctives of small-scale gold mining in Ghana? More specifically: What alternatives are available in resource-limited settings in Ghana that allow for gold-mining to occur in a manner that maintains ecological health and human health without hindering near- and long-term economic prosperity? Several response options were identified and evaluated, and are currently being disseminated to various stakeholders within Ghana and internationally.

  14. Using machine vision and data mining techniques to identify cell properties via microfluidic flow analysis

    NASA Astrophysics Data System (ADS)

    Horowitz, Geoffrey; Bowie, Samuel; Liu, Anna; Stone, Nicholas; Sulchek, Todd; Alexeev, Alexander

    2016-11-01

    In order to quickly identify the wide range of mechanistic properties that are seen in cell populations, a coupled machine vision and data mining analysis is developed to examine high speed videos of cells flowing through a microfluidic device. The microfluidic device contains a microchannel decorated with a periodic array of diagonal ridges. The ridges compress flowing cells, which results in complex cell trajectories and induces cross-channel drift; both depend on the cells' intrinsic mechanical properties, which can be used to characterize specific cell lines. Thus, the cell trajectory analysis can yield a parameter set that can serve as a unique identifier of a cell's membership in a specific cell population. By using the correlations between the cell populations and measured cell trajectories in the ridged microchannel, mechanical properties of individual cells and their specific populations can be identified using only information captured by video analysis. Financial support provided by National Science Foundation (NSF) Grant No. CMMI 1538161.

  15. Identifying Catchment-Scale Predictors of Coal Mining Impacts on New Zealand Stream Communities.

    PubMed

    Clapcott, Joanne E; Goodwin, Eric O; Harding, Jon S

    2016-03-01

    Coal mining activities can have severe and long-term impacts on freshwater ecosystems. At the individual stream scale, these impacts have been well studied; however, few attempts have been made to determine the predictors of mine impacts at a regional scale. We investigated whether catchment-scale measures of mining impacts could be used to predict biological responses. We collated data from multiple studies and analyzed algae, benthic invertebrate, and fish community data from 186 stream sites, including un-mined streams, and those associated with 620 mines on the West Coast of the South Island, New Zealand. Algal, invertebrate, and fish richness responded to mine impacts and were significantly higher in un-mined compared to mine-impacted streams. Changes in community composition toward more acid- and metal-tolerant species were evident for algae and invertebrates, whereas changes in fish communities were significant and driven by a loss of nonmigratory native species. Consistent catchment-scale predictors of mining activities affecting biota included the time post mining (years), mining density (the number of mines upstream per catchment area), and mining intensity (tons of coal production per catchment area). Mining was associated with a decline in stream biodiversity irrespective of catchment size, and recovery was not evident until at least 30 years after mining activities have ceased. These catchment-scale predictors can provide managers and regulators with practical metrics to focus on management and remediation decisions.
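
    The two area-normalized predictors defined in this abstract reduce to simple ratios, sketched below. The function names and the choice of square kilometres as the area unit are assumptions; the abstract specifies only "per catchment area".

```python
def mining_density(mines_upstream, catchment_area_km2):
    """Number of mines upstream per unit catchment area."""
    if catchment_area_km2 <= 0:
        raise ValueError("catchment area must be positive")
    return mines_upstream / catchment_area_km2


def mining_intensity(coal_tons, catchment_area_km2):
    """Tons of coal produced per unit catchment area."""
    if catchment_area_km2 <= 0:
        raise ValueError("catchment area must be positive")
    return coal_tons / catchment_area_km2


# Hypothetical catchment: 4 upstream mines, 50,000 t of coal, 20 km2.
density = mining_density(4, 20.0)
intensity = mining_intensity(50_000, 20.0)
```

    Normalizing by catchment area is what lets sites of different sizes be compared on the same scale.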

  16. Identifying Catchment-Scale Predictors of Coal Mining Impacts on New Zealand Stream Communities

    NASA Astrophysics Data System (ADS)

    Clapcott, Joanne E.; Goodwin, Eric O.; Harding, Jon S.

    2016-03-01

    Coal mining activities can have severe and long-term impacts on freshwater ecosystems. At the individual stream scale, these impacts have been well studied; however, few attempts have been made to determine the predictors of mine impacts at a regional scale. We investigated whether catchment-scale measures of mining impacts could be used to predict biological responses. We collated data from multiple studies and analyzed algae, benthic invertebrate, and fish community data from 186 stream sites, including un-mined streams, and those associated with 620 mines on the West Coast of the South Island, New Zealand. Algal, invertebrate, and fish richness responded to mine impacts and were significantly higher in un-mined compared to mine-impacted streams. Changes in community composition toward more acid- and metal-tolerant species were evident for algae and invertebrates, whereas changes in fish communities were significant and driven by a loss of nonmigratory native species. Consistent catchment-scale predictors of mining activities affecting biota included the time post mining (years), mining density (the number of mines upstream per catchment area), and mining intensity (tons of coal production per catchment area). Mining was associated with a decline in stream biodiversity irrespective of catchment size, and recovery was not evident until at least 30 years after mining activities have ceased. These catchment-scale predictors can provide managers and regulators with practical metrics to focus on management and remediation decisions.

  17. Semi-automated literature mining to identify putative biomarkers of disease from multiple biofluids

    PubMed Central

    2014-01-01

    Background Computational methods for mining of biomedical literature can be useful in augmenting manual searches of the literature using keywords for disease-specific biomarker discovery from biofluids. In this work, we develop and apply a semi-automated literature mining method to mine abstracts obtained from PubMed to discover putative biomarkers of breast and lung cancers in specific biofluids. Methodology A positive set of abstracts was defined by the terms ‘breast cancer’ and ‘lung cancer’ in conjunction with 14 separate ‘biofluids’ (bile, blood, breastmilk, cerebrospinal fluid, mucus, plasma, saliva, semen, serum, synovial fluid, stool, sweat, tears, and urine), while a negative set of abstracts was defined by the terms ‘(biofluid) NOT breast cancer’ or ‘(biofluid) NOT lung cancer.’ More than 5.3 million total abstracts were obtained from PubMed and examined for biomarker-disease-biofluid associations (34,296 positive and 2,653,396 negative for breast cancer; 28,355 positive and 2,595,034 negative for lung cancer). Biological entities such as genes and proteins were tagged using ABNER, and processed using Python scripts to produce a list of putative biomarkers. Z-scores were calculated, ranked, and used to determine significance of putative biomarkers found. Manual verification of relevant abstracts was performed to assess our method’s performance. Results Biofluid-specific markers were identified from the literature, assigned relevance scores based on frequency of occurrence, and validated using known biomarker lists and/or databases for lung and breast cancer [NCBI’s On-line Mendelian Inheritance in Man (OMIM), Cancer Gene annotation server for cancer genomics (CAGE), NCBI’s Genes & Disease, NCI’s Early Detection Research Network (EDRN), and others]. The specificity of each marker for a given biofluid was calculated, and the performance of our semi-automated literature mining method assessed for breast and lung cancer

  18. Identifying Liver Cancer and Its Relations with Diseases, Drugs, and Genes: A Literature-Based Approach

    PubMed Central

    Song, Min

    2016-01-01

    In biomedicine, scientific literature is a valuable source for knowledge discovery. Mining knowledge from textual data has become an increasingly important task as the volume of scientific literature grows at an unprecedented rate. In this paper, we propose a framework for examining a certain disease based on existing information provided by scientific literature. Disease-related entities that include diseases, drugs, and genes are systematically extracted and analyzed using a three-level network-based approach. A paper-entity network and an entity co-occurrence network (macro-level) are explored and used to construct six entity specific networks (meso-level). Important diseases, drugs, and genes as well as salient entity relations (micro-level) are identified from these networks. Results obtained from the literature-based literature mining can serve to assist clinical applications. PMID:27195695

  19. Precursor-centric genome-mining approach for lasso peptide discovery

    PubMed Central

    Maksimov, Mikhail O.; Pelczer, István; Link, A. James

    2012-01-01

    Lasso peptides are a class of ribosomally synthesized posttranslationally modified natural products found in bacteria. Currently known lasso peptides have a diverse set of pharmacologically relevant activities, including inhibition of bacterial growth, receptor antagonism, and enzyme inhibition. The biosynthesis of lasso peptides is specified by a cluster of three genes encoding a precursor protein and two enzymes. Here we develop a unique genome-mining algorithm to identify lasso peptide gene clusters in prokaryotes. Our approach involves pattern matching to a small number of conserved amino acids in precursor proteins, and thus allows for a more global survey of lasso peptide gene clusters than does homology-based genome mining. Of more than 3,000 currently sequenced prokaryotic genomes, we found 76 organisms that are putative lasso peptide producers. These organisms span nine bacterial phyla and an archaeal phylum. To provide validation of the genome-mining method, we focused on a single lasso peptide predicted to be produced by the freshwater bacterium Asticcacaulis excentricus. Heterologous expression of an engineered, minimal gene cluster in Escherichia coli led to the production of a unique lasso peptide, astexin-1. At 23 aa, astexin-1 is the largest lasso peptide isolated to date. It is also highly polar, in contrast to many lasso peptides that are primarily hydrophobic. Astexin-1 has modest antimicrobial activity against its phylogenetic relative Caulobacter crescentus. The solution structure of astexin-1 was determined revealing a unique topology that is stabilized by hydrogen bonding between segments of the peptide. PMID:22949633
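
    The idea of matching a few conserved residues, rather than relying on overall sequence homology, can be sketched with a regular expression. The motif below is a placeholder chosen for illustration and is not the pattern used by the authors; the length cutoff, function name, and sequences are likewise hypothetical.

```python
import re

# Placeholder motif (an assumption, not the published pattern):
# Thr, any residue, Gly, then 6-7 arbitrary residues followed by Asp or Glu.
MOTIF = re.compile(r"T.G.{6,7}[DE]")


def find_candidates(proteins, max_length=80):
    """Return (name, match start) for short proteins containing the motif."""
    hits = []
    for name, seq in proteins.items():
        if len(seq) > max_length:
            continue  # precursor peptides are short; skip long proteins
        match = MOTIF.search(seq)
        if match:
            hits.append((name, match.start()))
    return hits


# Made-up sequences for illustration only.
proteins = {
    "p1": "MKKLSTAGAAAAAAEWW",
    "p2": "MKKLSAAAAAAAAWW",
}
candidates = find_candidates(proteins)
```

    Because only a handful of positions are constrained, a scan like this can flag candidates whose remaining sequence has diverged too far for a homology search to detect.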

  20. Precursor-centric genome-mining approach for lasso peptide discovery.

    PubMed

    Maksimov, Mikhail O; Pelczer, István; Link, A James

    2012-09-18

    Lasso peptides are a class of ribosomally synthesized posttranslationally modified natural products found in bacteria. Currently known lasso peptides have a diverse set of pharmacologically relevant activities, including inhibition of bacterial growth, receptor antagonism, and enzyme inhibition. The biosynthesis of lasso peptides is specified by a cluster of three genes encoding a precursor protein and two enzymes. Here we develop a unique genome-mining algorithm to identify lasso peptide gene clusters in prokaryotes. Our approach involves pattern matching to a small number of conserved amino acids in precursor proteins, and thus allows for a more global survey of lasso peptide gene clusters than does homology-based genome mining. Of more than 3,000 currently sequenced prokaryotic genomes, we found 76 organisms that are putative lasso peptide producers. These organisms span nine bacterial phyla and an archaeal phylum. To provide validation of the genome-mining method, we focused on a single lasso peptide predicted to be produced by the freshwater bacterium Asticcacaulis excentricus. Heterologous expression of an engineered, minimal gene cluster in Escherichia coli led to the production of a unique lasso peptide, astexin-1. At 23 aa, astexin-1 is the largest lasso peptide isolated to date. It is also highly polar, in contrast to many lasso peptides that are primarily hydrophobic. Astexin-1 has modest antimicrobial activity against its phylogenetic relative Caulobacter crescentus. The solution structure of astexin-1 was determined revealing a unique topology that is stabilized by hydrogen bonding between segments of the peptide.

  1. A New Approach in Coal Mine Exploration Using Cosmic Ray Muons

    NASA Astrophysics Data System (ADS)

    Darijani, Reza; Negarestani, Ali; Rezaie, Mohammad Reza; Fatemi, Syed Jalil; Akhond, Ahmad

    2016-08-01

    Muon radiography is a technique that uses cosmic ray muons to image the interior of large scale geological structures. The muon absorption in matter is the most important parameter in cosmic ray muon radiography. Cosmic ray muon radiography is similar to X-ray radiography. The main aim of this survey is the simulation of muon radiography for the exploration of mines. The production source, tracking, and detection of cosmic ray muons were therefore simulated with the MCNPX code. For this purpose, the input data of the source card in the MCNPX code were extracted from the muon energy spectrum at sea level. In addition, the other input data used in this code, such as the average density and thickness of layers, are measured data from the Pabdana (Kerman, Iran) coal mines. The average thickness and density of these layers in the coal mines are from 2 to 4 m and 1.3 g/cm3, respectively. To increase the spatial resolution, a detector was placed inside the mountain. The results indicated that, using this approach, layers with a minimum thickness of about 2.5 m can be identified.

  2. A systematic approach to identify cellular auxetic materials

    NASA Astrophysics Data System (ADS)

    Körner, Carolin; Liebold-Ribeiro, Yvonne

    2015-02-01

    Auxetics are materials showing a negative Poisson’s ratio. This characteristic leads to unusual mechanical properties that make this an interesting class of materials. So far no systematic approach for generating auxetic cellular materials has been reported. In this contribution, we present a systematic approach to identifying auxetic cellular materials based on eigenmode analysis. The fundamental mechanism generating auxetic behavior is identified as rotation. With this knowledge, a variety of complex two-dimensional (2D) and three-dimensional (3D) auxetic structures based on simple unit cells can be identified.

  3. A Tools-Based Approach to Teaching Data Mining Methods

    ERIC Educational Resources Information Center

    Jafar, Musa J.

    2010-01-01

    Data mining is an emerging field of study in Information Systems programs. Although the course content has been streamlined, the underlying technology is still in a state of flux. The purpose of this paper is to describe how we utilized Microsoft Excel's data mining add-ins as a front-end to Microsoft's Cloud Computing and SQL Server 2008 Business…

  4. A data mining approach to finding relationships between reservoir properties and oil production for CHOPS

    NASA Astrophysics Data System (ADS)

    Cai, Yongxiang; Wang, Xin; Hu, Kezhen; Dong, Mingzhe

    2014-12-01

    Cold heavy oil production with sand (CHOPS) is a primary oil extraction process for heavy crude oil, and reservoir properties are key factors that contribute to the effectiveness of CHOPS. However, identification of the key reservoir properties and quantification of the relationships between the reservoir properties and the oil production are still challenging tasks. In this paper, we propose the use of a data mining approach for finding quantitative relationships between various reservoir properties and oil production for CHOPS. The approach includes four steps: firstly, a set of reservoir properties are identified to describe reservoir characteristics through a petrophysical analysis. In addition to common parameters, such as porosity and permeability, two new parameters - a fluid mobility factor and the maximum inscribed rectangular of net pay (MIRNP) - are proposed. Secondly, three new parameters to describe the production performance of wells are proposed: the peak value, effective life cycle and effective yield. Next, the fuzzy ranking method is used to rank the importance of the identified reservoir properties in terms of oil production. Finally, association rule mining is used to obtain quantitative relationships between reservoir property variables and the production performance of wells. The proposed methods have been applied to 118 wells in the Sparky Formation of the Lloydminster heavy oil field in Alberta. The results show that the production performance of wells in the area could be described and predicted by using the quantitative relations found.

  5. Risk evaluation of uranium mining: A geochemical inverse modelling approach

    NASA Astrophysics Data System (ADS)

    Rillard, J.; Zuddas, P.; Scislewski, A.

    2011-12-01

    It is well known that uranium extraction operations can increase risks linked to radiation exposure. The toxicity of uranium and associated heavy metals is the main environmental concern regarding exploitation and processing of U-ore. In areas where U mining is planned, a careful assessment of toxic and radioactive element concentrations is recommended before the start of mining activities. A background evaluation of harmful elements is important in order to prevent and/or quantify future water contamination resulting from possible migration of toxic metals coming from ore and waste water interaction. Controlled leaching experiments were carried out to investigate processes of ore and waste (leached ore) degradation, using samples from the uranium exploitation site located in Caetité-Bahia, Brazil. In experiments in which the reaction of waste with water was tested, we found that the water had low pH and high levels of sulphates and aluminium. On the other hand, in experiments in which ore was tested, the water had a chemical composition comparable to natural water found in the region of Caetité. On the basis of our experiments, we suggest that waste resulting from sulphuric acid treatment can induce acidification and salinization of surface and ground water. For this reason proper storage of waste is imperative. As a tool to evaluate the risks, a geochemical inverse modelling approach was developed to estimate the water-mineral interaction involving the presence of toxic elements. We used a method earlier described by Scislewski and Zuddas 2010 (Geochim. Cosmochim. Acta 74, 6996-7007) in which the reactive surface area of mineral dissolution can be estimated. We found that the reactive surface area of rock parent minerals is not constant over time but varies by several orders of magnitude in only two months of interaction. We propose that parent mineral heterogeneity and, particularly, neogenic phase formation may explain the observed variation of the

  6. National Conference on Mining-Influenced Waters: Approaches for Characterization, Source Control and Treatment

    EPA Science Inventory

    The conference goal was to provide a forum for the exchange of scientific information on current and emerging approaches to assessing characterization, monitoring, source control, treatment and/or remediation on mining-influenced waters. The conference was aimed at mining remedi...

  7. SMM-system: A mining tool to identify specific markers in Salmonella enterica.

    PubMed

    Yu, Shuijing; Liu, Weibing; Shi, Chunlei; Wang, Dapeng; Dan, Xianlong; Li, Xiao; Shi, Xianming

    2011-03-01

    This report presents SMM-system, a software package that implements various personalized pre- and post-BLASTN tasks for mining specific markers of microbial pathogens. The main functionalities of SMM-system are summarized as follows: (i) converting multi-FASTA file, (ii) cutting interesting genomic sequence, (iii) automatic high-throughput BLASTN searches, and (iv) screening target sequences. The utility of SMM-system was demonstrated by using it to identify 214 Salmonella enterica-specific protein-coding sequences (CDSs). Eighteen primer pairs were designed based on eighteen S. enterica-specific CDSs, respectively. Seven of these primer pairs were validated with PCR assay, which showed 100% inclusivity for the 101 S. enterica genomes and 100% exclusivity of 30 non-S. enterica genomes. Three specific primer pairs were chosen to develop a multiplex PCR assay, which generated specific amplicons with a size of 180bp (SC1286), 238bp (SC1598) and 405bp (SC4361), respectively. This study demonstrates that SMM-system is a high-throughput specific marker generation tool that can be used to identify genus-, species-, serogroup- and even serovar-specific DNA sequences of microbial pathogens, which has a potential to be applied in food industries, diagnostics and taxonomic studies. SMM-system is freely available and can be downloaded from http://foodsafety.sjtu.edu.cn/SMM-system.html.

  8. Identifying the Uncertainty in Physician Practice Location through Spatial Analytics and Text Mining

    PubMed Central

    Shi, Xuan; Xue, Bowei; Xierali, Imam M.

    2016-01-01

    In response to the widespread concern about the adequacy, distribution, and disparity of access to a health care workforce, the correct identification of physicians’ practice locations is critical to access public health services. In prior literature, little effort has been made to detect and resolve the uncertainty about whether the address provided by a physician in the survey is a practice address or a home address. This paper introduces how to identify the uncertainty in a physician’s practice location through spatial analytics, text mining, and visual examination. While land use and zoning code, embedded within the parcel datasets, help to differentiate resident areas from other types, spatial analytics may have certain limitations in matching and comparing physician and parcel datasets with different uncertainty issues, which may lead to unforeseen results. Handling and matching the string components between physicians’ addresses and the addresses of the parcels could identify the spatial uncertainty and instability to derive a more reasonable relationship between different datasets. Visual analytics and examination further help to clarify the undetectable patterns. This research will have a broader impact over federal and state initiatives and policies to address both insufficiency and maldistribution of a health care workforce to improve the accessibility to public health services. PMID:27657100

  9. TOXICITY APPROACHES TO ASSESSING MINING IMPACTS AND MINE WASTE TREATMENT EFFECTIVENESS

    EPA Science Inventory

    The USEPA Office of Research and Development's National Exposure Research Laboratory and National Risk Management Research Laboratory have been evaluating the impact of mining sites on receiving streams and the effectiveness of waste treatment technologies in removing toxicity fo...

  10. Analysis of biological processes and diseases using text mining approaches.

    PubMed

    Krallinger, Martin; Leitner, Florian; Valencia, Alfonso

    2010-01-01

    A number of biomedical text mining systems have been developed to extract biologically relevant information directly from the literature, complementing bioinformatics methods in the analysis of experimentally generated data. We provide a short overview of the general characteristics of natural language data, existing biomedical literature databases, and lexical resources relevant in the context of biomedical text mining. A selected number of practically useful systems are introduced together with the type of user queries supported and the results they generate. The extraction of biological relationships, such as protein-protein interactions as well as metabolic and signaling pathways using information extraction systems, will be discussed through example cases of cancer-relevant proteins. Basic strategies for detecting associations of genes to diseases together with literature mining of mutations, SNPs, and epigenetic information (methylation) are described. We provide an overview of disease-centric and gene-centric literature mining methods for linking genes to phenotypic and genotypic aspects. Moreover, we discuss recent efforts for finding biomarkers through text mining and for gene list analysis and prioritization. Some relevant issues for implementing a customized biomedical text mining system will be pointed out. To demonstrate the usefulness of literature mining for the molecular oncology domain, we implemented two cancer-related applications. The first tool consists of a literature mining system for retrieving human mutations together with supporting articles. Specific gene mutations are linked to a set of predefined cancer types. The second application consists of a text categorization system supporting breast cancer-specific literature search and document-based breast cancer gene ranking. Future trends in text mining emphasize the importance of community efforts such as the BioCreative challenge for the development and integration of multiple systems into

  11. Acid mine drainage risks - A modeling approach to siting mine facilities in Northern Minnesota USA

    NASA Astrophysics Data System (ADS)

    Myers, Tom

    2016-02-01

    Most watershed-scale planning for mine-caused contamination concerns remediation of past problems while future planning relies heavily on engineering controls. As an alternative, a watershed scale groundwater fate and transport model for the Rainy Headwaters, a northeastern Minnesota watershed, has been developed to examine the risks of leaks or spills to a pristine downstream watershed. The model shows that the risk depends on the location and whether the source of the leak is on the surface or from deeper underground facilities. Underground sources cause loads that last longer but arrive at rivers after a longer travel time and have lower concentrations due to dilution and attenuation. Surface contaminant sources could cause much more short-term damage to the resource. Because groundwater dominates baseflow, mine contaminant seepage would cause the most damage during low flow periods. Groundwater flow and transport modeling is a useful tool for decreasing the risk to downgradient sources by aiding in the placement of mine facilities. Although mines are located based on the minerals, advance planning and analysis could avoid siting mine facilities where failure or leaks would cause too much natural resource damage. Watershed scale transport modeling could help locate the facilities or decide in advance that the mine should not be constructed due to the risk to downstream resources.

  12. Identifying woody vegetation on coal surface mines using phenological indicators with multitemporal Landsat imagery

    NASA Astrophysics Data System (ADS)

    Oliphant, A. J.; Li, J.; Wynne, R. H.; Donovan, P. F.; Zipper, C. E.

    2014-11-01

    Surface mining for coal has disturbed large land areas in the Appalachian Mountains. Better information on mined lands' ecosystem recovery status is necessary for effective environmental management in mining-impacted regions. Because record quality varies between state mining agencies and much mining occurred prior to widespread use of geospatial technologies, accurate maps of mining extents, durations, and land cover effects are often not available. Landsat data are well suited to mapping and characterizing land cover and forest recovery on former coal surface mines. Past mine reclamation techniques have often failed to restore premining forest vegetation, but natural processes may enable native forests to re-establish on mined areas with time. However, the invasive species autumn olive (Elaeagnus umbellata) is proliferating widely on former coal surface mines, often inhibiting reestablishment of native forests. Autumn olive outcompetes native vegetation because it fixes atmospheric nitrogen and benefits from a longer growing season than native deciduous trees. This longer growing season, along with Landsat 8's high signal-to-noise ratio, has enabled species-level classification of autumn olive using multitemporal Landsat 8 data at accuracy levels usually only obtainable using higher spatial or spectral resolution sensors. We have used classification and regression tree (CART®) and support vector machine (SVM) models to classify five counties in the coal mining region of Virginia for presence and absence of autumn olive. The best model found was a CART® model with 36 nodes, which had an overall accuracy of 84% and a kappa of 0.68. Autumn olive had a conditional kappa of 0.65 and producer's and user's accuracies of 86% and 83%, respectively. The best SVM model used a second-order polynomial kernel and had an overall accuracy of 77%, an overall kappa of 0.54, and producer's and user's accuracies of 60% and 90%, respectively.
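
    The accuracy measures quoted in this abstract (overall accuracy, kappa, producer's and user's accuracy) are all derived from a confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' code; the function name `accuracy_report` and the example counts are hypothetical and are not the study's data.

```python
import numpy as np

def accuracy_report(cm):
    """Summarise a confusion matrix.

    cm[i, j] = number of samples whose reference class is i and mapped class is j.
    Returns (overall accuracy, Cohen's kappa, producer's accuracies, user's accuracies).
    """
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    observed = np.trace(cm) / n
    # Agreement expected by chance, from the row and column marginals.
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2
    kappa = (observed - expected) / (1.0 - expected)
    producers = np.diag(cm) / cm.sum(axis=1)  # per reference class (1 - omission error)
    users = np.diag(cm) / cm.sum(axis=0)      # per mapped class (1 - commission error)
    return observed, kappa, producers, users

# Hypothetical two-class counts (presence / absence), for illustration only.
example_cm = [[40, 10],
              [5, 45]]
overall, kappa, producers, users = accuracy_report(example_cm)
print(overall, kappa, producers, users)
```

    Kappa discounts the agreement expected by chance, which is why it is lower than the overall accuracy for the same matrix.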

  13. A predictive approach to identify genes differentially expressed

    NASA Astrophysics Data System (ADS)

    Saraiva, Erlandson F.; Louzada, Francisco; Milan, Luís A.; Meira, Silvana; Cobre, Juliana

    2012-10-01

    The main objective of gene expression data analysis is to identify genes that present significant changes in expression levels between a treatment and a control biological condition. In this paper, we propose a Bayesian approach to identify differentially expressed genes by calculating credibility intervals from predictive densities, which are constructed using the sampled mean treatment effect from all genes in the study, excluding the treatment effect of genes previously identified with statistical evidence for difference. We compare our Bayesian approach with the standard ones based on the t-test and modified t-tests via a simulation study, using the small sample sizes that are common in gene expression data analysis. The results indicate that the proposed approach performs better than the standard ones, especially for cases with mean differences and increases in treatment variance relative to control variance. We also apply the methodologies to a publicly available data set on Escherichia coli bacteria.

  14. Identifying Learning Behaviors by Contextualizing Differential Sequence Mining with Action Features and Performance Evolution

    ERIC Educational Resources Information Center

    Kinnebrew, John S.; Biswas, Gautam

    2012-01-01

    Our learning-by-teaching environment, Betty's Brain, captures a wealth of data on students' learning interactions as they teach a virtual agent. This paper extends an exploratory data mining methodology for assessing and comparing students' learning behaviors from these interaction traces. The core algorithm employs sequence mining techniques to…

  15. IDENTIFYING RECENT SURFACE MINING ACTIVITIES USING A NORMALIZED DIFFERENCE VEGETATION INDEX (NDVI) CHANGE DETECTION METHOD

    EPA Science Inventory

    Coal mining is a major resource extraction activity on the Appalachian Mountains. The increased size and frequency of a specific type of surface mining, known as mountain top removal-valley fill, has in recent years raised various environmental concerns. During mountainto...
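
    This record is truncated, so the details of the method are not shown. As a generic illustration of NDVI change detection only, the sketch below computes NDVI = (NIR - Red) / (NIR + Red) for two dates and flags pixels whose NDVI dropped sharply. The function names and the threshold of 0.3 are assumptions made for the example, not values from the report.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized difference vegetation index; 0 where NIR + Red is 0."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.zeros_like(denom)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

def ndvi_loss_mask(nir_before, red_before, nir_after, red_after, threshold=0.3):
    """Flag pixels whose NDVI fell by more than `threshold` (hypothetical value)."""
    change = ndvi(nir_after, red_after) - ndvi(nir_before, red_before)
    return change < -threshold

# Two hypothetical pixels: the first stays vegetated, the second is cleared.
mask = ndvi_loss_mask([0.5, 0.5], [0.1, 0.1], [0.5, 0.2], [0.1, 0.2])
print(mask)
```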

  16. Identifying Trustworthiness Deficit in Legacy Systems Using the NFR Approach

    DTIC Science & Technology

    2014-01-01

    this “shortfall” in re-engineered systems. In this project we applied the NFR Approach, as a case study to the middleware system called Phoenix used...The legacy system we used as a case study is the Phoenix middleware system used by the Air Force - we identified the trustworthiness deficit in...engineered systems. In this project we applied the NFR Approach, as a case study to the middleware system called Phoenix used by the Air Force and determined

  17. A novel approach to generating CER hypotheses based on mining clinical data.

    PubMed

    Zhang, Shuo; Li, Lin; Yu, Yiqin; Sun, Xingzhi; Xu, Linhao; Zhao, Wei; Teng, Xiaofei; Pan, Yue

    2013-01-01

    Comparative effectiveness research (CER) is a scientific method of investigating the effectiveness of alternative intervention methods. In a CER study, clinical researchers typically start with a CER hypothesis, and aim to evaluate it by applying a series of medical statistical methods. Traditionally, CER hypotheses are defined manually by clinical researchers. This makes hypothesis generation very time-consuming and the quality of a hypothesis heavily dependent on the researchers' skills. Recently, with more electronic medical data being collected, it has become highly promising to apply computerized methods for discovering CER hypotheses from clinical data sets. In this poster, we propose a novel approach to automatically generating CER hypotheses based on mining clinical data, and present a case study showing that the approach can help clinical researchers identify potentially valuable hypotheses and eventually define high-quality CER studies.

  18. Using a data mining approach to discover behavior correlates of chronic disease: a case study of depression.

    PubMed

    Yoon, Sunmoo; Taha, Basirah; Bakken, Suzanne

    2014-01-01

    The purposes of this methodological paper are: 1) to describe data mining methods for building a classification model for a chronic disease using a U.S. behavior risk factor data set, and 2) to illustrate application of the methods using a case study of depressive disorder. Methods described include: 1) six steps of data mining to build a disease model using classification techniques, 2) an innovative approach to analyzing high-dimensionality data, and 3) a visualization strategy to communicate with clinicians who are unfamiliar with advanced statistics. Our application of data mining strategies identified childhood experience living with mentally ill and sexual abuse, and limited usual activity as the strongest correlates of depression among hundreds of variables. The methods that we applied may be useful to others wishing to build a classification model from complex, large-volume datasets for other health conditions.

  19. Crime Pattern Analysis: A Spatial Frequent Pattern Mining Approach

    DTIC Science & Technology

    2012-05-10

    analysts. Many police departments aim to accomplish crime mitigation and crime prevention with very few resources. However, the growth in the size and...mining (SFPM) and defines the crime outbreak detection problem. Spatial frequent pattern mining (SFPM) is the process of discovering interesting...an analysis problem that may require a solution using SFPM. 2.1 Crime outbreak detection and Illustration In this section, we define crime outbreak

  20. Trend Motif: A Graph Mining Approach for Analysis of Dynamic Complex Networks

    SciTech Connect

    Jin, R; McCallen, S; Almaas, E

    2007-05-28

    Complex networks have been used successfully in scientific disciplines ranging from sociology to microbiology to describe systems of interacting units. Until recently, studies of complex networks have mainly focused on their network topology. However, in many real world applications, the edges and vertices have associated attributes that are frequently represented as vertex or edge weights. Furthermore, these weights are often not static, instead changing with time and forming a time series. Hence, to fully understand the dynamics of the complex network, we have to consider both network topology and related time series data. In this work, we propose a motif mining approach to identify trend motifs for such purposes. Simply stated, a trend motif describes a recurring subgraph where each of its vertices or edges displays similar dynamics over a user-defined period. Given this, each trend motif occurrence can help reveal significant events in a complex system; frequent trend motifs may aid in uncovering dynamic rules of change for the system, and the distribution of trend motifs may characterize the global dynamics of the system. Here, we have developed efficient mining algorithms to extract trend motifs. Our experimental validation using three disparate empirical datasets, ranging from the stock market, world trade, to a protein interaction network, has demonstrated the efficiency and effectiveness of our approach.

  1. Identifying the Educationally Influential Physician: A Systematic Review of Approaches

    ERIC Educational Resources Information Center

    Kronberger, Matthew P.; Bakken, Lori L.

    2011-01-01

    Introduction: Previous studies have indicated that educationally influential physicians' (EIPs) interactions with peers can lead to practice changes and improved patient outcomes. However, multiple approaches have been used to identify and investigate EIPs' informal or formal influence on practice, which creates study outcomes that are difficult…

  2. Determining the familial risk distribution of colorectal cancer: a data mining approach.

    PubMed

    Chau, Rowena; Jenkins, Mark A; Buchanan, Daniel D; Ait Ouakrim, Driss; Giles, Graham G; Casey, Graham; Gallinger, Steven; Haile, Robert W; Le Marchand, Loic; Newcomb, Polly A; Lindor, Noralane M; Hopper, John L; Win, Aung Ko

    2016-04-01

    This study aimed to characterize the distribution of colorectal cancer risk using family history of cancers by data mining. Family histories for 10,066 colorectal cancer cases recruited to population cancer registries of the Colon Cancer Family Registry were analyzed using a data mining framework. A novel index was developed to quantify familial cancer aggregation. An artificial neural network was used to identify distinct categories of familial risk. Standardized incidence ratios (SIRs) and corresponding 95% confidence intervals (CIs) of colorectal cancer were calculated for each category. We identified five major and 66 minor categories of familial risk for developing colorectal cancer. The distribution of the major risk categories was: (1) 7% of families (SIR = 7.11; 95% CI 6.65-7.59) had a strong family history of colorectal cancer; (2) 13% of families (SIR = 2.94; 95% CI 2.78-3.10) had a moderate family history of colorectal cancer; (3) 11% of families (SIR = 1.23; 95% CI 1.12-1.36) had a strong family history of breast cancer and a weak family history of colorectal cancer; (4) 9% of families (SIR = 1.06; 95% CI 0.96-1.18) had a strong family history of prostate cancer and a weak family history of colorectal cancer; and (5) 60% of families (SIR = 0.61; 95% CI 0.57-0.65) had a weak family history of all cancers. There is a wide variation of colorectal cancer risk that can be categorized by family history of cancer, with a strong gradient of colorectal cancer risk between the highest and lowest risk categories. The risk of colorectal cancer for people in the highest risk category of family history (7% of the population) was 12 times that for people in the lowest risk category (60% of the population). Data mining proved an effective approach for gaining insight into the underlying cancer aggregation patterns and for categorizing familial risk of colorectal cancer.
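
    The "12 times" comparison in this abstract follows from the ratio of the two reported SIRs, assuming that is how the figure was obtained. A short arithmetic check using only the numbers quoted above:

```python
# SIRs and family shares quoted in the abstract for the five major categories.
sirs = [7.11, 2.94, 1.23, 1.06, 0.61]
shares_percent = [7, 13, 11, 9, 60]

# Ratio of highest-risk to lowest-risk category.
ratio = sirs[0] / sirs[-1]
print(round(ratio, 1))        # about 11.7, which rounds to the quoted 12

# The five major categories account for all families.
print(sum(shares_percent))    # 100
```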

  3. Genetic and proteomic approaches to identify cancer drug targets

    PubMed Central

    Roti, G; Stegmaier, K

    2012-01-01

    While target-based small-molecule discovery has taken centre-stage in the pharmaceutical industry, there are many cancer-promoting proteins not easily addressed with a traditional target-based screening approach. In order to address this problem, as well as to identify modulators of biological states in the absence of knowing the protein target of the state switch, alternative phenotypic screening approaches, such as gene expression-based and high-content imaging, have been developed. With this renewed interest in phenotypic screening, however, comes the challenge of identifying the binding protein target(s) of small-molecule hits. Emerging technologies have the potential to improve the process of target identification. In this review, we discuss the application of genomic (gene expression-based), genetic (short hairpin RNA and open reading frame screening), and proteomic approaches to protein target identification. PMID:22166799

  4. Genomic approaches to identifying targets for treating β hemoglobinopathies.

    PubMed

    Ngo, Duyen A; Steinberg, Martin H

    2015-07-29

    Sickle cell disease and β thalassemia are common severe diseases with little effective pathophysiologically-based treatment. Their phenotypic heterogeneity prompted genomic approaches to identify modifiers that ultimately might be exploited therapeutically. Fetal hemoglobin (HbF) is the major modulator of the phenotype of the β hemoglobinopathies. HbF inhibits deoxyHbS polymerization and in β thalassemia compensates for the reduction of HbA. The major success of genomics has been a better understanding of the genetic regulation of HbF by identifying the major quantitative trait loci for this trait. If the targets identified can lead to means of increasing HbF to therapeutic levels in sufficient numbers of sickle or β-thalassemia erythrocytes, the pathophysiology of these diseases would be reversed. The availability of new target loci, high-throughput drug screening, and recent advances in genome editing provide the opportunity for new approaches to therapeutically increasing HbF production.

  5. A Computer Vision Approach to Identify Einstein Rings and Arcs

    NASA Astrophysics Data System (ADS)

    Lee, Chien-Hsiu

    2017-03-01

    Einstein rings are rare gems of strong lensing phenomena; the ring images can be used to probe the underlying lens gravitational potential at every position angle, tightly constraining the lens mass profile. In addition, the magnified images also enable us to probe high-z galaxies with enhanced resolution and signal-to-noise ratios. However, only a handful of Einstein rings have been reported, either from serendipitous discoveries or from visual inspections of hundreds of thousands of massive galaxies or galaxy clusters. In the era of large sky surveys, an automated approach to identifying ring patterns in the big data to come is in high demand. Here, we present an Einstein ring recognition approach based on computer vision techniques. The workhorse is the circle Hough transform, which recognises circular patterns or arcs in the images. We propose a two-tier approach: first pre-selecting massive galaxies associated with multiple blue objects as possible lenses, then using the Hough transform to identify circular patterns. As a proof of concept, we apply our approach to SDSS, with high completeness, albeit with low purity. We also apply our approach to other lenses in the DES, HSC-SSP, and UltraVISTA surveys, illustrating the versatility of our approach.
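
    The circle Hough transform named in this abstract can be sketched in a few lines. The version below is a generic NumPy illustration, not the author's pipeline: every edge pixel votes for all centres lying at a trial radius from it, and the accumulator peak gives the best circle. The function names, the trial radii and the number of sampled angles are assumptions made for the example.

```python
import numpy as np

def circle_hough(edge_points, shape, radii, n_angles=90):
    """Accumulate centre votes; returns an array of shape (len(radii), H, W).

    edge_points: sequence of (row, col) pixel coordinates of detected edges.
    """
    h, w = shape
    acc = np.zeros((len(radii), h, w), dtype=np.int32)
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    ys, xs = np.asarray(edge_points).T
    for k, r in enumerate(radii):
        # Each edge pixel votes for every candidate centre at distance r from it.
        cx = np.rint(xs[:, None] - r * np.cos(thetas)[None, :]).astype(int)
        cy = np.rint(ys[:, None] - r * np.sin(thetas)[None, :]).astype(int)
        ok = (cx >= 0) & (cx < w) & (cy >= 0) & (cy < h)
        np.add.at(acc[k], (cy[ok], cx[ok]), 1)
    return acc

def best_circle(acc, radii):
    """Return (centre_x, centre_y, radius) of the accumulator maximum."""
    k, cy, cx = np.unravel_index(np.argmax(acc), acc.shape)
    return int(cx), int(cy), radii[k]

# Synthetic ring: centre (x=20, y=25), radius 8, in a 50 x 50 image.
angles = np.linspace(0.0, 2.0 * np.pi, 200)
ring = sorted({(int(round(25 + 8 * np.sin(a))), int(round(20 + 8 * np.cos(a))))
               for a in angles})
radii = [6, 8, 10]
print(best_circle(circle_hough(ring, (50, 50), radii), radii))
```

    Because partial arcs still vote for the same centre, the same accumulator can pick up arcs as well as full rings, at the cost of more false peaks; this is consistent with the abstract's remark about high completeness but low purity.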

  6. Hazards identified and the need for health risk assessment in the South African mining industry.

    PubMed

    Utembe, W; Faustman, E M; Matatiele, P; Gulumian, M

    2015-12-01

    Although mining plays a prominent role in the economy of South Africa, it is associated with many chemical hazards. Exposure to dust from mining can lead to many pathological effects depending on mineralogical composition, size, shape and levels and duration of exposure. Mining and processing of minerals also result in occupational exposure to toxic substances such as platinum, chromium, vanadium, manganese, mercury, cyanide and diesel particulate. South Africa has set occupational exposure limits (OELs) for some hazards, but mine workers are still at risk. Since the hazard posed by a mineral depends on its physicochemical properties, it is recommended that South Africa should not simply adopt OELs from other countries but rather set her own standards based on local toxicity studies. The limits should take into account the issue of mixtures to which workers could be exposed as well as the health status of the workers. The mining industry is also a source of contamination of the environment, due inter alia to the large areas of tailings dams and dumps left behind. Therefore, there is a need to develop guidelines for safe land uses of contaminated lands after mine closure.

  7. Practical Approaches for Mining Frequent Patterns in Molecular Datasets.

    PubMed

    Naulaerts, Stefan; Moens, Sandy; Engelen, Kristof; Berghe, Wim Vanden; Goethals, Bart; Laukens, Kris; Meysman, Pieter

    2016-01-01

    Pattern detection is an inherent task in the analysis and interpretation of complex and continuously accumulating biological data. Numerous itemset mining algorithms have been developed in the last decade to efficiently detect specific pattern classes in data. Although many of these have proven their value for addressing bioinformatics problems, several factors still slow down promising algorithms from gaining popularity in the life science community. Many of these issues stem from the low user-friendliness of these tools and the complexity of their output, which is often large, static, and consequently hard to interpret. Here, we apply three software implementations on common bioinformatics problems and illustrate some of the advantages and disadvantages of each, as well as inherent pitfalls of biological data mining. Frequent itemset mining exists in many different flavors, and users should decide their software choice based on their research question, programming proficiency, and added value of extra features.
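
    As a concrete picture of the frequent itemset mining discussed in this abstract, the sketch below runs a small level-wise (Apriori-style) search in plain Python. It is not one of the three software implementations the authors compare; the function name, the toy transactions and the support threshold are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {frozenset: support count}."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    current = [frozenset([i]) for i in items]
    result = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset (the Apriori property).
        k += 1
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        current = [c for c in candidates
                   if all(frozenset(s) in frequent for s in combinations(c, k - 1))]
    return result

# Hypothetical transactions, e.g. sets of annotations observed per sample.
toy = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
found = frequent_itemsets(toy, min_support=3)
for itemset, count in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

    Even this toy run returns six itemsets from five transactions, which hints at why the abstract describes real outputs as large and hard to interpret.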

  8. Practical Approaches for Mining Frequent Patterns in Molecular Datasets

    PubMed Central

    Naulaerts, Stefan; Moens, Sandy; Engelen, Kristof; Berghe, Wim Vanden; Goethals, Bart; Laukens, Kris; Meysman, Pieter

    2016-01-01

    Pattern detection is an inherent task in the analysis and interpretation of complex and continuously accumulating biological data. Numerous itemset mining algorithms have been developed in the last decade to efficiently detect specific pattern classes in data. Although many of these have proven their value for addressing bioinformatics problems, several factors still slow down promising algorithms from gaining popularity in the life science community. Many of these issues stem from the low user-friendliness of these tools and the complexity of their output, which is often large, static, and consequently hard to interpret. Here, we apply three software implementations on common bioinformatics problems and illustrate some of the advantages and disadvantages of each, as well as inherent pitfalls of biological data mining. Frequent itemset mining exists in many different flavors, and users should decide their software choice based on their research question, programming proficiency, and added value of extra features. PMID:27168722

  9. GTA: a game theoretic approach to identifying cancer subnetwork markers.

    PubMed

    Farahmand, S; Goliaei, S; Ansari-Pour, N; Razaghi-Moghadam, Z

    2016-03-01

    The identification of genetic markers (e.g. genes, pathways and subnetworks) for cancer has been one of the most challenging research areas in recent years. A subset of these studies attempt to analyze genome-wide expression profiles to identify markers with high reliability and reusability across independent whole-transcriptome microarray datasets. Therefore, the functional relationships of genes are integrated with their expression data. However, for a more accurate representation of the functional relationships among genes, utilization of the protein-protein interaction network (PPIN) seems to be necessary. Herein, a novel game theoretic approach (GTA) is proposed for the identification of cancer subnetwork markers by integrating genome-wide expression profiles and PPIN. The GTA method was applied to three distinct whole-transcriptome breast cancer datasets to identify the subnetwork markers associated with metastasis. To evaluate the performance of our approach, the identified subnetwork markers were compared with gene-based, pathway-based and network-based markers. We show that GTA is not only capable of identifying robust metastatic markers but also provides a higher classification performance. In addition, based on these GTA-based subnetworks, we identified a new bona fide candidate gene for breast cancer susceptibility.

  10. The Usage of Association Rule Mining to Identify Influencing Factors on Deafness After Birth

    PubMed Central

    Shahraki, Azimeh Danesh; Safdari, Reza; Gahfarokhi, Hamid Habibi; Tahmasebian, Shahram

    2015-01-01

    Background: Providing complete, high-quality health care services plays a very important role in enabling people to understand the factors related to personal and social health and to choose suitable healthy behaviors in order to achieve a healthy life. For this reason, demographic and clinical data are collected from individuals, and this huge volume of data is a valuable resource for analyzing, exploring and discovering valuable information and communication. This study used association rule techniques in data mining to identify the factors affecting hearing loss after birth in Iran. Materials and Methods: The survey is a data-oriented study. The study population consisted of questionnaires from several provinces of the country. First, all questionnaire data were stored as information tables in SQL Server, with data entry performed using software written in C# .NET; then the association algorithm in SQL Server Data Tools and Clementine software was applied to determine the rules and hidden patterns in the gathered data. Findings: Two factors, the number of deaf brothers and the degree of consanguinity of the parents, have a significant impact on the severity of an individual's deafness. Also, when the severity of hearing loss is greater than or equal to moderately severe, people use hearing aids, and men are less interested in using hearing aids. Conclusion: In families where the parents' marriage is consanguineous, between first-degree (girl/boy cousins) or second-degree relatives (girl/boy cousins), and especially first-degree relatives, the number of people with severe hearing loss or deafness is higher; and in the use of hearing aids, the gender of the patient is more important than the severity of the hearing loss. PMID:26862245
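
    Association rules of the kind this abstract reports are usually scored by support, confidence and lift. The study itself used SQL Server Data Tools and Clementine; the plain-Python sketch below only illustrates those three standard measures. The function name, the attribute labels and the records are all hypothetical and are not the study's data.

```python
def rule_metrics(records, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(records)
    n_a = sum(1 for r in records if antecedent <= set(r))
    n_c = sum(1 for r in records if consequent <= set(r))
    n_ac = sum(1 for r in records if (antecedent | consequent) <= set(r))
    support = n_ac / n                                # share of records matching the whole rule
    confidence = n_ac / n_a if n_a else 0.0           # P(consequent | antecedent)
    lift = confidence / (n_c / n) if n_c else 0.0     # > 1 means positive association
    return support, confidence, lift

# Eight hypothetical questionnaire records (illustrative labels only).
A = "parents=first_degree_relatives"
C = "hearing_loss=severe"
records = [{A, C}] * 4 + [{A}] + [{C}] + [set()] * 2
print(rule_metrics(records, [A], [C]))
```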

  11. Large screen approaches to identify novel malaria vaccine candidates

    PubMed Central

    Davies, D. Huw; Duffy, Patrick; Bodmer, Jean-Luc; Felgner, Philip L.; Doolan, Denise L.

    2016-01-01

    Until recently, malaria vaccine development efforts have focused almost exclusively on a handful of well characterized Plasmodium falciparum antigens. Despite dedicated work by many researchers on different continents spanning more than half a century, a successful malaria vaccine remains elusive. Sequencing of the P. falciparum genome has revealed more than five thousand genes, providing the foundation for systematic approaches to discover candidate vaccine antigens. We are taking advantage of this wealth of information to discover new antigens that may be more effective vaccine targets. Herein, we describe different approaches to large-scale screening of the P. falciparum genome to identify targets of either antibody responses or T cell responses using human specimens collected in Controlled Human Malaria Infections (CHMI) or under conditions of natural exposure in the field. These genome, proteome and transcriptome based approaches offer enormous potential for the development of an efficacious malaria vaccine. PMID:26428458

  12. A data mining approach to evolutionary optimisation of noisy multi-objective problems

    NASA Astrophysics Data System (ADS)

    Chia, J. Y.; Goh, C. K.; Shim, V. A.; Tan, K. C.

    2012-07-01

    Many real world optimisation problems have opposing objective functions which are subjected to the influence of noise. Noise in the objective functions can adversely affect the stability, performance and convergence of evolutionary optimisers. This article proposes a Bayesian frequent data mining (DM) approach to identify optimal regions to guide the population amidst the presence of noise. The aggregated information provided by all the solutions helps to average out the effects of noise. This article proposes a DM crossover operator to make use of the rules mined. After implementation of this operator, a better convergence to the true Pareto front is achieved at the expense of the diversity of the solutions. Consequently, an ExtremalExploration operator will be proposed in the later part of this article to help curb the loss in diversity caused by the DM operator. The result is a more directive search with a faster convergence rate. The search is effective in decision spaces where the Pareto set is in a tight cluster. The performance of the proposed algorithm in noisy and noiseless environments will also be investigated with respect to non-convexity, discontinuity, multi-modality and uniformity. The proposed algorithm is evaluated on ZDT and other benchmark problems. The results of the simulations indicate that the proposed method is effective in handling noise and is competitive against the other noise-tolerant algorithms.

  13. Systems biology approaches to identify developmental bases for lung diseases.

    PubMed

    Bhattacharya, Soumyaroop; Mariani, Thomas J

    2013-04-01

    A greater understanding of the regulatory processes contributing to lung development could be helpful to identify strategies to ameliorate morbidity and mortality in premature infants and to identify individuals at risk for congenital and/or chronic lung diseases. Over the past decade, genomics technologies have enabled the production of rich gene expression databases providing information for all genes across developmental time or in diseased tissue. These data sets facilitate systems biology approaches for identifying underlying biological modules and programs contributing to the complex processes of normal development and those that may be associated with disease states. The next decade will undoubtedly see rapid and significant advances in redefining both lung development and disease at the systems level.

  14. Reverse Pathway Genetic Approach Identifies Epistasis in Autism Spectrum Disorders

    PubMed Central

    Traglia, Michela; Tsang, Kathryn; Bearden, Carrie E.; Rauen, Katherine A.

    2017-01-01

    Although gene-gene interaction, or epistasis, plays a large role in complex traits in model organisms, genome-wide by genome-wide searches for two-way interaction have limited power in human studies. We thus used knowledge of a biological pathway in order to identify a contribution of epistasis to autism spectrum disorders (ASDs) in humans, a reverse-pathway genetic approach. Based on previous observation of increased ASD symptoms in Mendelian disorders of the Ras/MAPK pathway (RASopathies), we showed that common SNPs in RASopathy genes show enrichment for association signal in GWAS (P = 0.02). We then screened genome-wide for interactors with RASopathy gene SNPs and showed strong enrichment in ASD-affected individuals (P < 2.2 × 10^-16), with a number of pairwise interactions meeting genome-wide criteria for significance. Finally, we utilized quantitative measures of ASD symptoms in RASopathy-affected individuals to perform modifier mapping via GWAS. One top region overlapped between these independent approaches, and we showed dysregulation of a gene in this region, GPR141, in a RASopathy neural cell line. We thus used orthogonal approaches to provide strong evidence for a contribution of epistasis to ASDs, confirm a role for the Ras/MAPK pathway in idiopathic ASDs, and to identify a convergent candidate gene that may interact with the Ras/MAPK pathway. PMID:28076348

  15. An enhanced stream mining approach for network anomaly detection

    NASA Astrophysics Data System (ADS)

    Bellaachia, Abdelghani; Bhatt, Rajat

    2005-03-01

    Network anomaly detection is one of the hot topics in the market today. Currently, researchers are trying to find a way in which machines could automatically learn both normal and anomalous behavior and thus detect anomalies if and when they occur. The most important applications that could spring from these systems are intrusion detection and spam mail detection. In this paper, the primary focus is on the problem and solution of "real time" network intrusion detection, although the underlying theory discussed may also be used for other applications of anomaly detection (such as spam detection or spyware detection). Since a machine needs a learning process of its own, data mining has been chosen as the preferred technique. The object of this paper is to present a real-time clustering system, which we call Enhanced Stream Mining (ESM), that can analyze packet information (headers and data) to detect intrusions.
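
    The abstract does not describe how ESM clusters packets, so the following is only a generic sketch of real-time (single-pass) clustering for anomaly detection, not the authors' algorithm. The class name, the `radius` and `min_count` parameters, and the two-dimensional feature vectors are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


class StreamClusterer:
    """Hypothetical single-pass clusterer: each point is seen once, in arrival order."""

    def __init__(self, radius: float) -> None:
        self.radius = radius                      # assumed distance threshold
        self.centroids: List[List[float]] = []
        self.counts: List[int] = []

    def observe(self, point: Sequence[float]) -> int:
        """Assign the point to the nearest centroid within `radius`,
        or open a new cluster. Returns the cluster index."""
        best, best_d = None, self.radius
        for i, centroid in enumerate(self.centroids):
            d = math.dist(point, centroid)
            if d <= best_d:
                best, best_d = i, d
        if best is None:
            self.centroids.append(list(point))
            self.counts.append(1)
            return len(self.centroids) - 1
        # Incremental mean update keeps memory constant per cluster.
        self.counts[best] += 1
        n = self.counts[best]
        self.centroids[best] = [
            c + (x - c) / n for c, x in zip(self.centroids[best], point)
        ]
        return best

    def is_anomalous(self, cluster: int, min_count: int) -> bool:
        """Treat sparsely populated clusters as suspicious (an assumption)."""
        return self.counts[cluster] < min_count


# Made-up feature vectors standing in for per-packet measurements.
clusterer = StreamClusterer(radius=1.0)
labels = [clusterer.observe(p) for p in [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (9.0, 9.0)]]
```

    In this sketch the first three points fall into one cluster and the distant fourth point opens a new, sparsely populated cluster, which `is_anomalous` would flag for any `min_count` above 1.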

  16. A multidimensional proteomic approach to identify hypertrophy-associated proteins.

    PubMed

    Lindsey, Merry L; Goshorn, Danielle K; Comte-Walters, Susana; Hendrick, Jennifer W; Hapke, Elizabeth; Zile, Michael R; Schey, Kevin

    2006-04-01

    Left ventricular hypertrophy (LVH) is a leading cause of congestive heart failure. The exact mechanisms that control cardiac growth and regulate the transition to failure are not fully understood, in part due to the lack of a complete inventory of proteins associated with LVH. We investigated the proteomic basis of LVH using the transverse aortic constriction model of pressure overload in mice coupled with a multidimensional approach to identify known and novel proteins that may be relevant to the development and maintenance of LVH. We identified 123 proteins that were differentially expressed during LVH, including LIM proteins, thioredoxin, myoglobin, fatty acid binding protein 3, the abnormal spindle-like microcephaly protein (ASPM), and cytoskeletal proteins such as actin and myosin. In addition, proteins with unknown functions were identified, providing new directions for future research in this area. We also discuss common pitfalls and strategies to overcome the limitations of current proteomic technologies. Together, the multidimensional approach provides insight into the proteomic changes that occur in the LV during hypertrophy.

  17. An Approach to Realizing Process Control for Underground Mining Operations of Mobile Machines.

    PubMed

    Song, Zhen; Schunnesson, Håkan; Rinne, Mikael; Sturgul, John

    2015-01-01

    The excavation and production in underground mines are complicated processes which consist of many different operations. The process of underground mining is considerably constrained by the geometry and geology of the mine. The various mining operations are normally performed in series at each working face. The delay of a single operation will lead to a domino effect, thus delaying the starting time for the next process and the completion time of the entire process. This paper presents a new approach to the process control for underground mining operations, e.g. drilling, bolting, mucking. This approach can estimate the working time and its probability for each operation more efficiently and objectively by improving the existing PERT (Program Evaluation and Review Technique) and CPM (Critical Path Method). If the delay of the critical operation (which is on a critical path) inevitably affects the productivity of mined ore, the approach can rapidly assign mucking machines new jobs to increase this amount at a maximum level by using a new mucking algorithm under external constraints.
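
    For orientation, the classical PERT and CPM calculations that the paper builds on can be sketched as follows. This shows only the textbook three-point estimate and longest-path computation, not the authors' improved method; the durations, the precedence order, and the extra "survey" task are hypothetical values chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple


def pert_estimate(optimistic: float, most_likely: float,
                  pessimistic: float) -> Tuple[float, float]:
    """Classical PERT three-point estimate: (expected time, variance)."""
    expected = (optimistic + 4 * most_likely + pessimistic) / 6
    variance = ((pessimistic - optimistic) / 6) ** 2
    return expected, variance


def critical_path(durations: Dict[str, float],
                  preds: Dict[str, List[str]]) -> Tuple[float, List[str]]:
    """CPM forward pass: longest path through an acyclic precedence graph."""
    finish: Dict[str, float] = {}
    via: Dict[str, Optional[str]] = {}

    def earliest_finish(op: str) -> float:
        if op not in finish:
            best_pred, best = None, 0.0
            for p in preds.get(op, []):
                f = earliest_finish(p)
                if f > best:
                    best_pred, best = p, f
            finish[op] = best + durations[op]
            via[op] = best_pred
        return finish[op]

    last = max(durations, key=earliest_finish)
    path: List[str] = []
    node: Optional[str] = last
    while node is not None:
        path.append(node)
        node = via[node]
    return finish[last], path[::-1]


# Hypothetical (optimistic, most likely, pessimistic) hours per operation.
three_point = {
    "drilling": (2.0, 3.0, 4.0),
    "bolting": (1.0, 2.0, 3.0),
    "mucking": (1.5, 2.0, 5.5),
}
durations = {op: pert_estimate(*t)[0] for op, t in three_point.items()}
durations["survey"] = 0.5  # hypothetical side task, off the critical path
preds = {"bolting": ["drilling"], "survey": ["drilling"], "mucking": ["bolting"]}
total, path = critical_path(durations, preds)
```

    With these made-up numbers the serial chain drilling, bolting, mucking is the critical path, so a delay in any of those three operations delays the whole cycle, whereas the side task has slack.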

  18. An Approach to Realizing Process Control for Underground Mining Operations of Mobile Machines

    PubMed Central

    Song, Zhen; Schunnesson, Håkan; Rinne, Mikael; Sturgul, John

    2015-01-01

    The excavation and production in underground mines are complicated processes which consist of many different operations. The process of underground mining is considerably constrained by the geometry and geology of the mine. The various mining operations are normally performed in series at each working face. The delay of a single operation will lead to a domino effect, thus delaying the starting time for the next process and the completion time of the entire process. This paper presents a new approach to the process control for underground mining operations, e.g. drilling, bolting, mucking. This approach can estimate the working time and its probability for each operation more efficiently and objectively by improving the existing PERT (Program Evaluation and Review Technique) and CPM (Critical Path Method). If the delay of the critical operation (which is on a critical path) inevitably affects the productivity of mined ore, the approach can rapidly assign mucking machines new jobs to increase this amount at a maximum level by using a new mucking algorithm under external constraints. PMID:26062092

  19. PM2: a partitioning-mining-measuring method for identifying progressive changes in older adults' sleeping activity.

    PubMed

    Lin, Qiang; Zhang, Daqing; Connelly, Kay; Zhou, Xingshe; Ni, Hongbo

    2014-01-01

    As people age, their health typically declines, resulting in difficulty in performing daily activities. Sleep-related problems are common issues with older adults, including shifts in circadian rhythms. A detection method is proposed to identify progressive changes in sleeping activity using a three-step process: partitioning, mining, and measuring. Specifically, the original spatiotemporal representation of each sleeping activity instance was first transformed into a sequence of equal-sized segments, or symbols, via a partitioning process. A data-mining-based algorithm was proposed to find symbols that are not present in all instances of a sleeping activity. Finally, a measuring process was responsible for evaluating the changes in these symbols. Experimental evaluation conducted on a group of datasets of older adults showed that the proposed method is able to identify progressive changes in sleeping activity.
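
    The three steps named in the abstract (partitioning, mining, measuring) can be illustrated with a toy sketch. The abstract does not give the actual symbol encoding or change measure, so the quantized-mean symbols, the early-versus-late comparison, and the sample readings below are assumptions, not the PM2 definitions.

```python
from typing import List, Sequence, Set


def partition(instance: Sequence[float], size: int) -> List[str]:
    """Partitioning: cut one instance into equal-sized segments and
    encode each as a symbol (here: segment position plus rounded mean)."""
    symbols = []
    for start in range(0, len(instance) - size + 1, size):
        segment = instance[start:start + size]
        level = round(sum(segment) / len(segment))
        symbols.append(f"s{start // size}:{level}")
    return symbols


def varying_symbols(instances: List[List[str]]) -> Set[str]:
    """Mining: symbols that are NOT present in all instances."""
    sets = [set(x) for x in instances]
    return set.union(*sets) - set.intersection(*sets)


def trend(symbol: str, instances: List[List[str]]) -> float:
    """Measuring: change in how often a symbol occurs, late half minus early half."""
    half = len(instances) // 2
    early = sum(symbol in x for x in instances[:half]) / half
    late = sum(symbol in x for x in instances[half:]) / (len(instances) - half)
    return late - early


# Four hypothetical nights of sensor readings, in chronological order.
nights = [[0, 0, 2, 2], [0, 0, 2, 2], [0, 0, 4, 4], [0, 0, 4, 4]]
encoded = [partition(n, size=2) for n in nights]
changed = varying_symbols(encoded)
```

    Here the symbol for the second segment shifts from level 2 to level 4 between the early and late nights, so `trend("s1:4", encoded)` is positive and `trend("s1:2", encoded)` is negative, while the unchanged first segment is ignored.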

  20. Data mining approaches for information retrieval from genomic databases

    NASA Astrophysics Data System (ADS)

    Liu, Donglin; Singh, Gautam B.

    2000-04-01

    Sequence retrieval in genomic databases is used for finding sequences related to a query sequence specified by a user. Comparison is the main part of the retrieval system in genomic databases. An efficient sequence comparison algorithm is critical in bioinformatics. There are several different algorithms to perform sequence comparison, such as the suffix array based database search, divergence measurement, methods that rely upon the existence of a local similarity between the query sequence and sequences in the database, or common mutual information between query and sequences in DB. In this paper we have described a new method for DNA sequence retrieval based on data mining techniques. Data mining tools generally find patterns among data and have been successfully applied in industries to improve marketing, sales, and customer support operations. We have applied the descriptive data mining techniques to find relevant patterns that are significant for comparing genetic sequences. Relevance feedback score based on common patterns is developed and employed to compute distance between sequences. The contigs of human chromosomes are used to test the retrieval accuracy and the experimental results are presented.
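
    As a rough illustration of computing a distance between sequences from their common patterns, the sketch below uses shared k-mers and a Jaccard distance. This is a stand-in for the paper's relevance feedback score, whose definition is not given in the abstract; the function names, the choice of k, and the toy sequences are assumptions.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct length-k substrings (the 'patterns' in this sketch)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def pattern_distance(a: str, b: str, k: int = 3) -> float:
    """Jaccard distance on k-mer sets: 0.0 means identical pattern sets,
    1.0 means no pattern in common."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka and not kb:
        return 0.0
    return 1 - len(ka & kb) / len(ka | kb)


def retrieve(query: str, database: Dict[str, str],
             k: int = 3, top: int = 1) -> List[str]:
    """Return the names of the `top` database sequences closest to the query."""
    ranked = sorted(database, key=lambda name: pattern_distance(query, database[name], k))
    return ranked[:top]


# Toy database; real contigs would be far longer.
database = {"seqA": "ACGTACGT", "seqB": "TTTTGGGG"}
hits = retrieve("ACGTAC", database)
```

    A set-based distance like this ignores pattern order and multiplicity, which keeps the comparison cheap but may be too coarse for real retrieval tasks.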

  1. A Visualization System Using Data Mining Techniques for Identifying Information Sources.

    ERIC Educational Resources Information Center

    Fowler, Richard H.; Karadayi, Tarkan; Chen, Zhixiang; Meng, Xiannong; Fowler, Wendy A. Lawrence

    The Visual Analysis System (VAS) was developed to couple emerging successes in data mining with information visualization techniques in order to create a richly interactive environment for information retrieval from the World Wide Web. VAS's retrieval strategy operates by first using a conventional search engine to form a core set of retrieved…

  2. Quantiles Regression Approach to Identifying the Determinant of Breastfeeding Duration

    NASA Astrophysics Data System (ADS)

    Mahdiyah; Norsiah Mohamed, Wan; Ibrahim, Kamarulzaman

    In this study, a quantile regression approach is applied to data from the Malaysian Family Life Survey (MFLS) to identify factors that are significantly related to different conditional quantiles of breastfeeding duration. Classical linear regression methods are based on minimizing the residual sum of squares, whereas quantile regression is based on the conditional median function and the full range of other conditional quantile functions. Overall, it is found that the duration of breastfeeding is significantly related to place of living, religion and total number of children in the family.
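
    The contrast between least squares and quantile regression comes down to the loss being minimized. The sketch below shows the standard check ("pinball") loss for the simplest case of a constant model with no covariates; the sample durations are invented and are not MFLS data.

```python
from typing import Sequence


def pinball_loss(residual: float, tau: float) -> float:
    """Check loss for quantile level tau in (0, 1): positive residuals
    are weighted by tau, negative ones by (1 - tau)."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def best_constant(values: Sequence[float], tau: float) -> float:
    """The sample value minimizing total pinball loss, i.e. an
    empirical tau-quantile of the sample."""
    return min(values, key=lambda c: sum(pinball_loss(v - c, tau) for v in values))


# Invented durations (months) with one extreme value.
durations = [1, 2, 3, 4, 100]
median_fit = best_constant(durations, tau=0.5)
lower_fit = best_constant(durations, tau=0.25)
mean_fit = sum(durations) / len(durations)
```

    With these numbers the least-squares fit (the mean) is pulled far toward the extreme value, while the tau = 0.5 fit stays at the sample median. Full quantile regression replaces the constant with a linear function of covariates and minimizes the same loss.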

  3. Dual-band, infrared buried mine detection using a statistical pattern recognition approach

    SciTech Connect

    Buhl, M.R.; Hernandez, J.E.; Clark, G.A.; Sengupta, S.K.

    1993-08-01

    The main objective of this work was to detect surrogate land mines, which were buried in clay and sand, using dual-band, infrared images. A statistical pattern recognition approach was used to achieve this objective. This approach is discussed and results of applying it to real images are given.

  4. THE FUTURE OF COMPUTER-BASED TOXICITY PREDICTION: MECHANISM-BASED MODELS VS. INFORMATION MINING APPROACHES

    EPA Science Inventory


    When we speak of computer-based toxicity prediction, we are generally referring to a broad array of approaches which rely primarily upon chemical structure ...

  5. A Feature Mining Based Approach for the Classification of Text Documents into Disjoint Classes.

    ERIC Educational Resources Information Center

    Nieto Sanchez, Salvador; Triantaphyllou, Evangelos; Kraft, Donald

    2002-01-01

    Proposes a new approach for classifying text documents into two disjoint classes. Highlights include a brief overview of document clustering; a data mining approach called the One Clause at a Time (OCAT) algorithm which is based on mathematical logic; vector space model (VSM); and comparing the OCAT to the VSM. (Author/LRW)

  6. Data-mining the FlyAtlas online resource to identify core functional motifs across transporting epithelia

    PubMed Central

    2013-01-01

    Background Comparative analysis of tissue-specific transcriptomes is a powerful technique to uncover tissue functions. Our FlyAtlas.org provides authoritative gene expression levels for multiple tissues of Drosophila melanogaster (1). Although the main use of such resources is single gene lookup, there is the potential for powerful meta-analysis to address questions that could not easily be framed otherwise. Here, we illustrate the power of data-mining of FlyAtlas data by comparing epithelial transcriptomes to identify a core set of highly-expressed genes, across the four major epithelial tissues (salivary glands, Malpighian tubules, midgut and hindgut) of both adults and larvae. Method Parallel hypothesis-led and hypothesis-free approaches were adopted to identify core genes that underpin insect epithelial function. In the former, gene lists were created from transport processes identified in the literature, and their expression profiles mapped from the flyatlas.org online dataset. In the latter, gene enrichment lists were prepared for each epithelium, and genes (both transport related and unrelated) consistently enriched in transporting epithelia identified. Results A key set of transport genes, comprising V-ATPases, cation exchangers, aquaporins, potassium and chloride channels, and carbonic anhydrase, was found to be highly enriched across the epithelial tissues, compared with the whole fly. Additionally, a further set of genes that had not been predicted to have epithelial roles, were co-expressed with the core transporters, extending our view of what makes a transporting epithelium work. Further insights were obtained by studying the genes uniquely overexpressed in each epithelium; for example, the salivary gland expresses lipases, the midgut organic solute transporters, the tubules specialize for purine metabolism and the hindgut overexpresses still unknown genes. 
    Conclusion Taken together, these data provide a unique insight into epithelial function in this

  7. Functional epigenetic approach identifies frequently methylated genes in Ewing sarcoma.

    PubMed

    Alholle, Abdullah; Brini, Anna T; Gharanei, Seley; Vaiyapuri, Sumathi; Arrigoni, Elena; Dallol, Ashraf; Gentle, Dean; Kishida, Takeshi; Hiruma, Toru; Avigad, Smadar; Grimer, Robert; Maher, Eamonn R; Latif, Farida

    2013-11-01

    Using a candidate gene approach, we recently identified frequent methylation of the RASSF2 gene associated with poor overall survival in Ewing sarcoma (ES). To identify effective biomarkers in ES on a genome-wide scale, we used a functionally proven epigenetic approach, in which gene expression was induced in ES cell lines by treatment with a demethylating agent followed by hybridization onto high density gene expression microarrays. After following a strict selection criterion, 34 genes were selected for expression and methylation analysis in ES cell lines and primary ES. Eight genes (CTHRC1, DNAJA4, ECHDC2, NEFH, NPTX2, PHF11, RARRES2, TSGA14) showed methylation frequencies of >20% in ES tumors (range 24-71%); these genes were expressed in human bone marrow derived mesenchymal stem cells (hBMSC) and hypermethylation was associated with transcriptional silencing. Methylation of NPTX2 or PHF11 was associated with poorer prognosis in ES. In addition, six of the above genes also showed methylation frequency of >20% (range 36-50%) in osteosarcomas. Identification of these genes may provide insights into bone cancer tumorigenesis and development of epigenetic biomarkers for prognosis and detection of these rare tumor types.

  8. A text mining approach to the prediction of disease status from clinical discharge summaries.

    PubMed

    Yang, Hui; Spasic, Irena; Keane, John A; Nenadic, Goran

    2009-01-01

    OBJECTIVE The authors present a system developed for the Challenge in Natural Language Processing for Clinical Data (the i2b2 obesity challenge), whose aim was to automatically identify the status of obesity and 15 related co-morbidities in patients using their clinical discharge summaries. The challenge consisted of two tasks, textual and intuitive. The textual task was to identify explicit references to the diseases, whereas the intuitive task focused on the prediction of the disease status when the evidence was not explicitly asserted. DESIGN The authors assembled a set of resources to lexically and semantically profile the diseases and their associated symptoms, treatments, etc. These features were explored in a hybrid text mining approach, which combined dictionary look-up, rule-based, and machine-learning methods. MEASUREMENTS The methods were applied on a set of 507 previously unseen discharge summaries, and the predictions were evaluated against a manually prepared gold standard. The overall ranking of the participating teams was primarily based on the macro-averaged F-measure. RESULTS The implemented method achieved the macro-averaged F-measure of 81% for the textual task (which was the highest achieved in the challenge) and 63% for the intuitive task (ranked 7th out of 28 teams; the highest was 66%). The micro-averaged F-measure showed an average accuracy of 97% for textual and 96% for intuitive annotations. CONCLUSIONS The performance achieved was in line with the agreement between human annotators, indicating the potential of text mining for accurate and efficient prediction of disease statuses from clinical discharge summaries.
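
    The gap between the macro- and micro-averaged F-measures reported above is easier to interpret with the two definitions side by side. The sketch below uses the standard formulas; the class names and the true-positive, false-positive and false-negative counts are invented and are not taken from the challenge data.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def f1(tp: int, fp: int, fn: int) -> float:
    """F-measure with equal weight on precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(per_class: Dict[str, Counts]) -> float:
    """Average the per-class F-measures: every class counts equally."""
    return sum(f1(*c) for c in per_class.values()) / len(per_class)


def micro_f1(per_class: Dict[str, Counts]) -> float:
    """Pool the counts first: every individual decision counts equally."""
    tp = sum(c[0] for c in per_class.values())
    fp = sum(c[1] for c in per_class.values())
    fn = sum(c[2] for c in per_class.values())
    return f1(tp, fp, fn)


# Invented counts: one frequent, well-predicted class and one rare, poorly predicted class.
per_class = {"frequent": (90, 5, 5), "rare": (1, 1, 3)}
macro = macro_f1(per_class)
micro = micro_f1(per_class)
```

    In this toy case the micro average stays high because the frequent class dominates the pooled counts, while the macro average is dragged down by the rare class. A similar effect is one plausible reason for a micro-averaged score well above the macro-averaged one, though the abstract does not say so.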

  9. Assessing Weather-Yield Relationships in Rice at Local Scale Using Data Mining Approaches.

    PubMed

    Delerce, Sylvain; Dorado, Hugo; Grillon, Alexandre; Rebolledo, Maria Camila; Prager, Steven D; Patiño, Victor Hugo; Garcés Varón, Gabriel; Jiménez, Daniel

    2016-01-01

    Seasonal and inter-annual climate variability have become important issues for farmers, and climate change has been shown to increase them. Simultaneously farmers and agricultural organizations are increasingly collecting observational data about in situ crop performance. Agriculture thus needs new tools to cope with changing environmental conditions and to take advantage of these data. Data mining techniques make it possible to extract embedded knowledge associated with farmer experiences from these large observational datasets in order to identify best practices for adapting to climate variability. We introduce new approaches through a case study on irrigated and rainfed rice in Colombia. Preexisting observational datasets of commercial harvest records were combined with in situ daily weather series. Using Conditional Inference Forest and clustering techniques, we assessed the relationships between climatic factors and crop yield variability at the local scale for specific cultivars and growth stages. The analysis showed clear relationships in the various location-cultivar combinations, with climatic factors explaining 6 to 46% of spatiotemporal variability in yield, and with crop responses to weather being non-linear and cultivar-specific. Climatic factors affected cultivars differently during each stage of development. For instance, one cultivar was affected by high nighttime temperatures in the reproductive stage but responded positively to accumulated solar radiation during the ripening stage. Another was affected by high nighttime temperatures during both the vegetative and reproductive stages. Clustering of the weather patterns corresponding to individual cropping events revealed different groups of weather patterns for irrigated and rainfed systems with contrasting yield levels. Best-suited cultivars were identified for some weather patterns, making weather-site-specific recommendations possible. This study illustrates the potential of data mining for

  10. Assessing Weather-Yield Relationships in Rice at Local Scale Using Data Mining Approaches

    PubMed Central

    Delerce, Sylvain; Dorado, Hugo; Grillon, Alexandre; Rebolledo, Maria Camila; Prager, Steven D.; Patiño, Victor Hugo; Garcés Varón, Gabriel; Jiménez, Daniel

    2016-01-01

    Seasonal and inter-annual climate variability have become important issues for farmers, and climate change has been shown to increase them. Simultaneously farmers and agricultural organizations are increasingly collecting observational data about in situ crop performance. Agriculture thus needs new tools to cope with changing environmental conditions and to take advantage of these data. Data mining techniques make it possible to extract embedded knowledge associated with farmer experiences from these large observational datasets in order to identify best practices for adapting to climate variability. We introduce new approaches through a case study on irrigated and rainfed rice in Colombia. Preexisting observational datasets of commercial harvest records were combined with in situ daily weather series. Using Conditional Inference Forest and clustering techniques, we assessed the relationships between climatic factors and crop yield variability at the local scale for specific cultivars and growth stages. The analysis showed clear relationships in the various location-cultivar combinations, with climatic factors explaining 6 to 46% of spatiotemporal variability in yield, and with crop responses to weather being non-linear and cultivar-specific. Climatic factors affected cultivars differently during each stage of development. For instance, one cultivar was affected by high nighttime temperatures in the reproductive stage but responded positively to accumulated solar radiation during the ripening stage. Another was affected by high nighttime temperatures during both the vegetative and reproductive stages. Clustering of the weather patterns corresponding to individual cropping events revealed different groups of weather patterns for irrigated and rainfed systems with contrasting yield levels. Best-suited cultivars were identified for some weather patterns, making weather-site-specific recommendations possible. This study illustrates the potential of data mining for

  11. Fuzzy OLAP association rules mining-based modular reinforcement learning approach for multiagent systems.

    PubMed

    Kaya, Mehmet; Alhajj, Reda

    2005-04-01

    Multiagent systems and data mining have recently attracted considerable attention in the field of computing. Reinforcement learning is the most commonly used learning process for multiagent systems. However, it still has some drawbacks, including modeling other learning agents present in the domain as part of the state of the environment, and some states are experienced much less than others, or some state-action pairs are never visited during the learning phase. Further, before completing the learning process, an agent cannot exhibit a certain behavior in some states that may be experienced sufficiently. In this study, we propose a novel multiagent learning approach to handle these problems. Our approach is based on utilizing the mining process for modular cooperative learning systems. It incorporates fuzziness and online analytical processing (OLAP) based mining to effectively process the information reported by agents. First, we describe a fuzzy data cube OLAP architecture which facilitates effective storage and processing of the state information reported by agents. This way, the action of the other agent, even one not in the visual environment of the agent under consideration, can simply be predicted by extracting online association rules, a well-known data mining technique, from the constructed data cube. Second, we present a new action selection model, which is also based on association rules mining. Third, we generalize insufficiently experienced states by mining multilevel association rules from the proposed fuzzy data cube. Experimental results obtained on two different versions of a well-known pursuit domain show the robustness and effectiveness of the proposed fuzzy OLAP mining based modular learning approach. Finally, we tested the scalability of the approach presented in this paper and compared it with our previous work on modular-fuzzy Q-learning and ordinary Q-learning.
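
    The idea of predicting another agent's action from association rules can be illustrated with the two basic rule measures, support and confidence. The sketch below uses crisp (non-fuzzy) sets and a plain list of observations rather than the fuzzy data cube described in the abstract; the item names and observations are hypothetical.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def support(itemset: Itemset, transactions: List[Itemset]) -> float:
    """Fraction of observations that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent: Itemset, consequent: Itemset,
               transactions: List[Itemset]) -> float:
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(antecedent | consequent, transactions) / base


# Hypothetical observations: what the agent saw and what the other agent did.
observations = [
    frozenset({"prey_left", "other_moves_left"}),
    frozenset({"prey_left", "other_moves_left"}),
    frozenset({"prey_left", "other_moves_up"}),
    frozenset({"prey_right", "other_moves_right"}),
]
state = frozenset({"prey_left"})
rule_conf = confidence(state, frozenset({"other_moves_left"}), observations)
```

    A rule whose confidence exceeds a chosen threshold could then be used to predict the other agent's move in the observed state. In a fuzzy variant, membership degrees would replace the 0/1 containment test used here.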

  12. New approach for identifying boundary characteristics using transmissibility

    NASA Astrophysics Data System (ADS)

    Joo, Kyung-Hoon; Min, Dongwoo; Kim, Jun-Gu; Kang, Yeon June

    2017-04-01

    A novel approach is proposed for identifying boundary properties as a response model using transmissibility. This approach differs from those proposed in previous studies dealing with frequency response functions (FRFs) for joint identification. Transmissibility includes only response data, unlike FRFs that include force measurements. The boundary properties can be estimated by comparing the characteristics of the components under the free condition and connected to boundary conditions. When analyzing the components assembled compactly in the system for setting the shaker or measuring the impact force exerted on the component correctly, the proposed method could reduce the errors caused by an incorrectly measured force. The derived equation is verified using a discrete multiple degrees of freedom system with single boundary and multiple boundary conditions and by application to a beam, which is the simplest continuous structural form to validate the feasibility of the theory. The transmissibility defined by the apparent mass matrix is used for verifying the derived equation for identifying the boundary properties in the discrete system. However, when applying the equation to practical cases, as is the purpose of this research, the transmissibility matrix should be defined using only the response data. For this purpose, the accelerance matrix is modified slightly to the response matrix using the input as a unit force. This transmissibility matrix composed of response data is used for validating the equation in a continuous system. Furthermore, the effects of measurement noise are also investigated to assess the robustness of the method for application under practical conditions. Consequently, the proposed method could show reliable results by properly extracting the boundary properties in both cases. 
    In many practical cases, this research is expected to contribute toward identifying the boundary properties in a complex system more conveniently compared to the method

  13. A comparative genomics approach to identifying the plasticity transcriptome

    PubMed Central

    Pfenning, Andreas R; Schwartz, Russell; Barth, Alison L

    2007-01-01

    Background Neuronal activity regulates gene expression to control learning and memory, homeostasis of neuronal function, and pathological disease states such as epilepsy. A great deal of experimental evidence supports the involvement of two particular transcription factors in shaping the genomic response to neuronal activity and mediating plasticity: CREB and zif268 (egr-1, krox24, NGFI-A). The gene targets of these two transcription factors are of considerable interest, since they may help develop hypotheses about how neural activity is coupled to changes in neural function. Results We have developed a computational approach for identifying binding sites for these transcription factors within the promoter regions of annotated genes in the mouse, rat, and human genomes. By combining a robust search algorithm to identify discrete binding sites, a comparison of targets across species, and an analysis of binding site locations within promoter regions, we have defined a group of candidate genes that are strong CREB- or zif268 targets and are thus regulated by neural activity. Our analysis revealed that CREB and zif268 share a disproportionate number of targets in common and that these common targets are dominated by transcription factors. Conclusion These observations may enable a more detailed understanding of the regulatory networks that are induced by neural activity and contribute to the plasticity transcriptome. The target genes identified in this study will be a valuable resource for investigators who hope to define the functions of specific genes that underlie activity-dependent changes in neuronal properties. PMID:17355637
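
    The combination of a binding-site search with a cross-species comparison can be sketched in a few lines. This is a deliberately simplified illustration using exact matching of the canonical palindromic CRE motif TGACGTCA; the authors' search algorithm, scoring, and positional analysis are not described in the abstract, and the gene names and promoter sequences below are made up.

```python
from typing import Dict, List

CRE = "TGACGTCA"  # canonical full CRE site, used here as an exact-match motif


def find_sites(promoter: str, motif: str = CRE) -> List[int]:
    """Start positions of exact motif matches in a promoter sequence."""
    width = len(motif)
    return [i for i in range(len(promoter) - width + 1)
            if promoter[i:i + width] == motif]


def conserved_targets(promoters: Dict[str, Dict[str, str]],
                      motif: str = CRE) -> List[str]:
    """Genes whose promoter contains the motif in every species listed."""
    return sorted(
        gene for gene, by_species in promoters.items()
        if all(find_sites(seq, motif) for seq in by_species.values())
    )


# Made-up promoter fragments for two hypothetical genes.
promoters = {
    "geneA": {"mouse": "AATGACGTCAGG", "rat": "TGACGTCATT", "human": "CCTGACGTCA"},
    "geneB": {"mouse": "AATGACGTCAGG", "rat": "AAAAAAAA", "human": "CCCC"},
}
candidates = conserved_targets(promoters)
```

    Requiring a match in all three species is a strict filter. A practical pipeline would more likely score degenerate matches (for example with a position weight matrix) and tolerate some divergence between species.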

  14. Social Network Analysis and Mining to Monitor and Identify Problems with Large-Scale Information and Communication Technology Interventions

    PubMed Central

    da Silva, Aleksandra do Socorro; de Brito, Silvana Rossy; Vijaykumar, Nandamudi Lankalapalli; da Rocha, Cláudio Alex Jorge; Monteiro, Maurílio de Abreu; Costa, João Crisóstomo Weyl Albuquerque; Francês, Carlos Renato Lisboa

    2016-01-01

    The published literature reveals several arguments concerning the strategic importance of information and communication technology (ICT) interventions for developing countries where the digital divide is a challenge. Large-scale ICT interventions can be an option for countries whose regions, both urban and rural, present a high number of digitally excluded people. Our goal was to monitor and identify problems in interventions aimed at certification for a large number of participants in different geographical regions. Our case study is the training at the Telecentros.BR, a program created in Brazil to install telecenters and certify individuals to use ICT resources. We propose an approach that applies social network analysis and mining techniques to data collected from Telecentros.BR dataset and from the socioeconomics and telecommunications infrastructure indicators of the participants’ municipalities. We found that (i) the analysis of interactions in different time periods reflects the objectives of each phase of training, highlighting the increased density in the phase in which participants develop and disseminate their projects; (ii) analysis according to the roles of participants (i.e., tutors or community members) reveals that the interactions were influenced by the center (or region) to which the participant belongs (that is, a community contained mainly members of the same region and always with the presence of tutors, contradicting expectations of the training project, which aimed for intense collaboration of the participants, regardless of the geographic region); (iii) the social network of participants influences the success of the training: that is, given evidence that the degree of the community member is in the highest range, the probability of this individual concluding the training is 0.689; (iv) the North region presented the lowest probability of participant certification, whereas the Northeast, which served municipalities with similar

  15. Social Network Analysis and Mining to Monitor and Identify Problems with Large-Scale Information and Communication Technology Interventions.

    PubMed

    da Silva, Aleksandra do Socorro; de Brito, Silvana Rossy; Vijaykumar, Nandamudi Lankalapalli; da Rocha, Cláudio Alex Jorge; Monteiro, Maurílio de Abreu; Costa, João Crisóstomo Weyl Albuquerque; Francês, Carlos Renato Lisboa

    2016-01-01

    The published literature reveals several arguments concerning the strategic importance of information and communication technology (ICT) interventions for developing countries where the digital divide is a challenge. Large-scale ICT interventions can be an option for countries whose regions, both urban and rural, present a high number of digitally excluded people. Our goal was to monitor and identify problems in interventions aimed at certification for a large number of participants in different geographical regions. Our case study is the training at the Telecentros.BR, a program created in Brazil to install telecenters and certify individuals to use ICT resources. We propose an approach that applies social network analysis and mining techniques to data collected from Telecentros.BR dataset and from the socioeconomics and telecommunications infrastructure indicators of the participants' municipalities. We found that (i) the analysis of interactions in different time periods reflects the objectives of each phase of training, highlighting the increased density in the phase in which participants develop and disseminate their projects; (ii) analysis according to the roles of participants (i.e., tutors or community members) reveals that the interactions were influenced by the center (or region) to which the participant belongs (that is, a community contained mainly members of the same region and always with the presence of tutors, contradicting expectations of the training project, which aimed for intense collaboration of the participants, regardless of the geographic region); (iii) the social network of participants influences the success of the training: that is, given evidence that the degree of the community member is in the highest range, the probability of this individual concluding the training is 0.689; (iv) the North region presented the lowest probability of participant certification, whereas the Northeast, which served municipalities with similar

  16. Mining 3D genome structure populations identifies major factors governing the stability of regulatory communities

    PubMed Central

    Dai, Chao; Li, Wenyuan; Tjong, Harianto; Hao, Shengli; Zhou, Yonggang; Li, Qingjiao; Chen, Lin; Zhu, Bing; Alber, Frank; Jasmine Zhou, Xianghong

    2016-01-01

    Three-dimensional (3D) genome structures vary from cell to cell even in an isogenic sample. Unlike protein structures, genome structures are highly plastic, posing a significant challenge for structure-function mapping. Here we report an approach to comprehensively identify 3D chromatin clusters that each occur frequently across a population of genome structures, either deconvoluted from ensemble-averaged Hi-C data or from a collection of single-cell Hi-C data. Applying our method to a population of genome structures (at the macrodomain resolution) of lymphoblastoid cells, we identify an atlas of stable inter-chromosomal chromatin clusters. A large number of these clusters are enriched in binding of specific regulatory factors and are therefore defined as ‘Regulatory Communities’. We reveal two major factors, centromere clustering and transcription factor binding, which significantly stabilize such communities. Finally, we show that the regulatory communities differ substantially from cell to cell, indicating that expression variability could be impacted by genome structures. PMID:27240697

  17. Identifying Subgroups among Hardcore Smokers: a Latent Profile Approach

    PubMed Central

    Bommelé, Jeroen; Kleinjan, Marloes; Schoenmakers, Tim M.; Burk, William J.; van den Eijnden, Regina; van de Mheen, Dike

    2015-01-01

    Introduction: Hardcore smokers are smokers who have little to no intention to quit. Previous research suggests that there are distinct subgroups among hardcore smokers and that these subgroups vary in the perceived pros and cons of smoking and quitting. Identifying these subgroups could help to develop individualized messages for the group of hardcore smokers. In this study we therefore used the perceived pros and cons of smoking and quitting to identify profiles among hardcore smokers. Methods: A sample of 510 hardcore smokers completed an online survey on the perceived pros and cons of smoking and quitting. We used these perceived pros and cons in a latent profile analysis to identify possible subgroups among hardcore smokers. To validate the profiles identified among hardcore smokers, we analysed data from a sample of 338 non-hardcore smokers in a similar way. Results: We found three profiles among hardcore smokers. ‘Receptive’ hardcore smokers (36%) perceived many cons of smoking and many pros of quitting. ‘Ambivalent’ hardcore smokers (59%) were rather undecided towards quitting. ‘Resistant’ hardcore smokers (5%) saw few cons of smoking and few pros of quitting. Among non-hardcore smokers, we found similar groups of ‘receptive’ smokers (30%) and ‘ambivalent’ smokers (54%). However, a third group consisted of ‘disengaged’ smokers (16%), who saw few pros and cons of both smoking and quitting. Discussion: Among hardcore smokers, we found three distinct profiles based on perceived pros and cons of smoking. This indicates that hardcore smokers are not a homogenous group. Each profile might require a different tobacco control approach. Our findings may help to develop individualized tobacco control messages for the particularly hard-to-reach group of hardcore smokers. PMID:26207829

  18. Systematic approaches to identify E3 ligase substrates

    PubMed Central

    Iconomou, Mary; Saunders, Darren N.

    2016-01-01

    Protein ubiquitylation is a widespread post-translational modification, regulating cellular signalling with many outcomes, such as protein degradation, endocytosis, cell cycle progression, DNA repair and transcription. E3 ligases are a critical component of the ubiquitin proteasome system (UPS), determining the substrate specificity of the cascade by the covalent attachment of ubiquitin to substrate proteins. Currently, there are over 600 putative E3 ligases, but many are poorly characterized, particularly with respect to individual protein substrates. Here, we highlight systematic approaches to identify and validate UPS targets and discuss how they are underpinning rapid advances in our understanding of the biochemistry and biology of the UPS. The integration of novel tools, model systems and methods for target identification is driving significant interest in drug development, targeting various aspects of UPS function and advancing the understanding of a diverse range of disease processes. PMID:27834739

  19. Novel approaches to identify protective malaria vaccine candidates

    PubMed Central

    Chia, Wan Ni; Goh, Yun Shan; Rénia, Laurent

    2014-01-01

    Efforts to develop vaccines against malaria have been the focus of substantial research activities for decades. Several categories of candidate vaccines are currently being developed for protection against malaria, based on antigens corresponding to the pre-erythrocytic, blood stage, or sexual stages of the parasite. Long-lasting sterile protection from Plasmodium falciparum sporozoite challenge has been observed in humans following vaccination with whole parasite formulations, clearly demonstrating that a protective immune response targeting predominantly the pre-erythrocytic stages can develop against malaria. However, most of the vaccine candidates currently being investigated, which are mostly subunit vaccines, have not been able to induce substantial (>50%) protection thus far. This is because the antigens responsible for protection against the different parasite stages are not yet known and relevant correlates of protection have remained elusive. For a vaccine to be developed in a timely manner, novel approaches are required. In this article, we review the novel approaches that have been developed to identify the antigens for the development of an effective malaria vaccine. PMID:25452745

  20. Integrative biology approach identifies cytokine targeting strategies for psoriasis.

    PubMed

    Perera, Gayathri K; Ainali, Chrysanthi; Semenova, Ekaterina; Hundhausen, Christian; Barinaga, Guillermo; Kassen, Deepika; Williams, Andrew E; Mirza, Muddassar M; Balazs, Mercedesz; Wang, Xiaoting; Rodriguez, Robert Sanchez; Alendar, Andrej; Barker, Jonathan; Tsoka, Sophia; Ouyang, Wenjun; Nestle, Frank O

    2014-02-12

    Cytokines are critical checkpoints of inflammation. The treatment of human autoimmune disease has been revolutionized by targeting inflammatory cytokines as key drivers of disease pathogenesis. Despite this, there exist numerous pitfalls when translating preclinical data into the clinic. We developed an integrative biology approach combining human disease transcriptome data sets with clinically relevant in vivo models in an attempt to bridge this translational gap. We chose interleukin-22 (IL-22) as a model cytokine because of its potentially important proinflammatory role in epithelial tissues. Injection of IL-22 into normal human skin grafts produced marked inflammatory skin changes resembling human psoriasis. Injection of anti-IL-22 monoclonal antibody in a human xenotransplant model of psoriasis, developed specifically to test potential therapeutic candidates, efficiently blocked skin inflammation. Bioinformatic analysis integrating both the IL-22 and anti-IL-22 cytokine transcriptomes and mapping them onto a psoriasis disease gene coexpression network identified key cytokine-dependent hub genes. Using knockout mice and small-molecule blockade, we show that one of these hub genes, the so far unexplored serine/threonine kinase PIM1, is a critical checkpoint for human skin inflammation and potential future therapeutic target in psoriasis. Using in silico integration of human data sets and biological models, we were able to identify a new target in the treatment of psoriasis.

  1. A review of approaches to identifying patient phenotype cohorts using electronic health records

    PubMed Central

    Shivade, Chaitanya; Raghavan, Preethi; Fosler-Lussier, Eric; Embi, Peter J; Elhadad, Noemie; Johnson, Stephen B; Lai, Albert M

    2014-01-01

    Objective: To summarize literature describing approaches aimed at automatically identifying patients with a common phenotype. Materials and methods: We performed a review of studies describing systems or reporting techniques developed for identifying cohorts of patients with specific phenotypes. Every full text article published in (1) Journal of American Medical Informatics Association, (2) Journal of Biomedical Informatics, (3) Proceedings of the Annual American Medical Informatics Association Symposium, and (4) Proceedings of Clinical Research Informatics Conference within the past 3 years was assessed for inclusion in the review. Only articles using automated techniques were included. Results: Ninety-seven articles met our inclusion criteria. Forty-six used natural language processing (NLP)-based techniques, 24 described rule-based systems, 41 used statistical analyses, data mining, or machine learning techniques, while 22 described hybrid systems. Nine articles described the architecture of large-scale systems developed for determining cohort eligibility of patients. Discussion: We observe that there is a rise in the number of studies associated with cohort identification using electronic medical records. Statistical analyses or machine learning, followed by NLP techniques, are gaining popularity over the years in comparison with rule-based systems. Conclusions: There are a variety of approaches for classifying patients into a particular phenotype. Different techniques and data sources are used, and good performance is reported on datasets at respective institutions. However, no system makes comprehensive use of electronic medical records addressing all of their known weaknesses. PMID:24201027

  2. North American Bats and Mines Project: A cooperative approach for integrating bat conservation and mine-land reclamation

    SciTech Connect

    Ducummon, S.L.

    1997-12-31

    Inactive underground mines now provide essential habitat for more than half of North America's 44 bat species, including some of the largest remaining populations. Thousands of abandoned mines have already been closed or are slated for safety closures, and many are destroyed during renewed mining in historic districts. The available evidence suggests that millions of bats have already been lost due to these closures. Bats are primary predators of night-flying insects that cost American farmers and foresters billions of dollars annually; therefore, threats to bat survival are cause for serious concern. Fortunately, mine closure methods exist that protect both bats and humans. Bat Conservation International (BCI) and the USDI-Bureau of Land Management founded the North American Bats and Mines Project to provide national leadership and coordination to minimize the loss of mine-roosting bats. This partnership has involved federal and state mine-land and wildlife managers and the mining industry. BCI has trained hundreds of mine-land and wildlife managers nationwide in mine assessment techniques for bats and bat-compatible closure methods, published technical information on bats and mine-land management, presented papers on bats and mines at national mining and wildlife conferences, and collaborated with numerous federal, state, and private partners to protect some of the most important mine-roosting bat populations. Our new mining industry initiative, Mining for Habitat, is designed to develop bat habitat conservation and enhancement plans for active mining operations. It includes the creation of cost-effective artificial underground bat roosts using surplus mining materials such as old mine-truck tires and culverts buried beneath waste rock.

  3. A Hybrid Approach for Efficient Modeling of Medium-Frequency Propagation in Coal Mines

    PubMed Central

    Brocker, Donovan E.; Sieber, Peter E.; Waynert, Joseph A.; Li, Jingcheng; Werner, Pingjuan L.; Werner, Douglas H.

    2015-01-01

    An efficient procedure for modeling medium frequency (MF) communications in coal mines is introduced. In particular, a hybrid approach is formulated and demonstrated utilizing ideal transmission line equations to model MF propagation in combination with full-wave sections used for accurate simulation of local antenna-line coupling and other near-field effects. This work confirms that the hybrid method accurately models signal propagation from a source to a load for various system geometries and material compositions, while significantly reducing computation time. With such dramatic improvement to solution times, it becomes feasible to perform large-scale optimizations with the primary motivation of improving communications in coal mines both for daily operations and emergency response. Furthermore, it is demonstrated that the hybrid approach is suitable for modeling and optimizing large communication networks in coal mines that may otherwise be intractable to simulate using traditional full-wave techniques such as moment methods or finite-element analysis. PMID:26478686

  4. Meta-control of combustion performance with a data mining approach

    NASA Astrophysics Data System (ADS)

    Song, Zhe

    Large-scale combustion processes are complex and pose challenges for performance optimization. Traditional approaches based on thermal dynamics have limitations in finding optimal operational regions due to the time-shift nature of the process. Recent advances in information technology enable people to collect large volumes of process data easily and continuously. The collected process data contains rich information about the process and, to some extent, represents a digital copy of the process over time. Although large volumes of data exist in industrial combustion processes, they are not fully utilized to the level where the process can be optimized. Data mining is an emerging science which finds patterns or models from large data sets. It has found many successful applications in business marketing, medical and manufacturing domains. The focus of this dissertation is on applying data mining to industrial combustion processes, and ultimately optimizing the combustion performance. However, the philosophy, methods and frameworks discussed in this research can also be applied to other industrial processes. Optimizing an industrial combustion process has two major challenges. One is that the underlying process model changes over time and obtaining an accurate process model is nontrivial. The other is that a process model with high fidelity is usually highly nonlinear, so solving the optimization problem needs efficient heuristics. This dissertation sets out to solve these two major challenges. The major contribution of this 4-year research is the data-driven solution to optimize the combustion process, where process model or knowledge is identified based on the process data, then optimization is executed by evolutionary algorithms to search for optimal operating regions.

  5. A Hybrid Data Mining Approach for Credit Card Usage Behavior Analysis

    NASA Astrophysics Data System (ADS)

    Tsai, Chieh-Yuan

    The credit card is one of the most popular e-payment approaches in current online e-commerce. To consolidate valuable customers, card issuers invest a lot of money to maintain good relationships with their customers. Although several efforts have been made to study card usage motivation, few studies emphasize credit card usage behavior analysis when time periods change from t to t+1. To address this issue, an integrated data mining approach is proposed in this paper. First, the customer profiles and their transaction data at time period t are retrieved from databases. Second, a LabelSOM neural network groups customers into segments and identifies critical characteristics for each group. Third, a fuzzy decision tree algorithm is used to construct usage behavior rules of interesting customer groups. Finally, these rules are used to analyze the behavior changes between time periods t and t+1. An implementation case using a practical credit card database provided by a commercial bank in Taiwan is illustrated to show the benefits of the proposed framework.

  6. A Study of the Physical and Mechanical Properties of Lutetium Compared with Those of Transition Metals: A Data Mining Approach

    NASA Astrophysics Data System (ADS)

    Settouti, Nadera; Aourag, Hafid

    2015-01-01

    In this article, we study the physical and mechanical properties of lutetium, which will be compared with the elements of the third-row transition metals (Cs, Ba, Hf, Ta, W, Re, Os, Ir, Pt, Au, Tl, Pb, and Bi). Data mining is an ideal approach for analyzing the information and exploring the hidden knowledge among the data. The purpose of the data mining scheme is to identify and classify the effects of the relationships existing between properties. The results of the investigation are presented by means of multivariate modeling methods, such as the principal component analysis and the partial least squares regression to discover the implicit, yet meaningful, relationship between the elements of the data set, and to locate correlations between the properties of the materials. In this study, we present a data mining approach to discover such unusual correlations between properties of the elements. When comparing the properties of the transition metals with those of lutetium, our results show that lutetium shares many properties and similarities with the transition metals of the sixth row in the periodic table and can be well described as a transition metal.

  7. Assessing the effectiveness of sustainable land management policies for combating desertification: A data mining approach.

    PubMed

    Salvati, L; Kosmas, C; Kairis, O; Karavitis, C; Acikalin, S; Belgacem, A; Solé-Benet, A; Chaker, M; Fassouli, V; Gokceoglu, C; Gungor, H; Hessel, R; Khatteli, H; Kounalaki, A; Laouina, A; Ocakoglu, F; Ouessar, M; Ritsema, C; Sghaier, M; Sonmez, H; Taamallah, H; Tezcan, L; de Vente, J; Kelly, C; Colantoni, A; Carlucci, M

    2016-12-01

    This study investigates the relationship between fine resolution, local-scale biophysical and socioeconomic contexts within which land degradation occurs, and the human responses to it. The research draws on experimental data collected under different territorial and socioeconomic conditions at 586 field sites in five Mediterranean countries (Spain, Greece, Turkey, Tunisia and Morocco). We assess the level of desertification risk under various land management practices (terracing, grazing control, prevention of wildland fires, soil erosion control measures, soil water conservation measures, sustainable farming practices, land protection measures and financial subsidies) taken as possible responses to land degradation. A data mining approach, incorporating principal component analysis, non-parametric correlations, multiple regression and canonical analysis, was developed to identify the spatial relationship between land management conditions, the socioeconomic and environmental context (described using 40 biophysical and socioeconomic indicators) and desertification risk. Our analysis identified a number of distinct relationships between the level of desertification experienced and the underlying socioeconomic context, suggesting that the effectiveness of responses to land degradation is strictly dependent on the local biophysical and socioeconomic context. Assessing the latent relationship between land management practices and the biophysical/socioeconomic attributes characterizing areas exposed to different levels of desertification risk proved to be an indirect measure of the effectiveness of field actions contrasting land degradation.

  8. A spatio-temporal mining approach towards summarizing and analyzing protein folding trajectories.

    PubMed

    Yang, Hui; Parthasarathy, Srinivasan; Ucar, Duygu

    2007-04-04

    Understanding the protein folding mechanism remains a grand challenge in structural biology. In the past several years, computational theories in molecular dynamics have been employed to shed light on the folding process. Coupled with high computing power and large scale storage, researchers now can computationally simulate the protein folding process in atomistic details at femtosecond temporal resolution. Such simulation often produces a large number of folding trajectories, each consisting of a series of 3D conformations of the protein under study. As a result, effectively managing and analyzing such trajectories is becoming increasingly important. In this article, we present a spatio-temporal mining approach to analyze protein folding trajectories. It exploits the simplicity of contact maps, while also integrating 3D structural information in the analysis. It characterizes the dynamic folding process by first identifying spatio-temporal association patterns in contact maps, then studying how such patterns evolve along a folding trajectory. We demonstrate that such patterns can be leveraged to summarize folding trajectories, and to facilitate the detection and ordering of important folding events along a folding path. We also show that such patterns can be used to identify a consensus partial folding pathway across multiple folding trajectories. Furthermore, we argue that such patterns can capture both local and global structural topology in a 3D protein conformation, thereby facilitating effective structural comparison amongst conformations. We apply this approach to analyze the folding trajectories of two small synthetic proteins-BBA5 and GSGS (or Beta3S). We show that this approach is promising towards addressing the above issues, namely, folding trajectory summarization, folding events detection and ordering, and consensus partial folding pathway identification across trajectories.
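    The abstract above builds on contact maps derived from 3D conformations. As a rough illustration only, the sketch below computes a binary contact map for one conformation and lists the contacts formed or broken between two frames of a trajectory. It does not reproduce the authors' spatio-temporal association mining; the function names, the one-point-per-residue representation, and the 8.0 distance cutoff are assumptions chosen for the example.

```python
import math
from typing import List, Sequence, Set, Tuple

Coord = Sequence[float]
ContactMap = List[List[int]]
Pair = Tuple[int, int]


def contact_map(coords: Sequence[Coord], cutoff: float = 8.0) -> ContactMap:
    """Binary matrix with 1 where two distinct residues lie within `cutoff`.

    `coords` holds one 3D point per residue (an assumption of this sketch);
    the default cutoff is illustrative, not a value taken from the paper.
    """
    n = len(coords)
    return [
        [1 if i != j and math.dist(coords[i], coords[j]) <= cutoff else 0
         for j in range(n)]
        for i in range(n)
    ]


def contact_changes(before: ContactMap, after: ContactMap) -> Tuple[Set[Pair], Set[Pair]]:
    """Return (formed, broken) residue pairs between two trajectory frames."""
    formed: Set[Pair] = set()
    broken: Set[Pair] = set()
    n = len(before)
    for i in range(n):
        for j in range(i + 1, n):  # the map is symmetric, so scan the upper triangle
            if after[i][j] and not before[i][j]:
                formed.add((i, j))
            elif before[i][j] and not after[i][j]:
                broken.add((i, j))
    return formed, broken
```

    Applied frame by frame, `contact_changes` gives a simple record of how contacts evolve along a folding trajectory, which is the kind of raw material a pattern-mining step could then summarize.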

  9. Integrated approach of environmental impact and risk assessment of Rosia Montana Mining Area, Romania.

    PubMed

    Stefănescu, Lucrina; Robu, Brînduşa Mihaela; Ozunu, Alexandru

    2013-11-01

    The environmental impact assessment of mining sites is currently a topic of great interest in Romania. Historical pollution in the Rosia Montana mining area of Romania caused extensive damage to environmental media. This paper has two goals: to investigate the environmental pollution induced by mining activities in the Rosia Montana area and to quantify the environmental impacts and associated risks by means of an integrated approach. Thus, a new method was developed and applied for quantifying the impact of mining activities, taking account of the quality of environmental media in the mining area, and used as case study in the present paper. The associated risks are a function of the environmental impacts and the probability of their occurrence. The results show that the environmental impacts and quantified risks, based on quality indicators to characterize the environmental quality, are of a higher order, and thus measures for pollution remediation and control need to be considered in the investigated area. The conclusion drawn is that an integrated approach for the assessment of environmental impact and associated risks is a valuable and more objective method, and is an important tool that can be applied in the decision-making process for national authorities in the prioritization of emergency action.

  10. Quantitative risk-based approach for improving water quality management in mining.

    PubMed

    Liu, Wenying; Moran, Chris J; Vink, Sue

    2011-09-01

    The potential environmental threats posed by freshwater withdrawal and mine water discharge are some of the main drivers for the mining industry to improve water management. The use of multiple sources of water supply and introducing water reuse into the mine site water system have been part of the operating philosophies employed by the mining industry to realize these improvements. However, a barrier to implementation of such good water management practices is concomitant water quality variation and the resulting impacts on the efficiency of mineral separation processes, and an increased environmental consequence of noncompliant discharge events. There is an increasing appreciation that conservative water management practices, production efficiency, and environmental consequences are intimately linked through the site water system. It is therefore essential to consider water management decisions and their impacts as an integrated system as opposed to dealing with each impact separately. This paper proposes an approach that could assist mine sites to manage water quality issues in a systematic manner at the system level. This approach can quantitatively forecast the risk related to water quality and evaluate the effectiveness of management strategies in mitigating the risk by quantifying implications for production and hence economic viability.

  11. Performance-based approach to improve skills, safety, and training in the mining industry. Research report, September 1984-October 1989 (Final)

    SciTech Connect

    Klishis, M.J.; Althouse, R.C.; Grayson, R.L.; Lies, G.M.

    1989-10-01

    The Mining Extension Service of West Virginia University has developed an approach for identifying the specific training needs of mining operations (production, maintenance and support tasks) that can be used to upgrade on-the-job training, annual refresher training and task training. It can also be directed at systematically correcting performance discrepancies of an individual, a crew, or the mining operation, or to challenge workers and management toward attaining improved performances. Built on the systematic use of operational data, the Training in Operations Process (TOP) approach is designed to integrate features such as diligence in monitoring and evaluating performances into operational performance decisions. The approach may also be applied to other (non-training) interventions by mine management. Based on research on the state of training and worker performance in longwall mining, coal preparation plants, and underground haulage operations, this approach provides a practical five-step process for managers to implement and focus training so that it coincides with the organization's productivity and safety goals. The system permits management to plan, organize, and schedule task training, cross training, annual refresher training, and specialized skills training for miners' regular job assignments, and for their back-up, or fill-in-roles.

  12. EVALUATION OF A TWO-STAGE PASSIVE TREATMENT APPROACH FOR MINING INFLUENCE WATERS

    EPA Science Inventory

    A two-stage passive treatment approach was assessed at bench-scale using two Colorado Mining Influenced Waters (MIWs). The first-stage was a limestone drain with the purpose of removing iron and aluminum and mitigating the potential effects of mineral acidity. The second stage w...

  13. Ultrabroadband photonic Internet: data mining approach to security aspects

    NASA Astrophysics Data System (ADS)

    Kalicki, Arkadiusz

    2009-06-01

    Web applications have become the most popular medium on the Internet. The popularity and ease of use of web application frameworks, together with careless development, result in a high number of vulnerabilities and attacks. Several types of attack are possible because of improper input validation. SQL injection is the ability to execute arbitrary SQL queries in a database through an existing application. Cross-site scripting is a vulnerability that allows malicious web users to inject code into the web pages viewed by other users. Cross-Site Request Forgery (CSRF) is an attack that tricks the victim into loading a page that contains a malicious request. Web spam in blogs is a further such attack. To secure web applications, intrusion detection systems (IDS) and intrusion prevention systems (IPS) are used. Intrusion detection systems are divided into two groups: misuse detection (traditional IDS) and anomaly detection. Misuse detection systems are signature based and have high accuracy in detecting many kinds of known attacks, but they cannot detect unknown and emerging attacks. They can be complemented with anomaly-based intrusion detection and prevention systems. This paper presents an anomaly-driven proxy as an IPS and a data mining based algorithm that was used to detect anomalies. The principle of this method is the comparison of the incoming HTTP traffic with a previously built profile that contains a representation of the "normal" or expected web application usage sequence patterns. The frequent sequence patterns are found with the GSP algorithm. Some basic tests show that the software catches malicious requests.

  14. Data mining approach to web application intrusions detection

    NASA Astrophysics Data System (ADS)

    Kalicki, Arkadiusz

    2011-10-01

    Web applications have become the most popular medium on the Internet. The popularity and ease of use of web application script languages and frameworks, together with careless development, result in a high number of web application vulnerabilities and a high number of attacks. Several types of attack are possible because of improper input validation: SQL injection, cross-site scripting, Cross-Site Request Forgery (CSRF), web spam in blogs and others. To secure web applications, intrusion detection systems (IDS) and intrusion prevention systems (IPS) are used. Intrusion detection systems are divided into two groups: misuse detection (traditional IDS) and anomaly detection. This paper presents a data mining based algorithm for anomaly detection. The principle of this method is the comparison of the incoming HTTP traffic with a previously built profile that contains a representation of the "normal" or expected web application usage sequence patterns. The frequent sequence patterns are found with the GSP algorithm. The previously presented detection method was rewritten and improved. Some tests show that the software catches malicious requests, especially long attack sequences; results are quite good for medium-length sequences, but for short sequences the method must be complemented with other methods.
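    The profile-comparison principle described above can be sketched in a few lines. The example below is a deliberate simplification and not the paper's implementation: instead of GSP, it counts contiguous request windows that recur across sessions, and it scores a new session by the share of its windows missing from that profile. The function names, the window length `n`, and the `min_support` threshold are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Set, Tuple

Pattern = Tuple[str, ...]


def windows(session: Sequence[str], n: int) -> List[Pattern]:
    """All contiguous length-n request windows of one session."""
    return [tuple(session[i:i + n]) for i in range(len(session) - n + 1)]


def build_profile(sessions: Iterable[Sequence[str]], n: int, min_support: int) -> Set[Pattern]:
    """Keep windows that occur in at least `min_support` training sessions.

    Simplification: real GSP also mines non-contiguous patterns through
    iterative candidate generation; this sketch only counts contiguous windows.
    """
    support: Counter = Counter()
    for session in sessions:
        support.update(set(windows(session, n)))  # count each window once per session
    return {pattern for pattern, count in support.items() if count >= min_support}


def anomaly_score(session: Sequence[str], profile: Set[Pattern], n: int) -> float:
    """Fraction of the session's windows that the profile has never seen."""
    observed = windows(session, n)
    if not observed:
        return 0.0  # session too short to judge with this window length
    unseen = sum(1 for w in observed if w not in profile)
    return unseen / len(observed)
```

    A proxy built on this idea would flag or block sessions whose score exceeds some threshold. Note that a session shorter than `n` always scores 0.0 here, which mirrors the abstract's remark that short sequences need complementary methods.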

  15. A quantitative approach to identifying predators from nest remains

    USGS Publications Warehouse

    Anthony, R. Michael; Grand, J.B.; Fondell, T.F.; Manly, B.F.

    2004-01-01

    Nesting success of Dusky Canada Geese (Branta canadensis occidentalis) has declined greatly since a major earthquake affected southern Alaska in 1964. To identify nest predators, we collected predation data at goose nests and photographs of predators at natural nests containing artificial eggs in 1997-2000. To document feeding behavior by nest predators, we compiled the evidence from destroyed nests with known predators on our study site and from previous studies. We constructed a profile for each predator group and compared the evidence from 895 nests with unknown predators to our predator profiles using mixture-model analysis. This analysis indicated that 72% of destroyed nests were depredated by Bald Eagles and 13% by brown bears, and also yielded the probability that each nest was correctly assigned to a predator group based on model fit. Model testing using simulations indicated that the proportion estimated for eagle predation was unbiased and the proportion for bear predation was slightly overestimated. This approach may have application whenever there are adequate data on nests destroyed by known predators and predators exhibit different feeding behavior at nests.

  16. Reverse Vaccinology: An Approach for Identifying Leptospiral Vaccine Candidates

    PubMed Central

    Dellagostin, Odir A.; Grassmann, André A.; Rizzi, Caroline; Schuch, Rodrigo A.; Jorge, Sérgio; Oliveira, Thais L.; McBride, Alan J. A.; Hartwig, Daiane D.

    2017-01-01

    Leptospirosis is a major public health problem with an incidence of over one million human cases each year. It is a globally distributed, zoonotic disease and is associated with significant economic losses in farm animals. Leptospirosis is caused by pathogenic Leptospira spp. that can infect a wide range of domestic and wild animals. Given the inability to control the cycle of transmission among animals and humans, there is an urgent demand for a new vaccine. Inactivated whole-cell vaccines (bacterins) are routinely used in livestock and domestic animals, however, protection is serovar-restricted and short-term only. To overcome these limitations, efforts have focused on the development of recombinant vaccines, with partial success. Reverse vaccinology (RV) has been successfully applied to many infectious diseases. A growing number of leptospiral genome sequences are now available in public databases, providing an opportunity to search for prospective vaccine antigens using RV. Several promising leptospiral antigens were identified using this approach, although only a few have been characterized and evaluated in animal models. In this review, we summarize the use of RV for leptospirosis and discuss the need for potential improvements for the successful development of a new vaccine towards reducing the burden of human and animal leptospirosis. PMID:28098813

  17. Identifying heterogeneous anisotropic properties in cerebral aneurysms: a pointwise approach.

    PubMed

    Zhao, Xuefeng; Raghavan, Madhavan L; Lu, Jia

    2011-04-01

    The traditional approaches of estimating heterogeneous properties in a soft tissue structure using optimization-based inverse methods often face difficulties because of the large number of unknowns to be simultaneously determined. This article proposes a new method for identifying the heterogeneous anisotropic nonlinear elastic properties in cerebral aneurysms. In this method, the local properties are determined directly from the pointwise stress-strain data, thus avoiding the need for simultaneously optimizing for the property values at all points/regions in the aneurysm. The stress distributions needed for a pointwise identification are computed using an inverse elastostatic method without invoking the material properties in question. This paradigm is tested numerically through simulated inflation tests on an image-based cerebral aneurysm sac. The wall tissue is modeled as an eight-ply laminate whose constitutive behavior is described by an anisotropic hyperelastic strain energy function containing four parameters. The parameters are assumed to vary continuously in the sac. Deformed configurations generated from forward finite element analysis are taken as input to inversely establish the parameter distributions. The delineated and the assigned distributions are in excellent agreement. A forward verification is conducted by comparing the displacement solutions obtained from the delineated and the assigned material parameters at a different pressure. The deviations in nodal displacements are found to be within 0.2% in most parts of the sac. The study highlights some distinct features of the proposed method, and demonstrates the feasibility of organ level identification of the distributive anisotropic nonlinear properties in cerebral aneurysms.

  18. An integrated approach for identifying priority contaminant in ...

    EPA Pesticide Factsheets

    Environmental assessment of complex mixtures typically requires integration of chemical and biological measurements. This study demonstrates the use of a combination of instrumental chemical analyses, effects-based monitoring, and bio-effects prediction approaches to help identify potential hazards and priority contaminants in two Great Lakes Areas of Concern (AOCs), the Lower Green Bay/Fox River located near Green Bay, WI, USA and the Milwaukee River Estuary, located near Milwaukee, WI, USA. Fathead minnows were caged at four sites within each AOC (eight sites total). Following 4 d of in situ exposure, tissues and biofluids were sampled and used for targeted biological effects analyses. Additionally, 4 d composite water samples were collected concurrently at each caged fish site and analyzed for 134 analytes as well as evaluated for total estrogenic and androgenic activity using cell-based bioassays. Of the analytes examined, 75 were detected in composite samples from at least one site. Based on multiple analyses, one site in the East River and another site near a paper mill discharge from lower Green Bay/Fox River AOC were prioritized due to their estrogenic and androgenic activity, respectively. The water samples from other sites generally did not exhibit significant estrogenic or androgenic activity, nor was there evidence for endocrine disruption in the fish exposed at these sites as indicated by the lack of alterations in ex vivo steroid production, c

  19. Prediction model for peninsular Indian summer monsoon rainfall using data mining and statistical approaches

    NASA Astrophysics Data System (ADS)

    Vathsala, H.; Koolagudi, Shashidhar G.

    2017-01-01

    In this paper we discuss a data mining application for predicting peninsular Indian summer monsoon rainfall, and propose an algorithm that combines data mining and statistical techniques. We select likely predictors based on association rules that have the highest confidence levels. We then cluster the selected predictors to reduce their dimensions and use cluster membership values for classification. We derive the predictors from local conditions in southern India, including mean sea level pressure, wind speed, and maximum and minimum temperatures. The global condition variables include southern oscillation and Indian Ocean dipole conditions. The algorithm predicts rainfall in five categories: Flood, Excess, Normal, Deficit and Drought. We use closed itemset mining, cluster membership calculations and a multilayer perceptron function in the algorithm to predict monsoon rainfall in peninsular India. Using Indian Institute of Tropical Meteorology data, we found the prediction accuracy of our proposed approach to be exceptionally good.
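
    The predictor-selection step (ranking candidate predictors by the confidence of their association rules) can be sketched as follows. This is an illustrative sketch only: the records, item names, and the helper `rule_confidence` are hypothetical, and the closed itemset mining, clustering, and multilayer perceptron stages of the paper are not shown. Confidence is computed with the standard definition, confidence(A -> C) = support(A and C) / support(A).

```python
def rule_confidence(records, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent over a list of item sets."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [record for record in records if antecedent <= record]
    if not with_antecedent:
        return 0.0
    with_both = [record for record in with_antecedent if consequent <= record]
    return len(with_both) / len(with_antecedent)


# Hypothetical discretized seasonal records (item names are invented).
records = [
    {"pressure=low", "wind=high", "rain=Excess"},
    {"pressure=low", "wind=low", "rain=Excess"},
    {"pressure=high", "wind=high", "rain=Deficit"},
    {"pressure=low", "wind=high", "rain=Normal"},
]

# Rank candidate predictors by the confidence of "predictor -> rain=Excess".
candidates = ["pressure=low", "wind=high"]
ranked = sorted(
    candidates,
    key=lambda item: rule_confidence(records, [item], ["rain=Excess"]),
    reverse=True,
)
print(ranked)
```

    In this toy data the low-pressure item ranks first because two of the three low-pressure records fall in the Excess category, against one of three for high wind.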

  20. A fluorescent approach for identifying P2X1 ligands

    PubMed Central

    Ruepp, Marc-David; Brozik, James A.; de Esch, Iwan J.P.; Farndale, Richard W.; Murrell-Lagnado, Ruth D.; Thompson, Andrew J.

    2015-01-01

    There are no commercially available, small, receptor-specific P2X1 ligands. There are several synthetic derivatives of the natural agonist ATP and some structurally-complex antagonists including compounds such as PPADS, NTP-ATP, suramin and its derivatives (e.g. NF279, NF449). NF449 is the most potent and selective ligand, but potencies of many others are not particularly high and they can also act at other P2X, P2Y and non-purinergic receptors. While there is clearly scope for further work on P2X1 receptor pharmacology, screening can be difficult owing to rapid receptor desensitisation. To reduce desensitisation substitutions can be made within the N-terminus of the P2X1 receptor, but these could also affect ligand properties. An alternative is the use of fluorescent voltage-sensitive dyes that respond to membrane potential changes resulting from channel opening. Here we utilised this approach in conjunction with fragment-based drug-discovery. Using a single concentration (300 μM) we identified 46 novel leads from a library of 1443 fragments (hit rate = 3.2%). These hits were independently validated by measuring concentration-dependence with the same voltage-sensitive dye, and by visualising the competition of hits with an Alexa-647-ATP fluorophore using confocal microscopy; confocal yielded kon (1.142 × 10⁶ M⁻¹ s⁻¹) and koff (0.136 s⁻¹) for Alexa-647-ATP (Kd = 119 nM). The identified hit fragments had promising structural diversity. In summary, the measurement of functional responses using voltage-sensitive dyes was flexible and cost-effective because labelled competitors were not needed, effects were independent of a specific binding site, and both agonist and antagonist actions were probed in a single assay. The method is widely applicable and could be applied to all P2X family members, as well as other voltage-gated and ligand-gated ion channels. This article is part of the Special Issue entitled ‘Fluorescent Tools in Neuropharmacology
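
    The figures quoted in this abstract are mutually consistent, which a few lines of arithmetic can confirm. The sketch assumes simple one-step bimolecular binding, for which the equilibrium dissociation constant is Kd = koff / kon; the variable names are chosen here for illustration.

```python
# Rate constants for Alexa-647-ATP as quoted in the abstract.
k_on = 1.142e6   # association rate constant, M^-1 s^-1
k_off = 0.136    # dissociation rate constant, s^-1

# Assumes simple one-step binding, where Kd = k_off / k_on.
k_d_molar = k_off / k_on
k_d_nanomolar = k_d_molar * 1e9

# Screening hit rate: 46 leads from a library of 1443 fragments.
hits, library_size = 46, 1443
hit_rate_percent = 100 * hits / library_size

print(f"Kd = {k_d_nanomolar:.0f} nM, hit rate = {hit_rate_percent:.1f}%")
```

    The computed values round to the reported Kd of 119 nM and hit rate of 3.2%.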

  1. Identifying predictors of physics item difficulty: A linear regression approach

    NASA Astrophysics Data System (ADS)

    Mesic, Vanes; Muratovic, Hasnija

    2011-06-01

    Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinarily difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model the item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained, quantitative test results. In this study, we conducted a secondary analysis of data that came from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. Foremost, we explored the concept of “physics competence” and performed a content analysis of 123 physics items that were included within the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables which reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure had to be implemented. Finally, a regression model of physics item difficulty was created. It has been shown that 61.2% of item difficulty variance can be explained by factors which reflect the automaticity, complexity, and modality of the knowledge structure that is relevant for generating the most probable correct solution, as well as by the divergence of required thinking and interference effects between intuitive and formal physics knowledge

  2. Image Mining in Remote Sensing for Coastal Wetlands Mapping: from Pixel Based to Object Based Approach

    NASA Astrophysics Data System (ADS)

    Farda, N. M.; Danoedoro, P.; Hartono; Harjoko, A.

    2016-11-01

    Remote sensing image data are now abundant, and the large volume of data creates a “knowledge gap” in the extraction of selected information, especially for coastal wetlands. Coastal wetlands provide ecosystem services essential to people and the environment. The aim of this research is to extract coastal wetlands information from satellite data using pixel-based and object-based image mining approaches. Landsat MSS, Landsat 5 TM, Landsat 7 ETM+, and Landsat 8 OLI images of the Segara Anakan lagoon were selected to represent multi-temporal data. The inputs for image mining are visible and near-infrared bands, PCA band, inverse PCA bands, mean shift segmentation bands, bare soil index, vegetation index, wetness index, elevation from SRTM and ASTER GDEM, and GLCM (Haralick) or variability texture. Three methods were applied to extract coastal wetlands using image mining: pixel-based Decision Tree C4.5, pixel-based Back Propagation Neural Network, and object-based Mean Shift segmentation with Decision Tree C4.5. The results show that remote sensing image mining can be used to map coastal wetland ecosystems. Decision Tree C4.5 achieved the highest mapping accuracy (0.75 overall kappa). The availability of remote sensing image mining for mapping coastal wetlands is very important for a better understanding of the spatiotemporal dynamics and distribution of coastal wetlands.
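
    The accuracy figure in this abstract is an overall kappa. As a reminder of what that statistic measures, the sketch below computes Cohen's kappa from a confusion matrix using the standard definition, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The matrix values are invented for illustration and are not taken from the study.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: reference, columns: predicted)."""
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(size)) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(size)
    ) / (total * total)
    return (observed - expected) / (1 - expected)


# Hypothetical two-class result (wetland vs. non-wetland), 100 validation samples.
confusion = [
    [40, 10],
    [10, 40],
]
kappa = cohen_kappa(confusion)
print(round(kappa, 2))
```

    Here 80% of samples are classified correctly, but half of that agreement would be expected by chance, so kappa is lower than the raw accuracy.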

  3. A software tool for determination of breast cancer treatment methods using data mining approach.

    PubMed

    Cakır, Abdülkadir; Demirel, Burçin

    2011-12-01

    In this work, breast cancer treatment methods are determined using data mining. For this purpose, software was developed to help oncologists suggest treatment methods for breast cancer patients. Data from 462 breast cancer patients, obtained from Ankara Oncology Hospital, are used to determine treatment methods for new patients. This dataset is processed with the Weka data mining tool. Classification algorithms are applied to this dataset one by one, and the results are compared to find the proper treatment method. The developed software, called "Treatment Assistant", has a Java NetBeans interface and uses different algorithms (IB1, Multilayer Perceptron and Decision Table) to find out which one gives the better result for each attribute to be predicted. Treatment methods following surgery are determined for breast cancer patients using this software tool. At the modeling step of the data mining process, different Weka algorithms are used for the output attributes. IB1 for the hormonotherapy output, Multilayer Perceptron for the tamoxifen and radiotherapy outputs, and the Decision Table algorithm for the chemotherapy output show the best accuracy. In conclusion, this work shows that a data mining approach can be a useful tool for medical applications, particularly at the treatment decision step. Data mining helps the doctor decide in a short time.

  4. Data mining approaches to high-throughput crystal structure and compound prediction.

    PubMed

    Hautier, Geoffroy

    2014-01-01

    Predicting unknown inorganic compounds and their crystal structure is a critical step of high-throughput computational materials design and discovery. One way to achieve efficient compound prediction is to use data mining or machine learning methods. In this chapter we present a few algorithms for data mining compound prediction and their applications to different materials discovery problems. In particular, the patterns or correlations governing phase stability for experimental or computational inorganic compound databases are statistically learned and used to build probabilistic or regression models to identify novel compounds and their crystal structures. The stability of those compound candidates is then assessed using ab initio techniques. Finally, we report a few cases where data mining driven computational predictions were experimentally confirmed through inorganic synthesis.

  5. Discovering drug–drug interactions: a text-mining and reasoning approach based on properties of drug metabolism

    PubMed Central

    Tari, Luis; Anwar, Saadat; Liang, Shanshan; Cai, James; Baral, Chitta

    2010-01-01

    Motivation: Identifying drug–drug interactions (DDIs) is a critical process in drug administration and drug development. Clinical support tools often provide comprehensive lists of DDIs, but they usually lack supporting scientific evidence, and different tools can return inconsistent results. In this article, we propose a novel approach that integrates text mining and automated reasoning to derive DDIs. Through the extraction of various facts of drug metabolism, not only the DDIs that are explicitly mentioned in text can be extracted but also the potential interactions that can be inferred by reasoning. Results: Our approach was able to find several potential DDIs that are not present in DrugBank. We manually evaluated these interactions based on their supporting evidence, and our analysis revealed that 81.3% of these interactions are determined to be correct. This suggests that our approach can uncover potential DDIs with scientific evidence explaining the mechanism of the interactions. Contact: luis.tari@roche.com PMID:20823320

  6. Data Mining: A Systems Approach to Formative Assessment

    ERIC Educational Resources Information Center

    Schmid, Dale

    2012-01-01

    This article describes how using raw data and information from reliable assessments can inform teachers' decisions leading to improved instruction. The primary aim is to use a systems approach to provide evidence of what students know and how they demonstrate mastery. Such evidence can empower teachers to reach all students. The pedagogic…

  7. An Efficient Pattern Mining Approach for Event Detection in Multivariate Temporal Data.

    PubMed

    Batal, Iyad; Cooper, Gregory; Fradkin, Dmitriy; Harrison, James; Moerchen, Fabian; Hauskrecht, Milos

    2016-01-01

    This work proposes a pattern mining approach to learn event detection models from complex multivariate temporal data, such as electronic health records. We present Recent Temporal Pattern mining, a novel approach for efficiently finding predictive patterns for event detection problems. This approach first converts the time series data into time-interval sequences of temporal abstractions. It then constructs more complex time-interval patterns backward in time using temporal operators. We also present the Minimal Predictive Recent Temporal Patterns framework for selecting a small set of predictive and non-spurious patterns. We apply our methods for predicting adverse medical events in real-world clinical data. The results demonstrate the benefits of our methods in learning accurate event detection models, which is a key step for developing intelligent patient monitoring and decision support systems.
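
    The first step described in this abstract, converting a time series into time-interval sequences of temporal abstractions, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: it uses a simple value abstraction with three states (L, N, H) and hypothetical thresholds, and it does not cover the backward pattern construction or the Minimal Predictive Recent Temporal Patterns framework.

```python
def to_state(value, low, high):
    """Map a numeric value to a qualitative state: Low, Normal, or High."""
    if value < low:
        return "L"
    if value > high:
        return "H"
    return "N"


def abstract_series(samples, low, high):
    """Merge consecutive samples with the same state into (state, start, end) intervals."""
    intervals = []
    for time, value in samples:
        state = to_state(value, low, high)
        if intervals and intervals[-1][0] == state:
            # Same state as the previous sample: extend the current interval.
            previous_state, start, _ = intervals[-1]
            intervals[-1] = (previous_state, start, time)
        else:
            intervals.append((state, time, time))
    return intervals


# Hypothetical lab measurements as (day, value) pairs; thresholds are invented.
samples = [(0, 250), (1, 240), (2, 140), (3, 130), (4, 90)]
intervals = abstract_series(samples, low=150, high=400)
print(intervals)
```

    Each variable in a multivariate record would be abstracted this way, and the resulting intervals can then be related to one another with temporal operators such as "before" or "co-occurs".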

  8. An Efficient Pattern Mining Approach for Event Detection in Multivariate Temporal Data

    PubMed Central

    Batal, Iyad; Cooper, Gregory; Fradkin, Dmitriy; Harrison, James; Moerchen, Fabian; Hauskrecht, Milos

    2015-01-01

    This work proposes a pattern mining approach to learn event detection models from complex multivariate temporal data, such as electronic health records. We present Recent Temporal Pattern mining, a novel approach for efficiently finding predictive patterns for event detection problems. This approach first converts the time series data into time-interval sequences of temporal abstractions. It then constructs more complex time-interval patterns backward in time using temporal operators. We also present the Minimal Predictive Recent Temporal Patterns framework for selecting a small set of predictive and non-spurious patterns. We apply our methods for predicting adverse medical events in real-world clinical data. The results demonstrate the benefits of our methods in learning accurate event detection models, which is a key step for developing intelligent patient monitoring and decision support systems. PMID:26752800

  9. An effective data mining approach for structure damage identification

    NASA Astrophysics Data System (ADS)

    Hong, Soonyoung

    An efficient, neural network based, online nondestructive structural damage identification procedure is developed for determining the damage characteristics (the damage locations and the corresponding severity) from dynamic measurements in near real-time. The procedure utilizes unique data processing techniques to track the most useful modal information based on modal strain energy and to calculate the associated data based on principal component analysis for further processing in a neural network based identification scheme. With two unique features, this approach is significantly different from currently available damage identification procedures for real-time structural integrity monitoring/diagnostics. First, the most sensitive mode for the specific damage is selected in an automatic process which increases the accuracy of damage identification and decreases time spent on neural network training. Second, the approach creates unique data that extracts core characteristics from modal information for a number of different damage cases; and consequently, the accuracy of the damage identification improves significantly. This approach can be operated online providing real time structural damage identification. The method is tested for simulated damage cases, including situations of single and multiple damage in the closely-spaced frequencies of Kabe's model. The philosophy behind the proposed research is to provide a means to online and nondestructively predict the degradation of a structure's integrity (i.e. damage location and the corresponding severity, strength loss).

  10. A data mining based approach to predict spatiotemporal changes in satellite images

    NASA Astrophysics Data System (ADS)

    Boulila, W.; Farah, I. R.; Ettabaa, K. Saheb; Solaiman, B.; Ghézala, H. Ben

    2011-06-01

    The interpretation of remotely sensed images in a spatiotemporal context is becoming a valuable research topic. However, the constant growth of data volume in remote sensing imaging makes reaching conclusions based on collected data a challenging task. Recently, data mining appears to be a promising research field leading to several interesting discoveries in various areas such as marketing, surveillance, fraud detection and scientific discovery. By integrating data mining and image interpretation techniques, accurate and relevant information (i.e. functional relation between observed parcels and a set of informational contents) can be automatically elicited. This study presents a new approach to predict spatiotemporal changes in satellite image databases. The proposed method exploits fuzzy sets and data mining concepts to build predictions and decisions for several remote sensing fields. It takes into account imperfections related to the spatiotemporal mining process in order to provide more accurate and reliable information about land cover changes in satellite images. The proposed approach is validated using SPOT images representing the Saint-Denis region, capital of Reunion Island. Results show good performances of the proposed framework in predicting change for the urban zone.

  11. Identifying Similarities in Cognitive Subtest Functional Requirements: An Empirical Approach

    ERIC Educational Resources Information Center

    Frisby, Craig L.; Parkin, Jason R.

    2007-01-01

    In the cognitive test interpretation literature, a Rational/Intuitive, Indirect Empirical, or Combined approach is typically used to construct conceptual taxonomies of the functional (behavioral) similarities between subtests. To address shortcomings of these approaches, the functional requirements for 49 subtests from six individually…

  12. New Seasonal Shift in In-Stream Diurnal Nitrate Cycles Identified by Mining High-Frequency Data

    PubMed Central

    2016-01-01

    The recent development of in-situ monitoring devices, such as UV-spectrometers, makes the study of short-term stream chemistry variation relevant, especially the study of diurnal cycles, which are not yet fully understood. Our study is based on high-frequency data from an agricultural catchment (Studienlandschaft Schwingbachtal, Germany). We propose a novel approach, i.e. the combination of cluster analysis and Linear Discriminant Analysis, to mine nitrate behavior patterns from these data. As a result, we observe a seasonality of nitrate diurnal cycles that differs from the most common cycle seasonality described in the literature, i.e. pre-dawn peaks in spring. Our cycles appear in summer and the maximum and minimum shift to a later time in late summer/autumn. This is observed both for water- and energy-limited years, thus potentially stressing the role of evapotranspiration. This concluding hypothesis on the role of evapotranspiration on nitrate stream concentration, which was obtained through data mining, broadens the perspective on the diurnal cycling of stream nitrate concentrations. PMID:27073838

  13. A new approach to estimate fugitive methane emissions from coal mining in China.

    PubMed

    Ju, Yiwen; Sun, Yue; Sa, Zhanyou; Pan, Jienan; Wang, Jilin; Hou, Quanlin; Li, Qingguang; Yan, Zhifeng; Liu, Jie

    2016-02-01

    Developing a more accurate greenhouse gas (GHG) emissions inventory draws much attention. Because of its resource endowment and technical status, China has made coal-related GHG emissions a big part of its inventory. Lacking a stoichiometric carbon conversion coefficient and influenced by geological conditions and mining technologies, previous efforts to estimate fugitive methane emissions from coal mining in China have led to disagreeing results. This paper proposes a new calculation methodology to determine fugitive methane emissions from coal mining based on the domestic analysis of gas geology, gas emission features, and the merits and demerits of existing estimation methods. This new approach involves four main parameters: in-situ original gas content, gas remaining post-desorption, raw coal production, and mining influence coefficient. The case studies in Huaibei-Huainan Coalfield and Jincheng Coalfield show that the new method obtains the smallest error, +9.59% and 7.01% respectively, compared with the other methods in this study, Tier 1 and Tier 2 (with two samples), which resulted in +140.34%, +138.90%, and -18.67% in Huaibei-Huainan Coalfield and +64.36%, +47.07%, and -14.91% in Jincheng Coalfield. Compared with the predominantly used methods, the new one not only has a comparably simpler process and lower uncertainty than the "emission factor method" (IPCC recommended Tier 1 and Tier 2), but also has easier data accessibility, similar uncertainty, and additional post-mining emissions compared to the "absolute gas emission method" (IPCC recommended Tier 3). Therefore, methane emissions dissipated from most of the producing coal mines worldwide could be more accurately and more easily estimated.

  14. Novel LanT Associated Lantibiotic Clusters Identified by Genome Database Mining

    PubMed Central

    Singh, Mangal; Sareen, Dipti

    2014-01-01

    Background: Frequent use of antibiotics has led to the emergence of antibiotic resistance in bacteria. Lantibiotic compounds are ribosomally synthesized antimicrobial peptides against which bacteria are not able to develop resistance, making them a good alternative to antibiotics. Nisin is the oldest and most widely used lantibiotic, applied in food preservation, and no significant resistance has developed against it. Given their antimicrobial potential and limited number, there is a need to identify novel lantibiotics. Methodology/Findings: Identification of novel lantibiotic biosynthetic clusters from an ever increasing database of bacterial genomes can provide a major lead in this direction. To achieve this, a strategy was adopted to identify novel lantibiotic biosynthetic clusters by screening the sequenced genomes for LanT homologs; LanT is a conserved lantibiotic transporter specific to type IB clusters. This strategy resulted in the identification of 54 bacterial strains containing LanT homologs that are not known lantibiotic producers. Of these, 24 strains were subjected to a detailed bioinformatic analysis to identify genes encoding precursor peptides, modification enzymes, and immunity and quorum sensing proteins. Eight clusters having two LanM determinants, similar to haloduracin and lichenicidin, were identified, along with 13 clusters having a single LanM determinant, as in the mersacidin biosynthetic cluster. Besides these, orphan LanT homologs were also identified, which might be associated with novel bacteriocins encoded elsewhere in the genome. Three identified gene clusters had a C39 domain containing LanT transporter, associated with the LanBC proteins and double glycine type precursor peptides; the only known example of such a cluster is that of salivaricin. Conclusion: This study led to the identification of 8 novel putative two-component lantibiotic clusters along with 13 having a single LanM and 3 with LanBC genes

  15. DNA enrichment approaches to identify unauthorized genetically modified organisms (GMOs).

    PubMed

    Arulandhu, Alfred J; van Dijk, Jeroen P; Dobnik, David; Holst-Jensen, Arne; Shi, Jianxin; Zel, Jana; Kok, Esther J

    2016-07-01

    With the increased global production of different genetically modified (GM) plant varieties, chances increase that unauthorized GM organisms (UGMOs) may enter the food chain. At the same time, the detection of UGMOs is a challenging task because of the limited sequence information that will generally be available. PCR-based methods are available to detect and quantify known UGMOs in specific cases. If this approach is not feasible, DNA enrichment of the unknown adjacent sequences of known GMO elements is one way to detect the presence of UGMOs in a food or feed product. These enrichment approaches are also known as chromosome walking or gene walking (GW). In recent years, enrichment approaches have been coupled with next generation sequencing (NGS) analysis and implemented in, amongst others, the medical and microbiological fields. The present review will provide an overview of these approaches and an evaluation of their applicability in the identification of UGMOs in complex food or feed samples.

  16. A Temporal Pattern Mining Approach for Classifying Electronic Health Record Data.

    PubMed

    Batal, Iyad; Valizadegan, Hamed; Cooper, Gregory F; Hauskrecht, Milos

    2013-09-01

    We study the problem of learning classification models from complex multivariate temporal data encountered in electronic health record systems. The challenge is to define a good set of features that are able to represent well the temporal aspect of the data. Our method relies on temporal abstractions and temporal pattern mining to extract the classification features. Temporal pattern mining usually returns a large number of temporal patterns, most of which may be irrelevant to the classification task. To address this problem, we present the Minimal Predictive Temporal Patterns framework to generate a small set of predictive and non-spurious patterns. We apply our approach to the real-world clinical task of predicting patients who are at risk of developing heparin induced thrombocytopenia. The results demonstrate the benefit of our approach in efficiently learning accurate classifiers, which is a key step for developing intelligent clinical monitoring systems.

  17. Mining a Written Values Affirmation Intervention to Identify the Unique Linguistic Features of Stigmatized Groups

    ERIC Educational Resources Information Center

    Riddle, Travis; Bhagavatula, Sowmya Sree; Guo, Weiwei; Muresan, Smaranda; Cohen, Geoff; Cook, Jonathan E.; Purdie-Vaughns, Valerie

    2015-01-01

    Social identity threat refers to the process through which an individual underperforms in some domain due to their concern with confirming a negative stereotype held about their group. Psychological research has identified this as one contributor to the underperformance and underrepresentation of women, Blacks, and Latinos in STEM fields. Over the…

  18. Identifying the Factors Affecting Science and Mathematics Achievement Using Data Mining Methods

    ERIC Educational Resources Information Center

    Kiray, S. Ahmet; Gok, Bilge; Bozkir, A. Selman

    2015-01-01

    The purpose of this article is to identify the order of significance of the variables that affect science and mathematics achievement in middle school students. For this aim, the study deals with the relationship between science and math in terms of different angles using the perspectives of multiple causes-single effect and of multiple…

  19. Using Data Mining to Identify Actionable Information: Breaking New Ground in Data-Driven Decision Making

    ERIC Educational Resources Information Center

    Streifer, Philip A.; Schumann, Jeffrey A.

    2005-01-01

    The implementation of No Child Left Behind (NCLB) presents important challenges for schools across the nation to identify problems that lead to poor performance. Yet schools must intervene with instructional programs that can make a difference and evaluate the effectiveness of such programs. New advances in artificial intelligence (AI) data-mining…

  20. Combustion efficiency optimization and virtual testing: A data-mining approach

    SciTech Connect

    Kusiak, A.; Song, Z.

    2006-08-15

    In this paper, a data-mining approach is applied to optimize combustion efficiency of a coal-fired boiler. The combustion process is complex, nonlinear, and nonstationary. A virtual testing procedure is developed to validate the results produced by the optimization methods. The developed procedure quantifies improvements in the combustion efficiency without performing live testing, which is expensive and time consuming. The ideas introduced in this paper are illustrated with an industrial case study.

  1. An ecosystem approach to evaluate restoration measures in the lignite mining district of Lusatia/Germany

    NASA Astrophysics Data System (ADS)

    Schaaf, Wolfgang

    2015-04-01

    Lignite mining in Lusatia has a history of over 100 years. Open-cast mining directly affected an area of 1000 km². For the past 20 years we have applied an ecosystem-oriented approach to evaluate the development and site characteristics of post-mining areas, which are mainly restored for agricultural and silvicultural land use. Water and element budgets of afforested sites were studied under different geochemical settings in a chronosequence approach (Schaaf 2001), as was the effect of soil amendments such as sewage sludge or compost in restoration (Schaaf & Hüttl 2006). For the past 10 years we have also studied the development of natural site regeneration in the constructed catchment Chicken Creek at the watershed scale (Schaaf et al. 2011, 2013). One of the striking characteristics of post-mining sites is a very large small-scale soil heterogeneity that has to be taken into account with respect to soil-forming processes and element cycling. Results from these studies, in combination with smaller-scale process studies, make it possible to evaluate the long-term effect of restoration measures and adapted land use options. In addition, it is crucial to compare these results with data from undisturbed, i.e. non-mined, sites. Schaaf, W., 2001: What can element budgets of false-time series tell us about ecosystem development on post-lignite mining sites? Ecological Engineering 17, 241-252. Schaaf, W. and Hüttl, R. F., 2006: Direct and indirect effects of soil pollution by lignite mining. Water, Air and Soil Pollution - Focus 6, 253-264. Schaaf, W., Bens, O., Fischer, A., Gerke, H.H., Gerwin, W., Grünewald, U., Holländer, H.M., Kögel-Knabner, I., Mutz, M., Schloter, M., Schulin, R., Veste, M., Winter, S. & Hüttl, R.F., 2011: Patterns and processes of initial terrestrial-ecosystem development. Journal of Plant Nutrition and Soil Science, 174, 229-239. Schaaf, W., Elmer, M., Fischer, A., Gerwin, W., Nenov, R., Pretsch, H. and Zaplate, M.K., 2013: Feedbacks between vegetation, surface structures and hydrology

  2. A Novel Approach for Mining Polymorphic Microsatellite Markers In Silico

    PubMed Central

    Hoffman, Joseph I.; Nichols, Hazel J.

    2011-01-01

    An important emerging application of high-throughput 454 sequencing is the isolation of molecular markers such as microsatellites from genomic DNA. However, few studies have developed microsatellites from cDNA despite the added potential for targeting candidate genes. Moreover, to develop microsatellites usually requires the evaluation of numerous primer pairs for polymorphism in the focal species. This can be time-consuming and wasteful, particularly for taxa with low genetic diversity where the majority of primers often yield monomorphic polymerase chain reaction (PCR) products. Transcriptome assemblies provide a convenient solution, functional annotation of transcripts allowing markers to be targeted towards candidate genes, while high sequence coverage in principle permits the assessment of variability in silico. Consequently, we evaluated fifty primer pairs designed to amplify microsatellites, primarily residing within transcripts related to immunity and growth, identified from an Antarctic fur seal (Arctocephalus gazella) transcriptome assembly. In silico visualization was used to classify each microsatellite as being either polymorphic or monomorphic and to quantify the number of distinct length variants, each taken to represent a different allele. The majority of loci (n = 36, 76.0%) yielded interpretable PCR products, 23 of which were polymorphic in a sample of 24 fur seal individuals. Loci that appeared variable in silico were significantly more likely to yield polymorphic PCR products, even after controlling for microsatellite length measured in silico. We also found a significant positive relationship between inferred and observed allele number. This study not only demonstrates the feasibility of generating modest panels of microsatellites targeted towards specific classes of gene, but also suggests that in silico microsatellite variability may provide a useful proxy for PCR product polymorphism. PMID:21853104
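    As a rough illustration of the in silico classification described above, the sketch below counts distinct repeat-length variants among reads covering one locus and labels the locus polymorphic when more than one variant is seen. It is a toy example under stated assumptions: the function names, the reads, and the rule "more than one distinct length means polymorphic" are hypothetical, and the study's actual visualization and scoring procedure is not reproduced here.

```python
import re


def repeat_count(read, motif):
    """Return the longest run of tandem copies of `motif` found in `read`."""
    runs = re.findall(f"(?:{re.escape(motif)})+", read)
    return max((len(run) // len(motif) for run in runs), default=0)


def classify_locus(reads, motif):
    """Classify a locus from the distinct repeat counts seen across reads.

    Each distinct repeat count is treated as a putative allele.
    """
    alleles = {repeat_count(read, motif) for read in reads}
    alleles.discard(0)  # ignore reads that do not contain the motif
    status = "polymorphic" if len(alleles) > 1 else "monomorphic"
    return status, sorted(alleles)


# Hypothetical reads covering an (AC)n microsatellite.
variable_reads = ["TTACACACACGG", "TTACACACACACGG", "TTACACACACGG"]
uniform_reads = ["GGACACACTT", "CCACACACAA"]

print(classify_locus(variable_reads, "AC"))
print(classify_locus(uniform_reads, "AC"))
```

    A real pipeline would also need a minimum read depth per variant before calling an allele, since sequencing errors can mimic length variation.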

  3. Identifying Predictors of Physics Item Difficulty: A Linear Regression Approach

    ERIC Educational Resources Information Center

    Mesic, Vanes; Muratovic, Hasnija

    2011-01-01

    Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinary…

  4. An Approach for Identifying Benefit Segments among Prospective College Students.

    ERIC Educational Resources Information Center

    Miller, Patrick; And Others

    1990-01-01

    A study investigated the importance to 578 applicants of various benefits offered by a moderately selective private university. Applicants rated the institution on 43 academic, social, financial, religious, and curricular attributes. The objective was to test the efficacy of one approach to college market segmentation. Results support the utility…

  5. Genomic approaches to identifying transcriptional regulators of osteoblast differentiation

    NASA Technical Reports Server (NTRS)

    Stains, Joseph P.; Civitelli, Roberto

    2003-01-01

    Recent microarray studies of mouse and human osteoblast differentiation in vitro have identified novel transcription factors that may be important in the establishment and maintenance of differentiation. These findings help unravel the pattern of gene-expression changes that underlie the complex process of bone formation.

  6. Tiered High-Throughput Screening Approach to Identify ...

    EPA Pesticide Factsheets

    High-throughput screening (HTS) for potential thyroid–disrupting chemicals requires a system of assays to capture multiple molecular-initiating events (MIEs) that converge on perturbed thyroid hormone (TH) homeostasis. Screening for MIEs specific to TH-disrupting pathways is limited in the US EPA ToxCast screening assay portfolio. To fill one critical screening gap, the Amplex UltraRed-thyroperoxidase (AUR-TPO) assay was developed to identify chemicals that inhibit TPO, as decreased TPO activity reduces TH synthesis. The ToxCast Phase I and II chemical libraries, comprised of 1,074 unique chemicals, were initially screened using a single, high concentration to identify potential TPO inhibitors. Chemicals positive in the single concentration screen were retested in concentration-response. Due to high false positive rates typically observed with loss-of-signal assays such as AUR-TPO, we also employed two additional assays in parallel to identify possible sources of nonspecific assay signal loss, enabling stratification of roughly 300 putative TPO inhibitors based upon selective AUR-TPO activity. A cell-free luciferase inhibition assay was used to identify nonspecific enzyme inhibition among the putative TPO inhibitors, and a cytotoxicity assay using a human cell line was used to estimate the cellular tolerance limit. Additionally, the TPO inhibition activities of 150 chemicals were compared between the AUR-TPO and an orthogonal peroxidase oxidation assay using

  7. Identifying the "Truly Disadvantaged": A Comprehensive Biosocial Approach

    ERIC Educational Resources Information Center

    Barnes, J. C.; Beaver, Kevin M.; Connolly, Eric J.; Schwartz, Joseph A.

    2016-01-01

    There has been significant interest in examining the developmental factors that predispose individuals to chronic criminal offending. This body of research has identified some social-environmental risk factors as potentially important. At the same time, the research producing these results has generally failed to employ genetically sensitive…

  8. Systems Approaches to Identifying Gene Regulatory Networks in Plants

    PubMed Central

    Long, Terri A.; Brady, Siobhan M.; Benfey, Philip N.

    2009-01-01

    Complex gene regulatory networks are composed of genes, noncoding RNAs, proteins, metabolites, and signaling components. The availability of genome-wide mutagenesis libraries; large-scale transcriptome, proteome, and metabolome data sets; and new high-throughput methods that uncover protein interactions underscores the need for mathematical modeling techniques that better enable scientists to synthesize these large amounts of information and to understand the properties of these biological systems. Systems biology approaches can allow researchers to move beyond a reductionist approach and to both integrate and comprehend the interactions of multiple components within these systems. Descriptive and mathematical models for gene regulatory networks can reveal emergent properties of these plant systems. This review highlights methods that researchers are using to obtain large-scale data sets, and examples of gene regulatory networks modeled with these data. Emergent properties revealed by the use of these network models and perspectives on the future of systems biology are discussed. PMID:18616425

  9. A Computational Approach for Identifying Synergistic Drug Combinations

    PubMed Central

    Gayvert, Kaitlyn M.; Aly, Omar; Bosenberg, Marcus W.; Stern, David F.; Elemento, Olivier

    2017-01-01

    A promising alternative to address the problem of acquired drug resistance is to rely on combination therapies. Identification of the right combinations is often accomplished through trial and error, a labor and resource intensive process whose scale quickly escalates as more drugs can be combined. To address this problem, we present a broad computational approach for predicting synergistic combinations using easily obtainable single drug efficacy, no detailed mechanistic understanding of drug function, and limited drug combination testing. When applied to mutant BRAF melanoma, we found that our approach exhibited significant predictive power. Additionally, we validated previously untested synergy predictions involving anticancer molecules. As additional large combinatorial screens become available, this methodology could prove to be impactful for identification of drug synergy in context of other types of cancers. PMID:28085880

  10. Using social connection information to improve opinion mining: Identifying negative sentiment about HPV vaccines on Twitter.

    PubMed

    Zhou, Xujuan; Coiera, Enrico; Tsafnat, Guy; Arachi, Diana; Ong, Mei-Sing; Dunn, Adam G

    2015-01-01

    The manner in which people preferentially interact with others like themselves suggests that information about social connections may be useful in the surveillance of opinions for public health purposes. We examined if social connection information from tweets about human papillomavirus (HPV) vaccines could be used to train classifiers that identify anti-vaccine opinions. From 42,533 tweets posted between October 2013 and March 2014, 2,098 were sampled at random and two investigators independently identified anti-vaccine opinions. Machine learning methods were used to train classifiers using the first three months of data, including content (8,261 text fragments) and social connections (10,758 relationships). Connection-based classifiers performed similarly to content-based classifiers on the first three months of training data, and performed more consistently than content-based classifiers on test data from the subsequent three months. The most accurate classifier achieved an accuracy of 88.6% on the test data set, and used only social connection features. Information about how people are connected, rather than what they write, may be useful for improving public health surveillance methods on Twitter.

  11. Evaluation of the approach to respirable quartz exposure control in U.S. coal mines.

    PubMed

    Joy, Gerald J

    2012-01-01

    Occupational exposure to high levels of respirable quartz can result in respiratory and other diseases in humans. The Mine Safety and Health Administration (MSHA) regulates exposure to respirable quartz in coal mines indirectly through reductions in the respirable coal mine dust exposure limit based on the content of quartz in the airborne respirable dust. This reduction is implemented when the quartz content of airborne respirable dust exceeds 5% by weight. The intent of this dust standard reduction is to restrict miners' exposure to respirable quartz to a time-weighted average concentration of 100 μg/m³. The effectiveness of this indirect approach to control quartz exposure was evaluated by analyzing respirable dust samples collected by MSHA inspectors from 1995 through 2008. The performance of the current regulatory approach was found to be lacking due to the use of a variable property (quartz content in airborne dust) to establish a standard for subsequent exposures. In one situation, 11.7% (4370/37,346) of samples that were below the applicable respirable coal mine dust exposure limit exceeded 100 μg/m³ quartz. In a second situation, 4.4% (895/20,560) of samples with 5% or less quartz content in the airborne respirable dust exceeded 100 μg/m³ quartz. In these two situations, the samples exceeding 100 μg/m³ quartz were not subject to any potential compliance action. Therefore, the current respirable quartz exposure control approach does not reliably maintain miner exposure below 100 μg/m³ quartz. A separate and specific respirable quartz exposure standard may improve control of coal miners' occupational exposure to respirable quartz.
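    The arithmetic behind the two situations in this abstract can be checked with a short sketch. The quartz concentration of a sample is its dust concentration multiplied by its quartz fraction; the sample counts come from the abstract, while the individual sample values and the 2.0 mg/m³ dust limit used for illustration are assumptions, not data from the study.

```python
QUARTZ_TARGET_UG_M3 = 100.0  # target concentration stated in the abstract
ASSUMED_DUST_LIMIT_MG_M3 = 2.0  # illustrative dust limit, an assumption here


def quartz_concentration(dust_mg_m3, quartz_percent):
    """Quartz concentration in ug/m3 from dust concentration and quartz content."""
    return dust_mg_m3 * 1000.0 * quartz_percent / 100.0


def share_exceeding(exceeding, total):
    """Percentage of samples exceeding the target, rounded to one decimal."""
    return round(100.0 * exceeding / total, 1)


# Situation 1 (hypothetical sample): below the dust limit, yet above the quartz target.
sample_1 = quartz_concentration(dust_mg_m3=1.5, quartz_percent=8.0)
print(sample_1, sample_1 > QUARTZ_TARGET_UG_M3)

# Situation 2 (hypothetical sample): quartz content at or below 5%, yet above the target.
sample_2 = quartz_concentration(dust_mg_m3=3.0, quartz_percent=4.0)
print(sample_2, sample_2 > QUARTZ_TARGET_UG_M3)

# Percentages reported in the abstract, recomputed from the sample counts.
print(share_exceeding(4370, 37346))
print(share_exceeding(895, 20560))
```

    Both hypothetical samples give 120 μg/m³ of quartz, which shows how a sample can satisfy a dust-based or quartz-percentage criterion and still exceed the quartz target.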

  12. Multidisciplinary approach to identify aquifer-peatland connectivity

    NASA Astrophysics Data System (ADS)

    Larocque, Marie; Pellerin, Stéphanie; Cloutier, Vincent; Ferlatte, Miryane; Munger, Julie; Quillet, Anne; Paniconi, Claudio

    2015-04-01

    In southern Quebec (Canada), wetlands sustain increasing pressures from agriculture, urban development, and peat exploitation. To protect both groundwater and ecosystems, it is important to be able to identify how, where, and to what extent shallow aquifers and wetlands are connected. This study focuses on peatlands which are especially abundant in Quebec. The objective of this research was to better understand aquifer-peatland connectivity and to identify easily measured indicators of this connectivity. Geomorphology, hydrogeochemistry, and vegetation were selected as key indicators of connectivity. Twelve peatland transects were instrumented and monitored in the Abitibi (slope peatlands associated with eskers) and Centre-du-Quebec (depression peatlands) regions of Quebec (Canada). Geomorphology, geology, water levels, water chemistry, and vegetation species were identified/measured on all transects. Flow conditions were simulated numerically on two typical transects. Results show that a majority of peatland transects receives groundwater from a shallow aquifer. In slope peatlands, groundwater flows through the organic deposits towards the peatland center. In depression peatlands, groundwater flows only 100-200 m within the peatland before being redirected through surface routes towards the outlet. Flow modeling and sensitivity analysis have identified that the thickness and hydraulic conductivity of permeable deposits close to the peatland and beneath the organic deposits influence flow directions within the peatland. Geochemical data have confirmed the usefulness of total dissolved solids (TDS) exceeding 14 mg/L as an indicator of the presence of groundwater within the peatland. Vegetation surveys have allowed the identification of species and groups of species that occur mostly when groundwater is present, for instance Carex limosa and Sphagnum russowii. Geomorphological conditions (slope or depression peatland), TDS, and vegetation can be measured

  13. Genetic heterogeneity of asthma phenotypes identified by a clustering approach.

    PubMed

    Siroux, Valérie; González, Juan R; Bouzigon, Emmanuelle; Curjuric, Ivan; Boudier, Anne; Imboden, Medea; Anto, Josep Maria; Gut, Ivo; Jarvis, Deborah; Lathrop, Mark; Omenaas, Ernst Reidar; Pin, Isabelle; Wjst, Mathias; Demenais, Florence; Probst-Hensch, Nicole; Kogevinas, Manolis; Kauffmann, Francine

    2014-02-01

    The aim of the study was to identify genetic variants associated with refined asthma phenotypes enabling multiple features of the disease to be taken into account. Latent class analysis (LCA) was applied in 3001 adults ever having asthma recruited in the frame of three epidemiological surveys (the European Community Respiratory Health Survey (ECRHS), the Swiss Study on Air Pollution and Lung Disease in Adults (SAPALDIA) and the Epidemiological Study on the Genetics and Environment of Asthma (EGEA)). 14 personal and phenotypic characteristics, gathered from questionnaires and clinical examination, were used. A genome-wide association study was conducted for each LCA-derived asthma phenotype, compared to subjects without asthma (n=3474). The LCA identified four adult asthma phenotypes, mainly characterised by disease activity, age of asthma onset and atopic status. Associations of genome-wide significance (p < 1.25 × 10⁻⁷) were observed between "active adult-onset nonallergic asthma" and rs9851461 flanking CD200 (3q13.2) and between "inactive/mild nonallergic asthma" and rs2579931 flanking GRIK2 (6q16.3). Borderline significant results (2.5 × 10⁻⁷ < p < 8.2 × 10⁻⁷) were observed between three single nucleotide polymorphisms (SNPs) in the ALCAM region (3q13.11) and "active adult-onset nonallergic asthma". These results were consistent across studies. 15 SNPs identified in previous genome-wide association studies of asthma have been replicated with at least one asthma phenotype, most of them with the "active allergic asthma" phenotype. Our results provide evidence that a better understanding of asthma phenotypic heterogeneity helps to disentangle the genetic heterogeneity of asthma.

  14. Intrusion detection: a novel approach that combines boosting genetic fuzzy classifier and data mining techniques

    NASA Astrophysics Data System (ADS)

    Ozyer, Tansel; Alhajj, Reda; Barker, Ken

    2005-03-01

    This paper proposes an intelligent intrusion detection system (IDS), an integrated approach that employs fuzziness and two well-known data mining techniques: classification and association rule mining. Using these two techniques, we adopt an iterative rule learning scheme that extracts rules from the data set. Our ultimate aim is to predict different behaviors in networked computers. To achieve this, we propose a fuzzy rule based genetic classifier. Our approach has two main stages. First, fuzzy association rule mining is applied and a large number of candidate rules are generated for each class. The rules then pass through a pre-screening mechanism that reduces the fuzzy rule search space. Candidate rules obtained after pre-screening are used in the genetic fuzzy classifier to generate rules for the specified classes. Classes are defined as Normal, PRB (probe), DOS (denial of service), U2R (user to root) and R2L (remote to local). Second, an iterative rule learning mechanism is employed for each class to find the fuzzy rules required to classify the data; each time, a fuzzy rule is extracted and included in the system. A boosting mechanism evaluates the weight of each data item to help the rule extraction mechanism focus on data with relatively higher weight. Finally, the extracted fuzzy rules, with their corresponding weight values, are aggregated on a class basis to find the vote of each class label for each data item.
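    The final aggregation step, in which weighted fuzzy rules vote for a class label, can be sketched as follows. This is a minimal illustration under stated assumptions: triangular membership functions, the minimum as the rule firing strength, and the rule set, feature names, and weights are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def triangular(x, left, peak, right):
    """Triangular membership degree of x, zero outside (left, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def class_votes(rules, item):
    """Sum weight * firing strength per class label for one data item."""
    votes = defaultdict(float)
    for rule in rules:
        # Firing strength: minimum membership over the rule's antecedents (an assumption).
        strength = min(
            triangular(item[feature], *params)
            for feature, params in rule["if"].items()
        )
        votes[rule["then"]] += rule["weight"] * strength
    return dict(votes)


# Hypothetical rules over two connection features.
rules = [
    {"if": {"duration": (0, 1, 5)}, "then": "Normal", "weight": 0.8},
    {"if": {"failed_logins": (2, 6, 10)}, "then": "R2L", "weight": 0.6},
]
item = {"duration": 3, "failed_logins": 6}

votes = class_votes(rules, item)
predicted = max(votes, key=votes.get)
print(votes, predicted)
```

    Here the Normal rule fires at strength 0.5 and the R2L rule at strength 1.0, so the weighted votes are 0.4 and 0.6 and the item is labelled R2L.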

  15. MEDICI: Mining Essentiality Data to Identify Critical Interactions for Cancer Drug Target Discovery and Development.

    PubMed

    Harati, Sahar; Cooper, Lee A D; Moran, Josue D; Giuste, Felipe O; Du, Yuhong; Ivanov, Andrei A; Johns, Margaret A; Khuri, Fadlo R; Fu, Haian; Moreno, Carlos S

    2017-01-01

    Protein-protein interactions (PPIs) mediate the transmission and regulation of oncogenic signals that are essential to cellular proliferation and survival, and thus represent potential targets for anti-cancer therapeutic discovery. Despite their significance, there is no method to experimentally disrupt and interrogate the essentiality of individual endogenous PPIs. The ability to computationally predict or infer PPI essentiality would help prioritize PPIs for drug discovery and help advance understanding of cancer biology. Here we introduce a computational method (MEDICI) to predict PPI essentiality by combining gene knockdown studies with network models of protein interaction pathways in an analytic framework. Our method uses network topology to model how gene silencing can disrupt PPIs, relating the unknown essentialities of individual PPIs to experimentally observed protein essentialities. This model is then deconvolved to recover the unknown essentialities of individual PPIs. We demonstrate the validity of our approach via prediction of sensitivities to compounds based on PPI essentiality and differences in essentiality based on genetic mutations. We further show that lung cancer patients have improved overall survival when specific PPIs are no longer present, suggesting that these PPIs may be potentially new targets for therapeutic development. Software is freely available at https://github.com/cooperlab/MEDICI. Datasets are available at https://ctd2.nci.nih.gov/dataPortal.

  16. MEDICI: Mining Essentiality Data to Identify Critical Interactions for Cancer Drug Target Discovery and Development

    PubMed Central

    Moran, Josue D.; Giuste, Felipe O.; Du, Yuhong; Ivanov, Andrei A.; Johns, Margaret A.; Khuri, Fadlo R.; Fu, Haian

    2017-01-01

    Protein-protein interactions (PPIs) mediate the transmission and regulation of oncogenic signals that are essential to cellular proliferation and survival, and thus represent potential targets for anti-cancer therapeutic discovery. Despite their significance, there is no method to experimentally disrupt and interrogate the essentiality of individual endogenous PPIs. The ability to computationally predict or infer PPI essentiality would help prioritize PPIs for drug discovery and help advance understanding of cancer biology. Here we introduce a computational method (MEDICI) to predict PPI essentiality by combining gene knockdown studies with network models of protein interaction pathways in an analytic framework. Our method uses network topology to model how gene silencing can disrupt PPIs, relating the unknown essentialities of individual PPIs to experimentally observed protein essentialities. This model is then deconvolved to recover the unknown essentialities of individual PPIs. We demonstrate the validity of our approach via prediction of sensitivities to compounds based on PPI essentiality and differences in essentiality based on genetic mutations. We further show that lung cancer patients have improved overall survival when specific PPIs are no longer present, suggesting that these PPIs may be potentially new targets for therapeutic development. Software is freely available at https://github.com/cooperlab/MEDICI. Datasets are available at https://ctd2.nci.nih.gov/dataPortal. PMID:28118365

  17. Forecasting Precipitation over the MENA Region: A Data Mining and Remote Sensing Based Approach

    NASA Astrophysics Data System (ADS)

    Elkadiri, R.; Sultan, M.; Elbayoumi, T.; Chouinard, K.

    2015-12-01

    We developed and applied an integrated approach to construct predictive tools with lead times of 1 to 12 months to forecast precipitation amounts over the Middle East and North Africa (MENA) region. The following steps were conducted: (1) acquire and analyze temporal remote sensing-based precipitation datasets (i.e. Tropical Rainfall Measuring Mission [TRMM]) over five main water source regions in the MENA area (i.e. Atlas Mountains in Morocco, Southern Sudan, Red Sea Hills of Yemen, and Blue Nile and White Nile source areas) throughout the investigation period (1998 to 2015); (2) acquire and extract monthly values for all of the climatic indices that are likely to influence the climatic patterns over the MENA region (e.g., Northern Atlantic Oscillation [NOI], Southern Oscillation Index [SOI], and Tropical North Atlantic Index [TNA]); and (3) apply data mining methods to extract relationships between the observed precipitation and the controlling factors (climatic indices) and use predictive tools to forecast monthly precipitation over each of the identified pilot study areas. Preliminary results indicate that by using the period from January 1998 until August 2012 for model training and the period from September 2012 to January 2015 for testing, precipitation can be successfully predicted with a three-month lead over South West Yemen, Atlas Mountains in Morocco, Southern Sudan, Blue Nile sources and White Nile sources with confidence (Pearson correlation coefficient: 0.911, 0.823, 0.807, 0.801 and 0.895 respectively). Future work will focus on applying this technique for prediction of precipitation over each of the climatically contiguous areas of the MENA region. If our efforts are successful, our findings will lead the way to the development and implementation of sound water management scenarios for the MENA countries.
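    The evaluation described above (a chronological train/test split scored with the Pearson correlation coefficient) can be sketched in a few lines. The split month matches the abstract; the record layout, function names, and the numbers in the example are hypothetical and do not come from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def split_by_month(records, first_test_month):
    """Split (month, observed, predicted) records chronologically.

    Months are 'YYYY-MM' strings, so string comparison follows time order.
    """
    train = [r for r in records if r[0] < first_test_month]
    test = [r for r in records if r[0] >= first_test_month]
    return train, test


# Hypothetical monthly records: (month, observed mm, predicted mm).
records = [
    ("2012-07", 12.0, 10.0),
    ("2012-08", 30.0, 28.0),
    ("2012-09", 22.0, 20.0),
    ("2012-10", 8.0, 9.0),
    ("2012-11", 15.0, 13.0),
]
train, test = split_by_month(records, first_test_month="2012-09")
score = pearson([r[1] for r in test], [r[2] for r in test])
print(len(train), len(test), round(score, 3))
```

    Splitting by date rather than at random keeps all test months after the training period, which is what a forecast with a fixed lead time requires.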

  18. Timely approaches to identify probiotic species of the genus Lactobacillus

    PubMed Central

    2013-01-01

    Over the past decades the use of probiotics in food has increased largely due to the manufacturer’s interest in placing “healthy” food on the market based on the consumer’s ambitions to live healthy. Due to this trend, health benefits of products containing probiotic strains such as lactobacilli are promoted and probiotic strains have been established in many different products with their numbers increasing steadily. Probiotics are used as starter cultures in dairy products such as cheese or yoghurts and in addition they are also utilized in non-dairy products such as fermented vegetables, fermented meat and pharmaceuticals, thereby, covering a large variety of products. To assure quality management, several pheno-, physico- and genotyping methods have been established to unambiguously identify probiotic lactobacilli. These methods are often specific enough to identify the probiotic strains at genus and species levels. However, the probiotic ability is often strain dependent and it is impossible to distinguish strains by basic microbiological methods. Therefore, this review aims to critically summarize and evaluate conventional identification methods for the genus Lactobacillus, complemented by techniques that are currently being developed. PMID:24063519

  19. Targeted Approach to Identify Genetic Loci Associated with ...

    EPA Pesticide Factsheets

    Extreme tolerance to highly toxic dioxin-like contaminants (DLCs) has evolved independently and contemporaneously in (at least) four populations of Atlantic killifish (Fundulus heteroclitus). Surprisingly, the magnitude and phenotype of DLC tolerance is similar among these killifish populations that have adapted to varied, but highly contaminated urban/industrialized estuaries of the US Atlantic coast. We hypothesized that comparisons among tolerant populations and in contrast to their sensitive neighboring killifish might reveal genetic loci associated with DLC tolerance. Since the aryl hydrocarbon receptor (AHR) pathway partly or fully mediates DLC toxicity in vertebrates, we identified single nucleotide polymorphisms (SNPs) from 43 genes associated with the AHR to serve as targeted markers. Wild fish from the four highly tolerant killifish populations and four nearby sensitive populations were genotyped using 59 SNP markers. Consistent with other killifish population genetic analyses, our results revealed strong genetic differentiation among populations, consistent with isolation by distance models. Pairwise comparisons of nearby tolerant and sensitive populations revealed differentiation among these loci: AHR 1 and 2, cathepsin Z, the cytochrome P450s (CYP) 1A and 3A30, and the NADH ubiquinone oxidoreductase MLRQ subunit. By grouping tolerant versus sensitive populations, we also identified cytochrome P450 1A and the AHR2 loci as under selection, lend

  20. A new approach to preserve privacy data mining based on fuzzy theory in numerical database

    NASA Astrophysics Data System (ADS)

    Cui, Run; Kim, Hyoung Joong

    2014-01-01

    With the rapid development of information technology, data mining has become one of the most important tools for discovering deep associations among tuples in large-scale databases. Protecting private information is therefore a major challenge, especially during the data mining procedure. In this paper, a new method for privacy protection based on fuzzy theory is proposed. The traditional fuzzy approach in this area applies fuzzification to the data without considering its readability. A new style of obscured data expression is introduced to provide more details of the subsets without reducing readability. We also balance the privacy level against utility when forming suitable subgroups. An experiment shows that this approach is suitable for classification without loss of accuracy. In the future, this approach could be adapted to data streams with suitable modification, given the low computational complexity of the fuzzy function.

  1. Multimodal Approach to Identifying Malingered Posttraumatic Stress Disorder: A Review

    PubMed Central

    Jabeen, Shagufta; Alam, Farzana

    2015-01-01

    The primary aim of this article is to aid clinicians in differentiating true posttraumatic stress disorder from malingered posttraumatic stress disorder. Posttraumatic stress disorder and malingering are defined, and prevalence rates are explored. Similarities and differences in diagnostic criteria between the fourth and fifth editions of the Diagnostic and Statistical Manual of Mental Disorders are described for posttraumatic stress disorder. Possible motivations for malingering posttraumatic stress disorder are discussed, and common characteristics of malingered posttraumatic stress disorder are described. A multimodal approach is described for evaluating posttraumatic stress disorder, including interview techniques, collection of collateral data, and psychometric and physiologic testing, that should allow clinicians to distinguish between those patients who are truly suffering from posttraumatic disorder and those who are malingering the illness. PMID:25852974

  2. Utilizing Soize's Approach to Identify Parameter and Model Uncertainties

    SciTech Connect

    Bonney, Matthew S.; Brake, Matthew Robert

    2014-10-01

    Quantifying uncertainty in model parameters is a challenging task for analysts. Soize has derived a method that is able to characterize both model and parameter uncertainty independently. This method is explained with the assumption that some experimental data is available, and is divided into seven steps. Monte Carlo analyses are performed to select the optimal dispersion variable to match the experimental data. Along with the nominal approach, an alternative distribution can be used along with corrections that can be utilized to expand the scope of this method. This method is one of a very few methods that can quantify uncertainty in the model form independently of the input parameters. Two examples are provided to illustrate the methodology, and example code is provided in the Appendix.

  3. Omics Approach to Identify Factors Involved in Brassica Disease Resistance.

    PubMed

    Francisco, Marta; Soengas, Pilar; Velasco, Pablo; Bhadauria, Vijai; Cartea, Maria E; Rodríguez, Victor M

    2016-01-01

    Understanding plants' defense mechanisms and their response to biotic stresses is of fundamental importance for the development of resistant crop varieties and more productive agriculture. The Brassica genus includes a large variety of economically important species and cultivars used as vegetables, oilseeds, forage, and ornamentals. Damage caused by pathogen attack negatively affects various aspects of plant growth, development, and crop productivity. Over the last few decades, advances in plant physiology, genetics, and molecular biology have greatly improved our understanding of plant responses to biotic stress conditions. In this regard, various 'omics' technologies enable qualitative and quantitative monitoring of the abundance of various biological molecules in a high-throughput manner, and thus allow determination of their variation between different biological states on a genomic scale. In this review, we describe advances in 'omic' tools (genomics, transcriptomics, proteomics and metabolomics) in view of the conventional and modern approaches being used to elucidate the molecular mechanisms that underlie Brassica disease resistance.

  4. A genomics approach identifies senescence-specific gene expression regulation

    PubMed Central

    Lackner, Daniel H; Hayashi, Makoto T; Cesare, Anthony J; Karlseder, Jan

    2014-01-01

    Replicative senescence is a fundamental tumor-suppressive mechanism triggered by telomere erosion that results in a permanent cell cycle arrest. To understand the impact of telomere shortening on gene expression, we analyzed the transcriptome of diploid human fibroblasts as they progressed toward and entered into senescence. We distinguished novel transcription regulation due to replicative senescence by comparing senescence-specific expression profiles to profiles from cells arrested by DNA damage or serum starvation. Only a small specific subset of genes was identified that was truly senescence-regulated and changes in gene expression were exacerbated from presenescent to senescent cells. The majority of gene expression regulation in replicative senescence was shown to occur due to telomere shortening, as exogenous telomerase activity reverted most of these changes. PMID:24863242

  5. A genomics approach identifies senescence-specific gene expression regulation.

    PubMed

    Lackner, Daniel H; Hayashi, Makoto T; Cesare, Anthony J; Karlseder, Jan

    2014-10-01

    Replicative senescence is a fundamental tumor-suppressive mechanism triggered by telomere erosion that results in a permanent cell cycle arrest. To understand the impact of telomere shortening on gene expression, we analyzed the transcriptome of diploid human fibroblasts as they progressed toward and entered into senescence. We distinguished novel transcription regulation due to replicative senescence by comparing senescence-specific expression profiles to profiles from cells arrested by DNA damage or serum starvation. Only a small specific subset of genes was identified that was truly senescence-regulated and changes in gene expression were exacerbated from presenescent to senescent cells. The majority of gene expression regulation in replicative senescence was shown to occur due to telomere shortening, as exogenous telomerase activity reverted most of these changes.

  6. Enhanced approaches for identifying Amadori products: application to peanut allergens

    PubMed Central

    Johnson, Katina L.; Williams, Jason G.; Maleki, Soheila J.; Hurlburt, Barry K.; London, Robert E.; Mueller, Geoffrey A.

    2016-01-01

    The dry roasting of peanuts is suggested to influence allergenic sensitization due to formation of advanced glycation end products (AGE) on peanut proteins. Identifying AGEs is technically challenging. The AGEs of a peanut allergen were probed with nanoLC-ESI-MS and MS/MS analyses. Amadori product ions matched to expected peptides and yielded fragments that included a loss of 3 waters and HCHO. Due to the paucity of b- and y-ions in the MS/MS spectrum, standard search algorithms do not perform well. Reactions with isotopically labeled sugars confirmed that the peptides contained Amadori products. An algorithm was developed based upon information content (Shannon entropy) and the loss of water and HCHO. Results with test data show that the algorithm finds the correct spectra with high precision, reducing the time needed to manually inspect data. Computational and technical improvements allowed better identification of the chemical differences between modified and unmodified proteins. PMID:26811263
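
    The abstract above names Shannon entropy as one ingredient of the spectrum-ranking algorithm but does not give the formula or say how it was combined with the water and HCHO losses. As a minimal sketch of the entropy part only, assuming peak intensities are normalized to a probability distribution (the function name and the example intensities are hypothetical, not taken from the paper):

```python
import math

def shannon_entropy(intensities):
    """Shannon entropy in bits of a peak-intensity list.

    Assumption: intensities are normalized to sum to 1 and treated as
    probabilities; the paper's exact normalization is not stated.
    """
    total = sum(i for i in intensities if i > 0)
    if total <= 0:
        raise ValueError("spectrum has no positive intensity")
    probs = [i / total for i in intensities if i > 0]
    return -sum(p * math.log2(p) for p in probs)

# Hypothetical spectra: one dominated by a single fragment, one spread evenly.
dominated = [100.0, 1.0, 1.0, 1.0]
even = [25.0, 25.0, 25.0, 25.0]

print(shannon_entropy(dominated) < shannon_entropy(even))
```

    A spectrum dominated by a few fragments scores lower than one whose intensity is spread evenly, which is the kind of contrast an entropy-based filter can exploit.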

  7. A Data Mining Approach for Examining Predictors of Physical Activity Among Urban Older Adults.

    PubMed

    Yoon, Sunmoo; Suero-Tejeda, Niurka; Bakken, Suzanne

    2015-07-01

    The current study applied innovative data mining techniques to a community survey dataset to develop prediction models for two aspects of physical activity (i.e., active transport and screen time) in a sample of urban, primarily Hispanic, older adults (N=2,514). Main predictors for active transport (accuracy=69.29%, precision=0.67, recall=0.69) were immigrant status, high level of anxiety, having a place for physical activity, and willingness to make time for physical activity. The main predictors for screen time (accuracy=63.13%, precision=0.60, recall=0.63) were willingness to make time for exercise, having a place for exercise, age, and availability of family support to access health information on the Internet. Data mining methods were useful to identify intervention targets and inform design of customized interventions.
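
    The accuracy, precision, and recall figures quoted above can be related to a confusion matrix as in the following sketch. The counts are hypothetical and chosen only for illustration; the abstract does not report a confusion matrix or say how the metrics were averaged across classes. (The reported recall matches the accuracy to two decimals, which is what class-weighted averaging would produce, but that is an inference, not something the abstract states.)

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) for a binary confusion matrix."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("empty confusion matrix")
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Hypothetical counts, not from the study.
acc, prec, rec = classification_metrics(tp=69, fp=34, fn=31, tn=66)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```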

  8. Web Mining

    NASA Astrophysics Data System (ADS)

    Fürnkranz, Johannes

    The World-Wide Web provides every internet citizen with access to an abundance of information, but it becomes increasingly difficult to identify the relevant pieces of information. Research in web mining tries to address this problem by applying techniques from data mining and machine learning to Web data and documents. This chapter provides a brief overview of web mining techniques and research areas, most notably hypertext classification, wrapper induction, recommender systems and web usage mining.

  9. Development of a data-mining algorithm to identify ages at reproductive milestones in electronic medical records.

    PubMed

    Malinowski, Jennifer; Farber-Eger, Eric; Crawford, Dana C

    2014-01-01

    Electronic medical records (EMRs) are becoming more widely implemented following directives from the federal government and incentives for supplemental reimbursements for Medicare and Medicaid claims. Replete with rich phenotypic data, EMRs offer a unique opportunity for clinicians and researchers to identify potential research cohorts and perform epidemiologic studies. Notable limitations to the traditional epidemiologic study include cost, time to complete the study, and limited ancestral diversity; EMR-based epidemiologic studies offer an alternative. The Epidemiologic Architecture for Genes Linked to Environment (EAGLE) Study, as part of the Population Architecture using Genomics and Epidemiology (PAGE) I Study, has genotyped more than 15,000 patients of diverse ancestry in BioVU, the Vanderbilt University Medical Center's biorepository linked to the EMR (EAGLE BioVU). We report here the development and performance of data-mining techniques used to identify the age at menarche (AM) and age at menopause (AAM), important milestones in the reproductive lifespan, in women from EAGLE BioVU for genetic association studies. In addition, we demonstrate the ability to discriminate age at naturally-occurring menopause (ANM) from medically-induced menopause. Unusual timing of these events may indicate underlying pathologies and increased risk for some complex diseases and cancer; however, they are not consistently recorded in the EMR. Our algorithm offers a mechanism by which to extract these data for clinical and research goals.

  10. Phylogeny-guided (meta)genome mining approach for the targeted discovery of new microbial natural products.

    PubMed

    Kang, Hahk-Soo

    2017-02-01

    Genomics-based methods are now commonplace in natural products research. A phylogeny-guided mining approach provides a means to quickly screen a large number of microbial genomes or metagenomes in search of new biosynthetic gene clusters of interest. In this approach, biosynthetic genes serve as molecular markers, and phylogenetic trees built with known and unknown marker gene sequences are used to quickly prioritize biosynthetic gene clusters for characterization of their metabolites. Use of this approach has increased over the last couple of years along with the emergence of low-cost sequencing technologies. The aim of this review is to discuss the basic concept of a phylogeny-guided mining approach, and also to provide examples in which this approach was successfully applied to discover new natural products from microbial genomes and metagenomes. I believe that the phylogeny-guided mining approach will continue to play an important role in genomics-based natural products research.

  11. A proteomic approach to identify endosomal cargoes controlling cancer invasiveness

    PubMed Central

    Diaz-Vera, Jesica; Palmer, Sarah; Hernandez-Fernaud, Juan Ramon; Dornier, Emmanuel; Mitchell, Louise E.; Macpherson, Iain; Edwards, Joanne; Zanivan, Sara

    2017-01-01

    We have previously shown that Rab17, a small GTPase associated with epithelial polarity, is specifically suppressed by ERK2 (also known as MAPK1) signalling to promote an invasive phenotype. However, the mechanisms through which Rab17 loss permits invasiveness, and the endosomal cargoes that are responsible for mediating this, are unknown. Using quantitative mass spectrometry-based proteomics, we have found that knockdown of Rab17 leads to a highly selective reduction in the cellular levels of a v-SNARE (Vamp8). Moreover, proteomics and immunofluorescence indicate that Vamp8 is associated with Rab17 at late endosomes. Reduced levels of Vamp8 promote transition between ductal carcinoma in situ (DCIS) and a more invasive phenotype. We developed an unbiased proteomic approach to elucidate the complement of receptors that redistributes between endosomes and the plasma membrane, and have pin-pointed neuropilin-2 (NRP2) as a key pro-invasive cargo of Rab17- and Vamp8-regulated trafficking. Indeed, reduced Rab17 or Vamp8 levels lead to increased mobilisation of NRP2-containing late endosomes and upregulated cell surface expression of NRP2. Finally, we show that NRP2 is required for the basement membrane disruption that accompanies the transition between DCIS and a more invasive phenotype. PMID:28062852

  12. Correlation approach to identify coding regions in DNA sequences

    NASA Technical Reports Server (NTRS)

    Ossadnik, S. M.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1994-01-01

    Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.

  13. Newer Approaches to Identify Potential Untoward Effects in Functional Foods.

    PubMed

    Marone, Palma Ann; Birkenbach, Victoria L; Hayes, A Wallace

    2016-01-01

    Globalization has greatly accelerated the numbers and variety of food and beverage products available worldwide. The exchange among greater numbers of countries, manufacturers, and products in the United States and worldwide has necessitated enhanced quality measures for nutritional products for larger populations increasingly reliant on functionality. These functional foods, those that provide benefit beyond basic nutrition, are increasingly being used for their potential to alleviate food insufficiency while enhancing quality and longevity of life. In the United States alone, a steady import increase of greater than 15% per year or 24 million shipments, over 70% products of which are food related, is regulated under the Food and Drug Administration (FDA). This unparalleled growth has resulted in the need for faster, cheaper, and better safety and efficacy screening methods in the form of harmonized guidelines and recommendations for product standardization. In an effort to meet this need, the in vitro toxicology testing market has similarly grown with an anticipatory 15% increase between 2010 and 2015 of US$1.3 to US$2.7 billion. Although traditionally occupying a small fraction of the market behind pharmaceuticals and cosmetic/household products, the scope of functional food testing, including additives/supplements, ingredients, residues, contact/processing, and contaminants, is potentially expansive. Similarly, as functional food testing has progressed, so has the need to identify potential adverse factors that threaten the safety and quality of these products.

  14. Phenotypic Approaches to Identify Inhibitors of B Cell Activation

    PubMed Central

    Kim, Suzie; Wiener, Jake; Rao, Navin L.; Milla, Marcos E.; DiSepio, Daniel

    2015-01-01

    An EPIC label-free phenotypic platform was developed to explore B cell receptor (BCR) and CD40R-mediated B cell activation. The phenotypic assay measured the association of RL non-Hodgkin’s lymphoma B cells expressing lymphocyte function-associated antigen 1 (LFA-1) to intercellular adhesion molecule 1 (ICAM-1)-coated EPIC plates. Anti-IgM (immunoglobulin M) mediated BCR activation elicited a response that was blocked by LFA-1/ICAM-1 specific inhibitors and a panel of Bruton’s tyrosine kinase (BTK) inhibitors. LFA-1/ICAM-1 association was further increased on coapplication of anti-IgM and mega CD40L when compared to individual application of either. Anti-IgM, mega CD40L, or the combination of both displayed distinct kinetic profiles that were inhibited by treatment with a BTK inhibitor. We also established a FLIPR-based assay to measure B cell activation in Ramos Burkitt’s lymphoma B cells and an RL cell line. Anti-IgM-mediated BCR activation elicited a robust calcium response that was inhibited by a panel of BTK inhibitors. Conversely, CD40R activation did not elicit a calcium response in the FLIPR assay. Compared to the FLIPR, the EPIC assay has the propensity to identify inhibitors of both BCR and CD40R-mediated B cell activation and may provide more pharmacological depth or novel mechanisms of action for inhibition of B cell activation. PMID:25948491

  15. Biogeometallurgical pre-mining characterization of ore deposits: an approach to increase sustainability in the mining process.

    PubMed

    Dold, Bernhard; Weibel, Leyla

    2013-11-01

    Based on the knowledge obtained from acid mine drainage formation in mine waste environments (tailings impoundments and waste rock dumps), a new methodology is applied to characterize new ore deposits before exploitation starts. This gives the opportunity to design optimized processes for metal recovery from the different mineral assemblages in an ore deposit and at the same time to minimize the environmental impact and downstream costs of mine waste management. Additionally, the whole economic potential is evaluated, including strategic elements. The methodology integrates high-resolution geochemistry by sequential extractions and quantitative mineralogy in combination with kinetic bioleach tests. The resulting data set makes it possible to define biogeometallurgical units in the ore deposit and to predict the behavior of each economically or environmentally relevant element along the mining process.

  16. Configurational approach to identifying the earliest hominin butchers.

    PubMed

    Domínguez-Rodrigo, Manuel; Pickering, Travis Rayne; Bunn, Henry T

    2010-12-07

    The announcement of two approximately 3.4-million-y-old purportedly butchered fossil bones from the Dikika paleoanthropological research area (Lower Awash Valley, Ethiopia) could profoundly alter our understanding of human evolution. Butchering damage on the Dikika bones would imply that tool-assisted meat-eating began approximately 800,000 y before previously thought, based on butchered bones from 2.6- to 2.5-million-y-old sites at the Ethiopian Gona and Bouri localities. Further, the only hominin currently known from Dikika at approximately 3.4 Ma is Australopithecus afarensis, a temporally and geographically widespread species unassociated previously with any archaeological evidence of butchering. Our taphonomic configurational approach to assess the claims of A. afarensis butchery at Dikika suggests the claims of unexpectedly early butchering at the site are not warranted. The Dikika research group focused its analysis on the morphology of the marks in question but failed to demonstrate, through recovery of similarly marked in situ fossils, the exact provenience of the published fossils, and failed to note occurrences of random striae on the cortices of the published fossils (incurred through incidental movement of the defleshed specimens across and/or within their abrasive encasing sediments). The occurrence of such random striae (sometimes called collectively "trampling" damage) on the two fossils provide the configurational context for rejection of the claimed butchery marks. The earliest best evidence for hominin butchery thus remains at 2.6 to 2.5 Ma, presumably associated with more derived species than A. afarensis.

  17. A landscape ecology approach identifies important drivers of urban biodiversity.

    PubMed

    Turrini, Tabea; Knop, Eva

    2015-04-01

    Cities are growing rapidly worldwide, yet a mechanistic understanding of the impact of urbanization on biodiversity is lacking. We assessed the impact of urbanization on arthropod diversity (species richness and evenness) and abundance in a study of six cities and nearby intensively managed agricultural areas. Within the urban ecosystem, we disentangled the relative importance of two key landscape factors affecting biodiversity, namely the amount of vegetated area and patch isolation. To do so, we a priori selected sites that independently varied in the amount of vegetated area in the surrounding landscape at the 500-m scale and patch isolation at the 100-m scale, and we held local patch characteristics constant. As indicator groups, we used bugs, beetles, leafhoppers, and spiders. Compared to intensively managed agricultural ecosystems, urban ecosystems supported a higher abundance of most indicator groups, a higher number of bug species, and a lower evenness of bug and beetle species. Within cities, a high amount of vegetated area increased species richness and abundance of most arthropod groups, whereas evenness showed no clear pattern. Patch isolation played only a limited role in urban ecosystems, which contrasts with findings from agro-ecological studies. Our results show that urban areas can harbor arthropod diversity and abundance similar to those of intensively managed agricultural ecosystems. Further, negative consequences of urbanization on arthropod diversity can be mitigated by providing sufficient vegetated space in the urban area, while patch connectivity is less important in an urban context. This highlights the need for applying a landscape ecological approach to understand the mechanisms shaping urban biodiversity and underlines the potential of appropriate urban planning for mitigating biodiversity loss.

  18. Configurational approach to identifying the earliest hominin butchers

    PubMed Central

    Domínguez-Rodrigo, Manuel; Pickering, Travis Rayne; Bunn, Henry T.

    2010-01-01

    The announcement of two approximately 3.4-million-y-old purportedly butchered fossil bones from the Dikika paleoanthropological research area (Lower Awash Valley, Ethiopia) could profoundly alter our understanding of human evolution. Butchering damage on the Dikika bones would imply that tool-assisted meat-eating began approximately 800,000 y before previously thought, based on butchered bones from 2.6- to 2.5-million-y-old sites at the Ethiopian Gona and Bouri localities. Further, the only hominin currently known from Dikika at approximately 3.4 Ma is Australopithecus afarensis, a temporally and geographically widespread species unassociated previously with any archaeological evidence of butchering. Our taphonomic configurational approach to assess the claims of A. afarensis butchery at Dikika suggests the claims of unexpectedly early butchering at the site are not warranted. The Dikika research group focused its analysis on the morphology of the marks in question but failed to demonstrate, through recovery of similarly marked in situ fossils, the exact provenience of the published fossils, and failed to note occurrences of random striae on the cortices of the published fossils (incurred through incidental movement of the defleshed specimens across and/or within their abrasive encasing sediments). The occurrence of such random striae (sometimes called collectively “trampling” damage) on the two fossils provide the configurational context for rejection of the claimed butchery marks. The earliest best evidence for hominin butchery thus remains at 2.6 to 2.5 Ma, presumably associated with more derived species than A. afarensis. PMID:21078985

  19. A multi-isotope approach to characterize acid mine drainage in a hardrock alpine mine, Chaffe Co,Colorado.

    NASA Astrophysics Data System (ADS)

    Cordalis, D.; Williams, M. W.; Wireman, M.; Michel, R. L.; Manning, A.

    2004-12-01

    Here we present information from an innovative suite of stable, radiogenic, and cosmogenic isotopes to better understand groundwater flowpaths and groundwater-surface water interactions in an applied acid mine drainage system. Stable water isotopes, tritium, helium-tritium, sulfur-35, and uranium 234/238 ratios were analyzed from precipitation, groundwater wells, interior mine drainages, and surface waters at the Mary Murphy Mine in Colorado to determine hydrologic transport mechanisms responsible for contaminated zinc releases. Hydrometric measurements suggested a snowmelt-driven pulse of elevated zinc in adit outflow. However, mixing models using stable water isotopes showed a regional groundwater signal in the adit outflow. Tritium values of 11 to 13 TU showed a slight enrichment of bomb spike water compared to snow values of about 9 TU, suggesting an older water source as well. Helium/tritium ratios on a subset of groundwater wells suggested that average residence times of alluvial wells ranged from 2.5 to 8 years. The combination of stable water isotopes and sulfur-35 (half-life of 87 days) showed that zinc-rich waters within the mine derived from infiltrating snowmelt more than a year old. However, measurement of sulfur-35 using low-level scintillation counts was compromised at times by the presence of uranium. We were able to remove the uranium through wet chemistry procedures, improving the accuracy of S-35 measurements. The U234/U238 ratio shows promise in discriminating between acid mine drainage and acid rock drainage. Acid rock drainage shows an unaltered ratio of 1:1, while acid mine drainage is enriched relative to the 1:1 equilibrium ratio. The combination of cosmogenic and stable isotopes within and near the Mary Murphy Mine may provide a useful tool for studying interactions between groundwater and surface water in a fractured rock setting. Remediation techniques can be directed more appropriately, and cost effectively, by the characterization of
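
    Two calculations implied by the abstract above can be sketched briefly: a two-end-member mixing model for a conservative tracer such as a stable water isotope, and the radioactive decay of sulfur-35 using the 87-day half-life quoted in the abstract. The function names and the isotope values in the example are hypothetical; the abstract does not describe the specific mixing model the authors used.

```python
def end_member_fraction(delta_mix, delta_a, delta_b):
    """Fraction of end member A in a two-component mixture.

    Assumes a conservative tracer and exactly two sources (A and B).
    """
    if delta_a == delta_b:
        raise ValueError("end members must have different tracer values")
    return (delta_mix - delta_b) / (delta_a - delta_b)

def fraction_remaining(elapsed_days, half_life_days=87.0):
    """Fraction of a radionuclide left after elapsed_days (default: S-35)."""
    return 0.5 ** (elapsed_days / half_life_days)

# Hypothetical delta-18O values (per mil): snowmelt -18, groundwater -14, adit outflow -16.
print(end_member_fraction(-16.0, delta_a=-18.0, delta_b=-14.0))

# Share of the original S-35 activity still present after one year.
print(round(fraction_remaining(365.0), 3))
```

    After roughly four half-lives only a few percent of the initial sulfur-35 remains, which is why its near absence points to water older than about a year.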

  20. A Bayesian Approach to Identifying New Risk Factors for Dementia

    PubMed Central

    Wen, Yen-Hsia; Wu, Shihn-Sheng; Lin, Chun-Hung Richard; Tsai, Jui-Hsiu; Yang, Pinchen; Chang, Yang-Pei; Tseng, Kuan-Hua

    2016-01-01

    Dementia is one of the most disabling and burdensome health conditions worldwide. In this study, we identified new potential risk factors for dementia from nationwide longitudinal population-based data by using Bayesian statistics. We first tested the consistency of the results obtained using Bayesian statistics with those obtained using classical frequentist probability for 4 recognized risk factors for dementia, namely severe head injury, depression, diabetes mellitus, and vascular diseases. Then, we used Bayesian statistics to verify 2 new potential risk factors for dementia, namely hearing loss and senile cataract, determined from Taiwan's National Health Insurance Research Database. We included a total of 6546 (6.0%) patients diagnosed with dementia. We observed older age, female sex, and lower income as independent risk factors for dementia. Moreover, we verified the 4 recognized risk factors for dementia in the older Taiwanese population; their odds ratios (ORs) ranged from 3.469 to 1.207. Furthermore, we observed that hearing loss (OR = 1.577) and senile cataract (OR = 1.549) were associated with an increased risk of dementia. We found that the results obtained using Bayesian statistics for assessing risk factors for dementia, such as head injury, depression, diabetes mellitus, and vascular diseases, were consistent with those obtained using classical frequentist probability. Moreover, hearing loss and senile cataract were found to be potential risk factors for dementia in the older Taiwanese population. Bayesian statistics could help clinicians explore other potential risk factors for dementia and develop appropriate treatment strategies for these patients. PMID:27227925
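
    For readers unfamiliar with the odds ratios quoted above, the following sketch shows the classical (frequentist) calculation from a 2x2 table, with a Wald 95% confidence interval on the log scale. It does not reproduce the study's Bayesian analysis, and the counts are hypothetical.

```python
import math

def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Return (OR, lower, upper) with a Wald 95% CI; all cells must be positive."""
    a, b, c, d = exposed_cases, exposed_controls, unexposed_cases, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for this simple estimator")
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper

# Hypothetical counts, not from the Taiwanese cohort.
est, lo, hi = odds_ratio(30, 70, 20, 80)
print(f"OR={est:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```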

  1. Metal dispersion resulting from mining activities in coastal environments: a pathways approach

    USGS Publications Warehouse

    Koski, Randolph A.

    2012-01-01

    Acid rock drainage (ARD) and disposal of tailings that result from mining activities impact coastal areas in many countries. The dispersion of metals from mine sites that are both proximal and distal to the shoreline can be examined using a pathways approach in which physical and chemical processes guide metal transport in the continuum from sources (sulfide minerals) to bioreceptors (marine biota). Large amounts of metals can be physically transported to the coastal environment by intentional or accidental release of sulfide-bearing mine tailings. Oxidation of sulfide minerals results in elevated dissolved metal concentrations in surface waters on land (producing ARD) and in pore waters of submarine tailings. Changes in pH, adsorption by insoluble secondary minerals (e.g., Fe oxyhydroxides), and precipitation of soluble salts (e.g., sulfates) affect dissolved metal fluxes. Evidence for bioaccumulation includes anomalous metal concentrations in bivalves and reef corals, and overlapping Pb isotope ratios for sulfides, shellfish, and seaweed in contaminated environments. Although bioavailability and potential toxicity are, to a large extent, functions of metal speciation, specific uptake pathways, such as adsorption from solution and ingestion of particles, also play important roles. Recent emphasis on broader ecological impacts has led to complementary methodologies involving laboratory toxicity tests and field studies of species richness and diversity.

  2. Stochastic Modeling Approach for the Evaluation of Backbreak due to Blasting Operations in Open Pit Mines

    NASA Astrophysics Data System (ADS)

    Sari, Mehmet; Ghasemi, Ebrahim; Ataei, Mohammad

    2014-03-01

    Backbreak is an undesirable side effect of bench blasting operations in open pit mines. A large number of parameters affect backbreak, including controllable parameters (such as blast design parameters and explosive characteristics) and uncontrollable parameters (such as rock and discontinuities properties). The complexity of the backbreak phenomenon and the uncertainty in terms of the impact of various parameters makes its prediction very difficult. The aim of this paper is to determine the suitability of the stochastic modeling approach for the prediction of backbreak and to assess the influence of controllable parameters on the phenomenon. To achieve this, a database containing actual measured backbreak occurrences and the major effective controllable parameters on backbreak (i.e., burden, spacing, stemming length, powder factor, and geometric stiffness ratio) was created from 175 blasting events in the Sungun copper mine, Iran. From this database, first, a new site-specific empirical equation for predicting backbreak was developed using multiple regression analysis. Then, the backbreak phenomenon was simulated by the Monte Carlo (MC) method. The results reveal that stochastic modeling is a good means of modeling and evaluating the effects of the variability of blasting parameters on backbreak. Thus, the developed model is suitable for practical use in the Sungun copper mine. Finally, a sensitivity analysis showed that stemming length is the most important parameter in controlling backbreak.
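
    The workflow described above (fit a regression equation, then propagate input variability through it by Monte Carlo sampling) can be illustrated with a toy model. Everything numeric here is an assumption made for illustration: the coefficients, the choice of only two predictors, and the normal input distributions are hypothetical and are not the site-specific equation from the Sungun study.

```python
import random
import statistics

# Hypothetical linear model: backbreak (m) = b0 + b1*burden + b2*stemming_length.
COEFFS = {"intercept": 0.5, "burden": 0.6, "stemming": -0.4}

def predict_backbreak(burden, stemming):
    """Evaluate the made-up regression equation for one blast design."""
    return COEFFS["intercept"] + COEFFS["burden"] * burden + COEFFS["stemming"] * stemming

def simulate_backbreak(n_trials, seed=0):
    """Sample the inputs from assumed normal distributions and collect predictions."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        burden = rng.gauss(4.0, 0.3)      # assumed mean and spread, in metres
        stemming = rng.gauss(3.5, 0.4)    # assumed mean and spread, in metres
        results.append(predict_backbreak(burden, stemming))
    return results

samples = simulate_backbreak(10_000)
print(round(statistics.mean(samples), 2), round(statistics.stdev(samples), 2))
```

    Repeating the simulation while varying one input's spread at a time is one simple way to approach the kind of sensitivity analysis the abstract mentions.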

  3. Identifying high-cost patients using data mining techniques and a small set of non-trivial attributes.

    PubMed

    Izad Shenas, Seyed Abdolmotalleb; Raahemi, Bijan; Hossein Tekieh, Mohammad; Kuziemsky, Craig

    2014-10-01

    In this paper, we use data mining techniques, namely neural networks and decision trees, to build predictive models to identify very high-cost patients in the top 5 percent of the general population. A large empirical dataset from the Medical Expenditure Panel Survey with 98,175 records was used in our study. After pre-processing, partitioning and balancing the data, the refined dataset of 31,704 records was modeled by Decision Trees (including C5.0 and CHAID), and Neural Networks. The performance of the models is analyzed using various measures including accuracy, G-mean, and Area under ROC curve. We concluded that the CHAID classifier returns the best G-mean and AUC measures for top performing predictive models ranging from 76% to 85%, and 0.812 to 0.942 units, respectively. We also identify a small set of 5 non-trivial attributes among a primary set of 66 attributes to identify the top 5% of the high cost population. The attributes are the individual's overall health perception, age, history of blood cholesterol check, history of physical/sensory/mental limitations, and history of colonic prevention measures. We call this small set of attributes non-trivial because it does not include visits to care providers, doctors or hospitals, which are highly correlated with expenditures and do not offer new insight into the data. The results of this study can be used by healthcare data analysts, policy makers, insurers, and healthcare planners to improve the delivery of health services.
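
    G-mean, one of the measures named in this abstract, is commonly defined as the geometric mean of sensitivity and specificity. The short sketch below shows why it suits a rare-class problem such as the top 5% of patients; the confusion-matrix counts are hypothetical and are not taken from the study.

```python
import math


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity (TP rate) and specificity (TN rate)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


def accuracy(tp, fn, tn, fp):
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical counts: 100 high-cost patients among 1,100 records.
useful_model = dict(tp=80, fn=20, tn=900, fp=100)
# A degenerate model that labels every patient "not high-cost".
trivial_model = dict(tp=0, fn=100, tn=1000, fp=0)

for name, cm in [("useful", useful_model), ("trivial", trivial_model)]:
    print(name, round(accuracy(**cm), 3), round(g_mean(**cm), 3))
```

    The trivial model scores the higher accuracy but a G-mean of zero, which is why accuracy alone is a poor guide on imbalanced data.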

  4. Soil quality assessment using GIS-based chemometric approach and pollution indices: Nakhlak mining district, Central Iran.

    PubMed

    Moore, Farid; Sheykhi, Vahideh; Salari, Mohammad; Bagheri, Adel

    2016-04-01

    This paper is a comprehensive assessment of the quality of soil in the Nakhlak mining district in Central Iran with special reference to potentially toxic metals. In this regard, an integrated approach involving geostatistical, correlation matrix, pollution indices, and chemical fractionation measurement is used to evaluate selected potentially toxic metals in soil samples. The fractionation of metals indicated a relatively high variability. Some metals (Mo, Ag, and Pb) showed important enrichment in the bioavailable fractions (i.e., exchangeable and carbonate), whereas the residual fraction mostly comprised Sb and Cr. The Cd, Zn, Co, Ni, Mo, Cu, and As were retained in Fe-Mn oxide and oxidizable fractions, suggesting that they may be released to the environment by changes in physicochemical conditions. The spatial variability patterns of 11 soil heavy metals (Ag, As, Cd, Co, Cr, Cu, Mo, Ni, Pb, Sb, and Zn) were identified and mapped. The results demonstrated that Ag, As, Cd, Mo, Cu, Pb, Sb, and Zn pollution are associated with mineralized veins and mining operations in this area. Further environmental monitoring and remedial actions are required for management of soil heavy metals in the study area. The present study not only enhanced our knowledge regarding soil pollution in the study area but also introduced a better technique to analyze pollution indices by multivariate geostatistical methods.

  5. Implementation of Predictive Data Mining Techniques for Identifying Risk Factors of Early AVF Failure in Hemodialysis Patients

    PubMed Central

    Rezapour, Mohammad; Khavanin Zadeh, Morteza; Sepehri, Mohammad Mehdi

    2013-01-01

    Arteriovenous fistula (AVF) is an important vascular access for hemodialysis (HD) treatment but has a 20–60% rate of early failure. Detecting associations between patient parameters and early AVF failure is important for reducing its prevalence and the associated costs. Predicting the incidence of this complication in new patients is also a beneficial control measure. Patient safety and prevention of early AVF failure are the ultimate goals. Our study was conducted at the Hasheminejad Kidney Center (HKC) of Tehran, which is one of Iran's largest renal hospitals. We analyzed data from 193 HD patients using supervised data mining techniques. There were 137 male (70.98%) and 56 female (29.02%) patients included in this study. The mean age of all patients was 53.87 ± 17.47 years. Twenty-eight patients had smoked, and the numbers of diabetic and nondiabetic patients were 87 and 106, respectively. A significant relationship was found between “diabetes mellitus,” “smoking,” and “hypertension” and early AVF failure in this study. We found that these risk factors play important roles in the outcome of vascular surgery, in contrast to other parameters such as “age.” We then predicted this complication in future AVF surgeries and evaluated our prediction methods, obtaining accuracy rates of 61.66%–75.13%. PMID:23861725

  6. Data mining with molecular design rules identifies new class of dyes for dye-sensitised solar cells.

    PubMed

    Cole, Jacqueline M; Low, Kian Sing; Ozoe, Hiroaki; Stathi, Panagiota; Kitamura, Chitoshi; Kurata, Hiroyuki; Rudolf, Petra; Kawase, Takeshi

    2014-12-28

    A major deficit in suitable dyes is stifling progress in the dye-sensitised solar cell (DSC) industry. Materials discovery strategies have afforded numerous new dyes; yet, corresponding solution-based DSC device performance has little improved upon 11% efficiency, achieved using the N719 dye over two decades ago. Research on these dyes has nevertheless revealed relationships between the molecular structure of dyes and their associated DSC efficiency. Here, such structure-property relationships have been codified in the form of molecular dye design rules, which have been judiciously sequenced in an algorithm to enable large-scale data mining of dye structures with optimal DSC performance. This affords, for the first time, a DSC-specific dye-discovery strategy that predicts new classes of dyes from surveying a representative set of chemical space. A lead material from these predictions is experimentally validated, showing DSC efficiency that is comparable to many well-known organic dyes. This demonstrates the power of this approach.

  7. An Approach to Identify Site Response Directivity of Accelerometer Sites and Application to the Iranian Area

    NASA Astrophysics Data System (ADS)

    Del Gaudio, Vincenzo; Pierri, Pierpaolo; Rajabi, Ali M.

    2015-06-01

    In recent years, several workers have found numerous cases of sites characterised by significant azimuthal variation of dynamic response to seismic shaking. The causes of this phenomenon are still unclear, but are possibly related to combinations of geological and geomorphological factors determining a polarisation of resonance effects. To improve their comprehension, it would be desirable to extend the database of observations on this phenomenon. Thus, considering that unrevealed cases of site response directivity can be "hidden" among the sites of accelerometer networks, we developed a two-stage approach of data mining from existing strong motion databases to identify sites affected by directional amplification. The proposed procedure first calculates Arias Intensity tensor components from accelerometer recordings of each site to determine mean directional variations of total shaking energy. Then, at the sites where a significant anisotropy appears in ground motion, azimuthal variations of HVSR values (spectral ratios between horizontal and vertical components of recordings) are analysed to confirm the occurrence of site resonance conditions. We applied this technique to a database of recordings acquired by accelerometer stations in the Iranian area. The results of this investigation pointed out some sites affected by directional resonance that appear to be correlated to the orientation of local tectonic lineaments, these being mostly transversal to the direction of maximum shaking. Comparing Arias Intensities observed at these sites with theoretical estimates provided by ground motion prediction equations, the presence of significant site amplifications was confirmed. The magnitude of the amplification factors appears to be correlated to the results of HVSR analysis, even though the pattern of dispersion of HVSR values suggests that while high peak values of spectral ratios are indicative of strong amplifications, lower values do not necessarily imply lower
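
    The first stage of the procedure in this record relies on Arias Intensity tensor components, which in the standard definition are I_ij = (π / 2g) ∫ a_i(t) a_j(t) dt. The sketch below computes the horizontal components for a single record and the azimuth of maximum intensity. The synthetic record, the sampling interval, and the function names are assumptions for illustration; the abstract does not give the authors' averaging over many recordings or their HVSR stage, and neither is reproduced here.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def arias_tensor(ax, ay, dt):
    """Horizontal Arias Intensity tensor components (Ixx, Iyy, Ixy).

    ax, ay: acceleration samples (m/s^2) on two orthogonal horizontal axes.
    The time integral is approximated with a simple rectangle rule.
    """
    k = math.pi / (2.0 * G)
    ixx = k * sum(a * a for a in ax) * dt
    iyy = k * sum(a * a for a in ay) * dt
    ixy = k * sum(a * b for a, b in zip(ax, ay)) * dt
    return ixx, iyy, ixy


def principal_azimuth(ixx, iyy, ixy):
    """Direction of maximum intensity, in degrees from x toward y, in [0, 180)."""
    theta = 0.5 * math.atan2(2.0 * ixy, ixx - iyy)
    return math.degrees(theta) % 180.0


# Synthetic record (assumption): a 2 Hz sinusoid polarised 30 degrees from x.
dt = 0.01
polarisation = math.radians(30.0)
signal = [math.sin(2.0 * math.pi * 2.0 * i * dt) for i in range(1000)]
ax = [s * math.cos(polarisation) for s in signal]
ay = [s * math.sin(polarisation) for s in signal]

ixx, iyy, ixy = arias_tensor(ax, ay, dt)
azimuth = principal_azimuth(ixx, iyy, ixy)
print(f"Ixx={ixx:.4f}  Iyy={iyy:.4f}  Ixy={ixy:.4f}  azimuth={azimuth:.1f} deg")
```

    For a perfectly polarised signal the recovered azimuth matches the polarisation direction; real records would show a weaker, noisier anisotropy.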

  8. Knowledge Discovery using Domain-Concept Mining Approach for the Behavioral Risk Factor Surveillance System (BRFSS) Data

    PubMed Central

    Mahamaneerat, Wannapa Kay; Shyu, Chi-Ren

    2006-01-01

    The publicly available Behavioral Risk Factor Surveillance System (BRFSS) data is the largest telephone survey data set in the world. The data set is often under-utilized because of its size and the difficulty of comprehending and exploring the relationships among variables. With a traditional data mining approach, such as association rule (AR) mining, it is still not possible to discover valuable information with the existing computational power. To promote the usefulness of this rich data set efficiently, we propose a novel data mining approach called Domain-Concept Mining (DCM) that partitions data into groups of relevant domain-concept, then extracts associations among variables from each partition. The findings from the DCM show that it can efficiently discover relevant information from the BRFSS with respect to the previously published literature. PMID:17238640

  9. A Control Chart Approach for Representing and Mining Data Streams with Shape Based Similarity

    SciTech Connect

    Omitaomu, Olufemi A

    2014-01-01

    The mining of data streams for online condition monitoring is a challenging task in several domains including (electric) power grid system, intelligent manufacturing, and consumer science. Consider a power grid application in which thousands of sensors, called phasor measurement units, are deployed on the power grid network to continuously collect streams of digital data for real-time situational awareness and system management. Depending on design, each sensor could stream between ten and sixty data samples per second. The myriad of sensory data captured could convey deeper insights about the sequence of events in real time and before major damage is done. However, the timely processing and analysis of these high-velocity and high-volume data streams is a challenge. Hence, a new data processing and transformation approach, based on the concept of control charts, for representing sequences of data streams from sensors is proposed. In addition, an application of the proposed approach for enhancing data mining tasks such as clustering using real-world power grid data streams is presented. The results indicate that the proposed approach is very efficient for data streams storage and manipulation.
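
    One way to picture a control-chart-based representation of a sensor stream is to replace each sample with a symbol that records where it falls relative to control limits. The sketch below uses Shewhart-style mean ± 3σ limits estimated from a baseline window; that choice, the three-symbol encoding, and the sample values are assumptions, since the abstract does not specify the transformation actually proposed.

```python
import statistics


def control_limits(baseline, k=3.0):
    """Lower and upper control limits: baseline mean -/+ k standard deviations."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return mu - k * sigma, mu + k * sigma


def encode(stream, lcl, ucl):
    """Map each sample to -1 (below LCL), 0 (in control) or +1 (above UCL)."""
    return [(-1 if x < lcl else 1 if x > ucl else 0) for x in stream]


# Hypothetical frequency readings (Hz) from one sensor.
baseline = [60.0, 60.1, 59.9, 60.0, 60.2, 59.8, 60.0, 60.1, 59.9, 60.0]
lcl, ucl = control_limits(baseline)

stream = [60.0, 60.1, 61.0, 59.9, 58.9, 60.0]
symbols = encode(stream, lcl, ucl)
print(f"limits: [{lcl:.3f}, {ucl:.3f}]  symbols: {symbols}")
```

    A compact symbol sequence of this kind is cheaper to store than raw samples and can be fed to clustering or similarity search.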

  10. The adaptive approach for storage assignment by mining data of warehouse management system for distribution centres

    NASA Astrophysics Data System (ADS)

    Ming-Huang Chiang, David; Lin, Chia-Ping; Chen, Mu-Chen

    2011-05-01

    Among distribution centre operations, order picking has been reported to be the most labour-intensive activity. Sophisticated storage assignment policies adopted to reduce the travel distance of order picking have been explored in the literature. Unfortunately, previous research has been devoted to locating entire products from scratch. Instead, this study intends to propose an adaptive approach, a Data Mining-based Storage Assignment approach (DMSA), to find the optimal storage assignment for newly delivered products that need to be put away when there is vacant shelf space in a distribution centre. In the DMSA, a new association index (AIX) is developed to evaluate the fitness between the put away products and the unassigned storage locations by applying association rule mining. With AIX, the storage location assignment problem (SLAP) can be formulated and solved as a binary integer programming. To evaluate the performance of DMSA, a real-world order database of a distribution centre is obtained and used to compare the results from DMSA with a random assignment approach. It turns out that DMSA outperforms random assignment as the number of put away products and the proportion of put away products with high turnover rates increase.

  11. VALUING ACID MINE DRAINAGE REMEDIATION OF IMPAIRED WATERWAYS IN WEST VIRGINIA: A HEDONIC MODELING APPROACH

    EPA Science Inventory

    States with active and abandoned mines face large private and public costs to remediate damage to streams and rivers from acid mine drainage (AMD), the metal rich runoff flowing primarily from abandoned mines and surface deposits of mine waste. AMD can lower stream and river pH ...

  12. A multi-disciplinary approach to understanding the impacts of mines on traditional uses of water in Northern Mongolia.

    PubMed

    McIntyre, Neil; Bulovic, Nevenka; Cane, Isabel; McKenna, Phill

    2016-07-01

    Mongolia is an example of a nation where the rapidity of mining development is outpacing capacity to manage the potential land and water resources impacts. Further, Mongolia has a particular social and economic reliance on traditional uses of land and water, principally livestock herding. While some mining operations are setting high standards in protecting the natural resources surrounding the mine site, others have less incentive and capacity to do so and therefore are having adverse effects on surrounding communities. The paper describes a case study of the Sharyn Gol Soum in northern Mongolia where a range of mining types, from artisanal, small-scale mining to a large coal mine, operate alongside traditional herding lifestyles. A multi-disciplinary approach is taken to observe and attribute causes to the water resources impacts in the area. Surveys of the herding household community, land use mapping, and monitoring the spatial variations in water quality indicate deterioration of water resources. Collectively, the different sources of evidence suggest that the deterioration is mainly due to small-scale gold mining. The evidence included the perception of 78% of the interviewed herders that water quality had changed due to mining; a change in the footprint of small-scale gold mining from 2.8 to 15.2 km² during the period 1999 to 2015; and pH and sulphate values in 2015 consistently outside the ranges observed at a baseline site in the same region. It is concluded that the lack of baseline data and effective governance mechanisms are fundamental challenges that need to be addressed if Mongolia's transition to a mining economy is to be managed alongside sustainability of herder lifestyles.

  13. A Data Mining Approach to Predict In Situ Detoxification Potential of Chlorinated Ethenes.

    PubMed

    Lee, Jaejin; Im, Jeongdae; Kim, Ungtae; Löffler, Frank E

    2016-05-17

    Despite advances in physicochemical remediation technologies, in situ bioremediation treatment based on Dehalococcoides mccartyi (Dhc) reductive dechlorination activity remains a cornerstone approach to remedy sites impacted with chlorinated ethenes. Selecting the best remedial strategy is challenging due to uncertainties and complexity associated with biological and geochemical factors influencing Dhc activity. Guidelines based on measurable biogeochemical parameters have been proposed, but contemporary efforts fall short of meaningfully integrating the available information. Extensive groundwater monitoring data sets have been collected for decades, but have not been systematically analyzed and used for developing tools to guide decision-making. In the present study, geochemical and microbial data sets collected from 35 wells at five contaminated sites were used to demonstrate that a data mining prediction model using the classification and regression tree (CART) algorithm can provide improved predictive understanding of a site's reductive dechlorination potential. The CART model successfully predicted the 3-month-ahead reductive dechlorination potential with 75.8% and 69.5% true positive rate (i.e., sensitivity) for the training set and the test set, respectively. The machine learning algorithm ranked parameters by relative importance for assessing in situ reductive dechlorination potential. The abundance of Dhc 16S rRNA genes, CH₄, Fe²⁺, NO₃⁻, NO₂⁻, and SO₄²⁻ concentrations, total organic carbon (TOC) amounts, and oxidation-reduction potential (ORP) displayed significant correlations (p < 0.01) with dechlorination potential, with NO₃⁻, NO₂⁻, and Fe²⁺ concentrations exhibiting precedence over other parameters. Contrary to prior efforts, the power of data mining approaches lies in the ability to discern synergetic effects between multiple parameters that affect reductive dechlorination activity. Overall, these findings demonstrate that data mining

  14. A data mining approach to optimize pellets manufacturing process based on a decision tree algorithm.

    PubMed

    Ronowicz, Joanna; Thommes, Markus; Kleinebudde, Peter; Krysiński, Jerzy

    2015-06-20

    The present study is focused on the thorough analysis of cause-effect relationships between pellet formulation characteristics (pellet composition as well as process parameters) and the selected quality attribute of the final product. Pellet quality was expressed by shape, measured as the aspect ratio. A data matrix for chemometric analysis consisted of 224 pellet formulations prepared with eight different active pharmaceutical ingredients and several excipients, using different extrusion/spheronization process conditions. The data set contained 14 input variables (both formulation and process variables) and one output variable (pellet aspect ratio). A tree regression algorithm consistent with the Quality by Design concept was applied to obtain deeper understanding and knowledge of formulation and process parameters affecting the final pellet sphericity. A clear, interpretable set of decision rules was generated. The spheronization speed, spheronization time, number of holes and water content of extrudate have been recognized as the key factors influencing pellet aspect ratio. The most spherical pellets were achieved by using a large number of holes during extrusion, a high spheronizer speed and a longer spheronization time. The described data mining approach enhances knowledge about the pelletization process and simultaneously facilitates searching for the optimal process conditions which are necessary to achieve ideal spherical pellets, resulting in good flow characteristics. This data mining approach can be taken into consideration by industrial formulation scientists to support rational decision making in the field of pellets technology.
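
    A regression tree of the kind mentioned in this abstract is built by repeatedly choosing the split that most reduces the squared error of the output variable. The sketch below finds one such split on one input. The speed and aspect-ratio values are invented for illustration, and the helper names are hypothetical; the study's own algorithm settings are not given in the abstract.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) for the rule x <= threshold that
    minimises the summed squared error of the two resulting groups."""
    pairs = sorted(zip(xs, ys))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical data: spheronizer speed (rpm) vs. pellet aspect ratio.
speed = [500, 600, 700, 800, 900, 1000]
aspect_ratio = [1.40, 1.38, 1.35, 1.12, 1.10, 1.08]

threshold, error = best_split(speed, aspect_ratio)
print(f"rule: speed <= {threshold} rpm  (SSE after split: {error:.4f})")
```

    Each chosen split reads directly as a decision rule, which is what makes tree models easy to interpret for formulation work.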

  15. Using Data Mining and Computational Approaches to Study Intermediate Filament Structure and Function.

    PubMed

    Parry, David A D

    2016-01-01

    Experimental and theoretical research aimed at determining the structure and function of the family of intermediate filament proteins has made significant advances over the past 20 years. Much of this has either contributed to or relied on the amino acid sequence databases that are now available online, and the data mining approaches that have been developed to analyze these sequences. As the quality of sequence data is generally high, it follows that it is the design of the computational and graphical methodologies that are of especial importance to researchers who aspire to gain a greater understanding of those sequence features that specify both function and structural hierarchy. However, these techniques are necessarily subject to limitations and it is important that these be recognized. In addition, no single method is likely to be successful in solving a particular problem, and a coordinated approach using a suite of methods is generally required. A final step in the process involves the interpretation of the results obtained and the construction of a working model or hypothesis that suggests further experimentation. While such methods allow meaningful progress to be made it is still important that the data are interpreted correctly and conservatively. New data mining methods are continually being developed, and it can be expected that even greater understanding of the relationship between structure and function will be gleaned from sequence data in the coming years.

  16. Constraint-based control of boiler efficiency: A data-mining approach

    SciTech Connect

    Song, Z.; Kusiak, A.

    2007-02-15

    In this paper, a data-mining approach is used to develop a model for optimizing the efficiency of an electric-utility boiler subject to operating constraints. Selection of process variables to optimize combustion efficiency is discussed. The selected variables are critical for control of combustion efficiency of a coal-fired boiler in the presence of operating constraints. Two schemes of generating control settings and updating control variables are evaluated. One scheme is based on the controllable and noncontrollable variables. The second one incorporates response variables into the clustering process. The process control scheme based on the response variables produces the smallest variance of the target variable due to reduced coupling among the process variables. An industrial case study and its implementation illustrate the control approach developed in this paper.

  17. Citation Mining: Integrating Text Mining and Bibliometrics for Research User Profiling.

    ERIC Educational Resources Information Center

    Kostoff, Ronald N.; del Rio, J. Antonio; Humenik, James A.; Garcia, Esther Ofilia; Ramirez, Ana Maria

    2001-01-01

    Discusses the importance of identifying the users and impact of research, and describes an approach for identifying the pathways through which research can impact other research, technology development, and applications. Describes a study that used citation mining, an integration of citation bibliometrics and text mining, on articles from the…

  18. Order Batching in Warehouses by Minimizing Total Tardiness: A Hybrid Approach of Weighted Association Rule Mining and Genetic Algorithms

    PubMed Central

    Taheri, Shahrooz; Mat Saman, Muhamad Zameri; Wong, Kuan Yew

    2013-01-01

    One of the cost-intensive issues in managing warehouses is the order picking problem, which deals with the retrieval of items from their storage locations in order to meet customer requests. Many solution approaches have been proposed to minimize traveling distance in the process of order picking. However, in practice, customer orders have to be completed by certain due dates in order to avoid tardiness, which is neglected in most of the related scientific papers. Consequently, we propose a novel four-phase solution approach to minimize tardiness. First, weighted association rule mining is used to calculate associations between orders with respect to their due dates. Next, a batching model based on binary integer programming is formulated to maximize the associations between orders within each batch. Subsequently, in the order picking phase, a Genetic Algorithm integrated with the Traveling Salesman Problem is used to identify the most suitable travel path. Finally, a Genetic Algorithm is applied to sequence the constructed batches so as to minimize tardiness. Illustrative examples and comparisons are presented to demonstrate the proficiency and solution quality of the proposed approach. PMID:23864823
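
    The objective of the final phase in this record, sequencing batches to minimize total tardiness, can be sketched on a toy instance. Exhaustive enumeration is used here in place of the Genetic Algorithm named in the abstract, which is only practical for a handful of batches. The batch names, processing times, and due dates are hypothetical.

```python
from itertools import permutations

# Hypothetical batches: name -> (picking time, due dates of the orders in it).
BATCHES = {
    "A": (4, [10, 12]),
    "B": (3, [5]),
    "C": (5, [9, 20]),
}


def total_tardiness(sequence):
    """Sum over all orders of max(0, batch completion time - due date)."""
    clock = 0
    tardiness = 0
    for name in sequence:
        duration, due_dates = BATCHES[name]
        clock += duration  # every order in a batch finishes when the batch does
        tardiness += sum(max(0, clock - due) for due in due_dates)
    return tardiness


# Brute force over all orderings (a stand-in for the Genetic Algorithm).
best_sequence = min(permutations(BATCHES), key=total_tardiness)
print(best_sequence, total_tardiness(best_sequence))
```

    With n batches there are n! orderings, which is why a metaheuristic such as a Genetic Algorithm is used once the instance grows.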

  19. Order batching in warehouses by minimizing total tardiness: a hybrid approach of weighted association rule mining and genetic algorithms.

    PubMed

    Azadnia, Amir Hossein; Taheri, Shahrooz; Ghadimi, Pezhman; Saman, Muhamad Zameri Mat; Wong, Kuan Yew

    2013-01-01

    One of the cost-intensive issues in managing warehouses is the order picking problem, which deals with the retrieval of items from their storage locations in order to meet customer requests. Many solution approaches have been proposed to minimize traveling distance in the process of order picking. However, in practice, customer orders have to be completed by certain due dates in order to avoid tardiness, which is neglected in most of the related scientific papers. Consequently, we propose a novel four-phase solution approach to minimize tardiness. First, weighted association rule mining is used to calculate associations between orders with respect to their due dates. Next, a batching model based on binary integer programming is formulated to maximize the associations between orders within each batch. Subsequently, in the order picking phase, a Genetic Algorithm integrated with the Traveling Salesman Problem is used to identify the most suitable travel path. Finally, a Genetic Algorithm is applied to sequence the constructed batches so as to minimize tardiness. Illustrative examples and comparisons are presented to demonstrate the proficiency and solution quality of the proposed approach.

  20. Use of Lead Isotopes to Identify Sources of Metal and Metalloid Contaminants in Atmospheric Aerosol from Mining Operations

    PubMed Central

    Félix, Omar I.; Csavina, Janae; Field, Jason; Rine, Kyle P.; Sáez, A. Eduardo; Betterton, Eric A.

    2014-01-01

    Mining operations are a potential source of metal and metalloid contamination by atmospheric particulate generated from smelting activities, as well as from erosion of mine tailings. In this work, we show how lead isotopes can be used for source apportionment of metal and metalloid contaminants from the site of an active copper mine. Analysis of atmospheric aerosol shows two distinct isotopic signatures: one prevalent in fine particles (< 1 μm aerodynamic diameter) while the other corresponds to coarse particles as well as particles in all size ranges from a nearby urban environment. The lead isotopic ratios found in the fine particles are equal to those of the mine that provides the ore to the smelter. Topsoil samples at the mining site show concentrations of Pb and As decreasing with distance from the smelter. Isotopic ratios for the sample closest to the smelter (650 m) and from topsoil at all sample locations, extending to more than 1 km from the smelter, were similar to those found in fine particles in atmospheric dust. The results validate the use of lead isotope signatures for source apportionment of metal and metalloid contaminants transported by atmospheric particulate. PMID:25496740

  1. A Comparison of Educational Statistics and Data Mining Approaches to Identify Characteristics That Impact Online Learning

    ERIC Educational Resources Information Center

    Miller, L. Dee; Soh, Leen-Kiat; Samal, Ashok; Kupzyk, Kevin; Nugent, Gwen

    2015-01-01

    Learning objects (LOs) are important online resources for both learners and instructors and usage for LOs is growing. Automatic LO tracking collects large amounts of metadata about individual students as well as data aggregated across courses, learning objects, and other demographic characteristics (e.g. gender). The challenge becomes identifying…

  2. The first step in the development of text mining technology for cancer risk assessment: identifying and organizing scientific evidence in risk assessment literature

    PubMed Central

    Korhonen, Anna; Silins, Ilona; Sun, Lin; Stenius, Ulla

    2009-01-01

    Background: One of the most neglected areas of biomedical Text Mining (TM) is the development of systems based on carefully assessed user needs. We have recently investigated the user needs of an important task yet to be tackled by TM -- Cancer Risk Assessment (CRA). Here we take the first step towards the development of TM technology for the task: identifying and organizing the scientific evidence required for CRA in a taxonomy which is capable of supporting extensive data gathering from biomedical literature. Results: The taxonomy is based on expert annotation of 1297 abstracts downloaded from relevant PubMed journals. It classifies 1742 unique keywords found in the corpus to 48 classes which specify core evidence required for CRA. We report promising results with inter-annotator agreement tests and automatic classification of PubMed abstracts to taxonomy classes. A simple user test is also reported in a near real-world CRA scenario which demonstrates along with other evaluation that the resources we have built are well-defined, accurate, and applicable in practice. Conclusion: We present our annotation guidelines and a tool which we have designed for expert annotation of PubMed abstracts. A corpus annotated for keywords and document relevance is also presented, along with the taxonomy which organizes the keywords into classes defining core evidence for CRA. As demonstrated by the evaluation, the materials we have constructed provide a good basis for classification of CRA literature along multiple dimensions. They can support current manual CRA as well as facilitate the development of an approach based on TM. We discuss extending the taxonomy further via manual and machine learning approaches and the subsequent steps required to develop TM technology for the needs of CRA. PMID:19772619

  3. A data mining approach to predict in situ chlorinated ethene detoxification potential

    NASA Astrophysics Data System (ADS)

    Lee, J.; Im, J.; Kim, U.; Loeffler, F. E.

    2015-12-01

    Despite major advances in physicochemical remediation technologies, in situ biostimulation and bioaugmentation treatment aimed at stimulating Dehalococcoides mccartyi (Dhc) reductive dechlorination activity remains a cornerstone approach to remedy sites impacted with chlorinated ethenes. In practice, selecting the best remedial strategy is challenging due to uncertainties associated with the microbiology (e.g., presence and activity of Dhc) and geochemical factors influencing Dhc activity. Extensive groundwater datasets collected over decades of monitoring exist, but have not been systematically analyzed. In the present study, geochemical and microbial data sets collected from 35 wells at 5 contaminated sites were used to develop a predictive empirical model using a machine learning algorithm (i) to rank the relative importance of parameters that affect in situ reductive dechlorination potential, and (ii) to provide recommendations for selecting the optimal remediation strategy at a specific site. Classification and regression tree (CART) analysis was applied, and a representative classification tree model was developed that allowed short-term prediction of dechlorination potential. Indirect indicators for low dissolved oxygen (e.g., low NO₃⁻ and NO₂⁻, high Fe²⁺ and CH₄) were the most influential factors for predicting dechlorination potential, followed by total organic carbon content (TOC) and Dhc cell abundance. These findings indicate that machine learning-based data mining techniques applied to groundwater monitoring data can lead to the development of predictive groundwater remediation models. A major need for improving the predictive capabilities of the data mining approach is a curated, up-to-date and comprehensive collection of groundwater monitoring data.
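
    A CART classification tree chooses each split by minimizing the weighted Gini impurity of the resulting groups, and the parameters that yield the purest splits rank as most influential. The sketch below evaluates one candidate parameter. The nitrate values, labels, and function names are hypothetical; none of them come from the study's data.

```python
def gini(labels):
    """Binary Gini impurity, 1 - p^2 - (1 - p)^2, for labels in {0, 1}."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_gini_split(xs, labels):
    """Return (threshold, weighted_gini) for the rule x <= threshold
    that gives the lowest size-weighted impurity of the two groups."""
    pairs = sorted(zip(xs, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical x values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Hypothetical wells: nitrate (mg/L) and whether dechlorination was observed.
nitrate = [0.1, 0.2, 0.3, 0.5, 2.0, 3.5, 5.0, 8.0]
dechlorination = [1, 1, 1, 1, 0, 1, 0, 0]

threshold, impurity = best_gini_split(nitrate, dechlorination)
print(f"rule: nitrate <= {threshold} mg/L  (weighted Gini: {impurity:.4f})")
```

    Repeating this search over every parameter and recursing into each group yields the full tree.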

  4. Three-dimensional organic Dirac-line materials due to nonsymmorphic symmetry: A data mining approach

    NASA Astrophysics Data System (ADS)

    Geilhufe, R. Matthias; Bouhon, Adrien; Borysov, Stanislav S.; Balatsky, Alexander V.

    2017-01-01

    A data mining study of electronic Kohn-Sham band structures was performed to identify Dirac materials within the Organic Materials Database. Among these, the three-dimensional organic crystal 5,6-bis(trifluoromethyl)-2-methoxy-1H-1,3-diazepine was found to host different Dirac-line nodes within the band structure. From a group theoretical analysis, it is possible to distinguish between Dirac-line nodes occurring due to twofold degenerate energy levels protected by the monoclinic crystalline symmetry and twofold degenerate accidental crossings protected by the topology of the electronic band structure. The obtained results can be generalized to all materials having the space group P2₁/c (No. 14, C₂ₕ⁵) by introducing three distinct topological classes.

  5. Integrating Communication into Engineering Curricula: An Interdisciplinary Approach to Facilitating Transfer at New Mexico Institute of Mining and Technology

    ERIC Educational Resources Information Center

    Ford, Julie Dyke

    2012-01-01

    This program profile describes a new approach towards integrating communication within Mechanical Engineering curricula. The author, who holds a joint appointment between Technical Communication and Mechanical Engineering at New Mexico Institute of Mining and Technology, has been collaborating with Mechanical Engineering colleagues to establish a…

  6. An Approach to Developing Independent Learning and Non-Technical Skills Amongst Final Year Mining Engineering Students

    ERIC Educational Resources Information Center

    Knobbs, C. G.; Grayson, D. J.

    2012-01-01

    There is mounting evidence to show that engineers need more than technical skills to succeed in industry. This paper describes a curriculum innovation in which so-called "soft" skills, specifically inter-personal and intra-personal skills, were integrated into a final year mining engineering course. The instructional approach was…

  7. Stream Response to Storm Events Downstream of Mine Tailings: Identifying Contaminant Sources Using Hydrograph Separation and Stream Chemistry

    NASA Astrophysics Data System (ADS)

    Holmes, J.; Renshaw, C. E.; Feng, X.

    2001-05-01

    Quantifying sources of contamination is paramount to good remediation plans at abandoned mine sites. We collected surface water samples from Copperas Brook, a second order stream draining over 16 ha (40 acres) of mine tailings from the abandoned Elizabeth Copper Mine in east central Vermont. Streamflow exhibits a rapid response to rain events. Hydrograph separations using oxygen isotopes consistently indicate considerably higher percentages of new water during rain events compared to a nearby control catchment and to other northeastern U.S. catchments. We attribute most of the new water to direct precipitation on low-infiltration hardpans at the base of the mine tailings, as well as to direct precipitation on to the stream channel itself. In stormflow, base cations (Ca, Mg, Na, K) are diluted, consistent with other studies. By contrast, heavy metal concentrations (Cu, Zn, Cd, Co) increase by up to an order of magnitude. Other studies have suggested that the increased metals in stormflow may be the result of rapid dissolution and transport of the soluble efflorescent sulfate minerals coating the hardpans. Copperas Brook could be highly susceptible to this process given the high percentage of new water in its stormflow. However, multiple regression of stormflow chemical source end-members shows that neither dissolved sulfur salts nor groundwater seeps from the major tailings pile are primarily responsible for the increased metals concentrations at this site. Rather, the majority of heavy metals derive from an isolated 2 ha (5 acres) tailings pile via a pathway that is not connected with the major tailings. This may have profound implications for prioritizing the remediation of this site.
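
    The isotope-based hydrograph separation mentioned above is commonly done with a two-component mixing model. The sketch below shows that standard calculation, assuming one "old" (pre-event) and one "new" (event) end-member; the delta-18O values and the discharge are hypothetical, not measurements from Copperas Brook.

```python
def new_water_fraction(delta_stream, delta_old, delta_new):
    """Fraction of streamflow that is new (event) water in a two-component mixing model.

    Mass balance: delta_stream = f_new * delta_new + (1 - f_new) * delta_old.
    """
    if delta_new == delta_old:
        raise ValueError("end-members must differ isotopically")
    return (delta_stream - delta_old) / (delta_new - delta_old)


# Hypothetical delta-18O values (per mil) and total discharge (L/s).
f_new = new_water_fraction(delta_stream=-9.0, delta_old=-10.0, delta_new=-6.0)
total_discharge = 40.0
new_water_discharge = f_new * total_discharge
old_water_discharge = (1.0 - f_new) * total_discharge
```

    The same mass-balance idea extends to chemical end-members, although with more than two sources a regression or a system of equations is needed rather than this single ratio.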

  8. The Voice of Chinese Health Consumers: A Text Mining Approach to Web-Based Physician Reviews

    PubMed Central

    Zhang, Kunpeng

    2016-01-01

    experience of finding doctors, doctors’ technical skills and bedside manner, general appreciation from patients, and description of various symptoms. Conclusions To the best of our knowledge, our work is the first study using an automated text-mining approach to analyze a large amount of unstructured textual data of Web-based physician reviews in China. Based on our analysis, we found that Chinese reviewers mainly concentrate on a few popular topics. This is consistent with the goal of Chinese online health platforms and demonstrates the health care focus in China’s health care system. Our text-mining approach reveals a new research area on how to use big data to help health care providers, health care administrators, and policy makers hear patient voices, target patient concerns, and improve the quality of care in this age of patient-centered care. Also, on the health care consumer side, our text mining technique helps patients make more informed decisions about which specialists to see without reading thousands of reviews, which is simply not feasible. In addition, our comparison analysis of Web-based physician reviews in China and the United States also indicates some cultural differences. PMID:27165558

  9. Correlation of HIV protease structure with Indinavir resistance: a data mining and neural networks approach

    NASA Astrophysics Data System (ADS)

    Draghici, Sorin; Cumberland, Lonnie T., Jr.; Kovari, Ladislau C.

    2000-04-01

    This paper presents some results of data mining HIV genotypic and structural data. Our aim is to try to relate structural features of HIV enzymes essential to its reproductive abilities to the drug resistance phenomenon. This paper concentrates on the HIV protease enzyme and Indinavir, which is one of the FDA-approved protease inhibitors. Our starting point was the current list of HIV mutations related to drug resistance. We used the fact that some molecular structures determined through high resolution X-ray crystallography were available for the protease-Indinavir complex. Starting with these structures and the known mutations, we modelled the mutant proteases and studied the pattern of atomic contacts between the protease and the drug. After suitable pre-processing, these patterns have been used as the input of our data mining process. We have used both supervised and unsupervised learning techniques with the aim of understanding the relationship between structural features at a molecular level and resistance to Indinavir. The supervised learning was aimed at predicting IC90 values for arbitrary mutants. The self-organizing feature map (SOFM) was aimed at identifying those structural features that are important for drug resistance and discovering a classifier based on such features. We have used validation and cross validation to test the generalization abilities of the learning paradigm we have designed. The straightforward supervised learning was able to learn very successfully but validation results are less than satisfactory. This is due to the insufficient number of patterns in the training set, which in turn is due to the scarcity of the available data. The data mining using SOFM was very successful. We have managed to distinguish between resistant and non-resistant mutants using structural features. We have been able to divide all reported HIV mutants into several categories based on their 3-dimensional molecular structures and the pattern of contacts between the mutant protease and

  10. A Proteomic Approach to Identify Phosphorylation-Dependent Targets of BRCT Domains

    DTIC Science & Technology

    2007-03-01

    Award Number: W81XWH-05-1-0233. Title: A Proteomic Approach to Identify Phosphorylation-Dependent Targets of BRCT... Dates covered: 1 Mar 2006 – 28 Feb 2007. Subject terms: BRCT domain, peptide library, OPAL, peptide array, proteomics, genome wide, signal transduction pathways, androgen receptor.

  11. Identifying Low-Effort Examinees on Student Learning Outcomes Assessment: A Comparison of Two Approaches

    ERIC Educational Resources Information Center

    Rios, Joseph A.; Liu, Ou Lydia; Bridgeman, Brent

    2014-01-01

    This chapter describes a study that compares two approaches (self-reported effort [SRE] and response time effort [RTE]) for identifying low-effort examinees in student learning outcomes assessment. Although both approaches equally discriminated from measures of ability (e.g., SAT scores), RTE was found to have a stronger relationship with test…

  12. Identifying diagnostically-relevant resting state brain functional connectivity in the ventral posterior complex via genetic data mining in autism spectrum disorder.

    PubMed

    Baldwin, Philip R; Curtis, Kaylah N; Patriquin, Michelle A; Wolf, Varina; Viswanath, Humsini; Shaw, Chad; Sakai, Yasunari; Salas, Ramiro

    2016-05-01

    Exome sequencing and copy number variation analyses continue to provide novel insight to the biological bases of autism spectrum disorder (ASD). The growing speed at which massive genetic data are produced causes serious lags in analysis and interpretation of the data. Thus, there is a need to develop systematic genetic data mining processes that facilitate efficient analysis of large datasets. We report a new genetic data mining system, ProcessGeneLists, and integrated a list of ASD-related genes with currently available resources in gene expression and functional connectivity of the human brain. Our data-mining program successfully identified three primary regions of interest (ROIs) in the mouse brain: inferior colliculus, ventral posterior complex of the thalamus (VPC), and parafascicular nucleus (PFn). To understand their pathogenic relevance in ASD, we examined the resting state functional connectivity (RSFC) of the homologous ROIs in human brain with other brain regions that were previously implicated in the neuro-psychiatric features of ASD. Among them, the RSFC of the VPC with the medial frontal gyrus (MFG) was significantly more anticorrelated, whereas the RSFC of the PFn with the globus pallidus was significantly increased in children with ASD compared with healthy children. Moreover, greater values of RSFC between VPC and MFG were correlated with severity index and repetitive behaviors in children with ASD. No significant RSFC differences were detected in adults with ASD. Together, these data demonstrate the utility of our data-mining program through identifying the aberrant connectivity of thalamo-cortical circuits in children with ASD. Autism Res 2016, 9: 553-562. © 2015 International Society for Autism Research, Wiley Periodicals, Inc.

  13. Model-based approach to the detection and classification of mines in sidescan sonar.

    PubMed

    Reed, Scott; Petillot, Yvan; Bell, Judith

    2004-01-10

    This paper presents a model-based approach to mine detection and classification by use of sidescan sonar. Advances in autonomous underwater vehicle technology have increased the interest in automatic target recognition systems in an effort to automate a process that is currently carried out by a human operator. Current automated systems generally require training and thus produce poor results when the test data set is different from the training set. This has led to research into unsupervised systems, which are able to cope with the large variability in conditions and terrains seen in sidescan imagery. The system presented in this paper first detects possible minelike objects using a Markov random field model, which operates well on noisy images, such as sidescan, and allows a priori information to be included through the use of priors. The highlight and shadow regions of the object are then extracted with a cooperating statistical snake, which assumes these regions are statistically separate from the background. Finally, a classification decision is made using Dempster-Shafer theory, where the extracted features are compared with synthetic realizations generated with a sidescan sonar simulator model. Results for the entire process are shown on real sidescan sonar data. Similarities between the sidescan sonar and synthetic aperture radar (SAR) imaging processes ensure that the approach outlined here could be applied to SAR image analysis.

  14. Optimizing data collection for public health decisions: a data mining approach

    PubMed Central

    2014-01-01

    Background Collecting data can be cumbersome and expensive. Lack of relevant, accurate and timely data for research to inform policy may negatively impact public health. The aim of this study was to test if the careful removal of items from two community nutrition surveys guided by a data mining technique called feature selection, can (a) identify a reduced dataset, while (b) not damaging the signal inside that data. Methods The Nutrition Environment Measures Surveys for stores (NEMS-S) and restaurants (NEMS-R) were completed on 885 retail food outlets in two counties in West Virginia between May and November of 2011. A reduced dataset was identified for each outlet type using feature selection. Coefficients from linear regression modeling were used to weight items in the reduced datasets. Weighted item values were summed with the error term to compute reduced item survey scores. Scores produced by the full survey were compared to the reduced item scores using a Wilcoxon rank-sum test. Results Feature selection identified 9 store and 16 restaurant survey items as significant predictors of the score produced from the full survey. The linear regression models built from the reduced feature sets had R2 values of 92% and 94% for restaurant and grocery store data, respectively. Conclusions While there are many potentially important variables in any domain, the most useful set may only be a small subset. The use of feature selection in the initial phase of data collection to identify the most influential variables may be a useful tool to greatly reduce the amount of data needed thereby reducing cost. PMID:24919484
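
    The reduced-item scoring described in the Methods can be sketched as a weighted sum of the retained survey items. This is a minimal sketch only: the item values, the weights, and the use of a regression intercept as the constant term are assumptions for illustration, not the coefficients or the exact procedure reported in the study.

```python
def reduced_score(item_values, weights, intercept=0.0):
    """Score a survey from a reduced item set as intercept + sum(weight * item)."""
    if len(item_values) != len(weights):
        raise ValueError("each retained item needs exactly one weight")
    return intercept + sum(w * x for w, x in zip(weights, item_values))


# Hypothetical regression weights for three retained items and one outlet's answers.
weights = [2.0, 1.5, 0.5]
outlet_items = [1, 0, 4]
score = reduced_score(outlet_items, weights, intercept=3.0)
```

    Scores computed this way for every outlet could then be compared with the full-survey scores, for example with a rank-based test as the abstract describes.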

  15. A network-based approach for semi-quantitative knowledge mining and its application to yield variability

    NASA Astrophysics Data System (ADS)

    Schauberger, Bernhard; Rolinski, Susanne; Müller, Christoph

    2016-12-01

    Variability of crop yields is detrimental for food security. Under climate change its amplitude is likely to increase, thus it is essential to understand the underlying causes and mechanisms. Crop models are the primary tool to project future changes in crop yields under climate change. A systematic overview of drivers and mechanisms of crop yield variability (YV) can thus inform crop model development and facilitate improved understanding of climate change impacts on crop yields. Yet there is a vast body of literature on crop physiology and YV, which makes a prioritization of mechanisms for implementation in models challenging. Therefore this paper takes on a novel approach to systematically mine and organize existing knowledge from the literature. The aim is to identify important mechanisms lacking in models, which can help to set priorities in model improvement. We structure knowledge from the literature in a semi-quantitative network. This network consists of complex interactions between growing conditions, plant physiology and crop yield. We utilize the resulting network structure to assign relative importance to causes of YV and related plant physiological processes. As expected, our findings confirm existing knowledge, in particular on the dominant role of temperature and precipitation, but also highlight other important drivers of YV. More importantly, our method allows for identifying the relevant physiological processes that transmit variability in growing conditions to variability in yield. We can identify explicit targets for the improvement of crop models. The network can additionally guide model development by outlining complex interactions between processes and by easily retrieving quantitative information for each of the 350 interactions. We show the validity of our network method as a structured, consistent and scalable dictionary of literature. The method can easily be applied to many other research fields.

  16. A Network Biology Approach Identifies Molecular Cross-Talk between Normal Prostate Epithelial and Prostate Carcinoma Cells

    PubMed Central

    Trevino, Victor; Cassese, Alberto; Nagy, Zsuzsanna; Zhuang, Xiaodong; Herbert, John; Antczak, Philipp; Clarke, Kim; Davies, Nicholas; Rahman, Ayesha; Campbell, Moray J.; Bicknell, Roy; Vannucci, Marina; Falciani, Francesco

    2016-01-01

    Abstract The advent of functional genomics has enabled the genome-wide characterization of the molecular state of cells and tissues, virtually at every level of biological organization. The difficulty in organizing and mining this unprecedented amount of information has stimulated the development of computational methods designed to infer the underlying structure of regulatory networks from observational data. These important developments had a profound impact in biological sciences since they triggered the development of a novel data-driven investigative approach. In cancer research, this strategy has been particularly successful. It has contributed to the identification of novel biomarkers, to a better characterization of disease heterogeneity and to a more in depth understanding of cancer pathophysiology. However, so far these approaches have not explicitly addressed the challenge of identifying networks representing the interaction of different cell types in a complex tissue. Since these interactions represent an essential part of the biology of both diseased and healthy tissues, it is of paramount importance that this challenge is addressed. Here we report the definition of a network reverse engineering strategy designed to infer directional signals linking adjacent cell types within a complex tissue. The application of this inference strategy to prostate cancer genome-wide expression profiling data validated the approach and revealed that normal epithelial cells exert an anti-tumour activity on prostate carcinoma cells. Moreover, by using a Bayesian hierarchical model integrating genetics and gene expression data and combining this with survival analysis, we show that the expression of putative cell communication genes related to focal adhesion and secretion is affected by epistatic gene copy number variation and it is predictive of patient survival. 
Ultimately, this study represents a generalizable approach to the challenge of deciphering cell communication

  17. A Network Biology Approach Identifies Molecular Cross-Talk between Normal Prostate Epithelial and Prostate Carcinoma Cells.

    PubMed

    Trevino, Victor; Cassese, Alberto; Nagy, Zsuzsanna; Zhuang, Xiaodong; Herbert, John; Antczak, Philipp; Clarke, Kim; Davies, Nicholas; Rahman, Ayesha; Campbell, Moray J; Guindani, Michele; Bicknell, Roy; Vannucci, Marina; Falciani, Francesco

    2016-04-01

    The advent of functional genomics has enabled the genome-wide characterization of the molecular state of cells and tissues, virtually at every level of biological organization. The difficulty in organizing and mining this unprecedented amount of information has stimulated the development of computational methods designed to infer the underlying structure of regulatory networks from observational data. These important developments had a profound impact in biological sciences since they triggered the development of a novel data-driven investigative approach. In cancer research, this strategy has been particularly successful. It has contributed to the identification of novel biomarkers, to a better characterization of disease heterogeneity and to a more in depth understanding of cancer pathophysiology. However, so far these approaches have not explicitly addressed the challenge of identifying networks representing the interaction of different cell types in a complex tissue. Since these interactions represent an essential part of the biology of both diseased and healthy tissues, it is of paramount importance that this challenge is addressed. Here we report the definition of a network reverse engineering strategy designed to infer directional signals linking adjacent cell types within a complex tissue. The application of this inference strategy to prostate cancer genome-wide expression profiling data validated the approach and revealed that normal epithelial cells exert an anti-tumour activity on prostate carcinoma cells. Moreover, by using a Bayesian hierarchical model integrating genetics and gene expression data and combining this with survival analysis, we show that the expression of putative cell communication genes related to focal adhesion and secretion is affected by epistatic gene copy number variation and it is predictive of patient survival. 
Ultimately, this study represents a generalizable approach to the challenge of deciphering cell communication networks

  18. The impact of vascular diameter ratio on hemodialysis maturation time: Evidence from data mining approaches and thermodynamics law

    PubMed Central

    Rezapour, Mohammad; Taran, Somayeh; Balin Parast, Mahmood; Khavanin Zadeh, Morteza

    2016-01-01

    Background: Vascular access (VA) is essential for blood circulation in hemodialysis (HD). An arteriovenous fistula (AVF) is a suitable procedure for gaining VA. A mature AVF is one that can be cannulated for HD. This study aimed to discover the parameters that effectively reduce the duration between VA and the start of HD, which represents the maturation time (MT). Methods: Ninety-six patients who underwent AVF creation were selected for this study. The decision tree method, one of the data mining approaches for data classification, was used based on the CART/C4.5 algorithm. The vascular diameter ratio (VDR) coefficient was obtained (VDR = artery diameter / vein diameter). Results: We investigated the relationship between the VDR and MT in this study and found that MT is inversely related to VDR in elderly patients, while this relation was direct in younger patients. Conclusion: The analysis revealed a Spearman's correlation coefficient for vein diameter with MT. MT decreases when the diameters of the vein and artery are close to one another. This study can help surgeons to identify high-risk patients likely to have a prolonged MT for HD. PMID:27453889

  19. A Data Mining Approach to Reveal Representative Collaboration Indicators in Open Collaboration Frameworks

    ERIC Educational Resources Information Center

    Anaya, Antonio R.; Boticario, Jesus G.

    2009-01-01

    Data mining methods are successful in educational environments to discover new knowledge or learner skills or features. Unfortunately, they have not been used in depth with collaboration. We have developed a scalable data mining method, whose objective is to infer information on the collaboration during the collaboration process in a…

  20. VALUING ACID MINE DRAINAGE REMEDIATION IN WEST VIRGINIA: A HEDONIC MODELING APPROACH

    EPA Science Inventory

    States with active and abandoned mines face large private and public costs to remediate damage to streams and rivers from acid mine drainage (AMD). Appalachian states have an especially large number of contaminated streams and rivers, and the USGS places AMD as the primary source...

  1. VALUING ACID MINE DRAINAGE REMEDIATION IN WEST VIRGINIA: A HEDONIC MODELING APPROACH INCORPORATING GEOGRAPHIC INFORMATION SYSTEMS

    EPA Science Inventory

    States with active and abandoned mines face large private and public costs to remediate damage to streams and rivers from acid mine drainage (AMD). Appalachian states have an especially large number of contaminated streams and rivers, and the USGS places AMD as the primary source...

  2. Occupational safety risk management in Australian mining.

    PubMed

    Joy, J

    2004-08-01

    In the past 15 years, there has been a major safety improvement in the Australian mining industry. Part of this change can be attributed to the development and application of risk assessment methods. These systematic, team-based techniques identify, assess and control unacceptable risks to people, assets, the environment and production. The outcomes have improved mine management systems. This paper discusses the risk assessment approach applied to equipment design and mining operations, as well as the specific risk assessment methodology. The paper also discusses the reactive side of risk management, incident and accident investigation. Systematic analytical methods have also been adopted by regulatory authorities and mining companies to investigate major losses.

  3. Application of techniques to identify coal-mine and power-generation effects on surface-water quality, San Juan River basin, New Mexico and Colorado

    USGS Publications Warehouse

    Goetz, C.L.; Abeyta, Cynthia G.; Thomas, E.V.

    1987-01-01

    Numerous analytical techniques were applied to determine water quality changes in the San Juan River basin upstream of Shiprock, New Mexico. Eight techniques were used to analyze hydrologic data such as precipitation, water quality, and streamflow. The eight methods used are: (1) Piper diagram, (2) time-series plot, (3) frequency distribution, (4) box-and-whisker plot, (5) seasonal Kendall test, (6) Wilcoxon rank-sum test, (7) SEASRS procedure, and (8) analysis of flow-adjusted specific conductance data and smoothing. Post-1963 changes in dissolved solids concentration, dissolved potassium concentration, specific conductance, suspended sediment concentration, or suspended sediment load in the San Juan River downstream from the surface coal mines were examined to determine if coal mining was having an effect on the quality of surface water. None of the analytical methods used to analyze the data showed any increase in dissolved solids concentration, dissolved potassium concentration, or specific conductance in the river downstream from the mines; some of the analytical methods used showed a decrease in dissolved solids concentration and specific conductance. Chaco River, an ephemeral stream tributary to the San Juan River, undergoes changes in water quality due to effluent from a power generation facility. The discharge in the Chaco River contributes about 1.9% of the average annual discharge at the downstream station, San Juan River at Shiprock, NM. The changes in water quality detected at the Chaco River station were not detected at the downstream Shiprock station. It was not possible, with the available data, to identify any effects of the surface coal mines on water quality that were separable from those of urbanization, agriculture, and other cultural and natural changes. In order to determine the specific causes of changes in water quality, it would be necessary to collect additional data at strategically located stations. (Author's abstract)

  4. Text Influenced Molecular Indexing (TIMI): a literature database mining approach that handles text and chemistry.

    PubMed

    Singh, Suresh B; Hull, Richard D; Fluder, Eugene M

    2003-01-01

    We present an application of a novel methodology called Text Influenced Molecular Indexing (TIMI) to mine the information in the scientific literature. TIMI is an extension of two existing methodologies: (1) Latent Semantic Structure Indexing (LaSSI), a method for calculating chemical similarity using two-dimensional topological descriptors, and (2) Latent Semantic Indexing (LSI), a method for generating correlations between textual terms. The singular value decomposition (SVD) of a feature/object matrix is the fundamental mathematical operation underlying LSI, LaSSI, and TIMI and is used in the identification of associations between textual and chemical descriptors. We present the results of our studies with a database containing 11,571 PubMed/MEDLINE abstracts which show the advantages of merging textual and chemical descriptors over using either text or chemistry alone. Our work demonstrates that searching text-only databases limits retrieved documents to those that explicitly mention compounds by name in the text. Similarly, searching chemistry-only databases can only retrieve those documents that have chemical structures in them. TIMI, however, enables search and retrieval of documents with textual, chemical, and/or text- and chemistry-based queries. Thus, the TIMI system offers a powerful new approach to uncovering the contextual scientific knowledge sought by the medical research community.
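
    The abstract names the singular value decomposition of a feature/object matrix as the operation underlying LSI, LaSSI, and TIMI. The sketch below approximates only the leading singular value and its vectors by power iteration, which is a simpler stand-in for a full SVD routine and is not the authors' implementation. The matrix is hypothetical: rows stand for descriptors (textual terms or chemical features) and columns for documents.

```python
import math


def top_singular_triplet(matrix, iterations=200):
    """Approximate the largest singular value and its vectors by power iteration on A^T A."""
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(cols)] * cols  # start from a uniform unit vector
    for _ in range(iterations):
        # u = A v
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        # w = A^T u, so w = (A^T A) v
        w = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("matrix has no nonzero singular value")
        v = [x / norm for x in w]
    u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return sigma, u, v


# Hypothetical 3-descriptor x 2-document matrix (descriptor counts per document).
feature_object = [
    [2.0, 0.0],  # a textual term
    [0.0, 1.0],  # another textual term
    [2.0, 0.0],  # a chemical descriptor that co-occurs with the first term
]
sigma, descriptor_vector, document_vector = top_singular_triplet(feature_object)
```

    In this toy matrix the first and third descriptors receive equal weight in the leading descriptor vector, which is the kind of text-chemistry association the merged representation is meant to expose. A practical system would keep several singular vectors from a library SVD rather than one.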

  5. Application of machine learning algorithms for clinical predictive modeling: a data-mining approach in SCT.

    PubMed

    Shouval, R; Bondi, O; Mishan, H; Shimoni, A; Unger, R; Nagler, A

    2014-03-01

    Data collected from hematopoietic SCT (HSCT) centers are becoming more abundant and complex owing to the formation of organized registries and incorporation of biological data. Typically, conventional statistical methods are used for the development of outcome prediction models and risk scores. However, these analyses carry inherent properties limiting their ability to cope with large data sets with multiple variables and samples. Machine learning (ML), a field stemming from artificial intelligence, is part of a wider approach for data analysis termed data mining (DM). It enables prediction in complex data scenarios, familiar to practitioners and researchers. Technological and commercial applications are all around us, gradually entering clinical research. In the following review, we would like to expose hematologists and stem cell transplanters to the concepts, clinical applications, strengths and limitations of such methods and discuss current research in HSCT. The aim of this review is to encourage utilization of the ML and DM techniques in the field of HSCT, including prediction of transplantation outcome and donor selection.

  6. Identifying Key Priorities for Future Palliative Care Research Using an Innovative Analytic Approach

    PubMed Central

    Pillemer, Karl; Chen, Emily K.; Warmington, Marcus; Adelman, Ronald D.; Reid, M. C.

    2015-01-01

    Using an innovative approach, we identified research priorities in palliative care to guide future research initiatives. We searched 7 databases (2005–2012) for review articles published on the topics of palliative and hospice–end-of-life care. The identified research recommendations (n = 648) fell into 2 distinct categories: (1) ways to improve methodological approaches and (2) specific topic areas in need of future study. The most commonly cited priority within the theme of methodological approaches was the need for enhanced rigor. Specific topics in need of future study included perspectives and needs of patients, relatives, and providers; underrepresented populations; decision-making; cost-effectiveness; provider education; spirituality; service use; and interdisciplinary approaches to delivering palliative care. This review underscores the need for additional research on specific topics and methodologically rigorous research to inform health policy and practice. PMID:25393169

  7. New approach for reduction of diesel consumption by comparing different mining haulage configurations.

    PubMed

    Rodovalho, Edmo da Cunha; Lima, Hernani Mota; de Tomi, Giorgio

    2016-05-01

    The mining operations of loading and haulage have an energy source that is highly dependent on fossil fuels. In mining companies that select trucks for haulage, this input is the main component of mining costs. How can the impact of the operational aspects on the diesel consumption of haulage operations in surface mines be assessed? There are many studies relating truck fuel consumption to several variables, but a methodology that prioritizes higher-impact variables under each specific condition is not available. Generic models may not apply to all operational settings presented in the mining industry. This study aims to create a method of analysis, identification, and prioritization of variables related to fuel consumption of haul trucks in open pit mines. For this purpose, statistical analysis techniques and mathematical modelling tools using multiple linear regressions will be applied. The model is shown to be suitable because the results generate a good description of the fuel consumption behaviour. In the practical application of the method, the reduction of diesel consumption reached 10%. The implementation requires no large-scale investments or very long deadlines and can be applied to mining haulage operations in other settings.

  8. A Critical Study on the Underground Environment of Coal Mines in India-an Ergonomic Approach

    NASA Astrophysics Data System (ADS)

    Dey, Netai Chandra; Sharma, Gourab Dhara

    2013-04-01

    The application of ergonomics to the health of underground miners plays a major role in determining the efficiency of miners. Work in underground mines is still physically demanding, and continuous stress due to certain postures or movements of miners during work leads to localized muscle fatigue, creating musculoskeletal disorders. A good working environment can change the degree of job heaviness, and thermal stress (WBGT values) can directly affect how long miners can work at a stretch. Of the many unit operations in an underground mine, roof bolting makes an important contribution to the safety of the mine and miners. The occupational stress of roof bolters is discussed in the paper from an ergonomic perspective.

  9. Identifying driver mutations in sequenced cancer genomes: computational approaches to enable precision medicine

    PubMed Central

    2014-01-01

    High-throughput DNA sequencing is revolutionizing the study of cancer and enabling the measurement of the somatic mutations that drive cancer development. However, the resulting sequencing datasets are large and complex, obscuring the clinically important mutations in a background of errors, noise, and random mutations. Here, we review computational approaches to identify somatic mutations in cancer genome sequences and to distinguish the driver mutations that are responsible for cancer from random, passenger mutations. First, we describe approaches to detect somatic mutations from high-throughput DNA sequencing data, particularly for tumor samples that comprise heterogeneous populations of cells. Next, we review computational approaches that aim to predict driver mutations according to their frequency of occurrence in a cohort of samples, or according to their predicted functional impact on protein sequence or structure. Finally, we review techniques to identify recurrent combinations of somatic mutations, including approaches that examine mutations in known pathways or protein-interaction networks, as well as de novo approaches that identify combinations of mutations according to statistical patterns of mutual exclusivity. These techniques, coupled with advances in high-throughput DNA sequencing, are enabling precision medicine approaches to the diagnosis and treatment of cancer. PMID:24479672

  10. Mining DNA microarray data using a novel approach based on graph theory.

    PubMed

    del Rio, G; Bartley, T F; del-Rio, H; Rao, R; Jin, K L; Greenberg, D A; Eshoo, M; Bredesen, D E

    2001-12-07

    The recent demonstration that biochemical pathways from diverse organisms are arranged in scale-free, rather than random, systems [Jeong et al., Nature 407 (2000) 651-654], emphasizes the importance of developing methods for the identification of biochemical nexuses--the nodes within biochemical pathways that serve as the major input/output hubs, and therefore represent potentially important targets for modulation. Here we describe a bioinformatics approach that identifies candidate nexuses for biochemical pathways without requiring functional gene annotation; we also provide proof-of-principle experiments to support this technique. This approach, called Nexxus, may lead to the identification of new signal transduction pathways and targets for drug design.

  11. Study on perception and control layer of mine CPS with mixed logic dynamic approach

    NASA Astrophysics Data System (ADS)

    Li, Jingzhao; Ren, Ping; Yang, Dayu

    2017-01-01

    The inclined roadway transportation system of a mine cyber physical system is a hybrid system consisting of a continuous-time system and a discrete-time system, which can be divided into inclined roadway signal subsystem, error-proofing channel subsystems, anti-car subsystems, and frequency control subsystems. First, to ensure stable operation and improve efficiency and production safety, this hybrid system model with n inputs and m outputs is constructed and analyzed in detail, and its steady schedule state is then solved. Second, on the basis of the formal modeling for real-time systems, we use a hybrid toolbox for system security verification. Third, the practical application of the mine cyber physical system shows that the method for real-time simulation of the mine cyber physical system is effective.

  12. Stochastic dynamic optimization approach for revegetation of reclaimed mine soils under uncertain weather regime

    SciTech Connect

    Mustafa, G.

    1989-01-01

    This study presents a comprehensive physically based stochastic dynamic optimization model to assist planners in making decisions concerning mine soil depths and soil mixture ratios required to achieve successful revegetation of mine lands at different probability levels of success, subject to an uncertain weather regime. A perennial grass growth model was modified and validated for predicting vegetation growth in reclaimed mine soils. The plant growth model is based on continuous relationships between plant growth, air temperature, dry length, leaf area, photoperiod and plant-soil-moisture stresses. A plant available soil moisture model was adopted to estimate daily soil moisture for mine soils. A general probability model was developed to estimate the probability of successful revegetation in a 5-year bond release period. The probability model considers five possible bond release criteria in mine soil reclamation planning. A stochastic dynamic optimization model (SDOM) was developed to find the optimum combination of soil depth and soil mixture ratios that met the successful vegetation standard under non-irrigated conditions with weather as the only random element of the system. The SDOM was applied for Wise County, Virginia, and the model found that 2:1 sandstone/siltstone soil mixture required the minimum soil depth to achieve successful revegetation. These results were also supported by field data. The developed model allows the planners to better manage lands drastically disturbed by surface mining.

  13. Doing the Work of Extension: Three Approaches to Identify, Amplify, and Implement Outreach

    ERIC Educational Resources Information Center

    Raison, Brian

    2014-01-01

    This article explores the literature and practice of how the Cooperative Extension Service does its work and asks if traditional outreach and engagement models have room for innovative delivery mechanisms that may identify emerging trends and help meet community needs. It considers three innovative approaches to the educational mission:…

  14. Identifying Useful Auxiliary Variables for Incomplete Data Analyses: A Note on a Group Difference Examination Approach

    ERIC Educational Resources Information Center

    Raykov, Tenko; Marcoulides, George A.

    2014-01-01

    This research note contributes to the discussion of methods that can be used to identify useful auxiliary variables for analyses of incomplete data sets. A latent variable approach is discussed, which is helpful in finding auxiliary variables with the property that if included in subsequent maximum likelihood analyses they may enhance considerably…

  15. The Baby TALK Model: An Innovative Approach to Identifying High-Risk Children and Families

    ERIC Educational Resources Information Center

    Villalpando, Aimee Hilado; Leow, Christine; Hornstein, John

    2012-01-01

    This research report examines the Baby TALK model, an innovative early childhood intervention approach used to identify, recruit, and serve young children who are at-risk for developmental delays, mental health needs, and/or school failure, and their families. The report begins with a description of the model. This description is followed by an…

  16. Identifying Core Mobile Learning Faculty Competencies Based Integrated Approach: A Delphi Study

    ERIC Educational Resources Information Center

    Elbarbary, Rafik Said

    2015-01-01

    This study is based on the integrated approach as a conceptual framework to identify, categorize, and rank key components of mobile learning core competencies for Egyptian faculty members in higher education. The field investigation framework used a four-round Delphi technique to determine the importance rating of each component of core competencies…

  17. A Comprehensive Approach to Identifying Intervention Targets for Patient-Safety Improvement in a Hospital Setting

    ERIC Educational Resources Information Center

    Cunningham, Thomas R.; Geller, E. Scott

    2012-01-01

    Despite differences in approaches to organizational problem solving, healthcare managers and organizational behavior management (OBM) practitioners share a number of practices, and connecting healthcare management with OBM may lead to improvements in patient safety. A broad needs-assessment methodology was applied to identify patient-safety…

  18. A Function-First Approach to Identifying Formulaic Language in Academic Writing

    ERIC Educational Resources Information Center

    Durrant, Philip; Mathews-Aydinli, Julie

    2011-01-01

    There is currently much interest in creating pedagogically-oriented descriptions of formulaic language. Research in this area has typically taken what we call a "form-first" approach, in which formulas are identified as the most frequent recurrent forms in a relevant corpus. While this research continues to yield valuable results, the present…

  19. Nuclear waste repositories in salt mines: a new approach to safety assessment.

    PubMed

    Memmert, G

    1996-08-01

    The long-term safety of radioactive waste repositories in rock-salt mines in the deep underground benefits significantly from the barrier effect of overlying rocks. The concentrations of radioactive substances released from the repository and migrating in the aquifer up to the biosphere are greatly reduced during passage through these rocks. In former safety analyses of waste repositories this transport has generally been modelled as a combination of the involved phenomena, e.g. convection, dispersion, adsorption, etc. The data required for a numerical evaluation of the overall effect are obtained either as (conservative) estimates based on experience or are empirical, based mainly on laboratory experiments. The approach presented here is much simpler and entirely empirical, and therefore more transparent. It makes use of the fact that the groundwater in the overlying rocks always contains dissolved salt from the salt formation and carries it continuously into the receiving channels or the drainage system. The relation between the total amount of dissolved solids present in a certain subsurface catchment area and their steady-state concentration in the receiving channels is assumed to be equivalent to the relation between the given amount of radionuclides released from the repository and their concentration in the receiving channels, the latter leading to a certain radiation exposure of the population. Two versions of this approach are discussed: version (a) assumes a continuous stream of radionuclides released from the repository, and version (b) assumes a pulse release of radionuclides from the repository. A simple calculation using data from the Gorleben exploration leads to the inequality [equation: see text] where Cmax is the maximum radionuclide concentration (with respect to time) in the receiving channels and W (Bq) is the amount of radionuclides released from the repository in a very short time. Cmax, obtained from (1), is supposed to be an upper limit of

  20. Identifying Bioaccumulative Halogenated Organic Compounds Using a Nontargeted Analytical Approach: Seabirds as Sentinels

    PubMed Central

    Millow, Christopher J.; Mackintosh, Susan A.; Lewison, Rebecca L.; Dodder, Nathan G.; Hoh, Eunha

    2015-01-01

    Persistent organic pollutants (POPs) are typically monitored via targeted mass spectrometry, which potentially identifies only a fraction of the contaminants actually present in environmental samples. With new anthropogenic compounds continuously introduced to the environment, novel and proactive approaches that provide a comprehensive alternative to targeted methods are needed in order to more completely characterize the diversity of known and unknown compounds likely to cause adverse effects. Nontargeted mass spectrometry attempts to extensively screen for compounds, providing a feasible approach for identifying contaminants that warrant future monitoring. We employed a nontargeted analytical method using comprehensive two-dimensional gas chromatography coupled to time-of-flight mass spectrometry (GC×GC/TOF-MS) to characterize halogenated organic compounds (HOCs) in California Black skimmer (Rynchops niger) eggs. Our study identified 111 HOCs; 84 of these compounds were regularly detected via targeted approaches, while 27 were classified as typically unmonitored or unknown. Typically unmonitored compounds of note in bird eggs included tris(4-chlorophenyl)methane (TCPM), tris(4-chlorophenyl)methanol (TCPMOH), triclosan, permethrin, heptachloro-1'-methyl-1,2'-bipyrrole (MBP), as well as four halogenated unknown compounds that could not be identified through database searching or the literature. The presence of these compounds in Black skimmer eggs suggests they are persistent, bioaccumulative, potentially biomagnifying, and maternally transferring. Our results highlight the utility and importance of employing nontargeted analytical tools to assess true contaminant burdens in organisms, as well as to demonstrate the value in using environmental sentinels to proactively identify novel contaminants. PMID:26020245

  1. Microbial populations identified by fluorescence in situ hybridization in a constructed wetland treating acid coal mine drainage

    SciTech Connect

    Nicomrat, D.; Dick, W.A.; Tuovinen, O.H.

    2006-07-15

    Microorganisms are an integral part of the biogeochemical processes in wetlands, yet microbial communities in sediments within constructed wetlands receiving acid mine drainage (AMD) are only poorly understood. The purpose of this study was to characterize the microbial diversity and abundance in a wetland receiving AMD using fluorescence in situ hybridization (FISH) analysis. Seasonal samples of oxic surface sediments, comprised of Fe(III) precipitates, were collected from two treatment cells of the constructed wetland system. The pH of the bulk samples ranged between pH 2.1 and 3.9. Viable counts of acidophilic Fe and S oxidizers and heterotrophs were determined with a most probable number (MPN) method. The MPN counts were only a fraction of the corresponding FISH counts. The sediment samples contained microorganisms in the Bacteria (including the subgroups of acidophilic Fe- and S-oxidizing bacteria and Acidiphilium spp.) and Eukarya domains. Archaea were present in the sediment surface samples at < 0.01% of the total microbial community. The most numerous bacterial species in this wetland system was Acidithiobacillus ferrooxidans, comprising up to 37% of the bacterial population. Acidithiobacillus thiooxidans was also abundant.

  2. Shifting species ranges and changing phenology: A new approach to mining social media for ecosystems observations

    NASA Astrophysics Data System (ADS)

    Fuka, M. Z.; Osborne-Gowey, J. D.; Fuka, D. R.

    2013-12-01

    Geoscientists & ecologists are increasingly using social media to solicit 'citizen scientists' to participate in the data collection process. However, social media users are also a largely untapped resource of spontaneous, unsolicited observations of the natural world. Of particular interest are observations of species phenology & range to better develop a predictive understanding of how ecosystems are affected by a changing climate and human-mediated influences. Social media users' observations include information on phenological & biological phenomena such as flowers blooming, native & invasive species sightings, unusual behaviors, animal tracks, droppings, damage, feeding, nesting, etc. Our AGU2011 pilot study on the North American armadillo suggests that useful observational data can be extracted from Twitter to map current species ranges to compare with past ranges. We have expanded that work by mining Twitter for a number of North American species and ecosystem observations to determine usefulness for environmental applications such as: 1) supplementing existing databases, 2) identifying outlier phenomena, 3) guiding additional crowd-sourced studies and data collection efforts, 4) recruiting citizen scientists, 5) gauging sentiment about the observations and 6) informing ecosystems policy-making and education. We present the results for our evaluation of a representative sample from a list of 200+ species for which we've collected data since August 2011. Our results include frequency of reports and sightings by day, week and month, where the number of observations range from a few per month to ten or more per day. We discuss challenges, best practices and tools for distilling information from crowd-sourced observations gathered via Twitter in the form of 140-character 'tweets'. For example, geolocation is a critical issue. Despite the prevalence of smart phones, specific latitudinal and longitudinal coordinates are included in fewer than 10% of the

  3. Improvement Evaluation on Ceramic Roof Extraction Using WORLDVIEW-2 Imagery and Geographic Data Mining Approach

    NASA Astrophysics Data System (ADS)

    Brum-Bastos, V. S.; Ribeiro, B. M. G.; Pinho, C. M. D.; Korting, T. S.; Fonseca, L. M. G.

    2016-06-01

    Advances in geotechnologies and in remote sensing have improved analysis of urban environments. The new sensors are increasingly suited to urban studies, due to the enhancement in spatial, spectral and radiometric resolutions. Urban environments present high heterogeneity, which cannot be tackled using pixel-based approaches on high resolution images. Geographic Object-Based Image Analysis (GEOBIA) has been consolidated as a methodology for urban land use and cover monitoring; however, classification of high resolution images is still troublesome. This study aims to assess the improvement on ceramic roof classification using WorldView-2 images due to the addition of 4 new bands besides the standard "Blue-Green-Red-Near Infrared" bands. Our methodology combines GEOBIA, C4.5 classification tree algorithm, Monte Carlo simulation and statistical tests for classification accuracy. Two sample groups were considered: 1) eight multispectral and panchromatic bands, and 2) four multispectral and panchromatic bands, representing previous high-resolution sensors. The C4.5 algorithm generates a decision tree that can be used for classification; smaller decision trees are closer to the semantic networks produced by experts on GEOBIA, while bigger trees are not straightforward to implement manually but are more accurate. The choice between a big and a small tree depends on the user's skills to implement it. This study aims to determine for what kind of user the addition of the 4 new bands might be beneficial: 1) the common user (smaller trees) or 2) a more skilled user with coding and/or data mining abilities (bigger trees). Overall, the classification was improved by the addition of the four new bands for both types of users.
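
    As background on how a C4.5 tree chooses its splits, the gain-ratio criterion can be sketched as follows. This is a minimal sketch for a single categorical attribute, not the study's workflow: the feature values ("high"/"low" band response) and class labels below are hypothetical, and continuous attributes, pruning, and the rest of the algorithm are left out.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by the categorical attribute `values`.

    Gain ratio = information gain / split information, where split
    information is the entropy of the attribute values themselves.
    """
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy that remains after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical image objects: a discretized band response and the true class.
    band_response = ["high", "high", "low", "low", "high", "low"]
    roof_class = ["ceramic", "ceramic", "other", "other", "ceramic", "ceramic"]
    print(f"gain ratio: {gain_ratio(band_response, roof_class):.3f}")
```

    At each node the tree builder would evaluate every candidate attribute this way and split on the one with the highest gain ratio, which is why adding informative bands can change the shape and accuracy of the resulting tree.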

  4. Smart-card-based automatic meal record system intervention tool for analysis using data mining approach.

    PubMed

    Zenitani, Satoko; Nishiuchi, Hiromu; Kiuchi, Takahiro

    2010-04-01

    The Smart-card-based Automatic Meal Record system for company cafeterias (AutoMealRecord system) was recently developed and used to monitor employee eating habits. The system could be a unique nutrition assessment tool for automatically monitoring the meal purchases of all employees, although it only focuses on company cafeterias and has never been validated. Before starting an interventional study, we tested the reliability of the data collected by the system using the data mining approach. The AutoMealRecord data were examined to determine if it could predict current obesity. All data used in this study (n = 899) were collected by a major electric company based in Tokyo, which has been operating the AutoMealRecord system for several years. We analyzed dietary patterns by principal component analysis using data from the system and extracted 5 major dietary patterns: healthy, traditional Japanese, Chinese, Japanese noodles, and pasta. The ability to predict current body mass index (BMI) with dietary preference was assessed with multiple linear regression analyses, and in the current study, BMI was positively correlated with male gender, preference for "Japanese noodles," mean energy intake, protein content, and frequency of body measurement at a body measurement booth in the cafeteria. There was a negative correlation with age, dietary fiber, and lunchtime cafeteria use (R² = 0.22). This regression model predicted "would-be obese" participants (BMI ≥ 23) with 68.8% accuracy by leave-one-out cross validation. This shows that there was sufficient predictability of BMI based on data from the AutoMealRecord System. We conclude that the AutoMealRecord system is valuable for further consideration as a health care intervention tool.
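
    The leave-one-out cross-validation step mentioned in this abstract can be illustrated with a short sketch. It is deliberately simplified and is not the authors' model: a single hypothetical predictor stands in for the multiple regression, the data values are invented, and the only detail taken from the abstract is the BMI threshold of 23.

```python
def fit_simple(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def loocv_accuracy(xs, ys, threshold):
    """Leave-one-out accuracy of classifying y >= threshold from a regression.

    Each observation is held out in turn, the model is refitted on the
    remaining ones, and the held-out prediction is compared with the
    observed class.
    """
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_simple(train_x, train_y)
        predicted = intercept + slope * xs[i]
        hits += (predicted >= threshold) == (ys[i] >= threshold)
    return hits / len(xs)


if __name__ == "__main__":
    # Hypothetical data: mean energy intake per meal (kcal) and measured BMI.
    energy_kcal = [610.0, 655.0, 700.0, 720.0, 760.0, 805.0, 840.0, 890.0]
    bmi = [20.8, 21.5, 23.4, 22.1, 23.0, 24.2, 22.7, 25.1]
    print(f"LOOCV accuracy: {loocv_accuracy(energy_kcal, bmi, threshold=23.0):.3f}")
```

    Because every prediction is made for an observation the model has not seen, the resulting accuracy is a less optimistic estimate than one computed on the training data itself.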

  5. Quantitative and qualitative approaches to identifying migration chronology in a continental migrant.

    PubMed

    Beatty, William S; Kesler, Dylan C; Webb, Elisabeth B; Raedeke, Andrew H; Naylor, Luke W; Humburg, Dale D

    2013-01-01

    The degree to which extrinsic factors influence migration chronology in North American waterfowl has not been quantified, particularly for dabbling ducks. Previous studies have examined waterfowl migration using various methods; however, quantitative approaches to define avian migration chronology over broad spatio-temporal scales are limited, and the implications for using different approaches have not been assessed. We used movement data from 19 female adult mallards (Anas platyrhynchos) equipped with solar-powered global positioning system satellite transmitters to evaluate two individual level approaches for quantifying migration chronology. The first approach defined migration based on individual movements among geopolitical boundaries (state, provincial, international), whereas the second method modeled net displacement as a function of time using nonlinear models. Differences in migration chronologies identified by each of the approaches were examined with analysis of variance. The geopolitical method identified mean autumn migration midpoints at 15 November 2010 and 13 November 2011, whereas the net displacement method identified midpoints at 15 November 2010 and 14 November 2011. The mean midpoints for spring migration were 3 April 2011 and 20 March 2012 using the geopolitical method and 31 March 2011 and 22 March 2012 using the net displacement method. The duration, initiation date, midpoint, and termination date for both autumn and spring migration did not differ between the two individual level approaches. Although we did not detect differences in migration parameters between the different approaches, the net displacement metric offers broad potential to address questions in movement ecology for migrating species. Ultimately, an objective definition of migration chronology will allow researchers to obtain a comprehensive understanding of the extrinsic factors that drive migration at the individual and population levels. As a result, targeted

  6. Quantitative and qualitative approaches to identifying migration chronology in a continental migrant

    USGS Publications Warehouse

    Beatty, William S.; Kesler, Dylan C.; Webb, Elisabeth B.; Raedeke, Andrew H.; Naylor, Luke W.; Humburg, Dale D.

    2013-01-01

    The degree to which extrinsic factors influence migration chronology in North American waterfowl has not been quantified, particularly for dabbling ducks. Previous studies have examined waterfowl migration using various methods; however, quantitative approaches to define avian migration chronology over broad spatio-temporal scales are limited, and the implications for using different approaches have not been assessed. We used movement data from 19 female adult mallards (Anas platyrhynchos) equipped with solar-powered global positioning system satellite transmitters to evaluate two individual level approaches for quantifying migration chronology. The first approach defined migration based on individual movements among geopolitical boundaries (state, provincial, international), whereas the second method modeled net displacement as a function of time using nonlinear models. Differences in migration chronologies identified by each of the approaches were examined with analysis of variance. The geopolitical method identified mean autumn migration midpoints at 15 November 2010 and 13 November 2011, whereas the net displacement method identified midpoints at 15 November 2010 and 14 November 2011. The mean midpoints for spring migration were 3 April 2011 and 20 March 2012 using the geopolitical method and 31 March 2011 and 22 March 2012 using the net displacement method. The duration, initiation date, midpoint, and termination date for both autumn and spring migration did not differ between the two individual level approaches. Although we did not detect differences in migration parameters between the different approaches, the net displacement metric offers broad potential to address questions in movement ecology for migrating species. Ultimately, an objective definition of migration chronology will allow researchers to obtain a comprehensive understanding of the extrinsic factors that drive migration at the individual and population levels. As a result, targeted

  7. Quantitative and Qualitative Approaches to Identifying Migration Chronology in a Continental Migrant

    PubMed Central

    Beatty, William S.; Kesler, Dylan C.; Webb, Elisabeth B.; Raedeke, Andrew H.; Naylor, Luke W.; Humburg, Dale D.

    2013-01-01

    The degree to which extrinsic factors influence migration chronology in North American waterfowl has not been quantified, particularly for dabbling ducks. Previous studies have examined waterfowl migration using various methods; however, quantitative approaches to define avian migration chronology over broad spatio-temporal scales are limited, and the implications for using different approaches have not been assessed. We used movement data from 19 female adult mallards (Anas platyrhynchos) equipped with solar-powered global positioning system satellite transmitters to evaluate two individual level approaches for quantifying migration chronology. The first approach defined migration based on individual movements among geopolitical boundaries (state, provincial, international), whereas the second method modeled net displacement as a function of time using nonlinear models. Differences in migration chronologies identified by each of the approaches were examined with analysis of variance. The geopolitical method identified mean autumn migration midpoints at 15 November 2010 and 13 November 2011, whereas the net displacement method identified midpoints at 15 November 2010 and 14 November 2011. The mean midpoints for spring migration were 3 April 2011 and 20 March 2012 using the geopolitical method and 31 March 2011 and 22 March 2012 using the net displacement method. The duration, initiation date, midpoint, and termination date for both autumn and spring migration did not differ between the two individual level approaches. Although we did not detect differences in migration parameters between the different approaches, the net displacement metric offers broad potential to address questions in movement ecology for migrating species. Ultimately, an objective definition of migration chronology will allow researchers to obtain a comprehensive understanding of the extrinsic factors that drive migration at the individual and population levels. As a result, targeted

  8. Ab initio thermodynamic approach to identify mixed solid sorbents for CO2 capture technology

    DOE PAGES

    Duan, Yuhua

    2015-10-15

    Because the current technologies for capturing CO2 are still too energy intensive, new materials must be developed that can capture CO2 reversibly with acceptable energy costs. At a given CO2 pressure, the turnover temperature (Tt) of the reaction of an individual solid that can capture CO2 is fixed. Such Tt may be outside the operating temperature range (ΔTo) for a practical capture technology. To adjust Tt to fit the practical ΔTo, in this study, three scenarios of mixing schemes are explored by combining thermodynamic database mining with first principles density functional theory and phonon lattice dynamics calculations. Our calculated results demonstrate that by mixing different types of solids, it’s possible to shift Tt to the range of practical operating temperature conditions. According to the requirements imposed by the pre- and post- combustion technologies and based on our calculated thermodynamic properties for the CO2 capture reactions by the mixed solids of interest, we were able to identify the mixing ratios of two or more solids to form new sorbent materials for which lower capture energy costs are expected at the desired pressure and temperature conditions.

  9. Missing defects? A comparison of microscopic and macroscopic approaches to identifying linear enamel hypoplasia.

    PubMed

    Hassett, Brenna R

    2014-03-01

    Linear enamel hypoplasia (LEH), the presence of linear defects of dental enamel formed during periods of growth disruption, is frequently analyzed in physical anthropology as evidence for childhood health in the past. However, a wide variety of methods for identifying and interpreting these defects in archaeological remains exists, preventing easy cross-comparison of results from disparate studies. This article compares a standard approach to identifying LEH using the naked eye to the evidence of growth disruption observed microscopically from the enamel surface. This comparison demonstrates that what is interpreted as evidence of growth disruption microscopically is not uniformly identified with the naked eye, and provides a reference for the level of consistency between the number and timing of defects identified using microscopic versus macroscopic approaches. This is done for different tooth types using a large sample of unworn permanent teeth drawn from several post-medieval London burial assemblages. The resulting schematic diagrams showing where macroscopic methods achieve more or less similar results to microscopic methods are presented here and clearly demonstrate that "naked-eye" methods of identifying growth disruptions do not identify LEH as often as microscopic methods in areas where perikymata are more densely packed.

  10. Identifying inhibitory compounds in lignocellulosic biomass hydrolysates using an exometabolomics approach

    PubMed Central

    2014-01-01

    Background Inhibitors that reduce the fermentation performance of fermenting yeast are formed during the pretreatment process of lignocellulosic biomass. An exometabolomics approach was applied to systematically identify inhibitors in lignocellulosic biomass hydrolysates. Results We studied the composition and fermentability of 24 different biomass hydrolysates. To create diversity, the 24 hydrolysates were prepared from six different biomass types, namely sugar cane bagasse, corn stover, wheat straw, barley straw, willow wood chips and oak sawdust, and with four different pretreatment methods, i.e. dilute acid, mild alkaline, alkaline/peracetic acid and concentrated acid. Their composition and that of fermentation samples generated with these hydrolysates were analyzed with two GC-MS methods. Either ethyl acetate extraction or ethyl chloroformate derivatization was used before conducting GC-MS to prevent sugars from overloading the chromatograms, which would obscure the detection of less abundant compounds. Using multivariate PLS-2CV and nPLS-2CV data analysis models, potential inhibitors were identified by establishing a relationship between fermentability and composition of the hydrolysates. These identified compounds were tested for their effects on the growth of the model yeast, Saccharomyces cerevisiae CEN.PK 113-7D, confirming that the majority of the identified compounds were indeed inhibitors. Conclusion Inhibitory compounds in lignocellulosic biomass hydrolysates were successfully identified using a non-targeted systematic approach: metabolomics. The identified inhibitors include both known ones, such as furfural, HMF and vanillin, and novel inhibitors, namely sorbic acid and phenylacetaldehyde. PMID:24655423

  11. A cross-species bi-clustering approach to identifying conserved co-regulated genes

    PubMed Central

    Sun, Jiangwen; Jiang, Zongliang; Tian, Xiuchun; Bi, Jinbo

    2016-01-01

    Motivation: A growing number of studies have explored the process of pre-implantation embryonic development of multiple mammalian species. However, the conservation and variation among different species in their developmental programming are poorly defined due to the lack of effective computational methods for detecting co-regulated genes that are conserved across species. The most sophisticated method to date for identifying conserved co-regulated genes is a two-step approach. This approach first identifies gene clusters for each species by a cluster analysis of gene expression data, and subsequently computes the overlaps of clusters identified from different species to reveal common subgroups. This approach is ineffective at dealing with the noise in the expression data introduced by the complicated procedures for quantifying gene expression. Furthermore, due to the sequential nature of the approach, the gene clusters identified in the first step may have little overlap among different species in the second step, making it difficult to detect conserved co-regulated genes. Results: We propose a cross-species bi-clustering approach which first denoises the gene expression data of each species into a data matrix. The rows of the data matrices of different species represent the same set of genes, which are characterized by their expression patterns over the developmental stages of each species as columns. A novel bi-clustering method is then developed to cluster genes into subgroups by a joint sparse rank-one factorization of all the data matrices. This method decomposes a data matrix into a product of a column vector and a row vector, where the column vector is a consistent indicator across the matrices (species) that identifies the same gene cluster and the row vector specifies for each species the developmental stages at which the clustered genes are co-regulated. An efficient optimization algorithm has been developed with convergence analysis. This approach was first validated on

  12. A comparison of approaches for finding minimum identifying codes on graphs

    NASA Astrophysics Data System (ADS)

    Horan, Victoria; Adachi, Steve; Bak, Stanley

    2016-05-01

    In order to formulate mathematical conjectures likely to be true, a number of base cases must be determined. However, many combinatorial problems are NP-hard, and the computational complexity makes this research approach difficult using standard brute force on a typical computer. One sample problem explored is that of finding a minimum identifying code. To work around the computational issues, a variety of methods are explored: a parallel computing approach using MATLAB, an adiabatic quantum optimization approach using a D-Wave quantum annealing processor, and satisfiability modulo theory (SMT) with corresponding SMT solvers. Each of these methods requires the problem to be formulated in a unique manner. In this paper, we address the challenges of computing solutions to this NP-hard problem with respect to each of these methods.
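
    As an illustration of the problem in this record, the sketch below shows the plain brute-force baseline that the abstract describes as impractical at scale. It is not any of the paper's three methods. It assumes the usual definition of an identifying code: a vertex subset C such that every vertex's closed neighbourhood meets C in a non-empty set, and no two vertices share the same intersection. The function names and the adjacency-dict graph format are hypothetical choices made for this example.

```python
from itertools import combinations


def is_identifying_code(adj, code):
    """Check whether `code` is an identifying code of the graph `adj`.

    `adj` maps each vertex to an iterable of its neighbours (undirected graph).
    """
    code = set(code)
    seen = set()
    for v in adj:
        # Signature of v: its closed neighbourhood N[v] intersected with the code.
        signature = (frozenset(adj[v]) | {v}) & code
        if not signature or signature in seen:
            return False  # v is either not covered or not distinguished
        seen.add(signature)
    return True


def minimum_identifying_code(adj):
    """Exhaustive search over subsets by increasing size (exponential time)."""
    vertices = sorted(adj)
    for size in range(1, len(vertices) + 1):
        for candidate in combinations(vertices, size):
            if is_identifying_code(adj, candidate):
                return set(candidate)
    return None  # no identifying code exists (two vertices share N[v])


# Path graph on four vertices: 0 - 1 - 2 - 3
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(minimum_identifying_code(path))
```

    The search enumerates every subset in the worst case, which is why approaches such as parallel search, quantum annealing, or SMT solving become attractive as graphs grow.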

  13. Microbial populations identified by fluorescence in situ hybridization in a constructed wetland treating acid coal mine drainage.

    PubMed

    Nicomrat, Duongruitai; Dick, Warren A; Tuovinen, Olli H

    2006-01-01

    Microorganisms are an integral part of the biogeochemical processes in wetlands, yet microbial communities in sediments within constructed wetlands receiving acid mine drainage (AMD) are only poorly understood. The purpose of this study was to characterize the microbial diversity and abundance in a wetland receiving AMD using fluorescence in situ hybridization (FISH) analysis. Seasonal samples of oxic surface sediments, comprised of Fe(III) precipitates, were collected from two treatment cells of the constructed wetland system. The pH of the bulk samples ranged between pH 2.1 and 3.9. Viable counts of acidophilic Fe and S oxidizers and heterotrophs were determined with a most probable number (MPN) method. The MPN counts were only a fraction of the corresponding FISH counts. The sediment samples contained microorganisms in the Bacteria (including the subgroups of acidophilic Fe- and S-oxidizing bacteria and Acidiphilium spp.) and Eukarya domains. Archaea were present in the sediment surface samples at < 0.01% of the total microbial community. The most numerous bacterial species in this wetland system was Acidithiobacillus ferrooxidans, comprising up to 37% of the bacterial population. Acidithiobacillus thiooxidans was also abundant. Heterotrophs in the Acidiphilium genus totaled 20% of the bacterial population. Leptospirillum ferrooxidans was below the level of detection in the bacterial community. The results from the FISH technique from this field study are consistent with results from other experiments involving enumeration by most probable number, dot-blot hybridization, and denaturing gradient gel electrophoresis analyses and with the geochemistry of the site.

  14. Support for information management in critical care: a new approach to identify needs.

    PubMed Central

    Rosenal, T. W.; Forsythe, D. E.; Musen, M. A.; Seiver, A.

    1995-01-01

    Managing information is necessary to support clinical decision making and action in critical care. By understanding the nature of information management and its relationship to sound clinical practice, we should come to use technology more wisely. We demonstrated that a new approach inspired by ethnographic research methods could identify useful and unexpected findings about clinical information management. In this approach, a clinician experienced in a specific domain (critical care), with advice from a medical anthropologist, made short-term observations of information management in that domain. We identified 8 areas in a critical care Unit in which information management was seriously in need of better support. We also found interesting differences in how these needs were viewed by nurses and physicians. Our interest in this approach was at two levels: 1. Identify and describe representative instances of sub-optimal information management in a critical care Unit. 2. Investigate the effectiveness of such short-term observations by clinicians. Our long-range goal is to explore the use of this approach and the information it reveals to optimize the process of developing and selecting new information support tools, preparing for their introduction, and optimizing clinical outcomes. PMID:8563267

  15. Alternative approaches for identifying acute systemic toxicity: Moving from research to regulatory testing.

    PubMed

    Hamm, Jon; Sullivan, Kristie; Clippinger, Amy J; Strickland, Judy; Bell, Shannon; Bhhatarai, Barun; Blaauboer, Bas; Casey, Warren; Dorman, David; Forsby, Anna; Garcia-Reyero, Natàlia; Gehen, Sean; Graepel, Rabea; Hotchkiss, Jon; Lowit, Anna; Matheson, Joanna; Reaves, Elissa; Scarano, Louis; Sprankle, Catherine; Tunkel, Jay; Wilson, Dan; Xia, Menghang; Zhu, Hao; Allen, David

    2017-01-06

    Acute systemic toxicity testing provides the basis for hazard labeling and risk management of chemicals. A number of international efforts have been directed at identifying non-animal alternatives for in vivo acute systemic toxicity tests. A September 2015 workshop, Alternative Approaches for Identifying Acute Systemic Toxicity: Moving from Research to Regulatory Testing, reviewed the state-of-the-science of non-animal alternatives for this testing and explored ways to facilitate implementation of alternatives. Workshop attendees included representatives from international regulatory agencies, academia, nongovernmental organizations, and industry. Resources identified as necessary for meaningful progress in implementing alternatives included compiling and making available high-quality reference data, training on use and interpretation of in vitro and in silico approaches, and global harmonization of testing requirements. Attendees particularly noted the need to characterize variability in reference data to evaluate new approaches. They also noted the importance of understanding the mechanisms of acute toxicity, which could be facilitated by the development of adverse outcome pathways. Workshop breakout groups explored different approaches to reducing or replacing animal use for acute toxicity testing, with each group crafting a roadmap and strategy to accomplish near-term progress. The workshop steering committee has organized efforts to implement the recommendations of the workshop participants.

  16. An information-theoretic approach to assess practical identifiability of parametric dynamical systems.

    PubMed

    Pant, Sanjay; Lombardi, Damiano

    2015-10-01

    A new approach for assessing parameter identifiability of dynamical systems in a Bayesian setting is presented. The concept of Shannon entropy is employed to measure the inherent uncertainty in the parameters. The expected reduction in this uncertainty is seen as the amount of information one expects to gain about the parameters due to the availability of noisy measurements of the dynamical system. Such expected information gain is interpreted in terms of the variance of a hypothetical measurement device that can measure the parameters directly, and is related to practical identifiability of the parameters. If the individual parameters are unidentifiable, correlation between parameter combinations is assessed through conditional mutual information to determine which sets of parameters can be identified together. The information theoretic quantities of entropy and information are evaluated numerically through a combination of Monte Carlo and k-nearest neighbour methods in a non-parametric fashion. Unlike many methods to evaluate identifiability proposed in the literature, the proposed approach takes the measurement-noise into account and is not restricted to any particular noise-structure. Whilst computationally intensive for large dynamical systems, it is easily parallelisable and is non-intrusive as it does not necessitate re-writing of the numerical solvers of the dynamical system. The application of such an approach is presented for a variety of dynamical systems--ranging from systems governed by ordinary differential equations to partial differential equations--and, where possible, validated against results previously published in the literature.
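
    The expected reduction in uncertainty described in this record can be written in standard information-theoretic notation. The symbols below are assumptions introduced here for illustration (theta for the parameters, y for the noisy measurements) and follow the textbook definitions, which may differ in detail from the paper's own formulation.

```latex
% Shannon (differential) entropy of the parameters before any measurement
H(\theta) = -\int p(\theta)\,\log p(\theta)\,\mathrm{d}\theta

% Expected information gain: prior entropy minus the expected posterior entropy,
% which equals the mutual information between parameters and measurements
I(\theta; y) = H(\theta) - \mathbb{E}_{y}\!\left[ H(\theta \mid y) \right]
             = \iint p(\theta, y)\,\log\frac{p(\theta, y)}{p(\theta)\,p(y)}\,\mathrm{d}\theta\,\mathrm{d}y
```

    Under these definitions, a gain close to zero means the measurements are expected to say little about the parameters, which is the sense in which the quantity relates to practical identifiability.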

  17. Mining and biodiversity offsets: a transparent and science-based approach to measure "no-net-loss".

    PubMed

    Virah-Sawmy, Malika; Ebeling, Johannes; Taplin, Roslyn

    2014-10-01

    Mining and associated infrastructure developments can present themselves as economic opportunities that are difficult to forego for developing and industrialised countries alike. Almost inevitably, however, they lead to biodiversity loss. This trade-off can be greatest in economically poor but highly biodiverse regions. Biodiversity offsets have, therefore, increasingly been promoted as a mechanism to help achieve both the aims of development and biodiversity conservation. Accordingly, this mechanism is emerging as a key tool for multinational mining companies to demonstrate good environmental stewardship. Relying on offsets to achieve "no-net-loss" of biodiversity, however, requires certainty in their ecological integrity where they are used to sanction habitat destruction. Here, we discuss real-world practices in biodiversity offsetting by assessing how well some leading initiatives internationally integrate critical aspects of biodiversity attributes, net loss accounting and project management. With the aim of improving, rather than merely critiquing the approach, we analyse different aspects of biodiversity offsetting. Further, we analyse the potential pitfalls of developing counterfactual scenarios of biodiversity loss or gains in a project's absence. In this, we draw on insights from experience with carbon offsetting. This informs our discussion of realistic projections of project effectiveness and permanence of benefits to ensure no net losses, and the risk of displacing, rather than avoiding biodiversity losses ("leakage"). We show that the most prominent existing biodiversity offset initiatives employ broad and somewhat arbitrary parameters to measure habitat value and do not sufficiently consider real-world challenges in compensating losses in an effective and lasting manner. We propose a more transparent and science-based approach, supported with a new formula, to help design biodiversity offsets to realise their potential in enabling more responsible

  18. A Systematic Approach to Determining the Identifiability of Multistage Carcinogenesis Models.

    PubMed

    Brouwer, Andrew F; Meza, Rafael; Eisenberg, Marisa C

    2016-09-09

    Multistage clonal expansion (MSCE) models of carcinogenesis are continuous-time Markov process models often used to relate cancer incidence to biological mechanism. Identifiability analysis determines what model parameter combinations can, theoretically, be estimated from given data. We use a systematic approach, based on differential algebra methods traditionally used for deterministic ordinary differential equation (ODE) models, to determine identifiable combinations for a generalized subclass of MSCE models with any number of preinitiation stages and one clonal expansion. Additionally, we determine the identifiable combinations of the generalized MSCE model with up to four clonal expansion stages, and conjecture the results for any number of clonal expansion stages. The results improve upon previous work in a number of ways and provide a framework to find the identifiable combinations for further variations on the MSCE models. Finally, our approach, which takes advantage of the Kolmogorov backward equations for the probability generating functions of the Markov process, demonstrates that identifiability methods used in engineering and mathematics for systems of ODEs can be applied to continuous-time Markov processes.

  19. Identification of gefitinib off-targets using a structure-based systems biology approach; their validation with reverse docking and retrospective data mining

    PubMed Central

    Verma, Nidhi; Rai, Amit Kumar; Kaushik, Vibha; Brünnert, Daniela; Chahar, Kirti Raj; Pandey, Janmejay; Goyal, Pankaj

    2016-01-01

    Gefitinib, an EGFR tyrosine kinase inhibitor, is used as an FDA-approved drug in breast cancer and non-small cell lung cancer treatment. However, this drug has certain side effects and complications for which the underlying molecular mechanisms are not well understood. By systems biology based in silico analysis, we identified off-targets of gefitinib that might explain side effects of this drug. The crystal structure of the EGFR-gefitinib complex was used for binding pocket similarity searches on a druggable proteome database (Sc-PDB) by using IsoMIF Finder. The top 128 hits of putative off-targets were validated by a reverse docking approach. The results showed that the identified off-targets bind efficiently with gefitinib. The identified human-specific off-targets were confirmed and further analyzed for their links with biological processes and clinical disease pathways using retrospective studies and literature mining, respectively. Notably, many of the off-targets identified in this study were reported in previous high-throughput screenings. Interestingly, the present study reveals that gefitinib may have positive effects in reducing brain and bone metastasis, and may be useful in defining novel gefitinib-based treatment regimes. We propose that a system-wide approach could be useful during new drug development and to minimize side effects of the prospective drug. PMID:27653775

  20. Identification of gefitinib off-targets using a structure-based systems biology approach; their validation with reverse docking and retrospective data mining.

    PubMed

    Verma, Nidhi; Rai, Amit Kumar; Kaushik, Vibha; Brünnert, Daniela; Chahar, Kirti Raj; Pandey, Janmejay; Goyal, Pankaj

    2016-09-22

    Gefitinib, an EGFR tyrosine kinase inhibitor, is used as an FDA-approved drug in breast cancer and non-small cell lung cancer treatment. However, this drug has certain side effects and complications for which the underlying molecular mechanisms are not well understood. By systems biology based in silico analysis, we identified off-targets of gefitinib that might explain side effects of this drug. The crystal structure of the EGFR-gefitinib complex was used for binding pocket similarity searches on a druggable proteome database (Sc-PDB) by using IsoMIF Finder. The top 128 hits of putative off-targets were validated by a reverse docking approach. The results showed that the identified off-targets bind efficiently with gefitinib. The identified human-specific off-targets were confirmed and further analyzed for their links with biological processes and clinical disease pathways using retrospective studies and literature mining, respectively. Notably, many of the off-targets identified in this study were reported in previous high-throughput screenings. Interestingly, the present study reveals that gefitinib may have positive effects in reducing brain and bone metastasis, and may be useful in defining novel gefitinib-based treatment regimes. We propose that a system-wide approach could be useful during new drug development and to minimize side effects of the prospective drug.

  1. An integrated remote sensing approach for identifying ecological range sites. [parker mountain

    NASA Technical Reports Server (NTRS)

    Jaynes, R. A.

    1983-01-01

    A model approach for identifying ecological range sites was applied to high elevation sagebrush-dominated rangelands on Parker Mountain, in south-central Utah. The approach utilizes map information derived from both high altitude color infrared photography and LANDSAT digital data, integrated with soils, geological, and precipitation maps. Identification of the ecological range site for a given area requires an evaluation of all relevant environmental factors which combine to give that site the potential to produce characteristic types and amounts of vegetation. A table is presented which allows the user to determine ecological range site based upon an integrated use of the maps which were prepared. The advantages of identifying ecological range sites through an integrated photo interpretation/LANDSAT analysis are discussed.

  2. Online-Based Approaches to Identify Real Journals and Publishers from Hijacked Ones.

    PubMed

    Asadi, Amin; Rahbar, Nader; Asadi, Meisam; Asadi, Fahime; Khalili Paji, Kokab

    2017-02-01

    The aim of the present paper was to introduce some online-based approaches to evaluate scientific journals and publishers and to differentiate them from the hijacked ones, regardless of their disciplines. With the advent of open-access journals, many hijacked journals and publishers have deceitfully assumed the mantle of authenticity in order to take advantage of researchers and students. Although these hijacked journals and publishers can be identified through checking their advertisement techniques and their websites, these ways do not always result in their identification. There exist certain online-based approaches, such as using Master Journal List provided by Thomson Reuters, and Scopus database, and using the DOI of a paper, to certify the realness of a journal or publisher. It is indispensable that inexperienced students and researchers know these methods so as to identify hijacked journals and publishers with a higher level of probability.

  3. An innovative and integrated approach based on DNA walking to identify unauthorised GMOs.

    PubMed

    Fraiture, Marie-Alice; Herman, Philippe; Taverniers, Isabel; De Loose, Marc; Deforce, Dieter; Roosens, Nancy H

    2014-03-15

    In the coming years, the frequency of unauthorised genetically modified organisms (GMOs) being present in the European food and feed chain will increase significantly. Therefore, we have developed a strategy to identify unauthorised GMOs containing a pCAMBIA family vector, frequently present in transgenic plants. This integrated approach is performed in two successive steps on Bt rice grains. First, the potential presence of unauthorised GMOs is assessed by the qPCR SYBR®Green technology targeting the terminator 35S pCAMBIA element. Second, its presence is confirmed via the characterisation of the junction between the transgenic cassette and the rice genome. To this end, a DNA walking strategy is applied using a first reverse primer followed by two semi-nested PCR rounds using primers that are each time nested to the previous reverse primer. This approach allows rapid identification of the transgene flanking region and can easily be implemented by enforcement laboratories.

  4. Geotechnical approaches to coal ash content control in mining of complex structure deposits

    NASA Astrophysics Data System (ADS)

    Batugin, SA; Gavrilov, VL; Khoyutanov, EA

    2017-02-01

    Coal deposits having complex structure and nonuniform quality coal reserves require improved processes of production quality control. The paper proposes a method to present coal ash content as components of natural and technological dilution. It is chosen to carry out studies on the western site of Elginsk coal deposit, composed of four coal beds of complex structure. The reported estimates of coal ash content in the beds with respect to five components point at the need to account for such data in confirmation exploration, mine planning and actual mining. Basic means of analysis and control of overall ash content and its components are discussed.

  5. Rate of occupational accidents in the mining industry since 1950--a successful approach to prevention policy.

    PubMed

    Breuer, Joachim; Höffer, Eva-Marie; Hummitzsch, Walter

    2002-01-01

    This paper deals with the decrease in the rate of accident insurance claims in the German mining industry over the last five decades. It intends to show that this process is above all the result of a prevention policy where companies and the body responsible for the legal accident insurance in the mining industry, the Bergbau-Berufsgenossenschaft (BBG), work hand in hand. A system like the German accident insurance scheme, combining prevention, rehabilitation, and compensation, enables successful and modern safety and health measures.

  6. Demonstrating a Market-Based Approach to the Reclamation of Mined Lands in West Virginia

    SciTech Connect

    John W. Goodrich-Mahoney; Paul Ziemkiewicz

    2006-07-19

    This is the third quarter progress report of Phase II of a three-phase project to develop and evaluate the efficacy of developing multiple environmental market trading credits on a partially reclaimed surface mined site near Valley Point, Preston County, WV. Construction of the passive acid mine drainage (AMD) treatment system was completed but several modifications from the original design had to be made following the land survey and during construction to compensate for unforeseen circumstances. We continued to collect baseline quality data from the Conner Run AMD seeps to confirm the conceptual and final design for the passive AMD treatment system.

  7. An Approach to Identify and Characterize a Subunit Candidate Shigella Vaccine Antigen.

    PubMed

    Pore, Debasis; Chakrabarti, Manoj K

    2016-01-01

    Shigellosis remains a serious issue throughout the developing countries, particularly in children under the age of 5. Numerous strategies have been tested to develop vaccines targeting shigellosis; unfortunately despite several years of extensive research, no safe, effective, and inexpensive vaccine against shigellosis is available so far. Here, we illustrate in detail an approach to identify and establish immunogenic outer membrane proteins from Shigella flexneri 2a as subunit vaccine candidates.

  8. Impacts of mountaintop mining on terrestrial ecosystem integrity: Identifying landscape thresholds for avian species in the central Appalachians, United States

    USGS Publications Warehouse

    Becker, Douglas A.; Wood, Petra Bohall; Strager, Michael P.; Mazzarella, Christine

    2014-01-01

    Because of little overlap in habitat requirements, managing landscapes simultaneously to maximally benefit both guilds may not be possible. Our avian thresholds identify single community management targets accounting for scarce species. Guild or individual species thresholds allow for species-specific management.

  9. Systems approaches in osteoarthritis: Identifying routes to novel diagnostic and therapeutic strategies.

    PubMed

    Mueller, Alan J; Peffers, Mandy J; Proctor, Carole J; Clegg, Peter D

    2017-03-20

    Systems orientated research offers the possibility of identifying novel therapeutic targets and relevant diagnostic markers for complex diseases such as osteoarthritis. This review demonstrates that the osteoarthritis research community has been slow to incorporate systems orientated approaches into research studies, although a number of key studies reveal novel insights into the regulatory mechanisms that contribute both to joint tissue homeostasis and its dysfunction. The review introduces both top-down and bottom-up approaches employed in the study of osteoarthritis. A holistic and multiscale approach, where clinical measurements may predict dysregulation and progression of joint degeneration, should be a key objective in future research. The review concludes with suggestions for further research and emerging trends not least of which is the coupled development of diagnostic tests and therapeutics as part of a concerted effort by the osteoarthritis research community to meet clinical needs.

  10. An Approach for Identifying Cytokines Based on a Novel Ensemble Classifier

    PubMed Central

    Zou, Quan; Wang, Zhen; Guan, Xinjun; Liu, Bin; Wu, Yunfeng; Lin, Ziyu

    2013-01-01

    Identifying cytokines and investigating their various functions and biochemical mechanisms is biologically meaningful and important. However, several issues remain, including the large scale of benchmark datasets, serious imbalance of data, and discovery of new gene families. In this paper, we employ a machine learning approach based on a novel ensemble classifier to predict cytokines. We directly selected amino acid sequences as research objects. First, we pretreated the benchmark data accurately. Next, we analyzed the physicochemical properties and distribution of whole amino acids and then extracted a group of 120-dimensional (120D) valid features to represent sequences. Third, in view of the serious imbalance in benchmark datasets, we utilized a sampling approach based on the synthetic minority oversampling technique algorithm and K-means clustering undersampling algorithm to rebuild the training set. Finally, we built a library for dynamic selection and circulating combination based on clustering (LibD3C) and employed the new training set to realize cytokine classification. Experiments showed that the geometric mean of sensitivity and specificity obtained through our approach is as high as 93.3%, which indicates that our approach is effective for identifying cytokines. PMID:24027761
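
    The headline metric in this record, the geometric mean of sensitivity and specificity, is commonly used on imbalanced data because it drops to zero when either class is entirely misclassified. The sketch below shows the standard formula only; the function name and the confusion-matrix counts are illustrative assumptions, not values from the paper.

```python
import math


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must have at least one example")
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return math.sqrt(sensitivity * specificity)


# Hypothetical counts: 90 of 100 positives and 80 of 100 negatives classified correctly.
print(round(g_mean(tp=90, fn=10, tn=80, fp=20), 4))
```

    A classifier that labels every sequence as the majority class would score high on plain accuracy but zero on this measure, which is why it suits the imbalance problem the abstract describes.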

  11. A Bayesian Credible Subgroups Approach to Identifying Patient Subgroups with Positive Treatment Effects

    PubMed Central

    Tang, Qi; Offen, Walter W.; Carlin, Bradley P.

    2016-01-01

    Summary Many new experimental treatments benefit only a subset of the population. Identifying the baseline covariate profiles of patients who benefit from such a treatment, rather than determining whether or not the treatment has a population-level effect, can substantially lessen the risk in undertaking a clinical trial and expose fewer patients to treatments that do not benefit them. The standard analyses for identifying patient subgroups that benefit from an experimental treatment either do not account for multiplicity, or focus on testing for the presence of treatment-covariate interactions rather than the resulting individualized treatment effects. We propose a Bayesian credible subgroups method to identify two bounding subgroups for the benefiting subgroup: one for which it is likely that all members simultaneously have a treatment effect exceeding a specified threshold, and another for which it is likely that no members do. We examine frequentist properties of the credible subgroups method via simulations and illustrate the approach using data from an Alzheimer's disease treatment trial. We conclude with a discussion of the advantages and limitations of this approach to identifying patients for whom the treatment is beneficial. PMID:27159131

  12. Identifying and assessing the application of ecosystem services approaches in environmental policies and decision making.

    PubMed

    Van Wensem, Joke; Calow, Peter; Dollacker, Annik; Maltby, Lorraine; Olander, Lydia; Tuvendal, Magnus; Van Houtven, George

    2017-01-01

    The presumption is that ecosystem services (ES) approaches provide a better basis for environmental decision making than do other approaches because they make explicit the connection between human well-being and ecosystem structures and processes. However, the existing literature does not provide a precise description of ES approaches for environmental policy and decision making, nor does it assess whether these applications will make a difference in terms of changing decisions and improving outcomes. We describe 3 criteria that can be used to identify whether and to what extent ES approaches are being applied: 1) connect impacts all the way from ecosystem changes to human well-being, 2) consider all relevant ES affected by the decision, and 3) consider and compare the changes in well-being of different stakeholders. As a demonstration, we then analyze retrospectively whether and how the criteria were met in different decision-making contexts. For this assessment, we have developed an analysis format that describes the type of policy, the relevant scales, the decisions or questions, the decision maker, and the underlying documents. This format includes a general judgment of how far the 3 ES criteria have been applied. It shows that the criteria can be applied to many different decision-making processes, ranging from the supranational to the local scale and to different parts of decision-making processes. In conclusion we suggest these criteria could be used for assessments of the extent to which ES approaches have been and should be applied, what benefits and challenges arise, and whether using ES approaches made a difference in the decision-making process, decisions made, or outcomes of those decisions. Results from such studies could inform future use and development of ES approaches, draw attention to where the greatest benefits and challenges are, and help to target integration of ES approaches into policies, where they can be most effective. Integr Environ

  13. A Virtual Screening Approach For Identifying Plants with Anti H5N1 Neuraminidase Activity

    PubMed Central

    2016-01-01

    Recent outbreaks of highly pathogenic and occasional drug-resistant influenza strains have highlighted the need to develop novel anti-influenza therapeutics. Here, we report computational and experimental efforts to identify influenza neuraminidase inhibitors from among the 3000 natural compounds in the Malaysian-Plants Natural-Product (NADI) database. These 3000 compounds were first docked into the neuraminidase active site. The five plants with the largest number of top predicted ligands were selected for experimental evaluation. Twelve specific compounds isolated from these five plants were shown to inhibit neuraminidase, including two compounds with IC50 values less than 92 μM. Furthermore, four of the 12 isolated compounds had also been identified in the top 100 compounds from the virtual screen. Together, these results suggest an effective new approach for identifying bioactive plant species that will further the identification of new pharmacologically active compounds from diverse natural-product resources. PMID:25555059
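
    The plant-selection step described in this record (choosing the plants that contribute the most top-ranked docked compounds) can be sketched as a simple count. This is an illustrative reading of the abstract, not the authors' pipeline. The record layout, the function name, the cutoff, and the convention that a lower docking score means a better predicted ligand are all assumptions, and the data are made up.

```python
from collections import Counter


def rank_plants_by_top_hits(docked, top_n):
    """Count how many of the `top_n` best-scoring compounds come from each plant.

    `docked` is a list of (compound, plant, score) tuples; lower score is
    assumed to mean stronger predicted binding.
    """
    best = sorted(docked, key=lambda record: record[2])[:top_n]
    counts = Counter(plant for _compound, plant, _score in best)
    return counts.most_common()  # plants ordered by number of top hits


# Hypothetical docking results for five compounds from three plants.
results = [
    ("c1", "plant_A", -9.0),
    ("c2", "plant_A", -8.5),
    ("c3", "plant_B", -8.0),
    ("c4", "plant_C", -5.0),
    ("c5", "plant_B", -4.0),
]
print(rank_plants_by_top_hits(results, top_n=3))
```

    In practice a compound may occur in several plants, in which case each occurrence would need its own record under this layout.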

  14. Deformation Prediction and Geometrical Modeling of Head and Neck Cancer Tumor: A Data Mining Approach

    NASA Astrophysics Data System (ADS)

    Azimi, Maryam

    Radiation therapy has been used in the treatment of cancer tumors for several years, and many cancer patients receive radiotherapy. It may be used as primary therapy or in combination with surgery or other kinds of therapy such as chemotherapy, hormone therapy, or some mixture of the three. The treatment objective is to destroy cancer cells or shrink the tumor by planning an adequate radiation dose to the desired target without damaging the normal tissues. Using pre-treatment Computed Tomography (CT) images, most radiotherapy planning systems design the target and assume that the size of the tumor will not change throughout the treatment course, which takes 5 to 7 weeks. Based on this assumption, the total amount of radiation is planned and fractionated into the daily dose to be delivered to the patient's body. However, this assumption is flawed because patients receiving radiotherapy have marked changes in tumor geometry during the treatment period. Therefore, there is a critical need to understand the changes in tumor shape and size over time during the course of radiotherapy in order to prevent significant effects of inaccuracy in the planning. In this research, a methodology is proposed to monitor and predict daily (fraction day) tumor volume and surface changes of head and neck cancer tumors during the entire treatment period. In the proposed method, geometrical modeling and data mining techniques will be used rather than repetitive CT scan data to predict the tumor deformation for radiation planning. Clinical patient data were obtained from the University of Texas-MD Anderson Cancer Center (MDACC). In the first step, by using CT scan data, the tumor's progressive geometric changes during the treatment period are quantified. The next step uses regression analysis to develop predictive models for tumor geometry based on the geometric analysis results and the patients' selected attributes (age, weight

  15. Early Prediction of Students' Grade Point Averages at Graduation: A Data Mining Approach

    ERIC Educational Resources Information Center

    Tekin, Ahmet

    2014-01-01

    Problem Statement: There has recently been interest in educational databases containing a variety of valuable but sometimes hidden data that can be used to help less successful students to improve their academic performance. The extraction of hidden information from these databases often implements aspects of the educational data mining (EDM)…

  16. Data mining for water resource management part 2 - methods and approaches to solving contemporary problems

    USGS Publications Warehouse

    Roehl, Edwin A.; Conrads, Paul A.

    2010-01-01

    This is the second of two papers that describe how data mining can aid natural-resource managers with the difficult problem of controlling the interactions between hydrologic and man-made systems. Data mining is a new science that assists scientists in converting large databases into knowledge, and is uniquely able to leverage the large amounts of real-time, multivariate data now being collected for hydrologic systems. Part 1 gives a high-level overview of data mining, and describes several applications that have addressed major water resource issues in South Carolina. This Part 2 paper describes how various data mining methods are integrated to produce predictive models for controlling surface- and groundwater hydraulics and quality. The methods include:
    - signal processing to remove noise and decompose complex signals into simpler components;
    - time series clustering that optimally groups hundreds of signals into "classes" that behave similarly for data reduction and (or) divide-and-conquer problem solving;
    - classification which optimally matches new data to behavioral classes;
    - artificial neural networks which optimally fit multivariate data to create predictive models;
    - model response surface visualization that greatly aids in understanding data and physical processes; and
    - decision support systems that integrate data, models, and graphics into a single package that is easy to use.

  17. A semiautomated approach to gene discovery through expressed sequence tag data mining: discovery of new human transporter genes.

    PubMed

    Brown, Shoshana; Chang, Jean L; Sadée, Wolfgang; Babbitt, Patricia C

    2003-01-01

    Identification and functional characterization of the genes in the human genome remain a major challenge. A principal source of publicly available information used for this purpose is the National Center for Biotechnology Information database of expressed sequence tags (dbEST), which contains over 4 million human ESTs. To extract the information buried in this data more effectively, we have developed a semiautomated method to mine dbEST for uncharacterized human genes. Starting with a single protein input sequence, a family of related proteins from all species is compiled. This entire family is then used to mine the human EST database for new gene candidates. Evaluation of putative new gene candidates in the context of a family of characterized proteins provides a framework for inference of the structure and function of the new genes. When applied to a test data set of 28 families within the major facilitator superfamily (MFS) of membrane transporters, our protocol found 73 previously characterized human MFS genes and 43 new MFS gene candidates. Development of this approach provided insights into the problems and pitfalls of automated data mining using public databases.

  18. Beyond the biomedical and behavioural: towards an integrated approach to HIV prevention in the southern African mining industry.

    PubMed

    Campbell, C; Williams, B

    1999-06-01

    While migrant labour is believed to play an important role in the dynamics of HIV-transmission in many of the countries of southern Africa, little has been written about the way in which HIV/AIDS has been dealt with in the industrial settings in which many migrant workers are employed. This paper takes the gold mining industry in the countries of the Southern African Development Community (SADC) as a case study. While many mines made substantial efforts to establish HIV-prevention programmes relatively early on in the epidemic, these appear to have had little impact. The paper analyses the response of key players in the mining industry, in the interests of highlighting the limitations of the way in which both managements and trade unions have responded to HIV. It will be argued that the energy that has been devoted either to biomedical or behavioural prevention programmes or to human rights issues has served to obscure the social and developmental dimensions of HIV-transmission. This argument is supported by means of a case study which seeks to highlight the complexity of the dynamics of disease transmission in this context, a complexity which is not reflected in individualistic responses. An account is given of a new intervention which seeks to develop a more integrated approach to HIV management in an industrial setting.

  19. A data mining approach to in vivo classification of psychopharmacological drugs.

    PubMed

    Kafkafi, Neri; Yekutieli, Daniel; Elmer, Greg I

    2009-02-01

    Data mining is a powerful bioinformatics strategy that has been successfully applied in vitro to screen for gene-expression profiles predicting toxicological or carcinogenic response ('class predictors'). In this report we used a data mining algorithm named Pattern Array (PA) in vivo to analyze mouse open-field behavior and characterize the psychopharmacological effects of three drug classes: psychomotor stimulant, opioid, and psychotomimetic. PA represents rodent movement with approximately 100,000 complex patterns, defined as multiple combinations of several ethologically relevant variables, and mines them for those that maximize any effect of interest, such as the difference between drug classes. We show that PA can discover behavioral predictors of all three drug classes, thus developing a reliable drug-classification scheme in small group sizes. The discovered predictors showed orderly dose dependency despite being explicitly mined only for class differences, with the high doses scoring 4-10 standard deviations from the vehicle group. Furthermore, these predictors correctly classified in a dose-dependent manner four 'unknown' drugs (i.e., drugs that were not used in the training process), and scored a mixture of a psychomotor stimulant and an opioid as being intermediate between these two classes. The isolated behaviors were highly heritable (h² > 50%) and replicable as determined in 10 inbred strains across three laboratories. PA can in principle be applied for mining behaviors predicting additional properties, such as within-class differences between drugs and within-drug dose-response, all of which can be measured automatically in a single session per animal in an open-field arena, suggesting a high potential as a tool in psychotherapeutic drug discovery.

  20. Translational informatics approach for identifying the functional molecular communicators linking coronary artery disease, infection and inflammation.

    PubMed

    Sharma, Ankit; Ghatge, Madankumar; Mundkur, Lakshmi; Vangala, Rajani Kanth

    2016-05-01

    Translational informatics approaches are required for the integration of diverse and accumulating data to enable the administration of effective translational medicine, specifically in complex diseases such as coronary artery disease (CAD). In the current study, a novel approach for elucidating the association between infection, inflammation and CAD was used. Genes for CAD were collected from the CAD‑gene database and those for infection and inflammation were collected from the UniProt database. The cytomegalovirus (CMV)‑induced genes were identified from the literature and the CAD‑associated clinical phenotypes were obtained from the Unified Medical Language System. A total of 55 gene ontologies (GO) termed functional communicator ontologies were identified in the gene sets linking clinical phenotypes in the diseasome network. The network topology analysis suggested that important functions including viral entry, cell adhesion, apoptosis, inflammatory and immune responses networked with clinical phenotypes. Microarray data were extracted from the Gene Expression Omnibus (dataset: GSE48060) for the highly networked disease myocardial infarction. Further analysis of differentially expressed genes and their GO terms suggested that CMV infection may trigger a xenobiotic response, oxidative stress, inflammation and immune modulation. Notably, the current study identified γ‑glutamyl transferase (GGT)‑5 as a potential biomarker with an odds ratio of 1.947, which increased to 2.561 following the addition of CMV and CMV‑neutralizing antibody (CMV‑NA) titers. The C‑statistics increased from 0.530 for conventional risk factors (CRFs) to 0.711 for GGT in combination with the above-mentioned infections and CRFs. Therefore, the translational informatics approach used in the current study identified a potential molecular mechanism for CMV infection in CAD, and a potential biomarker for risk prediction.

  1. Identifying comorbid depression and disruptive behavior disorders: Comparison of two approaches used in adolescent studies

    PubMed Central

    Stoep, Ann Vander; Adrian, Molly C.; Rhew, Isaac C.; McCauley, Elizabeth; Herting, Jerald R.; Kraemer, Helena C.

    2013-01-01

    Interest in commonly co-occurring depression and disruptive behavior disorders in children has yielded a small body of research that estimates the prevalence of this comorbid condition and compares children with the comorbid condition and children with depression or disruptive behavior disorders alone with respect to antecedents and outcomes. Prior studies have used one of two different approaches to measure comorbid disorders: 1) meeting criteria for two DSM or ICD diagnoses or 2) scoring 0.5 SD above the mean or higher on two dimensional scales. This study compares two snapshots of comorbidity taken simultaneously in the same sample with each of the measurement approaches. The Developmental Pathways Project administered structured diagnostic interviews as well as dimensional scales to a community-based sample of 521 11- to 12-year-olds to assess depression and disruptive behavior disorders. Clinical caseness indicators of children identified as “comorbid” by each method were examined concurrently and 3 years later. Cross-classification of adolescents via the two approaches revealed low agreement. When other indicators of caseness, including functional impairment, need for services, and clinical elevations on other symptom scales were examined, adolescents identified as comorbid via dimensional scales only were similar to those who were identified as comorbid via DSM-IV diagnostic criteria. Findings suggest that when relying solely on DSM diagnostic criteria for comorbid depression and disruptive behavior disorders, many adolescents with significant impairment will be overlooked. Findings also suggest that lower dimensional scale thresholds can be set when comorbid conditions, rather than single forms of psychopathology, are being identified. PMID:22575333
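    The dimensional approach mentioned above (scoring 0.5 SD above the mean or higher on two scales) can be illustrated with a short sketch. The function name, the toy scores, and the use of the sample standard deviation computed from the same sample are assumptions made for illustration; the abstract does not say how the study derived its cut-offs.

```python
from statistics import mean, stdev


def flag_comorbid(depression_scores, disruptive_scores, k=0.5):
    """Flag each participant who scores at least k SD above the mean on BOTH scales.

    Cut-offs are computed from the supplied sample (an assumption; a study
    might instead use published norms).
    """
    dep_cut = mean(depression_scores) + k * stdev(depression_scores)
    dbd_cut = mean(disruptive_scores) + k * stdev(disruptive_scores)
    return [
        dep >= dep_cut and dbd >= dbd_cut
        for dep, dbd in zip(depression_scores, disruptive_scores)
    ]


# Toy scores for five hypothetical participants.
depression = [2, 4, 6, 8, 10]
disruptive = [9, 3, 5, 7, 1]

flags = flag_comorbid(depression, disruptive)
print(flags)
```

    Only a participant elevated on both scales is flagged; someone high on depression alone, like the last toy participant, is not.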

  2. Identifying comorbid depression and disruptive behavior disorders: comparison of two approaches used in adolescent studies.

    PubMed

    Vander Stoep, Ann; Adrian, Molly C; Rhew, Isaac C; McCauley, Elizabeth; Herting, Jerald R; Kraemer, Helena C

    2012-07-01

    Interest in commonly co-occurring depression and disruptive behavior disorders in children has yielded a small body of research that estimates the prevalence of this comorbid condition and compares children with the comorbid condition and children with depression or disruptive behavior disorders alone with respect to antecedents and outcomes. Prior studies have used one of two different approaches to measure comorbid disorders: (1) meeting criteria for two DSM or ICD diagnoses or (2) scoring 0.5 SD above the mean or higher on two dimensional scales. This study compares two snapshots of comorbidity taken simultaneously in the same sample with each of the measurement approaches. The Developmental Pathways Project administered structured diagnostic interviews as well as dimensional scales to a community-based sample of 521 11- to 12-year-olds to assess depression and disruptive behavior disorders. Clinical caseness indicators of children identified as "comorbid" by each method were examined concurrently and 3 years later. Cross-classification of adolescents via the two approaches revealed low agreement. When other indicators of caseness, including functional impairment, need for services, and clinical elevations on other symptom scales were examined, adolescents identified as comorbid via dimensional scales only were similar to those who were identified as comorbid via DSM-IV diagnostic criteria. Findings suggest that when relying solely on DSM diagnostic criteria for comorbid depression and disruptive behavior disorders, many adolescents with significant impairment will be overlooked. Findings also suggest that lower dimensional scale thresholds can be set when comorbid conditions, rather than single forms of psychopathology, are being identified.

  3. Blind SELEX Approach Identifies RNA Aptamer that Regulate EMT and Inhibit Metastasis.

    PubMed

    Yoon, Sorah; Armstrong, Brian; Habib, Nagy; Rossi, John J

    2017-04-10

    Identifying targets that are exposed on the plasma membrane of tumor cells, but expressed internally in normal cells, is a fundamental issue for improving the specificity and efficacy of anticancer therapeutics. Using blind cell SELEX (Systematic Evolution of Ligands by EXponential enrichment), which is untargeted SELEX, we have identified an aptamer, P15, which specifically bound to human pancreatic adenocarcinoma cells. To identify the plasma membrane protein bound by the aptamer, liquid chromatography tandem mass spectrometry (LC-MS/MS) was used. The results of this unbiased proteomic mass spectrometry approach identified the target of P15 as the intermediate filament vimentin, a biomarker of epithelial mesenchymal transition (EMT), which is an intracellular protein but is specifically expressed on the plasma membrane of cancer cells. As EMT plays a pivotal role in the transition of cancer cells to invasive cells, tumor cell metastasis assays were performed in vitro. P15-treated pancreatic cancer cells showed significant inhibition of tumor metastasis. To investigate the downstream effects of P15, EMT-related gene expression analysis was performed to identify differentially expressed genes (DEGs). Among five DEGs, P15-treated cells showed down-regulated expression of matrix metallopeptidase 3 (MMP3), which is involved in cancer invasion. These results, for the first time, demonstrate that P15 binding to cell surface vimentin inhibits tumor cell invasion and is associated with reduced MMP3 expression, suggesting that P15 has potential as an anti-metastatic therapy in pancreatic cancer.

  4. Importance of multi-modal approaches to effectively identify cataract cases from electronic health records

    PubMed Central

    Rasmussen, Luke V; Berg, Richard L; Linneman, James G; McCarty, Catherine A; Waudby, Carol; Chen, Lin; Denny, Joshua C; Wilke, Russell A; Pathak, Jyotishman; Carrell, David; Kho, Abel N; Starren, Justin B

    2012-01-01

    Objective: There is increasing interest in using electronic health records (EHRs) to identify subjects for genomic association studies, due in part to the availability of large amounts of clinical data and the expected cost efficiencies of subject identification. We describe the construction and validation of an EHR-based algorithm to identify subjects with age-related cataracts. Materials and methods: We used a multi-modal strategy consisting of structured database querying, natural language processing on free-text documents, and optical character recognition on scanned clinical images to identify cataract subjects and related cataract attributes. Extensive validation on 3657 subjects compared the multi-modal results to manual chart review. The algorithm was also implemented at participating electronic MEdical Records and GEnomics (eMERGE) institutions. Results: An EHR-based cataract phenotyping algorithm was successfully developed and validated, resulting in positive predictive values (PPVs) >95%. The multi-modal approach increased the identification of cataract subject attributes by a factor of three compared to single-mode approaches while maintaining high PPV. Components of the cataract algorithm were successfully deployed at three other institutions with similar accuracy. Discussion: A multi-modal strategy incorporating optical character recognition and natural language processing may increase the number of cases identified while maintaining similar PPVs. Such algorithms, however, require that the needed information be embedded within clinical documents. Conclusion: We have demonstrated that algorithms to identify and characterize cataracts can be developed utilizing data collected via the EHR. These algorithms provide a high level of accuracy even when implemented across multiple EHRs and institutional boundaries. PMID:22319176

  5. An unbiased approach to identifying tau kinases that phosphorylate tau at sites associated with Alzheimer disease.

    PubMed

    Cavallini, Annalisa; Brewerton, Suzanne; Bell, Amanda; Sargent, Samantha; Glover, Sarah; Hardy, Clare; Moore, Roger; Calley, John; Ramachandran, Devaki; Poidinger, Michael; Karran, Eric; Davies, Peter; Hutton, Michael; Szekeres, Philip; Bose, Suchira

    2013-08-09

    Neurofibrillary tangles, one of the hallmarks of Alzheimer disease (AD), are composed of paired helical filaments of abnormally hyperphosphorylated tau. The accumulation of these proteinaceous aggregates in AD correlates with synaptic loss and severity of dementia. Identifying the kinases involved in the pathological phosphorylation of tau may identify novel targets for AD. We used an unbiased approach to study the effect of 352 human kinases on their ability to phosphorylate tau at epitopes associated with AD. The kinases were overexpressed together with the longest form of human tau in human neuroblastoma cells. Levels of total and phosphorylated tau (epitopes Ser(P)-202, Thr(P)-231, Ser(P)-235, and Ser(P)-396/404) were measured in cell lysates using AlphaScreen assays. GSK3α, GSK3β, and MAPK13 were found to be the most active tau kinases, phosphorylating tau at all four epitopes. We further dissected the effects of GSK3α and GSK3β using pharmacological and genetic tools in hTau primary cortical neurons. Pathway analysis of the kinases identified in the screen suggested mechanisms for regulation of total tau levels and tau phosphorylation; for example, kinases that affect total tau levels do so by inhibition or activation of translation. A network fishing approach with the kinase hits identified other key molecules putatively involved in tau phosphorylation pathways, including the G-protein signaling through the Ras family of GTPases (MAPK family) pathway. The findings identify novel tau kinases and novel pathways that may be relevant for AD and other tauopathies.

  6. A novel approach for identifying causal models of complex diseases from family data.

    PubMed

    Park, Leeyoung; Kim, Ju H

    2015-04-01

    Causal models including genetic factors are important for understanding the presentation mechanisms of complex diseases. Familial aggregation and segregation analyses based on polygenic threshold models have been the primary approach to fitting genetic models to the family data of complex diseases. In the current study, an advanced approach to obtaining appropriate causal models for complex diseases based on the sufficient component cause (SCC) model involving combinations of traditional genetics principles was proposed. The probabilities for the entire population, i.e., normal-normal, normal-disease, and disease-disease, were considered for each model for the appropriate handling of common complex diseases. The causal model in the current study included the genetic effects from single genes involving epistasis, complementary gene interactions, gene-environment interactions, and environmental effects. Bayesian inference using a Markov chain Monte Carlo (MCMC) algorithm was used to assess the proportions of each component for a given population lifetime incidence. This approach is flexible, allowing both common and rare variants within a gene and across multiple genes. An application to schizophrenia data confirmed the complexity of the causal factors. An analysis of diabetes data demonstrated that environmental factors and gene-environment interactions are the main causal factors for type II diabetes. The proposed method is effective and useful for identifying causal models, which can accelerate the development of efficient strategies for identifying causal factors of complex diseases.

  7. A Novel Approach to Identifying Trajectories of Mobility Change in Older Adults

    PubMed Central

    Ward, Rachel E.; Beauchamp, Marla K.; Latham, Nancy K.; Leveille, Suzanne G.; Percac-Lima, Sanja; Kurlinski, Laura; Ni, Pengsheng; Goldstein, Richard; Jette, Alan M.; Bean, Jonathan F.

    2016-01-01

    Objectives: To validate trajectories of late-life mobility change using a novel approach designed to overcome the constraints of modest sample size and few follow-up time points. Methods: Using clinical reasoning and distribution-based methodology, we identified trajectories of mobility change (Late Life Function and Disability Instrument) across 2 years in 391 participants age ≥65 years from a prospective cohort study designed to identify modifiable impairments predictive of mobility in late life. We validated our approach using model fit indices and by comparing baseline mobility-related factors between trajectories. Results: Model fit indices confirmed that the optimal number of trajectories was between 4 and 6. Mobility-related factors varied across trajectories, with the most unfavorable values in poor mobility trajectories and the most favorable in high mobility trajectories. These factors included leg strength, trunk extension endurance, knee flexion range of motion, limb velocity, physical performance measures, and the number and prevalence of medical conditions including osteoarthritis and back pain. Conclusions: Our findings support the validity of this approach and may facilitate the investigation of a broader scope of research questions within aging populations of varied sizes and traits. PMID:28006024

  8. Improving mine safety technology and training: establishing US global leadership

    SciTech Connect

    2006-12-15

    In 2006, the USA's record of mine safety was interrupted by fatalities that rocked the industry and caused the National Mining Association and its members to recommit to returning the US underground coal mining industry to a global mine safety leadership role. This report details a comprehensive approach to increase the odds of survival for miners in emergency situations and to create a culture of prevention of accidents. Among its 75 recommendations are a need to improve communications, mine rescue training, and escape and protection of miners. Section headings of the report are: Introduction; Review of mine emergency situations in the past 25 years: identifying and addressing the issues and complexities; Risk-based design and management; Communications technology; Escape and protection strategies; Emergency response and mine rescue procedures; Training for preparedness; Summary of recommendations; and Conclusions. 37 refs., 3 figs., 5 apps.

  9. A rational approach to identify inhibitors of Mycobacterium tuberculosis enoyl acyl carrier protein reductase.

    PubMed

    Chhabria, Mahesh T; Parmar, Kailash B; Brahmkshatriya, Pathik S

    2013-01-01

    Mycobacterial enoyl acyl carrier protein (ACP) reductase is an attractive target for focused design of novel antitubercular agents. Structural information available on enoyl-ACP reductase in complex with different ligands was used to generate a receptor-based pharmacophore model in Discovery Studio (DS). In parallel, pharmacophore models were also generated using a ligand-based approach (HypoGen module in DS). Statistically significant models were generated (r² = 0.85), which were found to be predictive as indicated by internal and external cross-validations. The model was used as a query tool to search the Zinc and Maybridge databases to identify lead compounds and predict their activity in silico. Database searching retrieved many potential lead compounds having better estimated IC50 values than the training set compounds. These compounds were then evaluated for their drug-likeness and pharmacokinetic properties using DS. A few selected compounds were then docked into the crystal structure of enoyl-ACP reductase using Dock 6.5. Most compounds were found to have high score values, which was consistent with the results from pharmacophore mapping. Additionally, molecular docking provided useful insights into the nature of binding of the identified hit molecules. In summary, we show a useful strategy employing ligand- and structure-based approaches (pharmacophore modeling coupled with molecular docking) to identify new enoyl-ACP reductase inhibitors for antimycobacterial chemotherapy.

  10. Identifying gene-environment and gene-gene interactions using a progressive penalization approach.

    PubMed

    Zhu, Ruoqing; Zhao, Hongyu; Ma, Shuangge

    2014-05-01

    In genomic studies, identifying important gene-environment and gene-gene interactions is a challenging problem. In this study, we adopt the statistical modeling approach, where interactions are represented by product terms in regression models. For the identification of important interactions, we adopt penalization, which has been used in many genomic studies. Straightforward application of penalization does not respect the "main effect, interaction" hierarchical structure. A few recently proposed methods respect this structure by applying constrained penalization. However, they demand very complicated computational algorithms and can only accommodate a small number of genomic measurements. We propose a computationally fast penalization method that can identify important gene-environment and gene-gene interactions and respect a strong hierarchical structure. The method takes a stagewise approach and progressively expands its optimization domain to account for possible hierarchical interactions. It is applicable to multiple data types and models. A coordinate descent method is utilized to produce the entire regularized solution path. Simulation study demonstrates the superior performance of the proposed method. We analyze a lung cancer prognosis study with gene expression measurements and identify important gene-environment interactions.
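    The modeling setup described above, with interactions as product terms and a "main effect, interaction" hierarchy, can be written out as follows. The notation is illustrative and not taken from the paper: E_j denotes environmental factors, G_k genomic measurements, and ρ_λ a generic penalty function whose specific form the abstract does not give.

```latex
% Regression model with main effects and gene-environment product terms
y = \beta_0 + \sum_{j} \alpha_j E_j + \sum_{k} \beta_k G_k
    + \sum_{j,k} \gamma_{jk}\, E_j G_k + \varepsilon

% Generic penalized estimation (loss L, penalty \rho_\lambda with tuning parameter \lambda)
(\hat{\alpha}, \hat{\beta}, \hat{\gamma}) =
  \arg\min_{\alpha, \beta, \gamma}\;
  L(\alpha, \beta, \gamma)
  + \sum_{k} \rho_\lambda(\beta_k)
  + \sum_{j,k} \rho_\lambda(\gamma_{jk})

% Strong hierarchy: an interaction may enter only if both of its main effects are in the model
\hat{\gamma}_{jk} \neq 0 \;\Longrightarrow\; \hat{\alpha}_j \neq 0 \ \text{and}\ \hat{\beta}_k \neq 0
```

    Under this reading, a stagewise procedure would first select main effects and only then open the corresponding product terms to selection, which is one way to keep the search space small while respecting the hierarchy.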

  11. A systematic approach to identify functional motifs within vertebrate developmental enhancers

    PubMed Central

    Li, Qiang; Ritter, Deborah; Yang, Nan; Dong, Zhiqiang; Li, Hao; Chuang, Jeffrey H.; Guo, Su

    2012-01-01

    Uncovering the cis-regulatory logic of developmental enhancers is critical to understanding the role of non-coding DNA in development. However, it is cumbersome to identify functional motifs within enhancers, and thus few vertebrate enhancers have their core functional motifs revealed. Here we report a combined experimental and computational approach for discovering regulatory motifs in developmental enhancers. Making use of the zebrafish gene expression database, we computationally identified conserved non-coding elements (CNEs) likely to have a desired tissue-specificity based on the expression of nearby genes. Through a high throughput and robust enhancer assay, we tested the activity of ~100 such CNEs and efficiently uncovered developmental enhancers with desired spatial and temporal expression patterns in the zebrafish brain. Application of de novo motif prediction algorithms on a group of forebrain enhancers identified five top-ranked motifs, all of which were experimentally validated as critical for forebrain enhancer activity. These results demonstrate a systematic approach to discover important regulatory motifs in vertebrate developmental enhancers. Moreover, this dataset provides a useful resource for further dissection of vertebrate brain development and function. PMID:19850031

  12. Improving accuracy for identifying related PubMed queries by an integrated approach.

    PubMed

    Lu, Zhiyong; Wilbur, W John

    2009-10-01

    PubMed is the most widely used tool for searching biomedical literature online. As with many other online search tools, a user often types a series of multiple related queries before retrieving satisfactory results to fulfill a single information need. Meanwhile, it is also a common phenomenon to see a user type queries on unrelated topics in a single session. In order to study PubMed users' search strategies, it is necessary to be able to automatically separate unrelated queries and group together related queries. Here, we report a novel approach combining both lexical and contextual analyses for segmenting PubMed query sessions and identifying related queries and compare its performance with the previous approach based solely on concept mapping. We experimented with our integrated approach on sample data consisting of 1539 pairs of consecutive user queries in 351 user sessions. The prediction results of 1396 pairs agreed with the gold-standard annotations, achieving an overall accuracy of 90.7%. This demonstrates that our approach is significantly better than the previously published method. By applying this approach to a one day query log of PubMed, we found that a significant proportion of information needs involved more than one PubMed query, and that most of the consecutive queries for the same information need are lexically related. Finally, the proposed PubMed distance is shown to be an accurate and meaningful measure for determining the contextual similarity between biological terms. The integrated approach can play a critical role in handling real-world PubMed query log data as is demonstrated in our experiments.
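    The reported accuracy follows directly from the counts given in the abstract, as this short check shows. Only the variable names are invented here; the numbers are those stated above.

```python
# Counts taken from the abstract: pairs of consecutive queries whose predicted
# relatedness agreed with the gold-standard annotations, out of all pairs.
agreeing_pairs = 1396
total_pairs = 1539

accuracy = agreeing_pairs / total_pairs
print(f"{accuracy:.1%}")  # prints 90.7%
```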

  13. A voxel-based neural approach (VBNA) to identify lung nodules in the ANODE09 study

    NASA Astrophysics Data System (ADS)

    Retico, Alessandra; Bagagli, Francesco; Camarlinghi, Niccolo; Carpentieri, Carmela; Fantacci, Maria Evelina; Gori, Ilaria

    2009-02-01

    The computer-aided detection (CAD) system we applied to the ANODE09 dataset is designed to identify pulmonary nodules in low-dose, thin-slice computed tomography (CT) images. We developed two separate subsystems, one for internal nodules (CADI) and one for juxtapleural nodules (CADJP), within the framework of the Italian MAGIC-5 collaboration. The basic modules of the CADI subsystem are a 3D dot-enhancement algorithm for nodule candidate identification and an original approach, which we refer to as the Voxel-Based Neural Approach (VBNA), that reduces the number of false-positive findings using a neural classifier working at the voxel level. To detect juxtapleural nodules, we developed the CADJP subsystem, which is based on a procedure that enhances regions where many pleura surface normals intersect, followed by VBNA classification. We present the FROC curves obtained both on the 5 annotated ANODE09 example datasets and on all 50 ANODE09 test cases.

  14. Identifying overlapping and hierarchical thematic structures in networks of scholarly papers: a comparison of three approaches.

    PubMed

    Havemann, Frank; Gläser, Jochen; Heinz, Michael; Struck, Alexander

    2012-01-01

    The aim of this paper is to introduce and assess three algorithms for the identification of overlapping thematic structures in networks of papers. We implemented three recently proposed approaches to the identification of overlapping and hierarchical substructures in graphs and applied the corresponding algorithms to a network of 492 information-science papers coupled via their cited sources. The thematic substructures obtained and overlaps produced by the three hierarchical cluster algorithms were compared to a content-based categorisation, which we based on the interpretation of titles, abstracts, and keywords. We defined sets of papers dealing with three topics located on different levels of aggregation: h-index, webometrics, and bibliometrics. We identified these topics with branches in the dendrograms produced by the three cluster algorithms and compared the overlapping topics they detected with one another and with the three predefined paper sets. We discuss the advantages and drawbacks of applying the three approaches to paper networks in research fields.

  15. Identifying western yellow-billed cuckoo breeding habitat with a dual modelling approach

    USGS Publications Warehouse

    Johnson, Matthew J.; Hatten, James R.; Holmes, Jennifer A.; Shafroth, Patrick B.

    2017-01-01

    The western population of the yellow-billed cuckoo (Coccyzus americanus) was recently listed as threatened under the federal Endangered Species Act. Yellow-billed cuckoo conservation efforts require the identification of features and area requirements associated with high quality, riparian forest habitat at spatial scales that range from nest microhabitat to landscape, as well as lower-suitability areas that can be enhanced or restored. Spatially explicit models inform conservation efforts by increasing ecological understanding of a target species, especially at landscape scales. Previous yellow-billed cuckoo modelling efforts derived plant-community maps from aerial photography, an expensive and oftentimes inconsistent approach. Satellite models can remotely map vegetation features (e.g., vegetation density, heterogeneity in vegetation density or structure) across large areas with near perfect repeatability, but they usually cannot identify plant communities. We used aerial photos and satellite imagery, and a hierarchical spatial scale approach, to identify yellow-billed cuckoo breeding habitat along the Lower Colorado River and its tributaries. Aerial-photo and satellite models identified several key features associated with yellow-billed cuckoo breeding locations: (1) a 4.5 ha core area of dense cottonwood-willow vegetation, (2) a large native, heterogeneously dense forest (72 ha) around the core area, and (3) moderately rough topography. The odds of yellow-billed cuckoo occurrence decreased rapidly as the amount of tamarisk cover increased or when cottonwood-willow vegetation was limited. We achieved model accuracies of 75–80% in the project area the following year after updating the imagery and location data. The two model types had very similar probability maps, largely predicting the same areas as high quality habitat. While each model provided unique information, a dual-modelling approach provided a more complete picture of yellow-billed cuckoo habitat

  16. Perspective: NanoMine: A material genome approach for polymer nanocomposites analysis and design

    NASA Astrophysics Data System (ADS)

    Zhao, He; Li, Xiaolin; Zhang, Yichi; Schadler, Linda S.; Chen, Wei; Brinson, L. Catherine

    2016-05-01

    Polymer nanocomposites are a designer class of materials where nanoscale particles, functional chemistry, and polymer resin combine to provide materials with unprecedented combinations of physical properties. In this paper, we introduce NanoMine, a data-driven web-based platform for analysis and design of polymer nanocomposite systems under the material genome concept. This open data resource strives to curate experimental and computational data on nanocomposite processing, structure, and properties, as well as to provide analysis and modeling tools that leverage curated data for material property prediction and design. With a continuously expanding dataset and toolkit, NanoMine encourages community feedback and input to construct a sustainable infrastructure that benefits nanocomposite material research and development.

  17. A reclamation approach for mined prime farmland by adding organic wastes and lime to the subsoil

    SciTech Connect

    Zhai, Qiang; Barnhisel, R.I.

    1996-12-31

    Surface-mined prime farmland may be reclaimed by adding organic wastes and lime to the subsoil, thus improving conditions in the root zone. In this study, sewage sludge, poultry manure, horse bedding, and lime were applied to the subsoil (15-30 cm) during reclamation. Soil properties and plant growth were measured over two years. All organic amendments tended to lower the subsoil bulk density and increase organic matter and total nitrogen. Liming raised exchangeable calcium and slightly increased pH, but decreased exchangeable magnesium and potassium. Corn ear-leaf and forage tissue nitrogen, yields, and nitrogen removal increased in treatments amended with sewage sludge and poultry manure, but not horse bedding. Subsoil application of sewage sludge or poultry manure appears to be a promising method for reclaiming surface-mined prime farmland, based on the improvements observed in the root zone environment.

  18. A New Approach for Mining Order-Preserving Submatrices Based on All Common Subsequences.

    PubMed

    Xue, Yun; Liao, Zhengling; Li, Meihang; Luo, Jie; Kuang, Qiuhua; Hu, Xiaohui; Li, Tiechen

    2015-01-01

    Order-preserving submatrices (OPSMs) have been applied in many fields, such as DNA microarray data analysis, automatic recommendation systems, and target marketing systems, as an important unsupervised learning model. Unfortunately, most existing methods are heuristic algorithms that cannot reveal all OPSMs in this NP-complete problem. In particular, deep OPSMs, corresponding to long patterns with few supporting sequences, incur explosive computational costs and are completely pruned by most popular methods. In this paper, we propose an exact method to discover all OPSMs based on frequent sequential pattern mining. First, an existing algorithm was adapted to find all common subsequences (ACS) between every pair of row sequences, so that no deep OPSMs are missed. Then, an improved prefix-tree data structure was used to store and traverse the ACS, and the Apriori principle was employed to efficiently mine the frequent sequential patterns. Finally, experiments were conducted on gene and synthetic datasets. The results demonstrated the effectiveness and efficiency of this method.
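    To make the link between OPSMs and common subsequences concrete, here is a minimal brute-force sketch. It assumes each row has distinct values, represents a row by the column indices sorted by ascending value, and treats a column pattern as supported by a row when it is a subsequence of that row's column order. The function names and the tiny matrix are hypothetical, and the exhaustive enumeration is exponential; it does not reproduce the adapted ACS algorithm or the prefix tree described in the abstract.

```python
from itertools import combinations


def column_order(row):
    """Column indices sorted by ascending value (ties broken by index)."""
    return tuple(sorted(range(len(row)), key=lambda j: (row[j], j)))


def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order, not necessarily contiguously."""
    it = iter(sequence)
    return all(col in it for col in pattern)


def all_common_subsequences(s, t, min_len=2):
    """Brute force: every subsequence of s (length >= min_len) that also occurs in t."""
    found = set()
    for k in range(min_len, len(s) + 1):
        for idx in combinations(range(len(s)), k):
            cand = tuple(s[i] for i in idx)
            if is_subsequence(cand, t):
                found.add(cand)
    return found


def support(pattern, matrix):
    """Indices of rows whose values increase along the given column pattern."""
    return [i for i, row in enumerate(matrix)
            if is_subsequence(pattern, column_order(row))]


matrix = [
    [1, 5, 3, 9],
    [2, 8, 4, 6],
    [7, 1, 9, 3],
]
acs = all_common_subsequences(column_order(matrix[0]), column_order(matrix[1]))
print(sorted(acs))
print(support((0, 2, 3), matrix))  # [0, 1]
```

    Rows 0 and 1 together with columns (0, 2, 3) form an OPSM here, since the values 1, 3, 9 and 2, 4, 6 both increase along that column order.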

  19. A novel approach of mining strong jumping emerging patterns based on BSC-tree

    NASA Astrophysics Data System (ADS)

    Liu, Quanzhong; Shi, Peng; Hu, Zhengguo; Zhang, Yang

    2014-03-01

    It is a great challenge to discover strong jumping emerging patterns (SJEPs) from a high-dimensional dataset because of the huge pattern space. In this article, we propose a dynamically growing contrast pattern tree (DGCP-tree) structure to store grown patterns and their path codes arrays with 1-bit counts, which are from the constructed bit string compression tree. A method of mining SJEPs based on DGCP-tree is developed. In order to reduce the pattern search space, we introduce a novel pattern pruning method, which dramatically reduces non-minimal jumping emerging patterns (JEPs) during the mining process. Experiments are performed on three real cancer datasets and three datasets from the University of California, Irvine machine-learning repository. Compared with the well-known CP-tree method, the results show that the proposed method is substantially faster, able to handle higher-dimensional datasets and to prune more non-minimal JEPs.
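    As a point of reference for what is being mined, the sketch below enumerates patterns by brute force under the commonly used definition, assumed here because the abstract does not restate it: a jumping emerging pattern has zero support in one class and non-zero support in the other, and a strong JEP additionally meets a minimum support count and has no proper subset that also qualifies. The function names, the `max_len` cap, and the toy transactions are hypothetical; the exhaustive search is only an illustration and is unrelated to the DGCP-tree or its pruning method.

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def strong_jeps(positive, negative, min_count=1, max_len=3):
    """Minimal itemsets with >= min_count support in positive and none in negative."""
    items = sorted(set().union(*positive))
    found = []
    for k in range(1, max_len + 1):  # increasing size guarantees minimality checks work
        for combo in combinations(items, k):
            cand = frozenset(combo)
            if any(prev < cand for prev in found):
                continue  # a proper subset already qualifies, so cand is not minimal
            if (support_count(cand, positive) >= min_count
                    and support_count(cand, negative) == 0):
                found.append(cand)
    return found


positive = [{"a", "b", "c"}, {"a", "b"}, {"b", "d"}]
negative = [{"a", "c"}, {"b"}, {"c", "d"}]
print(strong_jeps(positive, negative, min_count=2))
```

    With these toy transactions and a minimum count of 2, the only pattern returned is the itemset containing "a" and "b": each item alone also appears in the negative class, while the pair appears twice in the positive class and never in the negative one.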

  20. A fuzzy approach for mining association rules in a probabilistic database

    NASA Astrophysics Data System (ADS)

    Pei, Bin; Chen, Dingjie; Zhao, Suyun; Chen, Hong

    2013-07-01

    Association rule mining is an essential knowledge discovery method that can find associations in a database. Previous studies on association rule mining focus on finding quantitative association rules from certain data, or finding Boolean association rules from uncertain data. Unfortunately, due to instrument errors, the imprecision of sensor monitoring systems and so on, real-world data tend to be quantitative data with inherent uncertainty. In this paper, we study the discovery of association rules from a probabilistic database with quantitative attributes. Once we convert quantitative attributes into fuzzy sets, we obtain a probabilistic database containing fuzzy sets. This is theoretically challenging, since we need appropriate interest measures to define the support and confidence degree of fuzzy events with probability. We propose a Shannon-like entropy to measure the information of such an event. After that, an algorithm is proposed to find fuzzy association rules from a probabilistic database. Finally, an illustrative example is given to demonstrate the procedure of the algorithm.

  1. Leveraging Concept-based Approaches to Identify Potential Phyto-therapies

    PubMed Central

    Sharma, Vivekanand; Sarkar, Indra Neil

    2013-01-01

    The potential of plant-based remedies has been documented in both traditional and contemporary biomedical literature. Such types of text sources may thus be sources from which one might identify potential plant-based therapies (“phyto-therapies”). Concept-based analytic approaches have been shown to uncover knowledge embedded within biomedical literature. However, to date there has been limited attention towards leveraging such techniques for the identification of potential phyto-therapies. This study presents concept-based analytic approaches for the retrieval and ranking of associations between plants and human diseases. Focusing on identification of phyto-therapies described in MEDLINE, both MeSH descriptors used for indexing and MetaMap inferred UMLS concepts are considered. Furthermore, the identification and ranking consider both direct (i.e., plant concepts directly correlated with disease concepts) and inferred (i.e., plant concepts associated with disease concepts based on shared signs and symptoms) relationships. Based on the two scoring methodologies used in this study, it was found that a vector space model approach outperformed probabilistic reliability based inferences. An evaluation of the approach is provided based on therapeutic interventions catalogued in both ClinicalTrials.gov and NDF-RT. The promising findings from this feasibility study highlight the challenges and applicability of concept-based analytic strategies for distilling phyto-therapeutic knowledge from text based knowledge sources like MEDLINE. PMID:23665360

  2. Identifying the Critical Links in Road Transportation Networks: Centrality-based approach utilizing structural properties

    SciTech Connect

    Chinthavali, Supriya

    2016-04-01

    Surface transportation road networks share structural properties with other complex networks (e.g., social networks, information networks, biological networks, and so on). This research investigates the structural properties of road networks for any possible correlation with traffic characteristics, such as link flows, that were determined independently. Additionally, we define a criticality index for the links of the road network that identifies their relative importance in the network. We tested our hypotheses with two sample road networks. Results show that a correlation exists between link flows and the centrality measures of a road link (a dual graph approach is followed), and that the criticality index is effective at identifying the vulnerable nodes in one of the test networks.

  3. Rehabilitation prioritization of abandoned mines and its application to Nyala Magnesite Mine

    NASA Astrophysics Data System (ADS)

    Mhlongo, Sphiwe Emmanuel; Amponsah-Dacosta, Francis; Mphephu, Nndweleni Fredrick

    2013-12-01

    The issue of abandoned mine sites is a major environmental and social problem for the mining industry, communities and governments. Historical mine sites are characterized by significant environmental, health and safety problems. The aim of this study was to develop hazard maps that can assist in the prioritization of rehabilitation at Nyala Mine. The approach used involved site examination and characterization to establish the environmental conditions of the mine. Hazards at the mine were identified, scored, and rated using a modified Historic Mine Site Scoring System. The scoring focused on source and exposure pathways. The developed hazard maps showed that the best approach to effectively reducing the physical and environmental hazards at Nyala Mine was to give priority to extremely and moderately hazardous pits, surface infrastructure and spoil dumps, and then to tailings dumps, which are characterized by fewer physical hazards but extremely high environmental hazards. Pits and spoil materials that were found to be relatively less problematic in terms of physical hazards were to receive the least attention. The use of this hazard-scoring and risk-ranking methodology, coupled with the hazard maps, would provide a more robust scientific basis for making sound decisions and prioritizing the actions needed to minimize or manage risks associated with various areas of the mine site.

  4. Mining environmental high-throughput sequence data sets to identify divergent amplicon clusters for phylogenetic reconstruction and morphotype visualization.

    PubMed

    Gimmler, Anna; Stoeck, Thorsten

    2015-08-01

    Environmental high-throughput sequencing (envHTS) is a very powerful tool, which in protistan ecology is predominantly used for the exploration of diversity and its geographic and local patterns. Here we used a pyrosequenced V4-SSU rDNA data set from a solar saltern pond as a test case to exploit such massive protistan amplicon data sets beyond this descriptive purpose. To this end, we combined a Swarm-based blastn network including 11 579 ciliate V4 amplicons, used to identify divergent amplicon clusters, with targeted polymerase chain reaction (PCR) primer design for retrieval of the full-length small subunit of the ribosomal DNA and probe design for fluorescence in situ hybridization (FISH). This powerful strategy makes it possible to use envHTS data sets to (i) reveal the phylogenetic position of the taxon behind divergent amplicons; (ii) improve phylogenetic resolution and evolutionary history of specific taxon groups; (iii) solidly assess an amplicon's (species') degree of similarity to its closest described relative; (iv) visualize the morphotype behind a divergent amplicon cluster; (v) rapidly FISH screen many environmental samples for geographic/habitat distribution and abundances of the respective organism and (vi) monitor the success of enrichment strategies in live samples for cultivation and isolation of the respective organisms.

  5. An Integrated Human/Murine Transcriptome and Pathway Approach To Identify Prenatal Treatments For Down Syndrome

    PubMed Central

    Guedj, Faycal; Pennings, Jeroen LA; Massingham, Lauren J.; Wick, Heather C.; Siegel, Ashley E.; Tantravahi, Umadevi; Bianchi, Diana W.

    2016-01-01

    Anatomical and functional brain abnormalities begin during fetal life in Down syndrome (DS). We hypothesize that novel prenatal treatments can be identified by targeting signaling pathways that are consistently perturbed in cell types/tissues obtained from human fetuses with DS and mouse embryos. We analyzed transcriptome data from fetuses with trisomy 21, age and sex-matched euploid controls, and embryonic day 15.5 forebrains from Ts1Cje, Ts65Dn, and Dp16 mice. The new datasets were compared to other publicly available datasets from humans with DS. We used the human Connectivity Map (CMap) database and created a murine adaptation to identify FDA-approved drugs that can rescue affected pathways. USP16 and TTC3 were dysregulated in all affected human cells and two mouse models. DS-associated pathway abnormalities were either the result of gene dosage specific effects or the consequence of a global cell stress response with activation of compensatory mechanisms. CMap analyses identified 56 molecules with high predictive scores to rescue abnormal gene expression in both species. Our novel integrated human/murine systems biology approach identified commonly dysregulated genes and pathways. This can help to prioritize therapeutic molecules on which to further test safety and efficacy. Additional studies in human cells are ongoing prior to pre-clinical prenatal treatment in mice. PMID:27586445

  6. A Multiple-Tracer Approach for Identifying Sewage Sources to an Urban Stream System

    USGS Publications Warehouse

    Hyer, Kenneth Edward

    2007-01-01

    The presence of human-derived fecal coliform bacteria (sewage) in streams and rivers is recognized as a human health hazard. The source of these human-derived bacteria, however, is often difficult to identify and eliminate, because sewage can be delivered to streams through a variety of mechanisms, such as leaking sanitary sewers or private lateral lines, cross-connected pipes, straight pipes, sewer-line overflows, illicit dumping of septic waste, and vagrancy. A multiple-tracer study was conducted to identify site-specific sources of sewage in Accotink Creek, an urban stream in Fairfax County, Virginia, that is listed on the Commonwealth's priority list of impaired streams for violations of the fecal coliform bacteria standard. Beyond developing this multiple-tracer approach for locating sources of sewage inputs to Accotink Creek, the second objective of the study was to demonstrate how the multiple-tracer approach can be applied to other streams affected by sewage sources. The tracers used in this study were separated into indicator tracers, which are relatively simple and inexpensive to apply, and confirmatory tracers, which are relatively difficult and expensive to analyze. Indicator tracers include fecal coliform bacteria, surfactants, boron, chloride, chloride/bromide ratio, specific conductance, dissolved oxygen, turbidity, and water temperature. Confirmatory tracers include 13 organic compounds that are associated with human waste, including caffeine, cotinine, triclosan, a number of detergent metabolites, several fragrances, and several plasticizers. To identify sources of sewage to Accotink Creek, a detailed investigation of the Accotink Creek main channel, tributaries, and flowing storm drains was undertaken from 2001 to 2004. Sampling was conducted in a series of eight synoptic sampling events, each of which began at the most downstream site and extended upstream through the watershed and into the headwaters of each tributary. Using the synoptic

  7. Objective Definition of Rosette Shape Variation Using a Combined Computer Vision and Data Mining Approach

    PubMed Central

    Camargo, Anyela; Papadopoulou, Dimitra; Spyropoulou, Zoi; Vlachonasios, Konstantinos; Doonan, John H.; Gay, Alan P.

    2014-01-01

    Computer-vision based measurements of phenotypic variation have implications for crop improvement and food security because they are intrinsically objective. It should be possible therefore to use such approaches to select robust genotypes. However, plants are morphologically complex and identification of meaningful traits from automatically acquired image data is not straightforward. Bespoke algorithms can be designed to capture and/or quantitate specific features but this approach is inflexible and is not generally applicable to a wide range of traits. In this paper, we have used industry-standard computer vision techniques to extract a wide range of features from images of genetically diverse Arabidopsis rosettes growing under non-stimulated conditions, and then used statistical analysis to identify those features that provide good discrimination between ecotypes. This analysis indicates that almost all the observed shape variation can be described by 5 principal components. We describe an easily implemented pipeline including image segmentation, feature extraction and statistical analysis. This pipeline provides a cost-effective and inherently scalable method to parameterise and analyse variation in rosette shape. The acquisition of images does not require any specialised equipment and the computer routines for image processing and data analysis have been implemented using open source software. Source code for data analysis is written using the R package. The equations to calculate image descriptors have been also provided. PMID:24804972

  8. Objective definition of rosette shape variation using a combined computer vision and data mining approach.

    PubMed

    Camargo, Anyela; Papadopoulou, Dimitra; Spyropoulou, Zoi; Vlachonasios, Konstantinos; Doonan, John H; Gay, Alan P

    2014-01-01

    Computer-vision based measurements of phenotypic variation have implications for crop improvement and food security because they are intrinsically objective. It should be possible therefore to use such approaches to select robust genotypes. However, plants are morphologically complex and identification of meaningful traits from automatically acquired image data is not straightforward. Bespoke algorithms can be designed to capture and/or quantitate specific features but this approach is inflexible and is not generally applicable to a wide range of traits. In this paper, we have used industry-standard computer vision techniques to extract a wide range of features from images of genetically diverse Arabidopsis rosettes growing under non-stimulated conditions, and then used statistical analysis to identify those features that provide good discrimination between ecotypes. This analysis indicates that almost all the observed shape variation can be described by 5 principal components. We describe an easily implemented pipeline including image segmentation, feature extraction and statistical analysis. This pipeline provides a cost-effective and inherently scalable method to parameterise and analyse variation in rosette shape. The acquisition of images does not require any specialised equipment and the computer routines for image processing and data analysis have been implemented using open source software. Source code for data analysis is written using the R package. The equations to calculate image descriptors have been also provided.

  9. A comparative genomic approach for identifying synthetic lethal interactions in human cancer.

    PubMed

    Deshpande, Raamesh; Asiedu, Michael K; Klebig, Mitchell; Sutor, Shari; Kuzmin, Elena; Nelson, Justin; Piotrowski, Jeff; Shin, Seung Ho; Yoshida, Minoru; Costanzo, Michael; Boone, Charles; Wigle, Dennis A; Myers, Chad L

    2013-10-15

    Synthetic lethal interactions enable a novel approach for discovering specific genetic vulnerabilities in cancer cells that can be exploited for the development of therapeutics. Despite successes in model organisms such as yeast, discovering synthetic lethal interactions on a large scale in human cells remains a significant challenge. We describe a comparative genomic strategy for identifying cancer-relevant synthetic lethal interactions whereby candidate interactions are prioritized on the basis of genetic interaction data available in yeast, followed by targeted testing of candidate interactions in human cell lines. As a proof of principle, we describe two novel synthetic lethal interactions in human cells discovered by this approach, one between the tumor suppressor gene SMARCB1 and PSMA4, and another between alveolar soft-part sarcoma-associated ASPSCR1 and PSMC2. These results suggest therapeutic targets for cancers harboring mutations in SMARCB1 or ASPSCR1 and highlight the potential of a targeted, cross-species strategy for identifying synthetic lethal interactions relevant to human cancer.

  10. Identifying ligands at orphan GPCRs: current status using structure-based approaches.

    PubMed

    Ngo, Tony; Kufareva, Irina; Coleman, James Lj; Graham, Robert M; Abagyan, Ruben; Smith, Nicola J

    2016-10-01

    GPCRs are the most successful pharmaceutical targets in history. Nevertheless, the pharmacology of many GPCRs remains inaccessible as their endogenous or exogenous modulators have not been discovered. Tools that explore the physiological functions and pharmacological potential of these 'orphan' GPCRs, whether they are endogenous and/or surrogate ligands, are therefore of paramount importance. Rates of receptor deorphanization determined by traditional reverse pharmacology methods have slowed, indicating a need for the development of more sophisticated and efficient ligand screening approaches. Here, we discuss the use of structure-based ligand discovery approaches to identify small molecule modulators for exploring the function of orphan GPCRs. These studies have been buoyed by the growing number of GPCR crystal structures solved in the past decade, providing a broad range of template structures for homology modelling of orphans. This review discusses the methods used to establish the appropriate signalling assays to test orphan receptor activity and provides current examples of structure-based methods used to identify ligands of orphan GPCRs. Linked Articles This article is part of a themed section on Molecular Pharmacology of G Protein-Coupled Receptors. To view the other articles in this section visit http://onlinelibrary.wiley.com/doi/10.1111/bph.v173.20/issuetoc.

  11. Innovative Approach for Identifying Root Causes of Glass Defects in Sterile Drug Product Manufacturing.

    PubMed

    Eberle, Lukas; Svensson, Alexander; Graser, Andreas; Luemkemann, Joerg; Sugiyama, Hirokazu; Schmidt, Rainer; Hungerbuehler, Konrad

    2017-03-14

    In sterile drug product manufacturing, scratched and broken glass containers (i.e., vials) cause product losses, glass particles, equipment contamination and additional cleaning efforts. However, mechanical resistance and exposure of vials to mechanical stress are not sufficiently understood, and no systematic approach for reducing glass-related losses is established. Manufacturers may tackle glass-related losses more rationally if (i) frequencies for inflicting disqualifying damages to drug product containers are known for given forces, (ii) actual exposure in industrial filling lines is quantified and (iii) process enhancements are derived based on collected information. In this work, an innovative approach for exploiting these opportunities, identifying glass defect root causes and reducing glass defects is provided. Devices for quantifying (i) damaging frequencies and (ii) actual exposure are presented and then applied in an industrial case study on sterile drug product manufacturing; finally, (iii) process enhancements are derived and implemented. In the case study, frequencies for scratching vials at given forces as well as breaking forces have been determined. Peak exposure in the investigated filling line was detected at 6 Newton. As a result of the case study, key machine parts were identified and adjusted.

  12. A Mutant Library Approach to Identify Improved Meningococcal Factor H Binding Protein Vaccine Antigens

    PubMed Central

    Konar, Monica; Rossi, Raffaella; Walter, Helen; Pajon, Rolando; Beernink, Peter T.

    2015-01-01

    Factor H binding protein (FHbp) is a virulence factor used by meningococci to evade the host complement system. FHbp elicits bactericidal antibodies in humans and is part of two recently licensed vaccines. Using human complement Factor H (FH) transgenic mice, we previously showed that binding of FH decreased the protective antibody responses to FHbp vaccination. Therefore, in the present study we devised a library-based method to identify mutant FHbp antigens with very low binding of FH. Using an FHbp sequence variant in one of the two licensed vaccines, we displayed an error-prone PCR mutant FHbp library on the surface of Escherichia coli. We used fluorescence-activated cell sorting to isolate FHbp mutants with very low binding of human FH and preserved binding of control anti-FHbp monoclonal antibodies. We sequenced the gene encoding FHbp from selected clones and introduced the mutations into a soluble FHbp construct. Using this approach, we identified several new mutant FHbp vaccine antigens that had very low binding of FH as measured by ELISA and surface plasmon resonance. The new mutant FHbp antigens elicited protective antibody responses in human FH transgenic mice that were up to 20-fold higher than those elicited by the wild-type FHbp antigen. This approach offers the potential to discover mutant antigens that might not be predictable even with protein structural information and potentially can be applied to other microbial vaccine antigens that bind host proteins. PMID:26057742

  13. A systematic approach for identifying and presenting mechanistic evidence in human health assessments

    PubMed Central

    Kushman, Mary E.; Kraft, Andrew D.; Guyton, Kathryn Z.; Chiu, Weihsueh A.; Makris, Susan L.; Rusyn, Ivan

    2013-01-01

    Clear documentation of literature search and presentation methodologies can improve transparency in chemical hazard assessments. We sought to improve clarity for the scientific support for cancer mechanisms of action using a systematic approach to literature retrieval, selection, and presentation of studies. The general question was “What are the mechanisms by which a chemical may cause carcinogenicity in the target tissue?” Di(2-ethylhexyl)phthalate was used as a case study chemical with a complex database of >3,000 publications. Relevant mechanistic events were identified from published reviews. The PubMed search strategy included relevant synonyms and wildcards for DEHP and its metabolites, mechanistic events, and species of interest. Tiered exclusion/inclusion criteria for study pertinence were defined, and applied to the retrieved literature. Manual curation was conducted for mechanistic events with large literature databases. Literature trees documented identification and selection of the literature evidence. The selected studies were summarized in evidence tables accompanied by succinct narratives. Primary publications were deposited into the Health and Environmental Research Online (http://hero.epa.gov/) database and identified by pertinence criteria and key terms to permit organized retrieval. This approach contributes to human health assessment by effectively managing a large volume of literature, improving transparency, and facilitating subsequent synthesis of information across studies. PMID:23959061

  14. Comprehensively identifying and characterizing the missing gene sequences in human reference genome with integrated analytic approaches.

    PubMed

    Chen, Geng; Wang, Charles; Shi, Leming; Tong, Weida; Qu, Xiongfei; Chen, Jiwei; Yang, Jianmin; Shi, Caiping; Chen, Long; Zhou, Peiying; Lu, Bingxin; Shi, Tieliu

    2013-08-01

    The human reference genome is still incomplete and a number of gene sequences are missing from it. The approaches to uncover them, the reasons for their absence, and their functions remain underexplored. Here, we comprehensively identified and characterized the missing genes of the human reference genome with RNA-Seq data from 16 different human tissues. By using a combined approach of genome-guided transcriptome reconstruction coupled with genome-wide comparison, we uncovered 3.78 and 2.37 Mb of transcribed regions in the human genome assemblies of Celera and HuRef either missing from their homologous chromosomes of NCBI human reference genome build 37.2 or partially or entirely absent from the reference. We further identified a significant number of novel transcript contigs in each tissue from de novo transcriptome assembly that are unalignable to NCBI build 37.2 but can be aligned to at least one of the genomes from Celera, HuRef, chimpanzee, macaca or mouse. Our analyses indicate that the missing genes could result from genome misassembly, transposition, copy number variation, translocation and other structural variations. Moreover, our results further suggest that a large portion of these missing genes are conserved between human and other mammals, implying their important biological functions. In total, 1,233 functional protein domains were detected in these missing genes. Collectively, our study not only provides approaches for uncovering the missing genes of a genome, but also proposes potential reasons why genes are missing from the genome and highlights the importance of uncovering the missing genes of incomplete genomes.

  15. A General Approach for Identifying Distant Regulatory Elements Applied to the Gdf6 Gene

    PubMed Central

    Mortlock, Douglas P.; Guenther, Catherine; Kingsley, David M.

    2003-01-01

    Regulatory sequences in higher genomes can map large distances from gene coding regions, and cannot yet be identified by simple inspection of primary DNA sequence information. Here we describe an efficient method of surveying large genomic regions for gene regulatory information, and subdividing complex sets of distant regulatory elements into smaller intervals for detailed study. The mouse Gdf6 gene is expressed in a number of distinct embryonic locations that are involved in the patterning of skeletal and soft tissues. To identify sequences responsible for Gdf6 regulation, we first isolated a series of overlapping bacterial artificial chromosomes (BACs) that extend varying distances upstream and downstream of the gene. A LacZ reporter cassette was integrated into the Gdf6 transcription unit of each BAC using homologous recombination in bacteria. Each modified BAC was injected into fertilized mouse eggs, and founder transgenic embryos were analyzed for LacZ expression mid-gestation. The overlapping segments defined by the BAC clones revealed five separate regulatory regions that drive LacZ expression in 11 distinct anatomical locations. To further localize sequences that control expression in developing skeletal joints, we created a series of BAC constructs with precise deletions across a putative joint-control region. This approach further narrowed the critical control region to an area containing several stretches of sequence that are highly conserved between mice and humans. A distant 2.9-kilobase fragment containing the highly conserved regions is able to direct very specific expression of a minimal promoter/LacZ reporter in proximal limb joints. These results demonstrate that even distant, complex regulatory sequences can be identified using a combination of BAC scanning, BAC deletion, and comparative sequencing approaches. PMID:12915490

  16. Identifying functional reorganization of spelling networks: an individual peak probability comparison approach.

    PubMed

    Purcell, Jeremy J; Rapp, Brenda

    2013-01-01

    Previous research has shown that damage to the neural substrates of orthographic processing can lead to functional reorganization during reading (Tsapkini et al., 2011); in this research we ask if the same is true for spelling. To examine the functional reorganization of spelling networks we present a novel three-stage Individual Peak Probability Comparison (IPPC) analysis approach for comparing the activation patterns obtained during fMRI of spelling in a single brain-damaged individual with dysgraphia to those obtained in a set of non-impaired control participants. The first analysis stage characterizes the convergence in activations across non-impaired control participants by applying a technique typically used for characterizing activations across studies: Activation Likelihood Estimate (ALE) (Turkeltaub et al., 2002). This method was used to identify locations that have a high likelihood of yielding activation peaks in the non-impaired participants. The second stage provides a characterization of the degree to which the brain-damaged individual's activations correspond to the group pattern identified in Stage 1. This involves performing a Mahalanobis distance statistics analysis (Tsapkini et al., 2011) that compares each of a control group's peak activation locations to the nearest peak generated by the brain-damaged individual. The third stage evaluates the extent to which the brain-damaged individual's peaks are atypical relative to the range of individual variation among the control participants. This IPPC analysis allows for a quantifiable, statistically sound method for comparing an individual's activation pattern to the patterns observed in a control group and, thus, provides a valuable tool for identifying functional reorganization in a brain-damaged individual with impaired spelling. Furthermore, this approach can be applied more generally to compare any individual's activation pattern with that of a set of other individuals.

  17. Identifying functional reorganization of spelling networks: an individual peak probability comparison approach

    PubMed Central

    Purcell, Jeremy J.; Rapp, Brenda

    2013-01-01

    Previous research has shown that damage to the neural substrates of orthographic processing can lead to functional reorganization during reading (Tsapkini et al., 2011); in this research we ask if the same is true for spelling. To examine the functional reorganization of spelling networks we present a novel three-stage Individual Peak Probability Comparison (IPPC) analysis approach for comparing the activation patterns obtained during fMRI of spelling in a single brain-damaged individual with dysgraphia to those obtained in a set of non-impaired control participants. The first analysis stage characterizes the convergence in activations across non-impaired control participants by applying a technique typically used for characterizing activations across studies: Activation Likelihood Estimate (ALE) (Turkeltaub et al., 2002). This method was used to identify locations that have a high likelihood of yielding activation peaks in the non-impaired participants. The second stage provides a characterization of the degree to which the brain-damaged individual's activations correspond to the group pattern identified in Stage 1. This involves performing a Mahalanobis distance statistics analysis (Tsapkini et al., 2011) that compares each of a control group's peak activation locations to the nearest peak generated by the brain-damaged individual. The third stage evaluates the extent to which the brain-damaged individual's peaks are atypical relative to the range of individual variation among the control participants. This IPPC analysis allows for a quantifiable, statistically sound method for comparing an individual's activation pattern to the patterns observed in a control group and, thus, provides a valuable tool for identifying functional reorganization in a brain-damaged individual with impaired spelling. Furthermore, this approach can be applied more generally to compare any individual's activation pattern with that of a set of other individuals. PMID:24399981

  18. Targeted sequencing approach to identify genetic mutations in Nasu-Hakola disease

    PubMed Central

    Satoh, Jun-ichi; Yanaizu, Motoaki; Tosaki, Youhei; Sakai, Kenji; Kino, Yoshihiro

    2016-01-01

    Summary: Nasu-Hakola disease (NHD) is a rare autosomal recessive disorder characterized by sclerosing leukoencephalopathy and multifocal bone cysts, caused by a loss-of-function mutation of either TYROBP (DAP12) or TREM2. TREM2 and DAP12 constitute a receptor/adaptor signaling complex expressed exclusively on osteoclasts, dendritic cells, macrophages, and microglia. Premortem molecular diagnosis of NHD requires genetic analysis of both TYROBP and TREM2, in which 20 distinct NHD-causing mutations have been reported. Due to genetic heterogeneity, it is often difficult to identify the exact mutation responsible for NHD. Recently, the revolution of the next-generation sequencing (NGS) technology has greatly advanced the field of genome research. A targeted sequencing approach allows us to investigate a selected set of disease-causing genes and mutations in a number of samples within several days. By targeted sequencing using the TruSight One Sequencing Panel, we resequenced genetic mutations of seven NHD cases with known molecular diagnosis and two control subjects. We identified homozygous variants of TYROBP or TREM2 in all NHD cases, composed of a frameshift mutation of c.141delG in exon 3 of TYROBP in four cases, a missense mutation of c.2T>C in exon 1 of TYROBP in two cases, or a splicing mutation of c.482+2T>C in intron 3 of TREM2 in one case. The results of targeted resequencing corresponded to those of Sanger sequencing. In contrast, causative variants were not detected in control subjects. These results indicate that targeted sequencing is a useful approach to precisely identify genetic mutations responsible for NHD in a comprehensive manner. PMID:27904822

  19. TRM: a powerful two-stage machine learning approach for identifying SNP-SNP interactions.

    PubMed

    Lin, Hui-Yi; Chen, Y Ann; Tsai, Ya-Yu; Qu, Xiaotao; Tseng, Tung-Sung; Park, Jong Y

    2012-01-01

    Studies have shown that interactions of single nucleotide polymorphisms (SNPs) may play an important role in understanding the causes of complex disease. We have proposed an integrated machine learning method that combines two machine-learning methods, Random Forests (RF) and Multivariate Adaptive Regression Splines (MARS), to identify a subset of important SNPs and detect interaction patterns more effectively and efficiently. In this two-stage RF-MARS (TRM) approach, RF is first applied to detect a predictive subset of SNPs, and then MARS is used to identify the interaction patterns. We evaluated the TRM performances in four models. RF variable selection was based on out-of-bag classification error rate (OOB) and variable importance spectrum (IS). Our results support that RF(OOB) had better performance than MARS and RF(IS) in detecting important variables. This study demonstrates that TRM(OOB), which is RF(OOB) plus MARS, has combined the strengths of RF and MARS in identifying SNP-SNP interactions in a scenario of 100 candidate SNPs. TRM(OOB) had greater true positive rate and lower false positive rate compared with MARS, particularly for searching interactions with a strong association with the outcome. Therefore, the use of TRM(OOB) is favored for exploring SNP-SNP interactions in a large-scale genetic variation study.

  20. Identifying natural substrates for chaperonins using a sequence-based approach

    PubMed Central

    Stan, George; Brooks, Bernard R.; Lorimer, George H.; Thirumalai, D.

    2005-01-01

    The Escherichia coli chaperonin machinery, GroEL, assists the folding of a number of proteins. We describe a sequence-based approach to identify the natural substrate proteins (SPs) for GroEL. Our method is based on the hypothesis that natural SPs are those that contain patterns of residues similar to those found in either GroES mobile loop and/or strongly binding peptide in complex with GroEL. The method is validated by comparing the predicted results with experimentally determined natural SPs for GroEL. We have searched for such patterns in five genomes. In the E. coli genome, we identify 1422 (about one-third) sequences that are putative natural SPs. In Saccharomyces cerevisiae, 2885 (32%) of sequences can be natural substrates for Hsp60, which is the analog of GroEL. The precise number of natural SPs is shown to be a function of the number of contacts an SP makes with the apical domain (NC) and the number of binding sites (NB) in the oligomer with which it interacts. For known SPs for GroEL, we find ~4 < NC < 5 and 2 ≤ NB ≤ 4. A limited analysis of the predicted binding sequences shows that they do not adopt any preferred secondary structure. Our method also predicts the putative binding regions in the identified SPs. The results of our study show that a variety of SPs, associated with diverse functions, can interact with GroEL. PMID:15576562

  1. A novel approach identifies new differentially methylated regions (DMRs) associated with imprinted genes

    PubMed Central

    Choufani, Sanaa; Shapiro, Jonathan S.; Susiarjo, Martha; Butcher, Darci T.; Grafodatskaya, Daria; Lou, Youliang; Ferreira, Jose C.; Pinto, Dalila; Scherer, Stephen W.; Shaffer, Lisa G.; Coullin, Philippe; Caniggia, Isabella; Beyene, Joseph; Slim, Rima; Bartolomei, Marisa S.; Weksberg, Rosanna

    2011-01-01

    Imprinted genes are critical for normal human growth and neurodevelopment. They are characterized by differentially methylated regions (DMRs) of DNA that confer parent of origin-specific transcription. We developed a new strategy to identify imprinted gene-associated DMRs. Using genome-wide methylation profiling of sodium bisulfite modified DNA from normal human tissues of biparental origin, candidate DMRs were identified by selecting CpGs with methylation levels consistent with putative allelic differential methylation. In parallel, the methylation profiles of tissues of uniparental origin, i.e., paternally-derived androgenetic complete hydatidiform moles (AnCHMs), and maternally-derived mature cystic ovarian teratoma (MCT), were examined and then used to identify CpGs with parent of origin-specific DNA methylation. With this approach, we found known DMRs associated with imprinted genomic regions as well as new DMRs for known imprinted genes, NAP1L5 and ZNF597, and novel candidate imprinted genes. The paternally methylated DMR for one candidate, AXL, a receptor tyrosine kinase, was also validated in experiments with mouse embryos that demonstrated Axl was expressed preferentially from the maternal allele in a DNA methylation-dependent manner. PMID:21324877

  2. Ask and Ye Shall Receive? Automated Text Mining of Michigan Capital Facility Finance Bond Election Proposals to Identify Which Topics Are Associated with Bond Passage and Voter Turnout

    ERIC Educational Resources Information Center

    Bowers, Alex J.; Chen, Jingjing

    2015-01-01

    The purpose of this study is to bring together recent innovations in the research literature around school district capital facility finance, municipal bond elections, statistical models of conditional time-varying outcomes, and data mining algorithms for automated text mining of election ballot proposals to examine the factors that influence the…

  3. Demonstrating a Market-Based Approach to the Reclamation of Mined Lands in West Virginia

    SciTech Connect

    Goodrich-Mahoney, John; Donnelly, Ellen

    2009-12-31

    This project demonstrated that developing environmental credits on private land, including abandoned mined lands, is dependent on a number of factors, some of them beyond the control of the project team. In this project, acid mine drainage (AMD) was successfully remediated through the construction of a passive AMD treatment system. Extensive water quality sampling both before and after the installation of the passive AMD treatment system showed that the system achieved removal efficiencies and pollutant loading reductions for acidity, iron, aluminum and manganese that were consistent with systems of similar size and design. The success of the passive AMD treatment system should have resulted in water credits if the project had not been terminated. Developing carbon sequestration credits, however, was much more complex and was not achieved in this project. The primary challenge that the project team encountered in meeting the full project objectives was the unsuccessful attempt to have the landowner sign a conservation easement for his property. This would have allowed the project team to clear and reforest the site, monitor the progress of the newly planted trees, and eventually realize carbon sequestration credits once the forest was mature. The delays caused by the lack of a conservation easement, as well as other factors, eventually resulted in the reforestation portion of the project being cancelled. The information in this report will help the public make more informed decisions regarding the potential of using water, carbon, and other credits to support the remediation of mined lands throughout the United States. The hope is that, by using credits, more mined lands will be remediated.

  4. Long-range prediction of Indian summer monsoon rainfall using data mining and statistical approaches

    NASA Astrophysics Data System (ADS)

    H, Vathsala; Koolagudi, Shashidhar G.

    2016-07-01

    This paper presents a hybrid model to better predict Indian summer monsoon rainfall. The algorithm considers suitable techniques for processing dense datasets. The proposed three-step algorithm comprises closed itemset generation-based association rule mining for feature selection, cluster membership for dimensionality reduction, and simple logistic function for prediction. The application of predicting rainfall into flood, excess, normal, deficit, and drought based on 36 predictors consisting of land and ocean variables is presented. Results show good accuracy in the considered study period of 37 years (1969-2005).

  5. Novel and Unexpected Microbial Diversity in Acid Mine Drainage in Svalbard (78° N), Revealed by Culture-Independent Approaches.

    PubMed

    García-Moyano, Antonio; Austnes, Andreas Erling; Lanzén, Anders; González-Toril, Elena; Aguilera, Ángeles; Øvreås, Lise

    2015-10-13

    Svalbard, situated in the high Arctic, is an important past and present coal mining area. Dozens of abandoned waste rock piles can be found in the proximity of Longyearbyen. This environment offers a unique opportunity for studying the biological control over the weathering of sulphide rocks at low temperatures. Although the extension and impact of acid mine drainage (AMD) in this area is known, the native microbial communities involved in this process are still scarcely studied and uncharacterized. Several abandoned mining areas were explored in the search for active AMD and a culture-independent approach was applied with samples from two different runoffs for the identification and quantification of the native microbial communities. The results obtained revealed two distinct microbial communities. One of the runoffs was more extreme with regards to pH and higher concentration of soluble iron and heavy metals. These conditions favored the development of algal-dominated microbial mats. Typical AMD microorganisms related to known iron-oxidizing bacteria (Acidithiobacillus ferrivorans, Acidobacteria and Actinobacteria) dominated the bacterial community although some unexpected populations related to Chloroflexi were also significant. No microbial mats were found in the second area. The geochemistry here showed less extreme drainage, most likely in direct contact with the ore under the waste pile. Large deposits of secondary minerals were found and the presence of iron stalks was revealed by microscopy analysis. Although typical AMD microorganisms were also detected here, the microbial community was dominated by other populations, some of them new to this type of system (Saccharibacteria, Gallionellaceae). These were absent or lowered in numbers the farther from the spring source and they could represent native populations involved in the oxidation of sulphide rocks within the waste rock pile. This environment appears thus as a highly interesting field of potential

  6. Novel and Unexpected Microbial Diversity in Acid Mine Drainage in Svalbard (78° N), Revealed by Culture-Independent Approaches

    PubMed Central

    García-Moyano, Antonio; Austnes, Andreas Erling; Lanzén, Anders; González-Toril, Elena; Aguilera, Ángeles; Øvreås, Lise

    2015-01-01

    Svalbard, situated in the high Arctic, is an important past and present coal mining area. Dozens of abandoned waste rock piles can be found in the proximity of Longyearbyen. This environment offers a unique opportunity for studying the biological control over the weathering of sulphide rocks at low temperatures. Although the extension and impact of acid mine drainage (AMD) in this area is known, the native microbial communities involved in this process are still scarcely studied and uncharacterized. Several abandoned mining areas were explored in the search for active AMD and a culture-independent approach was applied with samples from two different runoffs for the identification and quantification of the native microbial communities. The results obtained revealed two distinct microbial communities. One of the runoffs was more extreme with regards to pH and higher concentration of soluble iron and heavy metals. These conditions favored the development of algal-dominated microbial mats. Typical AMD microorganisms related to known iron-oxidizing bacteria (Acidithiobacillus ferrivorans, Acidobacteria and Actinobacteria) dominated the bacterial community although some unexpected populations related to Chloroflexi were also significant. No microbial mats were found in the second area. The geochemistry here showed less extreme drainage, most likely in direct contact with the ore under the waste pile. Large deposits of secondary minerals were found and the presence of iron stalks was revealed by microscopy analysis. Although typical AMD microorganisms were also detected here, the microbial community was dominated by other populations, some of them new to this type of system (Saccharibacteria, Gallionellaceae). These were absent or lowered in numbers the farther from the spring source and they could represent native populations involved in the oxidation of sulphide rocks within the waste rock pile. This environment appears thus as a highly interesting field of potential

  7. On the Way to New Possible Na-Ion Conductors: The Voronoi-Dirichlet Approach, Data Mining and Symmetry Considerations in Ternary Na Oxides.

    PubMed

    Meutzner, Falk; Münchgesang, Wolfram; Kabanova, Natalya A; Zschornak, Matthias; Leisegang, Tilmann; Blatov, Vladislav A; Meyer, Dirk C

    2015-11-09

    With the constant growth of the lithium battery market and the introduction of electric vehicles and stationary energy storage solutions, the low abundance and high price of lithium will greatly impact its availability in the future. Thus, a diversification of electrochemical energy storage technologies based on other source materials is of great relevance. Sodium is energetically similar to lithium but cheaper and more abundant, which results in some already established stationary concepts, such as Na-S and ZEBRA cells. The most significant bottleneck for these technologies is to find effective solid ionic conductors. Thus, the goal of this work is to identify new ionic conductors for Na ions in ternary Na oxides. For this purpose, the Voronoi-Dirichlet approach has been applied to the Inorganic Crystal Structure Database and some new procedures are introduced to the algorithm implemented in the programme package ToposPro. The main new features are the use of data mined values, which are then used for the evaluation of void spaces, and a new method of channel size calculation. 52 compounds have been identified to be high-potential candidates for solid ionic conductors. The results were analysed from a crystallographic point of view in combination with phenomenological requirements for ionic conductors and intercalation hosts. Of the most promising candidates, previously reported compounds have also been successfully identified by using the employed algorithm, which shows the reliability of the method.

  8. Mathematical and experimental approaches to identify and predict the effects of chemotherapy on neuroglial precursors

    PubMed Central

    Hyrien, Ollivier; Dietrich, Jörg; Noble, Mark

    2010-01-01

    The adverse effects of chemotherapy on normal cells of the body create substantial clinical problems for many cancer patients. Relatively little is known, however, about the effects, other than promotion of cell death, of such agents on the function of normal precursor cells critical in tissue homeostasis and repair. We have combined mathematical and experimental analyses to identify the effects of sublethal doses of chemotherapy on glial precursor cells of the central nervous system (CNS). We modeled the temporal development of a population of precursor and terminally differentiated cells exposed to sublethal doses of carmustine (BCNU), a classical alkylating chemotherapeutic agent used in treatment of gliomas and non-Hodgkin’s lymphomas, as a multi-type age-dependent branching process. We fitted our model to data from in vitro clonal experiments using the method of pseudo-likelihood. This approach identifies several novel drug effects, including modification of the cell cycle length, the time between division and differentiation, and alteration in the probability of undergoing self-renewal division in precursor cells. These changes of precursor cell function in the chemotherapy-exposed brain may have profound clinical implications. Major Findings: We applied our computational approach to analyze the effects of BCNU on clonal cultures of oligodendrocyte progenitor cells – one of the best-characterized neural progenitor cells in the mammalian brain. Our analysis reveals that transient exposures to BCNU increased the cell cycle length of progenitor cells and decreased their time to differentiation, while also decreasing the likelihood that they will undergo self-renewing divisions. By investigating the behavior of our mathematical model we demonstrate that precursor cell populations should recover spontaneously from transient modifications of the timing of division and of differentiation, but such recovery will not happen after alteration of cell fate. These
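
    The role of the self-renewal probability described in this abstract can be illustrated with a deliberately simplified sketch. It is not the authors' multi-type age-dependent model or their pseudo-likelihood fit: it assumes synchronous, discrete generations, ignores cell-cycle timing and cell death, and uses hypothetical names (`expected_populations`, `p_renew`). Each precursor divides into two daughters, and each daughter remains a precursor with probability `p_renew` or otherwise differentiates terminally.

```python
def expected_populations(generations, p_renew, precursors=1.0):
    """Expected (precursor, differentiated) counts after each generation.

    Simplifying assumptions (not from the source abstract):
    - every precursor divides exactly once per generation into two daughters;
    - each daughter stays a precursor with probability p_renew,
      otherwise it differentiates terminally;
    - differentiated cells neither divide nor die.
    """
    differentiated = 0.0
    history = [(precursors, differentiated)]
    for _ in range(generations):
        daughters = 2.0 * precursors
        precursors = daughters * p_renew               # self-renewing daughters
        differentiated += daughters * (1.0 - p_renew)  # terminally differentiated
        history.append((precursors, differentiated))
    return history


if __name__ == "__main__":
    for p in (0.5, 0.25):
        print(p, expected_populations(3, p))
```

    Under these assumptions the expected precursor pool scales by a factor of 2 * p_renew per generation, so it is stable at p_renew = 0.5 and shrinks below that value. This is one way to see why a lasting change in cell fate would be harder to recover from than a transient change in timing.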

  9. A Sparse Reconstruction Approach for Identifying Gene Regulatory Networks Using Steady-State Experiment Data

    PubMed Central

    Zhang, Wanhong; Zhou, Tong

    2015-01-01

    Motivation: Identifying gene regulatory networks (GRNs) which consist of a large number of interacting units has become a problem of paramount importance in systems biology. Situations exist extensively in which causal interacting relationships among these units are required to be reconstructed from measured expression data and other a priori information. Though numerous classical methods have been developed to unravel the interactions of GRNs, these methods either have higher computing complexities or have lower estimation accuracies. Note that great similarities exist between identification of genes that directly regulate a specific gene and a sparse vector reconstruction, which often relates to the determination of the number, location and magnitude of nonzero entries of an unknown vector by solving an underdetermined system of linear equations y = Φx. Based on these similarities, we propose a novel framework of sparse reconstruction to identify the structure of a GRN, so as to increase accuracy of causal regulation estimations, as well as to reduce their computational complexity. Results: In this paper, a sparse reconstruction framework is proposed on the basis of steady-state experiment data to identify GRN structure. Unlike traditional methods, the approach adopted is well suited to a large-scale underdetermined problem in inferring a sparse vector. We investigate how to combine the noisy steady-state experiment data and a sparse reconstruction algorithm to identify causal relationships. Efficiency of this method is tested by an artificial linear network, a mitogen-activated protein kinase (MAPK) pathway network and the in silico networks of the DREAM challenges. The performance of the suggested approach is compared with two state-of-the-art algorithms, the widely adopted total least-squares (TLS) method and those available results on the DREAM project. Actual results show that, with a lower computational cost, the proposed method can
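
    The sparse reconstruction problem y = Φx mentioned in this abstract can be sketched with orthogonal matching pursuit (OMP). The abstract does not name the algorithm the authors use, so OMP is a stand-in chosen for brevity. The toy matrix, the function names (`omp`, `solve_least_squares`) and the reading of columns of Φ as candidate regulators are all assumptions made for illustration.

```python
def solve_least_squares(columns, y):
    """Least-squares coefficients for y ~ sum(c_k * columns[k]).

    Solves the normal equations by Gaussian elimination with partial
    pivoting; adequate for the small, well-conditioned toy case below.
    """
    k = len(columns)
    gram = [[sum(a * b for a, b in zip(columns[i], columns[j]))
             for j in range(k)] for i in range(k)]
    rhs = [sum(a * v for a, v in zip(columns[i], y)) for i in range(k)]
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(gram[r][i]))
        gram[i], gram[pivot] = gram[pivot], gram[i]
        rhs[i], rhs[pivot] = rhs[pivot], rhs[i]
        for r in range(i + 1, k):
            factor = gram[r][i] / gram[i][i]
            for c in range(i, k):
                gram[r][c] -= factor * gram[i][c]
            rhs[r] -= factor * rhs[i]
    coef = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(gram[i][c] * coef[c] for c in range(i + 1, k))
        coef[i] = (rhs[i] - tail) / gram[i][i]
    return coef


def omp(phi, y, sparsity, tol=1e-9):
    """Greedy sparse solution of y = phi @ x with at most `sparsity` nonzeros.

    phi is a list of rows (m x n); y has length m.
    """
    m, n = len(phi), len(phi[0])
    cols = [[phi[i][j] for i in range(m)] for j in range(n)]
    norms = [sum(v * v for v in c) ** 0.5 for c in cols]
    support, coef, residual = [], [], list(y)
    for _ in range(sparsity):
        if sum(r * r for r in residual) ** 0.5 <= tol:
            break
        # Score unused columns by normalized correlation with the residual.
        scores = [
            abs(sum(a * r for a, r in zip(cols[j], residual))) / norms[j]
            if j not in support and norms[j] > 0 else -1.0
            for j in range(n)
        ]
        best = max(range(n), key=lambda j: scores[j])
        if scores[best] <= 0:
            break
        support.append(best)
        # Refit all selected columns jointly, then update the residual.
        coef = solve_least_squares([cols[j] for j in support], y)
        fitted = [sum(c * cols[j][i] for c, j in zip(coef, support))
                  for i in range(m)]
        residual = [yi - fi for yi, fi in zip(y, fitted)]
    x = [0.0] * n
    for c, j in zip(coef, support):
        x[j] = c
    return x


if __name__ == "__main__":
    # Toy underdetermined system: 3 measurements, 4 candidate regulators.
    phi = [[1, 0, 0, 1],
           [0, 1, 0, 1],
           [0, 0, 1, 1]]
    y = [3, 0, -1]
    print(omp(phi, y, sparsity=2))
```

    In the toy system the vector y is generated by the first and third columns only, and the greedy search selects exactly those two. With noisy data or strongly correlated columns, OMP can select the wrong support, which is one reason more robust sparse solvers are often preferred.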

  10. Exposures from mining and mine tailings

    NASA Astrophysics Data System (ADS)

    Chambers, Douglas B.; Cassaday, Valerie J.; Lowe, Leo M.

    The mining, milling and tailings management of uranium ores results in environmental radiation exposures. This paper describes the sources of radioactive emissions to the environment associated with these activities, reviews the basic approach used to estimate the resultant radiation exposures and presents examples of typical uranium mine and mill facilities. Similar concepts apply to radiation exposures associated with the mining of non-radioactive ores although the magnitudes of the exposures would normally be smaller than those associated with uranium mining.

  11. Data mining approach to evaluating the use of skin surface electropotentials for breast cancer detection.

    PubMed

    Sree, S Vinitha; Ng, E Y K; Acharya, U Rajendra

    2010-02-01

    The Biofield Diagnostic System (BDS) uses a score formed with measured skin surface electropotentials and a prior Level Of Suspicion (LOS) value (predicted by the physician based on the patient's ultrasound or mammography results) to calculate a revised Post-BDS LOS to indicate the presence of breast cancer. The demographic details, BDS test results, and the recorded electropotential values form a potentially useful dataset, which can be further explored with data mining tools to extract important information that can be used to improve the current predictive accuracy of the device. According to the proposed data mining framework, the BDS dataset with 291 cases was first pre-processed to remove outliers and then used to select relevant and informative features for classifier development and finally to evaluate the capability of the built classifiers in detecting the presence of the disease. Two popular feature selection techniques, namely, the filter and wrapper methods, were used in parallel for feature selection. A few statistical inference based classifiers and neural networks were used for classification. The proposed technique significantly improved the BDS prediction accuracy. Also, the use of prior LOS and, hence, the Post-BDS LOS, associates a mild subjective interpretation to the current prediction methodology used by BDS. However, the feature subset selected in our analysis that gave the best accuracy did not use either of these features. This result indicates the possibility of using BDS as a better objective assessment tool for breast cancer detection.

  12. A systems approach to accident causation in mining: an application of the HFACS method.

    PubMed

    Lenné, Michael G; Salmon, Paul M; Liu, Charles C; Trotter, Margaret

    2012-09-01

    This project aimed to provide a greater understanding of the systemic factors involved in mining accidents, and to examine those organisational and supervisory failures that are predictive of sub-standard performance at operator level. A sample of 263 significant mining incidents in Australia across 2007-2008 was analysed using the Human Factors Analysis and Classification System (HFACS). Two human factors specialists independently undertook the analysis. Incidents occurred more frequently in operations concerning the use of surface mobile equipment (38%) and working at heights (21%); however, injury was more frequently associated with electrical operations and vehicles and machinery. Several HFACS categories appeared frequently: skill-based errors (64%) and violations (57%), issues with the physical environment (56%), and organisational processes (65%). Focussing on the overall system, several factors were found to predict the presence of failures in other parts of the system, including planned inappropriate operations and team resource management; inadequate supervision and team resource management; and organisational climate and inadequate supervision. It is recommended that these associations receive greater attention in future attempts to develop accident countermeasures, although other significant associations should not be ignored. In accordance with findings from previous HFACS-based analyses of aviation and medical incidents, efforts to reduce the frequency of unsafe acts or operations should be directed to a few critical HFACS categories at the higher levels: organisational climate, planned inadequate operations, and inadequate supervision. While remedial strategies are proposed, it is important that future efforts evaluate the utility of the measures proposed in studies of system safety.

  13. Perception of Air Pollution in the Jinchuan Mining Area, China: A Structural Equation Modeling Approach.

    PubMed

    Li, Zhengtao; Folmer, Henk; Xue, Jianhong

    2016-07-21

    Studies on the perception of air pollution in China are very limited. The aim of this paper is to help to fill this gap by analyzing a cross-sectional dataset of 759 residents of the Jinchuan mining area, Gansu Province, China. The estimations suggest that perception of air pollution is two-dimensional. The first dimension is the perceived intensity of air pollution and the second is the perceived hazardousness of the pollutants. Both dimensions are influenced by environmental knowledge. Perceived intensity is furthermore influenced by socio-economic status and proximity to the pollution source; perceived hazardousness is influenced by socio-economic status, family health experience, family size and proximity to the pollution source. There are no reverse effects from perception on environmental knowledge. The main conclusion is that virtually all Jinchuan residents perceive high intensity and hazardousness of air pollution despite the fact that public information on air pollution and its health impacts is classified to a great extent. It is suggested that, to assist the residents to take appropriate preventive action, the local government should develop counseling and educational campaigns and institutionalize disclosure of air quality conditions. These programs should pay special attention to young residents who have limited knowledge of air pollution in the Jinchuan mining area.

  14. Perception of Air Pollution in the Jinchuan Mining Area, China: A Structural Equation Modeling Approach

    PubMed Central

    Li, Zhengtao; Folmer, Henk; Xue, Jianhong

    2016-01-01

    Studies on the perception of air pollution in China are very limited. The aim of this paper is to help to fill this gap by analyzing a cross-sectional dataset of 759 residents of the Jinchuan mining area, Gansu Province, China. The estimations suggest that perception of air pollution is two-dimensional. The first dimension is the perceived intensity of air pollution and the second is the perceived hazardousness of the pollutants. Both dimensions are influenced by environmental knowledge. Perceived intensity is furthermore influenced by socio-economic status and proximity to the pollution source; perceived hazardousness is influenced by socio-economic status, family health experience, family size and proximity to the pollution source. There are no reverse effects from perception on environmental knowledge. The main conclusion is that virtually all Jinchuan residents perceive high intensity and hazardousness of air pollution despite the fact that public information on air pollution and its health impacts is classified to a great extent. It is suggested that, to assist the residents to take appropriate preventive action, the local government should develop counseling and educational campaigns and institutionalize disclosure of air quality conditions. These programs should pay special attention to young residents who have limited knowledge of air pollution in the Jinchuan mining area. PMID:27455291

  15. Analysis of Maintenance Service Contracts for Dump Trucks Used in Mining Industry with Simulation Approach

    NASA Astrophysics Data System (ADS)

    Dymasius, A.; Wangsaputra, R.; Iskandar, B. P.

    2016-02-01

    A mining company needs high availability of the dump trucks used to haul mining materials. As a result, effective maintenance is required to keep the dump trucks in good condition and hence reduce their failures and downtime. Carrying out maintenance in-house requires an intensive maintenance facility and highly skilled maintenance specialists. Often, outsourcing maintenance is an economic option for the company. An external agent takes proactive action by offering some maintenance contract options to the owner. The decision problem for the owner is to choose the best option, and for the agent it is to determine the optimal price for each option offered. A non-cooperative game-theoretic formulation is used for the decision problems of the owner and the agent. We assume that the failure pattern of each truck follows a non-homogeneous Poisson process (NHPP), and queueing theory with multiple servers is used to estimate the downtime. Because modelling downtime analytically with queueing theory is highly complex, in this paper we use a simulation method. Furthermore, we conduct experiments to find the number of maintenance facilities (servers) that minimises the maintenance and penalty costs incurred by the agent.
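
    The simulation idea in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the power-law intensity, the exponential repair times, the FIFO repair discipline, and the function names `simulate_nhpp` and `total_downtime` are all assumptions made for illustration. Failure times are drawn from an NHPP by thinning, and downtime (waiting plus repair) is accumulated for a given number of repair facilities.

```python
import heapq
import random


def simulate_nhpp(intensity, lam_max, horizon, rng):
    """Draw NHPP event times on (0, horizon] by thinning.

    Assumes intensity(t) <= lam_max for all t in [0, horizon].
    """
    t, times = 0.0, []
    while True:
        t += rng.expovariate(lam_max)          # candidate from a rate-lam_max Poisson process
        if t > horizon:
            return times
        if rng.random() * lam_max <= intensity(t):
            times.append(t)                    # accept with probability intensity(t) / lam_max


def total_downtime(failure_times, n_servers, mean_repair, rng):
    """Sum of (waiting + repair) time over all failures, FIFO, n_servers facilities.

    Simplification (assumption): failures are treated as a pooled arrival
    stream, so a truck under repair is not removed from the failure process.
    """
    free_at = [0.0] * n_servers                # min-heap: time each facility becomes free
    heapq.heapify(free_at)
    downtime = 0.0
    for t in sorted(failure_times):
        start = max(t, heapq.heappop(free_at))
        finish = start + rng.expovariate(1.0 / mean_repair)
        heapq.heappush(free_at, finish)
        downtime += finish - t
    return downtime


if __name__ == "__main__":
    # Hypothetical power-law intensity with shape 2 and scale 100 (increasing in t).
    intensity = lambda t: (2.0 / 100.0) * (t / 100.0)
    failures = simulate_nhpp(intensity, intensity(500.0), 500.0, random.Random(0))
    for servers in (1, 2, 3):
        print(servers, total_downtime(failures, servers, 5.0, random.Random(1)))
```

    Re-running `total_downtime` with the same repair-time seed for different server counts gives a like-for-like comparison of facility numbers; a cost model for maintenance and penalties would have to be layered on top.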

  16. A New Approach for Mining Order-Preserving Submatrices Based on All Common Subsequences

    PubMed Central

    Xue, Yun; Liao, Zhengling; Li, Meihang; Luo, Jie; Kuang, Qiuhua; Hu, Xiaohui; Li, Tiechen

    2015-01-01

    Order-preserving submatrices (OPSMs) have been applied in many fields, such as DNA microarray data analysis, automatic recommendation systems, and target marketing systems, as an important unsupervised learning model. Unfortunately, most existing methods are heuristic algorithms that cannot reveal all OPSMs, since the underlying problem is NP-complete. In particular, deep OPSMs, corresponding to long patterns with few supporting sequences, incur explosive computational costs and are completely pruned by most popular methods. In this paper, we propose an exact method to discover all OPSMs based on frequent sequential pattern mining. First, an existing algorithm was adjusted to disclose all common subsequences (ACS) between every two row sequences, so that no deep OPSMs are missed. Then, an improved data structure for the prefix tree was used to store and traverse ACS, and the Apriori principle was employed to efficiently mine the frequent sequential patterns. Finally, experiments were implemented on gene and synthetic datasets. Results demonstrated the effectiveness and efficiency of this method. PMID:26161131
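
    To make the OPSM notion concrete, the following minimal sketch shows how a row can be turned into a column-index sequence and how the support of a candidate column pattern can be counted. It illustrates only the definition, not the ACS-based exact algorithm of the paper; the function names and the tie-breaking rule are assumptions.

```python
def row_to_sequence(row):
    """Column indices ordered by ascending value (ties broken by column index: an assumption)."""
    return sorted(range(len(row)), key=lambda j: (row[j], j))


def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order, not necessarily contiguously."""
    it = iter(sequence)
    return all(col in it for col in pattern)   # each `in` consumes the iterator up to the match


def supporting_rows(matrix, pattern):
    """Rows whose values increase along the columns listed in pattern."""
    return [i for i, row in enumerate(matrix)
            if is_subsequence(pattern, row_to_sequence(row))]


if __name__ == "__main__":
    matrix = [
        [1, 5, 3, 9],
        [2, 8, 4, 7],
        [9, 1, 5, 3],
    ]
    # Rows 0 and 1 both satisfy value[col 0] < value[col 2] < value[col 1].
    print(supporting_rows(matrix, [0, 2, 1]))
```

    A pattern together with its supporting rows forms one OPSM; a "deep" OPSM in the abstract's sense is a long pattern whose list of supporting rows is short.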

  17. A distributed approach for optimizing cascaded classifier topologies in real-time stream mining systems.

    PubMed

    Foo, Brian; van der Schaar, Mihaela

    2010-11-01

    In this paper, we discuss distributed optimization techniques for configuring classifiers in a real-time, informationally-distributed stream mining system. Due to the large volume of streaming data, stream mining systems must often cope with overload, which can lead to poor performance and intolerable processing delay for real-time applications. Furthermore, optimizing over an entire system of classifiers is a difficult task since changing the filtering process at one classifier can impact both the feature values of data arriving at classifiers further downstream and thus, the classification performance achieved by an ensemble of classifiers, as well as the end-to-end processing delay. To address this problem, this paper makes three main contributions: 1) Based on classification and queuing theoretic models, we propose a utility metric that captures both the performance and the delay of a binary filtering classifier system. 2) We introduce a low-complexity framework for estimating the system utility by observing, estimating, and/or exchanging parameters between the inter-related classifiers deployed across the system. 3) We provide distributed algorithms to reconfigure the system, and analyze the algorithms based on their convergence properties, optimality, information exchange overhead, and rate of adaptation to non-stationary data sources. We provide results using different video classifier systems.

  18. SPECTRUS: A Dimensionality Reduction Approach for Identifying Dynamical Domains in Protein Complexes from Limited Structural Datasets.

    PubMed

    Ponzoni, Luca; Polles, Guido; Carnevale, Vincenzo; Micheletti, Cristian

    2015-08-04

    Identifying dynamical, quasi-rigid domains in proteins provides a powerful means for characterizing functionally oriented structural changes via a parsimonious set of degrees of freedom. In fact, the relative displacements of few dynamical domains usually suffice to rationalize the mechanics underpinning biological functionality in proteins and can even be exploited for structure determination or refinement purposes. Here we present SPECTRUS, a general scheme that, by solely using amino acid distance fluctuations, can pinpoint the innate quasi-rigid domains of single proteins or large complexes in a robust way. Consistent domains are usually obtained by using either a pair of representative structures or thousands of conformers. The functional insights offered by the approach are illustrated for biomolecular systems of very different size and complexity such as kinases, ion channels, and viral capsids. The decomposition tool is available as a software package and web server at spectrus.sissa.it.

  19. Cocrystal dissociation in the presence of water: a general approach for identifying stable cocrystal forms.

    PubMed

    Eddleston, Mark D; Madusanka, Nadeesh; Jones, William

    2014-09-01

    In previous studies, cocrystals have been shown to be susceptible to dissociation at high humidity because of differences in the solubilities of the two coformer molecules, especially when these molecules can form hydrates. Contrastingly, however, the propensity of the pharmaceutically active compound caffeine to hydrate formation is reduced by cocrystallization with oxalic acid. Here, the stability of the oxalic acid cocrystal of caffeine is investigated from a thermodynamic perspective through the use of aqueous slurries of caffeine hydrate and oxalic acid dihydrate. Conversion to the anhydrous caffeine-oxalic acid cocrystal occurred under these conditions confirming that this form is thermodynamically stable in an aqueous environment. The slurry methodology was further developed as a general approach to screening for cocrystals that are not susceptible to dissociation at high humidity. In this manner, cocrystals of the hydrate-forming molecules theophylline, carbamazepine, and piroxicam that are stable at high humidity, indefinitely avoiding hydrate formation, were identified.

  20. Genome-Scale Approaches to Identify Genes Essential for Haemophilus influenzae Pathogenesis

    PubMed Central

    Wong, Sandy M. S.; Akerley, Brian J.

    2012-01-01

    Haemophilus influenzae is a Gram-negative bacterium that has no identified natural niche outside of the human host. It primarily colonizes the nasopharyngeal mucosa in an asymptomatic mode, but has the ability to disseminate to other anatomical sites to cause otitis media, upper, and lower respiratory tract infections, septicemia, and meningitis. To persist in diverse environments the bacterium must exploit and utilize the nutrients and other resources available in these sites for optimal growth/survival. Recent evidence suggests that regulatory factors that direct such adaptations also control virulence determinants required to resist and evade immune clearance mechanisms. In this review, we describe the recent application of whole-genome approaches that together provide insight into distinct survival mechanisms of H. influenzae in the context of different sites of pathogenesis. PMID:22919615

  1. Identifying the critical financial ratios for stocks evaluation: A fuzzy delphi approach

    NASA Astrophysics Data System (ADS)

    Mokhtar, Mazura; Shuib, Adibah; Mohamad, Daud

    2014-12-01

    Stocks evaluation has always been an interesting and challenging problem for both researchers and practitioners. Generally, the evaluation can be made based on a set of financial ratios. Nevertheless, there are a variety of financial ratios that can be considered and if all ratios in the set are placed into the evaluation process, data collection would be more difficult and time consuming. Thus, the objective of this paper is to identify the most important financial ratios upon which to focus in order to evaluate the stock's performance. For this purpose, a survey was carried out using an approach which is based on an expert judgement, namely the Fuzzy Delphi Method (FDM). The results of this study indicated that return on equity, return on assets, net profit margin, operating profit margin, earnings per share and debt to equity are the most important ratios.

  2. Identifying Risk and Protective Factors in Recidivist Juvenile Offenders: A Decision Tree Approach.

    PubMed

    Ortega-Campos, Elena; García-García, Juan; Gil-Fenoy, Maria José; Zaldívar-Basurto, Flor

    2016-01-01

    Research on juvenile justice aims to identify profiles of risk and protective factors in juvenile offenders. This paper presents a study of profiles of risk factors that influence young offenders toward committing sanctionable antisocial behavior (S-ASB). Decision tree analysis is used as a multivariate approach to the phenomenon of repeated sanctionable antisocial behavior in juvenile offenders in Spain. The study sample was made up of the set of juveniles who were charged in a court case in the Juvenile Court of Almeria (Spain). The period of study of recidivism was two years from the baseline. The object of study is presented, through the implementation of a decision tree. Two profiles of risk and protective factors are found. Risk factors associated with higher rates of recidivism are antisocial peers, age at baseline S-ASB, problems in school and criminality in family members.

  3. Identifying Risk and Protective Factors in Recidivist Juvenile Offenders: A Decision Tree Approach

    PubMed Central

    Ortega-Campos, Elena; García-García, Juan; Gil-Fenoy, Maria José; Zaldívar-Basurto, Flor

    2016-01-01

    Research on juvenile justice aims to identify profiles of risk and protective factors in juvenile offenders. This paper presents a study of profiles of risk factors that influence young offenders toward committing sanctionable antisocial behavior (S-ASB). Decision tree analysis is used as a multivariate approach to the phenomenon of repeated sanctionable antisocial behavior in juvenile offenders in Spain. The study sample was made up of the set of juveniles who were charged in a court case in the Juvenile Court of Almeria (Spain). The period of study of recidivism was two years from the baseline. The object of study is presented, through the implementation of a decision tree. Two profiles of risk and protective factors are found. Risk factors associated with higher rates of recidivism are antisocial peers, age at baseline S-ASB, problems in school and criminality in family members. PMID:27611313

  4. A qualitative approach to signal mining in pharmacovigilance using formal concept analysis.

    PubMed

    Lillo-Le Louët, Agnès; Toussaint, Yannick; Villerd, Jean

    2010-01-01

    "Pharmacovigilance is the process and science of monitoring the safety of medicines, consisting in (i) collecting and managing data on the safety of medicines (ii) looking at the data to detect 'signals' (any new or changing safety issue)" [1]. Pharmacovigilance is mainly based on spontaneous reports: when suspecting an adverse drug reaction, health care practitioners send a report to a spontaneous reporting system (SRS). This produces huge databases containing numerous reports and their manual exploration is both cost and time prohibitive. Existing techniques that automatically extract relevant signals rely on statistics or Bayesian models but do not provide information to the experts about possible biases lying in the data, nor about the specificity of a signal to a particular patient profile. Our extraction method combines numerical methods from the state of the art with a qualitative approach that helps interpretation. We build a synthetic representation of the database that is used to (i) identify unexpected patterns and biases (ii) extract potentially relevant signals w.r.t. patient profiles (iii) provide traceability facilities between extracted signals and raw data.

  5. Xtalk: a path-based approach for identifying crosstalk between signaling pathways

    PubMed Central

    Tegge, Allison N.; Sharp, Nicholas; Murali, T. M.

    2016-01-01

    Motivation: Cells communicate with their environment via signal transduction pathways. On occasion, the activation of one pathway can produce an effect downstream of another pathway, a phenomenon known as crosstalk. Existing computational methods to discover such pathway pairs rely on simple overlap statistics. Results: We present Xtalk, a path-based approach for identifying pairs of pathways that may crosstalk. Xtalk computes the statistical significance of the average length of multiple short paths that connect receptors in one pathway to the transcription factors in another. By design, Xtalk reports the precise interactions and mechanisms that support the identified crosstalk. We applied Xtalk to signaling pathways in the KEGG and NCI-PID databases. We manually curated a gold standard set of 132 crosstalking pathway pairs and a set of 140 pairs that did not crosstalk, for which Xtalk achieved an area under the receiver operator characteristic curve of 0.65, a 12% improvement over the closest competing approach. The area under the receiver operator characteristic curve varied with the pathway, suggesting that crosstalk should be evaluated on a pathway-by-pathway level. We also analyzed an extended set of 658 pathway pairs in KEGG and a set of more than 7000 pathway pairs in NCI-PID. For the top-ranking pairs, we found substantial support in the literature (81% for KEGG and 78% for NCI-PID). We provide examples of networks computed by Xtalk that accurately recovered known mechanisms of crosstalk. Availability and implementation: The XTALK software is available at http://bioinformatics.cs.vt.edu/~murali/software. Crosstalk networks are available at http://graphspace.org/graphs?tags=2015-bioinformatics-xtalk. Contact: ategge@vt.edu, murali@cs.vt.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26400040

  6. A similarity based approach to identify homogeneous regions for seasonal forecasting

    NASA Astrophysics Data System (ADS)

    Schick, Simon; Rössler, Ole; Weingartner, Rolf

    2015-04-01

    Seasonal runoff forecasting using statistical models is challenged by a large number of candidate predictors and a generally weak predictor-predictand relationship. As the area of the target basin increases, the available data sets often grow as well, thus reinforcing the predictor selection challenge. We propose an approach which follows the idea of 'divide and conquer' as developed in computational sciences and machine learning: First, the macroscale target basin is partitioned into homogeneous regions using all its gauged mesoscale subbasins. Second, one representative subbasin per homogeneous region is identified, for which models are fitted and applied. Third, the resulting forecasts are combined at the scale of the macroscale target basin. This approach requires a suitable method to identify homogeneous regions and representative subbasins. We suggest a way based on hydrological similarity, as catchment similarity estimated with respect to physiographic-climatic descriptors does not necessarily imply similar runoff response. Each descriptor is derived from daily runoff series and aims to reflect a specific catchment characteristic: (i) autocorrelation coefficient, parameters of a fitted Gamma distribution and low/high flow indices (based on daily runoff values); (ii) fluctuation of the standard deviation within the yearly cycle (based on weekly runoff values); (iii) dominant harmonics obtained from the discrete Fourier transform (based on monthly runoff values); (iv) long-term trend (based on yearly runoff values). Where necessary, the runoff series first need to be standardized, aggregated, detrended or deseasonalized. As a preliminary study we present the results of a cluster analysis for the Swiss Rhine River as macroscale target basin, which leads to about 40 mesoscale subbasins with runoff series for the period 1991-2010. Problems we have to address include the choice of a clustering algorithm, the identification of an appropriate number of regions and the selection of representative
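
    Two of the runoff descriptors named in this abstract, the autocorrelation coefficient and the dominant harmonic of the discrete Fourier transform, can be sketched as follows. This is an illustrative stand-in using only the Python standard library, not the authors' processing chain; the function names and the synthetic monthly series are assumptions.

```python
import cmath
import math


def autocorrelation(series, lag=1):
    """Sample autocorrelation coefficient of a runoff series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return cov / var


def dominant_harmonic(series):
    """Index k in 1..n//2 of the largest-amplitude DFT coefficient (mean removed)."""
    n = len(series)
    mean = sum(series) / n

    def amplitude(k):
        # Naive O(n) evaluation of one DFT coefficient; adequate for short series.
        return abs(sum((x - mean) * cmath.exp(-2j * math.pi * k * t / n)
                       for t, x in enumerate(series)))

    return max(range(1, n // 2 + 1), key=amplitude)


if __name__ == "__main__":
    # Synthetic four-year monthly series with a pure annual cycle (hypothetical data).
    monthly = [10.0 + 5.0 * math.sin(2.0 * math.pi * t / 12.0) for t in range(48)]
    print(autocorrelation(monthly, lag=1))
    print(dominant_harmonic(monthly))   # harmonic index; period = len(series) / index
```

    Descriptor vectors of this kind, computed per subbasin, would then be the input to whichever clustering algorithm is chosen.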

  7. Deductive genomics: a functional approach to identify innovative drug targets in the post-genome era.

    PubMed

    Stumm, Gabriele; Russ, Andreas; Nehls, Michael

    2002-01-01

    The sequencing of the human genome has generated a drug discovery process that is based on sequence analysis and hypothesis-driven (inductive) prediction of gene function. This approach, which we term inductive genomics, is currently dominating the efforts of the pharmaceutical industry to identify new drug targets. According to recent studies, this sequence-driven discovery process is paradoxically increasing the average cost of drug development, thus falling short of the promise of the Human Genome Project to simplify the creation of much needed novel therapeutics. In the early stages of discovery, the flurry of new gene sequences makes it difficult to pick and prioritize the most promising product candidates for product development, as with existing technologies important decisions have to be based on circumstantial evidence that does not strongly predict therapeutic potential. This is because the physiological function of a potential target cannot be predicted by gene sequence analysis and in vitro technologies alone. In contrast, deductive genomics, or large-scale forward genetics, bridges the gap between sequence and function by providing a function-driven in vivo screen of a highly orthologous mammalian model genome for medically relevant physiological functions and drug targets. This approach allows drug discovery to move beyond the focus on sequence-driven identification of new members of classical drug-able protein families towards the biology-driven identification of innovative targets and biological pathways.

  8. An integrated functional genomics approach identifies the regulatory network directed by brachyury (T) in chordoma.

    PubMed

    Nelson, Andrew C; Pillay, Nischalan; Henderson, Stephen; Presneau, Nadège; Tirabosco, Roberto; Halai, Dina; Berisha, Fitim; Flicek, Paul; Stemple, Derek L; Stern, Claudio D; Wardle, Fiona C; Flanagan, Adrienne M

    2012-11-01

    Chordoma is a rare malignant tumour of bone, the molecular marker of which is the expression of the transcription factor, brachyury. Having recently demonstrated that silencing brachyury induces growth arrest in a chordoma cell line, we now seek to identify its downstream target genes. Here we use an integrated functional genomics approach involving shRNA-mediated brachyury knockdown, gene expression microarray, ChIP-seq experiments, and bioinformatics analysis to achieve this goal. We confirm that the T-box binding motif of human brachyury is identical to that found in mouse, Xenopus, and zebrafish development, and that brachyury acts primarily as an activator of transcription. Using human chordoma samples for validation purposes, we show that brachyury binds 99 direct targets and indirectly influences the expression of 64 other genes, thereby acting as a master regulator of an elaborate oncogenic transcriptional network encompassing diverse signalling pathways including components of the cell cycle, and extracellular matrix components. Given the wide repertoire of its active binding and the relative specific localization of brachyury to the tumour cells, we propose that an RNA interference-based gene therapy approach is a plausible therapeutic avenue worthy of investigation.

  9. The waveform similarity approach to identify dependent events in instrumental seismic catalogues

    NASA Astrophysics Data System (ADS)

    Barani, S.; Ferretti, G.; Massa, M.; Spallarossa, D.

    2007-01-01

    In this paper, waveform similarity analysis is adapted and implemented in a declustering procedure to identify foreshocks and aftershocks, to obtain instrumental catalogues that are cleaned of dependent events and to perform an independent check of the results of traditional declustering techniques. Unlike other traditional declustering methods (i.e. windowing techniques), the application of cross-correlation analysis allows definition of groups of dependent events (multiplets) characterized by similar location, fault mechanism and propagation pattern. In this way the chain of intervening related events is led by the seismogenetic features of earthquakes. Furthermore, a time-selection criterion is used to define time-independent seismic episodes eventually joined (on the basis of waveform similarity) into a single multiplet. The results, obtained applying our procedure to a test data set, show that the declustered catalogue is drawn by the Poisson distribution with a degree of confidence higher than using the Gardner and Knopoff method. The declustered catalogues, applying these two approaches, are similar with respect to the frequency-magnitude distribution and the number of earthquakes. Nevertheless, the application of our approach leads to declustered catalogues properly related to the seismotectonic background and the rheology of the investigated area and the success of the procedure is ensured by the independence of the results from estimated location errors of the events collected in the raw catalogue.

  10. New approach for identifying the zero-order fringe in variable wavelength interferometry

    NASA Astrophysics Data System (ADS)

    Galas, Jacek; Litwin, Dariusz; Daszkiewicz, Marek

    2016-12-01

    The family of VAWI techniques (for transmitted and reflected light) is especially efficient for characterizing objects when the optical path difference in the interference system exceeds a few wavelengths. The classical approach, which consists in measuring the deflection of interference fringes, fails because of strong edge effects. Broken continuity of interference fringes prevents correct identification of the zero-order fringe, which leads to significant errors. This family of methods was originally proposed by Professor Pluta in the 1980s, but at that time image processing facilities and computers were hardly available. Automated devices open up a completely new approach to the classical measurement procedures. The Institute team has taken that new opportunity and transformed the technique into fully automated measurement devices offering commercial readiness of industry-grade quality. The method itself has been modified, and new solutions and algorithms have simultaneously extended the field of application. This has concerned both construction aspects of the systems and software development in the context of creating computerized instruments. The VAWI collection of instruments now constitutes the core of the Institute's commercial offer. It is now practically applicable in industrial environments for measuring textile and optical fibers and strips of thin films, and for testing wave plates and nonlinear effects in different materials. This paper describes new algorithms for identifying the zero-order fringe, which increase the performance of the system as a whole, and presents some examples of measurements of optical elements.

  11. Urinary Biomarkers of Whole Grain Wheat Intake Identified by Non-targeted and Targeted Metabolomics Approaches

    PubMed Central

    Zhu, Yingdong; Wang, Pei; Sha, Wei; Sang, Shengmin

    2016-01-01

    Mounting evidence suggests that whole grain (WG) intake plays an important role in chronic disease prevention. However, numerous human studies have failed to produce clear-cut conclusions on this topic. Here, a combination of non-targeted and targeted metabolomics approaches, together with kinetic studies, was used to investigate biomarkers of WG wheat intake and further explore the diet-disease associations. Via these integrated approaches, forty-one compounds were identified as the most discriminating endogenous metabolites after WG versus refined grain (RG) wheat bread consumption. The corresponding biological assessment of these endogenous changes suggests that, in contrast to RG consumption, WG wheat consumption may facilitate antioxidant defense systems and moderate the risk factors of cancer, cardiovascular diseases, and other chronic diseases. A panel of urinary markers consisting of seven alkylresorcinol metabolites and five benzoxazinoid derivatives as specific biomarkers, as well as five phenolic acid derivatives, was also established to cover multiple time points and longer time periods for correctly and objectively monitoring WG wheat intake. Through these findings, we have established a comprehensive biomarker pool to better assess WG wheat consumption, and to monitor the endogenous changes that are linked to health effects of WG wheat consumption. PMID:27805021

  12. Chemical proteomics approaches for identifying the cellular targets of natural products.

    PubMed

    Wright, M H; Sieber, S A

    2016-05-04

    Covering: 2010 up to 2016. Deconvoluting the mode of action of natural products and drugs remains one of the biggest challenges in chemistry and biology today. Chemical proteomics is a growing area of chemical biology that seeks to design small molecule probes to understand protein function. In the context of natural products, chemical proteomics can be used to identify the protein binding partners or targets of small molecules in live cells. Here, we highlight recent examples of chemical probes based on natural products and their application for target identification. The review focuses on probes that can be covalently linked to their target proteins (either via intrinsic chemical reactivity or via the introduction of photocrosslinkers), and can be applied "in situ" - in living systems rather than cell lysates. We also focus here on strategies that employ a click reaction, the copper-catalysed azide-alkyne cycloaddition reaction (CuAAC), to allow minimal functionalisation of natural product scaffolds with an alkyne or azide tag. We also discuss 'competitive mode' approaches that screen for natural products that compete with a well-characterised chemical probe for binding to a particular set of protein targets. Fuelled by advances in mass spectrometry instrumentation and bioinformatics, many modern strategies are now embracing quantitative proteomics to help define the true interacting partners of probes, and we highlight the opportunities this rapidly evolving technology provides in chemical proteomics. Finally, some of the limitations and challenges of chemical proteomics approaches are discussed.

  13. Multivariate statistical and GIS-based approach to identify heavy metal sources in soils.

    PubMed

    Facchinelli, A; Sacchi, E; Mallen, L

    2001-01-01

    The knowledge of the regional variability, the background values and the anthropic vs. natural origin for potentially harmful elements in soils is of critical importance to assess human impact and to fix guide values and quality standards. The present study was undertaken as a preliminary survey on soil contamination on a regional scale in Piemonte (NW Italy). The aims of the study were: (1) to determine average regional concentrations of some heavy metals (Cr, Co, Ni, Cu, Zn, Pb); (2) to find out their large-scale variability; (3) to define their natural or artificial origin; and (4) to identify possible non-point sources of contamination. Multivariate statistic approaches (Principal Component Analysis and Cluster Analysis) were adopted for data treatment, allowing the identification of three main factors controlling the heavy metal variability in cultivated soils. Geostatistics were used to construct regional distribution maps, to be compared with the geographical, geologic and land use regional database using GIS software. This approach, evidencing spatial relationships, proved very useful to the confirmation and refinement of geochemical interpretations of the statistical output. Cr, Co and Ni were associated with and controlled by parent rocks, whereas Cu together with Zn, and Pb alone were controlled by anthropic activities. The study indicates that background values and realistic mandatory guidelines are impossible to fix without an extensive data collection and without a correct geochemical interpretation of the data.

  14. A New Approach to Identifying the Drivers of Regulation Compliance Using Multivariate Behavioural Models

    PubMed Central

    Thomas, Alyssa S.; Milfont, Taciano L.; Gavin, Michael C.

    2016-01-01

    Non-compliance with fishing regulations can undermine management effectiveness. Previous bivariate approaches were unable to untangle the complex mix of factors that may influence fishers’ compliance decisions, including enforcement, moral norms, perceived legitimacy of regulations and the behaviour of others. We compared seven multivariate behavioural models of fisher compliance decisions using structural equation modeling. An online survey of over 300 recreational fishers tested the ability of each model to best predict their compliance with two fishing regulations (daily and size limits). The best fitting model for both regulations was composed solely of psycho-social factors, with social norms having the greatest influence on fishers’ compliance behaviour. Fishers’ attitude also directly affected compliance with size limit, but to a lesser extent. On the basis of these findings, we suggest behavioural interventions to target social norms instead of increasing enforcement for the focal regulations in the recreational blue cod fishery in the Marlborough Sounds, New Zealand. These interventions could include articles in local newspapers and fishing magazines highlighting the extent of regulation compliance as well as using respected local fishers to emphasize the benefits of compliance through public meetings or letters to the editor. Our methodological approach can be broadly applied by natural resource managers as an effective tool to identify drivers of compliance that can then guide the design of interventions to decrease illegal resource use. PMID:27727292

  15. Chemical proteomics approaches for identifying the cellular targets of natural products

    PubMed Central

    Sieber, S. A.

    2016-01-01

    Covering: 2010 up to 2016 Deconvoluting the mode of action of natural products and drugs remains one of the biggest challenges in chemistry and biology today. Chemical proteomics is a growing area of chemical biology that seeks to design small molecule probes to understand protein function. In the context of natural products, chemical proteomics can be used to identify the protein binding partners or targets of small molecules in live cells. Here, we highlight recent examples of chemical probes based on natural products and their application for target identification. The review focuses on probes that can be covalently linked to their target proteins (either via intrinsic chemical reactivity or via the introduction of photocrosslinkers), and can be applied “in situ” – in living systems rather than cell lysates. We also focus here on strategies that employ a click reaction, the copper-catalysed azide–alkyne cycloaddition reaction (CuAAC), to allow minimal functionalisation of natural product scaffolds with an alkyne or azide tag. We also discuss ‘competitive mode’ approaches that screen for natural products that compete with a well-characterised chemical probe for binding to a particular set of protein targets. Fuelled by advances in mass spectrometry instrumentation and bioinformatics, many modern strategies are now embracing quantitative proteomics to help define the true interacting partners of probes, and we highlight the opportunities this rapidly evolving technology provides in chemical proteomics. Finally, some of the limitations and challenges of chemical proteomics approaches are discussed. PMID:27098809

  16. Parametric analysis of the biomechanical response of head subjected to the primary blast loading--a data mining approach.

    PubMed

    Zhu, Feng; Kalra, Anil; Saif, Tal; Yang, Zaihan; Yang, King H; King, Albert I

    2016-01-01

    Traumatic brain injury due to primary blast loading has become a signature injury in recent military conflicts and terrorist activities. Extensive experimental and computational investigations have been conducted to study the interrelationships between intracranial pressure response and intrinsic or 'input' parameters such as the head geometry and loading conditions. However, these relationships are very complicated and are usually implicit and 'hidden' in a large amount of simulation/test data. In this study, a data mining method is proposed to explore such underlying information from the numerical simulation results. The heads of different species are described as a highly simplified two-part (skull and brain) finite element model with varying geometric parameters. The parameters considered include peak incident pressure, skull thickness, brain radius and snout length. Their interrelationship and coupling effect are discovered by developing a decision tree based on the large simulation data-set. The results show that the proposed data-driven method is superior to the conventional linear regression method and is comparable to the nonlinear regression method. Considering its capability of exploring implicit information and the relatively simple relationships between response and input variables, the data mining method is considered to be a good tool for an in-depth understanding of the mechanisms of blast-induced brain injury. As a general method, this approach can also be applied to other nonlinear complex biomechanical systems.

  17. Incidental sinonasal findings identified during preoperative evaluation for endoscopic transsphenoidal approaches

    PubMed Central

    Laury, Adrienne M.; Oyesiku, Nelson M.; Hadjipanayis, Costas G.; DelGaudio, John M.

    2013-01-01

    Background: The endoscopic transsphenoidal approach (eTSA) to lesions of the sellar region is typically performed jointly by neurosurgeons and otolaryngologists. Occasionally, the approach is significantly altered by sinonasal disease, anatomic variants, or previous surgery. However, there are no current guidelines that describe which physical or radiological findings should prompt a change in the plan of care. The purpose of this study was to determine the incidence of sinonasal pathology or anatomic variants noted endoscopically or by imaging that altered preoperative or intraoperative management. Methods: A retrospective review was performed of 355 consecutive patients who underwent combined neurosurgery–otolaryngology endoscopic sella approach from August 1, 2007 to April 1, 2011. Our practice in these patients involves preoperative otolaryngology clinical evaluation and MRI review. Intraoperative image guidance is not routinely used in uncomplicated eTSA. Results: The most common management alteration was the addition of image guidance based on anatomic variants on MRI, which occurred in 81 patients (35.0%). Eight patients (2.9%) were preoperatively treated with antibiotics and surgery was postponed secondary to acute or chronic purulent rhinosinusitis; two (0.7%) required functional endoscopic sinus surgery for medically refractory disease before eTSA. Five patients (1.8%) required anterior septoplasty intraoperatively for severe nasal septal deviation. Two patients (0.7%) had inverted papilloma and one patient had esthesioneuroblastoma identified preoperatively during rigid nasal endoscopy. Conclusion: This is one of the larger reviews of patients undergoing eTSA for sellar lesions and the only study that describes how intraoperative management may be altered by preoperative sinonasal evaluation. We found a significant incidence of sinonasal pathology and anatomic variants that altered routine operative planning; therefore, a thorough sinonasal evaluation

  18. Multi-compartment approach to identify minimal flow and maximal recreational use of a lowland river

    NASA Astrophysics Data System (ADS)

    Pusch, Martin; Lorenz, Stefan

    2013-04-01

    Most approaches to establish a minimum flow rate for river sections subjected to water abstraction focus on flow requirements of fish and benthic invertebrates. However, artificial reduction of river flow will always affect additional key ecosystem features, such as sediment properties and the metabolism of matter in these ecosystems, and may even influence adjacent floodplains. Thus, significant effects e.g. on the dissolved oxygen content of river water, on habitat conditions in the benthic zone, and on water levels in the floodplain are to be expected. We therefore chose a multiple compartment method to identify minimum flow requirements in a lowland river in northern Germany (Spree River), selecting the minimal required flow level out of all compartments studied. Results showed that minimal flow levels necessary to keep key ecosystem features at a 'good' state depended significantly on actual water quality and on river channel morphology. Water quality of the Spree is in turn potentially influenced by recreational boating activity, which causes mussels to stop filter-feeding, and thus impedes self-purification. Disturbance of mussel feeding was shown to directly depend on boat type and speed, with substantial differences among mussel species. Thus, a maximal recreational boating intensity could be derived that does not significantly affect self-purification. We conclude that minimal flow levels should be identified not only based on flow preferences of target species, but also considering channel morphology, ecological functions, and the intensity of other human uses of the river section.

  19. A neural network approach for identifying particle pitch angle distributions in Van Allen Probes data

    NASA Astrophysics Data System (ADS)

    Souza, V. M.; Vieira, L. E. A.; Medeiros, C.; Da Silva, L. A.; Alves, L. R.; Koga, D.; Sibeck, D. G.; Walsh, B. M.; Kanekal, S. G.; Jauer, P. R.; Rockenbach, M.; Dal Lago, A.; Silveira, M. V. D.; Marchezi, J. P.; Mendes, O.; Gonzalez, W. D.; Baker, D. N.

    2016-04-01

    Analysis of particle pitch angle distributions (PADs) has been used as a means to comprehend a multitude of different physical mechanisms that lead to flux variations in the Van Allen belts and also to particle precipitation into the upper atmosphere. In this work we developed a neural network-based data clustering methodology that automatically identifies distinct PAD types in an unsupervised way using particle flux data. One can promptly identify and locate three well-known PAD types in both time and radial distance, namely, 90° peaked, butterfly, and flattop distributions. In order to illustrate the applicability of our methodology, we used relativistic electron flux data from the whole month of November 2014, acquired from the Relativistic Electron-Proton Telescope instrument on board the Van Allen Probes, but it is emphasized that our approach can also be used with multiplatform spacecraft data. Our PAD classification results are in reasonably good agreement with those obtained by standard statistical fitting algorithms. The proposed methodology has potential use for monitoring the Van Allen belts.

  20. Integrative screening approach identifies regulators of polyploidization and targets for acute megakaryocytic leukemia

    PubMed Central

    Wen, Qiang; Goldenson, Benjamin; Silver, Serena J.; Schenone, Monica; Dancik, Vladimir; Huang, Zan; Wang, Ling-Zhi; Lewis, Timothy; An, W. Frank; Li, Xiaoyu; Bray, Mark-Anthony; Thiollier, Clarisse; Diebold, Lauren; Gilles, Laure; Vokes, Martha S.; Moore, Christopher B.; Bliss-Moreau, Meghan; VerPlank, Lynn; Tolliday, Nicola J.; Mishra, Rama; Vemula, Sasidhar; Shi, Jianjian; Wei, Lei; Kapur, Reuben; Lopez, Cécile K.; Gerby, Bastien; Ballerini, Paola; Pflumio, Francoise; Gilliland, D. Gary; Goldberg, Liat; Birger, Yehudit; Izraeli, Shai; Gamis, Alan S.; Smith, Franklin O.; Woods, William G.; Taub, Jeffrey; Scherer, Christina A.; Bradner, James; Goh, Boon-Cher; Mercher, Thomas; Carpenter, Anne E.; Gould, Robert J.; Clemons, Paul A.; Carr, Steven A.; Root, David E.; Schreiber, Stuart L.; Stern, Andrew M.; Crispino, John D.

    2012-01-01

    Summary The mechanism by which cells decide to skip mitosis to become polyploid is largely undefined. Here we used a high-content image-based screen to identify small-molecule probes that induce polyploidization of megakaryocytic leukemia cells and serve as perturbagens to help understand this process. We found that dimethylfasudil (diMF, H-1152P) selectively increased polyploidization, mature cell-surface marker expression, and apoptosis of malignant megakaryocytes. A broadly applicable, highly integrated target identification approach employing proteomic and shRNA screening revealed that a major target of diMF is Aurora A kinase (AURKA), which has not been studied extensively in megakaryocytes. Moreover, we discovered that MLN8237 (Alisertib), a selective inhibitor of AURKA, induced polyploidization and expression of mature megakaryocyte markers in AMKL blasts and displayed potent anti-AMKL activity in vivo. This research provides the rationale to support clinical trials of MLN8237 and other inducers of polyploidization in AMKL. Finally, we have identified five networks of kinases that regulate the switch to polyploidy. PMID:22863010

  1. Modelling Creativity: Identifying Key Components through a Corpus-Based Approach

    PubMed Central

    2016-01-01

    Creativity is a complex, multi-faceted concept encompassing a variety of related aspects, abilities, properties and behaviours. If we wish to study creativity scientifically, then a tractable and well-articulated model of creativity is required. Such a model would be of great value to researchers investigating the nature of creativity and in particular, those concerned with the evaluation of creative practice. This paper describes a unique approach to developing a suitable model of how creative behaviour emerges that is based on the words people use to describe the concept. Using techniques from the field of statistical natural language processing, we identify a collection of fourteen key components of creativity through an analysis of a corpus of academic papers on the topic. Words are identified which appear significantly often in connection with discussions of the concept. Using a measure of lexical similarity to help cluster these words, a number of distinct themes emerge, which collectively contribute to a comprehensive and multi-perspective model of creativity. The components provide an ontology of creativity: a set of building blocks which can be used to model creative practice in a variety of domains. The components have been employed in two case studies to evaluate the creativity of computational systems and have proven useful in articulating achievements of this work and directions for further research. PMID:27706185

  2. Calibrated photostimulated luminescence is an effective approach to identify irradiated orange during storage

    NASA Astrophysics Data System (ADS)

    Jo, Yunhee; Sanyal, Bhaskar; Chung, Namhyeok; Lee, Hyun-Gyu; Park, Yunji; Park, Hae-Jun; Kwon, Joong-Ho

    2015-06-01

    Photostimulated luminescence (PSL) has been employed as a fast screening method for various irradiated foods. In this study the potential use of PSL was evaluated to identify oranges irradiated with gamma ray, electron beam and X-ray (0-2 kGy) and stored under different conditions for 6 weeks. The effects of light conditions (natural light, artificial light, and dark) and storage temperatures (4 and 20 °C) on PSL photon counts (PCs) during post-irradiation periods were studied. Non-irradiated samples always showed negative values of PCs, while irradiated oranges exhibited intermediate results after first PSL measurements. However, the irradiated samples had much higher PCs. The PCs of all the samples declined as the storage time increased. Calibrated second PSL measurements showed PSL ratio <10 for the irradiated samples after 3 weeks of irradiation, confirming their irradiation status in all the storage conditions. Calibrated PSL and sample storage in the dark at 4 °C were found to be the most suitable approaches to identify irradiated oranges during storage.

  3. In silico scaffold evaluation and solid phase approach to identify new gelatinase inhibitors.

    PubMed

    Topai, Alessandra; Breccia, Perla; Minissi, Franco; Padova, Alessandro; Marini, Stefano; Cerbara, Ilaria

    2012-04-01

    Among matrix metalloproteinases (MMPs), gelatinases MMP-2 (gelatinase A) and MMP-9 (gelatinase B) play a key role in a number of physiological processes such as tissue repair and fibrosis. Considerable evidence points to their involvement in a series of pathological events, such as arthritis, multiple sclerosis, cardiovascular diseases, inflammatory processes and tumor progression by degradation of the extracellular matrix. To date, the identification of non-specific MMP inhibitors has made the selective targeting of gelatinases difficult. In this work we report the identification, design and synthesis of new gelatinase inhibitors with appropriate drug-like properties and a good profile in terms of affinity and selectivity. By a detailed in silico protocol and innovative and versatile solid phase approaches, a series of 4-thiazolydinyl-N-hydroxycarboxyamide derivatives were identified. In particular, compounds 9a and 10a showed a potent inhibitory activity against gelatinase B and good selectivity over the other MMPs considered in this study. The identified compounds could represent novel potential candidates as therapeutic agents.

  4. An algorithmic calibration approach to identify globally optimal parameters for constraining the DayCent model

    SciTech Connect

    Rafique, Rashid; Kumar, Sandeep; Luo, Yiqi; Kiely, Gerard; Asrar, Ghassem R.

    2015-02-01

    The accurate calibration of complex biogeochemical models is essential for the robust estimation of soil greenhouse gases (GHG) as well as other environmental conditions and parameters that are used in research and policy decisions. DayCent is a popular biogeochemical model used both nationally and internationally for this purpose. Despite DayCent’s popularity, its complex parameter estimation is often based on experts’ knowledge, which is somewhat subjective. In this study we used the inverse modelling parameter estimation software (PEST) to calibrate the DayCent model based on sensitivity and identifiability analysis. Using previously published N2O and crop yield data as a basis of our calibration approach, we found that half of the 140 parameters used in this study were the primary drivers of calibration differences (i.e. the most sensitive) and the remaining parameters could not be identified given the data set and parameter ranges we used in this study. The post-calibration results showed improvement over the pre-calibration parameter set, based on a decrease in residual differences of 79% for N2O fluxes and 84% for crop yield, and an increase in the coefficient of determination of 63% for N2O fluxes and 72% for corn yield. The results of our study suggest that future studies need to better characterize germination temperature, number of degree-days and temperature dependency of plant growth; these processes were highly sensitive and could not be adequately constrained by the data used in our study. Furthermore, the sensitivity and identifiability analysis was helpful in providing deeper insight for important processes and associated parameters that can lead to further improvement in calibration of the DayCent model.

  5. An Integrated Multiomics Approach to Identify Candidate Antigens for Serodiagnosis of Human Onchocerciasis.

    PubMed

    McNulty, Samantha N; Rosa, Bruce A; Fischer, Peter U; Rumsey, Jeanne M; Erdmann-Gilmore, Petra; Curtis, Kurt C; Specht, Sabine; Townsend, R Reid; Weil, Gary J; Mitreva, Makedonka

    2015-12-01

    Improved diagnostic methods are needed to support ongoing efforts to eliminate onchocerciasis (river blindness). This study used an integrated approach to identify adult female Onchocerca volvulus antigens that can be explored for developing serodiagnostic tests. The first step was to develop a detailed multi-omics database of all O. volvulus proteins deduced from the genome, gene transcription data for different stages of the parasite including eight individual female worms (providing gene expression information for 94.8% of all protein coding genes), and the adult female worm proteome (detecting 2126 proteins). Next, female worm proteins were purified with IgG antibodies from onchocerciasis patients and identified using LC-MS with a high-resolution hybrid quadrupole-time-of-flight mass spectrometer. A total of 241 immunoreactive proteins were identified among those bound by IgG from infected individuals but not IgG from uninfected controls. These included most of the major diagnostic antigens described over the past 25 years plus many new candidates. Proteins of interest were prioritized for further study based on a lack of conservation with orthologs in the human host and other helminthes, their expression pattern across the life cycle, and their consistent expression among individual female worms. Based on these criteria, we selected 33 proteins that should be carried forward for testing as serodiagnostic antigens to supplement existing diagnostic tools. These candidates, together with the extensive pan-omics dataset generated in this study are available to the community (http://nematode.net) to facilitate basic and translational research on onchocerciasis.

  6. A phase coherence approach to identifying co-located earthquakes and tremor

    NASA Astrophysics Data System (ADS)

    Hawthorne, J. C.; Ampuero, J.-P.

    2017-01-01

    We present and use a phase coherence approach to identify seismic signals that have similar path effects but different source time functions: co-located earthquakes and tremor. The method used is a phase coherence-based implementation of empirical matched field processing, modified to suit tremor analysis. It works by comparing the frequency-domain phases of waveforms generated by two sources recorded at multiple stations. We first cross-correlate the records of the two sources at a single station. If the sources are co-located, this cross-correlation eliminates the phases of the Green's function. It leaves the relative phases of the source time functions, which should be the same across all stations so long as the spatial extent of the sources is small compared with the seismic wavelength. We therefore search for cross-correlation phases that are consistent across stations as an indication of co-located sources. We also introduce a method to obtain relative locations between the two sources, based on back-projection of inter-station phase coherence. We apply this technique to analyze two tremor-like signals that are thought to be composed of a number of earthquakes. First, we analyze a 20-second-long seismic precursor to a M 3.9 earthquake in central Alaska. The analysis locates the precursor to within 2 km of the mainshock, and it identifies several bursts of energy (potentially foreshocks or groups of foreshocks) within the precursor. Second, we examine several minutes of volcanic tremor prior to an eruption at Redoubt Volcano. We confirm that the tremor source is located close to repeating earthquakes identified earlier in the tremor sequence. The amplitude of the tremor diminishes about 30 seconds before the eruption, but the phase coherence results suggest that the tremor may persist at some level through this final interval.
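    The inter-station consistency check described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the synthetic spectra, and the particular statistic (the magnitude of the station-averaged unit cross-spectrum phasor, averaged over frequency) are all assumptions chosen for illustration. Each record is modelled in the frequency domain as a Green's function multiplied by a source spectrum, so that cross-correlating two records from the same location cancels the Green's function phase.

```python
import cmath
import math
import random


def unit_phasor(z):
    """Return z scaled to unit magnitude (0 if z is exactly zero)."""
    mag = abs(z)
    return z / mag if mag > 0 else 0j


def interstation_phase_coherence(records_a, records_b):
    """Average over frequency of |station-mean cross-spectrum phasor|.

    records_a[k][f] and records_b[k][f] are complex spectra of the two
    signals at station k and frequency index f (hypothetical layout).
    """
    n_stations = len(records_a)
    n_freqs = len(records_a[0])
    total = 0.0
    for f in range(n_freqs):
        # Cross-spectrum phase at each station: phase(A) minus phase(B).
        phasors = [
            unit_phasor(records_a[k][f] * records_b[k][f].conjugate())
            for k in range(n_stations)
        ]
        total += abs(sum(phasors) / n_stations)
    return total / n_freqs


def random_spectrum(rng, n):
    """Synthetic spectrum with random phase and non-zero amplitude."""
    return [
        cmath.rect(rng.uniform(0.5, 1.5), rng.uniform(-math.pi, math.pi))
        for _ in range(n)
    ]


def record(green, source):
    """Record spectrum at each station: Green's function times source."""
    return [[g * s for g, s in zip(station, source)] for station in green]


rng = random.Random(0)  # fixed seed so the synthetic demo is repeatable
n_stations, n_freqs = 8, 32
source_a = random_spectrum(rng, n_freqs)  # two different source time functions
source_b = random_spectrum(rng, n_freqs)
green_1 = [random_spectrum(rng, n_freqs) for _ in range(n_stations)]  # location 1
green_2 = [random_spectrum(rng, n_freqs) for _ in range(n_stations)]  # location 2

# Same Green's functions (co-located) versus different ones (separated).
coh_same = interstation_phase_coherence(
    record(green_1, source_a), record(green_1, source_b))
coh_diff = interstation_phase_coherence(
    record(green_1, source_a), record(green_2, source_b))
print(f"co-located: {coh_same:.3f}, separated: {coh_diff:.3f}")
```

    In this synthetic setup the co-located pair yields a coherence close to 1, because every station sees the same relative source phase, while the separated pair yields a much lower value. Real data would additionally require windowing, noise handling, and a significance threshold, none of which are sketched here.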

  7. SU-E-J-212: Identifying Bones From MRI: A Dictionary Learning and Sparse Regression Approach

    SciTech Connect

    Ruan, D; Yang, Y; Cao, M; Hu, P; Low, D

    2014-06-01

    Purpose: To develop an efficient and robust scheme to identify bony anatomy based on MRI-only simulation images. Methods: MRI offers important soft tissue contrast and functional information, yet its lack of correlation to electron-density has placed it as an auxiliary modality to CT in radiotherapy simulation and adaptation. An effective scheme to identify bony anatomy is an important first step towards MR-only simulation/treatment paradigm and would satisfy most practical purposes. We utilize a UTE acquisition sequence to achieve visibility of the bone. In contrast to manual + bulk or registration-based approaches to identifying bones, we propose a novel learning-based approach for improved robustness to MR artefacts and environmental changes. Specifically, local information is encoded with an MR image patch, and the corresponding label is extracted (during training) from simulation CT aligned to the UTE. Within each class (bone vs. nonbone), an overcomplete dictionary is learned so that typical patches within the proper class can be represented as a sparse combination of the dictionary entries. For testing, an acquired UTE-MRI is divided into patches using a sliding scheme, where each patch is sparsely regressed against both bone and nonbone dictionaries, and subsequently assigned to the class with the smaller residual. Results: The proposed method has been applied to the pilot site of brain imaging and has shown generally good performance, with a Dice similarity coefficient of greater than 0.9 in a cross-validation study using 4 datasets. Importantly, it is robust towards consistent foreign objects (e.g., headset) and the artefacts related to Gibbs ringing and field heterogeneity. Conclusion: A learning perspective has been developed for inferring bone structures based on UTE MRI. The imaging setting is subject to minimal motion effects and the post-processing is efficient. The improved efficiency and robustness enables a first translation to MR-only routine. The scheme
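    The test-time rule in this abstract (regress a patch sparsely against each class dictionary and pick the class with the smaller residual) can be sketched as follows. This is an illustrative toy, not the authors' method: the abstract does not say which sparse solver was used, so greedy matching pursuit is swapped in as an assumption, and the 4-element "patches", the hand-written dictionaries, and all function names are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a non-zero vector to unit Euclidean norm."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def sparse_residual(patch, dictionary, n_atoms=2):
    """Greedy matching pursuit with at most n_atoms selections.

    dictionary is a list of unit-norm atoms; returns the norm of the
    part of the patch left unexplained after the greedy fit.
    """
    residual = list(patch)
    for _ in range(n_atoms):
        # Pick the atom most correlated with the current residual.
        best = max(dictionary, key=lambda atom: abs(dot(residual, atom)))
        coeff = dot(residual, best)
        residual = [r - coeff * a for r, a in zip(residual, best)]
    return math.sqrt(dot(residual, residual))


def classify_patch(patch, bone_dict, nonbone_dict, n_atoms=2):
    """Assign the patch to the class whose dictionary leaves the smaller residual."""
    r_bone = sparse_residual(patch, bone_dict, n_atoms)
    r_nonbone = sparse_residual(patch, nonbone_dict, n_atoms)
    return "bone" if r_bone < r_nonbone else "nonbone"


# Hypothetical toy dictionaries: "bone" atoms live in the first two
# components, "nonbone" atoms in the last two.
bone_dict = [normalize(a) for a in ([1, 1, 0, 0], [1, -1, 0, 0], [1, 0, 0, 0])]
nonbone_dict = [normalize(a) for a in ([0, 0, 1, 1], [0, 0, 1, -1], [0, 0, 0, 1])]

label_1 = classify_patch([0.9, 0.7, 0.1, 0.0], bone_dict, nonbone_dict)
label_2 = classify_patch([0.0, 0.1, 0.8, 0.6], bone_dict, nonbone_dict)
print(label_1, label_2)
```

    In practice the dictionaries would be learned from CT-labelled training patches rather than written by hand, and a more careful sparse solver would likely replace the plain matching pursuit used here.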

  8. An immunohistochemical approach to identify the sex of young marine turtles.

    PubMed

    Tezak, Boris M; Guthrie, Kathleen; Wyneken, Jeanette

    2017-03-13

    Marine turtles exhibit temperature-dependent sex determination (TSD). During critical periods of embryonic development, the nest's thermal environment directs whether an embryo will develop as a male or female. At warmer sand temperatures nests tend to produce female-biased sex ratios. The rapid increase of global temperature highlights the need for a clear assessment of its effects on sea turtle sex ratios. However, estimating hatchling sex ratios at rookeries remains imprecise due to the lack of sexual dimorphism in young marine turtles. We rely mainly upon laparoscopic procedures to verify hatchling sex; however, in some species, morphological sex can be ambiguous even at the histological level. Recent studies using immunohistochemical (IHC) techniques identified that embryonic snapping turtle (Chelydra serpentina) ovaries over-expressed a particular Cold-induced RNA Binding Protein in comparison to testes. This feature allows the identification of females vs. males. We modified this technique to successfully identify the sexes of loggerhead sea turtle (Caretta caretta) hatchlings, and independently confirmed the results by standard histological and laparoscopic methods that reliably identify sex in this species. We next tested the CIRBP IHC method on gonad samples from leatherback turtles (Dermochelys coriacea). Leatherbacks display delayed gonad differentiation, when compared to other sea turtles, making hatchling gonads difficult to sex using standard H and E stain histology. The IHC approach was successful in both C. caretta and D. coriacea samples, offering a much-needed tool to establish baseline hatchling sex ratios, particularly for assessing impacts of climate change effects on leatherback turtle hatchlings and sea turtle demographics.

  9. An approach to identify microRNAs involved in neuropathic pain following a peripheral nerve injury

    PubMed Central

    Norcini, Monica; Sideris, Alexandra; Martin Hernandez, Lourdes A.; Zhang, Jin; Blanck, Thomas J. J.; Recio-Pinto, Esperanza

    2014-01-01

    Peripheral nerve injury alters the expression of hundreds of proteins in dorsal root ganglia (DRG). Targeting some of these proteins has led to successful treatments for acute pain, but not for sustained post-operative neuropathic pain. The latter may require targeting multiple proteins. Since a single microRNA (miR) can affect the expression of multiple proteins, here, we describe an approach to identify chronic neuropathic pain-relevant miRs. We used two variants of the spared nerve injury (SNI): Sural-SNI and Tibial-SNI and found distinct pain phenotypes between the two. Both models induced strong mechanical allodynia, but only Sural-SNI rats maintained strong mechanical and cold allodynia, as previously reported. In contrast, we found that Tibial-SNI rats recovered from mechanical allodynia and never developed cold allodynia. Since both models involve nerve injury, we increased the probability of identifying differentially regulated miRs that correlated with the quality and magnitude of neuropathic pain and decreased the probability of detecting miRs that are solely involved in neuronal regeneration. We found seven such miRs in L3-L5 DRG. The expression of these miRs increased in Tibial-SNI. These miRs displayed a lower level of expression in Sural-SNI, with four having levels lower than those in sham animals. Bioinformatic analysis of how these miRs could affect the expression of some ion channels supports the view that, following a peripheral nerve injury, the increase of the seven miRs may contribute to the recovery from neuropathic pain while the decrease of four of them may contribute to the development of chronic neuropathic pain. The approach used resulted in the identification of a small number of potentially neuropathic pain relevant miRs. Additional studies are required to investigate whether manipulating the expression of the identified miRs in primary sensory neurons can prevent or ameliorate chronic neuropathic pain following peripheral nerve

  10. Engineering Approach to Identifying Patients with Colon Tumors on the Basis of Electrophotonic Imaging Technique Data

    PubMed Central

    Yakovleva, E.G.; Korotkov, K.G.; Fedorov, E.D.; Ivanova, E.V.; Plahov, R.V.; Belonosov, S.S.

    2016-01-01

    Background: Colonic neoplasms are a serious health problem today. Screening methods play an important role in diagnosing the disease. Colorectal cancer screening is a complex undertaking with various options, which require considerable effort from both the doctor and the patient, including the use of sedatives and the need for an assistant during some procedures such as colonoscopy. This is why it is very important to find a method by which one can make a diagnosis quickly, easily, and painlessly. Methods: We investigated the ability to identify patients with tumors of the colon using the Electrophotonic Imaging (EPI) technique, as well as its use for differential diagnosis of tumors of the colon by their morphology, size and quantity. The most significant EPI-graphy parameters for separating the control group from the group of patients with tumors of the colon were selected. A total of 137 people were studied with the EPI camera, with ages ranging from 16 to 86 years, including 49 males and 88 females. Based on the results of the colonoscopy and histological findings, all subjects were divided into 2 groups: a control group of 55 people (9 males, 46 females) and patients with tumors (benign or malignant) of the colon, 82 people (40 males, 42 females). Then all subjects were divided into smaller groups based on morphology, size, number of tumors and localization. Results: Based on the identified indicators, decision rules to identify patients with tumors of the colon were constructed. The specificity of the resulting function was 80.0% and sensitivity 75.6%. A decision rule was also built with logistic regression. The specificity of the resulting function was 78.2% and sensitivity 90.0%. The accuracy of this approach was higher than that of discriminant analysis. Conclusions: The results of this study have proven the ability to identify patients with tumors of the colon using EPI technology, as well as use it for

  11. A novel data-mining approach leveraging social media to monitor consumer opinion of sitagliptin.

    PubMed

    Akay, Altug; Dragomir, Andrei; Erlandsson, Björn-Erik

    2015-01-01

    A novel data mining method was developed to gauge the experience of the drug Sitagliptin (trade name Januvia) by patients with diabetes mellitus type 2. To this end, we devised a two-step analysis framework. Initial exploratory analysis using self-organizing maps was performed to determine structures based on user opinions among the forum posts. The results were a compilation of user clusters and their correlated (positive or negative) opinions of the drug. Subsequent modeling using network analysis methods was used to determine influential users among the forum members. These findings can open new avenues of research into rapid data collection, feedback, and analysis that can enable improved outcomes and solutions for public health and important feedback for the manufacturer.

  12. Heavy metal contamination and its indexing approach for groundwater of Goa mining region, India

    NASA Astrophysics Data System (ADS)

    Singh, Gurdeep; Kamal, Rakesh Kant

    2016-06-01

    The objective of the study is to reveal the seasonal variations in the groundwater quality with respect to heavy metal contamination. To get the extent of the heavy metals contamination, groundwater samples were collected from 45 different locations in and around Goa mining area during the monsoon and post-monsoon seasons. The concentration of heavy metals, such as lead, copper, manganese, zinc, cadmium, iron, and chromium, were determined using atomic absorption spectrophotometer. Most of the samples were found within limit except for Fe content during the monsoon season at two sampling locations which is above desirable limit, i.e., 300 µg/L as per Indian drinking water standard. The data generated were used to calculate the heavy metal pollution index (HPI) for groundwater. The mean values of HPI were 1.5 in the monsoon season and 2.1 in the post-monsoon season, and these values are well below the critical index limit of 100.
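
    The HPI calculation mentioned in this abstract can be sketched as follows. The abstract does not give the formula, so this sketch assumes the commonly cited weighted-average form, HPI = sum(W_i * Q_i) / sum(W_i), with unit weight W_i = 1 / S_i and sub-index Q_i = 100 * |M_i - I_i| / (S_i - I_i), where M_i is the measured concentration, S_i the standard (permissible) value and I_i the ideal value. The function name and all inputs are hypothetical, and the authors' exact weighting may differ.

```python
def heavy_metal_pollution_index(measured, standard, ideal):
    """Weighted-average HPI (assumed form, not taken from the abstract).

    measured, standard, ideal: dicts mapping metal name -> concentration,
    all in the same unit (e.g. micrograms per litre).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for metal, m in measured.items():
        s = standard[metal]
        i = ideal[metal]
        weight = 1.0 / s                           # unit weight W_i
        sub_index = 100.0 * abs(m - i) / (s - i)   # sub-index Q_i
        weighted_sum += weight * sub_index
        weight_total += weight
    return weighted_sum / weight_total
```

    Under this form, a sample in which every metal sits exactly at its standard value scores 100, which matches the critical index limit quoted in the abstract.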

  13. Identifying typical patterns of vulnerability: A 5-step approach based on cluster analysis

    NASA Astrophysics Data System (ADS)

    Sietz, Diana; Lüdeke, Matthias; Kok, Marcel; Lucas, Paul; Carsten, Walther; Janssen, Peter

    2013-04-01

    Specific processes that shape the vulnerability of socio-ecological systems to climate, market and other stresses derive from diverse background conditions. Within the multitude of vulnerability-creating mechanisms, distinct processes recur in various regions, inspiring research on typical patterns of vulnerability. The vulnerability patterns display typical combinations of the natural and socio-economic properties that shape a system's vulnerability to particular stresses. Based on the identification of a limited number of vulnerability patterns, pattern analysis provides an efficient approach to improving our understanding of vulnerability and decision-making for vulnerability reduction. However, current pattern analyses often miss explicit descriptions of their methods and pay insufficient attention to the validity of their groupings. Therefore, the question arises of how to identify typical vulnerability patterns in order to enhance our understanding of a system's vulnerability to stresses. A cluster-based pattern recognition applied at global and local levels is scrutinised with a focus on an applicable methodology and practicable insights. Taking the example of drylands, this presentation demonstrates the conditions necessary to identify typical vulnerability patterns. They are summarised in five methodological steps comprising the elicitation of relevant cause-effect hypotheses and the quantitative indication of mechanisms as well as an evaluation of robustness, a validation and a ranking of the identified patterns. Reflecting scale-dependent opportunities, a global study is able to support decision-making with insights into the up-scaling of interventions when available funds are limited. In contrast, local investigations encourage an outcome-based validation. This constitutes a crucial step in establishing the credibility of the patterns and hence their suitability for informing extension services and individual decisions. In this respect, working at

  14. Neuroimaging and Neuromodulation: Complementary Approaches for Identifying the Neuronal Correlates of Tinnitus

    PubMed Central

    Langguth, Berthold; Schecklmann, Martin; Lehner, Astrid; Landgrebe, Michael; Poeppl, Timm Benjamin; Kreuzer, Peter Michal; Schlee, Winfried; Weisz, Nathan; Vanneste, Sven; De Ridder, Dirk

    2012-01-01

    An inherent limitation of functional imaging studies is their correlational approach. More information about critical contributions of specific brain regions can be gained by focal transient perturbation of neural activity in specific regions with non-invasive focal brain stimulation methods. Functional imaging studies have revealed that tinnitus is related to alterations in neuronal activity of central auditory pathways. Modulation of neuronal activity in auditory cortical areas by repetitive transcranial magnetic stimulation (rTMS) can reduce tinnitus loudness and, if applied repeatedly, exerts therapeutic effects, confirming the relevance of auditory cortex activation for tinnitus generation and persistence. Measurements of oscillatory brain activity before and after rTMS demonstrate that the same stimulation protocol has different effects on brain activity in different patients, presumably related to interindividual differences in baseline activity in the clinically heterogeneous study cohort. In addition to alterations in auditory pathways, imaging techniques also indicate the involvement of non-auditory brain areas, such as the fronto-parietal “awareness” network and the non-tinnitus-specific distress network consisting of the anterior cingulate cortex, anterior insula, and amygdala. Involvement of the hippocampus and the parahippocampal region putatively reflects the relevance of memory mechanisms in the persistence of the phantom percept and the associated distress. Preliminary studies targeting the dorsolateral prefrontal cortex, the dorsal anterior cingulate cortex, and the parietal cortex with rTMS and with transcranial direct current stimulation confirm the relevance of the mentioned non-auditory networks. Available data indicate the important value added by brain stimulation as a complementary approach to neuroimaging for identifying the neuronal correlates of the various clinical aspects of tinnitus. PMID:22509155

  15. Geologic considerations in underground coal mining system design

    NASA Technical Reports Server (NTRS)

    Camilli, F. A.; Maynard, D. P.; Mangolds, A.; Harris, J.

    1981-01-01

    Geologic characteristics of coal resources which may impact new extraction technologies are identified and described to aid system designers and planners in their task of designing advanced coal extraction systems for the central Appalachian region. These geologic conditions are then organized into a matrix identified as the baseline mine concept. A sample region, eastern Kentucky, is analyzed using both the developed baseline mine concept and the traditional geologic investigative approach.

  16. A metabolite profiling approach to identify biomarkers of flavonoid intake in humans.

    PubMed

    Loke, Wai Mun; Jenner, Andrew M; Proudfoot, Julie M; McKinley, Allan J; Hodgson, Jonathan M; Halliwell, Barry; Croft, Kevin D

    2009-12-01

    Flavonoids are phytochemicals that are widespread in the human diet. Despite limitations in their bioavailability, experimental and epidemiological data suggest health benefits of flavonoid consumption. Valid biomarkers of flavonoid intake may be useful for estimating exposure in a range of settings. However, to date, few useful flavonoid biomarkers have been identified. In this study, we used a metabolite profiling approach to examine the aromatic and phenolic profile of plasma and urine of healthy men after oral consumption of 200 mg of the pure flavonoids, quercetin, (-)-epicatechin, and epigallocatechin gallate, which represent major flavonoid constituents in the diet. Following enzymatic hydrolysis, 71 aromatic compounds were quantified in plasma and urine at 2 and 5 h, respectively, after flavonoid ingestion. Plasma concentrations of different aromatic compounds ranged widely, from 0.01 to 10 micromol/L, with variation among volunteers. None of the aromatic compounds was significantly elevated in plasma 2 h after consumption of either flavonoid compared with water placebo. This indicates that flavonoid-derived aromatic compounds are not responsible for the acute physiological effects reported within 2 h in previous human intervention studies involving flavonoids or flavonoid-rich food consumption. These effects are more likely due to absorption of the intact flavonoid. Our urine analysis suggested that urinary 4-ethylphenol, benzoic acid, and 4-ethylbenzoic acid may be potential biomarkers of quercetin intake and 1,3,5-trimethoxybenzene, 4-O-methylgallic acid, 3-O-methylgallic acid, and gallic acid may be potential markers of epigallocatechin gallate intake. Potential biomarkers of (-)-epicatechin were not identified. These urinary biomarkers may provide an accurate indication of flavonoid exposure.

  17. Novel phenotypes and loci identified through clinical genomics approaches to pediatric cataract.

    PubMed

    Patel, Nisha; Anand, Deepti; Monies, Dorota; Maddirevula, Sateesh; Khan, Arif O; Algoufi, Talal; Alowain, Mohammed; Faqeih, Eissa; Alshammari, Muneera; Qudair, Ahmed; Alsharif, Hadeel; Aljubran, Fatimah; Alsaif, Hessa S; Ibrahim, Niema; Abdulwahab, Firdous M; Hashem, Mais; Alsedairy, Haifa; Aldahmesh, Mohammed A; Lachke, Salil A; Alkuraya, Fowzan S

    2017-02-01

    Pediatric cataract is highly heterogeneous clinically and etiologically. While mostly isolated, cataract can be part of many multisystem disorders, further complicating the diagnostic process. In this study, we applied genomic tools in the form of a multi-gene panel as well as whole-exome sequencing on unselected cohort of pediatric cataract (166 patients from 74 families). Mutations in previously reported cataract genes were identified in 58% for a total of 43 mutations, including 15 that are novel. GEMIN4 was independently mutated in families with a syndrome of cataract, global developmental delay with or without renal involvement. We also highlight a recognizable syndrome that resembles galactosemia (a fulminant infantile liver disease with cataract) caused by biallelic mutations in CYP51A1. A founder mutation in RIC1 (KIAA1432) was identified in patients with cataract, brain atrophy, microcephaly with or without cleft lip and palate. For non-syndromic pediatric cataract, we map a novel locus in a multiplex consanguineous family on 4p15.32 where exome sequencing revealed a homozygous truncating mutation in TAPT1. We report two further candidates that are biallelically inactivated each in a single cataract family: TAF1A (cataract with global developmental delay) and WDR87 (non-syndromic cataract). In addition to positional mapping data, we use iSyTE developmental lens expression and gene-network analysis to corroborate the proposed link between the novel candidate genes and cataract. Our study expands the phenotypic, allelic and locus heterogeneity of pediatric cataract. The high diagnostic yield of clinical genomics supports the adoption of this approach in this patient group.

  18. An observation-based approach to identify local natural dust events from routine aerosol ground monitoring

    NASA Astrophysics Data System (ADS)

    Tong, D. Q.; Dan, M.; Wang, T.; Lee, P.

    2012-02-01

    Dust is a major component of atmospheric aerosols in many parts of the world. Although there exist many routine aerosol monitoring networks, it is often difficult to obtain dust records from these networks, because these monitors are either deployed far away from dust active regions (most likely collocated with dense population) or contaminated by anthropogenic sources and other natural sources, such as wildfires and vegetation detritus. Here we propose a new approach to identify local dust events relying solely on aerosol mass and composition from general-purpose aerosol measurements. Through analyzing the chemical and physical characteristics of aerosol observations during satellite-detected dust episodes, we select five indicators to be used to identify local dust records: (1) high PM10 concentrations; (2) low PM2.5/PM10 ratio; (3) higher concentrations and percentage of crustal elements; (4) lower percentage of anthropogenic pollutants; and (5) low enrichment factors of anthropogenic elements. After establishing these identification criteria, we conduct hierarchical cluster analysis for all validated aerosol measurement data over 68 IMPROVE sites in the Western United States. A total of 182 local dust events were identified over 30 of the 68 locations from 2000 to 2007. These locations are either close to the four US Deserts, namely the Great Basin Desert, the Mojave Desert, the Sonoran Desert, and the Chihuahuan Desert, or in the high wind power region (Colorado). During the eight-year study period, the total number of dust events displays an interesting four-year activity cycle (one in 2000-2003 and the other in 2004-2007). The years of 2003, 2002 and 2007 are the three most active dust periods, with 46, 31 and 24 recorded dust events, respectively, while the years of 2000, 2004 and 2005 are the calmest periods, all with single-digit dust records. Among these deserts, the Chihuahuan Desert (59 cases) and the Sonoran Desert (62 cases) are by far the most active

  19. Genetic Programming and Frequent Itemset Mining to Identify Feature Selection Patterns of iEEG and fMRI Epilepsy Data

    PubMed Central

    Smart, Otis; Burrell, Lauren

    2014-01-01

    Pattern classification for intracranial electroencephalogram (iEEG) and functional magnetic resonance imaging (fMRI) signals has furthered epilepsy research toward understanding the origin of epileptic seizures and localizing dysfunctional brain tissue for treatment. Prior research has demonstrated that implicitly selecting features with a genetic programming (GP) algorithm more effectively determined the proper features to discern biomarker and non-biomarker interictal iEEG and fMRI activity than conventional feature selection approaches. However for each the iEEG and fMRI modalities, it is still uncertain whether the stochastic properties of indirect feature selection with a GP yield (a) consistent results within a patient data set and (b) features that are specific or universal across multiple patient data sets. We examined the reproducibility of implicitly selecting features to classify interictal activity using a GP algorithm by performing several selection trials and subsequent frequent itemset mining (FIM) for separate iEEG and fMRI epilepsy patient data. We observed within-subject consistency and across-subject variability with some small similarity for selected features, indicating a clear need for patient-specific features and possible need for patient-specific feature selection or/and classification. For the fMRI, using nearest-neighbor classification and 30 GP generations, we obtained over 60% median sensitivity and over 60% median selectivity. For the iEEG, using nearest-neighbor classification and 30 GP generations, we obtained over 65% median sensitivity and over 65% median selectivity except one patient. PMID:25580059
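
    The frequent itemset mining step described in this abstract can be illustrated with a small brute-force sketch. It assumes each GP selection trial is reduced to the set of feature names it selected, and that an itemset is "frequent" when its support (the fraction of trials containing it) reaches a chosen threshold. The function name, the feature names used with it, and the size cap are hypothetical; the abstract does not say which FIM algorithm the authors used.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(trials, min_support, max_size=2):
    """Return {itemset: support} for itemsets found in enough trials.

    trials: list of iterables of feature names, one per selection trial.
    min_support: minimum fraction of trials that must contain the itemset.
    max_size: largest itemset size to enumerate (brute force, so keep small).
    """
    counts = Counter()
    for selected in trials:
        items = sorted(set(selected))  # canonical order, duplicates removed
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    n_trials = len(trials)
    return {
        itemset: count / n_trials
        for itemset, count in counts.items()
        if count / n_trials >= min_support
    }
```

    Running this per patient and comparing the resulting itemsets is one way to look at the within-subject consistency and across-subject variability that the abstract reports.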

  20. Identifying human disease genes: advances in molecular genetics and computational approaches.

    PubMed

    Bakhtiar, S M; Ali, A; Baig, S M; Barh, D; Miyoshi, A; Azevedo, V

    2014-07-04

    The Human Genome Project is one of the significant achievements that have provided detailed insight into our genetic legacy. During the last two decades, biomedical investigations have gathered a considerable body of evidence by detecting more than 2000 disease genes. Despite the imperative advances in the genetic understanding of various diseases, the pathogenesis of many others remains obscure. With recent advances, the laborious methodologies used to identify DNA variations are replaced by direct sequencing of genomic DNA to detect genetic changes. The ability to perform such studies depends equally on the development of high-throughput and economical genotyping methods. Currently, for basically every disease whose origin is still unknown, genetic approaches are available, either pedigree-dependent or -independent, with the capacity to elucidate fundamental disease mechanisms. Computer algorithms and programs for linkage analysis have formed the foundation for many disease gene detection projects; similarly, databases of clinical findings have been widely used to support diagnostic decisions in dysmorphology and general human disease. For every disease type, genome sequence variations, particularly single nucleotide polymorphisms, are mapped by comparing the genetic makeup of case and control groups. Methods that predict the effects of polymorphisms on protein stability are useful for the identification of possible disease associations, whereas structural effects can be assessed using methods to predict stability changes in proteins using sequence and/or structural information.

  1. Machine learning approach identifies new pathways associated with demyelination in a viral model of multiple sclerosis.

    PubMed

    Ulrich, Reiner; Kalkuhl, Arno; Deschl, Ulrich; Baumgärtner, Wolfgang

    2010-01-01

    Theiler's murine encephalomyelitis is an experimentally virus-induced inflammatory demyelinating disease of the spinal cord, displaying clinical and pathological similarities to chronic progressive multiple sclerosis. The aim of this study was to identify pathways associated with chronic demyelination using an assumption-free combined microarray and immunohistology approach. Movement control as determined by rotarod assay significantly worsened in Theiler's murine encephalomyelitis virus-infected SJL/J mice from 42 to 196 days after infection (dpi). In the spinal cords, inflammatory changes were detected 14 to 196 dpi, and demyelination progressively increased from 42 to 196 dpi. Microarray analysis revealed 1001 differentially expressed genes over the study period. The dominating changes as revealed by k-means and functional annotation clustering included up-regulations related to intrathecal antibody production and antigen processing and presentation via major histocompatibility class II molecules. A random forest machine learning algorithm revealed that down-regulated lipid and cholesterol biosynthesis, differentially expressed neurite morphogenesis and up-regulated toll-like receptor-4-induced pathways were intimately associated with demyelination as measured by immunohistology. Conclusively, although transcriptional changes were dominated by the adaptive immune response, the main pathways associated with demyelination included up-regulation of toll-like receptor 4 and down-regulation of cholesterol biosynthesis. Cholesterol biosynthesis is a rate limiting step of myelination and its down-regulation is suggested to be involved in chronic demyelination by an inhibition of remyelination.

  2. A Screening Approach for Identifying Gliadin Neutralizing Antibodies on Epithelial Intestinal Caco-2 Cells.

    PubMed

    Hundsberger, Harald; Koppensteiner, Anita; Hofmann, Elisabeth; Ripper, Doris; Pflüger, Maren; Stadlmann, Valerie; Klein, Christian Theodor; Kreiseder, Birgit; Katzlinger, Michael; Eger, Andreas; Forster, Florian; Missbichler, Albert; Wiesner, Christoph

    2017-03-01

    Celiac disease (CD) is a chronic inflammatory condition caused by the ingestion of gliadin-containing food in genetically susceptible individuals. Undigested peptides of gliadin exert various effects, including increased intestinal permeability and inflammation in the small intestine. Although many therapeutic approaches are in development, a gluten-free diet is the only effective treatment for CD. Affecting at least 1% of the population in industrialized countries, it is important to generate therapeutic options against CD. Here, we describe the establishment of a high-throughput screening (HTS) platform based on AlphaLISA and electrical cell-substrate impedance sensing (ECIS) technology for the identification of anti-inflammatory and barrier-protective compounds in human enterocytes after pepsin-trypsin-digested gliadin (PT-gliadin) treatment. Our results show that the combination of these HTS technologies enables fast, reliable, simple, and label-free screening of IgY antibodies against PT-gliadin. Using this platform, we have identified a new chicken anti-PT-gliadin IgY antibody as a potential anti-CD agent.

  3. Identifying contamination with advanced visualization and analysis practices: metagenomic approaches for eukaryotic genome assemblies.

    PubMed

    Delmont, Tom O; Eren, A Murat

    2016-01-01

    High-throughput sequencing provides a fast and cost-effective mean to recover genomes of organisms from all domains of life. However, adequate curation of the assembly results against potential contamination of non-target organisms requires advanced bioinformatics approaches and practices. Here, we re-analyzed the sequencing data generated for the tardigrade Hypsibius dujardini, and created a holistic display of the eukaryotic genome assembly using DNA data originating from two groups and eleven sequencing libraries. By using bacterial single-copy genes, k-mer frequencies, and coverage values of scaffolds we could identify and characterize multiple near-complete bacterial genomes from the raw assembly, and curate a 182 Mbp draft genome for H. dujardini supported by RNA-Seq data. Our results indicate that most contaminant scaffolds were assembled from Moleculo long-read libraries, and most of these contaminants have differed between library preparations. Our re-analysis shows that visualization and curation of eukaryotic genome assemblies can benefit from tools designed to address the needs of today's microbiologists, who are constantly challenged by the difficulties associated with the identification of distinct microbial genomes in complex environmental metagenomes.

  4. A network-based phenotype mapping approach to identify genes that modulate drug response phenotypes

    PubMed Central

    Cairns, Junmei; Ung, Choong Yong; da Rocha, Edroaldo Lummertz; Zhang, Cheng; Correia, Cristina; Weinshilboum, Richard; Wang, Liewei; Li, Hu

    2016-01-01

    To better address the problem of drug resistance during cancer chemotherapy and explore the possibility of manipulating drug response phenotypes, we developed a network-based phenotype mapping approach (P-Map) to identify gene candidates that, when perturbed, can alter sensitivity to drugs. We used basal transcriptomics data from a panel of human lymphoblastoid cell lines (LCL) to infer drug response networks (DRNs) that are responsible for conferring response phenotypes for anthracycline and taxane, two common anticancer agents used in clinics. We further tested selected gene candidates that interact with phenotypic differentially expressed genes (PDEGs), which are up-regulated genes in LCL for a given class of drug response phenotype in triple-negative breast cancer (TNBC) cells. Our results indicate that it is possible to manipulate a drug response phenotype, from resistant to sensitive or vice versa, by perturbing gene candidates in DRNs and suggest plausible mechanisms regulating directionality of drug response sensitivity. More important, the current work highlights a new way to formulate systems-based therapeutic design: supplementing therapeutics that aim to target disease culprits with phenotypic modulators capable of altering DRN properties with the goal to re-sensitize resistant phenotypes. PMID:27841317

  5. Identifying contamination with advanced visualization and analysis practices: metagenomic approaches for eukaryotic genome assemblies

    PubMed Central

    Delmont, Tom O.

    2016-01-01

    High-throughput sequencing provides a fast and cost-effective mean to recover genomes of organisms from all domains of life. However, adequate curation of the assembly results against potential contamination of non-target organisms requires advanced bioinformatics approaches and practices. Here, we re-analyzed the sequencing data generated for the tardigrade Hypsibius dujardini, and created a holistic display of the eukaryotic genome assembly using DNA data originating from two groups and eleven sequencing libraries. By using bacterial single-copy genes, k-mer frequencies, and coverage values of scaffolds we could identify and characterize multiple near-complete bacterial genomes from the raw assembly, and curate a 182 Mbp draft genome for H. dujardini supported by RNA-Seq data. Our results indicate that most contaminant scaffolds were assembled from Moleculo long-read libraries, and most of these contaminants have differed between library preparations. Our re-analysis shows that visualization and curation of eukaryotic genome assemblies can benefit from tools designed to address the needs of today’s microbiologists, who are constantly challenged by the difficulties associated with the identification of distinct microbial genomes in complex environmental metagenomes. PMID:27069789

  6. Identifying key interactions stabilizing DOF zinc finger-DNA complexes using in silico approaches.

    PubMed

    Hamzeh-Mivehroud, Maryam; Moghaddas-Sani, Hakimeh; Rahbar-Shahrouziasl, Mahdieh; Dastmalchi, Siavoush

    2015-10-07

    DOF (DNA-binding with one finger) proteins, a family of DNA-binding transcription factors, are members of zinc fingers unique to plants. They are associated with different plant specific phenomena including germination, dormancy, light and defense responses. Until now, there is no report of experimentally solved structure for DOF proteins, making empirical investigation of DOF-DNA interaction more challenging. It has been shown that comparative modeling can be used to reliably predict the three-dimensional (3D) model of structurally unknown proteins whenever a suitable template is available. Furthermore, current molecular mechanics force fields allow prediction of interaction energies for macromolecular complexes. Therefore, the approaches considered in this work were to model the 3D structures of DOF zinc fingers (ZFs) from Arabidopsis thaliana complexed with DNA molecule, to calculate their binding energies, to identify key interactions established between ZFs and DNA, and to determine the impact of the different interactions on the binding energies. The results were used to predict the binding affinities for the novel designed ZFs and may be used in engineering DNA binding proteins.

  7. Mixed Integer Linear Programming based machine learning approach identifies regulators of telomerase in yeast.

    PubMed

    Poos, Alexandra M; Maicher, André; Dieckmann, Anna K; Oswald, Marcus; Eils, Roland; Kupiec, Martin; Luke, Brian; König, Rainer

    2016-06-02

    Understanding telomere length maintenance mechanisms is central in cancer biology as their dysregulation is one of the hallmarks for immortalization of cancer cells. Important for this well-balanced control is the transcriptional regulation of the telomerase genes. We integrated Mixed Integer Linear Programming models into a comparative machine learning based approach to identify regulatory interactions that best explain the discrepancy of telomerase transcript levels in yeast mutants with deleted regulators showing aberrant telomere length, when compared to mutants with normal telomere length. We uncover novel regulators of telomerase expression, several of which affect histone levels or modifications. In particular, our results point to the transcription factors Sum1, Hst1 and Srb2 as being important for the regulation of EST1 transcription, and we validated the effect of Sum1 experimentally. We compiled our machine learning method leading to a user friendly package for R which can straightforwardly be applied to similar problems integrating gene regulator binding information and expression profiles of samples of e.g. different phenotypes, diseases or treatments.

  8. HDR: a statistical two-step approach successfully identifies disease genes in autosomal recessive families.

    PubMed

    Imai, Atsuko; Kohda, Masakazu; Nakaya, Akihiro; Sakata, Yasushi; Murayama, Kei; Ohtake, Akira; Lathrop, Mark; Okazaki, Yasushi; Ott, Jurg

    2016-11-01

    In the search for sequence variants underlying disease, commonly applied filtering steps usually result in a number of candidate variants that cannot further be narrowed down. In autosomal recessive families, disease usually occurs only in one generation so that genetic linkage analysis is unlikely to help. Because homozygous recessive mutations tend to be inherited together with flanking homozygous variants, we developed a statistical method to detect pathogenic variants in autosomal recessive families: We look for differences in patterns of homozygosity around candidate variants between patients and control individuals and expect that such differences are greater for pathogenic variants than random candidate variants. In six autosomal recessive mitochondrial disease families, in which pathogenic homozygous variants have already been identified, our approach succeeded in prioritizing pathogenic mutations. Our method is applicable to single patients from recessive families with at least a few dozen control individuals from the same population; it is easy to use and is highly effective for detecting causative mutations in autosomal recessive families.

  9. AN INTEGRATED NETWORK APPROACH TO IDENTIFYING BIOLOGICAL PATHWAYS AND ENVIRONMENTAL EXPOSURE INTERACTIONS IN COMPLEX DISEASES

    PubMed Central

    DARABOS, CHRISTIAN; QIU, JINGYA; MOORE, JASON H.

    2015-01-01

    Complex diseases are the result of intricate interactions between genetic, epigenetic and environmental factors. In previous studies, we used epidemiological and genetic data linking environmental exposure or genetic variants to phenotypic disease to construct Human Phenotype Networks and separately analyze the effects of both environment and genetic factors on disease interactions. To better capture the intricacies of the interactions between environmental exposure and the biological pathways in complex disorders, we integrate both aspects into a single “tripartite” network. Despite extensive research, the mechanisms by which chemical agents disrupt biological pathways are still poorly understood. In this study, we use our integrated network model to identify specific biological pathway candidates possibly disrupted by environmental agents. We conjecture that a higher number of co-occurrences between an environmental substance and biological pathway pair can be associated with a higher likelihood that the substance is involved in disrupting that pathway. We validate our model by demonstrating its ability to detect known arsenic and signal transduction pathway interactions and speculate on candidate cell-cell junction organization pathways disrupted by cadmium. The validation was supported by distinct publications of cell biology and genetic studies that associated environmental exposure to pathway disruption. The integrated network approach is a novel method for detecting the biological effects of environmental exposures. A better understanding of the molecular processes associated with specific environmental exposures will help in developing targeted molecular therapies for patients who have been exposed to the toxicity of environmental chemicals. PMID:26776169
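
    The co-occurrence conjecture in this abstract can be sketched with a toy count. The abstract does not define how co-occurrences are tallied, so this sketch assumes that a substance and a pathway co-occur once for every disease phenotype linked to both. The function name, both input mappings and any labels passed to it are hypothetical.

```python
from collections import Counter
from itertools import product


def cooccurrence_counts(disease_to_substances, disease_to_pathways):
    """Count (substance, pathway) pairs that share a disease phenotype.

    Both arguments map a disease name to an iterable of labels.
    Assumption: one shared disease contributes one co-occurrence.
    """
    counts = Counter()
    for disease, substances in disease_to_substances.items():
        pathways = disease_to_pathways.get(disease, ())
        for pair in product(set(substances), set(pathways)):
            counts[pair] += 1
    return counts
```

    Ranking the pairs with `counts.most_common()` would then put the candidates the conjecture treats as most likely first; a real analysis would also need to correct for how often each substance and pathway appears overall.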

  10. Novel Vaccine Candidates against Brucella melitensis Identified through Reverse Vaccinology Approach.

    PubMed

    Vishnu, Udayakumar S; Sankarasubramanian, Jagadesan; Gunasekaran, Paramasamy; Rajendhran, Jeyaprakash

    2015-11-01

    Global health therapeutics is a rapidly emerging facet of postgenomics medicine. In this connection, Brucella melitensis is an intracellular bacterium that causes the zoonotic infectious disease, brucellosis. Presently, no licensed vaccines are available for human brucellosis. Here, we report the identification of potential vaccine candidates against B. melitensis using a reverse vaccinology approach. Based on a systematic screening of exoproteome and secretome of B. melitensis 16 M, we identified eight proteins as potential vaccine candidates, including LPS-assembly protein LptD, a polysaccharide export protein, a cell surface protein, heme transporter BhuA, flagellin FliC, 7-alpha-hydroxysteroid dehydrogenase, immunoglobulin-binding protein EIBE, and hemagglutinin. Among these, the roles of BhuA and hemagglutinin in the virulence of Brucella are essential to establish infection. Roles of other proteins in the virulence are yet to be studied. Prediction of protein-protein interactions revealed that these proteins can interact with other proteins involved in virulence, secretion system, metabolism, and transport. From these eight potential vaccine candidates, we predicted three surface exposed novel antigenic epitopes that can induce both B-cell and T-cell immune responses. These peptides can be used for the development of either exclusive peptide vaccines or multi-component vaccines against human brucellosis. Reverse vaccinology is an important strategy for discovery of novel global health therapeutics.

  11. Uranium Mines and Mills Location Database

    EPA Pesticide Factsheets

    The Uranium Mines and Mills location database identifies and shows the location of active and inactive uranium mines and mills, as well as mines which principally produced other minerals, but were known to have uranium in the ore.

  12. A visual data-mining approach using 3D thoracic CT images for classification between benign and malignant pulmonary nodules

    NASA Astrophysics Data System (ADS)

    Kawata, Yoshiki; Niki, Noboru; Ohamatsu, Hironobu; Kusumoto, Masahiko; Kakinuma, Ryutaro; Mori, Kiyoshi; Yamada, K.; Nishiyama, Hiroyuki; Eguchi, Kenji; Kaneko, Masahiro; Moriyama, Noriyuki

    2003-05-01

    This paper presents a visual data-mining approach to assist physicians in classifying pulmonary nodules as benign or malignant. This approach retrieves and displays nodules which exhibit morphological and internal profiles consistent with the nodule in question. It uses a three-dimensional (3-D) CT image database of pulmonary nodules for which diagnosis is known. The central module in this approach makes possible analysis of the query nodule image and extraction of the features of interest: shape, surrounding structure, and internal structure of the nodules. The nodule shape is characterized by principal axes, while the surrounding and internal structure is represented by the distribution pattern of CT density and 3-D curvature indexes. The nodule representation is then applied to a similarity measure such as a correlation coefficient. For each query case, we sort all the nodules of the database from most to least similar. By applying the retrieval method to our database, we demonstrate its feasibility for retrieving similar 3-D nodule images.
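
    The retrieval step described above (score every database nodule against the query with a correlation coefficient, then sort) might look roughly like the following. This is a minimal sketch: the four-element feature vectors, the nodule names, and the `rank_by_similarity` helper are hypothetical stand-ins for the shape, density, and curvature features the abstract mentions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / (sx * sy)

def rank_by_similarity(query, database):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, pearson(query, feats)) for name, feats in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical feature vectors for nodules with known diagnosis.
database = {
    "nodule_1": [0.9, 0.1, 0.4, 0.7],
    "nodule_2": [0.2, 0.8, 0.6, 0.1],
    "nodule_3": [0.8, 0.2, 0.5, 0.6],
}
query = [1.0, 0.0, 0.5, 0.5]

ranking = rank_by_similarity(query, database)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

    A physician-facing tool would then display the top-ranked nodules alongside their known diagnoses.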

  13. Using a Data Mining Approach to Develop a Student Engagement-Based Institutional Typology. IR Applications, Volume 18, February 8, 2009

    ERIC Educational Resources Information Center

    Luan, Jing; Zhao, Chun-Mei; Hayek, John C.

    2009-01-01

    Data mining provides both systematic and systemic ways to detect patterns of student engagement among students at hundreds of institutions. Using traditional statistical techniques alone, the task would be significantly difficult--if not impossible--considering the size and complexity in both data and analytical approaches necessary for this…

  14. Integrated approach to assess the environmental impact of mining activities: estimation of the spatial distribution of soil contamination (Panasqueira mining area, Central Portugal).

    PubMed

    Candeias, Carla; Ávila, Paula F; Ferreira da Silva, Eduardo; Teixeira, João Paulo

    2015-03-01

    Through the years, mining and beneficiation processes in Panasqueira Sn-W mine (Central Portugal) produced large amounts of As-rich mine wastes laid up in huge tailings and open-air impoundments (Barroca Grande and Rio tailings) that are the main source of pollution in the surrounding area once they are exposed to the weathering conditions leading to the formation of acid mine drainage (AMD) and consequently to the contamination of the surrounding environments, particularly soils. The active mine started the exploration during the nineteenth century. This study aims to look at the extension of the soil pollution due to mining activities and tailing erosion by combining data on the degree of soil contamination that allows a better understanding of the dynamics inherent to leaching, transport, and accumulation of some potential toxic elements in soil and their environmental relevance. Soil samples were collected in the surrounding soils of the mine, were digested in aqua regia, and were analyzed for 36 elements by inductively coupled plasma mass spectrometry (ICP-MS). Selected results are that (a) an association of elements like Ag, As, Bi, Cd, Cu, W, and Zn strongly correlated and controlled by the local sulfide mineralization geochemical signature was revealed; (b) the global area discloses significant concentrations of As, Bi, Cd, and W linked to the exchangeable and acid-soluble bearing phases; and (c) wind promotes the mechanical dispersion of the rejected materials, from the milled waste rocks and the mineral processing plant, with subsequent deposition on soils and waters. Arsenic- and sulfide-related heavy metals (such as Cu and Cd) are associated to the fine materials that are transported in suspension by surface waters or associated to the acidic waters, draining these sites and contaminating the local soils. Part of this fraction, especially for As, Cd, and Cu, is temporally retained in solid phases by precipitation of soluble secondary minerals (through

  15. A Graph Approach to Mining Biological Patterns in the Binding Interfaces.

    PubMed

    Cheng, Wen; Yan, Changhui

    2017-01-01

    Protein-RNA interactions play important roles in the biological systems. Searching for regular patterns in the Protein-RNA binding interfaces is important for understanding how protein and RNA recognize each other and bind to form a complex. Herein, we present a graph-mining method for discovering biological patterns in the protein-RNA interfaces. We represented known protein-RNA interfaces using graphs and then discovered graph patterns enriched in the interfaces. Comparison of the discovered graph patterns with UniProt annotations showed that the graph patterns had a significant overlap with residue sites that had been proven crucial for the RNA binding by experimental methods. Using 200 patterns as input features, a support vector machine method was able to classify protein surface patches into RNA-binding sites and non-RNA-binding sites with 84.0% accuracy and 88.9% precision. We built a simple scoring function that calculated the total number of the graph patterns that occurred in a protein-RNA interface. That scoring function was able to discriminate near-native protein-RNA complexes from docking decoys with a performance comparable with that of a state-of-the-art complex scoring function. Our work also revealed possible patterns that might be important for binding affinity.

  16. Modeling Approach/Strategy for Corrective Action Unit 97, Yucca Flat and Climax Mine , Revision 0

    SciTech Connect

    Janet Willie

    2003-08-01

    The objectives of the UGTA corrective action strategy are to predict the location of the contaminant boundary for each CAU, develop and implement a corrective action, and close each CAU. The process for achieving this strategy includes modeling to define the maximum extent of contaminant transport within a specified time frame. Modeling is a method of forecasting how the hydrogeologic system, including the underground test cavities, will behave over time with the goal of assessing the migration of radionuclides away from the cavities and chimneys. Use of flow and transport models to achieve the objectives of the corrective action strategy is specified in the FFACO. In the Yucca Flat/Climax Mine system, radionuclide migration will be governed by releases from the cavities and chimneys, and transport in alluvial aquifers, fractured and partially fractured volcanic rock aquifers and aquitards, the carbonate aquifers, and in intrusive units. Additional complexity is associated with multiple faults in Yucca Flat and the need to consider reactive transport mechanisms that both reduce and enhance the mobility of radionuclides. A summary of the data and information that form the technical basis for the model is provided in this document.

  17. Perception of mercury contamination by Brazilian adolescents in a gold mining community: an ethnographic approach.

    PubMed

    Novais, Gabriel; Câmara, Volney de Magalhães

    2009-01-01

    This study used ethnographic methods to examine the perception of mercury contamination by adolescents in the mining community of Poconé, Mato Grosso, Brazil. In Phase I, 53 students aged 13 to 16 years in six schools presented theatrical sketches about community health risks to generate key terms for a pile sorting activity in Phase II. Mercury was reported by four of the 15 groups (26%). In Phase II, researchers conducted semi-structured interviews and pile sorts with 31 students to assess adolescent attitudes about mercury and to generate an ethnomedical model of mercury perception. The lack of consensus evident in the model reveals that while students view mercury as an overall threat, many of them do not understand how its presence can harm human health. Few adolescents felt confident about their knowledge (3%) or could accurately explain how it was used (9%), even though many of them had relatives working as miners (55%). Further analysis of pile sort data suggests that mercury may not belong in a 'typical risks' domain. The authors argue that ethnographic methods are a useful tool for public health research, and hope that these findings can contribute to health education interventions in the field.

  18. Mining pinyin-to-character conversion rules from large-scale corpus: a rough set approach.

    PubMed

    Wang, Xiaolong; Chen, Qingcai; Yeung, Daniel S

    2004-04-01

    This paper introduces a rough set technique for solving the problem of mining Pinyin-to-character (PTC) conversion rules. It first presents a text-structuring method by constructing a language information table from a corpus for each pinyin, which it will then apply to a free-form textual corpus. Data generalization and rule extraction algorithms can then be used to eliminate redundant information and extract consistent PTC conversion rules. The design of our model also addresses a number of important issues such as the long-distance dependency problem, the storage requirements of the rule base, and the consistency of the extracted rules, while the performance of the extracted rules as well as the effects of different model parameters are evaluated experimentally. These results show that by the smoothing method, high precision conversion (0.947) and recall rates (0.84) can be achieved even for rules represented directly by pinyin rather than words. A comparison with the baseline tri-gram model also shows good complement between our method and the tri-gram language model.

  19. A Graph Approach to Mining Biological Patterns in the Binding Interfaces

    PubMed Central

    Cheng, Wen

    2017-01-01

    Protein–RNA interactions play important roles in the biological systems. Searching for regular patterns in the Protein–RNA binding interfaces is important for understanding how protein and RNA recognize each other and bind to form a complex. Herein, we present a graph-mining method for discovering biological patterns in the protein–RNA interfaces. We represented known protein–RNA interfaces using graphs and then discovered graph patterns enriched in the interfaces. Comparison of the discovered graph patterns with UniProt annotations showed that the graph patterns had a significant overlap with residue sites that had been proven crucial for the RNA binding by experimental methods. Using 200 patterns as input features, a support vector machine method was able to classify protein surface patches into RNA-binding sites and non-RNA-binding sites with 84.0% accuracy and 88.9% precision. We built a simple scoring function that calculated the total number of the graph patterns that occurred in a protein–RNA interface. That scoring function was able to discriminate near-native protein–RNA complexes from docking decoys with a performance comparable with that of a state-of-the-art complex scoring function. Our work also revealed possible patterns that might be important for binding affinity. PMID:27892693

  20. A data-mining approach for investigating social and economic geographical dynamics of beta-thalassemia's spread.

    PubMed

    Akay, Altug; Dragomir, Andrei; Yardimci, Ahmet; Canatan, Duran; Yesilipek, Akif; Pogue, Brian W

    2009-09-01

    Beta-thalassemia is an anemic genetic disorder that remains a major global health issue, especially in the globalized era where public health, economics, and education are tightly interwoven. Previous studies have examined the disease's rate and heredity. This study analyzed beta-thalassemia's socioeconomic geography and how it affects the afflicted population. We processed survey data and performed data mining using self-organizing maps to identify underlying data structure. We hypothesized that certain variables mark subgroups within the affected population and we aimed at identifying these subgroups and used a correlation-based measure to assess the variable's importance to the subgroup's distinction. The population's education level was one of the major factors that divided it into different subgroups. Our study showed that recurring patterns of specific variables separated the affected population into disparate subgroups based on their response to questionnaires. Future studies can use such tools to delve deeper into how other variables (e.g. socioeconomic and genomic) can identify subgroups within larger affected populations.

  1. Employment among Working-Age Adults with Multiple Sclerosis: A Data-Mining Approach to Identifying Employment Interventions

    ERIC Educational Resources Information Center

    Bishop, Malachy; Chan, Fong; Rumrill, Phillip D., Jr.; Frain, Michael P.; Tansey, Timothy N.; Chiu, Chung-Yi; Strauser, David; Umeasiegbu, Veronica I.

    2015-01-01

    Purpose: To examine demographic, functional, and clinical multiple sclerosis (MS) variables affecting employment status in a national sample of adults with MS in the United States. Method: The sample included 4,142 working-age (20-65 years) Americans with MS (79.1% female) who participated in a national survey. The mean age of participants was…

  2. A Complementary Bioinformatics Approach to Identify Potential Plant Cell Wall Glycosyltransferase-Encoding Genes

    PubMed Central

    Egelund, Jack; Skjøt, Michael; Geshi, Naomi; Ulvskov, Peter; Petersen, Bent Larsen

    2004-01-01

    Plant cell wall (CW) synthesizing enzymes can be divided into the glycan (i.e. cellulose and callose) synthases, which are multimembrane spanning proteins located at the plasma membrane, and the glycosyltransferases (GTs), which are Golgi localized single membrane spanning proteins, believed to participate in the synthesis of hemicellulose, pectin, mannans, and various glycoproteins. At the Carbohydrate-Active enZYmes (CAZy) database where e.g. glucoside hydrolases and GTs are classified into gene families primarily based on amino acid sequence similarities, 415 Arabidopsis GTs have been classified. Although much is known with regard to composition and fine structures of the plant CW, only a handful of CW biosynthetic GT genes, all classified in the CAZy system, have been characterized. In an effort to identify CW GTs that have not yet been classified in the CAZy database, a simple bioinformatics approach was adopted. First, the entire Arabidopsis proteome was run through the Transmembrane Hidden Markov Model 2.0 server and proteins containing one or, more rarely, two transmembrane domains within the N-terminal 150 amino acids were collected. Second, these sequences were submitted to the SUPERFAMILY prediction server, and sequences that were predicted to belong to the superfamilies NDP-sugartransferase, UDP-glycosyltransferase/glycogen-phosphorylase, carbohydrate-binding domain, Gal-binding domain, or Rossmann fold were collected, yielding a total of 191 sequences. Fifty-two accessions already classified in CAZy were discarded. The resulting 139 sequences were then analyzed using the Three-Dimensional-Position-Specific Scoring Matrix and mGenTHREADER servers, and 27 sequences with similarity to either the GT-A or the GT-B fold were obtained. Proof of concept of the present approach has to some extent been provided by our recent demonstration that two members of this pool of 27 non-CAZy-classified putative GTs are xylosyltransferases involved in synthesis of pectin
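
    The filtering steps above can be sketched as plain list filters. This is only an illustration under stated assumptions: `passes_tm_filter` encodes one possible reading of the transmembrane criterion (one or two predicted helices, all ending within the first 150 residues), and the accession names are placeholders, not real Arabidopsis identifiers. The counts 191, 52, and 139 are the ones reported in the abstract.

```python
def passes_tm_filter(tm_segments, max_domains=2, n_terminal_limit=150):
    """One reading of the first filter (an assumption, not the authors' code).

    tm_segments is a list of (start, end) residue positions of predicted
    transmembrane helices for a single protein.
    """
    if not 1 <= len(tm_segments) <= max_domains:
        return False
    return all(end <= n_terminal_limit for _, end in tm_segments)

def discard_classified(accessions, classified):
    """Drop accessions that already appear in a classification database."""
    return [acc for acc in accessions if acc not in classified]

# Placeholder accessions standing in for the 191 SUPERFAMILY hits.
superfamily_hits = [f"hypothetical_{i:03d}" for i in range(191)]
# Placeholder set standing in for the 52 accessions already in CAZy.
already_in_cazy = set(superfamily_hits[:52])

unclassified = discard_classified(superfamily_hits, already_in_cazy)
print(len(unclassified))  # 191 - 52 = 139, matching the abstract
```

    The final fold-recognition step, which reduces the 139 sequences to 27, depends on external prediction servers and is not sketched here.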

  3. Outbreaks source: A new mathematical approach to identify their possible location

    NASA Astrophysics Data System (ADS)

    Buscema, Massimo; Grossi, Enzo; Breda, Marco; Jefferson, Tom

    2009-11-01

    Classical epidemiology has generally relied on the description and explanation of the occurrence of infectious diseases in relation to time occurrence of events rather than to place of occurrence. In recent times, computer generated dot maps have facilitated the modeling of the spread of infectious epidemic diseases either with classical statistics approaches or with artificial “intelligent systems”. Few attempts, however, have been made so far to identify the origin of the epidemic spread rather than its evolution by mathematical topology methods. We report on the use of a new artificial intelligence method (the H-PST Algorithm) and we compare this new technique with other well known algorithms to identify the source of three examples of infectious disease outbreaks derived from literature. The H-PST algorithm is a new system able to project a distances matrix of points (events) into a bi-dimensional space, with the generation of a new point, named hidden unit. This new hidden unit deforms the original Euclidean space and transforms it into a new space (cognitive space). The cost function of this transformation is the minimization of the differences between the original distance matrix among the assigned points and the distance matrix of the same points projected into the bi-dimensional map (or any different set of constraints). For many reasons we will discuss, the position of the hidden unit shows to target the outbreak source in many epidemics much better than the other classic algorithms specifically targeted for this task. Compared with main algorithms known in the location theory, the hidden unit was within yards of the outbreak source in the first example (the 2007 epidemic of Chikungunya fever in Italy). The hidden unit was located in the river between the two village epicentres of the spread exactly where the index case was living. Equally in the second (the 1967 foot and mouth disease epidemic in England), and the third (1854 London Cholera epidemic

  4. A multi-proxy approach to identifying short-lived marine incursions in the Early Carboniferous

    NASA Astrophysics Data System (ADS)

    Bennett, Carys; Davies, Sarah; Leng, Melanie; Snelling, Andrea; Millward, David; Kearsey, Timothy; Marshall, John; Reves, Emma

    2015-04-01

    This study is a contribution to the TW:eed Project (Tetrapod World: early evolution and diversification), which examines the rebuilding of Carboniferous ecosystems following a mass extinction at the end of the Devonian. The project focuses on the Tournaisian Ballagan Formation of Scotland and the Borders, which contains rare fish and tetrapod material. The Ballagan Formation is characterised by sandstones, dolomitic cementstones, paleosols, siltstones and gypsum deposits. The depositional environment ranges from fluvial, alluvial-plain to marginal-marine environments, with fluvial, floodplain and lacustrine deposition dominant. A multi-proxy approach combining sedimentology, palaeontology, micropalaeontology, palynology and geochemistry is used to identify short-lived marine transgressions onto the floodplain environment. Rare marginal marine fossils are: Chondrites-Phycosiphon, Spirorbis, Serpula, certain ostracod species, rare orthocones, brachiopods and putative marine sharks. More common non-marine fauna include Leiocopida and Podocopida ostracods, Mytilida and Myalinida bivalves, plants, eurypterids, gastropods and fish. Thin carbonate-bearing dolomitic cementstones and siltstones are the sedimentary deposits of marine incursions and occur throughout the formation. Over 600 bulk carbon isotope samples were taken from the 500 metre thick Norham Core (located near Berwick-Upon-Tweed), encompassing a time interval of around 13 million years. The results range from -26‰ to -19‰ δ13Corg, with an average of -19‰ much lighter than the average value for Early Carboniferous marine bulk organic matter (δ13C of -28 to -30). The isotope results correspond to broad-scale changes in the depositional setting, with more positive δ13C in pedogenic sediments and more negative δ13C in un-altered grey siltstones. They may also relate to cryptic (short-lived) marine incursions. A comparison of δ13C values from specific plant/wood fragments, palynology and bulk

  5. Identifying diffused nitrate sources in a stream in an agricultural field using a dual isotopic approach.

    PubMed

    Ding, Jingtao; Xi, Beidou; Gao, Rutai; He, Liansheng; Liu, Hongliang; Dai, Xuanli; Yu, Yijun

    2014-06-15

    Nitrate (NO3(-)) pollution is a severe problem in aquatic systems in Taihu Lake Basin in China. A dual isotope approach (δ(15)NNO3(-) and δ(18)ONO3(-)) was applied to identify diffused NO3(-) inputs in a stream in an agricultural field at the basin in 2013. The site-specific isotopic characteristics of five NO3(-) sources (atmospheric deposition, AD; NO3(-) derived from soil organic matter nitrification, NS; NO3(-) derived from chemical fertilizer nitrification, NF; groundwater, GW; and manure and sewage, M&S) were identified. NO3(-) concentrations in the stream during the rainy season [mean±standard deviation (SD)=2.5±0.4mg/L] were lower than those during the dry season (mean±SD=4.0±0.5mg/L), whereas the δ(18)ONO3(-) values during the rainy season (mean±SD=+12.3±3.6‰) were higher than those during the dry season (mean±SD=+0.9±1.9‰). Both chemical and isotopic characteristics indicated that mixing with atmospheric NO3(-) resulted in the high δ(18)O values during the rainy season, whereas NS and M&S were the dominant NO3(-) sources during the dry season. A Bayesian model was used to determine the contribution of each NO3(-) source to total stream NO3(-). Results showed that reduced N nitrification in soil zones (including soil organic matter and fertilizer) was the main NO3(-) source throughout the year. M&S contributed more NO3(-) during the dry season (22.4%) than during the rainy season (17.8%). AD generated substantial amounts of NO3(-) in May (18.4%), June (29.8%), and July (24.5%). With the assessment of temporal variation of diffused NO3(-) sources in agricultural field, improved agricultural management practices can be implemented to protect the water resource and avoid further water quality deterioration in Taihu Lake Basin.

  6. A Systems Biology Approach Identifies a Regulatory Network in Parotid Acinar Cell Terminal Differentiation

    PubMed Central

    Metzler, Melissa A.; Venkatesh, Srirangapatnam G.; Lakshmanan, Jaganathan; Carenbauer, Anne L.; Perez, Sara M.; Andres, Sarah A.; Appana, Savitri; Brock, Guy N.; Wittliff, James L.; Darling, Douglas S.

    2015-01-01

    Objective: The transcription factor networks that drive parotid salivary gland progenitor cells to terminally differentiate remain largely unknown and are vital to understanding the regeneration process. Methodology: A systems biology approach was taken to measure mRNA and microRNA expression in vivo across acinar cell terminal differentiation in the rat parotid salivary gland. Laser capture microdissection (LCM) was used to specifically isolate acinar cell RNA at times spanning the month-long period of parotid differentiation. Results: Clustering of microarray measurements suggests that expression occurs in four stages. mRNA expression patterns suggest a novel role for Pparg which is transiently increased during mid postnatal differentiation in concert with several target gene mRNAs. 79 microRNAs are significantly differentially expressed across time. Profiles of statistically significant changes of mRNA expression, combined with reciprocal correlations of microRNAs and their target mRNAs, suggest a putative network involving Klf4, a differentiation inhibiting transcription factor, which decreases as several targeting microRNAs increase late in differentiation. The network suggests a molecular switch (involving Prdm1, Sox11, Pax5, miR-200a, and miR-30a) progressively decreases repression of Xbp1 gene transcription, in concert with decreased translational repression by miR-214. The transcription factor Xbp1 mRNA is initially low, increases progressively, and may be maintained by a positive feedback loop with Atf6. Transfection studies show that Xbp1 activates the Mist1 promoter. In addition, Xbp1 and Mist1 each activate the parotid secretory protein (Psp) gene, which encodes an abundant salivary protein, and is a marker of terminal differentiation. Conclusion: This study identifies novel expression patterns of Pparg, Klf4, and Sox11 during parotid acinar cell differentiation, as well as numerous differentially expressed microRNAs. Network analysis identifies a novel stemness arm, a

  7. Data Mining Approach for Evaluating Vegetation Dynamics in Earth System Models (ESMs) Using Satellite Remote Sensing Products

    NASA Astrophysics Data System (ADS)

    Shu, S.; Hoffman, F. M.; Kumar, J.; Hargrove, W. W.; Jain, A. K.

    2014-12-01

    biome types. However, Mapcurves results showed a relatively low goodness of fit score for modeled phenology projected onto observations. This study demonstrates the utility of a data mining approach for cross-validation of observations and evaluation of model performance.

  8. Identifying core foods for total diet studies: a comparison of four different approaches.

    PubMed

    Devlin, Niamh F C; McNulty, Breige A; Turrini, Aida; Tlustos, Christina; Hearty, Aine P; Volatier, Jean-Luc; Kelleher, Cecily C; Nugent, Anne P

    2014-01-01

    Total diet studies (TDS) are recognised as a cost-effective approach in estimating dietary exposure to chemicals in food. It has been advised that candidate foods for inclusion in TDS analysis should represent a large part of the typical diet to estimate accurately the exposure of a population group. To date a variety of approaches have been used to determine which foods should be included in a core TDS food list, with no agreed method. Therefore, the aim of this study was to compare four of these approaches by creating TDS food lists for adult populations in Europe using summary statistics data from the EFSA Comprehensive Food Consumption Database. Both a food group approach and a total diet approach were employed, and foods were selected for inclusion in the TDS food lists if they met the criteria as defined by consumption weight and/or a 5% consumer rate. Using all four approaches the representation of the diet across the TDS food lists was > 85%. The food group approach showed a slight advantage in diet representation, but produced considerably longer TDS food lists in comparison with the total diet approach. The addition of a 5% consumer rate to both approaches had little impact on results. In conclusion, the total diet approach may act as a more cost-effective approach in comparison with the food group approach while still achieving comprehensive results in the creation of core TDS food lists.

  9. Selection Effects in Identifying Magnetic Clouds and the Importance of the Closest Approach Parameter

    NASA Technical Reports Server (NTRS)

    Lepping, R. P.; Wu, Chin-Chun

    2010-01-01

    This study is motivated by the unusually low number of magnetic clouds (MCs) that are strictly identified within interplanetary coronal mass ejections (ICMEs), as observed at 1 AU; this is usually estimated to be around 30% or lower. But a looser definition of MCs may significantly increase this percentage. Another motivation is the unexpected shape of the occurrence distribution of the observers' "closest approach distances" (measured from a MC's axis, and called CA) which drops off somewhat rapidly as |CA| (in % of MC radius) approaches 100%, based on earlier studies. We suggest, for various geometrical and physical reasons, that the |CA|-distribution should be somewhere between a uniform one and the one actually observed, and therefore the 30% estimate should be higher. So we ask: when there is a failure to identify a MC within an ICME, is it occasionally due to a large |CA| passage, making MC identification more difficult, i.e., is it due to an event selection effect? In attempting to answer this question we examine WIND data to obtain an accurate distribution of the number of MCs vs. |CA| distance, whether the event is ICME-related or not, where initially a large number of cases (N=98) are considered. This gives a frequency distribution that is far from uniform, confirming earlier studies. This, along with the fact that there are many ICME identification-parameters that do not depend on |CA|, suggests that an MC event selection effect may indeed explain at least part of the low ratio of (No. MCs)/(No. ICMEs). We also show that there is an acceptable geometrical and physical consistency in the relationships for both average "normalized" magnetic field intensity change and field direction change vs. |CA| within a MC, suggesting that our estimates of |CA|, B(sub 0) (magnetic field intensity on the axis), and choice of a proper "cloud coordinate" system (all needed in the analysis) are acceptably accurate. Therefore the MC fitting model (Lepping et al., 1990) is

  10. LeadMine: a grammar and dictionary driven approach to entity recognition

    PubMed Central

    2015-01-01

    Background: Chemical entity recognition has traditionally been performed by machine learning approaches. Here we describe an approach using grammars and dictionaries. This approach has the advantage that the entities found can be directly related to a given grammar or dictionary, which allows the type of an entity to be known and, if an entity is misannotated, indicates which resource should be corrected. As recognition is driven by what is expected, if spelling errors occur, they can be corrected. Correcting such errors is highly useful when attempting to look up an entity in a database or, in the case of chemical names, converting them to structures. Results: Our system uses a mixture of expertly curated grammars and dictionaries, as well as dictionaries automatically derived from public resources. We show that the heuristics developed to filter our dictionary of trivial chemical names (from PubChem) yield a better-performing dictionary than the previously published Jochem dictionary. Our final system performs post-processing steps to modify the boundaries of entities and to detect abbreviations. These steps are shown to significantly improve performance (2.6% and 4.0% F1-score respectively). Our complete system, with incremental post-BioCreative workshop improvements, achieves 89.9% precision and 85.4% recall (87.6% F1-score) on the CHEMDNER test set. Conclusions: Grammar and dictionary approaches can produce results at least as good as the current state of the art in machine learning approaches. While machine learning approaches are commonly thought of as "black box" systems, our approach directly links the output entities to the input dictionaries and grammars. Our approach also allows correction of errors in detected entities, which can assist with entity resolution. PMID:25810776
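
    As a quick consistency check on the reported figures, the F1-score is the harmonic mean of precision and recall. The short sketch below recomputes it from the 89.9% precision and 85.4% recall quoted above; the `f1_score` helper is written here for illustration and is not part of LeadMine.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Values quoted in the abstract for the CHEMDNER test set.
f1 = f1_score(0.899, 0.854)
print(f"{f1 * 100:.1f}%")  # 87.6%
```

    The result agrees with the 87.6% F1-score stated in the abstract.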

  11. A Comprehensive Regression-Based Approach for Identifying Sources of Person Misfit in Typical-Response Measures

    ERIC Educational Resources Information Center

    Ferrando, Pere J.; Lorenzo-Seva, Urbano

    2016-01-01

    This article proposes a general parametric item response theory approach for identifying sources of misfit in response patterns that have been classified as potentially inconsistent by a global person-fit index. The approach, which is based on the weighted least squared regression of the observed responses on the model-expected responses, can be…

  12. A spatial modeling approach to identify potential butternut restoration sites in Mammoth Cave National Park

    USGS Publications Warehouse

    Thompson, L.M.; Van Manen, F.T.; Schlarbaum, S.E.; DePoy, M.

    2006-01-01

    Incorporation of disease resistance is nearly complete for several important North American hardwood species threatened by exotic fungal diseases. The next important step toward species restoration would be to develop reliable tools to delineate ideal restoration sites on a landscape scale. We integrated spatial modeling and remote sensing techniques to delineate potential restoration sites for Butternut (Juglans cinerea L.) trees, a hardwood species being decimated by an exotic fungus, in Mammoth Cave National Park (MCNP), Kentucky. We first developed a multivariate habitat model to determine optimum Butternut habitats within MCNP. Habitat characteristics of 54 known Butternut locations were used in combination with eight topographic and land use data layers to calculate an index of habitat suitability based on Mahalanobis distance (D²). We used a bootstrapping technique to test the reliability of model predictions. Based on a threshold value for the D² statistic, 75.9% of the Butternut locations were correctly classified, indicating that the habitat model performed well. Because Butternut seedlings require extensive amounts of sunlight to become established, we used canopy cover data to refine our delineation of favorable areas for Butternut restoration. Areas with the most favorable conditions to establish Butternut seedlings were limited to 291.6 ha. Our study provides a useful reference on the amount and location of favorable Butternut habitat in MCNP and can be used to identify priority areas for future Butternut restoration. Given the availability of relevant habitat layers and accurate location records, our approach can be applied to other tree species and areas. © 2006 Society for Ecological Restoration International.
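
    A habitat index based on Mahalanobis distance scores each map cell by how far its habitat variables lie from the mean of the known locations, scaled by the covariance of those locations. The Python sketch below illustrates the idea for two variables only; the study used eight data layers, and the variable names and sample values here are hypothetical, not taken from the paper.

```python
from statistics import mean


def mahalanobis_d2(point, reference):
    """Squared Mahalanobis distance of a 2-variable point from reference sites."""
    xs = [r[0] for r in reference]
    ys = [r[1] for r in reference]
    mx, my = mean(xs), mean(ys)
    n = len(reference)
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]] of the reference sites.
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in reference) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - mx, point[1] - my
    # d^T S^-1 d, with the inverse of the 2x2 matrix written out explicitly.
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


# Hypothetical (elevation in m, slope in degrees) at known tree locations.
known_sites = [(200, 5), (220, 8), (210, 6), (230, 9), (215, 7)]

for cell in [(216, 7), (300, 2)]:
    print(cell, round(mahalanobis_d2(cell, known_sites), 2))
```

    Smaller D² means the cell resembles the known locations more closely; a cutoff on D², such as the threshold mentioned in the abstract, then separates suitable from unsuitable cells.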

  13. Systems Biology Approach to Identify Gene Network Signatures for Colorectal Cancer

    PubMed Central

    Sonachalam, Madhankumar; Shen, Jeffrey; Huang, Hui; Wu, Xiaogang

    2012-01-01

    In this work, we integrated prior knowledge from gene signatures and protein interactions with gene set enrichment analysis (GSEA) and gene/protein network modeling to identify gene network signatures from gene expression microarray data. We demonstrated how to apply this approach to discover gene network signatures for colorectal cancer (CRC) from microarray datasets. First, we used GSEA to analyze the microarray data through enriching differential genes in different CRC-related gene sets from two publicly available up-to-date gene set databases – Molecular Signatures Database (MSigDB) and Gene Signatures Database (GeneSigDB). Second, we compared the enriched gene sets through enrichment score, false-discovery rate, and nominal p-value. Third, we constructed an integrated protein–protein interaction (PPI) network through connecting these enriched genes by high-quality interactions from a human annotated and predicted protein interaction database, with a confidence score labeled for each interaction. Finally, we mapped differential gene expressions onto the constructed network to build a comprehensive network model containing visualized transcriptome and proteome data. The results show that although MSigDB has more CRC-relevant gene sets than GeneSigDB, the integrated PPI network connecting the enriched genes from both MSigDB and GeneSigDB can provide a more complete view for discovering gene network signatures. We also found several important sub-network signatures for CRC, such as TP53 sub-network, PCNA sub-network, and IL8 sub-network, corresponding to apoptosis, DNA repair, and immune response, respectively. PMID:22629282

  14. Identifying Country-Specific Cultures of Physics Education: A differential item functioning approach

    NASA Astrophysics Data System (ADS)

    Mesic, Vanes

    2012-11-01

    In international large-scale assessments of educational outcomes, student achievement is often represented by unidimensional constructs. This approach allows for drawing general conclusions about country rankings with respect to the given achievement measure, but it typically does not provide specific diagnostic information which is necessary for systematic comparisons and improvements of educational systems. Useful information could be obtained by exploring the differences in national profiles of student achievement between low-achieving and high-achieving countries. In this study, we aimed to identify the relative weaknesses and strengths of eighth graders' physics achievement in Bosnia and Herzegovina in comparison to the achievement of their peers from Slovenia. For this purpose, we ran a secondary analysis of Trends in International Mathematics and Science Study (TIMSS) 2007 data. The student sample consisted of 4,220 students from Bosnia and Herzegovina and 4,043 students from Slovenia. After analysing the cognitive demands of TIMSS 2007 physics items, the corresponding differential item functioning (DIF)/differential group functioning contrasts were estimated. Approximately 40% of items exhibited large DIF contrasts, indicating significant differences between cultures of physics education in Bosnia and Herzegovina and Slovenia. The relative strength of students from Bosnia and Herzegovina was shown to be mainly associated with the topic area 'Electricity and magnetism'. Classes of items which required the knowledge of experimental method, counterintuitive thinking, proportional reasoning and/or the use of complex knowledge structures proved to be differentially easier for students from Slovenia. In the light of the presented results, the common practice of ranking countries with respect to universally established cognitive categories seems to be potentially misleading.

  15. Predicting Fish Growth Potential and Identifying Water Quality Constraints: A Spatially-Explicit Bioenergetics Approach

    NASA Astrophysics Data System (ADS)

    Budy, Phaedra; Baker, Matthew; Dahle, Samuel K.

    2011-10-01

    Anthropogenic impairment of water bodies represents a global environmental concern, yet few attempts have successfully linked fish performance to thermal habitat suitability and fewer have distinguished co-varying water quality constraints. We interfaced fish bioenergetics, field measurements, and Thermal Remote Imaging to generate a spatially-explicit, high-resolution surface of fish growth potential, and next employed a structured hypothesis to detect relationships among measures of fish performance and co-varying water quality constraints. Our thermal surface of fish performance captured the amount and spatial-temporal arrangement of thermally-suitable habitat for three focal species in an extremely heterogeneous reservoir, but interpretation of this pattern was initially confounded by seasonal covariation of water residence time and water quality. Subsequent path analysis revealed that in terms of seasonal patterns in growth potential, catfish and walleye responded to temperature, positively and negatively, respectively; crappie and walleye responded to eutrophy (negatively). At the high eutrophy levels observed in this system, some desired fishes appear to suffer from excessive cultural eutrophication within the context of elevated temperatures whereas others appear to be largely unaffected or even enhanced. Our overall findings do not lead to the conclusion that this system is degraded by pollution; however, they do highlight the need to use a sensitive focal species in the process of determining allowable nutrient loading and as integrators of habitat suitability across multiple spatial and temporal scales. We provide an integrated approach useful for quantifying fish growth potential and identifying water quality constraints on fish performance at spatial scales appropriate for whole-system management.

  16. A machine learning approach to identify hydrogenosomal proteins in Trichomonas vaginalis.

    PubMed

    Burstein, David; Gould, Sven B; Zimorski, Verena; Kloesges, Thorsten; Kiosse, Fuat; Major, Peter; Martin, William F; Pupko, Tal; Dagan, Tal

    2012-02-01

    The protozoan parasite Trichomonas vaginalis is the causative agent of trichomoniasis, the most widespread nonviral sexually transmitted disease in humans. It possesses hydrogenosomes, anaerobic mitochondria that generate H2, CO2, and acetate from pyruvate while converting ADP to ATP via substrate-level phosphorylation. T. vaginalis hydrogenosomes lack a genome and translation machinery; hence, they import all their proteins from the cytosol. To date, however, only 30 imported proteins have been shown to localize to the organelle. A total of 226 nuclear-encoded proteins inferred from the genome sequence harbor a characteristic short N-terminal presequence, reminiscent of mitochondrial targeting peptides, which is thought to mediate hydrogenosomal targeting. Recent studies suggest, however, that the presequences might be less important than previously thought. We sought to identify new hydrogenosomal proteins within the 59,672 annotated open reading frames (ORFs) of T. vaginalis, independent of the N-terminal targeting signal, using a machine learning approach. Our training set included 57 gene and protein features determined for all 30 known hydrogenosomal proteins and 576 nonhydrogenosomal proteins. Several classifiers were trained on this set to yield an import score for all proteins encoded by T. vaginalis ORFs, predicting the likelihood of hydrogenosomal localization. The machine learning results were tested through immunofluorescence assay and immunodetection in isolated cell fractions of 14 protein predictions using hemagglutinin constructs expressed under the homologous SCSα promoter in transiently transformed T. vaginalis cells. Localization of 6 of the 10 top predicted hydrogenosome-localized proteins was confirmed, and two of these were found to lack an obvious N-terminal targeting signal.

  17. Assessment of a Novel Approach to Identify Trichiasis Cases Using Community Treatment Assistants in Tanzania

    PubMed Central

    Greene, Gregory S.; West, Sheila K.; Mkocha, Harran; Munoz, Beatriz; Merbs, Shannath L.

    2015-01-01

    Background: Simple surgical intervention advocated by the World Health Organization can alleviate trachomatous trichiasis (TT) and prevent subsequent blindness. A large backlog of TT cases remains unidentified and untreated. To increase identification and referral of TT cases, a novel approach using standard screening questions, a card, and simple training for Community Treatment Assistants (CTAs) to use during Mass Drug Administration (MDA) was developed and evaluated in Kongwa District, a trachoma-endemic area of central Tanzania. Methodology/Principal Findings: A community randomized trial was conducted in 36 communities during MDA. CTAs in intervention villages received an additional half-day of training and a TT screening card in addition to the training received by CTAs in villages assigned to usual care. All MDA participants 15 years and older were screened for TT, and senior TT graders confirmed case status by evaluating all screened-positive cases. A random sample of those screened negative for TT and those who did not present at MDA were also evaluated by the master graders. Intervention CTAs identified 5.6 times as many cases (n = 50) as those assigned to usual care (n = 9, p < 0.05). While specificity was above 90% for both groups, the sensitivity for the novel screening tool was 31.2% compared to 5.6% for the usual care group (p < 0.05). Conclusions/Significance: CTAs appear to be viable resources for the identification of TT cases. Additional training and use of a TT screening card significantly increased the ability of CTAs to recognize and refer TT cases during MDA; however, further efforts are needed to improve case detection and reduce the number of false positive cases. PMID:26658938

  18. A systems biology and proteomics-based approach identifies SRC and VEGFA as biomarkers in risk factor mediated coronary heart disease.

    PubMed

    V, Alexandar; Nayar, Pradeep G; Murugesan, R; S, Shajahan; Krishnan, Jayalakshmi; Ahmed, Shiek S S J

    2016-07-19

    Coronary heart disease (CHD) is the most common cause of death worldwide. The burden of CHD increases with risk factors such as smoking, hypertension, obesity and diabetes. Several studies have demonstrated the association of these classical risk factors with CHD. However, the mechanisms of these associations remain largely unclear due to the complexity of disease pathophysiology and the lack of an integrative approach, which has prevented a definite understanding of the molecular linkage. To overcome these problems, we propose a novel systems biology approach that relates causative genes, interactomes and pathways to elucidate the risk factors mediating the molecular mechanisms and biomarkers for feasible diagnosis. The literature was mined to retrieve the causative genes of each risk factor and CHD to construct protein interactomes. The interactomes were examined to identify 298 common molecular signatures. The common signatures were mapped to the tissue network to synthesize a sub-network consisting of 82 proteins. Further, the dissection of the sub-network provides functional modules representing a diverse range of molecular functions, including the AKT/PI3K, MAPK and Wnt pathways. Also, the prioritization of functional modules identifies SRC, VEGFA and HIF1A as potential candidate markers. Further, we validate these candidates with the existing markers CRP, NOS3 and VCAM1 in the serum of 63 individuals, 33 with CHD and 30 controls, using ELISA. SRC, VEGFA, HIF1A, CRP and NOS3 were significantly altered in patients compared to controls. These results support the utility of these candidate markers for the diagnosis of CHD. Overall, our molecular observations indicate the influence of risk factors in the pathophysiology of CHD and identify serum markers for diagnosis.

  19. Geologic considerations in underground coal mining system design

    SciTech Connect

    Camilli, F.A.; Maynard, D.P.; Mangolds, A.; Harris, J.

    1981-10-01

    Geologic characteristics of coal resources which may impact new extraction technologies are identified and described to aid system designers and planners in their task of designing advanced coal extraction systems for the central Appalachian region. These geologic conditions are then organized into a matrix identified as the baseline mine concept. A sample region, eastern Kentucky, is next analyzed, using both the new baseline mine concept and a traditional geologic investigative approach. The baseline mine concept presented is intended as a framework, providing a consistent basis for further analyses to be subsequently conducted in other geographic regions. The baseline mine concept is intended as a tool to give system designers a more realistic feel of the mine environment and will hopefully lead to acceptable alternatives for advanced coal extraction systems.

  20. Proceedings: Fourth Workshop on Mining Scientific Datasets

    SciTech Connect

    Kamath, C

    2001-07-24

    Commercial applications of data mining in areas such as e-commerce, market-basket analysis, text-mining, and web-mining have taken on a central focus in the KDD community. However, there is a significant amount of innovative data mining work taking place in the context of scientific and engineering applications that is not well represented in the mainstream KDD conferences. For example, scientific data mining techniques are being developed and applied to diverse fields such as remote sensing, physics, chemistry, biology, astronomy, structural mechanics, computational fluid dynamics, etc. In these areas, data mining frequently complements and enhances existing analysis methods based on statistics, exploratory data analysis, and domain-specific approaches. On the surface, it may appear that data from one scientific field, say genomics, is very different from another field, such as physics. However, despite their diversity, there is much that is common across the mining of scientific and engineering data. For example, techniques used to identify objects in images are very similar, regardless of whether the images came from a remote sensing application, a physics experiment, an astronomy observation, or a medical study. Further, with data mining being applied to new types of data, such as mesh data from scientific simulations, there is the opportunity to apply and extend data mining to new scientific domains. This one-day workshop brings together data miners analyzing science data and scientists from diverse fields to share their experiences, learn how techniques developed in one field can be applied in another, and better understand some of the newer techniques being developed in the KDD community. This is the fourth workshop on the topic of Mining Scientific Data sets; for information on earlier workshops, see http://www.ahpcrc.org/conferences/. This workshop continues the tradition of addressing challenging problems in a field where the diversity of applications is

  1. Differentially Private Frequent Subgraph Mining

    PubMed Central

    Xu, Shengzhi; Xiong, Li; Cheng, Xiang; Xiao, Ke

    2016-01-01

    Mining frequent subgraphs from a collection of input graphs is an important topic in data mining research. However, if the input graphs contain sensitive information, releasing frequent subgraphs may pose considerable threats to individuals' privacy. In this paper, we study the problem of frequent subgraph mining (FGM) under the rigorous differential privacy model. We introduce a novel differentially private FGM algorithm, which is referred to as DFG. In this algorithm, we first privately identify frequent subgraphs from input graphs, and then compute the noisy support of each identified frequent subgraph. In particular, to privately identify frequent subgraphs, we present a frequent subgraph identification approach which can improve the utility of frequent subgraph identifications through candidate pruning. Moreover, to compute the noisy support of each identified frequent subgraph, we devise a lattice-based noisy support derivation approach, where a series of methods has been proposed to improve the accuracy of the noisy supports. Through formal privacy analysis, we prove that our DFG algorithm satisfies ε-differential privacy. Extensive experimental results on real datasets show that the DFG algorithm can privately find frequent subgraphs with high data utility. PMID:27616876
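
    For readers unfamiliar with noisy supports, the Python sketch below shows the textbook Laplace mechanism applied to a single support count. It is not the lattice-based derivation of DFG, whose details the abstract does not give; the function names and the assumed sensitivity of 1 (adding or removing one input graph changes a support count by at most one) are ours.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution."""
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace distribution with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_support(true_support, epsilon, sensitivity=1.0, rng=None):
    """Return the support count perturbed with Laplace(sensitivity / epsilon) noise."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    return true_support + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical example: a subgraph that occurs in 120 input graphs.
rng = random.Random(0)
print(noisy_support(120, epsilon=0.5, rng=rng))
```

    A smaller epsilon gives stronger privacy but a larger noise scale, which is the utility trade-off that the accuracy-improving methods mentioned in the abstract aim to soften.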

  2. Enriching consumer health vocabulary through mining a social Q&A site: a similarity-based approach.

    PubMed

    He, Zhe; Chen, Zhiwei; Oh, Sanghee; Hou, Jinghui; Bian, Jiang

    2017-03-27

    The widely known vocabulary gap between health consumers and healthcare professionals hinders information seeking and health dialogue of consumers on end-user health applications. The Open Access and Collaborative Consumer Health Vocabulary (OAC CHV), which contains health-related terms used by lay consumers, has been created to bridge such a gap. Specifically, the OAC CHV facilitates consumers' health information retrieval by enabling consumer-facing health applications to translate between professional language and consumer friendly language. To keep up with the constantly evolving medical knowledge and language use, new terms need to be identified and added to the OAC CHV. User-generated content on social media, including social question and answer (social Q&A) sites, affords us an enormous opportunity to mine consumer health terms. Existing methods of identifying new consumer terms from text typically use ad-hoc lexical syntactic patterns and human review. Our study extends an existing method by extracting n-grams from a social Q&A textual corpus and representing them with a rich set of contextual and syntactic features. Using K-means clustering, our method, simiTerm, was able to identify terms that are both contextually and syntactically similar to the existing OAC CHV terms. We tested our method on social Q&A corpora on two disease domains: diabetes and cancer. Our method outperformed three baseline ranking methods. A post-hoc qualitative evaluation by human experts further validated that our method can effectively identify meaningful new consumer terms on social Q&A.

  3. Visual Data Mining: An Exploratory Approach to Analyzing Temporal Patterns of Eye Movements

    ERIC Educational Resources Information Center

    Yu, Chen; Yurovsky, Daniel; Xu, Tian

    2012-01-01

    Infant eye movements are an important behavioral resource to understand early human development and learning. But the complexity and amount of gaze data recorded from state-of-the-art eye-tracking systems also pose a challenge: how does one make sense of such dense data? Toward this goal, this article describes an interactive approach based on…

  4. A multivariate and stochastic approach to identify key variables to rank dairy farms on profitability.

    PubMed

    Atzori, A S; Tedeschi, L O; Cannas, A

    2013-05-01

    The economic efficiency of dairy farms is the main goal of farmers. The objective of this work was to use routinely available information at the dairy farm level to develop an index of profitability to rank dairy farms and to assist the decision-making process of farmers to increase the economic efficiency of the entire system. A stochastic modeling approach was used to study the relationships between inputs and profitability (i.e., income over feed cost; IOFC) of dairy cattle farms. The IOFC was calculated as: milk revenue + value of male calves + culling revenue - herd feed costs. Two databases were created. The first one was a development database, which was created from technical and economic variables collected in 135 dairy farms. The second one was a synthetic database (sDB) created from 5,000 synthetic dairy farms using the Monte Carlo technique and based on the characteristics of the development database data. The sDB was used to develop a ranking index as follows: (1) principal component analysis (PCA), excluding IOFC, was used to identify principal components (sPC); and (2) coefficient estimates of a multiple regression of the IOFC on the sPC were obtained. Then, the eigenvectors of the sPC were used to compute the principal component values for the original 135 dairy farms that were used with the multiple regression coefficient estimates to predict IOFC (dRI; ranking index from development database). The dRI was used to rank the original 135 dairy farms. The PCA explained 77.6% of the sDB variability and 4 sPC were selected. The sPC were associated with herd profile, milk quality and payment, poor management, and reproduction based on the significant variables of the sPC. The mean IOFC in the sDB was 0.1377 ± 0.0162 euros per liter of milk (€/L). The dRI explained 81% of the variability of the IOFC calculated for the 135 original farms. When the number of farms below and above 1 standard deviation (SD) of the dRI were calculated, we found that 21

  5. Evaluation of different approaches for identifying optimal sites to predict mean hillslope soil moisture content

    NASA Astrophysics Data System (ADS)

    Liao, Kaihua; Zhou, Zhiwen; Lai, Xiaoming; Zhu, Qing; Feng, Huihui

    2017-04-01

    The identification of representative soil moisture sampling sites is important for the validation of remotely sensed mean soil moisture in a certain area and ground-based soil moisture measurements in catchment or hillslope hydrological studies. Numerous approaches have been developed to identify optimal sites for predicting mean soil moisture. Each method has certain advantages and disadvantages, but they have rarely been evaluated and compared. In our study, surface (0-20 cm) soil moisture data from January 2013 to March 2016 (a total of 43 sampling days) were collected at 77 sampling sites on a mixed land-use (tea and bamboo) hillslope in the hilly area of Taihu Lake Basin, China. A total of 10 methods (temporal stability (TS) analyses based on 2 indices, K-means clustering based on 6 kinds of inputs and 2 random sampling strategies) were evaluated for determining optimal sampling sites for mean soil moisture estimation. They were TS analyses based on the smallest index of temporal stability (ITS, a combination of the mean relative difference and standard deviation of relative difference (SDRD)) and based on the smallest SDRD, K-means clustering based on soil properties and terrain indices (EFs), repeated soil moisture measurements (Theta), EFs plus one-time soil moisture data (EFsTheta), and the principal components derived from EFs (EFs-PCA), Theta (Theta-PCA), and EFsTheta (EFsTheta-PCA), and global and stratified random sampling strategies. Results showed that the TS based on the smallest ITS was better (RMSE = 0.023 m³ m⁻³) than that based on the smallest SDRD (RMSE = 0.034 m³ m⁻³). The K-means clustering based on EFsTheta (-PCA) was better (RMSE < 0.020 m³ m⁻³) than those based on EFs (-PCA) and Theta (-PCA). The sampling design stratified by the land use was more efficient than the global random method. Forty and 60 sampling sites are needed for stratified sampling and global sampling respectively to make their performances comparable to the best K

  6. Novel approach to identifying the hepatitis B virus pre-S deletions associated with hepatocellular carcinoma

    PubMed Central

    Zhao, Zhi-Mei; Jin, Yan; Gan, Yu; Zhu, Yu; Chen, Tao-Yang; Wang, Jin-Bing; Sun, Yan; Cao, Zhi-Gang; Qian, Geng-Sun; Tu, Hong

    2014-01-01

    AIM: To develop a novel non-sequencing method for the detection of hepatitis B virus (HBV) pre-S deletion mutants in HBV carriers. METHODS: The entire region of HBV pre-S1 and pre-S2 was amplified by polymerase chain reaction (PCR). The size of PCR products was subsequently determined by capillary gel electrophoresis (CGE). CGE was carried out in a PACE-MDQ instrument equipped with a UV detector set at 254 nm. The samples were separated in 50 μm ID eCAP Neutral Coated Capillaries using a voltage of 6 kV for 30 min. Data acquisition and analysis were performed using the 32 Karat Software. A total of 114 DNA clones containing different sizes of the HBV pre-S gene were used to determine the accuracy of the CGE method. One hundred and fifty-seven hepatocellular carcinoma (HCC) and 160 non-HCC patients were recruited into the study to assess the association between HBV pre-S deletion and HCC by using the newly-established CGE method. Nine HCC cases with HBV pre-S deletion at the diagnosis year were selected to conduct a longitudinal observation using serial serum samples collected 2-9 years prior to HCC diagnosis. RESULTS: CGE allowed the separation of PCR products differing in size > 3 bp and was able to identify 10% of the deleted DNA in a background of wild-type DNA. The accuracy rate of CGE-based analysis was 99.1% compared with the clone sequencing results. Using this assay, pre-S deletion was more frequently found in HCC patients than in non-HCC controls (47.1% vs 28.1%, P < 0.001). Interestingly, the increased risk of HCC was mainly contributed by the short deletion of pre-S. While the deletion ≤ 99 bp was associated with a 2.971-fold increased risk of HCC (95%CI: 1.723-5.122, P < 0.001), large deletion (> 99 bp) did not show any association with HCC (P = 0.918, OR = 0.966, 95%CI: 0.501-1.863). Of the 9 patients who carried pre-S deletions at the stage of HCC, 88.9% (8/9) had deletions 2-5 years prior to HCC, while only 44.4% (4/9) contained such deletions 6

  7. Mining Students' Learning Patterns and Performance in Web-Based Instruction: A Cognitive Style Approach

    ERIC Educational Resources Information Center

    Chen, Sherry Y.; Liu, Xiaohui

    2011-01-01

    Personalization has been widely used in Web-based instruction (WBI). To deliver effective personalization, there is a need to understand different preferences of each student. Cognitive style has been identified as one of the most pertinent factors that affect students' learning preferences. Therefore, it is essential to investigate how learners…

  8. A Data Mining Approach to Improve Re-Accessibility and Delivery of Learning Knowledge Objects

    ERIC Educational Resources Information Center

    Sabitha, Sai; Mehrotra, Deepti; Bansal, Abhay

    2014-01-01

    Today Learning Management Systems (LMS) have become an integral part of the learning mechanism of both learning institutes and industry. A Learning Object (LO) can be one of the atomic components of an LMS. A large amount of research is conducted into identifying benchmarks for creating Learning Objects. Some of the major concerns associated with LO are…

  9. Identifying Behavioral Barriers to Campus Sustainability: A Multi-Method Approach

    ERIC Educational Resources Information Center

    Horhota, Michelle; Asman, Jenni; Stratton, Jeanine P.; Halfacre, Angela C.

    2014-01-01

    Purpose: The purpose of this paper is to assess the behavioral barriers to sustainable action in a campus community. Design/methodology/approach: This paper reports three different methodological approaches to the assessment of behavioral barriers to sustainable actions on a college campus. Focus groups and surveys were used to assess campus…

  10. A Hybrid Knowledge-Based and Data-Driven Approach to Identifying Semantically Similar Concepts

    PubMed Central

    Pivovarov, Rimma; Elhadad, Noémie

    2012-01-01

    An open research question when leveraging ontological knowledge is when to treat different concepts separately from each other and when to aggregate them. For instance, concepts for the terms "paroxysmal cough" and "nocturnal cough" might be aggregated in a kidney disease study, but should be left separate in a pneumonia study. Determining whether two concepts are similar enough to be aggregated can help build better datasets for data mining purposes and avoid signal dilution. Quantifying the similarity among concepts is a difficult task, however, in part because such similarity is context-dependent. We propose a comprehensive method, which computes a similarity score for a concept pair by combining data-driven and ontology-driven knowledge. We demonstrate our method on concepts from SNOMED-CT and on a corpus of clinical notes of patients with chronic kidney disease. By combining information from usage patterns in clinical notes and from ontological structure, the method can prune out concepts that are simply related from those which are semantically similar. When evaluated against a list of concept pairs annotated for similarity, our method reaches an AUC (area under the curve) of 92%. PMID:22289420
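
    The AUC reported above can be read as the probability that a randomly chosen similar pair receives a higher score than a randomly chosen non-similar pair. The Python sketch below computes it by direct pairwise comparison; the scores and labels are made-up illustrations, not data from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical similarity scores for six concept pairs;
# label 1 means annotators judged the pair similar.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(round(auc(scores, labels), 3))  # prints 0.889
```

    In this toy example eight of the nine positive-negative pairs are ordered correctly, giving 8/9; an AUC of 0.5 would correspond to a score that ranks pairs no better than chance.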

  11. Combined mining: discovering informative knowledge in complex data.

    PubMed

    Cao, Longbing; Zhang, Huaifeng; Zhao, Yanchang; Luo, Dan; Zhang, Chengqi

    2011-06-01

    Enterprise data mining applications often involve complex data such as multiple large heterogeneous data sources, user preferences, and business impact. In such situations, a single method or one-step mining is often limited in discovering informative knowledge. It would also be very time and space consuming, if not impossible, to join relevant large data sources for mining patterns consisting of multiple aspects of information. It is crucial to develop effective approaches for mining patterns combining necessary information from multiple relevant business lines, catering for real business settings and decision-making actions rather than just providing a single line of patterns. The recent years have seen increasing efforts on mining more informative patterns, e.g., integrating frequent pattern mining with classifications to generate frequent pattern-based classifiers. Rather than presenting a specific algorithm, this paper builds on our existing works and proposes combined mining as a general approach to mining for informative patterns combining components from either multiple data sets or multiple features or by multiple methods on demand. We summarize general frameworks, paradigms, and basic processes for multifeature combined mining, multisource combined mining, and multimethod combined mining. Novel types of combined patterns, such as incremental cluster patterns, can result from such frameworks, which cannot be directly produced by the existing methods. A set of real-world case studies has been conducted to test the frameworks, with some of them briefed in this paper. They identify combined patterns for informing government debt prevention and improving government service objectives, which show the flexibility and instantiation capability of combined mining in discovering informative knowledge in complex data.

  12. Identifying Creatively Gifted Students: Necessity of a Multi-Method Approach

    ERIC Educational Resources Information Center

    Ambrose, Laura; Machek, Greg R.

    2015-01-01

    The process of identifying students as creatively gifted provides numerous challenges for educators. Although many schools assess for creativity in identifying students for gifted and talented services, the relationship between creativity and giftedness is often not fully understood. This article reviews commonly used methods of creativity…

  13. Multifunctional greenway approach for landscape planning and reclamation of a post-mining district: Cartagena-La Unión, SE Spain

    NASA Astrophysics Data System (ADS)

    Acosta, Jose A.; Faz, Ángel; Zornoza, Raúl; Martínez-Martínez, Silvia; Kabas, Sebla; Bech, Jaume

    2015-04-01

    Fragmented structures create metaphorical wounds in the landscape, altering the ecological and cultural processes associated with it, as can be seen in many mine areas. It is therefore advisable to draw up the reclamation plan at the start of mine operation, so as to provide spatial and functional integration of the landscape based on scientific arguments and with all possible legal and administrative means, which is generally the case of the Strategic Environmental Assessment. However, there are many abandoned mine areas where no reclamation plan has been carried out, such as the Mining District of Sierra Minera Cartagena-La Unión, SE Spain. In these cases it is vital to respond in a sustainable manner to heal the landscape wounds of post-mining activities. Reclamation of a post-mining district includes not only the mine soils but also all land uses around them; for this reason it is necessary to create practical solutions for restoring the ecological and cultural processes of the area. The greenway approach shows the main veins that are crucial for keeping alive and sustaining these processes. The main objectives of this study are therefore to 1) develop an integrated local greenway network able to preserve significant resources and values of the district, and 2) develop this greenway network as part of the reclamation process for degraded areas. Landscape assessments revealed the most valuable and potential connectivity resources of the area. These clustering and linear patterns of resource concentrations include mountain ranges and valleys, the natural drainage network, legally protected areas and cultural-historical resources. Conservation areas, cultural-educational resources of post-mining activities and the riverbeds have been the main building blocks for the greenway corridor. The multifunctional greenway approach serves as a landscape reclamation and planning tool in a degraded area by showing the priority zones for

  14. Renewed mining and reclamation: Impacts on bats and potential mitigation

    SciTech Connect

    Brown, P.E.; Berry, R.D.

    1997-12-31

    Historic mining created new roosting habitat for many bat species. Now the same industry has the potential to adversely impact bats. Contemporary mining operations usually occur in historic districts; consequently the old workings are destroyed by open pit operations. Occasionally, underground techniques are employed, resulting in the enlargement or destruction of the original workings. Even during exploratory operations, historic mine openings can be covered as drill roads are bulldozed, or drills can penetrate and collapse underground workings. Nearby blasting associated with mine construction and operation can disrupt roosting bats. Bats can also be disturbed by the entry of mine personnel to collect ore samples or by recreational mine explorers, since the creation of roads often results in easier access. In addition to roost disturbance, other aspects of renewed mining can have adverse impacts on bat populations, and affect even those bats that do not live in mines. Open cyanide ponds, or other water in which toxic chemicals accumulate, can poison bats and other wildlife. The creation of the pits, roads and processing areas often destroys critical foraging habitat or changes drainage patterns. Finally, at the completion of mining, any historic mines still open may be sealed as part of closure and reclamation activities. The net result can be a loss of bats and bat habitat. Conversely, in some contemporary underground operations, future roosting habitat for bats can be fabricated. An experimental approach to the creation of new roosting habitat is to bury culverts or old tires beneath waste rock. Mining companies can mitigate impacts on bats by surveying to identify bat-roosting habitat, removing bats prior to renewed mining or closure, protecting non-impacted roost sites with gates and fences, researching to identify habitat requirements and creating new artificial roosts.

  15. Two-step web-mining approach to study geology/geophysics-related open-source software projects

    NASA Astrophysics Data System (ADS)

    Behrends, Knut; Conze, Ronald

    2013-04-01

    Geology/geophysics is a highly interdisciplinary science, overlapping with, for instance, physics, biology and chemistry. In today's software-intensive work environments, geoscientists often encounter new open-source software from scientific fields that are only remotely related to their own field of expertise. We show how web-mining techniques can help to carry out systematic discovery and evaluation of such software. In a first step, we downloaded ~500 abstracts (each consisting of ~1 kb UTF-8 text) from agu-fm12.abstractcentral.com. This web site hosts the abstracts of all publications presented at AGU Fall Meeting 2012, the world's largest annual geology/geophysics conference. All abstracts belonged to the category "Earth and Space Science Informatics", an interdisciplinary label cross-cutting many disciplines such as "deep biosphere", "atmospheric research", and "mineral physics". Each publication was represented by a highly structured record with ~20 short data attributes, the largest authorship-record being the unstructured "abstract" field. We processed texts of the abstracts with the statistics software "R" to calculate a corpus and a term-document matrix. Using R package "tm", we applied text-mining techniques to filter data and develop hypotheses about software-development activities happening in various geology/geophysics fields. Analyzing the term-document matrix with basic techniques (e.g., word frequencies, co-occurrences, weighting) as well as more complex methods (clustering, classification), several key pieces of information were extracted. For example, text-mining can be used to identify scientists who are also developers of open-source scientific software, and the names of their programming projects and codes can also be identified. In a second step, based on the intermediate results found by processing the conference-abstracts, any new hypotheses can be tested in another webmining subproject: by merging the dataset with open data from github
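
    The term-document matrix mentioned in this record can be illustrated with a minimal sketch. The authors worked in R with the "tm" package; the Python below is only a stand-in written for illustration, and the three sample abstracts, the tokenizer rule and the function names are all assumptions, not material from the study.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep word-like tokens (hyphens kept, e.g. 'open-source')."""
    return re.findall(r"[a-z][a-z0-9+-]*", text.lower())


def term_document_matrix(docs):
    """Return (sorted terms, {term: [count in doc 0, count in doc 1, ...]})."""
    counts = [Counter(tokenize(d)) for d in docs]
    terms = sorted(set().union(*counts))
    matrix = {t: [c[t] for c in counts] for t in terms}
    return terms, matrix


def document_frequency(matrix):
    """Number of documents in which each term occurs at least once."""
    return {t: sum(1 for n in row if n > 0) for t, row in matrix.items()}


# Hypothetical sample abstracts, invented for this sketch.
abstracts = [
    "We released an open-source Python package on GitHub for seismic data.",
    "An R package for mineral physics data is available as open-source software.",
    "Field observations of the deep biosphere.",
]

terms, tdm = term_document_matrix(abstracts)
df = document_frequency(tdm)

# Abstracts that mention open-source software are candidates for the second step.
software_docs = [i for i, n in enumerate(tdm["open-source"]) if n > 0]
print(software_docs)
```

    Word frequencies and co-occurrences can both be read off such a matrix: row sums give term frequencies, and two terms co-occur in a document when both of their rows are non-zero in the same column.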

  16. Novel approach for identifying key residues in enzymatic reactions: proton abstraction in ketosteroid isomerase.

    PubMed

    Ito, Mika; Brinck, Tore

    2014-11-20

    We propose a computationally efficient approach for evaluating the individual contributions of many different residues to the catalytic efficiency of an enzymatic reaction. This approach is based on the fragment molecular orbital (FMO) method, and it defines the energy of a deletion form, i.e., the energy of the system when a particular residue is deleted. Using this approach, we found that, among 10 investigated residues, three, Tyr14, Asp99, and Tyr55, in this order, significantly reduce the activation energy of the proton abstraction from a substrate, cyclopent-2-enone, catalyzed by ketosteroid isomerase (KSI). The relative activation energies estimated in this study are in good agreement with available previous experimental and theoretical data obtained for the similar proton abstraction with a native substrate and substitution mutants of KSI. It was thus indicated that the new approach is efficient for rationally evaluating the catalytic effects of multiple residues on an enzymatic reaction.

  17. Big data mining powers fungal research: recent advances in fission yeast systems biology approaches.

    PubMed

    Wang, Zhe

    2016-10-11

    Biology research has entered the big data era. Systems biology approaches have therefore become powerful tools for obtaining the whole landscape of how cells separate, grow, and resist stresses. The fission yeast Schizosaccharomyces pombe is a wonderful unicellular eukaryote model; studying its division and metabolism in particular can facilitate understanding of the molecular mechanisms of cancer and the discovery of anticancer agents. In this perspective, we discuss recent advanced fission yeast systems biology tools, focusing mainly on metabolomics profiling and metabolic modeling, the protein-protein interactome and genetic interaction network, DNA sequencing and applications, and high-throughput phenotypic screening. We hope this review can be useful for interested fungal researchers as well as bioinformaticians.

  18. An approach to developing independent learning and non-technical skills amongst final year mining engineering students

    NASA Astrophysics Data System (ADS)

    Knobbs, C. G.; Grayson, D. J.

    2012-06-01

    There is mounting evidence to show that engineers need more than technical skills to succeed in industry. This paper describes a curriculum innovation in which so-called 'soft' skills, specifically inter-personal and intra-personal skills, were integrated into a final year mining engineering course. The instructional approach was designed to promote independent learning and to develop non-technical skills, essential for students on the threshold of becoming practising engineers. Three psychometric tests were administered at the beginning of the course to make students aware of their own and their classmates' characteristics. Substantial prescribed reading assignments preceded weekly group discussions. Several projects during the course required team work skills and application of content knowledge to real-world contexts. Results obtained from students' reflection papers, assignments related to 'soft' skills and end of course evaluations suggest that students' appreciation of the need for these skills, as well as their own perceived competence, increased during the course. Their ability to function as independent learners also increased.

  19. A coclustering approach for mining large protein-protein interaction networks.

    PubMed

    Pizzuti, Clara; Rombo, Simona E

    2012-01-01

    Several approaches have been presented in the literature to cluster Protein-Protein Interaction (PPI) networks. They can be grouped in two main categories: those allowing a protein to participate in different clusters and those generating only nonoverlapping clusters. In both cases, a challenging task is to find a suitable compromise between the biological relevance of the results and a comprehensive coverage of the analyzed networks. Indeed, methods returning highly accurate results are often able to cover only small parts of the input PPI network, especially when low-characterized networks are considered. We present a coclustering-based technique able to generate both overlapping and nonoverlapping clusters. The density of the clusters to search for can also be set by the user. We tested our method on the two networks of yeast and human, and compared it to five other well-known techniques on the same interaction data sets. The results showed that, for all the examples considered, our approach always reaches a good compromise between accuracy and network coverage. Furthermore, the behavior of our algorithm is not influenced by the structure of the input network, unlike all the techniques considered in the comparison, which returned very good results on the yeast network but rather poor outcomes on the human network.

  20. Using a watershed-centric approach to identify potentially impacted beaches

    EPA Science Inventory

    Beaches can be affected by a variety of contaminants. Of particular concern are beaches impacted by human fecal contamination and urban runoff. This poster demonstrates a methodology to identify potentially impacted beaches using Geographic Information Systems (GIS). Since h...

  1. Prediction of possible CaMnO3 modifications using an ab initio minimization data-mining approach.

    PubMed

    Zagorac, Jelena; Zagorac, Dejan; Zarubica, Aleksandra; Schön, J Christian; Djuris, Katarina; Matovic, Branko

    2014-10-01

    We have performed a crystal structure prediction study of CaMnO3 focusing on structures generated by octahedral tilting according to group-subgroup relations from the ideal perovskite type ($Pm\overline{3}m$), which is the aristotype of the experimentally known CaMnO3 compound in the Pnma space group. Furthermore, additional structure candidates have been obtained using data mining. For each of the structure candidates, a local optimization on the ab initio level using density-functional theory (LDA, hybrid B3LYP) and the Hartree-Fock (HF) method was performed, and we find that several of the modifications may be experimentally accessible. In the high-pressure regime, we identify a post-perovskite phase in the CaIrO3 type, not previously observed in CaMnO3. Similarly, calculations at effective negative pressure predict a phase transition from the orthorhombic perovskite to an ilmenite-type (FeTiO3) modification of CaMnO3.

  2. A Simple Screening Approach To Prioritize Genes for Functional Analysis Identifies a Role for Interferon Regulatory Factor 7 in the Control of Respiratory Syncytial Virus Disease

    PubMed Central

    McDonald, Jacqueline U.; Kaforou, Myrsini; Clare, Simon; Hale, Christine; Ivanova, Maria; Huntley, Derek; Dorner, Marcus; Wright, Victoria J.; Levin, Michael; Martinon-Torres, Federico; Herberg, Jethro A.

    2016-01-01

    ABSTRACT Greater understanding of the functions of host gene products in response to infection is required. While many of these genes enable pathogen clearance, some enhance pathogen growth or contribute to disease symptoms. Many studies have profiled transcriptomic and proteomic responses to infection, generating large data sets, but selecting targets for further study is challenging. Here we propose a novel data-mining approach combining multiple heterogeneous data sets to prioritize genes for further study by using respiratory syncytial virus (RSV) infection as a model pathogen with a significant health care impact. The assumption was that the more frequently a gene is detected across multiple studies, the more important its role is. A literature search was performed to find data sets of genes and proteins that change after RSV infection. The data sets were standardized, collated into a single database, and then panned to determine which genes occurred in multiple data sets, generating a candidate gene list. This candidate gene list was validated by using both a clinical cohort and in vitro screening. We identified several genes that were frequently expressed following RSV infection with no assigned function in RSV control, including IFI27, IFIT3, IFI44L, GBP1, OAS3, IFI44, and IRF7. Drilling down into the function of these genes, we demonstrate a role in disease for the gene for interferon regulatory factor 7, which was highly ranked on the list, but not for IRF1, which was not. Thus, we have developed and validated an approach for collating published data sets into a manageable list of candidates, identifying novel targets for future analysis. IMPORTANCE Making the most of “big data” is one of the core challenges of current biology. There is a large array of heterogeneous data sets of host gene responses to infection, but these data sets do not inform us about gene function and require specialized skill sets and training for their utilization. Here we
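
    The prioritization idea in this record (a gene seen in more data sets ranks higher) can be sketched as a simple occurrence count. This is an illustrative assumption about the counting step only, not the authors' pipeline: the study names and the membership of each gene list below are invented, the gene symbols are borrowed from the abstract purely as labels, and `rank_genes` and `min_datasets` are hypothetical names.

```python
from collections import Counter


def rank_genes(datasets, min_datasets=2):
    """Rank genes by how many data sets report them.

    datasets: {data set name: iterable of gene symbols}.
    Symbols are standardized (stripped, upper-cased) and counted once per data set.
    Returns [(gene, n_datasets)] for genes present in at least min_datasets sets.
    """
    counts = Counter()
    for genes in datasets.values():
        counts.update({g.strip().upper() for g in genes})
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(gene, n) for gene, n in ranked if n >= min_datasets]


# Hypothetical gene lists, invented for this sketch.
datasets = {
    "study_a": ["IRF7", "IFI27", "OAS3"],
    "study_b": ["irf7", "IFI27", "GBP1"],
    "study_c": ["IRF7", "IRF1"],
}

candidates = rank_genes(datasets)
print(candidates)
```

    Standardizing symbols before counting matters because the same gene may be spelled differently across published data sets; a real collation step would also need to map aliases and protein identifiers to a common nomenclature.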

  3. A Distributed Ensemble Approach for Mining Healthcare Data under Privacy Constraints

    PubMed Central

    Li, Yan; Bai, Changxin; Reddy, Chandan K.

    2015-01-01

    In recent years, electronic health records (EHRs) have been widely adopted at many healthcare facilities in an attempt to improve the quality of patient care and increase the productivity and efficiency of healthcare delivery. These EHRs can help diagnose diseases accurately if utilized appropriately. While the EHRs can potentially resolve many of the existing problems associated with disease diagnosis, one of the main obstacles to using them effectively is the patient privacy and sensitivity of the medical information available in the EHR. Due to these concerns, even if the EHRs are available for storage and retrieval purposes, sharing of the patient records between different healthcare facilities has become a major concern and has hampered some of the effective advantages of using EHRs. Due to this lack of data sharing, most of the facilities aim at building clinical decision support systems using a limited amount of patient data from their own EHR systems to provide important diagnosis-related decisions. It becomes quite infeasible for a newly established healthcare facility to build a robust decision making system due to the lack of sufficient patient records. However, to make effective decisions from clinical data, it is indispensable to have large amounts of data to train the decision models. In this regard, there are conflicting objectives of preserving patient privacy and having sufficient data for modeling and decision making. To handle such disparate goals, we develop two adaptive distributed privacy-preserving algorithms based on a distributed ensemble strategy. The basic idea of our approach is to build an elegant model for each participating facility to accurately learn the data distribution, and then to transfer the useful healthcare knowledge acquired from the participants' data in the form of their own decision models, without revealing or sharing the patient-level sensitive data, thus protecting patient privacy. We demonstrate that our
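
    The general shape of a distributed ensemble (each facility trains on its own records and shares only a model) can be sketched as follows. This is not either of the authors' two algorithms, which the abstract does not specify; the nearest-centroid classifier, the unweighted majority vote, the function names and all toy records below are assumptions chosen to keep the example short.

```python
from collections import Counter


def train_centroids(rows):
    """Train a nearest-centroid model on one facility's local data.

    rows: list of (feature list, label). Returns {label: centroid}.
    Only this summary, not the patient-level rows, would leave the facility.
    """
    sums, counts = {}, Counter()
    for x, y in rows:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(model, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(model, key=lambda label: dist2(model[label]))


def ensemble_predict(models, x):
    """Combine the facilities' models by unweighted majority vote."""
    votes = Counter(predict(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical local data sets (two features, binary label), invented for this sketch.
facility_a = [([1.0, 1.0], 0), ([1.2, 0.8], 0), ([3.0, 3.0], 1), ([3.2, 2.8], 1)]
facility_b = [([0.8, 1.1], 0), ([2.9, 3.3], 1)]
facility_c = [([1.1, 1.3], 0), ([1.0, 0.9], 0), ([3.4, 3.1], 1)]

# Each facility trains locally; a new facility receives only the three models.
models = [train_centroids(f) for f in (facility_a, facility_b, facility_c)]
print(ensemble_predict(models, [1.0, 1.0]), ensemble_predict(models, [3.0, 3.0]))
```

    An adaptive scheme would presumably weight or select models according to how well they fit the receiving facility's data; the flat vote here ignores that.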

  4. A Bayesian approach to identifying structural nonlinearity using free-decay response: Application to damage detection in composites

    USGS Publications Warehouse

    Nichols, J.M.; Link, W.A.; Murphy, K.D.; Olson, C.C.

    2010-01-01

    This work discusses a Bayesian approach to approximating the distribution of parameters governing nonlinear structural systems. Specifically, we use a Markov Chain Monte Carlo method for sampling the posterior parameter distributions, thus producing both point and interval estimates for parameters. The method is first used to identify both linear and nonlinear parameters in a multiple degree-of-freedom structural system using free-decay vibrations. The approach is then applied to the problem of identifying the location, size, and depth of delamination in a model composite beam. The influence of additive Gaussian noise on the response data is explored with respect to the quality of the resulting parameter estimates.

  5. Mine waste technology program

    SciTech Connect

    Wilmoth, R.C.; Powers, T.J.

    1995-10-01

    The Mine Waste Technology Program (MWTP) was initiated to address mining waste generated by active and inactive mining production facilities. In June 1991, an Interagency Agreement was signed between the U.S. Environmental Protection Agency and the Department of Energy which outlined the following activities: To identify and prioritize treatment technologies as candidates for demonstration projects; To propose and conduct large pilot-/field-scale demonstration projects of several innovative technologies that show promise for cost effectively remediating local, regional, and national mine waste problems.

  6. Identifying sharks with DNA barcodes: assessing the utility of a nucleotide diagnostic approach.

    PubMed

    Wong, Eugene H-K; Shivji, Mahmood S; Hanner, Robert H

    2009-05-01

    Shark fisheries worldwide are mostly unmanaged, but the burgeoning shark fin industry in the last few decades has made monitoring catch and trade of these animals critical. As a tool for molecular species identification, DNA barcoding offers significant potential. However, the genetic distance-based approach towards species identification employed by the Barcode of Life Data Systems may oftentimes lack the specificity needed for regulatory or legal applications that require unambiguous identification results. This is because such specificity is not typically realized by anything less than a 100% match of the query sequence to an entry in the reference database using genetic distance. Although various divergence thresholds have been proposed to define acceptable levels of intraspecific variation, enough exceptions exist to cast reasonable doubt on many less than exact matches using a distance-based approach for the identification of unknowns. An alternative approach relies on the identification of discrete molecular characters that can be used to unambiguously diagnose species. The objective of this study was to assess the performance differences between these competing approaches by examining more than 1000 DNA barcodes representing nearly 20% of all known elasmobranch species. Our results demonstrate that a character-based, nucleotide diagnostic (ND) approach to barcode identification is feasible and also provides novel insights into the structure of haplotype diversity among closely related species of sharks. Considerations for the use of NDs in applied fields are also explored.
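
    The contrast between distance-based matching and a character-based (nucleotide diagnostic) approach can be made concrete with a toy alignment. The sketch below is an assumption-laden illustration, not the study's method or data: the sequences and species labels are invented, and a site is treated as diagnostic when one species is fixed for a base that no other species shows at that position.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def diagnostic_sites(reference):
    """Find nucleotide diagnostics per species.

    reference: {species: [aligned sequences]}.
    Returns {species: {position: base}} where the base is fixed within the
    species and absent from every other species at that position.
    """
    length = len(next(iter(reference.values()))[0])
    result = {sp: {} for sp in reference}
    for pos in range(length):
        states = {sp: {s[pos] for s in seqs} for sp, seqs in reference.items()}
        for sp, bases in states.items():
            if len(bases) != 1:
                continue  # variable within the species, so not diagnostic
            base = next(iter(bases))
            if all(base not in other for o, other in states.items() if o != sp):
                result[sp][pos] = base
    return result


def identify(query, diagnostics):
    """Return the species whose diagnostic characters are all present in the query."""
    return [sp for sp, sites in diagnostics.items()
            if sites and all(query[p] == b for p, b in sites.items())]


# Hypothetical reference barcodes, invented for this sketch.
reference = {
    "species_A": ["ACGTACGT", "ACGTACGA"],
    "species_B": ["ACGAACGT", "ACGAACGT"],
}
diagnostics = diagnostic_sites(reference)
query = "ACGAACGA"

print(p_distance(query, "ACGTACGA"), p_distance(query, "ACGAACGT"))
print(identify(query, diagnostics))
```

    In this toy case the query is equally distant from its nearest sequence in each species, so a distance-based match is ambiguous, while the single diagnostic character at position 3 assigns it to one species.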

  7. Impact of trace metals from past mining on the aquatic ecosystem: a multi-proxy approach in the Morvan (France).

    PubMed

    Camizuli, E; Monna, F; Scheifler, R; Amiotte-Suchet, P; Losno, R; Beis, P; Bohard, B; Chateau, C; Alibert, P

    2014-10-01

    This study seeks to determine to what extent trace metals resulting from past mining activities are transferred to the aquatic ecosystem, and whether such trace metals still exert deleterious effects on biota. Concentrations of Cd, Cu, Pb and Zn were measured in streambed sediments, transplanted bryophytes and wild brown trout. This study was conducted at two scales: (i) the entire Morvan Regional Nature Park and (ii) three small watersheds selected for their degree of contamination, based on the presence or absence of past mining sites. The overall quality of streambed sediments was assessed using Sediment Quality Indices (SQIs). According to these standard guidelines, more than 96% of the sediments sampled should not represent a threat to biota. Nonetheless, in watersheds where past mining occurred, SQIs are significantly lower. Transplanted bryophytes at these sites consistently present higher trace metal concentrations. For wild brown trout, the scaled mass and liver indices appear to be negatively correlated with liver Pb concentrations, but there are no obvious relationships between past mining and liver metal concentrations or the developmental instability of specimens. Although the impact of past mining and metallurgical works is apparently not as strong as that usually observed in modern mining sites, it is still traceable. For this reason, past mining sites should be monitored, particularly in protected areas erroneously thought to be free of anthropogenic contamination.

  8. Evaluation of an innovative approach based on prototype engineered wetland to control and manage boron (B) mine effluent pollution.

    PubMed

    Türker, Onur Can; Türe, Cengiz; Böcük, Harun; Yakar, Anıl; Chen, Yi

    2016-10-01

    A major environmental problem associated with boron (B) mining in many parts of the world is B pollution, which can become a point source of B mine effluent pollution to aquatic habitats. In this study, a cost-effective, environment-friendly, and sustainable prototype engineered wetland was evaluated and tested to prevent B mine effluent from spilling into adjoining waterways in the largest B reserve in the world. According to the results, average B concentrations in mine effluent significantly decreased from 17.5 to 5.7 mg l^-1 after passing through the prototype with a hydraulic retention time of 14 days. The results of the present experiment, in which different doses of B had been introduced into the prototype, also demonstrated that Typha latifolia (selected as donor species in the prototype) showed good resistance to changes in B mine effluent loading rates. Moreover, we found that soil enzyme activities gradually decreased with increasing B dosages during the experiment. The boron mass balance model further showed that 60% of total B was stored in the filtration media, and only 7% of B was removed by plant uptake. Consequently, we suggest that application of the prototype in the vicinity of a mining site may potentially become an innovative model and an integral part of the overall landscape plan of B mine reserve areas worldwide.
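
    The concentrations reported in this record imply a removal efficiency that is easy to work out. The short calculation below is a sketch using only the figures quoted in the abstract (17.5 and 5.7 mg per litre, 60% and 7%); the function name is hypothetical, and the "unattributed" share is simply what the two quoted mass-balance fractions leave over, not a value stated by the authors.

```python
def removal_efficiency(c_in, c_out):
    """Fraction of the inflow concentration removed: (c_in - c_out) / c_in."""
    if c_in <= 0:
        raise ValueError("inflow concentration must be positive")
    return (c_in - c_out) / c_in


# Average boron concentrations before and after the prototype wetland (mg per litre).
efficiency = removal_efficiency(17.5, 5.7)
print(f"removal efficiency: {efficiency:.1%}")

# Mass-balance shares quoted in the abstract.
stored_in_media = 0.60
plant_uptake = 0.07
unattributed = round(1.0 - stored_in_media - plant_uptake, 2)
print(f"share not attributed to media or plants: {unattributed:.0%}")
```

    The calculation assumes that inflow and outflow volumes are equal, so that the concentration ratio reflects mass removal; evaporation or dilution in a real wetland would change that.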

  9. What's Inside That Seed We Brew? A New Approach To Mining the Coffee Microbiome.

    PubMed

    Vaughan, Michael Joe; Mitchell, Thomas; McSpadden Gardener, Brian B

    2015-10-01

    Coffee is a critically important agricultural commodity for many tropical states and is a beverage enjoyed by millions of people worldwide. Recent concerns over the sustainability of coffee production have prompted investigations of the coffee microbiome as a tool to improve crop health and bean quality. This review synthesizes literature informing our knowledge of the coffee microbiome, with an emphasis on applications of fruit- and seed-associated microbes in coffee production and processing. A comprehensive inventory of microbial species cited in association with coffee fruits and seeds is presented as a reference tool for researchers investigating coffee-microbe associations. It concludes with a discussion of the approaches and techniques that provide a path forward to improve our understanding of the coffee microbiome and its utility, as a