Science.gov

Sample records for mining approach identifies

  1. A novel pattern mining approach for identifying cognitive activity in EEG based functional brain networks.

    PubMed

    Thilaga, M; Vijayalakshmi, R; Nadarajan, R; Nandagopal, D

    2016-06-01

    The complex nature of neuronal interactions of the human brain has posed many challenges to the research community. To explore the underlying mechanisms of neuronal activity of cohesive brain regions during different cognitive activities, many innovative mathematical and computational models are required. This paper presents a novel Common Functional Pattern Mining approach to demonstrate the similar patterns of interactions due to common behavior of certain brain regions. The electrode sites of EEG-based functional brain network are modeled as a set of transactions and node-based complex network measures as itemsets. These itemsets are transformed into a graph data structure called Functional Pattern Graph. By mining this Functional Pattern Graph, the common functional patterns due to specific brain functioning can be identified. The empirical analyses show the efficiency of the proposed approach in identifying the extent to which the electrode sites (transactions) are similar during various cognitive load states. PMID:27401999

  2. An integrative data mining approach to identifying adverse outcome pathway signatures.

    PubMed

    Oki, Noffisat O; Edwards, Stephen W

    2016-03-28

    The Adverse Outcome Pathway (AOP) framework is a tool for making biological connections and summarizing key information across different levels of biological organization to connect biological perturbations at the molecular level to adverse outcomes for an individual or population. Computational approaches to explore and determine these connections can accelerate the assembly of AOPs. By leveraging the wealth of publicly available data covering chemical effects on biological systems, computationally-predicted AOPs (cpAOPs) were assembled via data mining of high-throughput screening (HTS) in vitro data, in vivo data and other disease phenotype information. Frequent Itemset Mining (FIM) was used to find associations between the gene targets of ToxCast HTS assays and disease data from Comparative Toxicogenomics Database (CTD) by using the chemicals as the common aggregators between datasets. The method was also used to map gene expression data to disease data from CTD. A cpAOP network was defined by considering genes and diseases as nodes and FIM associations as edges. This network contained 18,283 gene to disease associations for the ToxCast data and 110,253 for CTD gene expression. Two case studies show the value of the cpAOP network by extracting subnetworks focused either on fatty liver disease or the Aryl Hydrocarbon Receptor (AHR). The subnetwork surrounding fatty liver disease included many genes known to play a role in this disease. When querying the cpAOP network with the AHR gene, an interesting subnetwork including glaucoma was identified. While substantial literature exists to support the potential for AHR ligands to elicit glaucoma, it was not explicitly captured in the public annotation information in CTD. The subnetwork from this analysis suggests a cpAOP that includes changes in CYP1B1 expression, which has been previously established in the literature as a primary cause of glaucoma. These case studies highlight the value in integrating multiple data
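
    The idea of using chemicals as the common aggregator between gene and disease data can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the toy chemical, gene, and disease names, the `gene_disease_edges` helper, and the `min_support` threshold are all assumptions made for illustration. Each chemical acts as one transaction, and a gene-disease pair becomes an edge when enough chemicals support it.

```python
from collections import Counter
from itertools import product

# Hypothetical toy data (placeholders, not taken from ToxCast or CTD):
# each chemical maps to the gene targets it hits and the diseases it is linked to.
chem_genes = {
    "chem_1": {"GENE_A", "GENE_B"},
    "chem_2": {"GENE_A"},
    "chem_3": {"GENE_B", "GENE_C"},
}
chem_diseases = {
    "chem_1": {"disease_X"},
    "chem_2": {"disease_X", "disease_Y"},
    "chem_3": {"disease_Y"},
}

def gene_disease_edges(chem_genes, chem_diseases, min_support=2):
    """Count how many chemicals support each (gene, disease) pair.

    Pairs supported by at least `min_support` chemicals are kept as edges.
    """
    support = Counter()
    # Only chemicals present in both datasets can link a gene to a disease.
    for chem in chem_genes.keys() & chem_diseases.keys():
        for pair in product(chem_genes[chem], chem_diseases[chem]):
            support[pair] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}

edges = gene_disease_edges(chem_genes, chem_diseases)
print(edges)
```

    In this toy input only the pair (GENE_A, disease_X) is supported by two chemicals, so it is the only edge retained; a real analysis would use a dedicated frequent itemset mining implementation and additional association measures.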

  3. USING PharmGKB TO TRAIN TEXT MINING APPROACHES FOR IDENTIFYING POTENTIAL GENE TARGETS FOR PHARMACOGENOMIC STUDIES

    PubMed Central

    PAKHOMOV, S.; MCINNES, B.T.; LAMBA, J.; LIU, Y.; MELTON, G.B.; GHODKE, Y.; BHISE, N.; LAMBA, V.; BIRNBAUM, A.K.

    2012-01-01

    The main objective of this study was to investigate the feasibility of using PharmGKB, a pharmacogenomic database, as a source of training data in combination with text of MEDLINE abstracts for a text mining approach to identification of potential gene targets for pathway-driven pharmacogenomics research. We used the manually curated relations between drugs and genes in the PharmGKB database to train a support vector machine predictive model and applied this model prospectively to MEDLINE abstracts. The gene targets suggested by this approach were subsequently manually reviewed. Our quantitative analysis showed that support vector machine classifiers trained on MEDLINE abstracts, with single words (unigrams) used as features and PharmGKB relations used for supervision, achieve an overall sensitivity of 85% and specificity of 69%. The subsequent qualitative analysis showed that gene targets “suggested” by the automatic classifier were not anticipated by expert reviewers but were subsequently found to be relevant to the three drugs that were investigated: carbamazepine, lamivudine and zidovudine. Our results show that this approach is not only feasible but may also find new gene targets not identifiable by other methods, thus making it a valuable tool for pathway-driven pharmacogenomics research. PMID:22564551
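
    For readers unfamiliar with the two reported metrics, the short Python sketch below shows how sensitivity and specificity are computed from binary labels. The function name and the example label vectors are hypothetical and are not data from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    sensitivity = tp / (tp + fn)  # fraction of real positives that were found
    specificity = tn / (tn + fp)  # fraction of real negatives that were rejected
    return sensitivity, specificity

# Hypothetical labels: 1 = abstract describes a drug-gene relation, 0 = it does not.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)
```

    A classifier with high sensitivity and lower specificity, as reported above, tends to miss few true relations at the cost of suggesting more candidates that need manual review.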

  4. Data mining approach identifies research priorities and data requirements for resolving the red algal tree of life

    PubMed Central

    2010-01-01

    Background The assembly of the tree of life has seen significant progress in recent years but algae and protists have been largely overlooked in this effort. Many groups of algae and protists have ancient roots and it is unclear how much data will be required to resolve their phylogenetic relationships for incorporation in the tree of life. The red algae, a group of primary photosynthetic eukaryotes more than a billion years old, provide the earliest fossil evidence for eukaryotic multicellularity and sexual reproduction. Despite this evolutionary significance, their phylogenetic relationships are understudied. This study aims to infer a comprehensive red algal tree of life at the family level from a supermatrix containing data mined from GenBank. We aim to locate remaining regions of low support in the topology, evaluate their causes and estimate the amount of data required to resolve them. Results Phylogenetic analysis of a supermatrix of 14 loci and 98 red algal families yielded the most complete red algal tree of life to date. Visualization of statistical support showed the presence of five poorly supported regions. Causes for low support were identified with statistics about the age of the region, data availability and node density, showing that poor support has different origins in different parts of the tree. Parametric simulation experiments yielded optimistic estimates of how much data will be needed to resolve the poorly supported regions (ca. 10^3 to ca. 10^4 nucleotides for the different regions). Nonparametric simulations gave a markedly more pessimistic image, some regions requiring more than 2.8 × 10^5 nucleotides or not achieving the desired level of support at all. The discrepancies between parametric and nonparametric simulations are discussed in light of our dataset and known attributes of both approaches. Conclusions Our study takes the red algae one step closer to meaningful inclusion in the tree of life. In addition to the recovery of stable

  5. Developing Isotope Tools for Identifying Mercury Mining Sources

    NASA Astrophysics Data System (ADS)

    Koster van Groos, P. G.; Esser, B. K.; Williams, R. W.; Hunt, J. R.

    2009-12-01

    Mining operations in California during the past two centuries have resulted in widespread mercury contamination. Source control strategies are difficult and expensive to implement, in part because links between specific mercury sources and exposures are often uncertain. Examination of mercury’s stable isotopes can help resolve this issue. Sources with distinct isotope compositions may be traced through the environment. Mercury mining operations are predicted to have led to waste tailings, mercury metal products, and air emissions with different isotope compositions as a result of inefficient mercury extraction and recovery from ores. The predicted differences in isotope composition, based on estimated kinetic and diffusion isotope effects, are greater than the precision of current analytical methods using multi-collector inductively coupled plasma mass-spectrometers (MC-ICP-MS). As such, mercury isotope measurements may help identify mercury originating from different mining operations. To support a mechanistic approach to mercury isotope fractionation, the isotope effects of diffusion through solids and gases are being investigated experimentally. Besides demonstrating the utility of mercury isotope analysis for source identification, this work is providing a mechanistic basis for differences in isotope compositions.

  6. A data mining approach to intelligence operations

    NASA Astrophysics Data System (ADS)

    Memon, Nasrullah; Hicks, David L.; Harkiolakis, Nicholas

    2008-03-01

    In this paper we examine the latest thinking, approaches and methodologies in use for finding the nuggets of information and subliminal (and perhaps intentionally hidden) patterns and associations that are critical to identify criminal activity and suspects to private and government security agencies. An emphasis in the paper is placed on Social Network Analysis and Investigative Data Mining, and the use of these technologies in the counterterrorism domain. Tools and techniques from both areas are described, along with the important tasks for which they can be used to assist with the investigation and analysis of terrorist organizations. The process of collecting data about these organizations is also considered along with the inherent difficulties that are involved.

  7. Implementation of an original approach on the Mines-Douai Comparative Reactivity Method (MD-CRM) instrument to identify part of the missing OH reactivity at an urban site

    NASA Astrophysics Data System (ADS)

    Dusanter, S.; Michoud, V.; Leonardis, T.; Riffault, V.; Zhang, S.; Locoge, N.

    2015-12-01

    Due to the large number of Volatile Organic Compounds (VOCs) expected in the atmosphere (10^4-10^5) (Goldstein and Galbally, ES&T, 2007), exhaustive measurements of VOCs appear to be currently unfeasible using common analytical techniques. In this context, measurements of the total sink of OH, referred to as total OH reactivity, can provide a critical test to assess the completeness of trace gas measurements during field campaigns. This can be done by comparing the measured total OH reactivity to values calculated from trace gas measurements. Indeed, large discrepancies are usually found between measured and calculated OH reactivity values, revealing the presence of important unmeasured reactive species, which have yet to be identified. A Comparative Reactivity Method (CRM) instrument has been set up at Mines Douai to allow sequential measurements of VOCs and OH reactivity using the same Proton Transfer Reaction-Time of Flight Mass Spectrometer. This approach aims at identifying unmeasured reactive VOCs based on a method proposed by Kato et al. (Atmos. Environ., 2011), taking advantage of VOC oxidations occurring in the CRM sampling reactor. MD-CRM has been deployed at an urban site in Dunkirk (France) during July 2014 to test this new approach. During this campaign, a large fraction of the OH reactivity was not explained by collocated measurements of trace gases (67% on average). In this presentation, we will first describe the approach that was implemented in the CRM instrument to identify part of the observed missing OH reactivity and we will then discuss the OH reactivity budget regarding the origin of air masses reaching the measurement site.
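
    The comparison between measured and calculated OH reactivity can be sketched in a few lines of Python. Calculated reactivity is conventionally the sum over measured species of the rate constant for reaction with OH times the species concentration; the missing fraction is the share of the measured value left unexplained. The species names, rate constants, concentrations, and measured value below are hypothetical placeholders, not campaign data.

```python
def calculated_oh_reactivity(species):
    """Sum k_i * [X_i] over measured species.

    `species` maps a name to (k, conc), with k in cm^3 molecule^-1 s^-1 and
    conc in molecule cm^-3, so the result is in s^-1.
    """
    return sum(k * conc for k, conc in species.values())

def missing_fraction(measured, calculated):
    """Fraction of the measured OH reactivity not explained by trace gases."""
    return (measured - calculated) / measured

# Hypothetical placeholder inputs chosen only to make the arithmetic easy to follow.
species = {
    "species_1": (1.0e-11, 1.0e11),  # contributes about 1.0 s^-1
    "species_2": (2.0e-12, 5.0e11),  # contributes about 1.0 s^-1
}
calc = calculated_oh_reactivity(species)
missing = missing_fraction(6.0, calc)  # measured value of 6.0 s^-1 is assumed
print(calc, missing)
```

    With these placeholder inputs about two thirds of the measured reactivity is unexplained, which is the kind of quantity the abstract reports as a campaign average.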

  8. Mining for Murder-Suicide: An Approach to Identifying Cases of Murder-Suicide in the National Violent Death Reporting System Restricted Access Database.

    PubMed

    McNally, Matthew R; Patton, Christina L; Fremouw, William J

    2016-01-01

    The National Violent Death Reporting System (NVDRS) is a United States Centers for Disease Control and Prevention (CDC) database of violent deaths from 2003 to the present. The NVDRS collects information from 32 states on several types of violent deaths, including suicides, homicides, homicides followed by suicides, and deaths resulting from child maltreatment or intimate partner violence, as well as legal intervention and accidental firearm deaths. Despite the availability of data from police narratives, medical examiner reports, and other sources, reliably finding the cases of murder-suicide in the NVDRS has proven problematic due to the lack of a unique code for murder-suicide incidents and outdated descriptions of case-finding procedures from previous researchers. By providing a description of the methods used to access the NVDRS and the coding procedures used to decipher these data, the authors seek to assist future researchers in correctly identifying cases of murder-suicide deaths while avoiding false positives. PMID:26258816

  9. Identifying Engineering Students' English Sentence Reading Comprehension Errors: Applying a Data Mining Technique

    ERIC Educational Resources Information Center

    Tsai, Yea-Ru; Ouyang, Chen-Sen; Chang, Yukon

    2016-01-01

    The purpose of this study is to propose a diagnostic approach to identify engineering students' English reading comprehension errors. Student data were collected during the process of reading texts of English for science and technology on a web-based cumulative sentence analysis system. For the analysis, the association-rule, data mining technique…

  10. Identifying the Cause of Toxicity of a Saline Mine Water

    PubMed Central

    van Dam, Rick A.; Harford, Andrew J.; Lunn, Simon A.; Gagnon, Marthe M.

    2014-01-01

    Elevated major ions (or salinity) are recognised as being a key contributor to the toxicity of many mine waste waters, but the complex interactions between the major ions and large inter-species variability in response to salinity make it difficult to relate toxicity to causal factors. This study aimed to determine if the toxicity of a typical saline seepage water was solely due to its major ion constituents, and to determine which major ions were the leading contributors to the toxicity. Standardised toxicity tests using two tropical freshwater species, Chlorella sp. (alga) and Moinodaphnia macleayi (cladoceran), were used to compare the toxicity of 1) mine and synthetic seepage water; 2) key major ions (e.g. Na, Cl, SO4 and HCO3); 3) synthetic seepage waters that were modified by excluding key major ions. For Chlorella sp., the toxicity of the seepage water was not solely due to its major ion concentrations because there were differences in effects caused by the mine seepage and synthetic seepage. However, for M. macleayi this hypothesis was supported because similar effects were caused by mine seepage and synthetic seepage. Sulfate was identified as a major ion that could predict the toxicity of the synthetic waters, which might be expected as it was the dominant major ion in the seepage water. However, sulfate was not the primary cause of toxicity in the seepage water and electrical conductivity was a better predictor of effects. Ultimately, the results show that specific major ions do not clearly drive the toxicity of saline seepage waters and the effects are probably due to the electrical conductivity of the mine waste waters. PMID:25180579

  11. Data Mining and Pattern Recognition Models for Identifying Inherited Diseases: Challenges and Implications.

    PubMed

    Iddamalgoda, Lahiru; Das, Partha S; Aponso, Achala; Sundararajan, Vijayaraghava S; Suravajhala, Prashanth; Valadi, Jayaraman K

    2016-01-01

    Data mining and pattern recognition methods reveal interesting findings in genetic studies, especially on how the genetic makeup is associated with inherited diseases. Although researchers have proposed various data mining models for biomedical approaches, there remains a challenge in accurately prioritizing the single nucleotide polymorphisms (SNP) associated with the disease. In this commentary, we review the state-of-the-art data mining and pattern recognition models for identifying inherited diseases and deliberate on the need for binary classification- and scoring-based prioritization methods in determining causal variants. While we discuss the known pros and cons associated with these methods, we argue that the gene prioritization methods and the protein interaction (PPI) methods in conjunction with K nearest neighbors could be used in accurately categorizing the genetic factors in disease causation. PMID:27559342

  12. Data Mining and Pattern Recognition Models for Identifying Inherited Diseases: Challenges and Implications

    PubMed Central

    Iddamalgoda, Lahiru; Das, Partha S.; Aponso, Achala; Sundararajan, Vijayaraghava S.; Suravajhala, Prashanth; Valadi, Jayaraman K.

    2016-01-01

    Data mining and pattern recognition methods reveal interesting findings in genetic studies, especially on how the genetic makeup is associated with inherited diseases. Although researchers have proposed various data mining models for biomedical approaches, there remains a challenge in accurately prioritizing the single nucleotide polymorphisms (SNP) associated with the disease. In this commentary, we review the state-of-the-art data mining and pattern recognition models for identifying inherited diseases and deliberate on the need for binary classification- and scoring-based prioritization methods in determining causal variants. While we discuss the known pros and cons associated with these methods, we argue that the gene prioritization methods and the protein interaction (PPI) methods in conjunction with K nearest neighbors could be used in accurately categorizing the genetic factors in disease causation. PMID:27559342

  13. Pennsylvania's approach to underground coal mine permitting and long-term mine pool management

    SciTech Connect

    Callaghan, T.; Koricich, J.

    1999-07-01

    Pennsylvania's underground coal mine permitting process has two goals: first, to ensure that the mining and reclamation plan is designed to minimize adverse environmental impacts; and second, to minimize interference with the applicant's recovery of coal. A successful review process includes the consistent evaluation of mine site hydrology through scrutiny of key indicators of mining-induced, adverse hydrologic consequences. This allows the regulatory agency to assess the potential for mining-related impacts as well as cumulative impacts throughout the proposed mine area and adjacent area. General trends have been identified regarding quality of underground mine drainage versus coal seam mined. However, the large number of factors controlling the final mine pool chemistry along with the lack of focused research have combined to stunt the development of reliable methodologies for the prediction of postmining water quality. Absent reliable predictive methodologies, mine layout has become the best demonstrated technology for pollution prevention. Strategies include: (1) promotion of postmining inundation by down-dip development with proper location of mine openings and sizing and location of barriers; (2) restriction of mining to zones within the groundwater system where flow is relatively lethargic and time of travel is great when compared to natural mine pool amelioration time frames; and (3) mining in zones remote from groundwater discharge areas and features which may serve to short-circuit mine water to nearby existing water-supply aquifers or to the surface. This paper discusses Pennsylvania's application process for underground bituminous coal mines. It briefly outlines Pennsylvania's statutory history relating to mine discharges, touches on some of the tools permit reviewers use to evaluate the hydrology of proposed underground mining sites, and discusses the key factors that permit reviewers consider in assessing potential postmining mine pool levels.

  14. HDB-Subdue: A Scalable Approach to Graph Mining

    NASA Astrophysics Data System (ADS)

    Padmanabhan, Srihari; Chakravarthy, Sharma

    Transactional data mining (association rules, decision trees etc.) has been effectively used to find non-trivial patterns in categorical and unstructured data. For applications that have an inherent structure (e.g., social networks, proteins), graph mining is useful since mapping the structured data into a transactional representation will lead to loss of information. Graph mining is used for identifying interesting or frequent subgraphs. Database mining uses SQL and relational representation to overcome limitations of main memory algorithms and to achieve scalability.

  15. Innovative approaches to mined land reclamation

    SciTech Connect

    Carlson, C.L.; Swisher, J.H.

    1987-01-01

    This is the proceedings of a conference held on mined land reclamation. The thrust of the meeting was coal-related, although applications provided in this conference are relevant to other noncoal mining programs. The main topics were methods to forecast acid-forming materials to preclude acid mine drainage; methods to correct acid mine drainage after formation; soil conservation and reconstruction; uses of waste materials in land reclamation; and methods of ensuring vegetation survival on mined lands. Many papers are presented with regard to the regulatory aspects of these areas.

  16. Screening and prioritisation of chemical risks from metal mining operations, identifying exposure media of concern.

    PubMed

    Pan, Jilang; Oates, Christopher J; Ihlenfeld, Christian; Plant, Jane A; Voulvoulis, Nikolaos

    2010-04-01

    Metals have been central to the development of human civilisation from the Bronze Age to modern times, although in the past, metal mining and smelting have been the cause of serious environmental pollution with the potential to harm human health. Despite problems from artisanal mining in some developing countries, modern mining to Western standards now uses the best available mining technology combined with environmental monitoring, mitigation and remediation measures to limit emissions to the environment. This paper develops risk screening and prioritisation methods previously used for contaminated land on military and civilian sites and engineering systems for the analysis and prioritisation of chemical risks from modern metal mining operations. It uses hierarchical holographic modelling and multi-criteria decision making to analyse and prioritise the risks from potentially hazardous inorganic chemical substances released by mining operations. A case study of an active platinum group metals mine in South Africa is used to demonstrate the potential of the method. This risk-based methodology for identifying, filtering and ranking mining-related environmental and human health risks can be used to identify exposure media of greatest concern to inform risk management. It also provides a practical decision-making tool for mine acquisition and helps to communicate risk to all members of mining operation teams. PMID:19353294

  17. Systematic evaluation of satellite remote sensing for identifying uranium mines and mills.

    SciTech Connect

    Blair, Dianna Sue; Stork, Christopher Lyle; Smartt, Heidi Anne; Smith, Jody Lynn

    2006-01-01

    In this report, we systematically evaluate the ability of current-generation, satellite-based spectroscopic sensors to distinguish uranium mines and mills from other mineral mining and milling operations. We perform this systematic evaluation by (1) outlining the remote, spectroscopic signal generation process, (2) documenting the capabilities of current commercial satellite systems, (3) systematically comparing the uranium mining and milling process to other mineral mining and milling operations, and (4) identifying the most promising observables associated with uranium mining and milling that can be identified using satellite remote sensing. The Ranger uranium mine and mill in Australia serves as a case study where we apply and test the techniques developed in this systematic analysis. Based on literature research of mineral mining and milling practices, we develop a decision tree which utilizes the information contained in one or more observables to determine whether uranium is possibly being mined and/or milled at a given site. Promising observables associated with uranium mining and milling at the Ranger site included in the decision tree are uranium ore, sulfur, the uranium pregnant leach liquor, ammonia, and uranyl compounds and sulfate ion disposed of in the tailings pond. Based on the size, concentration, and spectral characteristics of these promising observables, we then determine whether these observables can be identified using current commercial satellite systems, namely Hyperion, ASTER, and Quickbird. We conclude that the only promising observables at Ranger that can be uniquely identified using a current commercial satellite system (notably Hyperion) are magnesium chlorite in the open pit mine and the sulfur stockpile. Based on the identified magnesium chlorite and sulfur observables, the decision tree narrows the possible mineral candidates at Ranger to uranium, copper, zinc, manganese, vanadium, the rare earths, and phosphorus, all of which are

  18. Design approaches in quarrying and pit-mining reclamation

    USGS Publications Warehouse

    Arbogast, Belinda F.

    1999-01-01

    Reclaimed mine sites have been evaluated so that the public, industry, and land planners may recognize there are innovative designs available for consideration and use. People tend to see cropland, range, and road cuts as a necessary part of their everyday life, not as disturbed areas despite their high visibility. Mining also generates a disturbed landscape, unfortunately one that many consider waste until reclaimed by human beings. The development of mining provides an economic base and use of a natural resource to improve the quality of human life. Equally important is a sensitivity to the geologic origin and natural pattern of the land. Wisely shaping our environment requires a design plan and product that responds to a site's physiography, ecology, function, artistic form, and public perception. An examination of selected sites for their landscape design suggested nine approaches for mining reclamation. The oldest design approach around is nature itself. Humans may sometimes do more damage going to an area in the attempt to repair it. Given enough geologic time, a small-site area, and stable adjacent ecosystems, disturbed areas recover without mankind's input. Visual screens and buffer zones conceal the facility in a camouflage approach. Typically, earth berms, fences, and plantings are used to disguise the mining facility. Restoration targets social or economic benefits by reusing the site for public amenities, most often in urban centers with large populations. A mitigation approach attempts to protect the environment and return mined areas to use with scientific input. The reuse of cement, building rubble, and macadam meets only about 10% of the demand for aggregate. Recognizing the limited supply of mineral resources and encouraging recycling efforts are steps in a renewable resource approach. An educative design approach effectively communicates mining information through outreach, land stewardship, and community service. Mine sites used for

  19. Graduates employment classification using data mining approach

    NASA Astrophysics Data System (ADS)

    Aziz, Mohd Tajul Rizal Ab; Yusof, Yuhanis

    2016-08-01

    Data Mining is a platform to extract hidden knowledge in a collection of data. This study investigates the suitable classification model to classify graduates' employment for one of the MARA Professional Colleges (KPM) in Malaysia. The aim is to classify the graduates as employed, unemployed, or in further study. Five data mining algorithms offered in WEKA were used: Naïve Bayes, Logistic regression, Multilayer perceptron, k-nearest neighbor and Decision tree J48. Based on the obtained results, Logistic regression produces the highest classification accuracy, at 92.5%. This result was obtained while using 80% of the data for training and 20% for testing. The produced classification model will benefit the management of the college as it provides insight into the quality of graduates that they produce and how their curriculum can be improved to cater to the needs of the industry.
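
    The evaluation protocol described above (an 80/20 train/test split scored by classification accuracy) can be sketched in plain Python. This is a stand-in for illustration, not the WEKA workflow used in the study; the function names, the fixed seed, and the example labels are assumptions.

```python
import random

def train_test_split(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of `records` and split it into (train, test) parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical record identifiers standing in for graduate records.
records = list(range(100))
train, test = train_test_split(records)
print(len(train), len(test))

# Hypothetical true and predicted classes for four held-out graduates.
y_true = ["employed", "unemployed", "further study", "employed"]
y_pred = ["employed", "employed", "further study", "employed"]
print(accuracy(y_true, y_pred))
```

    A classifier would be fitted on `train` only and scored on `test`, so that the reported accuracy reflects records the model has not seen.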

  20. Current approaches for mitigating acid mine drainage.

    PubMed

    Sahoo, Prafulla Kumar; Kim, Kangjoo; Equeenuddin, Sk Md; Powell, Michael A

    2013-01-01

    AMD is one of the critical environmental problems that causes acidification and metal contamination of surface and ground water bodies when mine materials and/or overburden containing metal sulfides are exposed to oxidizing conditions. The best option to limit AMD is early avoidance of sulfide oxidation. Several techniques are available to achieve this. In this paper, we review all of the major methods now used to limit sulfide oxidation. These fall into five categories: (1) physical barriers, (2) bacterial inhibition, (3) chemical passivation, (4) electrochemical, and (5) desulfurization. We describe the processes underlying each method by category and then address aspects relating to effectiveness, cost, and environmental impact. This paper may help researchers and environmental engineers to select suitable methods for addressing site-specific AMD problems. Irrespective of the mechanism by which each method works, all share one common feature, i.e., they delay or prevent oxidation. In addition, all have limitations. Physical barriers such as wet or dry cover have retarded sulfide oxidation in several studies; however, both wet and dry barriers exhibit only short-term effectiveness. Wet cover is suitable at specific sites where complete inundation is established, but this approach requires high maintenance costs. When employing dry cover, plastic liners are expensive and rarely used for large volumes of waste. Bactericides can suppress oxidation, but are only effective on fresh tailings and short-lived, and do not serve as a permanent solution to AMD. In addition, application of bactericides may be toxic to aquatic organisms. Encapsulation or passivation of sulfide surfaces (applying organic and/or inorganic coatings) is simple and effective in preventing AMD. Among inorganic coatings, silica is the most promising, stable, acid-resistant and long lasting, as compared to phosphate and other inorganic coatings. Permanganate passivation is also promising because it

  1. A proactive approach to sustainable management of mine tailings

    NASA Astrophysics Data System (ADS)

    Edraki, Mansour; Baumgartl, Thomas

    2015-04-01

    The reactive strategies to manage mine tailings, i.e. containment of slurries of tailings in tailings storage facilities (TSFs) and remediation of tailings solids or tailings seepage water after the decommissioning of those facilities, can be technically inefficient at eliminating environmental risks (e.g. preventing dispersion of contaminants and catastrophic dam wall failures), pose a long-term economic burden for companies, governments and society after mine closure, and often fail to meet community expectations. Most preventive environmental management practices promote proactive integrated approaches to waste management whereby the sources of environmental issues are identified to help make more informed decisions. They often use life cycle assessment to find the "hot spots" of environmental burdens. This kind of approach is often based on generic data and has rarely been used for tailings. Besides, life cycle assessments are less useful for designing operations or simulating changes in the process and consequent environmental outcomes. It is evident that an integrated approach for tailings research linked to better processing options is needed. A literature review revealed that there are only a few examples of integrated approaches. The aim of this project is to develop new tailings management models by streamlining orebody characterization, process optimization and rehabilitation. The approach is based on continuous fingerprinting of geochemical processes from orebody to tailings storage facility, and on benchmarking the success of such proactive initiatives by evidence of no impacts and no future projected impacts on receiving environments. We present an approach for developing such a framework and preliminary results from a case study where combined grinding and flotation models developed using geometallurgical data from the orebody were constructed to predict the properties of tailings produced under various processing scenarios. The modelling scenarios based on the

  2. An approach to automated longwall mining

    NASA Technical Reports Server (NTRS)

    Palowitch, E. R.; Broussard, P. H., Jr.

    1979-01-01

    The longwall system of mining coal, providing advantages in the areas of productivity as well as health and safety, is described, and technological developments leading to a full automation of the system are discussed. In the longwall system large blocks of coal (up to 600 feet wide and up to 5000 feet long) are developed, with each block mined out by taking successive slices across the short dimension of the block and loading the broken coal onto a conveyor. A self-advancing system supports the roof over the length of the face throughout cutting and loading, with the supports advanced with the face, and the roof allowed to collapse behind them. A double-ranging drum longwall shearer provides the system with an efficient yaw, roll, and variable-thickness vertical control. Currently two machine operators function as error detectors and controllers. It is shown that electronic sensors can lead to a fully automated vertical control system, and automatic roll control is achievable with available instruments and machine tilt actuators.

  3. Mining the Metabiome: Identifying Novel Natural Products from Microbial Communities

    PubMed Central

    Milshteyn, Aleksandr; Schneider, Jessica S.; Brady, Sean F.

    2014-01-01

    Summary: Microbial-derived natural products provide the foundation for most of the chemotherapeutic arsenal available to contemporary medicine. In the face of a dwindling pipeline of new lead structures identified by traditional culturing techniques and an increasing need for new therapeutics, surveys of microbial biosynthetic diversity across environmental metabiomes have revealed enormous reservoirs of as yet untapped natural products chemistry. In this review we touch on the historical context of microbial natural product discovery and discuss innovations and technological advances that are facilitating culture-dependent and culture-independent access to new chemistry from environmental microbiomes with the goal of re-invigorating the small molecule therapeutics discovery pipeline. We highlight the successful strategies that have emerged and some of the challenges that must be overcome to enable the development of high-throughput methods for natural product discovery from complex microbial communities. PMID:25237864

  4. Wastewater treatment polymers identified as the toxic component of a diamond mine effluent.

    PubMed

    De Rosemond, Simone J C; Liber, Karsten

    2004-09-01

    The Ekati Diamond Mine, located approximately 300 km northeast of Yellowknife in Canada's Northwest Territories, uses mechanical crushing and washing processes to extract diamonds from kimberlite ore. The processing plant's effluent contains kimberlite ore particles (≤0.5 mm), wastewater, and two wastewater treatment polymers, a cationic polydiallyldimethylammonium chloride (DADMAC) polymer and an anionic sodium acrylate polyacrylamide (PAM) polymer. A series of acute (48-h) and chronic (7-d) toxicity tests determined the processed kimberlite effluent (PKE) was chronically, but not acutely, toxic to Ceriodaphnia dubia. Reproduction of C. dubia was inhibited significantly at concentrations as low as 12.5% PKE. Toxicity identification evaluations (TIE) were initiated to identify the toxic component of PKE. Ethylenediaminetetraacetic acid (EDTA), sodium thiosulfate, aeration, and solid phase extraction with C-18 manipulations failed to reduce PKE toxicity. Toxicity was reduced significantly by pH adjustments to pH 3 or 11 followed by filtration. Toxicity testing with C. dubia determined that the cationic DADMAC polymer had a 48-h median lethal concentration (LC50) of 0.32 mg/L and 7-d median effective concentration (EC50) of 0.014 mg/L. The anionic PAM polymer had a 48-h LC50 of 218 mg/L. A weight-of-evidence approach, using the data obtained from the TIE, the polymer toxicity experiments, the estimated concentration of the cationic polymer in the kimberlite effluent, and the behavior of kimberlite minerals in pH-adjusted solutions, provided sufficient evidence to identify the cationic DADMAC polymer as the toxic component of the diamond mine PKE. PMID:15379002

  5. Lignite mine spoil characterization and approaches for its rehabilitation

    SciTech Connect

    Praveen-Kumar; Kumar, S.; Sharma, K.D.; Choudhary, A.; Gehlot, K.

    2005-01-15

    Open cast mining of lignite leaves behind stockpiles of excavated materials (dumps) and refilled mining pits (spoils). Physicochemical and biochemical properties of both kinds of sites were estimated to identify the reasons for their barrenness. Subsequently, surface modifications were attempted, first in a greenhouse and later in the field, to develop a suitable approach for their rehabilitation. Dumps had low pH (4.8) and high Na⁺ (2.5 mg g⁻¹), spoils high pH (8.7) and high Na⁺ (1.59 mg g⁻¹ soil). Both sites had low available nitrogen and phosphorus and showed very low dehydrogenase and phosphatase activity but no nitrification. The extreme physicochemical conditions and inert nature of dumps and spoils explained their barrenness. In the greenhouse experiment, 14 plant species sown in surface materials of dumps and spoils after spreading a 0.15 m thick layer of dune sand germinated (>85%), and their seedlings survived for two months. This technique was followed at a spoil site (modified spoil site). After three years of stabilization the modified spoil site had only one-fifth of the Na⁺ present in the spoil surface at the beginning and also showed higher dehydrogenase and phosphatase activity and nitrification. Pearl millet and Cenchrus ciliaris grown in modified spoil produced 128 to 394 kg and 2.25 to 3.50 Mg dry matter ha⁻¹. Addition of farmyard manure with N and P fertilizers increased pearl millet yields.

  6. Data mining approach to model the diagnostic service management.

    PubMed

    Lee, Sun-Mi; Lee, Ae-Kyung; Park, Il-Su

    2006-01-01

    Korea has a National Health Insurance Program operated by the government-owned National Health Insurance Corporation, and diagnostic services are provided every two years for the insured and their family members. Developing a customer relationship management (CRM) system using data mining technology would be useful to improve the performance of diagnostic service programs. Under these circumstances, this study developed a model for diagnostic service management taking into account the characteristics of subjects using a data mining approach. This study could be further used to develop an automated CRM system contributing to the increase in the rate of receiving diagnostic services. PMID:17102454

  7. A Node Linkage Approach for Sequential Pattern Mining

    PubMed Central

    Navarro, Osvaldo; Cumplido, René; Villaseñor-Pineda, Luis; Feregrino-Uribe, Claudia; Carrasco-Ochoa, Jesús Ariel

    2014-01-01

    Sequential Pattern Mining is a widely addressed problem in data mining, with applications such as analyzing Web usage, examining purchase behavior, and text mining, among others. Nevertheless, with the dramatic increase in data volume, the current approaches prove inefficient when dealing with large input datasets, a large number of different symbols and low minimum supports. In this paper, we propose a new sequential pattern mining algorithm, which follows a pattern-growth scheme to discover sequential patterns. Unlike most pattern growth algorithms, our approach does not build a data structure to represent the input dataset, but instead accesses the required sequences through pseudo-projection databases, achieving better runtime and reducing memory requirements. Our algorithm traverses the search space in a depth-first fashion and only preserves in memory a pattern node linkage and the pseudo-projections required for the branch being explored at the time. Experimental results show that our new approach, the Node Linkage Depth-First Traversal algorithm (NLDFT), has better performance and scalability in comparison with state of the art algorithms. PMID:24933123
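
    The pattern-growth and pseudo-projection ideas mentioned above can be illustrated with a minimal sketch. This is a generic PrefixSpan-style miner over sequences of single items, not the NLDFT algorithm itself, whose node linkage structure is not described in the abstract. The function name, the example data and the support threshold are assumptions made for illustration.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern_tuple: support} for all frequent sequential patterns.

    Each sequence is a list of single items. A pseudo-projection is a list of
    (sequence_index, start_offset) pairs, so no projected copy of the input
    is ever built.
    """
    results = {}

    def grow(prefix, projection):
        # For each item, record the first position at or after the offset
        # in every sequence of the current pseudo-projection.
        first_pos = defaultdict(dict)  # item -> {sequence_index: position}
        for seq_idx, start in projection:
            seq = sequences[seq_idx]
            for pos in range(start, len(seq)):
                first_pos[seq[pos]].setdefault(seq_idx, pos)
        for item, positions in sorted(first_pos.items()):
            if len(positions) < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = len(positions)
            # Depth-first: extend this pattern before visiting its siblings.
            grow(pattern, [(i, p + 1) for i, p in positions.items()])

    grow((), [(i, 0) for i in range(len(sequences))])
    return results


# Hypothetical toy dataset: four short sequences of page visits.
EXAMPLE = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]
patterns = prefixspan(EXAMPLE, min_support=2)
```

    Only the offsets for the branch currently being explored are held in memory, which is the property the abstract attributes to pseudo-projection.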

  8. Using Helicopter Electromagnetic Surveys to Identify Potential Hazards at Mine Waste Impoundments

    SciTech Connect

    Hammack, R.W.

    2008-01-01

    In July 2003, helicopter electromagnetic surveys were conducted at 14 coal waste impoundments in southern West Virginia. The purpose of the surveys was to detect conditions that could lead to impoundment failure either by structural failure of the embankment or by the flooding of adjacent or underlying mine works. Specifically, the surveys attempted to: 1) identify saturated zones within the mine waste, 2) delineate filtrate flow paths through the embankment or into adjacent strata and receiving streams, and 3) identify flooded mine workings underlying or adjacent to the waste impoundment. Data from the helicopter surveys were processed to generate conductivity/depth images. Conductivity/depth images were then spatially linked to georeferenced air photos or topographic maps for interpretation. Conductivity/depth images were found to provide a snapshot of the hydrologic conditions that exist within the impoundment. This information can be used to predict potential areas of failure within the embankment because of its ability to image the phreatic zone. Also, the electromagnetic survey can identify areas of unconsolidated slurry in the decant basin and beneath the embankment. Although shallow, flooded mineworks beneath the impoundment were identified by this survey, it cannot be assumed that electromagnetic surveys can detect all underlying mines. A preliminary evaluation of the data implies that helicopter electromagnetic surveys can provide a better understanding of the phreatic zone than the piezometer arrays that are typically used.

  9. Mining Clinicians' Electronic Documentation to Identify Heart Failure Patients with Ineffective Self-Management: A Pilot Text-Mining Study.

    PubMed

    Topaz, Maxim; Radhakrishnan, Kavita; Lei, Victor; Zhou, Li

    2016-01-01

    Effective self-management can decrease up to 50% of heart failure hospitalizations. Unfortunately, self-management by patients with heart failure remains poor. This pilot study aimed to explore the use of text-mining to identify heart failure patients with ineffective self-management. We first built a comprehensive self-management vocabulary based on the literature and clinical notes review. We then randomly selected 545 heart failure patients treated within Partners Healthcare hospitals (Boston, MA, USA) and conducted a regular expression search with the compiled vocabulary within 43,107 interdisciplinary clinical notes of these patients. We found that 38.2% (n = 208) of patients had documentation of ineffective heart failure self-management in the domains of poor diet adherence (28.4%), missed medical encounters (26.4%), poor medication adherence (20.2%) and non-specified self-management issues (e.g., "compliance issues", 34.6%). We showed the feasibility of using text-mining to identify patients with ineffective self-management. More natural language processing algorithms are needed to help busy clinicians identify these patients. PMID:27332377
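
    A regular expression search over clinical notes with a compiled vocabulary might look roughly like the sketch below. The vocabulary phrases, domain labels, patient identifiers and note texts are all hypothetical. The study's actual vocabulary is not given in the abstract.

```python
import re

# Hypothetical vocabulary: domain -> phrases (illustrative, not the study's list).
VOCABULARY = {
    "diet": ["poor diet adherence", "non-adherent to diet", "dietary indiscretion"],
    "encounters": ["missed appointment", "no-show", "did not attend"],
    "medication": ["missed doses", "not taking medications", "stopped taking"],
    "non_specified": ["compliance issues", "noncompliant", "non-compliant"],
}

# One case-insensitive alternation per domain, with phrases escaped literally.
PATTERNS = {
    domain: re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    for domain, phrases in VOCABULARY.items()
}


def flag_patients(notes):
    """notes: iterable of (patient_id, note_text) -> {patient_id: set of domains}."""
    flagged = {}
    for patient_id, text in notes:
        for domain, pattern in PATTERNS.items():
            if pattern.search(text):
                flagged.setdefault(patient_id, set()).add(domain)
    return flagged


# Hypothetical notes for three patients.
EXAMPLE_NOTES = [
    ("p1", "Patient reports missed doses of furosemide."),
    ("p1", "History of compliance issues."),
    ("p2", "Adherent to low-sodium diet."),
    ("p3", "No-show for cardiology follow-up."),
]
flagged = flag_patients(EXAMPLE_NOTES)
```

    A plain phrase match like this cannot handle negation ("denies missed doses"), which is one reason the authors call for more natural language processing.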

  10. Data Mining Approaches for Modeling Complex Electronic Circuit Design Activities

    SciTech Connect

    Kwon, Yongjin; Omitaomu, Olufemi A; Wang, Gi-Nam

    2008-01-01

    A printed circuit board (PCB) is an essential part of modern electronic circuits. It is made of a flat panel of insulating materials with patterned copper foils that act as electric pathways for various components such as ICs, diodes, capacitors, resistors, and coils. The size of PCBs has been shrinking over the years, while the number of components mounted on these boards has increased considerably. This trend makes the design and fabrication of PCBs ever more difficult. At the beginning of design cycles, it is important to accurately estimate the time required to complete the steps, based on many factors such as the required parts, approximate board size and shape, and a rough sketch of schematics. The current approach uses a multiple linear regression (MLR) technique for time and cost estimations. However, the need for accurate predictive models continues to grow as the technology becomes more advanced. In this paper, we analyze a large volume of historical PCB design data, extract some important variables, and develop predictive models based on the extracted variables using a data mining approach. The data mining approach uses an adaptive support vector regression (ASVR) technique; the benchmark model used is the MLR technique currently being used in the industry. The strengths of SVR for this data include its ability to represent data in high-dimensional space through kernel functions. The computational results show that a data mining approach is a better prediction technique for this data. Our approach reduces computation time and enhances the practical applications of the SVR technique.

  11. Application of data mining approaches to drug delivery.

    PubMed

    Ekins, Sean; Shimada, Jun; Chang, Cheng

    2006-11-30

    Computational approaches play a key role in all areas of the pharmaceutical industry from data mining, experimental and clinical data capture to pharmacoeconomics and adverse events monitoring. They will likely continue to be indispensable assets along with a growing library of software applications. This is primarily due to the increasingly massive amount of biology, chemistry and clinical data, which is now entering the public domain mainly as a result of NIH and commercially funded projects. We are therefore in need of new methods for mining this mountain of data in order to enable new hypothesis generation. The computational approaches include, but are not limited to, database compilation, quantitative structure activity relationships (QSAR), pharmacophores, network visualization models, decision trees, machine learning algorithms and multidimensional data visualization software that could be used to improve drug delivery after mining public and/or proprietary data. We will discuss some areas of unmet needs in the area of data mining for drug delivery that can be addressed with new software tools or databases of relevance to future pharmaceutical projects. PMID:17081647

  12. Data Mining for Identifying Novel Associations and Temporal Relationships with Charcot Foot

    PubMed Central

    Munson, Michael E.; Wrobel, James S.; Holmes, Crystal M.; Hanauer, David A.

    2014-01-01

    Introduction. Charcot foot is a rare and devastating complication of diabetes. While some risk factors are known, debate continues regarding etiology. Elucidating other associated disorders and their temporal occurrence could lead to a better understanding of its pathogenesis. We applied a large data mining approach to Charcot foot for elucidating novel associations. Methods. We conducted an association analysis using ICD-9 diagnosis codes for every patient in our health system (n = 1.6 million with 41.2 million time-stamped ICD-9 codes). For the current analysis, we focused on the 388 patients with Charcot foot (ICD-9 713.5). Results. We found 710 associations, 676 (95.2%) of which had a P value for the association less than 1.0 × 10⁻⁵ and 603 (84.9%) of which had an odds ratio > 5.0. There were 111 (15.6%) associations with a significant temporal relationship (P < 1.0 × 10⁻³). The three novel associations with the strongest temporal component were cardiac dysrhythmia, pulmonary eosinophilia, and volume depletion disorder. Conclusion. We identified novel associations with Charcot foot in the context of pathogenesis models that include neurotrophic, neurovascular, and microtraumatic factors mediated through inflammatory cytokines. Future work should focus on confirmatory analyses. These novel areas of investigation could lead to prevention or earlier diagnosis. PMID:24868558
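
    The odds ratio reported for each association can be computed from a 2x2 table of diagnosis co-occurrence. The sketch below shows the arithmetic together with a Woolf (log-scale) confidence interval. The abstract does not say which significance test the authors used, so the interval is an assumed illustration, and the counts are hypothetical rather than taken from the study.

```python
import math


def odds_ratio(cases_with, cases_without, others_with, others_without, z=1.96):
    """Odds ratio and Woolf confidence interval for a 2x2 table.

    cases_*  : patients with the index condition, with/without the second code
    others_* : all remaining patients, with/without the second code
    All four counts must be non-zero.
    """
    estimate = (cases_with * others_without) / (cases_without * others_with)
    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(
        1 / cases_with + 1 / cases_without + 1 / others_with + 1 / others_without
    )
    low = math.exp(math.log(estimate) - z * se)
    high = math.exp(math.log(estimate) + z * se)
    return estimate, low, high


# Hypothetical counts, chosen only to make the arithmetic easy to follow:
# (10 * 40) / (20 * 5) = 4.0
estimate, low, high = odds_ratio(10, 20, 5, 40)
```

    An association would clear the abstract's "odds ratio > 5.0" threshold only if `estimate` exceeded 5.0. In a screen over hundreds of code pairs, a correction for multiple comparisons would also be needed.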

  13. Identifying MMORPG Bots: A Traffic Analysis Approach

    NASA Astrophysics Data System (ADS)

    Chen, Kuan-Ta; Jiang, Jhih-Wei; Huang, Polly; Chu, Hao-Hua; Lei, Chin-Laung; Chen, Wen-Chin

    2008-12-01

    Massively multiplayer online role playing games (MMORPGs) have become extremely popular among network gamers. Despite their success, one of MMORPG's greatest challenges is the increasing use of game bots, that is, autoplaying game clients. The use of game bots is considered unsportsmanlike and is therefore forbidden. To keep games in order, game police, played by actual human players, often patrol game zones and question suspicious players. This practice, however, is labor-intensive and ineffective. To address this problem, we analyze the traffic generated by human players versus game bots and propose general solutions to identify game bots. Taking Ragnarok Online as our subject, we study the traffic generated by human players and game bots. We find that their traffic is distinguishable by 1) the regularity in the release time of client commands, 2) the trend and magnitude of traffic burstiness in multiple time scales, and 3) the sensitivity to different network conditions. Based on these findings, we propose four strategies and two ensemble schemes to identify bots. Finally, we discuss the robustness of the proposed methods against countermeasures of bot developers, and consider a number of possible ways to manage the increasingly serious bot problem.
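
    The first distinguishing feature, regularity in the release time of client commands, can be made concrete with a simple dispersion measure. The sketch below uses the coefficient of variation of the gaps between commands. This is an assumed stand-in, since the abstract does not name the statistic the authors used, and the timestamps are hypothetical.

```python
from statistics import mean, pstdev


def interarrival_cv(timestamps):
    """Coefficient of variation of the gaps between consecutive command times.

    Values near zero indicate timer-like regularity. Larger values indicate
    irregular, bursty command release.
    """
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return pstdev(gaps) / mean(gaps)


# Hypothetical command release times in seconds.
BOT_LIKE = [0.0, 1.0, 2.0, 3.0, 4.0]    # perfectly periodic
HUMAN_LIKE = [0.0, 0.4, 2.1, 2.3, 5.0]  # irregular gaps

bot_cv = interarrival_cv(BOT_LIKE)
human_cv = interarrival_cv(HUMAN_LIKE)
```

    A bot developer could add random delays to defeat a measure like this, which is the kind of countermeasure the abstract says the authors discuss.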

  14. Mining Functional Modules in Heterogeneous Biological Networks Using Multiplex PageRank Approach.

    PubMed

    Li, Jun; Zhao, Patrick X

    2016-01-01

    Identification of functional modules/sub-networks in large-scale biological networks is one of the important research challenges in current bioinformatics and systems biology. Approaches have been developed to identify functional modules in single-class biological networks; however, methods for systematically and interactively mining multiple classes of heterogeneous biological networks are lacking. In this paper, we present a novel algorithm (called mPageRank) that utilizes the Multiplex PageRank approach to mine functional modules from two classes of biological networks. We demonstrate the capabilities of our approach by successfully mining functional biological modules through integrating expression-based gene-gene association networks and protein-protein interaction networks. We first compared the performance of our method with that of other methods using simulated data. We then applied our method to identify the cell division cycle related functional module and plant signaling defense-related functional module in the model plant Arabidopsis thaliana. Our results demonstrated that the mPageRank method is effective for mining sub-networks in both expression-based gene-gene association networks and protein-protein interaction networks, and has the potential to be adapted for the discovery of functional modules/sub-networks in other heterogeneous biological networks. The mPageRank executable program, source code, the datasets and results of the presented two case studies are publicly and freely available at http://plantgrn.noble.org/MPageRank/. PMID:27446133
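
    One simple way to couple two network layers with PageRank is to let the ranks from the first layer bias the random jumps on the second. The sketch below shows that coupling on a toy example. It is an assumption made for illustration and is not claimed to be the formulation used by mPageRank. The gene names and edges are hypothetical, and both layers are assumed to share the same node set with no isolated nodes.

```python
def pagerank(adj, teleport=None, damping=0.85, iters=100):
    """Power-iteration PageRank on an undirected graph {node: set(neighbours)}.

    teleport: optional {node: weight} used as the random-jump distribution
    (uniform if omitted). Every node must have at least one neighbour.
    """
    nodes = sorted(adj)
    if teleport is None:
        teleport = {n: 1.0 for n in nodes}
    total = sum(teleport[n] for n in nodes)
    jump = {n: teleport[n] / total for n in nodes}
    rank = dict(jump)
    for _ in range(iters):
        # Random-jump share first, then distribute each node's rank to neighbours.
        new = {n: (1.0 - damping) * jump[n] for n in nodes}
        for n in nodes:
            share = damping * rank[n] / len(adj[n])
            for m in adj[n]:
                new[m] += share
        rank = new
    return rank


# Hypothetical layers over the same four genes.
COEXPRESSION = {"g1": {"g2", "g3"}, "g2": {"g1", "g3"},
                "g3": {"g1", "g2", "g4"}, "g4": {"g3"}}
PPI = {"g1": {"g2"}, "g2": {"g1", "g3"}, "g3": {"g2", "g4"}, "g4": {"g3"}}

rank_a = pagerank(COEXPRESSION)            # ordinary PageRank on layer A
rank_b = pagerank(PPI, teleport=rank_a)    # layer B, biased by layer A ranks
```

    Nodes that score highly in `rank_b` are well connected in the interaction layer and also favoured by the co-expression layer, which is the intuition behind integrating the two classes of network.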

  15. Mining Functional Modules in Heterogeneous Biological Networks Using Multiplex PageRank Approach

    PubMed Central

    Li, Jun; Zhao, Patrick X.

    2016-01-01

    Identification of functional modules/sub-networks in large-scale biological networks is one of the important research challenges in current bioinformatics and systems biology. Approaches have been developed to identify functional modules in single-class biological networks; however, methods for systematically and interactively mining multiple classes of heterogeneous biological networks are lacking. In this paper, we present a novel algorithm (called mPageRank) that utilizes the Multiplex PageRank approach to mine functional modules from two classes of biological networks. We demonstrate the capabilities of our approach by successfully mining functional biological modules through integrating expression-based gene-gene association networks and protein-protein interaction networks. We first compared the performance of our method with that of other methods using simulated data. We then applied our method to identify the cell division cycle related functional module and plant signaling defense-related functional module in the model plant Arabidopsis thaliana. Our results demonstrated that the mPageRank method is effective for mining sub-networks in both expression-based gene-gene association networks and protein-protein interaction networks, and has the potential to be adapted for the discovery of functional modules/sub-networks in other heterogeneous biological networks. The mPageRank executable program, source code, the datasets and results of the presented two case studies are publicly and freely available at http://plantgrn.noble.org/MPageRank/. PMID:27446133

  16. A geomorphological approach to the management of rivers contaminated by metal mining

    NASA Astrophysics Data System (ADS)

    Macklin, M. G.; Brewer, P. A.; Hudson-Edwards, K. A.; Bird, G.; Coulthard, T. J.; Dennis, I. A.; Lechler, P. J.; Miller, J. R.; Turner, J. N.

    2006-09-01

    As the result of current and historical metal mining, river channels and floodplains in many parts of the world have become contaminated by metal-rich waste in concentrations that may pose a hazard to human livelihoods and sustainable development. Environmental and human health impacts commonly arise because of the prolonged residence time of heavy metals in river sediments and alluvial soils and their bioaccumulatory nature in plants and animals. This paper considers how an understanding of the processes of sediment-associated metal dispersion in rivers, and the space and timescales over which they operate, can be used in a practical way to help river basin managers more effectively control and remediate catchments affected by current and historical metal mining. A geomorphological approach to the management of rivers contaminated by metals is outlined and four emerging research themes are highlighted and critically reviewed. These are: (1) response and recovery of river systems following the failures of major tailings dams; (2) effects of flooding on river contamination and the sustainable use of floodplains; (3) new developments in isotopic fingerprinting, remote sensing and numerical modelling for identifying the sources of contaminant metals and for mapping the spatial distribution of contaminants in river channels and floodplains; and (4) current approaches to the remediation of river basins affected by mining, appraised in light of the European Union's Water Framework Directive (2000/60/EC). Future opportunities for geomorphologically-based assessments of mining-affected catchments are also identified.

  17. Efflorescent sulfates from Baia Sprie mining area (Romania)--Acid mine drainage and climatological approach.

    PubMed

    Buzatu, Andrei; Dill, Harald G; Buzgar, Nicolae; Damian, Gheorghe; Maftei, Andreea Elena; Apopei, Andrei Ionuț

    2016-01-15

    The Baia Sprie epithermal system, a well-known deposit for its impressive mineralogical associations, shows the proper conditions for acid mine drainage and can be considered a general example for affected mining areas around the globe. Efflorescent samples from the abandoned open pit Minei Hill have been analyzed by X-ray diffraction (XRD), scanning electron microscopy (SEM), Raman and near-infrared (NIR) spectrometry. The identified phases represent mostly iron sulfates with different hydration degrees (szomolnokite, rozenite, melanterite, coquimbite, ferricopiapite), Zn and Al sulfates (gunningite, alunogen, halotrichite). The samples were heated at different temperatures in order to establish the phase transformations among the studied sulfates. The dehydration temperatures and intermediate phases upon decomposition were successfully identified for each of the mineral phases. Gunningite was the single sulfate that showed no transformations during the heating experiment. All the other sulfates started to dehydrate within the 30-90 °C temperature range. Acid mine drainage is the main cause of sulfate formation, triggered by pyrite oxidation as the major source for the abundant iron sulfates. Based on the dehydration temperatures, the climatological interpretation indicated that melanterite formation and long-term presence are related to continental and temperate climates. Coquimbite and rozenite are also attributed to dry arid/semi-arid areas, in addition to the above-mentioned ones. The more stable sulfates, alunogen, halotrichite, szomolnokite, ferricopiapite and gunningite, can form and persist in all climate regimes, from dry continental to even tropical humid. PMID:26544892

  18. Mining Patterns of Disease Progression: A Topic-Model-Based Approach.

    PubMed

    Zhang, Lingxiao; Zhao, Junfeng; Wang, Yasha; Xie, Bing

    2016-01-01

    Knowledge of how diseases progress and transform is crucial for clinical decision making. Frequent pattern mining techniques, such as sequential pattern mining (SPM) algorithms, can automatically extract such knowledge from large collections of electronic medical records (EMR). However, EMR data are usually unorganized and highly noisy. Finding meaningful disease patterns often calls for manual manipulation such as cohort and feature selection on EMR data by medical professionals. In this paper, we propose a topic-model-based SPM approach to find disease progression patterns from diagnostic records. We improve the traditional SPM algorithms by filtering and grouping the diagnosis sequences according to different clinical topics. These topics represent certain clinical conditions with closely related diagnoses, and are detected without prior medical knowledge. The experiment on real-world EMR data shows that our approach is able to find meaningful progression patterns with less noise, and can help quickly identify interesting patterns related to a certain clinical condition with less human effort. PMID:27577403

  19. Clustering-based approaches to SAGE data mining.

    PubMed

    Wang, Haiying; Zheng, Huiru; Azuaje, Francisco

    2008-01-01

    Serial analysis of gene expression (SAGE) is one of the most powerful tools for global gene expression profiling. It has led to several biological discoveries and biomedical applications, such as the prediction of new gene functions and the identification of biomarkers in human cancer research. Clustering techniques have become fundamental approaches in these applications. This paper reviews relevant clustering techniques specifically designed for this type of data. It places an emphasis on current limitations and opportunities in this area for supporting biologically-meaningful data mining and visualisation. PMID:18822151

  20. A Practical Approach for Content Mining of Tweets

    PubMed Central

    Yoon, Sunmoo; Elhadad, Noémie; Bakken, Suzanne

    2013-01-01

    Use of data generated through social media for health studies is gradually increasing. Twitter is a short-text message system developed 6 years ago, now with more than 100 million users generating over 300 million Tweets every day. Twitter may be used to gain real-world insights to promote healthy behaviors. The purposes of this paper are to describe a practical approach to analyzing Tweet contents and to illustrate an application of the approach to the topic of physical activity. The approach includes five steps: (1) selecting keywords to gather an initial set of Tweets to analyze; (2) importing data; (3) preparing data; (4) analyzing data (topic, sentiment, and ecologic context); and (5) interpreting data. The steps are implemented using tools that are publicly available and free of charge and designed for use by researchers with limited programming skills. Content mining of Tweets can contribute to addressing challenges in health behavior research. PMID:23790998

  1. Identifying Understudied Nuclear Reactions by Text-mining the EXFOR Experimental Nuclear Reaction Library

    NASA Astrophysics Data System (ADS)

    Hirdt, J. A.; Brown, D. A.

    2016-01-01

    The EXFOR library contains the largest collection of experimental nuclear reaction data available as well as the data's bibliographic information and experimental details. We text-mined the REACTION and MONITOR fields of the ENTRYs in the EXFOR library in order to identify understudied reactions and quantities. Using the results of the text-mining, we created an undirected graph from the EXFOR datasets with each graph node representing a single reaction and quantity and graph links representing the various types of connections between these reactions and quantities. This graph is an abstract representation of the connections in EXFOR, similar to graphs of social networks, authorship networks, etc. We use various graph theoretical tools to identify important yet understudied reactions and quantities in EXFOR. Although we identified a few cross sections relevant for shielding applications and isotope production, mostly we identified charged particle fluence monitor cross sections. As a side effect of this work, we learn that our abstract graph is typical of other real-world graphs.

  2. Identifying underground coal mine displacement through field and laboratory laser scanning

    NASA Astrophysics Data System (ADS)

    Slaker, Brent; Westman, Erik

    2014-01-01

    The ability to identify ground movements in the unique environment of an underground coal mine is explored through the use of laser scanning. Time-lapse scans were performed in an underground coal mine to detect rib surface change after different volumes of coal were removed from the mine ribs. Surface changes in the rib as small as 57 cm³ were detected through analysis of surface differences between triangulated surfaces created from point clouds. Results suggest that the uneven geometry, coal reflectance, and small movements of objects and references in the scene due to ventilation air do not significantly influence monitoring ability. Time-lapse scans were also performed on an artificial coal rib constructed to allow the researchers to control deformation and error precisely. A test of displacement measurement precision showed relative standard deviations of <0.1% are attainable with point cloud densities of >3200 pts/m². Changing the distance and angle of incidence of the artificial coal rib to the scanner had little impact on the accuracy of results beyond the expected reduction due to a smaller point density of the target area. The results collected in this study suggest that laser scanning can be a useful, comprehensive tool for measuring ground change in an underground coal mining environment.

  3. WHAT INNOVATIVE APPROACHES CAN BE DEVELOPED FOR MINING SITES?

    EPA Science Inventory

    Mining is essential to maintain our way of life. However, based upon industry's reporting in the most recent Toxic Release Inventory (TRI), the primary sources of heavy metal releases to the environment are mining and mining related activities. The hard rock mining industry rel...

  4. Using Frequent Item Set Mining and Feature Selection Methods to Identify Interacted Risk Factors - The Atrial Fibrillation Case Study.

    PubMed

    Li, Xiang; Liu, Haifeng; Du, Xin; Hu, Gang; Xie, Guotong; Zhang, Ping

    2016-01-01

    Disease risk prediction is highly important for early intervention and treatment, and identification of predictive risk factors is the key to achieving accurate prediction. In addition to original independent features in a dataset, some interacted features, such as comorbidities and combination therapies, may have non-additive influence on the disease outcome and can also be used in risk prediction to improve the prediction performance. However, it is usually difficult to manually identify the possible interacted risk factors due to the combinatorial explosion of features. In this paper, we propose an automatic approach to identify predictive risk factors with interactions using frequent item set mining and feature selection methods. The proposed approach was applied in a real-world case study of predicting ischemic stroke and thromboembolism (TE) for atrial fibrillation (AF) patients on the Chinese atrial fibrillation registry dataset, and the results show that our approach can not only improve the prediction performance, but also identify the comorbidities and combination therapies that have potential influences on TE occurrence for AF. PMID:27577446
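
    Frequent item set mining, the first stage of the approach above, can be sketched with a small level-wise (Apriori-style) search. The abstract does not say which mining algorithm the authors used, so this choice is an assumption, and the patient records and support threshold below are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search. Returns {frozenset: support_count}."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets, then keep only candidates whose every
        # (k-1)-subset is itself frequent (the Apriori pruning rule).
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        current = [
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


# Hypothetical patient records: sets of comorbidities and therapies.
PATIENTS = [
    {"hypertension", "diabetes", "warfarin"},
    {"hypertension", "diabetes"},
    {"hypertension", "warfarin"},
    {"diabetes", "warfarin", "hypertension"},
    {"heart_failure"},
]
itemsets = frequent_itemsets(PATIENTS, min_support=3)
# Itemsets with two or more members are candidate interaction features.
interactions = [s for s in itemsets if len(s) >= 2]
```

    Each itemset in `interactions` could then be encoded as a binary feature (present only when all of its members are present) and passed to a feature selection step alongside the original features.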

  5. Development and application of the Safe Performance Index as a risk-based methodology for identifying major hazard-related safety issues in underground coal mines

    NASA Astrophysics Data System (ADS)

    Kinilakodi, Harisha

    The underground coal mining industry has been under constant watch due to the high risk involved in its activities, and scrutiny increased because of the disasters that occurred in 2006-07. In the aftermath of the incidents, the U.S. Congress passed the Mine Improvement and New Emergency Response Act of 2006 (MINER Act), which strengthened the existing regulations and mandated new laws to address the various issues related to a safe working environment in the mines. Risk analysis in any form should be done on a regular basis to tackle the possibility of unwanted major hazard-related events such as explosions, outbursts, airbursts, inundations, spontaneous combustion, and roof fall instabilities. One of the responses by the Mine Safety and Health Administration (MSHA) in 2007 involved a new pattern of violations (POV) process to target mines with a poor safety performance, specifically to improve their safety. However, the 2010 disaster (worst in 40 years) gave an impression that the collective effort of the industry, federal/state agencies, and researchers to achieve the goal of zero fatalities and serious injuries has gone awry. The Safe Performance Index (SPI) methodology developed in this research is a straight-forward, effective, transparent, and reproducible approach that can help in identifying and addressing some of the existing issues while targeting (poor safety performance) mines which need help. It combines three injury and three citation measures that are scaled to have an equal mean (5.0) in a balanced way with proportionate weighting factors (0.05, 0.15, 0.30) and overall normalizing factor (15) into a mine safety performance evaluation tool. It can be used to assess the relative safety-related risk of mines, including by mine-size category. Using 2008 and 2009 data, comparisons were made of SPI-associated, normalized safety performance measures across mine-size categories, with emphasis on small-mine safety performance as compared to large- and

  6. Approaches to Post-Mining Land Reclamation in Polish Open-Cast Lignite Mining

    NASA Astrophysics Data System (ADS)

    Kasztelewicz, Zbigniew

    2014-06-01

    The paper presents the situation regarding the reclamation of post-mining land in the case of particular lignite mines in Poland until 2012 against the background of the whole opencast mining. It discusses the process of land purchase for mining operations and its sales after reclamation. It presents the achievements of mines in the reclamation and regeneration of post-mining land as a result of which, after development processes carried out according to European standards, it now serves the inhabitants as a recreational area that increases the attractiveness of the regions.

  7. Identifying Catchment-Scale Predictors of Coal Mining Impacts on New Zealand Stream Communities.

    PubMed

    Clapcott, Joanne E; Goodwin, Eric O; Harding, Jon S

    2016-03-01

    Coal mining activities can have severe and long-term impacts on freshwater ecosystems. At the individual stream scale, these impacts have been well studied; however, few attempts have been made to determine the predictors of mine impacts at a regional scale. We investigated whether catchment-scale measures of mining impacts could be used to predict biological responses. We collated data from multiple studies and analyzed algae, benthic invertebrate, and fish community data from 186 stream sites, including un-mined streams, and those associated with 620 mines on the West Coast of the South Island, New Zealand. Algal, invertebrate, and fish richness responded to mine impacts and were significantly higher in un-mined compared to mine-impacted streams. Changes in community composition toward more acid- and metal-tolerant species were evident for algae and invertebrates, whereas changes in fish communities were significant and driven by a loss of nonmigratory native species. Consistent catchment-scale predictors of mining activities affecting biota included the time post mining (years), mining density (the number of mines upstream per catchment area), and mining intensity (tons of coal production per catchment area). Mining was associated with a decline in stream biodiversity irrespective of catchment size, and recovery was not evident until at least 30 years after mining activities have ceased. These catchment-scale predictors can provide managers and regulators with practical metrics to focus on management and remediation decisions. PMID:26467674

  8. Identifying Catchment-Scale Predictors of Coal Mining Impacts on New Zealand Stream Communities

    NASA Astrophysics Data System (ADS)

    Clapcott, Joanne E.; Goodwin, Eric O.; Harding, Jon S.

    2016-03-01

    Coal mining activities can have severe and long-term impacts on freshwater ecosystems. At the individual stream scale, these impacts have been well studied; however, few attempts have been made to determine the predictors of mine impacts at a regional scale. We investigated whether catchment-scale measures of mining impacts could be used to predict biological responses. We collated data from multiple studies and analyzed algae, benthic invertebrate, and fish community data from 186 stream sites, including un-mined streams, and those associated with 620 mines on the West Coast of the South Island, New Zealand. Algal, invertebrate, and fish richness responded to mine impacts and were significantly higher in un-mined compared to mine-impacted streams. Changes in community composition toward more acid- and metal-tolerant species were evident for algae and invertebrates, whereas changes in fish communities were significant and driven by a loss of nonmigratory native species. Consistent catchment-scale predictors of mining activities affecting biota included the time post mining (years), mining density (the number of mines upstream per catchment area), and mining intensity (tons of coal production per catchment area). Mining was associated with a decline in stream biodiversity irrespective of catchment size, and recovery was not evident until at least 30 years after mining activities have ceased. These catchment-scale predictors can provide managers and regulators with practical metrics to focus on management and remediation decisions.

  9. An Integrated Assessment Approach to Address Artisanal and Small-Scale Gold Mining in Ghana.

    PubMed

    Basu, Niladri; Renne, Elisha P; Long, Rachel N

    2015-09-01

    Artisanal and small-scale gold mining (ASGM) is growing in many regions of the world including Ghana. The problems in these communities are complex and multi-faceted. To help increase understanding of such problems, and to enable consensus-building and effective translation of scientific findings to stakeholders, help inform policies, and ultimately improve decision making, we utilized an Integrated Assessment approach to study artisanal and small-scale gold mining activities in Ghana. Though Integrated Assessments have been used in the fields of environmental science and sustainable development, their use in addressing specific matter in public health, and in particular, environmental and occupational health is quite limited despite their many benefits. The aim of the current paper was to describe specific activities undertaken and how they were organized, and the outputs and outcomes of our activity. In brief, three disciplinary workgroups (Natural Sciences, Human Health, Social Sciences and Economics) were formed, with 26 researchers from a range of Ghanaian institutions plus international experts. The workgroups conducted activities in order to address the following question: What are the causes, consequences and correctives of small-scale gold mining in Ghana? More specifically: What alternatives are available in resource-limited settings in Ghana that allow for gold-mining to occur in a manner that maintains ecological health and human health without hindering near- and long-term economic prosperity? Several response options were identified and evaluated, and are currently being disseminated to various stakeholders within Ghana and internationally. PMID:26393627

  10. An Integrated Assessment Approach to Address Artisanal and Small-Scale Gold Mining in Ghana

    PubMed Central

    Basu, Niladri; Renne, Elisha P.; Long, Rachel N.

    2015-01-01

    Artisanal and small-scale gold mining (ASGM) is growing in many regions of the world including Ghana. The problems in these communities are complex and multi-faceted. To help increase understanding of such problems, and to enable consensus-building and effective translation of scientific findings to stakeholders, help inform policies, and ultimately improve decision making, we utilized an Integrated Assessment approach to study artisanal and small-scale gold mining activities in Ghana. Though Integrated Assessments have been used in the fields of environmental science and sustainable development, their use in addressing specific matter in public health, and in particular, environmental and occupational health is quite limited despite their many benefits. The aim of the current paper was to describe specific activities undertaken and how they were organized, and the outputs and outcomes of our activity. In brief, three disciplinary workgroups (Natural Sciences, Human Health, Social Sciences and Economics) were formed, with 26 researchers from a range of Ghanaian institutions plus international experts. The workgroups conducted activities in order to address the following question: What are the causes, consequences and correctives of small-scale gold mining in Ghana? More specifically: What alternatives are available in resource-limited settings in Ghana that allow for gold-mining to occur in a manner that maintains ecological health and human health without hindering near- and long-term economic prosperity? Several response options were identified and evaluated, and are currently being disseminated to various stakeholders within Ghana and internationally. PMID:26393627

  11. Identifying Liver Cancer and Its Relations with Diseases, Drugs, and Genes: A Literature-Based Approach

    PubMed Central

    Song, Min

    2016-01-01

    In biomedicine, scientific literature is a valuable source for knowledge discovery. Mining knowledge from textual data has become an ever important task as the volume of scientific literature is growing unprecedentedly. In this paper, we propose a framework for examining a certain disease based on existing information provided by scientific literature. Disease-related entities that include diseases, drugs, and genes are systematically extracted and analyzed using a three-level network-based approach. A paper-entity network and an entity co-occurrence network (macro-level) are explored and used to construct six entity specific networks (meso-level). Important diseases, drugs, and genes as well as salient entity relations (micro-level) are identified from these networks. Results obtained from the literature-based literature mining can serve to assist clinical applications. PMID:27195695

  12. A systematic approach to identify cellular auxetic materials

    NASA Astrophysics Data System (ADS)

    Körner, Carolin; Liebold-Ribeiro, Yvonne

    2015-02-01

    Auxetics are materials showing a negative Poisson’s ratio. This characteristic leads to unusual mechanical properties that make this an interesting class of materials. So far no systematic approach for generating auxetic cellular materials has been reported. In this contribution, we present a systematic approach to identifying auxetic cellular materials based on eigenmode analysis. The fundamental mechanism generating auxetic behavior is identified as rotation. With this knowledge, a variety of complex two-dimensional (2D) and three-dimensional (3D) auxetic structures based on simple unit cells can be identified.

  13. Text Mining approaches for automated literature knowledge extraction and representation.

    PubMed

    Nuzzo, Angelo; Mulas, Francesca; Gabetta, Matteo; Arbustini, Eloisa; Zupan, Blaz; Larizza, Cristiana; Bellazzi, Riccardo

    2010-01-01

    Due to the overwhelming volume of published scientific papers, information tools for automated literature analysis are essential to support current biomedical research. We have developed a knowledge extraction tool to help researchers discover useful information that can support their reasoning process. The tool is composed of a search engine based on Text Mining and Natural Language Processing techniques, and an analysis module which processes the search results in order to build annotation similarity networks. We tested our approach on the available knowledge about the genetic mechanism of cardiac diseases, where the target is to find both known and possible hypothetical relations between specific candidate genes and the trait of interest. We show that the system i) is able to effectively retrieve medical concepts and genes and ii) plays a relevant role assisting researchers in the formulation and evaluation of novel literature-based hypotheses. PMID:20841825

  14. Application of numerical simulation using a progressive failure approach to underground-coal-mine stability analysis

    SciTech Connect

    Ash, N.F.

    1987-01-01

    Stability in underground coal mines is of major concern to the coal industry due to its effect on both safety and productivity. Consequently, this can have a great influence on the design of efficient mine systems. In this work a progressive failure approach was used to simulate underground coal mine stability at two different mines. The two mines considered have different characteristics. Two- and three-dimensional finite element models were created to model different areas of a longwall mine. Different chain pillar configurations were considered and the resulting stress distributions were comparable to field measurements. A complete mine section was successfully modeled taking into consideration face advancement. The roof above entry intersections was also modeled using laminated composite simulation and the finite element method. The results showed trends similar to field observations. In addition, the progressive development of subsidence for the two different mines was simulated. The same variation in subsidence behavior recorded at the mine was realized in the finite element simulation. The progressive failure approach used in this work can successfully simulate underground coal mine stability. It can also be a helpful tool in the design of more efficient mine systems which can increase productivity and maintain a high level of safety.

  15. A genetic algorithm approach to recognition and data mining

    SciTech Connect

    Punch, W.F.; Goodman, E.D.; Min, Pei

    1996-12-31

    We review here our use of genetic algorithm (GA) and genetic programming (GP) techniques to perform "data mining," the discovery of particular/important data within large datasets, by finding optimal data classifications using known examples. Our first experiments concentrated on the use of a K-nearest neighbor algorithm in combination with a GA. The GA selected weights for each feature so as to optimize knn classification based on a linear combination of features. This combined GA-knn approach was successfully applied to both generated and real-world data. We later extended this work by substituting a GP for the GA. The GP-knn could not only optimize data classification via linear combinations of features but also determine functional relationships among the features. This allowed for improved performance and new information on important relationships among features. We review the effectiveness of the overall approach on examples from biology and compare the effectiveness of the GA and GP.
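
    The GA-knn idea in the abstract above (a genetic algorithm choosing per-feature weights so that a k-nearest-neighbor classifier performs well) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the toy data, the leave-one-out fitness, the uniform crossover, the Gaussian mutation, and the function names `knn_predict`, `loo_accuracy`, and `evolve_weights` are all hypothetical.

```python
import random

def knn_predict(train, query, weights, k=3):
    """Majority vote among the k nearest training points, using a
    feature-weighted squared Euclidean distance."""
    def dist(x):
        return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, query))
    nearest = sorted(train, key=lambda item: dist(item[0]))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

def loo_accuracy(data, weights, k=3):
    """Leave-one-out accuracy of the weighted k-NN classifier (the GA fitness)."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, x, weights, k) == y
    return hits / len(data)

def evolve_weights(data, n_features, generations=20, pop_size=12, seed=0):
    """Very small GA: keep the best half, refill with mutated crossovers."""
    rng = random.Random(seed)
    # Start from uniform weights plus random candidates.
    pop = [[1.0] * n_features] + [
        [rng.random() for _ in range(n_features)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        pop.sort(key=lambda w: loo_accuracy(data, w), reverse=True)
        parents = pop[: pop_size // 2]  # elitism: best half survives unchanged
        children = []
        for p in parents:
            q = rng.choice(parents)
            child = [rng.choice(pair) for pair in zip(p, q)]  # uniform crossover
            child = [max(0.0, w + rng.gauss(0, 0.1)) for w in child]  # mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: loo_accuracy(data, w))

# Toy data (invented): feature 0 separates the classes, feature 1 is noise.
_rng = random.Random(1)
data = [([label + _rng.gauss(0, 0.2), _rng.gauss(0, 3.0)], label)
        for label in (0, 1) for _ in range(10)]

best = evolve_weights(data, n_features=2)
```

    Because the best half of each generation is carried over unchanged and the uniform weight vector is in the starting population, the returned weights can never score worse on the fitness than unweighted k-NN on the same data.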

  16. Online Discourse on Fibromyalgia: Text-Mining to Identify Clinical Distinction and Patient Concerns

    PubMed Central

    Park, Jungsik; Ryu, Young Uk

    2014-01-01

    Background: The purpose of this study was to evaluate the possibility of using text-mining to identify clinical distinctions and patient concerns in online memoirs posted by patients with fibromyalgia (FM). Material/Methods: A total of 399 memoirs were collected from an FM group website. The unstructured data of memoirs associated with FM were collected through a crawling process and converted into structured data with a concordance, parts of speech tagging, and word frequency. We also conducted a lexical analysis and phrase pattern identification. After examining the data, a set of FM-related keywords were obtained and phrase net relationships were set through a web-based visualization tool. Results: The clinical distinction of FM was verified. Pain is the biggest issue for FM patients. The pains were affecting body parts including ‘muscles,’ ‘leg,’ ‘neck,’ ‘back,’ ‘joints,’ and ‘shoulders’ with accompanying symptoms such as ‘spasms,’ ‘stiffness,’ and ‘aching,’ and were described as ‘severe,’ ‘chronic,’ and ‘constant.’ This study also demonstrated that it was possible to understand the interests and concerns of FM patients through text-mining. FM patients wanted to escape from the pain and symptoms, so they were interested in medical treatment and help. Also, they seemed to have interest in their work and occupation, and hope to continue to live life through the relationships with the people around them. Conclusions: This research shows the potential for extracting keywords to confirm the clinical distinction of a certain disease, and text-mining can help objectively understand the concerns of patients by generalizing their large number of subjective illness experiences. However, it is believed that there are limitations to the processes and methods for organizing and classifying large amounts of text, so these limits have to be considered when analyzing the results. The development of research methodology to overcome

  17. A data-mining approach for multiple structural alignment of proteins.

    PubMed

    Siu, Wing-Yan; Mamoulis, Nikos; Yiu, Siu-Ming; Chan, Ho-Leung

    2010-01-01

    Comparing the 3D structures of proteins is an important but computationally hard problem in bioinformatics. In this paper, we propose studying the problem when much less information or assumptions are available. We model the structural alignment of proteins as a combinatorial problem. In the problem, each protein is simply a set of points in the 3D space, without sequence order information, and the objective is to discover all large enough alignments for any subset of the input. We propose a data-mining approach for this problem. We first perform geometric hashing of the structures such that points with similar locations in the 3D space are hashed into the same bin in the hash table. The novelty is that we consider each bin as a coincidence group and mine for frequent patterns, which is a well-studied technique in data mining. We observe that these frequent patterns are already potentially large alignments. Then a simple heuristic is used to extend the alignments if possible. We implemented the algorithm and tested it using real protein structures. The results were compared with existing tools. They showed that the algorithm is capable of finding conserved substructures that do not preserve sequence order, especially those existing in protein interfaces. The algorithm can also identify conserved substructures of functionally similar structures within a mixture with dissimilar ones. The running time of the program was smaller or comparable to that of the existing tools. PMID:21079664
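
    The binning step described in the abstract above (points with similar 3D locations hashed into the same bin, with each bin treated as a coincidence group) can be sketched roughly as follows. This is a deliberately simplified illustration, not the authors' algorithm: it hashes raw coordinates on a fixed grid without the reference frames that full geometric hashing uses, and the function name `coincidence_groups`, the cell size, and the support threshold are assumptions.

```python
from collections import defaultdict

def coincidence_groups(structures, cell=1.0, min_support=2):
    """Hash 3D points onto a grid and keep the bins hit by at least
    `min_support` different structures.

    `structures` maps a structure name to a list of (x, y, z) points.
    Returns {bin_key: set_of_structure_names}.
    """
    bins = defaultdict(set)
    for name, points in structures.items():
        for x, y, z in points:
            key = (int(x // cell), int(y // cell), int(z // cell))
            bins[key].add(name)
    # A bin shared by many structures is a candidate piece of an alignment.
    return {key: names for key, names in bins.items() if len(names) >= min_support}

# Invented toy input: three "proteins" sharing one point near the origin.
structures = {
    "A": [(0.1, 0.2, 0.3), (5.0, 5.0, 5.0)],
    "B": [(0.4, 0.1, 0.2), (9.0, 9.0, 9.0)],
    "C": [(0.3, 0.3, 0.3)],
}
groups = coincidence_groups(structures, cell=1.0, min_support=3)
```

    In this toy input only the bin at the origin is shared by all three structures; a frequent-pattern miner would then look for sets of such bins that co-occur across the same structures.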

  18. Pharmacogenomic Approach to Identify Drug Sensitivity in Small-Cell Lung Cancer

    PubMed Central

    Wildey, Gary; Chen, Yanwen; Lent, Ian; Stetson, Lindsay; Pink, John; Barnholtz-Sloan, Jill S.; Dowlati, Afshin

    2014-01-01

    There are currently no molecular targeted approaches to treat small-cell lung cancer (SCLC) similar to those used successfully against non-small-cell lung cancer. This failure is attributable to our inability to identify clinically-relevant subtypes of this disease. Thus, a more systematic approach to drug discovery for SCLC is needed. In this regard, two comprehensive studies recently published in Nature, the Cancer Cell Line Encyclopedia and the Cancer Genome Project, provide a wealth of data regarding the drug sensitivity and genomic profiles of many different types of cancer cells. In the present study we have mined these two studies for new therapeutic agents for SCLC and identified heat shock proteins, cyclin-dependent kinases and polo-like kinases (PLK) as attractive molecular targets with little current clinical trial activity in SCLC. Remarkably, our analyses demonstrated that most SCLC cell lines clustered into a single, predominant subgroup by either gene expression or CNV analyses, leading us to take a pharmacogenomic approach to identify subgroups of drug-sensitive SCLC cells. Using PLK inhibitors as an example, we identified and validated a gene signature for drug sensitivity in SCLC cell lines. This gene signature could distinguish subpopulations among human SCLC tumors, suggesting its potential clinical utility. Finally, circos plots were constructed to yield a comprehensive view of how transcriptional, copy number and mutational elements affect PLK sensitivity in SCLC cell lines. Taken together, this study outlines an approach to predict drug sensitivity in SCLC to novel targeted therapeutics. PMID:25198282

  19. An Integrative Proteomic Approach Identifies Novel Cellular SMYD2 Substrates.

    PubMed

    Ahmed, Hazem; Duan, Shili; Arrowsmith, Cheryl H; Barsyte-Lovejoy, Dalia; Schapira, Matthieu

    2016-06-01

    Protein methylation is a post-translational modification with important roles in transcriptional regulation and other biological processes, but the enzyme-substrate relationship between the 68 known human protein methyltransferases and the thousands of reported methylation sites is poorly understood. Here, we propose a bioinformatic approach that integrates structural, biochemical, cellular, and proteomic data to identify novel cellular substrates of the lysine methyltransferase SMYD2. Of the 14 novel putative SMYD2 substrates identified by our approach, six were confirmed in cells by immunoprecipitation: MAPT, CCAR2, EEF2, NCOA3, STUB1, and UTP14A. Treatment with the selective SMYD2 inhibitor BAY-598 abrogated the methylation signal, indicating that methylation of these novel substrates was dependent on the catalytic activity of the enzyme. We believe that our integrative approach can be applied to other protein lysine methyltransferases, and help understand how lysine methylation participates in wider signaling processes. PMID:27163177

  20. Proteomic and Genetic Approaches Identify Syk as an AML Target

    PubMed Central

    Hahn, Cynthia K.; Berchuck, Jacob E.; Ross, Kenneth N.; Kakoza, Rose M.; Clauser, Karl; Schinzel, Anna C.; Ross, Linda; Galinsky, Ilene; Davis, Tina N.; Silver, Serena J.; Root, David E.; Stone, Richard M.; DeAngelo, Daniel J.; Carroll, Martin; Hahn, William C.; Carr, Steven A.; Golub, Todd R.; Kung, Andrew L.; Stegmaier, Kimberly

    2009-01-01

    SUMMARY Cell-based screening can facilitate rapid identification of compounds inducing complex cellular phenotypes. Advancing a compound toward the clinic, however, generally requires identification of precise mechanisms of action. We previously found that epidermal growth factor receptor (EGFR) inhibitors induce acute myeloid leukemia (AML) differentiation via a non-EGFR mechanism. In this report, we integrated proteomic and RNAi-based strategies to identify their off-target anti-AML mechanism. These orthogonal approaches identified Syk as a target in AML. Genetic and pharmacological inactivation of Syk with a drug in clinical trial for other indications promoted differentiation of AML cells and attenuated leukemia growth in vivo. These results demonstrate the power of integrating diverse chemical, proteomic, and genomic screening approaches to identify therapeutic strategies for cancer. PMID:19800574

  1. A data mining approach to finding relationships between reservoir properties and oil production for CHOPS

    NASA Astrophysics Data System (ADS)

    Cai, Yongxiang; Wang, Xin; Hu, Kezhen; Dong, Mingzhe

    2014-12-01

    Cold heavy oil production with sand (CHOPS) is a primary oil extraction process for heavy crude oil and reservoir properties are key factors that contribute to the effectiveness of CHOPS. However, identification of the key reservoir properties and quantification of the relationships between the reservoir properties and the oil production are still challenging tasks. In this paper, we propose the use of a data mining approach for finding quantitative relationships between various reservoir properties and oil production for CHOPS. The approach includes four steps: firstly, a set of reservoir properties are identified to describe reservoir characteristics through a petrophysical analysis. In addition to common parameters, such as porosity and permeability, two new parameters - a fluid mobility factor and the maximum inscribed rectangular of net pay (MIRNP) - are proposed. Secondly, three new parameters to describe the production performance of wells are proposed: the peak value, effective life cycle and effective yield. Next, the fuzzy ranking method is used to rank the importance of the identified reservoir properties in terms of oil production. Finally, association rule mining is used to obtain quantitative relationships between reservoir property variables and the production performance of wells. The proposed methods have been applied for 118 wells in the Sparky Formation of the Lloydminster heavy oil field in Alberta. The result shows that the production performance of wells in the area could be described and predicted by using the found quantitative relations.
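
    The final step above, association rule mining, scores a rule such as "high porosity implies high yield" by its support and confidence. The sketch below shows only those two standard measures; the well records, the item labels, and the function name `rule_metrics` are invented for illustration and do not come from the paper.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    support    = fraction of transactions containing both sides
    confidence = fraction of antecedent-containing transactions that
                 also contain the consequent
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(antecedent <= t for t in transactions)
    n_ac = sum((antecedent | consequent) <= t for t in transactions)
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    return support, confidence

# Hypothetical discretized well records (one set of items per well).
wells = [
    {"porosity=high", "yield=high"},
    {"porosity=high", "yield=high"},
    {"porosity=high", "yield=low"},
    {"porosity=low", "yield=low"},
]
support, confidence = rule_metrics(wells, {"porosity=high"}, {"yield=high"})
```

    Here two of the four wells satisfy both sides of the rule, and two of the three high-porosity wells have high yield, so the rule has support 0.5 and confidence 2/3.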

  2. A Tools-Based Approach to Teaching Data Mining Methods

    ERIC Educational Resources Information Center

    Jafar, Musa J.

    2010-01-01

    Data mining is an emerging field of study in Information Systems programs. Although the course content has been streamlined, the underlying technology is still in a state of flux. The purpose of this paper is to describe how we utilized Microsoft Excel's data mining add-ins as a front-end to Microsoft's Cloud Computing and SQL Server 2008 Business…

  3. Risk evaluation of uranium mining: A geochemical inverse modelling approach

    NASA Astrophysics Data System (ADS)

    Rillard, J.; Zuddas, P.; Scislewski, A.

    2011-12-01

    It is well known that uranium extraction operations can increase risks linked to radiation exposure. The toxicity of uranium and associated heavy metals is the main environmental concern regarding exploitation and processing of U-ore. In areas where U mining is planned, a careful assessment of toxic and radioactive element concentrations is recommended before the start of mining activities. A background evaluation of harmful elements is important in order to prevent and/or quantify future water contamination resulting from possible migration of toxic metals coming from ore and waste water interaction. Controlled leaching experiments were carried out to investigate processes of ore and waste (leached ore) degradation, using samples from the uranium exploitation site located in Caetité-Bahia, Brazil. In experiments in which the reaction of waste with water was tested, we found that the water had low pH and high levels of sulphates and aluminium. On the other hand, in experiments in which ore was tested, the water had a chemical composition comparable to natural water found in the region of Caetité. On the basis of our experiments, we suggest that waste resulting from sulphuric acid treatment can induce acidification and salinization of surface and ground water. For this reason proper storage of waste is imperative. As a tool to evaluate the risks, a geochemical inverse modelling approach was developed to estimate the water-mineral interaction involving the presence of toxic elements. We used a method earlier described by Scislewski and Zuddas 2010 (Geochim. Cosmochim. Acta 74, 6996-7007) in which the reactive surface area of mineral dissolution can be estimated. We found that the reactive surface area of rock parent minerals is not constant over time but varies by several orders of magnitude in only two months of interaction. We propose that parent mineral heterogeneity and, particularly, neogenic phase formation may explain the observed variation of the

  4. InSAR Identifies Mine-Dewatering Associated Bedrock Compaction and Subsidence in North-Central Nevada

    NASA Astrophysics Data System (ADS)

    Katzenstein, K. W.; Bell, J. W.; Watters, R. J.

    2007-12-01

    During the last decade, InSAR has been used extensively for the delineation of aquifer-system response to heavy groundwater pumping. A number of studies have demonstrated the vastly improved spatial resolution afforded by InSAR relative to traditional surveying techniques in detecting groundwater-related effects, including subsidence. This has allowed for further understanding of the complexity of subsidence bowls and the role of secondary factors such as structure, aquifer material properties and other previously unforeseen factors. In the western U.S., ground subsidence related to mine dewatering is a common occurrence due to the very large volumes of water (as high as 100,000 acre-ft/yr) that are typically pumped in order to lower the local groundwater table to facilitate the excavation of open pit and underground mines. Several gold mines located along the Carlin Trend of Central Nevada have produced distinct InSAR-identified subsidence signals of greater areal extent and magnitude than most municipal groundwater signals, including signals partly or entirely within bedrock. One signal in particular shows a minimum of 54 cm of cumulative dewatering-related subsidence between June 1, 1992 and September 21, 2000. Our study has produced many (>50) interferograms, each covering different time intervals, allowing a better understanding of how the subsidence signal has evolved in response to varied pumping rates from dewatering wells. Since the spatial resolution of the InSAR is much better than that of the monitoring well locations, the complexity of the signal is better delineated. The areal extent of the subsidence feature is impressive as it extends as far as 20 km away from the location of the extraction wells used for dewatering. The area of maximum subsidence correlates well with the area of maximum groundwater drawdown; however, the subsidence signal extends well beyond (as much as 8-10 km) the observed groundwater drawdown pattern. This suggests a much

  5. Data Mining Approaches for Genome-Wide Association of Mood Disorders

    PubMed Central

    Pirooznia, Mehdi; Seifuddin, Fayaz; Judy, Jennifer; Mahon, Pamela B.; Potash, James B.; Zandi, Peter P.

    2012-01-01

    Mood disorders are highly heritable forms of major mental illness. A major breakthrough in elucidating the genetic architecture of mood disorders was anticipated with the advent of genome-wide association studies (GWAS). However, to date few susceptibility loci have been conclusively identified. The genetic etiology of mood disorders appears to be quite complex, and as a result, alternative approaches for analyzing GWAS data are needed. Recently, a polygenic scoring approach that captures the effects of alleles across multiple loci was successfully applied to the analysis of GWAS data in schizophrenia and bipolar disorder (BP). However, this method may be overly simplistic in its approach to the complexity of genetic effects. Data mining methods are available that may be applied to analyze the high dimensional data generated by GWAS of complex psychiatric disorders. We sought to compare the performance of five data mining methods, namely, Bayesian Networks (BN), Support Vector Machine (SVM), Random Forest (RF), Radial Basis Function network (RBF), and Logistic Regression (LR), against the polygenic scoring approach in the analysis of GWAS data on BP. The different classification methods were trained on GWAS datasets from the Bipolar Genome Study (2,191 cases with BP and 1,434 controls) and their ability to accurately classify case/control status was tested on a GWAS dataset from the Wellcome Trust Case Control Consortium. The performance of the classifiers in the test dataset was evaluated by comparing area under the receiver operating characteristic curves (AUC). BN performed the best of all the data mining classifiers, but none of these did significantly better than the polygenic score approach. We further examined a subset of SNPs in genes that are expressed in the brain, under the hypothesis that these might be most relevant to BP susceptibility, but all the classifiers performed worse with this reduced set of SNPs. 
The discriminative accuracy of all of these
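
    The AUC comparison described in this record can be illustrated with a minimal sketch. It computes the area under the ROC curve through the rank-based (Mann-Whitney) identity: the fraction of case/control pairs in which the case receives the higher score, with ties counted as one half. The function name `auc` and the toy scores are hypothetical and are not data from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0      # case ranked above control
            elif c == k:
                wins += 0.5      # ties count as half a win
    return wins / (len(cases) * len(controls))


# Hypothetical classifier scores (assumed, for illustration only)
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
labels = [1, 1, 1, 0, 0, 0]      # 1 = case, 0 = control
toy_auc = auc(scores, labels)
```

    An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is why the classifiers in the record can be ranked by this single number.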

  6. Identifying Learning Behaviors by Contextualizing Differential Sequence Mining with Action Features and Performance Evolution

    ERIC Educational Resources Information Center

    Kinnebrew, John S.; Biswas, Gautam

    2012-01-01

    Our learning-by-teaching environment, Betty's Brain, captures a wealth of data on students' learning interactions as they teach a virtual agent. This paper extends an exploratory data mining methodology for assessing and comparing students' learning behaviors from these interaction traces. The core algorithm employs sequence mining techniques to…

  7. IDENTIFYING RECENT SURFACE MINING ACTIVITIES USING A NORMALIZED DIFFERENCE VEGETATION INDEX (NDVI) CHANGE DETECTION METHOD

    EPA Science Inventory



    Coal mining is a major resource extraction activity in the Appalachian Mountains. The increased size and frequency of a specific type of surface mining, known as mountain top removal-valley fill, has in recent years raised various environmental concerns. During mountainto...
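
    The record is truncated before it describes the method, so the following is only a generic sketch of NDVI change detection, not the procedure used in the study. It uses the standard definition NDVI = (NIR - Red) / (NIR + Red) and flags pixels whose NDVI fell by more than a chosen amount between two dates. The function names, the `drop_threshold` value and the reflectance values are all assumptions.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def ndvi_loss_mask(before, after, drop_threshold=0.3):
    """Flag pixels whose NDVI dropped by more than drop_threshold.

    before/after are lists of (nir, red) reflectance pairs for the same pixels.
    """
    return [ndvi(*b) - ndvi(*a) > drop_threshold for b, a in zip(before, after)]


# Hypothetical reflectances for three pixels on two dates (assumed values)
before = [(0.50, 0.10), (0.45, 0.09), (0.30, 0.25)]
after = [(0.48, 0.10), (0.20, 0.18), (0.30, 0.24)]
mask = ndvi_loss_mask(before, after)   # only the second pixel lost vegetation
```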

  8. GTA: a game theoretic approach to identifying cancer subnetwork markers.

    PubMed

    Farahmand, S; Goliaei, S; Ansari-Pour, N; Razaghi-Moghadam, Z

    2016-03-01

    The identification of genetic markers (e.g. genes, pathways and subnetworks) for cancer has been one of the most challenging research areas in recent years. A subset of these studies attempt to analyze genome-wide expression profiles to identify markers with high reliability and reusability across independent whole-transcriptome microarray datasets. Therefore, the functional relationships of genes are integrated with their expression data. However, for a more accurate representation of the functional relationships among genes, utilization of the protein-protein interaction network (PPIN) seems to be necessary. Herein, a novel game theoretic approach (GTA) is proposed for the identification of cancer subnetwork markers by integrating genome-wide expression profiles and PPIN. The GTA method was applied to three distinct whole-transcriptome breast cancer datasets to identify the subnetwork markers associated with metastasis. To evaluate the performance of our approach, the identified subnetwork markers were compared with gene-based, pathway-based and network-based markers. We show that GTA is not only capable of identifying robust metastatic markers but also provides a higher classification performance. In addition, based on these GTA-based subnetworks, we identified a new bona fide candidate gene for breast cancer susceptibility. PMID:26750920

  9. National Conference on Mining-Influenced Waters: Approaches for Characterization, Source Control and Treatment

    EPA Science Inventory

    The conference goal was to provide a forum for the exchange of scientific information on current and emerging approaches to assessing characterization, monitoring, source control, treatment and/or remediation on mining-influenced waters. The conference was aimed at mining remedi...

  10. TOXICITY APPROACHES TO ASSESSING MINING IMPACTS AND MINE WASTE TREATMENT EFFECTIVENESS

    EPA Science Inventory

    The USEPA Office of Research and Development's National Exposure Research Laboratory and National Risk Management Research Laboratory have been evaluating the impact of mining sites on receiving streams and the effectiveness of waste treatment technologies in removing toxicity fo...

  11. Acid mine drainage risks - A modeling approach to siting mine facilities in Northern Minnesota USA

    NASA Astrophysics Data System (ADS)

    Myers, Tom

    2016-02-01

    Most watershed-scale planning for mine-caused contamination concerns remediation of past problems while future planning relies heavily on engineering controls. As an alternative, a watershed scale groundwater fate and transport model for the Rainy Headwaters, a northeastern Minnesota watershed, has been developed to examine the risks of leaks or spills to a pristine downstream watershed. The model shows that the risk depends on the location and whether the source of the leak is on the surface or from deeper underground facilities. Underground sources cause loads that last longer but arrive at rivers after a longer travel time and have lower concentrations due to dilution and attenuation. Surface contaminant sources could cause much more short-term damage to the resource. Because groundwater dominates baseflow, mine contaminant seepage would cause the most damage during low flow periods. Groundwater flow and transport modeling is a useful tool for decreasing the risk to downgradient sources by aiding in the placement of mine facilities. Although mines are located based on the minerals, advance planning and analysis could avoid siting mine facilities where failure or leaks would cause too much natural resource damage. Watershed scale transport modeling could help locate the facilities or decide in advance that the mine should not be constructed due to the risk to downstream resources.

  12. Large screen approaches to identify novel malaria vaccine candidates.

    PubMed

    Davies, D Huw; Duffy, Patrick; Bodmer, Jean-Luc; Felgner, Philip L; Doolan, Denise L

    2015-12-22

    Until recently, malaria vaccine development efforts have focused almost exclusively on a handful of well characterized Plasmodium falciparum antigens. Despite dedicated work by many researchers on different continents spanning more than half a century, a successful malaria vaccine remains elusive. Sequencing of the P. falciparum genome has revealed more than five thousand genes, providing the foundation for systematic approaches to discover candidate vaccine antigens. We are taking advantage of this wealth of information to discover new antigens that may be more effective vaccine targets. Herein, we describe different approaches to large-scale screening of the P. falciparum genome to identify targets of either antibody responses or T cell responses using human specimens collected in Controlled Human Malaria Infections (CHMI) or under conditions of natural exposure in the field. These genome, proteome and transcriptome based approaches offer enormous potential for the development of an efficacious malaria vaccine. PMID:26428458

  13. Identifying Prolonged Grief Reactions in Children: Dimensional and Diagnostic Approaches

    PubMed Central

    Melhem, Nadine M.; Porta, Giovanna; Payne, Monica Walker; Brent, David A.

    2013-01-01

    Objective: Children with prolonged grief reactions (PGR) have been found to be at increased risk for depression and functional impairment. Identifying and diagnosing PGR in children is challenging, as there are no available dimensional measures with established thresholds and no diagnostic criteria in the DSM-IV. We examine thresholds for the Inventory for Complicated Grief–Revised for Children (ICG-RC) and compare this dimensional approach to the proposed DSM-5 criteria for Persistent Complex Bereavement-Related Disorder. We also identify a screening tool for PGR. Method: Parentally bereaved children, 8–17 years of age, were assessed at 9, 21, and 33 months after parental death. Receiver Operator Characteristics were used to establish the “best threshold” that would identify children with PGR and evaluate the proposed DSM-5 criteria cross-sectionally and longitudinally. Results: A score of 68 or higher on the ICG-RC was found to have high sensitivity (0.942) and specificity (0.965) in differentiating cases with PGR from noncases at 9 months. We also identify a 6-item screening tool that consists of longing and yearning for the deceased, inability to accept the death, shock, disbelief, loneliness, and a changed world view. The proposed DSM-5 criteria only correctly identified 20% to 41.7% of cases with PGR at different timepoints. Conclusions: For the identification of youth at risk for PGR, the dimensional approach outperformed the proposed categorical diagnostic criteria. We propose a brief screening scale that, if validated, can help clinicians identify bereaved children at risk for PGR, and guide the development of prevention and intervention strategies. PMID:23702449
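
    The threshold search described in this record can be sketched as follows. The abstract does not say which criterion defined the “best threshold”, so this sketch assumes Youden's J (sensitivity + specificity - 1), which is one common choice. The function names and the toy scores are hypothetical and are not ICG-RC data.

```python
def sens_spec(scores, is_case, threshold):
    """Sensitivity and specificity when score >= threshold is called positive."""
    tp = sum(1 for s, c in zip(scores, is_case) if c and s >= threshold)
    fn = sum(1 for s, c in zip(scores, is_case) if c and s < threshold)
    tn = sum(1 for s, c in zip(scores, is_case) if not c and s < threshold)
    fp = sum(1 for s, c in zip(scores, is_case) if not c and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def best_threshold(scores, is_case):
    """Observed score that maximizes Youden's J (an assumed criterion)."""
    def youden(t):
        sens, spec = sens_spec(scores, is_case, t)
        return sens + spec - 1
    return max(sorted(set(scores)), key=youden)


# Hypothetical questionnaire scores and case status (assumed values)
scores = [10, 20, 30, 35, 40, 50, 60, 70]
is_case = [False, False, True, False, True, True, True, True]
cutoff = best_threshold(scores, is_case)
sens, spec = sens_spec(scores, is_case, cutoff)
```

    With these toy values the selected cutoff is 40, where four of five cases and all three noncases are classified correctly.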

  14. Trend Motif: A Graph Mining Approach for Analysis of Dynamic Complex Networks

    SciTech Connect

    Jin, R; McCallen, S; Almaas, E

    2007-05-28

    Complex networks have been used successfully in scientific disciplines ranging from sociology to microbiology to describe systems of interacting units. Until recently, studies of complex networks have mainly focused on their network topology. However, in many real world applications, the edges and vertices have associated attributes that are frequently represented as vertex or edge weights. Furthermore, these weights are often not static, instead changing with time and forming a time series. Hence, to fully understand the dynamics of the complex network, we have to consider both network topology and related time series data. In this work, we propose a motif mining approach to identify trend motifs for such purposes. Simply stated, a trend motif describes a recurring subgraph where each of its vertices or edges displays similar dynamics over a user-defined period. Given this, each trend motif occurrence can help reveal significant events in a complex system; frequent trend motifs may aid in uncovering dynamic rules of change for the system, and the distribution of trend motifs may characterize the global dynamics of the system. Here, we have developed efficient mining algorithms to extract trend motifs. Our experimental validation using three disparate empirical datasets, ranging from the stock market and world trade to a protein interaction network, has demonstrated the efficiency and effectiveness of our approach.

  15. A novel approach to tag and identify geranylgeranylated proteins

    PubMed Central

    Chan, Lai N.; Hart, Courtenay; Guo, Lea; Nyberg, Tamara; Davies, Brandon S.J.; Fong, Loren G.; Young, Stephen G.; Agnew, Brian J.; Tamanoi, Fuyuhiko

    2010-01-01

    A recently developed proteomic strategy, the “GG-azide”-labeling approach, is described for the detection and proteomic analysis of geranylgeranylated proteins. This approach involves metabolic incorporation of a synthetic azido-geranylgeranyl analog and chemoselective derivatization of azido-geranylgeranyl-modified proteins by “click” chemistry, using a tetramethylrhodamine-alkyne. The resulting conjugated proteins can be separated by 1-D or 2-D and pH fractionation, and detected by fluorescence imaging. This method is compatible with downstream LC-MS/MS analysis. Proteomic analysis of conjugated proteins by this approach identified several known geranylgeranylated proteins as well as Rap2c, a novel member of the Ras family. Furthermore, prenylation of progerin in mouse embryonic fibroblast cells was examined using this approach, demonstrating that this strategy can be used to study prenylation of specific proteins. The “GG-azide”-labeling approach provides a new tool for the detection and proteomic analysis of geranylgeranylated proteins, and it can readily be extended to other post-translational modifications. PMID:19784953

  16. Determining the familial risk distribution of colorectal cancer: a data mining approach.

    PubMed

    Chau, Rowena; Jenkins, Mark A; Buchanan, Daniel D; Ait Ouakrim, Driss; Giles, Graham G; Casey, Graham; Gallinger, Steven; Haile, Robert W; Le Marchand, Loic; Newcomb, Polly A; Lindor, Noralane M; Hopper, John L; Win, Aung Ko

    2016-04-01

    This study aimed to characterize the distribution of colorectal cancer risk using family history of cancers by data mining. Family histories for 10,066 colorectal cancer cases recruited to population cancer registries of the Colon Cancer Family Registry were analyzed using a data mining framework. A novel index was developed to quantify familial cancer aggregation. An artificial neural network was used to identify distinct categories of familial risk. Standardized incidence ratios (SIRs) and corresponding 95% confidence intervals (CIs) of colorectal cancer were calculated for each category. We identified five major and 66 minor categories of familial risk for developing colorectal cancer. The distribution of the major risk categories was: (1) 7% of families (SIR = 7.11; 95% CI 6.65-7.59) had a strong family history of colorectal cancer; (2) 13% of families (SIR = 2.94; 95% CI 2.78-3.10) had a moderate family history of colorectal cancer; (3) 11% of families (SIR = 1.23; 95% CI 1.12-1.36) had a strong family history of breast cancer and a weak family history of colorectal cancer; (4) 9% of families (SIR = 1.06; 95% CI 0.96-1.18) had a strong family history of prostate cancer and a weak family history of colorectal cancer; and (5) 60% of families (SIR = 0.61; 95% CI 0.57-0.65) had a weak family history of all cancers. There is a wide variation of colorectal cancer risk that can be categorized by family history of cancer, with a strong gradient of colorectal cancer risk between the highest and lowest risk categories. The risk of colorectal cancer for people in the highest risk category of family history (7% of the population) was 12 times that for people in the lowest risk category (60% of the population). Data mining proved an effective approach for gaining insight into the underlying cancer aggregation patterns and for categorizing familial risk of colorectal cancer. PMID:26681340
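
    As a quick arithmetic check of the figures quoted in this record, the sketch below divides the highest-category SIR by the lowest-category SIR and confirms that the five major categories account for all families. It assumes that the ratio of two SIRs computed against the same reference rates approximates the relative risk between categories; the abstract does not state how its twelvefold figure was derived. The variable names are hypothetical.

```python
# SIRs and family shares as quoted in the abstract
sir_by_category = {
    "strong CRC history": 7.11,
    "moderate CRC history": 2.94,
    "strong breast, weak CRC history": 1.23,
    "strong prostate, weak CRC history": 1.06,
    "weak history of all cancers": 0.61,
}
share_percent = [7, 13, 11, 9, 60]

# Ratio between the highest- and lowest-risk categories (assumed comparable)
risk_ratio = max(sir_by_category.values()) / min(sir_by_category.values())
total_share = sum(share_percent)
```

    The ratio is about 11.7, which rounds to the twelvefold difference reported, and the shares sum to 100%.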

  17. Hazards identified and the need for health risk assessment in the South African mining industry.

    PubMed

    Utembe, W; Faustman, E M; Matatiele, P; Gulumian, M

    2015-12-01

    Although mining plays a prominent role in the economy of South Africa, it is associated with many chemical hazards. Exposure to dust from mining can lead to many pathological effects depending on mineralogical composition, size, shape and levels and duration of exposure. Mining and processing of minerals also result in occupational exposure to toxic substances such as platinum, chromium, vanadium, manganese, mercury, cyanide and diesel particulate. South Africa has set occupational exposure limits (OELs) for some hazards, but mine workers are still at a risk. Since the hazard posed by a mineral depends on its physiochemical properties, it is recommended that South Africa should not simply adopt OELs from other countries but rather set her own standards based on local toxicity studies. The limits should take into account the issue of mixtures to which workers could be exposed as well as the health status of the workers. The mining industry is also a source of contamination of the environment, due inter alia to the large areas of tailings dams and dumps left behind. Therefore, there is need to develop guidelines for safe land-uses of contaminated lands after mine closure. PMID:26614808

  18. The Usage of Association Rule Mining to Identify Influencing Factors on Deafness After Birth

    PubMed Central

    Shahraki, Azimeh Danesh; Safdari, Reza; Gahfarokhi, Hamid Habibi; Tahmasebian, Shahram

    2015-01-01

    Background: Providing complete, high-quality health care services plays a very important role in enabling people to understand the factors related to personal and social health and to choose suitable healthy behaviors in order to achieve a healthy life. For this reason, demographic and clinical data on individuals are collected, and this huge volume of data is a valuable resource for analyzing, exploring and discovering valuable information and communication. This study used association rule techniques in data mining to identify the factors affecting hearing loss after birth in Iran. Materials and Methods: This is a data-oriented study. The study population consisted of questionnaires from several provinces of the country. First, all questionnaire data were stored as information tables in SQL Server, with data entry performed using software written in C# .NET; then the Association algorithm in SQL Server Data Tools and the Clementine software were applied to determine the rules and hidden patterns in the gathered data. Findings: Two factors, the number of deaf brothers and the degree of consanguinity of the parents, have a significant impact on the severity of an individual's deafness. Also, when the severity of hearing loss is greater than or equal to moderately severe hearing loss, people use hearing aids, and men are less interested in using hearing aids. Conclusion: In families in which the parents are in a consanguineous marriage as first-degree (girl/boy cousins) or second-degree relatives (girl/boy cousins), and especially first-degree, the number of people with severe hearing loss or deafness is higher; in the use of hearing aids, the gender of the patient is more important than the severity of the hearing loss. PMID:26862245
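
    The study ran its association analysis in SQL Server Data Tools and Clementine; the sketch below is not that implementation. It only illustrates, in plain Python, the three quantities that association rule mining usually reports for a rule "antecedent implies consequent": support, confidence and lift. The function names, attribute labels and records are hypothetical and do not come from the questionnaires.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for antecedent -> consequent."""
    s_a = support(transactions, antecedent)
    s_c = support(transactions, consequent)
    if s_a == 0 or s_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    s_ac = support(transactions, set(antecedent) | set(consequent))
    confidence = s_ac / s_a          # P(consequent | antecedent)
    lift = confidence / s_c          # > 1 means positive association
    return s_ac, confidence, lift


# Hypothetical records: each set holds the attributes present for one person
records = [
    {"attr_a", "attr_b", "attr_c"},
    {"attr_a", "attr_b"},
    {"attr_a", "attr_c"},
    {"attr_b"},
    {"attr_c"},
]
sup, conf, lift = rule_metrics(records, {"attr_a"}, {"attr_b"})
```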

  19. A genomic approach to identify hybrid incompatibility genes

    PubMed Central

    Cooper, Jacob C.; Phadnis, Nitin

    2016-01-01

    Uncovering the genetic and molecular basis of barriers to gene flow between populations is key to understanding how new species are born. Intrinsic postzygotic reproductive barriers such as hybrid sterility and hybrid inviability are caused by deleterious genetic interactions known as hybrid incompatibilities. The difficulty in identifying these hybrid incompatibility genes remains a rate-limiting step in our understanding of the molecular basis of speciation. We recently described how whole genome sequencing can be applied to identify hybrid incompatibility genes, even from genetically terminal hybrids. Using this approach, we discovered a new hybrid incompatibility gene, gfzf, between Drosophila melanogaster and Drosophila simulans, and found that it plays an essential role in cell cycle regulation. Here, we discuss the history of the hunt for incompatibility genes between these species, discuss the molecular roles of gfzf in cell cycle regulation, and explore how intragenomic conflict drives the evolution of fundamental cellular mechanisms that lead to the developmental arrest of hybrids. PMID:27230814

  20. Quantiles Regression Approach to Identifying the Determinant of Breastfeeding Duration

    NASA Astrophysics Data System (ADS)

    Mahdiyah; Norsiah Mohamed, Wan; Ibrahim, Kamarulzaman

    In this study, a quantile regression approach is applied to data from the Malaysian Family Life Survey (MFLS) to identify factors that are significantly related to the different conditional quantiles of breastfeeding duration. Classical linear regression methods are based on minimizing the residual sum of squares, whereas quantile regression uses a mechanism based on the conditional median function and the full range of other conditional quantile functions. Overall, it is found that the period of breastfeeding is significantly related to place of living, religion and total number of children in the family.
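
    The contrast drawn in this record between least squares and quantile regression can be illustrated with the "pinball" (check) loss that quantile regression minimizes. The sketch below fits only a constant, with no covariates, to show that the minimizer is the sample quantile (the median for tau = 0.5) rather than the mean. The function names and the duration values are hypothetical and are not MFLS data.

```python
def pinball_loss(values, prediction, tau):
    """Mean check loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    total = 0.0
    for v in values:
        r = v - prediction
        total += tau * r if r >= 0 else (tau - 1) * r
    return total / len(values)


def best_constant(values, tau):
    """Observed value that minimizes the pinball loss for quantile tau."""
    return min(sorted(set(values)), key=lambda c: pinball_loss(values, c, tau))


# Hypothetical durations in months, with one long outlier (assumed values)
durations = [1, 2, 3, 4, 20]
median_fit = best_constant(durations, 0.5)    # the median, unaffected by the outlier
upper_fit = best_constant(durations, 0.9)     # an upper quantile
mean_value = sum(durations) / len(durations)  # least-squares constant fit
```

    Here the median fit is 3 while the mean is 6, which shows why modelling several quantiles can describe a skewed outcome better than a single mean regression.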

  1. Genetic approaches for identifying kinetochore components in Saccharomyces cerevisiae

    SciTech Connect

    Doheny, K.F.; Puziss, J.; Spencer, F.; Hieter, P.

    1993-12-31

    A fundamental aspect of the cell division cycle is the chromosome cycle in which each of the chromosomal DNA molecules undergoes a series of morphological changes and complex movements to ensure faithful distribution at mitosis. The gene products responsible for execution of the chromosome cycle include structural components, such as those that assemble into the mitotic spindle apparatus, and regulatory components, such as those that coordinate the ordered series of events leading to chromosome segregation within the cell cycle. We have been taking several genetic approaches to identify genes encoding determinants critical to the chromosome cycle in the budding yeast, S. cerevisiae.

  2. Practical Approaches for Mining Frequent Patterns in Molecular Datasets

    PubMed Central

    Naulaerts, Stefan; Moens, Sandy; Engelen, Kristof; Berghe, Wim Vanden; Goethals, Bart; Laukens, Kris; Meysman, Pieter

    2016-01-01

    Pattern detection is an inherent task in the analysis and interpretation of complex and continuously accumulating biological data. Numerous itemset mining algorithms have been developed in the last decade to efficiently detect specific pattern classes in data. Although many of these have proven their value for addressing bioinformatics problems, several factors still slow down promising algorithms from gaining popularity in the life science community. Many of these issues stem from the low user-friendliness of these tools and the complexity of their output, which is often large, static, and consequently hard to interpret. Here, we apply three software implementations on common bioinformatics problems and illustrate some of the advantages and disadvantages of each, as well as inherent pitfalls of biological data mining. Frequent itemset mining exists in many different flavors, and users should decide their software choice based on their research question, programming proficiency, and added value of extra features. PMID:27168722
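
    Frequent itemset mining, the technique this record evaluates, can be sketched with a small level-wise (Apriori-style) search. This is not one of the three software implementations compared in the paper; it is a minimal stdlib illustration. The function name, the `min_support` value and the gene labels are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]   # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate
        level = {}
        for cand in current:
            sup = sum(1 for t in transactions if cand <= t) / n
            if sup >= min_support:
                level[cand] = sup
        frequent.update(level)
        # Build (k+1)-candidates whose k-subsets are all frequent
        k += 1
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = list(candidates)
    return frequent


# Hypothetical samples, each listing the genes flagged in that sample
samples = [
    {"g1", "g2", "g3"},
    {"g1", "g2"},
    {"g1", "g3"},
    {"g2", "g3"},
    {"g1", "g2", "g3"},
]
result = frequent_itemsets(samples, min_support=0.6)
```

    With these toy samples every single gene and every pair is frequent, but the triple appears in only two of five samples and is pruned. Real datasets produce far larger outputs, which is the interpretability problem the record highlights.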

  3. Practical Approaches for Mining Frequent Patterns in Molecular Datasets.

    PubMed

    Naulaerts, Stefan; Moens, Sandy; Engelen, Kristof; Berghe, Wim Vanden; Goethals, Bart; Laukens, Kris; Meysman, Pieter

    2016-01-01

    Pattern detection is an inherent task in the analysis and interpretation of complex and continuously accumulating biological data. Numerous itemset mining algorithms have been developed in the last decade to efficiently detect specific pattern classes in data. Although many of these have proven their value for addressing bioinformatics problems, several factors still slow down promising algorithms from gaining popularity in the life science community. Many of these issues stem from the low user-friendliness of these tools and the complexity of their output, which is often large, static, and consequently hard to interpret. Here, we apply three software implementations on common bioinformatics problems and illustrate some of the advantages and disadvantages of each, as well as inherent pitfalls of biological data mining. Frequent itemset mining exists in many different flavors, and users should decide their software choice based on their research question, programming proficiency, and added value of extra features. PMID:27168722

  4. Functional epigenetic approach identifies frequently methylated genes in Ewing sarcoma.

    PubMed

    Alholle, Abdullah; Brini, Anna T; Gharanei, Seley; Vaiyapuri, Sumathi; Arrigoni, Elena; Dallol, Ashraf; Gentle, Dean; Kishida, Takeshi; Hiruma, Toru; Avigad, Smadar; Grimer, Robert; Maher, Eamonn R; Latif, Farida

    2013-11-01

    Using a candidate gene approach we recently identified frequent methylation of the RASSF2 gene associated with poor overall survival in Ewing sarcoma (ES). To identify effective biomarkers in ES on a genome-wide scale, we used a functionally proven epigenetic approach, in which gene expression was induced in ES cell lines by treatment with a demethylating agent followed by hybridization onto high density gene expression microarrays. After following a strict selection criterion, 34 genes were selected for expression and methylation analysis in ES cell lines and primary ES. Eight genes (CTHRC1, DNAJA4, ECHDC2, NEFH, NPTX2, PHF11, RARRES2, TSGA14) showed methylation frequencies of >20% in ES tumors (range 24-71%); these genes were expressed in human bone marrow derived mesenchymal stem cells (hBMSC), and hypermethylation was associated with transcriptional silencing. Methylation of NPTX2 or PHF11 was associated with poorer prognosis in ES. In addition, six of the above genes also showed methylation frequency of >20% (range 36-50%) in osteosarcomas. Identification of these genes may provide insights into bone cancer tumorigenesis and development of epigenetic biomarkers for prognosis and detection of these rare tumor types. PMID:24005033

  5. Experimental approaches to identify non-coding RNAs

    PubMed Central

    Hüttenhofer, Alexander; Vogel, Jörg

    2006-01-01

    Cellular RNAs that do not function as messenger RNAs (mRNAs), transfer RNAs (tRNAs) or ribosomal RNAs (rRNAs) comprise a diverse class of molecules that are commonly referred to as non-protein-coding RNAs (ncRNAs). These molecules have been known for quite a while, but their importance was not fully appreciated until recent genome-wide searches discovered thousands of these molecules and their genes in a variety of model organisms. Some of these screens were based on biocomputational prediction of ncRNA candidates within entire genomes of model organisms. Alternatively, direct biochemical isolation of expressed ncRNAs from cells, tissues or entire organisms has been shown to be a powerful approach to identify ncRNAs both at the level of individual molecules and at a global scale. In this review, we will survey several such wet-lab strategies, i.e. direct sequencing of ncRNAs, shotgun cloning of small-sized ncRNAs (cDNA libraries), microarray analysis and genomic SELEX to identify novel ncRNAs, and discuss the advantages and limits of these approaches. PMID:16436800

  6. Detecting Structural Damage of Nuclear Power Plant by Interactive Data Mining Approach

    SciTech Connect

    Yufei Shu

    2006-07-01

    This paper presents a nonlinear structural damage identification technique, based on an interactive data mining approach, which integrates a human cognitive model in a data mining loop. A mining control agent emulating human analysts is developed, which directly interacts with the data miner, analyzing and verifying the output of the data miner and controlling the data mining process. Additionally, an artificial neural network method, which is adopted as a core component of the proposed interactive data mining method, is evolved by adding a novelty detecting and retraining function for handling complicated nuclear power plant quake-proof data. Plant quake-proof testing data has been applied to the system to validate the proposed method. (author)

  7. An online approach for mining collective behaviors from molecular dynamics simulations.

    PubMed

    Ramanathan, Arvind; Agarwal, Pratul K; Kurnikova, Maria; Langmead, Christopher J

    2010-03-01

    Collective behavior involving distally separate regions in a protein is known to widely affect its function. In this article, we present an online approach to study and characterize collective behavior in proteins as molecular dynamics (MD) simulations progress. Our representation of MD simulations as a stream of continuously evolving data allows us to succinctly capture spatial and temporal dependencies that may exist and analyze them efficiently using data mining techniques. By using tensor analysis we identify (a) collective motions (i.e., dynamic couplings) and (b) time-points during the simulation where the collective motions suddenly change. We demonstrate the applicability of this method on two different protein simulations for barnase and cyclophilin A. We characterize the collective motions in these proteins using our method and analyze sudden changes in these motions. Taken together, our results indicate that tensor analysis is well suited to extracting information from MD trajectories in an online fashion. PMID:20377447

  8. An Integrated Approach to Identifying International Foodborne Norovirus Outbreaks

    PubMed Central

    Kouyos, Roger D.; Vennema, Harry; Kroneman, Annelies; Siebenga, Joukje; van Pelt, Wilfrid; Koopmans, Marion

    2011-01-01

    International foodborne norovirus outbreaks can be difficult to recognize when using standard outbreak investigation methods. In a novel approach, we provide step-wise selection criteria to identify clusters of outbreaks that may involve an internationally distributed common foodborne source. After computerized linking of epidemiologic data to aligned sequences, we retrospectively identified 100 individually reported outbreaks that potentially represented 14 international common source events in Europe during 1999–2008. Analysis of capsid sequences of outbreak strains (n = 1,456) showed that ≈7% of outbreaks reported to the Foodborne Viruses in Europe database were part of an international event (range 2%–9%), compared with 0.4% identified through standard epidemiologic investigations. Our findings point to a critical gap in surveillance and suggest that international collaboration could have increased the number of recognized international foodborne outbreaks. Real-time exchange of combined epidemiologic and molecular data is needed to validate our findings through timely trace-backs of clustered outbreaks. PMID:21392431

  9. Diagnosis of cardiovascular abnormalities from compressed ECG: a data mining-based approach.

    PubMed

    Sufi, Fahim; Khalil, Ibrahim

    2011-01-01

    Usage of compressed ECG for fast and efficient telecardiology application is crucial, as ECG signals are enormously large in size. However, conventional ECG diagnosis algorithms require the compressed ECG packets to be decompressed before diagnosis can be performed. This added step of decompression before performing diagnosis for every ECG packet introduces unnecessary delay, which is undesirable for cardiovascular diseased (CVD) patients. In this paper, we demonstrate an innovative technique that performs real-time classification of CVD. With the help of this real-time classification of CVD, the emergency personnel or the hospital can automatically be notified via SMS/MMS/e-mail when a life-threatening cardiac abnormality of the CVD affected patient is detected. Our proposed system initially uses data mining techniques, such as attribute selection (i.e., selects only a few features from the compressed ECG) and expectation maximization (EM)-based clustering. These data mining techniques running on a hospital server generate a set of constraints for representing each of the abnormalities. Then, the patient's mobile phone receives this set of constraints and employs a rule-based system that can identify each of the abnormal beats in real time. Our experimentation results on 50 MIT-BIH ECG entries reveal that the proposed approach can successfully detect cardiac abnormalities (e.g., ventricular flutter/fibrillation, premature ventricular contraction, atrial fibrillation, etc.) with 97% accuracy on average. This innovative data mining technique on compressed ECG packets enables faster identification of cardiac abnormality directly from the compressed ECG, helping to build an efficient telecardiology diagnosis system. PMID:21097383

  10. Identifying Subgroups among Hardcore Smokers: a Latent Profile Approach

    PubMed Central

    Bommelé, Jeroen; Kleinjan, Marloes; Schoenmakers, Tim M.; Burk, William J.; van den Eijnden, Regina; van de Mheen, Dike

    2015-01-01

    Introduction: Hardcore smokers are smokers who have little to no intention to quit. Previous research suggests that there are distinct subgroups among hardcore smokers and that these subgroups vary in the perceived pros and cons of smoking and quitting. Identifying these subgroups could help to develop individualized messages for the group of hardcore smokers. In this study we therefore used the perceived pros and cons of smoking and quitting to identify profiles among hardcore smokers. Methods: A sample of 510 hardcore smokers completed an online survey on the perceived pros and cons of smoking and quitting. We used these perceived pros and cons in a latent profile analysis to identify possible subgroups among hardcore smokers. To validate the profiles identified among hardcore smokers, we analysed data from a sample of 338 non-hardcore smokers in a similar way. Results: We found three profiles among hardcore smokers. ‘Receptive’ hardcore smokers (36%) perceived many cons of smoking and many pros of quitting. ‘Ambivalent’ hardcore smokers (59%) were rather undecided towards quitting. ‘Resistant’ hardcore smokers (5%) saw few cons of smoking and few pros of quitting. Among non-hardcore smokers, we found similar groups of ‘receptive’ smokers (30%) and ‘ambivalent’ smokers (54%). However, a third group consisted of ‘disengaged’ smokers (16%), who saw few pros and cons of both smoking and quitting. Discussion: Among hardcore smokers, we found three distinct profiles based on perceived pros and cons of smoking. This indicates that hardcore smokers are not a homogeneous group. Each profile might require a different tobacco control approach. Our findings may help to develop individualized tobacco control messages for the particularly hard-to-reach group of hardcore smokers. PMID:26207829

  11. A Visualization System Using Data Mining Techniques for Identifying Information Sources.

    ERIC Educational Resources Information Center

    Fowler, Richard H.; Karadayi, Tarkan; Chen, Zhixiang; Meng, Xiannong; Fowler, Wendy A. Lawrence

    The Visual Analysis System (VAS) was developed to couple emerging successes in data mining with information visualization techniques in order to create a richly interactive environment for information retrieval from the World Wide Web. VAS's retrieval strategy operates by first using a conventional search engine to form a core set of retrieved…

  12. Novel approaches to identify protein adducts produced by lipid peroxidation.

    PubMed

    Codreanu, S G; Liebler, D C

    2015-01-01

    Lipid peroxidation is responsible for the generation of chemically reactive, diffusible lipid-derived electrophiles (LDEs) that covalently modify cellular protein targets. These protein modifications modulate protein activity and macromolecular interactions and induce adaptive and toxic cell signaling. Protein modifications induced by LDEs can be identified and quantified by affinity enrichment and liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based techniques. Tagged LDE analog probes with different electrophilic groups can be covalently captured by click chemistry for LC-MS/MS analyses, thereby enabling in-depth studies of proteome damage at the protein and peptide sequence levels. Conversely, click-reactive, thiol-directed probes can be used to evaluate thiol damage caused by LDEs by difference. These analytical approaches permit systematic study of the dynamics of protein damage caused by LDEs and of the mechanisms by which oxidative stress contributes to toxicity and diseases. PMID:25819163

  13. A metabolomics approach to characterise and identify various Mycobacterium species.

    PubMed

    Olivier, Ilse; Loots, Du Toit

    2012-03-01

    We investigated the potential use of gas chromatography mass spectrometry (GC-MS), in combination with multivariate statistical data processing, to build a model for the classification of various tuberculosis (TB) causing, and non-TB Mycobacterium species, on the basis of their characteristic metabolite profiles. A modified Bligh-Dyer extraction procedure was used to extract lipid components from Mycobacterium tuberculosis, Mycobacterium avium, Mycobacterium bovis, and Mycobacterium kansasii cultures. Principal component analyses (PCA) of the GC-MS generated data showed a clear differentiation between all the Mycobacterium species tested. Subsequently, the 12 compounds best describing the variation between the sample groups were identified as potential metabolite markers, using PCA and partial least-squares discriminant analysis (PLS-DA). These metabolite markers were then used to build a discriminant classification model based on Bayes' theorem, in conjunction with multivariate kernel density estimation. This model subsequently correctly classified 2 "unknown" samples for each of the Mycobacterium species analysed, with probabilities ranging from 72 to 100%. Furthermore, Mycobacterium species classification could be achieved in less than 16 h, and the detection limit for this approach was 1×10³ bacteria mL⁻¹. This study proves the capacity of a GC-MS, metabolomics pattern recognition approach for its possible use in TB diagnostics and disease characterisation. PMID:22301369

  14. Using Bioinformatic Approaches to Identify Pathways Targeted by Human Leukemogens

    PubMed Central

    Thomas, Reuben; Phuong, Jimmy; McHale, Cliona M.; Zhang, Luoping

    2012-01-01

    We have applied bioinformatic approaches to identify pathways common to chemical leukemogens and to determine whether leukemogens could be distinguished from non-leukemogenic carcinogens. From all known and probable carcinogens classified by IARC and NTP, we identified 35 carcinogens that were associated with leukemia risk in human studies and 16 non-leukemogenic carcinogens. Using data on gene/protein targets available in the Comparative Toxicogenomics Database (CTD) for 29 of the leukemogens and 11 of the non-leukemogenic carcinogens, we analyzed for enrichment of all 250 human biochemical pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The top pathways targeted by the leukemogens included metabolism of xenobiotics by cytochrome P450, glutathione metabolism, neurotrophin signaling pathway, apoptosis, MAPK signaling, Toll-like receptor signaling and various cancer pathways. The 29 leukemogens formed 18 distinct clusters comprising 1 to 3 chemicals that did not correlate with known mechanism of action or with structural similarity as determined by 2D Tanimoto coefficients in the PubChem database. Unsupervised clustering and one-class support vector machines, based on the pathway data, were unable to distinguish the 29 leukemogens from 11 non-leukemogenic known and probable IARC carcinogens. However, using two-class random forests to estimate leukemogen and non-leukemogen patterns, we estimated a 76% chance of distinguishing a random leukemogen/non-leukemogen pair from each other. PMID:22851955
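
    The "76% chance of distinguishing a random leukemogen/non-leukemogen pair" reads like the pairwise interpretation of the area under the ROC curve (AUC). The abstract does not say how the figure was computed, so the following is only a minimal sketch of that interpretation. The function name and the scores are hypothetical and are not data from the study.

```python
from itertools import product


def pairwise_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs in which the positive item
    receives the higher score; ties count as half a correct ranking."""
    pairs = list(product(pos_scores, neg_scores))
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
    return wins / len(pairs)


# Hypothetical classifier scores (illustrative only, not from the paper).
leukemogen_scores = [0.9, 0.8, 0.6, 0.4]
non_leukemogen_scores = [0.7, 0.5, 0.3]

auc = pairwise_auc(leukemogen_scores, non_leukemogen_scores)
print(f"Chance of ranking a random pair correctly: {auc:.2f}")
```

    A value of 0.5 would correspond to random guessing and 1.0 to perfect separation of the two groups.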

  15. Data-mining the FlyAtlas online resource to identify core functional motifs across transporting epithelia

    PubMed Central

    2013-01-01

    Background Comparative analysis of tissue-specific transcriptomes is a powerful technique to uncover tissue functions. Our FlyAtlas.org provides authoritative gene expression levels for multiple tissues of Drosophila melanogaster (1). Although the main use of such resources is single gene lookup, there is the potential for powerful meta-analysis to address questions that could not easily be framed otherwise. Here, we illustrate the power of data-mining of FlyAtlas data by comparing epithelial transcriptomes to identify a core set of highly-expressed genes, across the four major epithelial tissues (salivary glands, Malpighian tubules, midgut and hindgut) of both adults and larvae. Method Parallel hypothesis-led and hypothesis-free approaches were adopted to identify core genes that underpin insect epithelial function. In the former, gene lists were created from transport processes identified in the literature, and their expression profiles mapped from the flyatlas.org online dataset. In the latter, gene enrichment lists were prepared for each epithelium, and genes (both transport related and unrelated) consistently enriched in transporting epithelia identified. Results A key set of transport genes, comprising V-ATPases, cation exchangers, aquaporins, potassium and chloride channels, and carbonic anhydrase, was found to be highly enriched across the epithelial tissues, compared with the whole fly. Additionally, a further set of genes that had not been predicted to have epithelial roles, were co-expressed with the core transporters, extending our view of what makes a transporting epithelium work. Further insights were obtained by studying the genes uniquely overexpressed in each epithelium; for example, the salivary gland expresses lipases, the midgut organic solute transporters, the tubules specialize for purine metabolism and the hindgut overexpresses still unknown genes. 
    Conclusion Taken together, these data provide a unique insight into epithelial function in this

  16. Data mining approaches for information retrieval from genomic databases

    NASA Astrophysics Data System (ADS)

    Liu, Donglin; Singh, Gautam B.

    2000-04-01

    Sequence retrieval in genomic databases is used for finding sequences related to a query sequence specified by a user. Comparison is the main part of the retrieval system in genomic databases. An efficient sequence comparison algorithm is critical in bioinformatics. There are several different algorithms to perform sequence comparison, such as the suffix array based database search, divergence measurement, methods that rely upon the existence of a local similarity between the query sequence and sequences in the database, or common mutual information between query and sequences in DB. In this paper we have described a new method for DNA sequence retrieval based on data mining techniques. Data mining tools generally find patterns among data and have been successfully applied in industries to improve marketing, sales, and customer support operations. We have applied the descriptive data mining techniques to find relevant patterns that are significant for comparing genetic sequences. Relevance feedback score based on common patterns is developed and employed to compute distance between sequences. The contigs of human chromosomes are used to test the retrieval accuracy and the experimental results are presented.

  17. An Approach to Realizing Process Control for Underground Mining Operations of Mobile Machines

    PubMed Central

    Song, Zhen; Schunnesson, Håkan; Rinne, Mikael; Sturgul, John

    2015-01-01

    The excavation and production in underground mines are complicated processes which consist of many different operations. The process of underground mining is considerably constrained by the geometry and geology of the mine. The various mining operations are normally performed in series at each working face. The delay of a single operation will lead to a domino effect, delaying the starting time of the next operation and the completion time of the entire process. This paper presents a new approach to the process control for underground mining operations, e.g. drilling, bolting, mucking. This approach can estimate the working time and its probability for each operation more efficiently and objectively by improving the existing PERT (Program Evaluation and Review Technique) and CPM (Critical Path Method). If the delay of the critical operation (which is on a critical path) inevitably affects the productivity of mined ore, the approach can rapidly assign mucking machines new jobs to increase this amount to a maximum level by using a new mucking algorithm under external constraints. PMID:26062092
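
    The abstract does not describe how PERT and CPM were improved, so the sketch below shows only the classic PERT three-point estimate for operations performed in series, as a point of reference. The operation durations (in hours) and all names are hypothetical.

```python
import math


def pert_estimate(optimistic, most_likely, pessimistic):
    """Classic PERT expected duration and standard deviation."""
    expected = (optimistic + 4 * most_likely + pessimistic) / 6
    std_dev = (pessimistic - optimistic) / 6
    return expected, std_dev


# Hypothetical (optimistic, most likely, pessimistic) durations in hours.
operations = {
    "drilling": (2.0, 3.0, 4.0),
    "bolting": (1.0, 2.0, 3.0),
    "mucking": (3.0, 4.0, 11.0),
}

estimates = {name: pert_estimate(*times) for name, times in operations.items()}

# For operations in series, expected durations add; variances add as well
# if the operations are assumed to be independent.
total_expected = sum(expected for expected, _ in estimates.values())
total_std = math.sqrt(sum(std ** 2 for _, std in estimates.values()))

print(f"Expected cycle time: {total_expected:.1f} h, std dev: {total_std:.2f} h")
```

    Because the operations run in series, every one of them lies on the critical path, which is why a delay in any single operation propagates to the completion time.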

  18. A Feature Mining Based Approach for the Classification of Text Documents into Disjoint Classes.

    ERIC Educational Resources Information Center

    Nieto Sanchez, Salvador; Triantaphyllou, Evangelos; Kraft, Donald

    2002-01-01

    Proposes a new approach for classifying text documents into two disjoint classes. Highlights include a brief overview of document clustering; a data mining approach called the One Clause at a Time (OCAT) algorithm which is based on mathematical logic; vector space model (VSM); and comparing the OCAT to the VSM. (Author/LRW)

  19. Dual-band, infrared buried mine detection using a statistical pattern recognition approach

    SciTech Connect

    Buhl, M.R.; Hernandez, J.E.; Clark, G.A.; Sengupta, S.K.

    1993-08-01

    The main objective of this work was to detect surrogate land mines, which were buried in clay and sand, using dual-band, infrared images. A statistical pattern recognition approach was used to achieve this objective. This approach is discussed and results of applying it to real images are given.

  20. THE FUTURE OF COMPUTER-BASED TOXICITY PREDICTION: MECHANISM-BASED MODELS VS. INFORMATION MINING APPROACHES

    EPA Science Inventory


    When we speak of computer-based toxicity prediction, we are generally referring to a broad array of approaches which rely primarily upon chemical structure ...

  1. A review of approaches to identifying patient phenotype cohorts using electronic health records

    PubMed Central

    Shivade, Chaitanya; Raghavan, Preethi; Fosler-Lussier, Eric; Embi, Peter J; Elhadad, Noemie; Johnson, Stephen B; Lai, Albert M

    2014-01-01

    Objective To summarize literature describing approaches aimed at automatically identifying patients with a common phenotype. Materials and methods We performed a review of studies describing systems or reporting techniques developed for identifying cohorts of patients with specific phenotypes. Every full text article published in (1) Journal of American Medical Informatics Association, (2) Journal of Biomedical Informatics, (3) Proceedings of the Annual American Medical Informatics Association Symposium, and (4) Proceedings of Clinical Research Informatics Conference within the past 3 years was assessed for inclusion in the review. Only articles using automated techniques were included. Results Ninety-seven articles met our inclusion criteria. Forty-six used natural language processing (NLP)-based techniques, 24 described rule-based systems, 41 used statistical analyses, data mining, or machine learning techniques, while 22 described hybrid systems. Nine articles described the architecture of large-scale systems developed for determining cohort eligibility of patients. Discussion We observe that there is a rise in the number of studies associated with cohort identification using electronic medical records. Statistical analyses or machine learning, followed by NLP techniques, are gaining popularity over the years in comparison with rule-based systems. Conclusions There are a variety of approaches for classifying patients into a particular phenotype. Different techniques and data sources are used, and good performance is reported on datasets at respective institutions. However, no system makes comprehensive use of electronic medical records addressing all of their known weaknesses. PMID:24201027

  2. Mining 3D genome structure populations identifies major factors governing the stability of regulatory communities

    PubMed Central

    Dai, Chao; Li, Wenyuan; Tjong, Harianto; Hao, Shengli; Zhou, Yonggang; Li, Qingjiao; Chen, Lin; Zhu, Bing; Alber, Frank; Jasmine Zhou, Xianghong

    2016-01-01

    Three-dimensional (3D) genome structures vary from cell to cell even in an isogenic sample. Unlike protein structures, genome structures are highly plastic, posing a significant challenge for structure-function mapping. Here we report an approach to comprehensively identify 3D chromatin clusters that each occurs frequently across a population of genome structures, either deconvoluted from ensemble-averaged Hi-C data or from a collection of single-cell Hi-C data. Applying our method to a population of genome structures (at the macrodomain resolution) of lymphoblastoid cells, we identify an atlas of stable inter-chromosomal chromatin clusters. A large number of these clusters are enriched in binding of specific regulatory factors and are therefore defined as ‘Regulatory Communities.' We reveal two major factors, centromere clustering and transcription factor binding, which significantly stabilize such communities. Finally, we show that the regulatory communities differ substantially from cell to cell, indicating that expression variability could be impacted by genome structures. PMID:27240697

  3. Social Network Analysis and Mining to Monitor and Identify Problems with Large-Scale Information and Communication Technology Interventions

    PubMed Central

    da Silva, Aleksandra do Socorro; de Brito, Silvana Rossy; Vijaykumar, Nandamudi Lankalapalli; da Rocha, Cláudio Alex Jorge; Monteiro, Maurílio de Abreu; Costa, João Crisóstomo Weyl Albuquerque; Francês, Carlos Renato Lisboa

    2016-01-01

    The published literature reveals several arguments concerning the strategic importance of information and communication technology (ICT) interventions for developing countries where the digital divide is a challenge. Large-scale ICT interventions can be an option for countries whose regions, both urban and rural, present a high number of digitally excluded people. Our goal was to monitor and identify problems in interventions aimed at certification for a large number of participants in different geographical regions. Our case study is the training at the Telecentros.BR, a program created in Brazil to install telecenters and certify individuals to use ICT resources. We propose an approach that applies social network analysis and mining techniques to data collected from Telecentros.BR dataset and from the socioeconomics and telecommunications infrastructure indicators of the participants’ municipalities. We found that (i) the analysis of interactions in different time periods reflects the objectives of each phase of training, highlighting the increased density in the phase in which participants develop and disseminate their projects; (ii) analysis according to the roles of participants (i.e., tutors or community members) reveals that the interactions were influenced by the center (or region) to which the participant belongs (that is, a community contained mainly members of the same region and always with the presence of tutors, contradicting expectations of the training project, which aimed for intense collaboration of the participants, regardless of the geographic region); (iii) the social network of participants influences the success of the training: that is, given evidence that the degree of the community member is in the highest range, the probability of this individual concluding the training is 0.689; (iv) the North region presented the lowest probability of participant certification, whereas the Northeast, which served municipalities with similar

  4. Social Network Analysis and Mining to Monitor and Identify Problems with Large-Scale Information and Communication Technology Interventions.

    PubMed

    da Silva, Aleksandra do Socorro; de Brito, Silvana Rossy; Vijaykumar, Nandamudi Lankalapalli; da Rocha, Cláudio Alex Jorge; Monteiro, Maurílio de Abreu; Costa, João Crisóstomo Weyl Albuquerque; Francês, Carlos Renato Lisboa

    2016-01-01

    The published literature reveals several arguments concerning the strategic importance of information and communication technology (ICT) interventions for developing countries where the digital divide is a challenge. Large-scale ICT interventions can be an option for countries whose regions, both urban and rural, present a high number of digitally excluded people. Our goal was to monitor and identify problems in interventions aimed at certification for a large number of participants in different geographical regions. Our case study is the training at the Telecentros.BR, a program created in Brazil to install telecenters and certify individuals to use ICT resources. We propose an approach that applies social network analysis and mining techniques to data collected from Telecentros.BR dataset and from the socioeconomics and telecommunications infrastructure indicators of the participants' municipalities. We found that (i) the analysis of interactions in different time periods reflects the objectives of each phase of training, highlighting the increased density in the phase in which participants develop and disseminate their projects; (ii) analysis according to the roles of participants (i.e., tutors or community members) reveals that the interactions were influenced by the center (or region) to which the participant belongs (that is, a community contained mainly members of the same region and always with the presence of tutors, contradicting expectations of the training project, which aimed for intense collaboration of the participants, regardless of the geographic region); (iii) the social network of participants influences the success of the training: that is, given evidence that the degree of the community member is in the highest range, the probability of this individual concluding the training is 0.689; (iv) the North region presented the lowest probability of participant certification, whereas the Northeast, which served municipalities with similar

  5. Assessing Weather-Yield Relationships in Rice at Local Scale Using Data Mining Approaches

    PubMed Central

    Delerce, Sylvain; Dorado, Hugo; Grillon, Alexandre; Rebolledo, Maria Camila; Prager, Steven D.; Patiño, Victor Hugo; Garcés Varón, Gabriel; Jiménez, Daniel

    2016-01-01

    Seasonal and inter-annual climate variability have become important issues for farmers, and climate change has been shown to increase them. Simultaneously farmers and agricultural organizations are increasingly collecting observational data about in situ crop performance. Agriculture thus needs new tools to cope with changing environmental conditions and to take advantage of these data. Data mining techniques make it possible to extract embedded knowledge associated with farmer experiences from these large observational datasets in order to identify best practices for adapting to climate variability. We introduce new approaches through a case study on irrigated and rainfed rice in Colombia. Preexisting observational datasets of commercial harvest records were combined with in situ daily weather series. Using Conditional Inference Forest and clustering techniques, we assessed the relationships between climatic factors and crop yield variability at the local scale for specific cultivars and growth stages. The analysis showed clear relationships in the various location-cultivar combinations, with climatic factors explaining 6 to 46% of spatiotemporal variability in yield, and with crop responses to weather being non-linear and cultivar-specific. Climatic factors affected cultivars differently during each stage of development. For instance, one cultivar was affected by high nighttime temperatures in the reproductive stage but responded positively to accumulated solar radiation during the ripening stage. Another was affected by high nighttime temperatures during both the vegetative and reproductive stages. Clustering of the weather patterns corresponding to individual cropping events revealed different groups of weather patterns for irrigated and rainfed systems with contrasting yield levels. Best-suited cultivars were identified for some weather patterns, making weather-site-specific recommendations possible. This study illustrates the potential of data mining for

  6. A text mining approach to the prediction of disease status from clinical discharge summaries.

    PubMed

    Yang, Hui; Spasic, Irena; Keane, John A; Nenadic, Goran

    2009-01-01

    OBJECTIVE The authors present a system developed for the Challenge in Natural Language Processing for Clinical Data (the i2b2 obesity challenge), whose aim was to automatically identify the status of obesity and 15 related co-morbidities in patients using their clinical discharge summaries. The challenge consisted of two tasks, textual and intuitive. The textual task was to identify explicit references to the diseases, whereas the intuitive task focused on the prediction of the disease status when the evidence was not explicitly asserted. DESIGN The authors assembled a set of resources to lexically and semantically profile the diseases and their associated symptoms, treatments, etc. These features were explored in a hybrid text mining approach, which combined dictionary look-up, rule-based, and machine-learning methods. MEASUREMENTS The methods were applied on a set of 507 previously unseen discharge summaries, and the predictions were evaluated against a manually prepared gold standard. The overall ranking of the participating teams was primarily based on the macro-averaged F-measure. RESULTS The implemented method achieved the macro-averaged F-measure of 81% for the textual task (which was the highest achieved in the challenge) and 63% for the intuitive task (ranked 7th out of 28 teams; the highest was 66%). The micro-averaged F-measure showed an average accuracy of 97% for textual and 96% for intuitive annotations. CONCLUSIONS The performance achieved was in line with the agreement between human annotators, indicating the potential of text mining for accurate and efficient prediction of disease statuses from clinical discharge summaries. PMID:19390098
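
    The gap between the macro-averaged and micro-averaged F-measures reported above is typical when some classes are rare. The following minimal sketch shows how the two averages are computed. The class labels and the counts are hypothetical and are not taken from the challenge data.

```python
def f1(tp, fp, fn):
    """F1 score from true positives, false positives and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical per-class counts as (tp, fp, fn); the last class is rare.
counts = {
    "present": (90, 5, 5),
    "absent": (880, 10, 10),
    "questionable": (1, 3, 4),
}

# Macro average: compute F1 per class, then take the unweighted mean,
# so a rare, poorly predicted class pulls the score down strongly.
macro_f1 = sum(f1(*c) for c in counts.values()) / len(counts)

# Micro average: pool the counts over all classes, then compute one F1,
# so frequent classes dominate the score.
tp, fp, fn = (sum(column) for column in zip(*counts.values()))
micro_f1 = f1(tp, fp, fn)

print(f"macro F1 = {macro_f1:.3f}, micro F1 = {micro_f1:.3f}")
```

    In this toy case the rare class lowers the macro average well below the micro average, the same direction of difference as in the reported results.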

  7. Assessing Weather-Yield Relationships in Rice at Local Scale Using Data Mining Approaches.

    PubMed

    Delerce, Sylvain; Dorado, Hugo; Grillon, Alexandre; Rebolledo, Maria Camila; Prager, Steven D; Patiño, Victor Hugo; Garcés Varón, Gabriel; Jiménez, Daniel

    2016-01-01

    Seasonal and inter-annual climate variability have become important issues for farmers, and climate change has been shown to increase them. Simultaneously farmers and agricultural organizations are increasingly collecting observational data about in situ crop performance. Agriculture thus needs new tools to cope with changing environmental conditions and to take advantage of these data. Data mining techniques make it possible to extract embedded knowledge associated with farmer experiences from these large observational datasets in order to identify best practices for adapting to climate variability. We introduce new approaches through a case study on irrigated and rainfed rice in Colombia. Preexisting observational datasets of commercial harvest records were combined with in situ daily weather series. Using Conditional Inference Forest and clustering techniques, we assessed the relationships between climatic factors and crop yield variability at the local scale for specific cultivars and growth stages. The analysis showed clear relationships in the various location-cultivar combinations, with climatic factors explaining 6 to 46% of spatiotemporal variability in yield, and with crop responses to weather being non-linear and cultivar-specific. Climatic factors affected cultivars differently during each stage of development. For instance, one cultivar was affected by high nighttime temperatures in the reproductive stage but responded positively to accumulated solar radiation during the ripening stage. Another was affected by high nighttime temperatures during both the vegetative and reproductive stages. Clustering of the weather patterns corresponding to individual cropping events revealed different groups of weather patterns for irrigated and rainfed systems with contrasting yield levels. Best-suited cultivars were identified for some weather patterns, making weather-site-specific recommendations possible. This study illustrates the potential of data mining for

  8. A quantitative approach to identifying predators from nest remains

    USGS Publications Warehouse

    Anthony, R.M.; Grand, J.B.; Fondell, T.F.; Manly, B.F.

    2004-01-01

    Nesting success of Dusky Canada Geese (Branta canadensis occidentalis) has declined greatly since a major earthquake affected southern Alaska in 1964. To identify nest predators, we collected predation data at goose nests and photographs of predators at natural nests containing artificial eggs in 1997-2000. To document feeding behavior by nest predators, we compiled the evidence from destroyed nests with known predators on our study site and from previous studies. We constructed a profile for each predator group and compared the evidence from 895 nests with unknown predators to our predator profiles using mixture-model analysis. This analysis indicated that 72% of destroyed nests were depredated by Bald Eagles and 13% by brown bears, and also yielded the probability that each nest was correctly assigned to a predator group based on model fit. Model testing using simulations indicated that the proportion estimated for eagle predation was unbiased and the proportion for bear predation was slightly overestimated. This approach may have application whenever there are adequate data on nests destroyed by known predators and predators exhibit different feeding behavior at nests.

  9. A geometric approach to identify cavities in particle systems

    NASA Astrophysics Data System (ADS)

    Voyiatzis, Evangelos; Böhm, Michael C.; Müller-Plathe, Florian

    2015-11-01

    The implementation of a geometric algorithm to identify cavities in particle systems in an open-source python program is presented. The algorithm makes use of the Delaunay space tessellation. The present python software is based on platform-independent tools, leading to a portable program. Its successful execution provides information concerning the accessible volume fraction of the system, the size and shape of the cavities and the group of atoms forming each of them. The program can be easily incorporated into the LAMMPS software. An advantage of the present algorithm is that no a priori assumption on the cavity shape has to be made. As an example, the cavity size and shape distributions in a polyethylene melt system are presented for three spherical probe particles. This paper serves also as an introductory manual to the script. It summarizes the algorithm, its implementation, the required user-defined parameters as well as the format of the input and output files. Additionally, we demonstrate possible applications of our approach and compare its capability with the ones of well documented cavity size estimators.

  10. Identifying predictors of physics item difficulty: A linear regression approach

    NASA Astrophysics Data System (ADS)

    Mesic, Vanes; Muratovic, Hasnija

    2011-06-01

    Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinarily difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model the item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained, quantitative test results. In this study, we conducted a secondary analysis of data that came from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. Foremost, we explored the concept of “physics competence” and performed a content analysis of 123 physics items that were included within the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables which reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure had to be implemented. Finally, a regression model of physics item difficulty was created. It has been shown that 61.2% of item difficulty variance can be explained by factors which reflect the automaticity, complexity, and modality of the knowledge structure that is relevant for generating the most probable correct solution, as well as by the divergence of required thinking and interference effects between intuitive and formal physics knowledge

  11. A fluorescent approach for identifying P2X1 ligands

    PubMed Central

    Ruepp, Marc-David; Brozik, James A.; de Esch, Iwan J.P.; Farndale, Richard W.; Murrell-Lagnado, Ruth D.; Thompson, Andrew J.

    2015-01-01

    There are no commercially available, small, receptor-specific P2X1 ligands. There are several synthetic derivatives of the natural agonist ATP and some structurally-complex antagonists including compounds such as PPADS, NTP-ATP, suramin and its derivatives (e.g. NF279, NF449). NF449 is the most potent and selective ligand, but potencies of many others are not particularly high and they can also act at other P2X, P2Y and non-purinergic receptors. While there is clearly scope for further work on P2X1 receptor pharmacology, screening can be difficult owing to rapid receptor desensitisation. To reduce desensitisation, substitutions can be made within the N-terminus of the P2X1 receptor, but these could also affect ligand properties. An alternative is the use of fluorescent voltage-sensitive dyes that respond to membrane potential changes resulting from channel opening. Here we utilised this approach in conjunction with fragment-based drug-discovery. Using a single concentration (300 μM) we identified 46 novel leads from a library of 1443 fragments (hit rate = 3.2%). These hits were independently validated by measuring concentration-dependence with the same voltage-sensitive dye, and by visualising the competition of hits with an Alexa-647-ATP fluorophore using confocal microscopy; confocal yielded kon (1.142 × 10⁶ M⁻¹ s⁻¹) and koff (0.136 s⁻¹) for Alexa-647-ATP (Kd = 119 nM). The identified hit fragments had promising structural diversity. In summary, the measurement of functional responses using voltage-sensitive dyes was flexible and cost-effective because labelled competitors were not needed, effects were independent of a specific binding site, and both agonist and antagonist actions were probed in a single assay. The method is widely applicable and could be applied to all P2X family members, as well as other voltage-gated and ligand-gated ion channels. This article is part of the Special Issue entitled ‘Fluorescent Tools in Neuropharmacology
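
    The reported dissociation constant and hit rate follow from the other figures in the abstract, assuming the standard relation Kd = koff / kon for a simple one-step binding model. A short check of the arithmetic, with variable names chosen here for illustration:

```python
# Rate constants for Alexa-647-ATP as reported in the abstract.
k_on = 1.142e6   # association rate constant, M^-1 s^-1
k_off = 0.136    # dissociation rate constant, s^-1

# Equilibrium dissociation constant for one-step binding, converted to nM.
kd_nM = k_off / k_on * 1e9

# Screening hit rate: leads identified divided by fragments screened.
hits = 46
library_size = 1443
hit_rate_pct = hits / library_size * 100

print(f"Kd = {kd_nM:.0f} nM, hit rate = {hit_rate_pct:.1f}%")
```

    Both values agree with the figures quoted in the abstract (119 nM and 3.2%).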

  12. An Efficient Leave One Block Out approach to identify outliers

    NASA Astrophysics Data System (ADS)

    Biagi, Ludovico; Caldera, Stefano

    2013-03-01

    In Least Squares (LS), the linearized functional model between M observables and N unknown parameters is given. LS provides estimates of parameters, observables, residuals and a posteriori variance. To identify outliers and to estimate accuracies and reliabilities, tests on the model and on the individual residuals can be performed at different levels of significance and power. However, LS is not robust: one outlier could be spread into all the residuals and its identification is difficult. A possible solution to this problem is given by a Leave One Block Out approach. Suppose that the observation vector can be decomposed into m sub-vectors (blocks) that are reciprocally uncorrelated: in the case of completely uncorrelated observations, m = M. A suspected block is excluded from the adjustment, whose results are used to check it. Clearly, the check is more robust, because one outlier in the excluded block does not affect the adjustment results. The process can be repeated on all the blocks, but can be very slow, because m adjustments must be computed. To efficiently apply Leave One Block Out, an algorithm has been studied. The usual LS adjustment is performed on all the observations to obtain the 'batch' results. The contribution of each block is subtracted from the batch results by algebraic decompositions, with a minimal computational effort: this holds for parameters, a posteriori residuals and variance. Therefore all the blocks can be checked. In the paper, the algorithm is discussed. Two examples of ELOBO application are presented: the first demonstrates ELOBO's reliability against classical LS tests. In the second, ELOBO's numerical efficiency is analyzed.
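    The idea of recovering leave-out results from a single batch adjustment can be illustrated in the simplest case, where every block is one uncorrelated, equally weighted observation (m = M). The sketch below is not the ELOBO algorithm from the paper; it uses the standard leverage identity for a straight-line fit, under which the leave-one-out residual equals the batch residual divided by (1 - h_ii). All function names and the data are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_mean - b * x_mean, b


def loo_residuals_fast(xs, ys):
    """Leave-one-out residuals from a single batch fit: e_i / (1 - h_ii)."""
    n = len(xs)
    a, b = fit_line(xs, ys)
    x_mean = sum(xs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    out = []
    for x, y in zip(xs, ys):
        leverage = 1.0 / n + (x - x_mean) ** 2 / sxx  # diagonal of the hat matrix
        out.append((y - (a + b * x)) / (1.0 - leverage))
    return out


def loo_residuals_slow(xs, ys):
    """Reference: refit without each observation, then predict it."""
    out = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        out.append(ys[i] - (a + b * xs[i]))
    return out


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 1.9, 4.1, 6.0, 12.0, 10.1]  # the fifth value is a planted outlier
fast = loo_residuals_fast(xs, ys)
slow = loo_residuals_slow(xs, ys)

# Index of the observation with the largest leave-one-out residual.
suspect = max(range(len(xs)), key=lambda i: abs(fast[i]))
print(suspect)
```

    The fast path needs one adjustment instead of m, which is the efficiency argument made in the abstract; the slow path is kept only to confirm that both give the same residuals.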

  13. Identifying Heterogeneous Anisotropic Properties in Cerebral Aneurysms: A Pointwise Approach

    PubMed Central

    Zhao, Xuefeng; Raghavan, Madhavan L.; Lu, Jia

    2014-01-01

    The traditional approaches of estimating heterogeneous properties in a soft tissue structure using optimization based inverse methods often face difficulties because of the large number of unknowns to be simultaneously determined. This article proposes a new method for identifying the heterogeneous anisotropic nonlinear elastic properties in cerebral aneurysms. In this method, the local properties are determined directly from the pointwise stress-strain data, thus avoiding the need for simultaneously optimizing for the property values at all points/regions in the aneurysm. The stress distributions needed for a pointwise identification are computed using an inverse elastostatic method without invoking the material properties in question. This paradigm is tested numerically through simulated inflation tests on an image-based cerebral aneurysm sac. The wall tissue is modeled as an eight-ply laminate whose constitutive behavior is described by an anisotropic hyperelastic strain-energy function containing four parameters. The parameters are assumed to vary continuously in the sac. Deformed configurations generated from forward finite element analysis are taken as input to inversely establish the parameter distributions. The delineated and the assigned distributions are in excellent agreement. A forward verification is conducted by comparing the displacement solutions obtained from the delineated and the assigned material parameters at a different pressure. The deviations in nodal displacements are found to be within 0.2% in most parts of the sac. The study highlights some distinct features of the proposed method, and demonstrates the feasibility of organ level identification of the distributive anisotropic nonlinear properties in cerebral aneurysms. PMID:20490886

  14. Approaches to identifying synthetic lethal interactions in cancer.

    PubMed

    Thompson, Jordan M; Nguyen, Quy H; Singh, Manpreet; Razorenova, Olga V

    2015-06-01

    Targeting synthetic lethal interactions is a promising new therapeutic approach to exploit specific changes that occur within cancer cells. Multiple approaches to investigate these interactions have been developed and successfully implemented, including chemical, siRNA, shRNA, and CRISPR library screens. Genome-wide computational approaches, such as DAISY, also have been successful in predicting synthetic lethal interactions from both cancer cell lines and patient samples. Each approach has its advantages and disadvantages that need to be considered depending on the cancer type and its molecular alterations. This review discusses these approaches and examines case studies that highlight their use. PMID:26029013

  15. Approaches to Identifying Synthetic Lethal Interactions in Cancer

    PubMed Central

    Thompson, Jordan M.; Nguyen, Quy H.; Singh, Manpreet; Razorenova, Olga V.

    2015-01-01

    Targeting synthetic lethal interactions is a promising new therapeutic approach to exploit specific changes that occur within cancer cells. Multiple approaches to investigate these interactions have been developed and successfully implemented, including chemical, siRNA, shRNA, and CRISPR library screens. Genome-wide computational approaches, such as DAISY, also have been successful in predicting synthetic lethal interactions from both cancer cell lines and patient samples. Each approach has its advantages and disadvantages that need to be considered depending on the cancer type and its molecular alterations. This review discusses these approaches and examines case studies that highlight their use. PMID:26029013

  16. A Hybrid Data Mining Approach for Credit Card Usage Behavior Analysis

    NASA Astrophysics Data System (ADS)

    Tsai, Chieh-Yuan

    Credit cards are one of the most popular e-payment methods in current online e-commerce. To retain valuable customers, card issuers invest heavily in maintaining good relationships with them. Although several studies have examined card usage motivation, few have analyzed how credit card usage behavior changes from time period t to t+1. To address this issue, an integrated data mining approach is proposed in this paper. First, customer profiles and their transaction data at time period t are retrieved from databases. Second, a LabelSOM neural network groups customers into segments and identifies critical characteristics for each group. Third, a fuzzy decision tree algorithm is used to construct usage behavior rules for customer groups of interest. Finally, these rules are used to analyze the behavior changes between time periods t and t+1. An implementation case using a real credit card database provided by a commercial bank in Taiwan is presented to show the benefits of the proposed framework.

  17. A Hybrid Approach for Efficient Modeling of Medium-Frequency Propagation in Coal Mines

    PubMed Central

    Brocker, Donovan E.; Sieber, Peter E.; Waynert, Joseph A.; Li, Jingcheng; Werner, Pingjuan L.; Werner, Douglas H.

    2015-01-01

    An efficient procedure for modeling medium frequency (MF) communications in coal mines is introduced. In particular, a hybrid approach is formulated and demonstrated utilizing ideal transmission line equations to model MF propagation in combination with full-wave sections used for accurate simulation of local antenna-line coupling and other near-field effects. This work confirms that the hybrid method accurately models signal propagation from a source to a load for various system geometries and material compositions, while significantly reducing computation time. With such dramatic improvement to solution times, it becomes feasible to perform large-scale optimizations with the primary motivation of improving communications in coal mines both for daily operations and emergency response. Furthermore, it is demonstrated that the hybrid approach is suitable for modeling and optimizing large communication networks in coal mines that may otherwise be intractable to simulate using traditional full-wave techniques such as moment methods or finite-element analysis. PMID:26478686

  18. Cluster Analysis-Based Approaches for Geospatiotemporal Data Mining of Massive Data Sets for Identification of Forest Threats

    SciTech Connect

    Mills, Richard T; Hoffman, Forrest M; Kumar, Jitendra; Hargrove Jr., William Walter

    2011-01-01

    We investigate methods for geospatiotemporal data mining of multi-year land surface phenology data (250 m² Normalized Difference Vegetation Index (NDVI) values derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) in this study) for the conterminous United States (CONUS) as part of an early warning system for detecting threats to forest ecosystems. The approaches explored here are based on k-means cluster analysis of this massive data set, which provides a basis for defining the bounds of the expected or normal phenological patterns that indicate healthy vegetation at a given geographic location. We briefly describe the computational approaches we have used to make cluster analysis of such massive data sets feasible, describe approaches we have explored for distinguishing between normal and abnormal phenology, and present some examples in which we have applied these approaches to identify various forest disturbances in the CONUS.

  19. Meta-control of combustion performance with a data mining approach

    NASA Astrophysics Data System (ADS)

    Song, Zhe

    Large-scale combustion processes are complex and pose challenges for performance optimization. Traditional approaches based on thermodynamics are limited in finding optimal operating regions because of the time-shift nature of the process. Recent advances in information technology enable people to collect large volumes of process data easily and continuously. The collected process data contain rich information about the process and, to some extent, represent a digital copy of the process over time. Although large volumes of data exist in industrial combustion processes, they are not fully utilized to the level where the process can be optimized. Data mining is an emerging science which finds patterns or models from large data sets. It has found many successful applications in the business marketing, medical and manufacturing domains. The focus of this dissertation is on applying data mining to industrial combustion processes, and ultimately optimizing the combustion performance. However, the philosophy, methods and frameworks discussed in this research can also be applied to other industrial processes. Optimizing an industrial combustion process has two major challenges. One is that the underlying process model changes over time, and obtaining an accurate process model is nontrivial. The other is that a process model with high fidelity is usually highly nonlinear, so solving the optimization problem requires efficient heuristics. This dissertation sets out to solve these two major challenges. The major contribution of this 4-year research is a data-driven solution for optimizing the combustion process, in which the process model or knowledge is identified from the process data, and optimization is then executed by evolutionary algorithms that search for optimal operating regions.

  20. North American Bats and Mines Project: A cooperative approach for integrating bat conservation and mine-land reclamation

    SciTech Connect

    Ducummon, S.L.

    1997-12-31

    Inactive underground mines now provide essential habitat for more than half of North America's 44 bat species, including some of the largest remaining populations. Thousands of abandoned mines have already been closed or are slated for safety closures, and many are destroyed during renewed mining in historic districts. The available evidence suggests that millions of bats have already been lost due to these closures. Bats are primary predators of night-flying insects that cost American farmers and foresters billions of dollars annually; therefore, threats to bat survival are cause for serious concern. Fortunately, mine closure methods exist that protect both bats and humans. Bat Conservation International (BCI) and the USDI-Bureau of Land Management founded the North American Bats and Mines Project to provide national leadership and coordination to minimize the loss of mine-roosting bats. This partnership has involved federal and state mine-land and wildlife managers and the mining industry. BCI has trained hundreds of mine-land and wildlife managers nationwide in mine assessment techniques for bats and bat-compatible closure methods, published technical information on bats and mine-land management, presented papers on bats and mines at national mining and wildlife conferences, and collaborated with numerous federal, state, and private partners to protect some of the most important mine-roosting bat populations. Our new mining industry initiative, Mining for Habitat, is designed to develop bat habitat conservation and enhancement plans for active mining operations. It includes the creation of cost-effective artificial underground bat roosts using surplus mining materials such as old mine-truck tires and culverts buried beneath waste rock.

  1. Quantitative risk-based approach for improving water quality management in mining.

    PubMed

    Liu, Wenying; Moran, Chris J; Vink, Sue

    2011-09-01

    The potential environmental threats posed by freshwater withdrawal and mine water discharge are some of the main drivers for the mining industry to improve water management. The use of multiple sources of water supply and introducing water reuse into the mine site water system have been part of the operating philosophies employed by the mining industry to realize these improvements. However, a barrier to implementation of such good water management practices is concomitant water quality variation and the resulting impacts on the efficiency of mineral separation processes, and an increased environmental consequence of noncompliant discharge events. There is an increasing appreciation that conservative water management practices, production efficiency, and environmental consequences are intimately linked through the site water system. It is therefore essential to consider water management decisions and their impacts as an integrated system as opposed to dealing with each impact separately. This paper proposes an approach that could assist mine sites to manage water quality issues in a systematic manner at the system level. This approach can quantitatively forecast the risk related with water quality and evaluate the effectiveness of management strategies in mitigating the risk by quantifying implications for production and hence economic viability. PMID:21797262

  2. Data mining approach to web application intrusions detection

    NASA Astrophysics Data System (ADS)

    Kalicki, Arkadiusz

    2011-10-01

    Web applications have become the most popular medium on the Internet. The popularity and ease of use of web application scripting languages and frameworks, together with careless development, result in a high number of web application vulnerabilities and a high number of attacks. Several types of attack are possible because of improper input validation: SQL injection, cross-site scripting, Cross-Site Request Forgery (CSRF), web spam in blogs and others. To secure web applications, intrusion detection systems (IDS) and intrusion prevention systems (IPS) are used. Intrusion detection systems are divided into two groups: misuse detection (traditional IDS) and anomaly detection. This paper presents a data mining based algorithm for anomaly detection. The principle of this method is the comparison of the incoming HTTP traffic with a previously built profile that contains a representation of the "normal" or expected web application usage sequence patterns. The frequent sequence patterns are found with the GSP algorithm. A previously presented detection method was rewritten and improved. Tests show that the software catches malicious requests, especially long attack sequences; results are quite good for medium-length sequences, while for short sequences the method must be complemented with other methods.
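    The profile-comparison principle described in this abstract can be sketched in a few lines. The example below is a deliberately simplified stand-in and not the paper's method: instead of GSP it counts contiguous request pairs (bigrams), and the session data, function names, and support threshold are all hypothetical.

```python
from collections import Counter


def ngrams(sequence, n):
    """All contiguous length-n subsequences of a request sequence."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


def build_profile(sessions, n=2, min_support=2):
    """Keep n-grams that occur in at least `min_support` training sessions."""
    support = Counter()
    for session in sessions:
        support.update(set(ngrams(session, n)))  # count each session once
    return {gram for gram, count in support.items() if count >= min_support}


def anomaly_score(session, profile, n=2):
    """Fraction of the session's n-grams that are absent from the profile."""
    grams = ngrams(session, n)
    if not grams:
        return 0.0  # too short to judge with sequence patterns alone
    return sum(gram not in profile for gram in grams) / len(grams)


# Hypothetical "normal" sessions used to build the profile.
training = [
    ["GET /", "GET /login", "POST /login", "GET /account"],
    ["GET /", "GET /login", "POST /login", "GET /account", "GET /logout"],
    ["GET /", "GET /products", "GET /login", "POST /login"],
]
profile = build_profile(training)

normal = ["GET /", "GET /login", "POST /login", "GET /account"]
attack = ["GET /", "POST /login", "POST /login", "POST /login", "GET /admin"]
normal_score = anomaly_score(normal, profile)
attack_score = anomaly_score(attack, profile)
print(normal_score, attack_score)
```

    A session shorter than the pattern length produces no n-grams and therefore scores 0.0 here, which mirrors the abstract's remark that short sequences need to be covered by other methods.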

  3. Ultrabroadband photonic Internet: data mining approach to security aspects

    NASA Astrophysics Data System (ADS)

    Kalicki, Arkadiusz

    2009-06-01

    Web applications have become the most popular medium on the Internet. The popularity and ease of use of web application frameworks, together with careless development, result in a high number of vulnerabilities and attacks. Several types of attack are possible because of improper input validation. SQL injection is the ability to execute arbitrary SQL queries in a database through an existing application. Cross-site scripting is a vulnerability that allows malicious web users to inject code into the web pages viewed by other users. Cross-Site Request Forgery (CSRF) is an attack that tricks the victim into loading a page that contains a malicious request. Web spam in blogs is a further example. To secure web applications, intrusion detection systems (IDS) and intrusion prevention systems (IPS) are used. Intrusion detection systems are divided into two groups: misuse detection (traditional IDS) and anomaly detection. Misuse detection systems are signature based and have high accuracy in detecting many kinds of known attacks, but cannot detect unknown and emerging attacks. They can be complemented with anomaly based intrusion detection and prevention systems. This paper presents an anomaly driven proxy as an IPS and a data mining based algorithm which was used to detect anomalies. The principle of this method is the comparison of the incoming HTTP traffic with a previously built profile that contains a representation of the "normal" or expected web application usage sequence patterns. The frequent sequence patterns are found with the GSP algorithm. Some basic tests show that the software catches malicious requests.

  4. EVALUATION OF A TWO-STAGE PASSIVE TREATMENT APPROACH FOR MINING INFLUENCE WATERS

    EPA Science Inventory

    A two-stage passive treatment approach was assessed at bench-scale using two Colorado Mining Influenced Waters (MIWs). The first-stage was a limestone drain with the purpose of removing iron and aluminum and mitigating the potential effects of mineral acidity. The second stage w...

  5. DNA enrichment approaches to identify unauthorized genetically modified organisms (GMOs).

    PubMed

    Arulandhu, Alfred J; van Dijk, Jeroen P; Dobnik, David; Holst-Jensen, Arne; Shi, Jianxin; Zel, Jana; Kok, Esther J

    2016-07-01

    With the increased global production of different genetically modified (GM) plant varieties, chances increase that unauthorized GM organisms (UGMOs) may enter the food chain. At the same time, the detection of UGMOs is a challenging task because of the limited sequence information that will generally be available. PCR-based methods are available to detect and quantify known UGMOs in specific cases. If this approach is not feasible, DNA enrichment of the unknown adjacent sequences of known GMO elements is one way to detect the presence of UGMOs in a food or feed product. These enrichment approaches are also known as chromosome walking or gene walking (GW). In recent years, enrichment approaches have been coupled with next generation sequencing (NGS) analysis and implemented in, amongst others, the medical and microbiological fields. The present review will provide an overview of these approaches and an evaluation of their applicability in the identification of UGMOs in complex food or feed samples. PMID:27086015

  6. Abandoned mined land reclamation on the Wayne National Forest - an interdisciplinary approach

    SciTech Connect

    Moss, R.G.

    1982-12-01

    The Wayne National Forest contains several thousand acres of abandoned surface-mined lands, many of which are in need of reclamation. The Forest Service has developed a systematic interdisciplinary approach to planning and implementing reclamation projects. An environmental assessment report is prepared before the project is designed which provides decision makers the information needed to select a preferred reclamation alternative. A case study known as the Yost II Abandoned Mined Land Reclamation Project is presented. The abandoned mine, basically a double contour configuration, presented designers with a difficult mosaic of barren, toxic areas, well-revegetated areas, and acid ponds. The reclamation technique employed utilized burial of toxic soil, pond underdrains, crushed limestone filter strips, and topsoiling.

  7. Data Mining: A Systems Approach to Formative Assessment

    ERIC Educational Resources Information Center

    Schmid, Dale

    2012-01-01

    This article describes how using raw data and information from reliable assessments can inform teachers' decisions leading to improved instruction. The primary aim is to use a systems approach to provide evidence of what students know and how they demonstrate mastery. Such evidence can empower teachers to reach all students. The pedagogic…

  8. An Approach for Identifying Benefit Segments among Prospective College Students.

    ERIC Educational Resources Information Center

    Miller, Patrick; And Others

    1990-01-01

    A study investigated the importance to 578 applicants of various benefits offered by a moderately selective private university. Applicants rated the institution on 43 academic, social, financial, religious, and curricular attributes. The objective was to test the efficacy of one approach to college market segmentation. Results support the utility…

  9. Identifying Predictors of Physics Item Difficulty: A Linear Regression Approach

    ERIC Educational Resources Information Center

    Mesic, Vanes; Muratovic, Hasnija

    2011-01-01

    Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinary…

  10. Genomic approaches to identifying transcriptional regulators of osteoblast differentiation

    NASA Technical Reports Server (NTRS)

    Stains, Joseph P.; Civitelli, Roberto

    2003-01-01

    Recent microarray studies of mouse and human osteoblast differentiation in vitro have identified novel transcription factors that may be important in the establishment and maintenance of differentiation. These findings help unravel the pattern of gene-expression changes that underlie the complex process of bone formation.

  11. Identifying the "Truly Disadvantaged": A Comprehensive Biosocial Approach

    ERIC Educational Resources Information Center

    Barnes, J. C.; Beaver, Kevin M.; Connolly, Eric J.; Schwartz, Joseph A.

    2016-01-01

    There has been significant interest in examining the developmental factors that predispose individuals to chronic criminal offending. This body of research has identified some social-environmental risk factors as potentially important. At the same time, the research producing these results has generally failed to employ genetically sensitive…

  12. Novel LanT Associated Lantibiotic Clusters Identified by Genome Database Mining

    PubMed Central

    Singh, Mangal; Sareen, Dipti

    2014-01-01

    Background: Frequent use of antibiotics has led to the emergence of antibiotic resistance in bacteria. Lantibiotic compounds are ribosomally synthesized antimicrobial peptides against which bacteria are not able to develop resistance, making them a good alternative to antibiotics. Nisin is the oldest and most widely used lantibiotic in food preservation, and no significant resistance to it has developed. Given their antimicrobial potential and limited number, there is a need to identify novel lantibiotics. Methodology/Findings: Identification of novel lantibiotic biosynthetic clusters from an ever-increasing database of bacterial genomes can provide a major lead in this direction. To achieve this, a strategy was adopted to identify novel lantibiotic biosynthetic clusters by screening the sequenced genomes for LanT homologs; LanT is a conserved lantibiotic transporter specific to type IB clusters. This strategy resulted in the identification of 54 bacterial strains containing LanT homologs that are not known lantibiotic producers. Of these, 24 strains were subjected to a detailed bioinformatic analysis to identify genes encoding precursor peptides, modification enzymes, immunity and quorum sensing proteins. Eight clusters having two LanM determinants, similar to haloduracin and lichenicidin, were identified, along with 13 clusters having a single LanM determinant as in the mersacidin biosynthetic cluster. Besides these, orphan LanT homologs were also identified, which might be associated with novel bacteriocins encoded elsewhere in the genome. Three identified gene clusters had a C39 domain containing LanT transporter associated with the LanBC proteins and double glycine type precursor peptides; the only known example of such a cluster is that of salivaricin. Conclusion: This study led to the identification of 8 novel putative two-component lantibiotic clusters along with 13 having a single LanM and 3 with LanBC genes

  13. A network approach for identifying and delimiting biogeographical regions.

    PubMed

    Vilhena, Daril A; Antonelli, Alexandre

    2015-01-01

    Biogeographical regions (geographically distinct assemblages of species and communities) constitute a cornerstone for ecology, biogeography, evolution and conservation biology. Species turnover measures are often used to quantify spatial biodiversity patterns, but algorithms based on similarity can be sensitive to common sampling biases in species distribution data. Here we apply a community detection approach from network theory that incorporates complex, higher-order presence-absence patterns. We demonstrate the performance of the method by applying it to all amphibian species in the world (c. 6,100 species), all vascular plant species of the USA (c. 17,600) and a hypothetical data set containing a zone of biotic transition. In comparison with current methods, our approach tackles the challenges posed by transition zones and succeeds in retrieving a larger number of commonly recognized biogeographical regions. This method can be applied to generate objective, data-derived identification and delimitation of the world's biogeographical regions. PMID:25907961

  14. Computational approaches to identify functional genetic variants in cancer genomes

    PubMed Central

    Gonzalez-Perez, Abel; Mustonen, Ville; Reva, Boris; Ritchie, Graham R.S.; Creixell, Pau; Karchin, Rachel; Vazquez, Miguel; Fink, J. Lynn; Kassahn, Karin S.; Pearson, John V.; Bader, Gary; Boutros, Paul C.; Muthuswamy, Lakshmi; Ouellette, B.F. Francis; Reimand, Jüri; Linding, Rune; Shibata, Tatsuhiro; Valencia, Alfonso; Butler, Adam; Dronov, Serge; Flicek, Paul; Shannon, Nick B.; Carter, Hannah; Ding, Li; Sander, Chris; Stuart, Josh M.; Stein, Lincoln D.; Lopez-Bigas, Nuria

    2014-01-01

    The International Cancer Genome Consortium (ICGC) aims to catalog genomic abnormalities in tumors from 50 different cancer types. Genome sequencing reveals hundreds to thousands of somatic mutations in each tumor, but only a minority drive tumor progression. We present the result of discussions within the ICGC on how to address the challenge of identifying mutations that contribute to oncogenesis, tumor maintenance or response to therapy, and recommend computational techniques to annotate somatic variants and predict their impact on cancer phenotype. PMID:23900255

  15. Multidisciplinary approach to identify aquifer-peatland connectivity

    NASA Astrophysics Data System (ADS)

    Larocque, Marie; Pellerin, Stéphanie; Cloutier, Vincent; Ferlatte, Miryane; Munger, Julie; Quillet, Anne; Paniconi, Claudio

    2015-04-01

    In southern Quebec (Canada), wetlands sustain increasing pressures from agriculture, urban development, and peat exploitation. To protect both groundwater and ecosystems, it is important to be able to identify how, where, and to what extent shallow aquifers and wetlands are connected. This study focuses on peatlands which are especially abundant in Quebec. The objective of this research was to better understand aquifer-peatland connectivity and to identify easily measured indicators of this connectivity. Geomorphology, hydrogeochemistry, and vegetation were selected as key indicators of connectivity. Twelve peatland transects were instrumented and monitored in the Abitibi (slope peatlands associated with eskers) and Centre-du-Quebec (depression peatlands) regions of Quebec (Canada). Geomorphology, geology, water levels, water chemistry, and vegetation species were identified/measured on all transects. Flow conditions were simulated numerically on two typical transects. Results show that a majority of peatland transects receives groundwater from a shallow aquifer. In slope peatlands, groundwater flows through the organic deposits towards the peatland center. In depression peatlands, groundwater flows only 100-200 m within the peatland before being redirected through surface routes towards the outlet. Flow modeling and sensitivity analysis have identified that the thickness and hydraulic conductivity of permeable deposits close to the peatland and beneath the organic deposits influence flow directions within the peatland. Geochemical data have confirmed the usefulness of total dissolved solids (TDS) exceeding 14 mg/L as an indicator of the presence of groundwater within the peatland. Vegetation surveys have allowed the identification of species and groups of species that occur mostly when groundwater is present, for instance Carex limosa and Sphagnum russowii. Geomorphological conditions (slope or depression peatland), TDS, and vegetation can be measured

  16. New Seasonal Shift in In-Stream Diurnal Nitrate Cycles Identified by Mining High-Frequency Data.

    PubMed

    Aubert, Alice H; Breuer, Lutz

    2016-01-01

    The recent development of in-situ monitoring devices, such as UV-spectrometers, makes the study of short-term stream chemistry variation relevant, especially the study of diurnal cycles, which are not yet fully understood. Our study is based on high-frequency data from an agricultural catchment (Studienlandschaft Schwingbachtal, Germany). We propose a novel approach, i.e. the combination of cluster analysis and Linear Discriminant Analysis, to mine nitrate behavior patterns from these data. As a result, we observe a seasonality of nitrate diurnal cycles that differs from the most common cycle seasonality described in the literature, i.e. pre-dawn peaks in spring. Our cycles appear in summer and the maximum and minimum shift to a later time in late summer/autumn. This is observed both for water- and energy-limited years, thus potentially stressing the role of evapotranspiration. This concluding hypothesis on the role of evapotranspiration on nitrate stream concentration, which was obtained through data mining, broadens the perspective on the diurnal cycling of stream nitrate concentrations. PMID:27073838

  17. New Seasonal Shift in In-Stream Diurnal Nitrate Cycles Identified by Mining High-Frequency Data

    PubMed Central

    2016-01-01

    The recent development of in-situ monitoring devices, such as UV-spectrometers, makes the study of short-term stream chemistry variation relevant, especially the study of diurnal cycles, which are not yet fully understood. Our study is based on high-frequency data from an agricultural catchment (Studienlandschaft Schwingbachtal, Germany). We propose a novel approach, i.e. the combination of cluster analysis and Linear Discriminant Analysis, to mine nitrate behavior patterns from these data. As a result, we observe a seasonality of nitrate diurnal cycles that differs from the most common cycle seasonality described in the literature, i.e. pre-dawn peaks in spring. Our cycles appear in summer and the maximum and minimum shift to a later time in late summer/autumn. This is observed both for water- and energy-limited years, thus potentially stressing the role of evapotranspiration. This concluding hypothesis on the role of evapotranspiration on nitrate stream concentration, which was obtained through data mining, broadens the perspective on the diurnal cycling of stream nitrate concentrations. PMID:27073838

  18. A data mining based approach to predict spatiotemporal changes in satellite images

    NASA Astrophysics Data System (ADS)

    Boulila, W.; Farah, I. R.; Ettabaa, K. Saheb; Solaiman, B.; Ghézala, H. Ben

    2011-06-01

    The interpretation of remotely sensed images in a spatiotemporal context is becoming a valuable research topic. However, the constant growth of data volume in remote sensing imaging makes reaching conclusions based on collected data a challenging task. Recently, data mining has emerged as a promising research field leading to several interesting discoveries in various areas such as marketing, surveillance, fraud detection and scientific discovery. By integrating data mining and image interpretation techniques, accurate and relevant information (i.e. functional relation between observed parcels and a set of informational contents) can be automatically elicited. This study presents a new approach to predict spatiotemporal changes in satellite image databases. The proposed method exploits fuzzy sets and data mining concepts to build predictions and decisions for several remote sensing fields. It takes into account imperfections related to the spatiotemporal mining process in order to provide more accurate and reliable information about land cover changes in satellite images. The proposed approach is validated using SPOT images representing the Saint-Denis region, capital of Reunion Island. Results show good performance of the proposed framework in predicting change for the urban zone.

  19. Identifying the Factors Affecting Science and Mathematics Achievement Using Data Mining Methods

    ERIC Educational Resources Information Center

    Kiray, S. Ahmet; Gok, Bilge; Bozkir, A. Selman

    2015-01-01

    The purpose of this article is to identify the order of significance of the variables that affect science and mathematics achievement in middle school students. For this aim, the study deals with the relationship between science and math in terms of different angles using the perspectives of multiple causes-single effect and of multiple…

  20. Using Data Mining to Identify Actionable Information: Breaking New Ground in Data-Driven Decision Making

    ERIC Educational Resources Information Center

    Streifer, Philip A.; Schumann, Jeffrey A.

    2005-01-01

    The implementation of No Child Left Behind (NCLB) presents important challenges for schools across the nation to identify problems that lead to poor performance. Yet schools must intervene with instructional programs that can make a difference and evaluate the effectiveness of such programs. New advances in artificial intelligence (AI) data-mining…

  1. Mining a Written Values Affirmation Intervention to Identify the Unique Linguistic Features of Stigmatized Groups

    ERIC Educational Resources Information Center

    Riddle, Travis; Bhagavatula, Sowmya Sree; Guo, Weiwei; Muresan, Smaranda; Cohen, Geoff; Cook, Jonathan E.; Purdie-Vaughns, Valerie

    2015-01-01

    Social identity threat refers to the process through which an individual underperforms in some domain due to their concern with confirming a negative stereotype held about their group. Psychological research has identified this as one contributor to the underperformance and underrepresentation of women, Blacks, and Latinos in STEM fields. Over the…

  2. Proteomic Approach to Identify Nuclear Proteins in Wheat Grain.

    PubMed

    Bancel, Emmanuelle; Bonnot, Titouan; Davanture, Marlène; Branlard, Gérard; Zivy, Michel; Martre, Pierre

    2015-10-01

    The nuclear proteome of the grain of the two cultivated wheat species Triticum aestivum (hexaploid wheat; genomes A, B, and D) and T. monococcum (diploid wheat; genome A) was analyzed in two early stages of development using shotgun-based proteomics. A procedure was optimized to purify nuclei, and an improved protein sample preparation was developed to efficiently remove nonprotein substances (starch and nucleic acids). A total of 797 proteins corresponding to 528 unique proteins were identified, 36% of which were classified in functional groups related to DNA and RNA metabolism. A large number (107) of proteins of unknown function and hypothetical proteins were also found. Some identified proteins may be multifunctional and may present multiple localizations. On the basis of the MS/MS analysis, 368 proteins were present in the two species and in the two stages of development; some qualitative differences between species and stages of development were also found. All of these data illustrate the dynamic function of the grain nucleus in the early stages of development. PMID:26228564

  3. Timely approaches to identify probiotic species of the genus Lactobacillus

    PubMed Central

    2013-01-01

    Over the past decades the use of probiotics in food has increased largely due to the manufacturer’s interest in placing “healthy” food on the market based on the consumer’s ambitions to live healthy. Due to this trend, health benefits of products containing probiotic strains such as lactobacilli are promoted and probiotic strains have been established in many different products with their numbers increasing steadily. Probiotics are used as starter cultures in dairy products such as cheese or yoghurts and in addition they are also utilized in non-dairy products such as fermented vegetables, fermented meat and pharmaceuticals, thereby, covering a large variety of products. To assure quality management, several pheno-, physico- and genotyping methods have been established to unambiguously identify probiotic lactobacilli. These methods are often specific enough to identify the probiotic strains at genus and species levels. However, the probiotic ability is often strain dependent and it is impossible to distinguish strains by basic microbiological methods. Therefore, this review aims to critically summarize and evaluate conventional identification methods for the genus Lactobacillus, complemented by techniques that are currently being developed. PMID:24063519

  4. PedMine – A simulated annealing algorithm to identify maximally unrelated individuals in population isolates

    PubMed Central

    Douglas, Julie A.; Sandefur, Conner I.

    2010-01-01

    Summary: In family-based genetic studies, it is often useful to identify a subset of unrelated individuals. When such studies are conducted in population isolates, however, most if not all individuals are often detectably related to each other. To identify a set of maximally unrelated (or equivalently, minimally related) individuals, we have implemented simulated annealing, a general-purpose algorithm for solving difficult combinatorial optimization problems. We illustrate our method on data from a genetic study in the Old Order Amish of Lancaster County, Pennsylvania, a population isolate derived from a modest number of founders. Given one or more pedigrees, our program automatically and rapidly extracts a fixed number of maximally unrelated individuals. PMID:18321883
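
    The simulated annealing idea described in this record can be illustrated with a minimal Python sketch. It is not the PedMine implementation: the cost function (sum of pairwise kinship coefficients within the chosen subset), the geometric cooling schedule, the swap move, and all names (`total_kinship`, `anneal_unrelated`, `kinship`) are assumptions made for illustration, and the kinship matrix is assumed to be precomputed from the pedigree.

```python
import math
import random


def total_kinship(subset, kinship):
    """Sum of pairwise kinship coefficients within the subset (assumed cost)."""
    members = sorted(subset)
    return sum(kinship[a][b] for i, a in enumerate(members) for b in members[i + 1:])


def anneal_unrelated(kinship, k, steps=2000, t_start=1.0, t_end=1e-3, seed=0):
    """Pick k individuals with low total pairwise kinship by simulated annealing."""
    n = len(kinship)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of individuals")
    rng = random.Random(seed)
    current = set(rng.sample(range(n), k))
    cost = total_kinship(current, kinship)
    best, best_cost = set(current), cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Propose swapping one selected individual for one unselected individual.
        out_id = rng.choice(sorted(current))
        in_id = rng.choice([i for i in range(n) if i not in current])
        candidate = (current - {out_id}) | {in_id}
        new_cost = total_kinship(candidate, kinship)
        delta = new_cost - cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, cost = candidate, new_cost
            if cost < best_cost:
                best, best_cost = set(current), cost
    return sorted(best), best_cost


# Toy example: two sibships of three (kinship 0.25 within, 0 between sibships).
toy = [[0.25 if (i < 3) == (j < 3) and i != j else 0.0 for j in range(6)]
       for i in range(6)]
chosen, chosen_cost = anneal_unrelated(toy, k=2)
```

    In the toy matrix, any pair drawn from different sibships has zero total kinship, so the search should settle on one individual from each group.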

  5. A novel meta-analytic approach: Mining frequent co-activation patterns in neuroimaging databases

    PubMed Central

    Caspers, Julian; Zilles, Karl; Beierle, Christoph; Rottschy, Claudia; Eickhoff, Simon B.

    2016-01-01

    In recent years, coordinate-based meta-analyses have become a powerful and widely used tool to study coactivity across neuroimaging experiments, a development that was supported by the emergence of large-scale neuroimaging databases like BrainMap. However, the evaluation of co-activation patterns is constrained by the fact that previous coordinate-based meta-analysis techniques like Activation Likelihood Estimation (ALE) and Multilevel Kernel Density Analysis (MKDA) reveal all brain regions that show convergent activity within a dataset without taking into account actual within-experiment co-occurrence patterns. To overcome this issue we here propose a novel meta-analytic approach named PaMiNI that utilizes a combination of two well-established data-mining techniques, Gaussian mixture modeling and the Apriori algorithm. By this, PaMiNI enables a data-driven detection of frequent co-activation patterns within neuroimaging datasets. The feasibility of the method is demonstrated by means of several analyses on simulated data as well as a real application. The analyses of the simulated data show that PaMiNI identifies the brain regions underlying the simulated activation foci and perfectly separates the co-activation patterns of the experiments in the simulations. Furthermore, PaMiNI still yields good results when activation foci of distinct brain regions become closer together or if they are non-Gaussian distributed. For the further evaluation, a real dataset on working memory experiments is used, which was previously examined in an ALE meta-analysis and hence allows a cross-validation of both methods. In this latter analysis, PaMiNI revealed a fronto-parietal “core” network of working memory and furthermore indicates a left-lateralization in this network. Finally, to encourage a widespread usage of this new method, the PaMiNI approach was implemented into a publicly available software system. PMID:24365675
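
    The Apriori step named in this record can be sketched in a few lines of Python. This is a generic frequent-itemset miner, not the PaMiNI software: each experiment is assumed to have already been reduced to the set of region labels it activates (for example, after assigning foci to mixture components), and the labels `A` to `D`, the function name `apriori`, and the support threshold are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset found in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    size = 2
    while frequent:
        prev = set(frequent)
        # Join step: merge frequent (size-1)-itemsets into candidates of the next size.
        candidates = {a | b for a in prev for b in prev if len(a | b) == size}
        # Prune step: every (size-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, size - 1))}
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        size += 1
    return result


# Hypothetical experiments, each listed as the set of regions it co-activates.
experiments = [{"A", "B", "C"}, {"A", "B"}, {"A", "B", "D"}, {"C", "D"}, {"A", "C"}]
patterns = apriori(experiments, min_support=3)
```

    With these toy data, the only frequent pattern larger than a single region is the pair `A` and `B`, which co-occurs in three experiments.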

  6. Identifying Pathogenicity Islands in Bacterial Pathogenomics Using Computational Approaches

    PubMed Central

    Che, Dongsheng; Hasan, Mohammad Shabbir; Chen, Bernard

    2014-01-01

    High-throughput sequencing technologies have made it possible to study bacteria through analyzing their genome sequences. For instance, comparative genome sequence analyses can reveal phenomena such as gene loss, gene gain, or gene exchange in a genome. By analyzing pathogenic bacterial genomes, we can discover that pathogenic genomic regions in many pathogenic bacteria are horizontally transferred from other bacteria, and these regions are also known as pathogenicity islands (PAIs). PAIs have some detectable properties, such as having different genomic signatures than the rest of the host genomes, and containing mobility genes so that they can be integrated into the host genome. In this review, we will discuss various pathogenicity island-associated features and current computational approaches for the identification of PAIs. Existing pathogenicity island databases and related computational resources will also be discussed, so that researchers may find them useful for studies of bacterial evolution and pathogenicity mechanisms. PMID:25437607

  7. Multimodal Approach to Identifying Malingered Posttraumatic Stress Disorder: A Review

    PubMed Central

    Jabeen, Shagufta; Alam, Farzana

    2015-01-01

    The primary aim of this article is to aid clinicians in differentiating true posttraumatic stress disorder from malingered posttraumatic stress disorder. Posttraumatic stress disorder and malingering are defined, and prevalence rates are explored. Similarities and differences in diagnostic criteria between the fourth and fifth editions of the Diagnostic and Statistical Manual of Mental Disorders are described for posttraumatic stress disorder. Possible motivations for malingering posttraumatic stress disorder are discussed, and common characteristics of malingered posttraumatic stress disorder are described. A multimodal approach is described for evaluating posttraumatic stress disorder, including interview techniques, collection of collateral data, and psychometric and physiologic testing, that should allow clinicians to distinguish between those patients who are truly suffering from posttraumatic stress disorder and those who are malingering the illness. PMID:25852974

  8. A new approach to estimate fugitive methane emissions from coal mining in China.

    PubMed

    Ju, Yiwen; Sun, Yue; Sa, Zhanyou; Pan, Jienan; Wang, Jilin; Hou, Quanlin; Li, Qingguang; Yan, Zhifeng; Liu, Jie

    2016-02-01

    Developing a more accurate greenhouse gas (GHG) emissions inventory has drawn considerable attention. Because of its resource endowment and technical status, China has made coal-related GHG emissions a big part of its inventory. Lacking a stoichiometric carbon conversion coefficient and influenced by geological conditions and mining technologies, previous efforts to estimate fugitive methane emissions from coal mining in China have led to disagreeing results. This paper proposes a new calculation methodology to determine fugitive methane emissions from coal mining based on the domestic analysis of gas geology, gas emission features, and the merits and demerits of existing estimation methods. This new approach involves four main parameters: in-situ original gas content, gas remaining post-desorption, raw coal production, and mining influence coefficient. The case studies in Huaibei-Huainan Coalfield and Jincheng Coalfield show that the new method obtains the smallest errors, +9.59% and 7.01% respectively, compared with the other methods in this study, Tier 1 and Tier 2 (with two samples), which resulted in +140.34%, +138.90%, and -18.67% in Huaibei-Huainan Coalfield and +64.36%, +47.07%, and -14.91% in Jincheng Coalfield. Compared with the predominantly used methods, the new one not only offers a comparably simpler process and lower uncertainty than the "emission factor method" (IPCC recommended Tier 1 and Tier 2), but also has easier data accessibility, similar uncertainty, and additional post-mining emissions compared to the "absolute gas emission method" (IPCC recommended Tier 3). Therefore, methane emissions dissipated from most of the producing coal mines worldwide could be more accurately and more easily estimated. PMID:26605831

  9. Identifying hosts of families of viruses: a machine learning approach.

    PubMed

    Raj, Anil; Dewar, Michael; Palacios, Gustavo; Rabadan, Raul; Wiggins, Christopher H

    2011-01-01

    Identifying emerging viral pathogens and characterizing their transmission is essential to developing effective public health measures in response to an epidemic. Phylogenetics, though currently the most popular tool used to characterize the likely host of a virus, can be ambiguous when studying species very distant to known species and when there is very little reliable sequence information available in the early stages of the outbreak of disease. Motivated by an existing framework for representing biological sequence information, we learn sparse, tree-structured models, built from decision rules based on subsequences, to predict viral hosts from protein sequence data using popular discriminative machine learning tools. Furthermore, the predictive motifs robustly selected by the learning algorithm are found to show strong host-specificity and occur in highly conserved regions of the viral proteome. PMID:22174744

  10. A new approach to identify, classify and count drug-related events

    PubMed Central

    Bürkle, Thomas; Müller, Fabian; Patapovas, Andrius; Sonst, Anja; Pfistermeister, Barbara; Plank-Kiegele, Bettina; Dormann, Harald; Maas, Renke

    2013-01-01

    Aims: The incidence of clinical events related to medication errors and/or adverse drug reactions reported in the literature varies by a degree that cannot solely be explained by the clinical setting, the varying scrutiny of investigators or varying definitions of drug-related events. Our hypothesis was that the individual complexity of many clinical cases may pose relevant limitations for current definitions and algorithms used to identify, classify and count adverse drug-related events. Methods: Based on clinical cases derived from an observational study we identified and classified common clinical problems that cannot be adequately characterized by the currently used definitions and algorithms. Results: It appears that some key models currently used to describe the relation of medication errors (MEs), adverse drug reactions (ADRs) and adverse drug events (ADEs) can easily be misinterpreted or contain logical inconsistencies that limit their accurate use to all but the simplest clinical cases. A key limitation of current models is the inability to deal with complex interactions such as one drug causing two clinically distinct side effects or multiple drugs contributing to a single clinical event. Using a large set of clinical cases we developed a revised model of the interdependence between MEs, ADEs and ADRs and extended current event definitions when multiple medications cause multiple types of problems. We propose algorithms that may help to improve the identification, classification and counting of drug-related events. Conclusions: The new model may help to overcome some of the limitations that complex clinical cases pose to current paper- or software-based drug therapy safety. PMID:24007453

  11. Newer Approaches to Identify Potential Untoward Effects in Functional Foods.

    PubMed

    Marone, Palma Ann; Birkenbach, Victoria L; Hayes, A Wallace

    2016-01-01

    Globalization has greatly accelerated the numbers and variety of food and beverage products available worldwide. The exchange among greater numbers of countries, manufacturers, and products in the United States and worldwide has necessitated enhanced quality measures for nutritional products for larger populations increasingly reliant on functionality. These functional foods, those that provide benefit beyond basic nutrition, are increasingly being used for their potential to alleviate food insufficiency while enhancing quality and longevity of life. In the United States alone, a steady import increase of more than 15% per year, or 24 million shipments, over 70% of which are food-related products, is regulated by the Food and Drug Administration (FDA). This unparalleled growth has resulted in the need for faster, cheaper, and better safety and efficacy screening methods in the form of harmonized guidelines and recommendations for product standardization. In an effort to meet this need, the in vitro toxicology testing market has similarly grown, with an anticipated 15% increase between 2010 and 2015, from US$1.3 to US$2.7 billion. Although traditionally occupying a small fraction of the market behind pharmaceuticals and cosmetic/household products, the scope of functional food testing, including additives/supplements, ingredients, residues, contact/processing, and contaminants, is potentially expansive. Similarly, as functional food testing has progressed, so has the need to identify potential adverse factors that threaten the safety and quality of these products. PMID:26657815

  12. Correlation approach to identify coding regions in DNA sequences

    NASA Technical Reports Server (NTRS)

    Ossadnik, S. M.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1994-01-01

    Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.
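
    The abstract does not spell out the algorithm, so the following Python sketch only illustrates one standard way to quantify correlation range in a sequence: a purine/pyrimidine "DNA walk" and the scaling exponent of its fluctuation function. The function names, window sizes, and the random test sequence are assumptions; an exponent near 0.5 indicates no long-range correlation, and larger values indicate long-range correlation.

```python
import math
import random


def dna_walk(seq):
    """Cumulative walk: step +1 for a pyrimidine (C, T), -1 for any other base."""
    y, pos = [0], 0
    for base in seq.upper():
        pos += 1 if base in "CT" else -1
        y.append(pos)
    return y


def fluctuation(y, window):
    """Standard deviation of walk displacements over the given window length."""
    deltas = [y[i + window] - y[i] for i in range(len(y) - window)]
    mean = sum(deltas) / len(deltas)
    return math.sqrt(sum((d - mean) ** 2 for d in deltas) / len(deltas))


def scaling_exponent(seq, windows=(4, 8, 16, 32, 64)):
    """Least-squares slope of log F(window) versus log window.

    Assumes F(window) > 0 for every window (not true for strictly periodic input).
    """
    y = dna_walk(seq)
    xs = [math.log(w) for w in windows]
    ys = [math.log(fluctuation(y, w)) for w in windows]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
            / sum((a - mx) ** 2 for a in xs))


# An uncorrelated random sequence should give an exponent close to 0.5.
rng = random.Random(1)
random_seq = "".join(rng.choice("ACGT") for _ in range(20000))
alpha = scaling_exponent(random_seq)
```

    A scan for candidate coding regions could, under these assumptions, compute the exponent in sliding segments and flag segments whose exponent stays near 0.5.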

  13. Enhanced Approaches for Identifying Amadori Products: Application to Peanut Allergens.

    PubMed

    Johnson, Katina L; Williams, Jason G; Maleki, Soheila J; Hurlburt, Barry K; London, Robert E; Mueller, Geoffrey A

    2016-02-17

    The dry roasting of peanuts is suggested to influence allergic sensitization as a result of the formation of advanced glycation end products (AGEs) on peanut proteins. Identifying AGEs is technically challenging. The AGEs of a peanut allergen were probed with nano-scale liquid chromatography-electrospray ionization-mass spectrometry (nanoLC-ESI-MS) and tandem mass spectrometry (MS/MS) analyses. Amadori product ions matched to expected peptides and yielded fragments that included a loss of three waters and HCHO. As a result of the paucity of b and y ions in the MS/MS spectrum, standard search algorithms do not perform well. Reactions with isotopically labeled sugars confirmed that the peptides contained Amadori products. An algorithm was developed on the basis of information content (Shannon entropy) and the loss of water and HCHO. Results with test data show that the algorithm finds the correct spectra with high precision, reducing the time needed to manually inspect data. Computational and technical improvements allowed for better identification of the chemical differences between modified and unmodified proteins. PMID:26811263
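
    The information-content measure mentioned above can be sketched as the Shannon entropy of a spectrum's normalized peak intensities. How the authors combine this score with the water and HCHO losses is not stated in the abstract, so the function below (`spectrum_entropy`, a hypothetical name) covers the entropy calculation only.

```python
import math


def spectrum_entropy(intensities):
    """Shannon entropy (bits) of peak intensities normalized to sum to 1."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("spectrum must contain positive intensity")
    probs = [x / total for x in intensities if x > 0]
    return -sum(p * math.log2(p) for p in probs)


# Four equal peaks carry 2 bits; a single dominant peak carries 0 bits.
flat = spectrum_entropy([1, 1, 1, 1])
single = spectrum_entropy([10, 0, 0])
```

    One plausible use, offered here as an assumption, is to rank spectra by entropy so that those dominated by a few intense fragment peaks stand apart from spectra with many comparable peaks.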

  14. A landscape ecology approach identifies important drivers of urban biodiversity.

    PubMed

    Turrini, Tabea; Knop, Eva

    2015-04-01

    Cities are growing rapidly worldwide, yet a mechanistic understanding of the impact of urbanization on biodiversity is lacking. We assessed the impact of urbanization on arthropod diversity (species richness and evenness) and abundance in a study of six cities and nearby intensively managed agricultural areas. Within the urban ecosystem, we disentangled the relative importance of two key landscape factors affecting biodiversity, namely the amount of vegetated area and patch isolation. To do so, we a priori selected sites that independently varied in the amount of vegetated area in the surrounding landscape at the 500-m scale and patch isolation at the 100-m scale, and we held local patch characteristics constant. As indicator groups, we used bugs, beetles, leafhoppers, and spiders. Compared to intensively managed agricultural ecosystems, urban ecosystems supported a higher abundance of most indicator groups, a higher number of bug species, and a lower evenness of bug and beetle species. Within cities, a high amount of vegetated area increased species richness and abundance of most arthropod groups, whereas evenness showed no clear pattern. Patch isolation played only a limited role in urban ecosystems, which contrasts with findings from agro-ecological studies. Our results show that urban areas can harbor a similar arthropod diversity and abundance compared to intensively managed agricultural ecosystems. Further, negative consequences of urbanization on arthropod diversity can be mitigated by providing sufficient vegetated space in the urban area, while patch connectivity is less important in an urban context. This highlights the need for applying a landscape ecological approach to understand the mechanisms shaping urban biodiversity and underlines the potential of appropriate urban planning for mitigating biodiversity loss. PMID:25620599

  15. Configurational approach to identifying the earliest hominin butchers.

    PubMed

    Domínguez-Rodrigo, Manuel; Pickering, Travis Rayne; Bunn, Henry T

    2010-12-01

    The announcement of two approximately 3.4-million-y-old purportedly butchered fossil bones from the Dikika paleoanthropological research area (Lower Awash Valley, Ethiopia) could profoundly alter our understanding of human evolution. Butchering damage on the Dikika bones would imply that tool-assisted meat-eating began approximately 800,000 y before previously thought, based on butchered bones from 2.6- to 2.5-million-y-old sites at the Ethiopian Gona and Bouri localities. Further, the only hominin currently known from Dikika at approximately 3.4 Ma is Australopithecus afarensis, a temporally and geographically widespread species unassociated previously with any archaeological evidence of butchering. Our taphonomic configurational approach to assess the claims of A. afarensis butchery at Dikika suggests the claims of unexpectedly early butchering at the site are not warranted. The Dikika research group focused its analysis on the morphology of the marks in question but failed to demonstrate, through recovery of similarly marked in situ fossils, the exact provenience of the published fossils, and failed to note occurrences of random striae on the cortices of the published fossils (incurred through incidental movement of the defleshed specimens across and/or within their abrasive encasing sediments). The occurrence of such random striae (sometimes called collectively "trampling" damage) on the two fossils provides the configurational context for rejection of the claimed butchery marks. The earliest best evidence for hominin butchery thus remains at 2.6 to 2.5 Ma, presumably associated with more derived species than A. afarensis. PMID:21078985

  16. Genetic Approaches To Identifying Novel Osteoporosis Drug Targets.

    PubMed

    Brommage, Robert

    2015-10-01

    During the past two decades effective drugs for treating osteoporosis have been developed, including anti-resorptives inhibiting bone resorption (estrogens, the SERM raloxifene, four bisphosphonates, RANKL inhibitor denosumab) and the anabolic bone forming daily injectable peptide teriparatide. Two potential drugs (odanacatib and romosozumab) are in late stage clinical development. The most pressing unmet need is for orally active anabolic drugs. This review describes the basic biological studies involved in developing these drugs, including the animal models employed for osteoporosis drug development. The genomics revolution continues to identify potential novel osteoporosis drug targets. Studies include human GWAS studies and identification of mutant genes in subjects having abnormal bone mass, mouse QTL and gene knockouts, and gene expression studies. Multiple lines of evidence indicate that Wnt signaling plays a major role in regulating bone formation and continued study of this complex pathway is likely to lead to key discoveries. In addition to the classic Wnt signaling targets DKK1 and sclerostin, LRP4, LRP5/LRP6, SFRP4, WNT16, and NOTUM can potentially be targeted to modulate Wnt signaling. Next-generation whole genome and exome sequencing, RNA-sequencing and CRISPR/CAS9 gene editing are new experimental techniques contributing to understanding the genome. The International Knockout Mouse Consortium efforts to knockout and phenotype all mouse genes are poised to accelerate. Accumulating knowledge will focus attention on readily accessible databases (Big Data). Efforts are underway by the International Bone and Mineral Society to develop an annotated Skeletome database providing information on all genes directly influencing bone mass, architecture, mineralization or strength. PMID:25833316

  17. Identifying new targets in leukemogenesis using computational approaches

    PubMed Central

    Jayaraman, Archana; Jamil, Kaiser; Khan, Haseeb A.

    2015-01-01

    There is a need to identify novel targets in Acute Lymphoblastic Leukemia (ALL), a hematopoietic cancer affecting children, both to improve our understanding of disease biology and to support the development of new therapeutics. Hence, the aim of our study was to find new genes as targets using in silico studies; for this we retrieved the top 10% overexpressed genes from the Oncomine public domain microarray expression database; 530 overexpressed genes were short-listed from the Oncomine database. Then, using the online prioritization tools ENDEAVOUR, DIR and TOPPGene, we found fifty-four genes common to the three tools, which formed our candidate leukemogenic genes for this study. As per the protocol we selected thirty training genes from PubMed. The prioritized and training genes were then used to construct a STRING functional association network, which was further analyzed using the cytoHubba hub analysis tool to investigate new genes which could form drug targets in leukemia. Analysis of the STRING protein network built from these prioritized and training genes led to identification of two hub genes, SMAD2 and CDK9, which were not implicated in leukemogenesis earlier. Filtering out from several hundred genes in the network we also found MEN1, HDAC1 and LCK genes, which re-emphasized the important role of these genes in leukemogenesis. This is the first report on these five additional signature genes in leukemogenesis. We propose these as new targets for developing novel therapeutics and also as biomarkers in leukemogenesis, which could be important for prognosis and diagnosis. PMID:26288567

  18. A Bayesian Approach to Identifying New Risk Factors for Dementia

    PubMed Central

    Wen, Yen-Hsia; Wu, Shihn-Sheng; Lin, Chun-Hung Richard; Tsai, Jui-Hsiu; Yang, Pinchen; Chang, Yang-Pei; Tseng, Kuan-Hua

    2016-01-01

    Dementia is one of the most disabling and burdensome health conditions worldwide. In this study, we identified new potential risk factors for dementia from nationwide longitudinal population-based data by using Bayesian statistics. We first tested the consistency of the results obtained using Bayesian statistics with those obtained using classical frequentist probability for 4 recognized risk factors for dementia, namely severe head injury, depression, diabetes mellitus, and vascular diseases. Then, we used Bayesian statistics to verify 2 new potential risk factors for dementia, namely hearing loss and senile cataract, determined from Taiwan's National Health Insurance Research Database. We included a total of 6546 (6.0%) patients diagnosed with dementia. We observed older age, female sex, and lower income as independent risk factors for dementia. Moreover, we verified the 4 recognized risk factors for dementia in the older Taiwanese population; their odds ratios (ORs) ranged from 3.469 to 1.207. Furthermore, we observed that hearing loss (OR = 1.577) and senile cataract (OR = 1.549) were associated with an increased risk of dementia. We found that the results obtained using Bayesian statistics for assessing risk factors for dementia, such as head injury, depression, DM, and vascular diseases, were consistent with those obtained using classical frequentist probability. Moreover, hearing loss and senile cataract were found to be potential risk factors for dementia in the older Taiwanese population. Bayesian statistics could help clinicians explore other potential risk factors for dementia and develop appropriate treatment strategies for these patients. PMID:27227925

  19. A text mining approach to detect mentions of protein glycosylation in biomedical text

    PubMed Central

    Shukla, Daksha; Jayaraman, Valadi K

    2012-01-01

    Protein glycosylation is an important post-translational event that plays a pivotal role in protein folding and protein trafficking. We describe a dictionary-based and a rule-based approach to mine ‘mentions’ of protein glycosylation in text. The dictionary-based approach relies on a set of manually curated dictionaries specially constructed to address this task. Abstracts are then screened for ‘mentions’ of words from these dictionaries, which are scored and then classified on the basis of a threshold. The rule-based approach also relies on the words in the dictionaries to arrive at the features used for classification. The performance of the system using both approaches has been evaluated on a manually curated corpus of 3133 abstracts. The evaluation suggests that the rule-based approach outperforms the dictionary-based approach. PMID:23055626
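
    The dictionary-based step (screen for dictionary words, score, classify by threshold) can be sketched as follows. The dictionary entries, their weights, the tokenizer, and the threshold are hypothetical placeholders, not the curated dictionaries or scoring scheme from the paper.

```python
import re

# Hypothetical term weights standing in for the manually curated dictionaries.
TERM_WEIGHTS = {"glycosylation": 2.0, "glycosylated": 2.0, "glycan": 1.0, "n-linked": 1.0}


def score_abstract(text, term_weights=TERM_WEIGHTS):
    """Sum the weights of all dictionary terms found among the lowercased tokens."""
    tokens = re.findall(r"[a-z0-9\-]+", text.lower())
    return sum(term_weights.get(tok, 0.0) for tok in tokens)


def mentions_glycosylation(text, threshold=2.0, term_weights=TERM_WEIGHTS):
    """Classify an abstract as a 'mention' when its score reaches the threshold."""
    return score_abstract(text, term_weights) >= threshold


positive = "N-linked glycosylation of the envelope protein was confirmed."
negative = "The protein was expressed in E. coli and purified."
```

    A rule-based variant would, under the same assumptions, turn the matched terms into features (for example, proximity of a glycosylation term to a protein name) instead of summing weights.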

  20. An ecosystem approach to evaluate restoration measures in the lignite mining district of Lusatia/Germany

    NASA Astrophysics Data System (ADS)

    Schaaf, Wolfgang

    2015-04-01

    Lignite mining in Lusatia has a history of over 100 years. Open-cast mining directly affected an area of 1000 km². For 20 years we have applied an ecosystem-oriented approach to evaluate the development and site characteristics of post-mining areas mainly restored for agricultural and silvicultural land use. Water and element budgets of afforested sites were studied under different geochemical settings in a chronosequence approach (Schaaf 2001), as well as the effect of soil amendments like sewage sludge or compost in restoration (Schaaf & Hüttl 2006). For 10 years we have also studied the development of natural site regeneration in the constructed catchment Chicken Creek at the watershed scale (Schaaf et al. 2011, 2013). One of the striking characteristics of post-mining sites is a very large small-scale soil heterogeneity that has to be taken into account with respect to soil-forming processes and element cycling. Results from these studies, in combination with smaller-scale process studies, make it possible to evaluate the long-term effect of restoration measures and adapted land use options. In addition, it is crucial to compare these results with data from undisturbed, i.e. non-mined, sites. Schaaf, W., 2001: What can element budgets of false-time series tell us about ecosystem development on post-lignite mining sites? Ecological Engineering 17, 241-252. Schaaf, W. and Hüttl, R. F., 2006: Direct and indirect effects of soil pollution by lignite mining. Water, Air and Soil Pollution - Focus 6, 253-264. Schaaf, W., Bens, O., Fischer, A., Gerke, H.H., Gerwin, W., Grünewald, U., Holländer, H.M., Kögel-Knabner, I., Mutz, M., Schloter, M., Schulin, R., Veste, M., Winter, S. & Hüttl, R.F., 2011: Patterns and processes of initial terrestrial-ecosystem development. Journal of Plant Nutrition and Soil Science, 174, 229-239. Schaaf, W., Elmer, M., Fischer, A., Gerwin, W., Nenov, R., Pretsch, H. and Zaplate, M.K., 2013: Feedbacks between vegetation, surface structures and hydrology

  1. Evaluation of the approach to respirable quartz exposure control in U.S. coal mines.

    PubMed

    Joy, Gerald J

    2012-01-01

    Occupational exposure to high levels of respirable quartz can result in respiratory and other diseases in humans. The Mine Safety and Health Administration (MSHA) regulates exposure to respirable quartz in coal mines indirectly through reductions in the respirable coal mine dust exposure limit based on the content of quartz in the airborne respirable dust. This reduction is implemented when the quartz content of airborne respirable dust exceeds 5% by weight. The intent of this dust standard reduction is to restrict miners' exposure to respirable quartz to a time-weighted average concentration of 100 μg/m³. The effectiveness of this indirect approach to control quartz exposure was evaluated by analyzing respirable dust samples collected by MSHA inspectors from 1995 through 2008. The performance of the current regulatory approach was found to be lacking due to the use of a variable property (quartz content in airborne dust) to establish a standard for subsequent exposures. In one situation, 11.7% (4370/37,346) of samples that were below the applicable respirable coal mine dust exposure limit exceeded 100 μg/m³ quartz. In a second situation, 4.4% (895/20,560) of samples with 5% or less quartz content in the airborne respirable dust exceeded 100 μg/m³ quartz. In these two situations, the samples exceeding 100 μg/m³ quartz were not subject to any potential compliance action. Therefore, the current respirable quartz exposure control approach does not reliably maintain miner exposure below 100 μg/m³ quartz. A separate and specific respirable quartz exposure standard may improve control of coal miners' occupational exposure to respirable quartz. PMID:22181563
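
    The arithmetic behind these figures is simple enough to check. The sketch below recomputes the two reported shares from the counts given in the abstract and shows, with one hypothetical sample, how a sample with 5% or less quartz can still exceed 100 μg/m³ quartz; the function name and the example sample values are assumptions for illustration only.

```python
def quartz_concentration_ug_m3(dust_mg_m3, quartz_percent):
    """Quartz concentration (ug/m3) = dust concentration (mg/m3) x quartz fraction."""
    return dust_mg_m3 * 1000.0 * quartz_percent / 100.0


# Shares reported in the abstract, recomputed from the stated counts.
share_below_dust_limit = 4370 / 37346   # samples under the dust limit but over 100 ug/m3 quartz
share_low_quartz = 895 / 20560          # samples with <= 5% quartz but over 100 ug/m3 quartz

# Hypothetical sample (not from the data set): 2.5 mg/m3 of dust at 4.5% quartz.
example = quartz_concentration_ug_m3(2.5, 4.5)

print(f"{100 * share_below_dust_limit:.1f}%")
print(f"{100 * share_low_quartz:.1f}%")
print(f"{example} ug/m3 quartz")
```

    Because the quartz concentration is the product of two quantities, limiting either one alone does not bound the product, which is the weakness the abstract attributes to the indirect approach.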

  2. EST mining identifies proteins putatively secreted by the anthracnose pathogen Colletotrichum truncatum

    PubMed Central

    2011-01-01

    Background Colletotrichum truncatum is a haploid, hemibiotrophic, ascomycete fungal pathogen that causes anthracnose disease on many economically important leguminous crops. This pathogen exploits sequential biotrophic- and necrotrophic- infection strategies to colonize the host. Transition from biotrophy to a destructive necrotrophic phase called the biotrophy-necrotrophy switch is critical in symptom development. C. truncatum likely secretes an arsenal of proteins that are implicated in maintaining a compatible interaction with its host. Some of them might be transition specific. Results A directional cDNA library was constructed from mRNA isolated from infected Lens culinaris leaflet tissues displaying the biotrophy-necrotrophy switch of C. truncatum and 5000 expressed sequence tags (ESTs) with an average read of > 600 bp from the 5-prime end were generated. Nearly 39% of the ESTs were predicted to encode proteins of fungal origin and among these, 162 ESTs were predicted to contain N-terminal signal peptides (SPs) in their deduced open reading frames (ORFs). The 162 sequences could be assembled into 122 tentative unigenes comprising 32 contigs and 90 singletons. Sequence analyses of unigenes revealed four potential groups: hydrolases, cell envelope associated proteins (CEAPs), candidate effectors and other proteins. Eleven candidate effector genes were identified based on features common to characterized fungal effectors, i.e. they encode small, soluble (lack of transmembrane domain), cysteine-rich proteins with a putative SP. For a selected subset of CEAPs and candidate effectors, semiquantitative RT-PCR showed that these transcripts were either expressed constitutively in both in vitro and in planta or induced during plant infection. Using potato virus X (PVX) based transient expression assays, we showed that one of the candidate effectors, i. e. contig 8 that encodes a cerato-platanin (CP) domain containing protein, unlike CP proteins from other fungal

  3. Forecasting Precipitation over the MENA Region: A Data Mining and Remote Sensing Based Approach

    NASA Astrophysics Data System (ADS)

    Elkadiri, R.; Sultan, M.; Elbayoumi, T.; Chouinard, K.

    2015-12-01

    We developed and applied an integrated approach to construct predictive tools with lead times of 1 to 12 months to forecast precipitation amounts over the Middle East and North Africa (MENA) region. The following steps were conducted: (1) acquire and analyze temporal remote sensing-based precipitation datasets (i.e., Tropical Rainfall Measuring Mission [TRMM]) over five main water source regions in the MENA area (i.e., Atlas Mountains in Morocco, Southern Sudan, Red Sea Hills of Yemen, and Blue Nile and White Nile source areas) throughout the investigation period (1998 to 2015); (2) acquire and extract monthly values for all of the climatic indices that are likely to influence the climatic patterns over the MENA region (e.g., North Atlantic Oscillation [NAO], Southern Oscillation Index [SOI], and Tropical North Atlantic Index [TNA]); and (3) apply data mining methods to extract relationships between the observed precipitation and the controlling factors (climatic indices) and use predictive tools to forecast monthly precipitation over each of the identified pilot study areas. Preliminary results indicate that by using the period from January 1998 until August 2012 for model training and the period from September 2012 to January 2015 for testing, precipitation can be successfully predicted with a three-month lead over South West Yemen, Atlas Mountains in Morocco, Southern Sudan, Blue Nile sources and White Nile sources with confidence (Pearson correlation coefficient: 0.911, 0.823, 0.807, 0.801 and 0.895, respectively). Future work will focus on applying this technique for prediction of precipitation over each of the climatically contiguous areas of the MENA region. If our efforts are successful, our findings will lead the way to the development and implementation of sound water management scenarios for the MENA countries.
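
    The idea of relating a climatic index at one month to precipitation several months later, and scoring the relationship with a Pearson correlation coefficient, can be sketched as below. The series are synthetic and constructed so that precipitation follows the index exactly three months later; the function names are assumptions, and the authors' actual data mining methods are not specified in the abstract.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def lagged_pairs(index, precip, lead):
    """Pair the index value at month t with precipitation at month t + lead."""
    return index[: len(index) - lead], precip[lead:]


# Synthetic monthly series (illustrative only): precipitation is a linear
# function of the index value observed three months earlier.
index = [math.sin(0.5 * t) for t in range(60)]
precip = [0.0] * 3 + [10 + 4 * value for value in index[:-3]]

x, y = lagged_pairs(index, precip, lead=3)
r = pearson_r(x, y)
print(f"r at 3-month lead = {r:.3f}")
```

    In a real evaluation the pairing would be fitted on a training period and the correlation reported on a held-out test period, as the abstract describes.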

  4. An Approach to Identify Site Response Directivity of Accelerometer Sites and Application to the Iranian Area

    NASA Astrophysics Data System (ADS)

    Del Gaudio, Vincenzo; Pierri, Pierpaolo; Rajabi, Ali M.

    2015-06-01

    In recent years, several workers have found numerous cases of sites characterised by significant azimuthal variation of dynamic response to seismic shaking. The causes of this phenomenon are still unclear, but are possibly related to combinations of geological and geomorphological factors determining a polarisation of resonance effects. To improve their comprehension, it would be desirable to extend the database of observations on this phenomenon. Thus, considering that unrevealed cases of site response directivity can be "hidden" among the sites of accelerometer networks, we developed a two-stage approach of data mining from existing strong motion databases to identify sites affected by directional amplification. The proposed procedure first calculates Arias Intensity tensor components from accelerometer recordings of each site to determine mean directional variations of total shaking energy. Then, at the sites where a significant anisotropy appears in ground motion, azimuthal variations of HVSR values (spectral ratios between horizontal and vertical components of recordings) are analysed to confirm the occurrence of site resonance conditions. We applied this technique to a database of recordings acquired by accelerometer stations in the Iranian area. The results of this investigation pointed out some sites affected by directional resonance that appear to be correlated to the orientation of local tectonic lineaments, these being mostly transversal to the direction of maximum shaking. Comparing Arias Intensities observed at these sites with theoretical estimates provided by ground motion prediction equations, the presence of significant site amplifications was confirmed. The magnitude of the amplification factors appears to be correlated to the results of HVSR analysis, even though the pattern of dispersion of HVSR values suggests that while high peak values of spectral ratios are indicative of strong amplifications, lower values do not necessarily imply lower

  5. Mining and characterization of two amidase signature family amidases from Brevibacterium epidermidis ZJB-07021 by an efficient genome mining approach.

    PubMed

    Ruan, Li-Tao; Zheng, Ren-Chao; Zheng, Yu-Guo

    2016-10-01

    Amidases have received increasing attention for their significant potential in the production of valuable carboxylic acids. In this study, two amidases belonging to amidase signature family (BeAmi2 and BeAmi4) were identified and mined from genomic DNA of Brevibacterium epidermidis ZJB-07021 by an efficient strategy combining comparative analysis of genomes and identification of unknown region by high-efficiency thermal asymmetric interlaced PCR (HiTAIL-PCR). The deduced amino acid sequences of BeAmi2 and BeAmi4 showed low identity (< 40%) with other reported amidases. The two amidases displayed optimum activity toward a wide spectrum of substrates at a mild alkaline pH and 45 °C. Both of them were remarkably inactivated by serine-directed inhibitor and sulfhydryl-reducing agent. Kinetic analysis revealed that nicotinamide was the preferable substrate for both amidases and the chlorine substitutions on the pyridine ring had a negative effect on activity. The bioprocesses for hydrolysis of 100 mM nicotinamide, isonicotinamide, 2-chloronicotinamide and 5-chloronicotinamide with purified BeAmi2 (6 U mL(-1)) were complete in 60 min with full conversion except 2-chloronicotinamide. These results indicated BeAmi2 was an effective catalyst for hydrolysis of several nicotinamide derivatives. PMID:27180252

  6. A Data Mining Approach for Examining Predictors of Physical Activity among Older Urban Adults

    PubMed Central

    Yoon, Sunmoo; Suero-Tejeda, Niurka; Bakken, Suzanne

    2015-01-01

    This study applied innovative data mining techniques to a community survey dataset to develop prediction models for two aspects of physical activity (active transport and screen time) in a sample of older, primarily Hispanic, urban adults (N=2,514). Main predictors for active transport (accuracy=69.29%, precision .67, recall .69) were immigrant status, high level of anxiety, having a place for physical activity, and willingness to make time for physical activity. The main predictors for screen time (accuracy=63.13%, precision .60, recall .63) were willingness to make time for exercise, having a place for exercise, age, and availability of family support to look up health information on the Internet. Data mining methods were useful to identify intervention targets and inform design of customized interventions. PMID:25941800

  7. A Data Mining Approach for Examining Predictors of Physical Activity Among Urban Older Adults.

    PubMed

    Yoon, Sunmoo; Suero-Tejeda, Niurka; Bakken, Suzanne

    2015-07-01

    The current study applied innovative data mining techniques to a community survey dataset to develop prediction models for two aspects of physical activity (i.e., active transport and screen time) in a sample of urban, primarily Hispanic, older adults (N=2,514). Main predictors for active transport (accuracy=69.29%, precision=0.67, recall=0.69) were immigrant status, high level of anxiety, having a place for physical activity, and willingness to make time for physical activity. The main predictors for screen time (accuracy=63.13%, precision=0.60, recall=0.63) were willingness to make time for exercise, having a place for exercise, age, and availability of family support to access health information on the Internet. Data mining methods were useful to identify intervention targets and inform design of customized interventions. PMID:25941800
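
    For readers unfamiliar with the accuracy, precision, and recall figures quoted in this and the preceding record, the following sketch shows how the three metrics are computed for a binary classifier. The labels are made up for illustration, the function name is an assumption, and the abstract does not say whether its reported values are per-class or averaged across classes.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a binary classification."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Made-up labels: 1 = engages in active transport, 0 = does not.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(accuracy, precision, recall)
```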

  8. Text Mining.

    ERIC Educational Resources Information Center

    Trybula, Walter J.

    1999-01-01

    Reviews the state of research in text mining, focusing on newer developments. The intent is to describe the disparate investigations currently included under the term text mining and provide a cohesive structure for these efforts. A summary of research identifies key organizations responsible for pushing the development of text mining. A section…

  9. Identifying and overcoming the constraints that prevent the full implementation of decommissioning and remediation programs in uranium mining sites.

    PubMed

    Franklin, Mariza Ramalho; Fernandes, Horst Monken

    2013-05-01

    Environmental remediation of radioactive contamination is about achieving appropriate reduction of exposures to ionizing radiation. This goal can be achieved by means of isolation or removal of the contamination source(s) or by breaking the exposure pathways. Ideally, environmental remediation is part of the planning phase of any industrial operation with the potential to cause environmental contamination. This concept is even more important in mining operations due to the significant impacts produced. This approach has not been considered in several operations developed in the past. Therefore many legacy sites face the challenge to implement appropriate remediation plans. One of the first barriers to remediation works is the lack of financial resources as environmental issues used to be taken in the past as marginal costs and were not included in the overall budget of the company. This paper analyses the situation of the former uranium production site of Poços de Caldas in Brazil. It is demonstrated that in addition to the lack of resources, other barriers such as the lack of information on site characteristics, appropriate regulatory framework, funding mechanisms, stakeholder involvement, policy and strategy, technical experience and mechanism for the appropriation of adequate technical expertise will play key roles in preventing the implementation of remediation programs. All these barriers are discussed and some solutions are suggested. It is expected that lessons learned from the Poços de Caldas legacy site may stimulate advancement of more sustainable options in the development of future uranium production centers. PMID:21955840

  10. Pattern recognition and data mining techniques to identify factors in wafer processing and control determining overlay error

    NASA Astrophysics Data System (ADS)

    Lam, Auguste; Ypma, Alexander; Gatefait, Maxime; Deckers, David; Koopman, Arne; van Haren, Richard; Beltman, Jan

    2015-03-01

    On-product overlay can be improved through the use of context data from the fab and the scanner. Continuous improvements in lithography and processing performance over the past years have resulted in consequent overlay performance improvement for critical layers. Identification of the remaining factors causing systematic disturbances and inefficiencies will further reduce overlay. By building a context database, mappings between context, fingerprints and alignment & overlay metrology can be learned through techniques from pattern recognition and data mining. We relate structure ('patterns') in the metrology data to relevant contextual factors. Once understood, these factors could be moved to the known effects (e.g. the presence of systematic fingerprints from reticle writing error or lens and reticle heating). Hence, we build up a knowledge base of known effects based on data. Outcomes from such an integral ('holistic') approach to lithography data analysis may be exploited in a model-based predictive overlay controller that combines feedback and feedforward control [1]. Hence, the available measurements from scanner, fab and metrology equipment are combined to reveal opportunities for further overlay improvement which would otherwise go unnoticed.

  11. A multi-isotope approach to characterize acid mine drainage in a hardrock alpine mine, Chaffee Co., Colorado.

    NASA Astrophysics Data System (ADS)

    Cordalis, D.; Williams, M. W.; Wireman, M.; Michel, R. L.; Manning, A.

    2004-12-01

    Here we present information from an innovative suite of stable, radiogenic, and cosmogenic isotopes to better understand groundwater flowpaths and groundwater-surface water interactions in an applied acid mine drainage system. Stable water isotopes, tritium, helium-tritium, sulfur-35, and uranium 234/238 ratios were analyzed from precipitation, groundwater wells, interior mine drainages, and surface waters at the Mary Murphy Mine in Colorado to determine hydrologic transport mechanisms responsible for contaminated zinc releases. Hydrometric measurements suggested a snowmelt-driven pulse of elevated zinc in adit outflow. However, mixing models using stable water isotopes showed a regional groundwater signal in the adit outflow. Tritium values of 11 to 13 TU showed a slight enrichment of bomb spike water compared to snow values of about 9 TU, suggesting an older water source as well. Helium/tritium ratios on a subset of groundwater wells suggested that average residence times of alluvial wells ranged from 2.5 to 8 years. The combination of stable water isotopes and sulfur-35 (half-life of 87 days) showed that zinc-rich waters within the mine derived from infiltrating snowmelt more than a year old. However, measurement of sulfur-35 using low-level scintillation counts was compromised at times by the presence of uranium. We were able to remove the uranium through wet chemistry procedures, improving the accuracy of S-35 measurements. The U234/U238 ratio shows promise in discriminating between acid mine drainage and acid rock drainage. Acid rock drainage shows an unaltered ratio of 1:1, while acid mine drainage is enriched relative to the 1:1 equilibrium ratio. The combination of cosmogenic and stable isotopes within and near the Mary Murphy Mine may provide a useful tool for studying interactions between groundwater and surface water in a fractured rock setting. Remediation techniques can be directed more appropriately, and cost effectively, by the characterization of

  12. Web Mining

    NASA Astrophysics Data System (ADS)

    Fürnkranz, Johannes

    The World-Wide Web provides every internet citizen with access to an abundance of information, but it becomes increasingly difficult to identify the relevant pieces of information. Research in web mining tries to address this problem by applying techniques from data mining and machine learning to Web data and documents. This chapter provides a brief overview of web mining techniques and research areas, most notably hypertext classification, wrapper induction, recommender systems and web usage mining.

  13. Stochastic Modeling Approach for the Evaluation of Backbreak due to Blasting Operations in Open Pit Mines

    NASA Astrophysics Data System (ADS)

    Sari, Mehmet; Ghasemi, Ebrahim; Ataei, Mohammad

    2014-03-01

    Backbreak is an undesirable side effect of bench blasting operations in open pit mines. A large number of parameters affect backbreak, including controllable parameters (such as blast design parameters and explosive characteristics) and uncontrollable parameters (such as rock and discontinuities properties). The complexity of the backbreak phenomenon and the uncertainty in terms of the impact of various parameters makes its prediction very difficult. The aim of this paper is to determine the suitability of the stochastic modeling approach for the prediction of backbreak and to assess the influence of controllable parameters on the phenomenon. To achieve this, a database containing actual measured backbreak occurrences and the major effective controllable parameters on backbreak (i.e., burden, spacing, stemming length, powder factor, and geometric stiffness ratio) was created from 175 blasting events in the Sungun copper mine, Iran. From this database, first, a new site-specific empirical equation for predicting backbreak was developed using multiple regression analysis. Then, the backbreak phenomenon was simulated by the Monte Carlo (MC) method. The results reveal that stochastic modeling is a good means of modeling and evaluating the effects of the variability of blasting parameters on backbreak. Thus, the developed model is suitable for practical use in the Sungun copper mine. Finally, a sensitivity analysis showed that stemming length is the most important parameter in controlling backbreak.
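
    The two-step procedure described above (a regression equation for backbreak, then Monte Carlo simulation over the variable blast parameters) can be sketched as follows. Every number here is hypothetical: the coefficients are not the site-specific equation from the paper, and the normal input distributions and their parameters are assumptions chosen only to make the sketch runnable.

```python
import random
import statistics

# Hypothetical linear regression coefficients (NOT the published Sungun equation).
COEF = {
    "intercept": 0.5,
    "burden": 0.6,
    "spacing": 0.2,
    "stemming": 0.8,
    "powder_factor": -1.0,
    "stiffness": -0.3,
}


def predict_backbreak(burden, spacing, stemming, powder_factor, stiffness):
    """Evaluate the (hypothetical) regression equation for backbreak in metres."""
    return (
        COEF["intercept"]
        + COEF["burden"] * burden
        + COEF["spacing"] * spacing
        + COEF["stemming"] * stemming
        + COEF["powder_factor"] * powder_factor
        + COEF["stiffness"] * stiffness
    )


def simulate(n_trials=10000, seed=42):
    """Draw each input from an assumed normal distribution and collect predictions."""
    rng = random.Random(seed)
    return [
        predict_backbreak(
            burden=rng.gauss(4.0, 0.3),
            spacing=rng.gauss(5.0, 0.4),
            stemming=rng.gauss(3.5, 0.4),
            powder_factor=rng.gauss(0.4, 0.05),
            stiffness=rng.gauss(3.0, 0.3),
        )
        for _ in range(n_trials)
    ]


samples = simulate()
mean_backbreak = statistics.mean(samples)
p90_backbreak = statistics.quantiles(samples, n=10)[-1]  # 90th percentile
print(f"mean = {mean_backbreak:.2f} m, 90th percentile = {p90_backbreak:.2f} m")
```

    Repeating the simulation while varying one input distribution at a time is one simple way to approximate the kind of sensitivity analysis the abstract mentions.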

  14. Soil quality assessment using GIS-based chemometric approach and pollution indices: Nakhlak mining district, Central Iran.

    PubMed

    Moore, Farid; Sheykhi, Vahideh; Salari, Mohammad; Bagheri, Adel

    2016-04-01

    This paper is a comprehensive assessment of the quality of soil in the Nakhlak mining district in Central Iran with special reference to potentially toxic metals. In this regard, an integrated approach involving geostatistical, correlation matrix, pollution indices, and chemical fractionation measurement is used to evaluate selected potentially toxic metals in soil samples. The fractionation of metals indicated a relatively high variability. Some metals (Mo, Ag, and Pb) showed important enrichment in the bioavailable fractions (i.e., exchangeable and carbonate), whereas the residual fraction mostly comprised Sb and Cr. The Cd, Zn, Co, Ni, Mo, Cu, and As were retained in Fe-Mn oxide and oxidizable fractions, suggesting that they may be released to the environment by changes in physicochemical conditions. The spatial variability patterns of 11 soil heavy metals (Ag, As, Cd, Co, Cr, Cu, Mo, Ni, Pb, Sb, and Zn) were identified and mapped. The results demonstrated that Ag, As, Cd, Mo, Cu, Pb, Sb, and Zn pollution are associated with mineralized veins and mining operations in this area. Further environmental monitoring and remedial actions are required for management of soil heavy metals in the study area. The present study not only enhanced our knowledge regarding soil pollution in the study area but also introduced a better technique to analyze pollution indices by multivariate geostatistical methods. PMID:26956012

  15. An experimental approach to assessing the effects of mining subsidence on a flood meadow community

    SciTech Connect

    Benyon, P.R.; Humphries, R.N.; Gregson, K.; Marshall, S.; Peace, S.W.

    1998-12-31

    The Lower Derwent Valley (LDV) is a candidate Special Area of Conservation (SAC) under the provisions of the UK 1994 Conservation Regulations for its internationally important Alopecurus pratensis-Sanguisorba officinalis flood meadow vegetation. Mining from RJB's Selby Complex (UK's largest mine) has taken place around and under the LDV since the 1980s. Under the provisions of the Regulations the potential effects of mining subsidence have been recently reviewed. From field data and models it has been predicted that the resulting small amount of subsidence is unlikely to have a deleterious effect on the composition and extent of the key community. While the proposed long-term monitoring will verify the prediction, it will be some years before the results will be available. In order to identify incipient changes in grassland community and to implement any necessary mitigation measures before significant changes occur, a field experiment was set up in late 1996 to assess the effects of increased wetness and inundation which might be induced by subsidence. This involved the transplantation of turves from the different grassland communities within and along a previously defined gradient of relative wetness and inundation. The response of the communities to the different conditions is being monitored. The background studies and the results of the transplantation so far will be presented.

  16. A Control Chart Approach for Representing and Mining Data Streams with Shape Based Similarity

    SciTech Connect

    Omitaomu, Olufemi A

    2014-01-01

    The mining of data streams for online condition monitoring is a challenging task in several domains including (electric) power grid systems, intelligent manufacturing, and consumer science. Consider a power grid application in which thousands of sensors, called phasor measurement units, are deployed on the power grid network to continuously collect streams of digital data for real-time situational awareness and system management. Depending on design, each sensor could stream between ten and sixty data samples per second. The myriad of sensory data captured could convey deeper insights about sequences of events in real time and before major damage is done. However, the timely processing and analysis of these high-velocity and high-volume data streams is a challenge. Hence, a new data processing and transformation approach, based on the concept of control charts, for representing sequences of data streams from sensors is proposed. In addition, an application of the proposed approach for enhancing data mining tasks such as clustering using real-world power grid data streams is presented. The results indicate that the proposed approach is very efficient for data stream storage and manipulation.
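
    One way to picture a control-chart-based representation of a sensor stream is sketched below: control limits are estimated from a baseline window, and each new sample is reduced to a small symbol depending on where it falls relative to those limits. This is a generic Shewhart-style illustration under assumed values, not the transformation proposed in the paper; the readings, the 3-sigma rule, and the function names are all assumptions.

```python
import statistics


def control_limits(baseline, k=3.0):
    """Return (lower limit, centre line, upper limit) using mean +/- k standard deviations."""
    center = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return center - k * sigma, center, center + k * sigma


def encode_stream(samples, lower, upper):
    """Map each sample to -1 (below the lower limit), 0 (in control), or +1 (above the upper limit)."""
    return [(-1 if s < lower else 1 if s > upper else 0) for s in samples]


# Hypothetical frequency-like readings from one sensor.
baseline = [60.0, 60.1, 59.9, 60.0, 60.2, 59.8, 60.0, 60.1, 59.9, 60.0]
lower, center, upper = control_limits(baseline)

codes = encode_stream([60.0, 60.1, 61.5, 59.9, 58.0], lower, upper)
print(codes)
```

    A compact symbolic sequence of this kind is cheaper to store than raw samples and can be compared across sensors, which is the sort of benefit for clustering that the abstract points to.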

  17. The adaptive approach for storage assignment by mining data of warehouse management system for distribution centres

    NASA Astrophysics Data System (ADS)

    Ming-Huang Chiang, David; Lin, Chia-Ping; Chen, Mu-Chen

    2011-05-01

    Among distribution centre operations, order picking has been reported to be the most labour-intensive activity. Sophisticated storage assignment policies adopted to reduce the travel distance of order picking have been explored in the literature. Unfortunately, previous research has been devoted to locating entire products from scratch. Instead, this study proposes an adaptive approach, a Data Mining-based Storage Assignment approach (DMSA), to find the optimal storage assignment for newly delivered products that need to be put away when there is vacant shelf space in a distribution centre. In the DMSA, a new association index (AIX) is developed to evaluate the fitness between the put-away products and the unassigned storage locations by applying association rule mining. With AIX, the storage location assignment problem (SLAP) can be formulated and solved as a binary integer programming problem. To evaluate the performance of DMSA, a real-world order database of a distribution centre is obtained and used to compare the results from DMSA with a random assignment approach. It turns out that DMSA outperforms random assignment as the number of put-away products and the proportion of put-away products with high turnover rates increase.
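
    The combination of association mining and binary assignment can be illustrated with a toy example. The sketch below uses plain itemset support as a stand-in for the fitness between a put-away product and a vacant location, and brute-force enumeration in place of a binary integer programming solver; the paper's AIX definition is not given in the abstract, so the scoring, the order data, and the location layout here are all hypothetical.

```python
from itertools import permutations

# Hypothetical order history: each order is the set of products picked together.
orders = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "C"},
    {"B", "D"},
    {"A", "B"},
    {"C", "D"},
]


def support(items, orders):
    """Fraction of orders that contain every product in `items`."""
    return sum(1 for order in orders if items <= order) / len(orders)


# Hypothetical layout: each vacant location sits next to one stored product.
neighbours = {"loc1": "A", "loc2": "D"}
new_products = ["B", "C"]


def best_assignment(new_products, neighbours, orders):
    """Try every one-to-one assignment and keep the one with the highest total support."""
    best, best_score = None, -1.0
    for chosen in permutations(list(neighbours), len(new_products)):
        score = sum(
            support({product, neighbours[location]}, orders)
            for product, location in zip(new_products, chosen)
        )
        if score > best_score:
            best, best_score = dict(zip(new_products, chosen)), score
    return best, best_score


assignment, score = best_assignment(new_products, neighbours, orders)
print(assignment, round(score, 3))
```

    Enumeration only works for a handful of products; at realistic scale the same objective would be handed to an integer programming solver, as the abstract indicates.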

  18. A collaborative approach for mine waste cleanup -- the Animas River experience

    SciTech Connect

    Broetzman, G.; Parsons, G.

    1996-11-01

    An innovative, collaborative approach is underway in the Animas River Basin for addressing a myriad of inactive mine waste sites using a watershed framework. A group composed of all vested interests in the Basin, including the regulatory agencies, is evaluating all sites. Its intent is to select those sites that will lead to a cost-effective attainment of State-defined water quality improvements in the Animas River. This paper will address process, methodology, regulatory, and related issues associated with this overall effort.

  19. Knowledge Discovery using Domain-Concept Mining Approach for the Behavioral Risk Factor Surveillance System (BRFSS) Data

    PubMed Central

    Mahamaneerat, Wannapa Kay; Shyu, Chi-Ren

    2006-01-01

    The publicly available Behavioral Risk Factor Surveillance System (BRFSS) data is the largest telephone survey data set in the world. Often, the data set is under-utilized due to its size and the difficulty of comprehending and exploring the relationships among variables. With a traditional data mining approach, such as association rule (AR) mining, it is still not possible to discover valuable information with the existing computational power. To promote the usefulness of this rich data set efficiently, we propose a novel data mining approach called Domain-Concept Mining (DCM) that partitions data into groups of relevant domain-concepts, then extracts associations among variables from each partition. The findings from the DCM show that it can efficiently discover relevant information from the BRFSS with respect to the previously published literature. PMID:17238640

  20. VALUING ACID MINE DRAINAGE REMEDIATION OF IMPAIRED WATERWAYS IN WEST VIRGINIA: A HEDONIC MODELING APPROACH

    EPA Science Inventory

    States with active and abandoned mines face large private and public costs to remediate damage to streams and rivers from acid mine drainage (AMD), the metal rich runoff flowing primarily from abandoned mines and surface deposits of mine waste. AMD can lower stream and river pH ...

  1. A Data Mining Approach to Predict In Situ Detoxification Potential of Chlorinated Ethenes.

    PubMed

    Lee, Jaejin; Im, Jeongdae; Kim, Ungtae; Löffler, Frank E

    2016-05-17

    Despite advances in physicochemical remediation technologies, in situ bioremediation treatment based on Dehalococcoides mccartyi (Dhc) reductive dechlorination activity remains a cornerstone approach to remedy sites impacted with chlorinated ethenes. Selecting the best remedial strategy is challenging due to uncertainties and complexity associated with biological and geochemical factors influencing Dhc activity. Guidelines based on measurable biogeochemical parameters have been proposed, but contemporary efforts fall short of meaningfully integrating the available information. Extensive groundwater monitoring data sets have been collected for decades, but have not been systematically analyzed and used for developing tools to guide decision-making. In the present study, geochemical and microbial data sets collected from 35 wells at five contaminated sites were used to demonstrate that a data mining prediction model using the classification and regression tree (CART) algorithm can provide improved predictive understanding of a site's reductive dechlorination potential. The CART model successfully predicted the 3-month-ahead reductive dechlorination potential with 75.8% and 69.5% true positive rate (i.e., sensitivity) for the training set and the test set, respectively. The machine learning algorithm ranked parameters by relative importance for assessing in situ reductive dechlorination potential. The abundance of Dhc 16S rRNA genes, CH₄, Fe²⁺, NO₃⁻, NO₂⁻, and SO₄²⁻ concentrations, total organic carbon (TOC) amounts, and oxidation-reduction potential (ORP) displayed significant correlations (p < 0.01) with dechlorination potential, with NO₃⁻, NO₂⁻, and Fe²⁺ concentrations exhibiting precedence over other parameters. Contrary to prior efforts, the power of data mining approaches lies in the ability to discern synergetic effects between multiple parameters that affect reductive dechlorination activity. Overall, these findings demonstrate that data mining

  2. A multi-disciplinary approach to understanding the impacts of mines on traditional uses of water in Northern Mongolia.

    PubMed

    McIntyre, Neil; Bulovic, Nevenka; Cane, Isabel; McKenna, Phill

    2016-07-01

    Mongolia is an example of a nation where the rapidity of mining development is outpacing capacity to manage the potential land and water resources impacts. Further, Mongolia has a particular social and economic reliance on traditional uses of land and water, principally livestock herding. While some mining operations are setting high standards in protecting the natural resources surrounding the mine site, others have less incentive and capacity to do so and therefore are having adverse effects on surrounding communities. The paper describes a case study of the Sharyn Gol Soum in northern Mongolia where a range of mining types, from artisanal, small-scale mining to a large coal mine, operate alongside traditional herding lifestyles. A multi-disciplinary approach is taken to observe and attribute causes to the water resources impacts in the area. Surveys of the herding household community, land use mapping, and monitoring the spatial variations in water quality indicate deterioration of water resources. Collectively, the different sources of evidence suggest that the deterioration is mainly due to small-scale gold mining. The evidence included the perception of 78% of the interviewed herders that water quality had changed due to mining; a change in the footprint of small-scale gold mining from 2.8 to 15.2 km² during the period 1999 to 2015; and pH and sulphate values in 2015 consistently outside the ranges observed at a baseline site in the same region. It is concluded that the lack of baseline data and effective governance mechanisms are fundamental challenges that need to be addressed if Mongolia's transition to a mining economy is to be managed alongside sustainability of herder lifestyles. PMID:27016688

  3. A data mining approach to optimize pellets manufacturing process based on a decision tree algorithm.

    PubMed

    Ronowicz, Joanna; Thommes, Markus; Kleinebudde, Peter; Krysiński, Jerzy

    2015-06-20

    The present study is focused on the thorough analysis of cause-effect relationships between pellet formulation characteristics (pellet composition as well as process parameters) and the selected quality attribute of the final product. The quality of the pellets was expressed by their shape, measured as the aspect ratio. A data matrix for chemometric analysis consisted of 224 pellet formulations prepared with eight different active pharmaceutical ingredients and several excipients, using different extrusion/spheronization process conditions. The data set contained 14 input variables (both formulation and process variables) and one output variable (pellet aspect ratio). A tree regression algorithm consistent with the Quality by Design concept was applied to obtain deeper understanding and knowledge of formulation and process parameters affecting the final pellet sphericity. A clear, interpretable set of decision rules was generated. The spheronization speed, spheronization time, number of holes and water content of extrudate have been recognized as the key factors influencing pellet aspect ratio. The most spherical pellets were achieved by using a large number of holes during extrusion, a high spheronizer speed and longer time of spheronization. The described data mining approach enhances knowledge about the pelletization process and simultaneously facilitates searching for the optimal process conditions which are necessary to achieve ideal spherical pellets, resulting in good flow characteristics. This data mining approach can be taken into consideration by industrial formulation scientists to support rational decision making in the field of pellets technology. PMID:25835791

  4. Novel data-mining approach identifies biomarkers for diagnosis of Kawasaki disease

    PubMed Central

    Tremoulet, Adriana H.; Dutkowski, Janusz; Sato, Yuichiro; Kanegaye, John T.; Ling, Xuefeng B.; Burns, Jane C.

    2015-01-01

    Background As Kawasaki disease (KD) shares many clinical features with other more common febrile illnesses and misdiagnosis, leading to a delay in treatment, increases the risk of coronary artery damage, a diagnostic test for KD is urgently needed. We sought to develop a panel of biomarkers that could distinguish between acute KD patients and febrile controls (FC) with sufficient accuracy to be clinically useful. Methods Plasma samples were collected from three independent cohorts of FC and acute KD patients who met the American Heart Association definition for KD and presented within the first 10 days of fever. The levels of 88 biomarkers associated with inflammation were assessed by Luminex bead technology. Unsupervised clustering followed by supervised clustering using a Random Forest model was used to find a panel of candidate biomarkers. Results A panel of biomarkers commonly available in the hospital laboratory (absolute neutrophil count, erythrocyte sedimentation rate, alanine aminotransferase, gamma glutamyl transferase, concentrations of alpha-1-antitrypsin, C-reactive protein, and fibrinogen, and platelet count) accurately diagnosed 81 to 96% of KD patients in a series of three independent cohorts. Conclusions After prospective validation, this 8-biomarker panel may improve the recognition of KD. PMID:26237629

  5. tmVar: a text mining approach for extracting sequence variants in biomedical literature

    PubMed Central

    Wei, Chih-Hsuan; Harris, Bethany R.; Kao, Hung-Yu; Lu, Zhiyong

    2013-01-01

    Motivation: Text-mining mutation information from the literature becomes a critical part of the bioinformatics approach for the analysis and interpretation of sequence variations in complex diseases in the post-genomic era. It has also been used for assisting the creation of disease-related mutation databases. Most existing approaches are rule-based and focus on limited types of sequence variations, such as protein point mutations. Thus, extending their extraction scope requires significant manual effort in examining new instances and developing corresponding rules. As such, new automatic approaches are greatly needed for extracting different kinds of mutations with high accuracy. Results: Here, we report tmVar, a text-mining approach based on conditional random field (CRF) for extracting a wide range of sequence variants described at protein, DNA and RNA levels according to a standard nomenclature developed by the Human Genome Variation Society. By doing so, we cover several important types of mutations that were not considered in past studies. Using a novel CRF label model and feature set, our method achieves higher performance than a state-of-the-art method on both our corpus (91.4 versus 78.1% in F-measure) and their own gold standard (93.9 versus 89.4% in F-measure). These results suggest that tmVar is a high-performance method for mutation extraction from biomedical literature. Availability: tmVar software and its corpus of 500 manually curated abstracts are available for download at http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/pub/tmVar. Contact: zhiyong.lu@nih.gov PMID:23564842

  6. Use of lead isotopes to identify sources of metal and metalloid contaminants in atmospheric aerosol from mining operations.

    PubMed

    Félix, Omar I; Csavina, Janae; Field, Jason; Rine, Kyle P; Sáez, A Eduardo; Betterton, Eric A

    2015-03-01

    Mining operations are a potential source of metal and metalloid contamination by atmospheric particulate generated from smelting activities, as well as from erosion of mine tailings. In this work, we show how lead isotopes can be used for source apportionment of metal and metalloid contaminants from the site of an active copper mine. Analysis of atmospheric aerosol shows two distinct isotopic signatures: one prevalent in fine particles (<1μm aerodynamic diameter) while the other corresponds to coarse particles as well as particles in all size ranges from a nearby urban environment. The lead isotopic ratios found in the fine particles are equal to those of the mine that provides the ore to the smelter. Topsoil samples at the mining site show concentrations of Pb and As decreasing with distance from the smelter. Isotopic ratios for the sample closest to the smelter (650m) and from topsoil at all sample locations, extending to more than 1km from the smelter, were similar to those found in fine particles in atmospheric dust. The results validate the use of lead isotope signatures for source apportionment of metal and metalloid contaminants transported by atmospheric particulate. PMID:25496740

  7. Use of Lead Isotopes to Identify Sources of Metal and Metalloid Contaminants in Atmospheric Aerosol from Mining Operations

    PubMed Central

    Félix, Omar I.; Csavina, Janae; Field, Jason; Rine, Kyle P.; Sáez, A. Eduardo; Betterton, Eric A.

    2014-01-01

    Mining operations are a potential source of metal and metalloid contamination by atmospheric particulate generated from smelting activities, as well as from erosion of mine tailings. In this work, we show how lead isotopes can be used for source apportionment of metal and metalloid contaminants from the site of an active copper mine. Analysis of atmospheric aerosol shows two distinct isotopic signatures: one prevalent in fine particles (< 1 μm aerodynamic diameter) while the other corresponds to coarse particles as well as particles in all size ranges from a nearby urban environment. The lead isotopic ratios found in the fine particles are equal to those of the mine that provides the ore to the smelter. Topsoil samples at the mining site show concentrations of Pb and As decreasing with distance from the smelter. Isotopic ratios for the sample closest to the smelter (650 m) and from topsoil at all sample locations, extending to more than 1 km from the smelter, were similar to those found in fine particles in atmospheric dust. The results validate the use of lead isotope signatures for source apportionment of metal and metalloid contaminants transported by atmospheric particulate. PMID:25496740

  8. Order Batching in Warehouses by Minimizing Total Tardiness: A Hybrid Approach of Weighted Association Rule Mining and Genetic Algorithms

    PubMed Central

    Taheri, Shahrooz; Mat Saman, Muhamad Zameri; Wong, Kuan Yew

    2013-01-01

    One of the cost-intensive issues in managing warehouses is the order picking problem, which deals with the retrieval of items from their storage locations in order to meet customer requests. Many solution approaches have been proposed to minimize traveling distance in the process of order picking. However, in practice, customer orders have to be completed by certain due dates in order to avoid tardiness, which is neglected in most of the related scientific papers. Consequently, we propose a novel four-phase solution approach to minimize tardiness. First, weighted association rule mining is used to calculate associations between orders with respect to their due dates. Next, a batching model based on binary integer programming is formulated to maximize the associations between orders within each batch. Subsequently, in the order picking phase, a Genetic Algorithm integrated with the Traveling Salesman Problem is used to identify the most suitable travel path. Finally, the Genetic Algorithm is applied to sequence the constructed batches in order to minimize tardiness. Illustrative examples and comparisons are presented to demonstrate the proficiency and solution quality of the proposed approach. PMID:23864823

  9. Citation Mining: Integrating Text Mining and Bibliometrics for Research User Profiling.

    ERIC Educational Resources Information Center

    Kostoff, Ronald N.; del Rio, J. Antonio; Humenik, James A.; Garcia, Esther Ofilia; Ramirez, Ana Maria

    2001-01-01

    Discusses the importance of identifying the users and impact of research, and describes an approach for identifying the pathways through which research can impact other research, technology development, and applications. Describes a study that used citation mining, an integration of citation bibliometrics and text mining, on articles from the…

  10. Using a Text-Mining Approach to Evaluate the Quality of Nursing Records.

    PubMed

    Chang, Hsiu-Mei; Chiou, Shwu-Fen; Liu, Hsiu-Yun; Yu, Hui-Chu

    2016-01-01

    Nursing records in Taiwan have been computerized, but their quality has rarely been discussed. Therefore, this study employed a text-mining approach and a cross-sectional retrospective research design to evaluate the quality of electronic nursing records at a medical center in Northern Taiwan. SAS Text Miner software Version 13.2 was employed to analyze unstructured nursing event records. The results show that SAS Text Miner is suitable for developing a text-mining model for validating nursing records. The sensitivity of SAS Text Miner was approximately 0.94, and the specificity and accuracy were 0.99. Thus, SAS Text Miner software is an effective tool for auditing unstructured electronic nursing records. PMID:27332355

  11. Colorado School of Mines behavioral approach to the 1995 UGR competition

    NASA Astrophysics Data System (ADS)

    Murphy, Robin R.; Hoff, William A.; Blitch, John; Gough, Val; Hawkins, Dale; Hoffman, James C.; Krosley, Ramon; Lyons, Torsten; Mali, Amol; MacMillan, James; Warshawsky, Steven

    1995-12-01

    The Colorado School of Mines (CSM) entry placed fourth in the 1995 International Unmanned Ground Robotics Competition sponsored by the Association for Unmanned Vehicles (AUVS). Clementine 2, a battery powered children's jeep outfitted with a 100 MHz Pentium field computer, a camcorder, and a panning ultrasonic range finder served as the platform. The objectives of the CSM team were to gain familiarity with the CSM architecture by applying it to a well defined problem, evaluate existing computer vision based road following techniques, and gain practical experience in using multiple sensing modalities. The entry used the behavioral portion of the CSM hybrid deliberative/reactive architecture, which divided robot activities into four strategic and tactical behaviors: vision based follow-path, ultrasonic based avoid-obstacle, pan-camera, and speed-control using inclinometers. This paper details the motivation behind the CSM entry, the approach taken, and lessons learned.

  12. A data mining approach to predict in situ chlorinated ethene detoxification potential

    NASA Astrophysics Data System (ADS)

    Lee, J.; Im, J.; Kim, U.; Loeffler, F. E.

    2015-12-01

    Despite major advances in physicochemical remediation technologies, in situ biostimulation and bioaugmentation treatment aimed at stimulating Dehalococcoides mccartyi (Dhc) reductive dechlorination activity remains a cornerstone approach to remedy sites impacted with chlorinated ethenes. In practice, selecting the best remedial strategy is challenging due to uncertainties associated with the microbiology (e.g., presence and activity of Dhc) and geochemical factors influencing Dhc activity. Extensive groundwater datasets collected over decades of monitoring exist, but have not been systematically analyzed. In the present study, geochemical and microbial data sets collected from 35 wells at 5 contaminated sites were used to develop a predictive empirical model using a machine learning algorithm (i) to rank the relative importance of parameters that affect in situ reductive dechlorination potential, and (ii) to provide recommendations for selecting the optimal remediation strategy at a specific site. Classification and regression tree (CART) analysis was applied, and a representative classification tree model was developed that allowed short-term prediction of dechlorination potential. Indirect indicators for low dissolved oxygen (e.g., low NO3- and NO2-, high Fe2+ and CH4) were the most influential factors for predicting dechlorination potential, followed by total organic carbon content (TOC) and Dhc cell abundance. These findings indicate that machine learning-based data mining techniques applied to groundwater monitoring data can lead to the development of predictive groundwater remediation models. A major need for improving the predictive capabilities of the data mining approach is a curated, up-to-date and comprehensive collection of groundwater monitoring data.

  13. EVALUATION OF FUGITIVE DUST EMISSIONS FROM MINING

    EPA Science Inventory

    This evaluation of fugitive dust air pollution from mining operations identifies and compiles currently available information on emission sources and rates, regulatory approaches, control techniques, measuring and monitoring techniques, health and welfare effects, and research pr...

  14. Identifying medical terms in patient-authored text: a crowdsourcing-based approach

    PubMed Central

    MacLean, Diana Lynn; Heer, Jeffrey

    2013-01-01

    Background and objective As people increasingly engage in online health-seeking behavior and contribute to health-oriented websites, the volume of medical text authored by patients and other medical novices grows rapidly. However, we lack an effective method for automatically identifying medical terms in patient-authored text (PAT). We demonstrate that crowdsourcing PAT medical term identification tasks to non-experts is a viable method for creating large, accurately-labeled PAT datasets; moreover, such datasets can be used to train classifiers that outperform existing medical term identification tools. Materials and methods To evaluate the viability of using non-expert crowds to label PAT, we compare expert (registered nurses) and non-expert (Amazon Mechanical Turk workers; Turkers) responses to a PAT medical term identification task. Next, we build a crowd-labeled dataset comprising 10 000 sentences from MedHelp. We train two models on this dataset and evaluate their performance, as well as that of MetaMap, Open Biomedical Annotator (OBA), and NaCTeM's TerMINE, against two gold standard datasets: one from MedHelp and the other from CureTogether. Results When aggregated according to a corroborative voting policy, Turker responses predict expert responses with an F1 score of 84%. A conditional random field (CRF) trained on 10 000 crowd-labeled MedHelp sentences achieves an F1 score of 78% against the CureTogether gold standard, widely outperforming OBA (47%), TerMINE (43%), and MetaMap (39%). A failure analysis of the CRF suggests that misclassified terms are likely to be either generic or rare. Conclusions Our results show that combining statistical models sensitive to sentence-level context with crowd-labeled data is a scalable and effective technique for automatically identifying medical terms in PAT. PMID:23645553

  15. A Network Biology Approach Identifies Molecular Cross-Talk between Normal Prostate Epithelial and Prostate Carcinoma Cells.

    PubMed

    Trevino, Victor; Cassese, Alberto; Nagy, Zsuzsanna; Zhuang, Xiaodong; Herbert, John; Antzack, Philipp; Clarke, Kim; Davies, Nicholas; Rahman, Ayesha; Campbell, Moray J; Guindani, Michele; Bicknell, Roy; Vannucci, Marina; Falciani, Francesco

    2016-04-01

    The advent of functional genomics has enabled the genome-wide characterization of the molecular state of cells and tissues, virtually at every level of biological organization. The difficulty in organizing and mining this unprecedented amount of information has stimulated the development of computational methods designed to infer the underlying structure of regulatory networks from observational data. These important developments had a profound impact in biological sciences since they triggered the development of a novel data-driven investigative approach. In cancer research, this strategy has been particularly successful. It has contributed to the identification of novel biomarkers, to a better characterization of disease heterogeneity and to a more in depth understanding of cancer pathophysiology. However, so far these approaches have not explicitly addressed the challenge of identifying networks representing the interaction of different cell types in a complex tissue. Since these interactions represent an essential part of the biology of both diseased and healthy tissues, it is of paramount importance that this challenge is addressed. Here we report the definition of a network reverse engineering strategy designed to infer directional signals linking adjacent cell types within a complex tissue. The application of this inference strategy to prostate cancer genome-wide expression profiling data validated the approach and revealed that normal epithelial cells exert an anti-tumour activity on prostate carcinoma cells. Moreover, by using a Bayesian hierarchical model integrating genetics and gene expression data and combining this with survival analysis, we show that the expression of putative cell communication genes related to focal adhesion and secretion is affected by epistatic gene copy number variation and it is predictive of patient survival. 
Ultimately, this study represents a generalizable approach to the challenge of deciphering cell communication networks

  16. A Network Biology Approach Identifies Molecular Cross-Talk between Normal Prostate Epithelial and Prostate Carcinoma Cells

    PubMed Central

    Trevino, Victor; Cassese, Alberto; Nagy, Zsuzsanna; Zhuang, Xiaodong; Herbert, John; Antzack, Philipp; Clarke, Kim; Davies, Nicholas; Rahman, Ayesha; Campbell, Moray J.; Bicknell, Roy; Vannucci, Marina; Falciani, Francesco

    2016-01-01

    Abstract The advent of functional genomics has enabled the genome-wide characterization of the molecular state of cells and tissues, virtually at every level of biological organization. The difficulty in organizing and mining this unprecedented amount of information has stimulated the development of computational methods designed to infer the underlying structure of regulatory networks from observational data. These important developments had a profound impact in biological sciences since they triggered the development of a novel data-driven investigative approach. In cancer research, this strategy has been particularly successful. It has contributed to the identification of novel biomarkers, to a better characterization of disease heterogeneity and to a more in depth understanding of cancer pathophysiology. However, so far these approaches have not explicitly addressed the challenge of identifying networks representing the interaction of different cell types in a complex tissue. Since these interactions represent an essential part of the biology of both diseased and healthy tissues, it is of paramount importance that this challenge is addressed. Here we report the definition of a network reverse engineering strategy designed to infer directional signals linking adjacent cell types within a complex tissue. The application of this inference strategy to prostate cancer genome-wide expression profiling data validated the approach and revealed that normal epithelial cells exert an anti-tumour activity on prostate carcinoma cells. Moreover, by using a Bayesian hierarchical model integrating genetics and gene expression data and combining this with survival analysis, we show that the expression of putative cell communication genes related to focal adhesion and secretion is affected by epistatic gene copy number variation and it is predictive of patient survival. 
Ultimately, this study represents a generalizable approach to the challenge of deciphering cell communication

  17. Mining quasi-bicliques from HIV-1-human protein interaction network: a multiobjective biclustering approach.

    PubMed

    Maulik, Ujjwal; Mukhopadhyay, Anirban; Bhattacharyya, Malay; Kaderali, Lars; Brors, Benedikt; Bandyopadhyay, Sanghamitra; Eils, Roland

    2013-01-01

    In this work, we model the problem of mining quasi-bicliques from weighted viral-host protein-protein interaction network as a biclustering problem for identifying strong interaction modules. In this regard, a multiobjective genetic algorithm-based biclustering technique is proposed that simultaneously optimizes three objective functions to obtain dense biclusters having high mean interaction strengths. The performance of the proposed technique has been compared with that of other existing biclustering methods on an artificial data. Subsequently, the proposed biclustering method is applied on the records of biologically validated and predicted interactions between a set of HIV-1 proteins and a set of human proteins to identify strong interaction modules. For this, the entire interaction information is realized as a bipartite graph. We have further investigated the biological significance of the obtained biclusters. The human proteins involved in the strong interaction module have been found to share common biological properties and they are identified as the gateways of viral infection leading to various diseases. These human proteins can be potential drug targets for developing anti-HIV drugs. PMID:23929866

  18. Mining Quasi-Bicliques from HIV-1--Human Protein Interaction Network: A Multiobjective Biclustering Approach.

    PubMed

    Maulik, Ujjwal; Mukhopadhyay, Anirban; Bhattacharyya, Malay; Kaderali, Lars; Brors, Benedikt; Bandyopadhyay, Sanghamitra; Eils, Roland

    2012-11-28

    In this work, we model the problem of mining quasi-bicliques from weighted viral-host protein-protein interaction network as a biclustering problem for identifying strong interaction modules. In this regard, a multiobjective genetic algorithm based biclustering technique is proposed that simultaneously optimizes three objective functions to obtain dense biclusters having high mean interaction strengths. The performance of the proposed technique has been compared with that of other existing biclustering methods on an artificial data. Subsequently, the proposed biclustering method is applied on the records of biologically validated and predicted interactions between a set of HIV-1 proteins and a set of human proteins to identify strong interaction modules. For this, the entire interaction information is realized as a bipartite graph. We have further investigated the biological significance of the obtained biclusters. The human proteins involved in the strong interaction module have been found to share common biological properties and they are identified as the gateways of viral infection leading to various diseases. These human proteins can be potential drug targets for developing anti-HIV drugs. PMID:23209057

  19. Integrating Communication into Engineering Curricula: An Interdisciplinary Approach to Facilitating Transfer at New Mexico Institute of Mining and Technology

    ERIC Educational Resources Information Center

    Ford, Julie Dyke

    2012-01-01

    This program profile describes a new approach towards integrating communication within Mechanical Engineering curricula. The author, who holds a joint appointment between Technical Communication and Mechanical Engineering at New Mexico Institute of Mining and Technology, has been collaborating with Mechanical Engineering colleagues to establish a…

  20. An Approach to Developing Independent Learning and Non-Technical Skills Amongst Final Year Mining Engineering Students

    ERIC Educational Resources Information Center

    Knobbs, C. G.; Grayson, D. J.

    2012-01-01

    There is mounting evidence to show that engineers need more than technical skills to succeed in industry. This paper describes a curriculum innovation in which so-called "soft" skills, specifically inter-personal and intra-personal skills, were integrated into a final year mining engineering course. The instructional approach was designed to…

  1. The Voice of Chinese Health Consumers: A Text Mining Approach to Web-Based Physician Reviews

    PubMed Central

    Zhang, Kunpeng

    2016-01-01

    experience of finding doctors, doctors’ technical skills and bedside manner, general appreciation from patients, and description of various symptoms. Conclusions To the best of our knowledge, our work is the first study using an automated text-mining approach to analyze a large amount of unstructured textual data of Web-based physician reviews in China. Based on our analysis, we found that Chinese reviewers mainly concentrate on a few popular topics. This is consistent with the goal of Chinese online health platforms and demonstrates the health care focus in China’s health care system. Our text-mining approach reveals a new research area on how to use big data to help health care providers, health care administrators, and policy makers hear patient voices, target patient concerns, and improve the quality of care in this age of patient-centered care. Also, on the health care consumer side, our text mining technique helps patients make more informed decisions about which specialists to see without reading thousands of reviews, which is simply not feasible. In addition, our comparison analysis of Web-based physician reviews in China and the United States also indicates some cultural differences. PMID:27165558

  2. Identifying diagnostically-relevant resting state brain functional connectivity in the ventral posterior complex via genetic data mining in autism spectrum disorder.

    PubMed

    Baldwin, Philip R; Curtis, Kaylah N; Patriquin, Michelle A; Wolf, Varina; Viswanath, Humsini; Shaw, Chad; Sakai, Yasunari; Salas, Ramiro

    2016-05-01

    Exome sequencing and copy number variation analyses continue to provide novel insight into the biological bases of autism spectrum disorder (ASD). The growing speed at which massive genetic data are produced causes serious lags in analysis and interpretation of the data. Thus, there is a need to develop systematic genetic data mining processes that facilitate efficient analysis of large datasets. We report a new genetic data mining system, ProcessGeneLists, and integrated a list of ASD-related genes with currently available resources in gene expression and functional connectivity of the human brain. Our data-mining program successfully identified three primary regions of interest (ROIs) in the mouse brain: inferior colliculus, ventral posterior complex of the thalamus (VPC), and parafascicular nucleus (PFn). To understand its pathogenic relevance in ASD, we examined the resting state functional connectivity (RSFC) of the homologous ROIs in human brain with other brain regions that were previously implicated in the neuro-psychiatric features of ASD. Among them, the RSFC of the VPC with the medial frontal gyrus (MFG) was significantly more anticorrelated, whereas the RSFC of the PN with the globus pallidus was significantly increased in children with ASD compared with healthy children. Moreover, greater values of RSFC between VPC and MFG were correlated with severity index and repetitive behaviors in children with ASD. No significant RSFC differences were detected in adults with ASD. Together, these data demonstrate the utility of our data-mining program through identifying the aberrant connectivity of thalamo-cortical circuits in children with ASD. Autism Res 2016, 9: 553-562. © 2015 International Society for Autism Research, Wiley Periodicals, Inc. PMID:26451751

  3. Correlation of HIV protease structure with Indinavir resistance: a data mining and neural networks approach

    NASA Astrophysics Data System (ADS)

    Draghici, Sorin; Cumberland, Lonnie T., Jr.; Kovari, Ladislau C.

    2000-04-01

    This paper presents some results of data mining HIV genotypic and structural data. Our aim is to try to relate structural features of HIV enzymes essential to its reproductive abilities to the drug resistance phenomenon. This paper concentrates on the HIV protease enzyme and Indinavir, which is one of the FDA approved protease inhibitors. Our starting point was the current list of HIV mutations related to drug resistance. We used the fact that some molecular structures determined through high resolution X-ray crystallography were available for the protease-Indinavir complex. Starting with these structures and the known mutations, we modelled the mutant proteases and studied the pattern of atomic contacts between the protease and the drug. After suitable pre-processing, these patterns have been used as the input of our data mining process. We have used both supervised and unsupervised learning techniques with the aim of understanding the relationship between structural features at a molecular level and resistance to Indinavir. The supervised learning was aimed at predicting IC90 values for arbitrary mutants. The unsupervised learning, a self-organizing feature map (SOFM), was aimed at identifying those structural features that are important for drug resistance and discovering a classifier based on such features. We have used validation and cross validation to test the generalization abilities of the learning paradigm we have designed. The straightforward supervised learning was able to learn very successfully but validation results are less than satisfactory. This is due to the insufficient number of patterns in the training set which in turn is due to the scarcity of the available data. The data mining using SOFM was very successful. We have managed to distinguish between resistant and non-resistant mutants using structural features. 
We have been able to divide all reported HIV mutants into several categories based on their 3- dimensional molecular structures and the pattern of contacts between the mutant protease and

  4. Identifying Key Priorities for Future Palliative Care Research Using an Innovative Analytic Approach

    PubMed Central

    Riffin, Catherine; Pillemer, Karl; Chen, Emily K.; Warmington, Marcus; Adelman, Ronald D.; Reid, M. C.

    2015-01-01

    Using an innovative approach, we identified research priorities in palliative care to guide future research initiatives. We searched 7 databases (2005–2012) for review articles published on the topics of palliative and hospice–end-of-life care. The identified research recommendations (n = 648) fell into 2 distinct categories: (1) ways to improve methodological approaches and (2) specific topic areas in need of future study. The most commonly cited priority within the theme of methodological approaches was the need for enhanced rigor. Specific topics in need of future study included perspectives and needs of patients, relatives, and providers; underrepresented populations; decision-making; cost-effectiveness; provider education; spirituality; service use; and inter-disciplinary approaches to delivering palliative care. This review underscores the need for additional research on specific topics and methodologically rigorous research to inform health policy and practice. PMID:25393169

  5. BIOADI: a machine learning approach to identifying abbreviations and definitions in biological literature

    PubMed Central

    2009-01-01

    Background: To automatically process large quantities of biological literature for knowledge discovery and information curation, text mining tools are becoming essential. Abbreviation recognition is related to NER and can be considered a pair recognition task of a term and its corresponding abbreviation from free text. The successful identification of an abbreviation and its corresponding definition is not only a prerequisite to index terms of text databases to produce articles of related interests, but also a building block to improve existing gene mention tagging and gene normalization tools. Results: Our approach to abbreviation recognition (AR) is based on machine learning, which exploits a novel set of rich features to learn rules from training data. Tested on the AB3P corpus, our system demonstrated an F-score of 89.90% with 95.86% precision at 84.64% recall, higher than the result achieved by the existing best AR performance system. We also annotated a new corpus of 1200 PubMed abstracts which was derived from the BioCreative II gene normalization corpus. On our annotated corpus, our system achieved an F-score of 86.20% with 93.52% precision at 79.95% recall, which also outperforms all tested systems. Conclusion: By applying our system to extract all short form-long form pairs from all available PubMed abstracts, we have constructed BIOADI. Mining BIOADI reveals many interesting trends in biomedical research. We also provide off-line AR software in the download section on http://bioagent.iis.sinica.edu.tw/BIOADI/. PMID:19958517
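
    As a quick consistency check on the figures quoted in this abstract, the F-score is the harmonic mean of precision and recall, F = 2PR / (P + R). The short Python sketch below recomputes both reported values; the function name `f_score` is ours and is not part of the BIOADI software.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs quoted in the abstract, in percent.
ab3p_corpus = f_score(95.86, 84.64)
new_corpus = f_score(93.52, 79.95)
print(f"{ab3p_corpus:.2f} {new_corpus:.2f}")  # prints "89.90 86.20"
```

    Both results agree with the F-scores stated in the abstract to two decimal places.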

  6. Model-based approach to the detection and classification of mines in sidescan sonar.

    PubMed

    Reed, Scott; Petillot, Yvan; Bell, Judith

    2004-01-10

    This paper presents a model-based approach to mine detection and classification by use of sidescan sonar. Advances in autonomous underwater vehicle technology have increased the interest in automatic target recognition systems in an effort to automate a process that is currently carried out by a human operator. Current automated systems generally require training and thus produce poor results when the test data set is different from the training set. This has led to research into unsupervised systems, which are able to cope with the large variability in conditions and terrains seen in sidescan imagery. The system presented in this paper first detects possible minelike objects using a Markov random field model, which operates well on noisy images, such as sidescan, and allows a priori information to be included through the use of priors. The highlight and shadow regions of the object are then extracted with a cooperating statistical snake, which assumes these regions are statistically separate from the background. Finally, a classification decision is made using Dempster-Shafer theory, where the extracted features are compared with synthetic realizations generated with a sidescan sonar simulator model. Results for the entire process are shown on real sidescan sonar data. Similarities between the sidescan sonar and synthetic aperture radar (SAR) imaging processes ensure that the approach outlined here could be applied to SAR image analysis. PMID:14735943
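
    The final classification step relies on Dempster-Shafer theory. The abstract does not give the paper's frame of discernment or how masses are derived from the extracted features, so the Python sketch below only illustrates the generic Dempster rule of combination. The class labels ("mine", "rock") and all mass values are made up for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses to its belief mass.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass falling on contradictory pairs
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize so the remaining masses sum to one.
    return {h: m / (1.0 - conflict) for h, m in combined.items()}


MINE = frozenset({"mine"})
ROCK = frozenset({"rock"})
EITHER = MINE | ROCK  # ignorance: mass not committed to a single class

# Hypothetical evidence from two feature sources (values are illustrative).
from_highlight = {MINE: 0.6, EITHER: 0.4}
from_shadow = {MINE: 0.5, ROCK: 0.3, EITHER: 0.2}

fused = combine(from_highlight, from_shadow)
for hypothesis, mass in fused.items():
    print(sorted(hypothesis), round(mass, 3))
```

    In this toy case the conflicting mass is 0.18, so each surviving product is divided by 0.82 before a decision is taken.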

  7. Model-based approach to the detection and classification of mines in sidescan sonar

    NASA Astrophysics Data System (ADS)

    Reed, Scott; Petillot, Yvan; Bell, Judith

    2004-01-01

    This paper presents a model-based approach to mine detection and classification by use of sidescan sonar. Advances in autonomous underwater vehicle technology have increased the interest in automatic target recognition systems in an effort to automate a process that is currently carried out by a human operator. Current automated systems generally require training and thus produce poor results when the test data set is different from the training set. This has led to research into unsupervised systems, which are able to cope with the large variability in conditions and terrains seen in sidescan imagery. The system presented in this paper first detects possible minelike objects using a Markov random field model, which operates well on noisy images, such as sidescan, and allows a priori information to be included through the use of priors. The highlight and shadow regions of the object are then extracted with a cooperating statistical snake, which assumes these regions are statistically separate from the background. Finally, a classification decision is made using Dempster-Shafer theory, where the extracted features are compared with synthetic realizations generated with a sidescan sonar simulator model. Results for the entire process are shown on real sidescan sonar data. Similarities between the sidescan sonar and synthetic aperture radar (SAR) imaging processes ensure that the approach outlined here could be applied to SAR image analysis.

  8. Systematic Analysis of the Molecular Mechanism Underlying Decidualization Using a Text Mining Approach.

    PubMed

    Liu, Ji-Long; Wang, Tong-Song

    2015-01-01

    Decidualization is a crucial process for successful embryo implantation and pregnancy in humans. Defects in decidualization during early pregnancy are associated with several pregnancy complications, such as pre-eclampsia, intrauterine growth restriction and recurrent pregnancy loss. However, the mechanism underlying decidualization remains poorly understood. In the present study, we performed a systematic analysis of decidualization-related genes using text mining. We identified 286 genes for humans and 287 genes for mice, with an overlap of 111 genes shared by both species. Through enrichment tests, we demonstrated that although divergence was observed, the majority of enriched gene ontology terms and pathways were shared by both species, suggesting that functional categories were more conserved than individual genes. We further constructed a decidualization-related protein-protein interaction network consisting of 344 nodes connected via 1,541 edges. We prioritized genes in this network and identified 12 genes that may be key regulators of decidualization. These findings provide some clues for further research on the mechanism underlying decidualization. PMID:26222155

  9. Systematic Analysis of the Molecular Mechanism Underlying Decidualization Using a Text Mining Approach

    PubMed Central

    Liu, Ji-Long; Wang, Tong-Song

    2015-01-01

    Decidualization is a crucial process for successful embryo implantation and pregnancy in humans. Defects in decidualization during early pregnancy are associated with several pregnancy complications, such as pre-eclampsia, intrauterine growth restriction and recurrent pregnancy loss. However, the mechanism underlying decidualization remains poorly understood. In the present study, we performed a systematic analysis of decidualization-related genes using text mining. We identified 286 genes for humans and 287 genes for mice, with an overlap of 111 genes shared by both species. Through enrichment tests, we demonstrated that although divergence was observed, the majority of enriched gene ontology terms and pathways were shared by both species, suggesting that functional categories were more conserved than individual genes. We further constructed a decidualization-related protein-protein interaction network consisting of 344 nodes connected via 1,541 edges. We prioritized genes in this network and identified 12 genes that may be key regulators of decidualization. These findings provide some clues for further research on the mechanism underlying decidualization. PMID:26222155

  10. Optimizing data collection for public health decisions: a data mining approach

    PubMed Central

    2014-01-01

    Background: Collecting data can be cumbersome and expensive. Lack of relevant, accurate and timely data for research to inform policy may negatively impact public health. The aim of this study was to test whether the careful removal of items from two community nutrition surveys, guided by a data mining technique called feature selection, can (a) identify a reduced dataset while (b) not damaging the signal inside that data. Methods: The Nutrition Environment Measures Surveys for stores (NEMS-S) and restaurants (NEMS-R) were completed on 885 retail food outlets in two counties in West Virginia between May and November of 2011. A reduced dataset was identified for each outlet type using feature selection. Coefficients from linear regression modeling were used to weight items in the reduced datasets. Weighted item values were summed with the error term to compute reduced item survey scores. Scores produced by the full survey were compared to the reduced item scores using a Wilcoxon rank-sum test. Results: Feature selection identified 9 store and 16 restaurant survey items as significant predictors of the score produced from the full survey. The linear regression models built from the reduced feature sets had R² values of 92% and 94% for restaurant and grocery store data, respectively. Conclusions: While there are many potentially important variables in any domain, the most useful set may only be a small subset. The use of feature selection in the initial phase of data collection to identify the most influential variables may be a useful tool to greatly reduce the amount of data needed, thereby reducing cost. PMID:24919484
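
    The scoring idea described in the Methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the item names, weights, constant term, and outlet data are invented, and the constant is our stand-in for the extra term the abstract says is added to the weighted sum. R² is computed as 1 - SS_res / SS_tot against the full-survey scores.

```python
def reduced_score(items, weights, constant):
    """Weighted sum of the retained survey items plus a constant term.

    Items that feature selection dropped (no entry in `weights`) are ignored.
    """
    return constant + sum(
        weights[name] * value for name, value in items.items() if name in weights
    )


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical regression weights for two retained items (illustrative only).
weights = {"milk": 2.0, "fruit": 1.5}
constant = 1.0

# Hypothetical outlets: item responses and the score from the full survey.
outlets = [
    ({"milk": 1, "fruit": 2, "bread": 1}, 6.5),
    ({"milk": 0, "fruit": 1, "bread": 0}, 2.0),
    ({"milk": 2, "fruit": 3, "bread": 1}, 10.0),
    ({"milk": 1, "fruit": 0, "bread": 0}, 3.5),
]

full_scores = [full for _, full in outlets]
reduced_scores = [reduced_score(items, weights, constant) for items, _ in outlets]
r2 = r_squared(full_scores, reduced_scores)
print(reduced_scores, round(r2, 3))
```

    The study itself compared full and reduced scores with a Wilcoxon rank-sum test; that step is not reproduced here.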

  11. The impact of vascular diameter ratio on hemodialysis maturation time: Evidence from data mining approaches and thermodynamics law

    PubMed Central

    Rezapour, Mohammad; Taran, Somayeh; Balin Parast, Mahmood; Khavanin Zadeh, Morteza

    2016-01-01

    Background: Vascular access (VA) is essential for blood circulation in hemodialysis (HD). An arteriovenous fistula (AVF) is a suitable procedure for gaining VA. An AVF is considered mature when it can be cannulated for HD. This study aimed to discover the parameters that effectively reduce the duration between VA and the start of HD, which represents the maturation time (MT). Methods: Ninety-six patients who underwent AVF creation were selected for this study. The decision tree method, a data mining approach for data classification, was applied using the CART/C4.5 algorithm. The vascular diameter ratio (VDR) coefficient was obtained (VDR = artery diameter / vein diameter). Results: We investigated the relationship between VDR and MT and found that MT is inversely related to VDR in elderly patients, whereas this relation was direct in younger patients. Conclusion: The analysis revealed a Spearman's correlation between vein diameter and MT. MT decreases when the diameters of the vein and artery are close to one another. This study can help surgeons identify high-risk patients likely to have a prolonged MT for HD. PMID:27453889

  12. Identifying Useful Auxiliary Variables for Incomplete Data Analyses: A Note on a Group Difference Examination Approach

    ERIC Educational Resources Information Center

    Raykov, Tenko; Marcoulides, George A.

    2014-01-01

    This research note contributes to the discussion of methods that can be used to identify useful auxiliary variables for analyses of incomplete data sets. A latent variable approach is discussed, which is helpful in finding auxiliary variables with the property that if included in subsequent maximum likelihood analyses they may enhance considerably…

  13. APPLICATION OF A TIERED SURROGATE APPROACH TO IDENTIFY TOXICITY SURROGATES FOR HUMAN HEALTH RISK ASSESSMENT

    EPA Science Inventory

    APPLICATION OF A TIERED SURROGATE APPROACH TO IDENTIFY TOXICITY SURROGATES FOR HUMAN HEALTH RISK ASSESSMENT. P.R. Dodmane1, L.E. Lizarraga1, J.P. Kaiser2, S.C. Wesselkamper2, Q.J. Zhao2. 1ORISE Participant, U.S. EPA, National Center for Environmental Assessment (NCEA), Cincinnati...

  14. Doing the Work of Extension: Three Approaches to Identify, Amplify, and Implement Outreach

    ERIC Educational Resources Information Center

    Raison, Brian

    2014-01-01

    This article explores the literature and practice of how the Cooperative Extension Service does its work and asks if traditional outreach and engagement models have room for innovative delivery mechanisms that may identify emerging trends and help meet community needs. It considers three innovative approaches to the educational mission:…

  15. Using a Linkage Mapping Approach to Identify QTL for Day-Neutrality in the Octoploid Strawberry

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A linkage mapping approach was used to identify quantitative trait loci (QTL) associated with day-neutrality in the commercial strawberry, Fragaria ×ananassa (Duch ex Rozier). Amplified Fragment Length Polymorphic (AFLP) markers were used to build a genetic map with a population of 127 lines develo...

  16. An Information Theoretic Approach for Identifying Shared Information and Asymmetric Relationships among Variables.

    ERIC Educational Resources Information Center

    Golden, Linda L.; And Others

    1990-01-01

    The general-information-theoretic approach was used to identify informational overlap and asymmetry between variables, using affective, cognitive, and behavioral measures. Using the chi-squared test, no significant differences were found in response rates, demographics, or patronage frequency of three stores between numerical (n=453) and graphic…

  17. Identifying Core Mobile Learning Faculty Competencies Based Integrated Approach: A Delphi Study

    ERIC Educational Resources Information Center

    Elbarbary, Rafik Said

    2015-01-01

    This study uses the integrated approach as a conceptual framework to identify, categorize, and rank the key components of mobile learning core competencies for Egyptian faculty members in higher education. The field investigation used a four-round Delphi technique to determine the importance rating of each component of core competencies…

  18. A Comprehensive Approach to Identifying Intervention Targets for Patient-Safety Improvement in a Hospital Setting

    ERIC Educational Resources Information Center

    Cunningham, Thomas R.; Geller, E. Scott

    2012-01-01

    Despite differences in approaches to organizational problem solving, healthcare managers and organizational behavior management (OBM) practitioners share a number of practices, and connecting healthcare management with OBM may lead to improvements in patient safety. A broad needs-assessment methodology was applied to identify patient-safety…

  19. The Baby TALK Model: An Innovative Approach to Identifying High-Risk Children and Families

    ERIC Educational Resources Information Center

    Villalpando, Aimee Hilado; Leow, Christine; Hornstein, John

    2012-01-01

    This research report examines the Baby TALK model, an innovative early childhood intervention approach used to identify, recruit, and serve young children who are at-risk for developmental delays, mental health needs, and/or school failure, and their families. The report begins with a description of the model. This description is followed by an…

  20. A Function-First Approach to Identifying Formulaic Language in Academic Writing

    ERIC Educational Resources Information Center

    Durrant, Philip; Mathews-Aydinli, Julie

    2011-01-01

    There is currently much interest in creating pedagogically-oriented descriptions of formulaic language. Research in this area has typically taken what we call a "form-first" approach, in which formulas are identified as the most frequent recurrent forms in a relevant corpus. While this research continues to yield valuable results, the present…

  1. Identifying Bioaccumulative Halogenated Organic Compounds Using a Nontargeted Analytical Approach: Seabirds as Sentinels

    PubMed Central

    Millow, Christopher J.; Mackintosh, Susan A.; Lewison, Rebecca L.; Dodder, Nathan G.; Hoh, Eunha

    2015-01-01

    Persistent organic pollutants (POPs) are typically monitored via targeted mass spectrometry, which potentially identifies only a fraction of the contaminants actually present in environmental samples. With new anthropogenic compounds continuously introduced to the environment, novel and proactive approaches that provide a comprehensive alternative to targeted methods are needed in order to more completely characterize the diversity of known and unknown compounds likely to cause adverse effects. Nontargeted mass spectrometry attempts to extensively screen for compounds, providing a feasible approach for identifying contaminants that warrant future monitoring. We employed a nontargeted analytical method using comprehensive two-dimensional gas chromatography coupled to time-of-flight mass spectrometry (GC×GC/TOF-MS) to characterize halogenated organic compounds (HOCs) in California Black skimmer (Rynchops niger) eggs. Our study identified 111 HOCs; 84 of these compounds were regularly detected via targeted approaches, while 27 were classified as typically unmonitored or unknown. Typically unmonitored compounds of note in bird eggs included tris(4-chlorophenyl)methane (TCPM), tris(4-chlorophenyl)methanol (TCPMOH), triclosan, permethrin, heptachloro-1'-methyl-1,2'-bipyrrole (MBP), as well as four halogenated unknown compounds that could not be identified through database searching or the literature. The presence of these compounds in Black skimmer eggs suggests they are persistent, bioaccumulative, potentially biomagnifying, and maternally transferring. Our results highlight the utility and importance of employing nontargeted analytical tools to assess true contaminant burdens in organisms, as well as to demonstrate the value in using environmental sentinels to proactively identify novel contaminants. PMID:26020245

  2. Application of techniques to identify coal-mine and power-generation effects on surface-water quality, San Juan River basin, New Mexico and Colorado

    USGS Publications Warehouse

    Goetz, C.L.; Abeyta, Cynthia G.; Thomas, E.V.

    1987-01-01

    Numerous analytical techniques were applied to determine water quality changes in the San Juan River basin upstream of Shiprock, New Mexico. Eight techniques were used to analyze hydrologic data such as precipitation, water quality, and streamflow. The eight methods used are: (1) Piper diagram, (2) time-series plot, (3) frequency distribution, (4) box-and-whisker plot, (5) seasonal Kendall test, (6) Wilcoxon rank-sum test, (7) SEASRS procedure, and (8) analysis of flow-adjusted specific conductance data and smoothing. Post-1963 changes in dissolved solids concentration, dissolved potassium concentration, specific conductance, suspended sediment concentration, or suspended sediment load in the San Juan River downstream from the surface coal mines were examined to determine if coal mining was having an effect on the quality of surface water. None of the analytical methods used to analyze the data showed any increase in dissolved solids concentration, dissolved potassium concentration, or specific conductance in the river downstream from the mines; some of the analytical methods used showed a decrease in dissolved solids concentration and specific conductance. Chaco River, an ephemeral stream tributary to the San Juan River, undergoes changes in water quality due to effluent from a power generation facility. The discharge in the Chaco River contributes about 1.9% of the average annual discharge at the downstream station, San Juan River at Shiprock, NM. The changes in water quality detected at the Chaco River station were not detected at the downstream Shiprock station. It was not possible, with the available data, to identify any effects of the surface coal mines on water quality that were separable from those of urbanization, agriculture, and other cultural and natural changes. In order to determine the specific causes of changes in water quality, it would be necessary to collect additional data at strategically located stations. (Author's abstract)
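
    Of the eight methods listed, the seasonal Kendall test is the one aimed most directly at trend detection in seasonal water-quality records. Its core statistic is the Mann-Kendall S computed separately within each season and then summed, so that values are only compared with values from the same season in other years. The Python sketch below shows that statistic only; the report's own implementation is not described in the abstract, the conductance values are invented, and a full test would also need the variance of S to obtain a p-value.

```python
def mann_kendall_s(values):
    """Mann-Kendall S: number of increasing pairs minus decreasing pairs.

    `values` must be in time order; ties contribute zero.
    """
    s = 0
    for i in range(len(values) - 1):
        for j in range(i + 1, len(values)):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    return s


def seasonal_kendall_s(series_by_season):
    """Sum of the Mann-Kendall S statistics computed within each season."""
    return sum(mann_kendall_s(values) for values in series_by_season.values())


# Hypothetical specific conductance readings, one value per year per season.
conductance = {
    "spring": [520, 510, 495, 480],
    "autumn": [610, 615, 600, 590],
}

total_s = seasonal_kendall_s(conductance)
print(total_s)  # prints -10
```

    A negative total points toward a downward trend, which is the direction some of the report's methods indicated for specific conductance; whether such a trend is significant cannot be judged from S alone.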

  3. A Data Mining Approach to Reveal Representative Collaboration Indicators in Open Collaboration Frameworks

    ERIC Educational Resources Information Center

    Anaya, Antonio R.; Boticario, Jesus G.

    2009-01-01

    Data mining methods are successful in educational environments to discover new knowledge or learner skills or features. Unfortunately, they have not been used in depth with collaboration. We have developed a scalable data mining method, whose objective is to infer information on the collaboration during the collaboration process in a…

  4. VALUING ACID MINE DRAINAGE REMEDIATION IN WEST VIRGINIA: A HEDONIC MODELING APPROACH

    EPA Science Inventory

    States with active and abandoned mines face large private and public costs to remediate damage to streams and rivers from acid mine drainage (AMD). Appalachian states have an especially large number of contaminated streams and rivers, and the USGS places AMD as the primary source...

  5. VALUING ACID MINE DRAINAGE REMEDIATION IN WEST VIRGINIA: A HEDONIC MODELING APPROACH INCORPORATING GEOGRAPHIC INFORMATION SYSTEMS

    EPA Science Inventory

    States with active and abandoned mines face large private and public costs to remediate damage to streams and rivers from acid mine drainage (AMD). Appalachian states have an especially large number of contaminated streams and rivers, and the USGS places AMD as the primary source...

  6. Missing defects? A comparison of microscopic and macroscopic approaches to identifying linear enamel hypoplasia.

    PubMed

    Hassett, Brenna R

    2014-03-01

    Linear enamel hypoplasia (LEH), the presence of linear defects of dental enamel formed during periods of growth disruption, is frequently analyzed in physical anthropology as evidence for childhood health in the past. However, a wide variety of methods for identifying and interpreting these defects in archaeological remains exists, preventing easy cross-comparison of results from disparate studies. This article compares a standard approach to identifying LEH using the naked eye to the evidence of growth disruption observed microscopically from the enamel surface. This comparison demonstrates that what is interpreted as evidence of growth disruption microscopically is not uniformly identified with the naked eye, and provides a reference for the level of consistency between the number and timing of defects identified using microscopic versus macroscopic approaches. This is done for different tooth types using a large sample of unworn permanent teeth drawn from several post-medieval London burial assemblages. The resulting schematic diagrams showing where macroscopic methods achieve more or less similar results to microscopic methods are presented here and clearly demonstrate that "naked-eye" methods of identifying growth disruptions do not identify LEH as often as microscopic methods in areas where perikymata are more densely packed. PMID:24323494

  7. Identifying inhibitory compounds in lignocellulosic biomass hydrolysates using an exometabolomics approach

    PubMed Central

    2014-01-01

    Background: During the pretreatment of lignocellulosic biomass, inhibitors are formed that reduce the fermentation performance of fermenting yeast. An exometabolomics approach was applied to systematically identify inhibitors in lignocellulosic biomass hydrolysates. Results: We studied the composition and fermentability of 24 different biomass hydrolysates. To create diversity, the 24 hydrolysates were prepared from six different biomass types, namely sugar cane bagasse, corn stover, wheat straw, barley straw, willow wood chips and oak sawdust, and with four different pretreatment methods, i.e. dilute acid, mild alkaline, alkaline/peracetic acid and concentrated acid. Their composition and that of fermentation samples generated with these hydrolysates were analyzed with two GC-MS methods. Either ethyl acetate extraction or ethyl chloroformate derivatization was used before conducting GC-MS to prevent sugars from overloading the chromatograms, which would obscure the detection of less abundant compounds. Using multivariate PLS-2CV and nPLS-2CV data analysis models, potential inhibitors were identified by establishing relationships between the fermentability and composition of the hydrolysates. These identified compounds were tested for their effects on the growth of the model yeast, Saccharomyces cerevisiae CEN.PK 113-7D, confirming that the majority of the identified compounds were indeed inhibitors. Conclusion: Inhibitory compounds in lignocellulosic biomass hydrolysates were successfully identified using a non-targeted systematic approach: metabolomics. The identified inhibitors include both known ones, such as furfural, HMF and vanillin, and novel inhibitors, namely sorbic acid and phenylacetaldehyde. PMID:24655423

  8. Quantitative and qualitative approaches to identifying migration chronology in a continental migrant

    USGS Publications Warehouse

    Beatty, William S.; Kesler, Dylan C.; Webb, Elisabeth B.; Raedeke, Andrew H.; Naylor, Luke W.; Humburg, Dale D.

    2013-01-01

    The degree to which extrinsic factors influence migration chronology in North American waterfowl has not been quantified, particularly for dabbling ducks. Previous studies have examined waterfowl migration using various methods; however, quantitative approaches to define avian migration chronology over broad spatio-temporal scales are limited, and the implications of using different approaches have not been assessed. We used movement data from 19 female adult mallards (Anas platyrhynchos) equipped with solar-powered global positioning system satellite transmitters to evaluate two individual-level approaches for quantifying migration chronology. The first approach defined migration based on individual movements among geopolitical boundaries (state, provincial, international), whereas the second method modeled net displacement as a function of time using nonlinear models. Differences in migration chronologies identified by each of the approaches were examined with analysis of variance. The geopolitical method identified mean autumn migration midpoints at 15 November 2010 and 13 November 2011, whereas the net displacement method identified midpoints at 15 November 2010 and 14 November 2011. The mean midpoints for spring migration were 3 April 2011 and 20 March 2012 using the geopolitical method and 31 March 2011 and 22 March 2012 using the net displacement method. The duration, initiation date, midpoint, and termination date for both autumn and spring migration did not differ between the two individual-level approaches. Although we did not detect differences in migration parameters between the different approaches, the net displacement metric offers broad potential to address questions in movement ecology for migrating species. Ultimately, an objective definition of migration chronology will allow researchers to obtain a comprehensive understanding of the extrinsic factors that drive migration at the individual and population levels. As a result, targeted

  9. Quantitative and Qualitative Approaches to Identifying Migration Chronology in a Continental Migrant

    PubMed Central

    Beatty, William S.; Kesler, Dylan C.; Webb, Elisabeth B.; Raedeke, Andrew H.; Naylor, Luke W.; Humburg, Dale D.

    2013-01-01

    The degree to which extrinsic factors influence migration chronology in North American waterfowl has not been quantified, particularly for dabbling ducks. Previous studies have examined waterfowl migration using various methods; however, quantitative approaches to define avian migration chronology over broad spatio-temporal scales are limited, and the implications of using different approaches have not been assessed. We used movement data from 19 female adult mallards (Anas platyrhynchos) equipped with solar-powered global positioning system satellite transmitters to evaluate two individual-level approaches for quantifying migration chronology. The first approach defined migration based on individual movements among geopolitical boundaries (state, provincial, international), whereas the second method modeled net displacement as a function of time using nonlinear models. Differences in migration chronologies identified by each of the approaches were examined with analysis of variance. The geopolitical method identified mean autumn migration midpoints at 15 November 2010 and 13 November 2011, whereas the net displacement method identified midpoints at 15 November 2010 and 14 November 2011. The mean midpoints for spring migration were 3 April 2011 and 20 March 2012 using the geopolitical method and 31 March 2011 and 22 March 2012 using the net displacement method. The duration, initiation date, midpoint, and termination date for both autumn and spring migration did not differ between the two individual-level approaches. Although we did not detect differences in migration parameters between the different approaches, the net displacement metric offers broad potential to address questions in movement ecology for migrating species. Ultimately, an objective definition of migration chronology will allow researchers to obtain a comprehensive understanding of the extrinsic factors that drive migration at the individual and population levels. As a result, targeted

  10. A cross-species bi-clustering approach to identifying conserved co-regulated genes

    PubMed Central

    Sun, Jiangwen; Jiang, Zongliang; Tian, Xiuchun; Bi, Jinbo

    2016-01-01

    Motivation: A growing number of studies have explored the process of pre-implantation embryonic development of multiple mammalian species. However, the conservation and variation among different species in their developmental programming are poorly defined due to the lack of effective computational methods for detecting co-regulated genes that are conserved across species. The most sophisticated method to date for identifying conserved co-regulated genes is a two-step approach. This approach first identifies gene clusters for each species by a cluster analysis of gene expression data, and subsequently computes the overlaps of clusters identified from different species to reveal common subgroups. This approach is ineffective at dealing with the noise in the expression data introduced by the complicated procedures used to quantify gene expression. Furthermore, due to the sequential nature of the approach, the gene clusters identified in the first step may have little overlap among different species in the second step, making it difficult to detect conserved co-regulated genes. Results: We propose a cross-species bi-clustering approach which first denoises the gene expression data of each species into a data matrix. The rows of the data matrices of different species represent the same set of genes, which are characterized by their expression patterns over the developmental stages of each species as columns. A novel bi-clustering method is then developed to cluster genes into subgroups by a joint sparse rank-one factorization of all the data matrices. This method decomposes a data matrix into a product of a column vector and a row vector, where the column vector is a consistent indicator across the matrices (species) to identify the same gene cluster and the row vector specifies for each species the developmental stages in which the clustered genes are co-regulated. An efficient optimization algorithm has been developed with convergence analysis. This approach was first validated on

  11. Integrative network-based approach identifies key genetic elements in breast invasive carcinoma

    PubMed Central

    2015-01-01

    Background Breast cancer is a genetically heterogeneous type of cancer that belongs to the most prevalent types with a high mortality rate. Treatment and prognosis of breast cancer would profit largely from a correct classification and identification of genetic key drivers and major determinants driving the tumorigenesis process. In the light of the availability of tumor genomic and epigenomic data from different sources and experiments, new integrative approaches are needed to boost the probability of identifying such genetic key drivers. We present here an integrative network-based approach that is able to associate regulatory network interactions with the development of breast carcinoma by integrating information from gene expression, DNA methylation, miRNA expression, and somatic mutation datasets. Results Our results showed strong association between regulatory elements from different data sources in terms of the mutual regulatory influence and genomic proximity. By analyzing different types of regulatory interactions, TF-gene, miRNA-mRNA, and proximity analysis of somatic variants, we identified 106 genes, 68 miRNAs, and 9 mutations that are candidate drivers of oncogenic processes in breast cancer. Moreover, we unraveled regulatory interactions among these key drivers and the other elements in the breast cancer network. Intriguingly, about one third of the identified driver genes are targeted by known anti-cancer drugs and the majority of the identified key miRNAs are implicated in cancerogenesis of multiple organs. Also, the identified driver mutations likely cause damaging effects on protein functions. The constructed gene network and the identified key drivers were compared to well-established network-based methods. Conclusion The integrated molecular analysis enabled by the presented network-based approach substantially expands our knowledge base of prospective genomic drivers of genes, miRNAs, and mutations. 
    For a good part of the identified key drivers

  12. A comparison of approaches for finding minimum identifying codes on graphs

    NASA Astrophysics Data System (ADS)

    Horan, Victoria; Adachi, Steve; Bak, Stanley

    2016-05-01

    In order to formulate mathematical conjectures likely to be true, a number of base cases must be determined. However, many combinatorial problems are NP-hard and the computational complexity makes this research approach difficult using a standard brute force approach on a typical computer. One sample problem explored is that of finding a minimum identifying code. To work around the computational issues, a variety of methods are explored and consist of a parallel computing approach using MATLAB, an adiabatic quantum optimization approach using a D-Wave quantum annealing processor, and lastly using satisfiability modulo theory (SMT) and corresponding SMT solvers. Each of these methods requires the problem to be formulated in a unique manner. In this paper, we address the challenges of computing solutions to this NP-hard problem with respect to each of these methods.
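
    As a rough illustration of the brute-force baseline mentioned above, the sketch below searches for a minimum identifying code on a tiny graph. It assumes the standard definition (a vertex subset C such that every vertex's closed neighborhood intersects C in a non-empty set that is unique to that vertex). The function names and the example graph are illustrative and are not taken from the paper.

```python
from itertools import combinations


def is_identifying_code(adj, code):
    """Check that every vertex has a non-empty, unique signature N[v] & code."""
    code = set(code)
    seen = set()
    for v in adj:
        # Closed neighborhood of v intersected with the candidate code.
        signature = frozenset(({v} | set(adj[v])) & code)
        if not signature or signature in seen:
            return False
        seen.add(signature)
    return True


def minimum_identifying_code(adj):
    """Exhaustive search by increasing size; exponential, so small graphs only."""
    vertices = sorted(adj)
    for size in range(1, len(vertices) + 1):
        for candidate in combinations(vertices, size):
            if is_identifying_code(adj, candidate):
                return set(candidate)
    return None  # no identifying code exists (the graph has twin vertices)


# Hypothetical example: the path 0-1-2-3 as an adjacency list.
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(minimum_identifying_code(path4))
```

    The exhaustive loop over subsets is what makes the problem expensive at scale, which is the motivation the abstract gives for the parallel, quantum annealing, and SMT formulations.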

  13. Multi-variate flood damage assessment: a tree-based data-mining approach

    NASA Astrophysics Data System (ADS)

    Merz, B.; Kreibich, H.; Lall, U.

    2013-01-01

    The usual approach for flood damage assessment consists of stage-damage functions which relate the relative or absolute damage for a certain class of objects to the inundation depth. Other characteristics of the flooding situation and of the flooded object are rarely taken into account, although flood damage is influenced by a variety of factors. We apply a group of data-mining techniques, known as tree-structured models, to flood damage assessment. A very comprehensive data set of more than 1000 records of direct building damage of private households in Germany is used. Each record contains details about a large variety of potential damage-influencing characteristics, such as hydrological and hydraulic aspects of the flooding situation, early warning and emergency measures undertaken, state of precaution of the household, building characteristics and socio-economic status of the household. Regression trees and bagging decision trees are used to select the more important damage-influencing variables and to derive multi-variate flood damage models. It is shown that these models outperform existing models, and that tree-structured models are a promising alternative to traditional damage models.
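
    To make the tree-structured idea concrete, here is a minimal sketch of how a regression tree might choose a single split on one predictor by minimizing the summed squared error of the two resulting groups. The abstract does not give the authors' actual tree-growing settings or data. The numbers below are made up purely for illustration, and bagging (averaging many such trees fitted to bootstrap samples) is not shown.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, cost) for the split of xs that minimizes total SSE of ys."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best


# Made-up toy data: inundation depth (m) versus relative damage (%).
depth = [0.1, 0.2, 0.3, 1.0, 1.2, 1.5]
damage = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
threshold, cost = best_split(depth, damage)
print(threshold, cost)
```

    A full tree would apply the same search recursively, and across all candidate predictors rather than depth alone.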

  14. Identifying potential adverse effects using the web: a new approach to medical hypothesis generation

    PubMed Central

    Benton, Adrian; Ungar, Lyle; Hill, Shawndra; Hennessy, Sean; Mao, Jun; Chung, Annie; Leonard, Charles E.; Holmes, John H.

    2011-01-01

    Medical message boards are online resources where users with a particular condition exchange information, some of which they might not otherwise share with medical providers. Many of these boards contain a large number of posts and contain patient opinions and experiences that would be potentially useful to clinicians and researchers. We present an approach that is able to collect a corpus of medical message board posts, de-identify the corpus, and extract information on potential adverse drug effects discussed by users. Using a corpus of posts to breast cancer message boards, we identified drug event pairs using co-occurrence statistics. We then compared the identified drug event pairs with adverse effects listed on the package labels of tamoxifen, anastrozole, exemestane, and letrozole. Of the pairs identified by our system, 75–80% were documented on the drug labels. Some of the undocumented pairs may represent previously unidentified adverse drug effects. PMID:21820083
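
    The abstract does not specify which co-occurrence statistics were used. As a hedged sketch of the simplest variant, the code below counts how many posts mention both a drug and an event term. The posts, term lists, and function name are invented for illustration, and a real pipeline would need de-identification, term normalization, and a significance measure on top of raw counts.

```python
from collections import Counter
from itertools import product


def cooccurrence_counts(posts, drugs, events):
    """Count posts in which a drug term and an event term both appear."""
    pair_counts = Counter()
    for text in posts:
        words = set(text.lower().split())
        found_drugs = [d for d in drugs if d in words]
        found_events = [e for e in events if e in words]
        # Each (drug, event) pair is counted at most once per post.
        pair_counts.update(product(found_drugs, found_events))
    return pair_counts


# Invented toy posts, not real message board data.
posts = [
    "started tamoxifen and the nausea is awful",
    "letrozole gave me fatigue",
    "tamoxifen nausea again today",
    "no problems so far",
]
counts = cooccurrence_counts(posts, ["tamoxifen", "letrozole"], ["nausea", "fatigue"])
print(counts.most_common())
```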

  15. An information-theoretic approach to assess practical identifiability of parametric dynamical systems.

    PubMed

    Pant, Sanjay; Lombardi, Damiano

    2015-10-01

    A new approach for assessing parameter identifiability of dynamical systems in a Bayesian setting is presented. The concept of Shannon entropy is employed to measure the inherent uncertainty in the parameters. The expected reduction in this uncertainty is seen as the amount of information one expects to gain about the parameters due to the availability of noisy measurements of the dynamical system. Such expected information gain is interpreted in terms of the variance of a hypothetical measurement device that can measure the parameters directly, and is related to practical identifiability of the parameters. If the individual parameters are unidentifiable, correlation between parameter combinations is assessed through conditional mutual information to determine which sets of parameters can be identified together. The information theoretic quantities of entropy and information are evaluated numerically through a combination of Monte Carlo and k-nearest neighbour methods in a non-parametric fashion. Unlike many methods to evaluate identifiability proposed in the literature, the proposed approach takes the measurement-noise into account and is not restricted to any particular noise-structure. Whilst computationally intensive for large dynamical systems, it is easily parallelisable and is non-intrusive as it does not necessitate re-writing of the numerical solvers of the dynamical system. The application of such an approach is presented for a variety of dynamical systems--ranging from systems governed by ordinary differential equations to partial differential equations--and, where possible, validated against results previously published in the literature. PMID:26292167
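
    The expected information gain described above is the mutual information between parameters and measurements. The paper estimates it for continuous variables with Monte Carlo and k-nearest-neighbour methods. The sketch below is a deliberately simplified discrete analogue, using the identity I(θ; y) = H(θ) + H(y) − H(θ, y), and the joint probability tables are hypothetical.

```python
from math import log2


def entropy(probabilities):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * log2(p) for p in probabilities if p > 0)


def mutual_information(joint):
    """I(theta; y) in bits, where joint[i][j] = P(theta = i, y = j)."""
    p_theta = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    flat = [p for row in joint for p in row]
    return entropy(p_theta) + entropy(p_y) - entropy(flat)


# Hypothetical cases: a measurement that says nothing about theta,
# and one that determines theta exactly.
uninformative = [[0.25, 0.25], [0.25, 0.25]]
fully_informative = [[0.5, 0.0], [0.0, 0.5]]
print(mutual_information(uninformative), mutual_information(fully_informative))
```

    In this toy setting a gain near zero corresponds to an unidentifiable parameter, while a gain equal to the prior entropy means the measurement removes all uncertainty.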

  16. Ab initio thermodynamic approach to identify mixed solid sorbents for CO2 capture technology

    DOE PAGESBeta

    Duan, Yuhua

    2015-10-15

    Because the current technologies for capturing CO2 are still too energy intensive, new materials must be developed that can capture CO2 reversibly with acceptable energy costs. At a given CO2 pressure, the turnover temperature (Tt) of the reaction of an individual solid that can capture CO2 is fixed. Such Tt may be outside the operating temperature range (ΔTo) for a practical capture technology. To adjust Tt to fit the practical ΔTo, in this study, three scenarios of mixing schemes are explored by combining thermodynamic database mining with first principles density functional theory and phonon lattice dynamics calculations. Our calculated results demonstrate that by mixing different types of solids, it’s possible to shift Tt to the range of practical operating temperature conditions. According to the requirements imposed by the pre- and post- combustion technologies and based on our calculated thermodynamic properties for the CO2 capture reactions by the mixed solids of interest, we were able to identify the mixing ratios of two or more solids to form new sorbent materials for which lower capture energy costs are expected at the desired pressure and temperature conditions.

  17. Identifying Prognostic Features by Bottom-Up Approach and Correlating to Drug Repositioning

    PubMed Central

    Li, Wei; Yu, Jian; Lian, Baofeng; Sun, Han; Li, Jing; Zhang, Menghuan; Li, Ling; Li, Yixue; Liu, Qian; Xie, Lu

    2015-01-01

    Background Traditionally, a top-down method has been used to identify prognostic features in cancer research. That is to say, genes differentially expressed, usually in cancer versus normal tissue, were identified to see if they possess survival prediction power. The problem is that prognostic features identified from one set of patient samples can rarely be transferred to other datasets. We apply a bottom-up approach in this study: survival-correlated or clinical-stage-correlated genes were selected first and then prioritized by their network topology, so that a small set of features can be used as a prognostic signature. Methods Gene expression profiles of a cohort of 221 hepatocellular carcinoma (HCC) patients were used as a training set. The ‘bottom-up’ approach was applied to discover gene-expression signatures associated with survival in both tumor and adjacent non-tumor tissues, and compared with the ‘top-down’ approach. The results were validated in a second cohort of 82 patients, which was used as a testing set. Results Two sets of gene signatures, separately identified in tumor and adjacent non-tumor tissues by the bottom-up approach, were developed in the training cohort. These two signatures were associated with overall survival times of HCC patients, the robustness of each was validated in the testing set, and the predictive performance of each was better than that of previously reported gene expression signatures. Moreover, genes in these two prognostic signatures gave some indications for drug repositioning in HCC: some approved drugs targeting these markers have alternative indications in hepatocellular carcinoma. Conclusion Using the bottom-up approach, we have developed two prognostic gene signatures with a limited number of genes that are associated with overall survival times of patients with HCC. Furthermore, prognostic markers in these two signatures have the potential to be therapeutic targets. PMID:25738841

  18. A rule-based approach for identifying obesity and its comorbidities in medical discharge summaries.

    PubMed

    Mishra, Ninad K; Cummo, David M; Arnzen, James J; Bonander, Jason

    2009-01-01

    OBJECTIVE Evaluate the effectiveness of a simple rule-based approach in classifying medical discharge summaries according to indicators for obesity and 15 associated co-morbidities as part of the 2008 i2b2 Obesity Challenge. METHODS The authors applied a rule-based approach that looked for occurrences of morbidity-related keywords and identified the types of assertions in which those keywords occurred. The documents were then classified using a simple scoring algorithm based on a mapping of the assertion types to possible judgment categories. MEASUREMENTS Results for the challenge were evaluated based on macro F-measure. We report micro and macro F-measure results for all morbidities combined and for each morbidity separately. RESULTS Our rule-based approach achieved micro and macro F-measures of 0.97 and 0.77, respectively, ranking fifth out of the entries submitted by 28 teams participating in the classification task based on textual judgments and substantially outperforming the average for the challenge. CONCLUSIONS As shown by its ranking in the challenge results, this approach performed relatively well under conditions in which limited training data existed for some judgment categories. Further, the approach held up well in relation to more complex approaches applied to this classification task. The approach could be enhanced by the addition of expert rules to model more complex medical reasoning. PMID:19390102
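
    The gap between the reported micro and macro F-measures (0.97 versus 0.77) is what one would expect when some judgment categories are rare. The sketch below shows how the two averages are typically computed from per-class counts. The class names and counts are hypothetical and are not the challenge data.

```python
def f1(tp, fp, fn):
    """F-measure (beta = 1) from true positive, false positive, false negative counts."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def micro_macro_f1(per_class):
    """per_class maps label -> (tp, fp, fn); returns (micro_f1, macro_f1)."""
    # Macro: average the per-class scores, so every class weighs the same.
    macro = sum(f1(*counts) for counts in per_class.values()) / len(per_class)
    # Micro: pool the counts first, so frequent classes dominate.
    tp = sum(c[0] for c in per_class.values())
    fp = sum(c[1] for c in per_class.values())
    fn = sum(c[2] for c in per_class.values())
    return f1(tp, fp, fn), macro


# Hypothetical counts: one frequent class and one rare class.
example = {"present": (90, 5, 5), "questionable": (1, 3, 3)}
micro, macro = micro_macro_f1(example)
print(round(micro, 3), round(macro, 3))
```

    In this toy case the rare class scores poorly and pulls the macro average well below the micro average, even though most documents are classified correctly.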

  19. A cellular genetics approach identifies gene-drug interactions and pinpoints drug toxicity pathway nodes

    PubMed Central

    Suzuki, Oscar T.; Frick, Amber; Parks, Bethany B.; Trask, O. Joseph; Butz, Natasha; Steffy, Brian; Chan, Emmanuel; Scoville, David K.; Healy, Eric; Benton, Cristina; McQuaid, Patricia E.; Thomas, Russell S.; Wiltshire, Tim

    2014-01-01

    New approaches to toxicity testing have incorporated high-throughput screening across a broad-range of in vitro assays to identify potential key events in response to chemical or drug treatment. To date, these approaches have primarily utilized repurposed drug discovery assays. In this study, we describe an approach that combines in vitro screening with genetic approaches for the experimental identification of genes and pathways involved in chemical or drug toxicity. Primary embryonic fibroblasts isolated from 32 genetically-characterized inbred mouse strains were treated in concentration-response format with 65 compounds, including pharmaceutical drugs, environmental chemicals, and compounds with known modes-of-action. Integrated cellular responses were measured at 24 and 72 h using high-content imaging and included cell loss, membrane permeability, mitochondrial function, and apoptosis. Genetic association analysis of cross-strain differences in the cellular responses resulted in a collection of candidate loci potentially underlying the variable strain response to each chemical. As a demonstration of the approach, one candidate gene involved in rotenone sensitivity, Cybb, was experimentally validated in vitro and in vivo. Pathway analysis on the combined list of candidate loci across all chemicals identified a number of over-connected nodes that may serve as core regulatory points in toxicity pathways. PMID:25221565

  20. A novel approach to identify genes that determine grain protein deviation in cereals.

    PubMed

    Mosleth, Ellen F; Wan, Yongfang; Lysenko, Artem; Chope, Gemma A; Penson, Simon P; Shewry, Peter R; Hawkesford, Malcolm J

    2015-06-01

    Grain yield and protein content were determined for six wheat cultivars grown over 3 years at multiple sites and at multiple nitrogen (N) fertilizer inputs. Although grain protein content was negatively correlated with yield, some grain samples had higher protein contents than expected based on their yields, a trait referred to as grain protein deviation (GPD). We used novel statistical approaches to identify gene transcripts significantly related to GPD across environments. The yield and protein content were initially adjusted for nitrogen fertilizer inputs and then adjusted for yield (to remove the negative correlation with protein content), resulting in a parameter termed corrected GPD. Significant genetic variation in corrected GPD was observed for six cultivars grown over a range of environmental conditions (a total of 584 samples). Gene transcript profiles were determined in a subset of 161 samples of developing grain to identify transcripts contributing to GPD. Principal component analysis (PCA), analysis of variance (ANOVA) and means of scores regression (MSR) were used to identify individual principal components (PCs) correlating with GPD alone. Scores of the selected PCs, which were significantly related to GPD and protein content but not to the yield and significantly affected by cultivar, were identified as reflecting a multivariate pattern of gene expression related to genetic variation in GPD. Transcripts with consistent variation along the selected PCs were identified by an approach hereby called one-block means of scores regression (one-block MSR). PMID:25400203

  1. New approach for reduction of diesel consumption by comparing different mining haulage configurations.

    PubMed

    Rodovalho, Edmo da Cunha; Lima, Hernani Mota; de Tomi, Giorgio

    2016-05-01

    The mining operations of loading and haulage have an energy source that is highly dependent on fossil fuels. In mining companies that select trucks for haulage, this input is the main component of mining costs. How can the impact of the operational aspects on the diesel consumption of haulage operations in surface mines be assessed? There are many studies relating the fuel consumption of trucks to several variables, but a methodology that prioritizes higher-impact variables under each specific condition is not available. Generic models may not apply to all operational settings presented in the mining industry. This study aims to create a method of analysis, identification, and prioritization of variables related to fuel consumption of haul trucks in open pit mines. For this purpose, statistical analysis techniques and mathematical modelling tools using multiple linear regressions will be applied. The model is shown to be suitable because the results generate a good description of the fuel consumption behaviour. In the practical application of the method, the reduction of diesel consumption reached 10%. The implementation requires no large-scale investments or very long deadlines and can be applied to mining haulage operations in other settings. PMID:26946166

  2. Floodplain storage of mine tailings in the Belle Fourche river system: a sediment budget approach

    USGS Publications Warehouse

    Marron, D.C.

    1992-01-01

    Arsenic-contaminated mine tailings that were discharged into Whitewood Creek at Lead, South Dakota, from 1876 to 1978, were deposited along the floodplains of Whitewood Creek and the Belle Fourche River. The resulting arsenic-contaminated floodplain deposit consists mostly of overbank sediments and filled abandoned meanders along Whitewood Creek, and overbank and point-bar sediments along the Belle Fourche River. Arsenic concentrations of the contaminated sediments indicate the degree of dilution of mine tailings by uncontaminated alluvium. About 13% of the 110 × 10^6 Mg of mine tailings that were discharged at Lead were deposited along the Whitewood Creek floodplain. -from Author

  3. A Critical Study on the Underground Environment of Coal Mines in India-an Ergonomic Approach

    NASA Astrophysics Data System (ADS)

    Dey, Netai Chandra; Sharma, Gourab Dhara

    2013-04-01

    Applying ergonomics to the health of underground miners plays a major role in determining their efficiency. Work in underground mines remains physically demanding, and continuous stress from particular postures or movements during work leads to localized muscle fatigue and musculo-skeletal disorders. A good working environment can reduce the degree of job heaviness, and thermal stress (WBGT values) can directly affect the length of miners' work periods. Among the many unit operations in an underground mine, roof bolting makes an important contribution to the safety of the mine and miners. The occupational stress of roof bolters is discussed in the paper from an ergonomic perspective.

  4. Identifying Pigment Mixtures in Art Using SERS: A Treatment Flowchart Approach.

    PubMed

    Roh, Joo Yeon; Matecki, Mary K; Svoboda, Shelley A; Wustholz, Kristin L

    2016-02-16

    A novel treatment flowchart approach for surface-enhanced Raman scattering (SERS) is used to identify both blue and yellow organic pigments in a single microscopic sample from a series of reference oil paints as well as an actual 18th century oil painting. In particular, several treatment strategies using acids and solvents are integrated into a specific flowchart designed to enable the minimally invasive identification of unknown blue (i.e., indigo, Prussian blue) and yellow organic (i.e., Reseda lake, Stil de Grain, gamboge) pigments in one sample. We demonstrate the first successful identification of a yellow lake pigment in a historic painting using SERS as well as the utility of our treatment flowchart approach for identifying pigments of varying resonance conditions, surface affinities, and treatment requirements in a single microscopic sample from a historic oil painting. PMID:26799174

  5. An innovative and integrated approach based on DNA walking to identify unauthorised GMOs.

    PubMed

    Fraiture, Marie-Alice; Herman, Philippe; Taverniers, Isabel; De Loose, Marc; Deforce, Dieter; Roosens, Nancy H

    2014-03-15

    In the coming years, the frequency of unauthorised genetically modified organisms (GMOs) being present in the European food and feed chain will increase significantly. Therefore, we have developed a strategy to identify unauthorised GMOs containing a pCAMBIA family vector, frequently present in transgenic plants. This integrated approach is performed in two successive steps on Bt rice grains. First, the potential presence of unauthorised GMOs is assessed by the qPCR SYBR®Green technology targeting the terminator 35S pCAMBIA element. Second, its presence is confirmed via the characterisation of the junction between the transgenic cassette and the rice genome. To this end, a DNA walking strategy is applied using a first reverse primer followed by two semi-nested PCR rounds using primers that are each time nested to the previous reverse primer. This approach allows rapid identification of the transgene flanking region and can easily be implemented by the enforcement laboratories. PMID:24206686

  6. Systems approaches to unraveling plant metabolism: identifying biosynthetic genes of secondary metabolic pathways.

    PubMed

    Spiering, Martin J; Kaur, Bhavneet; Parsons, James F; Eisenstein, Edward

    2014-01-01

    The diversity of useful compounds produced by plant secondary metabolism has stimulated broad systems biology approaches to identify the genes involved in their biosynthesis. Systems biology studies in non-model plants pose interesting but addressable challenges, and have been greatly facilitated by the ability to grow and maintain plants, develop laboratory culture systems, and profile key metabolites in order to identify critical genes involved in their biosynthesis. In this chapter we describe a suite of approaches that have been useful in Actaea racemosa (L.; syn. Cimicifuga racemosa, Nutt., black cohosh), a non-model medicinal plant with no genome sequence and little horticultural information available, that have led to the development of initial gene-metabolite relationships for the production of several bioactive metabolites in this multicomponent botanical therapeutic, and that can be readily applied to a wide variety of under-characterized medicinal plants. PMID:24218220

  7. An integrated remote sensing approach for identifying ecological range sites. [parker mountain

    NASA Technical Reports Server (NTRS)

    Jaynes, R. A.

    1983-01-01

    A model approach for identifying ecological range sites was applied to high elevation sagebrush-dominated rangelands on Parker Mountain, in south-central Utah. The approach utilizes map information derived from both high altitude color infrared photography and LANDSAT digital data, integrated with soils, geological, and precipitation maps. Identification of the ecological range site for a given area requires an evaluation of all relevant environmental factors which combine to give that site the potential to produce characteristic types and amounts of vegetation. A table is presented which allows the user to determine ecological range site based upon an integrated use of the maps which were prepared. The advantages of identifying ecological range sites through an integrated photo interpretation/LANDSAT analysis are discussed.

  8. Microbial populations identified by fluorescence in situ hybridization in a constructed wetland treating acid coal mine drainage

    SciTech Connect

    Nicomrat, D.; Dick, W.A.; Tuovinen, O.H.

    2006-07-15

    Microorganisms are an integral part of the biogeochemical processes in wetlands, yet microbial communities in sediments within constructed wetlands receiving acid mine drainage (AMD) are only poorly understood. The purpose of this study was to characterize the microbial diversity and abundance in a wetland receiving AMD using fluorescence in situ hybridization (FISH) analysis. Seasonal samples of oxic surface sediments, comprised of Fe(III) precipitates, were collected from two treatment cells of the constructed wetland system. The pH of the bulk samples ranged between pH 2.1 and 3.9. Viable counts of acidophilic Fe and S oxidizers and heterotrophs were determined with a most probable number (MPN) method. The MPN counts were only a fraction of the corresponding FISH counts. The sediment samples contained microorganisms in the Bacteria (including the subgroups of acidophilic Fe- and S-oxidizing bacteria and Acidiphilium spp.) and Eukarya domains. Archaea were present in the sediment surface samples at < 0.01% of the total microbial community. The most numerous bacterial species in this wetland system was Acidithiobacillus ferrooxidans, comprising up to 37% of the bacterial population. Acidithiobacillus thiooxidans was also abundant.

  9. A new approach to hazardous materials transportation risk analysis: decision modeling to identify critical variables.

    PubMed

    Clark, Renee M; Besterfield-Sacre, Mary E

    2009-03-01

    We take a novel approach to analyzing hazardous materials transportation risk in this research. Previous studies analyzed this risk from an operations research (OR) or quantitative risk assessment (QRA) perspective by minimizing or calculating risk along a transport route. Further, even though the majority of incidents occur when containers are unloaded, the research has not focused on transportation-related activities, including container loading and unloading. In this work, we developed a decision model of a hazardous materials release during unloading using actual data and an exploratory data modeling approach. Previous studies have had a theoretical perspective in terms of identifying and advancing the key variables related to this risk, and there has not been a focus on probability and statistics-based approaches for doing this. Our decision model empirically identifies the critical variables using an exploratory methodology for a large, highly categorical database involving latent class analysis (LCA), loglinear modeling, and Bayesian networking. Our model identified the most influential variables and countermeasures for two consequences of a hazmat incident, dollar loss and release quantity, and is one of the first models to do this. The most influential variables were found to be related to the failure of the container. In addition to analyzing hazmat risk, our methodology can be used to develop data-driven models for strategic decision making in other domains involving risk. PMID:19087232

  10. An Approach for Identifying Cytokines Based on a Novel Ensemble Classifier

    PubMed Central

    Zou, Quan; Wang, Zhen; Guan, Xinjun; Liu, Bin; Wu, Yunfeng; Lin, Ziyu

    2013-01-01

    Identifying cytokines and investigating their various functions and biochemical mechanisms is meaningful and important in biology. However, several issues remain, including the large scale of benchmark datasets, serious imbalance of data, and discovery of new gene families. In this paper, we employ a machine learning approach based on a novel ensemble classifier to predict cytokines. We directly selected amino acid sequences as research objects. First, we pretreated the benchmark data accurately. Next, we analyzed the physicochemical properties and distribution of whole amino acids and then extracted a group of 120-dimensional (120D) valid features to represent sequences. Third, in view of the serious imbalance in benchmark datasets, we utilized a sampling approach based on the synthetic minority oversampling technique algorithm and K-means clustering undersampling algorithm to rebuild the training set. Finally, we built a library for dynamic selection and circulating combination based on clustering (LibD3C) and employed the new training set to realize cytokine classification. Experiments showed that the geometric mean of sensitivity and specificity obtained through our approach is as high as 93.3%, which proves that our approach is effective for identifying cytokines. PMID:24027761
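
    The geometric mean of sensitivity and specificity reported above is a common summary for imbalanced classification because it stays low unless both classes are predicted well. A minimal sketch of the calculation follows. The confusion-matrix counts are hypothetical and unrelated to the paper's experiments.

```python
from math import sqrt


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity (recall on positives) and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sqrt(sensitivity * specificity)


def accuracy(tp, fn, tn, fp):
    """Plain accuracy, shown only for contrast."""
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical imbalanced case: 10 positives, 990 negatives, and a classifier
# that labels almost everything negative.
counts = dict(tp=1, fn=9, tn=989, fp=1)
print(round(accuracy(**counts), 3), round(g_mean(**counts), 3))
```

    In this toy case accuracy looks excellent while the geometric mean is poor, which is why the measure suits the imbalanced benchmark the abstract describes.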