Citation Mining: Integrating Text Mining and Bibliometrics for Research User Profiling.
ERIC Educational Resources Information Center
Kostoff, Ronald N.; del Rio, J. Antonio; Humenik, James A.; Garcia, Esther Ofilia; Ramirez, Ana Maria
2001-01-01
Discusses the importance of identifying the users and impact of research, and describes an approach for identifying the pathways through which research can impact other research, technology development, and applications. Describes a study that used citation mining, an integration of citation bibliometrics and text mining, on articles from the…
Geologic considerations in underground coal mining system design
NASA Technical Reports Server (NTRS)
Camilli, F. A.; Maynard, D. P.; Mangolds, A.; Harris, J.
1981-01-01
Geologic characteristics of coal resources which may impact new extraction technologies are identified and described to aid system designers and planners in their task of designing advanced coal extraction systems for the central Appalachian region. These geologic conditions are then organized into a matrix identified as the baseline mine concept. A sample region, eastern Kentucky, is analyzed using both the developed baseline mine concept and the traditional geologic investigative approach.
Mining integrated semantic networks for drug repositioning opportunities
Mullen, Joseph; Tipney, Hannah
2016-01-01
Current research and development approaches to drug discovery have become less fruitful and more costly. One alternative paradigm is that of drug repositioning. Many marketed examples of repositioned drugs have been identified through serendipitous or rational observations, highlighting the need for more systematic methodologies to tackle the problem. Systems level approaches have the potential to enable the development of novel methods to understand the action of therapeutic compounds, but require an integrative approach to biological data. Integrated networks can facilitate systems level analyses by combining multiple sources of evidence to provide a rich description of drugs, their targets and their interactions. Classically, such networks can be mined manually where a skilled person is able to identify portions of the graph (semantic subgraphs) that are indicative of relationships between drugs and highlight possible repositioning opportunities. However, this approach is not scalable. Automated approaches are required to systematically mine integrated networks for these subgraphs and bring them to the attention of the user. We introduce a formal framework for the definition of integrated networks and their associated semantic subgraphs for drug interaction analysis and describe DReSMin, an algorithm for mining semantically-rich networks for occurrences of a given semantic subgraph. This algorithm allows instances of complex semantic subgraphs that contain data about putative drug repositioning opportunities to be identified in a computationally tractable fashion, scaling close to linearly with network data. We demonstrate the utility of our approach by mining an integrated drug interaction network built from 11 sources. This work identified and ranked 9,643,061 putative drug-target interactions, showing a strong correlation between highly scored associations and those supported by literature. We discuss the 20 top ranked associations in more detail, of which 14 are novel and 6 are supported by the literature. We also show that our approach better prioritizes known drug-target interactions than other state-of-the-art approaches for predicting such interactions. PMID:26844016
A Data Mining Approach to Identify Sexuality Patterns in a Brazilian University Population.
Waleska Simões, Priscyla; Cesconetto, Samuel; Toniazzo de Abreu, Larissa Letieli; Côrtes de Mattos Garcia, Merisandra; Cassettari Junior, José Márcio; Comunello, Eros; Bisognin Ceretta, Luciane; Aparecida Manenti, Sandra
2015-01-01
This paper presents the profile and experience of sexuality generated from a data mining classification task. We used a database about sexuality and gender violence performed on a university population in southern Brazil. The data mining task identified two relationships between the variables, which enabled the distinction of subgroups that better detail the profile and experience of sexuality. The identification of the relationships between the variables defines behavioral models and factors of risk that will help define the algorithms being implemented in the data mining classification task.
Differentially Private Frequent Subgraph Mining
Xu, Shengzhi; Xiong, Li; Cheng, Xiang; Xiao, Ke
2016-01-01
Mining frequent subgraphs from a collection of input graphs is an important topic in data mining research. However, if the input graphs contain sensitive information, releasing frequent subgraphs may pose considerable threats to individuals' privacy. In this paper, we study the problem of frequent subgraph mining (FGM) under the rigorous differential privacy model. We introduce a novel differentially private FGM algorithm, which is referred to as DFG. In this algorithm, we first privately identify frequent subgraphs from input graphs, and then compute the noisy support of each identified frequent subgraph. In particular, to privately identify frequent subgraphs, we present a frequent subgraph identification approach which can improve the utility of frequent subgraph identifications through candidate pruning. Moreover, to compute the noisy support of each identified frequent subgraph, we devise a lattice-based noisy support derivation approach, where a series of methods has been proposed to improve the accuracy of the noisy supports. Through formal privacy analysis, we prove that our DFG algorithm satisfies ε-differential privacy. Extensive experimental results on real datasets show that the DFG algorithm can privately find frequent subgraphs with high data utility. PMID:27616876
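As a rough illustration of what a "noisy support" can look like, the sketch below applies the standard Laplace mechanism to a single support count. It is not the DFG algorithm or its lattice-based derivation; the function name `noisy_support` and the assumption that one input graph changes a support count by at most 1 (sensitivity 1) are ours.

```python
import random


def noisy_support(true_support, epsilon, sensitivity=1.0, rng=None):
    """Return true_support plus Laplace(0, sensitivity / epsilon) noise.

    Assumption: adding or removing one input graph changes the support
    of a subgraph by at most `sensitivity` (1 for a plain count).
    """
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace distribution centred on 0 with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_support + noise


if __name__ == "__main__":
    rng = random.Random(42)
    for eps in (0.1, 1.0, 10.0):
        print(eps, round(noisy_support(100, eps, rng=rng), 2))
```

A smaller ε gives a larger noise scale, so the released count is more private but less accurate.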
Citation-related reliability analysis for a pilot sample of underground coal mines.
Kinilakodi, Harisha; Grayson, R Larry
2011-05-01
The scrutiny of underground coal mine safety was heightened because of the disasters that occurred in 2006-2007, and more recently in 2010. In the aftermath of the 2006 incidents, the U.S. Congress passed the Mine Improvement and New Emergency Response Act of 2006 (MINER Act), which strengthened the existing regulations and mandated new laws to address various issues related to emergency preparedness and response, escape from an emergency situation, and protection of miners. The National Mining Association-sponsored Mine Safety Technology and Training Commission study highlighted the role of risk management in identifying and controlling major hazards, which are elements that could come together and cause a mine disaster. In 2007 MSHA revised its approach to the "Pattern of Violations" (POV) process in order to target unsafe mines and then force them to remediate conditions in their mines. The POV approach has certain limitations that make it difficult for it to be enforced. One very understandable way to focus on removing threats from major-hazard conditions is to use citation-related reliability analysis. The citation reliability approach, which focuses on the probability of not getting a citation on a given inspector day, is considered an analogue to the maintenance reliability approach, which many mine operators understand and use. In this study, the citation reliability approach was applied to a stratified random sample of 31 underground coal mines to examine its potential for broader application. The results clearly show the best-performing and worst-performing mines for compliance with mine safety standards, and they highlight differences among different mine sizes. Copyright © 2010 Elsevier Ltd. All rights reserved.
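The idea of a per-inspector-day reliability can be sketched as follows. This is only an illustration under our own assumption of a constant citation rate (the exponential model familiar from maintenance reliability); the abstract does not give the formula used in the study, and the mine names and counts are hypothetical.

```python
import math


def citation_reliability(citations, inspector_days):
    """Probability of no citation on one inspector day.

    Assumption: citations arrive at a constant rate, so the chance of
    zero citations in one day is exp(-rate), as in the exponential
    reliability model used for equipment maintenance.
    """
    rate = citations / inspector_days  # citations per inspector day
    return math.exp(-rate)


# Hypothetical mines: (citations, inspector days).
mines = {"Mine A": (12, 120), "Mine B": (90, 150), "Mine C": (40, 200)}
ranked = sorted(mines, key=lambda m: citation_reliability(*mines[m]), reverse=True)

if __name__ == "__main__":
    for name in ranked:
        print(name, round(citation_reliability(*mines[name]), 3))
```

Sorting by this value separates the best-performing from the worst-performing mines in the hypothetical sample.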
Improving Fraud and Abuse Detection in General Physician Claims: A Data Mining Study
Joudaki, Hossein; Rashidian, Arash; Minaei-Bidgoli, Behrouz; Mahmoodi, Mahmood; Geraili, Bijan; Nasiri, Mahdi; Arab, Mohammad
2016-01-01
Background: We aimed to identify the indicators of healthcare fraud and abuse in general physicians’ drug prescription claims, and to identify a subset of general physicians that were more likely to have committed fraud and abuse. Methods: We applied a data mining approach to a major health insurance organization dataset of private sector general physicians’ prescription claims. It involved 5 steps: clarifying the nature of the problem and objectives, data preparation, indicator identification and selection, cluster analysis to identify suspect physicians, and discriminant analysis to assess the validity of the clustering approach. Results: Thirteen indicators were developed in total. Over half of the general physicians (54%) were ‘suspects’ of conducting abusive behavior. The results also identified 2% of physicians as suspects of fraud. Discriminant analysis suggested that the indicators demonstrated adequate performance in the detection of physicians who were suspected of perpetrating fraud (98%) and abuse (85%) in a new sample of data. Conclusion: Our data mining approach will help health insurance organizations in low- and middle-income countries (LMICs) in streamlining auditing approaches towards the suspect groups rather than routine auditing of all physicians. PMID:26927587
Improving Fraud and Abuse Detection in General Physician Claims: A Data Mining Study.
Joudaki, Hossein; Rashidian, Arash; Minaei-Bidgoli, Behrouz; Mahmoodi, Mahmood; Geraili, Bijan; Nasiri, Mahdi; Arab, Mohammad
2015-11-10
We aimed to identify the indicators of healthcare fraud and abuse in general physicians' drug prescription claims, and to identify a subset of general physicians that were more likely to have committed fraud and abuse. We applied a data mining approach to a major health insurance organization dataset of private sector general physicians' prescription claims. It involved 5 steps: clarifying the nature of the problem and objectives, data preparation, indicator identification and selection, cluster analysis to identify suspect physicians, and discriminant analysis to assess the validity of the clustering approach. Thirteen indicators were developed in total. Over half of the general physicians (54%) were 'suspects' of conducting abusive behavior. The results also identified 2% of physicians as suspects of fraud. Discriminant analysis suggested that the indicators demonstrated adequate performance in the detection of physicians who were suspected of perpetrating fraud (98%) and abuse (85%) in a new sample of data. Our data mining approach will help health insurance organizations in low- and middle-income countries (LMICs) in streamlining auditing approaches towards the suspect groups rather than routine auditing of all physicians. © 2016 by Kerman University of Medical Sciences.
ERIC Educational Resources Information Center
Tsai, Yea-Ru; Ouyang, Chen-Sen; Chang, Yukon
2016-01-01
The purpose of this study is to propose a diagnostic approach to identify engineering students' English reading comprehension errors. Student data were collected during the process of reading texts of English for science and technology on a web-based cumulative sentence analysis system. For the analysis, the association-rule, data mining technique…
He, Qiwei; Veldkamp, Bernard P; Glas, Cees A W; de Vries, Theo
2017-03-01
Patients' narratives about traumatic experiences and symptoms are useful in clinical screening and diagnostic procedures. In this study, we presented an automated assessment system to screen patients for posttraumatic stress disorder via a natural language processing and text-mining approach. Four machine-learning algorithms (decision tree, naive Bayes, support vector machine, and an alternative classification approach called the product score model) were used in combination with n-gram representation models to identify patterns between verbal features in self-narratives and psychiatric diagnoses. With our sample, the product score model with unigrams attained the highest prediction accuracy when compared with practitioners' diagnoses. The addition of multigrams contributed most to balancing the metrics of sensitivity and specificity. This article also demonstrates that text mining is a promising approach for analyzing patients' self-expression behavior, thus helping clinicians identify potential patients from an early stage.
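To make the n-gram representation concrete, here is a minimal sketch of unigram and bigram feature extraction from a narrative. The whitespace tokenisation, the function names, and the sample sentence are our assumptions; the study's preprocessing and its product score model are not reproduced here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token tuples, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, max_n=2):
    """Count unigrams up to max_n-grams in a lower-cased, space-split text."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


if __name__ == "__main__":
    # Hypothetical narrative fragment.
    for gram, count in ngram_counts("I could not sleep").items():
        print(gram, count)
```

Counts like these would then serve as the input features for a classifier such as naive Bayes or a support vector machine.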
Service-based analysis of biological pathways
Zheng, George; Bouguettaya, Athman
2009-01-01
Background: Computer-based pathway discovery is concerned with two important objectives: pathway identification and analysis. Conventional mining and modeling approaches aimed at pathway discovery are often effective at achieving either objective, but not both. Such limitations can be effectively tackled by leveraging a Web service-based modeling and mining approach. Results: Inspired by molecular recognition and drug discovery processes, we developed a Web service mining tool, named PathExplorer, to discover potentially interesting biological pathways linking service models of biological processes. The tool uses an innovative approach to identify useful pathways based on graph-based hints and service-based simulation verifying the user's hypotheses. Conclusion: Web service modeling of biological processes allows easy access to and invocation of these processes on the Web. The Web service mining techniques described in this paper enable the discovery of biological pathways linking these process service models. The algorithms presented in this paper for automatically highlighting interesting subgraphs within an identified pathway network enable the user to formulate hypotheses, which can be tested using our simulation algorithm, also described in this paper. PMID:19796403
Identification of ex-sand mining area using optical and SAR imagery
NASA Astrophysics Data System (ADS)
Indriasari, Novie; Kusratmoko, Eko; Indra, Tito Latif; Julzarika, Atriyon
2018-05-01
Open mining in Sumedang Regency, in operation since 1984, has degraded the environment by leaving large ex-mining areas. Identifying these ex-mining areas, most of which were used for sand mining, is therefore crucial for detecting and monitoring recent environmental degradation. This research discusses the identification of ex-sand mining areas in Sumedang Regency using optical and SAR data. We used Landsat 5 TM imagery acquired on August 1, 2009 and Landsat 8 OLI imagery acquired on June 24, 2016, processed with the Tasseled Cap Transformation (TCT), to locate sand mining areas, while landform deformation was estimated from ALOS PALSAR (2009) and ALOS PALSAR 2 (2016) data processed with the SAR interferometry (InSAR) method. The results show that the TCT and InSAR methods can clearly identify ex-sand mining areas. In 2016 the total ex-mining area was 352.92 ha. Over the 7-year period since 2009, land deformation reached 7 meters.
Environmental considerations related to mining of nonfuel minerals
Seal, Robert R.; Piatak, Nadine M.; Kimball, Bryn E.; Hammarstrom, Jane M.; Schulz, Klaus J.; DeYoung, John H.; Seal, Robert R.; Bradley, Dwight C.
2017-12-19
Throughout most of human history, environmental stewardship during mining has not been a priority partly because of the lack of applicable laws and regulations and partly because of ignorance about the effects that mining can have on the environment. In the United States, the National Environmental Policy Act of 1969, in conjunction with related laws, codified a more modern approach to mining, including the responsibility for environmental stewardship, and provided a framework for incorporating environmental protection into mine planning. Today, similar frameworks are in place in the other developed countries of the world, and international mining companies generally follow similar procedures wherever they work in the world. The regulatory guidance has fostered an international effort among all stakeholders to identify best practices for environmental stewardship. The modern approach to mining using best practices involves the following: (a) establishment of a pre-mining baseline from which to monitor environmental effects during mining and help establish geologically reasonable closure goals; (b) identification of environmental risks related to mining through standardized approaches; and (c) formulation of an environmental closure plan before the start of mining. A key aspect of identifying the environmental risks and mitigating those risks is understanding how the risks vary from one deposit type to another, a concept that forms the basis for geoenvironmental mineral-deposit models. Accompanying the quest for best practices is the goal of making mining sustainable into the future. Sustainable mine development is generally considered to be development that meets the needs of the present generation without compromising the ability of future generations to meet their own needs. The concept extends beyond the availability of nonrenewable mineral commodities and includes the environmental and social effects of mine development. Global population growth, meanwhile, has decreased the percentage of inhabitable land available to support society’s material needs. Presently, the land area available to supply the mineral resources, energy resources, water, food, shelter, and waste disposal needs of all Earth’s inhabitants is estimated to be 135 square meters per person. Continued global population growth will only increase the challenges of sustainable mining. Current trends in mining are also expected to lead to new environmental challenges in the future, among which are mine-waste management issues related to mining larger deposits for lower ore grade; water-management issues related to both the mining of larger deposits and the changes in precipitation brought about by climate change; and greenhouse gas issues related to reducing the carbon footprint of larger, more energy-intensive mining operations.
Determining Plant – Leaf Miner – Parasitoid Interactions: A DNA Barcoding Approach
Derocles, Stéphane A. P.; Evans, Darren M.; Nichols, Paul C.; Evans, S. Aifionn; Lunt, David H.
2015-01-01
A major challenge in network ecology is to describe the full-range of species interactions in a community to create highly-resolved food-webs. We developed a molecular approach based on DNA full barcoding and mini-barcoding to describe difficult to observe plant – leaf miner – parasitoid interactions, consisting of animals commonly regarded as agricultural pests and their natural enemies. We tested the ability of universal primers to amplify the remaining DNA inside leaf miner mines after the emergence of the insect. We compared the results of a) morphological identification of adult specimens; b) identification based on the shape of the mines; c) the COI Mini-barcode (130 bp) and d) the COI full barcode (658 bp) fragments to accurately identify the leaf-miner species. We used the molecular approach to build and analyse a tri-partite ecological network of plant – leaf miner – parasitoid interactions. We were able to detect the DNA of leaf-mining insects within their feeding mines on a range of host plants using mini-barcoding primers: 6% for the leaves collected empty and 33% success after we observed the emergence of the leaf miner. We suggest that the low amplification success of leaf mines collected empty was mainly due to the time since the adult emerged and discuss methodological improvements. Nevertheless our approach provided new species-interaction data for the ecological network. We found that the 130 bp fragment is variable enough to identify all the species included in this study. Both COI fragments reveal that some leaf miner species could be composed of cryptic species. The network built using the molecular approach was more accurate in describing tri-partite interactions compared with traditional approaches based on morphological criteria. PMID:25710377
Perkins, William T; Bird, Graham; Jacobs, Suzanne R; Devoy, Cora
2016-03-01
Mine tailings represent a globally significant source of potentially harmful elements (PHEs) to the environment. The management of large volumes of mine tailings represents a major challenge to the mining industry and environmental managers. This field-scale study evaluates the impact of two highly contrasting remediation approaches to the management and stabilisation of mine tailings. The geochemistry of the tailings, overlying amendment layers and vegetation are examined in the light of the different management approaches. Pseudo-total As, Cd and Pb concentrations and solid-state partitioning (speciation), determined via sequential extraction, were established for two Tailings Management Facilities (TMFs) in Ireland subjected to the following: (1) a 'walk-away' approach (Silvermines) and (2) application of an amendment layer (Galmoy). PHE concentrations in roots and herbage of grasses growing on the TMFs were also determined. Results identify very different PHE concentration profiles with depth through the TMFs and the impact of remediation approach on concentrations and their potential bioavailability in the rooting zone of grass species. Data also highlight the importance of choice of grass species in remediation approaches and the benefits of relatively shallow-rooting Agrostis capillaris and Festuca rubra varieties. In addition, data from the Galmoy TMF indicate the importance of regional soil geochemistry for interpreting the influence of the PHE geochemistry of capping and amendment layers applied to mine tailings.
Can abstract screening workload be reduced using text mining? User experiences of the tool Rayyan.
Olofsson, Hanna; Brolund, Agneta; Hellberg, Christel; Silverstein, Rebecca; Stenström, Karin; Österberg, Marie; Dagerhamn, Jessica
2017-09-01
One time-consuming aspect of conducting systematic reviews is the task of sifting through abstracts to identify relevant studies. One promising approach for reducing this burden uses text mining technology to identify those abstracts that are potentially most relevant for a project, allowing those abstracts to be screened first. The aim was to examine the effectiveness of the text mining functionality of the abstract screening tool Rayyan. User experiences were collected. Rayyan was used to screen abstracts for 6 reviews in 2015. After screening 25%, 50%, and 75% of the abstracts, the screeners logged the relevant references identified. A survey was sent to users. After screening half of the search results with Rayyan, 86% to 99% of the references deemed relevant to the study were identified. Of those studies included in the final reports, 96% to 100% were already identified in the first half of the screening process. Users rated Rayyan 4.5 out of 5. The text mining function in Rayyan successfully helped reviewers identify relevant studies early in the screening process. Copyright © 2017 John Wiley & Sons, Ltd.
Mines Systems Safety Improvement Using an Integrated Event Tree and Fault Tree Analysis
NASA Astrophysics Data System (ADS)
Kumar, Ranjan; Ghosh, Achyuta Krishna
2017-04-01
Mine systems such as the ventilation system, strata support system, and flameproof safety equipment are exposed to dynamic operational conditions such as stress, humidity, dust, and temperature, and safety improvement of such systems is preferably done during the planning and design stage. However, the existing safety analysis methods do not handle the accident initiation and progression of mine systems explicitly. To bridge this gap, this paper presents an integrated Event Tree (ET) and Fault Tree (FT) approach for safety analysis and improvement of mine systems design. This approach includes ET and FT modeling coupled with a redundancy allocation technique. In this method, a concept of top hazard probability is introduced for identifying system failure probability, and redundancy is allocated to the system at either the component or the system level. A case study on mine methane explosion safety with two initiating events is performed. The results demonstrate that the presented method can reveal the accident scenarios and improve the safety of complex mine systems simultaneously.
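The effect of redundancy on a top hazard probability can be illustrated with a toy fault tree. The tree structure, the failure probabilities, and the independence of basic events are all assumptions made for this sketch; none of them come from the paper's case study.

```python
from math import prod


def or_gate(probs):
    """Output occurs if any independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probs)


def and_gate(probs):
    """Output occurs only if all independent input events occur."""
    return prod(probs)


def redundant(p_fail, copies):
    """Failure probability of `copies` identical parallel components."""
    return p_fail ** copies


def top_hazard(p_vent, p_monitor, p_ignition):
    """Hypothetical tree: methane accumulates if ventilation OR monitoring
    fails; an explosion needs accumulation AND an ignition source."""
    accumulation = or_gate([p_vent, p_monitor])
    return and_gate([accumulation, p_ignition])


# Hypothetical per-demand failure probabilities.
baseline = top_hazard(0.02, 0.05, 0.10)
improved = top_hazard(0.02, redundant(0.05, 2), 0.10)

if __name__ == "__main__":
    print(f"baseline={baseline:.6f} improved={improved:.6f}")
```

Duplicating the least reliable component (here the methane monitor) lowers the top hazard probability, which is the kind of comparison a redundancy allocation step would make across candidate components.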
Method to Select Technical Terms for Glossaries in Support of Joint Task Force Operations
2012-01-01
have been prohibitively time-consuming. Instead, we identified two publicly available terminology extractor tools: TerMine (NaCTEM, 2011) and Alchemy ...and that from the latter, by high recall. The Alchemy approach contrasts with that used in TerMine in that Alchemy will process the text with...information categories, such as person, location, and organization, in addition to returning topic keywords. Output from both TerMine and Alchemy
Iddamalgoda, Lahiru; Das, Partha S; Aponso, Achala; Sundararajan, Vijayaraghava S; Suravajhala, Prashanth; Valadi, Jayaraman K
2016-01-01
Data mining and pattern recognition methods reveal interesting findings in genetic studies, especially on how genetic makeup is associated with inherited diseases. Although researchers have proposed various data mining models for biomedical approaches, there remains a challenge in accurately prioritizing the single nucleotide polymorphisms (SNPs) associated with a disease. In this commentary, we review the state-of-the-art data mining and pattern recognition models for identifying inherited diseases and consider the need for binary classification- and scoring-based prioritization methods in determining causal variants. While we discuss the known pros and cons of these methods, we argue that gene prioritization methods and protein interaction (PPI) methods, in conjunction with K-nearest neighbors, could be used to accurately categorize the genetic factors in disease causation.
Monitoring the growth or decline of vegetation on mine dumps
NASA Technical Reports Server (NTRS)
Gilbertson, B. P. (Principal Investigator)
1975-01-01
The author has identified the following significant results. It was established that particular mine dumps throughout the entire test area can be detected and identified. It was also established that patterns of vegetative growth on the mine dumps can be recognized from a simple visual analysis of photographic images. Because vegetation tends to occur in patches on many mine dumps, it is unsatisfactory to classify complete dumps into categories of percentage vegetative cover. A more desirable approach is to classify the patches of vegetation themselves. The coarse resolution of conventional densitometers restricts the accuracy of this procedure, and consequently a direct analysis of ERTS CCTs is preferred. A set of computer programs was written to perform the data reading and manipulating functions required for basic CCT analysis.
Frequent Itemset Hiding Algorithm Using Frequent Pattern Tree Approach
ERIC Educational Resources Information Center
Alnatsheh, Rami
2012-01-01
A problem that has been the focus of much recent research in privacy preserving data-mining is the frequent itemset hiding (FIH) problem. Identifying itemsets that appear together frequently in customer transactions is a common task in association rule mining. Organizations that share data with business partners may consider some of the frequent…
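For readers unfamiliar with frequent itemsets, the sketch below counts itemset support by brute force over a handful of transactions. It is only an illustration of the support and threshold concepts: it does not use a frequent pattern tree and does not perform hiding, and the transactions and function name are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets of up to max_size items
    that appear in at least min_support transactions (brute force)."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(transaction)
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


# Hypothetical customer transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

if __name__ == "__main__":
    for itemset, support in frequent_itemsets(transactions, 3).items():
        print(sorted(itemset), support)
```

In these terms, hiding a sensitive itemset means altering the shared data so that its support drops below the mining threshold.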
What Satisfies Students?: Mining Student-Opinion Data with Regression and Decision Tree Analysis
ERIC Educational Resources Information Center
Thomas, Emily H.; Galambos, Nora
2004-01-01
To investigate how students' characteristics and experiences affect satisfaction, this study uses regression and decision tree analysis with the CHAID algorithm to analyze student-opinion data. A data mining approach identifies the specific aspects of students' university experience that most influence three measures of general satisfaction. The…
A science-based, watershed strategy to support effective remediation of abandoned mine lands
Buxton, Herbert T.; Nimick, David A.; Von Guerard, Paul; Church, Stan E.; Frazier, Ann G.; Gray, John R.; Lipin, Bruce R.; Marsh, Sherman P.; Woodward, Daniel F.; Kimball, Briant A.; Finger, Susan E.; Ischinger, Lee S.; Fordham, John C.; Power, Martha S.; Bunch, Christine M.; Jones, John W.
1997-01-01
A U.S. Geological Survey Abandoned Mine Lands Initiative will develop a strategy for gathering and communicating the scientific information needed to formulate effective and cost-efficient remediation of abandoned mine lands. A watershed approach will identify, characterize, and remediate contaminated sites that have the most profound effect on water and ecosystem quality within a watershed. The Initiative will be conducted during 1997 through 2001 in two pilot watersheds, the Upper Animas River watershed in Colorado and the Boulder River watershed in Montana. Initiative efforts are being coordinated with the U.S. Forest Service, Bureau of Land Management, National Park Service, and other stakeholders which are using the resulting scientific information to design and implement remediation activities. The Initiative has the following eight objective-oriented components: estimate background (pre-mining) conditions; define baseline (current) conditions; identify target sites (major contaminant sources); characterize target sites and processes affecting contaminant dispersal; characterize ecosystem health and controlling processes at target sites; develop remediation goals and monitoring network; provide an integrated, quality-assured and accessible data network; and document lessons learned for future applications of the watershed approach.
May, Brian H; Zhang, Anthony; Lu, Yubo; Lu, Chuanjian; Xue, Charlie C L
2014-12-01
This project aimed to develop an approach to evaluating information contained in the premodern Traditional Chinese Medicine (TCM) literature that was (1) comprehensive, systematic, and replicable and (2) able to produce quantifiable output that could be used to answer specific research questions in order to identify natural products for clinical and experimental research. The project involved two stages. In stage 1, 14 TCM collections and compendia were evaluated for suitability as sources for searching; 8 of these were compared in detail. The results were published in the Journal of Alternative and Complementary Medicine. Stage 2 developed a text-mining approach for two of these sources. The text-mining approach was developed for Zhong Hua Yi Dian (Encyclopaedia of Traditional Chinese Medicine, 4th edition) and Zhong Yi Fang Ji Da Ci Dian (Great Compendium of Chinese Medical Formulae). This approach developed procedures for search term selection; methods for screening, classifying, and scoring data; procedures for systematic searching and data extraction; data checking procedures; and approaches for analyzing results. Examples are provided for studies of memory impairment and diabetic nephropathy, and issues relating to data interpretation are discussed. This approach to the analysis of large collections of the premodern TCM literature uses widely available sources and provides a text-mining approach that is systematic, replicable, and adaptable to the requirements of the particular project. Researchers can use these methods to explore changes in the names and conceptions of a disease over time, to identify which therapeutic methods have been more or less frequently used in different eras for particular disorders, and to assist in the selection of natural products for research efforts.
Using data mining techniques to characterize participation in observational studies.
Linden, Ariel; Yarnold, Paul R
2016-12-01
Data mining techniques are gaining in popularity among health researchers for an array of purposes, such as improving diagnostic accuracy, identifying high-risk patients and extracting concepts from unstructured data. In this paper, we describe how these techniques can be applied to another area in the health research domain: identifying characteristics of individuals who do and do not choose to participate in observational studies. In contrast to randomized studies where individuals have no control over their treatment assignment, participants in observational studies self-select into the treatment arm and therefore have the potential to differ in their characteristics from those who elect not to participate. These differences may explain part, or all, of the difference in the observed outcome, making it crucial to assess whether there is differential participation based on observed characteristics. As compared to traditional approaches to this assessment, data mining offers a more precise understanding of these differences. To describe and illustrate the application of data mining in this domain, we use data from a primary care-based medical home pilot programme and compare the performance of commonly used classification approaches - logistic regression, support vector machines, random forests and classification tree analysis (CTA) - in correctly classifying participants and non-participants. We find that CTA is substantially more accurate than the other models. Moreover, unlike the other models, CTA offers transparency in its computational approach, ease of interpretation via the decision rules produced and provides statistical results familiar to health researchers. Beyond their application to research, data mining techniques could help administrators to identify new candidates for participation who may most benefit from the intervention. © 2016 John Wiley & Sons, Ltd.
Small, Aeron M; Kiss, Daniel H; Zlatsin, Yevgeny; Birtwell, David L; Williams, Heather; Guerraty, Marie A; Han, Yuchi; Anwaruddin, Saif; Holmes, John H; Chirinos, Julio A; Wilensky, Robert L; Giri, Jay; Rader, Daniel J
2017-08-01
Interrogation of the electronic health record (EHR) using billing codes as a surrogate for diagnoses of interest has been widely used for clinical research. However, the accuracy of this methodology is variable, as it reflects billing codes rather than severity of disease, and depends on the disease and the accuracy of the coding practitioner. Systematic application of text mining to the EHR has had variable success for the detection of cardiovascular phenotypes. We hypothesize that the application of text mining algorithms to cardiovascular procedure reports may be a superior method to identify patients with cardiovascular conditions of interest. We adapted the Oracle product Endeca, which utilizes text mining to identify terms of interest from a NoSQL-like database, for purposes of searching cardiovascular procedure reports and termed the tool "PennSeek". We imported 282,569 echocardiography reports representing 81,164 individuals and 27,205 cardiac catheterization reports representing 14,567 individuals from non-searchable databases into PennSeek. We then applied clinical criteria to these reports in PennSeek to identify patients with trileaflet aortic stenosis (TAS) and coronary artery disease (CAD). Accuracy of patient identification by text mining through PennSeek was compared with ICD-9 billing codes. Text mining identified 7115 patients with TAS and 9247 patients with CAD. ICD-9 codes identified 8272 patients with TAS and 6913 patients with CAD. 4346 patients with AS and 6024 patients with CAD were identified by both approaches. A randomly selected sample of 200-250 patients uniquely identified by text mining was compared with 200-250 patients uniquely identified by billing codes for both diseases. We demonstrate that text mining was superior, with a positive predictive value (PPV) of 0.95 compared to 0.53 by ICD-9 for TAS, and a PPV of 0.97 compared to 0.86 for CAD. 
These results highlight the superiority of text mining algorithms applied to electronic cardiovascular procedure reports in the identification of phenotypes of interest for cardiovascular research. Copyright © 2017. Published by Elsevier Inc.
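The counts reported above can be turned into a small worked example. The sketch below derives how many patients each method identified uniquely, using only the totals and overlaps quoted in the abstract, and shows how a positive predictive value is computed. The chart-review counts passed to `ppv` are hypothetical, chosen only to illustrate the formula; the abstract reports the resulting PPVs and a sample size of 200-250, not the underlying tallies.

```python
def unique_counts(found_by_a, found_by_b, found_by_both):
    """Return (only_a, only_b): patients identified by exactly one method."""
    return found_by_a - found_by_both, found_by_b - found_by_both


def ppv(true_positives, false_positives):
    """Positive predictive value: confirmed cases / all flagged cases."""
    return true_positives / (true_positives + false_positives)


# Totals quoted in the abstract (text mining, ICD-9, identified by both).
tas_text_only, tas_icd_only = unique_counts(7115, 8272, 4346)
cad_text_only, cad_icd_only = unique_counts(9247, 6913, 6024)

# Hypothetical review of 200 sampled charts, 190 of them confirmed.
example_ppv = ppv(true_positives=190, false_positives=10)

print(tas_text_only, tas_icd_only, cad_text_only, cad_icd_only, example_ppv)
```

The subtraction makes explicit that the validation samples were drawn from the patients each method found on its own, not from the overlap.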
Realising the knowledge spiral in healthcare: the role of data mining and knowledge management.
Wickramasinghe, Nilmini; Bali, Rajeev K; Gibbons, M Chris; Schaffer, Jonathan
2008-01-01
Knowledge Management (KM) is an emerging business approach aimed at solving current problems such as competitiveness and the need to innovate which are faced by businesses today. The premise for the need for KM is based on a paradigm shift in the business environment where knowledge is central to organizational performance. Organizations trying to embrace KM have many tools, techniques and strategies at their disposal. A vital technique in KM is data mining which enables critical knowledge to be gained from the analysis of large amounts of data and information. The healthcare industry is a very information rich industry. The collecting of data and information permeates most, if not all areas of this industry; however, the healthcare industry has yet to fully embrace KM, let alone the new evolving techniques of data mining. In this paper, we demonstrate the ubiquitous benefits of data mining and KM to healthcare by highlighting their potential to enable and facilitate superior clinical practice and administrative management to ensue. Specifically, we show how data mining can realize the knowledge spiral by effecting the four key transformations identified by Nonaka of turning: (1) existing explicit knowledge to new explicit knowledge, (2) existing explicit knowledge to new tacit knowledge, (3) existing tacit knowledge to new explicit knowledge and (4) existing tacit knowledge to new tacit knowledge. This is done through the establishment of theoretical models that respectively identify the function of the knowledge spiral and the powers of data mining, both exploratory and predictive, in the knowledge discovery process. Our models are then applied to a healthcare data set to demonstrate the potential of this approach as well as the implications of such an approach to the clinical and administrative aspects of healthcare. 
Further, we demonstrate how these techniques can facilitate hospitals to address the six healthcare quality dimensions identified by the Committee for Quality Healthcare.
Zhang, Kai; Ren, Fang; Wang, Xuelong; Hu, Enyuan; Xu, Yahong; Yang, Xiao-Qing; Li, Hong; Chen, Liquan; Pianetta, Piero; Mehta, Apurva; Yu, Xiqian; Liu, Yijin
2017-12-13
The in-depth understanding of the minority phases' roles in functional materials, e.g., batteries, is critical for optimizing the system performance and the operational efficiency. Although the visualization of battery electrode under operating conditions has been demonstrated, the development of advanced data-mining approaches is still needed in order to identify minority phases and to understand their functionalities. The present study uses nanoscale X-ray spectromicroscopy to study a functional LiCoO2/Li battery pouch cell. The data-mining approaches developed herein were used to search through over 10 million X-ray absorption spectra that cover more than 100 active cathode particles. Two particles with unanticipated chemical fingerprints were identified and further analyzed, providing direct evidence and valuable insight into the undesired side reactions involving the cation dissolution and precipitation as well as the local overlithiation-caused subparticle domain deactivation. The data-mining approach described in this work is widely applicable to many other structurally complex and chemically heterogeneous systems, in which the secondary/minority phases could critically affect the overall performance of the system, well beyond battery research.
ERIC Educational Resources Information Center
Thomas, Emily H.; Galambos, Nora
To investigate how students' characteristics and experiences affect satisfaction, this study used regression and decision-tree analysis with the CHAID algorithm to analyze student opinion data from a sample of 1,783 college students. A data-mining approach identifies the specific aspects of students' university experience that most influence three…
Identifying antecedent conditions responsible for the high rate of mining injuries in Zambia.
Miller, Hugh B; Sinkala, Thomson; Renger, Ralph F; Peacock, Erin M; Tabor, Joseph A; Burgess, Jefferey L
2006-01-01
The incident rates of mining-related accidents and injuries in developing countries exceed those of developed nations. Interventions by international organizations routinely fail to produce appreciable long-term improvement. One major reason is the inability to identify and analyze the underlying factors responsible for creating unsafe working conditions. Understanding these antecedent conditions is necessary to formulate effective intervention strategies and prioritize the use of limited resources. This study utilized a logic model approach to determine the root causes and broad categories of potential interventions for mining accidents and injuries in Zambia. Results showed that policy interventions have the greatest potential for substantive change. A process of educating officials from government and mining companies about the economic and social merits of health and safety programs and extensive changes in regulatory structure and enforcement are needed.
Association rule mining in the US Vaccine Adverse Event Reporting System (VAERS).
Wei, Lai; Scott, John
2015-09-01
Spontaneous adverse event reporting systems are critical tools for monitoring the safety of licensed medical products. Commonly used signal detection algorithms identify disproportionate product-adverse event pairs and may not be sensitive to more complex potential signals. We sought to develop a computationally tractable multivariate data-mining approach to identify product-multiple adverse event associations. We describe an application of stepwise association rule mining (Step-ARM) to detect potential vaccine-symptom group associations in the US Vaccine Adverse Event Reporting System. Step-ARM identifies strong associations between one vaccine and one or more adverse events. To reduce the number of redundant association rules found by Step-ARM, we also propose a clustering method for the post-processing of association rules. In sample applications to a trivalent intradermal inactivated influenza virus vaccine and to measles, mumps, rubella, and varicella (MMRV) vaccine and in simulation studies, we find that Step-ARM can detect a variety of medically coherent potential vaccine-symptom group signals efficiently. In the MMRV example, Step-ARM appears to outperform univariate methods in detecting a known safety signal. Our approach is sensitive to potentially complex signals, which may be particularly important when monitoring novel medical countermeasure products such as pandemic influenza vaccines. The post-processing clustering algorithm improves the applicability of the approach as a screening method to identify patterns that may merit further investigation. Copyright © 2015 John Wiley & Sons, Ltd.
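To make the idea of a vaccine-symptom group association concrete, here is a minimal sketch of the standard support, confidence, and lift measures for a rule with one vaccine on the left and several adverse events on the right. It is not the Step-ARM algorithm itself, whose stepwise search and clustering post-processing are not specified in the abstract. The report data and the names `vaccine_A`, `vaccine_B`, `support`, and `rule_metrics` are hypothetical.

```python
# Each report is the set of items (vaccine and symptoms) it mentions.
# Toy data invented for illustration; not drawn from VAERS.
reports = [
    {"vaccine_A", "fever", "rash"},
    {"vaccine_A", "fever"},
    {"vaccine_A", "fever", "rash"},
    {"vaccine_B", "headache"},
    {"vaccine_B", "fever"},
    {"vaccine_A", "headache"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for antecedent -> consequent."""
    joint = support(antecedent | consequent, transactions)
    confidence = joint / support(antecedent, transactions)
    lift = confidence / support(consequent, transactions)
    return joint, confidence, lift


# One vaccine associated with a group of two adverse events.
rule_support, rule_confidence, rule_lift = rule_metrics(
    {"vaccine_A"}, {"fever", "rash"}, reports
)
print(rule_support, rule_confidence, rule_lift)
```

A lift above 1 means the symptom group appears with the vaccine more often than independence would predict, which is the kind of disproportionality a screening method would flag for further review.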
Mining Consumer Health Vocabulary from Community-Generated Text
Vydiswaran, V.G. Vinod; Mei, Qiaozhu; Hanauer, David A.; Zheng, Kai
2014-01-01
Community-generated text corpora can be a valuable resource to extract consumer health vocabulary (CHV) and link them to professional terminologies and alternative variants. In this research, we propose a pattern-based text-mining approach to identify pairs of CHV and professional terms from Wikipedia, a large text corpus created and maintained by the community. A novel measure, leveraging the ratio of frequency of occurrence, was used to differentiate consumer terms from professional terms. We empirically evaluated the applicability of this approach using a large data sample consisting of MedLine abstracts and all posts from an online health forum, MedHelp. The results show that the proposed approach is able to identify synonymous pairs and label the terms as either consumer or professional term with high accuracy. We conclude that the proposed approach provides great potential to produce a high quality CHV to improve the performance of computational applications in processing consumer-generated health text. PMID:25954426
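The abstract says only that the measure leverages the ratio of frequency of occurrence, so the following is one plausible reading rather than the authors' formula: compare a term's relative frequency in a consumer corpus with its relative frequency in a professional corpus, and label the member of a synonym pair with the higher ratio as the consumer term. All counts, corpus sizes, the add-one smoothing, and the function names are assumptions made for illustration.

```python
def consumer_ratio(term, consumer_counts, professional_counts,
                   consumer_total, professional_total, smoothing=1.0):
    """Relative frequency in the consumer corpus divided by relative
    frequency in the professional corpus (smoothed to avoid division by zero)."""
    consumer_freq = (consumer_counts.get(term, 0) + smoothing) / consumer_total
    professional_freq = (professional_counts.get(term, 0) + smoothing) / professional_total
    return consumer_freq / professional_freq


def label_pair(term_a, term_b, **corpus):
    """Label the term with the higher ratio as 'consumer', the other as 'professional'."""
    ratio_a = consumer_ratio(term_a, **corpus)
    ratio_b = consumer_ratio(term_b, **corpus)
    if ratio_a >= ratio_b:
        return {term_a: "consumer", term_b: "professional"}
    return {term_a: "professional", term_b: "consumer"}


# Hypothetical counts standing in for a health forum and a set of abstracts.
corpus = dict(
    consumer_counts={"heart attack": 120, "myocardial infarction": 5},
    professional_counts={"heart attack": 30, "myocardial infarction": 300},
    consumer_total=10_000,
    professional_total=10_000,
)

labels = label_pair("heart attack", "myocardial infarction", **corpus)
print(labels)
```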
Tilbury, Trudy; Sanderson, Liz
2012-01-01
Queensland Mining has a strong focus on safety performance, but risk management of health, including Musculoskeletal Disorders (MSDs) continues to have a lower priority. The reliance on individual screening of workers and lower level approaches such as manual handling training is part of the coal mining 'culture'. Initiatives such as the New South Wales and Queensland Mining joint project to develop good practice guidance for mining have allowed for a more consistent message on participatory ergonomics and prevention of MSD. An evidence based practice approach, including the introduction of participatory ergonomics and safe design principles, was proposed to Anglo American Coal operations in Queensland. The project consisted of a skills analysis of current health personnel, design of a facilitated participatory ergonomics training program, site visits to identify good practice and champions, and a graduated mentoring program for health personnel. Early results demonstrate a number of sites are benefiting from site taskforces with a focus on positive performance outcomes.
Gene prioritization and clustering by multi-view text mining
2010-01-01
Background Text mining has become a useful tool for biologists trying to understand the genetics of diseases. In particular, it can help identify the most interesting candidate genes for a disease for further experimental analysis. Many text mining approaches have been introduced, but the effect of disease-gene identification varies in different text mining models. Thus, the idea of incorporating more text mining models may be beneficial to obtain more refined and accurate knowledge. However, how to effectively combine these models still remains a challenging question in machine learning. In particular, it is a non-trivial issue to guarantee that the integrated model performs better than the best individual model. Results We present a multi-view approach to retrieve biomedical knowledge using different controlled vocabularies. These controlled vocabularies are selected on the basis of nine well-known bio-ontologies and are applied to index the vast amounts of gene-based free-text information available in the MEDLINE repository. The text mining result specified by a vocabulary is considered as a view and the obtained multiple views are integrated by multi-source learning algorithms. We investigate the effect of integration in two fundamental computational disease gene identification tasks: gene prioritization and gene clustering. The performance of the proposed approach is systematically evaluated and compared on real benchmark data sets. In both tasks, the multi-view approach demonstrates significantly better performance than other comparing methods. Conclusions In practical research, the relevance of specific vocabulary pertaining to the task is usually unknown. In such case, multi-view text mining is a superior and promising strategy for text-based disease gene identification. PMID:20074336
Combined mining: discovering informative knowledge in complex data.
Cao, Longbing; Zhang, Huaifeng; Zhao, Yanchang; Luo, Dan; Zhang, Chengqi
2011-06-01
Enterprise data mining applications often involve complex data such as multiple large heterogeneous data sources, user preferences, and business impact. In such situations, a single method or one-step mining is often limited in discovering informative knowledge. It would also be very time and space consuming, if not impossible, to join relevant large data sources for mining patterns consisting of multiple aspects of information. It is crucial to develop effective approaches for mining patterns combining necessary information from multiple relevant business lines, catering for real business settings and decision-making actions rather than just providing a single line of patterns. The recent years have seen increasing efforts on mining more informative patterns, e.g., integrating frequent pattern mining with classifications to generate frequent pattern-based classifiers. Rather than presenting a specific algorithm, this paper builds on our existing works and proposes combined mining as a general approach to mining for informative patterns combining components from either multiple data sets or multiple features or by multiple methods on demand. We summarize general frameworks, paradigms, and basic processes for multifeature combined mining, multisource combined mining, and multimethod combined mining. Novel types of combined patterns, such as incremental cluster patterns, can result from such frameworks, which cannot be directly produced by the existing methods. A set of real-world case studies has been conducted to test the frameworks, with some of them briefed in this paper. They identify combined patterns for informing government debt prevention and improving government service objectives, which show the flexibility and instantiation capability of combined mining in discovering informative knowledge in complex data.
PubMedMiner: Mining and Visualizing MeSH-based Associations in PubMed.
Zhang, Yucan; Sarkar, Indra Neil; Chen, Elizabeth S
2014-01-01
The exponential growth of biomedical literature provides the opportunity to develop approaches for facilitating the identification of possible relationships between biomedical concepts. Indexing by Medical Subject Headings (MeSH) represents high-quality summaries of much of this literature that can be used to support hypothesis generation and knowledge discovery tasks using techniques such as association rule mining. Based on a survey of literature mining tools, a tool implemented using Ruby and R - PubMedMiner - was developed in this study for mining and visualizing MeSH-based associations for a set of MEDLINE articles. To demonstrate PubMedMiner's functionality, a case study was conducted that focused on identifying and comparing comorbidities for asthma in children and adults. Relative to the tools surveyed, the initial results suggest that PubMedMiner provides complementary functionality for summarizing and comparing topics as well as identifying potentially new knowledge.
Mining the human gut microbiota for effector strains that shape the immune system
Ahern, Philip P.; Faith, Jeremiah J.; Gordon, Jeffrey I.
2014-01-01
Summary The gut microbiota co-develops with the immune system beginning at birth. Mining the microbiota for bacterial strains responsible for shaping the structure and dynamic operations of the innate and adaptive arms of the immune system represents a formidable combinatorial problem but one that needs to be overcome to advance mechanistic understanding of microbial community-immune system co-regulation, and in order to develop new diagnostic and therapeutic approaches that promote health. Here, we discuss a scalable, less biased approach for identifying effector strains in complex microbial communities that impact immune function. The approach begins by identifying uncultured human fecal microbiota samples that transmit immune phenotypes to germ-free mice. Clonally-arrayed sequenced collections of bacterial strains are constructed from representative donor microbiota. If the collection transmits phenotypes, effector strains are identified by testing randomly generated subsets with overlapping membership in individually-housed germ-free animals. Detailed mechanistic studies of effector strain-host interactions can then be performed. PMID:24950201
NASA Astrophysics Data System (ADS)
Kinilakodi, Harisha
The underground coal mining industry has been under constant watch due to the high risk involved in its activities, and scrutiny increased because of the disasters that occurred in 2006-07. In the aftermath of the incidents, the U.S. Congress passed the Mine Improvement and New Emergency Response Act of 2006 (MINER Act), which strengthened the existing regulations and mandated new laws to address the various issues related to a safe working environment in the mines. Risk analysis in any form should be done on a regular basis to tackle the possibility of unwanted major hazard-related events such as explosions, outbursts, airbursts, inundations, spontaneous combustion, and roof fall instabilities. One of the responses by the Mine Safety and Health Administration (MSHA) in 2007 involved a new pattern of violations (POV) process to target mines with a poor safety performance, specifically to improve their safety. However, the 2010 disaster (worst in 40 years) gave an impression that the collective effort of the industry, federal/state agencies, and researchers to achieve the goal of zero fatalities and serious injuries has gone awry. The Safe Performance Index (SPI) methodology developed in this research is a straight-forward, effective, transparent, and reproducible approach that can help in identifying and addressing some of the existing issues while targeting (poor safety performance) mines which need help. It combines three injury and three citation measures that are scaled to have an equal mean (5.0) in a balanced way with proportionate weighting factors (0.05, 0.15, 0.30) and overall normalizing factor (15) into a mine safety performance evaluation tool. It can be used to assess the relative safety-related risk of mines, including by mine-size category. 
Using 2008 and 2009 data, comparisons were made of SPI-associated, normalized safety performance measures across mine-size categories, with emphasis on small-mine safety performance as compared to large- and medium-sized mines. The accident rates (NDL IR, NFDL IR, SM/100) of very small and small mines in 2008 and 2009 were less than those of medium and large mines. The data indicates a heavy occurrence of very severe injuries in a number of very small and small mines. In another application which is a part of this research, the six normalized safety measures and the SPI are used to evaluate the risk that existed at mines in the two years preceding the occurrence of a fatality. This mine safety performance tracking method could have been helpful to the companies, state agency, or MSHA in recognizing and addressing emerging problems with actions that may have been able to prevent high-risk conditions, the fatality, and/or other serious injuries. The approach would have given scrutiny to the risk of mines that encompassed 74% of the fatalities during 2007-2010. In order to assess the SPI as a comparable risk measurement tool, a traditional risk approach is also developed using data embracing frequency and severity in the final equation to analyze the relative risk for all underground coal mines for the years 2007-2010. Then, the SPI is compared with this traditional risk analysis method to demonstrate that the results attained by either method provide the relative safety-related risk of underground coal mines regarding injuries and citations for violations of regulations. The comparison reveals that the SPI does emulate a traditional approach to risk analysis. 
A correlation coefficient of -0.89 or more was observed between the results of these two methodologies and either can be used to assist companies, the Mine Safety and Health Administration (MSHA), or state agencies in targeting mines with high risk for serious injuries and elevated citations for remediation of their injury and/or violation experience. The SPI, however, provides a more understandable approach for mine operators to apply using measures compatible with MSHA's enforcement tools. These methodologies form an all-encompassing approach that can be used to assist companies, the MSHA, or state agencies in targeting mines with high risk for serious injuries and elevated citations. Once targeted as high risk, mines can then pursue appropriate intervention to remediate their violation and/or injury experience. This research may help in plugging the gap in the safety system and better pursue the goal of zero fatalities and serious injuries in the underground coal mines.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Kai; Ren, Fang; Wang, Xuelong; ...
2017-11-08
The in-depth understanding of the minority phases’ roles in functional materials, e.g., batteries, is critical for optimizing the system performance and the operational efficiency. Although the visualization of battery electrode under operating conditions has been demonstrated, the development of advanced data-mining approaches is still needed in order to identify minority phases and to understand their functionalities. The present study uses nanoscale X-ray spectromicroscopy to study a functional LiCoO2/Li battery pouch cell. The data-mining approaches developed herein were used to search through over 10 million X-ray absorption spectra that cover more than 100 active cathode particles. Two particles with unanticipated chemical fingerprints were identified and further analyzed, providing direct evidence and valuable insight into the undesired side reactions involving the cation dissolution and precipitation as well as the local overlithiation-caused subparticle domain deactivation. As a result, the data-mining approach described in this work is widely applicable to many other structurally complex and chemically heterogeneous systems, in which the secondary/minority phases could critically affect the overall performance of the system, well beyond battery research.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shumway, R.H.; McQuarrie, A.D.
Robust statistical approaches to the problem of discriminating between regional earthquakes and explosions are developed. We compare linear discriminant analysis using descriptive features like amplitude and spectral ratios with signal discrimination techniques using the original signal waveforms and spectral approximations to the log likelihood function. Robust information theoretic techniques are proposed and all methods are applied to 8 earthquakes and 8 mining explosions in Scandinavia and to an event from Novaya Zemlya of unknown origin. It is noted that signal discrimination approaches based on discrimination information and Renyi entropy perform better in the test sample than conventional methods based on spectral ratios involving the P and S phases. Two techniques for identifying the ripple-firing pattern for typical mining explosions are proposed and shown to work well on simulated data and on several Scandinavian earthquakes and explosions. We use both cepstral analysis in the frequency domain and a time domain method based on the autocorrelation and partial autocorrelation functions. The proposed approach strips off underlying smooth spectral and seasonal spectral components corresponding to the echo pattern induced by two simple ripple-fired models. For two mining explosions, a pattern is identified whereas for two earthquakes, no pattern is evident.
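The time-domain idea can be illustrated with a much simpler stand-in than the models in the report. The sketch below assumes a single delayed echo added to white noise, a crude proxy for a ripple-fired source, and shows that the sample autocorrelation peaks at the echo delay. The signal model, the parameter values, and the function names are assumptions; the report's own method also uses partial autocorrelations and removes smooth spectral components, which this sketch does not attempt.

```python
import random


def simulate_ripple_fired(n, delay, echo_gain, seed=0):
    """White-noise source plus one delayed, scaled copy of itself."""
    rng = random.Random(seed)
    source = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [
        source[t] + (echo_gain * source[t - delay] if t >= delay else 0.0)
        for t in range(n)
    ]


def autocorrelation(signal, max_lag):
    """Sample autocorrelation for lags 0..max_lag, normalized so lag 0 equals 1."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    variance = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / variance
        for lag in range(max_lag + 1)
    ]


signal = simulate_ripple_fired(n=2000, delay=25, echo_gain=0.8)
acf = autocorrelation(signal, max_lag=60)

# The lag (excluding 0) with the largest autocorrelation estimates the echo delay.
best_lag = max(range(1, 61), key=lambda lag: acf[lag])
print(best_lag)
```

For a signal without an echo, such as plain white noise, no lag stands out in this way, which mirrors the abstract's observation that earthquakes show no evident pattern.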
Intelligent bar chart plagiarism detection in documents.
Al-Dabbagh, Mohammed Mumtaz; Salim, Naomie; Rehman, Amjad; Alkawaz, Mohammed Hazim; Saba, Tanzila; Al-Rodhaan, Mznah; Al-Dhelaan, Abdullah
2014-01-01
This paper presents a novel features mining approach from documents that could not be mined via optical character recognition (OCR). By identifying the intimate relationship between the text and graphical components, the proposed technique pulls out the Start, End, and Exact values for each bar. Furthermore, the word 2-gram and Euclidean distance methods are used to accurately detect and determine plagiarism in bar charts.
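As a rough illustration of the two comparison methods named above, the sketch below scores textual similarity with word 2-grams and numeric similarity with Euclidean distance over bar values. How the paper combines the two, and which set-overlap score it applies to the 2-grams, is not stated in the abstract; the Jaccard overlap used here, the sample captions and values, and the function names are assumptions.

```python
import math


def word_bigrams(text):
    """Set of adjacent word pairs in lower-cased text."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def bigram_similarity(text_a, text_b):
    """Jaccard overlap of word 2-grams (1.0 means identical bigram sets)."""
    a, b = word_bigrams(text_a), word_bigrams(text_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def euclidean_distance(values_a, values_b):
    """Euclidean distance between two equal-length sequences of bar values."""
    if len(values_a) != len(values_b):
        raise ValueError("bar charts must have the same number of bars")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(values_a, values_b)))


# Hypothetical captions and extracted bar heights for two charts.
caption_similarity = bigram_similarity(
    "annual sales by region in 2012", "annual sales by region in 2013"
)
value_distance = euclidean_distance([3.0, 4.0, 5.0], [3.0, 4.0, 8.0])
print(caption_similarity, value_distance)
```

A high caption similarity together with a small value distance would be the kind of evidence that suggests one chart was copied from another.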
ERIC Educational Resources Information Center
Cho, Moon-Heum; Yoo, Jin Soung
2017-01-01
Many researchers who are interested in studying students' online self-regulated learning (SRL) have heavily relied on self-reported surveys. Data mining is an alternative technique that can be used to discover students' SRL patterns from large data logs saved on a course management system. The purpose of this study was to identify students' online…
Intelligent Bar Chart Plagiarism Detection in Documents
Al-Dabbagh, Mohammed Mumtaz; Salim, Naomie; Alkawaz, Mohammed Hazim; Saba, Tanzila; Al-Rodhaan, Mznah; Al-Dhelaan, Abdullah
2014-01-01
This paper presents a novel features mining approach from documents that could not be mined via optical character recognition (OCR). By identifying the intimate relationship between the text and graphical components, the proposed technique pulls out the Start, End, and Exact values for each bar. Furthermore, the word 2-gram and Euclidean distance methods are used to accurately detect and determine plagiarism in bar charts. PMID:25309952
Altiparmak, Fatih; Ferhatosmanoglu, Hakan; Erdal, Selnur; Trost, Donald C
2006-04-01
An effective analysis of clinical trials data involves analyzing different types of data such as heterogeneous and high dimensional time series data. The current time series analysis methods generally assume that the series at hand have sufficient length to apply statistical techniques to them. Other ideal case assumptions are that data are collected in equal length intervals, and while comparing time series, the lengths are usually expected to be equal to each other. However, these assumptions are not valid for many real data sets, especially for the clinical trials data sets. In addition, the data sources are different from each other, the data are heterogeneous, and the sensitivity of the experiments varies by the source. Approaches for mining time series data need to be revisited, keeping the wide range of requirements in mind. In this paper, we propose a novel approach for information mining that involves two major steps: applying a data mining algorithm over homogeneous subsets of data, and identifying common or distinct patterns over the information gathered in the first step. Our approach is implemented specifically for heterogeneous and high dimensional time series clinical trials data. Using this framework, we propose a new way of utilizing frequent itemset mining, as well as clustering and declustering techniques with novel distance metrics for measuring similarity between time series data. By clustering the data, we find groups of analytes (substances in blood) that are most strongly correlated. Most of these relationships already known are verified by the clinical panels, and, in addition, we identify novel groups that need further biomedical analysis. A slight modification to our algorithm results in an effective declustering of high dimensional time series data, which is then used for "feature selection." Using industry-sponsored clinical trials data sets, we are able to identify a small set of analytes that effectively models the state of normal health.
Using Data Mining to Detect Health Care Fraud and Abuse: A Review of Literature
Joudaki, Hossein; Rashidian, Arash; Minaei-Bidgoli, Behrouz; Mahmoodi, Mahmood; Geraili, Bijan; Nasiri, Mahdi; Arab, Mohammad
2015-01-01
Inappropriate payments by insurance organizations or third party payers occur because of errors, abuse and fraud. The scale of this problem is large enough to make it a priority issue for health systems. Traditional methods of detecting health care fraud and abuse are time-consuming and inefficient. Combining automated methods and statistical knowledge led to the emergence of a new interdisciplinary branch of science that is named Knowledge Discovery from Databases (KDD). Data mining is a core of the KDD process. Data mining can help third-party payers such as health insurance organizations to extract useful information from thousands of claims and identify a smaller subset of the claims or claimants for further assessment. We reviewed studies that performed data mining techniques for detecting health care fraud and abuse, using supervised and unsupervised data mining approaches. Most available studies have focused on algorithmic data mining without an emphasis on or application to fraud detection efforts in the context of health service provision or health insurance policy. More studies are needed to connect sound and evidence-based diagnosis and treatment approaches toward fraudulent or abusive behaviors. Ultimately, based on available studies, we recommend seven general steps to data mining of health care claims. PMID:25560347
Alzheimer's disease biomarker discovery using in silico literature mining and clinical validation
2012-01-01
Background Alzheimer’s Disease (AD) is the most widespread form of dementia in the elderly but despite progress made in recent years towards a mechanistic understanding, there is still an urgent need for disease modification therapy and for early diagnostic tests. Substantial international efforts are being made to discover and validate biomarkers for AD using candidate analytes and various data-driven 'omics' approaches. Cerebrospinal fluid is in many ways the tissue of choice for biomarkers of brain disease but is limited by patient and clinician acceptability, and increasing attention is being paid to the search for blood-based biomarkers. The aim of this study was to use a novel in silico approach to discover a set of candidate biomarkers for AD. Methods We used an in silico literature mining approach to identify potential biomarkers by creating a summarized set of assertional metadata derived from relevant legacy information. We then assessed the validity of this approach using direct assays of the identified biomarkers in plasma by immunodetection methods. Results Using this in silico approach, we identified 25 biomarker candidates, at least three of which have subsequently been reported to be altered in blood or CSF from AD patients. Two further candidate biomarkers, indicated from the in silico approach, were choline acetyltransferase and urokinase-type plasminogen activator receptor. Using immunodetection, we showed that, in a large sample set, these markers are either altered in disease or correlate with MRI markers of atrophy. Conclusions These data support as a proof of concept the use of data mining and in silico analyses to derive valid biomarker candidates for AD and, by extension, for other disorders. PMID:23113945
Valente, Carlo C; Bauer, Florian F; Venter, Fritz; Watson, Bruce; Nieuwoudt, Hélène H
2018-03-21
The increasingly large volumes of publicly available sensory descriptions of wine raise the question whether this source of data can be mined to extract meaningful domain-specific information about the sensory properties of wine. We introduce a novel application of formal concept lattices, in combination with traditional statistical tests, to visualise the sensory attributes of a big data set of some 7,000 Chenin blanc and Sauvignon blanc wines. Complexity was identified as an important driver of style in hitherto uncharacterised Chenin blanc, and the sensory cues for specific styles were identified. This is the first study to apply these methods for the purpose of identifying styles within varietal wines. More generally, our interactive data visualisation and mining driven approach opens up new investigations towards better understanding of the complex field of sensory science.
Efficient discovery of risk patterns in medical data.
Li, Jiuyong; Fu, Ada Wai-chee; Fahey, Paul
2009-01-01
This paper studies a problem of efficiently discovering risk patterns in medical data. Risk patterns are defined by a statistical metric, relative risk, which has been widely used in epidemiological research. To avoid fruitless search in the complete exploration of risk patterns, we define optimal risk pattern set to exclude superfluous patterns, i.e. complicated patterns with lower relative risk than their corresponding simpler form patterns. We prove that mining optimal risk pattern sets conforms to an anti-monotone property that supports an efficient mining algorithm. We propose an efficient algorithm for mining optimal risk pattern sets based on this property. We also propose a hierarchical structure to present discovered patterns for easy perusal by domain experts. The proposed approach is compared with two well-known rule discovery methods, decision tree and association rule mining approaches on benchmark data sets and applied to a real world application. The proposed method discovers more and better quality risk patterns than a decision tree approach. The decision tree method is not designed for such applications and is inadequate for pattern exploring. The proposed method does not discover a large number of uninteresting superfluous patterns as an association mining approach does. The proposed method is more efficient than an association rule mining method. A real world case study shows that the method reveals some interesting risk patterns to medical practitioners. The proposed method is an efficient approach to explore risk patterns. It quickly identifies cohorts of patients that are vulnerable to a risk outcome from a large data set. The proposed method is useful for exploratory study on large medical data to generate and refine hypotheses. The method is also useful for designing medical surveillance systems.
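As a rough illustration of the metric this abstract relies on, the sketch below computes the relative risk of a pattern: the outcome rate among records that match the pattern divided by the outcome rate among those that do not. The record layout, the field names and the toy data are assumptions made for the example; this is not the authors' mining algorithm.

```python
def relative_risk(records, pattern, outcome="outcome"):
    """Relative risk of `outcome` for records matching `pattern`.

    `records` is a list of dicts and `pattern` maps attribute -> value.
    Both layouts are hypothetical, chosen only for this sketch.
    Returns None when the ratio is undefined.
    """
    exposed = unexposed = exposed_cases = unexposed_cases = 0
    for rec in records:
        matches = all(rec.get(k) == v for k, v in pattern.items())
        if matches:
            exposed += 1
            exposed_cases += bool(rec[outcome])
        else:
            unexposed += 1
            unexposed_cases += bool(rec[outcome])
    if exposed == 0 or unexposed == 0 or unexposed_cases == 0:
        return None
    return (exposed_cases / exposed) / (unexposed_cases / unexposed)


# Toy cohort (invented): 3 of 4 smokers and 1 of 4 non-smokers have the outcome.
records = (
    [{"smoker": "yes", "outcome": 1}] * 3
    + [{"smoker": "yes", "outcome": 0}]
    + [{"smoker": "no", "outcome": 1}]
    + [{"smoker": "no", "outcome": 0}] * 3
)
rr = relative_risk(records, {"smoker": "yes"})
```

Under the abstract's definition, a longer pattern would count as superfluous if its relative risk were lower than that of a simpler pattern it extends.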
Brady, Laura M.; Gray, Floyd; Wissler, Craig A.; Guertin, D. Phillip
2001-01-01
In this study, a geographic information system (GIS) is used to integrate and accurately map field studies, information from remotely sensed data, watershed models, and the dispersion of potentially toxic mine waste and tailings. The purpose of this study is to identify erosion rates and net sediment delivery of soil and mine waste/tailings to the drainage channel within several watershed regions to determine source areas of sediment delivery as a method of quantifying geo-environmental analysis of transport mechanisms in abandoned mine lands in arid climate conditions. Users of this study are the researchers interested in exploration of approaches to depicting historical activity in an area which has no baseline data records for environmental analysis of heavily mined terrain.
Renewed mining and reclamation: Impacts on bats and potential mitigation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, P.E.; Berry, R.D.
Historic mining created new roosting habitat for many bat species. Now the same industry has the potential to adversely impact bats. Contemporary mining operations usually occur in historic districts; consequently the old workings are destroyed by open pit operations. Occasionally, underground techniques are employed, resulting in the enlargement or destruction of the original workings. Even during exploratory operations, historic mine openings can be covered as drill roads are bulldozed, or drills can penetrate and collapse underground workings. Nearby blasting associated with mine construction and operation can disrupt roosting bats. Bats can also be disturbed by the entry of mine personnel to collect ore samples or by recreational mine explorers, since the creation of roads often results in easier access. In addition to roost disturbance, other aspects of renewed mining can have adverse impacts on bat populations, and affect even those bats that do not live in mines. Open cyanide ponds, or other water in which toxic chemicals accumulate, can poison bats and other wildlife. The creation of the pits, roads and processing areas often destroys critical foraging habitat, or changes drainage patterns. Finally, at the completion of mining, any historic mines still open may be sealed as part of closure and reclamation activities. The net result can be a loss of bats and bat habitat. Conversely, in some contemporary underground operations, future roosting habitat for bats can be fabricated. An experimental approach to the creation of new roosting habitat is to bury culverts or old tires beneath waste rock. Mining companies can mitigate for impacts to bats by surveying to identify bat-roosting habitat, removing bats prior to renewed mining or closure, protecting non-impacted roost sites with gates and fences, researching to identify habitat requirements and creating new artificial roosts.
Quantitative Analysis of Critical Factors for the Climate Impact of Landfill Mining.
Laner, David; Cencic, Oliver; Svensson, Niclas; Krook, Joakim
2016-07-05
Landfill mining has been proposed as an innovative strategy to mitigate environmental risks associated with landfills, to recover secondary raw materials and energy from the deposited waste, and to enable high-valued land uses at the site. The present study quantitatively assesses the importance of specific factors and conditions for the net contribution of landfill mining to global warming using a novel, set-based modeling approach and provides policy recommendations for facilitating the development of projects contributing to global warming mitigation. Building on life-cycle assessment, scenario modeling and sensitivity analysis methods are used to identify critical factors for the climate impact of landfill mining. The net contributions to global warming of the scenarios range from -1550 (saving) to 640 (burden) kg CO2e per Mg of excavated waste. Nearly 90% of the results' total variation can be explained by changes in four factors, namely the landfill gas management in the reference case (i.e., alternative to mining the landfill), the background energy system, the composition of the excavated waste, and the applied waste-to-energy technology. Based on the analyses, circumstances under which landfill mining should be prioritized or not are identified and sensitive parameters for the climate impact assessment of landfill mining are highlighted.
Process mining in oncology using the MIMIC-III dataset
NASA Astrophysics Data System (ADS)
Prima Kurniati, Angelina; Hall, Geoff; Hogg, David; Johnson, Owen
2018-03-01
Process mining is a data analytics approach to discover and analyse process models based on the real activities captured in information systems. There is a growing body of literature on process mining in healthcare, including oncology, the study of cancer. In earlier work we found 37 peer-reviewed papers describing process mining research in oncology, with a regular complaint being the limited availability and accessibility of datasets with suitable information for process mining. Publicly available datasets are one option and this paper describes the potential to use MIMIC-III for process mining in oncology. MIMIC-III is a large open access dataset of de-identified patient records. There are 134 publications listed as using the MIMIC dataset, but none of them have used process mining. The MIMIC-III dataset has 16 event tables which are potentially useful for process mining and this paper demonstrates the opportunities to use MIMIC-III for process mining in oncology. Our research applied the L* lifecycle method to provide a worked example showing how process mining can be used to analyse cancer pathways. The results and data quality limitations are discussed along with opportunities for further work and reflection on the value of MIMIC-III for reproducible process mining research.
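To make the idea of discovering a process model from recorded activities more concrete, here is a minimal sketch of one common first step in process discovery: counting how often one activity directly follows another within the same case (a directly-follows graph). It is not the L* lifecycle method itself, and the event tuples and activity names are invented for illustration rather than taken from MIMIC-III tables.

```python
from collections import Counter, defaultdict


def directly_follows(events):
    """Count directly-follows pairs in an event log.

    `events` is an iterable of (case_id, timestamp, activity) tuples;
    this layout is an assumption made for the sketch.
    """
    traces = defaultdict(list)
    for case_id, timestamp, activity in events:
        traces[case_id].append((timestamp, activity))

    dfg = Counter()
    for steps in traces.values():
        steps.sort(key=lambda step: step[0])  # order each case by time
        for (_, current), (_, following) in zip(steps, steps[1:]):
            dfg[(current, following)] += 1
    return dfg


# Hypothetical log: three patients, events not necessarily in time order.
events = [
    ("p1", 1, "admission"), ("p1", 2, "chemotherapy"), ("p1", 3, "discharge"),
    ("p2", 2, "chemotherapy"), ("p2", 1, "admission"), ("p2", 3, "discharge"),
    ("p3", 1, "admission"), ("p3", 2, "discharge"),
]
dfg = directly_follows(events)
```

The resulting counts can be drawn as a graph whose edge weights show the most frequent pathways, which is the kind of view a pathway analysis would start from.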
NASA Astrophysics Data System (ADS)
Passaro, Perry David
Misconceptions can be thought of as naive approaches to problem solving that are perceptually appealing but incorrect and inconsistent with scientific evidence (Piaget, 1929). One type of misconception involves flow distributions within circuits. This concept is important because miners' conceptual errors about flow distribution changes within complex circuits may be in part responsible for fatal mine disasters. Based on the theory that misconceptions of flow distribution changes within circuits were responsible for underground mine disasters involving mine ventilation circuits, a series of studies was undertaken with mining engineering students, professional mining engineers, as well as mine foremen, mine supervisors, mine rescue members, mine maintenance personnel, mining researchers and working miners to identify these conceptual errors and errors in mine ventilation procedures. Results indicate that misconceptions of flow distribution changes within circuits exist in over 70 percent of the subjects sampled. It is assumed that these misconceptions of flow distribution changes within circuits result in errors of judgment when miners are faced with inferring and changing ventilation arrangements when two or more mine sections are connected. Furthermore, it is assumed that these misconceptions are pervasive in the mining industry and may be responsible for at least two mine ventilation disasters. The findings of this study are consistent with Piaget's (1929) model of figurative and operative knowledge. This model states that misconceptions are in part due to a lack of knowledge of dynamic transformations and how to apply content information. Recommendations for future research include the development of an interactive expert system for training miners with ventilation arrangements. 
Such a system would meet the educational recommendations made by Piaget (1973b) by involving a hands-on approach that allows discovery, interaction, the opportunity to make mistakes and to review the cognitive concepts on which the subject relied during his manipulation of the ventilation system.
Ye, Kai; Kosters, Walter A; Ijzerman, Adriaan P
2007-03-15
Pattern discovery in protein sequences is often based on multiple sequence alignments (MSA). The procedure can be computationally intensive and often requires manual adjustment, which may be particularly difficult for a set of deviating sequences. In contrast, two algorithms, PRATT2 (http://www.ebi.ac.uk/pratt/) and TEIRESIAS (http://cbcsrv.watson.ibm.com/) are used to directly identify frequent patterns from unaligned biological sequences without an attempt to align them. Here we propose a new algorithm with more efficiency and more functionality than both PRATT2 and TEIRESIAS, and discuss some of its applications to G protein-coupled receptors, a protein family of important drug targets. In this study, we designed and implemented six algorithms to mine three different pattern types from either one or two datasets using a pattern growth approach. We compared our approach to PRATT2 and TEIRESIAS in efficiency, completeness and the diversity of pattern types. Compared to PRATT2, our approach is faster, capable of processing large datasets and able to identify the so-called type III patterns. Our approach is comparable to TEIRESIAS in the discovery of the so-called type I patterns but has additional functionality such as mining the so-called type II and type III patterns and finding discriminating patterns between two datasets. The source code for pattern growth algorithms and their pseudo-code are available at http://www.liacs.nl/home/kosters/pg/.
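The general pattern growth idea can be sketched as follows, under a strong simplification: patterns here are plain contiguous substrings, not the type I, II or III patterns of the paper, and the sequences are made up. A pattern is extended by one residue at a time and kept only while it still occurs in at least `min_support` sequences; since an extension can never be more frequent than the pattern it grows from, infrequent branches are dropped early.

```python
def frequent_substrings(sequences, min_support):
    """Find contiguous patterns present in at least `min_support` sequences.

    Simplified pattern growth sketch; not the algorithms from the paper.
    """
    def support(pattern):
        # Number of sequences that contain the pattern at least once.
        return sum(pattern in seq for seq in sequences)

    alphabet = sorted(set("".join(sequences)))
    frequent = {}
    frontier = [a for a in alphabet if support(a) >= min_support]
    while frontier:
        next_frontier = []
        for pattern in frontier:
            frequent[pattern] = support(pattern)
            # Grow the pattern to the right by one symbol.
            for symbol in alphabet:
                candidate = pattern + symbol
                if support(candidate) >= min_support:
                    next_frontier.append(candidate)
        frontier = next_frontier
    return frequent


# Hypothetical unaligned sequences sharing the motif "CD".
patterns = frequent_substrings(["ACDEF", "XCDEY", "CDZZ"], min_support=3)
```

Lowering `min_support` to 2 in this toy example would also admit patterns such as "CDE", which occurs in two of the three sequences.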
NASA Astrophysics Data System (ADS)
Yang, Yi-Chen E.; Cai, Ximing; Herricks, Edwin E.
2008-04-01
This paper develops a new approach to identify hydrologic indicators related to fish community and generate a quantitative function between an ecological target index and the identified hydrologic indicators. The approach is based on genetic programming (GP), a data mining method. Using the Shannon Index (a fish community diversity index) or the number of individuals (total abundance) of a fish community, as an ecological target, the GP identified the most ecologically relevant hydrologic indicators (ERHIs) from 32 indicators of hydrologic alteration, for the case study site, the upper Illinois River. Robustness analysis showed that different GP runs found a similar set of ERHIs; each of the identified ERHI from different GP runs had a consistent relationship with the target index. By comparing the GP results with those from principal component analysis and autecology matrix, the three approaches identified a small number (six) of common ERHIs. Particularly, the timing of low flow (Dmin) seems to be more relevant to the diversity of the fish community, while the magnitude of the low flow (Qb) is more relevant to the total fish abundance; large rising rates result in a significant improvement of fish diversity, which is counterintuitive and against previous findings. The quantitative function developed by GP was further used to construct an indicator impact matrix (IIM), which was demonstrated as a potentially useful tool for streamflow restoration design.
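For readers unfamiliar with the ecological target used here, the Shannon Index of a community is H = -sum(p_i * ln p_i), where p_i is the share of individuals belonging to species i. A minimal sketch follows, with invented species counts; it illustrates only the target index, not the genetic programming step.

```python
import math


def shannon_index(counts):
    """Shannon diversity index from per-species individual counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one individual is required")
    # Species with zero individuals contribute nothing to the sum.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical samples: an even community and a community dominated by one species.
even_sample = shannon_index([10, 10, 10, 10])
dominated_sample = shannon_index([37, 1, 1, 1])
```

An even community gives the maximum value for its species count (ln 4 for four species), while dominance by one species pushes the index toward zero.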
Improving the Method of Roof Fall Susceptibility Assessment based on Fuzzy Approach
NASA Astrophysics Data System (ADS)
Ghasemi, Ebrahim; Ataei, Mohammad; Shahriar, Kourosh
2017-03-01
Retreat mining is always accompanied by a large number of accidents, and most of them are due to roof fall. Therefore, development of methodologies to evaluate the roof fall susceptibility (RFS) seems essential. Ghasemi et al. (2012) proposed a systematic methodology to assess the roof fall risk during retreat mining based on the classic risk assessment approach. The main defect of this method is that it ignores subjective uncertainties due to linguistic input values of some factors, low resolution, fixed weighting, sharp class boundaries, etc. To remove this defect and improve the mentioned method, in this paper, a novel methodology is presented to assess the RFS using a fuzzy approach. The application of the fuzzy approach provides an effective tool to handle the subjective uncertainties. Furthermore, fuzzy analytical hierarchy process (AHP) is used to structure and prioritize various risk factors and sub-factors during development of this method. This methodology is applied to identify the susceptibility of roof fall occurrence in main panel of Tabas Central Mine (TCM), Iran. The results indicate that this methodology is effective and efficient in assessing RFS.
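The contrast between sharp class boundaries and a fuzzy treatment can be illustrated with triangular membership functions, a standard building block of fuzzy methods. In the sketch below a score belongs partly to two neighbouring classes instead of falling entirely into one. The 0 to 100 score scale, the class names and the breakpoints are assumptions for illustration and are not taken from the paper.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a and c, rising linearly to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical susceptibility classes on an assumed 0-100 score scale.
CLASSES = {
    "low": (0.0, 25.0, 50.0),
    "medium": (25.0, 50.0, 75.0),
    "high": (50.0, 75.0, 100.0),
}


def memberships(score):
    """Degree of membership of `score` in each susceptibility class."""
    return {name: triangular(score, *abc) for name, abc in CLASSES.items()}


degrees = memberships(40.0)
```

With these assumed breakpoints, a score of 40 is mostly "medium" but still partly "low", which a crisp boundary at any single threshold could not express.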
Restoring Forests and Associated Ecosystem Services on Appalachian Coal Surface Mines
NASA Astrophysics Data System (ADS)
Zipper, Carl E.; Burger, James A.; Skousen, Jeffrey G.; Angel, Patrick N.; Barton, Christopher D.; Davis, Victor; Franklin, Jennifer A.
2011-05-01
Surface coal mining in Appalachia has caused extensive replacement of forest with non-forested land cover, much of which is unmanaged and unproductive. Although forested ecosystems are valued by society for both marketable products and ecosystem services, forests have not been restored on most Appalachian mined lands because traditional reclamation practices, encouraged by regulatory policies, created conditions poorly suited for reforestation. Reclamation scientists have studied productive forests growing on older mine sites, established forest vegetation experimentally on recent mines, and identified mine reclamation practices that encourage forest vegetation re-establishment. Based on these findings, they developed a Forestry Reclamation Approach (FRA) that can be employed by coal mining firms to restore forest vegetation. Scientists and mine regulators, working collaboratively, have communicated the FRA to the coal industry and to regulatory enforcement personnel. Today, the FRA is used routinely by many coal mining firms, and thousands of mined hectares have been reclaimed to restore productive mine soils and planted with native forest trees. Reclamation of coal mines using the FRA is expected to restore these lands' capabilities to provide forest-based ecosystem services, such as wood production, atmospheric carbon sequestration, wildlife habitat, watershed protection, and water quality protection to a greater extent than conventional reclamation practices.
Song, Min
2016-01-01
In biomedicine, scientific literature is a valuable source for knowledge discovery. Mining knowledge from textual data has become an ever important task as the volume of scientific literature is growing unprecedentedly. In this paper, we propose a framework for examining a certain disease based on existing information provided by scientific literature. Disease-related entities that include diseases, drugs, and genes are systematically extracted and analyzed using a three-level network-based approach. A paper-entity network and an entity co-occurrence network (macro-level) are explored and used to construct six entity specific networks (meso-level). Important diseases, drugs, and genes as well as salient entity relations (micro-level) are identified from these networks. Results obtained from the literature-based literature mining can serve to assist clinical applications. PMID:27195695
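An entity co-occurrence network of the kind mentioned above can be sketched as a weighted edge list: two entities are linked whenever they appear in the same paper, and the edge weight counts the papers in which they co-occur. The paper identifiers and entity names below are placeholders, and entity extraction itself is assumed to have been done already.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(paper_entities):
    """Count, for each entity pair, the papers in which both appear.

    `paper_entities` maps a paper id to the set of entities extracted
    from it (a hypothetical layout for this sketch).
    """
    edges = Counter()
    for entities in paper_entities.values():
        # Sorting gives each unordered pair one canonical key.
        for pair in combinations(sorted(entities), 2):
            edges[pair] += 1
    return edges


# Placeholder data: three papers mentioning a disease, a drug and a gene.
papers = {
    "paper1": {"disease_A", "drug_X", "gene_G"},
    "paper2": {"disease_A", "drug_X"},
    "paper3": {"disease_A", "gene_G"},
}
edges = cooccurrence_edges(papers)
```

Restricting the same counts to pairs of a given type, for example disease and drug, would give one of the entity-specific networks, assuming entity types are known.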
Systematic analysis of molecular mechanisms for HCC metastasis via text mining approach.
Zhen, Cheng; Zhu, Caizhong; Chen, Haoyang; Xiong, Yiru; Tan, Junyuan; Chen, Dong; Li, Jin
2017-02-21
To systematically explore the molecular mechanism for hepatocellular carcinoma (HCC) metastasis and identify regulatory genes with text mining methods. Genes with highest frequencies and significant pathways related to HCC metastasis were listed. A handful of proteins such as EGFR, MDM2, TP53 and APP, were identified as hub nodes in PPI (protein-protein interaction) network. Compared with unique genes for HBV-HCCs, genes particular to HCV-HCCs were fewer, but may participate in more extensive signaling processes. VEGFA, PI3KCA, MAPK1, MMP9 and other genes may play important roles in multiple phenotypes of metastasis. Genes in abstracts of HCC-metastasis literature were identified. Word frequency analysis, KEGG pathway and PPI network analysis were performed. Then co-occurrence analysis between genes and metastasis-related phenotypes was carried out. Text mining is effective for revealing potential regulators or pathways, but the purpose of it should be specific, and the combination of various methods will be more useful.
Brain-computer interface using wavelet transformation and naïve bayes classifier.
Bassani, Thiago; Nievola, Julio Cesar
2010-01-01
The main purpose of this work is to establish an exploratory approach using electroencephalographic (EEG) signal, analyzing the patterns in the time-frequency plane. This work also aims to optimize the EEG signal analysis through the improvement of classifiers and, eventually, of the BCI performance. In this paper a novel exploratory approach for data mining of EEG signal based on continuous wavelet transformation (CWT) and wavelet coherence (WC) statistical analysis is introduced and applied. The CWT allows the representation of time-frequency patterns of the signal's information content by WC qualitative analysis. Results suggest that the proposed methodology is capable of identifying regions in time-frequency spectrum during the specified task of BCI. Furthermore, an example of a region is identified, and the patterns are classified using a Naïve Bayes Classifier (NBC). This innovative characteristic of the process justifies the feasibility of the proposed approach to other data mining applications. It can open new physiological research in this field and on non-stationary time series analysis.
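The classification step can be illustrated with a small Gaussian naive Bayes classifier, which treats each feature as independent and normally distributed within a class. This is a generic sketch, not the authors' pipeline: the two-dimensional feature vectors stand in for whatever time-frequency features are extracted, and the class labels are invented.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate per-class log prior, feature means and feature variances."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    model = {}
    for label, rows in rows_by_class.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # A small floor keeps variances strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / len(col) + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(len(rows) / len(X)), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for `row`."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(row, means, variances)
        )
        return log_prior + log_likelihood
    return max(model, key=log_posterior)


# Hypothetical feature vectors for two mental states.
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y = ["rest", "rest", "rest", "task", "task", "task"]
model = fit_gaussian_nb(X, y)
```

Calling `predict(model, [1.1, 2.1])` assigns the new vector to the class whose fitted Gaussians explain it best.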
Exploring the use of situation awareness in behaviors and practices of health and safety leaders.
Willmer, D R
2017-01-01
An understanding of how health and safety management systems (HSMS) reduce worksite injuries, illnesses and fatalities may be gained in studying the behaviors of health and safety leaders. These leaders bear the accountability for identifying, understanding and managing the risks of a mining operation. More importantly, they have to transfer this knowledge of perception, recognition and response to risks in the mining environment to their workers. The leaders' efforts to build and maintain a mining operation's workforce that consistently executes safe work practices may be captured through more than just lagging indicators of health and safety performance. This exploratory study interviewed six leaders in occupations such as site-level safety supervisors, mine superintendents and/or general managers at surface and underground stone, sand and gravel and metal/nonmetal mine sites throughout the United States, with employee populations ranging from 40 to 175. In exploring leaders' perspectives on how they systematically manage health and safety, examples such as approaches to task training, handling near-miss incidents, identifying future leaders and providing workers with feedback offer insights into how leaders translate their knowledge and management of site-level risks to others.
Exploring the use of situation awareness in behaviors and practices of health and safety leaders
Willmer, D.R.
2018-01-01
An understanding of how health and safety management systems (HSMS) reduce worksite injuries, illnesses and fatalities may be gained in studying the behaviors of health and safety leaders. These leaders bear the accountability for identifying, understanding and managing the risks of a mining operation. More importantly, they have to transfer this knowledge of perception, recognition and response to risks in the mining environment to their workers. The leaders’ efforts to build and maintain a mining operation’s workforce that consistently executes safe work practices may be captured through more than just lagging indicators of health and safety performance. This exploratory study interviewed six leaders in occupations such as site-level safety supervisors, mine superintendents and/or general managers at surface and underground stone, sand and gravel and metal/nonmetal mine sites throughout the United States, with employee populations ranging from 40 to 175. In exploring leaders’ perspectives on how they systematically manage health and safety, examples such as approaches to task training, handling near-miss incidents, identifying future leaders and providing workers with feedback offer insights into how leaders translate their knowledge and management of site-level risks to others. PMID:29593373
Managing equipment innovations in mining: A review.
Trudel, Bryan; Nadeau, Sylvie; Zaras, Kazimierz; Deschamps, Isabelle
2015-01-01
Technological innovations in mining equipment have led to increased productivity and occupational health and safety (OHS) performance, but their introduction also brings new risks for workers. The aim of this study is to provide support for mining industry managers who are required to reconcile equipment choices with OHS and productivity. Examination of the literature through interdisciplinary digital databases. Databases were searched using specific combinations of keywords and limited to studies dating back no farther than 1992. The "snowball" technique was also used to examine the references listed in research articles initially identified with the databases. A total of 19 contextual factors were identified as having the potential to influence the OHS and productivity leverage of equipment innovations. The most often cited among these factors are the level of training provided to the equipment operators, operator experience and age, supervisor leadership abilities, and maintaining good relations within work crews. Interactions between these factors are not discussed in mining innovation literature. It would be helpful to use a systems thinking approach which incorporates interaction between relevant actors and factors to define properly the most sensitive aspects of innovation management as it applies to mining equipment.
MELODI: Mining Enriched Literature Objects to Derive Intermediates
Elsworth, Benjamin; Dawe, Karen; Vincent, Emma E; Langdon, Ryan; Lynch, Brigid M; Martin, Richard M; Relton, Caroline; Higgins, Julian P T; Gaunt, Tom R
2018-01-01
Abstract Background The scientific literature contains a wealth of information from different fields on potential disease mechanisms. However, identifying and prioritizing mechanisms for further analytical evaluation presents enormous challenges in terms of the quantity and diversity of published research. The application of data mining approaches to the literature offers the potential to identify and prioritize mechanisms for more focused and detailed analysis. Methods Here we present MELODI, a literature mining platform that can identify mechanistic pathways between any two biomedical concepts. Results Two case studies demonstrate the potential uses of MELODI and how it can generate hypotheses for further investigation. First, an analysis of ETS-related gene ERG and prostate cancer derives the intermediate transcription factor SP1, recently confirmed to be physically interacting with ERG. Second, examining the relationship between a new potential risk factor for pancreatic cancer identifies possible mechanistic insights which can be studied in vitro. Conclusions We have demonstrated the possible applications of MELODI, including two case studies. MELODI has been implemented as a Python/Django web application, and is freely available to use at [www.melodi.biocompute.org.uk]. PMID:29342271
Wilson, Paul; Larminie, Christopher; Smith, Rona
2016-01-01
To use literature mining to catalogue Behçet's associated genes, and advanced computational methods to improve the understanding of the pathways and signalling mechanisms that lead to the typical clinical characteristics of Behçet's patients. To extend this technique to identify potential treatment targets for further experimental validation. Text mining methods combined with gene enrichment tools, pathway analysis and causal analysis algorithms. This approach identified 247 human genes associated with Behçet's disease and the resulting disease map, comprising 644 nodes and 19220 edges, captured important details of the relationships between these genes and their associated pathways, as described in diverse data repositories. Pathway analysis has identified how Behçet's associated genes are likely to participate in innate and adaptive immune responses. Causal analysis algorithms have identified a number of potential therapeutic strategies for further investigation. Computational methods have captured pertinent features of the prominent disease characteristics presented in Behçet's disease and have highlighted NOD2, ICOS and IL18 signalling as potential therapeutic strategies.
NASA Astrophysics Data System (ADS)
Gawior, D.; Rutkiewicz, P.; Malik, I.; Wistuba, M.
2017-11-01
LiDAR data provide new insights into the historical development of mining industry recorded in the topography and landscape. In the study on the lead ore mining in the 13th-17th century we identified remnants of mining activity in relief that are normally obscured by dense vegetation. The industry in Tarnowice Plateau was based on exploitation of galena from the bedrock. New technologies, including DEM from airborne LiDAR, show that the present landscape and relief of the post-mining area under study developed during several subsequent phases of exploitation, when different techniques of exploitation were used and probably different types of ores were exploited. The study conducted on the Tarnowice Plateau proved that combining GIS visualization techniques with historical maps, especially geological maps, is a promising approach in reconstructing development of anthropogenic relief and landscape.
Ghazizadeh, Mahtab; McDonald, Anthony D; Lee, John D
2014-09-01
This study applies text mining to extract clusters of vehicle problems and associated trends from free-response data in the National Highway Traffic Safety Administration's vehicle owner's complaint database. As the automotive industry adopts new technologies, it is important to systematically assess the effect of these changes on traffic safety. Driving simulators, naturalistic driving data, and crash databases all contribute to a better understanding of how drivers respond to changing vehicle technology, but other approaches, such as automated analysis of incident reports, are needed. Free-response data from incidents representing two severity levels (fatal incidents and incidents involving injury) were analyzed using a text mining approach: latent semantic analysis (LSA). LSA and hierarchical clustering identified clusters of complaints for each severity level, which were compared and analyzed across time. Cluster analysis identified eight clusters of fatal incidents and six clusters of incidents involving injury. Comparisons showed that although the airbag clusters across the two severity levels have the same most frequent terms, the circumstances around the incidents differ. The time trends show clear increases in complaints surrounding the Ford/Firestone tire recall and the Toyota unintended acceleration recall. Increases in complaints may be partially driven by these recall announcements and the associated media attention. Text mining can reveal useful information from free-response databases that would otherwise be prohibitively time-consuming and difficult to summarize manually. Text mining can extend human analysis capabilities for large free-response databases to support earlier detection of problems and more timely safety interventions.
Machine learning approaches to analysing textual injury surveillance data: a systematic review.
Vallmuur, Kirsten
2015-06-01
To synthesise recent research on the use of machine learning approaches to mining textual injury surveillance data. Systematic review. The electronic databases which were searched included PubMed, Cinahl, Medline, Google Scholar, and Proquest. The bibliography of all relevant articles was examined and associated articles were identified using a snowballing technique. For inclusion, articles were required to meet the following criteria: (a) used a health-related database, (b) focused on injury-related cases, and (c) used machine learning approaches to analyse textual data. The papers identified through the search were screened resulting in 16 papers selected for review. Articles were reviewed to describe the databases and methodology used, the strength and limitations of different techniques, and quality assurance approaches used. Due to heterogeneity between studies meta-analysis was not performed. Occupational injuries were the focus of half of the machine learning studies and the most common methods described were Bayesian probability or Bayesian network based methods to either predict injury categories or extract common injury scenarios. Models were evaluated through either comparison with gold standard data or content expert evaluation or statistical measures of quality. Machine learning was found to provide high precision and accuracy when predicting a small number of categories, and was valuable for visualisation of injury patterns and prediction of future outcomes. However, difficulties related to generalizability, source data quality, complexity of models and integration of content and technical knowledge were discussed. The use of narrative text for injury surveillance has grown in popularity, complexity and quality over recent years.
With advances in data mining techniques, increased capacity for analysis of large databases, and involvement of computer scientists in the injury prevention field, along with more comprehensive use and description of quality assurance methods in text mining approaches, it is likely that we will see a continued growth and advancement in knowledge of text mining in the injury field. Copyright © 2015 Elsevier Ltd. All rights reserved.
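The review above reports that Bayesian methods were the most common way to predict injury categories from narrative text. As a minimal sketch of that idea, and not a reconstruction of any reviewed study, the following multinomial naive Bayes classifier with add-one smoothing assigns a category to a short narrative. The training narratives, the labels `fall` and `burn`, and the function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Fit a multinomial naive Bayes model from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"class_counts": class_counts, "word_counts": word_counts, "vocab": vocab}


def predict(model, text):
    """Return the label with the highest log posterior for the given text."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for token in text.lower().split():
            if token not in model["vocab"]:
                continue  # ignore words never seen in training
            # add-one (Laplace) smoothed log likelihood
            score += math.log((counts[token] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy narratives, invented for illustration only.
TRAINING = [
    ("fell from ladder while painting", "fall"),
    ("slipped on wet floor and fell", "fall"),
    ("burned hand on hot stove", "burn"),
    ("scalded by boiling water", "burn"),
]
MODEL = train_naive_bayes(TRAINING)
print(predict(MODEL, "fell off a ladder"))
```

With so few training narratives the sketch only shows the mechanics; the review's caveats about source data quality and generalizability apply to any real use.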
Modeling of gold production in Malaysia
NASA Astrophysics Data System (ADS)
Muda, Nora; Ainuddeen, Nasihah Rasyiqah; Ismail, Hamizun; Umor, Mohd Rozi
2013-04-01
This study was conducted to identify the main factors that contribute to gold production and hence determine the factors that affect the development of the mining industry in Malaysia. An econometric approach was used by performing cointegration analysis among the factors to determine the existence of a long-term relationship between the gold prices, the number of gold mines, the number of workers in gold mines and the gold production. The study continued with Granger analysis to determine the relationship between the factors and gold production. The results show that there is a long-term relationship between price, gold production and number of employees. Granger causality analysis shows that there is only a one-way relationship between the number of employees and gold production in Malaysia and the number of gold mines in Malaysia.
Incorporating ecosystem services into environmental management of deep-seabed mining
NASA Astrophysics Data System (ADS)
Le, Jennifer T.; Levin, Lisa A.; Carson, Richard T.
2017-03-01
Accelerated exploration of minerals in the deep sea over the past decade has raised the likelihood that commercial mining of the deep seabed will commence in the near future. Environmental concerns create a growing urgency for development of environmental regulations under commercial exploitation. Here, we consider an ecosystem services approach to the environmental policy and management of deep-sea mineral resources. Ecosystem services link the environment and human well-being, and can help improve sustainability and stewardship of the deep sea by providing a quantitative basis for decision-making. This paper briefly reviews ecosystem services provided by habitats targeted for deep-seabed mining (hydrothermal vents, seamounts, nodule provinces, and phosphate-rich margins), and presents practical steps to incorporate ecosystem services into deep-seabed mining regulation. The linkages and translation between ecosystem structure, ecological function (including supporting services), and ecosystem services are highlighted as generating human benefits. We consider criteria for identifying which ecosystem services are vulnerable to potential mining impacts, the role of ecological functions in providing ecosystem services, development of ecosystem service indicators, valuation of ecosystem services, and implementation of ecosystem services concepts. The first three steps put ecosystem services into a deep-seabed mining context; the last two steps help to incorporate ecosystem services into a management and decision-making framework. Phases of environmental planning discussed in the context of ecosystem services include conducting strategic environmental assessments, collecting baseline data, monitoring, establishing marine protected areas, assessing cumulative impacts, identifying thresholds and triggers, and creating an environmental damage compensation regime. 
We also identify knowledge gaps that need to be addressed in order to operationalize ecosystem services concepts in deep-seabed mining regulation and propose potential tools to fill them.
A New Approach in Coal Mine Exploration Using Cosmic Ray Muons
NASA Astrophysics Data System (ADS)
Darijani, Reza; Negarestani, Ali; Rezaie, Mohammad Reza; Fatemi, Syed Jalil; Akhond, Ahmad
2016-08-01
Muon radiography is a technique that uses cosmic ray muons to image the interior of large scale geological structures. The muon absorption in matter is the most important parameter in cosmic ray muon radiography. Cosmic ray muon radiography is similar to X-ray radiography. The main aim in this survey is the simulation of muon radiography for exploration of mines. So, the production source, tracking, and detection of cosmic ray muons were simulated by MCNPX code. For this purpose, the input data of the source card in MCNPX code were extracted from the muon energy spectrum at sea level. In addition, the other input data such as average density and thickness of layers that were used in this code are the measured data from Pabdana (Kerman, Iran) coal mines. The average thickness and density of these layers in the coal mines are from 2 to 4 m and 1.3 g/cm3, respectively. To increase the spatial resolution, a detector was placed inside the mountain. The results indicated that using this approach, layers with a minimum thickness of about 2.5 m can be identified.
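Since muon absorption depends on how much matter lies along the muon path, a quick way to get a feel for the layer figures quoted above is to compute the areal density (often called opacity in muon radiography), that is, density multiplied by path length. The abstract does not state this formula; the sketch below assumes a uniform layer crossed perpendicularly, and the function name is hypothetical.

```python
def opacity_g_per_cm2(density_g_cm3, thickness_m):
    """Areal density of a uniform layer: density times path length."""
    thickness_cm = thickness_m * 100.0  # metres to centimetres
    return density_g_cm3 * thickness_cm


COAL_DENSITY = 1.3  # g/cm^3, the average density quoted in the abstract

# Layer thicknesses quoted in the abstract: the 2-4 m range and the ~2.5 m limit.
for thickness_m in (2.0, 2.5, 4.0):
    value = opacity_g_per_cm2(COAL_DENSITY, thickness_m)
    print(f"{thickness_m} m -> {value:.0f} g/cm^2")
```

Under these assumptions the layers correspond to a few hundred g/cm^2 of material, which is the quantity a transmission simulation would have to resolve against the surrounding rock.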
Nimick, David A.; Church, Stan E.; Finger, Susan E.
2004-01-01
The Boulder River watershed is one of many watersheds in the western United States where historical mining has left a legacy of acid mine drainage and elevated concentrations of potentially toxic trace elements. Abandoned mine lands commonly are located on or affect Federal land. Cleaning up these Federal lands will require substantial investment of resources. As part of a cooperative effort with Federal land-management agencies, the U.S. Geological Survey implemented an Abandoned Mine Lands Initiative in 1997. The goal of the initiative was to use the watershed approach to develop a strategy for gathering and communicating the scientific information needed to formulate effective and cost-efficient remediation of affected lands in a watershed. The watershed approach is based on the premise that contaminated sites that have the most profound effect on water and ecosystem quality within an entire watershed should be identified, characterized, and ranked for remediation. The watershed approach provides an effective means to evaluate the overall status of affected resources and helps to focus remediation at sites where the most benefit will be gained in the watershed. Such a large-scale approach can result in the collection of extensive information on the geology and geochemistry of rocks and sediment, the hydrology and water chemistry of streams and ground water, and the diversity and health of aquatic and terrestrial organisms. During the assessment of the Boulder River watershed, we inventoried historical mines, defined geological conditions, assessed fish habitat, collected and chemically analyzed hundreds of water and sediment samples, conducted toxicity tests, analyzed fish tissue and indicators of physiological malfunction, examined invertebrates and biofilm, and defined hydrological regimes.
Land- and resource-management agencies are faced with evaluating risks associated with thousands of potentially harmful mine sites, and this level of effort is not always feasible for every affected watershed. The detailed work described in this report can help Federal land-management agencies decide which characterization efforts would be most useful in characterization of other affected watersheds.
NASA Astrophysics Data System (ADS)
Masaitis, Alexandra
2014-05-01
New economic, environmental and social challenges for the mining industry in the USA show the need to implement "responsible" mining practices that include improved community involvement. Conflicts which occur in the US territory and with US mining companies around the world are now common between the mining proponents, NGO's and communities. These conflicts can sometimes be alleviated by early development of modes of communication, and a formal discussion format that allows airing of concerns and potential resolution of problems. One of the methods that can formalize this process is to establish a Good Neighbor Agreement (GNA), which deals specifically with challenges in relationships between mining operations and the local communities. It is a new practice related to mining operations that are oriented toward social needs and concerns of local communities that arise during the normal life of a mine, which can achieve sustainable mining practices. The GNA project being currently developed at the University of Nevada, USA in cooperation with the Newmont Mining Corporation has a goal of creating an open company/community dialog that will help identify and address sociological and environmental concerns associated with mining. Discussion: The Good Neighbor Agreement currently evolving will address the following: 1. Identify spheres of possible cooperation between mining companies, government organizations, and NGO's. 2. Provide an economically viable mechanism for developing a partnership between mining operations and the local communities that will increase mining industry's accountability and provide higher levels of confidence for the community that a mine is operated in a safe and sustainable manner. 
Implementation of the GNA can help identify and evaluate conflict criteria in mining/community relationships; determine the status of concerns; determine the role and responsibilities of stakeholders; analyze problem resolution feasibility; maintain community involvement and support through economic benefits and environmental safeguards; and develop options for resolving concerns. Difficulties in establishing the GNA standards include the lack of insurance/bonding policies and the lack of audit and monitoring that could determine the level of exposure of the local community and the environment to the contaminants released at the mine sites. Since many problems of mines can occur during closure and post-closure, GNAs should address those issues also. The goal of the GNA is to have open access for the public to the safety, health, and environmental information pertaining to the mining operation, as well as to educate the local communities about mining practices that promote mutual acknowledgment of the need to build a relationship amenable to each other's needs. Frequent conflicts between mining companies and surrounding communities lead to work disruptions or even mine closures and show the necessity of a less confrontational approach to environmental and social justice. The Good Neighbor Agreement is a unique way to provide benefits for both mining operations and the local community, and to provide a mechanism for risk reduction and communication that offers the potential to protect both mining and community interests.
Multiagent data warehousing and multiagent data mining for cerebrum/cerebellum modeling
NASA Astrophysics Data System (ADS)
Zhang, Wen-Ran
2002-03-01
An algorithm named Neighbor-Miner is outlined for multiagent data warehousing and multiagent data mining. The algorithm is defined in an evolving dynamic environment with autonomous or semiautonomous agents. Instead of mining frequent itemsets from customer transactions, the new algorithm discovers new agents and mines agent associations in first-order logic from agent attributes and actions. While the Apriori algorithm uses frequency as an a priori threshold, the new algorithm uses agent similarity as a priori knowledge. The concept of agent similarity leads to the notions of agent cuboid, orthogonal multiagent data warehousing (MADWH), and multiagent data mining (MADM). Based on agent similarities and action similarities, Neighbor-Miner is proposed and illustrated in a MADWH/MADM approach to cerebrum/cerebellum modeling. It is shown that (1) semiautonomous neurofuzzy agents can be identified for uniped locomotion and gymnastic training based on attribute relevance analysis; (2) new agents can be discovered and agent cuboids can be dynamically constructed in an orthogonal MADWH, which resembles an evolving cerebrum/cerebellum system; and (3) dynamic motion laws can be discovered as association rules in first-order logic. Although examples in legged robot gymnastics are used to illustrate the basic ideas, the new approach is generally suitable for a broad category of data mining tasks where knowledge can be discovered collectively by a set of agents from a geographically or geometrically distributed but relevant environment, especially in scientific and engineering data environments.
Detecting and characterizing coal mine related seismicity in the Western U.S. using subspace methods
NASA Astrophysics Data System (ADS)
Chambers, Derrick J. A.; Koper, Keith D.; Pankow, Kristine L.; McCarter, Michael K.
2015-11-01
We present an approach for subspace detection of small seismic events that includes methods for estimating magnitudes and associating detections from multiple stations into unique events. The process is used to identify mining related seismicity from a surface coal mine and an underground coal mining district, both located in the Western U.S. Using a blasting log and a locally derived seismic catalogue as ground truth, we assess detector performance in terms of verified detections, false positives and failed detections. We are able to correctly identify over 95 per cent of the surface coal mine blasts and about 33 per cent of the events from the underground mining district, while keeping the number of potential false positives relatively low by requiring all detections to occur on two stations. We find that most of the potential false detections for the underground coal district are genuine events missed by the local seismic network, demonstrating the usefulness of regional subspace detectors in augmenting local catalogues. We note a trade-off in detection performance between stations at smaller source-receiver distances, which have increased signal-to-noise ratio, and stations at larger distances, which have greater waveform similarity. We also explore the increased detection capabilities of a single higher dimension subspace detector, compared to multiple lower dimension detectors, in identifying events that can be described as linear combinations of training events. We find, in our data set, that such an advantage can be significant, justifying the use of a subspace detection scheme over conventional correlation methods.
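The point above about events that are linear combinations of training events can be illustrated with a small sketch. It assumes the commonly used subspace detection statistic, the fraction of a data window's energy captured by an orthonormal basis built from training waveforms; the abstract does not give the authors' exact formulation, and the waveforms and function names here are invented for illustration.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(waveforms, tol=1e-12):
    """Build an orthonormal basis from training waveforms (modified Gram-Schmidt)."""
    basis = []
    for w in waveforms:
        residual = list(w)
        for u in basis:
            proj = dot(residual, u)
            residual = [r - proj * ui for r, ui in zip(residual, u)]
        norm = math.sqrt(dot(residual, residual))
        if norm > tol:  # skip waveforms already spanned by the basis
            basis.append([r / norm for r in residual])
    return basis


def detection_statistic(window, basis):
    """Fraction of the window's energy captured by the subspace (0 to 1)."""
    energy = dot(window, window)
    if energy == 0.0:
        return 0.0
    captured = sum(dot(window, u) ** 2 for u in basis)
    return captured / energy


# Hypothetical training waveforms (six samples each), not real seismic data.
TRAIN = [
    [1.0, 0.0, -1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, -1.0, 0.0, 1.0],
]
BASIS = orthonormal_basis(TRAIN)

# An event that is a linear combination of the two training events.
mixed_event = [2.0 * a - 0.5 * b for a, b in zip(TRAIN[0], TRAIN[1])]
# A window unrelated to the training events.
unrelated = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0]

print(detection_statistic(mixed_event, BASIS))
print(detection_statistic(unrelated, BASIS))
```

A two-dimensional subspace captures essentially all of the mixed event's energy, whereas a detector built from either training waveform alone would capture only part of it. A practical detector would also set a threshold and, as in the abstract, require detections on two stations.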
Mining Tasks from the Web Anchor Text Graph: MSR Notebook Paper for the TREC 2015 Tasks Track
2015-11-20
Mining Tasks from the Web Anchor Text Graph: MSR Notebook Paper for the TREC 2015 Tasks Track Paul N. Bennett Microsoft Research Redmond, USA pauben...anchor text graph has proven useful in the general realm of query reformulation [2], we sought to quantify the value of extracting key phrases from...anchor text in the broader setting of the task understanding track. Given a query, our approach considers a simple method for identifying a relevant
Investigation into the effect of infrastructure on fly-in fly-out mining workers.
Perring, Adam; Pham, Kieu; Snow, Steve; Buys, Laurie
2014-12-01
To explore fly-in fly-out (FIFO) mining workers' attitudes towards the leisure time they spend in mining camps, the recreational and social aspects of mining camp culture, the camps' communal and recreational infrastructure and activities, and implications for health. In-depth semistructured interviews. Individual interviews at locations convenient for each participant. A total of seven participants, one female and six males. The age group varied within 20-59 years. Marital status varied across participants. A qualitative approach was used to interview participants, with responses thematically analysed. Findings highlight how the recreational infrastructure and activities at mining camps impact participants' enjoyment of the camps and their feelings of community and social inclusion. Three main areas of need were identified in the interviews, as follows: (i) on-site facilities and activities; (ii) the role of infrastructure in facilitating a sense of community; and (iii) barriers to social interaction. Recreational infrastructure and activities enhance the experience of FIFO workers at mining camps. The availability of quality recreational facilities helps promote social interaction, provides for greater social inclusion and improves the experience of mining camps for their temporary FIFO residents. The infrastructure also needs to allow for privacy and individual recreational activities, which participants identified as important emotional needs. Developing appropriate recreational infrastructure at mining camps would enhance social interactions among FIFO workers, improve their well-being and foster a sense of community. Introducing infrastructure to promote social and recreational activities could also reduce alcohol-related social exclusion. © 2014 National Rural Health Alliance Inc.
A data mining approach to intelligence operations
NASA Astrophysics Data System (ADS)
Memon, Nasrullah; Hicks, David L.; Harkiolakis, Nicholas
2008-03-01
In this paper we examine the latest thinking, approaches and methodologies in use for finding the nuggets of information and subliminal (and perhaps intentionally hidden) patterns and associations that are critical to identify criminal activity and suspects to private and government security agencies. An emphasis in the paper is placed on Social Network Analysis and Investigative Data Mining, and the use of these technologies in the counterterrorism domain. Tools and techniques from both areas are described, along with the important tasks for which they can be used to assist with the investigation and analysis of terrorist organizations. The process of collecting data about these organizations is also considered along with the inherent difficulties that are involved.
Zhao, Ning; Zheng, Guang; Li, Jian; Zhao, Hong-Yan; Lu, Cheng; Jiang, Miao; Zhang, Chi; Guo, Hong-Tao; Lu, Ai-Ping
2018-01-09
To identify the commonalities between rheumatoid arthritis (RA) and diabetes mellitus (DM) to understand the mechanisms of Chinese medicine (CM) in different diseases with the same treatment. A text mining approach was adopted to analyze the commonalities between RA and DM according to CM and biological elements. The major commonalities were subsequently verified in RA and DM rat models, in which herbal formula for the treatment of both RA and DM identified via text mining was used as the intervention. Similarities were identified between RA and DM regarding the CM approach used for diagnosis and treatment, as well as the networks of biological activities affected by each disease, including the involvement of adhesion molecules, oxidative stress, cytokines, T-lymphocytes, apoptosis, and inflammation. The Ramulus Cinnamomi-Radix Paeoniae Alba-Rhizoma Anemarrhenae is an herbal combination used to treat RA and DM. This formula demonstrated similar effects on oxidative stress and inflammation in rats with collagen-induced arthritis, which supports the text mining results regarding the commonalities between RA and DM. Commonalities between the biological activities involved in RA and DM were identified through text mining, and both RA and DM might be responsive to the same intervention at a specific stage.
Byrne, Patrick; Runkel, Robert L; Walton-Day, Katherine
2017-07-01
Combining the synoptic mass balance approach with principal components analysis (PCA) can be an effective method for discretising the chemistry of inflows and source areas in watersheds where contamination is diffuse in nature and/or complicated by groundwater interactions. This paper presents a field-scale study in which synoptic sampling and PCA are employed in a mineralized watershed (Lion Creek, Colorado, USA) under low flow conditions to (i) quantify the impacts of mining activity on stream water quality; (ii) quantify the spatial pattern of constituent loading; and (iii) identify inflow sources most responsible for observed changes in stream chemistry and constituent loading. Several of the constituents investigated (Al, Cd, Cu, Fe, Mn, Zn) fail to meet chronic aquatic life standards along most of the study reach. The spatial pattern of constituent loading suggests four primary sources of contamination under low flow conditions. Three of these sources are associated with acidic (pH <3.1) seeps that enter along the left bank of Lion Creek. Investigation of inflow water (trace metal and major ion) chemistry using PCA suggests a hydraulic connection between many of the left bank inflows and mine water in the Minnesota Mine shaft located to the north-east of the river channel. In addition, water chemistry data during a rainfall-runoff event suggests the spatial pattern of constituent loading may be modified during rainfall due to dissolution of efflorescent salts or erosion of streamside tailings. These data point to the complexity of contaminant mobilisation processes and constituent loading in mining-affected watersheds but the combined synoptic sampling and PCA approach enables a conceptual model of contaminant dynamics to be developed to inform remediation.
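The mass balance part of the approach above rests on simple arithmetic: constituent load is concentration multiplied by discharge, and the change in load between successive stream sites points to sources or sinks within that reach. The sketch below shows this calculation only; the site names, concentrations and discharges are hypothetical and are not taken from the Lion Creek study.

```python
def load_kg_per_day(conc_mg_per_l, discharge_l_per_s):
    """Constituent load: (mg/L * L/s) = mg/s, converted to kg/day."""
    return conc_mg_per_l * discharge_l_per_s * 86400 / 1e6


def reach_load_changes(sites):
    """Load change between consecutive sites.

    sites: list of (name, concentration in mg/L, discharge in L/s),
    ordered from upstream to downstream.
    """
    loads = [(name, load_kg_per_day(c, q)) for name, c, q in sites]
    changes = []
    for (up_name, up_load), (down_name, down_load) in zip(loads, loads[1:]):
        changes.append((f"{up_name}->{down_name}", down_load - up_load))
    return changes


# Hypothetical synoptic samples for one constituent (for example Zn).
SITES = [
    ("S1", 0.5, 10.0),
    ("S2", 1.0, 12.0),
    ("S3", 0.9, 13.0),
]

for reach, change in reach_load_changes(SITES):
    print(f"{reach}: {change:+.4f} kg/day")
```

A positive change flags a reach that gains load (a candidate source area), while a negative change suggests attenuation. PCA of the inflow chemistry, as described in the abstract, is a separate step that this sketch does not cover.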
Sánchez-López, Ariadna S; Del Carmen A González-Chávez, Ma; Carrillo-González, Rogelio; Vangronsveld, Jaco; Díaz-Garduño, Margarita
2015-01-01
The aim of this research was to identify wild plant species applicable for remediation of mine tailings in arid soils. Plants growing on two mine tailings were identified and evaluated for their potential use in phytoremediation based on the concentration of potentially toxic elements (PTEs) in roots and shoots, and on bioconcentration (BCF) and translocation factors (TF). Total, water-soluble and DTPA-extractable concentrations of Pb, Cd, Zn, Cu, Co and Ni in rhizospheric and bulk soil were determined. Twelve species can grow on mine tailings, accumulate PTE concentrations above the commonly accepted phytotoxicity levels, and are suitable for establishing a vegetation cover on barren mine tailings in the Zimapan region. Pteridium sp. is suitable for Zn and Cd phytostabilization. Aster gymnocephalus is a potential phytoextractor for Zn, Cd, Pb and Cu; Gnaphalium sp. for Cu and Crotalaria pumila for Zn. The species play different roles according to the specific conditions where they are growing, at one site behaving as a PTE accumulator and at another as a stabilizer. For this reason, and due to the lack of a unified approach for calculation and interpretation of bioaccumulation factors, considering only BCF and TF may not be practical in all cases.
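As a worked illustration of the two factors named above, the sketch below uses one common convention: BCF as root concentration divided by soil concentration, and TF as shoot concentration divided by root concentration. The abstract itself notes that no unified calculation exists, so both definitions, the thresholds of 1.0, and the sample concentrations are assumptions made for illustration only.

```python
def bioconcentration_factor(root_conc, soil_conc):
    """BCF under the assumed convention: root concentration / soil concentration."""
    return root_conc / soil_conc


def translocation_factor(shoot_conc, root_conc):
    """TF under the assumed convention: shoot concentration / root concentration."""
    return shoot_conc / root_conc


def suggest_role(bcf, tf):
    """Rule-of-thumb screening; the 1.0 thresholds are an assumption."""
    if tf > 1.0:
        return "phytoextraction candidate"
    if bcf > 1.0:
        return "phytostabilization candidate"
    return "not indicated by BCF/TF alone"


# Hypothetical Zn concentrations in mg/kg: (soil, root, shoot).
SAMPLES = {
    "plant_a": (100.0, 250.0, 50.0),
    "plant_b": (100.0, 80.0, 200.0),
}

for name, (soil, root, shoot) in SAMPLES.items():
    bcf = bioconcentration_factor(root, soil)
    tf = translocation_factor(shoot, root)
    print(name, round(bcf, 2), round(tf, 2), suggest_role(bcf, tf))
```

Because the same species can behave as an accumulator at one site and a stabilizer at another, a screening like this is only a first pass and would need site-specific data behind it.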
NASA Astrophysics Data System (ADS)
Gutierrez, Adrian Emmanuel Gutierrez
A 3D gravity model of the Copper Flat Mine was developed as part of the exploration for new resources at the mine. The project is located in the Las Animas Mining District in Sierra County, New Mexico. The mine has been producing ore since 1877 and is currently owned by the New Mexico Copper Corporation, which plans on bringing the closed copper mine back into production with innovation and a sustainable approach to mining development. The project is located on the eastern side of the Arizona-Sonora-New Mexico porphyry copper belt of Cretaceous age. Copper Flat is predominantly a Cretaceous age stratovolcano composed mostly of quartz monzonite. The quartz monzonite was intruded by a block of andesite, after which a series of latite dikes created veining along the topography where the majority of the deposit lies. The Copper Flat deposit is mineralized along a breccia pipe where the breccia is the result of auto-brecciation due to the pore pressure. There have been a number of geophysical studies conducted at the site. The most recent survey was a gravity profile of the area. The purpose of the new study is the reinterpretation of the IP survey, and it emphasizes the practical use of the gravity geophysical method in evaluating the validity of the previous survey results. The primary method used to identify the deposit is gravity, for which four Talwani models were created in order to create a 3D model of the ore body. The Talwani models have numerical integration approaches that were used to divide every model into polygons. The profiles were sectioned into polygons; each polygon was assigned a specific density depending on the body being drawn. Three different gridding techniques with three different filtering methods were used, producing ten maps prior to the modeling; these maps were created to establish the best map to fit the models. The calculation of the polygons used an exact formula instead of the numerical integration of the profile made with a Talwani approach.
A least-squares comparison between the calculated and observed gravity is used to determine the best fitting gravity vectors and the best susceptibility for the assemblage of polygonal prisms. The survey is expected to identify the geophysical anomalies found at the Copper Flat deposit in order to identify the alteration that surrounds that part of the ore body. The understanding of the anomalies needs to be reevaluated in order to have a sharper model of Copper Flat, and to understand the relations of the different structures that shaped this copper porphyry deposit.
MELODI: Mining Enriched Literature Objects to Derive Intermediates.
Elsworth, Benjamin; Dawe, Karen; Vincent, Emma E; Langdon, Ryan; Lynch, Brigid M; Martin, Richard M; Relton, Caroline; Higgins, Julian P T; Gaunt, Tom R
2018-01-12
The scientific literature contains a wealth of information from different fields on potential disease mechanisms. However, identifying and prioritizing mechanisms for further analytical evaluation presents enormous challenges in terms of the quantity and diversity of published research. The application of data mining approaches to the literature offers the potential to identify and prioritize mechanisms for more focused and detailed analysis. Here we present MELODI, a literature mining platform that can identify mechanistic pathways between any two biomedical concepts. Two case studies demonstrate the potential uses of MELODI and how it can generate hypotheses for further investigation. First, an analysis of ETS-related gene ERG and prostate cancer derives the intermediate transcription factor SP1, recently confirmed to be physically interacting with ERG. Second, examining the relationship between a new potential risk factor for pancreatic cancer identifies possible mechanistic insights which can be studied in vitro. We have demonstrated the possible applications of MELODI, including two case studies. MELODI has been implemented as a Python/Django web application, and is freely available to use at [www.melodi.biocompute.org.uk]. © The Author(s) 2018. Published by Oxford University Press on behalf of the International Epidemiological Association
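To make the idea of deriving intermediates between two concepts concrete, here is a minimal sketch that joins two sets of subject-predicate-object triples on a shared middle term. It is not MELODI's implementation, which the abstract does not describe in detail; the triples, predicate names and function name are placeholders invented for illustration.

```python
def find_intermediates(triples_a, triples_b):
    """Return pairs of triples linked through a shared intermediate.

    triples_a: (subject, predicate, object) triples about concept A.
    triples_b: (subject, predicate, object) triples leading to concept B.
    A pair is linked when the object of a triple from A is the subject of a triple from B.
    """
    by_object = {}
    for subj, pred, obj in triples_a:
        by_object.setdefault(obj, []).append((subj, pred, obj))
    paths = []
    for subj, pred, obj in triples_b:
        for first in by_object.get(subj, []):
            paths.append((first, (subj, pred, obj)))
    return paths


# Placeholder triples; these are not real literature-derived findings.
TRIPLES_A = [
    ("exposure", "INTERACTS_WITH", "protein_x"),
    ("exposure", "STIMULATES", "protein_y"),
]
TRIPLES_B = [
    ("protein_x", "ASSOCIATED_WITH", "outcome"),
    ("protein_z", "CAUSES", "outcome"),
]

for first, second in find_intermediates(TRIPLES_A, TRIPLES_B):
    print(first, "->", second)
```

A real platform would additionally need some form of enrichment statistics to prioritize intermediates, which this sketch leaves out.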
An application of data mining in district heating substations for improving energy performance
NASA Astrophysics Data System (ADS)
Xue, Puning; Zhou, Zhigang; Chen, Xin; Liu, Jing
2017-11-01
Automatic meter reading systems are capable of collecting and storing a huge amount of district heating (DH) data. However, the data obtained are rarely fully utilized. Data mining is a promising technology to discover potentially interesting knowledge from vast data. This paper applies data mining methods to analyse the massive data for improving energy performance of a DH substation. The technical approach contains three steps: data selection, cluster analysis and association rule mining (ARM). Two-heating-season data of a substation are used for the case study. Cluster analysis identifies six distinct heating patterns based on the primary heat of the substation. ARM reveals that secondary pressure difference and secondary flow rate have a strong correlation. Using the discovered rules, a fault occurring in a remote flow meter installed at the secondary network is detected accurately. The application demonstrates that data mining techniques can effectively extract potentially useful knowledge to better understand substation operation strategies and improve substation energy performance.
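The association rule mining step above can be illustrated with the standard support, confidence and lift measures, together with a simple check for records that violate a rule (one way a discovered rule could flag a suspect meter reading). The discretised item names such as `dp_high` and `flow_high` and the records themselves are hypothetical, and this is not the authors' procedure.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence relative to the baseline frequency of the consequent."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


def violations(transactions, antecedent, consequent):
    """Indices of records where the antecedent holds but the consequent does not."""
    ante, cons = set(antecedent), set(consequent)
    return [i for i, t in enumerate(transactions) if ante <= t and not cons <= t]


# Hypothetical discretised hourly records: secondary pressure difference and flow rate.
RECORDS = [
    {"dp_high", "flow_high"},
    {"dp_high", "flow_high"},
    {"dp_low", "flow_low"},
    {"dp_low", "flow_low"},
    {"dp_high", "flow_low"},  # inconsistent with the rule dp_high -> flow_high
]

RULE = (["dp_high"], ["flow_high"])
print(support(RECORDS, RULE[0] + RULE[1]))
print(confidence(RECORDS, *RULE))
print(lift(RECORDS, *RULE))
print(violations(RECORDS, *RULE))
```

Records returned by `violations` are only candidates for inspection; whether they indicate a meter fault would need to be confirmed against the substation's operating data.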
NASA Astrophysics Data System (ADS)
Sakala, E.; Fourie, F.; Gomo, M.; Coetzee, H.
2018-01-01
In the last 20 years, the popular mineral systems approach has been used successfully for the exploration of various mineral commodities at various scales owing to its scientific soundness, cost effectiveness and simplicity in mapping the critical processes required for the formation of deposits. In the present study this approach was modified for the assessment of groundwater vulnerability. In terms of the modified approach, water drives the pollution migration processes, with various analogies having been derived from the mineral systems approach. The modified approach is illustrated here by the discussion of a case study of acid mine drainage (AMD) pollution in the Witbank, Ermelo and Highveld coalfields of the Mpumalanga and KwaZulu-Natal Provinces in South Africa. Many AMD cases have been reported in these provinces in recent years and are a cause of concern for local municipalities, mining and environmental agencies. In the Witbank, Ermelo and Highveld coalfields, several areas have been mined out while mining has not yet started in others, hence the need to identify groundwater regions prone to AMD pollution in order to avoid further impacts on the groundwater resources. A knowledge-based fuzzy expert system was built using vulnerability factors (energy sources, ligands sources, pollutant sources, transportation pathways and traps) to generate a groundwater vulnerability model of the coalfields. Highly vulnerable areas were identified in Witbank coalfield and the eastern part of the Ermelo coalfield which are characterised by the presence of AMD sources, good subsurface transport coupled with poor AMD pollution trapping properties. The results from the analysis indicate significant correlations between model values and both groundwater sulphate concentrations as well as pH. This shows that the proposed approach can indeed be used as an alternative to traditional methods of groundwater vulnerability assessment. 
The methodology only considers the AMD pollution attenuation and migration at a regional scale and does not account for local-scale sources of pollution and attenuation. Further research to refine the approach may include the incorporation of groundwater flow direction, rock-pollution reaction time, and temporal datasets for the future prediction of groundwater vulnerability. The approach may be applied to other coalfields to assess its robustness to changing hydrogeological conditions.
Developing customer databases.
Rao, S K; Shenbaga, S
2000-01-01
There is a growing consensus among pharmaceutical companies that more product and customer-specific approaches to marketing and selling a new drug can result in substantial increases in sales. Marketers and researchers taking a proactive micro-marketing approach to identifying, profiling, and communicating with target customers are likely to facilitate such approaches and outcomes. This article provides a working framework for creating customer databases that can be effectively mined to achieve a variety of such marketing and sales force objectives.
An Integrative data mining approach to identifying Adverse Outcome Pathway (AOP) Signatures
The Adverse Outcome Pathway (AOP) framework is a tool for making biological connections and summarizing key information across different levels of biological organization to connect biological perturbations at the molecular level to adverse outcomes for an individual or populatio...
Using high-throughput literature mining to support read-across predictions of toxicity (SOT)
Building scientific confidence in the development and evaluation of read-across remains an ongoing challenge. Approaches include establishing systematic frameworks to identify sources of uncertainty and ways to address them. One source of uncertainty is related to characterizing ...
High-throughput literature mining to support read-across predictions of toxicity (ASCCT meeting)
Building scientific confidence in the development and evaluation of read-across remains an ongoing challenge. Approaches include establishing systematic frameworks to identify sources of uncertainty and ways to address them. One source of uncertainty is related to characterizing ...
An Integrative Data Mining Approach to Identify Adverse Outcome Pathway Signatures
Adverse Outcome Pathways (AOPs) provide a formal framework for describing the mechanisms underlying the toxicity of chemicals in our environment. This process improves our ability to incorporate high-throughput toxicity testing (HTT) results and biomarker information on early key...
Mining Context-Aware Association Rules Using Grammar-Based Genetic Programming.
Luna, Jose Maria; Pechenizkiy, Mykola; Del Jesus, Maria Jose; Ventura, Sebastian
2017-09-25
Real-world data usually comprise features whose interpretation depends on some contextual information. Such contextual-sensitive features and patterns are of high interest to be discovered and analyzed in order to obtain the right meaning. This paper formulates the problem of mining context-aware association rules, which refers to the search for associations between itemsets such that the strength of their implication depends on a contextual feature. For the discovery of this type of associations, a model that restricts the search space and includes syntax constraints by means of a grammar-based genetic programming methodology is proposed. Grammars can be considered as a useful way of introducing subjective knowledge to the pattern mining process as they are highly related to the background knowledge of the user. The performance and usefulness of the proposed approach is examined by considering synthetically generated datasets. A posteriori analysis on different domains is also carried out to demonstrate the utility of this kind of associations. For example, in educational domains, it is essential to identify and understand contextual and context-sensitive factors that affect overall and individual student behavior and performance. The results of the experiments suggest that the approach is feasible and it automatically identifies interesting context-aware associations from real-world datasets.
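The notion of a context-aware rule described above can be illustrated with a minimal sketch. It is not the grammar-based genetic programming method of the paper; it only computes the support and confidence of one fixed rule separately for each value of a contextual feature, so that a context-dependent implication strength becomes visible. The function name, the transaction layout and the toy educational data are assumptions made for illustration.

```python
from collections import defaultdict

def rule_metrics_by_context(transactions, antecedent, consequent):
    """Return {context: (support, confidence)} for the rule antecedent -> consequent.

    `transactions` is an iterable of (context, items) pairs (hypothetical layout).
    """
    antecedent, consequent = set(antecedent), set(consequent)
    totals = defaultdict(int)     # transactions seen per context
    ante_hits = defaultdict(int)  # transactions containing the antecedent
    both_hits = defaultdict(int)  # transactions containing antecedent and consequent
    for context, items in transactions:
        items = set(items)
        totals[context] += 1
        if antecedent <= items:
            ante_hits[context] += 1
            if consequent <= items:
                both_hits[context] += 1
    metrics = {}
    for context, n in totals.items():
        support = both_hits[context] / n
        confidence = both_hits[context] / ante_hits[context] if ante_hits[context] else 0.0
        metrics[context] = (support, confidence)
    return metrics

# Toy data (invented): does posting in the forum go together with passing the quiz?
transactions = [
    ("weekday", {"forum_post", "quiz_pass"}),
    ("weekday", {"forum_post", "quiz_pass"}),
    ("weekday", {"forum_post"}),
    ("weekend", {"forum_post"}),
    ("weekend", {"forum_post", "quiz_pass"}),
    ("weekend", {"video"}),
    ("weekend", {"forum_post"}),
]
metrics = rule_metrics_by_context(transactions, {"forum_post"}, {"quiz_pass"})
```

In this toy data the rule holds with confidence 2/3 on weekdays and 1/3 on weekends, which is the kind of context dependence the abstract refers to.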
Model of environmental life cycle assessment for coal mining operations.
Burchart-Korol, Dorota; Fugiel, Agata; Czaplicka-Kolarz, Krystyna; Turek, Marian
2016-08-15
This paper presents a novel approach to environmental assessment of coal mining operations, which enables assessment of the factors that are both directly and indirectly affecting the environment and are associated with the production of raw materials and energy used in processes. The primary novelty of the paper is the development of a computational environmental life cycle assessment (LCA) model for coal mining operations and the application of the model for coal mining operations in Poland. The LCA model enables the assessment of environmental indicators for all identified unit processes in hard coal mines with the life cycle approach. The proposed model enables the assessment of greenhouse gas emissions (GHGs) based on the IPCC method and the assessment of damage categories, such as human health, ecosystems and resources based on the ReCiPe method. The model enables the assessment of GHGs for hard coal mining operations in three time frames: 20, 100 and 500 years. The model was used to evaluate the coal mines in Poland. It was demonstrated that the largest environmental impacts in damage categories were associated with the use of fossil fuels, methane emissions and the use of electricity, processing of wastes, heat, and steel supports. It was concluded that an environmental assessment of coal mining operations, apart from direct influence from processing waste, methane emissions and drainage water, should include the use of electricity, heat and steel, particularly for steel supports. Because the model allows the comparison of environmental impact assessment for various unit processes, it can be used for all hard coal mines, not only in Poland but also in the world. This development is an important step forward in the study of the impacts of fossil fuels on the environment with the potential to mitigate the impact of the coal industry on the environment. Copyright © 2016 Elsevier B.V. All rights reserved.
Jorritsma, Wiard; Cnossen, Fokie; Dierckx, Rudi A; Oudkerk, Matthijs; van Ooijen, Peter M A
2016-01-01
To perform a post-deployment usability evaluation of a radiology Picture Archiving and Communication System (PACS) client based on pattern mining of user interaction log data, and to assess the usefulness of this approach compared to a field study. All user actions performed on the PACS client were logged for four months. A data mining technique called closed sequential pattern mining was used to automatically extract frequently occurring interaction patterns from the log data. These patterns were used to identify usability issues with the PACS. The results of this evaluation were compared to the results of a field study based usability evaluation of the same PACS client. The interaction patterns revealed four usability issues: (1) the display protocols do not function properly, (2) the line measurement tool stays active until another tool is selected, rather than being deactivated after one use, (3) the PACS's built-in 3D functionality does not allow users to effectively perform certain 3D-related tasks, (4) users underuse the PACS's customization possibilities. All usability issues identified based on the log data were also found in the field study, which identified 48 issues in total. Post-deployment usability evaluation based on pattern mining of user interaction log data provides useful insights into the way users interact with the radiology PACS client. However, it reveals few usability issues compared to a field study and should therefore not be used as the sole method of usability evaluation. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
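The idea of a closed pattern in interaction logs can be sketched as follows. This is a simplified stand-in, not the algorithm used in the study: it only considers contiguous action sequences up to a fixed length, whereas closed sequential pattern mining in general also allows gaps. A pattern is kept as closed when no longer frequent pattern containing it has the same support. The action names and session data are invented.

```python
from collections import Counter

def frequent_ngrams(sessions, min_support, max_len=4):
    """Map each contiguous action sequence to the number of sessions containing it."""
    support = Counter()
    for session in sessions:
        seen = set()
        for n in range(1, max_len + 1):
            for i in range(len(session) - n + 1):
                seen.add(tuple(session[i:i + n]))
        support.update(seen)  # each pattern counts once per session
    return {p: s for p, s in support.items() if s >= min_support}

def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))

def closed_patterns(frequent):
    """Drop patterns that a longer pattern with identical support already covers."""
    closed = {}
    for pattern, sup in frequent.items():
        absorbed = any(
            len(other) > len(pattern) and other_sup == sup and contains(other, pattern)
            for other, other_sup in frequent.items()
        )
        if not absorbed:
            closed[pattern] = sup
    return closed

# Hypothetical logged sessions of user actions
sessions = [
    ["open", "measure", "measure", "close"],
    ["open", "measure", "zoom", "close"],
    ["open", "measure", "close"],
]
closed = closed_patterns(frequent_ngrams(sessions, min_support=3))
```

Here "open" and "measure" are each frequent on their own, but both are absorbed by the pattern ("open", "measure"), which occurs in the same three sessions, so only that pattern and ("close",) remain.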
Annotating images by mining image search results.
Wang, Xin-Jing; Zhang, Lei; Li, Xirong; Ma, Wei-Ying
2008-11-01
Although it has been studied for years by the computer vision and machine learning communities, image annotation is still far from practical. In this paper, we propose a novel attempt at model-free image annotation, which is a data-driven approach that annotates images by mining their search results. Some 2.4 million images with their surrounding text are collected from a few photo forums to support this approach. The entire process is formulated in a divide-and-conquer framework where a query keyword is provided along with the uncaptioned image to improve both the effectiveness and efficiency. This is helpful when the collected data set is not dense everywhere. In this sense, our approach contains three steps: 1) the search process to discover visually and semantically similar search results, 2) the mining process to identify salient terms from textual descriptions of the search results, and 3) the annotation rejection process to filter out noisy terms yielded by Step 2. To ensure real-time annotation, two key techniques are leveraged: one is to map the high-dimensional image visual features into hash codes, and the other is to implement it as a distributed system, of which the search and mining processes are provided as Web services. As a typical result, the entire process finishes in less than 1 second. Since no training data set is required, our approach enables annotating with unlimited vocabulary and is highly scalable and robust to outliers. Experimental results on both real Web images and a benchmark image data set show the effectiveness and efficiency of the proposed algorithm. It is also worth noting that, although the entire approach is illustrated within the divide-and-conquer framework, a query keyword is not crucial to our current implementation. We provide experimental results to prove this.
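The abstract does not say which hashing scheme maps visual features to hash codes. As one possible illustration, the sketch below uses random-hyperplane hashing (an assumption, not the authors' method): each bit records on which side of a random hyperplane the feature vector falls, so similar vectors tend to receive codes with a small Hamming distance. All names and the feature values are hypothetical.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def hash_code(vector, hyperplanes):
    """Pack the sign of each projection into an integer, one bit per hyperplane."""
    code = 0
    for plane in hyperplanes:
        dot = sum(v * w for v, w in zip(vector, plane))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code

def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")

planes = make_hyperplanes(dim=8, n_bits=16)
features = [0.2, -1.3, 0.7, 0.0, 2.1, -0.4, 0.9, 1.5]  # invented feature vector
code = hash_code(features, planes)
scaled_code = hash_code([2 * x for x in features], planes)   # same direction
opposite_code = hash_code([-x for x in features], planes)    # opposite direction
```

Comparing short integer codes with `hamming` is far cheaper than comparing the original high-dimensional vectors, which is the property that makes such codes attractive for real-time search.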
Promoter Sequences Prediction Using Relational Association Rule Mining
Czibula, Gabriela; Bocicor, Maria-Iuliana; Czibula, Istvan Gergely
2012-01-01
In this paper we approach, from a computational perspective, the problem of promoter sequence prediction, an important problem within the field of bioinformatics. As the conditions for a DNA sequence to function as a promoter are not known, machine learning based classification models are still being developed to approach the problem of promoter identification in DNA. We propose a classification model based on relational association rule mining. Relational association rules are a particular type of association rules and describe numerical orderings between attributes that commonly occur over a data set. Our classifier is based on the discovery of relational association rules for predicting whether or not a DNA sequence contains a promoter region. An experimental evaluation of the proposed model and a comparison with similar existing approaches are provided. The obtained results show that our classifier outperforms the existing techniques for identifying promoter sequences, confirming the potential of our proposal. PMID:22563233
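A relational association rule, as defined above, states that an ordering between two attributes holds in a large share of the records. The following minimal sketch enumerates only binary rules over numeric tuples and is not the classifier of the paper; how DNA sequences are encoded as numeric attributes is not given in the abstract, so the records here are invented.

```python
from itertools import combinations
import operator

RELATIONS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}

def binary_relational_rules(records, min_support):
    """Return {(i, relation, j): support} for attribute pairs whose ordering
    holds in at least min_support of the records."""
    n_attrs = len(records[0])
    rules = {}
    for i, j in combinations(range(n_attrs), 2):
        for symbol, holds in RELATIONS.items():
            support = sum(holds(r[i], r[j]) for r in records) / len(records)
            if support >= min_support:
                rules[(i, symbol, j)] = support
    return rules

# Invented numeric records with three attributes each
records = [
    (1, 3, 2),
    (2, 5, 4),
    (0, 4, 4),
    (3, 6, 1),
]
rules = binary_relational_rules(records, min_support=0.75)
```

With these records, attribute 0 is smaller than attribute 1 in every record, while the other two orderings hold in three records out of four.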
A novel approach to generating CER hypotheses based on mining clinical data.
Zhang, Shuo; Li, Lin; Yu, Yiqin; Sun, Xingzhi; Xu, Linhao; Zhao, Wei; Teng, Xiaofei; Pan, Yue
2013-01-01
Comparative effectiveness research (CER) is a scientific method of investigating the effectiveness of alternative intervention methods. In a CER study, clinical researchers typically start with a CER hypothesis, and aim to evaluate it by applying a series of medical statistical methods. Traditionally, the CER hypotheses are defined manually by clinical researchers. This makes the task of hypothesis generation very time-consuming and the quality of hypotheses heavily dependent on the researchers' skills. Recently, with more electronic medical data being collected, it is highly promising to apply computerized methods for discovering CER hypotheses from clinical data sets. In this poster, we propose a novel approach to automatically generating CER hypotheses based on mining clinical data, and present a case study showing that the approach can help clinical researchers identify potentially valuable hypotheses and eventually define high quality CER studies.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sanfilippo, Antonio P.; McGrath, Liam R.; Whitney, Paul D.
2011-11-17
We present a computational approach to radical rhetoric that leverages the co-expression of rhetoric and action features in discourse to identify violent intent. The approach combines text mining and machine learning techniques with insights from Frame Analysis and theories that explain the emergence of violence in terms of moral disengagement, the violation of sacred values and social isolation in order to build computational models that identify messages from terrorist sources and estimate their proximity to an attack. We discuss a specific application of this approach to a body of documents from and about radical and terrorist groups in the Middle East and present the results achieved.
Mining and harnessing natural variation - a little MAGIC
USDA-ARS's Scientific Manuscript database
As has been frequently noted, exotic germplasm (lines unadapted to local conditions) can be sources of very beneficial genes. The trouble is that it's often difficult to identify these genes. We propose an approach in which mutations can be used to uncover useful variants of natural genes....
Management of the water balance and quality in mining areas
NASA Astrophysics Data System (ADS)
Pasanen, Antti; Krogerus, Kirsti; Mroueh, Ulla-Maija; Turunen, Kaisa; Backnäs, Soile; Vento, Tiia; Veijalainen, Noora; Hentinen, Kimmo; Korkealaakso, Juhani
2015-04-01
Although mining companies have long been conscious of water related risks, they still face environmental management problems. These problems mainly emerge because mine sites' water balances have not been adequately assessed in the planning stage of mines. A more consistent approach is required to help mining companies identify risks and opportunities related to the management of water resources in all stages of mining. This approach requires that the water cycle of a mine site is interconnected with the general hydrologic water cycle. In addition to knowledge of hydrological conditions, the control of the water balance in the mining processes requires knowledge of mining processes, the ability to adjust process parameters to variable hydrological conditions, adaptation of suitable water management tools and systems, systematic monitoring of amounts and quality of water, adequate capacity in water management infrastructure to handle the variable water flows, best practices to assess the dispersion, mixing and dilution of mine water and pollutant loading to receiving water bodies, and dewatering and separation of water from tailings and precipitates. The WaterSmart project aims to improve the awareness of actual quantities of water and water balances in mine areas in order to improve the forecasting and the management of the water volumes. The study is executed through hydrogeological and hydrological surveys and online monitoring procedures. One of the aims is to exploit on-line water quantity and quality monitoring for the better management of the water balances. The target is to develop practical and end-user-specific on-line input and output procedures. The second objective is to develop mathematical models to calculate combined water balances including the surface, ground and process waters. WSFS, the Hydrological Modeling and Forecasting System of SYKE, is being modified for mining areas. 
New modelling tools are developed on spreadsheet and system dynamics platforms to systematically integrate all water balance components (groundwater, surface water, infiltration, precipitation, mine water facilities and operations etc.) into overall dynamic mine site considerations. After coupling the surface and ground water models (e.g. Feflow and WSFS) with each other, they are compared with Goldsim. The third objective is to integrate the monitoring and modelling tools into the mine management system and process control. The modelling and predictive process control can prevent flood situations, ensure water adequacy, and enable the controlled mine water treatment. The project will develop a constantly updated management system for water balance including both natural waters and process waters.
Lu, Songjian; Jin, Bo; Cowart, L Ashley; Lu, Xinghua
2013-01-01
Genetic and pharmacological perturbation experiments, such as deleting a gene and monitoring gene expression responses, are powerful tools for studying cellular signal transduction pathways. However, it remains a challenge to automatically derive knowledge of a cellular signaling system at a conceptual level from systematic perturbation-response data. In this study, we explored a framework that unifies knowledge mining and data mining towards the goal. The framework consists of the following automated processes: 1) applying an ontology-driven knowledge mining approach to identify functional modules among the genes responding to a perturbation in order to reveal potential signals affected by the perturbation; 2) applying a graph-based data mining approach to search for perturbations that affect a common signal; and 3) revealing the architecture of a signaling system by organizing signaling units into a hierarchy based on their relationships. Applying this framework to a compendium of yeast perturbation-response data, we have successfully recovered many well-known signal transduction pathways; in addition, our analysis has led to many new hypotheses regarding the yeast signal transduction system; finally, our analysis automatically organized perturbed genes as a graph reflecting the architecture of the yeast signaling system. Importantly, this framework transformed molecular findings from a gene level to a conceptual level, which can be readily translated into computable knowledge in the form of rules regarding the yeast signaling system, such as "if genes involved in the MAPK signaling are perturbed, genes involved in pheromone responses will be differentially expressed."
Nariya, Maulik K; Kim, Jae Hyun; Xiong, Jian; Kleindl, Peter A; Hewarathna, Asha; Fisher, Adam C; Joshi, Sangeeta B; Schöneich, Christian; Forrest, M Laird; Middaugh, C Russell; Volkin, David B; Deeds, Eric J
2017-11-01
There is growing interest in generating physicochemical and biological analytical data sets to compare complex mixture drugs, for example, products from different manufacturers. In this work, we compare various crofelemer samples prepared from a single lot by filtration with varying molecular weight cutoffs combined with incubation for different times at different temperatures. The 2 preceding articles describe experimental data sets generated from analytical characterization of fractionated and degraded crofelemer samples. In this work, we use data mining techniques such as principal component analysis and mutual information scores to help visualize the data and determine discriminatory regions within these large data sets. The mutual information score identifies chemical signatures that differentiate crofelemer samples. These signatures, in many cases, would likely be missed by traditional data analysis tools. We also found that supervised learning classifiers robustly discriminate samples with around 99% classification accuracy, indicating that mathematical models of these physicochemical data sets are capable of identifying even subtle differences in crofelemer samples. Data mining and machine learning techniques can thus identify fingerprint-type attributes of complex mixture drugs that may be used for comparative characterization of products. Copyright © 2017 American Pharmacists Association®. All rights reserved.
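The mutual information score mentioned above measures how much knowing a feature value reduces uncertainty about the sample class. A minimal sketch for discrete values follows; how the study discretized its continuous physicochemical measurements is not stated in the abstract, so the binned feature values ("hi"/"lo") and the labels are assumptions for illustration.

```python
from collections import Counter
from math import log2

def mutual_information(feature, labels):
    """Mutual information in bits between two equally long discrete sequences."""
    n = len(labels)
    joint = Counter(zip(feature, labels))  # counts of (feature value, label) pairs
    px = Counter(feature)                  # marginal counts of feature values
    py = Counter(labels)                   # marginal counts of labels
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

labels = ["A", "A", "B", "B"]               # hypothetical sample classes
discriminating = ["hi", "hi", "lo", "lo"]   # feature that separates the classes
uninformative = ["hi", "lo", "hi", "lo"]    # feature unrelated to the classes

mi_good = mutual_information(discriminating, labels)
mi_bad = mutual_information(uninformative, labels)
```

A feature that separates two balanced classes perfectly scores 1 bit, while a feature independent of the classes scores 0, so ranking measurement regions by this score points to the discriminatory ones.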
Review of Recent Development of Dynamic Wind Farm Equivalent Models Based on Big Data Mining
NASA Astrophysics Data System (ADS)
Wang, Chenggen; Zhou, Qian; Han, Mingzhe; Lv, Zhan’ao; Hou, Xiao; Zhao, Haoran; Bu, Jing
2018-04-01
Recently, the big data mining method has been applied in dynamic wind farm equivalent modeling. In this paper, its recent development is reviewed, covering current research both domestic and overseas. Firstly, studies of wind speed prediction, equivalence and its distribution in the wind farm are summarized. Secondly, two typical approaches used in the big data mining method are introduced. For single wind turbine equivalent modeling, the focus is on how to choose and identify equivalent parameters. For multiple wind turbine equivalent modeling, three aspects are addressed: aggregation of different wind turbine clusters, the parameters within the same cluster, and equivalence of the collector system. Thirdly, an outlook on the future development of dynamic wind farm equivalent models is given.
Mining the human gut microbiome for novel stress resistance genes
Culligan, Eamonn P.; Marchesi, Julian R.; Hill, Colin; Sleator, Roy D.
2012-01-01
With the rapid advances in sequencing technologies in recent years, the human genome is now considered incomplete without the complementing microbiome, which outnumbers human genes by a factor of one hundred. The human microbiome, and more specifically the gut microbiome, has received considerable attention and research efforts over the past decade. Many studies have identified and quantified “who is there?,” while others have determined some of their functional capacity, or “what are they doing?” In a recent study, we identified novel salt-tolerance loci from the human gut microbiome using combined functional metagenomic and bioinformatics based approaches. Herein, we discuss the identified loci, their role in salt-tolerance and their importance in the context of the gut environment. We also consider the utility and power of functional metagenomics for mining such environments for novel genes and proteins, as well as the implications and possible applications for future research. PMID:22688726
Hassanpour, Saeed; O'Connor, Martin J; Das, Amar K
2013-08-12
A variety of informatics approaches have been developed that use information retrieval, NLP and text-mining techniques to identify biomedical concepts and relations within scientific publications or their sentences. These approaches have not typically addressed the challenge of extracting more complex knowledge such as biomedical definitions. In our efforts to facilitate knowledge acquisition of rule-based definitions of autism phenotypes, we have developed a novel semantic-based text-mining approach that can automatically identify such definitions within text. Using an existing knowledge base of 156 autism phenotype definitions and an annotated corpus of 26 source articles containing such definitions, we evaluated and compared the average rank of correctly identified rule definition or corresponding rule template using both our semantic-based approach and a standard term-based approach. We examined three separate scenarios: (1) the snippet of text contained a definition already in the knowledge base; (2) the snippet contained an alternative definition for a concept in the knowledge base; and (3) the snippet contained a definition not in the knowledge base. Our semantic-based approach had a higher average rank than the term-based approach for each of the three scenarios (scenario 1: 3.8 vs. 5.0; scenario 2: 2.8 vs. 4.9; and scenario 3: 4.5 vs. 6.2), with each comparison significant at the p-value of 0.05 using the Wilcoxon signed-rank test. Our work shows that leveraging existing domain knowledge in the information extraction of biomedical definitions significantly improves the correct identification of such knowledge within sentences. Our method can thus help researchers rapidly acquire knowledge about biomedical definitions that are specified and evolving within an ever-growing corpus of scientific publications.
Design approaches in quarrying and pit-mining reclamation
Arbogast, Belinda F.
1999-01-01
Reclaimed mine sites have been evaluated so that the public, industry, and land planners may recognize there are innovative designs available for consideration and use. People tend to see cropland, range, and road cuts as a necessary part of their everyday life, not as disturbed areas despite their high visibility. Mining also generates a disturbed landscape, unfortunately one that many consider waste until reclaimed by human beings. The development of mining provides an economic base and use of a natural resource to improve the quality of human life. Equally important is a sensitivity to the geologic origin and natural pattern of the land. Wisely shaping our environment requires a design plan and product that responds to a site's physiography, ecology, function, artistic form, and public perception. An examination of selected sites for their landscape design suggested nine approaches for mining reclamation. The oldest design approach around is nature itself. Humans may sometimes do more damage going to an area in the attempt to repair it. Given enough geologic time, a small-site area, and stable adjacent ecosystems, disturbed areas recover without mankind's input. Visual screens and buffer zones conceal the facility in a camouflage approach. Typically, earth berms, fences, and plantings are used to disguise the mining facility. Restoration targets social or economic benefits by reusing the site for public amenities, most often in urban centers with large populations. A mitigation approach attempts to protect the environment and return mined areas to use with scientific input. The reuse of cement, building rubble, and macadam meets only about 10% of the demand for aggregate. Recognizing the limited supply of mineral resources and encouraging recycling efforts are steps in a renewable resource approach. An educative design approach effectively communicates mining information through outreach, land stewardship, and community service. 
Mine sites used for art show a celebration of beauty and experience -- abstract geology. The last design approach combines art and science in a human-nature ecosystem termed integration. With environmental concerns, an operating or reclaimed mine site can no longer be considered isolated from its surroundings. Site analysis of mine works needs to go beyond site-specific information and relate to the regional context of the greater landscape. Understanding design approach can turn undesirable features (mines and pits) into something perceived as desirable by the public.
Random Forests for Evaluating Pedagogy and Informing Personalized Learning
ERIC Educational Resources Information Center
Spoon, Kelly; Beemer, Joshua; Whitmer, John C.; Fan, Juanjuan; Frazee, James P.; Stronach, Jeanne; Bohonak, Andrew J.; Levine, Richard A.
2016-01-01
Random forests are presented as an analytics foundation for educational data mining tasks. The focus is on course- and program-level analytics including evaluating pedagogical approaches and interventions and identifying and characterizing at-risk students. As part of this development, the concept of individualized treatment effects (ITE) is…
Large-Scale Overlays and Trends: Visually Mining, Panning and Zooming the Observable Universe.
Luciani, Timothy Basil; Cherinka, Brian; Oliphant, Daniel; Myers, Sean; Wood-Vasey, W Michael; Labrinidis, Alexandros; Marai, G Elisabeta
2014-07-01
We introduce a web-based computing infrastructure to assist the visual integration, mining and interactive navigation of large-scale astronomy observations. Following an analysis of the application domain, we design a client-server architecture to fetch distributed image data and to partition local data into a spatial index structure that allows prefix-matching of spatial objects. In conjunction with hardware-accelerated pixel-based overlays and an online cross-registration pipeline, this approach allows the fetching, displaying, panning and zooming of gigabit panoramas of the sky in real time. To further facilitate the integration and mining of spatial and non-spatial data, we introduce interactive trend images: compact visual representations for identifying outlier objects and for studying trends within large collections of spatial objects of a given class. In a demonstration, images from three sky surveys (SDSS, FIRST and simulated LSST results) are cross-registered and integrated as overlays, allowing cross-spectrum analysis of astronomy observations. Trend images are interactively generated from catalog data and used to visually mine astronomy observations of similar type. The front-end of the infrastructure uses the web technologies WebGL and HTML5 to enable cross-platform, web-based functionality. Our approach attains interactive rendering framerates; its power and flexibility enable it to serve the needs of the astronomy community. Evaluation on three case studies, as well as feedback from domain experts, emphasizes the benefits of this visual approach to the observational astronomy field and its potential benefits to large-scale geospatial visualization in general.
Multisource geological data mining and its utilization of uranium resources exploration
NASA Astrophysics Data System (ADS)
Zhang, Jie-lin
2009-10-01
Nuclear energy, as one of the clean energy sources, plays an important role in economic development in China. According to the national long-term development strategy, many more nuclear power plants will be built in the next few years, which poses a great challenge for uranium resources exploration. Research and practice in mineral exploration demonstrate that utilizing modern Earth Observing System (EOS) technology and developing new multi-source geological data mining methods are effective approaches to uranium deposit prospecting. Based on data mining and knowledge discovery technology, this paper uses multi-source geological data to characterize the electromagnetic spectral, geophysical and spatial information of uranium mineralization factors, and provides technical support for uranium prospecting integrated with field remote sensing geological survey. The multi-source geological data used in this paper include satellite hyperspectral imagery (Hyperion), high spatial resolution remote sensing data, uranium geological information, airborne radiometric data, and aeromagnetic and gravity data. Related data mining methods have been developed, such as data fusion of optical data and Radarsat imagery and information integration of remote sensing and geophysical data. Based on the above approaches, the multi-geoscience information of uranium mineralization factors, including complex polystage rock masses, mineralization-controlling faults and hydrothermal alterations, has been identified, the metallogenic potential of uranium has been evaluated, and some prospective areas have been located.
Information mining in remote sensing imagery
NASA Astrophysics Data System (ADS)
Li, Jiang
The volume of remotely sensed imagery continues to grow at an enormous rate due to the advances in sensor technology, and our capability for collecting and storing images has greatly outpaced our ability to analyze and retrieve information from the images. This motivates us to develop image information mining techniques, which is very much an interdisciplinary endeavor drawing upon expertise in image processing, databases, information retrieval, machine learning, and software design. This dissertation proposes and implements an extensive remote sensing image information mining (ReSIM) system prototype for mining useful information implicitly stored in remote sensing imagery. The system consists of three modules: image processing subsystem, database subsystem, and visualization and graphical user interface (GUI) subsystem. Land cover and land use (LCLU) information corresponding to spectral characteristics is identified by supervised classification based on support vector machines (SVM) with automatic model selection, while textural features that characterize spatial information are extracted using Gabor wavelet coefficients. Within LCLU categories, textural features are clustered using an optimized k-means clustering approach to acquire search efficient space. The clusters are stored in an object-oriented database (OODB) with associated images indexed in an image database (IDB). A k-nearest neighbor search is performed using a query-by-example (QBE) approach. Furthermore, an automatic parametric contour tracing algorithm and an O(n) time piecewise linear polygonal approximation (PLPA) algorithm are developed for shape information mining of interesting objects within the image. A fuzzy object-oriented database based on the fuzzy object-oriented data (FOOD) model is developed to handle the fuzziness and uncertainty. 
Three specific applications are presented: integrated land cover and texture pattern mining, shape information mining for change detection of lakes, and fuzzy normalized difference vegetation index (NDVI) pattern mining. The study results show the effectiveness of the proposed system prototype and its potential for other applications in remote sensing.
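As a brief illustration of the index behind the third application, NDVI is conventionally computed per pixel as (NIR - Red) / (NIR + Red). The sketch below covers only this crisp index, not the fuzzy pattern mining described in the dissertation; the function name and the sample reflectance values are hypothetical.

```python
def ndvi(nir, red):
    """Conventional NDVI for one pixel: (NIR - Red) / (NIR + Red).

    Returns 0.0 when both bands are zero to avoid division by zero
    (a convention assumed here, not taken from the dissertation).
    """
    total = nir + red
    if total == 0:
        return 0.0
    return (nir - red) / total


# Hypothetical (NIR, red) reflectance pairs for three pixels.
pixels = [(0.50, 0.10), (0.30, 0.30), (0.05, 0.20)]
ndvi_values = [ndvi(nir, red) for nir, red in pixels]
```

Values near +1 indicate strong near-infrared reflectance relative to red, which is typical of dense vegetation; values at or below zero are typical of water or bare surfaces.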
Papamokos, George; Silins, Ilona
2016-01-01
There is an increasing need for new reliable non-animal based methods to predict and test toxicity of chemicals. Quantitative structure-activity relationship (QSAR), a computer-based method linking chemical structures with biological activities, is used in predictive toxicology. In this study, we tested the approach to combine QSAR data with literature profiles of carcinogenic modes of action automatically generated by a text-mining tool. The aim was to generate data patterns to identify associations between chemical structures and biological mechanisms related to carcinogenesis. Using these two methods, individually and combined, we evaluated 96 rat carcinogens of the hematopoietic system, liver, lung, and skin. We found that skin and lung rat carcinogens were mainly mutagenic, while the group of carcinogens affecting the hematopoietic system and the liver also included a large proportion of non-mutagens. The automatic literature analysis showed that mutagenicity was a frequently reported endpoint in the literature of these carcinogens, however, less common endpoints such as immunosuppression and hormonal receptor-mediated effects were also found in connection with some of the carcinogens, results of potential importance for certain target organs. The combined approach, using QSAR and text-mining techniques, could be useful for identifying more detailed information on biological mechanisms and the relation with chemical structures. The method can be particularly useful in increasing the understanding of structure and activity relationships for non-mutagens. PMID:27625608
Li, Ke; Li, Junfang; Su, Jin; Xiao, Xuefeng; Peng, Xiujuan; Liu, Feng; Li, Defeng; Zhang, Yi; Chong, Tao; Xu, Haiyu; Liu, Changxiao; Yang, Hongjun
2018-03-07
The quality evaluation of traditional Chinese medicine (TCM) formulations is needed to guarantee the safety and efficacy. In our laboratory, we established interaction rules between chemical quality control and biological activity evaluations to study Yuanhu Zhitong tablets (YZTs). Moreover, a quality marker (Q-marker) has recently been proposed as a new concept in the quality control of TCM. However, no appropriate methods are available for the identification of Q-markers from the complex TCM systems. We aimed to use an integrative pharmacological (IP) approach to further identify Q-markers from YZTs through the integration of multidisciplinary knowledge. In addition, data mining was used to determine the correlation between multiple constituents of this TCM and its bioactivity to improve quality control. The IP approach was used to identify the active constituents of YZTs and elucidate the molecular mechanisms by integrating chemical and biosynthetic analyses, drug metabolism, and network pharmacology. Data mining methods including grey relational analysis (GRA) and least squares support vector machine (LS-SVM) regression techniques, were used to establish the correlations among the constituents and efficacy, and dose efficacy in multiple dimensions. Seven constituents (tetrahydropalmatine, α-allocryptopine, protopine, corydaline, imperatorin, isoimperatorin, and byakangelicin) were identified as Q-markers of YZT using IP based on their high abundance, specific presence in the individual herbal constituents and the product, appropriate drug-like properties, and critical contribution to the bioactivity of the mixture of YZT constituents. Moreover, three Q-markers (protopine, α-allocryptopine, and corydaline) were highly correlated with the multiple bioactivities of the YZTs, as found using data mining. 
Finally, three constituents (tetrahydropalmatine, corydaline, and imperatorin) were chosen as minimum combinations that both distinguished the authentic components from false products and indicated the intensity of bioactivity to improve the quality control of YZTs. Tetrahydropalmatine, imperatorin, and corydaline could be used as minimum combinations to effectively control the quality of YZTs. Copyright © 2018. Published by Elsevier GmbH.
A systems biology approach to the global analysis of transcription factors in colorectal cancer.
Pradhan, Meeta P; Prasad, Nagendra K A; Palakal, Mathew J
2012-08-01
Biological entities do not perform in isolation, and often, it is the nature and degree of interactions among numerous biological entities which ultimately determines any final outcome. Hence, experimental data on any single biological entity can be of limited value when considered only in isolation. To address this, we propose that augmenting individual entity data with the literature will not only better define the entity's own significance but also uncover relationships with novel biological entities. To test this notion, we developed a comprehensive text mining and computational methodology that focused on discovering new targets of one class of molecular entities, transcription factors (TF), within one particular disease, colorectal cancer (CRC). We used 39 molecular entities known to be associated with CRC along with six colorectal cancer terms as the bait list, or list of search terms, for mining the biomedical literature to identify CRC-specific genes and proteins. Using the literature-mined data, we constructed a global TF interaction network for CRC. We then developed a multi-level, multi-parametric methodology to identify TFs relevant to CRC. The small bait list, when augmented with literature-mined data, identified a large number of biological entities associated with CRC. The relative importance of these TFs and their associated modules was identified using functional and topological features. Additional validation of these highly ranked TFs using the literature strengthened our findings. Some of the novel TFs that we identified were: SLUG, RUNX1, IRF1, HIF1A, ATF-2, ABL1, ELK-1 and GATA-1. Some of these TFs are associated with functional modules in known pathways of CRC, including the Beta-catenin/development, immune response, transcription, and DNA damage pathways. Our methodology of using text mining data and a multi-level, multi-parameter scoring technique was able to identify both known and novel TFs that have roles in CRC. 
Starting with just one TF (SMAD3) in the bait list, the literature mining process identified an additional 116 CRC-associated TFs. Our network-based analysis showed that these TFs all belonged to any of 13 major functional groups that are known to play important roles in CRC. Among these identified TFs, we obtained a novel six-node module consisting of ATF2-P53-JNK1-ELK1-EPHB2-HIF1A, from which the novel JNK1-ELK1 association could potentially be a significant marker for CRC.
An Integrated Assessment Approach to Address Artisanal and Small-Scale Gold Mining in Ghana
Basu, Niladri; Renne, Elisha P.; Long, Rachel N.
2015-09-17
Artisanal and small-scale gold mining (ASGM) is growing in many regions of the world including Ghana. The problems in these communities are complex and multi-faceted. To help increase understanding of such problems, and to enable consensus-building and effective translation of scientific findings to stakeholders, help inform policies, and ultimately improve decision making, we utilized an Integrated Assessment approach to study artisanal and small-scale gold mining activities in Ghana. Though Integrated Assessments have been used in the fields of environmental science and sustainable development, their use in addressing specific matter in public health, and in particular, environmental and occupational health is quite limited despite their many benefits. The aim of the current paper was to describe specific activities undertaken and how they were organized, and the outputs and outcomes of our activity. In brief, three disciplinary workgroups (Natural Sciences, Human Health, Social Sciences and Economics) were formed, with 26 researchers from a range of Ghanaian institutions plus international experts. The workgroups conducted activities in order to address the following question: What are the causes, consequences and correctives of small-scale gold mining in Ghana? More specifically: What alternatives are available in resource-limited settings in Ghana that allow for gold-mining to occur in a manner that maintains ecological health and human health without hindering near- and long-term economic prosperity? Several response options were identified and evaluated, and are currently being disseminated to various stakeholders within Ghana and internationally. PMID:26393627
Landscape Character of Pongkor Mining Ecotourism Area
NASA Astrophysics Data System (ADS)
Kusumoarto, A.; Gunawan, A.; Machfud; Hikmat, A.
2017-10-01
The Pongkor Mining Ecotourism Area has a diverse landscape character that is a potential landscape resource for the development of an ecotourism destination. The area is part of the Mount of Botol Resort, Halimun Salak National Park (HSNP), and has fairly high biodiversity. This study aims to identify and analyze the categories of landscape character in the Pongkor Mining Ecotourism Area for the development of an ecotourism destination. The study used a descriptive approach based on field surveys and interviews and was carried out in two steps: 1) identification of the landscape character, and 2) analysis of the landscape character. The results showed that, in the areas set aside for the ecotourism destination in Pongkor Mining, the landscape character categories are forest, tailing ponds, river, plain, and the built environment. The most dominant landscape character category in the area is forest, followed by river, plain, tailing ponds, and the built environment. The landscape character of the natural environment is the most preferred for ecotourism activities. The landscape character found across the natural and built environments is a potential resource that must be protected and modified, for example through elimination of incongruous elements, accentuation of natural form, alteration of natural form, intensification, and enhancement of visual quality, so that the area can be developed as an ecotourism destination.
A new genome-mining tool redefines the lasso peptide biosynthetic landscape
Tietz, Jonathan I.; Schwalen, Christopher J.; Patel, Parth S.; Maxson, Tucker; Blair, Patricia M.; Tai, Hua-Chia; Zakai, Uzma I.; Mitchell, Douglas A.
2016-01-01
Ribosomally synthesized and post-translationally modified peptide (RiPP) natural products are attractive for genome-driven discovery and re-engineering, but limitations in bioinformatic methods and exponentially increasing genomic data make large-scale mining difficult. We report RODEO (Rapid ORF Description and Evaluation Online), which combines hidden Markov model-based analysis, heuristic scoring, and machine learning to identify biosynthetic gene clusters and predict RiPP precursor peptides. We initially focused on lasso peptides, which display intriguing physiochemical properties and bioactivities, but their hypervariability renders them challenging prospects for automated mining. Our approach yielded the most comprehensive mapping of lasso peptide space, revealing >1,300 compounds. We characterized the structures and bioactivities of six lasso peptides, prioritized based on predicted structural novelty, including an unprecedented handcuff-like topology and another with a citrulline modification exceptionally rare among bacteria. These combined insights significantly expand the knowledge of lasso peptides, and more broadly, provide a framework for future genome-mining efforts. PMID:28244986
Estimating instream constituent loads using replicate synoptic sampling, Peru Creek, Colorado
NASA Astrophysics Data System (ADS)
Runkel, Robert L.; Walton-Day, Katherine; Kimball, Briant A.; Verplanck, Philip L.; Nimick, David A.
2013-05-01
The synoptic mass balance approach is often used to evaluate constituent mass loading in streams affected by mine drainage. Spatial profiles of constituent mass load are used to identify sources of contamination and prioritize sites for remedial action. This paper presents a field scale study in which replicate synoptic sampling campaigns are used to quantify the aggregate uncertainty in constituent load that arises from (1) laboratory analyses of constituent and tracer concentrations, (2) field sampling error, and (3) temporal variation in concentration from diel constituent cycles and/or source variation. Consideration of these factors represents an advance in the application of the synoptic mass balance approach by placing error bars on estimates of constituent load and by allowing all sources of uncertainty to be quantified in aggregate; previous applications of the approach have provided only point estimates of constituent load and considered only a subset of the possible errors. Given estimates of aggregate uncertainty, site specific data and expert judgement may be used to qualitatively assess the contributions of individual factors to uncertainty. This assessment can be used to guide the collection of additional data to reduce uncertainty. Further, error bars provided by the replicate approach can aid the investigator in the interpretation of spatial loading profiles and the subsequent identification of constituent source areas within the watershed. The replicate sampling approach is applied to Peru Creek, a stream receiving acidic, metal-rich effluent from the Pennsylvania Mine. Other sources of acidity and metals within the study reach include a wetland area adjacent to the mine and tributary inflow from Cinnamon Gulch. Analysis of data collected under low-flow conditions indicates that concentrations of Al, Cd, Cu, Fe, Mn, Pb, and Zn in Peru Creek exceed aquatic life standards. 
Constituent loading within the study reach is dominated by effluent from the Pennsylvania Mine, with over 50% of the Cd, Cu, Fe, Mn, and Zn loads attributable to a collapsed adit near the top of the study reach. These estimates of mass load may underestimate the effect of the Pennsylvania Mine as leakage from underground mine workings may contribute to metal loads that are currently attributed to the wetland area. This potential leakage confounds the evaluation of remedial options and additional research is needed to determine the magnitude and location of the leakage.
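The load calculation underlying a synoptic mass balance can be sketched as follows: at each site, mass load is the product of constituent concentration and streamflow, and replicate campaigns give a spread around that estimate. This is a minimal sketch under stated assumptions, not the authors' procedure; the function name, the units, the replicate values, and the use of a sample standard deviation as the error bar are all hypothetical choices made for illustration.

```python
from statistics import mean, stdev


def constituent_load(concentration_mg_per_l, discharge_l_per_s):
    """Mass load in mg/s from concentration (mg/L) and discharge (L/s)."""
    return concentration_mg_per_l * discharge_l_per_s


# Hypothetical replicate samples at one site: (concentration, discharge).
replicates = [(1.20, 50.0), (1.10, 52.0), (1.30, 49.0)]

loads = [constituent_load(c, q) for c, q in replicates]

# Point estimate and a simple error bar from the replicate spread
# (sample standard deviation; an assumption, not the paper's statistic).
load_mean = mean(loads)
load_sd = stdev(loads)
```

Because each replicate load already combines analytical error, sampling error, and temporal variation, the spread across replicates reflects all three in aggregate, which is the point made in the abstract.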
Buzatu, Andrei; Dill, Harald G; Buzgar, Nicolae; Damian, Gheorghe; Maftei, Andreea Elena; Apopei, Andrei Ionuț
2016-01-15
The Baia Sprie epithermal system, a well-known deposit for its impressive mineralogical associations, shows the proper conditions for acid mine drainage and can be considered a general example for affected mining areas around the globe. Efflorescent samples from the abandoned open pit Minei Hill have been analyzed by X-ray diffraction (XRD), scanning electron microscopy (SEM), Raman and near-infrared (NIR) spectrometry. The identified phases represent mostly iron sulfates with different hydration degrees (szomolnokite, rozenite, melanterite, coquimbite, ferricopiapite), Zn and Al sulfates (gunningite, alunogen, halotrichite). The samples were heated at different temperatures in order to establish the phase transformations among the studied sulfates. The dehydration temperatures and intermediate phases upon decomposition were successfully identified for each of mineral phases. Gunningite was the single sulfate that showed no transformations during the heating experiment. All the other sulfates started to dehydrate within the 30-90 °C temperature range. The acid mine drainage is the main cause for sulfates formation, triggered by pyrite oxidation as the major source for the abundant iron sulfates. Based on the dehydration temperatures, the climatological interpretation indicated that melanterite formation and long-term presence is related to continental and temperate climates. Coquimbite and rozenite are attributed also to the dry arid/semi-arid areas, in addition to the above mentioned ones. The more stable sulfates, alunogen, halotrichite, szomolnokite, ferricopiapite and gunningite, can form and persists in all climate regimes, from dry continental to even tropical humid. Copyright © 2015 Elsevier B.V. All rights reserved.
Chen, Xiaoyi; Faviez, Carole; Schuck, Stéphane; Lillo-Le-Louët, Agnès; Texier, Nathalie; Dahamna, Badisse; Huot, Charles; Foulquié, Pierre; Pereira, Suzanne; Leroux, Vincent; Karapetiantz, Pierre; Guenegou-Arnoux, Armelle; Katsahian, Sandrine; Bousquet, Cédric; Burgun, Anita
2018-01-01
Background: The Food and Drug Administration (FDA) in the United States and the European Medicines Agency (EMA) have recognized social media as a new data source to strengthen their activities regarding drug safety. Objective: Our objective in the ADR-PRISM project was to provide text mining and visualization tools to explore a corpus of posts extracted from social media. We evaluated this approach on a corpus of 21 million posts from five patient forums, and conducted a qualitative analysis of the data available on methylphenidate in this corpus. Methods: We applied text mining methods based on named entity recognition and relation extraction in the corpus, followed by signal detection using the proportional reporting ratio (PRR). We also used topic modeling based on the Correlated Topic Model to obtain the list of thematics in the corpus and classify the messages based on their topics. Results: We automatically identified 3443 posts about methylphenidate published between 2007 and 2016, among which 61 adverse drug reactions (ADR) were automatically detected. Two pharmacovigilance experts manually evaluated the quality of the automatic identification, and an F-measure of 0.57 was reached. Patients' reports mainly concerned neuro-psychiatric effects. Applying PRR, 67% of the ADRs were signals, including most of the neuro-psychiatric symptoms but also palpitations. Topic modeling showed that the most represented topics were related to Childhood and Treatment initiation, but also Side effects. Cases of misuse were also identified in this corpus, including recreational use and abuse. Conclusion: Named entity recognition combined with signal detection and topic modeling have demonstrated their complementarity in mining social media data. An in-depth analysis focused on methylphenidate showed that this approach was able to detect potential signals and to provide better understanding of patients' behaviors regarding drugs, including misuse.
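For readers unfamiliar with the signal detection step, the proportional reporting ratio is conventionally computed from a 2x2 contingency table as PRR = [a / (a + b)] / [c / (c + d)]. The sketch below illustrates that standard formula only; it is not code from the ADR-PRISM project, and the function name and the counts are hypothetical.

```python
def proportional_reporting_ratio(a, b, c, d):
    """Standard PRR from a 2x2 table of post counts.

    a: posts mentioning the drug of interest and the event
    b: posts mentioning the drug of interest without the event
    c: posts mentioning other drugs and the event
    d: posts mentioning other drugs without the event
    """
    if a + b == 0 or c + d == 0:
        raise ValueError("each drug group needs at least one post")
    if c == 0:
        raise ValueError("PRR is undefined when no comparator post has the event")
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts, chosen only to make the arithmetic easy to follow:
# 20% of posts on the drug mention the event versus 10% for other drugs.
prr = proportional_reporting_ratio(a=20, b=80, c=100, d=900)
```

A PRR above 1 means the event is mentioned proportionally more often with the drug of interest than with the comparator drugs; the threshold used to call a signal in the study is not given in the abstract.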
An Approach to Realizing Process Control for Underground Mining Operations of Mobile Machines
Song, Zhen; Schunnesson, Håkan; Rinne, Mikael; Sturgul, John
2015-01-01
The excavation and production in underground mines are complicated processes which consist of many different operations. The process of underground mining is considerably constrained by the geometry and geology of the mine. The various mining operations are normally performed in series at each working face. The delay of a single operation will lead to a domino effect, thus delay the starting time for the next process and the completion time of the entire process. This paper presents a new approach to the process control for underground mining operations, e.g. drilling, bolting, mucking. This approach can estimate the working time and its probability for each operation more efficiently and objectively by improving the existing PERT (Program Evaluation and Review Technique) and CPM (Critical Path Method). If the delay of the critical operation (which is on a critical path) inevitably affects the productivity of mined ore, the approach can rapidly assign mucking machines new jobs to increase this amount at a maximum level by using a new mucking algorithm under external constraints. PMID:26062092
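As background to the PERT estimate mentioned above, classical PERT derives an expected duration and a variance for each operation from three time estimates, and operations performed in series simply add. The sketch below shows that classical calculation only, not the improved method proposed in the paper; the function name and all durations (in hours) are hypothetical, and summing variances assumes the operations are independent.

```python
from math import sqrt


def pert_estimate(optimistic, most_likely, pessimistic):
    """Classical PERT expected duration and variance for one operation."""
    expected = (optimistic + 4 * most_likely + pessimistic) / 6
    variance = ((pessimistic - optimistic) / 6) ** 2
    return expected, variance


# Hypothetical three-point estimates (hours) for operations at one face.
operations = {
    "drilling": (2.0, 3.0, 5.0),
    "bolting": (1.0, 2.0, 4.0),
    "mucking": (1.5, 2.5, 4.5),
}

estimates = {name: pert_estimate(*times) for name, times in operations.items()}

# Operations run in series, so expected times add; variances add only
# under the independence assumption stated above.
total_expected = sum(expected for expected, _ in estimates.values())
total_std = sqrt(sum(variance for _, variance in estimates.values()))
```

A delay in any one of the series operations shifts the start of every later one, which is the domino effect the abstract describes.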
1989-03-01
AIR WAR COLLEGE RESEARCH REPORT UNITED STATES COAST GUARD ANTISUBMARINE WARFARE (ASW) IN THE MARITIME DEFENSE ZONE (MDZ) - A STRATEGIC...going to perform in these MDZs. Those tasks identified so far include: port and coastal physical security & preventive safety, mine warfare
Network-based modeling and intelligent data mining of social media for improving care.
Akay, Altug; Dragomir, Andrei; Erlandsson, Bjorn-Erik
2015-01-01
Intelligently extracting knowledge from social media has recently attracted great interest from the Biomedical and Health Informatics community to simultaneously improve healthcare outcomes and reduce costs using consumer-generated opinion. We propose a two-step analysis framework that focuses on positive and negative sentiment, as well as the side effects of treatment, in users' forum posts, and identifies user communities (modules) and influential users for the purpose of ascertaining user opinion of cancer treatment. We used a self-organizing map to analyze word frequency data derived from users' forum posts. We then introduced a novel network-based approach for modeling users' forum interactions and employed a network partitioning method based on optimizing a stability quality measure. This allowed us to determine consumer opinion and identify influential users within the retrieved modules using information derived from both word-frequency data and network-based properties. Our approach can expand research into intelligently mining social media data for consumer opinion of various treatments to provide rapid, up-to-date information for the pharmaceutical industry, hospitals, and medical staff, on the effectiveness (or ineffectiveness) of future treatments.
A systematic review of lost-time injuries in the global mining industry.
Nowrouzi-Kia, Behdin; Gohar, Basem; Casole, Jennifer; Chidu, Carla; Dumond, Jennifer; McDougall, Alicia; Nowrouzi-Kia, Behnam
2018-05-01
Mining is a hazardous occupation with elevated rates of lost-time injury and disability. The purpose of this study is twofold: 1) To identify the type of lost-time injuries in the mining workforce, regardless of the kind of mining and 2) To examine the antecedent factors to the occupational injury (lost-time injuries). We identified and extracted primary papers related to lost-time injuries in the mining sector by conducting a systematic search of the electronic literature in the eight health and related databases. We critically reviewed nine articles in the mining sector that examined lost-time injuries. Musculoskeletal injuries (hand, back, limbs, fractures, lacerations and muscle contusions), slips and falls were identified as types of lost-time injuries. The review identified the following antecedent factors related to lost-time injuries: the mining work environment (underground mining), being male, age, working with mining equipment, organizational size, falling objects, disease status, job training and lack of occupational safety management teams, recovery time, social supports, access to health services, pre-injury health status and susceptibility to injury. The mining sector is a hazardous environment that increases workers' susceptibility to occupational injuries. There is a need to create and implement monitoring systems of lost-time injuries to implement prevention programs.
The conference goal was to provide a forum for the exchange of scientific information on current and emerging approaches to assessing characterization, monitoring, source control, treatment and/or remediation on mining-influenced waters. The conference was aimed at mining remedi...
NASA Astrophysics Data System (ADS)
Dusanter, S.; Michoud, V.; Leonardis, T.; Riffault, V.; Zhang, S.; Locoge, N.
2015-12-01
Due to the large number of Volatile Organic Compounds (VOCs) expected in the atmosphere (10^4-10^5) (Goldstein and Galbally, ES&T, 2007), exhaustive measurements of VOCs appear to be currently unfeasible using common analytical techniques. In this context, measurements of the total sink of OH, referred to as total OH reactivity, can provide a critical test to assess the completeness of trace gas measurements during field campaigns. This can be done by comparing the measured total OH reactivity to values calculated from trace gas measurements. Indeed, large discrepancies are usually found between measured and calculated OH reactivity values, revealing the presence of important unmeasured reactive species, which have yet to be identified. A Comparative Reactivity Method (CRM) instrument has been set up at Mines Douai to allow sequential measurements of VOCs and OH reactivity using the same Proton Transfer Reaction-Time of Flight Mass Spectrometer. This approach aims at identifying unmeasured reactive VOCs based on a method proposed by Kato et al. (Atmos. Environ., 2011), taking advantage of VOC oxidations occurring in the CRM sampling reactor. MD-CRM was deployed at an urban site in Dunkirk (France) during July 2014 to test this new approach. During this campaign, a large fraction of the OH reactivity was not explained by collocated measurements of trace gases (67% on average). In this presentation, we will first describe the approach that was implemented in the CRM instrument to identify part of the observed missing OH reactivity and we will then discuss the OH reactivity budget regarding the origin of air masses reaching the measurement site.
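The comparison between measured and calculated OH reactivity can be sketched as follows: the calculated value is the sum, over all measured trace gases, of each species' OH rate coefficient multiplied by its concentration, and the missing fraction is the share of the measured total that this sum does not explain. This is a minimal sketch under stated assumptions; the function names, species labels, rate coefficients, concentrations, and measured total are placeholder values, not campaign data.

```python
def calculated_oh_reactivity(species):
    """Sum of k_i * [X_i] in s^-1.

    species maps a label to (k, concentration), with k in
    cm^3 molecule^-1 s^-1 and concentration in molecule cm^-3.
    """
    return sum(k * concentration for k, concentration in species.values())


def missing_fraction(measured, calculated):
    """Share of the measured OH reactivity not explained by measured gases."""
    if measured <= 0:
        raise ValueError("measured OH reactivity must be positive")
    return (measured - calculated) / measured


# Placeholder inputs (hypothetical, not values from the Dunkirk campaign).
species = {
    "species_A": (1.0e-11, 2.0e11),  # contributes about 2.0 s^-1
    "species_B": (2.0e-13, 5.0e12),  # contributes about 1.0 s^-1
}

calculated = calculated_oh_reactivity(species)
fraction_missing = missing_fraction(measured=10.0, calculated=calculated)
```

A large missing fraction indicates reactive species that the collocated instruments did not measure, which is the discrepancy the abstract sets out to investigate.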
Pep2Path: automated mass spectrometry-guided genome mining of peptidic natural products.
Medema, Marnix H; Paalvast, Yared; Nguyen, Don D; Melnik, Alexey; Dorrestein, Pieter C; Takano, Eriko; Breitling, Rainer
2014-09-01
Nonribosomally and ribosomally synthesized bioactive peptides constitute a source of molecules of great biomedical importance, including antibiotics such as penicillin, immunosuppressants such as cyclosporine, and cytostatics such as bleomycin. Recently, an innovative mass-spectrometry-based strategy, peptidogenomics, has been pioneered to effectively mine microbial strains for novel peptidic metabolites. Even though mass-spectrometric peptide detection can be performed quite fast, true high-throughput natural product discovery approaches have still been limited by the inability to rapidly match the identified tandem mass spectra to the gene clusters responsible for the biosynthesis of the corresponding compounds. With Pep2Path, we introduce a software package to fully automate the peptidogenomics approach through the rapid Bayesian probabilistic matching of mass spectra to their corresponding biosynthetic gene clusters. Detailed benchmarking of the method shows that the approach is powerful enough to correctly identify gene clusters even in data sets that consist of hundreds of genomes, which also makes it possible to match compounds from unsequenced organisms to closely related biosynthetic gene clusters in other genomes. Applying Pep2Path to a data set of compounds without known biosynthesis routes, we were able to identify candidate gene clusters for the biosynthesis of five important compounds. Notably, one of these clusters was detected in a genome from a different subphylum of Proteobacteria than that in which the molecule had first been identified. All in all, our approach paves the way towards high-throughput discovery of novel peptidic natural products. Pep2Path is freely available from http://pep2path.sourceforge.net/, implemented in Python, licensed under the GNU General Public License v3 and supported on MS Windows, Linux and Mac OS X.
NASA Astrophysics Data System (ADS)
Rogowitz, Bernice E.; Rabenhorst, David A.; Gerth, John A.; Kalin, Edward B.
1996-04-01
This paper describes a set of visual techniques, based on principles of human perception and cognition, which can help users analyze and develop intuitions about tabular data. Collections of tabular data are widely available, including, for example, multivariate time series data, customer satisfaction data, stock market performance data, multivariate profiles of companies and individuals, and scientific measurements. In our approach, we show how visual cues can help users perform a number of data mining tasks, including identifying correlations and interaction effects, finding clusters and understanding the semantics of cluster membership, identifying anomalies and outliers, and discovering multivariate relationships among variables. These cues are derived from psychological studies on perceptual organization, visual search, perceptual scaling, and color perception. These visual techniques are presented as a complement to the statistical and algorithmic methods more commonly associated with these tasks, and provide an interactive interface for the human analyst.
A Hybrid Data Mining Approach for Credit Card Usage Behavior Analysis
NASA Astrophysics Data System (ADS)
Tsai, Chieh-Yuan
Credit cards are one of the most popular e-payment methods in current online e-commerce. To retain valuable customers, card issuers invest heavily in maintaining good relationships with them. Although several studies have examined card usage motivation, few have analyzed how credit card usage behavior changes from time period t to t+1. To address this issue, an integrated data mining approach is proposed in this paper. First, customer profiles and their transaction data at time period t are retrieved from databases. Second, a LabelSOM neural network groups customers into segments and identifies critical characteristics for each group. Third, a fuzzy decision tree algorithm is used to construct usage behavior rules for customer groups of interest. Finally, these rules are used to analyze the behavior changes between time periods t and t+1. An implementation case using a practical credit card database provided by a commercial bank in Taiwan is presented to show the benefits of the proposed framework.
Adaptive semantic tag mining from heterogeneous clinical research texts.
Hao, T; Weng, C
2015-01-01
To develop an adaptive approach to mine frequent semantic tags (FSTs) from heterogeneous clinical research texts. We develop a "plug-n-play" framework that integrates replaceable unsupervised kernel algorithms with formatting, functional, and utility wrappers for FST mining. Temporal information identification and semantic equivalence detection were two example functional wrappers. We first compared this approach's recall and efficiency for mining FSTs from ClinicalTrials.gov to that of a recently published tag-mining algorithm. Then we assessed this approach's adaptability to two other types of clinical research texts: clinical data requests and clinical trial protocols, by comparing the prevalence trends of FSTs across three texts. Our approach increased the average recall and speed by 12.8% and 47.02% respectively upon the baseline when mining FSTs from ClinicalTrials.gov, and maintained an overlap in relevant FSTs with the baseline ranging between 76.9% and 100% for varying FST frequency thresholds. The FSTs saturated when the data size reached 200 documents. Consistent trends in the prevalence of FST were observed across the three texts as the data size or frequency threshold changed. This paper contributes an adaptive tag-mining framework that is scalable and adaptable without sacrificing its recall. This component-based architectural design can be potentially generalizable to improve the adaptability of other clinical text mining methods.
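The idea of keeping only tags that reach a frequency threshold can be illustrated with a deliberately simple stand-in. The sketch below is not the authors' kernel algorithm: it assumes that a "tag" is a word n-gram and that frequency means the number of documents containing it. The function name and the sample texts are hypothetical.

```python
from collections import Counter


def frequent_tags(documents, n=2, min_doc_freq=2):
    """Return word n-grams that occur in at least min_doc_freq documents."""
    doc_freq = Counter()
    for text in documents:
        tokens = text.lower().split()
        # A set so that each n-gram is counted once per document.
        ngrams = {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
        doc_freq.update(ngrams)
    return {tag: count for tag, count in doc_freq.items() if count >= min_doc_freq}


# Hypothetical eligibility-style snippets.
docs = [
    "history of heart failure",
    "no heart failure reported",
    "type 2 diabetes",
]
print(frequent_tags(docs))
```

Raising `min_doc_freq` shrinks the tag set, which is the kind of threshold sweep the abstract reports overlap figures for.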
NASA Astrophysics Data System (ADS)
Ebrahimabadi, Arash
2016-12-01
This paper describes an effective approach to selecting suitable plant species for the reclamation of mined lands at the Chadormaloo iron mine, which is located in the central part of Iran, near the city of Bafgh in Yazd province. After a mine's total reserves are excavated, the mine must be permanently closed and reclaimed. Mine reclamation and post-mining land use are the main issues in the mine closure phase. In general, among the various scenarios for mine reclamation (planting, agriculture, forestry, residency, tourist attraction, etc.), planting is the oldest and most commonly used technology for the reclamation of lands damaged by mining activities. Planting and vegetation play a major role in restoring productivity, ecosystem stability and biological diversity to degraded areas; the main goal of this research is therefore to choose suitable plants compatible with the conditions of the Chadormaloo mined area, providing consistent conditions for future use. To ensure the sustainability of the reclaimed landscape, the most suitable plant species adapted to the mine conditions are selected. Plant species selection is a Multi Criteria Decision Making (MCDM) problem. In this paper, a fuzzy MCDM technique, namely the Fuzzy Analytic Hierarchy Process (FAHP), is developed to assist Chadormaloo iron mine managers and designers in selecting plant types for reclamation of the mine in a fuzzy environment, where vagueness and uncertainty are taken into account with linguistic variables parameterized by triangular fuzzy numbers. The results of the FAHP approach rank the most suitable plant species for reclamation of the Chadormaloo iron mine as Artemisia sieberi, Salsola yazdiana, halophyte types, and Zygophyllum, in that order.
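A minimal sketch of how triangular fuzzy numbers can be turned into a ranking is given below. The abstract does not say which FAHP variant was used, so this assumes the geometric-mean method with centroid defuzzification; the two options and their pairwise judgments are hypothetical.

```python
from math import prod


def fuzzy_ahp_weights(matrix):
    """Crisp priority weights from a pairwise matrix of triangular fuzzy numbers.

    matrix[i][j] is a tuple (l, m, u) expressing how strongly option i is
    preferred to option j.
    """
    n = len(matrix)
    # Fuzzy geometric mean of each row, computed component-wise.
    means = [
        tuple(prod(cell[k] for cell in row) ** (1.0 / n) for k in range(3))
        for row in matrix
    ]
    total_l = sum(g[0] for g in means)
    total_m = sum(g[1] for g in means)
    total_u = sum(g[2] for g in means)
    # Fuzzy weights: lower bound divided by the sum of upper bounds and vice versa.
    fuzzy = [(g[0] / total_u, g[1] / total_m, g[2] / total_l) for g in means]
    # Centroid defuzzification, then normalization so the weights sum to 1.
    crisp = [sum(w) / 3.0 for w in fuzzy]
    norm = sum(crisp)
    return [c / norm for c in crisp]


# Hypothetical judgment: option A is "moderately" preferred to option B.
equal = (1.0, 1.0, 1.0)
judgments = [
    [equal, (2.0, 3.0, 4.0)],
    [(1 / 4, 1 / 3, 1 / 2), equal],
]
print(fuzzy_ahp_weights(judgments))
```

In a full study the same computation would be repeated for each criterion and the resulting weights combined across the hierarchy.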
Dimensioning Principles in Potash and Salt: Stability and Integrity
NASA Astrophysics Data System (ADS)
Minkley, W.; Mühlbauer, J.; Lüdeling, C.
2016-11-01
The paper describes the principal geomechanical approaches to mine dimensioning in salt and potash mining, focusing on stability of the mining system and integrity of the hydraulic barrier. Several common dimensioning approaches are subjected to a comparative analysis. We identify geomechanical discontinuum models as essential physical ingredients for examining the collapse of working fields in potash mining. The basic mechanisms rely on the softening behaviour of salt rocks and the interfaces. A visco-elasto-plastic material model with strain softening, dilatancy and creep describes the time-dependent softening behaviour of the salt pillars, while a shear model with velocity-dependent adhesive friction and shear displacement-dependent softening is used for bedding planes and discontinuities. Pillar stability critically depends on the shear conditions of the bedding planes to the overlying and underlying beds, which provide the necessary confining pressure for the pillar core, but can fail dynamically, leading to large-scale field collapses. We further discuss the integrity conditions for the hydraulic barrier, most notably the minimal stress criterion, the violation of which leads to pressure-driven percolation as the mechanism of fluid transport and hence barrier failure. We present a number of examples where violation of the minimal stress criterion has led to mine floodings.
Logistic Principles Application for Managing the Extraction and Transportation of Solid Minerals
NASA Astrophysics Data System (ADS)
Tyurin, Alexey
2017-11-01
Reducing the cost of resources in solid mineral extraction is an urgent task. To address it, the article proposes a logistic approach to managing all of a mining company's resources, including extraction processes, transport, mineral handling and storage. Accounting for the uneven operation of mining and transport units and of the complexes for processing and loading coal into railroad cars makes it possible to identify shortcomings in the work of the entire enterprise and to reduce resource use at the planned production level. The article presents a mining planning model that takes into account the dynamics of production, transport stations and the export of coal to consumers by rail, using the example of JSC «Razrez Sereul'skiy» (Nazarovo, Krasnoyarsk region). Rolling planning methods and data aggregation make it possible to split the planning horizon (a month) into equal periods and to use dynamic programming to build an optimal mining production programme for the month. The technique for defining the coal mining production programme will help align the work of all enterprise units, optimize resources in all areas, establish a flexible relationship between producer and consumer, and take into account the irregularity of rail transport.
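Splitting the horizon into equal periods and solving for a production programme by dynamic programming can be sketched as follows. The article's actual model is not given in the abstract, so the state (stockpile level), the cost terms, the capacity limit and all numbers here are assumptions made purely for illustration.

```python
from functools import lru_cache


def plan_production(demand, levels, capacity, unit_cost, hold_cost):
    """Return (minimum cost, production plan) that meets every shipment.

    demand[t] is the tonnage shipped by rail in period t, levels are the
    allowed production tonnages per period, and capacity caps the stockpile.
    """
    n = len(demand)

    @lru_cache(maxsize=None)
    def best(t, stock):
        if t == n:
            return 0.0, ()
        best_cost, best_plan = float("inf"), ()
        for q in levels:
            end = stock + q - demand[t]
            if end < 0 or end > capacity:
                continue  # shipment not met, or stockpile overflow
            tail_cost, tail_plan = best(t + 1, end)
            cost = unit_cost * q + hold_cost * end + tail_cost
            if cost < best_cost:
                best_cost, best_plan = cost, (q,) + tail_plan
        return best_cost, best_plan

    return best(0, 0)


# Hypothetical month split into three equal periods with uneven rail demand.
cost, plan = plan_production(
    demand=[10, 30, 10], levels=[0, 10, 20], capacity=20,
    unit_cost=1.0, hold_cost=0.5,
)
print(cost, plan)
```

In this toy case the peak in the second period forces extra production in the first, which is how uneven rail shipments propagate into the mining schedule.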
Church, S.E.; Fey, D.L.; Brouwers, E.M.; Holmes, C.W.; Blair, Robert
1999-01-01
Determination of the pre-mining geochemical baseline in bed sediments and the paleoecology in a watershed impacted by historical mining activity is of utmost importance in establishing watershed restoration goals. We have approached this problem in the Animas River watershed using geomorphologic mapping methods to identify old pre-mining sediments. A systematic evaluation of possible sites resulted in collection of a large number of samples of pre-mining sediments, overbank sediments, and fluvial tailings deposits from more than 50 sites throughout the watershed. Chemical analysis of individual stratigraphic layers has resulted in a chemical stratigraphy that can be tied to the historical record through geochronological and dendrochronological studies at these sites. Preliminary analysis of geochemical data from more than 500 samples from this study, when coupled with both the historical and geochronological record, clearly shows that there has been a major impact by historical mining activities on the geochemical record preserved in these fluvial bed sediments. Historical mining activity has resulted in a substantial increase in metals in the very fine sand to clay sized component of the bed sediment of the upper Animas River, and Cement and Mineral Creeks. Enrichment factors for metals in modern bed sediments, relative to the pre-mining sediments, range from a factor of 2 to 6 for arsenic, 4 to more than 10 for cadmium, 2 to more than 10 for lead, 2 to 5 for silver, and 2 to more than 15 for zinc. However, the pre-mining bed sediment geochemical baseline is high relative to crustal abundance levels of many ore-related metals and the watershed would readily be identified as a highly mineralized area suitable for mineral exploration if it had not been disturbed by historical mining activity. We infer from these data that the water chemistry in the streams was less acidic prior to historical mining activity in the watershed.
Paleontologic evidence does not indicate a healthy aquatic habitat in any of the stream reaches investigated above the confluence of the Animas River with Mineral Creek (fig. 1) prior to the impact of historical mining activity. The absence of paleontologic remains is interpreted to reflect the poor preservation regime of the bed sediment materials sampled. The fluvial sediments sampled in this study represent higher energy environments than are conducive to the preservation of most aquatic organisms including fish remains. We interpret the sedimentological data to indicate that there has been substantial loss of riparian habitat in the upper Animas River above Howardsville as a result of historical mining activity.
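The enrichment factors quoted in this abstract are ratios of modern to pre-mining concentrations in bed sediment. A minimal sketch, with hypothetical concentrations that are not taken from the study:

```python
def enrichment_factor(modern, premining):
    """Ratio of the modern to the pre-mining concentration (same units)."""
    if premining <= 0:
        raise ValueError("pre-mining concentration must be positive")
    return modern / premining


# Hypothetical (modern, pre-mining) concentrations in ppm.
samples = {"zinc": (1500.0, 100.0), "lead": (400.0, 200.0)}
factors = {metal: enrichment_factor(m, p) for metal, (m, p) in samples.items()}
print(factors)
```

Because the pre-mining baseline is itself elevated in a mineralized watershed, a modest factor can still correspond to a high absolute concentration.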
Reduction of Conflicts in Mining Development Using "Good Neighbor Agreements"
NASA Astrophysics Data System (ADS)
Masaitis, A.
2013-05-01
New environmental and social challenges for the mining industry in both developed and developing countries show the obvious need to implement "responsible" mining practices that include improved community involvement. Good Neighbor Agreements (GNAs) are a relatively new mechanism for improving communication and trust between a mining company and the community. The focus of a GNA is to provide a written and enforceable agreement, negotiated between the concerned public and the respective mining company, that responds to concerns from the public and also provides a mechanism for conflict resolution when there is mutual benefit in maintaining a working relationship. The project aims to develop GNAs, a recently evolving process that promotes environmentally sound relationships between mines and the surrounding communities, and to modify and apply the resulting GNA formulas to developing countries and countries with transitional economies. This is particularly important for countries that have poorly functioning regulatory systems that cannot guarantee a healthy and safe environment for the communities. The fundamental questions addressed by this research: 1. This is a three-year research project, started in August 2012 at the University of Nevada, Reno (UNR), to develop Good Neighbor Agreement standards as well as to investigate the details of mine development. 2. Identify spheres of possible cooperation between mining companies, government organizations, and Non-Governmental Organizations (NGOs). Use this cooperation to develop international standards for the GNA, and to promote the exchange of environmental information and of successful environmental, health, and safety practices between mining operations in different countries. Discussion: The Good Neighbor Agreement currently evolving will address the following: 1. Provide an economically viable mechanism for developing a partnership between mining operations and the local communities that will increase the mining industry's accountability and provide higher levels of confidence for the community that a mine is operated in a safe and sustainable manner. 2. Increase the diversity of people benefiting from the results of this research by providing standards that could be adopted in developing countries. The goal of the GNA is to give the public open access to the safety, health, and environmental information pertaining to the mining operation, as well as to educate the local communities about safe and sustainable mining practices that promote mutual acknowledgment of the need to build a relationship amenable to each other's needs. Frequent conflicts between mining companies and surrounding communities lead to work disruptions or even mine closures and show the necessity of a less confrontational approach to environmental and social justice. Because of the higher quality environmental standards already in place, this new approach perhaps should first be established in developed countries and then applied to other countries with less developed economies. The Good Neighbor Agreement is a unique way to benefit both mining operations and the local community by providing a mechanism for developing trust and communication that offers the potential to protect both mining and community interests and can possibly reduce conflicts in resource development projects.
Durmaz, Arda; Henderson, Tim A D; Brubaker, Douglas; Bebek, Gurkan
2017-01-01
Large scale genomics studies have generated comprehensive molecular characterization of numerous cancer types. Subtypes for many tumor types have been established; however, these classifications are based on molecular characteristics of small gene sets with limited power to detect dysregulation at the patient level. We hypothesize that frequent graph mining of pathways to gather pathways functionally relevant to tumors can characterize tumor types and provide opportunities for personalized therapies. In this study we present an integrative omics approach to group patients based on their altered pathway characteristics and show prognostic differences within breast cancer (p < 9.57E-10) and glioblastoma multiforme (p < 0.05) patients. We were able to validate this approach in secondary RNA-Seq datasets with p < 0.05 and p < 0.01 respectively. We also performed pathway enrichment analysis to further investigate the biological relevance of dysregulated pathways. We compared our approach with network-based classifier algorithms and showed that our unsupervised approach generates more robust and biologically relevant clustering whereas previous approaches failed to report specific functions for similar patient groups or classify patients into prognostic groups. These results could serve as a means to improve prognosis for future cancer patients, and to provide opportunities for improved treatment options and personalized interventions. The proposed novel graph mining approach is able to integrate PPI networks with gene expression in a biologically sound approach and cluster patients into clinically distinct groups. We have utilized breast cancer and glioblastoma multiforme datasets from microarray and RNA-Seq platforms and identified disease mechanisms differentiating samples. Supplementary methods, figures, tables and code are available at https://github.com/bebeklab/dysprog.
Bravo, Àlex; Piñero, Janet; Queralt-Rosinach, Núria; Rautschka, Michael; Furlong, Laura I
2015-02-21
Current biomedical research needs to leverage and exploit the large amount of information reported in scientific publications. Automated text mining approaches, in particular those aimed at finding relationships between entities, are key for identification of actionable knowledge from free text repositories. We present the BeFree system aimed at identifying relationships between biomedical entities with a special focus on genes and their associated diseases. By exploiting morpho-syntactic information of the text, BeFree is able to identify gene-disease, drug-disease and drug-target associations with state-of-the-art performance. The application of BeFree to real-case scenarios shows its effectiveness in extracting information relevant for translational research. We show the value of the gene-disease associations extracted by BeFree through a number of analyses and integration with other data sources. BeFree succeeds in identifying genes associated to a major cause of morbidity worldwide, depression, which are not present in other public resources. Moreover, large-scale extraction and analysis of gene-disease associations, and integration with current biomedical knowledge, provided interesting insights on the kind of information that can be found in the literature, and raised challenges regarding data prioritization and curation. We found that only a small proportion of the gene-disease associations discovered by using BeFree is collected in expert-curated databases. Thus, there is a pressing need to find alternative strategies to manual curation, in order to review, prioritize and curate text-mining data and incorporate it into domain-specific databases. We present our strategy for data prioritization and discuss its implications for supporting biomedical research and applications. BeFree is a novel text mining system that performs competitively for the identification of gene-disease, drug-disease and drug-target associations. 
Our analyses show that mining only a small fraction of MEDLINE results in a large dataset of gene-disease associations, and only a small proportion of this dataset is actually recorded in curated resources (2%), raising several issues on data prioritization and curation. We propose that joint analysis of text mined data with data curated by experts appears as a suitable approach to both assess data quality and highlight novel and interesting information.
NASA Astrophysics Data System (ADS)
Emmerton, Bevan; Burgess, Jon; Esterle, Joan; Erskine, Peter; Baumgartl, Thomas
2017-04-01
Large-scale open cut mining in the Bowen Basin, Queensland, Australia has undergone an evolutionary process over the period of a few decades, transitioning from shallow mining depths, limited spoil elevation and pasture based rehabilitation to increased mining depths, escalating pre-stripping, elevated mesa-like landforms and native woody species rehabilitation. As a consequence of this development, the stabilisation of recently constructed landforms has to be assured through means other than the establishment of vegetative cover. Recent developments are the specific selection and partitioning of resilient fragmental spoil types for the construction of the final landform surface. These spoil types can also be used as cladding resources for stabilizing steep erosive batters, and this has been identified as a practical methodology that has the potential to significantly improve rehabilitation outcomes. Examples of improvements are increased surface rock cover, roughness and infiltration, and reduced inherent erodibility, runoff and surface flow velocity. However, a thorough understanding of the properties and behavior of individual spoil materials disturbed during mining is required. Relevant information from published literature on the geological origins, lithology and weathering characteristics of individual strata within the Bowen Basin Coal Measures located in Queensland, Australia (and younger overlying weathered strata) has been studied, and related both to natural landforms and to the surface stability of major strata types when disturbed by mining. The resulting spoil classification developed from this study is based primarily on inherent geological characteristics and weathering behaviour of identifiable lithologic components, and as such describes the expected fragmental resilience likely within disturbed materials at Bowen Basin coal mines.
The proposed classification system allows the allocation of spoil types to use categories which have application in pre-mine feasibility investigations, landform design and material selection and placement. It gives practitioners a relatively easy-to-use classification system for improving the overall outcome of rehabilitation through the selection of optimal substrates.
Hu, Zhi-Liang; Ramos, Antonio M.; Humphray, Sean J.; Rogers, Jane; Reecy, James M.; Rothschild, Max F.
2011-01-01
The newly available pig genome sequence has provided new information to fine map quantitative trait loci (QTL) in order to eventually identify causal variants. With targeted genomic sequencing efforts, we were able to obtain high quality BAC sequences that cover a region on pig chromosome 17 where a number of meat quality QTL have been previously discovered. Sequences from 70 BAC clones were assembled to form an 8-Mbp contig. Subsequently, we successfully mapped five previously identified QTL, three for meat color and two for lactate related traits, to the contig. With an additional 25 genetic markers that were identified by sequence comparison, we were able to carry out further linkage disequilibrium analysis to narrow down the genomic locations of these QTL, which allowed identification of the chromosomal regions that likely contain the causative variants. This research has provided one practical approach to combine genetic and molecular information for QTL mining. PMID:22303339
NASA Astrophysics Data System (ADS)
Gontaszewska-Piekarz, Agnieszka; Mrówczyńska, Maria
2018-04-01
The paper presents the possibilities of using data obtained by airborne laser scanning for identifying areas where lignite used to be mined. The airborne laser scanning technology presented in the paper and its results have vast potential for identifying local terrain deformations. The paper also presents the history of lignite mining in the region of Ośno Lubuskie (the north-west of Ziemia Lubuska, western Poland). It describes underground mining in complicated geological conditions (glaciotectonic deformations). The paper is supplemented with historical maps showing the locations of the mines.
The Effects of Different Representations on Static Structure Analysis of Computer Malware Signatures
Narayanan, Ajit; Chen, Yi; Pang, Shaoning; Tao, Ban
2013-01-01
The continuous growth of malware presents a problem for internet computing due to increasingly sophisticated techniques for disguising malicious code through mutation and the time required to identify signatures for use by antiviral software systems (AVS). Malware modelling has focused primarily on semantics due to the intended actions and behaviours of viral and worm code. The aim of this paper is to evaluate a static structure approach to malware modelling using the growing malware signature databases now available. We show that, if malware signatures are represented as artificial protein sequences, it is possible to apply standard sequence alignment techniques in bioinformatics to improve accuracy of distinguishing between worm and virus signatures. Moreover, aligned signature sequences can be mined through traditional data mining techniques to extract metasignatures that help to distinguish between viral and worm signatures. All bioinformatics and data mining analysis were performed on publicly available tools and Weka. PMID:23983644
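Representing a signature as an artificial protein sequence and then aligning it can be sketched as below. The abstract does not give the encoding, so the hex-digit-to-letter mapping here is an assumption, and the alignment uses a plain Needleman-Wunsch global score with simple match, mismatch and gap values rather than any particular bioinformatics tool.

```python
# Assumed mapping: each hex digit 0-f becomes one of 16 amino acid letters.
AMINO = "ACDEFGHIKLMNPQRS"


def hex_to_protein(signature):
    """Encode a hexadecimal signature string as an artificial protein sequence."""
    return "".join(AMINO[int(ch, 16)] for ch in signature.lower())


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score, keeping one row at a time."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


# Hypothetical signatures, for illustration only.
first = hex_to_protein("0f1a2b")
second = hex_to_protein("0f1a3b")
print(first, second, global_alignment_score(first, second))
```

Once signatures are in this form, substitution matrices and multiple alignment from standard tools can be applied, which is the step the abstract relies on.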
NASA Technical Reports Server (NTRS)
Mcmurtry, G. J.; Petersen, G. W. (Principal Investigator)
1973-01-01
The author has identified the following significant results. An interdisciplinary group at Penn State University is analyzing ERTS-1 data. The geographical area of interest is that of the Susquehanna River Basin in Pennsylvania. The objectives of the work have been to ascertain the usefulness of ERTS-1 data in the areas of natural resources and land use inventory, geology and hydrology, and environmental quality. Specific results include a study of land use in the Harrisburg area, discrimination between types of forest resources and vegetation, detection of previously unknown geologic faults and correlation of these with known mineral deposits and ground water, mapping of mine spoils in the anthracite region of eastern Pennsylvania, and mapping of strip mines and acid mine drainage in central Pennsylvania. Both photointerpretive techniques and automatic computer processing methods have been developed and used, separately and in a combined approach.
Abbe, Adeline; Falissard, Bruno
2017-10-23
The Internet is a particularly dynamic way to quickly capture the perceptions of a population in real time. Complementary to traditional face-to-face communication, online social networks help patients to improve self-esteem and self-help. The aim of this study was to use text mining on material from an online forum exploring patients' concerns about treatment (antidepressants and anxiolytics). Concerns about treatment were collected from discussion titles in a patients' online community related to antidepressants and anxiolytics. To examine the content of these titles automatically, we used text mining methods, such as word frequency in a document-term matrix and co-occurrence of words using a network analysis. It was thus possible to identify topics discussed on the forum. The forum included 2415 discussions on antidepressants and anxiolytics over a period of 3 years. After a preprocessing step, the text mining algorithm identified the 99 most frequently occurring words in titles, among which were escitalopram, withdrawal, antidepressant, venlafaxine, paroxetine, and effect. Patients' concerns were related to antidepressant withdrawal, the need to share experience about symptoms, effects, and questions on weight gain with some drugs. Patients' expression on the Internet is a potential additional resource in addressing patients' concerns about treatment. Patient profiles are close to those of patients treated in psychiatry. ©Adeline Abbe, Bruno Falissard. Originally published in JMIR Mental Health (http://mental.jmir.org), 23.10.2017.
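Word frequency and word co-occurrence over discussion titles can be sketched with the standard library alone. This is a simplified illustration, not the study's pipeline: it counts in how many titles each word appears and how often two words share a title, and the titles and stopword list are hypothetical.

```python
from collections import Counter
from itertools import combinations


def term_counts_and_cooccurrence(titles, stopwords=frozenset()):
    """Return (titles per word, titles per word pair) as two Counters."""
    term_freq = Counter()
    cooccurrence = Counter()
    for title in titles:
        # Unique, sorted words so each pair has one canonical ordering.
        words = sorted({w for w in title.lower().split() if w not in stopwords})
        term_freq.update(words)
        cooccurrence.update(combinations(words, 2))
    return term_freq, cooccurrence


# Hypothetical forum titles.
titles = [
    "venlafaxine withdrawal",
    "escitalopram withdrawal effect",
    "withdrawal effect",
]
freq, pairs = term_counts_and_cooccurrence(titles)
print(freq.most_common(2))
print(pairs.most_common(1))
```

The pair counts are the edge weights of the co-occurrence network; the most frequent words are its most prominent nodes.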
Liu, L L; Liu, M J; Ma, M
2015-09-28
The central task of this study was to mine the gene-to-medium relationship. Adequate knowledge of this relationship could potentially improve the accuracy of differentially expressed gene mining. One of the approaches to differentially expressed gene mining uses conventional clustering algorithms to identify the gene-to-medium relationship. Compared to conventional clustering algorithms, self-organization maps (SOMs) identify the nonlinear aspects of the gene-to-medium relationships by mapping the input space into another higher dimensional feature space. However, SOMs are not suitable for huge datasets consisting of millions of samples. Therefore, a new computational model, the Function Clustering Self-Organization Maps (FCSOMs), was developed. FCSOMs take advantage of the theory of granular computing as well as advanced statistical learning methodologies, and are built specifically for each information granule (a function cluster of genes), which are intelligently partitioned by the clustering algorithm provided by the DAVID_6.7 software platform. However, only the gene functions, and not their expression values, are considered in the fuzzy clustering algorithm of DAVID. Compared to the clustering algorithm of DAVID, these experimental results show a marked improvement in the accuracy of classification with the application of FCSOMs. FCSOMs can handle huge datasets and their complex classification problems, as each FCSOM (modeled for each function cluster) can be easily parallelized.
ERIC Educational Resources Information Center
Miller, L. Dee; Soh, Leen-Kiat; Samal, Ashok; Kupzyk, Kevin; Nugent, Gwen
2015-01-01
Learning objects (LOs) are important online resources for both learners and instructors and usage for LOs is growing. Automatic LO tracking collects large amounts of metadata about individual students as well as data aggregated across courses, learning objects, and other demographic characteristics (e.g. gender). The challenge becomes identifying…
WHAT INNOVATIVE APPROACHES CAN BE DEVELOPED FOR MINING SITES?
Mining is essential to maintain our way of life. However, based upon industry's reporting in the most recent Toxic Release Inventory (TRI), the primary sources of heavy metal releases to the environment are mining and mining related activities. The hard rock mining industry rel...
Runkel, Robert L.; Verplanck, Philip; Kimball, Briant; Walton-Day, Katie
2018-01-01
Baseline, premining data for streams draining abandoned mine lands is virtually nonexistent, and indirect methods for estimating premining conditions are needed to establish realistic, cost-effective cleanup goals. One such indirect method is the proximal analog approach, in which premining conditions are estimated using data from nearby mineralized areas that are unaffected by mining. In this paper, we combine the proximal analog approach with a quantitative mass balance framework using data from a spatially detailed synoptic sampling campaign. The combined approach is applied to Cinnamon Gulch, a headwater stream with numerous draining adits. Synoptic sampling results indicate that three of the top five metal sources are affected by mining activities, and stream segments draining these sources account for a large percentage of overall metal loading within the study reach. These initial calculations overestimate the effects of mining, as the affected stream segments were likely acidic and metal rich prior to mining. Premining loads and concentrations were therefore determined through a replacement approach in which the chemistry of each mining-affected stream segment is revised based on proximal analog concentrations. The revised loading profiles indicate that 15–17% of the Al, Cd, Cu, Mn, Ni, and Zn loads are attributable to mining, whereas the mining contribution for Pb is 40%. Premining concentrations of Al, Cd, Cu, Mn, and Zn are estimated to be in excess of aquatic life standards over the length of the study reach.
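The arithmetic behind the replacement approach described above can be sketched for a single stream segment. This is a simplified illustration under stated assumptions and not the authors' mass balance model: it takes load as concentration times discharge, assumes discharge is unchanged by mining, and uses hypothetical numbers and function names.

```python
def load(concentration_mg_per_l, discharge_l_per_s):
    """Metal load in mg/s, taken as concentration times stream discharge."""
    return concentration_mg_per_l * discharge_l_per_s


def mining_fraction(observed_load, premining_load):
    """Fraction of the observed load attributed to mining."""
    return (observed_load - premining_load) / observed_load


# Hypothetical values for one mining-affected segment.
discharge = 10.0       # L/s, assumed unchanged by mining
observed_conc = 0.50   # mg/L, measured during synoptic sampling
analog_conc = 0.40     # mg/L, taken from a nearby unmined analog

observed = load(observed_conc, discharge)
# Replacement step: the analog concentration stands in for the measured one.
premining = load(analog_conc, discharge)
fraction = mining_fraction(observed, premining)
```

With these made-up inputs one fifth of the segment's load would be attributed to mining; repeating the calculation for every mining-affected segment and summing along the reach gives a revised loading profile.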
Texas lignite mining: Groundwater and slope stability control in the nineties and beyond
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lawrence J.
As lignite mining in Texas approaches and exceeds depths of 200 feet below ground level, rising costs demand that innovative mining approaches be used in order to maintain the economic viability of lignite mining. Groundwater and slope stability problems multiply at these depths, resulting in increasing focus on how to control these costs. Dewatering costs are consistently rising for the lignite industry, as deeper mining encounters more and larger saturated sand bodies. These sands require dewatering in order to improve slope stability. Planning and analysis become more important as the number of wells grows beyond what can be managed with a simple "cookie-cutter" approach. Slope stability plays an increasing role in mining concerns as deeper lignite is recovered. Slope stability causes several problems, including loss of lignite, increased rehandle, and hazards to personnel and equipment. Traditional lignite mine planning involved a fairly "generic" pit design with one design highwall angle, one design spoil angle, and little geotechnical evaluation of the deposit. This "one mine-one design" approach, while cost-effective in the past, is now being replaced by a more critical analysis of the design requirements of each area. Geotechnical evaluation plays an increasing role in the planning and operational aspects of lignite mining. Laboratory core sample test results can be used for slope stability modeling, in order to obtain more accurate design and operational information.
Redundancy and Novelty Mining in the Business Blogosphere
ERIC Educational Resources Information Center
Tsai, Flora S.; Chan, Kap Luk
2010-01-01
Purpose: The paper aims to explore the performance of redundancy and novelty mining in the business blogosphere, which has not been studied before. Design/methodology/approach: Novelty mining techniques are implemented to single out novel information out of a massive set of text documents. This paper adopted the mixed metric approach which…
Chapter 2: The forestry reclamation approach
Jim Burger; Don Graves; Patrick Angel; Vic Davis; Carl Zipper
2017-01-01
The Forestry Reclamation Approach (FRA) is a method for reclaiming coal-mined land to forest under the federal Surface Mining Control and Reclamation Act of 1977 (SMCRA). The FRA is based on knowledge gained from both scientific research and experience (Fig. 2-1). The FRA can achieve cost-effective regulatory compliance for mine operators while creating productive...
Mining concepts of health responsibility using text mining and exploratory graph analysis.
Kjellström, Sofia; Golino, Hudson
2018-05-24
Occupational therapists need to know about people's beliefs about personal responsibility for health to help them pursue everyday activities. The study aims to employ state-of-the-art quantitative approaches to understand people's views of health and responsibility at different ages. A mixed method approach was adopted, using text mining to extract information from 233 interviews with participants aged 5 to 96 years, and then exploratory graph analysis to estimate the number of latent variables. The fit of the structure estimated via the exploratory graph analysis was verified using confirmatory factor analysis. Exploratory graph analysis estimated three dimensions of health responsibility: (1) creating good health habits and feeling good; (2) thinking about one's own health and wanting to improve it; and (3) adopting explicitly normative attitudes to take care of one's health. The comparison between the three dimensions among age groups showed, in general, that children and adolescents, as well as the old elderly (>73 years old), expressed ideas about personal responsibility for health less than young adults, adults and young elderly. Occupational therapists' knowledge of the concepts of health responsibility is of value when working with a patient's health, but an identified challenge is how to engage children and older persons.
Mining the SDSS SkyServer SQL queries log
NASA Astrophysics Data System (ADS)
Hirota, Vitor M.; Santos, Rafael; Raddick, Jordan; Thakar, Ani
2016-05-01
SkyServer, the Internet portal for the Sloan Digital Sky Survey (SDSS) astronomic catalog, provides a set of tools that allows data access for astronomers and scientific education. One of SkyServer's data access interfaces allows users to enter ad-hoc SQL statements to query the catalog. SkyServer also presents some template queries that can be used as a basis for more complex queries. This interface has logged over 330 million queries submitted since 2001. It is expected that analysis of this data can be used to investigate usage patterns, identify potential new classes of queries, find similar queries, etc., and to shed some light on how users interact with the Sloan Digital Sky Survey data and how scientists have adopted the new paradigm of e-Science, which could in turn lead to enhancements to the user interfaces and experience in general. In this paper we review some approaches to SQL query mining, apply the traditional techniques used in the literature and present lessons learned, namely, that the general text mining approach for feature extraction and clustering does not seem to be adequate for this type of data, and, most importantly, we find that this type of analysis can result in very different queries being clustered together.
NASA Astrophysics Data System (ADS)
Hoehndorf, Robert; Schofield, Paul N.; Gkoutos, Georgios V.
2015-06-01
Phenotypes are the observable characteristics of an organism arising from its response to the environment. Phenotypes associated with engineered and natural genetic variation are widely recorded using phenotype ontologies in model organisms, as are signs and symptoms of human Mendelian diseases in databases such as OMIM and Orphanet. Exploiting these resources, several computational methods have been developed for integration and analysis of phenotype data to identify the genetic etiology of diseases or suggest plausible interventions. A similar resource would be highly useful not only for rare and Mendelian diseases, but also for common, complex and infectious diseases. We apply a semantic text-mining approach to identify the phenotypes (signs and symptoms) associated with over 6,000 diseases. We evaluate our text-mined phenotypes by demonstrating that they can correctly identify known disease-associated genes in mice and humans with high accuracy. Using a phenotypic similarity measure, we generate a human disease network in which diseases that have similar signs and symptoms cluster together, and we use this network to identify closely related diseases based on common etiological, anatomical as well as physiological underpinnings.
A methodological toolkit for field assessments of artisanally mined alluvial diamond deposits
Chirico, Peter G.; Malpeli, Katherine C.
2014-01-01
This toolkit provides a standardized checklist of critical issues relevant to artisanal mining-related field research. An integrated sociophysical geographic approach to collecting data at artisanal mine sites is outlined. The implementation and results of a multistakeholder approach to data collection, carried out in the assessment of Guinea’s artisanally mined diamond deposits, also are summarized. This toolkit, based on recent and successful field campaigns in West Africa, has been developed as a reference document to assist other government agencies or organizations in collecting the data necessary for artisanal diamond mining or similar natural resource assessments.
Pan, Jilang; Oates, Christopher J; Ihlenfeld, Christian; Plant, Jane A; Voulvoulis, Nikolaos
2010-04-01
Metals have been central to the development of human civilisation from the Bronze Age to modern times, although in the past, metal mining and smelting have been the cause of serious environmental pollution with the potential to harm human health. Despite problems from artisanal mining in some developing countries, modern mining to Western standards now uses the best available mining technology combined with environmental monitoring, mitigation and remediation measures to limit emissions to the environment. This paper develops risk screening and prioritisation methods previously used for contaminated land on military and civilian sites and engineering systems for the analysis and prioritisation of chemical risks from modern metal mining operations. It uses hierarchical holographic modelling and multi-criteria decision making to analyse and prioritise the risks from potentially hazardous inorganic chemical substances released by mining operations. A case study of an active platinum group metals mine in South Africa is used to demonstrate the potential of the method. This risk-based methodology for identifying, filtering and ranking mining-related environmental and human health risks can be used to identify exposure media of greatest concern to inform risk management. It also provides a practical decision-making tool for mine acquisition and helps to communicate risk to all members of mining operation teams.
NASA Astrophysics Data System (ADS)
Goix, Sylvaine; Resongles, Eléonore; Point, David; Oliva, Priscia; Duprey, Jean Louis; de la Galvez, Erika; Ugarte, Lincy; Huayta, Carlos; Prunier, Jonathan; Zouiten, Cyril; Gardon, Jacques
2013-12-01
Monitoring atmospheric trace element (TE) levels and tracing their source origin is essential for exposure assessment and human health studies. Epiphytic Tillandsia capillaris plants were used as bioaccumulators of TE in a complex polymetallic mining/smelting urban context (Oruro, Bolivia). Specimens collected from a pristine reference site were transplanted at a high spatial resolution (~1 sample/km²) throughout the urban area. About twenty-seven elements were measured after a 4-month exposure, also providing new information values for reference material BCR482. Statistical power analysis for this biomonitoring mapping approach against classical aerosol surveys performed on the same site showed the better aptitude of T. capillaris to detect geographical trends, and to deconvolute multiple contamination sources using geostatistical principal component analysis. Transplanted specimens in the vicinity of the mining and smelting areas were characterized by extreme TE accumulation (Sn > Ag > Sb > Pb > Cd > As > W > Cu > Zn). Three contamination sources were identified: mining (Ag, Pb, Sb), smelting (As, Sn) and road traffic (Zn) emissions, confirming the results of a previous aerosol survey.
Improving risk-stratification of Diabetes complications using temporal data mining.
Sacchi, Lucia; Dagliati, Arianna; Segagni, Daniele; Leporati, Paola; Chiovato, Luca; Bellazzi, Riccardo
2015-01-01
Understanding which factors trigger worsened disease control is a crucial step in Type 2 Diabetes (T2D) patient management. The MOSAIC project, funded by the European Commission under the FP7 program, has been designed to integrate heterogeneous data sources and provide decision support in chronic T2D management through patients' continuous stratification. In this work we show how temporal data mining can be fruitfully exploited to improve risk stratification. In particular, we exploit administrative data on drug purchases to divide patients into meaningful groups. The detection of drug consumption patterns allows stratifying the population on the basis of subjects' purchasing attitude. Merging these findings with clinical values indicates the relevance of the applied methods while showing significant differences in the identified groups. This extensive approach emphasized the exploitation of administrative data to identify patterns able to explain clinical conditions.
Trends in Fetal Medicine: A 10-Year Bibliometric Analysis of Prenatal Diagnosis
Dhombres, Ferdinand; Bodenreider, Olivier
2018-01-01
The objective is to automatically identify trends in Fetal Medicine over the past 10 years through a bibliometric analysis of articles published in Prenatal Diagnosis, using text mining techniques. We processed 2,423 full-text articles published in Prenatal Diagnosis between 2006 and 2015. We extracted salient terms, calculated their frequencies over time, and established evolution profiles for terms, from which we derived falling, stable, and rising trends. We identified 618 terms with a falling trend, 2,142 stable terms, and 839 terms with a rising trend. Terms with increasing frequencies include those related to statistics and medical study design. The most recent of these terms reflect the new opportunities of next-generation sequencing. Many terms related to cytogenetics exhibit a falling trend. A bibliometric analysis based on text mining effectively supports identification of trends over time. This scalable approach is complementary to analyses based on metadata or expert opinion. PMID:29295220
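One simple way to turn per-year term frequencies into falling, stable, and rising labels is to fit a straight line to each term's counts and threshold the slope. The sketch below shows that idea only; the abstract does not say how the evolution profiles were actually computed, so the slope rule, the threshold value, and the yearly counts are all assumptions.

```python
def slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


def classify(values, threshold=0.5):
    """Label a frequency series by its slope; the threshold is an assumed cutoff."""
    s = slope(values)
    if s > threshold:
        return "rising"
    if s < -threshold:
        return "falling"
    return "stable"


# Hypothetical yearly term counts, not data from the study.
yearly_counts = {
    "next-generation sequencing": [0, 1, 3, 6, 10],
    "cytogenetics": [12, 10, 7, 5, 2],
    "ultrasound": [8, 9, 8, 9, 8],
}

trends = {term: classify(counts) for term, counts in yearly_counts.items()}
```

In practice the counts would usually be normalized by the number of articles published each year, so that growth of the journal itself is not mistaken for a rising term.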
Automatic target validation based on neuroscientific literature mining for tractography
Vasques, Xavier; Richardet, Renaud; Hill, Sean L.; Slater, David; Chappelier, Jean-Cedric; Pralong, Etienne; Bloch, Jocelyne; Draganski, Bogdan; Cif, Laura
2015-01-01
Target identification for tractography studies requires solid anatomical knowledge validated by an extensive literature review across species for each seed structure to be studied. Manual literature review to identify targets for a given seed region is tedious and potentially subjective. Therefore, complementary approaches would be useful. We propose to use text-mining models to automatically suggest potential targets from the neuroscientific literature, full-text articles and abstracts, so that they can be used for anatomical connection studies and more specifically for tractography. We applied text-mining models to three structures: two well-studied structures, since validated deep brain stimulation targets, the internal globus pallidus and the subthalamic nucleus and, the nucleus accumbens, an exploratory target for treating psychiatric disorders. We performed a systematic review of the literature to document the projections of the three selected structures and compared it with the targets proposed by text-mining models, both in rat and primate (including human). We ran probabilistic tractography on the nucleus accumbens and compared the output with the results of the text-mining models and literature review. Overall, text-mining the literature could find three times as many targets as two man-weeks of curation could. The overall efficiency of the text-mining against literature review in our study was 98% recall (at 36% precision), meaning that over all the targets for the three selected seeds, only one target has been missed by text-mining. We demonstrate that connectivity for a structure of interest can be extracted from a very large amount of publications and abstracts. We believe this tool will be useful in helping the neuroscience community to facilitate connectivity studies of particular brain regions. The text mining tools used for the study are part of the HBP Neuroinformatics Platform, publicly available at http://connectivity-brainer.rhcloud.com/. 
PMID:26074781
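The recall and precision figures quoted in the abstract above compare the targets suggested by text mining with those found by manual literature review. A minimal sketch of that comparison follows; the region labels are hypothetical placeholders and the numbers it produces are unrelated to the study's results.

```python
def precision_recall(predicted, reference):
    """Precision and recall of a predicted set against a reference set."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical target sets for one seed structure.
reference = {"region_A", "region_B", "region_C", "region_D"}  # manual review
predicted = {"region_A", "region_B", "region_C",
             "region_X", "region_Y", "region_Z"}              # text mining

precision, recall = precision_recall(predicted, reference)
```

High recall with low precision, as the abstract reports, means the tool misses few reviewed targets but proposes many candidates that a curator still has to check.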
Reverse and forward engineering of protein pattern formation.
Kretschmer, Simon; Harrington, Leon; Schwille, Petra
2018-05-26
Living systems employ protein pattern formation to regulate important life processes in space and time. Although pattern-forming protein networks have been identified in various prokaryotes and eukaryotes, their systematic experimental characterization is challenging owing to the complex environment of living cells. In turn, cell-free systems are ideally suited for this goal, as they offer defined molecular environments that can be precisely controlled and manipulated. Towards revealing the molecular basis of protein pattern formation, we outline two complementary approaches: the biochemical reverse engineering of reconstituted networks and the de novo design, or forward engineering, of artificial self-organizing systems. We first illustrate the reverse engineering approach by the example of the Escherichia coli Min system, a model system for protein self-organization based on the reversible and energy-dependent interaction of the ATPase MinD and its activating protein MinE with a lipid membrane. By reconstituting MinE mutants impaired in ATPase stimulation, we demonstrate how large-scale Min protein patterns are modulated by MinE activity and concentration. We then provide a perspective on the de novo design of self-organizing protein networks. Tightly integrated reverse and forward engineering approaches will be key to understanding and engineering the intriguing phenomenon of protein pattern formation. This article is part of the theme issue 'Self-organization in cell biology'. © 2018 The Author(s).
Trend Motif: A Graph Mining Approach for Analysis of Dynamic Complex Networks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, R; McCallen, S; Almaas, E
2007-05-28
Complex networks have been used successfully in scientific disciplines ranging from sociology to microbiology to describe systems of interacting units. Until recently, studies of complex networks have mainly focused on their network topology. However, in many real world applications, the edges and vertices have associated attributes that are frequently represented as vertex or edge weights. Furthermore, these weights are often not static, instead changing with time and forming a time series. Hence, to fully understand the dynamics of the complex network, we have to consider both network topology and related time series data. In this work, we propose a motif mining approach to identify trend motifs for such purposes. Simply stated, a trend motif describes a recurring subgraph where each of its vertices or edges displays similar dynamics over a user-defined period. Given this, each trend motif occurrence can help reveal significant events in a complex system; frequent trend motifs may aid in uncovering dynamic rules of change for the system, and the distribution of trend motifs may characterize the global dynamics of the system. Here, we have developed efficient mining algorithms to extract trend motifs. Our experimental validation using three disparate empirical datasets, ranging from the stock market, world trade, to a protein interaction network, has demonstrated the efficiency and effectiveness of our approach.
NASA Astrophysics Data System (ADS)
Munirwansyah; Irsyam, Masyhur; Munirwan, Reza P.; Yunita, Halida; Zulfan Usrina, M.
2018-05-01
Occupational safety and health (OSH) is a planned effort to prevent accidents and diseases caused by work. Work accidents caused by unsafe field conditions often occur during mining activities. In open mine areas, slumps often occur due to unstable slopes, which can disrupt the activities and productivity of mining companies. Research on the stability of open pit slopes conducted by Febrianti [8] indicated unsafe slope conditions at the Meureubo coal mine in Aceh Barat district. The present research continues that work with an OSH study of landslides, aiming to understand the stability of the excavation slope and the shape of the slope collapse. Plaxis software was used for this research. After analyzing the slope stability and the effect of landslides on OSH with the Job Safety Analysis (JSA) method to identify hazards to work safety, a risk management analysis is conducted to classify the hazard level and its handling technique. The aim of this research is to determine the level of risk of work accidents at the company and the efforts to prevent them. The risk analysis yields a very high risk value (> 350), so the activity must be stopped until the risk can be reduced to the allowed or accepted risk value limit (< 20).
Rollins, Derrick K; Teh, Ailing
2010-12-17
Microarray data sets provide relative expression levels for thousands of genes for a small number, in comparison, of different experimental conditions called assays. Data mining techniques are used to extract specific information of genes as they relate to the assays. The multivariate statistical technique of principal component analysis (PCA) has proven useful in providing effective data mining methods. This article extends the PCA approach of Rollins et al. to the development of ranking genes of microarray data sets that express most differently between two biologically different groupings of assays. This method is evaluated on real and simulated data and compared to a current approach on the basis of false discovery rate (FDR) and statistical power (SP), which is the ability to correctly identify important genes. This work developed and evaluated two new test statistics based on PCA and compared them to a popular method that is not PCA based. Both test statistics were found to be effective as evaluated in three case studies: (i) exposing E. coli cells to two different ethanol levels; (ii) application of myostatin to two groups of mice; and (iii) a simulated data study derived from the properties of (ii). The proposed method (PM) effectively identified critical genes in these studies based on comparison with the current method (CM). The simulation study supports higher identification accuracy for PM over CM for both proposed test statistics when the gene variance is constant and for one of the test statistics when the gene variance is non-constant. PM compares quite favorably to CM in terms of lower FDR and much higher SP. Thus, PM can be quite effective in producing accurate signatures from large microarray data sets for differential expression between assay groups identified in a preliminary step of the PCA procedure and is, therefore, recommended for use in these applications.
Monitoring genotoxic exposure in uranium mines.
Srám, R J; Dobiás, L; Rössner, P; Veselá, D; Veselý, D; Rakusová, R; Rericha, V
1993-01-01
Recent data from deep uranium mines in Czechoslovakia indicated that miners are exposed to other mutagenic factors in addition to radon daughter products. Mycotoxins were identified as a possible source of mutagens in these mines. Mycotoxins were examined in 38 samples from mines and in throat swabs taken from 116 miners and 78 controls. The following mycotoxins were identified from mine samples: aflatoxins B1 and G1, citrinin, citreoviridin, mycophenolic acid, and sterigmatocystin. Some mold strains isolated from mines and throat swabs were investigated for mutagenic activity by the SOS chromotest and Salmonella assay with strains TA100 and TA98. Mutagenicity was observed, especially with metabolic activation in vitro. These data suggest that mycotoxins produced by molds in uranium mines are a new genotoxic factor for uranium miners. PMID:8143610
Kang, Hahk-Soo
2017-02-01
Genomics-based methods are now commonplace in natural products research. A phylogeny-guided mining approach provides a means to quickly screen a large number of microbial genomes or metagenomes in search of new biosynthetic gene clusters of interest. In this approach, biosynthetic genes serve as molecular markers, and phylogenetic trees built with known and unknown marker gene sequences are used to quickly prioritize biosynthetic gene clusters for characterization of their metabolites. An increase in the use of this approach has been observed over the last couple of years, along with the emergence of low-cost sequencing technologies. The aim of this review is to discuss the basic concept of a phylogeny-guided mining approach, and also to provide examples in which this approach was successfully applied to discover new natural products from microbial genomes and metagenomes. I believe that the phylogeny-guided mining approach will continue to play an important role in genomics-based natural products research.
Genome Mining for Ribosomally Synthesized Natural Products
Velásquez, Juan E.; van der Donk, Wilfred
2011-01-01
In recent years, the number of known peptide natural products that are synthesized via the ribosomal pathway has rapidly grown. Taking advantage of sequence homology among genes encoding precursor peptides or biosynthetic proteins, in silico mining of genomes combined with molecular biology approaches has guided the discovery of a large number of new ribosomal natural products, including lantipeptides, cyanobactins, linear thiazole/oxazole-containing peptides, microviridins, lasso peptides, amatoxins, cyclotides, and conopeptides. In this review, we describe the strategies used for the identification of these ribosomally-synthesized and posttranslationally modified peptides (RiPPs) and the structures of newly identified compounds. The increasing number of chemical entities and their remarkable structural and functional diversity may lead to novel pharmaceutical applications. PMID:21095156
Abandoned Mine Lands: Site Information
A catalogue of mining sites proposed for and listed on the NPL as well as mining sites being cleaned up using the Superfund Alternative Approach. Also mine sites not on the NPL but that have had removal or emergency response cleanup actions.
Lee, Hyeongyu; Choi, Yosoon; Suh, Jangwon; Lee, Seung-Ho
2016-01-01
Understanding spatial variation of potentially toxic trace elements (PTEs) in soil is necessary to identify the proper measures for preventing soil contamination at both operating and abandoned mining areas. Many studies have been conducted worldwide to explore the spatial variation of PTEs and to create soil contamination maps using geostatistical methods. However, they generally depend only on inductively coupled plasma atomic emission spectrometry (ICP–AES) analysis data; therefore, such studies are limited by insufficient input data owing to the disadvantages of ICP–AES analysis such as its costly operation and lengthy period required for analysis. To overcome this limitation, this study used both ICP–AES and portable X-ray fluorescence (PXRF) analysis data, with relatively low accuracy, for mapping copper and lead concentrations at a section of the Busan abandoned mine in Korea and compared the prediction performances of four different approaches: the application of ordinary kriging to ICP–AES analysis data, PXRF analysis data, both ICP–AES and transformed PXRF analysis data by considering the correlation between the ICP–AES and PXRF analysis data, and co-kriging to both the ICP–AES (primary variable) and PXRF analysis data (secondary variable). Their results were compared using an independent validation data set. The results obtained in this case study showed that the application of ordinary kriging to both ICP–AES and transformed PXRF analysis data is the most accurate approach when considering the spatial distribution of copper and lead contaminants in the soil and the estimation errors at 11 sampling points for validation. Therefore, when generating soil contamination maps for an abandoned mine, it is beneficial to use the proposed approach that incorporates the advantageous aspects of both ICP–AES and PXRF analysis data. PMID:27043594
Lee, Hyeongyu; Choi, Yosoon; Suh, Jangwon; Lee, Seung-Ho
2016-03-30
Understanding spatial variation of potentially toxic trace elements (PTEs) in soil is necessary to identify the proper measures for preventing soil contamination at both operating and abandoned mining areas. Many studies have been conducted worldwide to explore the spatial variation of PTEs and to create soil contamination maps using geostatistical methods. However, they generally depend only on inductively coupled plasma atomic emission spectrometry (ICP-AES) analysis data; therefore, such studies are limited by insufficient input data owing to the disadvantages of ICP-AES analysis such as its costly operation and lengthy period required for analysis. To overcome this limitation, this study used both ICP-AES and portable X-ray fluorescence (PXRF) analysis data, with relatively low accuracy, for mapping copper and lead concentrations at a section of the Busan abandoned mine in Korea and compared the prediction performances of four different approaches: the application of ordinary kriging to ICP-AES analysis data, PXRF analysis data, both ICP-AES and transformed PXRF analysis data by considering the correlation between the ICP-AES and PXRF analysis data, and co-kriging to both the ICP-AES (primary variable) and PXRF analysis data (secondary variable). Their results were compared using an independent validation data set. The results obtained in this case study showed that the application of ordinary kriging to both ICP-AES and transformed PXRF analysis data is the most accurate approach when considering the spatial distribution of copper and lead contaminants in the soil and the estimation errors at 11 sampling points for validation. Therefore, when generating soil contamination maps for an abandoned mine, it is beneficial to use the proposed approach that incorporates the advantageous aspects of both ICP-AES and PXRF analysis data.
Pilot study on the use of data mining to identify cochlear implant candidates.
Grisel, Jedidiah J; Schafer, Erin; Lam, Anne; Griffin, Terry
2018-05-01
The goal of this pilot study was to determine the clinical utility of data-mining software that screens for cochlear implant (CI) candidacy. The Auditory Implant Initiative developed a software module that screens for CI candidates via integration with a software system (Noah 4) that serves as a depository for hearing test data. To identify candidates, patient audiograms from one practice were exported into the screening module. Candidates were tracked to determine if any eventually underwent implantation. After loading 4836 audiograms from the Noah 4 system, the screening module identified 558 potential CI candidates. After reviewing the data for the potential candidates, 117 were targeted and invited to an educational event. Following the event, a total of six candidates were evaluated, and two were implanted. This objective approach to identifying candidates has the potential to address the gross underutilization of CIs by removing any bias or lack of knowledge regarding the management of severe to profound sensorineural hearing loss with CIs. The screening module was an effective tool for identifying potential CI candidates at one ENT practice. On a larger scale, the screening module has the potential to impact thousands of CI candidates worldwide.
Proceedings: Fourth Workshop on Mining Scientific Datasets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kamath, C
Commercial applications of data mining in areas such as e-commerce, market-basket analysis, text-mining, and web-mining have taken on a central focus in the KDD community. However, there is a significant amount of innovative data mining work taking place in the context of scientific and engineering applications that is not well represented in the mainstream KDD conferences. For example, scientific data mining techniques are being developed and applied to diverse fields such as remote sensing, physics, chemistry, biology, astronomy, structural mechanics, computational fluid dynamics etc. In these areas, data mining frequently complements and enhances existing analysis methods based on statistics, exploratory data analysis, and domain-specific approaches. On the surface, it may appear that data from one scientific field, say genomics, is very different from another field, such as physics. However, despite their diversity, there is much that is common across the mining of scientific and engineering data. For example, techniques used to identify objects in images are very similar, regardless of whether the images came from a remote sensing application, a physics experiment, an astronomy observation, or a medical study. Further, with data mining being applied to new types of data, such as mesh data from scientific simulations, there is the opportunity to apply and extend data mining to new scientific domains. This one-day workshop brings together data miners analyzing science data and scientists from diverse fields to share their experiences, learn how techniques developed in one field can be applied in another, and better understand some of the newer techniques being developed in the KDD community. This is the fourth workshop on the topic of Mining Scientific Data sets; for information on earlier workshops, see http://www.ahpcrc.org/conferences/.
This workshop continues the tradition of addressing challenging problems in a field where the diversity of applications is matched only by the opportunities that await a practitioner.
A Node Linkage Approach for Sequential Pattern Mining
Navarro, Osvaldo; Cumplido, René; Villaseñor-Pineda, Luis; Feregrino-Uribe, Claudia; Carrasco-Ochoa, Jesús Ariel
2014-01-01
Sequential Pattern Mining is a widely addressed problem in data mining, with applications such as analyzing Web usage, examining purchase behavior, and text mining, among others. Nevertheless, with the dramatic increase in data volume, the current approaches prove inefficient when dealing with large input datasets, a large number of different symbols and low minimum supports. In this paper, we propose a new sequential pattern mining algorithm, which follows a pattern-growth scheme to discover sequential patterns. Unlike most pattern growth algorithms, our approach does not build a data structure to represent the input dataset, but instead accesses the required sequences through pseudo-projection databases, achieving better runtime and reducing memory requirements. Our algorithm traverses the search space in a depth-first fashion and only preserves in memory a pattern node linkage and the pseudo-projections required for the branch being explored at the time. Experimental results show that our new approach, the Node Linkage Depth-First Traversal algorithm (NLDFT), has better performance and scalability in comparison with state of the art algorithms. PMID:24933123
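To make the idea of pattern growth over pseudo-projections concrete, here is a minimal sketch in the style of PrefixSpan for sequences of single items. It is not the NLDFT algorithm from the abstract above: the function name, the data layout, and the example database are assumptions chosen for illustration. A pseudo-projection is stored as (sequence index, start offset) pairs, so the input sequences are never copied, and the depth-first recursion keeps only the projections of the branch being explored.

```python
from collections import defaultdict


def mine_sequences(db, min_support):
    """Return {pattern_tuple: support} for every frequent sequential pattern in db.

    db is a list of item sequences; support counts sequences, not occurrences.
    """
    results = {}

    def grow(prefix, projection):
        # For each item, record the first position at or after the offset
        # in every projected sequence that contains it.
        first_pos = defaultdict(dict)  # item -> {seq_index: position}
        for sid, start in projection:
            seq = db[sid]
            for pos in range(start, len(seq)):
                first_pos[seq[pos]].setdefault(sid, pos)
        for item, occurrences in sorted(first_pos.items()):
            if len(occurrences) >= min_support:
                pattern = prefix + (item,)
                results[pattern] = len(occurrences)
                # The new pseudo-projection starts just past the matched item.
                grow(pattern, [(sid, pos + 1) for sid, pos in occurrences.items()])

    grow((), [(i, 0) for i in range(len(db))])
    return results


# Hypothetical toy database of three sequences.
toy_db = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
patterns = mine_sequences(toy_db, min_support=2)
```

With a minimum support of 2, the pattern (a, b) is pruned because it occurs in only one sequence, while (a, c) and (b, c) are kept.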
Stochastic production phase design for an open pit mining complex with multiple processing streams
NASA Astrophysics Data System (ADS)
Asad, Mohammad Waqar Ali; Dimitrakopoulos, Roussos; van Eldert, Jeroen
2014-08-01
In a mining complex, the mine is a source of supply of valuable material (ore) to a number of processes that convert the raw ore to a saleable product or a metal concentrate for production of the refined metal. In this context, expected variation in metal content throughout the extent of the orebody defines the inherent uncertainty in the supply of ore, which impacts the subsequent ore and metal production targets. Traditional optimization methods for designing production phases and ultimate pit limit of an open pit mine not only ignore the uncertainty in metal content, but, in addition, commonly assume that the mine delivers ore to a single processing facility. A stochastic network flow approach is proposed that jointly integrates uncertainty in supply of ore and multiple ore destinations into the development of production phase design and ultimate pit limit. An application at a copper mine demonstrates the intricacies of the new approach. The case study shows a 14% higher discounted cash flow when compared to the traditional approach.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Raymond, David W.; Gaither, Katherine N.; Polsky, Yarom
Sandia National Laboratories (Sandia) has a long history in developing compact, mobile, very high-speed drilling systems and this technology could be applied to increasing the rate at which boreholes are drilled during a mine accident response. The present study reviews current technical approaches, primarily based on technology developed under other programs, analyzes mine rescue specific requirements to develop a conceptual mine rescue drilling approach, and finally, proposes development of a phased mine rescue drilling system (MRDS) that accomplishes (1) development of rapid drilling MRDS equipment; (2) structuring improved web communication through the Mine Safety & Health Administration (MSHA) web site; (3) development of an improved protocol for employment of existing drilling technology in emergencies; (4) deployment of advanced technologies to complement mine rescue drilling operations during emergency events; and (5) preliminary discussion of potential future technology development of specialized MRDS equipment. This phased approach allows for rapid fielding of a basic system for improved rescue drilling, with the ability to improve the system over time at a reasonable cost.
States with active and abandoned mines face large private and public costs to remediate damage to streams and rivers from acid mine drainage (AMD), the metal rich runoff flowing primarily from abandoned mines and surface deposits of mine waste. AMD can lower stream and river pH ...
A Data Mining Approach to Improve Re-Accessibility and Delivery of Learning Knowledge Objects
ERIC Educational Resources Information Center
Sabitha, Sai; Mehrotra, Deepti; Bansal, Abhay
2014-01-01
Today Learning Management Systems (LMS) have become an integral part of learning mechanism of both learning institutes and industry. A Learning Object (LO) can be one of the atomic components of LMS. A large amount of research is conducted into identifying benchmarks for creating Learning Objects. Some of the major concerns associated with LO are…
From scientific discovery to cures: bright stars within a galaxy.
Williams, R Sanders; Lotia, Samad; Holloway, Alisha K; Pico, Alexander R
2015-09-24
We propose that data mining and network analysis utilizing public databases can identify and quantify relationships between scientific discoveries and major advances in medicine (cures). Further development of such approaches could help to increase public understanding and governmental support for life science research and could enhance decision making in the quest for cures. Copyright © 2015 Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Bishop, Malachy; Chan, Fong; Rumrill, Phillip D., Jr.; Frain, Michael P.; Tansey, Timothy N.; Chiu, Chung-Yi; Strauser, David; Umeasiegbu, Veronica I.
2015-01-01
Purpose: To examine demographic, functional, and clinical multiple sclerosis (MS) variables affecting employment status in a national sample of adults with MS in the United States. Method: The sample included 4,142 working-age (20-65 years) Americans with MS (79.1% female) who participated in a national survey. The mean age of participants was…
Rismanchian, Farhood; Lee, Young Hoon
2017-07-01
This article proposes an approach to help designers analyze complex care processes and identify the optimal layout of an emergency department (ED) considering several objectives simultaneously. These objectives include minimizing the distances traveled by patients, maximizing design preferences, and minimizing the relocation costs. Rising demand for healthcare services leads to increasing demand for new hospital buildings as well as renovating existing ones. Operations management techniques have been successfully applied in both manufacturing and service industries to design more efficient layouts. However, high complexity of healthcare processes makes it challenging to apply these techniques in healthcare environments. Process mining techniques were applied to address the problem of complexity and to enhance healthcare process analysis. Process-related information, such as information about the clinical pathways, was extracted from the information system of an ED. A goal programming approach was then employed to find a single layout that would simultaneously satisfy several objectives. The layout identified using the proposed method improved the distances traveled by noncritical and critical patients by 42.2% and 47.6%, respectively, and minimized the relocation costs. This study has shown that an efficient placement of the clinical units yields remarkable improvements in the distances traveled by patients.
Blazing Signature Filter: a library for fast pairwise similarity comparisons
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Joon-Yong; Fujimoto, Grant M.; Wilson, Ryan
Identifying similarities between datasets is a fundamental task in data mining and has become an integral part of modern scientific investigation. Whether the task is to identify co-expressed genes in large-scale expression surveys or to predict combinations of gene knockouts which would elicit a similar phenotype, the underlying computational task is often a multi-dimensional similarity test. As datasets continue to grow, improvements to the efficiency, sensitivity or specificity of such computation will have broad impacts as it allows scientists to more completely explore the wealth of scientific data. A significant practical drawback of large-scale data mining is that the vast majority of pairwise comparisons are unlikely to be relevant, meaning that they do not share a signature of interest. It is therefore essential to efficiently identify these unproductive comparisons as rapidly as possible and exclude them from more time-intensive similarity calculations. The Blazing Signature Filter (BSF) is a highly efficient pairwise similarity algorithm which enables extensive data mining within a reasonable amount of time. The algorithm transforms datasets into binary metrics, allowing it to utilize the computationally efficient bit operators and provide a coarse measure of similarity. As a result, the BSF can scale to high dimensionality and rapidly filter unproductive pairwise comparisons. Two bioinformatics applications of the tool are presented to demonstrate the ability to scale to billions of pairwise comparisons and the usefulness of this approach.
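The general idea of binarizing data and comparing it with bit operators can be sketched as follows. This is an illustration of the technique only, not the BSF library's actual API: the function names, the thresholding rule, and the example profiles are all assumptions. Each numeric profile is packed into an integer bit signature, and a bitwise AND followed by a population count gives a coarse similarity used to discard unproductive pairs.

```python
def to_signature(values, threshold):
    """Pack a numeric vector into an int; bit i is set when values[i] > threshold."""
    signature = 0
    for i, value in enumerate(values):
        if value > threshold:
            signature |= 1 << i
    return signature


def shared_bits(sig_a, sig_b):
    """Coarse similarity: number of positions set in both signatures."""
    return bin(sig_a & sig_b).count("1")


def candidate_pairs(signatures, min_shared):
    """Keep only pairs whose signatures share at least min_shared set bits."""
    names = sorted(signatures)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if shared_bits(signatures[a], signatures[b]) >= min_shared
    ]


# Hypothetical expression profiles; the threshold of 1.0 is arbitrary.
profiles = {
    "g1": [2.0, 0.1, 3.0, 0.0],
    "g2": [1.5, 0.0, 2.5, 0.2],
    "g3": [0.0, 4.0, 0.1, 3.0],
}
signatures = {name: to_signature(v, 1.0) for name, v in profiles.items()}
kept = candidate_pairs(signatures, min_shared=2)
```

Only the pairs that survive this cheap filter would be passed on to a more expensive similarity calculation.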
Application-Specific Graph Sampling for Frequent Subgraph Mining and Community Detection
DOE Office of Scientific and Technical Information (OSTI.GOV)
Purohit, Sumit; Choudhury, Sutanay; Holder, Lawrence B.
Graph mining is an important data analysis methodology, but struggles as the input graph size increases. The scalability and usability challenges posed by such large graphs make it imperative to sample the input graph and reduce its size. The critical challenge in sampling is to identify the appropriate algorithm to ensure the resulting analysis does not suffer heavily from the data reduction. Predicting the expected performance degradation for a given graph and sampling algorithm is also useful. In this paper, we present different sampling approaches for graph mining applications such as Frequent Subgraph Mining (FSM) and Community Detection (CD). We explore graph metrics such as PageRank, Triangles, and Diversity to sample a graph and conclude that for heterogeneous graphs Triangles and Diversity perform better than degree-based metrics. We also present two new sampling variations for targeted graph mining applications. We present empirical results to show that knowledge of the target application, along with input graph properties, can be used to select the best sampling algorithm. We also conclude that performance degradation is an abrupt, rather than gradual, phenomenon as the sample size decreases. We present the empirical results to show that the performance degradation follows a logistic function.
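The closing claim, that degradation is abrupt and follows a logistic function, can be pictured with a generic logistic curve. The sketch below only illustrates the shape; the midpoint and steepness values are made up and are not fitted parameters from the paper.

```python
import math


def logistic_quality(sample_fraction, midpoint=0.3, steepness=20.0):
    """Generic logistic curve: retained analysis quality versus sample fraction.

    midpoint and steepness are hypothetical; they set where and how sharply
    quality collapses as the sample shrinks.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (sample_fraction - midpoint)))


# Quality stays near 1 for large samples and drops sharply around the midpoint.
curve = [(f / 10, logistic_quality(f / 10)) for f in range(0, 11)]
```

Under such a curve, shrinking the sample has little effect until the fraction nears the midpoint, after which quality falls quickly.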
Mining the Temporal Dimension of the Information Propagation
NASA Astrophysics Data System (ADS)
Berlingerio, Michele; Coscia, Michele; Giannotti, Fosca
In the last decade, Social Network Analysis has been a field to which researchers in the Data Mining area have devoted rapidly increasing effort. Among the possible related topics, the study of information propagation in a network has attracted the interest of many researchers, including those from the industrial world. However, only a few answers to the question “How does information propagate over a network, why and how fast?” have been discovered so far. These answers are of great interest, since they help in the tasks of finding experts in a network, assessing viral marketing strategies, and identifying fast or slow paths of the information inside a collaborative network. In this paper we study the problem of finding frequent patterns in a network with the help of two different techniques: TAS (Temporally Annotated Sequences) mining, aimed at extracting sequential patterns where each transition between two events is annotated with a typical transition time that emerges from input data, and Graph Mining, which is helpful for locally analyzing the nodes of the networks with their properties. Finally, we show preliminary results on mining the information propagation over a network, obtained on two well-known email datasets, that show the power of the combination of these two approaches.
The risk of collapse in abandoned mine sites: the issue of data uncertainty
NASA Astrophysics Data System (ADS)
Longoni, Laura; Papini, Monica; Brambilla, Davide; Arosio, Diego; Zanzi, Luigi
2016-04-01
Ground collapses over abandoned underground mines constitute a new environmental risk in the world. The high risk associated with subsurface voids, together with lack of knowledge of the geometric and geomechanical features of mining areas, makes abandoned underground mines one of the current challenges for countries with a long mining history. In this study, a stability analysis of the Montevecchia marl mine is performed in order to validate a general approach that takes into account the poor local information and the variability of the input data. The collapse risk was evaluated through a numerical approach that, starting with some simplifying assumptions, is able to provide an overview of the collapse probability. The final result is an easily accessible, transparent summary graph that shows the collapse probability. This approach may be useful for public administrators called upon to manage this environmental risk. The approach tries to simplify this complex problem in order to achieve a rough risk assessment, but, since it relies on just a small amount of information, any final user should be aware that a comprehensive and detailed risk scenario can be generated only through more exhaustive investigations.
From IHE Audit Trails to XES Event Logs Facilitating Process Mining.
Paster, Ferdinand; Helm, Emmanuel
2015-01-01
Business Intelligence approaches like process mining have recently been applied to the healthcare domain. The goal of process mining is to gain process knowledge, check compliance, and find room for improvement by investigating recorded event data. Previous approaches focused on process discovery from event data of various specific systems. IHE, as a globally recognized basis for healthcare information systems, defines in its ATNA profile how real-world events must be recorded in centralized event logs. The following approach presents how audit trails collected by means of ATNA can be transformed to enable process mining. Using the standardized audit trails provides the ability to apply these methods to all IHE-based information systems.
A Text-Mining Framework for Supporting Systematic Reviews.
Li, Dingcheng; Wang, Zhen; Wang, Liwei; Sohn, Sunghwan; Shen, Feichen; Murad, Mohammad Hassan; Liu, Hongfang
2016-11-01
Systematic reviews (SRs) involve the identification, appraisal, and synthesis of all relevant studies for focused questions in a structured reproducible manner. High-quality SRs follow strict procedures and require significant resources and time. We investigated advanced text-mining approaches to reduce the burden associated with abstract screening in SRs and provide a high-level information summary. A text-mining SR supporting framework consisting of three self-defined semantics-based ranking metrics was proposed, including keyword relevance, indexed-term relevance and topic relevance. Keyword relevance is based on the user-defined keyword list used in the search strategy. Indexed-term relevance is derived from indexed vocabulary developed by domain experts used for indexing journal articles and books. Topic relevance is defined as the semantic similarity among retrieved abstracts in terms of topics generated by latent Dirichlet allocation, a Bayesian-based model for discovering topics. We tested the proposed framework using three published SRs addressing a variety of topics (Mass Media Interventions, Rectal Cancer and Influenza Vaccine). The results showed that when 91.8%, 85.7%, and 49.3% of the abstract screening labor was saved, the recalls were as high as 100% for the three cases, respectively. Relevant studies identified manually showed strong topic similarity through topic analysis, which supported the inclusion of topic analysis as a relevance metric. It was demonstrated that advanced text mining approaches can significantly reduce the abstract screening labor of SRs and provide an informative summary of relevant studies.
Design risk assessment for burst-prone mines: Application in a Canadian mine
NASA Astrophysics Data System (ADS)
Cheung, David J.
A proactive stance towards improving the effectiveness and consistency of risk assessments has been adopted recently by mining companies and industry. Forecasts for the next 10-20 years indicate that ore deposits accessible using shallow mining techniques will diminish. The industry continues to strive for success in "deeper" mining projects in order to keep up with the continuing demand for raw materials. Although the returns are quite profitable, many projects have been sidelined due to high uncertainty and technical risk in the mining of the mineral deposit. Several hardrock mines have faced rockbursting and seismicity problems. Among those reported, mines in countries like South Africa, Australia and Canada have documented cases of severe rockburst conditions attributed to the mining depth. Severe rockburst conditions known as "burst-prone" can be effectively managed with design. Adopting a more robust design can reduce the exposure of workers and equipment to adverse conditions and minimize the economic consequences, which can hinder the bottom line of an operation. This thesis presents a methodology created for assessing the design risk in burst-prone mines. The methodology includes an evaluation of relative risk ratings for scenarios with options of risk reduction through several design principles. With rockbursts being a hazard of seismic events, the methodology is based on research in the area of mining seismicity, factoring in rockmass failure mechanisms, which result from a combination of mining-induced stress, geological structures, rockmass properties and mining influences. The methodology was applied to case studies at Craig Mine of Xstrata Nickel in Sudbury, Ontario, which is known to contain seismically active fault zones. A customized risk assessment was created and applied to rockburst case studies, evaluating the seismic vulnerability and consequence for each case.
Application of the methodology to Craig Mine demonstrates that changes in the design can reduce both exposure risk (personnel and equipment) and economic risk (revenue and costs). Fatal and catastrophic consequences can be averted through robust planning and design. Two customized approaches were developed to conduct risk assessment of case studies at Craig Mine. Firstly, the Brownfield Approach utilizes the seismic database to determine the seismic hazard from a rating system that evaluates frequency-magnitude, event size, and event-blast relation. Secondly, the Greenfield Approach utilizes the seismic database, focusing on larger magnitude events, rocktype, and geological structure. The customized Greenfield Approach can also be applied in the evaluation of design risk in deep mines with the same setting and condition as Craig Mine. Other mines with different settings and conditions can apply the principles in the methodology to evaluate design alternatives and risk reduction strategies for burst-prone mines.
Determining the familial risk distribution of colorectal cancer: a data mining approach.
Chau, Rowena; Jenkins, Mark A; Buchanan, Daniel D; Ait Ouakrim, Driss; Giles, Graham G; Casey, Graham; Gallinger, Steven; Haile, Robert W; Le Marchand, Loic; Newcomb, Polly A; Lindor, Noralane M; Hopper, John L; Win, Aung Ko
2016-04-01
This study aimed to characterize the distribution of colorectal cancer risk using family history of cancers by data mining. Family histories for 10,066 colorectal cancer cases recruited to population cancer registries of the Colon Cancer Family Registry were analyzed using a data mining framework. A novel index was developed to quantify familial cancer aggregation. An artificial neural network was used to identify distinct categories of familial risk. Standardized incidence ratios (SIRs) and corresponding 95% confidence intervals (CIs) of colorectal cancer were calculated for each category. We identified five major and 66 minor categories of familial risk for developing colorectal cancer. The distribution of the major risk categories was: (1) 7% of families (SIR = 7.11; 95% CI 6.65-7.59) had a strong family history of colorectal cancer; (2) 13% of families (SIR = 2.94; 95% CI 2.78-3.10) had a moderate family history of colorectal cancer; (3) 11% of families (SIR = 1.23; 95% CI 1.12-1.36) had a strong family history of breast cancer and a weak family history of colorectal cancer; (4) 9% of families (SIR = 1.06; 95% CI 0.96-1.18) had a strong family history of prostate cancer and a weak family history of colorectal cancer; and (5) 60% of families (SIR = 0.61; 95% CI 0.57-0.65) had a weak family history of all cancers. There is a wide variation of colorectal cancer risk that can be categorized by family history of cancer, with a strong gradient of colorectal cancer risk between the highest and lowest risk categories. The risk of colorectal cancer for people in the highest risk category of family history (7% of the population) was 12 times that for people in the lowest risk category (60% of the population). Data mining proved to be an effective approach for gaining insight into the underlying cancer aggregation patterns and for categorizing familial risk of colorectal cancer.
Determining the familial risk distribution of colorectal cancer: A data mining approach
Chau, Rowena; Jenkins, Mark A.; Buchanan, Daniel D.; Ouakrim, Driss Ait; Giles, Graham G.; Casey, Graham; Gallinger, Steven; Haile, Robert W.; Le Marchand, Loic; Newcomb, Polly A.; Lindor, Noralane M.; Hopper, John L.; Win, Aung Ko
2016-01-01
This study aimed to characterize the distribution of colorectal cancer risk using family history of cancers by data mining. Family histories for 10,066 colorectal cancer cases recruited to population cancer registries of the Colon Cancer Family Registry were analyzed using a data mining framework. A novel index was developed to quantify familial cancer aggregation. An artificial neural network was used to identify distinct categories of familial risk. Standardized incidence ratios (SIRs) and corresponding 95% confidence intervals (CIs) of colorectal cancer were calculated for each category. We identified five major and sixty-six minor categories of familial risk for developing colorectal cancer. The distribution of the major risk categories was: (i) 7% of families (SIR=7.11; 95%CI=6.65–7.59) had a strong family history of colorectal cancer; (ii) 13% of families (SIR=2.94; 95%CI=2.78–3.10) had a moderate family history of colorectal cancer; (iii) 11% of families (SIR=1.23; 95%CI=1.12–1.36) had a strong family history of breast cancer and a weak family history of colorectal cancer; (iv) 9% of families (SIR=1.06; 95%CI=0.96–1.18) had a strong family history of prostate cancer and a weak family history of colorectal cancer; and (v) 60% of families (SIR=0.61; 95%CI=0.57–0.65) had a weak family history of all cancers. There is a wide variation of colorectal cancer risk that can be categorized by family history of cancer, with a strong gradient of colorectal cancer risk between the highest and lowest risk categories. The risk of colorectal cancer for people in the highest risk category of family history (7% of the population) was 12 times that for people in the lowest risk category (60% of the population). Data mining proved to be an effective approach for gaining insight into the underlying cancer aggregation patterns and for categorizing familial risk of colorectal cancer. PMID:26681340
Singhal, Ayush; Simmons, Michael; Lu, Zhiyong
2016-11-01
The practice of precision medicine will ultimately require databases of genes and mutations for healthcare providers to reference in order to understand the clinical implications of each patient's genetic makeup. Although the highest quality databases require manual curation, text mining tools can facilitate the curation process, increasing accuracy, coverage, and productivity. However, to date there are no available text mining tools that offer high-accuracy performance for extracting such triplets from biomedical literature. In this paper we propose a high-performance machine learning approach to automate the extraction of disease-gene-variant triplets from biomedical literature. Our approach is unique because we identify the genes and protein products associated with each mutation from not just the local text content, but from a global context as well (from the Internet and from all literature in PubMed). Our approach also incorporates protein sequence validation and disease association using a novel text-mining-based machine learning approach. We extract disease-gene-variant triplets from all abstracts in PubMed related to a set of ten important diseases (breast cancer, prostate cancer, pancreatic cancer, lung cancer, acute myeloid leukemia, Alzheimer's disease, hemochromatosis, age-related macular degeneration (AMD), diabetes mellitus, and cystic fibrosis). We then evaluate our approach in two ways: (1) a direct comparison with the state of the art using benchmark datasets; (2) a validation study comparing the results of our approach with entries in a popular human-curated database (UniProt) for each of the previously mentioned diseases. In the benchmark comparison, our full approach achieves a 28% improvement in F1-measure (from 0.62 to 0.79) over the state-of-the-art results. For the validation study with UniProt Knowledgebase (KB), we present a thorough analysis of the results and errors. 
Across all diseases, our approach returned 272 triplets (disease-gene-variant) that overlapped with entries in UniProt and 5,384 triplets without overlap in UniProt. Analysis of the overlapping triplets and of a stratified sample of the non-overlapping triplets revealed accuracies of 93% and 80% for the respective categories (cumulative accuracy, 77%). We conclude that our process represents an important and broadly applicable improvement to the state of the art for curation of disease-gene-variant relationships.
Vieira, Caroline Krug; Marascalchi, Matheus Nicoletti; Rodrigues, Arthur Vinicius; de Armas, Rafael Dutra; Stürmer, Sidney Luiz
2018-05-01
Arbuscular mycorrhizal fungi (AMF) are important during revegetation of mining sites, but few studies compared AMF community in revegetated sites with pristine adjacent ecosystems. The aim of this study was to assess AMF species richness in a revegetated iron-mining site and adjacent ecosystems and to relate AMF occurrence to soil chemical parameters. Soil samples were collected in dry and rainy seasons in a revegetated iron-mining site (RA) and compared with pristine ecosystems of forest (FL), canga (NG), and Cerrado (CE). AMF species were identified by spore morphology from field and trap cultures and by LSU rDNA sequencing using Illumina. A total of 62 AMF species were recovered, pertaining to 18 genera and nine families of Glomeromycota. The largest number of species and families were detected in RA, and Acaulospora mellea and Glomus sp1 were the most frequent species. Species belonging to Glomeraceae and Acaulosporaceae accounted for 42%-48% of total species richness. Total number of spores and mycorrhizal inoculum potential tended to be higher in the dry than in the rainy season, except in RA. Sequences of uncultured Glomerales were dominant in all sites and seasons and five species were detected exclusively by DNA-based identification. Redundancy analysis evidenced soil pH, organic matter, aluminum, and iron as main factors influencing AMF presence. In conclusion, revegetation of the iron-mining site seems to be effective in maintaining a diverse AMF community and different approaches are complementary to reveal AMF species, despite the larger number of species being identified by traditional identification of field spores. Copyright © 2017. Published by Elsevier B.V.
MINE: Module Identification in Networks
2011-01-01
Background: Graphical models of network associations are useful for both visualizing and integrating multiple types of association data. Identifying modules, or groups of functionally related gene products, is an important challenge in analyzing biological networks. However, existing tools to identify modules are insufficient when applied to dense networks of experimentally derived interaction data. To address this problem, we have developed an agglomerative clustering method that is able to identify highly modular sets of gene products within highly interconnected molecular interaction networks. Results: MINE outperforms MCODE, CFinder, NEMO, SPICi, and MCL in identifying non-exclusive, high modularity clusters when applied to the C. elegans protein-protein interaction network. The algorithm generally achieves superior geometric accuracy and modularity for annotated functional categories. In comparison with the most closely related algorithm, MCODE, the top clusters identified by MINE are consistently of higher density and MINE is less likely to designate overlapping modules as a single unit. MINE offers a high level of granularity with a small number of adjustable parameters, enabling users to fine-tune cluster results for input networks with differing topological properties. Conclusions: MINE was created in response to the challenge of discovering high quality modules of gene products within highly interconnected biological networks. The algorithm allows a high degree of flexibility and user-customisation of results with few adjustable parameters. MINE outperforms several popular clustering algorithms in identifying modules with high modularity and obtains good overall recall and precision of functional annotations in protein-protein interaction networks from both S. cerevisiae and C. elegans. PMID:21605434
Image Information Mining Utilizing Hierarchical Segmentation
NASA Technical Reports Server (NTRS)
Tilton, James C.; Marchisio, Giovanni; Koperski, Krzysztof; Datcu, Mihai
2002-01-01
The Hierarchical Segmentation (HSEG) algorithm is an approach for producing high quality, hierarchically related image segmentations. The VisiMine image information mining system utilizes clustering and segmentation algorithms for reducing visual information in multispectral images to a manageable size. The project discussed herein seeks to enhance the VisiMine system through incorporating hierarchical segmentations from HSEG into the VisiMine system.
A probabilistic approach for mine burial prediction
NASA Astrophysics Data System (ADS)
Barbu, Costin; Valent, Philip; Richardson, Michael; Abelev, Andrei; Plant, Nathaniel
2004-09-01
Predicting the degree of burial of mines in soft sediments is one of the main concerns of Naval Mine CounterMeasures (MCM) operations. This is a difficult problem to solve due to uncertainties and variability of the sediment parameters (i.e., density and shear strength) and of the mine state at contact with the seafloor (i.e., vertical and horizontal velocity, angular rotation rate, and pitch angle at the mudline). A stochastic approach is proposed in this paper to better incorporate the dynamic nature of free-falling cylindrical mines in the modeling of impact burial. The orientation, trajectory and velocity of cylindrical mines, after about 4 meters free-fall in the water column, are very strongly influenced by boundary layer effects causing quite chaotic behavior. The model's convolution of the uncertainty through its nonlinearity is addressed by employing Monte Carlo simulations. Finally a risk analysis based on the probability of encountering an undetectable mine is performed.
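The Monte Carlo step mentioned in this abstract can be illustrated with a minimal sketch. The burial model below (`burial_fraction`) is a made-up toy nonlinearity, not the authors' impact model. The input distributions, units, and the 0.75 threshold are assumptions chosen only to show how uncertain inputs are propagated through a nonlinear model to a burial probability.

```python
import random

def burial_fraction(velocity, shear_strength):
    """Toy nonlinear burial model (hypothetical): faster impact and weaker
    sediment give deeper burial, clipped to the range [0, 1]."""
    raw = 0.05 * velocity ** 2 / shear_strength
    return max(0.0, min(1.0, raw))

def simulate(n_trials, seed=0):
    """Propagate input uncertainty through the model by random sampling."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        velocity = rng.gauss(5.0, 1.0)   # impact velocity in m/s (assumed distribution)
        shear = rng.uniform(1.0, 5.0)    # sediment shear strength in kPa (assumed range)
        results.append(burial_fraction(velocity, shear))
    return results

def probability_exceeding(results, threshold):
    """Fraction of trials whose burial fraction exceeds the threshold."""
    return sum(r > threshold for r in results) / len(results)

samples = simulate(10_000)
p_buried = probability_exceeding(samples, 0.75)
```

Here `p_buried` plays the role of the probability that a mine is buried deeply enough to be hard to detect. A real risk analysis would replace the toy model and the assumed distributions with measured ones.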
Numerical Modeling Tools for the Prediction of Solution Migration Applicable to Mining Site
DOE Office of Scientific and Technical Information (OSTI.GOV)
Martell, M.; Vaughn, P.
1999-01-06
Mining has always had an important influence on cultures and traditions of communities around the globe and throughout history. Today, because mining legislation places heavy emphasis on environmental protection, there is great interest in having a comprehensive understanding of ancient mining and mining sites. Multi-disciplinary approaches (i.e., Pb isotopes as tracers) are being used to explore the distribution of metals in natural environments. Another successful approach is to model solution migration numerically. A proven method to simulate solution migration in natural rock salt has been applied to project through time for 10,000 years the system performance and solution concentrations surrounding a proposed nuclear waste repository. This capability is readily adaptable to simulate solution migration around mining.
Occupational respiratory diseases in the South African mining industry
Nelson, Gill
2013-01-01
Background Crystalline silica and asbestos are common minerals that occur throughout South Africa; exposure to either causes respiratory disease. Most studies on silicosis in South Africa have been cross-sectional and long-term trends have not been reported. Although much research has been conducted on the health effects of silica dust and asbestos fibre in the gold-mining and asbestos-mining sectors, little is known about their health effects in other mining sectors. Objective The aims of this thesis were to describe silicosis trends in gold miners over three decades, and to explore the potential for diamond mine workers to develop asbestos-related diseases and platinum mine workers to develop silicosis. Methods Mine workers for the three sub-studies were identified from a mine worker autopsy database at the National Institute for Occupational Health. Results From 1975 to 2007, the proportions of white and black gold mine workers with silicosis increased from 18 to 22% and from 3 to 32% respectively. Cases of diamond and platinum mine workers with asbestos-related diseases and silicosis, respectively, were also identified. Conclusion The trends in silicosis in gold miners at autopsy clearly demonstrate the failure of the gold mines to adequately control dust and prevent occupational respiratory disease. The two case series of diamond and platinum mine workers contribute to the evidence for the risk of asbestos-related diseases in diamond mine workers and silicosis in platinum mine workers, respectively. The absence of reliable environmental dust measurements and incomplete work history records impedes occupational health research in South Africa because it is difficult to identify and/or validate sources of dust exposure that may be associated with occupational respiratory disease. PMID:23364097
NASA Technical Reports Server (NTRS)
Wier, C. E.; Wobber, F. J. (Principal Investigator); Russell, O. R.; Amato, R. V.; Leshendok, T. V.
1974-01-01
The author has identified the following significant results. New fracture detail of Indiana has been observed and mapped from ERTS-1 imagery. Studies so far indicate a close relationship between the directions of fracture traces mapped from the imagery, fractures measured on bedrock outcrops, and fractures measured in the underground mines. First hand observations and discussions with underground mine operators indicate good correlation of mine hazard maps prepared from ERTS-1/aircraft imagery and actual roof falls. The inventory of refuse piles/slurry ponds of the coal field of Indiana has identified over 225 such sites from past mining operations. These data will serve the State Legislature in making tax decisions on coal mining which take on increased importance because of the energy crisis.
Data Integration and Mining for Synthetic Biology Design.
Mısırlı, Göksel; Hallinan, Jennifer; Pocock, Matthew; Lord, Phillip; McLaughlin, James Alastair; Sauro, Herbert; Wipat, Anil
2016-10-21
One aim of synthetic biologists is to create novel and predictable biological systems from simpler modular parts. This approach is currently hampered by a lack of well-defined and characterized parts and devices. However, there is a wealth of existing biological information, which can be used to identify and characterize biological parts, and their design constraints in the literature and numerous biological databases. However, this information is spread among these databases in many different formats. New computational approaches are required to make this information available in an integrated format that is more amenable to data mining. A tried and tested approach to this problem is to map disparate data sources into a single data set, with common syntax and semantics, to produce a data warehouse or knowledge base. Ontologies have been used extensively in the life sciences, providing this common syntax and semantics as a model for a given biological domain, in a fashion that is amenable to computational analysis and reasoning. Here, we present an ontology for applications in synthetic biology design, SyBiOnt, which facilitates the modeling of information about biological parts and their relationships. SyBiOnt was used to create the SyBiOntKB knowledge base, incorporating and building upon existing life sciences ontologies and standards. The reasoning capabilities of ontologies were then applied to automate the mining of biological parts from this knowledge base. We propose that this approach will be useful to speed up synthetic biology design and ultimately help facilitate the automation of the biological engineering life cycle.
NASA Technical Reports Server (NTRS)
Wier, C. E. (Principal Investigator); Wobber, F. J. (Principal Investigator); Russell, O. R.; Amato, R. V.
1972-01-01
The author has identified the following significant results. Numerous fractures are identifiable on the 1:120,000 color infrared photography. Some of these fractures are in the proximity of operating open pit mines and should provide opportunities for field checking and confirmation.
Ensemble Learning Method for Hidden Markov Models
2014-12-01
Ensemble HMM landmine detector Mine signatures vary according to the mine type, mine size, and burial depth. Similarly, clutter signatures vary with soil ... approaches for the different K groups depending on their size and homogeneity. In particular, we investigate the maximum likelihood (ML), the minimum ... propose using and optimizing various training approaches for the different K groups depending on their size and homogeneity. In particular, we
Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra
2015-01-01
Microarray and beadchip are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining), to identify special types of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from the biological datasets. At first, a novel statistical strategy has been utilized to eliminate the insignificant/low-significant/redundant genes in such a way that the significance level must satisfy the data distribution property (viz., either normal distribution or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. The corresponding special types of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time and can work on big datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to determine how accurately the evolved rules describe the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy and other related factors with other rule-based classifiers. Statistical significance tests are also performed for verifying the statistical relevance of the comparative results.
Here, each of the other rule mining methods or rule-based classifiers also starts with the same post-discretized data-matrix. Finally, we have also included the integrated analysis of gene expression and methylation for determining epigenetic effect (viz., effect of methylation) on gene expression level. PMID:25830807
The Spatial Assessment of the Current Seismic Hazard State for Hard Rock Underground Mines
NASA Astrophysics Data System (ADS)
Wesseloo, Johan
2018-06-01
Mining-induced seismic hazard assessment is an important component in the management of safety and financial risk in mines. As the seismic hazard is a response to the mining activity, it is non-stationary and variable both in space and time. This paper presents an approach for implementing a probabilistic seismic hazard assessment to assess the current hazard state of a mine. Each of the components of the probabilistic seismic hazard assessment is considered within the context of hard rock underground mines. The focus of this paper is the assessment of the in-mine hazard distribution and does not consider the hazard to nearby public or structures. A rating system and methodologies to present hazard maps, for the purpose of communicating to different stakeholders in the mine, i.e. mine managers, technical personnel and the work force, are developed. The approach allows one to update the assessment with relative ease and within short time periods as new data become available, enabling the monitoring of the spatial and temporal change in the seismic hazard.
Bell, Shannon M; Edwards, Stephen W
2015-11-01
There are > 80,000 chemicals in commerce with few data available describing their impacts on human health. Biomonitoring surveys, such as the NHANES (National Health and Nutrition Examination Survey), offer one route to identifying possible relationships between environmental chemicals and health impacts, but sparse data and the complexity of traditional models make it difficult to leverage effectively. We describe a workflow to efficiently and comprehensively evaluate and prioritize chemical-health impact relationships from the NHANES biomonitoring survey studies. Using a frequent itemset mining (FIM) approach, we identified relationships between chemicals and health biomarkers and diseases. The FIM method identified 7,848 relationships between 219 chemicals and 93 health outcomes/biomarkers. Two case studies used to evaluate the FIM rankings demonstrate that the FIM approach is able to identify published relationships. Because the relationships are derived from the vast majority of the chemicals monitored by NHANES, the resulting list of associations is appropriate for evaluating results from targeted data mining or identifying novel candidate relationships for more detailed investigation. Because of the computational efficiency of the FIM method, all chemicals and health effects can be considered in a single analysis. The resulting list provides a comprehensive summary of the chemical/health co-occurrences from NHANES that are higher than expected by chance. This information enables ranking and prioritization on chemicals or health effects of interest for evaluation of published results and design of future studies. Bell SM, Edwards SW. 2015. Identification and prioritization of relationships between environmental stressors and adverse human health impacts. Environ Health Perspect 123:1193-1199; http://dx.doi.org/10.1289/ehp.1409138.
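A minimal sketch of the frequent itemset idea, restricted to item pairs, may help make "co-occurrences that are higher than expected by chance" concrete. The records, item names, and the use of lift as the chance-corrected measure are illustrative assumptions, not the authors' workflow or NHANES data.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return {pair: (support, lift)} for item pairs meeting min_support.

    support = fraction of records containing both items;
    lift    = support / (support expected if the items were independent).
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    out = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            expected = (item_counts[a] / n) * (item_counts[b] / n)
            out[(a, b)] = (support, support / expected)
    return out

# Hypothetical records: each set lists the flags raised for one participant.
records = [
    {"chem_A", "outcome_X"},
    {"chem_A", "outcome_X"},
    {"chem_A"},
    {"chem_B", "outcome_X"},
    {"chem_B"},
    {"chem_B"},
]
pairs = frequent_pairs(records, min_support=0.3)
```

In this toy data only the pair (`chem_A`, `outcome_X`) passes the support threshold, and its lift above 1 marks it as co-occurring more often than independence would predict. Ranking pairs by lift is one simple way to prioritize candidates for follow-up.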
Approaches to Post-Mining Land Reclamation in Polish Open-Cast Lignite Mining
NASA Astrophysics Data System (ADS)
Kasztelewicz, Zbigniew
2014-06-01
The paper presents the situation regarding the reclamation of post-mining land in the case of particular lignite mines in Poland until 2012 against the background of opencast mining as a whole. It discusses the process of land purchase for mining operations and its sale after reclamation. It presents the achievements of mines in the reclamation and regeneration of post-mining land as a result of which, after development processes carried out according to European standards, it now serves the inhabitants as a recreational area that increases the attractiveness of the regions.
O'Mara-Eves, Alison; Thomas, James; McNaught, John; Miwa, Makoto; Ananiadou, Sophia
2015-01-14
The large and growing number of published studies, and their increasing rate of publication, makes the task of identifying relevant studies in an unbiased way for inclusion in systematic reviews both complex and time consuming. Text mining has been offered as a potential solution: through automating some of the screening process, reviewer time can be saved. The evidence base around the use of text mining for screening has not yet been pulled together systematically; this systematic review fills that research gap. Focusing mainly on non-technical issues, the review aims to increase awareness of the potential of these technologies and promote further collaborative research between the computer science and systematic review communities. Five research questions led our review: what is the state of the evidence base; how has workload reduction been evaluated; what are the purposes of semi-automation and how effective are they; how have key contextual problems of applying text mining to the systematic review field been addressed; and what challenges to implementation have emerged? We answered these questions using standard systematic review methods: systematic and exhaustive searching, quality-assured data extraction and a narrative synthesis to synthesise findings. The evidence base is active and diverse; there is almost no replication between studies or collaboration between research teams and, whilst it is difficult to establish any overall conclusions about best approaches, it is clear that efficiencies and reductions in workload are potentially achievable. On the whole, most suggested that a saving in workload of between 30% and 70% might be possible, though sometimes the saving in workload is accompanied by the loss of 5% of relevant studies (i.e. a 95% recall). Using text mining to prioritise the order in which items are screened should be considered safe and ready for use in 'live' reviews. 
The use of text mining as a 'second screener' may also be used cautiously. The use of text mining to eliminate studies automatically should be considered promising, but not yet fully proven. In highly technical/clinical areas, it may be used with a high degree of confidence; but more developmental and evaluative work is needed in other disciplines.
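The trade-off between workload saving and recall reported above can be made concrete with a small calculation. The function name and all counts below are hypothetical, chosen only so that the result falls inside the ranges quoted in the abstract.

```python
def screening_metrics(total, screened_manually, relevant_total, relevant_found):
    """Return (workload_saving, recall) for a semi-automated screening run.

    workload_saving: share of records the reviewers did not have to read.
    recall: share of truly relevant studies that were still identified.
    """
    workload_saving = 1 - screened_manually / total
    recall = relevant_found / relevant_total
    return workload_saving, recall

# Hypothetical review: 1,000 records, 400 screened by hand,
# 95 of the 100 relevant studies found.
saving, recall = screening_metrics(1000, 400, 100, 95)
```

With these made-up numbers the saving is 60% at 95% recall, that is, five relevant studies are missed in exchange for not reading 600 records.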
Abandoned Uranium Mine (AUM) Trust Mine Points, Navajo Nation, 2016, US EPA Region 9
This GIS dataset contains point features that represent mines included in the Navajo Environmental Response Trust. This mine category also includes Priority mines. USEPA and NNEPA prioritized mines based on gamma radiation levels, proximity to homes and potential for water contamination identified in the preliminary assessments. Attributes include mine names, reclaimed status, links to US EPA AUM reports, and the region in which the mine is located. This dataset contains 19 features.
Mining Adverse Drug Reactions in Social Media with Named Entity Recognition and Semantic Methods.
Chen, Xiaoyi; Deldossi, Myrtille; Aboukhamis, Rim; Faviez, Carole; Dahamna, Badisse; Karapetiantz, Pierre; Guenegou-Arnoux, Armelle; Girardeau, Yannick; Guillemin-Lanne, Sylvie; Lillo-Le-Louët, Agnès; Texier, Nathalie; Burgun, Anita; Katsahian, Sandrine
2017-01-01
Suspected adverse drug reactions (ADR) reported by patients through social media can be a complementary source to current pharmacovigilance systems. However, the performance of text mining tools applied to social media text data to discover ADRs needs to be evaluated. In this paper, we introduce the approach developed to mine ADR from French social media. A protocol of evaluation is highlighted, which includes a detailed sample size determination and evaluation corpus constitution. Our text mining approach provided very encouraging preliminary results with F-measures of 0.94 and 0.81 for recognition of drugs and symptoms respectively, and with F-measure of 0.70 for ADR detection. Therefore, this approach is promising for downstream pharmacovigilance analysis.
The Future of Computer-Based Toxicity Prediction: Mechanism-Based Models vs. Information Mining Approaches
When we speak of computer-based toxicity prediction, we are generally referring to a broad array of approaches which rely primarily upon chemical structure ...
Gorokhovich, Yuri; Reid, Matthew; Mignone, Erica; Voros, Andrew
2003-10-01
Coal mine reclamation projects are very expensive and require coordination of local and federal agencies to identify resources for the most economic way of reclaiming mined land. Location of resources for mine reclamation is a spatial problem. This article presents a methodology that allows the combination of spatial data on resources for coal mine reclamation and uses GIS analysis to develop a priority list of potential mine reclamation sites within the contiguous United States using the method of extrapolation. The extrapolation method in this study was based on the Bark Camp reclamation project. The mine reclamation project at Bark Camp, Pennsylvania, USA, provided an example of the beneficial use of fly ash and dredged material to reclaim 402,600 sq mi of a mine abandoned in the 1980s. Railroads provided transportation of dredged material and fly ash to the site. Therefore, four spatial elements contributed to the reclamation project at Bark Camp: dredged material, abandoned mines, fly ash sources, and railroads. Using the spatial distribution of these data in the contiguous United States, it was possible to utilize GIS analysis to prioritize areas where reclamation projects similar to Bark Camp are feasible. GIS analysis identified unique occurrences of all four spatial elements used in the Bark Camp case for each 1 km of the United States territory within 20, 40, 60, 80, and 100 km radii from abandoned mines. The results showed the number of abandoned mines for each state and identified their locations. Federal or state governments can use these results in mine reclamation planning.
Compositional mining of multiple object API protocols through state abstraction.
Dai, Ziying; Mao, Xiaoguang; Lei, Yan; Qi, Yuhua; Wang, Rui; Gu, Bin
2013-01-01
API protocols specify correct sequences of method invocations. Despite their usefulness, API protocols are often unavailable in practice because writing them is cumbersome and error prone. Multiple object API protocols are more expressive than single object API protocols. However, the huge number of objects of typical object-oriented programs poses a major challenge to the automatic mining of multiple object API protocols: besides maintaining scalability, it is important to capture various object interactions. Current approaches utilize various heuristics to focus on small sets of methods. In this paper, we present a general, scalable, multiple object API protocols mining approach that can capture all object interactions. Our approach uses abstract field values to label object states during the mining process. We first mine single object typestates as finite state automata whose transitions are annotated with states of interacting objects before and after the execution of the corresponding method and then construct multiple object API protocols by composing these annotated single object typestates. We implement our approach for Java and evaluate it through a series of experiments.
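To illustrate what a single-object API protocol expressed as a finite state automaton looks like, here is a minimal sketch that checks a call sequence against a hand-written protocol. The states, method names, and the `conforms` helper are hypothetical. The sketch only checks conformance for one object and does not implement the mining or composition approach described in the abstract.

```python
# Toy single-object protocol: state -> {method name: next state}.
# The file-like states and methods are illustrative assumptions.
PROTOCOL = {
    "closed": {"open": "opened"},
    "opened": {"read": "opened", "close": "closed"},
}

def conforms(calls, start="closed", accepting=("closed",)):
    """Return True if the call sequence is a valid run of the automaton
    that ends in an accepting state."""
    state = start
    for method in calls:
        next_state = PROTOCOL.get(state, {}).get(method)
        if next_state is None:
            return False  # method not allowed in the current state
        state = next_state
    return state in accepting
```

For example, `conforms(["open", "read", "close"])` accepts the sequence, while `conforms(["read"])` rejects it because `read` is not allowed before `open`. A multiple-object protocol would additionally need transitions that depend on the states of interacting objects.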
Compositional Mining of Multiple Object API Protocols through State Abstraction
Mao, Xiaoguang; Qi, Yuhua; Wang, Rui; Gu, Bin
2013-01-01
API protocols specify correct sequences of method invocations. Despite their usefulness, API protocols are often unavailable in practice because writing them is cumbersome and error prone. Multiple object API protocols are more expressive than single object API protocols. However, the huge number of objects of typical object-oriented programs poses a major challenge to the automatic mining of multiple object API protocols: besides maintaining scalability, it is important to capture various object interactions. Current approaches utilize various heuristics to focus on small sets of methods. In this paper, we present a general, scalable, multiple object API protocols mining approach that can capture all object interactions. Our approach uses abstract field values to label object states during the mining process. We first mine single object typestates as finite state automata whose transitions are annotated with states of interacting objects before and after the execution of the corresponding method and then construct multiple object API protocols by composing these annotated single object typestates. We implement our approach for Java and evaluate it through a series of experiments. PMID:23844378
Mining of Business-Oriented Conversations at a Call Center
NASA Astrophysics Data System (ADS)
Takeuchi, Hironori; Nasukawa, Tetsuya; Watanabe, Hideo
Recently it has become feasible to transcribe textual records from telephone conversations at call centers by using automatic speech recognition. In this research, we extended a text mining system for call summary records and constructed a conversation mining system for the business-oriented conversations at the call center. To acquire useful business insights from the conversational data through the text mining system, it is critical to identify appropriate textual segments and expressions as the viewpoints to focus on. In the analysis of call summary data using a text mining system, some experts defined the viewpoints for the analysis by looking at some sample records and by preparing the dictionaries based on frequent keywords in the sample dataset. However, with conversations it is difficult to identify such viewpoints manually and in advance because the target data consists of complete transcripts that are often lengthy and redundant. In this research, we defined a model of the business-oriented conversations and proposed a mining method that identifies segments that have impacts on the outcomes of the conversations and then extracts useful expressions in each of these identified segments. In the experiment, we processed the real datasets from a car rental service center and constructed a mining system. With this system, we show the effectiveness of the method based on the defined conversation model.
Miranda-Carrazco, Alejandra; Vigueras-Cortés, Juan M; Villa-Tanaca, Lourdes; Hernández-Rodríguez, César
2018-04-11
Mine tailings and wastewater generate man-made environments with several selective pressures, including the presence of heavy metals, arsenic and high cyanide concentrations, but severe nutritional limitations. Some oligotrophic and pioneer bacteria can colonise and grow in mine wastes containing a low concentration of organic matter and combined nitrogen sources. In this study, Pseudomonas mendocina P6115 was isolated from mine tailings in Durango, Mexico, and identified through a phylogenetic approach of 16S rRNA, gyrB, rpoB, and rpoD genes. Cell growth, cyanide consumption, and ammonia production kinetics in a medium with cyanide as sole nitrogen source showed that at the beginning, the strain grew assimilating cyanide, when cyanide was removed, ammonium was produced and accumulated in the culture medium. However, no clear stoichiometric relationship between both nitrogen sources was observed. Also, cyanide complexes were assimilated as nitrogen sources. Other phenotypic tasks that contribute to the strain's adaptation to a mine tailing environment included siderophores production in media with moderate amounts of heavy metals, arsenite and arsenate tolerance, and the capacity of oxidizing arsenite. P. mendocina P6115 harbours cioA/cioB and aoxB genes encoding for a cyanide-insensitive oxidase and an arsenite oxidase, respectively. This is the first report where P. mendocina is described as a cyanotrophic and arsenic oxidizing species. Genotypic and phenotypic tasks of P. mendocina P6115 autochthonous from mine wastes are potentially relevant for biological treatment of residues contaminated with cyanide and arsenic.
Mirzaei Aliabadi, Mostafa; Aghaei, Hamed; Kalatpour, Omid; Soltanian, Ali Reza; SeyedTabib, Maryam
2018-05-18
Mines are a dangerous workplace worldwide with a high accident rate. According to the Statistical Center of Iran, the number of occupational accidents in Iranian mines has increased in recent years. This study determined and explained human and organizational deficiencies influencing Iranian mining accidents. In this study, the data associated with 305 mining accidents were investigated. The data were analyzed based on a systems analysis approach to identify critical deficiencies in organizational influences, unsafe supervision, preconditions for unsafe acts, and workers' unsafe acts. Partial Least Square Structural Equation Modeling [PLS-SEM] was utilized for modeling the interactions between these deficiencies. It was demonstrated that organizational deficiencies had a direct positive effect on workers' violations (path coefficient=0.16) and workers' errors (path coefficient=0.23). The effect of unsafe supervision on workers' violations and workers' errors was also significant with the path coefficients of 0.14 and 0.20. Likewise, preconditions for unsafe acts also had a significant effect on both workers' violations (path coefficient=0.16) and workers' errors (path coefficient=0.21). Moreover, organizational deficiencies had an indirect positive effect on workers' unsafe acts mediated by unsafe supervision and preconditions for unsafe acts. Among the variables examined in the current study, organizational influences had the strongest impacts on workers' unsafe acts. Organizational deficiencies are the main causes of accidents in mining sectors that affects all other aspects of system safety. For preventing occupational accidents, organizational deficiencies should be modified first.
Genome mining for ribosomally synthesized natural products.
Velásquez, Juan E; van der Donk, Wilfred A
2011-02-01
In recent years, the number of known peptide natural products that are synthesized via the ribosomal pathway has rapidly grown. Taking advantage of sequence homology among genes encoding precursor peptides or biosynthetic proteins, in silico mining of genomes combined with molecular biology approaches has guided the discovery of a large number of new ribosomal natural products, including lantipeptides, cyanobactins, linear thiazole/oxazole-containing peptides, microviridins, lasso peptides, amatoxins, cyclotides, and conopeptides. In this review, we describe the strategies used for the identification of these ribosomally synthesized and posttranslationally modified peptides (RiPPs) and the structures of newly identified compounds. The increasing number of chemical entities and their remarkable structural and functional diversity may lead to novel pharmaceutical applications. Copyright © 2010 Elsevier Ltd. All rights reserved.
Three-dimensional organic Dirac-line materials due to nonsymmorphic symmetry: A data mining approach
NASA Astrophysics Data System (ADS)
Geilhufe, R. Matthias; Bouhon, Adrien; Borysov, Stanislav S.; Balatsky, Alexander V.
2017-01-01
A data mining study of electronic Kohn-Sham band structures was performed to identify Dirac materials within the Organic Materials Database. Out of that, the three-dimensional organic crystal 5,6-bis(trifluoromethyl)-2-methoxy-1H-1,3-diazepine was found to host different Dirac-line nodes within the band structure. From a group theoretical analysis, it is possible to distinguish between Dirac-line nodes occurring due to twofold degenerate energy levels protected by the monoclinic crystalline symmetry and twofold degenerate accidental crossings protected by the topology of the electronic band structure. The obtained results can be generalized to all materials having the space group P2_1/c (No. 14, C_2h^5) by introducing three distinct topological classes.
Statistical data mining of streaming motion data for fall detection in assistive environments.
Tasoulis, S K; Doukas, C N; Maglogiannis, I; Plagianakos, V P
2011-01-01
The analysis of human motion data is of interest for activity recognition and emergency event detection, especially in the case of elderly or disabled people living independently in their homes. Several techniques have been proposed for identifying such distress situations using motion, audio or video sensors placed either on the monitored subject (wearable sensors) or in the surrounding environment. The output of such sensors is data streams that require real-time recognition, especially in emergency situations; thus traditional classification approaches may not be applicable for immediate alarm triggering or fall prevention. This paper presents a statistical mining methodology that may be used for the specific problem of real-time fall detection. Visual data captured from the user's environment using overhead cameras, along with motion data collected from accelerometers on the subject's body, are fed to the fall detection system. The paper includes the details of the stream data mining methodology incorporated in the system along with an initial evaluation of the achieved accuracy in detecting falls.
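The abstract does not give the details of the statistical methodology, so the following is only a generic sketch of alarm triggering on a data stream, not the authors' method. It runs a one-sided CUSUM test over accelerometer magnitudes; the function name, the baseline, slack and threshold values, and the sample stream are all hypothetical.

```python
from typing import Iterable, Optional


def cusum_alarm(samples: Iterable[float], baseline: float = 1.0,
                slack: float = 0.5, threshold: float = 3.0) -> Optional[int]:
    """Return the index of the first sample at which the one-sided CUSUM
    statistic exceeds `threshold`, or None if no alarm is raised.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    s = 0.0
    for i, x in enumerate(samples):
        # Accumulate only deviations above baseline + slack; never go below zero.
        s = max(0.0, s + (x - baseline - slack))
        if s > threshold:
            return i
    return None


# Hypothetical acceleration magnitudes (in g): quiet signal, then a sudden spike.
stream = [1.0, 1.1, 0.9, 1.0, 3.2, 4.1, 3.8, 1.0]
alarm_index = cusum_alarm(stream)
print(alarm_index)
```

Because the statistic is updated one sample at a time with constant memory, a test of this kind can run as the data arrive, which is the property the abstract contrasts with traditional batch classification.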
Applying Data Mining Techniques to Improve Breast Cancer Diagnosis.
Diz, Joana; Marreiros, Goreti; Freitas, Alberto
2016-09-01
In breast cancer research, more computer-aided diagnosis systems than ever are being developed with the aim of reducing false positives in diagnostic tests. Within this work, we present a data mining based approach which might support oncologists in the process of breast cancer classification and diagnosis. The present study aims to compare two breast cancer datasets and find the best methods for predicting benign/malignant lesions, breast density classification, and finding identification (mass / microcalcification distinction). To carry out these tasks, two matrices of texture feature extraction were implemented using Matlab, and classified using data mining algorithms in WEKA. Results revealed good percentages of accuracy for each class: 89.3 to 64.7 % - benign/malignant; 75.8 to 78.3 % - dense/fatty tissue; 71.0 to 83.1 % - finding identification. Among the classifiers tested, Naive Bayes was the best at identifying mass texture, and Random Forests was the first or second best classifier for the majority of tested groups.
APPLYING DATA MINING APPROACHES TO FURTHER ...
This dataset will be used to illustrate various data mining techniques to biologically profile the chemical space.
Monitoring genotoxic exposure in uranium mines
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sram, R.J.; Vesela, D.; Vesely, D.
1993-10-01
Recent data from deep uranium mines in Czechoslovakia indicated that miners are exposed to other mutagenic factors in addition to radon daughter products. Mycotoxins were identified as a possible source of mutagens in these mines. Mycotoxins were examined in 38 samples from mines and in throat swabs taken from 116 miners and 78 controls. The following mycotoxins were identified from mine samples: aflatoxins B1 and G1, citrinin, citreoviridin, mycophenolic acid, and sterigmatocystin. Some mold strains isolated from mines and throat swabs were investigated for mutagenic activity by the SOS chromotest and Salmonella assay with strains TA100 and TA98. Mutagenicity was observed, especially with metabolic activation in vitro. These data suggest that mycotoxins produced by molds in uranium mines are a new genotoxic factor in uranium miners. 17 refs., 4 tabs.
Raja, Kalpana; Patrick, Matthew; Gao, Yilin; Madu, Desmond; Yang, Yuyang
2017-01-01
In the past decade, the volume of “omics” data generated by the different high-throughput technologies has expanded exponentially. Managing, storing, and analyzing this big data has been a great challenge for researchers, especially when moving towards the goal of generating testable data-driven hypotheses, which has been the promise of high-throughput experimental techniques. Different bioinformatics approaches have been developed to streamline the downstream analyses by providing independent information to interpret and provide biological inference. Text mining (also known as literature mining) is one of the commonly used approaches for automated generation of biological knowledge from the huge number of published articles. In this review paper, we discuss recent advances in approaches that integrate results from omics data and information generated from text mining approaches to uncover novel biomedical information. PMID:28331849
BOUNDS ON SUBSURFACE MERCURY FLUX FROM THE SULPHUR BANK MERCURY MINE, LAKE COUNTY, CALIFORNIA
The Sulphur Bank Mercury Mine (SBMM) in Lake County, California has been identified as a significant source of mercury to Clear Lake. The mine was operated from the 1860s through the 1950's. Mining started with surface operations, progressed to shaft mining, and later to open p...
Jiang, Li; Edwards, Stefan M; Thomsen, Bo; Workman, Christopher T; Guldbrandtsen, Bernt; Sørensen, Peter
2014-09-24
Prioritizing genetic variants is a challenge because disease susceptibility loci are often located in genes of unknown function or the relationship with the corresponding phenotype is unclear. A global data-mining exercise on the biomedical literature can establish the phenotypic profile of genes with respect to their connection to disease phenotypes. The importance of protein-protein interaction networks in the genetic heterogeneity of common diseases or complex traits is becoming increasingly recognized. Thus, the development of a network-based approach combined with phenotypic profiling would be useful for disease gene prioritization. We developed a random-set scoring model and implemented it to quantify phenotype relevance in a network-based disease gene-prioritization approach. We validated our approach based on different gene phenotypic profiles, which were generated from PubMed abstracts, OMIM, and GeneRIF records. We also investigated the validity of several vocabulary filters and different likelihood thresholds for predicted protein-protein interactions in terms of their effect on the network-based gene-prioritization approach, which relies on text-mining of the phenotype data. Our method demonstrated good precision and sensitivity compared with those of two alternative complex-based prioritization approaches. We then conducted a global ranking of all human genes according to their relevance to a range of human diseases. The resulting accurate ranking of known causal genes supported the reliability of our approach. Moreover, these data suggest many promising novel candidate genes for human disorders that have a complex mode of inheritance. We have implemented and validated a network-based approach to prioritize genes for human diseases based on their phenotypic profile. We have devised a powerful and transparent tool to identify and rank candidate genes. 
Our global gene prioritization provides a unique resource for the biological interpretation of data from genome-wide association studies, and will help in the understanding of how the associated genetic variants influence disease or quantitative phenotypes.
NASA Astrophysics Data System (ADS)
Phuong, Vu Hung
2018-03-01
This research applies the Data Envelopment Analysis (DEA) approach to analyze Total Factor Productivity (TFP) and efficiency changes in Vietnam's coal mining industry from 2007 to 2013. The TFP of Vietnam's coal mining companies decreased due to slow technological progress and unimproved efficiency. The decline in technical efficiency in many enterprises indicates that the coal mining industry has a large potential to increase productivity through technical efficiency improvement. Enhancing human resource training and investment in technology and research & development could help the industry improve its efficiency and productivity.
Gallego, J R; Esquinas, N; Rodríguez-Valdés, E; Menéndez-Aguado, J M; Sierra, C
2015-12-30
The abandonment of Hg-As mining and metallurgy sites, together with long-term weathering, can dramatically degrade the environment. This work illustrates the complex legacy of contamination that afflicts Hg-As brownfields through the detailed study of a paradigmatic site. Firstly, an in-depth study of the former industrial process was performed to identify sources of different types of waste. Subsequently, the composition and reactivity of As- and Hg-rich wastes (calcines, As-rich soot, stupp, and flue dust) were analyzed by means of multielemental analysis, mineralogical characterization (X-ray diffraction, electron and optical microscopy, microprobe), chemical speciation, and sequential extractions. As-rich soot in the form of arsenolite, a relatively mobile by-product of the pyrometallurgical process, and stupp, a residue originating in the former condensing system, were determined to be the main risks at the site. In addition, organic pollution was screened, as shown by the detection of benzo(a)pyrene and other PAHs, and by the identification of unexpected organic Hg compounds (phenylmercury propionate). The approach followed reveals evidence of mining and metallurgy waste that may be present at other similar sites, and identifies unexpected contaminants overlooked by conventional analyses. Copyright © 2015 Elsevier B.V. All rights reserved.
Zou, Wei; She, Jianwen; Tolstikov, Vladimir V.
2013-01-01
Currently available biomarkers lack sensitivity and/or specificity for early detection of cancer. To address this challenge, a robust and complete workflow for metabolic profiling and data mining is described in detail. Three independent and complementary analytical techniques for metabolic profiling are applied: hydrophilic interaction liquid chromatography (HILIC–LC), reversed-phase liquid chromatography (RP–LC), and gas chromatography (GC). All three techniques are coupled to a mass spectrometer (MS) in the full scan acquisition mode, and both unsupervised and supervised methods are used for data mining. Univariate and multivariate feature selection is used to determine subsets of potentially discriminative predictors. These predictors are further identified by obtaining accurate masses and isotopic ratios using selected ion monitoring (SIM) and data-dependent MS/MS and/or accurate mass MSn ion tree scans utilizing high resolution MS. A list combining all of the identified potential biomarkers generated from different platforms and algorithms is used for pathway analysis. Such a workflow combining comprehensive metabolic profiling and advanced data mining techniques may provide a powerful approach for metabolic pathway analysis and biomarker discovery in cancer research. Two case studies with previously published data are adapted and included to illustrate the application of the workflow. PMID:24958150
Zhang, Lei; Li, Yin; Guo, Xinfeng; May, Brian H.; Xue, Charlie C. L.; Yang, Lihong; Liu, Xusheng
2014-01-01
Objectives. To apply modern text-mining methods to identify candidate herbs and formulae for the treatment of diabetic nephropathy. Methods. The method we developed includes three steps: (1) identification of candidate ancient terms; (2) systematic search and assessment of medical records written in classical Chinese; (3) preliminary evaluation of the effect and safety of candidates. Results. The ancient terms Xia Xiao, Shen Xiao, and Xiao Shen were determined to be the most likely to correspond with diabetic nephropathy and were used in text mining. A total of 80 Chinese formulae for treating conditions congruent with diabetic nephropathy, recorded in medical books from the Tang Dynasty to the Qing Dynasty, were collected. Sao si tang (also called Reeling Silk Decoction) was chosen to show the process of preliminary evaluation of the candidates. It had promising potential for development as a new agent for the treatment of diabetic nephropathy. However, further investigations of its safety for patients with renal insufficiency are still needed. Conclusions. The methods developed in this study offer a targeted approach to identifying traditional herbs and/or formulae as candidates for further investigation in the search for new drugs for modern disease. However, more effort is still required to improve our techniques, especially with regard to compound formulae. PMID:24744808
Defining hazard from the mine worker's perspective
Eiter, B.M.; Kosmoski, C.L.; Connor, B.P.
2016-01-01
In the recent past, the mining industry has witnessed a substantial increase in the numbers of fatalities occurring at metal and nonmetal mine sites, but it is unclear why this is occurring. One possible explanation is that workers struggle with identifying worksite hazards and accurately assessing the associated risk. The purpose of this research was to explore this possibility within the mining industry and to more fully understand stone, sand and gravel (SSG) mine workers' thoughts, understandings and perceptions of worksite hazards and risks. Eight mine workers were interviewed and asked to identify common hazards they come across when doing their jobs and to then discuss their perceptions of the risks associated with those identified hazards. The results of this exploratory study indicate the importance of workers' job-related experience as it applies to hazard identification and risk perception, particularly their knowledge of or familiarity with a task, whether or not they had personal control over that task, and the frequency with which they perform that task. PMID:28042176
Almeida, C E; Folly-Ramos, E; Peterson, A T; Lima-Neiva, V; Gumiel, M; Duarte, R; Lima, M M; Locks, M; Beltrão, M; Costa, J
2009-12-01
Searches for Chagas disease vectors were performed at the type locality from which Triatoma sherlocki Papa et al. (Hemiptera: Reduviidae: Triatominae) was described in the municipality of Gentio do Ouro, in the state of Bahia, Brazil, and in a small artisan quarry-mining community approximately 13 km distant in a remote area of the same municipality. The latter site represents a new locality record for this species. Adults, nymphs and exuviae of T. sherlocki were found in 21% of human dwellings, indicating that the species is in the process of domiciliation. Prevalence of Trypanosoma cruzi infection in collected bugs was 10.8%. Simple predictive approaches based on environmental similarity were used to identify additional sites likely suitable for this species. The approach successfully predicted an additional five sites for the species in surrounding landscapes. Ecological and entomological indicators were combined to discuss whether this scenario likely represents an isolated case or an emerging public health problem.
Quality Assurance of NCI Thesaurus by Mining Structural-Lexical Patterns
Abeysinghe, Rashmie; Brooks, Michael A.; Talbert, Jeffery; Licong, Cui
2017-01-01
Quality assurance of biomedical terminologies such as the National Cancer Institute (NCI) Thesaurus is an essential part of the terminology management lifecycle. We investigate a structural-lexical approach based on non-lattice subgraphs to automatically identify missing hierarchical relations and missing concepts in the NCI Thesaurus. We mine six structural-lexical patterns exhibited in non-lattice subgraphs: containment, union, intersection, union-intersection, inference-contradiction, and inference-union. Each pattern indicates a potential specific type of error and suggests a potential type of remediation. We found 809 non-lattice subgraphs with these patterns in the NCI Thesaurus (version 16.12d). Domain experts evaluated a random sample of 50 small non-lattice subgraphs, of which 33 were confirmed to contain errors and make correct suggestions (33/50 = 66%). Of the 25 evaluated subgraphs revealing multiple patterns, 22 were verified correct (22/25 = 88%). This shows the effectiveness of our structural-lexical-pattern-based approach in detecting errors and suggesting remediations in the NCI Thesaurus. PMID:29854100
Systematic review of community health impacts of mountaintop removal mining.
Boyles, Abee L; Blain, Robyn B; Rochester, Johanna R; Avanasi, Raghavendhran; Goldhaber, Susan B; McComb, Sofie; Holmgren, Stephanie D; Masten, Scott A; Thayer, Kristina A
2017-10-01
The objective of this evaluation is to understand the human health impacts of mountaintop removal (MTR) mining, the major method of coal mining in and around Central Appalachia. MTR mining impacts the air, water, and soil and raises concerns about potential adverse health effects in neighboring communities; exposures associated with MTR mining include particulate matter (PM), polycyclic aromatic hydrocarbons (PAHs), metals, hydrogen sulfide, and other recognized harmful substances. A systematic review was conducted of published studies of MTR mining and community health, occupational studies of MTR mining, and any available animal and in vitro experimental studies investigating the effects of exposures to MTR-mining-related chemical mixtures. Six databases (Embase, PsycINFO, PubMed, Scopus, Toxline, and Web of Science) were searched with customized terms, and no restrictions on publication year or language, through October 27, 2016. The eligibility criteria included all human population studies and animal models of human health, direct and indirect measures of MTR-mining exposure, any health-related effect or change in physiological response, and any study design type. Risk of bias was assessed for observational and experimental studies using an approach developed by the National Toxicology Program (NTP) Office of Health Assessment and Translation (OHAT). To provide context for these health effects, a summary of the exposure literature is included that focuses on describing findings for outdoor air, indoor air, and drinking water. From a literature search capturing 3088 studies, 33 human studies (29 community, four occupational), four experimental studies (two in rat, one in vitro and in mice, one in C. elegans), and 58 MTR mining exposure studies were identified. A number of health findings were reported in observational human studies, including cardiopulmonary effects, mortality, and birth defects. 
However, concerns for risk of bias were identified, especially with respect to exposure characterization, accounting for confounding variables (such as socioeconomic status), and methods used to assess health outcomes. Typically, exposure was assessed by proximity of residence or hospital to coal mining or production level at the county level. In addition, assessing the consistency of findings was challenging because separate publications likely included overlapping case and comparison groups. For example, 11 studies of mortality were conducted with most reporting higher rates associated with coal mining, but many of these relied on the same national datasets and were unable to consider individual-level contributors to mortality such as poor socioeconomic status or smoking. Two studies of adult rats reported impaired microvascular and cardiac mitochondrial function after intratracheal exposure to PM from MTR-mining sites. Exposures associated with MTR mining included reports of PM levels that sometimes exceeded Environmental Protection Agency (EPA) standards; higher levels of dust, trace metals, hydrogen sulfide gas; and a report of increased public drinking water violations. This systematic review could not reach conclusions on community health effects of MTR mining because of the strong potential for bias in the current body of human literature. Improved characterization of exposures by future community health studies and further study of the effects of MTR mining chemical mixtures in experimental models will be critical to determining health risks of MTR mining to communities. Without such work, uncertainty will remain regarding the impact of these practices on the health of the people who breathe the air and drink the water affected by MTR mining. Published by Elsevier Ltd.
Search for underground openings for in situ test facilities in crystalline rock
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wollenberg, H.A.; Strisower, B.; Corrigan, D.J.
1980-01-01
With a few exceptions, crystalline rocks in this study were limited to plutonic rocks and medium to high-grade metamorphic rocks. Nearly 1700 underground mines, possibly occurring in crystalline rock, were initially identified. Application of criteria resulted in the identification of 60 potential sites. Within this number, 26 mines and 4 civil works were identified as having potential in that they fulfilled the criteria. Thirty other mines may have similar potential. Most of the mines identified are near the contact between a pluton and older sedimentary, volcanic and metamorphic rocks. However, some mines and the civil works are well within plutonic or metamorphic rock masses. Civil works, notably underground galleries associated with pumped storage hydroelectric facilities, are generally located in tectonically stable regions, in relatively homogeneous crystalline rock bodies. A program is recommended which would identify one or more sites where a concordance exists between geologic setting, company amenability, accessibility and facilities to conduct in situ tests in crystalline rock.
NASA Astrophysics Data System (ADS)
Qu, Shen; Wang, Guangcai; Shi, Zheming; Xu, Qingyu; Guo, Yuying; Ma, Luan; Sheng, Yizhi
2018-05-01
With depleted coal resources or deteriorating mining geological conditions, some coal mines have been abandoned in the Fengfeng mining district, China. Water that accumulates in an abandoned underground mine (goaf water) may be a hazard to neighboring mines and impact the groundwater environment. Groundwater samples at three abandoned mines (Yi, Er and Quantou mines) in the Fengfeng mining district and the underlying Ordovician limestone aquifer were collected to characterize their chemical and isotopic compositions and identify the sources of the mine water. The water was HCO3·SO4-Ca·Mg type in Er mine and the auxiliary shaft of Yi mine, and HCO3·SO4-Na type in the main shaft of Quantou mine. The isotopic compositions (δD and δ18O) of water in the three abandoned mines were close to that of Ordovician limestone groundwater. Faults in the abandoned mines were well developed, possibly facilitating inflows of groundwater from the underlying Ordovician limestone aquifers into the coal mines. Although the Sr2+ concentrations differed considerably, the ratios of Sr2+/Ca2+ and 87Sr/86Sr and the 34S content of SO4 2- were similar for all three mine waters and Ordovician limestone groundwater, indicating that a close hydraulic connection may exist. Geochemical and isotopic indicators suggest that (1) the mine waters may originate mainly from the Ordovician limestone groundwater inflows, and (2) the upward hydraulic gradient in the limestone aquifer may prevent its contamination by the overlying abandoned mine water. The results of this study could be useful for water resources management in this area and other similar mining areas.
Risk Management Interventions to Reduce Injuries and Maximize Economic Benefits in U.S. Mining.
Griffin, Stephanie C; Bui, David P; Gowrisankaran, Gautam; Lutz, Eric A; He, Charles; Hu, Chengcheng; Burgess, Jefferey L
2018-03-01
Risk management (RM) is a cyclical process of identifying and ranking risks, implementing controls, and evaluating their effectiveness. This study aims to identify effective RM interventions in the U.S. mining industry. RM interventions were identified in four companies representing metal, aggregate, and coal mining sectors. Injury rates were determined using Mine Safety and Health Administration (MSHA) data and changes in injury rates identified through change point analysis. Program implementation costs and associated changes in injury costs were evaluated for select interventions. Six of 20 RM interventions were associated with a decline in all injuries and one with a reduction in lost-time injuries, all with a positive return on investment. Reductions in injuries and associated costs were observed following implementation of a limited number of specific RM interventions.
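The abstract names change point analysis without specifying the variant used. As a minimal sketch under that caveat, the code below finds a single shift in the mean of a rate series by trying every split and keeping the one with the smallest total squared error. The function names and the annual injury rates are hypothetical and are not MSHA data.

```python
from typing import List, Tuple


def sse(values: List[float]) -> float:
    """Sum of squared deviations from the segment mean (0.0 for an empty segment)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_change_point(rates: List[float]) -> Tuple[int, float]:
    """Return (k, cost) such that splitting into rates[:k] and rates[k:]
    minimises the combined within-segment squared error."""
    best_k, best_cost = 1, float("inf")
    for k in range(1, len(rates)):
        cost = sse(rates[:k]) + sse(rates[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost


# Hypothetical injuries per 100 workers per year; an intervention is assumed
# to take effect between the fourth and fifth observations.
rates = [5.1, 4.9, 5.3, 5.0, 3.1, 2.9, 3.0, 3.2]
k, cost = best_change_point(rates)
print(k, round(cost, 4))
```

A located change point only shows that the level of the series shifted at that time. Attributing the shift to a particular intervention, as the study does, additionally requires the intervention date and some check that the shift is not due to chance.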
Chen, Annie T; Zhu, Shu-Hong; Conway, Mike
2015-09-29
The rise in popularity of electronic cigarettes (e-cigarettes) and hookah over recent years has been accompanied by some confusion and uncertainty regarding the development of an appropriate regulatory response towards these emerging products. Mining online discussion content can lead to insights into people's experiences, which can in turn further our knowledge of how to address potential health implications. In this work, we take a novel approach to understanding the use and appeal of these emerging products by applying text mining techniques to compare consumer experiences across discussion forums. This study examined content from the websites Vapor Talk, Hookah Forum, and Reddit to understand people's experiences with different tobacco products. Our investigation involves three parts. First, we identified contextual factors that inform our understanding of tobacco use behaviors, such as setting, time, social relationships, and sensory experience, and compared the forums to identify the ones where content on these factors is most common. Second, we compared how the tobacco use experience differs with combustible cigarettes and e-cigarettes. Third, we investigated differences between e-cigarette and hookah use. In the first part of our study, we employed a lexicon-based extraction approach to estimate prevalence of contextual factors, and then we generated a heat map based on these estimates to compare the forums. In the second and third parts of the study, we employed a text mining technique called topic modeling to identify important topics and then developed a visualization, Topic Bars, to compare topic coverage across forums. In the first part of the study, we identified two forums, Vapor Talk Health & Safety and the Stopsmoking subreddit, where discussion concerning contextual factors was particularly common. 
The second part showed that the discussion in Vapor Talk Health & Safety focused on symptoms and comparisons of combustible cigarettes and e-cigarettes, and the Stopsmoking subreddit focused on psychological aspects of quitting. Last, we examined the discussion content on Vapor Talk and Hookah Forum. Prominent topics included equipment, technique, experiential elements of use, and the buying and selling of equipment. This study has three main contributions. Discussion forums differ in the extent to which their content may help us understand behaviors with potential health implications. Identifying dimensions of interest and using a heat map visualization to compare across forums can be helpful for identifying forums with the greatest density of health information. Additionally, our work has shown that the quitting experience can potentially be very different depending on whether or not e-cigarettes are used. Finally, e-cigarette and hookah forums are similar in that members represent a "hobbyist culture" that actively engages in information exchange. These differences have important implications for both tobacco regulation and smoking cessation intervention design.
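The lexicon-based prevalence estimate described in the first part of the study can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the lexicon terms, the forum names and the example posts are all invented, and prevalence is taken to mean the fraction of posts that mention at least one term for a given contextual factor.

```python
import re
from typing import Dict, List, Set

# Hypothetical lexicon: contextual factor -> trigger terms.
LEXICON: Dict[str, Set[str]] = {
    "setting": {"home", "car", "bar", "work"},
    "time": {"morning", "night", "weekend"},
    "social": {"friend", "friends", "family", "wife", "husband"},
    "sensory": {"taste", "flavor", "smell", "throat"},
}


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def factor_prevalence(posts: List[str]) -> Dict[str, float]:
    """Fraction of posts that contain at least one lexicon term per factor."""
    counts = {factor: 0 for factor in LEXICON}
    for post in posts:
        tokens = tokenize(post)
        for factor, terms in LEXICON.items():
            if tokens & terms:
                counts[factor] += 1
    n = len(posts)
    return {factor: (c / n if n else 0.0) for factor, c in counts.items()}


# Invented posts standing in for two forums.
forum_posts = {
    "forum_a": ["I vape at home every morning", "The flavor is great", "Bought a new coil"],
    "forum_b": ["Smoked hookah with friends at a bar last night", "Love the smell"],
}
prevalence = {name: factor_prevalence(posts) for name, posts in forum_posts.items()}
for name, scores in prevalence.items():
    print(name, scores)
```

The resulting forum-by-factor table is the kind of matrix a heat map would display, with one row per forum and one column per contextual factor.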
Wide-Open: Accelerating public data release by automating detection of overdue datasets
Poon, Hoifung; Howe, Bill
2017-01-01
Open data is a vital pillar of open science and a key enabler for reproducibility, data reuse, and novel discoveries. Enforcement of open-data policies, however, largely relies on manual efforts, which invariably lag behind the increasingly automated generation of biological data. To address this problem, we developed a general approach to automatically identify datasets overdue for public release by applying text mining to identify dataset references in published articles and parse query results from repositories to determine if the datasets remain private. We demonstrate the effectiveness of this approach on 2 popular National Center for Biotechnology Information (NCBI) repositories: Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA). Our Wide-Open system identified a large number of overdue datasets, which spurred administrators to respond directly by releasing 400 datasets in one week. PMID:28594819
Wide-Open: Accelerating public data release by automating detection of overdue datasets.
Grechkin, Maxim; Poon, Hoifung; Howe, Bill
2017-06-01
Open data is a vital pillar of open science and a key enabler for reproducibility, data reuse, and novel discoveries. Enforcement of open-data policies, however, largely relies on manual efforts, which invariably lag behind the increasingly automated generation of biological data. To address this problem, we developed a general approach to automatically identify datasets overdue for public release by applying text mining to identify dataset references in published articles and parse query results from repositories to determine if the datasets remain private. We demonstrate the effectiveness of this approach on 2 popular National Center for Biotechnology Information (NCBI) repositories: Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA). Our Wide-Open system identified a large number of overdue datasets, which spurred administrators to respond directly by releasing 400 datasets in one week.
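The first step of the approach, spotting dataset references in article text, might look roughly like the sketch below. It assumes that GEO series accessions have the form `GSE` followed by digits and SRA study accessions the form `SRP` followed by digits. The patterns, the function name and the example sentence are illustrative and are not taken from the Wide-Open implementation.

```python
import re
from typing import Dict, List

# Assumed accession formats; the actual Wide-Open patterns may differ.
ACCESSION_PATTERNS = {
    "GEO": re.compile(r"\bGSE\d+\b"),
    "SRA": re.compile(r"\bSRP\d+\b"),
}


def find_accessions(text: str) -> Dict[str, List[str]]:
    """Return the sorted, de-duplicated accessions found for each repository."""
    return {
        repo: sorted(set(pattern.findall(text)))
        for repo, pattern in ACCESSION_PATTERNS.items()
    }


# Invented sentence with made-up accession numbers.
article = ("Expression data are available under GSE12345; "
           "reads are deposited under SRP067890 and GSE12345.")
found = find_accessions(article)
print(found)
```

The second step described in the abstract, querying the repository and parsing the result to see whether each accession is still private, is deliberately left out here because it depends on the repository's query interface.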
Chemical Topic Modeling: Exploring Molecular Data Sets Using a Common Text-Mining Approach.
Schneider, Nadine; Fechner, Nikolas; Landrum, Gregory A; Stiefl, Nikolaus
2017-08-28
Big data is one of the key transformative factors which increasingly influences all aspects of modern life. Although this transformation brings vast opportunities, it also generates novel challenges, not the least of which is organizing and searching this data deluge. The field of medicinal chemistry is no different: more and more data are being generated, for instance, by technologies such as DNA encoded libraries, peptide libraries, text mining of large literature corpora, and new in silico enumeration methods. Handling those huge sets of molecules effectively is quite challenging and requires compromises that often come at the expense of the interpretability of the results. In order to find an intuitive and meaningful approach to organizing large molecular data sets, we adopted a probabilistic framework called "topic modeling" from the text-mining field. Here we present the first chemistry-related implementation of this method, which allows large molecule sets to be assigned to "chemical topics" and the relationships between those topics to be investigated. In this first study, we thoroughly evaluate this novel method in different experiments and discuss both its disadvantages and advantages. We show very promising results in reproducing human-assigned concepts using the approach to identify and retrieve chemical series from sets of molecules. We have also created an intuitive visualization of the chemical topics output by the algorithm. This is a huge benefit compared to other unsupervised machine-learning methods, like clustering, which are commonly used to group sets of molecules. Finally, we applied the new method to the 1.6 million molecules of the ChEMBL22 data set to test its robustness and efficiency. In about 1 h we built a 100-topic model of this large data set in which we could identify interesting topics like "proteins", "DNA", or "steroids".
Along with this publication we provide our data sets and an open-source implementation of the new method (CheTo) which will be part of an upcoming version of the open-source cheminformatics toolkit RDKit.
NASA Astrophysics Data System (ADS)
Fryanov, V. N.; Pavlova, L. D.; Temlyantsev, M. V.
2017-09-01
Methodological approaches to theoretical substantiation of the structure and parameters of robotic coal mines are outlined. The results of mathematical and numerical modeling revealed how geomechanical and gas dynamic processes manifest themselves in the conditions of robotic mines. Technological solutions for the design and manufacture of technical means for robotic mines are adopted using the method of economic and mathematical modeling and in accordance with the current regulatory documents. For a comparative performance evaluation of technological schemes of traditional and robotic mines, methods of cognitive modeling and matrix search for subsystem elements in the synthesis of a complex geotechnological system are applied. It is substantiated that the technical re-equipment of a traditional mine, with a phased transition to a robotic mine, will reduce unit costs by a factor of almost 1.5, with a significant social effect due to a reduction in the number of personnel engaged in hazardous work.
A malware detection scheme based on mining format information.
Bai, Jinrong; Wang, Junfeng; Zou, Guozhong
2014-01-01
Malware has become one of the most serious threats to computer information systems, and current malware detection technology still has very significant limitations. In this paper, we propose a malware detection approach based on mining the format information of PE (portable executable) files. Based on in-depth analysis of the static format information of PE files, we extracted 197 features and applied feature selection methods to reduce the dimensionality of the features while achieving acceptably high performance. When classification algorithms were trained on the selected features, our experiments indicate that the accuracy of the top classification algorithm is 99.1% and the value of the AUC is 0.998. We designed three experiments to evaluate the performance of our detection scheme and its ability to detect unknown and new malware. Although the experimental results for identifying new malware are not perfect, our method is still able to identify 97.6% of new malware with a 1.3% false positive rate.
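The detection rate, false positive rate, and accuracy quoted above are all ratios of confusion-matrix counts. The following is a minimal sketch of that arithmetic only; the function name and the counts are hypothetical and are not taken from the paper's experiments.

```python
def detection_metrics(tp, fn, fp, tn):
    """Return (detection_rate, false_positive_rate, accuracy) from confusion-matrix counts.

    tp: malware flagged as malware     fn: malware missed
    fp: benign files flagged as malware tn: benign files passed
    """
    detection_rate = tp / (tp + fn)          # share of malware that is caught
    false_positive_rate = fp / (fp + tn)     # share of benign files wrongly flagged
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return detection_rate, false_positive_rate, accuracy


# Hypothetical counts, chosen only to illustrate the calculation.
dr, fpr, acc = detection_metrics(tp=976, fn=24, fp=13, tn=987)
print(f"detection rate={dr:.3f}, false positive rate={fpr:.3f}, accuracy={acc:.4f}")
```

Reporting the detection rate and the false positive rate together matters because either one can be improved trivially at the expense of the other.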
BioProspecting: novel marker discovery obtained by mining the bibleome.
Elkin, Peter L; Tuttle, Mark S; Trusko, Brett E; Brown, Steven H
2009-02-05
BioProspecting is a novel approach that enabled our team to mine data related to genetic markers from the New England Journal of Medicine (NEJM) utilizing SNOMED CT and the Human Gene Ontology (HUGO). The Biomedical Informatics Research Collaborative was able to link genes and disorders using the Multi-threaded Clinical Vocabulary Server (MCVS) and natural language processing engine, whose output creates an ontology-network using the semantic encodings of the literature that is organized by these two terminologies. We identified relationships between (genes or proteins) and (diseases or drugs) as linked by metabolic functions and identified potentially novel functional relationships between, for example, genes and diseases (e.g. Article #1 ([Gene - IL27] => {Enzyme - Dipeptidyl Carboxypeptidase 1}) and Article #2 ({Enzyme - Dipeptidyl Carboxypeptidase 1} <= [Disorder - Type II DM]) showing a metabolic link between IL27 and Type II DM). In this manuscript we describe our method for developing the database and its content as well as its potential to assist in the discovery of novel markers and drugs.
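The IL27 example above links two articles through a shared intermediate concept. A minimal sketch of that join is shown here, using only the relation quoted in the abstract; the data layout and the function name `link_via_shared_node` are assumptions for illustration and do not describe the MCVS implementation.

```python
from collections import defaultdict

# Relations from the abstract's example, stored as (entity, shared enzyme).
gene_to_enzyme = [("IL27", "Dipeptidyl Carboxypeptidase 1")]          # Article #1
disorder_to_enzyme = [("Type II DM", "Dipeptidyl Carboxypeptidase 1")]  # Article #2


def link_via_shared_node(left, right):
    """Join two relation lists on their shared intermediate node."""
    by_mid = defaultdict(list)
    for entity, mid in right:
        by_mid[mid].append(entity)
    # Each result is (left entity, shared node, right entity).
    return [(l, mid, r) for l, mid in left for r in by_mid.get(mid, [])]


links = link_via_shared_node(gene_to_enzyme, disorder_to_enzyme)
for gene, enzyme, disorder in links:
    print(f"{gene} -> {enzyme} <- {disorder}")
```

Indexing one side by the intermediate node keeps the join linear in the number of relations rather than comparing every pair.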
DrugQuest - a text mining workflow for drug association discovery.
Papanikolaou, Nikolas; Pavlopoulos, Georgios A; Theodosiou, Theodosios; Vizirianakis, Ioannis S; Iliopoulos, Ioannis
2016-06-06
Text mining and data integration methods are gaining ground in the field of health sciences due to the exponential growth of biomedical literature and information stored in biological databases. While such methods mostly try to extract bioentity associations from PubMed, very few of them are dedicated to mining other types of repositories such as chemical databases. Herein, we apply a text mining approach to the DrugBank database in order to explore drug associations based on the DrugBank "Description", "Indication", "Pharmacodynamics" and "Mechanism of Action" text fields. We apply Named Entity Recognition (NER) techniques to these fields to identify chemicals, proteins, genes, pathways and diseases, and we utilize the TextQuest algorithm to find additional biologically significant words. Using a variety of similarity and partitional clustering techniques, we group the DrugBank records based on their common terms and investigate possible reasons why these records are clustered together. Different views, such as chemicals clustered by their textual information and tag clouds consisting of Significant Terms along with the terms that were used for clustering, are delivered to the user through a user-friendly web interface. DrugQuest is a text mining tool for knowledge discovery: it is designed to cluster DrugBank records based on text attributes in order to find new associations between drugs. The service is freely available at http://bioinformatics.med.uoc.gr/drugquest .
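Grouping records "based on their common terms" needs some measure of term overlap. The sketch below uses Jaccard similarity as one such measure; this is an assumption chosen for illustration, not the TextQuest algorithm or DrugQuest's actual clustering, and the record names and terms are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two term sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(records, threshold):
    """Return (record, record, similarity) for pairs at or above the threshold."""
    pairs = []
    for x, y in combinations(sorted(records), 2):
        score = jaccard(records[x], records[y])
        if score >= threshold:
            pairs.append((x, y, score))
    return pairs


# Hypothetical records: each maps to the significant terms found in its text fields.
records = {
    "drug_a": {"kinase", "inhibitor", "leukemia"},
    "drug_b": {"kinase", "inhibitor", "lymphoma"},
    "drug_c": {"antibiotic", "ribosome"},
}

pairs = similar_pairs(records, threshold=0.4)
print(pairs)
```

The pairwise scores can then feed any clustering step; the shared terms of a pair are what a user would inspect to understand why two records ended up together.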
NASA Astrophysics Data System (ADS)
Masaitis, A.
2013-12-01
Conflicts in the development of mining projects are now common between the mining proponents, NGOs and communities. These conflicts can sometimes be alleviated by early development of modes of communication, and a formal discussion format that allows airing of concerns and potential resolution of problems. One of the methods that can formalize this process is to establish a Good Neighbor Agreement (GNA), which deals specifically with challenges in relationships between mining operations and the local communities. It is a new practice related to mining operations that is oriented toward the social needs and concerns of local communities that arise during the normal life of a mine, and it can help achieve sustainable mining practices in both developing and developed countries. The GNA project currently being developed at the University of Nevada, Reno in cooperation with the Newmont Mining Corporation has the goal of creating an open company/community dialog that is based on international standards and that will help identify and address sociological and environmental concerns associated with mining, as well as find methods for communication and conflict resolution. GNA standards should be based on trust doctrine, open information access, and community involvement in the decision making process. They should include the following components: emergency response and community communications; environmental issues, including air and water quality standards; reclamation and recultivation; socio-economic issues: transportation, safety, training, and local hiring; and financial issues, particularly related to mitigation offsets and community needs.
The GNA standards help identify and evaluate conflict criteria in mining/community relationships; determine the status of concerns; focus on the local political and government systems; separate the acute and the chronic concerns; determine the role and responsibilities of stakeholders; analyze problem resolution feasibility; maintain community involvement and support through economic benefits and environmental safeguards; develop options for resolving concerns; and develop and manage short and long-term plans. Difficulties in establishing the GNA standards include identification of the full list of stakeholders, lack of responsible environmental protection practices, dependence on the government and political system, and lack of will to disclose full information to the public. The task is further complicated by the lack of insurance/bonding policies, and by the lack of audit and monitoring that could determine the level of exposure of the local community and the environment to the contaminants released at the mine sites. Since many problems of mines can occur during closure and post-closure, GNAs should address those issues also. In this work, we determined the process for GNA implementation as a conflict prevention/resolution tool, analyzed conflict/concerns criteria associated with mining operations, determined the role of the stakeholders, worked out the process of stakeholder monitoring, and carried out a sociological survey of the stakeholders and the community. Frequent conflicts between mining companies and surrounding communities that lead to work disruptions or even mine closures show the necessity of a less confrontational approach to environmental and social justice. Establishment of GNA standards for use in both developed and developing nations can decrease these conflicts.
NASA Astrophysics Data System (ADS)
Cominola, A.; Spang, E. S.; Giuliani, M.; Castelletti, A.; Loge, F. J.; Lund, J. R.
2016-12-01
Demand side management strategies are key to meeting future water and energy demands in urban contexts, promoting water and energy efficiency in the residential sector, providing customized services and communications to consumers, and reducing utilities' costs. Smart metering technologies allow gathering water and energy consumption data at high temporal and spatial resolution and support the development of data-driven models of consumers' behavior. Modelling and predicting resource consumption behavior is essential to inform demand management. Yet, analyzing big smart-metered databases requires proper data mining and modelling techniques in order to extract useful information that helps decision makers spot the end uses towards which water and energy efficiency or conservation efforts should be prioritized. In this study, we consider the following research questions: (i) How is it possible to extract representative consumers' personalities out of big smart-metered water and energy data? (ii) Are residential water and energy consumption profiles interconnected? (iii) Can we design customized water and energy demand management strategies based on the knowledge of water-energy demand profiles and other user-specific psychographic information? To address the above research questions, we contribute a data-driven approach to identify and model routines in water and energy consumers' behavior. We propose a novel customer segmentation procedure based on data-mining techniques. Our procedure consists of three steps: (i) extraction of typical water-energy consumption profiles for each household, (ii) clustering of profiles based on their similarity, and (iii) evaluation of the influence of candidate explanatory variables on the identified clusters. The approach is tested on a dataset of smart-metered water and energy consumption data from over 1000 households in Southern California.
Our methodology allows identifying heterogeneous groups of consumers from the studied sample, as well as characterizing them with respect to consumption profile features and socio-demographic information. Results show how such a better understanding of the considered users' community allows spotting potentially interesting areas for water and energy demand management interventions.
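Step (ii) of the procedure above, clustering household profiles by similarity, can be sketched with plain k-means. The abstract does not name the clustering algorithm, so k-means is an assumption here, and the profiles below are hypothetical normalized shares of daily use in three periods. Steps (i) and (iii) are not covered.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means; centroids start at the first k points, so the result is deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each profile to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical household profiles: [morning, midday, evening] share of daily use.
profiles = [
    [0.7, 0.1, 0.2],
    [0.1, 0.2, 0.7],
    [0.6, 0.2, 0.2],
    [0.2, 0.1, 0.7],
]

labels, centroids = kmeans(profiles, k=2)
print(labels)
print(centroids)
```

With these inputs the morning-heavy and evening-heavy households separate into two groups, which is the kind of segment the explanatory variables in step (iii) would then be tested against.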
NASA Technical Reports Server (NTRS)
Wier, C. E.; Wobber, F. J.; Amato, R. V.; Russell, O. R. (Principal Investigator)
1974-01-01
The author has identified the following significant results. All Skylab 2 imagery received to date has been analyzed manually, and data related to fracture analysis and mined land inventories have been summarized on map overlays. A comparison of the relative utility of the Skylab image products for fracture detection, soil tone/vegetation contrast mapping, and mined land mapping has been completed. Numerous fracture traces were detected on both color and black and white transparencies. Unique fracture trace data which will contribute to the investigator's mining hazards analysis were noted on the EREP imagery; these data could not be detected on ERTS-1 imagery or high altitude aircraft color infrared photography. Stream segments controlled by fractures or joint systems could be identified in more detail than with ERTS-1 imagery of comparable scale. ERTS-1 mine hazards products will be modified to demonstrate the value of these additional data. Skylab images were used successfully to update a mined land map of Indiana made in 1972. Changes in mined area as small as two acres can be identified. As the Energy Crisis increases the demand for coal, such demonstrations of the application of Skylab data to coal resources will take on new importance.
NASA Technical Reports Server (NTRS)
Wier, C. E.; Wobber, F. J.; Amato, R. V.; Russell, O. R. (Principal Investigator)
1973-01-01
The author has identified the following significant results. Numerous fracture traces were detected on both the color transparencies and black and white spectral bands. Fracture traces of value to mining hazards analysis were noted on the EREP imagery which could not be detected on either the ERTS-1 or high altitude aircraft color infrared photography. Several areas of mine subsidence occurring in the Busseron Creek area near Sullivan, Indiana were successfully identified using color photography. Skylab photography affords an increase over comparable scale ERTS-1 imagery in the level of information obtained in mined lands inventory and reclamation analysis. A review of EREP color photography permitted the identification of a substantial number of non-fuel mines within the Southern Indiana test area. A new mine was detected on the EREP photography without prior data. EREP has definite value for estimating areal changes in active mines and for detecting new non-fuel mines. Gob piles and slurry ponds of several acres could be detected on the S-190B color photography when observed in association with large scale mining operations. Apparent degradation of water quality resulting from acid mine drainage and/or siltation was noted in several ponds or small lakes and appears to be related to intensive mining activity near Sullivan, Indiana.
Application of data mining approaches to drug delivery.
Ekins, Sean; Shimada, Jun; Chang, Cheng
2006-11-30
Computational approaches play a key role in all areas of the pharmaceutical industry from data mining, experimental and clinical data capture to pharmacoeconomics and adverse events monitoring. They will likely continue to be indispensable assets along with a growing library of software applications. This is primarily due to the increasingly massive amount of biology, chemistry and clinical data, which is now entering the public domain mainly as a result of NIH and commercially funded projects. We are therefore in need of new methods for mining this mountain of data in order to enable new hypothesis generation. The computational approaches include, but are not limited to, database compilation, quantitative structure activity relationships (QSAR), pharmacophores, network visualization models, decision trees, machine learning algorithms and multidimensional data visualization software that could be used to improve drug delivery after mining public and/or proprietary data. We will discuss some areas of unmet needs in the area of data mining for drug delivery that can be addressed with new software tools or databases of relevance to future pharmaceutical projects.
Dietary patterns analysis using data mining method. An application to data from the CYKIDS study.
Lazarou, Chrystalleni; Karaolis, Minas; Matalas, Antonia-Leda; Panagiotakos, Demosthenes B
2012-11-01
Data mining is a computational method that permits the extraction of patterns from large databases. We applied the data mining approach to data from 1140 children (9-13 years), in order to derive dietary habits related to children's obesity status. Rules that emerged from the data mining approach revealed the detrimental influence of increased consumption of soft drinks, delicatessen meat, sweets, fried and junk food. For example, frequent (3-5 times/week) consumption of all these foods increases the risk of being obese by 75%, whereas in children who have a similar dietary pattern but eat fish and seafood >2 times/week, the risk of obesity is reduced by 33%. In conclusion, patterns revealed by the data mining technique refer to specific groups of children and demonstrate the effect on the risk associated with obesity status when a single dietary habit is modified. Thus, a more individualized approach when translating public health messages could be achieved.
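Rules of the kind described above ("children with dietary pattern X are more often obese") are commonly scored with support, confidence, and lift. The abstract does not state which algorithm or measures the study used, so the sketch below is a generic association-rule calculation under that assumption; the records and item names are hypothetical.

```python
def rule_measures(records, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Each record is a set of items; the antecedent must occur in at least one record.
    """
    n = len(records)
    has_a = [r for r in records if antecedent <= r]
    has_both = [r for r in has_a if consequent <= r]
    has_c = [r for r in records if consequent <= r]
    support = len(has_both) / n                 # how often the whole rule occurs
    confidence = len(has_both) / len(has_a)     # P(consequent | antecedent)
    lift = confidence / (len(has_c) / n)        # confidence relative to the base rate
    return support, confidence, lift


# Hypothetical children: dietary habits plus an "obese" marker where applicable.
children = [
    {"soft_drinks", "fried_food", "obese"},
    {"soft_drinks", "fried_food", "obese"},
    {"soft_drinks", "fried_food"},
    {"fish"},
    {"fish", "obese"},
    {"fish"},
    {"soft_drinks", "fish"},
    {"fried_food"},
]

support, confidence, lift = rule_measures(
    children, antecedent={"soft_drinks", "fried_food"}, consequent={"obese"}
)
print(f"support={support:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift above 1 means the outcome is more frequent among records matching the antecedent than in the sample overall, which is the sense in which a rule signals elevated risk.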
Wright, Adam; Ricciardi, Thomas N.; Zwick, Martin
2005-01-01
The Medical Quality Improvement Consortium data warehouse contains de-identified data on more than 3.6 million patients including their problem lists, test results, procedures and medication lists. This study uses reconstructability analysis, an information-theoretic data mining technique, on the MQIC data warehouse to empirically identify risk factors for various complications of diabetes including myocardial infarction and microalbuminuria. The risk factors identified match those risk factors identified in the literature, demonstrating the utility of the MQIC data warehouse for outcomes research, and RA as a technique for mining clinical data warehouses. PMID:16779156
Ground Truth Collections for Explosions in Northern Fennoscandia and Russia
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harris, D B; Ringdal, F; Kremenetskaya, E
2003-07-28
This project is providing ground-truth information on explosions conducted at the principal mines within 500 kilometers of the ARCES station, and is assembling a seismic waveform database for these events from local and regional stations. The principal mines of interest are in northwest Russia (Khibiny Massif, Olenogorsk, Zapolyarny, and Kovdor groups) and Sweden (Malmberget, Kiruna). These mines form a natural laboratory for examining the variation of mining explosion observations with source type, since they include colocated surface and underground mines and mines conducting a variety of different shot types. In September 2002 we deployed two lines of temporary stations from the Khibiny Massif through and to the north of the ARCES station. This deployment is producing data that will allow researchers to examine the variation of discriminants caused by varying source-receiver distance and the diversity of explosion types. To date, we have collected ground-truth information on 1,118 explosions in the Kola Peninsula, and have assembled waveform data for approximately 700 of these. The database includes waveforms from instruments temporarily deployed in the Khibiny Massif mines, from the Apatity network just outside of the Massif, from LVZ, KEV and ARCES, and from the stations deployed along the two lines into northern Norway. In this paper we present representative waveforms for several types of shots recorded at various regional distances. We have conducted a preliminary study of the variation of phase ratios as a function of source type. This study shows significant differences in Pd/Sn and Pd/Lg ratios for two types of mining explosions: surface ripple-fired explosions and compact underground explosions. Compact explosions are, typically, underground explosions of a few tons with only one or two short delays, and are the closest approximation to single, well-tamped explosions available in the Khibiny mines.
The surface shots typically are much larger (ranging up to hundreds of tons), with many delays. The surface mine that we present results for typically also conducts several distinct shots across the mine nearly simultaneously (within a few seconds or tens of seconds). Measured phase ratios are more consistent for compact underground explosions. This consistency is an expected result given the smaller scope for shot variation in these smaller events. In addition, Pd/Lg ratios appear more stable than Pd/Sn ratios for both types of events. The most interesting result is that the compact underground explosions are richer in shear energy (i.e. having smaller P/S ratios) than their surface ripple-fired counterparts. We continue to work on an approach for identifying the principal mines to be targeted for screening at a particular station. Often, routine industrial blasts constitute a large proportion of events detected by monitoring stations close to major mining districts. Many mines may be present, and it may be a problem to determine which subset of mines is responsible for the majority of the events and should be prime candidates for the deployment of ground-truth collection resources. Our solution to this problem entails several steps. The first is to find geographic clusters of events that may correspond to major groups of mines. For this step, we use event density maps generated from existing network catalogs. This year we examined some of the tradeoffs in generating event density maps: use of automated bulletins vs. analyst-reviewed bulletins to produce maps, and the amount of time required to produce stable maps which can be used to identify significant mines.
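An event density map of the kind mentioned above can be approximated by counting catalog epicenters per grid cell and ranking the cells. This is a minimal sketch under stated assumptions: the regular latitude/longitude grid, the cell size, and the epicenter coordinates are all hypothetical, and the report does not describe how its maps were actually generated.

```python
import math
from collections import Counter


def event_density(events, cell_deg):
    """Count events per grid cell; events are (lat, lon) pairs in degrees."""
    counts = Counter()
    for lat, lon in events:
        # Integer cell index along each axis for a regular cell_deg grid.
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


# Hypothetical catalog epicenters (not taken from any real bulletin).
events = [
    (67.61, 33.92),
    (67.63, 33.95),
    (67.68, 33.71),
    (68.12, 33.25),
    (67.66, 33.80),
]

counts = event_density(events, cell_deg=0.5)
top_cell, top_count = counts.most_common(1)[0]
print(top_cell, top_count)
```

Cells with persistently high counts are candidate mine groups; comparing the ranking built from different bulletins or time windows is one way to judge when the map has become stable.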
Data-Mining Technologies for Diabetes: A Systematic Review
Marinov, Miroslav; Mosa, Abu Saleh Mohammad; Yoo, Illhoi; Boren, Suzanne Austin
2011-01-01
Background: The objective of this study is to conduct a systematic review of applications of data-mining techniques in the field of diabetes research. Method: We searched the MEDLINE database through PubMed. We initially identified 31 articles by the search, and selected 17 articles representing various data-mining methods used for diabetes research. Our main interest was to identify research goals, diabetes types, data sets, data-mining methods, data-mining software and technologies, and outcomes. Results: The applications of data-mining techniques in the selected articles were useful for extracting valuable knowledge, generating new hypotheses for further scientific research/experimentation, and improving health care for diabetes patients. The results could be used for both scientific research and real-life practice to improve the quality of health care for diabetes patients. Conclusions: Data mining has played an important role in diabetes research. Data mining would be a valuable asset for diabetes researchers because it can unearth hidden knowledge from a huge amount of diabetes-related data. We believe that data mining can significantly help diabetes research and ultimately improve the quality of health care for diabetes patients. PMID:22226277
Parker, Tony J.; Sampson, Dayle L.; Broszczak, Daniel; Chng, Yee L.; Carter, Shea L.; Leavesley, David I.; Parker, Anthony W.; Upton, Zee
2012-01-01
Biomarker analysis has been implemented in sports research in an attempt to monitor the effects of exertion and fatigue in athletes. This study proposed that while such biomarkers may be useful for monitoring injury risk in workers, proteomic approaches might also be utilised to identify novel exertion or injury markers. We found that urinary urea and cortisol levels were significantly elevated in mining workers following a 12-hour overnight shift. These levels failed to return to baseline over 24 h in the more active maintenance crew compared to truck drivers (operators), suggesting a lack of recovery between shifts. Use of a SELDI-TOF MS approach to detect novel exertion or injury markers revealed a spectral feature which was associated with workers in both work categories who were engaged in higher levels of physical activity. This feature was identified as the LG3 peptide, a C-terminal fragment of the anti-angiogenic/anti-tumourigenic protein endorepellin. This finding suggests that urinary LG3 peptide may be a biomarker of physical activity. It is also possible that the activity-mediated release of LG3/endorepellin into the circulation may represent a biological mechanism for the known inverse association between physical activity and cancer risk/survival. PMID:22457785
McNabb, Matthew; Cao, Yu; Devlin, Thomas; Baxter, Blaise; Thornton, Albert
2012-01-01
Mechanical Embolus Removal in Cerebral Ischemia (MERCI) has been supported by medical trials as an improved method of treating ischemic stroke past the safe window of time for administering clot-busting drugs, and was released for medical use in 2004. Analyzing real-world data collected from MERCI clinical trials is key to providing insights into the effectiveness of MERCI. Most of the existing data analysis on MERCI results has thus far employed conventional statistical analysis techniques. To the best of our knowledge, advanced data analytics and data mining techniques have not yet been systematically applied. To address this gap, in this thesis we conduct a comprehensive study on employing state-of-the-art machine learning algorithms to generate prediction criteria for the outcome of MERCI patients. Specifically, we investigate how to choose the most significant attributes of a data set with a limited number of instances. We propose a few search algorithms to identify the significant attributes, followed by a thorough performance analysis for each algorithm. Finally, we apply our proposed approach to the real-world, de-identified patient data provided by Erlanger Southeast Regional Stroke Center, Chattanooga, TN. Our experimental results demonstrate that our proposed approach performs well.
CoBOP: Electro-Optic Identification Laser Line Scan Sensors
1998-01-01
Electro-Optic Identification Sensors Project[1] is to develop and demonstrate high resolution underwater electro-optic (EO) imaging sensors, and associated image processing/analysis methods, for rapid visual identification of mines and mine-like contacts (MLCs). Identification of MLCs is a pressing Fleet need. During MCM operations, sonar contacts are classified as mine-like if they are sufficiently similar to signatures of mines. Each contact classified as mine-like must be identified as a mine or not a mine. During MCM operations in littoral areas,
Exploration of geo-mineral compounds in granite mining soils using XRD pattern data analysis
NASA Astrophysics Data System (ADS)
Koteswara Reddy, G.; Yarakkula, Kiran
2017-11-01
The purpose of the study was to investigate the major minerals present in granite mining waste and in agricultural soils near and away from mining areas. Minerals in representative sub-soil samples were identified by X-ray diffractometer (XRD) pattern data analysis. Morphological features and quantitative elemental analysis were obtained by Scanning Electron Microscopy-Energy Dispersive Spectroscopy (SEM-EDS). The XRD pattern data revealed that the major minerals in granite waste are Quartz, Albite, Anorthite, K-Feldspars, Muscovite, Annite, Lepidolite, Illite, Enstatite and Ferrosilite. In agricultural farm soils, however, the major minerals are Gypsum, Calcite, Magnetite, Hematite, Muscovite, K-Feldspars and Quartz. Moreover, agricultural soils neighbouring mining areas were found to be enriched in Mica group minerals (Lepidolite and Illite) and Orthopyroxene group minerals (Ferrosilite and Enstatite). These Mica and Orthopyroxene group minerals are present in agricultural farm soils neighbouring mining areas and absent in agricultural farm soils away from mining areas. The study demonstrated that chemical migration takes place in agricultural farm lands in the vicinity of the granite mining areas.
A Feature Mining Based Approach for the Classification of Text Documents into Disjoint Classes.
ERIC Educational Resources Information Center
Nieto Sanchez, Salvador; Triantaphyllou, Evangelos; Kraft, Donald
2002-01-01
Proposes a new approach for classifying text documents into two disjoint classes. Highlights include a brief overview of document clustering; a data mining approach called the One Clause at a Time (OCAT) algorithm which is based on mathematical logic; vector space model (VSM); and comparing the OCAT to the VSM. (Author/LRW)
A simulation-based approach for estimating premining water quality: Red Mountain Creek, Colorado
Runkel, Robert L.; Kimball, Briant A; Walton-Day, Katherine; Verplanck, Philip L.
2007-01-01
Regulatory agencies are often charged with the task of setting site-specific numeric water quality standards for impaired streams. This task is particularly difficult for streams draining highly mineralized watersheds with past mining activity. Baseline water quality data obtained prior to mining are often non-existent and application of generic water quality standards developed for unmineralized watersheds is suspect given the geology of most watersheds affected by mining. Various approaches have been used to estimate premining conditions, but none of the existing approaches rigorously consider the physical and geochemical processes that ultimately determine instream water quality. An approach based on simulation modeling is therefore proposed herein. The approach utilizes synoptic data that provide spatially-detailed profiles of concentration, streamflow, and constituent load along the study reach. This field data set is used to calibrate a reactive stream transport model that considers the suite of physical and geochemical processes that affect constituent concentrations during instream transport. A key input to the model is the quality and quantity of waters entering the study reach. This input is based on chemical analyses available from synoptic sampling and observed increases in streamflow along the study reach. Given the calibrated model, additional simulations are conducted to estimate premining conditions. In these simulations, the chemistry of mining-affected sources is replaced with the chemistry of waters that are thought to be unaffected by mining (proximal, premining analogues). The resultant simulations provide estimates of premining water quality that reflect both the reduced loads that were present prior to mining and the processes that affect these loads as they are transported downstream. This simulation-based approach is demonstrated using data from Red Mountain Creek, Colorado, a small stream draining a heavily-mined watershed. 
Model application to the premining problem for Red Mountain Creek is based on limited field reconnaissance and chemical analyses; additional field work and analyses may be needed to develop definitive, quantitative estimates of premining water quality.
ElGokhy, Sherin M; ElHefnawi, Mahmoud; Shoukry, Amin
2014-05-06
MicroRNAs (miRNAs) are endogenous ∼22 nt RNAs that have been identified in many species as powerful regulators of gene expression. Experimental identification of miRNAs is still slow since miRNAs are difficult to isolate by cloning due to their low expression, low stability, tissue specificity and the high cost of the cloning procedure. Thus, computational identification of miRNAs from genomic sequences provides a valuable complement to cloning. Different approaches for identification of miRNAs have been proposed based on homology, thermodynamic parameters, and cross-species comparisons. The present paper focuses on the integration of miRNA classifiers in a meta-classifier and the identification of miRNAs from metagenomic sequences collected from different environments. An ensemble of classifiers is proposed for miRNA hairpin prediction based on four well-known classifiers (Triplet-SVM, MiPred, Virgo and EumiR), which use non-identical features and have been trained on different data. Their decisions are combined using a single hidden layer neural network to increase the accuracy of the predictions. Our ensemble classifier achieved 89.3% accuracy, 82.2% f-measure, 74% sensitivity, 97% specificity, 92.5% precision and 88.2% negative predictive value when tested on real miRNA and pseudo sequence data. The area under the receiver operating characteristic curve of our classifier is 0.9, which represents a high performance index. The proposed classifier yields a significant performance improvement relative to Triplet-SVM, Virgo and EumiR and a minor refinement over MiPred. The developed ensemble classifier is used for miRNA prediction in mine drainage, groundwater and marine metagenomic sequences downloaded from the NCBI sequence read archive. By consulting the miRBase repository, 179 miRNAs have been identified as highly probable miRNAs. Our new approach could thus be used for mining metagenomic sequences and finding new and homologous miRNAs.
The paper investigates a computational tool for miRNA prediction in genomic or metagenomic data. It has been applied to three metagenomic samples from different environments (mine drainage, groundwater and marine metagenomic sequences). The prediction results provide a set of highly promising candidate miRNA hairpins for cloning. Among the ensemble predictions there are pre-miRNA candidates that have been validated using miRBase although they were not recognized by some of the base classifiers.
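As a small consistency check on the figures quoted in this abstract, the f-measure can be recomputed from the reported precision and sensitivity. The sketch below assumes the f-measure is the usual harmonic mean (F1) of the two; the abstract does not state the exact formula, and the function and variable names are illustrative only.

```python
def f_measure(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity (recall)."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


# Values as reported in the abstract, written as fractions.
reported_precision = 0.925
reported_sensitivity = 0.74

f1 = f_measure(reported_precision, reported_sensitivity)
print(f"F-measure: {f1:.3f}")
```

Under that assumption the recomputed value agrees with the reported 82.2% to three decimal places.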
Yoo, Sooyoung; Cho, Minsu; Kim, Eunhye; Kim, Seok; Sim, Yerim; Yoo, Donghyun; Hwang, Hee; Song, Minseok
2016-04-01
Many hospitals are increasing their efforts to improve processes because processes play an important role in enhancing work efficiency and reducing costs. However, to date, a quantitative tool has not been available to examine the before and after effects of processes and environmental changes, other than the use of indirect indicators, such as mortality rate and readmission rate. This study used process mining technology to analyze process changes based on changes in the hospital environment, such as the construction of a new building, and to measure the effects of environmental changes in terms of consultation wait time, time spent per task, and outpatient care processes. Using process mining technology, electronic health record (EHR) log data of outpatient care before and after constructing a new building were analyzed, and the effectiveness of the technology in terms of the process was evaluated. Using the process mining technique, we found that the total time spent in outpatient care did not increase significantly compared to that before the construction of a new building, considering that the number of outpatients increased, and the consultation wait time decreased. These results suggest that the operation of the outpatient clinic was effective after changes were implemented in the hospital environment. We further identified improvements in processes using the process mining technique, thereby demonstrating the usefulness of this technique for analyzing complex hospital processes at a low cost. This study confirmed the effectiveness of process mining technology at an actual hospital site. In future studies, the use of process mining technology will be expanded by applying this approach to a larger variety of process change situations. Copyright © 2016. Published by Elsevier Ireland Ltd.
NASA Astrophysics Data System (ADS)
Davies, Gwendolyn E.
Acid mine drainage (AMD) resulting from the oxidation of sulfides in mine waste is a major environmental issue facing the mining industry today. Open pit mines, tailings ponds, ore stockpiles, and waste rock dumps can all be significant sources of pollution, primarily heavy metals. These large mining-induced footprints are often located across vast geographic expanses and are difficult to access. With the continuing advancement of imaging satellites, remote sensing may provide a useful monitoring tool for pit lake water quality and the rapid assessment of abandoned mine sites. This study explored the applications of laboratory spectroscopy and multi-season hyperspectral remote sensing for environmental monitoring of mine waste environments. Laboratory spectral experiments were first performed on acid mine waters and synthetic ferric iron solutions to identify and isolate the unique spectral properties of mine waters. These spectral characterizations were then applied to airborne hyperspectral imagery for identification of poor water quality in AMD ponds at the Leviathan Mine Superfund site, CA. Finally, imagery varying in temporal and spatial resolution was used to identify changes in mineralogy over weathering overburden piles and on dry AMD pond liner surfaces at the Leviathan Mine. Results show the utility of hyperspectral remote sensing for monitoring a diverse range of surfaces associated with AMD.
ERIC Educational Resources Information Center
Trybula, Walter J.
1999-01-01
Reviews the state of research in text mining, focusing on newer developments. The intent is to describe the disparate investigations currently included under the term text mining and provide a cohesive structure for these efforts. A summary of research identifies key organizations responsible for pushing the development of text mining. A section…
NASA Technical Reports Server (NTRS)
Wier, C. E. (Principal Investigator); Powell, R. L.; Amato, R. V.; Russell, O. R.; Martin, K. R.
1975-01-01
The author has identified the following significant results. This investigation evaluated the applicability of a variety of sensor types, formats, and resolution capabilities to the study of both fuel and nonfuel mined lands. The image reinforcement provided by stereo viewing of the EREP images proved useful for identifying lineaments and for mined lands mapping. Skylab S190B color and color infrared transparencies were the most useful EREP imagery. New information on lineament and fracture patterns in the bedrock of Indiana and Illinois extracted from analysis of the Skylab imagery has contributed to furthering the geological understanding of this portion of the Illinois basin.
ERIC Educational Resources Information Center
Yu, Chong Ho; Jannasch-Pennell, Angel; DiGangi, Samuel
2011-01-01
The objective of this article is to illustrate that text mining and qualitative research are epistemologically compatible. First, like many qualitative research approaches, such as grounded theory, text mining encourages open-mindedness and discourages preconceptions. Contrary to the popular belief that text mining is a linear and fully automated…
Collective feature selection to identify crucial epistatic variants.
Verma, Shefali S; Lucas, Anastasia; Zhang, Xinyuan; Veturi, Yogasudha; Dudek, Scott; Li, Binglan; Li, Ruowang; Urbanowicz, Ryan; Moore, Jason H; Kim, Dokyoon; Ritchie, Marylyn D
2018-01-01
Machine learning methods have gained popularity and practicality in identifying linear and non-linear effects of variants associated with complex disease/traits. Detection of epistatic interactions still remains a challenge due to the large number of features and relatively small sample size as input, thus leading to the so-called "short fat data" problem. The efficiency of machine learning methods can be increased by limiting the number of input features. Thus, it is very important to perform variable selection before searching for epistasis. Many methods have been evaluated and proposed to perform feature selection, but no single method works best in all scenarios. We demonstrate this by conducting two separate simulation analyses to evaluate the proposed collective feature selection approach. Through our simulation study we propose a collective feature selection approach to select features that are in the "union" of the best performing methods. We explored various parametric, non-parametric, and data mining approaches to perform feature selection. We chose our top performing methods to select the union of the resulting variables based on a user-defined percentage of variants selected from each method to take to downstream analysis. Our simulation analysis shows that non-parametric data mining approaches, such as MDR, may work best under one simulation criterion for the high effect size (penetrance) datasets, while non-parametric methods designed for feature selection, such as Ranger and Gradient boosting, work best under other simulation criteria. Thus, using a collective approach proves to be more beneficial for selecting variables with epistatic effects also in low effect size datasets and different genetic architectures.
Following this, we applied our proposed collective feature selection approach to select the top 1% of variables to identify potential interacting variables associated with Body Mass Index (BMI) in ~ 44,000 samples obtained from Geisinger's MyCode Community Health Initiative (on behalf of DiscovEHR collaboration). In this study, we were able to show that selecting variables using a collective feature selection approach could help in selecting true positive epistatic variables more frequently than applying any single method for feature selection via simulation studies. We were able to demonstrate the effectiveness of collective feature selection along with a comparison of many methods in our simulation analysis. We also applied our method to identify non-linear networks associated with obesity.
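The "union" step described in this abstract can be illustrated with a minimal sketch: take a user-defined fraction of the top-ranked variants from each method and merge them. The method names, variant names and scores below are hypothetical placeholders, and the ranking rule (higher score is better) is an assumption; this is not the authors' implementation.

```python
from typing import Dict, Set


def top_fraction(scores: Dict[str, float], fraction: float) -> Set[str]:
    """Return the highest-scoring `fraction` of features (at least one)."""
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return set(ranked[:k])


def collective_selection(
    method_scores: Dict[str, Dict[str, float]], fraction: float
) -> Set[str]:
    """Union of the top `fraction` of features chosen by each method."""
    selected: Set[str] = set()
    for scores in method_scores.values():
        selected |= top_fraction(scores, fraction)
    return selected


# Hypothetical importance scores from two feature selection methods.
method_scores = {
    "method_a": {"snp1": 0.9, "snp2": 0.8, "snp3": 0.1, "snp4": 0.2, "snp5": 0.3},
    "method_b": {"snp1": 0.7, "snp2": 0.1, "snp3": 0.2, "snp4": 0.95, "snp5": 0.3},
}

selected = collective_selection(method_scores, fraction=0.4)
print(sorted(selected))
```

Here each method contributes its top two of five variants, so a variant favoured by only one method (snp2 or snp4) still reaches downstream analysis.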
30 CFR 47.21 - Identifying hazardous chemicals.
Code of Federal Regulations, 2011 CFR
2011-07-01
... 30 Mineral Resources 1 2011-07-01 2011-07-01 false Identifying hazardous chemicals. 47.21 Section... TRAINING HAZARD COMMUNICATION (HazCom) Hazard Determination § 47.21 Identifying hazardous chemicals. The operator must evaluate each chemical brought on mine property and each chemical produced on mine property...
30 CFR 47.21 - Identifying hazardous chemicals.
Code of Federal Regulations, 2010 CFR
2010-07-01
... 30 Mineral Resources 1 2010-07-01 2010-07-01 false Identifying hazardous chemicals. 47.21 Section... TRAINING HAZARD COMMUNICATION (HazCom) Hazard Determination § 47.21 Identifying hazardous chemicals. The operator must evaluate each chemical brought on mine property and each chemical produced on mine property...
30 CFR 47.21 - Identifying hazardous chemicals.
Code of Federal Regulations, 2013 CFR
2013-07-01
... 30 Mineral Resources 1 2013-07-01 2013-07-01 false Identifying hazardous chemicals. 47.21 Section... TRAINING HAZARD COMMUNICATION (HazCom) Hazard Determination § 47.21 Identifying hazardous chemicals. The operator must evaluate each chemical brought on mine property and each chemical produced on mine property...
30 CFR 47.21 - Identifying hazardous chemicals.
Code of Federal Regulations, 2012 CFR
2012-07-01
... 30 Mineral Resources 1 2012-07-01 2012-07-01 false Identifying hazardous chemicals. 47.21 Section... TRAINING HAZARD COMMUNICATION (HazCom) Hazard Determination § 47.21 Identifying hazardous chemicals. The operator must evaluate each chemical brought on mine property and each chemical produced on mine property...
30 CFR 47.21 - Identifying hazardous chemicals.
Code of Federal Regulations, 2014 CFR
2014-07-01
... 30 Mineral Resources 1 2014-07-01 2014-07-01 false Identifying hazardous chemicals. 47.21 Section... TRAINING HAZARD COMMUNICATION (HazCom) Hazard Determination § 47.21 Identifying hazardous chemicals. The operator must evaluate each chemical brought on mine property and each chemical produced on mine property...
Xu, Li; Han, Ting; Ge, Mei; Zhu, Li; Qian, XiuPing
2016-09-01
Analysis of the Amycolatopsis orientalis HCCB10007 genome revealed new gene clusters involved in natural product biosynthesis that were not associated with the production of known compounds. Halogenases are a type of tailoring enzyme usually found within these secondary gene clusters. In this study, we identified an indole-type halometabolite, 6-chloro-1H-indole-3-carboxamide, named LYXLF2, by whole genome mining and metabolic profiling of a flavin-dependent halogenase mutant. LYXLF2 is a new plant growth-regulating compound that promotes root elongation. The results of this study demonstrated that the special gene knock-out/comparative metabolic profiling approach provides a powerful tool for the discovery of novel natural products by genome mining.
Using natural language processing techniques to inform research on nanotechnology.
Lewinski, Nastassja A; McInnes, Bridget T
2015-01-01
Literature in the field of nanotechnology is exponentially increasing with more and more engineered nanomaterials being created, characterized, and tested for performance and safety. With the deluge of published data, there is a need for natural language processing approaches to semi-automate the cataloguing of engineered nanomaterials and their associated physico-chemical properties, performance, exposure scenarios, and biological effects. In this paper, we review the different informatics methods that have been applied to patent mining, nanomaterial/device characterization, nanomedicine, and environmental risk assessment. Nine natural language processing (NLP)-based tools were identified: NanoPort, NanoMapper, TechPerceptor, a Text Mining Framework, a Nanodevice Analyzer, a Clinical Trial Document Classifier, Nanotoxicity Searcher, NanoSifter, and NEIMiner. We conclude with recommendations for sharing NLP-related tools through online repositories to broaden participation in nanoinformatics.
Building a protein name dictionary from full text: a machine learning term extraction approach.
Shi, Lei; Campagne, Fabien
2005-04-07
The majority of information in the biological literature resides in full text articles, instead of abstracts. Yet, abstracts remain the focus of many publicly available literature data mining tools. Most literature mining tools rely on pre-existing lexicons of biological names, often extracted from curated gene or protein databases. This is a limitation, because such databases have low coverage of the many name variants which are used to refer to biological entities in the literature. We present an approach to recognize named entities in full text. The approach collects high frequency terms in an article, and uses support vector machines (SVM) to identify biological entity names. It is also computationally efficient and robust to noise commonly found in full text material. We use the method to create a protein name dictionary from a set of 80,528 full text articles. Only 8.3% of the names in this dictionary match SwissProt description lines. We assess the quality of the dictionary by studying its protein name recognition performance in full text. This dictionary term lookup method compares favourably to other published methods, supporting the significance of our direct extraction approach. The method is strong in recognizing name variants not found in SwissProt.
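The first stage mentioned in this abstract, collecting high-frequency terms from an article, can be sketched as follows. The tokenization rule, the `min_count` threshold and the sample sentence (with the made-up names ProtX and ProtY) are all assumptions for illustration; the SVM classification stage is not shown.

```python
import re
from collections import Counter
from typing import List


def frequent_terms(text: str, min_count: int = 2) -> List[str]:
    """Return tokens that occur at least `min_count` times, sorted."""
    # A token starts with a letter and may contain letters, digits or hyphens.
    tokens = re.findall(r"[A-Za-z][A-Za-z0-9-]*", text)
    counts = Counter(tokens)
    return sorted(term for term, n in counts.items() if n >= min_count)


# Hypothetical article text with invented protein names.
article = "ProtX binds ProtY. ProtX is phosphorylated, and ProtY is not."
candidates = frequent_terms(article)
print(candidates)
```

Frequency alone also picks up ordinary words such as "is", which is why a classifier is needed to separate entity names from the rest of the candidates.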
Building a protein name dictionary from full text: a machine learning term extraction approach
Shi, Lei; Campagne, Fabien
2005-01-01
Background: The majority of information in the biological literature resides in full text articles, instead of abstracts. Yet, abstracts remain the focus of many publicly available literature data mining tools. Most literature mining tools rely on pre-existing lexicons of biological names, often extracted from curated gene or protein databases. This is a limitation, because such databases have low coverage of the many name variants which are used to refer to biological entities in the literature. Results: We present an approach to recognize named entities in full text. The approach collects high frequency terms in an article, and uses support vector machines (SVM) to identify biological entity names. It is also computationally efficient and robust to noise commonly found in full text material. We use the method to create a protein name dictionary from a set of 80,528 full text articles. Only 8.3% of the names in this dictionary match SwissProt description lines. We assess the quality of the dictionary by studying its protein name recognition performance in full text. Conclusion: This dictionary term lookup method compares favourably to other published methods, supporting the significance of our direct extraction approach. The method is strong in recognizing name variants not found in SwissProt. PMID:15817129
Taheri, Shahrooz; Mat Saman, Muhamad Zameri; Wong, Kuan Yew
2013-01-01
One of the cost-intensive issues in managing warehouses is the order picking problem, which deals with the retrieval of items from their storage locations in order to meet customer requests. Many solution approaches have been proposed to minimize traveling distance in the process of order picking. In practice, however, customer orders have to be completed by certain due dates in order to avoid tardiness, which is neglected in most of the related scientific papers. Consequently, we propose a novel four-phase solution approach to minimize tardiness. First, weighted association rule mining is used to calculate associations between orders with respect to their due dates. Next, a batching model based on binary integer programming is formulated to maximize the associations between orders within each batch. Subsequently, in the order picking phase, a Genetic Algorithm integrated with the Traveling Salesman Problem is used to identify the most suitable travel path. Finally, the Genetic Algorithm is applied to sequence the constructed batches in order to minimize tardiness. Illustrative examples and comparisons are presented to demonstrate the proficiency and solution quality of the proposed approach. PMID:23864823
Azadnia, Amir Hossein; Taheri, Shahrooz; Ghadimi, Pezhman; Saman, Muhamad Zameri Mat; Wong, Kuan Yew
2013-01-01
One of the cost-intensive issues in managing warehouses is the order picking problem, which deals with the retrieval of items from their storage locations in order to meet customer requests. Many solution approaches have been proposed to minimize traveling distance in the process of order picking. In practice, however, customer orders have to be completed by certain due dates in order to avoid tardiness, which is neglected in most of the related scientific papers. Consequently, we propose a novel four-phase solution approach to minimize tardiness. First, weighted association rule mining is used to calculate associations between orders with respect to their due dates. Next, a batching model based on binary integer programming is formulated to maximize the associations between orders within each batch. Subsequently, in the order picking phase, a Genetic Algorithm integrated with the Traveling Salesman Problem is used to identify the most suitable travel path. Finally, the Genetic Algorithm is applied to sequence the constructed batches in order to minimize tardiness. Illustrative examples and comparisons are presented to demonstrate the proficiency and solution quality of the proposed approach.
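The objective of the final phase, sequencing batches to minimize tardiness, can be made concrete with a toy example. The sketch below evaluates total tardiness for a sequence and, as a stand-in for the Genetic Algorithm named in the abstract, simply tries every ordering; that brute-force search is only feasible for a handful of batches. Batch names, processing times and due dates are hypothetical.

```python
from itertools import permutations
from typing import Sequence, Tuple

# (name, processing_time, due_date); all values are illustrative.
Batch = Tuple[str, float, float]


def total_tardiness(sequence: Sequence[Batch]) -> float:
    """Sum of max(0, completion_time - due_date) over the sequence."""
    clock = 0.0
    total = 0.0
    for _, processing_time, due_date in sequence:
        clock += processing_time
        total += max(0.0, clock - due_date)
    return total


def best_sequence(batches: Sequence[Batch]) -> Tuple[Batch, ...]:
    """Exhaustive search over all orderings (not a Genetic Algorithm)."""
    return min(permutations(batches), key=total_tardiness)


batches = [("A", 4.0, 10.0), ("B", 2.0, 3.0), ("C", 3.0, 6.0)]
best = best_sequence(batches)
print([name for name, _, _ in best], total_tardiness(best))
```

Picking in the given order A, B, C would incur 6 time units of tardiness, whereas the order B, C, A meets every due date.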
A vector space model approach to identify genetically related diseases.
Sarkar, Indra Neil
2012-01-01
The relationship between diseases and their causative genes can be complex, especially in the case of polygenic diseases. Further exacerbating the challenges in their study is that many genes may be causally related to multiple diseases. This study explored the relationship between diseases through the adaptation of an approach pioneered in the context of information retrieval: vector space models. A vector space model approach was developed that bridges gene-disease knowledge inferred across three knowledge bases: Online Mendelian Inheritance in Man, GenBank, and Medline. The approach was then used to identify potentially related diseases for two target diseases: Alzheimer disease and Prader-Willi Syndrome. For both Alzheimer disease and Prader-Willi Syndrome, a set of plausible diseases was identified that may warrant further exploration. This study furthers seminal work by Swanson, et al. that demonstrated the potential for mining literature for putative correlations. Using a vector space modeling approach, information from both biomedical literature and genomic resources (like GenBank) can be combined towards identification of putative correlations of interest. To this end, the relevance of the diseases predicted by the vector space modeling approach in this study was validated based on supporting literature. The results of this study suggest that a vector space model approach may be a useful means to identify potential relationships between complex diseases, and thereby enable the coordination of gene-based findings across multiple complex diseases.
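A vector space model of the kind described here can be sketched by representing each disease as a vector of gene weights and ranking other diseases by cosine similarity. The disease and gene identifiers below are placeholders, the weights are invented, and the use of cosine similarity is an assumption borrowed from standard information retrieval practice rather than a detail confirmed by the abstract.

```python
import math
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse vectors keyed by gene."""
    dot = sum(weight * v.get(gene, 0.0) for gene, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_related(target: str, profiles: Dict[str, Vector]) -> List[Tuple[str, float]]:
    """Rank all other diseases by similarity to the target disease."""
    scores = [
        (name, cosine(profiles[target], vector))
        for name, vector in profiles.items()
        if name != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical gene profiles; not real gene-disease associations.
profiles = {
    "disease_A": {"gene_1": 1.0, "gene_2": 1.0},
    "disease_B": {"gene_1": 1.0, "gene_2": 1.0, "gene_3": 1.0},
    "disease_C": {"gene_4": 1.0},
}

ranked = rank_related("disease_A", profiles)
print(ranked)
```

Diseases sharing genes with the target rank highest, while a disease with no genes in common scores zero.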
Xu, Min; Wang, Yemin; Zhao, Zhilong; Gao, Guixi; Huang, Sheng-Xiong; Kang, Qianjin; He, Xinyi; Lin, Shuangjun; Pang, Xiuhua; Deng, Zixin
2016-01-01
ABSTRACT Genome sequencing projects in the last decade revealed numerous cryptic biosynthetic pathways for unknown secondary metabolites in microbes, revitalizing drug discovery from microbial metabolites by approaches called genome mining. In this work, we developed a heterologous expression and functional screening approach for genome mining from genomic bacterial artificial chromosome (BAC) libraries in Streptomyces spp. We demonstrate mining from a strain of Streptomyces rochei, which is known to produce streptothricins and borrelidin, by expressing its BAC library in the surrogate host Streptomyces lividans SBT5, and screening for antimicrobial activity. In addition to the successful capture of the streptothricin and borrelidin biosynthetic gene clusters, we discovered two novel linear lipopeptides and their corresponding biosynthetic gene cluster, as well as a novel cryptic gene cluster for an unknown antibiotic from S. rochei. This high-throughput functional genome mining approach can be easily applied to other streptomycetes, and it is very suitable for the large-scale screening of genomic BAC libraries for bioactive natural products and the corresponding biosynthetic pathways. IMPORTANCE Microbial genomes encode numerous cryptic biosynthetic gene clusters for unknown small metabolites with potential biological activities. Several genome mining approaches have been developed to activate and bring these cryptic metabolites to biological tests for future drug discovery. Previous sequence-guided procedures relied on bioinformatic analysis to predict potentially interesting biosynthetic gene clusters. In this study, we describe an efficient approach based on heterologous expression and functional screening of a whole-genome library for the mining of bioactive metabolites from Streptomyces. 
The usefulness of this function-driven approach was demonstrated by the capture of four large biosynthetic gene clusters for metabolites of various chemical types, including streptothricins, borrelidin, two novel lipopeptides, and one unknown antibiotic from Streptomyces rochei Sal35. The transfer, expression, and screening of the library were all performed in a high-throughput way, so that this approach is scalable and adaptable to industrial automation for next-generation antibiotic discovery. PMID:27451447
Semi-automated knowledge discovery: identifying and profiling human trafficking
NASA Astrophysics Data System (ADS)
Poelmans, Jonas; Elzinga, Paul; Ignatov, Dmitry I.; Kuznetsov, Sergei O.
2012-11-01
We propose an iterative and human-centred knowledge discovery methodology based on formal concept analysis. The proposed approach recognizes the important role of the domain expert in mining real-world enterprise applications and makes use of specific domain knowledge, including human intelligence and domain-specific constraints. Our approach was empirically validated at the Amsterdam-Amstelland police to identify suspects and victims of human trafficking in 266,157 suspicious activity reports. Based on guidelines of the Attorneys General of the Netherlands, we first defined multiple early warning indicators that were used to index the police reports. Using concept lattices, we revealed numerous unknown human trafficking and loverboy suspects. In-depth investigation by the police confirmed their involvement in illegal activities and resulted in actual arrests being made. Our human-centred approach was embedded into operational policing practice and is now successfully used on a daily basis to cope with the vastly growing amount of unstructured information.
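To make the formal concept analysis step more tangible, the sketch below derives all formal concepts (pairs of a report set and the indicator set they share) from a tiny report-by-indicator table. The report and indicator names are hypothetical, and the brute-force enumeration over attribute subsets is chosen for brevity; it would not scale to hundreds of thousands of reports and is not the authors' tooling.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

Context = Dict[str, Set[str]]
Concept = Tuple[FrozenSet[str], FrozenSet[str]]


def extent(context: Context, attrs: FrozenSet[str]) -> FrozenSet[str]:
    """Objects that have every attribute in `attrs`."""
    return frozenset(obj for obj, owned in context.items() if attrs <= owned)


def intent(context: Context, objs: FrozenSet[str], all_attrs: FrozenSet[str]) -> FrozenSet[str]:
    """Attributes shared by every object in `objs`."""
    shared = set(all_attrs)
    for obj in objs:
        shared &= context[obj]
    return frozenset(shared)


def formal_concepts(context: Context) -> Set[Concept]:
    """Enumerate concepts by closing every subset of attributes."""
    all_attrs = frozenset().union(*context.values())
    concepts: Set[Concept] = set()
    for size in range(len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            objs = extent(context, frozenset(combo))
            concepts.add((objs, intent(context, objs, all_attrs)))
    return concepts


# Hypothetical reports indexed by hypothetical early warning indicators.
context = {
    "report_1": {"indicator_a", "indicator_b"},
    "report_2": {"indicator_a"},
    "report_3": {"indicator_b", "indicator_c"},
}

concepts = formal_concepts(context)
print(len(concepts))
```

Ordering these concepts by inclusion of their report sets yields the concept lattice that an analyst would browse.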
Application of LANDSAT data to monitor land reclamation progress in Belmont County, Ohio
NASA Technical Reports Server (NTRS)
Bloemer, H. H. L.; Brumfield, J. O.; Campbell, W. J.; Witt, R. G.; Bly, B. G.
1981-01-01
Strip and contour mining techniques are reviewed as well as some studies conducted to determine the applicability of LANDSAT and associated digital image processing techniques to the surficial problems associated with mining operations. A nontraditional unsupervised classification approach to multispectral data is considered which renders increased classification separability in land cover analysis of surface mined areas. The approach also reduces the dimensionality of the data and requires only minimal analytical skills in digital data processing.
Comparing digital data processing techniques for surface mine and reclamation monitoring
NASA Technical Reports Server (NTRS)
Witt, R. G.; Bly, B. G.; Campbell, W. J.; Bloemer, H. H. L.; Brumfield, J. O.
1982-01-01
The results of three techniques used for processing Landsat digital data are compared for their utility in delineating areas of surface mining and subsequent reclamation. An unsupervised clustering algorithm (ISOCLS), a maximum-likelihood classifier (CLASFY), and a hybrid approach utilizing canonical analysis (ISOCLS/KLTRANS/ISOCLS) were compared by means of a detailed accuracy assessment with aerial photography at NASA's Goddard Space Flight Center. Results show that the hybrid approach was superior to the traditional techniques in distinguishing strip mined and reclaimed areas.
NASA Astrophysics Data System (ADS)
Smith, James F., III; Blank, Joseph A.
2003-03-01
An approach is being explored that involves embedding a fuzzy logic based resource manager in an electronic game environment. Game agents can function under their own autonomous logic or human control. This approach automates the data mining problem. The game automatically creates a cleansed database reflecting the domain expert's knowledge, calls a data mining function (a genetic algorithm) to mine the database as required, and allows easy evaluation of the information extracted. The co-evolutionary fitness functions, chromosomes and stopping criteria for ending the game are discussed. Genetic algorithm and genetic program based data mining procedures are discussed that automatically discover new fuzzy rules and strategies. The strategy tree concept and its relationship to co-evolutionary data mining are examined, as well as the associated phase space representation of fuzzy concepts. The overlap of fuzzy concepts in phase space reduces the effective strategies available to adversaries. Co-evolutionary data mining alters the geometric properties of the overlap region, known as the admissible region of phase space, significantly enhancing the performance of the resource manager. Procedures for validation of the mined information are discussed and significant experimental results provided.
GROUNDWATER IMPACTED BY ACID MINE DRAINAGE
The generation and release of acidic, metal-rich water from mine wastes continues to be an intractable environmental problem. Although the effects of acid mine drainage (AMD) are most evident in surface waters, there is an obvious need for developing cost-effective approaches fo...
TOXICITY APPROACHES TO ASSESSING MINING IMPACTS AND MINE WASTE TREATMENT EFFECTIVENESS
The USEPA Office of Research and Development's National Exposure Research Laboratory and National Risk Management Research Laboratory have been evaluating the impact of mining sites on receiving streams and the effectiveness of waste treatment technologies in removing toxicity fo...
ERIC Educational Resources Information Center
Qin, Jian; Jurisica, Igor; Liddy, Elizabeth D.; Jansen, Bernard J; Spink, Amanda; Priss, Uta; Norton, Melanie J.
2000-01-01
These six articles discuss knowledge discovery in databases (KDD). Topics include data mining; knowledge management systems; applications of knowledge discovery; text and Web mining; text mining and information retrieval; user search patterns through Web log analysis; concept analysis; data collection; and data structure inconsistency. (LRW)
Detection of antipersonnel (AP) mines using mechatronics approach
NASA Astrophysics Data System (ADS)
Shahri, Ali M.; Naghdy, Fazel
1998-09-01
At present there are approximately 110 million land-mines scattered around the world in 64 countries. The clearance of these mines takes place manually. Unfortunately, on average, one mine clearer is killed for every 5000 mines cleared. A Mine Detector Arm (MDA) using a mechatronics approach is under development in this work. The robot arm imitates the manual hand-prodding technique for mine detection. It inserts a bayonet into the soil and models the dynamics of the manipulator and environment parameters, such as stiffness variation in the soil, to control the impact caused by contacting a stiff object. An explicit impact control scheme is applied as the main control scheme, while two different intelligent control methods are designed to deal with uncertainties and varying environmental parameters. First, a neuro-fuzzy adaptive gain controller (NFAGC) is designed to adapt the force gain control according to the estimated environment stiffness. Then, an adaptive neuro-fuzzy plus PID controller is employed to switch from a conventional PID controller to neuro-fuzzy impact control (NFIC) when an impact is detected. The developed control schemes are validated through computer simulation and experimental work.
An Integrative data mining approach to identifying Adverse ...
The Adverse Outcome Pathway (AOP) framework is a tool for making biological connections and summarizing key information across different levels of biological organization to connect biological perturbations at the molecular level to adverse outcomes for an individual or population. Computational approaches to explore and determine these connections can accelerate the assembly of AOPs. By leveraging the wealth of publicly available data covering chemical effects on biological systems, computationally-predicted AOPs (cpAOPs) were assembled via data mining of high-throughput screening (HTS) in vitro data, in vivo data and other disease phenotype information. Frequent Itemset Mining (FIM) was used to find associations between the gene targets of ToxCast HTS assays and disease data from Comparative Toxicogenomics Database (CTD) by using the chemicals as the common aggregators between datasets. The method was also used to map gene expression data to disease data from CTD. A cpAOP network was defined by considering genes and diseases as nodes and FIM associations as edges. This network contained 18,283 gene to disease associations for the ToxCast data and 110,253 for CTD gene expression. Two case studies show the value of the cpAOP network by extracting subnetworks focused either on fatty liver disease or the Aryl Hydrocarbon Receptor (AHR). The subnetwork surrounding fatty liver disease included many genes known to play a role in this disease. When querying the cpAOP
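The association step in this abstract, linking gene targets to diseases through the chemicals they share, can be illustrated with a small counting sketch. It is not the authors' pipeline: it treats each chemical as one transaction and counts only gene-disease pairs rather than running a full frequent itemset algorithm, and the chemical names, gene-disease assignments, and `min_support` threshold are all hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy data: chemical -> gene targets active in screening assays.
chem_to_genes = {
    "chemA": {"AHR", "PPARG"},
    "chemB": {"AHR"},
    "chemC": {"PPARG", "NR1I2"},
}
# Hypothetical toy data: chemical -> diseases associated with that chemical.
chem_to_diseases = {
    "chemA": {"fatty liver"},
    "chemB": {"fatty liver", "dermatitis"},
    "chemC": {"fatty liver"},
}


def gene_disease_edges(genes_by_chem, diseases_by_chem, min_support=2):
    """Return {(gene, disease): support} for pairs that co-occur in at
    least `min_support` chemicals; each chemical acts as one transaction."""
    support = defaultdict(int)
    # Only chemicals present in both datasets can link a gene to a disease.
    for chem in genes_by_chem.keys() & diseases_by_chem.keys():
        for pair in product(genes_by_chem[chem], diseases_by_chem[chem]):
            support[pair] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}


# Surviving pairs become edges of a gene-disease network.
edges = gene_disease_edges(chem_to_genes, chem_to_diseases)
```

In this toy data only the pairs supported by two chemicals survive the threshold; pairs seen in a single chemical are dropped.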
Moore, Farid; Sheykhi, Vahideh; Salari, Mohammad; Bagheri, Adel
2016-04-01
This paper is a comprehensive assessment of the quality of soil in the Nakhlak mining district in Central Iran with special reference to potentially toxic metals. In this regard, an integrated approach combining geostatistics, a correlation matrix, pollution indices, and chemical fractionation measurements is used to evaluate selected potentially toxic metals in soil samples. The fractionation of metals indicated a relatively high variability. Some metals (Mo, Ag, and Pb) showed important enrichment in the bioavailable fractions (i.e., exchangeable and carbonate), whereas the residual fraction mostly comprised Sb and Cr. The Cd, Zn, Co, Ni, Mo, Cu, and As were retained in Fe-Mn oxide and oxidizable fractions, suggesting that they may be released to the environment by changes in physicochemical conditions. The spatial variability patterns of 11 soil heavy metals (Ag, As, Cd, Co, Cr, Cu, Mo, Ni, Pb, Sb, and Zn) were identified and mapped. The results demonstrated that Ag, As, Cd, Mo, Cu, Pb, Sb, and Zn pollution is associated with mineralized veins and mining operations in this area. Further environmental monitoring and remedial actions are required for management of soil heavy metals in the study area. The present study not only enhanced our knowledge regarding soil pollution in the study area but also introduced a better technique to analyze pollution indices by multivariate geostatistical methods.
Critical analysis of world uranium resources
Hall, Susan; Coleman, Margaret
2013-01-01
The U.S. Department of Energy, Energy Information Administration (EIA) joined with the U.S. Department of the Interior, U.S. Geological Survey (USGS) to analyze the world uranium supply and demand balance. To evaluate short-term primary supply (0–15 years), the analysis focused on Reasonably Assured Resources (RAR), which are resources projected with a high degree of geologic assurance and considered to be economically feasible to mine. Such resources include uranium resources from mines currently in production as well as resources that are in the stages of feasibility or of being permitted. Sources of secondary supply for uranium, such as stockpiles and reprocessed fuel, were also examined. To evaluate long-term primary supply, estimates of uranium from unconventional and from undiscovered resources were analyzed. At 2010 rates of consumption, uranium resources identified in operating or developing mines would fuel the world nuclear fleet for about 30 years. However, projections currently predict an increase in uranium requirements tied to expansion of nuclear energy worldwide. Under a low-demand scenario, requirements through the period ending in 2035 are about 2.1 million tU. In the low-demand case, uranium identified in existing and developing mines is adequate to supply requirements. However, whether or not these identified resources will be developed rapidly enough to provide an uninterrupted fuel supply to expanded nuclear facilities could not be determined. On the basis of a scenario of high demand through 2035, 2.6 million tU is required and identified resources in operating or developing mines are inadequate. Beyond 2035, when requirements could exceed resources in these developing properties, other sources will need to be developed from less well-assured resources, deposits not yet at the prefeasibility stage, resources that are currently subeconomic, secondary sources, undiscovered conventional resources, and unconventional uranium supplies.
This report’s analysis of 141 mines that are operating or are being actively developed identifies 2.7 million tU of in-situ uranium resources worldwide, approximately 2.1 million tU recoverable after mining and milling losses were deducted. Sixty-four operating mines report a total of 1.4 million tU of in-situ RAR (about 1 million tU recoverable). Seventy-seven developing mines/production centers report 1.3 million tU in-situ Reasonably Assured Resources (RAR) (about 1.1 million tU recoverable), which have a reasonable chance of producing uranium within 5 years. Most of the production is projected to come from conventional underground or open pit mines as opposed to in-situ leach mines. Production capacity in operating mines is about 76,000 tU/yr, and in developing mines is estimated at greater than 52,000 tU/yr. Production capacity in operating mines should be considered a maximum as mines seldom produce up to licensed capacity due to operational difficulties. In 2010, worldwide mines operated at 70 percent of licensed capacity, and production has never exceeded 89 percent of capacity. The capacity in developing mines is not always reported. In this study 35 percent of developing mines did not report a target licensed capacity, so estimates of future capacity may be too low. The Organisation for Economic Co-operation and Development’s Nuclear Energy Agency (NEA) and International Atomic Energy Agency (IAEA) estimate an additional 1.4 million tU economically recoverable resources, beyond that identified in operating or developing mines identified in this report. As well, 0.5 million tU in subeconomic resources, and 2.3 million tU in the geologically less certain inferred category are identified worldwide. These agencies estimate 2.2 million tU in secondary sources such as government and commercial stockpiles and re-enriched uranium tails. 
They also estimate that unconventional uranium supplies (uraniferous phosphate and black shale deposits) may contain up to 7.6 million tU. Although unconventional resources are currently subeconomic, the improvement of extraction techniques or the production of coproducts may make extraction of uranium from these types of deposits profitable. These agencies report a large undiscovered resource base; however, this class of resource should be considered speculative and will require intensive exploration programs to adequately define it as mineable. These resources may all contribute to uranium supply that would fuel the world nuclear fleet well beyond that calculated in this report. Production of resources in both operating and developing uranium mines is subject to uncertainties caused by technical, legal, regulatory, and financial challenges that combine to create long timelines between deposit discovery and mine production. This analysis indicates that mine development is proceeding too slowly to fully meet requirements for an expanded nuclear power reactor fleet in the near future (to 2035), and unless adequate secondary or unconventional resources can be identified, imbalances in supply and demand may occur.
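The supply and demand comparison in this abstract reduces to a few lines of arithmetic, sketched here using only figures quoted in the text. The variable names are illustrative, and applying the 2010 utilization rate to the quoted operating capacity is an assumption made for illustration, not a calculation taken from the report.

```python
# Figures quoted in the abstract (tU = tonnes of uranium).
RECOVERABLE_TU = 2_100_000         # recoverable, operating + developing mines
LOW_DEMAND_TU = 2_100_000          # requirements through 2035, low-demand case
HIGH_DEMAND_TU = 2_600_000         # requirements through 2035, high-demand case
OPERATING_CAPACITY_TU_YR = 76_000  # production capacity of operating mines
UTILIZATION_2010 = 0.70            # share of licensed capacity used in 2010

# Balance of identified recoverable resources against each demand scenario.
low_balance = RECOVERABLE_TU - LOW_DEMAND_TU
high_balance = RECOVERABLE_TU - HIGH_DEMAND_TU

# Assumption: 2010 utilization applied to the quoted operating capacity.
effective_output_tu_yr = round(OPERATING_CAPACITY_TU_YR * UTILIZATION_2010)

print(f"low-demand balance:  {low_balance:+,} tU")
print(f"high-demand balance: {high_balance:+,} tU")
print(f"effective output:    {effective_output_tu_yr:,} tU/yr")
```

The low-demand case balances exactly at the quoted precision, while the high-demand case leaves a gap of 0.5 million tU, which matches the abstract's statements that identified resources are adequate in the first scenario and inadequate in the second.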
Improving postapproval drug safety surveillance: getting better information sooner.
Hennessy, Sean; Strom, Brian L
2015-01-01
Adverse drug events (ADEs) are an important public health concern, accounting for 5% of all hospital admissions and two-thirds of all complications occurring shortly after hospital discharge. There are often long delays between when a drug is approved and when serious ADEs are identified. Recent and ongoing advances in drug safety surveillance include the establishment of government-sponsored networks of population databases, the use of data mining approaches, and the formal integration of diverse sources of drug safety information. These advances promise to reduce delays in identifying drug-related risks and in providing reassurance about the absence of such risks.
USGS Toxic Substances Hydrology Program, 2010
Buxton, Herbert T.
2010-01-01
The U.S. Geological Survey (USGS) Toxic Substances Hydrology Program adapts research priorities to address the most important contamination issues facing the Nation and to identify new threats to environmental health. The Program investigates two major types of contamination problems: subsurface point-source contamination, and watershed and regional contamination. Research objectives include developing remediation methods that use natural processes, characterizing and remediating contaminant plumes in fractured-rock aquifers, identifying new environmental contaminants, characterizing new and understudied pesticides in common pesticide-use settings, explaining mercury methylation and bioaccumulation, and developing approaches for remediating watersheds affected by active and historic mining.
Real-time diesel particulate monitor for underground mines.
Noll, James; Janisko, Samuel; Mischler, Steven E
The standard method for determining diesel particulate matter (DPM) exposures in underground metal/nonmetal mines provides the average exposure concentration for an entire working shift, and several weeks might pass before results are obtained. The main problem with this approach is that it only indicates that an overexposure has occurred rather than providing the ability to prevent an overexposure or detect its cause. Conversely, real-time measurement would provide miners with timely information to allow engineering controls to be deployed immediately and to identify the major factors contributing to any overexposures. Toward this purpose, the National Institute for Occupational Safety and Health (NIOSH) developed a laser extinction method to measure real-time elemental carbon (EC) concentrations (EC is a DPM surrogate). To employ this method, NIOSH developed a person-wearable instrument that was commercialized in 2011. This paper evaluates this commercial instrument, including the calibration curve, limit of detection, accuracy, and potential interferences. The instrument was found to meet the NIOSH accuracy criteria and to be capable of measuring DPM concentrations at levels observed in underground mines. In addition, it was found that a submicron size selector was necessary to avoid interference from mine dust and that cigarette smoke can be an interference when sampling in enclosed cabs.
Miners wives: Gender, culture, and society in the south Wales coalfields, 1919-1939
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gier, J.J.
1993-01-01
This study addresses the absence of historical research on the role of women in Welsh coalfield society through the use of oral history testimony, folk culture sources, literature, newspapers, union records and comparative data from other British and Australian coalfield regions. The thesis identifies the ways in which the domestic ideology and the vernacular culture of coalfield society influenced gender roles and relations in Welsh mining communities during the interwar period. Since the myth of the Miner and the Mining Mam signified the pervasive influence of both the domestic ideology and the vernacular culture, the aim of the study was to dismantle these ideals in order to reconstruct a history of miners' wives. To further this goal a life-cycle approach was used; the thesis examined courtship and marriage practices, domestic culture in the mining household, women's customary role in ritual surrounding birth and death, and their role in mining strikes and other forms of collective action. The study concludes that while the traditions of rural Wales tended to support a broader role for women in coalfield society, the domestic ideology denied the miner's wife her identity as a worker, and thus limited her participation in class struggle and obscured her role in the history of coalfield society.
A data mining based approach to predict spatiotemporal changes in satellite images
NASA Astrophysics Data System (ADS)
Boulila, W.; Farah, I. R.; Ettabaa, K. Saheb; Solaiman, B.; Ghézala, H. Ben
2011-06-01
The interpretation of remotely sensed images in a spatiotemporal context is becoming a valuable research topic. However, the constant growth of data volume in remote sensing imaging makes reaching conclusions based on collected data a challenging task. Recently, data mining has emerged as a promising research field leading to several interesting discoveries in various areas such as marketing, surveillance, fraud detection and scientific discovery. By integrating data mining and image interpretation techniques, accurate and relevant information (i.e. functional relation between observed parcels and a set of informational contents) can be automatically elicited. This study presents a new approach to predict spatiotemporal changes in satellite image databases. The proposed method exploits fuzzy sets and data mining concepts to build predictions and decisions for several remote sensing fields. It takes into account imperfections related to the spatiotemporal mining process in order to provide more accurate and reliable information about land cover changes in satellite images. The proposed approach is validated using SPOT images representing the Saint-Denis region, capital of Reunion Island. Results show good performance of the proposed framework in predicting change for the urban zone.
Geochemistry of Standard Mine Waters, Gunnison County, Colorado, July 2009
Verplanck, Philip L.; Manning, Andrew H.; Graves, Jeffrey T.; McCleskey, R. Blaine; Todorov, Todor I.; Lamothe, Paul J.
2009-01-01
In many hard-rock-mining districts, water flowing from abandoned mine adits is a primary source of metals to receiving streams. Understanding the generation of adit discharge is an important step in developing remediation plans. In 2006, the U.S. Environmental Protection Agency listed the Standard Mine in the Elk Creek drainage basin near Crested Butte, Colorado as a superfund site because drainage from the Standard Mine enters Elk Creek, contributing dissolved and suspended loads of zinc, cadmium, copper, and other metals to the stream. Elk Creek flows into Coal Creek, which is a source of drinking water for the town of Crested Butte. In 2006 and 2007, the U.S. Geological Survey undertook a hydrogeologic investigation of the Standard Mine and vicinity and identified areas of the underground workings for additional work. Mine drainage, underground-water samples, and selected spring water samples were collected in July 2009 for analysis of inorganic solutes as part of a follow-up study. Water analyses are reported for mine-effluent samples from Levels 1 and 5 of the Standard Mine, underground samples from Levels 2 and 3 of the Standard Mine, two spring samples, and an Elk Creek sample. Reported analyses include field measurements (pH, specific conductance, water temperature, dissolved oxygen, and redox potential), major constituents and trace elements, and oxygen and hydrogen isotopic determinations. Overall, water samples collected in 2009 at the same sites as were collected in 2006 have similar chemical compositions. Similar to 2006, water in Level 3 did not flow out the portal but was observed to flow into open workings to lower parts of the mine. Many dissolved constituent concentrations, including calcium, magnesium, sulfate, manganese, zinc, and cadmium, are substantially lower in Level 3 waters than in Level 1 effluent.
Concentrations of these dissolved constituents in water samples collected from Level 2 approach or exceed concentrations of Level 1 effluent suggesting that water-rock interaction between Levels 3 and 1 can account for the elevated concentration of metals and other constituents in Level 1 portal effluent. Ore minerals (sphalerite, argentiferous galena, and chalcopyrite) are the likely sources of zinc, cadmium, lead, and copper and are present within the mine in unmined portions of the vein system, within plugged ore chutes, and in muck piles.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kilpatrick, Laura E.; Cotter, Ed
The U.S. Department of Energy (DOE) Office of Legacy Management is responsible for administering the DOE Uranium Leasing Program (ULP) and its 31 uranium lease tracts located in the Uravan Mineral Belt of southwestern Colorado (see Figure 1). In addition to administering the ULP for the last six decades, DOE has also undertaken the significant task of reclaiming a large number of abandoned uranium (legacy) mine sites and associated features located throughout the Uravan Mineral Belt. In 1995, DOE initiated a 3-year reconnaissance program to locate and delineate (through extensive on-the-ground mapping) the legacy mine sites and associated features contained within the historically defined boundaries of its uranium lease tracts. During that same time frame, DOE recognized the lack of regulations pertaining to the reclamation of legacy mine sites and contacted the U.S. Bureau of Land Management (BLM) concerning the reclamation of legacy mine sites. In November 1995, the BLM Colorado State Office formally issued the United States Department of the Interior, Colorado Bureau of Land Management, Closure/Reclamation Guidelines, Abandoned Uranium Mine Sites as a supplement to its Solid Minerals Reclamation Handbook (H-3042-1). Over the next five-and-one-half years, DOE reclaimed the 161 legacy mine sites that had been identified on DOE withdrawn lands. By the late 1990s, the various BLM field offices in southwestern Colorado began to recognize DOE's experience and expertise in reclaiming legacy mine sites. During the ensuing 8 years, BLM funded DOE (through a series of task orders) to perform reclamation activities at 182 BLM mine sites. To date, DOE has reclaimed 372 separate and distinct legacy mine sites. During this process, DOE has learned many lessons and is willing to share those lessons with others in the reclamation industry because there are still many legacy mine sites not yet reclaimed.
DOE currently administers 31 lease tracts (11,017 ha) that collectively contain over 220 legacy (abandoned) uranium mine sites. This contrasts to the millions of hectares administered by the BLM, the U.S. Forest Service, and other federal, tribal, and state agencies that contain thousands of such sites. DOE believes that the processes it has used provide a practical and cost-effective approach to abandoned uranium mine-site reclamation. Although the Federal Acquisition Regulations preclude DOE from competing with private industry, DOE is available to assist other governmental and tribal agencies in their reclamation efforts. (authors)
Farrington, John D
2005-07-01
Mongolia's protected areas cover 20.5 million ha or 13.1% of its national territory. Existing and proposed protected areas, however, are threatened by mining. Mining impacts on Mongolia's protected areas are diverse and include licensed and unlicensed mineral activities in protected areas, buffer zone disturbance, and prevention of the establishment of proposed protected areas. Review of United States, Canadian, and Australian policies revealed 9 basic approaches to resolving conflicts between protected areas and mining. Four approaches suitable for Mongolia are granting land trades and special dispensations in exchange for mineral licenses in protected areas; granting protected status to all lapsed mineral licenses in protected areas; voluntary forfeiting of mineral licenses in protected areas in exchange for positive corporate publicity; and prohibiting all new mineral activities in existing and proposed protected areas. Mining is Mongolia's most important industry, however, and the long-term benefits of preserving Mongolia's natural heritage must be considered and weighed against the economic benefits and costs of mining activities.
Effective application of improved profit-mining algorithm for the interday trading model.
Hsieh, Yu-Lung; Yang, Don-Lin; Wu, Jungpin
2014-01-01
Many real world applications of association rule mining from large databases help users make better decisions. However, they do not work well in financial markets at this time. In addition to a high profit, an investor also looks for a low risk trading with a better rate of winning. The traditional approach of using minimum confidence and support thresholds needs to be changed. Based on an interday model of trading, we proposed effective profit-mining algorithms which provide investors with profit rules including information about profit, risk, and winning rate. Since profit-mining in the financial market is still in its infant stage, it is important to detail the inner working of mining algorithms and illustrate the best way to apply them. In this paper we go into details of our improved profit-mining algorithm and showcase effective applications with experiments using real world trading data. The results show that our approach is practical and effective with good performance for various datasets. PMID:24688442
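The abstract contrasts support and confidence thresholds with rules that also carry profit, risk, and winning rate. A minimal sketch of how such per-rule statistics could be computed from daily data follows. It is not the authors' algorithm: the `RuleStats` fields, the use of the worst single loss as a risk proxy, and the sample numbers are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class RuleStats:
    support: float      # fraction of all days on which the rule fired
    win_rate: float     # fraction of resulting trades with a positive return
    mean_profit: float  # average return per trade
    worst_loss: float   # most negative return, used here as a crude risk proxy


def evaluate_rule(fired, returns):
    """Score one trading rule.

    fired[i] is True if the rule triggered on day i; returns[i] is the
    return of the trade opened on day i and closed on the following day.
    """
    trades = [r for f, r in zip(fired, returns) if f]
    if not trades:
        raise ValueError("rule never fired")
    return RuleStats(
        support=len(trades) / len(fired),
        win_rate=sum(r > 0 for r in trades) / len(trades),
        mean_profit=sum(trades) / len(trades),
        worst_loss=min(trades),
    )


# Hypothetical six-day history for a single rule.
fired = [True, False, True, True, False, True]
returns = [0.02, -0.01, -0.01, 0.03, 0.00, 0.01]
stats = evaluate_rule(fired, returns)
```

A rule could then be kept only if it clears thresholds on several of these fields at once, rather than on support and confidence alone.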
Feature-opinion pair identification of product reviews in Chinese: a domain ontology modeling method
NASA Astrophysics Data System (ADS)
Yin, Pei; Wang, Hongwei; Guo, Kaiqiang
2013-03-01
With the emergence of the new economy based on social media, a great amount of consumer feedback on particular products is conveyed through widespread online product reviews, making opinion mining a growing interest for both academia and industry. Based on the characteristic mode of expression in Chinese, this research proposes an ontology-based linguistic model to identify the basic appraisal expression in Chinese product reviews: the "feature-opinion pair (FOP)." The product-oriented domain ontology is first constructed automatically, then algorithms to identify FOPs are designed by mapping product features and opinions to the conceptual space of the domain ontology, and finally comparative experiments are conducted to evaluate the model. Experimental results indicate that the proposed approach obtains more accurate results than the state-of-the-art algorithms. Furthermore, through identifying and analyzing FOPs, the unstructured product reviews are converted into structured and machine-sensible expression, which provides valuable information for business application. This paper contributes to the related research in opinion mining by developing a solid foundation for further sentiment analysis at a fine-grained level and proposing a general way for automatic ontology construction.
NASA Technical Reports Server (NTRS)
Wier, C. E.; Wobber, F. J. (Principal Investigator); Russell, O. R.; Amato, R. V.
1973-01-01
The author has identified the following significant results. The utility of ERTS-1/high altitude aircraft imagery to detect underground mine hazards is strongly suggested. A 1:250,000 scale mined lands map of the Vincennes Quadrangle, Indiana has been prepared. This map is a prototype for a national mined lands inventory and will be distributed to State and Federal offices.
Wagland, Richard; Recio-Saucedo, Alejandra; Simon, Michael; Bracher, Michael; Hunt, Katherine; Foster, Claire; Downing, Amy; Glaser, Adam; Corner, Jessica
2016-08-01
Quality of cancer care may greatly impact on patients' health-related quality of life (HRQoL). Free-text responses to patient-reported outcome measures (PROMs) provide rich data but analysis is time and resource-intensive. This study developed and tested a learning-based text-mining approach to facilitate analysis of patients' experiences of care and develop an explanatory model illustrating impact on HRQoL. Respondents to a population-based survey of colorectal cancer survivors provided free-text comments regarding their experience of living with and beyond cancer. An existing coding framework was tested and adapted, which informed learning-based text mining of the data. Machine-learning algorithms were trained to identify comments relating to patients' specific experiences of service quality, which were verified by manual qualitative analysis. Comparisons between coded retrieved comments and a HRQoL measure (EQ5D) were explored. The survey response rate was 63.3% (21 802/34 467), of which 25.8% (n=5634) participants provided free-text comments. Of retrieved comments on experiences of care (n=1688), over half (n=1045, 62%) described positive care experiences. Most negative experiences concerned a lack of post-treatment care (n=191, 11% of retrieved comments) and insufficient information concerning self-management strategies (n=135, 8%) or treatment side effects (n=160, 9%). Associations existed between HRQoL scores and coded algorithm-retrieved comments. Analysis indicated that the mechanism by which service quality impacted on HRQoL was the extent to which services prevented or alleviated challenges associated with disease and treatment burdens. Learning-based text mining techniques were found useful and practical tools to identify specific free-text comments within a large dataset, facilitating resource-efficient qualitative analysis. This method should be considered for future PROM analysis to inform policy and practice. 
Study findings indicated that perceived care quality directly impacts on HRQoL.
EVALUATION OF A TWO-STAGE PASSIVE TREATMENT APPROACH FOR MINING INFLUENCE WATERS
A two-stage passive treatment approach was assessed at bench-scale using two Colorado Mining Influenced Waters (MIWs). The first-stage was a limestone drain with the purpose of removing iron and aluminum and mitigating the potential effects of mineral acidity. The second stage w...
Revealing Learner Interests through Topic Mining from Question-Answering Data
ERIC Educational Resources Information Center
Dun, Yijie; Wang, Na; Wang, Min; Hao, Tianyong
2017-01-01
In a question-answering system, learner generated content including asked and answered questions is a meaningful resource to capture learning interests. This paper proposes an approach based on question topic mining for revealing learners' concerned topics in real community question-answering systems. The authors' approach firstly preprocesses all…
NASA Astrophysics Data System (ADS)
Nez, N.
2017-12-01
By effectively engaging in government-to-government consultation, the Tonto National Forest is able to consider oral histories and tribal cultural knowledge in decision making. These conversations often have the potential to lead to the protection and preservation of public lands. Discussed here is one example of successful tribal consultation and how it led to the protection of Traditional Cultural Properties (TCPs). One hour east of Phoenix, Arizona, on the Tonto National Forest, Resolution Copper Mine is working to access a rich copper vein more than 7,000 feet deep. As part of the mining plan of operation, they are investigating viable locations to store the earth removed from the mine site. One proposed storage location required hydrologic and geotechnical studies to determine viability. This constituted a significant amount of ground disturbance in an area that is of known importance to local Indian tribes. To ensure proper consideration of tribal concerns, the Forest engaged nine local tribes in government-to-government consultation. Consultation resulted in the identification of five springs in the project area considered TCPs by the Western Apache tribes. Due to the presence of identified TCPs, the Forest asked tribes to assist in the development of mitigation measures to minimize effects of this project on the TCPs identified. The goal of this partnership was to find a way for the Mine to still be able to gather data while protecting TCPs. During field visits and consultations, a wide range of concerns were shared, which were recorded and considered by Tonto National Forest. The Forest developed a proposed mitigation approach to protect springs, which would not permit the installation of water monitoring wells, geotechnical borings or trench excavations within 1,200 feet of perennial springs in the project area. As an added mitigation measure, a cultural resources specialist would be on-site during all ground-disturbing activities.
Diligent work on behalf of the tribes and the forest resulted in finding mutually acceptable means to allow this project work to commence while respecting the cultural values of the tribes.
New directions in biomedical text annotation: definitions, guidelines and corpus construction
Wilbur, W John; Rzhetsky, Andrey; Shatkay, Hagit
2006-01-01
Background While biomedical text mining is emerging as an important research area, practical results have proven difficult to achieve. We believe that an important first step towards more accurate text-mining lies in the ability to identify and characterize text that satisfies various types of information needs. We report here the results of our inquiry into properties of scientific text that have sufficient generality to transcend the confines of a narrow subject area, while supporting practical mining of text for factual information. Our ultimate goal is to annotate a significant corpus of biomedical text and train machine learning methods to automatically categorize such text along certain dimensions that we have defined. Results We have identified five qualitative dimensions that we believe characterize a broad range of scientific sentences, and are therefore useful for supporting a general approach to text-mining: focus, polarity, certainty, evidence, and directionality. We define these dimensions and describe the guidelines we have developed for annotating text with regard to them. To examine the effectiveness of the guidelines, twelve annotators independently annotated the same set of 101 sentences that were randomly selected from current biomedical periodicals. Analysis of these annotations shows 70–80% inter-annotator agreement, suggesting that our guidelines indeed present a well-defined, executable and reproducible task. Conclusion We present our guidelines defining a text annotation task, along with annotation results from multiple independently produced annotations, demonstrating the feasibility of the task. The annotation of a very large corpus of documents along these guidelines is currently ongoing. These annotations form the basis for the categorization of text along multiple dimensions, to support viable text mining for experimental results, methodology statements, and other forms of information. 
We are currently developing machine learning methods, to be trained and tested on the annotated corpus, that would allow for the automatic categorization of biomedical text along the general dimensions that we have presented. The guidelines in full detail, along with annotated examples, are publicly available. PMID:16867190
An Outbreak of Lymphocutaneous Sporotrichosis among Mine-Workers in South Africa.
Govender, Nelesh P; Maphanga, Tsidiso G; Zulu, Thokozile G; Patel, Jaymati; Walaza, Sibongile; Jacobs, Charlene; Ebonwu, Joy I; Ntuli, Sindile; Naicker, Serisha D; Thomas, Juno
2015-09-01
The largest outbreak of sporotrichosis occurred between 1938 and 1947 in the gold mines of Witwatersrand in South Africa. Here, we describe an outbreak of lymphocutaneous sporotrichosis that was investigated in a South African gold mine in 2011. Employees working at a reopened section of the mine were recruited for a descriptive cross-sectional study. Informed consent was sought for interview, clinical examination and medical record review. Specimens were collected from participants with active or partially-healed lymphocutaneous lesions. Environmental samples were collected from underground mine levels. Sporothrix isolates were identified by sequencing of the internal transcribed spacer region of the ribosomal gene and the nuclear calmodulin gene. Of 87 male miners, 81 (93%) were interviewed and examined, of whom 29 (36%) had skin lesions; specimens were collected from 17 (59%). Sporotrichosis was laboratory-confirmed among 10 patients and seven had clinically-compatible lesions. Of 42 miners with known HIV status, 11 (26%) were HIV-infected. No cases of disseminated disease were detected. Participants with ≤ 3 years' mining experience had four times greater odds of developing sporotrichosis than those who had been employed for >3 years (adjusted OR 4.0, 95% CI 1.2-13.1). Isolates from 8 patients were identified as Sporothrix schenckii sensu stricto by calmodulin gene sequencing while environmental isolates were identified as Sporothrix mexicana. S. schenckii sensu stricto was identified as the causative pathogen. Although genetically distinct species were isolated from clinical and environmental sources, it is likely that the source was contaminated soil and untreated wood underground. No cases occurred following recommendations to close sections of the mine, treat timber and encourage consistent use of personal protective equipment. 
Sporotrichosis is a potentially re-emerging disease where traditional, rather than heavily mechanised, mining techniques are used. Surveillance should be instituted at sentinel locations.
The Forestry Reclamation Approach: guide to successful reforestation of mined lands
Mary Beth Adams
2017-01-01
Appalachian forests are among the most productive and diverse in the world. The land underlying them is also rich in coal, and surface mines operated on more than 2.4 million acres in the region from 1977, when the federal Surface Mining Control and Reclamation Act was passed, through 2015. Many efforts to reclaim mined lands most often resulted in the establishment of...
Si, Lei; Wang, Zhongbin; Liu, Xinhua; Tan, Chao; Liu, Ze; Xu, Jing
2016-01-01
Shearers play an important role in fully mechanized coal mining face and accurately identifying their cutting pattern is very helpful for improving the automation level of shearers and ensuring the safety of coal mining. The least squares support vector machine (LSSVM) has been proven to offer strong potential in prediction and classification issues, particularly by employing an appropriate meta-heuristic algorithm to determine the values of its two parameters. However, these meta-heuristic algorithms have the drawbacks of being hard to understand and reaching the global optimal solution slowly. In this paper, an improved fly optimization algorithm (IFOA) to optimize the parameters of LSSVM was presented and the LSSVM coupled with IFOA (IFOA-LSSVM) was used to identify the shearer cutting pattern. The vibration acceleration signals of five cutting patterns were collected and the special state features were extracted based on the ensemble empirical mode decomposition (EEMD) and the kernel function. Some examples on the IFOA-LSSVM model were further presented and the results were compared with LSSVM, PSO-LSSVM, GA-LSSVM and FOA-LSSVM models in detail. The comparison results indicate that the proposed approach was feasible, efficient and outperformed the others. Finally, an industrial application example at the coal mining face was demonstrated to specify the effect of the proposed system. PMID:26771615
MinePath: Mining for Phenotype Differential Sub-paths in Molecular Pathways
Koumakis, Lefteris; Kartsaki, Evgenia; Chatzimina, Maria; Zervakis, Michalis; Vassou, Despoina; Marias, Kostas; Moustakis, Vassilis; Potamias, George
2016-01-01
Pathway analysis methodologies couple traditional gene expression analysis with knowledge encoded in established molecular pathway networks, offering a promising approach towards the biological interpretation of phenotype differentiating genes. Early pathway analysis methodologies, known as gene set analysis (GSA), view pathways just as plain lists of genes without taking into account either the underlying pathway network topology or the involved gene regulatory relations. These approaches, even if they achieve computational efficiency and simplicity, consider pathways that involve the same genes as equivalent in terms of their gene enrichment characteristics. Most recent pathway analysis approaches take into account the underlying gene regulatory relations by examining their consistency with gene expression profiles and computing a score for each profile. Even with this approach, assessing and scoring single-relations limits the ability to reveal key gene regulation mechanisms hidden in longer pathway sub-paths. We introduce MinePath, a pathway analysis methodology that addresses and overcomes the aforementioned problems. MinePath facilitates the decomposition of pathways into their constituent sub-paths. Decomposition leads to the transformation of single-relations to complex regulation sub-paths. Regulation sub-paths are then matched with gene expression sample profiles in order to evaluate their functional status and to assess phenotype differential power. Assessment of differential power supports the identification of the most discriminant profiles. In addition, MinePath assesses the significance of the pathways as a whole, ranking them by their p-values. Comparison results with state-of-the-art pathway analysis systems are indicative of the soundness and reliability of the MinePath approach. 
In contrast with many pathway analysis tools, MinePath is a web-based system (www.minepath.org) offering dynamic and rich pathway visualization functionality, with the unique characteristic to color regulatory relations between genes and reveal their phenotype inclination. This unique characteristic makes MinePath a valuable tool for in silico molecular biology experimentation as it serves the biomedical researchers’ exploratory needs to reveal and interpret the regulatory mechanisms that underlie and putatively govern the expression of target phenotypes. PMID:27832067
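The decomposition step described in this abstract can be pictured with a small sketch. This is not the MinePath implementation: the function name `enumerate_subpaths`, the `min_edges` parameter, and the toy gene names are all hypothetical. It assumes a pathway is given as a list of directed regulatory edges, and it lists every simple directed path with at least two relations.

```python
def enumerate_subpaths(edges, min_edges=2):
    """List all simple directed paths with at least `min_edges` edges.

    `edges` is a list of (source, target) pairs. This is an illustrative
    decomposition only; the real MinePath rules may differ.
    """
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)

    paths = []

    def extend(path):
        # Record the path once it spans enough regulatory relations.
        if len(path) - 1 >= min_edges:
            paths.append(tuple(path))
        for nxt in graph.get(path[-1], []):
            if nxt not in path:  # keep paths simple (no repeated gene)
                extend(path + [nxt])

    for node in sorted({n for edge in edges for n in edge}):
        extend([node])
    return paths


# Hypothetical toy pathway: geneA -> geneB -> geneC -> geneD
toy_edges = [("geneA", "geneB"), ("geneB", "geneC"), ("geneC", "geneD")]
subpaths = enumerate_subpaths(toy_edges)
```

Each returned tuple is one candidate sub-path. Matching sub-paths against expression profiles, as the abstract describes, would be a separate step.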
MinePath: Mining for Phenotype Differential Sub-paths in Molecular Pathways.
Koumakis, Lefteris; Kanterakis, Alexandros; Kartsaki, Evgenia; Chatzimina, Maria; Zervakis, Michalis; Tsiknakis, Manolis; Vassou, Despoina; Kafetzopoulos, Dimitris; Marias, Kostas; Moustakis, Vassilis; Potamias, George
2016-11-01
Pathway analysis methodologies couple traditional gene expression analysis with knowledge encoded in established molecular pathway networks, offering a promising approach towards the biological interpretation of phenotype differentiating genes. Early pathway analysis methodologies, known as gene set analysis (GSA), view pathways just as plain lists of genes without taking into account either the underlying pathway network topology or the involved gene regulatory relations. These approaches, even if they achieve computational efficiency and simplicity, consider pathways that involve the same genes as equivalent in terms of their gene enrichment characteristics. Most recent pathway analysis approaches take into account the underlying gene regulatory relations by examining their consistency with gene expression profiles and computing a score for each profile. Even with this approach, assessing and scoring single-relations limits the ability to reveal key gene regulation mechanisms hidden in longer pathway sub-paths. We introduce MinePath, a pathway analysis methodology that addresses and overcomes the aforementioned problems. MinePath facilitates the decomposition of pathways into their constituent sub-paths. Decomposition leads to the transformation of single-relations to complex regulation sub-paths. Regulation sub-paths are then matched with gene expression sample profiles in order to evaluate their functional status and to assess phenotype differential power. Assessment of differential power supports the identification of the most discriminant profiles. In addition, MinePath assesses the significance of the pathways as a whole, ranking them by their p-values. Comparison results with state-of-the-art pathway analysis systems are indicative of the soundness and reliability of the MinePath approach. 
In contrast with many pathway analysis tools, MinePath is a web-based system (www.minepath.org) offering dynamic and rich pathway visualization functionality, with the unique characteristic to color regulatory relations between genes and reveal their phenotype inclination. This unique characteristic makes MinePath a valuable tool for in silico molecular biology experimentation as it serves the biomedical researchers' exploratory needs to reveal and interpret the regulatory mechanisms that underlie and putatively govern the expression of target phenotypes.
Konias, Sokratis; Chouvarda, Ioanna; Vlahavas, Ioannis; Maglaveras, Nicos
2005-09-01
Current approaches for mining association rules usually assume that the mining is performed in a static database, where the problem of missing attribute values does not practically exist. However, these assumptions do not hold in some medical databases, such as those of a home care system. In this paper, a novel uncertainty rule algorithm is illustrated, namely URG-2 (Uncertainty Rule Generator), which addresses the problem of mining dynamic databases containing missing values. This algorithm requires only one pass over the initial dataset in order to generate the item set, while new metrics corresponding to the notion of Support and Confidence are used. URG-2 was evaluated over two medical databases, randomly introducing multiple missing values for each record's attribute (rate: 5-20% by 5% increments) in the initial dataset. Compared with the classical approach (records with missing values are ignored), the proposed algorithm was more robust in mining rules from datasets containing missing values. In all cases, the difference in preserving the initial rules ranged between 30% and 60% in favour of URG-2. Moreover, due to its incremental nature, URG-2 saved over 90% of the time required for thorough re-mining. Thus, the proposed algorithm can offer a preferable solution for mining in dynamic relational databases.
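The abstract does not define URG-2's modified Support and Confidence metrics, so the sketch below shows only one simple way such metrics can tolerate missing values. It is an assumption for illustration, not the URG-2 definition: a record is skipped only when an attribute needed by the rule is missing, rather than whenever any attribute is missing. The function names and the toy records are hypothetical.

```python
from typing import Optional

Record = dict[str, Optional[str]]  # None marks a missing attribute value


def _matches(record: Record, items: dict[str, str]) -> bool:
    return all(record[attr] == value for attr, value in items.items())


def _usable(records: list[Record], items: dict[str, str]) -> list[Record]:
    # Keep records in which every attribute used by `items` is observed.
    return [r for r in records if all(r.get(attr) is not None for attr in items)]


def support(records: list[Record], items: dict[str, str]) -> float:
    """Share of usable records that contain every (attribute, value) pair."""
    usable = _usable(records, items)
    if not usable:
        return 0.0
    return sum(_matches(r, items) for r in usable) / len(usable)


def confidence(records, antecedent, consequent) -> float:
    """Share of usable antecedent matches that also satisfy the consequent."""
    usable = _usable(records, {**antecedent, **consequent})
    matched = [r for r in usable if _matches(r, antecedent)]
    if not matched:
        return 0.0
    return sum(_matches(r, consequent) for r in matched) / len(matched)


# Hypothetical home-care style records; the second has a missing heart rate.
records = [
    {"bp": "high", "hr": "fast"},
    {"bp": "high", "hr": None},
    {"bp": "low", "hr": "slow"},
    {"bp": "high", "hr": "slow"},
]
```

With this scheme the second record still contributes to the support of `{"bp": "high"}` even though its `hr` value is missing, whereas the classical approach mentioned in the abstract would discard it entirely.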
Stefănescu, Lucrina; Robu, Brînduşa Mihaela; Ozunu, Alexandru
2013-11-01
The environmental impact assessment of mining sites is currently a topic of great interest in Romania. Historical pollution in the Rosia Montana mining area of Romania caused extensive damage to environmental media. This paper has two goals: to investigate the environmental pollution induced by mining activities in the Rosia Montana area and to quantify the environmental impacts and associated risks by means of an integrated approach. Thus, a new method was developed and applied for quantifying the impact of mining activities, taking account of the quality of environmental media in the mining area, and used as a case study in the present paper. The associated risks are a function of the environmental impacts and the probability of their occurrence. The results show that the environmental impacts and quantified risks, based on quality indicators to characterize the environmental quality, are of a higher order, and thus measures for pollution remediation and control need to be considered in the investigated area. The conclusion drawn is that an integrated approach for the assessment of environmental impact and associated risks is a valuable and more objective method, and is an important tool that can be applied in the decision-making process for national authorities in the prioritization of emergency action.
Microbially assisted phytoremediation approaches for two multi-element contaminated sites.
Langella, Francesca; Grawunder, Anja; Stark, Romy; Weist, Aileen; Merten, Dirk; Haferburg, Götz; Büchel, Georg; Kothe, Erika
2014-01-01
Phytoremediation is an environmentally friendly, cost-effective technology for a soft restoration of abandoned mine sites. The grasses Agrostis capillaris, Deschampsia flexuosa and Festuca rubra, and the annual herb Helianthus annuus were combined with microbial consortia in pot experiments on multi-metal polluted substrates collected at a former uranium mine near Ronneburg, Germany, and a historic copper mine in Kopparberg, Sweden, to test for phytoextraction versus phytostabilization abilities. Metal uptake into plant biomass was evaluated to identify optimal plant-microbe combinations for each substrate. Metal bioavailability was found to be plant species and element specific, and influenced by the applied bacterial consortia of 10 strains, each isolated from the same soil to which it was applied. H. annuus showed high extraction capacity for several metals on the German soil independent of inoculation. Our study also showed a significant enhancement of extraction for F. rubra and A. capillaris when combined with the bacterial consortium, although grasses are usually considered metal excluder species. On the Swedish substrate, which was mixed with 30 % bark compost because of its toxicity, A. capillaris inoculated with the respective consortium was able to extract multi-metal contaminants.
Marateb, Hamid Reza; Mansourian, Marjan; Adibi, Peyman; Farina, Dario
2014-01-01
Background: Selecting the correct statistical test and data mining method depends highly on the measurement scale of data, type of variables, and purpose of the analysis. Different measurement scales are studied in detail, and statistical comparison, modeling, and data mining methods are studied using several medical examples. We have presented two ordinal-variable clustering examples, ordinal variables being the more challenging variable type in analysis, using the Wisconsin Breast Cancer Data (WBCD). Ordinal-to-Interval scale conversion example: a breast cancer database of nine 10-level ordinal variables for 683 patients was analyzed by two ordinal-scale clustering methods. The performance of the clustering methods was assessed by comparison with the gold standard groups of malignant and benign cases that had been identified by clinical tests. Results: The sensitivity and accuracy of the two clustering methods were 98% and 96%, respectively. Their specificity was comparable. Conclusion: By using an appropriate clustering algorithm based on the measurement scale of the variables in the study, high performance is obtained. Moreover, descriptive and inferential statistics, in addition to the modeling approach, must be selected based on the scale of the variables. PMID:24672565
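The performance figures quoted above (sensitivity, specificity, accuracy) come from comparing cluster assignments with the gold-standard malignant/benign labels. The sketch below shows how these three quantities are conventionally computed. The function name `diagnostic_metrics` and the toy labels are hypothetical and are not data from the study.

```python
def diagnostic_metrics(true_labels, predicted_labels, positive="malignant"):
    """Return sensitivity, specificity and accuracy for a two-class result."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == positive:
            if pred == positive:
                tp += 1  # malignant case correctly identified
            else:
                fn += 1  # malignant case missed
        elif pred == positive:
            fp += 1      # benign case wrongly flagged
        else:
            tn += 1      # benign case correctly identified
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "accuracy": (tp + tn) / total if total else float("nan"),
    }


# Hypothetical toy example: 4 malignant and 6 benign gold-standard cases.
truth = ["malignant"] * 4 + ["benign"] * 6
predicted = ["malignant"] * 3 + ["benign"] + ["benign"] * 5 + ["malignant"]
metrics = diagnostic_metrics(truth, predicted)
```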
Hilson, Gavin
2006-06-01
This paper critiques contemporary research and policy approaches taken toward the analysis and abatement of mercury pollution in the small-scale gold mining sector. Unmonitored releases of mercury from gold amalgamation have caused considerable environmental contamination and human health complications in rural reaches of sub-Saharan Africa, Latin America and Asia. Whilst these problems have caught the attention of the scientific community over the past 15-20 years, the research that has since been undertaken has failed to identify appropriate mitigation measures, and has done little to advance understanding of why contamination persists. Moreover, the strategies used to educate operators about the impacts of acute mercury exposure, and the technologies implemented to prevent further pollution, have been marginally effective at best. The mercury pollution problem will not be resolved until governments and donor agencies commit to carrying out research aimed at improving understanding of the dynamics of small scale gold mining communities. Acquisition of this knowledge is the key to designing and implementing appropriate support and abatement measures.
Magenes, G; Bellazzi, R; Malovini, A; Signorini, M G
2016-08-01
The onset of fetal pathologies can be screened during pregnancy by means of Fetal Heart Rate (FHR) monitoring and analysis. Noticeable advances in understanding FHR variations were obtained in the last twenty years, thanks to the introduction of quantitative indices extracted from the FHR signal. This study seeks to discriminate between Normal and Intra Uterine Growth Restricted (IUGR) fetuses by applying data mining techniques to FHR parameters, obtained from recordings in a population of 122 fetuses (61 healthy and 61 IUGRs), through standard CTG non-stress test. We computed N=12 indices (N=4 related to time domain FHR analysis, N=4 to frequency domain and N=4 to non-linear analysis) and normalized them with respect to the gestational week. We compared, through a 10-fold crossvalidation procedure, 15 data mining techniques in order to select the most reliable approach for identifying IUGR fetuses. The results of this comparison highlight that two techniques (Random Forest and Logistic Regression) show the best classification accuracy and that both outperform the best single parameter in terms of mean AUROC on the test sets.
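The evaluation protocol in this abstract (10-fold cross-validation scored by mean AUROC on the test folds) can be sketched with the standard library alone. The helper names `k_fold_indices` and `auroc` are hypothetical. The fold assignment is a plain interleaved split, not necessarily the stratification the authors used, and no classifier is included.

```python
def k_fold_indices(n_samples, k=10):
    """Split sample indices 0..n_samples-1 into k interleaved test folds."""
    return [list(range(start, n_samples, k)) for start in range(k)]


def auroc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    `labels` uses 1 for the positive class (e.g. IUGR) and 0 otherwise;
    ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


folds = k_fold_indices(122, k=10)  # 122 recordings, as in the abstract
example_auc = auroc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

In a full comparison, each technique would be trained on nine folds and scored on the tenth with `auroc`, and the ten values averaged.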
Selection of remedial alternatives for mine sites: a multicriteria decision analysis approach.
Betrie, Getnet D; Sadiq, Rehan; Morin, Kevin A; Tesfamariam, Solomon
2013-04-15
The selection of remedial alternatives for mine sites is a complex task because it involves multiple criteria, often with conflicting objectives. However, an existing framework used to select remedial alternatives lacks multicriteria decision analysis (MCDA) aids and does not consider uncertainty in the selection of alternatives. The objective of this paper is to improve the existing framework by introducing deterministic and probabilistic MCDA methods. The Preference Ranking Organization Method for Enrichment Evaluation (PROMETHEE) methods have been implemented in this study. The MCDA analysis involves preparing the inputs to the PROMETHEE methods (identifying the alternatives, defining the criteria, defining the criteria weights using the analytic hierarchy process (AHP), defining the probability distribution of criteria weights, and conducting Monte Carlo simulation (MCS)); running the PROMETHEE methods using these inputs; and conducting a sensitivity analysis. A case study was presented to demonstrate the improved framework at a mine site. The results showed that the improved framework provides a reliable way of selecting remedial alternatives as well as quantifying the impact of different criteria on selecting alternatives. Copyright © 2013 Elsevier Ltd. All rights reserved.
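A deterministic PROMETHEE II ranking, the kind of calculation this abstract refers to, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses the simplest ("usual") preference function, treats higher criterion values as better, and the alternative names, scores and weights are hypothetical.

```python
def promethee_ii(scores, weights):
    """Rank alternatives by PROMETHEE II net outranking flow.

    scores:  {alternative: [value per criterion]}, higher is better.
    weights: one non-negative weight per criterion (normalised internally).
    Uses the 'usual' preference function: full preference for any advantage.
    """
    total = sum(weights)
    w = [x / total for x in weights]
    names = list(scores)
    others = len(names) - 1  # requires at least two alternatives

    def preference(a, b):
        # Weighted share of criteria on which a strictly beats b.
        return sum(wj for wj, xa, xb in zip(w, scores[a], scores[b]) if xa > xb)

    net_flow = {}
    for a in names:
        leaving = sum(preference(a, b) for b in names if b != a) / others
        entering = sum(preference(b, a) for b in names if b != a) / others
        net_flow[a] = leaving - entering
    return sorted(net_flow.items(), key=lambda item: item[1], reverse=True)


# Hypothetical remedial alternatives scored on two criteria
# (e.g. effectiveness and affordability, both on a higher-is-better scale).
alternatives = {"cap": [3, 1], "treat": [2, 3], "no_action": [1, 2]}
ranking = promethee_ii(alternatives, weights=[0.5, 0.5])
```

The probabilistic variant described in the abstract would repeat this ranking for many weight vectors drawn from the assumed weight distributions and summarise how often each alternative ranks first.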
On-line Machine Learning and Event Detection in Petascale Data Streams
NASA Astrophysics Data System (ADS)
Thompson, David R.; Wagstaff, K. L.
2012-01-01
Traditional statistical data mining involves off-line analysis in which all data are available and equally accessible. However, petascale datasets have challenged this premise since it is often impossible to store, let alone analyze, the relevant observations. This has led the machine learning community to investigate adaptive processing chains where data mining is a continuous process. Here pattern recognition permits triage and followup decisions at multiple stages of a processing pipeline. Such techniques can also benefit new astronomical instruments such as the Large Synoptic Survey Telescope (LSST) and Square Kilometre Array (SKA) that will generate petascale data volumes. We summarize some machine learning perspectives on real time data mining, with representative cases of astronomical applications and event detection in high volume datastreams. The first is a "supervised classification" approach currently used for transient event detection at the Very Long Baseline Array (VLBA). It injects known signals of interest - faint single-pulse anomalies - and tunes system parameters to recover these events. This permits meaningful event detection for diverse instrument configurations and observing conditions whose noise cannot be well-characterized in advance. Second, "semi-supervised novelty detection" finds novel events based on statistical deviations from previous patterns. It detects outlier signals of interest while considering known examples of false alarm interference. Applied to data from the Parkes pulsar survey, the approach identifies anomalous "peryton" phenomena that do not match previous event models. Finally, we consider online light curve classification that can trigger adaptive followup measurements of candidate events. Classifier performance analyses suggest optimal survey strategies, and permit principled followup decisions from incomplete data. 
These examples trace a broad range of algorithm possibilities available for online astronomical data mining. This talk describes research performed at the Jet Propulsion Laboratory, California Institute of Technology. Copyright 2012, All Rights Reserved. U.S. Government support acknowledged.
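The idea of finding novel events "based on statistical deviations from previous patterns" in a stream that cannot be stored can be sketched with a running mean and variance (Welford's update). This is a generic illustration, not the detectors used at the VLBA or on the Parkes survey data. The class name, threshold and warm-up length are hypothetical.

```python
import math


class StreamingOutlierDetector:
    """Flag samples that deviate strongly from the running statistics.

    Only three numbers are kept in memory, so the stream itself never
    needs to be stored. Flagged samples are excluded from the statistics.
    """

    def __init__(self, threshold=5.0, warmup=10):
        self.threshold = threshold  # z-score above which a sample is flagged
        self.warmup = warmup        # samples to absorb before flagging starts
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def update(self, x):
        """Process one sample; return True if it looks anomalous."""
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                return True
        # Welford's online update of mean and squared deviations.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        return False


detector = StreamingOutlierDetector()
stream = [1.0, 1.1, 0.9] * 5 + [10.0]  # quiet background, then one pulse
flags = [detector.update(x) for x in stream]
```

A semi-supervised variant, as described in the abstract, would additionally compare flagged samples with known false-alarm examples before raising an event.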
Biomedical text mining for research rigor and integrity: tasks, challenges, directions.
Kilicoglu, Halil
2017-06-13
An estimated quarter of a trillion US dollars is invested in the biomedical research enterprise annually. There is growing alarm that a significant portion of this investment is wasted because of problems in reproducibility of research findings and in the rigor and integrity of research conduct and reporting. Recent years have seen a flurry of activities focusing on standardization and guideline development to enhance the reproducibility and rigor of biomedical research. Research activity is primarily communicated via textual artifacts, ranging from grant applications to journal publications. These artifacts can be both the source and the manifestation of practices leading to research waste. For example, an article may describe a poorly designed experiment, or the authors may reach conclusions not supported by the evidence presented. In this article, we pose the question of whether biomedical text mining techniques can assist the stakeholders in the biomedical research enterprise in doing their part toward enhancing research integrity and rigor. In particular, we identify four key areas in which text mining techniques can make a significant contribution: plagiarism/fraud detection, ensuring adherence to reporting guidelines, managing information overload and accurate citation/enhanced bibliometrics. We review the existing methods and tools for specific tasks, if they exist, or discuss relevant research that can provide guidance for future work. With the exponential increase in biomedical research output and the ability of text mining approaches to perform automatic tasks at large scale, we propose that such approaches can support tools that promote responsible research practices, providing significant benefits for the biomedical research enterprise. Published by Oxford University Press 2017. This work is written by a US Government employee and is in the public domain in the US.
Cashman, Sarah A; Meyer, David E; Edelen, Ashley N; Ingwersen, Wesley W; Abraham, John P; Barrett, William M; Gonzalez, Michael A; Randall, Paul M; Ruiz-Mercado, Gerardo; Smith, Raymond L
2016-09-06
Demands for quick and accurate life cycle assessments create a need for methods to rapidly generate reliable life cycle inventories (LCI). Data mining is a suitable tool for this purpose, especially given the large amount of available governmental data. These data are typically applied to LCIs on a case-by-case basis. As linked open data becomes more prevalent, it may be possible to automate LCI using data mining by establishing a reproducible approach for identifying, extracting, and processing the data. This work proposes a method for standardizing and eventually automating the discovery and use of publicly available data at the United States Environmental Protection Agency for chemical-manufacturing LCI. The method is developed using a case study of acetic acid. The data quality and gap analyses for the generated inventory found that the selected data sources can provide information with equal or better reliability and representativeness on air, water, hazardous waste, on-site energy usage, and production volumes but with key data gaps including material inputs, water usage, purchased electricity, and transportation requirements. A comparison of the generated LCI with existing data revealed that the data mining inventory is in reasonable agreement with existing data and may provide a more-comprehensive inventory of air emissions and water discharges. The case study highlighted challenges for current data management practices that must be overcome to successfully automate the method using semantic technology. Benefits of the method are that the openly available data can be compiled in a standardized and transparent approach that supports potential automation with flexibility to incorporate new data sources as needed.
Hamm, V; Collon-Drouaillet, P; Fabriol, R
2008-02-19
The flooding of abandoned mines in the Lorraine Iron Basin (LIB) over the past 25 years has degraded the quality of the groundwater tapped for drinking water. High concentrations of dissolved sulphate have made the water unsuitable for human consumption. This issue has led to the development of numerical tools to support water-resource management in mining contexts. Here we examine two modelling approaches using different numerical tools that we tested on the Saizerais flooded iron-ore mine (Lorraine, France). The first approach considers the Saizerais Mine as a network of two chemical reactors (NCR). The second approach is based on a physically distributed pipe network model (PNM) built with EPANET 2 software. This approach considers the mine as a network of pipes defined by their geometric and chemical parameters. Each reactor in the NCR model includes a detailed chemical model built to simulate quality evolution in the flooded mine water. However, in order to obtain a robust PNM, we simplified the detailed chemical model into a specific sulphate dissolution-precipitation model that is included as a sulphate source/sink in both an NCR model and a pipe network model. Both the NCR model and the PNM, based on different numerical techniques, give good post-calibration agreement between the simulated and measured sulphate concentrations in the drinking-water well and overflow drift. The NCR model incorporating the detailed chemical model is useful when detailed chemical behaviour at the overflow is needed. The PNM incorporating the simplified sulphate dissolution-precipitation model provides better information on the physics controlling the effect of flow and low-flow zones and on the time of solid sulphate removal, whereas the NCR model will underestimate clean-up time because of the complete mixing assumption. 
In conclusion, the detailed NCR model gives a first assessment of chemical processes at the overflow, and the PNM then provides more detailed information on flow and chemical behaviour (dissolved sulphate concentrations, remaining mass of solid sulphate) in the network. Nevertheless, both modelling methods require hydrological and chemical parameters (recharge flow rate, outflows, volume of mine voids, mass of solids, kinetic constants of the dissolution-precipitation reactions), which are commonly not available for a mine and therefore call for calibration data.
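The complete mixing assumption mentioned above can be made concrete with the textbook single-reactor case. This is a deliberately simplified sketch, not the NCR model of the paper: it assumes one fully mixed reservoir of constant volume, flushed by sulphate-free recharge, with no solid sulphate source. All names and numbers are hypothetical.

```python
import math


def mixed_reactor_concentration(c0, flow, volume, time):
    """Dissolved concentration in a fully mixed reservoir flushed by clean water.

    Solves dC/dt = -(flow / volume) * C, i.e. C(t) = c0 * exp(-flow * t / volume).
    Units must be consistent (e.g. mg/L, m3/day, m3, days).
    """
    return c0 * math.exp(-flow * time / volume)


def time_to_target(c0, target, flow, volume):
    """Flushing time needed for the concentration to fall from c0 to target."""
    return (volume / flow) * math.log(c0 / target)


# Hypothetical values: 2000 mg/L initially, 250 mg/L target,
# 100 m3/day recharge through 1,000,000 m3 of flooded voids.
days = time_to_target(2000.0, 250.0, flow=100.0, volume=1.0e6)
check = mixed_reactor_concentration(2000.0, 100.0, 1.0e6, days)
```

Because every parcel of water is assumed to be flushed at the same rate, such a model has no slow, poorly connected zones, which is consistent with the abstract's remark that the complete mixing assumption leads to underestimated clean-up times.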
Bonnail, Estefanía; Pérez-López, Rafael; Sarmiento, Aguasanta M; Nieto, José Miguel; DelValls, T Ángel
2017-09-15
The lanthanide series has been used as a record of water-rock interaction and as a tool for identifying impacts of acid mine drainage (lixiviate residue derived from sulphide oxidation). Applying North-American Shale Composite (NASC)-normalized rare earth element patterns to these minority elements allows the origin of the contamination to be determined. In the current study, geochemical patterns were applied to rare earth elements bioaccumulated in the soft tissue of the freshwater clam Corbicula fluminea after exposure to different acid mine drainage contaminated environments. Results show significant bioaccumulation of rare earth elements in soft tissue of the clam after 14 days of exposure to acid mine drainage contaminated sediment (ΣREE = 1.3-8 μg/g dw). Furthermore, it was possible to biomonitor different degrees of contamination based on rare earth elements in tissue. The pattern of this type of contamination describes a particular curve characterized by an enrichment in the middle rare earth elements; a homologous pattern (E_MREE = 0.90) has also been observed when NASC normalization is applied to clam tissues. Results of lanthanides found in clams were contrasted with the paucity of toxicity studies, determining risk caused by light rare earth elements in the Odiel River close to the Estuary. The current study proposes the use of the clam as an innovative "bio-tool" for the biogeochemical monitoring of pollution inputs that determine how strongly river networks are affected by acid mine drainage. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Lambert, I. B.
2012-04-01
Dr Ian Lambert, Geoscience Australia and Secretary General, 34th International Geological Congress. Australia has comparative advantages in production of mineral commodities compared to most other countries. These stem from its rich and diverse mineral endowment; availability of regional scale (pre-competitive) geoscience information to lower the risks of exploration; advances in exploration, mining and processing technologies; skilled work force; generally benign physical conditions; and low population density. Building on these strengths, Australia is a major producer and exporter of a wide range of mineral and energy commodities to global markets. Given that demand for most major commodities is likely to continue, and that there will be growing markets for some other commodities, Australia needs to have a strategic view of what is likely to be available for mining. Further, Australia (and the world) needs to be attuned to issues that need to be faced in meeting international demand for commodities in the long term. This presentation outlines how Australia's national minerals inventory is compiled. It discusses trends for Australia's identified mineral resources for major commodities, and how these compare with other major mining nations. It then considers some significant issues in relation to sustaining a strong mining sector. In the medium to long term this requires a strategic approach to achieve goals such as more effective/lower risk exploration, particularly in greenfields regions; well-informed decisions on mining proposals; and ongoing significant improvements in efficiencies of energy, water and land use.
Martin, Jeffrey D.; Duwelius, Richard F.; Crawford, Charles G.
1987-01-01
The watersheds studied include mined and reclaimed; mined and unreclaimed; and unmined, agricultural land uses, and are each < 3 sq mi in area. Surface water, groundwater, and meteorologic data for the 1981 and 1982 water years were used to describe and compare hydrologic systems of the six watersheds and to identify hydrologic effects of mining and reclamation. Peak discharges were greater at the agricultural watersheds than at the unreclaimed watersheds, primarily because of large final-cut lakes in the unreclaimed watersheds. Annual runoff was greatest at the unreclaimed watersheds, intermediate at the agricultural watersheds, and least at the reclaimed watersheds. Hydrologic effects of mining were identified by comparing the hydrologic systems at mined and unreclaimed watersheds with those at unmined, agricultural watersheds. Comparisons of the hydrologic systems of these watersheds indicate that surface coal mining without reclamation has the potential to increase annual runoff, base flow, and groundwater recharge to the bedrock; reduce peak flow rates and variation in flow; lower the water table in upland areas; change the relation between surface water and groundwater divides; and create numerous, local flow systems in the shallow groundwater. Hydrologic effects of reclamation were identified by comparing the hydrologic systems at mined and reclaimed watersheds with those at mined and unreclaimed watersheds. Reclamation has the potential to decrease annual runoff, base flow, and recharge to the bedrock; increase peak flow rates, variation in flow, and response to thunderstorms; reestablish the premining relation between surface and groundwater divides; and create fewer local flow systems in the shallow groundwater. (Lantz-PTT)
Identifying influential factors of business process performance using dependency analysis
NASA Astrophysics Data System (ADS)
Wetzstein, Branimir; Leitner, Philipp; Rosenberg, Florian; Dustdar, Schahram; Leymann, Frank
2011-02-01
We present a comprehensive framework for identifying influential factors of business process performance. In particular, our approach combines monitoring of process events and Quality of Service (QoS) measurements with dependency analysis to effectively identify influential factors. The framework uses data mining techniques to construct tree structures to represent dependencies of a key performance indicator (KPI) on process and QoS metrics. These dependency trees allow business analysts to determine how process KPIs depend on lower-level process metrics and QoS characteristics of the IT infrastructure. The structure of the dependencies enables a drill-down analysis of single factors of influence to gain a deeper understanding of why certain KPI targets are not met.
Data Mining Techniques Applied to Hydrogen Lactose Breath Test.
Rubio-Escudero, Cristina; Valverde-Fernández, Justo; Nepomuceno-Chamorro, Isabel; Pontes-Balanza, Beatriz; Hernández-Mendoza, Yoedusvany; Rodríguez-Herrera, Alfonso
2017-01-01
The aim was to analyze a set of hydrogen breath test data using data mining tools and to identify new patterns of H2 production. K-means clustering was applied as the data mining technique to hydrogen breath test data sets from 2571 patients. Six different patterns were extracted from the hydrogen breath test data. We have also shown the relevance of each of the samples taken throughout the test. Analysis of the hydrogen breath test data sets using data mining techniques has identified new patterns of hydrogen generation upon lactose absorption. We can see the potential of applying data mining techniques to clinical data sets. These results offer promising data for future research on the relation between hydrogen produced by the gut microbiota and clinical symptoms.
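As a rough illustration of the k-means step mentioned in the abstract above, the following sketch clusters breath-test curves with plain Lloyd's algorithm. The readings, the number of sampling times, the starting centres and the function name `kmeans` are all hypothetical; the abstract does not describe the authors' preprocessing, distance measure or initialization.

```python
from math import dist

def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm; `centroids` are the starting centres."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centre (Euclidean distance).
        labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: mean of each cluster; an empty cluster keeps its centre.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical H2 readings (ppm) at four sampling times per patient.
curves = [
    (5, 6, 7, 6),     # flat profile
    (4, 5, 6, 5),     # flat profile
    (5, 20, 45, 60),  # rising profile
    (6, 25, 50, 70),  # rising profile
]
labels, centres = kmeans(curves, centroids=[curves[0], curves[2]])
print(labels)
```

With real data the number of clusters (six in the study) would have to be chosen and justified separately; this toy example uses two only to keep the output easy to inspect.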
Using remote sensing imagery to monitoring sea surface pollution cause by abandoned gold-copper mine
NASA Astrophysics Data System (ADS)
Kao, H. M.; Ren, H.; Lee, Y. T.
2010-08-01
The Chinkuashih Benshen mine was the largest gold-copper mine in Taiwan before its owner abandoned it in 1987. However, even though the mine has been closed, the minerals still interact with rain and underground water and flow into the sea. The polluted sea surface appears yellow, green and even white, and the pollutants are carried by the coastal current. In this study, we used optical satellite images to monitor the sea surface. Several image processing algorithms are employed, especially the subpixel technique and the linear mixture model, to estimate the concentration of pollutants. A change detection approach is also applied to track them. We also conducted chemical analysis of the polluted water to provide ground truth validation. Through correlation analysis between the satellite observations and the ground truth chemical analysis, an effective approach to monitoring water pollution could be established.
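The linear mixture model named in the abstract above can be sketched for the simplest case of two endmembers. This is an assumption-laden illustration, not the authors' procedure: it assumes each pixel spectrum is a blend of one "clean water" and one "polluted water" reference spectrum with fractions summing to one, and the reflectance values and the function name `polluted_fraction` are made up.

```python
def polluted_fraction(pixel, clean, polluted):
    """Least-squares fraction f in pixel ~ f * polluted + (1 - f) * clean."""
    direction = [p - c for p, c in zip(polluted, clean)]
    residual = [x - c for x, c in zip(pixel, clean)]
    denom = sum(v * v for v in direction)
    if denom == 0:
        raise ValueError("endmember spectra must differ")
    f = sum(r * d for r, d in zip(residual, direction)) / denom
    # Clamp to the physically meaningful range [0, 1].
    return min(1.0, max(0.0, f))

# Hypothetical reflectances in three optical bands.
clean = (0.02, 0.04, 0.03)
polluted = (0.10, 0.20, 0.15)
# A synthetic pixel that is 25 % polluted water by construction.
pixel = tuple(0.25 * p + 0.75 * c for p, c in zip(polluted, clean))
f = polluted_fraction(pixel, clean, polluted)
print(round(f, 3))
```

Turning such a subpixel fraction into a pollutant concentration would still need calibration against the chemical ground truth the abstract mentions.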
Activity recognition from minimal distinguishing subsequence mining
NASA Astrophysics Data System (ADS)
Iqbal, Mohammad; Pao, Hsing-Kuo
2017-08-01
Human activity recognition is one of the most important research topics in the era of Internet of Things. To separate different activities given sensory data, we utilize a Minimal Distinguishing Subsequence (MDS) mining approach to efficiently find distinguishing patterns among different activities. We first transform the sensory data into a series of sensor triggering events and operate the MDS mining procedure afterwards. The gap constraints are also considered in the MDS mining. Given the multi-class nature of most activity recognition tasks, we modify the MDS mining approach from a binary case to a multi-class one to fit the need for multiple activity recognition. We also study how to select the best parameter set including the minimal and the maximal support thresholds in finding the MDSs for effective activity recognition. Overall, the prediction accuracy is 86.59% on the van Kasteren dataset which consists of four different activities for recognition.
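The idea of a minimal distinguishing subsequence with a gap constraint, as described in the abstract above, can be illustrated by a brute-force sketch for the binary case. It assumes the usual definition: a pattern is distinguishing if its support is at least `min_sup` in the target class and at most `max_sup` in the other class, and minimal if no proper subsequence is also distinguishing. The sensor events, thresholds and function names are hypothetical, and the exhaustive enumeration is only practical for tiny inputs; the paper's multi-class extension is not reproduced.

```python
from itertools import combinations, product

def occurs(pattern, seq, max_gap):
    """True if `pattern` is a subsequence of `seq` with at most `max_gap`
    events skipped between consecutive matched events."""
    def match(p_idx, start, first):
        if p_idx == len(pattern):
            return True
        end = len(seq) if first else min(len(seq), start + max_gap + 1)
        for i in range(start, end):
            if seq[i] == pattern[p_idx] and match(p_idx + 1, i + 1, False):
                return True
        return False
    return match(0, 0, True)

def support(pattern, seqs, max_gap):
    """Fraction of sequences that contain the pattern."""
    return sum(occurs(pattern, s, max_gap) for s in seqs) / len(seqs)

def mine_mds(pos, neg, min_sup, max_sup, max_gap, max_len):
    alphabet = sorted({event for s in pos for event in s})

    def distinguishing(p):
        return (support(p, pos, max_gap) >= min_sup
                and support(p, neg, max_gap) <= max_sup)

    found = []
    for length in range(1, max_len + 1):
        for cand in product(alphabet, repeat=length):
            if not distinguishing(cand):
                continue
            # Minimality: no proper subsequence may itself be distinguishing.
            subpatterns = (
                tuple(cand[i] for i in idx)
                for n in range(1, length)
                for idx in combinations(range(length), n)
            )
            if not any(distinguishing(sub) for sub in subpatterns):
                found.append(cand)
    return found

# Hypothetical sensor-trigger sequences for two activities.
cooking = [("fridge", "stove", "cup"),
           ("fridge", "door", "stove"),
           ("door", "fridge", "stove")]
other = [("door", "bed"), ("fridge", "door", "bed"), ("stove", "door")]
mds = mine_mds(cooking, other, min_sup=1.0, max_sup=0.0, max_gap=1, max_len=3)
print(mds)
```

In this toy data neither "fridge" nor "stove" alone separates the classes, but the ordered pair does, which is the kind of pattern the approach is meant to surface.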
Text and Structural Data Mining of Influenza Mentions in Web and Social Media
DOE Office of Scientific and Technical Information (OSTI.GOV)
Corley, Courtney D.; Cook, Diane; Mikler, Armin R.
Text and structural data mining of Web and social media (WSM) provides a novel disease surveillance resource and can identify online communities for targeted public health communications (PHC) to assure wide dissemination of pertinent information. WSM that mention influenza are harvested over a 24-week period, 5-October-2008 to 21-March-2009. Link analysis reveals communities for targeted PHC. Text mining is shown to identify trends in flu posts that correlate to real-world influenza-like-illness patient report data. We also bring to bear a graph-based data mining technique to detect anomalies among flu blogs connected by publisher type, links, and user-tags.
Zhu, Shu-Hong; Conway, Mike
2015-01-01
Background The rise in popularity of electronic cigarettes (e-cigarettes) and hookah over recent years has been accompanied by some confusion and uncertainty regarding the development of an appropriate regulatory response towards these emerging products. Mining online discussion content can lead to insights into people’s experiences, which can in turn further our knowledge of how to address potential health implications. In this work, we take a novel approach to understanding the use and appeal of these emerging products by applying text mining techniques to compare consumer experiences across discussion forums. Objective This study examined content from the websites Vapor Talk, Hookah Forum, and Reddit to understand people’s experiences with different tobacco products. Our investigation involves three parts. First, we identified contextual factors that inform our understanding of tobacco use behaviors, such as setting, time, social relationships, and sensory experience, and compared the forums to identify the ones where content on these factors is most common. Second, we compared how the tobacco use experience differs with combustible cigarettes and e-cigarettes. Third, we investigated differences between e-cigarette and hookah use. Methods In the first part of our study, we employed a lexicon-based extraction approach to estimate prevalence of contextual factors, and then we generated a heat map based on these estimates to compare the forums. In the second and third parts of the study, we employed a text mining technique called topic modeling to identify important topics and then developed a visualization, Topic Bars, to compare topic coverage across forums. Results In the first part of the study, we identified two forums, Vapor Talk Health & Safety and the Stopsmoking subreddit, where discussion concerning contextual factors was particularly common. 
The second part showed that the discussion in Vapor Talk Health & Safety focused on symptoms and comparisons of combustible cigarettes and e-cigarettes, and the Stopsmoking subreddit focused on psychological aspects of quitting. Last, we examined the discussion content on Vapor Talk and Hookah Forum. Prominent topics included equipment, technique, experiential elements of use, and the buying and selling of equipment. Conclusions This study has three main contributions. Discussion forums differ in the extent to which their content may help us understand behaviors with potential health implications. Identifying dimensions of interest and using a heat map visualization to compare across forums can be helpful for identifying forums with the greatest density of health information. Additionally, our work has shown that the quitting experience can potentially be very different depending on whether or not e-cigarettes are used. Finally, e-cigarette and hookah forums are similar in that members represent a “hobbyist culture” that actively engages in information exchange. These differences have important implications for both tobacco regulation and smoking cessation intervention design. PMID:26420469
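The lexicon-based prevalence estimate described in the Methods of the abstract above can be sketched as follows. The lexicon terms, forum names and posts are invented for illustration, and "prevalence" is taken here to mean the share of posts mentioning at least one term for a factor, which is an assumption; the resulting forum-by-factor matrix is the kind of table a heat map would be drawn from.

```python
import re

# Hypothetical lexicon: contextual factor -> indicator terms.
LEXICON = {
    "setting": {"home", "car", "bar"},
    "time": {"morning", "night", "weekend"},
    "social": {"friend", "friends", "wife", "husband"},
}

def prevalence(posts, lexicon):
    """Share of posts that mention at least one term for each factor."""
    counts = {factor: 0 for factor in lexicon}
    for post in posts:
        tokens = set(re.findall(r"[a-z']+", post.lower()))
        for factor, terms in lexicon.items():
            if tokens & terms:
                counts[factor] += 1
    n = len(posts)
    return {factor: (c / n if n else 0.0) for factor, c in counts.items()}

# Hypothetical posts from two forums.
forums = {
    "forum_a": ["I vape at home every morning",
                "My friends smoke hookah at the bar"],
    "forum_b": ["Which coil should I buy?",
                "Quit last night, feeling fine"],
}
matrix = {name: prevalence(posts, LEXICON) for name, posts in forums.items()}
for name, row in matrix.items():
    print(name, row)
```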
Aquatic Ecosystem Enhancement at Mountaintop Mining Sites Symposium
DOE Office of Scientific and Technical Information (OSTI.GOV)
Black, D. Courtney; Lawson, Peter; Morgan, John
2000-01-12
Welcome to this symposium which is part of the ongoing effort to prepare an Environmental Impact Statement (EIS) regarding mountaintop mining and valley fills. The EIS is being prepared by the U.S. Environmental Protection Agency, U.S. Army Corps of Engineers, U.S. Office of Surface Mining, and U.S. Fish and Wildlife Service, in cooperation with the State of West Virginia. Aquatic Ecosystem Enhancement (AEE) at mountaintop mining sites is one of fourteen technical areas identified for study by the EIS Interagency Steering Committee. Three goals were identified in the AEE Work Plan: 1. Assess mining and reclamation practices to show how mining operations might be carried out in a way that minimizes adverse impacts to streams and other environmental resources and to local communities. Clarify economic and technical constraints and benefits. 2. Help citizens clarify choices by showing whether there are affordable ways to enhance existing mining, reclamation, mitigation processes and/or procedures. 3. Identify data needed to improve environmental evaluation and design of mining projects to protect the environment. Today’s symposium was proposed in the AEE Team Work Plans but coordinated planning for the event began September 15, 1999 when representatives from coal industry, environmental groups and government regulators met in Morgantown. The meeting participants worked with a facilitator from the Canaan Valley Institute to outline plans for the symposium. Several teams were formed to carry out the plans we outlined in the meeting.
Biblio-MetReS for user-friendly mining of genes and biological processes in scientific documents.
Usie, Anabel; Karathia, Hiren; Teixidó, Ivan; Alves, Rui; Solsona, Francesc
2014-01-01
One way to initiate the reconstruction of molecular circuits is by using automated text-mining techniques. Developing more efficient methods for such reconstruction is a topic of active research, and those methods are typically included by bioinformaticians in pipelines used to mine and curate large literature datasets. Nevertheless, experimental biologists have a limited number of available user-friendly tools that use text-mining for network reconstruction and require no programming skills to use. One of these tools is Biblio-MetReS. Originally, this tool permitted an on-the-fly analysis of documents contained in a number of web-based literature databases to identify co-occurrence of proteins/genes. This approach ensured results that were always up-to-date with the latest live version of the databases. However, this 'up-to-dateness' came at the cost of large execution times. Here we report an evolution of the application Biblio-MetReS that permits constructing co-occurrence networks for genes, GO processes, Pathways, or any combination of the three types of entities and graphically representing those entities. We show that the performance of Biblio-MetReS in identifying gene co-occurrence is at least as good as that of other comparable applications (STRING and iHOP). In addition, we also show that the identification of GO processes is on par with that reported in the latest BioCreAtIvE challenge. Finally, we also report the implementation of a new strategy that combines on-the-fly analysis of new documents with preprocessed information from documents that were encountered in previous analyses. This combination simultaneously decreases program run time and maintains 'up-to-dateness' of the results. http://metres.udl.cat/index.php/downloads, metres.cmb@gmail.com.
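The co-occurrence networks described in the abstract above can be illustrated with a very small sketch that counts how often two entity names appear in the same sentence. This is not Biblio-MetReS code: the function name `cooccurrence`, the exact-token matching and the example sentences are assumptions, and the gene symbols serve only as sample entity names.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence(sentences, entities):
    """Count sentence-level co-occurrences; keys are sorted name pairs."""
    edges = Counter()
    for sentence in sentences:
        tokens = set(re.findall(r"[A-Za-z0-9]+", sentence))
        present = sorted(e for e in entities if e in tokens)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges

# Made-up sentences used only to exercise the function.
sentences = [
    "TP53 regulates MDM2 after DNA damage.",
    "MDM2 binds TP53.",
    "BRCA1 and TP53 interact.",
    "BRCA1 is studied.",
]
edges = cooccurrence(sentences, {"TP53", "MDM2", "BRCA1"})
for pair, weight in sorted(edges.items()):
    print(pair, weight)
```

Each counted pair is an edge of the network and the count is its weight; a real tool would also need synonym handling and entity normalization, which this sketch leaves out.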
Andoh, Akira; Kobayashi, Toshio; Kuzuoka, Hiroyuki; Tsujikawa, Tomoyuki; Suzuki, Yasuo; Hirai, Fumihito; Matsui, Toshiyuki; Nakamura, Shiro; Matsumoto, Takayuki; Fujiyama, Yoshihide
2014-05-01
The gut microbiota plays a significant role in the pathogenesis of Crohn's disease (CD). In this study, we analyzed the disease activity and associated fecal microbiota profiles in 160 CD patients and 121 healthy individuals. Fecal samples from the CD patients were collected during three different clinical phases, the active (n=66), remission-achieved (n=51) and remission-maintained (n=43) phases. Terminal restriction fragment length polymorphism (T-RFLP) and data mining analysis using the Classification and Regression Tree (C&RT) approach were performed. Data mining provided a decision tree that clearly identified the various subject groups (nodes). The majority of the healthy individuals were divided into Node-5 and Node-8. Healthy subjects comprised 99% of Node-5 (91 of 92) and 84% of Node-8 (21 of 25 subjects). Node-3 was characterized by CD (136 of 160 CD subjects) and was divided into Node-6 and Node-7. Node-6 (n=103) was characterized by subjects in the active phase (n=48; 46%) and remission-achieved phase (n=39; 38%) and Node-7 was characterized by the remission-maintained phase (21 of 37 subjects; 57%). Finally, Node-6 was divided into Node-9 and Node-10. Node-9 (n=78) was characterized by subjects in the active phase (n=43; 55%) and Node-10 (n=25) was characterized by subjects in the remission-maintained phase (n=16; 64%). Differences in the gut microbiota associated with disease activity of CD patients were identified. Thus, data mining analysis appears to be an ideal tool for the characterization of the gut microbiota in inflammatory bowel disease.
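To make the decision-tree step in the abstract above more concrete, the sketch below shows how a CART-style tree could pick one split on one feature by minimizing weighted Gini impurity. The abstract does not state which impurity measure or software was used, so Gini is an assumption, and the peak-area values, labels and function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`.
    The threshold is None if no split improves on the unsplit node."""
    pairs = sorted(zip(values, labels))
    best = (None, gini(labels))
    for i in range(1, len(pairs)):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue
        threshold = (low + high) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical relative peak areas (%) of one T-RFLP fragment.
peak_area = [5, 10, 12, 40, 45, 50]
group = ["CD", "CD", "CD", "healthy", "healthy", "healthy"]
print(best_split(peak_area, group))
```

A full tree repeats this search over all features and then recursively on each child node, which is how numbered nodes such as those in the abstract arise.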
An Evaluation of Text Mining Tools as Applied to Selected Scientific and Engineering Literature.
ERIC Educational Resources Information Center
Trybula, Walter J.; Wyllys, Ronald E.
2000-01-01
Addresses an approach to the discovery of scientific knowledge through an examination of data mining and text mining techniques. Presents the results of experiments that investigated knowledge acquisition from a selected set of technical documents by domain experts. (Contains 15 references.) (Author/LRW)
LABORATORY EVALUATION OF ZERO-VALENT IRON TO TREAT GROUNDWATER IMPACTED BY ACID MINE DRAINAGE
The generation and release of acidic, metal-rich water from mine wastes continues to be an intractable environmental problem. Although the effects of acid mine drainage (AMD) are most evident in surface waters, there is an obvious need for developing cost-effective approaches fo...
American elm in mine land reforestation
M.B. Adams; P. Angel; C. Barton; J. Slavicek
2015-01-01
Reforestation of mined land in the Appalachians realizes many important benefits and provides important ecosystem services. Because much of the reclaimed mine lands in Appalachia were previously in forest, reclaiming these drastically disturbed areas to forests is desirable, feasible and cost-effective. The Forestry Reclamation Approach (FRA) provides a five-step...
VALUING ACID MINE DRAINAGE REMEDIATION IN WEST VIRGINIA: A HEDONIC MODELING APPROACH
States with active and abandoned mines face large private and public costs to remediate damage to streams and rivers from acid mine drainage (AMD). Appalachian states have an especially large number of contaminated streams and rivers, and the USGS places AMD as the primary source...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Winter, Allen Douglas; Rojas, Wudmir Y.; Williams, Adrienne D.
The promise from graphene to produce devices with high mobilities and detectors with fast response times is truncated in practice by strain and deformation originating during growth and subsequent processing. This work describes effects from graphene growth, multiple layer transfer, and substrate termination on out of plane deformation, critical to device performance. Synchrotron spectroscopy data was acquired with a state-of-the-art hyperspectral large-area detector to describe growth and processing with molecular sensitivity at wafer length scales. A study of methodologies used in data analysis discouraged dichroic ratio approaches in favor of orbital vector approximations and data mining algorithms. Orbital vector methods provide a physical insight into mobility-detrimental rippling by identifying ripple frequency as main actor, rather than intensity; which was confirmed by data mining algorithms, and in good agreement with electron scattering theories of corrugation in graphene. This work paves the way to efficient information from mechanical properties in graphene in a high throughput mode throughout growth and processing in a materials by design approach.
Mineral Carbonation Feasibility, an Economic Approach.
NASA Astrophysics Data System (ADS)
Pasquier, L. C.; Kemache, N.; Cecchi, E.; Mercier, G.; Blais, J. F.; Kentish, S.
2016-12-01
Mineral Carbonation (MC) is one of the ways proposed to mitigate carbon dioxide (CO2) emissions. Although it is intended to transform CO2 into a stable and inert carbonate by reacting it with any material containing divalent cations, MC is still widely seen as an unrealistic way to reduce CO2 emissions, mostly because carbonation has been regarded as a sequestration technique only (applied after CO2 capture). Nevertheless, recent studies have considered and shown the feasibility of an integrated capture/storage approach. Thus, MC can be adapted to flue gas or other industrial gas streams more or less concentrated in CO2. Furthermore, carbonation can be applied to various problems and has the advantage of being feasible with a broad range of feedstocks such as alkaline industrial or mining residues. Using an economic approach in which by-product valorization is favored, interesting approaches were identified. More specifically, the particular case of the Québec province shows that different synergies between wastes and industries can be developed. The results indicate that MC can be seen as a practical approach to both reduce CO2 emissions and enhance waste remediation. For instance, the feasibility of exporting significant amounts of serpentinite mining residue to distant industrial sites using the St Lawrence maritime route was demonstrated. Here the applicability rests on the high value of the generated by-products. On the other hand, steel slags or waste concrete need more local applications due to their limited reaction efficiencies and the lower price of calcium carbonates. While transportation is a major factor in the OPEX, the profitability relies on the potential sale of the by-products. Indeed, the production of low carbon footprint materials from the reaction product will also expand the range of CO2 utilization avenues. The presentation highlights the results of research carried out in the lab and using economic modeling to draw a portrait of the opportunities and challenges identified with this regional approach, which can apply to a wider range worldwide.
A Need for Systems Architecture Approach for Next Generation Mine Warfare Capability
2006-09-01
MRUUV Mission Reconfigurable Unmanned Undersea Vehicle MSC Mine Countermeasures Ship Coastal MSO Mine Countermeasures Ship Open-ocean P3I Preplanned...Helicopter, the Remote Mine Hunting System (RMS), the Mission Reconfigurable Unmanned Undersea Vehicle (MRUUV) and finally the Littoral Combat Ship (LCS...guarding against the sophisticated Soviet blue-water, air, and undersea threats. Yet since World War II, U.S. Naval Forces have suffered significantly
Development and application of biotechnologies in the metal mining industry.
Johnson, D Barrie
2013-11-01
Metal mining faces a number of significant economic and environmental challenges in the twenty-first century for which established and emerging biotechnologies may, at least in part, provide the answers. Bioprocessing of mineral ores and concentrates is already used in variously engineered formats to extract base (e.g., copper, cobalt, and nickel) and precious (gold and silver) metals in mines throughout the world, though it remains a niche technology. However, current projections of an increasing future need to use low-grade primary metal ores, to reprocess mine wastes, and to develop in situ leaching technologies to extract metals from deep-buried ore bodies, all of which are economically more amenable to bioprocessing than conventional approaches (e.g., pyrometallurgy), would suggest that biomining will become more extensively utilized in the future. Recent research has also shown that bioleaching could be used to process a far wider range of metal ores (e.g., oxidized ores) than has previously been the case. Biotechnologies are also being developed to control mine-related pollution, including securing mine wastes (rocks and tailings) by using "ecological engineering" approaches, and also to remediate and recover metals from waste waters, such as acid mine drainage. This article reviews the current status of biotechnologies within the mining sector and considers how these may be developed and applied in future years.
Mining and Development of Novel SSR Markers Using Next Generation Sequencing (NGS) Data in Plants.
Taheri, Sima; Lee Abdullah, Thohirah; Yusop, Mohd Rafii; Hanafi, Mohamed Musa; Sahebi, Mahbod; Azizi, Parisa; Shamshiri, Redmond Ramin
2018-02-13
Microsatellites, or simple sequence repeats (SSRs), are one of the most informative and multi-purpose genetic markers exploited in plant functional genomics. However, the discovery of SSRs and development using traditional methods are laborious, time-consuming, and costly. Recently, the availability of high-throughput sequencing technologies has enabled researchers to identify a substantial number of microsatellites at less cost and effort than traditional approaches. Illumina is a noteworthy transcriptome sequencing technology that is currently used in SSR marker development. Although 454 pyrosequencing datasets can be used for SSR development, this type of sequencing is no longer supported. This review aims to present an overview of the next generation sequencing, with a focus on the efficient use of de novo transcriptome sequencing (RNA-Seq) and related tools for mining and development of microsatellites in plants.
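As a minimal illustration of what mining SSRs from sequence data involves, the sketch below scans a DNA string for perfect tandem repeats with a regular expression. The repeat threshold, motif-length range, function name `find_ssrs` and test sequence are hypothetical; dedicated SSR tools typically use motif-specific thresholds and also handle compound or imperfect repeats, which this sketch does not.

```python
import re

def find_ssrs(seq, min_repeats=4, min_motif=2, max_motif=6):
    """Return sorted (start, motif, repeat_count) for perfect tandem repeats.
    `start` is a 0-based position in `seq`."""
    seq = seq.upper()
    hits = []
    for k in range(min_motif, max_motif + 1):
        # A k-mer followed by at least (min_repeats - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AGAG" is reported as "AG", "AAA" is a homopolymer).
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

# Hypothetical transcript fragment with one di- and one trinucleotide repeat.
fragment = "GGCT" + "AG" * 6 + "TTCA" + "CAT" * 4 + "GC"
ssrs = find_ssrs(fragment)
print(ssrs)
```

Marker development would then continue with primer design in the flanking regions, a step outside the scope of this sketch.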
Calypso: a user-friendly web-server for mining and visualizing microbiome-environment interactions.
Zakrzewski, Martha; Proietti, Carla; Ellis, Jonathan J; Hasan, Shihab; Brion, Marie-Jo; Berger, Bernard; Krause, Lutz
2017-03-01
Calypso is an easy-to-use online software suite that allows non-expert users to mine, interpret and compare taxonomic information from metagenomic or 16S rDNA datasets. Calypso has a focus on multivariate statistical approaches that can identify complex environment-microbiome associations. The software enables quantitative visualizations, statistical testing, multivariate analysis, supervised learning, factor analysis, multivariable regression, network analysis and diversity estimates. Comprehensive help pages, tutorials and videos are provided via a wiki page. The web-interface is accessible via http://cgenome.net/calypso/ . The software is programmed in Java, PERL and R and the source code is available from Zenodo ( https://zenodo.org/record/50931 ). The software is freely available for non-commercial users. Contact: l.krause@uq.edu.au. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Using natural language processing techniques to inform research on nanotechnology
Lewinski, Nastassja A
2015-01-01
Summary Literature in the field of nanotechnology is exponentially increasing with more and more engineered nanomaterials being created, characterized, and tested for performance and safety. With the deluge of published data, there is a need for natural language processing approaches to semi-automate the cataloguing of engineered nanomaterials and their associated physico-chemical properties, performance, exposure scenarios, and biological effects. In this paper, we review the different informatics methods that have been applied to patent mining, nanomaterial/device characterization, nanomedicine, and environmental risk assessment. Nine natural language processing (NLP)-based tools were identified: NanoPort, NanoMapper, TechPerceptor, a Text Mining Framework, a Nanodevice Analyzer, a Clinical Trial Document Classifier, Nanotoxicity Searcher, NanoSifter, and NEIMiner. We conclude with recommendations for sharing NLP-related tools through online repositories to broaden participation in nanoinformatics. PMID:26199848
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chambers, Doug; Wiatzka, Gerd; Brown, Steve
This paper provides the life story of Canada's original radium/uranium mine. In addition to the history of operations, it discusses the unique and successful approach used to identify the key issues and concerns associated with the former radium, uranium and silver mining property and the activities undertaken to define the remedial actions and subsequent remedial plan. The Port Radium Mine site, situated approximately 275 km north of Yellowknife on the east shore of Great Bear Lake, Northwest Territories, was discovered in 1930 and underground mining began in 1932. The mine operated almost continuously from 1932 to 1982, initially for recovery of radium, then uranium and finally, for recovery of silver. Tailings production totaled an estimated 900,000 tons and 800,000 tons from uranium and silver processing operations respectively. In the early days of mining, Port Radium miners were exposed to radon and associated decay product levels (in Working Level Months of exposure - WLM) hundreds of times greater than modern standards. The experience of the Port Radium miners provides an important contribution to understanding the risks from radon. While the uranium mine was originally decommissioned in the early 1960's, to the standards of the day, the community of Deline (formerly Fort Franklin) had concerns about residual contamination at the mine site and the potential effects arising from use of traditional lands. The Deline people were also concerned about the possible risks to Deline Dene arising from their work as ore carriers. In the late 1990's, the community of Deline brought these concerns to national attention and consequently, the Government of Canada and the community of Deline agreed to move forward in a collaborative manner to address these concerns. The approach agreed to was to establish the Canada-Deline Uranium Table (CDUT) to provide a joint process by which the people of Deline could have their concerns expressed and addressed. 
A great deal of work was done through the CDUT, including efforts to assess site environment and safety issues in the context of modern reclamation standards. In addition to the environmental and remediation studies, an assessment of historic exposures of Deline ore carriers to radiation and a follow-up epidemiological feasibility study were performed. SENES Consultants Limited (SENES) carried out the dose reconstruction for the Port Radium miners in the 1990's, was the environmental consultant to the CDUT from 2000 to 2005, developed the Remedial Action Plan (RAP), engineering plans and specifications for decommissioning the Port Radium mine and vicinity sites in 2005/6, supervised the remedial works in 2007 and carried out the long term post closure monitoring from 2008 to 2012. Our firsthand experience from working cooperatively with the CDUT provides insights into effective decommissioning of historic contaminated sites. (authors)
Gene Expression Patterns Associated With Histopathology in Toxic Liver Fibrosis.
Ippolito, Danielle L; AbdulHameed, Mohamed Diwan M; Tawa, Gregory J; Baer, Christine E; Permenter, Matthew G; McDyre, Bonna C; Dennis, William E; Boyle, Molly H; Hobbs, Cheryl A; Streicker, Michael A; Snowden, Bobbi S; Lewis, John A; Wallqvist, Anders; Stallings, Jonathan D
2016-01-01
Toxic industrial chemicals induce liver injury, which is difficult to diagnose without invasive procedures. Identifying indicators of end organ injury can complement exposure-based assays and improve predictive power. A multiplexed approach was used to experimentally evaluate a panel of 67 genes predicted to be associated with the fibrosis pathology by computationally mining DrugMatrix, a publicly available repository of gene microarray data. Five-day oral gavage studies in male Sprague Dawley rats dosed with varying concentrations of 3 fibrogenic compounds (allyl alcohol, carbon tetrachloride, and 4,4'-methylenedianiline) and 2 nonfibrogenic compounds (bromobenzene and dexamethasone) were conducted. Fibrosis was definitively diagnosed by histopathology. The 67-plex gene panel accurately diagnosed fibrosis in both microarray and multiplexed-gene expression assays. Necrosis and inflammatory infiltration were comorbid with fibrosis. ANOVA with contrasts identified that 51 of the 67 predicted genes were significantly associated with the fibrosis phenotype, with 24 of these specific to fibrosis alone. The protein product of the gene most strongly correlated with the fibrosis phenotype PCOLCE (Procollagen C-Endopeptidase Enhancer) was dose-dependently elevated in plasma from animals administered fibrogenic chemicals (P < .05). Semiquantitative global mass spectrometry analysis of the plasma identified an additional 5 protein products of the gene panel which increased after fibrogenic toxicant administration: fibronectin, ceruloplasmin, vitronectin, insulin-like growth factor binding protein, and α2-macroglobulin. These results support the data mining approach for identifying gene and/or protein panels for assessing liver injury and may suggest bridging biomarkers for molecular mediators linked to histopathology. Published by Oxford University Press on behalf of the Society of Toxicology 2015. 
This work is written by US Government employees and is in the public domain in the US.
Utility of hyperspectral imagers in the mining industry: Italy's gypsum reserves
NASA Astrophysics Data System (ADS)
Wilson, Janette H.; Greenberger, Rebecca N.
2014-05-01
The mining industry is plagued with socioeconomic and safety roadblocks and has few solutions in the midst of a demanding market. As more and more geologic research using hyperspectral technology has been performed, along with an affordable price point for commercial use of hyperspectral technology, the benefits of hyperspectral imaging to the mining industry have become apparent. This study identifies the key areas of use for hyperspectral imaging in the mining industry through a case study of gypsum mine samples obtained from a mine in central Tuscany.
Characterizing the hydrological system in Rosia Montana mining area (Romania) for AMD mitigation
NASA Astrophysics Data System (ADS)
Cozma, Alexandra; Baciu, Calin; Olenici, Adriana; Brahaita, Dorian; Pop, Cristian; Lazar, Laura; Roba, Carmen; Popita, Gabriela
2015-04-01
Keywords: mining, AMD mitigation, isotopic analyses, Romania. Rosia Montana is one of the most important European gold fields, with a long history of mining. The extraction of gold started on site during the Roman age, and the mining operations that spanned almost two millennia have produced a visible environmental footprint. More than 140 km of mining galleries are documented by historical sources and recent surveys. Water streams are the main vectors spreading the pollution outside the mining area. The main streams, Rosia, Corna, and Saliste, tributaries of Abruzel River, are significantly impacted by the acid waters issued by adits, exposed rock surfaces, rock waste heaps, and tailings depots. Low contamination has been observed in the streams outside the mining area, artificial ponds, and shallow groundwater. Except for the shallow groundwater system, which can be sampled in domestic wells and some springs, the circulation of groundwater is largely unknown. An important amount of the infiltration water is channelled through galleries. The waters sampled at the gallery outlets have low pH, generally between 2 and 3, and a very high content of heavy metals. A systematic approach based on monthly sampling, chemical analyses, and isotopic measurements has been initiated in order to better understand the underground itinerary of water and the chemical transformations that occur. A sampling network of 28 water points, including streams, ponds, dug wells, springs, and gallery outlets, has been set up. Beyond producing a water circulation model in the mining area, the main purpose of the research is to identify ways of decreasing the acid water production and to design low cost techniques for AMD mitigation. The deposit still hosts about 300 tonnes of gold and 1600 tonnes of silver. A new large scale mining project is currently under permitting. 
Cost-effective solutions for the water treatment would be beneficial, especially for the post-mining stage of any future operation. Acknowledgments: The present contribution was financially supported by a grant of the Romanian National Authority for Scientific Research, CCCDI - UEFISCDI, project 3-005 Tools for sustainable gold mining in EU (SUSMIN). Dorian Brahaita has benefited from the financial support provided by the project POSDRU/159/1.5/S/132400.
Moment tensor clustering: a tool to monitor mining induced seismicity
NASA Astrophysics Data System (ADS)
Cesca, Simone; Dahm, Torsten; Tolga Sen, Ali
2013-04-01
Automated moment tensor inversion routines have been set up in the last decades for the analysis of global and regional seismicity. Recent developments could be used to analyse smaller events and larger datasets. In particular, applications to microseismicity, e.g. in mining environments, have then led to the generation of large moment tensor catalogues. Moment tensor catalogues provide valuable information about the earthquake source and details of rupturing processes taking place in the seismogenic region. Earthquake focal mechanisms can be used to discuss the local stress field, possible orientations of the fault system or to evaluate the presence of shear and/or tensile cracks. Focal mechanism and moment tensor solutions are typically analysed for selected events, and quick and robust tools for the automated analysis of larger catalogues are needed. We propose here a method to perform cluster analysis for large moment tensor catalogues and identify families of events which characterize the studied microseismicity. Clusters include events with similar focal mechanisms, first requiring the definition of distance between focal mechanisms. Different metrics are here proposed, both for the case of pure double couple, constrained moment tensor and full moment tensor catalogues. Different clustering approaches are implemented and discussed. The method is here applied to synthetic and real datasets from mining environments to demonstrate its potential: the proposed clustering techniques prove to be able to automatically recognise major clusters. An important application for mining monitoring concerns the early identification of anomalous rupture processes, which is relevant for hazard assessment. This study is funded by the project MINE, which is part of the R&D-Programme GEOTECHNOLOGIEN. The project MINE is funded by the German Ministry of Education and Research (BMBF), Grant of project BMBF03G0737.
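The idea of defining a distance between mechanisms and then grouping events can be illustrated with a minimal sketch. The abstract does not specify its metrics, so the normalized tensor inner product used below is an assumption, as are the function names `tensor_distance` and `cluster`, the greedy clustering rule, and the toy tensors.

```python
import math


def tensor_distance(m1, m2):
    """Distance in [0, 1] between two 3x3 moment tensors (assumed metric).

    Based on the normalized inner product of the nine components:
    0 for identical mechanisms (any scalar moment), 1 for reversed polarity.
    """
    idx = [(i, j) for i in range(3) for j in range(3)]
    dot = sum(m1[i][j] * m2[i][j] for i, j in idx)
    n1 = math.sqrt(sum(m1[i][j] ** 2 for i, j in idx))
    n2 = math.sqrt(sum(m2[i][j] ** 2 for i, j in idx))
    cos = max(-1.0, min(1.0, dot / (n1 * n2)))  # clamp rounding noise
    return (1.0 - cos) / 2.0


def cluster(tensors, threshold):
    """Greedy clustering: an event joins the first cluster whose seed event
    lies within `threshold`, otherwise it seeds a new cluster."""
    seeds = []   # index of the seed event of each cluster
    labels = []  # cluster label of each event
    for m in tensors:
        for k, s in enumerate(seeds):
            if tensor_distance(m, tensors[s]) <= threshold:
                labels.append(k)
                break
        else:
            seeds.append(len(labels))
            labels.append(len(seeds) - 1)
    return labels


# Hypothetical toy catalogue: two mechanism families, one event scaled by 2.
ss = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
ss2 = [[0, 2, 0], [2, 0, 0], [0, 0, 0]]
ds = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
labels = cluster([ss, ds, ss2, ds], threshold=0.1)
```

Because the distance is normalized, events that differ only in size fall into the same family; a density-based or hierarchical method could replace the greedy rule without changing the metric.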
Biogeochemical behaviour and bioremediation of uranium in waters of abandoned mines.
Mkandawire, Martin
2013-11-01
The discharges of uranium and associated radionuclides as well as heavy metals and metalloids from waste and tailing dumps in abandoned uranium mining and processing sites pose contamination risks to surface and groundwater. Although many more are being planned for nuclear energy purposes, most of the abandoned uranium mines are a legacy of uranium production that fuelled the arms race during the Cold War of the last century. Since the end of the Cold War, there have been efforts to rehabilitate the mining sites, initially using classical remediation techniques based on heavy chemical and civil engineering. Recently, bioremediation technology has been sought as an alternative to the classical approach for reasons that include: (a) the high number of sites requiring remediation; (b) the economic implications of running and maintaining the facilities due to high energy and work force demand; and (c) the pattern and characteristics of contaminant discharges in most of the former uranium mining and processing sites, which prevent the use of classical methods. This review discusses the risks of uranium contamination from abandoned uranium mines from the biogeochemical point of view and the potential and limitations of uranium bioremediation as an alternative to the classical approach in abandoned uranium mining and processing sites.
A life-cycle description of underground coal mining
NASA Technical Reports Server (NTRS)
Lavin, M. L.; Borden, C. S.; Duda, J. R.
1978-01-01
An initial effort to relate the major technological and economic variables which impact conventional underground coal mining systems, in order to help identify promising areas for advanced mining technology is described. The point of departure is a series of investment analyses published by the United States Bureau of Mines, which provide both the analytical framework and guidance on a choice of variables.
Prior knowledge based mining functional modules from Yeast PPI networks with gene ontology
2010-01-01
Background: In the literature, there are fruitful algorithmic approaches for identifying functional modules in protein-protein interaction (PPI) networks. Because large-scale interaction data are accumulating for multiple organisms and some interaction data are not recorded in the existing PPI databases, it remains urgent to design novel computational techniques that can correctly and scalably analyze interaction data sets. Indeed there are a number of large scale biological data sets providing indirect evidence for protein-protein interaction relationships. Results: The main aim of this paper is to present a prior knowledge based mining strategy to identify functional modules from PPI networks with the aid of Gene Ontology. A higher similarity value in Gene Ontology means that two gene products are more functionally related to each other, so it is better to group such gene products into one functional module. We study (i) encoding the functional pairs into the existing PPI networks; and (ii) using these functional pairs as pairwise constraints to supervise the existing functional module identification algorithms. A topology-based modularity metric and complex annotation in MIPS are used to evaluate the functional modules identified by these two approaches. Conclusions: The experimental results on Yeast PPI networks and GO have shown that the prior knowledge based learning methods perform better than the existing algorithms. PMID:21172053
Salvati, L; Kosmas, C; Kairis, O; Karavitis, C; Acikalin, S; Belgacem, A; Solé-Benet, A; Chaker, M; Fassouli, V; Gokceoglu, C; Gungor, H; Hessel, R; Khatteli, H; Kounalaki, A; Laouina, A; Ocakoglu, F; Ouessar, M; Ritsema, C; Sghaier, M; Sonmez, H; Taamallah, H; Tezcan, L; de Vente, J; Kelly, C; Colantoni, A; Carlucci, M
2016-12-01
This study investigates the relationship between fine resolution, local-scale biophysical and socioeconomic contexts within which land degradation occurs, and the human responses to it. The research draws on experimental data collected under different territorial and socioeconomic conditions at 586 field sites in five Mediterranean countries (Spain, Greece, Turkey, Tunisia and Morocco). We assess the level of desertification risk under various land management practices (terracing, grazing control, prevention of wildland fires, soil erosion control measures, soil water conservation measures, sustainable farming practices, land protection measures and financial subsidies) taken as possible responses to land degradation. A data mining approach, incorporating principal component analysis, non-parametric correlations, multiple regression and canonical analysis, was developed to identify the spatial relationship between land management conditions, the socioeconomic and environmental context (described using 40 biophysical and socioeconomic indicators) and desertification risk. Our analysis identified a number of distinct relationships between the level of desertification experienced and the underlying socioeconomic context, suggesting that the effectiveness of responses to land degradation is strictly dependent on the local biophysical and socioeconomic context. Assessing the latent relationship between land management practices and the biophysical/socioeconomic attributes characterizing areas exposed to different levels of desertification risk proved to be an indirect measure of the effectiveness of field actions contrasting land degradation. Copyright © 2016 Elsevier Ltd. All rights reserved.
Data mining: comparing the empiric CFS to the Canadian ME/CFS case definition.
Jason, Leonard A; Skendrovic, Beth; Furst, Jacob; Brown, Abigail; Weng, Angela; Bronikowski, Christine
2012-01-01
This article contrasts two case definitions for myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS). We compared the empiric CFS case definition (Reeves et al., 2005) and the Canadian ME/CFS clinical case definition (Carruthers et al., 2003) with a sample of individuals with CFS versus those without. Data mining with decision trees was used to identify the best items to identify patients with CFS. Data mining is a statistical technique that was used to help determine which of the survey questions were most effective for accurately classifying cases. The empiric criteria identified about 79% of patients with CFS and the Canadian criteria identified 87% of patients. Items identified by the Canadian criteria had more construct validity. The implications of these findings are discussed. © 2011 Wiley Periodicals, Inc.
Towards an Enhanced Aspect-based Contradiction Detection Approach for Online Review Content
NASA Astrophysics Data System (ADS)
Nuradilah Azman, Siti; Ishak, Iskandar; Sharef, Nurfadhlina Mohd; Sidi, Fatimah
2017-09-01
User generated content such as online reviews plays an important role in customers' purchase decisions. Many works have focused on identifying the satisfaction of the reviewer in social media through the study of sentiment analysis (SA) and opinion mining. The large number of potential applications and the increasing number of opinions expressed on the web have driven researchers' interest in sentiment analysis and opinion mining. However, owing to reviewers' idiosyncrasies, reviewers may have different preferences and points of view on a particular subject, which in this case is hotel reviews. There is still limited research that focuses on contradiction detection from the perspective of tourism online reviews, especially numerical contradiction. Therefore, the aim of this paper is to investigate the types of contradiction in online reviews, focusing mainly on hotel online reviews, to provide useful material on processes or methods for identifying contradiction within the review itself, and to determine opportunities for relevant future research on online review contradiction detection. We also propose a model to detect numerical contradiction in user generated content for the tourism industry.
Using data mining to segment healthcare markets from patients' preference perspectives.
Liu, Sandra S; Chen, Jie
2009-01-01
This paper aims to provide an example of how to use data mining techniques to identify patient segments regarding preferences for healthcare attributes and their demographic characteristics. Data were derived from a number of individuals who received in-patient care at a health network in 2006. Data mining and conventional hierarchical clustering with average linkage and Pearson correlation procedures are employed and compared to show how each procedure best determines segmentation variables. Data mining tools identified three differentiable segments by means of cluster analysis. These three clusters have significantly different demographic profiles. The study reveals, when compared with traditional statistical methods, that data mining provides an efficient and effective tool for market segmentation. When there are numerous cluster variables involved, researchers and practitioners need to incorporate factor analysis for reducing variables to clearly and meaningfully understand clusters. Interests and applications in data mining are increasing in many businesses. However, this technology is seldom applied to healthcare customer experience management. The paper shows that efficient and effective application of data mining methods can aid the understanding of patient healthcare preferences.
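Hierarchical clustering with average linkage and a Pearson-correlation distance, as named in the abstract above, can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the function names, the distance definition 1 - r, and the four toy preference profiles are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering; the distance between clusters is the mean
    of 1 - r over all cross-cluster pairs (average linkage)."""
    clusters = [[i] for i in range(len(profiles))]

    def dist(i, j):
        return 1.0 - pearson(profiles[i], profiles[j])

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist(i, j) for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical ratings of four care attributes by four patients.
profiles = [
    [5, 4, 1, 1],
    [4, 5, 2, 1],
    [1, 2, 5, 4],
    [1, 1, 4, 5],
]
segments = average_linkage(profiles, n_clusters=2)
```

A correlation distance groups patients by the shape of their preference profile rather than by how high they rate overall, which is one reason it is used for segmentation.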
Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges
Singhal, Ayush; Leaman, Robert; Catlett, Natalie; Lemberger, Thomas; McEntyre, Johanna; Polson, Shawn; Xenarios, Ioannis; Arighi, Cecilia; Lu, Zhiyong
2016-01-01
Text mining in the biomedical sciences is rapidly transitioning from small-scale evaluation to large-scale application. In this article, we argue that text-mining technologies have become essential tools in real-world biomedical research. We describe four large scale applications of text mining, as showcased during a recent panel discussion at the BioCreative V Challenge Workshop. We draw on these applications as case studies to characterize common requirements for successfully applying text-mining techniques to practical biocuration needs. We note that system ‘accuracy’ remains a challenge and identify several additional common difficulties and potential research directions including (i) the ‘scalability’ issue due to the increasing need of mining information from millions of full-text articles, (ii) the ‘interoperability’ issue of integrating various text-mining systems into existing curation workflows and (iii) the ‘reusability’ issue on the difficulty of applying trained systems to text genres that are not seen previously during development. We then describe related efforts within the text-mining community, with a special focus on the BioCreative series of challenge workshops. We believe that focusing on the near-term challenges identified in this work will amplify the opportunities afforded by the continued adoption of text-mining tools. Finally, in order to sustain the curation ecosystem and have text-mining systems adopted for practical benefits, we call for increased collaboration between text-mining researchers and various stakeholders, including researchers, publishers and biocurators. PMID:28025348
A new approach to preserve privacy data mining based on fuzzy theory in numerical database
NASA Astrophysics Data System (ADS)
Cui, Run; Kim, Hyoung Joong
2014-01-01
With the rapid development of information techniques, data mining approaches have become one of the most important tools to discover deep associations among tuples in large-scale databases. Hence, protecting private information is a major challenge, especially during the data mining procedure. In this paper, a new method based on fuzzy theory is proposed for privacy protection. The traditional fuzzy approach in this area applies fuzzification to the data without considering its readability. A new style of obscured data expression is introduced to provide more details of the subsets without reducing the readability. We also adopt an approach that balances privacy level and utility when forming suitable subgroups. An experiment is provided to show that this approach is suitable for classification without reducing accuracy. In the future, with suitable modification, this approach can be adapted to data streams, owing to the low computational complexity of the fuzzy function.
Mak, Wai Shun; Tran, Stephen; Marcheschi, Ryan; Bertolani, Steve; Thompson, James; Baker, David; Liao, James C; Siegel, Justin B
2015-11-24
The ability to biosynthetically produce chemicals beyond what is commonly found in Nature requires the discovery of novel enzyme function. Here we utilize two approaches to discover enzymes that enable specific production of longer-chain (C5-C8) alcohols from sugar. The first approach combines bioinformatics and molecular modelling to mine sequence databases, resulting in a diverse panel of enzymes capable of catalysing the targeted reaction. The median catalytic efficiency of the computationally selected enzymes is 75-fold greater than a panel of naively selected homologues. This integrative genomic mining approach establishes a unique avenue for enzyme function discovery in the rapidly expanding sequence databases. The second approach uses computational enzyme design to reprogramme specificity. Both approaches result in enzymes with >100-fold increase in specificity for the targeted reaction. When enzymes from either approach are integrated in vivo, longer-chain alcohol production increases over 10-fold and represents >95% of the total alcohol products.
Kwon, Yeondae; Natori, Yukikazu
2017-01-01
The proportion of the elderly population in most countries worldwide is increasing dramatically. Therefore, social interest in the fields of health, longevity, and anti-aging has been increasing as well. However, the basic research results obtained from a reductionist approach in biology and a bioinformatic approach in genome science have limited usefulness for generating insights on future health, longevity, and anti-aging-related research on a case by case basis. We propose a new approach that uses our literature mining technique and bioinformatics, which lead to a better perspective on research trends by providing an expanded knowledge base to work from. We demonstrate that our approach provides useful information that deepens insights on future trends which differs from data obtained conventionally, and this methodology is already paving the way for a new field in aging-related research based on literature mining. One compelling example of this is how our new approach can be a useful tool in drug repositioning. PMID:28817730
An Efficient Pattern Mining Approach for Event Detection in Multivariate Temporal Data
Batal, Iyad; Cooper, Gregory; Fradkin, Dmitriy; Harrison, James; Moerchen, Fabian; Hauskrecht, Milos
2015-01-01
This work proposes a pattern mining approach to learn event detection models from complex multivariate temporal data, such as electronic health records. We present Recent Temporal Pattern mining, a novel approach for efficiently finding predictive patterns for event detection problems. This approach first converts the time series data into time-interval sequences of temporal abstractions. It then constructs more complex time-interval patterns backward in time using temporal operators. We also present the Minimal Predictive Recent Temporal Patterns framework for selecting a small set of predictive and non-spurious patterns. We apply our methods for predicting adverse medical events in real-world clinical data. The results demonstrate the benefits of our methods in learning accurate event detection models, which is a key step for developing intelligent patient monitoring and decision support systems. PMID:26752800
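The first step described above, converting a time series into time-interval sequences of temporal abstractions, can be sketched as below. This is a minimal illustration that assumes a simple value abstraction into low/normal/high states; the thresholds, state labels, function names, and sample readings are hypothetical and are not taken from the paper.

```python
def abstract_series(samples, low, high):
    """Convert (time, value) samples into (state, start, end) intervals.

    Each value is mapped to "L", "N" or "H" by the two thresholds, and
    consecutive samples with the same state are merged into one interval.
    """
    intervals = []
    for t, v in samples:
        state = "L" if v < low else "H" if v > high else "N"
        if intervals and intervals[-1][0] == state:
            # Extend the current interval to the new time point.
            intervals[-1] = (state, intervals[-1][1], t)
        else:
            intervals.append((state, t, t))
    return intervals


def before(a, b):
    """Temporal operator: interval a ends before interval b starts."""
    return a[2] < b[1]


# Hypothetical lab readings taken at times 0..4.
readings = [(0, 95), (1, 110), (2, 190), (3, 200), (4, 120)]
intervals = abstract_series(readings, low=70, high=180)
```

Patterns such as "normal before high" can then be expressed over these intervals, for example `before(intervals[0], intervals[1])`.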
Huesch, Marco D
2017-12-01
Surveillance of the safety of prescribed drugs after marketing approval has been secured remains fraught with complications. Formal ascertainment by providers and reporting to adverse-event registries, formal surveys by manufacturers, and mining of electronic medical records are all well-known approaches with varying degrees of difficulty, cost, and success. Novel approaches may be a useful adjunct, especially approaches that mine or sample internet-based sources such as online social networks. A novel commercial software-as-a-service data-mining product supplied by Sysomos from Datasift/Facebook was used to mine all mentions on Facebook of statins and statin-related side effects in the US in the 1-month period 9 January 2017 through 8 February 2017. A total of 4.3% of all 25,700 mentions of statins also mentioned typical statin-related side effects. Multiple methodological weaknesses stymie interpretation of this percentage, which is however not inconsistent with estimates that 5-20% of patients taking statins will experience typical side effects at some time. Future work on pharmacovigilance may be informed by this novel commercial tool, but the inability to mine the full text of a posting poses serious challenges to content categorization.
An Empirical Model for Mine-Blast Loading
2014-10-17
fledged experimental program. The numerical approach however suffers from several drawbacks in the mine blast simulations. First, it is a very...Suffield consisted in a pendulum type device to measure global impulse of buried mine [15]. One of the main purposes of the ONAGER pendulum was to study...TP-1 Terminal effects, KTA 1-34 report, 2004. [15] Bues, R., Hlady, S.L. and Bergeron, D.M., Pendulum Measurement of Land Mine Blast Output, Volume
Utilization of volume correlation filters for underwater mine identification in LIDAR imagery
NASA Astrophysics Data System (ADS)
Walls, Bradley
2008-04-01
Underwater mine identification persists as a critical technology pursued aggressively by the Navy for fleet protection. As such, new and improved techniques must continue to be developed in order to provide measurable increases in mine identification performance and noticeable reductions in false alarm rates. In this paper we show how recent advances in the Volume Correlation Filter (VCF) developed for ground based LIDAR systems can be adapted to identify targets in underwater LIDAR imagery. Current automated target recognition (ATR) algorithms for underwater mine identification employ spatial based three-dimensional (3D) shape fitting of models to LIDAR data to identify common mine shapes consisting of the box, cylinder, hemisphere, truncated cone, wedge, and annulus. VCFs provide a promising alternative to these spatial techniques by correlating 3D models against the 3D rendered LIDAR data.
Introducing Text Analytics as a Graduate Business School Course
ERIC Educational Resources Information Center
Edgington, Theresa M.
2011-01-01
Text analytics refers to the process of analyzing unstructured data from documented sources, including open-ended surveys, blogs, and other types of web dialog. Text analytics has enveloped the concept of text mining, an analysis approach influenced heavily from data mining. While text mining has been covered extensively in various computer…
ERIC Educational Resources Information Center
Anaya, Antonio R.; Boticario, Jesus G.
2009-01-01
Data mining methods are successful in educational environments to discover new knowledge or learner skills or features. Unfortunately, they have not been used in depth with collaboration. We have developed a scalable data mining method, whose objective is to infer information on the collaboration during the collaboration process in a…
States with active and abandoned mines face large private and public costs to remediate damage to streams and rivers from acid mine drainage (AMD). Appalachian states have an especially large number of contaminated streams and rivers, and the USGS places AMD as the primary source...
Sustainable Remediation of Legacy Mine Drainage: A Case Study of the Flight 93 National Memorial.
Emili, Lisa A; Pizarchik, Joseph; Mahan, Carolyn G
2016-03-01
Pollution from mining activities is a global environmental concern, not limited to areas of current resource extraction, but including a broader geographic area of historic (legacy) and abandoned mines. The pollution of surface waters from acid mine drainage is a persistent problem and requires a holistic and sustainable approach to addressing the spatial and temporal complexity of mining-specific problems. In this paper, we focus on the environmental, socio-economic, and legal challenges associated with the concurrent activities to remediate a coal mine site and to develop a national memorial following a catastrophic event. We provide a conceptual construct of a socio-ecological system defined at several spatial, temporal, and organizational scales and a critical synthesis of the technical and social learning processes necessary to achieving sustainable environmental remediation. Our case study is an example of a multi-disciplinary management approach, whereby collaborative interaction of stakeholders, the emergence of functional linkages for information exchange, and mediation led to scientifically informed decision making, creative management solutions, and ultimately environmental policy change.
Screening Electronic Health Record-Related Patient Safety Reports Using Machine Learning.
Marella, William M; Sparnon, Erin; Finley, Edward
2017-03-01
The objective of this study was to develop a semiautomated approach to screening cases that describe hazards associated with the electronic health record (EHR) from a mandatory, population-based patient safety reporting system. Potentially relevant cases were identified through a query of the Pennsylvania Patient Safety Reporting System. A random sample of cases were manually screened for relevance and divided into training, testing, and validation data sets to develop a machine learning model. This model was used to automate screening of remaining potentially relevant cases. Of the 4 algorithms tested, a naive Bayes kernel performed best, with an area under the receiver operating characteristic curve of 0.927 ± 0.023, accuracy of 0.855 ± 0.033, and F score of 0.877 ± 0.027. The machine learning model and text mining approach described here are useful tools for identifying and analyzing adverse event and near-miss reports. Although reporting systems are beginning to incorporate structured fields on health information technology and the EHR, these methods can identify related events that reporters classify in other ways. These methods can facilitate analysis of legacy safety reports by retrieving health information technology-related and EHR-related events from databases without fields and controlled values focused on this subject and distinguishing them from reports in which the EHR is mentioned only in passing. Machine learning and text mining are useful additions to the patient safety toolkit and can be used to semiautomate screening and analysis of unstructured text in safety reports from frontline staff.
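The three figures reported above (area under the ROC curve, accuracy, and F score) can be computed from a screened sample as in the following sketch. It assumes the F score is the balanced F1 measure, which the abstract does not state; the function names, labels, and scores are hypothetical.

```python
def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 for binary labels (1 = relevant report)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


def auc(y_true, scores):
    """Area under the ROC curve as the probability that a relevant report
    is scored above an irrelevant one (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical manual labels and classifier scores for six reports.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
acc, f1 = accuracy_and_f1(y_true, y_pred)
area = auc(y_true, scores)
```

Accuracy and F1 depend on the chosen cut-off (0.5 here), whereas the area under the curve summarizes ranking quality across all cut-offs.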
Mimosa: Mixture Model of Co-expression to Detect Modulators of Regulatory Interaction
NASA Astrophysics Data System (ADS)
Hansen, Matthew; Everett, Logan; Singh, Larry; Hannenhalli, Sridhar
Functionally related genes tend to be correlated in their expression patterns across multiple conditions and/or tissue-types. Thus co-expression networks are often used to investigate functional groups of genes. In particular, when one of the genes is a transcription factor (TF), the co-expression-based interaction is interpreted, with caution, as a direct regulatory interaction. However, any particular TF, and more importantly, any particular regulatory interaction, is likely to be active only in a subset of experimental conditions. Moreover, the subset of expression samples where the regulatory interaction holds may be marked by presence or absence of a modifier gene, such as an enzyme that post-translationally modifies the TF. Such subtlety of regulatory interactions is overlooked when one computes an overall expression correlation. Here we present a novel mixture modeling approach where a TF-Gene pair is presumed to be significantly correlated (with unknown coefficient) in a (unknown) subset of expression samples. The parameters of the model are estimated using a Maximum Likelihood approach. The estimated mixture of expression samples is then mined to identify genes potentially modulating the TF-Gene interaction. We have validated our approach using synthetic data and on three biological cases in cow and in yeast. While limited in some ways, as discussed, the work represents a novel approach to mine expression data and detect potential modulators of regulatory interactions.
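The motivating observation, that an overall correlation can hide a relationship holding only in a subset of samples, can be seen in a small numeric sketch. This does not implement the mixture model or its maximum likelihood estimation; the expression values and the split into two sample groups are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Hypothetical expression values for a TF and a target gene.
# In samples 0-4 the pair is co-expressed; in samples 5-9 it is not.
TF = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
GENE = [1, 2, 3, 4, 5, 3, 1, 5, 1, 3]

R_ALL = pearson(TF, GENE)             # correlation over all ten samples
R_SUBSET = pearson(TF[:5], GENE[:5])  # correlation in the co-expressed subset
```

Here the subset correlation is essentially perfect while the pooled value is much weaker, which is the situation a mixture over samples is meant to uncover when the subset is not known in advance.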
Sams, James I.; Veloski, Garret; Ackman, T.E.
2003-01-01
Nighttime high-resolution airborne thermal infrared imagery (TIR) data were collected in the predawn hours during Feb 5-8 and March 11-12, 1999, from a helicopter platform for 72.4 km of the Youghiogheny River, from Connellsville to McKeesport, in southwestern Pennsylvania. The TIR data were used to identify sources of mine drainage from abandoned mines that discharge directly into the Youghiogheny River. Image-processing and geographic information systems (GIS) techniques were used to identify 70 sites within the study area as possible mine drainage sources. The combination of GIS datasets and the airborne TIR data provided a fast and accurate method to target the possible sources. After field reconnaissance, it was determined that 24 of the 70 sites were mine drainage. This paper summarizes: the procedures used to process the TIR data and extract potential mine-drainage sites; methods used for verification of the TIR data; a discussion of factors affecting the TIR data; and a brief summary of water quality.
ERIC Educational Resources Information Center
Ford, Julie Dyke
2012-01-01
This program profile describes a new approach towards integrating communication within Mechanical Engineering curricula. The author, who holds a joint appointment between Technical Communication and Mechanical Engineering at New Mexico Institute of Mining and Technology, has been collaborating with Mechanical Engineering colleagues to establish a…
A Voronoi interior adjacency-based approach for generating a contour tree
NASA Astrophysics Data System (ADS)
Chen, Jun; Qiao, Chaofei; Zhao, Renliang
2004-05-01
A contour tree is a good graphical tool for representing the spatial relations of contour lines and has found many applications in map generalization, map annotation, terrain analysis, etc. A new approach for generating contour trees by introducing a Voronoi-based interior adjacency set concept is proposed in this paper. The immediate interior adjacency set is employed to identify all of the children contours of each contour without contour elevations. It has advantages over existing methods such as the point-in-polygon method and the region growing-based method. This new approach can be used for spatial data mining and knowledge discovery, such as the automatic extraction of terrain features and construction of multi-resolution digital elevation models.
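Once each contour's immediate interior adjacency set is known, assembling the tree itself is straightforward. The sketch below assumes those sets are already available as a plain dictionary; the Voronoi computation that would produce them is not shown, and the contour identifiers and function name are hypothetical.

```python
def build_contour_tree(children_of):
    """Build parent links and depths from immediate interior adjacency sets.

    children_of maps a contour id to the list of contours lying
    immediately inside it (its children in the contour tree).
    """
    parent = {}
    for contour, children in children_of.items():
        for child in children:
            parent[child] = contour

    # Contours that are nobody's child are roots of the tree (or forest).
    roots = [c for c in children_of if c not in parent]

    depth = {}
    stack = [(r, 0) for r in roots]
    while stack:
        node, d = stack.pop()
        depth[node] = d
        for child in children_of.get(node, []):
            stack.append((child, d + 1))
    return parent, depth


# Hypothetical adjacency sets: A encloses B and C, and B encloses D.
ADJ = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}
PARENT, DEPTH = build_contour_tree(ADJ)
```

Note that no elevation values are used here, matching the abstract's point that children can be identified without contour elevations.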
A Hybrid Approach for Efficient Modeling of Medium-Frequency Propagation in Coal Mines
Brocker, Donovan E.; Sieber, Peter E.; Waynert, Joseph A.; Li, Jingcheng; Werner, Pingjuan L.; Werner, Douglas H.
2015-01-01
An efficient procedure for modeling medium frequency (MF) communications in coal mines is introduced. In particular, a hybrid approach is formulated and demonstrated utilizing ideal transmission line equations to model MF propagation in combination with full-wave sections used for accurate simulation of local antenna-line coupling and other near-field effects. This work confirms that the hybrid method accurately models signal propagation from a source to a load for various system geometries and material compositions, while significantly reducing computation time. With such dramatic improvement to solution times, it becomes feasible to perform large-scale optimizations with the primary motivation of improving communications in coal mines both for daily operations and emergency response. Furthermore, it is demonstrated that the hybrid approach is suitable for modeling and optimizing large communication networks in coal mines that may otherwise be intractable to simulate using traditional full-wave techniques such as moment methods or finite-element analysis. PMID:26478686
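The abstract does not give the transmission line equations it uses. As a minimal sketch of what an ideal-line calculation looks like, the code below evaluates the textbook input impedance of a lossless line, Z_in = Z0 (Z_L + j Z0 tan(beta l)) / (Z0 + j Z_L tan(beta l)). The function names, the 50 ohm characteristic impedance, and the propagation velocity are assumptions chosen for illustration, not values from the paper.

```python
import math


def phase_constant(frequency_hz, velocity_m_s):
    """Phase constant beta = 2*pi*f / v of a lossless line, in rad/m."""
    return 2 * math.pi * frequency_hz / velocity_m_s


def input_impedance(z0, z_load, beta, length_m):
    """Input impedance seen looking into a lossless line terminated in z_load."""
    t = math.tan(beta * length_m)
    return z0 * (z_load + 1j * z0 * t) / (z0 + 1j * z_load * t)


# Hypothetical example: a 500 kHz signal (within the MF band) on a line
# with an assumed propagation velocity of 3e8 m/s, i.e. a 600 m wavelength.
BETA = phase_constant(500e3, 3e8)
Z_MATCHED = input_impedance(50.0, 50.0, BETA, 123.0)   # matched load
Z_QUARTER = input_impedance(50.0, 100.0, BETA, 150.0)  # quarter-wave section
```

A matched load returns the characteristic impedance at any length, and a quarter-wave section transforms the load to Z0 squared divided by Z_L. Closed-form evaluations of this kind are cheap, which is consistent with the computation-time savings the abstract attributes to the hybrid method.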
Aubry, Marc; Monnier, Annabelle; Chicault, Celine; de Tayrac, Marie; Galibert, Marie-Dominique; Burgun, Anita; Mosser, Jean
2006-01-01
Background Large-scale genomic studies based on transcriptome technologies provide clusters of genes that need to be functionally annotated. The Gene Ontology (GO) implements a controlled vocabulary organised into three hierarchies: cellular components, molecular functions and biological processes. This terminology allows a coherent and consistent description of the knowledge about gene functions. The GO terms related to genes come primarily from semi-automatic annotations made by trained biologists (annotation based on evidence) or text-mining of the published scientific literature (literature profiling). Results We report an original functional annotation method based on a combination of evidence and literature that overcomes the weaknesses and the limitations of each approach. It relies on the Gene Ontology Annotation database (GOA Human) and the PubGene biomedical literature index. We support these annotations with statistically associated GO terms and retrieve associative relations across the three GO hierarchies to emphasise the major pathways involved by a gene cluster. Both annotation methods and associative relations were quantitatively evaluated with a reference set of 7397 genes and a multi-cluster study of 14 clusters. We also validated the biological appropriateness of our hybrid method with the annotation of a single gene (cdc2) and that of a down-regulated cluster of 37 genes identified by a transcriptome study of an in vitro enterocyte differentiation model (CaCo-2 cells). Conclusion The combination of both approaches is more informative than either separate approach: literature mining can enrich an annotation based only on evidence. Text-mining of the literature can also find valuable associated MEDLINE references that confirm the relevance of the annotation. Eventually, GO terms networks can be built with associative relations in order to highlight cooperative and competitive pathways and their connected molecular functions. PMID:16674810
Using Helicopter Electromagnetic Surveys to Identify Potential Hazards at Mine Waste Impoundments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hammack, R.W.
2008-01-01
In July 2003, helicopter electromagnetic surveys were conducted at 14 coal waste impoundments in southern West Virginia. The purpose of the surveys was to detect conditions that could lead to impoundment failure either by structural failure of the embankment or by the flooding of adjacent or underlying mine works. Specifically, the surveys attempted to: 1) identify saturated zones within the mine waste, 2) delineate filtrate flow paths through the embankment or into adjacent strata and receiving streams, and 3) identify flooded mine workings underlying or adjacent to the waste impoundment. Data from the helicopter surveys were processed to generate conductivity/depth images. Conductivity/depth images were then spatially linked to georeferenced air photos or topographic maps for interpretation. Conductivity/depth images were found to provide a snapshot of the hydrologic conditions that exist within the impoundment. This information can be used to predict potential areas of failure within the embankment because of its ability to image the phreatic zone. Also, the electromagnetic survey can identify areas of unconsolidated slurry in the decant basin and beneath the embankment. Although shallow, flooded mineworks beneath the impoundment were identified by this survey, it cannot be assumed that electromagnetic surveys can detect all underlying mines. A preliminary evaluation of the data implies that helicopter electromagnetic surveys can provide a better understanding of the phreatic zone than the piezometer arrays that are typically used.
Estimating natural background groundwater chemistry, Questa molybdenum mine, New Mexico
Verplanck, Phillip L.; Nordstrom, D. Kirk; Plumlee, Geoffrey S.; Walker, Bruce M.; Morgan, Lisa A.; Quane, Steven L.
2010-01-01
This 2 1/2 day field trip will present an overview of a U.S. Geological Survey (USGS) project whose objective was to estimate pre-mining groundwater chemistry at the Questa molybdenum mine, New Mexico. Because of intense debate among stakeholders regarding pre-mining groundwater chemistry standards, the New Mexico Environment Department and Chevron Mining Inc. (formerly Molycorp) agreed that the USGS should determine pre-mining groundwater quality at the site. In 2001, the USGS began a 5-year, multidisciplinary investigation to estimate pre-mining groundwater chemistry utilizing a detailed assessment of a proximal natural analog site and applied an interdisciplinary approach to infer pre-mining conditions. The trip will include a surface tour of the Questa mine and key locations in the erosion scar areas and along the Red River. The trip will provide participants with a detailed understanding of geochemical processes that influence pre-mining environmental baselines in mineralized areas and estimation techniques for determining pre-mining baseline conditions.
NASA Astrophysics Data System (ADS)
Moyle, Steve
Collaborative Data Mining is a setting where the Data Mining effort is distributed to multiple collaborating agents - human or software. The objective of the collaborative Data Mining effort is to produce solutions to the tackled Data Mining problem which are considered better by some metric, with respect to those solutions that would have been achieved by individual, non-collaborating agents. The solutions require evaluation, comparison, and approaches for combination. Collaboration requires communication, and implies some form of community. The human form of collaboration is a social task. Organizing communities in an effective manner is non-trivial and often requires well defined roles and processes. Data Mining, too, benefits from a standard process. This chapter explores the standard Data Mining process CRISP-DM utilized in a collaborative setting.
Data mining approach to model the diagnostic service management.
Lee, Sun-Mi; Lee, Ae-Kyung; Park, Il-Su
2006-01-01
Korea has a National Health Insurance Program operated by the government-owned National Health Insurance Corporation, and diagnostic services are provided every two years for the insured and their family members. Developing a customer relationship management (CRM) system using data mining technology would be useful to improve the performance of diagnostic service programs. Under these circumstances, this study developed a model for diagnostic service management taking into account the characteristics of subjects using a data mining approach. This study could be further used to develop an automated CRM system contributing to the increase in the rate of receiving diagnostic services.
Stress-Survival Gene Identification From an Acid Mine Drainage Algal Mat Community
NASA Astrophysics Data System (ADS)
Urbina-Navarrete, J.; Fujishima, K.; Paulino-Lima, I. G.; Rothschild-Mancinelli, B.; Rothschild, L. J.
2014-12-01
Microbial communities from acid mine drainage environments are exposed to multiple stressors, including low pH, high dissolved metal loads, seasonal freezing, and desiccation. The microbial and algal communities that inhabit these niche environments have evolved strategies that allow for their ecological success. Metagenomic analyses are useful in identifying species diversity; however, they do not elucidate the mechanisms that allow for the resilience of a community under these extreme conditions. Many known or predicted genes encode for protein products that are unknown, or similarly, many proteins cannot be traced to their gene of origin. This investigation seeks to identify genes that are active in an algal consortium during stress from living in an acid mine drainage environment. Our approach involves using the entire community transcriptome for a functional screen in an Escherichia coli host. This approach directly targets the genes involved in survival, without need for characterizing the members of the consortium. The consortium was harvested and stressed with conditions similar to the native environment it was collected from. Exposure to low pH (< 3.2), high metal load, desiccation, and deep freeze resulted in the expression of stress-induced genes that were transcribed into messenger RNA (mRNA). These mRNA transcripts were harvested to build complementary DNA (cDNA) libraries in E. coli. The transformed E. coli were exposed to the same stressors as the original algal consortium to select for surviving cells. Successful cells incorporated the transcripts that encode survival mechanisms, thus allowing for selection and identification of the gene(s) involved. Initial selection screens for freeze and desiccation tolerance have yielded E. coli that are 1 order of magnitude more resistant to freezing (0.01% survival of control with no transcript, 0.2% survival of E. coli with transcript) and 3 orders of magnitude more resistant to desiccation (0.005% survival of control cells with no transcripts, 5% survival of cells with transcript). This work is transformative because genetic functions can be selected without having prior knowledge of the genes or of the organisms involved. Work continues to identify the genes responsible for tolerance to extreme conditions and the bio-mechanisms involved.
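The orders-of-magnitude figures follow directly from the reported survival percentages, as the short check below shows. Only the four percentages come from the abstract; the function and variable names are illustrative.

```python
import math


def orders_of_magnitude(survival_with, survival_without):
    """Base-10 log of the ratio of survival with a transcript to survival without."""
    return math.log10(survival_with / survival_without)


# Percent survival values as reported in the abstract.
FREEZE = orders_of_magnitude(0.2, 0.01)        # 20-fold, about 1.3 orders
DESICCATION = orders_of_magnitude(5.0, 0.005)  # 1000-fold, 3 orders
```

The freezing result is a 20-fold improvement, which the abstract rounds to one order of magnitude; the desiccation result is a 1000-fold improvement.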
78 FR 32691 - Proposed Collection; Comment Request; Certificate of Electrical Training
Federal Register 2010, 2011, 2012, 2013, 2014
2013-05-31
... DEPARTMENT OF LABOR Mine Safety and Health Administration Proposed Collection; Comment Request; Certificate of Electrical Training AGENCY: Mine Safety and Health Administration, Labor. ACTION: 60-Day Notice... clearly identified with ``OMB 1219-0001'' and sent to the Mine Safety and Health Administration (MSHA...
NASA Astrophysics Data System (ADS)
Davies, G.; Calvin, W. M.
2015-12-01
The exposure of pyrite to oxygen and water in mine waste environments is known to generate acidity and the accumulation of secondary iron minerals. Sulfates and secondary iron minerals associated with acid mine drainage (AMD) exhibit diverse spectral properties in the ultraviolet, visible and near-infrared regions of the electromagnetic spectrum. The use of hyperspectral imagery for identification of AMD mineralogy and contamination has been well studied. Fewer studies have examined the impacts of hydrologic variations on mapping AMD or the unique spectral signatures of mine waters. Open-pit mine lakes are an additional environmental hazard that has not been widely studied using imaging spectroscopy. A better understanding of AMD variation related to climate fluctuations and the spectral signatures of contaminated surface waters will aid future assessments of environmental contamination. This study examined the ability of multi-season airborne hyperspectral data to identify the geochemical evolution of substances and contaminant patterns at the Leviathan Mine Superfund site. The mine is located 24 miles southeast of Lake Tahoe and contains remnant tailings piles and several AMD collection ponds. The objectives were to 1) distinguish temporal changes in mineralogy at the remediated open-pit sulfur mine, 2) identify the absorption features of mine-affected waters, and 3) quantitatively link water spectra to known dissolved iron concentrations. Images from NASA's AVIRIS instrument were collected in the spring, summer, and fall seasons for two consecutive years at Leviathan (HyspIRI campaign). Images had a spatial resolution of 15 meters at nadir. Ground-based surveys using the ASD FieldSpecPro spectrometer and laboratory spectral and chemical analysis complemented the remote sensing data. Temporal changes in surface mineralogy were difficult to distinguish. However, seasonal changes in pond water quality were identified. Dissolved ferric iron and chlorophyll-a concentrations were determined to be the major influences on pond water spectral variation.
Alanazi, Ibrahim O; AlYahya, Sami A; Ebrahimie, Esmaeil; Mohammadi-Dehcheshmeh, Manijeh
2018-06-15
Exponentially growing scientific knowledge in scientific publications has resulted in the emergence of a new interdisciplinary science of literature mining. In text mining, the machine reads the published literature and transfers the discovered knowledge to mathematical-like formulas. In an integrative approach in this study, we used text mining in combination with network discovery, pathway analysis, and enrichment analysis of genomic regions for better understanding of biomarkers in lung cancer. Particular attention was paid to non-coding biomarkers. In total, 60 MicroRNA biomarkers were reported for lung cancer, including some prognostic biomarkers. MIR21, MIR155, MALAT1, and MIR31 were the top non-coding RNA biomarkers of lung cancer. Text mining identified 447 proteins which have been studied as biomarkers in lung cancer. EGFR (receptor), TP53 (transcription factor), KRAS, CDKN2A, ENO2, KRT19, RASSF1, GRP (ligand), SHOX2 (transcription factor), and ERBB2 (receptor) were the most studied proteins. Within small molecules, thymosin-a1, oestrogen, and 8-OHdG have received more attention. We found some chromosomal bands, such as 7q32.2, 18q12.1, 6p12, 11p15.5, and 3p21.3 that are highly involved in deriving lung cancer biomarkers. Copyright © 2018 Elsevier B.V. All rights reserved.
Mining Available Data from the United States Environmental ...
Demands for quick and accurate life cycle assessments create a need for methods to rapidly generate reliable life cycle inventories (LCI). Data mining is a suitable tool for this purpose, especially given the large amount of available governmental data. These data are typically applied to LCIs on a case-by-case basis. As linked open data becomes more prevalent, it may be possible to automate LCI using data mining by establishing a reproducible approach for identifying, extracting, and processing the data. This work proposes a method for standardizing and eventually automating the discovery and use of publicly available data at the United States Environmental Protection Agency for chemical-manufacturing LCI. The method is developed using a case study of acetic acid. The data quality and gap analyses for the generated inventory found that the selected data sources can provide information with equal or better reliability and representativeness on air, water, hazardous waste, on-site energy usage, and production volumes but with key data gaps including material inputs, water usage, purchased electricity, and transportation requirements. A comparison of the generated LCI with existing data revealed that the data mining inventory is in reasonable agreement with existing data and may provide a more-comprehensive inventory of air emissions and water discharges. The case study highlighted challenges for current data management practices that must be overcome to successfu
Convalescing Cluster Configuration Using a Superlative Framework
Sabitha, R.; Karthik, S.
2015-01-01
Competent data mining methods are vital to discover knowledge from databases which are built as a result of enormous growth of data. Various techniques of data mining are applied to obtain knowledge from these databases. Data clustering is one such descriptive data mining technique which guides in partitioning data objects into disjoint segments. K-means algorithm is a versatile algorithm among the various approaches used in data clustering. The algorithm and its diverse adaptation methods suffer certain problems in their performance. To overcome these issues a superlative algorithm has been proposed in this paper to perform data clustering. The specific feature of the proposed algorithm is discretizing the dataset, thereby improving the accuracy of clustering, and also adopting the binary search initialization method to generate cluster centroids. The generated centroids are fed as input to K-means approach which iteratively segments the data objects into respective clusters. The clustered results are measured for accuracy and validity. Experiments conducted by testing the approach on datasets from the UC Irvine Machine Learning Repository evidently show that the accuracy and validity measure is higher than the other two approaches, namely, simple K-means and Binary Search method. Thus, the proposed approach proves that discretization process will improve the efficacy of descriptive data mining tasks. PMID:26543895
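The pipeline described in this abstract (discretize the data, generate starting centroids, then hand them to K-means) can be sketched on one-dimensional data. The abstract does not specify the discretization scheme or the binary search initialization, so the equal-width binning and the rank-based centroid picker below are stand-in assumptions, and all names and data values are hypothetical.

```python
def discretize(values, n_bins):
    """Equal-width binning (assumed scheme): map each value to its bin midpoint."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # all values identical; any positive width works
    binned = []
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)
        binned.append(lo + (idx + 0.5) * width)
    return binned


def initial_centroids(values, k):
    """Pick k starting centroids at evenly spaced ranks of the sorted data.

    This is a placeholder for the paper's binary search initialization.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]


def nearest(value, centroids):
    """Index of the centroid closest to value."""
    return min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))


def kmeans_1d(values, centroids, max_iter=100):
    """Lloyd's algorithm on 1-D data, starting from the supplied centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for v in values:
            clusters[nearest(v, centroids)].append(v)
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    labels = [nearest(v, centroids) for v in values]
    return centroids, labels


# Hypothetical data with two obvious groups.
DATA = [1.0, 1.2, 0.8, 1.1, 9.0, 9.3, 8.7, 9.1]
BINNED = discretize(DATA, 10)
CENTROIDS, LABELS = kmeans_1d(BINNED, initial_centroids(BINNED, 2))
```

Supplying the starting centroids rather than drawing them at random makes the run deterministic, which is one practical reason an initialization step of this kind is attractive.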
An Outbreak of Lymphocutaneous Sporotrichosis among Mine-Workers in South Africa
Govender, Nelesh P.; Maphanga, Tsidiso G.; Zulu, Thokozile G.; Patel, Jaymati; Walaza, Sibongile; Jacobs, Charlene; Ebonwu, Joy I.; Ntuli, Sindile; Naicker, Serisha D.; Thomas, Juno
2015-01-01
Background The largest outbreak of sporotrichosis occurred between 1938 and 1947 in the gold mines of Witwatersrand in South Africa. Here, we describe an outbreak of lymphocutaneous sporotrichosis that was investigated in a South African gold mine in 2011. Methodology Employees working at a reopened section of the mine were recruited for a descriptive cross-sectional study. Informed consent was sought for interview, clinical examination and medical record review. Specimens were collected from participants with active or partially-healed lymphocutaneous lesions. Environmental samples were collected from underground mine levels. Sporothrix isolates were identified by sequencing of the internal transcribed spacer region of the ribosomal gene and the nuclear calmodulin gene. Principal Findings Of 87 male miners, 81 (93%) were interviewed and examined, of whom 29 (36%) had skin lesions; specimens were collected from 17 (59%). Sporotrichosis was laboratory-confirmed among 10 patients and seven had clinically-compatible lesions. Of 42 miners with known HIV status, 11 (26%) were HIV-infected. No cases of disseminated disease were detected. Participants with ≤3 years’ mining experience had four times greater odds of developing sporotrichosis than those who had been employed for >3 years (adjusted OR 4.0, 95% CI 1.2–13.1). Isolates from 8 patients were identified as Sporothrix schenckii sensu stricto by calmodulin gene sequencing while environmental isolates were identified as Sporothrix mexicana. Conclusions/Significance S. schenckii sensu stricto was identified as the causative pathogen. Although genetically distinct species were isolated from clinical and environmental sources, it is likely that the source was contaminated soil and untreated wood underground. No cases occurred following recommendations to close sections of the mine, treat timber and encourage consistent use of personal protective equipment. Sporotrichosis is a potentially re-emerging disease where traditional, rather than heavily mechanised, mining techniques are used. Surveillance should be instituted at sentinel locations. PMID:26407300
Allegretta, Ignazio; Porfido, Carlo; Martin, Maria; Barberis, Elisabetta; Terzano, Roberto; Spagnuolo, Matteo
2018-06-24
Arsenic concentration and distribution were studied by combining laboratory X-ray-based techniques (wavelength dispersive X-ray fluorescence (WDXRF), micro X-ray fluorescence (μXRF), and X-ray powder diffraction (XRPD)), field emission scanning electron microscopy equipped with microanalysis (FE-SEM-EDX), and sequential extraction procedure (SEP) coupled to total reflection X-ray fluorescence (TXRF) analysis. This approach was applied to three contaminated soils and one mine tailing collected near the gold extraction plant at the Crocette gold mine (Macugnaga, VB) in the Monte Rosa mining district (Piedmont, Italy). Arsenic (As) concentration, measured with WDXRF, ranged from 145 to 40,200 mg/kg. XRPD analysis evidenced the presence of jarosite and the absence of any As-bearing mineral, suggesting a high weathering grade and strong oxidative conditions. However, small domains of Fe arsenate were identified by combining μXRF with FE-SEM-EDX. SEP results revealed that As was mainly associated with amorphous Fe oxides/hydroxides or hydroxysulfates (50-80%) and the combination of XRPD and FE-SEM-EDX suggested that this phase could be attributed to schwertmannite. On the basis of the reported results, As is scarcely mobile, even if a consistent As fraction (1-3 g As/kg of soil) is still potentially mobilizable. In general, the proposed combination of laboratory X-ray techniques could be successfully employed to unravel environmental issues related to metal(loid) pollution in soil and sediments.
Understanding Teacher Users of a Digital Library Service: A Clustering Approach
ERIC Educational Resources Information Center
Xu, Beijie; Recker, Mimi
2011-01-01
This article describes the Knowledge Discovery and Data Mining (KDD) process and its application in the field of educational data mining (EDM) in the context of a digital library service called the Instructional Architect (IA.usu.edu). In particular, the study reported in this article investigated a certain type of data mining problem, clustering,…
Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges.
Singhal, Ayush; Leaman, Robert; Catlett, Natalie; Lemberger, Thomas; McEntyre, Johanna; Polson, Shawn; Xenarios, Ioannis; Arighi, Cecilia; Lu, Zhiyong
2016-01-01
Text mining in the biomedical sciences is rapidly transitioning from small-scale evaluation to large-scale application. In this article, we argue that text-mining technologies have become essential tools in real-world biomedical research. We describe four large scale applications of text mining, as showcased during a recent panel discussion at the BioCreative V Challenge Workshop. We draw on these applications as case studies to characterize common requirements for successfully applying text-mining techniques to practical biocuration needs. We note that system 'accuracy' remains a challenge and identify several additional common difficulties and potential research directions including (i) the 'scalability' issue due to the increasing need of mining information from millions of full-text articles, (ii) the 'interoperability' issue of integrating various text-mining systems into existing curation workflows and (iii) the 'reusability' issue on the difficulty of applying trained systems to text genres that are not seen previously during development. We then describe related efforts within the text-mining community, with a special focus on the BioCreative series of challenge workshops. We believe that focusing on the near-term challenges identified in this work will amplify the opportunities afforded by the continued adoption of text-mining tools. Finally, in order to sustain the curation ecosystem and have text-mining systems adopted for practical benefits, we call for increased collaboration between text-mining researchers and various stakeholders, including researchers, publishers and biocurators. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.
NASA Astrophysics Data System (ADS)
Zhang, Jun; Yao, Duoxi; Su, Yue
2018-02-01
Given current energy demand, coal will remain one of the major energy sources in China for some time, so the task of ensuring safe coal mine production remains arduous. In order to identify the water source of the mine accurately, this article uses the Renlou and Tongting coal mines in the northern Anhui mining area as examples. A total of 7 conventional water chemistry indexes were selected, including Ca²⁺, Mg²⁺, Na⁺+K⁺, Cl⁻, SO₄²⁻, HCO₃⁻ and TDS, to establish a multivariate matrix model for identifying the source of inrush water. The results show that the model is simple and is rarely limited by the quantity of water samples, and that its recognition performance is good, so it can be applied to the control and treatment of water inrush.
The influence of para-seismic vibrations, induced by blasting works, on structures: a Case Study
NASA Astrophysics Data System (ADS)
Andrusikiewicz, Wacław
2018-04-01
Underground mining operations are often associated with the necessity to use explosives. Several hundreds of kilograms of explosives, subdivided into small charges suitable for a specific mining job, are used each time in a blasting operation. In many cases, mining engineers carry out remote central blasting works, which means that all the charges placed at faces are initiated from one control point (usually, a control room in the mine) at the same time. Such coordinated explosions generate para-seismic movements whose consequences can be felt on the land surface, with subsequent effects identified in buildings and structures. This paper discusses briefly selected standards applicable to the harmful para-seismic impacts. The author presents the results of the research conducted with the intention to identify harmful effects of the blasting works carried out in the "Kłodawa" Salt Mine.
NASA Astrophysics Data System (ADS)
Whitford, Melinda M.
Science educational reforms have placed major emphasis on improving science classroom instruction and it is therefore vital to study opportunity-to-learn (OTL) variables related to student science learning experiences and teacher teaching practices. This study will identify relationships between OTL and student science achievement and will identify OTL predictors of students' attainment at various distinct achievement levels (low/intermediate/high/advanced). Specifically, the study (a) addresses limitations of previous studies by examining a large number of independent and control variables that may impact students' science achievement and (b) tests hypotheses of structural relations describing how the identified predictors and mediating factors impact student achievement levels. The study will follow a multi-stage and integrated bottom-up and top-down approach to identify predictors of students' achievement levels on standardized tests using the TIMSS 2011 dataset. Data mining or pattern recognition, a bottom-up approach, will identify the most prevalent association patterns between different student achievement levels and variables related to student science learning experiences, teacher teaching practices and home and school environments. The second stage is a top-down approach, testing structural equation models of relations between the significant predictors and students' achievement levels accordingly.
A Contextualized, Differential Sequence Mining Method to Derive Students' Learning Behavior Patterns
ERIC Educational Resources Information Center
Kinnebrew, John S.; Loretz, Kirk M.; Biswas, Gautam
2013-01-01
Computer-based learning environments can produce a wealth of data on student learning interactions. This paper presents an exploratory data mining methodology for assessing and comparing students' learning behaviors from these interaction traces. The core algorithm employs a novel combination of sequence mining techniques to identify differentially…
Coal mining is a major resource extraction activity on the Appalachian Mountains. The increased size and frequency of a specific type of surface mining, known as mountain top removal-valley fill, has in recent years raised various environmental concerns. During mountainto...
Using text-mining techniques in electronic patient records to identify ADRs from medicine use.
Warrer, Pernille; Hansen, Ebba Holme; Juhl-Jensen, Lars; Aagaard, Lise
2012-05-01
This literature review included studies that use text-mining techniques in narrative documents stored in electronic patient records (EPRs) to investigate ADRs. We searched PubMed, Embase, Web of Science and International Pharmaceutical Abstracts without restrictions from origin until July 2011. We included empirically based studies on text mining of electronic patient records (EPRs) that focused on detecting ADRs, excluding those that investigated adverse events not related to medicine use. We extracted information on study populations, EPR data sources, frequencies and types of the identified ADRs, medicines associated with ADRs, text-mining algorithms used and their performance. Seven studies, all from the United States, were eligible for inclusion in the review. Studies were published from 2001, the majority between 2009 and 2010. Text-mining techniques varied over time from simple free text searching of outpatient visit notes and inpatient discharge summaries to more advanced techniques involving natural language processing (NLP) of inpatient discharge summaries. Performance appeared to increase with the use of NLP, although many ADRs were still missed. Due to differences in study design and populations, various types of ADRs were identified and thus we could not make comparisons across studies. The review underscores the feasibility and potential of text mining to investigate narrative documents in EPRs for ADRs. However, more empirical studies are needed to evaluate whether text mining of EPRs can be used systematically to collect new information about ADRs. PMID:22122057
Challenges in recovering resources from acid mine drainage
Nordstrom, D. Kirk; Bowell, Robert J.; Campbell, Kate M.; Alpers, Charles N.
2017-01-01
Metal recovery from mine waters and effluents is not a new approach but one that has occurred largely opportunistically over the last four millennia. Due to the need for low-cost resources and increasingly stringent environmental conditions, mine waters are being considered in a fresh light with a designed, deliberate approach to resource recovery often as part of a larger water treatment evaluation. Mine water chemistry is highly dependent on many factors including geology, ore deposit composition and mineralogy, mining methods, climate, site hydrology, and others. Mine waters are typically Ca-Mg-SO4±Al±Fe with a broad range in pH and metal content. The main issue in recovering components of these waters having potential economic value, such as base metals or rare earth elements, is the separation of these from more reactive metals such as Fe and Al. Broad categories of methods for separating and extracting substances from acidic mine drainage are chemical and biological. Chemical methods include solution, physicochemical, and electrochemical technologies. Advances in membrane techniques such as reverse osmosis have been substantial and the technique is both physical and chemical. Biological methods may be further divided into microbiological and macrobiological, but only the former is considered here as a recovery method, as the latter is typically used as a passive form of water treatment.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mendis, M.S.; Rosenberg, J.I.; Medville, D.M.
1980-03-01
This report presents a summary of the analytical approach taken and the conclusions reached in an assessment of the supply and demand for manpower in the coal mining industry through the year 2000. A hybrid system dynamics/econometric model of the coal mining industry was developed which incorporates relationships between technological change, labor productivity, production costs, wages, graduation rates, and other key variables in estimating imbalances between labor supply and demand. Study results indicate that while the supply of production workers is expected to be sufficient under most future demand scenarios, periodic shortages of experienced workers, especially in the Northern Great Plains, can be expected. Other study findings are that the supply of mining engineers will be sufficient under all but the highest coal demand scenario, that a shortage of faculty will affect the supply of mining engineers in the near term, and that the employment of mining technicians is expected to exhibit the largest increase of any labor category studied. In this volume the nature of the coal mining manpower problem is discussed, a detailed description of the analysis conducted and the sources of data used is provided, and the findings of the study are presented.
Chapter 16: text mining for translational bioinformatics.
Cohen, K Bretonnel; Hunter, Lawrence E
2013-04-01
Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research (translating basic science results into new interventions) and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects, and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.
ERIC Educational Resources Information Center
Luan, Jing; Zhao, Chun-Mei; Hayek, John C.
2009-01-01
Data mining provides both systematic and systemic ways to detect patterns of student engagement among students at hundreds of institutions. Using traditional statistical techniques alone, the task would be significantly difficult--if not impossible--considering the size and complexity in both data and analytical approaches necessary for this…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kniesner, T.J.; Leeth, J.D.
2004-09-15
Using recently assembled data from the Mine Safety and Health Administration (MSHA) we shed new light on the regulatory approach to workplace safety. Because all underground coal mines are inspected quarterly, MSHA regulations will not be ineffective because of infrequent inspections. From over 200 different specifications of dynamic mine safety regressions we select the specification producing the largest MSHA impact. Even using results most favorable to the agency, MSHA is not currently cost effective. Almost 700,000 life years could be gained for typical miners if a quarter of MSHA's enforcement budget were reallocated to other programs (more heart disease screening or defibrillators at worksites).
Vilar, Santiago; Harpaz, Rave; Chase, Herbert S; Costanzi, Stefano; Rabadan, Raul
2011-01-01
Background: Adverse drug events (ADE) cause considerable harm to patients, and consequently their detection is critical for patient safety. The US Food and Drug Administration maintains an adverse event reporting system (AERS) to facilitate the detection of ADE in drugs. Various data mining approaches have been developed that use AERS to detect signals identifying associations between drugs and ADE. The signals must then be monitored further by domain experts, which is a time-consuming task. Objective: To develop a new methodology that combines existing data mining algorithms with chemical information by analysis of molecular fingerprints to enhance initial ADE signals generated from AERS, and to provide a decision support mechanism to facilitate the identification of novel adverse events. Results: The method achieved a significant improvement in precision in identifying known ADE, and a more than twofold signal enhancement when applied to the ADE rhabdomyolysis. The simplicity of the method assists in highlighting the etiology of the ADE by identifying structurally similar drugs. A set of drugs with strong evidence from both AERS and molecular fingerprint-based modeling is constructed for further analysis. Conclusion: The results demonstrate that the proposed methodology could be used as a pharmacovigilance decision support tool to facilitate ADE detection. PMID:21946238
NASA Astrophysics Data System (ADS)
Mohammadi, Mousa; Rai, Piyush; Gupta, Suprakash
2017-03-01
Overall Equipment Effectiveness (OEE) has been used for over two decades as a measure of performance in manufacturing industries. Unfortunately, OEE has not been duly adopted in the mining and excavation industry. In this paper an effort has been made to identify the OEE for performance evaluation of Bucket based Excavating, Loading and Transport (BELT) equipment. The conceptual model of OEE, as used in the manufacturing industries, has been revised to adapt to the BELT equipment. The revised and adapted model considers operational time, speed and bucket capacity utilization losses as the key OEE components for evaluating the performance of BELT equipment. To illustrate the efficacy of the devised model on a real-time basis, a case study was undertaken on the biggest single-bucket excavating equipment, the dragline, in a large surface coal mine. One year of data was collected in order to evaluate the proposed OEE model.
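The three loss components named in this abstract can be illustrated with a minimal sketch. The abstract does not give the authors' formulas, so the multiplicative form below is an assumption borrowed from the conventional manufacturing OEE (availability × performance × quality), and every field name and ratio definition is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ShiftRecord:
    """Hypothetical per-shift observations for one excavator (illustrative fields)."""
    scheduled_hours: float      # time the machine was planned to work
    operating_hours: float      # time it actually worked
    ideal_cycle_s: float        # design cycle time per bucket
    actual_cycle_s: float       # observed mean cycle time per bucket
    rated_bucket_m3: float      # rated bucket capacity
    mean_bucket_load_m3: float  # observed mean load per bucket


def oee(rec: ShiftRecord) -> float:
    """Product of three utilization ratios (an assumed form, not the paper's)."""
    time_utilization = rec.operating_hours / rec.scheduled_hours
    speed_utilization = rec.ideal_cycle_s / rec.actual_cycle_s
    bucket_utilization = rec.mean_bucket_load_m3 / rec.rated_bucket_m3
    return time_utilization * speed_utilization * bucket_utilization


# Made-up numbers, for illustration only.
example = ShiftRecord(8.0, 6.0, 60.0, 75.0, 50.0, 40.0)
example_oee = oee(example)
```

Keeping the three ratios separate before multiplying makes it easy to see which loss dominates for a given shift.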
NASA Astrophysics Data System (ADS)
Colantonio, Alessandro; di Pietro, Roberto; Ocello, Alberto; Verde, Nino Vincenzo
In this paper we address the problem of generating a candidate role-set for an RBAC configuration that enjoys the following two key features: it minimizes the administration cost, and it is a stable candidate role-set. To achieve these goals, we implement a three-step methodology: first, we associate a weight with each role; second, we identify and remove the user-permission assignments that cannot belong to a role whose weight exceeds a given threshold; third, we restrict the problem of finding a candidate role-set for the given system configuration to the user-permission assignments that have not been removed in the second step, that is, user-permission assignments that belong to roles with a weight exceeding the given threshold. We formally show that this methodology achieves the intended goals; the proofs of our results are rooted in graph theory. Finally, we discuss practical applications of our approach to the role mining problem.
Assessing Weather-Yield Relationships in Rice at Local Scale Using Data Mining Approaches
Delerce, Sylvain; Dorado, Hugo; Grillon, Alexandre; Rebolledo, Maria Camila; Prager, Steven D.; Patiño, Victor Hugo; Garcés Varón, Gabriel; Jiménez, Daniel
2016-01-01
Seasonal and inter-annual climate variability have become important issues for farmers, and climate change has been shown to increase them. Simultaneously farmers and agricultural organizations are increasingly collecting observational data about in situ crop performance. Agriculture thus needs new tools to cope with changing environmental conditions and to take advantage of these data. Data mining techniques make it possible to extract embedded knowledge associated with farmer experiences from these large observational datasets in order to identify best practices for adapting to climate variability. We introduce new approaches through a case study on irrigated and rainfed rice in Colombia. Preexisting observational datasets of commercial harvest records were combined with in situ daily weather series. Using Conditional Inference Forest and clustering techniques, we assessed the relationships between climatic factors and crop yield variability at the local scale for specific cultivars and growth stages. The analysis showed clear relationships in the various location-cultivar combinations, with climatic factors explaining 6 to 46% of spatiotemporal variability in yield, and with crop responses to weather being non-linear and cultivar-specific. Climatic factors affected cultivars differently during each stage of development. For instance, one cultivar was affected by high nighttime temperatures in the reproductive stage but responded positively to accumulated solar radiation during the ripening stage. Another was affected by high nighttime temperatures during both the vegetative and reproductive stages. Clustering of the weather patterns corresponding to individual cropping events revealed different groups of weather patterns for irrigated and rainfed systems with contrasting yield levels. Best-suited cultivars were identified for some weather patterns, making weather-site-specific recommendations possible. 
This study illustrates the potential of data mining for adding value to existing observational data in agriculture by allowing embedded knowledge to be quickly leveraged. It generates site-specific information on cultivar response to climatic factors and supports on-farm management decisions for adaptation to climate variability. PMID:27560980
Mukhopadhyay, Anirban; Maulik, Ujjwal; Bandyopadhyay, Sanghamitra
2012-01-01
Identification of potential viral-host protein interactions is a vital and useful approach towards development of new drugs targeting those interactions. In recent days, computational tools are being utilized for predicting viral-host interactions. Recently a database containing records of experimentally validated interactions between a set of HIV-1 proteins and a set of human proteins has been published. The problem of predicting new interactions based on this database is usually posed as a classification problem. However, posing the problem as a classification one suffers from the lack of biologically validated negative interactions. Therefore it will be beneficial to use the existing database for predicting new viral-host interactions without the need of negative samples. Motivated by this, in this article, the HIV-1–human protein interaction database has been analyzed using association rule mining. The main objective is to identify a set of association rules both among the HIV-1 proteins and among the human proteins, and use these rules for predicting new interactions. In this regard, a novel association rule mining technique based on biclustering has been proposed for discovering frequent closed itemsets followed by the association rules from the adjacency matrix of the HIV-1–human interaction network. Novel HIV-1–human interactions have been predicted based on the discovered association rules and tested for biological significance. For validation of the predicted new interactions, gene ontology-based and pathway-based studies have been performed. These studies show that the human proteins which are predicted to interact with a particular viral protein share many common biological activities. Moreover, literature survey has been used for validation purpose to identify some predicted interactions that are already validated experimentally but not present in the database. Comparison with other prediction methods is also discussed. PMID:22539940
An integrative data mining approach to identifying adverse outcome pathway signatures.
Oki, Noffisat O; Edwards, Stephen W
2016-03-28
The Adverse Outcome Pathway (AOP) framework is a tool for making biological connections and summarizing key information across different levels of biological organization to connect biological perturbations at the molecular level to adverse outcomes for an individual or population. Computational approaches to explore and determine these connections can accelerate the assembly of AOPs. By leveraging the wealth of publicly available data covering chemical effects on biological systems, computationally-predicted AOPs (cpAOPs) were assembled via data mining of high-throughput screening (HTS) in vitro data, in vivo data and other disease phenotype information. Frequent Itemset Mining (FIM) was used to find associations between the gene targets of ToxCast HTS assays and disease data from Comparative Toxicogenomics Database (CTD) by using the chemicals as the common aggregators between datasets. The method was also used to map gene expression data to disease data from CTD. A cpAOP network was defined by considering genes and diseases as nodes and FIM associations as edges. This network contained 18,283 gene to disease associations for the ToxCast data and 110,253 for CTD gene expression. Two case studies show the value of the cpAOP network by extracting subnetworks focused either on fatty liver disease or the Aryl Hydrocarbon Receptor (AHR). The subnetwork surrounding fatty liver disease included many genes known to play a role in this disease. When querying the cpAOP network with the AHR gene, an interesting subnetwork including glaucoma was identified. While substantial literature exists to support the potential for AHR ligands to elicit glaucoma, it was not explicitly captured in the public annotation information in CTD. The subnetwork from this analysis suggests a cpAOP that includes changes in CYP1B1 expression, which has been previously established in the literature as a primary cause of glaucoma. 
These case studies highlight the value in integrating multiple data sources when defining cpAOPs for HTS data. Copyright © 2016. Published by Elsevier Ireland Ltd.
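The idea of using chemicals as the common aggregator between gene targets and diseases can be sketched in a few lines. This is a simplified illustration rather than the authors' pipeline: it only counts the support of gene-disease pairs (itemsets of size two), the `min_support` threshold is an assumption, and all identifiers in the sample data are placeholders.

```python
from collections import Counter
from itertools import product


def gene_disease_support(chem_genes, chem_diseases, min_support=2):
    """Count, for each (gene, disease) pair, how many chemicals link both.

    chem_genes:    dict mapping chemical -> set of gene targets
    chem_diseases: dict mapping chemical -> set of associated diseases
    Returns only pairs supported by at least `min_support` chemicals.
    """
    counts = Counter()
    # Only chemicals present in both datasets can aggregate a pair.
    for chem in chem_genes.keys() & chem_diseases.keys():
        for pair in product(chem_genes[chem], chem_diseases[chem]):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Placeholder data, not taken from ToxCast or CTD.
genes = {"chemA": {"G1", "G2"}, "chemB": {"G1"}, "chemC": {"G2"}}
diseases = {"chemA": {"D1"}, "chemB": {"D1"}, "chemC": {"D2"}}
edges = gene_disease_support(genes, diseases)
```

Each surviving pair would correspond to one gene-to-disease edge in a network of the kind the abstract describes.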
British Defense Policy: A New Approach?
1988-12-14
inherent to their well-being, was also acknowledged by the remainder of the world in its attitude toward Britain. Is not "Rule Britannia, Britannia ...Castle Class 1 1 Island Class 7 43 Mine-Counter Minesweepers 2 2 Mine River Class 12 Ton Class 10 3 Hunt Class 12 1 Patrol Craft Bird Class 5 Coastal 15...submarine warfare carriers, assault ships, and mine-counter mine vessels. British naval aircraft is as depicted in Table 2. Table 2. Aircraft of the Royal
The environmental assessment of a contemporary coal mining system
NASA Technical Reports Server (NTRS)
Dutzi, E. J.; Sullivan, P. J.; Hutchinson, C. F.; Stevens, C. M.
1980-01-01
A contemporary underground coal mine in eastern Kentucky was assessed in order to determine potential off-site and on-site environmental impacts associated with the mining system in the given environmental setting. A 4-section, continuous room and pillar mine plan was developed for an appropriate site in eastern Kentucky. Potential environmental impacts were identified, and mitigation costs determined. The major potential environmental impacts were determined to be: acid water drainage from the mine and refuse site, uneven subsidence of the surface as a result of mining activity, and alteration of ground water aquifers in the subsidence zone. In the specific case examined, the costs of environmental impact mitigation to levels prescribed by regulations would not exceed $1/ton of coal mined, and post-mining land values would not be affected.
Guo, Li; Zhao, Weituo; Gu, Xiaowen; Zhao, Xinyun; Chen, Juan; Cheng, Shenggao
2017-11-29
Background: Mining activities always emit metal(loid)s into the surrounding environment, where their accumulation in the soil may pose risks and hazards to humans and ecosystems. Objective: This paper aims to determine the type, source, chemical form, fate and transport of 17 metal(loid) contaminants (As, Cd, Cu, Ni, Pb, Zn, Cr, Ag, B, Bi, Co, Mo, Sb, Ti, V, W and Sn) in soils collected from an abandoned tungsten mining area, to assess their risk accurately, and to guide the implementation of appropriate remediation strategies. Methods: Contamination factors (CFs), integrated pollution indexes (IPIs) and enrichment factors (EFs) were used to assess their ecological risk, and the sources were identified using multivariate statistical analysis, spatial distribution investigation and correlation matrices. Results: The IPI and EF values indicated that the soils at the mine site and the closest downstream site were extremely disturbed by metal(loid)s such as As, Bi, W, B, Cu, Pb and Sn, which were emitted from the mining wastes and acid drainage and delivered by runoff and human activities. Arsenic contamination was detected at nine sites, with the highest CF value of 24.70 next to the mining site. Cd contamination was scattered in the paddy soils around the residential areas, with a higher fraction of bioavailable forms, primarily associated with intense application of phosphorus fertilizer. The lithogenic elements V, Ti, Ag, Ni, Sb and Mo exhibited low contamination at all sampling points, and their distribution depended on soil texture and pedogenesis. Conclusions: The long-term historical mining activities have caused severe As contamination and higher enrichment of the other orebody elements in the local soils. An appropriate remediation approach should be proposed to reduce the bioavailability of Cd in the paddy soils and to immobilize As in order to reclaim the soils around the mining site. Furthermore, alternative fertilization methods and irrigation water sources are urgently needed to reduce the input of Cd and As into the local soils effectively. PMID:29186069
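Two of the indices named in the Methods can be written down compactly. The sketch below uses the definitions commonly found in the soil contamination literature (a concentration ratio for CF, a double ratio against a reference element for EF); the abstract does not state which background values or reference element the authors used, so those inputs are assumptions, and the numbers are invented for illustration.

```python
def contamination_factor(c_sample: float, c_background: float) -> float:
    """CF: measured concentration divided by the background concentration."""
    return c_sample / c_background


def enrichment_factor(c_sample: float, ref_sample: float,
                      c_background: float, ref_background: float) -> float:
    """EF: element-to-reference ratio in the sample over the same ratio
    in the background material. The reference element is an assumed,
    conservative lithogenic element chosen by the analyst."""
    return (c_sample / ref_sample) / (c_background / ref_background)


# Invented concentrations in mg/kg, not values from the study.
cf_example = contamination_factor(50.0, 10.0)
ef_example = enrichment_factor(50.0, 2000.0, 10.0, 4000.0)
```

Normalizing to a reference element, as EF does, is meant to separate enrichment from natural variation in soil composition, which a plain CF cannot do.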
Parodi, Stefano; Dosi, Corrado; Zambon, Antonella; Ferrari, Enrico; Muselli, Marco
2017-12-01
Identifying potential risk factors for problem gambling (PG) is of primary importance for planning preventive and therapeutic interventions. We illustrate a new approach based on the combination of standard logistic regression (LR) and an innovative method of supervised data mining (Logic Learning Machine or LLM). Data were taken from a pilot cross-sectional study to identify subjects with PG behaviour, assessed by two internationally validated scales (SOGS and Lie/Bet). Information was obtained from 251 gamblers recruited in six betting establishments. Data on socio-demographic characteristics, lifestyle and cognitive-related factors, and type, place and frequency of preferred gambling were obtained by a self-administered questionnaire. The following variables associated with PG were identified: instant gratification games, alcohol abuse, cognitive distortion, illegal behaviours and having started gambling with a relative or a friend. Furthermore, the combination of LLM and LR indicated the presence of two different types of PG, namely: (a) daily gamblers, more prone to illegal behaviour, with poor money management skills and who started gambling at an early age, and (b) non-daily gamblers, characterised by superstitious beliefs and a higher preference for immediate reward games. Finally, instant gratification games were strongly associated with the number of games usually played. Studies on gamblers habitually frequenting betting shops are rare. The finding of different types of PG among habitual gamblers deserves further analysis in larger studies. Advanced data mining algorithms, like LLM, are powerful tools and potentially useful in identifying risk factors for PG.
Identifying Threats Using Graph-based Anomaly Detection
NASA Astrophysics Data System (ADS)
Eberle, William; Holder, Lawrence; Cook, Diane
Much of the data collected during the monitoring of cyber and other infrastructures is structural in nature, consisting of various types of entities and relationships between them. The detection of threatening anomalies in such data is crucial to protecting these infrastructures. We present an approach to detecting anomalies in a graph-based representation of such data that explicitly represents these entities and relationships. The approach consists of first finding normative patterns in the data using graph-based data mining and then searching for small, unexpected deviations from these normative patterns, assuming illicit behavior tries to mimic legitimate, normative behavior. The approach is evaluated using several synthetic and real-world datasets. Results show that the approach has high true-positive rates, low false-positive rates, and is capable of detecting complex structural anomalies in real-world domains including email communications, cellphone calls and network traffic.
Topaz, Maxim; Radhakrishnan, Kavita; Lei, Victor; Zhou, Li
2016-01-01
Effective self-management can decrease up to 50% of heart failure hospitalizations. Unfortunately, self-management by patients with heart failure remains poor. This pilot study aimed to explore the use of text-mining to identify heart failure patients with ineffective self-management. We first built a comprehensive self-management vocabulary based on the literature and clinical notes review. We then randomly selected 545 heart failure patients treated within Partners Healthcare hospitals (Boston, MA, USA) and conducted a regular expression search with the compiled vocabulary within 43,107 interdisciplinary clinical notes of these patients. We found that 38.2% (n = 208) of patients had documentation of ineffective heart failure self-management in the domains of poor diet adherence (28.4%), missed medical encounters (26.4%), poor medication adherence (20.2%) and non-specified self-management issues (e.g., "compliance issues", 34.6%). We showed the feasibility of using text-mining to identify patients with ineffective self-management. More natural language processing algorithms are needed to help busy clinicians identify these patients.
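The regular-expression search described in this abstract can be illustrated with a minimal Python sketch. The vocabulary below is hypothetical: the abstract does not list the study's actual terms, so the patterns, domain names, and the `flag_note` and `flag_patients` helpers are assumptions made only for illustration.

```python
import re

# Hypothetical vocabulary; the study's real term list is not given in the abstract.
VOCABULARY = {
    "diet": [r"dietary indiscretion", r"non-?adheren\w* (?:to|with) (?:low[- ]sodium )?diet"],
    "encounters": [r"missed (?:appointment|visit)s?", r"no[- ]show"],
    "medication": [r"(?:not|stopped) taking (?:his|her|their) (?:medication|med)s?"],
    "non_specified": [r"compliance issues?", r"non-?complian\w*"],
}

# One compiled, case-insensitive pattern per self-management domain.
PATTERNS = {
    domain: re.compile("|".join(terms), re.IGNORECASE)
    for domain, terms in VOCABULARY.items()
}

def flag_note(text):
    """Return the set of domains whose vocabulary matches one clinical note."""
    return {domain for domain, pattern in PATTERNS.items() if pattern.search(text)}

def flag_patients(notes_by_patient):
    """Map patient id -> union of domains found across that patient's notes."""
    flagged = {}
    for patient_id, notes in notes_by_patient.items():
        domains = set()
        for note in notes:
            domains |= flag_note(note)
        if domains:  # patients with no matches are left out
            flagged[patient_id] = domains
    return flagged
```

Calling `flag_patients` on a dictionary of patient ids and note texts returns only the patients with at least one match, which mirrors the patient-level percentages reported above. A real vocabulary would also need handling for negation ("no compliance issues"), which this sketch ignores.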
Novel methods for detecting buried explosive devices
NASA Astrophysics Data System (ADS)
Kercel, Stephen W.; Burlage, Robert S.; Patek, David R.; Smith, Cyrus M.; Hibbs, Andrew D.; Rayner, Timothy J.
1997-07-01
Oak Ridge National Laboratory and Quantum Magnetics, Inc. are exploring novel landmine detection technologies. Technologies considered here include bioreporter bacteria, swept acoustic resonance, nuclear quadrupole resonance (NQR), and semiotic data fusion. Bioreporter bacteria look promising for third-world humanitarian applications; they are inexpensive, and deployment does not require high-tech methods. Swept acoustic resonance may be a useful adjunct to magnetometers in humanitarian demining. For military demining, NQR is a promising method for detecting explosive substances; of 50,000 substances that have been tested, one has an NQR signature that can be mistaken for RDX or TNT. For both military and commercial demining, sensor fusion entails two daunting tasks, identifying fusible features in both present-day and emerging technologies, and devising a fusion algorithm that runs in real-time on cheap hardware. Preliminary research in these areas is encouraging. A bioreporter bacterium for TNT detection is under development. Investigation has just started in swept acoustic resonance as an approach to a cheap mine detector for humanitarian use. Real-time wavelet processing appears to be a key to extending NQR bomb detection into mine detection, including TNT-based mines. Recent discoveries in semiotics may be the breakthrough that will lead to a robust fused detection scheme.
Effective integrated frameworks for assessing mining sustainability.
Virgone, K M; Ramirez-Andreotta, M; Mainhagu, J; Brusseau, M L
2018-05-28
The objectives of this research are to review existing methods used for assessing mining sustainability, analyze the limited prior research that has evaluated the methods, and identify key characteristics that would constitute an enhanced sustainability framework that would serve to improve sustainability reporting in the mining industry. Five of the most relevant frameworks were selected for comparison in this analysis, and the results show that there are many commonalities among the five, as well as some disparities. In addition, relevant components are missing from all five. An enhanced evaluation system and framework were created to provide a more holistic, comprehensive method for sustainability assessment and reporting. The proposed framework has five components that build from and encompass the twelve evaluation characteristics used in the analysis. The components include Foundation, Focus, Breadth, Quality Assurance, and Relevance. The enhanced framework promotes a comprehensive, location-specific reporting approach with a concise set of well-defined indicators. Built into the framework is quality assurance, as well as a defined method to use information from sustainability reports to inform decisions. The framework incorporates human health and socioeconomic aspects via initiatives such as community-engaged research, economic valuations, and community-initiated environmental monitoring.
Origin and influence of coal mine drainage on streams of the United States
Powell, J.D.
1988-01-01
Degradation of water quality related to oxidation of iron disulfide minerals associated with coal is a naturally occurring process that has been observed since the late seventeenth century, many years before commencement of commercial coal mining in the United States. Disturbing coal strata during mining operations accelerates this natural deterioration of water quality by exposing greater surface areas of reactive minerals to the weathering effects of the atmosphere, hydrosphere, and biosphere. Degraded water quality in the temperate eastern half of the United States is readily detected because of the low mineralization of natural water. Maps are presented showing areas in the eastern United States where concentrations of chemical constituents in water affected by coal mining (pH, dissolved sulfate, total iron, total manganese) exceed background values and indicate effects of coal mining. Areas in the East most affected by mine drainage are in western Pennsylvania, southern Ohio, western Maryland, West Virginia, southern Illinois, western Kentucky, northern Missouri, and southern Iowa. Effects of coal mining on water quality in the more arid western half of the United States are more difficult to detect because of the high degree of mineralization of natural water. Normal background concentrations of constituents are not useful in evaluating effects of coal mine drainage on streams in the more arid West. Three approaches to reduce the effects of coal mining on water quality are: (1) exclusion of oxygenated water from reactive minerals, (2) neutralization of the acid produced, (3) retardation of acid-producing bacteria population in spoil material, by application of detergents that do not produce byproducts requiring disposal. These approaches can be used to help prevent further degradation of water quality in streams by future mining. © 1988 Springer-Verlag New York Inc.
Factors influencing mine rescue team behaviors.
Jansky, Jacqueline H; Kowalski-Trakofler, K M; Brnich, M J; Vaught, C
2016-01-01
A focus group study of the first moments in an underground mine emergency response was conducted by the National Institute for Occupational Safety and Health (NIOSH), Office for Mine Safety and Health Research. Participants in the study included mine rescue team members, team trainers, mine officials, state mining personnel, and individual mine managers. A subset of the data consists of responses from participants with mine rescue backgrounds. These responses were noticeably different from those given by on-site emergency personnel who were at the mine and involved with decisions made during the first moments of an event. As a result, mine rescue team behavior data were separated in the analysis and are reported in this article. By considering the responses from mine rescue team members and trainers, it was possible to sort the data and identify seven key areas of importance to them. On the basis of the responses from the focus group participants with a mine rescue background, the authors concluded that accurate and complete information and a unity of purpose among all command center personnel are two of the key conditions needed for an effective mine rescue operation.
Klein, Terry L.; Thamke, Joanna N.; Harper, David D.; Farag, Aïda M.; Nimick, David A.; Fey, David L.
2003-01-01
The upper Prickly Pear Creek watershed encompasses the upstream 15 miles of Prickly Pear Creek, south of Helena, Montana (fig. 1). The headwaters of Prickly Pear Creek and its tributaries (Beavertown Creek, Clancy Creek, Dutchman Creek, Golconda Creek, Lump Gulch, Spring Creek, and Warm Springs Creek) are primarily in the Helena National Forest, whereas the central part of the watershed primarily is within either Bureau of Land Management (BLM) or privately owned property. Three mining districts are present in the upper Prickly Pear Creek watershed: Alhambra, Clancy, and Colorado. Numerous prospects, adits, tailings piles, mills, dredge piles, and mines (mostly inactive) are located throughout the watershed. These districts contain polymetallic (Ag, Au, Cu, Pb, Zn) vein deposits and precious-metal (Au-Ag) vein and disseminated deposits that were exploited beginning in the 1860’s. Placer Au deposits in the major streams were extensively mined in the late 1800’s and early 1900’s. As part of a cooperative effort with Federal land management agencies, the U.S. Geological Survey (USGS) is currently using an integrated approach to investigate two mining-impacted watersheds in the western United States (the Animas River in Colorado and the Boulder River in Montana). These studies provide the USDA Forest Service and BLM scientific data for implementing informed land-management decisions regarding cleanup of abandoned mine lands within each watershed. A similar integrated-science approach will be used to characterize the upper Prickly Pear Creek watershed with respect to water and streambed sediment chemistry, aquatic biota, and geologic framework. This integrated database presents data that will be used, in several publications that are in preparation, to identify important pathways of metals movement and biological impacts, thereby guiding the resource management decisions of land managers.
Watershed-level characterization in terms of water quality, streambed sediment chemistry, and fish health will facilitate determinations of whether removal of contaminated materials or other cleanup activities are necessary, planning of short- and long-term restoration efforts, and development of a monitoring plan to document cleanup effectiveness.
Noah R. Lottig; H. Maurice Valett; Madeline E. Schreiber; Jackson R. Webster
2007-01-01
We investigated the influence of flooding and chronic arsenic contamination on ecosystem structure and function in a headwater stream adjacent to an abandoned arsenic (As) mine using an upstream (reference) and downstream (mine-influenced) comparative reach approach. In this study, floods were addressed as a pulse disturbance, and the abandoned As mine was...
Chapter 7: Selecting tree species for reforestation of Appalachian mined lands
V. Davis; J.A. Burger; R. Rathfon; C.E. Zipper
2017-01-01
The Forestry Reclamation Approach (FRA) is a method for reclaiming coal-mined land to forested postmining land uses under the federal Surface Mining Control and Reclamation Act of 1977 (SMCRA) (Chapter 2, this volume). Step 4 of the FRA is to plant native trees for commercial timber value, wildlife habitat, soil stability, watershed protection, and other environmental...
A Tools-Based Approach to Teaching Data Mining Methods
ERIC Educational Resources Information Center
Jafar, Musa J.
2010-01-01
Data mining is an emerging field of study in Information Systems programs. Although the course content has been streamlined, the underlying technology is still in a state of flux. The purpose of this paper is to describe how we utilized Microsoft Excel's data mining add-ins as a front-end to Microsoft's Cloud Computing and SQL Server 2008 Business…
NASA Astrophysics Data System (ADS)
Wu, Qiang; Liu, Yuanzhang; Liu, Donghai; Zhou, Wanfang
2011-09-01
Floor water inrush is a geohazard that can pose a significant threat to safe operations, for instance in coal mines in China and elsewhere. Its occurrence is controlled by many factors, and the processes are often not amenable to mathematical expressions. To evaluate the water inrush risk, the paper proposes the vulnerability index approach, which couples the analytic hierarchy process (AHP) with a geographic information system (GIS). The detailed procedures of using this approach are shown in a case study in China (Donghuantuo Coal Mine). The spatial data analysis functions of GIS were used to establish the thematic layer of each of the six factors that control the water inrush, and the contribution weight of each factor was determined with the AHP method. The established AHP evaluation model was used to determine the threshold value for each risk level with a histogram of the water inrush vulnerability index. As a result, the mine area was divided into five regions with different vulnerability levels, which served as general guidelines for the mine operations. The prediction results were further corroborated with the actual mining data, and the evaluation result is satisfactory.
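The coupling of AHP weights with GIS thematic layers can be sketched in Python as a weighted overlay. This is a minimal illustration under stated assumptions, not the authors' model: the 3-factor judgment matrix, the equal-width class thresholds, and the function names are hypothetical (the paper uses six factors and derives thresholds from a histogram), and the row geometric-mean method is used here as a common approximation of the AHP priority vector.

```python
import numpy as np

def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method."""
    a = np.asarray(pairwise, dtype=float)
    gm = np.prod(a, axis=1) ** (1.0 / a.shape[0])
    return gm / gm.sum()

def vulnerability_index(factor_layers, weights):
    """Weighted sum of factor layers; each layer is assumed pre-scaled to [0, 1]."""
    layers = np.asarray(factor_layers, dtype=float)  # shape: (n_factors, rows, cols)
    return np.tensordot(weights, layers, axes=1)     # shape: (rows, cols)

def classify(index, thresholds=(0.2, 0.4, 0.6, 0.8)):
    """Assign each cell to one of five levels (0 = lowest); thresholds are hypothetical."""
    return np.digitize(index, thresholds)

# Hypothetical pairwise judgments for three factors (reciprocal matrix).
pairwise = [[1, 2, 4],
            [0.5, 1, 2],
            [0.25, 0.5, 1]]
weights = ahp_weights(pairwise)
```

With raster layers exported from a GIS as arrays, `classify(vulnerability_index(layers, weights))` yields a five-level zoning map of the kind the abstract describes. A full AHP workflow would also check the consistency ratio of the judgment matrix, which is omitted here.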
Nimick, David A.; Von Guerard, Paul
1998-01-01
From the Preface: There are thousands of abandoned or inactive mines on or adjacent to public lands administered by the U.S. Forest Service, Bureau of Land Management, and National Park Service. Mine wastes from many of these abandoned mines adversely affect resources on public lands. In 1995, an interdepartmental work group within the Federal government developed a strategy to address remediation of the many abandoned mines on public lands. This strategy is based on using a watershed approach to address the abandoned mine lands (AML) problem. The USGS, working closely with the Federal land-management agencies (FLMAs), is key for the success of this watershed approach. In support of this watershed approach, the USGS developed an AML Initiative with pilot studies in the Boulder River in Montana and the Animas River in Colorado. The goal of these studies is to design and implement a reliable strategy that will supply the scientific information to the FLMAs so that land managers can develop efficient and cost-effective remediation of AML. The symposium 'Science for Watershed Decisions on Abandoned Mine Lands: Review of Preliminary Results' held in Denver, Colorado, on February 4-5, 1998, provided the FLMAs a first look at the techniques, data, and interpretations being generated by the USGS pilot studies. This multidisciplined effort already is proving very valuable to land managers in making science-based AML cleanup decisions and will continue to be of increasing value as additional and more complete information is obtained. Ongoing interaction between scientists and land managers is essential to insure the efficient continuation and success of AML cleanup efforts.
Microbial genome mining for accelerated natural products discovery: is a renaissance in the making?
Bachmann, Brian O; Van Lanen, Steven G; Baltz, Richard H
2014-02-01
Microbial genome mining is a rapidly developing approach to discover new and novel secondary metabolites for drug discovery. Many advances have been made in the past decade to facilitate genome mining, and these are reviewed in this Special Issue of the Journal of Industrial Microbiology and Biotechnology. In this Introductory Review, we discuss the concept of genome mining and why it is important for the revitalization of natural product discovery; what microbes show the most promise for focused genome mining; how microbial genomes can be mined; how genome mining can be leveraged with other technologies; how progress on genome mining can be accelerated; and who should fund future progress in this promising field. We direct interested readers to more focused reviews on the individual topics in this Special Issue for more detailed summaries on the current state-of-the-art.
Mining method selection by integrated AHP and PROMETHEE method.
Bogdanovic, Dejan; Nikolic, Djordje; Ilic, Ivana
2012-03-01
Selecting the best mining method among many alternatives is a multicriteria decision making problem. The aim of this paper is to demonstrate the implementation of an integrated approach that employs AHP and PROMETHEE together for selecting the most suitable mining method for the "Coka Marin" underground mine in Serbia. The related problem includes five possible mining methods and eleven criteria to evaluate them. Criteria are accurately chosen in order to cover the most important parameters that impact on the mining method selection, such as geological and geotechnical properties, economic parameters and geographical factors. The AHP is used to analyze the structure of the mining method selection problem and to determine weights of the criteria, and PROMETHEE method is used to obtain the final ranking and to make a sensitivity analysis by changing the weights. The results have shown that the proposed integrated method can be successfully used in solving mining engineering problems.
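The ranking step can be illustrated with a short Python sketch of PROMETHEE II net flows. It assumes the simplest ("usual") preference function, in which an alternative is fully preferred on a criterion whenever it scores strictly better; the abstract does not say which preference functions the authors chose, and the function name and any input scores are hypothetical. The criterion weights would come from AHP.

```python
import numpy as np

def promethee_ii(scores, weights, maximize):
    """Net outranking flows with the 'usual' preference function.

    scores:   (n_alternatives, n_criteria) performance table
    weights:  criterion weights (e.g. from AHP); normalized internally
    maximize: one boolean per criterion, False for cost-type criteria
    """
    x = np.asarray(scores, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    sign = np.where(np.asarray(maximize), 1.0, -1.0)
    x = x * sign                               # make every criterion "higher is better"
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]       # diff[a, b, j] = x[a, j] - x[b, j]
    pref = (diff > 0).astype(float)            # usual criterion: 1 if strictly better
    pi = pref @ w                              # aggregated preference index pi[a, b]
    phi_plus = pi.sum(axis=1) / (n - 1)        # how strongly a outranks the others
    phi_minus = pi.sum(axis=0) / (n - 1)       # how strongly a is outranked
    return phi_plus - phi_minus                # net flow; higher ranks first
```

Sorting alternatives by descending net flow gives the complete ranking, and rerunning the function with perturbed weights is one simple way to carry out the sensitivity analysis mentioned in the abstract.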
2007-04-01
Chapter 1 – Introduction; 1.1 Background and Problem Definition; 1.1.1...Background; 1.1.2 Problem Definition; 1.2 The Objective and Approach of the HFM-090/TG-25; 1.2.1 Objective; 1.2.2 Approach; 1.3...Organization of this Report; 1.4 References; Chapter 2 – The Mine Detonation Process and Occupant Loading; 2.1 Introduction to Mines; 2.2
RECENT GEOCHEMICAL SAMPLING AND MERCURY SOURCES AT SULPHUR BANK MERCURY MINE, LAKE COUNTY, CA
The Sulphur Bank Mercury Mine (SBMM), located on the shore of Clear Lake in Lake County, California, has been identified as a significant source of mercury to the lake. Sulphur Bank was actively mined from the 1880's to the 1950's. Mining and processing operations at the Sulph...
ERIC Educational Resources Information Center
Kinnebrew, John S.; Biswas, Gautam
2012-01-01
Our learning-by-teaching environment, Betty's Brain, captures a wealth of data on students' learning interactions as they teach a virtual agent. This paper extends an exploratory data mining methodology for assessing and comparing students' learning behaviors from these interaction traces. The core algorithm employs sequence mining techniques to…
A Bioinformatic Approach to Inter Functional Interactions within Protein Sequences
2009-02-23
AFOSR/AOARD Reference Number: USAFAOGA07: FA4869-07-1-4050 AFOSR/AOARD Program Manager : Hiroshi Motoda, Ph.D. Period of...Conference on Knowledge Discovery and Data Mining.) In a separate study we have applied our approaches to the problem of whole genome alignment. We have...SIGKDD Conference on Knowledge Discovery and Data Mining Attached. Interactions: Please list: (a) Participation/presentations at meetings
A study of acid and ferruginous mine water in coal mining operations
NASA Astrophysics Data System (ADS)
Atkins, A. S.; Singh, R. N.
1982-06-01
The paper describes a biochemical laboratory investigation to identify the factors that promote the formation of acidic and ferruginous mine water. Biochemical reactions responsible for bacterial oxidation of iron pyrites are described. Acidic and ferruginous mine water is responsible not only for the corrosion of mine plant and equipment and the formation of scale in the delivery pipe range, but also for pollution of the mine surface environment, thus affecting the surface ecology. Control measures to mitigate the adverse effects of acid mine discharge include the protection of mining equipment and the prevention of the formation of acid and ferruginous water. The control measures discussed in the paper are blending with alkaline or spring water, use of neutralising agents and bactericides, and various types of seals for preventing water and air from coming into contact with pyrites in caved mine workings.
Wirt, Laurie; Leib, Kenneth J.; Melick, Roger; Bove, Dana J.
2001-01-01
strongly affected by natural acidity from pyrite weathering. Metal content in the water column is a composite of multiple sources affected by hydrologic, geologic, climatic, and anthropogenic conditions. Sources of metals from various drainage areas were identified using a tracer-injection approach, with synoptic sampling during low-flow conditions on September 29, 1999, to determine loads. The tracer data were interpreted in conjunction with detailed geologic mapping, topographic profiling, geochemical characterization, and the occurrence and distribution of trace metals to identify sources of ground-water inflows. For this highly mineralized sub-basin, we demonstrate that SO4, Al, and Fe load contributions from drainage areas that have experienced historical mining, although substantial, are relatively insignificant in comparison with SO4, Al, and Fe loads from areas experiencing natural weathering of highly altered, pyritic rocks. Regional weathering of acid-sulfate mineral assemblages produces moderately low pH waters elevated in SO4, Al, and Fe, but generally lacking in Cu, Cd, Ni, and Pb. Samples impacted by mining are also characterized by low pH and large concentrations of SO4, Al, and Fe, but contained elevated dissolved metals from ore-bearing vein minerals such as Cu, Zn, Cd, Ni, and Pb. Occurrences of dissolved trace metals were helpful in identifying ground-water sources and flow paths. For example, cadmium was greatest in inflows associated with drainage from inactive mine sites and absent in inflows that were unaffected by past mining activities, and thus served as an important indicator of mining contamination for this environmental setting. The most heavily mine-impacted reach (PG153 to PG800) contributed 8% of the discharge, and 11%, 9%, and 12% of the total SO4, Al, and Fe loads in Prospect Gulch. The same reach yielded 59% and 37% of the total Cu and Zn loads for the sub-basin.
In contrast, the naturally acidic inflows from the Red Chemotroph iron spring yielded 39% of the discharge and 54%, 73%, and 87% of the SO4, Al, and Fe loads, but only 4% of the total Cu and 30% of the total Zn loads in Prospect Gulch. Base flow from the Prospect Gulch sub-basin contributes about 4.8 percent of the total discharge at the mouth of Cement Creek, compared with sampled instream loads of 1.8%, 8.8%, 15.9%, 28%, and 8.6% for SO4, Al, Fe, Cu and Zn, respectively. Watershed-scale remediation efforts targeted at reducing loads of SO4, Al, and Fe at inactive mine sites are likely to fail because these constituents in Prospect Gulch are discharged predominantly from natural sources. Remediation goals aimed at reducing acidity and loads of Cu and other base metals may succeed, however, because changes in pH and loads are disproportionately greater than increases in discharge over the same reach, and a substantial fraction of the metal loading is from mining-impacted reaches. Whether remediation of abandoned mines in Prospect Gulch can be successful depends on how goals are defined, that is, whether the objective is to reduce loads of SO4, Al, and Fe, or whether loads of Cu and other base metals and pH are targeted.
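The percentage contributions quoted in this abstract rest on simple load arithmetic: an instream load is concentration multiplied by discharge, and a reach's share is its gain in load relative to the total at the basin outlet. The Python sketch below illustrates that arithmetic only; the function names and every number in the usage note are hypothetical and are not taken from the Prospect Gulch data.

```python
def instream_load(conc_mg_per_l, discharge_l_per_s):
    """Constituent load in mg/s from concentration (mg/L) and discharge (L/s)."""
    return conc_mg_per_l * discharge_l_per_s

def reach_share_percent(upstream_load, downstream_load, outlet_load):
    """Share of the outlet load gained between two sampling sites, in percent."""
    return 100.0 * (downstream_load - upstream_load) / outlet_load
```

For example, with a hypothetical upstream site at 2 mg/L and 10 L/s (20 mg/s), a downstream site at 5 mg/L and 12 L/s (60 mg/s), and an outlet load of 100 mg/s, the reach accounts for 40% of the outlet load while adding only a small fraction of the flow. That kind of disproportion between load and discharge is what the abstract uses to separate mining-impacted reaches from natural sources.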
Design of strength characteristics on the example of a mining support
NASA Astrophysics Data System (ADS)
Gwiazda, A.; Sękala, A.; Banaś, W.; Topolska, S.; Foit, K.; Monica, Z.
2017-08-01
A special group of design approaches can be characterized as “design for X”. All areas of this design methodology that take life-cycle requirements into account are described by the acronym DfX. It denotes an integrated computing-platform approach to design that binds together design knowledge and computer systems. In this perspective, computer systems are responsible for linking design requirements with the subject of the project and for filtering the information circulated throughout the project. Together, the DfX methodologies form an approach that integrates different functional areas of an industrial organization. Its internal elements include the structure of the project team, the people who make up the team, the design process itself, the design of the control system, and the implementation of tools that assist this process. The benefits obtained within this approach include higher operating efficiency, professionalism, the ability to create innovation, incremental progress of the project, and an appropriate focus of the project team. Attempts have been made to integrate the specific areas of action identified in the field of design methodology. Such attempts were made earlier in design practice through the Economic Design for Manufacture, an approach characteristic of European industry, which led to a methodology that can be defined as Design to/for Cost. The article presents the idea of an integrated design approach related to DfX. The results are described on the basis of a virtual 3D model of a mining support, which was prepared in an advanced engineering platform such as Siemens PLM NX.
Hewett, Paul; Morey, Sandy Z; Holen, Brian M; Logan, Perry W; Olsen, Geary W
2012-01-01
A study was conducted to construct a job exposure matrix for the roofing granule mine and mill workers at four U.S. plants. Each plant mined different minerals and had unique departments and jobs. The goal of the study was to generate accurate estimates of the mean exposure to respirable crystalline silica for each cell of the job exposure matrix, that is, every combination of plant, department, job, and year represented in the job histories of the study participants. The objectives of this study were to locate, identify, and collect information on all exposure measurements ever collected at each plant, statistically analyze the data to identify deficiencies in the database, identify and resolve questionable measurements, identify all important process and control changes for each plant-department-job combination, construct a time line for each plant-department combination indicating periods where the equipment and conditions were unchanged, and finally, construct a job exposure matrix. After evaluation, 1871 respirable crystalline silica measurements and estimates remained. The primary statistic of interest was the mean exposure for each job exposure matrix cell. The average exposure for each of the four plants was 0.042 mg/m³ (Belle Mead, N.J.), 0.106 mg/m³ (Corona, Calif.), 0.051 mg/m³ (Little Rock, Ark.), and 0.152 mg/m³ (Wausau, Wis.), suggesting that there may be substantial differences in the employee cumulative exposures. Using the database and the available plant information, the study team assigned an exposure category and mean exposure for every plant-department-job and time interval combination. Despite a fairly large database, the mean exposure for > 95% of the job exposure matrix cells, or specific plant-department-job-year combinations, was estimated by analogy to similar jobs in the plant for which sufficient data were available. This approach preserved plant specificity, hopefully improving the usefulness of the job exposure matrix.
Uranium mobility and accumulation along the Rio Paguate, Jackpile Mine in Laguna Pueblo, NM.
Blake, Johanna M; De Vore, Cherie L; Avasarala, Sumant; Ali, Abdul-Mehdi; Roldan, Claudia; Bowers, Fenton; Spilde, Michael N; Artyushkova, Kateryna; Kirk, Matthew F; Peterson, Eric; Rodriguez-Freire, Lucia; Cerrato, José M
2017-04-19
The mobility and accumulation of uranium (U) along the Rio Paguate, adjacent to the Jackpile Mine, in Laguna Pueblo, New Mexico was investigated using aqueous chemistry, electron microprobe, X-ray diffraction and spectroscopy analyses. Given that it is not common to identify elevated concentrations of U in surface water sources, the Rio Paguate is a unique site that concerns the Laguna Pueblo community. This study aims to better understand the solid chemistry of abandoned mine waste sediments from the Jackpile Mine and identify key hydrogeological and geochemical processes that affect the fate of U along the Rio Paguate. Solid analyses using X-ray fluorescence determined that sediments located in the Jackpile Mine contain ranges of 320 to 9200 mg kg⁻¹ U. The presence of coffinite, a U(IV)-bearing mineral, was identified by X-ray diffraction analyses in abandoned mine waste solids exposed to several decades of weathering and oxidation. The dissolution of these U-bearing minerals from abandoned mine wastes could contribute to U mobility during rain events. The U concentration in surface waters sampled closest to mine wastes are highest during the southwestern monsoon season. Samples collected from September 2014 to August 2016 showed higher U concentrations in surface water adjacent to the Jackpile Mine (35.3 to 772 μg L⁻¹) compared with those at a wetland 4.5 kilometers downstream of the mine (5.77 to 110 μg L⁻¹). Sediments co-located in the stream bed and bank along the reach between the mine and wetland had low U concentrations (range 1-5 mg kg⁻¹) compared to concentrations in wetland sediments with higher organic matter (14-15%) and U concentrations (2-21 mg kg⁻¹). Approximately 10% of the total U in wetland sediments was amenable to complexation with 1 mM sodium bicarbonate in batch experiments; a decrease of U concentration in solution was observed over time in these experiments likely due to re-association with sediments in the reactor.
The findings from this study provide new insights about how hydrologic events may affect the reactivity of U present in mine waste solids exposed to surface oxidizing conditions, and the influence of organic-rich sediments on U accumulation in the Rio Paguate.
Mendez, Monica O; Maier, Raina M
2008-03-01
Unreclaimed mine tailings sites are a worldwide problem, with thousands of unvegetated, exposed tailings piles presenting a source of contamination for nearby communities. Tailings disposal sites in arid and semiarid environments are especially subject to eolian dispersion and water erosion. Phytostabilization, the use of plants for in situ stabilization of tailings and metal contaminants, is a feasible alternative to costly remediation practices. In this review we emphasize considerations for phytostabilization of mine tailings in arid and semiarid environments, as well as issues impeding its long-term success. We reviewed literature addressing mine closures and revegetation of mine tailings, along with publications evaluating plant ecology, microbial ecology, and soil properties of mine tailings. Data were extracted from peer-reviewed articles and books identified in Web of Science and Agricola databases, and publications available through the U.S. Department of Agriculture, U.S. Environmental Protection Agency, and the United Nations Environment Programme. Harsh climatic conditions in arid and semiarid environments along with the innate properties of mine tailings require specific considerations. Plants suitable for phytostabilization must be native, be drought-, salt-, and metal-tolerant, and should limit shoot metal accumulation. Factors for evaluating metal accumulation and toxicity issues are presented. Also reviewed are aspects of implementing phytostabilization, including plant growth stage, amendments, irrigation, and evaluation. Phytostabilization of mine tailings is a promising remedial technology but requires further research to identify factors affecting its long-term success by expanding knowledge of suitable plant species and mine tailings chemistry in ongoing field trials.
Levings, G.W.
1982-01-01
The Greenleaf-Miller area of the Ashland coal field contains reserves of Federal coal that have been identified for potential lease sale. A hydrologic study was conducted in the potential lease area in 1981 to describe the existing hydrologic system and to assess potential impacts of surface coal mining on local water resources. The hydrologic data collected from wells, test holes, and springs were used to identify aquifers in the alluvium (Pleistocene and Holocene age) and the Tongue River Member of the Fort Union Formation (Paleocene age). Coal, clinker, and sandstone beds comprise the aquifers in the Tongue River Member. Most streams are ephemeral and flow only as a result of precipitation. The only perennial surface-water flow in the study area is along short reaches downstream from springs. A mine plan for the area is not available; thus, the location of mine cuts, direction and rate of the mine expansion, and duration of mining are unknown. The mining of the Sawyer and Knoblock coal beds in the Tongue River Member would affect ground-water flow in the area. Declines in the potentiometric surface would be caused by dewatering where the mine pits intersect the water table. Wells and springs would be removed in the mine area; however, deeper aquifers are available as replacement sources of water. The chemical quality of the ground water would change after moving through the spoils. The change would be an increase in the concentration of dissolved solids. (USGS)
Improving Fishing Pattern Detection from Satellite AIS Using Data Mining and Machine Learning.
de Souza, Erico N; Boerder, Kristina; Matwin, Stan; Worm, Boris
2016-01-01
A key challenge in contemporary ecology and conservation is the accurate tracking of the spatial distribution of various human impacts, such as fishing. While coastal fisheries in national waters are closely monitored in some countries, existing maps of fishing effort elsewhere are fraught with uncertainty, especially in remote areas and the High Seas. Better understanding of the behavior of the global fishing fleets is required in order to prioritize and enforce fisheries management and conservation measures worldwide. Satellite-based Automatic Information Systems (S-AIS) are now commonly installed on most ocean-going vessels and have been proposed as a novel tool to explore the movements of fishing fleets in near real time. Here we present approaches to identify fishing activity from S-AIS data for three dominant fishing gear types: trawl, longline and purse seine. Using a large dataset containing worldwide fishing vessel tracks from 2011-2015, we developed three methods to detect and map fishing activities: for trawlers we produced a Hidden Markov Model (HMM) using vessel speed as observation variable. For longliners we have designed a Data Mining (DM) approach using an algorithm inspired from studies on animal movement. For purse seiners a multi-layered filtering strategy based on vessel speed and operation time was implemented. Validation against expert-labeled datasets showed average detection accuracies of 83% for trawler and longliner, and 97% for purse seiner. Our study represents the first comprehensive approach to detect and identify potential fishing behavior for three major gear types operating on a global scale. We hope that this work will enable new efforts to assess the spatial and temporal distribution of global fishing effort and make global fisheries activities transparent to ocean scientists, managers and the public.
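The trawler method described above decodes fishing activity from vessel speed with a Hidden Markov Model. The following is a minimal sketch of that idea, not the authors' implementation: it assumes two hidden states ("fishing" and "steaming"), Gaussian speed emissions in knots, and Viterbi decoding. Every probability and every mean or standard deviation below is a hypothetical placeholder chosen for illustration.

```python
import math

STATES = ("fishing", "steaming")

# Hypothetical parameters (not taken from the study).
START = {"fishing": 0.5, "steaming": 0.5}
TRANS = {
    "fishing": {"fishing": 0.9, "steaming": 0.1},
    "steaming": {"fishing": 0.1, "steaming": 0.9},
}
EMIT = {"fishing": (3.5, 1.0), "steaming": (10.0, 2.0)}  # (mean, std) in knots


def log_gauss(x, mean, std):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def viterbi(speeds):
    """Return the most likely state sequence for a list of speeds."""
    if not speeds:
        return []
    # Initialise with the start distribution and the first observation.
    score = {s: math.log(START[s]) + log_gauss(speeds[0], *EMIT[s]) for s in STATES}
    back = []  # one dict of back-pointers per later observation
    for x in speeds[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            pointers[s] = prev
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + log_gauss(x, *EMIT[s])
        back.append(pointers)
        score = new_score
    # Trace the best path backwards from the best final state.
    path = [max(STATES, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


track = [3.0, 3.5, 4.0, 10.0, 11.0, 9.5]  # hypothetical speeds in knots
labels = viterbi(track)
```

In practice the transition and emission parameters would be estimated from labeled tracks rather than fixed by hand.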
Improving Fishing Pattern Detection from Satellite AIS Using Data Mining and Machine Learning
Matwin, Stan; Worm, Boris
2016-01-01
A key challenge in contemporary ecology and conservation is the accurate tracking of the spatial distribution of various human impacts, such as fishing. While coastal fisheries in national waters are closely monitored in some countries, existing maps of fishing effort elsewhere are fraught with uncertainty, especially in remote areas and the High Seas. Better understanding of the behavior of the global fishing fleets is required in order to prioritize and enforce fisheries management and conservation measures worldwide. Satellite-based Automatic Information Systems (S-AIS) are now commonly installed on most ocean-going vessels and have been proposed as a novel tool to explore the movements of fishing fleets in near real time. Here we present approaches to identify fishing activity from S-AIS data for three dominant fishing gear types: trawl, longline and purse seine. Using a large dataset containing worldwide fishing vessel tracks from 2011–2015, we developed three methods to detect and map fishing activities: for trawlers we produced a Hidden Markov Model (HMM) using vessel speed as observation variable. For longliners we have designed a Data Mining (DM) approach using an algorithm inspired from studies on animal movement. For purse seiners a multi-layered filtering strategy based on vessel speed and operation time was implemented. Validation against expert-labeled datasets showed average detection accuracies of 83% for trawler and longliner, and 97% for purse seiner. Our study represents the first comprehensive approach to detect and identify potential fishing behavior for three major gear types operating on a global scale. We hope that this work will enable new efforts to assess the spatial and temporal distribution of global fishing effort and make global fisheries activities transparent to ocean scientists, managers and the public. PMID:27367425
Elevated rates of gold mining in the Amazon revealed through high-resolution monitoring.
Asner, Gregory P; Llactayo, William; Tupayachi, Raul; Luna, Ernesto Ráez
2013-11-12
Gold mining has rapidly increased in western Amazonia, but the rates and ecological impacts of mining remain poorly known and potentially underestimated. We combined field surveys, airborne mapping, and high-resolution satellite imaging to assess road- and river-based gold mining in the Madre de Dios region of the Peruvian Amazon from 1999 to 2012. In this period, the geographic extent of gold mining increased 400%. The average annual rate of forest loss as a result of gold mining tripled in 2008 following the global economic recession, closely associated with increased gold prices. Small clandestine operations now comprise more than half of all gold mining activities throughout the region. These rates of gold mining are far higher than previous estimates that were based on traditional satellite mapping techniques. Our results prove that gold mining is growing more rapidly than previously thought, and that high-resolution monitoring approaches are required to accurately quantify human impacts on tropical forests.
Elevated rates of gold mining in the Amazon revealed through high-resolution monitoring
Asner, Gregory P.; Llactayo, William; Tupayachi, Raul; Luna, Ernesto Ráez
2013-01-01
Gold mining has rapidly increased in western Amazonia, but the rates and ecological impacts of mining remain poorly known and potentially underestimated. We combined field surveys, airborne mapping, and high-resolution satellite imaging to assess road- and river-based gold mining in the Madre de Dios region of the Peruvian Amazon from 1999 to 2012. In this period, the geographic extent of gold mining increased 400%. The average annual rate of forest loss as a result of gold mining tripled in 2008 following the global economic recession, closely associated with increased gold prices. Small clandestine operations now comprise more than half of all gold mining activities throughout the region. These rates of gold mining are far higher than previous estimates that were based on traditional satellite mapping techniques. Our results prove that gold mining is growing more rapidly than previously thought, and that high-resolution monitoring approaches are required to accurately quantify human impacts on tropical forests. PMID:24167281
Informatic search strategies to discover analogues and variants of natural product archetypes.
Johnston, Chad W; Connaty, Alex D; Skinnider, Michael A; Li, Yong; Grunwald, Alyssa; Wyatt, Morgan A; Kerr, Russell G; Magarvey, Nathan A
2016-03-01
Natural products are a crucial source of antimicrobial agents, but reliance on low-resolution bioactivity-guided approaches has led to diminishing interest in discovery programmes. Here, we demonstrate that two in-house automated informatic platforms can be used to target classes of biologically active natural products, specifically, peptaibols. We demonstrate that mass spectrometry-based informatic approaches can be used to detect natural products with high sensitivity, identifying desired agents present in complex microbial extracts. Using our specialised software packages, we could elaborate specific branches of chemical space, uncovering new variants of trichopolyn and demonstrating a way forward in mining natural products as a valuable source of potential pharmaceutical agents.
Open-source tools for data mining.
Zupan, Blaz; Demsar, Janez
2008-03-01
With a growing volume of biomedical databases and repositories, the need to develop a set of tools to address their analysis and support knowledge discovery is becoming acute. The data mining community has developed a substantial set of techniques for computational treatment of these data. In this article, we discuss the evolution of open-source toolboxes that data mining researchers and enthusiasts have developed over the span of a few decades and review several currently available open-source data mining suites. The approaches we review are diverse in data mining methods and user interfaces and also demonstrate that the field and its tools are ready to be fully exploited in biomedical research.
Tadesse, Tsegaye; Brown, Jesslyn F.; Hayes, M.J.
2005-01-01
Droughts are normal climate episodes, yet they are among the most expensive natural disasters in the world. Knowledge about the timing, severity, and pattern of droughts on the landscape can be incorporated into effective planning and decision-making. In this study, we present a data mining approach to modeling vegetation stress due to drought and mapping its spatial extent during the growing season. Rule-based regression tree models were generated that identify relationships between satellite-derived vegetation conditions, climatic drought indices, and biophysical data, including land-cover type, available soil water capacity, percent of irrigated farm land, and ecological type. The data mining method builds numerical rule-based models that find relationships among the input variables. Because the models can be applied iteratively with input data from previous time periods, the method can provide predictions of vegetation conditions farther into the growing season based on earlier conditions. Visualizing the model outputs as mapped information (called VegPredict) provides a means to evaluate the model. We present prototype maps for the 2002 drought year for Nebraska and South Dakota and discuss potential uses for these maps.
Ma, Jing; Porter, Alan L; Aminabhavi, Tejraj M; Zhu, Donghua
2015-10-01
"Tech mining" applies bibliometric and text analytic methods to scientific literature of a target field. In this study, we compare the evolution of nano-enabled drug delivery (NEDD) systems for two different applications - viz., brain cancer (BC) and Alzheimer's disease (AD) - using this approach. In this process, we derive research intelligence from papers indexed in MEDLINE. Review by domain specialists helps understand the macro-level disease problems and pathologies to identify commonalities and differences between BC and AD. Results provide a fresh perspective on the developmental pathways for NEDD approaches that have been used in the treatment of BC and AD. Results also point toward finding future solutions to drug delivery issues that are critical to medical practitioners and pharmaceutical scientists addressing the brain. Drug delivery to brain cells has been very challenging due to the presence of the blood-brain barrier (BBB). A suitable and effective nano-enabled drug delivery (NEDD) system is urgently needed. In this study, the authors utilized "tech-mining" tools to describe and compare various choices of delivery system available for the diagnosis, as well as treatment, of brain cancer and Alzheimer's disease. Copyright © 2015 Elsevier Inc. All rights reserved.
Mining spatiotemporal patterns of urban dwellers from taxi trajectory data
NASA Astrophysics Data System (ADS)
Mao, Feng; Ji, Minhe; Liu, Ting
2016-06-01
With the widespread adoption of location-aware technology, obtaining long-sequence, massive and high-accuracy spatiotemporal trajectory data of individuals has become increasingly popular in various geographic studies. Trajectory data of taxis, one of the most widely used inner-city travel modes, contain rich information about both road network traffic and travel behavior of passengers. Such data can be used to study the microscopic activity patterns of individuals as well as the macro system of urban spatial structures. This paper focuses on trajectories obtained from GPS-enabled taxis and their applications for mining urban commuting patterns. A novel approach is proposed to discover spatiotemporal patterns of household travel from the taxi trajectory dataset with a large number of point locations. The approach involves three critical steps: spatial clustering of taxi origin-destination (OD) based on urban traffic grids to discover potentially meaningful places, identifying threshold values from statistics of the OD clusters to extract urban jobs-housing structures, and visualization of analytic results to understand the spatial distribution and temporal trends of the revealed urban structures and implied household commuting behavior. A case study with a taxi trajectory dataset in Shanghai, China is presented to demonstrate and evaluate the proposed method.
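The first two steps described above (grid-based clustering of OD points, then thresholding the clusters) can be illustrated with a small sketch. It is an assumption-laden simplification rather than the paper's procedure: points are binned into a regular longitude/latitude grid, and a fixed `min_count` threshold stands in for the statistically derived thresholds the abstract mentions. The function names, cell size, and coordinates are all hypothetical.

```python
import math
from collections import Counter


def grid_cell(lon, lat, cell_deg):
    """Map a coordinate to the integer index of its grid cell."""
    return (math.floor(lon / cell_deg), math.floor(lat / cell_deg))


def hot_cells(points, cell_deg=0.01, min_count=3):
    """Count pick-up or drop-off points per cell and keep the busy cells.

    points: iterable of (lon, lat) pairs.
    Returns a dict {cell_index: point_count} for cells with at least
    min_count points.
    """
    counts = Counter(grid_cell(lon, lat, cell_deg) for lon, lat in points)
    return {cell: n for cell, n in counts.items() if n >= min_count}


# Hypothetical drop-off points (lon, lat); three fall in the same cell.
dropoffs = [
    (121.4712, 31.2304),
    (121.4718, 31.2309),
    (121.4715, 31.2301),
    (121.5051, 31.2450),
]
busy = hot_cells(dropoffs, cell_deg=0.01, min_count=3)
```

Running the same function separately on morning origins and morning destinations would give candidate housing and job cells, which is the kind of jobs-housing contrast the abstract describes.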
Anawar, Hossain Md
2015-08-01
The oxidative dissolution of sulfidic minerals releases extremely acidic leachate, sulfate, and potentially toxic elements (e.g., As, Ag, Cd, Cr, Cu, Hg, Ni, Pb, Sb, Th, U, Zn) from mine tailings and waste dumps. For the sustainable rehabilitation and disposal of mining waste, the sources and mechanisms of contaminant generation and the fate and transport of contaminants should be clearly understood. This study therefore provides a critical review of (1) recent insights into the mechanisms of oxidation of sulfidic minerals, (2) environmental contamination by mining waste, and (3) remediation and rehabilitation techniques, and (4) develops the GEMTEC conceptual model/guide [(bio)-geochemistry-mine type-mineralogy-geological texture-ore extraction process-climatic knowledge] to provide a new scientific approach and knowledge for remediation of mining wastes and acid mine drainage. The study suggests pre-mining geological, geochemical, mineralogical and microtextural characterization of different mineral deposits, and post-mining studies of ore extraction processes, physical, geochemical, mineralogical and microbial reactions, natural attenuation and the effect of climate change for sustainable rehabilitation of mining waste. All components of this model should be considered for effective and integrated management of mining waste and acid mine drainage. Copyright © 2015 Elsevier Ltd. All rights reserved.
Liljeqvist, Maria; Ossandon, Francisco J; González, Carolina; Rajan, Sukithar; Stell, Adam; Valdes, Jorge; Holmes, David S; Dopson, Mark
2015-04-01
An acid mine drainage (pH 2.5-2.7) stream biofilm situated 250 m below ground in the low-temperature (6-10°C) Kristineberg mine, northern Sweden, contained a microbial community equipped for growth at low temperature and acidic pH. Metagenomic sequencing of the biofilm and planktonic fractions identified the most abundant microorganism to be similar to the psychrotolerant acidophile, Acidithiobacillus ferrivorans. In addition, metagenome contigs were most similar to other Acidithiobacillus species, an Acidobacteria-like species, and a Gallionellaceae-like species. Analyses of the metagenomes indicated functional characteristics previously characterized as related to growth at low temperature including cold-shock proteins, several pathways for the production of compatible solutes and an anti-freeze protein. In addition, genes were predicted to encode functions related to pH homeostasis and metal resistance related to growth in the acidic metal-containing mine water. Metagenome analyses identified microorganisms capable of nitrogen fixation and exhibiting a primarily autotrophic lifestyle driven by the oxidation of the ferrous iron and inorganic sulfur compounds contained in the sulfidic mine waters. The study identified a low diversity of abundant microorganisms adapted to a low-temperature acidic environment as well as identifying some of the strategies the microorganisms employ to grow in this extreme environment. © FEMS 2015. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Macromolecule mass spectrometry: citation mining of user documents.
Kostoff, Ronald N; Bedford, Clifford D; del Río, J Antonio; Cortes, Héctor D; Karypis, George
2004-03-01
Identifying research users, applications, and impact is important for research performers, managers, evaluators, and sponsors. Identification of the user audience and the research impact is complex and time consuming due to the many indirect pathways through which fundamental research can impact applications. This paper identified the literature pathways through which two highly-cited papers of 2002 Chemistry Nobel Laureates Fenn and Tanaka impacted research, technology development, and applications. Citation Mining, an integration of citation bibliometrics and text mining, was applied to the >1600 first generation Science Citation Index (SCI) citing papers to Fenn's 1989 Science paper on Electrospray Ionization for Mass Spectrometry, and to the >400 first generation SCI citing papers to Tanaka's 1988 Rapid Communications in Mass Spectrometry paper on Laser Ionization Time-of-Flight Mass Spectrometry. Bibliometrics was performed on the citing papers to profile the user characteristics. Text mining was performed on the citing papers to identify the technical areas impacted by the research, and the relationships among these technical areas.
NASA Technical Reports Server (NTRS)
Wier, C. E.; Wobber, F. J. (Principal Investigator); Russell, O. R.; Amato, R. V.; Leshendok, T.
1973-01-01
The author has identified the following significant results. The Mined Land Inventory map of Pike, Gibson, and Warrick Counties, Indiana, prepared from ERTS-1 imagery, was included in the 1973 Annual Report of the President's Council on Environmental Quality as an example of ERTS applications to mined lands. Increasing numbers of inquiries have been received from coal producing states and coal companies interested in the Indiana Program.
Sams, James I.; Veloski, Garret
2003-01-01
High-resolution airborne thermal infrared (TIR) imagery data were collected over 90.6 km² (35 mi²) of remote and rugged terrain in the Kettle Creek and Cooks Run Basins, tributaries of the West Branch of the Susquehanna River in north-central Pennsylvania. The purpose of this investigation was to evaluate the effectiveness of TIR for identifying sources of acid mine drainage (AMD) associated with abandoned coal mines. Coal mining from the late 1800s resulted in many AMD sources from abandoned mines in the area. However, very little detailed mine information was available, particularly on the source locations of AMD sites. Potential AMD sources were extracted from airborne TIR data employing custom image processing algorithms and GIS data analysis. Based on field reconnaissance of 103 TIR anomalies, 53 sites (51%) were classified as AMD. The AMD sources had low pH (<4) and elevated concentrations of iron and aluminum. Of the 53 sites, approximately 26 sites could be correlated with sites previously documented as AMD. The other 27 mine discharges identified in the TIR data were previously undocumented. This paper presents a summary of the procedures used to process the TIR data and extract potential mine drainage sites, methods used for field reconnaissance and verification of TIR data, and a brief summary of water-quality data.
Mine safety assessment using gray relational analysis and bow tie model
2018-01-01
Mine safety assessment is a precondition for ensuring orderly and safe production. The main purpose of this study was to prevent mine accidents more effectively by proposing a composite risk analysis model. First, the weights of the assessment indicators were determined by the revised integrated weight method, in which the objective weights were determined by a variation coefficient method and the subjective weights by the Delphi method. A new formula was then adopted to calculate the integrated weights based on the subjective and objective weights. Second, after the assessment indicator weights were determined, gray relational analysis was used to evaluate the safety of mine enterprises. Mine enterprise safety was ranked according to the gray relational degree, and weak links of mine safety practices were identified based on gray relational analysis. Third, to validate the revised integrated weight method adopted in the process of gray relational analysis, the fuzzy evaluation method was applied to the safety assessment of mine enterprises. Fourth, for the first time, the bow tie model was adopted to identify the causes and consequences of weak links and allow corresponding safety measures to be taken to guarantee the mine’s safe production. A case study of mine safety assessment was presented to demonstrate the effectiveness and rationality of the proposed composite risk analysis model, which can be applied to other related industries for safety evaluation. PMID:29561875
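Two of the steps named above, objective weighting by the variation coefficient and ranking by gray relational degree, follow standard textbook formulas and can be sketched briefly. This is an illustration under stated assumptions and not the study's model: it uses objective weights only (the study's formula for combining them with Delphi weights is not given in the abstract), it assumes indicators are already normalized so that larger is better, it takes the column-wise maximum as the reference sequence, and it uses the conventional distinguishing coefficient `rho = 0.5`. The score matrix is hypothetical.

```python
import statistics


def cv_weights(matrix):
    """Objective indicator weights from the coefficient of variation.

    matrix: rows are mines, columns are indicator scores.
    """
    columns = list(zip(*matrix))
    cvs = [statistics.pstdev(col) / statistics.mean(col) for col in columns]
    total = sum(cvs)
    return [cv / total for cv in cvs]


def gray_relational_degrees(matrix, weights, rho=0.5):
    """Weighted gray relational degree of each row to the ideal row."""
    reference = [max(col) for col in zip(*matrix)]  # assumed ideal: best score per indicator
    deltas = [[abs(r - x) for r, x in zip(reference, row)] for row in matrix]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:  # all rows identical to the reference
        return [1.0] * len(matrix)
    return [
        sum(w * (d_min + rho * d_max) / (d + rho * d_max) for w, d in zip(weights, row))
        for row in deltas
    ]


# Hypothetical normalized scores for three mines on three indicators.
scores = [
    [0.9, 0.8, 0.7],
    [0.6, 0.9, 0.5],
    [0.4, 0.5, 0.9],
]
weights = cv_weights(scores)
degrees = gray_relational_degrees(scores, weights)
ranking = sorted(range(len(scores)), key=lambda i: degrees[i], reverse=True)
```

A higher degree means a mine sits closer to the ideal profile; indicators with low relational coefficients for a given mine are candidates for the "weak links" the abstract refers to.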
Directed Selection of Biochars for Amending Metal ...
Approximately 500,000 abandoned mines across the U.S. pose a considerable, pervasive risk to human health and the environment. World-wide the problem is even larger. Lime, organic matter, biosolids and other amendments have been used to decrease metal bioavailability in contaminated mine wastes and to promote the development of a mine waste stabilizing plant cover. The demonstrated properties of biochar make it a viable candidate as an amendment for remediating metal contaminated mine soils. In addition to sequestering potentially toxic metals, biochar can also be a source of plant nutrients, used to adjust soil pH, improve soil water holding characteristics, and increase soil carbon content. However, methods are needed for matching biochar beneficial properties with mine waste toxicities and soil health deficiencies. In this presentation we will report on a study in which we used mine soil from an abandoned Cu and Zn mine to develop a three-step procedure for identifying biochars that are most effective at reducing heavy metal bioavailability. Step 1: a slightly acidic extract of the mine spoil soil was produced, representing the potentially available metals, and used to identify metal removal properties of a library of 38 different biochars (e.g., made from a variety of feedstocks and pyrolysis or gasification conditions). Step 2: evaluation of how well these biochars retained (i.e., did not desorb) previously sorbed metals. Step 3: laboratory evalua
Association between borderline dysnatremia and mortality insight into a new data mining approach.
Girardeau, Yannick; Jannot, Anne-Sophie; Chatellier, Gilles; Saint-Jean, Olivier
2017-11-22
Even small variations of serum sodium concentration may be associated with mortality. Our objective was to confirm the impact of borderline dysnatremia for patients admitted to hospital on in-hospital mortality using real life care data from our electronic health record (EHR) and a phenome-wide association analysis (PheWAS). This retrospective observational study used data from patients admitted to Hôpital Européen Georges Pompidou between 01/01/2008 and 31/06/2014, including 45,834 patients with serum sodium determinations on admission. We analyzed the association between dysnatremia and in-hospital mortality, using a multivariate logistic regression model to adjust for classical potential confounders. We performed a PheWAS to identify new potential confounders. Hyponatremia and hypernatremia were recorded for 12.0% and 1.0% of hospital stays, respectively. Adjusted odds ratios (ORa) for severe, moderate and borderline hyponatremia were 3.44 (95% CI, 2.41-4.86), 2.48 (95% CI, 1.96-3.13) and 1.98 (95% CI, 1.73-2.28), respectively. ORa for severe, moderate and borderline hypernatremia were 4.07 (95% CI, 2.92-5.62), 4.42 (95% CI, 2.04-9.20) and 3.72 (95% CI, 1.53-8.45), respectively. Borderline hyponatremia (ORa = 1.57 95% CI, 1.35-1.81) and borderline hypernatremia (ORa = 3.47 95% CI, 2.43-4.90) were still associated with in-hospital mortality after adjustment for classical and new confounding factors identified through the PheWAS analysis. Borderline dysnatremia on admission is independently associated with a higher risk of in-hospital mortality. By using medical data automatically collected in EHR and a new data mining approach, we identified new potential confounding factors that were highly associated with both mortality and dysnatremia.
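For readers unfamiliar with how an odds ratio and its 95% confidence interval are formed, the sketch below computes the simpler unadjusted version from a 2x2 table using the standard log-odds (Woolf) approximation. It does not reproduce the adjusted odds ratios reported above, which come from a multivariate logistic regression, and the counts are hypothetical rather than taken from the study.

```python
import math


def odds_ratio_ci(exposed_dead, exposed_alive, unexposed_dead, unexposed_alive, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    All four cell counts must be positive. z = 1.96 gives a 95% interval.
    Returns (odds_ratio, lower_bound, upper_bound).
    """
    a, b, c, d = exposed_dead, exposed_alive, unexposed_dead, unexposed_alive
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts: 20 deaths among 100 exposed, 10 among 100 unexposed.
or_value, ci_low, ci_high = odds_ratio_ci(20, 80, 10, 90)
```

An interval that excludes 1 indicates an association at the chosen confidence level; adjusting for confounders, as the study does, can move both the estimate and the interval.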
NASA Astrophysics Data System (ADS)
Schauberger, Bernhard; Rolinski, Susanne; Müller, Christoph
2016-12-01
Variability of crop yields is detrimental for food security. Under climate change its amplitude is likely to increase, thus it is essential to understand the underlying causes and mechanisms. Crop models are the primary tool to project future changes in crop yields under climate change. A systematic overview of drivers and mechanisms of crop yield variability (YV) can thus inform crop model development and facilitate improved understanding of climate change impacts on crop yields. Yet there is a vast body of literature on crop physiology and YV, which makes a prioritization of mechanisms for implementation in models challenging. Therefore this paper takes a novel approach to systematically mine and organize existing knowledge from the literature. The aim is to identify important mechanisms lacking in models, which can help to set priorities in model improvement. We structure knowledge from the literature in a semi-quantitative network. This network consists of complex interactions between growing conditions, plant physiology and crop yield. We utilize the resulting network structure to assign relative importance to causes of YV and related plant physiological processes. As expected, our findings confirm existing knowledge, in particular on the dominant role of temperature and precipitation, but also highlight other important drivers of YV. More importantly, our method allows for identifying the relevant physiological processes that transmit variability in growing conditions to variability in yield. We can identify explicit targets for the improvement of crop models. The network can additionally guide model development by outlining complex interactions between processes and by easily retrieving quantitative information for each of the 350 interactions. We show the validity of our network method as a structured, consistent and scalable dictionary of literature. The method can easily be applied to many other research fields.
Ding, Xuemei; Bucholc, Magda; Wang, Haiying; Glass, David H; Wang, Hui; Clarke, Dave H; Bjourson, Anthony John; Dowey, Le Roy C; O'Kane, Maurice; Prasad, Girijesh; Maguire, Liam; Wong-Lin, KongFatt
2018-06-27
There is currently a lack of an efficient, objective and systemic approach towards the classification of Alzheimer's disease (AD), due to its complex etiology and pathogenesis. As AD is inherently dynamic, it is also not clear how the relationships among AD indicators vary over time. To address these issues, we propose a hybrid computational approach for AD classification and evaluate it on the heterogeneous longitudinal AIBL dataset. Specifically, using clinical dementia rating as an index of AD severity, the most important indicators (mini-mental state examination, logical memory recall, grey matter and cerebrospinal volumes from MRI and active voxels from PiB-PET brain scans, ApoE, and age) can be automatically identified from parallel data mining algorithms. In this work, Bayesian network modelling across different time points is used to identify and visualize time-varying relationships among the significant features, and importantly, in an efficient way using only coarse-grained data. Crucially, our approach suggests key data features and their appropriate combinations that are relevant for AD severity classification with high accuracy. Overall, our study provides insights into AD developments and demonstrates the potential of our approach in supporting efficient AD diagnosis.
Immersion Cooling of Electronics in DoD Installations
2016-05-01
2012). Bitcoin Mining Electronics Cooling Development In January 2013, inventor/consultant Mark Miyoshi began development of a two-phase cooling...system using Novec 649 to be used for cooling bitcoin mining hardware. After a short trial period, hardware power supply and logic-board failures...are reports of bitcoin mining companies vertically stacking two-phase immersion baths to improve the floor space density, but this approach is likely
SEMINAR PUBLICATION: MANAGING ENVIRONMENTAL PROBLEMS AT INACTIVE AND ABANDONED METALS MINE SITES
Environmental problems associated with abandoned and inactive mines are addressed along with some approaches to resolving those problems, including case studies demonstrating technologies that have worked. New technologies being investigated are addressed also.
NASA Astrophysics Data System (ADS)
Vathsala, H.; Koolagudi, Shashidhar G.
2017-01-01
In this paper we discuss a data mining application for predicting peninsular Indian summer monsoon rainfall, and propose an algorithm that combines data mining and statistical techniques. We select likely predictors based on association rules that have the highest confidence levels. We then cluster the selected predictors to reduce their dimensions and use cluster membership values for classification. We derive the predictors from local conditions in southern India, including mean sea level pressure, wind speed, and maximum and minimum temperatures. The global condition variables include southern oscillation and Indian Ocean dipole conditions. The algorithm predicts rainfall in five categories: Flood, Excess, Normal, Deficit and Drought. We use closed itemset mining, cluster membership calculations and a multilayer perceptron function in the algorithm to predict monsoon rainfall in peninsular India. Using Indian Institute of Tropical Meteorology data, we found the prediction accuracy of our proposed approach to be exceptionally good.
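The predictor-selection step above ranks association rules by confidence. As a minimal sketch of that single measure (not of the paper's closed itemset mining, clustering, or perceptron stages), the code below computes confidence(A → B) = support(A ∪ B) / support(A) over a list of seasons, each represented as a set of discretized predictor labels plus a rainfall category. The labels and the toy seasons are hypothetical.

```python
def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent.

    transactions: list of sets of item labels.
    Returns 0.0 when the antecedent never occurs.
    """
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    if n_antecedent == 0:
        return 0.0
    n_both = sum(1 for t in transactions if both <= t)
    return n_both / n_antecedent


# Hypothetical seasons: discretized predictors plus the observed category.
seasons = [
    {"low_mslp", "pos_iod", "Excess"},
    {"low_mslp", "Excess"},
    {"low_mslp", "neg_iod", "Normal"},
    {"high_mslp", "neg_iod", "Deficit"},
]

rules = {
    "low_mslp -> Excess": confidence(seasons, {"low_mslp"}, {"Excess"}),
    "pos_iod -> Excess": confidence(seasons, {"pos_iod"}, {"Excess"}),
    "neg_iod -> Deficit": confidence(seasons, {"neg_iod"}, {"Deficit"}),
}
# Keep the predictors whose rules have the highest confidence.
ranked = sorted(rules, key=rules.get, reverse=True)
```

Confidence alone favors rare antecedents, so a minimum support threshold is usually applied alongside it before rules are ranked.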
The "battle of gold" under the light of green economics: a case study from Greece
NASA Astrophysics Data System (ADS)
Damigos, D.; Kaliampakos, D.
2006-05-01
Mining firms stimulate local and national economies but this comes at a certain cost. In the light of increasing public concern, external costs of environmental degradation and social disruption are no longer of pure academic interest. The assessment of mining projects on the grounds of sustainable development is critical in order to decide whether the exploitation of mineral resources is socially desirable. In practice, few steps have been taken towards this end. In this paper, a case study is illustrated that provides the means for evaluating the social worthiness of mining projects. The analysis, which is the first of its kind in Greece, deals with a major problem of the mining industry: the gold debate on the grounds of green economics. The assessment is based on the social cost benefit approach. Well-established techniques (e.g. benefit transfer) and innovative approaches have been adopted to overcome various practical problems
Martin, Jeffrey D.; Duwelius, Richard F.; Crawford, Charles G.
1990-01-01
Hydrologic effects of mining and reclamation were identified by comparing the hydrologic systems at mined and reclaimed watersheds with those at unmined agricultural watersheds. The presence or absence of a large final-cut lake in the reclaimed watershed greatly influences the hydrologic systems and the effects of mining and reclamation. Surface coal mining and reclamation can decrease base flow, annual runoff, and peak flow rates; increase the variability of flow and recharge to the bedrock; reestablish the premining relation between surface- and ground-water divides; and lower the water table in upland areas.
Computer-aided visual assessment in mine planning and design
Michael Hatfield; A. J. LeRoy Balzer; Roger E. Nelson
1979-01-01
A computer modeling technique is described for evaluating the visual impact of a proposed surface mine located within the viewshed of a national park. A computer algorithm analyzes digitized USGS baseline topography and identifies areas subject to surface disturbance visible from the park. Preliminary mine and reclamation plan information is used to describe how the...
Selective Guide to Literature on Mining Engineering. Engineering Literature Guides, Number 6.
ERIC Educational Resources Information Center
Erdmann, Charlotte A., Comp.
The multidisciplinary field of mining engineering offers many challenges. Often, many sources must be used to solve a problem. This document is a survey of information sources in mining engineering and is intended to identify those core resources which can help engineers and librarians to find information about the discipline. Sections include:…
Maansson, Maria; Vynne, Nikolaj G.; Klitgaard, Andreas; Nybo, Jane L.; Melchiorsen, Jette; Nguyen, Don D.; Sanchez, Laura M.; Ziemert, Nadine; Dorrestein, Pieter C.
2016-01-01
ABSTRACT Microorganisms are a rich source of bioactives; however, chemical identification is a major bottleneck. Strategies that can prioritize the most prolific microbial strains and novel compounds are of great interest. Here, we present an integrated approach to evaluate the biosynthetic richness in bacteria and mine the associated chemical diversity. Thirteen strains closely related to Pseudoalteromonas luteoviolacea isolated from all over the Earth were analyzed using an untargeted metabolomics strategy, and metabolomic profiles were correlated with whole-genome sequences of the strains. We found considerable diversity: only 2% of the chemical features and 7% of the biosynthetic genes were common to all strains, while 30% of all features and 24% of the genes were unique to single strains. The list of chemical features was reduced to 50 discriminating features using a genetic algorithm and support vector machines. Features were dereplicated by tandem mass spectrometry (MS/MS) networking to identify molecular families of the same biosynthetic origin, and the associated pathways were probed using comparative genomics. Most of the discriminating features were related to antibacterial compounds, including the thiomarinols that were reported from P. luteoviolacea here for the first time. By comparative genomics, we identified the biosynthetic cluster responsible for the production of the antibiotic indolmycin, which could not be predicted with standard methods. In conclusion, we present an efficient, integrative strategy for elucidating the chemical richness of a given set of bacteria and link the chemistry to biosynthetic genes. IMPORTANCE We here combine chemical analysis and genomics to probe for new bioactive secondary metabolites based on their pattern of distribution within bacterial species. 
We demonstrate the usefulness of this combined approach in a group of marine Gram-negative bacteria closely related to Pseudoalteromonas luteoviolacea, which is a species known to produce a broad spectrum of chemicals. The approach allowed us to identify new antibiotics and their associated biosynthetic pathways. Combining chemical analysis and genetics is an efficient “mining” workflow for identifying diverse pharmaceutical candidates in a broad range of microorganisms and therefore of great use in bioprospecting. PMID:27822535
NASA Astrophysics Data System (ADS)
Ribeiro, A. I.; Fengler, F. H.; Longo, R. M.; Mello, G. F.; Damame, D. B.; Crowley, D. E.
2015-12-01
Brazil has high mineral potential that has been exploited over the years. A large fraction of these mineral resources is located in the Amazon region, which is known for its great biodiversity and its importance to the world climate. Because the policies that control Amazon preservation are relatively new, several mining activities have exploited the Amazon territory, causing extensive degradation. Because mining activities have a high potential for environmental change, the government created policies to restrain mining in Amazon forests and oblige mining companies to reclaim their mined areas. However, measuring reclamation progress is still a challenging task for the professionals involved. The volume and complexity of the variables, together with the difficulty of identifying the recovery of ecosystem functions, still hinder efforts to ensure reclamation success. This work therefore investigates how well morphometric parameters of soil aggregates represent reclamation development. The study area is located in the National Forest of Jamari, State of Rondônia. In the past, mining companies explored the region, leaving eight closed mines that are now under reclamation. The morphometric measurements of soil aggregates, namely geometric mean diameter (GMD), aggregate circularity index, and aggregate roundness, were chosen for their ease of measurement and their association with biological activity. To achieve the proposed objective, aggregates from eight sites under reclamation, from different closed mines, were chosen and compared with soil aggregates from Amazon forest and from an open mine. The results were analyzed by one-way ANOVA to identify differences between areas under reclamation, the natural ecosystem, and the open mine. Differences were found for GMD and circularity index. However, only the circularity index made it possible to identify differences between the reclamation sites.
The results support the following conclusions: (1) morphometric aggregate measurements can represent the reclamation process in Amazon territory; (2) to validate the results, more areas under reclamation in different ecosystems must be investigated; (3) roundness did not show any differences. Key words: circularity index, ecosystem, geometric mean diameter.
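The abstract above does not define its morphometric indices. As a minimal sketch, assuming the common image-analysis definition of circularity (4πA/P²) and a weighted geometric mean for GMD, the two measures that showed differences could be computed as follows; the function names and inputs are illustrative, not taken from the study.

```python
import math


def circularity(area, perimeter):
    """Circularity index 4*pi*A / P**2 (assumed definition).

    Equals 1.0 for a perfect circle and decreases for irregular outlines.
    """
    return 4.0 * math.pi * area / perimeter ** 2


def geometric_mean_diameter(diameters, weights):
    """Weighted geometric mean diameter: exp(sum(w * ln d) / sum(w)).

    `diameters` are aggregate size-class diameters, `weights` the mass
    (or count) in each class. Both are hypothetical inputs.
    """
    total = sum(weights)
    log_sum = sum(w * math.log(d) for d, w in zip(diameters, weights))
    return math.exp(log_sum / total)
```

Under these assumed definitions a unit square has circularity π/4, so more angular aggregates score lower than rounded ones.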
Discovery of the leinamycin family of natural products by mining actinobacterial genomes
Xu, Zhengren; Guo, Zhikai; Hindra; Ma, Ming; Zhou, Hao; Gansemans, Yannick; Zhu, Xiangcheng; Huang, Yong; Zhao, Li-Xing; Jiang, Yi; Cheng, Jinhua; Van Nieuwerburgh, Filip; Suh, Joo-Won; Duan, Yanwen
2017-01-01
Nature’s ability to generate diverse natural products from simple building blocks has inspired combinatorial biosynthesis. The knowledge-based approach to combinatorial biosynthesis has allowed the production of designer analogs by rational metabolic pathway engineering. While successful, structural alterations are limited, with designer analogs often produced in compromised titers. The discovery-based approach to combinatorial biosynthesis complements the knowledge-based approach by exploring the vast combinatorial biosynthesis repertoire found in Nature. Here we showcase the discovery-based approach to combinatorial biosynthesis by targeting the domain of unknown function and cysteine lyase domain (DUF–SH) didomain, specific for sulfur incorporation from the leinamycin (LNM) biosynthetic machinery, to discover the LNM family of natural products. By mining bacterial genomes from public databases and the actinomycetes strain collection at The Scripps Research Institute, we discovered 49 potential producers that could be grouped into 18 distinct clades based on phylogenetic analysis of the DUF–SH didomains. Further analysis of the representative genomes from each of the clades identified 28 lnm-type gene clusters. Structural diversities encoded by the LNM-type biosynthetic machineries were predicted based on bioinformatics and confirmed by in vitro characterization of selected adenylation proteins and isolation and structural elucidation of the guangnanmycins and weishanmycins. These findings demonstrate the power of the discovery-based approach to combinatorial biosynthesis for natural product discovery and structural diversity and highlight Nature’s rich biosynthetic repertoire. Comparative analysis of the LNM-type biosynthetic machineries provides outstanding opportunities to dissect Nature’s biosynthetic strategies and apply these findings to combinatorial biosynthesis for natural product discovery and structural diversity. PMID:29229819
Discovery of the leinamycin family of natural products by mining actinobacterial genomes.
Pan, Guohui; Xu, Zhengren; Guo, Zhikai; Hindra; Ma, Ming; Yang, Dong; Zhou, Hao; Gansemans, Yannick; Zhu, Xiangcheng; Huang, Yong; Zhao, Li-Xing; Jiang, Yi; Cheng, Jinhua; Van Nieuwerburgh, Filip; Suh, Joo-Won; Duan, Yanwen; Shen, Ben
2017-12-26
Nature's ability to generate diverse natural products from simple building blocks has inspired combinatorial biosynthesis. The knowledge-based approach to combinatorial biosynthesis has allowed the production of designer analogs by rational metabolic pathway engineering. While successful, structural alterations are limited, with designer analogs often produced in compromised titers. The discovery-based approach to combinatorial biosynthesis complements the knowledge-based approach by exploring the vast combinatorial biosynthesis repertoire found in Nature. Here we showcase the discovery-based approach to combinatorial biosynthesis by targeting the domain of unknown function and cysteine lyase domain (DUF-SH) didomain, specific for sulfur incorporation from the leinamycin (LNM) biosynthetic machinery, to discover the LNM family of natural products. By mining bacterial genomes from public databases and the actinomycetes strain collection at The Scripps Research Institute, we discovered 49 potential producers that could be grouped into 18 distinct clades based on phylogenetic analysis of the DUF-SH didomains. Further analysis of the representative genomes from each of the clades identified 28 lnm-type gene clusters. Structural diversities encoded by the LNM-type biosynthetic machineries were predicted based on bioinformatics and confirmed by in vitro characterization of selected adenylation proteins and isolation and structural elucidation of the guangnanmycins and weishanmycins. These findings demonstrate the power of the discovery-based approach to combinatorial biosynthesis for natural product discovery and structural diversity and highlight Nature's rich biosynthetic repertoire. Comparative analysis of the LNM-type biosynthetic machineries provides outstanding opportunities to dissect Nature's biosynthetic strategies and apply these findings to combinatorial biosynthesis for natural product discovery and structural diversity.
Context-specific target definition in influenza A virus hemagglutinin-glycan receptor interactions.
Shriver, Zachary; Raman, Rahul; Viswanathan, Karthik; Sasisekharan, Ram
2009-08-28
Protein-glycan interactions are important regulators of a variety of biological processes, ranging from immune recognition to anticoagulation. An important area of active research is directed toward understanding the role of host cell surface glycans as recognition sites for pathogen protein receptors. Recognition of cell surface glycans is a widely employed strategy for a variety of pathogens, including bacteria, parasites, and viruses. We present here a representative example of such an interaction: the binding of influenza A hemagglutinin (HA) to specific sialylated glycans on the cell surface of human upper airway epithelial cells, which initiates the infection cycle. We detail a generalizable strategy to understand the nature of protein-glycan interactions both structurally and biochemically, using HA as a model system. This strategy combines a top-down approach using available structural information to define important contacts between glycans and HA, with a bottom-up approach using data-mining and informatics approaches to identify the common motifs that distinguish glycan binders from nonbinders. By probing protein-glycan interactions simultaneously through top-down and bottom-up approaches, we can scientifically validate a series of observations. This in turn provides additional confidence and surmounts known challenges in the study of protein-glycan interactions, such as accounting for multivalency, and thus truly defines concepts such as specificity, affinity, and avidity. With the advent of new technologies for glycomics-including glycan arrays, data-mining solutions, and robust algorithms to model protein-glycan interactions-we anticipate that such combination approaches will become tractable for a wide variety of protein-glycan interactions.
Lew-Tabor, A E; Rodriguez Valle, M
2016-06-01
The field of reverse vaccinology developed as an outcome of the genome sequence revolution. Following the introduction of live vaccinations in the western world by Edward Jenner in 1798 and the coining of the phrase 'vaccine', in 1881 Pasteur developed a rational design for vaccines. Pasteur proposed that, in order to make a vaccine, one should 'isolate, inactivate and inject the microorganism', and these basic rules of vaccinology were largely followed for the next 100 years, leading to the elimination of several highly infectious diseases. However, new technologies were needed to conquer many pathogens which could not be eliminated using these traditional technologies. Thus, increasingly, computers were used to mine genome sequences to rationally design recombinant vaccines. Several vaccines for bacterial and viral diseases (e.g. meningococcus and HIV) have been developed; however, the ongoing challenge for parasite vaccines has been due to their comparatively larger genomes. Understanding the immune response is important in reverse vaccinology studies, as this knowledge will influence how the genome mining is to be conducted. Vaccine candidates for anaplasmosis, cowdriosis, theileriosis, leishmaniasis, malaria, schistosomiasis, and the cattle tick have been identified using reverse vaccinology approaches. Some challenges for parasite vaccine development include the ability to address antigenic variability as well as understanding the complex interplay between antibody, mucosal and/or T cell immune responses. In understanding the complex parasite interactions with the livestock host, one limitation is that algorithms for epitope mining developed using the human genome cannot be adapted directly for bovine hosts, for example for the prediction of peptide binding to major histocompatibility complex motifs. 
As the number of genomes for both hosts and parasites increases, the development of new algorithms for pan-genomic mining will continue to impact the future of vaccine development for parasitic and rickettsial (and other tick-borne pathogen) diseases. Copyright © 2015 Elsevier GmbH. All rights reserved.
Bidone, Edison; Cesar, Ricardo; Santos, Maria Carla; Sierpe, Ricardo; Silva-Filho, Emmanuel Vieira; Kutter, Vinicius; Dias da Silva, Lílian I; Castilhos, Zuleica
2018-03-01
Arsenic (As) is a dangerous and carcinogenic element and drinking water is its main pathway of human exposure. Gold mines are widely recognized as important sources of As pollution. This work proposes the assessment of As distribution along watersheds surrounding the "Morro do Ouro" gold mine (Paracatu, southeastern Brazil). A balance approach between filtered As fluxes (As < 0.45 μm) and suspended particulate material (AsSPM) in different river segments was applied. An ultrafiltration procedure was used to categorize As into the following classes: particulate > 0.1 μm, colloidal < 0.1 μm to > 10 kDa, dissolved < 10 kDa to > 1 kDa, and truly dissolved < 1 kDa. By applying this approach, arsenic contributions from mining facilities were quantified in order to identify critical fluvial segments and support decision makers in remediation actions. The mass balance indicated a decreasing gradient from upstream to downstream: (i) of the As concentrations higher than the limit established by Brazilian law (10 μg L⁻¹); (ii) of the ratio between specific fluxes (g As km⁻² day⁻¹) and those determined using an uncontaminated watershed (a proxy for estimating the anthropic contribution), from 10³ to 10¹; (iii) of the specific fluxes As < 0.45 μm and AsSPM, from 10² to 10⁰; and (iv) of the negative balance (output minus input) for each river segment, which suggests As accumulation in sediments along the rivers in both urban and rural areas, mainly due to SPM sedimentation and sorption by Fe oxyhydroxides. Ultrafiltration fractionation showed As concentrations decreasing with particle size; the SPM load (> 0.1 μm) was almost one order of magnitude higher than the dissolved load (< 1 kDa).
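The segment mass balance described above (output minus input, with fluxes normalized by watershed area) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' procedure: flux is taken as concentration times discharge, and all function names and input values are hypothetical.

```python
def flux_g_per_day(conc_ug_per_l, discharge_m3_per_s):
    """As flux in g/day from concentration (ug/L) and discharge (m3/s).

    Unit conversion: 1 m3 = 1000 L, 1 ug = 1e-6 g, 86400 s per day.
    """
    return conc_ug_per_l * discharge_m3_per_s * 1000 * 1e-6 * 86400


def specific_flux(flux, area_km2):
    """Flux normalized by drainage area, in g As km^-2 day^-1."""
    return flux / area_km2


def segment_balance(input_fluxes, output_flux):
    """Output minus the sum of inputs for one river segment.

    A negative value means less As leaves than enters, which the
    abstract interprets as accumulation in sediments.
    """
    return output_flux - sum(input_fluxes)
```

Dividing a segment's specific flux by that of an uncontaminated reference watershed gives the ratio used in the abstract as a proxy for the anthropic contribution.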
A proactive approach to sustainable management of mine tailings
NASA Astrophysics Data System (ADS)
Edraki, Mansour; Baumgartl, Thomas
2015-04-01
The reactive strategies to manage mine tailings, i.e. containment of tailings slurries in tailings storage facilities (TSFs) and remediation of tailings solids or tailings seepage water after the decommissioning of those facilities, can be technically inefficient at eliminating environmental risks (e.g. preventing dispersion of contaminants and catastrophic dam wall failures), pose a long-term economic burden for companies, governments and society after mine closure, and often fail to meet community expectations. Most preventive environmental management practices promote proactive integrated approaches to waste management whereby the sources of environmental issues are identified to help make more informed decisions. They often use life cycle assessment to find the "hot spots" of environmental burdens. This kind of approach is often based on generic data and has rarely been used for tailings. Besides, life cycle assessments are less useful for designing operations or simulating changes in the process and the consequent environmental outcomes. It is evident that an integrated approach to tailings research, linked to better processing options, is needed. A literature review revealed that there are only a few examples of integrated approaches. The aim of this project is to develop new tailings management models by streamlining orebody characterization, process optimization and rehabilitation. The approach is based on continuous fingerprinting of geochemical processes from orebody to tailings storage facility, and on benchmarking the success of such proactive initiatives by evidence of no impacts and no future projected impacts on receiving environments. We present an approach for developing such a framework and preliminary results from a case study where combined grinding and flotation models developed using geometallurgical data from the orebody were constructed to predict the properties of tailings produced under various processing scenarios. 
The modelling scenarios based on the case study data provide the capacity to predict the composition of tailings and the resulting environmental management implications. For example, the type and content of clay minerals in tailings will affect the geotechnical stability and water recovery. Clay content will also influence decisions made for paste or thickened tailings and underground backfilling. It is possible by using an integrated assessment framework to evaluate more alternatives, including the production of additional saleable and benign streams, alternative tailings treatment and disposal, as well as options for reuse, recycling and pre-processing of existing tailings.
NASA Astrophysics Data System (ADS)
Krawczyk, Artur
2018-01-01
In this article, topics regarding the technical and legal aspects of creating digital underground mining maps are described. Currently used technologies and solutions for creating, storing and making digital maps accessible are described in the context of the Polish mining industry. Also, some problems with the use of these technologies are identified and described. One of the identified problems is the need to expand the range of mining map data provided by survey departments to other mining departments, such as ventilation maintenance or geological maintenance. Three solutions are proposed and analyzed, and one is chosen for further analysis. The analysis concerns data storage and making survey data accessible not only from paper documentation, but also directly from computer systems. Based on enrichment data, new processing procedures are proposed for a new way of presenting information that allows the preparation of new cartographic representations (symbols) of data with regard to users' needs.
NASA Astrophysics Data System (ADS)
Wu, Qiang; Zhou, Wanfang; Wang, Jinhua; Xie, Shuhan
2009-05-01
Groundwater inrush is a geohazard that can significantly impact safe operations of the coal mines in China. Its occurrence is controlled by many factors, and the processes involved are often not amenable to mathematical expression. To evaluate the water inrush risk, Professor Wu and his colleagues have proposed the vulnerability index approach, which couples the artificial neural network (ANN) and geographic information system (GIS). The detailed procedures of using this innovative approach are shown in a case study. Firstly, the powerful spatial data analysis functions of GIS were used to establish the thematic layer of each of the main factors that control the water inrush, and then to choose the training samples on the thematic layers with the ANN-BP algorithm. Secondly, the ANN evaluation model of the water inrush was established to determine the threshold value for each risk level with a histogram of the water inrush vulnerability index. As a result, the mine area was divided into four regions with different vulnerability levels, and these served as the general guidelines for the mine operations.
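The last step above, turning a per-cell vulnerability index into one of four regions via thresholds, can be illustrated as follows. This is a simplified sketch: it assumes the index is a weighted sum of normalized factor values, whereas the study derives it from a trained ANN, and the factor names, weights and thresholds are invented for illustration.

```python
from bisect import bisect_right


def vulnerability_index(factors, weights):
    """Weighted sum of normalized factor values for one map cell.

    `factors` and `weights` are dicts keyed by factor name; in the study
    the factor contributions come from the ANN, here they are given.
    """
    return sum(weights[name] * factors[name] for name in weights)


def risk_level(index, thresholds):
    """Map an index to a level 0..len(thresholds) using sorted thresholds.

    Three thresholds split the index range into four vulnerability regions.
    """
    return bisect_right(thresholds, index)


# Hypothetical cell: two normalized thematic-layer values and their weights.
cell = {"aquifer_pressure": 0.5, "fault_density": 1.0}
weights = {"aquifer_pressure": 0.4, "fault_density": 0.6}
level = risk_level(vulnerability_index(cell, weights), [0.25, 0.5, 0.75])
```

Applying `risk_level` to every cell of the GIS raster would yield the four-region zoning map; choosing the thresholds from the index histogram is the part the abstract attributes to the ANN evaluation model.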
Automation and robotics technology for intelligent mining systems
NASA Technical Reports Server (NTRS)
Welsh, Jeffrey H.
1989-01-01
The U.S. Bureau of Mines is approaching the problems of accidents and efficiency in the mining industry through the application of automation and robotics to mining systems. This technology can increase safety by removing workers from hazardous areas of the mines or from performing hazardous tasks. The short-term goal of the Automation and Robotics program is to develop technology that can be implemented in the form of an autonomous mining machine using current continuous mining machine equipment. In the longer term, the goal is to conduct research that will lead to new intelligent mining systems that capitalize on the capabilities of robotics. The Bureau of Mines Automation and Robotics program has been structured to produce the technology required for the short- and long-term goals. The short-term goal of application of automation and robotics to an existing mining machine, resulting in autonomous operation, is expected to be accomplished within five years. Key technology elements required for an autonomous continuous mining machine are well underway and include machine navigation systems, coal-rock interface detectors, machine condition monitoring, and intelligent computer systems. The Bureau of Mines program is described, including status of key technology elements for an autonomous continuous mining machine, the program schedule, and future work. Although the program is directed toward underground mining, much of the technology being developed may have applications for space systems or mining on the Moon or other planets.
Spectral methods to detect surface mines
NASA Astrophysics Data System (ADS)
Winter, Edwin M.; Schatten Silvious, Miranda
2008-04-01
Over the past five years, advances have been made in the spectral detection of surface mines under minefield detection programs at the U. S. Army RDECOM CERDEC Night Vision and Electronic Sensors Directorate (NVESD). The problem of detecting surface land mines ranges from the relatively simple, the detection of large anti-vehicle mines on bare soil, to the very difficult, the detection of anti-personnel mines in thick vegetation. While spatial and spectral approaches can be applied to the detection of surface mines, spatial-only detection requires many pixels-on-target such that the mine is actually imaged and shape-based features can be exploited. This method is unreliable in vegetated areas because only part of the mine may be exposed, while spectral detection is possible without the mine being resolved. At NVESD, hyperspectral and multi-spectral sensors throughout the reflection and thermal spectral regimes have been applied to the mine detection problem. Data has been collected on mines in forest and desert regions and algorithms have been developed both to detect the mines as anomalies and to detect the mines based on their spectral signature. In addition to the detection of individual mines, algorithms have been developed to exploit the similarities of mines in a minefield to improve their detection probability. In this paper, the types of spectral data collected over the past five years will be summarized along with the advances in algorithm development.
Figueroa, Rosa L; Flores, Christopher A
2016-08-01
Obesity is a chronic disease with an increasing impact on the world's population. In this work, we present a method of identifying obesity automatically using text mining techniques and information related to body weight measures and obesity comorbidities. We used a dataset of 3015 de-identified medical records that contain labels for two classification problems. The first classification problem distinguishes between obesity, overweight, normal weight, and underweight. The second classification problem differentiates between obesity types: super obesity, morbid obesity, severe obesity and moderate obesity. We used a Bag of Words approach to represent the records together with unigram and bigram representations of the features. We implemented two approaches: a hierarchical method and a nonhierarchical one. We used Support Vector Machine and Naïve Bayes together with ten-fold cross validation to evaluate and compare performances. Our results indicate that the hierarchical approach does not work as well as the nonhierarchical one. In general, our results show that Support Vector Machine obtains better performances than Naïve Bayes for both classification problems. We also observed that bigram representation improves performance compared with unigram representation.
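The Bag of Words representation with unigram and bigram features mentioned above can be sketched with the standard library alone. The tokenization (lowercasing and whitespace splitting) is an assumption; the abstract does not describe the preprocessing actually used.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(text, max_n=2):
    """Count unigrams up to max_n-grams in a text (order is discarded)."""
    tokens = text.lower().split()
    bag = Counter()
    for n in range(1, max_n + 1):
        bag.update(ngrams(tokens, n))
    return bag


# A bigram such as "morbid obesity" keeps information that the separate
# unigrams "morbid" and "obesity" lose.
features = bag_of_words("Patient with morbid obesity and hypertension")
```

The resulting counts would then be turned into fixed-length vectors and passed to a classifier such as a Support Vector Machine or Naïve Bayes.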
Hajjo, Rima; Setola, Vincent; Roth, Bryan L.; Tropsha, Alexander
2012-01-01
We have devised a chemocentric informatics methodology for drug discovery integrating independent approaches to mining biomolecular databases. As a proof of concept, we have searched for novel putative cognition enhancers. First, we generated Quantitative Structure-Activity Relationship (QSAR) models of compounds binding to 5-hydroxytryptamine-6 receptor (5-HT6R), a known target for cognition enhancers, and employed these models for virtual screening to identify putative 5-HT6R actives. Second, we queried chemogenomics data from the Connectivity Map (http://www.broad.mit.edu/cmap/) with the gene expression profile signatures of Alzheimer’s disease patients to identify compounds putatively linked to the disease. Thirteen common hits were tested in 5-HT6R radioligand binding assays and ten were confirmed as actives. Four of them were known selective estrogen receptor modulators that were never reported as 5-HT6R ligands. Furthermore, nine of the confirmed actives were reported elsewhere to have memory-enhancing effects. The approaches discussed herein can be used broadly to identify novel drug-target-disease associations. PMID:22537153
A Market-Basket Approach to Predict the Acute Aquatic Toxicity of Munitions and Energetic Materials.
Burgoon, Lyle D
2016-06-01
An ongoing challenge in chemical production, including the production of insensitive munitions and energetics, is the ability to make predictions about potential environmental hazards early in the process. To address this challenge, a quantitative structure activity relationship model was developed to predict acute fathead minnow toxicity of insensitive munitions and energetic materials. Computational predictive toxicology models like this one may be used to identify and prioritize environmentally safer materials early in their development. The developed model is based on the Apriori market-basket/frequent itemset mining approach to identify probabilistic prediction rules using chemical atom-pairs and the lethality data for 57 compounds from a fathead minnow acute toxicity assay. Lethality data were discretized into four categories based on the Globally Harmonized System of Classification and Labelling of Chemicals. Apriori identified toxicophores for categories two and three. The model classified 32 of the 57 compounds correctly, with a fivefold cross-validation classification rate of 74 %. A structure-based surrogate approach classified the remaining 25 chemicals correctly at 48 %. This result is unsurprising as these 25 chemicals were fairly unique within the larger set.
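To make the market-basket idea concrete, here is a minimal Apriori frequent-itemset miner. It is a generic textbook sketch, not the model from the paper: each "transaction" is assumed to be the set of atom-pair features of one compound plus its toxicity category label, and the example data are invented.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Start from frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical compounds described by atom pairs and a category label.
compounds = [
    {"C-N", "N-O", "category_2"},
    {"C-N", "N-O", "category_2"},
    {"C-N", "C-C"},
    {"C-C"},
]
itemsets = apriori(compounds, min_support=0.5)
```

Frequent itemsets that contain a category label can be read as candidate rules of the form "these atom pairs co-occur with this toxicity category"; the confidence of such a rule is the itemset's support divided by the support of the atom pairs alone.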
Espitia-Pérez, Lyda; Arteaga-Pertuz, Marcia; Soto, José Salvador; Espitia-Pérez, Pedro; Salcedo-Arteaga, Shirley; Pastor-Sierra, Karina; Galeano-Páez, Claudia; Brango, Hugo; da Silva, Juliana; Henriques, João A P
2018-09-01
During coal surface mining, several activities such as drilling, blasting, loading, and transport produce large quantities of particulate matter (PM) that is directly emitted into the atmosphere. Occupational exposure to this PM has been associated with an increase in DNA damage, but there is a scarcity of data examining the impact of these industrial operations on the frequency of cytogenetic endpoints and the cancer risk of potentially exposed surrounding populations. In this study, we used a Geographic Information Systems (GIS) approach and Inverse Distance Weighting (IDW) methods to perform a spatial and statistical analysis to explore whether exposure to PM 2.5 and PM 10 pollution, and additional factors, including the enrichment of the PM with inorganic elements, contribute to cytogenetic damage in residents living in proximity to an open-pit coal mining area. Results showed a spatial relationship between exposure to elevated concentrations of PM 2.5 and PM 10 and micronuclei frequency in binucleated (MNBN) and mononucleated (MNMONO) cells. Active pits, disposal, and storage areas could be identified as the possible emission sources of combustion elements. Mining activities were also correlated with increased concentrations of highly enriched elements like S, Cu and Cr in the atmosphere, corroborating their role in inorganic element pollution around coal mines. Elements enriched in the PM 2.5 fraction contributed to an increase in MNBN but seem to be more related to increased MNMONO frequencies and DNA damage accumulated in vivo. The combined use of GIS and IDW methods could represent an important tool for monitoring potential cancer risk associated with dynamically distributed variables like PM. Copyright © 2018 Elsevier Ltd. All rights reserved.
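Inverse Distance Weighting, the interpolation method named above, estimates a value at an unsampled location as a weighted average of nearby measurements, with weights proportional to 1/distance^p. The sketch below is a generic implementation with the commonly used power p = 2; the coordinates and PM values are invented, and the study's actual GIS settings are not given in the abstract.

```python
import math


def idw(x, y, samples, power=2.0):
    """Interpolate a value at (x, y) from (sx, sy, value) samples.

    Each sample is weighted by 1 / distance**power; a query that falls
    exactly on a sample location returns that sample's value.
    """
    numerator = 0.0
    denominator = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Hypothetical PM 2.5 readings (x, y, concentration) at two monitoring points.
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 30.0)]
midpoint_estimate = idw(1.0, 0.0, stations)
```

Evaluating `idw` over a grid of residential locations gives an exposure surface that can then be compared with biomarker frequencies measured in residents.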
Hoffman, Sarah R; Vines, Anissa I; Halladay, Jacqueline R; Pfaff, Emily; Schiff, Lauren; Westreich, Daniel; Sundaresan, Aditi; Johnson, La-Shell; Nicholson, Wanda K
2018-06-01
Women with symptomatic uterine fibroids can report a myriad of symptoms, including pain, bleeding, infertility, and psychosocial sequelae. Optimizing fibroid research requires the ability to enroll populations of women with image-confirmed symptomatic uterine fibroids. Our objective was to develop an electronic health record-based algorithm to identify women with symptomatic uterine fibroids for a comparative effectiveness study of medical or surgical treatments on quality-of-life measures. Using an iterative process and text-mining techniques, an effective computable phenotype algorithm, composed of demographics, and clinical and laboratory characteristics, was developed with reasonable performance. Such algorithms provide a feasible, efficient way to identify populations of women with symptomatic uterine fibroids for the conduct of large traditional or pragmatic trials and observational comparative effectiveness studies. Symptomatic uterine fibroids, due to menorrhagia, pelvic pain, bulk symptoms, or infertility, are a source of substantial morbidity for reproductive-age women. Comparing Treatment Options for Uterine Fibroids is a multisite registry study to compare the effectiveness of hormonal or surgical fibroid treatments on women's perceptions of their quality of life. Electronic health record-based algorithms are able to identify large numbers of women with fibroids, but additional work is needed to develop electronic health record algorithms that can identify women with symptomatic fibroids to optimize fibroid research. We sought to develop an efficient electronic health record-based algorithm that can identify women with symptomatic uterine fibroids in a large health care system for recruitment into large-scale observational and interventional research in fibroid management. We developed and assessed the accuracy of 3 algorithms to identify patients with symptomatic fibroids using an iterative approach. 
The data source was the Carolina Data Warehouse for Health, a repository for the health system's electronic health record data. In addition to International Classification of Diseases, Ninth Revision diagnosis and procedure codes and clinical characteristics, text data-mining software was used to derive information from imaging reports to confirm the presence of uterine fibroids. Results of each algorithm were compared with expert manual review to calculate the positive predictive values for each algorithm. Algorithm 1 was composed of the following criteria: (1) age 18-54 years; (2) either ≥1 International Classification of Diseases, Ninth Revision diagnosis codes for uterine fibroids or mention of fibroids using text-mined key words in imaging records or documents; and (3) no International Classification of Diseases, Ninth Revision or Current Procedural Terminology codes for hysterectomy and no reported history of hysterectomy. The positive predictive value was 47% (95% confidence interval 39-56%). Algorithm 2 required ≥2 International Classification of Diseases, Ninth Revision diagnosis codes for fibroids and positive text-mined key words and had a positive predictive value of 65% (95% confidence interval 50-79%). In algorithm 3, further refinements included ≥2 International Classification of Diseases, Ninth Revision diagnosis codes for fibroids on separate outpatient visit dates, the exclusion of women who had a positive pregnancy test within 3 months of their fibroid-related visit, and exclusion of incidentally detected fibroids during prenatal or emergency department visits. Algorithm 3 achieved a positive predictive value of 76% (95% confidence interval 71-81%). An electronic health record-based algorithm is capable of identifying cases of symptomatic uterine fibroids with moderate positive predictive value and may be an efficient approach for large-scale study recruitment. Copyright © 2018 Elsevier Inc. All rights reserved.
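The screening and evaluation steps described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the record field names (`age`, `fibroid_icd9_count`, `imaging_mentions_fibroids`, `hysterectomy_coded`, `hysterectomy_history`) are hypothetical, only the Algorithm 1 criteria are encoded, and the interval uses a normal (Wald) approximation because the abstract does not say which confidence-interval method was used.

```python
import math


def meets_algorithm_1(patient):
    """Apply the three Algorithm 1 criteria to one patient record (hypothetical dict keys)."""
    in_age_range = 18 <= patient["age"] <= 54
    has_fibroid_evidence = (
        patient["fibroid_icd9_count"] >= 1 or patient["imaging_mentions_fibroids"]
    )
    no_hysterectomy = not (
        patient["hysterectomy_coded"] or patient["hysterectomy_history"]
    )
    return in_age_range and has_fibroid_evidence and no_hysterectomy


def ppv_with_wald_ci(confirmed, flagged, z=1.96):
    """Positive predictive value with a normal-approximation interval.

    confirmed: flagged patients that manual review confirmed as true cases
    flagged:   all patients the algorithm flagged
    """
    if flagged <= 0:
        raise ValueError("flagged must be positive")
    ppv = confirmed / flagged
    half_width = z * math.sqrt(ppv * (1 - ppv) / flagged)
    return ppv, max(0.0, ppv - half_width), min(1.0, ppv + half_width)


# Illustrative record and counts only; these are not data from the study.
example_patient = {
    "age": 41,
    "fibroid_icd9_count": 0,
    "imaging_mentions_fibroids": True,
    "hysterectomy_coded": False,
    "hysterectomy_history": False,
}
print(meets_algorithm_1(example_patient))
print(ppv_with_wald_ci(confirmed=50, flagged=100))
```

Algorithms 2 and 3 would add further predicates (a second diagnosis code, separate outpatient visit dates, pregnancy-test and visit-type exclusions) in the same style.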
Controlled biological and biomimetic systems for landmine detection.
Habib, Maki K
2007-08-30
Humanitarian demining requires accurately detecting, locating and deactivating every landmine and other buried mine-like object as safely and as quickly as possible, and in the least invasive manner. The quality of landmine detection directly affects the efficiency and safety of this process. Most of the available methods to detect explosives and landmines are limited by their sensitivity and/or operational complexity. Over time, all landmines leak small amounts of their explosives, which can be found on the surrounding ground and plant life. Hence, explosive signatures are a robust primary indicator of landmines. Accordingly, developing innovative technologies and efficient techniques to identify explosive residue in mined areas in real time represents an attractive and promising approach. Biological and biologically inspired detection technology has the potential to compete with or be used in conjunction with other artificial technology to complement performance strengths. Biological systems are sensitive to many different scents concurrently, a property that has proven difficult to replicate artificially. Understanding biological systems presents unique opportunities for developing new capabilities through direct use of trained bio-systems, integration of living and non-living components, or inspiring new designs by mimicking biological capabilities. It is expected that controlled bio-systems, biotechnology and microbial techniques will contribute to the advancement of mine detection and other application domains. This paper provides directions, evaluation and analysis on the progress of controlled biological and biomimetic systems for landmine detection. It introduces and discusses the different approaches developed, underlining their relative advantages and limitations, and highlighting trends, safety and ecological concerns, and possible future directions.
Meta-control of combustion performance with a data mining approach
NASA Astrophysics Data System (ADS)
Song, Zhe
Large-scale combustion processes are complex and pose challenges for performance optimization. Traditional approaches based on thermodynamics have limitations in finding optimal operational regions due to the time-shift nature of the process. Recent advances in information technology enable people to collect large volumes of process data easily and continuously. The collected process data contains rich information about the process and, to some extent, represents a digital copy of the process over time. Although large volumes of data exist in industrial combustion processes, they are not fully utilized to the level where the process can be optimized. Data mining is an emerging science which finds patterns or models from large data sets. It has found many successful applications in business marketing, medical and manufacturing domains. The focus of this dissertation is on applying data mining to industrial combustion processes, and ultimately optimizing the combustion performance. However, the philosophy, methods and frameworks discussed in this research can also be applied to other industrial processes. Optimizing an industrial combustion process has two major challenges. One is that the underlying process model changes over time and obtaining an accurate process model is nontrivial. The other is that a high-fidelity process model is usually highly nonlinear, so solving the optimization problem requires efficient heuristics. This dissertation sets out to address these two major challenges. The major contribution of this 4-year research is the data-driven solution to optimize the combustion process, where the process model or knowledge is identified from the process data, and optimization is then executed by evolutionary algorithms to search for optimal operating regions.
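The two-stage idea in this abstract (identify a model from process data, then search it with an evolutionary algorithm) can be illustrated with a small sketch. Everything here is an assumption made for illustration: `surrogate_efficiency` is a hypothetical stand-in for a model learned from process data, the two settings and their bounds are invented, and the search is a plain (mu + lambda) evolution strategy rather than the specific algorithm used in the dissertation.

```python
import random


def surrogate_efficiency(setting):
    """Hypothetical data-driven model: predicted efficiency for (air_ratio, damper)."""
    air_ratio, damper = setting
    return -((air_ratio - 1.2) ** 2) - (damper - 0.6) ** 2


def evolve(fitness, bounds, population_size=20, generations=40, sigma=0.1, seed=0):
    """Search for a high-fitness operating point with a (mu + lambda) strategy."""
    rng = random.Random(seed)
    # Start from uniformly random settings inside the allowed operating bounds.
    population = [
        tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        for _ in range(population_size)
    ]
    for _ in range(generations):
        offspring = []
        for parent in population:
            # Gaussian mutation, clipped so every setting stays within bounds.
            child = tuple(
                min(hi, max(lo, gene + rng.gauss(0.0, sigma)))
                for gene, (lo, hi) in zip(parent, bounds)
            )
            offspring.append(child)
        # Parents compete with offspring, so the best solution is never lost.
        population = sorted(population + offspring, key=fitness, reverse=True)
        population = population[:population_size]
    return population[0]


operating_bounds = [(0.8, 2.0), (0.0, 1.0)]  # assumed ranges for the two settings
best_setting = evolve(surrogate_efficiency, operating_bounds)
print(best_setting, surrogate_efficiency(best_setting))
```

Because the process model drifts over time, a practical loop would refit the surrogate on recent data and rerun the search periodically.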
Preventing spontaneous combustion after mine closing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lewicki, G.
1987-11-01
The author explains how the Northern Coal Company and a Houston-based firefighting firm developed an innovative technique to reduce the risk of spontaneous combustion after mine closing in its Rienau No. 2 Mine. The "Light Water" (TM) ATC series of firefighting foam concentrates were designed for extinguishing flammable liquid fires. By slightly altering the chemicals, the concentrates could be used to seal the coal ribs, floor, and roof, reducing the risk of combustion. Subsequent monitoring of the mine has identified no signs of heating.
2015-01-01
Background Sufficient knowledge of molecular and genetic interactions, which comprise the entire basis of the functioning of living systems, is one of the necessary requirements for successfully answering almost any research question in the field of biology and medicine. To date, more than 24 million scientific papers can be found in PubMed, with many of them containing descriptions of a wide range of biological processes. The analysis of such tremendous amounts of data requires the use of automated text-mining approaches. Although a handful of tools have recently been developed to meet this need, none of them provide error-free extraction of highly detailed information. Results The ANDSystem package was developed for the reconstruction and analysis of molecular genetic networks based on an automated text-mining technique. It provides a detailed description of the various types of interactions between genes, proteins, microRNAs, metabolites, cellular components, pathways and diseases, taking into account the specificity of cell lines and organisms. Although the accuracy of ANDSystem is comparable to that of other well-known text-mining tools, such as Pathway Studio and STRING, it outperforms them in the number of interaction types it can identify. Conclusion The use of ANDSystem, in combination with Pathway Studio and STRING, can improve the quality of the automated reconstruction of molecular and genetic networks. ANDSystem should provide a useful tool for researchers working in a number of different fields, including biology, biotechnology, pharmacology and medicine. PMID:25881313
Kitsos, Christine M; Bhamidipati, Phani; Melnikova, Irena; Cash, Ethan P; McNulty, Chris; Furman, Julia; Cima, Michael J; Levinson, Douglas
2007-01-01
This study examined whether hierarchical clustering could be used to detect cell states induced by treatment combinations that were generated through automation and high-throughput (HT) technology. Data-mining techniques were used to analyze the large experimental data sets to determine whether nonlinear, non-obvious responses could be extracted from the data. Unary, binary, and ternary combinations of pharmacological factors (examples of stimuli) were used to induce differentiation of HL-60 cells using a HT automated approach. Cell profiles were analyzed by incorporating hierarchical clustering methods on data collected by flow cytometry. Data-mining techniques were used to explore the combinatorial space for nonlinear, unexpected events. Additional small-scale, follow-up experiments were performed on cellular profiles of interest. Multiple, distinct cellular profiles were detected using hierarchical clustering of expressed cell-surface antigens. Data-mining of this large, complex data set retrieved cases of both factor dominance and cooperativity, as well as atypical cellular profiles. Follow-up experiments found that treatment combinations producing "atypical cell types" made those cells more susceptible to apoptosis. CONCLUSIONS Hierarchical clustering and other data-mining techniques were applied to analyze large data sets from HT flow cytometry. From each sample, the data set was filtered and used to define discrete, usable states that were then related back to their original formulations. Analysis of resultant cell populations induced by a multitude of treatments identified unexpected phenotypes and nonlinear response profiles.
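To make the clustering step concrete, here is a minimal sketch of agglomerative hierarchical clustering with average linkage over per-sample marker profiles. The profile values and the two-marker layout are invented for illustration, and the abstract does not state which linkage or distance the authors used, so both choices are assumptions.

```python
import math


def agglomerative_clusters(profiles, n_clusters):
    """Merge the two closest clusters (average linkage) until n_clusters remain."""
    clusters = [[index] for index in range(len(profiles))]

    def average_linkage(a, b):
        # Mean Euclidean distance over all cross-cluster pairs of profiles.
        total = sum(math.dist(profiles[i], profiles[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        _, i, j = min(
            (average_linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical normalized expression of two surface markers per treated sample.
marker_profiles = [
    (0.10, 0.20),
    (0.15, 0.25),
    (0.90, 0.80),
    (0.85, 0.90),
    (0.50, 0.05),
]
cell_states = agglomerative_clusters(marker_profiles, n_clusters=3)
print(cell_states)
```

Each resulting cluster is a list of sample indices, which can be related back to the treatment combination that produced each sample; a singleton cluster is one simple signal of an atypical profile worth a follow-up experiment.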
Mining protein function from text using term-based support vector machines
Rice, Simon B; Nenadic, Goran; Stapley, Benjamin J
2005-01-01
Background Text mining has spurred huge interest in the domain of biology. The goal of the BioCreAtIvE exercise was to evaluate the performance of current text mining systems. We participated in Task 2, which addressed assigning Gene Ontology terms to human proteins and selecting relevant evidence from full-text documents. We approached it as a modified form of the document classification task. We used a supervised machine-learning approach (based on support vector machines) to assign protein function and select passages that support the assignments. As classification features, we used a protein's co-occurring terms that were automatically extracted from documents. Results The results evaluated by curators were modest, and quite variable for different problems: in many cases we have relatively good assignment of GO terms to proteins, but the selected supporting text was typically not relevant (precision ranging from 3% to 50%). The method appears to work best when a substantial set of relevant documents is obtained, while it works poorly on single documents and/or short passages. The initial results suggest that our approach can also mine annotations from text even when an explicit statement relating a protein to a GO term is absent. Conclusion A machine learning approach to mining protein function predictions from text can yield good performance only if sufficient training data is available and a significant amount of supporting data is used for prediction. The most promising results are for combined document retrieval and GO term assignment, which calls for the integration of methods developed in BioCreAtIvE Task 1 and Task 2. PMID:15960835
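The term-based classification idea can be sketched as a linear support vector machine over bag-of-terms features, one binary classifier per GO term. This is a simplified illustration under stated assumptions: the training passages are invented, the trainer is a basic hinge-loss subgradient method written out by hand rather than the SVM package the authors used, and real term extraction would be far richer than whitespace splitting.

```python
def term_features(text):
    """Represent a passage as the set of lower-cased terms it contains."""
    return set(text.lower().split())


def train_linear_svm(passages, labels, epochs=20, learning_rate=0.1, regularization=0.01):
    """Fit term weights by subgradient descent on the L2-regularized hinge loss."""
    weights = {}
    for _ in range(epochs):
        for text, label in zip(passages, labels):
            terms = term_features(text)
            margin = label * sum(weights.get(term, 0.0) for term in terms)
            # L2 regularization shrinks every weight slightly at each step.
            for term in weights:
                weights[term] *= 1 - learning_rate * regularization
            # Only examples inside the margin push the weights.
            if margin < 1:
                for term in terms:
                    weights[term] = weights.get(term, 0.0) + learning_rate * label
    return weights


def decision_score(weights, text):
    """Positive score: the passage is predicted to support the GO term."""
    return sum(weights.get(term, 0.0) for term in term_features(text))


# Invented passages: +1 supports a hypothetical kinase-activity term, -1 does not.
training_passages = [
    "kinase phosphorylates substrate",
    "kinase catalytic domain phosphorylates serine",
    "transporter moves ions across membrane",
    "membrane channel transporter",
]
training_labels = [1, 1, -1, -1]
go_term_weights = train_linear_svm(training_passages, training_labels)
print(decision_score(go_term_weights, "novel kinase phosphorylates tyrosine"))
print(decision_score(go_term_weights, "membrane transporter family"))
```

Ranking candidate passages by `decision_score` gives one simple way to select supporting text, which is the part of the task the abstract reports as hardest.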
A review of contrast pattern based data mining
NASA Astrophysics Data System (ADS)
Zhu, Shiwei; Ju, Meilong; Yu, Junfeng; Cai, Binlei; Wang, Aiping
2015-07-01
Contrast pattern based data mining is concerned with the mining of patterns and models that contrast two or more datasets. Contrast patterns can describe similarities or differences between the datasets. They represent strong contrast knowledge and have been shown to be very successful for constructing accurate and robust clusters and classifiers. The increasing use of contrast pattern data mining has initiated a great deal of research and development in the field of data mining. This paper gives a comprehensive review of existing contrast pattern based data mining research. The work is categorized into background and representation, definitions and mining algorithms, contrast pattern based classification, clustering and other applications, and future research trends. The primary aim of this paper is to serve as a glossary that gives interested researchers an overall picture of current contrast pattern based data mining development and helps them identify potential research directions for future investigation.
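As a small illustration of what a contrast pattern is, the sketch below enumerates itemsets whose support differs strongly between two transaction datasets. The support-difference criterion, the threshold, and the toy data are assumptions chosen for simplicity; the review covers several formal definitions and far more efficient mining algorithms than this brute-force enumeration.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def contrast_patterns(dataset_a, dataset_b, max_size=2, min_support_gap=0.5):
    """Return (itemset, support_a - support_b) pairs whose gap meets the threshold."""
    items = sorted(set().union(*dataset_a, *dataset_b))
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            gap = support(itemset, dataset_a) - support(itemset, dataset_b)
            if abs(gap) >= min_support_gap:
                found.append((combo, gap))
    return found


# Toy transaction datasets (each transaction is a set of items).
dataset_a = [{"x", "y"}, {"x", "y"}, {"x", "z"}, {"y"}]
dataset_b = [{"z"}, {"y", "z"}, {"x"}, {"z"}]
for pattern, gap in contrast_patterns(dataset_a, dataset_b):
    print(pattern, gap)
```

A positive gap marks a pattern characteristic of the first dataset and a negative gap one characteristic of the second; such patterns are the building blocks that contrast pattern based classifiers score against a new instance.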
Resilience of benthic deep-sea fauna to mining activities.
Gollner, Sabine; Kaiser, Stefanie; Menzel, Lena; Jones, Daniel O B; Brown, Alastair; Mestre, Nelia C; van Oevelen, Dick; Menot, Lenaick; Colaço, Ana; Canals, Miquel; Cuvelier, Daphne; Durden, Jennifer M; Gebruk, Andrey; Egho, Great A; Haeckel, Matthias; Marcon, Yann; Mevenkamp, Lisa; Morato, Telmo; Pham, Christopher K; Purser, Autun; Sanchez-Vidal, Anna; Vanreusel, Ann; Vink, Annemiek; Martinez Arbizu, Pedro
2017-08-01
With increasing demand for mineral resources, extraction of polymetallic sulphides at hydrothermal vents, cobalt-rich ferromanganese crusts at seamounts, and polymetallic nodules on abyssal plains may be imminent. Here, we briefly introduce ecosystem characteristics of mining areas, report on recent mining developments, and identify potential stress and disturbances created by mining. We analyze species' potential resistance to future mining and perform meta-analyses on population density and diversity recovery after disturbances most similar to mining: volcanic eruptions at vents, fisheries on seamounts, and experiments that mimic nodule mining on abyssal plains. We report wide variation in recovery rates among taxa, size, and mobility of fauna. While densities and diversities of some taxa can recover to or even exceed pre-disturbance levels, community composition remains affected after decades. The loss of hard substrata or alteration of substrata composition may cause substantial community shifts that persist over geological timescales at mined sites. Copyright © 2017 Elsevier Ltd. All rights reserved.
Identifying Topics in Microblogs Using Wikipedia.
Yıldırım, Ahmet; Üsküdarlı, Suzan; Özgür, Arzucan
2016-01-01
Twitter is an extremely high-volume platform for user-generated contributions on any topic. The wealth of content created in real time and in massive quantities calls for automated approaches to identify the topics of the contributions. Such topics can be utilized in numerous ways, such as public opinion mining, marketing, entertainment, and disaster management. Towards this end, approaches to relate single or partial posts to knowledge base items have been proposed. However, in microblogging systems like Twitter, topics emerge from the culmination of a large number of contributions. Therefore, it is necessary to identify topics based on collections of posts, where individual posts contribute to some aspect of the greater topic. Models such as Latent Dirichlet Allocation (LDA) propose algorithms for relating collections of posts to sets of keywords that represent underlying topics. In these approaches, determining which specific topic(s) the keyword sets represent remains a separate task. Another issue in topic detection is the scope, which is often limited to a specific domain, such as health. This work proposes an approach for identifying domain-independent specific topics related to sets of posts. In this approach, individual posts are processed and then aggregated to identify key tokens, which are then mapped to specific topics. Wikipedia article titles are selected to represent topics, since they are up to date, user-generated, sophisticated articles that span topics of human interest. This paper describes the proposed approach, a prototype implementation, and a case study based on data gathered during the heavily contributed periods corresponding to the four US election debates in 2012. The manually evaluated results (0.96 precision) and other observations from the study are discussed in detail.
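The pipeline described here (process individual posts, aggregate them into key tokens, map the tokens to Wikipedia article titles) can be sketched as follows. This is only an assumed simplification: the posts are invented, the stopword list and the `candidate_titles` list are hypothetical in-memory stand-ins for real preprocessing and a real title index, and exact word matching replaces whatever mapping the prototype actually uses.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "the", "on", "of", "in", "to", "is", "his"}


def key_tokens(posts, top_k):
    """Count in how many posts each token appears and keep the most frequent."""
    post_counts = Counter()
    for post in posts:
        tokens = set(re.findall(r"[a-z]+", post.lower())) - STOPWORDS
        post_counts.update(tokens)
    return [token for token, _ in post_counts.most_common(top_k)]


def map_to_titles(tokens, titles):
    """Keep titles whose words all occur among the aggregated key tokens."""
    token_set = set(tokens)
    matched = []
    for title in titles:
        title_words = set(re.findall(r"[a-z]+", title.lower()))
        if title_words and title_words <= token_set:
            matched.append(title)
    return matched


# Invented posts and a hypothetical list of candidate article titles.
posts = [
    "Obama and Romney debate the economy",
    "Romney on the economy and taxes",
    "The economy dominates the debate",
    "Obama defends his economy record",
]
candidate_titles = ["Economy", "Debate", "Foreign policy"]
topics = map_to_titles(key_tokens(posts, top_k=4), candidate_titles)
print(topics)
```

The point of aggregating first is visible even in this toy case: no single post establishes a topic, but tokens that recur across the collection do.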