Frontiers of biomedical text mining: current progress
Zweigenbaum, Pierre; Demner-Fushman, Dina; Yu, Hong; Cohen, Kevin B.
2008-01-01
It is now almost 15 years since the publication of the first paper on text mining in the genomics domain, and decades since the first paper on text mining in the medical domain. Enormous progress has been made in the areas of information retrieval, evaluation methodologies and resource construction. Some problems, such as abbreviation-handling, can essentially be considered solved problems, and others, such as identification of gene mentions in text, seem likely to be solved soon. However, a number of problems at the frontiers of biomedical text mining continue to present interesting challenges and opportunities for great improvements and interesting research. In this article we review the current state of the art in biomedical text mining or ‘BioNLP’ in general, focusing primarily on papers published within the past year. PMID:17977867
Using Text Mining to Uncover Students' Technology-Related Problems in Live Video Streaming
ERIC Educational Resources Information Center
Abdous, M'hammed; He, Wu
2011-01-01
Because of their capacity to sift through large amounts of data, text mining and data mining are enabling higher education institutions to reveal valuable patterns in students' learning behaviours without having to resort to traditional survey methods. In an effort to uncover live video streaming (LVS) students' technology related-problems and to…
ERIC Educational Resources Information Center
Walkington, Candace; Clinton, Virginia; Ritter, Steven N.; Nathan, Mitchell J.
2015-01-01
Solving mathematics story problems requires text comprehension skills. However, previous studies have found few connections between traditional measures of text readability and performance on story problems. We hypothesized that recently developed measures of readability and topic incidence measured by text-mining tools may illuminate associations…
PubRunner: A light-weight framework for updating text mining results.
Anekalla, Kishore R; Courneya, J P; Fiorini, Nicolas; Lever, Jake; Muchow, Michael; Busby, Ben
2017-01-01
Biomedical text mining promises to assist biologists in quickly navigating the combined knowledge in their domain. This would allow improved understanding of the complex interactions within biological systems and faster hypothesis generation. New biomedical research articles are published daily and text mining tools are only as good as the corpus from which they work. Many text mining tools are underused because their results are static and do not reflect the constantly expanding knowledge in the field. In order for biomedical text mining to become an indispensable tool used by researchers, this problem must be addressed. To this end, we present PubRunner, a framework for regularly running text mining tools on the latest publications. PubRunner is lightweight, simple to use, and can be integrated with an existing text mining tool. The workflow involves downloading the latest abstracts from PubMed, executing a user-defined tool, pushing the resulting data to a public FTP or Zenodo dataset, and publicizing the location of these results on the public PubRunner website. We illustrate the use of this tool by re-running the commonly used word2vec tool on the latest PubMed abstracts to generate up-to-date word vector representations for the biomedical domain. This shows a proof of concept that we hope will encourage text mining developers to build tools that truly will aid biologists in exploring the latest publications.
Using Syntactic Patterns to Enhance Text Analytics
ERIC Educational Resources Information Center
Meyer, Bradley B.
2017-01-01
Large scale product and service reviews proliferate and are commonly found across the web. The ability to harvest, digest and analyze a large corpus of reviews from online websites is still however a difficult problem. This problem is referred to as "opinion mining." Opinion mining is an important area of research as advances in the…
Application of text mining in the biomedical domain.
Fleuren, Wilco W M; Alkema, Wynand
2015-03-01
In recent years the amount of experimental data that is produced in biomedical research and the number of papers that are being published in this field have grown rapidly. In order to keep up to date with developments in their field of interest and to interpret the outcome of experiments in light of all available literature, researchers turn more and more to the use of automated literature mining. As a consequence, text mining tools have evolved considerably in number and quality and nowadays can be used to address a variety of research questions ranging from de novo drug target discovery to enhanced biological interpretation of the results from high throughput experiments. In this paper we introduce the most important techniques that are used for a text mining and give an overview of the text mining tools that are currently being used and the type of problems they are typically applied for. Copyright © 2015 Elsevier Inc. All rights reserved.
Chapter 16: text mining for translational bioinformatics.
Cohen, K Bretonnel; Hunter, Lawrence E
2013-04-01
Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research-translating basic science results into new interventions-and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects, and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.
Semantic Annotation of Complex Text Structures in Problem Reports
NASA Technical Reports Server (NTRS)
Malin, Jane T.; Throop, David R.; Fleming, Land D.
2011-01-01
Text analysis is important for effective information retrieval from databases where the critical information is embedded in text fields. Aerospace safety depends on effective retrieval of relevant and related problem reports for the purpose of trend analysis. The complex text syntax in problem descriptions has limited statistical text mining of problem reports. The presentation describes an intelligent tagging approach that applies syntactic and then semantic analysis to overcome this problem. The tags identify types of problems and equipment that are embedded in the text descriptions. The power of these tags is illustrated in a faceted searching and browsing interface for problem report trending that combines automatically generated tags with database code fields and temporal information.
Application of text mining for customer evaluations in commercial banking
NASA Astrophysics Data System (ADS)
Tan, Jing; Du, Xiaojiang; Hao, Pengpeng; Wang, Yanbo J.
2015-07-01
Nowadays customer attrition is increasingly serious in commercial banks. To combat this problem roundly, mining customer evaluation texts is as important as mining customer structured data. In order to extract hidden information from customer evaluations, Textual Feature Selection, Classification and Association Rule Mining are necessary techniques. This paper presents all three techniques by using Chinese Word Segmentation, C5.0 and Apriori, and a set of experiments were run based on a collection of real textual data that includes 823 customer evaluations taken from a Chinese commercial bank. Results, consequent solutions, some advice for the commercial bank are given in this paper.
A New Framework for Textual Information Mining over Parse Trees. CRESST Report 805
ERIC Educational Resources Information Center
Mousavi, Hamid; Kerr, Deirdre; Iseli, Markus R.
2011-01-01
Textual information mining is a challenging problem that has resulted in the creation of many different rule-based linguistic query languages. However, these languages generally are not optimized for the purpose of text mining. In other words, they usually consider queries as individuals and only return raw results for each query. Moreover they…
Ghazizadeh, Mahtab; McDonald, Anthony D; Lee, John D
2014-09-01
This study applies text mining to extract clusters of vehicle problems and associated trends from free-response data in the National Highway Traffic Safety Administration's vehicle owner's complaint database. As the automotive industry adopts new technologies, it is important to systematically assess the effect of these changes on traffic safety. Driving simulators, naturalistic driving data, and crash databases all contribute to a better understanding of how drivers respond to changing vehicle technology, but other approaches, such as automated analysis of incident reports, are needed. Free-response data from incidents representing two severity levels (fatal incidents and incidents involving injury) were analyzed using a text mining approach: latent semantic analysis (LSA). LSA and hierarchical clustering identified clusters of complaints for each severity level, which were compared and analyzed across time. Cluster analysis identified eight clusters of fatal incidents and six clusters of incidents involving injury. Comparisons showed that although the airbag clusters across the two severity levels have the same most frequent terms, the circumstances around the incidents differ. The time trends show clear increases in complaints surrounding the Ford/Firestone tire recall and the Toyota unintended acceleration recall. Increases in complaints may be partially driven by these recall announcements and the associated media attention. Text mining can reveal useful information from free-response databases that would otherwise be prohibitively time-consuming and difficult to summarize manually. Text mining can extend human analysis capabilities for large free-response databases to support earlier detection of problems and more timely safety interventions.
Lucini, Filipe R; S Fogliatto, Flavio; C da Silveira, Giovani J; L Neyeloff, Jeruza; Anzanello, Michel J; de S Kuchenbecker, Ricardo; D Schaan, Beatriz
2017-04-01
Emergency department (ED) overcrowding is a serious issue for hospitals. Early information on short-term inward bed demand from patients receiving care at the ED may reduce the overcrowding problem, and optimize the use of hospital resources. In this study, we use text mining methods to process data from early ED patient records using the SOAP framework, and predict future hospitalizations and discharges. We try different approaches for pre-processing of text records and to predict hospitalization. Sets-of-words are obtained via binary representation, term frequency, and term frequency-inverse document frequency. Unigrams, bigrams and trigrams are tested for feature formation. Feature selection is based on χ 2 and F-score metrics. In the prediction module, eight text mining methods are tested: Decision Tree, Random Forest, Extremely Randomized Tree, AdaBoost, Logistic Regression, Multinomial Naïve Bayes, Support Vector Machine (Kernel linear) and Nu-Support Vector Machine (Kernel linear). Prediction performance is evaluated by F1-scores. Precision and Recall values are also informed for all text mining methods tested. Nu-Support Vector Machine was the text mining method with the best overall performance. Its average F1-score in predicting hospitalization was 77.70%, with a standard deviation (SD) of 0.66%. The method could be used to manage daily routines in EDs such as capacity planning and resource allocation. Text mining could provide valuable information and facilitate decision-making by inward bed management teams. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
Weighted mining of massive collections of [Formula: see text]-values by convex optimization.
Dobriban, Edgar
2018-06-01
Researchers in data-rich disciplines-think of computational genomics and observational cosmology-often wish to mine large bodies of [Formula: see text]-values looking for significant effects, while controlling the false discovery rate or family-wise error rate. Increasingly, researchers also wish to prioritize certain hypotheses, for example, those thought to have larger effect sizes, by upweighting, and to impose constraints on the underlying mining, such as monotonicity along a certain sequence. We introduce Princessp , a principled method for performing weighted multiple testing by constrained convex optimization. Our method elegantly allows one to prioritize certain hypotheses through upweighting and to discount others through downweighting, while constraining the underlying weights involved in the mining process. When the [Formula: see text]-values derive from monotone likelihood ratio families such as the Gaussian means model, the new method allows exact solution of an important optimal weighting problem previously thought to be non-convex and computationally infeasible. Our method scales to massive data set sizes. We illustrate the applications of Princessp on a series of standard genomics data sets and offer comparisons with several previous 'standard' methods. Princessp offers both ease of operation and the ability to scale to extremely large problem sizes. The method is available as open-source software from github.com/dobriban/pvalue_weighting_matlab (accessed 11 October 2017).
A Framework for Text Mining in Scientometric Study: A Case Study in Biomedicine Publications
NASA Astrophysics Data System (ADS)
Silalahi, V. M. M.; Hardiyati, R.; Nadhiroh, I. M.; Handayani, T.; Rahmaida, R.; Amelia, M.
2018-04-01
The data of Indonesians research publications in the domain of biomedicine has been collected to be text mined for the purpose of a scientometric study. The goal is to build a predictive model that provides a classification of research publications on the potency for downstreaming. The model is based on the drug development processes adapted from the literatures. An effort is described to build the conceptual model and the development of a corpus on the research publications in the domain of Indonesian biomedicine. Then an investigation is conducted relating to the problems associated with building a corpus and validating the model. Based on our experience, a framework is proposed to manage the scientometric study based on text mining. Our method shows the effectiveness of conducting a scientometric study based on text mining in order to get a valid classification model. This valid model is mainly supported by the iterative and close interactions with the domain experts starting from identifying the issues, building a conceptual model, to the labelling, validation and results interpretation.
Mining Quality Phrases from Massive Text Corpora
Liu, Jialu; Shang, Jingbo; Wang, Chi; Ren, Xiang; Han, Jiawei
2015-01-01
Text data are ubiquitous and play an essential role in big data applications. However, text data are mostly unstructured. Transforming unstructured text into structured units (e.g., semantically meaningful phrases) will substantially reduce semantic ambiguity and enhance the power and efficiency at manipulating such data using database technology. Thus mining quality phrases is a critical research problem in the field of databases. In this paper, we propose a new framework that extracts quality phrases from text corpora integrated with phrasal segmentation. The framework requires only limited training but the quality of phrases so generated is close to human judgment. Moreover, the method is scalable: both computation time and required space grow linearly as corpus size increases. Our experiments on large text corpora demonstrate the quality and efficiency of the new method. PMID:26705375
Neural networks for data mining electronic text collections
NASA Astrophysics Data System (ADS)
Walker, Nicholas; Truman, Gregory
1997-04-01
The use of neural networks in information retrieval and text analysis has primarily suffered from the issues of adequate document representation, the ability to scale to very large collections, dynamism in the face of new information and the practical difficulties of basing the design on the use of supervised training sets. Perhaps the most important approach to begin solving these problems is the use of `intermediate entities' which reduce the dimensionality of document representations and the size of documents collections to manageable levels coupled with the use of unsupervised neural network paradigms. This paper describes the issues, a fully configured neural network-based text analysis system--dataHARVEST--aimed at data mining text collections which begins this process, along with the remaining difficulties and potential ways forward.
Mining protein function from text using term-based support vector machines
Rice, Simon B; Nenadic, Goran; Stapley, Benjamin J
2005-01-01
Background Text mining has spurred huge interest in the domain of biology. The goal of the BioCreAtIvE exercise was to evaluate the performance of current text mining systems. We participated in Task 2, which addressed assigning Gene Ontology terms to human proteins and selecting relevant evidence from full-text documents. We approached it as a modified form of the document classification task. We used a supervised machine-learning approach (based on support vector machines) to assign protein function and select passages that support the assignments. As classification features, we used a protein's co-occurring terms that were automatically extracted from documents. Results The results evaluated by curators were modest, and quite variable for different problems: in many cases we have relatively good assignment of GO terms to proteins, but the selected supporting text was typically non-relevant (precision spanning from 3% to 50%). The method appears to work best when a substantial set of relevant documents is obtained, while it works poorly on single documents and/or short passages. The initial results suggest that our approach can also mine annotations from text even when an explicit statement relating a protein to a GO term is absent. Conclusion A machine learning approach to mining protein function predictions from text can yield good performance only if sufficient training data is available, and significant amount of supporting data is used for prediction. The most promising results are for combined document retrieval and GO term assignment, which calls for the integration of methods developed in BioCreAtIvE Task 1 and Task 2. PMID:15960835
Takeda, Kayoko; Takahashi, Kiyoshi; Masukawa, Hiroyuki; Shimamori, Yoshimitsu
2017-01-01
Recently, the practice of active learning has spread, increasingly recognized as an essential component of academic studies. Classes incorporating small group discussion (SGD) are conducted at many universities. At present, assessments of the effectiveness of SGD have mostly involved evaluation by questionnaires conducted by teachers, by peer assessment, and by self-evaluation of students. However, qualitative data, such as open-ended descriptions by students, have not been widely evaluated. As a result, we have been unable to analyze the processes and methods involved in how students acquire knowledge in SGD. In recent years, due to advances in information and communication technology (ICT), text mining has enabled the analysis of qualitative data. We therefore investigated whether the introduction of a learning system comprising the jigsaw method and problem-based learning (PBL) would improve student attitudes toward learning; we did this by text mining analysis of the content of student reports. We found that by applying the jigsaw method before PBL, we were able to improve student attitudes toward learning and increase the depth of their understanding of the area of study as a result of working with others. The use of text mining to analyze qualitative data also allowed us to understand the processes and methods by which students acquired knowledge in SGD and also changes in students' understanding and performance based on improvements to the class. This finding suggests that the use of text mining to analyze qualitative data could enable teachers to evaluate the effectiveness of various methods employed to improve learning.
Experiences with Text Mining Large Collections of Unstructured Systems Development Artifacts at JPL
NASA Technical Reports Server (NTRS)
Port, Dan; Nikora, Allen; Hihn, Jairus; Huang, LiGuo
2011-01-01
Often repositories of systems engineering artifacts at NASA's Jet Propulsion Laboratory (JPL) are so large and poorly structured that they have outgrown our capability to effectively manually process their contents to extract useful information. Sophisticated text mining methods and tools seem a quick, low-effort approach to automating our limited manual efforts. Our experiences of exploring such methods mainly in three areas including historical risk analysis, defect identification based on requirements analysis, and over-time analysis of system anomalies at JPL, have shown that obtaining useful results requires substantial unanticipated efforts - from preprocessing the data to transforming the output for practical applications. We have not observed any quick 'wins' or realized benefit from short-term effort avoidance through automation in this area. Surprisingly we have realized a number of unexpected long-term benefits from the process of applying text mining to our repositories. This paper elaborates some of these benefits and our important lessons learned from the process of preparing and applying text mining to large unstructured system artifacts at JPL aiming to benefit future TM applications in similar problem domains and also in hope for being extended to broader areas of applications.
PPInterFinder--a mining tool for extracting causal relations on human proteins from literature.
Raja, Kalpana; Subramani, Suresh; Natarajan, Jeyakumar
2013-01-01
One of the most common and challenging problem in biomedical text mining is to mine protein-protein interactions (PPIs) from MEDLINE abstracts and full-text research articles because PPIs play a major role in understanding the various biological processes and the impact of proteins in diseases. We implemented, PPInterFinder--a web-based text mining tool to extract human PPIs from biomedical literature. PPInterFinder uses relation keyword co-occurrences with protein names to extract information on PPIs from MEDLINE abstracts and consists of three phases. First, it identifies the relation keyword using a parser with Tregex and a relation keyword dictionary. Next, it automatically identifies the candidate PPI pairs with a set of rules related to PPI recognition. Finally, it extracts the relations by matching the sentence with a set of 11 specific patterns based on the syntactic nature of PPI pair. We find that PPInterFinder is capable of predicting PPIs with the accuracy of 66.05% on AIMED corpus and outperforms most of the existing systems. DATABASE URL: http://www.biomining-bu.in/ppinterfinder/
PPInterFinder—a mining tool for extracting causal relations on human proteins from literature
Raja, Kalpana; Subramani, Suresh; Natarajan, Jeyakumar
2013-01-01
One of the most common and challenging problem in biomedical text mining is to mine protein–protein interactions (PPIs) from MEDLINE abstracts and full-text research articles because PPIs play a major role in understanding the various biological processes and the impact of proteins in diseases. We implemented, PPInterFinder—a web-based text mining tool to extract human PPIs from biomedical literature. PPInterFinder uses relation keyword co-occurrences with protein names to extract information on PPIs from MEDLINE abstracts and consists of three phases. First, it identifies the relation keyword using a parser with Tregex and a relation keyword dictionary. Next, it automatically identifies the candidate PPI pairs with a set of rules related to PPI recognition. Finally, it extracts the relations by matching the sentence with a set of 11 specific patterns based on the syntactic nature of PPI pair. We find that PPInterFinder is capable of predicting PPIs with the accuracy of 66.05% on AIMED corpus and outperforms most of the existing systems. Database URL: http://www.biomining-bu.in/ppinterfinder/ PMID:23325628
NASA Technical Reports Server (NTRS)
Srivastava, Ashok; McIntosh, Dawn; Castle, Pat; Pontikakis, Manos; Diev, Vesselin; Zane-Ulman, Brett; Turkov, Eugene; Akella, Ram; Xu, Zuobing; Kumaresan, Sakthi Preethi
2006-01-01
This viewgraph document describes the data mining system developed at NASA Ames. Many NASA programs have large numbers (and types) of problem reports.These free text reports are written by a number of different people, thus the emphasis and wording vary considerably With so much data to sift through, analysts (subject experts) need help identifying any possible safety issues or concerns and help them confirm that they haven't missed important problems. Unsupervised clustering is the initial step to accomplish this; We think we can go much farther, specifically, identify possible recurring anomalies. Recurring anomalies may be indicators of larger systemic problems. The requirement to identify these anomalies has led to the development of Recurring Anomaly Discovery System (ReADS).
NASA Astrophysics Data System (ADS)
Tirupattur, Naveen; Lapish, Christopher C.; Mukhopadhyay, Snehasis
2011-06-01
Text mining, sometimes alternately referred to as text analytics, refers to the process of extracting high-quality knowledge from the analysis of textual data. Text mining has wide variety of applications in areas such as biomedical science, news analysis, and homeland security. In this paper, we describe an approach and some relatively small-scale experiments which apply text mining to neuroscience research literature to find novel associations among a diverse set of entities. Neuroscience is a discipline which encompasses an exceptionally wide range of experimental approaches and rapidly growing interest. This combination results in an overwhelmingly large and often diffuse literature which makes a comprehensive synthesis difficult. Understanding the relations or associations among the entities appearing in the literature not only improves the researchers current understanding of recent advances in their field, but also provides an important computational tool to formulate novel hypotheses and thereby assist in scientific discoveries. We describe a methodology to automatically mine the literature and form novel associations through direct analysis of published texts. The method first retrieves a set of documents from databases such as PubMed using a set of relevant domain terms. In the current study these terms yielded a set of documents ranging from 160,909 to 367,214 documents. Each document is then represented in a numerical vector form from which an Association Graph is computed which represents relationships between all pairs of domain terms, based on co-occurrence. Association graphs can then be subjected to various graph theoretic algorithms such as transitive closure and cycle (circuit) detection to derive additional information, and can also be visually presented to a human researcher for understanding. In this paper, we present three relatively small-scale problem-specific case studies to demonstrate that such an approach is very successful in replicating a neuroscience expert's mental model of object-object associations entirely by means of text mining. These preliminary results provide the confidence that this type of text mining based research approach provides an extremely powerful tool to better understand the literature and drive novel discovery for the neuroscience community.
@Note: a workbench for biomedical text mining.
Lourenço, Anália; Carreira, Rafael; Carneiro, Sónia; Maia, Paulo; Glez-Peña, Daniel; Fdez-Riverola, Florentino; Ferreira, Eugénio C; Rocha, Isabel; Rocha, Miguel
2009-08-01
Biomedical Text Mining (BioTM) is providing valuable approaches to the automated curation of scientific literature. However, most efforts have addressed the benchmarking of new algorithms rather than user operational needs. Bridging the gap between BioTM researchers and biologists' needs is crucial to solve real-world problems and promote further research. We present @Note, a platform for BioTM that aims at the effective translation of the advances between three distinct classes of users: biologists, text miners and software developers. Its main functional contributions are the ability to process abstracts and full-texts; an information retrieval module enabling PubMed search and journal crawling; a pre-processing module with PDF-to-text conversion, tokenisation and stopword removal; a semantic annotation schema; a lexicon-based annotator; a user-friendly annotation view that allows to correct annotations and a Text Mining Module supporting dataset preparation and algorithm evaluation. @Note improves the interoperability, modularity and flexibility when integrating in-home and open-source third-party components. Its component-based architecture allows the rapid development of new applications, emphasizing the principles of transparency and simplicity of use. Although it is still on-going, it has already allowed the development of applications that are currently being used.
O'Mara-Eves, Alison; Thomas, James; McNaught, John; Miwa, Makoto; Ananiadou, Sophia
2015-01-14
The large and growing number of published studies, and their increasing rate of publication, makes the task of identifying relevant studies in an unbiased way for inclusion in systematic reviews both complex and time consuming. Text mining has been offered as a potential solution: through automating some of the screening process, reviewer time can be saved. The evidence base around the use of text mining for screening has not yet been pulled together systematically; this systematic review fills that research gap. Focusing mainly on non-technical issues, the review aims to increase awareness of the potential of these technologies and promote further collaborative research between the computer science and systematic review communities. Five research questions led our review: what is the state of the evidence base; how has workload reduction been evaluated; what are the purposes of semi-automation and how effective are they; how have key contextual problems of applying text mining to the systematic review field been addressed; and what challenges to implementation have emerged? We answered these questions using standard systematic review methods: systematic and exhaustive searching, quality-assured data extraction and a narrative synthesis to synthesise findings. The evidence base is active and diverse; there is almost no replication between studies or collaboration between research teams and, whilst it is difficult to establish any overall conclusions about best approaches, it is clear that efficiencies and reductions in workload are potentially achievable. On the whole, most suggested that a saving in workload of between 30% and 70% might be possible, though sometimes the saving in workload is accompanied by the loss of 5% of relevant studies (i.e. a 95% recall). Using text mining to prioritise the order in which items are screened should be considered safe and ready for use in 'live' reviews. The use of text mining as a 'second screener' may also be used cautiously. The use of text mining to eliminate studies automatically should be considered promising, but not yet fully proven. In highly technical/clinical areas, it may be used with a high degree of confidence; but more developmental and evaluative work is needed in other disciplines.
Text mining for metabolic pathways, signaling cascades, and protein networks.
Hoffmann, Robert; Krallinger, Martin; Andres, Eduardo; Tamames, Javier; Blaschke, Christian; Valencia, Alfonso
2005-05-10
The complexity of the information stored in databases and publications on metabolic and signaling pathways, the high throughput of experimental data, and the growing number of publications make it imperative to provide systems to help the researcher navigate through these interrelated information resources. Text-mining methods have started to play a key role in the creation and maintenance of links between the information stored in biological databases and its original sources in the literature. These links will be extremely useful for database updating and curation, especially if a number of technical problems can be solved satisfactorily, including the identification of protein and gene names (entities in general) and the characterization of their types of interactions. The first generation of openly accessible text-mining systems, such as iHOP (Information Hyperlinked over Proteins), provides additional functions to facilitate the reconstruction of protein interaction networks, combine database and text information, and support the scientist in the formulation of novel hypotheses. The next challenge is the generation of comprehensive information regarding the general function of signaling pathways and protein interaction networks.
Unapparent Information Revelation: Text Mining for Counterterrorism
NASA Astrophysics Data System (ADS)
Srihari, Rohini K.
Unapparent information revelation (UIR) is a special case of text mining that focuses on detecting possible links between concepts across multiple text documents by generating an evidence trail explaining the connection. A traditional search involving, for example, two or more person names will attempt to find documents mentioning both these individuals. This research focuses on a different interpretation of such a query: what is the best evidence trail across documents that explains a connection between these individuals? For example, all may be good golfers. A generalization of this task involves query terms representing general concepts (e.g. indictment, foreign policy). Previous approaches to this problem have focused on graph mining involving hyperlinked documents, and link analysis exploiting named entities. A new robust framework is presented, based on (i) generating concept chain graphs, a hybrid content representation, (ii) performing graph matching to select candidate subgraphs, and (iii) subsequently using graphical models to validate hypotheses using ranked evidence trails. We adapt the DUC data set for cross-document summarization to evaluate evidence trails generated by this approach
Percha, Bethany; Altman, Russ B
2013-01-01
The biomedical literature presents a uniquely challenging text mining problem. Sentences are long and complex, the subject matter is highly specialized with a distinct vocabulary, and producing annotated training data for this domain is time consuming and expensive. In this environment, unsupervised text mining methods that do not rely on annotated training data are valuable. Here we investigate the use of random indexing, an automated method for producing vector-space semantic representations of words from large, unlabeled corpora, to address the problem of term normalization in sentences describing drugs and genes. We show that random indexing produces similarity scores that capture some of the structure of PHARE, a manually curated ontology of pharmacogenomics concepts. We further show that random indexing can be used to identify likely word candidates for inclusion in the ontology, and can help localize these new labels among classes and roles within the ontology.
Percha, Bethany; Altman, Russ B.
2013-01-01
The biomedical literature presents a uniquely challenging text mining problem. Sentences are long and complex, the subject matter is highly specialized with a distinct vocabulary, and producing annotated training data for this domain is time consuming and expensive. In this environment, unsupervised text mining methods that do not rely on annotated training data are valuable. Here we investigate the use of random indexing, an automated method for producing vector-space semantic representations of words from large, unlabeled corpora, to address the problem of term normalization in sentences describing drugs and genes. We show that random indexing produces similarity scores that capture some of the structure of PHARE, a manually curated ontology of pharmacogenomics concepts. We further show that random indexing can be used to identify likely word candidates for inclusion in the ontology, and can help localize these new labels among classes and roles within the ontology. PMID:24551397
DiMeX: A Text Mining System for Mutation-Disease Association Extraction.
Mahmood, A S M Ashique; Wu, Tsung-Jung; Mazumder, Raja; Vijay-Shanker, K
2016-01-01
The number of published articles describing associations between mutations and diseases is increasing at a fast pace. There is a pressing need to gather such mutation-disease associations into public knowledge bases, but manual curation slows down the growth of such databases. We have addressed this problem by developing a text-mining system (DiMeX) to extract mutation to disease associations from publication abstracts. DiMeX consists of a series of natural language processing modules that preprocess input text and apply syntactic and semantic patterns to extract mutation-disease associations. DiMeX achieves high precision and recall with F-scores of 0.88, 0.91 and 0.89 when evaluated on three different datasets for mutation-disease associations. DiMeX includes a separate component that extracts mutation mentions in text and associates them with genes. This component has been also evaluated on different datasets and shown to achieve state-of-the-art performance. The results indicate that our system outperforms the existing mutation-disease association tools, addressing the low precision problems suffered by most approaches. DiMeX was applied on a large set of abstracts from Medline to extract mutation-disease associations, as well as other relevant information including patient/cohort size and population data. The results are stored in a database that can be queried and downloaded at http://biotm.cis.udel.edu/dimex/. We conclude that this high-throughput text-mining approach has the potential to significantly assist researchers and curators to enrich mutation databases.
DiMeX: A Text Mining System for Mutation-Disease Association Extraction
Mahmood, A. S. M. Ashique; Wu, Tsung-Jung; Mazumder, Raja; Vijay-Shanker, K.
2016-01-01
The number of published articles describing associations between mutations and diseases is increasing at a fast pace. There is a pressing need to gather such mutation-disease associations into public knowledge bases, but manual curation slows down the growth of such databases. We have addressed this problem by developing a text-mining system (DiMeX) to extract mutation to disease associations from publication abstracts. DiMeX consists of a series of natural language processing modules that preprocess input text and apply syntactic and semantic patterns to extract mutation-disease associations. DiMeX achieves high precision and recall with F-scores of 0.88, 0.91 and 0.89 when evaluated on three different datasets for mutation-disease associations. DiMeX includes a separate component that extracts mutation mentions in text and associates them with genes. This component has been also evaluated on different datasets and shown to achieve state-of-the-art performance. The results indicate that our system outperforms the existing mutation-disease association tools, addressing the low precision problems suffered by most approaches. DiMeX was applied on a large set of abstracts from Medline to extract mutation-disease associations, as well as other relevant information including patient/cohort size and population data. The results are stored in a database that can be queried and downloaded at http://biotm.cis.udel.edu/dimex/. We conclude that this high-throughput text-mining approach has the potential to significantly assist researchers and curators to enrich mutation databases. PMID:27073839
2010-01-01
Background An increase in work on the full text of journal articles and the growth of PubMedCentral have the opportunity to create a major paradigm shift in how biomedical text mining is done. However, until now there has been no comprehensive characterization of how the bodies of full text journal articles differ from the abstracts that until now have been the subject of most biomedical text mining research. Results We examined the structural and linguistic aspects of abstracts and bodies of full text articles, the performance of text mining tools on both, and the distribution of a variety of semantic classes of named entities between them. We found marked structural differences, with longer sentences in the article bodies and much heavier use of parenthesized material in the bodies than in the abstracts. We found content differences with respect to linguistic features. Three out of four of the linguistic features that we examined were statistically significantly differently distributed between the two genres. We also found content differences with respect to the distribution of semantic features. There were significantly different densities per thousand words for three out of four semantic classes, and clear differences in the extent to which they appeared in the two genres. With respect to the performance of text mining tools, we found that a mutation finder performed equally well in both genres, but that a wide variety of gene mention systems performed much worse on article bodies than they did on abstracts. POS tagging was also more accurate in abstracts than in article bodies. Conclusions Aspects of structure and content differ markedly between article abstracts and article bodies. A number of these differences may pose problems as the text mining field moves more into the area of processing full-text articles. However, these differences also present a number of opportunities for the extraction of data types, particularly that found in parenthesized text, that is present in article bodies but not in article abstracts. PMID:20920264
Information Gain Based Dimensionality Selection for Classifying Text Documents
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dumidu Wijayasekara; Milos Manic; Miles McQueen
2013-06-01
Selecting the optimal dimensions for various knowledge extraction applications is an essential component of data mining. Dimensionality selection techniques are utilized in classification applications to increase the classification accuracy and reduce the computational complexity. In text classification, where the dimensionality of the dataset is extremely high, dimensionality selection is even more important. This paper presents a novel, genetic algorithm based methodology, for dimensionality selection in text mining applications that utilizes information gain. The presented methodology uses information gain of each dimension to change the mutation probability of chromosomes dynamically. Since the information gain is calculated a priori, the computational complexitymore » is not affected. The presented method was tested on a specific text classification problem and compared with conventional genetic algorithm based dimensionality selection. The results show an improvement of 3% in the true positives and 1.6% in the true negatives over conventional dimensionality selection methods.« less
Text Mining in Biomedical Domain with Emphasis on Document Clustering.
Renganathan, Vinaitheerthan
2017-07-01
With the exponential increase in the number of articles published every year in the biomedical domain, there is a need to build automated systems to extract unknown information from the articles published. Text mining techniques enable the extraction of unknown knowledge from unstructured documents. This paper reviews text mining processes in detail and the software tools available to carry out text mining. It also reviews the roles and applications of text mining in the biomedical domain. Text mining processes, such as search and retrieval of documents, pre-processing of documents, natural language processing, methods for text clustering, and methods for text classification are described in detail. Text mining techniques can facilitate the mining of vast amounts of knowledge on a given topic from published biomedical research articles and draw meaningful conclusions that are not possible otherwise.
Text Mining in Biomedical Domain with Emphasis on Document Clustering
2017-01-01
Objectives With the exponential increase in the number of articles published every year in the biomedical domain, there is a need to build automated systems to extract unknown information from the articles published. Text mining techniques enable the extraction of unknown knowledge from unstructured documents. Methods This paper reviews text mining processes in detail and the software tools available to carry out text mining. It also reviews the roles and applications of text mining in the biomedical domain. Results Text mining processes, such as search and retrieval of documents, pre-processing of documents, natural language processing, methods for text clustering, and methods for text classification are described in detail. Conclusions Text mining techniques can facilitate the mining of vast amounts of knowledge on a given topic from published biomedical research articles and draw meaningful conclusions that are not possible otherwise. PMID:28875048
Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges
Singhal, Ayush; Leaman, Robert; Catlett, Natalie; Lemberger, Thomas; McEntyre, Johanna; Polson, Shawn; Xenarios, Ioannis; Arighi, Cecilia; Lu, Zhiyong
2016-01-01
Text mining in the biomedical sciences is rapidly transitioning from small-scale evaluation to large-scale application. In this article, we argue that text-mining technologies have become essential tools in real-world biomedical research. We describe four large scale applications of text mining, as showcased during a recent panel discussion at the BioCreative V Challenge Workshop. We draw on these applications as case studies to characterize common requirements for successfully applying text-mining techniques to practical biocuration needs. We note that system ‘accuracy’ remains a challenge and identify several additional common difficulties and potential research directions including (i) the ‘scalability’ issue due to the increasing need of mining information from millions of full-text articles, (ii) the ‘interoperability’ issue of integrating various text-mining systems into existing curation workflows and (iii) the ‘reusability’ issue on the difficulty of applying trained systems to text genres that are not seen previously during development. We then describe related efforts within the text-mining community, with a special focus on the BioCreative series of challenge workshops. We believe that focusing on the near-term challenges identified in this work will amplify the opportunities afforded by the continued adoption of text-mining tools. Finally, in order to sustain the curation ecosystem and have text-mining systems adopted for practical benefits, we call for increased collaboration between text-mining researchers and various stakeholders, including researchers, publishers and biocurators. PMID:28025348
Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges
Singhal, Ayush; Leaman, Robert; Catlett, Natalie; ...
2016-12-26
Text mining in the biomedical sciences is rapidly transitioning from small-scale evaluation to large-scale application. In this article, we argue that text-mining technologies have become essential tools in real-world biomedical research. We describe four large scale applications of text mining, as showcased during a recent panel discussion at the BioCreative V Challenge Workshop. We draw on these applications as case studies to characterize common requirements for successfully applying text-mining techniques to practical biocuration needs. We note that system ‘accuracy’ remains a challenge and identify several additional common difficulties and potential research directions including (i) the ‘scalability’ issue due to themore » increasing need of mining information from millions of full-text articles, (ii) the ‘interoperability’ issue of integrating various text-mining systems into existing curation workflows and (iii) the ‘reusability’ issue on the difficulty of applying trained systems to text genres that are not seen previously during development. We then describe related efforts within the text-mining community, with a special focus on the BioCreative series of challenge workshops. We believe that focusing on the near-term challenges identified in this work will amplify the opportunities afforded by the continued adoption of text-mining tools. In conclusion, in order to sustain the curation ecosystem and have text-mining systems adopted for practical benefits, we call for increased collaboration between text-mining researchers and various stakeholders, including researchers, publishers and biocurators.« less
Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges
DOE Office of Scientific and Technical Information (OSTI.GOV)
Singhal, Ayush; Leaman, Robert; Catlett, Natalie
Text mining in the biomedical sciences is rapidly transitioning from small-scale evaluation to large-scale application. In this article, we argue that text-mining technologies have become essential tools in real-world biomedical research. We describe four large scale applications of text mining, as showcased during a recent panel discussion at the BioCreative V Challenge Workshop. We draw on these applications as case studies to characterize common requirements for successfully applying text-mining techniques to practical biocuration needs. We note that system ‘accuracy’ remains a challenge and identify several additional common difficulties and potential research directions including (i) the ‘scalability’ issue due to themore » increasing need of mining information from millions of full-text articles, (ii) the ‘interoperability’ issue of integrating various text-mining systems into existing curation workflows and (iii) the ‘reusability’ issue on the difficulty of applying trained systems to text genres that are not seen previously during development. We then describe related efforts within the text-mining community, with a special focus on the BioCreative series of challenge workshops. We believe that focusing on the near-term challenges identified in this work will amplify the opportunities afforded by the continued adoption of text-mining tools. In conclusion, in order to sustain the curation ecosystem and have text-mining systems adopted for practical benefits, we call for increased collaboration between text-mining researchers and various stakeholders, including researchers, publishers and biocurators.« less
Classification of Traffic Related Short Texts to Analyse Road Problems in Urban Areas
NASA Astrophysics Data System (ADS)
Saldana-Perez, A. M. M.; Moreno-Ibarra, M.; Tores-Ruiz, M.
2017-09-01
The Volunteer Geographic Information (VGI) can be used to understand the urban dynamics. In the classification of traffic related short texts to analyze road problems in urban areas, a VGI data analysis is done over a social media's publications, in order to classify traffic events at big cities that modify the movement of vehicles and people through the roads, such as car accidents, traffic and closures. The classification of traffic events described in short texts is done by applying a supervised machine learning algorithm. In the approach users are considered as sensors which describe their surroundings and provide their geographic position at the social network. The posts are treated by a text mining process and classified into five groups. Finally, the classified events are grouped in a data corpus and geo-visualized in the study area, to detect the places with more vehicular problems.
Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges.
Singhal, Ayush; Leaman, Robert; Catlett, Natalie; Lemberger, Thomas; McEntyre, Johanna; Polson, Shawn; Xenarios, Ioannis; Arighi, Cecilia; Lu, Zhiyong
2016-01-01
Text mining in the biomedical sciences is rapidly transitioning from small-scale evaluation to large-scale application. In this article, we argue that text-mining technologies have become essential tools in real-world biomedical research. We describe four large scale applications of text mining, as showcased during a recent panel discussion at the BioCreative V Challenge Workshop. We draw on these applications as case studies to characterize common requirements for successfully applying text-mining techniques to practical biocuration needs. We note that system 'accuracy' remains a challenge and identify several additional common difficulties and potential research directions including (i) the 'scalability' issue due to the increasing need of mining information from millions of full-text articles, (ii) the 'interoperability' issue of integrating various text-mining systems into existing curation workflows and (iii) the 'reusability' issue on the difficulty of applying trained systems to text genres that are not seen previously during development. We then describe related efforts within the text-mining community, with a special focus on the BioCreative series of challenge workshops. We believe that focusing on the near-term challenges identified in this work will amplify the opportunities afforded by the continued adoption of text-mining tools. Finally, in order to sustain the curation ecosystem and have text-mining systems adopted for practical benefits, we call for increased collaboration between text-mining researchers and various stakeholders, including researchers, publishers and biocurators. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.
Text Mining for Adverse Drug Events: the Promise, Challenges, and State of the Art
Harpaz, Rave; Callahan, Alison; Tamang, Suzanne; Low, Yen; Odgers, David; Finlayson, Sam; Jung, Kenneth; LePendu, Paea; Shah, Nigam H.
2014-01-01
Text mining is the computational process of extracting meaningful information from large amounts of unstructured text. Text mining is emerging as a tool to leverage underutilized data sources that can improve pharmacovigilance, including the objective of adverse drug event detection and assessment. This article provides an overview of recent advances in pharmacovigilance driven by the application of text mining, and discusses several data sources—such as biomedical literature, clinical narratives, product labeling, social media, and Web search logs—that are amenable to text-mining for pharmacovigilance. Given the state of the art, it appears text mining can be applied to extract useful ADE-related information from multiple textual sources. Nonetheless, further research is required to address remaining technical challenges associated with the text mining methodologies, and to conclusively determine the relative contribution of each textual source to improving pharmacovigilance. PMID:25151493
ERIC Educational Resources Information Center
Trybula, Walter J.
1999-01-01
Reviews the state of research in text mining, focusing on newer developments. The intent is to describe the disparate investigations currently included under the term text mining and provide a cohesive structure for these efforts. A summary of research identifies key organizations responsible for pushing the development of text mining. A section…
Empirical advances with text mining of electronic health records.
Delespierre, T; Denormandie, P; Bar-Hen, A; Josseran, L
2017-08-22
Korian is a private group specializing in medical accommodations for elderly and dependent people. A professional data warehouse (DWH) established in 2010 hosts all of the residents' data. Inside this information system (IS), clinical narratives (CNs) were used only by medical staff as a residents' care linking tool. The objective of this study was to show that, through qualitative and quantitative textual analysis of a relatively small physiotherapy and well-defined CN sample, it was possible to build a physiotherapy corpus and, through this process, generate a new body of knowledge by adding relevant information to describe the residents' care and lives. Meaningful words were extracted through Standard Query Language (SQL) with the LIKE function and wildcards to perform pattern matching, followed by text mining and a word cloud using R® packages. Another step involved principal components and multiple correspondence analyses, plus clustering on the same residents' sample as well as on other health data using a health model measuring the residents' care level needs. By combining these techniques, physiotherapy treatments could be characterized by a list of constructed keywords, and the residents' health characteristics were built. Feeding defects or health outlier groups could be detected, physiotherapy residents' data and their health data were matched, and differences in health situations showed qualitative and quantitative differences in physiotherapy narratives. This textual experiment using a textual process in two stages showed that text mining and data mining techniques provide convenient tools to improve residents' health and quality of care by adding new, simple, useable data to the electronic health record (EHR). When used with a normalized physiotherapy problem list, text mining through information extraction (IE), named entity recognition (NER) and data mining (DM) can provide a real advantage to describe health care, adding new medical material and helping to integrate the EHR system into the health staff work environment.
Sahadevan, S; Hofmann-Apitius, M; Schellander, K; Tesfaye, D; Fluck, J; Friedrich, C M
2012-10-01
In biological research, establishing the prior art by searching and collecting information already present in the domain has equal importance as the experiments done. To obtain a complete overview about the relevant knowledge, researchers mainly rely on 2 major information sources: i) various biological databases and ii) scientific publications in the field. The major difference between the 2 information sources is that information from databases is available, typically well structured and condensed. The information content in scientific literature is vastly unstructured; that is, dispersed among the many different sections of scientific text. The traditional method of information extraction from scientific literature occurs by generating a list of relevant publications in the field of interest and manually scanning these texts for relevant information, which is very time consuming. It is more than likely that in using this "classical" approach the researcher misses some relevant information mentioned in the literature or has to go through biological databases to extract further information. Text mining and named entity recognition methods have already been used in human genomics and related fields as a solution to this problem. These methods can process and extract information from large volumes of scientific text. Text mining is defined as the automatic extraction of previously unknown and potentially useful information from text. Named entity recognition (NER) is defined as the method of identifying named entities (names of real world objects; for example, gene/protein names, drugs, enzymes) in text. In animal sciences, text mining and related methods have been briefly used in murine genomics and associated fields, leaving behind other fields of animal sciences, such as livestock genomics. The aim of this work was to develop an information retrieval platform in the livestock domain focusing on livestock publications and the recognition of relevant data from cattle and pigs. For this purpose, the rather noncomprehensive resources of pig and cattle gene and protein terminologies were enriched with orthologue synonyms, integrated in the NER platform, ProMiner, which is successfully used in human genomics domain. Based on the performance tests done, the present system achieved a fair performance with precision 0.64, recall 0.74, and F(1) measure of 0.69 in a test scenario based on cattle literature.
Biomedical text mining and its applications in cancer research.
Zhu, Fei; Patumcharoenpol, Preecha; Zhang, Cheng; Yang, Yang; Chan, Jonathan; Meechai, Asawin; Vongsangnak, Wanwipa; Shen, Bairong
2013-04-01
Cancer is a malignant disease that has caused millions of human deaths. Its study has a long history of well over 100years. There have been an enormous number of publications on cancer research. This integrated but unstructured biomedical text is of great value for cancer diagnostics, treatment, and prevention. The immense body and rapid growth of biomedical text on cancer has led to the appearance of a large number of text mining techniques aimed at extracting novel knowledge from scientific text. Biomedical text mining on cancer research is computationally automatic and high-throughput in nature. However, it is error-prone due to the complexity of natural language processing. In this review, we introduce the basic concepts underlying text mining and examine some frequently used algorithms, tools, and data sets, as well as assessing how much these algorithms have been utilized. We then discuss the current state-of-the-art text mining applications in cancer research and we also provide some resources for cancer text mining. With the development of systems biology, researchers tend to understand complex biomedical systems from a systems biology viewpoint. Thus, the full utilization of text mining to facilitate cancer systems biology research is fast becoming a major concern. To address this issue, we describe the general workflow of text mining in cancer systems biology and each phase of the workflow. We hope that this review can (i) provide a useful overview of the current work of this field; (ii) help researchers to choose text mining tools and datasets; and (iii) highlight how to apply text mining to assist cancer systems biology research. Copyright © 2012 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Xiaoyang, Zhong; Hong, Ren; Jingxin, Gao
2018-03-01
With the gradual maturity of the real estate market in China, urban housing prices are also better able to reflect changes in market demand and the commodity property of commercial housing has become more and more obvious. Many scholars in our country have made a lot of research on the factors that affect the price of commercial housing in the city and the number of related research papers increased rapidly. These scholars’ research results provide valuable wealth to solve the problem of urban housing price changes in our country. However, due to the huge amount of literature, the vast amount of information is submerged in the library and cannot be fully utilized. Text mining technology has been widely concerned and developed in the field of Humanities and Social Sciences in recent years. But through the text mining technology to obtain the influence factors on the price of urban commercial housing is still relatively rare. In this paper, the research results of the existing scholars were excavated by text mining algorithm based on support vector machine in order to further make full use of the current research results and to provide a reference for stabilizing housing prices.
ERIC Educational Resources Information Center
Yu, Chong Ho; Jannasch-Pennell, Angel; DiGangi, Samuel
2011-01-01
The objective of this article is to illustrate that text mining and qualitative research are epistemologically compatible. First, like many qualitative research approaches, such as grounded theory, text mining encourages open-mindedness and discourages preconceptions. Contrary to the popular belief that text mining is a linear and fully automated…
Text mining meets workflow: linking U-Compare with Taverna
Kano, Yoshinobu; Dobson, Paul; Nakanishi, Mio; Tsujii, Jun'ichi; Ananiadou, Sophia
2010-01-01
Summary: Text mining from the biomedical literature is of increasing importance, yet it is not easy for the bioinformatics community to create and run text mining workflows due to the lack of accessibility and interoperability of the text mining resources. The U-Compare system provides a wide range of bio text mining resources in a highly interoperable workflow environment where workflows can very easily be created, executed, evaluated and visualized without coding. We have linked U-Compare to Taverna, a generic workflow system, to expose text mining functionality to the bioinformatics community. Availability: http://u-compare.org/taverna.html, http://u-compare.org Contact: kano@is.s.u-tokyo.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20709690
A Node Linkage Approach for Sequential Pattern Mining
Navarro, Osvaldo; Cumplido, René; Villaseñor-Pineda, Luis; Feregrino-Uribe, Claudia; Carrasco-Ochoa, Jesús Ariel
2014-01-01
Sequential Pattern Mining is a widely addressed problem in data mining, with applications such as analyzing Web usage, examining purchase behavior, and text mining, among others. Nevertheless, with the dramatic increase in data volume, the current approaches prove inefficient when dealing with large input datasets, a large number of different symbols and low minimum supports. In this paper, we propose a new sequential pattern mining algorithm, which follows a pattern-growth scheme to discover sequential patterns. Unlike most pattern growth algorithms, our approach does not build a data structure to represent the input dataset, but instead accesses the required sequences through pseudo-projection databases, achieving better runtime and reducing memory requirements. Our algorithm traverses the search space in a depth-first fashion and only preserves in memory a pattern node linkage and the pseudo-projections required for the branch being explored at the time. Experimental results show that our new approach, the Node Linkage Depth-First Traversal algorithm (NLDFT), has better performance and scalability in comparison with state of the art algorithms. PMID:24933123
Survey of Natural Language Processing Techniques in Bioinformatics.
Zeng, Zhiqiang; Shi, Hua; Wu, Yun; Hong, Zhiling
2015-01-01
Informatics methods, such as text mining and natural language processing, are always involved in bioinformatics research. In this study, we discuss text mining and natural language processing methods in bioinformatics from two perspectives. First, we aim to search for knowledge on biology, retrieve references using text mining methods, and reconstruct databases. For example, protein-protein interactions and gene-disease relationship can be mined from PubMed. Then, we analyze the applications of text mining and natural language processing techniques in bioinformatics, including predicting protein structure and function, detecting noncoding RNA. Finally, numerous methods and applications, as well as their contributions to bioinformatics, are discussed for future use by text mining and natural language processing researchers.
Text Mining in Organizational Research
Kobayashi, Vladimer B.; Berkers, Hannah A.; Kismihók, Gábor; Den Hartog, Deanne N.
2017-01-01
Despite the ubiquity of textual data, so far few researchers have applied text mining to answer organizational research questions. Text mining, which essentially entails a quantitative approach to the analysis of (usually) voluminous textual data, helps accelerate knowledge discovery by radically increasing the amount data that can be analyzed. This article aims to acquaint organizational researchers with the fundamental logic underpinning text mining, the analytical stages involved, and contemporary techniques that may be used to achieve different types of objectives. The specific analytical techniques reviewed are (a) dimensionality reduction, (b) distance and similarity computing, (c) clustering, (d) topic modeling, and (e) classification. We describe how text mining may extend contemporary organizational research by allowing the testing of existing or new research questions with data that are likely to be rich, contextualized, and ecologically valid. After an exploration of how evidence for the validity of text mining output may be generated, we conclude the article by illustrating the text mining process in a job analysis setting using a dataset composed of job vacancies. PMID:29881248
Text Mining in Organizational Research.
Kobayashi, Vladimer B; Mol, Stefan T; Berkers, Hannah A; Kismihók, Gábor; Den Hartog, Deanne N
2018-07-01
Despite the ubiquity of textual data, so far few researchers have applied text mining to answer organizational research questions. Text mining, which essentially entails a quantitative approach to the analysis of (usually) voluminous textual data, helps accelerate knowledge discovery by radically increasing the amount data that can be analyzed. This article aims to acquaint organizational researchers with the fundamental logic underpinning text mining, the analytical stages involved, and contemporary techniques that may be used to achieve different types of objectives. The specific analytical techniques reviewed are (a) dimensionality reduction, (b) distance and similarity computing, (c) clustering, (d) topic modeling, and (e) classification. We describe how text mining may extend contemporary organizational research by allowing the testing of existing or new research questions with data that are likely to be rich, contextualized, and ecologically valid. After an exploration of how evidence for the validity of text mining output may be generated, we conclude the article by illustrating the text mining process in a job analysis setting using a dataset composed of job vacancies.
Knowledge Discovery from Massive Healthcare Claims Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chandola, Varun; Sukumar, Sreenivas R; Schryver, Jack C
The role of big data in addressing the needs of the present healthcare system in US and rest of the world has been echoed by government, private, and academic sectors. There has been a growing emphasis to explore the promise of big data analytics in tapping the potential of the massive healthcare data emanating from private and government health insurance providers. While the domain implications of such collaboration are well known, this type of data has been explored to a limited extent in the data mining community. The objective of this paper is two fold: first, we introduce the emergingmore » domain of big"healthcare claims data to the KDD community, and second, we describe the success and challenges that we encountered in analyzing this data using state of art analytics for massive data. Specically, we translate the problem of analyzing healthcare data into some of the most well-known analysis problems in the data mining community, social network analysis, text mining, and temporal analysis and higher order feature construction, and describe how advances within each of these areas can be leveraged to understand the domain of healthcare. Each case study illustrates a unique intersection of data mining and healthcare with a common objective of improving the cost-care ratio by mining for opportunities to improve healthcare operations and reducing hat seems to fall under fraud, waste,and abuse.« less
Advances in Machine Learning and Data Mining for Astronomy
NASA Astrophysics Data System (ADS)
Way, Michael J.; Scargle, Jeffrey D.; Ali, Kamal M.; Srivastava, Ashok N.
2012-03-01
Advances in Machine Learning and Data Mining for Astronomy documents numerous successful collaborations among computer scientists, statisticians, and astronomers who illustrate the application of state-of-the-art machine learning and data mining techniques in astronomy. Due to the massive amount and complexity of data in most scientific disciplines, the material discussed in this text transcends traditional boundaries between various areas in the sciences and computer science. The book's introductory part provides context to issues in the astronomical sciences that are also important to health, social, and physical sciences, particularly probabilistic and statistical aspects of classification and cluster analysis. The next part describes a number of astrophysics case studies that leverage a range of machine learning and data mining technologies. In the last part, developers of algorithms and practitioners of machine learning and data mining show how these tools and techniques are used in astronomical applications. With contributions from leading astronomers and computer scientists, this book is a practical guide to many of the most important developments in machine learning, data mining, and statistics. It explores how these advances can solve current and future problems in astronomy and looks at how they could lead to the creation of entirely new algorithms within the data mining community.
Text mining for adverse drug events: the promise, challenges, and state of the art.
Harpaz, Rave; Callahan, Alison; Tamang, Suzanne; Low, Yen; Odgers, David; Finlayson, Sam; Jung, Kenneth; LePendu, Paea; Shah, Nigam H
2014-10-01
Text mining is the computational process of extracting meaningful information from large amounts of unstructured text. It is emerging as a tool to leverage underutilized data sources that can improve pharmacovigilance, including the objective of adverse drug event (ADE) detection and assessment. This article provides an overview of recent advances in pharmacovigilance driven by the application of text mining, and discusses several data sources-such as biomedical literature, clinical narratives, product labeling, social media, and Web search logs-that are amenable to text mining for pharmacovigilance. Given the state of the art, it appears text mining can be applied to extract useful ADE-related information from multiple textual sources. Nonetheless, further research is required to address remaining technical challenges associated with the text mining methodologies, and to conclusively determine the relative contribution of each textual source to improving pharmacovigilance.
Cele, Emmanuel Nkosinathi; Maboeta, Mark
2016-01-01
An iron ore mine site in Swaziland is currently (2015) in a derelict state as a consequence of past (1964-1988) and present (2011 - current) iron ore mining operations. In order to control problems associated with mine wastes, the Swaziland Water Services Corporation (SWSC) recently (2013) proposed the application of biosolids in sites degraded by mining operations. It is thought that this practice could generally improve soil conditions and enhance plant reestablishment. More importantly, the SWSC foresees this as a potential solution to the biosolids disposal problems. In order to investigate the effects of biosolids and plants in soil physicochemical conditions of iron mine soils, we conducted two plant growth trials. Trial 1 consisted of tailings that received biosolids and topsoil (TUSB mix) while in trial 2, tailings received biosolids only (TB mix). In the two trials, the application rates of 0 (control), 10, 25, 50, 75 and 100 t ha(-1) were used. After 30 days of equilibration, 25 seeds of Cynodon dactylon were sown in each pot and thinned to 10 plants after 4 weeks. Plants were watered twice weekly and remained under greenhouse conditions for 12 weeks, subsequent to which soils were subjected to chemical analysis. According to the results obtained, there were significant improvements in soil parameters related to fertility such as organic matter (OM), water holding capacity (WHC), cation exchange capacity (CEC), ammonium [Formula: see text] , magnesium (Mg(2+)), calcium (Ca(2+)) and phosphorus ( [Formula: see text] ). With regard to heavy metals, biosolids led to significant increases in soil total concentrations of Cu, Zn, Cd, Hg and Pb. The higher concentrations of Zn and Cu in treated tailings compared to undisturbed adjacent soils are a cause for concern because in the field, this might work against the broader objectives of mine soil remediation, which include the recolonization of reclaimed sites by soil-dwelling organisms. Therefore, while biosolids contain important nutrients that may greatly improve physicochemical conditions and enhance vegetation reestablishment in mined soils, the threat of the build-up of higher levels of trace elements in treated tailings compared to surrounding adjacent soils must not be underestimated. Copyright © 2015 Elsevier Ltd. All rights reserved.
Plumlee, Geoffrey S.; Morman, Suzette A.
2011-01-01
Historical mining and mineral processing have been linked definitively to health problems resulting from occupational and environmental exposures to mine wastes. Modern mining and processing methods, when properly designed and implemented, prevent or greatly reduce potential environmental health impacts. However, particularly in developing countries, there are examples of health problems linked to recent mining. In other cases, recent mining has been blamed for health problems but no clear links have been found. The types and abundances of potential toxicants in mine wastes are predictably influenced by the geologic characteristics of the deposit being mined. Hence, Earth scientists can help understand, anticipate, and mitigate potential health issues associated with mining and mineral processing.
Health Terrain: Visualizing Large Scale Health Data
2015-12-01
Text mining ; Data mining . 16. SECURITY CLASSIFICATION OF: 17... text mining algorithms to construct a concept space. A browser-‐based user interface is developed to...Public health data, Notifiable condition detector, Text mining , Data mining 4 of 29 Disease Patient Location Term
Introducing Text Analytics as a Graduate Business School Course
ERIC Educational Resources Information Center
Edgington, Theresa M.
2011-01-01
Text analytics refers to the process of analyzing unstructured data from documented sources, including open-ended surveys, blogs, and other types of web dialog. Text analytics has enveloped the concept of text mining, an analysis approach influenced heavily from data mining. While text mining has been covered extensively in various computer…
Itatani, Tomoya; Nagata, Kyoko; Yanagihara, Kiyoko; Tabuchi, Noriko
2017-08-22
The importance of active learning has continued to increase in Japan. The authors conducted classes for first-year students who entered the nursing program using the problem-based learning method which is a kind of active learning. Students discussed social topics in classes. The purposes of this study were to analyze the post-class essay, describe logical and critical thinking after attended a Problem-Based Learning (PBL) course. The authors used Mayring's methodology for qualitative content analysis and text mining. In the description about the skills required to resolve social issues, seven categories were extracted: (recognition of diverse social issues), (attitudes about resolving social issues), (discerning the root cause), (multi-lateral information processing skills), (making a path to resolve issues), (processivity in dealing with issues), and (reflecting). In the description about communication, five categories were extracted: (simple statement), (robust theories), (respecting the opponent), (communication skills), and (attractive presentations). As the result of text mining, the words extracted more than 100 times included "issue," "society," "resolve," "myself," "ability," "opinion," and "information." Education using PBL could be an effective means of improving skills that students described, and communication in general. Some students felt difficulty of communication resulting from characteristics of Japanese.
Van Landeghem, Sofie; De Bodt, Stefanie; Drebert, Zuzanna J; Inzé, Dirk; Van de Peer, Yves
2013-03-01
Despite the availability of various data repositories for plant research, a wealth of information currently remains hidden within the biomolecular literature. Text mining provides the necessary means to retrieve these data through automated processing of texts. However, only recently has advanced text mining methodology been implemented with sufficient computational power to process texts at a large scale. In this study, we assess the potential of large-scale text mining for plant biology research in general and for network biology in particular using a state-of-the-art text mining system applied to all PubMed abstracts and PubMed Central full texts. We present extensive evaluation of the textual data for Arabidopsis thaliana, assessing the overall accuracy of this new resource for usage in plant network analyses. Furthermore, we combine text mining information with both protein-protein and regulatory interactions from experimental databases. Clusters of tightly connected genes are delineated from the resulting network, illustrating how such an integrative approach is essential to grasp the current knowledge available for Arabidopsis and to uncover gene information through guilt by association. All large-scale data sets, as well as the manually curated textual data, are made publicly available, hereby stimulating the application of text mining data in future plant biology studies.
Integration of Text- and Data-Mining Technologies for Use in Banking Applications
NASA Astrophysics Data System (ADS)
Maslankowski, Jacek
Unstructured data, most of it in the form of text files, typically accounts for 85% of an organization's knowledge stores, but it's not always easy to find, access, analyze or use (Robb 2004). That is why it is important to use solutions based on text and data mining. This solution is known as duo mining. This leads to improve management based on knowledge owned in organization. The results are interesting. Data mining provides to lead with structuralized data, usually powered from data warehouses. Text mining, sometimes called web mining, looks for patterns in unstructured data — memos, document and www. Integrating text-based information with structured data enriches predictive modeling capabilities and provides new stores of insightful and valuable information for driving business and research initiatives forward.
Searching for Significance in Unstructured Data: Text Mining with Leximancer
ERIC Educational Resources Information Center
Thomas, David A.
2014-01-01
Scholars in many knowledge domains rely on sophisticated information technologies to search for and retrieve records and publications pertinent to their research interests. But what is a scholar to do when a search identifies hundreds of documents, any of which might be vital or irrelevant to his or her work? The problem is further complicated by…
ERIC Educational Resources Information Center
Mu, Jin; Stegmann, Karsten; Mayfield, Elijah; Rose, Carolyn; Fischer, Frank
2012-01-01
Research related to online discussions frequently faces the problem of analyzing huge corpora. Natural Language Processing (NLP) technologies may allow automating this analysis. However, the state-of-the-art in machine learning and text mining approaches yields models that do not transfer well between corpora related to different topics. Also,…
The Use of Gamification in Education: A Bibliometric and Text Mining Analysis
ERIC Educational Resources Information Center
Martí-Parreño, J.; Méndez-Ibáñez, E.; Alonso-Arroyo, A.
2016-01-01
The use of games in education represents a promising tool to motivate and engage students in their learning process. Most of previous research on the topic has focused to develop theoretical frameworks or to conduct experiments as a means to analyse learning outcomes such as knowledge retention, problem-solving skills gains or attitudes toward…
Enhancing a Core Journal Collection for Digital Libraries
ERIC Educational Resources Information Center
Kovacevic, Ana; Devedzic, Vladan; Pocajt, Viktor
2010-01-01
Purpose: This paper aims to address the problem of enhancing the selection of titles offered by a digital library, by analysing the differences in these titles when they are cited by local authors in their publications and when they are listed in the digital library offer. Design/methodology/approach: Text mining techniques were used to identify…
SparkText: Biomedical Text Mining on Big Data Framework.
Ye, Zhan; Tafti, Ahmad P; He, Karen Y; Wang, Kai; He, Max M
Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research.
SparkText: Biomedical Text Mining on Big Data Framework
He, Karen Y.; Wang, Kai
2016-01-01
Background Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. Results In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. Conclusions This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research. PMID:27685652
ERIC Educational Resources Information Center
Qin, Jian; Jurisica, Igor; Liddy, Elizabeth D.; Jansen, Bernard J; Spink, Amanda; Priss, Uta; Norton, Melanie J.
2000-01-01
These six articles discuss knowledge discovery in databases (KDD). Topics include data mining; knowledge management systems; applications of knowledge discovery; text and Web mining; text mining and information retrieval; user search patterns through Web log analysis; concept analysis; data collection; and data structure inconsistency. (LRW)
Biomedical text mining for research rigor and integrity: tasks, challenges, directions.
Kilicoglu, Halil
2017-06-13
An estimated quarter of a trillion US dollars is invested in the biomedical research enterprise annually. There is growing alarm that a significant portion of this investment is wasted because of problems in reproducibility of research findings and in the rigor and integrity of research conduct and reporting. Recent years have seen a flurry of activities focusing on standardization and guideline development to enhance the reproducibility and rigor of biomedical research. Research activity is primarily communicated via textual artifacts, ranging from grant applications to journal publications. These artifacts can be both the source and the manifestation of practices leading to research waste. For example, an article may describe a poorly designed experiment, or the authors may reach conclusions not supported by the evidence presented. In this article, we pose the question of whether biomedical text mining techniques can assist the stakeholders in the biomedical research enterprise in doing their part toward enhancing research integrity and rigor. In particular, we identify four key areas in which text mining techniques can make a significant contribution: plagiarism/fraud detection, ensuring adherence to reporting guidelines, managing information overload and accurate citation/enhanced bibliometrics. We review the existing methods and tools for specific tasks, if they exist, or discuss relevant research that can provide guidance for future work. With the exponential increase in biomedical research output and the ability of text mining approaches to perform automatic tasks at large scale, we propose that such approaches can support tools that promote responsible research practices, providing significant benefits for the biomedical research enterprise. Published by Oxford University Press 2017. This work is written by a US Government employee and is in the public domain in the US.
Westergaard, David; Stærfeldt, Hans-Henrik; Tønsberg, Christian; Jensen, Lars Juhl; Brunak, Søren
2018-02-01
Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, due to their availability. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823-2016. We describe the development in article length and publication sub-topics during these nearly 250 years. We showcase the potential of text mining by extracting published protein-protein, disease-gene, and protein subcellular associations using a named entity recognition system, and quantitatively report on their accuracy using gold standard benchmark data sets. We subsequently compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full-text articles consistently outperforms using abstracts only.
Westergaard, David; Stærfeldt, Hans-Henrik
2018-01-01
Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, due to their availability. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823–2016. We describe the development in article length and publication sub-topics during these nearly 250 years. We showcase the potential of text mining by extracting published protein–protein, disease–gene, and protein subcellular associations using a named entity recognition system, and quantitatively report on their accuracy using gold standard benchmark data sets. We subsequently compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full-text articles consistently outperforms using abstracts only. PMID:29447159
Searching for 'Unknown Unknowns'
NASA Technical Reports Server (NTRS)
Parsons, Vickie S.
2005-01-01
The NASA Engineering and Safety Center (NESC) was established to improve safety through engineering excellence within NASA programs and projects. As part of this goal, methods are being investigated to enable the NESC to become proactive in identifying areas that may be precursors to future problems. The goal is to find unknown indicators of future problems, not to duplicate the program-specific trending efforts. The data that is critical for detecting these indicators exist in a plethora of dissimilar non-conformance and other databases (without a common format or taxonomy). In fact, much of the data is unstructured text. However, one common database is not required if the right standards and electronic tools are employed. Electronic data mining is a particularly promising tool for this effort into unsupervised learning of common factors. This work in progress began with a systematic evaluation of available data mining software packages, based on documented decision techniques using weighted criteria. The four packages, which were perceived to have the most promise for NASA applications, are being benchmarked and evaluated by independent contractors. Preliminary recommendations for "best practices" in data mining and trending are provided. Final results and recommendations should be available in the Fall 2005. This critical first step in identifying "unknown unknowns" before they become problems is applicable to any set of engineering or programmatic data.
Text mining for the biocuration workflow
Hirschman, Lynette; Burns, Gully A. P. C; Krallinger, Martin; Arighi, Cecilia; Cohen, K. Bretonnel; Valencia, Alfonso; Wu, Cathy H.; Chatr-Aryamontri, Andrew; Dowell, Karen G.; Huala, Eva; Lourenço, Anália; Nash, Robert; Veuthey, Anne-Lise; Wiegers, Thomas; Winter, Andrew G.
2012-01-01
Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on ‘Text Mining for the BioCuration Workflow’ at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed many variabilities. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provide a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community. PMID:22513129
Text mining for the biocuration workflow.
Hirschman, Lynette; Burns, Gully A P C; Krallinger, Martin; Arighi, Cecilia; Cohen, K Bretonnel; Valencia, Alfonso; Wu, Cathy H; Chatr-Aryamontri, Andrew; Dowell, Karen G; Huala, Eva; Lourenço, Anália; Nash, Robert; Veuthey, Anne-Lise; Wiegers, Thomas; Winter, Andrew G
2012-01-01
Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on 'Text Mining for the BioCuration Workflow' at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed many variabilities. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provide a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community.
Van Landeghem, Sofie; De Bodt, Stefanie; Drebert, Zuzanna J.; Inzé, Dirk; Van de Peer, Yves
2013-01-01
Despite the availability of various data repositories for plant research, a wealth of information currently remains hidden within the biomolecular literature. Text mining provides the necessary means to retrieve these data through automated processing of texts. However, only recently has advanced text mining methodology been implemented with sufficient computational power to process texts at a large scale. In this study, we assess the potential of large-scale text mining for plant biology research in general and for network biology in particular using a state-of-the-art text mining system applied to all PubMed abstracts and PubMed Central full texts. We present extensive evaluation of the textual data for Arabidopsis thaliana, assessing the overall accuracy of this new resource for usage in plant network analyses. Furthermore, we combine text mining information with both protein–protein and regulatory interactions from experimental databases. Clusters of tightly connected genes are delineated from the resulting network, illustrating how such an integrative approach is essential to grasp the current knowledge available for Arabidopsis and to uncover gene information through guilt by association. All large-scale data sets, as well as the manually curated textual data, are made publicly available, hereby stimulating the application of text mining data in future plant biology studies. PMID:23532071
String Mining in Bioinformatics
NASA Astrophysics Data System (ADS)
Abouelhoda, Mohamed; Ghanem, Moustafa
Sequence analysis is a major area in bioinformatics encompassing the methods and techniques for studying the biological sequences, DNA, RNA, and proteins, on the linear structure level. The focus of this area is generally on the identification of intra- and inter-molecular similarities. Identifying intra-molecular similarities boils down to detecting repeated segments within a given sequence, while identifying inter-molecular similarities amounts to spotting common segments among two or multiple sequences. From a data mining point of view, sequence analysis is nothing but string- or pattern mining specific to biological strings. For a long time, this point of view, however, has not been explicitly embraced neither in the data mining nor in the sequence analysis text books, which may be attributed to the co-evolution of the two apparently independent fields. In other words, although the word "data-mining" is almost missing in the sequence analysis literature, its basic concepts have been implicitly applied. Interestingly, recent research in biological sequence analysis introduced efficient solutions to many problems in data mining, such as querying and analyzing time series [49,53], extracting information from web pages [20], fighting spam mails [50], detecting plagiarism [22], and spotting duplications in software systems [14].
Automated detection of follow-up appointments using text mining of discharge records.
Ruud, Kari L; Johnson, Matthew G; Liesinger, Juliette T; Grafft, Carrie A; Naessens, James M
2010-06-01
To determine whether text mining can accurately detect specific follow-up appointment criteria in free-text hospital discharge records. Cross-sectional study. Mayo Clinic Rochester hospitals. Inpatients discharged from general medicine services in 2006 (n = 6481). Textual hospital dismissal summaries were manually reviewed to determine whether the records contained specific follow-up appointment arrangement elements: date, time and either physician or location for an appointment. The data set was evaluated for the same criteria using SAS Text Miner software. The two assessments were compared to determine the accuracy of text mining for detecting records containing follow-up appointment arrangements. Agreement of text-mined appointment findings with gold standard (manual abstraction) including sensitivity, specificity, positive predictive and negative predictive values (PPV and NPV). About 55.2% (3576) of discharge records contained all criteria for follow-up appointment arrangements according to the manual review, 3.2% (113) of which were missed through text mining. Text mining incorrectly identified 3.7% (107) follow-up appointments that were not considered valid through manual review. Therefore, the text mining analysis concurred with the manual review in 96.6% of the appointment findings. Overall sensitivity and specificity were 96.8 and 96.3%, respectively; and PPV and NPV were 97.0 and 96.1%, respectively. of individual appointment criteria resulted in accuracy rates of 93.5% for date, 97.4% for time, 97.5% for physician and 82.9% for location. Text mining of unstructured hospital dismissal summaries can accurately detect documentation of follow-up appointment arrangement elements, thus saving considerable resources for performance assessment and quality-related research.
Wiegers, Thomas C; Davis, Allan Peter; Mattingly, Carolyn J
2014-01-01
The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation tasks collectively represent a community-wide effort to evaluate a variety of text-mining and information extraction systems applied to the biological domain. The BioCreative IV Workshop included five independent subject areas, including Track 3, which focused on named-entity recognition (NER) for the Comparative Toxicogenomics Database (CTD; http://ctdbase.org). Previously, CTD had organized document ranking and NER-related tasks for the BioCreative Workshop 2012; a key finding of that effort was that interoperability and integration complexity were major impediments to the direct application of the systems to CTD's text-mining pipeline. This underscored a prevailing problem with software integration efforts. Major interoperability-related issues included lack of process modularity, operating system incompatibility, tool configuration complexity and lack of standardization of high-level inter-process communications. One approach to potentially mitigate interoperability and general integration issues is the use of Web services to abstract implementation details; rather than integrating NER tools directly, HTTP-based calls from CTD's asynchronous, batch-oriented text-mining pipeline could be made to remote NER Web services for recognition of specific biological terms using BioC (an emerging family of XML formats) for inter-process communications. To test this concept, participating groups developed Representational State Transfer /BioC-compliant Web services tailored to CTD's NER requirements. Participants were provided with a comprehensive set of training materials. CTD evaluated results obtained from the remote Web service-based URLs against a test data set of 510 manually curated scientific articles. Twelve groups participated in the challenge. Recall, precision, balanced F-scores and response times were calculated. Top balanced F-scores for gene, chemical and disease NER were 61, 74 and 51%, respectively. Response times ranged from fractions-of-a-second to over a minute per article. We present a description of the challenge and summary of results, demonstrating how curation groups can effectively use interoperable NER technologies to simplify text-mining pipeline implementation. Database URL: http://ctdbase.org/ © The Author(s) 2014. Published by Oxford University Press.
Wiegers, Thomas C.; Davis, Allan Peter; Mattingly, Carolyn J.
2014-01-01
The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation tasks collectively represent a community-wide effort to evaluate a variety of text-mining and information extraction systems applied to the biological domain. The BioCreative IV Workshop included five independent subject areas, including Track 3, which focused on named-entity recognition (NER) for the Comparative Toxicogenomics Database (CTD; http://ctdbase.org). Previously, CTD had organized document ranking and NER-related tasks for the BioCreative Workshop 2012; a key finding of that effort was that interoperability and integration complexity were major impediments to the direct application of the systems to CTD's text-mining pipeline. This underscored a prevailing problem with software integration efforts. Major interoperability-related issues included lack of process modularity, operating system incompatibility, tool configuration complexity and lack of standardization of high-level inter-process communications. One approach to potentially mitigate interoperability and general integration issues is the use of Web services to abstract implementation details; rather than integrating NER tools directly, HTTP-based calls from CTD's asynchronous, batch-oriented text-mining pipeline could be made to remote NER Web services for recognition of specific biological terms using BioC (an emerging family of XML formats) for inter-process communications. To test this concept, participating groups developed Representational State Transfer /BioC-compliant Web services tailored to CTD's NER requirements. Participants were provided with a comprehensive set of training materials. CTD evaluated results obtained from the remote Web service-based URLs against a test data set of 510 manually curated scientific articles. Twelve groups participated in the challenge. Recall, precision, balanced F-scores and response times were calculated. Top balanced F-scores for gene, chemical and disease NER were 61, 74 and 51%, respectively. Response times ranged from fractions-of-a-second to over a minute per article. We present a description of the challenge and summary of results, demonstrating how curation groups can effectively use interoperable NER technologies to simplify text-mining pipeline implementation. Database URL: http://ctdbase.org/ PMID:24919658
Mining method selection by integrated AHP and PROMETHEE method.
Bogdanovic, Dejan; Nikolic, Djordje; Ilic, Ivana
2012-03-01
Selecting the best mining method among many alternatives is a multicriteria decision making problem. The aim of this paper is to demonstrate the implementation of an integrated approach that employs AHP and PROMETHEE together for selecting the most suitable mining method for the "Coka Marin" underground mine in Serbia. The related problem includes five possible mining methods and eleven criteria to evaluate them. Criteria are accurately chosen in order to cover the most important parameters that impact on the mining method selection, such as geological and geotechnical properties, economic parameters and geographical factors. The AHP is used to analyze the structure of the mining method selection problem and to determine weights of the criteria, and PROMETHEE method is used to obtain the final ranking and to make a sensitivity analysis by changing the weights. The results have shown that the proposed integrated method can be successfully used in solving mining engineering problems.
Automatic target validation based on neuroscientific literature mining for tractography
Vasques, Xavier; Richardet, Renaud; Hill, Sean L.; Slater, David; Chappelier, Jean-Cedric; Pralong, Etienne; Bloch, Jocelyne; Draganski, Bogdan; Cif, Laura
2015-01-01
Target identification for tractography studies requires solid anatomical knowledge validated by an extensive literature review across species for each seed structure to be studied. Manual literature review to identify targets for a given seed region is tedious and potentially subjective. Therefore, complementary approaches would be useful. We propose to use text-mining models to automatically suggest potential targets from the neuroscientific literature, full-text articles and abstracts, so that they can be used for anatomical connection studies and more specifically for tractography. We applied text-mining models to three structures: two well-studied structures, since validated deep brain stimulation targets, the internal globus pallidus and the subthalamic nucleus and, the nucleus accumbens, an exploratory target for treating psychiatric disorders. We performed a systematic review of the literature to document the projections of the three selected structures and compared it with the targets proposed by text-mining models, both in rat and primate (including human). We ran probabilistic tractography on the nucleus accumbens and compared the output with the results of the text-mining models and literature review. Overall, text-mining the literature could find three times as many targets as two man-weeks of curation could. The overall efficiency of the text-mining against literature review in our study was 98% recall (at 36% precision), meaning that over all the targets for the three selected seeds, only one target has been missed by text-mining. We demonstrate that connectivity for a structure of interest can be extracted from a very large amount of publications and abstracts. We believe this tool will be useful in helping the neuroscience community to facilitate connectivity studies of particular brain regions. The text mining tools used for the study are part of the HBP Neuroinformatics Platform, publicly available at http://connectivity-brainer.rhcloud.com/. PMID:26074781
Decision support methods for the environmental assessment of contamination at mining sites.
Jordan, Gyozo; Abdaal, Ahmed
2013-09-01
Polluting mine accidents and widespread environmental contamination associated with historic mining in Europe and elsewhere has triggered the improvement of related environmental legislation and of the environmental assessment and management methods for the mining industry. Mining has some unique features such as natural background pollution associated with natural mineral deposits, industrial activities and contamination located in the three-dimensional sub-surface space, the problem of long-term remediation after mine closure, problem of secondary contaminated areas around mine sites and abandoned mines in historic regions like Europe. These mining-specific problems require special tools to address the complexity of the environmental problems of mining-related contamination. The objective of this paper is to review and evaluate some of the decision support methods that have been developed and applied to mining contamination. In this paper, only those methods that are both efficient decision support tools and provide a 'holistic' approach to the complex problem as well are considered. These tools are (1) landscape ecology, (2) industrial ecology, (3) landscape geochemistry, (4) geo-environmental models, (5) environmental impact assessment, (6) environmental risk assessment, (7) material flow analysis and (8) life cycle assessment. This unique inter-disciplinary study should enable both the researcher and the practitioner to obtain broad view on the state-of-the-art of decision support methods for the environmental assessment of contamination at mine sites. Documented examples and abundant references are also provided.
Mine Water Treatment in Hongai Coal Mines
NASA Astrophysics Data System (ADS)
Dang, Phuong Thao; Dang, Vu Chi
2018-03-01
Acid mine drainage (AMD) is recognized as one of the most serious environmental problem associated with mining industry. Acid water, also known as acid mine drainage forms when iron sulfide minerals found in the rock of coal seams are exposed to oxidizing conditions in coal mining. Until 2009, mine drainage in Hongai coal mines was not treated, leading to harmful effects on humans, animals and aquatic ecosystem. This report has examined acid mine drainage problem and techniques for acid mine drainage treatment in Hongai coal mines. In addition, selection and criteria for the design of the treatment systems have been presented.
NASA Technical Reports Server (NTRS)
Srivastava, Ashok, N.; Akella, Ram; Diev, Vesselin; Kumaresan, Sakthi Preethi; McIntosh, Dawn M.; Pontikakis, Emmanuel D.; Xu, Zuobing; Zhang, Yi
2006-01-01
This paper describes the results of a significant research and development effort conducted at NASA Ames Research Center to develop new text mining techniques to discover anomalies in free-text reports regarding system health and safety of two aerospace systems. We discuss two problems of significant importance in the aviation industry. The first problem is that of automatic anomaly discovery about an aerospace system through the analysis of tens of thousands of free-text problem reports that are written about the system. The second problem that we address is that of automatic discovery of recurring anomalies, i.e., anomalies that may be described m different ways by different authors, at varying times and under varying conditions, but that are truly about the same part of the system. The intent of recurring anomaly identification is to determine project or system weakness or high-risk issues. The discovery of recurring anomalies is a key goal in building safe, reliable, and cost-effective aerospace systems. We address the anomaly discovery problem on thousands of free-text reports using two strategies: (1) as an unsupervised learning problem where an algorithm takes free-text reports as input and automatically groups them into different bins, where each bin corresponds to a different unknown anomaly category; and (2) as a supervised learning problem where the algorithm classifies the free-text reports into one of a number of known anomaly categories. We then discuss the application of these methods to the problem of discovering recurring anomalies. In fact the special nature of recurring anomalies (very small cluster sizes) requires incorporating new methods and measures to enhance the original approach for anomaly detection. ?& pant 0-
Adaptive semantic tag mining from heterogeneous clinical research texts.
Hao, T; Weng, C
2015-01-01
To develop an adaptive approach to mine frequent semantic tags (FSTs) from heterogeneous clinical research texts. We develop a "plug-n-play" framework that integrates replaceable unsupervised kernel algorithms with formatting, functional, and utility wrappers for FST mining. Temporal information identification and semantic equivalence detection were two example functional wrappers. We first compared this approach's recall and efficiency for mining FSTs from ClinicalTrials.gov to that of a recently published tag-mining algorithm. Then we assessed this approach's adaptability to two other types of clinical research texts: clinical data requests and clinical trial protocols, by comparing the prevalence trends of FSTs across three texts. Our approach increased the average recall and speed by 12.8% and 47.02% respectively upon the baseline when mining FSTs from ClinicalTrials.gov, and maintained an overlap in relevant FSTs with the base- line ranging between 76.9% and 100% for varying FST frequency thresholds. The FSTs saturated when the data size reached 200 documents. Consistent trends in the prevalence of FST were observed across the three texts as the data size or frequency threshold changed. This paper contributes an adaptive tag-mining framework that is scalable and adaptable without sacrificing its recall. This component-based architectural design can be potentially generalizable to improve the adaptability of other clinical text mining methods.
NASA Astrophysics Data System (ADS)
Jordan, Gyozo
2009-07-01
Wide-spread environmental contamination associated with historic mining in Europe has triggered social responses to improve related environmental legislation, the environmental assessment and management methods for the mining industry. Mining has some unique features such as natural background contamination associated with mineral deposits, industrial activities and contamination in the three-dimensional subsurface space, problem of long-term remediation after mine closure, problem of secondary contaminated areas around mine sites, land use conflicts and abandoned mines. These problems require special tools to address the complexity of the environmental problems of mining-related contamination. The objective of this paper is to show how regional mineral resources mapping has developed into the spatial contamination risk assessment of mining and how geological knowledge can be transferred to environmental assessment of mines. The paper provides a state-of-the-art review of the spatial mine inventory, hazard, impact and risk assessment and ranking methods developed by national and international efforts in Europe. It is concluded that geological knowledge on mineral resources exploration is essential and should be used for the environmental contamination assessment of mines. Also, sufficient methodological experience, knowledge and documented results are available, but harmonisation of these methods is still required for the efficient spatial environmental assessment of mine contamination.
Text mining resources for the life sciences.
Przybyła, Piotr; Shardlow, Matthew; Aubin, Sophie; Bossy, Robert; Eckart de Castilho, Richard; Piperidis, Stelios; McNaught, John; Ananiadou, Sophia
2016-01-01
Text mining is a powerful technology for quickly distilling key information from vast quantities of biomedical literature. However, to harness this power the researcher must be well versed in the availability, suitability, adaptability, interoperability and comparative accuracy of current text mining resources. In this survey, we give an overview of the text mining resources that exist in the life sciences to help researchers, especially those employed in biocuration, to engage with text mining in their own work. We categorize the various resources under three sections: Content Discovery looks at where and how to find biomedical publications for text mining; Knowledge Encoding describes the formats used to represent the different levels of information associated with content that enable text mining, including those formats used to carry such information between processes; Tools and Services gives an overview of workflow management systems that can be used to rapidly configure and compare domain- and task-specific processes, via access to a wide range of pre-built tools. We also provide links to relevant repositories in each section to enable the reader to find resources relevant to their own area of interest. Throughout this work we give a special focus to resources that are interoperable-those that have the crucial ability to share information, enabling smooth integration and reusability. © The Author(s) 2016. Published by Oxford University Press.
Text mining resources for the life sciences
Shardlow, Matthew; Aubin, Sophie; Bossy, Robert; Eckart de Castilho, Richard; Piperidis, Stelios; McNaught, John; Ananiadou, Sophia
2016-01-01
Text mining is a powerful technology for quickly distilling key information from vast quantities of biomedical literature. However, to harness this power the researcher must be well versed in the availability, suitability, adaptability, interoperability and comparative accuracy of current text mining resources. In this survey, we give an overview of the text mining resources that exist in the life sciences to help researchers, especially those employed in biocuration, to engage with text mining in their own work. We categorize the various resources under three sections: Content Discovery looks at where and how to find biomedical publications for text mining; Knowledge Encoding describes the formats used to represent the different levels of information associated with content that enable text mining, including those formats used to carry such information between processes; Tools and Services gives an overview of workflow management systems that can be used to rapidly configure and compare domain- and task-specific processes, via access to a wide range of pre-built tools. We also provide links to relevant repositories in each section to enable the reader to find resources relevant to their own area of interest. Throughout this work we give a special focus to resources that are interoperable—those that have the crucial ability to share information, enabling smooth integration and reusability. PMID:27888231
Cohen, Raphael; Elhadad, Michael; Elhadad, Noémie
2013-01-16
The increasing availability of Electronic Health Record (EHR) data and specifically free-text patient notes presents opportunities for phenotype extraction. Text-mining methods in particular can help disease modeling by mapping named-entities mentions to terminologies and clustering semantically related terms. EHR corpora, however, exhibit specific statistical and linguistic characteristics when compared with corpora in the biomedical literature domain. We focus on copy-and-paste redundancy: clinicians typically copy and paste information from previous notes when documenting a current patient encounter. Thus, within a longitudinal patient record, one expects to observe heavy redundancy. In this paper, we ask three research questions: (i) How can redundancy be quantified in large-scale text corpora? (ii) Conventional wisdom is that larger corpora yield better results in text mining. But how does the observed EHR redundancy affect text mining? Does such redundancy introduce a bias that distorts learned models? Or does the redundancy introduce benefits by highlighting stable and important subsets of the corpus? (iii) How can one mitigate the impact of redundancy on text mining? We analyze a large-scale EHR corpus and quantify redundancy both in terms of word and semantic concept repetition. We observe redundancy levels of about 30% and non-standard distribution of both words and concepts. We measure the impact of redundancy on two standard text-mining applications: collocation identification and topic modeling. We compare the results of these methods on synthetic data with controlled levels of redundancy and observe significant performance variation. Finally, we compare two mitigation strategies to avoid redundancy-induced bias: (i) a baseline strategy, keeping only the last note for each patient in the corpus; (ii) removing redundant notes with an efficient fingerprinting-based algorithm. (a)For text mining, preprocessing the EHR corpus with fingerprinting yields significantly better results. Before applying text-mining techniques, one must pay careful attention to the structure of the analyzed corpora. While the importance of data cleaning has been known for low-level text characteristics (e.g., encoding and spelling), high-level and difficult-to-quantify corpus characteristics, such as naturally occurring redundancy, can also hurt text mining. Fingerprinting enables text-mining techniques to leverage available data in the EHR corpus, while avoiding the bias introduced by redundancy.
ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers.
Xing, Yuting; Wu, Chengkun; Yang, Xi; Wang, Wei; Zhu, En; Yin, Jianping
2018-04-27
A prevailing way of extracting valuable information from biomedical literature is to apply text mining methods on unstructured texts. However, the massive amount of literature that needs to be analyzed poses a big data challenge to the processing efficiency of text mining. In this paper, we address this challenge by introducing parallel processing on a supercomputer. We developed paraBTM, a runnable framework that enables parallel text mining on the Tianhe-2 supercomputer. It employs a low-cost yet effective load balancing strategy to maximize the efficiency of parallel processing. We evaluated the performance of paraBTM on several datasets, utilizing three types of named entity recognition tasks as demonstration. Results show that, in most cases, the processing efficiency can be greatly improved with parallel processing, and the proposed load balancing strategy is simple and effective. In addition, our framework can be readily applied to other tasks of biomedical text mining besides NER.
Itatani, Tomoya; Nagata, Kyoko; Yanagihara, Kiyoko; Tabuchi, Noriko
2017-01-01
The importance of active learning has continued to increase in Japan. The authors conducted classes for first-year students who entered the nursing program using the problem-based learning method which is a kind of active learning. Students discussed social topics in classes. The purposes of this study were to analyze the post-class essay, describe logical and critical thinking after attended a Problem-Based Learning (PBL) course. The authors used Mayring’s methodology for qualitative content analysis and text mining. In the description about the skills required to resolve social issues, seven categories were extracted: (recognition of diverse social issues), (attitudes about resolving social issues), (discerning the root cause), (multi-lateral information processing skills), (making a path to resolve issues), (processivity in dealing with issues), and (reflecting). In the description about communication, five categories were extracted: (simple statement), (robust theories), (respecting the opponent), (communication skills), and (attractive presentations). As the result of text mining, the words extracted more than 100 times included “issue,” “society,” “resolve,” “myself,” “ability,” “opinion,” and “information.” Education using PBL could be an effective means of improving skills that students described, and communication in general. Some students felt difficulty of communication resulting from characteristics of Japanese. PMID:28829362
Text-mining and information-retrieval services for molecular biology
Krallinger, Martin; Valencia, Alfonso
2005-01-01
Text-mining in molecular biology - defined as the automatic extraction of information about genes, proteins and their functional relationships from text documents - has emerged as a hybrid discipline on the edges of the fields of information science, bioinformatics and computational linguistics. A range of text-mining applications have been developed recently that will improve access to knowledge for biologists and database annotators. PMID:15998455
Text mining for traditional Chinese medical knowledge discovery: a survey.
Zhou, Xuezhong; Peng, Yonghong; Liu, Baoyan
2010-08-01
Extracting meaningful information and knowledge from free text is the subject of considerable research interest in the machine learning and data mining fields. Text data mining (or text mining) has become one of the most active research sub-fields in data mining. Significant developments in the area of biomedical text mining during the past years have demonstrated its great promise for supporting scientists in developing novel hypotheses and new knowledge from the biomedical literature. Traditional Chinese medicine (TCM) provides a distinct methodology with which to view human life. It is one of the most complete and distinguished traditional medicines with a history of several thousand years of studying and practicing the diagnosis and treatment of human disease. It has been shown that the TCM knowledge obtained from clinical practice has become a significant complementary source of information for modern biomedical sciences. TCM literature obtained from the historical period and from modern clinical studies has recently been transformed into digital data in the form of relational databases or text documents, which provide an effective platform for information sharing and retrieval. This motivates and facilitates research and development into knowledge discovery approaches and to modernize TCM. In order to contribute to this still growing field, this paper presents (1) a comparative introduction to TCM and modern biomedicine, (2) a survey of the related information sources of TCM, (3) a review and discussion of the state of the art and the development of text mining techniques with applications to TCM, (4) a discussion of the research issues around TCM text mining and its future directions. Copyright 2010 Elsevier Inc. All rights reserved.
Managing biological networks by using text mining and computer-aided curation
NASA Astrophysics Data System (ADS)
Yu, Seok Jong; Cho, Yongseong; Lee, Min-Ho; Lim, Jongtae; Yoo, Jaesoo
2015-11-01
In order to understand a biological mechanism in a cell, a researcher should collect a huge number of protein interactions with experimental data from experiments and the literature. Text mining systems that extract biological interactions from papers have been used to construct biological networks for a few decades. Even though the text mining of literature is necessary to construct a biological network, few systems with a text mining tool are available for biologists who want to construct their own biological networks. We have developed a biological network construction system called BioKnowledge Viewer that can generate a biological interaction network by using a text mining tool and biological taggers. It also Boolean simulation software to provide a biological modeling system to simulate the model that is made with the text mining tool. A user can download PubMed articles and construct a biological network by using the Multi-level Knowledge Emergence Model (KMEM), MetaMap, and A Biomedical Named Entity Recognizer (ABNER) as a text mining tool. To evaluate the system, we constructed an aging-related biological network that consist 9,415 nodes (genes) by using manual curation. With network analysis, we found that several genes, including JNK, AP-1, and BCL-2, were highly related in aging biological network. We provide a semi-automatic curation environment so that users can obtain a graph database for managing text mining results that are generated in the server system and can navigate the network with BioKnowledge Viewer, which is freely available at http://bioknowledgeviewer.kisti.re.kr.
Personalized Privacy-Preserving Frequent Itemset Mining Using Randomized Response
Sun, Chongjing; Fu, Yan; Zhou, Junlin; Gao, Hui
2014-01-01
Frequent itemset mining is the important first step of association rule mining, which discovers interesting patterns from the massive data. There are increasing concerns about the privacy problem in the frequent itemset mining. Some works have been proposed to handle this kind of problem. In this paper, we introduce a personalized privacy problem, in which different attributes may need different privacy levels protection. To solve this problem, we give a personalized privacy-preserving method by using the randomized response technique. By providing different privacy levels for different attributes, this method can get a higher accuracy on frequent itemset mining than the traditional method providing the same privacy level. Finally, our experimental results show that our method can have better results on the frequent itemset mining while preserving personalized privacy. PMID:25143989
Personalized privacy-preserving frequent itemset mining using randomized response.
Sun, Chongjing; Fu, Yan; Zhou, Junlin; Gao, Hui
2014-01-01
Frequent itemset mining is the important first step of association rule mining, which discovers interesting patterns from the massive data. There are increasing concerns about the privacy problem in the frequent itemset mining. Some works have been proposed to handle this kind of problem. In this paper, we introduce a personalized privacy problem, in which different attributes may need different privacy levels protection. To solve this problem, we give a personalized privacy-preserving method by using the randomized response technique. By providing different privacy levels for different attributes, this method can get a higher accuracy on frequent itemset mining than the traditional method providing the same privacy level. Finally, our experimental results show that our method can have better results on the frequent itemset mining while preserving personalized privacy.
An overview of the biocreative 2012 workshop track III: Interactive text mining task
USDA-ARS?s Scientific Manuscript database
An important question is how to make use of text mining to enhance the biocuration workflow. A number of groups have developed tools for text mining from a computer science/linguistics perspective and there are many initiatives to curate some aspect of biology from the literature. In some cases the ...
Rating prediction using textual reviews
NASA Astrophysics Data System (ADS)
NithyaKalyani, A.; Ushasukhanya, S.; Nagamalleswari, TYJ; Girija, S.
2018-04-01
Information today is present in the form of opinions. Two & a half quintillion bytes are exchanged today in Internet everyday and a large amount consists of people’s speculation and reflection over an issue. It is the need of the hour to be able to mine this information that is presented to us. Sentimental analysis refers to mining of this raw information to make sense. The discipline of opinion mining has seen a lot of encouragement in the past few years augmented by involvement of social media like Instagram, Facebook, and twitter. The hidden message in this web of information is useful in several fields such as marketing, political polls, product review, forecast market movement, Identifying detractor and promoter. In this endeavor, we introduced sentiment rating system for a particular text or paragraph to determine the opinions polarity. Firstly we resolve the searching problem, tokenization, classification, and reliable content identification. Secondly we extract probability for given text or paragraph for both positive & negative sentiment value using naive bayes classifier. At last we use sentiment dictionary (SD), sentiment degree dictionary (SDD) and negation dictionary (ND) for more accuracy. Later we blend all above mentioned factor into given formula to find the rating for the review.
Text Mining in Cancer Gene and Pathway Prioritization
Luo, Yuan; Riedlinger, Gregory; Szolovits, Peter
2014-01-01
Prioritization of cancer implicated genes has received growing attention as an effective way to reduce wet lab cost by computational analysis that ranks candidate genes according to the likelihood that experimental verifications will succeed. A multitude of gene prioritization tools have been developed, each integrating different data sources covering gene sequences, differential expressions, function annotations, gene regulations, protein domains, protein interactions, and pathways. This review places existing gene prioritization tools against the backdrop of an integrative Omic hierarchy view toward cancer and focuses on the analysis of their text mining components. We explain the relatively slow progress of text mining in gene prioritization, identify several challenges to current text mining methods, and highlight a few directions where more effective text mining algorithms may improve the overall prioritization task and where prioritizing the pathways may be more desirable than prioritizing only genes. PMID:25392685
Text mining in cancer gene and pathway prioritization.
Luo, Yuan; Riedlinger, Gregory; Szolovits, Peter
2014-01-01
Prioritization of cancer implicated genes has received growing attention as an effective way to reduce wet lab cost by computational analysis that ranks candidate genes according to the likelihood that experimental verifications will succeed. A multitude of gene prioritization tools have been developed, each integrating different data sources covering gene sequences, differential expressions, function annotations, gene regulations, protein domains, protein interactions, and pathways. This review places existing gene prioritization tools against the backdrop of an integrative Omic hierarchy view toward cancer and focuses on the analysis of their text mining components. We explain the relatively slow progress of text mining in gene prioritization, identify several challenges to current text mining methods, and highlight a few directions where more effective text mining algorithms may improve the overall prioritization task and where prioritizing the pathways may be more desirable than prioritizing only genes.
Citation Mining: Integrating Text Mining and Bibliometrics for Research User Profiling.
ERIC Educational Resources Information Center
Kostoff, Ronald N.; del Rio, J. Antonio; Humenik, James A.; Garcia, Esther Ofilia; Ramirez, Ana Maria
2001-01-01
Discusses the importance of identifying the users and impact of research, and describes an approach for identifying the pathways through which research can impact other research, technology development, and applications. Describes a study that used citation mining, an integration of citation bibliometrics and text mining, on articles from the…
Using text-mining techniques in electronic patient records to identify ADRs from medicine use.
Warrer, Pernille; Hansen, Ebba Holme; Juhl-Jensen, Lars; Aagaard, Lise
2012-05-01
This literature review included studies that use text-mining techniques in narrative documents stored in electronic patient records (EPRs) to investigate ADRs. We searched PubMed, Embase, Web of Science and International Pharmaceutical Abstracts without restrictions from origin until July 2011. We included empirically based studies on text mining of electronic patient records (EPRs) that focused on detecting ADRs, excluding those that investigated adverse events not related to medicine use. We extracted information on study populations, EPR data sources, frequencies and types of the identified ADRs, medicines associated with ADRs, text-mining algorithms used and their performance. Seven studies, all from the United States, were eligible for inclusion in the review. Studies were published from 2001, the majority between 2009 and 2010. Text-mining techniques varied over time from simple free text searching of outpatient visit notes and inpatient discharge summaries to more advanced techniques involving natural language processing (NLP) of inpatient discharge summaries. Performance appeared to increase with the use of NLP, although many ADRs were still missed. Due to differences in study design and populations, various types of ADRs were identified and thus we could not make comparisons across studies. The review underscores the feasibility and potential of text mining to investigate narrative documents in EPRs for ADRs. However, more empirical studies are needed to evaluate whether text mining of EPRs can be used systematically to collect new information about ADRs. © 2011 The Authors. British Journal of Clinical Pharmacology © 2011 The British Pharmacological Society.
Using text-mining techniques in electronic patient records to identify ADRs from medicine use
Warrer, Pernille; Hansen, Ebba Holme; Juhl-Jensen, Lars; Aagaard, Lise
2012-01-01
This literature review included studies that use text-mining techniques in narrative documents stored in electronic patient records (EPRs) to investigate ADRs. We searched PubMed, Embase, Web of Science and International Pharmaceutical Abstracts without restrictions from origin until July 2011. We included empirically based studies on text mining of electronic patient records (EPRs) that focused on detecting ADRs, excluding those that investigated adverse events not related to medicine use. We extracted information on study populations, EPR data sources, frequencies and types of the identified ADRs, medicines associated with ADRs, text-mining algorithms used and their performance. Seven studies, all from the United States, were eligible for inclusion in the review. Studies were published from 2001, the majority between 2009 and 2010. Text-mining techniques varied over time from simple free text searching of outpatient visit notes and inpatient discharge summaries to more advanced techniques involving natural language processing (NLP) of inpatient discharge summaries. Performance appeared to increase with the use of NLP, although many ADRs were still missed. Due to differences in study design and populations, various types of ADRs were identified and thus we could not make comparisons across studies. The review underscores the feasibility and potential of text mining to investigate narrative documents in EPRs for ADRs. However, more empirical studies are needed to evaluate whether text mining of EPRs can be used systematically to collect new information about ADRs. PMID:22122057
Land Ecological Security Evaluation of Underground Iron Mine Based on PSR Model
NASA Astrophysics Data System (ADS)
Xiao, Xiao; Chen, Yong; Ruan, Jinghua; Hong, Qiang; Gan, Yong
2018-01-01
Iron ore mine provides an important strategic resource to the national economy while it also causes many serious ecological problems to the environment. The study summed up the characteristics of ecological environment problems of underground iron mine. Considering the mining process of underground iron mine, we analysis connections between mining production, resource, environment and economical background. The paper proposed a land ecological security evaluation system and method of underground iron mine based on Pressure-State-Response model. Our application in Chengchao iron mine proves its efficiency and promising guide on land ecological security evaluation.
Gene prioritization and clustering by multi-view text mining
2010-01-01
Background Text mining has become a useful tool for biologists trying to understand the genetics of diseases. In particular, it can help identify the most interesting candidate genes for a disease for further experimental analysis. Many text mining approaches have been introduced, but the effect of disease-gene identification varies in different text mining models. Thus, the idea of incorporating more text mining models may be beneficial to obtain more refined and accurate knowledge. However, how to effectively combine these models still remains a challenging question in machine learning. In particular, it is a non-trivial issue to guarantee that the integrated model performs better than the best individual model. Results We present a multi-view approach to retrieve biomedical knowledge using different controlled vocabularies. These controlled vocabularies are selected on the basis of nine well-known bio-ontologies and are applied to index the vast amounts of gene-based free-text information available in the MEDLINE repository. The text mining result specified by a vocabulary is considered as a view and the obtained multiple views are integrated by multi-source learning algorithms. We investigate the effect of integration in two fundamental computational disease gene identification tasks: gene prioritization and gene clustering. The performance of the proposed approach is systematically evaluated and compared on real benchmark data sets. In both tasks, the multi-view approach demonstrates significantly better performance than other comparing methods. Conclusions In practical research, the relevance of specific vocabulary pertaining to the task is usually unknown. In such case, multi-view text mining is a superior and promising strategy for text-based disease gene identification. PMID:20074336
String Mining in Bioinformatics
NASA Astrophysics Data System (ADS)
Abouelhoda, Mohamed; Ghanem, Moustafa
Sequence analysis is a major area in bioinformatics encompassing the methods and techniques for studying the biological sequences, DNA, RNA, and proteins, on the linear structure level. The focus of this area is generally on the identification of intra- and inter-molecular similarities. Identifying intra-molecular similarities boils down to detecting repeated segments within a given sequence, while identifying inter-molecular similarities amounts to spotting common segments among two or multiple sequences. From a data mining point of view, sequence analysis is nothing but string- or pattern mining specific to biological strings. For a long time, this point of view, however, has not been explicitly embraced neither in the data mining nor in the sequence analysis text books, which may be attributed to the co-evolution of the two apparently independent fields. In other words, although the word “data-mining” is almost missing in the sequence analysis literature, its basic concepts have been implicitly applied. Interestingly, recent research in biological sequence analysis introduced efficient solutions to many problems in data mining, such as querying and analyzing time series [49,53], extracting information from web pages [20], fighting spam mails [50], detecting plagiarism [22], and spotting duplications in software systems [14].
78 FR 22557 - Notice of Proposed Information Collection; Request for Comments
Federal Register 2010, 2011, 2012, 2013, 2014
2013-04-16
... to request approval for the collection of information for the Abandoned Mine Land Problem Area... collection of information found in the form OSM-76, Abandoned Mine Land Problem Area Description form. OSM is... activity: Title: OSM-76--Abandoned Mine Land Problem Area Description Form. OMB Control Number: 1029-0087...
Information Extraction for Clinical Data Mining: A Mammography Case Study
Nassif, Houssam; Woods, Ryan; Burnside, Elizabeth; Ayvaci, Mehmet; Shavlik, Jude; Page, David
2013-01-01
Breast cancer is the leading cause of cancer mortality in women between the ages of 15 and 54. During mammography screening, radiologists use a strict lexicon (BI-RADS) to describe and report their findings. Mammography records are then stored in a well-defined database format (NMD). Lately, researchers have applied data mining and machine learning techniques to these databases. They successfully built breast cancer classifiers that can help in early detection of malignancy. However, the validity of these models depends on the quality of the underlying databases. Unfortunately, most databases suffer from inconsistencies, missing data, inter-observer variability and inappropriate term usage. In addition, many databases are not compliant with the NMD format and/or solely consist of text reports. BI-RADS feature extraction from free text and consistency checks between recorded predictive variables and text reports are crucial to addressing this problem. We describe a general scheme for concept information retrieval from free text given a lexicon, and present a BI-RADS features extraction algorithm for clinical data mining. It consists of a syntax analyzer, a concept finder and a negation detector. The syntax analyzer preprocesses the input into individual sentences. The concept finder uses a semantic grammar based on the BI-RADS lexicon and the experts’ input. It parses sentences detecting BI-RADS concepts. Once a concept is located, a lexical scanner checks for negation. Our method can handle multiple latent concepts within the text, filtering out ultrasound concepts. On our dataset, our algorithm achieves 97.7% precision, 95.5% recall and an F1-score of 0.97. It outperforms manual feature extraction at the 5% statistical significance level. PMID:23765123
Information Extraction for Clinical Data Mining: A Mammography Case Study.
Nassif, Houssam; Woods, Ryan; Burnside, Elizabeth; Ayvaci, Mehmet; Shavlik, Jude; Page, David
2009-01-01
Breast cancer is the leading cause of cancer mortality in women between the ages of 15 and 54. During mammography screening, radiologists use a strict lexicon (BI-RADS) to describe and report their findings. Mammography records are then stored in a well-defined database format (NMD). Lately, researchers have applied data mining and machine learning techniques to these databases. They successfully built breast cancer classifiers that can help in early detection of malignancy. However, the validity of these models depends on the quality of the underlying databases. Unfortunately, most databases suffer from inconsistencies, missing data, inter-observer variability and inappropriate term usage. In addition, many databases are not compliant with the NMD format and/or solely consist of text reports. BI-RADS feature extraction from free text and consistency checks between recorded predictive variables and text reports are crucial to addressing this problem. We describe a general scheme for concept information retrieval from free text given a lexicon, and present a BI-RADS features extraction algorithm for clinical data mining. It consists of a syntax analyzer, a concept finder and a negation detector. The syntax analyzer preprocesses the input into individual sentences. The concept finder uses a semantic grammar based on the BI-RADS lexicon and the experts' input. It parses sentences detecting BI-RADS concepts. Once a concept is located, a lexical scanner checks for negation. Our method can handle multiple latent concepts within the text, filtering out ultrasound concepts. On our dataset, our algorithm achieves 97.7% precision, 95.5% recall and an F 1 -score of 0.97. It outperforms manual feature extraction at the 5% statistical significance level.
Mining adverse drug reactions from online healthcare forums using hidden Markov model.
Sampathkumar, Hariprasad; Chen, Xue-wen; Luo, Bo
2014-10-23
Adverse Drug Reactions are one of the leading causes of injury or death among patients undergoing medical treatments. Not all Adverse Drug Reactions are identified before a drug is made available in the market. Current post-marketing drug surveillance methods, which are based purely on voluntary spontaneous reports, are unable to provide the early indications necessary to prevent the occurrence of such injuries or fatalities. The objective of this research is to extract reports of adverse drug side-effects from messages in online healthcare forums and use them as early indicators to assist in post-marketing drug surveillance. We treat the task of extracting adverse side-effects of drugs from healthcare forum messages as a sequence labeling problem and present a Hidden Markov Model(HMM) based Text Mining system that can be used to classify a message as containing drug side-effect information and then extract the adverse side-effect mentions from it. A manually annotated dataset from http://www.medications.com is used in the training and validation of the HMM based Text Mining system. A 10-fold cross-validation on the manually annotated dataset yielded on average an F-Score of 0.76 from the HMM Classifier, in comparison to 0.575 from the Baseline classifier. Without the Plain Text Filter component as a part of the Text Processing module, the F-Score of the HMM Classifier was reduced to 0.378 on average, while absence of the HTML Filter component was found to have no impact. Reducing the Drug names dictionary size by half, on average reduced the F-Score of the HMM Classifier to 0.359, while a similar reduction to the side-effects dictionary yielded an F-Score of 0.651 on average. Adverse side-effects mined from http://www.medications.com and http://www.steadyhealth.com were found to match the Adverse Drug Reactions on the Drug Package Labels of several drugs. In addition, some novel adverse side-effects, which can be potential Adverse Drug Reactions, were also identified. The results from the HMM based Text Miner are encouraging to pursue further enhancements to this approach. The mined novel side-effects can act as early indicators for health authorities to help focus their efforts in post-marketing drug surveillance.
Tree planting - strip-mined area in Maryland
Fred L. Bagley
1980-01-01
This report is written to elucidate some of the problems encountered in the planting of trees on strip-mined areas in Maryland. When problems are recognized, normally a solution (or at least, an improvement) can be instituted to alleviate the problem. The methods cited herein are those of experienced foresters engaged in strip-mine planting during the past seventeen...
ERIC Educational Resources Information Center
Mei, Qiaozhu
2009-01-01
With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and the location where a blog article is written, and the…
Small, Aeron M; Kiss, Daniel H; Zlatsin, Yevgeny; Birtwell, David L; Williams, Heather; Guerraty, Marie A; Han, Yuchi; Anwaruddin, Saif; Holmes, John H; Chirinos, Julio A; Wilensky, Robert L; Giri, Jay; Rader, Daniel J
2017-08-01
Interrogation of the electronic health record (EHR) using billing codes as a surrogate for diagnoses of interest has been widely used for clinical research. However, the accuracy of this methodology is variable, as it reflects billing codes rather than severity of disease, and depends on the disease and the accuracy of the coding practitioner. Systematic application of text mining to the EHR has had variable success for the detection of cardiovascular phenotypes. We hypothesize that the application of text mining algorithms to cardiovascular procedure reports may be a superior method to identify patients with cardiovascular conditions of interest. We adapted the Oracle product Endeca, which utilizes text mining to identify terms of interest from a NoSQL-like database, for purposes of searching cardiovascular procedure reports and termed the tool "PennSeek". We imported 282,569 echocardiography reports representing 81,164 individuals and 27,205 cardiac catheterization reports representing 14,567 individuals from non-searchable databases into PennSeek. We then applied clinical criteria to these reports in PennSeek to identify patients with trileaflet aortic stenosis (TAS) and coronary artery disease (CAD). Accuracy of patient identification by text mining through PennSeek was compared with ICD-9 billing codes. Text mining identified 7115 patients with TAS and 9247 patients with CAD. ICD-9 codes identified 8272 patients with TAS and 6913 patients with CAD. 4346 patients with AS and 6024 patients with CAD were identified by both approaches. A randomly selected sample of 200-250 patients uniquely identified by text mining was compared with 200-250 patients uniquely identified by billing codes for both diseases. We demonstrate that text mining was superior, with a positive predictive value (PPV) of 0.95 compared to 0.53 by ICD-9 for TAS, and a PPV of 0.97 compared to 0.86 for CAD. These results highlight the superiority of text mining algorithms applied to electronic cardiovascular procedure reports in the identification of phenotypes of interest for cardiovascular research. Copyright © 2017. Published by Elsevier Inc.
Problems in using p-curve analysis and text-mining to detect rate of p-hacking and evidential value.
Bishop, Dorothy V M; Thompson, Paul A
2016-01-01
Background. The p-curve is a plot of the distribution of p-values reported in a set of scientific studies. Comparisons between ranges of p-values have been used to evaluate fields of research in terms of the extent to which studies have genuine evidential value, and the extent to which they suffer from bias in the selection of variables and analyses for publication, p-hacking. Methods. p-hacking can take various forms. Here we used R code to simulate the use of ghost variables, where an experimenter gathers data on several dependent variables but reports only those with statistically significant effects. We also examined a text-mined dataset used by Head et al. (2015) and assessed its suitability for investigating p-hacking. Results. We show that when there is ghost p-hacking, the shape of the p-curve depends on whether dependent variables are intercorrelated. For uncorrelated variables, simulated p-hacked data do not give the "p-hacking bump" just below .05 that is regarded as evidence of p-hacking, though there is a negative skew when simulated variables are inter-correlated. The way p-curves vary according to features of underlying data poses problems when automated text mining is used to detect p-values in heterogeneous sets of published papers. Conclusions. The absence of a bump in the p-curve is not indicative of lack of p-hacking. Furthermore, while studies with evidential value will usually generate a right-skewed p-curve, we cannot treat a right-skewed p-curve as an indicator of the extent of evidential value, unless we have a model specific to the type of p-values entered into the analysis. We conclude that it is not feasible to use the p-curve to estimate the extent of p-hacking and evidential value unless there is considerable control over the type of data entered into the analysis. In particular, p-hacking with ghost variables is likely to be missed.
Community challenges in biomedical text mining over 10 years: success, failure and the future
Huang, Chung-Chi
2016-01-01
One effective way to improve the state of the art is through competitions. Following the success of the Critical Assessment of protein Structure Prediction (CASP) in bioinformatics research, a number of challenge evaluations have been organized by the text-mining research community to assess and advance natural language processing (NLP) research for biomedicine. In this article, we review the different community challenge evaluations held from 2002 to 2014 and their respective tasks. Furthermore, we examine these challenge tasks through their targeted problems in NLP research and biomedical applications, respectively. Next, we describe the general workflow of organizing a Biomedical NLP (BioNLP) challenge and involved stakeholders (task organizers, task data producers, task participants and end users). Finally, we summarize the impact and contributions by taking into account different BioNLP challenges as a whole, followed by a discussion of their limitations and difficulties. We conclude with future trends in BioNLP challenge evaluations. PMID:25935162
NASA Technical Reports Server (NTRS)
Wier, C. E.; Wobber, F. J. (Principal Investigator); Russell, O. R.; Amato, R. V.; Leshendok, T.
1973-01-01
The author has identified the following significant results. The Kings Station Mine in Gibson County, Indiana has experienced considerable roof fall problems. Detailed fracture mapping of the mine area was done with ERTS-1 and aircraft imagery, and a prediction map of roof problem areas was produced in advance of a visit. The visit to the mine and discussions with the operator indicated that of four zones mapped as potential problem areas, three coincided with areas of excessive roof fall. This positive correlation of 75% lends confidence to the validity of the technique being applied in the investigation. The mine officials expressed an interest in the project and are anxious to see the final product maps which are forthcoming.
Biocuration workflows and text mining: overview of the BioCreative 2012 Workshop Track II.
Lu, Zhiyong; Hirschman, Lynette
2012-01-01
Manual curation of data from the biomedical literature is a rate-limiting factor for many expert curated databases. Despite the continuing advances in biomedical text mining and the pressing needs of biocurators for better tools, few existing text-mining tools have been successfully integrated into production literature curation systems such as those used by the expert curated databases. To close this gap and better understand all aspects of literature curation, we invited submissions of written descriptions of curation workflows from expert curated databases for the BioCreative 2012 Workshop Track II. We received seven qualified contributions, primarily from model organism databases. Based on these descriptions, we identified commonalities and differences across the workflows, the common ontologies and controlled vocabularies used and the current and desired uses of text mining for biocuration. Compared to a survey done in 2009, our 2012 results show that many more databases are now using text mining in parts of their curation workflows. In addition, the workshop participants identified text-mining aids for finding gene names and symbols (gene indexing), prioritization of documents for curation (document triage) and ontology concept assignment as those most desired by the biocurators. DATABASE URL: http://www.biocreative.org/tasks/bc-workshop-2012/workflow/.
NASA Astrophysics Data System (ADS)
Znikina, Ludmila; Rozhneva, Elena
2017-11-01
The article deals with the distribution of informative intensity of the English-language scientific text based on its structural features contributing to the process of formalization of the scientific text and the preservation of the adequacy of the text with derived semantic information in relation to the primary. Discourse analysis is built on specific compositional and meaningful examples of scientific texts taken from the mining field. It also analyzes the adequacy of the translation of foreign texts into another language, the relationships between elements of linguistic systems, the degree of a formal conformance, translation with the specific objectives and information needs of the recipient. Some key words and ideas are emphasized in the paragraphs of the English-language mining scientific texts. The article gives the characteristic features of the structure of paragraphs of technical text and examples of constructions in English scientific texts based on a mining theme with the aim to explain the possible ways of their adequate translation.
An Evaluation of Text Mining Tools as Applied to Selected Scientific and Engineering Literature.
ERIC Educational Resources Information Center
Trybula, Walter J.; Wyllys, Ronald E.
2000-01-01
Addresses an approach to the discovery of scientific knowledge through an examination of data mining and text mining techniques. Presents the results of experiments that investigated knowledge acquisition from a selected set of technical documents by domain experts. (Contains 15 references.) (Author/LRW)
ERIC Educational Resources Information Center
Chen, Hsinchun
2003-01-01
Discusses information retrieval techniques used on the World Wide Web. Topics include machine learning in information extraction; relevance feedback; information filtering and recommendation; text classification and text clustering; Web mining, based on data mining techniques; hyperlink structure; and Web size. (LRW)
Data Mining and Complex Problems: Case Study in Composite Materials
NASA Technical Reports Server (NTRS)
Rabelo, Luis; Marin, Mario
2009-01-01
Data mining is defined as the discovery of useful, possibly unexpected, patterns and relationships in data using statistical and non-statistical techniques in order to develop schemes for decision and policy making. Data mining can be used to discover the sources and causes of problems in complex systems. In addition, data mining can support simulation strategies by finding the different constants and parameters to be used in the development of simulation models. This paper introduces a framework for data mining and its application to complex problems. To further explain some of the concepts outlined in this paper, the potential application to the NASA Shuttle Reinforced Carbon-Carbon structures and genetic programming is used as an illustration.
The Islamic State Battle Plan: Press Release Natural Language Processing
2016-06-01
Processing, text mining , corpus, generalized linear model, cascade, R Shiny, leaflet, data visualization 15. NUMBER OF PAGES 83 16. PRICE CODE...Terrorism and Responses to Terrorism TDM Term Document Matrix TF Term Frequency TF-IDF Term Frequency-Inverse Document Frequency tm text mining (R...package=leaflet. Feinerer I, Hornik K (2015) Text Mining Package “tm,” Version 0.6-2. (Jul 3) https://cran.r-project.org/web/packages/tm/tm.pdf
OntoGene web services for biomedical text mining.
Rinaldi, Fabio; Clematide, Simon; Marques, Hernani; Ellendorff, Tilia; Romacker, Martin; Rodriguez-Esteban, Raul
2014-01-01
Text mining services are rapidly becoming a crucial component of various knowledge management pipelines, for example in the process of database curation, or for exploration and enrichment of biomedical data within the pharmaceutical industry. Traditional architectures, based on monolithic applications, do not offer sufficient flexibility for a wide range of use case scenarios, and therefore open architectures, as provided by web services, are attracting increased interest. We present an approach towards providing advanced text mining capabilities through web services, using a recently proposed standard for textual data interchange (BioC). The web services leverage a state-of-the-art platform for text mining (OntoGene) which has been tested in several community-organized evaluation challenges,with top ranked results in several of them.
Text mining patents for biomedical knowledge.
Rodriguez-Esteban, Raul; Bundschus, Markus
2016-06-01
Biomedical text mining of scientific knowledge bases, such as Medline, has received much attention in recent years. Given that text mining is able to automatically extract biomedical facts that revolve around entities such as genes, proteins, and drugs, from unstructured text sources, it is seen as a major enabler to foster biomedical research and drug discovery. In contrast to the biomedical literature, research into the mining of biomedical patents has not reached the same level of maturity. Here, we review existing work and highlight the associated technical challenges that emerge from automatically extracting facts from patents. We conclude by outlining potential future directions in this domain that could help drive biomedical research and drug discovery. Copyright © 2016 Elsevier Ltd. All rights reserved.
OntoGene web services for biomedical text mining
2014-01-01
Text mining services are rapidly becoming a crucial component of various knowledge management pipelines, for example in the process of database curation, or for exploration and enrichment of biomedical data within the pharmaceutical industry. Traditional architectures, based on monolithic applications, do not offer sufficient flexibility for a wide range of use case scenarios, and therefore open architectures, as provided by web services, are attracting increased interest. We present an approach towards providing advanced text mining capabilities through web services, using a recently proposed standard for textual data interchange (BioC). The web services leverage a state-of-the-art platform for text mining (OntoGene) which has been tested in several community-organized evaluation challenges, with top ranked results in several of them. PMID:25472638
Detecting causality from online psychiatric texts using inter-sentential language patterns
2012-01-01
Background Online psychiatric texts are natural language texts expressing depressive problems, published by Internet users via community-based web services such as web forums, message boards and blogs. Understanding the cause-effect relations embedded in these psychiatric texts can provide insight into the authors’ problems, thus increasing the effectiveness of online psychiatric services. Methods Previous studies have proposed the use of word pairs extracted from a set of sentence pairs to identify cause-effect relations between sentences. A word pair is made up of two words, with one coming from the cause text span and the other from the effect text span. Analysis of the relationship between these words can be used to capture individual word associations between cause and effect sentences. For instance, (broke up, life) and (boyfriend, meaningless) are two word pairs extracted from the sentence pair: “I broke up with my boyfriend. Life is now meaningless to me”. The major limitation of word pairs is that individual words in sentences usually cannot reflect the exact meaning of the cause and effect events, and thus may produce semantically incomplete word pairs, as the previous examples show. Therefore, this study proposes the use of inter-sentential language patterns such as ≪broke up, boyfriend>,
Negative and Positive Association Rules Mining from Text Using Frequent and Infrequent Itemsets
Mahmood, Sajid; Shahbaz, Muhammad; Guergachi, Aziz
2014-01-01
Association rule mining research typically focuses on positive association rules (PARs), generated from frequently occurring itemsets. However, in recent years, there has been a significant research focused on finding interesting infrequent itemsets leading to the discovery of negative association rules (NARs). The discovery of infrequent itemsets is far more difficult than their counterparts, that is, frequent itemsets. These problems include infrequent itemsets discovery and generation of accurate NARs, and their huge number as compared with positive association rules. In medical science, for example, one is interested in factors which can either adjudicate the presence of a disease or write-off of its possibility. The vivid positive symptoms are often obvious; however, negative symptoms are subtler and more difficult to recognize and diagnose. In this paper, we propose an algorithm for discovering positive and negative association rules among frequent and infrequent itemsets. We identify associations among medications, symptoms, and laboratory results using state-of-the-art data mining technology. PMID:24955429
SEMINAR PUBLICATION: MANAGING ENVIRONMENTAL PROBLEMS AT INACTIVE AND ABANDONED METALS MINE SITES
Environmental problems associated with abandoned and inactive mines are addressed along with some approaches to resolving those problems, including case studies demonstrating technologies that have worked. New technologies being investigated are addressed also.
Text and Structural Data Mining of Influenza Mentions in Web and Social Media
DOE Office of Scientific and Technical Information (OSTI.GOV)
Corley, Courtney D.; Cook, Diane; Mikler, Armin R.
Text and structural data mining of Web and social media (WSM) provides a novel disease surveillance resource and can identify online communities for targeted public health communications (PHC) to assure wide dissemination of pertinent information. WSM that mention influenza are harvested over a 24-week period, 5-October-2008 to 21-March-2009. Link analysis reveals communities for targeted PHC. Text mining is shown to identify trends in flu posts that correlate to real-world influenza-like-illness patient report data. We also bring to bear a graph-based data mining technique to detect anomalies among flu blogs connected by publisher type, links, and user-tags.
Vaccine adverse event text mining system for extracting features from vaccine safety reports.
Botsis, Taxiarchis; Buttolph, Thomas; Nguyen, Michael D; Winiecki, Scott; Woo, Emily Jane; Ball, Robert
2012-01-01
To develop and evaluate a text mining system for extracting key clinical features from vaccine adverse event reporting system (VAERS) narratives to aid in the automated review of adverse event reports. Based upon clinical significance to VAERS reviewing physicians, we defined the primary (diagnosis and cause of death) and secondary features (eg, symptoms) for extraction. We built a novel vaccine adverse event text mining (VaeTM) system based on a semantic text mining strategy. The performance of VaeTM was evaluated using a total of 300 VAERS reports in three sequential evaluations of 100 reports each. Moreover, we evaluated the VaeTM contribution to case classification; an information retrieval-based approach was used for the identification of anaphylaxis cases in a set of reports and was compared with two other methods: a dedicated text classifier and an online tool. The performance metrics of VaeTM were text mining metrics: recall, precision and F-measure. We also conducted a qualitative difference analysis and calculated sensitivity and specificity for classification of anaphylaxis cases based on the above three approaches. VaeTM performed best in extracting diagnosis, second level diagnosis, drug, vaccine, and lot number features (lenient F-measure in the third evaluation: 0.897, 0.817, 0.858, 0.874, and 0.914, respectively). In terms of case classification, high sensitivity was achieved (83.1%); this was equal and better compared to the text classifier (83.1%) and the online tool (40.7%), respectively. Our VaeTM implementation of a semantic text mining strategy shows promise in providing accurate and efficient extraction of key features from VAERS narratives.
Extracting biomedical events from pairs of text entities
2015-01-01
Background Huge amounts of electronic biomedical documents, such as molecular biology reports or genomic papers are generated daily. Nowadays, these documents are mainly available in the form of unstructured free texts, which require heavy processing for their registration into organized databases. This organization is instrumental for information retrieval, enabling to answer the advanced queries of researchers and practitioners in biology, medicine, and related fields. Hence, the massive data flow calls for efficient automatic methods of text-mining that extract high-level information, such as biomedical events, from biomedical text. The usual computational tools of Natural Language Processing cannot be readily applied to extract these biomedical events, due to the peculiarities of the domain. Indeed, biomedical documents contain highly domain-specific jargon and syntax. These documents also describe distinctive dependencies, making text-mining in molecular biology a specific discipline. Results We address biomedical event extraction as the classification of pairs of text entities into the classes corresponding to event types. The candidate pairs of text entities are recursively provided to a multiclass classifier relying on Support Vector Machines. This recursive process extracts events involving other events as arguments. Compared to joint models based on Markov Random Fields, our model simplifies inference and hence requires shorter training and prediction times along with lower memory capacity. Compared to usual pipeline approaches, our model passes over a complex intermediate problem, while making a more extensive usage of sophisticated joint features between text entities. Our method focuses on the core event extraction of the Genia task of BioNLP challenges yielding the best result reported so far on the 2013 edition. PMID:26201478
Yuan, Soe-Tsyr; Sun, Jerry
2005-10-01
Development of algorithms for automated text categorization in massive text document sets is an important research area of data mining and knowledge discovery. Most of the text-clustering methods were grounded in the term-based measurement of distance or similarity, ignoring the structure of the documents. In this paper, we present a novel method named structured cosine similarity (SCS) that furnishes document clustering with a new way of modeling on document summarization, considering the structure of the documents so as to improve the performance of document clustering in terms of quality, stability, and efficiency. This study was motivated by the problem of clustering speech documents (of no rich document features) attained from the wireless experience oral sharing conducted by mobile workforce of enterprises, fulfilling audio-based knowledge management. In other words, this problem aims to facilitate knowledge acquisition and sharing by speech. The evaluations also show fairly promising results on our method of structured cosine similarity.
DISEASES: text mining and data integration of disease-gene associations.
Pletscher-Frankild, Sune; Pallejà, Albert; Tsafou, Kalliopi; Binder, Janos X; Jensen, Lars Juhl
2015-03-01
Text mining is a flexible technology that can be applied to numerous different tasks in biology and medicine. We present a system for extracting disease-gene associations from biomedical abstracts. The system consists of a highly efficient dictionary-based tagger for named entity recognition of human genes and diseases, which we combine with a scoring scheme that takes into account co-occurrences both within and between sentences. We show that this approach is able to extract half of all manually curated associations with a false positive rate of only 0.16%. Nonetheless, text mining should not stand alone, but be combined with other types of evidence. For this reason, we have developed the DISEASES resource, which integrates the results from text mining with manually curated disease-gene associations, cancer mutation data, and genome-wide association studies from existing databases. The DISEASES resource is accessible through a web interface at http://diseases.jensenlab.org/, where the text-mining software and all associations are also freely available for download. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
Learning the Structure of Biomedical Relationships from Unstructured Text
Percha, Bethany; Altman, Russ B.
2015-01-01
The published biomedical research literature encompasses most of our understanding of how drugs interact with gene products to produce physiological responses (phenotypes). Unfortunately, this information is distributed throughout the unstructured text of over 23 million articles. The creation of structured resources that catalog the relationships between drugs and genes would accelerate the translation of basic molecular knowledge into discoveries of genomic biomarkers for drug response and prediction of unexpected drug-drug interactions. Extracting these relationships from natural language sentences on such a large scale, however, requires text mining algorithms that can recognize when different-looking statements are expressing similar ideas. Here we describe a novel algorithm, Ensemble Biclustering for Classification (EBC), that learns the structure of biomedical relationships automatically from text, overcoming differences in word choice and sentence structure. We validate EBC's performance against manually-curated sets of (1) pharmacogenomic relationships from PharmGKB and (2) drug-target relationships from DrugBank, and use it to discover new drug-gene relationships for both knowledge bases. We then apply EBC to map the complete universe of drug-gene relationships based on their descriptions in Medline, revealing unexpected structure that challenges current notions about how these relationships are expressed in text. For instance, we learn that newer experimental findings are described in consistently different ways than established knowledge, and that seemingly pure classes of relationships can exhibit interesting chimeric structure. The EBC algorithm is flexible and adaptable to a wide range of problems in biomedical text mining. PMID:26219079
Simmons, Michael; Singhal, Ayush; Lu, Zhiyong
2018-01-01
The key question of precision medicine is whether it is possible to find clinically actionable granularity in diagnosing disease and classifying patient risk. The advent of next generation sequencing and the widespread adoption of electronic health records (EHRs) have provided clinicians and researchers a wealth of data and made possible the precise characterization of individual patient genotypes and phenotypes. Unstructured text — found in biomedical publications and clinical notes — is an important component of genotype and phenotype knowledge. Publications in the biomedical literature provide essential information for interpreting genetic data. Likewise, clinical notes contain the richest source of phenotype information in EHRs. Text mining can render these texts computationally accessible and support information extraction and hypothesis generation. This chapter reviews the mechanics of text mining in precision medicine and discusses several specific use cases, including database curation for personalized cancer medicine, patient outcome prediction from EHR-derived cohorts, and pharmacogenomic research. Taken as a whole, these use cases demonstrate how text mining enables effective utilization of existing knowledge sources and thus promotes increased value for patients and healthcare systems. Text mining is an indispensable tool for translating genotype-phenotype data into effective clinical care that will undoubtedly play an important role in the eventual realization of precision medicine. PMID:27807747
Simmons, Michael; Singhal, Ayush; Lu, Zhiyong
2016-01-01
The key question of precision medicine is whether it is possible to find clinically actionable granularity in diagnosing disease and classifying patient risk. The advent of next-generation sequencing and the widespread adoption of electronic health records (EHRs) have provided clinicians and researchers a wealth of data and made possible the precise characterization of individual patient genotypes and phenotypes. Unstructured text-found in biomedical publications and clinical notes-is an important component of genotype and phenotype knowledge. Publications in the biomedical literature provide essential information for interpreting genetic data. Likewise, clinical notes contain the richest source of phenotype information in EHRs. Text mining can render these texts computationally accessible and support information extraction and hypothesis generation. This chapter reviews the mechanics of text mining in precision medicine and discusses several specific use cases, including database curation for personalized cancer medicine, patient outcome prediction from EHR-derived cohorts, and pharmacogenomic research. Taken as a whole, these use cases demonstrate how text mining enables effective utilization of existing knowledge sources and thus promotes increased value for patients and healthcare systems. Text mining is an indispensable tool for translating genotype-phenotype data into effective clinical care that will undoubtedly play an important role in the eventual realization of precision medicine.
Zhao, Yufeng; Xie, Qi; He, Liyun; Liu, Baoyan; Li, Kun; Zhang, Xiang; Bai, Wenjing; Luo, Lin; Jing, Xianghong; Huo, Ruili
2014-10-01
To help researchers selecting appropriate data mining models to provide better evidence for the clinical practice of Traditional Chinese Medicine (TCM) diagnosis and therapy. Clinical issues based on data mining models were comprehensively summarized from four significant elements of the clinical studies: symptoms, symptom patterns, herbs, and efficacy. Existing problems were further generalized to determine the relevant factors of the performance of data mining models, e.g. data type, samples, parameters, variable labels. Combining these relevant factors, the TCM clinical data features were compared with regards to statistical characters and informatics properties. Data models were compared simultaneously from the view of applied conditions and suitable scopes. The main application problems were the inconsistent data type and the small samples for the used data mining models, which caused the inappropriate results, even the mistake results. These features, i.e. advantages, disadvantages, satisfied data types, tasks of data mining, and the TCM issues, were summarized and compared. By aiming at the special features of different data mining models, the clinical doctors could select the suitable data mining models to resolve the TCM problem.
THE EPA/DOE MINE WASTE TECHNOLOGY PROGRAM
Mining activities in the US (not counting coal) produce between 1-2B tons of mine waste annually. Since many of the ore mines involve sulfide minerals, the production of acid mine drainage (AMD) is a common problem from these abandoned mine sites. The combination of acidity, heav...
Scholarly Information Extraction Is Going to Make a Quantum Leap with PubMed Central (PMC).
Matthies, Franz; Hahn, Udo
2017-01-01
With the increasing availability of complete full texts (journal articles), rather than their surrogates (titles, abstracts), as resources for text analytics, entirely new opportunities arise for information extraction and text mining from scholarly publications. Yet, we gathered evidence that a range of problems are encountered for full-text processing when biomedical text analytics simply reuse existing NLP pipelines which were developed on the basis of abstracts (rather than full texts). We conducted experiments with four different relation extraction engines all of which were top performers in previous BioNLP Event Extraction Challenges. We found that abstract-trained engines loose up to 6.6% F-score points when run on full-text data. Hence, the reuse of existing abstract-based NLP software in a full-text scenario is considered harmful because of heavy performance losses. Given the current lack of annotated full-text resources to train on, our study quantifies the price paid for this short cut.
Hammond, Kenric W; Ben-Ari, Alon Y; Laundry, Ryan J; Boyko, Edward J; Samore, Matthew H
2015-12-01
Free text in electronic health records resists large-scale analysis. Text records facts of interest not found in encoded data, and text mining enables their retrieval and quantification. The U.S. Department of Veterans Affairs (VA) clinical data repository affords an opportunity to apply text-mining methodology to study clinical questions in large populations. To assess the feasibility of text mining, investigation of the relationship between exposure to adverse childhood experiences (ACEs) and recorded diagnoses was conducted among all VA-treated Gulf war veterans, utilizing all progress notes recorded from 2000-2011. Text processing extracted ACE exposures recorded among 44.7 million clinical notes belonging to 243,973 veterans. The relationship of ACE exposure to adult illnesses was analyzed using logistic regression. Bias considerations were assessed. ACE score was strongly associated with suicide attempts and serious mental disorders (ORs = 1.84 to 1.97), and less so with behaviorally mediated and somatic conditions (ORs = 1.02 to 1.36) per unit. Bias adjustments did not remove persistent associations between ACE score and most illnesses. Text mining to detect ACE exposure in a large population was feasible. Analysis of the relationship between ACE score and adult health conditions yielded patterns of association consistent with prior research. Copyright © 2015 International Society for Traumatic Stress Studies.
Intelligent Information Retrieval and Web Mining Architecture Using SOA
ERIC Educational Resources Information Center
El-Bathy, Naser Ibrahim
2010-01-01
The study of this dissertation provides a solution to a very specific problem instance in the area of data mining, data warehousing, and service-oriented architecture in publishing and newspaper industries. The research question focuses on the integration of data mining and data warehousing. The research problem focuses on the development of…
Text mining and its potential applications in systems biology.
Ananiadou, Sophia; Kell, Douglas B; Tsujii, Jun-ichi
2006-12-01
With biomedical literature increasing at a rate of several thousand papers per week, it is impossible to keep abreast of all developments; therefore, automated means to manage the information overload are required. Text mining techniques, which involve the processes of information retrieval, information extraction and data mining, provide a means of solving this. By adding meaning to text, these techniques produce a more structured analysis of textual knowledge than simple word searches, and can provide powerful tools for the production and analysis of systems biology models.
Kreuzthaler, Markus; Miñarro-Giménez, Jose Antonio; Schulz, Stefan
2016-01-01
Big data resources are difficult to process without a scaled hardware environment that is specifically adapted to the problem. The emergence of flexible cloud-based virtualization techniques promises solutions to this problem. This paper demonstrates how a billion of lines can be processed in a reasonable amount of time in a cloud-based environment. Our use case addresses the accumulation of concept co-occurrence data in MEDLINE annotation as a series of MapReduce jobs, which can be scaled and executed in the cloud. Besides showing an efficient way solving this problem, we generated an additional resource for the scientific community to be used for advanced text mining approaches.
Proceedings: Fourth Workshop on Mining Scientific Datasets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kamath, C
Commercial applications of data mining in areas such as e-commerce, market-basket analysis, text-mining, and web-mining have taken on a central focus in the JCDD community. However, there is a significant amount of innovative data mining work taking place in the context of scientific and engineering applications that is not well represented in the mainstream KDD conferences. For example, scientific data mining techniques are being developed and applied to diverse fields such as remote sensing, physics, chemistry, biology, astronomy, structural mechanics, computational fluid dynamics etc. In these areas, data mining frequently complements and enhances existing analysis methods based on statistics, exploratorymore » data analysis, and domain-specific approaches. On the surface, it may appear that data from one scientific field, say genomics, is very different from another field, such as physics. However, despite their diversity, there is much that is common across the mining of scientific and engineering data. For example, techniques used to identify objects in images are very similar, regardless of whether the images came from a remote sensing application, a physics experiment, an astronomy observation, or a medical study. Further, with data mining being applied to new types of data, such as mesh data from scientific simulations, there is the opportunity to apply and extend data mining to new scientific domains. This one-day workshop brings together data miners analyzing science data and scientists from diverse fields to share their experiences, learn how techniques developed in one field can be applied in another, and better understand some of the newer techniques being developed in the KDD community. This is the fourth workshop on the topic of Mining Scientific Data sets; for information on earlier workshops, see http://www.ahpcrc.org/conferences/. This workshop continues the tradition of addressing challenging problems in a field where the diversity of applications is matched only by the opportunities that await a practitioner.« less
Unmanned Mine of the 21st Centuries
NASA Astrophysics Data System (ADS)
Semykina, Irina; Grigoryev, Aleksandr; Gargayev, Andrey; Zavyalov, Valeriy
2017-11-01
The article is analytical. It considers the construction principles of the automation system structure which realize the concept of «unmanned mine». All of these principles intend to deal with problems caused by a continuous complication of mining-and-geological conditions at coalmine such as the labor safety and health protection, the weak integration of different mining automation subsystems and the deficiency of optimal balance between a quantity of resource and energy consumed by mining machines and their throughput. The authors describe the main problems and neck stage of mining machines autonomation and automation subsystem. The article makes a general survey of the applied «unmanned technology» in the field of mining such as the remotely operated autonomous complexes, the underground positioning systems of mining machines using infrared radiation in mine workings etc. The concept of «unmanned mine» is considered with an example of the robotic road heading machine. In the final, the authors analyze the techniques and methods that could solve the task of underground mining without human labor.
U.S. ARMY CORPS OF ENGINEERS ABANDONED MINE LAND REMEDIATION WORKSHOP
Mining activities in the US (not counting coal) produce 1-2 billion tons of mine waste annually. Since many of the ore mines involve sulfide minerals, the production of acid mine drainage (AMD) is a common problem from these abandoned mine sites. The combination of acidity, heavy...
ASSESSING AND MANAGING MERCURY FROM HISTORIC AND CURRENT MINING ACTIVITIES
Mining activities in the US (not counting coal) produce between one and two billion tons of mine waste annually. Since many of the ore mines involve sulfide minerals, the production of acid mine drainage (AMD) is a common problem from these abandoned mine sites. The combination o...
VisualUrText: A Text Analytics Tool for Unstructured Textual Data
NASA Astrophysics Data System (ADS)
Zainol, Zuraini; Jaymes, Mohd T. H.; Nohuddin, Puteri N. E.
2018-05-01
The growing amount of unstructured text over Internet is tremendous. Text repositories come from Web 2.0, business intelligence and social networking applications. It is also believed that 80-90% of future growth data is available in the form of unstructured text databases that may potentially contain interesting patterns and trends. Text Mining is well known technique for discovering interesting patterns and trends which are non-trivial knowledge from massive unstructured text data. Text Mining covers multidisciplinary fields involving information retrieval (IR), text analysis, natural language processing (NLP), data mining, machine learning statistics and computational linguistics. This paper discusses the development of text analytics tool that is proficient in extracting, processing, analyzing the unstructured text data and visualizing cleaned text data into multiple forms such as Document Term Matrix (DTM), Frequency Graph, Network Analysis Graph, Word Cloud and Dendogram. This tool, VisualUrText, is developed to assist students and researchers for extracting interesting patterns and trends in document analyses.
Lotfian, Reza; Najafi, Mehdi
2018-02-26
Background Every year, many mining accidents occur in underground mines all over the world resulting in the death and maiming of many miners and heavy financial losses to mining companies. Underground mining accounts for an increasing share of these events due to their special circumstances and the risks of working therein. Thus, the optimal location of emergency stations within the network of an underground mine in order to provide medical first aid and transport injured people at the right time, plays an essential role in reducing deaths and disabilities caused by accidents Objective The main objective of this study is to determine the location of emergency stations (ES) within the network of an underground coal mine in order to minimize the outreach time for the injured. Methods A three-objective mathematical model is presented for placement of ES facility location selection and allocation of facilities to the injured in various stopes. Results Taking into account the radius of influence for each ES, the proposed model is capable to reduce the maximum time for provision of emergency services in the event of accident for each stope. In addition, the coverage or lack of coverage of each stope by any of the emergency facility is determined by means of Floyd-Warshall algorithm and graph. To solve the problem, a global criterion method using GAMS software is used to evaluate the accuracy and efficiency of the model. Conclusions 7 locations were selected from among 46 candidates for the establishment of emergency facilities in Tabas underground coal mine. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
BioCreative Workshops for DOE Genome Sciences: Text Mining for Metagenomics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Cathy H.; Hirschman, Lynette
The objective of this project was to host BioCreative workshops to define and develop text mining tasks to meet the needs of the Genome Sciences community, focusing on metadata information extraction in metagenomics. Following the successful introduction of metagenomics at the BioCreative IV workshop, members of the metagenomics community and BioCreative communities continued discussion to identify candidate topics for a BioCreative metagenomics track for BioCreative V. Of particular interest was the capture of environmental and isolation source information from text. The outcome was to form a “community of interest” around work on the interactive EXTRACT system, which supported interactive taggingmore » of environmental and species data. This experiment is included in the BioCreative V virtual issue of Database. In addition, there was broad participation by members of the metagenomics community in the panels held at BioCreative V, leading to valuable exchanges between the text mining developers and members of the metagenomics research community. These exchanges are reflected in a number of the overview and perspective pieces also being captured in the BioCreative V virtual issue. Overall, this conversation has exposed the metagenomics researchers to the possibilities of text mining, and educated the text mining developers to the specific needs of the metagenomics community.« less
Benchmarking infrastructure for mutation text mining
2014-01-01
Background Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. Results We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. Conclusion We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption. PMID:24568600
Benchmarking infrastructure for mutation text mining.
Klein, Artjom; Riazanov, Alexandre; Hindle, Matthew M; Baker, Christopher Jo
2014-02-25
Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption.
LAND REBORN: TOOLS FOR THE 21ST CENTURY/NATIONAL ASSOCIATION OF ABANDONED MINE LAND PROGRAMS
Mining activities in the US (not counting coal) produce 1-2 billion tons of mine waste annually. Since many of the ore mines involve sulfide minerals, the production of acid mine drainage (AMD) is a common problem from these abandoned mine sites. The combination of acidity, heavy...
DrugQuest - a text mining workflow for drug association discovery.
Papanikolaou, Nikolas; Pavlopoulos, Georgios A; Theodosiou, Theodosios; Vizirianakis, Ioannis S; Iliopoulos, Ioannis
2016-06-06
Text mining and data integration methods are gaining ground in the field of health sciences due to the exponential growth of bio-medical literature and information stored in biological databases. While such methods mostly try to extract bioentity associations from PubMed, very few of them are dedicated in mining other types of repositories such as chemical databases. Herein, we apply a text mining approach on the DrugBank database in order to explore drug associations based on the DrugBank "Description", "Indication", "Pharmacodynamics" and "Mechanism of Action" text fields. We apply Name Entity Recognition (NER) techniques on these fields to identify chemicals, proteins, genes, pathways, diseases, and we utilize the TextQuest algorithm to find additional biologically significant words. Using a plethora of similarity and partitional clustering techniques, we group the DrugBank records based on their common terms and investigate possible scenarios why these records are clustered together. Different views such as clustered chemicals based on their textual information, tag clouds consisting of Significant Terms along with the terms that were used for clustering are delivered to the user through a user-friendly web interface. DrugQuest is a text mining tool for knowledge discovery: it is designed to cluster DrugBank records based on text attributes in order to find new associations between drugs. The service is freely available at http://bioinformatics.med.uoc.gr/drugquest .
ERIC Educational Resources Information Center
Kong, Siu Cheung; Li, Ping; Song, Yanjie
2018-01-01
This study evaluated a bilingual text-mining system, which incorporated a bilingual taxonomy of key words and provided hierarchical visualization, for understanding learner-generated text in the learning management systems through automatic identification and counting of matching key words. A class of 27 in-service teachers studied a course…
MINE WASTE TECHNOLOGY PROGRAM: A SUCCESS STORY
Mining Waste generated by active and inactive mining operations is a growing problem for the mining industry, local governments, and Native American communities because of its impact on human health and the environment. In the US, the reported volume of mine waste is immense: 2 b...
Beyond accuracy: creating interoperable and scalable text-mining web services.
Wei, Chih-Hsuan; Leaman, Robert; Lu, Zhiyong
2016-06-15
The biomedical literature is a knowledge-rich resource and an important foundation for future research. With over 24 million articles in PubMed and an increasing growth rate, research in automated text processing is becoming increasingly important. We report here our recently developed web-based text mining services for biomedical concept recognition and normalization. Unlike most text-mining software tools, our web services integrate several state-of-the-art entity tagging systems (DNorm, GNormPlus, SR4GN, tmChem and tmVar) and offer a batch-processing mode able to process arbitrary text input (e.g. scholarly publications, patents and medical records) in multiple formats (e.g. BioC). We support multiple standards to make our service interoperable and allow simpler integration with other text-processing pipelines. To maximize scalability, we have preprocessed all PubMed articles, and use a computer cluster for processing large requests of arbitrary text. Our text-mining web service is freely available at http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/#curl : Zhiyong.Lu@nih.gov. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.
Community challenges in biomedical text mining over 10 years: success, failure and the future.
Huang, Chung-Chi; Lu, Zhiyong
2016-01-01
One effective way to improve the state of the art is through competitions. Following the success of the Critical Assessment of protein Structure Prediction (CASP) in bioinformatics research, a number of challenge evaluations have been organized by the text-mining research community to assess and advance natural language processing (NLP) research for biomedicine. In this article, we review the different community challenge evaluations held from 2002 to 2014 and their respective tasks. Furthermore, we examine these challenge tasks through their targeted problems in NLP research and biomedical applications, respectively. Next, we describe the general workflow of organizing a Biomedical NLP (BioNLP) challenge and involved stakeholders (task organizers, task data producers, task participants and end users). Finally, we summarize the impact and contributions by taking into account different BioNLP challenges as a whole, followed by a discussion of their limitations and difficulties. We conclude with future trends in BioNLP challenge evaluations. Published by Oxford University Press 2015. This work is written by US Government employees and is in the public domain in the US.
75 FR 51291 - National Science Board: Sunshine Act Meetings; Notice
Federal Register 2010, 2011, 2012, 2013, 2014
2010-08-19
...-Gathering Activities. [cir] COV Report Text-Mining. [cir] Design of Research Questions for External Input. [cir] SBE/CISE Text-Mining Projects. [cir] Using a Blog for Informal Input. Committee on Education and...
Improve Data Mining and Knowledge Discovery Through the Use of MatLab
NASA Technical Reports Server (NTRS)
Shaykhian, Gholam Ali; Martin, Dawn (Elliott); Beil, Robert
2011-01-01
Data mining is widely used to mine business, engineering, and scientific data. Data mining uses pattern based queries, searches, or other analyses of one or more electronic databases/datasets in order to discover or locate a predictive pattern or anomaly indicative of system failure, criminal or terrorist activity, etc. There are various algorithms, techniques and methods used to mine data; including neural networks, genetic algorithms, decision trees, nearest neighbor method, rule induction association analysis, slice and dice, segmentation, and clustering. These algorithms, techniques and methods used to detect patterns in a dataset, have been used in the development of numerous open source and commercially available products and technology for data mining. Data mining is best realized when latent information in a large quantity of data stored is discovered. No one technique solves all data mining problems; challenges are to select algorithms or methods appropriate to strengthen data/text mining and trending within given datasets. In recent years, throughout industry, academia and government agencies, thousands of data systems have been designed and tailored to serve specific engineering and business needs. Many of these systems use databases with relational algebra and structured query language to categorize and retrieve data. In these systems, data analyses are limited and require prior explicit knowledge of metadata and database relations; lacking exploratory data mining and discoveries of latent information. This presentation introduces MatLab(R) (MATrix LABoratory), an engineering and scientific data analyses tool to perform data mining. MatLab was originally intended to perform purely numerical calculations (a glorified calculator). Now, in addition to having hundreds of mathematical functions, it is a programming language with hundreds built in standard functions and numerous available toolboxes. MatLab's ease of data processing, visualization and its enormous availability of built in functionalities and toolboxes make it suitable to perform numerical computations and simulations as well as a data mining tool. Engineers and scientists can take advantage of the readily available functions/toolboxes to gain wider insight in their perspective data mining experiments.
Improve Data Mining and Knowledge Discovery through the use of MatLab
NASA Technical Reports Server (NTRS)
Shaykahian, Gholan Ali; Martin, Dawn Elliott; Beil, Robert
2011-01-01
Data mining is widely used to mine business, engineering, and scientific data. Data mining uses pattern based queries, searches, or other analyses of one or more electronic databases/datasets in order to discover or locate a predictive pattern or anomaly indicative of system failure, criminal or terrorist activity, etc. There are various algorithms, techniques and methods used to mine data; including neural networks, genetic algorithms, decision trees, nearest neighbor method, rule induction association analysis, slice and dice, segmentation, and clustering. These algorithms, techniques and methods used to detect patterns in a dataset, have been used in the development of numerous open source and commercially available products and technology for data mining. Data mining is best realized when latent information in a large quantity of data stored is discovered. No one technique solves all data mining problems; challenges are to select algorithms or methods appropriate to strengthen data/text mining and trending within given datasets. In recent years, throughout industry, academia and government agencies, thousands of data systems have been designed and tailored to serve specific engineering and business needs. Many of these systems use databases with relational algebra and structured query language to categorize and retrieve data. In these systems, data analyses are limited and require prior explicit knowledge of metadata and database relations; lacking exploratory data mining and discoveries of latent information. This presentation introduces MatLab(TradeMark)(MATrix LABoratory), an engineering and scientific data analyses tool to perform data mining. MatLab was originally intended to perform purely numerical calculations (a glorified calculator). Now, in addition to having hundreds of mathematical functions, it is a programming language with hundreds built in standard functions and numerous available toolboxes. MatLab's ease of data processing, visualization and its enormous availability of built in functionalities and toolboxes make it suitable to perform numerical computations and simulations as well as a data mining tool. Engineers and scientists can take advantage of the readily available functions/toolboxes to gain wider insight in their perspective data mining experiments.
Imitating manual curation of text-mined facts in biomedicine.
Rodriguez-Esteban, Raul; Iossifov, Ivan; Rzhetsky, Andrey
2006-09-08
Text-mining algorithms make mistakes in extracting facts from natural-language texts. In biomedical applications, which rely on use of text-mined data, it is critical to assess the quality (the probability that the message is correctly extracted) of individual facts--to resolve data conflicts and inconsistencies. Using a large set of almost 100,000 manually produced evaluations (most facts were independently reviewed more than once, producing independent evaluations), we implemented and tested a collection of algorithms that mimic human evaluation of facts provided by an automated information-extraction system. The performance of our best automated classifiers closely approached that of our human evaluators (ROC score close to 0.95). Our hypothesis is that, were we to use a larger number of human experts to evaluate any given sentence, we could implement an artificial-intelligence curator that would perform the classification job at least as accurately as an average individual human evaluator. We illustrated our analysis by visualizing the predicted accuracy of the text-mined relations involving the term cocaine.
Assimilating Text-Mining & Bio-Informatics Tools to Analyze Cellulase structures
NASA Astrophysics Data System (ADS)
Satyasree, K. P. N. V., Dr; Lalitha Kumari, B., Dr; Jyotsna Devi, K. S. N. V.; Choudri, S. M. Roy; Pratap Joshi, K.
2017-08-01
Text-mining is one of the best potential way of automatically extracting information from the huge biological literature. To exploit its prospective, the knowledge encrypted in the text should be converted to some semantic representation such as entities and relations, which could be analyzed by machines. But large-scale practical systems for this purpose are rare. But text mining could be helpful for generating or validating predictions. Cellulases have abundant applications in various industries. Cellulose degrading enzymes are cellulases and the same producing bacteria - Bacillus subtilis & fungus Pseudomonas putida were isolated from top soil of Guntur Dt. A.P. India. Absolute cultures were conserved on potato dextrose agar medium for molecular studies. In this paper, we presented how well the text mining concepts can be used to analyze cellulase producing bacteria and fungi, their comparative structures are also studied with the aid of well-establised, high quality standard bioinformatic tools such as Bioedit, Swissport, Protparam, EMBOSSwin with which a complete data on Cellulases like structure, constituents of the enzyme has been obtained.
Gurulingappa, Harsha; Toldo, Luca; Rajput, Abdul Mateen; Kors, Jan A; Taweel, Adel; Tayrouz, Yorki
2013-11-01
The aim of this study was to assess the impact of automatically detected adverse event signals from text and open-source data on the prediction of drug label changes. Open-source adverse effect data were collected from FAERS, Yellow Cards and SIDER databases. A shallow linguistic relation extraction system (JSRE) was applied for extraction of adverse effects from MEDLINE case reports. Statistical approach was applied on the extracted datasets for signal detection and subsequent prediction of label changes issued for 29 drugs by the UK Regulatory Authority in 2009. 76% of drug label changes were automatically predicted. Out of these, 6% of drug label changes were detected only by text mining. JSRE enabled precise identification of four adverse drug events from MEDLINE that were undetectable otherwise. Changes in drug labels can be predicted automatically using data and text mining techniques. Text mining technology is mature and well-placed to support the pharmacovigilance tasks. Copyright © 2013 John Wiley & Sons, Ltd.
Mining Adverse Drug Reactions in Social Media with Named Entity Recognition and Semantic Methods.
Chen, Xiaoyi; Deldossi, Myrtille; Aboukhamis, Rim; Faviez, Carole; Dahamna, Badisse; Karapetiantz, Pierre; Guenegou-Arnoux, Armelle; Girardeau, Yannick; Guillemin-Lanne, Sylvie; Lillo-Le-Louët, Agnès; Texier, Nathalie; Burgun, Anita; Katsahian, Sandrine
2017-01-01
Suspected adverse drug reactions (ADR) reported by patients through social media can be a complementary source to current pharmacovigilance systems. However, the performance of text mining tools applied to social media text data to discover ADRs needs to be evaluated. In this paper, we introduce the approach developed to mine ADR from French social media. A protocol of evaluation is highlighted, which includes a detailed sample size determination and evaluation corpus constitution. Our text mining approach provided very encouraging preliminary results with F-measures of 0.94 and 0.81 for recognition of drugs and symptoms respectively, and with F-measure of 0.70 for ADR detection. Therefore, this approach is promising for downstream pharmacovigilance analysis.
Detection and Evaluation of Cheating on College Exams Using Supervised Classification
ERIC Educational Resources Information Center
Cavalcanti, Elmano Ramalho; Pires, Carlos Eduardo; Cavalcanti, Elmano Pontes; Pires, Vládia Freire
2012-01-01
Text mining has been used for various purposes, such as document classification and extraction of domain-specific information from text. In this paper we present a study in which text mining methodology and algorithms were properly employed for academic dishonesty (cheating) detection and evaluation on open-ended college exams, based on document…
Problems in using p-curve analysis and text-mining to detect rate of p-hacking and evidential value
Thompson, Paul A.
2016-01-01
Background. The p-curve is a plot of the distribution of p-values reported in a set of scientific studies. Comparisons between ranges of p-values have been used to evaluate fields of research in terms of the extent to which studies have genuine evidential value, and the extent to which they suffer from bias in the selection of variables and analyses for publication, p-hacking. Methods. p-hacking can take various forms. Here we used R code to simulate the use of ghost variables, where an experimenter gathers data on several dependent variables but reports only those with statistically significant effects. We also examined a text-mined dataset used by Head et al. (2015) and assessed its suitability for investigating p-hacking. Results. We show that when there is ghost p-hacking, the shape of the p-curve depends on whether dependent variables are intercorrelated. For uncorrelated variables, simulated p-hacked data do not give the “p-hacking bump” just below .05 that is regarded as evidence of p-hacking, though there is a negative skew when simulated variables are inter-correlated. The way p-curves vary according to features of underlying data poses problems when automated text mining is used to detect p-values in heterogeneous sets of published papers. Conclusions. The absence of a bump in the p-curve is not indicative of lack of p-hacking. Furthermore, while studies with evidential value will usually generate a right-skewed p-curve, we cannot treat a right-skewed p-curve as an indicator of the extent of evidential value, unless we have a model specific to the type of p-values entered into the analysis. We conclude that it is not feasible to use the p-curve to estimate the extent of p-hacking and evidential value unless there is considerable control over the type of data entered into the analysis. In particular, p-hacking with ghost variables is likely to be missed. PMID:26925335
What the papers say: Text mining for genomics and systems biology
2010-01-01
Keeping up with the rapidly growing literature has become virtually impossible for most scientists. This can have dire consequences. First, we may waste research time and resources on reinventing the wheel simply because we can no longer maintain a reliable grasp on the published literature. Second, and perhaps more detrimental, judicious (or serendipitous) combination of knowledge from different scientific disciplines, which would require following disparate and distinct research literatures, is rapidly becoming impossible for even the most ardent readers of research publications. Text mining -- the automated extraction of information from (electronically) published sources -- could potentially fulfil an important role -- but only if we know how to harness its strengths and overcome its weaknesses. As we do not expect that the rate at which scientific results are published will decrease, text mining tools are now becoming essential in order to cope with, and derive maximum benefit from, this information explosion. In genomics, this is particularly pressing as more and more rare disease-causing variants are found and need to be understood. Not being conversant with this technology may put scientists and biomedical regulators at a severe disadvantage. In this review, we introduce the basic concepts underlying modern text mining and its applications in genomics and systems biology. We hope that this review will serve three purposes: (i) to provide a timely and useful overview of the current status of this field, including a survey of present challenges; (ii) to enable researchers to decide how and when to apply text mining tools in their own research; and (iii) to highlight how the research communities in genomics and systems biology can help to make text mining from biomedical abstracts and texts more straightforward. PMID:21106487
Knowledge based word-concept model estimation and refinement for biomedical text mining.
Jimeno Yepes, Antonio; Berlanga, Rafael
2015-02-01
Text mining of scientific literature has been essential for setting up large public biomedical databases, which are being widely used by the research community. In the biomedical domain, the existence of a large number of terminological resources and knowledge bases (KB) has enabled a myriad of machine learning methods for different text mining related tasks. Unfortunately, KBs have not been devised for text mining tasks but for human interpretation, thus performance of KB-based methods is usually lower when compared to supervised machine learning methods. The disadvantage of supervised methods though is they require labeled training data and therefore not useful for large scale biomedical text mining systems. KB-based methods do not have this limitation. In this paper, we describe a novel method to generate word-concept probabilities from a KB, which can serve as a basis for several text mining tasks. This method not only takes into account the underlying patterns within the descriptions contained in the KB but also those in texts available from large unlabeled corpora such as MEDLINE. The parameters of the model have been estimated without training data. Patterns from MEDLINE have been built using MetaMap for entity recognition and related using co-occurrences. The word-concept probabilities were evaluated on the task of word sense disambiguation (WSD). The results showed that our method obtained a higher degree of accuracy than other state-of-the-art approaches when evaluated on the MSH WSD data set. We also evaluated our method on the task of document ranking using MEDLINE citations. These results also showed an increase in performance over existing baseline retrieval approaches. Copyright © 2014 Elsevier Inc. All rights reserved.
75 FR 21341 - Notice of Proposed Information Collection for 1029-0087
Federal Register 2010, 2011, 2012, 2013, 2014
2010-04-23
..., and its accompanying form, OSM-76, Abandoned Mine Land Problem Area Description form. The collection... Land Problem Area Description form. OSM is requesting a 3-year term of approval for this information... Mine Land Problem Area Description Form. OMB Control Number: 1029-0087. Summary: The regulation at 886...
Assessing semantic similarity of texts - Methods and algorithms
NASA Astrophysics Data System (ADS)
Rozeva, Anna; Zerkova, Silvia
2017-12-01
Assessing the semantic similarity of texts is an important part of different text-related applications like educational systems, information retrieval, text summarization, etc. This task is performed by sophisticated analysis, which implements text-mining techniques. Text mining involves several pre-processing steps, which provide for obtaining structured representative model of the documents in a corpus by means of extracting and selecting the features, characterizing their content. Generally the model is vector-based and enables further analysis with knowledge discovery approaches. Algorithms and measures are used for assessing texts at syntactical and semantic level. An important text-mining method and similarity measure is latent semantic analysis (LSA). It provides for reducing the dimensionality of the document vector space and better capturing the text semantics. The mathematical background of LSA for deriving the meaning of the words in a given text by exploring their co-occurrence is examined. The algorithm for obtaining the vector representation of words and their corresponding latent concepts in a reduced multidimensional space as well as similarity calculation are presented.
ERIC Educational Resources Information Center
Benoit, Gerald
2002-01-01
Discusses data mining (DM) and knowledge discovery in databases (KDD), taking the view that KDD is the larger view of the entire process, with DM emphasizing the cleaning, warehousing, mining, and visualization of knowledge discovery in databases. Highlights include algorithms; users; the Internet; text mining; and information extraction.…
ERIC Educational Resources Information Center
Bowers, Alex J.; Chen, Jingjing
2015-01-01
The purpose of this study is to bring together recent innovations in the research literature around school district capital facility finance, municipal bond elections, statistical models of conditional time-varying outcomes, and data mining algorithms for automated text mining of election ballot proposals to examine the factors that influence the…
New directions in biomedical text annotation: definitions, guidelines and corpus construction
Wilbur, W John; Rzhetsky, Andrey; Shatkay, Hagit
2006-01-01
Background While biomedical text mining is emerging as an important research area, practical results have proven difficult to achieve. We believe that an important first step towards more accurate text-mining lies in the ability to identify and characterize text that satisfies various types of information needs. We report here the results of our inquiry into properties of scientific text that have sufficient generality to transcend the confines of a narrow subject area, while supporting practical mining of text for factual information. Our ultimate goal is to annotate a significant corpus of biomedical text and train machine learning methods to automatically categorize such text along certain dimensions that we have defined. Results We have identified five qualitative dimensions that we believe characterize a broad range of scientific sentences, and are therefore useful for supporting a general approach to text-mining: focus, polarity, certainty, evidence, and directionality. We define these dimensions and describe the guidelines we have developed for annotating text with regard to them. To examine the effectiveness of the guidelines, twelve annotators independently annotated the same set of 101 sentences that were randomly selected from current biomedical periodicals. Analysis of these annotations shows 70–80% inter-annotator agreement, suggesting that our guidelines indeed present a well-defined, executable and reproducible task. Conclusion We present our guidelines defining a text annotation task, along with annotation results from multiple independently produced annotations, demonstrating the feasibility of the task. The annotation of a very large corpus of documents along these guidelines is currently ongoing. These annotations form the basis for the categorization of text along multiple dimensions, to support viable text mining for experimental results, methodology statements, and other forms of information. We are currently developing machine learning methods, to be trained and tested on the annotated corpus, that would allow for the automatic categorization of biomedical text along the general dimensions that we have presented. The guidelines in full detail, along with annotated examples, are publicly available. PMID:16867190
NASA Astrophysics Data System (ADS)
Weiss, Gary M.
Rare cases are often the most interesting cases. For example, in medical diagnosis one is typically interested in identifying relatively rare diseases, such as cancer, rather than more frequently occurring ones, such as the common cold. In this chapter we discuss the role of rare cases in Data Mining. Specific problems associated with mining rare cases are discussed, followed by a description of methods for addressing these problems.
WORKSHOP ON: MINING-IMPACTED NATIVE AMERICAN LANDS
Mining waste which is generated from both active and inactive mining sites continues to be a problem for human health and ecosystems. Recent scoping studies show significant environmental impacts from mining activities primarily in the Western States. Approximately, 85% of this...
NASA Astrophysics Data System (ADS)
Scheele, C. J.; Huang, Q.
2016-12-01
In the past decade, the rise in social media has led to the development of a vast number of social media services and applications. Disaster management represents one of such applications leveraging massive data generated for event detection, response, and recovery. In order to find disaster relevant social media data, current approaches utilize natural language processing (NLP) methods based on keywords, or machine learning algorithms relying on text only. However, these approaches cannot be perfectly accurate due to the variability and uncertainty in language used on social media. To improve current methods, the enhanced text-mining framework is proposed to incorporate location information from social media and authoritative remote sensing datasets for detecting disaster relevant social media posts, which are determined by assessing the textual content using common text mining methods and how the post relates spatiotemporally to the disaster event. To assess the framework, geo-tagged Tweets were collected for three different spatial and temporal disaster events: hurricane, flood, and tornado. Remote sensing data and products for each event were then collected using RealEarthTM. Both Naive Bayes and Logistic Regression classifiers were used to compare the accuracy within the enhanced text-mining framework. Finally, the accuracies from the enhanced text-mining framework were compared to the current text-only methods for each of the case study disaster events. The results from this study address the need for more authoritative data when using social media in disaster management applications.
Mining Bug Databases for Unidentified Software Vulnerabilities
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dumidu Wijayasekara; Milos Manic; Jason Wright
2012-06-01
Identifying software vulnerabilities is becoming more important as critical and sensitive systems increasingly rely on complex software systems. It has been suggested in previous work that some bugs are only identified as vulnerabilities long after the bug has been made public. These vulnerabilities are known as hidden impact vulnerabilities. This paper discusses the feasibility and necessity to mine common publicly available bug databases for vulnerabilities that are yet to be identified. We present bug database analysis of two well known and frequently used software packages, namely Linux kernel and MySQL. It is shown that for both Linux and MySQL, amore » significant portion of vulnerabilities that were discovered for the time period from January 2006 to April 2011 were hidden impact vulnerabilities. It is also shown that the percentage of hidden impact vulnerabilities has increased in the last two years, for both software packages. We then propose an improved hidden impact vulnerability identification methodology based on text mining bug databases, and conclude by discussing a few potential problems faced by such a classifier.« less
Using Open Web APIs in Teaching Web Mining
ERIC Educational Resources Information Center
Chen, Hsinchun; Li, Xin; Chau, M.; Ho, Yi-Jen; Tseng, Chunju
2009-01-01
With the advent of the World Wide Web, many business applications that utilize data mining and text mining techniques to extract useful business information on the Web have evolved from Web searching to Web mining. It is important for students to acquire knowledge and hands-on experience in Web mining during their education in information systems…
77 FR 61406 - Agency Forms Undergoing Paperwork Reduction Act Review
Federal Register 2010, 2011, 2012, 2013, 2014
2012-10-09
... Underground Coal Mining (0920-0835 Expiration 12/31/2012)--Revision--National Institute for Occupational... occupational safety and health problems in the coal mining industry. In recent years, coal mining safety has... health, the U.S. relies on coal mining to meet its electricity needs. For this reason, the coal mining...
Zhao, Ning; Zheng, Guang; Li, Jian; Zhao, Hong-Yan; Lu, Cheng; Jiang, Miao; Zhang, Chi; Guo, Hong-Tao; Lu, Ai-Ping
2018-01-09
To identify the commonalities between rheumatoid arthritis (RA) and diabetes mellitus (DM) to understand the mechanisms of Chinese medicine (CM) in different diseases with the same treatment. A text mining approach was adopted to analyze the commonalities between RA and DM according to CM and biological elements. The major commonalities were subsequently verifified in RA and DM rat models, in which herbal formula for the treatment of both RA and DM identifified via text mining was used as the intervention. Similarities were identifified between RA and DM regarding the CM approach used for diagnosis and treatment, as well as the networks of biological activities affected by each disease, including the involvement of adhesion molecules, oxidative stress, cytokines, T-lymphocytes, apoptosis, and inflfl ammation. The Ramulus Cinnamomi-Radix Paeoniae Alba-Rhizoma Anemarrhenae is an herbal combination used to treat RA and DM. This formula demonstrated similar effects on oxidative stress and inflfl ammation in rats with collagen-induced arthritis, which supports the text mining results regarding the commonalities between RA and DM. Commonalities between the biological activities involved in RA and DM were identifified through text mining, and both RA and DM might be responsive to the same intervention at a specifific stage.
Recent developments in the reclamation of surface mined lands
Sharma, K.D.; Gough, L.P.; Kumar, S.; Sharma, B.K.; Saxena, S.K.
1997-01-01
A broad review of mine land reclamation problems and challenges in arid lands is presented with special emphasis on work recently completed in India. The economics of mining in the Indian Desert is second only to agriculture in importance. Lands disturbed by mining, however, have only recently been the focus of reclamation attempts. Studies were made and results compiled of problems associated with germplasm selection, soil, plant and overburden characterization and manipulation, plant establishment methods utilized, soil amendment needs, use and conservation of available water and the evaluation of ecosystem sustainability. Emphasis is made of the need for multi-disciplinary approaches to mine land reclamation research and for the long-term monitoring of reclamation success.
Spectral methods to detect surface mines
NASA Astrophysics Data System (ADS)
Winter, Edwin M.; Schatten Silvious, Miranda
2008-04-01
Over the past five years, advances have been made in the spectral detection of surface mines under minefield detection programs at the U. S. Army RDECOM CERDEC Night Vision and Electronic Sensors Directorate (NVESD). The problem of detecting surface land mines ranges from the relatively simple, the detection of large anti-vehicle mines on bare soil, to the very difficult, the detection of anti-personnel mines in thick vegetation. While spatial and spectral approaches can be applied to the detection of surface mines, spatial-only detection requires many pixels-on-target such that the mine is actually imaged and shape-based features can be exploited. This method is unreliable in vegetated areas because only part of the mine may be exposed, while spectral detection is possible without the mine being resolved. At NVESD, hyperspectral and multi-spectral sensors throughout the reflection and thermal spectral regimes have been applied to the mine detection problem. Data has been collected on mines in forest and desert regions and algorithms have been developed both to detect the mines as anomalies and to detect the mines based on their spectral signature. In addition to the detection of individual mines, algorithms have been developed to exploit the similarities of mines in a minefield to improve their detection probability. In this paper, the types of spectral data collected over the past five years will be summarized along with the advances in algorithm development.
Text Mining of Journal Articles for Sleep Disorder Terminologies.
Lam, Calvin; Lai, Fu-Chih; Wang, Chia-Hui; Lai, Mei-Hsin; Hsu, Nanly; Chung, Min-Huey
2016-01-01
Research on publication trends in journal articles on sleep disorders (SDs) and the associated methodologies by using text mining has been limited. The present study involved text mining for terms to determine the publication trends in sleep-related journal articles published during 2000-2013 and to identify associations between SD and methodology terms as well as conducting statistical analyses of the text mining findings. SD and methodology terms were extracted from 3,720 sleep-related journal articles in the PubMed database by using MetaMap. The extracted data set was analyzed using hierarchical cluster analyses and adjusted logistic regression models to investigate publication trends and associations between SD and methodology terms. MetaMap had a text mining precision, recall, and false positive rate of 0.70, 0.77, and 11.51%, respectively. The most common SD term was breathing-related sleep disorder, whereas narcolepsy was the least common. Cluster analyses showed similar methodology clusters for each SD term, except narcolepsy. The logistic regression models showed an increasing prevalence of insomnia, parasomnia, and other sleep disorders but a decreasing prevalence of breathing-related sleep disorder during 2000-2013. Different SD terms were positively associated with different methodology terms regarding research design terms, measure terms, and analysis terms. Insomnia-, parasomnia-, and other sleep disorder-related articles showed an increasing publication trend, whereas those related to breathing-related sleep disorder showed a decreasing trend. Furthermore, experimental studies more commonly focused on hypersomnia and other SDs and less commonly on insomnia, breathing-related sleep disorder, narcolepsy, and parasomnia. Thus, text mining may facilitate the exploration of the publication trends in SDs and the associated methodologies.
Text Mining to Support Gene Ontology Curation and Vice Versa.
Ruch, Patrick
2017-01-01
In this chapter, we explain how text mining can support the curation of molecular biology databases dealing with protein functions. We also show how curated data can play a disruptive role in the developments of text mining methods. We review a decade of efforts to improve the automatic assignment of Gene Ontology (GO) descriptors, the reference ontology for the characterization of genes and gene products. To illustrate the high potential of this approach, we compare the performances of an automatic text categorizer and show a large improvement of +225 % in both precision and recall on benchmarked data. We argue that automatic text categorization functions can ultimately be embedded into a Question-Answering (QA) system to answer questions related to protein functions. Because GO descriptors can be relatively long and specific, traditional QA systems cannot answer such questions. A new type of QA system, so-called Deep QA which uses machine learning methods trained with curated contents, is thus emerging. Finally, future advances of text mining instruments are directly dependent on the availability of high-quality annotated contents at every curation step. Databases workflows must start recording explicitly all the data they curate and ideally also some of the data they do not curate.
Hahn, P; Dullweber, F; Unglaub, F; Spies, C K
2014-06-01
Searching for relevant publications is becoming more difficult with the increasing number of scientific articles. Text mining as a specific form of computer-based data analysis may be helpful in this context. Highlighting relations between authors and finding relevant publications concerning a specific subject using text analysis programs are illustrated graphically by 2 performed examples. © Georg Thieme Verlag KG Stuttgart · New York.
30 CFR 49.60 - Requirements for a local mine rescue contest.
Code of Federal Regulations, 2010 CFR
2010-07-01
... EDUCATION AND TRAINING MINE RESCUE TEAMS Mine Rescue Teams for Underground Coal Mines § 49.60 Requirements... United States; (2) Uses MSHA-recognized rules; (3) Has a minimum of three mine rescue teams competing; (4) Has one or more problems conducted on one or more days with a determined winner; (5) Includes team...
30 CFR 49.60 - Requirements for a local mine rescue contest.
Code of Federal Regulations, 2013 CFR
2013-07-01
... EDUCATION AND TRAINING MINE RESCUE TEAMS Mine Rescue Teams for Underground Coal Mines § 49.60 Requirements... United States; (2) Uses MSHA-recognized rules; (3) Has a minimum of three mine rescue teams competing; (4) Has one or more problems conducted on one or more days with a determined winner; (5) Includes team...
30 CFR 49.60 - Requirements for a local mine rescue contest.
Code of Federal Regulations, 2012 CFR
2012-07-01
... EDUCATION AND TRAINING MINE RESCUE TEAMS Mine Rescue Teams for Underground Coal Mines § 49.60 Requirements... United States; (2) Uses MSHA-recognized rules; (3) Has a minimum of three mine rescue teams competing; (4) Has one or more problems conducted on one or more days with a determined winner; (5) Includes team...
30 CFR 49.60 - Requirements for a local mine rescue contest.
Code of Federal Regulations, 2011 CFR
2011-07-01
... EDUCATION AND TRAINING MINE RESCUE TEAMS Mine Rescue Teams for Underground Coal Mines § 49.60 Requirements... United States; (2) Uses MSHA-recognized rules; (3) Has a minimum of three mine rescue teams competing; (4) Has one or more problems conducted on one or more days with a determined winner; (5) Includes team...
30 CFR 49.60 - Requirements for a local mine rescue contest.
Code of Federal Regulations, 2014 CFR
2014-07-01
... EDUCATION AND TRAINING MINE RESCUE TEAMS Mine Rescue Teams for Underground Coal Mines § 49.60 Requirements... United States; (2) Uses MSHA-recognized rules; (3) Has a minimum of three mine rescue teams competing; (4) Has one or more problems conducted on one or more days with a determined winner; (5) Includes team...
Mining of Business-Oriented Conversations at a Call Center
NASA Astrophysics Data System (ADS)
Takeuchi, Hironori; Nasukawa, Tetsuya; Watanabe, Hideo
Recently it has become feasible to transcribe textual records from telephone conversations at call centers by using automatic speech recognition. In this research, we extended a text mining system for call summary records and constructed a conversation mining system for the business-oriented conversations at the call center. To acquire useful business insights from the conversational data through the text mining system, it is critical to identify appropriate textual segments and expressions as the viewpoints to focus on. In the analysis of call summary data using a text mining system, some experts defined the viewpoints for the analysis by looking at some sample records and by preparing the dictionaries based on frequent keywords in the sample dataset. However with conversations it is difficult to identify such viewpoints manually and in advance because the target data consists of complete transcripts that are often lengthy and redundant. In this research, we defined a model of the business-oriented conversations and proposed a mining method to identify segments that have impacts on the outcomes of the conversations and can then extract useful expressions in each of these identified segments. In the experiment, we processed the real datasets from a car rental service center and constructed a mining system. With this system, we show the effectiveness of the method based on the defined conversation model.
Practice-based evidence: profiling the safety of cilostazol by text-mining of clinical notes.
Leeper, Nicholas J; Bauer-Mehren, Anna; Iyer, Srinivasan V; Lependu, Paea; Olson, Cliff; Shah, Nigam H
2013-01-01
Peripheral arterial disease (PAD) is a growing problem with few available therapies. Cilostazol is the only FDA-approved medication with a class I indication for intermittent claudication, but carries a black box warning due to concerns for increased cardiovascular mortality. To assess the validity of this black box warning, we employed a novel text-analytics pipeline to quantify the adverse events associated with Cilostazol use in a clinical setting, including patients with congestive heart failure (CHF). We analyzed the electronic medical records of 1.8 million subjects from the Stanford clinical data warehouse spanning 18 years using a novel text-mining/statistical analytics pipeline. We identified 232 PAD patients taking Cilostazol and created a control group of 1,160 PAD patients not taking this drug using 1:5 propensity-score matching. Over a mean follow up of 4.2 years, we observed no association between Cilostazol use and any major adverse cardiovascular event including stroke (OR = 1.13, CI [0.82, 1.55]), myocardial infarction (OR = 1.00, CI [0.71, 1.39]), or death (OR = 0.86, CI [0.63, 1.18]). Cilostazol was not associated with an increase in any arrhythmic complication. We also identified a subset of CHF patients who were prescribed Cilostazol despite its black box warning, and found that it did not increase mortality in this high-risk group of patients. This proof of principle study shows the potential of text-analytics to mine clinical data warehouses to uncover 'natural experiments' such as the use of Cilostazol in CHF patients. We envision this method will have broad applications for examining difficult to test clinical hypotheses and to aid in post-marketing drug safety surveillance. Moreover, our observations argue for a prospective study to examine the validity of a drug safety warning that may be unnecessarily limiting the use of an efficacious therapy.
Practice-Based Evidence: Profiling the Safety of Cilostazol by Text-Mining of Clinical Notes
Iyer, Srinivasan V.; LePendu, Paea; Olson, Cliff; Shah, Nigam H.
2013-01-01
Background Peripheral arterial disease (PAD) is a growing problem with few available therapies. Cilostazol is the only FDA-approved medication with a class I indication for intermittent claudication, but carries a black box warning due to concerns for increased cardiovascular mortality. To assess the validity of this black box warning, we employed a novel text-analytics pipeline to quantify the adverse events associated with Cilostazol use in a clinical setting, including patients with congestive heart failure (CHF). Methods and Results We analyzed the electronic medical records of 1.8 million subjects from the Stanford clinical data warehouse spanning 18 years using a novel text-mining/statistical analytics pipeline. We identified 232 PAD patients taking Cilostazol and created a control group of 1,160 PAD patients not taking this drug using 1∶5 propensity-score matching. Over a mean follow up of 4.2 years, we observed no association between Cilostazol use and any major adverse cardiovascular event including stroke (OR = 1.13, CI [0.82, 1.55]), myocardial infarction (OR = 1.00, CI [0.71, 1.39]), or death (OR = 0.86, CI [0.63, 1.18]). Cilostazol was not associated with an increase in any arrhythmic complication. We also identified a subset of CHF patients who were prescribed Cilostazol despite its black box warning, and found that it did not increase mortality in this high-risk group of patients. Conclusions This proof of principle study shows the potential of text-analytics to mine clinical data warehouses to uncover ‘natural experiments’ such as the use of Cilostazol in CHF patients. We envision this method will have broad applications for examining difficult to test clinical hypotheses and to aid in post-marketing drug safety surveillance. Moreover, our observations argue for a prospective study to examine the validity of a drug safety warning that may be unnecessarily limiting the use of an efficacious therapy. PMID:23717437
Recent progress in automatically extracting information from the pharmacogenomic literature
Garten, Yael; Coulet, Adrien; Altman, Russ B
2011-01-01
The biomedical literature holds our understanding of pharmacogenomics, but it is dispersed across many journals. In order to integrate our knowledge, connect important facts across publications and generate new hypotheses we must organize and encode the contents of the literature. By creating databases of structured pharmocogenomic knowledge, we can make the value of the literature much greater than the sum of the individual reports. We can, for example, generate candidate gene lists or interpret surprising hits in genome-wide association studies. Text mining automatically adds structure to the unstructured knowledge embedded in millions of publications, and recent years have seen a surge in work on biomedical text mining, some specific to pharmacogenomics literature. These methods enable extraction of specific types of information and can also provide answers to general, systemic queries. In this article, we describe the main tasks of text mining in the context of pharmacogenomics, summarize recent applications and anticipate the next phase of text mining applications. PMID:21047206
NASA Astrophysics Data System (ADS)
Dai, Chunxiao; Wang, Songhui; Sun, Dian; Chen, Dong
2007-06-01
The result of land use in coalfield is important to sustainable development in resourceful city. For surface morphology being changed by subsidence, the mining subsidence becomes the main problem to land use with the negative influence of ecological environment, production and steadily develop in coal mining areas. Taking Panyi Coal Mine of Huainan Mining Group Corp as an example, this paper predicted and simulated the mining subsidence in Matlab environment on the basis of the probability integral method. The change of land use types of early term, medium term and long term was analyzed in accordance with the results of mining subsidence prediction with GIS as a spatial data management and spatial analysis tool. The result of analysis showed that 80% area in Panyi Coal Mine be affected by mining subsidence and 52km2 perennial waterlogged area was gradually formed. The farmland ecosystem was gradually turned into wetland ecosystem in most study area. According to the economic and social development and natural conditions of mining area, calculating the ecological environment, production and people's livelihood, this paper supplied the plan for comprehensive utilization of land resource. In this plan, intervention measures be taken during the coal mining and the mining subsidence formation and development, and this method can solve the problems of Land use at the relative low cost.
Conceptual biology, hypothesis discovery, and text mining: Swanson's legacy.
Bekhuis, Tanja
2006-04-03
Innovative biomedical librarians and information specialists who want to expand their roles as expert searchers need to know about profound changes in biology and parallel trends in text mining. In recent years, conceptual biology has emerged as a complement to empirical biology. This is partly in response to the availability of massive digital resources such as the network of databases for molecular biologists at the National Center for Biotechnology Information. Developments in text mining and hypothesis discovery systems based on the early work of Swanson, a mathematician and information scientist, are coincident with the emergence of conceptual biology. Very little has been written to introduce biomedical digital librarians to these new trends. In this paper, background for data and text mining, as well as for knowledge discovery in databases (KDD) and in text (KDT) is presented, then a brief review of Swanson's ideas, followed by a discussion of recent approaches to hypothesis discovery and testing. 'Testing' in the context of text mining involves partially automated methods for finding evidence in the literature to support hypothetical relationships. Concluding remarks follow regarding (a) the limits of current strategies for evaluation of hypothesis discovery systems and (b) the role of literature-based discovery in concert with empirical research. Report of an informatics-driven literature review for biomarkers of systemic lupus erythematosus is mentioned. Swanson's vision of the hidden value in the literature of science and, by extension, in biomedical digital databases, is still remarkably generative for information scientists, biologists, and physicians.
Mining problems caused by tectonic stress in Illinois basin
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nelson, W.J.
1991-08-01
The Illinois basin coalfield is subject to a contemporary tectonic stress field in which the principal compressive stress axis ({sigma}1) is horizontal and strikes N60{degree}E to east-west. This stress is responsible for widespread development of kind zones and directional roof failures in mine headings driven perpendicular to {sigma}1. Also, small thrust faults perpendicular to {sigma}1 and joints parallel to {sigma}1 weaken the mine roof and occasionally admit water and gas to workings, depending upon geologic setting. The direction of magnitude of stress have been identified by a variety of techniques that can be applied both prior to mining and duringmore » development. Mining experience shows that the best method of minimizing stress-related problems is to drive mine headings at about 45 to {sigma}1.« less
Recent Advances and Emerging Applications in Text and Data Mining for Biomedical Discovery.
Gonzalez, Graciela H; Tahsin, Tasnia; Goodale, Britton C; Greene, Anna C; Greene, Casey S
2016-01-01
Precision medicine will revolutionize the way we treat and prevent disease. A major barrier to the implementation of precision medicine that clinicians and translational scientists face is understanding the underlying mechanisms of disease. We are starting to address this challenge through automatic approaches for information extraction, representation and analysis. Recent advances in text and data mining have been applied to a broad spectrum of key biomedical questions in genomics, pharmacogenomics and other fields. We present an overview of the fundamental methods for text and data mining, as well as recent advances and emerging applications toward precision medicine. © The Author 2015. Published by Oxford University Press.
Recent Advances and Emerging Applications in Text and Data Mining for Biomedical Discovery
Gonzalez, Graciela H.; Tahsin, Tasnia; Goodale, Britton C.; Greene, Anna C.
2016-01-01
Precision medicine will revolutionize the way we treat and prevent disease. A major barrier to the implementation of precision medicine that clinicians and translational scientists face is understanding the underlying mechanisms of disease. We are starting to address this challenge through automatic approaches for information extraction, representation and analysis. Recent advances in text and data mining have been applied to a broad spectrum of key biomedical questions in genomics, pharmacogenomics and other fields. We present an overview of the fundamental methods for text and data mining, as well as recent advances and emerging applications toward precision medicine. PMID:26420781
Extracting semantically enriched events from biomedical literature
2012-01-01
Background Research into event-based text mining from the biomedical literature has been growing in popularity to facilitate the development of advanced biomedical text mining systems. Such technology permits advanced search, which goes beyond document or sentence-based retrieval. However, existing event-based systems typically ignore additional information within the textual context of events that can determine, amongst other things, whether an event represents a fact, hypothesis, experimental result or analysis of results, whether it describes new or previously reported knowledge, and whether it is speculated or negated. We refer to such contextual information as meta-knowledge. The automatic recognition of such information can permit the training of systems allowing finer-grained searching of events according to the meta-knowledge that is associated with them. Results Based on a corpus of 1,000 MEDLINE abstracts, fully manually annotated with both events and associated meta-knowledge, we have constructed a machine learning-based system that automatically assigns meta-knowledge information to events. This system has been integrated into EventMine, a state-of-the-art event extraction system, in order to create a more advanced system (EventMine-MK) that not only extracts events from text automatically, but also assigns five different types of meta-knowledge to these events. The meta-knowledge assignment module of EventMine-MK performs with macro-averaged F-scores in the range of 57-87% on the BioNLP’09 Shared Task corpus. EventMine-MK has been evaluated on the BioNLP’09 Shared Task subtask of detecting negated and speculated events. Our results show that EventMine-MK can outperform other state-of-the-art systems that participated in this task. Conclusions We have constructed the first practical system that extracts both events and associated, detailed meta-knowledge information from biomedical literature. The automatically assigned meta-knowledge information can be used to refine search systems, in order to provide an extra search layer beyond entities and assertions, dealing with phenomena such as rhetorical intent, speculations, contradictions and negations. This finer grained search functionality can assist in several important tasks, e.g., database curation (by locating new experimental knowledge) and pathway enrichment (by providing information for inference). To allow easy integration into text mining systems, EventMine-MK is provided as a UIMA component that can be used in the interoperable text mining infrastructure, U-Compare. PMID:22621266
Extracting semantically enriched events from biomedical literature.
Miwa, Makoto; Thompson, Paul; McNaught, John; Kell, Douglas B; Ananiadou, Sophia
2012-05-23
Research into event-based text mining from the biomedical literature has been growing in popularity to facilitate the development of advanced biomedical text mining systems. Such technology permits advanced search, which goes beyond document or sentence-based retrieval. However, existing event-based systems typically ignore additional information within the textual context of events that can determine, amongst other things, whether an event represents a fact, hypothesis, experimental result or analysis of results, whether it describes new or previously reported knowledge, and whether it is speculated or negated. We refer to such contextual information as meta-knowledge. The automatic recognition of such information can permit the training of systems allowing finer-grained searching of events according to the meta-knowledge that is associated with them. Based on a corpus of 1,000 MEDLINE abstracts, fully manually annotated with both events and associated meta-knowledge, we have constructed a machine learning-based system that automatically assigns meta-knowledge information to events. This system has been integrated into EventMine, a state-of-the-art event extraction system, in order to create a more advanced system (EventMine-MK) that not only extracts events from text automatically, but also assigns five different types of meta-knowledge to these events. The meta-knowledge assignment module of EventMine-MK performs with macro-averaged F-scores in the range of 57-87% on the BioNLP'09 Shared Task corpus. EventMine-MK has been evaluated on the BioNLP'09 Shared Task subtask of detecting negated and speculated events. Our results show that EventMine-MK can outperform other state-of-the-art systems that participated in this task. We have constructed the first practical system that extracts both events and associated, detailed meta-knowledge information from biomedical literature. The automatically assigned meta-knowledge information can be used to refine search systems, in order to provide an extra search layer beyond entities and assertions, dealing with phenomena such as rhetorical intent, speculations, contradictions and negations. This finer grained search functionality can assist in several important tasks, e.g., database curation (by locating new experimental knowledge) and pathway enrichment (by providing information for inference). To allow easy integration into text mining systems, EventMine-MK is provided as a UIMA component that can be used in the interoperable text mining infrastructure, U-Compare.
Individual Profiling Using Text Analysis
2016-04-15
Mining a Text for Errors. . . . on Knowledge discovery in data mining , pages 624–628, 2005. [12] Michal Kosinski, David Stillwell, and Thore Graepel...AFRL-AFOSR-UK-TR-2016-0011 Individual Profiling using Text Analysis 140333 Mark Stevenson UNIVERSITY OF SHEFFIELD, DEPARTMENT OF PSYCHOLOGY Final...REPORT TYPE Final 3. DATES COVERED (From - To) 15 Sep 2014 to 14 Sep 2015 4. TITLE AND SUBTITLE Individual Profiling using Text Analysis
Mining the pharmacogenomics literature—a survey of the state of the art
Cohen, K. Bretonnel; Garten, Yael; Shah, Nigam H.
2012-01-01
This article surveys efforts on text mining of the pharmacogenomics literature, mainly from the period 2008 to 2011. Pharmacogenomics (or pharmacogenetics) is the field that studies how human genetic variation impacts drug response. Therefore, publications span the intersection of research in genotypes, phenotypes and pharmacology, a topic that has increasingly become a focus of active research in recent years. This survey covers efforts dealing with the automatic recognition of relevant named entities (e.g. genes, gene variants and proteins, diseases and other pathological phenomena, drugs and other chemicals relevant for medical treatment), as well as various forms of relations between them. A wide range of text genres is considered, such as scientific publications (abstracts, as well as full texts), patent texts and clinical narratives. We also discuss infrastructure and resources needed for advanced text analytics, e.g. document corpora annotated with corresponding semantic metadata (gold standards and training data), biomedical terminologies and ontologies providing domain-specific background knowledge at different levels of formality and specificity, software architectures for building complex and scalable text analytics pipelines and Web services grounded to them, as well as comprehensive ways to disseminate and interact with the typically huge amounts of semiformal knowledge structures extracted by text mining tools. Finally, we consider some of the novel applications that have already been developed in the field of pharmacogenomic text mining and point out perspectives for future research. PMID:22833496
Mining the pharmacogenomics literature--a survey of the state of the art.
Hahn, Udo; Cohen, K Bretonnel; Garten, Yael; Shah, Nigam H
2012-07-01
This article surveys efforts on text mining of the pharmacogenomics literature, mainly from the period 2008 to 2011. Pharmacogenomics (or pharmacogenetics) is the field that studies how human genetic variation impacts drug response. Therefore, publications span the intersection of research in genotypes, phenotypes and pharmacology, a topic that has increasingly become a focus of active research in recent years. This survey covers efforts dealing with the automatic recognition of relevant named entities (e.g. genes, gene variants and proteins, diseases and other pathological phenomena, drugs and other chemicals relevant for medical treatment), as well as various forms of relations between them. A wide range of text genres is considered, such as scientific publications (abstracts, as well as full texts), patent texts and clinical narratives. We also discuss infrastructure and resources needed for advanced text analytics, e.g. document corpora annotated with corresponding semantic metadata (gold standards and training data), biomedical terminologies and ontologies providing domain-specific background knowledge at different levels of formality and specificity, software architectures for building complex and scalable text analytics pipelines and Web services grounded to them, as well as comprehensive ways to disseminate and interact with the typically huge amounts of semiformal knowledge structures extracted by text mining tools. Finally, we consider some of the novel applications that have already been developed in the field of pharmacogenomic text mining and point out perspectives for future research.
Using Text Mining to Characterize Online Discussion Facilitation
ERIC Educational Resources Information Center
Ming, Norma; Baumer, Eric
2011-01-01
Facilitating class discussions effectively is a critical yet challenging component of instruction, particularly in online environments where student and faculty interaction is limited. Our goals in this research were to identify facilitation strategies that encourage productive discussion, and to explore text mining techniques that can help…
DOT National Transportation Integrated Search
2003-06-01
It is estimated that approximately 8,500 abandoned underground mines are present in Ohio and mine-related : subsidence has been a problem dating back to the 1920's. Many investigative methods have been utilized with : varying degrees of success in an...
40 CFR 372.23 - SIC and NAICS codes to which this Part applies.
Code of Federal Regulations, 2010 CFR
2010-07-01
... facilities primarily engaged in reproducing text, drawings, plans, maps, or other copy, by blueprinting...)); 212324Kaolin and Ball Clay Mining Limited to facilities operating without a mine or quarry and that are...)); 212393Other Chemical and Fertilizer Mineral Mining Limited to facilities operating without a mine or quarry...
pubmed.mineR: an R package with text-mining algorithms to analyse PubMed abstracts.
Rani, Jyoti; Shah, A B Rauf; Ramachandran, Srinivasan
2015-10-01
The PubMed literature database is a valuable source of information for scientific research. It is rich in biomedical literature with more than 24 million citations. Data-mining of voluminous literature is a challenging task. Although several text-mining algorithms have been developed in recent years with focus on data visualization, they have limitations such as speed, are rigid and are not available in the open source. We have developed an R package, pubmed.mineR, wherein we have combined the advantages of existing algorithms, overcome their limitations, and offer user flexibility and link with other packages in Bioconductor and the Comprehensive R Network (CRAN) in order to expand the user capabilities for executing multifaceted approaches. Three case studies are presented, namely, 'Evolving role of diabetes educators', 'Cancer risk assessment' and 'Dynamic concepts on disease and comorbidity' to illustrate the use of pubmed.mineR. The package generally runs fast with small elapsed times in regular workstations even on large corpus sizes and with compute intensive functions. The pubmed.mineR is available at http://cran.rproject. org/web/packages/pubmed.mineR.
Considine, Robyn; Tynan, Ross; James, Carole; Wiggers, John; Lewin, Terry; Inder, Kerry; Perkins, David; Handley, Tonelle; Kelly, Brian
2017-01-01
Evidence regarding the extent of mental health problems and the associated characteristics within an employee population is necessary to inform appropriate and tailored workplace mental health programs. Mental health within male dominated industries (such as mining) has received recent public attention, chiefly through observations regarding suicide in such populations in Australia and internationally. Currently there is limited empirical evidence regarding the mental health needs in the mining industry as an exemplar of a male dominated workforce, and the relative contribution to such problems of individual, socio-economic and workplace factors. This study aimed to investigate the mental health and associated characteristics among employees in the Australian coal mining industry with a specific focus on identifying modifiable work characteristics. A cross-sectional study was conducted among employees (n = 1457) across eight coal mines stratified by key mine characteristics (state, mine type and employee commute arrangements). Participants completed measures of psychological distress (K10+) and key variables across four categories (socio-demographic characteristics, health history, current health behaviours, work attitudes and characteristics). Psychological distress levels within this sample were significantly higher in comparison with a community sample of employed Australians. The following factors contributed significantly to levels of psychological distress using hierarchical linear regression analysis: lower social networks; a past history of depression, anxiety or drug/alcohol problems; high recent alcohol use; work role (managers) and a set of work characteristics (level of satisfaction with work, financial factors and job insecurity; perception of lower workplace support for people with mental health problems. This is the first study to examine the characteristics associated with mental health problems in the Australian coal mining industry. The findings indicate the salience of mental health needs in this population, and the associated interplay of personal, social and work characteristics. The work characteristics associated with psychological distress are modifiable and can guide an industry response, as well as help inform the understanding of the role of workplace factors in mental health problems in a male dominated workforce more generally.
James, Carole; Wiggers, John; Lewin, Terry; Inder, Kerry; Perkins, David; Handley, Tonelle
2017-01-01
Background Evidence regarding the extent of mental health problems and the associated characteristics within an employee population is necessary to inform appropriate and tailored workplace mental health programs. Mental health within male dominated industries (such as mining) has received recent public attention, chiefly through observations regarding suicide in such populations in Australia and internationally. Currently there is limited empirical evidence regarding the mental health needs in the mining industry as an exemplar of a male dominated workforce, and the relative contribution to such problems of individual, socio-economic and workplace factors. This study aimed to investigate the mental health and associated characteristics among employees in the Australian coal mining industry with a specific focus on identifying modifiable work characteristics. Methods A cross-sectional study was conducted among employees (n = 1457) across eight coal mines stratified by key mine characteristics (state, mine type and employee commute arrangements). Participants completed measures of psychological distress (K10+) and key variables across four categories (socio-demographic characteristics, health history, current health behaviours, work attitudes and characteristics). Results Psychological distress levels within this sample were significantly higher in comparison with a community sample of employed Australians. The following factors contributed significantly to levels of psychological distress using hierarchical linear regression analysis: lower social networks; a past history of depression, anxiety or drug/alcohol problems; high recent alcohol use; work role (managers) and a set of work characteristics (level of satisfaction with work, financial factors and job insecurity; perception of lower workplace support for people with mental health problems. Conclusion This is the first study to examine the characteristics associated with mental health problems in the Australian coal mining industry. The findings indicate the salience of mental health needs in this population, and the associated interplay of personal, social and work characteristics. The work characteristics associated with psychological distress are modifiable and can guide an industry response, as well as help inform the understanding of the role of workplace factors in mental health problems in a male dominated workforce more generally. PMID:28045935
Chen, Chou-Cheng; Ho, Chung-Liang
2014-01-01
While a huge amount of information about biological literature can be obtained by searching the PubMed database, reading through all the titles and abstracts resulting from such a search for useful information is inefficient. Text mining makes it possible to increase this efficiency. Some websites use text mining to gather information from the PubMed database; however, they are database-oriented, using pre-defined search keywords while lacking a query interface for user-defined search inputs. We present the PubMed Abstract Reading Helper (PubstractHelper) website which combines text mining and reading assistance for an efficient PubMed search. PubstractHelper can accept a maximum of ten groups of keywords, within each group containing up to ten keywords. The principle behind the text-mining function of PubstractHelper is that keywords contained in the same sentence are likely to be related. PubstractHelper highlights sentences with co-occurring keywords in different colors. The user can download the PMID and the abstracts with color markings to be reviewed later. The PubstractHelper website can help users to identify relevant publications based on the presence of related keywords, which should be a handy tool for their research. http://bio.yungyun.com.tw/ATM/PubstractHelper.aspx and http://holab.med.ncku.edu.tw/ATM/PubstractHelper.aspx.
Text Classification for Organizational Researchers
Kobayashi, Vladimer B.; Mol, Stefan T.; Berkers, Hannah A.; Kismihók, Gábor; Den Hartog, Deanne N.
2017-01-01
Organizations are increasingly interested in classifying texts or parts thereof into categories, as this enables more effective use of their information. Manual procedures for text classification work well for up to a few hundred documents. However, when the number of documents is larger, manual procedures become laborious, time-consuming, and potentially unreliable. Techniques from text mining facilitate the automatic assignment of text strings to categories, making classification expedient, fast, and reliable, which creates potential for its application in organizational research. The purpose of this article is to familiarize organizational researchers with text mining techniques from machine learning and statistics. We describe the text classification process in several roughly sequential steps, namely training data preparation, preprocessing, transformation, application of classification techniques, and validation, and provide concrete recommendations at each step. To help researchers develop their own text classifiers, the R code associated with each step is presented in a tutorial. The tutorial draws from our own work on job vacancy mining. We end the article by discussing how researchers can validate a text classification model and the associated output. PMID:29881249
Graphics-based intelligent search and abstracting using Data Modeling
NASA Astrophysics Data System (ADS)
Jaenisch, Holger M.; Handley, James W.; Case, Carl T.; Songy, Claude G.
2002-11-01
This paper presents an autonomous text and context-mining algorithm that converts text documents into point clouds for visual search cues. This algorithm is applied to the task of data-mining a scriptural database comprised of the Old and New Testaments from the Bible and the Book of Mormon, Doctrine and Covenants, and the Pearl of Great Price. Results are generated which graphically show the scripture that represents the average concept of the database and the mining of the documents down to the verse level.
Towards cross-lingual alerting for bursty epidemic events.
Collier, Nigel
2011-10-06
Online news reports are increasingly becoming a source for event-based early warning systems that detect natural disasters. Harnessing the massive volume of information available from multilingual newswire presents as many challanges as opportunities due to the patterns of reporting complex spatio-temporal events. In this article we study the problem of utilising correlated event reports across languages. We track the evolution of 16 disease outbreaks using 5 temporal aberration detection algorithms on text-mined events classified according to disease and outbreak country. Using ProMED reports as a silver standard, comparative analysis of news data for 13 languages over a 129 day trial period showed improved sensitivity, F1 and timeliness across most models using cross-lingual events. We report a detailed case study analysis for Cholera in Angola 2010 which highlights the challenges faced in correlating news events with the silver standard. The results show that automated health surveillance using multilingual text mining has the potential to turn low value news into high value alerts if informed choices are used to govern the selection of models and data sources. An implementation of the C2 alerting algorithm using multilingual news is available at the BioCaster portal http://born.nii.ac.jp/?page=globalroundup.
NASA Astrophysics Data System (ADS)
Vorontsova, Elena; Vorontsov, Andrey; Drozdenko, Yuriy
2017-11-01
The article is devoted to the analysis of problems of maintenance of ecological safety of the mining enterprises. The aim of the work was the formulation of proposals, the implementation of which, in the opinion of the authors, is capable of raising the level of environmental safety of the mining industry and ultimately ensuring the environmentally oriented growth of the Russian economy.
The Labour Welfare Fund Laws (Amendment) Act, 1987 (No. 15 of 1987), 22 May 1987.
1987-01-01
This Act authorizes funds constituted under the Mica Mines Labour Welfare Fund Act, 1946, the Limestone and Dolomite Mines Labour Welfare Fund Act, 1972, the Iron Ore Mines, Manganese Ore Mines and Chrome Mines Labour Welfare Fund Act, 1976, and the Beedi Workers Welfare Fund Act, 1976, to be applied for the provision of family welfare, including family planning education and services. full text
Mining Tasks from the Web Anchor Text Graph: MSR Notebook Paper for the TREC 2015 Tasks Track
2015-11-20
Mining Tasks from the Web Anchor Text Graph: MSR Notebook Paper for the TREC 2015 Tasks Track Paul N. Bennett Microsoft Research Redmond, USA pauben...anchor text graph has proven useful in the general realm of query reformulation [2], we sought to quantify the value of extracting key phrases from...anchor text in the broader setting of the task understanding track. Given a query, our approach considers a simple method for identifying a relevant
NASA Astrophysics Data System (ADS)
Dou, Zhi-Wu
2010-08-01
To solve the inherent safety problem puzzling the coal mining industry, analyzing the characteristic and the application of distributed interactive simulation based on high level architecture (DIS/HLA), a new method is proposed for developing coal mining industry inherent safety distributed interactive simulation adopting HLA technology. Researching the function and structure of the system, a simple coal mining industry inherent safety is modeled with HLA, the FOM and SOM are developed, and the math models are suggested. The results of the instance research show that HLA plays an important role in developing distributed interactive simulation of complicated distributed system and the method is valid to solve the problem puzzling coal mining industry. To the coal mining industry, the conclusions show that the simulation system with HLA plays an important role to identify the source of hazard, to make the measure for accident, and to improve the level of management.
Mines and human casualties: a robotics approach toward mine clearing
NASA Astrophysics Data System (ADS)
Ghaffari, Masoud; Manthena, Dinesh; Ghaffari, Alireza; Hall, Ernest L.
2004-10-01
An estimated 100 million landmines which have been planted in more than 60 countries kill or maim thousands of civilians every year. Millions of people live in the vast dangerous areas and are not able to access to basic human services because of landmines" threats. This problem has affected many third world countries and poor nations which are not able to afford high cost solutions. This paper tries to present some experiences with the land mine victims and solutions for the mine clearing. It studies current situation of this crisis as well as state of the art robotics technology for the mine clearing. It also introduces a survey robot which is suitable for the mine clearing applications. The results show that in addition to technical aspects, this problem has many socio-economic issues. The significance of this study is to persuade robotics researchers toward this topic and to peruse the technical and humanitarian facets of this issue.
Privacy Preserving Sequential Pattern Mining in Data Stream
NASA Astrophysics Data System (ADS)
Huang, Qin-Hua
The privacy preserving data mining technique researches have gained much attention in recent years. For data stream systems, wireless networks and mobile devices, the related stream data mining techniques research is still in its' early stage. In this paper, an data mining algorithm dealing with privacy preserving problem in data stream is presented.
Large-Scale Constraint-Based Pattern Mining
ERIC Educational Resources Information Center
Zhu, Feida
2009-01-01
We studied the problem of constraint-based pattern mining for three different data formats, item-set, sequence and graph, and focused on mining patterns of large sizes. Colossal patterns in each data formats are studied to discover pruning properties that are useful for direct mining of these patterns. For item-set data, we observed robustness of…
Acid-mine drainage (AMD) is a severe pollution problem attributed to past mining activities. AMD is an acidic, metal-bearing wastewater generated by the oxidation of metal sulfides to sulfates by Thiobacillus bacteria in both active and abandoned mining operations. The wastewater...
Acid-mine drainage (AMD) is a severe pollution problem attributed to past mining activities. AMD is an acidic, metal-bearing wastewater generated by the oxidation of metal sulfides to sulfates by Thiobacillus bacteria in both the active and abandoned mining operations. The wastew...
NASA Astrophysics Data System (ADS)
Xue, Minghui; Wang, Chenghao; Zhang, Shanshan
2017-06-01
Coal mine community is driven by the coal mine industry, and it mainly relies on coal mining enterprises to provide benefits for residents. Under the background of increasing serious global aging problem, the problems in the field of elderly people’s health, life, entertainment, communication, retirement and re-employment and other aspects become more acute and urgently to be solved. So it is necessary to make a more detailed study on how to transform the coal mine community according to the special needs of the elderly miners. This article takes renewal design of SiHe coal mine in JinCheng of ShanXi province as an example and takes the community’s “life field” as a clue, trying to put forward the transformed strategy of “life field” for aged in coal mine community and to come up with a method to update the community throughout the whole atmosphere to the personal space.
SPMBR: a scalable algorithm for mining sequential patterns based on bitmaps
NASA Astrophysics Data System (ADS)
Xu, Xiwei; Zhang, Changhai
2013-12-01
Now some sequential patterns mining algorithms generate too many candidate sequences, and increase the processing cost of support counting. Therefore, we present an effective and scalable algorithm called SPMBR (Sequential Patterns Mining based on Bitmap Representation) to solve the problem of mining the sequential patterns for large databases. Our method differs from previous related works of mining sequential patterns. The main difference is that the database of sequential patterns is represented by bitmaps, and a simplified bitmap structure is presented firstly. In this paper, First the algorithm generate candidate sequences by SE(Sequence Extension) and IE(Item Extension), and then obtain all frequent sequences by comparing the original bitmap and the extended item bitmap .This method could simplify the problem of mining the sequential patterns and avoid the high processing cost of support counting. Both theories and experiments indicate that the performance of SPMBR is predominant for large transaction databases, the required memory size for storing temporal data is much less during mining process, and all sequential patterns can be mined with feasibility.
ERIC Educational Resources Information Center
Wang, Yinying; Bowers, Alex J.; Fikis, David J.
2017-01-01
Purpose: The purpose of this study is to describe the underlying topics and the topic evolution in the 50-year history of educational leadership research literature. Method: We used automated text data mining with probabilistic latent topic models to examine the full text of the entire publication history of all 1,539 articles published in…
78 FR 6130 - Notice of Proposed Information Collection
Federal Register 2010, 2011, 2012, 2013, 2014
2013-01-29
... for the collection of information for the Abandoned Mine Land Problem Area Description form. This... collection is contained in the Form OSM-76, Abandoned Mine Land Problem Area Description form. OSM will... requests to OMB. Before including your address, phone number, email address, or other personal identifying...
NASA Astrophysics Data System (ADS)
Kim, Kwang Hyeon; Lee, Suk; Shim, Jang Bo; Chang, Kyung Hwan; Yang, Dae Sik; Yoon, Won Sup; Park, Young Je; Kim, Chul Yong; Cao, Yuan Jie
2017-08-01
The aim of this study is an integrated research for text-based data mining and toxicity prediction modeling system for clinical decision support system based on big data in radiation oncology as a preliminary research. The structured and unstructured data were prepared by treatment plans and the unstructured data were extracted by dose-volume data image pattern recognition of prostate cancer for research articles crawling through the internet. We modeled an artificial neural network to build a predictor model system for toxicity prediction of organs at risk. We used a text-based data mining approach to build the artificial neural network model for bladder and rectum complication predictions. The pattern recognition method was used to mine the unstructured toxicity data for dose-volume at the detection accuracy of 97.9%. The confusion matrix and training model of the neural network were achieved with 50 modeled plans (n = 50) for validation. The toxicity level was analyzed and the risk factors for 25% bladder, 50% bladder, 20% rectum, and 50% rectum were calculated by the artificial neural network algorithm. As a result, 32 plans could cause complication but 18 plans were designed as non-complication among 50 modeled plans. We integrated data mining and a toxicity modeling method for toxicity prediction using prostate cancer cases. It is shown that a preprocessing analysis using text-based data mining and prediction modeling can be expanded to personalized patient treatment decision support based on big data.
A sentence sliding window approach to extract protein annotations from biomedical articles
Krallinger, Martin; Padron, Maria; Valencia, Alfonso
2005-01-01
Background Within the emerging field of text mining and statistical natural language processing (NLP) applied to biomedical articles, a broad variety of techniques have been developed during the past years. Nevertheless, there is still a great ned of comparative assessment of the performance of the proposed methods and the development of common evaluation criteria. This issue was addressed by the Critical Assessment of Text Mining Methods in Molecular Biology (BioCreative) contest. The aim of this contest was to assess the performance of text mining systems applied to biomedical texts including tools which recognize named entities such as genes and proteins, and tools which automatically extract protein annotations. Results The "sentence sliding window" approach proposed here was found to efficiently extract text fragments from full text articles containing annotations on proteins, providing the highest number of correctly predicted annotations. Moreover, the number of correct extractions of individual entities (i.e. proteins and GO terms) involved in the relationships used for the annotations was significantly higher than the correct extractions of the complete annotations (protein-function relations). Conclusion We explored the use of averaging sentence sliding windows for information extraction, especially in a context where conventional training data is unavailable. The combination of our approach with more refined statistical estimators and machine learning techniques might be a way to improve annotation extraction for future biomedical text mining applications. PMID:15960831
Text Mining Improves Prediction of Protein Functional Sites
Cohn, Judith D.; Ravikumar, Komandur E.
2012-01-01
We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites). The structure analysis was carried out using Dynamics Perturbation Analysis (DPA), which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining extracts mentions of residues in the literature, and predicts that residues mentioned are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites (specifically, small-molecule binding sites and catalytic sites) in about 100,000 publicly available protein structures. The DPA predictions recapitulated many of the functional site annotations and preferentially recovered binding sites annotated as biologically relevant vs. those annotated as potentially spurious. The text-based predictions were also substantially supported by the functional site annotations: compared to other residues, residues mentioned in text were roughly six times more likely to be found in a functional site. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues that were not catalogued in the curated data sources we inspected. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions. PMID:22393388
Xia, Jingbo; Zhang, Xing; Yuan, Daojun; Chen, Lingling; Webster, Jonathan; Fang, Alex Chengyu
2013-01-01
To effectively assess the possibility of the unknown rice protein resistant to Xanthomonas oryzae pv. oryzae, a hybrid strategy is proposed to enhance gene prioritization by combining text mining technologies with a sequence-based approach. The text mining technique of term frequency inverse document frequency is used to measure the importance of distinguished terms which reflect biomedical activity in rice before candidate genes are screened and vital terms are produced. Afterwards, a built-in classifier under the chaos games representation algorithm is used to sieve the best possible candidate gene. Our experiment results show that the combination of these two methods achieves enhanced gene prioritization. PMID:24371834
Uncovering text mining: A survey of current work on web-based epidemic intelligence
Collier, Nigel
2012-01-01
Real world pandemics such as SARS 2002 as well as popular fiction like the movie Contagion graphically depict the health threat of a global pandemic and the key role of epidemic intelligence (EI). While EI relies heavily on established indicator sources a new class of methods based on event alerting from unstructured digital Internet media is rapidly becoming acknowledged within the public health community. At the heart of automated information gathering systems is a technology called text mining. My contribution here is to provide an overview of the role that text mining technology plays in detecting epidemics and to synthesise my existing research on the BioCaster project. PMID:22783909
Xia, Jingbo; Zhang, Xing; Yuan, Daojun; Chen, Lingling; Webster, Jonathan; Fang, Alex Chengyu
2013-01-01
To effectively assess the possibility of the unknown rice protein resistant to Xanthomonas oryzae pv. oryzae, a hybrid strategy is proposed to enhance gene prioritization by combining text mining technologies with a sequence-based approach. The text mining technique of term frequency inverse document frequency is used to measure the importance of distinguished terms which reflect biomedical activity in rice before candidate genes are screened and vital terms are produced. Afterwards, a built-in classifier under the chaos games representation algorithm is used to sieve the best possible candidate gene. Our experiment results show that the combination of these two methods achieves enhanced gene prioritization.
Kreula, Sanna M; Kaewphan, Suwisa; Ginter, Filip; Jones, Patrik R
2018-01-01
The increasing move towards open access full-text scientific literature enhances our ability to utilize advanced text-mining methods to construct information-rich networks that no human will be able to grasp simply from 'reading the literature'. The utility of text-mining for well-studied species is obvious though the utility for less studied species, or those with no prior track-record at all, is not clear. Here we present a concept for how advanced text-mining can be used to create information-rich networks even for less well studied species and apply it to generate an open-access gene-gene association network resource for Synechocystis sp. PCC 6803, a representative model organism for cyanobacteria and first case-study for the methodology. By merging the text-mining network with networks generated from species-specific experimental data, network integration was used to enhance the accuracy of predicting novel interactions that are biologically relevant. A rule-based algorithm (filter) was constructed in order to automate the search for novel candidate genes with a high degree of likely association to known target genes by (1) ignoring established relationships from the existing literature, as they are already 'known', and (2) demanding multiple independent evidences for every novel and potentially relevant relationship. Using selected case studies, we demonstrate the utility of the network resource and filter to ( i ) discover novel candidate associations between different genes or proteins in the network, and ( ii ) rapidly evaluate the potential role of any one particular gene or protein. The full network is provided as an open-source resource.
SWIFT-Review: a text-mining workbench for systematic review.
Howard, Brian E; Phillips, Jason; Miller, Kyle; Tandon, Arpit; Mav, Deepak; Shah, Mihir R; Holmgren, Stephanie; Pelch, Katherine E; Walker, Vickie; Rooney, Andrew A; Macleod, Malcolm; Shah, Ruchir R; Thayer, Kristina
2016-05-23
There is growing interest in using machine learning approaches to priority rank studies and reduce human burden in screening literature when conducting systematic reviews. In addition, identifying addressable questions during the problem formulation phase of systematic review can be challenging, especially for topics having a large literature base. Here, we assess the performance of the SWIFT-Review priority ranking algorithm for identifying studies relevant to a given research question. We also explore the use of SWIFT-Review during problem formulation to identify, categorize, and visualize research areas that are data rich/data poor within a large literature corpus. Twenty case studies, including 15 public data sets, representing a range of complexity and size, were used to assess the priority ranking performance of SWIFT-Review. For each study, seed sets of manually annotated included and excluded titles and abstracts were used for machine training. The remaining references were then ranked for relevance using an algorithm that considers term frequency and latent Dirichlet allocation (LDA) topic modeling. This ranking was evaluated with respect to (1) the number of studies screened in order to identify 95 % of known relevant studies and (2) the "Work Saved over Sampling" (WSS) performance metric. To assess SWIFT-Review for use in problem formulation, PubMed literature search results for 171 chemicals implicated as EDCs were uploaded into SWIFT-Review (264,588 studies) and categorized based on evidence stream and health outcome. Patterns of search results were surveyed and visualized using a variety of interactive graphics. Compared with the reported performance of other tools using the same datasets, the SWIFT-Review ranking procedure obtained the highest scores on 11 out of 15 of the public datasets. Overall, these results suggest that using machine learning to triage documents for screening has the potential to save, on average, more than 50 % of the screening effort ordinarily required when using un-ordered document lists. In addition, the tagging and annotation capabilities of SWIFT-Review can be useful during the activities of scoping and problem formulation. Text-mining and machine learning software such as SWIFT-Review can be valuable tools to reduce the human screening burden and assist in problem formulation.
Code of Federal Regulations, 2010 CFR
2010-07-01
... texts of State and Federal cooperative agreements for regulation of mining on Federal lands. The... Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR PROGRAMS FOR THE CONDUCT OF SURFACE MINING OPERATIONS WITHIN EACH STATE INTRODUCTION § 900.2 Objectives. The objective of...
76 FR 40649 - Indiana Regulatory Program
Federal Register 2010, 2011, 2012, 2013, 2014
2011-07-11
... at 312 IAC 25-6-30 Surface mining; explosives; general requirements. The full text of the program... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 914... Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; public comment period on proposed...
Complementing the Numbers: A Text Mining Analysis of College Course Withdrawals
ERIC Educational Resources Information Center
Michalski, Greg V.
2011-01-01
Excessive college course withdrawals are costly to the student and the institution in terms of time to degree completion, available classroom space, and other resources. Although generally well quantified, detailed analysis of the reasons given by students for course withdrawal is less common. To address this, a text mining analysis was performed…
Can abstract screening workload be reduced using text mining? User experiences of the tool Rayyan.
Olofsson, Hanna; Brolund, Agneta; Hellberg, Christel; Silverstein, Rebecca; Stenström, Karin; Österberg, Marie; Dagerhamn, Jessica
2017-09-01
One time-consuming aspect of conducting systematic reviews is the task of sifting through abstracts to identify relevant studies. One promising approach for reducing this burden uses text mining technology to identify those abstracts that are potentially most relevant for a project, allowing those abstracts to be screened first. To examine the effectiveness of the text mining functionality of the abstract screening tool Rayyan. User experiences were collected. Rayyan was used to screen abstracts for 6 reviews in 2015. After screening 25%, 50%, and 75% of the abstracts, the screeners logged the relevant references identified. A survey was sent to users. After screening half of the search result with Rayyan, 86% to 99% of the references deemed relevant to the study were identified. Of those studies included in the final reports, 96% to 100% were already identified in the first half of the screening process. Users rated Rayyan 4.5 out of 5. The text mining function in Rayyan successfully helped reviewers identify relevant studies early in the screening process. Copyright © 2017 John Wiley & Sons, Ltd.
Pandey, Abhishek; Kreimeyer, Kory; Foster, Matthew; Botsis, Taxiarchis; Dang, Oanh; Ly, Thomas; Wang, Wei; Forshee, Richard
2018-01-01
Structured Product Labels follow an XML-based document markup standard approved by the Health Level Seven organization and adopted by the US Food and Drug Administration as a mechanism for exchanging medical products information. Their current organization makes their secondary use rather challenging. We used the Side Effect Resource database and DailyMed to generate a comparison dataset of 1159 Structured Product Labels. We processed the Adverse Reaction section of these Structured Product Labels with the Event-based Text-mining of Health Electronic Records system and evaluated its ability to extract and encode Adverse Event terms to Medical Dictionary for Regulatory Activities Preferred Terms. A small sample of 100 labels was then selected for further analysis. Of the 100 labels, Event-based Text-mining of Health Electronic Records achieved a precision and recall of 81 percent and 92 percent, respectively. This study demonstrated Event-based Text-mining of Health Electronic Record's ability to extract and encode Adverse Event terms from Structured Product Labels which may potentially support multiple pharmacoepidemiological tasks.
Data Processing and Text Mining Technologies on Electronic Medical Records: A Review
Sun, Wencheng; Li, Yangyang; Liu, Fang; Fang, Shengqun; Wang, Guoyan
2018-01-01
Currently, medical institutes generally use EMR to record patient's condition, including diagnostic information, procedures performed, and treatment results. EMR has been recognized as a valuable resource for large-scale analysis. However, EMR has the characteristics of diversity, incompleteness, redundancy, and privacy, which make it difficult to carry out data mining and analysis directly. Therefore, it is necessary to preprocess the source data in order to improve data quality and improve the data mining results. Different types of data require different processing technologies. Most structured data commonly needs classic preprocessing technologies, including data cleansing, data integration, data transformation, and data reduction. For semistructured or unstructured data, such as medical text, containing more health information, it requires more complex and challenging processing methods. The task of information extraction for medical texts mainly includes NER (named-entity recognition) and RE (relation extraction). This paper focuses on the process of EMR processing and emphatically analyzes the key techniques. In addition, we make an in-depth study on the applications developed based on text mining together with the open challenges and research issues for future work. PMID:29849998
30 CFR 785.25 - Lands eligible for remining.
Code of Federal Regulations, 2010 CFR
2010-07-01
... application to conduct a surface coal mining operation on lands eligible for remining must comply with this... environmental and safety problems related to prior mining activity at the site and that could be reasonably... tailored to current site conditions. (2) With regard to potential environmental and safety problems...
Analysis of radon reduction and ventilation systems in uranium mines in China.
Hu, Peng-hua; Li, Xian-jie
2012-09-01
Mine ventilation is the most important way of reducing radon in uranium mines. At present, the radon and radon progeny levels in Chinese uranium mines where the cut and fill stoping method is used are 3-5 times higher than those in foreign uranium mines, as there is not much difference in the investments for ventilation protection between Chinese uranium mines and international advanced uranium mines with compaction methodology. In this paper, through the analysis of radon reduction and ventilation systems in Chinese uranium mines and the comparison of advantages and disadvantages between a variety of ventilation systems in terms of radon control, the authors try to illustrate the reasons for the higher radon and radon progeny levels in Chinese uranium mines and put forward some problems in three areas, namely the theory of radon control and ventilation systems, radon reduction ventilation measures and ventilation management. For these problems, this paper puts forward some proposals regarding some aspects, such as strengthening scrutiny, verifying and monitoring the practical situation, making clear ventilation plans, strictly following the mining sequence, promoting training of ventilation staff, enhancing ventilation system management, developing radon reduction ventilation technology, purchasing ventilation equipment as soon as possible in the future, and so on.
North Branch Potomac River Basin mine drainage study. Phase I. Baseline survey. Final report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
1977-05-06
This baseline survey of the mine drainage and related water resources of the North Branch Potomac River Basin established the extent, magnitude, and effects of coal mine drainage pollution. Alternative abatement and reclamation solutions were considered. The study included an analysis of socioeconomic and environmental conditions as related to the mine drainage problem.
An overview of the BioCreative 2012 Workshop Track III: interactive text mining task
Arighi, Cecilia N.; Carterette, Ben; Cohen, K. Bretonnel; Krallinger, Martin; Wilbur, W. John; Fey, Petra; Dodson, Robert; Cooper, Laurel; Van Slyke, Ceri E.; Dahdul, Wasila; Mabee, Paula; Li, Donghui; Harris, Bethany; Gillespie, Marc; Jimenez, Silvia; Roberts, Phoebe; Matthews, Lisa; Becker, Kevin; Drabkin, Harold; Bello, Susan; Licata, Luana; Chatr-aryamontri, Andrew; Schaeffer, Mary L.; Park, Julie; Haendel, Melissa; Van Auken, Kimberly; Li, Yuling; Chan, Juancarlos; Muller, Hans-Michael; Cui, Hong; Balhoff, James P.; Chi-Yang Wu, Johnny; Lu, Zhiyong; Wei, Chih-Hsuan; Tudor, Catalina O.; Raja, Kalpana; Subramani, Suresh; Natarajan, Jeyakumar; Cejuela, Juan Miguel; Dubey, Pratibha; Wu, Cathy
2013-01-01
In many databases, biocuration primarily involves literature curation, which usually involves retrieving relevant articles, extracting information that will translate into annotations and identifying new incoming literature. As the volume of biological literature increases, the use of text mining to assist in biocuration becomes increasingly relevant. A number of groups have developed tools for text mining from a computer science/linguistics perspective, and there are many initiatives to curate some aspect of biology from the literature. Some biocuration efforts already make use of a text mining tool, but there have not been many broad-based systematic efforts to study which aspects of a text mining tool contribute to its usefulness for a curation task. Here, we report on an effort to bring together text mining tool developers and database biocurators to test the utility and usability of tools. Six text mining systems presenting diverse biocuration tasks participated in a formal evaluation, and appropriate biocurators were recruited for testing. The performance results from this evaluation indicate that some of the systems were able to improve efficiency of curation by speeding up the curation task significantly (∼1.7- to 2.5-fold) over manual curation. In addition, some of the systems were able to improve annotation accuracy when compared with the performance on the manually curated set. In terms of inter-annotator agreement, the factors that contributed to significant differences for some of the systems included the expertise of the biocurator on the given curation task, the inherent difficulty of the curation and attention to annotation guidelines. After the task, annotators were asked to complete a survey to help identify strengths and weaknesses of the various systems. The analysis of this survey highlights how important task completion is to the biocurators’ overall experience of a system, regardless of the system’s high score on design, learnability and usability. In addition, strategies to refine the annotation guidelines and systems documentation, to adapt the tools to the needs and query types the end user might have and to evaluate performance in terms of efficiency, user interface, result export and traditional evaluation metrics have been analyzed during this task. This analysis will help to plan for a more intense study in BioCreative IV. PMID:23327936
An overview of the BioCreative 2012 Workshop Track III: interactive text mining task.
Arighi, Cecilia N; Carterette, Ben; Cohen, K Bretonnel; Krallinger, Martin; Wilbur, W John; Fey, Petra; Dodson, Robert; Cooper, Laurel; Van Slyke, Ceri E; Dahdul, Wasila; Mabee, Paula; Li, Donghui; Harris, Bethany; Gillespie, Marc; Jimenez, Silvia; Roberts, Phoebe; Matthews, Lisa; Becker, Kevin; Drabkin, Harold; Bello, Susan; Licata, Luana; Chatr-aryamontri, Andrew; Schaeffer, Mary L; Park, Julie; Haendel, Melissa; Van Auken, Kimberly; Li, Yuling; Chan, Juancarlos; Muller, Hans-Michael; Cui, Hong; Balhoff, James P; Chi-Yang Wu, Johnny; Lu, Zhiyong; Wei, Chih-Hsuan; Tudor, Catalina O; Raja, Kalpana; Subramani, Suresh; Natarajan, Jeyakumar; Cejuela, Juan Miguel; Dubey, Pratibha; Wu, Cathy
2013-01-01
In many databases, biocuration primarily involves literature curation, which usually involves retrieving relevant articles, extracting information that will translate into annotations and identifying new incoming literature. As the volume of biological literature increases, the use of text mining to assist in biocuration becomes increasingly relevant. A number of groups have developed tools for text mining from a computer science/linguistics perspective, and there are many initiatives to curate some aspect of biology from the literature. Some biocuration efforts already make use of a text mining tool, but there have not been many broad-based systematic efforts to study which aspects of a text mining tool contribute to its usefulness for a curation task. Here, we report on an effort to bring together text mining tool developers and database biocurators to test the utility and usability of tools. Six text mining systems presenting diverse biocuration tasks participated in a formal evaluation, and appropriate biocurators were recruited for testing. The performance results from this evaluation indicate that some of the systems were able to improve efficiency of curation by speeding up the curation task significantly (∼1.7- to 2.5-fold) over manual curation. In addition, some of the systems were able to improve annotation accuracy when compared with the performance on the manually curated set. In terms of inter-annotator agreement, the factors that contributed to significant differences for some of the systems included the expertise of the biocurator on the given curation task, the inherent difficulty of the curation and attention to annotation guidelines. After the task, annotators were asked to complete a survey to help identify strengths and weaknesses of the various systems. The analysis of this survey highlights how important task completion is to the biocurators' overall experience of a system, regardless of the system's high score on design, learnability and usability. In addition, strategies to refine the annotation guidelines and systems documentation, to adapt the tools to the needs and query types the end user might have and to evaluate performance in terms of efficiency, user interface, result export and traditional evaluation metrics have been analyzed during this task. This analysis will help to plan for a more intense study in BioCreative IV.
76 FR 12849 - Kentucky Regulatory Program
Federal Register 2010, 2011, 2012, 2013, 2014
2011-03-09
... (underground mining). The text of the Kentucky regulations can be found in the administrative record and online... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 917 [KY-252-FOR; OSM-2009-0011] Kentucky Regulatory Program AGENCY: Office of Surface Mining Reclamation...
NASA Astrophysics Data System (ADS)
Popov, Valeriy; Filatov, Yuriy; Lee, Hee; Golik, Anatoliy
2017-11-01
The paper discusses the problem of the underground mining safety control. The long-term air intake to coal accumulations is reviewed as one of the reasons of endogenous fires during mining. The methods of combating air leaks (inflows) in order to prevent endogenous fires are analyzed. The calculations showing the discrepancy between the design calculations for the mine ventilation, disregarding a number of mining-andgeological and mining-engineering factors, and the actual conditions of mining are given. It is proved that the conversion of operating mines to combined (pressure and exhaust) ventilation system in order to reduce the endogenous fire hazard of underground mining is unreasonable due to impossibility of providing an optimal distribution of aerodynamic pressure in mines. The conversion does not exclude the entry of air into potentially hazardous zones of endogenous fires. The essence of the combined application of positive and negative control methods for the distribution of air pressure is revealed. It consists of air doors installation in easily ventilated airways and installation of pressure equalization chambers equipped with auxiliary fans near the stoppings, working sections and in parallel airways.The effectiveness of the combined application of negative and positive control methods for the air pressure distribution in order to reduce endogenous fire hazard of mining operations is proved.
Research on Classification of Chinese Text Data Based on SVM
NASA Astrophysics Data System (ADS)
Lin, Yuan; Yu, Hongzhi; Wan, Fucheng; Xu, Tao
2017-09-01
Data Mining has important application value in today’s industry and academia. Text classification is a very important technology in data mining. At present, there are many mature algorithms for text classification. KNN, NB, AB, SVM, decision tree and other classification methods all show good classification performance. Support Vector Machine’ (SVM) classification method is a good classifier in machine learning research. This paper will study the classification effect based on the SVM method in the Chinese text data, and use the support vector machine method in the chinese text to achieve the classify chinese text, and to able to combination of academia and practical application.
StemTextSearch: Stem cell gene database with evidence from abstracts.
Chen, Chou-Cheng; Ho, Chung-Liang
2017-05-01
Previous studies have used many methods to find biomarkers in stem cells, including text mining, experimental data and image storage. However, no text-mining methods have yet been developed which can identify whether a gene plays a positive or negative role in stem cells. StemTextSearch identifies the role of a gene in stem cells by using a text-mining method to find combinations of gene regulation, stem-cell regulation and cell processes in the same sentences of biomedical abstracts. The dataset includes 5797 genes, with 1534 genes having positive roles in stem cells, 1335 genes having negative roles, 1654 genes with both positive and negative roles, and 1274 with an uncertain role. The precision of gene role in StemTextSearch is 0.66, and the recall is 0.78. StemTextSearch is a web-based engine with queries that specify (i) gene, (ii) category of stem cell, (iii) gene role, (iv) gene regulation, (v) cell process, (vi) stem-cell regulation, and (vii) species. StemTextSearch is available through http://bio.yungyun.com.tw/StemTextSearch.aspx. Copyright © 2017. Published by Elsevier Inc.
Implementation of Paste Backfill Mining Technology in Chinese Coal Mines
Chang, Qingliang; Zhou, Huaqiang; Bai, Jianbiao
2014-01-01
Implementation of clean mining technology at coal mines is crucial to protect the environment and maintain balance among energy resources, consumption, and ecology. After reviewing present coal clean mining technology, we introduce the technology principles and technological process of paste backfill mining in coal mines and discuss the components and features of backfill materials, the constitution of the backfill system, and the backfill process. Specific implementation of this technology and its application are analyzed for paste backfill mining in Daizhuang Coal Mine; a practical implementation shows that paste backfill mining can improve the safety and excavation rate of coal mining, which can effectively resolve surface subsidence problems caused by underground mining activities, by utilizing solid waste such as coal gangues as a resource. Therefore, paste backfill mining is an effective clean coal mining technology, which has widespread application. PMID:25258737
Implementation of paste backfill mining technology in Chinese coal mines.
Chang, Qingliang; Chen, Jianhang; Zhou, Huaqiang; Bai, Jianbiao
2014-01-01
Implementation of clean mining technology at coal mines is crucial to protect the environment and maintain balance among energy resources, consumption, and ecology. After reviewing present coal clean mining technology, we introduce the technology principles and technological process of paste backfill mining in coal mines and discuss the components and features of backfill materials, the constitution of the backfill system, and the backfill process. Specific implementation of this technology and its application are analyzed for paste backfill mining in Daizhuang Coal Mine; a practical implementation shows that paste backfill mining can improve the safety and excavation rate of coal mining, which can effectively resolve surface subsidence problems caused by underground mining activities, by utilizing solid waste such as coal gangues as a resource. Therefore, paste backfill mining is an effective clean coal mining technology, which has widespread application.
Hydrology of area 18, Eastern Coal Province, Tennessee
May, V.J.
1981-01-01
The Eastern Coal Province is divided into 24 hydrologic reporting areas. This report describes the hydrology of area 18 which is located in the Cumberland River basin in central Tennessee near the southern end of the Province. Hydrologic information and sources are presented as text, tables, maps, and other illustrations designed to be useful to mine owners, operators, and consulting engineers in implementing permit applications that comply with the environmental requirements of the ' Surface Mining Control and Reclamation Act of 1977. ' Area 18 encompasses parts of three physiographic regions; from east to west the Cumberland Plateau, Highland Rim, and Central Basin. The Plateau is underlain by sandstones and shales, with thin interbedded coal beds, of Pennsylvanian age. The Highland Rim and Central Basin are underlain by limestone and dolomite of Mississippian age. Field and laboratory analyses of chemical and physical water-quality parameters of streamflow samples show no widespread water quality problems. Some streams, however, in the heavily mined areas have concentrations of sulfate, iron, manganese, and sediment above natural levels, and pH values below natural levels. Mine seepage and direct mine drainage were not sampled. Ground water occurs in and moves through fractures in the sandstones and shales and solution openings in the limestones and dolomites. Depth to water is variable, ranging from about 5 to 70 feet below land-surface in the limestones and dolomites, and 15 to 40 feet in the coal-bearing rocks. The quality of ground water is generally good. Locally, in coal-bearing rocks, acidic water and high concentrations of manganese, chloride, and iron have been detected. (USGS)
GROUNDWATER IMPACTED BY ACID MINE DRAINAGE
The generation and release of acidic, metal-rich water from mine wastes continues to be an intractable environmental problem. Although the effects of acid mine drainage (AMD) are most evident in surface waters, there is an obvious need for developing cost-effective approaches fo...
Problems associated with noise measurements in the mining industry
NASA Astrophysics Data System (ADS)
Bauer, Eric R.; Vipperman, Jeffrey S.
2002-05-01
In response to the continuing problem of noise-induced hearing loss (NIHL) among mine workers, the National Institute for Occupational Safety and Health (NIOSH) has been conducting numerous noise- and hearing-loss research efforts in the mining industry. Research is underway to determine worker noise exposure, equipment noise, hearing loss and hearing protection use, and to evaluate engineering controls. Issues that are peculiar to the mining industry have complicated these efforts. A few of the issues that must be overcome to conduct meaningful research include constantly moving equipment, changing work environments, confined space, varying production rates, multiple noise sources, and electronic permissibility of instrumentation. This presentation will address the factors that affect the measurement and analysis of noise in the mining industry and how these factors are managed. In addition, some examples of research results will be included.
Developing image processing meta-algorithms with data mining of multiple metrics.
Leung, Kelvin; Cunha, Alexandre; Toga, A W; Parker, D Stott
2014-01-01
People often use multiple metrics in image processing, but here we take a novel approach of mining the values of batteries of metrics on image processing results. We present a case for extending image processing methods to incorporate automated mining of multiple image metric values. Here by a metric we mean any image similarity or distance measure, and in this paper we consider intensity-based and statistical image measures and focus on registration as an image processing problem. We show how it is possible to develop meta-algorithms that evaluate different image processing results with a number of different metrics and mine the results in an automated fashion so as to select the best results. We show that the mining of multiple metrics offers a variety of potential benefits for many image processing problems, including improved robustness and validation.
Deriving novel relationships from the scientific literature is an important adjunct to datamining activities for complex datasets in genomics and high-throughput screening activities. Automated text-mining algorithms can be used to extract relevant content from the literature and...
A Feature Mining Based Approach for the Classification of Text Documents into Disjoint Classes.
ERIC Educational Resources Information Center
Nieto Sanchez, Salvador; Triantaphyllou, Evangelos; Kraft, Donald
2002-01-01
Proposes a new approach for classifying text documents into two disjoint classes. Highlights include a brief overview of document clustering; a data mining approach called the One Clause at a Time (OCAT) algorithm which is based on mathematical logic; vector space model (VSM); and comparing the OCAT to the VSM. (Author/LRW)
ERIC Educational Resources Information Center
Hung, Jui-Long; Zhang, Ke
2012-01-01
This study investigated the longitudinal trends of academic articles in Mobile Learning (ML) using text mining techniques. One hundred and nineteen (119) refereed journal articles and proceedings papers from the SCI/SSCI database were retrieved and analyzed. The taxonomies of ML publications were grouped into twelve clusters (topics) and four…
Trends of E-Learning Research from 2000 to 2008: Use of Text Mining and Bibliometrics
ERIC Educational Resources Information Center
Hung, Jui-long
2012-01-01
This study investigated the longitudinal trends of e-learning research using text mining techniques. Six hundred and eighty-nine (689) refereed journal articles and proceedings were retrieved from the Science Citation Index/Social Science Citation Index database in the period from 2000 to 2008. All e-learning publications were grouped into two…
Frequent Itemset Hiding Algorithm Using Frequent Pattern Tree Approach
ERIC Educational Resources Information Center
Alnatsheh, Rami
2012-01-01
A problem that has been the focus of much recent research in privacy preserving data-mining is the frequent itemset hiding (FIH) problem. Identifying itemsets that appear together frequently in customer transactions is a common task in association rule mining. Organizations that share data with business partners may consider some of the frequent…
Kreula, Sanna M.; Kaewphan, Suwisa; Ginter, Filip
2018-01-01
The increasing move towards open access full-text scientific literature enhances our ability to utilize advanced text-mining methods to construct information-rich networks that no human will be able to grasp simply from ‘reading the literature’. The utility of text-mining for well-studied species is obvious though the utility for less studied species, or those with no prior track-record at all, is not clear. Here we present a concept for how advanced text-mining can be used to create information-rich networks even for less well studied species and apply it to generate an open-access gene-gene association network resource for Synechocystis sp. PCC 6803, a representative model organism for cyanobacteria and first case-study for the methodology. By merging the text-mining network with networks generated from species-specific experimental data, network integration was used to enhance the accuracy of predicting novel interactions that are biologically relevant. A rule-based algorithm (filter) was constructed in order to automate the search for novel candidate genes with a high degree of likely association to known target genes by (1) ignoring established relationships from the existing literature, as they are already ‘known’, and (2) demanding multiple independent evidences for every novel and potentially relevant relationship. Using selected case studies, we demonstrate the utility of the network resource and filter to (i) discover novel candidate associations between different genes or proteins in the network, and (ii) rapidly evaluate the potential role of any one particular gene or protein. The full network is provided as an open-source resource. PMID:29844966
Van Landeghem, Sofie; Abeel, Thomas; Saeys, Yvan; Van de Peer, Yves
2010-09-15
In the field of biomolecular text mining, black box behavior of machine learning systems currently limits understanding of the true nature of the predictions. However, feature selection (FS) is capable of identifying the most relevant features in any supervised learning setting, providing insight into the specific properties of the classification algorithm. This allows us to build more accurate classifiers while at the same time bridging the gap between the black box behavior and the end-user who has to interpret the results. We show that our FS methodology successfully discards a large fraction of machine-generated features, improving classification performance of state-of-the-art text mining algorithms. Furthermore, we illustrate how FS can be applied to gain understanding in the predictions of a framework for biomolecular event extraction from text. We include numerous examples of highly discriminative features that model either biological reality or common linguistic constructs. Finally, we discuss a number of insights from our FS analyses that will provide the opportunity to considerably improve upon current text mining tools. The FS algorithms and classifiers are available in Java-ML (http://java-ml.sf.net). The datasets are publicly available from the BioNLP'09 Shared Task web site (http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/SharedTask/).
An update on USGS studies of the Summitville Mine and its downstream environmental effects
Plumlee, Geoffrey S.; Edelmann, Patrick R.
1995-01-01
The Summitville gold mine, located at ~3800 meters (11,500 ft) elevation in the San Juan Mountains of southwestern Colorado, was the focus of extensive public attention in 1992 and 1993 for environmental problems stemming from recent open-pit mining activities. Summitville catalyzed national debates about the environmental effects of modern mining activities, and became the focus of arguments for proposed revisions to the 1872 Mining Law governing mining activities on public lands. In early 1993, the State of Colorado, U.S. Environmental Protection Agency (EPA), U.S. Geological Survey (USGS), U.S. Fish and Wildlife Service (USFWS), Colorado State University, San Luis Valley agencies, downstream water users, private companies, and individuals began a multi-disciplinary research program to provide needed scientific information on Summitville's environmental problems and downstream environmental effects. Detailed results of this multi-agency effort were presented, along with legal and policy issues, at the Summitville Forum in January, 1995, at Colorado State University, Fort Collins, Colorado.
Mine countermeasures (MCM) sensor technology drivers
NASA Astrophysics Data System (ADS)
Skinner, David P.
1995-06-01
In recent years, MCM has moved to the forefront of the Navy's attention. This paper describes the general problems that drive the technology requirements of classical sea mine countermeasure (MCM) sensors for those working outside of this specialized area. Sensor requirements for MCM are compared with those for antisubmarine warfare. This highlights the unique environmental issues and crucial false target problems. The elimination of false targets, not mine detection, is the principal driver of MCM sensor requirements and places special emphasis on the technologies needed for the sequential operations of detection, classification, and identification.
NASA Astrophysics Data System (ADS)
Sirota, Dmitry; Ivanov, Vadim
2017-11-01
Any mining operations influence stability of natural and technogenic massifs are the reason of emergence of the sources of differences of mechanical tension. These sources generate a quasistationary electric field with a Newtonian potential. The paper reviews the method of determining the shape and size of a flat source field with this kind of potential. This common problem meets in many fields of mining: geological exploration mineral resources, ore deposits, control of mining by underground method, determining coal self-heating source, localization of the rock crack's sources and other applied problems of practical physics. This problems are ill-posed and inverse and solved by converting to Fredholm-Uryson integral equation of the first kind. This equation will be solved by A.N. Tikhonov regularization method.
Generative Topic Modeling in Image Data Mining and Bioinformatics Studies
ERIC Educational Resources Information Center
Chen, Xin
2012-01-01
Probabilistic topic models have been developed for applications in various domains such as text mining, information retrieval and computer vision and bioinformatics domain. In this thesis, we focus on developing novel probabilistic topic models for image mining and bioinformatics studies. Specifically, a probabilistic topic-connection (PTC) model…
Federal Register 2010, 2011, 2012, 2013, 2014
2013-07-05
... silver mining operation. Most of the infrastructure to support a mining operation was authorized and.... The Proposed Action consists of underground mining, constructing a new production shaft, improving.... Public comments resulted in the addition of clarifying text, but did not significantly change the...
NASA Astrophysics Data System (ADS)
Masaitis, Alexandra
2014-05-01
New economic, environmental and social challenges for the mining industry in the USA show the need to implement "responsible" mining practices that include improved community involvement. Conflicts which occur in the US territory and with US mining companies around the world are now common between the mining proponents, NGO's and communities. These conflicts can sometimes be alleviated by early development of modes of communication, and a formal discussion format that allows airing of concerns and potential resolution of problems. One of the methods that can formalize this process is to establish a Good Neighbor Agreement (GNA), which deals specifically with challenges in relationships between mining operations and the local communities. It is a new practice related to mining operations that are oriented toward social needs and concerns of local communities that arise during the normal life of a mine, which can achieve sustainable mining practices. The GNA project being currently developed at the University of Nevada, USA in cooperation with the Newmont Mining Corporation has a goal of creating an open company/community dialog that will help identify and address sociological and environmental concerns associated with mining. Discussion: The Good Neighbor Agreement currently evolving will address the following: 1. Identify spheres of possible cooperation between mining companies, government organizations, and NGO's. 2. Provide an economically viable mechanism for developing a partnership between mining operations and the local communities that will increase mining industry's accountability and provide higher levels of confidence for the community that a mine is operated in a safe and sustainable manner. Implementation of the GNA can help identify and evaluate conflict criteria in mining/community relationships; determine the status of concerns; determine the role and responsibilities of stakeholders; analyze problem resolution feasibility; maintain the community involvement and support through economic benefits and environmental safeguards; develop options for the concerns resolution. Difficulties in establishing the GNA standards include lack of insurance/bonding policies, and by the lack of audit and monitoring that could determine the level of exposure of the local community and the environment to the contaminants released at the mine sites. Since many problems of mines can occur during closure and post-closure, GNA's should address those issues also. The goal of the GNA is to have open access for the public to the safety, health, and environmental information pertaining to the mining operation, as well as to educate the local communities about mining practices that promote mutual acknowledgment of the need to build a relationship amenable to each other's needs. Frequent conflicts between mining companies and surrounding communities lead to work disruptions or even mine closures and show the necessity of a less confrontational approach to environmental and social justice. The Good Neighbor Agreement is a unique way to provide the benefits for the both mining operations and local community to provide a mechanism for risk redaction and communication that offer the potential to protect both mining and community interests.
INNOVATIVE, IN SITU TREATMENT OF ACID MINE DRAINAGE USING SULFATE REDUCING BACTERIA
Acid generation in abandoned mines is a widespread problem. There are a numberous quantity of abandoned mines in the west which have no power source, have limited physical accessibility and have limited remediation funds available. Acid is produced chemically, through pyritic min...
Computer-Aided Analysis of Patents for Product Technology Maturity Forecasting
NASA Astrophysics Data System (ADS)
Liang, Yanhong; Gan, Dequan; Guo, Yingchun; Zhang, Peng
Product technology maturity foresting is vital for any enterprises to hold the chance for innovation and keep competitive for a long term. The Theory of Invention Problem Solving (TRIZ) is acknowledged both as a systematic methodology for innovation and a powerful tool for technology forecasting. Based on TRIZ, the state -of-the-art on the technology maturity of product and the limits of application are discussed. With the application of text mining and patent analysis technologies, this paper proposes a computer-aided approach for product technology maturity forecasting. It can overcome the shortcomings of the current methods.
NASA Astrophysics Data System (ADS)
Demigha, Souâd.
2016-03-01
The paper presents a Case-Based Reasoning Tool for Breast Cancer Knowledge Management to improve breast cancer screening. To develop this tool, we combine both concepts and techniques of Case-Based Reasoning (CBR) and Data Mining (DM). Physicians and radiologists ground their diagnosis on their expertise (past experience) based on clinical cases. Case-Based Reasoning is the process of solving new problems based on the solutions of similar past problems and structured as cases. CBR is suitable for medical use. On the other hand, existing traditional hospital information systems (HIS), Radiological Information Systems (RIS) and Picture Archiving Information Systems (PACS) don't allow managing efficiently medical information because of its complexity and heterogeneity. Data Mining is the process of mining information from a data set and transform it into an understandable structure for further use. Combining CBR to Data Mining techniques will facilitate diagnosis and decision-making of medical experts.
Developing Image Processing Meta-Algorithms with Data Mining of Multiple Metrics
Cunha, Alexandre; Toga, A. W.; Parker, D. Stott
2014-01-01
People often use multiple metrics in image processing, but here we take a novel approach of mining the values of batteries of metrics on image processing results. We present a case for extending image processing methods to incorporate automated mining of multiple image metric values. Here by a metric we mean any image similarity or distance measure, and in this paper we consider intensity-based and statistical image measures and focus on registration as an image processing problem. We show how it is possible to develop meta-algorithms that evaluate different image processing results with a number of different metrics and mine the results in an automated fashion so as to select the best results. We show that the mining of multiple metrics offers a variety of potential benefits for many image processing problems, including improved robustness and validation. PMID:24653748
Analysis of Nature of Science Included in Recent Popular Writing Using Text Mining Techniques
ERIC Educational Resources Information Center
Jiang, Feng; McComas, William F.
2014-01-01
This study examined the inclusion of nature of science (NOS) in popular science writing to determine whether it could serve supplementary resource for teaching NOS and to evaluate the accuracy of text mining and classification as a viable research tool in science education research. Four groups of documents published from 2001 to 2010 were…
ERIC Educational Resources Information Center
Cheon, Jongpil; Lee, Sangno; Smith, Walter; Song, Jaeki; Kim, Yongjin
2013-01-01
The purpose of this study was to use text mining analysis of early adolescents' online essays to determine their knowledge of global lunar patterns. Australian and American students in grades five to seven wrote about global lunar patterns they had discovered by sharing observations with each other via the Internet. These essays were analyzed for…
ERIC Educational Resources Information Center
Çepni, Sevcan Bayraktar; Demirel, Elif Tokdemir
2016-01-01
This study aimed to find out the impact of "text mining and imitating" strategies on lexical richness, lexical diversity and general success of students in their compositions in second language writing. The participants were 98 students studying their first year in Karadeniz Technical University in English Language and Literature…
Science and Technology Text Mining: Text Mining of the Journal Cortex
2004-01-01
Amnesia Retrograde Amnesia GENERAL Semantic Memory Episodic Memory Working Memory TEST Serial Position Curve...in Cortex can be reasonably divided into four categories (papers in each category in parenthesis): Semantic Memory (151); Handedness (145); Amnesia ... Semantic Memory (151) is divided into Verbal/ Numerical (76) and Visual/ Spatial (75). Amnesia (119) is divided into Amnesia Symptoms (50) and
Designed to chronicle the pollution problem occurring at one of the world's largest Superfund sites and to address the potential of the application of the approach of the recovery of metal resources occurring in the acid mine drainage causing the pollution problem. Acid mine drai...
Data Mining for Financial Applications
NASA Astrophysics Data System (ADS)
Kovalerchuk, Boris; Vityaev, Evgenii
This chapter describes Data Mining in finance by discussing financial tasks, specifics of methodologies and techniques in this Data Mining area. It includes time dependence, data selection, forecast horizon, measures of success, quality of patterns, hypothesis evaluation, problem ID, method profile, attribute-based and relational methodologies. The second part of the chapter discusses Data Mining models and practice in finance. It covers use of neural networks in portfolio management, design of interpretable trading rules and discovering money laundering schemes using decision rules and relational Data Mining methodology.
Biomedical hypothesis generation by text mining and gene prioritization.
Petric, Ingrid; Ligeti, Balazs; Gyorffy, Balazs; Pongor, Sandor
2014-01-01
Text mining methods can facilitate the generation of biomedical hypotheses by suggesting novel associations between diseases and genes. Previously, we developed a rare-term model called RaJoLink (Petric et al, J. Biomed. Inform. 42(2): 219-227, 2009) in which hypotheses are formulated on the basis of terms rarely associated with a target domain. Since many current medical hypotheses are formulated in terms of molecular entities and molecular mechanisms, here we extend the methodology to proteins and genes, using a standardized vocabulary as well as a gene/protein network model. The proposed enhanced RaJoLink rare-term model combines text mining and gene prioritization approaches. Its utility is illustrated by finding known as well as potential gene-disease associations in ovarian cancer using MEDLINE abstracts and the STRING database.
The Functional Genomics Network in the evolution of biological text mining over the past decade.
Blaschke, Christian; Valencia, Alfonso
2013-03-25
Different programs of The European Science Foundation (ESF) have contributed significantly to connect researchers in Europe and beyond through several initiatives. This support was particularly relevant for the development of the areas related with extracting information from papers (text-mining) because it supported the field in its early phases long before it was recognized by the community. We review the historical development of text mining research and how it was introduced in bioinformatics. Specific applications in (functional) genomics are described like it's integration in genome annotation pipelines and the support to the analysis of high-throughput genomics experimental data, and we highlight the activities of evaluation of methods and benchmarking for which the ESF programme support was instrumental. Copyright © 2013 Elsevier B.V. All rights reserved.
Agile Text Mining for the 2014 i2b2/UTHealth Cardiac Risk Factors Challenge
Cormack, James; Nath, Chinmoy; Milward, David; Raja, Kalpana; Jonnalagadda, Siddhartha R
2016-01-01
This paper describes the use of an agile text mining platform (Linguamatics’ Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors in patient records as defined in the i2b2/UTHealth 2014 Challenge. The approach uses a data-driven rule-based methodology with the addition of a simple supervised classifier. We demonstrate that agile text mining allows for rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluation of this approach on the test data gave an F-Score of 91.7%, one percent behind the top performing system. PMID:26209007
ASSESSING THE WATER QUALITY OF MINE-IMPACTED STREAMS USING HYPERSPECTRAL DATA
Streoan degradation by mining activities is a wide spread problem in the eastern US. Drainage from coal and ferrous metal mines can produce large quantities of sediment and acidity, which can have a deleterious impact an receiving waters. The mineralogy of these sediments is ...
LABORATORY EVALUATION OF ZERO-VALENT IRON TO TREAT GROUNDWATER IMPACTED BY ACID MINE DRAINAGE
The generation and release of acidic, metal-rich water from mine wastes continues to be an intractable environmental problem. Although the effects of acid mine drainage (AMD) are most evident in surface waters, there is an obvious need for developing cost-effective approaches fo...
Non-standard equipment for construction of vertical shafts
NASA Astrophysics Data System (ADS)
Yagodkin, F. I.; Prokopov, A. Y.; Pleshko, M. S.; Pankratenko, A. N.
2017-10-01
The article deals with the modern problems of construction and reconstruction of vertical shafts of mines, which require innovative technical solutions in the mechanization of mining operations. The examples developed by the authors of the original equipment and technologies, are successfully implemented for the mining industry in Russia.
Fu, Xiao; Batista-Navarro, Riza; Rak, Rafal; Ananiadou, Sophia
2015-01-01
Chronic obstructive pulmonary disease (COPD) is a life-threatening lung disorder whose recent prevalence has led to an increasing burden on public healthcare. Phenotypic information in electronic clinical records is essential in providing suitable personalised treatment to patients with COPD. However, as phenotypes are often "hidden" within free text in clinical records, clinicians could benefit from text mining systems that facilitate their prompt recognition. This paper reports on a semi-automatic methodology for producing a corpus that can ultimately support the development of text mining tools that, in turn, will expedite the process of identifying groups of COPD patients. A corpus of 30 full-text papers was formed based on selection criteria informed by the expertise of COPD specialists. We developed an annotation scheme that is aimed at producing fine-grained, expressive and computable COPD annotations without burdening our curators with a highly complicated task. This was implemented in the Argo platform by means of a semi-automatic annotation workflow that integrates several text mining tools, including a graphical user interface for marking up documents. When evaluated using gold standard (i.e., manually validated) annotations, the semi-automatic workflow was shown to obtain a micro-averaged F-score of 45.70% (with relaxed matching). Utilising the gold standard data to train new concept recognisers, we demonstrated that our corpus, although still a work in progress, can foster the development of significantly better performing COPD phenotype extractors. We describe in this work the means by which we aim to eventually support the process of COPD phenotype curation, i.e., by the application of various text mining tools integrated into an annotation workflow. Although the corpus being described is still under development, our results thus far are encouraging and show great potential in stimulating the development of further automatic COPD phenotype extractors.
Localisation of an Unknown Number of Land Mines Using a Network of Vapour Detectors
Chhadé, Hiba Haj; Abdallah, Fahed; Mougharbel, Imad; Gning, Amadou; Julier, Simon; Mihaylova, Lyudmila
2014-01-01
We consider the problem of localising an unknown number of land mines using concentration information provided by a wireless sensor network. A number of vapour sensors/detectors, deployed in the region of interest, are able to detect the concentration of the explosive vapours, emanating from buried land mines. The collected data is communicated to a fusion centre. Using a model for the transport of the explosive chemicals in the air, we determine the unknown number of sources using a Principal Component Analysis (PCA)-based technique. We also formulate the inverse problem of determining the positions and emission rates of the land mines using concentration measurements provided by the wireless sensor network. We present a solution for this problem based on a probabilistic Bayesian technique using a Markov chain Monte Carlo sampling scheme, and we compare it to the least squares optimisation approach. Experiments conducted on simulated data show the effectiveness of the proposed approach. PMID:25384008
Ding, Xu; Shi, Lei; Han, Jianghong; Lu, Jingting
2016-01-01
Wireless sensor networks deployed in coal mines could help companies provide workers working in coal mines with more qualified working conditions. With the underground information collected by sensor nodes at hand, the underground working conditions could be evaluated more precisely. However, sensor nodes may tend to malfunction due to their limited energy supply. In this paper, we study the cross-layer optimization problem for wireless rechargeable sensor networks implemented in coal mines, of which the energy could be replenished through the newly-brewed wireless energy transfer technique. The main results of this article are two-fold: firstly, we obtain the optimal relay nodes’ placement according to the minimum overall energy consumption criterion through the Lagrange dual problem and KKT conditions; secondly, the optimal strategies for recharging locomotives and wireless sensor networks are acquired by solving a cross-layer optimization problem. The cyclic nature of these strategies is also manifested through simulations in this paper. PMID:26828500
Ding, Xu; Shi, Lei; Han, Jianghong; Lu, Jingting
2016-01-28
Wireless sensor networks deployed in coal mines could help companies provide workers working in coal mines with more qualified working conditions. With the underground information collected by sensor nodes at hand, the underground working conditions could be evaluated more precisely. However, sensor nodes may tend to malfunction due to their limited energy supply. In this paper, we study the cross-layer optimization problem for wireless rechargeable sensor networks implemented in coal mines, of which the energy could be replenished through the newly-brewed wireless energy transfer technique. The main results of this article are two-fold: firstly, we obtain the optimal relay nodes' placement according to the minimum overall energy consumption criterion through the Lagrange dual problem and KKT conditions; secondly, the optimal strategies for recharging locomotives and wireless sensor networks are acquired by solving a cross-layer optimization problem. The cyclic nature of these strategies is also manifested through simulations in this paper.
78 FR 64397 - Mississippi Regulatory Program
Federal Register 2010, 2011, 2012, 2013, 2014
2013-10-29
... text of the program amendment available at www.regulations.gov . A. Mississippi Surface Coal Mining... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 924...; S2D2SSS08011000SX066A00033F13XS501520] Mississippi Regulatory Program AGENCY: Office of Surface Mining Reclamation and Enforcement...
Redundancy and Novelty Mining in the Business Blogosphere
ERIC Educational Resources Information Center
Tsai, Flora S.; Chan, Kap Luk
2010-01-01
Purpose: The paper aims to explore the performance of redundancy and novelty mining in the business blogosphere, which has not been studied before. Design/methodology/approach: Novelty mining techniques are implemented to single out novel information out of a massive set of text documents. This paper adopted the mixed metric approach which…
Diversification of the Higher Mining Education Financing in Globalization Era
NASA Astrophysics Data System (ADS)
Frolova, Victoria; Dolina, Olga; Shpil'kina, Tatyana
2017-11-01
In the current conditions of global competition, the development of new mining technologies, the requirements to labor resources, their skills and creative potential are increasing. The tasks facing the mining industry cannot be solved without highly qualified personnel, especially managers, engineers and technicians, specialists who possess the knowledge and competences necessary for the development of science and technology of mining, and ensuring mining industrial safety. The authors analyze personnel problems and financing of mining higher education, conclude that there is a need to develop social partnership and diversify the sources of funding for training, advanced training and retraining of personnel for mining and processing of solid mineral deposits.
ERIC Educational Resources Information Center
Repine, Tom; Hemler, Deb; Lane, Duane
2003-01-01
Presents a problem-solving investigation on coal mining that integrates science and mathematics with geology. Engages students in a scenario in which they play the roles of geologists and mining engineers. (NB)
HPC-NMF: A High-Performance Parallel Algorithm for Nonnegative Matrix Factorization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kannan, Ramakrishnan; Sukumar, Sreenivas R.; Ballard, Grey M.
NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient distributed algorithms to solve the problem for big data sets. We propose a high-performance distributed-memory parallel algorithm that computes the factorization by iteratively solving alternating non-negative least squares (NLS) subproblems formore » $$\\WW$$ and $$\\HH$$. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). As opposed to previous implementation, our algorithm is also flexible: It performs well for both dense and sparse matrices, and allows the user to choose any one of the multiple algorithms for solving the updates to low rank factors $$\\WW$$ and $$\\HH$$ within the alternating iterations.« less
LLNL electro-optical mine detection program
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anderson, C.; Aimonetti, W.; Barth, M.
1994-09-30
Under funding from the Advanced Research Projects Agency (ARPA) and the US Marine Corps (USMC), Lawrence Livermore National Laboratory (LLNL) has directed a program aimed at improving detection capabilities against buried mines and munitions. The program has provided a national test facility for buried mines in arid environments, compiled and distributed an extensive data base of infrared (IR), ground penetrating radar (GPR), and other measurements made at that site, served as a host for other organizations wishing to make measurements, made considerable progress in the use of ground penetrating radar for mine detection, and worked on the difficult problem ofmore » sensor fusion as applied to buried mine detection. While the majority of our effort has been concentrated on the buried mine problem, LLNL has worked with the U.S.M.C. on surface mine problems as well, providing data and analysis to support the COBRA (Coastal Battlefield Reconnaissance and Analysis) program. The original aim of the experimental aspect of the program was the utilization of multiband infrared approaches for the detection of buried mines. Later the work was extended to a multisensor investigation, including sensors other than infrared imagers. After an early series of measurements, it was determined that further progress would require a larger test facility in a natural environment, so the Buried Object Test Facility (BOTF) was constructed at the Nevada Test Site. After extensive testing, with sensors spanning the electromagnetic spectrum from the near ultraviolet to radio frequencies, possible paths for improvement were: improved spatial resolution providing better ground texture discrimination; analysis which involves more complicated spatial queueing and filtering; additional IR bands using imaging spectroscopy; the use of additional sensors other than IR and the use of data fusion techniques with multi-sensor data; and utilizing time dependent observables like temperature.« less
GeoSegmenter: A statistically learned Chinese word segmenter for the geoscience domain
NASA Astrophysics Data System (ADS)
Huang, Lan; Du, Youfu; Chen, Gongyang
2015-03-01
Unlike English, the Chinese language has no space between words. Segmenting texts into words, known as the Chinese word segmentation (CWS) problem, thus becomes a fundamental issue for processing Chinese documents and the first step in many text mining applications, including information retrieval, machine translation and knowledge acquisition. However, for the geoscience subject domain, the CWS problem remains unsolved. Although a generic segmenter can be applied to process geoscience documents, they lack the domain specific knowledge and consequently their segmentation accuracy drops dramatically. This motivated us to develop a segmenter specifically for the geoscience subject domain: the GeoSegmenter. We first proposed a generic two-step framework for domain specific CWS. Following this framework, we built GeoSegmenter using conditional random fields, a principled statistical framework for sequence learning. Specifically, GeoSegmenter first identifies general terms by using a generic baseline segmenter. Then it recognises geoscience terms by learning and applying a model that can transform the initial segmentation into the goal segmentation. Empirical experimental results on geoscience documents and benchmark datasets showed that GeoSegmenter could effectively recognise both geoscience terms and general terms.
Kurth, Laura M; McCawley, Michael; Hendryx, Michael; Lusk, Stephanie
2014-07-01
People who live in Appalachian areas where coal mining is prominent have increased health problems compared with people in non-mining areas of Appalachia. Coal mines and related mining activities result in the production of atmospheric particulate matter (PM) that is associated with human health effects. There is a gap in research regarding particle size concentration and distribution to determine respiratory dose around coal mining and non-mining areas. Mass- and number-based size distributions were determined with an Aerodynamic Particle Size and Scanning Mobility Particle Sizer to calculate lung deposition around mining and non-mining areas of West Virginia. Particle number concentrations and deposited lung dose were significantly greater around mining areas compared with non-mining areas, demonstrating elevated risks to humans. The greater dose was correlated with elevated disease rates in the West Virginia mining areas. Number concentrations in the mining areas were comparable to a previously documented urban area where number concentration was associated with respiratory and cardiovascular disease.
Meng, Guilin; Meng, Xiulin; Ma, Xiaoye; Zhang, Gengping; Hu, Xiaolin; Jin, Aiping; Liu, Xueyuan
2018-01-01
Alzheimer’s disease (AD) is an increasing concern in human health. Despite significant research, highly effective drugs to treat AD are lacking. The present study describes the text mining process to identify drug candidates from a traditional Chinese medicine (TCM) database, along with associated protein target mechanisms. We carried out text mining to identify literatures that referenced both AD and TCM and focused on identifying compounds and protein targets of interest. After targeting one potential TCM candidate, corresponding protein-protein interaction (PPI) networks were assembled in STRING to decipher the most possible mechanism of action. This was followed by validation using Western blot and co-immunoprecipitation in an AD cell model. The text mining strategy using a vast amount of AD-related literature and the TCM database identified curcumin, whose major component was ferulic acid (FA). This was used as a key candidate compound for further study. Using the top calculated interaction score in STRING, BACE1 and MMP2 were implicated in the activity of FA in AD. Exposure of SHSY5Y-APP cells to FA resulted in the decrease in expression levels of BACE-1 and APP, while the expression of MMP-2 and MMP-9 increased in a dose-dependent manner. This suggests that FA induced BACE1 and MMP2 pathways maybe novel potential mechanisms involved in AD. The text mining of literature and TCM database related to AD suggested FA as a promising TCM ingredient for the treatment of AD. Potential mechanisms interconnected and integrated with Aβ aggregation inhibition and extracellular matrix remodeling underlying the activity of FA were identified using in vitro studies. PMID:29896095
Meng, Guilin; Meng, Xiulin; Ma, Xiaoye; Zhang, Gengping; Hu, Xiaolin; Jin, Aiping; Zhao, Yanxin; Liu, Xueyuan
2018-01-01
Alzheimer's disease (AD) is an increasing concern in human health. Despite significant research, highly effective drugs to treat AD are lacking. The present study describes the text mining process to identify drug candidates from a traditional Chinese medicine (TCM) database, along with associated protein target mechanisms. We carried out text mining to identify literatures that referenced both AD and TCM and focused on identifying compounds and protein targets of interest. After targeting one potential TCM candidate, corresponding protein-protein interaction (PPI) networks were assembled in STRING to decipher the most possible mechanism of action. This was followed by validation using Western blot and co-immunoprecipitation in an AD cell model. The text mining strategy using a vast amount of AD-related literature and the TCM database identified curcumin, whose major component was ferulic acid (FA). This was used as a key candidate compound for further study. Using the top calculated interaction score in STRING, BACE1 and MMP2 were implicated in the activity of FA in AD. Exposure of SHSY5Y-APP cells to FA resulted in the decrease in expression levels of BACE-1 and APP, while the expression of MMP-2 and MMP-9 increased in a dose-dependent manner. This suggests that FA induced BACE1 and MMP2 pathways maybe novel potential mechanisms involved in AD. The text mining of literature and TCM database related to AD suggested FA as a promising TCM ingredient for the treatment of AD. Potential mechanisms interconnected and integrated with Aβ aggregation inhibition and extracellular matrix remodeling underlying the activity of FA were identified using in vitro studies.
Johnson, Robin J.; Lay, Jean M.; Lennon-Hopkins, Kelley; Saraceni-Richards, Cynthia; Sciaky, Daniela; Murphy, Cynthia Grondin; Mattingly, Carolyn J.
2013-01-01
The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) is a public resource that curates interactions between environmental chemicals and gene products, and their relationships to diseases, as a means of understanding the effects of environmental chemicals on human health. CTD provides a triad of core information in the form of chemical-gene, chemical-disease, and gene-disease interactions that are manually curated from scientific articles. To increase the efficiency, productivity, and data coverage of manual curation, we have leveraged text mining to help rank and prioritize the triaged literature. Here, we describe our text-mining process that computes and assigns each article a document relevancy score (DRS), wherein a high DRS suggests that an article is more likely to be relevant for curation at CTD. We evaluated our process by first text mining a corpus of 14,904 articles triaged for seven heavy metals (cadmium, cobalt, copper, lead, manganese, mercury, and nickel). Based upon initial analysis, a representative subset corpus of 3,583 articles was then selected from the 14,094 articles and sent to five CTD biocurators for review. The resulting curation of these 3,583 articles was analyzed for a variety of parameters, including article relevancy, novel data content, interaction yield rate, mean average precision, and biological and toxicological interpretability. We show that for all measured parameters, the DRS is an effective indicator for scoring and improving the ranking of literature for the curation of chemical-gene-disease information at CTD. Here, we demonstrate how fully incorporating text mining-based DRS scoring into our curation pipeline enhances manual curation by prioritizing more relevant articles, thereby increasing data content, productivity, and efficiency. PMID:23613709
Sustainable Remediation of Legacy Mine Drainage: A Case Study of the Flight 93 National Memorial.
Emili, Lisa A; Pizarchik, Joseph; Mahan, Carolyn G
2016-03-01
Pollution from mining activities is a global environmental concern, not limited to areas of current resource extraction, but including a broader geographic area of historic (legacy) and abandoned mines. The pollution of surface waters from acid mine drainage is a persistent problem and requires a holistic and sustainable approach to addressing the spatial and temporal complexity of mining-specific problems. In this paper, we focus on the environmental, socio-economic, and legal challenges associated with the concurrent activities to remediate a coal mine site and to develop a national memorial following a catastrophic event. We provide a conceptual construct of a socio-ecological system defined at several spatial, temporal, and organizational scales and a critical synthesis of the technical and social learning processes necessary to achieving sustainable environmental remediation. Our case study is an example of a multi-disciplinary management approach, whereby collaborative interaction of stakeholders, the emergence of functional linkages for information exchange, and mediation led to scientifically informed decision making, creative management solutions, and ultimately environmental policy change.
Data Mining and Privacy of Social Network Sites' Users: Implications of the Data Mining Problem.
Al-Saggaf, Yeslam; Islam, Md Zahidul
2015-08-01
This paper explores the potential of data mining as a technique that could be used by malicious data miners to threaten the privacy of social network sites (SNS) users. It applies a data mining algorithm to a real dataset to provide empirically-based evidence of the ease with which characteristics about the SNS users can be discovered and used in a way that could invade their privacy. One major contribution of this article is the use of the decision forest data mining algorithm (SysFor) to the context of SNS, which does not only build a decision tree but rather a forest allowing the exploration of more logic rules from a dataset. One logic rule that SysFor built in this study, for example, revealed that anyone having a profile picture showing just the face or a picture showing a family is less likely to be lonely. Another contribution of this article is the discussion of the implications of the data mining problem for governments, businesses, developers and the SNS users themselves.
NASA Astrophysics Data System (ADS)
Tyuleneva, Tatiana
2017-11-01
One of the problems of sustainable development of mining companies is attracting additional investment. To solve it requires access to international capital markets, in this context, enterprises need to prepare financial statements with international requirements based on the data generated by the accounting system. The article considers the basic problems of accounting in the extractive industries due to the nature of the industry, as well as evaluation of the completeness of their solution in the framework of international financial reporting standards. In addition, lists the characteristics of accounting for mining industry, due to the peculiarities of the production process that need to be considered to solve these problems. This sector is extremely important for individual countries and on a global scale.
Design of material management system of mining group based on Hadoop
NASA Astrophysics Data System (ADS)
Xia, Zhiyuan; Tan, Zhuoying; Qi, Kuan; Li, Wen
2018-01-01
Under the background of persistent slowdown in mining market at present, improving the management level in mining group has become the key link to improve the economic benefit of the mine. According to the practical material management in mining group, three core components of Hadoop are applied: distributed file system HDFS, distributed computing framework Map/Reduce and distributed database HBase. Material management system of mining group based on Hadoop is constructed with the three core components of Hadoop and SSH framework technology. This system was found to strengthen collaboration between mining group and affiliated companies, and then the problems such as inefficient management, server pressure, hardware equipment performance deficiencies that exist in traditional mining material-management system are solved, and then mining group materials management is optimized, the cost of mining management is saved, the enterprise profit is increased.
Data Mining: A Hybrid Methodology for Complex and Dynamic Research
ERIC Educational Resources Information Center
Lang, Susan; Baehr, Craig
2012-01-01
This article provides an overview of the ways in which data and text mining have potential as research methodologies in composition studies. It introduces data mining in the context of the field of composition studies and discusses ways in which this methodology can complement and extend our existing research practices by blending the best of what…
Mine Waste at The Kherzet Youcef Mine : Environmental Characterization
NASA Astrophysics Data System (ADS)
Issaad, Mouloud; Boutaleb, Abdelhak; Kolli, Omar
2017-04-01
Mining activity in Algeria has existed since antiquity. But it was very important since the 20th century. This activity has virtually ceased since the beginning of the 1990s, leaving many mine sites abandoned (so-called orphan mines). The abandonment of mining today poses many environmental problems (soil pollution, contamination of surface water, mining collapses...). The mining wastes often occupy large volumes that can be hazardous to the environment and human health, often neglected in the past: Faulting geotechnical implementation, acid mine drainage (AMD), alkalinity, presence of pollutants and toxic substances (heavy metals, cyanide...). The study started already six years ago and it covers all mines located in NE Algeria, almost are stopped for more than thirty years. So the most important is to have an overview of all the study area. After the inventory job of the abandoned mines, the rock drainage prediction will help us to classify sites according to their acid generating potential.
Text mining and medicine: usefulness in respiratory diseases.
Piedra, David; Ferrer, Antoni; Gea, Joaquim
2014-03-01
It is increasingly common to have medical information in electronic format. This includes scientific articles as well as clinical management reviews, and even records from health institutions with patient data. However, traditional instruments, both individual and institutional, are of little use for selecting the most appropriate information in each case, either in the clinical or research field. So-called text or data «mining» enables this huge amount of information to be managed, extracting it from various sources using processing systems (filtration and curation), integrating it and permitting the generation of new knowledge. This review aims to provide an overview of text and data mining, and of the potential usefulness of this bioinformatic technique in the exercise of care in respiratory medicine and in research in the same field. Copyright © 2013 SEPAR. Published by Elsevier Espana. All rights reserved.
Agile text mining for the 2014 i2b2/UTHealth Cardiac risk factors challenge.
Cormack, James; Nath, Chinmoy; Milward, David; Raja, Kalpana; Jonnalagadda, Siddhartha R
2015-12-01
This paper describes the use of an agile text mining platform (Linguamatics' Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors in patient records as defined in the i2b2/UTHealth 2014 challenge. The approach uses a data-driven rule-based methodology with the addition of a simple supervised classifier. We demonstrate that agile text mining allows for rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluation of this approach on the test data gave an F-Score of 91.7%, one percent behind the top performing system. Copyright © 2015 Elsevier Inc. All rights reserved.
Overview of the gene ontology task at BioCreative IV.
Mao, Yuqing; Van Auken, Kimberly; Li, Donghui; Arighi, Cecilia N; McQuilton, Peter; Hayman, G Thomas; Tweedie, Susan; Schaeffer, Mary L; Laulederkind, Stanley J F; Wang, Shur-Jen; Gobeill, Julien; Ruch, Patrick; Luu, Anh Tuan; Kim, Jung-Jae; Chiang, Jung-Hsien; Chen, Yu-De; Yang, Chia-Jung; Liu, Hongfang; Zhu, Dongqing; Li, Yanpeng; Yu, Hong; Emadzadeh, Ehsan; Gonzalez, Graciela; Chen, Jian-Ming; Dai, Hong-Jie; Lu, Zhiyong
2014-01-01
Gene ontology (GO) annotation is a common task among model organism databases (MODs) for capturing gene function data from journal articles. It is a time-consuming and labor-intensive task, and is thus often considered as one of the bottlenecks in literature curation. There is a growing need for semiautomated or fully automated GO curation techniques that will help database curators to rapidly and accurately identify gene function information in full-length articles. Despite multiple attempts in the past, few studies have proven to be useful with regard to assisting real-world GO curation. The shortage of sentence-level training data and opportunities for interaction between text-mining developers and GO curators has limited the advances in algorithm development and corresponding use in practical circumstances. To this end, we organized a text-mining challenge task for literature-based GO annotation in BioCreative IV. More specifically, we developed two subtasks: (i) to automatically locate text passages that contain GO-relevant information (a text retrieval task) and (ii) to automatically identify relevant GO terms for the genes in a given article (a concept-recognition task). With the support from five MODs, we provided teams with >4000 unique text passages that served as the basis for each GO annotation in our task data. Such evidence text information has long been recognized as critical for text-mining algorithm development but was never made available because of the high cost of curation. In total, seven teams participated in the challenge task. From the team results, we conclude that the state of the art in automatically mining GO terms from literature has improved over the past decade while much progress is still needed for computer-assisted GO curation. Future work should focus on addressing remaining technical challenges for improved performance of automatic GO concept recognition and incorporating practical benefits of text-mining tools into real-world GO annotation. http://www.biocreative.org/tasks/biocreative-iv/track-4-GO/. Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.
Topaz, Maxim; Radhakrishnan, Kavita; Lei, Victor; Zhou, Li
2016-01-01
Effective self-management can decrease up to 50% of heart failure hospitalizations. Unfortunately, self-management by patients with heart failure remains poor. This pilot study aimed to explore the use of text-mining to identify heart failure patients with ineffective self-management. We first built a comprehensive self-management vocabulary based on the literature and clinical notes review. We then randomly selected 545 heart failure patients treated within Partners Healthcare hospitals (Boston, MA, USA) and conducted a regular expression search with the compiled vocabulary within 43,107 interdisciplinary clinical notes of these patients. We found that 38.2% (n = 208) patients had documentation of ineffective heart failure self-management in the domains of poor diet adherence (28.4%), missed medical encounters (26.4%) poor medication adherence (20.2%) and non-specified self-management issues (e.g., "compliance issues", 34.6%). We showed the feasibility of using text-mining to identify patients with ineffective self-management. More natural language processing algorithms are needed to help busy clinicians identify these patients.
Integration of Artificial Market Simulation and Text Mining for Market Analysis
NASA Astrophysics Data System (ADS)
Izumi, Kiyoshi; Matsui, Hiroki; Matsuo, Yutaka
We constructed an evaluation system of the self-impact in a financial market using an artificial market and text-mining technology. Economic trends were first extracted from text data circulating in the real world. Then, the trends were inputted into the market simulation. Our simulation revealed that an operation by intervention could reduce over 70% of rate fluctuation in 1995. By the simulation results, the system was able to help for its user to find the exchange policy which can stabilize the yen-dollar rate.
Kasemodel, Mariana Consiglio; Lima, Jacqueline Zanin; Sakamoto, Isabel Kimiko; Varesche, Maria Bernadete Amancio; Trofino, Julio Cesar; Rodrigues, Valéria Guimarães Silvestre
2016-12-01
Improper disposal of mining waste is still considered a global problem, and further details on the contamination by potentially toxic metals are required for a proper assessment. In this context, it is important to have a combined view of the chemical and biological changes in the mining dump area. Thus, the objective of this study was to evaluate the Pb, Zn and Cd contamination in a slag disposal area using the integration of geochemical and microbiological data. Analyses of soil organic matter (SOM), pH, Eh, pseudo-total concentration of metals, sequential extraction and microbial community by polymerase chain reaction-denaturing gradient gel electrophoresis (PCR-DGGE) were conducted. Metal availability was evaluated based on the geoaccumulation index (I geo ), ecological risk ([Formula: see text]), Risk Assessment Code (RAC) and experimental data, and different reference values were tested to assist in the interpretation of the indices. The soil pH was slightly acidic to neutral, the Eh values indicated oxidized conditions and the average SOM content varied from 12.10 to 53.60 g kg -1 . The average pseudo-total concentrations of metals were in the order of Zn > Pb > Cd. Pb and Zn were mainly bound to the residual fraction and Fe-Mn oxides, and a significant proportion of Cd was bound to the exchangeable and carbonate fractions. The topsoil (0-20 cm) is highly contaminated (I geo ) with Cd and has a very high potential ecological risk ([Formula: see text]). Higher bacterial diversity was mainly associated with higher metal concentrations. It is concluded that the integration of geochemical and microbiological data can provide an appropriate evaluation of mining waste-contaminated areas.
BioC implementations in Go, Perl, Python and Ruby
Liu, Wanli; Islamaj Doğan, Rezarta; Kwon, Dongseop; Marques, Hernani; Rinaldi, Fabio; Wilbur, W. John; Comeau, Donald C.
2014-01-01
As part of a communitywide effort for evaluating text mining and information extraction systems applied to the biomedical domain, BioC is focused on the goal of interoperability, currently a major barrier to wide-scale adoption of text mining tools. BioC is a simple XML format, specified by DTD, for exchanging data for biomedical natural language processing. With initial implementations in C++ and Java, BioC provides libraries of code for reading and writing BioC text documents and annotations. We extend BioC to Perl, Python, Go and Ruby. We used SWIG to extend the C++ implementation for Perl and one Python implementation. A second Python implementation and the Ruby implementation use native data structures and libraries. BioC is also implemented in the Google language Go. BioC modules are functional in all of these languages, which can facilitate text mining tasks. BioC implementations are freely available through the BioC site: http://bioc.sourceforge.net. Database URL: http://bioc.sourceforge.net/ PMID:24961236
Thematic clustering of text documents using an EM-based approach
2012-01-01
Clustering textual contents is an important step in mining useful information on the web or other text-based resources. The common task in text clustering is to handle text in a multi-dimensional space, and to partition documents into groups, where each group contains documents that are similar to each other. However, this strategy lacks a comprehensive view for humans in general since it cannot explain the main subject of each cluster. Utilizing semantic information can solve this problem, but it needs a well-defined ontology or pre-labeled gold standard set. In this paper, we present a thematic clustering algorithm for text documents. Given text, subject terms are extracted and used for clustering documents in a probabilistic framework. An EM approach is used to ensure documents are assigned to correct subjects, hence it converges to a locally optimal solution. The proposed method is distinctive because its results are sufficiently explanatory for human understanding as well as efficient for clustering performance. The experimental results show that the proposed method provides a competitive performance compared to other state-of-the-art approaches. We also show that the extracted themes from the MEDLINE® dataset represent the subjects of clusters reasonably well. PMID:23046528
PathText: a text mining integrator for biological pathway visualizations
Kemper, Brian; Matsuzaki, Takuya; Matsuoka, Yukiko; Tsuruoka, Yoshimasa; Kitano, Hiroaki; Ananiadou, Sophia; Tsujii, Jun'ichi
2010-01-01
Motivation: Metabolic and signaling pathways are an increasingly important part of organizing knowledge in systems biology. They serve to integrate collective interpretations of facts scattered throughout literature. Biologists construct a pathway by reading a large number of articles and interpreting them as a consistent network, but most of the models constructed currently lack direct links to those articles. Biologists who want to check the original articles have to spend substantial amounts of time to collect relevant articles and identify the sections relevant to the pathway. Furthermore, with the scientific literature expanding by several thousand papers per week, keeping a model relevant requires a continuous curation effort. In this article, we present a system designed to integrate a pathway visualizer, text mining systems and annotation tools into a seamless environment. This will enable biologists to freely move between parts of a pathway and relevant sections of articles, as well as identify relevant papers from large text bases. The system, PathText, is developed by Systems Biology Institute, Okinawa Institute of Science and Technology, National Centre for Text Mining (University of Manchester) and the University of Tokyo, and is being used by groups of biologists from these locations. Contact: brian@monrovian.com. PMID:20529930
miRTex: A Text Mining System for miRNA-Gene Relation Extraction
Li, Gang; Ross, Karen E.; Arighi, Cecilia N.; Peng, Yifan; Wu, Cathy H.; Vijay-Shanker, K.
2015-01-01
MicroRNAs (miRNAs) regulate a wide range of cellular and developmental processes through gene expression suppression or mRNA degradation. Experimentally validated miRNA gene targets are often reported in the literature. In this paper, we describe miRTex, a text mining system that extracts miRNA-target relations, as well as miRNA-gene and gene-miRNA regulation relations. The system achieves good precision and recall when evaluated on a literature corpus of 150 abstracts with F-scores close to 0.90 on the three different types of relations. We conducted full-scale text mining using miRTex to process all the Medline abstracts and all the full-length articles in the PubMed Central Open Access Subset. The results for all the Medline abstracts are stored in a database for interactive query and file download via the website at http://proteininformationresource.org/mirtex. Using miRTex, we identified genes potentially regulated by miRNAs in Triple Negative Breast Cancer, as well as miRNA-gene relations that, in conjunction with kinase-substrate relations, regulate the response to abiotic stress in Arabidopsis thaliana. These two use cases demonstrate the usefulness of miRTex text mining in the analysis of miRNA-regulated biological processes. PMID:26407127
Cañada, Andres; Rabal, Obdulia; Oyarzabal, Julen; Valencia, Alfonso
2017-01-01
Abstract A considerable effort has been devoted to retrieve systematically information for genes and proteins as well as relationships between them. Despite the importance of chemical compounds and drugs as a central bio-entity in pharmacological and biological research, only a limited number of freely available chemical text-mining/search engine technologies are currently accessible. Here we present LimTox (Literature Mining for Toxicology), a web-based online biomedical search tool with special focus on adverse hepatobiliary reactions. It integrates a range of text mining, named entity recognition and information extraction components. LimTox relies on machine-learning, rule-based, pattern-based and term lookup strategies. This system processes scientific abstracts, a set of full text articles and medical agency assessment reports. Although the main focus of LimTox is on adverse liver events, it enables also basic searches for other organ level toxicity associations (nephrotoxicity, cardiotoxicity, thyrotoxicity and phospholipidosis). This tool supports specialized search queries for: chemical compounds/drugs, genes (with additional emphasis on key enzymes in drug metabolism, namely P450 cytochromes—CYPs) and biochemical liver markers. The LimTox website is free and open to all users and there is no login requirement. LimTox can be accessed at: http://limtox.bioinfo.cnio.es PMID:28531339
Text mining a self-report back-translation.
Blanch, Angel; Aluja, Anton
2016-06-01
There are several recommendations about the routine to undertake when back translating self-report instruments in cross-cultural research. However, text mining methods have been generally ignored within this field. This work describes a text mining innovative application useful to adapt a personality questionnaire to 12 different languages. The method is divided in 3 different stages, a descriptive analysis of the available back-translated instrument versions, a dissimilarity assessment between the source language instrument and the 12 back-translations, and an item assessment of item meaning equivalence. The suggested method contributes to improve the back-translation process of self-report instruments for cross-cultural research in 2 significant intertwined ways. First, it defines a systematic approach to the back translation issue, allowing for a more orderly and informed evaluation concerning the equivalence of different versions of the same instrument in different languages. Second, it provides more accurate instrument back-translations, which has direct implications for the reliability and validity of the instrument's test scores when used in different cultures/languages. In addition, this procedure can be extended to the back-translation of self-reports measuring psychological constructs in clinical assessment. Future research works could refine the suggested methodology and use additional available text mining tools. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Elayavilli, Ravikumar Komandur; Liu, Hongfang
2016-01-01
Computational modeling of biological cascades is of great interest to quantitative biologists. Biomedical text has been a rich source for quantitative information. Gathering quantitative parameters and values from biomedical text is one significant challenge in the early steps of computational modeling as it involves huge manual effort. While automatically extracting such quantitative information from bio-medical text may offer some relief, lack of ontological representation for a subdomain serves as impedance in normalizing textual extractions to a standard representation. This may render textual extractions less meaningful to the domain experts. In this work, we propose a rule-based approach to automatically extract relations involving quantitative data from biomedical text describing ion channel electrophysiology. We further translated the quantitative assertions extracted through text mining to a formal representation that may help in constructing ontology for ion channel events using a rule based approach. We have developed Ion Channel ElectroPhysiology Ontology (ICEPO) by integrating the information represented in closely related ontologies such as, Cell Physiology Ontology (CPO), and Cardiac Electro Physiology Ontology (CPEO) and the knowledge provided by domain experts. The rule-based system achieved an overall F-measure of 68.93% in extracting the quantitative data assertions system on an independently annotated blind data set. We further made an initial attempt in formalizing the quantitative data assertions extracted from the biomedical text into a formal representation that offers potential to facilitate the integration of text mining into ontological workflow, a novel aspect of this study. This work is a case study where we created a platform that provides formal interaction between ontology development and text mining. We have achieved partial success in extracting quantitative assertions from the biomedical text and formalizing them in ontological framework. The ICEPO ontology is available for download at http://openbionlp.org/mutd/supplementarydata/ICEPO/ICEPO.owl.
NASA Astrophysics Data System (ADS)
Krawczyk, Artur
2018-01-01
In this article, topics regarding the technical and legal aspects of creating digital underground mining maps are described. Currently used technologies and solutions for creating, storing and making digital maps accessible are described in the context of the Polish mining industry. Also, some problems with the use of these technologies are identified and described. One of the identified problems is the need to expand the range of mining map data provided by survey departments to other mining departments, such as ventilation maintenance or geological maintenance. Three solutions are proposed and analyzed, and one is chosen for further analysis. The analysis concerns data storage and making survey data accessible not only from paper documentation, but also directly from computer systems. Based on enrichment data, new processing procedures are proposed for a new way of presenting information that allows the preparation of new cartographic representations (symbols) of data with regard to users' needs.
Hilson, Gavin; Vieira, Rickford
2007-12-01
This paper examines the barriers to mitigating mercury pollution at small-scale gold mines in the Guianas (Guyana, French Guiana and Suriname), and prescribes recommendations for overcoming these obstacles. Whilst considerable attention has been paid to analysing the environmental impacts of operations in the region, minimal research has been undertaken to identify appropriate policy and educational initiatives for addressing the mounting mercury problem. Findings from recent fieldwork and selected interviews with operators from Guyanese and Surinamese gold mining regions reveal that legislative incapacity, the region's varied industry policy stances, various technological problems, and low environmental awareness on the part of communities are impeding efforts to facilitate improved mercury management at small-scale gold mines in the Guianas. Marked improvements can be achieved, however, if legislation, particularly that pertaining to mercury, is harmonised in the region; educational seminars continue to be held in important mining districts; and additional outlets for disseminating environmental equipment and mercury-free technologies are provided.
Selective Guide to Literature on Mining Engineering. Engineering Literature Guides, Number 6.
ERIC Educational Resources Information Center
Erdmann, Charlotte A., Comp.
The multidisciplinary field of mining engineering offers many challenges. Often, many sources must be used to solve a problem. This document is a survey of information sources in mining engineering and is intended to identify those core resources which can help engineers and librarians to find information about the discipline. Sections include:…
Understanding Teacher Users of a Digital Library Service: A Clustering Approach
ERIC Educational Resources Information Center
Xu, Beijie; Recker, Mimi
2011-01-01
This article describes the Knowledge Discovery and Data Mining (KDD) process and its application in the field of educational data mining (EDM) in the context of a digital library service called the Instructional Architect (IA.usu.edu). In particular, the study reported in this article investigated a certain type of data mining problem, clustering,…
Roehl, Edwin A.; Conrads, Paul
2010-01-01
This is the second of two papers that describe how data mining can aid natural-resource managers with the difficult problem of controlling the interactions between hydrologic and man-made systems. Data mining is a new science that assists scientists in converting large databases into knowledge, and is uniquely able to leverage the large amounts of real-time, multivariate data now being collected for hydrologic systems. Part 1 gives a high-level overview of data mining, and describes several applications that have addressed major water resource issues in South Carolina. This Part 2 paper describes how various data mining methods are integrated to produce predictive models for controlling surface- and groundwater hydraulics and quality. The methods include: - signal processing to remove noise and decompose complex signals into simpler components; - time series clustering that optimally groups hundreds of signals into "classes" that behave similarly for data reduction and (or) divide-and-conquer problem solving; - classification which optimally matches new data to behavioral classes; - artificial neural networks which optimally fit multivariate data to create predictive models; - model response surface visualization that greatly aids in understanding data and physical processes; and, - decision support systems that integrate data, models, and graphics into a single package that is easy to use.
Acid mine drainage and subsidence: effects of increased coal utilization.
Hill, R D; Bates, E R
1979-01-01
The increases above 1975 levels for acid mine drainage and subsidence for the years 1985 and 2000 based on projections of current mining trends and the National Energy Plan are presented. No increases are projected for acid mine drainage from surface mines or waste since enforcement under present laws should control this problem. The increase in acid mine drainage from underground mines is projected to be 16 percent by 1985 and 10 percent by 2000. The smaller increase in 2000 over 1985 reflects the impact of the PL 95-87 abandoned mine program. Mine subsidence is projected to increase by 34 and 115 percent respectively for 1985 and 2000. This estimate assumes that subsidence will parallel the rate of underground coal production and that no new subsidence control measures are adopted to mitigate subsidence occurrence. PMID:540617
2007-04-01
1 Chapter 1 – Introduction 1 - 1 1.1 Background and Problem Definition 1 - 1 1.1.1...Background 1 - 1 1.1.2 Problem Definition 1 -2 1.2 The Objective and Approach of the HFM-090/TG-25 1 -2 1.2.1 Objective 1 -2 1.2.2 Approach 1 -2 1.3...Organization of this Report 1 -3 1.4 References 1 -3 Chapter 2 – The Mine Detonation Process and Occupant Loading 2- 1 2.1 Introduction to Mines 2- 1 2.2
Flooded Underground Coal Mines: A Significant Source of Inexpensive Geothermal Energy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Watzlaf, G.R.; Ackman, T.E.
2007-04-01
Many mining regions in the United States contain extensive areas of flooded underground mines. The water within these mines represents a significant and widespread opportunity for extracting low-grade, geothermal energy. Based on current energy prices, geothermal heat pump systems using mine water could reduce the annual costs for heating to over 70 percent compared to conventional heating methods (natural gas or heating oil). These same systems could reduce annual cooling costs by up to 50 percent over standard air conditioning in many areas of the country. (Formatted full-text version is released by permission of publisher)
Texas lignite mining: Groundwater and slope stability control in the nineties and beyond
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lawrence J.
As lignite mining in Texas approaches and exceeds depths of 200 feet below ground level, rising costs demand that innovative mining approaches be used in order to maintain the economic viability of lignite mining. Groundwater and slope stability problems multiply at these depths, resulting in increasing focus on how to control these costs. Dewatering costs are consistently rising for the lignite industry, as deeper mining encounters more and larger saturated sand bodies. These sands require dewatering in order to improve slope stability. Planning and analysis become more important as the number of wells grows beyond what can be managed withmore » a simple {open_quotes}cookie-cutter{close_quotes} approach. Slope stability plays an increasing role in mining concerns as deeper lignite is recovered. Slope stability causes several problems, including loss of lignite, increased rehandle, and hazards to personnel and equipment. Traditional lignite mine planning involved a fairly {open_quotes}generic{close_quotes} pit design with one design highwall angle, one design spoil angle, and little geotechnical evaluation of the deposit. This {open_quotes}one mine-one design{close_quotes} approach, while cost-effective in the past, is now being replaced by a more critical analysis of the design requirements of each area. Geotechnical evaluation plays an increasing role in the planning and operational aspects of lignite mining. Laboratory core sample test results can be used for slope stability modeling, in order to obtain more accurate design and operational information.« less
NASA Astrophysics Data System (ADS)
Jia, Duo; Wang, Cangjiao; Lei, Shaogang
2018-01-01
Mapping vegetation dynamic types in mining areas is significant for revealing the mechanisms of environmental damage and for guiding ecological construction. Dynamic types of vegetation can be identified by applying interannual normalized difference vegetation index (NDVI) time series. However, phase differences and time shifts in interannual time series decrease mapping accuracy in mining regions. To overcome these problems and to increase the accuracy of mapping vegetation dynamics, an interannual Landsat time series for optimum vegetation growing status was constructed first by using the enhanced spatial and temporal adaptive reflectance fusion model algorithm. We then proposed a Markov random field optimized semisupervised Gaussian dynamic time warping kernel-based fuzzy c-means (FCM) cluster algorithm for interannual NDVI time series to map dynamic vegetation types in mining regions. The proposed algorithm has been tested in the Shengli mining region and Shendong mining region, which are typical representatives of China's open-pit and underground mining regions, respectively. Experiments show that the proposed algorithm can solve the problems of phase differences and time shifts to achieve better performance when mapping vegetation dynamic types. The overall accuracies for the Shengli and Shendong mining regions were 93.32% and 89.60%, respectively, with improvements of 7.32% and 25.84% when compared with the original semisupervised FCM algorithm.
Towards Cooperative Predictive Data Mining in Competitive Environments
NASA Astrophysics Data System (ADS)
Lisý, Viliam; Jakob, Michal; Benda, Petr; Urban, Štěpán; Pěchouček, Michal
We study the problem of predictive data mining in a competitive multi-agent setting, in which each agent is assumed to have some partial knowledge required for correctly classifying a set of unlabelled examples. The agents are self-interested and therefore need to reason about the trade-offs between increasing their classification accuracy by collaborating with other agents and disclosing their private classification knowledge to other agents through such collaboration. We analyze the problem and propose a set of components which can enable cooperation in this otherwise competitive task. These components include measures for quantifying private knowledge disclosure, data-mining models suitable for multi-agent predictive data mining, and a set of strategies by which agents can improve their classification accuracy through collaboration. The overall framework and its individual components are validated on a synthetic experimental domain.
[The "Mining Rescue System and Mine Fires" Working Group. Tasks, results, future activities].
Coenders, A
1983-01-01
The president of the working party presents details of its principal tasks in the past and in the present time. These can be summed up in a study of the problems mentioned below and the subsequent elaboration of recommendations for the benefit of the governments, guidelines, information reports and research proposals. The principal problems that were or are still under study are: --prevention of fires: shaft equipment, hydraulic fluids, belt conveyors, . . .; --detection of mine fires and spontaneous combustion; --fighting of mine fires: shaft fires, construction of stoppings, openings and recovering of fire zones, . . .; --coordination and rescue equipment: escape and rescue breathing apparatus, flameproof clothing, rescue of trapped miners; --stabilization of ventilation in the event of fire, . . . The speaker stresses the importance of the information exchange and the atmosphere of fellowship and solidarity that prevails in the working party.
Big data mining analysis method based on cloud computing
NASA Astrophysics Data System (ADS)
Cai, Qing Qiu; Cui, Hong Gang; Tang, Hao
2017-08-01
Information explosion era, large data super-large, discrete and non-(semi) structured features have gone far beyond the traditional data management can carry the scope of the way. With the arrival of the cloud computing era, cloud computing provides a new technical way to analyze the massive data mining, which can effectively solve the problem that the traditional data mining method cannot adapt to massive data mining. This paper introduces the meaning and characteristics of cloud computing, analyzes the advantages of using cloud computing technology to realize data mining, designs the mining algorithm of association rules based on MapReduce parallel processing architecture, and carries out the experimental verification. The algorithm of parallel association rule mining based on cloud computing platform can greatly improve the execution speed of data mining.
Raja, Kalpana; Patrick, Matthew; Gao, Yilin; Madu, Desmond; Yang, Yuyang
2017-01-01
In the past decade, the volume of “omics” data generated by the different high-throughput technologies has expanded exponentially. The managing, storing, and analyzing of this big data have been a great challenge for the researchers, especially when moving towards the goal of generating testable data-driven hypotheses, which has been the promise of the high-throughput experimental techniques. Different bioinformatics approaches have been developed to streamline the downstream analyzes by providing independent information to interpret and provide biological inference. Text mining (also known as literature mining) is one of the commonly used approaches for automated generation of biological knowledge from the huge number of published articles. In this review paper, we discuss the recent advancement in approaches that integrate results from omics data and information generated from text mining approaches to uncover novel biomedical information. PMID:28331849
Systematic drug repositioning through mining adverse event data in ClinicalTrials.gov.
Su, Eric Wen; Sanger, Todd M
2017-01-01
Drug repositioning (i.e., drug repurposing) is the process of discovering new uses for marketed drugs. Historically, such discoveries were serendipitous. However, the rapid growth in electronic clinical data and text mining tools makes it feasible to systematically identify drugs with the potential to be repurposed. Described here is a novel method of drug repositioning by mining ClinicalTrials.gov. The text mining tools I2E (Linguamatics) and PolyAnalyst (Megaputer) were utilized. An I2E query extracts "Serious Adverse Events" (SAE) data from randomized trials in ClinicalTrials.gov. Through a statistical algorithm, a PolyAnalyst workflow ranks the drugs where the treatment arm has fewer predefined SAEs than the control arm, indicating that potentially the drug is reducing the level of SAE. Hypotheses could then be generated for the new use of these drugs based on the predefined SAE that is indicative of disease (for example, cancer).
Matsuda, Yoshio; Manaka, Tomoko; Kobayashi, Makiko; Sato, Shuhei; Ohwada, Michitaka
2016-06-01
The aim of the present study was to examine the possibility of screening apprehensive pregnant women and mothers at risk for post-partum depression from an analysis of the textual data in the Mother and Child Handbook by using the text-mining method. Uncomplicated pregnant women (n = 58) were divided into two groups according to State-Trait Anxiety Inventory grade (high trait [group I, n = 21] and low trait [group II, n = 37]) or Edinburgh Postnatal Depression Scale score (high score [group III, n = 15] and low score [group IV, n = 43]). An exploratory analysis of the textual data from the Maternal and Child Handbook was conducted using the text-mining method with the Word Miner software program. A comparison of the 'structure elements' was made between the two groups. The number of structure elements extracted by separated words from text data was 20 004 and the number of structure elements with a threshold of 2 or more as an initial value was 1168. Fifteen key words related to maternal anxiety, and six key words related to post-partum depression were extracted. The text-mining method is useful for the exploratory analysis of textual data obtained from pregnant woman, and this screening method has been suggested to be useful for apprehensive pregnant women and mothers at risk for post-partum depression. © 2016 Japan Society of Obstetrics and Gynecology.
An Integrated Suite of Text and Data Mining Tools - Phase II
2005-08-30
Riverside, CA, USA Mazda Motor Corp, Jpn Univ of Darmstadt, Darmstadt, Ger Navy Center for Applied Research in Artificial Intelligence Univ of...with Georgia Tech Research Corporation developed a desktop text-mining software tool named TechOASIS (known commercially as VantagePoint). By the...of this dataset and groups the Corporate Source items that co-occur with the found items. He decides he is only interested in the institutions
He, Qiwei; Veldkamp, Bernard P; Glas, Cees A W; de Vries, Theo
2017-03-01
Patients' narratives about traumatic experiences and symptoms are useful in clinical screening and diagnostic procedures. In this study, we presented an automated assessment system to screen patients for posttraumatic stress disorder via a natural language processing and text-mining approach. Four machine-learning algorithms-including decision tree, naive Bayes, support vector machine, and an alternative classification approach called the product score model-were used in combination with n-gram representation models to identify patterns between verbal features in self-narratives and psychiatric diagnoses. With our sample, the product score model with unigrams attained the highest prediction accuracy when compared with practitioners' diagnoses. The addition of multigrams contributed most to balancing the metrics of sensitivity and specificity. This article also demonstrates that text mining is a promising approach for analyzing patients' self-expression behavior, thus helping clinicians identify potential patients from an early stage.
Using ontology network structure in text mining.
Berndt, Donald J; McCart, James A; Luther, Stephen L
2010-11-13
Statistical text mining treats documents as bags of words, with a focus on term frequencies within documents and across document collections. Unlike natural language processing (NLP) techniques that rely on an engineered vocabulary or a full-featured ontology, statistical approaches do not make use of domain-specific knowledge. The freedom from biases can be an advantage, but at the cost of ignoring potentially valuable knowledge. The approach proposed here investigates a hybrid strategy based on computing graph measures of term importance over an entire ontology and injecting the measures into the statistical text mining process. As a starting point, we adapt existing search engine algorithms such as PageRank and HITS to determine term importance within an ontology graph. The graph-theoretic approach is evaluated using a smoking data set from the i2b2 National Center for Biomedical Computing, cast as a simple binary classification task for categorizing smoking-related documents, demonstrating consistent improvements in accuracy.
NASA Technical Reports Server (NTRS)
Hughes, T. H.; Dillion, A. C., III; White, J. R., Jr.; Drummond, S. E., Jr.; Hooks, W. G.
1975-01-01
Because of the volume of coal produced by strip mining, the proximity of mining operations, and the diversity of mining methods (e.g. contour stripping, area stripping, multiple seam stripping, and augering, as well as underground mining), the Warrior Coal Basin seemed best suited for initial studies on the physical impact of strip mining in Alabama. Two test sites, (Cordova and Searles) representative of the various strip mining techniques and environmental problems, were chosen for intensive studies of the correlation between remote sensing and ground truth data. Efforts were eventually concentrated in the Searles Area, since it is more accessible and offers a better opportunity for study of erosional and depositional processes than the Cordova Area.
NASA Astrophysics Data System (ADS)
Moyle, Steve
Collaborative Data Mining is a setting where the Data Mining effort is distributed to multiple collaborating agents - human or software. The objective of the collaborative Data Mining effort is to produce solutions to the tackled Data Mining problem which are considered better by some metric, with respect to those solutions that would have been achieved by individual, non-collaborating agents. The solutions require evaluation, comparison, and approaches for combination. Collaboration requires communication, and implies some form of community. The human form of collaboration is a social task. Organizing communities in an effective manner is non-trivial and often requires well defined roles and processes. Data Mining, too, benefits from a standard process. This chapter explores the standard Data Mining process CRISP-DM utilized in a collaborative setting.
30 CFR 904.25 - Approval of Arkansas abandoned mine land reclamation plan amendments.
Code of Federal Regulations, 2010 CFR
2010-07-01
...; Management accounting; and Abandoned mine land problem description. September 22, 1999 January 14, 2000... management and disposition of land and water; Reclamation on private land; Rights of entry; Public...
Heavy metal pollution in soils of abandoned mining areas (SE, Spain)
NASA Astrophysics Data System (ADS)
Martínez-Sánchez, M. J.; Pérez-Sirvent, C.; Molina, J.; Tudela, M. L.; Navarro, M. C.; García-Lorenzo, M. L.
2009-04-01
Elevated levels of heavy metals can be found in and around disused metalliferous mines due to discharge and dispersion of mine wastes into nearby agricultural soils, food crops and stream systems. Heavy metals contained in the residues from mining and metallurgical operations are often dispersed by wind and/or water after their disposal. These areas have severe erosion problems caused by wind and water runoff in which soil and mine spoil texture, landscape topography and regional and microclimate play an important role. The present study was carried out in the Cabezo Rajao (La Uni
The structure and infrastructure of the global nanotechnology literature
NASA Astrophysics Data System (ADS)
Kostoff, Ronald N.; Stump, Jesse A.; Johnson, Dustin; Murday, James S.; Lau, Clifford G. Y.; Tolles, William M.
2006-08-01
Text mining is the extraction of useful information from large volumes of text. A text mining analysis of the global open nanotechnology literature was performed. Records from the Science Citation Index (SCI)/Social SCI were analyzed to provide the infrastructure of the global nanotechnology literature (prolific authors/journals/institutions/countries, most cited authors/papers/journals) and the thematic structure (taxonomy) of the global nanotechnology literature, from a science perspective. Records from the Engineering Compendex (EC) were analyzed to provide a taxonomy from a technology perspective. The Far Eastern countries have expanded nanotechnology publication output dramatically in the past decade.
PubMed-EX: a web browser extension to enhance PubMed search with text mining features.
Tsai, Richard Tzong-Han; Dai, Hong-Jie; Lai, Po-Ting; Huang, Chi-Hsin
2009-11-15
PubMed-EX is a browser extension that marks up PubMed search results with additional text-mining information. PubMed-EX's page mark-up, which includes section categorization and gene/disease and relation mark-up, can help researchers to quickly focus on key terms and provide additional information on them. All text processing is performed server-side, freeing up user resources. PubMed-EX is freely available at http://bws.iis.sinica.edu.tw/PubMed-EX and http://iisr.cse.yzu.edu.tw:8000/PubMed-EX/.
Innovation Engine for Blog Spaces
2011-09-01
183 7.2.2 Architecture for mining Wikipedia as a sense-annotated corpus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183...are mined from a corpus by dictionary learning, and the representation is com- puted by sparse coding (Sec. 5.5). The topics can be embedded into a...intend to deter- mine the exact sense of a word whose surface form is unknown. This generalizes the original word sense disambiguation problem since we
Model architecture of intelligent data mining oriented urban transportation information
NASA Astrophysics Data System (ADS)
Yang, Bogang; Tao, Yingchun; Sui, Jianbo; Zhang, Feizhou
2007-06-01
Aiming at solving practical problems in urban traffic, the paper presents model architecture of intelligent data mining from hierarchical view. With artificial intelligent technologies used in the framework, the intelligent data mining technology improves, which is more suitable for the change of real-time road condition. It also provides efficient technology support for the urban transport information distribution, transmission and display.
Biology in focus: better lives through better science: new hope for acid streams
Watten, Barnaby
1998-01-01
Across the nation, a toxic pollutant turns clean streams orange, kills fish and plant life, and smells like rotten eggs. The culprit is acid mine drainage, the poisonous water leaking from more than 500,000 abandoned and inactive mines in 32 states. The toxic discharge is a problem for operational mines as well. In the Appalachian coal region, for example, acid mine drainage has degraded more than 8,000 miles of streams and has left some aquatic habitats virtually lifeless.
Hazards and occupational risk in hard coal mines - a critical analysis of legal requirements
NASA Astrophysics Data System (ADS)
Krause, Marcin
2017-11-01
This publication concerns the problems of occupational safety and health in hard coal mines, the basic elements of which are the mining hazards and the occupational risk. The work includes a comparative analysis of selected provisions of general and industry-specific law regarding the analysis of hazards and occupational risk assessment. Based on a critical analysis of legal requirements, basic assumptions regarding the practical guidelines for occupational risk assessment in underground coal mines have been proposed.
Gimli: open source and high-performance biomedical name recognition
2013-01-01
Background Automatic recognition of biomedical names is an essential task in biomedical information extraction, presenting several complex and unsolved challenges. In recent years, various solutions have been implemented to tackle this problem. However, limitations regarding system characteristics, customization and usability still hinder their wider application outside text mining research. Results We present Gimli, an open-source, state-of-the-art tool for automatic recognition of biomedical names. Gimli includes an extended set of implemented and user-selectable features, such as orthographic, morphological, linguistic-based, conjunctions and dictionary-based. A simple and fast method to combine different trained models is also provided. Gimli achieves an F-measure of 87.17% on GENETAG and 72.23% on JNLPBA corpus, significantly outperforming existing open-source solutions. Conclusions Gimli is an off-the-shelf, ready to use tool for named-entity recognition, providing trained and optimized models for recognition of biomedical entities from scientific text. It can be used as a command line tool, offering full functionality, including training of new models and customization of the feature set and model parameters through a configuration file. Advanced users can integrate Gimli in their text mining workflows through the provided library, and extend or adapt its functionalities. Based on the underlying system characteristics and functionality, both for final users and developers, and on the reported performance results, we believe that Gimli is a state-of-the-art solution for biomedical NER, contributing to faster and better research in the field. Gimli is freely available at http://bioinformatics.ua.pt/gimli. PMID:23413997
Magnetic sensor technology for detecting mines, UXO, and other concealed security threats
NASA Astrophysics Data System (ADS)
Czipott, Peter V.; Iwanowski, Mark D.
1997-01-01
Magnetic sensors have been the sensor of choice in the detection and classification of buried mines and unexploded ordnance (UXO), both on land and underwater, Quantum Magnetics (QM), together with its research partner IBM, have developed a variety of advanced, very high sensitivity superconducting and room temperature magnetic sensors to meet military needs. This work has led to the development and utilization of a three-sensor gradiometer (TSG) patented by IBM, which cannot only detect, but also localize mines and ordnance. QM is also working with IBM and the U.S. Navy to develop an advanced superconducting gradiometer for buried underwater mine detection. The ability to both detect and classify buried non-metallic mines is virtually impossible with existing magnetic sensors. To solve this problem, Quantum Magnetics, building on work of the Naval Research Laboratory (NRL), is pioneering work in the development of quadrupole resonance (QR) methods which can be used to detect the explosive material directly. Based on recent laboratory work done at QM and previous work done in the U.S., Russia and the United Kingdom, we are confident that QR can be effectively applied to the non-metallic mine identification problem.
Research of mine water source identification based on LIF technology
NASA Astrophysics Data System (ADS)
Zhou, Mengran; Yan, Pengcheng
2016-09-01
According to the problem that traditional chemical methods to the mine water source identification takes a long time, put forward a method for rapid source identification system of mine water inrush based on the technology of laser induced fluorescence (LIF). Emphatically analyzes the basic principle of LIF technology. The hardware composition of LIF system are analyzed and the related modules were selected. Through the fluorescence experiment with the water samples of coal mine in the LIF system, fluorescence spectra of water samples are got. Traditional water source identification mainly according to the ion concentration representative of the water, but it is hard to analysis the ion concentration of the water from the fluorescence spectra. This paper proposes a simple and practical method of rapid identification of water by fluorescence spectrum, which measure the space distance between unknown water samples and standard samples, and then based on the clustering analysis, the category of the unknown water sample can be get. Water source identification for unknown samples verified the reliability of the LIF system, and solve the problem that the current coal mine can't have a better real-time and online monitoring on water inrush, which is of great significance for coal mine safety in production.
Mining algorithm for association rules in big data based on Hadoop
NASA Astrophysics Data System (ADS)
Fu, Chunhua; Wang, Xiaojing; Zhang, Lijun; Qiao, Liying
2018-04-01
In order to solve the problem that the traditional association rules mining algorithm has been unable to meet the mining needs of large amount of data in the aspect of efficiency and scalability, take FP-Growth as an example, the algorithm is realized in the parallelization based on Hadoop framework and Map Reduce model. On the basis, it is improved using the transaction reduce method for further enhancement of the algorithm's mining efficiency. The experiment, which consists of verification of parallel mining results, comparison on efficiency between serials and parallel, variable relationship between mining time and node number and between mining time and data amount, is carried out in the mining results and efficiency by Hadoop clustering. Experiments show that the paralleled FP-Growth algorithm implemented is able to accurately mine frequent item sets, with a better performance and scalability. It can be better to meet the requirements of big data mining and efficiently mine frequent item sets and association rules from large dataset.
Learning in the context of distribution drift
2017-05-09
published in the leading data mining journal, Data Mining and Knowledge Discovery (Webb et. al., 2016)1. We have shown that the previous qualitative...learner Low-bias learner Aggregated classifier Figure 7: Architecture for learning fr m streaming data in th co text of variable or unknown...Learning limited dependence Bayesian classifiers, in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD
Enhancements for a Dynamic Data Warehousing and Mining System for Large-scale HSCB Data
2016-07-20
Intelligent Automation Incorporated Enhancements for a Dynamic Data Warehousing and Mining ...Page | 2 Intelligent Automation Incorporated Monthly Report No. 4 Enhancements for a Dynamic Data Warehousing and Mining System Large-Scale HSCB...including Top Videos, Top Users, Top Words, and Top Languages, and also applied NER to the text associated with YouTube posts. We have also developed UI for
Enhancements for a Dynamic Data Warehousing and Mining System for Large-Scale HSCB Data
2016-07-20
Intelligent Automation Incorporated Enhancements for a Dynamic Data Warehousing and Mining ...Page | 2 Intelligent Automation Incorporated Monthly Report No. 4 Enhancements for a Dynamic Data Warehousing and Mining System Large-Scale HSCB...including Top Videos, Top Users, Top Words, and Top Languages, and also applied NER to the text associated with YouTube posts. We have also developed UI for
Knowledge modeling of coal mining equipments based on ontology
NASA Astrophysics Data System (ADS)
Zhang, Baolong; Wang, Xiangqian; Li, Huizong; Jiang, Miaomiao
2017-06-01
The problems of information redundancy and sharing are universe in coal mining equipment management. In order to improve the using efficiency of knowledge of coal mining equipments, this paper proposed a new method of knowledge modeling based on ontology. On the basis of analyzing the structures and internal relations of coal mining equipment knowledge, taking OWL as ontology construct language, the ontology model of coal mining equipment knowledge is built with the help of Protégé 4.3 software tools. The knowledge description method will lay the foundation for the high effective knowledge management and sharing, which is very significant for improving the production management level of coal mining enterprises.
Wright, A; McCoy, A; Henkin, S; Flaherty, M; Sittig, D
2013-01-01
In a prior study, we developed methods for automatically identifying associations between medications and problems using association rule mining on a large clinical data warehouse and validated these methods at a single site which used a self-developed electronic health record. To demonstrate the generalizability of these methods by validating them at an external site. We received data on medications and problems for 263,597 patients from the University of Texas Health Science Center at Houston Faculty Practice, an ambulatory practice that uses the Allscripts Enterprise commercial electronic health record product. We then conducted association rule mining to identify associated pairs of medications and problems and characterized these associations with five measures of interestingness: support, confidence, chi-square, interest and conviction and compared the top-ranked pairs to a gold standard. 25,088 medication-problem pairs were identified that exceeded our confidence and support thresholds. An analysis of the top 500 pairs according to each measure of interestingness showed a high degree of accuracy for highly-ranked pairs. The same technique was successfully employed at the University of Texas and accuracy was comparable to our previous results. Top associations included many medications that are highly specific for a particular problem as well as a large number of common, accurate medication-problem pairs that reflect practice patterns.
An Overview of GIS-Based Modeling and Assessment of Mining-Induced Hazards: Soil, Water, and Forest
Kim, Sung-Min; Yi, Huiuk; Choi, Yosoon
2017-01-01
In this study, current geographic information system (GIS)-based methods and their application for the modeling and assessment of mining-induced hazards were reviewed. Various types of mining-induced hazard, including soil contamination, soil erosion, water pollution, and deforestation were considered in the discussion of the strength and role of GIS as a viable problem-solving tool in relation to mining-induced hazards. The various types of mining-induced hazard were classified into two or three subtopics according to the steps involved in the reclamation procedure, or elements of the hazard of interest. Because GIS is appropriated for the handling of geospatial data in relation to mining-induced hazards, the application and feasibility of exploiting GIS-based modeling and assessment of mining-induced hazards within the mining industry could be expanded further. PMID:29186922
An Overview of GIS-Based Modeling and Assessment of Mining-Induced Hazards: Soil, Water, and Forest.
Suh, Jangwon; Kim, Sung-Min; Yi, Huiuk; Choi, Yosoon
2017-11-27
In this study, current geographic information system (GIS)-based methods and their application for the modeling and assessment of mining-induced hazards were reviewed. Various types of mining-induced hazard, including soil contamination, soil erosion, water pollution, and deforestation were considered in the discussion of the strength and role of GIS as a viable problem-solving tool in relation to mining-induced hazards. The various types of mining-induced hazard were classified into two or three subtopics according to the steps involved in the reclamation procedure, or elements of the hazard of interest. Because GIS is appropriated for the handling of geospatial data in relation to mining-induced hazards, the application and feasibility of exploiting GIS-based modeling and assessment of mining-induced hazards within the mining industry could be expanded further.
Database citation in full text biomedical articles.
Kafkas, Şenay; Kim, Jee-Hyub; McEntyre, Johanna R
2013-01-01
Molecular biology and literature databases represent essential infrastructure for life science research. Effective integration of these data resources requires that there are structured cross-references at the level of individual articles and biological records. Here, we describe the current patterns of how database entries are cited in research articles, based on analysis of the full text Open Access articles available from Europe PMC. Focusing on citation of entries in the European Nucleotide Archive (ENA), UniProt and Protein Data Bank, Europe (PDBe), we demonstrate that text mining doubles the number of structured annotations of database record citations supplied in journal articles by publishers. Many thousands of new literature-database relationships are found by text mining, since these relationships are also not present in the set of articles cited by database records. We recommend that structured annotation of database records in articles is extended to other databases, such as ArrayExpress and Pfam, entries from which are also cited widely in the literature. The very high precision and high-throughput of this text-mining pipeline makes this activity possible both accurately and at low cost, which will allow the development of new integrated data services.
Database Citation in Full Text Biomedical Articles
Kafkas, Şenay; Kim, Jee-Hyub; McEntyre, Johanna R.
2013-01-01
Molecular biology and literature databases represent essential infrastructure for life science research. Effective integration of these data resources requires that there are structured cross-references at the level of individual articles and biological records. Here, we describe the current patterns of how database entries are cited in research articles, based on analysis of the full text Open Access articles available from Europe PMC. Focusing on citation of entries in the European Nucleotide Archive (ENA), UniProt and Protein Data Bank, Europe (PDBe), we demonstrate that text mining doubles the number of structured annotations of database record citations supplied in journal articles by publishers. Many thousands of new literature-database relationships are found by text mining, since these relationships are also not present in the set of articles cited by database records. We recommend that structured annotation of database records in articles is extended to other databases, such as ArrayExpress and Pfam, entries from which are also cited widely in the literature. The very high precision and high-throughput of this text-mining pipeline makes this activity possible both accurately and at low cost, which will allow the development of new integrated data services. PMID:23734176
Subramani, Suresh; Kalpana, Raja; Monickaraj, Pankaj Moses; Natarajan, Jeyakumar
2015-04-01
The knowledge on protein-protein interactions (PPI) and their related pathways are equally important to understand the biological functions of the living cell. Such information on human proteins is highly desirable to understand the mechanism of several diseases such as cancer, diabetes, and Alzheimer's disease. Because much of that information is buried in biomedical literature, an automated text mining system for visualizing human PPI and pathways is highly desirable. In this paper, we present HPIminer, a text mining system for visualizing human protein interactions and pathways from biomedical literature. HPIminer extracts human PPI information and PPI pairs from biomedical literature, and visualize their associated interactions, networks and pathways using two curated databases HPRD and KEGG. To our knowledge, HPIminer is the first system to build interaction networks from literature as well as curated databases. Further, the new interactions mined only from literature and not reported earlier in databases are highlighted as new. A comparative study with other similar tools shows that the resultant network is more informative and provides additional information on interacting proteins and their associated networks. Copyright © 2015 Elsevier Inc. All rights reserved.
Preference Mining Using Neighborhood Rough Set Model on Two Universes.
Zeng, Kai
2016-01-01
Preference mining plays an important role in e-commerce and video websites for enhancing user satisfaction and loyalty. Some classical methods are not available for the cold-start problem when the user or the item is new. In this paper, we propose a new model, called parametric neighborhood rough set on two universes (NRSTU), to describe the user and item data structures. Furthermore, the neighborhood lower approximation operator is used for defining the preference rules. Then, we provide the means for recommending items to users by using these rules. Finally, we give an experimental example to show the details of NRSTU-based preference mining for cold-start problem. The parameters of the model are also discussed. The experimental results show that the proposed method presents an effective solution for preference mining. In particular, NRSTU improves the recommendation accuracy by about 19% compared to the traditional method.
Luo, Gang
2017-12-01
For user-friendliness, many software systems offer progress indicators for long-duration tasks. A typical progress indicator continuously estimates the remaining task execution time as well as the portion of the task that has been finished. Building a machine learning model often takes a long time, but no existing machine learning software supplies a non-trivial progress indicator. Similarly, running a data mining algorithm often takes a long time, but no existing data mining software provides a nontrivial progress indicator. In this article, we consider the problem of offering progress indicators for machine learning model building and data mining algorithm execution. We discuss the goals and challenges intrinsic to this problem. Then we describe an initial framework for implementing such progress indicators and two advanced, potential uses of them, with the goal of inspiring future research on this topic.
The ``battle of gold'' under the light of green economics: a case study from Greece
NASA Astrophysics Data System (ADS)
Damigos, D.; Kaliampakos, D.
2006-05-01
Mining firms stimulate local and national economies but this comes at a certain cost. In the light of increasing public concern, external costs of environmental degradation and social disruption are no longer of pure academic interest. The assessment of mining projects on the grounds of sustainable development is critical in order to decide whether the exploitation of mineral resources is socially desirable. In practice, few steps have been taken towards this end. In this paper, a case study is illustrated that provides the means for evaluating the social worthiness of mining projects. The analysis, which is the first of its kind in Greece, deals with a major problem of the mining industry: the gold debate on the grounds of green economics. The assessment is based on the social cost benefit approach. Well-established techniques (e.g. benefit transfer) and innovative approaches have been adopted to overcome various practical problems
Luo, Gang
2017-01-01
For user-friendliness, many software systems offer progress indicators for long-duration tasks. A typical progress indicator continuously estimates the remaining task execution time as well as the portion of the task that has been finished. Building a machine learning model often takes a long time, but no existing machine learning software supplies a non-trivial progress indicator. Similarly, running a data mining algorithm often takes a long time, but no existing data mining software provides a nontrivial progress indicator. In this article, we consider the problem of offering progress indicators for machine learning model building and data mining algorithm execution. We discuss the goals and challenges intrinsic to this problem. Then we describe an initial framework for implementing such progress indicators and two advanced, potential uses of them, with the goal of inspiring future research on this topic. PMID:29177022
2009-06-01
capabilities: web-based, relational/multi-dimensional, client/server, and metadata (data about data) inclusion (pp. 39-40). Text mining, on the other...and Organizational Systems ( CASOS ) (Carley, 2005). Although AutoMap can be used to conduct text-mining, it was utilized only for its visualization...provides insight into how the GMCOI is using the terms, and where there might be redundant terms and need for de -confliction and standardization
Terminologies for text-mining; an experiment in the lipoprotein metabolism domain
Alexopoulou, Dimitra; Wächter, Thomas; Pickersgill, Laura; Eyre, Cecilia; Schroeder, Michael
2008-01-01
Background The engineering of ontologies, especially with a view to a text-mining use, is still a new research field. There does not yet exist a well-defined theory and technology for ontology construction. Many of the ontology design steps remain manual and are based on personal experience and intuition. However, there exist a few efforts on automatic construction of ontologies in the form of extracted lists of terms and relations between them. Results We share experience acquired during the manual development of a lipoprotein metabolism ontology (LMO) to be used for text-mining. We compare the manually created ontology terms with the automatically derived terminology from four different automatic term recognition (ATR) methods. The top 50 predicted terms contain up to 89% relevant terms. For the top 1000 terms the best method still generates 51% relevant terms. In a corpus of 3066 documents 53% of LMO terms are contained and 38% can be generated with one of the methods. Conclusions Given high precision, automatic methods can help decrease development time and provide significant support for the identification of domain-specific vocabulary. The coverage of the domain vocabulary depends strongly on the underlying documents. Ontology development for text mining should be performed in a semi-automatic way; taking ATR results as input and following the guidelines we described. Availability The TFIDF term recognition is available as Web Service, described at PMID:18460175
Abbe, Adeline; Falissard, Bruno
2017-10-23
Internet is a particularly dynamic way to quickly capture the perceptions of a population in real time. Complementary to traditional face-to-face communication, online social networks help patients to improve self-esteem and self-help. The aim of this study was to use text mining on material from an online forum exploring patients' concerns about treatment (antidepressants and anxiolytics). Concerns about treatment were collected from discussion titles in patients' online community related to antidepressants and anxiolytics. To examine the content of these titles automatically, we used text mining methods, such as word frequency in a document-term matrix and co-occurrence of words using a network analysis. It was thus possible to identify topics discussed on the forum. The forum included 2415 discussions on antidepressants and anxiolytics over a period of 3 years. After a preprocessing step, the text mining algorithm identified the 99 most frequently occurring words in titles, among which were escitalopram, withdrawal, antidepressant, venlafaxine, paroxetine, and effect. Patients' concerns were related to antidepressant withdrawal, the need to share experience about symptoms, effects, and questions on weight gain with some drugs. Patients' expression on the Internet is a potential additional resource in addressing patients' concerns about treatment. Patient profiles are close to that of patients treated in psychiatry. ©Adeline Abbe, Bruno Falissard. Originally published in JMIR Mental Health (http://mental.jmir.org), 23.10.2017.
Jonnagaddala, Jitendra; Liaw, Siaw-Teng; Ray, Pradeep; Kumar, Manish; Chang, Nai-Wen; Dai, Hong-Jie
2015-12-01
Coronary artery disease (CAD) often leads to myocardial infarction, which may be fatal. Risk factors can be used to predict CAD, which may subsequently lead to prevention or early intervention. Patient data such as co-morbidities, medication history, social history and family history are required to determine the risk factors for a disease. However, risk factor data are usually embedded in unstructured clinical narratives if the data is not collected specifically for risk assessment purposes. Clinical text mining can be used to extract data related to risk factors from unstructured clinical notes. This study presents methods to extract Framingham risk factors from unstructured electronic health records using clinical text mining and to calculate 10-year coronary artery disease risk scores in a cohort of diabetic patients. We developed a rule-based system to extract risk factors: age, gender, total cholesterol, HDL-C, blood pressure, diabetes history and smoking history. The results showed that the output from the text mining system was reliable, but there was a significant amount of missing data to calculate the Framingham risk score. A systematic approach for understanding missing data was followed by implementation of imputation strategies. An analysis of the 10-year Framingham risk scores for coronary artery disease in this cohort has shown that the majority of the diabetic patients are at moderate risk of CAD. Copyright © 2015 Elsevier Inc. All rights reserved.
Cañada, Andres; Capella-Gutierrez, Salvador; Rabal, Obdulia; Oyarzabal, Julen; Valencia, Alfonso; Krallinger, Martin
2017-07-03
A considerable effort has been devoted to retrieve systematically information for genes and proteins as well as relationships between them. Despite the importance of chemical compounds and drugs as a central bio-entity in pharmacological and biological research, only a limited number of freely available chemical text-mining/search engine technologies are currently accessible. Here we present LimTox (Literature Mining for Toxicology), a web-based online biomedical search tool with special focus on adverse hepatobiliary reactions. It integrates a range of text mining, named entity recognition and information extraction components. LimTox relies on machine-learning, rule-based, pattern-based and term lookup strategies. This system processes scientific abstracts, a set of full text articles and medical agency assessment reports. Although the main focus of LimTox is on adverse liver events, it enables also basic searches for other organ level toxicity associations (nephrotoxicity, cardiotoxicity, thyrotoxicity and phospholipidosis). This tool supports specialized search queries for: chemical compounds/drugs, genes (with additional emphasis on key enzymes in drug metabolism, namely P450 cytochromes-CYPs) and biochemical liver markers. The LimTox website is free and open to all users and there is no login requirement. LimTox can be accessed at: http://limtox.bioinfo.cnio.es. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Text Mining for Drugs and Chemical Compounds: Methods, Tools and Applications.
Vazquez, Miguel; Krallinger, Martin; Leitner, Florian; Valencia, Alfonso
2011-06-01
Providing prior knowledge about biological properties of chemicals, such as kinetic values, protein targets, or toxic effects, can facilitate many aspects of drug development. Chemical information is rapidly accumulating in all sorts of free text documents like patents, industry reports, or scientific articles, which has motivated the development of specifically tailored text mining applications. Despite the potential gains, chemical text mining still faces significant challenges. One of the most salient is the recognition of chemical entities mentioned in text. To help practitioners contribute to this area, a good portion of this review is devoted to this issue, and presents the basic concepts and principles underlying the main strategies. The technical details are introduced and accompanied by relevant bibliographic references. Other tasks discussed are retrieving relevant articles, identifying relationships between chemicals and other entities, or determining the chemical structures of chemicals mentioned in text. This review also introduces a number of published applications that can be used to build pipelines in topics like drug side effects, toxicity, and protein-disease-compound network analysis. We conclude the review with an outlook on how we expect the field to evolve, discussing its possibilities and its current limitations. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Mining free-text medical records for companion animal enteric syndrome surveillance.
Anholt, R M; Berezowski, J; Jamal, I; Ribble, C; Stephen, C
2014-03-01
Large amounts of animal health care data are present in veterinary electronic medical records (EMR) and they present an opportunity for companion animal disease surveillance. Veterinary patient records are largely in free-text without clinical coding or fixed vocabulary. Text-mining, a computer and information technology application, is needed to identify cases of interest and to add structure to the otherwise unstructured data. In this study EMR's were extracted from veterinary management programs of 12 participating veterinary practices and stored in a data warehouse. Using commercially available text-mining software (WordStat™), we developed a categorization dictionary that could be used to automatically classify and extract enteric syndrome cases from the warehoused electronic medical records. The diagnostic accuracy of the text-miner for retrieving cases of enteric syndrome was measured against human reviewers who independently categorized a random sample of 2500 cases as enteric syndrome positive or negative. Compared to the reviewers, the text-miner retrieved cases with enteric signs with a sensitivity of 87.6% (95%CI, 80.4-92.9%) and a specificity of 99.3% (95%CI, 98.9-99.6%). Automatic and accurate detection of enteric syndrome cases provides an opportunity for community surveillance of enteric pathogens in companion animals. Copyright © 2014 Elsevier B.V. All rights reserved.
MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kannan, Ramakrishnan; Ballard, Grey; Park, Haesun
Non-negative matrix factorization (NMF) is the problem of determining two non-negative low rank factors W and H, for the given input matrix A, such that A≈WH. NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient parallel algorithms to solve the problem for big data sets. The main contribution of this work is a new, high-performance parallel computational framework for a broad class of NMF algorithms thatmore » iteratively solves alternating non-negative least squares (NLS) subproblems for W and H. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). The framework is flexible and able to leverage a variety of NMF and NLS algorithms, including Multiplicative Update, Hierarchical Alternating Least Squares, and Block Principal Pivoting. Our implementation allows us to benchmark and compare different algorithms on massive dense and sparse data matrices of size that spans from few hundreds of millions to billions. We demonstrate the scalability of our algorithm and compare it with baseline implementations, showing significant performance improvements. The code and the datasets used for conducting the experiments are available online.« less
OligoIS: Scalable Instance Selection for Class-Imbalanced Data Sets.
García-Pedrajas, Nicolás; Perez-Rodríguez, Javier; de Haro-García, Aida
2013-02-01
In current research, an enormous amount of information is constantly being produced, which poses a challenge for data mining algorithms. Many of the problems in extremely active research areas, such as bioinformatics, security and intrusion detection, or text mining, share the following two features: large data sets and class-imbalanced distribution of samples. Although many methods have been proposed for dealing with class-imbalanced data sets, most of these methods are not scalable to the very large data sets common to those research fields. In this paper, we propose a new approach to dealing with the class-imbalance problem that is scalable to data sets with many millions of instances and hundreds of features. This proposal is based on the divide-and-conquer principle combined with application of the selection process to balanced subsets of the whole data set. This divide-and-conquer principle allows the execution of the algorithm in linear time. Furthermore, the proposed method is easy to implement using a parallel environment and can work without loading the whole data set into memory. Using 40 class-imbalanced medium-sized data sets, we will demonstrate our method's ability to improve the results of state-of-the-art instance selection methods for class-imbalanced data sets. Using three very large data sets, we will show the scalability of our proposal to millions of instances and hundreds of features.
MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization
Kannan, Ramakrishnan; Ballard, Grey; Park, Haesun
2017-10-30
Non-negative matrix factorization (NMF) is the problem of determining two non-negative low rank factors W and H, for the given input matrix A, such that A≈WH. NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient parallel algorithms to solve the problem for big data sets. The main contribution of this work is a new, high-performance parallel computational framework for a broad class of NMF algorithms thatmore » iteratively solves alternating non-negative least squares (NLS) subproblems for W and H. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). The framework is flexible and able to leverage a variety of NMF and NLS algorithms, including Multiplicative Update, Hierarchical Alternating Least Squares, and Block Principal Pivoting. Our implementation allows us to benchmark and compare different algorithms on massive dense and sparse data matrices of size that spans from few hundreds of millions to billions. We demonstrate the scalability of our algorithm and compare it with baseline implementations, showing significant performance improvements. The code and the datasets used for conducting the experiments are available online.« less
Kharat, Amit T; Singh, Amarjit; Kulkarni, Vilas M; Shah, Digish
2014-01-01
Data mining facilitates the study of radiology data in various dimensions. It converts large patient image and text datasets into useful information that helps in improving patient care and provides informative reports. Data mining technology analyzes data within the Radiology Information System and Hospital Information System using specialized software which assesses relationships and agreement in available information. By using similar data analysis tools, radiologists can make informed decisions and predict the future outcome of a particular imaging finding. Data, information and knowledge are the components of data mining. Classes, Clusters, Associations, Sequential patterns, Classification, Prediction and Decision tree are the various types of data mining. Data mining has the potential to make delivery of health care affordable and ensure that the best imaging practices are followed. It is a tool for academic research. Data mining is considered to be ethically neutral, however concerns regarding privacy and legality exists which need to be addressed to ensure success of data mining. PMID:25024513
Analyzing asset management data using data and text mining.
DOT National Transportation Integrated Search
2014-07-01
Predictive models using text from a sample competitively bid California highway projects have been used to predict a construction : projects likely level of cost overrun. A text description of the project and the text of the five largest project line...
BioC implementations in Go, Perl, Python and Ruby.
Liu, Wanli; Islamaj Doğan, Rezarta; Kwon, Dongseop; Marques, Hernani; Rinaldi, Fabio; Wilbur, W John; Comeau, Donald C
2014-01-01
As part of a communitywide effort for evaluating text mining and information extraction systems applied to the biomedical domain, BioC is focused on the goal of interoperability, currently a major barrier to wide-scale adoption of text mining tools. BioC is a simple XML format, specified by DTD, for exchanging data for biomedical natural language processing. With initial implementations in C++ and Java, BioC provides libraries of code for reading and writing BioC text documents and annotations. We extend BioC to Perl, Python, Go and Ruby. We used SWIG to extend the C++ implementation for Perl and one Python implementation. A second Python implementation and the Ruby implementation use native data structures and libraries. BioC is also implemented in the Google language Go. BioC modules are functional in all of these languages, which can facilitate text mining tasks. BioC implementations are freely available through the BioC site: http://bioc.sourceforge.net. Database URL: http://bioc.sourceforge.net/ Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.
Urbanski, William M; Condie, Brian G
2009-12-01
Textpresso Site Specific Recombinases (http://ssrc.genetics.uga.edu/) is a text-mining web server for searching a database of more than 9,000 full-text publications. The papers and abstracts in this database represent a wide range of topics related to site-specific recombinase (SSR) research tools. Included in the database are most of the papers that report the characterization or use of mouse strains that express Cre recombinase as well as papers that describe or analyze mouse lines that carry conditional (floxed) alleles or SSR-activated transgenes/knockins. The database also includes reports describing SSR-based cloning methods such as the Gateway or the Creator systems, papers reporting the development or use of SSR-based tools in systems such as Drosophila, bacteria, parasites, stem cells, yeast, plants, zebrafish, and Xenopus as well as publications that describe the biochemistry, genetics, or molecular structure of the SSRs themselves. Textpresso Site Specific Recombinases is the only comprehensive text-mining resource available for the literature describing the biology and technical applications of SSRs. (c) 2009 Wiley-Liss, Inc.
TOY SAFETY SURVEILLANCE FROM ONLINE REVIEWS
Winkler, Matt; Abrahams, Alan S.; Gruss, Richard; Ehsani, Johnathan P.
2016-01-01
Toy-related injuries account for a significant number of childhood injuries and the prevention of these injuries remains a goal for regulatory agencies and manufacturers. Text-mining is an increasingly prevalent method for uncovering the significance of words using big data. This research sets out to determine the effectiveness of text-mining in uncovering potentially dangerous children’s toys. We develop a danger word list, also known as a ‘smoke word’ list, from injury and recall text narratives. We then use the smoke word lists to score over one million Amazon reviews, with the top scores denoting potential safety concerns. We compare the smoke word list to conventional sentiment analysis techniques, in terms of both word overlap and effectiveness. We find that smoke word lists are highly distinct from conventional sentiment dictionaries and provide a statistically significant method for identifying safety concerns in children’s toy reviews. Our findings indicate that text-mining is, in fact, an effective method for the surveillance of safety concerns in children’s toys and could be a gateway to effective prevention of toy-product-related injuries. PMID:27942092
Fuller, Richard H.; Shay, J.M.; Ferreira, R.F.; Hoffman, R.J.
1978-01-01
Streams draining the mined areas of massive sulfide ore deposits in the Shasta Mining Districts of northern California are generally acidic and contain large concentrations of dissolved metals, including iron, copper, and zinc. The streams, including Flat, Little Backbone, Spring, West Squaw, Horse, and Zinc Creeks, discharge into Shasta Reservoir and the Sacramento River and have caused numerous fish kills. The sources of pollution are discharge from underground mines, streams that flow into open pits, and streams that flow through pyritic mine dumps where the oxidation of pyrite and other sulfide minerals results in the production of acid and the mobilization of metals. Suggested methods of treatment include the use of air and hydraulic seals in the mines, lime neutralization of mine effluent, channeling of runoff and mine effluent away from mine and tailing areas, and the grading and sealing of mine dumps. A comprehensive preabatement and postabatement program is recommended to evaluate the effects of any treatment method used. (Woodard-USGS)
Contaminant Attenuation Processes at Mining Sites
Monitored natural attenuation is sometimes used in combination with active treatment technologies to achieve site-specific remediation objectives. The global imprint of acid drainage problems at mining sites, however, is a clear reminder that in most cases natural processes are ...
ASCOT: a text mining-based web-service for efficient search and assisted creation of clinical trials
2012-01-01
Clinical trials are mandatory protocols describing medical research on humans and among the most valuable sources of medical practice evidence. Searching for trials relevant to some query is laborious due to the immense number of existing protocols. Apart from search, writing new trials includes composing detailed eligibility criteria, which might be time-consuming, especially for new researchers. In this paper we present ASCOT, an efficient search application customised for clinical trials. ASCOT uses text mining and data mining methods to enrich clinical trials with metadata, that in turn serve as effective tools to narrow down search. In addition, ASCOT integrates a component for recommending eligibility criteria based on a set of selected protocols. PMID:22595088
Korkontzelos, Ioannis; Mu, Tingting; Ananiadou, Sophia
2012-04-30
Clinical trials are mandatory protocols describing medical research on humans and among the most valuable sources of medical practice evidence. Searching for trials relevant to some query is laborious due to the immense number of existing protocols. Apart from search, writing new trials includes composing detailed eligibility criteria, which might be time-consuming, especially for new researchers. In this paper we present ASCOT, an efficient search application customised for clinical trials. ASCOT uses text mining and data mining methods to enrich clinical trials with metadata, that in turn serve as effective tools to narrow down search. In addition, ASCOT integrates a component for recommending eligibility criteria based on a set of selected protocols.
Equipment Selection by using Fuzzy TOPSIS Method
NASA Astrophysics Data System (ADS)
Yavuz, Mahmut
2016-10-01
In this study, Fuzzy TOPSIS method was performed for the selection of open pit truck and the optimal solution of the problem was investigated. Data from Turkish Coal Enterprises was used in the application of the method. This paper explains the Fuzzy TOPSIS approaches with group decision-making application in an open pit coal mine in Turkey. An algorithm of the multi-person multi-criteria decision making with fuzzy set approach was applied an equipment selection problem. It was found that Fuzzy TOPSIS with a group decision making is a method that may help decision-makers in solving different decision-making problems in mining.
Pak, Malk Eun; Kim, Yu Ri; Kim, Ha Neui; Ahn, Sung Min; Shin, Hwa Kyoung; Baek, Jin Ung; Choi, Byung Tae
2016-02-17
In literature on Korean medicine, Dongeuibogam (Treasured Mirror of Eastern Medicine), published in 1613, represents the overall results of the traditional medicines of North-East Asia based on prior medicinal literature of this region. We utilized this medicinal literature by text mining to establish a list of candidate herbs for cognitive enhancement in the elderly and then performed an evaluation of their effects. Text mining was performed for selection of candidate herbs. Cell viability was determined in HT22 hippocampal cells and immunohistochemistry and behavioral analysis was performed in a kainic acid (KA) mice model in order to observe alterations of hippocampal cells and cognition. Twenty four herbs for cognitive enhancement in the elderly were selected by text mining of Dongeuibogam. In HT22 cells, pretreatment with 3 candidate herbs resulted in significantly reduced glutamate-induced cell death. Panax ginseng was the most neuroprotective herb against glutamate-induced cell death. In the hippocampus of a KA mice model, pretreatment with 11 candidate herbs resulted in suppression of caspase-3 expression. Treatment with 7 candidate herbs resulted in significantly enhanced expression levels of phosphorylated cAMP response element binding protein. Number of proliferated cells indicated by BrdU labeling was increased by treatment with 10 candidate herbs. Schisandra chinensis was the most effective herb against cell death and proliferation of progenitor cells and Rehmannia glutinosa in neuroprotection in the hippocampus of a KA mice model. In a KA mice model, we confirmed improved spatial and short memory by treatment with the 3 most effective candidate herbs and these recovered functions were involved in a higher number of newly formed neurons from progenitor cells in the hippocampus. These established herbs and their combinations identified by text-mining technique and evaluation for effectiveness may have value in further experimental and clinical applications for cognitive enhancement in the elderly. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Mining the Text: 34 Text Features that Can Ease or Obstruct Text Comprehension and Use
ERIC Educational Resources Information Center
White, Sheida
2012-01-01
This article presents 34 characteristics of texts and tasks ("text features") that can make continuous (prose), noncontinuous (document), and quantitative texts easier or more difficult for adolescents and adults to comprehend and use. The text features were identified by examining the assessment tasks and associated texts in the national…
A prototype system based on visual interactive SDM called VGC
NASA Astrophysics Data System (ADS)
Jia, Zelu; Liu, Yaolin; Liu, Yanfang
2009-10-01
In many application domains, data is collected and referenced by its geo-spatial location. Spatial data mining, or the discovery of interesting patterns in such databases, is an important capability in the development of database systems. Spatial data mining recently emerges from a number of real applications, such as real-estate marketing, urban planning, weather forecasting, medical image analysis, road traffic accident analysis, etc. It demands for efficient solutions for many new, expensive, and complicated problems. For spatial data mining of large data sets to be effective, it is also important to include humans in the data exploration process and combine their flexibility, creativity, and general knowledge with the enormous storage capacity and computational power of today's computers. Visual spatial data mining applies human visual perception to the exploration of large data sets. Presenting data in an interactive, graphical form often fosters new insights, encouraging the information and validation of new hypotheses to the end of better problem-solving and gaining deeper domain knowledge. In this paper a visual interactive spatial data mining prototype system (visual geo-classify) based on VC++6.0 and MapObject2.0 are designed and developed, the basic algorithms of the spatial data mining is used decision tree and Bayesian networks, and data classify are used training and learning and the integration of the two to realize. The result indicates it's a practical and extensible visual interactive spatial data mining tool.
Bird, Graham
2016-12-01
Globally, thousands of kilometres of rivers are degraded due to the presence of elevated concentrations of potentially harmful elements (PHEs) sourced from historical metal mining activity. In many countries, the presence of contaminated water and river sediment creates a legal requirement to address such problems. Remediation of mining-associated point sources has often been focused upon improving river water quality; however, this study evaluates the contaminant legacy present within river sediments and attempts to assess the influence of the scale of mining activity and post-mining remediation upon the magnitude of PHE contamination found within contemporary river sediments. Data collected from four exemplar catchments indicates a strong relationship between the scale of historical mining, as measured by ore output, and maximum PHE enrichment factors, calculated versus environmental quality guidelines. The use of channel slope as a proxy measure for the degree of channel-floodplain coupling indicates that enrichment factors for PHEs in contemporary river sediments may also be the highest where channel-floodplain coupling is the greatest. Calculation of a metric score for mine remediation activity indicates no clear influence of the scale of remediation activity and PHE enrichment factors for river sediments. It is suggested that whilst exemplars of significant successes at improving post-remediation river water quality can be identified; river sediment quality is a much more long-lasting environmental problem. In addition, it is suggested that improvements to river sediment quality do not occur quickly or easily as a result of remediation actions focused a specific mining point sources. Data indicate that PHEs continue to be episodically dispersed through river catchments hundreds of years after the cessation of mining activity, especially during flood flows. The high PHE loads of flood sediments in mining-affected river catchments and the predicted changes to flood frequency, especially, in many river catchments, provides further evidence of the need to enact effective mine remediation strategies and to fully consider the role of river sediments in prolonging the environmental legacy of historical mine sites.
Discovering Psychological Principles by Mining Naturally Occurring Data Sets.
Goldstone, Robert L; Lupyan, Gary
2016-07-01
The very expertise with which psychologists wield their tools for achieving laboratory control may have had the unwelcome effect of blinding psychologists to the possibilities of discovering principles of behavior without conducting experiments. When creatively interrogated, a diverse range of large, real-world data sets provides powerful diagnostic tools for revealing principles of human judgment, perception, categorization, decision-making, language use, inference, problem solving, and representation. Examples of these data sets include patterns of website links, dictionaries, logs of group interactions, collections of images and image tags, text corpora, history of financial transactions, trends in twitter tag usage and propagation, patents, consumer product sales, performance in high-stakes sporting events, dialect maps, and scientific citations. The goal of this issue is to present some exemplary case studies of mining naturally existing data sets to reveal important principles and phenomena in cognitive science, and to discuss some of the underlying issues involved with conducting traditional experiments, analyses of naturally occurring data, computational modeling, and the synthesis of all three methods. Copyright © 2016 Cognitive Science Society, Inc.
Federal Register 2010, 2011, 2012, 2013, 2014
2010-02-24
... Transmittal of Applications: March 26, 2010. Full Text of Announcement I. Funding Opportunity Description... related to industrial health and safety: Mining and mineral engineering, industrial engineering... technology/technician, hazardous materials information systems technology/technician, mining technology...
NASA Astrophysics Data System (ADS)
Gvozdkova, T.; Tyulenev, M.; Zhironkin, S.; Trifonov, V. A.; Osipov, Yu M.
2017-01-01
Surface mining and open pits engineering affect the environment in a very negative way. Among other pollutions that open pits make during mineral deposits exploiting, particular problem is the landscape changing. Along with converting the land into pits, surface mining is connected with pilling dumps that occupy large ground. The article describes an analysis of transportless methods of several coal seams strata surface mining, applied for open pits of South Kuzbass coal enterprises (Western Siberia, Russia). To improve land-use management of open pit mining enterprises, the characteristics of transportless technological schemes for several coal seams strata surface mining are highlighted and observed. These characteristics help to systematize transportless open mining technologies using common criteria that characterize structure of the bottom part of a strata and internal dumping schemes. The schemes of transportless systems of coal strata surface mining implemented in South Kuzbass are given.
Modeling of information on the impact of mining exploitation on bridge objects in BIM
NASA Astrophysics Data System (ADS)
Bętkowski, Piotr
2018-04-01
The article discusses the advantages of BIM (Building Information Modeling) technology in the management of bridge infrastructure on mining areas. The article shows the problems with information flow in the case of bridge objects located on mining areas and the advantages of proper information management, e.g. the possibility of automatic monitoring of structures, improvement of safety, optimization of maintenance activities, cost reduction of damage removal and preventive actions, improvement of atmosphere for mining exploitation, improvement of the relationship between the manager of the bridge and the mine. Traditional model of managing bridge objects on mining areas has many disadvantages, which are discussed in this article. These disadvantages include among others: duplication of information about the object, lack of correlation in investments due to lack of information flow between bridge manager and mine, limited assessment possibilities of damage propagation on technical condition and construction resistance to mining influences.
MISAGA: An Algorithm for Mining Interesting Subgraphs in Attributed Graphs.
He, Tiantian; Chan, Keith C C
2018-05-01
An attributed graph contains vertices that are associated with a set of attribute values. Mining clusters or communities, which are interesting subgraphs in the attributed graph is one of the most important tasks of graph analytics. Many problems can be defined as the mining of interesting subgraphs in attributed graphs. Algorithms that discover subgraphs based on predefined topologies cannot be used to tackle these problems. To discover interesting subgraphs in the attributed graph, we propose an algorithm called mining interesting subgraphs in attributed graph algorithm (MISAGA). MISAGA performs its tasks by first using a probabilistic measure to determine whether the strength of association between a pair of attribute values is strong enough to be interesting. Given the interesting pairs of attribute values, then the degree of association is computed for each pair of vertices using an information theoretic measure. Based on the edge structure and degree of association between each pair of vertices, MISAGA identifies interesting subgraphs by formulating it as a constrained optimization problem and solves it by identifying the optimal affiliation of subgraphs for the vertices in the attributed graph. MISAGA has been tested with several large-sized real graphs and is found to be potentially very useful for various applications.
Tichomirowa, Marion; Heidel, Claudia
2012-01-01
The isotope composition of dissolved sulphate and strontium in atmospheric deposition, groundwater, mine water and river water in the region of Freiberg was investigated to better understand the fate of these components in the regional and global water cycle. Most of the isotope variations of dissolved sulphates in atmospheric deposition from three locations sampled bi- or tri-monthly can be explained by fractionation processes leading to lower [Formula: see text] (of about 2-3‰) and higher [Formula: see text] (of about 8-10‰) values in summer compared with the winter period. These samples showed a negative correlation between [Formula: see text] and [Formula: see text] values and a weak positive correlation between [Formula: see text] and [Formula: see text] values. They reflect the sulphate formed by aqueous oxidation from long-range transport in clouds. However, these isotope variations were superimposed by changes of the dominating atmospheric sulphate source. At two of the sampling points, large variations of mean annual [Formula: see text] values from atmospheric bulk deposition were recorded. From 2008 to 2009, the mean annual [Formula: see text] value increased by about 5‰; and decreased by about 4‰ from 2009 to 2010. A change in the dominating sulphate source or oxidation pathways of SO(2) in the atmosphere is proposed to cause these shifts. No changes were found in corresponding [Formula: see text] values. Groundwater, river water and some mine waters (where groundwater was the dominating sulphate source) also showed temporal shifts in their [Formula: see text] values corresponding to those of bulk atmospheric deposition, albeit to a lower degree. The mean transit time of atmospheric sulphur through the soil into the groundwater and river water was less than a year and therefore much shorter than previously suggested. Mining activities of about 800 years in the Freiberg region may have led to large subsurface areas with an enhanced groundwater flow along fractures and mined-refilled ore lodes which may shorten transit times of sulphate from precipitation through groundwater into river water.
Preliminary study of detection of buried landmines using a programmable hyperspectral imager
NASA Astrophysics Data System (ADS)
McFee, John E.; Ripley, Herb T.; Buxton, Roger; Thriscutt, Andrew M.
1996-05-01
Experiments were conducted to determine if buried mines could be detected by measuring the change in reflectance spectra of vegetation above mine burial sites. Mines were laid using hand methods and simulated mechanical methods and spectral images were obtained over a three month period using a casi hyperspectral imager scanned from a personnel lift. Mines were not detectable by measurement of the shift of the red edge of vegetative spectra. By calculating the linear correlation coefficient image, some mines in light vegetative cover (grass, grass/blueberries) were apparently detected, but mines buried in heavy vegetation cover (deep ferns) were not detectable. Due to problems with ground truthing, accurate probabilities of detection and false alarm rates were not obtained.
Formulations and algorithms for problems on rock mass and support deformation during mining
NASA Astrophysics Data System (ADS)
Seryakov, VM
2018-03-01
The analysis of problem formulations to calculate stress-strain state of mine support and surrounding rocks mass in rock mechanics shows that such formulations incompletely describe the mechanical features of joint deformation in the rock mass–support system. The present paper proposes an algorithm to take into account the actual conditions of rock mass and support interaction and the algorithm implementation method to ensure efficient calculation of stresses in rocks and support.
NASA Astrophysics Data System (ADS)
Wang, Wei; Yang, Jiong
With the rapid growth of computational biology and e-commerce applications, high-dimensional data becomes very common. Thus, mining high-dimensional data is an urgent problem of great practical importance. However, there are some unique challenges for mining data of high dimensions, including (1) the curse of dimensionality and more crucial (2) the meaningfulness of the similarity measure in the high dimension space. In this chapter, we present several state-of-art techniques for analyzing high-dimensional data, e.g., frequent pattern mining, clustering, and classification. We will discuss how these methods deal with the challenges of high dimensionality.
METAL ATTENUATION PROCESSES AT MINING SITES
The purpose of this Issue Paper is to provide scientists and engineers responsible for assessing remediation technologies with background information on MNA processes at mining-impacted sites. The global magnitude of the acid drainage problem is clear evidence that in most cases...
Konias, Sokratis; Chouvarda, Ioanna; Vlahavas, Ioannis; Maglaveras, Nicos
2005-09-01
Current approaches for mining association rules usually assume that the mining is performed in a static database, where the problem of missing attribute values does not practically exist. However, these assumptions are not preserved in some medical databases, like in a home care system. In this paper, a novel uncertainty rule algorithm is illustrated, namely URG-2 (Uncertainty Rule Generator), which addresses the problem of mining dynamic databases containing missing values. This algorithm requires only one pass from the initial dataset in order to generate the item set, while new metrics corresponding to the notion of Support and Confidence are used. URG-2 was evaluated over two medical databases, introducing randomly multiple missing values for each record's attribute (rate: 5-20% by 5% increments) in the initial dataset. Compared with the classical approach (records with missing values are ignored), the proposed algorithm was more robust in mining rules from datasets containing missing values. In all cases, the difference in preserving the initial rules ranged between 30% and 60% in favour of URG-2. Moreover, due to its incremental nature, URG-2 saved over 90% of the time required for thorough re-mining. Thus, the proposed algorithm can offer a preferable solution for mining in dynamic relational databases.
Data Visualization in Information Retrieval and Data Mining (SIG VIS).
ERIC Educational Resources Information Center
Efthimiadis, Efthimis
2000-01-01
Presents abstracts that discuss using data visualization for information retrieval and data mining, including immersive information space and spatial metaphors; spatial data using multi-dimensional matrices with maps; TREC (Text Retrieval Conference) experiments; users' information needs in cartographic information retrieval; and users' relevance…
Macromolecule mass spectrometry: citation mining of user documents.
Kostoff, Ronald N; Bedford, Clifford D; del Río, J Antonio; Cortes, Héctor D; Karypis, George
2004-03-01
Identifying research users, applications, and impact is important for research performers, managers, evaluators, and sponsors. Identification of the user audience and the research impact is complex and time consuming due to the many indirect pathways through which fundamental research can impact applications. This paper identified the literature pathways through which two highly-cited papers of 2002 Chemistry Nobel Laureates Fenn and Tanaka impacted research, technology development, and applications. Citation Mining, an integration of citation bibliometrics and text mining, was applied to the >1600 first generation Science Citation Index (SCI) citing papers to Fenn's 1989 Science paper on Electrospray Ionization for Mass Spectrometry, and to the >400 first generation SCI citing papers to Tanaka's 1988 Rapid Communications in Mass Spectrometry paper on Laser Ionization Time-of-Flight Mass Spectrometry. Bibliometrics was performed on the citing papers to profile the user characteristics. Text mining was performed on the citing papers to identify the technical areas impacted by the research, and the relationships among these technical areas.
2016-09-26
Intelligent Automation Incorporated Enhancements for a Dynamic Data Warehousing and Mining ...Enhancements for a Dynamic Data Warehousing and Mining System for N00014-16-P-3014 Large-Scale Human Social Cultural Behavioral (HSBC) Data 5b. GRANT NUMBER...Representative Media Gallery View. We perform Scraawl’s NER algorithm to the text associated with YouTube post, which classifies the named entities into
Detecting Malicious Tweets in Twitter Using Runtime Monitoring With Hidden Information
2016-06-01
text mining using Twitter streaming API and python [Online]. Available: http://adilmoujahid.com/posts/2014/07/twitter-analytics/ [22] M. Singh, B...sites with 645,750,000 registered users [3] and has open source public tweets for data mining . 2. Malicious Users and Tweets In the modern world...want to data mine in Twitter, and presents the natural language assertions and corresponding rule patterns. It then describes the steps performed using
Numerical linear algebra in data mining
NASA Astrophysics Data System (ADS)
Eldén, Lars
Ideas and algorithms from numerical linear algebra are important in several areas of data mining. We give an overview of linear algebra methods in text mining (information retrieval), pattern recognition (classification of handwritten digits), and PageRank computations for web search engines. The emphasis is on rank reduction as a method of extracting information from a data matrix, low-rank approximation of matrices using the singular value decomposition and clustering, and on eigenvalue methods for network analysis.
Method to Select Technical Terms for Glossaries in Support of Joint Task Force Operations
2012-01-01
have been prohibitively time-consuming. Instead, we identified two publicly available terminology extractor tools: TerMine (NaCTEM, 2011) and Alchemy ...and that from the latter, by high recall. The Alchemy approach contrasts with that used in TerMine in that Alchemy will process the text with...information categories, such as person, location, and organization, in addition to returning topic keywords. Output from both TerMine and Alchemy
Implications of Emerging Data Mining
NASA Astrophysics Data System (ADS)
Kulathuramaiyer, Narayanan; Maurer, Hermann
Data Mining describes a technology that discovers non-trivial hidden patterns in a large collection of data. Although this technology has a tremendous impact on our lives, the invaluable contributions of this invisible technology often go unnoticed. This paper discusses advances in data mining while focusing on the emerging data mining capability. Such data mining applications perform multidimensional mining on a wide variety of heterogeneous data sources, providing solutions to many unresolved problems. This paper also highlights the advantages and disadvantages arising from the ever-expanding scope of data mining. Data Mining augments human intelligence by equipping us with a wealth of knowledge and by empowering us to perform our daily tasks better. As the mining scope and capacity increases, users and organizations become more willing to compromise privacy. The huge data stores of the ‚master miners` allow them to gain deep insights into individual lifestyles and their social and behavioural patterns. Data integration and analysis capability of combining business and financial trends together with the ability to deterministically track market changes will drastically affect our lives.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hirdt, J.A.; Brown, D.A., E-mail: dbrown@bnl.gov
The EXFOR library contains the largest collection of experimental nuclear reaction data available as well as the data's bibliographic information and experimental details. We text-mined the REACTION and MONITOR fields of the ENTRYs in the EXFOR library in order to identify understudied reactions and quantities. Using the results of the text-mining, we created an undirected graph from the EXFOR datasets with each graph node representing a single reaction and quantity and graph links representing the various types of connections between these reactions and quantities. This graph is an abstract representation of the connections in EXFOR, similar to graphs of socialmore » networks, authorship networks, etc. We use various graph theoretical tools to identify important yet understudied reactions and quantities in EXFOR. Although we identified a few cross sections relevant for shielding applications and isotope production, mostly we identified charged particle fluence monitor cross sections. As a side effect of this work, we learn that our abstract graph is typical of other real-world graphs.« less
Systematic analysis of molecular mechanisms for HCC metastasis via text mining approach.
Zhen, Cheng; Zhu, Caizhong; Chen, Haoyang; Xiong, Yiru; Tan, Junyuan; Chen, Dong; Li, Jin
2017-02-21
To systematically explore the molecular mechanism for hepatocellular carcinoma (HCC) metastasis and identify regulatory genes with text mining methods. Genes with highest frequencies and significant pathways related to HCC metastasis were listed. A handful of proteins such as EGFR, MDM2, TP53 and APP, were identified as hub nodes in PPI (protein-protein interaction) network. Compared with unique genes for HBV-HCCs, genes particular to HCV-HCCs were less, but may participate in more extensive signaling processes. VEGFA, PI3KCA, MAPK1, MMP9 and other genes may play important roles in multiple phenotypes of metastasis. Genes in abstracts of HCC-metastasis literatures were identified. Word frequency analysis, KEGG pathway and PPI network analysis were performed. Then co-occurrence analysis between genes and metastasis-related phenotypes were carried out. Text mining is effective for revealing potential regulators or pathways, but the purpose of it should be specific, and the combination of various methods will be more useful.
Knowledge acquisition, semantic text mining, and security risks in health and biomedical informatics
Huang, Jingshan; Dou, Dejing; Dang, Jiangbo; Pardue, J Harold; Qin, Xiao; Huan, Jun; Gerthoffer, William T; Tan, Ming
2012-01-01
Computational techniques have been adopted in medical and biological systems for a long time. There is no doubt that the development and application of computational methods will render great help in better understanding biomedical and biological functions. Large amounts of datasets have been produced by biomedical and biological experiments and simulations. In order for researchers to gain knowledge from original data, nontrivial transformation is necessary, which is regarded as a critical link in the chain of knowledge acquisition, sharing, and reuse. Challenges that have been encountered include: how to efficiently and effectively represent human knowledge in formal computing models, how to take advantage of semantic text mining techniques rather than traditional syntactic text mining, and how to handle security issues during the knowledge sharing and reuse. This paper summarizes the state-of-the-art in these research directions. We aim to provide readers with an introduction of major computing themes to be applied to the medical and biological research. PMID:22371823
Chemical named entities recognition: a review on approaches and applications.
Eltyeb, Safaa; Salim, Naomie
2014-01-01
The rapid increase in the flow rate of published digital information in all disciplines has resulted in a pressing need for techniques that can simplify the use of this information. The chemistry literature is very rich with information about chemical entities. Extracting molecules and their related properties and activities from the scientific literature to "text mine" these extracted data and determine contextual relationships helps research scientists, particularly those in drug development. One of the most important challenges in chemical text mining is the recognition of chemical entities mentioned in the texts. In this review, the authors briefly introduce the fundamental concepts of chemical literature mining, the textual contents of chemical documents, and the methods of naming chemicals in documents. We sketch out dictionary-based, rule-based and machine learning, as well as hybrid chemical named entity recognition approaches with their applied solutions. We end with an outlook on the pros and cons of these approaches and the types of chemical entities extracted.
NASA Astrophysics Data System (ADS)
Hirdt, J. A.; Brown, D. A.
2016-01-01
The EXFOR library contains the largest collection of experimental nuclear reaction data available as well as the data's bibliographic information and experimental details. We text-mined the REACTION and MONITOR fields of the ENTRYs in the EXFOR library in order to identify understudied reactions and quantities. Using the results of the text-mining, we created an undirected graph from the EXFOR datasets with each graph node representing a single reaction and quantity and graph links representing the various types of connections between these reactions and quantities. This graph is an abstract representation of the connections in EXFOR, similar to graphs of social networks, authorship networks, etc. We use various graph theoretical tools to identify important yet understudied reactions and quantities in EXFOR. Although we identified a few cross sections relevant for shielding applications and isotope production, mostly we identified charged particle fluence monitor cross sections. As a side effect of this work, we learn that our abstract graph is typical of other real-world graphs.
Development of a novel low frequency GPR system for ultra-deep detection in Mine
NASA Astrophysics Data System (ADS)
Xu, Xianlei; Peng, Suping; Yang, Feng
2016-04-01
Mine disasters sources is the main source of the underground coal mine accidents in China. This paper describes the development of a novel explosion proof ground penetrating radar (GPR) for mine disasters sources detection, aiming to solve the current problems of the small detection range and low precision in the mine advanced detection in China. A high performance unipolar pulse transmitting unit is developed by using avalanche transistors, and an effective pulse excitation source network. And a new pluggable combined low-frequency antenna involving three frequencies with 12.5MHz, 25 MHz and 50MHz, is designed and developed. The plate-type structure is designed, aiming to enhance the directivity of the antenna, and the achievement of the antenna impedance matching is implemented in the feed point based on the extensions interface design, enhancing the antenna bandwidth and reducing the standing wave interference. Moreover, a high precision stepper delay circuit is designed by transforming the number of the operational amplifier step and using the differential compensation between the metal-oxide semiconductor field effect transistors, aiming to improve the accuracy of the signal acquisition system. In order to adapt to the mine environment, the explosion-proof design is implemented for the GPR system, including the host, transmitter, receiver, battery box, antenna, and other components.Mine detection experiments is carried out and the results show: the novel GPR system can effectively detect the location and depth of the geological disasters source with the depth greater than30 m and the diameter greater than 3m, the maximum detection depth can be up to 80m, which break the current detection depth limitations within 30m, providing an effective technical support for the ultra-deep mine disasters detection and the safety problems in coal mine production.
Large-Scale Event Extraction from Literature with Multi-Level Gene Normalization
Wei, Chih-Hsuan; Hakala, Kai; Pyysalo, Sampo; Ananiadou, Sophia; Kao, Hung-Yu; Lu, Zhiyong; Salakoski, Tapio; Van de Peer, Yves; Ginter, Filip
2013-01-01
Text mining for the life sciences aims to aid database curation, knowledge summarization and information retrieval through the automated processing of biomedical texts. To provide comprehensive coverage and enable full integration with existing biomolecular database records, it is crucial that text mining tools scale up to millions of articles and that their analyses can be unambiguously linked to information recorded in resources such as UniProt, KEGG, BioGRID and NCBI databases. In this study, we investigate how fully automated text mining of complex biomolecular events can be augmented with a normalization strategy that identifies biological concepts in text, mapping them to identifiers at varying levels of granularity, ranging from canonicalized symbols to unique gene and proteins and broad gene families. To this end, we have combined two state-of-the-art text mining components, previously evaluated on two community-wide challenges, and have extended and improved upon these methods by exploiting their complementary nature. Using these systems, we perform normalization and event extraction to create a large-scale resource that is publicly available, unique in semantic scope, and covers all 21.9 million PubMed abstracts and 460 thousand PubMed Central open access full-text articles. This dataset contains 40 million biomolecular events involving 76 million gene/protein mentions, linked to 122 thousand distinct genes from 5032 species across the full taxonomic tree. Detailed evaluations and analyses reveal promising results for application of this data in database and pathway curation efforts. The main software components used in this study are released under an open-source license. Further, the resulting dataset is freely accessible through a novel API, providing programmatic and customized access (http://www.evexdb.org/api/v001/). Finally, to allow for large-scale bioinformatic analyses, the entire resource is available for bulk download from http://evexdb.org/download/, under the Creative Commons – Attribution – Share Alike (CC BY-SA) license. PMID:23613707
Ravikumar, Komandur Elayavilli; Wagholikar, Kavishwar B; Li, Dingcheng; Kocher, Jean-Pierre; Liu, Hongfang
2015-06-06
Advances in the next generation sequencing technology has accelerated the pace of individualized medicine (IM), which aims to incorporate genetic/genomic information into medicine. One immediate need in interpreting sequencing data is the assembly of information about genetic variants and their corresponding associations with other entities (e.g., diseases or medications). Even with dedicated effort to capture such information in biological databases, much of this information remains 'locked' in the unstructured text of biomedical publications. There is a substantial lag between the publication and the subsequent abstraction of such information into databases. Multiple text mining systems have been developed, but most of them focus on the sentence level association extraction with performance evaluation based on gold standard text annotations specifically prepared for text mining systems. We developed and evaluated a text mining system, MutD, which extracts protein mutation-disease associations from MEDLINE abstracts by incorporating discourse level analysis, using a benchmark data set extracted from curated database records. MutD achieves an F-measure of 64.3% for reconstructing protein mutation disease associations in curated database records. Discourse level analysis component of MutD contributed to a gain of more than 10% in F-measure when compared against the sentence level association extraction. Our error analysis indicates that 23 of the 64 precision errors are true associations that were not captured by database curators and 68 of the 113 recall errors are caused by the absence of associated disease entities in the abstract. After adjusting for the defects in the curated database, the revised F-measure of MutD in association detection reaches 81.5%. Our quantitative analysis reveals that MutD can effectively extract protein mutation disease associations when benchmarking based on curated database records. The analysis also demonstrates that incorporating discourse level analysis significantly improved the performance of extracting the protein-mutation-disease association. Future work includes the extension of MutD for full text articles.
Batsaikhan, Bayartungalag; Kwon, Jang-Soon; Kim, Kyoung-Ho; Lee, Young-Joon; Lee, Jeong-Ho; Badarch, Mendbayar; Yun, Seong-Taek
2017-01-01
Although metallic mineral resources are most important in the economy of Mongolia, mining activities with improper management may result in the pollution of stream waters, posing a threat to aquatic ecosystems and humans. In this study, aiming to evaluate potential impacts of metallic mining activities on the quality of a transboundary river (Selenge) in central northern Mongolia, we performed hydrochemical investigations of rivers (Tuul, Khangal, Orkhon, Haraa, and Selenge). Hydrochemical analysis of river waters indicates that, while major dissolved ions originate from natural weathering (especially, dissolution of carbonate minerals) within watersheds, they are also influenced by mining activities. The water quality problem arising from very high turbidity is one of the major environmental concerns and is caused by suspended particles (mainly, sediment and soil particles) from diverse erosion processes, including erosion of river banks along the meandering river system, erosion of soils owing to overgrazing by livestock, and erosion by human activities, such as mining and agriculture. In particular, after passing through the Zaamar gold mining area, due to the disturbance of sediments and soils by placer gold mining, the Tuul River water becomes very turbid (up to 742 Nephelometric Turbidity Unit (NTU)). The Zaamar area is also the contamination source of the Tuul and Orkhon rivers by Al, Fe, and Mn, especially during the mining season. The hydrochemistry of the Khangal River is influenced by heavy metal (especially, Mn, Al, Cd, and As)-loaded mine drainage that originates from a huge tailing dam of the Erdenet porphyry Cu-Mo mine, as evidenced by δ 34 S values of dissolved sulfate (0.2 to 3.8 ‰). These two contaminated rivers (Tuul and Khangal) merge into the Orkhon River that flows to the Selenge River near the boundary between Mongolia and Russia and then eventually flows into Lake Baikal. Because water quality problems due to mining can be critical, mining activities in central northern Mongolia should be carefully managed to minimize the transboundary movement of aquatic contaminants (in particular, turbidity, dissolved organic carbon, Fe and Al) from mining activities.
A semantic model for multimodal data mining in healthcare information systems.
Iakovidis, Dimitris; Smailis, Christos
2012-01-01
Electronic health records (EHRs) are representative examples of multimodal/multisource data collections; including measurements, images and free texts. The diversity of such information sources and the increasing amounts of medical data produced by healthcare institutes annually, pose significant challenges in data mining. In this paper we present a novel semantic model that describes knowledge extracted from the lowest-level of a data mining process, where information is represented by multiple features i.e. measurements or numerical descriptors extracted from measurements, images, texts or other medical data, forming multidimensional feature spaces. Knowledge collected by manual annotation or extracted by unsupervised data mining from one or more feature spaces is modeled through generalized qualitative spatial semantics. This model enables a unified representation of knowledge across multimodal data repositories. It contributes to bridging the semantic gap, by enabling direct links between low-level features and higher-level concepts e.g. describing body parts, anatomies and pathological findings. The proposed model has been developed in web ontology language based on description logics (OWL-DL) and can be applied to a variety of data mining tasks in medical informatics. It utility is demonstrated for automatic annotation of medical data.
Tagawa, Miki; Matsuda, Yoshio; Manaka, Tomoko; Kobayashi, Makiko; Ohwada, Michitaka; Matsubara, Shigeki
2017-01-01
The aim of the study was to examine the possibility of converting subjective textual data written in the free column space of the Mother and Child Handbook (MCH) into objective information using text mining and to compare any monthly changes in the words written by the mothers. Pregnant women without complications (n = 60) were divided into two groups according to State-Trait Anxiety Inventory grade: low trait anxiety (group I, n = 39) and high trait anxiety (group II, n = 21). Exploratory analysis of the textual data from the MCH was conducted by text mining using the Word Miner software program. Using 1203 structural elements extracted after processing, a comparison of monthly changes in the words used in the mothers' comments was made between the two groups. The data was mainly analyzed by a correspondence analysis. The structural elements in groups I and II were divided into seven and six clusters, respectively, by cluster analysis. Correspondence analysis revealed clear monthly changes in the words used in the mothers' comments as the pregnancy progressed in group I, whereas the association was not clear in group II. The text mining method was useful for exploratory analysis of the textual data obtained from pregnant women, and the monthly change in the words used in the mothers' comments as pregnancy progressed differed according to their degree of unease. © 2016 Japan Society of Obstetrics and Gynecology.
A Text-Mining Framework for Supporting Systematic Reviews.
Li, Dingcheng; Wang, Zhen; Wang, Liwei; Sohn, Sunghwan; Shen, Feichen; Murad, Mohammad Hassan; Liu, Hongfang
2016-11-01
Systematic reviews (SRs) involve the identification, appraisal, and synthesis of all relevant studies for focused questions in a structured reproducible manner. High-quality SRs follow strict procedures and require significant resources and time. We investigated advanced text-mining approaches to reduce the burden associated with abstract screening in SRs and provide high-level information summary. A text-mining SR supporting framework consisting of three self-defined semantics-based ranking metrics was proposed, including keyword relevance, indexed-term relevance and topic relevance. Keyword relevance is based on the user-defined keyword list used in the search strategy. Indexed-term relevance is derived from indexed vocabulary developed by domain experts used for indexing journal articles and books. Topic relevance is defined as the semantic similarity among retrieved abstracts in terms of topics generated by latent Dirichlet allocation, a Bayesian-based model for discovering topics. We tested the proposed framework using three published SRs addressing a variety of topics (Mass Media Interventions, Rectal Cancer and Influenza Vaccine). The results showed that when 91.8%, 85.7%, and 49.3% of the abstract screening labor was saved, the recalls were as high as 100% for the three cases; respectively. Relevant studies identified manually showed strong topic similarity through topic analysis, which supported the inclusion of topic analysis as relevance metric. It was demonstrated that advanced text mining approaches can significantly reduce the abstract screening labor of SRs and provide an informative summary of relevant studies.
Bravo, Àlex; Piñero, Janet; Queralt-Rosinach, Núria; Rautschka, Michael; Furlong, Laura I
2015-02-21
Current biomedical research needs to leverage and exploit the large amount of information reported in scientific publications. Automated text mining approaches, in particular those aimed at finding relationships between entities, are key for identification of actionable knowledge from free text repositories. We present the BeFree system aimed at identifying relationships between biomedical entities with a special focus on genes and their associated diseases. By exploiting morpho-syntactic information of the text, BeFree is able to identify gene-disease, drug-disease and drug-target associations with state-of-the-art performance. The application of BeFree to real-case scenarios shows its effectiveness in extracting information relevant for translational research. We show the value of the gene-disease associations extracted by BeFree through a number of analyses and integration with other data sources. BeFree succeeds in identifying genes associated to a major cause of morbidity worldwide, depression, which are not present in other public resources. Moreover, large-scale extraction and analysis of gene-disease associations, and integration with current biomedical knowledge, provided interesting insights on the kind of information that can be found in the literature, and raised challenges regarding data prioritization and curation. We found that only a small proportion of the gene-disease associations discovered by using BeFree is collected in expert-curated databases. Thus, there is a pressing need to find alternative strategies to manual curation, in order to review, prioritize and curate text-mining data and incorporate it into domain-specific databases. We present our strategy for data prioritization and discuss its implications for supporting biomedical research and applications. BeFree is a novel text mining system that performs competitively for the identification of gene-disease, drug-disease and drug-target associations. Our analyses show that mining only a small fraction of MEDLINE results in a large dataset of gene-disease associations, and only a small proportion of this dataset is actually recorded in curated resources (2%), raising several issues on data prioritization and curation. We propose that joint analysis of text mined data with data curated by experts appears as a suitable approach to both assess data quality and highlight novel and interesting information.
tmBioC: improving interoperability of text-mining tools with BioC.
Khare, Ritu; Wei, Chih-Hsuan; Mao, Yuqing; Leaman, Robert; Lu, Zhiyong
2014-01-01
The lack of interoperability among biomedical text-mining tools is a major bottleneck in creating more complex applications. Despite the availability of numerous methods and techniques for various text-mining tasks, combining different tools requires substantial efforts and time owing to heterogeneity and variety in data formats. In response, BioC is a recent proposal that offers a minimalistic approach to tool interoperability by stipulating minimal changes to existing tools and applications. BioC is a family of XML formats that define how to present text documents and annotations, and also provides easy-to-use functions to read/write documents in the BioC format. In this study, we introduce our text-mining toolkit, which is designed to perform several challenging and significant tasks in the biomedical domain, and repackage the toolkit into BioC to enhance its interoperability. Our toolkit consists of six state-of-the-art tools for named-entity recognition, normalization and annotation (PubTator) of genes (GenNorm), diseases (DNorm), mutations (tmVar), species (SR4GN) and chemicals (tmChem). Although developed within the same group, each tool is designed to process input articles and output annotations in a different format. We modify these tools and enable them to read/write data in the proposed BioC format. We find that, using the BioC family of formats and functions, only minimal changes were required to build the newer versions of the tools. The resulting BioC wrapped toolkit, which we have named tmBioC, consists of our tools in BioC, an annotated full-text corpus in BioC, and a format detection and conversion tool. Furthermore, through participation in the 2013 BioCreative IV Interoperability Track, we empirically demonstrate that the tools in tmBioC can be more efficiently integrated with each other as well as with external tools: Our experimental results show that using BioC reduces >60% in lines of code for text-mining tool integration. The tmBioC toolkit is publicly available at http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/. Database URL: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/. Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williams, G.
About 40 miles outside of Price, UT, is Natomas Trail Mountain Coal Co. Owned and operated by Natomas Coal, Englewood, CO, Trail Mountain Coal is a two section mine using continuous miners, shuttle cars and belt haulage. Seam height averages about seven feet. Until about a year and a half ago, Trail Mountain Coal Co. was not much different from most mines. With about 120 employees, they were faced with the same problems seen throughout the industry such as high absenteeism, inexperience, poor communication and cooperation, and a host of other problems.
NASA Astrophysics Data System (ADS)
Zhironkin, S. A.; Khoreshok, A. A.; Tyulenev, M. A.; Barysheva, G. A.; Hellmer, M. C.
2016-08-01
This article describes the problems and prospects of development of coal mining in Kuzbass - the center of coal production in Siberia and Russia, in the framework of the major initiatives of the National Energy Strategy for the period until 2035. The structural character of the regional coal industry problems, caused by decline in investment activity, high level of fixed assets depreciation, slow development of deep coal processing and technological reduction of coal mining is shown.
Mining Health-Related Issues in Consumer Product Reviews by Using Scalable Text Analytics
Torii, Manabu; Tilak, Sameer S.; Doan, Son; Zisook, Daniel S.; Fan, Jung-wei
2016-01-01
In an era when most of our life activities are digitized and recorded, opportunities abound to gain insights about population health. Online product reviews present a unique data source that is currently underexplored. Health-related information, although scarce, can be systematically mined in online product reviews. Leveraging natural language processing and machine learning tools, we were able to mine 1.3 million grocery product reviews for health-related information. The objectives of the study were as follows: (1) conduct quantitative and qualitative analysis on the types of health issues found in consumer product reviews; (2) develop a machine learning classifier to detect reviews that contain health-related issues; and (3) gain insights about the task characteristics and challenges for text analytics to guide future research. PMID:27375358
Mining Health-Related Issues in Consumer Product Reviews by Using Scalable Text Analytics.
Torii, Manabu; Tilak, Sameer S; Doan, Son; Zisook, Daniel S; Fan, Jung-Wei
2016-01-01
In an era when most of our life activities are digitized and recorded, opportunities abound to gain insights about population health. Online product reviews present a unique data source that is currently underexplored. Health-related information, although scarce, can be systematically mined in online product reviews. Leveraging natural language processing and machine learning tools, we were able to mine 1.3 million grocery product reviews for health-related information. The objectives of the study were as follows: (1) conduct quantitative and qualitative analysis on the types of health issues found in consumer product reviews; (2) develop a machine learning classifier to detect reviews that contain health-related issues; and (3) gain insights about the task characteristics and challenges for text analytics to guide future research.
30 CFR 900.15 - Federal lands program cooperative agreements.
Code of Federal Regulations, 2010 CFR
2010-07-01
.... 900.15 Section 900.15 Mineral Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR PROGRAMS FOR THE CONDUCT OF SURFACE MINING OPERATIONS WITHIN EACH STATE INTRODUCTION § 900.15 Federal lands program cooperative agreements. The full text of any State and Federal...
30 CFR 900.12 - State regulatory programs.
Code of Federal Regulations, 2010 CFR
2010-07-01
....12 Mineral Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR PROGRAMS FOR THE CONDUCT OF SURFACE MINING OPERATIONS WITHIN EACH STATE INTRODUCTION § 900.12 State... to be codified under the applicable part number assigned to the State. The full text will not appear...
Literature Mining Methods for Toxicology and Construction of ...
Webinar Presentation on text-mining methodologies in use at NCCT and how they can be used to assist with the OECD Retinoid project. Presentation to 1st Workshop/Scientific Expert Group meeting on the OECD Retinoid Project - April 26, 2016 –Brussels, Presented remotely via web.
Untangling Topic Threads in Chat-Based Communication: A Case Study
2011-08-01
learning techniques such as clustering are very popular for analyzing text for topic identification (Anjewierden,, Kollöffel and Hulshof 2007; Adams...Anjewierden, A., Kollöffel, B., and Hulshof , C. (2007). Towards educational data mining: Using data mining methods for automated chat analysis to
Monitoring Metal Pollution Levels in Mine Wastes around a Coal Mine Site Using GIS
NASA Astrophysics Data System (ADS)
Sanliyuksel Yucel, D.; Yucel, M. A.; Ileri, B.
2017-11-01
In this case study, metal pollution levels in mine wastes at a coal mine site in Etili coal mine (Can coal basin, NW Turkey) are evaluated using geographical information system (GIS) tools. Etili coal mine was operated since the 1980s as an open pit. Acid mine drainage is the main environmental problem around the coal mine. The main environmental contamination source is mine wastes stored around the mine site. Mine wastes were dumped over an extensive area along the riverbeds, and are now abandoned. Mine waste samples were homogenously taken at 10 locations within the sampling area of 102.33 ha. The paste pH and electrical conductivity values of mine wastes ranged from 2.87 to 4.17 and 432 to 2430 μS/cm, respectively. Maximum Al, Fe, Mn, Pb, Zn and Ni concentrations of wastes were measured as 109300, 70600, 309.86, 115.2, 38 and 5.3 mg/kg, respectively. The Al, Fe and Pb concentrations of mine wastes are higher than world surface rock average values. The geochemical analysis results from the study area were presented in the form of maps. The GIS based environmental database will serve as a reference study for our future work.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Byerly, D.W.
1976-06-01
The following is a report of investigation on the geologic setting of several underground limestone mines in Ohio other than the PPG mine at Barberton, Ohio. Due to the element of available time, the writer is only able to deliver a brief synopsis of the geology of three sites visited. These three sites and the Barberton, Ohio site are the only underground limestone mines in Ohio to the best of the writer's knowledge. The sites visited include: (1) the Jonathan Mine located near Zanesville, Ohio, and currently operated by the Columbia Cement Corporation; (2) the abandoned Alpha Portland Cement Minemore » located near Ironton, Ohio; and (3) the Lewisburg Mine located at Lewisburg, Ohio, and currently being utilized as an underground storage facility. Other remaining possibilities where limestone is being mined underground are located in middle Ordovician strata near Carntown and Maysville, Kentucky. These are drift mines into a thick sequence of carbonates. The writer predicts, however, that these mines would have some problems with water due to the preponderance of carbonate rocks and the proximity of the mines to the Ohio River. None of the sites visited nor the sites in Kentucky have conditions comparable to the deep mine at Barberton, Ohio.« less
Control problems in armored face conveyors for longwall mines
DOE Office of Scientific and Technical Information (OSTI.GOV)
Broadfoot, A.R.; Betz, R.E.
1998-03-01
This paper is a tutorial discussion of the current difficulties being experienced with the performance of armored face conveyor (AFC) drive systems, as used in longwall mining. It presents the traditional approaches to the design of the drive system and highlights the inadequacies. The final part of the paper presents a possible solution approach using variable-speed drive systems, emphasizing the advantages of this approach. The paper is significant, in that it discusses, in one document, a number of problems related to the operation of longwall AFC`s. Furthermore, it presents a solution path for these problems. The details of the controlmore » strategies to solve the problems highlighted are left to a companion paper.« less
The Detection Method of Fire Abnormal Based on Directional Drilling in Complex Conditions of Mine
NASA Astrophysics Data System (ADS)
Huijun, Duan; Shijun, Hao; Jie, Feng
2018-06-01
In the light of more and more urgent hidden fire abnormal detection problem in complex conditions of mine, a method which is used directional drilling technology is put forward. The method can avoid the obstacles in mine, and complete the fire abnormal detection. This paper based on analyzing the trajectory control of directional drilling, measurement while drilling and the characteristic of open branch process, the project of the directional drilling is formulated combination with a complex condition mine, and the detection of fire abnormal is implemented. This method can provide technical support for fire prevention, which also can provide a new way for fire anomaly detection in the similar mine.
Evaluation of the Kloswall longwall mining system
NASA Astrophysics Data System (ADS)
Guay, P. J.
1982-04-01
A new longwal mining system specifically designed to extract a very deep web (48 inches or deeper) from a longwall panel was studied. Productivity and cost analysis comparing the new mining system with a conventional longwall operation taking a 30 inch wide web is presented. It is shown that the new system will increase annual production and return on investment in most cases. Conceptual drawings and specifications for a high capacity three drum shearer and a unique shield type of roof support specifically designed for very wide web operation are reported. The advantages and problems associated with wide web mining in general and as they relate specifically to the equipment selected for the new mining system are discussed.
NASA Technical Reports Server (NTRS)
Wier, C. E. (Principal Investigator); Powell, R. L.; Amato, R. V.; Russell, O. R.; Martin, K. R.
1975-01-01
The author has identified the following significant results. This investigation evaluated the applicability of a variety of sensor types, formats, and resolution capabilities to the study of both fuel and nonfuel mined lands. The image reinforcement provided by stereo viewing of the EREP images proved useful for identifying lineaments and for mined lands mapping. Skylab S190B color and color infrared transparencies were the most useful EREP imagery. New information on lineament and fracture patterns in the bedrock of Indiana and Illinois extracted from analysis of the Skylab imagery has contributed to furthering the geological understanding of this portion of the Illinois basin.
Psychosocial and health impacts of uranium mining and milling on Navajo lands.
Dawson, Susan E; Madsen, Gary E
2011-11-01
The uranium industry in the American Southwest has had profoundly negative impacts on American Indian communities. Navajo workers experienced significant health problems, including lung cancer and nonmalignant respiratory diseases, and psychosocial problems, such as depression and anxiety. There were four uranium processing mills and approximately 1,200 uranium mines on the Navajo Nation's over 27,000 square miles. In this paper, a chronology is presented of how uranium mining and milling impacted the lives of Navajo workers and their families. Local community leaders organized meetings across the reservation to inform workers and their families about the relationship between worker exposures and possible health problems. A reservation-wide effort resulted in activists working with political leaders and attorneys to write radiation compensation legislation, which was passed in 1990 as the Radiation Exposure Compensation Act (RECA) and included underground uranium miners, atomic downwinders, and nuclear test-site workers. Later efforts resulted in the inclusion of surface miners, ore truck haulers, and millworkers in the RECA Amendments of 2000. On the Navajo Nation, the Office of Navajo Uranium Workers was created to assist workers and their families to apply for RECA funds. Present issues concerning the Navajo and other uranium-impacted groups include those who worked in mining and milling after 1971 and are excluded from RECA. Perceptions about uranium health impacts have contributed recently to the Navajo people rejecting a resumption of uranium mining and milling on Navajo lands.
Mining Deployment Optimization
NASA Astrophysics Data System (ADS)
Čech, Jozef
2016-09-01
The deployment problem, researched primarily in the military sector, is emerging in some other industries, mining included. The principal decision is how to deploy some activities in space and time to achieve desired outcome while complying with certain requirements or limits. Requirements and limits are on the side constraints, while minimizing costs or maximizing some benefits are on the side of objectives. A model with application to mining of polymetallic deposit is presented. To obtain quick and immediate decision solutions for a mining engineer with experimental possibilities is the main intention of a computer-based tool. The task is to determine strategic deployment of mining activities on a deposit, meeting planned output from the mine and at the same time complying with limited reserves and haulage capacities. Priorities and benefits can be formulated by the planner.
William T. Plass; Willis G. Vogel
1973-01-01
A survey of 39 surface-mine sites in southern West Virginia showed that most of the spoils from current mining operations had a pH of 5.0 or higher. Soil-size material averaged 37 percent of the weight of the spoils sampled. A major problem for the establishment of vegetation was a deficiency of nitrogen and phosphorus. This can be corrected with additions of...
Sunbelt strives for diversification
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jackson, D.
1982-08-01
The Sunbelt Mining Co. is a subsidiary of the Public Service Co., of New Mexico which seeks out and develops additional resources for both its parent company and new customers. The De-Na-Zin surface mine in the Bisti coalfield of New Mexico is described, together with the aggregate crushing and screening facility that supplies road construction material to the mine. The problem of overburden removal and the reclamation of land in the desert is discussed.
Review of Studies of Mechanoelectrical Transformations in Rocks in Russia and Abroad
NASA Astrophysics Data System (ADS)
Pomishin, E.; Yavorovich, L.
2016-06-01
The problem of monitoring and forecast of dynamic manifestations of rock masses becomes immediate in the mining industry because of the growth of mining work intensity and changeover to the mining operations in deeper levels. The article presents a short review of the scientific works of foreign researchers for more complete and in-depth study of geophysical methods of control of the stress-strain state and bump hazard of rock masses.
Directed Selection of Biochars for Amending Metal Contaminated Mine Soils
Approximately 500,000 abandoned mines across the U.S. pose a considerable, pervasive risk to human health and the environment. World-wide the problem is even larger. Lime, organic matter, biosolids and other amendments have been used to decrease metal bioavailability in contami...
USE OF HYDROGEN RESPIROMETRY TO DETERMINE METAL TOXICITY TO SULFATE REDUCING BACTERIA
Acid mine drainage (AMD), an acidic metal-bearing wastewater poses a severe pollution problem attributed to post-mining activities. The metals (metal sulfates) encountered in AMD and considered of concern for risk assessment are: arsenic, cadmium, aluminum, manganese, iron, zinc ...
A New Data Mining Scheme Using Artificial Neural Networks
Kamruzzaman, S. M.; Jehad Sarkar, A. M.
2011-01-01
Classification is one of the data mining problems receiving enormous attention in the database community. Although artificial neural networks (ANNs) have been successfully applied in a wide range of machine learning applications, they are however often regarded as black boxes, i.e., their predictions cannot be explained. To enhance the explanation of ANNs, a novel algorithm to extract symbolic rules from ANNs has been proposed in this paper. ANN methods have not been effectively utilized for data mining tasks because how the classifications were made is not explicitly stated as symbolic rules that are suitable for verification or interpretation by human experts. With the proposed approach, concise symbolic rules with high accuracy, that are easily explainable, can be extracted from the trained ANNs. Extracted rules are comparable with other methods in terms of number of rules, average number of conditions for a rule, and the accuracy. The effectiveness of the proposed approach is clearly demonstrated by the experimental results on a set of benchmark data mining classification problems. PMID:22163866
NASA Astrophysics Data System (ADS)
Gaunaurd, Guillermo C.; Strifors, Hans C.
Landmines have been laid in conflicts around the world, and they cause enormous humanitarian problems in more than 60 countries, killing, mutilating, or maiming the innocent every day. They do not differentiate between elderly, men, women, children, or animals, and they are triggered off by the victims themselves. Detection and clearance of landmines, however, have turned out to be an immensely challenging problem. A traditional means for detecting mines is the metal detector. The detector head is slowly moved over the suspicious terrain, and it gives out audible alerts when metal in the ground disturbs the magnetic field the detector generates. In general, the number of false alarms caused by various (man-made) metal objects in the ground is a great deal larger than the number of mines. This makes mine clearance a tiring task that demands the highest level of concentration. Moreover, a hazard involved is that the metal detector could be too insensitive to mines of low-metal content.
Kernel Methods for Mining Instance Data in Ontologies
NASA Astrophysics Data System (ADS)
Bloehdorn, Stephan; Sure, York
The amount of ontologies and meta data available on the Web is constantly growing. The successful application of machine learning techniques for learning of ontologies from textual data, i.e. mining for the Semantic Web, contributes to this trend. However, no principal approaches exist so far for mining from the Semantic Web. We investigate how machine learning algorithms can be made amenable for directly taking advantage of the rich knowledge expressed in ontologies and associated instance data. Kernel methods have been successfully employed in various learning tasks and provide a clean framework for interfacing between non-vectorial data and machine learning algorithms. In this spirit, we express the problem of mining instances in ontologies as the problem of defining valid corresponding kernels. We present a principled framework for designing such kernels by means of decomposing the kernel computation into specialized kernels for selected characteristics of an ontology which can be flexibly assembled and tuned. Initial experiments on real world Semantic Web data enjoy promising results and show the usefulness of our approach.
Comparison between BIDE, PrefixSpan, and TRuleGrowth for Mining of Indonesian Text
NASA Astrophysics Data System (ADS)
Sa'adillah Maylawati, Dian; Irfan, Mohamad; Budiawan Zulfikar, Wildan
2017-01-01
Mining proscess for Indonesian language still be an interesting research. Multiple of words representation was claimed can keep the meaning of text better than bag of words. In this paper, we compare several sequential pattern algortihm, among others BIDE (BIDirectional Extention), PrefixSpan, and TRuleGrowth. All of those algorithm produce frequent word sequence to keep the meaning of text. However, the experiment result, with 14.006 of Indonesian tweet from Twitter, shows that BIDE can produce more efficient frequent word sequence than PrefixSpan and TRuleGrowth without missing the meaning of text. Then, the average of time process of PrefixSpan is faster than BIDE and TRuleGrowth. In the other hand, PrefixSpan and TRuleGrowth is more efficient in using memory than BIDE.
Massive problem reports mining and analysis based parallelism for similar search
NASA Astrophysics Data System (ADS)
Zhou, Ya; Hu, Cailin; Xiong, Han; Wei, Xiafei; Li, Ling
2017-05-01
Massive problem reports and solutions accumulated over time and continuously collected in XML Spreadsheet (XMLSS) format from enterprises and organizations, which record a series of comprehensive description about problems that can help technicians to trace problems and their solutions. It's a significant and challenging issue to effectively manage and analyze these massive semi-structured data to provide similar problem solutions, decisions of immediate problem and assisting product optimization for users during hardware and software maintenance. For this purpose, we build a data management system to manage, mine and analyze these data search results that can be categorized and organized into several categories for users to quickly find out where their interesting results locate. Experiment results demonstrate that this system is better than traditional centralized management system on the performance and the adaptive capability of heterogeneous data greatly. Besides, because of re-extracting topics, it enables each cluster to be described more precise and reasonable.
Maier, Raina M.; Díaz-Barriga, Fernando; Field, James A.; Hopkins, James; Klein, Bern; Poulton, Mary M.
2016-01-01
Increasing global demand for metals is straining the ability of the mining industry to physically keep up with demand (physical scarcity). On the other hand, social issues including the environmental and human health consequences of mining as well as the disparity in income distribution from mining revenues are disproportionately felt at the local community level. This has created social rifts, particularly in the developing world, between affected communities and both industry and governments. Such rifts can result in a disruption of the steady supply of metals (situational scarcity). Here we discuss the importance of mining in relationship to poverty, identify steps that have been taken to create a framework for socially responsible mining, and then discuss the need for academia to work in partnership with communities, government, and industry to develop trans-disciplinary research-based step change solutions to the intertwined problems of physical and situational scarcity. PMID:24552962
A probabilistic approach for mine burial prediction
NASA Astrophysics Data System (ADS)
Barbu, Costin; Valent, Philip; Richardson, Michael; Abelev, Andrei; Plant, Nathaniel
2004-09-01
Predicting the degree of burial of mines in soft sediments is one of the main concerns of Naval Mine CounterMeasures (MCM) operations. This is a difficult problem to solve due to uncertainties and variability of the sediment parameters (i.e., density and shear strength) and of the mine state at contact with the seafloor (i.e., vertical and horizontal velocity, angular rotation rate, and pitch angle at the mudline). A stochastic approach is proposed in this paper to better incorporate the dynamic nature of free-falling cylindrical mines in the modeling of impact burial. The orientation, trajectory and velocity of cylindrical mines, after about 4 meters free-fall in the water column, are very strongly influenced by boundary layer effects causing quite chaotic behavior. The model's convolution of the uncertainty through its nonlinearity is addressed by employing Monte Carlo simulations. Finally a risk analysis based on the probability of encountering an undetectable mine is performed.
Online discourse on fibromyalgia: text-mining to identify clinical distinction and patient concerns.
Park, Jungsik; Ryu, Young Uk
2014-10-07
The purpose of this study was to evaluate the possibility of using text-mining to identify clinical distinctions and patient concerns in online memoires posted by patients with fibromyalgia (FM). A total of 399 memoirs were collected from an FM group website. The unstructured data of memoirs associated with FM were collected through a crawling process and converted into structured data with a concordance, parts of speech tagging, and word frequency. We also conducted a lexical analysis and phrase pattern identification. After examining the data, a set of FM-related keywords were obtained and phrase net relationships were set through a web-based visualization tool. The clinical distinction of FM was verified. Pain is the biggest issue to the FM patients. The pains were affecting body parts including 'muscles,' 'leg,' 'neck,' 'back,' 'joints,' and 'shoulders' with accompanying symptoms such as 'spasms,' 'stiffness,' and 'aching,' and were described as 'sever,' 'chronic,' and 'constant.' This study also demonstrated that it was possible to understand the interests and concerns of FM patients through text-mining. FM patients wanted to escape from the pain and symptoms, so they were interested in medical treatment and help. Also, they seemed to have interest in their work and occupation, and hope to continue to live life through the relationships with the people around them. This research shows the potential for extracting keywords to confirm the clinical distinction of a certain disease, and text-mining can help objectively understand the concerns of patients by generalizing their large number of subjective illness experiences. However, it is believed that there are limitations to the processes and methods for organizing and classifying large amounts of text, so these limits have to be considered when analyzing the results. The development of research methodology to overcome these limitations is greatly needed.
Unsupervised text mining for assessing and augmenting GWAS results.
Ailem, Melissa; Role, François; Nadif, Mohamed; Demenais, Florence
2016-04-01
Text mining can assist in the analysis and interpretation of large-scale biomedical data, helping biologists to quickly and cheaply gain confirmation of hypothesized relationships between biological entities. We set this question in the context of genome-wide association studies (GWAS), an actively emerging field that contributed to identify many genes associated with multifactorial diseases. These studies allow to identify groups of genes associated with the same phenotype, but provide no information about the relationships between these genes. Therefore, our objective is to leverage unsupervised text mining techniques using text-based cosine similarity comparisons and clustering applied to candidate and random gene vectors, in order to augment the GWAS results. We propose a generic framework which we used to characterize the relationships between 10 genes reported associated with asthma by a previous GWAS. The results of this experiment showed that the similarities between these 10 genes were significantly stronger than would be expected by chance (one-sided p-value<0.01). The clustering of observed and randomly selected gene also allowed to generate hypotheses about potential functional relationships between these genes and thus contributed to the discovery of new candidate genes for asthma. Copyright © 2016 Elsevier Inc. All rights reserved.
Stansfield, Claire; O'Mara-Eves, Alison; Thomas, James
2017-09-01
Using text mining to aid the development of database search strings for topics described by diverse terminology has potential benefits for systematic reviews; however, methods and tools for accomplishing this are poorly covered in the research methods literature. We briefly review the literature on applications of text mining for search term development for systematic reviewing. We found that the tools can be used in 5 overarching ways: improving the precision of searches; identifying search terms to improve search sensitivity; aiding the translation of search strategies across databases; searching and screening within an integrated system; and developing objectively derived search strategies. Using a case study and selected examples, we then reflect on the utility of certain technologies (term frequency-inverse document frequency and Termine, term frequency, and clustering) in improving the precision and sensitivity of searches. Challenges in using these tools are discussed. The utility of these tools is influenced by the different capabilities of the tools, the way the tools are used, and the text that is analysed. Increased awareness of how the tools perform facilitates the further development of methods for their use in systematic reviews. Copyright © 2017 John Wiley & Sons, Ltd.
U-Compare: share and compare text mining tools with UIMA.
Kano, Yoshinobu; Baumgartner, William A; McCrohon, Luke; Ananiadou, Sophia; Cohen, K Bretonnel; Hunter, Lawrence; Tsujii, Jun'ichi
2009-08-01
Due to the increasing number of text mining resources (tools and corpora) available to biologists, interoperability issues between these resources are becoming significant obstacles to using them effectively. UIMA, the Unstructured Information Management Architecture, is an open framework designed to aid in the construction of more interoperable tools. U-Compare is built on top of the UIMA framework, and provides both a concrete framework for out-of-the-box text mining and a sophisticated evaluation platform allowing users to run specific tools on any target text, generating both detailed statistics and instance-based visualizations of outputs. U-Compare is a joint project, providing the world's largest, and still growing, collection of UIMA-compatible resources. These resources, originally developed by different groups for a variety of domains, include many famous tools and corpora. U-Compare can be launched straight from the web, without needing to be manually installed. All U-Compare components are provided ready-to-use and can be combined easily via a drag-and-drop interface without any programming. External UIMA components can also simply be mixed with U-Compare components, without distinguishing between locally and remotely deployed resources. http://u-compare.org/
Automated annotation of functional imaging experiments via multi-label classification
Turner, Matthew D.; Chakrabarti, Chayan; Jones, Thomas B.; Xu, Jiawei F.; Fox, Peter T.; Luger, George F.; Laird, Angela R.; Turner, Jessica A.
2013-01-01
Identifying the experimental methods in human neuroimaging papers is important for grouping meaningfully similar experiments for meta-analyses. Currently, this can only be done by human readers. We present the performance of common machine learning (text mining) methods applied to the problem of automatically classifying or labeling this literature. Labeling terms are from the Cognitive Paradigm Ontology (CogPO), the text corpora are abstracts of published functional neuroimaging papers, and the methods use the performance of a human expert as training data. We aim to replicate the expert's annotation of multiple labels per abstract identifying the experimental stimuli, cognitive paradigms, response types, and other relevant dimensions of the experiments. We use several standard machine learning methods: naive Bayes (NB), k-nearest neighbor, and support vector machines (specifically SMO or sequential minimal optimization). Exact match performance ranged from only 15% in the worst cases to 78% in the best cases. NB methods combined with binary relevance transformations performed strongly and were robust to overfitting. This collection of results demonstrates what can be achieved with off-the-shelf software components and little to no pre-processing of raw text. PMID:24409112
The use of low-cost adsorbents for wastewater purification in mining industries.
Iakovleva, Evgenia; Sillanpää, Mika
2013-11-01
Recently, great attention has been paid to the environmental problems in mining industry. At present there are different ways of mineral processing, as well as various methods of wastewater treatment, most of them are expensive. Work is ongoing to find low-cost treatments. In this article, low-cost adsorbents, potentially useful for wastewater treatment on mining and metallurgical plants, are reviewed; their characteristics, advantages, and disadvantages of their application are compared. Also adsorption of different metals and radioactive compounds from acidic environment similar to composition of mining and metallurgical wastewaters is considered.
Vlsi implementation of flexible architecture for decision tree classification in data mining
NASA Astrophysics Data System (ADS)
Sharma, K. Venkatesh; Shewandagn, Behailu; Bhukya, Shankar Nayak
2017-07-01
The Data mining algorithms have become vital to researchers in science, engineering, medicine, business, search and security domains. In recent years, there has been a terrific raise in the size of the data being collected and analyzed. Classification is the main difficulty faced in data mining. In a number of the solutions developed for this problem, most accepted one is Decision Tree Classification (DTC) that gives high precision while handling very large amount of data. This paper presents VLSI implementation of flexible architecture for Decision Tree classification in data mining using c4.5 algorithm.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kargupta, H.; Stafford, B.; Hamzaoglu, I.
This paper describes an experimental parallel/distributed data mining system PADMA (PArallel Data Mining Agents) that uses software agents for local data accessing and analysis and a web based interface for interactive data visualization. It also presents the results of applying PADMA for detecting patterns in unstructured texts of postmortem reports and laboratory test data for Hepatitis C patients.
29 CFR 570.33 - Prohibited occupations for minors 14 and 15 years of age.
Code of Federal Regulations, 2010 CFR
2010-07-01
... shall apply to all occupations other than the following: (a) Manufacturing, mining, or processing... revised text is set forth as follows: § 570.33 Occupations that are prohibited to minors 14 and 15 years... age: (a) Manufacturing, mining, or processing occupations, including occupations requiring the...
Federal Register 2010, 2011, 2012, 2013, 2014
2011-07-28
... considered but eliminated from detailed analysis include conventional uranium mining and milling, conventional mining and heap leach processing, alternative site location, alternate lixiviants, and alternate...'s Agencywide Document Access and Management System (ADAMS), which provides text and image files of...
30 CFR 732.17 - State program amendments.
Code of Federal Regulations, 2010 CFR
2010-07-01
... Mineral Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR... the number or size of coal exploration or surface coal mining and reclamation operations in the State... amendment(s) is being reviewed by the Director and will include the following: (i) The text or a summary of...
30 CFR 745.11 - Application and agreement.
Code of Federal Regulations, 2010 CFR
2010-07-01
....11 Mineral Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR... approval under part 731 of this chapter, and has or may have within the State surface coal mining and... the full text of the terms of the proposed cooperative agreement as submitted or as subsequently...
29 CFR 570.118 - Sixteen-year minimum.
Code of Federal Regulations, 2010 CFR
2010-07-01
... for employment in manufacturing or mining occupations. Furthermore, this age minimum is applicable to... convenience of the user, the revised text is set forth as follows: § 570.118 Sixteen-year minimum. The Act sets a 16-year-age minimum for employment in manufacturing or mining occupations, although under FLSA...
Code of Federal Regulations, 2010 CFR
2010-07-01
... other than the following: (1) Manufacturing, (2) Mining, (3) An occupation found by the Secretary to be..., the revised text is set forth as follows: § 570.122 General. (a) Specific exemptions from the child... sixteen years in any occupation other than manufacturing, mining, or an occupation found by the Secretary...
29 CFR 570.119 - Fourteen-year minimum.
Code of Federal Regulations, 2010 CFR
2010-07-01
... occupations other than manufacturing and mining, the Secretary is authorized to issue regulations or orders... Subpart C of this part. 29-30 [Reserved] (a) Manufacturing, mining, or processing occupations; (b... of the user, the revised text is set forth as follows: § 570.119 Fourteen-year minimum. With respect...
OSCAR4: a flexible architecture for chemical text-mining.
Jessop, David M; Adams, Sam E; Willighagen, Egon L; Hawizy, Lezan; Murray-Rust, Peter
2011-10-14
The Open-Source Chemistry Analysis Routines (OSCAR) software, a toolkit for the recognition of named entities and data in chemistry publications, has been developed since 2002. Recent work has resulted in the separation of the core OSCAR functionality and its release as the OSCAR4 library. This library features a modular API (based on reduction of surface coupling) that permits client programmers to easily incorporate it into external applications. OSCAR4 offers a domain-independent architecture upon which chemistry specific text-mining tools can be built, and its development and usage are discussed.
Underground Mining Method Selection Using WPM and PROMETHEE
NASA Astrophysics Data System (ADS)
Balusa, Bhanu Chander; Singam, Jayanthu
2018-04-01
The aim of this paper is to represent the solution to the problem of selecting suitable underground mining method for the mining industry. It is achieved by using two multi-attribute decision making techniques. These two techniques are weighted product method (WPM) and preference ranking organization method for enrichment evaluation (PROMETHEE). In this paper, analytic hierarchy process is used for weight's calculation of the attributes (i.e. parameters which are used in this paper). Mining method selection depends on physical parameters, mechanical parameters, economical parameters and technical parameters. WPM and PROMETHEE techniques have the ability to consider the relationship between the parameters and mining methods. The proposed techniques give higher accuracy and faster computation capability when compared with other decision making techniques. The proposed techniques are presented to determine the effective mining method for bauxite mine. The results of these techniques are compared with methods used in the earlier research works. The results show, conventional cut and fill method is the most suitable mining method.
Mine-hunting dolphins of the Navy
NASA Astrophysics Data System (ADS)
Moore, Patrick W.
1997-07-01
Current counter-mine and obstacle avoidance technology is inadequate, and limits the Navy's capability to conduct shallow water (SW) and very shallow water (VSW) MCM in support of beach assaults by Marine Corps forces. Without information as to the location or density of mined beach areas, it must be assumed that if mines are present in one area then they are present in all areas. Marine mammal systems (MMS) are an unusual, effective and unique solution to current problems of mine and obstacle hunting. In the US Navy Mine Warfare Plan for 1994-1995 Marine Mammal Systems are explicitly identified as the Navy's only means of countering buried mines and the best means for dealing with close-tethered mines. The dolphins in these systems possess a biological sonar specifically adapted for their shallow and very shallow water habitat. Research has demonstrated that the dolphin biosonar outperforms any current hardware system available for SW and VSW applications. This presentation will cover current Fleet MCM systems and future technology application to the littoral region.
Research on Customer Value Based on Extension Data Mining
NASA Astrophysics Data System (ADS)
Chun-Yan, Yang; Wei-Hua, Li
Extenics is a new discipline for dealing with contradiction problems with formulize model. Extension data mining (EDM) is a product combining Extenics with data mining. It explores to acquire the knowledge based on extension transformations, which is called extension knowledge (EK), taking advantage of extension methods and data mining technology. EK includes extensible classification knowledge, conductive knowledge and so on. Extension data mining technology (EDMT) is a new data mining technology that mining EK in databases or data warehouse. Customer value (CV) can weigh the essentiality of customer relationship for an enterprise according to an enterprise as a subject of tasting value and customers as objects of tasting value at the same time. CV varies continually. Mining the changing knowledge of CV in databases using EDMT, including quantitative change knowledge and qualitative change knowledge, can provide a foundation for that an enterprise decides the strategy of customer relationship management (CRM). It can also provide a new idea for studying CV.
Health impacts of coal and coal use: Possible solutions
Finkelman, R.B.; Orem, W.; Castranova, V.; Tatu, C.A.; Belkin, H.E.; Zheng, B.; Lerch, H.E.; Maharaj, S.V.; Bates, A.L.
2002-01-01
Coal will be a dominant energy source in both developed and developing countries for at least the first half of the 21st century. Environmental problems associated with coal, before mining, during mining, in storage, during combustion, and postcombustion waste products are well known and are being addressed by ongoing research. The connection between potential environmental problems with human health is a fairly new field and requires the cooperation of both the geoscience and medical disciplines. Three research programs that illustrate this collaboration are described and used to present a range of human health problems that are potentially caused by coal. Domestic combustion of coal in China has, in some cases, severely affected human health. Both on a local and regional scale, human health has been adversely affected by coals containing arsenic, fluorine, selenium, and possibly, mercury. Balkan endemic nephropathy (BEN), an irreversible kidney disease of unknown origin, has been related to the proximity of Pliocene lignite deposits. The working hypothesis is that groundwater is leaching toxic organic compounds as it passes through the lignites and that these organics are then ingested by the local population contributing to this health problem. Human disease associated with coal mining mainly results from inhalation of particulate matter during the mining process. The disease is Coal Worker's Pneumoconiosis characterized by coal dust-induced lesions in the gas exchange regions of the lung; the coal worker's "black lung disease". ?? 2002 Elsevier Science B.V. All rights reserved.
Comparative Analysis of Document level Text Classification Algorithms using R
NASA Astrophysics Data System (ADS)
Syamala, Maganti; Nalini, N. J., Dr; Maguluri, Lakshamanaphaneendra; Ragupathy, R., Dr.
2017-08-01
From the past few decades there has been tremendous volumes of data available in Internet either in structured or unstructured form. Also, there is an exponential growth of information on Internet, so there is an emergent need of text classifiers. Text mining is an interdisciplinary field which draws attention on information retrieval, data mining, machine learning, statistics and computational linguistics. And to handle this situation, a wide range of supervised learning algorithms has been introduced. Among all these K-Nearest Neighbor(KNN) is efficient and simplest classifier in text classification family. But KNN suffers from imbalanced class distribution and noisy term features. So, to cope up with this challenge we use document based centroid dimensionality reduction(CentroidDR) using R Programming. By combining these two text classification techniques, KNN and Centroid classifiers, we propose a scalable and effective flat classifier, called MCenKNN which works well substantially better than CenKNN.
Mining Consumer Health Vocabulary from Community-Generated Text
Vydiswaran, V.G. Vinod; Mei, Qiaozhu; Hanauer, David A.; Zheng, Kai
2014-01-01
Community-generated text corpora can be a valuable resource to extract consumer health vocabulary (CHV) and link them to professional terminologies and alternative variants. In this research, we propose a pattern-based text-mining approach to identify pairs of CHV and professional terms from Wikipedia, a large text corpus created and maintained by the community. A novel measure, leveraging the ratio of frequency of occurrence, was used to differentiate consumer terms from professional terms. We empirically evaluated the applicability of this approach using a large data sample consisting of MedLine abstracts and all posts from an online health forum, MedHelp. The results show that the proposed approach is able to identify synonymous pairs and label the terms as either consumer or professional term with high accuracy. We conclude that the proposed approach provides great potential to produce a high quality CHV to improve the performance of computational applications in processing consumer-generated health text. PMID:25954426
Automation and robotics technology for intelligent mining systems
NASA Technical Reports Server (NTRS)
Welsh, Jeffrey H.
1989-01-01
The U.S. Bureau of Mines is approaching the problems of accidents and efficiency in the mining industry through the application of automation and robotics to mining systems. This technology can increase safety by removing workers from hazardous areas of the mines or from performing hazardous tasks. The short-term goal of the Automation and Robotics program is to develop technology that can be implemented in the form of an autonomous mining machine using current continuous mining machine equipment. In the longer term, the goal is to conduct research that will lead to new intelligent mining systems that capitalize on the capabilities of robotics. The Bureau of Mines Automation and Robotics program has been structured to produce the technology required for the short- and long-term goals. The short-term goal of application of automation and robotics to an existing mining machine, resulting in autonomous operation, is expected to be accomplished within five years. Key technology elements required for an autonomous continuous mining machine are well underway and include machine navigation systems, coal-rock interface detectors, machine condition monitoring, and intelligent computer systems. The Bureau of Mines program is described, including status of key technology elements for an autonomous continuous mining machine, the program schedule, and future work. Although the program is directed toward underground mining, much of the technology being developed may have applications for space systems or mining on the Moon or other planets.
Figueroa, Rosa L; Flores, Christopher A
2016-08-01
Obesity is a chronic disease with an increasing impact on the world's population. In this work, we present a method of identifying obesity automatically using text mining techniques and information related to body weight measures and obesity comorbidities. We used a dataset of 3015 de-identified medical records that contain labels for two classification problems. The first classification problem distinguishes between obesity, overweight, normal weight, and underweight. The second classification problem differentiates between obesity types: super obesity, morbid obesity, severe obesity and moderate obesity. We used a Bag of Words approach to represent the records together with unigram and bigram representations of the features. We implemented two approaches: a hierarchical method and a nonhierarchical one. We used Support Vector Machine and Naïve Bayes together with ten-fold cross validation to evaluate and compare performances. Our results indicate that the hierarchical approach does not work as well as the nonhierarchical one. In general, our results show that Support Vector Machine obtains better performances than Naïve Bayes for both classification problems. We also observed that bigram representation improves performance compared with unigram representation.
AMELIORATION OF ACID MINE DRAINAGE USING REACTIVE MIXTURES IN PERMEABLE REACTIVE BARRIERS
The generation and release of acidic drainage from mine wastes is an environmental problem of international scale. The use of zero-valent iron and/or iron mixtures in subsurface Permeable Reactive Barriers (PRB) presents a possible passive alternative for remediating acidic grou...
Report #2007-P-00016, April 2, 2007. We did not find evidence to indicate that the EPA's decision making to investigate environmental conditions at the Ringwood Mines/Landfill site were affected by the area’s racial, cultural, or socioeconomic status.
Data Reduction Algorithm Using Nonnegative Matrix Factorization with Nonlinear Constraints
NASA Astrophysics Data System (ADS)
Sembiring, Pasukat
2017-12-01
Processing ofdata with very large dimensions has been a hot topic in recent decades. Various techniques have been proposed in order to execute the desired information or structure. Non- Negative Matrix Factorization (NMF) based on non-negatives data has become one of the popular methods for shrinking dimensions. The main strength of this method is non-negative object, the object model by a combination of some basic non-negative parts, so as to provide a physical interpretation of the object construction. The NMF is a dimension reduction method thathasbeen used widely for numerous applications including computer vision,text mining, pattern recognitions,and bioinformatics. Mathematical formulation for NMF did not appear as a convex optimization problem and various types of algorithms have been proposed to solve the problem. The Framework of Alternative Nonnegative Least Square(ANLS) are the coordinates of the block formulation approaches that have been proven reliable theoretically and empirically efficient. This paper proposes a new algorithm to solve NMF problem based on the framework of ANLS.This algorithm inherits the convergenceproperty of the ANLS framework to nonlinear constraints NMF formulations.
NASA Astrophysics Data System (ADS)
Masaitis, A.
2013-12-01
Conflicts in the development of mining projects are now common between the mining proponents, NGO's and communities. These conflicts can sometimes be alleviated by early development of modes of communication, and a formal discussion format that allows airing of concerns and potential resolution of problems. One of the methods that can formalize this process is to establish a Good Neighbor Agreement (GNA), which deals specifically with challenges in relationships between mining operations and the local communities. It is a new practice related to mining operations that are oriented toward social needs and concerns of local communities that arise during the normal life of a mine, which can achieve sustainable mining practices in both developing and developed countries. The GNA project being currently developed at the University of Nevada, Reno in cooperation with the Newmont Mining Corporation has a goal to create an open company/community dialog that is based on the international standards and that will help identify and address sociological and environmental concerns associated with mining, as well as find methods for communication and conflict resolution. GNA standards should be based on trust doctrine, open information access, and community involvement in the decision making process. It should include the following components: emergency response and community communications; environmental issues, including air and water quality standards; reclamation and recultivation; socio-economic issues: transportation, safety, training, and local hiring; and financial issues, particularly related to mitigation offsets and community needs. The GNA standards help identify and evaluate conflict criteria in mining/community relationships; determine the status of concerns; focus on the local political and government systems; separate the acute and the chronic concerns; determine the role and responsibilities of stakeholders; analyze problem resolution feasibility; maintain the community involvement and support through economic benefits and environmental safeguards; develop options for the concerns resolution; develop and manage short and long-term plans. Difficulties in establishing the GNA standards include identification of the full list of stakeholders, lack of responsible environmental protection practices, dependence on the government and political system, lack of will to disclose full information to the public. It is further complicated by the lack of insurance/bonding policies, and by the lack of audit and monitoring that could determine the level of exposure of the local community and the environment to the contaminants released at the mine sites. Since many problems of mines can occur during closure and post-closure, GNA's should address those issues also. Determined the process for the GNA implementation as a conflict prevention/resolution tool, analyzed conflict/concerns criteria associated with mining operations, determined the role of the stakeholders, worked out the process of stakeholders monitoring, carried out the sociological survey of the stakeholders and the community. Frequent conflicts between mining companies and surrounding communities that lead to work disruptions or even mine closures show the necessity of a less confrontational approach to environmental and social justice. Establishment of GNA standards for use in both developed and developing nations can decrease these conflicts.
Implicit prosody mining based on the human eye image capture technology
NASA Astrophysics Data System (ADS)
Gao, Pei-pei; Liu, Feng
2013-08-01
The technology of eye tracker has become the main methods of analyzing the recognition issues in human-computer interaction. Human eye image capture is the key problem of the eye tracking. Based on further research, a new human-computer interaction method introduced to enrich the form of speech synthetic. We propose a method of Implicit Prosody mining based on the human eye image capture technology to extract the parameters from the image of human eyes when reading, control and drive prosody generation in speech synthesis, and establish prosodic model with high simulation accuracy. Duration model is key issues for prosody generation. For the duration model, this paper put forward a new idea for obtaining gaze duration of eyes when reading based on the eye image capture technology, and synchronous controlling this duration and pronunciation duration in speech synthesis. The movement of human eyes during reading is a comprehensive multi-factor interactive process, such as gaze, twitching and backsight. Therefore, how to extract the appropriate information from the image of human eyes need to be considered and the gaze regularity of eyes need to be obtained as references of modeling. Based on the analysis of current three kinds of eye movement control model and the characteristics of the Implicit Prosody reading, relative independence between speech processing system of text and eye movement control system was discussed. It was proved that under the same text familiarity condition, gaze duration of eyes when reading and internal voice pronunciation duration are synchronous. The eye gaze duration model based on the Chinese language level prosodic structure was presented to change previous methods of machine learning and probability forecasting, obtain readers' real internal reading rhythm and to synthesize voice with personalized rhythm. This research will enrich human-computer interactive form, and will be practical significance and application prospect in terms of disabled assisted speech interaction. Experiments show that Implicit Prosody mining based on the human eye image capture technology makes the synthesized speech has more flexible expressions.
Wagland, Richard; Recio-Saucedo, Alejandra; Simon, Michael; Bracher, Michael; Hunt, Katherine; Foster, Claire; Downing, Amy; Glaser, Adam; Corner, Jessica
2016-08-01
Quality of cancer care may greatly impact on patients' health-related quality of life (HRQoL). Free-text responses to patient-reported outcome measures (PROMs) provide rich data but analysis is time and resource-intensive. This study developed and tested a learning-based text-mining approach to facilitate analysis of patients' experiences of care and develop an explanatory model illustrating impact on HRQoL. Respondents to a population-based survey of colorectal cancer survivors provided free-text comments regarding their experience of living with and beyond cancer. An existing coding framework was tested and adapted, which informed learning-based text mining of the data. Machine-learning algorithms were trained to identify comments relating to patients' specific experiences of service quality, which were verified by manual qualitative analysis. Comparisons between coded retrieved comments and a HRQoL measure (EQ5D) were explored. The survey response rate was 63.3% (21 802/34 467), of which 25.8% (n=5634) participants provided free-text comments. Of retrieved comments on experiences of care (n=1688), over half (n=1045, 62%) described positive care experiences. Most negative experiences concerned a lack of post-treatment care (n=191, 11% of retrieved comments) and insufficient information concerning self-management strategies (n=135, 8%) or treatment side effects (n=160, 9%). Associations existed between HRQoL scores and coded algorithm-retrieved comments. Analysis indicated that the mechanism by which service quality impacted on HRQoL was the extent to which services prevented or alleviated challenges associated with disease and treatment burdens. Learning-based text mining techniques were found useful and practical tools to identify specific free-text comments within a large dataset, facilitating resource-efficient qualitative analysis. This method should be considered for future PROM analysis to inform policy and practice. Study findings indicated that perceived care quality directly impacts on HRQoL. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
NASA Technical Reports Server (NTRS)
1975-01-01
The results of a study of the weather sensitive features of near shore and deep water ocean mining industries are described. Problems with the evaluation of economic benefits for the deep water ocean mining industry are attributed to the relative immaturity and highly proprietary nature of the industry. Case studies on the gold industry, diamond industry, tin industry and sand and gravel industry are cited.
Acoustic Communications Considerations for Collaborative Simultaneous Localization and Mapping
2014-12-01
combat. The United States experienced this problem first hand on April 14, 1988, when the USS Samuel B. Roberts (FFG 58) struck a mine in the Persian...Strait of Hormuz where the USS Samuel B. Roberts was operating. The low cost and advancing technology of naval mines makes them particularly well...access and ensure the free flow of maritime trade in the global commons. The present means of mine countermeasures largely reside on surface ships and
Extraction and Classification of Emotions for Business Research
NASA Astrophysics Data System (ADS)
Verma, Rajib
The commercial study of emotions has not embraced Internet / social mining yet, even though it has important applications in management. This is surprising since the emotional content is freeform, wide spread, can give a better indication of feelings (for instance with taboo subjects), and is inexpensive compared to other business research methods. A brief framework for applying text mining to this new research domain is shown and classification issues are discussed in an effort to quickly get businessman and researchers to adopt the mining methodology.
Clustering and Dimensionality Reduction to Discover Interesting Patterns in Binary Data
NASA Astrophysics Data System (ADS)
Palumbo, Francesco; D'Enza, Alfonso Iodice
The attention towards binary data coding increased consistently in the last decade due to several reasons. The analysis of binary data characterizes several fields of application, such as market basket analysis, DNA microarray data, image mining, text mining and web-clickstream mining. The paper illustrates two different approaches exploiting a profitable combination of clustering and dimensionality reduction for the identification of non-trivial association structures in binary data. An application in the Association Rules framework supports the theory with the empirical evidence.
NASA Astrophysics Data System (ADS)
Durand, J. F.
2012-06-01
The Witwatersrand has been subjected to geological exploration, mining activities, parallel industrial development and associated settlement patterns over the past century. The gold mines brought with them not only development, employment and wealth, but also the most devastating war in the history of South Africa, civil unrest, economical inequality, social uprooting, pollution, negative health impacts and ecological destruction. One of the most consistent and pressing problems caused by mining has been its impact on the water bodies in and adjacent to the Witwatersrand. The dewatering and rewatering of the karstic aquifer overlying and adjacent to the Witwatersrand Supergroup and the pollution caused by Acid Mine Drainage (AMD) are some of the most serious consequences of gold mining in South Africa and will affect the lives of many South Africans.
Management of the water balance and quality in mining areas
NASA Astrophysics Data System (ADS)
Pasanen, Antti; Krogerus, Kirsti; Mroueh, Ulla-Maija; Turunen, Kaisa; Backnäs, Soile; Vento, Tiia; Veijalainen, Noora; Hentinen, Kimmo; Korkealaakso, Juhani
2015-04-01
Although mining companies have long been conscious of water related risks they still face environmental management problems. These problems mainly emerge because mine sites' water balances have not been adequately assessed in the stage of the planning of mines. More consistent approach is required to help mining companies identify risks and opportunities related to the management of water resources in all stages of mining. This approach requires that the water cycle of a mine site is interconnected with the general hydrologic water cycle. In addition to knowledge on hydrological conditions, the control of the water balance in the mining processes require knowledge of mining processes, the ability to adjust process parameters to variable hydrological conditions, adaptation of suitable water management tools and systems, systematic monitoring of amounts and quality of water, adequate capacity in water management infrastructure to handle the variable water flows, best practices to assess the dispersion, mixing and dilution of mine water and pollutant loading to receiving water bodies, and dewatering and separation of water from tailing and precipitates. WaterSmart project aims to improve the awareness of actual quantities of water, and water balances in mine areas to improve the forecasting and the management of the water volumes. The study is executed through hydrogeological and hydrological surveys and online monitoring procedures. One of the aims is to exploit on-line water quantity and quality monitoring for the better management of the water balances. The target is to develop a practical and end-user-specific on-line input and output procedures. The second objective is to develop mathematical models to calculate combined water balances including the surface, ground and process waters. WSFS, the Hydrological Modeling and Forecasting System of SYKE is being modified for mining areas. New modelling tools are developed on spreadsheet and system dynamics platforms to systematically integrate all water balance components (groundwater, surface water, infiltration, precipitation, mine water facilities and operations etc.) into overall dynamic mine site considerations. After coupling the surface and ground water models (e.g. Feflow and WSFS) with each other, they are compared with Goldsim. The third objective is to integrate the monitoring and modelling tools into the mine management system and process control. The modelling and predictive process control can prevent flood situations, ensure water adequacy, and enable the controlled mine water treatment. The project will develop a constantly updated management system for water balance including both natural waters and process waters.
Mining biomedical images towards valuable information retrieval in biomedical and life sciences
Ahmed, Zeeshan; Zeeshan, Saman; Dandekar, Thomas
2016-01-01
Biomedical images are helpful sources for the scientists and practitioners in drawing significant hypotheses, exemplifying approaches and describing experimental results in published biomedical literature. In last decades, there has been an enormous increase in the amount of heterogeneous biomedical image production and publication, which results in a need for bioimaging platforms for feature extraction and analysis of text and content in biomedical images to take advantage in implementing effective information retrieval systems. In this review, we summarize technologies related to data mining of figures. We describe and compare the potential of different approaches in terms of their developmental aspects, used methodologies, produced results, achieved accuracies and limitations. Our comparative conclusions include current challenges for bioimaging software with selective image mining, embedded text extraction and processing of complex natural language queries. PMID:27538578
Monitoring food safety violation reports from internet forums.
Kate, Kiran; Negi, Sumit; Kalagnanam, Jayant
2014-01-01
Food-borne illness is a growing public health concern in the world. Government bodies, which regulate and monitor the state of food safety, solicit citizen feedback about food hygiene practices followed by food establishments. They use traditional channels like call center, e-mail for such feedback collection. With the growing popularity of Web 2.0 and social media, citizens often post such feedback on internet forums, message boards etc. The system proposed in this paper applies text mining techniques to identify and mine such food safety complaints posted by citizens on web data sources thereby enabling the government agencies to gather more information about the state of food safety. In this paper, we discuss the architecture of our system and the text mining methods used. We also present results which demonstrate the effectiveness of this system in a real-world deployment.
Map showing potential metal-mine drainage hazards in Colorado, based on mineral-deposit geology
Plumlee, Geoffrey S.; Streufert, Randall K.; Smith, Kathleen S.; Smith, Steven M.; Wallace, Alan R.; Toth, Margo I.; Nash, J. Thomas; Robinson, Rob A.; Ficklin, Walter H.; Lee, Gregory K.
1995-01-01
This map, compiled by the U.S. Geological Survey (USGS) in cooperation with the Colorado Geological Survey (CGS) and the U. S. Bureau of Land Management (BLM), shows potential mine-drainage hazards that may exist in Colorado metal-mining districts, as indicated by the geologic characteristics of the mineral deposits that occur in the respective districts. It was designed to demonstrate how geologic and geochemical information can be used on a regional scale to help assess the potential for mining-related and natural drainage problems in mining districts, unmined mineralized areas, and surrounding watersheds. The map also provides information on the distribution of different mineral deposit types across Colorado. A GIS (Geographic Information System) format was used to integrate geologic, geochemical, water-quality, climate, landuse, and ecological data from diverse sources. Likely mine-drainage signatures were defined for each mining district based on: (1) a review of the geologic characteristics of the mining district, including mineralogy, trace-element content, host-rock lithology, and wallrock alteration, and; (2) results of site specific studies on the geologic controls on mine-drainage composition.
Method of gas emission control for safe working of flat gassy coal seams
NASA Astrophysics Data System (ADS)
Vinogradov, E. A.; Yaroshenko, V. V.; Kislicyn, M. S.
2017-10-01
The main problems at intensive flat gassy coal seam longwall mining are considered. For example, mine Kotinskaja JSC “SUEK-Kuzbass” shows that when conducting the work on the gassy coal seams, methane emission control by means of ventilation, degassing and insulated drain of methane-air mixture is not effective and stable enough. It is not always possible to remove the coal production restrictions by the gas factor, which leads to financial losses because of incomplete using of longwall equipment and the reduction of the technical and economic indicators of mining. To solve the problems, the authors used a complex method that includes the compilation and analysis of the theory and practice of intensive flat gassy coal seam longwall mining. Based on the results of field and numerical researches, the effect of parameters of technological schemes on efficiency of methane emission control on longwall panels, the non-linear dependence of the permissible according to gas factor longwall productivity on parameters of technological schemes, ventilation and degassing during intensive mining flat gassy coal seams was established. The number of recommendations on the choice of the location and the size of the intermediate section of coal heading to control gassing in the mining extracted area, and guidelines for choosing the parameters of ventilation of extracted area with the help of two air supply entries and removal of isolated methane-air mixture are presented in the paper. The technological scheme, using intermediate entry for fresh air intake, ensuring effective management gassing and allowing one to refuse from drilling wells from the surface to the mined-out space for mining gas-bearing coal seams, was developed.
NASA Astrophysics Data System (ADS)
Ghosh, G. K.; Sivakumar, C.
2018-03-01
Longwall mining technique has been widely used around the globe due to its safe mining process. However, mining operations are suspended when various problems arise like collapse of roof falls, cracks and fractures propagation in the roof and complexity in roof strata behaviors. To overcome these colossal problems, an underground real time microseismic monitoring technique has been implemented in the working panel-P2 in the Rajendra longwall underground coal mine at South Eastern Coalfields Limited (SECL), India. The target coal seams appears at the panel P-2 within a depth of 70 m to 76 m. In this process, 10 to 15 uniaxial geophones were placed inside a borehole at depth range of 40 m to 60 m located over the working panel-P2 with high rock quality designation value for better seismic signal. Various microseismic events were recorded with magnitude ranging from -5 to 2 in the Richter scale. The time-series processing was carried out to get various seismic parameters like activity rate, potential energy, viscosity rate, seismic moment, energy index, apparent volume and potential energy with respect to time. The used of these parameters helped tracing the events, understanding crack and fractures propagation and locating both high and low stress distribution zones prior to roof fall occurrence. In most of the cases, the events were divided into three stage processes: initial or preliminary, middle or building, and final or falling. The results of this study reveal that underground microseismic monitoring provides sufficient prior information of underground weighting events. The information gathered during the study was conveyed to the mining personnel in advance prior to roof fall event. This permits to take appropriate action for safer mining operations and risk reduction during longwall operation.
Ecosystem Health Assessment of Mining Cities Based on Landscape Pattern
NASA Astrophysics Data System (ADS)
Yu, W.; Liu, Y.; Lin, M.; Fang, F.; Xiao, R.
2017-09-01
Ecosystem health assessment (EHA) is one of the most important aspects in ecosystem management. Nowadays, ecological environment of mining cities is facing various problems. In this study, through ecosystem health theory and remote sensing images in 2005, 2009 and 2013, landscape pattern analysis and Vigor-Organization-Resilience (VOR) model were applied to set up an evaluation index system of ecosystem health of mining city to assess the healthy level of ecosystem in Panji District Huainan city. Results showed a temporal stable but high spatial heterogeneity landscape pattern during 2005-2013. According to the regional ecosystem health index, it experienced a rapid decline after a slight increase, and finally it maintained at an ordinary level. Among these areas, a significant distinction was presented in different towns. It indicates that the ecosystem health of Tianjijiedao town, the regional administrative centre, descended rapidly during the study period, and turned into the worst level in the study area. While the Hetuan Town, located in the northwestern suburb area of Panji District, stayed on a relatively better level than other towns. The impacts of coal mining collapse area, land reclamation on the landscape pattern and ecosystem health status of mining cities were also discussed. As a result of underground coal mining, land subsidence has become an inevitable problem in the study area. In addition, the coal mining subsidence area has brought about the destruction of the farmland, construction land and water bodies, which causing the change of the regional landscape pattern and making the evaluation of ecosystem health in mining area more difficult. Therefore, this study provided an ecosystem health approach for relevant departments to make scientific decisions.
Building a glaucoma interaction network using a text mining approach.
Soliman, Maha; Nasraoui, Olfa; Cooper, Nigel G F
2016-01-01
The volume of biomedical literature and its underlying knowledge base is rapidly expanding, making it beyond the ability of a single human being to read through all the literature. Several automated methods have been developed to help make sense of this dilemma. The present study reports on the results of a text mining approach to extract gene interactions from the data warehouse of published experimental results which are then used to benchmark an interaction network associated with glaucoma. To the best of our knowledge, there is, as yet, no glaucoma interaction network derived solely from text mining approaches. The presence of such a network could provide a useful summative knowledge base to complement other forms of clinical information related to this disease. A glaucoma corpus was constructed from PubMed Central and a text mining approach was applied to extract genes and their relations from this corpus. The extracted relations between genes were checked using reference interaction databases and classified generally as known or new relations. The extracted genes and relations were then used to construct a glaucoma interaction network. Analysis of the resulting network indicated that it bears the characteristics of a small world interaction network. Our analysis showed the presence of seven glaucoma linked genes that defined the network modularity. A web-based system for browsing and visualizing the extracted glaucoma related interaction networks is made available at http://neurogene.spd.louisville.edu/GlaucomaINViewer/Form1.aspx. This study has reported the first version of a glaucoma interaction network using a text mining approach. The power of such an approach is in its ability to cover a wide range of glaucoma related studies published over many years. Hence, a bigger picture of the disease can be established. To the best of our knowledge, this is the first glaucoma interaction network to summarize the known literature. The major findings were a set of relations that could not be found in existing interaction databases and that were found to be new, in addition to a smaller subnetwork consisting of interconnected clusters of seven glaucoma genes. Future improvements can be applied towards obtaining a better version of this network.
Mining Longitudinal Web Queries: Trends and Patterns.
ERIC Educational Resources Information Center
Wang, Peiling; Berry, Michael W.; Yang, Yiheng
2003-01-01
Analyzed user queries submitted to an academic Web site during a four-year period, using a relational database, to examine users' query behavior, to identify problems they encounter, and to develop techniques for optimizing query analysis and mining. Linguistic analyses focus on query structures, lexicon, and word associations using statistical…
An Environmental Unit for the Social Studies.
ERIC Educational Resources Information Center
Kroll, Claudia J.
Based on the inquiry method of learning, this instructional unit attempts to encourage students to discover for themselves the facts, problems, values, conflicts, and potential solutions of an environmental issue. Specifically, it deals with surface mining in the United States, with special focus on surface mining in Illinois. Materials and…
Environmental management of acid water problems in mining areas
NASA Astrophysics Data System (ADS)
Singh, Gurdeep; Bhatnagar, Mridula; Sinha, D. K.
1990-03-01
Acid Mine Drainage (AMD) originates from the oxidation and leaching of sulphide minerals present in coal and metalliferrous ore bodies and gives rise to several environmental degradation problems. An investigation has been carried out to combat the acidic water problems. Results of this investigation indicate that application of anionic surfactant (sodium lauryl sulphate) and food preservatives (sodium benzoate and potassium sorbate) effectively abate the acid formation at low concentration levels (15-40 ppm) as tested in laboratory as well as at pilot-scale levels. Acidity, sulphate and iron concentrations are found to reduce by over 70 percent and remained low for more than three months after treatment. Thus this investigation demonstrates the management of these problems in an environmentally safe manner by controlling acid formation at its source.
Mining EEG with SVM for Understanding Cognitive Underpinnings of Math Problem Solving Strategies
López, Julio
2018-01-01
We have developed a new methodology for examining and extracting patterns from brain electric activity by using data mining and machine learning techniques. Data was collected from experiments focused on the study of cognitive processes that might evoke different specific strategies in the resolution of math problems. A binary classification problem was constructed using correlations and phase synchronization between different electroencephalographic channels as characteristics and, as labels or classes, the math performances of individuals participating in specially designed experiments. The proposed methodology is based on using well-established procedures of feature selection, which were used to determine a suitable brain functional network size related to math problem solving strategies and also to discover the most relevant links in this network without including noisy connections or excluding significant connections. PMID:29670667
Mining EEG with SVM for Understanding Cognitive Underpinnings of Math Problem Solving Strategies.
Bosch, Paul; Herrera, Mauricio; López, Julio; Maldonado, Sebastián
2018-01-01
We have developed a new methodology for examining and extracting patterns from brain electric activity by using data mining and machine learning techniques. Data was collected from experiments focused on the study of cognitive processes that might evoke different specific strategies in the resolution of math problems. A binary classification problem was constructed using correlations and phase synchronization between different electroencephalographic channels as characteristics and, as labels or classes, the math performances of individuals participating in specially designed experiments. The proposed methodology is based on using well-established procedures of feature selection, which were used to determine a suitable brain functional network size related to math problem solving strategies and also to discover the most relevant links in this network without including noisy connections or excluding significant connections.