NASA Astrophysics Data System (ADS)
Kim, Kwang Hyeon; Lee, Suk; Shim, Jang Bo; Chang, Kyung Hwan; Yang, Dae Sik; Yoon, Won Sup; Park, Young Je; Kim, Chul Yong; Cao, Yuan Jie
2017-08-01
This preliminary study integrates text-based data mining with a toxicity prediction modeling system, with the aim of supporting a clinical decision support system based on big data in radiation oncology. Structured and unstructured data were prepared from treatment plans, and further unstructured data were extracted by dose-volume image pattern recognition from prostate cancer research articles crawled from the internet. We modeled an artificial neural network to build a predictor system for toxicity in organs at risk, and used a text-based data mining approach to build the network for bladder and rectum complication predictions. The pattern recognition method mined the unstructured dose-volume toxicity data with a detection accuracy of 97.9%. The confusion matrix and the trained neural network model were obtained with 50 modeled plans (n = 50) for validation. The toxicity level was analyzed, and the risk factors for 25% bladder, 50% bladder, 20% rectum, and 50% rectum were calculated by the artificial neural network algorithm. Of the 50 modeled plans, 32 were predicted to risk complications and 18 were classified as non-complication plans. We integrated data mining and a toxicity modeling method for toxicity prediction using prostate cancer cases. The results show that preprocessing analysis using text-based data mining, combined with prediction modeling, can be extended to personalized patient treatment decision support based on big data.
Text mining for the biocuration workflow
Hirschman, Lynette; Burns, Gully A. P. C; Krallinger, Martin; Arighi, Cecilia; Cohen, K. Bretonnel; Valencia, Alfonso; Wu, Cathy H.; Chatr-Aryamontri, Andrew; Dowell, Karen G.; Huala, Eva; Lourenço, Anália; Nash, Robert; Veuthey, Anne-Lise; Wiegers, Thomas; Winter, Andrew G.
2012-01-01
Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on ‘Text Mining for the BioCuration Workflow’ at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed many variabilities. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provide a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community. PMID:22513129
The structure and infrastructure of the global nanotechnology literature
NASA Astrophysics Data System (ADS)
Kostoff, Ronald N.; Stump, Jesse A.; Johnson, Dustin; Murday, James S.; Lau, Clifford G. Y.; Tolles, William M.
2006-08-01
Text mining is the extraction of useful information from large volumes of text. A text mining analysis of the global open nanotechnology literature was performed. Records from the Science Citation Index (SCI)/Social SCI were analyzed to provide the infrastructure of the global nanotechnology literature (prolific authors/journals/institutions/countries, most cited authors/papers/journals) and the thematic structure (taxonomy) of the global nanotechnology literature, from a science perspective. Records from the Engineering Compendex (EC) were analyzed to provide a taxonomy from a technology perspective. The Far Eastern countries have expanded nanotechnology publication output dramatically in the past decade.
Raja, Kalpana; Patrick, Matthew; Gao, Yilin; Madu, Desmond; Yang, Yuyang
2017-01-01
In the past decade, the volume of “omics” data generated by the different high-throughput technologies has expanded exponentially. Managing, storing, and analyzing this big data has been a great challenge for researchers, especially when moving towards the goal of generating testable data-driven hypotheses, which has been the promise of the high-throughput experimental techniques. Different bioinformatics approaches have been developed to streamline the downstream analyses by providing independent information to interpret and provide biological inference. Text mining (also known as literature mining) is one of the commonly used approaches for automated generation of biological knowledge from the huge number of published articles. In this review paper, we discuss the recent advancement in approaches that integrate results from omics data and information generated from text mining approaches to uncover novel biomedical information. PMID:28331849
Training and Employment of Land Mine and Booby Trap Detector Dogs. Volume II
1976-09-01
of injury, disease, and other physical abnormalities. All obligatory vaccinations should be current (canine distemper, infectious...as a procedures manual and reference text to be used during the training of initially naive canines for land mine and booby trap detection service...canine training contexts. The techniques and procedures elaborated in the present document were developed for the United States Army Mobility
Comparative Analysis of Document level Text Classification Algorithms using R
NASA Astrophysics Data System (ADS)
Syamala, Maganti; Nalini, N. J., Dr; Maguluri, Lakshamanaphaneendra; Ragupathy, R., Dr.
2017-08-01
Over the past few decades, tremendous volumes of data have become available on the Internet in both structured and unstructured form. With this exponential growth of information, there is an urgent need for text classifiers. Text mining is an interdisciplinary field that draws on information retrieval, data mining, machine learning, statistics and computational linguistics. To handle this situation, a wide range of supervised learning algorithms has been introduced. Among these, K-Nearest Neighbor (KNN) is an efficient and simple classifier in the text classification family. However, KNN suffers from imbalanced class distributions and noisy term features. To cope with this challenge, we use document-based centroid dimensionality reduction (CentroidDR) in R. By combining two text classification techniques, KNN and centroid classifiers, we propose a scalable and effective flat classifier, called MCenKNN, which performs substantially better than CenKNN.
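To make the idea of combining centroid-based reduction with KNN concrete, here is a minimal Python sketch. It is only one plausible reading of the abstract, not the authors' MCenKNN or CenKNN: each document is mapped to its cosine similarities with the class centroids (so the feature space shrinks to one dimension per class), and KNN voting is then done in that reduced space. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def cosine(u, v):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def class_centroids(train):
    """Average the term-frequency vectors of each class (assumed centroid definition)."""
    sums, counts = defaultdict(Counter), Counter()
    for tokens, label in train:
        sums[label].update(tokens)
        counts[label] += 1
    return {label: {t: c / counts[label] for t, c in s.items()}
            for label, s in sums.items()}

def to_centroid_space(tokens, centroids, labels):
    """Reduce a document to one similarity score per class centroid."""
    vec = Counter(tokens)
    return [cosine(vec, centroids[label]) for label in labels]

def knn_predict(tokens, train, k=3):
    """Majority vote among the k nearest training documents in centroid space."""
    centroids = class_centroids(train)
    labels = sorted(centroids)
    query = to_centroid_space(tokens, centroids, labels)
    scored = [(math.dist(query, to_centroid_space(doc, centroids, labels)), label)
              for doc, label in train]
    scored.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

# Toy corpus, invented for illustration only.
train = [
    (["goal", "match", "team"], "sport"),
    (["team", "score", "goal"], "sport"),
    (["stock", "market", "price"], "finance"),
    (["market", "trade", "price"], "finance"),
]
print(knn_predict(["goal", "team", "win"], train, k=3))
```

Working in centroid space keeps the distance computation cheap (one coordinate per class), which is presumably part of why such a combination can scale; the abstract itself does not spell out the details.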
Text Mining Metal-Organic Framework Papers.
Park, Sanghoon; Kim, Baekjun; Choi, Sihoon; Boyd, Peter G; Smit, Berend; Kim, Jihan
2018-02-26
We have developed a simple text mining algorithm that allows us to identify surface area and pore volumes of metal-organic frameworks (MOFs) using manuscript html files as inputs. The algorithm searches for common units (e.g., m²/g, cm³/g) associated with these two quantities to facilitate the search. From the sample set data of over 200 MOFs, the algorithm managed to identify 90% and 88.8% of the correct surface area and pore volume values. Further application to a test set of randomly chosen MOF html files yielded 73.2% and 85.1% accuracies for the two respective quantities. Most of the errors stem from unorthodox sentence structures that made it difficult to identify the correct data, as well as bolded notations of MOFs (e.g., 1a) that made it difficult to identify their real names. These types of tools will become useful when it comes to discovering structure-property relationships among MOFs as well as collecting a large set of data for references.
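The unit-driven search described above can be sketched with regular expressions. This is an illustrative Python sketch under assumed patterns, not the authors' algorithm: the regexes, the function name `extract_quantities`, and the sample sentence are all hypothetical, and real manuscripts would need HTML stripping plus handling of thousands separators and ranges.

```python
import re

# Assumed pattern: a plain decimal number directly followed by the unit.
NUMBER = r"(\d+(?:\.\d+)?)"
# Tolerates spacing variants such as "m2/g", "m 2 /g" and "m²/g".
SURFACE_AREA = re.compile(NUMBER + r"\s*m\s*(?:2|²)\s*/\s*g")
PORE_VOLUME = re.compile(NUMBER + r"\s*cm\s*(?:3|³)\s*/\s*g")

def extract_quantities(text):
    """Return (surface_areas, pore_volumes) as lists of floats found in text."""
    areas = [float(m) for m in SURFACE_AREA.findall(text)]
    volumes = [float(m) for m in PORE_VOLUME.findall(text)]
    return areas, volumes

# Invented sentence, for illustration only.
sample = "The BET surface area of 1a was 1520 m2/g with a pore volume of 0.62 cm3/g."
print(extract_quantities(sample))
```

Note that the sketch returns bare numbers; linking each value to the right MOF name is the harder step, which is where the abstract reports most errors.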
Lu, Zhiyong
2012-01-01
Today’s biomedical research has become heavily dependent on access to the biological knowledge encoded in expert curated biological databases. As the volume of biological literature grows rapidly, it becomes increasingly difficult for biocurators to keep up with the literature because manual curation is an expensive and time-consuming endeavour. Past research has suggested that computer-assisted curation can improve efficiency, but few text-mining systems have been formally evaluated in this regard. Through participation in the interactive text-mining track of the BioCreative 2012 workshop, we developed PubTator, a PubMed-like system that assists with two specific human curation tasks: document triage and bioconcept annotation. On the basis of evaluation results from two external user groups, we find that the accuracy of PubTator-assisted curation is comparable with that of manual curation and that PubTator can significantly increase human curatorial speed. These encouraging findings warrant further investigation with a larger number of publications to be annotated. Database URL: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator/ PMID:23160414
On the unsupervised analysis of domain-specific Chinese texts
Deng, Ke; Bol, Peter K.; Li, Kate J.; Liu, Jun S.
2016-01-01
With the growing availability of digitized text data both publicly and privately, there is a great need for effective computational tools to automatically extract information from texts. Because the Chinese language differs most significantly from alphabet-based languages in not specifying word boundaries, most existing Chinese text-mining methods require a prespecified vocabulary and/or a large relevant training corpus, which may not be available in some applications. We introduce an unsupervised method, top-down word discovery and segmentation (TopWORDS), for simultaneously discovering and segmenting words and phrases from large volumes of unstructured Chinese texts, and propose ways to order discovered words and conduct higher-level context analyses. TopWORDS is particularly useful for mining online and domain-specific texts where the underlying vocabulary is unknown or the texts of interest differ significantly from available training corpora. When outputs from TopWORDS are fed into context analysis tools such as topic modeling, word embedding, and association pattern finding, the results are as good as or better than that from using outputs of a supervised segmentation method. PMID:27185919
An overview of the BioCreative 2012 Workshop Track III: interactive text mining task
Arighi, Cecilia N.; Carterette, Ben; Cohen, K. Bretonnel; Krallinger, Martin; Wilbur, W. John; Fey, Petra; Dodson, Robert; Cooper, Laurel; Van Slyke, Ceri E.; Dahdul, Wasila; Mabee, Paula; Li, Donghui; Harris, Bethany; Gillespie, Marc; Jimenez, Silvia; Roberts, Phoebe; Matthews, Lisa; Becker, Kevin; Drabkin, Harold; Bello, Susan; Licata, Luana; Chatr-aryamontri, Andrew; Schaeffer, Mary L.; Park, Julie; Haendel, Melissa; Van Auken, Kimberly; Li, Yuling; Chan, Juancarlos; Muller, Hans-Michael; Cui, Hong; Balhoff, James P.; Chi-Yang Wu, Johnny; Lu, Zhiyong; Wei, Chih-Hsuan; Tudor, Catalina O.; Raja, Kalpana; Subramani, Suresh; Natarajan, Jeyakumar; Cejuela, Juan Miguel; Dubey, Pratibha; Wu, Cathy
2013-01-01
In many databases, biocuration primarily involves literature curation, which usually involves retrieving relevant articles, extracting information that will translate into annotations and identifying new incoming literature. As the volume of biological literature increases, the use of text mining to assist in biocuration becomes increasingly relevant. A number of groups have developed tools for text mining from a computer science/linguistics perspective, and there are many initiatives to curate some aspect of biology from the literature. Some biocuration efforts already make use of a text mining tool, but there have not been many broad-based systematic efforts to study which aspects of a text mining tool contribute to its usefulness for a curation task. Here, we report on an effort to bring together text mining tool developers and database biocurators to test the utility and usability of tools. Six text mining systems presenting diverse biocuration tasks participated in a formal evaluation, and appropriate biocurators were recruited for testing. The performance results from this evaluation indicate that some of the systems were able to improve efficiency of curation by speeding up the curation task significantly (∼1.7- to 2.5-fold) over manual curation. In addition, some of the systems were able to improve annotation accuracy when compared with the performance on the manually curated set. In terms of inter-annotator agreement, the factors that contributed to significant differences for some of the systems included the expertise of the biocurator on the given curation task, the inherent difficulty of the curation and attention to annotation guidelines. After the task, annotators were asked to complete a survey to help identify strengths and weaknesses of the various systems. 
The analysis of this survey highlights how important task completion is to the biocurators’ overall experience of a system, regardless of the system’s high score on design, learnability and usability. In addition, strategies to refine the annotation guidelines and systems documentation, to adapt the tools to the needs and query types the end user might have and to evaluate performance in terms of efficiency, user interface, result export and traditional evaluation metrics have been analyzed during this task. This analysis will help to plan for a more intense study in BioCreative IV. PMID:23327936
Timber volumes of old Pennsylvania surface mine reclamation plantations
Walter H. Davidson
1981-01-01
Surface mine reclamation plantings established in Pennsylvania from 1919 to 1934 were evaluated to determine merchantable volume, presence and volume of volunteer species, and soil development since planting. The evaluation showed that planted conifers had a total volume of 744 M bm on the 150 acres of reclaimed surface mines. In addition, there were 356 M bm of...
A Node Linkage Approach for Sequential Pattern Mining
Navarro, Osvaldo; Cumplido, René; Villaseñor-Pineda, Luis; Feregrino-Uribe, Claudia; Carrasco-Ochoa, Jesús Ariel
2014-01-01
Sequential Pattern Mining is a widely addressed problem in data mining, with applications such as analyzing Web usage, examining purchase behavior, and text mining, among others. Nevertheless, with the dramatic increase in data volume, the current approaches prove inefficient when dealing with large input datasets, a large number of different symbols and low minimum supports. In this paper, we propose a new sequential pattern mining algorithm, which follows a pattern-growth scheme to discover sequential patterns. Unlike most pattern growth algorithms, our approach does not build a data structure to represent the input dataset, but instead accesses the required sequences through pseudo-projection databases, achieving better runtime and reducing memory requirements. Our algorithm traverses the search space in a depth-first fashion and only preserves in memory a pattern node linkage and the pseudo-projections required for the branch being explored at the time. Experimental results show that our new approach, the Node Linkage Depth-First Traversal algorithm (NLDFT), has better performance and scalability in comparison with state of the art algorithms. PMID:24933123
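The combination of pattern growth, depth-first traversal and pseudo-projection mentioned above can be illustrated with a short Python sketch. It is a generic PrefixSpan-style miner over sequences of single items, not the NLDFT algorithm itself (the node linkage structure is not reproduced); the function name and toy data are hypothetical.

```python
def mine_sequential_patterns(sequences, min_support):
    """Depth-first pattern growth over sequences of single items.

    A pseudo-projection is a list of (sequence_index, start_position) pairs,
    so the input sequences are referenced by position and never copied.
    """
    results = {}

    def grow(prefix, projection):
        # For each candidate item, record where the suffix starts after its
        # first occurrence in every projected sequence (one entry per sequence).
        next_projection = {}
        for seq_idx, start in projection:
            seen = set()
            seq = sequences[seq_idx]
            for pos in range(start, len(seq)):
                item = seq[pos]
                if item not in seen:
                    seen.add(item)
                    next_projection.setdefault(item, []).append((seq_idx, pos + 1))
        for item in sorted(next_projection):
            projected = next_projection[item]
            if len(projected) >= min_support:
                pattern = prefix + (item,)
                results[pattern] = len(projected)
                # Recurse before moving to the next item: depth-first traversal.
                grow(pattern, projected)

    grow((), [(i, 0) for i in range(len(sequences))])
    return results

# Toy database, invented for illustration only.
database = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
for pattern, support in sorted(mine_sequential_patterns(database, 2).items()):
    print(pattern, support)
```

Only the projections along the branch currently being explored are alive on the call stack, which mirrors the memory argument made in the abstract.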
2016-10-01
abstracts. We have provided basic details in the attached text. Extensive additional data, discussion, and conclusions are included in the attached...are listed in blue, and the following black text details progress towards these tasks. Aim 1) Develop a new type of dual-frequency PC-MUT co...Proceedings of the IEEE International Ultrasonics Symposium, Honolulu, HI, USA, 4–7 December 1990; Volume 2, pp. 799–803. 85. Saitoh, S.; Izumi, M.; Mine
Sahadevan, S; Hofmann-Apitius, M; Schellander, K; Tesfaye, D; Fluck, J; Friedrich, C M
2012-10-01
In biological research, establishing the prior art by searching and collecting information already present in the domain has equal importance as the experiments done. To obtain a complete overview about the relevant knowledge, researchers mainly rely on 2 major information sources: i) various biological databases and ii) scientific publications in the field. The major difference between the 2 information sources is that information from databases is available, typically well structured and condensed. The information content in scientific literature is vastly unstructured; that is, dispersed among the many different sections of scientific text. The traditional method of information extraction from scientific literature occurs by generating a list of relevant publications in the field of interest and manually scanning these texts for relevant information, which is very time consuming. It is more than likely that in using this "classical" approach the researcher misses some relevant information mentioned in the literature or has to go through biological databases to extract further information. Text mining and named entity recognition methods have already been used in human genomics and related fields as a solution to this problem. These methods can process and extract information from large volumes of scientific text. Text mining is defined as the automatic extraction of previously unknown and potentially useful information from text. Named entity recognition (NER) is defined as the method of identifying named entities (names of real world objects; for example, gene/protein names, drugs, enzymes) in text. In animal sciences, text mining and related methods have been briefly used in murine genomics and associated fields, leaving behind other fields of animal sciences, such as livestock genomics. 
The aim of this work was to develop an information retrieval platform in the livestock domain focusing on livestock publications and the recognition of relevant data from cattle and pigs. For this purpose, the rather noncomprehensive resources of pig and cattle gene and protein terminologies were enriched with orthologue synonyms, integrated in the NER platform, ProMiner, which is successfully used in human genomics domain. Based on the performance tests done, the present system achieved a fair performance with precision 0.64, recall 0.74, and F(1) measure of 0.69 in a test scenario based on cattle literature.
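The reported F(1) measure can be checked against the stated precision and recall, since F1 is their harmonic mean. A small Python check (the function name is ours):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Values reported in the abstract for the cattle literature test scenario.
print(round(f1_score(0.64, 0.74), 2))  # prints 0.69
```

The unrounded value is about 0.686, which agrees with the reported 0.69.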
Sentiment analysis of feature ranking methods for classification accuracy
NASA Astrophysics Data System (ADS)
Joseph, Shashank; Mugauri, Calvin; Sumathy, S.
2017-11-01
Text pre-processing and feature selection are critical steps in text mining. Pre-processing large volumes of data is difficult because unstructured raw data must be converted into a structured format. Traditional methods of processing and weighting were slow and less accurate. To overcome this challenge, feature ranking techniques have been devised. The feature set produced by text pre-processing is fed as input to feature selection, which helps improve text classification accuracy. Of the three feature selection categories available, the filter category is the focus here. Five feature ranking methods are analyzed: document frequency, standard deviation, information gain, chi-square, and weighted log-likelihood ratio.
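Two of the filter-style ranking methods named above, document frequency and chi-square, can be sketched in a few lines of Python. This is a textbook formulation under assumed inputs (each document is a set of tokens plus a class label), not the paper's implementation; the function names and toy corpus are hypothetical.

```python
def document_frequency(docs, term):
    """Number of documents that contain the term."""
    return sum(1 for tokens, _ in docs if term in tokens)

def chi_square(docs, term, label):
    """Chi-square statistic of a term/class pair from a 2x2 contingency table."""
    a = b = c = d = 0
    for tokens, doc_label in docs:
        has_term = term in tokens
        in_class = doc_label == label
        if has_term and in_class:
            a += 1      # term present, document in class
        elif has_term:
            b += 1      # term present, document in another class
        elif in_class:
            c += 1      # term absent, document in class
        else:
            d += 1      # term absent, document in another class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

# Toy corpus, invented for illustration only.
docs = [
    ({"good", "movie"}, "pos"),
    ({"good", "plot"}, "pos"),
    ({"bad", "movie"}, "neg"),
    ({"bad", "plot"}, "neg"),
]
vocabulary = sorted(set().union(*(tokens for tokens, _ in docs)))
ranking = sorted(vocabulary, key=lambda t: chi_square(docs, t, "pos"), reverse=True)
print(ranking)
```

In this toy corpus "good" and "bad" separate the classes perfectly and rank first, while "movie" and "plot" are evenly spread and score zero, even though all four terms have the same document frequency.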
Text Mining in Biomedical Domain with Emphasis on Document Clustering.
Renganathan, Vinaitheerthan
2017-07-01
With the exponential increase in the number of articles published every year in the biomedical domain, there is a need to build automated systems to extract unknown information from the articles published. Text mining techniques enable the extraction of unknown knowledge from unstructured documents. This paper reviews text mining processes in detail and the software tools available to carry out text mining. It also reviews the roles and applications of text mining in the biomedical domain. Text mining processes, such as search and retrieval of documents, pre-processing of documents, natural language processing, methods for text clustering, and methods for text classification are described in detail. Text mining techniques can facilitate the mining of vast amounts of knowledge on a given topic from published biomedical research articles and draw meaningful conclusions that are not possible otherwise.
Addressing Information Proliferation: Applications of Information Extraction and Text Mining
ERIC Educational Resources Information Center
Li, Jingjing
2013-01-01
The advent of the Internet and the ever-increasing capacity of storage media have made it easy to store, deliver, and share enormous volumes of data, leading to a proliferation of information on the Web, in online libraries, on news wires, and almost everywhere in our daily lives. Since our ability to process and absorb this information remains…
Mining influence on underground water resources in arid and semiarid regions
NASA Astrophysics Data System (ADS)
Luo, A. K.; Hou, Y.; Hu, X. Y.
2018-02-01
Coordinated mining of coal and water resources in arid and semiarid regions has become a focus of research. This study takes the Energy and Chemical Base in Northern Shaanxi as an example and conducts a statistical analysis of coal yield and drainage volume from several large-scale mines in the mining area. It also determines the average water volume per ton of coal and calculates the drainage volume for four typical years under different mining intensities. Then, for the period of mine drainage, the groundwater table, precipitation infiltration recharge, and evaporation capacity are calculated by combining precipitation observation data from the past two decades with water level data from observation wells. Moreover, the study analyzes the transformation relationships among surface water, mine water, and groundwater. The results show that massive mine drainage, caused by large-scale coal mining in the study area, is the main reason for the reduction in water resources and for the changes in the transformation relationships among surface water, groundwater, and mine water.
40 CFR 440.144 - New source performance standards (NSPS).
Code of Federal Regulations, 2014 CFR
2014-07-01
... discharged from an open-cut mine plant site shall not exceed the volume of infiltration, drainage and mine... not exceed the volume of infiltration, drainage and mine drainage waters which is in excess of the...
40 CFR 440.144 - New source performance standards (NSPS).
Code of Federal Regulations, 2012 CFR
2012-07-01
... discharged from an open-cut mine plant site shall not exceed the volume of infiltration, drainage and mine... not exceed the volume of infiltration, drainage and mine drainage waters which is in excess of the...
40 CFR 440.144 - New source performance standards (NSPS).
Code of Federal Regulations, 2013 CFR
2013-07-01
... discharged from an open-cut mine plant site shall not exceed the volume of infiltration, drainage and mine... not exceed the volume of infiltration, drainage and mine drainage waters which is in excess of the...
Text Mining in Biomedical Domain with Emphasis on Document Clustering
2017-01-01
Objectives With the exponential increase in the number of articles published every year in the biomedical domain, there is a need to build automated systems to extract unknown information from the articles published. Text mining techniques enable the extraction of unknown knowledge from unstructured documents. Methods This paper reviews text mining processes in detail and the software tools available to carry out text mining. It also reviews the roles and applications of text mining in the biomedical domain. Results Text mining processes, such as search and retrieval of documents, pre-processing of documents, natural language processing, methods for text clustering, and methods for text classification are described in detail. Conclusions Text mining techniques can facilitate the mining of vast amounts of knowledge on a given topic from published biomedical research articles and draw meaningful conclusions that are not possible otherwise. PMID:28875048
Fuzzy and rough formal concept analysis: a survey
NASA Astrophysics Data System (ADS)
Poelmans, Jonas; Ignatov, Dmitry I.; Kuznetsov, Sergei O.; Dedene, Guido
2014-02-01
Formal Concept Analysis (FCA) is a mathematical technique that has been extensively applied to Boolean data in knowledge discovery, information retrieval, web mining and other applications. In recent years, research on extending FCA theory to cope with imprecise and incomplete information has made significant progress. In this paper, we give a systematic overview of the more than 120 papers published between 2003 and 2011 on FCA with fuzzy attributes and rough FCA. We applied traditional FCA as a text-mining instrument to 1072 papers mentioning FCA in the abstract. These papers were available as pdf files and, using a thesaurus with terms referring to research topics, we transformed them into concept lattices. These lattices were used to analyze and explore the most prominent research topics within the FCA with fuzzy attributes and rough FCA research communities. FCA turned out to be an ideal metatechnique for representing large volumes of unstructured texts.
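To illustrate how a Boolean paper-by-topic table gives rise to the concepts of a lattice, here is a naive Python sketch that enumerates all formal concepts of a tiny context by closing every attribute subset. It is exponential in the number of attributes and intended only as an illustration; the function name and the toy context are hypothetical and are not taken from the survey.

```python
from itertools import combinations

def formal_concepts(context):
    """Enumerate formal concepts of a Boolean context.

    context maps each object to the set of attributes it has. A concept is a
    pair (extent, intent): the extent is exactly the set of objects sharing
    all attributes in the intent, and the intent is exactly the set of
    attributes shared by all objects in the extent.
    """
    objects = set(context)
    attributes = set().union(*context.values()) if context else set()

    def extent(attrs):
        return frozenset(o for o in objects if attrs <= context[o])

    def intent(objs):
        if not objs:
            return frozenset(attributes)
        return frozenset(set.intersection(*(set(context[o]) for o in objs)))

    concepts = set()
    attr_list = sorted(attributes)
    for size in range(len(attr_list) + 1):
        for combo in combinations(attr_list, size):
            ext = extent(set(combo))
            concepts.add((ext, intent(ext)))
    return concepts

# Toy context: papers as objects, thesaurus topics as attributes (invented).
context = {
    "paper1": {"fuzzy", "lattice"},
    "paper2": {"rough", "lattice"},
    "paper3": {"fuzzy"},
}
for ext, itt in sorted(formal_concepts(context), key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(ext), sorted(itt))
```

Ordering these concepts by inclusion of their extents yields the concept lattice; practical tools use dedicated algorithms rather than brute-force enumeration.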
Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges
Singhal, Ayush; Leaman, Robert; Catlett, Natalie; Lemberger, Thomas; McEntyre, Johanna; Polson, Shawn; Xenarios, Ioannis; Arighi, Cecilia; Lu, Zhiyong
2016-01-01
Text mining in the biomedical sciences is rapidly transitioning from small-scale evaluation to large-scale application. In this article, we argue that text-mining technologies have become essential tools in real-world biomedical research. We describe four large scale applications of text mining, as showcased during a recent panel discussion at the BioCreative V Challenge Workshop. We draw on these applications as case studies to characterize common requirements for successfully applying text-mining techniques to practical biocuration needs. We note that system ‘accuracy’ remains a challenge and identify several additional common difficulties and potential research directions including (i) the ‘scalability’ issue due to the increasing need of mining information from millions of full-text articles, (ii) the ‘interoperability’ issue of integrating various text-mining systems into existing curation workflows and (iii) the ‘reusability’ issue on the difficulty of applying trained systems to text genres that are not seen previously during development. We then describe related efforts within the text-mining community, with a special focus on the BioCreative series of challenge workshops. We believe that focusing on the near-term challenges identified in this work will amplify the opportunities afforded by the continued adoption of text-mining tools. Finally, in order to sustain the curation ecosystem and have text-mining systems adopted for practical benefits, we call for increased collaboration between text-mining researchers and various stakeholders, including researchers, publishers and biocurators. PMID:28025348
Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges
Singhal, Ayush; Leaman, Robert; Catlett, Natalie; ...
2016-12-26
Text mining in the biomedical sciences is rapidly transitioning from small-scale evaluation to large-scale application. In this article, we argue that text-mining technologies have become essential tools in real-world biomedical research. We describe four large scale applications of text mining, as showcased during a recent panel discussion at the BioCreative V Challenge Workshop. We draw on these applications as case studies to characterize common requirements for successfully applying text-mining techniques to practical biocuration needs. We note that system ‘accuracy’ remains a challenge and identify several additional common difficulties and potential research directions including (i) the ‘scalability’ issue due to the increasing need of mining information from millions of full-text articles, (ii) the ‘interoperability’ issue of integrating various text-mining systems into existing curation workflows and (iii) the ‘reusability’ issue on the difficulty of applying trained systems to text genres that are not seen previously during development. We then describe related efforts within the text-mining community, with a special focus on the BioCreative series of challenge workshops. We believe that focusing on the near-term challenges identified in this work will amplify the opportunities afforded by the continued adoption of text-mining tools. In conclusion, in order to sustain the curation ecosystem and have text-mining systems adopted for practical benefits, we call for increased collaboration between text-mining researchers and various stakeholders, including researchers, publishers and biocurators.
Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges
DOE Office of Scientific and Technical Information (OSTI.GOV)
Singhal, Ayush; Leaman, Robert; Catlett, Natalie
Text mining in the biomedical sciences is rapidly transitioning from small-scale evaluation to large-scale application. In this article, we argue that text-mining technologies have become essential tools in real-world biomedical research. We describe four large scale applications of text mining, as showcased during a recent panel discussion at the BioCreative V Challenge Workshop. We draw on these applications as case studies to characterize common requirements for successfully applying text-mining techniques to practical biocuration needs. We note that system ‘accuracy’ remains a challenge and identify several additional common difficulties and potential research directions including (i) the ‘scalability’ issue due to the increasing need of mining information from millions of full-text articles, (ii) the ‘interoperability’ issue of integrating various text-mining systems into existing curation workflows and (iii) the ‘reusability’ issue on the difficulty of applying trained systems to text genres that are not seen previously during development. We then describe related efforts within the text-mining community, with a special focus on the BioCreative series of challenge workshops. We believe that focusing on the near-term challenges identified in this work will amplify the opportunities afforded by the continued adoption of text-mining tools. In conclusion, in order to sustain the curation ecosystem and have text-mining systems adopted for practical benefits, we call for increased collaboration between text-mining researchers and various stakeholders, including researchers, publishers and biocurators.
Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges.
Singhal, Ayush; Leaman, Robert; Catlett, Natalie; Lemberger, Thomas; McEntyre, Johanna; Polson, Shawn; Xenarios, Ioannis; Arighi, Cecilia; Lu, Zhiyong
2016-01-01
Text mining in the biomedical sciences is rapidly transitioning from small-scale evaluation to large-scale application. In this article, we argue that text-mining technologies have become essential tools in real-world biomedical research. We describe four large scale applications of text mining, as showcased during a recent panel discussion at the BioCreative V Challenge Workshop. We draw on these applications as case studies to characterize common requirements for successfully applying text-mining techniques to practical biocuration needs. We note that system 'accuracy' remains a challenge and identify several additional common difficulties and potential research directions including (i) the 'scalability' issue due to the increasing need of mining information from millions of full-text articles, (ii) the 'interoperability' issue of integrating various text-mining systems into existing curation workflows and (iii) the 'reusability' issue on the difficulty of applying trained systems to text genres that are not seen previously during development. We then describe related efforts within the text-mining community, with a special focus on the BioCreative series of challenge workshops. We believe that focusing on the near-term challenges identified in this work will amplify the opportunities afforded by the continued adoption of text-mining tools. Finally, in order to sustain the curation ecosystem and have text-mining systems adopted for practical benefits, we call for increased collaboration between text-mining researchers and various stakeholders, including researchers, publishers and biocurators. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.
Text Mining for Adverse Drug Events: the Promise, Challenges, and State of the Art
Harpaz, Rave; Callahan, Alison; Tamang, Suzanne; Low, Yen; Odgers, David; Finlayson, Sam; Jung, Kenneth; LePendu, Paea; Shah, Nigam H.
2014-01-01
Text mining is the computational process of extracting meaningful information from large amounts of unstructured text. Text mining is emerging as a tool to leverage underutilized data sources that can improve pharmacovigilance, including the objective of adverse drug event detection and assessment. This article provides an overview of recent advances in pharmacovigilance driven by the application of text mining, and discusses several data sources—such as biomedical literature, clinical narratives, product labeling, social media, and Web search logs—that are amenable to text-mining for pharmacovigilance. Given the state of the art, it appears text mining can be applied to extract useful ADE-related information from multiple textual sources. Nonetheless, further research is required to address remaining technical challenges associated with the text mining methodologies, and to conclusively determine the relative contribution of each textual source to improving pharmacovigilance. PMID:25151493
ERIC Educational Resources Information Center
Trybula, Walter J.
1999-01-01
Reviews the state of research in text mining, focusing on newer developments. The intent is to describe the disparate investigations currently included under the term text mining and provide a cohesive structure for these efforts. A summary of research identifies key organizations responsible for pushing the development of text mining. A section…
Building a glaucoma interaction network using a text mining approach.
Soliman, Maha; Nasraoui, Olfa; Cooper, Nigel G F
2016-01-01
The volume of biomedical literature and its underlying knowledge base is rapidly expanding, making it beyond the ability of a single human being to read through all the literature. Several automated methods have been developed to help make sense of this dilemma. The present study reports on the results of a text mining approach to extract gene interactions from the data warehouse of published experimental results which are then used to benchmark an interaction network associated with glaucoma. To the best of our knowledge, there is, as yet, no glaucoma interaction network derived solely from text mining approaches. The presence of such a network could provide a useful summative knowledge base to complement other forms of clinical information related to this disease. A glaucoma corpus was constructed from PubMed Central and a text mining approach was applied to extract genes and their relations from this corpus. The extracted relations between genes were checked using reference interaction databases and classified generally as known or new relations. The extracted genes and relations were then used to construct a glaucoma interaction network. Analysis of the resulting network indicated that it bears the characteristics of a small world interaction network. Our analysis showed the presence of seven glaucoma linked genes that defined the network modularity. A web-based system for browsing and visualizing the extracted glaucoma related interaction networks is made available at http://neurogene.spd.louisville.edu/GlaucomaINViewer/Form1.aspx. This study has reported the first version of a glaucoma interaction network using a text mining approach. The power of such an approach is in its ability to cover a wide range of glaucoma related studies published over many years. Hence, a bigger picture of the disease can be established. To the best of our knowledge, this is the first glaucoma interaction network to summarize the known literature. 
The major findings were a set of relations that could not be found in existing interaction databases and that were found to be new, in addition to a smaller subnetwork consisting of interconnected clusters of seven glaucoma genes. Future improvements can be applied towards obtaining a better version of this network.
Valente, Carlo C; Bauer, Florian F; Venter, Fritz; Watson, Bruce; Nieuwoudt, Hélène H
2018-03-21
The increasingly large volumes of publicly available sensory descriptions of wine raise the question of whether this source of data can be mined to extract meaningful domain-specific information about the sensory properties of wine. We introduce a novel application of formal concept lattices, in combination with traditional statistical tests, to visualise the sensory attributes of a big data set of some 7,000 Chenin blanc and Sauvignon blanc wines. Complexity was identified as an important driver of style in hitherto uncharacterised Chenin blanc, and the sensory cues for specific styles were identified. This is the first study to apply these methods for the purpose of identifying styles within varietal wines. More generally, our interactive data visualisation and mining driven approach opens up new investigations towards better understanding of the complex field of sensory science.
[Research of bleeding volume and method in blood-letting acupuncture therapy based on data mining].
Liu, Xin; Jia, Chun-Sheng; Wang, Jian-Ling; Du, Yu-Zhu; Zhang, Xiao-Xu; Shi, Jing; Li, Xiao-Feng; Sun, Yan-Hui; Zhang, Shen; Zhang, Xuan-Ping; Gang, Wei-Juan
2014-03-01
Using computer-based technology and data mining methods, association rule mining was applied to cases of blood-letting acupuncture therapy collected from the literature as sample data. The data were entered, arranged and summarized on a self-built database platform, and the required data were then extracted to mine the bleeding volume and methods used in blood-letting acupuncture therapy, summarizing its application rules and clinical value in order to better guide clinical practice. Nine kinds of blood-letting tools appeared in the literature, of which the three-edge needle was the most frequent, accounting for 84.4% (1239/1468). The bleeding volume was classified into six levels, of which a small volume (less than 0.1 mL) had the highest frequency (401 times). According to the results of the data mining, blood-letting acupuncture therapy is widely applied in clinical acupuncture practice, with use of the three-edge needle and a small bleeding volume (less than 0.1 mL) being the most common; however, there was no central tendency in general.
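The association-rule idea mentioned in this abstract can be sketched as follows. The five treatment records, the "cupping" tool and the "0.1-0.5 mL" level are invented for illustration and are not taken from the study; only the 1239/1468 share is the abstract's own figure. The helper names `support` and `confidence` are likewise hypothetical.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(items <= record for record in records) / len(records)


def confidence(records, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the records."""
    joint = support(records, set(antecedent) | set(consequent))
    return joint / support(records, antecedent)


# Hypothetical treatment records: each is the set of features coded for one case.
records = [
    {"three-edge needle", "<0.1 mL"},
    {"three-edge needle", "<0.1 mL"},
    {"three-edge needle", "0.1-0.5 mL"},
    {"cupping", "0.1-0.5 mL"},
    {"three-edge needle", "<0.1 mL"},
]

rule_support = support(records, {"three-edge needle", "<0.1 mL"})
rule_confidence = confidence(records, {"three-edge needle"}, {"<0.1 mL"})
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")

# The share reported in the abstract can be reproduced directly from its counts.
needle_share = round(100 * 1239 / 1468, 1)
print(f"three-edge needle share: {needle_share}%")
```

Support measures how often a tool and volume level occur together, while confidence measures how often the volume level follows given the tool. A rule is usually kept only when both exceed chosen thresholds.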
Biomedical text mining and its applications in cancer research.
Zhu, Fei; Patumcharoenpol, Preecha; Zhang, Cheng; Yang, Yang; Chan, Jonathan; Meechai, Asawin; Vongsangnak, Wanwipa; Shen, Bairong
2013-04-01
Cancer is a malignant disease that has caused millions of human deaths. Its study has a long history of well over 100 years. There have been an enormous number of publications on cancer research. This integrated but unstructured biomedical text is of great value for cancer diagnostics, treatment, and prevention. The immense body and rapid growth of biomedical text on cancer has led to the appearance of a large number of text mining techniques aimed at extracting novel knowledge from scientific text. Biomedical text mining on cancer research is computationally automatic and high-throughput in nature. However, it is error-prone due to the complexity of natural language processing. In this review, we introduce the basic concepts underlying text mining and examine some frequently used algorithms, tools, and data sets, as well as assessing how much these algorithms have been utilized. We then discuss the current state-of-the-art text mining applications in cancer research and we also provide some resources for cancer text mining. With the development of systems biology, researchers tend to understand complex biomedical systems from a systems biology viewpoint. Thus, the full utilization of text mining to facilitate cancer systems biology research is fast becoming a major concern. To address this issue, we describe the general workflow of text mining in cancer systems biology and each phase of the workflow. We hope that this review can (i) provide a useful overview of the current work of this field; (ii) help researchers to choose text mining tools and datasets; and (iii) highlight how to apply text mining to assist cancer systems biology research. Copyright © 2012 Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Yu, Chong Ho; Jannasch-Pennell, Angel; DiGangi, Samuel
2011-01-01
The objective of this article is to illustrate that text mining and qualitative research are epistemologically compatible. First, like many qualitative research approaches, such as grounded theory, text mining encourages open-mindedness and discourages preconceptions. Contrary to the popular belief that text mining is a linear and fully automated…
Text mining meets workflow: linking U-Compare with Taverna
Kano, Yoshinobu; Dobson, Paul; Nakanishi, Mio; Tsujii, Jun'ichi; Ananiadou, Sophia
2010-01-01
Summary: Text mining from the biomedical literature is of increasing importance, yet it is not easy for the bioinformatics community to create and run text mining workflows due to the lack of accessibility and interoperability of the text mining resources. The U-Compare system provides a wide range of bio text mining resources in a highly interoperable workflow environment where workflows can very easily be created, executed, evaluated and visualized without coding. We have linked U-Compare to Taverna, a generic workflow system, to expose text mining functionality to the bioinformatics community. Availability: http://u-compare.org/taverna.html, http://u-compare.org Contact: kano@is.s.u-tokyo.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20709690
Survey of Natural Language Processing Techniques in Bioinformatics.
Zeng, Zhiqiang; Shi, Hua; Wu, Yun; Hong, Zhiling
2015-01-01
Informatics methods, such as text mining and natural language processing, are always involved in bioinformatics research. In this study, we discuss text mining and natural language processing methods in bioinformatics from two perspectives. First, we aim to search for knowledge on biology, retrieve references using text mining methods, and reconstruct databases. For example, protein-protein interactions and gene-disease relationships can be mined from PubMed. Then, we analyze the applications of text mining and natural language processing techniques in bioinformatics, including predicting protein structure and function and detecting noncoding RNA. Finally, numerous methods and applications, as well as their contributions to bioinformatics, are discussed for future use by text mining and natural language processing researchers.
Text Mining in Organizational Research
Kobayashi, Vladimer B.; Berkers, Hannah A.; Kismihók, Gábor; Den Hartog, Deanne N.
2017-01-01
Despite the ubiquity of textual data, so far few researchers have applied text mining to answer organizational research questions. Text mining, which essentially entails a quantitative approach to the analysis of (usually) voluminous textual data, helps accelerate knowledge discovery by radically increasing the amount of data that can be analyzed. This article aims to acquaint organizational researchers with the fundamental logic underpinning text mining, the analytical stages involved, and contemporary techniques that may be used to achieve different types of objectives. The specific analytical techniques reviewed are (a) dimensionality reduction, (b) distance and similarity computing, (c) clustering, (d) topic modeling, and (e) classification. We describe how text mining may extend contemporary organizational research by allowing the testing of existing or new research questions with data that are likely to be rich, contextualized, and ecologically valid. After an exploration of how evidence for the validity of text mining output may be generated, we conclude the article by illustrating the text mining process in a job analysis setting using a dataset composed of job vacancies. PMID:29881248
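Of the techniques listed in this abstract, "distance and similarity computing" is the easiest to show in a few lines. The sketch below assumes a plain bag-of-words representation and three invented job-vacancy snippets; it is only an illustration of cosine similarity, not the procedure used in the article, and `term_vector` and `cosine_similarity` are hypothetical helper names.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts for a whitespace-tokenized, lower-cased text."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical vacancy texts, invented for illustration.
vacancy_a = term_vector("data analyst sql reporting")
vacancy_b = term_vector("data engineer sql pipelines")
vacancy_c = term_vector("nurse patient care")

print(cosine_similarity(vacancy_a, vacancy_b))
print(cosine_similarity(vacancy_a, vacancy_c))
```

Vacancies that share more vocabulary score closer to 1, so a matrix of such scores can feed the clustering step that the abstract lists next.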
Text Mining in Organizational Research.
Kobayashi, Vladimer B; Mol, Stefan T; Berkers, Hannah A; Kismihók, Gábor; Den Hartog, Deanne N
2018-07-01
Despite the ubiquity of textual data, so far few researchers have applied text mining to answer organizational research questions. Text mining, which essentially entails a quantitative approach to the analysis of (usually) voluminous textual data, helps accelerate knowledge discovery by radically increasing the amount of data that can be analyzed. This article aims to acquaint organizational researchers with the fundamental logic underpinning text mining, the analytical stages involved, and contemporary techniques that may be used to achieve different types of objectives. The specific analytical techniques reviewed are (a) dimensionality reduction, (b) distance and similarity computing, (c) clustering, (d) topic modeling, and (e) classification. We describe how text mining may extend contemporary organizational research by allowing the testing of existing or new research questions with data that are likely to be rich, contextualized, and ecologically valid. After an exploration of how evidence for the validity of text mining output may be generated, we conclude the article by illustrating the text mining process in a job analysis setting using a dataset composed of job vacancies.
Text mining for adverse drug events: the promise, challenges, and state of the art.
Harpaz, Rave; Callahan, Alison; Tamang, Suzanne; Low, Yen; Odgers, David; Finlayson, Sam; Jung, Kenneth; LePendu, Paea; Shah, Nigam H
2014-10-01
Text mining is the computational process of extracting meaningful information from large amounts of unstructured text. It is emerging as a tool to leverage underutilized data sources that can improve pharmacovigilance, including the objective of adverse drug event (ADE) detection and assessment. This article provides an overview of recent advances in pharmacovigilance driven by the application of text mining, and discusses several data sources (such as biomedical literature, clinical narratives, product labeling, social media, and Web search logs) that are amenable to text mining for pharmacovigilance. Given the state of the art, it appears text mining can be applied to extract useful ADE-related information from multiple textual sources. Nonetheless, further research is required to address remaining technical challenges associated with the text mining methodologies, and to conclusively determine the relative contribution of each textual source to improving pharmacovigilance.
Health Terrain: Visualizing Large Scale Health Data
2015-12-01
Subject terms: public health data; notifiable condition detector; text mining; data mining. ... text mining algorithms to construct a concept space. A browser-based user interface is developed to ...
Introducing Text Analytics as a Graduate Business School Course
ERIC Educational Resources Information Center
Edgington, Theresa M.
2011-01-01
Text analytics refers to the process of analyzing unstructured data from documented sources, including open-ended surveys, blogs, and other types of web dialog. Text analytics has enveloped the concept of text mining, an analysis approach influenced heavily from data mining. While text mining has been covered extensively in various computer…
Yang, Jinyan; Tang, Ya; Yang, Kai; Rouff, Ashaki A; Elzinga, Evert J; Huang, Jen-How
2014-01-15
A series of column leaching experiments were performed to understand the leaching behaviour and the potential environmental risk of vanadium in a Panzhihua soil and vanadium titanomagnetite mine tailings. Results from sequential extraction experiments indicated that the mobility of vanadium in both the soil and the mine tailings was low, with <1% of the total vanadium readily mobilised. Column experiments revealed that only <0.1% of vanadium in the soil and mine tailing was leachable. The vanadium concentrations in the soil leachates did not vary considerably, but decreased with the leachate volume in the mine tailing leachates. This suggests that there was a smaller pool of leachable vanadium in the mine tailings compared to that in the soil. Drought and rewetting increased the vanadium concentrations in the soil and mine tailing leachates from 20 μg/L to 50-90 μg/L, indicating the potential for high vanadium release following periods of drought. Experiments with soil columns overlain with 4, 8 and 20% volume mine tailings/volume soil exhibited very similar vanadium leaching behaviour. These results suggest that the transport of vanadium to the subsurface is controlled primarily by the leaching processes occurring in soils. Copyright © 2013 Elsevier B.V. All rights reserved.
Van Landeghem, Sofie; De Bodt, Stefanie; Drebert, Zuzanna J; Inzé, Dirk; Van de Peer, Yves
2013-03-01
Despite the availability of various data repositories for plant research, a wealth of information currently remains hidden within the biomolecular literature. Text mining provides the necessary means to retrieve these data through automated processing of texts. However, only recently has advanced text mining methodology been implemented with sufficient computational power to process texts at a large scale. In this study, we assess the potential of large-scale text mining for plant biology research in general and for network biology in particular using a state-of-the-art text mining system applied to all PubMed abstracts and PubMed Central full texts. We present extensive evaluation of the textual data for Arabidopsis thaliana, assessing the overall accuracy of this new resource for usage in plant network analyses. Furthermore, we combine text mining information with both protein-protein and regulatory interactions from experimental databases. Clusters of tightly connected genes are delineated from the resulting network, illustrating how such an integrative approach is essential to grasp the current knowledge available for Arabidopsis and to uncover gene information through guilt by association. All large-scale data sets, as well as the manually curated textual data, are made publicly available, hereby stimulating the application of text mining data in future plant biology studies.
Volume II investigates the potential radiogenic risks from abandoned uranium mines and evaluates which may pose the greatest hazards to members of the public and to the environment. The intent of this report is to identify who may be most likely to be exposed to wastes at small a...
Integration of Text- and Data-Mining Technologies for Use in Banking Applications
NASA Astrophysics Data System (ADS)
Maslankowski, Jacek
Unstructured data, most of it in the form of text files, typically accounts for 85% of an organization's knowledge stores, but it is not always easy to find, access, analyze or use (Robb 2004). That is why it is important to use solutions based on both text mining and data mining, an approach known as duo mining. Duo mining improves management by drawing on the knowledge held within the organization. The results are interesting. Data mining deals with structured data, usually supplied by data warehouses. Text mining, sometimes called web mining, looks for patterns in unstructured data such as memos, documents and web pages. Integrating text-based information with structured data enriches predictive modeling capabilities and provides new stores of insightful and valuable information for driving business and research initiatives forward.
MINE WASTE TECHNOLOGY PROGRAM: A SUCCESS STORY
Mining Waste generated by active and inactive mining operations is a growing problem for the mining industry, local governments, and Native American communities because of its impact on human health and the environment. In the US, the reported volume of mine waste is immense: 2 b...
NASA Astrophysics Data System (ADS)
Lathrop, John D.
1995-06-01
This paper describes the sea mine countermeasures developmental context, technology goals, and progress to date of the two principal Office of Naval Research exploratory development programs addressing sea mine reconnaissance and minehunting technology development. The first of these programs, High Area Rate Reconnaissance, is developing toroidal volume search sonar technology, sidelooking sonar technology, and associated signal processing technologies (motion compensation, beamforming, and computer-aided detection and classification) for reconnaissance and hunting against volume mines and proud bottom mines from 21-inch diameter vehicles operating in deeper waters. The second of these programs, Amphibious Operation Area Mine Reconnaissance/Hunter, is developing a suite of sensor technologies (synthetic aperture sonar, ahead-looking sonar, superconducting magnetic field gradiometer, and electro-optic sensor) and associated signal processing technologies for reconnaissance and hunting against all mine types (including buried mines) in shallow water and very shallow water from 21-inch diameter vehicles. The technologies under development by these two programs must provide excellent capabilities for mine detection, mine classification, and discrimination against false targets.
Smith, S. Jerrod
2013-01-01
From the 1890s through the 1970s the Picher mining district in northeastern Ottawa County, Oklahoma, was the site of mining and processing of lead and zinc ore. When mining ceased in about 1979, as much as 165–300 million tons of mine tailings, locally referred to as “chat,” remained in the Picher mining district. Since 1979, some chat piles have been mined for aggregate materials and have decreased in volume and mass. Currently (2013), the land surface in the Picher mining district is covered by thousands of acres of chat, much of which remains on Indian trust land owned by allottees. The Bureau of Indian Affairs manages these allotted lands and oversees the sale and removal of chat from these properties. To help the Bureau of Indian Affairs better manage the sale and removal of chat, the U.S. Geological Survey, in cooperation with the Bureau of Indian Affairs, estimated the 2005 and 2010 volumes and masses of selected chat piles remaining on allotted lands in the Picher mining district. The U.S. Geological Survey also estimated the changes in volume and mass of these chat piles for the period 2005 through 2010. The 2005 and 2010 chat-pile volume and mass estimates were computed for 34 selected chat piles on 16 properties in the study area. All computations of volume and mass were performed on individual chat piles and on groups of chat piles in the same property. The Sooner property had the greatest estimated volume (4.644 million cubic yards) and mass (5.253 ± 0.473 million tons) of chat in 2010. Five of the selected properties (Sooner, Western, Lawyers, Skelton, and St. Joe) contained estimated chat volumes exceeding 1 million cubic yards and estimated chat masses exceeding 1 million tons in 2010. Four of the selected properties (Lucky Bill Humbah, Ta Mee Heh, Bird Dog, and St. Louis No. 6) contained estimated chat volumes of less than 0.1 million cubic yards and estimated chat masses of less than 0.1 million tons in 2010. 
The total volume of all selected chat piles was estimated to be 18.073 million cubic yards in 2005 and 16.171 million cubic yards in 2010. The total mass of all selected chat piles was estimated to be 20.445 ± 1.840 million tons in 2005 and 18.294 ± 1.646 million tons in 2010. All of the selected chat piles decreased in volume and mass for the period 2005 through 2010. Chat piles CP022 (Ottawa property) and CP013 (Sooner property) had some within-property chat-pile redistribution, with both chat piles having net decreases in volume and mass for the period 2005 through 2010. The Sooner property and the St. Joe property had the greatest volume (and mass) changes, with 1.266 million cubic yards and 0.217 million cubic yards (1.432 ± 0.129 million tons and 0.246 ± 0.022 million tons) of chat being removed, respectively. The chat removed from the Sooner and St. Joe properties accounts for about 78 percent of the chat removed from all selected chat piles and properties. The total volume and mass removed from all selected chat piles for the period 2005 through 2010 were estimated to be 1.902 million cubic yards and 2.151 ± 0.194 million tons, respectively.
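The totals reported in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures quoted above (in million cubic yards and million tons); the variable names are made up, and the implied tons-per-cubic-yard ratio is a derived quantity that the abstract itself does not state.

```python
# Reported totals for all selected chat piles (million cubic yards, million tons).
volume_2005, volume_2010 = 18.073, 16.171
mass_2005, mass_2010 = 20.445, 18.294

# Net removal, 2005 through 2010; rounded to the precision used in the abstract.
removed_volume = round(volume_2005 - volume_2010, 3)
removed_mass = round(mass_2005 - mass_2010, 3)

# Removal reported for the two properties with the largest changes.
sooner_volume, st_joe_volume = 1.266, 0.217
share_percent = 100 * (sooner_volume + st_joe_volume) / removed_volume

# Derived (not stated in the abstract): implied mass per unit volume in 2010.
tons_per_cubic_yard = mass_2010 / volume_2010

print(f"removed volume: {removed_volume} million cubic yards")
print(f"removed mass:   {removed_mass} million tons")
print(f"Sooner + St. Joe share of removed volume: {share_percent:.0f}%")
print(f"implied tons per cubic yard (2010): {tons_per_cubic_yard:.2f}")
```

The differences reproduce the stated 1.902 million cubic yards and 2.151 million tons, and the two properties together come to roughly 78 percent of the removed volume, matching the abstract.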
SparkText: Biomedical Text Mining on Big Data Framework.
Ye, Zhan; Tafti, Ahmad P; He, Karen Y; Wang, Kai; He, Max M
Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research.
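To make the classification step concrete, here is a minimal pure-Python sketch of a multinomial Naïve Bayes text classifier with add-one smoothing. It is not the SparkText implementation, which the abstract describes as built on Apache Spark and Cassandra; the class name `TinyNaiveBayes`, the tokenizer and the six training snippets are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocabulary = set()
        for text, label in zip(documents, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(doc_count / total_docs)  # log prior
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training snippets standing in for article text.
documents = [
    "brca1 mutation in breast tumor",
    "mammography screening breast",
    "psa level prostate biopsy",
    "androgen therapy prostate tumor",
    "smoking lung nodule",
    "egfr mutation lung adenocarcinoma",
]
labels = ["breast", "breast", "prostate", "prostate", "lung", "lung"]

model = TinyNaiveBayes().fit(documents, labels)
print(model.predict("prostate biopsy psa"))
print(model.predict("lung adenocarcinoma egfr"))
```

On the runtime figures quoted above, 11 hours against roughly 6 minutes is a speed-up of more than 100-fold, which comes from distributing the work across a cluster rather than from the choice of classifier.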
SparkText: Biomedical Text Mining on Big Data Framework
He, Karen Y.; Wang, Kai
2016-01-01
Background: Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. Results: In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. Conclusions: This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research. PMID:27685652
ERIC Educational Resources Information Center
Qin, Jian; Jurisica, Igor; Liddy, Elizabeth D.; Jansen, Bernard J; Spink, Amanda; Priss, Uta; Norton, Melanie J.
2000-01-01
These six articles discuss knowledge discovery in databases (KDD). Topics include data mining; knowledge management systems; applications of knowledge discovery; text and Web mining; text mining and information retrieval; user search patterns through Web log analysis; concept analysis; data collection; and data structure inconsistency. (LRW)
Using Perilog to Explore "Decision Making at NASA"
NASA Technical Reports Server (NTRS)
McGreevy, Michael W.
2005-01-01
Perilog, a context intensive text mining system, is used as a discovery tool to explore topics and concerns in "Decision Making at NASA," chapter 6 of the Columbia Accident Investigation Board (CAIB) Report, Volume I. Two examples illustrate how Perilog can be used to discover highly significant safety-related information in the text without prior knowledge of the contents of the document. A third example illustrates how "if-then" statements found by Perilog can be used in logical analysis of decision making. In addition, in order to serve as a guide for future work, the technical details of preparing a PDF document for input to Perilog are included in an appendix.
Westergaard, David; Stærfeldt, Hans-Henrik; Tønsberg, Christian; Jensen, Lars Juhl; Brunak, Søren
2018-02-01
Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, due to their availability. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823-2016. We describe the development in article length and publication sub-topics during these nearly 250 years. We showcase the potential of text mining by extracting published protein-protein, disease-gene, and protein subcellular associations using a named entity recognition system, and quantitatively report on their accuracy using gold standard benchmark data sets. We subsequently compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full-text articles consistently outperforms using abstracts only.
Westergaard, David; Stærfeldt, Hans-Henrik
2018-01-01
Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, due to their availability. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823–2016. We describe the development in article length and publication sub-topics during these nearly 250 years. We showcase the potential of text mining by extracting published protein–protein, disease–gene, and protein subcellular associations using a named entity recognition system, and quantitatively report on their accuracy using gold standard benchmark data sets. We subsequently compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full-text articles consistently outperforms using abstracts only. PMID:29447159
PubRunner: A light-weight framework for updating text mining results.
Anekalla, Kishore R; Courneya, J P; Fiorini, Nicolas; Lever, Jake; Muchow, Michael; Busby, Ben
2017-01-01
Biomedical text mining promises to assist biologists in quickly navigating the combined knowledge in their domain. This would allow improved understanding of the complex interactions within biological systems and faster hypothesis generation. New biomedical research articles are published daily and text mining tools are only as good as the corpus from which they work. Many text mining tools are underused because their results are static and do not reflect the constantly expanding knowledge in the field. In order for biomedical text mining to become an indispensable tool used by researchers, this problem must be addressed. To this end, we present PubRunner, a framework for regularly running text mining tools on the latest publications. PubRunner is lightweight, simple to use, and can be integrated with an existing text mining tool. The workflow involves downloading the latest abstracts from PubMed, executing a user-defined tool, pushing the resulting data to a public FTP or Zenodo dataset, and publicizing the location of these results on the public PubRunner website. We illustrate the use of this tool by re-running the commonly used word2vec tool on the latest PubMed abstracts to generate up-to-date word vector representations for the biomedical domain. This shows a proof of concept that we hope will encourage text mining developers to build tools that truly will aid biologists in exploring the latest publications.
Text mining applications in psychiatry: a systematic literature review.
Abbe, Adeline; Grouin, Cyril; Zweigenbaum, Pierre; Falissard, Bruno
2016-06-01
The expansion of biomedical literature is creating the need for efficient tools to keep pace with increasing volumes of information. Text mining (TM) approaches are becoming essential to facilitate the automated extraction of useful biomedical information from unstructured text. We reviewed the applications of TM in psychiatry, and explored its advantages and limitations. A systematic review of the literature was carried out using the CINAHL, Medline, EMBASE, PsycINFO and Cochrane databases. In this review, 1103 papers were screened, and 38 were included as applications of TM in psychiatric research. Using TM and content analysis, we identified four major areas of application: (1) Psychopathology (i.e. observational studies focusing on mental illnesses), (2) the Patient perspective (i.e. patients' thoughts and opinions), (3) Medical records (i.e. safety issues, quality of care and description of treatments), and (4) Medical literature (i.e. identification of new scientific information in the literature). The information sources were qualitative studies, Internet postings, medical records and biomedical literature. Our work demonstrates that TM can contribute to complex research tasks in psychiatry. We discuss the benefits, limits, and further applications of this tool in the future. Copyright © 2015 John Wiley & Sons, Ltd.
Van Landeghem, Sofie; De Bodt, Stefanie; Drebert, Zuzanna J.; Inzé, Dirk; Van de Peer, Yves
2013-01-01
Despite the availability of various data repositories for plant research, a wealth of information currently remains hidden within the biomolecular literature. Text mining provides the necessary means to retrieve these data through automated processing of texts. However, only recently has advanced text mining methodology been implemented with sufficient computational power to process texts at a large scale. In this study, we assess the potential of large-scale text mining for plant biology research in general and for network biology in particular using a state-of-the-art text mining system applied to all PubMed abstracts and PubMed Central full texts. We present extensive evaluation of the textual data for Arabidopsis thaliana, assessing the overall accuracy of this new resource for usage in plant network analyses. Furthermore, we combine text mining information with both protein–protein and regulatory interactions from experimental databases. Clusters of tightly connected genes are delineated from the resulting network, illustrating how such an integrative approach is essential to grasp the current knowledge available for Arabidopsis and to uncover gene information through guilt by association. All large-scale data sets, as well as the manually curated textual data, are made publicly available, hereby stimulating the application of text mining data in future plant biology studies. PMID:23532071
Frontiers of biomedical text mining: current progress
Zweigenbaum, Pierre; Demner-Fushman, Dina; Yu, Hong; Cohen, Kevin B.
2008-01-01
It is now almost 15 years since the publication of the first paper on text mining in the genomics domain, and decades since the first paper on text mining in the medical domain. Enormous progress has been made in the areas of information retrieval, evaluation methodologies and resource construction. Some problems, such as abbreviation-handling, can essentially be considered solved problems, and others, such as identification of gene mentions in text, seem likely to be solved soon. However, a number of problems at the frontiers of biomedical text mining continue to present interesting challenges and opportunities for great improvements and interesting research. In this article we review the current state of the art in biomedical text mining or ‘BioNLP’ in general, focusing primarily on papers published within the past year. PMID:17977867
Automated detection of follow-up appointments using text mining of discharge records.
Ruud, Kari L; Johnson, Matthew G; Liesinger, Juliette T; Grafft, Carrie A; Naessens, James M
2010-06-01
To determine whether text mining can accurately detect specific follow-up appointment criteria in free-text hospital discharge records. Cross-sectional study. Mayo Clinic Rochester hospitals. Inpatients discharged from general medicine services in 2006 (n = 6481). Textual hospital dismissal summaries were manually reviewed to determine whether the records contained specific follow-up appointment arrangement elements: date, time and either physician or location for an appointment. The data set was evaluated for the same criteria using SAS Text Miner software. The two assessments were compared to determine the accuracy of text mining for detecting records containing follow-up appointment arrangements. Agreement of text-mined appointment findings with gold standard (manual abstraction) including sensitivity, specificity, positive predictive and negative predictive values (PPV and NPV). About 55.2% (3576) of discharge records contained all criteria for follow-up appointment arrangements according to the manual review, 3.2% (113) of which were missed through text mining. Text mining incorrectly identified 3.7% (107) follow-up appointments that were not considered valid through manual review. Therefore, the text mining analysis concurred with the manual review in 96.6% of the appointment findings. Overall sensitivity and specificity were 96.8 and 96.3%, respectively; and PPV and NPV were 97.0 and 96.1%, respectively. Analysis of individual appointment criteria resulted in accuracy rates of 93.5% for date, 97.4% for time, 97.5% for physician and 82.9% for location. Text mining of unstructured hospital dismissal summaries can accurately detect documentation of follow-up appointment arrangement elements, thus saving considerable resources for performance assessment and quality-related research.
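The agreement figures in this abstract can be reproduced from the counts it reports. The sketch below derives the confusion-matrix cells from n = 6481, the 3576 records positive on manual review, the 113 missed by text mining and the 107 incorrectly identified; the function and variable names are illustrative assumptions, and treating the 107 as false positives among the manually negative records is our reading of the abstract.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard agreement measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Counts taken from the abstract; the four cells are derived from them.
TOTAL = 6481          # discharge records reviewed
MANUAL_POSITIVE = 3576  # records with all criteria per manual review
MISSED = 113          # false negatives of text mining
FALSE_ALARMS = 107    # false positives of text mining

TP = MANUAL_POSITIVE - MISSED
TN = (TOTAL - MANUAL_POSITIVE) - FALSE_ALARMS

METRICS = diagnostic_metrics(tp=TP, fp=FALSE_ALARMS, fn=MISSED, tn=TN)
for name, value in METRICS.items():
    print(f"{name}: {100 * value:.1f}%")
```

Rounded to one decimal place, these values agree with the sensitivity, specificity, PPV, NPV and overall concurrence quoted in the abstract.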
Automatic target validation based on neuroscientific literature mining for tractography
Vasques, Xavier; Richardet, Renaud; Hill, Sean L.; Slater, David; Chappelier, Jean-Cedric; Pralong, Etienne; Bloch, Jocelyne; Draganski, Bogdan; Cif, Laura
2015-01-01
Target identification for tractography studies requires solid anatomical knowledge validated by an extensive literature review across species for each seed structure to be studied. Manual literature review to identify targets for a given seed region is tedious and potentially subjective. Therefore, complementary approaches would be useful. We propose to use text-mining models to automatically suggest potential targets from the neuroscientific literature, full-text articles and abstracts, so that they can be used for anatomical connection studies and more specifically for tractography. We applied text-mining models to three structures: two well-studied structures that are validated deep brain stimulation targets, the internal globus pallidus and the subthalamic nucleus, and the nucleus accumbens, an exploratory target for treating psychiatric disorders. We performed a systematic review of the literature to document the projections of the three selected structures and compared it with the targets proposed by text-mining models, both in rat and primate (including human). We ran probabilistic tractography on the nucleus accumbens and compared the output with the results of the text-mining models and literature review. Overall, text-mining the literature could find three times as many targets as two man-weeks of curation could. The overall efficiency of the text-mining against literature review in our study was 98% recall (at 36% precision), meaning that over all the targets for the three selected seeds, only one target has been missed by text-mining. We demonstrate that connectivity for a structure of interest can be extracted from a very large amount of publications and abstracts. We believe this tool will be useful in helping the neuroscience community to facilitate connectivity studies of particular brain regions. The text mining tools used for the study are part of the HBP Neuroinformatics Platform, publicly available at http://connectivity-brainer.rhcloud.com/.
PMID:26074781
Code of Federal Regulations, 2014 CFR
2014-07-01
... which may be discharged from an open-cut mine plant site shall not exceed the volume of infiltration... site shall not exceed the volume of infiltration, drainage and mine drainage waters which is in excess...
Code of Federal Regulations, 2012 CFR
2012-07-01
... which may be discharged from an open-cut mine plant site shall not exceed the volume of infiltration... site shall not exceed the volume of infiltration, drainage and mine drainage waters which is in excess...
Deformation Failure Characteristics of Coal Body and Mining Induced Stress Evolution Law
Wen, Zhijie; Wen, Jinhao; Shi, Yongkui; Jia, Chuanyang
2014-01-01
The results of the interaction between coal failure and the evolution of the mining pressure field during mining are presented. The mechanical model of the stope and its relative structure division, as well as the failure and behavior characteristics of the coal body at different mining stages, are built and demonstrated. Namely, the breaking arch and stress arch that influence the mining area are calculated quantitatively. A systematic method for determining the stress field distribution is worked out. The results indicate that the pore distribution of a coal body with different compressed volumes has a fractal character; there appears to be a linear relationship between the propagation range of the internal stress field and the compressed volume of the coal body, and a nonlinear relationship between the range of outburst coal mass and the number of pores, which is influenced by mining pressure. The results provide a theoretical reference for research on the range of mining-induced stress and the broken coal wall. PMID:24967438
Microwave assisted hard rock cutting
Lindroth, David P.; Morrell, Roger J.; Blair, James R.
1991-01-01
An apparatus for the sequential fracturing and cutting of a subsurface volume of hard rock (102) in the strata (101) of a mining environment (100) by subjecting the volume of rock to a beam (25) of microwave energy to fracture the subsurface volume of rock by differential expansion, and then bringing the cutting edge (52) of a piece of conventional mining machinery (50) into contact with the fractured rock (102).
Adaptive semantic tag mining from heterogeneous clinical research texts.
Hao, T; Weng, C
2015-01-01
To develop an adaptive approach to mine frequent semantic tags (FSTs) from heterogeneous clinical research texts. We develop a "plug-n-play" framework that integrates replaceable unsupervised kernel algorithms with formatting, functional, and utility wrappers for FST mining. Temporal information identification and semantic equivalence detection were two example functional wrappers. We first compared this approach's recall and efficiency for mining FSTs from ClinicalTrials.gov to that of a recently published tag-mining algorithm. Then we assessed this approach's adaptability to two other types of clinical research texts: clinical data requests and clinical trial protocols, by comparing the prevalence trends of FSTs across three texts. Our approach increased the average recall and speed by 12.8% and 47.02% respectively upon the baseline when mining FSTs from ClinicalTrials.gov, and maintained an overlap in relevant FSTs with the baseline ranging between 76.9% and 100% for varying FST frequency thresholds. The FSTs saturated when the data size reached 200 documents. Consistent trends in the prevalence of FST were observed across the three texts as the data size or frequency threshold changed. This paper contributes an adaptive tag-mining framework that is scalable and adaptable without sacrificing its recall. This component-based architectural design can be potentially generalizable to improve the adaptability of other clinical text mining methods.
Text mining resources for the life sciences.
Przybyła, Piotr; Shardlow, Matthew; Aubin, Sophie; Bossy, Robert; Eckart de Castilho, Richard; Piperidis, Stelios; McNaught, John; Ananiadou, Sophia
2016-01-01
Text mining is a powerful technology for quickly distilling key information from vast quantities of biomedical literature. However, to harness this power the researcher must be well versed in the availability, suitability, adaptability, interoperability and comparative accuracy of current text mining resources. In this survey, we give an overview of the text mining resources that exist in the life sciences to help researchers, especially those employed in biocuration, to engage with text mining in their own work. We categorize the various resources under three sections: Content Discovery looks at where and how to find biomedical publications for text mining; Knowledge Encoding describes the formats used to represent the different levels of information associated with content that enable text mining, including those formats used to carry such information between processes; Tools and Services gives an overview of workflow management systems that can be used to rapidly configure and compare domain- and task-specific processes, via access to a wide range of pre-built tools. We also provide links to relevant repositories in each section to enable the reader to find resources relevant to their own area of interest. Throughout this work we give a special focus to resources that are interoperable: those that have the crucial ability to share information, enabling smooth integration and reusability. © The Author(s) 2016. Published by Oxford University Press.
Chapter 16: text mining for translational bioinformatics.
Cohen, K Bretonnel; Hunter, Lawrence E
2013-04-01
Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research (translating basic science results into new interventions) and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects, and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.
Text mining resources for the life sciences
Shardlow, Matthew; Aubin, Sophie; Bossy, Robert; Eckart de Castilho, Richard; Piperidis, Stelios; McNaught, John; Ananiadou, Sophia
2016-01-01
Text mining is a powerful technology for quickly distilling key information from vast quantities of biomedical literature. However, to harness this power the researcher must be well versed in the availability, suitability, adaptability, interoperability and comparative accuracy of current text mining resources. In this survey, we give an overview of the text mining resources that exist in the life sciences to help researchers, especially those employed in biocuration, to engage with text mining in their own work. We categorize the various resources under three sections: Content Discovery looks at where and how to find biomedical publications for text mining; Knowledge Encoding describes the formats used to represent the different levels of information associated with content that enable text mining, including those formats used to carry such information between processes; Tools and Services gives an overview of workflow management systems that can be used to rapidly configure and compare domain- and task-specific processes, via access to a wide range of pre-built tools. We also provide links to relevant repositories in each section to enable the reader to find resources relevant to their own area of interest. Throughout this work we give a special focus to resources that are interoperable—those that have the crucial ability to share information, enabling smooth integration and reusability. PMID:27888231
Cohen, Raphael; Elhadad, Michael; Elhadad, Noémie
2013-01-16
The increasing availability of Electronic Health Record (EHR) data and specifically free-text patient notes presents opportunities for phenotype extraction. Text-mining methods in particular can help disease modeling by mapping named-entities mentions to terminologies and clustering semantically related terms. EHR corpora, however, exhibit specific statistical and linguistic characteristics when compared with corpora in the biomedical literature domain. We focus on copy-and-paste redundancy: clinicians typically copy and paste information from previous notes when documenting a current patient encounter. Thus, within a longitudinal patient record, one expects to observe heavy redundancy. In this paper, we ask three research questions: (i) How can redundancy be quantified in large-scale text corpora? (ii) Conventional wisdom is that larger corpora yield better results in text mining. But how does the observed EHR redundancy affect text mining? Does such redundancy introduce a bias that distorts learned models? Or does the redundancy introduce benefits by highlighting stable and important subsets of the corpus? (iii) How can one mitigate the impact of redundancy on text mining? We analyze a large-scale EHR corpus and quantify redundancy both in terms of word and semantic concept repetition. We observe redundancy levels of about 30% and non-standard distribution of both words and concepts. We measure the impact of redundancy on two standard text-mining applications: collocation identification and topic modeling. We compare the results of these methods on synthetic data with controlled levels of redundancy and observe significant performance variation. Finally, we compare two mitigation strategies to avoid redundancy-induced bias: (i) a baseline strategy, keeping only the last note for each patient in the corpus; (ii) removing redundant notes with an efficient fingerprinting-based algorithm. 
For text mining, preprocessing the EHR corpus with fingerprinting yields significantly better results. Before applying text-mining techniques, one must pay careful attention to the structure of the analyzed corpora. While the importance of data cleaning has been known for low-level text characteristics (e.g., encoding and spelling), high-level and difficult-to-quantify corpus characteristics, such as naturally occurring redundancy, can also hurt text mining. Fingerprinting enables text-mining techniques to leverage available data in the EHR corpus, while avoiding the bias introduced by redundancy.
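The abstract does not specify the fingerprinting algorithm, so the following is only a minimal sketch of one common approach: each note is reduced to a set of hashed word n-grams, and a note is dropped when too large a share of its fingerprints has already been seen in notes that were kept. The n-gram length, the overlap threshold and all function names are assumptions chosen for illustration, not the authors' method.

```python
import hashlib


def fingerprints(text, n=3):
    """Return hashed word n-grams (shingles) of a note."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < n:
        shingles = [" ".join(words)]
    else:
        shingles = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    # Truncated SHA-256 digests serve as compact fingerprints.
    return {hashlib.sha256(s.encode("utf-8")).hexdigest()[:16] for s in shingles}


def drop_redundant(notes, max_overlap=0.3, n=3):
    """Keep a note only if at most max_overlap of its fingerprints were seen before."""
    seen = set()
    kept = []
    for note in notes:
        fp = fingerprints(note, n)
        if not fp:
            continue  # skip empty notes
        overlap = len(fp & seen) / len(fp)
        if overlap <= max_overlap:
            kept.append(note)
            seen |= fp  # only kept notes contribute fingerprints
    return kept


# Invented example notes; the second largely copies the first.
NOTES = [
    "patient reports chest pain and shortness of breath today",
    "patient reports chest pain and shortness of breath today new cough noted",
    "follow up visit for diabetes management with stable glucose",
]
KEPT = drop_redundant(NOTES)
```

In this toy input the second note shares 7 of its 10 trigrams with the first and is removed, while the unrelated third note is kept. Processing notes in chronological order keeps the earliest copy; the baseline strategy in the abstract instead keeps only the last note per patient.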
The mathematical model of radon-222 accumulation in underground mines
NASA Astrophysics Data System (ADS)
Klimshin, A.
2012-04-01
The need to control the radon level in underground mine air arises during the construction and operation of mines and of road and railway tunnels, including metro tunnels. The radon level in underground mine air can also be calculated to estimate the potential radon hazard of an area before an underground structure is built. In this work a new mathematical model of radon accumulation in underground mines is proposed. It takes into account the dimensions of the underground mine, the air exchange factor and the ability of the soil to emanate radon. The model rests on the following assumptions. The underground mine is a cylinder of length $L$ and base area $S$. Through ventilation, atmospheric air with volume activity $C_{atm}$ enters through one base of the cylinder, and air with volume activity $C_{ind}$ leaves the mine. A diffusive radon flux enters through the side surfaces of the mine. The sources of this flux are radium-226 atoms distributed evenly in the rock. To simplify the problem, radon emanation by loosened rock and underground waters is disregarded. Solving the radon diffusion equation gives the following expression for the radon volume activity in the air of the underground space:
$$C_{ind} = \frac{2 r_0 \lambda_v C_{atm}\, l\, K_0(r_0/l) + D\, K_1(r_0/l)\, C_0}{2 (\lambda + \lambda_v)\, r_0\, l\, K_0(r_0/l) + D\, K_1(r_0/l)}$$
The following notation is used in this expression: $K_\nu(r)$ is the modified Bessel function of the second kind, $C_0$ is the equilibrium radon volume activity in soil air, $l$ is the radon diffusion length in soil, $D$ is the radon diffusion factor, $r_0$ is the radius of the underground tunnel, and $\lambda_v$ is the air exchange factor. The expression can be used to calculate the minimum air exchange factor needed to ensure safe radon levels in underground spaces. With this model, the expected levels of radon volume activity were calculated for the air in the underground spaces of the second metro line in the city of Yekaterinburg, Russia.
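The expression for Cind in this abstract can be evaluated numerically. The sketch below reads the flattened formula as a fraction with numerator 2·r0·λv·Catm·l·K0(r0/l) + D·K1(r0/l)·C0 and denominator 2·(λ + λv)·r0·l·K0(r0/l) + D·K1(r0/l); that reading, the interpretation of λ as the radon-222 decay constant, and every numerical input are assumptions for illustration, not values from the paper. The Bessel functions are computed with the standard integral representation K_ν(x) = ∫₀^∞ exp(−x cosh t) cosh(νt) dt so that the example needs only the standard library.

```python
import math


def bessel_k(nu, x, upper=20.0, steps=20000):
    """Modified Bessel function of the second kind, by trapezoidal integration."""
    if x <= 0:
        raise ValueError("x must be positive")
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-x * math.cosh(t)) * math.cosh(nu * t)
    return total * h


def radon_indoor(c_atm, c0, r0, diff_len, d, lam, lam_v):
    """Radon volume activity in tunnel air under the assumed reading of the formula."""
    x = r0 / diff_len
    a = 2.0 * r0 * diff_len * bessel_k(0, x)  # ventilation and decay term
    b = d * bessel_k(1, x)                    # diffusive inflow term
    return (a * lam_v * c_atm + b * c0) / (a * (lam + lam_v) + b)


# Hypothetical inputs (SI units, activities in Bq/m^3).
LAM = 2.1e-6                       # radon-222 decay constant, 1/s
D = 1.0e-6                         # assumed diffusion factor, m^2/s
DIFF_LEN = math.sqrt(D / LAM)      # assumed diffusion length l = sqrt(D / lambda), m
C_LOW_VENT = radon_indoor(10.0, 20000.0, 3.0, DIFF_LEN, D, LAM, lam_v=1 / 3600)
C_HIGH_VENT = radon_indoor(10.0, 20000.0, 3.0, DIFF_LEN, D, LAM, lam_v=2 / 3600)
```

With these inputs, doubling the air exchange factor lowers the computed activity, which is the behaviour one would use to search for the minimum air exchange that keeps radon below a chosen limit.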
ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers.
Xing, Yuting; Wu, Chengkun; Yang, Xi; Wang, Wei; Zhu, En; Yin, Jianping
2018-04-27
A prevailing way of extracting valuable information from biomedical literature is to apply text mining methods on unstructured texts. However, the massive amount of literature that needs to be analyzed poses a big data challenge to the processing efficiency of text mining. In this paper, we address this challenge by introducing parallel processing on a supercomputer. We developed paraBTM, a runnable framework that enables parallel text mining on the Tianhe-2 supercomputer. It employs a low-cost yet effective load balancing strategy to maximize the efficiency of parallel processing. We evaluated the performance of paraBTM on several datasets, utilizing three types of named entity recognition tasks as demonstration. Results show that, in most cases, the processing efficiency can be greatly improved with parallel processing, and the proposed load balancing strategy is simple and effective. In addition, our framework can be readily applied to other tasks of biomedical text mining besides NER.
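The abstract does not describe the load balancing strategy, so the following is a generic sketch of one low-cost option rather than paraBTM's actual method: documents are sorted by size and each is assigned to the currently least-loaded worker (the longest-processing-time-first heuristic). Document size is used as a hypothetical proxy for processing cost, and all names are illustrative.

```python
import heapq


def balance_load(doc_sizes, n_workers):
    """Greedy assignment: largest document first, always to the least-loaded worker."""
    heap = [(0, worker) for worker in range(n_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment = {worker: [] for worker in range(n_workers)}
    # Visit documents from largest to smallest so big items are placed early.
    for doc_id, size in sorted(enumerate(doc_sizes), key=lambda item: -item[1]):
        load, worker = heapq.heappop(heap)
        assignment[worker].append(doc_id)
        heapq.heappush(heap, (load + size, worker))
    loads = [sum(doc_sizes[d] for d in assignment[w]) for w in range(n_workers)]
    return assignment, loads


# Hypothetical document sizes (for example, kilobytes of text).
SIZES = [9, 7, 6, 5, 4, 3]
ASSIGNMENT, LOADS = balance_load(SIZES, n_workers=3)
```

For these six documents and three workers the resulting loads are 12, 11 and 11, close to the ideal of 34/3 per worker. The heuristic costs only a sort plus one heap operation per document, which fits the "low-cost" description, though a static assignment like this cannot react to documents whose processing time is poorly predicted by size.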
Hospitalization patterns associated with Appalachian coal mining.
Hendryx, Michael; Ahern, Melissa M; Nurkiewicz, Timothy R
2007-12-01
The goal of this study was to test whether the volume of coal mining was related to population hospitalization risk for diseases postulated to be sensitive or insensitive to coal mining by-products. The study was a retrospective analysis of 2001 adult hospitalization data (n = 93,952) for West Virginia, Kentucky, and Pennsylvania, merged with county-level coal production figures. Hospitalization data were obtained from the Health Care Utilization Project National Inpatient Sample. Diagnoses postulated to be sensitive to coal mining by-product exposure were contrasted with diagnoses postulated to be insensitive to exposure. Data were analyzed using hierarchical nonlinear models, controlling for patient age, gender, insurance, comorbidities, hospital teaching status, county poverty, and county social capital. Controlling for covariates, the volume of coal mining was significantly related to hospitalization risk for two conditions postulated to be sensitive to exposure: hypertension and chronic obstructive pulmonary disease (COPD). The odds for a COPD hospitalization increased 1% for each 1462 tons of coal, and the odds for a hypertension hospitalization increased 1% for each 1873 tons of coal. Other conditions were not related to mining volume. Exposure to particulates or other pollutants generated by coal mining activities may be linked to increased risk of COPD and hypertension hospitalizations. Limitations in the data likely result in an underestimate of associations.
Text-mining and information-retrieval services for molecular biology
Krallinger, Martin; Valencia, Alfonso
2005-01-01
Text-mining in molecular biology - defined as the automatic extraction of information about genes, proteins and their functional relationships from text documents - has emerged as a hybrid discipline on the edges of the fields of information science, bioinformatics and computational linguistics. A range of text-mining applications have been developed recently that will improve access to knowledge for biologists and database annotators. PMID:15998455
Text mining for traditional Chinese medical knowledge discovery: a survey.
Zhou, Xuezhong; Peng, Yonghong; Liu, Baoyan
2010-08-01
Extracting meaningful information and knowledge from free text is the subject of considerable research interest in the machine learning and data mining fields. Text data mining (or text mining) has become one of the most active research sub-fields in data mining. Significant developments in the area of biomedical text mining during the past years have demonstrated its great promise for supporting scientists in developing novel hypotheses and new knowledge from the biomedical literature. Traditional Chinese medicine (TCM) provides a distinct methodology with which to view human life. It is one of the most complete and distinguished traditional medicines with a history of several thousand years of studying and practicing the diagnosis and treatment of human disease. It has been shown that the TCM knowledge obtained from clinical practice has become a significant complementary source of information for modern biomedical sciences. TCM literature obtained from the historical period and from modern clinical studies has recently been transformed into digital data in the form of relational databases or text documents, which provide an effective platform for information sharing and retrieval. This motivates and facilitates research and development into knowledge discovery approaches and to modernize TCM. In order to contribute to this still growing field, this paper presents (1) a comparative introduction to TCM and modern biomedicine, (2) a survey of the related information sources of TCM, (3) a review and discussion of the state of the art and the development of text mining techniques with applications to TCM, (4) a discussion of the research issues around TCM text mining and its future directions. Copyright 2010 Elsevier Inc. All rights reserved.
Managing biological networks by using text mining and computer-aided curation
NASA Astrophysics Data System (ADS)
Yu, Seok Jong; Cho, Yongseong; Lee, Min-Ho; Lim, Jongtae; Yoo, Jaesoo
2015-11-01
In order to understand a biological mechanism in a cell, a researcher should collect a huge number of protein interactions with experimental data from experiments and the literature. Text mining systems that extract biological interactions from papers have been used to construct biological networks for a few decades. Even though the text mining of literature is necessary to construct a biological network, few systems with a text mining tool are available for biologists who want to construct their own biological networks. We have developed a biological network construction system called BioKnowledge Viewer that can generate a biological interaction network by using a text mining tool and biological taggers. It also includes Boolean simulation software, providing a biological modeling system to simulate the model that is made with the text mining tool. A user can download PubMed articles and construct a biological network by using the Multi-level Knowledge Emergence Model (KMEM), MetaMap, and A Biomedical Named Entity Recognizer (ABNER) as a text mining tool. To evaluate the system, we constructed an aging-related biological network that consists of 9,415 nodes (genes) by using manual curation. With network analysis, we found that several genes, including JNK, AP-1, and BCL-2, were highly related in the aging biological network. We provide a semi-automatic curation environment so that users can obtain a graph database for managing text mining results that are generated in the server system and can navigate the network with BioKnowledge Viewer, which is freely available at http://bioknowledgeviewer.kisti.re.kr.
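As a rough illustration of what Boolean simulation of a mined interaction network involves, the following minimal sketch updates every node synchronously until a state repeats. The node names (A, B, C) and their rules are hypothetical and are not taken from BioKnowledge Viewer or from the aging network described above.

```python
def step(state, rules):
    """Synchronously update every node from the current state."""
    return {node: rule(state) for node, rule in rules.items()}


def simulate(state, rules, max_steps=20):
    """Iterate until a state repeats; return the list of visited states."""
    seen = []
    while state not in seen and len(seen) < max_steps:
        seen.append(state)
        state = step(state, rules)
    return seen


# Hypothetical three-node network: A activates B, B activates C, C inhibits A.
rules = {
    "A": lambda s: not s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["B"],
}
start = {"A": True, "B": False, "C": False}

for t, s in enumerate(simulate(start, rules)):
    print(t, s)
```

With these assumed rules the trajectory returns to its starting state, which is the kind of cyclic attractor a Boolean model is typically used to reveal.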
An overview of the biocreative 2012 workshop track III: Interactive text mining task
USDA-ARS's Scientific Manuscript database
An important question is how to make use of text mining to enhance the biocuration workflow. A number of groups have developed tools for text mining from a computer science/linguistics perspective and there are many initiatives to curate some aspect of biology from the literature. In some cases the ...
Text Mining in Cancer Gene and Pathway Prioritization
Luo, Yuan; Riedlinger, Gregory; Szolovits, Peter
2014-01-01
Prioritization of cancer implicated genes has received growing attention as an effective way to reduce wet lab cost by computational analysis that ranks candidate genes according to the likelihood that experimental verifications will succeed. A multitude of gene prioritization tools have been developed, each integrating different data sources covering gene sequences, differential expressions, function annotations, gene regulations, protein domains, protein interactions, and pathways. This review places existing gene prioritization tools against the backdrop of an integrative Omic hierarchy view toward cancer and focuses on the analysis of their text mining components. We explain the relatively slow progress of text mining in gene prioritization, identify several challenges to current text mining methods, and highlight a few directions where more effective text mining algorithms may improve the overall prioritization task and where prioritizing the pathways may be more desirable than prioritizing only genes. PMID:25392685
40 CFR 440.144 - New source performance standards (NSPS).
Code of Federal Regulations, 2011 CFR
2011-07-01
...-cut mine plant site shall not exceed the volume of infiltration, drainage and mine drainage waters... of infiltration, drainage and mine drainage waters which is in excess of the make up water required...
40 CFR 440.144 - New source performance standards (NSPS).
Code of Federal Regulations, 2010 CFR
2010-07-01
...-cut mine plant site shall not exceed the volume of infiltration, drainage and mine drainage waters... of infiltration, drainage and mine drainage waters which is in excess of the make up water required...
Citation Mining: Integrating Text Mining and Bibliometrics for Research User Profiling.
ERIC Educational Resources Information Center
Kostoff, Ronald N.; del Rio, J. Antonio; Humenik, James A.; Garcia, Esther Ofilia; Ramirez, Ana Maria
2001-01-01
Discusses the importance of identifying the users and impact of research, and describes an approach for identifying the pathways through which research can impact other research, technology development, and applications. Describes a study that used citation mining, an integration of citation bibliometrics and text mining, on articles from the…
MET network in PubMed: a text-mined network visualization and curation system.
Dai, Hong-Jie; Su, Chu-Hsien; Lai, Po-Ting; Huang, Ming-Siang; Jonnagaddala, Jitendra; Rose Jue, Toni; Rao, Shruti; Chou, Hui-Jou; Milacic, Marija; Singh, Onkar; Syed-Abdul, Shabbir; Hsu, Wen-Lian
2016-01-01
Metastasis is the dissemination of a cancer/tumor from one organ to another, and it is the most dangerous stage during cancer progression, causing more than 90% of cancer deaths. Improving the understanding of the complicated cellular mechanisms underlying metastasis requires investigations of the signaling pathways. To this end, we developed a METastasis (MET) network visualization and curation tool to help metastasis researchers retrieve network information of interest while browsing through the large volume of studies in PubMed. MET can recognize relations among genes, cancers, tissues and organs of metastasis mentioned in the literature through text-mining techniques, and then produce a visualization of all mined relations in a metastasis network. To facilitate the curation process, MET is developed as a browser extension that allows curators to review and edit concepts and relations related to metastasis directly in PubMed. PubMed users can also view the metastatic networks integrated from the large collection of research papers directly through MET. For the BioCreative 2015 interactive track (IAT), a curation task was proposed to curate metastatic networks among PubMed abstracts. Six curators participated in the proposed task and a post-IAT task, curating 963 unique metastatic relations from 174 PubMed abstracts using MET. Database URL: http://btm.tmu.edu.tw/metastasisway. © The Author(s) 2016. Published by Oxford University Press.
NASA Astrophysics Data System (ADS)
Valdman, V. V.; Gridnev, S. O.
2017-10-01
The article examines the vital issues of measuring and calculating raw stock volumes in covered storehouses at mining and processing plants. The authors present two state-of-the-art high-technology solutions: (1) using a ground-based laser scanning system (the method is reasonably accurate and dependable, but costly and time consuming; it also requires work in the storehouse to be stopped); (2) using a fundamentally new computerized stocktaking system in mine surveying for ore mineral volume calculation, based on profile digital images. These images are obtained via vertical projection of the laser plane onto the surface of the stored raw materials.
Using text-mining techniques in electronic patient records to identify ADRs from medicine use.
Warrer, Pernille; Hansen, Ebba Holme; Juhl-Jensen, Lars; Aagaard, Lise
2012-05-01
This literature review included studies that use text-mining techniques in narrative documents stored in electronic patient records (EPRs) to investigate ADRs. We searched PubMed, Embase, Web of Science and International Pharmaceutical Abstracts without restrictions from origin until July 2011. We included empirically based studies on text mining of electronic patient records (EPRs) that focused on detecting ADRs, excluding those that investigated adverse events not related to medicine use. We extracted information on study populations, EPR data sources, frequencies and types of the identified ADRs, medicines associated with ADRs, text-mining algorithms used and their performance. Seven studies, all from the United States, were eligible for inclusion in the review. Studies were published from 2001, the majority between 2009 and 2010. Text-mining techniques varied over time from simple free text searching of outpatient visit notes and inpatient discharge summaries to more advanced techniques involving natural language processing (NLP) of inpatient discharge summaries. Performance appeared to increase with the use of NLP, although many ADRs were still missed. Due to differences in study design and populations, various types of ADRs were identified and thus we could not make comparisons across studies. The review underscores the feasibility and potential of text mining to investigate narrative documents in EPRs for ADRs. However, more empirical studies are needed to evaluate whether text mining of EPRs can be used systematically to collect new information about ADRs. © 2011 The Authors. British Journal of Clinical Pharmacology © 2011 The British Pharmacological Society.
Using text-mining techniques in electronic patient records to identify ADRs from medicine use
Warrer, Pernille; Hansen, Ebba Holme; Juhl-Jensen, Lars; Aagaard, Lise
2012-01-01
This literature review included studies that use text-mining techniques in narrative documents stored in electronic patient records (EPRs) to investigate ADRs. We searched PubMed, Embase, Web of Science and International Pharmaceutical Abstracts without restrictions from origin until July 2011. We included empirically based studies on text mining of electronic patient records (EPRs) that focused on detecting ADRs, excluding those that investigated adverse events not related to medicine use. We extracted information on study populations, EPR data sources, frequencies and types of the identified ADRs, medicines associated with ADRs, text-mining algorithms used and their performance. Seven studies, all from the United States, were eligible for inclusion in the review. Studies were published from 2001, the majority between 2009 and 2010. Text-mining techniques varied over time from simple free text searching of outpatient visit notes and inpatient discharge summaries to more advanced techniques involving natural language processing (NLP) of inpatient discharge summaries. Performance appeared to increase with the use of NLP, although many ADRs were still missed. Due to differences in study design and populations, various types of ADRs were identified and thus we could not make comparisons across studies. The review underscores the feasibility and potential of text mining to investigate narrative documents in EPRs for ADRs. However, more empirical studies are needed to evaluate whether text mining of EPRs can be used systematically to collect new information about ADRs. PMID:22122057
Nash, J. Thomas; Stillings, Lisa L.
2004-01-01
Reconnaissance hydrogeochemical studies of the Humboldt River basin and adjacent areas of northern Nevada have identified local sources of acidic waters generated by historical mine workings and mine waste. The mine-related acidic waters are rare and generally flow less than a kilometer before being neutralized by natural processes. Where waters have a pH of less than about 3, particularly in the presence of sulfide minerals, the waters take on high to extremely high concentrations of many potentially toxic metals. The processes that create these acidic, metal-rich waters in Nevada are the same as for other parts of the world, but the scale of transport and the fate of metals are much more localized because of the ubiquitous presence of caliche soils. Acid mine drainage is rare in historical mining districts of northern Nevada, and the volume of drainage rarely exceeds about 20 gpm. My findings are in close agreement with those of Price and others (1995) who estimated that less than 0.05 percent of inactive and abandoned mines in Nevada are likely to be a concern for acid mine drainage. Most historical mining districts have no draining mines. Only in two districts (Hilltop and National) does water affected by mining flow into streams of significant size and length (more than 8 km). Water quality in even the worst cases is naturally attenuated to meet water-quality standards within about 1 km of the source. Only a few historical mines release acidic water with elevated metal concentrations to small streams that reach the Humboldt River, and these contaminants are not detectable in the Humboldt. These reconnaissance studies offer encouraging evidence that abandoned mines in Nevada create only minimal and local water-quality problems. Natural attenuation processes are sufficient to compensate for these relatively small sources of contamination.
These results may provide useful analogs for future mining in the Humboldt River basin, but attention must be given to matters of scale: larger volumes of waste and larger volumes of water could easily overwhelm the delicate balance of natural attenuation described here.
Gene prioritization and clustering by multi-view text mining
2010-01-01
Background: Text mining has become a useful tool for biologists trying to understand the genetics of diseases. In particular, it can help identify the most interesting candidate genes for a disease for further experimental analysis. Many text mining approaches have been introduced, but the effect of disease-gene identification varies in different text mining models. Thus, the idea of incorporating more text mining models may be beneficial to obtain more refined and accurate knowledge. However, how to effectively combine these models still remains a challenging question in machine learning. In particular, it is a non-trivial issue to guarantee that the integrated model performs better than the best individual model. Results: We present a multi-view approach to retrieve biomedical knowledge using different controlled vocabularies. These controlled vocabularies are selected on the basis of nine well-known bio-ontologies and are applied to index the vast amounts of gene-based free-text information available in the MEDLINE repository. The text mining result specified by a vocabulary is considered as a view and the obtained multiple views are integrated by multi-source learning algorithms. We investigate the effect of integration in two fundamental computational disease gene identification tasks: gene prioritization and gene clustering. The performance of the proposed approach is systematically evaluated and compared on real benchmark data sets. In both tasks, the multi-view approach demonstrates significantly better performance than the other methods compared. Conclusions: In practical research, the relevance of a specific vocabulary to the task is usually unknown. In such cases, multi-view text mining is a superior and promising strategy for text-based disease gene identification. PMID:20074336
The Lure of Statistics in Data Mining
ERIC Educational Resources Information Center
Grover, Lovleen Kumar; Mehra, Rajni
2008-01-01
The field of Data Mining like Statistics concerns itself with "learning from data" or "turning data into information". For statisticians the term "Data mining" has a pejorative meaning. Instead of finding useful patterns in large volumes of data as in the case of Statistics, data mining has the connotation of searching for data to fit preconceived…
Utilization of volume correlation filters for underwater mine identification in LIDAR imagery
NASA Astrophysics Data System (ADS)
Walls, Bradley
2008-04-01
Underwater mine identification persists as a critical technology pursued aggressively by the Navy for fleet protection. As such, new and improved techniques must continue to be developed in order to provide measurable increases in mine identification performance and noticeable reductions in false alarm rates. In this paper we show how recent advances in the Volume Correlation Filter (VCF) developed for ground based LIDAR systems can be adapted to identify targets in underwater LIDAR imagery. Current automated target recognition (ATR) algorithms for underwater mine identification employ spatial based three-dimensional (3D) shape fitting of models to LIDAR data to identify common mine shapes consisting of the box, cylinder, hemisphere, truncated cone, wedge, and annulus. VCFs provide a promising alternative to these spatial techniques by correlating 3D models against the 3D rendered LIDAR data.
ERIC Educational Resources Information Center
Mei, Qiaozhu
2009-01-01
With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and the location where a blog article is written, and the…
Small, Aeron M; Kiss, Daniel H; Zlatsin, Yevgeny; Birtwell, David L; Williams, Heather; Guerraty, Marie A; Han, Yuchi; Anwaruddin, Saif; Holmes, John H; Chirinos, Julio A; Wilensky, Robert L; Giri, Jay; Rader, Daniel J
2017-08-01
Interrogation of the electronic health record (EHR) using billing codes as a surrogate for diagnoses of interest has been widely used for clinical research. However, the accuracy of this methodology is variable, as it reflects billing codes rather than severity of disease, and depends on the disease and the accuracy of the coding practitioner. Systematic application of text mining to the EHR has had variable success for the detection of cardiovascular phenotypes. We hypothesize that the application of text mining algorithms to cardiovascular procedure reports may be a superior method to identify patients with cardiovascular conditions of interest. We adapted the Oracle product Endeca, which utilizes text mining to identify terms of interest from a NoSQL-like database, for purposes of searching cardiovascular procedure reports and termed the tool "PennSeek". We imported 282,569 echocardiography reports representing 81,164 individuals and 27,205 cardiac catheterization reports representing 14,567 individuals from non-searchable databases into PennSeek. We then applied clinical criteria to these reports in PennSeek to identify patients with trileaflet aortic stenosis (TAS) and coronary artery disease (CAD). Accuracy of patient identification by text mining through PennSeek was compared with ICD-9 billing codes. Text mining identified 7115 patients with TAS and 9247 patients with CAD. ICD-9 codes identified 8272 patients with TAS and 6913 patients with CAD. 4346 patients with AS and 6024 patients with CAD were identified by both approaches. A randomly selected sample of 200-250 patients uniquely identified by text mining was compared with 200-250 patients uniquely identified by billing codes for both diseases. We demonstrate that text mining was superior, with a positive predictive value (PPV) of 0.95 compared to 0.53 by ICD-9 for TAS, and a PPV of 0.97 compared to 0.86 for CAD. 
These results highlight the superiority of text mining algorithms applied to electronic cardiovascular procedure reports in the identification of phenotypes of interest for cardiovascular research. Copyright © 2017. Published by Elsevier Inc.
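The counts in this abstract can be cross-checked with a little arithmetic. The sketch below derives how many patients each method identified uniquely and shows how a positive predictive value (PPV) is computed from a chart-review sample. It assumes the overlap count reported for "AS" refers to TAS; the review counts passed to `ppv` are hypothetical and chosen only to illustrate the formula, not taken from the study.

```python
def unique_counts(n_a, n_b, n_both):
    """Return (only in A, only in B) given each method's total and the overlap."""
    return n_a - n_both, n_b - n_both


def ppv(true_positives, false_positives):
    """Positive predictive value: confirmed cases / all flagged cases."""
    return true_positives / (true_positives + false_positives)


# Counts reported in the abstract: (text mining, ICD-9, identified by both).
reported = {
    "TAS": (7115, 8272, 4346),
    "CAD": (9247, 6913, 6024),
}

for disease, (n_text, n_icd, n_both) in reported.items():
    only_text, only_icd = unique_counts(n_text, n_icd, n_both)
    print(disease, "text mining only:", only_text, "ICD-9 only:", only_icd)

# Hypothetical chart review: 190 of 200 sampled patients confirmed.
print(ppv(true_positives=190, false_positives=10))
```

Under that reading, text mining alone found 2769 TAS and 3223 CAD patients, while ICD-9 codes alone found 3926 TAS and 889 CAD patients; these uniquely identified groups are the populations from which the 200-250 patient samples were drawn.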
Code of Federal Regulations, 2011 CFR
2011-07-01
... an open-cut mine plant site shall not exceed the volume of infiltration, drainage and mine drainage... of infiltration, drainage and mine drainage waters which is in excess of the make up water required...
Code of Federal Regulations, 2010 CFR
2010-07-01
... an open-cut mine plant site shall not exceed the volume of infiltration, drainage and mine drainage... of infiltration, drainage and mine drainage waters which is in excess of the make up water required...
Uranium mining wastes, garden exhibition and health risks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schmidt, Gerhard; Schmidt, Peter; Hinz, Wilko
2007-07-01
Available in abstract form only. Full text of publication follows: For more than 40 years the Soviet-German stockholding company SDAG WISMUT mined and milled uranium in the east of Germany and by 1990 had become the world's third largest uranium producer. After the reunification of Germany, the newly founded state-owned company Wismut GmbH was faced with the task of decommissioning and rehabilitating the mining and milling sites. One of the largest mining areas in the world that had to be cleaned up was located close to the municipality of Ronneburg near the City of Gera in Thuringia. After closing the operations of the Ronneburg underground mine and at the 160 m deep open pit mine with a free volume of 84 million m³, the open pit and 7 large piles of mine waste, together 112 million m³ of material, had to be cleaned up. As a result of an optimisation procedure it was chosen to relocate the waste rock piles back into the open pit. After this decision was taken and the plan approved, the disposal operation was started. Even though the transport task was done by large trucks, this took 16 years. The work will be finished in 2007, a cover consisting of 40 cm of uncontaminated material will be placed on top of the material, and the re-vegetation of the former open pit area will be established. When in 2002 the City of Gera applied to host the largest garden exhibition in Germany, Bundesgartenschau (BUGA), in 2007, Wismut GmbH supported this plan by offering parts of the territory of the former mining site as an exhibition ground. Finally, it was decided by the BUGA organizers to arrange its 2007 exhibition on grounds in Gera and in the valley adjacent to the former open pit mine, with parts of the remediated area within the fence of the exhibition. (authors)
Agyeman, Stephen; Ampadu, Samuel I K
2016-02-01
Mine rock waste, which is the rock material removed in order to access and mine ore, is free from gold processing chemical contaminants but presents a significant environmental challenge owing to the large volumes involved. One way of mitigating the environmental and safety challenges posed by the large volume of mine rock waste stockpiled in mining communities is to find uses of this material as a substitute for rock aggregates in construction. This article reports on a study conducted to evaluate the engineering properties of such a mine deposit to determine its suitability for use as road pavement material. Samples of mine rock waste, derived from the granitic and granodioritic intrusive units overlying the gold-bearing metavolcanic rock and volcano-clastic sediments of a gold mining area in Ghana, were obtained from three mine rock waste disposal facilities and subjected to a battery of laboratory tests to determine their physical, mechanical, geotechnical, geometrical and durability properties. The overall conclusion was that the mine rock waste met all the requirements of the Ghana Ministry of Transportation specification for use as aggregates for crushed rock subbase, base and surface dressing chippings for road pavements. The recommendation is to process it into the required sizes for the various applications. © The Author(s) 2015.
Biocuration workflows and text mining: overview of the BioCreative 2012 Workshop Track II.
Lu, Zhiyong; Hirschman, Lynette
2012-01-01
Manual curation of data from the biomedical literature is a rate-limiting factor for many expert curated databases. Despite the continuing advances in biomedical text mining and the pressing needs of biocurators for better tools, few existing text-mining tools have been successfully integrated into production literature curation systems such as those used by the expert curated databases. To close this gap and better understand all aspects of literature curation, we invited submissions of written descriptions of curation workflows from expert curated databases for the BioCreative 2012 Workshop Track II. We received seven qualified contributions, primarily from model organism databases. Based on these descriptions, we identified commonalities and differences across the workflows, the common ontologies and controlled vocabularies used and the current and desired uses of text mining for biocuration. Compared to a survey done in 2009, our 2012 results show that many more databases are now using text mining in parts of their curation workflows. In addition, the workshop participants identified text-mining aids for finding gene names and symbols (gene indexing), prioritization of documents for curation (document triage) and ontology concept assignment as those most desired by the biocurators. DATABASE URL: http://www.biocreative.org/tasks/bc-workshop-2012/workflow/.
NASA Astrophysics Data System (ADS)
Znikina, Ludmila; Rozhneva, Elena
2017-11-01
The article deals with the distribution of informative intensity of the English-language scientific text based on its structural features contributing to the process of formalization of the scientific text and the preservation of the adequacy of the text with derived semantic information in relation to the primary. Discourse analysis is built on specific compositional and meaningful examples of scientific texts taken from the mining field. It also analyzes the adequacy of the translation of foreign texts into another language, the relationships between elements of linguistic systems, the degree of a formal conformance, translation with the specific objectives and information needs of the recipient. Some key words and ideas are emphasized in the paragraphs of the English-language mining scientific texts. The article gives the characteristic features of the structure of paragraphs of technical text and examples of constructions in English scientific texts based on a mining theme with the aim to explain the possible ways of their adequate translation.
ERIC Educational Resources Information Center
Adkins, John; And Others
A project was designed to produce a broad description of current mining training programs and to evaluate their effectiveness with respect to reducing mine injuries. The research strategy was built on the ranking of mines according to the effectiveness of their training with an effective training effort being defined as that training which is…
An Evaluation of Text Mining Tools as Applied to Selected Scientific and Engineering Literature.
ERIC Educational Resources Information Center
Trybula, Walter J.; Wyllys, Ronald E.
2000-01-01
Addresses an approach to the discovery of scientific knowledge through an examination of data mining and text mining techniques. Presents the results of experiments that investigated knowledge acquisition from a selected set of technical documents by domain experts. (Contains 15 references.) (Author/LRW)
ERIC Educational Resources Information Center
Chen, Hsinchun
2003-01-01
Discusses information retrieval techniques used on the World Wide Web. Topics include machine learning in information extraction; relevance feedback; information filtering and recommendation; text classification and text clustering; Web mining, based on data mining techniques; hyperlink structure; and Web size. (LRW)
Application of text mining in the biomedical domain.
Fleuren, Wilco W M; Alkema, Wynand
2015-03-01
In recent years the amount of experimental data that is produced in biomedical research and the number of papers that are being published in this field have grown rapidly. In order to keep up to date with developments in their field of interest and to interpret the outcome of experiments in light of all available literature, researchers turn more and more to the use of automated literature mining. As a consequence, text mining tools have evolved considerably in number and quality and nowadays can be used to address a variety of research questions ranging from de novo drug target discovery to enhanced biological interpretation of the results from high throughput experiments. In this paper we introduce the most important techniques that are used for text mining and give an overview of the text mining tools that are currently being used and the types of problems to which they are typically applied. Copyright © 2015 Elsevier Inc. All rights reserved.
Empirical Models of Zones Protecting Against Coal Dust Explosion
NASA Astrophysics Data System (ADS)
Prostański, Dariusz
2017-09-01
The paper presents the intended use of research results to specify the relationship between the volume of dust deposition and changes in its concentration in air. These relationships were used to shape zones protecting against coal dust explosion. The research methodology is presented, including methods for measuring dust concentration and deposition. Measurements were taken in the Brzeszcze Mine within the framework of MEZAP, co-financed by The National Centre for Research and Development (NCBR) and performed by the Institute of Mining Technology KOMAG, the Central Mining Institute (GIG) and the Coal Company PLC. The project enables research on the volume of dust deposition and its concentration in air in protective zones in a number of mine workings in the Brzeszcze Mine. The developed model may serve as a supporting tool, either as a system located directly in protective zones or as an operator tool warning of an increasing hazard of coal dust explosion.
Event-based text mining for biology and functional genomics
Thompson, Paul; Nawaz, Raheel; McNaught, John; Kell, Douglas B.
2015-01-01
The assessment of genome function requires a mapping between genome-derived entities and biochemical reactions, and the biomedical literature represents a rich source of information about reactions between biological components. However, the increasingly rapid growth in the volume of literature provides both a challenge and an opportunity for researchers to isolate information about reactions of interest in a timely and efficient manner. In response, recent text mining research in the biology domain has been largely focused on the identification and extraction of ‘events’, i.e. categorised, structured representations of relationships between biochemical entities, from the literature. Functional genomics analyses necessarily encompass events as so defined. Automatic event extraction systems facilitate the development of sophisticated semantic search applications, allowing researchers to formulate structured queries over extracted events, so as to specify the exact types of reactions to be retrieved. This article provides an overview of recent research into event extraction. We cover annotated corpora on which systems are trained, systems that achieve state-of-the-art performance and details of the community shared tasks that have been instrumental in increasing the quality, coverage and scalability of recent systems. Finally, several concrete applications of event extraction are covered, together with emerging directions of research. PMID:24907365
Towards cross-lingual alerting for bursty epidemic events.
Collier, Nigel
2011-10-06
Online news reports are increasingly becoming a source for event-based early warning systems that detect natural disasters. Harnessing the massive volume of information available from multilingual newswire presents as many challenges as opportunities due to the patterns of reporting complex spatio-temporal events. In this article we study the problem of utilising correlated event reports across languages. We track the evolution of 16 disease outbreaks using 5 temporal aberration detection algorithms on text-mined events classified according to disease and outbreak country. Using ProMED reports as a silver standard, comparative analysis of news data for 13 languages over a 129 day trial period showed improved sensitivity, F1 and timeliness across most models using cross-lingual events. We report a detailed case study analysis for Cholera in Angola 2010 which highlights the challenges faced in correlating news events with the silver standard. The results show that automated health surveillance using multilingual text mining has the potential to turn low value news into high value alerts if informed choices are used to govern the selection of models and data sources. An implementation of the C2 alerting algorithm using multilingual news is available at the BioCaster portal http://born.nii.ac.jp/?page=globalroundup.
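The abstract does not describe the C2 algorithm itself. As commonly described for the EARS family of aberration detectors, C2 compares each day's count with the mean and standard deviation of a 7-day baseline that ends 2 days earlier, and alerts when the standardized difference exceeds 3. The sketch below follows that description; the window sizes, the threshold, the `min_sigma` floor and the sample counts are all assumptions for illustration and may differ from the BioCaster implementation.

```python
from statistics import mean, stdev


def c2_alerts(counts, baseline=7, guard=2, threshold=3.0, min_sigma=0.5):
    """Return the indices of days whose count exceeds the C2-style threshold.

    For day t, the baseline is the `baseline` days ending `guard` days
    before t. `min_sigma` is an assumed floor that avoids division by
    zero when the baseline is flat.
    """
    alerts = []
    for t in range(baseline + guard, len(counts)):
        window = counts[t - guard - baseline : t - guard]
        mu = mean(window)
        sigma = max(stdev(window), min_sigma)
        score = (counts[t] - mu) / sigma
        if score > threshold:
            alerts.append(t)
    return alerts


# Hypothetical daily counts of text-mined event reports for one disease/country.
daily_reports = [2, 3, 2, 4, 3, 2, 3, 3, 2, 15, 3]
print(c2_alerts(daily_reports))
```

The 2-day guard band keeps the first days of an emerging outbreak out of the baseline, so a slowly building signal does not raise its own threshold.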
Process Mining Online Assessment Data
ERIC Educational Resources Information Center
Pechenizkiy, Mykola; Trcka, Nikola; Vasilyeva, Ekaterina; van der Aalst, Wil; De Bra, Paul
2009-01-01
Traditional data mining techniques have been extensively applied to find interesting patterns, build descriptive and predictive models from large volumes of data accumulated through the use of different information systems. The results of data mining can be used for getting a better understanding of the underlying educational processes, for…
The Islamic State Battle Plan: Press Release Natural Language Processing
2016-06-01
Processing, text mining , corpus, generalized linear model, cascade, R Shiny, leaflet, data visualization 15. NUMBER OF PAGES 83 16. PRICE CODE...Terrorism and Responses to Terrorism TDM Term Document Matrix TF Term Frequency TF-IDF Term Frequency-Inverse Document Frequency tm text mining (R...package=leaflet. Feinerer I, Hornik K (2015) Text Mining Package “tm,” Version 0.6-2. (Jul 3) https://cran.r-project.org/web/packages/tm/tm.pdf
OntoGene web services for biomedical text mining.
Rinaldi, Fabio; Clematide, Simon; Marques, Hernani; Ellendorff, Tilia; Romacker, Martin; Rodriguez-Esteban, Raul
2014-01-01
Text mining services are rapidly becoming a crucial component of various knowledge management pipelines, for example in the process of database curation, or for exploration and enrichment of biomedical data within the pharmaceutical industry. Traditional architectures, based on monolithic applications, do not offer sufficient flexibility for a wide range of use case scenarios, and therefore open architectures, as provided by web services, are attracting increased interest. We present an approach towards providing advanced text mining capabilities through web services, using a recently proposed standard for textual data interchange (BioC). The web services leverage a state-of-the-art platform for text mining (OntoGene) which has been tested in several community-organized evaluation challenges, with top ranked results in several of them.
Text mining patents for biomedical knowledge.
Rodriguez-Esteban, Raul; Bundschus, Markus
2016-06-01
Biomedical text mining of scientific knowledge bases, such as Medline, has received much attention in recent years. Given that text mining is able to automatically extract biomedical facts that revolve around entities such as genes, proteins, and drugs, from unstructured text sources, it is seen as a major enabler to foster biomedical research and drug discovery. In contrast to the biomedical literature, research into the mining of biomedical patents has not reached the same level of maturity. Here, we review existing work and highlight the associated technical challenges that emerge from automatically extracting facts from patents. We conclude by outlining potential future directions in this domain that could help drive biomedical research and drug discovery. Copyright © 2016 Elsevier Ltd. All rights reserved.
OntoGene web services for biomedical text mining
2014-01-01
Text mining services are rapidly becoming a crucial component of various knowledge management pipelines, for example in the process of database curation, or for exploration and enrichment of biomedical data within the pharmaceutical industry. Traditional architectures, based on monolithic applications, do not offer sufficient flexibility for a wide range of use case scenarios, and therefore open architectures, as provided by web services, are attracting increased interest. We present an approach towards providing advanced text mining capabilities through web services, using a recently proposed standard for textual data interchange (BioC). The web services leverage a state-of-the-art platform for text mining (OntoGene) which has been tested in several community-organized evaluation challenges, with top ranked results in several of them. PMID:25472638
Text-mining analysis of mHealth research.
Ozaydin, Bunyamin; Zengul, Ferhat; Oner, Nurettin; Delen, Dursun
2017-01-01
In recent years, because of the advancements in communication and networking technologies, mobile technologies have been developing at an unprecedented rate. mHealth, the use of mobile technologies in medicine, and the related research has also surged parallel to these technological advancements. Although there have been several attempts to review mHealth research through manual processes such as systematic reviews, the sheer magnitude of the number of studies published in recent years makes this task very challenging. The most recent developments in machine learning and text mining offer some potential solutions to address this challenge by allowing analyses of large volumes of texts through semi-automated processes. The objective of this study is to analyze the evolution of mHealth research by utilizing text-mining and natural language processing (NLP) analyses. The study sample included abstracts of 5,644 mHealth research articles, which were gathered from five academic search engines by using search terms such as mobile health and mHealth. The analysis used the Text Explorer module of JMP Pro 13 and an iterative semi-automated process involving tokenizing, phrasing, and terming. After developing the document term matrix (DTM), analyses such as singular value decomposition (SVD), topic analysis, and hierarchical document clustering were performed, along with the topic-informed document clustering approach. The results were presented in the form of word-clouds and trend analyses. There were several major findings regarding research clusters and trends. First, our results confirmed the time-dependent nature of terminology use in mHealth research. For example, in earlier versus recent years the use of terminology changed from "mobile phone" to "smartphone" and from "applications" to "apps".
Second, ten clusters for mHealth research were identified including (I) Clinical Research on Lifestyle Management, (II) Community Health, (III) Literature Review, (IV) Medical Interventions, (V) Research Design, (VI) Infrastructure, (VII) Applications, (VIII) Research and Innovation in Health Technologies, (IX) Sensor-based Devices and Measurement Algorithms, (X) Survey-based Research. Third, the trend analyses indicated the infrastructure cluster as the highest percentage researched area until 2014. The Research and Innovation in Health Technologies cluster experienced the largest increase in numbers of publications in recent years, especially after 2014. This study is unique because it is the only known study utilizing text-mining analyses to reveal the streams and trends for mHealth research. The fast growth in mobile technologies is expected to lead to higher numbers of studies focusing on mHealth and its implications for various healthcare outcomes. Findings of this study can be utilized by researchers in identifying areas for future studies.
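The document term matrix and the terminology trend described in this abstract can be sketched in a few lines. The example below builds a DTM from toy abstracts and measures, per year, the share of documents that use a given term. The documents, years, and function names are invented for illustration; this is not the JMP Pro Text Explorer workflow, and the SVD and clustering steps are left out.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in doc i."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    column = {term: j for j, term in enumerate(vocab)}
    matrix = []
    for doc in docs:
        row = [0] * len(vocab)
        for term, count in Counter(tokenize(doc)).items():
            row[column[term]] = count
        matrix.append(row)
    return vocab, matrix


def term_share_by_year(docs, years, term):
    """Fraction of each year's documents that contain `term`."""
    totals = Counter(years)
    hits = Counter(y for doc, y in zip(docs, years) if term in tokenize(doc))
    return {y: hits[y] / totals[y] for y in sorted(totals)}


# Toy abstracts and publication years, invented for this sketch.
DOCS = [
    "mobile phone text reminders for diabetes",
    "mobile phone applications in clinics",
    "smartphone apps for diabetes",
    "smartphone apps and wearable sensors",
]
YEARS = [2008, 2008, 2016, 2016]

if __name__ == "__main__":
    vocab, dtm = document_term_matrix(DOCS)
    print(len(dtm), "documents x", len(vocab), "terms")
    print(term_share_by_year(DOCS, YEARS, "smartphone"))
```

A matrix of this shape is the usual input to decompositions such as SVD and to document clustering.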
Text-mining analysis of mHealth research
Zengul, Ferhat; Oner, Nurettin; Delen, Dursun
2017-01-01
In recent years, because of the advancements in communication and networking technologies, mobile technologies have been developing at an unprecedented rate. mHealth, the use of mobile technologies in medicine, and the related research has also surged parallel to these technological advancements. Although there have been several attempts to review mHealth research through manual processes such as systematic reviews, the sheer magnitude of the number of studies published in recent years makes this task very challenging. The most recent developments in machine learning and text mining offer some potential solutions to address this challenge by allowing analyses of large volumes of texts through semi-automated processes. The objective of this study is to analyze the evolution of mHealth research by utilizing text-mining and natural language processing (NLP) analyses. The study sample included abstracts of 5,644 mHealth research articles, which were gathered from five academic search engines by using search terms such as mobile health and mHealth. The analysis used the Text Explorer module of JMP Pro 13 and an iterative semi-automated process involving tokenizing, phrasing, and terming. After developing the document term matrix (DTM), analyses such as singular value decomposition (SVD), topic analysis, and hierarchical document clustering were performed, along with the topic-informed document clustering approach. The results were presented in the form of word-clouds and trend analyses. There were several major findings regarding research clusters and trends. First, our results confirmed the time-dependent nature of terminology use in mHealth research. For example, in earlier versus recent years the use of terminology changed from “mobile phone” to “smartphone” and from “applications” to “apps”.
Second, ten clusters for mHealth research were identified including (I) Clinical Research on Lifestyle Management, (II) Community Health, (III) Literature Review, (IV) Medical Interventions, (V) Research Design, (VI) Infrastructure, (VII) Applications, (VIII) Research and Innovation in Health Technologies, (IX) Sensor-based Devices and Measurement Algorithms, (X) Survey-based Research. Third, the trend analyses indicated the infrastructure cluster as the highest percentage researched area until 2014. The Research and Innovation in Health Technologies cluster experienced the largest increase in numbers of publications in recent years, especially after 2014. This study is unique because it is the only known study utilizing text-mining analyses to reveal the streams and trends for mHealth research. The fast growth in mobile technologies is expected to lead to higher numbers of studies focusing on mHealth and its implications for various healthcare outcomes. Findings of this study can be utilized by researchers in identifying areas for future studies. PMID:29430456
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rosenberg, J.I.; Mendis, M.S.; Medville, D.M.
1980-03-01
This report presents a summary of the analytical approach taken and the conclusions reached in an assessment of the supply and demand for manpower in the coal mining industry through the year 2000. A hybrid system dynamics/econometric model of the coal mining industry was developed which incorporates relationships between technological change, labor productivity, production costs, wages, graduation rates, and other key variables in estimating imbalances between labor supply and demand. Study results indicate that while the supply of production workers is expected to be sufficient under most future demand scenarios, periodic shortages of experienced workers, especially in the Northern Great Plains, can be expected. Other study findings are that the supply of mining engineers will be sufficient under all but the highest coal demand scenario; a shortage of faculty will affect the supply of mining engineers in the near term; and the employment of mining technicians is expected to exhibit the largest increase in any labor category studied. This volume is an Executive Summary which provides a brief description of the study and gives its major conclusions and recommendations. An accompanying volume (Vol. 11) provides a detailed description of the analytical basis for the study, the sources of data used, and a discussion of the conclusions reached.
Mine burial in the seabed of high-turbidity area—Findings of a first experiment
NASA Astrophysics Data System (ADS)
Baeye, Matthias; Fettweis, Michael; Legrand, Sebastien; Dupont, Yves; Van Lancker, Vera
2012-07-01
The seabed of the North Sea is covered with ammunition dating back to World Wars I and II. With increasing human interference (e.g. fisheries, aggregate extraction, harbor related activities), it poses a threat to safety at sea. In this study, test mines were deployed on a sandy seabed for 3 months to investigate mine burial processes as a function of hydrodynamic and meteorological conditions. The mine experiment was conducted in a shallow (9 m), macrotidal environment characterized by highly turbid waters (yearly and depth-averaged suspended particulate matter concentration of 100 mg l⁻¹). Results showed some variability of the overall mine burial, which corresponded with scouring processes induced by a (sub-) tidal forcing mechanism. The main burial events however were linked to storm-related scouring processes, and subsequent mine roll into the resulting pit. Two storms affecting the mines during the 3-month experiment resulted in enduring increases in burial volume to 60% and 80%, respectively. More cyclic and ephemeral burial and exposure events appear to be linked to the local hydrodynamic regime. During slack tides, suspended sediment settles on the seabed, increasing the burial volume. In between slack tides, sediment is resuspended, decreasing the burial volume. The temporal pattern of this previously unreported burial mechanism, as measured optically, mimics the cyclicity of the suspended sediment concentration as recorded by ultrasonic signals at a nearby benthic observatory. Given the similarity in response signals at the two sites, we hypothesize that the formation of high-concentrated mud suspensions (HCMS) is a mechanism causing short-term burial and exposure of mines. This short-term burial and exposure increase the chance that mines are 'missed' during tracking surveys. Test mines contribute to our understanding of the settling and erosion of HCMS, and thus shed light on generic sedimentary processes.
ERIC Educational Resources Information Center
Adkins, John; And Others
A project was designed to produce a broad description of current mining training programs and to evaluate their effectiveness with respect to reducing mine injuries. Aggregate training and injury data were used to evaluate the overall training effort at 300 mines as well as specific efforts in 12 categories of training course objectives. From such…
Lucini, Filipe R; S Fogliatto, Flavio; C da Silveira, Giovani J; L Neyeloff, Jeruza; Anzanello, Michel J; de S Kuchenbecker, Ricardo; D Schaan, Beatriz
2017-04-01
Emergency department (ED) overcrowding is a serious issue for hospitals. Early information on short-term inward bed demand from patients receiving care at the ED may reduce the overcrowding problem, and optimize the use of hospital resources. In this study, we use text mining methods to process data from early ED patient records using the SOAP framework, and predict future hospitalizations and discharges. We try different approaches to pre-processing text records and predicting hospitalization. Sets-of-words are obtained via binary representation, term frequency, and term frequency-inverse document frequency. Unigrams, bigrams and trigrams are tested for feature formation. Feature selection is based on χ² and F-score metrics. In the prediction module, eight text mining methods are tested: Decision Tree, Random Forest, Extremely Randomized Tree, AdaBoost, Logistic Regression, Multinomial Naïve Bayes, Support Vector Machine (linear kernel) and Nu-Support Vector Machine (linear kernel). Prediction performance is evaluated by F1-scores. Precision and Recall values are also reported for all text mining methods tested. Nu-Support Vector Machine was the text mining method with the best overall performance. Its average F1-score in predicting hospitalization was 77.70%, with a standard deviation (SD) of 0.66%. The method could be used to manage daily routines in EDs such as capacity planning and resource allocation. Text mining could provide valuable information and facilitate decision-making by inward bed management teams. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
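The three set-of-words weightings and the n-gram features named in this abstract can be illustrated with a small standard-library sketch. The toy records, the function names, and the plain log(N/df) form of the inverse document frequency are assumptions made here; the abstract does not say which TF-IDF variant or tooling the authors used.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=3):
    """Return all unigrams up to n_max-grams as space-joined strings."""
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def featurize(docs, n_max=3, weighting="tfidf"):
    """Map each document to a {feature: weight} dictionary.

    weighting: "binary" (presence), "tf" (raw count) or
    "tfidf" (count * log(N / document frequency)).
    """
    counts = [Counter(ngrams(doc.lower().split(), n_max)) for doc in docs]
    doc_freq = Counter(gram for c in counts for gram in c)
    n_docs = len(docs)
    features = []
    for c in counts:
        if weighting == "binary":
            features.append({g: 1.0 for g in c})
        elif weighting == "tf":
            features.append({g: float(k) for g, k in c.items()})
        elif weighting == "tfidf":
            features.append(
                {g: k * math.log(n_docs / doc_freq[g]) for g, k in c.items()}
            )
        else:
            raise ValueError(f"unknown weighting: {weighting}")
    return features


# Invented free-text fragments, not real patient records.
RECORDS = ["chest pain chest tightness", "mild chest pain", "ankle sprain"]

if __name__ == "__main__":
    for weighting in ("binary", "tf", "tfidf"):
        print(weighting, featurize(RECORDS, weighting=weighting)[0]["chest"])
```

Feature dictionaries like these would then pass through a feature selection step and into a classifier; both are outside the scope of this sketch.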
Text and Structural Data Mining of Influenza Mentions in Web and Social Media
DOE Office of Scientific and Technical Information (OSTI.GOV)
Corley, Courtney D.; Cook, Diane; Mikler, Armin R.
Text and structural data mining of Web and social media (WSM) provides a novel disease surveillance resource and can identify online communities for targeted public health communications (PHC) to assure wide dissemination of pertinent information. WSM that mention influenza are harvested over a 24-week period, 5-October-2008 to 21-March-2009. Link analysis reveals communities for targeted PHC. Text mining is shown to identify trends in flu posts that correlate to real-world influenza-like-illness patient report data. We also bring to bear a graph-based data mining technique to detect anomalies among flu blogs connected by publisher type, links, and user-tags.
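One simple way to check whether weekly counts of flu-related posts track patient report data is a correlation coefficient. The sketch below computes Pearson's r; the choice of Pearson's r is an assumption, since the abstract does not name the statistic used, and the weekly figures are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented weekly values: flu-related post counts and ILI visit percentages.
    weekly_posts = [120, 150, 210, 400, 380, 260]
    weekly_ili = [1.1, 1.3, 1.9, 3.2, 3.0, 2.2]
    print(round(pearson(weekly_posts, weekly_ili), 3))
```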
Vaccine adverse event text mining system for extracting features from vaccine safety reports.
Botsis, Taxiarchis; Buttolph, Thomas; Nguyen, Michael D; Winiecki, Scott; Woo, Emily Jane; Ball, Robert
2012-01-01
To develop and evaluate a text mining system for extracting key clinical features from vaccine adverse event reporting system (VAERS) narratives to aid in the automated review of adverse event reports. Based upon clinical significance to VAERS reviewing physicians, we defined the primary (diagnosis and cause of death) and secondary features (eg, symptoms) for extraction. We built a novel vaccine adverse event text mining (VaeTM) system based on a semantic text mining strategy. The performance of VaeTM was evaluated using a total of 300 VAERS reports in three sequential evaluations of 100 reports each. Moreover, we evaluated the VaeTM contribution to case classification; an information retrieval-based approach was used for the identification of anaphylaxis cases in a set of reports and was compared with two other methods: a dedicated text classifier and an online tool. The performance metrics of VaeTM were text mining metrics: recall, precision and F-measure. We also conducted a qualitative difference analysis and calculated sensitivity and specificity for classification of anaphylaxis cases based on the above three approaches. VaeTM performed best in extracting diagnosis, second level diagnosis, drug, vaccine, and lot number features (lenient F-measure in the third evaluation: 0.897, 0.817, 0.858, 0.874, and 0.914, respectively). In terms of case classification, high sensitivity was achieved (83.1%); this was equal and better compared to the text classifier (83.1%) and the online tool (40.7%), respectively. Our VaeTM implementation of a semantic text mining strategy shows promise in providing accurate and efficient extraction of key features from VAERS narratives.
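The evaluation measures named in this abstract follow directly from counts of true and false positives and negatives. A minimal sketch, with invented counts rather than the study's data:

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall and balanced F-measure for an extraction task."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def sensitivity_specificity(tp, fp, tn, fn):
    """Sensitivity and specificity for a case classification task."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Invented counts for illustration only.
    print(precision_recall_f(tp=8, fp=2, fn=2))
    print(sensitivity_specificity(tp=40, fp=5, tn=45, fn=10))
```

Recall and sensitivity are the same quantity, TP / (TP + FN); the two names reflect text mining and clinical classification usage respectively.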
DISEASES: text mining and data integration of disease-gene associations.
Pletscher-Frankild, Sune; Pallejà, Albert; Tsafou, Kalliopi; Binder, Janos X; Jensen, Lars Juhl
2015-03-01
Text mining is a flexible technology that can be applied to numerous different tasks in biology and medicine. We present a system for extracting disease-gene associations from biomedical abstracts. The system consists of a highly efficient dictionary-based tagger for named entity recognition of human genes and diseases, which we combine with a scoring scheme that takes into account co-occurrences both within and between sentences. We show that this approach is able to extract half of all manually curated associations with a false positive rate of only 0.16%. Nonetheless, text mining should not stand alone, but be combined with other types of evidence. For this reason, we have developed the DISEASES resource, which integrates the results from text mining with manually curated disease-gene associations, cancer mutation data, and genome-wide association studies from existing databases. The DISEASES resource is accessible through a web interface at http://diseases.jensenlab.org/, where the text-mining software and all associations are also freely available for download. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
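The combination of dictionary-based tagging with co-occurrence counting within and between sentences can be sketched as follows. The toy dictionaries, the weights `w_sentence` and `w_document`, and the additive score are hypothetical simplifications; the published DISEASES scoring scheme is not reproduced here.

```python
import re
from collections import defaultdict

# Hypothetical toy dictionaries; real ones hold many names and synonyms.
GENES = {"brca1", "tp53"}
DISEASES = {"breast cancer", "li-fraumeni syndrome"}


def tag(text, dictionary):
    """Return the dictionary names that occur as whole words in `text`."""
    low = text.lower()
    return {
        name for name in dictionary
        if re.search(r"\b" + re.escape(name) + r"\b", low)
    }


def cooccurrence_scores(abstracts, w_sentence=1.0, w_document=0.5):
    """Score gene-disease pairs by weighted co-occurrence counts.

    Each abstract mentioning both names adds `w_document`; it adds
    `w_sentence` on top when the two share at least one sentence.
    """
    scores = defaultdict(float)
    for abstract in abstracts:
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", abstract) if s]
        tagged = [(tag(s, GENES), tag(s, DISEASES)) for s in sentences]
        doc_genes = set().union(*(genes for genes, _ in tagged))
        doc_diseases = set().union(*(dis for _, dis in tagged))
        same_sentence = {(g, d) for genes, dis in tagged for g in genes for d in dis}
        for gene in doc_genes:
            for disease in doc_diseases:
                scores[(gene, disease)] += w_document
                if (gene, disease) in same_sentence:
                    scores[(gene, disease)] += w_sentence
    return dict(scores)


if __name__ == "__main__":
    # Invented abstracts for illustration.
    abstracts = [
        "BRCA1 mutations raise breast cancer risk. TP53 was also sequenced.",
        "TP53 is mutated in Li-Fraumeni syndrome.",
    ]
    for pair, score in sorted(cooccurrence_scores(abstracts).items()):
        print(pair, score)
```

Raw weighted counts favour frequently mentioned genes and diseases, so a practical scheme would also normalise by how often each name occurs overall.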
Simmons, Michael; Singhal, Ayush; Lu, Zhiyong
2018-01-01
The key question of precision medicine is whether it is possible to find clinically actionable granularity in diagnosing disease and classifying patient risk. The advent of next generation sequencing and the widespread adoption of electronic health records (EHRs) have provided clinicians and researchers a wealth of data and made possible the precise characterization of individual patient genotypes and phenotypes. Unstructured text — found in biomedical publications and clinical notes — is an important component of genotype and phenotype knowledge. Publications in the biomedical literature provide essential information for interpreting genetic data. Likewise, clinical notes contain the richest source of phenotype information in EHRs. Text mining can render these texts computationally accessible and support information extraction and hypothesis generation. This chapter reviews the mechanics of text mining in precision medicine and discusses several specific use cases, including database curation for personalized cancer medicine, patient outcome prediction from EHR-derived cohorts, and pharmacogenomic research. Taken as a whole, these use cases demonstrate how text mining enables effective utilization of existing knowledge sources and thus promotes increased value for patients and healthcare systems. Text mining is an indispensable tool for translating genotype-phenotype data into effective clinical care that will undoubtedly play an important role in the eventual realization of precision medicine. PMID:27807747
Simmons, Michael; Singhal, Ayush; Lu, Zhiyong
2016-01-01
The key question of precision medicine is whether it is possible to find clinically actionable granularity in diagnosing disease and classifying patient risk. The advent of next-generation sequencing and the widespread adoption of electronic health records (EHRs) have provided clinicians and researchers a wealth of data and made possible the precise characterization of individual patient genotypes and phenotypes. Unstructured text, found in biomedical publications and clinical notes, is an important component of genotype and phenotype knowledge. Publications in the biomedical literature provide essential information for interpreting genetic data. Likewise, clinical notes contain the richest source of phenotype information in EHRs. Text mining can render these texts computationally accessible and support information extraction and hypothesis generation. This chapter reviews the mechanics of text mining in precision medicine and discusses several specific use cases, including database curation for personalized cancer medicine, patient outcome prediction from EHR-derived cohorts, and pharmacogenomic research. Taken as a whole, these use cases demonstrate how text mining enables effective utilization of existing knowledge sources and thus promotes increased value for patients and healthcare systems. Text mining is an indispensable tool for translating genotype-phenotype data into effective clinical care that will undoubtedly play an important role in the eventual realization of precision medicine.
Using Text Mining to Uncover Students' Technology-Related Problems in Live Video Streaming
ERIC Educational Resources Information Center
Abdous, M'hammed; He, Wu
2011-01-01
Because of their capacity to sift through large amounts of data, text mining and data mining are enabling higher education institutions to reveal valuable patterns in students' learning behaviours without having to resort to traditional survey methods. In an effort to uncover live video streaming (LVS) students' technology related-problems and to…
Goode, Daniel J.; Cravotta, Charles A.; Hornberger, Roger J.; Hewitt, Michael A.; Hughes, Robert E.; Koury, Daniel J.; Eicholtz, Lee W.
2011-01-01
This report, prepared in cooperation with the Pennsylvania Department of Environmental Protection (PaDEP), the Eastern Pennsylvania Coalition for Abandoned Mine Reclamation, and the Dauphin County Conservation District, provides estimates of water budgets and groundwater volumes stored in abandoned underground mines in the Western Middle Anthracite Coalfield, which encompasses an area of 120 square miles in eastern Pennsylvania. The estimates are based on preliminary simulations using a groundwater-flow model and an associated geographic information system that integrates data on the mining features, hydrogeology, and streamflow in the study area. The Mahanoy and Shamokin Creek Basins were the focus of the study because these basins exhibit extensive hydrologic effects and water-quality degradation from the abandoned mines in their headwaters in the Western Middle Anthracite Coalfield. Proposed groundwater withdrawals from the flooded parts of the mines and stream-channel modifications in selected areas have the potential for altering the distribution of groundwater and the interaction between the groundwater and streams in the area. Preliminary three-dimensional, steady-state simulations of groundwater flow by the use of MODFLOW are presented to summarize information on the exchange of groundwater among adjacent mines and to help guide the management of ongoing data collection, reclamation activities, and water-use planning. The conceptual model includes high-permeability mine voids that are connected vertically and horizontally within multicolliery units (MCUs). MCUs were identified on the basis of mine maps, locations of mine discharges, and groundwater levels in the mines measured by PaDEP. The locations and integrity of mine barriers were determined from mine maps and groundwater levels. The permeability of intact barriers is low, reflecting the hydraulic characteristics of unmined host rock and coal. 
A steady-state model was calibrated to measured groundwater levels and stream base flow, the latter at many locations composed primarily of discharge from mines. Automatic parameter estimation used MODFLOW-2000 with manual adjustments to constrain parameter values to realistic ranges. The calibrated model supports the conceptual model of high-permeability MCUs separated by low-permeability barriers and streamflow losses and gains associated with mine infiltration and discharge. The simulated groundwater levels illustrate low groundwater gradients within an MCU and abrupt changes in water levels between MCUs. The preliminary model results indicate that the primary result of increased pumping from the mine would be reduced discharge from the mine to streams near the pumping wells. The intact barriers limit the spatial extent of mine dewatering. Considering the simulated groundwater levels, depth of mining, and assumed bulk porosity of 11 or 40 percent for the mined seams, the water volume in storage in the mines of the Western Middle Anthracite Coalfield was estimated to range from 60 to 220 billion gallons, respectively. Details of the groundwater-level distribution and the rates of some mine discharges are not simulated well using the preliminary model. Use of the model results should be limited to evaluation of the conceptual model and its simulation using porous-media flow methods, overall water budgets for the Western Middle Anthracite Coalfield, and approximate storage volumes. Model results should not be considered accurate for detailed simulation of flow within a single MCU or individual flooded mine. Although improvements in the model calibration were possible by introducing spatial variability in permeability parameters and adjusting barrier properties, more detailed parameterizations have increased uncertainty because of the limited data set. 
The preliminary identification of data needs includes continuous streamflow, mine discharge rate, and groundwater levels in the mines and adjacent areas. Data collected whe
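The two storage estimates quoted in this abstract can be cross-checked with simple arithmetic, assuming stored water volume scales linearly with bulk porosity for a fixed saturated mined volume. That saturated volume is not given in the abstract; the sketch below back-calculates it from the low estimate, so the figure is an inference made for illustration, not a reported value.

```python
def stored_volume(saturated_bulk_volume, porosity):
    """Water in storage = saturated bulk volume of mined seams * porosity."""
    if not 0.0 <= porosity <= 1.0:
        raise ValueError("porosity must be a fraction between 0 and 1")
    return saturated_bulk_volume * porosity


# Reported: about 60 billion gallons at 11 percent porosity.
LOW_POROSITY, LOW_VOLUME_BGAL = 0.11, 60.0
HIGH_POROSITY = 0.40

# Inferred (not reported) saturated bulk volume, in billion gallons.
implied_bulk_bgal = LOW_VOLUME_BGAL / LOW_POROSITY

if __name__ == "__main__":
    high = stored_volume(implied_bulk_bgal, HIGH_POROSITY)
    print(f"implied saturated bulk volume: {implied_bulk_bgal:.0f} billion gallons")
    print(f"storage at 40 percent porosity: {high:.0f} billion gallons")
```

Under this assumption the high-porosity case comes to about 218 billion gallons, consistent with the reported upper estimate of 220 billion gallons once rounding is taken into account.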
Hammond, Kenric W; Ben-Ari, Alon Y; Laundry, Ryan J; Boyko, Edward J; Samore, Matthew H
2015-12-01
Free text in electronic health records resists large-scale analysis. Text records facts of interest not found in encoded data, and text mining enables their retrieval and quantification. The U.S. Department of Veterans Affairs (VA) clinical data repository affords an opportunity to apply text-mining methodology to study clinical questions in large populations. To assess the feasibility of text mining, investigation of the relationship between exposure to adverse childhood experiences (ACEs) and recorded diagnoses was conducted among all VA-treated Gulf war veterans, utilizing all progress notes recorded from 2000-2011. Text processing extracted ACE exposures recorded among 44.7 million clinical notes belonging to 243,973 veterans. The relationship of ACE exposure to adult illnesses was analyzed using logistic regression. Bias considerations were assessed. ACE score was strongly associated with suicide attempts and serious mental disorders (ORs = 1.84 to 1.97), and less so with behaviorally mediated and somatic conditions (ORs = 1.02 to 1.36) per unit. Bias adjustments did not remove persistent associations between ACE score and most illnesses. Text mining to detect ACE exposure in a large population was feasible. Analysis of the relationship between ACE score and adult health conditions yielded patterns of association consistent with prior research. Copyright © 2015 International Society for Traumatic Stress Studies.
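Because the odds ratios in this abstract are reported per unit of ACE score, a logistic regression model implies that they compound multiplicatively across units. The sketch below shows that arithmetic; the baseline probability in the usage example is hypothetical, and the calculation assumes the effect is log-linear in ACE score, as a per-unit odds ratio implies.

```python
import math


def odds_ratio_for_increase(or_per_unit, units):
    """Odds ratio implied by a `units`-point increase in the predictor."""
    beta = math.log(or_per_unit)  # logistic regression coefficient
    return math.exp(beta * units)


def apply_odds_ratio(baseline_probability, odds_ratio):
    """Convert a baseline probability and an odds ratio to a new probability."""
    odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


if __name__ == "__main__":
    # 1.84 is the lower per-unit OR quoted for serious mental disorders.
    or_three_units = odds_ratio_for_increase(1.84, 3)
    print(f"OR for a 3-point higher ACE score: {or_three_units:.2f}")
    # Hypothetical 2 percent baseline risk, chosen only for illustration.
    print(f"risk at baseline 0.02: {apply_odds_ratio(0.02, or_three_units):.3f}")
```

The example also shows why odds ratios should not be read as risk ratios: the change in probability depends on the baseline.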
Three diameter-limit cuttings in West Virginia hardwoods a 5-year report
Russell J. Hutnik
1958-01-01
Mine timbers are a basic need of West Virginia's giant coal industry. The annual requirement of sawed mine timbers is roughly 250 million board feet. The mines also use a large volume of wood in rough form for props and lagging. Yet, compared to sawlogs and veneer logs, these mine timbers are low-value products. This means that they must be produced at low cost....
Improved fire protection system for underground fueling areas. Volume II. Final report Sep 77-Oct 81
DOE Office of Scientific and Technical Information (OSTI.GOV)
McDonald, L.; Kennedy, D.; Reid, G.
1981-10-01
The objectives of this investigation were to (1) develop safe practice guidelines that will minimize the chance of fires in underground fueling areas and (2) to develop a low-cost, reliable, automatic fire control system (AFCS) for underground fueling areas. Volume I of the report covered the period from June 21, 1976, to September 30, 1977, and included (1) the preparation of safe practice guidelines for underground fueling areas; (2) preparation of recommended AFCS design concepts for underground fueling areas; and (3) the design, fabrication, and in-mine fire test of an AFCS at Pine Creek Mine, Bishop, Calif. Volume II of the report covers the period from September 30, 1977, to September 30, 1981, and includes (1) a long-term validation test of the AFCS in the Pine Creek Mine, (2) a study of the environmental effects of aqueous film-forming foam, (3) the design and installation of a system at AMAX Buick Mine, Boss, Mo., (4) the design of a system for enclosed fuel areas, and (5) the design of a system for semipermanent fueling areas.
Feasibility of mining lunar resources for earth use: Circa 2000 AD. Volume 2: Technical discussion
NASA Technical Reports Server (NTRS)
Nishioka, K.; Arno, R. D.; Alexander, A. D.; Slye, R. E.
1973-01-01
The technologies and systems required to establish the mining base, mine, refine, and return lunar resources to earth are discussed. Gross equipment requirements, their weights and costs are estimated and documented. The operational requirements are analyzed and tabulated. Diagrams of equipment and processing facilities are provided.
Text mining and its potential applications in systems biology.
Ananiadou, Sophia; Kell, Douglas B; Tsujii, Jun-ichi
2006-12-01
With biomedical literature increasing at a rate of several thousand papers per week, it is impossible to keep abreast of all developments; therefore, automated means to manage the information overload are required. Text mining techniques, which involve the processes of information retrieval, information extraction and data mining, provide a means of solving this. By adding meaning to text, these techniques produce a more structured analysis of textual knowledge than simple word searches, and can provide powerful tools for the production and analysis of systems biology models.
Forming artificial soils from waste materials for mine site rehabilitation
NASA Astrophysics Data System (ADS)
Yellishetty, Mohan; Wong, Vanessa; Taylor, Michael; Li, Johnson
2014-05-01
Surface mining invariably requires the removal of significant quantities of waste rock (overburden) and so often produces large volumes of solid waste. As mines expand, larger volumes of waste rock must be moved, and these require extensive areas for safe disposal and containment. The erosion of these dumps may result in landform instability, which in turn may result in exposure of contaminants such as trace metals, elevated sediment delivery in adjacent waterways, and the subsequent degradation of downstream water quality. The management of solid waste materials from industrial operations is also a key component for a sustainable economy. For example, in addition to overburden, coal mines produce large amounts of waste in the form of fly ash, while sewage treatment plants require disposal of large amounts of compost. Similarly, paper mills produce large volumes of alkaline rejected wood chip waste, which is usually disposed of in landfill. These materials therefore present a challenge for use and re-use in the rehabilitation of mine sites, and provide a number of opportunities for innovative waste disposal. The combination of solid wastes sourced from mines, which are frequently nutrient poor and acidic, with nutrient-rich composted material produced from sewage treatment and alkaline wood chip waste has the potential to lead to a soil suitable for mine rehabilitation and successful seed germination and plant growth. This paper presents findings from two pilot projects which investigated the potential of artificial soils to support plant growth for mine site rehabilitation. We found that pH increased in all the artificial soil mixtures, all of which were able to support plant establishment. Plant growth was greatest in those soils with the greatest proportion of compost due to the higher nutrient content. 
These pot trials suggest that the use of different waste streams to form an artificial soil can potentially be used in mine site rehabilitation where there is a nutrient-rich source of waste.
VisualUrText: A Text Analytics Tool for Unstructured Textual Data
NASA Astrophysics Data System (ADS)
Zainol, Zuraini; Jaymes, Mohd T. H.; Nohuddin, Puteri N. E.
2018-05-01
The amount of unstructured text on the Internet is growing tremendously. Text repositories come from Web 2.0, business intelligence and social networking applications. It is also believed that 80-90% of future data growth will be in the form of unstructured text databases that may contain interesting patterns and trends. Text Mining is a well-known technique for discovering interesting patterns and trends, that is, non-trivial knowledge, from massive unstructured text data. Text Mining covers multidisciplinary fields involving information retrieval (IR), text analysis, natural language processing (NLP), data mining, machine learning, statistics and computational linguistics. This paper discusses the development of a text analytics tool that is proficient in extracting, processing and analyzing unstructured text data and visualizing cleaned text data in multiple forms such as Document Term Matrix (DTM), Frequency Graph, Network Analysis Graph, Word Cloud and Dendrogram. This tool, VisualUrText, was developed to assist students and researchers in extracting interesting patterns and trends in document analyses.
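As an illustration of the Document Term Matrix mentioned in the abstract above, the following is a minimal sketch using only the Python standard library. It is not VisualUrText's implementation; the function names `tokenize` and `document_term_matrix`, the lowercase alphabetic tokenizer, and the two sample sentences are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(docs):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    # The vocabulary is the sorted union of all terms seen in any document.
    vocab = sorted(set().union(*counts))
    matrix = [[count[term] for term in vocab] for count in counts]
    return vocab, matrix


# Hypothetical sample documents, for illustration only.
docs = [
    "Text mining finds patterns in text.",
    "Data mining finds trends in data.",
]
vocab, dtm = document_term_matrix(docs)
for term, column in zip(vocab, zip(*dtm)):
    print(term, list(column))
```

Each row of the matrix is one document and each column one term, so column sums give the term frequencies that a frequency graph or word cloud would typically be drawn from.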
An Empirical Model for Mine-Blast Loading
2014-10-17
fledged experimental program. The numerical approach however suffers from several drawbacks in the mine blast simulations. First, it is a very...Suffield consisted in a pendulum type device to measure global impulse of buried mine [15]. One of the main purposes of the ONAGER pendulum was to study...TP-1 Terminal effects, KTA 1-34 report, 2004. [15] Bues, R., Hlady, S.L. and Bergeron, D.M., Pendulum Measurement of Land Mine Blast Output, Volume
BioCreative Workshops for DOE Genome Sciences: Text Mining for Metagenomics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Cathy H.; Hirschman, Lynette
The objective of this project was to host BioCreative workshops to define and develop text mining tasks to meet the needs of the Genome Sciences community, focusing on metadata information extraction in metagenomics. Following the successful introduction of metagenomics at the BioCreative IV workshop, members of the metagenomics community and BioCreative communities continued discussion to identify candidate topics for a BioCreative metagenomics track for BioCreative V. Of particular interest was the capture of environmental and isolation source information from text. The outcome was to form a “community of interest” around work on the interactive EXTRACT system, which supported interactive tagging of environmental and species data. This experiment is included in the BioCreative V virtual issue of Database. In addition, there was broad participation by members of the metagenomics community in the panels held at BioCreative V, leading to valuable exchanges between the text mining developers and members of the metagenomics research community. These exchanges are reflected in a number of the overview and perspective pieces also being captured in the BioCreative V virtual issue. Overall, this conversation has exposed the metagenomics researchers to the possibilities of text mining, and educated the text mining developers to the specific needs of the metagenomics community.
Benchmarking infrastructure for mutation text mining
2014-01-01
Background Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. Results We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. Conclusion We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption. PMID:24568600
Benchmarking infrastructure for mutation text mining.
Klein, Artjom; Riazanov, Alexandre; Hindle, Matthew M; Baker, Christopher Jo
2014-02-25
Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption.
Lang, Xu; Li, Huabing; Qin, Wen; Yu, Chunshui
2014-01-01
Investigations on hippocampal and amygdalar volume have revealed inconsistent results in patients with posttraumatic stress disorder (PTSD). Little is known about the structural covariance alterations between the hippocampus and amygdala in PTSD. In this study, we evaluated the alteration in the hippocampal and amygdalar volume and their structural covariance in the coal mine gas explosion related PTSD. High resolution T1-weighted magnetic resonance imaging (MRI) was performed on coal mine gas explosion related PTSD male patients (n = 14) and non-traumatized coalminers without PTSD (n = 25). The voxel-based morphometry (VBM) method was used to test the inter-group differences in hippocampal and amygdalar volume as well as the inter-group differences in structural covariance between the ipsilateral hippocampus and amygdala. PTSD patients exhibited decreased gray matter volume (GMV) in the bilateral hippocampi compared to controls (p<0.05, FDR corrected). GMV covariances between the ipsilateral hippocampus and amygdala were significantly reduced in PTSD patients compared with controls (p<0.05, FDR corrected). The coalminers with gas explosion related PTSD had decreased hippocampal volume and structural covariance with the ipsilateral amygdala, suggesting that the structural impairment of the hippocampus may implicate in the pathophysiology of PTSD. PMID:25000505
DrugQuest - a text mining workflow for drug association discovery.
Papanikolaou, Nikolas; Pavlopoulos, Georgios A; Theodosiou, Theodosios; Vizirianakis, Ioannis S; Iliopoulos, Ioannis
2016-06-06
Text mining and data integration methods are gaining ground in the field of health sciences due to the exponential growth of biomedical literature and information stored in biological databases. While such methods mostly try to extract bioentity associations from PubMed, very few of them are dedicated to mining other types of repositories such as chemical databases. Herein, we apply a text mining approach to the DrugBank database in order to explore drug associations based on the DrugBank "Description", "Indication", "Pharmacodynamics" and "Mechanism of Action" text fields. We apply Named Entity Recognition (NER) techniques to these fields to identify chemicals, proteins, genes, pathways and diseases, and we utilize the TextQuest algorithm to find additional biologically significant words. Using a plethora of similarity and partitional clustering techniques, we group the DrugBank records based on their common terms and investigate possible reasons why these records are clustered together. Different views, such as chemicals clustered by their textual information and tag clouds consisting of Significant Terms along with the terms that were used for clustering, are delivered to the user through a user-friendly web interface. DrugQuest is a text mining tool for knowledge discovery: it is designed to cluster DrugBank records based on text attributes in order to find new associations between drugs. The service is freely available at http://bioinformatics.med.uoc.gr/drugquest.
ERIC Educational Resources Information Center
Kong, Siu Cheung; Li, Ping; Song, Yanjie
2018-01-01
This study evaluated a bilingual text-mining system, which incorporated a bilingual taxonomy of key words and provided hierarchical visualization, for understanding learner-generated text in the learning management systems through automatic identification and counting of matching key words. A class of 27 in-service teachers studied a course…
Beyond accuracy: creating interoperable and scalable text-mining web services.
Wei, Chih-Hsuan; Leaman, Robert; Lu, Zhiyong
2016-06-15
The biomedical literature is a knowledge-rich resource and an important foundation for future research. With over 24 million articles in PubMed and an increasing growth rate, research in automated text processing is becoming increasingly important. We report here our recently developed web-based text mining services for biomedical concept recognition and normalization. Unlike most text-mining software tools, our web services integrate several state-of-the-art entity tagging systems (DNorm, GNormPlus, SR4GN, tmChem and tmVar) and offer a batch-processing mode able to process arbitrary text input (e.g. scholarly publications, patents and medical records) in multiple formats (e.g. BioC). We support multiple standards to make our service interoperable and allow simpler integration with other text-processing pipelines. To maximize scalability, we have preprocessed all PubMed articles, and use a computer cluster for processing large requests of arbitrary text. Our text-mining web service is freely available at http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/#curl. Contact: Zhiyong.Lu@nih.gov. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.
75 FR 51291 - National Science Board: Sunshine Act Meetings; Notice
Federal Register 2010, 2011, 2012, 2013, 2014
2010-08-19
...-Gathering Activities. [cir] COV Report Text-Mining. [cir] Design of Research Questions for External Input. [cir] SBE/CISE Text-Mining Projects. [cir] Using a Blog for Informal Input. Committee on Education and...
Imitating manual curation of text-mined facts in biomedicine.
Rodriguez-Esteban, Raul; Iossifov, Ivan; Rzhetsky, Andrey
2006-09-08
Text-mining algorithms make mistakes in extracting facts from natural-language texts. In biomedical applications, which rely on use of text-mined data, it is critical to assess the quality (the probability that the message is correctly extracted) of individual facts--to resolve data conflicts and inconsistencies. Using a large set of almost 100,000 manually produced evaluations (most facts were independently reviewed more than once, producing independent evaluations), we implemented and tested a collection of algorithms that mimic human evaluation of facts provided by an automated information-extraction system. The performance of our best automated classifiers closely approached that of our human evaluators (ROC score close to 0.95). Our hypothesis is that, were we to use a larger number of human experts to evaluate any given sentence, we could implement an artificial-intelligence curator that would perform the classification job at least as accurately as an average individual human evaluator. We illustrated our analysis by visualizing the predicted accuracy of the text-mined relations involving the term cocaine.
Assimilating Text-Mining & Bio-Informatics Tools to Analyze Cellulase structures
NASA Astrophysics Data System (ADS)
Satyasree, K. P. N. V., Dr; Lalitha Kumari, B., Dr; Jyotsna Devi, K. S. N. V.; Choudri, S. M. Roy; Pratap Joshi, K.
2017-08-01
Text mining is one of the most promising ways of automatically extracting information from the vast biological literature. To exploit its potential, the knowledge encoded in the text should be converted to a semantic representation, such as entities and relations, that can be analyzed by machines. Large-scale practical systems for this purpose are rare, but text mining could be helpful for generating or validating predictions. Cellulases, the enzymes that degrade cellulose, have abundant applications in various industries. Cellulase-producing organisms, the bacterium Bacillus subtilis and the fungus Pseudomonas putida, were isolated from top soil of Guntur Dt., A.P., India. Absolute cultures were conserved on potato dextrose agar medium for molecular studies. In this paper, we present how text mining concepts can be used to analyze cellulase-producing bacteria and fungi; their comparative structures are also studied with the aid of well-established, high-quality standard bioinformatic tools such as Bioedit, Swissport, Protparam and EMBOSSwin, with which complete data on cellulases, such as the structure and constituents of the enzyme, have been obtained.
Gurulingappa, Harsha; Toldo, Luca; Rajput, Abdul Mateen; Kors, Jan A; Taweel, Adel; Tayrouz, Yorki
2013-11-01
The aim of this study was to assess the impact of automatically detected adverse event signals from text and open-source data on the prediction of drug label changes. Open-source adverse effect data were collected from the FAERS, Yellow Cards and SIDER databases. A shallow linguistic relation extraction system (JSRE) was applied for extraction of adverse effects from MEDLINE case reports. A statistical approach was applied to the extracted datasets for signal detection and subsequent prediction of label changes issued for 29 drugs by the UK Regulatory Authority in 2009. 76% of drug label changes were automatically predicted. Out of these, 6% of drug label changes were detected only by text mining. JSRE enabled precise identification of four adverse drug events from MEDLINE that were undetectable otherwise. Changes in drug labels can be predicted automatically using data and text mining techniques. Text mining technology is mature and well-placed to support pharmacovigilance tasks. Copyright © 2013 John Wiley & Sons, Ltd.
Mining Adverse Drug Reactions in Social Media with Named Entity Recognition and Semantic Methods.
Chen, Xiaoyi; Deldossi, Myrtille; Aboukhamis, Rim; Faviez, Carole; Dahamna, Badisse; Karapetiantz, Pierre; Guenegou-Arnoux, Armelle; Girardeau, Yannick; Guillemin-Lanne, Sylvie; Lillo-Le-Louët, Agnès; Texier, Nathalie; Burgun, Anita; Katsahian, Sandrine
2017-01-01
Suspected adverse drug reactions (ADR) reported by patients through social media can be a complementary source to current pharmacovigilance systems. However, the performance of text mining tools applied to social media text data to discover ADRs needs to be evaluated. In this paper, we introduce the approach developed to mine ADR from French social media. A protocol of evaluation is highlighted, which includes a detailed sample size determination and evaluation corpus constitution. Our text mining approach provided very encouraging preliminary results with F-measures of 0.94 and 0.81 for recognition of drugs and symptoms respectively, and with F-measure of 0.70 for ADR detection. Therefore, this approach is promising for downstream pharmacovigilance analysis.
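The F-measures quoted in the abstract above combine precision and recall. As a hedged illustration of the standard balanced F-measure, F = 2PR / (P + R), the sketch below scores a set of predicted entities against a gold-standard set. The function name `precision_recall_f1` and the toy drug names are assumptions for this example; the abstract does not describe the authors' evaluation code.

```python
def precision_recall_f1(predicted, gold):
    """Return (precision, recall, F1) for two sets of extracted items."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold annotations and system output, for illustration only.
gold = {"aspirin", "ibuprofen", "paracetamol", "codeine"}
predicted = {"aspirin", "ibuprofen", "caffeine"}
p, r, f1 = precision_recall_f1(predicted, gold)
print(round(p, 3), round(r, 3), round(f1, 3))
```

Here two of the three predictions are correct (precision 2/3) and two of the four gold items are found (recall 1/2), which gives F = 4/7.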
Detection and Evaluation of Cheating on College Exams Using Supervised Classification
ERIC Educational Resources Information Center
Cavalcanti, Elmano Ramalho; Pires, Carlos Eduardo; Cavalcanti, Elmano Pontes; Pires, Vládia Freire
2012-01-01
Text mining has been used for various purposes, such as document classification and extraction of domain-specific information from text. In this paper we present a study in which text mining methodology and algorithms were properly employed for academic dishonesty (cheating) detection and evaluation on open-ended college exams, based on document…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mendis, M.S.; Rosenberg, J.I.; Medville, D.M.
1980-03-01
This report presents a summary of the analytical approach taken and the conclusions reached in an assessment of the supply and demand for manpower in the coal mining industry through the year 2000. A hybrid system dynamics/econometric model of the coal mining industry was developed which incorporates relationships between technological change, labor productivity, production costs, wages, graduation rates, and other key variables in estimating imbalances between labor supply and demand. Study results indicate that while the supply of production workers is expected to be sufficient under most future demand scenarios, periodic shortages of experienced workers, especially in the Northern Great Plains, can be expected. Other study findings are that the supply of mining engineers will be sufficient under all but the highest coal demand scenario, that a shortage of faculty will affect the supply of mining engineers in the near term, and that the employment of mining technicians is expected to exhibit the largest increase of any labor category studied. In this volume the nature of the coal mining manpower problem is discussed, a detailed description of the analysis conducted and the sources of data used is provided, and the findings of the study are presented.
What the papers say: Text mining for genomics and systems biology
2010-01-01
Keeping up with the rapidly growing literature has become virtually impossible for most scientists. This can have dire consequences. First, we may waste research time and resources on reinventing the wheel simply because we can no longer maintain a reliable grasp on the published literature. Second, and perhaps more detrimental, judicious (or serendipitous) combination of knowledge from different scientific disciplines, which would require following disparate and distinct research literatures, is rapidly becoming impossible for even the most ardent readers of research publications. Text mining -- the automated extraction of information from (electronically) published sources -- could potentially fulfil an important role -- but only if we know how to harness its strengths and overcome its weaknesses. As we do not expect that the rate at which scientific results are published will decrease, text mining tools are now becoming essential in order to cope with, and derive maximum benefit from, this information explosion. In genomics, this is particularly pressing as more and more rare disease-causing variants are found and need to be understood. Not being conversant with this technology may put scientists and biomedical regulators at a severe disadvantage. In this review, we introduce the basic concepts underlying modern text mining and its applications in genomics and systems biology. We hope that this review will serve three purposes: (i) to provide a timely and useful overview of the current status of this field, including a survey of present challenges; (ii) to enable researchers to decide how and when to apply text mining tools in their own research; and (iii) to highlight how the research communities in genomics and systems biology can help to make text mining from biomedical abstracts and texts more straightforward. PMID:21106487
Knowledge based word-concept model estimation and refinement for biomedical text mining.
Jimeno Yepes, Antonio; Berlanga, Rafael
2015-02-01
Text mining of scientific literature has been essential for setting up large public biomedical databases, which are being widely used by the research community. In the biomedical domain, the existence of a large number of terminological resources and knowledge bases (KB) has enabled a myriad of machine learning methods for different text mining related tasks. Unfortunately, KBs have not been devised for text mining tasks but for human interpretation, thus performance of KB-based methods is usually lower when compared to supervised machine learning methods. The disadvantage of supervised methods, though, is that they require labeled training data and are therefore not useful for large scale biomedical text mining systems. KB-based methods do not have this limitation. In this paper, we describe a novel method to generate word-concept probabilities from a KB, which can serve as a basis for several text mining tasks. This method not only takes into account the underlying patterns within the descriptions contained in the KB but also those in texts available from large unlabeled corpora such as MEDLINE. The parameters of the model have been estimated without training data. Patterns from MEDLINE have been built using MetaMap for entity recognition and related using co-occurrences. The word-concept probabilities were evaluated on the task of word sense disambiguation (WSD). The results showed that our method obtained a higher degree of accuracy than other state-of-the-art approaches when evaluated on the MSH WSD data set. We also evaluated our method on the task of document ranking using MEDLINE citations. These results also showed an increase in performance over existing baseline retrieval approaches. Copyright © 2014 Elsevier Inc. All rights reserved.
Assessing semantic similarity of texts - Methods and algorithms
NASA Astrophysics Data System (ADS)
Rozeva, Anna; Zerkova, Silvia
2017-12-01
Assessing the semantic similarity of texts is an important part of different text-related applications such as educational systems, information retrieval, and text summarization. This task is performed by sophisticated analysis that implements text-mining techniques. Text mining involves several pre-processing steps, which yield a structured, representative model of the documents in a corpus by extracting and selecting the features that characterize their content. Generally the model is vector-based and enables further analysis with knowledge discovery approaches. Algorithms and measures are used for assessing texts at the syntactic and semantic levels. An important text-mining method and similarity measure is latent semantic analysis (LSA). It reduces the dimensionality of the document vector space and better captures the text semantics. The mathematical background of LSA for deriving the meaning of the words in a given text by exploring their co-occurrence is examined. The algorithm for obtaining the vector representation of words and their corresponding latent concepts in a reduced multidimensional space, as well as the similarity calculation, are presented.
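The vector-based document model and similarity calculation described in the abstract above can be illustrated with a small sketch. Note that this is plain cosine similarity over raw term-count vectors, not LSA itself: the singular value decomposition that LSA uses to reduce dimensionality is deliberately left out to keep the example dependency-free. The function names and sample sentences are assumptions for this example, not the authors' algorithm.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Map a text to a sparse term-count vector (assumed lowercase tokenizer)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample texts, for illustration only.
doc_a = term_vector("text mining")
doc_b = term_vector("text retrieval")
print(cosine_similarity(doc_a, doc_b))
```

Because this measure only counts shared surface terms, two texts that use different words for the same concept score zero; capturing that kind of relatedness is the motivation for projecting the vectors into a reduced latent space, as LSA does.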
Mine Waste at The Kherzet Youcef Mine : Environmental Characterization
NASA Astrophysics Data System (ADS)
Issaad, Mouloud; Boutaleb, Abdelhak; Kolli, Omar
2017-04-01
Mining activity in Algeria has existed since antiquity, and it became very important from the 20th century onward. This activity has virtually ceased since the beginning of the 1990s, leaving many mine sites abandoned (so-called orphan mines). The abandonment of mines now poses many environmental problems (soil pollution, contamination of surface water, mining collapses...). Mining wastes often occupy large volumes and can be hazardous to the environment and human health, hazards that were often neglected in the past: faulty geotechnical implementation, acid mine drainage (AMD), alkalinity, and the presence of pollutants and toxic substances (heavy metals, cyanide...). The study started six years ago and covers all mines located in NE Algeria, almost all of which have been closed for more than thirty years. The priority is therefore to obtain an overview of the whole study area. After the inventory of the abandoned mines, rock drainage prediction will help us classify sites according to their acid-generating potential.
Chapter 7: Selecting tree species for reforestation of Appalachian mined lands
V. Davis; J.A. Burger; R. Rathfon; C.E. Zipper
2017-01-01
The Forestry Reclamation Approach (FRA) is a method for reclaiming coal-mined land to forested postmining land uses under the federal Surface Mining Control and Reclamation Act of 1977 (SMCRA) (Chapter 2, this volume). Step 4 of the FRA is to plant native trees for commercial timber value, wildlife habitat, soil stability, watershed protection, and other environmental...
ERIC Educational Resources Information Center
Benoit, Gerald
2002-01-01
Discusses data mining (DM) and knowledge discovery in databases (KDD), taking the view that KDD is the larger view of the entire process, with DM emphasizing the cleaning, warehousing, mining, and visualization of knowledge discovery in databases. Highlights include algorithms; users; the Internet; text mining; and information extraction.…
Geophysical exploration of historical mine dumps for the estimation of valuable residuals
NASA Astrophysics Data System (ADS)
Martin, Tina; Knieß, Rudolf; Noell, Ursula; Hupfer, Sarah; Kuhn, Kerstin; Günther, Thomas
2015-04-01
Within the project ROBEHA, funded by the German Federal Ministry of Education and Research (033R105) the economic potential of different abandoned dump sites for mine waste in the Harz Mountains was investigated. Two different mining dumps were geophysically and mineralogically analysed in order to characterize the mine dump structure and to estimate the volume of the potential recycling material. The geophysical methods comprised geoelectrics, radar, and spectral induced polarization (SIP). One about 100-year old mining dump containing residues from density separated Ag- and Sb-rich Pb (Zn)-gangue ores was investigated in detail. Like most small-scale mining waste disposal sites this investigated dump is very heterogeneously structured. Therefore, 27 geoelectrical profiles, more than 50 radar profiles, and several SIP profiles were measured and analysed. The results from the radar measurements, registered with the GSSI system and a shielded 200 MHz antenna, show the near surface boundary layer (down to 3-4 m beneath surface) of the waste residuals. These results can be used as pre-information for the inversion process of the geoelectrical data. The geoelectrical results reveal the mineral residues as layers with higher resistivities (> 300 Ohm*m) than the surrounding material. The SIP method found low phase signals (< 0.5°) for the residues. To estimate the volume of the potentially reusable material we analysed each geoelectrical profile and interpolated between the single profiles using the BERT algorithm. Taking into account the wooded areas of the mine dump and other parameters we get a first estimate for the volume of the residues but the economical viability and the environmental impact of the reworking of the dump still needs to be evaluated in detail. The results of the second mine dump, an abandoned Cu and Zn-rich slag heap, show that the slag residues are characterized by higher resistivities and higher phases. 
A localization of the slag residues which are covered by organic material could be realized applying these geophysical methods.
ERIC Educational Resources Information Center
Bowers, Alex J.; Chen, Jingjing
2015-01-01
The purpose of this study is to bring together recent innovations in the research literature around school district capital facility finance, municipal bond elections, statistical models of conditional time-varying outcomes, and data mining algorithms for automated text mining of election ballot proposals to examine the factors that influence the…
New directions in biomedical text annotation: definitions, guidelines and corpus construction
Wilbur, W John; Rzhetsky, Andrey; Shatkay, Hagit
2006-01-01
Background While biomedical text mining is emerging as an important research area, practical results have proven difficult to achieve. We believe that an important first step towards more accurate text-mining lies in the ability to identify and characterize text that satisfies various types of information needs. We report here the results of our inquiry into properties of scientific text that have sufficient generality to transcend the confines of a narrow subject area, while supporting practical mining of text for factual information. Our ultimate goal is to annotate a significant corpus of biomedical text and train machine learning methods to automatically categorize such text along certain dimensions that we have defined. Results We have identified five qualitative dimensions that we believe characterize a broad range of scientific sentences, and are therefore useful for supporting a general approach to text-mining: focus, polarity, certainty, evidence, and directionality. We define these dimensions and describe the guidelines we have developed for annotating text with regard to them. To examine the effectiveness of the guidelines, twelve annotators independently annotated the same set of 101 sentences that were randomly selected from current biomedical periodicals. Analysis of these annotations shows 70–80% inter-annotator agreement, suggesting that our guidelines indeed present a well-defined, executable and reproducible task. Conclusion We present our guidelines defining a text annotation task, along with annotation results from multiple independently produced annotations, demonstrating the feasibility of the task. The annotation of a very large corpus of documents along these guidelines is currently ongoing. These annotations form the basis for the categorization of text along multiple dimensions, to support viable text mining for experimental results, methodology statements, and other forms of information. 
We are currently developing machine learning methods, to be trained and tested on the annotated corpus, that would allow for the automatic categorization of biomedical text along the general dimensions that we have presented. The guidelines in full detail, along with annotated examples, are publicly available. PMID:16867190
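The 70–80% inter-annotator agreement quoted above can be illustrated with a simple pairwise calculation. This is only a sketch: the abstract does not say how agreement was computed, so the mean pairwise percent agreement used here, the function name `pairwise_agreement`, and the toy labels are all assumptions made for illustration.

```python
from itertools import combinations


def pairwise_agreement(labels_by_annotator):
    """Mean fraction of items on which two annotators chose the same label.

    labels_by_annotator: one equal-length label sequence per annotator.
    """
    pairs = list(combinations(labels_by_annotator, 2))
    if not pairs:
        raise ValueError("need at least two annotators")
    total = 0.0
    for a, b in pairs:
        if len(a) != len(b) or not a:
            raise ValueError("annotators must label the same non-empty item set")
        # Fraction of items where this pair of annotators agrees.
        total += sum(x == y for x, y in zip(a, b)) / len(a)
    return total / len(pairs)


# Hypothetical polarity labels from three annotators on four sentences.
toy_labels = [
    ["pos", "neg", "pos", "pos"],
    ["pos", "neg", "neg", "pos"],
    ["pos", "pos", "neg", "pos"],
]
agreement = pairwise_agreement(toy_labels)
print(f"mean pairwise agreement: {agreement:.2f}")
```

Percent agreement does not correct for chance; a chance-corrected statistic such as Cohen's or Fleiss' kappa would be a stricter alternative.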
Soil biochemical properties in brown and gray mine soils with and without hydroseeding
NASA Astrophysics Data System (ADS)
Thomas, C.; Sexstone, A.; Skousen, J.
2015-09-01
Surface coal mining in the eastern USA disturbs hundreds of hectares of land every year and removes valuable and ecologically diverse eastern deciduous forests. Reclamation involves restoring the landscape to approximate original contour, replacing the topsoil, and revegetating the site with trees and herbaceous species to a designated post-mining land use. Re-establishing an ecosystem of ecological and economic value as well as restoring soil quality on disturbed sites are the goals of land reclamation, and microbial properties of mine soils can be indicators of restoration success. Reforestation plots were constructed in 2007 using weathered brown sandstone or unweathered gray sandstone as topsoil substitutes to evaluate tree growth and soil properties at Arch Coal's Birch River mine in West Virginia, USA. All plots were planted with 12 hardwood tree species and subplots were hydroseeded with a herbaceous seed mix and fertilizer. After 6 years, the average tree volume index was nearly 10 times greater for trees grown in brown (3853 cm³) compared to gray mine soils (407 cm³). Average pH of brown mine soils increased from 4.7 to 5.0, while gray mine soils declined from 7.9 to 7.0. Hydroseeding doubled tree volume index and ground cover on both mine soils. Hydroseeding doubled microbial biomass carbon (MBC) on brown mine soils (8.7 vs. 17.5 mg kg⁻¹), but showed no effect on gray mine soils (13.3 vs. 12.8 mg kg⁻¹). Hydroseeding also increased the ratio of MBC to soil organic C in both soils and more than tripled the ratio for potentially mineralizable nitrogen (PMN) to total N. Brown mine soils were a better growth medium than gray mine soils and hydroseeding was an important component of reclamation due to improved biochemical properties and microbial activity in mine soils.
NASA Astrophysics Data System (ADS)
Scheele, C. J.; Huang, Q.
2016-12-01
In the past decade, the rise in social media has led to the development of a vast number of social media services and applications. Disaster management represents one such application, leveraging the massive data generated for event detection, response, and recovery. In order to find disaster relevant social media data, current approaches utilize natural language processing (NLP) methods based on keywords, or machine learning algorithms relying on text only. However, these approaches cannot be perfectly accurate due to the variability and uncertainty in language used on social media. To improve current methods, an enhanced text-mining framework is proposed that incorporates location information from social media and authoritative remote sensing datasets for detecting disaster relevant social media posts, which are determined by assessing the textual content using common text mining methods and how the post relates spatiotemporally to the disaster event. To assess the framework, geo-tagged Tweets were collected for three different spatial and temporal disaster events: hurricane, flood, and tornado. Remote sensing data and products for each event were then collected using RealEarth™. Both Naive Bayes and Logistic Regression classifiers were used to compare the accuracy within the enhanced text-mining framework. Finally, the accuracies from the enhanced text-mining framework were compared to the current text-only methods for each of the case study disaster events. The results from this study address the need for more authoritative data when using social media in disaster management applications.
NASA Astrophysics Data System (ADS)
Tirupattur, Naveen; Lapish, Christopher C.; Mukhopadhyay, Snehasis
2011-06-01
Text mining, sometimes alternately referred to as text analytics, refers to the process of extracting high-quality knowledge from the analysis of textual data. Text mining has a wide variety of applications in areas such as biomedical science, news analysis, and homeland security. In this paper, we describe an approach and some relatively small-scale experiments which apply text mining to neuroscience research literature to find novel associations among a diverse set of entities. Neuroscience is a discipline which encompasses an exceptionally wide range of experimental approaches and rapidly growing interest. This combination results in an overwhelmingly large and often diffuse literature which makes a comprehensive synthesis difficult. Understanding the relations or associations among the entities appearing in the literature not only improves the researchers' current understanding of recent advances in their field, but also provides an important computational tool to formulate novel hypotheses and thereby assist in scientific discoveries. We describe a methodology to automatically mine the literature and form novel associations through direct analysis of published texts. The method first retrieves a set of documents from databases such as PubMed using a set of relevant domain terms. In the current study these terms yielded document sets ranging from 160,909 to 367,214 documents. Each document is then represented in a numerical vector form from which an Association Graph is computed which represents relationships between all pairs of domain terms, based on co-occurrence. Association graphs can then be subjected to various graph theoretic algorithms such as transitive closure and cycle (circuit) detection to derive additional information, and can also be visually presented to a human researcher for understanding.
In this paper, we present three relatively small-scale problem-specific case studies to demonstrate that such an approach is very successful in replicating a neuroscience expert's mental model of object-object associations entirely by means of text mining. These preliminary results provide the confidence that this type of text mining based research approach provides an extremely powerful tool to better understand the literature and drive novel discovery for the neuroscience community.
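The co-occurrence association graph and transitive closure described in this abstract can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it matches terms by plain substring search instead of building document vectors, and the function names, the `min_count` threshold, and the toy documents are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def association_graph(documents, terms, min_count=1):
    """Undirected graph linking terms that co-occur in >= min_count documents."""
    counts = defaultdict(int)
    for text in documents:
        lowered = text.lower()
        # Terms present in this document (naive substring match).
        present = sorted(t for t in terms if t in lowered)
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
    graph = {t: set() for t in terms}
    for (a, b), n in counts.items():
        if n >= min_count:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def transitive_closure(graph):
    """For each term, every other term reachable through chains of associations."""
    closure = {}
    for start in graph:
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in graph[node]:
                if nxt != start and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        closure[start] = seen
    return closure


# Hypothetical mini-corpus and domain terms.
docs = [
    "Dopamine modulates activity in the prefrontal cortex.",
    "Working memory depends on the prefrontal cortex.",
    "Serotonin levels were measured.",
]
terms = ["dopamine", "prefrontal cortex", "working memory", "serotonin"]

graph = association_graph(docs, terms)
closure = transitive_closure(graph)
# Associations implied by a chain of links but never stated in a single document.
indirect = closure["dopamine"] - graph["dopamine"]
print(indirect)
```

The indirect links are the candidates for "novel" associations: terms that never share a document but are connected through an intermediate term.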
Data Streams: An Overview and Scientific Applications
NASA Astrophysics Data System (ADS)
Aggarwal, Charu C.
In recent years, advances in hardware technology have facilitated the ability to collect data continuously. Simple transactions of everyday life such as using a credit card, a phone, or browsing the web lead to automated data storage. Similarly, advances in information technology have led to large flows of data across IP networks. In many cases, these large volumes of data can be mined for interesting and relevant information in a wide variety of applications. When the volume of the underlying data is very large, it leads to a number of computational and mining challenges: With increasing volume of the data, it is no longer possible to process the data efficiently by using multiple passes. Rather, one can process a data item at most once. This leads to constraints on the implementation of the underlying algorithms. Therefore, stream mining algorithms typically need to be designed so that the algorithms work with one pass of the data. In most cases, there is an inherent temporal component to the stream mining process. This is because the data may evolve over time. This behavior of data streams is referred to as temporal locality. Therefore, a straightforward adaptation of one-pass mining algorithms may not be an effective solution to the task. Stream mining algorithms need to be carefully designed with a clear focus on the evolution of the underlying data. Another important characteristic of data streams is that they are often mined in a distributed fashion. Furthermore, the individual processors may have limited processing and memory. Examples of such cases include sensor networks, in which it may be desirable to perform in-network processing of data streams with limited processing and memory [1, 2]. This chapter will provide an overview of the key challenges in stream mining algorithms which arise from the unique setup in which these problems are encountered. This chapter is organized as follows.
In the next section, we will discuss the generic challenges that stream mining poses to a variety of data management and data mining problems. The next section also deals with several issues which arise in the context of data stream management. In Sect. 3, we discuss several mining algorithms on the data stream model. Section 4 discusses various scientific applications of data streams. Section 5 discusses the research directions and conclusions.
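The one-pass constraint discussed in this chapter abstract can be made concrete with a small example. The sketch below uses Welford's online algorithm for mean and variance, chosen here only as a familiar single-pass technique; the chapter abstract does not name it. The class name `RunningStats` and the sample readings are hypothetical.

```python
class RunningStats:
    """Mean and sample variance maintained in one pass with O(1) memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one stream item into the summary; the item is then discarded."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance of the items seen so far (0.0 with fewer than two)."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for reading in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):
    stats.update(reading)  # each reading is seen exactly once
print(stats.n, stats.mean, stats.variance)
```

This summary weights all items equally, so it does not address the temporal evolution the abstract emphasizes; a decayed or windowed variant would be needed for that.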
Using Open Web APIs in Teaching Web Mining
ERIC Educational Resources Information Center
Chen, Hsinchun; Li, Xin; Chau, M.; Ho, Yi-Jen; Tseng, Chunju
2009-01-01
With the advent of the World Wide Web, many business applications that utilize data mining and text mining techniques to extract useful business information on the Web have evolved from Web searching to Web mining. It is important for students to acquire knowledge and hands-on experience in Web mining during their education in information systems…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Young, B.C.; Schmit, C.R.
The report, conducted by Energy and Environmental Research Center, was funded by the US Trade and Development Agency. The objective of this report was to determine the technical, environmental and economic feasibility of developing, demonstrating, and commercializing underground coal gasification (UCG) at the Krabi coal mine site in Southern Thailand. This is Volume 1, the Progress Report for the period December 1, 1995, through December 31, 1995.
Zhao, Ning; Zheng, Guang; Li, Jian; Zhao, Hong-Yan; Lu, Cheng; Jiang, Miao; Zhang, Chi; Guo, Hong-Tao; Lu, Ai-Ping
2018-01-09
To identify the commonalities between rheumatoid arthritis (RA) and diabetes mellitus (DM) to understand the mechanisms of Chinese medicine (CM) in different diseases with the same treatment. A text mining approach was adopted to analyze the commonalities between RA and DM according to CM and biological elements. The major commonalities were subsequently verified in RA and DM rat models, in which herbal formula for the treatment of both RA and DM identified via text mining was used as the intervention. Similarities were identified between RA and DM regarding the CM approach used for diagnosis and treatment, as well as the networks of biological activities affected by each disease, including the involvement of adhesion molecules, oxidative stress, cytokines, T-lymphocytes, apoptosis, and inflammation. The Ramulus Cinnamomi-Radix Paeoniae Alba-Rhizoma Anemarrhenae is an herbal combination used to treat RA and DM. This formula demonstrated similar effects on oxidative stress and inflammation in rats with collagen-induced arthritis, which supports the text mining results regarding the commonalities between RA and DM. Commonalities between the biological activities involved in RA and DM were identified through text mining, and both RA and DM might be responsive to the same intervention at a specific stage.
NASA Technical Reports Server (NTRS)
Hughes, T. H.; Dillion, A. C., III; White, J. R., Jr.; Drummond, S. E., Jr.; Hooks, W. G.
1975-01-01
Because of the volume of coal produced by strip mining, the proximity of mining operations, and the diversity of mining methods (e.g. contour stripping, area stripping, multiple seam stripping, and augering, as well as underground mining), the Warrior Coal Basin seemed best suited for initial studies on the physical impact of strip mining in Alabama. Two test sites (Cordova and Searles), representative of the various strip mining techniques and environmental problems, were chosen for intensive studies of the correlation between remote sensing and ground truth data. Efforts were eventually concentrated in the Searles Area, since it is more accessible and offers a better opportunity for study of erosional and depositional processes than the Cordova Area.
NASA Astrophysics Data System (ADS)
Vieira, Alexandre; Matos, João; Lopes, Luis; Martins, Ruben
2016-04-01
Located in the Iberian Pyrite Belt (IPB) northern sector, near the Portuguese/Spanish border, the outcropping São Domingos deposit was mined since Roman time. Between 1854 and 1966 the Mason & Barry Company developed open pit excavation until 120 m depth and underground mining until 420 m depth. The São Domingos subvertical deposit is associated with felsic volcanics and black shales of the IPB Volcano-Sedimentary Complex and is represented by massive sulphide and stockwork ore (py, cpy, sph, ga, tt, aspy) and related supergene enrichment ore (hematite gossan and covellite/chalcocite). Different mine waste classes were mapped around the old open pit: gossan (W1), felsic volcanic and shales (W2), shales (W3) and mining waste landfill (W4). Using the LNEG (Portuguese Geological Survey) CONASA database (company historical mining waste characterization based on 162 shafts and 160 reverse circulation boreholes), a methodology for tridimensional modelling mining waste pile was followed, and a new mining waste resource is presented. Considering some constraints to waste removal, such as the Mina de São Domingos village proximity of the wastes, the industrial and archaeological patrimony (e.g., mining infrastructures, roman galleries), different resource scenarios were considered: unconditioned resources (total estimates) and conditioned resources (only the volumes without removal constraints considered). Using block modelling (SURPAC software) a mineral inferred resource of 2.38 Mt @ 0.77 g/t Au and 8.26 g/t Ag is estimated in unconditioned volumes of waste. Considering all evaluated wastes, including village areas, an inferred resource of 4.0 Mt @ 0.64 g/t Au and 7.30 g/t Ag is presented, corresponding to a total metal content of 82,878 oz t Au and 955,753 oz t Ag. Keywords. São Domingos mine, mining waste resources, mining waste pile modelling, Iberian Pyrite Belt, Portugal
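The contained-metal totals quoted above follow from tonnage times grade. The sketch below redoes that arithmetic from the rounded figures in the abstract, assuming "oz t" means troy ounces and that the 4.0 Mt figure is in metric tonnes; the function name is hypothetical. Because the inputs are rounded, the result is only expected to land within a few percent of the reported totals.

```python
GRAMS_PER_TROY_OUNCE = 31.1034768


def contained_troy_ounces(tonnage_t, grade_g_per_t):
    """Contained metal in troy ounces from tonnage (t) and grade (g/t)."""
    return tonnage_t * grade_g_per_t / GRAMS_PER_TROY_OUNCE


# Rounded inputs from the abstract: 4.0 Mt @ 0.64 g/t Au and 7.30 g/t Ag.
gold_oz = contained_troy_ounces(4.0e6, 0.64)
silver_oz = contained_troy_ounces(4.0e6, 7.30)

# Totals reported in the abstract, for comparison.
reported_gold_oz = 82_878
reported_silver_oz = 955_753

print(f"Au: {gold_oz:,.0f} oz t (reported {reported_gold_oz:,})")
print(f"Ag: {silver_oz:,.0f} oz t (reported {reported_silver_oz:,})")
```

The small gap between the recomputed and reported values is most likely due to the tonnage and grades being rounded in the abstract.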
Text Mining of Journal Articles for Sleep Disorder Terminologies.
Lam, Calvin; Lai, Fu-Chih; Wang, Chia-Hui; Lai, Mei-Hsin; Hsu, Nanly; Chung, Min-Huey
2016-01-01
Research on publication trends in journal articles on sleep disorders (SDs) and the associated methodologies by using text mining has been limited. The present study involved text mining for terms to determine the publication trends in sleep-related journal articles published during 2000-2013 and to identify associations between SD and methodology terms as well as conducting statistical analyses of the text mining findings. SD and methodology terms were extracted from 3,720 sleep-related journal articles in the PubMed database by using MetaMap. The extracted data set was analyzed using hierarchical cluster analyses and adjusted logistic regression models to investigate publication trends and associations between SD and methodology terms. MetaMap had a text mining precision, recall, and false positive rate of 0.70, 0.77, and 11.51%, respectively. The most common SD term was breathing-related sleep disorder, whereas narcolepsy was the least common. Cluster analyses showed similar methodology clusters for each SD term, except narcolepsy. The logistic regression models showed an increasing prevalence of insomnia, parasomnia, and other sleep disorders but a decreasing prevalence of breathing-related sleep disorder during 2000-2013. Different SD terms were positively associated with different methodology terms regarding research design terms, measure terms, and analysis terms. Insomnia-, parasomnia-, and other sleep disorder-related articles showed an increasing publication trend, whereas those related to breathing-related sleep disorder showed a decreasing trend. Furthermore, experimental studies more commonly focused on hypersomnia and other SDs and less commonly on insomnia, breathing-related sleep disorder, narcolepsy, and parasomnia. Thus, text mining may facilitate the exploration of the publication trends in SDs and the associated methodologies.
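The precision, recall, and false positive rate reported for MetaMap are standard confusion-matrix ratios. The sketch below shows the definitions; the counts passed in are invented purely so that the output lands near the reported 0.70, 0.77, and 11.51%, and are not the study's data. The function name is hypothetical.

```python
def retrieval_metrics(tp, fp, fn, tn):
    """Return (precision, recall, false positive rate) from confusion counts."""
    precision = tp / (tp + fp)  # share of extracted terms that are correct
    recall = tp / (tp + fn)     # share of true terms that were extracted
    fpr = fp / (fp + tn)        # share of non-terms wrongly extracted
    return precision, recall, fpr


# Hypothetical counts, chosen only to illustrate the formulas.
precision, recall, fpr = retrieval_metrics(tp=77, fp=33, fn=23, tn=254)
print(f"precision={precision:.2f} recall={recall:.2f} fpr={fpr:.2%}")
```

Precision and recall ignore true negatives, which is why the false positive rate is reported separately.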
Text Mining to Support Gene Ontology Curation and Vice Versa.
Ruch, Patrick
2017-01-01
In this chapter, we explain how text mining can support the curation of molecular biology databases dealing with protein functions. We also show how curated data can play a disruptive role in the developments of text mining methods. We review a decade of efforts to improve the automatic assignment of Gene Ontology (GO) descriptors, the reference ontology for the characterization of genes and gene products. To illustrate the high potential of this approach, we compare the performances of an automatic text categorizer and show a large improvement of +225 % in both precision and recall on benchmarked data. We argue that automatic text categorization functions can ultimately be embedded into a Question-Answering (QA) system to answer questions related to protein functions. Because GO descriptors can be relatively long and specific, traditional QA systems cannot answer such questions. A new type of QA system, so-called Deep QA which uses machine learning methods trained with curated contents, is thus emerging. Finally, future advances of text mining instruments are directly dependent on the availability of high-quality annotated contents at every curation step. Databases workflows must start recording explicitly all the data they curate and ideally also some of the data they do not curate.
Hahn, P; Dullweber, F; Unglaub, F; Spies, C K
2014-06-01
Searching for relevant publications is becoming more difficult with the increasing number of scientific articles. Text mining as a specific form of computer-based data analysis may be helpful in this context. Two examples illustrate graphically how text analysis programs can highlight relations between authors and find relevant publications on a specific subject. © Georg Thieme Verlag KG Stuttgart · New York.
Mining of Business-Oriented Conversations at a Call Center
NASA Astrophysics Data System (ADS)
Takeuchi, Hironori; Nasukawa, Tetsuya; Watanabe, Hideo
Recently it has become feasible to transcribe textual records from telephone conversations at call centers by using automatic speech recognition. In this research, we extended a text mining system for call summary records and constructed a conversation mining system for the business-oriented conversations at the call center. To acquire useful business insights from the conversational data through the text mining system, it is critical to identify appropriate textual segments and expressions as the viewpoints to focus on. In the analysis of call summary data using a text mining system, some experts defined the viewpoints for the analysis by looking at some sample records and by preparing the dictionaries based on frequent keywords in the sample dataset. However with conversations it is difficult to identify such viewpoints manually and in advance because the target data consists of complete transcripts that are often lengthy and redundant. In this research, we defined a model of the business-oriented conversations and proposed a mining method to identify segments that have impacts on the outcomes of the conversations and can then extract useful expressions in each of these identified segments. In the experiment, we processed the real datasets from a car rental service center and constructed a mining system. With this system, we show the effectiveness of the method based on the defined conversation model.
Recent progress in automatically extracting information from the pharmacogenomic literature
Garten, Yael; Coulet, Adrien; Altman, Russ B
2011-01-01
The biomedical literature holds our understanding of pharmacogenomics, but it is dispersed across many journals. In order to integrate our knowledge, connect important facts across publications and generate new hypotheses we must organize and encode the contents of the literature. By creating databases of structured pharmacogenomic knowledge, we can make the value of the literature much greater than the sum of the individual reports. We can, for example, generate candidate gene lists or interpret surprising hits in genome-wide association studies. Text mining automatically adds structure to the unstructured knowledge embedded in millions of publications, and recent years have seen a surge in work on biomedical text mining, some specific to pharmacogenomics literature. These methods enable extraction of specific types of information and can also provide answers to general, systemic queries. In this article, we describe the main tasks of text mining in the context of pharmacogenomics, summarize recent applications and anticipate the next phase of text mining applications. PMID:21047206
NASA Astrophysics Data System (ADS)
Ma, Kevin C.; Forsyth, Sydney; Amezcua, Lilyana; Liu, Brent J.
2017-03-01
We have designed and developed a multiple sclerosis eFolder system for patient data storage, image viewing, and automatic lesion quantification results to allow patient tracking. The web-based system aims to be integrated in DICOM-compliant clinical and research environments to aid clinicians in patient treatments and data analysis. The system quantifies lesion volumes and identifies and registers lesion locations to track shifts in the volume and quantity of lesions in a longitudinal study. We aim to evaluate the two most important features of the system, data mining and longitudinal lesion tracking, to demonstrate the MS eFolder's capability in improving clinical workflow efficiency and outcome analysis for research. In order to evaluate data mining capabilities, we have collected radiological and neurological data from 72 patients, 36 Caucasian and 36 Hispanic matched by gender, disease duration, and age. Data analysis on those patients based on ethnicity is performed, and analysis results are displayed by the system's web-based user interface. The data mining module is able to successfully separate Hispanic and Caucasian patients and compare their disease profiles. For longitudinal lesion tracking, we have collected 4 longitudinal cases and simulated different lesion growths over the next year. As a result, the eFolder is able to detect changes in lesion volume and identify the lesions with the most changes. Data mining and lesion tracking evaluation results show the high potential of the eFolder's usefulness in patient care and informatics research for multiple sclerosis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Guernsey, J L; Brown, L A; Perry, A O
1978-02-01
This case study examines the reclamation practices of the Georgia Kaolin's American Industrial Clay Company Division, a kaolin producer centered in Twiggs, Washington, and Wilkinson Counties, Georgia. The State of Georgia accounts for more than one-fourth of the world's kaolin production and about three-fourths of U.S. kaolin output. The mining of kaolin in Georgia illustrates the effects of mining and reclaiming lands disturbed by area surface mining. The disturbed areas are reclaimed under the rules and regulations of the Georgia Surface Mining Act of 1968. The natural conditions influencing the reclamation methodologies and techniques are markedly different from those of other mining operations. The environmental disturbances and procedures used in reclaiming the kaolin mined lands are reviewed and implications for planners are noted.
Mining protein function from text using term-based support vector machines
Rice, Simon B; Nenadic, Goran; Stapley, Benjamin J
2005-01-01
Background Text mining has spurred huge interest in the domain of biology. The goal of the BioCreAtIvE exercise was to evaluate the performance of current text mining systems. We participated in Task 2, which addressed assigning Gene Ontology terms to human proteins and selecting relevant evidence from full-text documents. We approached it as a modified form of the document classification task. We used a supervised machine-learning approach (based on support vector machines) to assign protein function and select passages that support the assignments. As classification features, we used a protein's co-occurring terms that were automatically extracted from documents. Results The results evaluated by curators were modest, and quite variable for different problems: in many cases we have relatively good assignment of GO terms to proteins, but the selected supporting text was typically non-relevant (precision spanning from 3% to 50%). The method appears to work best when a substantial set of relevant documents is obtained, while it works poorly on single documents and/or short passages. The initial results suggest that our approach can also mine annotations from text even when an explicit statement relating a protein to a GO term is absent. Conclusion A machine learning approach to mining protein function predictions from text can yield good performance only if sufficient training data is available, and significant amount of supporting data is used for prediction. The most promising results are for combined document retrieval and GO term assignment, which calls for the integration of methods developed in BioCreAtIvE Task 1 and Task 2. PMID:15960835
O'Mara-Eves, Alison; Thomas, James; McNaught, John; Miwa, Makoto; Ananiadou, Sophia
2015-01-14
The large and growing number of published studies, and their increasing rate of publication, makes the task of identifying relevant studies in an unbiased way for inclusion in systematic reviews both complex and time consuming. Text mining has been offered as a potential solution: through automating some of the screening process, reviewer time can be saved. The evidence base around the use of text mining for screening has not yet been pulled together systematically; this systematic review fills that research gap. Focusing mainly on non-technical issues, the review aims to increase awareness of the potential of these technologies and promote further collaborative research between the computer science and systematic review communities. Five research questions led our review: what is the state of the evidence base; how has workload reduction been evaluated; what are the purposes of semi-automation and how effective are they; how have key contextual problems of applying text mining to the systematic review field been addressed; and what challenges to implementation have emerged? We answered these questions using standard systematic review methods: systematic and exhaustive searching, quality-assured data extraction and a narrative synthesis to synthesise findings. The evidence base is active and diverse; there is almost no replication between studies or collaboration between research teams and, whilst it is difficult to establish any overall conclusions about best approaches, it is clear that efficiencies and reductions in workload are potentially achievable. On the whole, most suggested that a saving in workload of between 30% and 70% might be possible, though sometimes the saving in workload is accompanied by the loss of 5% of relevant studies (i.e. a 95% recall). Using text mining to prioritise the order in which items are screened should be considered safe and ready for use in 'live' reviews. 
The use of text mining as a 'second screener' may also be used cautiously. The use of text mining to eliminate studies automatically should be considered promising, but not yet fully proven. In highly technical/clinical areas, it may be used with a high degree of confidence; but more developmental and evaluative work is needed in other disciplines.
Moment tensor inversion of ground motion from mining-induced earthquakes, Trail Mountain, Utah
Fletcher, Joe B.; McGarr, A.
2005-01-01
A seismic network was operated in the vicinity of the Trail Mountain mine, central Utah, from the summer of 2000 to the spring of 2001 to investigate the seismic hazard to a local dam from mining-induced events that we expect to be triggered by future coal mining in this area. In support of efforts to develop ground-motion prediction relations for this situation, we inverted ground-motion recordings for six mining-induced events to determine seismic moment tensors and then to estimate moment magnitudes M for comparison with the network coda magnitudes Mc. Six components of the tensor were determined, for an assumed point source, following the inversion method of McGarr (1992a), which uses key measurements of amplitude from obvious features of the displacement waveforms. When the resulting moment tensors were decomposed into implosive and deviatoric components, we found that four of the six events showed a substantial volume reduction, presumably due to coseismic closure of the adjacent mine openings. For these four events, the volume reduction ranges from 27% to 55% of the shear component (fault area times average slip). Radiated seismic energy, computed from attenuation-corrected body-wave spectra, ranged from 2.4 × 10^5 to 2.4 × 10^6 J for events with M from 1.3 to 1.8, yielding apparent stresses from 0.02 to 0.06 MPa. The energy released for each event, approximated as the product of volume reduction and overburden stress, when compared with the corresponding seismic energies, revealed seismic efficiencies ranging from 0.5% to 7%. The low apparent stresses are consistent with the shallow focal depths of 0.2 to 0.6 km and rupture in a low stress/low strength regime compared with typical earthquake source regions at midcrustal depths.
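The quantities named in this abstract are related by standard seismological definitions, sketched below. The abstract does not state which exact forms the authors used, so these are assumptions: the moment magnitude relation is the usual Hanks–Kanamori form with the scalar moment in N m, and the symbols (M_0 for scalar seismic moment, E_s for radiated energy, μ for rigidity, σ_v for overburden stress, ΔV for coseismic volume change) are notation introduced here.

```latex
% Moment magnitude from scalar seismic moment (M_0 in N m)
M = \tfrac{2}{3}\,\log_{10} M_0 - 6.07

% Apparent stress from radiated seismic energy and moment
\tau_a = \mu\,\frac{E_s}{M_0}

% Seismic efficiency: radiated energy over energy released,
% with the released energy approximated as overburden stress
% times the magnitude of the volume reduction
\eta = \frac{E_s}{\sigma_v\,\lvert \Delta V \rvert}
```

The third relation mirrors the abstract's description of released energy as the product of volume reduction and overburden stress.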
Conceptual biology, hypothesis discovery, and text mining: Swanson's legacy.
Bekhuis, Tanja
2006-04-03
Innovative biomedical librarians and information specialists who want to expand their roles as expert searchers need to know about profound changes in biology and parallel trends in text mining. In recent years, conceptual biology has emerged as a complement to empirical biology. This is partly in response to the availability of massive digital resources such as the network of databases for molecular biologists at the National Center for Biotechnology Information. Developments in text mining and hypothesis discovery systems based on the early work of Swanson, a mathematician and information scientist, are coincident with the emergence of conceptual biology. Very little has been written to introduce biomedical digital librarians to these new trends. In this paper, background for data and text mining, as well as for knowledge discovery in databases (KDD) and in text (KDT) is presented, then a brief review of Swanson's ideas, followed by a discussion of recent approaches to hypothesis discovery and testing. 'Testing' in the context of text mining involves partially automated methods for finding evidence in the literature to support hypothetical relationships. Concluding remarks follow regarding (a) the limits of current strategies for evaluation of hypothesis discovery systems and (b) the role of literature-based discovery in concert with empirical research. Report of an informatics-driven literature review for biomarkers of systemic lupus erythematosus is mentioned. Swanson's vision of the hidden value in the literature of science and, by extension, in biomedical digital databases, is still remarkably generative for information scientists, biologists, and physicians.
Jeff Skousen; Carl Zipper; Jim Burger; Christopher Barton; Patrick Angel
2017-01-01
The Forestry Reclamation Approach (FRA), a method for reclaiming coal-mined land to forest (Chapter 2, this volume), is based on research, knowledge, and experience of forest soil scientists and reclamation practitioners. Step 1 of the FRA is to create a suitable rooting medium for good tree growth that is no less than 4 feet deep and consists of topsoil, weathered...
Recent Advances and Emerging Applications in Text and Data Mining for Biomedical Discovery.
Gonzalez, Graciela H; Tahsin, Tasnia; Goodale, Britton C; Greene, Anna C; Greene, Casey S
2016-01-01
Precision medicine will revolutionize the way we treat and prevent disease. A major barrier to the implementation of precision medicine that clinicians and translational scientists face is understanding the underlying mechanisms of disease. We are starting to address this challenge through automatic approaches for information extraction, representation and analysis. Recent advances in text and data mining have been applied to a broad spectrum of key biomedical questions in genomics, pharmacogenomics and other fields. We present an overview of the fundamental methods for text and data mining, as well as recent advances and emerging applications toward precision medicine. © The Author 2015. Published by Oxford University Press.
Recent Advances and Emerging Applications in Text and Data Mining for Biomedical Discovery
Gonzalez, Graciela H.; Tahsin, Tasnia; Goodale, Britton C.; Greene, Anna C.
2016-01-01
Precision medicine will revolutionize the way we treat and prevent disease. A major barrier to the implementation of precision medicine that clinicians and translational scientists face is understanding the underlying mechanisms of disease. We are starting to address this challenge through automatic approaches for information extraction, representation and analysis. Recent advances in text and data mining have been applied to a broad spectrum of key biomedical questions in genomics, pharmacogenomics and other fields. We present an overview of the fundamental methods for text and data mining, as well as recent advances and emerging applications toward precision medicine. PMID:26420781
Text Mining the History of Medicine.
Thompson, Paul; Batista-Navarro, Riza Theresa; Kontonatsios, Georgios; Carter, Jacob; Toon, Elizabeth; McNaught, John; Timmermann, Carsten; Worboys, Michael; Ananiadou, Sophia
2016-01-01
Historical text archives constitute a rich and diverse source of information, which is becoming increasingly readily accessible, due to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data in an efficient manner. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestions of synonyms of user-entered query terms, exploration of different concepts mentioned within search results or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, owing to differences and evolutions in vocabulary, terminology, language structure and style, compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid-19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible semantically-oriented search system.
The novel resources are available for research purposes, while the processing pipeline and its modules may be used and configured within the Argo TM platform.
Text Mining the History of Medicine
Thompson, Paul; Batista-Navarro, Riza Theresa; Kontonatsios, Georgios; Carter, Jacob; Toon, Elizabeth; McNaught, John; Timmermann, Carsten; Worboys, Michael; Ananiadou, Sophia
2016-01-01
Historical text archives constitute a rich and diverse source of information, which is becoming increasingly readily accessible, due to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data in an efficient manner. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestions of synonyms of user-entered query terms, exploration of different concepts mentioned within search results or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, owing to differences and evolutions in vocabulary, terminology, language structure and style, compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid-19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible semantically-oriented search system.
The novel resources are available for research purposes, while the processing pipeline and its modules may be used and configured within the Argo TM platform. PMID:26734936
Application of text mining for customer evaluations in commercial banking
NASA Astrophysics Data System (ADS)
Tan, Jing; Du, Xiaojiang; Hao, Pengpeng; Wang, Yanbo J.
2015-07-01
Customer attrition is an increasingly serious problem for commercial banks. To combat this problem comprehensively, mining customer evaluation texts is as important as mining structured customer data. Textual Feature Selection, Classification and Association Rule Mining are necessary techniques for extracting hidden information from customer evaluations. This paper presents all three techniques, using Chinese Word Segmentation, C5.0 and Apriori, and reports a set of experiments run on a collection of real textual data comprising 823 customer evaluations taken from a Chinese commercial bank. Results, consequent solutions, and some advice for the commercial bank are given in this paper.
Extracting semantically enriched events from biomedical literature
2012-01-01
Background: Research into event-based text mining from the biomedical literature has been growing in popularity to facilitate the development of advanced biomedical text mining systems. Such technology permits advanced search, which goes beyond document or sentence-based retrieval. However, existing event-based systems typically ignore additional information within the textual context of events that can determine, amongst other things, whether an event represents a fact, hypothesis, experimental result or analysis of results, whether it describes new or previously reported knowledge, and whether it is speculated or negated. We refer to such contextual information as meta-knowledge. The automatic recognition of such information can permit the training of systems allowing finer-grained searching of events according to the meta-knowledge that is associated with them. Results: Based on a corpus of 1,000 MEDLINE abstracts, fully manually annotated with both events and associated meta-knowledge, we have constructed a machine learning-based system that automatically assigns meta-knowledge information to events. This system has been integrated into EventMine, a state-of-the-art event extraction system, in order to create a more advanced system (EventMine-MK) that not only extracts events from text automatically, but also assigns five different types of meta-knowledge to these events. The meta-knowledge assignment module of EventMine-MK performs with macro-averaged F-scores in the range of 57-87% on the BioNLP’09 Shared Task corpus. EventMine-MK has been evaluated on the BioNLP’09 Shared Task subtask of detecting negated and speculated events. Our results show that EventMine-MK can outperform other state-of-the-art systems that participated in this task. Conclusions: We have constructed the first practical system that extracts both events and associated, detailed meta-knowledge information from biomedical literature.
The automatically assigned meta-knowledge information can be used to refine search systems, in order to provide an extra search layer beyond entities and assertions, dealing with phenomena such as rhetorical intent, speculations, contradictions and negations. This finer grained search functionality can assist in several important tasks, e.g., database curation (by locating new experimental knowledge) and pathway enrichment (by providing information for inference). To allow easy integration into text mining systems, EventMine-MK is provided as a UIMA component that can be used in the interoperable text mining infrastructure, U-Compare. PMID:22621266
Extracting semantically enriched events from biomedical literature.
Miwa, Makoto; Thompson, Paul; McNaught, John; Kell, Douglas B; Ananiadou, Sophia
2012-05-23
Research into event-based text mining from the biomedical literature has been growing in popularity to facilitate the development of advanced biomedical text mining systems. Such technology permits advanced search, which goes beyond document or sentence-based retrieval. However, existing event-based systems typically ignore additional information within the textual context of events that can determine, amongst other things, whether an event represents a fact, hypothesis, experimental result or analysis of results, whether it describes new or previously reported knowledge, and whether it is speculated or negated. We refer to such contextual information as meta-knowledge. The automatic recognition of such information can permit the training of systems allowing finer-grained searching of events according to the meta-knowledge that is associated with them. Based on a corpus of 1,000 MEDLINE abstracts, fully manually annotated with both events and associated meta-knowledge, we have constructed a machine learning-based system that automatically assigns meta-knowledge information to events. This system has been integrated into EventMine, a state-of-the-art event extraction system, in order to create a more advanced system (EventMine-MK) that not only extracts events from text automatically, but also assigns five different types of meta-knowledge to these events. The meta-knowledge assignment module of EventMine-MK performs with macro-averaged F-scores in the range of 57-87% on the BioNLP'09 Shared Task corpus. EventMine-MK has been evaluated on the BioNLP'09 Shared Task subtask of detecting negated and speculated events. Our results show that EventMine-MK can outperform other state-of-the-art systems that participated in this task. We have constructed the first practical system that extracts both events and associated, detailed meta-knowledge information from biomedical literature. 
The automatically assigned meta-knowledge information can be used to refine search systems, in order to provide an extra search layer beyond entities and assertions, dealing with phenomena such as rhetorical intent, speculations, contradictions and negations. This finer grained search functionality can assist in several important tasks, e.g., database curation (by locating new experimental knowledge) and pathway enrichment (by providing information for inference). To allow easy integration into text mining systems, EventMine-MK is provided as a UIMA component that can be used in the interoperable text mining infrastructure, U-Compare.
Lunar site characterization and mining
NASA Technical Reports Server (NTRS)
Glass, Charles E.
1992-01-01
Lunar mining requirements do not appear to be excessively demanding in terms of volume of material processed. It seems clear, however, that the labor-intensive practices that characterize terrestrial mining will not suffice at the low-gravity, hard-vacuum, and inaccessible sites on the Moon. New research efforts are needed in three important areas: (1) to develop high-speed, high-resolution through-rock vision systems that will permit more detailed and efficient mine site investigation and characterization; (2) to investigate the impact of lunar conditions on our ability to convert conventional mining and exploration equipment to lunar prototypes; and (3) to develop telerobotic or fully robotic mining systems for operations on the Moon and other bodies in the inner solar system. Other aspects of lunar site characterization and mining are discussed.
Individual Profiling Using Text Analysis
2016-04-15
Mining a Text for Errors. . . . on Knowledge discovery in data mining, pages 624–628, 2005. [12] Michal Kosinski, David Stillwell, and Thore Graepel...AFRL-AFOSR-UK-TR-2016-0011 Individual Profiling using Text Analysis 140333 Mark Stevenson UNIVERSITY OF SHEFFIELD, DEPARTMENT OF PSYCHOLOGY Final...REPORT TYPE Final 3. DATES COVERED (From - To) 15 Sep 2014 to 14 Sep 2015 4. TITLE AND SUBTITLE Individual Profiling using Text Analysis
Mining the pharmacogenomics literature—a survey of the state of the art
Cohen, K. Bretonnel; Garten, Yael; Shah, Nigam H.
2012-01-01
This article surveys efforts on text mining of the pharmacogenomics literature, mainly from the period 2008 to 2011. Pharmacogenomics (or pharmacogenetics) is the field that studies how human genetic variation impacts drug response. Therefore, publications span the intersection of research in genotypes, phenotypes and pharmacology, a topic that has increasingly become a focus of active research in recent years. This survey covers efforts dealing with the automatic recognition of relevant named entities (e.g. genes, gene variants and proteins, diseases and other pathological phenomena, drugs and other chemicals relevant for medical treatment), as well as various forms of relations between them. A wide range of text genres is considered, such as scientific publications (abstracts, as well as full texts), patent texts and clinical narratives. We also discuss infrastructure and resources needed for advanced text analytics, e.g. document corpora annotated with corresponding semantic metadata (gold standards and training data), biomedical terminologies and ontologies providing domain-specific background knowledge at different levels of formality and specificity, software architectures for building complex and scalable text analytics pipelines and Web services grounded to them, as well as comprehensive ways to disseminate and interact with the typically huge amounts of semiformal knowledge structures extracted by text mining tools. Finally, we consider some of the novel applications that have already been developed in the field of pharmacogenomic text mining and point out perspectives for future research. PMID:22833496
Mining the pharmacogenomics literature--a survey of the state of the art.
Hahn, Udo; Cohen, K Bretonnel; Garten, Yael; Shah, Nigam H
2012-07-01
This article surveys efforts on text mining of the pharmacogenomics literature, mainly from the period 2008 to 2011. Pharmacogenomics (or pharmacogenetics) is the field that studies how human genetic variation impacts drug response. Therefore, publications span the intersection of research in genotypes, phenotypes and pharmacology, a topic that has increasingly become a focus of active research in recent years. This survey covers efforts dealing with the automatic recognition of relevant named entities (e.g. genes, gene variants and proteins, diseases and other pathological phenomena, drugs and other chemicals relevant for medical treatment), as well as various forms of relations between them. A wide range of text genres is considered, such as scientific publications (abstracts, as well as full texts), patent texts and clinical narratives. We also discuss infrastructure and resources needed for advanced text analytics, e.g. document corpora annotated with corresponding semantic metadata (gold standards and training data), biomedical terminologies and ontologies providing domain-specific background knowledge at different levels of formality and specificity, software architectures for building complex and scalable text analytics pipelines and Web services grounded to them, as well as comprehensive ways to disseminate and interact with the typically huge amounts of semiformal knowledge structures extracted by text mining tools. Finally, we consider some of the novel applications that have already been developed in the field of pharmacogenomic text mining and point out perspectives for future research.
Using Text Mining to Characterize Online Discussion Facilitation
ERIC Educational Resources Information Center
Ming, Norma; Baumer, Eric
2011-01-01
Facilitating class discussions effectively is a critical yet challenging component of instruction, particularly in online environments where student and faculty interaction is limited. Our goals in this research were to identify facilitation strategies that encourage productive discussion, and to explore text mining techniques that can help…
The use of Data Mining in the categorization of patients with Azoospermia.
Mikos, Themistoklis; Maglaveras, Nikolaos; Pantazis, Konstantinos; Goulis, Dimitrios G; Bontis, John N; Papadimas, John
2005-01-01
Data Mining is a relatively new field of Medical Informatics. The aim of this study was to compare Data Mining diagnosis with clinical diagnosis by applying a Data Miner (DM) to a clinical dataset of infertile men with azoospermia. One hundred and forty-seven azoospermic men were clinically classified into four groups: a) obstructive azoospermia (n=63), b) non-obstructive azoospermia (n=71), c) hypergonadotropic hypogonadism (n=2), and d) hypogonadotropic hypogonadism (n=11). The DM (IBM's DB2/Intelligent Miner for Data 6.1) was asked to reproduce a four-cluster model. DM formed four groups of patients: a) eugonadal men with normal testicular volume and normal FSH levels (n=86), b) eugonadal men with significantly reduced testicular volume (median 6.5 cm³) and very high FSH levels (n=29), c) eugonadal men with moderately reduced testicular volume (median 14.5 cm³) and raised FSH levels (n=20), and d) hypogonadal men (n=12). Overall DM concordance rate in hypogonadal men was 92%, in obstructive azoospermia 73%, and in non-obstructive azoospermia 69%. Data Mining produces clinically meaningful results but different from those of the clinical diagnosis. It is possible that the use of large sets of structured and formalised data and continuous evaluation of DM results will generate a useful methodology for the Clinician.
40 CFR 372.23 - SIC and NAICS codes to which this Part applies.
Code of Federal Regulations, 2010 CFR
2010-07-01
... facilities primarily engaged in reproducing text, drawings, plans, maps, or other copy, by blueprinting...)); 212324 Kaolin and Ball Clay Mining Limited to facilities operating without a mine or quarry and that are...)); 212393 Other Chemical and Fertilizer Mineral Mining Limited to facilities operating without a mine or quarry...
Open-source tools for data mining.
Zupan, Blaz; Demsar, Janez
2008-03-01
With a growing volume of biomedical databases and repositories, the need to develop a set of tools to address their analysis and support knowledge discovery is becoming acute. The data mining community has developed a substantial set of techniques for computational treatment of these data. In this article, we discuss the evolution of open-source toolboxes that data mining researchers and enthusiasts have developed over the span of a few decades and review several currently available open-source data mining suites. The approaches we review are diverse in data mining methods and user interfaces and also demonstrate that the field and its tools are ready to be fully exploited in biomedical research.
Russian thistle for soil mulch in coal mine reclamation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Day, A.D.; Tucker, T.C.; Thames, J.L.
1979-01-01
The effectiveness of Russian thistle mulch in reducing soil moisture loss from coal mine soil was gauged and compared with the effectiveness of barley straw mulch. The decrease in soil moisture loss after mulch addition was greater in a low temperature, high humidity environment. Russian thistle mulch was as effective as barley straw in reducing soil moisture loss in Red Mesa loam, unmined soil, and coal mine soil. Because Russian thistle can be grown on mine spoils and has a higher organic volume than barley straw mulch has, treatment of mine soil with thistle will improve soil characteristics and plant growth. (14 references, 1 table)
pubmed.mineR: an R package with text-mining algorithms to analyse PubMed abstracts.
Rani, Jyoti; Shah, A B Rauf; Ramachandran, Srinivasan
2015-10-01
The PubMed literature database is a valuable source of information for scientific research. It is rich in biomedical literature with more than 24 million citations. Data-mining of voluminous literature is a challenging task. Although several text-mining algorithms have been developed in recent years with a focus on data visualization, they have limitations: they are slow, rigid and not available as open source. We have developed an R package, pubmed.mineR, wherein we have combined the advantages of existing algorithms, overcome their limitations, and offer user flexibility and links with other packages in Bioconductor and the Comprehensive R Archive Network (CRAN) in order to expand the user capabilities for executing multifaceted approaches. Three case studies are presented, namely, 'Evolving role of diabetes educators', 'Cancer risk assessment' and 'Dynamic concepts on disease and comorbidity' to illustrate the use of pubmed.mineR. The package generally runs fast, with small elapsed times on regular workstations even on large corpus sizes and with compute-intensive functions. The pubmed.mineR package is available at http://cran.r-project.org/web/packages/pubmed.mineR.
30 CFR 282.29 - Reports and records.
Code of Federal Regulations, 2011 CFR
2011-07-01
... Mineral Resources BUREAU OF OCEAN ENERGY MANAGEMENT, REGULATION, AND ENFORCEMENT, DEPARTMENT OF THE... mining and processing activity; the number of days operations were conducted; the identity, quantity... mining. All excavations shall be shown in such manner that the volume of OCS minerals produced during a...
Chen, Chou-Cheng; Ho, Chung-Liang
2014-01-01
While a huge amount of information about biological literature can be obtained by searching the PubMed database, reading through all the titles and abstracts resulting from such a search for useful information is inefficient. Text mining makes it possible to increase this efficiency. Some websites use text mining to gather information from the PubMed database; however, they are database-oriented, using pre-defined search keywords while lacking a query interface for user-defined search inputs. We present the PubMed Abstract Reading Helper (PubstractHelper) website, which combines text mining and reading assistance for an efficient PubMed search. PubstractHelper can accept a maximum of ten groups of keywords, with each group containing up to ten keywords. The principle behind the text-mining function of PubstractHelper is that keywords contained in the same sentence are likely to be related. PubstractHelper highlights sentences with co-occurring keywords in different colors. The user can download the PMID and the abstracts with color markings to be reviewed later. The PubstractHelper website can help users to identify relevant publications based on the presence of related keywords, which should be a handy tool for their research. http://bio.yungyun.com.tw/ATM/PubstractHelper.aspx and http://holab.med.ncku.edu.tw/ATM/PubstractHelper.aspx.
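The sentence-level co-occurrence principle described in this abstract can be illustrated with a minimal Python sketch. It is not PubstractHelper's implementation: the function names, the naive sentence splitter, the sample abstract, and the rule that a sentence qualifies when it matches keywords from at least two groups are all assumptions made for illustration.

```python
import re


def split_sentences(text):
    """Naively split text into sentences on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cooccurring_sentences(abstract, keyword_groups):
    """Return (sentence, matched_groups) pairs for sentences that contain
    keywords from at least two different groups (an assumed rule)."""
    hits = []
    for sentence in split_sentences(abstract):
        lowered = sentence.lower()
        matched = [
            name
            for name, keywords in keyword_groups.items()
            # Whole-word, case-insensitive match of any keyword in the group.
            if any(
                re.search(r"\b" + re.escape(k.lower()) + r"\b", lowered)
                for k in keywords
            )
        ]
        if len(matched) >= 2:
            hits.append((sentence, matched))
    return hits


# Hypothetical abstract and keyword groups, invented for this example.
abstract = (
    "TP53 mutations are frequent in lung cancer. "
    "The cohort was recruited in 2010. "
    "EGFR status did not predict survival."
)
groups = {"gene": ["TP53", "EGFR"], "disease": ["lung cancer"]}

for sentence, matched in cooccurring_sentences(abstract, groups):
    print(matched, "->", sentence)
```

A real tool would need a more robust sentence splitter, since abbreviations such as "e.g." would break this one, and would map each matched group to a highlight color rather than printing it.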
Text Classification for Organizational Researchers
Kobayashi, Vladimer B.; Mol, Stefan T.; Berkers, Hannah A.; Kismihók, Gábor; Den Hartog, Deanne N.
2017-01-01
Organizations are increasingly interested in classifying texts or parts thereof into categories, as this enables more effective use of their information. Manual procedures for text classification work well for up to a few hundred documents. However, when the number of documents is larger, manual procedures become laborious, time-consuming, and potentially unreliable. Techniques from text mining facilitate the automatic assignment of text strings to categories, making classification expedient, fast, and reliable, which creates potential for its application in organizational research. The purpose of this article is to familiarize organizational researchers with text mining techniques from machine learning and statistics. We describe the text classification process in several roughly sequential steps, namely training data preparation, preprocessing, transformation, application of classification techniques, and validation, and provide concrete recommendations at each step. To help researchers develop their own text classifiers, the R code associated with each step is presented in a tutorial. The tutorial draws from our own work on job vacancy mining. We end the article by discussing how researchers can validate a text classification model and the associated output. PMID:29881249
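The roughly sequential steps named in this abstract (training data preparation, preprocessing, transformation, classification, validation) can be sketched in a few lines. The article's own tutorial is in R; the Python below is only an illustrative stand-in, and the toy job-vacancy sentences, the category labels, and the choice of a multinomial naive Bayes classifier with add-one smoothing are assumptions, not the authors' method.

```python
import math
import re
from collections import Counter, defaultdict


def preprocess(text):
    """Preprocessing: lowercase and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_texts):
    """Transformation and training: bag-of-words counts per category."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in labeled_texts:
        doc_counts[label] += 1
        word_counts[label].update(preprocess(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Classification: multinomial naive Bayes with add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in preprocess(text):
            if token in vocab:  # tokens unseen in training are ignored
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Training data preparation: hypothetical labeled sentences from job vacancies.
training = [
    ("must hold a degree in statistics", "requirement"),
    ("requires five years of programming experience", "requirement"),
    ("you will analyse survey data for clients", "task"),
    ("you will write reports and present findings", "task"),
]
# Validation: a held-out set that was not used for training.
held_out = [
    ("requires a degree in psychology", "requirement"),
    ("you will present data to clients", "task"),
]

model = train(training)
correct = sum(classify(text, model) == label for text, label in held_out)
print(f"held-out accuracy: {correct}/{len(held_out)}")
```

With realistic data the validation step would use a much larger held-out set or cross-validation, and would report per-category precision and recall rather than a single accuracy figure.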
Graphics-based intelligent search and abstracting using Data Modeling
NASA Astrophysics Data System (ADS)
Jaenisch, Holger M.; Handley, James W.; Case, Carl T.; Songy, Claude G.
2002-11-01
This paper presents an autonomous text and context-mining algorithm that converts text documents into point clouds for visual search cues. This algorithm is applied to the task of data-mining a scriptural database comprised of the Old and New Testaments from the Bible and the Book of Mormon, Doctrine and Covenants, and the Pearl of Great Price. Results are generated which graphically show the scripture that represents the average concept of the database and the mining of the documents down to the verse level.
The Labour Welfare Fund Laws (Amendment) Act, 1987 (No. 15 of 1987), 22 May 1987.
1987-01-01
This Act authorizes funds constituted under the Mica Mines Labour Welfare Fund Act, 1946, the Limestone and Dolomite Mines Labour Welfare Fund Act, 1972, the Iron Ore Mines, Manganese Ore Mines and Chrome Mines Labour Welfare Fund Act, 1976, and the Beedi Workers Welfare Fund Act, 1976, to be applied for the provision of family welfare, including family planning education and services.
Optimising dewatering costs on a South African gold mine
NASA Astrophysics Data System (ADS)
Connelly, R. J.; Ward, A. D.
1987-06-01
Many South African gold mines are geologically in proximity to the Transvaal Dolomites. This geological unit is karstic in many areas and is very extensive. Very large volumes of ground water can be found in the dolomites, and have given rise to major dewatering problems on the mines. Hitherto, the general philosophy on the mines has been to accept these large inflows into the mine, and then to pump out from underground at a suitably convenient level. The dolomites constitute a ground water control area, which means that Government permission is required to do anything with ground water within the dolomite. When the first major inflows occurred, the mines started dewatering the dolomites, and in many areas induced sinkholes, with significant loss of life and buildings. The net result is that mines have to pump large quantities of water out of the mine but recharge into the dolomite to maintain water levels. During the past 2 years a number of investigations have been carried out to reduce the very high costs of dewatering. On one mine the cost of removing 130×10³ m³/day is about 1×10⁶ Rand/month. The hydrogeologic model for the dolomites is now reasonably well understood. It shows that surface wells to a depth of up to 150 m can withdraw significant quantities of water and reduce the amount that has to be pumped from considerable depth, with significant saving in pumping costs. Such a system has a number of additional advantages, such as removing some of the large volume of water from the underground working environment and providing a system that can be used for controlled surface dewatering should it be required.
NASA Astrophysics Data System (ADS)
Rytuba, J. J.
2015-12-01
An increase in intensity and frequency of extreme events resulting from climate change is expected to result in extreme precipitation events on both regional and local scales. Extreme precipitation events have the potential to mobilize large volumes of mercury (Hg) mine tailings in watersheds where tailings reside in the floodplain downstream from historic Hg mines. The California Hg mineral belt produced one third of the world's Hg from over 100 mines from the 1850s to 1972. In the absence of environmental regulations, tailings were disposed of into streams adjacent to the mines in order to have them transported from the mine site during storm events. Thus most of the tailings no longer reside at the mine site. Addition of tailings to the streams resulted in stream aggradation, increased over-bank flow, and deposition of tailings in the floodplain for up to 25 km downstream from the mines. After cessation of mining, the decrease in tailings entering the streams resulted in degradation, incision of the streams into the floodplain, and inability of the streams to access the floodplain. Thus Hg tailings have remained stored in the floodplain since cessation of mining. Hg phases in these tailings consist of cinnabar, metacinnabar and montroydite based on EXAFS analysis. Size analysis indicates that Hg phases are fine grained, less than 1 µm. The last regional scale extreme precipitation events to affect the entire area of the California Hg mineral belt were the ARkStorm events of 1861-1862 that occurred prior to large scale Hg mining. Extreme regional ARkStorm precipitation events as well as local summer storms, such as the July 2006 flood in the Clear Creek Hg mining district, are expected to increase in frequency and have the potential to remobilize the large volume of tailings stored in floodplain deposits.
Although Hg mine remediation has decreased Hg release from mine sites in a period of benign climate, no remediation efforts have addressed the large source of Hg residing in floodplain deposits. This Hg source in a period of climate change poses a significant environmental risk to aquatic systems downstream from Hg mine-impacted watersheds. An extreme ARkStorm event is estimated to potentially remobilize an amount of Hg equivalent to that released in the past during the peak period of unregulated Hg mining in California.
Mining Tasks from the Web Anchor Text Graph: MSR Notebook Paper for the TREC 2015 Tasks Track
2015-11-20
Mining Tasks from the Web Anchor Text Graph: MSR Notebook Paper for the TREC 2015 Tasks Track Paul N. Bennett Microsoft Research Redmond, USA pauben...anchor text graph has proven useful in the general realm of query reformulation [2], we sought to quantify the value of extracting key phrases from...anchor text in the broader setting of the task understanding track. Given a query, our approach considers a simple method for identifying a relevant
Supply Chain Modeling for Fluorspar and Hydrofluoric Acid and Implications for Further Analyses
2015-04-01
Critical Materials, Volume 1 Chapter 2. Fluorspar-HF Supply Chain 4 Foreign Supply Other uses US Supply Fluorspar Mining HF Production Downstream...analysis are listed across the top: Fluorspar Mining, HF Production, and (production of) Downstream Products (using HF). • U.S. supply is represented by...material flows from fluorspar mining, to HF production, to downstream fluorine-containing products. – Black lines are material flows included in the supply
ERIC Educational Resources Information Center
Wang, Yinying; Bowers, Alex J.; Fikis, David J.
2017-01-01
Purpose: The purpose of this study is to describe the underlying topics and the topic evolution in the 50-year history of educational leadership research literature. Method: We used automated text data mining with probabilistic latent topic models to examine the full text of the entire publication history of all 1,539 articles published in…
OntoMate: a text-mining tool aiding curation at the Rat Genome Database
Liu, Weisong; Laulederkind, Stanley J. F.; Hayman, G. Thomas; Wang, Shur-Jen; Nigam, Rajni; Smith, Jennifer R.; De Pons, Jeff; Dwinell, Melinda R.; Shimoyama, Mary
2015-01-01
The Rat Genome Database (RGD) is the premier repository of rat genomic, genetic and physiologic data. Converting data from free text in the scientific literature to a structured format is one of the main tasks of all model organism databases. RGD spends considerable effort manually curating gene, Quantitative Trait Locus (QTL) and strain information. The rapidly growing volume of biomedical literature and the active research in the biological natural language processing (bioNLP) community have given RGD the impetus to adopt text-mining tools to improve curation efficiency. Recently, RGD has initiated a project to use OntoMate, an ontology-driven, concept-based literature search engine developed at RGD, as a replacement for the PubMed (http://www.ncbi.nlm.nih.gov/pubmed) search engine in the gene curation workflow. OntoMate tags abstracts with gene names, gene mutations, organism name and most of the 16 ontologies/vocabularies used at RGD. All terms/ entities tagged to an abstract are listed with the abstract in the search results. All listed terms are linked both to data entry boxes and a term browser in the curation tool. OntoMate also provides user-activated filters for species, date and other parameters relevant to the literature search. Using the system for literature search and import has streamlined the process compared to using PubMed. The system was built with a scalable and open architecture, including features specifically designed to accelerate the RGD gene curation process. With the use of bioNLP tools, RGD has added more automation to its curation workflow. Database URL: http://rgd.mcw.edu PMID:25619558
Morganwalp, David W.; Buxton, Herbert T.
1999-01-01
This report contains papers presented at the seventh Technical Meeting of the U.S. Geological Survey (USGS), Toxic Substances Hydrology (Toxics) Program. The meeting was held March 8-12, 1999, in Charleston, South Carolina. Toxics Program Technical Meetings are held periodically to provide a forum for presentation and discussion of results of recent research activities. The objectives of these meetings are to: present recent research results to essential stakeholders; encourage synthesis and integrated interpretations among scientists with different expertise who are working on a contamination issue; and promote exchange of ideas among scientists working on different projects and issues within the Toxics Program. The Proceedings is published in three volumes. Volume 1 contains papers that report on results of research on contamination from hard-rock mining. Results include research on contamination from hard rock mining in arid southwest alluvial basins, research on hard rock mining in mountainous terrain, and progress from the USGS Abandoned Mine Lands Initiative. This Initiative is designed to develop a watershed-based approach to characterize and remediate contamination from abandoned mine lands and transfer technologies to Federal land management agencies and stakeholders. Volume 2 contains papers on contamination of hydrologic systems and related ecosystems. The papers discuss research on the response of estuarine ecosystems to contamination from human activities. They include research on San Francisco Bay; mercury contamination of aquatic ecosystems; and investigation of the occurrence, distribution, and fate of agricultural chemicals in the Mississippi River Basin. This volume also contains results on development and reconnaissance testing of new methods to detect emerging contaminants in environmental samples. Volume 3 contains papers on subsurface contamination from point sources.
The papers discuss research on: hydrocarbons and fuel oxygenates at gasoline release sites; ground-water contamination by crude oil; complex contaminant mixtures from treated wastewater discharges; waste disposal and subsurface transport of contaminants in arid environments; ground water and surface water affected by municipal landfill leachate; natural attenuation of chlorinated solvents; and characterizing flow and transport in fractured rock aquifers. In all, the more than 175 papers contained in this proceedings reflect the contributions of more than 350 scientists who are co-authors. These scientists are from across the USGS, as well as from universities, other Federal and State agencies, and industry.
A sentence sliding window approach to extract protein annotations from biomedical articles
Krallinger, Martin; Padron, Maria; Valencia, Alfonso
2005-01-01
Background: Within the emerging field of text mining and statistical natural language processing (NLP) applied to biomedical articles, a broad variety of techniques have been developed during the past years. Nevertheless, there is still a great need for comparative assessment of the performance of the proposed methods and for the development of common evaluation criteria. This issue was addressed by the Critical Assessment of Text Mining Methods in Molecular Biology (BioCreative) contest. The aim of this contest was to assess the performance of text mining systems applied to biomedical texts, including tools which recognize named entities such as genes and proteins, and tools which automatically extract protein annotations. Results: The "sentence sliding window" approach proposed here was found to efficiently extract text fragments from full text articles containing annotations on proteins, providing the highest number of correctly predicted annotations. Moreover, the number of correct extractions of individual entities (i.e. proteins and GO terms) involved in the relationships used for the annotations was significantly higher than the correct extractions of the complete annotations (protein-function relations). Conclusion: We explored the use of averaging sentence sliding windows for information extraction, especially in a context where conventional training data is unavailable. The combination of our approach with more refined statistical estimators and machine learning techniques might be a way to improve annotation extraction for future biomedical text mining applications. PMID:15960831
Text Mining Improves Prediction of Protein Functional Sites
Cohn, Judith D.; Ravikumar, Komandur E.
2012-01-01
We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites). The structure analysis was carried out using Dynamics Perturbation Analysis (DPA), which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining extracts mentions of residues in the literature, and predicts that residues mentioned are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites (specifically, small-molecule binding sites and catalytic sites) in about 100,000 publicly available protein structures. The DPA predictions recapitulated many of the functional site annotations and preferentially recovered binding sites annotated as biologically relevant vs. those annotated as potentially spurious. The text-based predictions were also substantially supported by the functional site annotations: compared to other residues, residues mentioned in text were roughly six times more likely to be found in a functional site. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues that were not catalogued in the curated data sources we inspected. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions. PMID:22393388
NASA Astrophysics Data System (ADS)
Rutherfurd, I.; Davies, P.; Macklin, M. G.; Grove, J. R.
2016-12-01
Coarse and fine sediment has been a major pollutant of Australian rivers and receiving waters since European settlement in 1788. Anthropogenic sediment budget models demonstrate that catchment and channel erosion has increased background sediment delivery by 10 to 20 times across SE Australia, but these estimates ignore the contribution of historical gold mining. Detailed historical records allow us to reconstruct the delivery of coarse and fine sediment (including contaminated sediment) to the fluvial system. Between 1851 and 1900 alluvial gold mining in the state of Victoria liberated between 1.2 billion and 1.4 billion m³ of coarse and fine sediment into streams. Catchment scale modelling demonstrates that this volume is at least twice the volume of all anthropogenic (post-European) erosion from hillslopes, river banks, and gullies. We map the deposition and remobilization of these contaminated legacy mining sediments down selected valleys, and find that many contemporary floodplains are blanketed with mining sediments (mercury contamination is present but low), and that discrete sediment slugs can be recognized migrating down river beds. Overall, the impact of gold mining is one of the strongest indicators of the Anthropocene in the Australian landscape, and the level of impact on rivers is substantially greater than recognized in the past. Perhaps of most interest is the rapid recovery of many river systems from the substantial impacts of gold mining. The result is that these major changes to the landscape are largely forgotten.
NASA Astrophysics Data System (ADS)
Fuksa, Dariusz; Trzaskuś-Żak, Beata; Gałaś, Zdzisław; Utrata, Arkadiusz
2017-03-01
The vast majority of mining companies produce more than one product. In their case, break-even analysis, also referred to as CVP (Cost-Volume-Profit) analysis (Wilkinson, 2005; Czopek, 2003), is considerably more difficult, because it must account for a multi-assortment production structure; open-pit mines, for example, may offer more than 20 assortments (depending on grain size). The article presents methods of evaluating the break-even point (in volume and in value) for both single-assortment and multi-assortment production. The complexity of break-even evaluation for multi-assortment production has given rise to many methods and to various analytical approaches, which differ especially in how fixed costs are accounted for: they may be allocated entirely among particular assortments, assigned to the company as a whole, or partly allocated among assortments and partly assigned to the company as a whole. The chosen break-even methods were evaluated on two examples of mining companies: an open-pit mine of rock materials and an underground hard coal mine. The selection of methods was determined by the data made available by the companies. The data for the analysis come from internal documentation of the mines: financial statements, breakdowns and cost calculations.
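As a rough illustration of the break-even calculations this abstract refers to, the following Python sketch computes a single-assortment break-even volume and a multi-assortment break-even under two simplifying assumptions that are not taken from the article: the sales mix is constant, and all fixed costs are assigned to the company as a whole. The function names and all figures are hypothetical.

```python
def break_even_volume(fixed_costs, unit_price, unit_variable_cost):
    """Single-assortment break-even volume: fixed costs / unit contribution margin."""
    margin = unit_price - unit_variable_cost
    if margin <= 0:
        raise ValueError("unit contribution margin must be positive")
    return fixed_costs / margin


def multi_assortment_break_even(fixed_costs, assortments):
    """Break-even volume per assortment under a constant sales mix.

    assortments: list of (sales_share, unit_price, unit_variable_cost),
    where the sales shares sum to 1. Assumption (not from the article):
    fixed costs are treated as company-wide and are not split per assortment.
    """
    # Weighted-average contribution margin of one "average" unit sold.
    weighted_margin = sum(
        share * (price - variable) for share, price, variable in assortments
    )
    if weighted_margin <= 0:
        raise ValueError("weighted contribution margin must be positive")
    total_volume = fixed_costs / weighted_margin
    # Split the total break-even volume according to the sales mix.
    return [share * total_volume for share, _, _ in assortments]


# Hypothetical figures, for illustration only.
single = break_even_volume(fixed_costs=100_000, unit_price=50, unit_variable_cost=30)
mixed = multi_assortment_break_even(
    fixed_costs=120_000,
    assortments=[(0.5, 40, 20), (0.5, 60, 20)],
)
```

Allocating fixed costs fully or partly to individual assortments, as the abstract mentions, would change the second function; this sketch covers only the company-wide variant.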
Xia, Jingbo; Zhang, Xing; Yuan, Daojun; Chen, Lingling; Webster, Jonathan; Fang, Alex Chengyu
2013-01-01
To effectively assess the possibility of the unknown rice protein resistant to Xanthomonas oryzae pv. oryzae, a hybrid strategy is proposed to enhance gene prioritization by combining text mining technologies with a sequence-based approach. The text mining technique of term frequency inverse document frequency is used to measure the importance of distinguished terms which reflect biomedical activity in rice before candidate genes are screened and vital terms are produced. Afterwards, a built-in classifier under the chaos games representation algorithm is used to sieve the best possible candidate gene. Our experiment results show that the combination of these two methods achieves enhanced gene prioritization. PMID:24371834
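For readers unfamiliar with term frequency inverse document frequency (TF-IDF), here is a minimal Python sketch of one common variant (length-normalized term frequency multiplied by log(N / document frequency)). It is not the authors' implementation; the weighting variant, the function name, and the toy corpus are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    documents: list of token lists (already tokenized and lower-cased).
    Weight = (count / document length) * log(N / document frequency).
    Other TF-IDF variants (smoothing, sublinear tf) exist.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, invented for illustration only.
corpus = [
    "rice protein resistance to bacterial blight".split(),
    "rice gene expression under drought".split(),
    "protein structure prediction".split(),
]
scores = tf_idf(corpus)
```

In this toy corpus, "blight" occurs in one document and "rice" in two, so "blight" receives the higher weight in the first document; a term that occurs in every document receives a weight of zero.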
Uncovering text mining: A survey of current work on web-based epidemic intelligence
Collier, Nigel
2012-01-01
Real world pandemics such as SARS 2002 as well as popular fiction like the movie Contagion graphically depict the health threat of a global pandemic and the key role of epidemic intelligence (EI). While EI relies heavily on established indicator sources, a new class of methods based on event alerting from unstructured digital Internet media is rapidly becoming acknowledged within the public health community. At the heart of automated information gathering systems is a technology called text mining. My contribution here is to provide an overview of the role that text mining technology plays in detecting epidemics and to synthesise my existing research on the BioCaster project. PMID:22783909
Kreula, Sanna M; Kaewphan, Suwisa; Ginter, Filip; Jones, Patrik R
2018-01-01
The increasing move towards open access full-text scientific literature enhances our ability to utilize advanced text-mining methods to construct information-rich networks that no human will be able to grasp simply from 'reading the literature'. The utility of text-mining for well-studied species is obvious, though the utility for less studied species, or those with no prior track-record at all, is not clear. Here we present a concept for how advanced text-mining can be used to create information-rich networks even for less well studied species and apply it to generate an open-access gene-gene association network resource for Synechocystis sp. PCC 6803, a representative model organism for cyanobacteria and first case-study for the methodology. By merging the text-mining network with networks generated from species-specific experimental data, network integration was used to enhance the accuracy of predicting novel interactions that are biologically relevant. A rule-based algorithm (filter) was constructed in order to automate the search for novel candidate genes with a high degree of likely association to known target genes by (1) ignoring established relationships from the existing literature, as they are already 'known', and (2) demanding multiple independent pieces of evidence for every novel and potentially relevant relationship. Using selected case studies, we demonstrate the utility of the network resource and filter to (i) discover novel candidate associations between different genes or proteins in the network, and (ii) rapidly evaluate the potential role of any one particular gene or protein. The full network is provided as an open-source resource.
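The two filter rules described in this abstract can be sketched in a few lines of Python. This is only an illustration of the idea, not the authors' algorithm: the data layout, the function name, the evidence threshold of two, and the placeholder gene and source names are all hypothetical.

```python
def filter_novel_associations(associations, known_pairs, min_evidence=2):
    """Keep gene pairs that are not already known and have enough evidence.

    associations: dict mapping (gene_a, gene_b) -> iterable of evidence
        source identifiers (e.g. text mining, an experimental data set).
    known_pairs: set of frozenset({gene_a, gene_b}) already established
        in the literature.
    min_evidence: minimum number of distinct sources required
        (hypothetical threshold chosen for this sketch).
    """
    novel = {}
    for pair, sources in associations.items():
        # Rule 1: ignore relationships that are already established.
        if frozenset(pair) in known_pairs:
            continue
        # Rule 2: require multiple independent pieces of evidence.
        distinct_sources = set(sources)
        if len(distinct_sources) >= min_evidence:
            novel[pair] = distinct_sources
    return novel


# Placeholder names, invented for illustration only.
associations = {
    ("geneA", "geneB"): ["text_mining", "coexpression"],
    ("geneA", "geneC"): ["text_mining"],
    ("geneB", "geneD"): ["text_mining", "coexpression", "proteomics"],
}
known_pairs = {frozenset({"geneB", "geneD"})}
candidates = filter_novel_associations(associations, known_pairs)
```

Here only the geneA-geneB pair survives: geneA-geneC has a single source, and geneB-geneD is already known.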
Waste Controls at Base Metal Mines
ERIC Educational Resources Information Center
Bell, Alan V.
1976-01-01
Mining and milling of copper, lead, zinc and nickel in Canada involves an accumulation of a half-million tons of waste material each day and requires 250 million gallons of process water daily. Waste management considerations for handling large volumes of wastes in an economically and environmentally safe manner are discussed. (BT)
30 CFR 27.23 - Automatic warning device.
Code of Federal Regulations, 2010 CFR
2010-07-01
... APPROVAL OF MINING PRODUCTS METHANE-MONITORING SYSTEMS Construction and Design Requirements § 27.23... function automatically at a methane content of the mine atmosphere between 1.0 to 1.5 volume percent and at all higher concentrations of methane. (c) It is recommended that the automatic warning device be...
Code of Federal Regulations, 2010 CFR
2010-07-01
... texts of State and Federal cooperative agreements for regulation of mining on Federal lands. The... Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR PROGRAMS FOR THE CONDUCT OF SURFACE MINING OPERATIONS WITHIN EACH STATE INTRODUCTION § 900.2 Objectives. The objective of...
76 FR 40649 - Indiana Regulatory Program
Federal Register 2010, 2011, 2012, 2013, 2014
2011-07-11
... at 312 IAC 25-6-30 Surface mining; explosives; general requirements. The full text of the program... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 914... Mining Reclamation and Enforcement, Interior. ACTION: Proposed rule; public comment period on proposed...
Complementing the Numbers: A Text Mining Analysis of College Course Withdrawals
ERIC Educational Resources Information Center
Michalski, Greg V.
2011-01-01
Excessive college course withdrawals are costly to the student and the institution in terms of time to degree completion, available classroom space, and other resources. Although generally well quantified, detailed analysis of the reasons given by students for course withdrawal is less common. To address this, a text mining analysis was performed…
Can abstract screening workload be reduced using text mining? User experiences of the tool Rayyan.
Olofsson, Hanna; Brolund, Agneta; Hellberg, Christel; Silverstein, Rebecca; Stenström, Karin; Österberg, Marie; Dagerhamn, Jessica
2017-09-01
One time-consuming aspect of conducting systematic reviews is the task of sifting through abstracts to identify relevant studies. One promising approach for reducing this burden uses text mining technology to identify those abstracts that are potentially most relevant for a project, allowing those abstracts to be screened first. To examine the effectiveness of the text mining functionality of the abstract screening tool Rayyan. User experiences were collected. Rayyan was used to screen abstracts for 6 reviews in 2015. After screening 25%, 50%, and 75% of the abstracts, the screeners logged the relevant references identified. A survey was sent to users. After screening half of the search result with Rayyan, 86% to 99% of the references deemed relevant to the study were identified. Of those studies included in the final reports, 96% to 100% were already identified in the first half of the screening process. Users rated Rayyan 4.5 out of 5. The text mining function in Rayyan successfully helped reviewers identify relevant studies early in the screening process. Copyright © 2017 John Wiley & Sons, Ltd.
Pandey, Abhishek; Kreimeyer, Kory; Foster, Matthew; Botsis, Taxiarchis; Dang, Oanh; Ly, Thomas; Wang, Wei; Forshee, Richard
2018-01-01
Structured Product Labels follow an XML-based document markup standard approved by the Health Level Seven organization and adopted by the US Food and Drug Administration as a mechanism for exchanging medical products information. Their current organization makes their secondary use rather challenging. We used the Side Effect Resource database and DailyMed to generate a comparison dataset of 1159 Structured Product Labels. We processed the Adverse Reaction section of these Structured Product Labels with the Event-based Text-mining of Health Electronic Records system and evaluated its ability to extract and encode Adverse Event terms to Medical Dictionary for Regulatory Activities Preferred Terms. A small sample of 100 labels was then selected for further analysis. Of the 100 labels, Event-based Text-mining of Health Electronic Records achieved a precision and recall of 81 percent and 92 percent, respectively. This study demonstrated Event-based Text-mining of Health Electronic Record's ability to extract and encode Adverse Event terms from Structured Product Labels which may potentially support multiple pharmacoepidemiological tasks.
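Precision and recall, as reported in this abstract, are computed from the overlap between the terms a system extracts and a reference set. The Python sketch below shows the standard definitions on a made-up example; the term lists are invented and have no connection to the study's data.

```python
def precision_recall(extracted, reference):
    """Return (precision, recall) for extracted terms against a reference set.

    precision = true positives / number of extracted terms
    recall    = true positives / number of reference terms
    """
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical adverse-event terms, invented for illustration only.
extracted_terms = {"nausea", "headache", "rash", "dizziness"}
reference_terms = {"nausea", "headache", "rash", "fatigue", "vomiting"}
precision, recall = precision_recall(extracted_terms, reference_terms)
```

With three of four extracted terms correct and three of five reference terms found, this toy example gives a precision of 0.75 and a recall of 0.6.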
A Framework for Text Mining in Scientometric Study: A Case Study in Biomedicine Publications
NASA Astrophysics Data System (ADS)
Silalahi, V. M. M.; Hardiyati, R.; Nadhiroh, I. M.; Handayani, T.; Rahmaida, R.; Amelia, M.
2018-04-01
Data on Indonesian research publications in the domain of biomedicine have been collected to be text mined for the purpose of a scientometric study. The goal is to build a predictive model that classifies research publications by their potential for downstreaming. The model is based on the drug development processes adapted from the literature. An effort is described to build the conceptual model and to develop a corpus of research publications in the domain of Indonesian biomedicine. Then an investigation is conducted into the problems associated with building a corpus and validating the model. Based on our experience, a framework is proposed to manage the scientometric study based on text mining. Our method shows the effectiveness of conducting a scientometric study based on text mining in order to get a valid classification model. This valid model is mainly supported by the iterative and close interactions with the domain experts, from identifying the issues and building a conceptual model to labelling, validation and interpretation of results.
Data Processing and Text Mining Technologies on Electronic Medical Records: A Review
Sun, Wencheng; Li, Yangyang; Liu, Fang; Fang, Shengqun; Wang, Guoyan
2018-01-01
Currently, medical institutes generally use EMR to record patients' conditions, including diagnostic information, procedures performed, and treatment results. EMR has been recognized as a valuable resource for large-scale analysis. However, EMR has the characteristics of diversity, incompleteness, redundancy, and privacy, which make it difficult to carry out data mining and analysis directly. Therefore, it is necessary to preprocess the source data in order to improve data quality and the data mining results. Different types of data require different processing technologies. Most structured data commonly needs classic preprocessing technologies, including data cleansing, data integration, data transformation, and data reduction. Semistructured or unstructured data, such as medical text, contains more health information and requires more complex and challenging processing methods. The task of information extraction for medical texts mainly includes NER (named-entity recognition) and RE (relation extraction). This paper focuses on the process of EMR processing and emphatically analyzes the key techniques. In addition, we make an in-depth study on the applications developed based on text mining together with the open challenges and research issues for future work. PMID:29849998
Big data mining: In-database Oracle data mining over hadoop
NASA Astrophysics Data System (ADS)
Kovacheva, Zlatinka; Naydenova, Ina; Kaloyanova, Kalinka; Markov, Krasimir
2017-07-01
Big data challenges different aspects of storing, processing and managing data, as well as analyzing and using data for business purposes. Applying Data Mining methods over Big Data is another challenge because of huge data volumes, variety of information, and the dynamics of the sources. Different applications are made in this area, but their successful usage depends on understanding many specific parameters. In this paper we present several opportunities for using Data Mining techniques provided by the analytical engine of RDBMS Oracle over data stored in Hadoop Distributed File System (HDFS). Some experimental results are presented and discussed.
Space resources. Volume 3: Materials
NASA Technical Reports Server (NTRS)
Mckay, Mary Fae (Editor); Mckay, David S. (Editor); Duke, Michael B. (Editor)
1992-01-01
Space Resources addresses the issues of using space resources to support life on the Moon and for exploration of Mars. This volume - Materials - covers a number of technical and policy issues regarding the materials in space (mainly lunar and asteroidal) which can be used to support space operations. In part 1, nature and location of these materials, exploration strategy, evaluation criteria, and the technical means to collect or mine these materials are discussed. A baseline lunar mine and the basics of asteroid mining are presented and critiqued. In part 2, the beneficiation of ores and the extraction of such materials as oxygen, metals, and the makings of concrete are discussed. In part 3, the manufacturing and fabrication of nonterrestrial products are discussed. The economic tradeoffs between bringing needed products from Earth and making these products on location in space are considered.
NASA Technical Reports Server (NTRS)
1975-01-01
The results of a study of the weather sensitive features of near shore and deep water ocean mining industries are described. Problems with the evaluation of economic benefits for the deep water ocean mining industry are attributed to the relative immaturity and highly proprietary nature of the industry. Case studies on the gold industry, diamond industry, tin industry and sand and gravel industry are cited.
Zhang, Jian; Tan, Qingrong; Yin, Hong; Zhang, Xiaoliang; Huan, Yi; Tang, Lihua; Wang, Huaihai; Xu, Junqing; Li, Lingjiang
2011-05-31
Although limbic structure changes have been found in chronic and recent onset post-traumatic stress disorder (PTSD) patients, there are few studies about brain structure changes in recent onset PTSD patients after a single extreme and prolonged trauma. In the current study, 20 coal mine flood disaster survivors underwent magnetic resonance imaging (MRI). Voxel-based morphometry (VBM) and region of interest (ROI) techniques were used to detect the gray matter and white matter volume changes in 10 survivors with recent onset PTSD and 10 survivors without PTSD. The correlation between the Clinician-Administered PTSD Scale (CAPS) and gray matter density in the ROI was also studied. Compared with survivors without PTSD, survivors with PTSD had significantly decreased gray matter volume and density in left anterior hippocampus, left parahippocampal gyrus, and bilateral calcarine cortex. The CAPS score correlated negatively with the gray matter density in bilateral calcarine cortex and left hippocampus in coal mine disaster survivors. Our study suggests that the gray matter volume and density of limbic structure decreased in recent onset PTSD patients who were exposed to extreme trauma. PTSD symptom severity was associated with gray matter density in calcarine cortex and hippocampus. 2010 Elsevier Ireland Ltd. All rights reserved.
Sediment radioisotope dating across a stratigraphic discontinuity in a mining-impacted lake.
McDonald, C P; Urban, N R
2007-01-01
Application of radioisotope sediment dating models to lakes subjected to large anthropogenic sediment inputs can be problematic. As a result of copper mining activities, Torch Lake received large volumes of sediment, the characteristics of which were dramatically different from those of the native sediment. Commonly used dating models (CIC-CSR, CRS) were applied to Torch Lake, but assumptions of these methods are violated, rendering sediment geochronologies inaccurate. A modification was made to the CRS model, utilizing a distinct horizon separating mining from post-mining sediment to differentiate between two focusing regimes. ²¹⁰Pb inventories in post-mining sediment were adjusted to correspond to those in mining-era sediment, and a sediment geochronology was established and verified using independent markers in ¹³⁷Cs accumulation profiles and core X-rays.
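For orientation, the unmodified CRS (constant rate of supply) calculation that the abstract starts from can be sketched as follows. This is the textbook form, t = (1/λ) ln(A₀/Aₓ), and not the authors' two-regime modification. The half-life value, the function name and the sample inventories are illustrative assumptions.

```python
import math

# Assumption: commonly quoted half-life of 210Pb, in years.
PB210_HALF_LIFE_YR = 22.3
DECAY_CONST = math.log(2) / PB210_HALF_LIFE_YR


def crs_ages(layer_inventories):
    """Age (years) at the base of each layer under the standard CRS model.

    layer_inventories: unsupported 210Pb inventory per layer, listed from
    the sediment surface downward (any consistent unit, e.g. Bq/m^2).
    """
    total = sum(layer_inventories)
    ages = []
    cumulative = 0.0
    for inventory in layer_inventories:
        cumulative += inventory
        remaining = total - cumulative  # inventory below this layer's base
        if remaining <= 0:
            # The base of the deepest layer has no inventory beneath it,
            # so the CRS age is undefined (infinite) there.
            ages.append(math.inf)
        else:
            ages.append(math.log(total / remaining) / DECAY_CONST)
    return ages


# Hypothetical three-layer core, for illustration only.
example_ages = crs_ages([50.0, 25.0, 25.0])
```

In this made-up core, half of the inventory lies below the first layer, so its base dates to one half-life. A core like Torch Lake, where the sediment supply changed abruptly, violates the assumptions behind this calculation, which is the problem the abstract addresses.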
76 FR 12849 - Kentucky Regulatory Program
Federal Register 2010, 2011, 2012, 2013, 2014
2011-03-09
... (underground mining). The text of the Kentucky regulations can be found in the administrative record and online... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 917 [KY-252-FOR; OSM-2009-0011] Kentucky Regulatory Program AGENCY: Office of Surface Mining Reclamation...
Research on Classification of Chinese Text Data Based on SVM
NASA Astrophysics Data System (ADS)
Lin, Yuan; Yu, Hongzhi; Wan, Fucheng; Xu, Tao
2017-09-01
Data mining has important application value in today's industry and academia. Text classification is a very important technology in data mining. At present, there are many mature algorithms for text classification. KNN, NB, AB, SVM, decision tree and other classification methods all show good classification performance. The Support Vector Machine (SVM) classification method is a good classifier in machine learning research. This paper studies the classification performance of the SVM method on Chinese text data, applying support vector machines to classify Chinese text with the aim of combining academic research and practical application.
StemTextSearch: Stem cell gene database with evidence from abstracts.
Chen, Chou-Cheng; Ho, Chung-Liang
2017-05-01
Previous studies have used many methods to find biomarkers in stem cells, including text mining, experimental data and image storage. However, no text-mining methods have yet been developed which can identify whether a gene plays a positive or negative role in stem cells. StemTextSearch identifies the role of a gene in stem cells by using a text-mining method to find combinations of gene regulation, stem-cell regulation and cell processes in the same sentences of biomedical abstracts. The dataset includes 5797 genes, with 1534 genes having positive roles in stem cells, 1335 genes having negative roles, 1654 genes with both positive and negative roles, and 1274 with an uncertain role. The precision of gene role in StemTextSearch is 0.66, and the recall is 0.78. StemTextSearch is a web-based engine with queries that specify (i) gene, (ii) category of stem cell, (iii) gene role, (iv) gene regulation, (v) cell process, (vi) stem-cell regulation, and (vii) species. StemTextSearch is available through http://bio.yungyun.com.tw/StemTextSearch.aspx. Copyright © 2017. Published by Elsevier Inc.
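The abstract reports precision and recall but no combined score. As a small worked check, the sketch below derives the balanced F-score (F1) from those two figures and confirms that the four role categories add up to the stated 5797 genes. The F1 value is computed here and is not a number reported by the authors; the variable and function names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (balanced F-score)."""
    return 2 * precision * recall / (precision + recall)


# Gene counts per role category, as stated in the abstract.
role_counts = {
    "positive": 1534,
    "negative": 1335,
    "both": 1654,
    "uncertain": 1274,
}
total_genes = sum(role_counts.values())

# Precision and recall as stated in the abstract.
derived_f1 = f1_score(0.66, 0.78)
print(total_genes, round(derived_f1, 3))
```

The category counts sum to 5797, matching the dataset size given in the abstract, and the derived F1 is about 0.715.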
2010-01-01
Background: An increase in work on the full text of journal articles and the growth of PubMedCentral have the opportunity to create a major paradigm shift in how biomedical text mining is done. However, until now there has been no comprehensive characterization of how the bodies of full text journal articles differ from the abstracts that until now have been the subject of most biomedical text mining research. Results: We examined the structural and linguistic aspects of abstracts and bodies of full text articles, the performance of text mining tools on both, and the distribution of a variety of semantic classes of named entities between them. We found marked structural differences, with longer sentences in the article bodies and much heavier use of parenthesized material in the bodies than in the abstracts. We found content differences with respect to linguistic features. Three out of four of the linguistic features that we examined were statistically significantly differently distributed between the two genres. We also found content differences with respect to the distribution of semantic features. There were significantly different densities per thousand words for three out of four semantic classes, and clear differences in the extent to which they appeared in the two genres. With respect to the performance of text mining tools, we found that a mutation finder performed equally well in both genres, but that a wide variety of gene mention systems performed much worse on article bodies than they did on abstracts. POS tagging was also more accurate in abstracts than in article bodies. Conclusions: Aspects of structure and content differ markedly between article abstracts and article bodies. A number of these differences may pose problems as the text mining field moves more into the area of processing full-text articles. 
However, these differences also present a number of opportunities for the extraction of data types, particularly that found in parenthesized text, that is present in article bodies but not in article abstracts. PMID:20920264
Modeling the Use of Mine Waste Rock as a Porous Medium Reservoir for Compressed Air Energy Storage
NASA Astrophysics Data System (ADS)
Donelick, R. A.; Donelick, M. B.
2016-12-01
We are studying the engineering and economic feasibilities of constructing Big Mass Battery (BiMBy) compressed air energy storage devices using some of the giga-tonnes of annually generated and historically produced mine waste rock/overburden/tailings (waste rock). This beneficial use of waste rock is based on the large mass (Big Mass), large pore volume, and wide range of waste rock permeabilities available at some large open pit metal mines and coal strip mines. Porous Big Mass is encapsulated and overlain by additional Big Mass; compressed air is pumped into the encapsulated pore space when renewable energy is abundant; compressed air is released from the encapsulated pore space to run turbines to generate electricity at the grid scale when consumers demand electricity. Energy storage capacity modeling: 1) Yerington Pit, Anaconda Copper Mine, Yerington, NV (inactive metal mine): 340 Mt Big Mass, energy storage capacity equivalent to 390k-710k home batteries of size 10 kW•h/charge, assumed 20% porosity, 50% overall efficiency. 2) Berkeley Pit, Butte Copper Mine, Butte, MT (inactive metal mine): 870 Mt Big Mass, energy storage capacity equivalent to 1.4M-2.9M home batteries of size 10 kW•h/charge, assumed 20% porosity, 50% overall efficiency. 3) Rosebud Mine, Colstrip, MT (active coal strip mine): 87 Mt over 2 years, energy storage capacity equivalent to 45k-67k home batteries of size 10 kW•h/charge, assumed 30% porosity, 50% overall efficiency. Encapsulating impermeable layer modeling: Inactive mine pits like Yerington Pit and Berkeley Pit, and similar active pits, have associated with them low permeability earthen material (silt and clay in Big Mass) at sufficient quantities to manufacture an encapsulating structure with minimal loss of efficiency due to leakage, a lifetime of decades or even centuries, and minimal need for the use of geomembranes. 
Active coal strip mines like Rosebud mine have associated with them low permeability earthen material such as coal combustion products (fly ash, bottom ash, boiler slag, other) that may be put to beneficial use as part of the encapsulating structure; however, coal strip mines have lower volume to surface ratios than mine pits increasing the potential need to use geomembranes.
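To put the home-battery equivalents quoted above on a common scale, the following sketch converts them to gigawatt-hours using the 10 kW•h per charge figure from the abstract. It is plain unit conversion, not the authors' capacity model; the dictionary layout and function name are illustrative assumptions.

```python
# Per-charge capacity of one home battery, as stated in the abstract.
HOME_BATTERY_KWH = 10

# (low, high) home-battery equivalents quoted in the abstract.
SITES = {
    "Yerington Pit": (390_000, 710_000),
    "Berkeley Pit": (1_400_000, 2_900_000),
    "Rosebud Mine": (45_000, 67_000),
}


def to_gwh(n_batteries, kwh_per_battery=HOME_BATTERY_KWH):
    """Convert a count of home batteries to gigawatt-hours (1 GWh = 1e6 kWh)."""
    return n_batteries * kwh_per_battery / 1_000_000


capacity_gwh = {
    name: (to_gwh(low), to_gwh(high)) for name, (low, high) in SITES.items()
}

for name, (low, high) in capacity_gwh.items():
    print(f"{name}: {low:.2f} to {high:.2f} GWh")
```

On these figures the Yerington Pit estimate corresponds to roughly 3.9 to 7.1 GWh, the Berkeley Pit to 14 to 29 GWh, and the Rosebud Mine to 0.45 to 0.67 GWh.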
SAN PEDRO PARKS WILDERNESS, NEW MEXICO.
Santos, Elmer S.; Weisner, Robert C.
1984-01-01
The San Pedro Parks Wilderness occupies 62.7 sq mi of the Santa Fe National Forest in north-central New Mexico. Several copper mines, many copper prospects, and a few uranium prospects occur in sedimentary units in the vicinity of the wilderness. These units, where they extend into the wilderness, constitute only a small volume of rock and, judging from analyses of samples and from field observations, are devoid of copper and uranium concentration. Prospects on several of about 65 mining claims within the wilderness revealed concentrations of manganese or barite but only in volumes too small to be considered a demonstrated resource.
Deriving novel relationships from the scientific literature is an important adjunct to datamining activities for complex datasets in genomics and high-throughput screening activities. Automated text-mining algorithms can be used to extract relevant content from the literature and...
A Feature Mining Based Approach for the Classification of Text Documents into Disjoint Classes.
ERIC Educational Resources Information Center
Nieto Sanchez, Salvador; Triantaphyllou, Evangelos; Kraft, Donald
2002-01-01
Proposes a new approach for classifying text documents into two disjoint classes. Highlights include a brief overview of document clustering; a data mining approach called the One Clause at a Time (OCAT) algorithm which is based on mathematical logic; vector space model (VSM); and comparing the OCAT to the VSM. (Author/LRW)
ERIC Educational Resources Information Center
Hung, Jui-Long; Zhang, Ke
2012-01-01
This study investigated the longitudinal trends of academic articles in Mobile Learning (ML) using text mining techniques. One hundred and nineteen (119) refereed journal articles and proceedings papers from the SCI/SSCI database were retrieved and analyzed. The taxonomies of ML publications were grouped into twelve clusters (topics) and four…
Trends of E-Learning Research from 2000 to 2008: Use of Text Mining and Bibliometrics
ERIC Educational Resources Information Center
Hung, Jui-long
2012-01-01
This study investigated the longitudinal trends of e-learning research using text mining techniques. Six hundred and eighty-nine (689) refereed journal articles and proceedings were retrieved from the Science Citation Index/Social Science Citation Index database in the period from 2000 to 2008. All e-learning publications were grouped into two…
Environmental geochemistry of the abandoned Mamut Copper Mine (Sabah) Malaysia.
van der Ent, Antony; Edraki, Mansour
2018-02-01
The Mamut Copper Mine (MCM) located in Sabah (Malaysia) on Borneo Island was the only Cu-Au mine that operated in the country. During its operation (1975-1999), the mine produced 2.47 Mt of concentrate containing approximately 600,000 t of Cu, 45 t of Au and 294 t of Ag, and generated about 250 Mt of overburden and waste rocks and over 150 Mt of tailings, which were deposited at the 397 ha Lohan tailings storage facility, 15.8 km from the mine and 980 m lower in altitude. The MCM site presents challenges for environmental rehabilitation due to the presence of large volumes of sulphidic mineral wastes, the very high rainfall and the large volume of polluted mine pit water. Rehabilitation and treatment are therefore costly; for example, exceedingly large quantities of lime are needed for neutralisation of the acidic mine pit discharge. The MCM site has several unusual geochemical features on account of the concomitant occurrence of acid-forming sulphide porphyry rocks and alkaline serpentinite minerals, and unique biological features because of the very high plant diversity in its immediate surroundings. The site hence provides a valuable opportunity for researching natural acid neutralisation processes and mine rehabilitation in tropical areas. Today, the MCM site is surrounded by protected nature reserves (Kinabalu Park, a World Heritage Site, and Bukit Hampuan, a Class I Forest Reserve), and the environmental legacy prevents de-gazetting and inclusion in these protected areas in the foreseeable future. This article presents a preliminary geochemical investigation of waste rocks, sediments, secondary precipitates, surface water chemistry and foliar elemental uptake in ferns, and discusses these results in light of their environmental significance for rehabilitation.
CANFAR+Skytree: A Cloud Computing and Data Mining System for Astronomy
NASA Astrophysics Data System (ADS)
Ball, N. M.
2013-10-01
This is a companion Focus Demonstration article to the CANFAR+Skytree poster (Ball 2013, this volume), demonstrating the usage of the Skytree machine learning software on the Canadian Advanced Network for Astronomical Research (CANFAR) cloud computing system. CANFAR+Skytree is the world's first cloud computing system for data mining in astronomy.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sapko, M.J.; Weiss, E.S.; Watson, R.W.
Single-entry gas-explosion characteristics for the Bruceton Experimental Mine (BEM) are compared to those occurring in the larger geometries of the new Lake Lynn Mine (LLM) within the Lake Lynn Laboratory. (All three are Bureau of Mines facilities). Scale factors and boundary conditions for the BEM and the larger entries of the LLM are reviewed in some detail using representative data for pressure, flame, and wind velocity in the two mines. Measured pressure histories for gas explosions at the BEM are compared with data for comparable explosions in the larger cross section of the LLM. The time evolution for flame-front displacement can be characterized by a general expression that relates gas concentration and length of flammable volume. The course of the explosion development and its destructive power are dependent upon the development of turbulence in the unburned flammable mixture into which the flame propagates. The results of the study indicated that pressure profiles in the larger cross section are maintained to much larger distances even though the flame front is accelerated less rapidly in a comparable entry length of smaller flammable volume.
Kreula, Sanna M.; Kaewphan, Suwisa; Ginter, Filip
2018-01-01
The increasing move towards open access full-text scientific literature enhances our ability to utilize advanced text-mining methods to construct information-rich networks that no human will be able to grasp simply from ‘reading the literature’. The utility of text-mining for well-studied species is obvious though the utility for less studied species, or those with no prior track-record at all, is not clear. Here we present a concept for how advanced text-mining can be used to create information-rich networks even for less well studied species and apply it to generate an open-access gene-gene association network resource for Synechocystis sp. PCC 6803, a representative model organism for cyanobacteria and first case-study for the methodology. By merging the text-mining network with networks generated from species-specific experimental data, network integration was used to enhance the accuracy of predicting novel interactions that are biologically relevant. A rule-based algorithm (filter) was constructed in order to automate the search for novel candidate genes with a high degree of likely association to known target genes by (1) ignoring established relationships from the existing literature, as they are already ‘known’, and (2) demanding multiple independent evidences for every novel and potentially relevant relationship. Using selected case studies, we demonstrate the utility of the network resource and filter to (i) discover novel candidate associations between different genes or proteins in the network, and (ii) rapidly evaluate the potential role of any one particular gene or protein. The full network is provided as an open-source resource. PMID:29844966
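The two filter rules described above (skip relationships already established in the literature, and require multiple independent evidences) can be sketched as below. This is a minimal illustration under assumed data shapes, not the authors' algorithm. The function name, the tuple layout, the threshold of two sources and the gene and source labels are all hypothetical.

```python
from collections import defaultdict


def novel_candidates(associations, known_pairs, min_sources=2):
    """Return novel gene pairs supported by enough independent sources.

    associations: iterable of (gene_a, gene_b, source_id) tuples.
    known_pairs: iterable of (gene_a, gene_b) pairs already established.
    min_sources: assumed threshold for "multiple independent evidences".
    """
    known = {frozenset(pair) for pair in known_pairs}
    sources = defaultdict(set)
    for gene_a, gene_b, source_id in associations:
        pair = frozenset((gene_a, gene_b))  # direction does not matter
        if gene_a == gene_b or pair in known:
            continue  # rule 1: ignore self-links and known relationships
        sources[pair].add(source_id)  # a set drops repeated sources
    # rule 2: keep only pairs backed by several distinct sources
    return {
        tuple(sorted(pair)): len(found)
        for pair, found in sources.items()
        if len(found) >= min_sources
    }


# Hypothetical input, for illustration only.
example_associations = [
    ("geneA", "geneB", "paper1"),
    ("geneB", "geneA", "experiment1"),
    ("geneA", "geneC", "paper1"),
    ("geneA", "geneC", "paper1"),
    ("geneC", "geneD", "paper2"),
    ("geneC", "geneD", "experiment1"),
]
example_result = novel_candidates(example_associations, [("geneC", "geneD")])
```

Here only the geneA/geneB pair survives: geneA/geneC is supported by a single source counted twice, and geneC/geneD is already known.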
2016-08-24
Chuquicamata, in Chile's Atacama Desert, is the largest open pit copper mine in the world, by excavated volume. The copper deposits were first exploited in pre-Hispanic times. Open pit mining began in the early 20th century when a method was developed to work low grade oxidized copper ores. The image was acquired September 2, 2007, covers an area of 19.5 by 29.3 km, and is located at 22.1 degrees south, 68.9 degrees west. http://photojournal.jpl.nasa.gov/catalog/PIA20973
Van Landeghem, Sofie; Abeel, Thomas; Saeys, Yvan; Van de Peer, Yves
2010-09-15
In the field of biomolecular text mining, black box behavior of machine learning systems currently limits understanding of the true nature of the predictions. However, feature selection (FS) is capable of identifying the most relevant features in any supervised learning setting, providing insight into the specific properties of the classification algorithm. This allows us to build more accurate classifiers while at the same time bridging the gap between the black box behavior and the end-user who has to interpret the results. We show that our FS methodology successfully discards a large fraction of machine-generated features, improving classification performance of state-of-the-art text mining algorithms. Furthermore, we illustrate how FS can be applied to gain understanding in the predictions of a framework for biomolecular event extraction from text. We include numerous examples of highly discriminative features that model either biological reality or common linguistic constructs. Finally, we discuss a number of insights from our FS analyses that will provide the opportunity to considerably improve upon current text mining tools. The FS algorithms and classifiers are available in Java-ML (http://java-ml.sf.net). The datasets are publicly available from the BioNLP'09 Shared Task web site (http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/SharedTask/).
Generative Topic Modeling in Image Data Mining and Bioinformatics Studies
ERIC Educational Resources Information Center
Chen, Xin
2012-01-01
Probabilistic topic models have been developed for applications in various domains such as text mining, information retrieval, computer vision and bioinformatics. In this thesis, we focus on developing novel probabilistic topic models for image mining and bioinformatics studies. Specifically, a probabilistic topic-connection (PTC) model…
Federal Register 2010, 2011, 2012, 2013, 2014
2013-07-05
... silver mining operation. Most of the infrastructure to support a mining operation was authorized and.... The Proposed Action consists of underground mining, constructing a new production shaft, improving.... Public comments resulted in the addition of clarifying text, but did not significantly change the...
Environmental geochemistry of abandoned mercury mines in West-Central Nevada, USA
Gray, J.E.; Crock, J.G.; Fey, D.L.
2002-01-01
The Humboldt River is a closed basin and is the longest river in Nevada. Numerous abandoned Hg mines are located within the basin, and because Hg is a toxic heavy metal, the potential transport of Hg from these mines into surrounding ecosystems, including the Humboldt River, is of environmental concern. Samples of ore, sediment, water, calcines (roasted ore), and leachates of the calcines were analyzed for Hg and other heavy metals to evaluate geochemical dispersion from the mines. Cinnabar-bearing ore samples collected from the mines contain highly elevated Hg concentrations, up to 6.9%, whereas calcines collected from the mines contain up to 2000 mg Hg/kg. Stream-sediment samples collected within 1 km of the mines contain as much as 170 mg Hg/kg, but those collected distal from the mines (> 5 km) contain 8 km from the Humboldt River, and Hg is transported and diluted through a large volume of pediment before it reaches the Humboldt River. © 2002 Elsevier Science Ltd. All rights reserved.
Prediction of the flooding of a mining reservoir in NW Spain.
Álvarez, R; Ordóñez, A; De Miguel, E; Loredo, C
2016-12-15
Abandoned and flooded mines constitute underground reservoirs which must be managed. When pumping is stopped in a closed mine, the process of flooding should be anticipated in order to avoid environmentally undesirable or unexpected mine water discharges at the surface, particularly in populated areas. The Candín-Fondón mining reservoir in Asturias (NW Spain) has an estimated void volume of 8 million m³ and some urban areas are susceptible to flooding if the water is freely released from the lowest mine adit/pithead. A conceptual model of this reservoir was developed and the flooding process was numerically modelled in order to estimate the time that the flooding would take. Additionally, the maximum safe height for the filling of the reservoir is discussed. Copyright © 2016 Elsevier Ltd. All rights reserved.
Ghazizadeh, Mahtab; McDonald, Anthony D; Lee, John D
2014-09-01
This study applies text mining to extract clusters of vehicle problems and associated trends from free-response data in the National Highway Traffic Safety Administration's vehicle owner's complaint database. As the automotive industry adopts new technologies, it is important to systematically assess the effect of these changes on traffic safety. Driving simulators, naturalistic driving data, and crash databases all contribute to a better understanding of how drivers respond to changing vehicle technology, but other approaches, such as automated analysis of incident reports, are needed. Free-response data from incidents representing two severity levels (fatal incidents and incidents involving injury) were analyzed using a text mining approach: latent semantic analysis (LSA). LSA and hierarchical clustering identified clusters of complaints for each severity level, which were compared and analyzed across time. Cluster analysis identified eight clusters of fatal incidents and six clusters of incidents involving injury. Comparisons showed that although the airbag clusters across the two severity levels have the same most frequent terms, the circumstances around the incidents differ. The time trends show clear increases in complaints surrounding the Ford/Firestone tire recall and the Toyota unintended acceleration recall. Increases in complaints may be partially driven by these recall announcements and the associated media attention. Text mining can reveal useful information from free-response databases that would otherwise be prohibitively time-consuming and difficult to summarize manually. Text mining can extend human analysis capabilities for large free-response databases to support earlier detection of problems and more timely safety interventions.
Introduction to the mining of clinical data.
Harrison, James H
2008-03-01
The increasing volume of medical data online, including laboratory data, represents a substantial resource that can provide a foundation for improved understanding of disease presentation, response to therapy, and health care delivery processes. Data mining supports these goals by providing a set of techniques designed to discover similarities and relationships between data elements in large data sets. Currently, medical data have several characteristics that increase the difficulty of applying these techniques, although there have been notable medical data mining successes. Future developments in integrated medical data repositories, standardized data representation, and guidelines for the appropriate research use of medical data will decrease the barriers to mining projects.
Analysis of Nature of Science Included in Recent Popular Writing Using Text Mining Techniques
ERIC Educational Resources Information Center
Jiang, Feng; McComas, William F.
2014-01-01
This study examined the inclusion of nature of science (NOS) in popular science writing to determine whether it could serve as a supplementary resource for teaching NOS and to evaluate the accuracy of text mining and classification as a viable research tool in science education research. Four groups of documents published from 2001 to 2010 were…
ERIC Educational Resources Information Center
Cheon, Jongpil; Lee, Sangno; Smith, Walter; Song, Jaeki; Kim, Yongjin
2013-01-01
The purpose of this study was to use text mining analysis of early adolescents' online essays to determine their knowledge of global lunar patterns. Australian and American students in grades five to seven wrote about global lunar patterns they had discovered by sharing observations with each other via the Internet. These essays were analyzed for…
ERIC Educational Resources Information Center
Çepni, Sevcan Bayraktar; Demirel, Elif Tokdemir
2016-01-01
This study aimed to find out the impact of "text mining and imitating" strategies on lexical richness, lexical diversity and general success of students in their compositions in second language writing. The participants were 98 students studying their first year in Karadeniz Technical University in English Language and Literature…
Science and Technology Text Mining: Text Mining of the Journal Cortex
2004-01-01
Amnesia Retrograde Amnesia GENERAL Semantic Memory Episodic Memory Working Memory TEST Serial Position Curve...in Cortex can be reasonably divided into four categories (papers in each category in parenthesis): Semantic Memory (151); Handedness (145); Amnesia ... Semantic Memory (151) is divided into Verbal/ Numerical (76) and Visual/ Spatial (75). Amnesia (119) is divided into Amnesia Symptoms (50) and
Experiences with Text Mining Large Collections of Unstructured Systems Development Artifacts at JPL
NASA Technical Reports Server (NTRS)
Port, Dan; Nikora, Allen; Hihn, Jairus; Huang, LiGuo
2011-01-01
Often repositories of systems engineering artifacts at NASA's Jet Propulsion Laboratory (JPL) are so large and poorly structured that they have outgrown our capability to effectively manually process their contents to extract useful information. Sophisticated text mining methods and tools seem a quick, low-effort approach to automating our limited manual efforts. Our experience exploring such methods in three main areas (historical risk analysis, defect identification based on requirements analysis, and over-time analysis of system anomalies at JPL) has shown that obtaining useful results requires substantial unanticipated efforts - from preprocessing the data to transforming the output for practical applications. We have not observed any quick 'wins' or realized benefit from short-term effort avoidance through automation in this area. Surprisingly, we have realized a number of unexpected long-term benefits from the process of applying text mining to our repositories. This paper elaborates on some of these benefits and the important lessons learned from preparing and applying text mining to large unstructured system artifacts at JPL, with the aim of benefiting future TM applications in similar problem domains and in the hope that they can be extended to broader application areas.
NASA Astrophysics Data System (ADS)
Szafarczyk, Anna; Gawałkiewicz, Rafał
2018-03-01
In Poland, there are many mining enterprises of historic character registered on the UNESCO World Heritage List. One of the oldest mining enterprises in Poland is the Salt Mine in Bochnia. The processes inside the rock mass require that surveying services carry out regular geometric control of the cavities. Particular attention should be paid (due to its sacral function) to St. Kinga Chamber, located 195 metres below the surface, on the mine level "August". So far, measurement technologies for studying changes in the geometry of cavities have been based on linear bases used to measure convergence. This provides only discrete information (at a point) and does not always represent the real state of deformation. The scanning method, in practice, yields a three-dimensional image of changes (structural deformations) that cannot be determined with the methods used to measure linear convergence (which rely on a limited number of bases). Laser scanning, apart from determining the value of volume convergence, also makes it possible to visualize the cavern in 3D. Moreover, it provides direct information for updating numerical mining maps and makes it possible to generate various cross-sections through the cavern. The authors analysed the possibility of applying laser scanning (Faro Focus 3D scanner) as a modern tool for measuring the value of volume convergence.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
This volume contains five appendixes: Chattanooga Shale preliminary mining study, soils data, meteorologic data, water resources data, and biological resource data. The area around DeKalb County in Tennessee is the most likely site for commercial development for recovery of uranium. (DLC)
Biomedical hypothesis generation by text mining and gene prioritization.
Petric, Ingrid; Ligeti, Balazs; Gyorffy, Balazs; Pongor, Sandor
2014-01-01
Text mining methods can facilitate the generation of biomedical hypotheses by suggesting novel associations between diseases and genes. Previously, we developed a rare-term model called RaJoLink (Petric et al, J. Biomed. Inform. 42(2): 219-227, 2009) in which hypotheses are formulated on the basis of terms rarely associated with a target domain. Since many current medical hypotheses are formulated in terms of molecular entities and molecular mechanisms, here we extend the methodology to proteins and genes, using a standardized vocabulary as well as a gene/protein network model. The proposed enhanced RaJoLink rare-term model combines text mining and gene prioritization approaches. Its utility is illustrated by finding known as well as potential gene-disease associations in ovarian cancer using MEDLINE abstracts and the STRING database.
The Functional Genomics Network in the evolution of biological text mining over the past decade.
Blaschke, Christian; Valencia, Alfonso
2013-03-25
Different programs of The European Science Foundation (ESF) have contributed significantly to connecting researchers in Europe and beyond through several initiatives. This support was particularly relevant for the development of the areas related to extracting information from papers (text-mining) because it supported the field in its early phases long before it was recognized by the community. We review the historical development of text mining research and how it was introduced in bioinformatics. Specific applications in (functional) genomics are described, such as its integration in genome annotation pipelines and the support to the analysis of high-throughput genomics experimental data, and we highlight the activities of evaluation of methods and benchmarking for which the ESF programme support was instrumental. Copyright © 2013 Elsevier B.V. All rights reserved.
Agile Text Mining for the 2014 i2b2/UTHealth Cardiac Risk Factors Challenge
Cormack, James; Nath, Chinmoy; Milward, David; Raja, Kalpana; Jonnalagadda, Siddhartha R
2016-01-01
This paper describes the use of an agile text mining platform (Linguamatics’ Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors in patient records as defined in the i2b2/UTHealth 2014 Challenge. The approach uses a data-driven rule-based methodology with the addition of a simple supervised classifier. We demonstrate that agile text mining allows for rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluation of this approach on the test data gave an F-Score of 91.7%, one percent behind the top performing system. PMID:26209007
Fu, Xiao; Batista-Navarro, Riza; Rak, Rafal; Ananiadou, Sophia
2015-01-01
Chronic obstructive pulmonary disease (COPD) is a life-threatening lung disorder whose recent prevalence has led to an increasing burden on public healthcare. Phenotypic information in electronic clinical records is essential in providing suitable personalised treatment to patients with COPD. However, as phenotypes are often "hidden" within free text in clinical records, clinicians could benefit from text mining systems that facilitate their prompt recognition. This paper reports on a semi-automatic methodology for producing a corpus that can ultimately support the development of text mining tools that, in turn, will expedite the process of identifying groups of COPD patients. A corpus of 30 full-text papers was formed based on selection criteria informed by the expertise of COPD specialists. We developed an annotation scheme that is aimed at producing fine-grained, expressive and computable COPD annotations without burdening our curators with a highly complicated task. This was implemented in the Argo platform by means of a semi-automatic annotation workflow that integrates several text mining tools, including a graphical user interface for marking up documents. When evaluated using gold standard (i.e., manually validated) annotations, the semi-automatic workflow was shown to obtain a micro-averaged F-score of 45.70% (with relaxed matching). Utilising the gold standard data to train new concept recognisers, we demonstrated that our corpus, although still a work in progress, can foster the development of significantly better performing COPD phenotype extractors. We describe in this work the means by which we aim to eventually support the process of COPD phenotype curation, i.e., by the application of various text mining tools integrated into an annotation workflow. Although the corpus being described is still under development, our results thus far are encouraging and show great potential in stimulating the development of further automatic COPD phenotype extractors.
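The micro-averaged F-score reported above pools true positives, false positives and false negatives across all annotation types before computing precision and recall. A minimal sketch of that calculation follows; the function name and the per-type counts are hypothetical and are not taken from the paper.

```python
def micro_f1(counts):
    """Micro-averaged F1 from per-type (tp, fp, fn) counts.

    Counts are summed over all types first, then precision,
    recall and F1 are computed once on the pooled totals.
    """
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for two annotation types (illustration only).
example_counts = [(8, 2, 4), (2, 3, 1)]
print(round(micro_f1(example_counts), 4))
```

Because the counts are pooled, frequent annotation types dominate the score; a macro average would instead weight each type equally.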
78 FR 64397 - Mississippi Regulatory Program
Federal Register 2010, 2011, 2012, 2013, 2014
2013-10-29
... text of the program amendment available at www.regulations.gov . A. Mississippi Surface Coal Mining... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 924...; S2D2SSS08011000SX066A00033F13XS501520] Mississippi Regulatory Program AGENCY: Office of Surface Mining Reclamation and Enforcement...
Redundancy and Novelty Mining in the Business Blogosphere
ERIC Educational Resources Information Center
Tsai, Flora S.; Chan, Kap Luk
2010-01-01
Purpose: The paper aims to explore the performance of redundancy and novelty mining in the business blogosphere, which has not been studied before. Design/methodology/approach: Novelty mining techniques are implemented to single out novel information out of a massive set of text documents. This paper adopted the mixed metric approach which…
Névéol, Aurélie; Wilbur, W. John; Lu, Zhiyong
2012-01-01
High-throughput experiments and bioinformatics techniques are creating an exploding volume of data that are becoming overwhelming to keep track of for biologists and researchers who need to access, analyze and process existing data. Much of the available data are being deposited in specialized databases, such as the Gene Expression Omnibus (GEO) for microarrays or the Protein Data Bank (PDB) for protein structures and coordinates. Data sets are also being described by their authors in publications archived in literature databases such as MEDLINE and PubMed Central. Currently, the curation of links between biological databases and the literature mainly relies on manual labour, which makes it a time-consuming and daunting task. Herein, we analysed the current state of link curation between GEO, PDB and MEDLINE. We found that the link curation is heterogeneous depending on the sources and databases involved, and that overlap between sources is low, <50% for PDB and GEO. Furthermore, we showed that text-mining tools can automatically provide valuable evidence to help curators broaden the scope of articles and database entries that they review. As a result, we made recommendations to improve the coverage of curated links, as well as the consistency of information available from different databases while maintaining high-quality curation. Database URLs: http://www.ncbi.nlm.nih.gov/PubMed, http://www.ncbi.nlm.nih.gov/geo/, http://www.rcsb.org/pdb/ PMID:22685160
NASA Technical Reports Server (NTRS)
Gertsch, Richard E.
1992-01-01
A model lunar mining method is proposed that illustrates the problems to be expected in lunar mining and how they might be solved. While the method is quite feasible, it is, more importantly, a useful baseline system against which to test other, possibly better, methods. Our study group proposed the slusher to stimulate discussion of how a lunar mining operation might be successfully accomplished. Critics of the slusher system were invited to propose better methods. The group noted that while nonterrestrial mining has been a vital part of past space manufacturing proposals, no one has proposed a lunar mining system in any real detail. The group considered it essential that the design of actual, workable, and specific lunar mining methods begin immediately. Based on an earlier proposal, the method is a three-drum slusher, also known as a cable-operated drag scraper. Its terrestrial application is quite limited, as it is relatively inefficient and inflexible. The method usually finds use in underwater mining from the shore and in moving small amounts of ore underground. When lunar mining scales up, the lunarized slusher will be replaced by more efficient, high-volume methods. Other aspects of lunar mining are discussed.
Space Resources Utilization Roundtable
NASA Technical Reports Server (NTRS)
1999-01-01
This volume contains abstracts that have been accepted for presentation at the Space Resources Utilization Roundtable, October 27-29, 1999, in Golden, Colorado. The program committee consisted of M. B. Duke (Lunar and Planetary Institute), G. Baughman (Colorado School of Mines), D. Criswell (University of Houston), C. Graham (Canadian Mining Industry Research Organization), H. H. Schmitt (Apollo Astronaut), W. Sharp (Colorado School of Mines), L. Taylor (University of Tennessee), and a space manufacturing representative. Administration and publications support for this meeting were provided by the staff of the Publications and Program Services Department at the Lunar and Planetary Institute.
Annotation analysis for testing drug safety signals using unstructured clinical notes
2012-01-01
Background The electronic surveillance for adverse drug events is largely based upon the analysis of coded data from reporting systems. Yet, the vast majority of electronic health data lies embedded within the free text of clinical notes and is not gathered into centralized repositories. With the increasing access to large volumes of electronic medical data—in particular the clinical notes—it may be possible to computationally encode and to test drug safety signals in an active manner. Results We describe the application of simple annotation tools on clinical text and the mining of the resulting annotations to compute the risk of getting a myocardial infarction for patients with rheumatoid arthritis that take Vioxx. Our analysis clearly reveals elevated risks for myocardial infarction in rheumatoid arthritis patients taking Vioxx (odds ratio 2.06) before 2005. Conclusions Our results show that it is possible to apply annotation analysis methods for testing hypotheses about drug safety using electronic medical records. PMID:22541596
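The odds ratio quoted above compares the odds of an outcome among exposed and unexposed patients in a 2x2 table. The sketch below shows the standard calculation; the function name is an assumption and the counts are hypothetical, chosen only for illustration, not the study's data.

```python
def odds_ratio(exposed_cases, exposed_noncases,
               unexposed_cases, unexposed_noncases):
    """Odds ratio of a 2x2 table: (a / b) / (c / d) = (a * d) / (b * c)."""
    denominator = exposed_noncases * unexposed_cases
    if denominator == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (exposed_cases * unexposed_noncases) / denominator


# Hypothetical counts (NOT from the study): 20 of 100 exposed patients
# and 10 of 100 unexposed patients had the outcome.
print(odds_ratio(20, 80, 10, 90))
```

In an annotation-based setting, the four cells would be filled by counting patients whose notes mention the drug, the condition, and the outcome in the relevant temporal order.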
Meng, Guilin; Meng, Xiulin; Ma, Xiaoye; Zhang, Gengping; Hu, Xiaolin; Jin, Aiping; Liu, Xueyuan
2018-01-01
Alzheimer’s disease (AD) is an increasing concern in human health. Despite significant research, highly effective drugs to treat AD are lacking. The present study describes the text mining process to identify drug candidates from a traditional Chinese medicine (TCM) database, along with associated protein target mechanisms. We carried out text mining to identify literature that referenced both AD and TCM and focused on identifying compounds and protein targets of interest. After targeting one potential TCM candidate, corresponding protein-protein interaction (PPI) networks were assembled in STRING to decipher the most likely mechanism of action. This was followed by validation using Western blot and co-immunoprecipitation in an AD cell model. The text mining strategy using a vast amount of AD-related literature and the TCM database identified curcumin, whose major component was ferulic acid (FA). This was used as a key candidate compound for further study. Using the top calculated interaction score in STRING, BACE1 and MMP2 were implicated in the activity of FA in AD. Exposure of SHSY5Y-APP cells to FA resulted in a decrease in expression levels of BACE-1 and APP, while the expression of MMP-2 and MMP-9 increased in a dose-dependent manner. This suggests that FA-induced BACE1 and MMP2 pathways may be novel potential mechanisms involved in AD. The text mining of literature and the TCM database related to AD suggested FA as a promising TCM ingredient for the treatment of AD. Potential mechanisms interconnected and integrated with Aβ aggregation inhibition and extracellular matrix remodeling underlying the activity of FA were identified using in vitro studies. PMID:29896095
Meng, Guilin; Meng, Xiulin; Ma, Xiaoye; Zhang, Gengping; Hu, Xiaolin; Jin, Aiping; Zhao, Yanxin; Liu, Xueyuan
2018-01-01
Alzheimer's disease (AD) is an increasing concern in human health. Despite significant research, highly effective drugs to treat AD are lacking. The present study describes the text mining process to identify drug candidates from a traditional Chinese medicine (TCM) database, along with associated protein target mechanisms. We carried out text mining to identify literature that referenced both AD and TCM and focused on identifying compounds and protein targets of interest. After targeting one potential TCM candidate, corresponding protein-protein interaction (PPI) networks were assembled in STRING to decipher the most likely mechanism of action. This was followed by validation using Western blot and co-immunoprecipitation in an AD cell model. The text mining strategy using a vast amount of AD-related literature and the TCM database identified curcumin, whose major component was ferulic acid (FA). This was used as a key candidate compound for further study. Using the top calculated interaction score in STRING, BACE1 and MMP2 were implicated in the activity of FA in AD. Exposure of SHSY5Y-APP cells to FA resulted in a decrease in expression levels of BACE-1 and APP, while the expression of MMP-2 and MMP-9 increased in a dose-dependent manner. This suggests that FA-induced BACE1 and MMP2 pathways may be novel potential mechanisms involved in AD. The text mining of literature and the TCM database related to AD suggested FA as a promising TCM ingredient for the treatment of AD. Potential mechanisms interconnected and integrated with Aβ aggregation inhibition and extracellular matrix remodeling underlying the activity of FA were identified using in vitro studies.
Johnson, Robin J.; Lay, Jean M.; Lennon-Hopkins, Kelley; Saraceni-Richards, Cynthia; Sciaky, Daniela; Murphy, Cynthia Grondin; Mattingly, Carolyn J.
2013-01-01
The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) is a public resource that curates interactions between environmental chemicals and gene products, and their relationships to diseases, as a means of understanding the effects of environmental chemicals on human health. CTD provides a triad of core information in the form of chemical-gene, chemical-disease, and gene-disease interactions that are manually curated from scientific articles. To increase the efficiency, productivity, and data coverage of manual curation, we have leveraged text mining to help rank and prioritize the triaged literature. Here, we describe our text-mining process that computes and assigns each article a document relevancy score (DRS), wherein a high DRS suggests that an article is more likely to be relevant for curation at CTD. We evaluated our process by first text mining a corpus of 14,904 articles triaged for seven heavy metals (cadmium, cobalt, copper, lead, manganese, mercury, and nickel). Based upon initial analysis, a representative subset corpus of 3,583 articles was then selected from the 14,094 articles and sent to five CTD biocurators for review. The resulting curation of these 3,583 articles was analyzed for a variety of parameters, including article relevancy, novel data content, interaction yield rate, mean average precision, and biological and toxicological interpretability. We show that for all measured parameters, the DRS is an effective indicator for scoring and improving the ranking of literature for the curation of chemical-gene-disease information at CTD. Here, we demonstrate how fully incorporating text mining-based DRS scoring into our curation pipeline enhances manual curation by prioritizing more relevant articles, thereby increasing data content, productivity, and efficiency. PMID:23613709
Use of colliery spoil for infilling mine workings
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ghataora, G.S.; Jarvis, S.T.
1996-12-31
Colliery spoil has been used as a major constituent of rock paste, a controlled low-strength bulk infill material, to infill abandoned limestone mines in the West Midlands of England since the early 1980s. During this time the design of colliery spoil rock paste has been modified and improved to ensure that strengths are achieved and consolidation is minimized. This paper describes the methods used for measuring and monitoring the development of the strength of rock paste used to infill the Littleton Street Mine in Walsall, England. The mine had a volume of about 500,000 m³ and is possibly the largest underground void to be infilled with rock paste.
A New Framework for Textual Information Mining over Parse Trees. CRESST Report 805
ERIC Educational Resources Information Center
Mousavi, Hamid; Kerr, Deirdre; Iseli, Markus R.
2011-01-01
Textual information mining is a challenging problem that has resulted in the creation of many different rule-based linguistic query languages. However, these languages generally are not optimized for the purpose of text mining. In other words, they usually consider queries as individuals and only return raw results for each query. Moreover they…
Data Mining: A Hybrid Methodology for Complex and Dynamic Research
ERIC Educational Resources Information Center
Lang, Susan; Baehr, Craig
2012-01-01
This article provides an overview of the ways in which data and text mining have potential as research methodologies in composition studies. It introduces data mining in the context of the field of composition studies and discusses ways in which this methodology can complement and extend our existing research practices by blending the best of what…
ERIC Educational Resources Information Center
Luan, Jing; Zhao, Chun-Mei; Hayek, John C.
2009-01-01
Data mining provides both systematic and systemic ways to detect patterns of student engagement among students at hundreds of institutions. Using traditional statistical techniques alone, the task would be significantly difficult--if not impossible--considering the size and complexity in both data and analytical approaches necessary for this…
Large Scale Data Mining to Improve Usability of Data: An Intelligent Archive Testbed
NASA Technical Reports Server (NTRS)
Ramapriyan, Hampapuram; Isaac, David; Yang, Wenli; Morse, Steve
2005-01-01
Research in certain scientific disciplines - including Earth science, particle physics, and astrophysics - continually faces the challenge that the volume of data needed to perform valid scientific research can at times overwhelm even a sizable research community. The desire to improve utilization of this data gave rise to the Intelligent Archives project, which seeks to make data archives active participants in a knowledge building system capable of discovering events or patterns that represent new information or knowledge. Data mining can automatically discover patterns and events, but it is generally viewed as unsuited for large-scale use in disciplines like Earth science that routinely involve very high data volumes. Dozens of research projects have shown promising uses of data mining in Earth science, but all of these are based on experiments with data subsets of a few gigabytes or less, rather than the terabytes or petabytes typically encountered in operational systems. To bridge this gap, the Intelligent Archives project is establishing a testbed with the goal of demonstrating the use of data mining techniques in an operationally-relevant environment. This paper discusses the goals of the testbed and the design choices surrounding critical issues that arose during testbed implementation.
Sand mining impacts on long-term dune erosion in southern Monterey Bay
Thornton, E.B.; Sallenger, Abby; Sesto, Juan Conforto; Egley, L.; McGee, Timothy; Parsons, Rost
2006-01-01
Southern Monterey Bay was the most intensively mined shoreline (with sand removed directly from the surf zone) in the U.S. during the period from 1906 until 1990, when the mines were closed following hypotheses that the mining caused coastal erosion. It is estimated that the yearly averaged amount of mined sand between 1940 and 1984 was 128,000 m³/yr, which is approximately 50% of the yearly average dune volume loss during this period. To assess the impact of sand mining, erosion rates along an 18 km range of shoreline during the times of intensive sand mining (1940–1990) are compared with the rates after sand mining ceased (1990–2004). Most of the shoreline is composed of unconsolidated sand with extensive sand dunes rising up to a height of 46 m, vulnerable to the erosive forces of storm waves. Erosion is defined here as a recession of the top edge of the dune. Recession was determined using stereo-photogrammetry, and LIDAR and GPS surveys. Long-term erosion rates vary from about 0.5 m/yr at Monterey to 1.5 m/yr in the middle of the range, and then decrease northward. Erosion events are episodic and occur when storm waves and high tides coincide, allowing swash to undercut the dune and resulting in permanent recession. Erosion appears to be correlated with the occurrence of El Niños. The calculated volume loss of the dune in southern Monterey Bay during the 1997–98 El Niño winter was 1,820,000 m³, which is almost seven times the historical annual mean dune erosion of 270,000 m³/yr. The alongshore variation in recession rates appears to be a function of the alongshore gradient in mean wave energy and depletions by sand mining. After cessation of sand mining in 1990, the erosion rates decreased at locations in the southern end of the bay but have not significantly changed at other locations.
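The "almost seven times" comparison in the abstract above can be checked directly from the two volumes it quotes. The short sketch below does only that arithmetic; the variable names are illustrative.

```python
# Volumes quoted in the abstract, in cubic metres.
el_nino_winter_loss = 1_820_000   # dune volume loss, 1997-98 El Nino winter
annual_mean_erosion = 270_000     # historical annual mean dune erosion (per year)

# Ratio of the single-winter loss to the long-term annual mean.
ratio = el_nino_winter_loss / annual_mean_erosion
print(f"El Nino winter loss is {ratio:.2f} times the annual mean")
```

The ratio comes out a little under 7, consistent with the wording of the abstract.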
Text mining and medicine: usefulness in respiratory diseases.
Piedra, David; Ferrer, Antoni; Gea, Joaquim
2014-03-01
It is increasingly common to have medical information in electronic format. This includes scientific articles as well as clinical management reviews, and even records from health institutions with patient data. However, traditional instruments, both individual and institutional, are of little use for selecting the most appropriate information in each case, either in the clinical or research field. So-called text or data «mining» enables this huge amount of information to be managed, extracting it from various sources using processing systems (filtration and curation), integrating it and permitting the generation of new knowledge. This review aims to provide an overview of text and data mining, and of the potential usefulness of this bioinformatic technique in the exercise of care in respiratory medicine and in research in the same field. Copyright © 2013 SEPAR. Published by Elsevier Espana. All rights reserved.
Agile text mining for the 2014 i2b2/UTHealth Cardiac risk factors challenge.
Cormack, James; Nath, Chinmoy; Milward, David; Raja, Kalpana; Jonnalagadda, Siddhartha R
2015-12-01
This paper describes the use of an agile text mining platform (Linguamatics' Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors in patient records as defined in the i2b2/UTHealth 2014 challenge. The approach uses a data-driven rule-based methodology with the addition of a simple supervised classifier. We demonstrate that agile text mining allows for rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluation of this approach on the test data gave an F-Score of 91.7%, one percent behind the top performing system. Copyright © 2015 Elsevier Inc. All rights reserved.
Overview of the gene ontology task at BioCreative IV.
Mao, Yuqing; Van Auken, Kimberly; Li, Donghui; Arighi, Cecilia N; McQuilton, Peter; Hayman, G Thomas; Tweedie, Susan; Schaeffer, Mary L; Laulederkind, Stanley J F; Wang, Shur-Jen; Gobeill, Julien; Ruch, Patrick; Luu, Anh Tuan; Kim, Jung-Jae; Chiang, Jung-Hsien; Chen, Yu-De; Yang, Chia-Jung; Liu, Hongfang; Zhu, Dongqing; Li, Yanpeng; Yu, Hong; Emadzadeh, Ehsan; Gonzalez, Graciela; Chen, Jian-Ming; Dai, Hong-Jie; Lu, Zhiyong
2014-01-01
Gene ontology (GO) annotation is a common task among model organism databases (MODs) for capturing gene function data from journal articles. It is a time-consuming and labor-intensive task, and is thus often considered as one of the bottlenecks in literature curation. There is a growing need for semiautomated or fully automated GO curation techniques that will help database curators to rapidly and accurately identify gene function information in full-length articles. Despite multiple attempts in the past, few studies have proven to be useful with regard to assisting real-world GO curation. The shortage of sentence-level training data and opportunities for interaction between text-mining developers and GO curators has limited the advances in algorithm development and corresponding use in practical circumstances. To this end, we organized a text-mining challenge task for literature-based GO annotation in BioCreative IV. More specifically, we developed two subtasks: (i) to automatically locate text passages that contain GO-relevant information (a text retrieval task) and (ii) to automatically identify relevant GO terms for the genes in a given article (a concept-recognition task). With the support from five MODs, we provided teams with >4000 unique text passages that served as the basis for each GO annotation in our task data. Such evidence text information has long been recognized as critical for text-mining algorithm development but was never made available because of the high cost of curation. In total, seven teams participated in the challenge task. From the team results, we conclude that the state of the art in automatically mining GO terms from literature has improved over the past decade while much progress is still needed for computer-assisted GO curation. 
Future work should focus on addressing remaining technical challenges for improved performance of automatic GO concept recognition and incorporating practical benefits of text-mining tools into real-world GO annotation. http://www.biocreative.org/tasks/biocreative-iv/track-4-GO/. Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.
PPInterFinder—a mining tool for extracting causal relations on human proteins from literature
Raja, Kalpana; Subramani, Suresh; Natarajan, Jeyakumar
2013-01-01
One of the most common and challenging problems in biomedical text mining is to mine protein–protein interactions (PPIs) from MEDLINE abstracts and full-text research articles, because PPIs play a major role in understanding the various biological processes and the impact of proteins in diseases. We implemented PPInterFinder, a web-based text mining tool to extract human PPIs from biomedical literature. PPInterFinder uses relation keyword co-occurrences with protein names to extract information on PPIs from MEDLINE abstracts and consists of three phases. First, it identifies the relation keyword using a parser with Tregex and a relation keyword dictionary. Next, it automatically identifies the candidate PPI pairs with a set of rules related to PPI recognition. Finally, it extracts the relations by matching the sentence with a set of 11 specific patterns based on the syntactic nature of the PPI pair. We find that PPInterFinder is capable of predicting PPIs with an accuracy of 66.05% on the AIMED corpus and outperforms most of the existing systems. Database URL: http://www.biomining-bu.in/ppinterfinder/ PMID:23325628
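The general idea of relation-keyword co-occurrence described above can be sketched as follows. This is a deliberately simplified illustration, not PPInterFinder itself: it uses plain token matching instead of a Tregex parser, omits the 11 syntactic patterns, and the protein and keyword dictionaries are hypothetical.

```python
import re
from itertools import combinations

# Hypothetical dictionaries for illustration only.
PROTEINS = {"BRCA1", "BARD1", "TP53"}
RELATION_KEYWORDS = {"binds", "interacts", "phosphorylates"}


def candidate_ppis(text):
    """Return (protein, keyword, protein) triples for every sentence that
    contains at least two known protein names and a relation keyword."""
    results = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = re.findall(r"[A-Za-z0-9]+", sentence)
        proteins = sorted({t for t in tokens if t in PROTEINS})
        keywords = [t.lower() for t in tokens if t.lower() in RELATION_KEYWORDS]
        if len(proteins) >= 2 and keywords:
            for first, second in combinations(proteins, 2):
                results.append((first, keywords[0], second))
    return results


print(candidate_ppis("BRCA1 interacts with BARD1. TP53 is a tumour suppressor."))
```

A real system would add the rule-based filtering and pattern matching steps that the abstract describes, since bare co-occurrence produces many false positives.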
Topaz, Maxim; Radhakrishnan, Kavita; Lei, Victor; Zhou, Li
2016-01-01
Effective self-management can decrease up to 50% of heart failure hospitalizations. Unfortunately, self-management by patients with heart failure remains poor. This pilot study aimed to explore the use of text mining to identify heart failure patients with ineffective self-management. We first built a comprehensive self-management vocabulary based on the literature and a review of clinical notes. We then randomly selected 545 heart failure patients treated within Partners Healthcare hospitals (Boston, MA, USA) and conducted a regular expression search with the compiled vocabulary within 43,107 interdisciplinary clinical notes of these patients. We found that 38.2% (n = 208) of patients had documentation of ineffective heart failure self-management in the domains of poor diet adherence (28.4%), missed medical encounters (26.4%), poor medication adherence (20.2%) and non-specified self-management issues (e.g., "compliance issues", 34.6%). We showed the feasibility of using text mining to identify patients with ineffective self-management. More natural language processing algorithms are needed to help busy clinicians identify these patients.
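A vocabulary-driven regular expression search of the kind described above might look like the sketch below. The domains mirror those named in the abstract, but every term in the vocabulary and the function name are hypothetical; the study's actual vocabulary is not given here.

```python
import re

# Hypothetical vocabulary: domain -> phrases suggesting ineffective self-management.
VOCABULARY = {
    "diet": ["non-adherent to diet", "dietary indiscretion"],
    "medication": ["missed doses", "not taking medications"],
    "encounters": ["no show", "missed appointment"],
}

# One case-insensitive pattern per domain; phrases are escaped so they
# are matched literally rather than interpreted as regex syntax.
PATTERNS = {
    domain: re.compile("|".join(re.escape(term) for term in terms), re.IGNORECASE)
    for domain, terms in VOCABULARY.items()
}


def flag_note(note):
    """Return the sorted list of domains whose vocabulary matches the note."""
    return sorted(domain for domain, pattern in PATTERNS.items() if pattern.search(note))


print(flag_note("Patient reports Missed Doses and a missed appointment last week."))
```

Literal phrase matching like this does not handle negation ("no missed doses"), which is one reason the authors call for more natural language processing support.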
Integration of Artificial Market Simulation and Text Mining for Market Analysis
NASA Astrophysics Data System (ADS)
Izumi, Kiyoshi; Matsui, Hiroki; Matsuo, Yutaka
We constructed a system for evaluating self-impact in a financial market using an artificial market and text-mining technology. Economic trends were first extracted from text data circulating in the real world. The trends were then input into the market simulation. Our simulation revealed that an operation by intervention could reduce over 70% of rate fluctuation in 1995. Using the simulation results, the system was able to help its user find an exchange policy that can stabilize the yen-dollar rate.
COAL PREPARATION PLANT COMPUTER MODEL: VOLUME I. USER DOCUMENTATION
The two-volume report describes a steady state modeling system that simulates the performance of coal preparation plants. The system was developed originally under the technical leadership of the U.S. Bureau of Mines and the sponsorship of the EPA. The modified form described in ...
40 CFR 440.141 - Specialized definitions and provisions.
Code of Federal Regulations, 2011 CFR
2011-07-01
... shaking tables. (7) “Infiltration water” means that water which permeates through the earth into the plant... drainage, and infiltration and drainage waters which commingle with mine drainage or waters resulting from... increase in volume from precipitation or infiltration, plus the maximum volume of water runoff resulting...
40 CFR 440.141 - Specialized definitions and provisions.
Code of Federal Regulations, 2014 CFR
2014-07-01
..., hydrocyclones, or shaking tables. (7) “Infiltration water” means that water which permeates through the earth... drainage, and infiltration and drainage waters which commingle with mine drainage or waters resulting from... increase in volume from precipitation or infiltration, plus the maximum volume of water runoff resulting...
40 CFR 440.141 - Specialized definitions and provisions.
Code of Federal Regulations, 2013 CFR
2013-07-01
..., hydrocyclones, or shaking tables. (7) “Infiltration water” means that water which permeates through the earth... drainage, and infiltration and drainage waters which commingle with mine drainage or waters resulting from... increase in volume from precipitation or infiltration, plus the maximum volume of water runoff resulting...
40 CFR 440.141 - Specialized definitions and provisions.
Code of Federal Regulations, 2012 CFR
2012-07-01
..., hydrocyclones, or shaking tables. (7) “Infiltration water” means that water which permeates through the earth... drainage, and infiltration and drainage waters which commingle with mine drainage or waters resulting from... increase in volume from precipitation or infiltration, plus the maximum volume of water runoff resulting...
40 CFR 440.141 - Specialized definitions and provisions.
Code of Federal Regulations, 2010 CFR
2010-07-01
... shaking tables. (7) “Infiltration water” means that water which permeates through the earth into the plant... drainage, and infiltration and drainage waters which commingle with mine drainage or waters resulting from... increase in volume from precipitation or infiltration, plus the maximum volume of water runoff resulting...
BioC implementations in Go, Perl, Python and Ruby
Liu, Wanli; Islamaj Doğan, Rezarta; Kwon, Dongseop; Marques, Hernani; Rinaldi, Fabio; Wilbur, W. John; Comeau, Donald C.
2014-01-01
As part of a communitywide effort for evaluating text mining and information extraction systems applied to the biomedical domain, BioC is focused on the goal of interoperability, currently a major barrier to wide-scale adoption of text mining tools. BioC is a simple XML format, specified by DTD, for exchanging data for biomedical natural language processing. With initial implementations in C++ and Java, BioC provides libraries of code for reading and writing BioC text documents and annotations. We extend BioC to Perl, Python, Go and Ruby. We used SWIG to extend the C++ implementation for Perl and one Python implementation. A second Python implementation and the Ruby implementation use native data structures and libraries. BioC is also implemented in the Google language Go. BioC modules are functional in all of these languages, which can facilitate text mining tasks. BioC implementations are freely available through the BioC site: http://bioc.sourceforge.net. Database URL: http://bioc.sourceforge.net/ PMID:24961236
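To make the exchange format concrete, the sketch below reads a small BioC-style XML collection with Python's standard library only. It does not use any of the BioC libraries described in the abstract; the element names (`collection`, `document`, `passage`, `annotation`, `location`) reflect our understanding of the BioC DTD, and the sample document and the `read_annotations` helper are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the BioC layout (assumed element names; not taken from the BioC site).
SAMPLE = """<collection>
  <source>PubMed</source>
  <date>20140101</date>
  <key>example.key</key>
  <document>
    <id>24961236</id>
    <passage>
      <offset>0</offset>
      <text>BioC is a simple XML format.</text>
      <annotation id="T1">
        <infon key="type">format</infon>
        <location offset="17" length="3"/>
        <text>XML</text>
      </annotation>
    </passage>
  </document>
</collection>"""


def read_annotations(xml_text):
    """Return (document id, annotation id, covered text) for every annotation."""
    root = ET.fromstring(xml_text)
    rows = []
    for doc in root.iter("document"):
        doc_id = doc.findtext("id")
        for passage in doc.iter("passage"):
            # Annotation offsets are document-level, so subtract the passage offset.
            base = int(passage.findtext("offset"))
            text = passage.findtext("text") or ""
            for ann in passage.iter("annotation"):
                loc = ann.find("location")
                start = int(loc.get("offset")) - base
                length = int(loc.get("length"))
                rows.append((doc_id, ann.get("id"), text[start:start + length]))
    return rows


if __name__ == "__main__":
    for row in read_annotations(SAMPLE):
        print(row)
```

Recovering the annotated span from the offset and length, rather than trusting the annotation's own `text` element, is a cheap consistency check when exchanging files between tools.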
PathText: a text mining integrator for biological pathway visualizations
Kemper, Brian; Matsuzaki, Takuya; Matsuoka, Yukiko; Tsuruoka, Yoshimasa; Kitano, Hiroaki; Ananiadou, Sophia; Tsujii, Jun'ichi
2010-01-01
Motivation: Metabolic and signaling pathways are an increasingly important part of organizing knowledge in systems biology. They serve to integrate collective interpretations of facts scattered throughout literature. Biologists construct a pathway by reading a large number of articles and interpreting them as a consistent network, but most of the models constructed currently lack direct links to those articles. Biologists who want to check the original articles have to spend substantial amounts of time to collect relevant articles and identify the sections relevant to the pathway. Furthermore, with the scientific literature expanding by several thousand papers per week, keeping a model relevant requires a continuous curation effort. In this article, we present a system designed to integrate a pathway visualizer, text mining systems and annotation tools into a seamless environment. This will enable biologists to freely move between parts of a pathway and relevant sections of articles, as well as identify relevant papers from large text bases. The system, PathText, is developed by Systems Biology Institute, Okinawa Institute of Science and Technology, National Centre for Text Mining (University of Manchester) and the University of Tokyo, and is being used by groups of biologists from these locations. Contact: brian@monrovian.com. PMID:20529930
@Note: a workbench for biomedical text mining.
Lourenço, Anália; Carreira, Rafael; Carneiro, Sónia; Maia, Paulo; Glez-Peña, Daniel; Fdez-Riverola, Florentino; Ferreira, Eugénio C; Rocha, Isabel; Rocha, Miguel
2009-08-01
Biomedical Text Mining (BioTM) is providing valuable approaches to the automated curation of scientific literature. However, most efforts have addressed the benchmarking of new algorithms rather than user operational needs. Bridging the gap between BioTM researchers and biologists' needs is crucial to solve real-world problems and promote further research. We present @Note, a platform for BioTM that aims at the effective translation of the advances between three distinct classes of users: biologists, text miners and software developers. Its main functional contributions are the ability to process abstracts and full-texts; an information retrieval module enabling PubMed search and journal crawling; a pre-processing module with PDF-to-text conversion, tokenisation and stopword removal; a semantic annotation schema; a lexicon-based annotator; a user-friendly annotation view that allows annotations to be corrected; and a Text Mining Module supporting dataset preparation and algorithm evaluation. @Note improves interoperability, modularity and flexibility when integrating in-house and open-source third-party components. Its component-based architecture allows the rapid development of new applications, emphasizing the principles of transparency and simplicity of use. Although development is still ongoing, it has already allowed the development of applications that are currently being used.
miRTex: A Text Mining System for miRNA-Gene Relation Extraction
Li, Gang; Ross, Karen E.; Arighi, Cecilia N.; Peng, Yifan; Wu, Cathy H.; Vijay-Shanker, K.
2015-01-01
MicroRNAs (miRNAs) regulate a wide range of cellular and developmental processes through gene expression suppression or mRNA degradation. Experimentally validated miRNA gene targets are often reported in the literature. In this paper, we describe miRTex, a text mining system that extracts miRNA-target relations, as well as miRNA-gene and gene-miRNA regulation relations. The system achieves good precision and recall when evaluated on a literature corpus of 150 abstracts with F-scores close to 0.90 on the three different types of relations. We conducted full-scale text mining using miRTex to process all the Medline abstracts and all the full-length articles in the PubMed Central Open Access Subset. The results for all the Medline abstracts are stored in a database for interactive query and file download via the website at http://proteininformationresource.org/mirtex. Using miRTex, we identified genes potentially regulated by miRNAs in Triple Negative Breast Cancer, as well as miRNA-gene relations that, in conjunction with kinase-substrate relations, regulate the response to abiotic stress in Arabidopsis thaliana. These two use cases demonstrate the usefulness of miRTex text mining in the analysis of miRNA-regulated biological processes. PMID:26407127
Cañada, Andres; Rabal, Obdulia; Oyarzabal, Julen; Valencia, Alfonso
2017-01-01
Considerable effort has been devoted to systematically retrieving information on genes and proteins as well as the relationships between them. Despite the importance of chemical compounds and drugs as a central bio-entity in pharmacological and biological research, only a limited number of freely available chemical text-mining/search engine technologies are currently accessible. Here we present LimTox (Literature Mining for Toxicology), a web-based online biomedical search tool with special focus on adverse hepatobiliary reactions. It integrates a range of text mining, named entity recognition and information extraction components. LimTox relies on machine-learning, rule-based, pattern-based and term lookup strategies. This system processes scientific abstracts, a set of full text articles and medical agency assessment reports. Although the main focus of LimTox is on adverse liver events, it also enables basic searches for other organ level toxicity associations (nephrotoxicity, cardiotoxicity, thyrotoxicity and phospholipidosis). This tool supports specialized search queries for: chemical compounds/drugs, genes (with additional emphasis on key enzymes in drug metabolism, namely P450 cytochromes, CYPs) and biochemical liver markers. The LimTox website is free and open to all users and there is no login requirement. LimTox can be accessed at: http://limtox.bioinfo.cnio.es PMID:28531339
Text mining a self-report back-translation.
Blanch, Angel; Aluja, Anton
2016-06-01
There are several recommendations about the routine to undertake when back-translating self-report instruments in cross-cultural research. However, text mining methods have been generally ignored within this field. This work describes an innovative text mining application used to adapt a personality questionnaire to 12 different languages. The method is divided into 3 stages: a descriptive analysis of the available back-translated instrument versions, a dissimilarity assessment between the source language instrument and the 12 back-translations, and an item-level assessment of meaning equivalence. The suggested method helps to improve the back-translation process of self-report instruments for cross-cultural research in 2 significant intertwined ways. First, it defines a systematic approach to the back-translation issue, allowing for a more orderly and informed evaluation concerning the equivalence of different versions of the same instrument in different languages. Second, it provides more accurate instrument back-translations, which has direct implications for the reliability and validity of the instrument's test scores when used in different cultures/languages. In addition, this procedure can be extended to the back-translation of self-reports measuring psychological constructs in clinical assessment. Future research works could refine the suggested methodology and use additional available text mining tools. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Elayavilli, Ravikumar Komandur; Liu, Hongfang
2016-01-01
Computational modeling of biological cascades is of great interest to quantitative biologists. Biomedical text has been a rich source of quantitative information. Gathering quantitative parameters and values from biomedical text is one significant challenge in the early steps of computational modeling as it involves huge manual effort. While automatically extracting such quantitative information from biomedical text may offer some relief, the lack of an ontological representation for a subdomain is an impediment to normalizing textual extractions to a standard representation. This may render textual extractions less meaningful to the domain experts. In this work, we propose a rule-based approach to automatically extract relations involving quantitative data from biomedical text describing ion channel electrophysiology. We further translated the quantitative assertions extracted through text mining to a formal representation that may help in constructing an ontology for ion channel events using a rule-based approach. We have developed the Ion Channel ElectroPhysiology Ontology (ICEPO) by integrating the information represented in closely related ontologies such as the Cell Physiology Ontology (CPO) and the Cardiac Electro Physiology Ontology (CPEO) with the knowledge provided by domain experts. The rule-based system achieved an overall F-measure of 68.93% in extracting the quantitative data assertions on an independently annotated blind data set. We further made an initial attempt at formalizing the quantitative data assertions extracted from the biomedical text into a formal representation that offers potential to facilitate the integration of text mining into ontological workflow, a novel aspect of this study. This work is a case study where we created a platform that provides formal interaction between ontology development and text mining. 
We have achieved partial success in extracting quantitative assertions from the biomedical text and formalizing them in ontological framework. The ICEPO ontology is available for download at http://openbionlp.org/mutd/supplementarydata/ICEPO/ICEPO.owl.
ERIC Educational Resources Information Center
Yeakey, Carol Camp, Ed.; Henderson, Ronald D., Ed.
This volume includes papers 16-32 in a 32-paper collection: (16) "Mining the Fields of Teacher Education: Preparing Teachers to Teach African American Children in Urban Schools" (Patricia A. Edwards, Gwendolyn T. McMillon, and Clifford T. Bennett); (17) "Mentoring Adolescents At Risk or At Promise" (Tammie M. Causey and Kassie…
Mining large heterogeneous data sets in drug discovery.
Wild, David J
2009-10-01
Increasingly, effective drug discovery involves the searching and data mining of large volumes of information from many sources covering the domains of chemistry, biology and pharmacology amongst others. This has led to a proliferation of databases and data sources relevant to drug discovery. This paper provides a review of the publicly-available large-scale databases relevant to drug discovery, describes the kinds of data mining approaches that can be applied to them and discusses recent work in integrative data mining that looks for associations that span multiple sources, including the use of Semantic Web techniques. The future of mining large data sets for drug discovery requires intelligent, semantic aggregation of information from all of the data sources described in this review, along with the application of advanced methods such as intelligent agents and inference engines in client applications.
Flooded Underground Coal Mines: A Significant Source of Inexpensive Geothermal Energy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Watzlaf, G.R.; Ackman, T.E.
2007-04-01
Many mining regions in the United States contain extensive areas of flooded underground mines. The water within these mines represents a significant and widespread opportunity for extracting low-grade, geothermal energy. Based on current energy prices, geothermal heat pump systems using mine water could reduce the annual costs for heating to over 70 percent compared to conventional heating methods (natural gas or heating oil). These same systems could reduce annual cooling costs by up to 50 percent over standard air conditioning in many areas of the country. (Formatted full-text version is released by permission of publisher)
Mining Quality Phrases from Massive Text Corpora
Liu, Jialu; Shang, Jingbo; Wang, Chi; Ren, Xiang; Han, Jiawei
2015-01-01
Text data are ubiquitous and play an essential role in big data applications. However, text data are mostly unstructured. Transforming unstructured text into structured units (e.g., semantically meaningful phrases) will substantially reduce semantic ambiguity and enhance the power and efficiency at manipulating such data using database technology. Thus mining quality phrases is a critical research problem in the field of databases. In this paper, we propose a new framework that extracts quality phrases from text corpora integrated with phrasal segmentation. The framework requires only limited training but the quality of phrases so generated is close to human judgment. Moreover, the method is scalable: both computation time and required space grow linearly as corpus size increases. Our experiments on large text corpora demonstrate the quality and efficiency of the new method. PMID:26705375
Chapter 9: Planting hardwood tree seedlings on reclaimed mine land in the Appalachian region
V. Davis; J. Franklin; C. Zipper; P. Angel
2017-01-01
The Forestry Reclamation Approach (FRA) is a method of reclaiming surface coal mines to forested postmining land use (Chapter 2, this volume). "Use proper tree planting techniques" is Step 5 of the FRA; when used with the other FRA steps, proper tree planting can help to ensure successful reforestation. Proper care and planting of tree seedlings is essential...
Cold Treatment of Parts of ESh-20.90S Dragline for Mining Applications at the Uralmashzavod Plant
NASA Astrophysics Data System (ADS)
Krutikova, I. A.; Pupyrev, M. B.; Zakharenko, S. N.; Tikhonova, L. A.
2015-05-01
Results of laboratory and industrial testing of the effect of cold treatment on the structure, hardness and size stability of parts such as spools and sleeves entering the hydraulic drive of an ESh-20.90S dragline (bucket volume 20 m³, boom length 90 m) for mining applications under the conditions of the Far North are presented.
Hardwood tree growth on amended mine soils in West Virginia.
Wilson-Kokes, Lindsay; Delong, Curtis; Thomas, Calene; Emerson, Paul; O'Dell, Keith; Skousen, Jeff
2013-09-01
Each year surface mining in Appalachia disrupts large areas of forested land. The Surface Mining Control and Reclamation Act requires coal mine operators to establish a permanent vegetative cover after mining, and current practice emphasizes soil compaction and planting of competitive forage grasses to stabilize the site and control erosion. These practices hinder recolonization of native hardwood trees on these reclaimed sites. Recently reclamation scientists and regulators have encouraged re-establishment of hardwood forests on surface mined land through careful selection and placement of rooting media and proper selection and planting of herbaceous and tree species. To evaluate the effect of rooting media and soil amendments, a 2.8-ha experimental plot was established, with half of the plot being constructed of weathered brown sandstone and half constructed of unweathered gray sandstone. Bark mulch was applied to an area covering both sandstone types, and the ends of the plot were hydroseeded with a tree-compatible herbaceous seed mix, resulting in eight soil treatments. Twelve hardwood tree species were planted, and soil chemical properties and tree growth were measured annually from 2007 to 2012. After six growing seasons, average tree volume index was higher for trees grown on brown sandstone (5333 cm³) compared with gray sandstone (3031 cm³). Trees planted in mulch outperformed trees on nonmulched treatments (volume index of 6187 cm³ vs. 4194 cm³). Hydroseeding with a tree-compatible mix produced greater ground cover (35 vs. 15%) and resulted in greater tree volume index than nonhydroseed areas (5809 vs. 3403 cm³). Soil chemical properties were improved by mulch and improved tree growth, especially on gray sandstone. The average pH of brown sandstone was 5.0 to 5.4, and gray sandstone averaged pH 6.9 to 7.7. The mulch treatment on gray sandstone resulted in tree growth similar to brown sandstone alone and with mulch. 
After 6 yr, tree growth on brown sandstone was about double the tree growth on gray sandstone, and mulch was a successful amendment to improve tree growth. Copyright © by the American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America, Inc.
Knowledge Discovery and Data Mining in Iran's Climatic Researches
NASA Astrophysics Data System (ADS)
Karimi, Mostafa
2013-04-01
Advances in measurement technology and data collection have made databases larger. Large databases require powerful tools for data analysis. The iterative process of acquiring knowledge from information obtained through data processing is carried out in various forms in all scientific fields. However, when the data volume is large, traditional methods cannot handle many of the problems. In recent years, the use of databases has expanded in various scientific fields, especially atmospheric databases in climatology. In addition, the growing amount of data generated by climate models makes it challenging to analyze the data and extract hidden patterns and knowledge. The approach adopted for this problem in recent years is the process of knowledge discovery and data mining, which draws on concepts from machine learning, artificial intelligence and expert (professional) systems. Data mining is an analytical process for exploring massive volumes of data. The ultimate goal of data mining is access to information and, finally, knowledge. Climatology is a branch of science that uses varied and massive volumes of data. The goal of climate data mining is to obtain information from varied and massive atmospheric and non-atmospheric data. In fact, knowledge discovery performs these activities in a logical, predetermined and almost automatic process. The goal of this research is to study the use of knowledge discovery and data mining techniques in Iranian climate research. To achieve this goal, the studies were examined by content (descriptive) analysis and classified by method and issue. The results show that, in climatic research on Iran, clustering (k-means and Ward's method) was applied most often, and that precipitation and atmospheric circulation patterns were the most frequently addressed issues. 
Although several studies of geography and climate issues have used statistical techniques such as clustering and pattern extraction, given the nature of statistics and data mining it cannot be said that data mining and knowledge discovery techniques are used in domestic climate studies. However, it is necessary to use the KDD approach and DM techniques in climatic studies, specifically for interpreting climate modeling results.
Systematic drug repositioning through mining adverse event data in ClinicalTrials.gov.
Su, Eric Wen; Sanger, Todd M
2017-01-01
Drug repositioning (i.e., drug repurposing) is the process of discovering new uses for marketed drugs. Historically, such discoveries were serendipitous. However, the rapid growth in electronic clinical data and text mining tools makes it feasible to systematically identify drugs with the potential to be repurposed. Described here is a novel method of drug repositioning by mining ClinicalTrials.gov. The text mining tools I2E (Linguamatics) and PolyAnalyst (Megaputer) were utilized. An I2E query extracts "Serious Adverse Events" (SAE) data from randomized trials in ClinicalTrials.gov. Through a statistical algorithm, a PolyAnalyst workflow ranks the drugs where the treatment arm has fewer predefined SAEs than the control arm, indicating that potentially the drug is reducing the level of SAE. Hypotheses could then be generated for the new use of these drugs based on the predefined SAE that is indicative of disease (for example, cancer).
Matsuda, Yoshio; Manaka, Tomoko; Kobayashi, Makiko; Sato, Shuhei; Ohwada, Michitaka
2016-06-01
The aim of the present study was to examine the possibility of screening apprehensive pregnant women and mothers at risk for post-partum depression from an analysis of the textual data in the Mother and Child Handbook by using the text-mining method. Uncomplicated pregnant women (n = 58) were divided into two groups according to State-Trait Anxiety Inventory grade (high trait [group I, n = 21] and low trait [group II, n = 37]) or Edinburgh Postnatal Depression Scale score (high score [group III, n = 15] and low score [group IV, n = 43]). An exploratory analysis of the textual data from the Maternal and Child Handbook was conducted using the text-mining method with the Word Miner software program. The 'structure elements' were compared between the two groups. The number of structure elements extracted as separated words from the text data was 20 004, and the number of structure elements with a threshold of 2 or more as an initial value was 1168. Fifteen key words related to maternal anxiety and six key words related to post-partum depression were extracted. The text-mining method is useful for the exploratory analysis of textual data obtained from pregnant women, and the results suggest that this screening method is useful for apprehensive pregnant women and mothers at risk for post-partum depression. © 2016 Japan Society of Obstetrics and Gynecology.
An Integrated Suite of Text and Data Mining Tools - Phase II
2005-08-30
Riverside, CA, USA Mazda Motor Corp, Jpn Univ of Darmstadt, Darmstadt, Ger Navy Center for Applied Research in Artificial Intelligence Univ of...with Georgia Tech Research Corporation developed a desktop text-mining software tool named TechOASIS (known commercially as VantagePoint). By the...of this dataset and groups the Corporate Source items that co-occur with the found items. He decides he is only interested in the institutions
He, Qiwei; Veldkamp, Bernard P; Glas, Cees A W; de Vries, Theo
2017-03-01
Patients' narratives about traumatic experiences and symptoms are useful in clinical screening and diagnostic procedures. In this study, we presented an automated assessment system to screen patients for posttraumatic stress disorder via a natural language processing and text-mining approach. Four machine-learning algorithms (decision tree, naive Bayes, support vector machine, and an alternative classification approach called the product score model) were used in combination with n-gram representation models to identify patterns between verbal features in self-narratives and psychiatric diagnoses. With our sample, the product score model with unigrams attained the highest prediction accuracy when compared with practitioners' diagnoses. The addition of multigrams contributed most to balancing the metrics of sensitivity and specificity. This article also demonstrates that text mining is a promising approach for analyzing patients' self-expression behavior, thus helping clinicians identify potential patients from an early stage.
Using ontology network structure in text mining.
Berndt, Donald J; McCart, James A; Luther, Stephen L
2010-11-13
Statistical text mining treats documents as bags of words, with a focus on term frequencies within documents and across document collections. Unlike natural language processing (NLP) techniques that rely on an engineered vocabulary or a full-featured ontology, statistical approaches do not make use of domain-specific knowledge. The freedom from biases can be an advantage, but at the cost of ignoring potentially valuable knowledge. The approach proposed here investigates a hybrid strategy based on computing graph measures of term importance over an entire ontology and injecting the measures into the statistical text mining process. As a starting point, we adapt existing search engine algorithms such as PageRank and HITS to determine term importance within an ontology graph. The graph-theoretic approach is evaluated using a smoking data set from the i2b2 National Center for Biomedical Computing, cast as a simple binary classification task for categorizing smoking-related documents, demonstrating consistent improvements in accuracy.
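As a rough illustration of the hybrid idea in this abstract, the sketch below runs a plain power-iteration PageRank over a toy ontology graph and uses the resulting scores to reweight bag-of-words term counts. The toy graph, the `weight_terms` helper and the choice to multiply counts by rank are assumptions made for illustration; the abstract does not specify how the graph measures were injected into the text mining process.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping node -> list of linked nodes.

    Every link target must also appear as a key. Nodes with no outgoing
    links (dangling nodes) spread their rank evenly over all nodes.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            recipients = targets if targets else nodes
            share = damping * rank[node] / len(recipients)
            for target in recipients:
                new_rank[target] += share
        rank = new_rank
    return rank


# Toy ontology fragment (hypothetical): edges point from narrower to broader terms.
ONTOLOGY = {
    "nicotine patch": ["smoking cessation"],
    "smoking cessation": ["tobacco use"],
    "cigarette": ["tobacco use"],
    "tobacco use": [],
}


def weight_terms(term_counts, rank, default=0.0):
    """Scale raw term frequencies by ontology importance (one possible injection scheme)."""
    return {term: count * rank.get(term, default) for term, count in term_counts.items()}


if __name__ == "__main__":
    ranks = pagerank(ONTOLOGY)
    print(sorted(ranks.items(), key=lambda item: -item[1]))
    print(weight_terms({"cigarette": 3, "tobacco use": 1, "weather": 2}, ranks))
```

Terms outside the ontology receive the `default` weight, so setting it above zero keeps purely statistical features in play alongside the ontology-informed ones.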
Sluis-Cremer, G. K.; Walters, L. G.; Sichel, H. S.
1967-01-01
The ventilatory capacity of a random sample of men over the age of 35 years in the town of Carletonville was estimated by the forced expiratory volume and the peak expiratory flow rate. Five hundred and sixty-two persons were working or had worked in gold-mines and 265 had never worked in gold-mines. No difference in ventilatory function was found between the miners and non-miners other than that due to the excess of chronic bronchitis in miners. PMID:6017134
PubMed-EX: a web browser extension to enhance PubMed search with text mining features.
Tsai, Richard Tzong-Han; Dai, Hong-Jie; Lai, Po-Ting; Huang, Chi-Hsin
2009-11-15
PubMed-EX is a browser extension that marks up PubMed search results with additional text-mining information. PubMed-EX's page mark-up, which includes section categorization and gene/disease and relation mark-up, can help researchers to quickly focus on key terms and provide additional information on them. All text processing is performed server-side, freeing up user resources. PubMed-EX is freely available at http://bws.iis.sinica.edu.tw/PubMed-EX and http://iisr.cse.yzu.edu.tw:8000/PubMed-EX/.
Arjunan, Satya Nanda Vel; Tomita, Masaru
2010-03-01
Many important cellular processes are regulated by reaction-diffusion (RD) of molecules that takes place both in the cytoplasm and on the membrane. To model and analyze such multicompartmental processes, we developed a lattice-based Monte Carlo method, Spatiocyte, that supports RD in volume and surface compartments at single molecule resolution. Stochasticity in RD and the excluded volume effect brought by intracellular molecular crowding, both of which can significantly affect RD and thus cellular processes, are also supported. We verified the method by comparing simulation results of diffusion, irreversible and reversible reactions with the predicted analytical and best available numerical solutions. Moreover, to directly compare the localization patterns of molecules in fluorescence microscopy images with simulation, we devised a visualization method that mimics the microphotography process by showing the trajectory of simulated molecules averaged according to the camera exposure time. In the rod-shaped bacterium Escherichia coli, the division site is suppressed at the cell poles by periodic pole-to-pole oscillations of the Min proteins (MinC, MinD and MinE) arising from carefully orchestrated RD in both cytoplasm and membrane compartments. Using Spatiocyte we could model and reproduce the in vivo MinDE localization dynamics by accounting for the previously reported properties of MinE. Our results suggest that the MinE ring, which is essential in preventing polar septation, is largely composed of MinE that is transiently attached to the membrane independently after being recruited by MinD. Overall, Spatiocyte allows simulation and visualization of complex spatial and reaction-diffusion mediated cellular processes in volumes and surfaces. As we showed, it can potentially provide mechanistic insights otherwise difficult to obtain experimentally. 
The online version of this article (doi:10.1007/s11693-009-9047-2) contains supplementary material, which is available to authorized users.
Learning in the context of distribution drift
2017-05-09
published in the leading data mining journal, Data Mining and Knowledge Discovery (Webb et al., 2016)1. We have shown that the previous qualitative...learner Low-bias learner Aggregated classifier Figure 7: Architecture for learning from streaming data in the context of variable or unknown...Learning limited dependence Bayesian classifiers, in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD
Enhancements for a Dynamic Data Warehousing and Mining System for Large-scale HSCB Data
2016-07-20
Intelligent Automation Incorporated Enhancements for a Dynamic Data Warehousing and Mining ...Page | 2 Intelligent Automation Incorporated Monthly Report No. 4 Enhancements for a Dynamic Data Warehousing and Mining System Large-Scale HSCB...including Top Videos, Top Users, Top Words, and Top Languages, and also applied NER to the text associated with YouTube posts. We have also developed UI for
NASA Astrophysics Data System (ADS)
Rochyani, Neny
2017-11-01
Acid mine drainage is a major problem for the mining environment. The main factor in the formation of acid mine drainage is the volume of rainfall. It is therefore important to understand the main rainfall and seasonal patterns when managing acid mine drainage. This study focuses on the effects of rainfall on acid mine water management. The amount of rainfall in the East Pit 3 West Banko area was determined from daily rainfall data and monthly and seasonal patterns using the Gumbel approach. The data give a highest maximum daily rainfall of 165 mm/day and a lowest of 76.4 mm/day. During 2007-2016, the high-rainfall period ran from November to April, when lime use was low, while the low-rainfall period ran from May to October, when lime use was higher. The lime requirement calculated for each return period gives the total lime and financial requirements for treatment in that return period.
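For readers unfamiliar with the Gumbel approach mentioned in this abstract, the sketch below estimates design rainfall for a given return period from a series of annual maximum daily rainfall values. It uses the common frequency-factor form x_T = mean + K_T * s with K_T = -(sqrt(6)/pi) * (0.5772 + ln(ln(T/(T-1)))), which assumes a long record; whether the study used this form or a sample-size-corrected table is not stated. The rainfall values in `ANNUAL_MAX_MM` are made-up placeholders, not the study's data, and the function name is hypothetical.

```python
import math
import statistics

# Placeholder annual maximum daily rainfall (mm/day) for ten years; NOT the study's data.
ANNUAL_MAX_MM = [80.0, 95.5, 110.0, 121.3, 132.0, 104.2, 88.7, 143.9, 160.0, 117.6]


def gumbel_design_rainfall(annual_maxima, return_period):
    """Estimate the rainfall depth with the given return period (in years, > 1).

    Uses the Gumbel frequency factor for a long record:
        K_T = -(sqrt(6) / pi) * (0.5772 + ln(ln(T / (T - 1))))
    """
    if return_period <= 1:
        raise ValueError("return_period must be greater than 1 year")
    mean = statistics.mean(annual_maxima)
    stdev = statistics.stdev(annual_maxima)  # sample standard deviation
    reduced = math.log(math.log(return_period / (return_period - 1)))
    k_t = -(math.sqrt(6) / math.pi) * (0.5772 + reduced)
    return mean + k_t * stdev


if __name__ == "__main__":
    for years in (2, 5, 10, 25, 50, 100):
        depth = gumbel_design_rainfall(ANNUAL_MAX_MM, years)
        print(f"{years:>3}-year daily rainfall: {depth:.1f} mm")
```

Multiplying each design rainfall by a catchment area and a lime dose per unit volume of acid water would give the per-return-period lime requirement the abstract refers to; those factors are site-specific and are left out here.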
NASA Astrophysics Data System (ADS)
Han, Wencheng; Zhou, Renjie; Liu, Xianfeng; Sun, Dongdong
2018-03-01
The non-pillar sublevel caving method with large structural parameters used in Mao Gong Iron Mine has high rates of dilution and loss, and the ore recovery rate is less than 50%. To address this problem, this paper analyzes how the caving step influences the mining indices, based on the matching relationship between the shape of the caved ore body and that of the drawn-out ore body. Laboratory physical simulation experiments were then used to study mining indices such as the volume of pure ore drawn, the ore recovery ratio, and the rock mixing ratio under different caving steps. The results show that the mining indices with a caving step of two rows of blast holes are better than those with a caving step of one row. The research provides guidance for production at the mine.
Empirical advances with text mining of electronic health records.
Delespierre, T; Denormandie, P; Bar-Hen, A; Josseran, L
2017-08-22
Korian is a private group specializing in medical accommodations for elderly and dependent people. A professional data warehouse (DWH) established in 2010 hosts all of the residents' data. Inside this information system (IS), clinical narratives (CNs) were used only by medical staff as a residents' care linking tool. The objective of this study was to show that, through qualitative and quantitative textual analysis of a relatively small and well-defined physiotherapy CN sample, it was possible to build a physiotherapy corpus and, through this process, generate a new body of knowledge by adding relevant information to describe the residents' care and lives. Meaningful words were extracted through Structured Query Language (SQL) with the LIKE operator and wildcards to perform pattern matching, followed by text mining and a word cloud using R® packages. Another step involved principal components and multiple correspondence analyses, plus clustering on the same residents' sample as well as on other health data using a health model measuring the residents' care level needs. By combining these techniques, physiotherapy treatments could be characterized by a list of constructed keywords, and the residents' health characteristics were built. Feeding defects or health outlier groups could be detected, physiotherapy residents' data and their health data were matched, and differences in health situations showed qualitative and quantitative differences in physiotherapy narratives. This textual experiment using a textual process in two stages showed that text mining and data mining techniques provide convenient tools to improve residents' health and quality of care by adding new, simple, usable data to the electronic health record (EHR).
When used with a normalized physiotherapy problem list, text mining through information extraction (IE), named entity recognition (NER) and data mining (DM) can provide a real advantage to describe health care, adding new medical material and helping to integrate the EHR system into the health staff work environment.
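The SQL pattern-matching step described above can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module rather than the study's data warehouse (whose database engine is not named); the table, column names and notes are hypothetical.

```python
import sqlite3

# In-memory database standing in for the data warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE narratives (resident_id INTEGER, note TEXT)")
conn.executemany(
    "INSERT INTO narratives VALUES (?, ?)",
    [
        (1, "Walking rehabilitation with walker, 20 minutes"),
        (2, "Massage of left shoulder; balance exercises"),
        (3, "Resident declined session today"),
        (4, "Gait training and balance work on parallel bars"),
    ],
)


def find_notes(connection, keyword):
    """Return (resident_id, note) rows whose note contains `keyword`.

    The % wildcard matches any run of characters, so '%balance%' finds
    the keyword anywhere in the note.
    """
    pattern = f"%{keyword}%"
    return connection.execute(
        "SELECT resident_id, note FROM narratives WHERE note LIKE ? ORDER BY resident_id",
        (pattern,),
    ).fetchall()


for resident_id, note in find_notes(conn, "balance"):
    print(resident_id, note)
```

In SQLite, LIKE is case-insensitive for ASCII letters by default; other engines may behave differently, so a keyword list built this way should be checked against the engine actually in use.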
Database Citation in Full Text Biomedical Articles
Kafkas, Şenay; Kim, Jee-Hyub; McEntyre, Johanna R.
2013-01-01
Molecular biology and literature databases represent essential infrastructure for life science research. Effective integration of these data resources requires that there are structured cross-references at the level of individual articles and biological records. Here, we describe the current patterns of how database entries are cited in research articles, based on analysis of the full text Open Access articles available from Europe PMC. Focusing on citation of entries in the European Nucleotide Archive (ENA), UniProt and Protein Data Bank, Europe (PDBe), we demonstrate that text mining doubles the number of structured annotations of database record citations supplied in journal articles by publishers. Many thousands of new literature-database relationships are found by text mining, since these relationships are also not present in the set of articles cited by database records. We recommend that structured annotation of database records in articles is extended to other databases, such as ArrayExpress and Pfam, entries from which are also cited widely in the literature. The very high precision and high-throughput of this text-mining pipeline makes this activity possible both accurately and at low cost, which will allow the development of new integrated data services. PMID:23734176
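To make the idea of mining database record citations concrete, here is a minimal sketch that spots UniProt accessions and PDB identifiers in a sentence. It is not the Europe PMC pipeline. The UniProt pattern follows the accession format published by UniProt; requiring a "PDB" cue before a four-character identifier is an assumption made here to limit false positives such as years.

```python
import re

# UniProt accession format (6 or 10 characters), per UniProt's published pattern.
UNIPROT_RE = re.compile(
    r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
)
# Classic 4-character PDB ID, accepted only after a 'PDB' cue (assumption).
PDB_RE = re.compile(
    r"\bPDB(?:\s+(?:ID|code|entry))?[:\s]+([1-9][A-Za-z0-9]{3})\b", re.IGNORECASE
)


def find_citations(text):
    """Return candidate database identifiers found in `text`, grouped by database."""
    return {
        "uniprot": UNIPROT_RE.findall(text),
        "pdb": PDB_RE.findall(text),
    }


sentence = "The structure (PDB ID: 1TUP) shows p53 (UniProt P04637) bound to DNA, published in 2013."
print(find_citations(sentence))
```

Pattern matches of this kind are only candidates; a production pipeline would still need to validate each identifier against the database itself.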
Subramani, Suresh; Kalpana, Raja; Monickaraj, Pankaj Moses; Natarajan, Jeyakumar
2015-04-01
Knowledge of protein-protein interactions (PPI) and their related pathways is equally important for understanding the biological functions of the living cell. Such information on human proteins is highly desirable to understand the mechanism of several diseases such as cancer, diabetes, and Alzheimer's disease. Because much of that information is buried in biomedical literature, an automated text mining system for visualizing human PPI and pathways is highly desirable. In this paper, we present HPIminer, a text mining system for visualizing human protein interactions and pathways from biomedical literature. HPIminer extracts human PPI information and PPI pairs from biomedical literature, and visualizes their associated interactions, networks and pathways using two curated databases, HPRD and KEGG. To our knowledge, HPIminer is the first system to build interaction networks from literature as well as curated databases. Further, the new interactions mined only from literature and not reported earlier in databases are highlighted as new. A comparative study with other similar tools shows that the resultant network is more informative and provides additional information on interacting proteins and their associated networks. Copyright © 2015 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Nikolaev, A. V.; Alymenko, N. I.; Kamenskikh, A. A.; Alymenko, D. N.; Nikolaev, V. A.; Petrov, A. I.
2017-10-01
The article reports measurements of air parameters and volume flow in the shafts and on the surface, collected at BKPRU-2 (Berezniki potash plant and mine 2, «Uralkali» PJSC) in normal operation mode, after shutdown of the main mine fan (GVU), and within several hours of the shutdown. The tests established that thermal pressure between the mine shafts acts continuously regardless of the GVU operation mode or other draught sources. It was also found that the depth of the mine shafts has no impact on the thermal pressure value: for the same difference in shaft elevation marks and the same outer-air parameters between the shafts, shafts of different depth experience the same thermal pressure. The general mine natural draught, defined as the algebraic sum of the thermal pressure values between the shafts, depends only on the difference in temperature and pressure between the outer air and the air at the shaft bottoms, provided the air handling system (unit heaters, air conditioning systems) is shut down.
2009-06-01
capabilities: web-based, relational/multi-dimensional, client/server, and metadata (data about data) inclusion (pp. 39-40). Text mining, on the other...and Organizational Systems (CASOS) (Carley, 2005). Although AutoMap can be used to conduct text-mining, it was utilized only for its visualization...provides insight into how the GMCOI is using the terms, and where there might be redundant terms and need for de-confliction and standardization
Terminologies for text-mining; an experiment in the lipoprotein metabolism domain
Alexopoulou, Dimitra; Wächter, Thomas; Pickersgill, Laura; Eyre, Cecilia; Schroeder, Michael
2008-01-01
Background: The engineering of ontologies, especially with a view to a text-mining use, is still a new research field. There does not yet exist a well-defined theory and technology for ontology construction. Many of the ontology design steps remain manual and are based on personal experience and intuition. However, there exist a few efforts on automatic construction of ontologies in the form of extracted lists of terms and relations between them. Results: We share experience acquired during the manual development of a lipoprotein metabolism ontology (LMO) to be used for text-mining. We compare the manually created ontology terms with the automatically derived terminology from four different automatic term recognition (ATR) methods. The top 50 predicted terms contain up to 89% relevant terms. For the top 1000 terms the best method still generates 51% relevant terms. In a corpus of 3066 documents 53% of LMO terms are contained and 38% can be generated with one of the methods. Conclusions: Given high precision, automatic methods can help decrease development time and provide significant support for the identification of domain-specific vocabulary. The coverage of the domain vocabulary depends strongly on the underlying documents. Ontology development for text mining should be performed in a semi-automatic way, taking ATR results as input and following the guidelines we described. Availability: The TFIDF term recognition is available as Web Service, described at PMID:18460175
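As an illustration of TF-IDF-style term ranking, one of the ideas behind automatic term recognition, here is a minimal sketch over a toy corpus. It is a generic TF-IDF computation, not the authors' web service or any of the four ATR methods they compare; the documents and the whitespace tokenizer are assumptions.

```python
import math
from collections import Counter


def tfidf_scores(documents):
    """Sum TF-IDF over all documents for each term; higher means more corpus-specific."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = Counter()
    for tokens in tokenized:
        counts = Counter(tokens)
        for term, count in counts.items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            scores[term] += tf * idf
    return scores


# Toy corpus (hypothetical sentences, not the 3066-document corpus).
corpus = [
    "lipoprotein lipase hydrolyzes triglycerides in the plasma",
    "the ldl receptor binds lipoprotein particles",
    "the cat sat on the mat",
]
scores = tfidf_scores(corpus)
for term, score in scores.most_common(5):
    print(f"{term:15s} {score:.3f}")
```

Terms that occur in every document, such as "the", receive an IDF of zero and drop to the bottom of the ranking, which is the behavior that makes TF-IDF useful as a first filter for domain vocabulary.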
Abbe, Adeline; Falissard, Bruno
2017-10-23
The Internet is a particularly dynamic way to quickly capture the perceptions of a population in real time. Complementary to traditional face-to-face communication, online social networks help patients to improve self-esteem and self-help. The aim of this study was to use text mining on material from an online forum exploring patients' concerns about treatment (antidepressants and anxiolytics). Concerns about treatment were collected from discussion titles in a patients' online community related to antidepressants and anxiolytics. To examine the content of these titles automatically, we used text mining methods, such as word frequency in a document-term matrix and co-occurrence of words using a network analysis. It was thus possible to identify topics discussed on the forum. The forum included 2415 discussions on antidepressants and anxiolytics over a period of 3 years. After a preprocessing step, the text mining algorithm identified the 99 most frequently occurring words in titles, among which were escitalopram, withdrawal, antidepressant, venlafaxine, paroxetine, and effect. Patients' concerns were related to antidepressant withdrawal, the need to share experience about symptoms, effects, and questions on weight gain with some drugs. Patients' expression on the Internet is a potential additional resource in addressing patients' concerns about treatment. Patient profiles are close to that of patients treated in psychiatry. ©Adeline Abbe, Bruno Falissard. Originally published in JMIR Mental Health (http://mental.jmir.org), 23.10.2017.
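The two techniques named above, a document-term matrix and word co-occurrence, can be sketched in a few lines. The titles, stop-word list and tokenizer below are hypothetical stand-ins, and the study's own preprocessing and network analysis are not reproduced.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical discussion titles (not taken from the forum studied).
titles = [
    "Escitalopram withdrawal symptoms",
    "Weight gain with paroxetine",
    "Venlafaxine withdrawal: how long?",
    "Paroxetine withdrawal and weight gain",
]
STOPWORDS = {"with", "and", "how", "long"}


def tokenize(title):
    """Lowercase, keep alphabetic tokens, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", title.lower()) if w not in STOPWORDS]


def document_term_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    rows = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*rows))
    return vocab, [[row[term] for term in vocab] for row in rows]


def cooccurrence(docs):
    """Count how many documents contain each unordered pair of terms."""
    pairs = Counter()
    for doc in docs:
        pairs.update(combinations(sorted(set(tokenize(doc))), 2))
    return pairs


vocab, matrix = document_term_matrix(titles)
term_totals = {term: sum(row[j] for row in matrix) for j, term in enumerate(vocab)}
pairs = cooccurrence(titles)
print(sorted(term_totals.items(), key=lambda kv: -kv[1])[:3])
print(pairs.most_common(3))
```

The pair counts are the edge weights of a co-occurrence network: each term is a node, and a heavier edge means the two words appear together in more titles.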
Jonnagaddala, Jitendra; Liaw, Siaw-Teng; Ray, Pradeep; Kumar, Manish; Chang, Nai-Wen; Dai, Hong-Jie
2015-12-01
Coronary artery disease (CAD) often leads to myocardial infarction, which may be fatal. Risk factors can be used to predict CAD, which may subsequently lead to prevention or early intervention. Patient data such as co-morbidities, medication history, social history and family history are required to determine the risk factors for a disease. However, risk factor data are usually embedded in unstructured clinical narratives if the data is not collected specifically for risk assessment purposes. Clinical text mining can be used to extract data related to risk factors from unstructured clinical notes. This study presents methods to extract Framingham risk factors from unstructured electronic health records using clinical text mining and to calculate 10-year coronary artery disease risk scores in a cohort of diabetic patients. We developed a rule-based system to extract risk factors: age, gender, total cholesterol, HDL-C, blood pressure, diabetes history and smoking history. The results showed that the output from the text mining system was reliable, but there was a significant amount of missing data to calculate the Framingham risk score. A systematic approach for understanding missing data was followed by implementation of imputation strategies. An analysis of the 10-year Framingham risk scores for coronary artery disease in this cohort has shown that the majority of the diabetic patients are at moderate risk of CAD. Copyright © 2015 Elsevier Inc. All rights reserved.
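A rule-based extractor of the kind described can be sketched with regular expressions. The rules, field names and clinical note below are illustrative assumptions, not the authors' system, and the Framingham score calculation itself is deliberately left out because the abstract does not give it.

```python
import re

# One hypothetical rule per risk factor; each captures a single numeric value.
RULES = {
    "age": re.compile(r"\b(\d{1,3})[- ]year[- ]old\b", re.IGNORECASE),
    "total_cholesterol": re.compile(r"\btotal cholesterol\D{0,15}(\d+(?:\.\d+)?)", re.IGNORECASE),
    "hdl_c": re.compile(r"\bHDL(?:-C)?\D{0,15}(\d+(?:\.\d+)?)", re.IGNORECASE),
    "systolic_bp": re.compile(r"\b(?:BP|blood pressure)\D{0,15}(\d{2,3})\s*/\s*\d{2,3}", re.IGNORECASE),
}


def extract_risk_factors(note):
    """Apply each rule to the note; a factor with no match is recorded as None."""
    factors = {}
    for name, pattern in RULES.items():
        match = pattern.search(note)
        factors[name] = float(match.group(1)) if match else None
    return factors


# Hypothetical clinical note.
note = "Pt is a 62-year-old male with type 2 diabetes. BP 145/90. Total cholesterol 5.8 mmol/L. Ex-smoker."
factors = extract_risk_factors(note)
missing = [name for name, value in factors.items() if value is None]
print(factors)
print("missing:", missing)
```

Listing the missing factors explicitly mirrors the problem reported in the abstract: extraction can be reliable while the notes still lack values needed for the score, which is where an imputation strategy comes in.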
Cañada, Andres; Capella-Gutierrez, Salvador; Rabal, Obdulia; Oyarzabal, Julen; Valencia, Alfonso; Krallinger, Martin
2017-07-03
Considerable effort has been devoted to systematically retrieving information on genes and proteins as well as relationships between them. Despite the importance of chemical compounds and drugs as a central bio-entity in pharmacological and biological research, only a limited number of freely available chemical text-mining/search engine technologies are currently accessible. Here we present LimTox (Literature Mining for Toxicology), a web-based online biomedical search tool with special focus on adverse hepatobiliary reactions. It integrates a range of text mining, named entity recognition and information extraction components. LimTox relies on machine-learning, rule-based, pattern-based and term lookup strategies. This system processes scientific abstracts, a set of full text articles and medical agency assessment reports. Although the main focus of LimTox is on adverse liver events, it also enables basic searches for other organ level toxicity associations (nephrotoxicity, cardiotoxicity, thyrotoxicity and phospholipidosis). This tool supports specialized search queries for: chemical compounds/drugs, genes (with additional emphasis on key enzymes in drug metabolism, namely P450 cytochromes-CYPs) and biochemical liver markers. The LimTox website is free and open to all users and there is no login requirement. LimTox can be accessed at: http://limtox.bioinfo.cnio.es. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Text Mining for Drugs and Chemical Compounds: Methods, Tools and Applications.
Vazquez, Miguel; Krallinger, Martin; Leitner, Florian; Valencia, Alfonso
2011-06-01
Providing prior knowledge about biological properties of chemicals, such as kinetic values, protein targets, or toxic effects, can facilitate many aspects of drug development. Chemical information is rapidly accumulating in all sorts of free text documents like patents, industry reports, or scientific articles, which has motivated the development of specifically tailored text mining applications. Despite the potential gains, chemical text mining still faces significant challenges. One of the most salient is the recognition of chemical entities mentioned in text. To help practitioners contribute to this area, a good portion of this review is devoted to this issue, and presents the basic concepts and principles underlying the main strategies. The technical details are introduced and accompanied by relevant bibliographic references. Other tasks discussed are retrieving relevant articles, identifying relationships between chemicals and other entities, or determining the chemical structures of chemicals mentioned in text. This review also introduces a number of published applications that can be used to build pipelines in topics like drug side effects, toxicity, and protein-disease-compound network analysis. We conclude the review with an outlook on how we expect the field to evolve, discussing its possibilities and its current limitations. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Mining free-text medical records for companion animal enteric syndrome surveillance.
Anholt, R M; Berezowski, J; Jamal, I; Ribble, C; Stephen, C
2014-03-01
Large amounts of animal health care data are present in veterinary electronic medical records (EMR) and they present an opportunity for companion animal disease surveillance. Veterinary patient records are largely in free-text without clinical coding or fixed vocabulary. Text-mining, a computer and information technology application, is needed to identify cases of interest and to add structure to the otherwise unstructured data. In this study EMRs were extracted from veterinary management programs of 12 participating veterinary practices and stored in a data warehouse. Using commercially available text-mining software (WordStat™), we developed a categorization dictionary that could be used to automatically classify and extract enteric syndrome cases from the warehoused electronic medical records. The diagnostic accuracy of the text-miner for retrieving cases of enteric syndrome was measured against human reviewers who independently categorized a random sample of 2500 cases as enteric syndrome positive or negative. Compared to the reviewers, the text-miner retrieved cases with enteric signs with a sensitivity of 87.6% (95%CI, 80.4-92.9%) and a specificity of 99.3% (95%CI, 98.9-99.6%). Automatic and accurate detection of enteric syndrome cases provides an opportunity for community surveillance of enteric pathogens in companion animals. Copyright © 2014 Elsevier B.V. All rights reserved.
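Evaluating a dictionary-based classifier against human reviewers reduces to a confusion-matrix calculation. The sketch below uses a hypothetical four-term dictionary and five made-up records; it is not the WordStat dictionary or the study's sample, and it omits the confidence intervals.

```python
import re

# Hypothetical categorization dictionary (the study's dictionary is not given).
ENTERIC_TERMS = {"vomiting", "diarrhea", "diarrhoea", "melena"}


def is_enteric(record_text):
    """Flag a record if any dictionary term appears as a whole word."""
    words = set(re.findall(r"[a-z]+", record_text.lower()))
    return bool(words & ENTERIC_TERMS)


def sensitivity_specificity(predicted, reference):
    """Compare classifier output with reviewer labels (True = enteric case)."""
    tp = sum(p and r for p, r in zip(predicted, reference))
    tn = sum((not p) and (not r) for p, r in zip(predicted, reference))
    fp = sum(p and (not r) for p, r in zip(predicted, reference))
    fn = sum((not p) and r for p, r in zip(predicted, reference))
    return tp / (tp + fn), tn / (tn + fp)


# Made-up records with made-up reviewer labels.
records = [
    ("Acute vomiting and diarrhea for two days", True),
    ("Annual vaccination, healthy", False),
    ("Soft stool reported by owner", True),    # missed: no dictionary term
    ("Limping on left hind leg", False),
    ("No vomiting since last visit", False),   # false positive: negation ignored
]
predicted = [is_enteric(text) for text, _ in records]
reference = [label for _, label in records]
sensitivity, specificity = sensitivity_specificity(predicted, reference)
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

The two deliberately misclassified records show the usual weak points of plain keyword matching: vocabulary gaps lower sensitivity, and unhandled negation lowers specificity.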
1980-01-01
producers under a state law of 1978. Until the regulations under PURPA Title II (the National Energy Act of 1978) are promulgated and the PUC reviews this...hour (kWh); and it is FURTHER ORDERED, that the Commission will re-examine the PURPA issues in this proceeding upon the issuance of rules by the FERC
Feasibility of mining lunar resources for earth use: Circa 2000 AD. Volume 1: Summary
NASA Technical Reports Server (NTRS)
Nishioka, K.; Arno, R. D.; Alexander, A. D.; Slye, R. E.
1973-01-01
The feasibility of obtaining lunar minerals for terrestrial uses is examined. Preliminary results gave indications that it will not be economically feasible to mine, refine, and transport lunar materials to Earth for consumption. A broad systems approach was used to analyze the problem. It was determined that even though the procedure was not economically advisable, the concept for the operations is technically sound.
Seqenv: linking sequences to environments through text mining.
Sinclair, Lucas; Ijaz, Umer Z; Jensen, Lars Juhl; Coolen, Marco J L; Gubry-Rangin, Cecile; Chroňáková, Alica; Oulas, Anastasis; Pavloudi, Christina; Schnetzer, Julia; Weimann, Aaron; Ijaz, Ali; Eiler, Alexander; Quince, Christopher; Pafilis, Evangelos
2016-01-01
Understanding the distribution of taxa and associated traits across different environments is one of the central questions in microbial ecology. High-throughput sequencing (HTS) studies are presently generating huge volumes of data to address this biogeographical topic. However, these studies are often focused on specific environment types or processes leading to the production of individual, unconnected datasets. The large amounts of legacy sequence data with associated metadata that exist can be harnessed to better place the genetic information found in these surveys into a wider environmental context. Here we introduce a software program, seqenv, to carry out precisely such a task. It automatically performs similarity searches of short sequences against the "nt" nucleotide database provided by NCBI and, out of every hit, extracts (if it is available) the textual metadata field. After collecting all the isolation sources from all the search results, we run a text mining algorithm to identify and parse words that are associated with the Environmental Ontology (EnvO) controlled vocabulary. This, in turn, enables us to determine both in which environments individual sequences or taxa have previously been observed and, by weighted summation of those results, to summarize complete samples. We present two demonstrative applications of seqenv to a survey of ammonia oxidizing archaea as well as to a plankton paleome dataset from the Black Sea. These demonstrate the ability of the tool to reveal novel patterns in HTS and its utility in the fields of environmental source tracking, paleontology, and studies of microbial biogeography. To install seqenv, go to: https://github.com/xapple/seqenv.
Product Recommendation System Based on Personal Preference Model Using CAM
NASA Astrophysics Data System (ADS)
Murakami, Tomoko; Yoshioka, Nobukazu; Orihara, Ryohei; Furukawa, Koichi
A product recommendation system is realized by applying business rules acquired through data mining techniques. Business rules, such as demographic patterns of purchase, can cover groups of users that tend to purchase certain products, but rules alone make it difficult to recommend products adapted to varied personal preferences. In addition, it is very costly to gather the large volume of high-quality survey data needed for good recommendations based on a personal preference model. A method that collects kansei information automatically, without a questionnaire survey, is therefore required. It is also necessary to construct the personal preference model from little preference data, since entering such data is costly for the user. In this paper, we propose a product recommendation system based on kansei information extracted by text mining and on a user preference model constructed by Category-guided Adaptive Modeling (CAM). CAM is a feature construction method that, given some labeled examples, generates new features spanning a space in which examples with the same label lie close together and examples with different labels lie far apart. CAM makes it possible to construct a personal preference model even with little information about liked and disliked categories. In the system, a retrieval agent gathers the products' specifications and a user agent manages the preference model, that is, the user's likes and dislikes. Kansei information about the products is obtained by applying text mining to reputation documents about the products on the web. We carry out experimental studies to confirm that the preference model obtained by our method performs effectively.
Kharat, Amit T; Singh, Amarjit; Kulkarni, Vilas M; Shah, Digish
2014-01-01
Data mining facilitates the study of radiology data in various dimensions. It converts large patient image and text datasets into useful information that helps in improving patient care and provides informative reports. Data mining technology analyzes data within the Radiology Information System and Hospital Information System using specialized software which assesses relationships and agreement in available information. By using similar data analysis tools, radiologists can make informed decisions and predict the future outcome of a particular imaging finding. Data, information and knowledge are the components of data mining. Classes, Clusters, Associations, Sequential patterns, Classification, Prediction and Decision tree are the various types of data mining. Data mining has the potential to make delivery of health care affordable and ensure that the best imaging practices are followed. It is a tool for academic research. Data mining is considered to be ethically neutral; however, concerns regarding privacy and legality exist and need to be addressed to ensure the success of data mining. PMID:25024513
Analyzing asset management data using data and text mining.
DOT National Transportation Integrated Search
2014-07-01
Predictive models using text from a sample of competitively bid California highway projects have been used to predict a construction project's likely level of cost overrun. A text description of the project and the text of the five largest project line...
BioC implementations in Go, Perl, Python and Ruby.
Liu, Wanli; Islamaj Doğan, Rezarta; Kwon, Dongseop; Marques, Hernani; Rinaldi, Fabio; Wilbur, W John; Comeau, Donald C
2014-01-01
As part of a communitywide effort for evaluating text mining and information extraction systems applied to the biomedical domain, BioC is focused on the goal of interoperability, currently a major barrier to wide-scale adoption of text mining tools. BioC is a simple XML format, specified by DTD, for exchanging data for biomedical natural language processing. With initial implementations in C++ and Java, BioC provides libraries of code for reading and writing BioC text documents and annotations. We extend BioC to Perl, Python, Go and Ruby. We used SWIG to extend the C++ implementation for Perl and one Python implementation. A second Python implementation and the Ruby implementation use native data structures and libraries. BioC is also implemented in the Google language Go. BioC modules are functional in all of these languages, which can facilitate text mining tasks. BioC implementations are freely available through the BioC site: http://bioc.sourceforge.net. Database URL: http://bioc.sourceforge.net/ Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.
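To show what reading a BioC document involves, here is a minimal sketch that parses a small BioC-style XML string with Python's standard library. It does not use any of the BioC libraries described above; the element and attribute names follow one reading of the BioC DTD and should be checked against the DTD distributed at the BioC site, and the sample document is invented.

```python
import xml.etree.ElementTree as ET

# Invented sample; element names are assumed to follow the BioC DTD.
SAMPLE = """<collection>
  <source>example</source>
  <date>20140101</date>
  <key>example.key</key>
  <document>
    <id>1</id>
    <passage>
      <offset>0</offset>
      <text>BRCA1 is a tumor suppressor.</text>
      <annotation id="T1">
        <infon key="type">gene</infon>
        <location offset="0" length="5"/>
        <text>BRCA1</text>
      </annotation>
    </passage>
  </document>
</collection>"""


def read_annotations(xml_string):
    """Return (document id, annotation id, type, text, offset, length) tuples."""
    results = []
    root = ET.fromstring(xml_string)
    for document in root.findall("document"):
        doc_id = document.findtext("id")
        for passage in document.findall("passage"):
            passage_offset = int(passage.findtext("offset"))
            passage_text = passage.findtext("text") or ""
            for annotation in passage.findall("annotation"):
                location = annotation.find("location")
                offset = int(location.get("offset"))
                length = int(location.get("length"))
                # Offsets are document-level, so subtract the passage offset.
                start = offset - passage_offset
                covered = passage_text[start:start + length]
                ann_type = annotation.findtext("infon[@key='type']")
                assert covered == annotation.findtext("text"), "offset mismatch"
                results.append((doc_id, annotation.get("id"), ann_type, covered, offset, length))
    return results


annotations = read_annotations(SAMPLE)
print(annotations)
```

Checking each annotation's text against the span its location points to is a cheap sanity test when exchanging files between tools, which is the interoperability concern BioC is meant to address.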
Urbanski, William M; Condie, Brian G
2009-12-01
Textpresso Site Specific Recombinases (http://ssrc.genetics.uga.edu/) is a text-mining web server for searching a database of more than 9,000 full-text publications. The papers and abstracts in this database represent a wide range of topics related to site-specific recombinase (SSR) research tools. Included in the database are most of the papers that report the characterization or use of mouse strains that express Cre recombinase as well as papers that describe or analyze mouse lines that carry conditional (floxed) alleles or SSR-activated transgenes/knockins. The database also includes reports describing SSR-based cloning methods such as the Gateway or the Creator systems, papers reporting the development or use of SSR-based tools in systems such as Drosophila, bacteria, parasites, stem cells, yeast, plants, zebrafish, and Xenopus as well as publications that describe the biochemistry, genetics, or molecular structure of the SSRs themselves. Textpresso Site Specific Recombinases is the only comprehensive text-mining resource available for the literature describing the biology and technical applications of SSRs. (c) 2009 Wiley-Liss, Inc.
Text mining for metabolic pathways, signaling cascades, and protein networks.
Hoffmann, Robert; Krallinger, Martin; Andres, Eduardo; Tamames, Javier; Blaschke, Christian; Valencia, Alfonso
2005-05-10
The complexity of the information stored in databases and publications on metabolic and signaling pathways, the high throughput of experimental data, and the growing number of publications make it imperative to provide systems to help the researcher navigate through these interrelated information resources. Text-mining methods have started to play a key role in the creation and maintenance of links between the information stored in biological databases and its original sources in the literature. These links will be extremely useful for database updating and curation, especially if a number of technical problems can be solved satisfactorily, including the identification of protein and gene names (entities in general) and the characterization of their types of interactions. The first generation of openly accessible text-mining systems, such as iHOP (Information Hyperlinked over Proteins), provides additional functions to facilitate the reconstruction of protein interaction networks, combine database and text information, and support the scientist in the formulation of novel hypotheses. The next challenge is the generation of comprehensive information regarding the general function of signaling pathways and protein interaction networks.
TOY SAFETY SURVEILLANCE FROM ONLINE REVIEWS
Winkler, Matt; Abrahams, Alan S.; Gruss, Richard; Ehsani, Johnathan P.
2016-01-01
Toy-related injuries account for a significant number of childhood injuries and the prevention of these injuries remains a goal for regulatory agencies and manufacturers. Text-mining is an increasingly prevalent method for uncovering the significance of words using big data. This research sets out to determine the effectiveness of text-mining in uncovering potentially dangerous children’s toys. We develop a danger word list, also known as a ‘smoke word’ list, from injury and recall text narratives. We then use the smoke word lists to score over one million Amazon reviews, with the top scores denoting potential safety concerns. We compare the smoke word list to conventional sentiment analysis techniques, in terms of both word overlap and effectiveness. We find that smoke word lists are highly distinct from conventional sentiment dictionaries and provide a statistically significant method for identifying safety concerns in children’s toy reviews. Our findings indicate that text-mining is, in fact, an effective method for the surveillance of safety concerns in children’s toys and could be a gateway to effective prevention of toy-product-related injuries. PMID:27942092
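The scoring step described above can be sketched as a weighted keyword sum. The smoke words, their weights and the reviews below are invented for illustration; the authors derived their list from injury and recall narratives, and their actual terms and scoring formula are not given in the abstract.

```python
import re

# Invented smoke words and weights (assumption, not the published list).
SMOKE_WORDS = {"choking": 3.0, "swallowed": 3.0, "burn": 2.5, "sharp": 2.0}


def smoke_score(review):
    """Sum the weights of all smoke words occurring in the review."""
    tokens = re.findall(r"[a-z]+", review.lower())
    return sum(SMOKE_WORDS.get(token, 0.0) for token in tokens)


# Invented reviews.
reviews = [
    "My son loves this truck",
    "Small wheel came off and my toddler almost swallowed it, choking hazard",
    "Edges are sharp",
]
ranked = sorted(reviews, key=smoke_score, reverse=True)
for review in ranked:
    print(f"{smoke_score(review):4.1f}  {review}")
```

Reviews at the top of the ranking are the ones a safety analyst would read first; a generic sentiment dictionary would not necessarily surface them, since a review can be negative without describing a hazard.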
Methodology of selecting dozers for lignite open pit mines in Serbia
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stojanovic, D.; Ignjatovic, D.; Kovacevic, S.
1996-12-31
Apart from the main production processes (coal and overburden mining, rail conveyors transportation and storage of excavated masses) performed by high-capacity mechanization at open pit mines, there are numerous and varied auxiliary works that often have a crucial influence on both the work efficiency of the main equipment and the maintenance of optimum technical conditions of the machines and plants making up the technological system of the open pit. Successful realization of this work requires a proper selection of auxiliary machines according to their type, quantity, capacity, power, etc., with due regard for the specific conditions at each open pit mine. A dozer is certainly the most important and representative auxiliary machine at an open pit mine. It is widely used in numerous works that are, in fact, preconditions for successful operation of the main mechanization, and consequently the selection of a dozer ranks among the most important operations when selecting mechanization. This paper presents the methodology of dozer selection for lignite open pit mines. A mathematical model was designed that defines the volume of work required of dozers at open pit mines and, consequently, the number of dozers needed. The model was tested in practice at large open pit mines and can be used in the design of future open pit mines.
Identifying key hospital service quality factors in online health communities.
Jung, Yuchul; Hur, Cinyoung; Jung, Dain; Kim, Minki
2015-04-07
The volume of health-related user-created content, especially hospital-related questions and answers in online health communities, has rapidly increased. Patients and caregivers participate in online community activities to share their experiences, exchange information, and ask about recommended or discredited hospitals. However, there is little research on how to identify hospital service quality automatically from the online communities. In the past, in-depth analysis of hospitals has used random sampling surveys. However, such surveys are becoming impractical owing to the rapidly increasing volume of online data and the diverse analysis requirements of related stakeholders. As a solution for utilizing large-scale health-related information, we propose a novel approach to identify hospital service quality factors and trends over time automatically from online health communities, especially hospital-related questions and answers. We defined social media-based key quality factors for hospitals. In addition, we developed text mining techniques to detect such factors that frequently occur in online health communities. After detecting these factors that represent qualitative aspects of hospitals, we applied a sentiment analysis to recognize the types of recommendations in messages posted within online health communities. Korea's two biggest online portals were used to test the effectiveness of detection of social media-based key quality factors for hospitals. To evaluate the proposed text mining techniques, we performed manual evaluations on the extraction and classification results, such as hospital name, service quality factors, and recommendation types using a random sample of messages (ie, 5.44% (9450/173,748) of the total messages). Service quality factor detection and hospital name extraction achieved average F1 scores of 91% and 78%, respectively. In terms of recommendation classification, performance (ie, precision) is 78% on average.
Extraction and classification performance still has room for improvement, but the extraction results are applicable to more detailed analysis. Further analysis of the extracted information reveals that there are differences in the details of social media-based key quality factors for hospitals according to the regions in Korea, and the patterns of change seem to accurately reflect social events (eg, influenza epidemics). These findings could be used to provide timely information to caregivers, hospital officials, and medical officials for health care policies.
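For readers less familiar with the metrics quoted in this abstract, the following sketch shows how precision, recall, and F1 are conventionally computed from manually checked results. The function name and the example counts are illustrative assumptions; only the sample size (9450 of 173,748 messages) comes from the abstract.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Share of messages that were manually evaluated, using the counts in the abstract.
sample_share = 9450 / 173748

if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the calculation.
    p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
    print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
    print(f"evaluated sample: {100 * sample_share:.2f}% of all messages")
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which is why it is often reported for extraction tasks such as hospital name detection.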
ASCOT: a text mining-based web-service for efficient search and assisted creation of clinical trials
2012-01-01
Clinical trials are mandatory protocols describing medical research on humans and among the most valuable sources of medical practice evidence. Searching for trials relevant to some query is laborious due to the immense number of existing protocols. Apart from search, writing new trials includes composing detailed eligibility criteria, which might be time-consuming, especially for new researchers. In this paper we present ASCOT, an efficient search application customised for clinical trials. ASCOT uses text mining and data mining methods to enrich clinical trials with metadata, that in turn serve as effective tools to narrow down search. In addition, ASCOT integrates a component for recommending eligibility criteria based on a set of selected protocols. PMID:22595088
Korkontzelos, Ioannis; Mu, Tingting; Ananiadou, Sophia
2012-04-30
Pak, Malk Eun; Kim, Yu Ri; Kim, Ha Neui; Ahn, Sung Min; Shin, Hwa Kyoung; Baek, Jin Ung; Choi, Byung Tae
2016-02-17
In literature on Korean medicine, Dongeuibogam (Treasured Mirror of Eastern Medicine), published in 1613, represents the overall results of the traditional medicines of North-East Asia based on prior medicinal literature of this region. We utilized this medicinal literature by text mining to establish a list of candidate herbs for cognitive enhancement in the elderly and then performed an evaluation of their effects. Text mining was performed for selection of candidate herbs. Cell viability was determined in HT22 hippocampal cells, and immunohistochemistry and behavioral analysis were performed in a kainic acid (KA) mice model in order to observe alterations of hippocampal cells and cognition. Twenty four herbs for cognitive enhancement in the elderly were selected by text mining of Dongeuibogam. In HT22 cells, pretreatment with 3 candidate herbs resulted in significantly reduced glutamate-induced cell death. Panax ginseng was the most neuroprotective herb against glutamate-induced cell death. In the hippocampus of a KA mice model, pretreatment with 11 candidate herbs resulted in suppression of caspase-3 expression. Treatment with 7 candidate herbs resulted in significantly enhanced expression levels of phosphorylated cAMP response element binding protein. The number of proliferated cells indicated by BrdU labeling was increased by treatment with 10 candidate herbs. Schisandra chinensis was the most effective herb against cell death and for proliferation of progenitor cells, and Rehmannia glutinosa was the most effective in neuroprotection, in the hippocampus of a KA mice model. In a KA mice model, we confirmed improved spatial and short memory by treatment with the 3 most effective candidate herbs and these recovered functions were involved in a higher number of newly formed neurons from progenitor cells in the hippocampus.
These established herbs and their combinations identified by text-mining technique and evaluation for effectiveness may have value in further experimental and clinical applications for cognitive enhancement in the elderly. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Mining the Text: 34 Text Features that Can Ease or Obstruct Text Comprehension and Use
ERIC Educational Resources Information Center
White, Sheida
2012-01-01
This article presents 34 characteristics of texts and tasks ("text features") that can make continuous (prose), noncontinuous (document), and quantitative texts easier or more difficult for adolescents and adults to comprehend and use. The text features were identified by examining the assessment tasks and associated texts in the national…
Energy budgets of mining-induced earthquakes and their interactions with nearby stopes
McGarr, A.
2000-01-01
In the early 1960's, N.G.W. Cook, using an underground network of geophones, demonstrated that most Witwatersrand tremors are closely associated with deep level gold mining operations. He also showed that the energy released by the closure of the tabular stopes at depths of the order of 2 km was more than sufficient to account for the mining-induced earthquakes. I report here updated versions of these two results based on more modern underground data acquired in the Witwatersrand gold fields. Firstly, an extensive suite of in situ stress data indicate that the ambient state of crustal stress here is close to the failure state in the absence of mining even though the tectonic setting is thoroughly stable. Mining initially stabilizes the rock mass by reducing the pore fluid pressure from its initial hydrostatic state to nearly zero. The extensive mine excavations, as Cook showed, concentrate the deviatoric stresses, in localized regions of the abutments, back into a failure state resulting in seismicity. Secondly, there appears to be two distinct types of mining-induced earthquakes: the first is strongly coupled to the mining and involves shear failure plus a coseismic volume reduction; the second type is not evidently coupled to any particular mine face, shows purely deviatoric failure, and is presumably caused by more regional changes in the state of stress due to mining. Thirdly, energy budgets for mining induced earthquakes of both types indicate that, of the available released energy, only a few per cent is radiated by the seismic waves with the majority being consumed in overcoming fault friction. Published by Elsevier Science Ltd.
Federal Register 2010, 2011, 2012, 2013, 2014
2010-02-24
... Transmittal of Applications: March 26, 2010. Full Text of Announcement I. Funding Opportunity Description... related to industrial health and safety: Mining and mineral engineering, industrial engineering... technology/technician, hazardous materials information systems technology/technician, mining technology...
Tichomirowa, Marion; Heidel, Claudia
2012-01-01
The isotope composition of dissolved sulphate and strontium in atmospheric deposition, groundwater, mine water and river water in the region of Freiberg was investigated to better understand the fate of these components in the regional and global water cycle. Most of the isotope variations of dissolved sulphates in atmospheric deposition from three locations sampled bi- or tri-monthly can be explained by fractionation processes leading to lower [Formula: see text] (of about 2-3‰) and higher [Formula: see text] (of about 8-10‰) values in summer compared with the winter period. These samples showed a negative correlation between [Formula: see text] and [Formula: see text] values and a weak positive correlation between [Formula: see text] and [Formula: see text] values. They reflect the sulphate formed by aqueous oxidation from long-range transport in clouds. However, these isotope variations were superimposed by changes of the dominating atmospheric sulphate source. At two of the sampling points, large variations of mean annual [Formula: see text] values from atmospheric bulk deposition were recorded. From 2008 to 2009, the mean annual [Formula: see text] value increased by about 5‰; and decreased by about 4‰ from 2009 to 2010. A change in the dominating sulphate source or oxidation pathways of SO(2) in the atmosphere is proposed to cause these shifts. No changes were found in corresponding [Formula: see text] values. Groundwater, river water and some mine waters (where groundwater was the dominating sulphate source) also showed temporal shifts in their [Formula: see text] values corresponding to those of bulk atmospheric deposition, albeit to a lower degree. The mean transit time of atmospheric sulphur through the soil into the groundwater and river water was less than a year and therefore much shorter than previously suggested. 
Mining activities of about 800 years in the Freiberg region may have led to large subsurface areas with an enhanced groundwater flow along fractures and mined-refilled ore lodes which may shorten transit times of sulphate from precipitation through groundwater into river water.
Neural networks for data mining electronic text collections
NASA Astrophysics Data System (ADS)
Walker, Nicholas; Truman, Gregory
1997-04-01
The use of neural networks in information retrieval and text analysis has primarily suffered from the issues of adequate document representation, the ability to scale to very large collections, dynamism in the face of new information and the practical difficulties of basing the design on the use of supervised training sets. Perhaps the most important approach to begin solving these problems is the use of 'intermediate entities' which reduce the dimensionality of document representations and the size of document collections to manageable levels, coupled with the use of unsupervised neural network paradigms. This paper describes the issues, a fully configured neural network-based text analysis system--dataHARVEST--aimed at data mining text collections which begins this process, along with the remaining difficulties and potential ways forward.
Numerical simulation of filtration of mine water from coal slurry particles
NASA Astrophysics Data System (ADS)
Dyachenko, E. N.; Dyachenko, N. N.
2017-11-01
The discrete element method is applied to model a technology for clarification of industrial waste water containing fine-dispersed solid impurities. The process is analyzed at the level of discrete particles and pores. The effect of filter porosity on the volume fraction of particles has been shown. The degree of clarification of mine water was also calculated depending on the coal slurry particle size, taking into account the adhesion force.
1993-12-30
projectile fragments from target materials, principally sand. Phase I activities included (1) literature review of separations technology , (2) site visits, (3...the current operation, evaluation of alternative means for separation of DU from sand, a review of uranium mining technology for v possible...the current operation, evaluation of alternative means for separation of DU from sand, a review of uranium mining technology for possible
Data Visualization in Information Retrieval and Data Mining (SIG VIS).
ERIC Educational Resources Information Center
Efthimiadis, Efthimis
2000-01-01
Presents abstracts that discuss using data visualization for information retrieval and data mining, including immersive information space and spatial metaphors; spatial data using multi-dimensional matrices with maps; TREC (Text Retrieval Conference) experiments; users' information needs in cartographic information retrieval; and users' relevance…
Macromolecule mass spectrometry: citation mining of user documents.
Kostoff, Ronald N; Bedford, Clifford D; del Río, J Antonio; Cortes, Héctor D; Karypis, George
2004-03-01
Identifying research users, applications, and impact is important for research performers, managers, evaluators, and sponsors. Identification of the user audience and the research impact is complex and time consuming due to the many indirect pathways through which fundamental research can impact applications. This paper identified the literature pathways through which two highly-cited papers of 2002 Chemistry Nobel Laureates Fenn and Tanaka impacted research, technology development, and applications. Citation Mining, an integration of citation bibliometrics and text mining, was applied to the >1600 first generation Science Citation Index (SCI) citing papers to Fenn's 1989 Science paper on Electrospray Ionization for Mass Spectrometry, and to the >400 first generation SCI citing papers to Tanaka's 1988 Rapid Communications in Mass Spectrometry paper on Laser Ionization Time-of-Flight Mass Spectrometry. Bibliometrics was performed on the citing papers to profile the user characteristics. Text mining was performed on the citing papers to identify the technical areas impacted by the research, and the relationships among these technical areas.
2016-09-26
Intelligent Automation Incorporated Enhancements for a Dynamic Data Warehousing and Mining ...Enhancements for a Dynamic Data Warehousing and Mining System for N00014-16-P-3014 Large-Scale Human Social Cultural Behavioral (HSBC) Data 5b. GRANT NUMBER...Representative Media Gallery View. We perform Scraawl’s NER algorithm to the text associated with YouTube post, which classifies the named entities into
Detecting Malicious Tweets in Twitter Using Runtime Monitoring With Hidden Information
2016-06-01
text mining using Twitter streaming API and python [Online]. Available: http://adilmoujahid.com/posts/2014/07/twitter-analytics/ [22] M. Singh, B...sites with 645,750,000 registered users [3] and has open source public tweets for data mining . 2. Malicious Users and Tweets In the modern world...want to data mine in Twitter, and presents the natural language assertions and corresponding rule patterns. It then describes the steps performed using
Numerical linear algebra in data mining
NASA Astrophysics Data System (ADS)
Eldén, Lars
Ideas and algorithms from numerical linear algebra are important in several areas of data mining. We give an overview of linear algebra methods in text mining (information retrieval), pattern recognition (classification of handwritten digits), and PageRank computations for web search engines. The emphasis is on rank reduction as a method of extracting information from a data matrix, low-rank approximation of matrices using the singular value decomposition and clustering, and on eigenvalue methods for network analysis.
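As an illustration of the eigenvalue methods for network analysis mentioned in this abstract, the sketch below runs a plain power iteration for PageRank on a small link graph. It is a generic textbook formulation written for this listing, not code from the chapter; the function name, the damping factor of 0.85, and the toy graph are assumptions.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Approximate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Pages without outgoing links spread their rank uniformly.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by pages with no outgoing links is shared by everyone.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            targets = links.get(src, [])
            for t in targets:
                new_rank[t] += damping * rank[src] / len(targets)
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Toy graph: pages a and b both link to c, and c links back to a.
    toy = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(pagerank(toy).items()):
        print(page, round(score, 4))
```

The resulting vector approximates the dominant eigenvector of the damped link matrix, which is the connection to the eigenvalue methods the abstract refers to.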
Method to Select Technical Terms for Glossaries in Support of Joint Task Force Operations
2012-01-01
have been prohibitively time-consuming. Instead, we identified two publicly available terminology extractor tools: TerMine (NaCTEM, 2011) and Alchemy ...and that from the latter, by high recall. The Alchemy approach contrasts with that used in TerMine in that Alchemy will process the text with...information categories, such as person, location, and organization, in addition to returning topic keywords. Output from both TerMine and Alchemy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hirdt, J.A.; Brown, D.A., E-mail: dbrown@bnl.gov
The EXFOR library contains the largest collection of experimental nuclear reaction data available as well as the data's bibliographic information and experimental details. We text-mined the REACTION and MONITOR fields of the ENTRYs in the EXFOR library in order to identify understudied reactions and quantities. Using the results of the text-mining, we created an undirected graph from the EXFOR datasets with each graph node representing a single reaction and quantity and graph links representing the various types of connections between these reactions and quantities. This graph is an abstract representation of the connections in EXFOR, similar to graphs of social networks, authorship networks, etc. We use various graph theoretical tools to identify important yet understudied reactions and quantities in EXFOR. Although we identified a few cross sections relevant for shielding applications and isotope production, mostly we identified charged particle fluence monitor cross sections. As a side effect of this work, we learn that our abstract graph is typical of other real-world graphs.
Systematic analysis of molecular mechanisms for HCC metastasis via text mining approach.
Zhen, Cheng; Zhu, Caizhong; Chen, Haoyang; Xiong, Yiru; Tan, Junyuan; Chen, Dong; Li, Jin
2017-02-21
To systematically explore the molecular mechanism for hepatocellular carcinoma (HCC) metastasis and identify regulatory genes with text mining methods. Genes with highest frequencies and significant pathways related to HCC metastasis were listed. A handful of proteins, such as EGFR, MDM2, TP53 and APP, were identified as hub nodes in the PPI (protein-protein interaction) network. Compared with unique genes for HBV-HCCs, genes particular to HCV-HCCs were fewer, but may participate in more extensive signaling processes. VEGFA, PI3KCA, MAPK1, MMP9 and other genes may play important roles in multiple phenotypes of metastasis. Genes in abstracts of the HCC-metastasis literature were identified. Word frequency analysis, KEGG pathway and PPI network analysis were performed. Then co-occurrence analysis between genes and metastasis-related phenotypes was carried out. Text mining is effective for revealing potential regulators or pathways, but its purpose should be specific, and the combination of various methods will be more useful.
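The co-occurrence analysis mentioned in this abstract can be sketched as a simple count of how often a gene symbol and a phenotype term appear in the same abstract. The function name, the term lists, and the example sentences are hypothetical; the study's actual matching rules and statistics are not described here.

```python
import re
from collections import Counter
from itertools import product


def mentions(term, text):
    """True if term occurs in text as a whole word, ignoring case."""
    return re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE) is not None


def cooccurrence(abstracts, genes, phenotypes):
    """Count abstracts in which a gene and a phenotype term are both mentioned."""
    counts = Counter()
    for text in abstracts:
        found_genes = [g for g in genes if mentions(g, text)]
        found_phenotypes = [p for p in phenotypes if mentions(p, text)]
        counts.update(product(found_genes, found_phenotypes))
    return counts


if __name__ == "__main__":
    # Invented example sentences, used only to exercise the function.
    docs = [
        "VEGFA promotes angiogenesis and invasion.",
        "MMP9 drives invasion.",
        "An approach unrelated to any gene.",
    ]
    table = cooccurrence(docs, ["VEGFA", "MMP9", "APP"], ["invasion", "angiogenesis"])
    for pair, n in sorted(table.items()):
        print(pair, n)
```

Whole-word matching is used so that a short symbol such as APP is not counted inside an ordinary word like "approach"; gene synonyms and ambiguous symbols would still need separate handling.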
Unapparent Information Revelation: Text Mining for Counterterrorism
NASA Astrophysics Data System (ADS)
Srihari, Rohini K.
Unapparent information revelation (UIR) is a special case of text mining that focuses on detecting possible links between concepts across multiple text documents by generating an evidence trail explaining the connection. A traditional search involving, for example, two or more person names will attempt to find documents mentioning both these individuals. This research focuses on a different interpretation of such a query: what is the best evidence trail across documents that explains a connection between these individuals? For example, all may be good golfers. A generalization of this task involves query terms representing general concepts (e.g. indictment, foreign policy). Previous approaches to this problem have focused on graph mining involving hyperlinked documents, and link analysis exploiting named entities. A new robust framework is presented, based on (i) generating concept chain graphs, a hybrid content representation, (ii) performing graph matching to select candidate subgraphs, and (iii) subsequently using graphical models to validate hypotheses using ranked evidence trails. We adapt the DUC data set for cross-document summarization to evaluate evidence trails generated by this approach.
Knowledge acquisition, semantic text mining, and security risks in health and biomedical informatics
Huang, Jingshan; Dou, Dejing; Dang, Jiangbo; Pardue, J Harold; Qin, Xiao; Huan, Jun; Gerthoffer, William T; Tan, Ming
2012-01-01
Computational techniques have been adopted in medical and biological systems for a long time. There is no doubt that the development and application of computational methods will render great help in better understanding biomedical and biological functions. Large amounts of datasets have been produced by biomedical and biological experiments and simulations. In order for researchers to gain knowledge from original data, nontrivial transformation is necessary, which is regarded as a critical link in the chain of knowledge acquisition, sharing, and reuse. Challenges that have been encountered include: how to efficiently and effectively represent human knowledge in formal computing models, how to take advantage of semantic text mining techniques rather than traditional syntactic text mining, and how to handle security issues during the knowledge sharing and reuse. This paper summarizes the state-of-the-art in these research directions. We aim to provide readers with an introduction of major computing themes to be applied to the medical and biological research. PMID:22371823
Chemical named entities recognition: a review on approaches and applications.
Eltyeb, Safaa; Salim, Naomie
2014-01-01
The rapid increase in the flow rate of published digital information in all disciplines has resulted in a pressing need for techniques that can simplify the use of this information. The chemistry literature is very rich with information about chemical entities. Extracting molecules and their related properties and activities from the scientific literature to "text mine" these extracted data and determine contextual relationships helps research scientists, particularly those in drug development. One of the most important challenges in chemical text mining is the recognition of chemical entities mentioned in the texts. In this review, the authors briefly introduce the fundamental concepts of chemical literature mining, the textual contents of chemical documents, and the methods of naming chemicals in documents. We sketch out dictionary-based, rule-based and machine learning, as well as hybrid chemical named entity recognition approaches with their applied solutions. We end with an outlook on the pros and cons of these approaches and the types of chemical entities extracted.
NASA Astrophysics Data System (ADS)
Hirdt, J. A.; Brown, D. A.
2016-01-01
The EXFOR library contains the largest collection of experimental nuclear reaction data available as well as the data's bibliographic information and experimental details. We text-mined the REACTION and MONITOR fields of the ENTRYs in the EXFOR library in order to identify understudied reactions and quantities. Using the results of the text-mining, we created an undirected graph from the EXFOR datasets with each graph node representing a single reaction and quantity and graph links representing the various types of connections between these reactions and quantities. This graph is an abstract representation of the connections in EXFOR, similar to graphs of social networks, authorship networks, etc. We use various graph theoretical tools to identify important yet understudied reactions and quantities in EXFOR. Although we identified a few cross sections relevant for shielding applications and isotope production, mostly we identified charged particle fluence monitor cross sections. As a side effect of this work, we learn that our abstract graph is typical of other real-world graphs.
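The graph construction described in this abstract can be pictured with a small sketch: each node stands for one reaction and quantity, and an undirected link joins two nodes that are connected in the data. The node labels, helper names, and the use of node degree as the ranking measure are assumptions made for illustration; the abstract does not say which graph-theoretical tools were applied.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs, ignoring self-loops."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return dict(adjacency)


def degree_ranking(adjacency):
    """List nodes from most to least connected, breaking ties alphabetically."""
    return sorted(adjacency, key=lambda node: (-len(adjacency[node]), node))


if __name__ == "__main__":
    # Placeholder labels, not real EXFOR REACTION strings.
    links = [
        ("reaction_A", "monitor_M"),
        ("reaction_B", "monitor_M"),
        ("reaction_C", "monitor_M"),
        ("reaction_A", "reaction_B"),
    ]
    graph = build_graph(links)
    for node in degree_ranking(graph):
        print(node, len(graph[node]))
```

In this toy input the monitor node has the most links, loosely mirroring the abstract's remark that monitor cross sections dominated the results.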
Large-Scale Event Extraction from Literature with Multi-Level Gene Normalization
Wei, Chih-Hsuan; Hakala, Kai; Pyysalo, Sampo; Ananiadou, Sophia; Kao, Hung-Yu; Lu, Zhiyong; Salakoski, Tapio; Van de Peer, Yves; Ginter, Filip
2013-01-01
Text mining for the life sciences aims to aid database curation, knowledge summarization and information retrieval through the automated processing of biomedical texts. To provide comprehensive coverage and enable full integration with existing biomolecular database records, it is crucial that text mining tools scale up to millions of articles and that their analyses can be unambiguously linked to information recorded in resources such as UniProt, KEGG, BioGRID and NCBI databases. In this study, we investigate how fully automated text mining of complex biomolecular events can be augmented with a normalization strategy that identifies biological concepts in text, mapping them to identifiers at varying levels of granularity, ranging from canonicalized symbols to unique gene and proteins and broad gene families. To this end, we have combined two state-of-the-art text mining components, previously evaluated on two community-wide challenges, and have extended and improved upon these methods by exploiting their complementary nature. Using these systems, we perform normalization and event extraction to create a large-scale resource that is publicly available, unique in semantic scope, and covers all 21.9 million PubMed abstracts and 460 thousand PubMed Central open access full-text articles. This dataset contains 40 million biomolecular events involving 76 million gene/protein mentions, linked to 122 thousand distinct genes from 5032 species across the full taxonomic tree. Detailed evaluations and analyses reveal promising results for application of this data in database and pathway curation efforts. The main software components used in this study are released under an open-source license. Further, the resulting dataset is freely accessible through a novel API, providing programmatic and customized access (http://www.evexdb.org/api/v001/). 
Finally, to allow for large-scale bioinformatic analyses, the entire resource is available for bulk download from http://evexdb.org/download/, under the Creative Commons – Attribution – Share Alike (CC BY-SA) license. PMID:23613707
Ravikumar, Komandur Elayavilli; Wagholikar, Kavishwar B; Li, Dingcheng; Kocher, Jean-Pierre; Liu, Hongfang
2015-06-06
Advances in next generation sequencing technology have accelerated the pace of individualized medicine (IM), which aims to incorporate genetic/genomic information into medicine. One immediate need in interpreting sequencing data is the assembly of information about genetic variants and their corresponding associations with other entities (e.g., diseases or medications). Even with dedicated effort to capture such information in biological databases, much of this information remains 'locked' in the unstructured text of biomedical publications. There is a substantial lag between the publication and the subsequent abstraction of such information into databases. Multiple text mining systems have been developed, but most of them focus on the sentence level association extraction with performance evaluation based on gold standard text annotations specifically prepared for text mining systems. We developed and evaluated a text mining system, MutD, which extracts protein mutation-disease associations from MEDLINE abstracts by incorporating discourse level analysis, using a benchmark data set extracted from curated database records. MutD achieves an F-measure of 64.3% for reconstructing protein mutation disease associations in curated database records. The discourse level analysis component of MutD contributed to a gain of more than 10% in F-measure when compared against the sentence level association extraction. Our error analysis indicates that 23 of the 64 precision errors are true associations that were not captured by database curators and 68 of the 113 recall errors are caused by the absence of associated disease entities in the abstract. After adjusting for the defects in the curated database, the revised F-measure of MutD in association detection reaches 81.5%. Our quantitative analysis reveals that MutD can effectively extract protein mutation disease associations when benchmarking based on curated database records.
The analysis also demonstrates that incorporating discourse level analysis significantly improved the performance of extracting the protein-mutation-disease association. Future work includes the extension of MutD for full text articles.
Case history of controlling a landslide at Panluo open-pit mine in China
NASA Astrophysics Data System (ADS)
Wei, Zuoan; Yin, Guangzhi; Wan, Ling; Shen, Louyan
2008-04-01
Controlling landslides safely and economically is a great challenge to mine operators because landslides are major geological problems, especially in open-pit mines. In this paper, a case history at Panluo open-pit mine is presented in detail to share the experiences and lessons with mine operators. Panluo open-pit mine is located in the southwestern Fujian province of China. It is the largest open-pit iron mine in Fujian province; it was planned in 1965 and entered full operation in 1978. In July 1990, an earthquake of magnitude 5.3 in the Taiwan Strait and big rainstorms impacted the mine slope, causing tension cracks and rather large-scale failures, and forming a U-shaped landslide. The total potential volume was estimated to be up to 1.0 × 10⁶ m³. This directly threatened mine production. In order to protect mine production and the safety of nearby dwellers, a dynamic comprehensive method was implemented including geotechnical investigations, in-situ testing and monitoring, stability analysis, and many mitigation and preventive measures. These measures slowed down the development and further occurrence of the landslide. The results showed that the landslides were still active: movement slowed with the control measures and accelerated with rainfall and as mining progressed downward. However, no catastrophic accidents occurred and the pit mining was continued till it was closed at the elevation of 887 m in 2000. As a successful case of landslide control at an open-pit mine for 10 years, this paper reports the controlling measures in detail. These experiences of landslide control may be beneficial to other similar mines.
A semantic model for multimodal data mining in healthcare information systems.
Iakovidis, Dimitris; Smailis, Christos
2012-01-01
Electronic health records (EHRs) are representative examples of multimodal/multisource data collections, including measurements, images and free texts. The diversity of such information sources and the increasing amounts of medical data produced by healthcare institutes annually pose significant challenges in data mining. In this paper we present a novel semantic model that describes knowledge extracted from the lowest level of a data mining process, where information is represented by multiple features, i.e. measurements or numerical descriptors extracted from measurements, images, texts or other medical data, forming multidimensional feature spaces. Knowledge collected by manual annotation or extracted by unsupervised data mining from one or more feature spaces is modeled through generalized qualitative spatial semantics. This model enables a unified representation of knowledge across multimodal data repositories. It contributes to bridging the semantic gap by enabling direct links between low-level features and higher-level concepts, e.g. describing body parts, anatomies and pathological findings. The proposed model has been developed in web ontology language based on description logics (OWL-DL) and can be applied to a variety of data mining tasks in medical informatics. Its utility is demonstrated for automatic annotation of medical data.
Tagawa, Miki; Matsuda, Yoshio; Manaka, Tomoko; Kobayashi, Makiko; Ohwada, Michitaka; Matsubara, Shigeki
2017-01-01
The aim of the study was to examine the possibility of converting subjective textual data written in the free column space of the Mother and Child Handbook (MCH) into objective information using text mining and to compare any monthly changes in the words written by the mothers. Pregnant women without complications (n = 60) were divided into two groups according to State-Trait Anxiety Inventory grade: low trait anxiety (group I, n = 39) and high trait anxiety (group II, n = 21). Exploratory analysis of the textual data from the MCH was conducted by text mining using the Word Miner software program. Using 1203 structural elements extracted after processing, a comparison of monthly changes in the words used in the mothers' comments was made between the two groups. The data was mainly analyzed by a correspondence analysis. The structural elements in groups I and II were divided into seven and six clusters, respectively, by cluster analysis. Correspondence analysis revealed clear monthly changes in the words used in the mothers' comments as the pregnancy progressed in group I, whereas the association was not clear in group II. The text mining method was useful for exploratory analysis of the textual data obtained from pregnant women, and the monthly change in the words used in the mothers' comments as pregnancy progressed differed according to their degree of unease. © 2016 Japan Society of Obstetrics and Gynecology.
A Text-Mining Framework for Supporting Systematic Reviews.
Li, Dingcheng; Wang, Zhen; Wang, Liwei; Sohn, Sunghwan; Shen, Feichen; Murad, Mohammad Hassan; Liu, Hongfang
2016-11-01
Systematic reviews (SRs) involve the identification, appraisal, and synthesis of all relevant studies for focused questions in a structured reproducible manner. High-quality SRs follow strict procedures and require significant resources and time. We investigated advanced text-mining approaches to reduce the burden associated with abstract screening in SRs and provide a high-level information summary. A text-mining SR supporting framework consisting of three self-defined semantics-based ranking metrics was proposed, including keyword relevance, indexed-term relevance and topic relevance. Keyword relevance is based on the user-defined keyword list used in the search strategy. Indexed-term relevance is derived from indexed vocabulary developed by domain experts used for indexing journal articles and books. Topic relevance is defined as the semantic similarity among retrieved abstracts in terms of topics generated by latent Dirichlet allocation, a Bayesian-based model for discovering topics. We tested the proposed framework using three published SRs addressing a variety of topics (Mass Media Interventions, Rectal Cancer and Influenza Vaccine). The results showed that when 91.8%, 85.7%, and 49.3% of the abstract screening labor was saved, the recalls were as high as 100% for the three cases, respectively. Relevant studies identified manually showed strong topic similarity through topic analysis, which supported the inclusion of topic analysis as a relevance metric. It was demonstrated that advanced text mining approaches can significantly reduce the abstract screening labor of SRs and provide an informative summary of relevant studies.
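The relationship between ranking abstracts and "screening labor saved at 100% recall" can be illustrated with a minimal sketch. It assumes a simple substring-based keyword relevance score, which is only a stand-in for the framework's three metrics; the function names, keywords, abstracts and relevance labels are all invented for illustration.

```python
def keyword_relevance(abstract, keywords):
    """Fraction of the user-defined keywords that occur in the abstract (case-insensitive)."""
    text = abstract.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits / len(keywords)


def work_saved_at_full_recall(scores, relevant):
    """Screen abstracts in descending score order and return the fraction
    left unread once every relevant abstract has been seen."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    remaining = set(relevant)
    for rank, idx in enumerate(order, start=1):
        remaining.discard(idx)
        if not remaining:
            return 1 - rank / len(scores)
    return 0.0


# Toy data, invented for illustration only.
keywords = ["influenza", "vaccine", "trial"]
abstracts = [
    "A randomized trial of influenza vaccine in older adults.",
    "Landslide control at an open-pit mine.",
    "Influenza vaccine uptake among health workers.",
    "Adsorption isotherms of coal samples.",
]
relevant = {0, 2}  # indices a human reviewer would include (assumed labels)

scores = [keyword_relevance(a, keywords) for a in abstracts]
saved = work_saved_at_full_recall(scores, relevant)
print(scores, saved)
```

In this toy case both relevant abstracts rank first, so half of the list never needs to be read. A real evaluation would use held-out inclusion decisions from a published review.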
Bravo, Àlex; Piñero, Janet; Queralt-Rosinach, Núria; Rautschka, Michael; Furlong, Laura I
2015-02-21
Current biomedical research needs to leverage and exploit the large amount of information reported in scientific publications. Automated text mining approaches, in particular those aimed at finding relationships between entities, are key for identification of actionable knowledge from free text repositories. We present the BeFree system aimed at identifying relationships between biomedical entities with a special focus on genes and their associated diseases. By exploiting morpho-syntactic information of the text, BeFree is able to identify gene-disease, drug-disease and drug-target associations with state-of-the-art performance. The application of BeFree to real-case scenarios shows its effectiveness in extracting information relevant for translational research. We show the value of the gene-disease associations extracted by BeFree through a number of analyses and integration with other data sources. BeFree succeeds in identifying genes associated with a major cause of morbidity worldwide, depression, which are not present in other public resources. Moreover, large-scale extraction and analysis of gene-disease associations, and integration with current biomedical knowledge, provided interesting insights on the kind of information that can be found in the literature, and raised challenges regarding data prioritization and curation. We found that only a small proportion of the gene-disease associations discovered by using BeFree is collected in expert-curated databases. Thus, there is a pressing need to find alternative strategies to manual curation, in order to review, prioritize and curate text-mining data and incorporate it into domain-specific databases. We present our strategy for data prioritization and discuss its implications for supporting biomedical research and applications. BeFree is a novel text mining system that performs competitively for the identification of gene-disease, drug-disease and drug-target associations.
Our analyses show that mining only a small fraction of MEDLINE results in a large dataset of gene-disease associations, and only a small proportion of this dataset is actually recorded in curated resources (2%), raising several issues on data prioritization and curation. We propose that joint analysis of text mined data with data curated by experts appears as a suitable approach to both assess data quality and highlight novel and interesting information.
tmBioC: improving interoperability of text-mining tools with BioC.
Khare, Ritu; Wei, Chih-Hsuan; Mao, Yuqing; Leaman, Robert; Lu, Zhiyong
2014-01-01
The lack of interoperability among biomedical text-mining tools is a major bottleneck in creating more complex applications. Despite the availability of numerous methods and techniques for various text-mining tasks, combining different tools requires substantial effort and time owing to heterogeneity and variety in data formats. In response, BioC is a recent proposal that offers a minimalistic approach to tool interoperability by stipulating minimal changes to existing tools and applications. BioC is a family of XML formats that define how to present text documents and annotations, and also provides easy-to-use functions to read/write documents in the BioC format. In this study, we introduce our text-mining toolkit, which is designed to perform several challenging and significant tasks in the biomedical domain, and repackage the toolkit into BioC to enhance its interoperability. Our toolkit consists of six state-of-the-art tools for named-entity recognition, normalization and annotation (PubTator) of genes (GenNorm), diseases (DNorm), mutations (tmVar), species (SR4GN) and chemicals (tmChem). Although developed within the same group, each tool is designed to process input articles and output annotations in a different format. We modify these tools and enable them to read/write data in the proposed BioC format. We find that, using the BioC family of formats and functions, only minimal changes were required to build the newer versions of the tools. The resulting BioC wrapped toolkit, which we have named tmBioC, consists of our tools in BioC, an annotated full-text corpus in BioC, and a format detection and conversion tool. Furthermore, through participation in the 2013 BioCreative IV Interoperability Track, we empirically demonstrate that the tools in tmBioC can be more efficiently integrated with each other as well as with external tools: our experimental results show that using BioC reduces the lines of code needed for text-mining tool integration by more than 60%.
The tmBioC toolkit is publicly available at http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/. Database URL: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/. Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.
Feasibility of CO2 Sequestration as a Closure Option for Underground Coal Mine
NASA Astrophysics Data System (ADS)
Ray, Sutapa; Dey, Kaushik
2018-04-01
The Kyoto Protocol, 1998, was signed by member countries to reduce greenhouse gas (GHG) emissions to a minimum acceptable level. India has been a party to the Kyoto Protocol since 2002 and has started its research on GHG mitigation. A few researchers have carried out research work on CO2 sequestration in different rock formations. However, CO2 sequestration in abandoned mines has not yet drawn much attention. In the past few years or decades, a significant amount of research and development has been done on Carbon Capture and Storage (CCS) technologies, since CCS is a possible solution for reducing CO2 emissions to the atmosphere from power plants and some other major industrial plants. CCS mainly involves three steps: (a) capture and compression of CO2 from the source (power plants and industrial areas), (b) transportation of the captured CO2 to the storage mine and (c) injection of CO2 into the underground mine. CO2 is stored at an underground mine mainly in three forms: (1) adsorbed in the coals left as pillars of the mine, (2) absorbed in water through a chemical process and (3) filling voids as compressed CO2. An adsorption isotherm is a graph relating the amount of adsorbate adsorbed on the surface of the adsorbent to the pressure at constant temperature. Various types of adsorption isotherms are available, namely, Freundlich, Langmuir and BET theory. Indian coal is different in origin from most of the international coal deposits and thus demands its own isotherm experiments to arrive at the right adsorption isotherm. To carry out these experiments, an adsorption isotherm setup was fabricated in the laboratory with a capacity to measure the adsorbed volume up to a pressure level of 100 bar. The coal samples were collected from the pillars and walls of the underground coal seam using a portable drill machine. The adsorption isotherm experiments have been carried out for the samples taken from a mine.
From the adsorption isotherm experiments, the Langmuir equation is found to be more acceptable than the Freundlich and BET adsorption isotherm models. CO2 is soluble in water and reversibly forms carbonic acid. It is a weak acid since its ionization in water is incomplete. The CO2 solubility in water is estimated from the experimental results published by Wiebe and Gaddy. In most abandoned mines, the chance of air-filled void space being available is limited, as the level of operation is below the water table. So it is expected that the void would be completely filled with water. Practical experimentation for CO2 sequestration was not within the scope of this research investigation. Thus, one operating mine was considered for the feasibility study. The sequestrated quantities of CO2 in terms of adsorbed volume and soluble volume were quantified. The cost of the CO2 was taken from the standard international literature. The sealing cost of the shaft was also considered. Costs of CO2 sequestration for different pressures were estimated for the mine.
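As a rough illustration of the isotherm models named above, the following sketch evaluates the standard Langmuir form V = V_L·P/(P_L + P) and the Freundlich form V = k·P^(1/n), and recovers the Langmuir constants from pressure/volume pairs through the usual linearization P/V = P/V_L + P_L/V_L. The parameter values and the synthetic data are assumptions made for illustration and are not measurements from the study.

```python
def langmuir(p, v_l, p_l):
    """Langmuir isotherm: adsorbed volume at pressure p.
    v_l is the Langmuir volume (capacity), p_l the Langmuir pressure."""
    return v_l * p / (p_l + p)


def freundlich(p, k, n):
    """Freundlich isotherm: adsorbed volume at pressure p."""
    return k * p ** (1.0 / n)


def fit_langmuir(pressures, volumes):
    """Least-squares line through (P, P/V): slope = 1/V_L, intercept = P_L/V_L."""
    xs = list(pressures)
    ys = [p / v for p, v in zip(pressures, volumes)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    v_l = 1.0 / slope
    p_l = intercept * v_l
    return v_l, p_l


# Synthetic data up to 100 bar; the "true" constants are hypothetical.
pressures = [5.0, 10.0, 20.0, 40.0, 80.0, 100.0]           # bar
volumes = [langmuir(p, 25.0, 30.0) for p in pressures]      # arbitrary volume units
v_l_fit, p_l_fit = fit_langmuir(pressures, volumes)
print(v_l_fit, p_l_fit)
```

With noisy laboratory data, a nonlinear fit and a goodness-of-fit comparison against the Freundlich and BET models would be needed before preferring one isotherm over another.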
NASA Astrophysics Data System (ADS)
Tyulenev, Maxim; Lesin, Yury; Litvin, Oleg; Maliukhina, Elena; Abay, Asmelash
2017-11-01
Features of the geological structure of the Kuznetsk coal basin determine the application of a low-cost open technique of coal mining, which is more advantageous both from the economic standpoint and by the safety criteria of mining. However, open mining significantly affects the water resources of the region. Intensive pollution of reservoirs and water courses, exhaustion of the underground water-bearing layers, disruption of the hydrographic network, etc. belong to the main disadvantages of the open technique of coal mining. Besides, the volume of the water coming into the mining producers significantly exceeds the needed quantity. According to the data of the annual reports of the ecology and natural resources department, 348.277 million m^3 of water were taken from the waters of the Kemerovo region in 2013 during production of soft coal, brown coal and lignum fossil (mostly from underground water objects (96.5%) when draining mine openings). At the same time, only 87.018 million m^3 of water (25%) was used within the year.
Mining Health-Related Issues in Consumer Product Reviews by Using Scalable Text Analytics
Torii, Manabu; Tilak, Sameer S.; Doan, Son; Zisook, Daniel S.; Fan, Jung-wei
2016-01-01
In an era when most of our life activities are digitized and recorded, opportunities abound to gain insights about population health. Online product reviews present a unique data source that is currently underexplored. Health-related information, although scarce, can be systematically mined in online product reviews. Leveraging natural language processing and machine learning tools, we were able to mine 1.3 million grocery product reviews for health-related information. The objectives of the study were as follows: (1) conduct quantitative and qualitative analysis on the types of health issues found in consumer product reviews; (2) develop a machine learning classifier to detect reviews that contain health-related issues; and (3) gain insights about the task characteristics and challenges for text analytics to guide future research. PMID:27375358
A Volterra series-based method for extracting target echoes in the seafloor mining environment.
Zhao, Haiming; Ji, Yaqian; Hong, Yujiu; Hao, Qi; Ma, Liyong
2016-09-01
The purpose of this research was to evaluate the applicability of the Volterra adaptive method to predict the target echo of an ultrasonic signal in an underwater seafloor mining environment. There is growing interest in mining of seafloor minerals because they offer an alternative source of rare metals. Mining the minerals causes the seafloor sediments to be stirred up and suspended in sea water. In such an environment, the target signals used for seafloor mapping cannot be detected because of the unavoidable presence of volume reverberation induced by the suspended sediments. The detection of target signals in reverberation is currently performed using a stochastic model (for example, the autoregressive (AR) model) based on the statistical characterisation of reverberation. However, we examined a new method of signal detection in volume reverberation based on the Volterra series by confirming that the reverberation is a chaotic signal generated by a deterministic process. The advantage of this method over the stochastic model is that attributes of the specific physical process are considered in the signal detection problem. To test the Volterra series based method and its applicability to target signal detection in the volume reverberation environment derived from the seafloor mining process, we simulated the real-life conditions of seafloor mining in a water-filled tank with dimensions of 5 × 3 × 1.8 m. The bottom of the tank was covered with a 10 cm irregular sand layer, under which a 5 cm irregular layer of cobalt-rich crusts was placed. The bottom was interrogated by an acoustic wave generated as 16 μs pulses at a frequency of 500 kHz. This frequency is demonstrated to ensure a resolution on the order of one centimetre, which is adequate in exploration practice. Echo signals were collected with a data acquisition card (PCI 1714 UL, 12-bit). Detection of the target echo in these signals was performed by both the Volterra series based model and the AR model.
The results obtained confirm that the Volterra series based method is more efficient in the detection of the signal in reverberation than the conventional AR model (the accuracy is 80% for the PIM-Volterra prediction model versus 40% for the AR model). Copyright © 2016 Elsevier B.V. All rights reserved.
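The contrast between a linear AR predictor and a Volterra-series predictor on a deterministic chaotic signal can be shown with a toy sketch. This is not the PIM-Volterra model of the abstract: it assumes the logistic map as a stand-in chaotic signal, fits both models in-sample by ordinary least squares, and uses a memory of one sample; all function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def least_squares(features, targets):
    """Ordinary least squares via the normal equations (fine for tiny models)."""
    k = len(features[0])
    ata = [[sum(f[i] * f[j] for f in features) for j in range(k)] for i in range(k)]
    atb = [sum(f[i] * t for f, t in zip(features, targets)) for i in range(k)]
    return solve_linear(ata, atb)


def mse(features, targets, coeffs):
    """Mean squared one-step prediction error."""
    errs = [(sum(c * v for c, v in zip(coeffs, f)) - t) ** 2
            for f, t in zip(features, targets)]
    return sum(errs) / len(errs)


# Stand-in chaotic signal: logistic map x[n+1] = 3.9 * x[n] * (1 - x[n]).
x = [0.3]
for _ in range(300):
    x.append(3.9 * x[-1] * (1.0 - x[-1]))
past, future = x[:-1], x[1:]

# AR(1): x[n+1] ~ a0 + a1 * x[n]
ar_features = [[1.0, v] for v in past]
# Second-order Volterra, memory 1: x[n+1] ~ h0 + h1 * x[n] + h2 * x[n]**2
volterra_features = [[1.0, v, v * v] for v in past]

ar_mse = mse(ar_features, future, least_squares(ar_features, future))
volterra_mse = mse(volterra_features, future,
                   least_squares(volterra_features, future))
print(ar_mse, volterra_mse)
```

The quadratic kernel captures the deterministic rule exactly, while the linear model cannot. Real reverberation data would require longer memory, held-out evaluation, and an adaptive update rather than a batch fit.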
Weighted mining of massive collections of p-values by convex optimization.
Dobriban, Edgar
2018-06-01
Researchers in data-rich disciplines (think of computational genomics and observational cosmology) often wish to mine large bodies of p-values looking for significant effects, while controlling the false discovery rate or family-wise error rate. Increasingly, researchers also wish to prioritize certain hypotheses, for example, those thought to have larger effect sizes, by upweighting, and to impose constraints on the underlying mining, such as monotonicity along a certain sequence. We introduce Princessp, a principled method for performing weighted multiple testing by constrained convex optimization. Our method elegantly allows one to prioritize certain hypotheses through upweighting and to discount others through downweighting, while constraining the underlying weights involved in the mining process. When the p-values derive from monotone likelihood ratio families such as the Gaussian means model, the new method allows exact solution of an important optimal weighting problem previously thought to be non-convex and computationally infeasible. Our method scales to massive data set sizes. We illustrate the applications of Princessp on a series of standard genomics data sets and offer comparisons with several previous 'standard' methods. Princessp offers both ease of operation and the ability to scale to extremely large problem sizes. The method is available as open-source software from github.com/dobriban/pvalue_weighting_matlab (accessed 11 October 2017).
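The effect of upweighting and downweighting hypotheses can be seen in the classical weighted Bonferroni procedure, sketched below. This is not Princessp itself, which chooses the weights by convex optimization; here the weights are simply given, and the p-values and weights are invented for illustration. With non-negative weights that average to one, rejecting hypothesis i when p_i ≤ α·w_i/m controls the family-wise error rate at level α.

```python
def weighted_bonferroni(p_values, weights, alpha=0.05):
    """Return one rejection decision per hypothesis.

    Weights are rescaled to average one, so the procedure spends the same
    total error budget as ordinary Bonferroni, redistributed across hypotheses.
    """
    m = len(p_values)
    if len(weights) != m:
        raise ValueError("need one weight per p-value")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    normalized = [w * m / total for w in weights]
    return [p <= alpha * w / m for p, w in zip(p_values, normalized)]


# Invented example: the second hypothesis is believed a priori to be promising.
p_values = [0.004, 0.02, 0.03, 0.5]
weights = [1.0, 2.5, 0.25, 0.25]

rejected = weighted_bonferroni(p_values, weights)
unweighted = weighted_bonferroni(p_values, [1.0] * len(p_values))
print(rejected, unweighted)
```

Upweighting lets the second hypothesis pass a threshold it would miss under equal weights, at the cost of stricter thresholds for the downweighted ones.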
30 CFR 900.15 - Federal lands program cooperative agreements.
Code of Federal Regulations, 2010 CFR
2010-07-01
.... 900.15 Section 900.15 Mineral Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR PROGRAMS FOR THE CONDUCT OF SURFACE MINING OPERATIONS WITHIN EACH STATE INTRODUCTION § 900.15 Federal lands program cooperative agreements. The full text of any State and Federal...
30 CFR 900.12 - State regulatory programs.
Code of Federal Regulations, 2010 CFR
2010-07-01
....12 Mineral Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR PROGRAMS FOR THE CONDUCT OF SURFACE MINING OPERATIONS WITHIN EACH STATE INTRODUCTION § 900.12 State... to be codified under the applicable part number assigned to the State. The full text will not appear...
Literature Mining Methods for Toxicology and Construction of ...
Webinar Presentation on text-mining methodologies in use at NCCT and how they can be used to assist with the OECD Retinoid project. Presentation to 1st Workshop/Scientific Expert Group meeting on the OECD Retinoid Project - April 26, 2016 –Brussels, Presented remotely via web.
Untangling Topic Threads in Chat-Based Communication: A Case Study
2011-08-01
learning techniques such as clustering are very popular for analyzing text for topic identification (Anjewierden, Kollöffel and Hulshof 2007; Adams...Anjewierden, A., Kollöffel, B., and Hulshof, C. (2007). Towards educational data mining: Using data mining methods for automated chat analysis to
Smith, D. Charlie
2016-12-14
Lead and zinc were mined in the Tri-State Mining District (TSMD) of southwest Missouri, northeast Oklahoma, and southeast Kansas for more than 100 years. The effects of mining on the landscape are still evident, nearly 50 years after the last mine ceased operation. The legacies of mining are the mine waste and discharge of groundwater from underground mines. The mine-waste piles and underground mines are continuous sources of trace metals (primarily lead, zinc, and cadmium) to the streams that drain the TSMD. Many previous studies characterized the horizontal extent of mine-waste contamination in streams but little information exists on the depth of mine-waste contamination in these streams. Characterizing the vertical extent of contamination is difficult because of the large amount of coarse-grained material, ranging from coarse gravel to boulders, within channel sediment. The U.S. Geological Survey, in cooperation with the U.S. Fish and Wildlife Service, collected channel-sediment samples at depth for subsequent analyses that would allow attainment of the following goals: (1) determination of the relation between concentration and depth for lead, zinc and cadmium in channel sediments and flood-plain sediments, and (2) determination of the volume of gravel-bar sediment from the surface to the maximum depth with concentrations of these metals that exceeded sediment-quality guidelines. For the purpose of this report, volume of gravel-bar sediment is considered to be distributed in two forms, gravel bars and the wetted channel, and this study focused on gravel bars.
Concentrations of lead, zinc, and cadmium in samples were compared to the consensus probable effects concentration (CPEC) and Tri-State Mining District specific probable effects concentration (TPEC) sediment-quality guidelines. During the study, more than 700 sediment samples were collected from borings at multiple sites, including gravel bars and flood plains, along Center Creek, Turkey Creek, Shoal Creek, Tar Creek, and Spring River in order to characterize the vertical extent of mine waste in select streams in the TSMD. The largest concentrations of lead, zinc, and cadmium in gravel bar-sediment samples generally were detected in Turkey Creek and Tar Creek and the smallest concentrations were detected in Shoal Creek followed by the Spring River. Gravel bar-sediment samples from Turkey Creek exceeded the CPEC for cadmium (minimum of 70 percent of samples), lead (94 percent), and zinc (99 percent) at a slightly higher frequency than similar samples from Tar Creek (69 percent, 88 percent, and 96 percent, respectively). Gravel bar-sediment samples from Turkey Creek also contained the largest concentrations of cadmium (174 milligrams per kilogram [mg/kg]) and lead (7,520 mg/kg) detected; however, the largest zinc concentration (46,600 mg/kg) was detected in a gravel bar-sediment sample from Tar Creek. In contrast, none of the 65 gravel bar-sediment samples from Shoal Creek contained cadmium above the x-ray fluorescence reporting level of 12 mg/kg, and lead and zinc exceeded the CPEC in only 12 percent and 74 percent of samples, respectively. In most cases, concentrations of lead and zinc above the CPEC or TPEC were present at the maximum depth of boring, which indicated that nearly the entire thickness of sediment in the stream has been contaminated by mine wastes.
Approximately 284,000 cubic yards of channel sediment from land surface to the maximum depth that exceeded the CPEC and approximately 236,000 cubic yards of channel sediment from land surface to the maximum depth that exceeded the TPEC were estimated along 37.6 of the 55.1 miles of Center Creek, Turkey Creek, Shoal Creek, and Tar Creek examined in this study. Mine-waste contamination reported along additional reaches of these streams is beyond the scope of this study. Flood-plain cores collected in the TSMD generally only had exceedances of the CPEC and TPEC for lead and zinc in the top 1 or 2 feet of soil with a few exceptions, such as cores in low areas near the stream or cores in areas disturbed by past mining.
The Effects of Sand Sediment Volume Heterogeneities on Sound Propagation and Scattering
2012-09-30
modulus of a poroelastic medium,” J. Acoust. Soc. Am. 127, 3372–3384 (2010). 3. K. L. Williams, “An effective density fluid model for acoustic ...previously developed at APL-UW for the study of high-frequency acoustics. These models include perturbation models applied to scattering from the...scattering levels that may mask target detection. RELATED PROJECTS 1. “Acoustic Color of mines and mine-like objects: Finite Element modeling (FEM
FlyMine: an integrated database for Drosophila and Anopheles genomics
Lyne, Rachel; Smith, Richard; Rutherford, Kim; Wakeling, Matthew; Varley, Andrew; Guillier, Francois; Janssens, Hilde; Ji, Wenyan; Mclaren, Peter; North, Philip; Rana, Debashis; Riley, Tom; Sullivan, Julie; Watkins, Xavier; Woodbridge, Mark; Lilley, Kathryn; Russell, Steve; Ashburner, Michael; Mizuguchi, Kenji; Micklem, Gos
2007-01-01
FlyMine is a data warehouse that addresses one of the important challenges of modern biology: how to integrate and make use of the diversity and volume of current biological data. Its main focus is genomic and proteomics data for Drosophila and other insects. It provides web access to integrated data at a number of different levels, from simple browsing to construction of complex queries, which can be executed on either single items or lists. PMID:17615057
Folksong in the Classroom. Volume XI, Numbers 1-3, 1990-91.
ERIC Educational Resources Information Center
Scott, John W., Ed.
1991-01-01
This volume of a journal on folksong for elementary and secondary teachers of history, literature, music, and the humanities contains three issues. The Fall 1990 issue is devoted to the songs of Newfoundland. The Winter 1991 issue features songs concerning mine, mill and tunnel workers in the years 1877-1932. The Spring 1991 issue focuses on songs…
Comparison between BIDE, PrefixSpan, and TRuleGrowth for Mining of Indonesian Text
NASA Astrophysics Data System (ADS)
Sa'adillah Maylawati, Dian; Irfan, Mohamad; Budiawan Zulfikar, Wildan
2017-01-01
Mining processes for the Indonesian language remain an interesting research topic. Representing text as multiple words (word sequences) has been claimed to preserve the meaning of text better than a bag of words. In this paper, we compare several sequential pattern algorithms, namely BIDE (BI-Directional Extension), PrefixSpan, and TRuleGrowth. All of these algorithms produce frequent word sequences that keep the meaning of the text. The experimental results, with 14,006 Indonesian tweets from Twitter, show that BIDE produces a more efficient set of frequent word sequences than PrefixSpan and TRuleGrowth without losing the meaning of the text. The average processing time of PrefixSpan is shorter than that of BIDE and TRuleGrowth. On the other hand, PrefixSpan and TRuleGrowth use memory more efficiently than BIDE.
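To make "frequent word sequence" concrete, here is a minimal PrefixSpan-style sketch for sequences of single words. It is an illustrative simplification (single-item elements, gaps allowed between words, no closed-pattern pruning as in BIDE), and the three example sentences are invented rather than taken from the study's tweet collection.

```python
def prefixspan(sequences, min_support):
    """Return {pattern (tuple of words): support} for all word subsequences
    that occur in at least min_support sequences."""
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence index, start position) pairs: the suffix of
        # each sequence that follows the first occurrence of the current prefix.
        counts = {}
        for seq_id, start in projected:
            seen = set()
            for word in sequences[seq_id][start:]:
                if word not in seen:  # count each word once per sequence
                    seen.add(word)
                    counts[word] = counts.get(word, 0) + 1
        for word, support in counts.items():
            if support < min_support:
                continue
            pattern = prefix + (word,)
            results[pattern] = support
            new_projected = []
            for seq_id, start in projected:
                seq = sequences[seq_id]
                for pos in range(start, len(seq)):
                    if seq[pos] == word:
                        new_projected.append((seq_id, pos + 1))
                        break
            mine(pattern, new_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


# Invented example sentences, tokenized on whitespace.
tweets = [
    "saya suka makan nasi goreng".split(),
    "saya suka nasi goreng pedas".split(),
    "dia suka makan nasi".split(),
]
patterns = prefixspan(tweets, min_support=2)
for pattern, support in sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0])):
    print(support, " ".join(pattern))
```

Because PrefixSpan enumerates every frequent subsequence, the output contains many patterns that are sub-patterns of longer ones with the same support; closed-pattern miners such as BIDE report only the non-redundant ones.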
Church, Stan E.; Von Guerard, Paul; Finger, Susan E.
2007-01-01
This publication comprises a Volume Contents of chapters (listed below) and a CD-ROM of data (contents shown in column at right). The Animas River watershed in southwest Colorado is one of many watersheds in the western United States where historical mining has left a legacy of acid mine drainage and elevated concentrations of potentially toxic trace elements in surface streams. U.S. Geological Survey scientists have completed a major assessment of the environmental effects of historical mining in the Animas River watershed focusing on the area upstream of Silverton, Colo.: the Mineral Creek, Cement Creek, and upper Animas River basins. The study demonstrated how the watershed approach can be used to assess and rank mining-affected sites for possible cleanup. The study was conducted in collaboration with State and Federal land-management agencies and regional stakeholders groups. This book is available for purchase at Information Services, U.S. Geological Survey (1-888-ASK-USGS).
Otton, James K.
2011-01-01
Studies of the natural environment in the Grants Mineral Belt in northwestern New Mexico have been conducted since the 1930s; however, few such investigations predate uranium mining and milling operations, which began in the early 1950s. This report provides an annotated bibliography of reports that describe the hydrology and geochemistry of groundwaters and surface waters and the geochemistry of soils and sediments in the Grants Mineral Belt and contiguous areas. The reports referenced and discussed provide a large volume of information about the environmental conditions in the area after mining started. Data presented in many of these studies, if evaluated carefully, may provide much basic information about the baseline conditions that existed over large parts of the Grants Mineral Belt prior to mining. Other data may provide information that can direct new work in efforts to discriminate between baseline conditions and the effects of the mining and milling on the natural environment.
Online discourse on fibromyalgia: text-mining to identify clinical distinction and patient concerns.
Park, Jungsik; Ryu, Young Uk
2014-10-07
The purpose of this study was to evaluate the possibility of using text-mining to identify clinical distinctions and patient concerns in online memoirs posted by patients with fibromyalgia (FM). A total of 399 memoirs were collected from an FM group website. The unstructured data of memoirs associated with FM were collected through a crawling process and converted into structured data with a concordance, parts of speech tagging, and word frequency. We also conducted a lexical analysis and phrase pattern identification. After examining the data, a set of FM-related keywords was obtained and phrase net relationships were set through a web-based visualization tool. The clinical distinction of FM was verified. Pain is the biggest issue for FM patients. The pains were affecting body parts including 'muscles,' 'leg,' 'neck,' 'back,' 'joints,' and 'shoulders' with accompanying symptoms such as 'spasms,' 'stiffness,' and 'aching,' and were described as 'sever,' 'chronic,' and 'constant.' This study also demonstrated that it was possible to understand the interests and concerns of FM patients through text-mining. FM patients wanted to escape from the pain and symptoms, so they were interested in medical treatment and help. Also, they seemed to have an interest in their work and occupation, and hoped to continue to live life through their relationships with the people around them. This research shows the potential for extracting keywords to confirm the clinical distinction of a certain disease, and text-mining can help objectively understand the concerns of patients by generalizing their large number of subjective illness experiences. However, it is believed that there are limitations to the processes and methods for organizing and classifying large amounts of text, so these limits have to be considered when analyzing the results. The development of research methodology to overcome these limitations is greatly needed.
Unsupervised text mining for assessing and augmenting GWAS results.
Ailem, Melissa; Role, François; Nadif, Mohamed; Demenais, Florence
2016-04-01
Text mining can assist in the analysis and interpretation of large-scale biomedical data, helping biologists to quickly and cheaply gain confirmation of hypothesized relationships between biological entities. We set this question in the context of genome-wide association studies (GWAS), an actively emerging field that has helped identify many genes associated with multifactorial diseases. These studies make it possible to identify groups of genes associated with the same phenotype, but provide no information about the relationships between these genes. Therefore, our objective is to leverage unsupervised text mining techniques using text-based cosine similarity comparisons and clustering applied to candidate and random gene vectors, in order to augment the GWAS results. We propose a generic framework which we used to characterize the relationships between 10 genes reported to be associated with asthma by a previous GWAS. The results of this experiment showed that the similarities between these 10 genes were significantly stronger than would be expected by chance (one-sided p-value<0.01). The clustering of observed and randomly selected genes also made it possible to generate hypotheses about potential functional relationships between these genes and thus contributed to the discovery of new candidate genes for asthma. Copyright © 2016 Elsevier Inc. All rights reserved.
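The comparison of candidate genes against random genes described in this abstract can be illustrated with a small Monte Carlo sketch. The abstract does not give the authors' actual procedure, so everything here is an assumption made for illustration: the function names, the use of mean pairwise cosine similarity as the test statistic, and the resampling scheme are all hypothetical.

```python
import math
import random
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_pairwise_similarity(vectors):
    """Average cosine similarity over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def one_sided_p_value(candidates, background, n_draws=1000, seed=0):
    """Estimate how often random gene sets look at least as similar
    as the candidate set (hypothetical test, not the paper's)."""
    rng = random.Random(seed)
    observed = mean_pairwise_similarity(candidates)
    hits = 0
    for _ in range(n_draws):
        # Draw a random gene set of the same size as the candidate set.
        sample = rng.sample(background, len(candidates))
        if mean_pairwise_similarity(sample) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_draws + 1)
```

Here each gene would be represented by a vector of term weights built from the literature that mentions it; a small p-value suggests the candidate genes are textually closer to one another than randomly chosen genes.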
Stansfield, Claire; O'Mara-Eves, Alison; Thomas, James
2017-09-01
Using text mining to aid the development of database search strings for topics described by diverse terminology has potential benefits for systematic reviews; however, methods and tools for accomplishing this are poorly covered in the research methods literature. We briefly review the literature on applications of text mining for search term development for systematic reviewing. We found that the tools can be used in 5 overarching ways: improving the precision of searches; identifying search terms to improve search sensitivity; aiding the translation of search strategies across databases; searching and screening within an integrated system; and developing objectively derived search strategies. Using a case study and selected examples, we then reflect on the utility of certain technologies (term frequency-inverse document frequency and Termine, term frequency, and clustering) in improving the precision and sensitivity of searches. Challenges in using these tools are discussed. The utility of these tools is influenced by the different capabilities of the tools, the way the tools are used, and the text that is analysed. Increased awareness of how the tools perform facilitates the further development of methods for their use in systematic reviews. Copyright © 2017 John Wiley & Sons, Ltd.
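As a rough illustration of the term frequency-inverse document frequency weighting mentioned in this abstract, the sketch below scores terms in a small set of tokenised records. It assumes the common tf × log(N/df) variant; the tools reviewed in the paper may use different weightings, and the function name is hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per tokenised document.

    Uses relative term frequency times log(N / document frequency),
    which is only one of several TF-IDF variants.
    """
    n_docs = len(documents)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores
```

Terms that appear in every record score zero, while terms concentrated in a few records score higher, which is why such weights can suggest candidate search terms that discriminate between relevant and irrelevant records.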
U-Compare: share and compare text mining tools with UIMA.
Kano, Yoshinobu; Baumgartner, William A; McCrohon, Luke; Ananiadou, Sophia; Cohen, K Bretonnel; Hunter, Lawrence; Tsujii, Jun'ichi
2009-08-01
Due to the increasing number of text mining resources (tools and corpora) available to biologists, interoperability issues between these resources are becoming significant obstacles to using them effectively. UIMA, the Unstructured Information Management Architecture, is an open framework designed to aid in the construction of more interoperable tools. U-Compare is built on top of the UIMA framework, and provides both a concrete framework for out-of-the-box text mining and a sophisticated evaluation platform allowing users to run specific tools on any target text, generating both detailed statistics and instance-based visualizations of outputs. U-Compare is a joint project, providing the world's largest, and still growing, collection of UIMA-compatible resources. These resources, originally developed by different groups for a variety of domains, include many famous tools and corpora. U-Compare can be launched straight from the web, without needing to be manually installed. All U-Compare components are provided ready-to-use and can be combined easily via a drag-and-drop interface without any programming. External UIMA components can also simply be mixed with U-Compare components, without distinguishing between locally and remotely deployed resources. http://u-compare.org/
Perchlorate in Lake Water from an Operating Diamond Mine.
Smith, Lianna J D; Ptacek, Carol J; Blowes, David W; Groza, Laura G; Moncur, Michael C
2015-07-07
Mining-related perchlorate [ClO4(-)] in the receiving environment was investigated at the operating open-pit and underground Diavik diamond mine, Northwest Territories, Canada. Samples were collected over four years and ClO4(-) was measured in various mine waters, the 560 km(2) ultraoligotrophic receiving lake, background lake water and snow distal from the mine. Groundwaters from the underground mine had variable ClO4(-) concentrations, up to 157 μg L(-1), and were typically an order of magnitude higher than concentrations in combined mine waters prior to treatment and discharge to the lake. Snow core samples had a mean ClO4(-) concentration of 0.021 μg L(-1) (n=16). Snow and lake water Cl(-)/ClO4(-) ratios suggest evapoconcentration was not an important process affecting lake ClO4(-) concentrations. The multiyear mean ClO4(-) concentrations in the lake were 0.30 μg L(-1) (n = 114) in open water and 0.24 μg L(-1) (n = 107) under ice, much below the Canadian drinking water guideline of 6 μg L(-1). Receiving lake concentrations of ClO4(-) generally decreased year over year and ClO4(-) was not likely [biogeo]chemically attenuated within the receiving lake. The discharge of treated mine water was shown to contribute mining-related ClO4(-) to the lake and the low concentrations after 12 years of mining were attributed to the large volume of the receiving lake.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kargupta, H.; Stafford, B.; Hamzaoglu, I.
This paper describes an experimental parallel/distributed data mining system PADMA (PArallel Data Mining Agents) that uses software agents for local data accessing and analysis and a web based interface for interactive data visualization. It also presents the results of applying PADMA for detecting patterns in unstructured texts of postmortem reports and laboratory test data for Hepatitis C patients.
29 CFR 570.33 - Prohibited occupations for minors 14 and 15 years of age.
Code of Federal Regulations, 2010 CFR
2010-07-01
... shall apply to all occupations other than the following: (a) Manufacturing, mining, or processing... revised text is set forth as follows: § 570.33 Occupations that are prohibited to minors 14 and 15 years... age: (a) Manufacturing, mining, or processing occupations, including occupations requiring the...
Federal Register 2010, 2011, 2012, 2013, 2014
2011-07-28
... considered but eliminated from detailed analysis include conventional uranium mining and milling, conventional mining and heap leach processing, alternative site location, alternate lixiviants, and alternate...'s Agencywide Document Access and Management System (ADAMS), which provides text and image files of...
30 CFR 732.17 - State program amendments.
Code of Federal Regulations, 2010 CFR
2010-07-01
... Mineral Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR... the number or size of coal exploration or surface coal mining and reclamation operations in the State... amendment(s) is being reviewed by the Director and will include the following: (i) The text or a summary of...
30 CFR 745.11 - Application and agreement.
Code of Federal Regulations, 2010 CFR
2010-07-01
....11 Mineral Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR... approval under part 731 of this chapter, and has or may have within the State surface coal mining and... the full text of the terms of the proposed cooperative agreement as submitted or as subsequently...
29 CFR 570.118 - Sixteen-year minimum.
Code of Federal Regulations, 2010 CFR
2010-07-01
... for employment in manufacturing or mining occupations. Furthermore, this age minimum is applicable to... convenience of the user, the revised text is set forth as follows: § 570.118 Sixteen-year minimum. The Act sets a 16-year-age minimum for employment in manufacturing or mining occupations, although under FLSA...
Code of Federal Regulations, 2010 CFR
2010-07-01
... other than the following: (1) Manufacturing, (2) Mining, (3) An occupation found by the Secretary to be..., the revised text is set forth as follows: § 570.122 General. (a) Specific exemptions from the child... sixteen years in any occupation other than manufacturing, mining, or an occupation found by the Secretary...
29 CFR 570.119 - Fourteen-year minimum.
Code of Federal Regulations, 2010 CFR
2010-07-01
... occupations other than manufacturing and mining, the Secretary is authorized to issue regulations or orders... Subpart C of this part. 29-30 [Reserved] (a) Manufacturing, mining, or processing occupations; (b... of the user, the revised text is set forth as follows: § 570.119 Fourteen-year minimum. With respect...
OSCAR4: a flexible architecture for chemical text-mining.
Jessop, David M; Adams, Sam E; Willighagen, Egon L; Hawizy, Lezan; Murray-Rust, Peter
2011-10-14
The Open-Source Chemistry Analysis Routines (OSCAR) software, a toolkit for the recognition of named entities and data in chemistry publications, has been developed since 2002. Recent work has resulted in the separation of the core OSCAR functionality and its release as the OSCAR4 library. This library features a modular API (based on reduction of surface coupling) that permits client programmers to easily incorporate it into external applications. OSCAR4 offers a domain-independent architecture upon which chemistry-specific text-mining tools can be built, and its development and usage are discussed.
The effect of selenium on spoil suitability as root zone material at Navajo Mine, New Mexico
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lane, J.R.; Buchanan, B.A.; Ramsey, T.C.
1995-09-01
The root zone suitability limits for spoil Se at Navajo Mine in northwest New Mexico are currently 0.8 ppm total Se and 0.15 ppm hot-water soluble Se. These criteria were largely developed by the Office of Surface Mining using data from the Northern Great Plains. Applying these values, approximately 23% of the spoil volume and 47% of the spoil area sampled at Navajo Mine from 1985 to December 1993 were determined to be unsuitable as root zone material. Secondary Se accumulator plants (Atriplex canescens) growing in both undisturbed and reclaimed areas were randomly sampled for selenium from 1985 to December 1993. In most cases the undisturbed soil and reclaimed spoil at these plant sampling sites were sampled for both total and hot-water soluble Se. Selenium values for Atriplex canescens samples collected on the undisturbed sites averaged 0.64 ppm and ranged from 0.20 ppm to 2.5 ppm. Selenium values for the plants growing on spoil ranged from 0.02 ppm to 7.75 ppm and averaged 1.07 ppm. Total and hot-water Se values for spoil averaged 0.66 ppm and 0.06 ppm respectively, and ranged from 0.0 to 14.2 for total Se and 0.0 ppm to 0.72 ppm for hot-water soluble Se. The plant Se values were poorly correlated to both total and hot-water soluble Se values for both soil and spoil. Therefore, predicting suitable guidelines using normal regression techniques was ineffective. Based on background Se levels in native soils, and levels found on reclaimed areas with Atriplex canescens, it is suggested that a total Se level of 2.0 ppm and a hot-water soluble Se level of 0.25 ppm should be used to represent the suitability limits for Se at Navajo Mine. If these Se values are used, it is estimated that less than 1% of the spoil volume would be unsuitable. This volume of spoil seems to be a more accurate estimate of the amount of spoil with unsuitable levels of Se than the estimated 23% using the current guidelines.
Mining Consumer Health Vocabulary from Community-Generated Text
Vydiswaran, V.G. Vinod; Mei, Qiaozhu; Hanauer, David A.; Zheng, Kai
2014-01-01
Community-generated text corpora can be a valuable resource to extract consumer health vocabulary (CHV) and link them to professional terminologies and alternative variants. In this research, we propose a pattern-based text-mining approach to identify pairs of CHV and professional terms from Wikipedia, a large text corpus created and maintained by the community. A novel measure, leveraging the ratio of frequency of occurrence, was used to differentiate consumer terms from professional terms. We empirically evaluated the applicability of this approach using a large data sample consisting of MedLine abstracts and all posts from an online health forum, MedHelp. The results show that the proposed approach is able to identify synonymous pairs and label the terms as either consumer or professional term with high accuracy. We conclude that the proposed approach provides great potential to produce a high quality CHV to improve the performance of computational applications in processing consumer-generated health text. PMID:25954426
Man-caused seismicity of Kuzbass
NASA Astrophysics Data System (ADS)
Emanov, Alexandr; Emanov, Alexey; Leskova, Ekaterina; Fateyev, Alexandr
2010-05-01
The natural seismicity of the Kuznetsk Basin is confined mainly to the mountain frame of the Kuznetsk depression. This paper presents results of experimental work with local station networks within the sedimentary basin. Two types of seismicity have been identified within the Kuznetsk depression: first, man-caused seismic processes confined to mine workings and concentrated at depths of up to 1.5 km; second, seismic activations at depths of 2-56 km that do not coincide in plan with coal mines. Each of the studied seismic activations consists of a large number of small earthquakes (Ms = 1-3). From one to a few tens of earthquakes were recorded per day. The earthquakes near mine workings shift in space along with the workings; the seismic process intensifies while a coal-plough machine is operating and weakens while preventive maintenance is carried out. The seismic processes near three lavas (longwall faces) in the Kuznetsk Basin have been studied in detail. Uplift is the most typical focal mechanism. The activated zone near a mine working reaches 1-1.5 km in diameter. Seismic activations not linked to mine workings indicate that the subsurface of the Kuznetsk depression as a whole remains in a stressed state. The most probable causes of man-caused influence on the depression are processes associated with changes in the physical state of rocks when methane is lost from a large volume, or when mine workings change the water saturation of rocks over a large volume. In this case compacted rocks that have lost gas and water can be squeezed upwards, producing the reverse-fault mechanism of the earthquakes. A combination of the stressed state of the depression with man-caused action during deep mining may account for the incipient activations in the Kuznetsk Basin. Today earthquakes occur mainly beneath mine workings; although the workings themselves have not been damaged, the intense shaking at the surface calls for careful study of such a dangerous phenomenon. In 2009 the experiment on seismic activations was repeated in the area of the previously investigated lavas. A spatial displacement of the activations along with the mine workings was found. The impact of technogenic factors on the behavior of the seismic process was investigated. It was demonstrated that industrial explosions in neighboring open-cast mines have no pronounced effect on the seismic process near the lavas. Stoppage of work in the lavas leads to simultaneous changes in man-caused seismicity: the number of technogenic earthquakes is halved. Small earthquakes persist, but such lulls lead to occasional, stronger technogenic earthquakes.
Wagland, Richard; Recio-Saucedo, Alejandra; Simon, Michael; Bracher, Michael; Hunt, Katherine; Foster, Claire; Downing, Amy; Glaser, Adam; Corner, Jessica
2016-08-01
Quality of cancer care may greatly impact on patients' health-related quality of life (HRQoL). Free-text responses to patient-reported outcome measures (PROMs) provide rich data but analysis is time and resource-intensive. This study developed and tested a learning-based text-mining approach to facilitate analysis of patients' experiences of care and develop an explanatory model illustrating impact on HRQoL. Respondents to a population-based survey of colorectal cancer survivors provided free-text comments regarding their experience of living with and beyond cancer. An existing coding framework was tested and adapted, which informed learning-based text mining of the data. Machine-learning algorithms were trained to identify comments relating to patients' specific experiences of service quality, which were verified by manual qualitative analysis. Comparisons between coded retrieved comments and a HRQoL measure (EQ5D) were explored. The survey response rate was 63.3% (21 802/34 467), of which 25.8% (n=5634) participants provided free-text comments. Of retrieved comments on experiences of care (n=1688), over half (n=1045, 62%) described positive care experiences. Most negative experiences concerned a lack of post-treatment care (n=191, 11% of retrieved comments) and insufficient information concerning self-management strategies (n=135, 8%) or treatment side effects (n=160, 9%). Associations existed between HRQoL scores and coded algorithm-retrieved comments. Analysis indicated that the mechanism by which service quality impacted on HRQoL was the extent to which services prevented or alleviated challenges associated with disease and treatment burdens. Learning-based text mining techniques were found useful and practical tools to identify specific free-text comments within a large dataset, facilitating resource-efficient qualitative analysis. This method should be considered for future PROM analysis to inform policy and practice. 
Study findings indicated that perceived care quality directly impacts on HRQoL. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
Accessing Biomedical Literature in the Current Information Landscape
Khare, Ritu; Leaman, Robert; Lu, Zhiyong
2015-01-01
Biomedical and life sciences literature is unique because of its exponentially increasing volume and interdisciplinary nature. Biomedical literature access is essential for several types of users including biomedical researchers, clinicians, database curators, and bibliometricians. In the past few decades, several online search tools and literature archives, generic as well as biomedicine-specific, have been developed. We present this chapter in the light of three consecutive steps of literature access: searching for citations, retrieving full-text, and viewing the article. The first section presents the current state of practice of biomedical literature access, including an analysis of the search tools most frequently used by the users, including PubMed, Google Scholar, Web of Science, Scopus, and Embase, and a study on biomedical literature archives such as PubMed Central. The next section describes current research and the state-of-the-art systems motivated by the challenges a user faces during query formulation and interpretation of search results. The research solutions are classified into five key areas related to text and data mining: text similarity search, semantic search, query support, relevance ranking, and clustering results. Finally, the last section describes some predicted future trends for improving biomedical literature access, such as searching and reading articles on portable devices, and adoption of the open access policy. PMID:24788259
Extraction and Classification of Emotions for Business Research
NASA Astrophysics Data System (ADS)
Verma, Rajib
The commercial study of emotions has not yet embraced Internet / social mining, even though it has important applications in management. This is surprising since the emotional content is freeform, widespread, can give a better indication of feelings (for instance with taboo subjects), and is inexpensive compared to other business research methods. A brief framework for applying text mining to this new research domain is shown and classification issues are discussed in an effort to quickly get businesspeople and researchers to adopt the mining methodology.
Clustering and Dimensionality Reduction to Discover Interesting Patterns in Binary Data
NASA Astrophysics Data System (ADS)
Palumbo, Francesco; D'Enza, Alfonso Iodice
Attention to binary data coding has increased steadily over the last decade for several reasons. The analysis of binary data characterizes several fields of application, such as market basket analysis, DNA microarray data, image mining, text mining and web-clickstream mining. The paper illustrates two different approaches exploiting a profitable combination of clustering and dimensionality reduction for the identification of non-trivial association structures in binary data. An application in the Association Rules framework supports the theory with empirical evidence.
Mining biomedical images towards valuable information retrieval in biomedical and life sciences
Ahmed, Zeeshan; Zeeshan, Saman; Dandekar, Thomas
2016-01-01
Biomedical images are helpful sources for scientists and practitioners in drawing significant hypotheses, exemplifying approaches and describing experimental results in published biomedical literature. In recent decades, there has been an enormous increase in the production and publication of heterogeneous biomedical images, which creates a need for bioimaging platforms that extract and analyse features, text and content in biomedical images in order to implement effective information retrieval systems. In this review, we summarize technologies related to data mining of figures. We describe and compare the potential of different approaches in terms of their developmental aspects, methodologies used, results produced, accuracies achieved and limitations. Our comparative conclusions include current challenges for bioimaging software with selective image mining, embedded text extraction and processing of complex natural language queries. PMID:27538578
Monitoring food safety violation reports from internet forums.
Kate, Kiran; Negi, Sumit; Kalagnanam, Jayant
2014-01-01
Food-borne illness is a growing public health concern worldwide. Government bodies, which regulate and monitor the state of food safety, solicit citizen feedback about the food hygiene practices followed by food establishments. They collect such feedback through traditional channels such as call centers and e-mail. With the growing popularity of Web 2.0 and social media, citizens often post such feedback on internet forums, message boards and similar sites. The system proposed in this paper applies text mining techniques to identify and mine such food safety complaints posted by citizens on web data sources, thereby enabling government agencies to gather more information about the state of food safety. In this paper, we discuss the architecture of our system and the text mining methods used. We also present results which demonstrate the effectiveness of this system in a real-world deployment.
Oxygen transport and pyrite oxidation in unsaturated coal-mine spoil
Guo, Weixing; Cravotta, Charles A.
1996-01-01
An understanding of the mechanisms of oxygen (O2) transport in unsaturated mine spoil is necessary to design and implement effective measures to exclude O2 from pyritic materials and to control the formation of acidic mine drainage. Partial pressure of oxygen (Po2) in pore gas, chemistry of pore water, and temperature were measured at different depths in unsaturated spoil at two reclaimed surface coal mines in Pennsylvania. At mine 1, where spoil was loose, blocky sandstone, Po2 changed little with depth, decreasing from 21 volume percent (vol%) at the ground surface to a minimum of about 18 vol% at 10 m depth. At mine 2, where spoil was compacted, friable shale, Po2 decreased to less than 2 vol% at depth of about 10 m. Although pore-water chemistry and temperature data indicate that acid-forming reactions were active at both mines, the pore-gas data indicate that mechanisms for O2 transport were different at each mine. A numerical model was developed to simulate O2 transport and pyrite oxidation in unsaturated mine spoil. The results of the numerical simulations indicate that differences in O2 transport at the two mines can be explained by differences in the air permeability of spoil. Po2 changes little with depth if advective transport of O2 dominates as at mine 1, but decreases greatly with depth if diffusive transport of O2 dominates, as in mine 2. Model results also indicate that advective transport becomes significant if the air permeability of spoil is greater than 10^-9 m^2, which is expected for blocky sandstone spoil. In the advective-dominant system, thermally-induced convective air flow, as a consequence of the exothermic oxidation of pyrite, supplies the O2 to maintain high Po2 within the deep unsaturated zone.
1981-06-23
Some negative impacts of MX deployment on mining in the study area are unavoidable, but careful planning in water use and actual shelter site...depend upon the extent of deployment and location of shelter sites. A major impact on the mining industry will result if draw-down of the water table...use and acquire the necessary land rights or whether the affected shelter (s) should be abandoned or replaced elsewhere in the deployment area. Egec E-TR
2024 Unmanned Undersea Warfare Concept
2013-06-01
mine. Assumptions are that the high-tech mine would have a 400-meter range that spans 360 degrees, a 90% probability of detecting a HVU, and a 30...motor volume – The electric propulsion motor is assumed to be 0.127 cubic meters. A common figure of 24” x 18” x 18” is assumed. This size will allow...regard to propagation loss is assumed to be 400 Hz. Using Excel spreadsheet modeling, the maximum range is determined by finding that range resulting in
Wiegers, Thomas C; Davis, Allan Peter; Mattingly, Carolyn J
2014-01-01
The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation tasks collectively represent a community-wide effort to evaluate a variety of text-mining and information extraction systems applied to the biological domain. The BioCreative IV Workshop included five independent subject areas, including Track 3, which focused on named-entity recognition (NER) for the Comparative Toxicogenomics Database (CTD; http://ctdbase.org). Previously, CTD had organized document ranking and NER-related tasks for the BioCreative Workshop 2012; a key finding of that effort was that interoperability and integration complexity were major impediments to the direct application of the systems to CTD's text-mining pipeline. This underscored a prevailing problem with software integration efforts. Major interoperability-related issues included lack of process modularity, operating system incompatibility, tool configuration complexity and lack of standardization of high-level inter-process communications. One approach to potentially mitigate interoperability and general integration issues is the use of Web services to abstract implementation details; rather than integrating NER tools directly, HTTP-based calls from CTD's asynchronous, batch-oriented text-mining pipeline could be made to remote NER Web services for recognition of specific biological terms using BioC (an emerging family of XML formats) for inter-process communications. To test this concept, participating groups developed Representational State Transfer /BioC-compliant Web services tailored to CTD's NER requirements. Participants were provided with a comprehensive set of training materials. CTD evaluated results obtained from the remote Web service-based URLs against a test data set of 510 manually curated scientific articles. Twelve groups participated in the challenge. Recall, precision, balanced F-scores and response times were calculated. 
Top balanced F-scores for gene, chemical and disease NER were 61, 74 and 51%, respectively. Response times ranged from fractions-of-a-second to over a minute per article. We present a description of the challenge and summary of results, demonstrating how curation groups can effectively use interoperable NER technologies to simplify text-mining pipeline implementation. Database URL: http://ctdbase.org/ © The Author(s) 2014. Published by Oxford University Press. PMID:24919658
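The recall, precision and balanced F-score reported for the NER Web services follow the standard definitions, which can be sketched as below. The counts in the usage note are made-up numbers for illustration only, not results from the challenge, and the function name is hypothetical.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, balanced F-score) from raw counts."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # The balanced F-score is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, a hypothetical system that returns 8 entity mentions, 6 of them correct, while the curated article contains 10, would be scored with `precision_recall_f1(6, 2, 4)`, giving a precision of 0.75 and a recall of 0.6.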
Mining in low coal. Volume 1. Biomechanics and work physiology. Open file report 15 Jun 78-15 Sep 81
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ayoub, M.M.; Bethea, N.J.; Bobo, M.
1981-11-01
The objectives of this research were (1) to evaluate the job demands associated with low coal mining, (2) to survey the anthropometry, strength, and aerobic capacity of low coal miners to determine if they differ from the U.S. population, and (3) to recommend, on the basis of available information, optimal job and work station design for low coal mining. The male and female anthropometry, except for weight and circumferential dimensions, was quite similar to the comparison populations. Back strength for male and female miners was significantly lower than the industrial worker population. This can be one of the contributing factors of low back problems in mining. Shoveling, timbering, and helpers tasks were physiologically demanding activities. However, because of the frequent stoppage of work, adequate rest was usually available. If work stoppage is corrected, then better work and rest schedules are essential.
NASA Astrophysics Data System (ADS)
Lee, M. J.; Oh, K. Y.; Joung-ho, L.
2016-12-01
Recently, much research in various fields has analysed the interactions between entities through text mining. In this paper, we aimed to quantitatively analyse research trends in environmental research relating to either spatial information or ICT (Information and Communications Technology) by text-mining analysis. To do this, we applied a low-dimensional embedding method, clustering analysis, and association rules to find meaningful associative patterns among key words that frequently appeared in the articles. As the authors suppose that KCI (Korea Citation Index) articles reflect academic demands, a total of 1228 KCI articles published from 1996 to 2015 were reviewed and analysed by the text-mining method. First, we retrieved KCI articles from the NDSL (National Discovery for Science Leaders) site. We then pre-processed the key words selected from the abstracts and classified them into separable sectors. We investigated the appearance rates and association rules of key words for articles in the two fields: spatial information and ICT. In order to detect historic trends, the analysis was conducted separately for four periods: 1996-2000, 2001-2005, 2006-2010, and 2011-2015. These analyses were conducted using R software. As a result, we confirmed that environmental research relating to spatial information mainly focused on such fields as `GIS (35%)', `remote sensing (25%)', and `environmental theme map (15.7%)'. Next, `ICT technology (23.6%)', `ICT service (5.4%)', `mobile (24%)', `big data (10%)', and `AI (7%)' are the primary emerging topics in environmental research relating to ICT. Thus, from the analysis results, this paper asserts that research trends and academic progress are sufficiently well structured to review recent spatial information and ICT technology, and that the outcomes of the analysis can serve as adequate guidelines for establishing environmental policies and strategies.
KEY WORDS: Big data, Text-mining, Environmental research, Spatial-information, ICT Acknowledgements: The authors appreciate the support that this study has received from `Building application frame of environmental issues, to respond to the latest ICT trends'.
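The association-rule step described in the abstract above can be sketched as follows. The authors report using R; this Python version is only an illustration under stated assumptions: each article is reduced to a list of key words, only pairwise rules are considered, and the function name `pair_rules` and the toy key words are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(docs, min_support=0.0):
    """Compute support, confidence and lift for pairwise keyword rules.

    `docs` is a list of keyword lists, one per article (assumed input shape).
    Returns tuples (antecedent, consequent, support, confidence, lift).
    """
    n = len(docs)
    item_counts = Counter()
    pair_counts = Counter()
    for keywords in docs:
        unique = set(keywords)
        item_counts.update(unique)
        pair_counts.update(combinations(sorted(unique), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of articles containing both key words
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]        # estimate of P(y | x)
            lift = confidence / (item_counts[y] / n)   # P(y | x) / P(y)
            rules.append((x, y, support, confidence, lift))
    return rules


if __name__ == "__main__":
    # Toy key words, not the study's data.
    toy = [
        ["GIS", "remote sensing"],
        ["GIS", "big data"],
        ["GIS", "remote sensing", "mobile"],
        ["mobile", "big data"],
    ]
    for rule in sorted(pair_rules(toy, min_support=0.5)):
        print(rule)
```

A lift above 1 indicates that two key words appear together more often than independence would predict, which is the kind of associative pattern the abstract refers to.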
Text mining-based in silico drug discovery in oral mucositis caused by high-dose cancer therapy.
Kirk, Jon; Shah, Nirav; Noll, Braxton; Stevens, Craig B; Lawler, Marshall; Mougeot, Farah B; Mougeot, Jean-Luc C
2018-08-01
Oral mucositis (OM) is a major dose-limiting side effect of chemotherapy and radiation used in cancer treatment. Due to the complex nature of OM, currently available drug-based treatments are of limited efficacy. Our objectives were (i) to determine genes and molecular pathways associated with OM and wound healing using computational tools and publicly available data and (ii) to identify drugs formulated for topical use targeting the relevant OM molecular pathways. OM and wound healing-associated genes were determined by text mining, and the intersection of the two gene sets was selected for gene ontology analysis using the GeneCodis program. Protein interaction network analysis was performed using STRING-db. Enriched gene sets belonging to the identified pathways were queried against the Drug-Gene Interaction database to find drug candidates for topical use in OM. Our analysis identified 447 genes common to both the "OM" and "wound healing" text mining concepts. Gene enrichment analysis yielded 20 genes representing six pathways and targetable by a total of 32 drugs which could possibly be formulated for topical application. A manual search on ClinicalTrials.gov confirmed no relevant pathway/drug candidate had been overlooked. Twenty-five of the 32 drugs can directly affect the PTGS2 (COX-2) pathway, the pathway that has been targeted in previous clinical trials with limited success. Drug discovery using in silico text mining and pathway analysis tools can facilitate the identification of existing drugs that have the potential of topical administration to improve OM treatment.
Takeda, Kayoko; Takahashi, Kiyoshi; Masukawa, Hiroyuki; Shimamori, Yoshimitsu
2017-01-01
Recently, the practice of active learning has spread, increasingly recognized as an essential component of academic studies. Classes incorporating small group discussion (SGD) are conducted at many universities. At present, assessments of the effectiveness of SGD have mostly involved evaluation by questionnaires conducted by teachers, by peer assessment, and by self-evaluation of students. However, qualitative data, such as open-ended descriptions by students, have not been widely evaluated. As a result, we have been unable to analyze the processes and methods involved in how students acquire knowledge in SGD. In recent years, due to advances in information and communication technology (ICT), text mining has enabled the analysis of qualitative data. We therefore investigated whether the introduction of a learning system comprising the jigsaw method and problem-based learning (PBL) would improve student attitudes toward learning; we did this by text mining analysis of the content of student reports. We found that by applying the jigsaw method before PBL, we were able to improve student attitudes toward learning and increase the depth of their understanding of the area of study as a result of working with others. The use of text mining to analyze qualitative data also allowed us to understand the processes and methods by which students acquired knowledge in SGD and also changes in students' understanding and performance based on improvements to the class. This finding suggests that the use of text mining to analyze qualitative data could enable teachers to evaluate the effectiveness of various methods employed to improve learning.
NASA Astrophysics Data System (ADS)
Baig, A. M.; Urbancic, T.; Bosman, K.; Smith-Boughner, L.; Viegas, G. F.
2016-12-01
Underground excavation of ore tends to concentrate stress in the pillars of the mines. As the mining progresses, the stress continues to concentrate in these pillars, resulting in potentially critical stress conditions that raise concerns over personnel safety and have implications for efficient and effective extraction. It therefore becomes critical for operations to manage this stress behaviour as the extraction activities progress. In this study, we examine seismicity recorded with a full three-dimensional array consisting of single- and three-component accelerometers and geophones around the extraction volumes; these data formed the basis for characterization of stress variations. Specifically, we present an integrated study of the seismological properties of a sill pillar during the blasting of a stope to characterize how the stress is evolving in the mine. Our results suggest that the seismicity itself reacts to the stress conditions of the mining and, through investigation of the source parameters, reveals how these events are being activated. Through consideration of both the source parameters and the inter-event times and distances, we arrive at a description of the deformation of the reservoir and are able to assess the role of stress during this process. Further resolution of the stress state in the mine is obtained through inversions of moment tensors on the highest-quality microseismic data, and a descriptive analysis of event clustering by space and time to resolve the dynamics of the stress orientations. To corroborate our inferences based on microseismicity, we use blasts recorded around the extraction volume to understand how stress is manifesting itself through P-wave velocity anomalies. We confirm the dynamics of the stress field that we observe from the microseismicity and show the destressing effect of blasting coupled with stress migration through to other parts of the sill pillar.
Text feature extraction based on deep learning: a review.
Liang, Hong; Sun, Xiao; Sun, Yunlei; Gao, Yuan
2017-01-01
Selection of text feature items is a basic and important matter for text mining and information retrieval. Traditional methods of feature extraction require handcrafted features. Hand-designing an effective feature is a lengthy process, whereas deep learning can acquire new, effective feature representations from training data for new applications. As a new feature extraction method, deep learning has made achievements in text mining. The major difference between deep learning and conventional methods is that deep learning automatically learns features from big data instead of adopting handcrafted features, which depend mainly on the prior knowledge of designers and can hardly take advantage of big data. Deep learning can automatically learn feature representations from big data, using models that include millions of parameters. This review first outlines the common methods used in text feature extraction, then expands on frequently used deep learning methods in text feature extraction and their applications, and finally forecasts the application of deep learning in feature extraction.
30 CFR 730.11 - Inconsistent and more stringent State laws and regulations.
Code of Federal Regulations, 2010 CFR
2010-07-01
... regulations. 730.11 Section 730.11 Mineral Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT... Register setting forth the text or a summary of any State law or regulation initially determined by him to... stringent land use and environmental controls and regulations of coal exploration and surface coal mining...
Using Syntactic Patterns to Enhance Text Analytics
ERIC Educational Resources Information Center
Meyer, Bradley B.
2017-01-01
Large scale product and service reviews proliferate and are commonly found across the web. The ability to harvest, digest and analyze a large corpus of reviews from online websites is still however a difficult problem. This problem is referred to as "opinion mining." Opinion mining is an important area of research as advances in the…
Federal Register 2010, 2011, 2012, 2013, 2014
2011-01-28
... Uranium Recovery Project, located in the Pumpkin Buttes Uranium Mining District within the Powder River.... Alternatives that were considered, but were eliminated from detailed analysis, include conventional mining and... an Agencywide Documents and Management System (ADAMS), which provides text and image files of the NRC...
Federal Register 2010, 2011, 2012, 2013, 2014
2011-08-26
... (ADAMS), which provides text and image files of the NRC's public documents in the NRC Library at http... considered, but eliminated from detailed analysis, include conventional uranium mining and milling, conventional mining and heap leach processing, alternate lixiviants, and alternative wastewater disposal...
Quiroz-Arcentales, Leonardo; Hernández-Flórez, Luis J; Agudelo Calderón, Carlos A; Medina, Katalina; Robledo-Martínez, Rocío; Osorio-García, Samuel D
2013-01-01
The aim was to establish the prevalence of respiratory symptoms and disease in children aged less than 12 years-old living within the Cesar department's coal-mining area, and possible associated factors. This was a cross-sectional study of 1,627 children aged less than 10 years-old living in and near coal-mining areas in the Cesar department who were exposed to different levels of PM10 from 2008-2010; their PM10 exposure-related symptoms and respiratory diseases were measured, seeking an association with living in areas exposed to particulate material. Children living in areas close to coal-mining activity which also had high traffic volume had a higher rate of probable cases of asthma; those living in areas with traffic (but no coal-mining) were absent from school for more days due to acute respiratory disease. Respiratory symptoms were most commonly found in children whose living conditions exposed them to cigarette or firewood smoke indoors, living in houses made with wattle and daub or adobe walls, living where animals were kept, living in damp housing, and having diesel-powered dump trucks operating within 100 m or less of their housing. Living in areas having high traffic volume increased the risk of respiratory symptoms, acute respiratory disease and being absent from school. All the effects studied were associated with intramural conditions, individual factors or those associated with the immediate surroundings, thereby coinciding with results found in similar studies regarding air pollution and health. It is thus suggested that regional strategies and policy be created for controlling and monitoring the air quality and health of people living in the Cesar department.
Disposal and improvement of contaminated by waste extraction of copper mining in Chile
NASA Astrophysics Data System (ADS)
Naranjo Lamilla, Pedro; Blanco Fernández, David; Díaz González, Marcos; Robles Castillo, Marcelo; Decinti Weiss, Alejandra; Tapia Alvarez, Carolina; Pardo Fabregat, Francisco; Vidal, Manuel Miguel Jordan; Bech, Jaume; Roca, Nuria
2016-04-01
This project originated from the needs of a mining company that mines and processes copper ore. High-purity copper is produced, with an annual production of 1,113,928 tons of concentrate at a grade of 32%. This mining company has generated several illegal landfills and has been required by the government to build an Industrial Solid Waste (ISW) management center. The forecast volume of waste generated is 20,000 tons/year. Chemical analysis established that the studied soil has a high copper content, either of natural origin or from the spread of contaminants from mining activities. Moreover, in some sectors, soil contamination by mercury, hydrocarbons, and oils and fats was detected, likely associated with the accumulation of waste. The waters are also impacted by industrial mining activities, specifically by copper ores, molybdenum, manganese, and sulfates, and have an acidic pH. The ISW management center addresses the pollution of soil and water by concentrating all activities in a technically suitable place. The center provides the necessary guidelines for the treatment and disposal of soil contaminated by uncontrolled landfills, and also includes a leachate collection system and a network for monitoring the physicochemical quality of water and the soil environment. Keywords: Industrial solid waste, soil contamination, Mining waste
Discovering and visualizing indirect associations between biomedical concepts
Tsuruoka, Yoshimasa; Miwa, Makoto; Hamamoto, Kaisei; Tsujii, Jun'ichi; Ananiadou, Sophia
2011-01-01
Motivation: Discovering useful associations between biomedical concepts has been one of the main goals in biomedical text-mining, and understanding their biomedical contexts is crucial in the discovery process. Hence, we need a text-mining system that helps users explore various types of (possibly hidden) associations in an easy and comprehensible manner. Results: This article describes FACTA+, a real-time text-mining system for finding and visualizing indirect associations between biomedical concepts from MEDLINE abstracts. The system can be used as a text search engine like PubMed with additional features to help users discover and visualize indirect associations between important biomedical concepts such as genes, diseases and chemical compounds. FACTA+ inherits all functionality from its predecessor, FACTA, and extends it by incorporating three new features: (i) detecting biomolecular events in text using a machine learning model, (ii) discovering hidden associations using co-occurrence statistics between concepts, and (iii) visualizing associations to improve the interpretability of the output. To the best of our knowledge, FACTA+ is the first real-time web application that offers the functionality of finding concepts involving biomolecular events and visualizing indirect associations of concepts with both their categories and importance. Availability: FACTA+ is available as a web application at http://refine1-nactem.mc.man.ac.uk/facta/, and its visualizer is available at http://refine1-nactem.mc.man.ac.uk/facta-visualizer/. Contact: tsuruoka@jaist.ac.jp PMID:21685059
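The idea of finding hidden associations from co-occurrence statistics, as described in the abstract above, can be illustrated with a small sketch. It does not reproduce FACTA+: the scoring choice (pointwise mutual information, with the weaker of the two links as the path score), the function names and the toy concept labels are all assumptions made for this example.

```python
import math
from collections import Counter
from itertools import combinations


def build_counts(abstracts):
    """Count single concepts and unordered concept pairs per abstract."""
    single, pair = Counter(), Counter()
    for concepts in abstracts:
        unique = set(concepts)
        single.update(unique)
        pair.update(frozenset(p) for p in combinations(sorted(unique), 2))
    return single, pair, len(abstracts)


def pmi(a, b, single, pair, n):
    """Pointwise mutual information (log base 2) of two concepts."""
    joint = pair[frozenset((a, b))]
    if joint == 0:
        return float("-inf")
    return math.log2(joint * n / (single[a] * single[b]))


def indirect_associations(query, single, pair, n):
    """Rank concepts linked to `query` only through a shared pivot concept.

    Illustrative scoring: the score of a path query-pivot-target is the
    weaker of its two PMI values; the best path per target is kept.
    """
    direct = {c for c in single
              if c != query and pair[frozenset((query, c))] > 0}
    scores = {}
    for pivot in direct:
        for target in single:
            if target == query or target in direct:
                continue  # keep only concepts never seen with the query
            if pair[frozenset((pivot, target))] > 0:
                score = min(pmi(query, pivot, single, pair, n),
                            pmi(pivot, target, single, pair, n))
                scores[target] = max(scores.get(target, float("-inf")), score)
    return sorted(scores.items(), key=lambda kv: -kv[1])


if __name__ == "__main__":
    # Toy concept lists standing in for annotated abstracts.
    toy = [["gene_A", "disease_X"], ["disease_X", "chemical_P"],
           ["gene_A", "gene_B"], ["gene_B", "chemical_Q"]]
    single, pair, n = build_counts(toy)
    print(indirect_associations("gene_A", single, pair, n))
```

Taking the minimum over the two links means a candidate is only ranked highly when both halves of the indirect path are well supported.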
de la Torre, M L; Grande, J A; Valente, T; Perez-Ostalé, E; Santisteban, M; Aroba, J; Ramos, I
2016-03-01
Poderosa Mine is an abandoned pyrite mine located in the Iberian Pyrite Belt, which pours its acid mine drainage (AMD) waters into the Odiel river (South-West Spain). This work focuses on establishing possible reasons for interdependence between the redox potential and pH and the load of metals and sulfates, as well as a set of variables that define the physical chemistry of the water (conductivity, temperature, TDS, and dissolved oxygen) transported by a channel from Poderosa mine affected by acid mine drainage, through the use of artificial intelligence techniques: fuzzy logic and data mining. The sampling campaign was carried out in May of 2012. There were a total of 16 sites, the first inside the tunnel and the last at the mouth of the river Odiel, with a distance of approximately 10 m between each pair of measuring stations. While the tools of classical statistics, which are widely used in this context, prove useful for defining proximity ratios between variables based on Pearson's correlations, in addition to making it easier to handle large volumes of data and producing easier-to-understand graphs, the use of fuzzy logic tools and data mining results in better definition of the variations produced by external stimuli on the set of variables. This tool is adaptable and can be extrapolated to any system polluted by acid mine drainage using simple, intuitive reasoning.
Alanazi, Ibrahim O; AlYahya, Sami A; Ebrahimie, Esmaeil; Mohammadi-Dehcheshmeh, Manijeh
2018-06-15
Exponentially growing scientific knowledge in scientific publications has resulted in the emergence of a new interdisciplinary science of literature mining. In text mining, the machine reads the published literature and transfers the discovered knowledge to mathematical-like formulas. In an integrative approach in this study, we used text mining in combination with network discovery, pathway analysis, and enrichment analysis of genomic regions for better understanding of biomarkers in lung cancer. Particular attention was paid to non-coding biomarkers. In total, 60 MicroRNA biomarkers were reported for lung cancer, including some prognostic biomarkers. MIR21, MIR155, MALAT1, and MIR31 were the top non-coding RNA biomarkers of lung cancer. Text mining identified 447 proteins which have been studied as biomarkers in lung cancer. EGFR (receptor), TP53 (transcription factor), KRAS, CDKN2A, ENO2, KRT19, RASSF1, GRP (ligand), SHOX2 (transcription factor), and ERBB2 (receptor) were the most studied proteins. Within small molecules, thymosin-a1, oestrogen, and 8-OHdG have received more attention. We found some chromosomal bands, such as 7q32.2, 18q12.1, 6p12, 11p15.5, and 3p21.3 that are highly involved in deriving lung cancer biomarkers.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Burger, James A
2005-07-20
The overall purpose of this project is to evaluate the biological and economic feasibility of restoring high-quality forests on mined land, and to measure carbon sequestration and wood production benefits that would be achieved from forest restoration procedures. We are currently estimating the acreage of lands in Virginia, West Virginia, Kentucky, Ohio, and Pennsylvania mined under SMCRA and reclaimed to non-forested post-mining land uses that are not currently under active management, and therefore can be considered as available for carbon sequestration. To determine actual sequestration under different forest management scenarios, a field study was installed as a 3 x 3 factorial in a random complete block design with three replications at each of three locations, one each in Ohio, West Virginia, and Virginia. The treatments included three forest types (white pine, hybrid poplar, mixed hardwood) and three silvicultural regimes (competition control, competition control plus tillage, competition control plus tillage plus fertilization). Each individual treatment plot is 0.5 acres. Each block of nine plots is 4.5 acres, and the complete installation at each site is 13.5 acres. During the reporting period we determined that by grinding the soil samples to a finer particle size of less than 250 μm (sieve No. 60), the effect of mine soil coal particle size on the extent to which these particles will be oxidized during the thermal treatment of the carbon partitioning procedure will be eliminated, thus making the procedure more accurate and precise. In the second phase of the carbon sequestration project, we focused our attention on determining the sample size required for carbon accounting on grassland mined fields in order to achieve a desired accuracy and precision of the final soil organic carbon (SOC) estimate. A mine land site quality classification scheme was developed and some field-testing of the methods of implementation was completed. 
The classification model has been validated for softwoods (white pine) on several reclaimed mine sites in the southern Appalachian coal region. The classification model is a viable method for classifying post-SMCRA abandoned mined lands into productivity classes for white pine. A thinning study was established as a random complete block design to evaluate the response to thinning of a 26-year-old white pine stand growing on a reclaimed surface mine in southwest Virginia. Stand parameters were projected to age 30 using a stand table projection. Site index of the stand was found to be 32.3 m at base age 50 years. Thinning rapidly increased the diameter growth of the residual trees to 0.84 cm yr⁻¹ compared to 0.58 cm yr⁻¹ for the unthinned treatment; however, at age 26, there was no difference in volume or value per hectare. At age 30, the unthinned treatment had a volume of 457.1 m³ ha⁻¹ but was only worth $8807 ha⁻¹, while the thinned treatment was projected to have 465.8 m³ ha⁻¹, which was worth $11265 ha⁻¹ due to a larger percentage of the volume being in sawtimber size classes.
A Review of Financial Accounting Fraud Detection based on Data Mining Techniques
NASA Astrophysics Data System (ADS)
Sharma, Anuj; Kumar Panigrahi, Prabin
2012-02-01
With the upsurge in financial accounting fraud in the current economic scenario, financial accounting fraud detection (FAFD) has become an emerging topic of great importance for academia, research and industry. The failure of organizations' internal auditing systems to identify accounting fraud has led to the use of specialized procedures to detect financial accounting fraud, collectively known as forensic accounting. Data mining techniques provide great aid in financial accounting fraud detection, since dealing with the large volumes and complexity of financial data is a big challenge for forensic accounting. This paper presents a comprehensive review of the literature on the application of data mining techniques for the detection of financial accounting fraud and proposes a framework for accounting fraud detection based on data mining techniques. The systematic and comprehensive literature review of the data mining techniques applicable to financial accounting fraud detection may provide a foundation for future research in this field. The findings of this review show that data mining techniques like logistic models, neural networks, Bayesian belief networks, and decision trees have been applied most extensively to provide primary solutions to the problems inherent in the detection and classification of fraudulent data.
Sağlam, Emine Selva; Akçay, Miğraç; Çolak, Dilşat Nigar; İnan Bektaş, Kadriye; Beldüz, Ali Osman
2016-09-01
The Karaerik Cu mine is a worked-out deposit with large volumes of tailings and slags that were left around the mine site without any protection. Natural feeding of these materials and run-off water from the mineralised zones into the Acısu effluent causes serious environmental degradation and the creation of acid mine drainage (AMD) along its entire length. This research aims at modelling the formation of AMD, with a specific focus on characterising the bacterial population associated with AMD and its role in AMD formation. Based on 16S rRNA analyses of the clones obtained from a composite water sample, the bacterial community was determined to consist of Acidithiobacillus ferrivorans, Ferrovum myxofaciens, Leptospirillum ferrooxidans and Acidithiobacillus ferrooxidans as iron-oxidising bacteria, Acidocella facilis, Acidocella aluminiidurans, Acidiphilium cryptum and Acidiphilium multivorum as iron-reducing bacteria, and Acidithiobacillus ferrivorans, Acidithiobacillus ferrooxidans, Acidithiobacillus thiooxidans and Acidiphilium cryptum as sulphur-oxidising bacteria. This association of bacteria with varying roles was interpreted as evidence of a concomitant occurrence of sulphur and iron cycles during the generation of AMD along the Acısu effluent draining the Karaerik mine.
Kinetic Study on the Removal of Iron from Gold Mine Tailings by Citric Acid
NASA Astrophysics Data System (ADS)
Mashifana, T.; Mavimbela, N.; Sithole, N.
2018-03-01
Gold mining generates large volumes of tailings, with consequent disposal and environmental problems. Iron tends to react with sulphur to form pyrite and pyrrhotite, which then react with rain water, forming acid rain. The study focuses on the removal of iron (Fe) from gold mine tailings; Fe was leached using citric acid as the leaching reagent. Three parameters that affect the removal of Fe from the gold mine tailings were investigated, namely temperature (25 °C and 50 °C), reagent concentration (0.25 M, 0.5 M, 0.75 M and 1 M) and solid loading ratio (20 %, 30 % and 40 %). It was found that the recovery of Fe from gold mine tailings increased with increasing temperature and reagent concentration, but decreased with increasing solid loading ratio. The optimum conditions for the recovery of Fe from gold mine tailings were found to be a temperature of 50 °C, a reagent concentration of 1 M and a solid loading of 20 %. Three linear kinetic models were investigated, and the Prout-Tompkins kinetic model was the best fit, yielding linear graphs with the highest R² values.
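A linear kinetic fit of the kind reported in the abstract above can be sketched as follows. The sketch assumes the commonly used linearised Prout-Tompkins form, ln(α/(1−α)) = kt + c, where α is the fraction of Fe leached at time t; the abstract does not state which form the authors used, and the function names and synthetic data below are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


def fit_prout_tompkins(times, fractions):
    """Fit ln(alpha / (1 - alpha)) = k * t + c and return (k, c, r_squared).

    `fractions` are leached fractions strictly between 0 and 1.
    """
    transformed = [math.log(a / (1.0 - a)) for a in fractions]
    return fit_line(times, transformed)


if __name__ == "__main__":
    # Synthetic data generated from k = 0.05 and c = -2 (not measured values).
    times = [10, 20, 30, 40, 60]
    fractions = [1 / (1 + math.exp(-(0.05 * t - 2.0))) for t in times]
    k, c, r2 = fit_prout_tompkins(times, fractions)
    print(f"k={k:.4f} c={c:.4f} R^2={r2:.6f}")
```

Comparing the R² of this fit with the R² obtained from other linearised models on the same data is one way to carry out the kind of model selection the abstract describes.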
Kurth, Laura; Kolker, Allan; Engle, Mark A.; Geboy, Nicholas J.; Hendryx, Michael; Orem, William H.; McCawley, Michael; Crosby, Lynn M.; Tatu, Calin A.; Varonka, Matthew S.; DeVera, Christina A.
2015-01-01
Mountaintop removal mining (MTM) is a widely used approach to surface coal mining in the US Appalachian region whereby large volumes of coal overburden are excavated using explosives, removed, and transferred to nearby drainages below MTM operations. To investigate the air quality impact of MTM, the geochemical characteristics of atmospheric particulate matter (PM) from five surface mining sites in south central West Virginia, USA, and five in-state study control sites having only underground coal mining or no coal mining whatsoever were determined and compared. Epidemiologic studies show increased rates of cancer, respiratory disease, cardiovascular disease, and overall mortality in Appalachian surface mining areas compared to Appalachian non-mining areas. In the present study, 24-h coarse (>2.5 µm) and fine (≤2.5 µm) PM samples collected from two surface mining sites in June 2011 showed pronounced enrichment in elements having a crustal affinity (Ga, Al, Ge, Rb, La, Ce) contributed by local sources, relative to controls. Follow-up sampling in August 2011 lacked this enrichment, suggesting that PM input from local sources is intermittent. Using passive samplers, dry deposition total PM elemental fluxes calculated for three surface mining sites over multi-day intervals between May and August 2012 were 5.8 ± 1.5 times higher for crustal elements than at controls. Scanning microscopy of 2,249 particles showed that primary aluminosilicate PM was prevalent at surface mining sites compared to secondary PM at controls. Additional testing is needed to establish any link between input of lithogenic PM and disease rates in the study area.
Kurth, Laura; Kolker, Allan; Engle, Mark; Geboy, Nicholas; Hendryx, Michael; Orem, William; McCawley, Michael; Crosby, Lynn; Tatu, Calin; Varonka, Matthew; DeVera, Christina
2015-06-01
Mountaintop removal mining (MTM) is a widely used approach to surface coal mining in the US Appalachian region whereby large volumes of coal overburden are excavated using explosives, removed, and transferred to nearby drainages below MTM operations. To investigate the air quality impact of MTM, the geochemical characteristics of atmospheric particulate matter (PM) from five surface mining sites in south central West Virginia, USA, and five in-state study control sites having only underground coal mining or no coal mining whatsoever were determined and compared. Epidemiologic studies show increased rates of cancer, respiratory disease, cardiovascular disease, and overall mortality in Appalachian surface mining areas compared to Appalachian non-mining areas. In the present study, 24-h coarse (>2.5 µm) and fine (≤2.5 µm) PM samples collected from two surface mining sites in June 2011 showed pronounced enrichment in elements having a crustal affinity (Ga, Al, Ge, Rb, La, Ce) contributed by local sources, relative to controls. Follow-up sampling in August 2011 lacked this enrichment, suggesting that PM input from local sources is intermittent. Using passive samplers, dry deposition total PM elemental fluxes calculated for three surface mining sites over multi-day intervals between May and August 2012 were 5.8 ± 1.5 times higher for crustal elements than at controls. Scanning microscopy of 2,249 particles showed that primary aluminosilicate PM was prevalent at surface mining sites compared to secondary PM at controls. Additional testing is needed to establish any link between input of lithogenic PM and disease rates in the study area.
OSCAR4: a flexible architecture for chemical text-mining
2011-01-01
The Open-Source Chemistry Analysis Routines (OSCAR) software, a toolkit for the recognition of named entities and data in chemistry publications, has been developed since 2002. Recent work has resulted in the separation of the core OSCAR functionality and its release as the OSCAR4 library. This library features a modular API (based on reduction of surface coupling) that permits client programmers to easily incorporate it into external applications. OSCAR4 offers a domain-independent architecture upon which chemistry specific text-mining tools can be built, and its development and usage are discussed. PMID:21999457
Literature evidence in open targets - a target validation platform.
Kafkas, Şenay; Dunham, Ian; McEntyre, Johanna
2017-06-06
We present the Europe PMC literature component of Open Targets - a target validation platform that integrates various evidence to aid drug target identification and validation. The component identifies target-disease associations in documents from the Europe PMC literature database and ranks the documents by confidence, using rules that draw on expert-provided heuristic information. The confidence score of a given document represents how valuable the document is in the scope of target validation for a given target-disease association by taking into account the credibility of the association based on the properties of the text. The component has regularly served the platform with up-to-date data since December 2015. Currently, there are a total of 1168365 distinct target-disease associations text mined from >26 million PubMed abstracts and >1.2 million Open Access full text articles. Our comparative analyses of the currently available evidence data in the platform revealed that 850179 of these associations are exclusively identified by literature mining. This component helps the platform's users by providing the most relevant literature hits for a given target and disease. The text mining evidence along with the other types of evidence can be explored visually through https://www.targetvalidation.org and all the evidence data is available for download in json format from https://www.targetvalidation.org/downloads/data.
U-Compare: share and compare text mining tools with UIMA
Kano, Yoshinobu; Baumgartner, William A.; McCrohon, Luke; Ananiadou, Sophia; Cohen, K. Bretonnel; Hunter, Lawrence; Tsujii, Jun'ichi
2009-01-01
Summary: Due to the increasing number of text mining resources (tools and corpora) available to biologists, interoperability issues between these resources are becoming significant obstacles to using them effectively. UIMA, the Unstructured Information Management Architecture, is an open framework designed to aid in the construction of more interoperable tools. U-Compare is built on top of the UIMA framework, and provides both a concrete framework for out-of-the-box text mining and a sophisticated evaluation platform allowing users to run specific tools on any target text, generating both detailed statistics and instance-based visualizations of outputs. U-Compare is a joint project, providing the world's largest, and still growing, collection of UIMA-compatible resources. These resources, originally developed by different groups for a variety of domains, include many famous tools and corpora. U-Compare can be launched straight from the web, without needing to be manually installed. All U-Compare components are provided ready-to-use and can be combined easily via a drag-and-drop interface without any programming. External UIMA components can also simply be mixed with U-Compare components, without distinguishing between locally and remotely deployed resources. Availability: http://u-compare.org/ Contact: kano@is.s.u-tokyo.ac.jp PMID:19414535
Powers, Michael H.; Burton, Bethany L.
2007-01-01
As part of a research effort directed by the New Mexico Environment Department to determine pre-mining water quality of the Red River at a molybdenum mining site in northern New Mexico, we used seismic refraction tomography to create subsurface compressional-wave velocity images along six lines that crossed the Straight Creek drainage and three that crossed the valley of Red River. Field work was performed in June 2002 (lines 1-4) and September 2003 (lines 5-9). We interpreted the images to determine depths to the water table and to the top of bedrock. Depths to water and bedrock in boreholes near the lines correlate well with our interpretations based on seismic data. In general, the images suggest that the alluvium in this area has a trapezoidal cross section. Using a U.S. Geological Survey digital elevation model grid of surface elevations of this region and the interpreted elevations to water table and bedrock obtained from the seismic data, we generated new models of the shape of the buried bedrock surface and the water table through surface interpolation and extrapolation. Then, using elevation differences between the two grids, we calculated volumes of dry and wet alluvium in the two drainages. The Red River alluvium is about 51 percent saturated, whereas the much smaller volume of alluvium in the tributary Straight Creek is only about 18 percent saturated. When combined with average ground-water velocity values, the information we present can be used to determine discharge of Straight Creek into Red River relative to the total discharge of Red River moving past Straight Creek. This information will contribute to more accurate models of ground-water flow, which are needed to determine the pre-mining water quality in the Red River.
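The volume step described above (differencing elevation grids, then comparing wet and dry alluvium) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names, the nested-list grid layout, and the uniform cell area are all assumptions, and the sample numbers in the usage note are made up.

```python
def alluvium_volumes(surface, water_table, bedrock, cell_area):
    """Return (dry_volume, wet_volume) between three elevation grids.

    Each grid is a list of rows of elevations on the same cells.
    The water table is clamped between bedrock and surface, so a cell
    whose water table lies below bedrock contributes no wet volume.
    """
    dry = 0.0
    wet = 0.0
    for s_row, w_row, b_row in zip(surface, water_table, bedrock):
        for s, w, b in zip(s_row, w_row, b_row):
            w_clamped = min(max(w, b), s)  # keep water level inside the alluvium
            dry += max(s - w_clamped, 0.0) * cell_area
            wet += max(w_clamped - b, 0.0) * cell_area
    return dry, wet


def percent_saturated(surface, water_table, bedrock, cell_area):
    """Saturated share of the total alluvium volume, in percent."""
    dry, wet = alluvium_volumes(surface, water_table, bedrock, cell_area)
    total = dry + wet
    return 100.0 * wet / total if total > 0 else 0.0
```

For example, with hypothetical grids `surface=[[10.0, 10.0]]`, `water_table=[[6.0, 1.0]]`, `bedrock=[[2.0, 2.0]]` and a 25 m² cell, the first cell is half saturated and the second is dry, so `percent_saturated` returns 25.0.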
Mining biomedical images towards valuable information retrieval in biomedical and life sciences.
Ahmed, Zeeshan; Zeeshan, Saman; Dandekar, Thomas
2016-01-01
Biomedical images are helpful sources for the scientists and practitioners in drawing significant hypotheses, exemplifying approaches and describing experimental results in published biomedical literature. In last decades, there has been an enormous increase in the amount of heterogeneous biomedical image production and publication, which results in a need for bioimaging platforms for feature extraction and analysis of text and content in biomedical images to take advantage in implementing effective information retrieval systems. In this review, we summarize technologies related to data mining of figures. We describe and compare the potential of different approaches in terms of their developmental aspects, used methodologies, produced results, achieved accuracies and limitations. Our comparative conclusions include current challenges for bioimaging software with selective image mining, embedded text extraction and processing of complex natural language queries. © The Author(s) 2016. Published by Oxford University Press.
Using temporal mining to examine the development of lymphedema in breast cancer survivors.
Green, Jason M; Paladugu, Sowjanya; Shuyu, Xu; Stewart, Bob R; Shyu, Chi-Ren; Armer, Jane M
2013-01-01
Secondary lymphedema is a lifetime risk for breast cancer survivors and can severely affect quality of life. Early detection and treatment are crucial for successful lymphedema management. Limb volume measurements can be utilized not only to diagnose lymphedema but also to track progression of limb volume changes before lymphedema, which has the potential to provide insight into the development of this condition. This study aims to identify commonly occurring patterns in limb volume changes in breast cancer survivors before the development of lymphedema and to determine if there were differences in these patterns between certain patient subgroups. Furthermore, pattern differences were studied between patients who developed lymphedema quickly and those whose onset was delayed. A temporal data mining technique was used to identify and compare common patterns in limb volume measurements in patient subgroups of study participants (n = 232). Patterns were filtered initially by support and confidence values, and then t tests were used to determine statistical significance of the remaining patterns. Higher body mass index and the presence of postoperative swelling are supported as risk factors for lymphedema. In addition, a difference in trajectory to the lymphedema state was observed. The results have potential to guide clinical guidelines for assessment of latent and early-onset lymphedema.
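The support and confidence filter mentioned above can be sketched in a few lines. This is a simplified, non-temporal illustration under stated assumptions: each patient record is treated as a plain set of observed events, the event names are invented, and the study's actual temporal pattern representation and thresholds are not given in the abstract.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for record in records if items <= set(record))
    return hits / len(records)


def confidence(records, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    base = support(records, antecedent)
    if base == 0:
        return 0.0
    return support(records, set(antecedent) | set(consequent)) / base


def filter_rules(records, rules, min_support, min_confidence):
    """Keep (antecedent, consequent) rules that pass both thresholds."""
    kept = []
    for antecedent, consequent in rules:
        both = set(antecedent) | set(consequent)
        if (support(records, both) >= min_support
                and confidence(records, antecedent, consequent) >= min_confidence):
            kept.append((antecedent, consequent))
    return kept
```

Rules that survive such a filter would then go on to a significance test, as the abstract describes; that step is omitted here.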
Karacan, C.O.; Olea, R.A.; Goodman, G.
2012-01-01
Determination of the size of the gas emission zone, the locations of gas sources within, and especially the amount of gas retained in those zones is one of the most important steps for designing a successful methane control strategy and an efficient ventilation system in longwall coal mining. The formation of the gas emission zone and the potential amount of gas-in-place (GIP) that might be available for migration into a mine are factors of local geology and rock properties that usually show spatial variability in continuity and may also show geometric anisotropy. Geostatistical methods are used here for modeling and prediction of gas amounts and for assessing their associated uncertainty in gas emission zones of longwall mines for methane control. This study used core data obtained from 276 vertical exploration boreholes drilled from the surface to the bottom of the Pittsburgh coal seam in a mining district in the Northern Appalachian basin. After identifying important coal and non-coal layers for the gas emission zone, univariate statistical and semivariogram analyses were conducted for data from different formations to define the distribution and continuity of various attributes. Sequential simulations performed stochastic assessment of these attributes, such as gas content, strata thickness, and strata displacement. These analyses were followed by calculations of gas-in-place and their uncertainties in the Pittsburgh seam caved zone and fractured zone of longwall mines in this mining district. Grid blanking was used to isolate the volume over the actual panels from the entire modeled district and to calculate gas amounts that were directly related to the emissions in longwall mines. Results indicated that gas-in-place in the Pittsburgh seam, in the caved zone and in the fractured zone, as well as displacements in major rock units, showed spatial correlations that could be modeled and estimated using geostatistical methods. 
This study showed that GIP volumes may change up to 3 MMscf per acre and, in a multi-panel district, may total 9 Bcf of methane within the gas emission zone. Therefore, ventilation and gas capture systems should be designed accordingly. In addition, rock displacements within the gas emission zone are spatially distributed. From an engineering and practical point of view, spatial distributions of GIP and distributions of rock displacements should be correlated with in-mine emissions and gob gas venthole productions. © 2011.
Karacan, C. Özgen; Olea, Ricardo A.; Goodman, Gerrit
2015-01-01
Determination of the size of the gas emission zone, the locations of gas sources within, and especially the amount of gas retained in those zones is one of the most important steps for designing a successful methane control strategy and an efficient ventilation system in longwall coal mining. The formation of the gas emission zone and the potential amount of gas-in-place (GIP) that might be available for migration into a mine are factors of local geology and rock properties that usually show spatial variability in continuity and may also show geometric anisotropy. Geostatistical methods are used here for modeling and prediction of gas amounts and for assessing their associated uncertainty in gas emission zones of longwall mines for methane control. This study used core data obtained from 276 vertical exploration boreholes drilled from the surface to the bottom of the Pittsburgh coal seam in a mining district in the Northern Appalachian basin. After identifying important coal and non-coal layers for the gas emission zone, univariate statistical and semivariogram analyses were conducted for data from different formations to define the distribution and continuity of various attributes. Sequential simulations performed stochastic assessment of these attributes, such as gas content, strata thickness, and strata displacement. These analyses were followed by calculations of gas-in-place and their uncertainties in the Pittsburgh seam caved zone and fractured zone of longwall mines in this mining district. Grid blanking was used to isolate the volume over the actual panels from the entire modeled district and to calculate gas amounts that were directly related to the emissions in longwall mines. Results indicated that gas-in-place in the Pittsburgh seam, in the caved zone and in the fractured zone, as well as displacements in major rock units, showed spatial correlations that could be modeled and estimated using geostatistical methods. 
This study showed that GIP volumes may change up to 3 MMscf per acre and, in a multi-panel district, may total 9 Bcf of methane within the gas emission zone. Therefore, ventilation and gas capture systems should be designed accordingly. In addition, rock displacements within the gas emission zone are spatially distributed. From an engineering and practical point of view, spatial distributions of GIP and distributions of rock displacements should be correlated with in-mine emissions and gob gas venthole productions. PMID:26435558
75 FR 55678 - Minerals Management: Adjustment of Cost Recovery Fees
Federal Register 2010, 2011, 2012, 2013, 2014
2010-09-14
... text to the general cost recovery fee table so that mineral cost recovery fees can be found in one... Coal and Oil Shale) Program's lease renewal fee will increase from $480 to $485; (C) The Mining Law... $2,840; and (D) The Mining Law Administration Program's fee for mineral patent adjudication of 10 or...
Federal Register 2010, 2011, 2012, 2013, 2014
2010-10-15
... Environmental Assessment and Finding of No Significant Impact for License Amendment No. 61 for Rio Algom Mining... amendment to Source Materials License SUA-1473 issued to Rio Algom Mining LLC (Rio Algom, or the Licensee... access the NRC's Agencywide Document Access and Management System (ADAMS), which provides text and image...
ERIC Educational Resources Information Center
Tsai, Yea-Ru; Ouyang, Chen-Sen; Chang, Yukon
2016-01-01
The purpose of this study is to propose a diagnostic approach to identify engineering students' English reading comprehension errors. Student data were collected during the process of reading texts of English for science and technology on a web-based cumulative sentence analysis system. For the analysis, the association-rule, data mining technique…
A data mining based approach to predict spatiotemporal changes in satellite images
NASA Astrophysics Data System (ADS)
Boulila, W.; Farah, I. R.; Ettabaa, K. Saheb; Solaiman, B.; Ghézala, H. Ben
2011-06-01
The interpretation of remotely sensed images in a spatiotemporal context is becoming a valuable research topic. However, the constant growth of data volume in remote sensing imaging makes reaching conclusions based on collected data a challenging task. Recently, data mining appears to be a promising research field leading to several interesting discoveries in various areas such as marketing, surveillance, fraud detection and scientific discovery. By integrating data mining and image interpretation techniques, accurate and relevant information (i.e. functional relation between observed parcels and a set of informational contents) can be automatically elicited. This study presents a new approach to predict spatiotemporal changes in satellite image databases. The proposed method exploits fuzzy sets and data mining concepts to build predictions and decisions for several remote sensing fields. It takes into account imperfections related to the spatiotemporal mining process in order to provide more accurate and reliable information about land cover changes in satellite images. The proposed approach is validated using SPOT images representing the Saint-Denis region, capital of Reunion Island. Results show good performances of the proposed framework in predicting change for the urban zone.
Information Gain Based Dimensionality Selection for Classifying Text Documents
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dumidu Wijayasekara; Milos Manic; Miles McQueen
2013-06-01
Selecting the optimal dimensions for various knowledge extraction applications is an essential component of data mining. Dimensionality selection techniques are utilized in classification applications to increase the classification accuracy and reduce the computational complexity. In text classification, where the dimensionality of the dataset is extremely high, dimensionality selection is even more important. This paper presents a novel, genetic algorithm based methodology, for dimensionality selection in text mining applications that utilizes information gain. The presented methodology uses information gain of each dimension to change the mutation probability of chromosomes dynamically. Since the information gain is calculated a priori, the computational complexity is not affected. The presented method was tested on a specific text classification problem and compared with conventional genetic algorithm based dimensionality selection. The results show an improvement of 3% in the true positives and 1.6% in the true negatives over conventional dimensionality selection methods.
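The information gain of a single dimension, which the abstract above uses to steer mutation probability, can be computed as in the sketch below. It covers only the standard information gain formula for a discrete feature; how the paper maps these scores onto mutation probabilities is not stated in the abstract and is not reproduced here. Function names and the toy data are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """H(labels) minus the entropy of labels conditioned on the feature.

    `feature_values[i]` is the discrete value of one dimension (for
    example, term present/absent) for document i, and `labels[i]` is
    that document's class.
    """
    n = len(labels)
    by_value = {}
    for value, label in zip(feature_values, labels):
        by_value.setdefault(value, []).append(label)
    conditional = sum(len(subset) / n * entropy(subset)
                      for subset in by_value.values())
    return entropy(labels) - conditional
```

On a balanced two-class toy set, a term that appears in exactly one class scores 1 bit, and a term spread evenly across both classes scores 0. Because these scores depend only on the training data, they can be computed once before the genetic search starts, which matches the abstract's remark that they are calculated a priori.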
Natural Language Processing Technologies in Radiology Research and Clinical Applications.
Cai, Tianrun; Giannopoulos, Andreas A; Yu, Sheng; Kelil, Tatiana; Ripley, Beth; Kumamaru, Kanako K; Rybicki, Frank J; Mitsouras, Dimitrios
2016-01-01
The migration of imaging reports to electronic medical record systems holds great potential in terms of advancing radiology research and practice by leveraging the large volume of data continuously being updated, integrated, and shared. However, there are significant challenges as well, largely due to the heterogeneity of how these data are formatted. Indeed, although there is movement toward structured reporting in radiology (ie, hierarchically itemized reporting with use of standardized terminology), the majority of radiology reports remain unstructured and use free-form language. To effectively "mine" these large datasets for hypothesis testing, a robust strategy for extracting the necessary information is needed. Manual extraction of information is a time-consuming and often unmanageable task. "Intelligent" search engines that instead rely on natural language processing (NLP), a computer-based approach to analyzing free-form text or speech, can be used to automate this data mining task. The overall goal of NLP is to translate natural human language into a structured format (ie, a fixed collection of elements), each with a standardized set of choices for its value, that is easily manipulated by computer programs to (among other things) order into subcategories or query for the presence or absence of a finding. The authors review the fundamentals of NLP and describe various techniques that constitute NLP in radiology, along with some key applications. ©RSNA, 2016.
A case study of methane gas migration through sealed mine GOB into active mine workings
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garcia, F.; McCall, F.E.; Trevits, M.A.
1995-12-31
The U.S. Bureau of Mines investigated the influence of atmospheric pressure changes on methane gas migration through mine seals at a mine site located in the Pittsburgh Coalbed. The mine gained access to a coal reserve through part of an abandoned mine and constructed nine seals to isolate the extensive old workings from the active mine area. Underground problems were experienced when atmospheric pressure fell, causing methane gas to migrate around the seals and into the active workings. During mining operations, methane gas levels exceeded legal limits and coal production was halted until the ventilation system could be improved. When mining resumed with increased air flow, methane gas concentrations occasionally exceeded the legal limits and production had to be halted until the methane level fell within the mandated limit. To assist the ventilation system, a pressure relief borehole located in the abandoned workings near the mine seals was proposed. Preliminary estimates by a gob gas simulator (computer model) suggested that a 0.76 m (2.5 ft) diameter pressure relief borehole with an exhaust fan would be necessary to remove enough methane from the abandoned area so that the ventilation system could dilute the gas in the active workings. However, by monitoring methane gas emissions and seal pressure during periods of low atmospheric pressure, the amount of methane gas that migrated into the active mine workings was calculated. Researchers then determined that a 20.3 cm (8 in) relief borehole with an exhaust fan could remove at least twice the maximum measured volume of migrating methane gas. Because gas concentrations in the abandoned workings could potentially reach explosive limits, it was proposed that the mine eliminate the exhaust fan. Installation of the recommended borehole and enlarging two other ventilation boreholes located in the abandoned area reduced methane gas leakage through the seals by at least 63%.
Examination of space/volume requirements for US underground coal mine refuge alternatives.
Porter, William L; Dempsey, Patrick G; Jansky, Jacqueline H
2017-01-01
The Mine Safety and Health Administration requires that 1.4 m² (15 ft²) of floor space is to be provided for each person inside a refuge alternative (RA). However, the amount of floor space needed for a person to reside inside an RA and perform basic tasks is unknown. During testing, participants entered into an RA or a simulated RA of various space/volume configurations and performed several simulated tasks that are representative of the survivability tasks performed within an RA. The results indicate that the floor space requirements were generally adequate for the tasks studied. Certain tasks such as changing scrubber cartridges, using toilets, and moving about the RA were impacted by the minimum height tested (0.6 m). As such, RAs of this height will require critical design consideration as a whole and the supplies provided for use inside of the RA to ensure the ability to use an RA.
Identifying Key Hospital Service Quality Factors in Online Health Communities
Jung, Yuchul; Hur, Cinyoung; Jung, Dain
2015-01-01
Background The volume of health-related user-created content, especially hospital-related questions and answers in online health communities, has rapidly increased. Patients and caregivers participate in online community activities to share their experiences, exchange information, and ask about recommended or discredited hospitals. However, there is little research on how to identify hospital service quality automatically from the online communities. In the past, in-depth analysis of hospitals has used random sampling surveys. However, such surveys are becoming impractical owing to the rapidly increasing volume of online data and the diverse analysis requirements of related stakeholders. Objective As a solution for utilizing large-scale health-related information, we propose a novel approach to identify hospital service quality factors and overtime trends automatically from online health communities, especially hospital-related questions and answers. Methods We defined social media–based key quality factors for hospitals. In addition, we developed text mining techniques to detect such factors that frequently occur in online health communities. After detecting these factors that represent qualitative aspects of hospitals, we applied a sentiment analysis to recognize the types of recommendations in messages posted within online health communities. Korea’s two biggest online portals were used to test the effectiveness of detection of social media–based key quality factors for hospitals. Results To evaluate the proposed text mining techniques, we performed manual evaluations on the extraction and classification results, such as hospital name, service quality factors, and recommendation types using a random sample of messages (ie, 5.44% (9450/173,748) of the total messages). Service quality factor detection and hospital name extraction achieved average F1 scores of 91% and 78%, respectively. 
In terms of recommendation classification, performance (ie, precision) is 78% on average. Extraction and classification performance still has room for improvement, but the extraction results are applicable to more detailed analysis. Further analysis of the extracted information reveals that there are differences in the details of social media–based key quality factors for hospitals according to the regions in Korea, and the patterns of change seem to accurately reflect social events (eg, influenza epidemics). Conclusions These findings could be used to provide timely information to caregivers, hospital officials, and medical officials for health care policies. PMID:25855612
Deploying and sharing U-Compare workflows as web services.
Kontonatsios, Georgios; Korkontzelos, Ioannis; Kolluru, Balakrishna; Thompson, Paul; Ananiadou, Sophia
2013-02-18
U-Compare is a text mining platform that allows the construction, evaluation and comparison of text mining workflows. U-Compare contains a large library of components that are tuned to the biomedical domain. Users can rapidly develop biomedical text mining workflows by mixing and matching U-Compare's components. Workflows developed using U-Compare can be exported and sent to other users who, in turn, can import and re-use them. However, the resulting workflows are standalone applications, i.e., software tools that run and are accessible only via a local machine, and that can only be run with the U-Compare platform. We address the above issues by extending U-Compare to convert standalone workflows into web services automatically, via a two-click process. The resulting web services can be registered on a central server and made publicly available. Alternatively, users can make web services available on their own servers, after installing the web application framework, which is part of the extension to U-Compare. We have performed a user-oriented evaluation of the proposed extension, by asking users who have tested the enhanced functionality of U-Compare to complete questionnaires that assess its functionality, reliability, usability, efficiency and maintainability. The results obtained reveal that the new functionality is well received by users. The web services produced by U-Compare are built on top of open standards, i.e., REST and SOAP protocols, and therefore, they are decoupled from the underlying platform. Exported workflows can be integrated with any application that supports these open standards. We demonstrate how the newly extended U-Compare enhances the cross-platform interoperability of workflows, by seamlessly importing a number of text mining workflow web services exported from U-Compare into Taverna, i.e., a generic scientific workflow construction platform.
Deploying and sharing U-Compare workflows as web services
2013-01-01
Background U-Compare is a text mining platform that allows the construction, evaluation and comparison of text mining workflows. U-Compare contains a large library of components that are tuned to the biomedical domain. Users can rapidly develop biomedical text mining workflows by mixing and matching U-Compare’s components. Workflows developed using U-Compare can be exported and sent to other users who, in turn, can import and re-use them. However, the resulting workflows are standalone applications, i.e., software tools that run and are accessible only via a local machine, and that can only be run with the U-Compare platform. Results We address the above issues by extending U-Compare to convert standalone workflows into web services automatically, via a two-click process. The resulting web services can be registered on a central server and made publicly available. Alternatively, users can make web services available on their own servers, after installing the web application framework, which is part of the extension to U-Compare. We have performed a user-oriented evaluation of the proposed extension, by asking users who have tested the enhanced functionality of U-Compare to complete questionnaires that assess its functionality, reliability, usability, efficiency and maintainability. The results obtained reveal that the new functionality is well received by users. Conclusions The web services produced by U-Compare are built on top of open standards, i.e., REST and SOAP protocols, and therefore, they are decoupled from the underlying platform. Exported workflows can be integrated with any application that supports these open standards. We demonstrate how the newly extended U-Compare enhances the cross-platform interoperability of workflows, by seamlessly importing a number of text mining workflow web services exported from U-Compare into Taverna, i.e., a generic scientific workflow construction platform. PMID:23419017
Biomedical text mining for research rigor and integrity: tasks, challenges, directions.
Kilicoglu, Halil
2017-06-13
An estimated quarter of a trillion US dollars is invested in the biomedical research enterprise annually. There is growing alarm that a significant portion of this investment is wasted because of problems in reproducibility of research findings and in the rigor and integrity of research conduct and reporting. Recent years have seen a flurry of activities focusing on standardization and guideline development to enhance the reproducibility and rigor of biomedical research. Research activity is primarily communicated via textual artifacts, ranging from grant applications to journal publications. These artifacts can be both the source and the manifestation of practices leading to research waste. For example, an article may describe a poorly designed experiment, or the authors may reach conclusions not supported by the evidence presented. In this article, we pose the question of whether biomedical text mining techniques can assist the stakeholders in the biomedical research enterprise in doing their part toward enhancing research integrity and rigor. In particular, we identify four key areas in which text mining techniques can make a significant contribution: plagiarism/fraud detection, ensuring adherence to reporting guidelines, managing information overload and accurate citation/enhanced bibliometrics. We review the existing methods and tools for specific tasks, if they exist, or discuss relevant research that can provide guidance for future work. With the exponential increase in biomedical research output and the ability of text mining approaches to perform automatic tasks at large scale, we propose that such approaches can support tools that promote responsible research practices, providing significant benefits for the biomedical research enterprise. Published by Oxford University Press 2017. This work is written by a US Government employee and is in the public domain in the US.
May, Brian H; Zhang, Anthony; Lu, Yubo; Lu, Chuanjian; Xue, Charlie C L
2014-12-01
This project aimed to develop an approach to evaluating information contained in the premodern Traditional Chinese Medicine (TCM) literature that was (1) comprehensive, systematic, and replicable and (2) able to produce quantifiable output that could be used to answer specific research questions in order to identify natural products for clinical and experimental research. The project involved two stages. In stage 1, 14 TCM collections and compendia were evaluated for suitability as sources for searching; 8 of these were compared in detail. The results were published in the Journal of Alternative and Complementary Medicine. Stage 2 developed a text-mining approach for two of these sources. The text-mining approach was developed for Zhong Hua Yi Dian (Encyclopaedia of Traditional Chinese Medicine, 4th edition) and Zhong Yi Fang Ji Da Ci Dian (Great Compendium of Chinese Medical Formulae). This approach developed procedures for search term selection; methods for screening, classifying, and scoring data; procedures for systematic searching and data extraction; data checking procedures; and approaches for analyzing results. Examples are provided for studies of memory impairment and diabetic nephropathy, and issues relating to data interpretation are discussed. This approach to the analysis of large collections of the premodern TCM literature uses widely available sources and provides a text-mining approach that is systematic, replicable, and adaptable to the requirements of the particular project. Researchers can use these methods to explore changes in the names and conceptions of a disease over time, to identify which therapeutic methods have been more or less frequently used in different eras for particular disorders, and to assist in the selection of natural products for research efforts.
Figure Text Extraction in Biomedical Literature
Kim, Daehyun; Yu, Hong
2011-01-01
Background Figures are ubiquitous in biomedical full-text articles, and they represent important biomedical knowledge. However, the sheer volume of biomedical publications has made it necessary to develop computational approaches for accessing figures. Therefore, we are developing the Biomedical Figure Search engine (http://figuresearch.askHERMES.org) to allow bioscientists to access figures efficiently. Since text frequently appears in figures, automatically extracting such text may assist the task of mining information from figures. Little research, however, has been conducted exploring text extraction from biomedical figures. Methodology We first evaluated an off-the-shelf Optical Character Recognition (OCR) tool on its ability to extract text from figures appearing in biomedical full-text articles. We then developed a Figure Text Extraction Tool (FigTExT) to improve the performance of the OCR tool for figure text extraction through the use of three innovative components: image preprocessing, character recognition, and text correction. We first developed image preprocessing to enhance image quality and to improve text localization. Then we adapted the off-the-shelf OCR tool on the improved text localization for character recognition. Finally, we developed and evaluated a novel text correction framework by taking advantage of figure-specific lexicons. Results/Conclusions The evaluation on 382 figures (9,643 figure texts in total) randomly selected from PubMed Central full-text articles shows that FigTExT performed with 84% precision, 98% recall, and 90% F1-score for text localization and with 62.5% precision, 51.0% recall and 56.2% F1-score for figure text extraction. When limiting figure texts to those judged by domain experts to be important content, FigTExT performed with 87.3% precision, 68.8% recall, and 77% F1-score. 
FigTExT significantly improved the performance of the off-the-shelf OCR tool we used, which on its own performed with 36.6% precision, 19.3% recall, and 25.3% F1-score for text extraction. In addition, our results show that FigTExT can extract texts that do not appear in figure captions or other associated text, further suggesting the potential utility of FigTExT for improving figure search. PMID:21249186
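The F1-scores quoted above follow from the reported precision and recall as their harmonic mean. The sketch below is only a reader's check, not part of FigTExT; the function name and the dictionary labels are illustrative assumptions, while the precision and recall values are the ones given in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the abstract; labels are ours.
reported = {
    "text localization": (0.84, 0.98),
    "figure text extraction": (0.625, 0.510),
    "important content only": (0.873, 0.688),
    "off-the-shelf OCR alone": (0.366, 0.193),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 = {100 * f1_score(p, r):.1f}%")
```

The recomputed values should agree with the reported F1-scores to within rounding.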
Bollhöfer, Andreas; Honeybun, Russell; Rosman, Kevin; Martin, Paul
2006-08-01
Airborne lead isotope ratios were measured via Thermal Ionisation Mass Spectrometry in samples from the vicinity of Ranger uranium mine in northern Australia. Dust deposited on leaves of Acacia spp. was washed off and analysed to gain a geographical snapshot of lead isotope ratios in the region. Aerosols were also collected on Teflon filters that were changed monthly over one seasonal cycle using a low volume diaphragm pump. Lead isotope ratios in dust deposited on leaves overestimate the relative amount of mine origin airborne lead, most likely due to a difference of the size distribution of particles collected on leaves and true aerosol size distribution. Seasonal measurements show that the annual average mine contribution to airborne lead concentrations in Jabiru East, approximately 2.5 km northwest of the mine, amounted to 13%, with distinct differences between the wet and dry season. The relative contribution of mine origin lead deposited on leaves in the dry season drops to less than 1% at a distance of 12.5 km from the mine along the major wind direction. An approach is outlined, in which lead isotope ratios are used to estimate the effective radiation dose received from the inhalation of mine origin radioactivity trapped in or on dust. Using the data from our study, this dose has been calculated to be approximately 2 microSv year(-1) for people living and working in the area.
Incorporating linguistic knowledge for learning distributed word representations.
Wang, Yan; Liu, Zhiyuan; Sun, Maosong
2015-01-01
Combined with neural language models, distributed word representations achieve significant advantages in computational linguistics and text mining. Most existing models estimate distributed word vectors from large-scale data in an unsupervised fashion and therefore do not take rich linguistic knowledge into consideration. Linguistic knowledge can be represented as either link-based knowledge or preference-based knowledge, and we propose knowledge regularized word representation models (KRWR) to incorporate this prior knowledge for learning distributed word representations. Experimental results demonstrate that our estimated word representation achieves better performance in the task of semantic relatedness ranking. This indicates that our methods can efficiently encode both prior knowledge from knowledge bases and statistical knowledge from large-scale text corpora into a unified word representation model, which will benefit many tasks in text mining. PMID:25874581
Text Mining of UU-ITE Implementation in Indonesia
NASA Astrophysics Data System (ADS)
Hakim, Lukmanul; Kusumasari, Tien F.; Lubis, Muharman
2018-04-01
At present, social media and networks act as one of the main platforms for sharing information, ideas, thoughts and opinions. Many people share their knowledge and express their views on the specific topics or current hot issues that interest them. Social media texts contain rich information about complaints, comments, recommendations and suggestions posted as automatic reactions or responses to government initiatives or policies intended to overcome certain issues. This study examines the sentiment of netizens, as the vocal part of the citizenry, about the implementation of UU ITE, the first cyberlaw in Indonesia, as a means to identify the current tendency of citizen perception. To perform text mining, this study used the Twitter REST API, while R was used for classification analysis based on hierarchical clustering.
Interactive text mining with Pipeline Pilot: a bibliographic web-based tool for PubMed.
Vellay, S G P; Latimer, N E Miller; Paillard, G
2009-06-01
Text mining has become an integral part of all research in the medical field. Many text analysis software platforms support particular use cases and only those. We show an example of a bibliographic tool that can be used to support virtually any use case in an agile manner. Here we focus on a Pipeline Pilot web-based application that interactively analyzes and reports on PubMed search results. This will be of interest to any scientist to help identify the most relevant papers in a topical area more quickly and to evaluate the results of query refinement. Links with Entrez databases help both the biologist and the chemist alike. We illustrate this application with Leishmaniasis, a neglected tropical disease, as a case study.
Literature classification for semi-automated updating of biological knowledgebases
2013-01-01
Background As the output of biological assays increases in resolution and volume, the body of specialized biological data, such as functional annotations of gene and protein sequences, enables extraction of higher-level knowledge needed for practical application in bioinformatics. Whereas common types of biological data, such as sequence data, are extensively stored in biological databases, functional annotations, such as immunological epitopes, are found primarily in semi-structured formats or free text embedded in primary scientific literature. Results We defined and applied a machine learning approach for literature classification to support updating of TANTIGEN, a knowledgebase of tumor T-cell antigens. Abstracts from PubMed were downloaded and classified as either "relevant" or "irrelevant" for database update. Training and five-fold cross-validation of a k-NN classifier on 310 abstracts yielded classification accuracy of 0.95, thus showing significant value in support of data extraction from the literature. Conclusion We here propose a conceptual framework for semi-automated extraction of epitope data embedded in scientific literature using principles from text mining and machine learning. The addition of such data will aid in the transition of biological databases to knowledgebases. PMID:24564403
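The general idea of k-NN text classification with five-fold cross-validation can be sketched as follows. This is a minimal illustration under stated assumptions, not the TANTIGEN pipeline: the bag-of-words features, the cosine similarity, the value of k, the function names and the ten toy "abstracts" are all hypothetical, since the abstract does not specify them.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (an assumed feature representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(train, text, k=3):
    """Majority label among the k training texts most similar to `text`."""
    query = vectorize(text)
    ranked = sorted(train, key=lambda ex: cosine(vectorize(ex[0]), query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def cross_validate(data, folds=5, k=3):
    """Accuracy over `folds` held-out folds (every folds-th item forms one fold)."""
    correct = 0
    for f in range(folds):
        held_out = data[f::folds]
        train = [ex for i, ex in enumerate(data) if i % folds != f]
        correct += sum(knn_predict(train, text, k) == label for text, label in held_out)
    return correct / len(data)


# Hypothetical toy data, invented for illustration only.
data = [
    ("tumor antigen recognized by cytotoxic t cells", "relevant"),
    ("rainfall and soil erosion in river basins", "irrelevant"),
    ("t cell epitope derived from a melanoma tumor antigen", "relevant"),
    ("traffic flow model for urban road networks", "irrelevant"),
    ("hla restricted epitope of a tumor antigen", "relevant"),
    ("soil moisture and rainfall in mountain basins", "irrelevant"),
    ("tumor t cell antigen epitope mapping", "relevant"),
    ("road traffic model for urban planning", "irrelevant"),
    ("cytotoxic t cells recognize tumor epitope", "relevant"),
    ("river basins and urban flood model", "irrelevant"),
]
print("five-fold accuracy on toy data:", cross_validate(data))
```

With real abstracts one would normally add term weighting and tune k on held-out data; the structure of the evaluation loop stays the same.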
Spectral signature verification using statistical analysis and text mining
NASA Astrophysics Data System (ADS)
DeCoster, Mallory E.; Firpi, Alexe H.; Jacobs, Samantha K.; Cone, Shelli R.; Tzeng, Nigel H.; Rodriguez, Benjamin M.
2016-05-01
In the spectral science community, numerous spectral signatures are stored in databases representative of many sample materials collected from a variety of spectrometers and spectroscopists. Due to the variety and variability of the spectra that comprise many spectral databases, it is necessary to establish a metric for validating the quality of spectral signatures. This has been an area of great discussion and debate in the spectral science community. This paper discusses a method that independently validates two different aspects of a spectral signature to arrive at a final qualitative assessment: the textual meta-data and the numerical spectral data. Results associated with the spectral data stored in the Signature Database [1] (SigDB) are presented. The numerical data comprising a sample material's spectrum is validated based on statistical properties derived from an ideal population set. The quality of the test spectrum is ranked based on a spectral angle mapper (SAM) comparison to the mean spectrum derived from the population set. Additionally, the contextual data of a test spectrum is qualitatively analyzed using lexical analysis text mining. This technique analyzes the syntax of the meta-data to provide local learning patterns and trends within the spectral data that are indicative of the test spectrum's quality. Text mining applications have successfully been implemented for security [2] (text encryption/decryption), biomedical [3], and marketing [4] applications. The text mining lexical analysis algorithm is trained on the meta-data patterns of a subset of high and low quality spectra, in order to have a model to apply to the entire SigDB data set. The statistical and textual methods combine to assess the quality of a test spectrum existing in a database without the need for an expert user. This method has been compared to other validation methods accepted by the spectral science community, and has provided promising results when a baseline spectral signature is present for comparison. The spectral validation method proposed is described from a practical application and analytical perspective.
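The spectral angle mapper (SAM) comparison mentioned above treats each spectrum as a vector and measures the angle between a test spectrum and the population mean. The sketch below shows only that standard angle computation; the function names and the small population of spectra are hypothetical, and how the paper turns angles into a quality ranking is not stated in the abstract.

```python
import math


def mean_spectrum(population):
    """Band-wise mean of a list of equally sampled spectra."""
    n = len(population)
    return [sum(band) / n for band in zip(*population)]


def spectral_angle(a, b):
    """Angle in radians between two spectra treated as vectors (SAM)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        raise ValueError("spectral angle is undefined for an all-zero spectrum")
    # Clamp to guard against floating-point values slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / (na * nb))))


# Hypothetical reflectance values, invented for illustration only.
population = [
    [0.10, 0.30, 0.55, 0.40],
    [0.12, 0.28, 0.60, 0.42],
    [0.09, 0.31, 0.57, 0.38],
]
test_spectrum = [0.11, 0.29, 0.30, 0.45]

angle = spectral_angle(test_spectrum, mean_spectrum(population))
print(f"spectral angle to population mean: {angle:.4f} rad")
```

A smaller angle means the test spectrum has a shape closer to the mean spectrum. Because the angle ignores overall scale, a uniformly brighter or darker copy of the same spectrum gives an angle of zero.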
Minerals Yearbook, volume II, Area Reports—Domestic
2018-01-01
The U.S. Geological Survey (USGS) Minerals Yearbook discusses the performance of the worldwide minerals and materials industries and provides background information to assist in interpreting that performance. Content of the individual Minerals Yearbook volumes follows: Volume I, Metals and Minerals, contains chapters about virtually all metallic and industrial mineral commodities important to the U.S. economy. Chapters on survey methods, summary statistics for domestic nonfuel minerals, and trends in mining and quarrying in the metals and industrial mineral industries in the United States are also included. Volume II, Area Reports: Domestic, contains a chapter on the mineral industry of each of the 50 States and Puerto Rico and the Administered Islands. This volume also has chapters on survey methods and summary statistics of domestic nonfuel minerals. Volume III, Area Reports: International, is published as four separate reports. These regional reports contain the latest available minerals data on more than 180 foreign countries and discuss the importance of minerals to the economies of these nations and the United States. Each report begins with an overview of the region’s mineral industries during the year. It continues with individual country chapters that examine the mining, refining, processing, and use of minerals in each country of the region and how each country’s mineral industry relates to U.S. industry. Most chapters include production tables and industry structure tables, information about Government policies and programs that affect the country’s mineral industry, and an outlook section. The USGS continually strives to improve the value of its publications to users. Constructive comments and suggestions by readers of the Minerals Yearbook are welcomed.
Minerals Yearbook, volume I, Metals and Minerals
2018-01-01
The U.S. Geological Survey (USGS) Minerals Yearbook discusses the performance of the worldwide minerals and materials industries and provides background information to assist in interpreting that performance. Content of the individual Minerals Yearbook volumes follows: Volume I, Metals and Minerals, contains chapters about virtually all metallic and industrial mineral commodities important to the U.S. economy. Chapters on survey methods, summary statistics for domestic nonfuel minerals, and trends in mining and quarrying in the metals and industrial mineral industries in the United States are also included. Volume II, Area Reports: Domestic, contains a chapter on the mineral industry of each of the 50 States and Puerto Rico and the Administered Islands. This volume also has chapters on survey methods and summary statistics of domestic nonfuel minerals. Volume III, Area Reports: International, is published as four separate reports. These regional reports contain the latest available minerals data on more than 180 foreign countries and discuss the importance of minerals to the economies of these nations and the United States. Each report begins with an overview of the region’s mineral industries during the year. It continues with individual country chapters that examine the mining, refining, processing, and use of minerals in each country of the region and how each country’s mineral industry relates to U.S. industry. Most chapters include production tables and industry structure tables, information about Government policies and programs that affect the country’s mineral industry, and an outlook section. The USGS continually strives to improve the value of its publications to users. Constructive comments and suggestions by readers of the Minerals Yearbook are welcomed.
Minerals Yearbook, volume III, Area Reports—International
2018-01-01
The U.S. Geological Survey (USGS) Minerals Yearbook discusses the performance of the worldwide minerals and materials industries and provides background information to assist in interpreting that performance. Content of the individual Minerals Yearbook volumes follows: Volume I, Metals and Minerals, contains chapters about virtually all metallic and industrial mineral commodities important to the U.S. economy. Chapters on survey methods, summary statistics for domestic nonfuel minerals, and trends in mining and quarrying in the metals and industrial mineral industries in the United States are also included. Volume II, Area Reports: Domestic, contains a chapter on the mineral industry of each of the 50 States and Puerto Rico and the Administered Islands. This volume also has chapters on survey methods and summary statistics of domestic nonfuel minerals. Volume III, Area Reports: International, is published as four separate reports. These regional reports contain the latest available minerals data on more than 180 foreign countries and discuss the importance of minerals to the economies of these nations and the United States. Each report begins with an overview of the region’s mineral industries during the year. It continues with individual country chapters that examine the mining, refining, processing, and use of minerals in each country of the region and how each country’s mineral industry relates to U.S. industry. Most chapters include production tables and industry structure tables, information about Government policies and programs that affect the country’s mineral industry, and an outlook section. The USGS continually strives to improve the value of its publications to users. Constructive comments and suggestions by readers of the Minerals Yearbook are welcomed.
Chen, Chuyun; Hong, Jiaming; Zhou, Weilin; Lin, Guohua; Wang, Zhengfei; Zhang, Qufei; Lu, Cuina; Lu, Lihong
2017-07-12
To construct a knowledge platform of ancient acupuncture books based on data mining technology, and to provide a retrieval service for users. An Oracle 10g database was used and Java was selected as the development language. Based on the standard library and the ancient books database established by manual entry, a variety of data mining technologies, including word segmentation, part-of-speech tagging, dependency analysis, rule extraction, similarity calculation, ambiguity analysis and supervised classification, were applied to achieve automatic text extraction from the ancient books. Finally, through association mining and decision analysis, comprehensive and intelligent analysis of diseases and symptoms, meridians, acupoints, and rules of acupuncture and moxibustion in the ancient books was realized, and a retrieval service was provided for users through a browser/server (B/S) architecture. The platform realized full-text retrieval, word frequency analysis and association analysis; when diseases or acupoints were searched, the frequencies of meridians, acupoints (diseases) and techniques were presented from high to low, together with the support degree and confidence coefficient between disease and acupoints (special acupoints), between acupoints within a prescription, and between disease or acupoints and technique. The experience platform of ancient acupuncture books based on data mining technology could be used as a reference for the selection of disease, meridian and acupoint in clinical treatment and in the education of acupuncture and moxibustion.
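The support degree and confidence coefficient reported by the platform are the standard measures of association rule mining. The sketch below shows how they are usually computed over a set of records; it is written in Python for brevity (the platform itself is described as a Java application), and the function names and the placeholder records are hypothetical, not data from the ancient books database.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(items <= record for record in records) / len(records)


def confidence(records, antecedent, consequent):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    base = support(records, antecedent)
    if base == 0:
        return 0.0
    return support(records, set(antecedent) | set(consequent)) / base


# Hypothetical prescription records with placeholder names, for illustration only.
records = [
    {"disease_A", "acupoint_X", "acupoint_Y"},
    {"disease_A", "acupoint_X"},
    {"disease_B", "acupoint_Y"},
    {"disease_A", "acupoint_Z"},
]

print("support(disease_A, acupoint_X):", support(records, {"disease_A", "acupoint_X"}))
print("confidence(disease_A -> acupoint_X):", confidence(records, {"disease_A"}, {"acupoint_X"}))
```

Support says how often a disease and an acupoint appear together across all records, while confidence says how often the acupoint appears among the records that mention the disease.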
NASA Astrophysics Data System (ADS)
Stoch, B.; Anthonissen, C. J.; McCall, M.-J.; Basson, I. J.; Deacon, J.; Cloete, E.; Botha, J.; Britz, J.; Strydom, M.; Nel, D.; Bester, M.
2017-12-01
The Sishen deposit is one of the largest iron ore concentrations in current production. Hematite mineralization occurs along a strike length of 14 km, with a width of 3.2 km and a maximum vertical extent of 400 m below the original surface. The 986-Mt reserve incorporates a suite of individual orebodies, beneath a locally preserved tectonized unconformity, with a wide range of geometries, depths, and orientations. Fully constrained, implicit 3D modeling of the entire mining volume (> 70 km3), was undertaken to the original, pre-mining topography. The model incorporates 5287 mapping points and > 21,000 drillholes and provides exceptional insight into the original configuration of ore and its relationship to contacts, unconformities, and structures in the enclosing country rock. The bulk of ore occurs to the west of a strike-extensive, partially inverted normal fault (Sloep Fault), within an asymmetrical synclinal structure on its western flank. This linear, N-S distribution of deep, thick ore is punctuated by palaeosinkholes, wherein base-of-ore dips of greater than 45°, are concentrically arranged. Localized ore volumes also occur along faults and in fault-bounded, downthrown blocks, to the north of NW-SE- and NE-SW-trending strike-slip faults that show relatively minor uplift to the south, probably due to the Lomanian Namaqua-Natal Orogeny. The revised model demonstrates the proximity of ore to a tectonized unconformity and highlights the structural control on ore volumes, implying that Fe mineralization at Sishen cannot be exclusively attributed to supergene enrichment and concentric palaeosinkhole formation.
Data mining applications in the context of casemix.
Koh, H C; Leong, S K
2001-07-01
In October 1999, the Singapore Government introduced casemix-based funding to public hospitals. The casemix approach to health care funding is expected to yield significant benefits, including equity and rationality in financing health care, the use of comparative casemix data for quality improvement activities, and the provision of information that enables hospitals to understand their cost behaviour and reinforces the drive for more cost-efficient services. However, there is some concern about the "quicker and sicker" syndrome (that is, the rapid discharge of patients with little regard for the quality of outcome). As it is likely that consequences of premature discharges will be reflected in the readmission data, an analysis of possible systematic patterns in readmission data can provide useful insight into the "quicker and sicker" syndrome. This paper explores potential data mining applications in the context of casemix by using readmission data as an illustration. In particular, it illustrates how data mining can be used to better understand readmission data and to detect systematic patterns, if any. From a technical perspective, data mining (which is capable of analysing complex non-linear and interaction relationships) supplements and complements traditional statistical methods in data analysis. From an applications perspective, data mining provides the technology and methodology to analyse mass volume of data to detect hidden patterns in data. Using readmission data as an illustrative data mining application, this paper explores potential data mining applications in the general casemix context.
Public reactions to e-cigarette regulations on Twitter: a text mining analysis.
Lazard, Allison J; Wilcox, Gary B; Tuttle, Hannah M; Glowacki, Elizabeth M; Pikowski, Jessica
2017-12-01
In May 2016, the Food and Drug Administration (FDA) issued a final rule that deemed e-cigarettes to be within their regulatory authority as a tobacco product. News and opinions about the regulation were shared on social media platforms, such as Twitter, which can play an important role in shaping the public's attitudes. We analysed information shared on Twitter for insights into initial public reactions. A text mining approach was used to uncover important topics among reactions to the e-cigarette regulations on Twitter. SAS Text Miner V.12.1 software was used for descriptive text mining to uncover the primary topics from tweets collected from May 1 to May 17 2016 using NUVI software to gather the data. A total of nine topics were generated. These topics reveal initial reactions to whether the FDA's e-cigarette regulations will benefit or harm public health, how the regulations will impact the emerging e-cigarette market and efforts to share the news. The topics were dominated by negative or mixed reactions. In the days following the FDA's announcement of the new deeming regulations, the public reaction on Twitter was largely negative. Public health advocates should consider using social media outlets to better communicate the policy's intentions, reach and potential impact for public good to create a more balanced conversation. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
2015-01-01
Background Sufficient knowledge of molecular and genetic interactions, which comprise the entire basis of the functioning of living systems, is one of the necessary requirements for successfully answering almost any research question in the field of biology and medicine. To date, more than 24 million scientific papers can be found in PubMed, with many of them containing descriptions of a wide range of biological processes. The analysis of such tremendous amounts of data requires the use of automated text-mining approaches. Although a handful of tools have recently been developed to meet this need, none of them provide error-free extraction of highly detailed information. Results The ANDSystem package was developed for the reconstruction and analysis of molecular genetic networks based on an automated text-mining technique. It provides a detailed description of the various types of interactions between genes, proteins, microRNAs, metabolites, cellular components, pathways and diseases, taking into account the specificity of cell lines and organisms. Although the accuracy of ANDSystem is comparable to other well-known text-mining tools, such as Pathway Studio and STRING, it outperforms them in having the ability to identify an increased number of interaction types. Conclusion The use of ANDSystem, in combination with Pathway Studio and STRING, can improve the quality of the automated reconstruction of molecular and genetic networks. ANDSystem should provide a useful tool for researchers working in a number of different fields, including biology, biotechnology, pharmacology and medicine. PMID:25881313
Learn Japanese--Elementary School Text, Volume III.
ERIC Educational Resources Information Center
Sato, Yaeko; And Others
This volume is the teacher's text for the first semester program on level two (fourth grade). See AL 001 718 for Volume I and ED 019 666 for Volume II. Text materials for the second level continue to introduce new structures systematically, according to the pupils' interest, ability, and rate of learning. Dialogs for level two, Volumes III and IV,…
Tagline: Information Extraction for Semi-Structured Text Elements in Medical Progress Notes
ERIC Educational Resources Information Center
Finch, Dezon Kile
2012-01-01
Text analysis has become an important research activity in the Department of Veterans Affairs (VA). Statistical text mining and natural language processing have been shown to be very effective for extracting useful information from medical documents. However, neither of these techniques is effective at extracting the information stored in…
ERIC Educational Resources Information Center
Walkington, Candace; Clinton, Virginia; Ritter, Steven N.; Nathan, Mitchell J.
2015-01-01
Solving mathematics story problems requires text comprehension skills. However, previous studies have found few connections between traditional measures of text readability and performance on story problems. We hypothesized that recently developed measures of readability and topic incidence measured by text-mining tools may illuminate associations…
New challenges for text mining: mapping between text and manually curated pathways
Oda, Kanae; Kim, Jin-Dong; Ohta, Tomoko; Okanohara, Daisuke; Matsuzaki, Takuya; Tateisi, Yuka; Tsujii, Jun'ichi
2008-01-01
Background Associating literature with pathways poses new challenges to the Text Mining (TM) community. There are three main challenges to this task: (1) the identification of the mapping position of a specific entity or reaction in a given pathway, (2) the recognition of the causal relationships among multiple reactions, and (3) the formulation and implementation of required inferences based on biological domain knowledge. Results To address these challenges, we constructed new resources to link the text with a model pathway; they are: the GENIA pathway corpus with event annotation and NF-kB pathway. Through their detailed analysis, we address the untapped resource, ‘bio-inference,’ as well as the differences between text and pathway representation. Here, we show the precise comparisons of their representations and the nine classes of ‘bio-inference’ schemes observed in the pathway corpus. Conclusions We believe that the creation of such rich resources and their detailed analysis is the significant first step for accelerating the research of the automatic construction of pathway from text. PMID:18426550
Sentiment analysis of Arabic tweets using text mining techniques
NASA Astrophysics Data System (ADS)
Al-Horaibi, Lamia; Khan, Muhammad Badruddin
2016-07-01
Sentiment analysis has become a flourishing field of text mining and natural language processing. Sentiment analysis aims to determine whether a text is written to express positive, negative, or neutral emotions about a certain domain. Most sentiment analysis researchers focus on English texts, with very limited resources available for other complex languages, such as Arabic. In this study, the aim was to develop an initial model that performs satisfactorily in measuring Arabic Twitter sentiment using a machine learning approach, with Naïve Bayes and Decision Tree as the classification algorithms. The dataset used contains more than 2,000 Arabic tweets collected from Twitter. We performed several experiments to check the performance of the two classifiers using different combinations of text-processing functions. We found that the available facilities for Arabic text processing need to be built from scratch or improved in order to develop accurate classifiers. The small functionalities that we developed in a Python environment helped improve the results and showed that sentiment analysis in the Arabic domain needs a lot of work on the lexicon side.
Dias, Alvaro Machado; Mansur, Carlos Gustavo; Myczkowski, Martin; Marcolin, Marco
2011-06-01
Transcranial magnetic stimulation (TMS) has played an important role in the fields of psychiatry, neurology and neuroscience, since its emergence in the mid-1980s; and several high quality reviews have been produced since then. Most high quality reviews serve as powerful tools in the evaluation of predefined tendencies, but they cannot actually uncover new trends within the literature. However, special statistical procedures to 'mine' the literature have been developed which aid in achieving such a goal. This paper aims to uncover patterns within the literature on TMS as a whole, as well as specific trends in the recent literature on TMS for the treatment of depression. Data mining and text mining. Currently there are 7299 publications, which can be clustered in four essential themes. Considering the frequency of the core psychiatric concepts within the indexed literature, the main results are: depression is present in 13.5% of the publications; Parkinson's disease in 2.94%; schizophrenia in 2.76%; bipolar disorder in 0.158%; and anxiety disorder in 0.142% of all the publications indexed in PubMed. Several other perspectives are discussed in the article. Copyright © 2011 Elsevier B.V. All rights reserved.
76 FR 64384 - Petitions for Modification of Application of Existing Mandatory Safety Standards
Federal Register 2010, 2011, 2012, 2013, 2014
2011-10-18
... flame-resistant material or made with material accepted by MSHA as flame resistant. (9) If mining... alcohol slow-fermented from starch, bearing an alcohol content of less than 10 percent alcohol by volume...
Text mining by Tsallis entropy
NASA Astrophysics Data System (ADS)
Jamaati, Maryam; Mehri, Ali
2018-01-01
Long-range correlations between the elements of natural languages enable them to convey very complex information. The complex structure of human language, as a manifestation of natural languages, motivates us to apply nonextensive statistical mechanics to text mining. Tsallis entropy appropriately ranks the terms' relevance to the document subject, taking advantage of their spatial correlation length. We apply this statistical concept as a new, powerful word ranking metric in order to extract the keywords of a single document. We carry out an experimental evaluation, which shows the capability of the presented method in keyword extraction. We find that Tsallis entropy has reliable word ranking performance, at the same level as the best previous ranking methods.
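For reference, the standard Tsallis entropy of a discrete distribution is S_q = (1 - Σ p_i^q) / (q - 1), which reduces to the Shannon entropy as q approaches 1. The sketch below only evaluates that formula. How the paper builds a probability distribution for each word from its positions in the text is not stated in the abstract, so the input distribution and the function name here are assumptions.

```python
import math


def tsallis_entropy(probabilities, q):
    """Tsallis entropy S_q = (1 - sum(p_i**q)) / (q - 1); Shannon limit at q = 1."""
    probs = [p for p in probabilities if p > 0]  # zero-probability terms contribute nothing
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    if abs(q - 1.0) < 1e-12:
        # q -> 1 recovers the Shannon entropy (natural logarithm)
        return -sum(p * math.log(p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)
```

For example, a uniform two-outcome distribution gives `tsallis_entropy([0.5, 0.5], 2)` equal to 0.5, while a distribution concentrated on one outcome gives 0.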
Summary of the BioLINK SIG 2013 meeting at ISMB/ECCB 2013.
Verspoor, Karin; Shatkay, Hagit; Hirschman, Lynette; Blaschke, Christian; Valencia, Alfonso
2015-01-15
The ISMB Special Interest Group on Linking Literature, Information and Knowledge for Biology (BioLINK) organized a one-day workshop at ISMB/ECCB 2013 in Berlin, Germany. The theme of the workshop was 'Roles for text mining in biomedical knowledge discovery and translational medicine'. This summary reviews the outcomes of the workshop. Meeting themes included concept annotation methods and applications, extraction of biological relationships and the use of text-mined data for biological data analysis. All articles are available at http://biolinksig.org/proceedings-online/. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
NASA Astrophysics Data System (ADS)
Xiaoyang, Zhong; Hong, Ren; Jingxin, Gao
2018-03-01
With the gradual maturing of the real estate market in China, urban housing prices are better able to reflect changes in market demand, and the commodity property of commercial housing has become more and more obvious. Many scholars in China have carried out research on the factors that affect the price of urban commercial housing, and the number of related research papers has increased rapidly. These research results are a valuable resource for addressing the problem of urban housing price changes in China. However, because the literature is so large, much of this information is buried in libraries and cannot be fully utilized. Text mining technology has received wide attention and development in the humanities and social sciences in recent years, but using text mining to identify the factors influencing the price of urban commercial housing is still relatively rare. In this paper, the research results of existing scholars were mined with a text mining algorithm based on a support vector machine, in order to make fuller use of current research results and to provide a reference for stabilizing housing prices.
Jurca, Gabriela; Addam, Omar; Aksac, Alper; Gao, Shang; Özyer, Tansel; Demetrick, Douglas; Alhajj, Reda
2016-04-26
Breast cancer is a serious disease which affects many women and may lead to death. It has received considerable attention from the research community. Thus, biomedical researchers aim to find genetic biomarkers indicative of the disease. Novel biomarkers can be elucidated from the existing literature. However, the vast number of scientific publications on breast cancer makes this a daunting task. This paper presents a framework which investigates existing literature data for informative discoveries. It integrates text mining and social network analysis in order to identify new potential biomarkers for breast cancer. We utilized PubMed for the testing. We investigated gene-gene interactions, as well as novel interactions such as gene-year, gene-country, and abstract-country, to find out how the discoveries varied over time and how overlapping or diverse the discoveries and the interests of various research groups in different countries are. Interesting trends have been identified and discussed, e.g., different genes are highlighted in relationship to different countries though the various genes were found to share functionality. Some text analysis based results have been validated against results from other tools that predict gene-gene relations and gene functions.
Dura, Elzbieta; Muresan, Sorel; Engkvist, Ola; Blomberg, Niklas; Chen, Hongming
2014-05-01
In the pharmaceutical industry, efficiently mining pharmacological data from the rapidly increasing scientific literature is crucial for many aspects of the drug discovery process, such as target validation and tool compound selection. A quick and reliable way is needed to collect literature assertions of selected compounds' biological and pharmacological effects in order to assist the hypothesis generation and decision-making of drug developers. INFUSIS, the text mining system presented here, extracts data on chemical compounds from PubMed abstracts. It makes extensive use of customized natural language processing in addition to a co-occurrence analysis. As a proof-of-concept study, INFUSIS was used to search abstract texts for several obesity/diabetes related pharmacological effects of the compounds included in a compound dictionary. The system extracts assertions regarding the pharmacological effects of each given compound and scores them by relevance. For each selected pharmacological effect, the highest scoring assertions in 100 abstracts were manually evaluated, i.e. 800 abstracts in total. The overall accuracy for the inferred assertions was over 90 percent. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Chen, Hongyu; Martin, Bronwen; Daimon, Caitlin M; Maudsley, Stuart
2013-01-01
Text mining is rapidly becoming an essential technique for the annotation and analysis of large biological data sets. Biomedical literature currently increases at a rate of several thousand papers per week, making automated information retrieval methods the only feasible method of managing this expanding corpus. With the increasing prevalence of open-access journals and constant growth of publicly-available repositories of biomedical literature, literature mining has become much more effective with respect to the extraction of biomedically-relevant data. In recent years, text mining of popular databases such as MEDLINE has evolved from basic term-searches to more sophisticated natural language processing techniques, indexing and retrieval methods, structural analysis and integration of literature with associated metadata. In this review, we will focus on Latent Semantic Indexing (LSI), a computational linguistics technique increasingly used for a variety of biological purposes. It is noted for its ability to consistently outperform benchmark Boolean text searches and co-occurrence models at information retrieval and its power to extract indirect relationships within a data set. LSI has been used successfully to formulate new hypotheses, generate novel connections from existing data, and validate empirical data.
Biomedical data mining in clinical routine: expanding the impact of hospital information systems.
Müller, Marcel; Markó, Kornel; Daumke, Philipp; Paetzold, Jan; Roesner, Arnold; Klar, Rüdiger
2007-01-01
In this paper we want to describe how the promising technology of biomedical data mining can improve the use of hospital information systems: a large set of unstructured, narrative clinical data from a dermatological university hospital like discharge letters or other dermatological reports were processed through a morpho-semantic text retrieval engine ("MorphoSaurus") and integrated with other clinical data using a web-based interface and brought into daily clinical routine. The user evaluation showed a very high user acceptance - this system seems to meet the clinicians' requirements for a vertical data mining in the electronic patient records. What emerges is the need for integration of biomedical data mining into hospital information systems for clinical, scientific, educational and economic reasons.
Mining adverse drug reactions from online healthcare forums using hidden Markov model.
Sampathkumar, Hariprasad; Chen, Xue-wen; Luo, Bo
2014-10-23
Adverse Drug Reactions are one of the leading causes of injury or death among patients undergoing medical treatments. Not all Adverse Drug Reactions are identified before a drug is made available in the market. Current post-marketing drug surveillance methods, which are based purely on voluntary spontaneous reports, are unable to provide the early indications necessary to prevent the occurrence of such injuries or fatalities. The objective of this research is to extract reports of adverse drug side-effects from messages in online healthcare forums and use them as early indicators to assist in post-marketing drug surveillance. We treat the task of extracting adverse side-effects of drugs from healthcare forum messages as a sequence labeling problem and present a Hidden Markov Model (HMM) based Text Mining system that can be used to classify a message as containing drug side-effect information and then extract the adverse side-effect mentions from it. A manually annotated dataset from http://www.medications.com is used in the training and validation of the HMM based Text Mining system. A 10-fold cross-validation on the manually annotated dataset yielded on average an F-Score of 0.76 from the HMM Classifier, in comparison to 0.575 from the Baseline classifier. Without the Plain Text Filter component as a part of the Text Processing module, the F-Score of the HMM Classifier was reduced to 0.378 on average, while absence of the HTML Filter component was found to have no impact. Reducing the Drug names dictionary size by half reduced the F-Score of the HMM Classifier to 0.359 on average, while a similar reduction to the side-effects dictionary yielded an F-Score of 0.651 on average. Adverse side-effects mined from http://www.medications.com and http://www.steadyhealth.com were found to match the Adverse Drug Reactions on the Drug Package Labels of several drugs.
In addition, some novel adverse side-effects, which can be potential Adverse Drug Reactions, were also identified. The results from the HMM based Text Miner are encouraging and motivate further enhancements to this approach. The mined novel side-effects can act as early indicators for health authorities to help focus their efforts in post-marketing drug surveillance.
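The averaged F-Scores reported in this abstract (for example 0.76 versus 0.575) can be read against the usual definition of the F-score as the harmonic mean of precision and recall. The sketch below assumes the balanced F1 variant and a plain average over folds; the abstract does not say which variant or averaging scheme was used, and the function names and per-fold counts are hypothetical.

```python
def f_score(true_positives, false_positives, false_negatives, beta=1.0):
    """F-beta score from raw counts; beta = 1 gives the balanced F1 score."""
    if true_positives == 0:
        return 0.0  # no correct positives: precision and recall are both zero
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def mean_f_score(fold_counts):
    """Average the per-fold F-scores, as in k-fold cross-validation.

    fold_counts is a list of (tp, fp, fn) tuples, one per fold.
    """
    scores = [f_score(tp, fp, fn) for tp, fp, fn in fold_counts]
    return sum(scores) / len(scores)
```

A fold with 8 true positives, 2 false positives, and 2 false negatives has precision and recall of 0.8 each, so its F1 score is also 0.8.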
Managing coal combustion residues in mines
DOE Office of Scientific and Technical Information (OSTI.GOV)
NONE
2006-07-01
Burning coal in electric utility plants produces, in addition to power, residues that contain constituents which may be harmful to the environment. The management of large volumes of coal combustion residues (CCRs) is a challenge for utilities, because they must either place the CCRs in landfills, surface impoundments, or mines, or find alternative uses for the material. This study focuses on the placement of CCRs in active and abandoned coal mines. The Committee on Mine Placement of Coal Combustion Wastes of the National Research Council believes that placement of CCRs in mines as part of the reclamation process may be a viable option for the disposal of this material as long as the placement is properly planned and carried out in a manner that avoids significant adverse environmental and health impacts. This report discusses a variety of steps that are involved in planning and managing the use of CCRs as minefills, including an integrated process of CCR characterization and site characterization, management and engineering design of placement activities, and design and implementation of monitoring to reduce the risk of contamination moving from the mine site to the ambient environment. Enforceable federal standards are needed for the disposal of CCRs in minefills to ensure that states have adequate, explicit authority and that they implement minimum safeguards. 267 refs., 6 apps.
Clinical diabetes research using data mining: a Canadian perspective.
Shah, Baiju R; Lipscombe, Lorraine L
2015-06-01
With the advent of the digitization of large amounts of information and the computer power capable of analyzing this volume of information, data mining is increasingly being applied to medical research. Datasets created for administration of the healthcare system provide a wealth of information from different healthcare sectors, and Canadian provinces' single-payer universal healthcare systems mean that data are more comprehensive and complete in this country than in many other jurisdictions. The increasing ability to also link clinical information, such as electronic medical records, laboratory test results and disease registries, has broadened the types of data available for analysis. Data-mining methods have been used in many different areas of diabetes clinical research, including classic epidemiology, effectiveness research, population health and health services research. Although methodologic challenges and privacy concerns remain important barriers to using these techniques, data mining remains a powerful tool for clinical research. Copyright © 2015 Canadian Diabetes Association. Published by Elsevier Inc. All rights reserved.
Study of Internal Dump Stability of Dudhichua Open Cast Project, Northern Coalfields Limited, India
NASA Astrophysics Data System (ADS)
Sengupta, S.; Roy, I.
2015-04-01
Dudhichua Open Cast Project is one of the prestigious projects of Northern Coalfields Limited, India, with total mineable coal reserves of approximately 400 million tonnes and a corresponding 1,700 million m3 of waste rock, i.e. overburden material. Accommodating these waste dump masses in the limited space of the de-coaled portion of the quarry is considered one of the major challenges for the mine operators. It has been reported that this mine is facing frequent slope failures of waste rock dumps, which is of great concern to the mine management in view of the unsafe working conditions. To tackle this problem, a detailed investigation was carried out to propose a stable dump profile that caters to the land economics and safety aspects of the mine. The investigation, along with a recommended optimum design for the dragline dump profile and the shovel-dumper dump profile, is presented in this paper.
Müller, H-M; Van Auken, K M; Li, Y; Sternberg, P W
2018-03-09
The biomedical literature continues to grow at a rapid pace, making the challenge of knowledge retrieval and extraction ever greater. Tools that provide a means to search and mine the full text of literature thus represent an important way by which the efficiency of these processes can be improved. We describe the next generation of the Textpresso information retrieval system, Textpresso Central (TPC). TPC builds on the strengths of the original system by expanding the full text corpus to include the PubMed Central Open Access Subset (PMC OA), as well as the WormBase C. elegans bibliography. In addition, TPC allows users to create a customized corpus by uploading and processing documents of their choosing. TPC is UIMA compliant, to facilitate compatibility with external processing modules, and takes advantage of Lucene indexing and search technology for efficient handling of millions of full text documents. Like Textpresso, TPC searches can be performed using keywords and/or categories (semantically related groups of terms), but to provide better context for interpreting and validating queries, search results may now be viewed as highlighted passages in the context of full text. To facilitate biocuration efforts, TPC also allows users to select text spans from the full text and annotate them, create customized curation forms for any data type, and send resulting annotations to external curation databases. As an example of such a curation form, we describe integration of TPC with the Noctua curation tool developed by the Gene Ontology (GO) Consortium. Textpresso Central is an online literature search and curation platform that enables biocurators and biomedical researchers to search and mine the full text of literature by integrating keyword and category searches with viewing search results in the context of the full text. 
It also allows users to create customized curation interfaces, use those interfaces to make annotations linked to supporting evidence statements, and then send those annotations to any database in the world. Textpresso Central URL: http://www.textpresso.org/tpc.
Freitas, Alex A; Limbu, Kriti; Ghafourian, Taravat
2015-01-01
Volume of distribution is an important pharmacokinetic property that indicates the extent of a drug's distribution in the body tissues. This paper addresses the problem of how to estimate the apparent volume of distribution at steady state (Vss) of chemical compounds in the human body using decision tree-based regression methods from the area of data mining (or machine learning). Hence, the pros and cons of several different types of decision tree-based regression methods have been discussed. The regression methods predict Vss using, as predictive features, both the compounds' molecular descriptors and the compounds' tissue:plasma partition coefficients (Kt:p) - often used in physiologically-based pharmacokinetics. Therefore, this work has assessed whether the data mining-based prediction of Vss can be made more accurate by using as input not only the compounds' molecular descriptors but also (a subset of) their predicted Kt:p values. Comparison of the models that used only molecular descriptors, in particular, the Bagging decision tree (mean fold error of 2.33), with those employing predicted Kt:p values in addition to the molecular descriptors, such as the Bagging decision tree using adipose Kt:p (mean fold error of 2.29), indicated that the use of predicted Kt:p values as descriptors may be beneficial for accurate prediction of Vss using decision trees if prior feature selection is applied. Decision tree based models presented in this work have an accuracy that is reasonable and similar to the accuracy of reported Vss inter-species extrapolations in the literature. The estimation of Vss for new compounds in drug discovery will benefit from methods that are able to integrate large and varied sources of data and flexible non-linear data mining methods such as decision trees, which can produce interpretable models. Graphical abstract: Decision trees for the prediction of tissue partition coefficient and volume of distribution of drugs.
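The mean fold error values quoted in this abstract (2.33 and 2.29) are easier to interpret with a formula in hand. One common definition in pharmacokinetic prediction is the geometric mean fold error, 10 raised to the mean absolute log10 ratio of predicted to observed values. The abstract does not spell out its formula, so the sketch below is an assumption about the metric, and the function name and sample values are hypothetical.

```python
import math


def mean_fold_error(predicted, observed):
    """Geometric mean fold error: 10 ** mean(|log10(pred / obs)|).

    A value of 1.0 means perfect agreement; 2.0 means predictions are,
    on average, off by a factor of two in either direction.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    log_errors = [abs(math.log10(p / o)) for p, o in zip(predicted, observed)]
    return 10 ** (sum(log_errors) / len(log_errors))
```

Under this definition, one prediction that is twice the observed value and another that is half of it give a mean fold error of 2, because over- and under-prediction are penalized symmetrically on the log scale.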
Complex Event Processing for Content-Based Text, Image, and Video Retrieval
2016-06-01
NY): Wiley-Interscience; 2000. Feldman R, Sanger J. The text mining handbook: advanced approaches in analyzing unstructured data. New York (NY...ARL-TR-7705 ● JUNE 2016 US Army Research Laboratory Complex Event Processing for Content-Based Text, Image, and Video Retrieval
Sequence to Sequence - Video to Text
2015-12-11
Saenko, and S. Guadarrama. Generating natural-language video descriptions using text - mined knowledge. In AAAI, July 2013. 2 [20] P. Kuznetsova, V...Sequence to Sequence – Video to Text Subhashini Venugopalan1 Marcus Rohrbach2,4 Jeff Donahue2 Raymond Mooney1 Trevor Darrell2 Kate Saenko3...1. Introduction Describing visual content with natural language text has recently received increased interest, especially describing images with a
Tracing ancient hydrogeological fracture network age and compartmentalisation using noble gases
NASA Astrophysics Data System (ADS)
Warr, Oliver; Sherwood Lollar, Barbara; Fellowes, Jonathan; Sutcliffe, Chelsea N.; McDermott, Jill M.; Holland, Greg; Mabry, Jennifer C.; Ballentine, Christopher J.
2018-02-01
We show that fluid volumes residing within the Precambrian crystalline basement account for ca 30% of the total groundwater inventory of the Earth (> 30 million km3). The residence times and scientific importance of this groundwater are only now receiving attention with ancient fracture fluids identified in Canada and South Africa showing: (1) microbial life which has existed in isolation for millions of years; (2) significant hydrogen and hydrocarbon production via water-rock reactions; and (3) preserving noble gas components from the early atmosphere. Noble gas (He, Ne, Ar, Kr, Xe) abundance and isotopic compositions provide the primary evidence for fluid mean residence time (MRT). Here we extend the noble gas data from the Kidd Creek Mine in Timmins Ontario Canada, a volcanogenic massive sulfide (VMS) deposit formed at 2.7 Ga, in which fracture fluids with MRTs of 1.1-1.7 Ga were identified at 2.4 km depth (Holland et al., 2013); to fracture fluids at 2.9 km depth. We compare here the Kidd Creek Mine study with noble gas compositions determined in fracture fluids taken from two mines (Mine 1 & Mine 2) at 1.7 and 1.4 km depth below surface in the Sudbury Basin formed by a meteorite impact at 1.849 Ga. The 2.9 km samples at Kidd Creek Mine show the highest radiogenic isotopic ratios observed to date in free fluids (e.g. 21Ne/22Ne = 0.6 and 40Ar/36Ar = 102,000) and have MRTs of 1.0-2.2 Ga. In contrast, resampled 2.4 km fluids indicated a less ancient MRT (0.2-0.6 Ga) compared with the previous study (1.1-1.7 Ga). This is consistent with a change in the age distribution of fluids feeding the fractures as they drain, with a decreasing proportion of the most ancient end-member fluids. 129Xe/136Xe ratios for these fluids confirm that boreholes at 2.4 km versus 2.9 km are sourced from hydrogeologically distinct systems. In contrast, results for the Sudbury mines have MRTs of 0.2-0.6 and 0.2-0.9 Ga for Mines 1 and 2 respectively. 
While still old compared to almost all groundwaters reported in the literature to date, these younger residence times compared to Kidd Creek Mine are consistent with significant fracturing created by the impact event, facilitating more hydrogeologic connection and mixing of fluids in the basin. In all samples from both Kidd Creek Mine and Sudbury, a 124-128Xe excess is identified over modern air values. This is attributed to an early atmospheric xenon component, previously identified at Kidd Creek Mine but which has to date not been observed in fluids with a residence time as recent as 0.2-0.6 Ga. The temporal and spatial sampling at Kidd Creek Mine is also used to verify our proposed conceptual model which provides key constraints regarding distribution, volumes and residence times of fracture fluids on the smaller, regional, scale.