Sagace: A web-based search engine for biomedical databases in Japan
2012-01-01
Background In the big data era, biomedical research continues to generate a large amount of data, and the generated information is often stored in a database and made publicly available. Although combining data from multiple databases should accelerate further studies, the number of life sciences databases is now so large that it is difficult to grasp the features and contents of each one. Findings We have developed Sagace, a web-based search engine that enables users to retrieve information from a range of biological databases (such as gene expression profiles and proteomics data) and biological resource banks (such as mouse models of disease and cell lines). With Sagace, users can search more than 300 databases in Japan. Sagace offers features tailored to biomedical research, including manually tuned ranking, faceted navigation to refine search results, and rich snippets constructed from the metadata retrieved for each database entry. Conclusions Sagace will be valuable for experts involved in biomedical research and drug development in both academia and industry. Sagace is freely available at http://sagace.nibio.go.jp/en/. PMID:23110816
The use and misuse of biomedical data: is bigger really better?
Hoffman, Sharona; Podgurski, Andy
2013-01-01
Very large biomedical research databases, containing electronic health records (EHR) and genomic data from millions of patients, have been heralded recently for their potential to accelerate scientific discovery and produce dramatic improvements in medical treatments. Research enabled by these databases may also lead to profound changes in law, regulation, social policy, and even litigation strategies. Yet, is "big data" necessarily better data? This paper makes an original contribution to the legal literature by focusing on what can go wrong in the process of biomedical database research and what precautions are necessary to avoid critical mistakes. We address three main reasons for approaching such research with care and being cautious in relying on its outcomes for purposes of public policy or litigation. First, the data contained in biomedical databases is surprisingly likely to be incorrect or incomplete. Second, systematic biases, arising from both the nature of the data and the preconceptions of investigators, are serious threats to the validity of research results, especially in answering causal questions. Third, data mining of biomedical databases makes it easier for individuals with political, social, or economic agendas to generate ostensibly scientific but misleading research findings for the purpose of manipulating public opinion and swaying policymakers. In short, this paper sheds much-needed light on the problems of credulous and uninformed acceptance of research results derived from biomedical databases. An understanding of the pitfalls of big data analysis is of critical importance to anyone who will rely on or dispute its outcomes, including lawyers, policymakers, and the public at large. The Article also recommends technical, methodological, and educational interventions to combat the dangers of database errors and abuses.
Dankar, Fida K; Ptitsyn, Andrey; Dankar, Samar K
2018-04-10
Contemporary biomedical databases include a wide range of information types from various observational and instrumental sources. Among the most important features that unite biomedical databases across the field are high volume of information and high potential to cause damage through data corruption, loss of performance, and loss of patient privacy. Thus, issues of data governance and privacy protection are essential for the construction of data depositories for biomedical research and healthcare. In this paper, we discuss various challenges of data governance in the context of population genome projects. The various challenges along with best practices and current research efforts are discussed through the steps of data collection, storage, sharing, analysis, and knowledge dissemination.
2013-01-01
Background A large-scale, highly accurate, machine-understandable drug-disease treatment relationship knowledge base is important for computational approaches to drug repurposing. The large body of published biomedical research articles and clinical case reports available on MEDLINE is a rich source of FDA-approved drug-disease indication as well as drug-repurposing knowledge that is crucial for applying FDA-approved drugs to new diseases. However, much of this information is buried in free text and not captured in any existing databases. The goal of this study is to extract a large number of accurate drug-disease treatment pairs from published literature. Results In this study, we developed a simple but highly accurate pattern-learning approach to extract treatment-specific drug-disease pairs from 20 million biomedical abstracts available on MEDLINE. We extracted a total of 34,305 unique drug-disease treatment pairs, the majority of which are not included in existing structured databases. Our algorithm achieved a precision of 0.904 and a recall of 0.131 in extracting all pairs, and a precision of 0.904 and a recall of 0.842 in extracting frequent pairs. In addition, we have shown that the extracted pairs strongly correlate with both drug target genes and therapeutic classes, and therefore may have high potential in drug discovery. Conclusions We demonstrated that our simple pattern-learning relationship extraction algorithm is able to accurately extract many drug-disease pairs from the free text of biomedical literature that are not captured in structured databases. The large-scale, accurate, machine-understandable drug-disease treatment knowledge base resulting from our study, in combination with pairs from structured databases, will have high potential in computational drug repurposing tasks. PMID:23742147
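The flavor of this kind of pattern-based extraction can be illustrated with a single hand-written pattern. The regex and the example sentence below are hypothetical; the paper's actual learned patterns are not reproduced here:

```python
import re

# One illustrative pattern of the kind such systems learn, e.g.
# "<drug> in the treatment of <disease>" (pattern and names are made up).
PATTERN = re.compile(r"(\w[\w-]*) (?:in|for) the treatment of ([\w -]+?)(?:[.,;]|$)")

def extract_pairs(text):
    """Return (drug, disease) pairs matched by the pattern, lowercased."""
    return [(d.lower(), s.strip().lower()) for d, s in PATTERN.findall(text)]

pairs = extract_pairs("We evaluated methotrexate in the treatment of rheumatoid arthritis.")
```

A real system would learn many such patterns from seed pairs and score them by precision, rather than relying on one fixed expression.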
NCBI2RDF: Enabling Full RDF-Based Access to NCBI Databases
Anguita, Alberto; García-Remesal, Miguel; de la Iglesia, Diana; Maojo, Victor
2013-01-01
RDF has become the standard technology for enabling interoperability among heterogeneous biomedical databases. The NCBI provides access to a large set of life sciences databases through a common interface called Entrez. However, the latter does not provide RDF-based access to such databases, and, therefore, they cannot be integrated with other RDF-compliant databases and accessed via SPARQL query interfaces. This paper presents the NCBI2RDF system, aimed at providing RDF-based access to the complete NCBI data repository. This API creates a virtual endpoint for servicing SPARQL queries over different NCBI repositories and presenting to users the query results in SPARQL results format, thus enabling this data to be integrated and/or stored with other RDF-compliant repositories. SPARQL queries are dynamically resolved, decomposed, and forwarded to the NCBI-provided E-utilities programmatic interface to access the NCBI data. Furthermore, we show how our approach increases the expressiveness of the native NCBI querying system, allowing several databases to be accessed simultaneously. This feature significantly boosts productivity when working with complex queries and saves time and effort to biomedical researchers. Our approach has been validated with a large number of SPARQL queries, thus proving its reliability and enhanced capabilities in biomedical environments. PMID:23984425
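The decomposition step described above, turning a structured query into calls against the NCBI E-utilities, can be sketched as follows. Only the documented `db`, `term`, `retmax` and `retmode` parameters of `esearch.fcgi` are used; the helper itself is illustrative, not the NCBI2RDF implementation:

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an Entrez E-utilities esearch URL for one NCBI database.

    A SPARQL bridge in the spirit of NCBI2RDF would decompose a query into
    several such calls (one per repository) and merge the results.
    """
    params = urlencode({"db": db, "term": term, "retmax": retmax, "retmode": "json"})
    return f"{EUTILS_BASE}/esearch.fcgi?{params}"

url = esearch_url("pubmed", "BRCA1 AND breast cancer")
```

Fetching the URL returns a JSON envelope with the matching record IDs, which a bridge would then map back into RDF terms.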
Jiang, Xiangying; Ringwald, Martin; Blake, Judith; Shatkay, Hagit
2017-01-01
The Gene Expression Database (GXD) is a comprehensive online database within the Mouse Genome Informatics resource, aiming to provide all available information about endogenous gene expression during mouse development. The information stems primarily from many thousands of biomedical publications that database curators must read and curate. Given the very large number of biomedical papers published each year, automatic document classification plays an important role in biomedical research. Specifically, an effective and efficient document classifier is needed to support the GXD annotation workflow. We present here an effective yet relatively simple classification scheme, which uses readily available tools while employing feature selection, aiming to assist curators in identifying publications relevant to GXD. We examine the performance of our method over a large manually curated dataset consisting of more than 25,000 PubMed abstracts, of which about half are curated as relevant to GXD and the other half as irrelevant. In addition to text from title-and-abstract, we also consider image captions, an important information source that we integrate into our method. We apply a captions-based classifier to a subset of about 3,300 documents for which the full text of the curated articles is available. The results demonstrate that our proposed approach is robust and effectively addresses the GXD document classification task. Moreover, using information obtained from image captions clearly improves performance compared to title and abstract alone, affirming the utility of image captions as a substantial evidence source for automatically determining the relevance of biomedical publications to a specific subject area. www.informatics.jax.org.
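A minimal sketch of such a relevance classifier is a bag-of-words Naive Bayes with add-one smoothing. The training snippets below are made up for illustration; the actual GXD pipeline adds feature selection and caption-derived features:

```python
import math
from collections import Counter

def train(docs):
    """docs: iterable of (text, is_relevant) pairs; returns per-class word counts."""
    counts = {True: Counter(), False: Counter()}
    for text, label in docs:
        counts[label].update(text.lower().split())
    return counts

def log_score(counts, text, label):
    # Add-one smoothed log-likelihood of the text under one class.
    vocab = set(counts[True]) | set(counts[False])
    total = sum(counts[label].values()) + len(vocab)
    return sum(math.log((counts[label][w] + 1) / total) for w in text.lower().split())

def is_relevant(counts, text):
    return log_score(counts, text, True) >= log_score(counts, text, False)

# Made-up one-line "abstracts" standing in for curated training documents.
model = train([
    ("gene expression in mouse embryo", True),
    ("protein folding dynamics", False),
])
```

Feature selection of the kind the paper describes would simply restrict the vocabulary to the most discriminative words before training.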
Biomedical databases: protecting privacy and promoting research.
Wylie, Jean E; Mineau, Geraldine P
2003-03-01
When combined with medical information, large electronic databases of information that identify individuals provide superlative resources for genetic, epidemiology and other biomedical research. Such research resources increasingly need to balance the protection of privacy and confidentiality with the promotion of research. Models that do not allow the use of such individual-identifying information constrain research; models that involve commercial interests raise concerns about what type of access is acceptable. Researchers, individuals representing the public interest and those developing regulatory guidelines must be involved in an ongoing dialogue to identify practical models.
Wei, Wei; Ji, Zhanglong; He, Yupeng; Zhang, Kai; Ha, Yuanchi; Li, Qi; Ohno-Machado, Lucila
2018-01-01
The number and diversity of biomedical datasets have grown rapidly in the last decade. A large number of datasets are stored in various repositories, in different formats. Existing dataset retrieval systems lack the capability of cross-repository search. As a result, users spend time searching for datasets in known repositories and typically do not discover new ones. The biomedical and healthcare data discovery index ecosystem (bioCADDIE) team organized a challenge to solicit new indexing and searching strategies for retrieving biomedical datasets across repositories. We describe the work of one team that built a retrieval pipeline and examined its performance. The pipeline used online resources to supplement dataset metadata, automatically generated queries from users’ free-text questions, produced high-quality retrieval results, and achieved the highest inferred Normalized Discounted Cumulative Gain among competitors. The results showed that it is a promising solution for cross-database, cross-domain and cross-repository biomedical dataset retrieval. Database URL: https://github.com/w2wei/dataset_retrieval_pipeline PMID:29688374
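Normalized Discounted Cumulative Gain, the metric used to rank the challenge entries, is straightforward to compute for one ranked list. The "inferred" variant used by bioCADDIE additionally corrects for incomplete relevance judgments, which this sketch omits:

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: relevance at rank i discounted by log2(i + 2).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances):
    """NDCG for one ranked list of graded relevance judgments."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0

quality = ndcg([2, 0, 1])  # hypothetical graded judgments for the top-3 results
```

A perfectly ordered list scores 1.0; swapping a relevant result below an irrelevant one lowers the score, with mistakes near the top penalized most.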
Liljekvist, Mads Svane; Andresen, Kristoffer; Pommergaard, Hans-Christian; Rosenberg, Jacob
2015-01-01
Background. Open access (OA) journals allow access to research papers free of charge to the reader. Traditionally, biomedical researchers use databases like MEDLINE and EMBASE to discover new advances. However, biomedical OA journals might not fulfill such databases' criteria, hindering dissemination. The Directory of Open Access Journals (DOAJ) is a database exclusively listing OA journals. The aim of this study was to investigate DOAJ's coverage of biomedical OA journals compared with the conventional biomedical databases. Methods. Information on all journals listed in four conventional biomedical databases (MEDLINE, PubMed Central, EMBASE and SCOPUS) and DOAJ was gathered. Journals were included if they were (1) actively publishing, (2) full OA, (3) prospectively indexed in one or more databases, and (4) of biomedical subject. Impact factor and journal language were also collected. DOAJ was compared with the conventional databases regarding the proportion of journals covered, along with their impact factor and publishing language. The proportion of journals with articles indexed by DOAJ was determined. Results. In total, 3,236 biomedical OA journals were included in the study. Of the included journals, 86.7% were listed in DOAJ. Combined, the conventional biomedical databases listed 75.0% of the journals: 18.7% in MEDLINE, 36.5% in PubMed Central, 51.5% in SCOPUS and 50.6% in EMBASE. Of the journals in DOAJ, 88.7% published in English and 20.6% had received an impact factor for 2012, compared with 93.5% and 26.0%, respectively, for journals in the conventional biomedical databases. A subset of 51.1% and 48.5% of the journals in DOAJ had articles indexed from 2012 and 2013, respectively. Of journals exclusively listed in DOAJ, one journal had received an impact factor for 2012, and 59.6% of the journals had no content from 2013 indexed in DOAJ. Conclusions. DOAJ is the most complete registry of biomedical OA journals compared with the four conventional biomedical databases. However, DOAJ only indexes articles for half of the biomedical journals listed, making it an incomplete source for biomedical research papers in general. PMID:26038727
The BioMart community portal: an innovative alternative to large, centralized data repositories
USDA-ARS?s Scientific Manuscript database
The BioMart Community Portal (www.biomart.org) is a community-driven effort to provide a unified interface to biomedical databases that are distributed worldwide. The portal provides access to numerous database projects supported by 30 scientific organizations. It includes over 800 different biologi...
Vlassov, Vasiliy V; Danishevskiy, Kirill D
2008-01-01
In the 20th century, Russian biomedical science declined from the flourishing of its early years to a dire state. Through the first decades of the USSR, it was transformed to suit the ideological requirements of a totalitarian state and the biased directives of communist leaders. Later, depressing economic conditions and isolation from the international research community further impeded its development. Contemporary Russia has inherited a system of medical education quite different from that of the West, as well as counterproductive regulations for the allocation of research funding. The methodology of medical and epidemiological research in Russia is largely outdated. Epidemiology continues to focus on infectious disease, and the results of the best studies tend to be published in international periodicals. MEDLINE continues to be the best database for searching Russian biomedical publications, despite only a small proportion being indexed. The database of the Moscow Central Medical Library is the largest national database of medical periodicals, but it does not provide abstracts or full subject heading codes, and it does not cover even the entire collection of the Library. New databases and catalogs (e.g. Panteleimon) that have appeared recently are incomplete and do not enable effective searching. PMID:18826569
DeTEXT: A Database for Evaluating Text Extraction from Biomedical Literature Figures
Yin, Xu-Cheng; Yang, Chun; Pei, Wei-Yi; Man, Haixia; Zhang, Jun; Learned-Miller, Erik; Yu, Hong
2015-01-01
Hundreds of millions of figures are available in biomedical literature, representing important biomedical experimental evidence. Since text is a rich source of information in figures, automatically extracting such text may assist in the task of mining figure information. A high-quality ground truth standard can greatly facilitate the development of an automated system. This article describes DeTEXT: A database for evaluating text extraction from biomedical literature figures. It is the first publicly available, human-annotated, high quality, and large-scale figure-text dataset with 288 full-text articles, 500 biomedical figures, and 9308 text regions. This article describes how figures were selected from open-access full-text biomedical articles and how annotation guidelines and annotation tools were developed. We also discuss the inter-annotator agreement and the reliability of the annotations. We summarize the statistics of the DeTEXT data and make available evaluation protocols for DeTEXT. Finally we lay out challenges we observed in the automated detection and recognition of figure text and discuss research directions in this area. DeTEXT is publicly available for downloading at http://prir.ustb.edu.cn/DeTEXT/. PMID:25951377
NASA Technical Reports Server (NTRS)
Grams, R. R.
1982-01-01
A system designed to access a large range of available medical textbook information in an online interactive fashion is described. A high level query type database manager, INQUIRE, is used. Operating instructions, system flow diagrams, database descriptions, text generation, and error messages are discussed. User information is provided.
DBMap: a TreeMap-based framework for data navigation and visualization of brain research registry
NASA Astrophysics Data System (ADS)
Zhang, Ming; Zhang, Hong; Tjandra, Donny; Wong, Stephen T. C.
2003-05-01
The purpose of this study is to investigate and apply a new, intuitive and space-conscious visualization framework to facilitate efficient data presentation and exploration of large-scale data warehouses. We have implemented the DBMap framework for the UCSF Brain Research Registry. Such a utility helps medical specialists and clinical researchers better explore and evaluate the many attributes organized in the brain research registry. The current UCSF Brain Research Registry consists of a federation of disease-oriented database modules, including Epilepsy, Brain Tumor, Intracerebral Hemorrhage, and CJD (Creutzfeldt-Jakob disease). These database modules organize large volumes of imaging and non-imaging data to support Web-based clinical research. While the data warehouse supports general information retrieval and analysis, it lacks an effective way to visualize and present the voluminous and complex data stored. This study investigates whether the TreeMap algorithm can be adapted to display and navigate a categorical biomedical data warehouse or registry. TreeMap is a space-constrained graphical representation of large hierarchical data sets, mapped to a matrix of rectangles whose size and color represent database fields of interest. It allows the display of a large amount of numerical and categorical information in the limited real estate of a computer screen with an intuitive user interface. The paper describes DBMap, the proposed data visualization framework for large biomedical databases. Built upon XML, Java and JDBC technologies, the prototype system includes a set of software modules that reside in the application server tier and interface with the back-end database tier and the front-end Web tier of the brain registry.
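The core idea of a TreeMap, splitting a rectangle into sub-rectangles whose areas are proportional to data values, can be sketched for one level with the simple slice-and-dice layout (production treemaps typically use the more involved squarified layout):

```python
def slice_layout(weights, x, y, w, h, vertical=True):
    """Split rectangle (x, y, w, h) into strips with areas proportional to weights."""
    total = float(sum(weights))
    rects, offset = [], 0.0
    for wt in weights:
        frac = wt / total
        if vertical:  # side-by-side vertical strips
            rects.append((x + offset, y, w * frac, h))
            offset += w * frac
        else:  # stacked horizontal strips
            rects.append((x, y + offset, w, h * frac))
            offset += h * frac
    return rects
```

Nesting comes from recursing into each rectangle with the child weights, alternating the split direction per level; color can then encode a second database field, as in DBMap.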
Relational Databases and Biomedical Big Data.
de Silva, N H Nisansa D
2017-01-01
In various biomedical applications that collect, handle, and manipulate data, the amounts of data tend to build up into the range identified as big data. In such cases, a design decision must be made as to what type of database will be used to handle the data. According to past research, the default and classical solution in the biomedical domain is, more often than not, the relational database. While this was the norm for a long time, there is an evident trend away from relational databases in favor of other database types and paradigms. However, it remains of paramount importance to understand the interrelation between biomedical big data and relational databases. This chapter reviews the pros and cons of using relational databases to store biomedical big data, as discussed and applied in previous research.
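At small scale, a relational store of biomedical measurements is just a few normalized tables. The schema below is an illustrative sketch (not taken from the chapter), using SQLite for self-containment:

```python
import sqlite3

# In-memory database with a toy two-table schema: patients and their measurements.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (id INTEGER PRIMARY KEY, pseudonym TEXT);
    CREATE TABLE measurement (
        patient_id INTEGER REFERENCES patient(id),
        quantity   TEXT,
        value      REAL
    );
""")
conn.execute("INSERT INTO patient VALUES (1, 'anon-001')")
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?)",
    [(1, "glucose", 5.4), (1, "glucose", 6.1)],
)
avg_glucose = conn.execute(
    "SELECT AVG(value) FROM measurement WHERE patient_id = 1 AND quantity = 'glucose'"
).fetchone()[0]
```

The strengths and limits the chapter weighs follow from exactly this shape: joins and aggregates are declarative and consistent, but schema rigidity and single-node scaling become the pain points as volumes grow.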
An Interactive Iterative Method for Electronic Searching of Large Literature Databases
ERIC Educational Resources Information Center
Hernandez, Marco A.
2013-01-01
PubMed® is an on-line literature database hosted by the U.S. National Library of Medicine. Containing over 21 million citations for biomedical literature--both abstracts and full text--in the areas of the life sciences, behavioral studies, chemistry, and bioengineering, PubMed® represents an important tool for researchers. PubMed® searches return…
Olelo: a web application for intuitive exploration of biomedical literature
Niedermeier, Julian; Jankrift, Marcel; Tietböhl, Sören; Stachewicz, Toni; Folkerts, Hendrik; Uflacker, Matthias; Neves, Mariana
2017-01-01
Researchers usually query the large biomedical literature in PubMed via keywords, logical operators and filters, none of which is very intuitive. Question answering systems are an alternative to keyword searches. They allow questions in natural language as input, and results reflect the given type of question, such as short answers and summaries. Few such systems are available online, and they suffer from long response times and support only a limited number of question and result types. Additionally, user interfaces are usually restricted to only displaying the retrieved information. For our Olelo web application, we combined biomedical literature and terminologies in a fast in-memory database to enable real-time responses to researchers’ queries. Further, we extended the built-in natural language processing features of the database with question answering and summarization procedures. Combined with a new explorative approach to document filtering and a clean user interface, Olelo enables a fast and intelligent search through the ever-growing biomedical literature. Olelo is available at http://www.hpi.de/plattner/olelo. PMID:28472397
KaBOB: ontology-based semantic integration of biomedical databases.
Livingston, Kevin M; Bada, Michael; Baumgartner, William A; Hunter, Lawrence E
2015-04-23
The ability to query many independent biological databases using a common ontology-based semantic model would facilitate deeper integration and more effective utilization of these diverse and rapidly growing resources. Despite ongoing work toward shared data formats and linked identifiers, significant problems persist in establishing shared identity and shared meaning across heterogeneous biomedical data sources. We present five processes for semantic data integration that, when applied collectively, solve seven key problems. These processes include making explicit the differences between biomedical concepts and database records, aggregating sets of identifiers denoting the same biomedical concepts across data sources, and using declaratively represented forward-chaining rules to take information that is variably represented in source databases and integrate it into a consistent biomedical representation. We demonstrate these processes and solutions by presenting KaBOB (the Knowledge Base Of Biomedicine), a knowledge base of semantically integrated data from 18 prominent biomedical databases using common representations grounded in Open Biomedical Ontologies. An instance of KaBOB with data about humans and seven major model organisms can be built using on the order of 500 million RDF triples. All source code for building KaBOB is available under an open-source license. KaBOB is an integrated knowledge base of biomedical data representationally based in prominent, actively maintained Open Biomedical Ontologies, thus enabling queries of the underlying data in terms of biomedical concepts (e.g., genes and gene products, interactions and processes) rather than features of source-specific data schemas or file formats.
KaBOB resolves many of the issues that routinely plague biomedical researchers intending to work with data from multiple data sources and provides a platform for ongoing data integration and development and for formal reasoning over a wealth of integrated biomedical data.
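One of the integration steps described above, aggregating identifiers that denote the same biomedical concept, can be sketched as a union-find over RDF-style sameAs links. The identifiers and the "sameAs" predicate below are illustrative, not KaBOB's actual vocabulary:

```python
def canonicalize(triples):
    """Map every identifier seen in sameAs links to one canonical representative."""
    parent = {}

    def find(x):
        # Walk parent pointers to the root representative of x's group.
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == "sameAs":
            parent[find(s)] = find(o)  # merge the two groups
    return {x: find(x) for x in parent}

# Hypothetical cross-references for the TP53 gene/protein.
links = [
    ("uniprot:P04637", "sameAs", "ncbigene:7157"),
    ("ncbigene:7157", "sameAs", "hgnc:11998"),
]
canon = canonicalize(links)
```

Facts from different source databases can then be indexed by the canonical identifier, so a query about one concept sees evidence attached to any of its aliases.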
The BioMedical Evidence Graph (BMEG) | Informatics Technology for Cancer Research (ITCR)
The BMEG is a cancer data integration platform that applies methods collected from DREAM challenges to large datasets, such as TCGA, and makes them available for analysis using a high-performance graph database.
Should we search Chinese biomedical databases when performing systematic reviews?
Cohen, Jérémie F; Korevaar, Daniël A; Wang, Junfeng; Spijker, René; Bossuyt, Patrick M
2015-03-06
Chinese biomedical databases contain a large number of publications available to systematic reviewers, but it is unclear whether they are used for synthesizing the available evidence. We report a case of two systematic reviews on the accuracy of anti-cyclic citrullinated peptide for diagnosing rheumatoid arthritis. In one of these, the authors did not search Chinese databases; in the other, they did. We additionally assessed the extent to which Cochrane reviewers have searched Chinese databases in a systematic overview of the Cochrane Library (inception to 2014). The two diagnostic reviews included a total of 269 unique studies, but only 4 studies were included in both reviews. The first review included five studies published in the Chinese language (out of 151) while the second included 114 (out of 118). The summary accuracy estimates from the two reviews were comparable. Only 243 of the published 8,680 Cochrane reviews (less than 3%) searched one or more of the five major Chinese databases. These Chinese databases index about 2,500 journals, of which less than 6% are also indexed in MEDLINE. All 243 Cochrane reviews evaluated an intervention, 179 (74%) had at least one author with a Chinese affiliation; 118 (49%) addressed a topic in complementary or alternative medicine. Although searching Chinese databases may lead to the identification of a large amount of additional clinical evidence, Cochrane reviewers have rarely included them in their search strategy. We encourage future initiatives to evaluate more systematically the relevance of searching Chinese databases, as well as collaborative efforts to allow better incorporation of Chinese resources in systematic reviews.
Chembank | Office of Cancer Genomics
Funded in large part by the Initiative for Chemical Genetics (ICG), Chembank is an interactive database for small molecules. It contains data from hundreds of biomedically relevant small-molecule screens involving hundreds of thousands of compounds. Chembank also provides analysis tools to facilitate data mining.
Mehryary, Farrokh; Kaewphan, Suwisa; Hakala, Kai; Ginter, Filip
2016-01-01
Biomedical event extraction is one of the key tasks in biomedical text mining, supporting various applications such as database curation and hypothesis generation. Several systems, some of which have been applied at a large scale, have been introduced to solve this task. Past studies have shown that the identification of the phrases describing biological processes, also known as trigger detection, is a crucial part of event extraction, and notable overall performance gains can be obtained by solely focusing on this sub-task. In this paper we propose a novel approach for filtering falsely identified triggers from large-scale event databases, thus improving the quality of knowledge extraction. Our method relies on state-of-the-art word embeddings, event statistics gathered from the whole biomedical literature, and both supervised and unsupervised machine learning techniques. We focus on EVEX, an event database covering the whole PubMed and PubMed Central Open Access literature containing more than 40 million extracted events. The most frequent EVEX trigger words are hierarchically clustered, and the resulting cluster tree is pruned to identify words that can never act as triggers regardless of their context. For rarely occurring trigger words we introduce a supervised approach trained on the combination of trigger word classification produced by the unsupervised clustering method and manual annotation. The method is evaluated on the official test set of the BioNLP Shared Task on Event Extraction. The evaluation shows that the method can be used to improve the performance of the state-of-the-art event extraction systems. This successful effort also translates into removing 1,338,075 potentially incorrect events from EVEX, thus greatly improving the quality of the data. The method is not solely bound to the EVEX resource and can thus be used to improve the quality of any event extraction system or database.
The data and source code for this work are available at: http://bionlp-www.utu.fi/trigger-clustering/.
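The cluster-then-prune idea above can be illustrated with a small sketch. This is not the EVEX pipeline itself: the words, toy two-dimensional "embeddings", similarity threshold, and never-trigger list below are all illustrative assumptions, and a flat single-linkage grouping stands in for pruning a real hierarchical cluster tree built from embeddings trained on the full biomedical literature.

```python
import numpy as np

# Toy sketch of trigger pruning: group trigger-word embeddings by similarity,
# then discard whole groups in which no member can ever act as an event trigger.
words = ["activates", "phosphorylates", "induces", "table", "figure", "panel"]
vectors = np.array([
    [0.9, 0.1], [0.85, 0.15], [0.8, 0.2],   # event-like verbs
    [0.1, 0.9], [0.15, 0.85], [0.2, 0.8],   # layout words, never triggers
])

# Cosine similarity matrix, then single-linkage merging at a fixed threshold
# (a flat stand-in for cutting and pruning a hierarchical cluster tree).
norm = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
sim = norm @ norm.T
labels = list(range(len(words)))
for i in range(len(words)):
    for j in range(len(words)):
        if sim[i, j] > 0.95:                 # merge highly similar words
            old, new = labels[j], labels[i]
            labels = [new if lab == old else lab for lab in labels]

never_triggers = {"table", "figure", "panel"}   # illustrative blacklist
pruned = {w for w, lab in zip(words, labels)
          if all(x in never_triggers
                 for x, l2 in zip(words, labels) if l2 == lab)}
print(sorted(pruned))  # ['figure', 'panel', 'table']
```

Pruning whole clusters rather than individual words is what makes the approach scale: one manual judgment about a cluster removes many candidate triggers at once.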
Erick Peirson, B R; Kropp, Heather; Damerow, Julia; Laubichler, Manfred D
2017-05-01
Contrary to concerns of some critics, we present evidence that biomedical research is not dominated by a small handful of model organisms. An exhaustive analysis of research literature suggests that the diversity of experimental organisms in biomedical research has increased substantially since 1975. There has been a longstanding worry that organism-centric funding policies can lead to biases in experimental organism choice, and thus negatively impact the direction of research and the interpretation of results. Critics have argued that a focus on model organisms has unduly constrained the diversity of experimental organisms. The availability of large electronic databases of scientific literature, combined with interest in quantitative methods among philosophers of science, presents new opportunities for data-driven investigations into organism choice in biomedical research. The diversity of organisms used in NIH-funded research may be considerably lower than in the broader biomedical sciences, and may be subject to greater constraints on organism choice. © 2017 WILEY Periodicals, Inc.
Amelogenin test: From forensics to quality control in clinical and biochemical genomics.
Francès, F; Portolés, O; González, J I; Coltell, O; Verdú, F; Castelló, A; Corella, D
2007-01-01
The increasing number of samples in biomedical genetic studies, and of centers participating in them, entails a growing risk of mistakes at the different sample-handling stages. We have evaluated the usefulness of the amelogenin test for quality control in sample identification. The amelogenin test (frequently used in forensics) was performed on 1224 individuals participating in a biomedical study, and concordance between the sex recorded in the database and the amelogenin result was estimated. Additional genetic systems for detecting sex errors were also developed. The overall concordance rate was 99.84% (1222/1224). Two samples showed a female amelogenin result despite being coded as males in the database. The first, after checking sex-specific biochemical and clinical profile data, was found to be due to a codification error in the database. In the second, no apparent error was discovered on checking the database, because a correct male profile was found. False negatives in amelogenin male sex determination were ruled out by additional tests, and female sex was confirmed; a sample-labeling error was revealed after a new DNA extraction. The amelogenin test is a useful quality-control tool for detecting sex-identification errors in large genomic studies and can help increase their validity.
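The cross-check described above reduces to comparing two sex codes per sample and flagging disagreements for review. A minimal sketch, with entirely hypothetical sample IDs and records:

```python
# Flag samples whose database-coded sex disagrees with the amelogenin result.
def concordance_check(records):
    """records: list of (sample_id, coded_sex, amelogenin_sex) tuples."""
    mismatches = [sid for sid, coded, amel in records if coded != amel]
    rate = 1 - len(mismatches) / len(records)
    return mismatches, rate

records = [("S001", "M", "M"), ("S002", "F", "F"),
           ("S003", "M", "F"),  # coded male, tests female -> needs review
           ("S004", "F", "F")]
mismatches, rate = concordance_check(records)
print(mismatches, round(rate, 4))  # ['S003'] 0.75
```

In the study, each flagged sample then triggered a manual review of sex-specific clinical data and, if needed, re-extraction of DNA, since a mismatch can mean either a database coding error or a mislabeled sample.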
Search and Graph Database Technologies for Biomedical Semantic Indexing: Experimental Analysis.
Segura Bedmar, Isabel; Martínez, Paloma; Carruana Martín, Adrián
2017-12-01
Biomedical semantic indexing is a very useful support tool for human curators in their efforts for indexing and cataloging the biomedical literature. The aim of this study was to describe a system to automatically assign Medical Subject Headings (MeSH) to biomedical articles from MEDLINE. Our approach relies on the assumption that similar documents should be classified by similar MeSH terms. Although previous work has already exploited the document similarity by using a k-nearest neighbors algorithm, we represent documents as document vectors by search engine indexing and then compute the similarity between documents using cosine similarity. Once the most similar documents for a given input document are retrieved, we rank their MeSH terms to choose the most suitable set for the input document. To do this, we define a scoring function that takes into account the frequency of the term into the set of retrieved documents and the similarity between the input document and each retrieved document. In addition, we implement guidelines proposed by human curators to annotate MEDLINE articles; in particular, the heuristic that says if 3 MeSH terms are proposed to classify an article and they share the same ancestor, they should be replaced by this ancestor. The representation of the MeSH thesaurus as a graph database allows us to employ graph search algorithms to quickly and easily capture hierarchical relationships such as the lowest common ancestor between terms. Our experiments show promising results with an F1 of 69% on the test dataset. To the best of our knowledge, this is the first work that combines search and graph database technologies for the task of biomedical semantic indexing. Due to its horizontal scalability, ElasticSearch becomes a real solution to index large collections of documents (such as the bibliographic database MEDLINE). 
Moreover, the use of graph search algorithms for accessing MeSH information could provide a support tool for cataloging MEDLINE abstracts in real time. ©Isabel Segura Bedmar, Paloma Martínez, Adrián Carruana Martín. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 01.12.2017.
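The scoring function described above can be sketched in a few lines: retrieve the documents most similar to the input, then score each candidate MeSH term by accumulating the cosine similarity of the retrieved documents that carry it, so that both term frequency and document similarity contribute. The vectors and term assignments below are toy data, and the graph-based ancestor-replacement heuristic is not shown.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two document vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

input_doc = np.array([1.0, 0.0, 1.0])
# Retrieved neighbors: (document vector, MeSH terms assigned by curators).
retrieved = {
    "doc1": (np.array([1.0, 0.1, 0.9]), {"Neoplasms", "Genomics"}),
    "doc2": (np.array([0.9, 0.0, 1.0]), {"Neoplasms"}),
    "doc3": (np.array([0.1, 1.0, 0.0]), {"Proteomics"}),
}

# Score each term by summing similarities of the documents that carry it.
scores = {}
for vec, terms in retrieved.values():
    sim = cosine(input_doc, vec)
    for t in terms:
        scores[t] = scores.get(t, 0.0) + sim

ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[0])  # Neoplasms
```

A term carried by two highly similar neighbors ("Neoplasms") outranks one carried by a single dissimilar neighbor ("Proteomics"), which is exactly the behavior the k-nearest-neighbor assumption relies on.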
TU-F-BRD-01: Biomedical Informatics for Medical Physicists
DOE Office of Scientific and Technical Information (OSTI.GOV)
Phillips, M; Kalet, I; McNutt, T
Biomedical informatics encompasses a very large domain of knowledge and applications. This broad and loosely defined field can make it difficult to navigate. Physicists often are called upon to provide informatics services and/or to take part in projects involving principles of the field. The purpose of the presentations in this symposium is to help medical physicists gain some knowledge about the breadth of the field and how, in the current clinical and research environment, they can participate and contribute. Three talks have been designed to give an overview from the perspective of physicists and to provide a more in-depth discussion in two areas. One of the primary purposes, and the main subject of the first talk, is to help physicists achieve a perspective about the range of the topics and concepts that fall under the heading of 'informatics'. The approach is to de-mystify topics and jargon and to help physicists find resources in the field should they need them. The other talks explore two areas of biomedical informatics in more depth. The goal is to highlight two domains of intense current interest--databases and models--in enough depth into current approaches so that an adequate background for independent inquiry is achieved. These two areas will serve as good examples of how physicists, using informatics principles, can contribute to oncology practice and research. Learning Objectives: To understand how the principles of biomedical informatics are used by medical physicists. To put the relevant informatics concepts in perspective with regard to biomedicine in general. To use clinical database design as an example of biomedical informatics. To provide a solid background into the problems and issues of the design and use of data and databases in radiation oncology. To use modeling in the service of decision support systems as an example of modeling methods and data use.
To provide a background into how uncertainty in our data and knowledge can be incorporated into modeling methods.
Ensemble Deep Learning for Biomedical Time Series Classification
2016-01-01
Ensemble learning has been proved to improve the generalization ability effectively in both theory and practice. In this paper, we briefly outline the current status of research on it first. Then, a new deep neural network-based ensemble method that integrates filtering views, local views, distorted views, explicit training, implicit training, subview prediction, and Simple Average is proposed for biomedical time series classification. Finally, we validate its effectiveness on the Chinese Cardiovascular Disease Database containing a large number of electrocardiogram recordings. The experimental results show that the proposed method has certain advantages compared to some well-known ensemble methods, such as Bagging and AdaBoost. PMID:27725828
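The Simple Average combination step named above is the simplest of the ensemble rules: each base network emits class probabilities for a signal segment, and the ensemble prediction is the argmax of their mean. A minimal sketch, with illustrative probability vectors standing in for the outputs of the filtering-view, local-view, and distorted-view networks:

```python
import numpy as np

# Each row: one base model's class-probability output for the same ECG segment.
view_predictions = np.array([
    [0.6, 0.3, 0.1],   # filtering view
    [0.5, 0.4, 0.1],   # local view
    [0.7, 0.2, 0.1],   # distorted view
])

# Simple Average: mean the probability vectors, then take the argmax.
avg = view_predictions.mean(axis=0)
predicted_class = int(np.argmax(avg))
print(predicted_class)  # 0
```

Averaging tends to cancel the uncorrelated errors of the individual views, which is the usual argument for why such ensembles generalize better than any single member.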
An overview of biomedical literature search on the World Wide Web in the third millennium.
Kumar, Prince; Goel, Roshni; Jain, Chandni; Kumar, Ashish; Parashar, Abhishek; Gond, Ajay Ratan
2012-06-01
Complete access to the existing pool of biomedical literature and the ability to "hit" upon the exact information of the relevant specialty are becoming essential elements of academic and clinical expertise. With the rapid expansion of the literature database, it is almost impossible to keep up to date with every innovation. Using the Internet, however, most people can freely access this literature at any time, from almost anywhere. This paper highlights the use of the Internet in obtaining valuable biomedical research information, which is mostly available from journals, databases, textbooks and e-journals in the form of web pages, text materials, images, and so on. The authors present an overview of web-based resources for biomedical researchers, providing information about Internet search engines (e.g., Google), web-based bibliographic databases (e.g., PubMed, IndMed) and how to use them, and other online biomedical resources that can assist clinicians in reaching well-informed clinical decisions.
[Scientometrics and bibliometrics of biomedical engineering periodicals and papers].
Zhao, Ping; Xu, Ping; Li, Bingyan; Wang, Zhengrong
2003-09-01
This investigation was made to reveal the current status, research trends and research level of biomedical engineering in mainland China by means of scientometrics, and to assess the quality of four domestic publications by bibliometrics. We identified all articles of the four related publications by searching Chinese and foreign databases from 1997 to 2001. All articles collected or cited by these databases were retrieved and statistically analyzed to determine the relevant distributions, including databases, years, authors, institutions, subject headings and subheadings. Funding sources and the related articles were analyzed as well. The results showed that two journals were cited by two foreign databases and five Chinese databases simultaneously. The output of the Journal of Biomedical Engineering was the highest: the number of its original papers cited by EI and CA, and the total number of its papers sponsored by funds, were higher than those of the others, but the number and annual percentage of its biomedical articles cited by EI decreased overall. Core authors and institutions in mainland China have emerged in the field of biomedical engineering. Their research topics were mainly concentrated on subject headings including biocompatible materials, computer-assisted signal processing, electrocardiography, computer-assisted image processing, biomechanics, algorithms, electroencephalography, automatic data processing, mechanical stress, hemodynamics, mathematical computing, microcomputers, and theoretical models. The main subheadings were concentrated on instrumentation, physiopathology, diagnosis, therapy, ultrasonography, physiology, analysis, surgery, pathology, and methods.
[The biomedical periodicals of Hungarian editions--historical overview].
Berhidi, Anna; Geges, József; Vasas, Lívia
2006-03-12
The majority of Hungarian scientific results are published in international periodicals in foreign languages, yet publications in Hungarian scientific periodicals should not be ignored. This study analyses biomedical periodicals of Hungarian edition from different points of view. Based on several databases, a list of 119 titles was compiled, containing both the core and the peripheral journals of the biomedical field. These periodicals were analysed empirically, one by one, by checking each title. Thirteen of the titles have ceased publication; of the remaining 106 Hungarian scientific journals, 10 are published in English. Of the remaining majority, published in Hungarian, only a few appear in international databases. Although a quarter of the Hungarian biomedical journals meet the requirements for representation in international databases, these periodicals are not indexed. Forty-two biomedical periodicals are available online, though a quarter of these have restricted access. Two-thirds of the Hungarian biomedical journals have detailed instructions to authors, which inform publishing doctors and researchers of the requirements of a biomedical periodical. The increasing number of Hungarian biomedical journals is welcome news, but it would be important for frequently cited, high-quality publications to appear in Hungarian journals: the more publications are cited, the more journals and authors gain in prestige at home and internationally.
Vieira, A.
2010-01-01
Background: In relation to pharmacognosy, an objective of many ethnobotanical studies is to identify plant species to be further investigated, for example, tested in disease models related to the ethnomedicinal application. To further warrant such testing, research evidence for medicinal applications of these plants (or of their major phytochemical constituents and metabolic derivatives) is typically analyzed in biomedical databases. Methods: As a model of this process, the current report presents novel information regarding traditional anti-inflammation and anti-infection medicinal plant use. This information was obtained from an interview-based ethnobotanical study, and was compared with current biomedical evidence using the Medline® database. Results: Of the 8 anti-infection plant species identified in the ethnobotanical study, 7 have related activities reported in the database; and of the 6 anti-inflammation plants, 4 have related activities in the database. Conclusion: Based on novel and complementary results from the ethnobotanical and biomedical database analyses, it is suggested that some of these plants warrant additional investigation of potential anti-inflammatory or anti-infection activities in related disease models, and also additional studies in other population groups. PMID:21589754
Intelligent Interfaces for Mining Large-Scale RNAi-HCS Image Databases
Lin, Chen; Mak, Wayne; Hong, Pengyu; Sepp, Katharine; Perrimon, Norbert
2010-01-01
Recently, High-content screening (HCS) has been combined with RNA interference (RNAi) to become an essential image-based high-throughput method for studying genes and biological networks through RNAi-induced cellular phenotype analyses. However, a genome-wide RNAi-HCS screen typically generates tens of thousands of images, most of which remain uncategorized due to the inadequacies of existing HCS image analysis tools. Until now, it still requires highly trained scientists to browse a prohibitively large RNAi-HCS image database and produce only a handful of qualitative results regarding cellular morphological phenotypes. For this reason we have developed intelligent interfaces to facilitate the application of the HCS technology in biomedical research. Our new interfaces empower biologists with computational power not only to effectively and efficiently explore large-scale RNAi-HCS image databases, but also to apply their knowledge and experience to interactive mining of cellular phenotypes using Content-Based Image Retrieval (CBIR) with Relevance Feedback (RF) techniques. PMID:21278820
A biomedical information system for retrieval and manipulation of NHANES data.
Mukherjee, Sukrit; Martins, David; Norris, Keith C; Jenders, Robert A
2013-01-01
The retrieval and manipulation of data from large public databases like the U.S. National Health and Nutrition Examination Survey (NHANES) may require sophisticated statistical software and significant expertise that may be unavailable in the university setting. In response, we have developed the Data Retrieval And Manipulation System (DReAMS), an automated information system that handles all processes of data extraction and cleaning and then joins different subsets to produce analysis-ready output. The system is a browser-based data warehouse application in which input data from flat files or operational systems are aggregated in a structured way so that the desired data can be read, recoded, queried and extracted efficiently. The current pilot implementation provides access to a limited portion of the NHANES database. We plan to increase the amount of data available through the system in the near future and to extend the techniques to other large databases from the CDU archive, which currently holds about 53 databases.
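The "joining different subsets" step that DReAMS automates can be sketched simply. NHANES components are released as separate files that share a respondent sequence number (SEQN), so producing an analysis-ready table means inner-joining subsets on that key; the records and values below are made up for illustration, and this sketch is not the DReAMS implementation.

```python
# Two toy NHANES-style component tables keyed by SEQN (respondent number).
demographics = {1: {"age": 45}, 2: {"age": 62}, 3: {"age": 33}}
lab_results = {1: {"glucose": 92}, 3: {"glucose": 110}}

def join_on_seqn(*tables):
    """Inner-join any number of dict-of-dicts tables keyed by SEQN."""
    common = set.intersection(*(set(t) for t in tables))
    return {seqn: {k: v for t in tables for k, v in t[seqn].items()}
            for seqn in sorted(common)}

merged = join_on_seqn(demographics, lab_results)
print(merged)  # {1: {'age': 45, 'glucose': 92}, 3: {'age': 33, 'glucose': 110}}
```

Respondent 2, who has no lab record, drops out of the inner join; a real system would also let the user choose outer joins and handle recoding before output.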
Vuckovic-Dekic, Ljiljana; Gavrilovic, Dusica
2016-01-01
To investigate the dynamics of indexing the Journal of the Balkan Union of Oncology (JBUON) in important biomedical databases, the effects on the quantity and type of published articles, and the countries of the (co)authors of these papers. The indexing of JBUON started with EMBASE/Excerpta Medica, followed in 2006 by PubMed/MEDLINE, and continued every second year in other important biomedical databases until 2012, when JBUON became an Open Access journal (for more information please visit www.jbuon.com). Including the next two years to monitor the effect of the last indexing, we analyzed 9 volumes consisting of 36 issues published from January 2006 to December 2014, with regard to the number and category of articles, the contribution of authors from Balkan and non-Balkan countries, and the (co)authorship of the published articles. In the period 2006-2014, 1165 articles of different categories were published in JBUON. The indexing progress of JBUON immediately increased the submission rate and enlarged the number of publications, original papers in particular, in every volume. Authors from Balkan countries contributed to 80.7% of all articles. The average number of coauthors per original article grew slowly and was higher at the end of the investigated period than at the start (6.6 and 5.8, respectively). The progressive coverage of JBUON in important biomedical databases and its international visibility attracted a large readership; the submission rate and the number of published articles grew significantly, particularly the number of original papers. This is the most important consequence of the editorial policy, which will hopefully lead to further progress of JBUON in the near future.
Durack, Jeremy C.; Chao, Chih-Chien; Stevenson, Derek; Andriole, Katherine P.; Dev, Parvati
2002-01-01
Medical media collections are growing at a pace that exceeds the value they currently provide as research and educational resources. To address this issue, the Stanford MediaServer was designed to promote innovative multimedia-based application development. The nucleus of the MediaServer platform is a digital media database strategically designed to meet the information needs of many biomedical disciplines. Key features include an intuitive web-based interface for collaboratively populating the media database, flexible creation of media collections for diverse and specialized purposes, and the ability to construct a variety of end-user applications from the same database to support biomedical education and research. PMID:12463820
A scoping review of competencies for scientific editors of biomedical journals.
Galipeau, James; Barbour, Virginia; Baskin, Patricia; Bell-Syer, Sally; Cobey, Kelly; Cumpston, Miranda; Deeks, Jon; Garner, Paul; MacLehose, Harriet; Shamseer, Larissa; Straus, Sharon; Tugwell, Peter; Wager, Elizabeth; Winker, Margaret; Moher, David
2016-02-02
Biomedical journals are the main route for disseminating the results of health-related research. Despite this, their editors operate largely without formal training or certification. To our knowledge, no body of literature systematically identifying core competencies for scientific editors of biomedical journals exists. Therefore, we aimed to conduct a scoping review to determine what is known on the competency requirements for scientific editors of biomedical journals. We searched the MEDLINE®, Cochrane Library, Embase®, CINAHL, PsycINFO, and ERIC databases (from inception to November 2014) and conducted a grey literature search for research and non-research articles with competency-related statements (i.e. competencies, knowledge, skills, behaviors, and tasks) pertaining to the role of scientific editors of peer-reviewed health-related journals. We also conducted an environmental scan, searched the results of a previous environmental scan, and searched the websites of existing networks, major biomedical journal publishers, and organizations that offer resources for editors. A total of 225 full-text publications were included, 25 of which were research articles. We extracted a total of 1,566 statements possibly related to core competencies for scientific editors of biomedical journals from these publications. We then collated overlapping or duplicate statements which produced a list of 203 unique statements. Finally, we grouped these statements into seven emergent themes: (1) dealing with authors, (2) dealing with peer reviewers, (3) journal publishing, (4) journal promotion, (5) editing, (6) ethics and integrity, and (7) qualities and characteristics of editors. To our knowledge, this scoping review is the first attempt to systematically identify possible competencies of editors. 
Limitations are that (1) we may not have captured all aspects of a biomedical editor's work in our searches, (2) removing redundant and overlapping items may have led to the elimination of some nuances between items, (3) restricting to certain databases, and only French and English publications, may have excluded relevant publications, and (4) some statements may not necessarily be competencies. This scoping review is the first step of a program to develop a minimum set of core competencies for scientific editors of biomedical journals which will be followed by a training needs assessment, a Delphi exercise, and a consensus meeting.
2013-01-01
Background Most of the institutional and research information in the biomedical domain is available in the form of English text. Even in countries where English is an official language, such as the United States, language can be a barrier for accessing biomedical information for non-native speakers. Recent progress in machine translation suggests that this technique could help make English texts accessible to speakers of other languages. However, the lack of adequate specialized corpora needed to train statistical models currently limits the quality of automatic translations in the biomedical domain. Results We show how a large parallel corpus can automatically be obtained for the biomedical domain, using the MEDLINE database. The corpus generated in this work comprises article titles obtained from MEDLINE and abstract text automatically retrieved from journal websites, which substantially extends the corpora used in previous work. After assessing the quality of the corpus for two language pairs (English/French and English/Spanish) we use the Moses package to train a statistical machine translation model that outperforms previous models for automatic translation of biomedical text. Conclusions We have built translation data sets in the biomedical domain that can easily be extended to other languages available in MEDLINE. These sets can successfully be applied to train statistical machine translation models. While further progress should be made by incorporating out-of-domain corpora and domain-specific lexicons, we believe that this work improves the automatic translation of biomedical texts. PMID:23631733
Bosnian and Herzegovinian medical scientists in PubMed database.
Masic, Izet
2013-01-01
This paper briefly presents PubMed, one of the most important online databases of scientific biomedical literature. The author also analyzes the most cited authors, professors of the medical faculties in Bosnia and Herzegovina, based on their papers published in biomedical journals abstracted and indexed in PubMed.
Large-scale annotation of small-molecule libraries using public databases.
Zhou, Yingyao; Zhou, Bin; Chen, Kaisheng; Yan, S Frank; King, Frederick J; Jiang, Shumei; Winzeler, Elizabeth A
2007-01-01
While many large publicly accessible databases provide excellent annotation for biological macromolecules, the same is not true for small chemical compounds. Commercial data sources also fail to encompass an annotation interface for large numbers of compounds and tend to be too costly to be widely available to biomedical researchers. Therefore, using annotation information for the selection of lead compounds from a modern-day high-throughput screening (HTS) campaign presently occurs only on a very limited scale. The recent rapid expansion of the NIH PubChem database provides an opportunity to link existing biological databases with compound catalogs and provides relevant information that could potentially improve the information garnered from large-scale screening efforts. Using the 2.5 million compound collection at the Genomics Institute of the Novartis Research Foundation (GNF) as a model, we determined that approximately 4% of the library contained compounds with potential annotation in databases such as PubChem and the World Drug Index (WDI) as well as related databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) and ChemIDplus. Furthermore, exact structure match analysis showed that 32% of GNF compounds can be linked to third-party databases via PubChem. We also showed that annotations such as MeSH (medical subject headings) terms can be applied to in-house HTS databases to identify signature biological inhibition profiles of interest as well as to expedite the assay validation process. The automated annotation of thousands of screening hits in batch is becoming feasible and has the potential to play an essential role in the hit-to-lead decision-making process.
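At its core, exact-structure-match linking is a key join: both the in-house library and the public database are keyed on a canonical structure identifier, and annotation transfers wherever the keys match. A minimal sketch, assuming InChIKey-style identifiers; every compound ID, key, and annotation below is hypothetical:

```python
# In-house library: compound ID -> canonical structure key (illustrative).
library = {
    "CPD-001": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
    "CPD-002": "AAAAAAAAAAAAAA-UHFFFAOYSA-N",   # no public record
}
# Public annotations keyed by the same canonical identifier (illustrative).
public_annotations = {
    "BSYNRYMUTXBXSQ-UHFFFAOYSA-N": {"mesh": ["Anti-Inflammatory Agents"]},
}

# Transfer annotation over exact key matches and measure coverage.
linked = {cid: public_annotations[key]
          for cid, key in library.items() if key in public_annotations}
coverage = len(linked) / len(library)
print(sorted(linked), coverage)  # ['CPD-001'] 0.5
```

The hard part in practice is not the join but the canonicalization: both sides must normalize structures (salts, tautomers, stereochemistry) identically before the exact-match key is computed, typically with a cheminformatics toolkit.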
Corwin, John; Silberschatz, Avi; Miller, Perry L; Marenco, Luis
2007-01-01
Data sparsity and schema evolution issues affecting clinical informatics and bioinformatics communities have led to the adoption of vertical or object-attribute-value-based database schemas to overcome limitations posed when using conventional relational database technology. This paper explores these issues and discusses why biomedical data are difficult to model using conventional relational techniques. The authors propose a solution to these obstacles based on a relational database engine using a sparse, column-store architecture. The authors provide benchmarks comparing the performance of queries and schema-modification operations using three different strategies: (1) the standard conventional relational design; (2) past approaches used by biomedical informatics researchers; and (3) their sparse, column-store architecture. The performance results show that their architecture is a promising technique for storing and processing many types of data that are not handled well by the other two semantic data models.
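The vertical (entity-attribute-value) layout the authors contrast with conventional relational design can be sketched with an in-memory SQLite table. This is only an illustration of the data model, not the authors' column-store engine, and all clinical values are made up:

```python
import sqlite3

# Sparse clinical facts become rows: adding a new attribute later needs no
# ALTER TABLE, and absent attributes simply have no row (no stored NULLs).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE eav (entity INTEGER, attribute TEXT, value TEXT)")
con.executemany("INSERT INTO eav VALUES (?, ?, ?)", [
    (1, "diagnosis", "asthma"),
    (1, "peak_flow", "410"),
    (2, "diagnosis", "copd"),       # patient 2 has no peak_flow row
])

# Pivot one attribute back out; a column-store engine makes this fast by
# storing each attribute's values contiguously.
rows = con.execute(
    "SELECT entity, value FROM eav WHERE attribute = 'diagnosis' ORDER BY entity"
).fetchall()
print(rows)  # [(1, 'asthma'), (2, 'copd')]
```

The trade-off the paper benchmarks is visible even here: schema evolution is free, but reassembling a wide patient record requires one pivot per attribute, which is where a sparse column-store beats both naive EAV queries and wide, mostly-NULL conventional tables.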
Dumitrascu, Dan L
2018-01-01
There is competition among scientific journals for leadership in their scientific field. Several Romanian biomedical journals are published in English and a smaller number in Romanian, and their visibility and ranking according to scientometric measures merit periodic analysis. We searched for all biomedical journals indexed in the international databases Web of Science, PubMed, Scopus, Embase and Google Scholar, and analyzed their evaluation factors. Several Romanian biomedical journals are indexed in international databases, but their scientometric indexes are not high; the best journal was acquired by an international publisher and is no longer listed for Romania. The Romanian biomedical journals indexed in international databases deserve periodic analysis, and there is a need to improve their ranking.
Bravo, Àlex; Piñero, Janet; Queralt-Rosinach, Núria; Rautschka, Michael; Furlong, Laura I
2015-02-21
Current biomedical research needs to leverage and exploit the large amount of information reported in scientific publications. Automated text mining approaches, in particular those aimed at finding relationships between entities, are key for identification of actionable knowledge from free text repositories. We present the BeFree system aimed at identifying relationships between biomedical entities with a special focus on genes and their associated diseases. By exploiting morpho-syntactic information of the text, BeFree is able to identify gene-disease, drug-disease and drug-target associations with state-of-the-art performance. The application of BeFree to real-case scenarios shows its effectiveness in extracting information relevant for translational research. We show the value of the gene-disease associations extracted by BeFree through a number of analyses and integration with other data sources. BeFree succeeds in identifying genes associated to a major cause of morbidity worldwide, depression, which are not present in other public resources. Moreover, large-scale extraction and analysis of gene-disease associations, and integration with current biomedical knowledge, provided interesting insights on the kind of information that can be found in the literature, and raised challenges regarding data prioritization and curation. We found that only a small proportion of the gene-disease associations discovered by using BeFree is collected in expert-curated databases. Thus, there is a pressing need to find alternative strategies to manual curation, in order to review, prioritize and curate text-mining data and incorporate it into domain-specific databases. We present our strategy for data prioritization and discuss its implications for supporting biomedical research and applications. BeFree is a novel text mining system that performs competitively for the identification of gene-disease, drug-disease and drug-target associations. 
Our analyses show that mining only a small fraction of MEDLINE results in a large dataset of gene-disease associations, and only a small proportion of this dataset is actually recorded in curated resources (2%), raising several issues on data prioritization and curation. We propose that joint analysis of text mined data with data curated by experts appears as a suitable approach to both assess data quality and highlight novel and interesting information.
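As a hedged illustration of the task this record describes: BeFree itself uses morpho-syntactic kernels, but the sentence-level co-occurrence baseline that such relation-extraction systems are measured against can be sketched in a few lines (the gene and disease lexicons and the sentences below are hypothetical toy data):

```python
from collections import Counter
from itertools import product

# Toy lexicons standing in for full gene/disease dictionaries (hypothetical).
GENES = {"BDNF", "SLC6A4"}
DISEASES = {"depression", "anxiety"}

def cooccurring_pairs(sentences):
    """Count sentence-level gene-disease co-occurrences, the simplest
    baseline that relation-extraction systems like BeFree improve upon."""
    counts = Counter()
    for sent in sentences:
        tokens = {t.strip(".,;") for t in sent.split()}
        genes = GENES & tokens
        diseases = DISEASES & tokens
        for pair in product(sorted(genes), sorted(diseases)):
            counts[pair] += 1
    return counts

sents = [
    "BDNF polymorphisms were associated with depression in two cohorts.",
    "SLC6A4 variants modulate anxiety and depression risk.",
    "The study measured BDNF levels only.",
]
print(cooccurring_pairs(sents))
```

Real systems replace the toy lexicons with full entity dictionaries and replace raw co-occurrence counts with syntactic-pattern classifiers that decide whether a sentence actually asserts an association.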
Knowledge based word-concept model estimation and refinement for biomedical text mining.
Jimeno Yepes, Antonio; Berlanga, Rafael
2015-02-01
Text mining of scientific literature has been essential for setting up large public biomedical databases, which are being widely used by the research community. In the biomedical domain, the existence of a large number of terminological resources and knowledge bases (KB) has enabled a myriad of machine learning methods for different text mining related tasks. Unfortunately, KBs have been devised not for text mining tasks but for human interpretation; thus the performance of KB-based methods is usually lower when compared to supervised machine learning methods. The disadvantage of supervised methods, though, is that they require labeled training data, making them impractical for large-scale biomedical text mining systems. KB-based methods do not have this limitation. In this paper, we describe a novel method to generate word-concept probabilities from a KB, which can serve as a basis for several text mining tasks. This method not only takes into account the underlying patterns within the descriptions contained in the KB but also those in texts available from large unlabeled corpora such as MEDLINE. The parameters of the model have been estimated without training data. Patterns from MEDLINE have been built using MetaMap for entity recognition and related using co-occurrences. The word-concept probabilities were evaluated on the task of word sense disambiguation (WSD). The results showed that our method obtained a higher degree of accuracy than other state-of-the-art approaches when evaluated on the MSH WSD data set. We also evaluated our method on the task of document ranking using MEDLINE citations. These results also showed an increase in performance over existing baseline retrieval approaches. Copyright © 2014 Elsevier Inc. All rights reserved.
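A minimal sketch of the core idea, estimating word-concept probabilities from KB concept descriptions and using them for WSD, might look as follows (the miniature KB, its concept ids, and the add-one smoothing scheme are illustrative assumptions, not the paper's exact model):

```python
import math
from collections import Counter

# Hypothetical miniature KB: concept id -> description text.
KB = {
    "C_cold_temp": "low temperature weather winter chill",
    "C_cold_illness": "common cold virus infection cough fever",
}

def word_concept_probs(kb):
    """Estimate P(word | concept) with add-one smoothing over the KB vocabulary."""
    vocab = {w for text in kb.values() for w in text.split()}
    probs = {}
    for concept, text in kb.items():
        counts = Counter(text.split())
        total = sum(counts.values()) + len(vocab)
        probs[concept] = {w: (counts[w] + 1) / total for w in vocab}
    return probs

def disambiguate(context_words, probs):
    """Pick the concept maximizing the sum of log word-concept probabilities."""
    def score(concept):
        return sum(math.log(probs[concept][w])
                   for w in context_words if w in probs[concept])
    return max(probs, key=score)

probs = word_concept_probs(KB)
print(disambiguate(["patient", "cough", "virus"], probs))
```

The paper additionally refines these estimates with unlabeled MEDLINE co-occurrence patterns; the sketch above only covers the KB-side estimation.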
Biomedical science journals in the Arab world.
Tadmouri, Ghazi O
2004-10-01
Medieval Arab scientists established the basis of medical practice and gave important attention to the publication of scientific results. At present, modern scientific publishing in the Arab world is in its developmental stage. There are fewer than 300 Arab biomedical journals, most of which are published in Egypt, Lebanon, and the Kingdom of Saudi Arabia. Yet many of these journals lack online access or are not indexed in major bibliographic databases. The majority of indexed journals, moreover, do not have a stable presence in the popular PubMed database, and their indexing has been discontinued since 2001. The exposure of Arab biomedical journals in international indices undoubtedly plays an important role in improving the scientific quality of these journals. The successful examples discussed in this review encourage us to call for the formation of a consortium of Arab biomedical journal publishers to assist in shifting the region's balance from biomedical data consumption to data production.
A novel biomedical image indexing and retrieval system via deep preference learning.
Pang, Shuchao; Orgun, Mehmet A; Yu, Zhezhou
2018-05-01
Traditional biomedical image retrieval methods, as well as content-based image retrieval (CBIR) methods originally designed for non-biomedical images, either rely only on pixel-level and other low-level features to describe an image or use deep features, but still leave considerable room for improvement in both accuracy and efficiency. In this work, we propose a new approach, which exploits deep learning technology to extract high-level and compact features from biomedical images. The deep feature extraction process leverages multiple hidden layers to capture substantial feature structures of high-resolution images and represent them at different levels of abstraction, leading to improved performance for indexing and retrieval of biomedical images. We exploit currently popular multi-layered deep neural networks, namely stacked denoising autoencoders (SDAE) and convolutional neural networks (CNN), to represent the discriminative features of biomedical images by transferring the feature representations and parameters of pre-trained deep neural networks from another domain. Moreover, in order to index all the images for finding similarly referenced images, we also introduce preference learning technology to train and learn a preference model for the query image, which can output a similarity ranking list of images from a biomedical image database. To the best of our knowledge, this paper introduces preference learning technology into biomedical image retrieval for the first time. We evaluate the performance of two powerful algorithms based on our proposed system and compare them with those of popular biomedical image indexing approaches and existing regular image retrieval methods in detailed experiments over several well-known public biomedical image databases.
Based on different criteria for the evaluation of retrieval performance, experimental results demonstrate that our proposed algorithms outperform the state-of-the-art techniques in indexing biomedical images. We propose a novel and automated indexing system based on deep preference learning to characterize biomedical images for developing computer aided diagnosis (CAD) systems in healthcare. Our proposed system shows an outstanding indexing ability and high efficiency for biomedical image retrieval applications and it can be used to collect and annotate the high-resolution images in a biomedical database for further biomedical image research and applications. Copyright © 2018 Elsevier B.V. All rights reserved.
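The retrieval step described above, ranking database images against a query by the similarity of their learned feature vectors, can be sketched as follows (the 4-dimensional vectors and image ids are hypothetical; real deep features from an SDAE or CNN have hundreds of dimensions, and the paper's preference model learns a ranking rather than using plain cosine similarity):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank_by_similarity(query_vec, database):
    """Return database image ids ordered by descending similarity
    to the query image's feature vector."""
    return sorted(database, key=lambda k: cosine(query_vec, database[k]),
                  reverse=True)

# Hypothetical deep-feature vectors for three database images.
db = {
    "xray_01": [0.9, 0.1, 0.0, 0.2],
    "mri_07":  [0.1, 0.8, 0.7, 0.0],
    "xray_02": [0.8, 0.2, 0.1, 0.1],
}
print(rank_by_similarity([1.0, 0.1, 0.0, 0.2], db))
```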
[Presence of the biomedical periodicals of Hungarian editions in international databases].
Vasas, Lívia; Hercsel, Imréné
2006-01-15
The majority of Hungarian scientific results in medical and related sciences are published in foreign-edited scientific periodicals with high impact factor (IF) values, and they appear in the international scientific literature in foreign languages. In this study the authors dealt only with the presence and registered citation in international databases of those periodicals which had been published in Hungary and/or in cooperation with foreign publishing companies. The examination went back to 1980 and covered a 25-year period. 110 periodicals were selected for more detailed examination. The authors analyzed the situation of the current periodicals in the three most often visited databases (MEDLINE, EMBASE, Web of Science) and discovered that the biomedical scientific periodicals of Hungarian interest were not represented with reasonable emphasis in the relevant international bibliographic databases. Because of the great volume of data, the scientific literature of medicine and related sciences could not be covered in its entirety; this publication may nevertheless provide useful information for interested readers and draw the attention of those responsible.
The BioLexicon: a large-scale terminological resource for biomedical text mining.
Thompson, Paul; McNaught, John; Montemagni, Simonetta; Calzolari, Nicoletta; del Gratta, Riccardo; Lee, Vivian; Marchi, Simone; Monachini, Monica; Pezik, Piotr; Quochi, Valeria; Rupp, C J; Sasaki, Yutaka; Venturi, Giulia; Rebholz-Schuhmann, Dietrich; Ananiadou, Sophia
2011-10-12
Due to the rapidly expanding body of biomedical literature, biologists require increasingly sophisticated and efficient systems to help them to search for relevant information. Such systems should account for the multiple written variants used to represent biomedical concepts, and allow the user to search for specific pieces of knowledge (or events) involving these concepts, e.g., protein-protein interactions. Such functionality requires access to detailed information about words used in the biomedical literature. Existing databases and ontologies often have a specific focus and are oriented towards human use. Consequently, biological knowledge is dispersed amongst many resources, which often do not attempt to account for the large and frequently changing set of variants that appear in the literature. Additionally, such resources typically do not provide information about how terms relate to each other in texts to describe events. This article provides an overview of the design, construction and evaluation of a large-scale lexical and conceptual resource for the biomedical domain, the BioLexicon. The resource can be exploited by text mining tools at several levels, e.g., part-of-speech tagging, recognition of biomedical entities, and the extraction of events in which they are involved. As such, the BioLexicon must account for real usage of words in biomedical texts. In particular, the BioLexicon gathers together different types of terms from several existing data resources into a single, unified repository, and augments them with new term variants automatically extracted from biomedical literature. Extraction of events is facilitated through the inclusion of biologically pertinent verbs (around which events are typically organized) together with information about typical patterns of grammatical and semantic behaviour, which are acquired from domain-specific texts. 
In order to foster interoperability, the BioLexicon is modelled using the Lexical Markup Framework, an ISO standard. The BioLexicon contains over 2.2 M lexical entries and over 1.8 M terminological variants, as well as over 3.3 M semantic relations, including over 2 M synonymy relations. Its exploitation can benefit both application developers and users. We demonstrate some such benefits by describing integration of the resource into a number of different tools, and evaluating improvements in performance that this can bring. PMID:21992002
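The variant-unification idea behind a terminological resource of this kind can be sketched with a toy lookup (the concept ids and variants below are hypothetical, and the real BioLexicon additionally models grammatical and semantic behaviour):

```python
# Hypothetical miniature lexicon: each concept id maps to its surface
# variants, mimicking how a terminological resource unifies written forms.
LEXICON = {
    "UNIPROT:P53_HUMAN": ["p53", "TP53", "tumor protein p53",
                          "tumour protein p53"],
    "CHEBI:15365": ["aspirin", "acetylsalicylic acid", "ASA"],
}

# Invert to a case-insensitive variant -> concept index.
VARIANT_INDEX = {v.lower(): cid for cid, vs in LEXICON.items() for v in vs}

def normalize(mention):
    """Map a textual mention to its canonical concept id, if known."""
    return VARIANT_INDEX.get(mention.lower())

print(normalize("Tumour Protein p53"))
```

Text mining tools use such an index so that spelling variants and synonyms in the literature all resolve to one database identifier.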
Predicting structured metadata from unstructured metadata
Posch, Lisa; Panahiazar, Maryam; Dumontier, Michel; Gevaert, Olivier
2016-01-01
Enormous amounts of biomedical data have been and are being produced by investigators all over the world. However, one crucial and limiting factor in data reuse is an accurate, structured and complete description of the data, or data about the data, defined as metadata. We propose a framework to predict structured metadata terms from unstructured metadata for improving the quality and quantity of metadata, using the Gene Expression Omnibus (GEO) microarray database. Our framework consists of classifiers trained using term frequency-inverse document frequency (TF-IDF) features and a second approach based on topics modeled using a Latent Dirichlet Allocation (LDA) model to reduce the dimensionality of the unstructured data. Our results on the GEO database show that structured metadata terms are most accurately predicted using the TF-IDF approach, followed by LDA, with both outperforming the majority-vote baseline. While some accuracy is lost through the dimensionality reduction of LDA, the difference is small for elements with few possible values, and there is still a large improvement over the majority-classifier baseline. Overall this is a promising approach for metadata prediction that is likely to be applicable to other datasets, with implications for researchers interested in biomedical metadata curation and metadata prediction. Database URL: http://www.yeastgenome.org/ PMID:28637268
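A hedged sketch of the TF-IDF side of such a framework, predicting a structured metadata term for an unstructured description via its nearest labeled neighbour (the GEO-style documents and labels are toy assumptions; the paper trains proper classifiers rather than a 1-NN lookup):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute sparse TF-IDF vectors for a list of token lists."""
    df = Counter(t for doc in docs for t in set(doc))
    n = len(docs)
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def predict(unstructured, train_docs, train_labels):
    """Predict a structured metadata term via the nearest TF-IDF neighbour."""
    vecs = tfidf_vectors(train_docs + [unstructured])
    query, train_vecs = vecs[-1], vecs[:-1]
    best = max(range(len(train_vecs)),
               key=lambda i: cosine(query, train_vecs[i]))
    return train_labels[best]

# Hypothetical GEO-style free-text descriptions and structured organism terms.
docs = [["mouse", "liver", "microarray"],
        ["human", "breast", "tumor", "microarray"],
        ["mouse", "brain", "expression"]]
labels = ["Mus musculus", "Homo sapiens", "Mus musculus"]
print(predict(["mouse", "liver", "expression"], docs, labels))
```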
Benson, Dennis A.; Karsch-Mizrachi, Ilene; Lipman, David J.; Ostell, James; Wheeler, David L.
2007-01-01
GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 240 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage: www.ncbi.nlm.nih.gov. PMID:17202161
Chen, Hongyu; Martin, Bronwen; Daimon, Caitlin M; Maudsley, Stuart
2013-01-01
Text mining is rapidly becoming an essential technique for the annotation and analysis of large biological data sets. Biomedical literature currently increases at a rate of several thousand papers per week, making automated information retrieval methods the only feasible method of managing this expanding corpus. With the increasing prevalence of open-access journals and constant growth of publicly-available repositories of biomedical literature, literature mining has become much more effective with respect to the extraction of biomedically-relevant data. In recent years, text mining of popular databases such as MEDLINE has evolved from basic term-searches to more sophisticated natural language processing techniques, indexing and retrieval methods, structural analysis and integration of literature with associated metadata. In this review, we will focus on Latent Semantic Indexing (LSI), a computational linguistics technique increasingly used for a variety of biological purposes. It is noted for its ability to consistently outperform benchmark Boolean text searches and co-occurrence models at information retrieval and its power to extract indirect relationships within a data set. LSI has been used successfully to formulate new hypotheses, generate novel connections from existing data, and validate empirical data.
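The latent-dimension idea behind LSI, that documents sharing no terms can still be related through shared co-occurrence partners, can be illustrated with a rank-1 decomposition computed by power iteration (the tiny term-document matrix is a contrived example):

```python
import math

def matvec(M, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def power_iteration(M, steps=200):
    """Dominant eigenvector of a symmetric matrix by power iteration."""
    x = [1.0] * len(M)
    for _ in range(steps):
        x = matvec(M, x)
        norm = math.sqrt(sum(v * v for v in x))
        x = [v / norm for v in x]
    return x

# Term-document counts (rows: four terms, columns: three toy documents).
# Documents 0 and 2 share NO terms directly, but both co-occur with doc 1.
A = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
]

# Document-document matrix M = A^T A; its dominant eigenvector gives each
# document's loading on the first latent (LSI) dimension.
ndocs = len(A[0])
M = [[sum(A[t][i] * A[t][j] for t in range(len(A))) for j in range(ndocs)]
     for i in range(ndocs)]
loadings = power_iteration(M)
print([round(v, 3) for v in loadings])  # → [0.5, 0.707, 0.5]
```

Despite a raw term overlap of zero, documents 0 and 2 receive identical loadings on the first latent dimension, which is exactly the indirect-relationship effect the review attributes to LSI.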
An active visual search interface for Medline.
Xuan, Weijian; Dai, Manhong; Mirel, Barbara; Wilson, Justin; Athey, Brian; Watson, Stanley J; Meng, Fan
2007-01-01
Searching the Medline database is almost a daily necessity for many biomedical researchers. However, available Medline search solutions are mainly designed for the quick retrieval of a small set of most relevant documents. Because of this search model, they are not suitable for the large-scale exploration of literature and the underlying biomedical conceptual relationships, which are common tasks in the age of high-throughput experimental data analysis and cross-discipline research. We developed a new Medline exploration approach by incorporating interactive visualization together with powerful grouping, summary, sorting and active external content retrieval functions. Our solution, PubViz, is based on the FLEX platform designed for interactive web applications and its prototype is publicly available at: http://brainarray.mbni.med.umich.edu/Brainarray/DataMining/PubViz.
Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Wheeler, David L
2008-01-01
GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 260 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Molecular Biology Laboratory Nucleotide Sequence Database in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage: www.ncbi.nlm.nih.gov. PMID:18073190
Integrating systems biology models and biomedical ontologies
2011-01-01
Background Systems biology is an approach to biology that emphasizes the structure and dynamic behavior of biological systems and the interactions that occur within them. To succeed, systems biology crucially depends on the accessibility and integration of data across domains and levels of granularity. Biomedical ontologies were developed to facilitate such an integration of data and are often used to annotate biosimulation models in systems biology. Results We provide a framework to integrate representations of in silico systems biology with those of in vivo biology as described by biomedical ontologies and demonstrate this framework using the Systems Biology Markup Language. We developed the SBML Harvester software that automatically converts annotated SBML models into OWL and we apply our software to those biosimulation models that are contained in the BioModels Database. We utilize the resulting knowledge base for complex biological queries that can bridge levels of granularity, verify models based on the biological phenomenon they represent and provide a means to establish a basic qualitative layer on which to express the semantics of biosimulation models. Conclusions We establish an information flow between biomedical ontologies and biosimulation models and we demonstrate that the integration of annotated biosimulation models and biomedical ontologies enables the verification of models as well as expressive queries. Establishing a bi-directional information flow between systems biology and biomedical ontologies has the potential to enable large-scale analyses of biological systems that span levels of granularity from molecules to organisms. PMID:21835028
Lyon, Jennifer A; Garcia-Milian, Rolando; Norton, Hannah F; Tennant, Michele R
2014-01-01
Expert-mediated literature searching, a keystone service in biomedical librarianship, would benefit significantly from regular methodical review. This article describes the novel use of Research Electronic Data Capture (REDCap) software to create a database of literature searches conducted at a large academic health sciences library. An archive of paper search requests was entered into REDCap, and librarians now prospectively enter records for current searches. Having search data readily available allows librarians to reuse search strategies and track their workload. In aggregate, this data can help guide practice and determine priorities by identifying users' needs, tracking librarian effort, and focusing librarians' continuing education.
GeneView: a comprehensive semantic search engine for PubMed.
Thomas, Philippe; Starlinger, Johannes; Vowinkel, Alexander; Arzt, Sebastian; Leser, Ulf
2012-07-01
Research results are primarily published in scientific literature and curation efforts cannot keep up with the rapid growth of published literature. The plethora of knowledge remains hidden in large text repositories like MEDLINE. Consequently, life scientists have to spend a great amount of time searching for specific information. The enormous ambiguity among most names of biomedical objects such as genes, chemicals and diseases often produces too large and unspecific search results. We present GeneView, a semantic search engine for biomedical knowledge. GeneView is built upon a comprehensively annotated version of PubMed abstracts and openly available PubMed Central full texts. This semi-structured representation of biomedical texts enables a number of features extending classical search engines. For instance, users may search for entities using unique database identifiers or they may rank documents by the number of specific mentions they contain. Annotation is performed by a multitude of state-of-the-art text-mining tools for recognizing mentions from 10 entity classes and for identifying protein-protein interactions. GeneView currently contains annotations for >194 million entities from 10 classes for ∼21 million citations with 271,000 full text bodies. GeneView can be searched at http://bc3.informatik.hu-berlin.de/.
SparkText: Biomedical Text Mining on Big Data Framework.
Ye, Zhan; Tafti, Ahmad P; He, Karen Y; Wang, Kai; He, Max M
2016-01-01
Background Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. Results In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. Conclusions This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research. PMID:27685652
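A miniature stand-in for the classification step, multinomial Naïve Bayes with add-one smoothing over toy cancer-related token lists (SparkText itself uses Spark MLlib classifiers over full-text articles; the tokens and labels here are hypothetical):

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing, a miniature
    stand-in for the distributed classifiers used by SparkText."""

    def fit(self, docs, labels):
        self.vocab = {t for d in docs for t in d}
        self.counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.counts[label].update(doc)
        return self

    def predict(self, doc):
        total_docs = sum(self.label_counts.values())
        def logp(label):
            total = sum(self.counts[label].values()) + len(self.vocab)
            prior = math.log(self.label_counts[label] / total_docs)
            return prior + sum(
                math.log((self.counts[label][t] + 1) / total)
                for t in doc if t in self.vocab)
        return max(self.label_counts, key=logp)

# Hypothetical token lists from abstracts about three cancer types.
docs = [["brca1", "mammography", "ductal"],
        ["psa", "gleason", "prostatectomy"],
        ["egfr", "adenocarcinoma", "smoking"]]
labels = ["breast", "prostate", "lung"]
clf = NaiveBayes().fit(docs, labels)
print(clf.predict(["brca1", "ductal", "carcinoma"]))
```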
Since the early 1970s, the National Library of Medicine (NLM) has made searching the biomedical literature faster and easier by providing online information on NLM's family of databases (currently 40 online databases). MEDLINE®, NLM's premier database, has over 8.5 million citat...
ERIC Educational Resources Information Center
Kollegger, James G.; And Others
1988-01-01
In the first of three articles, the producer of Energyline, Energynet, and Tele/Scope recalls the development of the databases and database business strategies. The second describes the development of biomedical online databases, and the third discusses future developments, including full text databases, database producers as online host, and…
Identification of biomedical journals in Spain and Latin America.
Bonfill, Xavier; Osorio, Dimelza; Posso, Margarita; Solà, Ivan; Rada, Gabriel; Torres, Ania; García Dieguez, Marcelo; Piña-Pozas, Maricela; Díaz-García, Luisa; Tristán, Mario; Gandarilla, Omar; Rincón-Valenzuela, David A; Martí, Arturo; Hidalgo, Ricardo; Simancas-Racines, Daniel; López, Luis; Correa, Ricardo; Rojas-De-Arias, Antonieta; Loza, César; Gianneo, Óscar; Pardo, Hector
2015-12-01
Journals in languages other than English that publish original clinical research are often not well covered in the main biomedical databases and therefore often not included in systematic reviews. This study aimed to identify Spanish language biomedical journals from Spain and Latin America and to describe their main features. Journals were identified in electronic databases, publishers' catalogues and local registries. Eligibility was determined by assessing data from these sources or the journals' websites, when available. A total of 2457 journals were initially identified; 1498 met inclusion criteria. Spain (27.3%), Mexico (16.0%), Argentina (15.1%) and Chile (11.9%) had the highest number of journals. Most (85.8%) are currently active; 87.8% have an ISSN. The median and mean length of publication were 22 and 29 years, respectively. A total of 66.0% were indexed in at least one database; 3.0% had an impact factor in 2012. A total of 845 journals had websites (56.4%), of which 700 (82.8%) were searchable and 681 (80.6%) free of charge. Most of the identified journals have no impact factor or are not indexed in any of the major databases. The list of identified biomedical journals can be a useful resource when conducting hand searching activities and identifying clinical trials that otherwise would not be retrieved. © 2015 Health Libraries Group.
AST commercial human space flight biomedical data collection
DOT National Transportation Integrated Search
2007-02-01
Recommendations are made for specific biomedical data, equipment, and a database that will increase the knowledge and understanding of how short duration, suborbital space flight missions with brief exposure to microgravity affect the human body. Th...
[Over- or underestimated? Bibliographic survey of the biomedical periodicals published in Hungary].
Berhidi, Anna; Horváth, Katalin; Horváth, Gabriella; Vasas, Lívia
2013-06-30
This publication, based on an article published in 2006, emphasises the qualities of the current biomedical periodicals of Hungarian edition. The aim of this study was to analyse how Hungarian journals meet the requirements of scientific quality and international visibility. The authors evaluated 93 Hungarian biomedical periodicals by 4 viewpoints for each of the two criteria mentioned above. 35% of the analysed journals fulfil the attributes of scientific quality, 5% those of international visibility, 6% fulfil all examined criteria, and 25% are indexed in international databases. The 6 Hungarian biomedical periodicals covered by all three main bibliographic databases (Medline, Scopus, Web of Science) have the best qualities. The authors recommend improving both scientific quality and international visibility. The bases of qualitative adequacy are accurate author guidelines; article titles, abstracts and keywords in English; and the ability to publish on time.
A Diagram Editor for Efficient Biomedical Knowledge Capture and Integration
Yu, Bohua; Jakupovic, Elvis; Wilson, Justin; Dai, Manhong; Xuan, Weijian; Mirel, Barbara; Athey, Brian; Watson, Stanley; Meng, Fan
2008-01-01
Understanding the molecular mechanisms underlying complex disorders requires the integration of data and knowledge from different sources including free text literature and various biomedical databases. To facilitate this process, we created the Biomedical Concept Diagram Editor (BCDE) to help researchers distill knowledge from data and literature and aid the process of hypothesis development. A key feature of BCDE is the ability to capture information with a simple drag-and-drop. This is a vast improvement over manual methods of knowledge and data recording and greatly increases the efficiency of the biomedical researcher. BCDE also provides a unique concept matching function to enforce consistent terminology, which enables conceptual relationships deposited by different researchers in the BCDE database to be mined and integrated for intelligible and useful results. We hope BCDE will promote the sharing and integration of knowledge from different researchers for effective hypothesis development. PMID:21347131
CellLineNavigator: a workbench for cancer cell line analysis
Krupp, Markus; Itzel, Timo; Maass, Thorsten; Hildebrandt, Andreas; Galle, Peter R.; Teufel, Andreas
2013-01-01
The CellLineNavigator database, freely available at http://www.medicalgenomics.org/celllinenavigator, is a web-based workbench for large-scale comparisons across a large collection of diverse cell lines. It aims to support experimental design in the fields of genomics, systems biology and translational biomedical research. Currently, this compendium holds genome-wide expression profiles of 317 different cancer cell lines, categorized into 57 different pathological states and 28 individual tissues. To enlarge the scope of CellLineNavigator, the database was furthermore closely linked to commonly used bioinformatics databases and knowledge repositories. To ensure easy data access and searchability, a simple data interface and an intuitive querying interface were implemented. They allow the user to explore and filter gene expression, focusing on pathological or physiological conditions. For more complex searches, the advanced query interface may be used to query for (i) differentially expressed genes; (ii) pathological or physiological conditions; or (iii) gene names or functional attributes, such as Kyoto Encyclopaedia of Genes and Genomes pathway maps. These queries may also be combined. Finally, CellLineNavigator allows additional advanced analysis of differentially regulated genes through a direct link to the Database for Annotation, Visualization and Integrated Discovery (DAVID) Bioinformatics Resources. PMID:23118487
How to Search, Write, Prepare and Publish the Scientific Papers in the Biomedical Journals
Masic, Izet
2011-01-01
This article describes the methodology of preparing, writing and publishing scientific papers in biomedical journals. It gives a concise overview of the concept and structure of the system of biomedical scientific and technical information and of how biomedical literature is retrieved from worldwide biomedical databases. The scientific and professional medical journals currently published in Bosnia and Herzegovina are described, along with a comparative review of the number and structure of papers published in indexed journals in Bosnia and Herzegovina that are listed in the Medline database. Three B&H journals indexed in the MEDLINE database are analyzed for 2010: Medical Archives (Medicinski Arhiv), Bosnian Journal of Basic Medical Sciences and Medical Gazette (Medicinski Glasnik). The largest number of original papers was published in the Medical Archives. There is a statistically significant difference in the number of papers published by local authors relative to international journals in favor of the Medical Archives; the Bosnian Journal of Basic Medical Sciences, however, does not categorize its articles, so no comparison could be made for it. By percentage, the Medical Archives and the Bosnian Journal of Basic Medical Sciences published the largest number of articles by authors from Sarajevo and Tuzla, the two oldest and largest university medical centers in Bosnia and Herzegovina. The author believes that qualitative changes are necessary in the reception and reviewing of papers for publication in biomedical journals published in Bosnia and Herzegovina, and that this should be the responsibility of a separate scientific authority/committee composed of experts in the field of medicine at the state level. PMID:23572850
AggNet: Deep Learning From Crowds for Mitosis Detection in Breast Cancer Histology Images.
Albarqouni, Shadi; Baur, Christoph; Achilles, Felix; Belagiannis, Vasileios; Demirci, Stefanie; Navab, Nassir
2016-05-01
The lack of publicly available ground-truth data has been identified as the major challenge for transferring recent developments in deep learning to the biomedical imaging domain. Though crowdsourcing has enabled annotation of large-scale databases for real-world images, its application for biomedical purposes requires a deeper understanding and hence a more precise definition of the actual annotation task. The fact that expert tasks are being outsourced to non-expert users may lead to noisy annotations introducing disagreement between users. Despite being a valuable resource for learning annotation models from crowdsourcing, conventional machine-learning methods may have difficulties dealing with noisy annotations during training. In this manuscript, we present a new concept for learning from crowds that handles data aggregation directly as part of the learning process of the convolutional neural network (CNN) via an additional crowdsourcing layer (AggNet). In addition, we present an experimental study on learning from crowds designed to answer the following questions: 1) Can a deep CNN be trained with data collected from crowdsourcing? 2) How should the CNN be adapted to train on multiple types of annotation datasets (ground truth and crowd-based)? 3) How does the choice of annotation and aggregation affect the accuracy? Our experimental setup involved Annot8, a self-implemented web platform based on the Crowdflower API realizing image annotation tasks for a publicly available biomedical image database. Our results give valuable insights into the functionality of deep CNN learning from crowd annotations and demonstrate the necessity of integrating data aggregation.
The Function Biomedical Informatics Research Network Data Repository
Keator, David B.; van Erp, Theo G.M.; Turner, Jessica A.; Glover, Gary H.; Mueller, Bryon A.; Liu, Thomas T.; Voyvodic, James T.; Rasmussen, Jerod; Calhoun, Vince D.; Lee, Hyo Jong; Toga, Arthur W.; McEwen, Sarah; Ford, Judith M.; Mathalon, Daniel H.; Diaz, Michele; O’Leary, Daniel S.; Bockholt, H. Jeremy; Gadde, Syam; Preda, Adrian; Wible, Cynthia G.; Stern, Hal S.; Belger, Aysenil; McCarthy, Gregory; Ozyurt, Burak; Potkin, Steven G.
2015-01-01
The Function Biomedical Informatics Research Network (FBIRN) developed methods and tools for conducting multi-scanner functional magnetic resonance imaging (fMRI) studies. Method and tool development were based on two major goals: 1) to assess the major sources of variation in fMRI studies conducted across scanners, including instrumentation, acquisition protocols, challenge tasks, and analysis methods, and 2) to provide a distributed network infrastructure and an associated federated database to host and query large, multi-site, fMRI and clinical datasets. In the process of achieving these goals the FBIRN test bed generated several multi-scanner brain imaging data sets to be shared with the wider scientific community via the BIRN Data Repository (BDR). The FBIRN Phase 1 dataset consists of a traveling subject study of 5 healthy subjects, each scanned on 10 different 1.5 to 4 Tesla scanners. The FBIRN Phase 2 and Phase 3 datasets consist of subjects with schizophrenia or schizoaffective disorder along with healthy comparison subjects scanned at multiple sites. In this paper, we provide concise descriptions of FBIRN’s multi-scanner brain imaging data sets and details about the BIRN Data Repository instance of the Human Imaging Database (HID) used to publicly share the data. PMID:26364863
Dizeez: An Online Game for Human Gene-Disease Annotation
Loguercio, Salvatore; Good, Benjamin M.; Su, Andrew I.
2013-01-01
Structured gene annotations are a foundation upon which many bioinformatics and statistical analyses are built. However the structured annotations available in public databases are a sparse representation of biological knowledge as a whole. The rate of biomedical data generation is such that centralized biocuration efforts struggle to keep up. New models for gene annotation need to be explored that expand the pace at which we are able to structure biomedical knowledge. Recently, online games have emerged as an effective way to recruit, engage and organize large numbers of volunteers to help address difficult biological challenges. For example, games have been successfully developed for protein folding (Foldit), multiple sequence alignment (Phylo) and RNA structure design (EteRNA). Here we present Dizeez, a simple online game built with the purpose of structuring knowledge of gene-disease associations. Preliminary results from game play online and at scientific conferences suggest that Dizeez is producing valid gene-disease annotations not yet present in any public database. These early results provide a basic proof of principle that online games can be successfully applied to the challenge of gene annotation. Dizeez is available at http://genegames.org. PMID:23951102
Guidelines for the Effective Use of Entity-Attribute-Value Modeling for Biomedical Databases
Dinu, Valentin; Nadkarni, Prakash
2007-01-01
Purpose To introduce the goals of EAV database modeling, to describe the situations where Entity-Attribute-Value (EAV) modeling is a useful alternative to conventional relational methods of database modeling, and to describe the fine points of implementation in production systems. Methods We analyze the following circumstances: 1) data are sparse and have a large number of applicable attributes, but only a small fraction will apply to a given entity; 2) numerous classes of data need to be represented, each class has a limited number of attributes, but the number of instances of each class is very small. We also consider situations calling for a mixed approach where both conventional and EAV design are used for appropriate data classes. Results and Conclusions In robust production systems, EAV-modeled databases trade a modest data sub-schema for a complex metadata sub-schema. The need to design the metadata effectively makes EAV design potentially more challenging than conventional design. PMID:17098467
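The sparse-data scenario the abstract describes can be sketched with a minimal EAV schema. This is a hypothetical illustration in SQLite, not the authors' production design: one narrow table holds all sparse clinical facts, and a small metadata table records which attributes are valid, standing in for the "complex metadata sub-schema" the abstract mentions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attribute_meta (attribute TEXT PRIMARY KEY, datatype TEXT);
CREATE TABLE eav (entity_id INTEGER, attribute TEXT REFERENCES attribute_meta, value TEXT);
""")
conn.executemany("INSERT INTO attribute_meta VALUES (?, ?)",
                 [("heart_rate", "int"), ("diagnosis", "text")])
# Patient 1 has two recorded attributes; patient 2 only one. No NULL-filled
# wide rows are needed for the attributes that do not apply to an entity.
conn.executemany("INSERT INTO eav VALUES (?, ?, ?)",
                 [(1, "heart_rate", "72"), (1, "diagnosis", "asthma"),
                  (2, "heart_rate", "88")])
rows = conn.execute(
    "SELECT attribute, value FROM eav WHERE entity_id = 1 ORDER BY attribute"
).fetchall()
print(rows)  # [('diagnosis', 'asthma'), ('heart_rate', '72')]
```

The trade-off the abstract names is visible here: adding a new clinical attribute is a single `attribute_meta` row rather than a schema change, but every query must reassemble an entity from key-value rows.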
GenBank
Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Wheeler, David L
2007-01-01
GenBank (R) is a comprehensive database that contains publicly available nucleotide sequences for more than 240 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage (www.ncbi.nlm.nih.gov).
GenBank
Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Wheeler, David L
2005-01-01
GenBank is a comprehensive database that contains publicly available DNA sequences for more than 165,000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in the UK and the DNA Data Bank of Japan helps to ensure worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, go to the NCBI Homepage at http://www.ncbi.nlm.nih.gov.
GenBank
Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Wheeler, David L
2006-01-01
GenBank (R) is a comprehensive database that contains publicly available DNA sequences for more than 205 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the Web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, go to the NCBI Homepage at www.ncbi.nlm.nih.gov.
Clique-based data mining for related genes in a biomedical database.
Matsunaga, Tsutomu; Yonemori, Chikara; Tomita, Etsuji; Muramatsu, Masaaki
2009-07-01
Progress in the life sciences cannot be made without integrating biomedical knowledge on numerous genes in order to help formulate hypotheses on the genetic mechanisms behind various biological phenomena, including diseases. There is thus a strong need for a way to automatically and comprehensively search from biomedical databases for related genes, such as genes in the same families and genes encoding components of the same pathways. Here we address the extraction of related genes by searching for densely-connected subgraphs, which are modeled as cliques, in a biomedical relational graph. We constructed a graph whose nodes were gene or disease pages, and edges were the hyperlink connections between those pages in the Online Mendelian Inheritance in Man (OMIM) database. We obtained over 20,000 sets of related genes (called 'gene modules') by enumerating cliques computationally. The modules included genes in the same family, genes for proteins that form a complex, and genes for components of the same signaling pathway. The results of experiments using 'metabolic syndrome'-related gene modules show that the gene modules can be used to get a coherent holistic picture helpful for interpreting relations among genes. We presented a data mining approach extracting related genes by enumerating cliques. The extracted gene sets provide a holistic picture useful for comprehending complex disease mechanisms.
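The clique-enumeration approach described above can be sketched as follows. The gene/disease graph here is invented for illustration (it is not OMIM data), and the recursion is the textbook Bron-Kerbosch algorithm, not the authors' implementation: nodes stand for gene or disease pages, edges for hyperlinks between them, and each maximal clique is reported as a candidate "gene module".

```python
def bron_kerbosch(r, p, x, adj, out):
    """Enumerate maximal cliques: r = current clique, p = candidates, x = excluded."""
    if not p and not x:
        out.append(sorted(r))
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p.remove(v)
        x.add(v)

# Illustrative hyperlink graph: an insulin-signaling triangle plus two pairs.
edges = [("INS", "INSR"), ("INS", "IRS1"), ("INSR", "IRS1"),
         ("IRS1", "T2DM"), ("LEP", "LEPR")]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

modules = []
bron_kerbosch(set(), set(adj), set(), adj, modules)
print(sorted(modules))
# [['INS', 'INSR', 'IRS1'], ['IRS1', 'T2DM'], ['LEP', 'LEPR']]
```

The densely connected triangle comes out as one module while the non-maximal pairs inside it are suppressed, which is the property that makes cliques a natural model for "related genes" here.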
GenBank
Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Sayers, Eric W
2010-01-01
GenBank is a comprehensive database that contains publicly available nucleotide sequences for more than 300,000 organisms named at the genus level or lower, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Molecular Biology Laboratory Nucleotide Sequence Database in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bi-monthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI homepage: www.ncbi.nlm.nih.gov.
GenBank
Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Sayers, Eric W
2009-01-01
GenBank is a comprehensive database that contains publicly available nucleotide sequences for more than 300,000 organisms named at the genus level or lower, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank(R) staff upon receipt. Daily data exchange with the European Molecular Biology Laboratory Nucleotide Sequence Database in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the National Center for Biotechnology Information (NCBI) Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage: www.ncbi.nlm.nih.gov.
Large Dosage of Chishao in Formulae for Cholestatic Hepatitis: A Systematic Review and Meta-Analysis
Ma, Xiao; Wang, Ji; He, Xuan; Zhao, Yanling; Wang, Jiabo; Zhang, Ping; Zhu, Yun; Zhong, Lin; Zheng, Quanfu; Xiao, Xiaohe
2014-01-01
Objective. To evaluate the efficacy and safety of large dosages of Chishao in formulae for the treatment of cholestatic hepatitis. Methods. The major databases (PubMed, Embase, Cochrane Library, Chinese Biomedical Database, Wanfang, VIP medicine information system, and China National Knowledge Infrastructure) were searched up to January 2014. Randomized controlled trials (RCTs) of large dosages of Chishao in formulae for the treatment of cholestatic hepatitis that reported the total efficacy rate, together with the biochemical indices alanine aminotransferase (ALT), aspartate aminotransferase (AST), total bilirubin (TBIL), and direct bilirubin (DBIL), were extracted by two reviewers. The Cochrane tool was used to assess the risk of bias of the included trials. Data were analyzed with RevMan 5.2.7 software. Results. 11 RCTs involving 1275 subjects with cholestatic hepatitis were included. Compared with essential therapy, large dosages of Chishao in formulae were more effective, with downregulation of serum ALT, AST, TBIL, and DBIL, and no obvious adverse events were observed. Conclusion. As a promising treatment approach, wide use of large dosages of Chishao in formulae may enhance curative efficacy for cholestatic hepatitis. Given its growing acceptance among practitioners, further rigorously designed clinical studies are required. PMID:24987427
Whetzel, Patricia L.; Grethe, Jeffrey S.; Banks, Davis E.; Martone, Maryann E.
2015-01-01
The NIDDK Information Network (dkNET; http://dknet.org) was launched to serve the needs of basic and clinical investigators in metabolic, digestive and kidney disease by facilitating access to research resources that advance the mission of the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). By research resources, we mean the multitude of data, software tools, materials, services, projects and organizations available to researchers in the public domain. Most of these are accessed via web-accessible databases or web portals, each developed, designed and maintained by numerous different projects, organizations and individuals. While many of the large government-funded databases, maintained by agencies such as the European Bioinformatics Institute and the National Center for Biotechnology Information, are well known to researchers, many more that have been developed by and for the biomedical research community are unknown or underutilized. At least part of the problem is the nature of dynamic databases, which are considered part of the “hidden” web, that is, content that is not easily accessed by search engines. dkNET was created specifically to address the challenge of connecting researchers to research resources via these types of community databases and web portals. dkNET functions as a “search engine for data”, searching across millions of database records contained in hundreds of biomedical databases developed and maintained by independent projects around the world. A primary focus of dkNET is centers and projects specifically created to provide high quality data and resources to NIDDK researchers. Through the novel data ingest process used in dkNET, additional data sources can easily be incorporated, allowing it to scale with the growth of digital data and the needs of the dkNET community. Here, we provide an overview of the dkNET portal and its functions. We show how dkNET can be used to address a variety of use cases that involve searching for research resources. PMID:26393351
Architecture for biomedical multimedia information delivery on the World Wide Web
NASA Astrophysics Data System (ADS)
Long, L. Rodney; Goh, Gin-Hua; Neve, Leif; Thoma, George R.
1997-10-01
Research engineers at the National Library of Medicine are building a prototype system for the delivery of multimedia biomedical information on the World Wide Web. This paper discusses the architecture and design considerations for the system, which will be used initially to make images and text from the third National Health and Nutrition Examination Survey (NHANES) publicly available. We categorized our analysis as follows: (1) fundamental software tools: we analyzed trade-offs among use of conventional HTML/CGI, X Window Broadway, and Java; (2) image delivery: we examined the use of unconventional TCP transmission methods; (3) database manager and database design: we discuss the capabilities and planned use of the Informix object-relational database manager and the planned schema for the NHANES database; (4) storage requirements for our Sun server; (5) user interface considerations; (6) the compatibility of the system with other standard research and analysis tools; (7) image display: we discuss considerations for consistent image display for end users. Finally, we discuss the scalability of the system in terms of incorporating larger or more databases of similar data, and the extensibility of the system for supporting content-based retrieval of biomedical images. The system prototype is called the Web-based Medical Information Retrieval System. An early version was built as a Java applet and tested on Unix, PC, and Macintosh platforms. This prototype used the MiniSQL database manager to do text queries on a small database of records of participants in the second NHANES survey. The full records and associated x-ray images were retrievable and displayable on a standard Web browser. A second version has now been built, also a Java applet, using the MySQL database manager.
A Bioinformatics Module for Use in an Introductory Biology Laboratory
ERIC Educational Resources Information Center
Alaie, Adrienne; Teller, Virginia; Qiu, Wei-gang
2012-01-01
Since biomedical science has become increasingly data-intensive, acquisition of computational and quantitative skills by science students has become more important. For non-science students, an introduction to biomedical databases and their applications promotes the development of a scientifically literate population. Because typical college…
Big Biomedical data as the key resource for discovery science
DOE Office of Scientific and Technical Information (OSTI.GOV)
Toga, Arthur W.; Foster, Ian; Kesselman, Carl
Modern biomedical data collection is generating exponentially more data in a multitude of formats. This flood of complex data poses significant opportunities to discover and understand the critical interplay among such diverse domains as genomics, proteomics, metabolomics, and phenomics, including imaging, biometrics, and clinical data. The Big Data for Discovery Science Center is taking an “-ome to home” approach to discover linkages between these disparate data sources by mining existing databases of proteomic and genomic data, brain images, and clinical assessments. In support of this work, the authors developed new technological capabilities that make it easy for researchers to manage, aggregate, manipulate, integrate, and model large amounts of distributed data. Guided by biological domain expertise, the Center’s computational resources and software will reveal relationships and patterns, aiding researchers in identifying biomarkers for the most confounding conditions and diseases, such as Parkinson’s and Alzheimer’s.
Big biomedical data as the key resource for discovery science
Toga, Arthur W; Foster, Ian; Kesselman, Carl; Madduri, Ravi; Chard, Kyle; Deutsch, Eric W; Price, Nathan D; Glusman, Gustavo; Heavner, Benjamin D; Dinov, Ivo D; Ames, Joseph; Van Horn, John; Kramer, Roger; Hood, Leroy
2015-01-01
Modern biomedical data collection is generating exponentially more data in a multitude of formats. This flood of complex data poses significant opportunities to discover and understand the critical interplay among such diverse domains as genomics, proteomics, metabolomics, and phenomics, including imaging, biometrics, and clinical data. The Big Data for Discovery Science Center is taking an “-ome to home” approach to discover linkages between these disparate data sources by mining existing databases of proteomic and genomic data, brain images, and clinical assessments. In support of this work, the authors developed new technological capabilities that make it easy for researchers to manage, aggregate, manipulate, integrate, and model large amounts of distributed data. Guided by biological domain expertise, the Center’s computational resources and software will reveal relationships and patterns, aiding researchers in identifying biomarkers for the most confounding conditions and diseases, such as Parkinson’s and Alzheimer’s. PMID:26198305
Aggregated Indexing of Biomedical Time Series Data
Woodbridge, Jonathan; Mortazavi, Bobak; Sarrafzadeh, Majid; Bui, Alex A.T.
2016-01-01
Remote and wearable medical sensing has the potential to create very large and high dimensional datasets. Medical time series databases must be able to efficiently store, index, and mine these datasets to enable medical professionals to effectively analyze data collected from their patients. Conventional high dimensional indexing methods are a two stage process. First, a superset of the true matches is efficiently extracted from the database. Second, supersets are pruned by comparing each of their objects to the query object and rejecting any objects falling outside a predetermined radius. This pruning stage heavily dominates the computational complexity of most conventional search algorithms. Therefore, indexing algorithms can be significantly improved by reducing the amount of pruning. This paper presents an online algorithm to aggregate biomedical time series data to significantly reduce the search space (index size) without compromising the quality of search results. This algorithm is built on the observation that biomedical time series signals are composed of cyclical and often similar patterns. The algorithm takes in a stream of segments and groups them into highly concentrated collections. Locality Sensitive Hashing (LSH) is used to reduce the overall complexity of the algorithm, allowing it to run online. The output of this aggregation is used to populate an index. The proposed algorithm yields logarithmic growth of the index (with respect to the total number of objects) while keeping sensitivity and specificity simultaneously above 98%. Both memory and runtime complexities of time series search are improved when using aggregated indexes. In addition, data mining tasks, such as clustering, exhibit runtimes that are orders of magnitude faster when run on aggregated indexes. PMID:27617298
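The aggregation idea the abstract describes can be sketched as follows, with illustrative parameters (segment length, number of hyperplanes) and a simple sign-random-projection hash standing in for the paper's LSH scheme: each fixed-length segment is hashed so that similar cyclical segments collide into the same bucket, and each bucket contributes a single representative (its mean) to the index instead of every raw segment.

```python
import random

random.seed(0)
DIM, N_PLANES = 8, 4  # segment length and number of random hyperplanes (illustrative)
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_PLANES)]

def lsh_key(segment):
    # One bit per hyperplane: the sign of the dot product with the segment.
    return tuple(int(sum(p * x for p, x in zip(plane, segment)) >= 0)
                 for plane in planes)

def aggregate(segments):
    # Group segments by hash key, then emit one mean representative per bucket.
    buckets = {}
    for seg in segments:
        buckets.setdefault(lsh_key(seg), []).append(seg)
    return [[sum(col) / len(col) for col in zip(*members)]
            for members in buckets.values()]

base = [1.0, 2.0, 1.0, 0.0, -1.0, -2.0, -1.0, 0.0]   # one "heartbeat"-like cycle
noisy = [x + 0.05 for x in base]                      # a slightly perturbed repeat
index = aggregate([base, noisy, base])
print(len(index))  # at most 2: identical cycles always share a bucket
```

This captures only the grouping step; the paper's full pipeline also streams segments online and searches the aggregated index, which this sketch does not attempt.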
USDA-ARS?s Scientific Manuscript database
The use of swine in biomedical research has increased dramatically in the last decade. Diverse genomic- and proteomic databases have been developed to facilitate research using human and rodent models. Current porcine gene databases, however, lack the robust annotation to study pig models that are...
Blockchain distributed ledger technologies for biomedical and health care applications.
Kuo, Tsung-Ting; Kim, Hyeon-Eui; Ohno-Machado, Lucila
2017-11-01
Objectives: To introduce blockchain technologies, including their benefits, pitfalls, and the latest applications, to the biomedical and health care domains. Target audience: Biomedical and health care informatics researchers who would like to learn about blockchain technologies and their applications in the biomedical/health care domains. Scope: The covered topics include: (1) introduction to the famous Bitcoin crypto-currency and the underlying blockchain technology; (2) features of blockchain; (3) review of alternative blockchain technologies; (4) emerging nonfinancial distributed ledger technologies and applications; (5) benefits of blockchain for biomedical/health care applications when compared to traditional distributed databases; (6) overview of the latest biomedical/health care applications of blockchain technologies; and (7) discussion of the potential challenges and proposed solutions of adopting blockchain technologies in biomedical/health care domains. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association.
Cabello C, Felipe
2011-09-01
The National Library of Medicine (NLM) of the United States of America, celebrates in 2011 its 175th anniversary. This Library, the largest biomedical library in the world, has a proud and rich history serving the health community and the public, especially since its transfer to the National Institutes of Health in Bethesda, Maryland, in 1968. It holds 17 million publications in 150 languages, and has an important collection of ancient and modern historical books as well as original publications of Vesalius and other founders of biomedicine. Its modern document collections illustrate the progress of medical sciences. These collections include laboratory notes from many scientists whose work forms the foundations of contemporary life sciences. The Library also provides several services for health research and for the public, including databases and services such as MedLine and BLAST. The NLM constantly strives to fulfill the information needs of its customers, whether scientists or the public at large. For example, as the Hispanic population of the Unites States has increased in recent years, the NLM has made larger and larger amounts of data available in Spanish to fulfill the health information needs of this population. NLM programs train professionals in library science and biomedical informatics and link biomedical libraries of 18 academic centers throughout the United States. The NLM funds competitive grants for training at the Library, organizing short instruction courses about library science and informatics, and writing books on health related matters including the history of medicine and public health. The NLM is managed and maintained by an outstanding and farsighted group of professionals and dedicated support staff. Their focus on serving and reaching both the biomedical community and the public at large has been crucial to its development into a world icon of biomedical sciences, information technology and the humanities.
[Application of the life sciences platform based on oracle to biomedical informations].
Zhao, Zhi-Yun; Li, Tai-Huan; Yang, Hong-Qiao
2008-03-01
The life sciences platform based on Oracle database technology is introduced in this paper. By providing powerful data access, integrating a variety of data types, and managing vast quantities of data, the software presents a flexible, safe and scalable management platform for biomedical data processing.
Constructing a Graph Database for Semantic Literature-Based Discovery.
Hristovski, Dimitar; Kastrin, Andrej; Dinevski, Dejan; Rindflesch, Thomas C
2015-01-01
Literature-based discovery (LBD) generates discoveries, or hypotheses, by combining what is already known in the literature. Potential discoveries have the form of relations between biomedical concepts; for example, a drug may be determined to treat a disease other than the one for which it was intended. LBD views the knowledge in a domain as a network: a set of concepts along with the relations between them. As a starting point, we used SemMedDB, a database of semantic relations between biomedical concepts extracted with SemRep from Medline. SemMedDB is distributed as a MySQL relational database, which has some problems when dealing with network data. We transformed and uploaded SemMedDB into the Neo4j graph database, and implemented the basic LBD discovery algorithms with the Cypher query language. We conclude that storing the data needed for semantic LBD is more natural in a graph database. Also, implementing LBD discovery algorithms is conceptually simpler with a graph query language than with standard SQL.
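The "open discovery" step the abstract describes, known A-B and B-C relations suggesting a candidate A-C link, can be sketched over a plain adjacency map. The triples below are a hypothetical rendering of Swanson's classic fish-oil/Raynaud example, not actual SemMedDB content.

```python
# Minimal sketch of literature-based discovery's "open discovery" step:
# known A-B and B-C relations suggest candidate A-C links (hypothetical data).
from collections import defaultdict

relations = [  # (subject, predicate, object) triples, SemMedDB-style
    ("fish_oil", "AFFECTS", "blood_viscosity"),
    ("blood_viscosity", "ASSOCIATED_WITH", "raynaud_disease"),
    ("fish_oil", "TREATS", "hyperlipidemia"),
]

graph = defaultdict(set)
for subj, _pred, obj in relations:
    graph[subj].add(obj)

def open_discovery(start):
    """Return C concepts reachable via an intermediate B, excluding known links."""
    candidates = set()
    for b in set(graph[start]):
        for c in graph[b]:
            if c != start and c not in graph[start]:
                candidates.add(c)
    return candidates

print(open_discovery("fish_oil"))  # -> {'raynaud_disease'}
```

The same traversal is a one-line pattern match in a graph query language, which is the paper's point about Cypher versus multi-way SQL self-joins.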
The BioMart community portal: an innovative alternative to large, centralized data repositories
Smedley, Damian; Haider, Syed; Durinck, Steffen; Pandini, Luca; Provero, Paolo; Allen, James; Arnaiz, Olivier; Awedh, Mohammad Hamza; Baldock, Richard; Barbiera, Giulia; Bardou, Philippe; Beck, Tim; Blake, Andrew; Bonierbale, Merideth; Brookes, Anthony J.; Bucci, Gabriele; Buetti, Iwan; Burge, Sarah; Cabau, Cédric; Carlson, Joseph W.; Chelala, Claude; Chrysostomou, Charalambos; Cittaro, Davide; Collin, Olivier; Cordova, Raul; Cutts, Rosalind J.; Dassi, Erik; Genova, Alex Di; Djari, Anis; Esposito, Anthony; Estrella, Heather; Eyras, Eduardo; Fernandez-Banet, Julio; Forbes, Simon; Free, Robert C.; Fujisawa, Takatomo; Gadaleta, Emanuela; Garcia-Manteiga, Jose M.; Goodstein, David; Gray, Kristian; Guerra-Assunção, José Afonso; Haggarty, Bernard; Han, Dong-Jin; Han, Byung Woo; Harris, Todd; Harshbarger, Jayson; Hastings, Robert K.; Hayes, Richard D.; Hoede, Claire; Hu, Shen; Hu, Zhi-Liang; Hutchins, Lucie; Kan, Zhengyan; Kawaji, Hideya; Keliet, Aminah; Kerhornou, Arnaud; Kim, Sunghoon; Kinsella, Rhoda; Klopp, Christophe; Kong, Lei; Lawson, Daniel; Lazarevic, Dejan; Lee, Ji-Hyun; Letellier, Thomas; Li, Chuan-Yun; Lio, Pietro; Liu, Chu-Jun; Luo, Jie; Maass, Alejandro; Mariette, Jerome; Maurel, Thomas; Merella, Stefania; Mohamed, Azza Mostafa; Moreews, Francois; Nabihoudine, Ibounyamine; Ndegwa, Nelson; Noirot, Céline; Perez-Llamas, Cristian; Primig, Michael; Quattrone, Alessandro; Quesneville, Hadi; Rambaldi, Davide; Reecy, James; Riba, Michela; Rosanoff, Steven; Saddiq, Amna Ali; Salas, Elisa; Sallou, Olivier; Shepherd, Rebecca; Simon, Reinhard; Sperling, Linda; Spooner, William; Staines, Daniel M.; Steinbach, Delphine; Stone, Kevin; Stupka, Elia; Teague, Jon W.; Dayem Ullah, Abu Z.; Wang, Jun; Ware, Doreen; Wong-Erasmus, Marie; Youens-Clark, Ken; Zadissa, Amonida; Zhang, Shi-Jian; Kasprzyk, Arek
2015-01-01
The BioMart Community Portal (www.biomart.org) is a community-driven effort to provide a unified interface to biomedical databases that are distributed worldwide. The portal provides access to numerous database projects supported by 30 scientific organizations. It includes over 800 different biological datasets spanning genomics, proteomics, model organisms, cancer data, ontology information and more. All resources available through the portal are independently administered and funded by their host organizations. The BioMart data federation technology provides a unified interface to all the available data. The latest version of the portal comes with many new databases that have been created by our ever-growing community. It also comes with better support and extensibility for data analysis and visualization tools. A new addition to our toolbox, an enrichment analysis tool, is now accessible through graphical and web service interfaces. The BioMart Community Portal averages over one million requests per day. Building on this level of service and the wealth of information that has become available, the BioMart Community Portal has introduced a new, more scalable and cheaper alternative to the large data stores maintained by specialized organizations. PMID:25897122
Richardson, J; Smith, J E; McCall, G; Richardson, A; Pilkington, K; Kirsch, I
2007-09-01
To systematically review the research evidence on the effectiveness of hypnosis for cancer chemotherapy-induced nausea and vomiting (CINV). A comprehensive search of major biomedical databases including MEDLINE, EMBASE, CINAHL, PsycINFO and the Cochrane Library was conducted. Specialist complementary and alternative medicine databases were searched and efforts were made to identify unpublished and ongoing research. Citations were included from the databases' inception to March 2005. Randomized controlled trials (RCTs) were appraised and meta-analysis undertaken. Clinical commentaries were obtained. Six RCTs evaluating the effectiveness of hypnosis in CINV were found. In five of these studies the participants were children. Studies report positive results, including statistically significant reductions in both anticipatory nausea and vomiting and CINV. Meta-analysis revealed a large effect size of hypnotic treatment when compared with treatment as usual, and the effect was at least as large as that of cognitive-behavioural therapy. Meta-analysis has demonstrated that hypnosis could be a clinically valuable intervention for anticipatory nausea and vomiting and CINV in children with cancer. Further research into the effectiveness, acceptance and feasibility of hypnosis in CINV, particularly in adults, is suggested. Future studies should assess suggestibility and provide full details of the hypnotic intervention.
The Function Biomedical Informatics Research Network Data Repository.
Keator, David B; van Erp, Theo G M; Turner, Jessica A; Glover, Gary H; Mueller, Bryon A; Liu, Thomas T; Voyvodic, James T; Rasmussen, Jerod; Calhoun, Vince D; Lee, Hyo Jong; Toga, Arthur W; McEwen, Sarah; Ford, Judith M; Mathalon, Daniel H; Diaz, Michele; O'Leary, Daniel S; Jeremy Bockholt, H; Gadde, Syam; Preda, Adrian; Wible, Cynthia G; Stern, Hal S; Belger, Aysenil; McCarthy, Gregory; Ozyurt, Burak; Potkin, Steven G
2016-01-01
The Function Biomedical Informatics Research Network (FBIRN) developed methods and tools for conducting multi-scanner functional magnetic resonance imaging (fMRI) studies. Method and tool development were based on two major goals: 1) to assess the major sources of variation in fMRI studies conducted across scanners, including instrumentation, acquisition protocols, challenge tasks, and analysis methods, and 2) to provide a distributed network infrastructure and an associated federated database to host and query large, multi-site, fMRI and clinical data sets. In the process of achieving these goals the FBIRN test bed generated several multi-scanner brain imaging data sets to be shared with the wider scientific community via the BIRN Data Repository (BDR). The FBIRN Phase 1 data set consists of a traveling subject study of 5 healthy subjects, each scanned on 10 different 1.5 to 4 T scanners. The FBIRN Phase 2 and Phase 3 data sets consist of subjects with schizophrenia or schizoaffective disorder along with healthy comparison subjects scanned at multiple sites. In this paper, we provide concise descriptions of FBIRN's multi-scanner brain imaging data sets and details about the BIRN Data Repository instance of the Human Imaging Database (HID) used to publicly share the data. Copyright © 2015 Elsevier Inc. All rights reserved.
Predicting structured metadata from unstructured metadata.
Posch, Lisa; Panahiazar, Maryam; Dumontier, Michel; Gevaert, Olivier
2016-01-01
Enormous amounts of biomedical data have been and are being produced by investigators all over the world. However, one crucial and limiting factor in data reuse is an accurate, structured and complete description of the data, or data about the data, defined as metadata. We propose a framework to predict structured metadata terms from unstructured metadata, improving the quality and quantity of metadata, using the Gene Expression Omnibus (GEO) microarray database. Our framework consists of classifiers trained using term frequency-inverse document frequency (TF-IDF) features and a second approach based on topics modeled using a Latent Dirichlet Allocation (LDA) model to reduce the dimensionality of the unstructured data. Our results on the GEO database show that structured metadata terms are most accurately predicted using the TF-IDF approach, followed by LDA, with both outperforming the majority-vote baseline. While some accuracy is lost through the dimensionality reduction of LDA, the difference is small for elements with few possible values, and there is a large improvement over the majority-classifier baseline. Overall this is a promising approach for metadata prediction that is likely to be applicable to other datasets, with implications for researchers interested in biomedical metadata curation and metadata prediction. © The Author(s) 2016. Published by Oxford University Press.
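The TF-IDF approach described in the abstract can be sketched with scikit-learn: vectorize free-text sample descriptions and train a classifier to predict a structured metadata field. The sample texts, the "organism" target field, and the choice of logistic regression are illustrative assumptions, not the authors' actual setup.

```python
# Hedged sketch: predict a structured metadata term (organism) from
# unstructured sample descriptions using TF-IDF features (invented data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

unstructured = [
    "RNA extracted from liver tissue of adult mouse",
    "human blood sample, whole genome expression",
    "mouse liver biopsy, microarray profiling",
    "expression profiling of human peripheral blood",
]
organism = ["mouse", "human", "mouse", "human"]  # structured term to predict

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(unstructured, organism)
print(model.predict(["liver tissue from a mouse model"]))  # -> ['mouse']
```

The LDA variant the paper compares against would replace the TF-IDF features with per-document topic proportions before classification, trading some accuracy for a much smaller feature space.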
NASA Technical Reports Server (NTRS)
Charles, John B.; Richard, Elizabeth E.
2010-01-01
There is currently too little reproducible data for a scientifically valid understanding of the initial responses of a diverse human population to weightlessness and other space flight factors. Astronauts on orbital space flights to date have been extremely healthy and fit, unlike the general human population. Data collection opportunities during the earliest phases of space flights to date, when the most dynamic responses to abrupt transitions in acceleration loads may occur, have been limited by operational restrictions on our ability to encumber the astronauts with even minimal monitoring instrumentation. The era of commercial personal suborbital space flights promises the availability of a large (perhaps hundreds per year), diverse population of potential participants with a vested interest in their own responses to space flight factors, and a number of flight providers interested in documenting and demonstrating the attractiveness and safety of the experience they are offering. Voluntary participation by even a fraction of the flying population in a uniform set of unobtrusive biomedical data collections would provide a database enabling statistical analyses of a variety of acute responses to a standardized space flight environment. This will benefit both the space life sciences discipline and the general state of human knowledge.
Semantic Web repositories for genomics data using the eXframe platform.
Merrill, Emily; Corlosquet, Stéphane; Ciccarese, Paolo; Clark, Tim; Das, Sudeshna
2014-01-01
With the advent of inexpensive assay technologies, there has been unprecedented growth in genomics data as well as in the number of databases in which it is stored. In these databases, sample annotation using ontologies and controlled vocabularies is becoming more common. However, the annotation is rarely available as Linked Data, in a machine-readable format, or for standardized queries using SPARQL. This makes large-scale reuse, or integration with other knowledge bases, very difficult. To address this challenge, we have developed the second generation of our eXframe platform, a reusable framework for creating online repositories of genomics experiments. This second-generation model now publishes Semantic Web data. To accomplish this, we created an experiment model that covers provenance, citations, external links, assays, biomaterials used in the experiment, and the data collected during the process. The elements of our model are mapped to classes and properties from various established biomedical ontologies. Resource Description Framework (RDF) data is automatically produced using these mappings and indexed in an RDF store with a built-in SPARQL (SPARQL Protocol and RDF Query Language) endpoint. Using the open-source eXframe software, institutions and laboratories can create Semantic Web repositories of their experiments, integrate them with heterogeneous resources and make them interoperable with the vast Semantic Web of biomedical knowledge.
Effects of technical editing in biomedical journals: a systematic review.
Wager, Elizabeth; Middleton, Philippa
2002-06-05
Technical editing supposedly improves the accuracy and clarity of journal articles. We examined evidence of its effects on research reports in biomedical journals. Subset of a systematic review using Cochrane methods, searching MEDLINE, EMBASE, and other databases from earliest entries to February 2000 using inclusive search terms, and hand searching relevant journals. We selected comparative studies of the effects of editorial processes on original research articles between acceptance and publication in biomedical journals. Two reviewers assessed each study and performed independent data extraction. The 11 studies on technical editing indicate that it improves the readability of articles slightly (as measured by Gunning Fog and Flesch reading ease scores), may improve other aspects of their quality, can increase the accuracy of references and quotations, and raises the quality of abstracts. Supplying authors with abstract preparation instructions had no discernible effect. Considering the time and resources devoted to technical editing, remarkably little is known about its effects or the effects of imposing different house styles. Studies performed at 3 journals employing relatively large numbers of professional technical editors suggest that their editorial processes are associated with increases in readability and quality of articles, but these findings may not be generalizable to other journals.
DEXTER: Disease-Expression Relation Extraction from Text.
Gupta, Samir; Dingerdissen, Hayley; Ross, Karen E; Hu, Yu; Wu, Cathy H; Mazumder, Raja; Vijay-Shanker, K
2018-01-01
Gene expression levels affect biological processes and play a key role in many diseases. Characterizing expression profiles is useful for clinical research and for the diagnosis and prognosis of disease. There are currently several high-quality databases that capture gene expression information, obtained mostly from large-scale studies such as microarray and next-generation sequencing technologies, in the context of disease. The scientific literature is another rich source of information on gene expression-disease relationships, captured not only in large-scale studies but also in thousands of small-scale studies. Expression information obtained from literature through manual curation can extend expression databases. While many of the existing databases include information from literature, they are limited by the time-consuming nature of manual curation and have difficulty keeping up with the explosion of publications in the biomedical field. In this work, we describe an automated text-mining tool, Disease-Expression Relation Extraction from Text (DEXTER), to extract information from the literature on gene and microRNA expression in the context of disease. One of the motivations in developing DEXTER was to extend the BioXpress database, a cancer-focused gene expression database that includes data derived from large-scale experiments and manual curation of publications. The literature-based portion of BioXpress lags significantly behind the expression information obtained from large-scale studies and can benefit from our text-mined results. We have conducted two different evaluations to measure the accuracy of our text-mining tool and achieved average F-scores of 88.51% and 81.81% for the two evaluations, respectively.
Also, to demonstrate the ability to extract rich expression information in different disease-related scenarios, we used DEXTER to extract differential expression information for 2024 genes in lung cancer, 115 glycosyltransferases in 62 cancers and 826 microRNAs in 171 cancers. All extractions using DEXTER are integrated in the literature-based portion of BioXpress. Database URL: http://biotm.cis.udel.edu/DEXTER.
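The core task, pulling gene-direction-disease triples out of sentences, can be illustrated with a toy pattern matcher. DEXTER's actual pipeline is far more sophisticated; the regular expression and example sentence here are illustrative assumptions only.

```python
# Toy sketch of extracting (gene, direction, disease) relations from text.
import re

PATTERN = re.compile(
    r"(?P<gene>[A-Z0-9]{2,})\s+(?:is|was)\s+"
    r"(?P<direction>over|under)expressed\s+in\s+(?P<disease>[\w\s]+)"
)

def extract(sentence):
    """Return a (gene, direction, disease) triple, or None if no match."""
    m = PATTERN.search(sentence)
    if not m:
        return None
    return (m.group("gene"), m.group("direction"), m.group("disease").strip())

print(extract("EGFR is overexpressed in lung cancer"))
# -> ('EGFR', 'over', 'lung cancer')
```

Real systems replace the single regex with named-entity recognition for genes and diseases plus syntactic analysis of the connecting phrase, which is what makes F-scores in the 80s achievable across varied sentence forms.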
Wheeler, David
2007-01-01
GenBank(R) is a comprehensive database of publicly available DNA sequences for more than 205,000 named organisms and for more than 60,000 within the Embryophyta, obtained through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Daily data exchange with the European Molecular Biology Laboratory (EMBL) in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the National Center for Biotechnology Information (NCBI) retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases with taxonomy, genome, mapping, protein structure, and domain information and the biomedical journal literature through PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available through FTP. GenBank usage scenarios ranging from local analyses of the data available through FTP to online analyses supported by the NCBI Web-based tools are discussed. To access GenBank and its related retrieval and analysis services, go to the NCBI Homepage at http://www.ncbi.nlm.nih.gov.
Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Sayers, Eric W
2011-01-01
GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 380,000 organisms named at the genus level or lower, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system that integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage: www.ncbi.nlm.nih.gov.
PubMed and beyond: a survey of web tools for searching biomedical literature
Lu, Zhiyong
2011-01-01
The past decade has witnessed the modern advances of high-throughput technology and rapid growth of research capacity in producing large-scale biological data, both of which were concomitant with an exponential growth of biomedical literature. This wealth of scholarly knowledge is of significant importance for researchers in making scientific discoveries and healthcare professionals in managing health-related matters. However, the acquisition of such information is becoming increasingly difficult due to its large volume and rapid growth. In response, the National Center for Biotechnology Information (NCBI) is continuously making changes to its PubMed Web service for improvement. Meanwhile, different entities have devoted themselves to developing Web tools for helping users quickly and efficiently search and retrieve relevant publications. These practices, together with maturity in the field of text mining, have led to an increase in the number and quality of various Web tools that provide comparable literature search service to PubMed. In this study, we review 28 such tools, highlight their respective innovations, compare them to the PubMed system and one another, and discuss directions for future development. Furthermore, we have built a website dedicated to tracking existing systems and future advances in the field of biomedical literature search. Taken together, our work serves information seekers in choosing tools for their needs and service providers and developers in keeping current in the field. Database URL: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/search PMID:21245076
Toward a Bio-Medical Thesaurus: Building the Foundation of the UMLS
Tuttle, Mark S.; Blois, Marsden S.; Erlbaum, Mark S.; Nelson, Stuart J.; Sherertz, David D.
1988-01-01
The Unified Medical Language System (UMLS) is being designed to provide a uniform user interface to heterogeneous machine-readable bio-medical information resources, such as bibliographic databases, genetic databases, expert systems and patient records [1]. Such an interface will have to recognize different ways of saying the same thing, and provide links to ways of saying related things. One way to represent the necessary associations is via a domain thesaurus. As no such thesaurus exists, and because, once built, it will be both sizable and in need of continuous maintenance, its design should include a methodology for building and maintaining it. We propose a methodology, utilizing lexically expanded schema inversion, and a design, called T. Lex, which together form one approach to the problem of defining and building a bio-medical thesaurus. We argue that the semantic locality implicit in such a thesaurus will support model-based reasoning in bio-medicine [2].
Evaluation and Cross-Comparison of Lexical Entities of Biological Interest (LexEBI)
Rebholz-Schuhmann, Dietrich; Kim, Jee-Hyub; Yan, Ying; Dixit, Abhishek; Friteyre, Caroline; Hoehndorf, Robert; Backofen, Rolf; Lewin, Ian
2013-01-01
Motivation: Biomedical entities, their identifiers and names, are essential in the representation of biomedical facts and knowledge. In the same way, the complete set of biomedical and chemical terms, i.e. the biomedical “term space” (the “Lexeome”), forms a key resource for achieving the full integration of the scientific literature with biomedical data resources: any identified named entity can immediately be normalized to the correct database entry. This goal not only requires that we are aware of all existing terms, but would also profit from knowing all their senses and their semantic interpretation (ambiguities, nestedness). Results: This study compiles a resource for lexical terms of biomedical interest in a standard format (called “LexEBI”), determines the overall number of terms, their reuse in different resources and the nestedness of terms. LexEBI comprises references for protein and gene entries and their term variants and chemical entities amongst other terms. In addition, disease terms have been identified from Medline and PubmedCentral and added to LexEBI. Our analysis demonstrates that the baseforms of terms from the different semantic types show little polysemous use. Nonetheless, the term variants of protein and gene names (PGNs) frequently contain species mentions, which should have been avoided according to protein annotation guidelines. Furthermore, the protein and gene entities as well as the chemical entities both comprise enzymes, leading to hierarchical polysemy, and a large portion of PGNs makes reference to a chemical entity. Altogether, according to our analysis based on the Medline distribution, 401,869 unique PGNs in the documents contain a reference to 25,022 chemical entities, 3,125 disease terms or 1,576 species mentions. Conclusion: LexEBI delivers the complete biomedical and chemical Lexeome in a standardized representation (http://www.ebi.ac.uk/Rebholz-srv/LexEBI/).
The resource provides the disease terms as open source content, and fully interlinks terms across resources. PMID:24124474
Featured Article: Genotation: Actionable knowledge for the scientific reader
Willis, Ethan; Sakauye, Mark; Jose, Rony; Chen, Hao; Davis, Robert L
2016-01-01
We present an article viewer application that allows a scientific reader to easily discover and share knowledge by linking genomics-related concepts to knowledge in disparate biomedical databases. High-throughput data streams generated by technical advancements have contributed to scientific knowledge discovery at an unprecedented rate. Biomedical informaticists have created a diverse set of databases to store and retrieve the discovered knowledge. The diversity and abundance of such resources present biomedical researchers with a challenge in knowledge discovery. These challenges highlight the need for a better informatics solution. We use a text-mining algorithm, Genomine, to identify gene symbols from the text of a journal article. The identified symbols are supplemented with information from the GenoDB knowledgebase. The self-updating GenoDB contains information from the NCBI Gene, ClinVar, MedGen, dbSNP, KEGG, PharmGKB, UniProt, and HUGO Gene databases. The journal viewer is a web application accessible via a web browser. The features described herein are accessible on www.genotation.org. The Genomine algorithm identifies gene symbols with an F-score of 0.65. GenoDB currently contains information regarding 59,905 gene symbols, 5633 drug–gene relationships, 5981 gene–disease relationships, and 713 pathways. This application provides scientific readers with actionable knowledge related to the concepts of a manuscript. The reader will be able to save and share supplements to be visualized in a graphical manner. This provides convenient access to details of complex biological phenomena, enabling biomedical researchers to generate novel hypotheses to further our knowledge of human health. This manuscript presents a novel application that integrates genomic, proteomic, and pharmacogenomic information to supplement the content of a biomedical manuscript and enable readers to automatically discover actionable knowledge. PMID:26900164
Featured Article: Genotation: Actionable knowledge for the scientific reader.
Nagahawatte, Panduka; Willis, Ethan; Sakauye, Mark; Jose, Rony; Chen, Hao; Davis, Robert L
2016-06-01
We present an article viewer application that allows a scientific reader to easily discover and share knowledge by linking genomics-related concepts to knowledge in disparate biomedical databases. High-throughput data streams generated by technical advancements have contributed to scientific knowledge discovery at an unprecedented rate. Biomedical informaticists have created a diverse set of databases to store and retrieve the discovered knowledge. The diversity and abundance of such resources present biomedical researchers with a challenge in knowledge discovery. These challenges highlight the need for a better informatics solution. We use a text-mining algorithm, Genomine, to identify gene symbols from the text of a journal article. The identified symbols are supplemented with information from the GenoDB knowledgebase. The self-updating GenoDB contains information from the NCBI Gene, ClinVar, MedGen, dbSNP, KEGG, PharmGKB, UniProt, and HUGO Gene databases. The journal viewer is a web application accessible via a web browser. The features described herein are accessible on www.genotation.org. The Genomine algorithm identifies gene symbols with an F-score of 0.65. GenoDB currently contains information regarding 59,905 gene symbols, 5633 drug-gene relationships, 5981 gene-disease relationships, and 713 pathways. This application provides scientific readers with actionable knowledge related to the concepts of a manuscript. The reader will be able to save and share supplements to be visualized in a graphical manner. This provides convenient access to details of complex biological phenomena, enabling biomedical researchers to generate novel hypotheses to further our knowledge of human health. This manuscript presents a novel application that integrates genomic, proteomic, and pharmacogenomic information to supplement the content of a biomedical manuscript and enable readers to automatically discover actionable knowledge.
© 2016 by the Society for Experimental Biology and Medicine.
Subramani, Suresh; Kalpana, Raja; Monickaraj, Pankaj Moses; Natarajan, Jeyakumar
2015-04-01
Knowledge of protein-protein interactions (PPIs) and their related pathways is equally important for understanding the biological functions of the living cell. Such information on human proteins is highly desirable for understanding the mechanisms of several diseases, such as cancer, diabetes, and Alzheimer's disease. Because much of that information is buried in the biomedical literature, an automated text-mining system for visualizing human PPIs and pathways is highly desirable. In this paper, we present HPIminer, a text-mining system for visualizing human protein interactions and pathways from biomedical literature. HPIminer extracts human PPI information and PPI pairs from biomedical literature, and visualizes their associated interactions, networks and pathways using two curated databases, HPRD and KEGG. To our knowledge, HPIminer is the first system to build interaction networks from literature as well as curated databases. Further, new interactions mined only from literature and not reported earlier in databases are highlighted as new. A comparative study with other similar tools shows that the resultant network is more informative and provides additional information on interacting proteins and their associated networks. Copyright © 2015 Elsevier Inc. All rights reserved.
Management of information in distributed biomedical collaboratories.
Keator, David B
2009-01-01
Organizing and annotating biomedical data in structured ways has gained much interest and focus in the last 30 years. Driven by decreases in digital storage costs and advances in genetics sequencing, imaging, electronic data collection, and microarray technologies, data is being collected at an alarming rate. The specialization of fields in biology and medicine demonstrates the need for somewhat different structures for the storage and retrieval of data. For biologists, development is driven by the need for structured information and integration across a number of domains. For clinical researchers and hospitals, the driving developmental factors are the need for a structured medical record accessible, ideally, to any medical practitioner who might require it during the course of research or patient treatment, together with patient confidentiality and security. Scientific data management systems generally consist of a few core services: a backend database system, a front-end graphical user interface, and an export/import mechanism or data interchange format to both get data into and out of the database and share data with collaborators. The chapter introduces some existing databases, distributed file systems, and interchange languages used within the biomedical research and clinical communities for scientific data management and exchange.
Large and linked in scientific publishing
2012-01-01
We are delighted to announce the launch of GigaScience, an online open-access journal that focuses on research using or producing large datasets in all areas of biological and biomedical sciences. GigaScience is a new type of journal that provides standard scientific publishing linked directly to a database that hosts all the relevant data. The primary goals for the journal, detailed in this editorial, are to promote more rapid data release, broader use and reuse of data, improved reproducibility of results, and direct, easy access between analyses and their data. Direct and permanent connections of scientific analyses and their data (achieved by assigning all hosted data a citable DOI) will enable better analysis and deeper interpretation of the data in the future. PMID:23587310
Large and linked in scientific publishing.
Goodman, Laurie; Edmunds, Scott C; Basford, Alexandra T
2012-07-12
We are delighted to announce the launch of GigaScience, an online open-access journal that focuses on research using or producing large datasets in all areas of biological and biomedical sciences. GigaScience is a new type of journal that provides standard scientific publishing linked directly to a database that hosts all the relevant data. The primary goals for the journal, detailed in this editorial, are to promote more rapid data release, broader use and reuse of data, improved reproducibility of results, and direct, easy access between analyses and their data. Direct and permanent connections of scientific analyses and their data (achieved by assigning all hosted data a citable DOI) will enable better analysis and deeper interpretation of the data in the future.
Biomedical data integration in computational drug design and bioinformatics.
Seoane, Jose A; Aguiar-Pulido, Vanessa; Munteanu, Cristian R; Rivero, Daniel; Rabunal, Juan R; Dorado, Julian; Pazos, Alejandro
2013-03-01
In recent years, in the post-genomic era, more and more data is being generated by biological high-throughput technologies, such as proteomics and transcriptomics. This omics data can be very useful, but the real challenge is to analyze it as a whole, after integrating it. Biomedical data integration enables queries against different, heterogeneous and distributed biomedical data sources. Data integration solutions can be very useful not only in the context of drug design, but also in biomedical information retrieval, clinical diagnosis, systems biology, etc. In this review, we analyze the most common approaches to biomedical data integration, such as federated databases, data warehousing, multi-agent systems and semantic technology, as well as the solutions developed using these approaches in the past few years.
Organization of Heterogeneous Scientific Data Using the EAV/CR Representation
Nadkarni, Prakash M.; Marenco, Luis; Chen, Roland; Skoufos, Emmanouil; Shepherd, Gordon; Miller, Perry
1999-01-01
Entity-attribute-value (EAV) representation is a means of organizing highly heterogeneous data using a relatively simple physical database schema. EAV representation is widely used in the medical domain, most notably in the storage of data related to clinical patient records. Its potential strengths suggest its use in other biomedical areas, in particular research databases whose schemas are complex as well as constantly changing to reflect evolving knowledge in rapidly advancing scientific domains. When deployed for such purposes, the basic EAV representation needs to be augmented significantly to handle the modeling of complex objects (classes) as well as to manage interobject relationships. The authors refer to their modification of the basic EAV paradigm as EAV/CR (EAV with classes and relationships). They describe EAV/CR representation with examples from two biomedical databases that use it. PMID:10579606
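The EAV/CR idea above is easy to sketch. The following is an illustrative toy, not the authors' actual schema: facts live in (entity, attribute, value) rows, while a class table and a relationship table supply the "CR" extensions — classes and inter-object links. The example entities are invented.

```python
# Illustrative EAV/CR sketch: one generic triple store instead of one table
# per object type, plus class membership and object-to-object relationships.
classes = {"N1": "Neuron", "R1": "Receptor"}          # entity -> class
eav = [
    ("N1", "name", "mitral cell"),
    ("N1", "brain_region", "olfactory bulb"),
    ("R1", "name", "NMDA receptor"),
]
relationships = [("N1", "expresses", "R1")]           # the "CR" part: links between objects

def attributes_of(entity):
    """Collect all attribute/value pairs recorded for one entity."""
    return {a: v for e, a, v in eav if e == entity}

def related(entity, relation):
    """Follow an inter-object relationship from one entity."""
    return [t for s, r, t in relationships if s == entity and r == relation]
```

Adding a new attribute is just a new row — no schema migration — which is exactly why EAV suits research databases whose schemas change as knowledge evolves.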
Are we studying what matters? Health priorities and NIH-funded biomedical engineering research.
Rubin, Jessica B; Paltiel, A David; Saltzman, W Mark
2010-07-01
With the founding of the National Institute of Biomedical Imaging and Bioengineering (NIBIB) in 1999, the National Institutes of Health (NIH) made explicit its dedication to expanding research in biomedical engineering. Ten years later, we sought to examine how closely federal funding for biomedical engineering aligns with U.S. health priorities. Using a publicly accessible database of research projects funded by the NIH in 2008, we identified 641 grants focused on biomedical engineering, 48% of which targeted specific diseases. Overall, we found that these disease-specific NIH-funded biomedical engineering research projects align with national health priorities, as quantified by three commonly utilized measures of disease burden: cause of death, disability-adjusted survival losses, and expenditures. However, we also found some illnesses (e.g., cancer and heart disease) for which the number of research projects funded deviated from our expectations, given their disease burden. Our findings suggest several possibilities for future studies that would serve to further inform the allocation of limited research dollars within the field of biomedical engineering.
Bachman, John A; Gyori, Benjamin M; Sorger, Peter K
2018-06-28
For automated reading of scientific publications to extract useful information about molecular mechanisms it is critical that genes, proteins and other entities be correctly associated with uniform identifiers, a process known as named entity linking or "grounding." Correct grounding is essential for resolving relationships among mined information, curated interaction databases, and biological datasets. The accuracy of this process is largely dependent on the availability of machine-readable resources associating synonyms and abbreviations commonly found in biomedical literature with uniform identifiers. In a task involving automated reading of ∼215,000 articles using the REACH event extraction software we found that grounding was disproportionately inaccurate for multi-protein families (e.g., "AKT") and complexes with multiple subunits (e.g., "NF-κB"). To address this problem we constructed FamPlex, a manually curated resource defining protein families and complexes as they are commonly encountered in biomedical text. In FamPlex the gene-level constituents of families and complexes are defined in a flexible format allowing for multi-level, hierarchical membership. To create FamPlex, text strings corresponding to entities were identified empirically from literature and linked manually to uniform identifiers; these identifiers were also mapped to equivalent entries in multiple related databases. FamPlex also includes curated prefix and suffix patterns that improve named entity recognition and event extraction. Evaluation of REACH extractions on a test corpus of ∼54,000 articles showed that FamPlex significantly increased grounding accuracy for families and complexes (from 15 to 71%). The hierarchical organization of entities in FamPlex also made it possible to integrate otherwise unconnected mechanistic information across families, subfamilies, and individual proteins.
Applications of FamPlex to the TRIPS/DRUM reading system and the Biocreative VI Bioentity Normalization Task dataset demonstrated the utility of FamPlex in other settings. FamPlex is an effective resource for improving named entity recognition, grounding, and relationship resolution in automated reading of biomedical text. The content in FamPlex is available in both tabular and Open Biomedical Ontology formats at https://github.com/sorgerlab/famplex under the Creative Commons CC0 license and has been integrated into the TRIPS/DRUM and REACH reading systems.
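Family-aware grounding of the kind FamPlex enables can be sketched with two lookup tables: one mapping text mentions to identifiers, one mapping family identifiers to gene-level members. The entries below are illustrative stand-ins, not FamPlex's actual content or API.

```python
# Hedged sketch of family-aware grounding: synonyms ground mentions to
# uniform identifiers; family_members resolves a family to its genes.
synonyms = {
    "AKT": "FPLX:AKT",          # family-level grounding
    "AKT1": "HGNC:391",         # gene-level grounding
    "NF-kB": "FPLX:NFkappaB",   # complex with multiple subunits
}
family_members = {
    "FPLX:AKT": ["HGNC:391", "HGNC:392", "HGNC:393"],
}

def ground(mention):
    """Return the uniform identifier for a text mention, if known."""
    return synonyms.get(mention)

def expand(identifier):
    """Resolve a family identifier to gene-level members; genes map to themselves."""
    return family_members.get(identifier, [identifier])
```

The hierarchical `expand` step is what lets mechanistic statements about "AKT" be connected to data recorded against individual AKT1/2/3 genes.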
Grethe, Jeffrey S; Baru, Chaitan; Gupta, Amarnath; James, Mark; Ludaescher, Bertram; Martone, Maryann E; Papadopoulos, Philip M; Peltier, Steven T; Rajasekar, Arcot; Santini, Simone; Zaslavsky, Ilya N; Ellisman, Mark H
2005-01-01
Through support from the National Institutes of Health's National Center for Research Resources, the Biomedical Informatics Research Network (BIRN) is pioneering the use of advanced cyberinfrastructure for medical research. By synchronizing developments in advanced wide area networking, distributed computing, distributed database federation, and other emerging capabilities of e-science, the BIRN has created a collaborative environment that is paving the way for biomedical research and clinical information management. The BIRN Coordinating Center (BIRN-CC) is orchestrating the development and deployment of key infrastructure components for immediate and long-range support of biomedical and clinical research being pursued by domain scientists in three neuroimaging test beds.
Big biomedical data as the key resource for discovery science.
Toga, Arthur W; Foster, Ian; Kesselman, Carl; Madduri, Ravi; Chard, Kyle; Deutsch, Eric W; Price, Nathan D; Glusman, Gustavo; Heavner, Benjamin D; Dinov, Ivo D; Ames, Joseph; Van Horn, John; Kramer, Roger; Hood, Leroy
2015-11-01
Modern biomedical data collection is generating exponentially more data in a multitude of formats. This flood of complex data poses significant opportunities to discover and understand the critical interplay among such diverse domains as genomics, proteomics, metabolomics, and phenomics, including imaging, biometrics, and clinical data. The Big Data for Discovery Science Center is taking an "-ome to home" approach to discover linkages between these disparate data sources by mining existing databases of proteomic and genomic data, brain images, and clinical assessments. In support of this work, the authors developed new technological capabilities that make it easy for researchers to manage, aggregate, manipulate, integrate, and model large amounts of distributed data. Guided by biological domain expertise, the Center's computational resources and software will reveal relationships and patterns, aiding researchers in identifying biomarkers for the most confounding conditions and diseases, such as Parkinson's and Alzheimer's. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Plikus, Maksim V; Zhang, Zina; Chuong, Cheng-Ming
2006-01-01
Background Understanding research activity within any given biomedical field is important. Search outputs generated by MEDLINE/PubMed are not well classified and require lengthy manual citation analysis. Automation of citation analytics can be very useful and timesaving for both novices and experts. Results PubFocus web server automates analysis of MEDLINE/PubMed search queries by enriching them with two widely used human factor-based bibliometric indicators of publication quality: journal impact factor and volume of forward references. In addition to providing basic volumetric statistics, PubFocus also prioritizes citations and evaluates authors' impact on the field of search. PubFocus also analyses the presence and occurrence of biomedical key terms within citations by utilizing controlled vocabularies. Conclusion We have developed a citation prioritisation algorithm based on journal impact factor, forward referencing volume, referencing dynamics, and author's contribution level. It can be applied either to the primary set of PubMed search results or to the subsets of these results identified through key terms from controlled biomedical vocabularies and ontologies. NCI (National Cancer Institute) thesaurus and MGD (Mouse Genome Database) mammalian gene orthology have been implemented for key terms analytics. PubFocus provides a scalable platform for the integration of multiple available ontology databases. PubFocus analytics can be adapted for input sources of biomedical citations other than PubMed. PMID:17014720
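A prioritisation of this general shape — a weighted combination of impact factor and forward-citation volume — can be sketched as follows. The weights and scoring formula are assumptions for illustration, not PubFocus's actual algorithm.

```python
# Toy citation-prioritisation sketch: score each record by a weighted sum of
# journal impact factor and forward-citation count, then rank descending.
def priority_score(impact_factor, forward_citations, w_if=1.0, w_cit=0.5):
    # Assumed linear weighting; the real algorithm also uses referencing
    # dynamics and author contribution level.
    return w_if * impact_factor + w_cit * forward_citations

citations = [
    {"pmid": "111", "impact_factor": 9.2, "forward_citations": 40},
    {"pmid": "222", "impact_factor": 3.1, "forward_citations": 200},
]
ranked = sorted(
    citations,
    key=lambda c: priority_score(c["impact_factor"], c["forward_citations"]),
    reverse=True,
)
```

Under these weights the heavily cited low-impact-factor paper outranks the high-impact-factor one, showing why the weighting choice matters.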
Kafkas, Şenay; Kim, Jee-Hyub; Pi, Xingjun; McEntyre, Johanna R
2015-01-01
In this study, we present an analysis of data citation practices in full text research articles and their corresponding supplementary data files, made available in the Open Access set of articles from Europe PubMed Central. Our aim is to investigate whether supplementary data files should be considered as a source of information for integrating the literature with biomolecular databases. Using text-mining methods to identify and extract a variety of core biological database accession numbers, we found that the supplemental data files contain many more database citations than the body of the article, and that those citations often take the form of a relatively small number of articles citing large collections of accession numbers in text-based files. Moreover, citation of value-added databases derived from submission databases (such as Pfam, UniProt or Ensembl) is common, demonstrating the reuse of these resources as datasets in themselves. All the database accession numbers extracted from the supplementary data are publicly accessible from http://dx.doi.org/10.5281/zenodo.11771. Our study suggests that supplementary data should be considered when linking articles with data, in curation pipelines, and in information retrieval tasks in order to make full use of the entire research article. These observations highlight the need to improve the management of supplemental data in general, in order to make this information more discoverable and useful.
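Accession-number mining of the kind described above is typically pattern-based. The sketch below uses two simplified regular expressions (a UniProt-like and a GEO-series-like pattern) purely for illustration; the study's actual extraction rules cover many more databases and stricter formats.

```python
import re

# Rough sketch of accession-number extraction from article or supplementary text.
# Patterns are deliberately simplified approximations, not authoritative formats.
PATTERNS = {
    "uniprot": re.compile(r"\b[OPQ][0-9][A-Z0-9]{3}[0-9]\b"),  # e.g. P12345
    "geo_series": re.compile(r"\bGSE\d+\b"),                   # e.g. GSE4917
}

def extract_accessions(text):
    """Return {database: sorted unique accessions} for every pattern that hits."""
    hits = {}
    for db, pattern in PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[db] = sorted(set(found))
    return hits

text = "Proteins P12345 and Q9Y6K9 were analyzed; raw data in GSE4917."
```

Running the same extractor over both article bodies and supplementary files is what makes the comparison in the study above possible.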
Polvi, Anne; Linturi, Henna; Varilo, Teppo; Anttonen, Anna-Kaisa; Byrne, Myles; Fokkema, Ivo F A C; Almusa, Henrikki; Metzidis, Anthony; Avela, Kristiina; Aula, Pertti; Kestilä, Marjo; Muilu, Juha
2013-11-01
The Finnish Disease Heritage Database (FinDis) (http://findis.org) was originally published in 2004 as a centralized information resource for rare monogenic diseases enriched in the Finnish population. The FinDis database originally contained 405 causative variants for 30 diseases. At the time, the FinDis database was a comprehensive collection of data, but since 1994, a large amount of new information has emerged, making the necessity to update the database evident. We collected information and updated the database to contain genes and causative variants for 35 diseases, including six more genes and more than 1,400 additional disease-causing variants. Information for causative variants for each gene is collected under the LOVD 3.0 platform, enabling easy updating. The FinDis portal provides a centralized resource and user interface to link information on each disease and gene with variant data in the LOVD 3.0 platform. The software written to achieve this has been open-sourced and made available on GitHub (http://github.com/findis-db), allowing biomedical institutions in other countries to present their national data in a similar way, and to both contribute to, and benefit from, standardized variation data. The updated FinDis portal provides a unique resource to assist patient diagnosis, research, and the development of new cures. © 2013 WILEY PERIODICALS, INC.
Private and Efficient Query Processing on Outsourced Genomic Databases.
Ghasemi, Reza; Al Aziz, Md Momin; Mohammed, Noman; Dehkordi, Massoud Hadian; Jiang, Xiaoqian
2017-09-01
Applications of genomic studies are spreading rapidly in many domains of science and technology such as healthcare, biomedical research, direct-to-consumer services, and legal and forensic investigation. However, there are a number of obstacles that make it hard to access and process a big genomic database for these applications. First, genome sequencing is a time-consuming and expensive process. Second, it requires large-scale computation and storage systems to process genomic sequences. Third, genomic databases are often owned by different organizations, and thus, not available for public usage. The cloud computing paradigm can be leveraged to facilitate the creation and sharing of big genomic databases for these applications. Genomic data owners can outsource their databases in a centralized cloud server to ease the access of their databases. However, data owners are reluctant to adopt this model, as it requires outsourcing the data to an untrusted cloud service provider that may cause data breaches. In this paper, we propose a privacy-preserving model for outsourcing genomic data to a cloud. The proposed model enables query processing while providing privacy protection of genomic databases. Privacy of the individuals is guaranteed by permuting and adding fake genomic records in the database. These techniques allow the cloud to evaluate count and top-k queries securely and efficiently. Experimental results demonstrate that a count and a top-k query over 40 Single Nucleotide Polymorphisms (SNPs) in a database of 20,000 records takes around 100 and 150 seconds, respectively.
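The permute-and-pad idea in this abstract can be illustrated in miniature. This is a toy sketch under stated assumptions — the owner shuffles records and plants fake ones it remembers, then subtracts the fakes from the cloud's answer; the real protocol additionally protects record contents cryptographically, which is omitted here.

```python
import random

# Toy sketch of the permute-and-pad step for outsourced count queries.
# The SNP name and genotypes are illustrative, not from the paper's dataset.
random.seed(0)  # deterministic shuffle for the example

real = [{"snp_rs11200638": "AA"}, {"snp_rs11200638": "AG"}, {"snp_rs11200638": "AA"}]
fakes = [{"snp_rs11200638": "AA"}]          # the owner remembers what it planted

outsourced = real + fakes
random.shuffle(outsourced)                  # permutation hides the original order

def cloud_count(db, snp, genotype):
    # The untrusted cloud evaluates the count over the padded database.
    return sum(1 for rec in db if rec.get(snp) == genotype)

raw = cloud_count(outsourced, "snp_rs11200638", "AA")
fake_hits = cloud_count(fakes, "snp_rs11200638", "AA")
true_count = raw - fake_hits                # owner-side correction
```

The cloud's raw answer is inflated by the planted records, so an observer of the outsourced database cannot infer exact counts, while the owner recovers the true answer locally.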
The BioMart community portal: an innovative alternative to large, centralized data repositories.
Smedley, Damian; Haider, Syed; Durinck, Steffen; Pandini, Luca; Provero, Paolo; Allen, James; Arnaiz, Olivier; Awedh, Mohammad Hamza; Baldock, Richard; Barbiera, Giulia; Bardou, Philippe; Beck, Tim; Blake, Andrew; Bonierbale, Merideth; Brookes, Anthony J; Bucci, Gabriele; Buetti, Iwan; Burge, Sarah; Cabau, Cédric; Carlson, Joseph W; Chelala, Claude; Chrysostomou, Charalambos; Cittaro, Davide; Collin, Olivier; Cordova, Raul; Cutts, Rosalind J; Dassi, Erik; Di Genova, Alex; Djari, Anis; Esposito, Anthony; Estrella, Heather; Eyras, Eduardo; Fernandez-Banet, Julio; Forbes, Simon; Free, Robert C; Fujisawa, Takatomo; Gadaleta, Emanuela; Garcia-Manteiga, Jose M; Goodstein, David; Gray, Kristian; Guerra-Assunção, José Afonso; Haggarty, Bernard; Han, Dong-Jin; Han, Byung Woo; Harris, Todd; Harshbarger, Jayson; Hastings, Robert K; Hayes, Richard D; Hoede, Claire; Hu, Shen; Hu, Zhi-Liang; Hutchins, Lucie; Kan, Zhengyan; Kawaji, Hideya; Keliet, Aminah; Kerhornou, Arnaud; Kim, Sunghoon; Kinsella, Rhoda; Klopp, Christophe; Kong, Lei; Lawson, Daniel; Lazarevic, Dejan; Lee, Ji-Hyun; Letellier, Thomas; Li, Chuan-Yun; Lio, Pietro; Liu, Chu-Jun; Luo, Jie; Maass, Alejandro; Mariette, Jerome; Maurel, Thomas; Merella, Stefania; Mohamed, Azza Mostafa; Moreews, Francois; Nabihoudine, Ibounyamine; Ndegwa, Nelson; Noirot, Céline; Perez-Llamas, Cristian; Primig, Michael; Quattrone, Alessandro; Quesneville, Hadi; Rambaldi, Davide; Reecy, James; Riba, Michela; Rosanoff, Steven; Saddiq, Amna Ali; Salas, Elisa; Sallou, Olivier; Shepherd, Rebecca; Simon, Reinhard; Sperling, Linda; Spooner, William; Staines, Daniel M; Steinbach, Delphine; Stone, Kevin; Stupka, Elia; Teague, Jon W; Dayem Ullah, Abu Z; Wang, Jun; Ware, Doreen; Wong-Erasmus, Marie; Youens-Clark, Ken; Zadissa, Amonida; Zhang, Shi-Jian; Kasprzyk, Arek
2015-07-01
The BioMart Community Portal (www.biomart.org) is a community-driven effort to provide a unified interface to biomedical databases that are distributed worldwide. The portal provides access to numerous database projects supported by 30 scientific organizations. It includes over 800 different biological datasets spanning genomics, proteomics, model organisms, cancer data, ontology information and more. All resources available through the portal are independently administered and funded by their host organizations. The BioMart data federation technology provides a unified interface to all the available data. The latest version of the portal comes with many new databases that have been created by our ever-growing community. It also comes with better support and extensibility for data analysis and visualization tools. A new addition to our toolbox, the enrichment analysis tool is now accessible through graphical and web service interface. The BioMart community portal averages over one million requests per day. Building on this level of service and the wealth of information that has become available, the BioMart Community Portal has introduced a new, more scalable and cheaper alternative to the large data stores maintained by specialized organizations. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Semantic Web repositories for genomics data using the eXframe platform
2014-01-01
Background With the advent of inexpensive assay technologies, there has been an unprecedented growth in genomics data as well as the number of databases in which it is stored. In these databases, sample annotation using ontologies and controlled vocabularies is becoming more common. However, the annotation is rarely available as Linked Data, in a machine-readable format, or for standardized queries using SPARQL. This makes large-scale reuse, or integration with other knowledge bases, very difficult. Methods To address this challenge, we have developed the second generation of our eXframe platform, a reusable framework for creating online repositories of genomics experiments. This second-generation model now publishes Semantic Web data. To accomplish this, we created an experiment model that covers provenance, citations, external links, assays, biomaterials used in the experiment, and the data collected during the process. The elements of our model are mapped to classes and properties from various established biomedical ontologies. Resource Description Framework (RDF) data is automatically produced using these mappings and indexed in an RDF store with a built-in SPARQL Protocol and RDF Query Language (SPARQL) endpoint. Conclusions Using the open-source eXframe software, institutions and laboratories can create Semantic Web repositories of their experiments, integrate them with heterogeneous resources and make them interoperable with the vast Semantic Web of biomedical knowledge. PMID:25093072
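The mapping step described above — experiment metadata emitted as RDF triples — can be sketched without any RDF library. The predicate URIs below are invented for illustration; eXframe instead maps to established biomedical ontologies.

```python
# Minimal sketch of serializing experiment metadata as N-Triples-style RDF.
# All URIs under http://example.org/ are hypothetical placeholders.
def triple(s, p, o):
    # Objects that look like URIs become resource references; others, literals.
    obj = f"<{o}>" if o.startswith("http") else f'"{o}"'
    return f"<{s}> <{p}> {obj} ."

EX = "http://example.org/"
experiment = EX + "exp/42"

ntriples = [
    triple(experiment, EX + "assayType", "RNA expression profiling"),
    triple(experiment, EX + "organism", "Homo sapiens"),
    triple(experiment, EX + "hasBiomaterial", EX + "sample/7"),  # link to a biomaterial
]
```

Once loaded into an RDF store, such triples become queryable through a SPARQL endpoint, which is what makes the repositories interoperable with other Linked Data resources.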
75 FR 61761 - Renewal of Charter for the Chronic Fatigue Syndrome Advisory Committee
Federal Register 2010, 2011, 2012, 2013, 2014
2010-10-06
... professionals, and the biomedical, academic, and research communities about chronic fatigue syndrome advances... accessing the FACA database that is maintained by the Committee Management Secretariat under the General Services Administration. The Web site address for the FACA database is http://fido.gov/facadatabase . Dated...
Semantic SenseLab: implementing the vision of the Semantic Web in neuroscience
Samwald, Matthias; Chen, Huajun; Ruttenberg, Alan; Lim, Ernest; Marenco, Luis; Miller, Perry; Shepherd, Gordon; Cheung, Kei-Hoi
2011-01-01
Summary Objective Integrative neuroscience research needs a scalable informatics framework that enables semantic integration of diverse types of neuroscience data. This paper describes the use of the Web Ontology Language (OWL) and other Semantic Web technologies for the representation and integration of molecular-level data provided by several of the SenseLab suite of neuroscience databases. Methods Based on the original database structure, we semi-automatically translated the databases into OWL ontologies with manual addition of semantic enrichment. The SenseLab ontologies are extensively linked to other biomedical Semantic Web resources, including the Subcellular Anatomy Ontology, Brain Architecture Management System, the Gene Ontology, BIRNLex and UniProt. The SenseLab ontologies have also been mapped to the Basic Formal Ontology and Relation Ontology, which helps ease interoperability with many other existing and future biomedical ontologies for the Semantic Web. In addition, approaches to representing contradictory research statements are described. The SenseLab ontologies are designed for use on the Semantic Web, which enables their integration into a growing collection of biomedical information resources. Conclusion We demonstrate that our approach can yield significant potential benefits and that the Semantic Web is rapidly becoming mature enough to realize its anticipated promises. The ontologies are available online at http://neuroweb.med.yale.edu/senselab/ PMID:20006477
Introducing meta-services for biomedical information extraction
Leitner, Florian; Krallinger, Martin; Rodriguez-Penagos, Carlos; Hakenberg, Jörg; Plake, Conrad; Kuo, Cheng-Ju; Hsu, Chun-Nan; Tsai, Richard Tzong-Han; Hung, Hsi-Chuan; Lau, William W; Johnson, Calvin A; Sætre, Rune; Yoshida, Kazuhiro; Chen, Yan Hua; Kim, Sun; Shin, Soo-Yong; Zhang, Byoung-Tak; Baumgartner, William A; Hunter, Lawrence; Haddow, Barry; Matthews, Michael; Wang, Xinglong; Ruch, Patrick; Ehrler, Frédéric; Özgür, Arzucan; Erkan, Güneş; Radev, Dragomir R; Krauthammer, Michael; Luong, ThaiBinh; Hoffmann, Robert; Sander, Chris; Valencia, Alfonso
2008-01-01
We introduce the first meta-service for information extraction in molecular biology, the BioCreative MetaServer (BCMS). This prototype platform is a joint effort of 13 research groups and provides automatically generated annotations for PubMed/Medline abstracts. Annotation types cover gene names, gene IDs, species, and protein-protein interactions. The annotations are distributed by the meta-server in both human and machine readable formats (HTML/XML). This service is intended to be used by biomedical researchers and database annotators, and in biomedical language processing. The platform allows direct comparison, unified access, and result aggregation of the annotations. PMID:18834497
Autism: biomedical complementary treatment approaches.
Hendren, Robert L
2013-07-01
This article provides a conceptual overview for the use of biomedical complementary and alternative medicine (CAM) treatments for autism spectrum disorders. Pharmaceutical agents with published studies are briefly mentioned; but the focus of the article is on possible biomedical CAM treatments, the rationale for their use, and the current database of mostly preliminary studies regarding their safety and efficacy. Of the more than 50 treatments currently listed here and in use by eager families, 9 are reviewed in more detail because of their promise from preliminary research studies or because of public interest. Copyright © 2013 Elsevier Inc. All rights reserved.
Torgerson, Carinna M; Quinn, Catherine; Dinov, Ivo; Liu, Zhizhong; Petrosyan, Petros; Pelphrey, Kevin; Haselgrove, Christian; Kennedy, David N; Toga, Arthur W; Van Horn, John Darrell
2015-03-01
Under the umbrella of the National Database for Clinical Trials (NDCT) related to mental illnesses, the National Database for Autism Research (NDAR) seeks to gather, curate, and make openly available neuroimaging data from NIH-funded studies of autism spectrum disorder (ASD). NDAR has recently made its database accessible through the LONI Pipeline workflow design and execution environment to enable large-scale analyses of cortical architecture and function via local, cluster, or "cloud"-based computing resources. This presents a unique opportunity to overcome many of the customary limitations to fostering biomedical neuroimaging as a science of discovery. Providing open access to primary neuroimaging data, workflow methods, and high-performance computing will increase uniformity in data collection protocols, encourage greater reliability of published data, results replication, and broaden the range of researchers now able to perform larger studies than ever before. To illustrate the use of NDAR and LONI Pipeline for performing several commonly performed neuroimaging processing steps and analyses, this paper presents example workflows useful for ASD neuroimaging researchers seeking to begin using this valuable combination of online data and computational resources. We discuss the utility of such database and workflow processing interactivity as a motivation for the sharing of additional primary data in ASD research and elsewhere.
Cataloging the biomedical world of pain through semi-automated curation of molecular interactions
Jamieson, Daniel G.; Roberts, Phoebe M.; Robertson, David L.; Sidders, Ben; Nenadic, Goran
2013-01-01
The vast collection of biomedical literature and its continued expansion has presented a number of challenges to researchers who require structured findings to stay abreast of and analyze molecular mechanisms relevant to their domain of interest. By structuring literature content into topic-specific machine-readable databases, the aggregate data from multiple articles can be used to infer trends that can be compared and contrasted with similar findings from topic-independent resources. Our study presents a generalized procedure for semi-automatically creating a custom topic-specific molecular interaction database through the use of text mining to assist manual curation. We apply the procedure to capture molecular events that underlie ‘pain’, a complex phenomenon with a large societal burden and unmet medical need. We describe how existing text mining solutions are used to build a pain-specific corpus, extract molecular events from it, add context to the extracted events and assess their relevance. The pain-specific corpus contains 765 692 documents from Medline and PubMed Central, from which we extracted 356 499 unique normalized molecular events, with 261 438 single protein events and 93 271 molecular interactions supplied by BioContext. Event chains are annotated with negation, speculation, anatomy, Gene Ontology terms, mutations, pain and disease relevance, which collectively provide detailed insight into how that event chain is associated with pain. The extracted relations are visualized in a wiki platform (wiki-pain.org) that enables efficient manual curation and exploration of the molecular mechanisms that underlie pain. Curation of 1500 grouped event chains ranked by pain relevance revealed 613 accurately extracted unique molecular interactions that in the future can be used to study the underlying mechanisms involved in pain. 
Our approach demonstrates that combining existing text mining tools with domain-specific terms and wiki-based visualization can facilitate rapid curation of molecular interactions to create a custom database. Database URL: ••• PMID:23707966
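The annotation of event chains with negation and speculation described above can be approximated with simple cue-word matching. A minimal sketch, assuming invented cue lists and example sentences; real pipelines such as BioContext use much richer lexicons and syntactic analysis:

```python
import re

# Hypothetical cue lists for illustration only.
NEGATION_CUES = {"no", "not", "without", "failed", "absence"}
SPECULATION_CUES = {"may", "might", "possibly", "suggest", "appears"}

def annotate(sentence):
    """Tag a sentence with negation/speculation flags via cue-word matching."""
    tokens = {t.lower() for t in re.findall(r"[A-Za-z]+", sentence)}
    return {
        "negated": bool(tokens & NEGATION_CUES),
        "speculative": bool(tokens & SPECULATION_CUES),
    }

print(annotate("TNF-alpha may modulate pain signalling"))
# -> {'negated': False, 'speculative': True}
```

This bag-of-tokens approach ignores cue scope, which is why production systems resolve negation and speculation over parse trees rather than whole sentences.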
Undisclosed conflicts of interest among biomedical textbook authors.
Piper, Brian J; Lambert, Drew A; Keefe, Ryan C; Smukler, Phoebe U; Selemon, Nicolas A; Duperry, Zachary R
2018-02-05
Textbooks are a formative resource for health care providers during their education and are also an enduring reference for pathophysiology and treatment. Unlike the primary literature and clinical guidelines, biomedical textbook authors do not typically disclose potential financial conflicts of interest (pCoIs). The objective of this study was to evaluate whether the authors of textbooks used in the training of physicians, pharmacists, and dentists had appreciable undisclosed pCoIs in the form of patents or compensation received from pharmaceutical or biotechnology companies. The most recent editions of six medical textbooks, Harrison's Principles of Internal Medicine (Har PIM), Katzung and Trevor's Basic and Clinical Pharmacology (Kat BCP), the American Osteopathic Association's Foundations of Osteopathic Medicine (AOA FOM), Remington: The Science and Practice of Pharmacy (Rem SPP), Koda-Kimble and Young's Applied Therapeutics (KKY AT), and Yagiela's Pharmacology and Therapeutics for Dentistry (Yag PTD), were selected for evaluation after consulting biomedical educators. Author names (N = 1,152, 29.2% female) were submitted to databases to examine patents (Google Scholar) and compensation (ProPublica's Dollars for Docs [PDD]). Authors were listed as inventors on 677 patents (maximum/author = 23), with three-quarters (74.9%) belonging to Har PIM authors. Females were significantly underrepresented among patent holders. The PDD 2009-2013 database revealed receipt of US$13.2 million, the majority (83.9%) to Har PIM. The maximum compensation per author was $869,413. The PDD 2014 database identified receipt of $6.8 million, with 50.4% of eligible authors receiving compensation. The maximum compensation received by a single author was $560,021. Cardiovascular authors were most likely to have a PDD entry and neurologic disorders authors were least likely.
An appreciable subset of biomedical textbook authors hold patents and have received remuneration from medical product companies, and this information is not disclosed to readers. These findings indicate that full transparency of financial pCoIs should become standard practice among the authors of biomedical educational materials.
Biocuration workflows and text mining: overview of the BioCreative 2012 Workshop Track II.
Lu, Zhiyong; Hirschman, Lynette
2012-01-01
Manual curation of data from the biomedical literature is a rate-limiting factor for many expert curated databases. Despite the continuing advances in biomedical text mining and the pressing needs of biocurators for better tools, few existing text-mining tools have been successfully integrated into production literature curation systems such as those used by the expert curated databases. To close this gap and better understand all aspects of literature curation, we invited submissions of written descriptions of curation workflows from expert curated databases for the BioCreative 2012 Workshop Track II. We received seven qualified contributions, primarily from model organism databases. Based on these descriptions, we identified commonalities and differences across the workflows, the common ontologies and controlled vocabularies used and the current and desired uses of text mining for biocuration. Compared to a survey done in 2009, our 2012 results show that many more databases are now using text mining in parts of their curation workflows. In addition, the workshop participants identified text-mining aids for finding gene names and symbols (gene indexing), prioritization of documents for curation (document triage) and ontology concept assignment as those most desired by the biocurators. DATABASE URL: http://www.biocreative.org/tasks/bc-workshop-2012/workflow/.
The BioIntelligence Framework: a new computational platform for biomedical knowledge computing.
Farley, Toni; Kiefer, Jeff; Lee, Preston; Von Hoff, Daniel; Trent, Jeffrey M; Colbourn, Charles; Mousses, Spyro
2013-01-01
Breakthroughs in molecular profiling technologies are enabling a new data-intensive approach to biomedical research, with the potential to revolutionize how we study, manage, and treat complex diseases. The next great challenge for clinical applications of these innovations will be to create scalable computational solutions for intelligently linking complex biomedical patient data to clinically actionable knowledge. Traditional database management systems (DBMS) are not well suited to representing complex syntactic and semantic relationships in unstructured biomedical information, introducing barriers to realizing such solutions. We propose a scalable computational framework for addressing this need, which leverages a hypergraph-based data model and query language that may be better suited for representing complex multi-lateral, multi-scalar, and multi-dimensional relationships. We also discuss how this framework can be used to create rapid learning knowledge base systems to intelligently capture and relate complex patient data to biomedical knowledge in order to automate the recovery of clinically actionable information.
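A hypergraph-based data model of the kind described, where a single edge can link any number of entities, can be sketched minimally in Python. The class and the example entities are illustrative assumptions; the BioIntelligence Framework's actual data model and query language are not shown here:

```python
from collections import defaultdict

class Hypergraph:
    """Minimal hypergraph: each hyperedge links an arbitrary set of nodes."""
    def __init__(self):
        self.edges = {}                    # edge id -> set of member nodes
        self.incidence = defaultdict(set)  # node -> set of edge ids

    def add_edge(self, edge_id, nodes):
        self.edges[edge_id] = set(nodes)
        for n in nodes:
            self.incidence[n].add(edge_id)

    def neighbors(self, node):
        """All nodes sharing at least one hyperedge with `node`."""
        out = set()
        for e in self.incidence[node]:
            out |= self.edges[e]
        out.discard(node)
        return out

hg = Hypergraph()
# A multi-lateral relationship: one edge linking a patient, a variant, a drug.
hg.add_edge("obs1", ["patient42", "BRAF_V600E", "vemurafenib"])
hg.add_edge("obs2", ["patient42", "melanoma"])
print(sorted(hg.neighbors("patient42")))
# -> ['BRAF_V600E', 'melanoma', 'vemurafenib']
```

The point of the structure is that "patient has variant while on drug" is stored as one n-ary fact instead of being decomposed into binary rows, which is what a traditional relational schema would force.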
RNAcentral: A vision for an international database of RNA sequences
Bateman, Alex; Agrawal, Shipra; Birney, Ewan; Bruford, Elspeth A.; Bujnicki, Janusz M.; Cochrane, Guy; Cole, James R.; Dinger, Marcel E.; Enright, Anton J.; Gardner, Paul P.; Gautheret, Daniel; Griffiths-Jones, Sam; Harrow, Jen; Herrero, Javier; Holmes, Ian H.; Huang, Hsien-Da; Kelly, Krystyna A.; Kersey, Paul; Kozomara, Ana; Lowe, Todd M.; Marz, Manja; Moxon, Simon; Pruitt, Kim D.; Samuelsson, Tore; Stadler, Peter F.; Vilella, Albert J.; Vogel, Jan-Hinnerk; Williams, Kelly P.; Wright, Mathew W.; Zwieb, Christian
2011-01-01
During the last decade there has been a great increase in the number of noncoding RNA genes identified, including new classes such as microRNAs and piRNAs. There is also a large growth in the amount of experimental characterization of these RNA components. Despite this growth in information, it is still difficult for researchers to access RNA data, because key data resources for noncoding RNAs have not yet been created. The most pressing omission is the lack of a comprehensive RNA sequence database, much like UniProt, which provides a comprehensive set of protein knowledge. In this article we propose the creation of a new open public resource that we term RNAcentral, which will contain a comprehensive collection of RNA sequences and fill an important gap in the provision of biomedical databases. We envision RNA researchers from all over the world joining a federated RNAcentral network, contributing specialized knowledge and databases. RNAcentral would centralize key data that are currently held across a variety of databases, allowing researchers instant access to a single, unified resource. This resource would facilitate the next generation of RNA research and help drive further discoveries, including those that improve food production and human and animal health. We encourage additional RNA database resources and research groups to join this effort. We aim to obtain international network funding to further this endeavor. PMID:21940779
Belmonte, M
In this article we review two of the main Internet information services for searching bibliographic references and journals, and the electronic publication of journals on the Internet, with particular emphasis on those related to the neurosciences. The main bibliographic indices are: 1. MEDLINE. By definition, this is the bibliographic database. It is an online version of the smaller-format weekly publication that carries the title pages and summaries of most of the biomedical journals. It is based on the Index Medicus, a bibliographic index (on paper) which annually collects references to the most important biomedical journals. 2. EMBASE (Excerpta Medica). It is a direct competitor of MEDLINE, although it has the disadvantage of lacking government subsidies, being privately financed only. This bibliographic database, produced by the publisher Elsevier of the Netherlands, covers approximately 3,500 biomedical journals from 110 countries, and is particularly useful for articles on drugs and toxicology. 3. Current Contents. It publishes the index Current Contents, a classic in this field, much appreciated by scientists in all areas: medicine, social sciences, technology, arts and humanities. At present, it is available in an online version known as CCC (Current Contents Connect), accessible through the web, but only to subscribers. There is a growing tendency towards the publication of biomedical journals on the Internet. Its full development, if correctly carried out, will provide the opportunity to have the best information available and will greatly benefit all those who are already using new information technologies.
Alfonso, Fernando
2010-03-01
Biomedical journals must adhere to strict standards of editorial quality. In a globalized academic scenario, biomedical journals must compete firstly to publish the most relevant original research and secondly to obtain the broadest possible visibility and the widest dissemination of their scientific contents. The cornerstone of the scientific process is still the peer-review system but additional quality criteria should be met. Recently access to medical information has been revolutionized by electronic editions. Bibliometric databases such as MEDLINE, the ISI Web of Science and Scopus offer comprehensive online information on medical literature. Classically, the prestige of biomedical journals has been measured by their impact factor but, recently, other indicators such as SCImago SJR or the Eigenfactor are emerging as alternative indices of a journal's quality. Assessing the scholarly impact of research and the merits of individual scientists remains a major challenge. Allocation of authorship credit also remains controversial. Furthermore, in our Kafkaesque world, we prefer to count rather than read the articles we judge. Quantitative publication metrics (research output) and citations analyses (scientific influence) are key determinants of the scientific success of individual investigators. However, academia is embracing new objective indicators (such as the "h" index) to evaluate scholarly merit. The present review discusses some editorial issues affecting biomedical journals, currently available bibliometric databases, bibliometric indices of journal quality and, finally, indicators of research performance and scientific success. Copyright 2010 SEEN. Published by Elsevier Espana. All rights reserved.
PubMed searches: overview and strategies for clinicians.
Lindsey, Wesley T; Olin, Bernie R
2013-04-01
PubMed is a biomedical and life sciences database maintained by a division of the National Library of Medicine known as the National Center for Biotechnology Information (NCBI). It is a large resource with more than 5,600 journals indexed and more than 22 million total citations. Searches conducted in PubMed provide references that are more specific for the intended topic compared with other popular search engines. Effective PubMed searches allow the clinician to remain current on the latest clinical trials, systematic reviews, and practice guidelines. PubMed continues to evolve by allowing users to create a customized experience through the My NCBI portal, new arrangements and options in search filters, and supporting scholarly projects through exportation of citations to reference-managing software. Prepackaged search options available in the Clinical Queries feature also allow users to efficiently search for clinical literature. PubMed also provides information regarding the source journals themselves through the Journals in NCBI Databases link. This article provides an overview of the PubMed database's structure and features as well as strategies for conducting an effective search.
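Beyond the web interface, PubMed searches can be scripted against NCBI's E-utilities. A minimal sketch of building an `esearch` request URL; the parameters shown (`db`, `term`, `retmax`, `retmode`) are the standard documented ones, and the query string itself is an invented example:

```python
from urllib.parse import urlencode

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def esearch_url(term, retmax=20):
    """Build an E-utilities esearch URL for a PubMed query."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return BASE + "?" + urlencode(params)

# The same field tags used in the web interface work here, e.g. [ti], [pt].
url = esearch_url('heart failure[ti] AND "systematic review"[pt]')
print(url)
```

Fetching the URL (e.g. with `urllib.request`) returns a JSON result listing matching PMIDs, which can then be passed to `efetch` for full records; heavy users should also supply an API key per NCBI's usage guidelines.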
LAMDA at TREC CDS track 2015: Clinical Decision Support Track
2015-11-20
outperforms all the other vector space models supported by Elasticsearch. MetaMap is the online tool that maps biomedical text to the Metathesaurus, and...cases. The medical knowledge consists of 700,000 biomedical documents supported by PubMed Central [3], which is an online digital database freely...Science Research Program through the National Research Foundation (NRF) of Korea funded by the Ministry of Science, ICT, and Future Planning (MSIP
Classification of time series patterns from complex dynamic systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schryver, J.C.; Rao, N.
1998-07-01
An increasing availability of high-performance computing and data storage media at decreasing cost is making possible the proliferation of large-scale numerical databases and data warehouses. Numeric warehousing enterprises on the order of hundreds of gigabytes to terabytes are a reality in many fields such as finance, retail sales, process systems monitoring, biomedical monitoring, surveillance and transportation. Large-scale databases are becoming more accessible to larger user communities through the internet, web-based applications and database connectivity. Consequently, most researchers now have access to a variety of massive datasets. This trend will probably only continue to grow over the next several years. Unfortunately, the availability of integrated tools to explore, analyze and understand the data warehoused in these archives is lagging far behind the ability to gain access to the same data. In particular, locating and identifying patterns of interest in numerical time series data is an increasingly important problem for which there are few available techniques. Temporal pattern recognition poses many interesting problems in classification, segmentation, prediction, diagnosis and anomaly detection. This research focuses on the problem of classification or characterization of numerical time series data. Highway vehicles and their drivers are examples of complex dynamic systems (CDS) which are being used by transportation agencies for field testing to generate large-scale time series datasets. Tools for effective analysis of numerical time series in databases generated by highway vehicle systems are not yet available, or have not been adapted to the target problem domain. However, analysis tools from similar domains may be adapted to the problem of classification of numerical time series data.
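One of the simplest baselines for the time-series classification problem described is nearest-neighbour matching under Euclidean distance. A toy sketch, with invented speed-profile data and labels; it is not the method of the report, just a common starting point for this class of problem:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify_1nn(series, labelled):
    """Label `series` with the class of its nearest labelled example."""
    return min(labelled, key=lambda item: euclidean(series, item[0]))[1]

# Invented examples: 'steady' vs 'ramp' vehicle-speed profiles.
training = [
    ([50, 51, 50, 49], "steady"),
    ([10, 25, 40, 55], "ramp"),
]
print(classify_1nn([48, 50, 52, 50], training))  # -> steady
```

Real large-scale work replaces raw Euclidean distance with elastic measures such as dynamic time warping and with indexing structures, since pairwise comparison over terabyte archives does not scale.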
Do open access biomedical journals benefit smaller countries? The Slovenian experience.
Turk, Nana
2011-06-01
Scientists from smaller countries have problems gaining visibility for their research. Does open access publishing provide a solution? Slovenia is a small country with around 5000 medical doctors, 1300 dentists and 1000 pharmacists. A search of Slovenia's bibliographic database was carried out to identify all biomedical journals and those which are open access. Slovenia has 18 medical open access journals, but none has an impact factor and only 10 are indexed by Slovenian and international bibliographic databases. The visibility and quality of medical papers is poor. The solution might be to reduce the number of journals and encourage Slovenian scientists to publish their best articles in them. © 2011 The authors. Health Information and Libraries Journal © 2011 Health Libraries Group.
Building a biomedical cyberinfrastructure for collaborative research.
Schad, Peter A; Mobley, Lee Rivers; Hamilton, Carol M
2011-05-01
For the potential power of genome-wide association studies (GWAS) and translational medicine to be realized, the biomedical research community must adopt standard measures, vocabularies, and systems to establish an extensible biomedical cyberinfrastructure. Incorporating standard measures will greatly facilitate combining and comparing studies via meta-analysis. Incorporating consensus-based and well-established measures into various studies should reduce the variability across studies due to attributes of measurement, making findings across studies more comparable. This article describes two well-established consensus-based approaches to identifying standard measures and systems: PhenX (consensus measures for phenotypes and eXposures), and the Open Geospatial Consortium (OGC). NIH support for these efforts has produced the PhenX Toolkit, an assembled catalog of standard measures for use in GWAS and other large-scale genomic research efforts, and the RTI Spatial Impact Factor Database (SIFD), a comprehensive repository of geo-referenced variables and extensive meta-data that conforms to OGC standards. The need for coordinated development of cyberinfrastructure to support measures and systems that enhance collaboration and data interoperability is clear; this paper includes a discussion of standard protocols for ensuring data compatibility and interoperability. Adopting a cyberinfrastructure that includes standard measures and vocabularies, and open-source systems architecture, such as the two well-established systems discussed here, will enhance the potential of future biomedical and translational research. Establishing and maintaining the cyberinfrastructure will require a fundamental change in the way researchers think about study design, collaboration, and data storage and analysis. Copyright © 2011 American Journal of Preventive Medicine. Published by Elsevier Inc. All rights reserved.
Building a Biomedical Cyberinfrastructure for Collaborative Research
Schad, Peter A.; Mobley, Lee Rivers; Hamilton, Carol M.
2018-01-01
For the potential power of genome-wide association studies (GWAS) and translational medicine to be realized, the biomedical research community must adopt standard measures, vocabularies, and systems to establish an extensible biomedical cyberinfrastructure. Incorporating standard measures will greatly facilitate combining and comparing studies via meta-analysis, which is a means for deriving larger populations, needed for increased statistical power to detect less apparent and more complex associations (gene-environment interactions and polygenic gene-gene interactions). Incorporating consensus-based and well-established measures into various studies should reduce the variability across studies due to attributes of measurement, making findings across studies more comparable. This article describes two consensus-based approaches to establishing standard measures and systems: PhenX (consensus measures for Phenotypes and eXposures), and the Open Geospatial Consortium (OGC). National Institutes of Health support for these efforts has produced the PhenX Toolkit, an assembled catalog of standard measures for use in GWAS and other large-scale genomic research efforts, and the RTI Spatial Impact Factor Database (SIFD), a comprehensive repository of georeferenced variables and extensive metadata that conforms to OGC standards. The need for coordinated development of cyberinfrastructure to support collaboration and data interoperability is clear, and we discuss standard protocols for ensuring data compatibility and interoperability. Adopting a cyberinfrastructure that includes standard measures, vocabularies, and open-source systems architecture will enhance the potential of future biomedical and translational research. Establishing and maintaining the cyberinfrastructure will require a fundamental change in the way researchers think about study design, collaboration, and data storage and analysis. PMID:21521587
García-Remesal, M; Maojo, V; Billhardt, H; Crespo, J
2010-01-01
Bringing together structured and text-based sources is an exciting challenge for biomedical informaticians, since most relevant biomedical sources belong to one of these categories. In this paper we evaluate the feasibility of integrating relational and text-based biomedical sources using: i) an original logical schema acquisition method for textual databases developed by the authors, and ii) OntoFusion, a system originally designed by the authors for the integration of relational sources. We conducted an integration experiment involving a test set of seven differently structured sources covering the domain of genetic diseases. We used our logical schema acquisition method to generate schemas for all textual sources. The sources were integrated using the methods and tools provided by OntoFusion. The integration was validated using a test set of 500 queries. A panel of experts answered a questionnaire to evaluate i) the quality of the extracted schemas, ii) the query processing performance of the integrated set of sources, and iii) the relevance of the retrieved results. The results of the survey show that our method extracts coherent and representative logical schemas. Experts' feedback on the performance of the integrated system and the relevance of the retrieved results was also positive. Regarding the validation of the integration, the system successfully provided correct results for all queries in the test set. The results of the experiment suggest that text-based sources including a logical schema can be regarded as equivalent to structured databases. Using our method, previous research and existing tools designed for the integration of structured databases can be reused - possibly subject to minor modifications - to integrate differently structured sources.
SWS: accessing SRS sites contents through Web Services.
Romano, Paolo; Marra, Domenico
2008-03-26
Web Services and Workflow Management Systems can support the creation and deployment of network systems able to automate data analysis and retrieval processes in biomedical research. Web Services have been implemented at bioinformatics centres and workflow systems have been proposed for biological data analysis. New databanks are often developed with these technologies in mind, but many existing databases do not allow programmatic access. Only a fraction of available databanks can thus be queried through programmatic interfaces. SRS is a well-known indexing and search engine for biomedical databanks offering public access to many databanks and analysis tools. Unfortunately, these data are not easily and efficiently accessible through Web Services. We have developed 'SRS by WS' (SWS), a tool that makes information available in SRS sites accessible through Web Services. Information on known sites is maintained in a database, srsdb. SWS consists of a suite of Web Services that can query both srsdb, for information on sites and databases, and SRS sites. SWS returns results in a text-only format and can be accessed through a WSDL-compliant client. SWS enables interoperability between workflow systems and SRS implementations by also managing access to alternative sites, in order to cope with network and maintenance problems, and by selecting the most up-to-date among available systems. The development and implementation of Web Services that allow programmatic access to an exhaustive set of biomedical databases can significantly improve the automation of in-silico analysis. SWS supports this activity by making the biological databanks managed in public SRS sites available through a programmatic interface.
Asymmetric author-topic model for knowledge discovering of big data in toxicogenomics.
Chung, Ming-Hua; Wang, Yuping; Tang, Hailin; Zou, Wen; Basinger, John; Xu, Xiaowei; Tong, Weida
2015-01-01
The advancement of high-throughput screening technologies facilitates the generation of massive amounts of biological data, a big data phenomenon in biomedical science. Yet, researchers still heavily rely on keyword search and/or literature review to navigate the databases, and analyses are often done at a rather small scale. As a result, the rich information of a database has not been fully utilized, particularly the information embedded in the interactions between data points, which is largely ignored and buried. For the past 10 years, probabilistic topic modeling has been recognized as an effective machine learning algorithm for annotating the hidden thematic structure of massive collections of documents. The analogy between a text corpus and large-scale genomic data enables the application of text mining tools, like probabilistic topic models, to explore hidden patterns in genomic data and, by extension, altered biological functions. In this paper, we developed a generalized probabilistic topic model to analyze a toxicogenomics dataset that consists of a large number of gene expression profiles from rat livers treated with drugs at multiple doses and time-points. We discovered hidden patterns in gene expression associated with the effects of dose and time-point of treatment. Finally, we illustrated the ability of our model to identify evidence for a potential reduction in animal use.
Ontology-Oriented Programming for Biomedical Informatics.
Lamy, Jean-Baptiste
2016-01-01
Ontologies are now widely used in the biomedical domain. However, it is difficult to manipulate ontologies in a computer program and, consequently, it is not easy to integrate ontologies with databases or websites. Two main approaches have been proposed for accessing ontologies in a computer program: traditional API (Application Programming Interface) and ontology-oriented programming, either static or dynamic. In this paper, we will review these approaches and discuss their appropriateness for biomedical ontologies. We will also present an experience feedback about the integration of an ontology in a computer software during the VIIIP research project. Finally, we will present OwlReady, the solution we developed.
Increased co-first authorships in biomedical and clinical publications: a call for recognition.
Conte, Marisa L; Maat, Stacy L; Omary, M Bishr
2013-10-01
There has been a dramatic increase in the number and percentage of publications in biomedical and clinical journals in which two or more coauthors claim first authorship, with a change in some journals from no joint first authorship in 1990 to co-first authorship of >30% of all research publications in 2012. As biomedical and clinical research become increasingly complex and team-driven, and given the importance attributed to first authorship by grant reviewers and promotion and tenure committees, the time is ripe for journals, bibliographic databases, and authors to highlight equal first author contributions of published original research.
An Open-source Toolbox for Analysing and Processing PhysioNet Databases in MATLAB and Octave.
Silva, Ikaro; Moody, George B
The WaveForm DataBase (WFDB) Toolbox for MATLAB/Octave enables integrated access to PhysioNet's software and databases. Using the WFDB Toolbox for MATLAB/Octave, users have access to over 50 physiological databases in PhysioNet. The toolbox provides access to over 4 TB of biomedical signals including ECG, EEG, EMG, and PLETH. Additionally, most signals are accompanied by metadata such as medical annotations of clinical events: arrhythmias, sleep stages, seizures, hypotensive episodes, etc. Users of this toolbox should easily be able to reproduce, validate, and compare results published based on PhysioNet's software and databases.
2014-11-01
semantic type. Injury or Poisoning (inpo, T037); Anatomical Abnormality (anab, T190). Given a document D, a concept vector = {1, 2, … , ...integrating biomedical terminology. Nucleic Acids Research 32, Database issue (2004), 267-270. 5. Chapman, W.W., Hillert, D., Velupillai, S., et...Conference (TREC), (2011). 9. Koopman, B. and Zuccon, G. Understanding negation and family history to improve clinical information retrieval. Proceedings
A corpus for plant-chemical relationships in the biomedical domain.
Choi, Wonjun; Kim, Baeksoo; Cho, Hyejin; Lee, Doheon; Lee, Hyunju
2016-09-20
Plants are natural products that humans consume in various ways including food and medicine. They have a long empirical history of treating diseases with relatively few side effects. Based on these strengths, many studies have been performed to verify the effectiveness of plants in treating diseases. It is crucial to understand the chemicals contained in plants because these chemicals can regulate activities of proteins that are key factors in causing diseases. With the accumulation of a large volume of biomedical literature in various databases such as PubMed, it is possible to automatically extract relationships between plants and chemicals in a large-scale way if we apply a text mining approach. A cornerstone of achieving this task is a corpus of relationships between plants and chemicals. In this study, we first constructed a corpus for plant and chemical entities and for the relationships between them. The corpus contains 267 plant entities, 475 chemical entities, and 1,007 plant-chemical relationships (550 and 457 positive and negative relationships, respectively), which are drawn from 377 sentences in 245 PubMed abstracts. Inter-annotator agreement scores for the corpus among three annotators were measured. The simple percent agreement scores for entities and trigger words for the relationships were 99.6 and 94.8 %, respectively, and the overall kappa score for the classification of positive and negative relationships was 79.8 %. We also developed a rule-based model to automatically extract such plant-chemical relationships. When we evaluated the rule-based model using the corpus and randomly selected biomedical articles, overall F-scores of 68.0 and 61.8 % were achieved, respectively. We expect that the corpus for plant-chemical relationships will be a useful resource for enhancing plant research. The corpus is available at http://combio.gist.ac.kr/plantchemicalcorpus .
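The kappa score reported for the positive/negative relationship classification can be computed directly from two annotators' label sequences. A minimal two-annotator (Cohen's) sketch; the label lists are invented, not the corpus data:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators' label sequences of equal length."""
    assert len(a) == len(b)
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n       # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)    # chance agreement
    return (p_o - p_e) / (1 - p_e)

ann1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
ann2 = ["pos", "neg", "neg", "neg", "pos", "neg"]
print(round(cohens_kappa(ann1, ann2), 3))  # -> 0.667
```

Note that the study involved three annotators, for which a multi-rater statistic such as Fleiss' kappa would be the direct analogue; the chance-correction idea is the same.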
The curation of genetic variants: difficulties and possible solutions.
Pandey, Kapil Raj; Maden, Narendra; Poudel, Barsha; Pradhananga, Sailendra; Sharma, Amit Kumar
2012-12-01
The curation of genetic variants from biomedical articles is required for various clinical and research purposes. Nowadays, establishment of variant databases that include overall information about variants is becoming quite popular. These databases have immense utility, serving as a user-friendly information storehouse of variants for information seekers. While manual curation is the gold standard method for curation of variants, it can turn out to be time-consuming on a large scale thus necessitating the need for automation. Curation of variants described in biomedical literature may not be straightforward mainly due to various nomenclature and expression issues. Though current trends in paper writing on variants is inclined to the standard nomenclature such that variants can easily be retrieved, we have a massive store of variants in the literature that are present as non-standard names and the online search engines that are predominantly used may not be capable of finding them. For effective curation of variants, knowledge about the overall process of curation, nature and types of difficulties in curation, and ways to tackle the difficulties during the task are crucial. Only by effective curation, can variants be correctly interpreted. This paper presents the process and difficulties of curation of genetic variants with possible solutions and suggestions from our work experience in the field including literature support. The paper also highlights aspects of interpretation of genetic variants and the importance of writing papers on variants following standard and retrievable methods. Copyright © 2012. Published by Elsevier Ltd.
Smartphone home monitoring of ECG
NASA Astrophysics Data System (ADS)
Szu, Harold; Hsu, Charles; Moon, Gyu; Landa, Joseph; Nakajima, Hiroshi; Hata, Yutaka
2012-06-01
Ambulatory Holter electrocardiography (ECG) monitoring systems that record and transmit heartbeat data over the Internet are already commercially available. However, they enjoy only qualified confidence and thus limited market penetration. Our system instead targets aging global villagers with growing biomedical wellness (BMW) homecare needs, rather than hospital-related biomedical illness (BMI). It was designed within SWaP-C (Size, Weight, Power, and Cost) constraints using three innovative modules: (i) a Smart Electrode (low-power mixed-signal hardware embedding modern compressive sensing and nanotechnology to improve the electrodes' contact impedance); (ii) a Learnable Database (adaptive wavelet-transform QRST feature extraction and a Structured Query Language relational database allowing home-care monitoring with retrievable aided target recognition); (iii) a Smartphone (touch-screen interface, powerful computation capability, and caretaker reporting with GPS, ID, and a patient panic button for a programmable emergency procedure). It can provide a supplementary home screening system for pre- or post-diagnosis care at home, with a built-in database searchable by the time, the place, and the degree of urgency of each event, using in-situ screening.
Biomedical Requirements for High Productivity Computing Systems
2005-04-01
server at http://www.ncbi.nlm.nih.gov/BLAST/. There are many variants of BLAST, including: 1. BLASTN - compares a DNA query to a DNA database. Searches ... database (3 reading frames from each strand of the DNA) ... 4. TBLASTN - compares a protein query to a DNA database, in the 6 possible ... the molecule during this phase. After eliminating molecules that could not match the query, an atom-by-atom search of the remaining molecules is conducted
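The BLAST variants listed in this fragment differ only in which sequence types the query and the database hold. A small lookup capturing that standard correspondence (the program names are NCBI's; the string keys are labels of our own choosing):

```python
# "dna-translated" means the DNA is translated in all reading frames
# (3 per strand, 6 in total) before protein-level comparison.
BLAST_PROGRAMS = {
    ("dna", "dna"): "blastn",
    ("protein", "protein"): "blastp",
    ("dna-translated", "protein"): "blastx",
    ("protein", "dna-translated"): "tblastn",
    ("dna-translated", "dna-translated"): "tblastx",
}

def pick_blast_program(query_type, db_type):
    """Return the BLAST variant for a (query, database) sequence-type pair."""
    try:
        return BLAST_PROGRAMS[(query_type, db_type)]
    except KeyError:
        raise ValueError(f"no BLAST program for {query_type!r} vs {db_type!r}")
```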
Biomedical journals in Republic of Macedonia: the current state.
Polenakovic, Momir; Danevska, Lenche
2014-01-01
Several biomedical journals in the Republic of Macedonia have succeeded in maintaining regular publication over the years, but only a few have a long-standing tradition. In this paper we present the basic characteristics of 18 biomedical journals that have been published without a break in the Republic of Macedonia. Of these, more details are given for 14 journals, a particular emphasis being on the journal Prilozi/Contributions of the Macedonian Academy of Sciences and Arts, Section of Medical Sciences as one of the journals with a long-term publishing tradition and one of the journals included in the Medline/PubMed database. A brief or broad description is given for the following journals: Macedonian Medical Review, Acta Morphologica, Physioacta, MJMS-Macedonian Journal of Medical Sciences, International Medical Journal Medicus, Archives of Public Health, Epilepsy, Macedonian Orthopaedics and Traumatology Journal, BANTAO Journal, Macedonian Dental Review, Macedonian Pharmaceutical Bulletin, Macedonian Veterinary Review, Journal of Special Education and Rehabilitation, Balkan Journal of Medical Genetics, Contributions of the Macedonian Scientific Society of Bitola, Vox Medici, Social Medicine: Professional Journal for Public Health, and Prilozi/Contributions of the Macedonian Academy of Sciences and Arts. Journals from Macedonia should aim to be published regularly, should comply with the Uniform requirements for manuscripts submitted to biomedical journals, and with the recommendations of reliable organizations working in the field of publishing and research. These are the key prerequisites which Macedonian journals have to accomplish in order to be included in renowned international bibliographic databases. Thus the results of biomedical science from the Republic of Macedonia will be presented to the international scientific arena.
Ponnaiah, Paulraj; Vnoothenei, Nagiah; Chandramohan, Muruganandham; Thevarkattil, Mohamed Javad Pazhayakath
2018-01-30
Polyhydroxyalkanoates (PHA) are bio-based, biodegradable, naturally occurring polymers produced by a wide range of organisms, from bacteria to higher mammals. The properties and biocompatibility of PHA make a wide spectrum of applications possible. In this context, we analyze the potential applications of PHA in biomedical science by exploring the global trend through a patent survey. The survey suggests that PHA is an attractive candidate whose applications are widely distributed across the medical industry, drug delivery systems, dental materials, tissue engineering, packaging materials, and other useful products. In the present study, we explored patents associated with various biomedical applications of polyhydroxyalkanoates. The patent databases of the European Patent Office, the United States Patent and Trademark Office, and the World Intellectual Property Organization were mined. We developed an intensive exploration approach to eliminate overlapping patents and sort out significant ones. We defined the keywords and search criteria and established search patterns for the database requests. We retrieved documents from the recent years 2010 to 2016 and sorted the collected data stepwise to gather the most appropriate documents in patent families for further scrutiny. By this approach, we retrieved 23,368 patent documents from the three databases, and the patent titles were further analyzed for their relevance to biomedical applications of polyhydroxyalkanoates. This resulted in the documentation of approximately 226 significant patents associated with biomedical applications of polyhydroxyalkanoates, and the information was classified into six major groups.
There are many avenues through which PHA and PHB could be used. Our analysis shows that patent information can be used to identify the various applications of PHA and its derivatives in the biomedical field. Upcoming studies can focus on applications of PHA in different fields to discover related topics and connect them to this study. We believe that this approach and these findings can encourage new researchers to undertake similar studies in their respective fields, filling the gap between patent documents and research publications. Copyright © Bentham Science Publishers; for any queries, please email epub@benthamscience.org.
Biomedical waste management in Ayurveda hospitals - current practices & future prospectives.
Rajan, Renju; Robin, Delvin T; M, Vandanarani
2018-03-16
Biomedical waste management is an integral part of traditional and contemporary system of health care. The paper focuses on the identification and classification of biomedical wastes in Ayurvedic hospitals, current practices of its management in Ayurveda hospitals and its future prospective. Databases like PubMed (1975-2017 Feb), Scopus (1960-2017), AYUSH Portal, DOAJ, DHARA and Google scholar were searched. We used the medical subject headings 'biomedical waste' and 'health care waste' for identification and classification. The terms 'biomedical waste management', 'health care waste management' alone and combined with 'Ayurveda' or 'Ayurvedic' for current practices and recent advances in the treatment of these wastes were used. We made a humble attempt to categorize the biomedical wastes from Ayurvedic hospitals as the available data about its grouping is very scarce. Proper biomedical waste management is the mainstay of hospital cleanliness, hospital hygiene and maintenance activities. Current disposal techniques adopted for Ayurveda biomedical wastes are - sewage/drains, incineration and land fill. But these methods are having some merits as well as demerits. Our review has identified a number of interesting areas for future research such as the logical application of bioremediation techniques in biomedical waste management and the usage of effective micro-organisms and solar energy in waste disposal. Copyright © 2017 Transdisciplinary University, Bangalore and World Ayurveda Foundation. Published by Elsevier B.V. All rights reserved.
A National Virtual Specimen Database for Early Cancer Detection
NASA Technical Reports Server (NTRS)
Crichton, Daniel; Kincaid, Heather; Kelly, Sean; Thornquist, Mark; Johnsey, Donald; Winget, Marcy
2003-01-01
Access to biospecimens is essential for enabling cancer biomarker discovery. The National Cancer Institute's (NCI) Early Detection Research Network (EDRN) comprises and integrates a large number of laboratories into a network in order to establish a collaborative scientific environment to discover and validate disease markers. The diversity of both the institutions and the collaborative focus has created the need for cross-disciplinary teams that integrate expertise in biomedical research, computational science and biostatistics, and computer science. Given the collaborative design of the network, the EDRN needed an informatics infrastructure. The Fred Hutchinson Cancer Research Center, the National Cancer Institute, and NASA's Jet Propulsion Laboratory (JPL) teamed up to build an informatics infrastructure creating a collaborative, science-driven research environment despite the geographic separation and structural differences of the information systems that existed within the diverse network. EDRN investigators identified the need to share biospecimen data captured across the country and managed in disparate databases. As a result, the informatics team initiated an effort to create a virtual tissue database whereby scientists could search and locate details about specimens held at collaborating laboratories. Each database, however, was locally implemented and integrated into collection processes and methods unique to each institution. This meant that efforts to integrate databases needed to be done in a manner that did not require redesign or re-implementation of existing systems.
G-Bean: an ontology-graph based web tool for biomedical literature retrieval
Wang, James Z; Zhang, Yuanyuan; Dong, Liang; Li, Lin; Srimani, Pradip K; Yu, Philip S
2014-01-01
Background Currently, most people use NCBI's PubMed to search the MEDLINE database, an important bibliographical information source for life science and biomedical information. However, PubMed has some drawbacks that make it difficult to find relevant publications pertaining to users' individual intentions, especially for non-expert users. To ameliorate the disadvantages of PubMed, we developed G-Bean, a graph based biomedical search engine, to search biomedical articles in MEDLINE database more efficiently. Methods G-Bean addresses PubMed's limitations with three innovations: (1) Parallel document index creation: a multithreaded index creation strategy is employed to generate the document index for G-Bean in parallel; (2) Ontology-graph based query expansion: an ontology graph is constructed by merging four major UMLS (Version 2013AA) vocabularies, MeSH, SNOMEDCT, CSP and AOD, to cover all concepts in National Library of Medicine (NLM) database; a Personalized PageRank algorithm is used to compute concept relevance in this ontology graph and the Term Frequency - Inverse Document Frequency (TF-IDF) weighting scheme is used to re-rank the concepts. The top 500 ranked concepts are selected for expanding the initial query to retrieve more accurate and relevant information; (3) Retrieval and re-ranking of documents based on user's search intention: after the user selects any article from the existing search results, G-Bean analyzes user's selections to determine his/her true search intention and then uses more relevant and more specific terms to retrieve additional related articles. The new articles are presented to the user in the order of their relevance to the already selected articles. Results Performance evaluation with 106 OHSUMED benchmark queries shows that G-Bean returns more relevant results than PubMed does when using these queries to search the MEDLINE database. 
PubMed could not even return any search result for some OHSUMED queries because it failed to form the appropriate Boolean query statement automatically from the natural language query strings. G-Bean is available at http://bioinformatics.clemson.edu/G-Bean/index.php. Conclusions G-Bean addresses PubMed's limitations with ontology-graph based query expansion, automatic document indexing, and user search intention discovery. It shows significant advantages in finding relevant articles from the MEDLINE database to meet the information need of the user. PMID:25474588
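The concept-ranking step described above can be illustrated with a generic power-iteration Personalized PageRank, where teleportation returns only to the query's seed concepts. This is a from-scratch sketch on a toy concept graph, not G-Bean's implementation (which ranks a merged UMLS ontology graph and then re-weights with TF-IDF):

```python
def personalized_pagerank(graph, seeds, damping=0.85, iters=50):
    """Power-iteration Personalized PageRank on an adjacency-list graph.
    Teleportation jumps back only to the seed (query) concepts; mass that
    reaches dangling nodes is simply allowed to decay, which is acceptable
    for ranking purposes."""
    seed_mass = 1.0 / len(seeds)
    rank = {n: (seed_mass if n in seeds else 0.0) for n in graph}
    for _ in range(iters):
        new = {n: (1 - damping) * (seed_mass if n in seeds else 0.0)
               for n in graph}
        for n, out in graph.items():
            if out:
                share = damping * rank[n] / len(out)
                for m in out:
                    new[m] += share
        rank = new
    return rank

# Toy concept graph; "neoplasm" plays the role of the query concept.
graph = {"neoplasm": ["carcinoma"],
         "carcinoma": ["neoplasm", "adenoma"],
         "adenoma": []}
rank = personalized_pagerank(graph, seeds=["neoplasm"])
```

In a real system the top-ranked concepts (G-Bean keeps 500) would then expand the user's query.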
Mrozek, Dariusz; Małysiak-Mrozek, Bożena; Siążnik, Artur
2013-03-01
Due to the growing number of biomedical entries in data repositories of the National Center for Biotechnology Information (NCBI), it is difficult to collect, manage and process all of these entries in one place by third-party software developers without significant investment in hardware and software infrastructure, its maintenance and administration. Web services allow development of software applications that integrate in one place the functionality and processing logic of distributed software components, without integrating the components themselves and without integrating the resources to which they have access. This is achieved by appropriate orchestration or choreography of available Web services and their shared functions. After the successful application of Web services in the business sector, this technology can now be used to build composite software tools that are oriented towards biomedical data processing. We have developed a new tool for efficient and dynamic data exploration in GenBank and other NCBI databases. A dedicated search GenBank system makes use of NCBI Web services and a package of Entrez Programming Utilities (eUtils) in order to provide extended searching capabilities in NCBI data repositories. In search GenBank users can use one of the three exploration paths: simple data searching based on the specified user's query, advanced data searching based on the specified user's query, and advanced data exploration with the use of macros. search GenBank orchestrates calls of particular tools available through the NCBI Web service providing requested functionality, while users interactively browse selected records in search GenBank and traverse between NCBI databases using available links. On the other hand, by building macros in the advanced data exploration mode, users create choreographies of eUtils calls, which can lead to the automatic discovery of related data in the specified databases. 
search GenBank extends the standard capabilities of the NCBI Entrez search engine in querying biomedical databases. The possibility of creating and saving macros in search GenBank is a unique feature with great potential, which will grow further as the networks of relationships between data stored in particular databases become denser. search GenBank is available for public use at http://sgb.biotools.pl/. PMID:23452691
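The eUtils orchestration that search GenBank builds on ultimately comes down to composing HTTP requests against NCBI's Entrez endpoints. A sketch that only constructs ESearch and ELink URLs (no network call is made; the query terms and UID are illustrative):

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def esearch_url(db, term, retmax=20):
    """Build an Entrez ESearch request URL for a database query."""
    query = urlencode({"db": db, "term": term,
                       "retmax": retmax, "retmode": "json"})
    return f"{EUTILS}/esearch.fcgi?{query}"

def elink_url(dbfrom, db, uid):
    """Build an ELink URL for traversing from one NCBI database to another."""
    query = urlencode({"dbfrom": dbfrom, "db": db, "id": uid})
    return f"{EUTILS}/elink.fcgi?{query}"

# A two-step "macro": find nucleotide records, then hop to linked PubMed entries.
step1 = esearch_url("nucleotide", "BRCA1[Gene Name] AND human[Organism]")
step2 = elink_url("nucleotide", "pubmed", "1234567")
```

Chaining such calls, where the IDs returned by one step feed the next, is essentially what a saved macro automates.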
Securely and Flexibly Sharing a Biomedical Data Management System
Wang, Fusheng; Hussels, Phillip; Liu, Peiya
2011-01-01
Biomedical database systems need not only to address the issues of managing complex data, but also to provide data security and access control. These include not only system-level security, but also instance-level access control, such as access to documents, schemas, or aggregations of information. The latter is becoming more important as multiple users can share a single scientific data management system to conduct their research, while data have to be protected before they are published or IP-protected. This problem is challenging because users' needs for data security vary dramatically from one application to another, in terms of whom to share with, what resources to share, and at what access level. We develop a comprehensive data access framework for the biomedical data management system SciPort. SciPort provides fine-grained, multi-level, space-based access control of resources not only at the object level (documents and schemas) but also at the space level (resource sets aggregated hierarchically). Furthermore, to simplify the management of users and privileges, a customizable role-based user model is developed. Access control is implemented efficiently by integrating access privileges into the backend XML database, so efficient queries are supported. The secure access approach we take makes it possible for multiple users to share the same biomedical data management system with flexible access management and high data security. PMID:21625285
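Hierarchical, space-based access control of the kind described here is easy to model: a grant on a space covers everything aggregated beneath it. A toy sketch under our own assumptions (SciPort's actual model, levels, and names differ, and its privileges live inside the XML database rather than in application code):

```python
class SpaceACL:
    """Toy space-based ACL: grants on a space are inherited by all
    descendant spaces and the objects inside them."""

    LEVELS = {"read": 1, "write": 2}  # write implies read

    def __init__(self):
        self.parent = {}   # space -> parent space (None at the root)
        self.grants = {}   # (user, space) -> granted level

    def add_space(self, space, parent=None):
        self.parent[space] = parent

    def grant(self, user, space, level):
        self.grants[(user, space)] = level

    def can(self, user, space, level):
        """Walk up the hierarchy looking for a sufficient grant."""
        s = space
        while s is not None:
            got = self.grants.get((user, s))
            if got is not None and self.LEVELS[got] >= self.LEVELS[level]:
                return True
            s = self.parent.get(s)
        return False

acl = SpaceACL()
acl.add_space("lab")
acl.add_space("project", parent="lab")
acl.add_space("document", parent="project")
acl.grant("alice", "lab", "read")       # read-only on the whole lab space
acl.grant("bob", "project", "write")    # write on one project subtree
```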
The BioIntelligence Framework: a new computational platform for biomedical knowledge computing
Farley, Toni; Kiefer, Jeff; Lee, Preston; Von Hoff, Daniel; Trent, Jeffrey M; Colbourn, Charles
2013-01-01
Breakthroughs in molecular profiling technologies are enabling a new data-intensive approach to biomedical research, with the potential to revolutionize how we study, manage, and treat complex diseases. The next great challenge for clinical applications of these innovations will be to create scalable computational solutions for intelligently linking complex biomedical patient data to clinically actionable knowledge. Traditional database management systems (DBMS) are not well suited to representing complex syntactic and semantic relationships in unstructured biomedical information, introducing barriers to realizing such solutions. We propose a scalable computational framework for addressing this need, which leverages a hypergraph-based data model and query language that may be better suited for representing complex multi-lateral, multi-scalar, and multi-dimensional relationships. We also discuss how this framework can be used to create rapid learning knowledge base systems to intelligently capture and relate complex patient data to biomedical knowledge in order to automate the recovery of clinically actionable information. PMID:22859646
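The appeal of a hypergraph data model is that one edge can relate any number of entities at once (multi-lateral relationships), which a binary graph or relational row handles awkwardly. A minimal illustration, with invented biomedical node names and no claim to match the BioIntelligence Framework's actual data model or query language:

```python
class Hypergraph:
    """Minimal hypergraph: each hyperedge links an arbitrary set of nodes."""

    def __init__(self):
        self.edges = {}  # edge id -> set of member nodes

    def add_edge(self, eid, nodes):
        self.edges[eid] = set(nodes)

    def incident_edges(self, node):
        """All hyperedges a node participates in."""
        return {eid for eid, members in self.edges.items() if node in members}

    def neighbors(self, node):
        """All nodes co-occurring with `node` in at least one hyperedge."""
        out = set()
        for eid in self.incident_edges(node):
            out |= self.edges[eid] - {node}
        return out

hg = Hypergraph()
# One clinical finding ties a patient, a variant, and a therapy together.
hg.add_edge("finding1", ["patient42", "BRCA1_variant", "olaparib"])
hg.add_edge("finding2", ["patient42", "TP53_variant"])
```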
GDRMS: a system for automatic extraction of the disease-centre relation
NASA Astrophysics Data System (ADS)
Yang, Ronggen; Zhang, Yue; Gong, Lejun
2012-01-01
With the rapid growth of the biomedical literature, the deluge of new articles is leading to information overload. Extracting the available knowledge from this huge amount of literature has become a major challenge. GDRMS is a tool that extracts relationships between diseases and genes, and between genes, from the biomedical literature using text mining technology. It is a rule-based system that also provides disease-centred network visualization, constructs a disease-gene database, and offers a gene engine for understanding gene function. The main focus of GDRMS is to give the research community a valuable opportunity to explore disease-gene relationships in studying the etiology of disease.
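Rule-based relation extraction of this kind typically matches trigger phrases between recognized entities. A deliberately simple regex sketch in that spirit (the pattern, trigger words, and example sentence are our own; GDRMS's actual rules are not given in the abstract):

```python
import re

# Hypothetical rule: GENE <trigger phrase> DISEASE within one sentence.
# Gene names are approximated as short all-caps alphanumeric symbols.
PATTERN = re.compile(
    r"(?P<gene>[A-Z][A-Z0-9]{1,7})\s+"
    r"(?:is associated with|causes|is linked to)\s+"
    r"(?P<disease>[a-z][a-z ]+)"
)

def extract_relations(sentence):
    """Return (gene, disease) pairs matched by the trigger-word rule."""
    return [(m.group("gene"), m.group("disease").strip())
            for m in PATTERN.finditer(sentence)]

rels = extract_relations("BRCA1 is associated with breast cancer.")
```

Real systems add dictionaries of gene/disease names and negation handling; this shows only the pattern-matching core.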
Chen, R S; Nadkarni, P; Marenco, L; Levin, F; Erdos, J; Miller, P L
2000-01-01
The entity-attribute-value representation with classes and relationships (EAV/CR) provides a flexible and simple database schema to store heterogeneous biomedical data. In certain circumstances, however, the EAV/CR model is known to retrieve data less efficiently than conventionally based database schemas. To perform a pilot study that systematically quantifies performance differences for database queries directed at real-world microbiology data modeled with EAV/CR and conventional representations, and to explore the relative merits of different EAV/CR query implementation strategies. Clinical microbiology data obtained over a ten-year period were stored using both database models. Query execution times were compared for four clinically oriented attribute-centered and entity-centered queries operating under varying conditions of database size and system memory. The performance characteristics of three different EAV/CR query strategies were also examined. Performance was similar for entity-centered queries in the two database models. Performance in the EAV/CR model was approximately three to five times less efficient than its conventional counterpart for attribute-centered queries. The differences in query efficiency became slightly greater as database size increased, although they were reduced with the addition of system memory. The authors found that EAV/CR queries formulated using multiple, simple SQL statements executed in batch were more efficient than single, large SQL statements. This paper describes a pilot project to explore issues in and compare query performance for EAV/CR and conventional database representations. Although attribute-centered queries were less efficient in the EAV/CR model, these inefficiencies may be addressable, at least in part, by the use of more powerful hardware or more memory, or both.
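The attribute-centered inefficiency this study measures comes from the self-joins an EAV layout forces: every additional attribute constraint joins the same long, narrow table to itself. A minimal SQLite illustration with invented microbiology rows (the real study's schema and data are of course richer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Generic EAV table: one row per (entity, attribute, value) fact.
cur.execute("CREATE TABLE eav (entity TEXT, attribute TEXT, value TEXT)")
cur.executemany("INSERT INTO eav VALUES (?, ?, ?)", [
    ("isolate1", "organism", "E. coli"),
    ("isolate1", "specimen", "blood"),
    ("isolate2", "organism", "S. aureus"),
    ("isolate2", "specimen", "urine"),
])

# Attribute-centered query: entities matching TWO attribute constraints.
# In a conventional schema this is a single WHERE clause on one wide row;
# in EAV it needs one self-join per constrained attribute.
cur.execute("""
    SELECT o.entity
    FROM eav AS o
    JOIN eav AS s ON o.entity = s.entity
    WHERE o.attribute = 'organism' AND o.value = 'E. coli'
      AND s.attribute = 'specimen' AND s.value = 'blood'
""")
matches = [row[0] for row in cur.fetchall()]
```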
Karabulut, Nevzat
2017-03-01
The aim of this study is to investigate the frequency of incorrect citations and its effects on the impact factor of a specific biomedical journal: the American Journal of Roentgenology. The Cited Reference Search function of Thomson Reuters' Web of Science database (formerly the Institute for Scientific Information's Web of Knowledge database) was used to identify erroneous citations. This was done by entering the journal name into the Cited Work field and entering "2011-2012" into the Cited Year(s) field. The errors in any part of the inaccurately cited references (e.g., author names, title, year, volume, issue, and page numbers) were recorded, and the types of errors (i.e., absent, deficient, or mistyped) were analyzed. Erroneous citations were corrected using the Suggest a Correction function of the Web of Science database. The effect of inaccurate citations on the impact factor of the AJR was calculated. Overall, 183 of 1055 citable articles published in 2011-2012 were inaccurately cited 423 times (mean [± SD], 2.31 ± 4.67 times; range, 1-44 times). Of these 183 articles, 110 (60.1%) were web-only articles and 44 (24.0%) were print articles. The most commonly identified errors were page number errors (44.8%) and misspelling of an author's name (20.2%). Incorrect citations adversely affected the impact factor of the AJR by 0.065 in 2012 and by 0.123 in 2013. Inaccurate citations are not infrequent in biomedical journals, yet they can be detected and corrected using the Web of Science database. Although the accuracy of references is primarily the responsibility of authors, the journal editorial office should also define a periodic inaccurate citation check task and correct erroneous citations to reclaim unnecessarily lost credit.
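The arithmetic behind the reported impact-factor loss is simple division. A sketch of the two quantities involved; the figure of roughly 130 unmatched citations is our own back-calculation from the abstract's 1055 citable 2011-2012 articles and the 0.123 loss for 2013, not a number the study states:

```python
def impact_factor(citations, citable_items):
    """Impact factor for year Y: citations received in Y to items published
    in Y-1 and Y-2, divided by the citable items of Y-1 and Y-2."""
    return citations / citable_items

def if_loss(lost_citations, citable_items):
    """Impact-factor credit forfeited when citations are too garbled
    for the citation index to match them to the journal."""
    return lost_citations / citable_items

# ~130 unmatched citations against 1055 citable articles -> ~0.123 lost
loss = if_loss(130, 1055)
```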
Shatkay, Hagit; Pan, Fengxia; Rzhetsky, Andrey; Wilbur, W. John
2008-01-01
Motivation: Much current research in biomedical text mining is concerned with serving biologists by extracting certain information from scientific text. We note that there is no ‘average biologist’ client; different users have distinct needs. For instance, as noted in past evaluation efforts (BioCreative, TREC, KDD) database curators are often interested in sentences showing experimental evidence and methods. Conversely, lab scientists searching for known information about a protein may seek facts, typically stated with high confidence. Text-mining systems can target specific end-users and become more effective, if the system can first identify text regions rich in the type of scientific content that is of interest to the user, retrieve documents that have many such regions, and focus on fact extraction from these regions. Here, we study the ability to characterize and classify such text automatically. We have recently introduced a multi-dimensional categorization and annotation scheme, developed to be applicable to a wide variety of biomedical documents and scientific statements, while intended to support specific biomedical retrieval and extraction tasks. Results: The annotation scheme was applied to a large corpus in a controlled effort by eight independent annotators, where three individual annotators independently tagged each sentence. We then trained and tested machine learning classifiers to automatically categorize sentence fragments based on the annotation. We discuss here the issues involved in this task, and present an overview of the results. The latter strongly suggest that automatic annotation along most of the dimensions is highly feasible, and that this new framework for scientific sentence categorization is applicable in practice. Contact: shatkay@cs.queensu.ca PMID:18718948
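Automatic categorization of sentence fragments as described here is a standard supervised text-classification task. As an illustration only (the paper's actual classifiers, features, and category dimensions differ), a tiny multinomial Naive Bayes over bag-of-words features with invented training sentences for two of the kinds of content mentioned, experimental evidence vs. stated fact:

```python
import math
from collections import Counter, defaultdict

class TinyNB:
    """Minimal multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        self.vocab = set()
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        words = text.lower().split()
        total = sum(self.label_counts.values())
        best, best_lp = None, float("-inf")
        for label in self.label_counts:
            lp = math.log(self.label_counts[label] / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in words:
                lp += math.log((self.word_counts[label][w] + 1) / denom)
            if lp > best_lp:
                best, best_lp = label, lp
        return best

clf = TinyNB().fit(
    ["we measured binding affinity in triplicate",
     "results were assayed by western blot",
     "p53 is a tumour suppressor",
     "BRCA1 mutations confer cancer risk"],
    ["evidence", "evidence", "fact", "fact"])
```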
Offerhaus, L
1989-06-01
The problems of the direct composition of a biomedical manuscript on a personal computer are discussed. Most word processing software is unsuitable because literature references, once stored, cannot be rearranged if major changes are necessary. These obstacles have been overcome in Manuscript Manager, a combination of word processing and database software. As it follows Council of Biology Editors and Vancouver rules, the printouts should be technically acceptable to most leading biomedical journals.
Abstracting data warehousing issues in scientific research.
Tews, Cody; Bracio, Boris R
2002-01-01
This paper presents the design and implementation of the Idaho Biomedical Data Management System (IBDMS). This system preprocesses biomedical data from the IMPROVE (Improving Control of Patient Status in Critical Care) library via an Open Database Connectivity (ODBC) connection. The ODBC connection allows local and remote simulations to access filtered, joined, and sorted data using the Structured Query Language (SQL). The tool provides an overview of the available data in addition to user-defined data subsets for the verification of models of the human respiratory system.
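The filtered, joined, and sorted access pattern the abstract describes can be sketched with the standard library's sqlite3 module standing in for the ODBC connection; the table and column names below are invented for the example, not taken from the IBDMS:

```python
import sqlite3

# Illustrative filter/join/sort query of the kind a simulation client might
# issue; sqlite3 plays the role of the ODBC-connected database here.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE respiration (patient_id INTEGER, t REAL, flow REAL);
    INSERT INTO patients VALUES (1, 'A'), (2, 'B');
    INSERT INTO respiration VALUES (1, 0.0, 0.41), (1, 0.5, 0.47),
                                   (2, 0.0, 0.33), (2, 0.5, 0.52);
""")
# Filtered, joined, and sorted subset handed to a simulation:
rows = con.execute("""
    SELECT p.name, r.t, r.flow
    FROM respiration AS r JOIN patients AS p ON p.id = r.patient_id
    WHERE r.flow > 0.4
    ORDER BY p.name, r.t
""").fetchall()
print(rows)  # [('A', 0.0, 0.41), ('A', 0.5, 0.47), ('B', 0.5, 0.52)]
```

The point of routing access through SQL rather than raw files is that the filtering and joining happen in the database, so local and remote clients see the same prepared subset.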
A recent advance in the automatic indexing of the biomedical literature.
Névéol, Aurélie; Shooshan, Sonya E; Humphrey, Susanne M; Mork, James G; Aronson, Alan R
2009-10-01
The volume of biomedical literature has experienced explosive growth in recent years. This is reflected in the corresponding increase in the size of MEDLINE, the largest bibliographic database of biomedical citations. Indexers at the US National Library of Medicine (NLM) need efficient tools to help them accommodate the ensuing workload. After reviewing issues in the automatic assignment of Medical Subject Headings (MeSH terms) to biomedical text, we focus more specifically on the new subheading attachment feature for NLM's Medical Text Indexer (MTI). Natural Language Processing, statistical, and machine learning methods of producing automatic MeSH main heading/subheading pair recommendations were assessed independently and combined. The best combination achieves 48% precision and 30% recall. After validation by NLM indexers, a suitable combination of the methods presented in this paper was integrated into MTI as a subheading attachment feature producing MeSH indexing recommendations compliant with current state-of-the-art indexing practice.
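As a worked example of the figures above: the abstract reports 48% precision and 30% recall for the best combination but no single combined score; the conventional F1 summary (not part of the paper itself) works out as follows:

```python
# F1 is the harmonic mean of precision and recall, a standard single-number
# summary for indexing recommendations like the MTI subheading feature's.

def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.48, 0.30), 3))  # 0.369
```

The harmonic mean penalizes imbalance, so the modest 30% recall pulls the combined score well below the 48% precision.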
Conceptual biology, hypothesis discovery, and text mining: Swanson's legacy.
Bekhuis, Tanja
2006-04-03
Innovative biomedical librarians and information specialists who want to expand their roles as expert searchers need to know about profound changes in biology and parallel trends in text mining. In recent years, conceptual biology has emerged as a complement to empirical biology. This is partly in response to the availability of massive digital resources such as the network of databases for molecular biologists at the National Center for Biotechnology Information. Developments in text mining and hypothesis discovery systems based on the early work of Swanson, a mathematician and information scientist, are coincident with the emergence of conceptual biology. Very little has been written to introduce biomedical digital librarians to these new trends. In this paper, background on data and text mining, as well as on knowledge discovery in databases (KDD) and in text (KDT), is presented, followed by a brief review of Swanson's ideas and a discussion of recent approaches to hypothesis discovery and testing. 'Testing' in the context of text mining involves partially automated methods for finding evidence in the literature to support hypothetical relationships. Concluding remarks follow regarding (a) the limits of current strategies for evaluating hypothesis discovery systems and (b) the role of literature-based discovery in concert with empirical research. A report of an informatics-driven literature review of biomarkers for systemic lupus erythematosus is mentioned. Swanson's vision of the hidden value in the literature of science and, by extension, in biomedical digital databases, is still remarkably generative for information scientists, biologists, and physicians.
Text Mining the Biomedical Literature
2007-11-05
State of reporting of primary biomedical research: a scoping review protocol
Mbuagbaw, Lawrence; Samaan, Zainab; Jin, Yanling; Nwosu, Ikunna; Levine, Mitchell A H; Adachi, Jonathan D; Thabane, Lehana
2017-01-01
Introduction Incomplete or inconsistent reporting remains a major concern in the biomedical literature and may render published findings unreliable, irreproducible or even misleading. Drawing on evidence from systematic reviews and surveys that have evaluated reporting issues in primary biomedical studies, we aim to conduct a scoping review focusing on (1) the extent of adherence to emerging reporting guidelines in primary biomedical research, (2) the inconsistency between protocols or registrations and full reports and (3) the disagreement between abstracts and full-text articles. Methods and analyses We will use a comprehensive search strategy to retrieve all available and eligible systematic reviews and surveys in the literature. We will search the following electronic databases: Web of Science, Excerpta Medica Database (EMBASE), MEDLINE and Cumulative Index to Nursing and Allied Health Literature (CINAHL). Our outcomes are levels of adherence to reporting guidelines, levels of consistency between protocols or registrations and full reports and the agreement between abstracts and full reports, all of which will be expressed as percentages, quality scores or categorised ratings (such as high, medium and low). No pooled quantitative analyses will be performed, given the heterogeneity of the included systematic reviews and surveys. Likewise, factors associated with improved completeness and consistency of reporting will be summarised qualitatively. The quality of the included systematic reviews will be evaluated using AMSTAR (a measurement tool to assess systematic reviews). Ethics and dissemination All findings will be published in peer-reviewed journals and relevant conferences.
These results may advance our understanding of the extent of incomplete and inconsistent reporting, factors related to improved completeness and consistency of reporting and potential recommendations for various stakeholders in the biomedical community. PMID:28360252
Accessing Biomedical Literature in the Current Information Landscape
Khare, Ritu; Leaman, Robert; Lu, Zhiyong
2015-01-01
Summary Biomedical and life sciences literature is unique because of its exponentially increasing volume and interdisciplinary nature. Biomedical literature access is essential for several types of users, including biomedical researchers, clinicians, database curators, and bibliometricians. In the past few decades, several online search tools and literature archives, generic as well as biomedicine-specific, have been developed. We present this chapter in the light of three consecutive steps of literature access: searching for citations, retrieving full text, and viewing the article. The first section presents the current state of practice of biomedical literature access, including an analysis of the search tools most frequently used, such as PubMed, Google Scholar, Web of Science, Scopus, and Embase, and a study of biomedical literature archives such as PubMed Central. The next section describes current research and state-of-the-art systems motivated by the challenges a user faces during query formulation and interpretation of search results. The research solutions are classified into five key areas of text and data mining: text similarity search, semantic search, query support, relevance ranking, and clustering of results. Finally, the last section describes some predicted future trends for improving biomedical literature access, such as searching and reading articles on portable devices, and adoption of the open access policy. PMID:24788259
Mapping the literature of nursing: 1996–2000
Allen, Margaret (Peg); Jacobs, Susan Kaplan; Levy, June R.
2006-01-01
Introduction: This project is a collaborative effort of the Task Force on Mapping the Nursing Literature of the Nursing and Allied Health Resources Section of the Medical Library Association. This overview summarizes eighteen studies covering general nursing and sixteen specialties. Method: Following a common protocol, citations from source journals were analyzed for a three-year period within the years 1996 to 2000. Analysis included cited formats, age, and ranking of the frequency of cited journal titles. Highly cited journals were analyzed for coverage in twelve health sciences and academic databases. Results: Journals were the most frequently cited format, followed by books. More than 60% of the cited resources were published in the previous seven years. Bradford's law was validated, with a small core of cited journals accounting for a third of the citations. Medical and science databases provided the most comprehensive access for biomedical titles, while CINAHL and PubMed provided the best access for nursing journals. Discussion: Beyond a heavily cited core, nursing journal citations are widely dispersed among a variety of sources and disciplines, with corresponding access via a variety of bibliographic tools. Results underscore the interdisciplinary nature of the nursing profession. Conclusion: For comprehensive searches, nurses need to search multiple databases. Libraries need to provide access to databases beyond PubMed, including CINAHL and academic databases. Database vendors should improve their coverage of nursing, biomedical, and psychosocial titles identified in these studies. Additional research is needed to update these studies and analyze nursing specialties not covered. PMID:16636714
XML-based approaches for the integration of heterogeneous bio-molecular data.
Mesiti, Marco; Jiménez-Ruiz, Ernesto; Sanz, Ismael; Berlanga-Llavori, Rafael; Perlasca, Paolo; Valentini, Giorgio; Manset, David
2009-10-15
Today's public database infrastructure spans a very large collection of heterogeneous biological data, opening new opportunities for molecular biology, biomedical and bioinformatics research, but also raising new problems for their integration and computational processing. In this paper we survey the most interesting and novel approaches for the representation, integration and management of different kinds of biological data that exploit XML and the related recommendations and approaches. Moreover, we present new and interesting cutting-edge approaches for the appropriate management of heterogeneous biological data represented in XML. XML has succeeded in the integration of heterogeneous biomolecular information and has established itself as the syntactic glue for biological data sources. Nevertheless, a large variety of XML-based data formats have been proposed, making effective integration of bioinformatics data schemes difficult. The adoption of a few semantically rich standard formats is urgently needed to achieve seamless integration of the current biological resources.
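A toy sketch of the "XML as syntactic glue" idea described above: two hypothetical sources describe the same protein under different element names, and a small normalization step maps both into one record shape. All tag names and values here are invented for the example:

```python
import xml.etree.ElementTree as ET

# Two heterogeneous XML schemas for the same entity, plus a per-schema
# mapping into a common record, illustrating the integration problem.
source_a = ET.fromstring('<protein><acc>P69905</acc><name>HBA_HUMAN</name></protein>')
source_b = ET.fromstring('<entry id="P69905"><label>Hemoglobin alpha</label></entry>')

def normalize(elem: ET.Element) -> dict:
    if elem.tag == "protein":        # scheme used by hypothetical source A
        return {"id": elem.findtext("acc"), "name": elem.findtext("name")}
    if elem.tag == "entry":          # scheme used by hypothetical source B
        return {"id": elem.get("id"), "name": elem.findtext("label")}
    raise ValueError(f"unknown scheme: {elem.tag}")

records = [normalize(e) for e in (source_a, source_b)]
print(records[0]["id"] == records[1]["id"])  # True: both resolve to P69905
```

Every additional source format needs its own branch in `normalize`, which is precisely why the survey argues for converging on a few semantically rich standard formats.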
Biomedical ontologies: toward scientific debate.
Maojo, V; Crespo, J; García-Remesal, M; de la Iglesia, D; Perez-Rey, D; Kulikowski, C
2011-01-01
Biomedical ontologies have been very successful in structuring knowledge for many different applications, receiving widespread praise for their utility and potential. Yet, the role of computational ontologies in scientific research, as opposed to knowledge management applications, has not been extensively discussed. We aim to stimulate further discussion on the advantages and challenges presented by biomedical ontologies from a scientific perspective. We review various aspects of biomedical ontologies going beyond their practical successes, and focus on some key scientific questions in two ways. First, we analyze and discuss current approaches to improve biomedical ontologies that are based largely on classical, Aristotelian ontological models of reality. Second, we raise various open questions about biomedical ontologies that require further research, analyzing in more detail those related to visual reasoning and spatial ontologies. We outline significant scientific issues that biomedical ontologies should consider, beyond current efforts of building practical consensus between them. For spatial ontologies, we suggest an approach for building "morphospatial" taxonomies, as an example that could stimulate research on fundamental open issues for biomedical ontologies. Analysis of a large number of problems with biomedical ontologies suggests that the field is very much open to alternative interpretations of current work, and in need of scientific debate and discussion that can lead to new ideas and research directions.
The European Bioinformatics Institute's data resources: towards systems biology
Brooksbank, Catherine; Cameron, Graham; Thornton, Janet
2005-01-01
Genomic and post-genomic biological research has provided fine-grain insights into the molecular processes of life, but also threatens to drown biomedical researchers in data. Moreover, as new high-throughput technologies are developed, the types of data that are gathered en masse are diversifying. The need to collect, store and curate all this information in ways that allow its efficient retrieval and exploitation is greater than ever. The European Bioinformatics Institute's (EBI's) databases and tools have evolved to meet the changing needs of molecular biologists: since we last wrote about our services in the 2003 issue of Nucleic Acids Research, we have launched new databases covering protein–protein interactions (IntAct), pathways (Reactome) and small molecules (ChEBI). Our existing core databases have continued to evolve to meet the changing needs of biomedical researchers, and we have developed new data-access tools that help biologists to move intuitively through the different data types, thereby helping them to put the parts together to understand biology at the systems level. The EBI's data resources are all available on our website at http://www.ebi.ac.uk. PMID:15608238
Uncovering Capgras delusion using a large-scale medical records database
Marshall, Caryl; Kanji, Zara; Wilkinson, Sam; Halligan, Peter; Deeley, Quinton
2017-01-01
Background Capgras delusion is scientifically important but most commonly reported as single case studies. Studies analysing large clinical records databases focus on common disorders but none have investigated rare syndromes. Aims Identify cases of Capgras delusion and associated psychopathology, demographics, cognitive function and neuropathology in light of existing models. Method Combined computational data extraction and qualitative classification using 250 000 case records from South London and Maudsley Clinical Record Interactive Search (CRIS) database. Results We identified 84 individuals and extracted diagnosis-matched comparison groups. Capgras was not ‘monothematic’ in the majority of cases. Most cases involved misidentified family members or close partners but others were misidentified in 25% of cases, contrary to dual-route face recognition models. Neuroimaging provided no evidence for predominantly right hemisphere damage. Individuals were ethnically diverse with a range of psychosis spectrum diagnoses. Conclusions Capgras is more diverse than current models assume. Identification of rare syndromes complements existing ‘big data’ approaches in psychiatry. Declaration of interests V.B. is supported by a Wellcome Trust Seed Award in Science (200589/Z/16/Z) and the UCLH NIHR Biomedical Research Centre. S.W. is supported by a Wellcome Trust Strategic Award (WT098455MA). Q.D. has received a grant from King’s Health Partners. Copyright and usage © The Royal College of Psychiatrists 2017. This is an open access article distributed under the terms of the Creative Commons Non-Commercial, No Derivatives (CC BY-NC-ND) license. PMID:28794897
[Design and establishment of modern literature database about acupuncture Deqi].
Guo, Zheng-rong; Qian, Gui-feng; Pan, Qiu-yin; Wang, Yang; Xin, Si-yuan; Li, Jing; Hao, Jie; Hu, Ni-juan; Zhu, Jiang; Ma, Liang-xiao
2015-02-01
A search on acupuncture Deqi was conducted in four Chinese-language biomedical databases (CNKI, Wan-Fang, VIP and CBM) and the PubMed database, using keywords such as "Deqi", "needle sensation", "needling feeling", "needle feel" and "obtaining qi". A "Modern Literature Database for Acupuncture Deqi" was then established with Microsoft SQL Server 2005 Express Edition; the contents, data types, information structure and logical constraints of the system table fields are described. From this database, detailed inquiries can be made about the general information of clinical trials, acupuncturists' experience, ancient medical works, comprehensive literature, etc. The present database lays a foundation for subsequent evaluation of the quality of the Deqi literature and for data mining of as yet undetected Deqi knowledge.
Literature searches on Ayurveda: An update.
Aggithaya, Madhur G; Narahari, Saravu R
2015-01-01
The journals that publish on Ayurveda have been increasingly indexed by popular medical databases in recent years. However, many Eastern journals are not indexed in biomedical journal databases such as PubMed. Literature searches for Ayurveda remain challenging because no active, unbiased database dedicated to Ayurvedic literature is available. In 2010, the authors identified 46 databases that can be used for systematic searches of Ayurvedic papers and theses. This update reviews our previous recommendations and identifies the currently relevant databases, with the aim of refining the Ayurveda literature search strategy to retrieve the maximum number of publications. The authors used psoriasis as an example to search the previously listed databases and to identify new ones. The population, intervention, control, and outcome table included keywords related to psoriasis and Ayurvedic terminologies for skin diseases. The current citation update status, search results, and search options of the previous databases were assessed. Eight search strategies were developed. One hundred and five journals, both biomedical and Ayurvedic, that publish on Ayurveda were identified. Variability among the databases was explored to identify bias in journal citation. Five of the 46 databases are now relevant - AYUSH research portal, Annotated Bibliography of Indian Medicine, Digital Helpline for Ayurveda Research Articles (DHARA), PubMed, and Directory of Open Access Journals. Search options in these databases are not uniform, and only PubMed allows a complex search strategy. "The Researches in Ayurveda" and the "Ayurvedic Research Database" (ARD) are important grey resources for hand searching. About 44/105 (41.5%) journals publishing Ayurvedic studies are not indexed in any database. Only 11/105 (10.4%) exclusively Ayurvedic journals are indexed in PubMed. The AYUSH research portal and DHARA are the two major portals to emerge since 2010. It is mandatory to search PubMed and the four other databases, because all five carry citations from different groups of journals.
Hand searching is important to identify Ayurveda publications that are not indexed elsewhere. If regularly updated, availability information for citations in Ayurveda libraries from the National Union Catalogue of Scientific Serials in India would improve the efficiency of hand searching. A grey database (ARD) contains unpublished postgraduate and PhD theses. The AYUSH portal, DHARA (funded by the Ministry of AYUSH), and ARD should be merged into a single larger database to simplify Ayurveda literature searches.
ERIC Educational Resources Information Center
Santana Arroyo, Sonia; del Carmen Gonzalez Rivero, Maria
2012-01-01
The National Medical Library of Cuba is currently developing an information literacy program to train users in the use of biomedical databases. This paper describes the experience with the course "Cochrane Library: Evidence-Based Medicine," which aims to teach users how to make the best use of this database, as well as the evidence-based…
Mapping selected general literature of international nursing.
Shams, Marie-Lise Antoun; Dixon, Lana S
2007-01-01
This study, part of a wider project to map the literature of nursing, identifies core journals cited in non-US nursing journals and determines the extent of their coverage by indexing services. Four general English-language journals were analyzed for format types and publication dates. Core titles were identified and nine bibliographic databases were scanned for indexing coverage. Findings show that 57.5% (13,391/23,271) of the cited references from the 4 core journals were to journal articles, 27.8% (6,471/23,271) to books, 9.5% (2,208/23,271) to government documents, 4.9% (1,131/23,271) to miscellaneous sources, and less than 1% (70/23,271) to Internet resources. Eleven journals produced one-third of the citations; the next third included 146 journals, followed by a dispersion of 1,622 titles. PubMed received the best database coverage scores, followed by CINAHL and Science Citation Index. None of the databases provided complete coverage of all 11 core titles. The four source journals contain a diverse group of cited references. The currency of citations to government documents makes these journals a good source for regulatory and legislative awareness. Nurses consult nursing and biomedical journals and must search both nursing and biomedical databases to cover the literature.
Large-Scale Event Extraction from Literature with Multi-Level Gene Normalization
Wei, Chih-Hsuan; Hakala, Kai; Pyysalo, Sampo; Ananiadou, Sophia; Kao, Hung-Yu; Lu, Zhiyong; Salakoski, Tapio; Van de Peer, Yves; Ginter, Filip
2013-01-01
Text mining for the life sciences aims to aid database curation, knowledge summarization and information retrieval through the automated processing of biomedical texts. To provide comprehensive coverage and enable full integration with existing biomolecular database records, it is crucial that text mining tools scale up to millions of articles and that their analyses can be unambiguously linked to information recorded in resources such as UniProt, KEGG, BioGRID and NCBI databases. In this study, we investigate how fully automated text mining of complex biomolecular events can be augmented with a normalization strategy that identifies biological concepts in text, mapping them to identifiers at varying levels of granularity, ranging from canonicalized symbols to unique genes and proteins to broad gene families. To this end, we have combined two state-of-the-art text mining components, previously evaluated in two community-wide challenges, and have extended and improved upon these methods by exploiting their complementary nature. Using these systems, we perform normalization and event extraction to create a large-scale resource that is publicly available, unique in semantic scope, and covers all 21.9 million PubMed abstracts and 460 thousand PubMed Central open access full-text articles. This dataset contains 40 million biomolecular events involving 76 million gene/protein mentions, linked to 122 thousand distinct genes from 5032 species across the full taxonomic tree. Detailed evaluations and analyses reveal promising results for application of this data in database and pathway curation efforts. The main software components used in this study are released under an open-source license. Further, the resulting dataset is freely accessible through a novel API, providing programmatic and customized access (http://www.evexdb.org/api/v001/).
Finally, to allow for large-scale bioinformatic analyses, the entire resource is available for bulk download from http://evexdb.org/download/, under the Creative Commons – Attribution – Share Alike (CC BY-SA) license. PMID:23613707
Composite annotations: requirements for mapping multiscale data and models to biomedical ontologies
Cook, Daniel L.; Mejino, Jose L. V.; Neal, Maxwell L.; Gennari, John H.
2009-01-01
Current methods for annotating biomedical data resources rely on simple mappings between data elements and the contents of a variety of biomedical ontologies and controlled vocabularies. Here we point out that such simple mappings are inadequate for large-scale multiscale, multidomain integrative “virtual human” projects. For such integrative challenges, we describe a “composite annotation” schema that is simple yet sufficiently extensible for mapping the biomedical content of a variety of data sources and biosimulation models to available biomedical ontologies. PMID:19964601
Sorrell v. IMS Health: issues and opportunities for informaticians
Petersen, Carolyn; DeMuro, Paul; Goodman, Kenneth W; Kaplan, Bonnie
2013-01-01
In 2011, the US Supreme Court decided Sorrell v. IMS Health, Inc., a case that addressed the mining of large aggregated databases and the sale of prescriber data for marketing prescription drugs. The court struck down a Vermont law that required data mining companies to obtain permission from individual providers before selling prescription records that included identifiable physician prescription information to pharmaceutical companies for drug marketing. The decision was based on constitutional free speech protections rather than data sharing considerations. Sorrell illustrates challenges at the intersection of biomedical informatics, public health, constitutional liberties, and ethics. As states, courts, regulatory agencies, and federal bodies respond to Sorrell, informaticians’ expertise can contribute to more informed, ethical, and appropriate policies. PMID:23104048
Zhao, Lei; Guo, Yi; Wang, Wei; Yan, Li-juan
2011-08-01
To evaluate the effectiveness of acupuncture as a treatment for neurovascular headache and to analyze the current state of research on acupuncture treatment. The PubMed database (1966-2010), EMBASE database (1986-2010), Cochrane Library (Issue 1, 2010), Chinese Biomedical Literature Database (1979-2010), China National Knowledge Infrastructure (CNKI) database (1979-2010), VIP Journals Database (1989-2010), and Wanfang database (1998-2010) were searched. Randomized or quasi-randomized controlled studies were included, with priority given to high-quality randomized controlled trials. Statistical outcome indicators were analyzed using RevMan 5.0.20 software. A total of 16 articles and 1535 cases were included. Meta-analysis showed a significant difference between acupuncture therapy and Western medicine therapy [combined RR (random-effects model)=1.46, 95% CI (1.21, 1.75), Z=3.96, P<0.0001], indicating a clearly superior effect of acupuncture therapy; a significant difference also existed between comprehensive acupuncture therapy and acupuncture therapy alone [combined RR (fixed-effects model)=3.35, 95% CI (1.92, 5.82), Z=4.28, P<0.0001], indicating that acupuncture combined with other therapies, such as point injection, scalp acupuncture, auricular acupuncture, etc., was superior to conventional body acupuncture alone. The limited clinical studies included support the efficacy of acupuncture in the treatment of neurovascular headache. Although acupuncture and its combined therapies offer certain advantages, most clinical studies have small sample sizes; large, randomized, controlled trials are needed for more definitive results.
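For readers unfamiliar with how a combined RR such as 3.35 with 95% CI (1.92, 5.82) is produced, the sketch below shows standard inverse-variance fixed-effect pooling on the log scale; the per-trial figures are invented and are not the review's data:

```python
import math

# Fixed-effect inverse-variance pooling of risk ratios on the log scale.
# Each trial contributes (log RR, variance of log RR); weights are 1/variance.
trials = [
    (math.log(2.8), 0.20),
    (math.log(4.1), 0.25),
    (math.log(3.2), 0.15),
]
weights = [1 / v for _, v in trials]
pooled = sum(w * lr for (lr, _), w in zip(trials, weights)) / sum(weights)
se = math.sqrt(1 / sum(weights))  # standard error of the pooled log RR
ci_low = math.exp(pooled - 1.96 * se)
ci_high = math.exp(pooled + 1.96 * se)
print(f"RR={math.exp(pooled):.2f}, 95% CI ({ci_low:.2f}, {ci_high:.2f})")
```

A random-effects pooling (used for the review's 1.46 estimate) follows the same pattern but inflates each trial's variance by a between-trial heterogeneity term before weighting.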
SCALEUS: Semantic Web Services Integration for Biomedical Applications.
Sernadela, Pedro; González-Castro, Lorena; Oliveira, José Luís
2017-04-01
In recent years, we have witnessed an explosion of biological data resulting largely from the demands of life science research. The vast majority of these data are freely available via diverse bioinformatics platforms, including relational databases and conventional keyword search applications. This type of approach has achieved great results in the last few years, but proved infeasible when information needs to be combined or shared among different and scattered sources. More recently, many of these data distribution challenges have been solved with the adoption of semantic web technologies. Despite the evident benefits of this technology, its adoption introduced new challenges related to the migration process from existing systems to the semantic level. To facilitate this transition, we have developed Scaleus, a semantic web migration tool that can be deployed on top of traditional systems in order to bring knowledge, inference rules, and query federation to the existing data. Targeted at the biomedical domain, this web-based platform offers, in a single package, straightforward data integration and semantic web services that help developers and researchers in the creation of new semantically enhanced information systems. SCALEUS is available as open source at http://bioinformatics-ua.github.io/scaleus/ .
MAPI: a software framework for distributed biomedical applications
2013-01-01
Background The number of web-based resources (databases, tools, etc.) in biomedicine has increased, but the integrated usage of those resources is complex due to differences in access protocols and data formats. However, distributed data processing is becoming inevitable in several domains, in particular in biomedicine, where researchers face rapidly increasing data sizes. This big data is difficult to process locally because of the large processing, memory and storage capacity required. Results This manuscript describes a framework, called MAPI, which provides a uniform representation of resources available over the Internet, in particular for Web Services. The framework enhances their interoperability and collaborative use by enabling uniform, remote access. The framework functionality is organized in modules that can be combined and configured in different ways to fulfil concrete development requirements. Conclusions The framework has been tested in the biomedical application domain, where it has served as a basis for developing several clients that are able to integrate different web resources. The MAPI binaries and documentation are freely available at http://www.bitlab-es.com/mapi under the Creative Commons Attribution-No Derivative Works 2.5 Spain License. The MAPI source code is available by request (GPL v3 license). PMID:23311574
Reflective random indexing for semi-automatic indexing of the biomedical literature.
Vasuki, Vidya; Cohen, Trevor
2010-10-01
The rapid growth of biomedical literature is evident in the increasing size of the MEDLINE research database. Medical Subject Headings (MeSH), a controlled set of keywords, are used to index all the citations contained in the database to facilitate search and retrieval. This volume of citations calls for efficient tools to assist indexers at the US National Library of Medicine (NLM). Currently, the Medical Text Indexer (MTI) system provides assistance by recommending MeSH terms based on the title and abstract of an article using a combination of distributional and vocabulary-based methods. In this paper, we evaluate a novel approach toward indexer assistance by using nearest neighbor classification in combination with Reflective Random Indexing (RRI), a scalable alternative to the established methods of distributional semantics. On a test set provided by the NLM, our approach significantly outperforms the MTI system, suggesting that the RRI approach would make a useful addition to the current methodologies.
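The nearest-neighbour idea behind this approach can be sketched as follows. This is an illustrative toy, not the authors' RRI implementation: real Reflective Random Indexing iteratively retrains vectors over large corpora, and the citations, MeSH terms, and vector dimensions below are hypothetical.

```python
import random
from collections import defaultdict
from math import sqrt

random.seed(0)
DIM, NONZERO = 64, 4  # tiny for illustration; real systems use thousands

def index_vector():
    """Sparse ternary random vector: a few +/-1 entries, rest zero."""
    v = [0.0] * DIM
    for pos in random.sample(range(DIM), NONZERO):
        v[pos] = random.choice((1.0, -1.0))
    return v

term_vecs = defaultdict(index_vector)  # each new term gets a random vector

def doc_vector(tokens):
    """Document vector = sum of the random index vectors of its terms."""
    v = [0.0] * DIM
    for t in tokens:
        for i, x in enumerate(term_vecs[t]):
            v[i] += x
    return v

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Toy training citations with known MeSH terms (hypothetical examples)
train = [
    ("myocardial infarction cardiac ischemia heart".split(), {"Myocardial Infarction"}),
    ("heart failure cardiac output ventricle".split(), {"Heart Failure"}),
    ("influenza virus vaccine antibody response".split(), {"Influenza, Human"}),
]
train_vecs = [(doc_vector(toks), mesh) for toks, mesh in train]

def recommend_mesh(tokens, k=1):
    """Recommend the MeSH terms of the k nearest training citations."""
    qv = doc_vector(tokens)
    ranked = sorted(train_vecs, key=lambda tv: -cosine(qv, tv[0]))
    rec = set()
    for _, mesh in ranked[:k]:
        rec |= mesh
    return rec
```

Because the index vectors are near-orthogonal, documents sharing terms end up with correlated vectors, which is what makes the nearest-neighbour MeSH lookup meaningful.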
Duchrow, Timo; Shtatland, Timur; Guettler, Daniel; Pivovarov, Misha; Kramer, Stefan; Weissleder, Ralph
2009-01-01
Background The breadth of biological databases and their information content continues to increase exponentially. Unfortunately, our ability to query such sources is still often suboptimal. Here, we introduce and apply community voting, database-driven text classification, and visual aids as a means to incorporate distributed expert knowledge, to automatically classify database entries and to efficiently retrieve them. Results Using a previously developed peptide database as an example, we compared several machine learning algorithms in their ability to classify abstracts of published literature results into categories relevant to peptide research, such as related or not related to cancer, angiogenesis, molecular imaging, etc. Ensembles of bagged decision trees met the requirements of our application best. No other algorithm consistently performed better in comparative testing. Moreover, we show that the algorithm produces meaningful class probability estimates, which can be used to visualize the confidence of automatic classification during the retrieval process. To allow viewing long lists of search results enriched by automatic classifications, we added a dynamic heat map to the web interface. We take advantage of community knowledge by enabling users to cast votes in Web 2.0 style in order to correct automated classification errors, which triggers reclassification of all entries. We used a novel framework in which the database "drives" the entire vote aggregation and reclassification process to increase speed while conserving computational resources and keeping the method scalable. In our experiments, we simulate community voting by adding various levels of noise to nearly perfectly labelled instances, and show that, under such conditions, classification can be improved significantly. 
Conclusion Using PepBank as a model database, we show how to build a classification-aided retrieval system that gathers training data from the community, is completely controlled by the database, scales well with concurrent change events, and can be adapted to add text classification capability to other biomedical databases. The system can be accessed at . PMID:19799796
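The class-probability estimates described above come from the ensemble's vote fraction, which can be illustrated with a minimal bagging sketch over decision stumps (the paper uses bagged decision trees over text features; the one-dimensional feature and labels here are hypothetical):

```python
import random
random.seed(1)

def fit_stump(xs, ys):
    """Best single-threshold rule on one feature (minimizes training errors)."""
    best = None
    for t in sorted(set(xs)):
        for pos_if_ge in (True, False):
            err = sum(((x >= t) == pos_if_ge) != y for x, y in zip(xs, ys))
            if best is None or err < best[0]:
                best = (err, t, pos_if_ge)
    _, t, pos_if_ge = best
    return lambda x: (x >= t) == pos_if_ge

def fit_bagged(xs, ys, n_estimators=25):
    """Bagging: train one stump per bootstrap resample of the data."""
    n = len(xs)
    ensemble = []
    for _ in range(n_estimators):
        idx = [random.randrange(n) for _ in range(n)]
        ensemble.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return ensemble

def predict_proba(ensemble, x):
    """Class-probability estimate = fraction of the ensemble voting True."""
    return sum(s(x) for s in ensemble) / len(ensemble)

# Hypothetical 1-D feature (e.g., a text-derived relevance score);
# True = "cancer-related" abstract, False = not.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [False, False, False, False, True, True, True, True]
ensemble = fit_bagged(xs, ys)
```

The vote fraction is exactly the confidence score that can drive a heat-map visualization of search results, and corrected community votes simply change `ys` before retraining.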
Mapping the literature of maternal-child/gynecologic nursing.
Jacobs, Susan Kaplan
2006-04-01
As part of a project to map the literature of nursing, sponsored by the Nursing and Allied Health Resources Section of the Medical Library Association, this study identifies core journals cited in maternal-child/gynecologic nursing and the indexing services that access the cited journals. Three source journals were selected and subjected to a citation analysis of articles from 1996 to 1998. Journals were the most frequently cited format (74.1%), followed by books (19.7%), miscellaneous (4.2%), and government documents (1.9%). Bradford's Law of Scattering was applied to the results, ranking cited journal references in descending order. One-third of the citations were found in a core of 14 journal titles; one-third were dispersed among a middle zone of 100 titles; and the remaining third were scattered in a larger zone of 1,194 titles. Indexing coverage for the core titles was most comprehensive in PubMed/MEDLINE, followed by Science Citation Index and CINAHL. The core of journals cited in this nursing specialty revealed a large number of medical titles; thus, biomedical databases provide the best access. The interdisciplinary nature of maternal-child/gynecologic nursing topics dictates that social sciences databases are an important adjunct. The study results will assist librarians in collection development, provide end users with guidelines for selecting databases, and encourage database producers to consider extending coverage to identified titles.
Mapping the literature of maternal-child/gynecologic nursing
Jacobs, Susan Kaplan
2006-01-01
Objectives: As part of a project to map the literature of nursing, sponsored by the Nursing and Allied Health Resources Section of the Medical Library Association, this study identifies core journals cited in maternal-child/gynecologic nursing and the indexing services that access the cited journals. Methods: Three source journals were selected and subjected to a citation analysis of articles from 1996 to 1998. Results: Journals were the most frequently cited format (74.1%), followed by books (19.7%), miscellaneous (4.2%), and government documents (1.9%). Bradford's Law of Scattering was applied to the results, ranking cited journal references in descending order. One-third of the citations were found in a core of 14 journal titles; one-third were dispersed among a middle zone of 100 titles; and the remaining third were scattered in a larger zone of 1,194 titles. Indexing coverage for the core titles was most comprehensive in PubMed/MEDLINE, followed by Science Citation Index and CINAHL. Conclusion: The core of journals cited in this nursing specialty revealed a large number of medical titles; thus, biomedical databases provide the best access. The interdisciplinary nature of maternal-child/gynecologic nursing topics dictates that social sciences databases are an important adjunct. The study results will assist librarians in collection development, provide end users with guidelines for selecting databases, and encourage database producers to consider extending coverage to identified titles. PMID:16710464
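The three-zone partition described above can be computed mechanically: rank journals by citations received and cut the ranked list where cumulative citations pass each third of the total. A minimal sketch, with hypothetical counts rather than the study's data:

```python
def bradford_zones(citation_counts, n_zones=3):
    """Partition ranked journals into zones of roughly equal citation volume."""
    ranked = sorted(citation_counts, reverse=True)
    target = sum(ranked) / n_zones  # citations per zone
    zones, titles, acc = [], 0, 0
    for c in ranked:
        titles += 1
        acc += c
        if acc >= target and len(zones) < n_zones - 1:
            zones.append(titles)  # close this zone; record its title count
            titles, acc = 0, 0
    zones.append(titles)
    return zones

# Hypothetical counts: a few heavily cited core titles, a long scattered tail
counts = [300, 250, 150, 100, 80, 60, 40] + [20] * 20 + [5] * 50
zones = bradford_zones(counts)
```

The growing zone sizes (few core titles, many scattered ones) are the Bradford scattering pattern the study reports as 14/100/1,194.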
Automated labeling of bibliographic data extracted from biomedical online journals
NASA Astrophysics Data System (ADS)
Kim, Jongwoo; Le, Daniel X.; Thoma, George R.
2003-01-01
A prototype system has been designed to automate the extraction of bibliographic data (e.g., article title, authors, abstract, affiliation and others) from online biomedical journals to populate the National Library of Medicine's MEDLINE database. This paper describes a key module in this system: the labeling module that employs statistics and fuzzy rule-based algorithms to identify segmented zones in an article's HTML pages as specific bibliographic data. Results from experiments conducted with 1,149 medical articles from forty-seven journal issues are presented.
Image BOSS: a biomedical object storage system
NASA Astrophysics Data System (ADS)
Stacy, Mahlon C.; Augustine, Kurt E.; Robb, Richard A.
1997-05-01
Researchers using biomedical images have data management needs that are orthogonal to those of clinical PACS. The Image BOSS system is designed to permit researchers to organize and select images based on research topic, image metadata, and a thumbnail of the image. Image information is captured from existing images in a Unix-based filesystem, stored in an object-oriented database, and presented to the user in a familiar laboratory notebook metaphor. In addition, Image BOSS is designed to provide an extensible infrastructure for future content-based queries directly on the images.
SQLGEN: a framework for rapid client-server database application development.
Nadkarni, P M; Cheung, K H
1995-12-01
SQLGEN is a framework for rapid client-server relational database application development. It relies on an active data dictionary on the client machine that stores metadata on one or more database servers to which the client may be connected. The dictionary generates dynamic Structured Query Language (SQL) to perform common database operations; it also stores information about the access rights of the user at log-in time, which is used to partially self-configure the behavior of the client to disable inappropriate user actions. SQLGEN uses a microcomputer database as the client to store metadata in relational form, to transiently capture server data in tables, and to allow rapid application prototyping followed by porting to client-server mode with modest effort. SQLGEN is currently used in several production biomedical databases.
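The active-data-dictionary idea, generating SQL from stored metadata rather than hand-writing it per form, can be sketched as follows. The dictionary layout, table, and helper names are hypothetical illustrations, not SQLGEN's actual structures:

```python
def gen_select(data_dict, table, where=None):
    """Build a parameterized SELECT from data-dictionary metadata."""
    cols = ", ".join(col["name"] for col in data_dict[table]["columns"])
    sql = f"SELECT {cols} FROM {table}"
    params = []
    if where:
        sql += " WHERE " + " AND ".join(f"{k} = ?" for k in where)
        params = list(where.values())
    return sql, params

def gen_insert(data_dict, table, row):
    """Build a parameterized INSERT for the columns present in `row`."""
    cols = [c["name"] for c in data_dict[table]["columns"] if c["name"] in row]
    sql = (f"INSERT INTO {table} ({', '.join(cols)}) "
           f"VALUES ({', '.join('?' for _ in cols)})")
    return sql, [row[c] for c in cols]

# A toy dictionary describing one server-side table (hypothetical schema)
data_dict = {
    "patient": {"columns": [{"name": "id"}, {"name": "name"}, {"name": "dob"}]},
}
```

Because the dictionary also records access rights, the same metadata can be used to disable generated statements a given user is not permitted to run; `?` placeholders keep the generated SQL safe for parameter binding.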
MedlinePlus FAQ: MedlinePlus and MEDLINE/PubMed
... What is the difference between MedlinePlus and MEDLINE/PubMed? ... latest health professional articles on your topic. MEDLINE/PubMed: Is a database of professional biomedical literature ...
Technological Innovations from NASA
NASA Technical Reports Server (NTRS)
Pellis, Neal R.
2006-01-01
The challenge of human space exploration places demands on technology that push concepts and development to the leading edge. In biotechnology and biomedical equipment development, NASA science has been the seed for numerous innovations, many of which are now in the commercial arena. The biotechnology effort has led to rational drug design, analytical equipment, and cell culture and tissue engineering strategies. Biomedical research and development has resulted in medical devices that enable diagnostic and treatment advances. NASA biomedical developments are exemplified in the new laser light-scattering analysis for cataracts, the axial-flow left ventricular assist device, non-contact electrocardiography, and the guidance system for LASIK surgery. Many more developments are in progress. NASA will continue to advance technologies, incorporating new approaches from basic and applied research, nanotechnology, computational modeling, and database analyses.
Using Ontology Fingerprints to disambiguate gene name entities in the biomedical literature
Chen, Guocai; Zhao, Jieyi; Cohen, Trevor; Tao, Cui; Sun, Jingchun; Xu, Hua; Bernstam, Elmer V.; Lawson, Andrew; Zeng, Jia; Johnson, Amber M.; Holla, Vijaykumar; Bailey, Ann M.; Lara-Guerra, Humberto; Litzenburger, Beate; Meric-Bernstam, Funda; Jim Zheng, W.
2015-01-01
Ambiguous gene names in the biomedical literature are a barrier to accurate information extraction. To overcome this hurdle, we generated Ontology Fingerprints for selected genes that are relevant for personalized cancer therapy. These Ontology Fingerprints were used to evaluate the association between genes and biomedical literature to disambiguate gene names. We obtained 93.6% precision for the test gene set and 80.4% for the area under the receiver operating characteristic curve for gene-article association. The core algorithm was implemented using a graphics processing unit-based MapReduce framework to handle big data and to improve performance. We conclude that Ontology Fingerprints can help disambiguate gene names mentioned in text and analyse the association between genes and articles. Database URL: http://www.ontologyfingerprint.org PMID:25858285
Stephens, Susie M; Chen, Jake Y; Davidson, Marcel G; Thomas, Shiby; Trute, Barry M
2005-01-01
As database management systems expand their array of analytical functionality, they become powerful research engines for biomedical data analysis and drug discovery. Databases can hold most of the data types commonly required in life sciences and consequently can be used as flexible platforms for the implementation of knowledgebases. Performing data analysis in the database simplifies data management by minimizing the movement of data from disks to memory, allowing pre-filtering and post-processing of datasets, and enabling data to remain in a secure, highly available environment. This article describes the Oracle Database 10g implementation of BLAST and Regular Expression Searches and provides case studies of their usage in bioinformatics. http://www.oracle.com/technology/software/index.html.
Ozyurt, Ibrahim Burak; Grethe, Jeffrey S.; Martone, Maryann E.; Bandrowski, Anita E.
2016-01-01
The NIF Registry, developed and maintained by the Neuroscience Information Framework, is a cooperative project aimed at cataloging research resources, e.g., software tools, databases and tissue banks, funded largely by governments and available as tools to research scientists. Although originally conceived for neuroscience, the NIF Registry has over the years broadened in scope to include research resources of general relevance to biomedical research; it currently lists over 13,000 research resources. The broadening in scope to biomedical science led us to re-christen the NIF Registry platform as SciCrunch. The NIF/SciCrunch Registry has been cataloging the resource landscape since 2006; as such, it serves as a valuable dataset for tracking the breadth, fate and utilization of these resources. Our experience shows that research resources like databases are dynamic objects that can change location and scope over time. Although each record is entered manually and human-curated, the current size of the registry requires tools that can aid curation efforts to keep content up to date, including when and where such resources are used. To address this challenge, we have developed an open source tool suite, collectively termed RDW: Resource Disambiguator for the Web. RDW is designed to help in the upkeep and curation of the registry as well as in enhancing its content by automated extraction of resource candidates from the literature. The RDW toolkit includes a URL extractor for papers, a resource candidate screener, a resource URL change tracker, and a resource content change tracker. Curators access these tools via a web-based user interface. Several strategies are used to optimize these tools, including supervised and unsupervised learning algorithms as well as statistical text analysis.
The complete tool suite is used to enhance and maintain the resource registry as well as track the usage of individual resources through an innovative literature citation index honed for research resources. Here we present an overview of the Registry and show how the RDW tools are used in curation and usage tracking. PMID:26730820
A Recent Advance in the Automatic Indexing of the Biomedical Literature
Névéol, Aurélie; Shooshan, Sonya E.; Humphrey, Susanne M.; Mork, James G.; Aronson, Alan R.
2009-01-01
The volume of biomedical literature has experienced explosive growth in recent years. This is reflected in the corresponding increase in the size of MEDLINE®, the largest bibliographic database of biomedical citations. Indexers at the U.S. National Library of Medicine (NLM) need efficient tools to help them accommodate the ensuing workload. After reviewing issues in the automatic assignment of Medical Subject Headings (MeSH® terms) to biomedical text, we focus more specifically on the new subheading attachment feature for NLM’s Medical Text Indexer (MTI). Natural Language Processing, statistical, and machine learning methods of producing automatic MeSH main heading/subheading pair recommendations were assessed independently and combined. The best combination achieves 48% precision and 30% recall. After validation by NLM indexers, a suitable combination of the methods presented in this paper was integrated into MTI as a subheading attachment feature producing MeSH indexing recommendations compliant with current state-of-the-art indexing practice. PMID:19166973
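The reported precision and recall are computed over recommended versus indexer-assigned MeSH main heading/subheading pairs; a minimal sketch of that evaluation (the pairs below are hypothetical, not NLM data):

```python
def precision_recall(recommended, gold):
    """Precision and recall of recommended vs. indexer-assigned pairs."""
    recommended, gold = set(recommended), set(gold)
    tp = len(recommended & gold)  # correctly recommended pairs
    precision = tp / len(recommended) if recommended else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical main heading/subheading pairs for one citation
recommended = {("Neoplasms", "genetics"), ("Neoplasms", "therapy"), ("Mice", "")}
gold = {("Neoplasms", "genetics"), ("Humans", "")}
p, r = precision_recall(recommended, gold)
```

A system tuned for indexer assistance typically trades some recall for precision, since spurious recommendations cost the indexer review time.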
Compound image segmentation of published biomedical figures.
Li, Pengyuan; Jiang, Xiangying; Kambhamettu, Chandra; Shatkay, Hagit
2018-04-01
Images convey essential information in biomedical publications. As such, there is growing interest within the bio-curation and bio-database communities in storing images from publications as evidence for biomedical processes and experimental results. However, many of the images in biomedical publications are compound images consisting of multiple panels, where each individual panel potentially conveys a different type of information. Segmenting such images into constituent panels is an essential first step toward utilizing them. In this article, we develop a new compound image segmentation system, FigSplit, which is based on Connected Component Analysis. To overcome shortcomings typically manifested by existing methods, we develop a quality assessment step for evaluating and modifying segmentations. Two methods are proposed to re-segment the images if the initial segmentation is inaccurate. Experimental results show the effectiveness of our method compared with other methods. The system is publicly available for use at https://www.eecis.udel.edu/~compbio/FigSplit. The code is available upon request. Contact: shatkay@udel.edu. Supplementary data are available online at Bioinformatics.
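The Connected Component Analysis at the core of this kind of panel segmentation can be illustrated with a minimal flood-fill labeller on a binarized image grid. This is a generic sketch, not FigSplit's implementation (FigSplit adds quality assessment and re-segmentation on top), and the tiny "figure" below is hypothetical:

```python
from collections import deque

def connected_components(mask):
    """Label 4-connected foreground regions in a binary grid (BFS flood fill)."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not labels[r][c]:
                current += 1          # start a new component
                q = deque([(r, c)])
                labels[r][c] = current
                while q:
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = current
                            q.append((ny, nx))
    return current, labels

# A tiny binarized "figure": two panels separated by a background gutter
figure = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
n_panels, _ = connected_components(figure)
```

Each labelled component corresponds to a candidate panel; a quality-assessment step would then check, for example, whether the resulting bounding boxes tile the figure plausibly.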
Kibbe, Warren A.; Arze, Cesar; Felix, Victor; Mitraka, Elvira; Bolton, Evan; Fu, Gang; Mungall, Christopher J.; Binder, Janos X.; Malone, James; Vasant, Drashtti; Parkinson, Helen; Schriml, Lynn M.
2015-01-01
The current version of the Human Disease Ontology (DO) (http://www.disease-ontology.org) database expands the utility of the ontology for the examination and comparison of genetic variation, phenotype, protein, drug and epitope data through the lens of human disease. DO is a biomedical resource of standardized common and rare disease concepts with stable identifiers organized by disease etiology. The content of DO has had 192 revisions since 2012, including the addition of 760 terms. Thirty-two percent of all terms now include definitions. DO has expanded the number and diversity of its research communities and community members by more than 50 during the past two years. These community members actively submit term requests, coordinate biomedical resource disease representation and provide expert curation guidance. Since the DO 2012 NAR paper, there have been hundreds of term requests and a steady increase in the number of DO listserv members, Twitter followers and DO website usage. DO is moving to a multi-editor model, utilizing Protégé to curate DO in the Web Ontology Language (OWL). This will enable closer collaboration with the Human Phenotype Ontology, EBI's Ontology Working Group, Mouse Genome Informatics and the Monarch Initiative, among others, and enhance DO's current asserted view and multiple inferred views through reasoning. PMID:25348409
Development and Evaluation of Thesauri-Based Bibliographic Biomedical Search Engine
ERIC Educational Resources Information Center
Alghoson, Abdullah
2017-01-01
Due to the large volume and exponential growth of biomedical documents (e.g., books, journal articles), it has become increasingly challenging for biomedical search engines to retrieve relevant documents based on users' search queries. Part of the challenge is the matching mechanism of free-text indexing that performs matching based on…
NaKnowBase™: The EPA Nanomaterials Research Database
The ability to predict the environmental and health implications of engineered nanomaterials is an important research priority due to the exponential rate at which nanotechnology is being incorporated into consumer, industrial and biomedical applications. To address this need and...
The Camden & Islington Research Database: Using electronic mental health records for research.
Werbeloff, Nomi; Osborn, David P J; Patel, Rashmi; Taylor, Matthew; Stewart, Robert; Broadbent, Matthew; Hayes, Joseph F
2018-01-01
Electronic health records (EHRs) are widely used in mental health services. Case registers using EHRs from secondary mental healthcare have the potential to deliver large-scale projects evaluating mental health outcomes in real-world clinical populations. We describe the Camden and Islington NHS Foundation Trust (C&I) Research Database, which uses the Clinical Record Interactive Search (CRIS) tool to extract and de-identify routinely collected clinical information from a large UK provider of secondary mental healthcare, and demonstrate its capabilities by answering a clinical research question regarding time to diagnosis and treatment of bipolar disorder. The C&I Research Database contains records from 108,168 mental health patients, of whom 23,538 were receiving active care. The characteristics of the patient population are compared to those of the catchment area, of London, and of England as a whole. The median time to diagnosis of bipolar disorder was 76 days (interquartile range: 17-391) and the median time to treatment was 37 days (interquartile range: 5-194). Compulsory admission under the UK Mental Health Act was associated with shorter intervals to diagnosis and treatment. Prior diagnoses of other psychiatric disorders were associated with longer intervals to diagnosis, though prior diagnoses of schizophrenia and related disorders were associated with decreased time to treatment. The CRIS tool, developed by the South London and Maudsley NHS Foundation Trust (SLaM) Biomedical Research Centre (BRC), functioned very well at C&I. It is reassuring that data from different organizations deliver similar results, and that applications developed in one Trust can be successfully deployed in another. The information can be retrieved more quickly and efficiently than with traditional methods of health research.
The findings support the secondary use of EHRs for large-scale mental health research in naturalistic samples and settings investigated across large, diverse geographical areas.
The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data
Köhler, Sebastian; Doelken, Sandra C.; Mungall, Christopher J.; Bauer, Sebastian; Firth, Helen V.; Bailleul-Forestier, Isabelle; Black, Graeme C. M.; Brown, Danielle L.; Brudno, Michael; Campbell, Jennifer; FitzPatrick, David R.; Eppig, Janan T.; Jackson, Andrew P.; Freson, Kathleen; Girdea, Marta; Helbig, Ingo; Hurst, Jane A.; Jähn, Johanna; Jackson, Laird G.; Kelly, Anne M.; Ledbetter, David H.; Mansour, Sahar; Martin, Christa L.; Moss, Celia; Mumford, Andrew; Ouwehand, Willem H.; Park, Soo-Mi; Riggs, Erin Rooney; Scott, Richard H.; Sisodiya, Sanjay; Vooren, Steven Van; Wapner, Ronald J.; Wilkie, Andrew O. M.; Wright, Caroline F.; Vulto-van Silfhout, Anneke T.; de Leeuw, Nicole; de Vries, Bert B. A.; Washington, Nicole L.; Smith, Cynthia L.; Westerfield, Monte; Schofield, Paul; Ruef, Barbara J.; Gkoutos, Georgios V.; Haendel, Melissa; Smedley, Damian; Lewis, Suzanna E.; Robinson, Peter N.
2014-01-01
The Human Phenotype Ontology (HPO) project, available at http://www.human-phenotype-ontology.org, provides a structured, comprehensive and well-defined set of 10,088 classes (terms) describing human phenotypic abnormalities and 13,326 subclass relations between the HPO classes. In addition we have developed logical definitions for 46% of all HPO classes using terms from ontologies for anatomy, cell types, function, embryology, pathology and other domains. This allows interoperability with several resources, especially those containing phenotype information on model organisms such as mouse and zebrafish. Here we describe the updated HPO database, which provides annotations of 7,278 human hereditary syndromes listed in OMIM, Orphanet and DECIPHER to classes of the HPO. Various meta-attributes such as frequency, references and negations are associated with each annotation. Several large-scale projects worldwide utilize the HPO for describing phenotype information in their datasets. We have therefore generated equivalence mappings to other phenotype vocabularies such as LDDB, Orphanet, MedDRA, UMLS and phenoDB, allowing integration of existing datasets and interoperability with multiple biomedical resources. We have created various ways to access the HPO database content using flat files, a MySQL database, and Web-based tools. All data and documentation on the HPO project can be found online. PMID:24217912
PedAM: a database for Pediatric Disease Annotation and Medicine.
Jia, Jinmeng; An, Zhongxin; Ming, Yue; Guo, Yongli; Li, Wei; Li, Xin; Liang, Yunxiang; Guo, Dongming; Tai, Jun; Chen, Geng; Jin, Yaqiong; Liu, Zhimei; Ni, Xin; Shi, Tieliu
2018-01-04
There is a significant number of children around the world suffering the consequences of misdiagnosis and ineffective treatment of various diseases. To facilitate precision medicine in pediatrics, a database named the Pediatric Disease Annotations & Medicines (PedAM) has been built to standardize and classify pediatric diseases. The PedAM integrates both biomedical resources and clinical data from Electronic Medical Records to support the development of computational tools, enabling robust data analysis and integration. It also uses disease-manifestation (D-M) pairs integrated from existing biomedical ontologies as prior knowledge to automatically recognize text-mined, D-M-specific syntactic patterns from 774,514 full-text articles and 8,848,796 abstracts in MEDLINE. Additionally, disease connections based on phenotypes or genes can be visualized on the web page of PedAM. Currently, the PedAM contains 8,528 standardized pediatric disease terms (4,542 unique disease concepts and 3,986 synonyms) with eight annotation fields for each disease, including definition, synonyms, gene, symptom, cross-reference (Xref), human phenotypes and the corresponding phenotypes in the mouse. The database PedAM is freely accessible at http://www.unimd.org/pedam/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
A rat RNA-Seq transcriptomic BodyMap across 11 organs and 4 developmental stages
Yu, Ying; Fuscoe, James C.; Zhao, Chen; Guo, Chao; Jia, Meiwen; Qing, Tao; Bannon, Desmond I.; Lancashire, Lee; Bao, Wenjun; Du, Tingting; Luo, Heng; Su, Zhenqiang; Jones, Wendell D.; Moland, Carrie L.; Branham, William S.; Qian, Feng; Ning, Baitang; Li, Yan; Hong, Huixiao; Guo, Lei; Mei, Nan; Shi, Tieliu; Wang, Kevin Y.; Wolfinger, Russell D.; Nikolsky, Yuri; Walker, Stephen J.; Duerksen-Hughes, Penelope; Mason, Christopher E.; Tong, Weida; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Shi, Leming; Wang, Charles
2014-01-01
The rat has been used extensively as a model for evaluating chemical toxicities and for understanding drug mechanisms. However, its transcriptome across multiple organs, or developmental stages, has not yet been reported. Here we show, as part of the SEQC consortium efforts, a comprehensive rat transcriptomic BodyMap created by performing RNA-Seq on 320 samples from 11 organs of both sexes of juvenile, adolescent, adult and aged Fischer 344 rats. We catalogue the expression profiles of 40,064 genes, 65,167 transcripts, 31,909 alternatively spliced transcript variants and 2,367 non-coding genes/non-coding RNAs (ncRNAs) annotated in AceView. We find that organ-enriched, differentially expressed genes reflect the known organ-specific biological activities. A large number of transcripts show organ-specific, age-dependent or sex-specific differential expression patterns. We create a web-based, open-access rat BodyMap database of expression profiles with crosslinks to other widely used databases, anticipating that it will serve as a primary resource for biomedical research using the rat model. PMID:24510058
Language-Agnostic Reproducible Data Analysis Using Literate Programming.
Vassilev, Boris; Louhimo, Riku; Ikonen, Elina; Hautaniemi, Sampsa
2016-01-01
A modern biomedical research project can easily contain hundreds of analysis steps, and lack of reproducibility of the analyses has been recognized as a severe issue. While thorough documentation enables reproducibility, the number of analysis programs used can be so large that in reality reproducibility cannot be easily achieved. Literate programming is an approach to presenting computer programs to human readers. The code is rearranged to follow the logic of the program, and that logic is explained in a natural language. The code executed by the computer is extracted from the literate source code. As such, literate programming is an ideal formalism for systematizing analysis steps in biomedical research. We have developed the reproducible computing tool Lir (literate, reproducible computing), which allows a tool-agnostic approach to biomedical data analysis. We demonstrate the utility of Lir by applying it to a case study. Our aim was to investigate the role of endosomal trafficking regulators in the progression of breast cancer. In this analysis, a variety of tools were combined to interpret the available data: a relational database, standard command-line tools, and a statistical computing environment. The analysis revealed that the lipid-transport-related genes LAPTM4B and NDRG1 are coamplified in breast cancer patients, and identified genes potentially cooperating with LAPTM4B in breast cancer progression. Our case study demonstrates that with Lir, an array of tools can be combined in the same data analysis to improve efficiency, reproducibility, and ease of understanding. Lir is open-source software available at github.com/borisvassilev/lir.
Rembrandt: Helping Personalized Medicine Become a Reality Through Integrative Translational Research
Madhavan, Subha; Zenklusen, Jean-Claude; Kotliarov, Yuri; Sahni, Himanso; Fine, Howard A.; Buetow, Kenneth
2009-01-01
Finding better therapies for the treatment of brain tumors is hampered by the lack of consistently obtained molecular data in large sample sets and by the difficulty of integrating biomedical data from disparate sources to enable translation of therapies from bench to bedside. Hence, a critical factor in the advancement of biomedical research and clinical translation is the ease with which data can be integrated, redistributed and analyzed both within and across functional domains. Novel biomedical informatics infrastructure and tools are essential for developing individualized patient treatment based on the specific genomic signatures in each patient’s tumor. Here we present Rembrandt, the Repository of Molecular BRAin Neoplasia DaTa, a cancer clinical genomics database and a web-based data mining and analysis platform aimed at facilitating discovery by connecting the dots between clinical information and genomic characterization data. To date, Rembrandt contains data generated through the Glioma Molecular Diagnostic Initiative from 874 glioma specimens, comprising nearly 566 gene expression arrays, 834 copy number arrays and 13,472 clinical phenotype data points. Data can be queried and visualized for a selected gene across all data platforms or for multiple genes in a selected platform. Additionally, gene sets can be limited to clinically important annotations including secreted, kinase, membrane, and known gene-anomaly pairs to facilitate the discovery of novel biomarkers and therapeutic targets. We believe that REMBRANDT represents a prototype of how high-throughput genomic and clinical data can be integrated in a way that will allow expeditious and efficient translation of laboratory discoveries to the clinic. PMID:19208739
Enabling Large-Scale Biomedical Analysis in the Cloud
Lin, Ying-Chih; Yu, Chin-Sheng; Lin, Yen-Jen
2013-01-01
Recent progress in high-throughput instrumentation has led to an astonishing growth in both the volume and the complexity of biomedical data collected from various sources. These planet-scale data pose serious challenges for storage and computing technologies. Cloud computing is a promising alternative because it jointly addresses the storage of, and high-performance computing on, large-scale data. This work briefly introduces data-intensive computing systems and summarizes existing cloud-based resources in bioinformatics. These developments and applications should help biomedical researchers make the vast amount of diverse data meaningful and usable. PMID:24288665
Life Sciences Data Archive (LSDA)
NASA Technical Reports Server (NTRS)
Fitts, M.; Johnson-Throop, Kathy; Thomas, D.; Shackelford, K.
2008-01-01
In the early days of spaceflight, space life sciences data were collected and stored in numerous databases, formats, media types and geographical locations. While serving the needs of individual research teams, these data were largely unknown or unavailable to the scientific community at large. As a result, the Space Act of 1958 and the Science Data Management Policy mandated that research data collected by the National Aeronautics and Space Administration be made available to the science community at large. The Biomedical Informatics and Health Care Systems Branch of the Space Life Sciences Directorate at JSC and the Data Archive Project at ARC, with funding from the Human Research Program through the Exploration Medical Capability Element, are fulfilling these requirements through the systematic population of the Life Sciences Data Archive. This program constitutes a formal system for the acquisition, archiving and distribution of data for Life Sciences-sponsored experiments and investigations. The general goal of the archive is to acquire, preserve, and distribute these data using a variety of media which are accessible and responsive to inquiries from the science communities.
Pub-Med-dot-com, here we come!
Pulst, Stefan M
2016-08-01
As of April 8, 2016, articles in Neurology® Genetics can be searched using PubMed. Launched in 1996, PubMed is a search engine that accesses citations and abstracts of more than 26 million articles. Its primary sources include the MEDLINE database, which was started in the 1960s, and biomedical and life sciences journal articles that date back to 1946. In addition, PubMed accesses other sources, for example, citations to those life sciences journals that submit full-text articles to PubMed Central (PMC). PubMed Central was launched in 2000 as a free archive of biomedical and life science journals.
Read-across predictions require high quality measured data for source analogues. These data are typically retrieved from structured databases, but biomedical literature data are often untapped because current literature mining approaches are resource intensive. Our high-throughpu...
Stephens, Susie M.; Chen, Jake Y.; Davidson, Marcel G.; Thomas, Shiby; Trute, Barry M.
2005-01-01
As database management systems expand their array of analytical functionality, they become powerful research engines for biomedical data analysis and drug discovery. Databases can hold most of the data types commonly required in life sciences and consequently can be used as flexible platforms for the implementation of knowledgebases. Performing data analysis in the database simplifies data management by minimizing the movement of data from disks to memory, allowing pre-filtering and post-processing of datasets, and enabling data to remain in a secure, highly available environment. This article describes the Oracle Database 10g implementation of BLAST and Regular Expression Searches and provides case studies of their usage in bioinformatics. http://www.oracle.com/technology/software/index.html PMID:15608287
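As a rough analogue of the in-database regular-expression searches over sequence data described above, the sketch below scans protein sequences for a simplified PROSITE-style motif in plain Python. The motif (a simplified N-glycosylation pattern) and the sequences are invented for illustration; this is not Oracle's implementation.

```python
import re

# Simplified N-glycosylation-like motif: N, then any residue except
# proline, then serine or threonine (illustrative, not the full
# PROSITE pattern N-{P}-[ST]-{P}).
motif = re.compile(r"N[^P][ST]")

# Hypothetical protein sequences standing in for table rows.
sequences = {
    "prot1": "MKNVSAAQP",
    "prot2": "MPPLNGTQK",
}

# Record the start position of every motif hit in each sequence.
hits = {name: [m.start() for m in motif.finditer(seq)]
        for name, seq in sequences.items()}
print(hits)  # {'prot1': [2], 'prot2': [4]}
```

In-database search applies the same pattern-matching idea inside the query engine, so the sequences never have to leave the secure storage environment.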
Lowe, H J; Lomax, E C; Polonkey, S E
1996-01-01
The Internet is rapidly evolving from a resource used primarily by the research community to a true global information network offering a wide range of databases and services. This evolution presents many opportunities for improved access to biomedical information, but Internet-based resources have often been difficult for the non-expert to develop and use. The World Wide Web (WWW) supports an inexpensive, easy-to-use, cross-platform, graphic interface to the Internet that may radically alter the way we retrieve and disseminate medical data. This paper summarizes the Internet and hypertext origins of the WWW, reviews WWW-specific technologies, and describes current and future applications of this technology in medicine and medical informatics. The paper also includes an appendix of useful biomedical WWW servers. PMID:8750386
Using Ontology Fingerprints to disambiguate gene name entities in the biomedical literature.
Chen, Guocai; Zhao, Jieyi; Cohen, Trevor; Tao, Cui; Sun, Jingchun; Xu, Hua; Bernstam, Elmer V; Lawson, Andrew; Zeng, Jia; Johnson, Amber M; Holla, Vijaykumar; Bailey, Ann M; Lara-Guerra, Humberto; Litzenburger, Beate; Meric-Bernstam, Funda; Jim Zheng, W
2015-01-01
Ambiguous gene names in the biomedical literature are a barrier to accurate information extraction. To overcome this hurdle, we generated Ontology Fingerprints for selected genes that are relevant for personalized cancer therapy. These Ontology Fingerprints were used to evaluate the association between genes and biomedical literature to disambiguate gene names. We obtained 93.6% precision for the test gene set and an area under the receiver operating characteristic curve of 80.4% for gene-article association. The core algorithm was implemented using a graphics processing unit-based MapReduce framework to handle big data and to improve performance. We conclude that Ontology Fingerprints can help disambiguate gene names mentioned in text and analyse the association between genes and articles. Database URL: http://www.ontologyfingerprint.org © The Author(s) 2015. Published by Oxford University Press.
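The area-under-the-curve figure quoted above can be illustrated with a toy computation: AUC equals the probability that a randomly chosen positive (a correct gene-article pair) is scored above a randomly chosen negative, with ties counting one half. The scores below are invented and this is not the paper's evaluation pipeline.

```python
def roc_auc(scores_pos, scores_neg):
    # Rank-statistic formulation of AUC: fraction of (positive,
    # negative) pairs where the positive outranks the negative.
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical association scores for true and false gene-article pairs:
positives = [0.9, 0.8, 0.7, 0.4]
negatives = [0.6, 0.3, 0.2]
print(roc_auc(positives, negatives))  # 11/12 ≈ 0.917
```

One positive (0.4) is outranked by one negative (0.6), so 11 of the 12 pairs are ordered correctly.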
qPortal: A platform for data-driven biomedical research.
Mohr, Christopher; Friedrich, Andreas; Wojnar, David; Kenar, Erhan; Polatkan, Aydin Can; Codrea, Marius Cosmin; Czemmel, Stefan; Kohlbacher, Oliver; Nahnsen, Sven
2018-01-01
Modern biomedical research aims at drawing biological conclusions from large, highly complex biological datasets. It has become common practice to make extensive use of high-throughput technologies that produce large amounts of heterogeneous data. In addition to the ever-improving accuracy, methods are getting faster and cheaper, resulting in a steadily increasing need for scalable data management and easily accessible means of analysis. We present qPortal, a platform providing users with an intuitive way to manage and analyze quantitative biological data. The backend leverages a variety of concepts and technologies, such as relational databases, data stores, data models and means of data transfer, as well as front-end solutions to give users access to data management and easy-to-use analysis options. Users are empowered to conduct their experiments from the experimental design to the visualization of their results through the platform. Here, we illustrate the feature-rich portal by simulating a biomedical study based on publicly available data. We demonstrate the software's strength in supporting the entire project life cycle. The software supports the project design and registration, empowers users to do all-digital project management and finally provides means to perform analysis. We compare our approach to Galaxy, one of the most widely used scientific workflow and analysis platforms in computational biology. Application of both systems to a small case study shows the differences between a data-driven approach (qPortal) and a workflow-driven approach (Galaxy). qPortal, a one-stop-shop solution for biomedical projects, offers up-to-date analysis pipelines, quality control workflows, and visualization tools. Through intensive user interactions, appropriate data models have been developed.
These models build the foundation of our biological data management system and provide possibilities to annotate data, query metadata for statistics, and enable future re-analysis on high-performance computing systems via coupling of workflow management systems. Integration of project and data management as well as workflow resources in one place presents clear advantages over existing solutions.
Ravikumar, Komandur Elayavilli; Wagholikar, Kavishwar B; Li, Dingcheng; Kocher, Jean-Pierre; Liu, Hongfang
2015-06-06
Advances in next-generation sequencing technology have accelerated the pace of individualized medicine (IM), which aims to incorporate genetic/genomic information into medicine. One immediate need in interpreting sequencing data is the assembly of information about genetic variants and their corresponding associations with other entities (e.g., diseases or medications). Even with dedicated effort to capture such information in biological databases, much of this information remains 'locked' in the unstructured text of biomedical publications. There is a substantial lag between publication and the subsequent abstraction of such information into databases. Multiple text mining systems have been developed, but most of them focus on sentence-level association extraction with performance evaluation based on gold standard text annotations specifically prepared for text mining systems. We developed and evaluated a text mining system, MutD, which extracts protein mutation-disease associations from MEDLINE abstracts by incorporating discourse-level analysis, using a benchmark data set extracted from curated database records. MutD achieves an F-measure of 64.3% for reconstructing protein mutation-disease associations in curated database records. The discourse-level analysis component of MutD contributed to a gain of more than 10% in F-measure when compared against sentence-level association extraction. Our error analysis indicates that 23 of the 64 precision errors are true associations that were not captured by database curators and 68 of the 113 recall errors are caused by the absence of associated disease entities in the abstract. After adjusting for the defects in the curated database, the revised F-measure of MutD in association detection reaches 81.5%. Our quantitative analysis reveals that MutD can effectively extract protein mutation-disease associations when benchmarking based on curated database records.
The analysis also demonstrates that incorporating discourse-level analysis significantly improved the performance of extracting protein-mutation-disease associations. Future work includes extending MutD to full-text articles.
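The F-measure adjustment described above follows from the standard definition: F1 is the harmonic mean of precision and recall, and discounting errors later judged not to be true errors raises both. The counts below are hypothetical, not MutD's actual confusion matrix; they only show the mechanics of the adjustment.

```python
def f_measure(tp, fp, fn):
    # F1 score: harmonic mean of precision and recall.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts (invented; not the paper's evaluation data):
baseline = f_measure(tp=180, fp=64, fn=113)
# Discount errors attributable to database defects, e.g. 23 "false
# positives" that were real associations and 68 "false negatives"
# whose disease entity never appeared in the abstract:
adjusted = f_measure(tp=180, fp=64 - 23, fn=113 - 68)
print(round(baseline, 3), round(adjusted, 3))  # prints 0.67 0.807
```

Removing error counts from the denominators raises precision and recall simultaneously, which is why the revised F-measure jumps substantially.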
Heterogeneous database integration in biomedicine.
Sujansky, W
2001-08-01
The rapid expansion of biomedical knowledge, reduction in computing costs, and spread of internet access have created an ocean of electronic data. The decentralized nature of our scientific community and healthcare system, however, has resulted in a patchwork of diverse, or heterogeneous, database implementations, making access to and aggregation of data across databases very difficult. The database heterogeneity problem applies equally to clinical data describing individual patients and biological data characterizing our genome. Specifically, databases are highly heterogeneous with respect to the data models they employ, the data schemas they specify, the query languages they support, and the terminologies they recognize. Heterogeneous database systems attempt to unify disparate databases by providing uniform conceptual schemas that resolve representational heterogeneities, and by providing querying capabilities that aggregate and integrate distributed data. Research in this area has applied a variety of database and knowledge-based techniques, including semantic data modeling, ontology definition, query translation, query optimization, and terminology mapping. Existing systems have addressed heterogeneous database integration in the realms of molecular biology, hospital information systems, and application portability.
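The uniform-conceptual-schema and terminology-mapping ideas described above can be sketched minimally: two sources with different field names and coding systems are projected onto one schema. All field names, codes, and records below are invented examples, not any real system's schema.

```python
# Two heterogeneous sources: different field names, different codes.
source_a = [{"pt_id": 1, "dx": "MI"}]          # local abbreviation
source_b = [{"patient": 2, "diagnosis": "410.9"}]  # ICD-9 code

# Terminology mapping resolves both codings to one concept.
term_map = {
    "MI": "myocardial infarction",
    "410.9": "myocardial infarction",
}

def to_unified(record, id_field, dx_field):
    # Project a source-specific record onto the uniform schema.
    return {"patient_id": record[id_field],
            "diagnosis": term_map[record[dx_field]]}

unified = ([to_unified(r, "pt_id", "dx") for r in source_a] +
           [to_unified(r, "patient", "diagnosis") for r in source_b])
print(unified)
```

A mediated-schema system does this translation at query time, so one query aggregates records that the underlying databases represent quite differently.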
Liu, Qin; Tian, Li-Guang; Xiao, Shu-Hua; Qi, Zhen; Steinmann, Peter; Mak, Tippi K; Utzinger, Jürg; Zhou, Xiao-Nong
2008-01-01
The economy of China continues to boom, and so do its biomedical research and related publishing activities. Several so-called neglected tropical diseases that are most common in the developing world are still rampant or even emerging in some parts of China. The purpose of this article is to document the significant research potential of the Chinese biomedical bibliographic databases. The research contributions from China in the epidemiology and control of schistosomiasis provide an excellent illustration. We searched two widely used databases, namely the China National Knowledge Infrastructure (CNKI) and VIP Information (VIP). Employing the keyword "Schistosoma" and covering the period 1990–2006, we obtained 10,244 hits in the CNKI database and 5,975 in VIP. We examined the 10 Chinese biomedical journals that published the highest number of original research articles on schistosomiasis for issues including language and open access. Although most of the journals are published in Chinese, English abstracts are usually available. Open access to full articles was available in China Tropical Medicine in 2005/2006 and has been granted by the Chinese Journal of Parasitology and Parasitic Diseases since 2003; none of the other journals examined offered open access. We reviewed (i) the discovery and development of antischistosomal drugs, (ii) the progress made with molluscicides and (iii) environmental management for schistosomiasis control in China over the past 20 years. In conclusion, significant research is published in the Chinese literature, which is relevant for local control measures and global scientific knowledge. Open access should be encouraged and language barriers removed so the wealth of Chinese research can be more fully appreciated by the scientific community. PMID:18826598
Literature searches on Ayurveda: An update
Aggithaya, Madhur G.; Narahari, Saravu R.
2015-01-01
Introduction: The journals that publish on Ayurveda have increasingly been indexed by popular medical databases in recent years. However, many Eastern journals are not indexed in biomedical journal databases such as PubMed. Literature searches for Ayurveda continue to be challenging due to the nonavailability of active, unbiased, dedicated databases for Ayurvedic literature. In 2010, the authors identified 46 databases that can be used for a systematic search of Ayurvedic papers and theses. This update reviews our previous recommendations and identifies currently relevant databases. Aims: To provide an update on Ayurveda literature searching and on strategies to retrieve the maximum number of publications. Methods: The authors used psoriasis as an example to search the previously listed databases and to identify new ones. The population, intervention, control, and outcome table included keywords related to psoriasis and Ayurvedic terminologies for skin diseases. The current citation update status, search results, and search options of the previous databases were assessed. Eight search strategies were developed. One hundred and five journals, both biomedical and Ayurvedic, which publish on Ayurveda, were identified. Variability across databases was explored to identify bias in journal citation. Results: Five among the 46 databases are now relevant – the AYUSH research portal, Annotated Bibliography of Indian Medicine, Digital Helpline for Ayurveda Research Articles (DHARA), PubMed, and the Directory of Open Access Journals. Search options in these databases are not uniform, and only PubMed allows a complex search strategy. “The Researches in Ayurveda” and the “Ayurvedic Research Database” (ARD) are important grey resources for hand searching. About 44/105 (41.5%) of the journals publishing Ayurvedic studies are not indexed in any database. Only 11/105 (10.4%) exclusive Ayurveda journals are indexed in PubMed. Conclusion: The AYUSH research portal and DHARA are the two major portals established after 2010.
It is mandatory to search PubMed and the four other databases because all five carry citations from different groups of journals. Hand searching is important to identify Ayurveda publications that are not indexed elsewhere. Availability information on citations in Ayurveda libraries from the National Union Catalogue of Scientific Serials in India, if regularly updated, will improve the efficacy of hand searching. A grey database (ARD) contains unpublished PG/Ph.D. theses. The AYUSH portal, DHARA (funded by the Ministry of AYUSH), and ARD should be merged to form a single larger database to simplify Ayurveda literature searches. PMID:27313409
[The Chilean Association of Biomedical Journal Editors].
Reyes, H
2001-01-01
On September 29th, 2000, the Chilean Association of Biomedical Journal Editors was founded, sponsored by the "Comisión Nacional de Investigación Científica y Tecnológica (CONICYT)" (the governmental agency promoting and funding scientific research and technological development in Chile) and the "Sociedad Médica de Santiago" (Chilean Society of Internal Medicine). The Association adopted the goals of the World Association of Medical Editors (WAME) and therefore will foster "cooperation and communication among Editors of Chilean biomedical journals; to improve editorial standards, to promote professionalism in medical editing through education, self-criticism and self-regulation; and to encourage research on the principles and practice of medical editing". Twenty-nine journals, covering a similar number of different biomedical sciences, medical specialties, veterinary medicine, dentistry, and nursing, became Founding Members of the Association. A Governing Board was elected: President: Humberto Reyes, M.D. (Editor, Revista Médica de Chile); Vice-President: Mariano del Sol, M.D. (Editor, Revista Chilena de Anatomía); Secretary: Anna María Prat (CONICYT); Councilors: Manuel Krauskopff, Ph.D. (Editor, Biological Research) and Maritza Rahal, M.D. (Editor, Revista de Otorrinolaringología y Cirugía de Cabeza y Cuello). The Association will organize a Symposium on Biomedical Journal Editing and will spread information encouraging Chilean biomedical journals to become indexed in international databases and in SciELO-Chile, the main Chilean scientific website (www.scielo.cl).
Xiao, Y; Yuan, L; Liu, Y; Sun, X; Cheng, J; Wang, T; Li, F; Luo, R; Zhao, X
2015-02-01
A large number of traditional Chinese patent medicines (TCPMs) are widely used to treat migraine in China. However, it is uncertain whether there is robust evidence on the effects of TCPMs for migraine. A meta-analysis of randomized, double-blind, placebo-controlled trials was performed to evaluate the efficacy and safety of TCPMs in patients with migraine. Comprehensive searches were conducted on the Medline database, Cochrane Library, the China National Knowledge Infrastructure database, the Chinese Biomedical Literature database and the Wanfang database up to December 2013. Summary estimates, including 95% confidence intervals (CIs), were calculated for frequency of migraine attacks, response rate and headache intensity. A total of seven trials including 582 participants with migraine met the selection criteria. TCPM was significantly more likely to reduce the frequency of migraine attacks compared with placebo (standardized mean difference -0.54; 95% CI -0.72, -0.36; P < 0.001). TCPM was associated with an improvement of response rate compared with placebo (summary relative risk 4.63, 95% CI 2.74, 7.80, P < 0.001; therapeutic gain 24.1%; number needed to treat 4.1). Headache intensity was attenuated by TCPM compared with placebo (standardized mean difference -1.33; 95% CI -1.79, -0.87; P < 0.001). The adverse events of TCPM were no different from those of placebo. TCPMs are effective and well tolerated in the prophylactic treatment of migraine. © 2014 EAN.
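The number needed to treat reported in the meta-analysis above follows directly from the therapeutic gain (the absolute risk difference between treatment and placebo): NNT = 1 / absolute risk difference. A one-line check reproduces the figure.

```python
def number_needed_to_treat(risk_difference):
    # NNT is the reciprocal of the absolute risk difference:
    # how many patients must be treated for one additional responder.
    return 1.0 / risk_difference

# Therapeutic gain of 24.1% reported in the meta-analysis:
print(round(number_needed_to_treat(0.241), 1))  # 4.1
```

That is, roughly one additional patient responds for every four treated, matching the reported NNT of 4.1.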
Zhang, Yanqiong; Yang, Chunyuan; Wang, Shaochuang; Chen, Tao; Li, Mansheng; Wang, Xue; Li, Dongsheng; Wang, Kang; Ma, Jie; Wu, Songfeng; Zhang, Xueli; Zhu, Yunping; Wu, Jinsheng; He, Fuchu
2013-09-01
A large amount of liver-related physiological and pathological data exist in publicly available biological and bibliographic databases, which are usually far from comprehensive or integrated. Data collection, integration and mining processes pose a great challenge to scientific researchers and clinicians interested in the liver. To address these problems, we constructed LiverAtlas (http://liveratlas.hupo.org.cn), a comprehensive resource of biomedical knowledge related to the liver and various hepatic diseases by incorporating 53 databases. In the present version, LiverAtlas covers data on liver-related genomics, transcriptomics, proteomics, metabolomics and hepatic diseases. Additionally, LiverAtlas provides a wealth of manually curated information, relevant literature citations and cross-references to other databases. Importantly, an expert-confirmed Human Liver Disease Ontology, including relevant information for 227 types of hepatic disease, has been constructed and is used to annotate LiverAtlas data. Furthermore, we have demonstrated two examples of applying LiverAtlas data to identify candidate markers for hepatocellular carcinoma (HCC) at the systems level and to develop a systems biology-based classifier by combining the differential gene expression with topological features of human protein interaction networks to enhance the ability of HCC differential diagnosis. LiverAtlas is the most comprehensive liver and hepatic disease resource, which helps biologists and clinicians to analyse their data at the systems level and will contribute much to the biomarker discovery and diagnostic performance enhancement for liver diseases. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Comparison of PubMed, Scopus, Web of Science, and Google Scholar: strengths and weaknesses.
Falagas, Matthew E; Pitsouni, Eleni I; Malietzis, George A; Pappas, Georgios
2008-02-01
The evolution of the electronic age has led to the development of numerous medical databases on the World Wide Web, offering search facilities on a particular subject and the ability to perform citation analysis. We compared the content coverage and practical utility of PubMed, Scopus, Web of Science, and Google Scholar. The official Web pages of the databases were used to extract information on the range of journals covered, search facilities and restrictions, and update frequency. We used the example of a keyword search to evaluate the usefulness of these databases in biomedical information retrieval and a specific published article to evaluate their utility in performing citation analysis. All databases were practical in use and offered numerous search facilities. PubMed and Google Scholar are accessed for free. The keyword search with PubMed offers optimal update frequency and includes online early articles; other databases can rate articles by number of citations, as an index of importance. For citation analysis, Scopus offers about 20% more coverage than Web of Science, whereas Google Scholar offers results of inconsistent accuracy. PubMed remains an optimal tool in biomedical electronic research. Scopus covers a wider journal range, of help both in keyword searching and citation analysis, but it is currently limited to recent articles (published after 1995) compared with Web of Science. Google Scholar, as for the Web in general, can help in the retrieval of even the most obscure information but its use is marred by inadequate, less often updated, citation information.
Combining Crowd and Expert Labels using Decision Theoretic Active Learning
2015-10-11
meta-data such as titles, author information and keywords. Motivating Application: Biomedical Systematic Reviews. Evidence-based medicine (EBM) aims to...individuals trained in evidence-based medicine; usually MDs) reading the entire set of citations retrieved via database search to identify the small
Schneeweiss, S; Eichler, H-G; Garcia-Altes, A; Chinn, C; Eggimann, A-V; Garner, S; Goettsch, W; Lim, R; Löbker, W; Martin, D; Müller, T; Park, B J; Platt, R; Priddy, S; Ruhl, M; Spooner, A; Vannieuwenhuyse, B; Willke, R J
2016-12-01
Analyses of healthcare databases (claims, electronic health records [EHRs]) are useful supplements to clinical trials for generating evidence on the effectiveness, harm, use, and value of medical products in routine care. A constant stream of data from the routine operation of modern healthcare systems, which can be analyzed in rapid cycles, enables incremental evidence development to support accelerated and appropriate access to innovative medicines. Evidentiary needs by regulators, Health Technology Assessment, payers, clinicians, and patients after marketing authorization comprise (1) monitoring of medication performance in routine care, including the materialized effectiveness, harm, and value; (2) identifying new patient strata with added value or unacceptable harms; and (3) monitoring targeted utilization. Adaptive biomedical innovation (ABI) with rapid cycle database analytics is successfully enabled if evidence is meaningful, valid, expedited, and transparent. These principles will bring rigor and credibility to current efforts to increase research efficiency while upholding evidentiary standards required for effective decision-making in healthcare. © 2016 American Society for Clinical Pharmacology and Therapeutics.
Genomics Community Resources | Informatics Technology for Cancer Research (ITCR)
To facilitate genomic research and the dissemination of its products, National Human Genome Research Institute (NHGRI) supports genomic resources that are crucial for basic research, disease studies, model organism studies, and other biomedical research. Awards under this FOA will support the development and distribution of genomic resources that will be valuable for the broad research community, using cost-effective approaches. Such resources include (but are not limited to) databases and informatics resources (such as human and model organism databases, ontologies, and analysi
2012-01-01
Computational approaches to generate hypotheses from biomedical literature have been studied intensively in recent years. Nevertheless, it still remains a challenge to automatically discover novel, cross-silo biomedical hypotheses from large-scale literature repositories. In order to address this challenge, we first model a biomedical literature repository as a comprehensive network of biomedical concepts and formulate hypothesis generation as a process of link discovery on the concept network. We extract the relevant information from the biomedical literature corpus and generate a concept network and concept-author map on a cluster using the MapReduce framework. We extract a set of heterogeneous features such as random walk based features, neighborhood features and common author features. The potential number of links to consider for the possibility of link discovery is large in our concept network, and to address the scalability problem, the features from the concept network are extracted using a cluster with the MapReduce framework. We further model link discovery as a classification problem carried out on a training data set automatically extracted from two network snapshots taken in two consecutive time periods. A set of heterogeneous features, which cover both topological and semantic features derived from the concept network, have been studied with respect to their impact on the accuracy of the proposed supervised link discovery process. A case study of hypothesis generation based on the proposed method is presented in the paper. PMID:22759614
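Link discovery on a concept network, as described above, can be illustrated with the simplest of the neighborhood features: count the neighbors two unlinked concepts share, and rank candidate links by that count. The toy graph and concept names below are invented for the sketch; the paper's actual feature set (random walks, author overlap) is richer.

```python
# Toy undirected concept network: adjacency as a dict of sets.
graph = {
    "TP53": {"apoptosis", "cancer"},
    "BRCA1": {"cancer", "DNA repair"},
    "apoptosis": {"TP53", "caspase"},
    "cancer": {"TP53", "BRCA1"},
    "DNA repair": {"BRCA1"},
    "caspase": {"apoptosis"},
}

def common_neighbors(g, a, b):
    # A basic topological feature for link discovery: concepts that
    # share many neighbors are plausible candidate links.
    return len(g[a] & g[b])

# Score every currently unlinked concept pair by the feature.
nodes = sorted(graph)
candidates = []
for i, a in enumerate(nodes):
    for b in nodes[i + 1:]:
        if b not in graph[a]:
            candidates.append((common_neighbors(graph, a, b), a, b))
candidates.sort(reverse=True)
print(candidates[0])
```

In the supervised setting, such scores become feature columns and a classifier is trained on links that actually appeared between two network snapshots.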
Reinforcement learning interfaces for biomedical database systems.
Rudowsky, I; Kulyba, O; Kunin, M; Parsons, S; Raphan, T
2006-01-01
Studies of neural function that are carried out in different laboratories and that address different questions use a wide range of descriptors for data storage, depending on the laboratory and the individuals who input the data. A common approach to describing non-textual data that are referenced through a relational database is to use metadata descriptors. We have recently designed such a prototype system, but to maintain efficiency and a manageable metadata table, free-format fields were designed as table entries. The database interface application utilizes an intelligent agent to improve the integrity of operation. The purpose of this study was to investigate how reinforcement learning algorithms can assist the user in interacting with the database interface application, which was developed to improve the performance of the system.
Albin, Aaron; Ji, Xiaonan; Borlawsky, Tara B; Ye, Zhan; Lin, Simon; Payne, Philip RO; Huang, Kun; Xiang, Yang
2014-10-07
The Unified Medical Language System (UMLS) contains many important ontologies in which terms are connected by semantic relations. For many studies on the relationships between biomedical concepts, the use of transitively associated information from ontologies and the UMLS has been shown to be effective. Although there are a few tools and methods available for extracting transitive relationships from the UMLS, they usually have major restrictions on the length of transitive relations or on the number of data sources. Our goal was to design an online platform that enables efficient studies of the conceptual relationships between any medical terms. To overcome the restrictions of available methods and to facilitate studies of the conceptual relationships between medical terms, we developed a Web platform, onGrid, that supports efficient transitive queries and conceptual relationship studies using the UMLS. This framework uses the latest technique in converting natural language queries into UMLS concepts, performs efficient transitive queries, and visualizes the result paths. It also dynamically builds a relationship matrix for two sets of input biomedical terms. We are thus able to perform effective studies on conceptual relationships between medical terms based on their relationship matrix. The advantage of onGrid is that it can be applied to study any two sets of biomedical concept relations and the relations within one set of biomedical concepts. We use onGrid to study the disease-disease relationships in the Online Mendelian Inheritance in Man (OMIM). By cross-validating our results with an external database, the Comparative Toxicogenomics Database (CTD), we demonstrated that onGrid is effective for the study of conceptual relationships between medical terms. onGrid is an efficient tool for querying the UMLS for transitive relations, studying the relationships between medical terms, and generating hypotheses.
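A transitive query of the kind onGrid supports reduces to path search over directed semantic relations. The sketch below is illustrative only: the relation table and concept names are invented, and a real UMLS query runs over millions of typed relations rather than a small dictionary.

```python
from collections import deque

# Hypothetical "is-a" relations between concepts (stand-in for UMLS data).
RELATIONS = {
    "melanoma": ["skin neoplasm"],
    "skin neoplasm": ["neoplasm"],
    "neoplasm": ["disease"],
}

def transitive_path(src, dst):
    """Breadth-first search; returns the shortest relation path or None."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in RELATIONS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(transitive_path("melanoma", "disease"))
# → ['melanoma', 'skin neoplasm', 'neoplasm', 'disease']
```

Because BFS explores paths in order of length, the first path reaching the target concept is the shortest transitive chain, with no fixed limit on chain length.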
PATTERNS IN BIOMEDICAL DATA-HOW DO WE FIND THEM?
Basile, Anna O; Verma, Anurag; Byrska-Bishop, Marta; Pendergrass, Sarah A; Darabos, Christian; Lester Kirchner, H
2017-01-01
Given the exponential growth of biomedical data, researchers are faced with numerous challenges in extracting and interpreting information from these large, high-dimensional, incomplete, and often noisy data. To facilitate addressing this growing concern, the "Patterns in Biomedical Data-How do we find them?" session of the 2017 Pacific Symposium on Biocomputing (PSB) is devoted to exploring pattern recognition using data-driven approaches for biomedical and precision medicine applications. The papers selected for this session focus on novel machine learning techniques as well as applications of established methods to heterogeneous data. We also feature manuscripts aimed at addressing the current challenges associated with the analysis of biomedical data.
Gururaj, Anupama E.; Chen, Xiaoling; Pournejati, Saeid; Alter, George; Hersh, William R.; Demner-Fushman, Dina; Ohno-Machado, Lucila
2017-01-01
The rapid proliferation of publicly available biomedical datasets has provided abundant resources that are potentially of value as a means to reproduce prior experiments, and to generate and explore novel hypotheses. However, there are a number of barriers to the re-use of such datasets, which are distributed across a broad array of dataset repositories, focusing on different data types and indexed using different terminologies. New methods are needed to enable biomedical researchers to locate datasets of interest within this rapidly expanding information ecosystem, and new resources are needed for the formal evaluation of these methods as they emerge. In this paper, we describe the design and generation of a benchmark for information retrieval of biomedical datasets, which was developed and used for the 2016 bioCADDIE Dataset Retrieval Challenge. In the tradition of the seminal Cranfield experiments, and as exemplified by the Text Retrieval Conference (TREC), this benchmark includes a corpus (biomedical datasets), a set of queries, and relevance judgments relating these queries to elements of the corpus. This paper describes the process through which each of these elements was derived, with a focus on those aspects that distinguish this benchmark from typical information retrieval reference sets. Specifically, we discuss the origin of our queries in the context of a larger collaborative effort, the biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium, and the distinguishing features of biomedical dataset retrieval as a task. The resulting benchmark set has been made publicly available to advance research in the area of biomedical dataset retrieval. Database URL: https://biocaddie.org/benchmark-data PMID:29220453
Perera, Gayan; Broadbent, Matthew; Callard, Felicity; Chang, Chin-Kuo; Downs, Johnny; Dutta, Rina; Fernandes, Andrea; Hayes, Richard D; Henderson, Max; Jackson, Richard; Jewell, Amelia; Kadra, Giouliana; Little, Ryan; Pritchard, Megan; Shetty, Hitesh; Tulloch, Alex; Stewart, Robert
2016-03-01
The South London and Maudsley National Health Service (NHS) Foundation Trust Biomedical Research Centre (SLaM BRC) Case Register and its Clinical Record Interactive Search (CRIS) application were developed in 2008, generating a research repository of real-time, anonymised, structured and open-text data derived from the electronic health record system used by SLaM, a large mental healthcare provider in southeast London. In this paper, we update this register's descriptive data, and describe the substantial expansion and extension of the data resource since its original development. Descriptive data were generated from the SLaM BRC Case Register on 31 December 2014. Currently, there are over 250,000 patient records accessed through CRIS. Since 2008, the most significant developments in the SLaM BRC Case Register have been the introduction of natural language processing to extract structured data from open-text fields, linkages to external sources of data, and the addition of a parallel relational database (Structured Query Language) output. Natural language processing applications to date have brought in new and hitherto inaccessible data on cognitive function, education, social care receipt, smoking, diagnostic statements and pharmacotherapy. In addition, through external data linkages, large volumes of supplementary information have been accessed on mortality, hospital attendances and cancer registrations. Coupled with robust data security and governance structures, electronic health records provide potentially transformative information on mental disorders and outcomes in routine clinical care. The SLaM BRC Case Register continues to grow as a database, with approximately 20,000 new cases added each year, in addition to extension of follow-up for existing cases. Data linkages and natural language processing present important opportunities to enhance this type of research resource further, achieving both volume and depth of data. 
However, research projects still need to be carefully tailored, so that they take into account the nature and quality of the source information. Published by the BMJ Publishing Group Limited.
Twitter K-H networks in action: Advancing biomedical literature for drug search.
Hamed, Ahmed Abdeen; Wu, Xindong; Erickson, Robert; Fandy, Tamer
2015-08-01
The importance of searching the biomedical literature for drug interactions and side-effects is apparent. Current digital libraries (e.g., PubMed) suffer from infrequent tagging and metadata annotation updates. Such limitations leave the literature unlinked to new scientific evidence, posing considerable challenges for scientists searching biomedical repositories. In this paper, we present a network mining approach that provides a bridge for linking and searching drug-related literature. Our contributions are twofold: (1) an efficient algorithm called HashPairMiner that addresses the run-time complexity issues demonstrated by its predecessor algorithm, HashnetMiner; and (2) a database of discoveries hosted on the web to facilitate literature search using the results produced by HashPairMiner. Though the K-H network model and the HashPairMiner algorithm are fairly young, their outcome is evidence of the considerable promise they offer to the biomedical science community in general and the drug research community in particular. Copyright © 2015 Elsevier Inc. All rights reserved.
Alkemio: association of chemicals with biomedical topics by text and data mining
Gijón-Correas, José A.; Andrade-Navarro, Miguel A.; Fontaine, Jean F.
2014-01-01
The PubMed® database of biomedical citations allows the retrieval of scientific articles studying the function of chemicals in biology and medicine. Mining millions of available citations to search reported associations between chemicals and topics of interest would require substantial human time. We have implemented the Alkemio text mining web tool and SOAP web service to help in this task. The tool uses biomedical articles discussing chemicals (including drugs), predicts their relatedness to the query topic with a naïve Bayesian classifier and ranks all chemicals by P-values computed from random simulations. Benchmarks on seven human pathways showed good retrieval performance (areas under the receiver operating characteristic curves ranged from 73.6 to 94.5%). Comparison with existing tools to retrieve chemicals associated with eight diseases showed the higher precision and recall of Alkemio when considering the top 10 candidate chemicals. Alkemio is a high-performing web tool ranking chemicals for any biomedical topic, and it is free for non-commercial users. Availability: http://cbdm.mdc-berlin.de/∼medlineranker/cms/alkemio. PMID:24838570
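Ranking by P-values computed from random simulations, as Alkemio does, amounts to comparing each chemical's observed score against a null distribution of scores from randomized runs. The sketch below is a hedged illustration of that general idea; the scoring and the null model are invented, not Alkemio's.

```python
import random

def empirical_pvalue(observed_score, null_scores):
    """Fraction of null scores at least as extreme as the observed one.
    The add-one correction keeps the P-value strictly positive."""
    hits = sum(1 for s in null_scores if s >= observed_score)
    return (hits + 1) / (len(null_scores) + 1)

# Null distribution from hypothetical randomized simulations:
random.seed(0)
null = [random.gauss(0.0, 1.0) for _ in range(999)]

# A chemical scoring far above the null gets a small P-value,
# so it ranks near the top of the candidate list.
print(empirical_pvalue(3.5, null))
```

Chemicals would then be sorted by ascending P-value to produce the final ranking.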
Uniform resolution of compact identifiers for biomedical data
Wimalaratne, Sarala M.; Juty, Nick; Kunze, John; Janée, Greg; McMurry, Julie A.; Beard, Niall; Jimenez, Rafael; Grethe, Jeffrey S.; Hermjakob, Henning; Martone, Maryann E.; Clark, Tim
2018-01-01
Most biomedical data repositories issue locally unique accession numbers, but do not provide globally unique, machine-resolvable, persistent identifiers for their datasets, as required by publishers wishing to implement data citation in accordance with widely accepted principles. Local accessions may however be prefixed with a namespace identifier, providing global uniqueness. Such “compact identifiers” have been widely used in biomedical informatics to support global resource identification with local identifier assignment. We report here on our project to provide robust support for machine-resolvable, persistent compact identifiers in biomedical data citation, by harmonizing the Identifiers.org and N2T.net (Name-To-Thing) meta-resolvers and extending their capabilities. Identifiers.org services hosted at the European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), and N2T.net services hosted at the California Digital Library (CDL), can now resolve any given identifier from over 600 source databases to its original source on the Web, using a common registry of prefix-based redirection rules. We believe these services will be of significant help to publishers and others implementing persistent, machine-resolvable citation of research data. PMID:29737976
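Prefix-based redirection can be illustrated with a toy resolver. The registry below is a stand-in for the shared prefix registry the meta-resolvers use; the two URL templates are illustrative assumptions, not the services' actual redirection rules.

```python
# Hypothetical prefix registry: namespace prefix → URL template.
REGISTRY = {
    "pubmed": "https://www.ncbi.nlm.nih.gov/pubmed/{id}",
    "uniprot": "https://www.uniprot.org/uniprot/{id}",
}

def resolve(compact_id):
    """Split 'prefix:accession' and expand the registered template."""
    prefix, _, accession = compact_id.partition(":")
    template = REGISTRY.get(prefix.lower())
    if template is None:
        raise KeyError(f"unknown namespace prefix: {prefix}")
    return template.format(id=accession)

print(resolve("pubmed:29737976"))
# → https://www.ncbi.nlm.nih.gov/pubmed/29737976
```

Because the accession stays local and only the prefix is globally registered, adding a new source database means adding one registry entry, not re-issuing identifiers.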
Kibbe, Warren A; Arze, Cesar; Felix, Victor; Mitraka, Elvira; Bolton, Evan; Fu, Gang; Mungall, Christopher J; Binder, Janos X; Malone, James; Vasant, Drashtti; Parkinson, Helen; Schriml, Lynn M
2015-01-01
The current version of the Human Disease Ontology (DO) (http://www.disease-ontology.org) database expands the utility of the ontology for the examination and comparison of genetic variation, phenotype, protein, drug and epitope data through the lens of human disease. DO is a biomedical resource of standardized common and rare disease concepts with stable identifiers organized by disease etiology. The content of DO has had 192 revisions since 2012, including the addition of 760 terms. Thirty-two percent of all terms now include definitions. DO has expanded the number and diversity of research communities and community members by 50+ during the past two years. These community members actively submit term requests, coordinate biomedical resource disease representation and provide expert curation guidance. Since the DO 2012 NAR paper, there have been hundreds of term requests and a steady increase in the number of DO listserv members, twitter followers and DO website usage. DO is moving to a multi-editor model utilizing Protégé to curate DO in web ontology language. This will enable closer collaboration with the Human Phenotype Ontology, EBI's Ontology Working Group, Mouse Genome Informatics and the Monarch Initiative among others, and enhance DO's current asserted view and multiple inferred views through reasoning. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
GOVERNING GENETIC DATABASES: COLLECTION, STORAGE AND USE
Gibbons, Susan M.C.; Kaye, Jane
2008-01-01
This paper provides an introduction to a collection of five papers, published as a special symposium journal issue, under the title: “Governing Genetic Databases: Collection, Storage and Use”. It begins by setting the scene, to provide a backdrop and context for the papers. It describes the evolving scientific landscape around genetic databases and genomic research, particularly within the biomedical and criminal forensic investigation fields. It notes the lack of any clear, coherent or coordinated legal governance regime, either at the national or international level. It then identifies and reflects on key cross-cutting issues and themes that emerge from the five papers, in particular: terminology and definitions; consent; special concerns around population genetic databases (biobanks) and forensic databases; international harmonisation; data protection; data access; boundary-setting; governance; and issues around balancing individual interests against public good values. PMID:18841252
Grethe, Jeffrey S; Ross, Edward; Little, David; Sanders, Brian; Gupta, Amarnath; Astakhov, Vadim
2009-01-01
This paper presents current progress in the development of a semantic data integration environment, part of the Biomedical Informatics Research Network (BIRN; http://www.nbirn.net) project. BIRN is sponsored by the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH). A goal is the development of a cyberinfrastructure for biomedical research that supports advanced data acquisition, data storage, data management, data integration, data mining, data visualization, and other computing and information processing services over the Internet. Each participating institution maintains storage of its experimental or computationally derived data. A mediator-based data integration system performs semantic integration over the databases to enable researchers to perform analyses based on larger and broader datasets than would be available from any single institution's data. This paper describes a recent revision of the system architecture, implementation, and capabilities of the semantically based data integration environment for BIRN.
Extracting biomedical events from pairs of text entities
2015-01-01
Background Huge amounts of electronic biomedical documents, such as molecular biology reports or genomic papers, are generated daily. Nowadays, these documents are mainly available in the form of unstructured free texts, which require heavy processing for their registration into organized databases. This organization is instrumental for information retrieval, making it possible to answer the advanced queries of researchers and practitioners in biology, medicine, and related fields. Hence, the massive data flow calls for efficient automatic text-mining methods that extract high-level information, such as biomedical events, from biomedical text. The usual computational tools of Natural Language Processing cannot be readily applied to extract these biomedical events, due to the peculiarities of the domain. Indeed, biomedical documents contain highly domain-specific jargon and syntax. These documents also describe distinctive dependencies, making text-mining in molecular biology a specific discipline. Results We address biomedical event extraction as the classification of pairs of text entities into the classes corresponding to event types. The candidate pairs of text entities are recursively provided to a multiclass classifier relying on Support Vector Machines. This recursive process extracts events involving other events as arguments. Compared to joint models based on Markov Random Fields, our model simplifies inference and hence requires shorter training and prediction times along with lower memory capacity. Compared to usual pipeline approaches, our model passes over a complex intermediate problem, while making more extensive use of sophisticated joint features between text entities. Our method focuses on the core event extraction of the Genia task of the BioNLP challenges, yielding the best result reported so far on the 2013 edition. PMID:26201478
A Novel Multi-Class Ensemble Model for Classifying Imbalanced Biomedical Datasets
NASA Astrophysics Data System (ADS)
Bikku, Thulasi; Sambasiva Rao, N., Dr; Rao, Akepogu Ananda, Dr
2017-08-01
This paper focuses on developing a Hadoop-based framework with feature selection and classification models to classify high-dimensional data in heterogeneous biomedical databases. Extensive research has been performed in the fields of machine learning, big data and data mining for identifying patterns. The main challenge is extracting useful features generated from diverse biological systems. The proposed model can be used for predicting diseases in various applications and identifying the features relevant to particular diseases. Given the exponential growth of biomedical repositories such as PubMed and Medline, an accurate predictive model is essential for knowledge discovery in the Hadoop environment. Extracting key features from unstructured documents often leads to uncertain results due to outliers and missing values. In this paper, we propose a two-phase map-reduce framework with a text preprocessor and a classification model. In the first phase, a mapper-based preprocessing method was designed to eliminate irrelevant features, missing values and outliers from the biomedical data. In the second phase, a Map-Reduce-based multi-class ensemble decision tree model was designed and applied to the preprocessed mapper data to improve the true positive rate and computational time. The experimental results on complex biomedical datasets show that our proposed Hadoop-based multi-class ensemble model significantly outperforms state-of-the-art baselines.
An automated procedure to identify biomedical articles that contain cancer-associated gene variants.
McDonald, Ryan; Scott Winters, R; Ankuda, Claire K; Murphy, Joan A; Rogers, Amy E; Pereira, Fernando; Greenblatt, Marc S; White, Peter S
2006-09-01
The proliferation of biomedical literature makes it increasingly difficult for researchers to find and manage relevant information. However, identifying research articles containing mutation data, a requisite first step in integrating large and complex mutation data sets, is currently tedious, time-consuming and imprecise. More effective mechanisms for identifying articles containing mutation information would be beneficial both for the curation of mutation databases and for individual researchers. We developed an automated method that uses information extraction, classifier, and relevance ranking techniques to determine the likelihood of MEDLINE abstracts containing information regarding genomic variation data suitable for inclusion in mutation databases. We targeted the CDKN2A (p16) gene and the procedure for document identification currently used by CDKN2A Database curators as a measure of feasibility. A set of abstracts was manually identified from a MEDLINE search as potentially containing specific CDKN2A mutation events. A subset of these abstracts was used as a training set for a maximum entropy classifier to identify text features distinguishing "relevant" from "not relevant" abstracts. Each document was represented as a set of indicative word, word pair, and entity tagger-derived genomic variation features. When applied to a test set of 200 candidate abstracts, the classifier predicted 88 articles as being relevant; of these, 29 of 32 manuscripts in which manual curation found CDKN2A sequence variants were positively predicted. Thus, the set of potentially useful articles that a manual curator would have to review was reduced by 56%, maintaining 91% recall (sensitivity) and more than doubling precision (positive predictive value). Subsequent expansion of the training set to 494 articles yielded similar precision and recall rates, and comparison of the original and expanded trials demonstrated that the average precision improved with the larger data set. 
Our results show that automated systems can effectively identify article subsets relevant to a given task and may prove to be powerful tools for the broader research community. This procedure can be readily adapted to any or all genes, organisms, or sets of documents. Published 2006 Wiley-Liss, Inc.
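The headline figures in the abstract above follow directly from its raw counts (200 candidate abstracts screened, 88 predicted relevant, 29 of the 32 truly relevant articles recovered), as this short check shows:

```python
# Reproduce the reported evaluation metrics from the abstract's raw counts.
candidates, predicted_relevant = 200, 88
true_relevant, true_positives = 32, 29

recall = true_positives / true_relevant             # sensitivity: 29/32
precision = true_positives / predicted_relevant     # positive predictive value
workload_reduction = (candidates - predicted_relevant) / candidates

print(f"recall={recall:.0%} precision={precision:.0%} "
      f"reduction={workload_reduction:.0%}")
# → recall=91% precision=33% reduction=56%
```

The 33% precision is indeed more than double the 16% base rate (32 relevant out of 200 candidates), matching the abstract's "more than doubling precision" claim.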
Spatial and symbolic queries for 3D image data
NASA Astrophysics Data System (ADS)
Benson, Daniel C.; Zick, Gregory L.
1992-04-01
We present a query system for an object-oriented biomedical imaging database containing 3-D anatomical structures and their corresponding 2-D images. The graphical interface facilitates the formation of spatial queries, nonspatial or symbolic queries, and combined spatial/symbolic queries. A query editor is used for the creation and manipulation of 3-D query objects as volumes, surfaces, lines, and points. Symbolic predicates are formulated through a combination of text fields and multiple choice selections. Query results, which may include images, image contents, composite objects, graphics, and alphanumeric data, are displayed in multiple views. Objects returned by the query may be selected directly within the views for further inspection or modification, or for use as query objects in subsequent queries. Our image database query system provides visual feedback and manipulation of spatial query objects, multiple views of volume data, and the ability to combine spatial and symbolic queries. The system allows for incremental enhancement of existing objects and the addition of new objects and spatial relationships. The query system is designed for databases containing symbolic and spatial data. This paper discusses its application to data acquired in biomedical 3-D image reconstruction, but it is applicable to other areas such as CAD/CAM, geographical information systems, and computer vision.
UKPMC: a full text article resource for the life sciences.
McEntyre, Johanna R; Ananiadou, Sophia; Andrews, Stephen; Black, William J; Boulderstone, Richard; Buttery, Paula; Chaplin, David; Chevuru, Sandeepreddy; Cobley, Norman; Coleman, Lee-Ann; Davey, Paul; Gupta, Bharti; Haji-Gholam, Lesley; Hawkins, Craig; Horne, Alan; Hubbard, Simon J; Kim, Jee-Hyub; Lewin, Ian; Lyte, Vic; MacIntyre, Ross; Mansoor, Sami; Mason, Linda; McNaught, John; Newbold, Elizabeth; Nobata, Chikashi; Ong, Ernest; Pillai, Sharmila; Rebholz-Schuhmann, Dietrich; Rosie, Heather; Rowbotham, Rob; Rupp, C J; Stoehr, Peter; Vaughan, Philip
2011-01-01
UK PubMed Central (UKPMC) is a full-text article database that extends the functionality of the original PubMed Central (PMC) repository. The UKPMC project was launched as the first 'mirror' site to PMC, which in analogy to the International Nucleotide Sequence Database Collaboration, aims to provide international preservation of the open and free-access biomedical literature. UKPMC (http://ukpmc.ac.uk) has undergone considerable development since its inception in 2007 and now includes both a UKPMC and PubMed search, as well as access to other records such as Agricola, Patents and recent biomedical theses. UKPMC also differs from PubMed/PMC in that the full text and abstract information can be searched in an integrated manner from one input box. Furthermore, UKPMC contains 'Cited By' information as an alternative way to navigate the literature and has incorporated text-mining approaches to semantically enrich content and integrate it with related database resources. Finally, UKPMC also offers added-value services (UKPMC+) that enable grantees to deposit manuscripts, link papers to grants, publish online portfolios and view citation information on their papers. Here we describe UKPMC and clarify the relationship between PMC and UKPMC, providing historical context and future directions, 10 years on from when PMC was first launched. PMID:21062818
The BioGRID interaction database: 2013 update.
Chatr-Aryamontri, Andrew; Breitkreutz, Bobby-Joe; Heinicke, Sven; Boucher, Lorrie; Winter, Andrew; Stark, Chris; Nixon, Julie; Ramage, Lindsay; Kolas, Nadine; O'Donnell, Lara; Reguly, Teresa; Breitkreutz, Ashton; Sellam, Adnane; Chen, Daici; Chang, Christie; Rust, Jennifer; Livstone, Michael; Oughtred, Rose; Dolinski, Kara; Tyers, Mike
2013-01-01
The Biological General Repository for Interaction Datasets (BioGRID: http://thebiogrid.org) is an open access archive of genetic and protein interactions that are curated from the primary biomedical literature for all major model organism species. As of September 2012, BioGRID houses more than 500 000 manually annotated interactions from more than 30 model organisms. BioGRID maintains complete curation coverage of the literature for the budding yeast Saccharomyces cerevisiae, the fission yeast Schizosaccharomyces pombe and the model plant Arabidopsis thaliana. A number of themed curation projects in areas of biomedical importance are also supported. BioGRID has established collaborations and/or shares data records for the annotation of interactions and phenotypes with most major model organism databases, including Saccharomyces Genome Database, PomBase, WormBase, FlyBase and The Arabidopsis Information Resource. BioGRID also actively engages with the text-mining community to benchmark and deploy automated tools to expedite curation workflows. BioGRID data are freely accessible through both a user-defined interactive interface and in batch downloads in a wide variety of formats, including PSI-MI2.5 and tab-delimited files. BioGRID records can also be interrogated and analyzed with a series of new bioinformatics tools, which include a post-translational modification viewer, a graphical viewer, a REST service and a Cytoscape plugin.
Trapp, Jamie
2016-12-01
There are often differences in a publication's citation count, depending on the database accessed. Here, aspects of citation counts for medical physics and biomedical engineering papers are studied using papers published in the journal Australasian Physical and Engineering Sciences in Medicine. Comparison is made between the Web of Science, Scopus, and Google Scholar. Papers are categorised into subject matter, and citation trends are examined. It is shown that review papers as a group tend to receive more citations on average; however, the highest-cited individual papers are more likely to be research papers.
Open-source tools for data mining.
Zupan, Blaz; Demsar, Janez
2008-03-01
With a growing volume of biomedical databases and repositories, the need to develop a set of tools to address their analysis and support knowledge discovery is becoming acute. The data mining community has developed a substantial set of techniques for computational treatment of these data. In this article, we discuss the evolution of open-source toolboxes that data mining researchers and enthusiasts have developed over the span of a few decades and review several currently available open-source data mining suites. The approaches we review are diverse in data mining methods and user interfaces and also demonstrate that the field and its tools are ready to be fully exploited in biomedical research.
2012-01-01
Background Anatomic and physiological similarities to humans make swine an excellent large animal model for human health and disease. Methods Cloning from a modified somatic cell, in which the modification can be verified before the animal is produced, is the only method available for the production of targeted modifications in swine. Results Since some strains of swine are similar in size to humans, technologies that have been developed for swine can be readily adapted to humans and vice versa. Here the importance of swine as a biomedical model, current technologies to produce genetically enhanced swine, current biomedical models, and how the completion of the swine genome will promote swine as a biomedical model are discussed. Conclusions The completion of the swine genome will enhance the continued use and development of swine as models of human health, syndromes and conditions. PMID:23151353
Bramer, Wichor M; Giustini, Dean; Kramer, Bianca Mr; Anderson, Pf
2013-12-23
The usefulness of Google Scholar (GS) as a bibliographic database for biomedical systematic review (SR) searching is a subject of current interest and debate in research circles. Recent research has suggested GS might even be used alone in SR searching. This assertion is challenged here by testing whether GS can locate all studies included in 21 previously published SRs. Second, it examines the recall of GS, taking into account the maximum number of items that can be viewed, and tests whether more complete searches created by an information specialist will improve recall compared to the searches used in the 21 published SRs. The authors identified 21 biomedical SRs that had used GS and PubMed as information sources and reported their use of identical, reproducible search strategies in both databases. These search strategies were rerun in GS and PubMed, and analyzed as to their coverage and recall. Efforts were made to improve searches that underperformed in each database. GS' overall coverage was higher than PubMed (98% versus 91%) and overall recall is higher in GS: 80% of the references included in the 21 SRs were returned by the original searches in GS versus 68% in PubMed. Only 72% of the included references could be used as they were listed among the first 1,000 hits (the maximum number shown). Practical precision (the number of included references retrieved in the first 1,000, divided by 1,000) was on average 1.9%, which is only slightly lower than in other published SRs. Improving searches with the lowest recall resulted in an increase in recall from 48% to 66% in GS and, in PubMed, from 60% to 85%. Although its coverage and precision are acceptable, GS, because of its incomplete recall, should not be used as a single source in SR searching. A specialized, curated medical database such as PubMed provides experienced searchers with tools and functionality that help improve recall, and numerous options in order to optimize precision. 
Searches for SRs should be performed by experienced searchers creating searches that maximize recall for as many databases as deemed necessary by the search expert.
PMID:24360284
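The three retrieval metrics used in this evaluation (coverage, recall, and "practical precision") are simple ratios; a minimal sketch, with hypothetical counts for a review with 50 included references (the numbers below are invented for illustration, not the study's data):

```python
# Retrieval metrics as defined in the abstract above; counts are hypothetical.

def coverage(found_in_db, total_included):
    """Fraction of included references indexed by the database at all."""
    return found_in_db / total_included

def recall(retrieved_included, total_included):
    """Fraction of included references returned by the search strategy."""
    return retrieved_included / total_included

def practical_precision(included_in_first_1000):
    """Included references among the first 1,000 hits, divided by 1,000
    (Google Scholar displays at most 1,000 results)."""
    return included_in_first_1000 / 1000

# Hypothetical systematic review with 50 included references:
print(coverage(49, 50))         # 0.98
print(recall(40, 50))           # 0.8
print(practical_precision(19))  # 0.019
```

The "practical" qualifier matters: recall is computed against all included references, but precision is capped by the 1,000-hit display limit regardless of how many hits the search actually returned.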
Airoldi, Edoardo M.; Bai, Xue; Malin, Bradley A.
2011-01-01
We live in an increasingly mobile world, which leads to the duplication of information across domains. Though organizations attempt to obscure the identities of their constituents when sharing information for worthwhile purposes, such as basic research, the uncoordinated nature of such environments can lead to privacy vulnerabilities. For instance, disparate healthcare providers can collect information on the same patient. Federal policy requires that such providers share “de-identified” sensitive data, such as biomedical (e.g., clinical and genomic) records. But at the same time, such providers can share identified information, devoid of sensitive biomedical data, for administrative functions. On a provider-by-provider basis, the biomedical and identified records appear unrelated; however, links can be established when multiple providers’ databases are studied jointly. The problem, known as trail disclosure, is a generalized phenomenon and occurs because an individual’s location access pattern can be matched across the shared databases. Due to technical and legal constraints, it is often difficult to coordinate between providers, and thus it is critical to assess the disclosure risk in distributed environments so that we can develop techniques to mitigate such risks. Research on privacy protection has so far focused on developing technologies to suppress or encrypt identifiers associated with sensitive information. There is a growing body of work on the formal assessment of the disclosure risk of database entries in publicly shared databases, but less attention has been paid to the distributed setting. In this research, we review the trail disclosure problem in several domains with known vulnerabilities and show that disclosure risk is influenced by the distribution of how people visit service providers. Based on empirical evidence, we propose an entropy metric for assessing such risk in shared databases prior to their release.
This metric assesses risk by leveraging the statistical characteristics of a visit distribution, as opposed to person-level data. It is computationally efficient and superior to existing risk assessment methods, which rely on ad hoc assessments that are often computationally expensive and unreliable. We evaluate our approach on a range of location access patterns in simulated environments. Our results demonstrate that the approach is effective at estimating trail disclosure risks and that the amount of self-information contained in a distributed system is one of the main driving factors. PMID:21647242
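The abstract does not specify the exact metric, but it is described as an entropy measure over a visit distribution. As a hedged illustration of the idea, the sketch below computes the Shannon entropy of how visits spread across providers: a skewed pattern (low entropy, low self-information) should score differently from a uniform one.

```python
import math
from collections import Counter

# Illustrative sketch only -- the paper's actual metric may differ.
def visit_entropy(visits):
    """Shannon entropy (bits) of the distribution of visits across providers."""
    total = sum(visits.values())
    probs = [n / total for n in visits.values() if n > 0]
    return -sum(p * math.log2(p) for p in probs)

# Skewed pattern (most visits at one provider) vs. a near-uniform pattern:
skewed = Counter({"A": 97, "B": 2, "C": 1})
uniform = Counter({"A": 34, "B": 33, "C": 33})
print(visit_entropy(skewed) < visit_entropy(uniform))  # True
```

A uniform distribution maximizes entropy (here approaching log2(3) ≈ 1.58 bits), reflecting the abstract's point that self-information in the system drives trail disclosure risk.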
View-Based Searching Systems--Progress Towards Effective Disintermediation.
ERIC Educational Resources Information Center
Pollitt, A. Steven; Smith, Martin P.; Treglown, Mark; Braekevelt, Patrick
This paper presents the background and then reports progress made in the development of two view-based searching systems--HIBROWSE for EMBASE, searching Europe's most important biomedical bibliographic database, and HIBROWSE for EPOQUE, improving access to the European Parliament's Online Query System. The HIBROWSE approach to searching promises…
Torous, John; Kiang, Mathew V; Lorme, Jeanette; Onnela, Jukka-Pekka
2016-05-05
A longstanding barrier to progress in psychiatry, both in clinical settings and research trials, has been the persistent difficulty of accurately and reliably quantifying disease phenotypes. Mobile phone technology combined with data science has the potential to offer medicine a wealth of additional information on disease phenotypes, but the large majority of existing smartphone apps are not intended for use as biomedical research platforms and, as such, do not generate research-quality data. Our aim is not the creation of yet another app per se but rather the establishment of a platform to collect research-quality smartphone raw sensor and usage pattern data. Our ultimate goal is to develop statistical, mathematical, and computational methodology to enable us and others to extract biomedical and clinical insights from smartphone data. We report on the development and early testing of Beiwe, a research platform featuring a study portal, smartphone app, database, and data modeling and analysis tools designed and developed specifically for transparent, customizable, and reproducible biomedical research use, in particular for the study of psychiatric and neurological disorders. We also outline a proposed study using the platform for patients with schizophrenia. We demonstrate the passive data capabilities of the Beiwe platform and early results of its analytical capabilities. Smartphone sensors and phone usage patterns, when coupled with appropriate statistical learning tools, are able to capture various social and behavioral manifestations of illnesses, in naturalistic settings, as lived and experienced by patients. The ubiquity of smartphones makes this type of moment-by-moment quantification of disease phenotypes highly scalable and, when integrated within a transparent research platform, presents tremendous opportunities for research, discovery, and patient health.
PMID:27150677
NASA Astrophysics Data System (ADS)
Chen, Zhangqi; Liu, Zi-Kui; Zhao, Ji-Cheng
2018-05-01
Diffusion coefficients of seven binary systems (Ti-Mo, Ti-Nb, Ti-Ta, Ti-Zr, Zr-Mo, Zr-Nb, and Zr-Ta) at 1200 °C, 1000 °C, and 800 °C were experimentally determined using three Ti-Mo-Nb-Ta-Zr diffusion multiples. Electron probe microanalysis (EPMA) was performed to collect concentration profiles at the binary diffusion regions. Forward simulation analysis (FSA) was then applied to extract both impurity and interdiffusion coefficients in Ti-rich and Zr-rich part of the bcc phase. Excellent agreements between our results and most of the literature data validate the high-throughput approach combining FSA with diffusion multiples to obtain a large amount of systematic diffusion data, which will help establish the diffusion (mobility) databases for the design and development of biomedical and structural Ti alloys.
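The abstract does not include the FSA code. As a simplified illustration of how an interdiffusion coefficient can be extracted from a measured concentration profile, the sketch below fits the classical constant-D error-function solution for a semi-infinite diffusion couple, c(x, t) = (cL + cR)/2 + (cR − cL)/2 · erf(x / (2√(Dt))), to synthetic "EPMA" data; all numbers are invented and this is not the authors' forward simulation analysis.

```python
import math

def profile(x, t, D, cL, cR):
    """Constant-D diffusion-couple concentration profile (erf solution)."""
    return (cL + cR) / 2 + (cR - cL) / 2 * math.erf(x / (2 * math.sqrt(D * t)))

def fit_D(xs, cs, t, cL, cR, candidates):
    """Grid-search D minimizing squared error against measured points."""
    def sse(D):
        return sum((profile(x, t, D, cL, cR) - c) ** 2 for x, c in zip(xs, cs))
    return min(candidates, key=sse)

# Synthetic profile generated with D = 1e-15 m^2/s after a 100 h anneal:
t = 100 * 3600.0                                  # seconds
xs = [i * 1e-6 for i in range(-20, 21)]           # -20..+20 micrometres
cs = [profile(x, t, 1e-15, 0.0, 1.0) for x in xs]
D_hat = fit_D(xs, cs, t, 0.0, 1.0, [d * 1e-16 for d in range(1, 51)])
print(D_hat)  # recovers ~1e-15
```

Real analyses (including FSA) handle composition-dependent D and impurity diffusion, which this constant-D sketch deliberately ignores.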
A review on computational systems biology of pathogen–host interactions
Durmuş, Saliha; Çakır, Tunahan; Özgür, Arzucan; Guthke, Reinhard
2015-01-01
Pathogens manipulate the cellular mechanisms of host organisms via pathogen–host interactions (PHIs) in order to take advantage of the capabilities of host cells, leading to infections. The crucial role of these interspecies molecular interactions in initiating and sustaining infections necessitates a thorough understanding of the corresponding mechanisms. Unlike the traditional approach of considering the host or pathogen separately, a systems-level approach, considering the PHI system as a whole is indispensable to elucidate the mechanisms of infection. Following the technological advances in the post-genomic era, PHI data have been produced in large-scale within the last decade. Systems biology-based methods for the inference and analysis of PHI regulatory, metabolic, and protein–protein networks to shed light on infection mechanisms are gaining increasing demand thanks to the availability of omics data. The knowledge derived from the PHIs may largely contribute to the identification of new and more efficient therapeutics to prevent or cure infections. There are recent efforts for the detailed documentation of these experimentally verified PHI data through Web-based databases. Despite these advances in data archiving, there are still large amounts of PHI data in the biomedical literature yet to be discovered, and novel text mining methods are in development to unearth such hidden data. Here, we review a collection of recent studies on computational systems biology of PHIs with a special focus on the methods for the inference and analysis of PHI networks, covering also the Web-based databases and text-mining efforts to unravel the data hidden in the literature. PMID:25914674
Understanding PubMed user search behavior through log analysis.
Islamaj Dogan, Rezarta; Murray, G Craig; Névéol, Aurélie; Lu, Zhiyong
2009-01-01
This article reports on a detailed investigation of PubMed users' needs and behavior as a step toward improving biomedical information retrieval. PubMed provides researchers with free access to more than 19 million citations for biomedical articles from MEDLINE and life science journals. It is accessed by millions of users each day. Efficient search tools are crucial for biomedical researchers to keep abreast of the biomedical literature relating to their own research. This study provides insight into PubMed users' needs and their behavior. This investigation was conducted through the analysis of one month of log data, consisting of more than 23 million user sessions and more than 58 million user queries. Multiple aspects of users' interactions with PubMed are characterized in detail with evidence from these logs. Despite having many features in common with general Web searches, biomedical information searches have unique characteristics that are made evident in this study. PubMed users are more persistent in seeking information and they reformulate queries often. The three most frequent types of search are search by author name, search by gene/protein, and search by disease. Use of abbreviations in queries is very frequent. Factors such as result set size influence users' decisions. Analysis of characteristics such as these plays a critical role in identifying users' information needs and their search habits. In turn, such an analysis also provides useful insight for improving biomedical information retrieval. Database URL: http://www.ncbi.nlm.nih.gov/PubMed.
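As an illustration of the kind of query-log analysis described above, the sketch below classifies queries into the three most frequent search types and counts queries per session to surface reformulation. The tiny log and the lookup sets are invented for illustration; they are not NLM data, and the real study used far richer classification.

```python
from collections import Counter

def classify(query, authors, genes, diseases):
    """Toy query classifier mirroring the three frequent search types."""
    q = query.lower()
    if q in authors:
        return "author"
    if q in genes:
        return "gene/protein"
    if q in diseases:
        return "disease"
    return "other"

log = [  # (session_id, query) -- hypothetical log lines
    ("s1", "smith j"), ("s1", "brca1"), ("s1", "brca1 mutation"),
    ("s2", "diabetes"), ("s3", "tp53"),
]
authors, genes, diseases = {"smith j"}, {"brca1", "tp53"}, {"diabetes"}

types = Counter(classify(q, authors, genes, diseases) for _, q in log)
queries_per_session = Counter(sid for sid, _ in log)
print(types.most_common())
print(max(queries_per_session.values()))  # session s1 reformulated: 3 queries
```

Even this toy version shows the two signals the study relies on: the distribution of query types across a log, and per-session query counts as a proxy for persistence and reformulation.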
Biomedical engineering education--status and perspectives.
Magjarevic, Ratko; Zequera Diaz, Martha L
2014-01-01
Biomedical Engineering programs are present at a large number of universities all over the world, with an increasing trend. New generations of biomedical engineers have to face the challenges of health care systems around the world, which need a large number of professionals not only to support the present technology in the health care system but also to develop new devices and services. Health care stakeholders would like to have innovative solutions directed towards solving the problems of the growing worldwide incidence of chronic disease and an ageing population. These new solutions have to meet the requirements for continuous monitoring, support, or care outside clinical settings. These needs can be tracked through U.S. labor data showing that biomedical engineering jobs have the largest growth in the engineering labor market, with an expected 72% growth rate in the period from 2008 to 2018. In the European Union, the number of patents (i.e., innovation) is the highest in the category of biomedical technology. Biomedical engineering curricula have to adapt to these new needs and to the expectations of the future. In this paper we give an overview of engineering professions related to engineering in medicine and biology and the current status of BME education in some regions, as a base for further discussion.
Shao, Huikai; Zhao, Lingguo; Chen, Fuchao; Zeng, Shengbo; Liu, Shengquan; Li, Jiajia
2015-11-29
BACKGROUND In the past decades, a large number of randomized controlled trials (RCTs) on the efficacy of ligustrazine injection combined with conventional antianginal drugs for angina pectoris have been reported. However, these RCTs have not been evaluated in accordance with PRISMA systematic review standards. The aim of this study was to evaluate the efficacy of ligustrazine injection as adjunctive therapy for angina pectoris. MATERIAL AND METHODS The databases PubMed, Medline, Cochrane Library, Embase, Sino-Med, Wanfang Databases, Chinese Scientific Journal Database, Google Scholar, Chinese Biomedical Literature Database, China National Knowledge Infrastructure, and the Chinese Science Citation Database were searched for published RCTs. Meta-analysis was performed on the primary outcome measures, including the improvements of electrocardiography (ECG) and the reductions in angina symptoms. Sensitivity and subgroup analysis based on the M score (the refined Jadad scores) were also used to evaluate the effect of quality, sample size, and publication year of the included RCTs on the overall effect of ligustrazine injection. RESULTS Eleven RCTs involving 870 patients with angina pectoris were selected in this study. Compared with conventional antianginal drugs alone, ligustrazine injection combined with antianginal drugs significantly increased the efficacy in symptom improvement (odds ratio [OR], 3.59; 95% confidence interval [CI]: 2.39 to 5.40) and in ECG improvement (OR, 3.42; 95% CI: 2.33 to 5.01). Sensitivity and subgroup analysis also confirmed that ligustrazine injection had better effect in the treatment of angina pectoris as adjunctive therapy. CONCLUSIONS The 11 eligible RCTs indicated that ligustrazine injection as adjunctive therapy was more effective than antianginal drugs alone. 
However, due to the low quality of the included RCTs, more rigorously designed RCTs are still needed to verify the effects of ligustrazine injection as adjunctive therapy for angina pectoris.
PMID:26615387
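The pooled odds ratios reported above come from a meta-analysis; a minimal sketch of fixed-effect (inverse-variance) pooling on the log-OR scale is shown below. The 2x2 tables are invented for illustration; they are not the trial data from the review.

```python
import math

def log_or(a, b, c, d):
    """Log odds ratio and its standard error from a 2x2 table
    (a=events/treatment, b=non-events/treatment,
     c=events/control,  d=non-events/control)."""
    return math.log((a * d) / (b * c)), math.sqrt(1/a + 1/b + 1/c + 1/d)

def pooled_or(tables):
    """Inverse-variance weighted pooled OR with a 95% confidence interval."""
    num = den = 0.0
    for a, b, c, d in tables:
        lo, se = log_or(a, b, c, d)
        w = 1 / se**2          # weight = inverse variance of log OR
        num += w * lo
        den += w
    mean, se = num / den, math.sqrt(1 / den)
    return (math.exp(mean),
            math.exp(mean - 1.96 * se),
            math.exp(mean + 1.96 * se))

# Two hypothetical trials:
or_, lo95, hi95 = pooled_or([(30, 10, 20, 20), (28, 12, 18, 22)])
print(round(or_, 2), round(lo95, 2), round(hi95, 2))
```

An OR whose 95% CI excludes 1 (as with the review's OR of 3.59, CI 2.39 to 5.40) indicates a statistically significant effect under this model; random-effects pooling, which the review may also have used, adds a between-study variance term.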
Dialynas, Emmanuel; Topalis, Pantelis; Vontas, John; Louis, Christos
2009-01-01
Background Monitoring of insect vector populations with respect to their susceptibility to one or more insecticides is a crucial element of the strategies used for the control of arthropod-borne diseases. This management task can nowadays be achieved more efficiently when assisted by IT (Information Technology) tools, ranging from modern integrated databases to GIS (Geographic Information Systems). Here we describe an application ontology that we developed de novo, and a specially designed database that, based on this ontology, can be used for the purpose of controlling mosquitoes and, thus, the diseases that they transmit. Methodology/Principal Findings The ontology, named MIRO for Mosquito Insecticide Resistance Ontology, developed using the OBO-Edit software, describes all pertinent aspects of insecticide resistance, including specific methodology and mode of action. MIRO then forms the basis for the design and development of a dedicated database, IRbase, constructed using open source software, which can be used to retrieve data on mosquito populations in a temporally and spatially separate way, as well as to map the output using a Google Earth interface. The dependency of the database on the MIRO allows for a rational and efficient hierarchical search capability. Conclusions/Significance The fact that the MIRO complies with the rules set forward by the OBO (Open Biomedical Ontologies) Foundry introduces cross-referencing with other biomedical ontologies and, thus, both MIRO and IRbase are suitable as parts of future comprehensive surveillance tools and decision support systems that will be used for the control of vector-borne diseases. MIRO is downloadable from, and IRbase is accessible at, VectorBase, the NIAID-sponsored open access database for arthropod vectors of disease. PMID:19547750
PASSIM--an open source software system for managing information in biomedical studies.
Viksna, Juris; Celms, Edgars; Opmanis, Martins; Podnieks, Karlis; Rucevskis, Peteris; Zarins, Andris; Barrett, Amy; Neogi, Sudeshna Guha; Krestyaninova, Maria; McCarthy, Mark I; Brazma, Alvis; Sarkans, Ugis
2007-02-09
One of the crucial aspects of day-to-day laboratory information management is the collection, storage and retrieval of information about research subjects and biomedical samples. An efficient link between sample data and experiment results is absolutely imperative for a successful outcome of a biomedical study. Currently available software solutions are largely limited to large-scale, expensive commercial Laboratory Information Management Systems (LIMS). Acquiring such a LIMS can indeed bring laboratory information management to a higher level, but often implies a substantial investment of time, effort and funds, which are not always available. There is a clear need for lightweight open source systems for patient and sample information management. We present a web-based tool for submission, management and retrieval of sample and research subject data. The system secures confidentiality by separating anonymized sample information from individuals' records. It is simple and generic, and can be customised for various biomedical studies. Information can be both entered and accessed using the same web interface. User groups and their privileges can be defined. The system is open-source and is supplied with an on-line tutorial and necessary documentation. It has proven successful in a large international collaborative project. The presented system closes the gap between the need for and the availability of lightweight software solutions for managing information in biomedical studies involving human research subjects.
Lowe, H. J.
1993-01-01
This paper describes Image Engine, an object-oriented, microcomputer-based, multimedia database designed to facilitate the storage and retrieval of digitized biomedical still images, video, and text using inexpensive desktop computers. The current prototype runs on Apple Macintosh computers and allows network database access via peer to peer file sharing protocols. Image Engine supports both free text and controlled vocabulary indexing of multimedia objects. The latter is implemented using the TView thesaurus model developed by the author. The current prototype of Image Engine uses the National Library of Medicine's Medical Subject Headings (MeSH) vocabulary (with UMLS Meta-1 extensions) as its indexing thesaurus. PMID:8130596
Chinese journals: a guide for epidemiologists
Fung, Isaac CH
2008-01-01
Chinese journals in epidemiology, preventive medicine and public health contain much that is of potential international interest. However, few non-Chinese speakers are acquainted with this literature. This article therefore provides an overview of the contemporary scene in Chinese biomedical journal publication, Chinese bibliographic databases and Chinese journals in epidemiology, preventive medicine and public health. The challenge of switching to English as the medium of publication, the development of publishing bibliometric data from Chinese databases, the prospect of an Open Access publication model in China, the issue of language bias in literature reviews and the quality of Chinese journals are discussed. Epidemiologists are encouraged to search the Chinese bibliographic databases for Chinese journal articles. PMID:18826604
Roles and applications of biomedical ontologies in experimental animal science.
Masuya, Hiroshi
2012-01-01
A huge amount of experimental data from past studies has played a vital role in the development of new knowledge and technologies in biomedical science. The importance of computational technologies for the reuse of data, data integration, and knowledge discoveries has also increased, providing means of processing large amounts of data. In recent years, information technologies related to "ontologies" have played more significant roles in the standardization, integration, and knowledge representation of biomedical information. This review paper outlines the history of data integration in biomedical science and its recent trends in relation to the field of experimental animal science.
Payakachat, Nalin; Tilford, J Mick; Ungar, Wendy J
2016-02-01
The National Database for Autism Research (NDAR) is a US National Institutes of Health (NIH)-funded research data repository created by integrating heterogeneous datasets through data sharing agreements between autism researchers and the NIH. To date, NDAR is considered the largest neuroscience and genomic data repository for autism research. In addition to biomedical data, NDAR contains a large collection of clinical and behavioral assessments and health outcomes from novel interventions. Importantly, NDAR has a global unique patient identifier that can be linked to aggregated individual-level data for hypothesis generation and testing, and for replicating research findings. As such, NDAR promotes collaboration and maximizes public investment in the original data collection. As screening and diagnostic technologies as well as interventions for children with autism are expensive, health services research (HSR) and health technology assessment (HTA) are needed to generate more evidence to facilitate implementation when warranted. This article describes NDAR and explains its value to health services researchers and decision scientists interested in autism and other mental health conditions. We provide a description of the scope and structure of NDAR and illustrate how data are likely to grow over time and become available for HSR and HTA.
Bruce, Rachel; Chauvin, Anthony; Trinquart, Ludovic; Ravaud, Philippe; Boutron, Isabelle
2016-06-10
The peer review process is a cornerstone of biomedical research. We aimed to evaluate the impact of interventions to improve the quality of peer review for biomedical publications. We performed a systematic review and meta-analysis. We searched CENTRAL, MEDLINE (PubMed), Embase, Cochrane Database of Systematic Reviews, and WHO ICTRP databases, for all randomized controlled trials (RCTs) evaluating the impact of interventions to improve the quality of peer review for biomedical publications. We selected 22 reports of randomized controlled trials, for 25 comparisons evaluating training interventions (n = 5), the addition of a statistical peer reviewer (n = 2), use of a checklist (n = 2), open peer review (i.e., peer reviewers informed that their identity would be revealed; n = 7), blinded peer review (i.e., peer reviewers blinded to author names and affiliation; n = 6) and other interventions to increase the speed of the peer review process (n = 3). Results from only seven RCTs have been published since 2004. As compared with the standard peer review process, training did not improve the quality of the peer review report, and use of a checklist did not improve the quality of the final manuscript. Adding a statistical peer reviewer improved the quality of the final manuscript (standardized mean difference (SMD), 0.58; 95 % CI, 0.19 to 0.98). Open peer review improved the quality of the peer review report (SMD, 0.14; 95 % CI, 0.05 to 0.24), did not affect the time peer reviewers spent on the peer review (mean difference, 0.18; 95 % CI, -0.06 to 0.43), and decreased the rate of rejection (odds ratio, 0.56; 95 % CI, 0.33 to 0.94). Blinded peer review did not affect the quality of the peer review report or the rejection rate. Interventions to increase the speed of the peer review process were too heterogeneous to allow for pooling the results. Despite the essential role of peer review, only a few interventions have been assessed in randomized controlled trials.
Evidence-based peer review needs to be developed in biomedical journals.
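The pooled estimates in this record are inverse-variance-weighted standardized mean differences (SMDs) with 95% confidence intervals. As a minimal sketch of how such a pooled SMD is computed, the fixed-effect version can be written as follows; the per-trial numbers are illustrative, not the review's data:

```python
import math

def pool_fixed_effect(smds, variances):
    """Fixed-effect inverse-variance pooling of standardized mean
    differences (SMDs); returns the pooled SMD and its 95% CI."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * d for w, d in zip(weights, smds)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))  # standard error of the pooled SMD
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

# Illustrative per-trial SMDs and variances (not the review's data)
smds = [0.4, 0.7, 0.6]
variances = [0.04, 0.09, 0.05]
pooled, (lo, hi) = pool_fixed_effect(smds, variances)
print(f"pooled SMD = {pooled:.2f}, 95% CI ({lo:.2f} to {hi:.2f})")
```

A random-effects model, often preferred when trials are heterogeneous, would add a between-trial variance component to each weight before pooling.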
ARIANE: integration of information databases within a hospital intranet.
Joubert, M; Aymard, S; Fieschi, D; Volot, F; Staccini, P; Robert, J J; Fieschi, M
1998-05-01
Large information systems handle massive volumes of data stored in heterogeneous sources. Each server has its own model of representation of concepts with regard to its aims. One of the main problems end-users encounter when accessing different servers is to match their own viewpoint on biomedical concepts with the various representations made in the database servers. The aim of the project ARIANE is to provide end-users with easy-to-use and natural means to access and query heterogeneous information databases. The objectives of this research work are to build a conceptual interface by means of Internet technology inside an enterprise intranet and to propose a method to realize it. This method is based on the knowledge sources provided by the Unified Medical Language System (UMLS) project of the US National Library of Medicine. Experiments concern queries to three different information servers: PubMed, a Medline server of the NLM; Thériaque, a French database on drugs implemented in the hospital intranet; and a Web site dedicated to Internet resources in gastroenterology and nutrition, located at the Faculty of Medicine of Nice (France). Access to each of these servers differs according to the kind of information delivered and the technology used to query it. With the health care professional's workstation in mind, the authors introduced quality criteria into the ARIANE project in order to build, in a homogeneous and efficient way, a query system able to be integrated into existing information systems and to integrate existing and new information sources.
A Cross-Lingual Similarity Measure for Detecting Biomedical Term Translations
Bollegala, Danushka; Kontonatsios, Georgios; Ananiadou, Sophia
2015-01-01
Bilingual dictionaries for technical terms such as biomedical terms are an important resource for machine translation systems as well as for humans who would like to understand a concept described in a foreign language. Often a biomedical term is first proposed in English and later manually translated into other languages. Despite the fact that there are large monolingual lexicons of biomedical terms, only a fraction of those term lexicons are translated to other languages. Manually compiling large-scale bilingual dictionaries for technical domains is a challenging task because it is difficult to find a sufficiently large number of bilingual experts. We propose a cross-lingual similarity measure for detecting, for a biomedical term specified in one (source) language, the most similar translation candidates in another (target) language. Specifically, a biomedical term in a language is represented using two types of features: (a) intrinsic features that consist of character n-grams extracted from the term under consideration, and (b) extrinsic features that consist of unigrams and bigrams extracted from the contextual windows surrounding the term under consideration. We propose a cross-lingual similarity measure using each of those feature types. First, to reduce the dimensionality of the feature space in each language, we propose prototype vector projection (PVP)—a non-negative lower-dimensional vector projection method. Second, we propose a method to learn a mapping between the feature spaces in the source and target language using partial least squares regression (PLSR). The proposed method requires only a small number of training instances to learn a cross-lingual similarity measure. The proposed PVP method outperforms popular dimensionality reduction methods such as the singular value decomposition (SVD) and non-negative matrix factorization (NMF) in a nearest neighbor prediction task.
Moreover, our experimental results covering several language pairs such as English–French, English–Spanish, English–Greek, and English–Japanese show that the proposed method outperforms several other feature projection methods in biomedical term translation prediction tasks. PMID:26030738
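As a rough, self-contained illustration of the intrinsic-feature half of this pipeline, the sketch below builds character n-gram vectors and learns a linear source-to-target mapping. It substitutes ordinary least squares for the authors' PLSR, omits the PVP projection and the extrinsic contextual features, and uses a tiny hypothetical training set:

```python
import numpy as np

def char_ngrams(term, n=3):
    """Intrinsic features: character n-grams of a term, with boundary padding."""
    s = f"#{term}#"
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def term_vec(term, vocab):
    """Count vector of a term's character n-grams over a fixed vocabulary."""
    v = np.zeros(len(vocab))
    for g in char_ngrams(term):
        if g in vocab:
            v[vocab[g]] += 1
    return v

# Hypothetical bilingual training pairs (illustration only)
pairs = [("gene", "gen"), ("protein", "proteina"), ("virus", "virus")]
src_terms = [s for s, _ in pairs]
tgt_terms = [t for _, t in pairs]
src_vocab = {g: i for i, g in enumerate(sorted({g for t in src_terms for g in char_ngrams(t)}))}
tgt_vocab = {g: i for i, g in enumerate(sorted({g for t in tgt_terms for g in char_ngrams(t)}))}
Xs = np.stack([term_vec(t, src_vocab) for t in src_terms])
Xt = np.stack([term_vec(t, tgt_vocab) for t in tgt_terms])

# Learn a linear map from the source to the target feature space.
# Ordinary least squares stands in for the paper's PLSR here.
W, *_ = np.linalg.lstsq(Xs, Xt, rcond=None)

def similarity(src_term, tgt_term):
    """Cosine similarity between a mapped source term and a target term."""
    a = term_vec(src_term, src_vocab) @ W
    b = term_vec(tgt_term, tgt_vocab)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0
```

With this toy data, `similarity("gene", "gen")` scores far higher than `similarity("gene", "proteina")`, mirroring the nearest-neighbor prediction task described above.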
Impact of Overt and Subclinical Hypothyroidism on Exercise Tolerance: A Systematic Review
ERIC Educational Resources Information Center
Lankhaar, Jeannette A. C.; de Vries, Wouter R.; Jansen, Jaap A. C. G.; Zelissen, Pierre M. J.; Backx, Frank J. G.
2014-01-01
Purpose: This systematic review describes the state of the art of the impact of hypothyroidism on exercise tolerance and physical performance capacity in untreated and treated patients with hypothyroidism. Method: A systematic computer-aided search was conducted using biomedical databases. Relevant studies in English, German, and Dutch, published…
The Reorganization of Basic Science Departments in U.S. Medical Schools, 1980-1999.
ERIC Educational Resources Information Center
Mallon, William T.; Biebuyck, Julien F.; Jones, Robert F.
2003-01-01
Constructed a longitudinal database to examine how basic science departments have been reorganized at U.S. medical schools. Found that there were fewer basic science departments in the traditional disciplines of anatomy, biochemistry, microbiology, pharmacology, and physiology in 1999 than in 1980. But as biomedical science has developed in an…
Director of Duke Institute Wants To Make Medicine More of a Science.
ERIC Educational Resources Information Center
Wheeler, David L.
1998-01-01
The director of the Duke Clinical Research Institute (North Carolina) is committed to making better use of patient information for medical research, and is building a database from the institute's clinical trials. His approach is to provide biomedical researchers with daily involvement in medicine rather than management. (MSE)
Experiences in supporting the structured collection of cancer nanotechnology data using caNanoLab
Gaheen, Sharon; Lijowski, Michal; Heiskanen, Mervi; Klemm, Juli
2015-01-01
Summary The cancer Nanotechnology Laboratory (caNanoLab) data portal is an online nanomaterial database that allows users to submit and retrieve information on well-characterized nanomaterials, including composition, in vitro and in vivo experimental characterizations, experimental protocols, and related publications. Initiated in 2006, caNanoLab serves as an established resource with an infrastructure supporting the structured collection of nanotechnology data to address the needs of the cancer biomedical and nanotechnology communities. The portal contains over 1,000 curated nanomaterial data records that are publicly accessible for review, comparison, and re-use, with the ultimate goal of accelerating the translation of nanotechnology-based cancer therapeutics, diagnostics, and imaging agents to the clinic. In this paper, we will discuss challenges associated with developing a nanomaterial database and recognized needs for nanotechnology data curation and sharing in the biomedical research community. We will also describe the latest version of caNanoLab, caNanoLab 2.0, which includes enhancements and new features to improve usability such as personalized views of data and enhanced search and navigation. PMID:26425409
CELLPEDIA: a repository for human cell information for cell studies and differentiation analyses.
Hatano, Akiko; Chiba, Hirokazu; Moesa, Harry Amri; Taniguchi, Takeaki; Nagaie, Satoshi; Yamanegi, Koji; Takai-Igarashi, Takako; Tanaka, Hiroshi; Fujibuchi, Wataru
2011-01-01
CELLPEDIA is a repository database for current knowledge about human cells. It contains various types of information, such as cell morphologies, gene expression and literature references. The major role of CELLPEDIA is to provide a digital dictionary of human cells for the biomedical field, including support for the characterization of artificially generated cells in regenerative medicine. CELLPEDIA features (i) its own cell classification scheme, in which whole human cells are classified by their physical locations in addition to conventional taxonomy; and (ii) cell differentiation pathways compiled from biomedical textbooks and journal papers. Currently, human differentiated cells and stem cells are classified into 2260 and 66 cell taxonomy keys, respectively, from which 934 parent-child relationships reported in cell differentiation or transdifferentiation pathways are retrievable. As far as we know, this is the first attempt to develop a digital cell bank to function as a public resource for the accumulation of current knowledge about human cells. The CELLPEDIA homepage is freely accessible except for the data submission pages that require authentication (please send a password request to cell-info@cbrc.jp). Database URL: http://cellpedia.cbrc.jp/
Alkemio: association of chemicals with biomedical topics by text and data mining.
Gijón-Correas, José A; Andrade-Navarro, Miguel A; Fontaine, Jean F
2014-07-01
The PubMed® database of biomedical citations allows the retrieval of scientific articles studying the function of chemicals in biology and medicine. Mining millions of available citations to search reported associations between chemicals and topics of interest would require substantial human time. We have implemented the Alkemio text mining web tool and SOAP web service to help in this task. The tool uses biomedical articles discussing chemicals (including drugs), predicts their relatedness to the query topic with a naïve Bayesian classifier and ranks all chemicals by P-values computed from random simulations. Benchmarks on seven human pathways showed good retrieval performance (areas under the receiver operating characteristic curves ranged from 73.6 to 94.5%). Comparison with existing tools to retrieve chemicals associated with eight diseases showed the higher precision and recall of Alkemio when considering the top 10 candidate chemicals. Alkemio is a high-performing web tool that ranks chemicals for any biomedical topic, and it is free to non-commercial users. http://cbdm.mdc-berlin.de/∼medlineranker/cms/alkemio. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
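The ranking step this record describes, P-values computed from random simulations, can be sketched as an empirical P-value. The background scores below are illustrative stand-ins for the classifier outputs the tool would produce for unrelated chemicals:

```python
import random

def empirical_p_value(score, background_scores, n_sim=10000, seed=0):
    """Empirical P-value by random simulation: the fraction of scores
    drawn at random from a background distribution that reach or exceed
    the observed score. A simplified stand-in for Alkemio's ranking
    step; the real tool first scores each chemical's citations with a
    naive Bayes classifier."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_sim)
               if rng.choice(background_scores) >= score)
    return (hits + 1) / (n_sim + 1)  # add-one keeps the estimate above zero

# Illustrative background scores for unrelated chemicals
bg_rng = random.Random(42)
background = [bg_rng.gauss(0.0, 1.0) for _ in range(1000)]
print(empirical_p_value(2.5, background))  # an extreme score yields a small P
```

Chemicals can then be ranked by ascending P-value, as in the tool's output.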
Kwon, Yoojin; Powelson, Susan E; Wong, Holly; Ghali, William A; Conly, John M
2014-11-11
The purpose of our study is to determine the value and efficacy of searching biomedical databases beyond MEDLINE for systematic reviews. We analyzed the results from a systematic review conducted by the authors and others on ward closure as an infection control practice. Ovid MEDLINE including In-Process & Other Non-Indexed Citations, Ovid Embase, CINAHL Plus, LILACS, and IndMED were systematically searched for articles of any study type discussing ward closure, as were bibliographies of selected articles and recent infection control conference abstracts. Search results were tracked, recorded, and analyzed using a relative recall method. The sensitivity of searching in each database was calculated. Two thousand ninety-five unique citations were identified and screened for inclusion in the systematic review: 2,060 from database searching and 35 from hand searching and other sources. Ninety-seven citations were included in the final review. The MEDLINE and Embase searches each retrieved 80 of the 97 included articles; only 4 articles from each database were unique. The CINAHL search retrieved 35 included articles, of which 4 were unique. The IndMED and LILACS searches did not retrieve any included articles, although 75 of the included articles were indexed in LILACS. The true value of using regional databases, particularly LILACS, may lie with the ability to search in the language spoken in the region. Eight articles were found only through hand searching. Identifying studies for a systematic review where the research is observational is complex. The value each individual study contributes to the review cannot be accurately measured. Consequently, we could not accurately determine the value of results found from searching beyond MEDLINE, Embase, and CINAHL. However, hand searching for serendipitous retrieval remains an important aspect due to indexing and keyword challenges inherent in this literature.
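The per-database sensitivities in this record follow from the relative recall method: the number of included articles a single database search retrieved, divided by all articles included in the review. Using the figures from the study:

```python
def search_sensitivity(retrieved_included, total_included):
    """Relative recall of a single database search: the share of the
    review's included studies that the search retrieved."""
    return retrieved_included / total_included

# Figures from the study: 97 articles were included in the final review
for db, hits in [("MEDLINE", 80), ("Embase", 80), ("CINAHL", 35)]:
    print(f"{db}: {search_sensitivity(hits, 97):.1%}")  # 82.5%, 82.5%, 36.1%
```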
Ji, Yanqing; Ying, Hao; Tran, John; Dews, Peter; Massanari, R Michael
2016-07-19
Finding highly relevant articles from biomedical databases is challenging not only because it is often difficult to accurately express a user's underlying intention through keywords but also because a keyword-based query normally returns a long list of hits with many citations being unwanted by the user. This paper proposes a novel biomedical literature search system, called BiomedSearch, which supports complex queries and relevance feedback. The system employed association mining techniques to build a k-profile representing a user's relevance feedback. More specifically, we developed a weighted interest measure and an association mining algorithm to find the strength of association between a query and each concept in the article(s) selected by the user as feedback. The top concepts were utilized to form a k-profile used for the next-round search. BiomedSearch relies on Unified Medical Language System (UMLS) knowledge sources to map text files to standard biomedical concepts. It was designed to support queries with any level of complexity. A prototype of BiomedSearch was developed and preliminarily evaluated using the Genomics data from the TREC (Text Retrieval Conference) 2006 Genomics Track. Initial experimental results indicated that BiomedSearch increased the mean average precision (MAP) for a set of queries. With UMLS and association mining techniques, BiomedSearch can effectively utilize users' relevance feedback to improve the performance of biomedical literature search.
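The k-profile construction this record describes can be sketched as follows. The scoring function is a simple tf-idf-style stand-in for the paper's weighted interest measure, and the concept sets and corpus frequencies are hypothetical:

```python
from collections import Counter

def build_k_profile(feedback_docs, corpus_doc_freq, n_corpus_docs, k=3):
    """Sketch of relevance-feedback profiling: score each concept in the
    user-selected articles by its feedback frequency weighted against
    its rarity in the corpus (a stand-in for the paper's weighted
    interest measure), and keep the top-k concepts."""
    tf = Counter(c for doc in feedback_docs for c in doc)
    def score(concept):
        idf = n_corpus_docs / (1 + corpus_doc_freq.get(concept, 0))
        return tf[concept] * idf
    return sorted(tf, key=score, reverse=True)[:k]

# Hypothetical UMLS-style concepts found in two feedback articles
feedback = [{"gene expression", "p53", "apoptosis"},
            {"p53", "apoptosis", "cell cycle"}]
df = {"gene expression": 500, "p53": 40, "apoptosis": 60, "cell cycle": 200}
print(build_k_profile(feedback, df, n_corpus_docs=1000, k=2))
```

The resulting profile would then be appended to the query for the next-round search.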
Research in Biomaterials and Tissue Engineering: Achievements and perspectives.
Ventre, Maurizio; Causa, Filippo; Netti, Paolo A; Pietrabissa, Riccardo
2015-01-01
Research on biomaterials and related subjects has been active in Italy. Starting from the very first examples of biomaterials and biomedical devices, Italian researchers have always provided valuable scientific contributions. This trend has steadily increased. To provide a rough estimate of this, it is sufficient to search PubMed, a free search engine accessing primarily the MEDLINE database of references and abstracts on life sciences and biomedical topics, with the keywords "biomaterials" or "tissue engineering" and sort the results by affiliation. Even though this is a crude estimate, the results speak for themselves: Italy ranks third among European countries in terms of publications, with some 3,700 in the last decade.
A pilot biomedical engineering course in rapid prototyping for mobile health.
Stokes, Todd H; Venugopalan, Janani; Hubbard, Elena N; Wang, May D
2013-01-01
Rapid prototyping of medically assistive mobile devices promises to fuel innovation and provides opportunity for hands-on engineering training in biomedical engineering curricula. This paper presents the design and outcomes of a course offered during a 16-week semester in Fall 2011 with 11 students enrolled. The syllabus covered a mobile health design process from end-to-end, including storyboarding, non-functional prototypes, integrated circuit programming, 3D modeling, 3D printing, cloud computing database programming, and developing patient engagement through animated videos describing the benefits of a new device. Most technologies presented in this class are open source and thus provide unlimited "hackability". They are also cost-effective and easily transferrable to other departments.
De Sanctis, A; Russo, S; Craciun, M F; Alexeev, A; Barnes, M D; Nagareddy, V K; Wright, C D
2018-06-06
Graphene-based materials are being widely explored for a range of biomedical applications, from targeted drug delivery to biosensing, bioimaging and use for antibacterial treatments, to name but a few. In many such applications, it is not graphene itself that is used as the active agent, but one of its chemically functionalized forms. The type of chemical species used for functionalization will play a key role in determining the utility of any graphene-based device in any particular biomedical application, because this determines to a large part its physical, chemical, electrical and optical interactions. However, other factors will also be important in determining the eventual uptake of graphene-based biomedical technologies, in particular the ease and cost of manufacture of proposed device and system designs. In this work, we describe three novel routes for the chemical functionalization of graphene using oxygen, iron chloride and fluorine. We also introduce novel in situ methods for controlling and patterning such functionalization on the micro- and nanoscales. Our approaches are readily transferable to large-scale manufacturing, potentially paving the way for the eventual cost-effective production of functionalized graphene-based materials, devices and systems for a range of important biomedical applications.
The center for causal discovery of biomedical knowledge from big data
Bahar, Ivet; Becich, Michael J; Benos, Panayiotis V; Berg, Jeremy; Espino, Jeremy U; Glymour, Clark; Jacobson, Rebecca Crowley; Kienholz, Michelle; Lee, Adrian V; Lu, Xinghua; Scheines, Richard
2015-01-01
The Big Data to Knowledge (BD2K) Center for Causal Discovery is developing and disseminating an integrated set of open source tools that support causal modeling and discovery of biomedical knowledge from large and complex biomedical datasets. The Center integrates teams of biomedical and data scientists focused on the refinement of existing and the development of new constraint-based and Bayesian algorithms based on causal Bayesian networks, the optimization of software for efficient operation in a supercomputing environment, and the testing of algorithms and software developed using real data from 3 representative driving biomedical projects: cancer driver mutations, lung disease, and the functional connectome of the human brain. Associated training activities provide both biomedical and data scientists with the knowledge and skills needed to apply and extend these tools. Collaborative activities with the BD2K Consortium further advance causal discovery tools and integrate tools and resources developed by other centers. PMID:26138794
Publishing priorities of biomedical research funders
Collins, Ellen
2013-01-01
Objectives To understand the publishing priorities, especially in relation to open access, of 10 UK biomedical research funders. Design Semistructured interviews. Setting 10 UK biomedical research funders. Participants 12 employees with responsibility for research management at 10 UK biomedical research funders; a purposive sample to represent a range of backgrounds and organisation types. Conclusions Publicly funded and large biomedical research funders are committed to open access publishing and are pleased with recent developments which have stimulated growth in this area. Smaller charitable funders are supportive of the aims of open access, but are concerned about the practical implications for their budgets and their funded researchers. Across the board, biomedical research funders are turning their attention to other priorities for sharing research outputs, including data, protocols and negative results. Further work is required to understand how smaller funders, including charitable funders, can support open access. PMID:24154520
Poor replication validity of biomedical association studies reported by newspapers
Smith, Andy; Boraud, Thomas; Gonon, François
2017-01-01
Objective To investigate the replication validity of biomedical association studies covered by newspapers. Methods We used a database of 4723 primary studies included in 306 meta-analysis articles. These studies associated a risk factor with a disease in three biomedical domains: psychiatry, neurology and four somatic diseases. They were classified into a lifestyle category (e.g. smoking) and a non-lifestyle category (e.g. genetic risk). Using the database Dow Jones Factiva, we investigated the newspaper coverage of each study. Their replication validity was assessed using a comparison with their corresponding meta-analyses. Results Among the 5029 articles of our database, 156 primary studies (of which 63 were lifestyle studies) and 5 meta-analysis articles were reported in 1561 newspaper articles. The percentage of covered studies and the number of newspaper articles per study strongly increased with the impact factor of the journal that published each scientific study. Newspapers almost equally covered initial (5/39; 12.8%) and subsequent (58/600; 9.7%) lifestyle studies. In contrast, initial non-lifestyle studies were covered more often (48/366; 13.1%) than subsequent ones (45/3718; 1.2%). Newspapers never covered initial studies reporting null findings and rarely reported subsequent null observations. Only 48.7% of the 156 studies reported by newspapers were confirmed by the corresponding meta-analyses. Initial non-lifestyle studies were less often confirmed (16/48) than subsequent ones (29/45) and than lifestyle studies (31/63). Psychiatric studies covered by newspapers were less often confirmed (10/38) than the neurological (26/41) or somatic (40/77) ones. This correlates with an even larger coverage of initial studies in psychiatry. Whereas 234 newspaper articles covered the 35 initial studies that were later disconfirmed, only four press articles covered a subsequent null finding and mentioned the refutation of an initial claim.
Conclusion Journalists preferentially cover initial findings although they are often contradicted by meta-analyses and rarely inform the public when they are disconfirmed. PMID:28222122
Poor replication validity of biomedical association studies reported by newspapers.
Dumas-Mallet, Estelle; Smith, Andy; Boraud, Thomas; Gonon, François
2017-01-01
To investigate the replication validity of biomedical association studies covered by newspapers. We used a database of 4723 primary studies included in 306 meta-analysis articles. These studies associated a risk factor with a disease in three biomedical domains: psychiatry, neurology and four somatic diseases. They were classified into a lifestyle category (e.g. smoking) and a non-lifestyle category (e.g. genetic risk). Using the database Dow Jones Factiva, we investigated the newspaper coverage of each study. Their replication validity was assessed using a comparison with their corresponding meta-analyses. Among the 5029 articles of our database, 156 primary studies (of which 63 were lifestyle studies) and 5 meta-analysis articles were reported in 1561 newspaper articles. The percentage of covered studies and the number of newspaper articles per study strongly increased with the impact factor of the journal that published each scientific study. Newspapers almost equally covered initial (5/39; 12.8%) and subsequent (58/600; 9.7%) lifestyle studies. In contrast, initial non-lifestyle studies were covered more often (48/366; 13.1%) than subsequent ones (45/3718; 1.2%). Newspapers never covered initial studies reporting null findings and rarely reported subsequent null observations. Only 48.7% of the 156 studies reported by newspapers were confirmed by the corresponding meta-analyses. Initial non-lifestyle studies were less often confirmed (16/48) than subsequent ones (29/45) and than lifestyle studies (31/63). Psychiatric studies covered by newspapers were less often confirmed (10/38) than the neurological (26/41) or somatic (40/77) ones. This correlates with an even larger coverage of initial studies in psychiatry. Whereas 234 newspaper articles covered the 35 initial studies that were later disconfirmed, only four press articles covered a subsequent null finding and mentioned the refutation of an initial claim.
Journalists preferentially cover initial findings although they are often contradicted by meta-analyses and rarely inform the public when they are disconfirmed.
Paperless anesthesia: uses and abuses of these data.
Anderson, Brian J; Merry, Alan F
2015-12-01
Demonstrably accurate records facilitate clinical decision making, improve patient safety, provide better defense against frivolous lawsuits, and enable better medical policy decisions. Anesthesia Information Management Systems (AIMS) have the potential to improve on the accuracy and reliability of handwritten records. Interfaces with electronic recording systems within the hospital or wider community allow correlation of anesthesia-relevant data with biochemistry laboratory results, billing sections, radiological units, pharmacy, earlier patient records, and other systems. Electronic storage of large and accurate datasets has lent itself to quality assurance, enhancement of patient safety, research, cost containment, scheduling, and anesthesia training initiatives, and has even stimulated organizational change. The time for record making may be increased by AIMS, but in some cases has been reduced. The question of impact on vigilance is not entirely settled, but substantial negative effects seem to be unlikely. The usefulness of these large databases depends on the accuracy of their data, which may be incorrect or incomplete. Consequent biases are threats to the validity of research results. Data mining of biomedical databases makes it easier for individuals with political, social, or economic agendas to generate misleading research findings for the purpose of manipulating public opinion and swaying policymakers. There remains a fear that accessibility of data may have undesirable regulatory or legal consequences. Increasing regulation of treatment options during the perioperative period through regulated policies could reduce autonomy for clinicians. These fears are as yet unsubstantiated. © 2015 John Wiley & Sons Ltd.
Liu, Yifeng; Liang, Yongjie; Wishart, David
2015-07-01
PolySearch2 (http://polysearch.ca) is an online text-mining system for identifying relationships between biomedical entities such as human diseases, genes, SNPs, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxonomies and chemical taxonomies. PolySearch2 supports a generalized 'Given X, find all associated Ys' query, where X and Y can be selected from the aforementioned biomedical entities. An example query might be: 'Find all diseases associated with Bisphenol A'. To find its answers, PolySearch2 searches for associations against comprehensive free-text collections, including local versions of MEDLINE abstracts, PubMed Central full-text articles, Wikipedia full-text articles and US Patent application abstracts. PolySearch2 also searches 14 widely used, text-rich biological databases such as UniProt, DrugBank and the Human Metabolome Database to improve its accuracy and coverage. PolySearch2 maintains an extensive thesaurus of biological terms and exploits the latest search engine technology to rapidly retrieve relevant articles and database records. PolySearch2 also generates, ranks and annotates associative candidates and presents results with relevancy statistics and highlighted key sentences to facilitate user interpretation. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Liu, Yifeng; Liang, Yongjie; Wishart, David
2015-01-01
PolySearch2 (http://polysearch.ca) is an online text-mining system for identifying relationships between biomedical entities such as human diseases, genes, SNPs, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxonomies and chemical taxonomies. PolySearch2 supports a generalized ‘Given X, find all associated Ys’ query, where X and Y can be selected from the aforementioned biomedical entities. An example query might be: ‘Find all diseases associated with Bisphenol A’. To find its answers, PolySearch2 searches for associations against comprehensive free-text collections, including local versions of MEDLINE abstracts, PubMed Central full-text articles, Wikipedia full-text articles and US Patent application abstracts. PolySearch2 also searches 14 widely used, text-rich biological databases such as UniProt, DrugBank and the Human Metabolome Database to improve its accuracy and coverage. PolySearch2 maintains an extensive thesaurus of biological terms and exploits the latest search engine technology to rapidly retrieve relevant articles and database records. PolySearch2 also generates, ranks and annotates associative candidates and presents results with relevancy statistics and highlighted key sentences to facilitate user interpretation. PMID:25925572
Wilson, S H; Merkle, S; Brown, D; Moskowitz, J; Hurley, D; Brown, D; Bailey, B J; McClain, M; Misenhimer, M; Buckalew, J; Burks, T
2000-01-01
The National Association of Physicians for the Environment (NAPE) has assumed a leadership role in protecting environmental health in recent years. The Committee of Biomedical Research Leaders was convened at the recent NAPE Leadership Conference: Biomedical Research and the Environment held on 1--2 November 1999, at the National Institutes of Health, Bethesda, Maryland. This report summarizes the discussion of the committee and its recommendations. The charge to the committee was to raise and address issues that will promote and sustain environmental health, safety, and energy efficiency within the biomedical community. Leaders from every important research sector (industry laboratories, academic health centers and institutes, hospitals and care facilities, Federal laboratories, and community-based research facilities) were gathered in this committee to discuss issues relevant to promoting environmental health. The conference and this report focus on the themes of environmental stewardship, sustainable development and "best greening practices." Environmental stewardship, an emerging theme within and outside the biomedical community, symbolizes the effort to provide an integrated, synthesized, and concerted effort to protect the health of the environment in both the present and the future. The primary goal established by the committee is to promote environmentally responsible leadership in the biomedical research community. Key outcomes of the committee's discussion and deliberation were a) the need for a central organization to evaluate, promote, and oversee efforts in environmental stewardship; and b) the immediate need to facilitate efficient information transfer relevant to protecting the global environment through a database/clearinghouse. Means to fulfill these needs are discussed in this report. PMID:11121363
Przydzial, Magdalena J; Bhhatarai, Barun; Koleti, Amar; Vempati, Uma; Schürer, Stephan C
2013-12-15
Novel tools need to be developed to help scientists analyze large amounts of available screening data with the goal to identify entry points for the development of novel chemical probes and drugs. As the largest class of drug targets, G protein-coupled receptors (GPCRs) remain of particular interest and are pursued by numerous academic and industrial research projects. We report the first GPCR ontology to facilitate integration and aggregation of GPCR-targeting drugs and demonstrate its application to classify and analyze a large subset of the PubChem database. The GPCR ontology, based on the previously reported BioAssay Ontology, depicts available pharmacological, biochemical and physiological profiles of GPCRs and their ligands. The novelty of the GPCR ontology lies in the use of diverse experimental datasets linked by a model to formally define these concepts. Using a reasoning system, the GPCR ontology offers potential for knowledge-based classification of individuals (such as small molecules) as a function of the data. The GPCR ontology is available at http://www.bioassayontology.org/bao_gpcr and the National Center for Biomedical Ontology (NCBO) Web site.
Li, Zhao; Li, Jin; Yu, Peng
2018-01-01
Metadata curation has become increasingly important for biological discovery and biomedical research because a large amount of heterogeneous biological data is currently freely available. To facilitate efficient metadata curation, we developed an easy-to-use web-based curation application, GEOMetaCuration, for curating the metadata of Gene Expression Omnibus datasets. It can eliminate mechanical operations that consume precious curation time and can help coordinate curation efforts among multiple curators. It improves the curation process by introducing various features that are critical to metadata curation, such as a back-end curation management system and a curator-friendly front-end. The application is based on a commonly used web development framework of Python/Django and is open-sourced under the GNU General Public License V3. GEOMetaCuration is expected to benefit the biocuration community and to contribute to computational generation of biological insights using large-scale biological data. An example use case can be found at the demo website: http://geometacuration.yubiolab.org. Database URL: https://bitbucket.com/yubiolab/GEOMetaCuration PMID:29688376
Large-Scale Discovery of Disease-Disease and Disease-Gene Associations
Gligorijevic, Djordje; Stojanovic, Jelena; Djuric, Nemanja; Radosavljevic, Vladan; Grbovic, Mihajlo; Kulathinal, Rob J.; Obradovic, Zoran
2016-01-01
Data-driven phenotype analyses on Electronic Health Record (EHR) data have recently drawn benefits across many areas of clinical practice, uncovering new links in the medical sciences that can potentially affect the well-being of millions of patients. In this paper, EHR data is used to discover novel relationships between diseases by studying their comorbidities (co-occurrences in patients). A novel embedding model is designed to extract knowledge from disease comorbidities by learning from a large-scale EHR database comprising more than 35 million inpatient cases spanning nearly a decade, revealing significant improvements on disease phenotyping over current computational approaches. In addition, the use of the proposed methodology is extended to discover novel disease-gene associations by including valuable domain knowledge from genome-wide association studies. To evaluate our approach, its effectiveness is compared against a held-out set where, again, it revealed very compelling results. For selected diseases, we further identify candidate gene lists for which disease-gene associations were not studied previously. Thus, our approach provides biomedical researchers with new tools to filter genes of interest, thus, reducing costly lab studies. PMID:27578529
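The paper's embedding model starts from disease co-occurrence within patients. A much simpler association score over the same kind of input is pointwise mutual information (PMI); the sketch below is illustrative only and is not the authors' model, and the patient records are invented:

```python
import math
from collections import Counter
from itertools import combinations

def comorbidity_pmi(patient_records):
    """Pointwise mutual information between disease pairs.

    Illustrative only: the cited work learns low-dimensional embeddings,
    but both approaches start from disease co-occurrence within patients.
    """
    n = len(patient_records)
    single, pair = Counter(), Counter()
    for diseases in patient_records:
        ds = sorted(set(diseases))
        single.update(ds)
        pair.update(combinations(ds, 2))
    return {
        p: math.log((c / n) / ((single[p[0]] / n) * (single[p[1]] / n)))
        for p, c in pair.items()
    }

records = [
    ["diabetes", "hypertension"],
    ["diabetes", "hypertension", "obesity"],
    ["asthma"],
    ["diabetes", "obesity"],
]
scores = comorbidity_pmi(records)
# Positive PMI: the pair co-occurs more often than chance predicts.
print(scores[("diabetes", "hypertension")] > 0)
```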
Lee, Kyubum; Kim, Byounggun; Jeon, Minji; Kim, Jihye; Tan, Aik Choon
2018-01-01
Background With the development of artificial intelligence (AI) technology centered on deep-learning, the computer has evolved to a point where it can read a given text and answer a question based on the context of the text. Such a specific task is known as the task of machine comprehension. Existing machine comprehension tasks mostly use datasets of general texts, such as news articles or elementary school-level storybooks. However, no attempt has been made to determine whether an up-to-date deep learning-based machine comprehension model can also process scientific literature containing expert-level knowledge, especially in the biomedical domain. Objective This study aims to investigate whether a machine comprehension model can process biomedical articles as well as general texts. Since there is no dataset for the biomedical literature comprehension task, our work includes generating a large-scale question answering dataset using PubMed and manually evaluating the generated dataset. Methods We present an attention-based deep neural model tailored to the biomedical domain. To further enhance the performance of our model, we used a pretrained word vector and biomedical entity type embedding. We also developed an ensemble method of combining the results of several independent models to reduce the variance of the answers from the models. Results The experimental results showed that our proposed deep neural network model outperformed the baseline model by more than 7% on the new dataset. We also evaluated human performance on the new dataset. The human evaluation result showed that our deep neural model outperformed humans in comprehension by 22% on average. Conclusions In this work, we introduced a new task of machine comprehension in the biomedical domain using a deep neural model. Since there was no large-scale dataset for training deep neural models in the biomedical domain, we created the new cloze-style datasets Biomedical Knowledge Comprehension Title (BMKC_T) and Biomedical Knowledge Comprehension Last Sentence (BMKC_LS) (together referred to as BioMedical Knowledge Comprehension) using the PubMed corpus. The experimental results showed that the performance of our model is much higher than that of humans. We observed that our model performed consistently better regardless of the degree of difficulty of a text, whereas humans have difficulty when performing biomedical literature comprehension tasks that require expert level knowledge. PMID:29305341
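Cloze-style question generation of the BMKC_LS flavor can be sketched as masking one entity in a sentence; the sentence, entity and helper function below are hypothetical, not the authors' actual pipeline:

```python
def make_cloze(sentence, entity, placeholder="_____"):
    """Turn a sentence into a cloze-style question by masking one entity,
    as in last-sentence cloze datasets (BMKC_LS-style; toy example)."""
    if entity not in sentence:
        raise ValueError("entity not found in sentence")
    return sentence.replace(entity, placeholder), entity

last_sentence = ("These results suggest that metformin reduces "
                 "hepatic glucose production.")
question, answer = make_cloze(last_sentence, "metformin")
print(question)
# These results suggest that _____ reduces hepatic glucose production.
```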
Trends in biochemical and biomedical applications of mass spectrometry
NASA Astrophysics Data System (ADS)
Gelpi, Emilio
1992-09-01
This review attempts an in-depth evaluation of progress and achievements made since the 11th International Mass Spectrometry Conference in the application of mass spectrometric techniques to biochemistry and biomedicine. For this purpose, scientific contributions in this field at major international meetings have been monitored, together with an extensive appraisal of literature data covering the period from 1988 to 1991. A bibliometric evaluation of the MEDLINE database for this period provides a total of almost 4000 entries for mass spectrometry. This allows a detailed study of literature and geographical sources of the most frequent applications, of disciplines where mass spectrometry is most active and of types of sample and instrumentation most commonly used. In this regard major efforts according to number of publications (over 100 literature reports) are concentrated in countries like Canada, France, Germany, Italy, Japan, Sweden, UK and the USA. Also, most of the work using mass spectrometry in biochemistry and biomedicine is centred on studies on biotransformation, metabolism, pharmacology, pharmacokinetics and toxicology, which have been carried out on samples of blood, urine, plasma and tissue, by order of frequency of use. Human and animal studies appear to be evenly distributed in terms of the number of reports published in the literature in which the authors make use of experimental animals or describe work on human samples. Along these lines, special attention is given to the real usefulness of mass spectrometry (MS) technology in routine medical practice. Thus the review concentrates on evaluating the progress made in disease diagnosis and overall patient care. As regards prevailing techniques, GC-MS continues to be the mainstay of state-of-the-art methods for multicomponent analysis, stable isotope tracer studies and metabolic profiling, while HPLC-MS and tandem MS are becoming increasingly important in biomedical research. However, despite the relatively large number of mass spectrometry reports in the biomedical sciences, very few true routine applications are described, and recent technological innovations in instrumentation such as FABMS, electrospray, plasma or laser desorption have contributed much more to structural biology, especially biopolymer studies of macromolecules, than to real-life biomedical applications on patients and clinical problems.
UCSC genome browser: deep support for molecular biomedical research.
Mangan, Mary E; Williams, Jennifer M; Lathe, Scott M; Karolchik, Donna; Lathe, Warren C
2008-01-01
The volume and complexity of genomic sequence data, and the additional experimental data required for annotation of the genomic context, pose a major challenge for display and access for biomedical researchers. Genome browsers organize this data and make it available in various ways to extract useful information to advance research projects. The UCSC Genome Browser is one of these resources. The official sequence data for a given species forms the framework to display many other types of data such as expression, variation, cross-species comparisons, and more. Visual representations of the data are available for exploration. Data can be queried with sequences. Complex database queries are also easily achieved with the Table Browser interface. Associated tools permit additional query types or access to additional data sources such as images of in situ localizations. Support for solving researcher's issues is provided with active discussion mailing lists and by providing updated training materials. The UCSC Genome Browser provides a source of deep support for a wide range of biomedical molecular research (http://genome.ucsc.edu).
Yang, Zhihao; Lin, Yuan; Wu, Jiajin; Tang, Nan; Lin, Hongfei; Li, Yanpeng
2011-10-01
Knowledge about protein-protein interactions (PPIs) unveils the molecular mechanisms of biological processes. However, the volume and content of published biomedical literature on protein interactions are expanding rapidly, making it increasingly difficult for interaction database curators to detect and curate protein interaction information manually. We present a multiple kernel learning-based approach for automatic PPI extraction from biomedical literature. The approach combines feature-based, tree, and graph kernels and integrates their outputs with a ranking support vector machine (Ranking SVM). Experimental evaluations show that the features in individual kernels are complementary, and that the kernel combined with Ranking SVM achieves better performance than the individual kernels, an equal-weight combination, and an optimal-weight combination. Our approach can achieve state-of-the-art performance with respect to the comparable evaluations, with 64.88% F-score and 88.02% AUC on the AImed corpus. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Evaluation of relational and NoSQL database architectures to manage genomic annotations.
Schulz, Wade L; Nelson, Brent G; Felker, Donn K; Durant, Thomas J S; Torres, Richard
2016-12-01
While the adoption of next generation sequencing has rapidly expanded, the informatics infrastructure used to manage the data generated by this technology has not kept pace. Historically, relational databases have provided much of the framework for data storage and retrieval. Newer technologies based on NoSQL architectures may provide significant advantages in storage and query efficiency, thereby reducing the cost of data management. But their relative advantage when applied to biomedical data sets, such as genetic data, has not been characterized. To this end, we compared the storage, indexing, and query efficiency of a common relational database (MySQL), a document-oriented NoSQL database (MongoDB), and a relational database with NoSQL support (PostgreSQL). When used to store genomic annotations from the dbSNP database, we found the NoSQL architectures to outperform traditional, relational models for speed of data storage, indexing, and query retrieval in nearly every operation. These findings strongly support the use of novel database technologies to improve the efficiency of data management within the biological sciences. Copyright © 2016 Elsevier Inc. All rights reserved.
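The storage-model difference at the heart of the comparison can be shown with standard-library stand-ins: a normalized relational row (sqlite3 in place of MySQL) versus a nested JSON document (the shape a MongoDB collection stores). The schema is a sketch and the variant coordinates are illustrative, not taken from the paper:

```python
import json
import sqlite3

# Relational model: one normalized row per annotation
# (sqlite3 here stands in for the MySQL server used in the paper).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE snp (
    rsid TEXT PRIMARY KEY, chrom TEXT, pos INTEGER, ref TEXT, alt TEXT)""")
conn.execute("INSERT INTO snp VALUES ('rs429358', '19', 44908684, 'T', 'C')")
row = conn.execute("SELECT pos FROM snp WHERE rsid = 'rs429358'").fetchone()

# Document model: the same annotation as one nested JSON document,
# the shape a document-oriented store like MongoDB would hold.
doc = json.dumps({"rsid": "rs429358",
                  "location": {"chrom": "19", "pos": 44908684},
                  "alleles": {"ref": "T", "alt": "C"}})
print(row[0], json.loads(doc)["location"]["pos"])
```

The trade-off the paper benchmarks follows from these shapes: the relational row needs joins to reassemble nested context, while the document keeps it inline at the cost of redundancy.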
Disbiome database: linking the microbiome to disease.
Janssens, Yorick; Nielandt, Joachim; Bronselaer, Antoon; Debunne, Nathan; Verbeke, Frederick; Wynendaele, Evelien; Van Immerseel, Filip; Vandewynckel, Yves-Paul; De Tré, Guy; De Spiegeleer, Bart
2018-06-04
Recent research has provided fascinating indications and evidence that the host health is linked to its microbial inhabitants. Due to the development of high-throughput sequencing technologies, more and more data covering microbial composition changes in different disease types are emerging. However, this information is dispersed over a wide variety of medical and biomedical disciplines. Disbiome is a database which collects and presents published microbiota-disease information in a standardized way. The diseases are classified using the MedDRA classification system and the micro-organisms are linked to their NCBI and SILVA taxonomy. Finally, each study included in the Disbiome database is assessed for its reporting quality using a standardized questionnaire. Disbiome is the first database giving a clear, concise and up-to-date overview of microbial composition differences in diseases, together with the relevant information of the studies published. The strength of this database lies within the combination of the presence of references to other databases, which enables both specific and diverse search strategies within the Disbiome database, and the human annotation which ensures a simple and structured presentation of the available data.
Specialist Bibliographic Databases
Gasparyan, Armen Yuri; Yessirkepov, Marlen; Voronov, Alexander A; Trukhachev, Vladimir I; Kostyukova, Elena I; Gerasimov, Alexey N; Kitas, George D
2016-05-01
Specialist bibliographic databases offer essential online tools for researchers and authors who work on specific subjects and perform comprehensive and systematic syntheses of evidence. This article presents examples of the established specialist databases, which may be of interest to those engaged in multidisciplinary science communication. Access to most specialist databases is through subscription schemes and membership in professional associations. Several aggregators of information and database vendors, such as EBSCOhost and ProQuest, facilitate advanced searches supported by specialist keyword thesauri. Searches of items through specialist databases are complementary to those through multidisciplinary research platforms, such as PubMed, Web of Science, and Google Scholar. Familiarizing with the functional characteristics of biomedical and nonbiomedical bibliographic search tools is mandatory for researchers, authors, editors, and publishers. The database users are offered updates of the indexed journal lists, abstracts, author profiles, and links to other metadata. Editors and publishers may find particularly useful source selection criteria and apply for coverage of their peer-reviewed journals and grey literature sources. These criteria are aimed at accepting relevant sources with established editorial policies and quality controls. PMID:27134485
Biomedical question answering using semantic relations.
Hristovski, Dimitar; Dinevski, Dejan; Kastrin, Andrej; Rindflesch, Thomas C
2015-01-16
The proliferation of the scientific literature in the field of biomedicine makes it difficult to keep abreast of current knowledge, even for domain experts. While general Web search engines and specialized information retrieval (IR) systems have made important strides in recent decades, the problem of accurate knowledge extraction from the biomedical literature is far from solved. Classical IR systems usually return a list of documents that have to be read by the user to extract relevant information. This tedious and time-consuming work can be lessened with automatic Question Answering (QA) systems, which aim to provide users with direct and precise answers to their questions. In this work we propose a novel methodology for QA based on semantic relations extracted from the biomedical literature. We extracted semantic relations with the SemRep natural language processing system from 122,421,765 sentences, which came from 21,014,382 MEDLINE citations (i.e., the complete MEDLINE distribution up to the end of 2012). A total of 58,879,300 semantic relation instances were extracted and organized in a relational database. The QA process is implemented as a search in this database, which is accessed through a Web-based application, called SemBT (available at http://sembt.mf.uni-lj.si ). We conducted an extensive evaluation of the proposed methodology in order to estimate the accuracy of extracting a particular semantic relation from a particular sentence. Evaluation was performed by 80 domain experts. In total 7,510 semantic relation instances belonging to 2,675 distinct relations were evaluated 12,083 times. The instances were evaluated as correct 8,228 times (68%). In this work we propose an innovative methodology for biomedical QA. The system is implemented as a Web-based application that is able to provide precise answers to a wide range of questions. A typical question is answered within a few seconds. The tool has some extensions that make it especially useful for interpretation of DNA microarray results.
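QA-as-database-search can be sketched as a lookup over subject-predicate-object rows; the schema and rows below are hypothetical, not SemBT's actual tables:

```python
import sqlite3

# A minimal semantic-relation store: one row per extracted
# subject-predicate-object instance (toy schema, invented rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relation (subj TEXT, pred TEXT, obj TEXT)")
conn.executemany("INSERT INTO relation VALUES (?, ?, ?)", [
    ("Metformin", "TREATS", "Type 2 diabetes"),
    ("Insulin",   "TREATS", "Type 2 diabetes"),
    ("Metformin", "AFFECTS", "Gluconeogenesis"),
])
# 'What treats type 2 diabetes?' becomes a predicate + object lookup.
answers = sorted(s for (s,) in conn.execute(
    "SELECT subj FROM relation WHERE pred = 'TREATS' AND obj = ?",
    ("Type 2 diabetes",)))
print(answers)  # ['Insulin', 'Metformin']
```

This is why such systems answer in seconds: the expensive relation extraction happens offline, and question answering reduces to an indexed database query.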
Cloud Based Metalearning System for Predictive Modeling of Biomedical Data
Vukićević, Milan
2014-01-01
Rapid growth and storage of biomedical data enabled many opportunities for predictive modeling and improvement of healthcare processes. On the other side analysis of such large amounts of data is a difficult and computationally intensive task for most existing data mining algorithms. This problem is addressed by proposing a cloud based system that integrates metalearning framework for ranking and selection of best predictive algorithms for data at hand and open source big data technologies for analysis of biomedical data. PMID:24892101
Strategies from UW-Madison for rescuing biomedical research in the US
Kimble, Judith; Bement, William M; Chang, Qiang; Cox, Benjamin L; Drinkwater, Norman R; Gourse, Richard L; Hoskins, Aaron A; Huttenlocher, Anna; Kreeger, Pamela K; Lambert, Paul F; Mailick, Marsha R; Miyamoto, Shigeki; Moss, Richard L; O'Connor-Giles, Kate M; Roopra, Avtar; Saha, Krishanu; Seidel, Hannah S
2015-01-01
A cross-campus, cross-career stage and cross-disciplinary series of discussions at a large public university has produced a series of recommendations for addressing the problems confronting the biomedical research community in the US. DOI: http://dx.doi.org/10.7554/eLife.09305.001 PMID:26122792
Designing Biomedical Informatics Infrastructure for Clinical and Translational Science
ERIC Educational Resources Information Center
La Paz Lillo, Ariel Isaac
2009-01-01
Clinical and Translational Science (CTS) rests largely on information flowing smoothly at multiple levels, in multiple directions, across multiple locations. Biomedical Informatics (BI) is seen as a backbone that helps to manage information flows for the translation of knowledge generated and stored in silos of basic science into bedside…
Meta-analysis of CHEK2 1100delC variant and colorectal cancer susceptibility.
Xiang, He-ping; Geng, Xiao-ping; Ge, Wei-wei; Li, He
2011-11-01
Cell cycle checkpoint kinase 2 (CHEK2) gene has been inconsistently associated with colorectal cancer (CRC), particularly the 1100delC variant. To generate large-scale evidence on whether the CHEK2 1100delC variant is associated with CRC susceptibility, we conducted a meta-analysis. Data were collected from the following electronic databases: PubMed, Excerpta Medica Database and Chinese Biomedical Literature Database, with the last report up to November 2010. The odds ratio (OR) and its 95% confidence interval (95% CI) were used to assess the strength of association. We evaluated the contrast of carriers versus non-carriers. Meta-analysis was performed in a fixed/random effect model by using the software Review Manager 4.2. A total of six studies including 4194 cases and 10,010 controls based on the search criteria were involved in this meta-analysis. A significant association of the CHEK2 1100delC variant with unselected CRC was found (OR=2.11, 95% CI=1.41-3.16, P=0.0003). We also found an association of the CHEK2 1100delC variant with familial CRC (OR=2.80, 95% CI=1.74-4.51, P<0.0001). However, the association was not established for sporadic CRC (OR=1.45, 95% CI=0.49-4.30, P=0.50). This meta-analysis suggests that CHEK2 may be an important CRC-predisposing gene, with the 1100delC variant increasing CRC risk. Copyright © 2011. Published by Elsevier Ltd.
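The fixed-effect pooling used in such meta-analyses is the standard inverse-variance average of log odds ratios. A sketch with toy numbers (not the CHEK2 studies themselves):

```python
import math

def pooled_or_fixed(ors, ses):
    """Inverse-variance fixed-effect pooled odds ratio with 95% CI.

    `ors`: per-study odds ratios; `ses`: standard errors of log(OR).
    Each study is weighted by 1/SE^2 on the log scale.
    """
    logs = [math.log(o) for o in ors]
    ws = [1 / se ** 2 for se in ses]
    pooled = sum(w * l for w, l in zip(ws, logs)) / sum(ws)
    se = math.sqrt(1 / sum(ws))
    ci = (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se))
    return math.exp(pooled), ci

# Toy data for three hypothetical studies.
or_hat, (lo, hi) = pooled_or_fixed([2.4, 1.8, 2.1], [0.30, 0.25, 0.40])
print(round(or_hat, 2), round(lo, 2), round(hi, 2))
```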
Park, KeeHyun; Lim, SeungHyeon
2015-01-01
In this paper, a multilayer secure biomedical data management system for managing a very large number of diverse personal health devices is proposed. The system has the following characteristics: the system supports international standard communication protocols to achieve interoperability. The system is integrated in the sense that both a PHD communication system and a remote PHD management system work together as a single system. Finally, the system proposed in this paper provides user/message authentication processes to securely transmit biomedical data measured by PHDs based on the concept of a biomedical signature. Some experiments, including the stress test, have been conducted to show that the system proposed/constructed in this study performs very well even when a very large number of PHDs are used. For a stress test, up to 1,200 threads are made to represent the same number of PHD agents. The loss ratio of the ISO/IEEE 11073 messages in the normal system is as high as 14% when 1,200 PHD agents are connected. On the other hand, no message loss occurs in the multilayered system proposed in this study, which demonstrates the superiority of the multilayered system to the normal system with regard to heavy traffic. PMID:26247034
Biomedical imaging and sensing using flatbed scanners.
Göröcs, Zoltán; Ozcan, Aydogan
2014-09-07
In this Review, we provide an overview of flatbed scanner based biomedical imaging and sensing techniques. The extremely large imaging field-of-view (e.g., ~600-700 cm²) of these devices coupled with their cost-effectiveness provide unique opportunities for digital imaging of samples that are too large for regular optical microscopes, and for collection of large amounts of statistical data in various automated imaging or sensing tasks. Here we give a short introduction to the basic features of flatbed scanners also highlighting the key parameters for designing scientific experiments using these devices, followed by a discussion of some of the significant examples, where scanner-based systems were constructed to conduct various biomedical imaging and/or sensing experiments. Along with mobile phones and other emerging consumer electronics devices, flatbed scanners and their use in advanced imaging and sensing experiments might help us transform current practices of medicine, engineering and sciences through democratization of measurement science and empowerment of citizen scientists, science educators and researchers in resource limited settings. PMID:24965011
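The field-of-view advantage is easy to quantify: even at moderate optical resolution, a full-bed scan yields an enormous pixel count. Back-of-envelope arithmetic (the bed dimensions and resolution below are illustrative, not from the review):

```python
def scan_pixels(width_cm, height_cm, dpi):
    """Pixel dimensions of a flatbed scan at a given resolution."""
    cm_per_inch = 2.54
    return (round(width_cm / cm_per_inch * dpi),
            round(height_cm / cm_per_inch * dpi))

# A ~600 cm^2 letter-class scan area at a common optical resolution.
w, h = scan_pixels(21.6, 27.9, 1200)
print(w, "x", h, "=", round(w * h / 1e6, 1), "megapixels")
```

A single such scan exceeds a hundred megapixels, which is the statistical-throughput argument the review makes for scanner-based imaging.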
MEDLINE: the options for health professionals.
Wood, E H
1994-01-01
The bibliographic database MEDLINE, produced by the National Library of Medicine (NLM), is a computerized index to the world's biomedical literature. The database can be searched back to 1966 and contains 6.8 million records. The various means of access are divided, for the purposes of this article, into three categories: logging onto a remote host computer by telephone and modem or by the Internet; subscribing to part or all of the database on compact disc (CD-ROM); and leasing the data on a transport medium such as magnetic tape or CDs for loading on a local host computer. Decisions about which method is preferable in a given situation depend on cost, availability of hardware and software, local expertise, and the size of the intended user population. Trends include increased access to the Internet by health professionals, increased network speed, links from MEDLINE records to full-text databases or online journals, and integration of MEDLINE into wider health information systems.
OntoGene web services for biomedical text mining.
Rinaldi, Fabio; Clematide, Simon; Marques, Hernani; Ellendorff, Tilia; Romacker, Martin; Rodriguez-Esteban, Raul
2014-01-01
Text mining services are rapidly becoming a crucial component of various knowledge management pipelines, for example in the process of database curation, or for exploration and enrichment of biomedical data within the pharmaceutical industry. Traditional architectures, based on monolithic applications, do not offer sufficient flexibility for a wide range of use case scenarios, and therefore open architectures, as provided by web services, are attracting increased interest. We present an approach towards providing advanced text mining capabilities through web services, using a recently proposed standard for textual data interchange (BioC). The web services leverage a state-of-the-art platform for text mining (OntoGene) which has been tested in several community-organized evaluation challenges,with top ranked results in several of them.
Managing biomedical image metadata for search and retrieval of similar images.
Korenblum, Daniel; Rubin, Daniel; Napel, Sandy; Rodriguez, Cesar; Beaulieu, Chris
2011-08-01
Radiology images are generally disconnected from the metadata describing their contents, such as imaging observations ("semantic" metadata), which are usually described in text reports that are not directly linked to the images. We developed a system, the Biomedical Image Metadata Manager (BIMM), to (1) address the problem of managing biomedical image metadata and (2) facilitate the retrieval of similar images using semantic feature metadata. Our approach allows radiologists, researchers, and students to take advantage of the vast and growing repositories of medical image data by explicitly linking images to their associated metadata in a relational database that is globally accessible through a Web application. BIMM receives input in the form of standards-based metadata files via a Web service, and parses and stores the metadata in a relational database allowing efficient data query and maintenance capabilities. Upon querying BIMM for images, 2D regions of interest (ROIs) stored as metadata are automatically rendered onto preview images included in search results. The system's "match observations" function retrieves images with similar ROIs based on specific semantic features describing imaging observation characteristics (IOCs). We demonstrate that the system, using IOCs alone, can accurately retrieve images with diagnoses matching the query images, and we evaluate its performance on a set of annotated liver lesion images. BIMM has several potential applications, e.g., computer-aided detection and diagnosis, content-based image retrieval, automating medical analysis protocols, and gathering population statistics like disease prevalences. The system provides a framework for decision support systems, potentially improving their diagnostic accuracy and selection of appropriate therapies.
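The "match observations" retrieval step described above can be pictured as set-overlap ranking over semantic features. A minimal sketch, with hypothetical IOC terms and image IDs (the abstract does not specify BIMM's actual scoring function, so Jaccard similarity is used here purely for illustration):

```python
def jaccard(a, b):
    """Jaccard similarity of two IOC term sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def match_observations(query_iocs, image_db, top_k=3):
    """Return the top_k stored image IDs most similar to the query IOCs."""
    scored = [(jaccard(query_iocs, iocs), image_id)
              for image_id, iocs in image_db.items()]
    scored.sort(reverse=True)
    return [image_id for score, image_id in scored[:top_k]]

# Hypothetical stored images annotated with IOC terms.
image_db = {
    "img1": {"hypodense", "smooth margin", "homogeneous"},
    "img2": {"hyperdense", "irregular margin", "heterogeneous"},
    "img3": {"hypodense", "irregular margin", "homogeneous"},
}
# img2 shares no terms with the query and scores 0, so it is excluded.
print(match_observations({"hypodense", "homogeneous"}, image_db, top_k=2))
```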
Life sciences domain analysis model
Freimuth, Robert R; Freund, Elaine T; Schick, Lisa; Sharma, Mukesh K; Stafford, Grace A; Suzek, Baris E; Hernandez, Joyce; Hipp, Jason; Kelley, Jenny M; Rokicki, Konrad; Pan, Sue; Buckler, Andrew; Stokes, Todd H; Fernandez, Anna; Fore, Ian; Buetow, Kenneth H
2012-01-01
Objective Meaningful exchange of information is a fundamental challenge in collaborative biomedical research. To help address this, the authors developed the Life Sciences Domain Analysis Model (LS DAM), an information model that provides a framework for communication among domain experts and technical teams developing information systems to support biomedical research. The LS DAM is harmonized with the Biomedical Research Integrated Domain Group (BRIDG) model of protocol-driven clinical research. Together, these models can facilitate data exchange for translational research. Materials and methods The content of the LS DAM was driven by analysis of life sciences and translational research scenarios and the concepts in the model are derived from existing information models, reference models and data exchange formats. The model is represented in the Unified Modeling Language and uses ISO 21090 data types. Results The LS DAM v2.2.1 is comprised of 130 classes and covers several core areas including Experiment, Molecular Biology, Molecular Databases and Specimen. Nearly half of these classes originate from the BRIDG model, emphasizing the semantic harmonization between these models. Validation of the LS DAM against independently derived information models, research scenarios and reference databases supports its general applicability to represent life sciences research. Discussion The LS DAM provides unambiguous definitions for concepts required to describe life sciences research. The processes established to achieve consensus among domain experts will be applied in future iterations and may be broadly applicable to other standardization efforts. Conclusions The LS DAM provides common semantics for life sciences research. Through harmonization with BRIDG, it promotes interoperability in translational science. PMID:22744959
Acupuncture for treating sciatica: a systematic review protocol
Qin, Zongshi; Liu, Xiaoxu; Yao, Qin; Zhai, Yanbing; Liu, Zhishun
2015-01-01
Introduction This systematic review aims to assess the effectiveness and safety of acupuncture for treating sciatica. Methods The following nine databases will be searched from their inception to 30 October 2014: MEDLINE, EMBASE, the Cochrane Central Register of Controlled Trials (CENTRAL), the Chinese Biomedical Literature Database (CBM), the Chinese Medical Current Content (CMCC), the Chinese Scientific Journal Database (VIP database), the Wan-Fang Database, the China National Knowledge Infrastructure (CNKI) and Citation Information by National Institute of Informatics (CiNii). Randomised controlled trials (RCTs) of acupuncture for sciatica in English, Chinese or Japanese without restriction of publication status will be included. Two researchers will independently undertake study selection, extraction of data and assessment of study quality. Meta-analysis will be conducted after screening of studies. Data will be analysed using risk ratio for dichotomous data, and standardised mean difference or weighted mean difference for continuous data. Dissemination This systematic review will be disseminated electronically through a peer-reviewed publication or conference presentations. Trial registration number PROSPERO CRD42014015001. PMID:25922105
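For the dichotomous outcomes in the analysis plan above, the risk ratio is simply the ratio of event rates in the two arms, with its confidence interval conventionally computed on the log scale. A small worked sketch (the counts are invented for illustration, not trial data):

```python
import math

def risk_ratio(events_t, n_t, events_c, n_c):
    """Risk ratio (RR) for dichotomous data: event rate in the
    treatment arm divided by event rate in the control arm."""
    return (events_t / n_t) / (events_c / n_c)

def rr_log_ci(events_t, n_t, events_c, n_c, z=1.96):
    """RR with a 95% CI computed on the log scale (standard
    meta-analytic practice) and exponentiated back."""
    rr = risk_ratio(events_t, n_t, events_c, n_c)
    se = math.sqrt(1/events_t - 1/n_t + 1/events_c - 1/n_c)
    return (rr,
            math.exp(math.log(rr) - z * se),
            math.exp(math.log(rr) + z * se))

# Hypothetical trial: 30/50 responders with acupuncture vs 20/50 control.
rr, lo, hi = rr_log_ci(30, 50, 20, 50)
print(round(rr, 2), round(lo, 2), round(hi, 2))
```

Meta-analysis then pools the per-study log risk ratios, weighting each by the inverse of its variance.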
Database citation in full text biomedical articles.
Kafkas, Şenay; Kim, Jee-Hyub; McEntyre, Johanna R
2013-01-01
Molecular biology and literature databases represent essential infrastructure for life science research. Effective integration of these data resources requires that there are structured cross-references at the level of individual articles and biological records. Here, we describe the current patterns of how database entries are cited in research articles, based on analysis of the full text Open Access articles available from Europe PMC. Focusing on citation of entries in the European Nucleotide Archive (ENA), UniProt and Protein Data Bank, Europe (PDBe), we demonstrate that text mining doubles the number of structured annotations of database record citations supplied in journal articles by publishers. Many thousands of new literature-database relationships are found by text mining, since these relationships are also not present in the set of articles cited by database records. We recommend that structured annotation of database records in articles is extended to other databases, such as ArrayExpress and Pfam, entries from which are also cited widely in the literature. The very high precision and high-throughput of this text-mining pipeline makes this activity possible both accurately and at low cost, which will allow the development of new integrated data services. PMID:23734176
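Mining database record citations of the kind described here typically starts from the fact that each accession family has a recognizable shape. A toy sketch with simplified approximations of those formats (these regexes are illustrative, not the Europe PMC pipeline's actual rules, and the official formats are stricter):

```python
import re

# Simplified accession-number shapes for three databases named in the
# abstract. Real formats are more constrained than these approximations.
PATTERNS = {
    "ENA":     re.compile(r"\b[A-Z]{1,2}\d{5,6}\b"),
    "UniProt": re.compile(r"\b[OPQ]\d[A-Z0-9]{3}\d\b|\b[A-NR-Z]\d[A-Z][A-Z0-9]{2}\d\b"),
    "PDBe":    re.compile(r"\b\d[A-Za-z0-9]{3}\b"),
}

def find_accessions(text):
    """Return {database: sorted accession candidates} for a passage."""
    return {db: sorted(set(p.findall(text))) for db, p in PATTERNS.items()}

text = ("The sequence was deposited in GenBank under AB123456; the protein "
        "corresponds to UniProt P12345 and structure 1TUP in PDB.")
# Note the families overlap (P12345 also fits the ENA shape), which is
# one reason production pipelines disambiguate using surrounding context.
print(find_accessions(text))
```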
CD-REST: a system for extracting chemical-induced disease relation in literature.
Xu, Jun; Wu, Yonghui; Zhang, Yaoyun; Wang, Jingqi; Lee, Hee-Jin; Xu, Hua
2016-01-01
Mining chemical-induced disease relations embedded in the vast biomedical literature could facilitate a wide range of computational biomedical applications, such as pharmacovigilance. The BioCreative V organized a Chemical Disease Relation (CDR) Track regarding chemical-induced disease relation extraction from biomedical literature in 2015. We participated in all subtasks of this challenge. In this article, we present our participation system Chemical Disease Relation Extraction SysTem (CD-REST), an end-to-end system for extracting chemical-induced disease relations in biomedical literature. CD-REST consists of two main components: (1) a chemical and disease named entity recognition and normalization module, which employs the Conditional Random Fields algorithm for entity recognition and a Vector Space Model-based approach for normalization; and (2) a relation extraction module that classifies both sentence-level and document-level candidate drug-disease pairs by support vector machines. Our system achieved the best performance on the chemical-induced disease relation extraction subtask in the BioCreative V CDR Track, demonstrating the effectiveness of our proposed machine learning-based approaches for automatic extraction of chemical-induced disease relations in biomedical literature. The CD-REST system provides web services using HTTP POST requests. The web services can be accessed from http://clinicalnlptool.com/cdr. The online CD-REST demonstration system is available at http://clinicalnlptool.com/cdr/cdr.html. Database URL: http://clinicalnlptool.com/cdr; http://clinicalnlptool.com/cdr/cdr.html. © The Author(s) 2016. Published by Oxford University Press.
A common layer of interoperability for biomedical ontologies based on OWL EL.
Hoehndorf, Robert; Dumontier, Michel; Oellrich, Anika; Wimalaratne, Sarala; Rebholz-Schuhmann, Dietrich; Schofield, Paul; Gkoutos, Georgios V
2011-04-01
Ontologies are essential in biomedical research due to their ability to semantically integrate content from different scientific databases and resources. Their application improves capabilities for querying and mining biological knowledge. An increasing number of ontologies is being developed for this purpose, and considerable effort is invested into formally defining them in order to represent their semantics explicitly. However, current biomedical ontologies do not facilitate data integration and interoperability yet, since reasoning over these ontologies is very complex and cannot be performed efficiently or is even impossible. We propose the use of less expressive subsets of ontology representation languages to enable efficient reasoning and achieve the goal of genuine interoperability between ontologies. We present and evaluate EL Vira, a framework that transforms OWL ontologies into the OWL EL subset, thereby enabling the use of tractable reasoning. We illustrate which OWL constructs and inferences are kept and lost following the conversion and demonstrate the performance gain of reasoning indicated by the significant reduction of processing time. We applied EL Vira to the open biomedical ontologies and provide a repository of ontologies resulting from this conversion. EL Vira creates a common layer of ontological interoperability that, for the first time, enables the creation of software solutions that can employ biomedical ontologies to perform inferences and answer complex queries to support scientific analyses. The EL Vira software is available from http://el-vira.googlecode.com and converted OBO ontologies and their mappings are available from http://bioonto.gen.cam.ac.uk/el-ont.
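The conversion idea behind EL Vira can be illustrated abstractly: OWL 2 EL permits conjunction and existential restriction but not, for instance, universal restriction, negation, or disjunction, so a converter must drop (or approximate) axioms that use the unsupported constructors. A deliberately simplified stand-in, with axioms encoded as nested tuples rather than real OWL (EL Vira itself operates on full OWL ontologies):

```python
# Constructors outside the EL profile; axioms using any of these are
# dropped in this toy converter. (EL keeps ObjectIntersectionOf and
# ObjectSomeValuesFrom, among others.)
UNSUPPORTED = {"ObjectAllValuesFrom", "ObjectComplementOf", "ObjectUnionOf"}

def constructors(expr):
    """Yield every constructor name in a nested (op, args...) tuple."""
    if isinstance(expr, tuple):
        yield expr[0]
        for arg in expr[1:]:
            yield from constructors(arg)

def to_el_subset(axioms):
    """Keep only axioms whose constructors are all EL-compatible."""
    return [ax for ax in axioms
            if not any(c in UNSUPPORTED for c in constructors(ax))]

axioms = [
    ("SubClassOf", "Nucleus", ("ObjectSomeValuesFrom", "partOf", "Cell")),
    ("SubClassOf", "Cell", ("ObjectAllValuesFrom", "hasPart", "Organelle")),
]
kept = to_el_subset(axioms)
print(len(kept))  # the ObjectAllValuesFrom axiom is dropped
```

The payoff described in the abstract is that the surviving subset admits polynomial-time (tractable) reasoning, at the cost of the inferences that depended on the dropped axioms.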
Singhal, Ayush; Simmons, Michael; Lu, Zhiyong
2016-11-01
The practice of precision medicine will ultimately require databases of genes and mutations for healthcare providers to reference in order to understand the clinical implications of each patient's genetic makeup. Although the highest quality databases require manual curation, text mining tools can facilitate the curation process, increasing accuracy, coverage, and productivity. However, to date there are no available text mining tools that offer high-accuracy performance for extracting such triplets from biomedical literature. In this paper we propose a high-performance machine learning approach to automate the extraction of disease-gene-variant triplets from biomedical literature. Our approach is unique because we identify the genes and protein products associated with each mutation from not just the local text content, but from a global context as well (from the Internet and from all literature in PubMed). Our approach also incorporates protein sequence validation and disease association using a novel text-mining-based machine learning approach. We extract disease-gene-variant triplets from all abstracts in PubMed related to a set of ten important diseases (breast cancer, prostate cancer, pancreatic cancer, lung cancer, acute myeloid leukemia, Alzheimer's disease, hemochromatosis, age-related macular degeneration (AMD), diabetes mellitus, and cystic fibrosis). We then evaluate our approach in two ways: (1) a direct comparison with the state of the art using benchmark datasets; (2) a validation study comparing the results of our approach with entries in a popular human-curated database (UniProt) for each of the previously mentioned diseases. In the benchmark comparison, our full approach achieves a 28% improvement in F1-measure (from 0.62 to 0.79) over the state-of-the-art results. For the validation study with UniProt Knowledgebase (KB), we present a thorough analysis of the results and errors. 
Across all diseases, our approach returned 272 triplets (disease-gene-variant) that overlapped with entries in UniProt and 5,384 triplets without overlap in UniProt. Analysis of the overlapping triplets and of a stratified sample of the non-overlapping triplets revealed accuracies of 93% and 80% for the respective categories (cumulative accuracy, 77%). We conclude that our process represents an important and broadly applicable improvement to the state of the art for curation of disease-gene-variant relationships.
Bakal, Gokhan; Talari, Preetham; Kakani, Elijah V; Kavuluru, Ramakanth
2018-06-01
Identifying new potential treatment options for medical conditions that cause human disease burden is a central task of biomedical research. Since all candidate drugs cannot be tested with animal and clinical trials, in vitro approaches are first attempted to identify promising candidates. Likewise, identifying different causal relations between biomedical entities is also critical to understand biomedical processes. Generally, natural language processing (NLP) and machine learning are used to predict specific relations between any given pair of entities using the distant supervision approach. To build high accuracy supervised predictive models to predict previously unknown treatment and causative relations between biomedical entities based only on semantic graph pattern features extracted from biomedical knowledge graphs. We used 7000 treats and 2918 causes hand-curated relations from the UMLS Metathesaurus to train and test our models. Our graph pattern features are extracted from simple paths connecting biomedical entities in the SemMedDB graph (based on the well-known SemMedDB database made available by the U.S. National Library of Medicine). Using these graph patterns connecting biomedical entities as features of logistic regression and decision tree models, we computed mean performance measures (precision, recall, F-score) over 100 distinct 80-20% train-test splits of the datasets. For all experiments, we used a positive:negative class imbalance of 1:10 in the test set to model relatively more realistic scenarios. Our models predict treats and causes relations with high F-scores of 99% and 90% respectively. Logistic regression model coefficients also help us identify highly discriminative patterns that have an intuitive interpretation. We are also able to predict some new plausible relations based on false positives that our models scored highly based on our collaborations with two physician co-authors. 
Finally, our decision tree models are able to retrieve over 50% of treatment relations from a recently created external dataset. We employed semantic graph patterns connecting pairs of candidate biomedical entities in a knowledge graph as features to predict treatment/causative relations between them. We provide what we believe is the first evidence in direct prediction of biomedical relations based on graph features. Our work complements lexical pattern based approaches in that the graph patterns can be used as additional features for weakly supervised relation prediction. Copyright © 2018 Elsevier Inc. All rights reserved.
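The graph-pattern features described above can be sketched as predicate sequences along bounded-length simple paths between a candidate entity pair, each sequence becoming a binary feature for the classifier. A toy example on a hypothetical mini-graph (SemMedDB itself is far larger, and the authors' exact feature construction may differ in detail):

```python
def predicate_paths(graph, src, dst, max_len=3):
    """All predicate sequences along simple paths src -> dst with at
    most max_len edges. graph: {node: [(predicate, neighbor), ...]}."""
    paths, stack = [], [(src, [], {src})]
    while stack:
        node, preds, seen = stack.pop()
        if node == dst and preds:
            paths.append("->".join(preds))
            continue
        if len(preds) == max_len:
            continue
        for pred, nbr in graph.get(node, []):
            if nbr not in seen:                      # keep paths simple
                stack.append((nbr, preds + [pred], seen | {nbr}))
    return paths

# Hypothetical triples in SemMedDB style (invented, not real SemMedDB data).
graph = {
    "aspirin":      [("INHIBITS", "COX2")],
    "COX2":         [("ASSOCIATED_WITH", "inflammation")],
    "inflammation": [("COEXISTS_WITH", "pain")],
}
print(predicate_paths(graph, "aspirin", "pain"))
```

Each distinct predicate sequence (e.g. "INHIBITS->ASSOCIATED_WITH->COEXISTS_WITH") would then be one column in the feature matrix fed to the logistic regression or decision tree models.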
Tsatsaronis, George; Balikas, Georgios; Malakasiotis, Prodromos; Partalas, Ioannis; Zschunke, Matthias; Alvers, Michael R; Weissenborn, Dirk; Krithara, Anastasia; Petridis, Sergios; Polychronopoulos, Dimitris; Almirantis, Yannis; Pavlopoulos, John; Baskiotis, Nicolas; Gallinari, Patrick; Artiéres, Thierry; Ngomo, Axel-Cyrille Ngonga; Heino, Norman; Gaussier, Eric; Barrio-Alvers, Liliana; Schroeder, Michael; Androutsopoulos, Ion; Paliouras, Georgios
2015-04-30
This article provides an overview of the first BIOASQ challenge, a competition on large-scale biomedical semantic indexing and question answering (QA), which took place between March and September 2013. BIOASQ assesses the ability of systems to semantically index very large numbers of biomedical scientific articles, and to return concise and user-understandable answers to given natural language questions by combining information from biomedical articles and ontologies. The 2013 BIOASQ competition comprised two tasks, Task 1a and Task 1b. In Task 1a participants were asked to automatically annotate new PUBMED documents with MESH headings. Twelve teams participated in Task 1a, with a total of 46 system runs submitted, and one of the teams performing consistently better than the MTI indexer used by NLM to suggest MESH headings to curators. Task 1b used benchmark datasets containing 29 development and 282 test English questions, along with gold standard (reference) answers, prepared by a team of biomedical experts from around Europe, and participants had to automatically produce answers. Three teams participated in Task 1b, with 11 system runs. The BIOASQ infrastructure, including benchmark datasets, evaluation mechanisms, and the results of the participants and baseline methods, is publicly available. A publicly available evaluation infrastructure for biomedical semantic indexing and QA has been developed, which includes benchmark datasets, and can be used to evaluate systems that: assign MESH headings to published articles or to English questions; retrieve relevant RDF triples from ontologies, relevant articles and snippets from PUBMED Central; produce "exact" and paragraph-sized "ideal" answers (summaries). The results of the systems that participated in the 2013 BIOASQ competition are promising. In Task 1a one of the systems performed consistently better than the NLM's MTI indexer. 
In Task 1b the systems received high scores in the manual evaluation of the "ideal" answers; hence, they produced high quality summaries as answers. Overall, BIOASQ helped obtain a unified view of how techniques from text classification, semantic indexing, document and passage retrieval, question answering, and text summarization can be combined to allow biomedical experts to obtain concise, user-understandable answers to questions reflecting their real information needs.
MalaCards: an integrated compendium for diseases and their annotation
Rappaport, Noa; Nativ, Noam; Stelzer, Gil; Twik, Michal; Guan-Golan, Yaron; Iny Stein, Tsippi; Bahir, Iris; Belinky, Frida; Morrey, C. Paul; Safran, Marilyn; Lancet, Doron
2013-01-01
Comprehensive disease classification, integration and annotation are crucial for biomedical discovery. At present, disease compilation is incomplete, heterogeneous and often lacking systematic inquiry mechanisms. We introduce MalaCards, an integrated database of human maladies and their annotations, modeled on the architecture and strategy of the GeneCards database of human genes. MalaCards mines and merges 44 data sources to generate a computerized card for each of 16 919 human diseases. Each MalaCard contains disease-specific prioritized annotations, as well as inter-disease connections, empowered by the GeneCards relational database, its searches and GeneDecks set analyses. First, we generate a disease list from 15 ranked sources, using disease-name unification heuristics. Next, we use four schemes to populate MalaCards sections: (i) directly interrogating disease resources, to establish integrated disease names, synonyms, summaries, drugs/therapeutics, clinical features, genetic tests and anatomical context; (ii) searching GeneCards for related publications, and for associated genes with corresponding relevance scores; (iii) analyzing disease-associated gene sets in GeneDecks to yield affiliated pathways, phenotypes, compounds and GO terms, sorted by a composite relevance score and presented with GeneCards links; and (iv) searching within MalaCards itself, e.g. for additional related diseases and anatomical context. The latter forms the basis for the construction of a disease network, based on shared MalaCards annotations, embodying associations based on etiology, clinical features and clinical conditions. This broadly disposed network has a power-law degree distribution, suggesting that this might be an inherent property of such networks. Work in progress includes hierarchical malady classification, ontological mapping and disease set analyses, striving to make MalaCards an even more effective tool for biomedical research. 
Database URL: http://www.malacards.org/ PMID:23584832
Targeted gene knock-in by CRISPR/Cas ribonucleoproteins in porcine zygotes
USDA-ARS's Scientific Manuscript database
The domestic pig is an important “dual purpose” animal model for agricultural and biomedical applications. There is an emerging consensus in the biomedical community that even though mouse is a powerhouse genetic model, there is a requirement for large animal models such as pigs that can either ser...
Kim, Seongsoon; Park, Donghyeon; Choi, Yonghwa; Lee, Kyubum; Kim, Byounggun; Jeon, Minji; Kim, Jihye; Tan, Aik Choon; Kang, Jaewoo
2018-01-05
With the development of artificial intelligence (AI) technology centered on deep-learning, the computer has evolved to a point where it can read a given text and answer a question based on the context of the text. Such a specific task is known as the task of machine comprehension. Existing machine comprehension tasks mostly use datasets of general texts, such as news articles or elementary school-level storybooks. However, no attempt has been made to determine whether an up-to-date deep learning-based machine comprehension model can also process scientific literature containing expert-level knowledge, especially in the biomedical domain. This study aims to investigate whether a machine comprehension model can process biomedical articles as well as general texts. Since there is no dataset for the biomedical literature comprehension task, our work includes generating a large-scale question answering dataset using PubMed and manually evaluating the generated dataset. We present an attention-based deep neural model tailored to the biomedical domain. To further enhance the performance of our model, we used a pretrained word vector and biomedical entity type embedding. We also developed an ensemble method of combining the results of several independent models to reduce the variance of the answers from the models. The experimental results showed that our proposed deep neural network model outperformed the baseline model by more than 7% on the new dataset. We also evaluated human performance on the new dataset. The human evaluation result showed that our deep neural model outperformed humans in comprehension by 22% on average. In this work, we introduced a new task of machine comprehension in the biomedical domain using a deep neural model. 
Since there was no large-scale dataset for training deep neural models in the biomedical domain, we created the new cloze-style datasets Biomedical Knowledge Comprehension Title (BMKC_T) and Biomedical Knowledge Comprehension Last Sentence (BMKC_LS) (together referred to as BioMedical Knowledge Comprehension) using the PubMed corpus. The experimental results showed that the performance of our model is much higher than that of humans. We observed that our model performed consistently better regardless of the degree of difficulty of a text, whereas humans have difficulty when performing biomedical literature comprehension tasks that require expert level knowledge. ©Seongsoon Kim, Donghyeon Park, Yonghwa Choi, Kyubum Lee, Byounggun Kim, Minji Jeon, Jihye Kim, Aik Choon Tan, Jaewoo Kang. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 05.01.2018.
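The cloze construction behind BMKC_T can be sketched as masking a recognized biomedical entity in a title, with the abstract as reading context. A simplified illustration (the fixed entity list stands in for the actual entity recognition step, and the title and abstract are invented):

```python
# Stand-in for a biomedical NER step; the real pipeline recognizes
# entities automatically rather than from a fixed list.
ENTITIES = {"BRCA1", "cisplatin", "apoptosis"}

def make_cloze(title, context):
    """Mask the first recognized entity in the title to form a
    (question, answer, context) cloze instance; None if no entity."""
    for token in title.split():
        word = token.strip(".,;:")
        if word in ENTITIES:
            question = title.replace(word, "_____", 1)
            return {"question": question, "answer": word, "context": context}
    return None

sample = make_cloze(
    "BRCA1 mutations sensitize tumor cells to cisplatin.",
    "We studied DNA-repair deficiency in ovarian cancer cell lines.",
)
print(sample["question"], "->", sample["answer"])
```

A model is then scored on how often it recovers the masked entity from the context, which is the comprehension measure compared against human readers in the abstract.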
Building an Ontology-driven Database for Clinical Immune Research
Ma, Jingming
2006-01-01
Clinical research on immune responses usually generates a huge amount of biomedical testing data over a certain period of time. User-friendly data management systems based on relational databases will help immunologists/clinicians to fully manage these data. On the other hand, the same biological assays, such as ELISPOT and flow cytometric assays, are involved in immunological experiments regardless of study purpose. The reuse of biological knowledge is one of the driving forces behind ontology-driven data management. Therefore, an ontology-driven database will help to handle different clinical immune research studies and help immunologists/clinicians easily understand each other's immunological data. We discuss an outline for building an ontology-driven data management system for clinical immune research (ODMim). PMID:17238637
Outreach and online training services at the Saccharomyces Genome Database.
MacPherson, Kevin A; Starr, Barry; Wong, Edith D; Dalusag, Kyla S; Hellerstedt, Sage T; Lang, Olivia W; Nash, Robert S; Skrzypek, Marek S; Engel, Stacia R; Cherry, J Michael
2017-01-01
The Saccharomyces Genome Database (SGD; www.yeastgenome.org ), the primary genetics and genomics resource for the budding yeast S. cerevisiae , provides free public access to expertly curated information about the yeast genome and its gene products. As the central hub for the yeast research community, SGD engages in a variety of social outreach efforts to inform our users about new developments, promote collaboration, increase public awareness of the importance of yeast to biomedical research, and facilitate scientific discovery. Here we describe these various outreach methods, from networking at scientific conferences to the use of online media such as blog posts and webinars, and include our perspectives on the benefits provided by outreach activities for model organism databases. http://www.yeastgenome.org. © The Author(s) 2017. Published by Oxford University Press.
Taiwan Biobank: making cross-database convergence possible in the Big Data era
Lin, Jui-Chu; Fan, Chien-Te; Liao, Chia-Cheng; Chen, Yao-Sheng
2018-01-01
Abstract The Taiwan Biobank (TWB) is a biomedical research database of biopsy data from 200 000 participants. Access to this database has been granted to research communities taking part in the development of precision medicines; however, this has raised issues surrounding TWB’s access to electronic medical records (EMRs). The Personal Data Protection Act of Taiwan restricts access to EMRs for purposes not covered by patients’ original consent. This commentary explores possible legal solutions to help ensure that the access TWB has to EMR abides with legal obligations, and with governance frameworks associated with ethical, legal, and social implications. We suggest utilizing “hash function” algorithms to create nonretrospective, anonymized data for the purpose of cross-transmission and/or linkage with EMR. PMID:29149267
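The hash-function approach suggested in this commentary can be sketched with a keyed one-way digest: both data holders apply the same keyed hash to the identifier, so the resulting tokens match across databases and support record linkage, yet the original ID cannot be recovered. A minimal sketch (the key handling and ID format are hypothetical; a plain unkeyed hash would be weaker, since the small national-ID space invites dictionary attacks):

```python
import hashlib
import hmac

# Hypothetical linkage key; in practice it would be held by a trusted
# third party and never stored alongside the pseudonymized data.
SECRET_KEY = b"held-by-a-trusted-third-party"

def pseudonym(national_id: str) -> str:
    """Keyed SHA-256 (HMAC) of an identifier. Deterministic, so the
    same person yields the same token in TWB and EMR extracts, but the
    ID cannot be recovered or brute-forced without the key."""
    return hmac.new(SECRET_KEY, national_id.encode(), hashlib.sha256).hexdigest()

twb_token = pseudonym("A123456789")  # invented ID format
emr_token = pseudonym("A123456789")
print(twb_token == emr_token)  # identical tokens permit cross-linkage
```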
ChemBank: a small-molecule screening and cheminformatics resource database.
Seiler, Kathleen Petri; George, Gregory A; Happ, Mary Pat; Bodycombe, Nicole E; Carrinski, Hyman A; Norton, Stephanie; Brudz, Steve; Sullivan, John P; Muhlich, Jeremy; Serrano, Martin; Ferraiolo, Paul; Tolliday, Nicola J; Schreiber, Stuart L; Clemons, Paul A
2008-01-01
ChemBank (http://chembank.broad.harvard.edu/) is a public, web-based informatics environment developed through a collaboration between the Chemical Biology Program and Platform at the Broad Institute of Harvard and MIT. This knowledge environment includes freely available data derived from small molecules and small-molecule screens and resources for studying these data. ChemBank is unique among small-molecule databases in its dedication to the storage of raw screening data, its rigorous definition of screening experiments in terms of statistical hypothesis testing, and its metadata-based organization of screening experiments into projects involving collections of related assays. ChemBank stores an increasingly varied set of measurements derived from cells and other biological assay systems treated with small molecules. Analysis tools are available and are continuously being developed that allow the relationships between small molecules, cell measurements, and cell states to be studied. Currently, ChemBank stores information on hundreds of thousands of small molecules and hundreds of biomedically relevant assays that have been performed at the Broad Institute by collaborators from the worldwide research community. The goal of ChemBank is to provide life scientists unfettered access to biomedically relevant data and tools heretofore available primarily in the private sector.
Concept annotation in the CRAFT corpus.
Bada, Michael; Eckert, Miriam; Evans, Donald; Garcia, Kristin; Shipley, Krista; Sitnikov, Dmitry; Baumgartner, William A; Cohen, K Bretonnel; Verspoor, Karin; Blake, Judith A; Hunter, Lawrence E
2012-07-09
Manually annotated corpora are critical for the training and evaluation of automated methods to identify concepts in biomedical text. This paper presents the concept annotations of the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-length, open-access biomedical journal articles that have been annotated both semantically and syntactically to serve as a research resource for the biomedical natural-language-processing (NLP) community. CRAFT identifies all mentions of nearly all concepts from nine prominent biomedical ontologies and terminologies: the Cell Type Ontology, the Chemical Entities of Biological Interest ontology, the NCBI Taxonomy, the Protein Ontology, the Sequence Ontology, the entries of the Entrez Gene database, and the three subontologies of the Gene Ontology. The first public release includes the annotations for 67 of the 97 articles, reserving two sets of 15 articles for future text-mining competitions (after which these too will be released). Concept annotations were created based on a single set of guidelines, which has enabled us to achieve consistently high interannotator agreement. As the initial 67-article release contains more than 560,000 tokens (and the full set more than 790,000 tokens), our corpus is among the largest gold-standard annotated biomedical corpora. Unlike most others, the journal articles that comprise the corpus are drawn from diverse biomedical disciplines and are marked up in their entirety. Additionally, with a concept-annotation count of nearly 100,000 in the 67-article subset (and more than 140,000 in the full collection), the scale of conceptual markup is also among the largest of comparable corpora. The concept annotations of the CRAFT Corpus have the potential to significantly advance biomedical text mining by providing a high-quality gold standard for NLP systems. 
The corpus, annotation guidelines, and other associated resources are freely available at http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml.
Pharmacovigilance and Biomedical Informatics: A Model for Future Development.
Beninger, Paul; Ibara, Michael A
2016-12-01
The discipline of pharmacovigilance is rooted in the aftermath of the thalidomide tragedy of 1961. It has evolved as a result of collaborative efforts by many individuals and organizations, including physicians, patients, Health Authorities, universities, industry, the World Health Organization, the Council for International Organizations of Medical Sciences, and the International Conference on Harmonisation. Biomedical informatics is rooted in technologically based methodologies and has evolved at the speed of computer technology. The purpose of this review is to bring a novel lens to pharmacovigilance, looking at the evolution and development of the field of pharmacovigilance from the perspective of biomedical informatics, with the explicit goal of providing a foundation for discussion of the future direction of pharmacovigilance as a discipline. For this review, we searched [publication trend for the log10 value of the numbers of publications identified in PubMed] using the key words [informatics (INF), pharmacovigilance (PV), pharmacovigilance + informatics (PV + INF)], for [study types] articles published between [1994-2015]. We manually searched the reference lists of identified articles for additional information. Biomedical informatics has made significant contributions to the infrastructural development of pharmacovigilance. However, there has not otherwise been a systematic assessment of the role of biomedical informatics in enhancing the field of pharmacovigilance, and there has been little cross-discipline scholarship. Rapidly developing innovations in biomedical informatics pose a challenge to pharmacovigilance in finding ways to include new sources of safety information, including social media, massively linked databases, and mobile and wearable wellness applications and sensors. With biomedical informatics as a lens, it is evident that certain aspects of pharmacovigilance are evolving more slowly. 
However, the high levels of mutual interest in both fields and intense global and economic external pressures offer opportunities for a future of closer collaboration. Copyright © 2016 Elsevier HS Journals, Inc. All rights reserved.
Concept annotation in the CRAFT corpus
2012-01-01
Background Manually annotated corpora are critical for the training and evaluation of automated methods to identify concepts in biomedical text. Results This paper presents the concept annotations of the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-length, open-access biomedical journal articles that have been annotated both semantically and syntactically to serve as a research resource for the biomedical natural-language-processing (NLP) community. CRAFT identifies all mentions of nearly all concepts from nine prominent biomedical ontologies and terminologies: the Cell Type Ontology, the Chemical Entities of Biological Interest ontology, the NCBI Taxonomy, the Protein Ontology, the Sequence Ontology, the entries of the Entrez Gene database, and the three subontologies of the Gene Ontology. The first public release includes the annotations for 67 of the 97 articles, reserving two sets of 15 articles for future text-mining competitions (after which these too will be released). Concept annotations were created based on a single set of guidelines, which has enabled us to achieve consistently high interannotator agreement. Conclusions As the initial 67-article release contains more than 560,000 tokens (and the full set more than 790,000 tokens), our corpus is among the largest gold-standard annotated biomedical corpora. Unlike most others, the journal articles that comprise the corpus are drawn from diverse biomedical disciplines and are marked up in their entirety. Additionally, with a concept-annotation count of nearly 100,000 in the 67-article subset (and more than 140,000 in the full collection), the scale of conceptual markup is also among the largest of comparable corpora. The concept annotations of the CRAFT Corpus have the potential to significantly advance biomedical text mining by providing a high-quality gold standard for NLP systems. 
The corpus, annotation guidelines, and other associated resources are freely available at http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml. PMID:22776079
Use of controlled vocabularies to improve biomedical information retrieval tasks.
Pasche, Emilie; Gobeill, Julien; Vishnyakova, Dina; Ruch, Patrick; Lovis, Christian
2013-01-01
The high heterogeneity of biomedical vocabulary is a major obstacle for information retrieval in large biomedical collections. Therefore, using biomedical controlled vocabularies is crucial for managing these contents. We investigate the impact of query expansion based on controlled vocabularies to improve the effectiveness of two search engines. Our strategy relies on the enrichment of users' queries with additional terms, directly derived from such vocabularies applied to infectious diseases and chemical patents. We observed that query expansion based on pathogen names resulted in improvements of the top-precision of our first search engine, while the normalization of diseases degraded the top-precision. The expansion of chemical entities, which was performed on the second search engine, positively affected the mean average precision. We have shown that query expansion of some types of biomedical entities has great potential to improve search effectiveness; therefore, fine-tuning of query expansion strategies could help improve the performance of search engines.
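The expansion strategy described above can be illustrated with a minimal sketch: a user's query terms are enriched with synonyms drawn from a controlled vocabulary before being sent to the search engine. The tiny synonym table below is invented for illustration, not taken from the paper's actual vocabularies.

```python
# Illustrative controlled-vocabulary synonym table (hypothetical entries).
SYNONYMS = {
    "mrsa": ["methicillin-resistant Staphylococcus aureus", "Staphylococcus aureus"],
    "flu": ["influenza"],
}

def expand_query(query: str) -> list:
    """Return the original query terms plus any controlled-vocabulary synonyms."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        expanded.extend(SYNONYMS.get(term, []))
    return expanded

print(expand_query("mrsa outbreak"))
# → ['mrsa', 'outbreak', 'methicillin-resistant Staphylococcus aureus', 'Staphylococcus aureus']
```

The paper's finding that pathogen-name expansion helped while disease normalization hurt suggests that which entity types to expand is itself a tuning decision.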
BioStar models of clinical and genomic data for biomedical data warehouse design
Wang, Liangjiang; Ramanathan, Murali
2008-01-01
Biomedical research is now generating large amounts of data, ranging from clinical test results to microarray gene expression profiles. The scale and complexity of these datasets give rise to substantial challenges in data management and analysis. It is highly desirable that data warehousing and online analytical processing technologies can be applied to biomedical data integration and mining. The major difficulty probably lies in the task of capturing and modelling diverse biological objects and their complex relationships. This paper describes multidimensional data modelling for biomedical data warehouse design. Since the conventional models such as star schema appear to be insufficient for modelling clinical and genomic data, we develop a new model called BioStar schema. The new model can capture the rich semantics of biomedical data and provide greater extensibility for the fast evolution of biological research methodologies. PMID:18048122
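For context, the conventional star schema that the BioStar model extends can be sketched as one central fact table surrounded by dimension tables. The table and column names below are hypothetical; the BioStar schema itself is not detailed in the abstract.

```python
import sqlite3

# A conventional star schema (the baseline BioStar extends): a fact table
# of expression measurements joined to dimension tables. Names are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_patient (patient_id INTEGER PRIMARY KEY, sex TEXT, age INTEGER);
CREATE TABLE dim_gene    (gene_id INTEGER PRIMARY KEY, symbol TEXT);
CREATE TABLE fact_expression (
    patient_id INTEGER REFERENCES dim_patient(patient_id),
    gene_id    INTEGER REFERENCES dim_gene(gene_id),
    expression REAL
);
INSERT INTO dim_patient VALUES (1, 'F', 54);
INSERT INTO dim_gene    VALUES (10, 'TP53');
INSERT INTO fact_expression VALUES (1, 10, 7.2);
""")
# A typical OLAP-style query joins the fact table to its dimensions.
row = conn.execute("""
    SELECT p.sex, g.symbol, f.expression
    FROM fact_expression f
    JOIN dim_patient p USING (patient_id)
    JOIN dim_gene g USING (gene_id)
""").fetchone()
print(row)  # → ('F', 'TP53', 7.2)
```

The paper's argument is that this flat fact/dimension split is too rigid for evolving biological objects and relationships, motivating the richer BioStar schema.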
Stewart, Robert; Soremekun, Mishael; Perera, Gayan; Broadbent, Matthew; Callard, Felicity; Denis, Mike; Hotopf, Matthew; Thornicroft, Graham; Lovestone, Simon
2009-08-12
Case registers have been used extensively in mental health research. Recent developments in electronic medical records, and in computer software to search and analyse these in anonymised format, have the potential to revolutionise this research tool. We describe the development of the South London and Maudsley NHS Foundation Trust (SLAM) Biomedical Research Centre (BRC) Case Register Interactive Search tool (CRIS), which allows research-accessible datasets to be derived from SLAM, the largest provider of secondary mental healthcare in Europe. All clinical data, including free text, are available for analysis in the form of anonymised datasets. Development involved both the building of the system and setting in place the necessary security (with both functional and procedural elements). Descriptive data are presented for the Register database as of October 2008. The database at that point included 122,440 cases, 35,396 of whom were receiving active case management under the Care Programme Approach. In terms of gender and ethnicity, the database was reasonably representative of the source population. The most common assigned primary diagnoses were within the ICD mood disorders (n = 12,756) category, followed by schizophrenia and related disorders (8158), substance misuse (7749), neuroses (7105) and organic disorders (6414). The SLAM BRC Case Register represents a 'new generation' of this research design, built on a long-running system of fully electronic clinical records and allowing in-depth secondary analysis of numerical, string, and free text data, whilst preserving anonymity through technical and procedural safeguards.
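A purely illustrative sketch of the kind of technical de-identification step such a pipeline needs before free text becomes research-accessible: known identifiers are replaced with placeholder tokens. Real systems, as the paper notes, combine technical and procedural safeguards; this toy function only masks a supplied patient name and NHS-number-shaped digit runs.

```python
import re

def pseudonymise(text: str, patient_name: str) -> str:
    """Mask a known patient name and NHS-number-like identifiers (toy example)."""
    text = re.sub(re.escape(patient_name), "[PATIENT]", text, flags=re.IGNORECASE)
    text = re.sub(r"\b\d{3}[ -]?\d{3}[ -]?\d{4}\b", "[ID]", text)  # 3-3-4 digit shape
    return text

print(pseudonymise("John Smith (NHS 943 476 5919) attended today.", "John Smith"))
# → [PATIENT] (NHS [ID]) attended today.
```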
VarioML framework for comprehensive variation data representation and exchange.
Byrne, Myles; Fokkema, Ivo Fac; Lancaster, Owen; Adamusiak, Tomasz; Ahonen-Bishopp, Anni; Atlan, David; Béroud, Christophe; Cornell, Michael; Dalgleish, Raymond; Devereau, Andrew; Patrinos, George P; Swertz, Morris A; Taschner, Peter Em; Thorisson, Gudmundur A; Vihinen, Mauno; Brookes, Anthony J; Muilu, Juha
2012-10-03
Sharing of data about variation and the associated phenotypes is a critical need, yet variant information can be arbitrarily complex, making a single standard vocabulary elusive and re-formatting difficult. Complex standards have proven too time-consuming to implement. The GEN2PHEN project addressed these difficulties by developing a comprehensive data model for capturing biomedical observations, Observ-OM, and building the VarioML format around it. VarioML pairs a simplified open specification for describing variants, with a toolkit for adapting the specification into one's own research workflow. Straightforward variant data can be captured, federated, and exchanged with no overhead; more complex data can be described, without loss of compatibility. The open specification enables push-button submission to gene variant databases (LSDBs) e.g., the Leiden Open Variation Database, using the Cafe Variome data publishing service, while VarioML bidirectionally transforms data between XML and web-application code formats, opening up new possibilities for open source web applications building on shared data. A Java implementation toolkit makes VarioML easily integrated into biomedical applications. VarioML is designed primarily for LSDB data submission and transfer scenarios, but can also be used as a standard variation data format for JSON and XML document databases and user interface components. VarioML is a set of tools and practices improving the availability, quality, and comprehensibility of human variation information. It enables researchers, diagnostic laboratories, and clinics to share that information with ease, clarity, and without ambiguity.
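The bidirectional XML/web-application transformation described above can be sketched with a toy variant record round-tripped between a JSON-style dictionary and XML. The element names below are hypothetical, not the actual VarioML schema.

```python
import json
import xml.etree.ElementTree as ET

# A toy variant record in JSON-style form (field names are invented).
variant = {"gene": "LDLR", "name": "c.28G>A", "pathogenicity": "Pathogenic"}

def to_xml(v: dict) -> str:
    """Serialise a flat variant record to an XML string."""
    root = ET.Element("variant")
    for key, value in v.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")

def from_xml(xml_text: str) -> dict:
    """Recover the flat record from its XML form."""
    return {child.tag: child.text for child in ET.fromstring(xml_text)}

xml_text = to_xml(variant)
assert from_xml(xml_text) == variant  # lossless round trip
print(json.dumps(from_xml(xml_text)))
```

The point of such a round trip is the one VarioML makes: straightforward variant data moves between document formats with no overhead, while richer records can be layered on without breaking compatibility.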
VarioML framework for comprehensive variation data representation and exchange
2012-01-01
Background Sharing of data about variation and the associated phenotypes is a critical need, yet variant information can be arbitrarily complex, making a single standard vocabulary elusive and re-formatting difficult. Complex standards have proven too time-consuming to implement. Results The GEN2PHEN project addressed these difficulties by developing a comprehensive data model for capturing biomedical observations, Observ-OM, and building the VarioML format around it. VarioML pairs a simplified open specification for describing variants, with a toolkit for adapting the specification into one's own research workflow. Straightforward variant data can be captured, federated, and exchanged with no overhead; more complex data can be described, without loss of compatibility. The open specification enables push-button submission to gene variant databases (LSDBs) e.g., the Leiden Open Variation Database, using the Cafe Variome data publishing service, while VarioML bidirectionally transforms data between XML and web-application code formats, opening up new possibilities for open source web applications building on shared data. A Java implementation toolkit makes VarioML easily integrated into biomedical applications. VarioML is designed primarily for LSDB data submission and transfer scenarios, but can also be used as a standard variation data format for JSON and XML document databases and user interface components. Conclusions VarioML is a set of tools and practices improving the availability, quality, and comprehensibility of human variation information. It enables researchers, diagnostic laboratories, and clinics to share that information with ease, clarity, and without ambiguity. PMID:23031277
Xenbase: Core features, data acquisition, and data processing.
James-Zorn, Christina; Ponferrada, Virgillio G; Burns, Kevin A; Fortriede, Joshua D; Lotay, Vaneet S; Liu, Yu; Brad Karpinka, J; Karimi, Kamran; Zorn, Aaron M; Vize, Peter D
2015-08-01
Xenbase, the Xenopus model organism database (www.xenbase.org), is a cloud-based, web-accessible resource that integrates the diverse genomic and biological data from Xenopus research. Xenopus frogs are one of the major vertebrate animal models used for biomedical research, and Xenbase is the central repository for the enormous amount of data generated using this model tetrapod. The goal of Xenbase is to accelerate discovery by enabling investigators to make novel connections between molecular pathways in Xenopus and human disease. Our relational database and user-friendly interface make these data easy to query and allows investigators to quickly interrogate and link different data types in ways that would otherwise be difficult, time consuming, or impossible. Xenbase also enhances the value of these data through high-quality gene expression curation and data integration, by providing bioinformatics tools optimized for Xenopus experiments, and by linking Xenopus data to other model organisms and to human data. Xenbase draws in data via pipelines that download data, parse the content, and save them into appropriate files and database tables. Furthermore, Xenbase makes these data accessible to the broader biomedical community by continually providing annotated data updates to organizations such as NCBI, UniProtKB, and Ensembl. Here, we describe our bioinformatics, genome-browsing tools, data acquisition and sharing, our community submitted and literature curation pipelines, text-mining support, gene page features, and the curation of gene nomenclature and gene models. © 2015 Wiley Periodicals, Inc.
Telescience Support Center Data System Software
NASA Technical Reports Server (NTRS)
Rahman, Hasan
2010-01-01
The Telescience Support Center (TSC) team has developed a database-driven, increment-specific Data Requirement Document (DRD) generation tool that automates much of the work required for generating and formatting the DRD. It creates a database to load the required changes to configure the TSC data system, thus eliminating a substantial amount of labor in database entry and formatting. The TSC database contains the TSC systems configuration, along with the experimental data, in which human physiological data must be de-commutated in real time. The data for each experiment also must be cataloged and archived for future retrieval. TSC software provides tools and resources for ground operation and data distribution to remote users consisting of PIs (principal investigators), biomedical engineers, scientists, engineers, payload specialists, and computer scientists. Operations support is provided for computer systems access, detailed networking, and mathematical and computational problems of the International Space Station telemetry data. User training is provided for on-site staff, biomedical researchers, and other remote personnel in the usage of the space-bound services via the Internet, which enables significant resource savings for the physical facility along with the time savings versus traveling to NASA sites. The software used in support of the TSC could easily be adapted to other control center applications. This would include not only other NASA payload monitoring facilities, but also other types of control activities, such as monitoring and control of the electric grid, chemical or nuclear plant processes, air traffic control, and the like.
Biomedical informatics training at the University of Wisconsin-Madison.
Severtson, D J; Pape, L; Page, C D; Shavlik, J W; Phillips, G N; Flatley Brennan, P
2007-01-01
The purpose of this paper is to describe biomedical informatics training at the University of Wisconsin-Madison (UW-Madison). We reviewed biomedical informatics training, research, and faculty/trainee participation at UW-Madison. There are three primary approaches to training: 1) the Computation & Informatics in Biology & Medicine Training Program, 2) formal biomedical informatics training offered by various campus departments, and 3) individualized programs. Training at UW-Madison embodies the features of effective biomedical informatics training recommended by the American College of Medical Informatics, which were delineated as: 1) curricula that integrate experiences among computational sciences and application domains, 2) individualized and interdisciplinary cross-training among a diverse cadre of trainees to develop key competencies that each trainee does not initially possess, 3) participation in research and development activities, and 4) exposure to a range of basic informational and computational sciences. The three biomedical informatics training approaches immerse students in multidisciplinary training and education that is supported by faculty trainers who participate in collaborative research across departments. Training is provided across a range of disciplines and is available at different training stages. Biomedical informatics training at UW-Madison illustrates how a large research university, with multiple departments across biological, computational, and health fields, can provide effective and productive biomedical informatics training via multiple training approaches.
Curation accuracy of model organism databases
Keseler, Ingrid M.; Skrzypek, Marek; Weerasinghe, Deepika; Chen, Albert Y.; Fulcher, Carol; Li, Gene-Wei; Lemmer, Kimberly C.; Mladinich, Katherine M.; Chow, Edmond D.; Sherlock, Gavin; Karp, Peter D.
2014-01-01
Manual extraction of information from the biomedical literature—or biocuration—is the central methodology used to construct many biological databases. For example, the UniProt protein database, the EcoCyc Escherichia coli database and the Candida Genome Database (CGD) are all based on biocuration. Biological databases are used extensively by life science researchers, as online encyclopedias, as aids in the interpretation of new experimental data and as gold standards for the development of new bioinformatics algorithms. Although manual curation has been assumed to be highly accurate, we are aware of only one previous study of biocuration accuracy. We assessed the accuracy of EcoCyc and CGD by manually selecting curated assertions within randomly chosen EcoCyc and CGD gene pages and by then validating that the data found in the referenced publications supported those assertions. A database assertion is considered to be in error if that assertion could not be found in the publication cited for that assertion. We identified 10 errors in the 633 facts that we validated across the two databases, for an overall error rate of 1.58%, and individual error rates of 1.82% for CGD and 1.40% for EcoCyc. These data suggest that manual curation of the experimental literature by Ph.D.-level scientists is highly accurate. Database URL: http://ecocyc.org/, http://www.candidagenome.org// PMID:24923819
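The headline figure above is a straightforward ratio and can be recomputed directly. The per-database error/assertion splits are not given in the abstract, so only the overall rate is checked here.

```python
# Recompute the overall curation error rate: 10 errors found among
# 633 validated database assertions.
errors, assertions = 10, 633
error_rate = 100 * errors / assertions
print(f"{error_rate:.2f}%")  # → 1.58%, matching the reported overall rate
```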
The Protein Disease Database of human body fluids: II. Computer methods and data issues.
Lemkin, P F; Orr, G A; Goldstein, M P; Creed, G J; Myrick, J E; Merril, C R
1995-01-01
The Protein Disease Database (PDD) is a relational database of proteins and diseases. With this database it is possible to screen for quantitative protein abnormalities associated with disease states. These quantitative relationships use data drawn from the peer-reviewed biomedical literature. Assays may also include those observed in high-resolution electrophoretic gels that offer the potential to quantitate many proteins in a single test as well as data gathered by enzymatic or immunologic assays. We are using the Internet World Wide Web (WWW) and the Web browser paradigm as an access method for wide distribution and querying of the Protein Disease Database. The WWW hypertext transfer protocol and its Common Gateway Interface make it possible to build powerful graphical user interfaces that can support easy-to-use data retrieval using query specification forms or images. The details of these interactions are totally transparent to the users of these forms. Using a client-server SQL relational database, user query access, initial data entry and database maintenance are all performed over the Internet with a Web browser. We discuss the underlying design issues, mapping mechanisms and assumptions that we used in constructing the system, data entry, access to the database server, security, and synthesis of derived two-dimensional gel image maps and hypertext documents resulting from SQL database searches.
miRiaD: A Text Mining Tool for Detecting Associations of microRNAs with Diseases.
Gupta, Samir; Ross, Karen E; Tudor, Catalina O; Wu, Cathy H; Schmidt, Carl J; Vijay-Shanker, K
2016-04-29
MicroRNAs are increasingly being appreciated as critical players in human diseases, and questions concerning the role of microRNAs arise in many areas of biomedical research. There are several manually curated databases of microRNA-disease associations gathered from the biomedical literature; however, it is difficult for curators of these databases to keep up with the explosion of publications in the microRNA-disease field. Moreover, automated literature mining tools that assist manual curation of microRNA-disease associations currently capture only one microRNA property (expression) in the context of one disease (cancer). Thus, there is a clear need to develop more sophisticated automated literature mining tools that capture a variety of microRNA properties and relations in the context of multiple diseases to provide researchers with fast access to the most recent published information and to streamline and accelerate manual curation. We have developed miRiaD (microRNAs in association with Disease), a text-mining tool that automatically extracts associations between microRNAs and diseases from the literature. These associations are often not directly linked, and the intermediate relations are often highly informative for the biomedical researcher. Thus, miRiaD extracts the miR-disease pairs together with an explanation for their association. We also developed a procedure that assigns scores to sentences, marking their informativeness, based on the microRNA-disease relation observed within the sentence. miRiaD was applied to the entire Medline corpus, identifying 8301 PMIDs with miR-disease associations. These abstracts and the miR-disease associations are available for browsing at http://biotm.cis.udel.edu/miRiaD . We evaluated the recall and precision of miRiaD with respect to information of high interest to public microRNA-disease database curators (expression and target gene associations), obtaining a recall of 88.46-90.78. 
When we expanded the evaluation to include sentences with a wide range of microRNA-disease information that may be of interest to biomedical researchers, miRiaD also performed very well, with an F-score of 89.4. The informativeness ranking of sentences was evaluated in terms of nDCG (0.977) and correlation metrics (0.678-0.727) when compared to an annotator's ranked list. miRiaD, a high-performance system that can capture a wide variety of microRNA-disease related information, extends beyond the scope of existing microRNA-disease resources. It can be incorporated into manual curation pipelines and serve as a resource for biomedical researchers interested in the role of microRNAs in disease. In our ongoing work, we are developing an improved miRiaD web interface that will facilitate complex queries about microRNA-disease relationships, such as "In what diseases does microRNA regulation of apoptosis play a role?" or "Is there overlap in the sets of genes targeted by microRNAs in different types of dementia?"
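The nDCG metric used above to compare miRiaD's informativeness ranking against the annotator's can be sketched in a few lines. The relevance scores in the example are made up; the formula is the standard one (discounted cumulative gain normalised by the ideal ordering's gain).

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance scores."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(system_ranking):
    """DCG of the system's ordering divided by the DCG of the ideal ordering."""
    ideal = sorted(system_ranking, reverse=True)
    return dcg(system_ranking) / dcg(ideal)

print(round(ndcg([3, 2, 3, 0, 1]), 3))  # → 0.972
```

A perfectly ordered list scores 1.0; miRiaD's reported 0.977 means its sentence ranking is close to the annotator's ideal.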
Rising Expectations: Access to Biomedical Information
Lindberg, D. A. B.; Humphreys, B. L.
2008-01-01
Summary Objective To provide an overview of the expansion in public access to electronic biomedical information over the past two decades, with an emphasis on developments to which the U.S. National Library of Medicine contributed. Methods Review of the increasingly broad spectrum of web-accessible genomic data, biomedical literature, consumer health information, clinical trials data, and images. Results The amount of publicly available electronic biomedical information has increased dramatically over the past twenty years. Rising expectations regarding access to biomedical information were stimulated by the spread of the Internet, the World Wide Web, and advanced searching and linking techniques. These informatics advances simplified and improved access to electronic information and reduced costs, which enabled inter-organizational collaborations to build and maintain large international information resources and also aided outreach and education efforts. The demonstrated benefits of free access to electronic biomedical information encouraged the development of public policies that further increase the amount of information available. Conclusions Continuing rapid growth of publicly accessible electronic biomedical information presents tremendous opportunities and challenges, including the need to ensure uninterrupted access during disasters or emergencies and to manage digital resources so they remain available for future generations. PMID:18587496
National Space Biomedical Research Institute
NASA Technical Reports Server (NTRS)
1998-01-01
The National Space Biomedical Research Institute (NSBRI) sponsors and performs fundamental and applied space biomedical research with the mission of leading a world-class, national effort in integrated, critical path space biomedical research that supports NASA's Human Exploration and Development of Space (HEDS) Strategic Plan. It focuses on the enabling of long-term human presence in, development of, and exploration of space. This will be accomplished by: designing, implementing, and validating effective countermeasures to address the biological and environmental impediments to long-term human space flight; defining the molecular, cellular, organ-level, integrated responses and mechanistic relationships that ultimately determine these impediments, where such activity fosters the development of novel countermeasures; establishing biomedical support technologies to maximize human performance in space, reduce biomedical hazards to an acceptable level, and deliver quality medical care; transferring and disseminating the biomedical advances in knowledge and technology acquired through living and working in space to the benefit of mankind in space and on Earth, including the treatment of patients suffering from gravity- and radiation-related conditions on Earth; and ensuring open involvement of the scientific community, industry, and the public at large in the Institute's activities and fostering a robust collaboration with NASA, particularly through Johnson Space Center.
Concept recognition for extracting protein interaction relations from biomedical text
Baumgartner, William A; Lu, Zhiyong; Johnson, Helen L; Caporaso, J Gregory; Paquette, Jesse; Lindemann, Anna; White, Elizabeth K; Medvedeva, Olga; Cohen, K Bretonnel; Hunter, Lawrence
2008-01-01
Background: Reliable information extraction applications have been a long-sought goal of the biomedical text mining community, a goal that, if reached, would provide valuable tools to benchside biologists in their increasingly difficult task of assimilating the knowledge contained in the biomedical literature. We present an integrated approach to concept recognition in biomedical text. Concept recognition provides key information that has been largely missing from previous biomedical information extraction efforts, namely direct links to well defined knowledge resources that explicitly cement the concept's semantics. The BioCreative II tasks discussed in this special issue have provided a unique opportunity to demonstrate the effectiveness of concept recognition in the field of biomedical language processing. Results: Through the modular construction of a protein interaction relation extraction system, we present several use cases of concept recognition in biomedical text, and relate these use cases to potential uses by the benchside biologist. Conclusion: Current information extraction technologies are approaching performance standards at which concept recognition can begin to deliver high quality data to the benchside biologist. Our system is available as part of the BioCreative Meta-Server project and on the internet. PMID:18834500
Automated Database Mediation Using Ontological Metadata Mappings
Marenco, Luis; Wang, Rixin; Nadkarni, Prakash
2009-01-01
Objective To devise an automated approach for integrating federated database information using database ontologies constructed from their extended metadata. Background One challenge of database federation is that the granularity of representation of equivalent data varies across systems. Dealing effectively with this problem is analogous to dealing with precoordinated vs. postcoordinated concepts in biomedical ontologies. Model Description The authors describe an approach based on ontological metadata mapping rules defined with elements of a global vocabulary, which allows a query specified at one granularity level to fetch data, where possible, from databases within the federation that use different granularities. This is implemented in OntoMediator, a newly developed production component of our previously described Query Integrator System. OntoMediator's operation is illustrated with a query that accesses three geographically separate, interoperating databases. An example based on SNOMED also illustrates the applicability of high-level rules to support the enforcement of constraints that can prevent inappropriate curator or power-user actions. Summary A rule-based framework simplifies the design and maintenance of systems where categories of data must be mapped to each other, for the purpose of either cross-database query or for curation of the contents of compositional controlled vocabularies. PMID:19567801
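The granularity problem described above can be sketched with a toy rule table of the kind OntoMediator's ontological mappings enable: a query phrased with a global-vocabulary concept is rewritten, per database, to the granularity that database actually stores. The concept names and rules below are invented for illustration.

```python
# Hypothetical mapping rules: (global concept, target database) -> local concept.
# A coarse-grained database stores only the parent concept; a fine-grained
# one stores the concept as-is.
MAPPING_RULES = {
    ("pyramidal neuron", "db_coarse"): "neuron",
    ("pyramidal neuron", "db_fine"): "pyramidal neuron",
}

def rewrite_for(concept: str, database: str) -> str:
    """Map a global-vocabulary concept to a database's local granularity.

    Falls back to the concept unchanged when no rule applies.
    """
    return MAPPING_RULES.get((concept, database), concept)

print(rewrite_for("pyramidal neuron", "db_coarse"))  # → neuron
```

A single federated query can then fan out to each member database using its own rewritten term, which is the behaviour the paper illustrates across three interoperating databases.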
The 2018 Nucleic Acids Research database issue and the online molecular biology database collection.
Rigden, Daniel J; Fernández, Xosé M
2018-01-04
The 2018 Nucleic Acids Research Database Issue contains 181 papers spanning molecular biology. Among them, 82 are new and 84 are updates describing resources that appeared in the Issue previously. The remaining 15 cover databases most recently published elsewhere. Databases in the area of nucleic acids include 3DIV for visualisation of data on genome 3D structure and RNArchitecture, a hierarchical classification of RNA families. Protein databases include the established SMART, ELM and MEROPS while GPCRdb and the newcomer STCRDab cover families of biomedical interest. In the area of metabolism, HMDB and Reactome both report new features while PULDB appears in NAR for the first time. This issue also contains reports on genomics resources including Ensembl, the UCSC Genome Browser and ENCODE. Update papers from the IUPHAR/BPS Guide to Pharmacology and DrugBank are highlights of the drug and drug target section while a number of proteomics databases including proteomicsDB are also covered. The entire Database Issue is freely available online on the Nucleic Acids Research website (https://academic.oup.com/nar). The NAR online Molecular Biology Database Collection has been updated, reviewing 138 entries, adding 88 new resources and eliminating 47 discontinued URLs, bringing the current total to 1737 databases. It is available at http://www.oxfordjournals.org/nar/database/c/. © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.
Bio-mining for biomarkers with a multi-resolution block chain
NASA Astrophysics Data System (ADS)
Jenkins, Jeffrey; Kopf, Jarad; Tran, Binh Q.; Frenchi, Christopher; Szu, Harold
2015-05-01
In this paper, we discuss a framework for bridging the gap between security and medical Large Data Analysis (LDA) with functional biomarkers. Unsupervised learning for individual e-IQ and IQ, relying on memory eliciting (e.g. scent, grandmother images) and IQ baseline profiles, could further enhance the ability to uniquely identify and properly diagnose individuals. Sub-threshold changes in a common/probable biomedical biomarker (disorder) mean that an individual remains healthy, while a martingale would require further investigation and more measurements taken to determine credibility. Empirical measurements of human actions can discover anomalies hidden in data, which point to biomarkers revealed through stimulus response. We review the approach for forming a single-user baseline having 1-d devices and a scale-invariant representation for N users, each (i) having N*d(i) total devices. Such a fractal representation of human-centric data provides self-similar levels of information and relationships which are useful for diagnosis and identification of causality, anywhere from a mental disorder to a DNA match. Biomarkers from biomedical devices offer a robust way to collect data. Biometrics could be envisioned as enhanced and personalized biomedical devices (e.g. typing fist), but used for security. As long as the devices have a shared context origin, useful information can be found by coupling the sensors. In the case of the electroencephalogram (EEG), known patterns have emerged in low-frequency Delta Theta Alpha Beta-Gamma (DTAB-G) waves when an individual views a familiar picture; in the visual cortex this appears on EEGs as a sharp peak. Using brainwaves as a functional biomarker for security can lead the industry to create more secure sessions by allowing not only passwords but also visual stimuli and/or keystrokes coupled with EEG, to capture and stay informed about real-time user e-IQ/IQ data changes. This holistic Computer Science (CS) Knowledge Discovery in Databases and Data Mining (KDD, DM) approach seeks to merge the fields having a shared data origin: biomarkers revealed through stimulus response.
Hu, Hai; Brzeski, Henry; Hutchins, Joe; Ramaraj, Mohan; Qu, Long; Xiong, Richard; Kalathil, Surendran; Kato, Rand; Tenkillaya, Santhosh; Carney, Jerry; Redd, Rosann; Arkalgudvenkata, Sheshkumar; Shahzad, Kashif; Scott, Richard; Cheng, Hui; Meadow, Stephen; McMichael, John; Sheu, Shwu-Lin; Rosendale, David; Kvecher, Leonid; Ahern, Stephen; Yang, Song; Zhang, Yonghong; Jordan, Rick; Somiari, Stella B; Hooke, Jeffrey; Shriver, Craig D; Somiari, Richard I; Liebman, Michael N
2004-10-01
The Windber Research Institute is an integrated high-throughput research center employing clinical, genomic and proteomic platforms to produce terabyte levels of data. We use biomedical informatics technologies to integrate all of these operations. This report includes information on a multi-year, multi-phase hybrid data warehouse project currently under development in the Institute. The purpose of the warehouse is to host the terabyte-level of internal experimentally generated data as well as data from public sources. We have previously reported on the phase I development, which integrated limited internal data sources and selected public databases. Currently, we are completing phase II development, which integrates our internal automated data sources and develops visualization tools to query across these data types. This paper summarizes our clinical and experimental operations, the data warehouse development, and the challenges we have faced. In phase III we plan to federate additional manual internal and public data sources and then to develop and adapt more data analysis and mining tools. We expect that the final implementation of the data warehouse will greatly facilitate biomedical informatics research.
GOClonto: an ontological clustering approach for conceptualizing PubMed abstracts.
Zheng, Hai-Tao; Borchert, Charles; Kim, Hong-Gee
2010-02-01
Concurrent with progress in biomedical sciences, an overwhelming amount of textual knowledge is accumulating in the biomedical literature. PubMed is the most comprehensive database collecting and managing biomedical literature. To help researchers easily understand collections of PubMed abstracts, numerous clustering methods have been proposed to group similar abstracts based on their shared features. However, most of these methods do not explore the semantic relationships among groupings of documents, which could help better illuminate the groupings of PubMed abstracts. To address this issue, we proposed an ontological clustering method called GOClonto for conceptualizing PubMed abstracts. GOClonto uses latent semantic analysis (LSA) and gene ontology (GO) to identify key gene-related concepts and their relationships as well as allocate PubMed abstracts based on these key gene-related concepts. Based on two PubMed abstract collections, the experimental results show that GOClonto is able to identify key gene-related concepts and outperforms the STC (suffix tree clustering) algorithm, the Lingo algorithm, the Fuzzy Ants algorithm, and the clustering-based TRS (tolerance rough set) algorithm. Moreover, the two ontologies generated by GOClonto show significant informative conceptual structures.
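GOClonto's use of latent semantic analysis can be illustrated with a minimal sketch: a term-document matrix is reduced with a truncated SVD and documents are compared in the latent space. The tiny corpus and the plain-NumPy implementation here are illustrative assumptions, not the authors' actual system.

```python
import numpy as np

def lsa_vectors(docs, k=2):
    """Build a term-document count matrix and project documents into a
    k-dimensional latent semantic space via truncated SVD."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w], j] += 1.0
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Each column of diag(s_k) @ Vt_k is a document in latent space.
    return (np.diag(s[:k]) @ Vt[:k]).T

docs = [
    "p53 gene mutation in tumor cells",
    "tumor suppressor gene p53 mutation",
    "protein folding and triple helix structure",
    "collagen triple helix protein structure",
]
vecs = lsa_vectors(docs, k=2)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Documents sharing vocabulary end up close in the latent space,
# which is the basis for grouping abstracts by shared concepts.
assert cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2])
```

In the full method these latent dimensions are additionally aligned with Gene Ontology concepts; the sketch only shows the LSA step.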
Dunn, Adam G; Coiera, Enrico; Mandl, Kenneth D
2014-04-22
In 2014, the vast majority of published biomedical research is still hidden behind paywalls rather than open access. For more than a decade, similar restrictions over other digitally available content have engendered illegal activity. Music file sharing became rampant in the late 1990s as communities formed around new ways to share. The frequency and scale of cyber-attacks against commercial and government interests has increased dramatically. Massive troves of classified government documents have become public through the actions of a few. Yet we have not seen significant growth in the illegal sharing of peer-reviewed academic articles. Should we truly expect that biomedical publishing is somehow at less risk than other content-generating industries? What of the larger threat--a "Biblioleaks" event--a database breach and public leak of the substantial archives of biomedical literature? As the expectation that all research should be available to everyone becomes the norm for a younger generation of researchers and the broader community, the motivations for such a leak are likely to grow. We explore the feasibility and consequences of a Biblioleaks event for researchers, journals, publishers, and the broader communities of doctors and the patients they serve.
Biomedical discovery acceleration, with applications to craniofacial development.
Leach, Sonia M; Tipney, Hannah; Feng, Weiguo; Baumgartner, William A; Kasliwal, Priyanka; Schuyler, Ronald P; Williams, Trevor; Spritz, Richard A; Hunter, Lawrence
2009-03-01
The profusion of high-throughput instruments and the explosion of new results in the scientific literature, particularly in molecular biomedicine, is both a blessing and a curse to the bench researcher. Even knowledgeable and experienced scientists can benefit from computational tools that help navigate this vast and rapidly evolving terrain. In this paper, we describe a novel computational approach to this challenge, a knowledge-based system that combines reading, reasoning, and reporting methods to facilitate analysis of experimental data. Reading methods extract information from external resources, either by parsing structured data or using biomedical language processing to extract information from unstructured data, and track knowledge provenance. Reasoning methods enrich the knowledge that results from reading by, for example, noting two genes that are annotated to the same ontology term or database entry. Reasoning is also used to combine all sources into a knowledge network that represents the integration of all sorts of relationships between a pair of genes, and to calculate a combined reliability score. Reporting methods combine the knowledge network with a congruent network constructed from experimental data and visualize the combined network in a tool that facilitates the knowledge-based analysis of that data. An implementation of this approach, called the Hanalyzer, is demonstrated on a large-scale gene expression array dataset relevant to craniofacial development. The use of the tool was critical in the creation of hypotheses regarding the roles of four genes never previously characterized as involved in craniofacial development; each of these hypotheses was validated by further experimental work.
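The Hanalyzer abstract mentions combining all evidence sources for a gene pair into a single reliability score. One common way to combine independent evidence (an assumption here, not necessarily the Hanalyzer's actual formula) is a noisy-OR over per-source reliabilities:

```python
def combined_reliability(source_scores):
    """Noisy-OR combination: the gene-pair link is unsupported only if
    every evidence source independently fails to support it."""
    p_all_fail = 1.0
    for p in source_scores:
        p_all_fail *= (1.0 - p)
    return 1.0 - p_all_fail

# Evidence for one gene pair from, e.g., shared ontology annotation,
# co-citation, and a curated database entry (illustrative values).
score = combined_reliability([0.6, 0.3, 0.5])
assert score > max(0.6, 0.3, 0.5)  # more agreeing sources, more confidence
```

The useful property is monotonicity: each additional supporting source can only raise the combined score, never lower it.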
Biomedical hypothesis generation by text mining and gene prioritization.
Petric, Ingrid; Ligeti, Balazs; Gyorffy, Balazs; Pongor, Sandor
2014-01-01
Text mining methods can facilitate the generation of biomedical hypotheses by suggesting novel associations between diseases and genes. Previously, we developed a rare-term model called RaJoLink (Petric et al, J. Biomed. Inform. 42(2): 219-227, 2009) in which hypotheses are formulated on the basis of terms rarely associated with a target domain. Since many current medical hypotheses are formulated in terms of molecular entities and molecular mechanisms, here we extend the methodology to proteins and genes, using a standardized vocabulary as well as a gene/protein network model. The proposed enhanced RaJoLink rare-term model combines text mining and gene prioritization approaches. Its utility is illustrated by finding known as well as potential gene-disease associations in ovarian cancer using MEDLINE abstracts and the STRING database.
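RaJoLink's rare-term idea, formulating hypotheses from terms rarely associated with the target domain, can be sketched by ranking terms by document frequency within the domain literature. The toy corpus and stopword list are illustrative assumptions, not the MEDLINE-scale input the paper uses.

```python
from collections import Counter

def rarest_terms(domain_docs, n=2,
                 stopwords=frozenset({"the", "of", "in", "a"})):
    """Return the n terms least frequently associated with the domain;
    rare terms serve as candidate seeds for novel hypotheses."""
    df = Counter()
    for doc in domain_docs:
        df.update(set(doc.lower().split()) - stopwords)
    # Sort by (document frequency, term) so ties break alphabetically.
    return [t for t, _ in sorted(df.items(), key=lambda kv: (kv[1], kv[0]))[:n]]

docs = [
    "ovarian cancer cells show altered gene expression",
    "gene expression in ovarian cancer tissue",
    "clusterin noted once in ovarian cancer literature",
]
seeds = rarest_terms(docs)  # candidate rare-term seeds for the domain
```

In the enhanced model described in the abstract, such seeds would then be mapped to genes/proteins and prioritized through a network such as STRING.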
A Scalable Data Access Layer to Manage Structured Heterogeneous Biomedical Data.
Delussu, Giovanni; Lianas, Luca; Frexia, Francesca; Zanetti, Gianluigi
2016-01-01
This work presents a scalable data access layer, called PyEHR, designed to support the implementation of data management systems for secondary use of structured heterogeneous biomedical and clinical data. PyEHR adopts the openEHR's formalisms to guarantee the decoupling of data descriptions from implementation details and exploits structure indexing to accelerate searches. Data persistence is guaranteed by a driver layer with a common driver interface. Interfaces for two NoSQL Database Management Systems are already implemented: MongoDB and Elasticsearch. We evaluated the scalability of PyEHR experimentally through two types of tests, called "Constant Load" and "Constant Number of Records", with queries of increasing complexity on synthetic datasets of ten million records each, containing very complex openEHR archetype structures, distributed on up to ten computing nodes.
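The common driver interface described for PyEHR can be sketched as follows. The class and method names are illustrative assumptions, not the actual PyEHR API, and an in-memory dictionary stands in for MongoDB or Elasticsearch.

```python
from abc import ABC, abstractmethod

class DriverInterface(ABC):
    """Common interface so the data access layer stays independent of
    the underlying NoSQL engine (e.g. MongoDB, Elasticsearch)."""

    @abstractmethod
    def add_record(self, record_id, record): ...

    @abstractmethod
    def get_record(self, record_id): ...

    @abstractmethod
    def search(self, **criteria): ...

class InMemoryDriver(DriverInterface):
    """Stand-in backend; a real driver would wrap pymongo or
    elasticsearch-py behind the same three methods."""

    def __init__(self):
        self._store = {}

    def add_record(self, record_id, record):
        self._store[record_id] = dict(record)

    def get_record(self, record_id):
        return self._store.get(record_id)

    def search(self, **criteria):
        return [r for r in self._store.values()
                if all(r.get(k) == v for k, v in criteria.items())]

db = InMemoryDriver()
db.add_record("ehr-1", {"archetype": "blood_pressure", "systolic": 120})
db.add_record("ehr-2", {"archetype": "body_weight", "kg": 70})
hits = db.search(archetype="blood_pressure")
```

Because every backend satisfies the same interface, swapping storage engines requires no change to the layers above the driver.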
The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update
Afgan, Enis; Baker, Dannon; van den Beek, Marius; Blankenberg, Daniel; Bouvier, Dave; Čech, Martin; Chilton, John; Clements, Dave; Coraor, Nate; Eberhard, Carl; Grüning, Björn; Guerler, Aysam; Hillman-Jackson, Jennifer; Von Kuster, Greg; Rasche, Eric; Soranzo, Nicola; Turaga, Nitesh; Taylor, James; Nekrutenko, Anton; Goecks, Jeremy
2016-01-01
High-throughput data production technologies, particularly ‘next-generation’ DNA sequencing, have ushered in widespread and disruptive changes to biomedical research. Making sense of the large datasets produced by these technologies requires sophisticated statistical and computational methods, as well as substantial computational power. This has led to an acute crisis in life sciences, as researchers without informatics training attempt to perform computation-dependent analyses. Since 2005, the Galaxy project has worked to address this problem by providing a framework that makes advanced computational tools usable by non-experts. Galaxy seeks to make data-intensive research more accessible, transparent and reproducible by providing a Web-based environment in which users can perform computational analyses and have all of the details automatically tracked for later inspection, publication, or reuse. In this report we highlight recently added features enabling biomedical analyses on a large scale. PMID:27137889
The center for causal discovery of biomedical knowledge from big data.
Cooper, Gregory F; Bahar, Ivet; Becich, Michael J; Benos, Panayiotis V; Berg, Jeremy; Espino, Jeremy U; Glymour, Clark; Jacobson, Rebecca Crowley; Kienholz, Michelle; Lee, Adrian V; Lu, Xinghua; Scheines, Richard
2015-11-01
The Big Data to Knowledge (BD2K) Center for Causal Discovery is developing and disseminating an integrated set of open source tools that support causal modeling and discovery of biomedical knowledge from large and complex biomedical datasets. The Center integrates teams of biomedical and data scientists focused on the refinement of existing and the development of new constraint-based and Bayesian algorithms based on causal Bayesian networks, the optimization of software for efficient operation in a supercomputing environment, and the testing of algorithms and software developed using real data from 3 representative driving biomedical projects: cancer driver mutations, lung disease, and the functional connectome of the human brain. Associated training activities provide both biomedical and data scientists with the knowledge and skills needed to apply and extend these tools. Collaborative activities with the BD2K Consortium further advance causal discovery tools and integrate tools and resources developed by other centers. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Biomedical equipment and medical services in India.
Sahay, K B; Saxena, R K
A variety of Biomedical Equipment (BME) is now used for quick diagnosis, flawless surgery, therapeutics, etc. Use of malfunctioning BME could result in faulty diagnosis and wrong treatment, and can lead to a damaging or even devastating aftermath. Modern biomedical equipment inevitably employs highly sophisticated technology and uses complex systems and instrumentation for best results. To the best of our knowledge, medical education in India does not impart any knowledge on the theory and design of BME, and it is perhaps not feasible for it to do so. Hence there is a need for a permanent mechanism that can maintain and repair biomedical equipment routinely before use, and this can be done only with the help of qualified Clinical Engineers. Thus there is a genuine need for a well-organized cadre of Clinical Engineers: persons with an engineering background and specialization in medical instrumentation. These Clinical Engineers should be made responsible for the maintenance and proper functioning of BME. Every hospital or group of hospitals in the advanced countries has a clinical engineering unit that takes care of the biomedical equipment and systems in the hospital by undertaking routine and preventive maintenance, regular calibration of equipment, and timely repairs. Clinical Engineers should thus be made an essential part of the modern health care system and services. Unfortunately, such facilities and mechanisms do not exist in India. To make BME maintenance efficient and flawless in India, this study suggests the following measures and remedies: (i) design and development of a comprehensive computerized database for BME, (ii) a cadre of Clinical Engineers, (iii) an online maintenance facility, and (iv) farsighted managerial skill to maximize accuracy, functioning, and cost effectiveness.
Training Multidisciplinary Biomedical Informatics Students: Three Years of Experience
van Mulligen, Erik M.; Cases, Montserrat; Hettne, Kristina; Molero, Eva; Weeber, Marc; Robertson, Kevin A.; Oliva, Baldomero; de la Calle, Guillermo; Maojo, Victor
2008-01-01
Objective The European INFOBIOMED Network of Excellence 1 recognized that a successful education program in biomedical informatics should include not only traditional teaching activities in the basic sciences but also the development of skills for working in multidisciplinary teams. Design A carefully developed 3-year training program for biomedical informatics students addressed these educational aspects through the following four activities: (1) an internet course database containing an overview of all Medical Informatics and BioInformatics courses, (2) a BioMedical Informatics Summer School, (3) a mobility program based on a ‘brokerage service’ which published demands and offers, including funding for research exchange projects, and (4) training challenges aimed at the development of multi-disciplinary skills. Measurements This paper focuses on experiences gained in the development of novel educational activities addressing work in multidisciplinary teams. The training challenges described here were evaluated by asking participants to fill out forms with Likert scale based questions. For the mobility program a needs assessment was carried out. Results The mobility program supported 20 exchanges which fostered new BMI research, resulted in a number of peer-reviewed publications and demonstrated the feasibility of this multidisciplinary BMI approach within the European Union. Students unanimously indicated that the training challenge experience had contributed to their understanding and appreciation of multidisciplinary teamwork. Conclusion The training activities undertaken in INFOBIOMED have contributed to a multi-disciplinary BMI approach. It is our hope that this work might provide an impetus for training efforts in Europe, and yield a new generation of biomedical informaticians. PMID:18096914
Biomedical Risk Mitigation: 1994-2004
NASA Technical Reports Server (NTRS)
2004-01-01
This custom bibliography from the NASA Scientific and Technical Information Program lists a sampling of records found in the NASA Aeronautics and Space Database. The scope of this topic includes space medicine, remote monitoring, diagnosis, and treatment. This area of focus is one of the enabling technologies as defined by NASA s Report of the President s Commission on Implementation of United States Space Exploration Policy, published in June 2004.
Mimvec: a deep learning approach for analyzing the human phenome.
Gan, Mingxin; Li, Wenran; Zeng, Wanwen; Wang, Xiaojian; Jiang, Rui
2017-09-21
The human phenome has been widely used with a variety of genomic data sources in the inference of disease genes. However, most existing methods thus far derive phenotype similarity based on the analysis of biomedical databases by using the traditional term frequency-inverse document frequency (TF-IDF) formulation. This framework, though intuitive, not only ignores semantic relationships between words but also tends to produce high-dimensional vectors, and hence lacks the ability to precisely capture intrinsic semantic characteristics of biomedical documents. To overcome these limitations, we propose a framework called mimvec to analyze the human phenome by making use of the state-of-the-art deep learning technique in natural language processing. We converted 24,061 records in the Online Mendelian Inheritance in Man (OMIM) database to low-dimensional vectors using our method. We demonstrated that the vector representation not only effectively enabled classification of phenotype records against gene ones, but also succeeded in discriminating diseases of different inheritance styles and different mechanisms. We further derived pairwise phenotype similarities between 7988 human inherited diseases using their vector representations. With a joint analysis of this phenome with multiple genomic data, we showed that phenotype overlap indeed implied genotype overlap. We finally used the derived phenotype similarities with genomic data to prioritize candidate genes and demonstrated advantages of this method over existing ones. Our method is capable of not only capturing semantic relationships between words in biomedical records but also alleviating the dimensional disaster accompanying the traditional TF-IDF framework. With the advance of precision medicine, there will be abundant electronic records of medicine and health awaiting deep analysis, and we expect to see a wide spectrum of applications borrowing the idea of our method in the near future.
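The TF-IDF baseline that mimvec improves upon can be written in a few lines; the toy corpus is an illustrative assumption. Note how the vector dimension equals the vocabulary size (the "dimensional disaster" the abstract refers to) and how no semantic relation between distinct words is captured.

```python
import math

def tfidf(docs):
    """Classic TF-IDF: one dimension per vocabulary word, so vectors
    grow with the corpus and encode no relations between words."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(docs)
    df = {w: sum(w in toks for toks in tokenized) for w in vocab}
    vectors = []
    for toks in tokenized:
        vec = {}
        for w in vocab:
            tf = toks.count(w) / len(toks)
            idf = math.log(n / df[w])
            vec[w] = tf * idf
        vectors.append(vec)
    return vocab, vectors

docs = ["retinitis pigmentosa eye disease",
        "macular degeneration eye disease",
        "cardiac arrhythmia heart disease"]
vocab, vecs = tfidf(docs)
# "disease" appears in every record, so its IDF (and weight) is zero.
assert all(v["disease"] == 0.0 for v in vecs)
```

An embedding method such as the one the paper proposes would instead place related terms (e.g. "macular" and "retinitis") near each other in a fixed low-dimensional space.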
It's all in the timing: calibrating temporal penalties for biomedical data sharing.
Xia, Weiyi; Wan, Zhiyu; Yin, Zhijun; Gaupp, James; Liu, Yongtai; Clayton, Ellen Wright; Kantarcioglu, Murat; Vorobeychik, Yevgeniy; Malin, Bradley A
2018-01-01
Biomedical science is driven by datasets that are being accumulated at an unprecedented rate, with ever-growing volume and richness. There are various initiatives to make these datasets more widely available to recipients who sign Data Use Certificate agreements, whereby penalties are levied for violations. A particularly popular penalty is the temporary revocation, often for several months, of the recipient's data usage rights. This policy is based on the assumption that the value of biomedical research data depreciates significantly over time; however, no studies have been performed to substantiate this belief. This study investigates whether this assumption holds true and examines its data science policy implications. This study tests the hypothesis that the value of data for scientific investigators, in terms of the impact of the publications based on the data, decreases over time. The hypothesis is tested formally through a mixed linear effects model using approximately 1200 publications between 2007 and 2013 that used datasets from the Database of Genotypes and Phenotypes, a data-sharing initiative of the National Institutes of Health. The analysis shows that the impact factors for publications based on Database of Genotypes and Phenotypes datasets depreciate in a statistically significant manner. However, we further discover that the depreciation rate is slow, only ∼10% per year, on average. The enduring value of data for subsequent studies implies that revoking usage for short periods of time may not sufficiently deter those who would violate Data Use Certificate agreements and that alternative penalty mechanisms may need to be invoked. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
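The reported ~10% annual depreciation implies that a months-long revocation forfeits little value. A back-of-the-envelope calculation, under an assumed constant exponential decay (the paper itself fits a mixed linear effects model), makes the point:

```python
def remaining_value(years, annual_depreciation=0.10):
    """Fraction of a dataset's value remaining after `years`, assuming
    a constant ~10%-per-year depreciation as estimated in the study."""
    return (1.0 - annual_depreciation) ** years

# A 6-month revocation costs a violator only ~5% of the data's value,
# which is why short suspensions may be a weak deterrent.
loss_six_months = 1.0 - remaining_value(0.5)
assert loss_six_months < 0.06
```

Even a full-year revocation under this assumption leaves 90% of the value intact, consistent with the paper's argument for alternative penalties.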
A Machine Learning-based Method for Question Type Classification in Biomedical Question Answering.
Sarrouti, Mourad; Ouatik El Alaoui, Said
2017-05-18
Biomedical question type classification is one of the important components of an automatic biomedical question answering system. The performance of the latter depends directly on the performance of its biomedical question type classification system, which consists of assigning a category to each question in order to determine the appropriate answer extraction algorithm. This study aims to automatically classify biomedical questions into one of the four categories: (1) yes/no, (2) factoid, (3) list, and (4) summary. In this paper, we propose a biomedical question type classification method based on machine learning approaches to automatically assign a category to a biomedical question. First, we extract features from biomedical questions using the proposed handcrafted lexico-syntactic patterns. Then, we feed these features to machine-learning algorithms. Finally, the class label is predicted using the trained classifiers. Experimental evaluations performed on large standard annotated datasets of biomedical questions, provided by the BioASQ challenge, demonstrated that our method exhibits significantly improved performance when compared to four baseline systems. The proposed method achieves a roughly 10-point increase over the best baseline in terms of accuracy. Moreover, the obtained results show that using handcrafted lexico-syntactic patterns as features for a support vector machine (SVM) leads to the highest accuracy of 89.40%. The proposed method can automatically classify BioASQ questions into one of the four categories: yes/no, factoid, list, and summary. Furthermore, the results demonstrated that our method produced the best classification performance compared to four baseline systems.
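The idea of handcrafted lexico-syntactic patterns can be sketched with a rule-based toy classifier over the same four BioASQ categories. The patterns below are invented for illustration and are far simpler than the features fed to the paper's SVM.

```python
import re

# Illustrative surface patterns, not the paper's actual feature set.
PATTERNS = [
    (re.compile(r"^(is|are|does|do|can|has|have)\b", re.I), "yes/no"),
    (re.compile(r"^(list|which .* are|what are)\b", re.I), "list"),
    (re.compile(r"^(what is|who|where|when|which gene)\b", re.I), "factoid"),
]

def question_type(question):
    """Assign one of the four BioASQ categories from surface patterns;
    anything unmatched defaults to a summary-style question."""
    for pattern, label in PATTERNS:
        if pattern.search(question):
            return label
    return "summary"

assert question_type("Is thrombocytosis a side effect of romiplostim?") == "yes/no"
assert question_type("What are the symptoms of Marfan syndrome?") == "list"
assert question_type("What is the function of BRCA1?") == "factoid"
```

In the paper's method, matches like these become feature values for a trained classifier rather than hard decision rules, which is what yields the reported accuracy gain.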
Photoreconfigurable polymers for biomedical applications: chemistry and macromolecular engineering.
Zhu, Congcong; Ninh, Chi; Bettinger, Christopher J
2014-10-13
Stimuli-responsive polymers play an important role in many biomedical technologies. Light responsive polymers are particularly desirable because the parameters of irradiated light and diverse photoactive chemistries produce a large number of combinations between functional materials and associated stimuli. This Review summarizes recent advances in utilizing photoactive chemistries in macromolecules for prospective use in biomedical applications. Special focus is granted to selection criterion when choosing photofunctional groups. Synthetic strategies to incorporate these functionalities into polymers and networks with different topologies are also highlighted herein. Prospective applications of these materials are discussed including programmable matrices for controlled release, dynamic scaffolds for tissue engineering, and functional coatings for medical devices. The article concludes by summarizing the state of the art in photoresponsive polymers for biomedical applications including current challenges and future opportunities.
Data management integration for biomedical core facilities
NASA Astrophysics Data System (ADS)
Zhang, Guo-Qiang; Szymanski, Jacek; Wilson, David
2007-03-01
We present the design, development, and pilot-deployment experiences of MIMI, a web-based, Multi-modality Multi-Resource Information Integration environment for biomedical core facilities. This is an easily customizable, web-based software tool that integrates scientific and administrative support for a biomedical core facility involving a common set of entities: researchers; projects; equipments and devices; support staff; services; samples and materials; experimental workflow; large and complex data. With this software, one can: register users; manage projects; schedule resources; bill services; perform site-wide search; archive, back-up, and share data. With its customizable, expandable, and scalable characteristics, MIMI not only provides a cost-effective solution to the overarching data management problem of biomedical core facilities unavailable in the market place, but also lays a foundation for data federation to facilitate and support discovery-driven research.
Bacterial collagen-like proteins that form triple-helical structures
Yu, Zhuoxin; An, Bo; Ramshaw, John A.M.; Brodsky, Barbara
2014-01-01
A large number of collagen-like proteins have been identified in bacteria during the past ten years, principally from analysis of genome databases. These bacterial collagens share the distinctive Gly-Xaa-Yaa repeating amino acid sequence of animal collagens which underlies their unique triple-helical structure. A number of the bacterial collagens have been expressed in E. coli, and they all adopt a triple-helix conformation. Unlike animal collagens, these bacterial proteins do not contain the post-translationally modified amino acid, hydroxyproline, which is known to stabilize the triple-helix structure and may promote self-assembly. Despite the absence of collagen hydroxylation, the triple-helix structures of the bacterial collagens studied exhibit a high thermal stability of 35–39 °C, close to that seen for mammalian collagens. These bacterial collagens are readily produced in large quantities by recombinant methods, either in the original amino acid sequence or in genetically manipulated sequences. This new family of recombinant, easy to modify collagens could provide a novel system for investigating structural and functional motifs in animal collagens and could also form the basis of new biomedical materials with designed structural properties and functions. PMID:24434612
A token centric part-of-speech tagger for biomedical text.
Barrett, Neil; Weber-Jahnke, Jens
2014-05-01
A difficulty with part-of-speech (POS) tagging of biomedical text is accessing and annotating appropriate training corpora. This difficulty may result in POS taggers trained on corpora that differ from the tagger's target biomedical text (cross-domain tagging). In such cases, where training and target corpora differ, tagging accuracy decreases. This paper presents a POS tagger for cross-domain tagging called TcT. TcT estimates a tag's likelihood for a given token by combining token collocation probabilities and the token's tag probabilities calculated using a Naive Bayes classifier. We compared TcT to three POS taggers used in the biomedical domain (mxpost, Brill and TnT). We trained each tagger on a non-biomedical corpus and evaluated it on biomedical corpora. TcT was more accurate in cross-domain tagging than mxpost, Brill and TnT (respective averages 83.9, 81.0, 79.5 and 78.8). Our analysis of tagger performance suggests that lexical differences between corpora have more effect on tagging accuracy than previously considered. Biomedical POS tagging algorithms may be modified to improve their cross-domain tagging accuracy without requiring extra training or large training data sets. Future work should re-examine POS tagging methods for biomedical text. This differs from the work to date, which has focused on retraining existing POS taggers. Copyright © 2014 Elsevier B.V. All rights reserved.
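TcT's scoring, combining a token-collocation (transition) probability with a per-token tag probability from a Naive Bayes-style estimate, can be sketched as a product of the two terms. The tiny probability tables are illustrative assumptions, not values trained on a real corpus.

```python
def best_tag(token, prev_tag, emission, transition, tags):
    """Score each candidate tag as P(tag | token) * P(tag | prev_tag),
    mirroring TcT's combination of token and collocation evidence."""
    def score(tag):
        return (emission.get((token, tag), 1e-6)
                * transition.get((prev_tag, tag), 1e-6))
    return max(tags, key=score)

# Toy probabilities: "sequence" is usually a noun, but after "to"
# the collocation evidence pushes toward a verb reading.
emission = {("sequence", "NN"): 0.7, ("sequence", "VB"): 0.3}
transition = {("TO", "VB"): 0.8, ("TO", "NN"): 0.1,
              ("DT", "NN"): 0.8, ("DT", "VB"): 0.05}

assert best_tag("sequence", "DT", emission, transition, ["NN", "VB"]) == "NN"
assert best_tag("sequence", "TO", emission, transition, ["NN", "VB"]) == "VB"
```

The product form lets either evidence source override the other when it is strong, which is why collocations help on out-of-domain vocabulary.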
Holzinger, Andreas; Zupan, Mario
2013-06-13
Background Professionals in the biomedical domain are confronted with an increasing mass of data. Developing methods to assist professional end users in the field of Knowledge Discovery to identify, extract, visualize and understand useful information from these huge amounts of data is a huge challenge. However, there are so many diverse methods and methodologies available, that for biomedical researchers who are inexperienced in the use of even relatively popular knowledge discovery methods, it can be very difficult to select the most appropriate method for their particular research problem. Results A web application, called KNODWAT (KNOwledge Discovery With Advanced Techniques) has been developed, using Java on Spring framework 3.1. and following a user-centered approach. The software runs on Java 1.6 and above and requires a web server such as Apache Tomcat and a database server such as the MySQL Server. For frontend functionality and styling, Twitter Bootstrap was used as well as jQuery for interactive user interface operations. Conclusions The framework presented is user-centric, highly extensible and flexible. Since it enables methods for testing using existing data to assess suitability and performance, it is especially suitable for inexperienced biomedical researchers, new to the field of knowledge discovery and data mining. For testing purposes two algorithms, CART and C4.5 were implemented using the WEKA data mining framework. PMID:23763826
Probabilistic and machine learning-based retrieval approaches for biomedical dataset retrieval
Karisani, Payam; Qin, Zhaohui S; Agichtein, Eugene
2018-01-01
The bioCADDIE dataset retrieval challenge brought together different approaches to retrieval of biomedical datasets relevant to a user’s query, expressed as a text description of a needed dataset. We describe experiments in applying a data-driven, machine learning-based approach to biomedical dataset retrieval as part of this challenge. We report on a series of experiments carried out to evaluate the performance of both probabilistic and machine learning-driven techniques from information retrieval, as applied to this challenge. Our experiments with probabilistic information retrieval methods, such as query term weight optimization, automatic query expansion and simulated user relevance feedback, demonstrate that automatically boosting the weights of important keywords in a verbose query is more effective than other methods. We also show that although there is a rich space of potential representations and features available in this domain, machine learning-based re-ranking models are not able to improve on probabilistic information retrieval techniques with the currently available training data. The models and algorithms presented in this paper can serve as a viable implementation of a search engine to provide access to biomedical datasets. The retrieval performance is expected to be further improved by using additional training data that is created by expert annotation, or gathered through usage logs, clicks and other processes during natural operation of the system. Database URL: https://github.com/emory-irlab/biocaddie PMID:29688379
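The abstract above reports that automatically boosting the weights of important keywords in a verbose query outperformed other probabilistic methods. The exact weighting scheme is not given here; the following is a minimal sketch of the general idea using IDF-based boosting over a toy collection (all document texts, function names and parameter values are hypothetical, not the paper's actual system):

```python
import math
from collections import Counter

# Toy document collection standing in for dataset metadata records.
docs = {
    "d1": "rna sequencing data from human liver tissue",
    "d2": "proteomics dataset of mouse liver samples",
    "d3": "human genome variation data from population studies",
}

def idf(term, docs):
    # Inverse document frequency: rare terms get larger weights.
    n = sum(1 for text in docs.values() if term in text.split())
    return math.log((1 + len(docs)) / (1 + n)) + 1

def boosted_query(query, docs, top_k=3, boost=2.0):
    # Boost the weights of the top-k most discriminative query terms.
    terms = query.split()
    weights = {t: idf(t, docs) for t in terms}
    top = sorted(weights, key=weights.get, reverse=True)[:top_k]
    return {t: weights[t] * (boost if t in top else 1.0) for t in terms}

def score(doc_text, weighted_query):
    # Simple weighted term-frequency score against the boosted query.
    counts = Counter(doc_text.split())
    return sum(w * counts[t] for t, w in weighted_query.items())

wq = boosted_query("human liver rna sequencing data", docs)
ranked = sorted(docs, key=lambda d: score(docs[d], wq), reverse=True)
print(ranked[0])
```

The discriminative terms ("rna", "sequencing") dominate the score, so the record that matches them ranks first even though other records share the common terms.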
Bichutskiy, Vadim Y.; Colman, Richard; Brachmann, Rainer K.; Lathrop, Richard H.
2006-01-01
Complex problems in life science research give rise to multidisciplinary collaboration, and hence, to the need for heterogeneous database integration. The tumor suppressor p53 is mutated in close to 50% of human cancers, and a small drug-like molecule with the ability to restore native function to cancerous p53 mutants is a long-held medical goal of cancer treatment. The Cancer Research DataBase (CRDB) was designed in support of a project to find such small molecules. As a cancer informatics project, the CRDB involved small molecule data, computational docking results, functional assays, and protein structure data. As an example of the hybrid strategy for data integration, it combined the mediation and data warehousing approaches. This paper uses the CRDB to illustrate the hybrid strategy as a viable approach to heterogeneous data integration in biomedicine, and provides a design method for those considering similar systems. More efficient data sharing implies increased productivity, and, hopefully, improved chances of success in cancer research. (Code and database schemas are freely downloadable, http://www.igb.uci.edu/research/research.html.) PMID:19458771
A survey of working conditions within biomedical research in the United Kingdom.
Riddiford, Nick
2017-01-01
Background: Many recent articles have presented a bleak view of career prospects in biomedical research in the US. Too many PhDs and postdocs are trained for too few research positions, creating a "holding-tank" of experienced senior postdocs who are unable to get a permanent position. Coupled with relatively low salaries and high levels of pressure to publish in top-tier academic journals, this has created a toxic environment that is perhaps responsible for a recently observed decline in biomedical postdocs in the US, the so-called "postdocalypse". Methods: In order to address the gap in information relating to the working habits and attitudes of UK-based biomedical researchers, a link to an online survey was included in an article published in the Guardian newspaper. Survey data were collected between 21st March 2016 and 6th November 2016 and analysed to examine discrete profiles for three major career stages: the PhD, the postdoc and the principal investigator. Results: Overall, the data presented here echo trends observed in the US: the 520 UK-based biomedical researchers responding to the survey reported feeling disillusioned with academic research, due to the low chance of getting a permanent position and the long hours required at the bench. Also as in the US, large numbers of researchers at each distinct career stage are considering leaving biomedical research altogether. Conclusions: There are several systemic flaws in the academic scientific research machine - for example the continual overproduction of PhDs and the lack of stability in the early-mid stages of a research career - that are slowly being addressed in countries such as the US and Germany. These data suggest that similar flaws also exist in the UK, with a large proportion of respondents concerned about their future in research. To avoid lasting damage to the biomedical research agenda in the UK, addressing such concerns should be a major priority.
Role for protein–protein interaction databases in human genetics
Pattin, Kristine A; Moore, Jason H
2010-01-01
Proteomics and the study of protein–protein interactions are becoming increasingly important in our effort to understand human diseases on a system-wide level. Thanks to the development and curation of protein-interaction databases, up-to-date information on these interaction networks is accessible and publicly available to the scientific community. As our knowledge of protein–protein interactions increases, it is important to give thought to the different ways that these resources can impact biomedical research. In this article, we highlight the importance of protein–protein interactions in human genetics and genetic epidemiology. Since protein–protein interactions demonstrate one of the strongest functional relationships between genes, combining genomic data with available proteomic data may provide us with a more in-depth understanding of common human diseases. In this review, we will discuss some of the fundamentals of protein interactions, the databases that are publicly available and how information from these databases can be used to facilitate genome-wide genetic studies. PMID:19929610
Image query and indexing for digital x rays
NASA Astrophysics Data System (ADS)
Long, L. Rodney; Thoma, George R.
1998-12-01
The web-based medical information retrieval system (WebMIRS) allows Internet access to databases containing 17,000 digitized x-ray spine images and associated text data from the National Health and Nutrition Examination Surveys (NHANES). WebMIRS allows SQL query of the text, and viewing of the returned text records and images using a standard browser. We are now working (1) to determine the utility of data directly derived from the images in our databases, and (2) to investigate the feasibility of computer-assisted or automated indexing of the images to support retrieval of images of interest to biomedical researchers in the field of osteoarthritis. To build an initial database based on image data, we are manually segmenting a subset of the vertebrae, using techniques from vertebral morphometry. From this, we will derive vertebral features and add them to the database. This image-derived data will enhance the user's data access capability by enabling the creation of combined SQL/image-content queries.
Sarkar, Indra Neil
2010-11-13
A better understanding of commensal microbiotic communities ("microbiomes") may provide valuable insights to human health. Towards this goal, an essential step may be the development of approaches to organize data that can enable comparative hypotheses across mammalian microbiomes. The present study explores the feasibility of using existing biomedical informatics resources - especially focusing on those available at the National Center for Biomedical Ontology - to organize microbiome data contained within large sequence repositories, such as GenBank. The results indicate that the Foundational Model of Anatomy and SNOMED CT can be used to organize greater than 90% of the bacterial organisms associated with 10 domesticated mammalian species. The promising findings suggest that the current biomedical informatics infrastructure may be used to organize microbiome data beyond humans. Furthermore, the results identify key concepts that might be organized into a semantic structure for incorporation into subsequent annotations that could facilitate comparative biomedical hypotheses pertaining to human health.
Stewart, Robert; Soremekun, Mishael; Perera, Gayan; Broadbent, Matthew; Callard, Felicity; Denis, Mike; Hotopf, Matthew; Thornicroft, Graham; Lovestone, Simon
2009-01-01
Background Case registers have been used extensively in mental health research. Recent developments in electronic medical records, and in computer software to search and analyse these in anonymised format, have the potential to revolutionise this research tool. Methods We describe the development of the South London and Maudsley NHS Foundation Trust (SLAM) Biomedical Research Centre (BRC) Case Register Interactive Search tool (CRIS) which allows research-accessible datasets to be derived from SLAM, the largest provider of secondary mental healthcare in Europe. All clinical data, including free text, are available for analysis in the form of anonymised datasets. Development involved both the building of the system and setting in place the necessary security (with both functional and procedural elements). Results Descriptive data are presented for the Register database as of October 2008. The database at that point included 122,440 cases, 35,396 of whom were receiving active case management under the Care Programme Approach. In terms of gender and ethnicity, the database was reasonably representative of the source population. The most common assigned primary diagnoses were within the ICD mood disorders (n = 12,756) category followed by schizophrenia and related disorders (8158), substance misuse (7749), neuroses (7105) and organic disorders (6414). Conclusion The SLAM BRC Case Register represents a 'new generation' of this research design, built on a long-running system of fully electronic clinical records and allowing in-depth secondary analysis of numerical, string and free text data, whilst preserving anonymity through technical and procedural safeguards. PMID:19674459
Planning the Human Variome Project: The Spain Report†
Kaput, Jim; Cotton, Richard G. H.; Hardman, Lauren; Al Aqeel, Aida I.; Al-Aama, Jumana Y.; Al-Mulla, Fahd; Aretz, Stefan; Auerbach, Arleen D.; Axton, Myles; Bapat, Bharati; Bernstein, Inge T.; Bhak, Jong; Bleoo, Stacey L.; Blöcker, Helmut; Brenner, Steven E.; Burn, John; Bustamante, Mariona; Calzone, Rita; Cambon-Thomsen, Anne; Cargill, Michele; Carrera, Paola; Cavedon, Lawrence; Cho, Yoon Shin; Chung, Yeun-Jun; Claustres, Mireille; Cutting, Garry; Dalgleish, Raymond; den Dunnen, Johan T.; Díaz, Carlos; Dobrowolski, Steven; dos Santos, M. Rosário N.; Ekong, Rosemary; Flanagan, Simon B.; Flicek, Paul; Furukawa, Yoichi; Genuardi, Maurizio; Ghang, Ho; Golubenko, Maria V.; Greenblatt, Marc S.; Hamosh, Ada; Hancock, John M.; Hardison, Ross; Harrison, Terence M.; Hoffmann, Robert; Horaitis, Rania; Howard, Heather J.; Barash, Carol Isaacson; Izagirre, Neskuts; Jung, Jongsun; Kojima, Toshio; Laradi, Sandrine; Lee, Yeon-Su; Lee, Jong-Young; Gil-da-Silva-Lopes, Vera L.; Macrae, Finlay A.; Maglott, Donna; Marafie, Makia J.; Marsh, Steven G.E.; Matsubara, Yoichi; Messiaen, Ludwine M.; Möslein, Gabriela; Netea, Mihai G.; Norton, Melissa L.; Oefner, Peter J.; Oetting, William S.; O’Leary, James C.; de Ramirez, Ana Maria Oller; Paalman, Mark H.; Parboosingh, Jillian; Patrinos, George P.; Perozzi, Giuditta; Phillips, Ian R.; Povey, Sue; Prasad, Suyash; Qi, Ming; Quin, David J.; Ramesar, Rajkumar S.; Richards, C. Sue; Savige, Judith; Scheible, Dagmar G.; Scott, Rodney J.; Seminara, Daniela; Shephard, Elizabeth A.; Sijmons, Rolf H.; Smith, Timothy D.; Sobrido, María-Jesús; Tanaka, Toshihiro; Tavtigian, Sean V.; Taylor, Graham R.; Teague, Jon; Töpel, Thoralf; Ullman-Cullere, Mollie; Utsunomiya, Joji; van Kranen, Henk J.; Vihinen, Mauno; Watson, Michael; Webb, Elizabeth; Weber, Thomas K.; Yeager, Meredith; Yeom, Young I.; Yim, Seon-Hee; Yoo, Hyang-Sook
2018-01-01
The remarkable progress in characterizing the human genome sequence, exemplified by the Human Genome Project and the HapMap Consortium, has led to the perception that knowledge and the tools (e.g., microarrays) are sufficient for many if not most biomedical research efforts. A large amount of data from diverse studies proves this perception inaccurate at best, and at worst, an impediment for further efforts to characterize the variation in the human genome. Since variation in genotype and environment are the fundamental basis to understand phenotypic variability and heritability at the population level, identifying the range of human genetic variation is crucial to the development of personalized nutrition and medicine. The Human Variome Project (HVP; http://www.humanvariomeproject.org/) was proposed initially to systematically collect mutations that cause human disease and create a cyber infrastructure to link locus specific databases (LSDB). We report here the discussions and recommendations from the 2008 HVP planning meeting held in San Feliu de Guixols, Spain, in May 2008. PMID:19306394
Example-Based Super-Resolution Fluorescence Microscopy.
Jia, Shu; Han, Boran; Kutz, J Nathan
2018-04-23
Capturing biological dynamics with high spatiotemporal resolution demands the advancement in imaging technologies. Super-resolution fluorescence microscopy offers spatial resolution surpassing the diffraction limit to resolve near-molecular-level details. While various strategies have been reported to improve the temporal resolution of super-resolution imaging, all super-resolution techniques are still fundamentally limited by the trade-off associated with the longer image acquisition time that is needed to achieve higher spatial information. Here, we demonstrated an example-based, computational method that aims to obtain super-resolution images using conventional imaging without increasing the imaging time. With a low-resolution image input, the method provides an estimate of its super-resolution image based on an example database that contains super- and low-resolution image pairs of biological structures of interest. The computational imaging of cellular microtubules agrees approximately with the experimental super-resolution STORM results. This new approach may offer potential improvements in temporal resolution for experimental super-resolution fluorescence microscopy and provide a new path for large-data aided biomedical imaging.
Planning the human variome project: the Spain report.
Kaput, Jim; Cotton, Richard G H; Hardman, Lauren; Watson, Michael; Al Aqeel, Aida I; Al-Aama, Jumana Y; Al-Mulla, Fahd; Alonso, Santos; Aretz, Stefan; Auerbach, Arleen D; Bapat, Bharati; Bernstein, Inge T; Bhak, Jong; Bleoo, Stacey L; Blöcker, Helmut; Brenner, Steven E; Burn, John; Bustamante, Mariona; Calzone, Rita; Cambon-Thomsen, Anne; Cargill, Michele; Carrera, Paola; Cavedon, Lawrence; Cho, Yoon Shin; Chung, Yeun-Jun; Claustres, Mireille; Cutting, Garry; Dalgleish, Raymond; den Dunnen, Johan T; Díaz, Carlos; Dobrowolski, Steven; dos Santos, M Rosário N; Ekong, Rosemary; Flanagan, Simon B; Flicek, Paul; Furukawa, Yoichi; Genuardi, Maurizio; Ghang, Ho; Golubenko, Maria V; Greenblatt, Marc S; Hamosh, Ada; Hancock, John M; Hardison, Ross; Harrison, Terence M; Hoffmann, Robert; Horaitis, Rania; Howard, Heather J; Barash, Carol Isaacson; Izagirre, Neskuts; Jung, Jongsun; Kojima, Toshio; Laradi, Sandrine; Lee, Yeon-Su; Lee, Jong-Young; Gil-da-Silva-Lopes, Vera L; Macrae, Finlay A; Maglott, Donna; Marafie, Makia J; Marsh, Steven G E; Matsubara, Yoichi; Messiaen, Ludwine M; Möslein, Gabriela; Netea, Mihai G; Norton, Melissa L; Oefner, Peter J; Oetting, William S; O'Leary, James C; de Ramirez, Ana Maria Oller; Paalman, Mark H; Parboosingh, Jillian; Patrinos, George P; Perozzi, Giuditta; Phillips, Ian R; Povey, Sue; Prasad, Suyash; Qi, Ming; Quin, David J; Ramesar, Rajkumar S; Richards, C Sue; Savige, Judith; Scheible, Dagmar G; Scott, Rodney J; Seminara, Daniela; Shephard, Elizabeth A; Sijmons, Rolf H; Smith, Timothy D; Sobrido, María-Jesús; Tanaka, Toshihiro; Tavtigian, Sean V; Taylor, Graham R; Teague, Jon; Töpel, Thoralf; Ullman-Cullere, Mollie; Utsunomiya, Joji; van Kranen, Henk J; Vihinen, Mauno; Webb, Elizabeth; Weber, Thomas K; Yeager, Meredith; Yeom, Young I; Yim, Seon-Hee; Yoo, Hyang-Sook
2009-04-01
The remarkable progress in characterizing the human genome sequence, exemplified by the Human Genome Project and the HapMap Consortium, has led to the perception that knowledge and the tools (e.g., microarrays) are sufficient for many if not most biomedical research efforts. A large amount of data from diverse studies proves this perception inaccurate at best, and at worst, an impediment for further efforts to characterize the variation in the human genome. Because variation in genotype and environment are the fundamental basis to understand phenotypic variability and heritability at the population level, identifying the range of human genetic variation is crucial to the development of personalized nutrition and medicine. The Human Variome Project (HVP; http://www.humanvariomeproject.org/) was proposed initially to systematically collect mutations that cause human disease and create a cyber infrastructure to link locus specific databases (LSDB). We report here the discussions and recommendations from the 2008 HVP planning meeting held in San Feliu de Guixols, Spain, in May 2008. (c) 2009 Wiley-Liss, Inc.
[Public financing of health research in Chile].
Paraje, Guillermo
2010-01-01
In Chile, researchers can apply to public research funds through specific research projects and must compete with professionals from other disciplines. The aim of this study was to perform a critical assessment of the allocation of public funds for health research in Chile by a public institution called CONICYT. A database was constructed with health projects financed by CONICYT between 2002 and 2006. Projects were classified (according to their titles) in three methodological categories and nine topics. Age, gender and region where the main researcher is based were also recorded. 768 research projects were analyzed. Biomedical, clinical and public health research projects accounted for 66, 24 and 10% of allocated funds, respectively. Main researchers were female in 31% of projects, their mean age was 52 years and 76% worked in the Metropolitan region. These results show that some objectives of the National Research System led by CONICYT, such as using research as a tool for regional development and allocating funds for conditions with a large burden, are not being met.
Management information system of medical equipment using mobile devices
NASA Astrophysics Data System (ADS)
Núñez, C.; Castro, D.
2011-09-01
The many technologies currently incorporated into mobile devices, together with their increasing computing power and storage, make them excellent tools for capturing and managing information through a wide variety of applications. To take advantage of these technologies in the biomedical engineering field, a mobile information system for medical equipment management was developed. The central platform of the system is a mobile phone that, through a connection to a web server, can send and receive information about any piece of medical equipment. By decoding a type of barcode known as QR codes, the management process is simplified and improved. These barcodes identify the medical equipment in a database; when a code is photographed and decoded with the mobile device, relevant information about the equipment in question can be accessed. In its current state, the project is a basic support tool for the maintenance of medical equipment. It is also a modern, competitive and economical alternative in the current market.
Genomic region operation kit for flexible processing of deep sequencing data.
Ovaska, Kristian; Lyly, Lauri; Sahu, Biswajyoti; Jänne, Olli A; Hautaniemi, Sampsa
2013-01-01
Computational analysis of data produced in deep sequencing (DS) experiments is challenging due to large data volumes and requirements for flexible analysis approaches. Here, we present a mathematical formalism based on set algebra for frequently performed operations in DS data analysis to facilitate translation of biomedical research questions to language amenable for computational analysis. With the help of this formalism, we implemented the Genomic Region Operation Kit (GROK), which supports various DS-related operations such as preprocessing, filtering, file conversion, and sample comparison. GROK provides high-level interfaces for R, Python, Lua, and command line, as well as an extension C++ API. It supports major genomic file formats and allows storing custom genomic regions in efficient data structures such as red-black trees and SQL databases. To demonstrate the utility of GROK, we have characterized the roles of two major transcription factors (TFs) in prostate cancer using data from 10 DS experiments. GROK is freely available with a user guide from http://csbi.ltdk.helsinki.fi/grok/.
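GROK itself is a C++ kit with R, Python, Lua and command-line interfaces; as an illustration of the set-algebra formalism the abstract describes, here is a minimal, self-contained Python sketch of two region operations (intersection and union) on sorted interval lists for a single chromosome. This is not GROK's API, just the underlying idea:

```python
def intersect(a, b):
    """Intersection of two sorted, non-overlapping region lists (half-open intervals)."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever region ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

def union(a, b):
    """Union of two region lists, merging overlapping or touching regions."""
    merged = []
    for start, end in sorted(a + b):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

peaks = [(100, 200), (300, 400)]   # e.g. ChIP-seq peaks (hypothetical)
genes = [(150, 350), (380, 500)]   # e.g. gene bodies (hypothetical)
print(intersect(peaks, genes))  # [(150, 200), (300, 350), (380, 400)]
print(union(peaks, genes))      # [(100, 500)]
```

Framing questions like "which peaks fall inside gene bodies?" as set operations is what lets a kit like GROK translate biomedical questions into composable computational steps.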
DenguePredict: An Integrated Drug Repositioning Approach towards Drug Discovery for Dengue.
Wang, QuanQiu; Xu, Rong
2015-01-01
Dengue is a viral disease of expanding global incidence without cures. Here we present a drug repositioning system (DenguePredict) leveraging upon a unique drug treatment database and vast amounts of disease- and drug-related data. We first constructed a large-scale genetic disease network with enriched dengue genetics data curated from biomedical literature. We applied a network-based ranking algorithm to find dengue-related diseases from the disease network. We then developed a novel algorithm to prioritize FDA-approved drugs from dengue-related diseases to treat dengue. When tested in a de-novo validation setting, DenguePredict found the only two drugs tested in clinical trials for treating dengue and ranked them highly: chloroquine ranked at top 0.96% and ivermectin at top 22.75%. We showed that drugs targeting immune systems and arachidonic acid metabolism-related apoptotic pathways might represent innovative drugs to treat dengue. In summary, DenguePredict, by combining comprehensive disease- and drug-related data and novel algorithms, may greatly facilitate drug discovery for dengue.
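The abstract does not specify the network-based ranking algorithm; random walk with restart is a common choice for ranking nodes in disease networks, sketched below on a toy network. The edges, node names and parameters are purely illustrative, not DenguePredict's actual data or method:

```python
# Hypothetical disease-similarity network as an adjacency dict.
graph = {
    "dengue":      ["malaria", "zika"],
    "malaria":     ["dengue", "zika"],
    "zika":        ["dengue", "malaria", "chikungunya"],
    "chikungunya": ["zika"],
}

def random_walk_with_restart(graph, seed, restart=0.3, iters=100):
    """Iterate the walk until scores stabilise; higher score = closer to seed."""
    nodes = list(graph)
    scores = {n: 1.0 if n == seed else 0.0 for n in nodes}
    for _ in range(iters):
        new = {}
        for n in nodes:
            # Probability mass flowing in from neighbours...
            inflow = sum(scores[m] / len(graph[m]) for m in graph if n in graph[m])
            # ...plus restart mass returned to the seed node.
            new[n] = (1 - restart) * inflow + (restart if n == seed else 0.0)
        scores = new
    return scores

scores = random_walk_with_restart(graph, "dengue")
related = sorted((n for n in scores if n != "dengue"), key=scores.get, reverse=True)
print(related)
```

The resulting ranking of dengue-related diseases is the kind of intermediate output a repositioning pipeline could then mine for drugs approved against the top-ranked neighbours.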
Tackling the challenges of matching biomedical ontologies.
Faria, Daniel; Pesquita, Catia; Mott, Isabela; Martins, Catarina; Couto, Francisco M; Cruz, Isabel F
2018-01-15
Biomedical ontologies pose several challenges to ontology matching due both to the complexity of the biomedical domain and to the characteristics of the ontologies themselves. The biomedical tracks in the Ontology Matching Evaluation Initiative (OAEI) have spurred the development of matching systems able to tackle these challenges, and benchmarked their general performance. In this study, we dissect the strategies employed by matching systems to tackle the challenges of matching biomedical ontologies and gauge the impact of the challenges themselves on matching performance, using the AgreementMakerLight (AML) system as the platform for this study. We demonstrate that the linear complexity of the hash-based searching strategy implemented by most state-of-the-art ontology matching systems is essential for matching large biomedical ontologies efficiently. We show that accounting for all lexical annotations (e.g., labels and synonyms) in biomedical ontologies leads to a substantial improvement in F-measure over using only the primary name, and that accounting for the reliability of different types of annotations generally also leads to a marked improvement. Finally, we show that cross-references are a reliable source of information and that, when using biomedical ontologies as background knowledge, it is generally more reliable to use them as mediators than to perform lexical expansion. We anticipate that translating traditional matching algorithms to the hash-based searching paradigm will be a critical direction for the future development of the field. Improving the evaluation carried out in the biomedical tracks of the OAEI will also be important, as without proper reference alignments there is only so much that can be ascertained about matching systems or strategies. 
Nevertheless, it is clear that, to tackle the various challenges posed by biomedical ontologies, ontology matching systems must be able to efficiently combine multiple strategies into a mature matching approach.
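The hash-based searching strategy discussed above can be illustrated with a small sketch: indexing every lexical annotation (label or synonym) of the target ontology in a hash table makes matching one dict lookup per source annotation, hence linear overall. The ontology fragments below are hypothetical, and this is not AML's actual implementation:

```python
def build_lexicon(ontology):
    """Map each normalised label or synonym to the class ids that carry it."""
    lexicon = {}
    for cls, names in ontology.items():
        for name in names:
            lexicon.setdefault(name.lower().strip(), set()).add(cls)
    return lexicon

def match(source, target):
    """Hash-based matching: one dict lookup per source annotation (linear time)."""
    target_lexicon = build_lexicon(target)
    alignment = set()
    for cls, names in source.items():
        for name in names:
            for tgt in target_lexicon.get(name.lower().strip(), ()):
                alignment.add((cls, tgt))
    return alignment

# Hypothetical fragments of two anatomy ontologies: class id -> [label, synonyms...].
onto_a = {"A:001": ["heart"], "A:002": ["hepatic artery", "arteria hepatica"]}
onto_b = {"B:101": ["Heart"], "B:202": ["arteria hepatica"]}

print(sorted(match(onto_a, onto_b)))
# [('A:001', 'B:101'), ('A:002', 'B:202')]
```

Note how the second pair is found only through a synonym, which is the point the abstract makes about the payoff of indexing all lexical annotations rather than primary names alone.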
Discovering Beaten Paths in Collaborative Ontology-Engineering Projects using Markov Chains
Walk, Simon; Singer, Philipp; Strohmaier, Markus; Tudorache, Tania; Musen, Mark A.; Noy, Natalya F.
2014-01-01
Biomedical taxonomies, thesauri and ontologies in the form of the International Classification of Diseases as a taxonomy or the National Cancer Institute Thesaurus as an OWL-based ontology, play a critical role in acquiring, representing and processing information about human health. With increasing adoption and relevance, biomedical ontologies have also significantly increased in size. For example, the 11th revision of the International Classification of Diseases, which is currently under active development by the World Health Organization, contains nearly 50,000 classes representing a vast variety of different diseases and causes of death. This evolution in terms of size was accompanied by an evolution in the way ontologies are engineered. Because no single individual has the expertise to develop such large-scale ontologies, ontology-engineering projects have evolved from small-scale efforts involving just a few domain experts to large-scale projects that require effective collaboration between dozens or even hundreds of experts, practitioners and other stakeholders. Understanding the way these different stakeholders collaborate will enable us to improve editing environments that support such collaborations. In this paper, we uncover how large ontology-engineering projects, such as the International Classification of Diseases in its 11th revision, unfold by analyzing usage logs of five different biomedical ontology-engineering projects of varying sizes and scopes using Markov chains. We discover intriguing interaction patterns (e.g., which properties users frequently change after specific given ones) that suggest that large collaborative ontology-engineering projects are governed by a few general principles that determine and drive development.
From our analysis, we identify commonalities and differences between different projects that have implications for project managers, ontology editors, developers and contributors working on collaborative ontology-engineering projects and tools in the biomedical domain. PMID:24953242
Discovering beaten paths in collaborative ontology-engineering projects using Markov chains.
Walk, Simon; Singer, Philipp; Strohmaier, Markus; Tudorache, Tania; Musen, Mark A; Noy, Natalya F
2014-10-01
Biomedical taxonomies, thesauri and ontologies in the form of the International Classification of Diseases as a taxonomy or the National Cancer Institute Thesaurus as an OWL-based ontology, play a critical role in acquiring, representing and processing information about human health. With increasing adoption and relevance, biomedical ontologies have also significantly increased in size. For example, the 11th revision of the International Classification of Diseases, which is currently under active development by the World Health Organization contains nearly 50,000 classes representing a vast variety of different diseases and causes of death. This evolution in terms of size was accompanied by an evolution in the way ontologies are engineered. Because no single individual has the expertise to develop such large-scale ontologies, ontology-engineering projects have evolved from small-scale efforts involving just a few domain experts to large-scale projects that require effective collaboration between dozens or even hundreds of experts, practitioners and other stakeholders. Understanding the way these different stakeholders collaborate will enable us to improve editing environments that support such collaborations. In this paper, we uncover how large ontology-engineering projects, such as the International Classification of Diseases in its 11th revision, unfold by analyzing usage logs of five different biomedical ontology-engineering projects of varying sizes and scopes using Markov chains. We discover intriguing interaction patterns (e.g., which properties users frequently change after specific given ones) that suggest that large collaborative ontology-engineering projects are governed by a few general principles that determine and drive development. 
From our analysis, we identify commonalities and differences between different projects that have implications for project managers, ontology editors, developers and contributors working on collaborative ontology-engineering projects and tools in the biomedical domain. Copyright © 2014 Elsevier Inc. All rights reserved.
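As a sketch of the Markov-chain analysis described above, the following estimates first-order transition probabilities from usage logs and answers a question of the kind the paper poses (which property do users most frequently change after a given one?). The action names and log sequences are invented for illustration, not taken from the paper's data:

```python
from collections import defaultdict

# Hypothetical per-session editing logs: sequences of change actions.
logs = [
    ["edit_label", "edit_definition", "edit_synonym", "save"],
    ["edit_label", "edit_synonym", "save"],
    ["edit_label", "edit_definition", "save"],
]

def transition_probabilities(logs):
    """First-order Markov chain: P(next action | current action)."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in logs:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for cur, nxts in counts.items()
    }

probs = transition_probabilities(logs)
# Which action most frequently follows editing a label?
after_label = max(probs["edit_label"], key=probs["edit_label"].get)
print(after_label)
```

Comparing such transition matrices across projects is one way the commonalities and differences mentioned above could be made quantitative.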
Investigation of laser polarized xenon magnetic resonance
NASA Technical Reports Server (NTRS)
Walsworth, Ronald L.
1998-01-01
Ground-based investigations of a new biomedical diagnostic technology, nuclear magnetic resonance of laser-polarized noble gases, are addressed. The specific research tasks discussed are: (1) development of a large-scale noble gas polarization system; (2) biomedical investigations using laser polarized noble gas in conventional (high magnetic field) NMR systems; and (3) the development and application of a low magnetic field system for laser polarized noble gas NMR.
Galli, Joakim; Oelrich, Johan; Taussig, Michael J.; Andreasson, Ulrika; Ortega-Paino, Eva; Landegren, Ulf
2015-01-01
We report the development of a new database of technology services and products for analysis of biobank samples in biomedical research. BARCdb, the Biobanking Analysis Resource Catalogue (http://www.barcdb.org), is a freely available web resource, listing expertise and molecular resource capabilities of research centres and biotechnology companies. The database is designed for researchers who require information on how to make best use of valuable biospecimens from biobanks and other sample collections, focusing on the choice of analytical techniques and the demands they make on the type of samples, pre-analytical sample preparation and amounts needed. BARCdb has been developed as part of the Swedish biobanking infrastructure (BBMRI.se), but now welcomes submissions from service providers throughout Europe. BARCdb can help match resource providers with potential users, stimulating transnational collaborations and ensuring compatibility of results from different labs. It can promote a more optimal use of European resources in general, both with respect to standard and more experimental technologies, as well as for valuable biobank samples. This article describes how information on service and reagent providers of relevant technologies is made available on BARCdb, and how this resource may contribute to strengthening biomedical research in academia and in the biotechnology and pharmaceutical industries. PMID:25336620
Structural biology computing: Lessons for the biomedical research sciences.
Morin, Andrew; Sliz, Piotr
2013-11-01
The field of structural biology, whose aim is to elucidate the molecular and atomic structures of biological macromolecules, has long been at the forefront of biomedical sciences in adopting and developing computational research methods. Operating at the intersection between biophysics, biochemistry, and molecular biology, structural biology's growth into a foundational framework on which many concepts and findings of molecular biology are interpreted has depended largely on parallel advancements in computational tools and techniques. Without these computing advances, modern structural biology would likely have remained an exclusive pursuit practiced by few, and not become the widely practiced, foundational field it is today. As other areas of biomedical research increasingly embrace research computing techniques, the successes, failures and lessons of structural biology computing can serve as a useful guide to progress in other biomedically related research fields. Copyright © 2013 Wiley Periodicals, Inc.
Toward a Cognitive Task Analysis for Biomedical Query Mediation
Hruby, Gregory W.; Cimino, James J.; Patel, Vimla; Weng, Chunhua
2014-01-01
In many institutions, data analysts use a Biomedical Query Mediation (BQM) process to facilitate data access for medical researchers. However, understanding of the BQM process is limited in the literature. To bridge this gap, we performed the initial steps of a cognitive task analysis using 31 BQM instances conducted between one analyst and 22 researchers in one academic department. We identified five top-level tasks, i.e., clarify research statement, explain clinical process, identify related data elements, locate EHR data element, and end BQM with either a database query or unmet, infeasible information needs, and 10 sub-tasks. We evaluated the BQM task model with seven data analysts from different clinical research institutions. Evaluators found all the tasks completely or semi-valid. This study contributes initial knowledge towards the development of a generalizable cognitive task representation for BQM. PMID:25954589
Biomedical Engineering curriculum at UAM-I: a critical review.
Martinez Licona, Fabiola; Azpiroz-Leehan, Joaquin; Urbina Medal, E Gerardo; Cadena Mendez, Miguel
2014-01-01
The Biomedical Engineering (BME) curriculum at Universidad Autónoma Metropolitana (UAM) has undergone at least four major transformations since the founding of the BME undergraduate program in 1974. This work is a critical assessment of the curriculum from the point of view of its results as derived from an analysis of, among other resources, institutional databases on students, graduates and their academic performance. The results of the evaluation can help us define admission policies as well as reasonable limits on the maximum duration of undergraduate studies. Other results linked to the faculty composition and the social environment can be used to define a methodology for the evaluation of teaching and the implementation of mentoring and tutoring programs. Changes resulting from this evaluation may be the only way to assure and maintain leadership and recognition from the BME community.
Multiple approaches to fine-grained indexing of the biomedical literature.
Névéol, Aurélie; Shooshan, Sonya E; Humphrey, Susanne M; Rindflesch, Thomas C; Aronson, Alan R
2007-01-01
The number of articles in the MEDLINE database is expected to increase tremendously in the coming years. To ensure that all these documents are indexed with continuing high quality, it is necessary to develop tools and methods that help the indexers in their daily task. We present three methods addressing a novel aspect of automatic indexing of the biomedical literature, namely producing MeSH main heading/subheading pair recommendations. The methods (dictionary-based, post-processing rules, and Natural Language Processing rules) are described and evaluated on a genetics-related corpus. The best overall performance is obtained for the subheading genetics (70% precision and 17% recall with post-processing rules; 48% precision and 37% recall with the dictionary-based method). Future work will address extending this work to all MeSH subheadings and a more thorough study of method combination.
A Scalable Data Access Layer to Manage Structured Heterogeneous Biomedical Data
Lianas, Luca; Frexia, Francesca; Zanetti, Gianluigi
2016-01-01
This work presents a scalable data access layer, called PyEHR, designed to support the implementation of data management systems for secondary use of structured heterogeneous biomedical and clinical data. PyEHR adopts the openEHR’s formalisms to guarantee the decoupling of data descriptions from implementation details and exploits structure indexing to accelerate searches. Data persistence is guaranteed by a driver layer with a common driver interface. Interfaces for two NoSQL Database Management Systems are already implemented: MongoDB and Elasticsearch. We evaluated the scalability of PyEHR experimentally through two types of tests, called “Constant Load” and “Constant Number of Records”, with queries of increasing complexity on synthetic datasets of ten million records each, containing very complex openEHR archetype structures, distributed on up to ten computing nodes. PMID:27936191
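The driver-layer design this abstract describes can be sketched as a small abstract interface with interchangeable backends; the class and method names below are hypothetical illustrations, not PyEHR's actual API:

```python
from abc import ABC, abstractmethod

class RecordDriver(ABC):
    """Common driver interface so the data access layer stays
    backend-agnostic. (Hypothetical names, not PyEHR's real API.)"""

    @abstractmethod
    def insert(self, record: dict) -> str: ...

    @abstractmethod
    def find(self, structure_id: str) -> list: ...

class InMemoryDriver(RecordDriver):
    """Stand-in backend; a MongoDB or Elasticsearch driver would
    implement the same two methods behind the same interface."""

    def __init__(self):
        self._store = {}

    def insert(self, record):
        rid = str(len(self._store))
        self._store[rid] = record
        return rid

    def find(self, structure_id):
        # Structure-indexing idea: filter on a precomputed structure key
        # instead of walking each archetype tree at query time.
        return [r for r in self._store.values()
                if r.get("structure_id") == structure_id]

driver = InMemoryDriver()
driver.insert({"structure_id": "s1", "ehr": "a"})
driver.insert({"structure_id": "s2", "ehr": "b"})
matches = driver.find("s1")
```

Swapping the persistence backend then only requires a new `RecordDriver` subclass, which is the decoupling the paper attributes to its driver layer.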
Crowdsourcing biomedical research: leveraging communities as innovation engines
Saez-Rodriguez, Julio; Costello, James C.; Friend, Stephen H.; Kellen, Michael R.; Mangravite, Lara; Meyer, Pablo; Norman, Thea; Stolovitzky, Gustavo
2018-01-01
The generation of large-scale biomedical data is creating unprecedented opportunities for basic and translational science. Typically, the data producers perform initial analyses, but it is very likely that the most informative methods may reside with other groups. Crowdsourcing the analysis of complex and massive data has emerged as a framework to find robust methodologies. When the crowdsourcing is done in the form of collaborative scientific competitions, known as Challenges, the validation of the methods is inherently addressed. Challenges also encourage open innovation, create collaborative communities to solve diverse and important biomedical problems, and foster the creation and dissemination of well-curated data repositories. PMID:27418159
Huang, Weixin; Li, Xiaohui; Wang, Yuanping; Yan, Xia; Wu, Siping
2017-12-01
Stress urinary incontinence (SUI) is a widespread complaint among adult women. Electroacupuncture has been widely applied in the treatment of SUI, but its efficacy has not been evaluated scientifically and systematically. Therefore, we provide a protocol for a systematic evaluation to assess the effectiveness and safety of electroacupuncture treatment in women with SUI. The retrieved databases include 3 English-language databases, namely PubMed, Embase, and the Cochrane Library, and 3 Chinese-language databases, namely the Chinese Biomedical Literature Database (CBM), China National Knowledge Infrastructure (CNKI), and the Wanfang Database. Randomized controlled trials (RCTs) of electroacupuncture treatment in women with SUI will be searched in the above-mentioned databases from the time each database was established to December 2017. The change from baseline in the amount of urine leakage, measured by the 1-hour pad test, will be accepted as the primary outcome. We will use RevMan V.5.3 software to perform the data synthesis when a meta-analysis is feasible. This study will provide a high-quality synthesis to assess the effectiveness and safety of electroacupuncture treatment in women with SUI. The conclusion of our systematic review will provide evidence to judge whether electroacupuncture is an effective intervention for women with SUI. PROSPERO CRD42017070947.
Industry careers for the biomedical engineer.
Munzner, Robert F
2004-01-01
This year's conference theme is "linkages for innovation in biomedicine." Biomedical engineers, especially those transitioning their careers from academic study into the medical device industry, will play a critical role in converting the fruits of scientific research into the reality of modern medical devices. This special session is organized to help biomedical engineers achieve their career goals more effectively. Participants will have opportunities to hear from and interact with leading industry experts on many issues. These may include, but are not limited to: 1) career paths for biomedical engineers (industrial, academic, or federal; technical vs. managerial track; small start-up or large established companies); 2) unique design challenges and regulatory requirements in medical device development; 3) aspects of a successful biomedical engineering job candidate (such as resume, interview, follow-up). Suggestions for other topics are welcome and should be directed to xkong@ieee.org. The distinguished panelists include: Xuan Kong, Ph.D., VP of Research, NEUROMetrix Inc, Waltham, MA; Robert F. Munzner, Ph.D., Medical Device Consultant, Doctor Device, Herndon, VA; Glen McLaughlin, Ph.D., VP of Engineering and CTO, Zonare Medical System Inc., Mountain View, CA; Grace Bartoo, Ph.D., RAC, General Manager, Decus Biomedical LLC, San Carlos, CA.
The importance of Zebrafish in biomedical research.
Tavares, Bárbara; Santos Lopes, Susana
2013-01-01
Zebrafish (Danio rerio) is an ideal model organism for the study of vertebrate development. This is due to the large clutches that each couple produces, with up to 200 embryos every 7 days, and to the fact that the embryos and larvae are small, transparent and undergo rapid external development. Using scientific literature research tools available online and the keywords zebrafish, biomedical research, human disease, and drug screening, we reviewed original studies and reviews indexed in PubMed. In this review we summarize work conducted with this model for the advancement of our knowledge related to several human diseases. We also focus on the biomedical research being performed in Portugal with the zebrafish model. Powerful live imaging and genetic tools are currently available for zebrafish, making it a valuable model in biomedical research. The combination of these properties with the optimization of automated systems for drug screening has transformed the zebrafish into "a top model" in biomedical research, drug discovery and toxicity testing. Furthermore, with the optimization of xenograft technology it will be possible to use zebrafish to aid in the choice of the best therapy for each patient. Zebrafish is an excellent model organism in biomedical research, drug development and clinical therapy.
A new visual navigation system for exploring biomedical Open Educational Resource (OER) videos
Zhao, Baoquan; Xu, Songhua; Lin, Shujin; Luo, Xiaonan; Duan, Lian
2016-01-01
Objective Biomedical videos as open educational resources (OERs) are increasingly proliferating on the Internet. Unfortunately, seeking personally valuable content from among the vast corpus of quality yet diverse OER videos is nontrivial due to limitations of today’s keyword- and content-based video retrieval techniques. To address this need, this study introduces a novel visual navigation system that facilitates users’ information seeking from biomedical OER videos in mass quantity by interactively offering visual and textual navigational clues that are both semantically revealing and user-friendly. Materials and Methods The authors collected and processed around 25 000 YouTube videos, which collectively last for a total length of about 4000 h, in the broad field of biomedical sciences for our experiment. For each video, its semantic clues are first extracted automatically through computationally analyzing audio and visual signals, as well as text either accompanying or embedded in the video. These extracted clues are subsequently stored in a metadata database and indexed by a high-performance text search engine. During the online retrieval stage, the system renders video search results as dynamic web pages using a JavaScript library that allows users to interactively and intuitively explore video content both efficiently and effectively. Results The authors produced a prototype implementation of the proposed system, which is publicly accessible at https://patentq.njit.edu/oer. To examine the overall advantage of the proposed system for exploring biomedical OER videos, the authors further conducted a user study of a modest scale. The study results encouragingly demonstrate the functional effectiveness and user-friendliness of the new system for facilitating information seeking from and content exploration among massive biomedical OER videos. 
Conclusion Using the proposed tool, users can efficiently and effectively find videos of interest, precisely locate video segments delivering personally valuable information, as well as intuitively and conveniently preview essential content of a single or a collection of videos. PMID:26335986
Research Trend Visualization by MeSH Terms from PubMed.
Yang, Heyoung; Lee, Hyuck Jai
2018-05-30
Motivation: PubMed is a primary source of biomedical information, comprising a search tool and the biomedical literature from MEDLINE (the US National Library of Medicine's premier bibliographic database), life science journals, and online books. Complementary tools to PubMed have been developed to help users search for literature and acquire knowledge. However, these tools are insufficient to overcome the difficulties users face given the proliferation of biomedical literature. A new method is needed for searching for knowledge in the biomedical field. Methods: A new method is proposed in this study for visualizing recent research trends based on the documents retrieved for a search query given by the user. Medical Subject Headings (MeSH) are used as the primary analytical element. MeSH terms are extracted from the literature and the correlations between them are calculated. A MeSH network, called MeSH Net, is generated as the final result based on the Pathfinder Network algorithm. Results: A case study to verify the proposed method was carried out on a research area defined by the search query (immunotherapy and cancer and "tumor microenvironment"). The MeSH Net generated by the method is in good agreement with the actual research activities in the area (immunotherapy). Conclusion: A prototype application generating MeSH Net was developed. The application, which could be used as a "guide map for travelers", allows users to quickly and easily grasp research trends. The combination of PubMed and MeSH Net is expected to be an effective complementary system for researchers in the biomedical field who experience difficulties with search and information analysis.
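The term-correlation step described in this abstract can be roughly illustrated by counting MeSH co-occurrences across retrieved documents and scoring term pairs; the Jaccard-style score and the toy document set below are illustrative assumptions, and the Pathfinder pruning that produces the final MeSH Net is not reproduced here:

```python
from itertools import combinations
from collections import Counter

# Each retrieved document is represented by its set of MeSH terms
# (invented examples matching the paper's case-study query).
docs = [
    {"Immunotherapy", "Neoplasms", "Tumor Microenvironment"},
    {"Immunotherapy", "Neoplasms", "T-Lymphocytes"},
    {"Neoplasms", "Tumor Microenvironment"},
]

pair_counts = Counter()
term_counts = Counter()
for terms in docs:
    term_counts.update(terms)
    pair_counts.update(frozenset(p) for p in combinations(sorted(terms), 2))

def jaccard(a, b):
    """Co-occurrence strength of two MeSH terms; a Pathfinder network
    would then prune this weighted graph to its salient links."""
    co = pair_counts[frozenset((a, b))]
    return co / (term_counts[a] + term_counts[b] - co)

w = jaccard("Immunotherapy", "Neoplasms")  # co-occur in 2 of the docs
```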
PathNER: a tool for systematic identification of biological pathway mentions in the literature
2013-01-01
Background Biological pathways are central to many biomedical studies and are frequently discussed in the literature. Several curated databases have been established to collate the knowledge of molecular processes constituting pathways. Yet, there has been little focus on enabling systematic detection of pathway mentions in the literature. Results We developed a tool, named PathNER (Pathway Named Entity Recognition), for the systematic identification of pathway mentions in the literature. PathNER is based on soft dictionary matching and rules, with the dictionary generated from public pathway databases. The rules utilise general pathway-specific keywords, syntactic information and gene/protein mentions. Detection results from both components are merged. On a gold-standard corpus, PathNER achieved an F1-score of 84%. To illustrate its potential, we applied PathNER on a collection of articles related to Alzheimer's disease to identify associated pathways, highlighting cases that can complement an existing manually curated knowledgebase. Conclusions In contrast to existing text-mining efforts that target the automatic reconstruction of pathway details from molecular interactions mentioned in the literature, PathNER focuses on identifying specific named pathway mentions. These mentions can be used to support large-scale curation and pathway-related systems biology applications, as demonstrated in the example of Alzheimer's disease. PathNER is implemented in Java and made freely available online at http://sourceforge.net/projects/pathner/. PMID:24555844
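A toy version of PathNER's two components, dictionary matching plus a pathway-keyword rule, might look like the sketch below; the dictionary entries, rule pattern, and stopword handling are simplified illustrations, not PathNER's actual resources:

```python
import re

# Toy dictionary; real entries would be generated from public pathway
# databases, as the paper describes.
DICTIONARY = {"wnt signaling pathway", "mapk signaling pathway"}

# Rule component: up to three words followed by the keyword "pathway".
RULE = re.compile(r"\b((?:[A-Za-z0-9/-]+ ){1,3}pathway)\b", re.I)

STOPWORDS = {"the", "a", "an", "this", "that", "and", "of", "to"}

def clean(phrase):
    """Strip leading stopwords captured by the greedy rule."""
    words = phrase.lower().split()
    while words and words[0] in STOPWORDS:
        words.pop(0)
    return " ".join(words)

def find_pathway_mentions(text):
    hits = set()
    low = text.lower()
    for entry in DICTIONARY:            # dictionary component
        if entry in low:
            hits.add(entry)
    for m in RULE.finditer(text):       # rule component
        hits.add(clean(m.group(1)))
    return hits                         # merged detection results

hits = find_pathway_mentions(
    "Tau phosphorylation is linked to the Wnt signaling pathway "
    "and the insulin signaling pathway.")
```

Note how the rule component recovers "insulin signaling pathway" even though it is absent from the dictionary, which mirrors the paper's motivation for combining both components.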
de Vries, Rob B M; Leenaars, Marlies; Tra, Joppe; Huijbregtse, Robbertjan; Bongers, Erik; Jansen, John A; Gordijn, Bert; Ritskes-Hoitinga, Merel
2015-07-01
An underexposed ethical issue raised by tissue engineering is the use of laboratory animals in tissue engineering research. Even though this research results in suffering and loss of life in animals, tissue engineering also has great potential for the development of alternatives to animal experiments. With the objective of promoting a joint effort of tissue engineers and alternative experts to fully realise this potential, this study provides the first comprehensive overview of the possibilities of using tissue-engineered constructs as a replacement of laboratory animals. Through searches in two large biomedical databases (PubMed, Embase) and several specialised 3R databases, 244 relevant primary scientific articles, published between 1991 and 2011, were identified. By far most articles reviewed related to the use of tissue-engineered skin/epidermis for toxicological applications such as testing for skin irritation. This review article demonstrates, however, that the potential for the development of alternatives also extends to other tissues such as other epithelia and the liver, as well as to other fields of application such as drug screening and basic physiology. This review discusses which impediments need to be overcome to maximise the contributions that the field of tissue engineering can make, through the development of alternative methods, to the reduction of the use and suffering of laboratory animals. Copyright © 2013 John Wiley & Sons, Ltd.
Jiang, Li; Wei, Yi-Liang; Zhao, Lei; Li, Na; Liu, Tao; Liu, Hai-Bo; Ren, Li-Jie; Li, Jiu-Ling; Hao, Hui-Fang; Li, Qing; Li, Cai-Xia
2018-07-01
Over the last decade, several panels of ancestry-informative markers have been proposed for the analysis of population genetic structure. The differentiation efficiency depends on the discriminatory ability of the included markers and the coverage of the reference population. We previously developed a small set of 27 autosomal single nucleotide polymorphisms (SNPs) for analyzing African, European, and East Asian ancestries. In the current study, we gathered a high-coverage reference database of 110 populations (10,350 individuals) from across the globe. The discrimination power of the panel was re-evaluated using four continental ancestry groups (as well as Indigenous Americans). We observed that all 27 SNPs demonstrated stratified population specificity, leading to striking ancestral discrimination. Five markers (rs728404, rs7170869, rs2470102, rs1448485, and rs4789193) showed differences (δ > 0.3) in frequency profiles between East Asian and Indigenous American populations. Ancestry components of all the populations involved were accurately assessed relative to previous genome-wide analyses, thereby achieving broad population separation. Thus, our ancestral inference panel, combining a small number of highly informative SNPs with a large-scale reference database, provides high resolution for estimating ancestry compositions and distinguishing individual origins. We propose its extensive use in biomedical studies and forensics. Copyright © 2018 Elsevier B.V. All rights reserved.
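The δ statistic used above is simply the absolute allele-frequency difference between two populations, so screening a panel for ancestry-informative markers reduces to a one-line filter. In this sketch the rs numbers come from the abstract, but the frequencies and the marker "rs_other" are invented for illustration:

```python
def informative_markers(freqs_a, freqs_b, threshold=0.3):
    """Return markers whose allele-frequency difference (delta)
    between two populations exceeds the threshold."""
    return [m for m in freqs_a
            if abs(freqs_a[m] - freqs_b[m]) > threshold]

# Illustrative frequencies only, not the study's measured values.
east_asian = {"rs728404": 0.70, "rs7170869": 0.62, "rs_other": 0.50}
indigenous_american = {"rs728404": 0.30, "rs7170869": 0.25, "rs_other": 0.45}

markers = informative_markers(east_asian, indigenous_american)
# deltas: 0.40 and 0.37 pass the 0.3 threshold; 0.05 does not
```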
Commercializing biomedical research through securitization techniques.
Fernandez, Jose-Maria; Stein, Roger M; Lo, Andrew W
2012-10-01
Biomedical innovation has become riskier, more expensive and more difficult to finance with traditional sources such as private and public equity. Here we propose a financial structure in which a large number of biomedical programs at various stages of development are funded by a single entity to substantially reduce the portfolio's risk. The portfolio entity can finance its activities by issuing debt, a critical advantage because a much larger pool of capital is available for investment in debt versus equity. By employing financial engineering techniques such as securitization, it can raise even greater amounts of more-patient capital. In a simulation using historical data for new molecular entities in oncology from 1990 to 2011, we find that megafunds of $5–15 billion may yield average investment returns of 8.9–11.4% for equity holders and 5–8% for 'research-backed obligation' holders, which are lower than typical venture-capital hurdle rates but attractive to pension funds, insurance companies and other large institutional investors.
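The diversification argument behind the megafund idea can be shown with a toy Monte Carlo sketch: pooling many independent programs narrows the distribution of portfolio returns, which is what makes debt ("research-backed obligations") feasible. All parameters below are invented for illustration and bear no relation to the paper's calibrated oncology data:

```python
import random

random.seed(0)

def portfolio_return(n_programs, p=0.1, payoff=15.0):
    """Toy model: each program costs 1 unit; a success pays `payoff`
    with probability p. Returns the per-unit portfolio return."""
    wins = sum(random.random() < p for _ in range(n_programs))
    return (wins * payoff - n_programs) / n_programs

small = [portfolio_return(5) for _ in range(2000)]    # few programs
large = [portfolio_return(150) for _ in range(2000)]  # megafund-scale

def spread(xs):
    """Crude dispersion measure: range of simulated returns."""
    return max(xs) - min(xs)
```

With 150 programs the simulated return distribution is far tighter than with 5, even though the expected return per program is identical in both cases.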
Collaborative mining and interpretation of large-scale data for biomedical research insights.
Tsiliki, Georgia; Karacapilidis, Nikos; Christodoulou, Spyros; Tzagarakis, Manolis
2014-01-01
Biomedical research is becoming increasingly interdisciplinary and collaborative in nature. Researchers need to efficiently and effectively collaborate and make decisions by meaningfully assembling, mining and analyzing available large-scale volumes of complex multi-faceted data residing in different sources. In line with related research directives revealing that, in spite of the recent advances in data mining and computational analysis, humans can easily detect patterns which computer algorithms may have difficulty in finding, this paper reports on the practical use of an innovative web-based collaboration support platform in a biomedical research context. Arguing that dealing with data-intensive and cognitively complex settings is not a technical problem alone, the proposed platform adopts a hybrid approach that builds on the synergy between machine and human intelligence to facilitate the underlying sense-making and decision-making processes. User experience shows that the platform enables more informed and quicker decisions by displaying aggregated information according to users' needs, while also exploiting the associated human intelligence. PMID:25268270
Biomedical information retrieval across languages.
Daumke, Philipp; Markó, Kornél; Poprat, Michael; Schulz, Stefan; Klar, Rüdiger
2007-06-01
This work presents a new dictionary-based approach to biomedical cross-language information retrieval (CLIR) that addresses many of the general and domain-specific challenges in current CLIR research. Our method is based on a multilingual lexicon that was generated partly manually and partly automatically, and currently covers six European languages. It contains morphologically meaningful word fragments, termed subwords. Using subwords instead of entire words significantly reduces the number of lexical entries necessary to sufficiently cover a specific language and domain. Mediation between queries and documents is based on these subwords as well as on lists of word-n-grams that are generated from large monolingual corpora and constitute possible translation units. The translations are then sent to a standard Internet search engine. This process makes our approach an effective tool for searching the biomedical content of the World Wide Web in different languages. We evaluate this approach using the OHSUMED corpus, a large medical document collection, within a cross-language retrieval setting.
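The subword idea this abstract describes can be sketched as greedy longest-match decomposition of compound terms into language-independent codes; the lexicon entries and codes below are invented illustrations, not the authors' actual multilingual resource:

```python
# Surface subword -> interlingual code (toy entries for English/German).
LEXICON = {
    "gastr": "#stomach", "enter": "#intestine", "itis": "#inflammation",
    "magen": "#stomach", "darm": "#intestine", "entzündung": "#inflammation",
}

def to_codes(word):
    """Greedy longest-match decomposition into subword codes;
    characters not covered by the lexicon are skipped."""
    word, codes, i = word.lower(), [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # longest match first
            if word[i:j] in LEXICON:
                codes.append(LEXICON[word[i:j]])
                i = j
                break
        else:
            i += 1
    return codes

# The English and German forms map to the same code sequence, so a
# query in one language can retrieve documents in the other.
en = to_codes("gastroenteritis")
de = to_codes("Magen-Darm-Entzündung")
```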
Organization and integration of biomedical knowledge with concept maps for key peroxisomal pathways.
Willemsen, A M; Jansen, G A; Komen, J C; van Hooff, S; Waterham, H R; Brites, P M T; Wanders, R J A; van Kampen, A H C
2008-08-15
One important area of clinical genomics research involves the elucidation of molecular mechanisms underlying (complex) disorders, which eventually may lead to new diagnostic or drug targets. To further advance this area of clinical genomics, one of the main challenges is the acquisition and integration of data, information and expert knowledge for specific biomedical domains and diseases. Currently the required information is not very well organized but scattered over biological and biomedical databases, basic text books, scientific literature and experts' minds, and may be highly specific, heterogeneous, complex and voluminous. We present a new framework to construct knowledge bases with concept maps for the presentation of information and the web ontology language OWL for the representation of information. We demonstrate this framework through the construction of a peroxisomal knowledge base, which focuses on four key peroxisomal pathways and several related genetic disorders. All 155 concept maps in our knowledge base are linked to at least one other concept map, which allows the visualization of one big network of related pieces of information. The peroxisome knowledge base is available from www.bioinformaticslaboratory.nl (Support --> Web applications). Supplementary data are available from www.bioinformaticslaboratory.nl (Research --> Output --> Publications --> KB_SuppInfo).
Text mining applications in psychiatry: a systematic literature review.
Abbe, Adeline; Grouin, Cyril; Zweigenbaum, Pierre; Falissard, Bruno
2016-06-01
The expansion of biomedical literature is creating the need for efficient tools to keep pace with increasing volumes of information. Text mining (TM) approaches are becoming essential to facilitate the automated extraction of useful biomedical information from unstructured text. We reviewed the applications of TM in psychiatry, and explored its advantages and limitations. A systematic review of the literature was carried out using the CINAHL, Medline, EMBASE, PsycINFO and Cochrane databases. In this review, 1103 papers were screened, and 38 were included as applications of TM in psychiatric research. Using TM and content analysis, we identified four major areas of application: (1) Psychopathology (i.e. observational studies focusing on mental illnesses) (2) the Patient perspective (i.e. patients' thoughts and opinions), (3) Medical records (i.e. safety issues, quality of care and description of treatments), and (4) Medical literature (i.e. identification of new scientific information in the literature). The information sources were qualitative studies, Internet postings, medical records and biomedical literature. Our work demonstrates that TM can contribute to complex research tasks in psychiatry. We discuss the benefits, limits, and further applications of this tool in the future. Copyright © 2015 John Wiley & Sons, Ltd.
Review of research grant allocation to psychosocial studies in diabetes research.
Jones, A; Vallis, M; Cooke, D; Pouwer, F
2016-12-01
To estimate and discuss the allocation of diabetes research funds to studies with a psychosocial focus. Annual reports and funded-research databases from approximately the last 5 years (where available) were reviewed for the following representative funding organizations: the American Diabetes Association, the Canadian Diabetes Association, Diabetes Australia, Diabetes UK, the Dutch Diabetes Research Foundation and the European Foundation for the Study of Diabetes, in order to estimate the overall proportion of funded studies with a psychosocial focus. An estimated mean of 8% of funded studies in our sample were found to have a psychosocial focus. The proportion of funded studies with a psychosocial focus was small, with an estimated mean ratio of 17:1 between funded biomedical and psychosocial studies in diabetes research. While several factors may account for this finding, the observation that 90% of funded studies are biomedical may be partly attributable to the methodological orthodoxy of applying biomedical reductionism to understand and treat disease. A more comprehensive and systemic whole-person approach in diabetes research, one that more closely resembles the complexity of human beings, is needed and may lead to improved care for individuals living with diabetes or at risk of diabetes. © 2016 Diabetes UK.
Reasons behind the participation in biomedical research: a brief review.
Dainesi, Sonia Mansoldo; Goldbaum, Moisés
2014-12-01
Clinical research is essential for the advancement of medicine, especially regarding the development of new drugs. Understanding the reasons behind patients' decisions to participate in these studies is critical for recruitment and retention in research. To examine the decision-making of participants in biomedical research, taking into account the different settings and environments in which clinical research is performed, a critical review of the literature was performed through several databases using the keywords "motivation", "decision", "reason", "biomedical research", "clinical research", "recruitment", "enrollment", "participation", "benefits", "altruism", "decline", "vulnerability" and "ethics", between August and November 2013, in English and in Portuguese. The review pointed out that reasons can differ according to characteristics such as the disease being treated, study phase, prognosis, and socioeconomic and cultural environment. Access to better health care, personal benefits, financial rewards and altruism are mentioned depending on the circumstances. Finding out more about individuals' reasons for taking part in research will allow clinical investigators to design studies of greater benefit to the community and will probably help to remove undesirable barriers to participation. Improving the information given to health care professionals and patients on the benefits and risks of clinical trials is certainly a good start.
A Simple and Practical Dictionary-based Approach for Identification of Proteins in Medline Abstracts
Egorov, Sergei; Yuryev, Anton; Daraselia, Nikolai
2004-01-01
Objective: The aim of this study was to develop a practical and efficient protein identification system for biomedical corpora. Design: The developed system, called ProtScan, utilizes a carefully constructed dictionary of mammalian proteins in conjunction with a specialized tokenization algorithm to identify and tag protein name occurrences in biomedical texts, and also takes advantage of Medline “Name-of-Substance” (NOS) annotation. The dictionaries for ProtScan were constructed in a semi-automatic way from various public-domain sequence databases, followed by an intensive expert curation step. Measurements: The recall and precision of the system were determined using 1,000 randomly selected and hand-tagged Medline abstracts. Results: The developed system is capable of identifying protein occurrences in Medline abstracts with 98% precision and 88% recall. It was also found to be capable of processing approximately 300 abstracts per second. Without utilization of NOS annotation, precision and recall were found to be 98.5% and 84%, respectively. Conclusion: The developed system appears to be well suited for protein-based Medline indexing and can help to improve biomedical information retrieval. Approaches for further improving ProtScan's recall are also discussed. PMID:14764613
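The precision and recall figures above follow the standard tagging-evaluation definitions; as a quick illustration of how they relate to error counts (the counts below are invented for the example, not ProtScan's actual evaluation data):

```python
def precision_recall_f1(tp, fp, fn):
    """Standard tagging-evaluation metrics from error counts."""
    precision = tp / (tp + fp)          # fraction of tagged names that are correct
    recall = tp / (tp + fn)             # fraction of true names that were tagged
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Invented counts for a hand-tagged evaluation set (not ProtScan's data).
p, r, f = precision_recall_f1(tp=880, fp=18, fn=120)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```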
Coiera, Enrico; Mandl, Kenneth D
2014-01-01
In 2014, the vast majority of published biomedical research is still hidden behind paywalls rather than open access. For more than a decade, similar restrictions over other digitally available content have engendered illegal activity. Music file sharing became rampant in the late 1990s as communities formed around new ways to share. The frequency and scale of cyber-attacks against commercial and government interests has increased dramatically. Massive troves of classified government documents have become public through the actions of a few. Yet we have not seen significant growth in the illegal sharing of peer-reviewed academic articles. Should we truly expect that biomedical publishing is somehow at less risk than other content-generating industries? What of the larger threat—a “Biblioleaks” event—a database breach and public leak of the substantial archives of biomedical literature? As the expectation that all research should be available to everyone becomes the norm for a younger generation of researchers and the broader community, the motivations for such a leak are likely to grow. We explore the feasibility and consequences of a Biblioleaks event for researchers, journals, publishers, and the broader communities of doctors and the patients they serve. PMID:24755534
Methodological Quality Assessment of Meta-Analyses of Hyperthyroidism Treatment.
Qin, Yahong; Yao, Liang; Shao, Feifei; Yang, Kehu; Tian, Limin
2018-01-01
Hyperthyroidism is a common condition that is associated with increased morbidity and mortality. A number of meta-analyses (MAs) have assessed therapeutic measures for hyperthyroidism, including antithyroid drugs, surgery, and radioiodine; however, their methodological quality has not been evaluated. This study evaluated the methodological quality and summarized the evidence obtained from MAs of hyperthyroidism treatments for radioiodine, antithyroid drugs, and surgery. We searched the PubMed, EMBASE, Cochrane Library, Web of Science, and Chinese Biomedical Literature Database databases. Two investigators independently assessed the titles and abstracts of the meta-analyses for inclusion. Methodological quality was assessed using the validated AMSTAR (Assessing the Methodological Quality of Systematic Reviews) tool. A total of 26 MAs fulfilled the inclusion criteria. Based on the AMSTAR scores, the average methodological quality score was 8.31, with large variability ranging from 4 to 11. The methodological quality of English meta-analyses was better than that of Chinese meta-analyses. Cochrane reviews had better methodological quality than non-Cochrane reviews due to better study selection and data extraction, the inclusion of unpublished studies, and better reporting of study characteristics. The authors did not report conflicts of interest in 53.8% of meta-analyses, and 19.2% did not report the harmful effects of treatment. Publication bias was not assessed in 38.5% of meta-analyses, and 19.2% did not report the follow-up time. This large-scale assessment of the methodological quality of meta-analyses of hyperthyroidism treatment highlighted methodological strengths and weaknesses. Consideration of scientific quality when formulating conclusions should be made explicit. Future meta-analyses should improve the reporting of conflicts of interest. © Georg Thieme Verlag KG Stuttgart · New York.
Shin, Dong Wook; Roter, Debra L; Roh, Yong Kyun; Hahm, Sang Keun; Cho, BeLong; Park, Hoon-Ki
2015-01-01
Female physicians have a more patient-centered communication style than their male counterparts; however, few studies have investigated how the biomedical or psychosocial nature of a patient diagnosis might moderate this relationship. Seventy-six third-year residents (50 males and 26 females) seeking board certification from the Korean Academy of Family Medicine participated in the 2013 Clinical Practice Examination by conducting two simulated patient (SP) interviews, one presenting a largely psychosocial case and the other a largely biomedical one. The interview recordings were coded with the Roter Interaction Analysis System (RIAS). Female physicians and their SPs engaged in more dialog than male physicians in both cases. Female physicians were more patient-centered than males for the psychosocial case (t = -3.24, P < 0.05); however, their scores did not differ for the biomedical case. In multivariate analysis, a significant interaction between physician gender and case (z = -3.90, P < 0.001) similarly demonstrated greater female patient-centeredness only for the predominantly psychosocial case. Case characteristics moderated the association between physician gender and patient-centeredness. Case characteristics need to be considered in future research on the association of physician gender and patient-centered communication, as well as in the tailoring of physician communication training. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
A Survey of Bioinformatics Database and Software Usage through Mining the Literature.
Duck, Geraint; Nenadic, Goran; Filannino, Michele; Brass, Andy; Robertson, David L; Stevens, Robert
2016-01-01
Computer-based resources are central to much, if not most, biological and medical research. However, while there is an ever-expanding choice of bioinformatics resources to use, described within the biomedical literature, little work to date has evaluated the full range of availability or levels of usage of database and software resources. Here we use text mining to process the PubMed Central full-text corpus, identifying mentions of databases or software within the scientific literature. We provide an audit of the resources contained within the biomedical literature, and a comparison of their relative usage, both over time and between the sub-disciplines of bioinformatics, biology and medicine. We find that trends in resource usage differ between these domains. The bioinformatics literature emphasises novel resource development, while database and software usage within biology and medicine is more stable and conservative. Many resources are mentioned only in the bioinformatics literature, with a relatively small number making it out into general biology, and fewer still into the medical literature. In addition, many resources are seeing a steady decline in their usage (e.g., BLAST, SWISS-PROT), though some are instead seeing rapid growth (e.g., the GO, R). We find a striking imbalance in resource usage, with the top 5% of resource names (133 names) accounting for 47% of total usage, and over 70% of extracted resources being mentioned only once each. While these results highlight the dynamic and creative nature of bioinformatics research, they raise questions about software reuse, choice and the sharing of bioinformatics practice. Is it acceptable that so many resources are apparently never reused? Finally, our work is a step towards automated extraction of scientific method from text. We make the dataset generated by our study available under the CC0 license here: http://dx.doi.org/10.6084/m9.figshare.1281371.
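Usage-concentration statistics of the kind reported above (top 5% of names covering 47% of mentions, most resources mentioned once) can be computed mechanically from a table of extracted mention counts. A minimal sketch on invented data, not the paper's corpus:

```python
from collections import Counter

def usage_stats(mentions, top_fraction=0.05):
    """Concentration of resource usage: share of all mentions taken by the
    top `top_fraction` of resource names, plus the singleton rate."""
    counts = Counter(mentions)
    ranked = sorted(counts.values(), reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    top_share = sum(ranked[:k]) / sum(ranked)
    singleton_rate = sum(1 for v in ranked if v == 1) / len(ranked)
    return top_share, singleton_rate

# Toy corpus of extracted resource-name mentions (invented, not the paper's data).
mentions = ["BLAST"] * 40 + ["R"] * 25 + ["GO"] * 15 + [f"tool{i}" for i in range(20)]
share, singles = usage_stats(mentions)
print(f"top-5% share={share:.2f}, singleton rate={singles:.2f}")
```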
In silico gene expression analysis – an overview
Murray, David; Doran, Peter; MacMathuna, Padraic; Moss, Alan C
2007-01-01
Efforts aimed at deciphering the molecular basis of complex disease are underpinned by the availability of high throughput strategies for the identification of biomolecules that drive the disease process. The completion of the human genome-sequencing project, coupled to major technological developments, has afforded investigators myriad opportunities for multidimensional analysis of biological systems. Nowhere has this research explosion been more evident than in the field of transcriptomics. Affordable access and availability to the technology that supports such investigations has led to a significant increase in the amount of data generated. As most biological distinctions are now observed at a genomic level, a large amount of expression information is now openly available via public databases. Furthermore, numerous computational based methods have been developed to harness the power of these data. In this review we provide a brief overview of in silico methodologies for the analysis of differential gene expression such as Serial Analysis of Gene Expression and Digital Differential Display. The performance of these strategies, at both an operational and result/output level is assessed and compared. The key considerations that must be made when completing an in silico expression analysis are also presented as a roadmap to facilitate biologists. Furthermore, to highlight the importance of these in silico methodologies in contemporary biomedical research, examples of current studies using these approaches are discussed. The overriding goal of this review is to present the scientific community with a critical overview of these strategies, so that they can be effectively added to the tool box of biomedical researchers focused on identifying the molecular mechanisms of disease. PMID:17683638
NASA Astrophysics Data System (ADS)
Dasari Shareena, Thabitha P.; McShan, Danielle; Dasmahapatra, Asok K.; Tchounwou, Paul B.
2018-07-01
Graphene-based nanomaterials (GBNs) have attracted increasing interest from the scientific community due to their unique physicochemical properties and their applications in biotechnology, biomedicine, bioengineering, disease diagnosis and therapy. Although a large amount of research has been conducted on these novel nanomaterials, few comprehensive reviews have been published on their biomedical applications and potential environmental and human health effects. The present work aimed to address this knowledge gap by examining and discussing: (1) the history, synthesis, structural properties and recent developments of GBNs for biomedical applications; (2) GBN uses as therapeutics, drug/gene delivery and antibacterial materials; (3) GBN applications in tissue engineering and in research as biosensors and bioimaging materials; and (4) GBN potential environmental effects and human health risks. It also discusses the perspectives and challenges associated with the biomedical applications of GBNs.
Space medicine research publications: 1984-1986
NASA Technical Reports Server (NTRS)
Wallace, Janice S.
1988-01-01
A list is given of the publications of investigators supported by the Biomedical Research and Clinical Medicine Programs of the Space Medicine and Biology Branch, Life Sciences Division, Office of Space Science and Applications. It includes publications entered into the Life Sciences Bibliographic Database by the George Washington University as of December 31, 1986. Publications are organized into the following subject areas: Clinical Medicine, Space Human Factors, Musculoskeletal, Radiation and Environmental Health, Regulatory Physiology, Neuroscience, and Cardiopulmonary.
Yan, Qing
2010-01-01
Bioinformatics is the rational study, at an abstract level, that can influence the way we understand biomedical facts and the way we apply biomedical knowledge. Bioinformatics is facing challenges in helping to find relationships between genetic structures and functions, analyzing genotype-phenotype associations, and understanding gene-environment interactions at the systems level. One of the most important issues in bioinformatics is data integration. The data integration methods introduced here can be used to organize and integrate both public and in-house data. Given the volume of data and its high complexity, computational decision support is essential for integrative transporter studies in pharmacogenomics, nutrigenomics, epigenetics, and systems biology. For the development of such a decision support system, object-oriented (OO) models can be constructed using the Unified Modeling Language (UML). A methodology is developed to build biomedical models at different system levels and construct corresponding UML diagrams, including use case diagrams, class diagrams, and sequence diagrams. By OO modeling using UML, the problems of transporter pharmacogenomics and systems biology can be approached from different angles with a more complete view, which may greatly enhance the efforts in effective drug discovery and development. Bioinformatics resources of membrane transporters and general bioinformatics databases and tools that are frequently used in transporter studies are also collected here. An informatics decision support system based on the models presented here is available at http://www.pharmtao.com/transporter . The methodology developed here can also be used for other biomedical fields.
The GAAIN Entity Mapper: An Active-Learning System for Medical Data Mapping.
Ashish, Naveen; Dewan, Peehoo; Toga, Arthur W
2015-01-01
This work is focused on mapping biomedical datasets to a common representation, as an integral part of data harmonization for integrated biomedical data access and sharing. We present GEM, an intelligent software assistant for automated data mapping across different datasets or from a dataset to a common data model. The GEM system automates data mapping by providing precise suggestions for data element mappings. It leverages the detailed metadata about elements in associated dataset documentation such as data dictionaries that are typically available with biomedical datasets. It employs unsupervised text mining techniques to determine similarity between data elements and also employs machine-learning classifiers to identify element matches. It further provides an active-learning capability where the process of training the GEM system is optimized. Our experimental evaluations show that the GEM system provides highly accurate data mappings (over 90% accuracy) for real datasets of thousands of data elements each, in the Alzheimer's disease research domain. Further, the effort in training the system for new datasets is also optimized. We are currently employing the GEM system to map Alzheimer's disease datasets from around the globe into a common representation, as part of a global Alzheimer's disease integrated data sharing and analysis network called GAAIN. GEM achieves significantly higher data mapping accuracy for biomedical datasets compared to other state-of-the-art tools for database schema matching that have similar functionality. With the use of active-learning capabilities, the user effort in training the system is minimal.
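The abstract above describes unsupervised text-mining over data-dictionary descriptions as one ingredient of GEM's mapping suggestions. A minimal bag-of-words similarity sketch of that general idea follows; the element descriptions are hypothetical, and GEM's actual features and classifiers are richer than this:

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_match(source_description, target_descriptions):
    """Suggest the target data element whose description is most similar."""
    return max(target_descriptions, key=lambda t: cosine(source_description, t))

# Hypothetical data-dictionary descriptions from two Alzheimer's datasets.
targets = [
    "age of subject at baseline visit in years",
    "mini mental state examination total score",
    "apolipoprotein e genotype of participant",
]
print(best_match("subject age in years at first visit", targets))
```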
Explorative search of distributed bio-data to answer complex biomedical questions
2014-01-01
Background The huge amount of biomedical-molecular data increasingly produced is providing scientists with potentially valuable information. Yet, such data quantity makes it difficult to find and extract the data that are most reliable and most related to the biomedical questions to be answered, which are increasingly complex and often involve many different biomedical-molecular aspects. Such questions can be addressed only by comprehensively searching and exploring different types of data, which frequently are ordered and provided by different data sources. Search Computing has been proposed for the management and integration of ranked results from heterogeneous search services. Here, we present its novel application to the explorative search of distributed biomedical-molecular data and the integration of the search results to answer complex biomedical questions. Results A set of available bioinformatics search services has been modelled and registered in the Search Computing framework, and a Bioinformatics Search Computing application (Bio-SeCo) using such services has been created and made publicly available at http://www.bioinformatics.deib.polimi.it/bio-seco/seco/. It offers an integrated environment which eases search, exploration and ranking-aware combination of heterogeneous data provided by the available registered services, and supplies global results that can support answering complex multi-topic biomedical questions. Conclusions By using Bio-SeCo, scientists can explore the very large and very heterogeneous biomedical-molecular data available. They can easily make different explorative search attempts, inspect obtained results, select the most appropriate, expand or refine them, and move forward and backward in the construction of a global complex biomedical query on multiple distributed sources that can eventually find the most relevant results. Thus, it provides extremely useful automated support for exploratory integrated bio search, which is fundamental for Life Science data-driven knowledge discovery. PMID:24564278
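The core Search Computing idea, ranking-aware combination of results from heterogeneous ranked services, can be sketched as a rank-blending join. This is illustrative only; the gene lists and the weighting scheme are invented, not Bio-SeCo's actual services or scoring:

```python
def join_ranked(a, b, weight_a=0.5):
    """Rank-aware join of two ranked lists of keys (best first).

    Keys appearing in both lists are kept, scored by blending their
    normalized positions, and re-ranked globally."""
    def scores(items):
        n = len(items)
        return {k: 1 - i / n for i, k in enumerate(items)}
    sa, sb = scores(a), scores(b)
    joined = {k: weight_a * sa[k] + (1 - weight_a) * sb[k] for k in sa if k in sb}
    return sorted(joined, key=joined.get, reverse=True)

# Hypothetical: genes ranked by an expression service and by a literature service.
expr = ["TP53", "BRCA1", "EGFR", "MYC"]
lit = ["EGFR", "TP53", "KRAS"]
print(join_ranked(expr, lit))
```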
NASA Astrophysics Data System (ADS)
Mylott, Elliot; Kutschera, Ellynne; Dunlap, Justin C.; Christensen, Warren; Widenhorn, Ralf
2016-04-01
We will describe a one-quarter pilot algebra-based introductory physics course for pre-health and life science majors. The course features videos with biomedical experts and cogent biomedically inspired physics content. The materials were used in a flipped classroom as well as an all-online environment, where students interacted with multimedia materials online prior to engaging in classroom activities. Pre-lecture questions on both the medical content covered in the videos and the physics concepts in the written material were designed to engage students and probe their understanding of physics. The course featured group discussion and peer-led instruction. Following in-class instruction, students engaged with homework assignments that explored the connections between physics and the medical field in a quantitative manner. Course surveys showed a positive response from the vast majority of students. Students largely indicated that the course helped them to make a connection between physics and the biomedical field. The biomedical focus and different course format were seen as an improvement over previous traditional physics instruction.
Envisioning the future of 'big data' biomedicine.
Bui, Alex A T; Van Horn, John Darrell
2017-05-01
Through the increasing availability of more efficient data collection procedures, biomedical scientists are now confronting ever larger sets of data, often finding themselves struggling to process and interpret what they have gathered even as still more data accumulates. This torrent of biomedical information necessitates creative thinking about how the data are being generated, how they might best be managed and analyzed, and eventually how they can be transformed into further scientific understanding for improving patient care. Recognizing this as a major challenge, the National Institutes of Health (NIH) has spearheaded the "Big Data to Knowledge" (BD2K) program - the agency's most ambitious biomedical informatics effort to date. In this commentary, we describe how the NIH has taken on "big data" science head-on, how a consortium of leading research centers is developing the means for handling large-scale data, and how such activities are being marshalled for the training of a new generation of biomedical data scientists. All in all, the NIH BD2K program seeks to position data science at the heart of 21st-century biomedical research. Copyright © 2017 Elsevier Inc. All rights reserved.
Enhancing biomedical text summarization using semantic relation extraction.
Shang, Yue; Li, Yanpeng; Lin, Hongfei; Yang, Zhihao
2011-01-01
Automatic text summarization for a biomedical concept can help researchers to get the key points of a certain topic from a large amount of biomedical literature efficiently. In this paper, we present a method for generating a text summary for a given biomedical concept, e.g., H1N1 disease, from multiple documents based on semantic relation extraction. Our approach includes three stages: 1) We extract semantic relations in each sentence using the semantic knowledge representation tool SemRep. 2) We develop a relation-level retrieval method to select the relations most relevant to each query concept and visualize them in a graphic representation. 3) For relations in the relevant set, we extract informative sentences that can interpret them from the document collection to generate a text summary using an information retrieval based method. Our major focus in this work is to investigate the contribution of semantic relation extraction to the task of biomedical text summarization. The experimental results on summarization for a set of diseases show that the introduction of semantic knowledge improves the performance, and our results are better than those of the MEAD system, a well-known tool for text summarization.
caGrid 1.0: An Enterprise Grid Infrastructure for Biomedical Research
Oster, Scott; Langella, Stephen; Hastings, Shannon; Ervin, David; Madduri, Ravi; Phillips, Joshua; Kurc, Tahsin; Siebenlist, Frank; Covitz, Peter; Shanbhag, Krishnakant; Foster, Ian; Saltz, Joel
2008-01-01
Objective To develop software infrastructure that will provide support for discovery, characterization, integrated access, and management of diverse and disparate collections of information sources, analysis methods, and applications in biomedical research. Design An enterprise Grid software infrastructure, called caGrid version 1.0 (caGrid 1.0), has been developed as the core Grid architecture of the NCI-sponsored cancer Biomedical Informatics Grid (caBIG™) program. It is designed to support a wide range of use cases in basic, translational, and clinical research, including 1) discovery, 2) integrated and large-scale data analysis, and 3) coordinated study. Measurements The caGrid is built as a Grid software infrastructure and leverages Grid computing technologies and the Web Services Resource Framework standards. It provides a set of core services, toolkits for the development and deployment of new community provided services, and application programming interfaces for building client applications. Results The caGrid 1.0 was released to the caBIG community in December 2006. It is built on open source components and caGrid source code is publicly and freely available under a liberal open source license. The core software, associated tools, and documentation can be downloaded from the following URL: https://cabig.nci.nih.gov/workspaces/Architecture/caGrid. Conclusions While caGrid 1.0 is designed to address use cases in cancer research, the requirements associated with discovery, analysis and integration of large scale data, and coordinated studies are common in other biomedical fields. In this respect, caGrid 1.0 is the realization of a framework that can benefit the entire biomedical community. PMID:18096909
Jin, Yinghui; Tian, Jinhui; Sun, Mei; Yang, Kehu
2011-02-01
The purpose of this systematic review was to establish whether warming the irrigation fluid could decrease the drop in core body temperature and the incidence of shivering and hypothermia. Irrigation fluid, which is used in large quantities at room temperature during endoscopic surgeries, is considered to be associated with hypothermia and shivering. It remains controversial whether using warmed irrigation fluid in place of room-temperature irrigation fluid decreases the drop in core body temperature and the occurrence of hypothermia. A comprehensive search (computerised database searches, footnote chasing, citation chasing) was undertaken to identify all randomised controlled trials that explored the temperature of irrigation fluid in endoscopic surgery, and a meta-analysis was performed. We searched PubMed, EMBASE, the Cochrane Library, SCI, China academic journals full-text databases, the Chinese Biomedical Literature Database, Chinese scientific journals databases and Chinese Medical Association Journals for trials that met the inclusion criteria. Study quality was assessed using standards recommended by the Cochrane Handbook 5.0.1. Disagreement was resolved by consensus. Thirteen randomised controlled trials including 686 patients were identified. The results showed that room-temperature irrigation fluid caused a greater drop in core body temperature than warmed irrigation fluid (p < 0.00001; I² = 85%). The occurrence of shivering [odds ratio (OR) 5.13, 95% CI: 2.95-10.19, p < 0.00001; I² = 0%] and hypothermia (OR 22.01, 95% CI: 2.03-197.08, p = 0.01; I² = 64%) was lower in the warmed-fluid groups than in the room-temperature groups. In endoscopic surgeries, warming the irrigation fluid is recommended to decrease the drop in core body temperature and the risk of perioperative shivering and hypothermia. Warming irrigation fluid should be considered standard practice in all endoscopic surgeries. © 2011 Blackwell Publishing Ltd.
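Pooled odds ratios with confidence intervals, as reported in the review above, come from standard meta-analytic pooling of per-trial 2×2 tables. A minimal fixed-effect (inverse-variance) sketch on invented trial counts, not the review's actual data:

```python
import math

def pooled_odds_ratio(trials):
    """Fixed-effect (inverse-variance) pooled OR from 2x2 tables.

    Each trial is (events_treat, n_treat, events_ctrl, n_ctrl).
    A 0.5 continuity correction is applied when any cell is zero."""
    weights, log_ors = [], []
    for a, n1, c, n2 in trials:
        b, d = n1 - a, n2 - c
        if 0 in (a, b, c, d):
            a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
        log_or = math.log((a * d) / (b * c))
        var = 1 / a + 1 / b + 1 / c + 1 / d   # variance of log OR (Woolf)
        weights.append(1 / var)
        log_ors.append(log_or)
    pooled = sum(w * lo for w, lo in zip(weights, log_ors)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return (math.exp(pooled),
            math.exp(pooled - 1.96 * se),
            math.exp(pooled + 1.96 * se))

# Hypothetical shivering counts: (events warmed, n warmed, events room-temp, n room-temp).
or_, lo, hi = pooled_odds_ratio([(2, 30, 10, 30), (1, 25, 8, 26)])
print(f"OR={or_:.2f} 95% CI ({lo:.2f}, {hi:.2f})")
```

With the warmed arm as "treatment", an OR below 1 here indicates fewer shivering events in the warmed-fluid groups.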
Zhou, Quan; Tao, Jing; Song, Huamei; Chen, Aihua; Yang, Huaijie; Zuo, Manzhen; Li, Hairong
2016-12-01
Kuntai capsule has been widely used for the treatment of menopausal syndrome in China for a long time. We conducted this review to assess the efficacy and safety of Kuntai capsule for the treatment of menopausal syndrome. We searched for studies in PubMed, ClinicalTrials, the Cochrane Library, the China National Knowledge Infrastructure Database (CNKI), the China Science and Technology Journal Database (VIP), the Wanfang Database and the Chinese Biomedical Literature Database (CBM) until November 20, 2014. Randomized trials of Kuntai capsule for menopausal syndrome, compared with placebo or hormone replacement therapy (HRT), were included. Two reviewers independently retrieved the randomized controlled trials (RCTs) and extracted the information. The Cochrane risk of bias method was used to assess the quality of the included studies, and a meta-analysis was conducted with Review Manager 5.2 software. A total of 17 RCTs (1455 participants) were included. The studies were of low methodological quality. Meta-analysis indicated that there was no statistical difference in the Kupperman index (KI) [WMD=0.51, 95% CI (-0.04, 1.06)], the effective rate of KI [OR=1.21, 95% CI (0.72, 2.04)], E2 level [WMD=-15.18, 95% CI (-33.93, 3.56)], or FSH level [WMD=-3.46, 95% CI (-7.2, 0.28)] after treatment between the Kuntai and HRT groups (P>0.05). However, compared with HRT, Kuntai capsule significantly reduced the total incidence of adverse events [OR=0.28, 95% CI (0.17, 0.45)]. Kuntai capsule may be effective for treating menopausal syndrome, with a lower risk of side effects. Because the studies we analyzed were of low methodological quality, more strictly designed large-scale randomized clinical trials are needed to evaluate the efficacy of Kuntai capsule in menopausal syndrome. Copyright © 2016 Elsevier Ltd. All rights reserved.
RysannMD: A biomedical semantic annotator balancing speed and accuracy.
Cuzzola, John; Jovanović, Jelena; Bagheri, Ebrahim
2017-07-01
Recently, both researchers and practitioners have explored the possibility of semantically annotating large and continuously evolving collections of biomedical texts such as research papers, medical reports, and physician notes in order to enable their efficient and effective management and use in clinical practice or research laboratories. Such annotations can be automatically generated by biomedical semantic annotators - tools that are specifically designed for detecting and disambiguating biomedical concepts mentioned in text. The biomedical community has already presented several solid automated semantic annotators. However, the existing tools are either strong in their disambiguation capacity, i.e., the ability to identify the correct biomedical concept for a given piece of text among several candidate concepts, or they excel in their processing time, i.e., work very efficiently, but none of the semantic annotation tools reported in the literature has both of these qualities. In this paper, we present RysannMD (Ryerson Semantic Annotator for Medical Domain), a biomedical semantic annotation tool that strikes a balance between processing time and performance while disambiguating biomedical terms. In other words, RysannMD provides reasonable disambiguation performance when choosing the right sense for a biomedical term in a given context, and does that in a reasonable time. To examine how RysannMD stands with respect to the state-of-the-art biomedical semantic annotators, we have conducted a series of experiments using standard benchmarking corpora, including both gold and silver standards, and four modern biomedical semantic annotators, namely cTAKES, MetaMap, NOBLE Coder, and Neji. The annotators were compared with respect to the quality of the produced annotations, measured against gold and silver standards using precision, recall, and F1 measure, and with respect to speed, i.e., processing time.
In the experiments, RysannMD achieved the best median F1 measure across the benchmarking corpora, independent of the standard used (silver/gold), biomedical subdomain, and document size. In terms of annotation speed, RysannMD scored the second-best median processing time across all the experiments. The obtained results indicate that RysannMD offers the best performance among the examined semantic annotators when both quality of annotation and speed are considered simultaneously. Copyright © 2017 Elsevier Inc. All rights reserved.
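Annotator quality in comparisons like this is typically scored by treating annotations as sets and computing precision, recall, and F1 against the reference standard. A minimal sketch (the span/CUI tuples below are invented, and the exact matching criteria used in the paper may differ):

```python
def prf1(gold, predicted):
    """Precision, recall and F1 for sets of annotations, where each
    annotation is e.g. a (start, end, concept_id) tuple."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)                       # exact-match true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# One correct annotation, one miss, one spurious prediction (hypothetical CUIs).
gold = {(0, 7, "C0020538"), (12, 20, "C0011849")}
pred = {(0, 7, "C0020538"), (25, 31, "C0027051")}
p, r, f = prf1(gold, pred)
```

Benchmarks often also report a relaxed variant that counts overlapping spans as matches; only the `tp` computation changes.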
Pang, Shuchao; Yu, Zhezhou; Orgun, Mehmet A
2017-03-01
Highly accurate classification of biomedical images is an essential task in the clinical diagnosis of numerous medical diseases identified from those images. Traditional image classification methods, which combine hand-crafted image feature descriptors with various classifiers, cannot effectively improve the accuracy rate or meet the high requirements of biomedical image classification. The same holds true for artificial neural network models that are trained directly on limited biomedical images, or that are used as a black box to extract deep features learned from another, distant dataset. In this study, we propose a highly reliable and accurate end-to-end classifier for all kinds of biomedical images via deep learning and transfer learning. We first apply a domain-transferred deep convolutional neural network to build a deep model, and then develop an overall deep learning architecture based on the raw pixels of the original biomedical images using supervised training. In our model, we do not need to manually design the feature space, seek an effective feature-vector classifier, or segment specific detection objects and image patches, which are the main technological difficulties in the adoption of traditional image classification methods. Moreover, we do not need to be concerned with whether there are large training sets of annotated biomedical images, affordable parallel computing resources featuring GPUs, or long waits to train a perfect deep model, which are the main problems in training deep neural networks for biomedical image classification as observed in recent works. With the utilization of a simple data augmentation method and fast convergence speed, our algorithm can achieve the best accuracy rate and outstanding classification ability for biomedical images. We have evaluated our classifier on several well-known public biomedical datasets and compared it with several state-of-the-art approaches.
We propose a robust automated end-to-end classifier for biomedical images based on a domain transferred deep convolutional neural network model that shows a highly reliable and accurate performance which has been confirmed on several public biomedical image datasets. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
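The core idea of such transfer learning, freezing a pretrained feature extractor and training only a small classification head on the target images, can be sketched in miniature. This is a toy illustration with a random frozen projection standing in for the pretrained convolutional backbone; it is not the authors' architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained backbone: a fixed (frozen) random projection
# from 64 raw pixels to a 16-dim feature vector. In real transfer learning
# this would be e.g. an ImageNet-trained network with its weights kept fixed.
W_frozen = rng.normal(size=(64, 16))

def features(x):
    # Frozen feature extractor (never updated during training).
    return np.maximum(x @ W_frozen, 0.0)

# Toy stand-in for a small labeled image set: two classes of flattened
# 8x8 "images" that differ only in mean intensity.
X = np.vstack([rng.normal(0.0, 1.0, size=(20, 64)),
               rng.normal(1.0, 1.0, size=(20, 64))])
y = np.array([0] * 20 + [1] * 20)

# Only the small logistic-regression head is trained, by gradient descent.
F = features(X)
w, bias = np.zeros(F.shape[1]), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(F @ w + bias)))   # predicted P(class 1)
    g = p - y                                   # logistic-loss gradient
    w -= 0.01 * F.T @ g / len(y)
    bias -= 0.01 * g.mean()

train_acc = ((p > 0.5) == y).mean()
```

Because only the head's parameters are updated, training needs far fewer labeled examples than learning the full feature hierarchy from scratch, which is the practical appeal for small biomedical datasets.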
XCEDE: An Extensible Schema For Biomedical Data
Gadde, Syam; Aucoin, Nicole; Grethe, Jeffrey S.; Keator, David B.; Marcus, Daniel S.; Pieper, Steve
2013-01-01
The XCEDE (XML-based Clinical and Experimental Data Exchange) XML schema, developed by members of the BIRN (Biomedical Informatics Research Network), provides an extensive metadata hierarchy for storing, describing and documenting the data generated by scientific studies. Currently at version 2.0, the XCEDE schema serves as a specification for the exchange of scientific data between databases, analysis tools, and web services. It provides a structured metadata hierarchy, storing information relevant to various aspects of an experiment (project, subject, protocol, etc.). Each hierarchy level also provides for the storage of data provenance information allowing for a traceable record of processing and/or changes to the underlying data. The schema is extensible to support the needs of various data modalities and to express types of data not originally envisioned by the developers. The latest version of the XCEDE schema and manual are available from http://www.xcede.org/ PMID:21479735
Stead, William W.; Miller, Randolph A.; Musen, Mark A.; Hersh, William R.
2000-01-01
The vision of integrating information—from a variety of sources, into the way people work, to improve decisions and process—is one of the cornerstones of biomedical informatics. Thoughts on how this vision might be realized have evolved as improvements in information and communication technologies, together with discoveries in biomedical informatics, have changed the art of the possible. This review identified three distinct generations of “integration” projects. First-generation projects create a database and use it for multiple purposes. Second-generation projects integrate by bringing information from various sources together through enterprise information architecture. Third-generation projects inter-relate disparate but accessible information sources to provide the appearance of integration. The review suggests that the ideas developed in the earlier generations have not been supplanted by ideas from subsequent generations. Instead, the ideas represent a continuum of progress along the three dimensions of workflow, structure, and extraction. PMID:10730596
Flowers, Natalie L
2010-01-01
CodeSlinger is a desktop application that was developed to aid medical professionals in the intertranslation, exploration, and use of biomedical coding schemes. The application was designed to provide a highly intuitive, easy-to-use interface that simplifies a complex business problem: a set of time-consuming, laborious tasks that were regularly performed by a group of medical professionals involving manually searching coding books, searching the Internet, and checking documentation references. A workplace observation session with a target user revealed the details of the current process and a clear understanding of the business goals of the target user group. These goals drove the design of the application's interface, which centers on searches for medical conditions and displays the codes found in the application's database that represent those conditions. The interface also allows the exploration of complex conceptual relationships across multiple coding schemes.
Formal ontology for natural language processing and the integration of biomedical databases.
Simon, Jonathan; Dos Santos, Mariana; Fielding, James; Smith, Barry
2006-01-01
The central hypothesis underlying this communication is that the methodology and conceptual rigor of a philosophically inspired formal ontology can bring significant benefits in the development and maintenance of application ontologies [A. Flett, M. Dos Santos, W. Ceusters, Some Ontology Engineering Procedures and their Supporting Technologies, EKAW2002, 2003]. This hypothesis has been tested in the collaboration between Language and Computing (L&C), a company specializing in software for supporting natural language processing especially in the medical field, and the Institute for Formal Ontology and Medical Information Science (IFOMIS), an academic research institution concerned with the theoretical foundations of ontology. In the course of this collaboration, L&C's ontology, LinKBase, which is designed to integrate and support reasoning across a plurality of external databases, has been subjected to a thorough auditing on the basis of the principles underlying IFOMIS's Basic Formal Ontology (BFO) [B. Smith, Basic Formal Ontology, 2002. http://ontology.buffalo.edu/bfo]. The goal is to transform a large terminology-based ontology into one with the ability to support reasoning applications. Our general procedure has been the implementation of a meta-ontological definition space in which the definitions of all the concepts and relations in LinKBase are standardized in the framework of first-order logic. In this paper we describe how this principles-based standardization has led to a greater degree of internal coherence of the LinKBase structure, and how it has facilitated the construction of mappings between external databases using LinKBase as a translation hub. We argue that the collaboration here described represents a new phase in the quest to solve the so-called "Tower of Babel" problem of ontology integration [F. Montayne, J. Flanagan, Formal Ontology: The Foundation for Natural Language Processing, 2003. http://www.landcglobal.com/].
Immobilization of biomolecules on the surface of inorganic nanoparticles for biomedical applications
Xing, Zhi-Cai; Chang, Yongmin; Kang, Inn-Kyu
2010-01-01
Various inorganic nanoparticles have been used for drug delivery, magnetic resonance and fluorescence imaging, and cell targeting owing to their unique properties, such as large surface area and efficient contrasting effect. In this review, we focus on the surface functionalization of inorganic nanoparticles via immobilization of biomolecules and the corresponding surface interactions with biocomponents. Applications of surface-modified inorganic nanoparticles in biomedical fields are also outlined. PMID:27877316
NPACT: Naturally Occurring Plant-based Anti-cancer Compound-Activity-Target database
Mangal, Manu; Sagar, Parul; Singh, Harinder; Raghava, Gajendra P. S.; Agarwal, Subhash M.
2013-01-01
Plant-derived molecules have been highly valued by biomedical researchers and pharmaceutical companies for developing drugs, as they are thought to be optimized during evolution. Therefore, we have collected and compiled a central resource, the Naturally Occurring Plant-based Anti-cancer Compound-Activity-Target database (NPACT, http://crdd.osdd.net/raghava/npact/), that gathers the information related to experimentally validated plant-derived natural compounds exhibiting anti-cancerous activity (in vitro and in vivo), to complement the other databases. It currently contains 1574 compound entries, and each record provides information on its structure, manually curated published data on in vitro and in vivo experiments along with references for users' referral, inhibitory values (IC50/ED50/EC50/GI50), properties (physical, elemental and topological), cancer types, cell lines, protein targets, commercial suppliers and drug likeness of compounds. NPACT can easily be browsed or queried using various options, and an online similarity tool has also been made available. Further, to facilitate retrieval of existing data, each record is hyperlinked to similar databases like SuperNatural, Herbal Ingredients’ Targets, Comparative Toxicogenomics Database, PubChem and NCI-60 GI50 data. PMID:23203877
The BioGRID interaction database: 2017 update
Chatr-aryamontri, Andrew; Oughtred, Rose; Boucher, Lorrie; Rust, Jennifer; Chang, Christie; Kolas, Nadine K.; O'Donnell, Lara; Oster, Sara; Theesfeld, Chandra; Sellam, Adnane; Stark, Chris; Breitkreutz, Bobby-Joe; Dolinski, Kara; Tyers, Mike
2017-01-01
The Biological General Repository for Interaction Datasets (BioGRID: https://thebiogrid.org) is an open access database dedicated to the annotation and archival of protein, genetic and chemical interactions for all major model organism species and humans. As of September 2016 (build 3.4.140), the BioGRID contains 1 072 173 genetic and protein interactions, and 38 559 post-translational modifications, as manually annotated from 48 114 publications. This dataset represents interaction records for 66 model organisms and represents a 30% increase compared to the previous 2015 BioGRID update. BioGRID curates the biomedical literature for major model organism species, including humans, with a recent emphasis on central biological processes and specific human diseases. To facilitate network-based approaches to drug discovery, BioGRID now incorporates 27 501 chemical–protein interactions for human drug targets, as drawn from the DrugBank database. A new dynamic interaction network viewer allows the easy navigation and filtering of all genetic and protein interaction data, as well as of bioactive compounds and their established targets. BioGRID data are directly downloadable without restriction in a variety of standardized formats and are freely distributed through partner model organism databases and meta-databases. PMID:27980099
LocSigDB: a database of protein localization signals
Negi, Simarjeet; Pandey, Sanjit; Srinivasan, Satish M.; Mohammed, Akram; Guda, Chittibabu
2015-01-01
LocSigDB (http://genome.unmc.edu/LocSigDB/) is a manually curated database of experimental protein localization signals for eight distinct subcellular locations, primarily in eukaryotic cells, with brief coverage of bacterial proteins. Proteins must be localized to their appropriate subcellular compartment to perform their desired function. Mislocalization of proteins to unintended locations is a causative factor in many human diseases; therefore, a collection of known sorting signals will help support many important areas of biomedical research. By performing an extensive literature study, we compiled a collection of 533 experimentally determined localization signals, along with the proteins that harbor such signals. Each signal in LocSigDB is annotated with its localization, source and PubMed references, and is linked to the proteins in the UniProt database that contain the same amino acid pattern as the given signal, along with organism information. From the LocSigDB webserver, users can download the whole database or browse/search the data using an intuitive query interface. To date, LocSigDB is the most comprehensive compendium of protein localization signals for eight distinct subcellular locations. Database URL: http://genome.unmc.edu/LocSigDB/ PMID:25725059
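Linking a signal to UniProt proteins that contain the same amino acid pattern amounts to motif matching over sequences. A minimal sketch using the well-known C-terminal KDEL ER-retention signal (the protein sequences below are invented, and LocSigDB's actual matching procedure may differ):

```python
import re

# A localization signal expressed as a regular expression over amino acids.
# KDEL at the C-terminus is the classic ER-retention signal; the sequences
# below are invented for illustration.
SIGNALS = {"ER_retention": re.compile(r"KDEL$")}

def match_signals(sequence, signals=SIGNALS):
    """Return the names of localization signals found in a protein sequence."""
    return [name for name, pattern in signals.items()
            if pattern.search(sequence)]

hits = match_signals("MKTAYIAKQRQISFVKDEL")    # matches ER_retention
misses = match_signals("MKTAYIAKQRQISFV")      # no signal present
```

Real signal patterns are often positional or degenerate (e.g. allowing alternative residues), which regular expressions over the one-letter amino acid alphabet express naturally.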
Hinton, Elizabeth G; Oelschlegel, Sandra; Vaughn, Cynthia J; Lindsay, J Michael; Hurst, Sachiko M; Earl, Martha
2013-01-01
This study utilizes an informatics tool to analyze a robust literature search service in an academic medical center library. Structured interviews with librarians were conducted focusing on the benefits of such a tool, expectations for performance, and visual layout preferences. The resulting application utilizes Microsoft SQL Server and .Net Framework 3.5 technologies, allowing for the use of a web interface. Customer tables and MeSH terms are included. The National Library of Medicine MeSH database and entry terms for each heading are incorporated, resulting in functionality similar to searching the MeSH database through PubMed. Data reports will facilitate analysis of the search service.
Pathology data integration with eXtensible Markup Language.
Berman, Jules J
2005-02-01
It is impossible to overstate the importance of XML (eXtensible Markup Language) as a data organization tool. With XML, pathologists can annotate all of their data (clinical and anatomic) in a format that can transform every pathology report into a database, without compromising narrative structure. The purpose of this manuscript is to provide an overview of XML for pathologists. Examples will demonstrate how pathologists can use XML to annotate individual data elements and to structure reports in a common format that can be merged with other XML files or queried using standard XML tools. This manuscript gives pathologists a glimpse into how XML allows pathology data to be linked to other types of biomedical data and reduces our dependence on centralized proprietary databases.
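As a small illustration of the idea, a pathology report marked up in XML keeps individual data elements queryable without losing the narrative. The element names and codes below are invented for illustration, not drawn from any standard pathology schema:

```python
import xml.etree.ElementTree as ET

# A minimal, invented pathology-report fragment with annotated data elements.
report_xml = """
<pathology_report id="S-0001">
  <specimen site="colon" procedure="biopsy"/>
  <diagnosis code="hypothetical:adenocarcinoma">Adenocarcinoma</diagnosis>
  <narrative>Sections show invasive adenocarcinoma ...</narrative>
</pathology_report>
"""

root = ET.fromstring(report_xml)
# Individual data elements stay queryable while the narrative is preserved.
site = root.find("specimen").get("site")
diagnosis = root.find("diagnosis").text
narrative = root.find("narrative").text
```

Because every report shares the same element structure, many such files can be merged or queried with standard XML tools (XPath, XSLT) exactly as the abstract describes.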
Sarrouti, Mourad; Ouatik El Alaoui, Said
2017-04-01
Passage retrieval, the identification of top-ranked passages that may contain the answer for a given biomedical question, is a crucial component of any biomedical question answering (QA) system. Passage retrieval in open-domain QA is a longstanding challenge widely studied over the last decades. However, it still requires further efforts in biomedical QA. In this paper, we present a new biomedical passage retrieval method based on sentence splitting with Stanford CoreNLP, a probabilistic information retrieval (IR) model and UMLS concepts. In the proposed method, we first use our document retrieval system, based on the PubMed search engine and UMLS similarity, to retrieve documents relevant to a given biomedical question. We then take the abstracts from the retrieved documents and use the Stanford CoreNLP sentence splitter to produce a set of sentences, i.e., candidate passages. Using stemmed words and UMLS concepts as features for the BM25 model, we finally compute the similarity scores between the biomedical question and each of the candidate passages and keep the N top-ranked ones. Experimental evaluations performed on large standard datasets, provided by the BioASQ challenge, show that the proposed method achieves good performance compared with the current state-of-the-art methods. The proposed method significantly outperforms the current state-of-the-art methods by an average of 6.84% in terms of mean average precision (MAP). We have proposed an efficient passage retrieval method which can be used to retrieve relevant passages in biomedical QA systems with high mean average precision. Copyright © 2017 Elsevier Inc. All rights reserved.
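The BM25 scoring step at the heart of such a passage retriever can be sketched with the standard Okapi formula. This toy version ranks whitespace-tokenized passages only; the paper's method additionally uses stemmed words and UMLS concepts as features:

```python
import math
from collections import Counter

def bm25_rank(question, passages, k1=1.2, b=0.75):
    """Rank candidate passages against a question with Okapi BM25
    over lowercased whitespace tokens."""
    docs = [p.lower().split() for p in passages]
    avgdl = sum(len(d) for d in docs) / len(docs)       # average passage length
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))       # document frequencies
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in set(question.lower().split()):
            if t not in tf:
                continue
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            score += (idf * tf[t] * (k1 + 1)
                      / (tf[t] + k1 * (1 - b + b * len(d) / avgdl)))
        scores.append(score)
    # Most relevant passage first.
    return [p for _, p in sorted(zip(scores, passages), key=lambda sp: -sp[0])]

candidates = ["insulin resistance is a cause of type 2 diabetes",
              "the library opens at nine"]
top = bm25_rank("what causes diabetes", candidates)
```

In the pipeline described above, `passages` would be the CoreNLP-split sentences from retrieved PubMed abstracts, and only the N top-ranked passages are kept.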
Medical research in Israel and the Israel biomedical database.
Berns, D S; Rager-Zisman, B
2000-11-01
The data collected for the second edition of the Directory of Medical Research in Israel and the Israel Biomedical Database have yielded very relevant information concerning the distribution of investigators, publication activities and funding sources. The aggregate data confirm the findings of the first edition published in 1996 [2]. Those facts endorse the highly concentrated and extensive nature of medical research in the Jerusalem area, which is conducted at the Hebrew University and its affiliated hospitals. In contrast, Tel Aviv University, whose basic research staff is about two-thirds the size of the Hebrew University staff, has a more diffuse relationship with its clinical staff who are located at more than half a dozen hospitals. Ben-Gurion University in Beer Sheva and the Technion in Haifa are smaller in size, but have closer geographic contact between their clinical and basic research staff. Nonetheless, all the medical schools and affiliated hospitals have good publication and funding records. It is important to note that while some aspects of the performance at basic research institutions seem to be somewhat better than at hospitals, the records are actually quite similar despite the greater burden of clinical services at the hospitals as compared to teaching responsibilities in the basic sciences. The survey also indicates the substantial number of young investigators in the latest survey who did not appear in the first survey. While this is certainly encouraging, it is also disturbing that the funding sources are apparently decreasing at a time when young investigators are attempting to become established and the increasing burden of health care costs precludes financial assistance from hospital sources. The intensity and undoubtedly the quality of medical research in Israel remains at a level consistent with many of the more advanced western countries. 
This conclusion is somewhat mitigated by the fact that there is a decrease in available funding and a measurable decrease in scholarly activity at a time when a new, younger generation of investigators is just beginning to become productive. In closing, we wish to stress that the collection of data for the Biomedical Database is a continuing project, and we encourage all medical researchers who may not have contributed relevant information to write to the Office of the Chief Scientist or contact the office by email.
López-Cózar, E D; Ruiz-Pérez, R; Jiménez-Contreras, E
1999-01-01
To evaluate the editorial quality, diffusion, relevance of the scientific content, and the publication practices of the specialised journal Revista Española de Enfermedades Digestivas. We checked 136 parameters based on ISO standards, the recommendations of scientific and editorial organisations, and studies of scientific editing and international publishing practices for biomedical journals. Diffusion was calculated using national and international databases, specialised libraries in Spain, and Internet sources. The analysis of the scientific content and publication practices was based on bibliometric indicators for the journal, authorship, and contributions. The sample for this study comprised six alternate issues of volume number 88 (1996), the last issue of this volume, and the first issue of volume 89 (1997). The samples used for the bibliometric analysis varied depending on the characteristics of specific indicators and the availability of information. The overall mean value for compliance with standards was 46.1%, while the real mean was calculated at 72.21%. The editorial procedures at the journal are similar to those of analogous international journals. The Revista Española de Enfermedades Digestivas is included in international databases of biomedical journals, and in the interdisciplinary international database SCI. It was found to be present in 70% of the medical libraries of Spanish universities, and in 73% of the hospital libraries studied. Bibliometric indicators showed co-authorship to be 5.5%; the origin of the authors grouped by province and by type of institutional affiliations showed 27.8% of all authors to be from Madrid, and that more were affiliated with general hospitals than with university hospitals. The mean delay between initial receipt of a manuscript and its publication was 300 days. Cocitation analysis gave the journal a central position amongst the 38 Spanish biomedical journals considered representative of the field. 
The journal's impact factor for 1996 was 0.260. The Revista Española de Enfermedades Digestivas is a high-quality vehicle of research results, and has acceptable internal editorial procedures. The journal is widely distributed, though its visibility on the Internet should be improved. Co-authorship is similar to that seen in other medical journals. Steps should be taken to make this journal better known within Spain, and to reduce the delay between the initial receipt and the final publication of manuscripts. Its impact factor is increasing steadily.
BioSYNTHESIS: access to a knowledge network of health sciences databases.
Broering, N C; Hylton, J S; Guttmann, R; Eskridge, D
1991-04-01
Users of the IAIMS Knowledge Network at the Georgetown University Medical Center have access to multiple in-house and external databases from a single point of entry through BioSYNTHESIS. The IAIMS project has developed a rich environment of biomedical information resources that represent a medical decision support system for campus physicians and students. The BioSYNTHESIS system is an information navigator that provides transparent access to a Knowledge Network of over a dozen databases. These multiple health sciences databases consist of bibliographic, informational, diagnostic, and research systems which reside on diverse computers such as DEC VAXs, SUN 490, AT&T 3B2s, Macintoshes, IBM PC/PS2s and the AT&T ISN and SYTEK network systems. Ethernet and TCP/IP protocols are used in the network architecture. BioSYNTHESIS also provides network links to the other campus libraries and to external institutions. As additional knowledge resources and technological advances have become available, BioSYNTHESIS has evolved from a two-phase to a three-phase program. Major components of the system including recent achievements and future plans are described.
Farahmand, Sina; Maghami, Mohammad Hossein; Sodagar, Amir M
2012-01-01
This paper reports on the design of a programmable, high output impedance, large voltage compliance microstimulator for low-voltage biomedical applications. A 6-bit binary-weighted digital to analog converter (DAC) is used to generate biphasic stimulus current pulses. A compact current mirror with large output voltage compliance and high output resistance conveys the current pulses to the target tissue. Designed and simulated in a standard 0.18µm CMOS process, the microstimulator circuit is capable of delivering a maximum stimulation current of 160µA to a 10-kΩ resistive load. Operated at a 1.8-V supply voltage, the output stage exhibits a voltage compliance of 1.69V and output resistance of 160MΩ at full scale stimulus current. Layout of the core microelectrode circuit measures 25.5µm×31.5µm.
Determination of solute descriptors by chromatographic methods.
Poole, Colin F; Atapattu, Sanka N; Poole, Salwa K; Bell, Andrea K
2009-10-12
The solvation parameter model is now well established as a useful tool for obtaining quantitative structure-property relationships for chemical, biomedical and environmental processes. The model correlates a free-energy related property of a system to six free-energy derived descriptors describing molecular properties. These molecular descriptors are defined as L (gas-liquid partition coefficient on hexadecane at 298K), V (McGowan's characteristic volume), E (excess molar refraction), S (dipolarity/polarizability), A (hydrogen-bond acidity), and B (hydrogen-bond basicity). McGowan's characteristic volume is trivially calculated from structure and the excess molar refraction can be calculated for liquids from their refractive index and easily estimated for solids. The remaining four descriptors are derived by experiment using (largely) two-phase partitioning, chromatography, and solubility measurements. In this article, the use of gas chromatography, reversed-phase liquid chromatography, micellar electrokinetic chromatography, and two-phase partitioning for determining solute descriptors is described. A large database of experimental retention factors and partition coefficients is constructed after first applying selection tools to remove unreliable experimental values and an optimized collection of varied compounds with descriptor values suitable for calibrating chromatographic systems is presented. These optimized descriptors are demonstrated to be robust and more suitable than other groups of descriptors characterizing the separation properties of chromatographic systems.
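The solvation parameter model itself is a linear free-energy relationship: a system property such as log k is modeled as c + eE + sS + aA + bB + vV (with lL replacing vV for gas-phase systems). A minimal sketch, with invented system constants and descriptor values rather than fitted ones:

```python
def log_k(solute, system):
    """Solvation parameter model (a linear free-energy relationship):
    log k = c + e*E + s*S + a*A + b*B + v*V for condensed-phase systems;
    gas-phase systems use l*L in place of v*V. All numbers below are
    invented for illustration, not fitted system constants."""
    return system["c"] + sum(system[x] * solute[x] for x in "ESABV")

# Hypothetical reversed-phase LC system constants and solute descriptors.
system = {"c": -0.50, "E": 0.20, "S": -0.60, "A": -0.30, "B": -1.80, "V": 2.10}
solute = {"E": 0.80, "S": 0.90, "A": 0.26, "B": 0.45, "V": 0.92}
retention = log_k(solute, system)   # predicted log retention factor
```

In practice the descriptors E, S, A, B (and L) are recovered the other way around, by fitting this linear model to measured retention factors on calibrated chromatographic systems, as the article describes.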
National Space Biomedical Research Institute Annual Report
NASA Technical Reports Server (NTRS)
2000-01-01
This report summarizes the activities of the National Space Biomedical Research Institute (NSBRI) during FY 2000. The NSBRI is responsible for the development of countermeasures against the deleterious effects of long-duration space flight and performs fundamental and applied space biomedical research directed towards this specific goal. Its mission is to lead a world-class, national effort in integrated, critical path space biomedical research that supports NASA's Human Exploration and Development of Space (HEDS) Strategic Plan by focusing on the enabling of long-term human presence in, development of, and exploration of space. This is accomplished by: designing, testing and validating effective countermeasures to address the biological and environmental impediments to long-term human space flight; defining the molecular, cellular, organ-level, integrated responses and mechanistic relationships that ultimately determine these impediments, where such activity fosters the development of novel countermeasures; establishing biomedical support technologies to maximize human performance in space, reduce biomedical hazards to an acceptable level, and deliver quality medical care; transferring and disseminating the biomedical advances in knowledge and technology acquired through living and working in space to the general benefit of mankind, including the treatment of patients suffering from gravity- and radiation-related conditions on Earth; and ensuring open involvement of the scientific community, industry and the public at large in the Institute's activities and fostering a robust collaboration with NASA, particularly through NASA's Lyndon B. Johnson Space Center. Attachment: Appendices A through P.
Patel, Vimla L; Yoskowitz, Nicole A; Arocha, Jose F; Shortliffe, Edward H
2009-02-01
Theoretical and methodological advances in the cognitive and learning sciences can greatly inform curriculum and instruction in biomedicine and also educational programs in biomedical informatics. They do so by addressing issues such as the processes related to comprehension of medical information, clinical problem-solving and decision-making, and the role of technology. This paper reviews these theories and methods from the cognitive and learning sciences and their role in addressing current and future needs in designing curricula, largely using illustrative examples drawn from medical education. The lessons of this past work are also applicable, however, to biomedical and health professional curricula in general, and to biomedical informatics training, in particular. We summarize empirical studies conducted over two decades on the role of memory, knowledge organization and reasoning as well as studies of problem-solving and decision-making in medical areas that inform curricular design. The results of this research contribute to the design of more informed curricula based on empirical findings about how people learn and think, and more specifically, how expertise is developed. Similarly, the study of practice can also help to shape theories of human performance, technology-based learning, and scientific and professional collaboration that extend beyond the domain of medicine. Just as biomedical science has revolutionized health care practice, research in the cognitive and learning sciences provides a scientific foundation for education in biomedicine, the health professions, and biomedical informatics.
NASA Technical Reports Server (NTRS)
Wilson, J. W.; Reginatto, M.; Hajnal, F.; Chun, S. Y.
1995-01-01
The Green's function for the transport of ions of high charge and energy is utilized with a nuclear fragmentation database to evaluate dose, dose equivalent, and RBE for C3H10T1/2 cell survival and neoplastic transformation as a function of depth in soft tissue. Such evaluations are useful for estimating biological risk for high-altitude aircraft, space operations, accelerator operations, and biomedical applications.
JoVE: the Journal of Visualized Experiments.
Vardell, Emily
2015-01-01
The Journal of Visualized Experiments (JoVE) is the world's first scientific video journal and is designed to communicate research and scientific methods in an innovative, intuitive way. JoVE includes a wide range of biomedical videos, from biology to immunology and bioengineering to clinical and translational medicine. This column describes the browsing and searching capabilities of JoVE, as well as its additional features (including the JoVE Scientific Education Database designed for students in scientific fields).
The Orthology Ontology: development and applications.
Fernández-Breis, Jesualdo Tomás; Chiba, Hirokazu; Legaz-García, María Del Carmen; Uchiyama, Ikuo
2016-06-04
Computational comparative analysis of multiple genomes provides valuable opportunities for biomedical research. In particular, orthology analysis can play a central role in comparative genomics; it guides the establishment of evolutionary relationships among the genes of different organisms and allows functional inference of gene products. However, the wide variation among current orthology databases necessitates research into the shareability of content that is generated by different tools and stored in different structures. Exchanging the content with other research communities requires making its meaning explicit. The need for a common ontology has led to the creation of the Orthology Ontology (ORTH), following best practices in ontology construction. Here, we describe our model and the major entities of the ontology, which is implemented in the Web Ontology Language (OWL), followed by an assessment of the quality of the ontology and the application of ORTH to existing orthology datasets. This shareable ontology makes it possible to develop Linked Orthology Datasets and a meta-predictor of orthology through a standardized representation of orthology databases. ORTH is freely available in OWL format to all users at http://purl.org/net/orth . The Orthology Ontology can serve as a framework for the semantic standardization of orthology content, and it will contribute to better exploitation of orthology resources in biomedical research. The results demonstrate the feasibility of developing shareable datasets using this ontology. Further applications will maximize the usefulness of this ontology.
Nguyen, Caroline T; MacEntee, Michael I; Mintzes, Barbara; Perry, Thomas L
2014-01-01
Over three-quarters of the older population take medications that can potentially cause dry mouth. Physicians or pharmacists rarely inform patients about this adverse effect and its potentially severe damage to the teeth, mouth and general health. The objectives of this study were to (1) identify warnings in the literature about dry mouth associated with the most frequently prescribed pharmaceutical products in Canada; and (2) consider how this information might be obtained by physicians, pharmacists and patients. Monographs on the 72 most frequently prescribed medications during 2010 were retrieved from the Compendium of Pharmaceuticals and Specialties (CPS, a standard drug information reference for physicians and pharmacists), the National Library of Medicine's 'DailyMed' database, directly from the manufacturers, and from a systematic search of biomedical journals. The CPS provided monographs for 43% of the medications, and requests to manufacturers produced the remaining monographs. Mentions of dry mouth were identified in 61% of the products (43% amongst CPS monographs; an additional 43% amongst manufacturers' monographs; 7% in the DailyMed database and 7% from biomedical journals); five medications had contradictory reports in different monographs. Nearly two-thirds (61%) of the most commonly prescribed medications can cause dry mouth, yet warnings about this adverse effect and its potentially serious consequences are not readily available to physicians, pharmacists, dentists or patients.
Abugessaisa, Imad; Saevarsdottir, Saedis; Tsipras, Giorgos; Lindblad, Staffan; Sandin, Charlotta; Nikamo, Pernilla; Ståhle, Mona; Malmström, Vivianne; Klareskog, Lars; Tegnér, Jesper
2014-01-01
Translational medicine is becoming increasingly dependent upon data generated from health care, clinical research, and molecular investigations. This increasing rate of production and diversity in data has brought about several challenges, including the need to integrate fragmented databases, enable secondary use of patient clinical data from health care in clinical research, and create information systems that clinicians and biomedical researchers can readily use. Our case study effectively integrates requirements from the clinical and biomedical researcher perspectives in a translational medicine setting. Our three principal achievements are (a) the design of a user-friendly web-based system for management and integration of clinical and molecular databases, while adhering to proper de-identification and security measures; (b) a real-world test of the system functionalities using clinical cohorts; and (c) system integration with a clinical decision support system to demonstrate system interoperability. We engaged two active clinical cohorts, 747 psoriasis patients and 2001 rheumatoid arthritis patients, to demonstrate efficient query possibilities across the data sources, enable cohort stratification, extract variation in antibody patterns, study biomarker predictors of treatment response in RA patients, and explore metabolic profiles of psoriasis patients. Finally, we demonstrated system interoperability by enabling integration with an established clinical decision support system in health care. To assure the usefulness and usability of the system, we followed two approaches. First, we created a graphical user interface supporting all user interactions. Second, we carried out a system performance evaluation study in which we measured the average response time in seconds for active users, HTTP errors, and kilobits per second received and sent. The maximum response time was found to be 0.12 seconds; no server or client errors of any kind were detected.
In conclusion, the system can readily be used by clinicians and biomedical researchers in a translational medicine setting. PMID:25203647
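The performance evaluation described in this record (per-request latency, averaged over active users) can be sketched as a simple timing harness. The handler and request values below are hypothetical stand-ins for a query against the integrated databases, not the authors' code:

```python
import time
import statistics

def measure_response_times(handler, requests):
    """Time each simulated request and return per-request latencies in seconds."""
    latencies = []
    for req in requests:
        start = time.perf_counter()
        handler(req)
        latencies.append(time.perf_counter() - start)
    return latencies

# Hypothetical stand-in for a query against the integrated clinical/molecular data.
def dummy_query(req):
    return {"query": req, "rows": []}

latencies = measure_response_times(dummy_query, ["psoriasis", "RA"] * 50)
print(f"mean={statistics.mean(latencies):.4f}s max={max(latencies):.4f}s")
```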
Müller, H-M; Van Auken, K M; Li, Y; Sternberg, P W
2018-03-09
The biomedical literature continues to grow at a rapid pace, making the challenge of knowledge retrieval and extraction ever greater. Tools that provide a means to search and mine the full text of literature thus represent an important way by which the efficiency of these processes can be improved. We describe the next generation of the Textpresso information retrieval system, Textpresso Central (TPC). TPC builds on the strengths of the original system by expanding the full text corpus to include the PubMed Central Open Access Subset (PMC OA), as well as the WormBase C. elegans bibliography. In addition, TPC allows users to create a customized corpus by uploading and processing documents of their choosing. TPC is UIMA compliant, to facilitate compatibility with external processing modules, and takes advantage of Lucene indexing and search technology for efficient handling of millions of full text documents. Like Textpresso, TPC searches can be performed using keywords and/or categories (semantically related groups of terms), but to provide better context for interpreting and validating queries, search results may now be viewed as highlighted passages in the context of full text. To facilitate biocuration efforts, TPC also allows users to select text spans from the full text and annotate them, create customized curation forms for any data type, and send resulting annotations to external curation databases. As an example of such a curation form, we describe integration of TPC with the Noctua curation tool developed by the Gene Ontology (GO) Consortium. Textpresso Central is an online literature search and curation platform that enables biocurators and biomedical researchers to search and mine the full text of literature by integrating keyword and category searches with viewing search results in the context of the full text. 
It also allows users to create customized curation interfaces, use those interfaces to make annotations linked to supporting evidence statements, and then send those annotations to any database in the world. Textpresso Central URL: http://www.textpresso.org/tpc.
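The keyword-and-category search that Textpresso Central performs over a Lucene index of millions of full-text documents can be illustrated at toy scale with a hand-rolled inverted index. The corpus, category labels, and term groupings below are invented for illustration; a "category" is, as in the abstract, a semantically related group of terms:

```python
from collections import defaultdict

# Toy corpus; TPC indexes millions of full-text articles with Lucene.
docs = {
    1: "daf-16 regulates lifespan in C. elegans",
    2: "insulin signaling modulates stress response",
    3: "textpresso indexes the full text of articles",
}

# A "category" groups semantically related terms under one label.
categories = {"gene": {"daf-16", "daf-2"}, "process": {"lifespan", "stress"}}

# Build an inverted index: token -> set of doc ids containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for token in text.lower().replace(".", " ").split():
        index[token].add(doc_id)

def search(keywords=(), category=None):
    """Return doc ids containing every keyword and, if given, any term of the category."""
    hits = set(docs)
    for kw in keywords:
        hits &= index.get(kw.lower(), set())
    if category is not None:
        cat_hits = set()
        for term in categories[category]:
            cat_hits |= index.get(term.lower(), set())
        hits &= cat_hits
    return sorted(hits)

print(search(keywords=["lifespan"], category="gene"))  # → [1]
```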
Integrating text mining into the MGI biocuration workflow
Dowell, K.G.; McAndrews-Hill, M.S.; Hill, D.P.; Drabkin, H.J.; Blake, J.A.
2009-01-01
A major challenge for functional and comparative genomics resource development is the extraction of data from the biomedical literature. Although text mining for biological data is an active research field, few applications have been integrated into production literature curation systems such as those of the model organism databases (MODs). Not only are most available biological natural language (bioNLP) and information retrieval and extraction solutions difficult to adapt to existing MOD curation workflows, but many also have high error rates or are unable to process documents available in those formats preferred by scientific journals. In September 2008, Mouse Genome Informatics (MGI) at The Jackson Laboratory initiated a search for dictionary-based text mining tools that we could integrate into our biocuration workflow. MGI has rigorous document triage and annotation procedures designed to identify appropriate articles about mouse genetics and genome biology. We currently screen ∼1000 journal articles a month for Gene Ontology terms, gene mapping, gene expression, phenotype data and other key biological information. Although we do not foresee that curation tasks will ever be fully automated, we are eager to implement named entity recognition (NER) tools for gene tagging that can help streamline our curation workflow and simplify gene indexing tasks within the MGI system. Gene indexing is an MGI-specific curation function that involves identifying which mouse genes are being studied in an article, then associating the appropriate gene symbols with the article reference number in the MGI database. Here, we discuss our search process, performance metrics and success criteria, and how we identified a short list of potential text mining tools for further evaluation. We provide an overview of our pilot projects with NCBO's Open Biomedical Annotator and Fraunhofer SCAI's ProMiner. 
In doing so, we prove the potential for the further incorporation of semi-automated processes into the curation of the biomedical literature. PMID:20157492
Wu, Tsung-Jung; Schriml, Lynn M.; Chen, Qing-Rong; Colbert, Maureen; Crichton, Daniel J.; Finney, Richard; Hu, Ying; Kibbe, Warren A.; Kincaid, Heather; Meerzaman, Daoud; Mitraka, Elvira; Pan, Yang; Smith, Krista M.; Srivastava, Sudhir; Ward, Sari; Yan, Cheng; Mazumder, Raja
2015-01-01
Bio-ontologies provide terminologies for the scientific community to describe biomedical entities in a standardized manner. There are multiple initiatives that are developing biomedical terminologies for the purpose of providing better annotation, data integration and mining capabilities. Terminology resources devised for multiple purposes inherently diverge in content and structure. A major issue of biomedical data integration is the development of overlapping terms, ambiguous classifications and inconsistencies represented across databases and publications. The disease ontology (DO) was developed over the past decade to address data integration, standardization and annotation issues for human disease data. We have established a DO cancer project to be a focused view of cancer terms within the DO. The DO cancer project mapped 386 cancer terms from the Catalogue of Somatic Mutations in Cancer (COSMIC), The Cancer Genome Atlas (TCGA), International Cancer Genome Consortium, Therapeutically Applicable Research to Generate Effective Treatments, Integrative Oncogenomics and the Early Detection Research Network into a cohesive set of 187 DO terms represented by 63 top-level DO cancer terms. For example, the COSMIC term ‘kidney, NS, carcinoma, clear_cell_renal_cell_carcinoma’ and TCGA term ‘Kidney renal clear cell carcinoma’ were both grouped to the term ‘Disease Ontology Identification (DOID):4467 / renal clear cell carcinoma’ which was mapped to the TopNodes_DOcancerslim term ‘DOID:263 / kidney cancer’. Mapping of diverse cancer terms to DO and the use of top level terms (DO slims) will enable pan-cancer analysis across datasets generated from any of the cancer term sources where pan-cancer means including or relating to all or multiple types of cancer. The terms can be browsed from the DO web site (http://www.disease-ontology.org) and downloaded from the DO’s Apache Subversion or GitHub repositories. Database URL: http://www.disease-ontology.org PMID:25841438
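The term-mapping example quoted in this abstract can be expressed as a two-step lookup (source vocabulary term → DO term → top-level DO slim term). Only the two quoted terms are included, so this is a sketch of the mapping's shape rather than the DO cancer project's actual tables:

```python
# Source-specific cancer terms mapped to a shared Disease Ontology term, then to
# a TopNodes_DOcancerslim term, following the example quoted in the abstract.
to_do_term = {
    "kidney, NS, carcinoma, clear_cell_renal_cell_carcinoma":  # COSMIC
        "DOID:4467 / renal clear cell carcinoma",
    "Kidney renal clear cell carcinoma":  # TCGA
        "DOID:4467 / renal clear cell carcinoma",
}
do_to_slim = {"DOID:4467 / renal clear cell carcinoma": "DOID:263 / kidney cancer"}

def pan_cancer_group(source_term):
    """Resolve a source vocabulary term to its top-level DO slim term."""
    return do_to_slim[to_do_term[source_term]]

print(pan_cancer_group("Kidney renal clear cell carcinoma"))  # → DOID:263 / kidney cancer
```

Grouping by the slim term is what lets records from COSMIC and TCGA land in the same bucket for pan-cancer analysis.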
Retractions in cancer research: a systematic survey.
Bozzo, Anthony; Bali, Kamal; Evaniew, Nathan; Ghert, Michelle
2017-01-01
The annual number of retracted publications in the scientific literature is rapidly increasing. The objective of this study was to determine the frequency of and reasons for retraction of cancer publications and to determine how journals in the cancer field handle retracted articles. We searched three online databases (MEDLINE, Embase, The Cochrane Library) from database inception until 2015 for retracted journal publications related to cancer research. For each article, the reason for retraction was categorized as plagiarism, duplicate publication, fraud, error, authorship issues, or ethical issues. Accessibility of the retracted article was defined as intact, removed, or available but with a watermark over each page. Descriptive data were collected on each retracted article, including number of citations, journal name and impact factor, study design, and time between publication and retraction. The publications were screened in duplicate, and two reviewers extracted and categorized the data. Following the database search and article screening, we identified 571 retracted cancer publications. The majority (76.4%) of cancer retractions were issued in the most recent decade, with 16.6% and 6.7% of the retractions issued in the two prior decades, respectively. Retractions were issued by journals with impact factors ranging from 0 (discontinued) to 55.8. The average impact factor was 5.4 (median 3.54, IQR 1.8-5.5). On average, a retracted article was cited 45 times (median 18, IQR 6-51), with a range of 0-742. Reasons for retraction included plagiarism (14.4%), fraud (28.4%), duplicate publication (18.2%), error (24.2%), authorship issues (3.9%), and ethical issues (2.1%). The reason for retraction was not stated in 9.8% of cases. Twenty-nine percent of retracted articles remain available online in their original form. Retractions in cancer research are increasing in frequency at a rate similar to retractions across biomedical research as a whole. Cancer retractions are largely due to academic misconduct.
Consequences to cancer patients, the public at large, and the research community can be substantial and should be addressed with future research. Despite the implications of this important issue, some cancer journals currently fall short of the current guidelines for clearly stating the reason for retraction and identifying the publication as retracted.
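Summary statistics of the kind reported in this survey (mean, median, and interquartile range of citation counts) can be computed with the standard library. The citation counts below are illustrative, not the study's actual data:

```python
import statistics

# Illustrative (not the study's actual) citation counts for retracted articles.
citations = [0, 3, 6, 12, 18, 25, 51, 90, 742]

mean_c = statistics.mean(citations)
median_c = statistics.median(citations)
q1, _, q3 = statistics.quantiles(citations, n=4)  # quartile cut points
print(f"mean={mean_c:.1f} median={median_c} IQR=({q1}, {q3})")
```

A skewed distribution like this is why the study reports both the mean (45) and the much lower median (18).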
Biomedical data mining in clinical routine: expanding the impact of hospital information systems.
Müller, Marcel; Markó, Kornel; Daumke, Philipp; Paetzold, Jan; Roesner, Arnold; Klar, Rüdiger
2007-01-01
In this paper we describe how the promising technology of biomedical data mining can improve the use of hospital information systems. A large set of unstructured, narrative clinical documents from a dermatological university hospital, such as discharge letters and other dermatological reports, was processed through a morpho-semantic text retrieval engine ("MorphoSaurus"), integrated with other clinical data through a web-based interface, and brought into daily clinical routine. The user evaluation showed very high user acceptance; the system appears to meet clinicians' requirements for vertical data mining in the electronic patient record. What emerges is the need to integrate biomedical data mining into hospital information systems for clinical, scientific, educational and economic reasons.
Xu, Rong; Wang, QuanQiu
2014-01-15
Background: Independent data sources can be used to augment post-marketing drug safety signal detection. The vast amount of publicly available biomedical literature contains rich side effect information for drugs at all clinical stages. In this study, we present a large-scale signal boosting approach that combines over 4 million records in the US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) and over 21 million biomedical articles. Results: The datasets are comprised of 4,285,097 records from FAERS and 21,354,075 MEDLINE articles. We first extracted all drug-side effect (SE) pairs from FAERS. Our study implemented a total of seven signal ranking algorithms. We then compared these different ranking algorithms before and after they were boosted with signals from MEDLINE sentences or abstracts. Finally, we manually curated all drug-cardiovascular (CV) pairs that appeared in both data sources and investigated whether our approach can detect many true signals that have not been included in FDA drug labels. We extracted a total of 2,787,797 drug-SE pairs from FAERS with a low initial precision of 0.025. The ranking algorithm combined signals from both FAERS and MEDLINE, significantly improving the precision from 0.025 to 0.371 for top-ranked pairs, representing a 13.8-fold elevation in precision. We showed by manual curation that drug-SE pairs that appeared in both data sources were highly enriched with true signals, many of which have not yet been included in FDA drug labels. Conclusions: We have developed an efficient and effective drug safety signal ranking and strengthening approach. We demonstrate that combining information from FAERS and the biomedical literature at large scale can significantly contribute to drug safety surveillance. PMID:24428898
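The precision figures reported for top-ranked pairs are of the precision-at-top-k kind; a minimal sketch follows, with hypothetical drug-side-effect pairs and a hypothetical reference set (none of these names come from FAERS or the study):

```python
def precision_at_k(ranked_pairs, known_true, k):
    """Fraction of the k top-ranked drug-side-effect pairs found in a reference set."""
    top = ranked_pairs[:k]
    return sum(pair in known_true for pair in top) / k

# Hypothetical ranked output and reference labels, for illustration only.
ranked = [("drugA", "QT prolongation"), ("drugB", "nausea"),
          ("drugC", "rash"), ("drugD", "dizziness")]
truth = {("drugA", "QT prolongation"), ("drugC", "rash")}
print(precision_at_k(ranked, truth, 2))  # → 0.5
```

Boosting a ranking with literature signals aims to push true pairs toward the top, raising exactly this metric.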
Covalent Organic Frameworks: From Materials Design to Biomedical Application
Zhao, Fuli; Liu, Huiming; Mathe, Salva D. R.; Dong, Anjie
2017-01-01
Covalent organic frameworks (COFs) are newly emerged crystalline porous polymers with well-defined skeletons and nanopores, mainly consisting of light-weight elements (H, B, C, N and O) linked by dynamic covalent bonds. Compared with conventional materials, COFs possess some unique and attractive features, such as large surface area, pre-designable pore geometry, excellent crystallinity, inherent adaptability and high flexibility in structural and functional design, and thus exhibit great potential for various applications. In particular, their large surface area, tunable porosity and π conjugation with unique photoelectric properties enable COFs to serve as a promising platform for drug delivery, bioimaging, biosensing and theranostic applications. In this review, we trace the evolution of COFs in terms of linkages and highlight important issues of synthetic method, structural design, morphological control and functionalization. We then summarize recent advances of COFs in the biomedical and pharmaceutical sectors and conclude with a discussion of the challenges and opportunities of COFs for biomedical purposes. Although the field is still in its infancy, COFs as an innovative source have paved a new way to meet future challenges in human healthcare and disease theranostics. PMID:29283423
Opal web services for biomedical applications.
Ren, Jingyuan; Williams, Nadya; Clementi, Luca; Krishnan, Sriram; Li, Wilfred W
2010-07-01
Biomedical applications have become increasingly complex, and they often require large-scale high-performance computing resources with a large number of processors and memory. The complexity of application deployment and the advances in cluster, grid and cloud computing require new modes of support for biomedical research. Scientific Software as a Service (sSaaS) enables scalable and transparent access to biomedical applications through simple standards-based Web interfaces. Towards this end, we built a production web server (http://ws.nbcr.net) in August 2007 to support the bioinformatics application called MEME. The server has grown since to include docking analysis with AutoDock and AutoDock Vina, electrostatic calculations using PDB2PQR and APBS, and off-target analysis using SMAP. All the applications on the servers are powered by Opal, a toolkit that allows users to wrap scientific applications easily as web services without any modification to the scientific codes, by writing simple XML configuration files. Opal allows both web forms-based access and programmatic access of all our applications. The Opal toolkit currently supports SOAP-based Web service access to a number of popular applications from the National Biomedical Computation Resource (NBCR) and affiliated collaborative and service projects. In addition, Opal's programmatic access capability allows our applications to be accessed through many workflow tools, including Vision, Kepler, Nimrod/K and VisTrails. From mid-August 2007 to the end of 2009, we have successfully executed 239,814 jobs. The number of successfully executed jobs more than doubled from 205 to 411 per day between 2008 and 2009. The Opal-enabled service model is useful for a wide range of applications. It provides for interoperation with other applications with Web Service interfaces, and allows application developers to focus on the scientific tool and workflow development. Web server availability: http://ws.nbcr.net.
Carmen Legaz-García, María Del; Miñarro-Giménez, José Antonio; Menárguez-Tortosa, Marcos; Fernández-Breis, Jesualdo Tomás
2016-06-03
Biomedical research usually requires combining large volumes of data from multiple heterogeneous sources, which makes the integrated exploitation of such data difficult. The Semantic Web paradigm offers a natural technological space for data integration and exploitation by generating content readable by machines. Linked Open Data is a Semantic Web initiative that promotes the publication and sharing of data in machine-readable semantic formats. We present an approach for the transformation and integration of heterogeneous biomedical data with the objective of generating open biomedical datasets in Semantic Web formats. The transformation of the data is based on mappings between the entities of the data schema and the ontological infrastructure that provides the meaning of the content. Our approach permits different types of mappings and includes the possibility of defining complex transformation patterns. Once the mappings are defined, they can be automatically applied to datasets to generate logically consistent content, and the mappings can be reused in further transformation processes. The results of our research are (1) a common transformation and integration process for heterogeneous biomedical data; (2) the application of Linked Open Data principles to generate interoperable, open, biomedical datasets; and (3) a software tool, called SWIT, that implements the approach. In this paper we also describe how we have applied SWIT in different biomedical scenarios and some lessons learned. We have presented an approach that is able to generate open biomedical repositories in Semantic Web formats. SWIT is able to apply the Linked Open Data principles in the generation of the datasets, thus allowing their content to be linked to external repositories and enabling the creation of linked open datasets. SWIT datasets may contain data from multiple sources and schemas, thus becoming integrated datasets.
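The mapping-driven transformation that SWIT performs can be sketched as emitting triples from flat records according to a field-to-property mapping. The namespace, field names, and mapping below are hypothetical, and SWIT's actual mapping language supports far richer transformation patterns than this one-to-one case:

```python
# Hypothetical field-to-property mapping; SWIT's mapping language is richer.
EX = "http://example.org/"

mapping = {"gene_symbol": EX + "hasSymbol", "organism": EX + "fromOrganism"}

def record_to_triples(record_id, record):
    """Emit N-Triples lines for the mapped fields of one flat record."""
    subject = f"<{EX}record/{record_id}>"
    triples = []
    for field, value in record.items():
        if field in mapping:
            triples.append(f'{subject} <{mapping[field]}> "{value}" .')
    return triples

triples = record_to_triples(42, {"gene_symbol": "TP53", "organism": "Homo sapiens"})
print("\n".join(triples))
```

Because the output is plain RDF, the generated content can be linked to external repositories, which is the Linked Open Data step the abstract describes.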
BIOSSES: a semantic sentence similarity estimation system for the biomedical domain.
Sogancioglu, Gizem; Öztürk, Hakime; Özgür, Arzucan
2017-07-15
The amount of information available in textual format is rapidly increasing in the biomedical domain. Therefore, natural language processing (NLP) applications are becoming increasingly important to facilitate the retrieval and analysis of these data. Computing the semantic similarity between sentences is an important component in many NLP tasks, including text retrieval and summarization. A number of approaches have been proposed for semantic sentence similarity estimation for generic English. However, our experiments showed that such approaches do not effectively cover biomedical knowledge and produce poor results for biomedical text. We propose several approaches for sentence-level semantic similarity computation in the biomedical domain, including string similarity measures and measures based on the distributed vector representations of sentences learned in an unsupervised manner from a large biomedical corpus. In addition, ontology-based approaches are presented that utilize general and domain-specific ontologies. Finally, a supervised regression-based model is developed that effectively combines the different similarity computation metrics. A benchmark data set consisting of 100 sentence pairs from the biomedical literature was manually annotated by five human experts and used for evaluating the proposed methods. The experiments showed that the supervised semantic sentence similarity computation approach obtained the best performance (0.836 correlation with gold-standard human annotations) and improved over the state-of-the-art domain-independent systems by up to 42.6% in terms of the Pearson correlation metric. A web-based system for biomedical semantic sentence similarity computation, the source code, and the annotated benchmark data set are available at http://tabilab.cmpe.boun.edu.tr/BIOSSES/. Contact: gizemsogancioglu@gmail.com or arzucan.ozgur@boun.edu.tr.
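One family of measures the abstract mentions is based on distributed vector representations of sentences. A minimal, self-contained sketch of that idea is below: average per-word vectors into a sentence vector and compare with cosine similarity. The tiny word-vector table is made up for illustration; the paper's actual vectors are learned in an unsupervised manner from a large biomedical corpus.

```python
import math

# Toy word vectors standing in for embeddings trained on a biomedical
# corpus (illustrative values only, not from the paper).
WORD_VECS = {
    "gene":    [0.9, 0.1, 0.0],
    "protein": [0.8, 0.2, 0.1],
    "encodes": [0.1, 0.9, 0.0],
    "codes":   [0.2, 0.8, 0.1],
    "for":     [0.0, 0.1, 0.9],
    "a":       [0.0, 0.2, 0.8],
    "the":     [0.0, 0.2, 0.8],
}

def sentence_vector(sentence):
    """Average the vectors of known words (a common unsupervised baseline)."""
    vecs = [WORD_VECS[w] for w in sentence.lower().split() if w in WORD_VECS]
    if not vecs:
        return [0.0] * 3
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(u, v):
    """Cosine similarity between two vectors (0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

sim = cosine(sentence_vector("the gene encodes a protein"),
             sentence_vector("a gene codes for the protein"))
```

With these toy vectors the two paraphrased sentences score close to 1, which is the behavior a similarity measure for text retrieval needs.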
Simmons, Michael; Singhal, Ayush; Lu, Zhiyong
2018-01-01
The key question of precision medicine is whether it is possible to find clinically actionable granularity in diagnosing disease and classifying patient risk. The advent of next generation sequencing and the widespread adoption of electronic health records (EHRs) have provided clinicians and researchers a wealth of data and made possible the precise characterization of individual patient genotypes and phenotypes. Unstructured text — found in biomedical publications and clinical notes — is an important component of genotype and phenotype knowledge. Publications in the biomedical literature provide essential information for interpreting genetic data. Likewise, clinical notes contain the richest source of phenotype information in EHRs. Text mining can render these texts computationally accessible and support information extraction and hypothesis generation. This chapter reviews the mechanics of text mining in precision medicine and discusses several specific use cases, including database curation for personalized cancer medicine, patient outcome prediction from EHR-derived cohorts, and pharmacogenomic research. Taken as a whole, these use cases demonstrate how text mining enables effective utilization of existing knowledge sources and thus promotes increased value for patients and healthcare systems. Text mining is an indispensable tool for translating genotype-phenotype data into effective clinical care that will undoubtedly play an important role in the eventual realization of precision medicine. PMID:27807747
Anatomical entity mention recognition at literature scale
Pyysalo, Sampo; Ananiadou, Sophia
2014-01-01
Motivation: Anatomical entities ranging from subcellular structures to organ systems are central to biomedical science, and mentions of these entities are essential to understanding the scientific literature. Despite extensive efforts to automatically analyze various aspects of biomedical text, there have been only a few studies focusing on anatomical entities, and no dedicated methods for learning to automatically recognize anatomical entity mentions in free-form text have been introduced. Results: We present AnatomyTagger, a machine learning-based system for anatomical entity mention recognition. The system incorporates a broad array of approaches proposed to benefit tagging, including the use of Unified Medical Language System (UMLS)- and Open Biomedical Ontologies (OBO)-based lexical resources, word representations induced from unlabeled text, statistical truecasing, and non-local features. We train and evaluate the system on a newly introduced corpus that substantially extends previously available resources, and apply the resulting tagger to automatically annotate the entire open-access scientific literature. The resulting analyses have been applied to extend services provided by the Europe PubMed Central literature database. Availability and implementation: All tools and resources introduced in this work are available from http://nactem.ac.uk/anatomytagger. Contact: sophia.ananiadou@manchester.ac.uk. Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:24162468
2012-01-01
Background We introduce the linguistic annotation of a corpus of 97 full-text biomedical publications, known as the Colorado Richly Annotated Full Text (CRAFT) corpus. We further assess the performance of existing tools for performing sentence splitting, tokenization, syntactic parsing, and named entity recognition on this corpus. Results Many biomedical natural language processing systems demonstrated large differences between their previously published results and their performance on the CRAFT corpus when tested with the publicly available models or rule sets. Trainable systems differed widely with respect to their ability to build high-performing models based on this data. Conclusions The finding that some systems were able to train high-performing models based on this corpus is additional evidence, beyond high inter-annotator agreement, that the quality of the CRAFT corpus is high. The overall poor performance of various systems indicates that considerable work needs to be done to enable natural language processing systems to work well when the input is full-text journal articles. The CRAFT corpus provides a valuable resource to the biomedical natural language processing community for evaluation and training of new models for biomedical full text publications. PMID:22901054
Enhancing Biomedical Text Summarization Using Semantic Relation Extraction
Shang, Yue; Li, Yanpeng; Lin, Hongfei; Yang, Zhihao
2011-01-01
Automatic text summarization for a biomedical concept can help researchers get the key points of a certain topic from a large amount of biomedical literature efficiently. In this paper, we present a method for generating a text summary for a given biomedical concept, e.g., H1N1 disease, from multiple documents based on semantic relation extraction. Our approach includes three stages: 1) we extract semantic relations in each sentence using the semantic knowledge representation tool SemRep; 2) we develop a relation-level retrieval method to select the relations most relevant to each query concept and visualize them in a graphic representation; 3) for relations in the relevant set, we extract informative sentences that can interpret them from the document collection to generate the text summary using an information retrieval-based method. Our major focus in this work is to investigate the contribution of semantic relation extraction to the task of biomedical text summarization. The experimental results on summarization for a set of diseases show that the introduction of semantic knowledge improves performance and that our results are better than those of the MEAD system, a well-known tool for text summarization. PMID:21887336
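As a rough illustration of the relation-level retrieval stage this abstract describes, SemRep-style subject-predicate-object triples can be filtered and ranked by how they mention the query concept. The triples and the scoring rule below are hypothetical toy data, not the authors' actual method:

```python
# Toy SemRep-style triples: (subject, predicate, object, source sentence id).
relations = [
    ("H1N1", "CAUSES", "influenza", 0),
    ("oseltamivir", "TREATS", "H1N1", 1),
    ("aspirin", "TREATS", "headache", 2),
    ("H1N1", "ISA", "virus", 3),
]

def retrieve(query, rels):
    """Keep relations mentioning the query concept; rank relations where
    the query is the subject ahead of those where it is the object,
    breaking ties by predicate name for a stable ordering."""
    hits = [r for r in rels if query in (r[0], r[2])]
    return sorted(hits, key=lambda r: (r[0] != query, r[1]))

top = retrieve("H1N1", relations)
```

The sentence ids carried in each triple are what lets the final stage pull informative sentences back out of the document collection.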
Histochemistry in biology and medicine: a message from the citing journals.
Pellicciari, Carlo
2015-12-23
Especially in recent years, biomedical research has taken advantage of progress in several disciplines, among which are microscopy and histochemistry. To assess the influence of histochemistry in the biomedical field, articles published during the period 2011-2015 were selected from different databases and grouped by subject categories: as expected, biological and biomedical studies where histochemistry has been used as a major experimental approach include a wide range of basic and applied research on both humans and other animal or plant organisms. To better understand the impact of histochemical publications on the different biological and medical disciplines, it was useful to look at the journals where the articles published in a multidisciplinary journal of histochemistry have been cited: in the five-year period considered, only 20% of the citations were in histochemical periodicals, the remaining ones being in journals of Cell & Tissue biology, general and experimental Medicine, Oncology, Biochemistry & Molecular biology, Neurobiology, Anatomy & Morphology, Pharmacology & Toxicology, Reproductive biology, Veterinary sciences, Physiology, Endocrinology, Tissue engineering & Biomaterials, as well as in multidisciplinary journals. It is easy to foresee that histochemical journals will remain a well-attended forum for basic and applied scientists in the biomedical field. It will be crucial that these journals be open to as varied an audience as possible, publishing articles on the application of refined techniques to very different experimental models: this will stimulate non-histochemist scientists to approach histochemistry, whose application horizon could expand to novel and possibly exclusive subjects.
PubMed-based quantitative analysis of biomedical publications in the SAARC countries: 1985-2009.
Azim Majumder, Md Anwarul; Shaban, Sami F; Rahman, Sayeeda; Rahman, Nuzhat; Ahmed, Moslehuddin; Bin Abdulrahman, Khalid A; Islam, Ziauddin
2012-09-01
To conduct a geographical analysis of biomedical publications from the South Asian Association for Regional Cooperation (SAARC) countries over the past 25 years (1985-2009) using the PubMed database. A qualitative study. Web-based search during September 2010. A data extraction program, developed by one of the authors (SFS), was used to extract the raw publication counts from the downloaded PubMed data. A search of PubMed was performed for all indexed journals by selecting the advanced search option and entering the country name in the 'affiliation' field. The publications were normalized by total population, adult illiteracy rate, gross domestic product (GDP), secondary school enrollment ratio, and Internet usage rate. The number of PubMed-listed papers published by the SAARC countries over the last 25 years totalled 141,783, which is 1.1% of the total papers indexed by PubMed in the same period. India alone produced 90.5% of the total publications generated by SAARC countries. The average number of papers published per year from 1985 to 2009 was 5671, and the number of publications increased approximately 242-fold. When normalized by population (per million) and by GDP (per billion), India (133, 27.6%) and Nepal (323, 37.3%) had the highest publication rates, respectively. There was a marked imbalance among the SAARC countries in terms of biomedical research and publication. Because of the huge population and high disease burden, biomedical research and publication output should receive special attention to formulate health policies, re-orient medical education curricula, and alleviate disease and poverty.
Adverse Drug Event Discovery Using Biomedical Literature: A Big Data Neural Network Adventure
Badger, Jonathan; LaRose, Eric; Shirzadi, Ehsan; Mahnke, Andrea; Mayer, John; Ye, Zhan; Page, David; Peissig, Peggy
2017-01-01
Background The study of adverse drug events (ADEs) is a long-established topic in the medical literature. In recent years, increasing numbers of scientific articles and health-related social media posts have been generated and shared daily, albeit with very limited use for ADE study and with little known about the content with respect to ADEs. Objective The aim of this study was to develop a big data analytics strategy that mines the content of scientific articles and health-related Web-based social media to detect and identify ADEs. Methods We analyzed the following two data sources: (1) biomedical articles and (2) health-related social media blog posts. We developed an intelligent and scalable text mining solution on big data infrastructures composed of Apache Spark, natural language processing, and machine learning. This was combined with an Elasticsearch NoSQL distributed database to explore and visualize ADEs. Results The accuracy, precision, recall, and area under the receiver operating characteristic curve of the system were 92.7%, 93.6%, 93.0%, and 0.905, respectively, showing better results than traditional approaches in the literature. This work not only detected and classified ADE sentences from big data biomedical literature but also scientifically visualized ADE interactions. Conclusions To the best of our knowledge, this work is the first to investigate a big data machine learning strategy for ADE discovery on massive datasets downloaded from PubMed Central and social media. This contribution illustrates possible capacities in big data biomedical text analysis using advanced computational methods, with real-time updates from new data published on a daily basis. PMID:29222076
Document Exploration and Automatic Knowledge Extraction for Unstructured Biomedical Text
NASA Astrophysics Data System (ADS)
Chu, S.; Totaro, G.; Doshi, N.; Thapar, S.; Mattmann, C. A.; Ramirez, P.
2015-12-01
We describe our work on building a web-browser-based document reader with a built-in exploration tool and automatic concept extraction of medical entities for biomedical text. Vast amounts of biomedical information are offered in unstructured text form through scientific publications and R&D reports. Text mining can help us mine information and extract relevant knowledge from this plethora of biomedical text. The ability to employ such technologies to aid researchers in coping with information overload is greatly desirable. In recent years, there has been increased interest in automatic biomedical concept extraction [1, 2] and intelligent PDF reader tools with the ability to search on content and find related articles [3]. Such reader tools are typically desktop applications and are limited to specific platforms. Our goal is to provide researchers with a simple tool to aid them in finding, reading, and exploring documents. Thus, we propose a web-based document explorer, which we call Shangri-Docs, which combines a document reader with automatic concept extraction and highlighting of relevant terms. Shangri-Docs also provides the ability to evaluate a wide variety of document formats (e.g., PDF, Word, PPT, plain text) and to exploit the linked nature of the Web and personal content by performing searches on content from public sites (e.g., Wikipedia, PubMed) and private cataloged databases simultaneously. Shangri-Docs utilizes Apache cTAKES (clinical Text Analysis and Knowledge Extraction System) [4] and the Unified Medical Language System (UMLS) to automatically identify and highlight terms and concepts, such as specific symptoms, diseases, drugs, and anatomical sites, mentioned in the text. cTAKES was originally designed specifically to extract information from clinical medical records. Our investigation led us to extend the automatic knowledge extraction process of cTAKES to the biomedical research domain by improving the ontology-guided information extraction process. We will describe our experience and implementation of the system and share lessons learned from its development. We will also discuss ways in which this could be adapted to other science fields. [1] Funk et al., 2014. [2] Kang et al., 2014. [3] Utopia Documents, http://utopiadocs.com [4] Apache cTAKES, http://ctakes.apache.org
A new visual navigation system for exploring biomedical Open Educational Resource (OER) videos.
Zhao, Baoquan; Xu, Songhua; Lin, Shujin; Luo, Xiaonan; Duan, Lian
2016-04-01
Biomedical videos as open educational resources (OERs) are increasingly proliferating on the Internet. Unfortunately, seeking personally valuable content from among the vast corpus of quality yet diverse OER videos is nontrivial due to limitations of today's keyword- and content-based video retrieval techniques. To address this need, this study introduces a novel visual navigation system that facilitates users' information seeking from biomedical OER videos in mass quantity by interactively offering visual and textual navigational clues that are both semantically revealing and user-friendly. The authors collected and processed around 25 000 YouTube videos, with a total length of about 4000 h, in the broad field of biomedical sciences for the experiment. For each video, its semantic clues are first extracted automatically through computationally analyzing audio and visual signals, as well as text either accompanying or embedded in the video. These extracted clues are subsequently stored in a metadata database and indexed by a high-performance text search engine. During the online retrieval stage, the system renders video search results as dynamic web pages using a JavaScript library that allows users to interactively and intuitively explore video content both efficiently and effectively. Results: The authors produced a prototype implementation of the proposed system, which is publicly accessible at https://patentq.njit.edu/oer. To examine the overall advantage of the proposed system for exploring biomedical OER videos, the authors further conducted a user study of a modest scale. The study results encouragingly demonstrate the functional effectiveness and user-friendliness of the new system for facilitating information seeking from and content exploration among massive biomedical OER videos. Using the proposed tool, users can efficiently and effectively find videos of interest, precisely locate video segments delivering personally valuable information, and intuitively and conveniently preview the essential content of a single video or a collection of videos.
Character-level neural network for biomedical named entity recognition.
Gridach, Mourad
2017-06-01
Biomedical named entity recognition (BNER), which extracts important named entities such as genes and proteins, is a challenging task in automated systems that mine knowledge in biomedical texts. Previous state-of-the-art systems required large amounts of task-specific knowledge in the form of feature engineering, lexicons, and data pre-processing to achieve high performance. In this paper, we introduce a novel neural network architecture that benefits from both word- and character-level representations automatically, by using a combination of bidirectional long short-term memory (LSTM) and conditional random field (CRF), eliminating the need for most feature engineering tasks. We evaluate our system on two datasets: the JNLPBA corpus and the BioCreAtIvE II Gene Mention (GM) corpus. We obtained state-of-the-art performance by outperforming the previous systems. To the best of our knowledge, we are the first to investigate the combination of deep neural networks, CRF, word embeddings, and character-level representation in recognizing biomedical named entities.
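The CRF layer in architectures like the one this abstract describes scores whole tag sequences rather than individual tokens, and the best sequence is recovered with Viterbi decoding. A minimal sketch of that decoding step is below; the emission and transition scores are made-up numbers for the tokens "the BRCA1 gene", not the paper's trained model:

```python
def viterbi(emissions, transitions, tags):
    """Find the highest-scoring tag sequence given per-token emission
    scores and tag-to-tag transition scores (log-space, additive)."""
    n = len(emissions)
    # best[i][t] = best score of any path ending at token i with tag t
    best = [{t: emissions[0][t] for t in tags}]
    back = [{}]
    for i in range(1, n):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + transitions[(p, t)]) for p in tags),
                key=lambda x: x[1])
            best[i][t] = score + emissions[i][t]
            back[i][t] = prev
    # Trace back from the best final tag.
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

TAGS = ["O", "B-GENE", "I-GENE"]
# Hypothetical per-token scores for "the BRCA1 gene".
emissions = [
    {"O": 2.0, "B-GENE": 0.1, "I-GENE": 0.0},
    {"O": 0.2, "B-GENE": 1.5, "I-GENE": 0.5},
    {"O": 1.0, "B-GENE": 0.1, "I-GENE": 1.2},
]
# Transitions penalize I-GENE without a preceding GENE tag.
transitions = {(p, t): 0.0 for p in TAGS for t in TAGS}
transitions[("O", "I-GENE")] = -5.0
transitions[("B-GENE", "I-GENE")] = 1.0

tag_seq = viterbi(emissions, transitions, TAGS)
```

In a full BiLSTM-CRF system the emission scores come from the network and the transition scores are learned; the decoding itself is unchanged.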
Efficient 41Ca measurements for biomedical applications
NASA Astrophysics Data System (ADS)
Vockenhuber, C.; Schulze-König, T.; Synal, H.-A.; Aeberli, I.; Zimmermann, M. B.
2015-10-01
We present the performance of 41Ca measurements using low-energy Accelerator Mass Spectrometry (AMS) at the 500 kV facility TANDY at ETH Zurich. We optimized the measurement procedure for biomedical applications, where reliability and high sample throughput are required. The main challenge for AMS measurements of 41Ca is the interfering stable isobar 41K. We use a simplified sample preparation procedure to produce calcium fluoride (CaF2) and extract calcium tri-fluoride (CaF3-) ions to suppress the stable isobar 41K. Although 41K is not completely suppressed, we reach a 41Ca/40Ca background level in the 10^-12 range, which is adequate for biomedical studies. With helium as a stripper gas we can use charge state 2+ at high transmission (∼50%). The new measurement procedure, with its approximately 10× improved efficiency and higher accuracy due to the 41K correction, allowed us to measure more than 600 samples for a large biomedical study within only a few weeks of measurement time.
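A back-of-the-envelope count estimate shows why the ~10^-12 background level matters for tracer studies. The abstract reports only the background level and the ~50% transmission; the sample ratio and ion count below are hypothetical numbers chosen for illustration:

```python
# Hypothetical AMS counting estimate (assumed values are marked).
ratio_sample = 5e-11        # assumed 41Ca/40Ca ratio in a labeled sample
ratio_background = 1e-12    # background level reported in the abstract
ca40_ions = 1e15            # assumed number of 40Ca ions extracted
transmission = 0.5          # ~50% transmission with the He stripper (2+)

# Detected 41Ca counts scale with the isotope ratio, the number of
# 40Ca ions, and the transmission through the system.
counts_sample = ratio_sample * ca40_ions * transmission
counts_background = ratio_background * ca40_ions * transmission
signal_to_background = counts_sample / counts_background
```

Under these assumptions a labeled sample sits a factor of 50 above background, comfortably measurable; as the sample ratio approaches 10^-12 the signal becomes indistinguishable from the 41K-dominated background.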
Hoehndorf, Robert; Dumontier, Michel; Oellrich, Anika; Rebholz-Schuhmann, Dietrich; Schofield, Paul N; Gkoutos, Georgios V
2011-01-01
Researchers design ontologies as a means to accurately annotate and integrate experimental data across heterogeneous and disparate databases and knowledge bases. Formal ontologies make the semantics of terms and relations explicit such that automated reasoning can be used to verify the consistency of knowledge. However, many biomedical ontologies do not sufficiently formalize the semantics of their relations and are therefore limited with respect to automated reasoning for large-scale data integration and knowledge discovery. We describe a method to improve automated reasoning over biomedical ontologies and identify several thousand contradictory class definitions. Our approach aligns terms in biomedical ontologies with foundational classes in a top-level ontology and formalizes composite relations as class expressions. We describe the semi-automated repair of these contradictions and demonstrate expressive queries over interoperable ontologies. Our work forms an important cornerstone for data integration, automatic inference, and knowledge discovery based on formal representations of knowledge. Our results and analysis software are available at http://bioonto.de/pmwiki.php/Main/ReasonableOntologies.