molecular biology databases: Topics by Science.gov

Sample records for molecular biology databases

Data warehousing in molecular biology.

PubMed

Schönbach, C; Kowalski-Saunders, P; Brusic, V

2000-05-01

In the business and healthcare sectors, data warehousing has provided effective solutions for information usage and knowledge discovery from databases. However, data warehousing applications in the biological research and development (R&D) sector are lagging far behind. The fuzziness and complexity of biological data represent a major challenge in data warehousing for molecular biology. By combining experiences in other domains with our findings from building a model database, we have defined the requirements for data warehousing in molecular biology.
Library of molecular associations: curating the complex molecular basis of liver diseases.

PubMed

Buchkremer, Stefan; Hendel, Jasmin; Krupp, Markus; Weinmann, Arndt; Schlamp, Kai; Maass, Thorsten; Staib, Frank; Galle, Peter R; Teufel, Andreas

2010-03-20

Systems biology approaches offer novel insights into the development of chronic liver diseases. Current genomic databases supporting systems biology analyses are mostly based on microarray data. Although these data often cover genome-wide expression, the validity of single microarray experiments remains questionable. However, systems biology approaches that address the interactions of molecular networks require data that are both comprehensive and highly validated. We have therefore generated the first comprehensive database of published molecular associations in human liver diseases. It is based on abstracts published in PubMed and aims to close the gap between the genome-wide but low-validity coverage of microarray data and the individual, highly validated findings reported in PubMed. After an initial text-mining step, all extracted abstracts were manually validated to confirm their content and potential genetic associations, and may therefore be highly trusted. All data were stored in a publicly available database, the Library of Molecular Associations (http://www.medicalgenomics.org/databases/loma/news), currently holding approximately 1260 confirmed molecular associations for chronic liver diseases such as HCC, CCC, liver fibrosis, NASH/fatty liver disease, AIH, PBC, and PSC. We furthermore transformed these data into a powerful resource for molecular liver research by connecting them to multiple biomedical information resources. Taken together, this is the first available database providing a comprehensive view of, and analysis options for, published molecular associations across multiple liver diseases.
A comparative cellular and molecular biology of longevity database.

PubMed

Stuart, Jeffrey A; Liang, Ping; Luo, Xuemei; Page, Melissa M; Gallagher, Emily J; Christoff, Casey A; Robb, Ellen L

2013-10-01

Discovering key cellular and molecular traits that promote longevity is a major goal of aging and longevity research. One experimental strategy is to determine which traits have been selected during the evolution of longevity in naturally long-lived animal species. This comparative approach has been applied to lifespan research for nearly four decades, yielding hundreds of datasets describing aspects of cell and molecular biology hypothesized to relate to animal longevity. Here, we introduce a Comparative Cellular and Molecular Biology of Longevity Database, available at http://genomics.brocku.ca/ccmbl/, as a compendium of comparative cell and molecular data presented in the context of longevity. This open-access database will facilitate the meta-analysis of amalgamated datasets using standardized maximum lifespan (MLSP) data (from AnAge). The first edition contains over 800 data records describing experimental measurements of cellular stress resistance, reactive oxygen species metabolism, membrane composition, protein homeostasis, and genome homeostasis as they relate to vertebrate species MLSP. The purpose of this review is to introduce the database and briefly demonstrate its use in the meta-analysis of combined datasets.
Comprehensive, comprehensible, distributed and intelligent databases: current status.

PubMed

Frishman, D; Heumann, K; Lesk, A; Mewes, H W

1998-01-01

It is only a matter of time until a user will see not many but one integrated database of information for molecular biology. Is this true? Is it a good thing? Why will it happen? Where are we now? What developments are fostering and what developments are impeding progress towards this end? A list of WWW resources devoted to database issues in molecular biology is available at http://www.mips.biochem.mpg.de.
The 2015 Nucleic Acids Research Database Issue and molecular biology database collection.

PubMed

Galperin, Michael Y; Rigden, Daniel J; Fernández-Suárez, Xosé M

2015-01-01

The 2015 Nucleic Acids Research Database Issue contains 172 papers that include descriptions of 56 new molecular biology databases, and updates on 115 databases whose descriptions have been previously published in NAR or other journals. Following the classification introduced last year to simplify navigation of the entire issue, these articles are divided into eight subject categories. This year's highlights include RNAcentral, an international community portal to various databases on noncoding RNA; ValidatorDB, a validation database for protein structures and their ligands; SASBDB, a primary repository for small-angle scattering data of various macromolecular complexes; MoonProt, a database of 'moonlighting' proteins, and two new databases of protein-protein and other macromolecular complexes, ComPPI and the Complex Portal. This issue also includes an unusually high number of cancer-related databases and other databases dedicated to the genomic basis of disease and potential drugs and drug targets. The size of the NAR online Molecular Biology Database Collection, http://www.oxfordjournals.org/nar/database/a/, remained approximately the same, following the addition of 74 new resources and the removal of 77 obsolete websites. The entire Database Issue is freely available online on the Nucleic Acids Research web site (http://nar.oxfordjournals.org/). Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.
Just Working with the Cellular Machine: A High School Game for Teaching Molecular Biology

ERIC Educational Resources Information Center

Cardoso, Fernanda Serpa; Dumpel, Renata; Gomes da Silva, Luisa B.; Rodrigues, Carlos R.; Santos, Dilvani O.; Cabral, Lucio Mendes; Castro, Helena C.

2008-01-01

Molecular biology is a difficult subject to comprehend because of its high complexity, and it therefore requires new teaching approaches. Herein, we developed an interdisciplinary board game involving the human immune system response against a bacterial infection for teaching molecular biology at high school. Initially, we created a database with several…
The 2018 Nucleic Acids Research database issue and the online molecular biology database collection.

PubMed

Rigden, Daniel J; Fernández, Xosé M

2018-01-04

The 2018 Nucleic Acids Research Database Issue contains 181 papers spanning molecular biology. Among them, 82 are new and 84 are updates describing resources that appeared in the Issue previously. The remaining 15 cover databases most recently published elsewhere. Databases in the area of nucleic acids include 3DIV for visualisation of data on genome 3D structure and RNArchitecture, a hierarchical classification of RNA families. Protein databases include the established SMART, ELM and MEROPS while GPCRdb and the newcomer STCRDab cover families of biomedical interest. In the area of metabolism, HMDB and Reactome both report new features while PULDB appears in NAR for the first time. This issue also contains reports on genomics resources including Ensembl, the UCSC Genome Browser and ENCODE. Update papers from the IUPHAR/BPS Guide to Pharmacology and DrugBank are highlights of the drug and drug target section while a number of proteomics databases including proteomicsDB are also covered. The entire Database Issue is freely available online on the Nucleic Acids Research website (https://academic.oup.com/nar). The NAR online Molecular Biology Database Collection has been updated, reviewing 138 entries, adding 88 new resources and eliminating 47 discontinued URLs, bringing the current total to 1737 databases. It is available at http://www.oxfordjournals.org/nar/database/c/. © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.
MIMO: an efficient tool for molecular interaction maps overlap

PubMed Central

2013-01-01

Background Molecular pathways represent an ensemble of interactions occurring among molecules within the cell and between cells. The identification of similarities between molecular pathways across organisms and functions has a critical role in understanding complex biological processes. For the inference of such novel information, comparing molecular pathways requires accounting for imperfect matches (flexibility) and efficiently handling complex network topologies. To date, these characteristics are only partially available in tools designed to compare molecular interaction maps. Results Our approach MIMO (Molecular Interaction Maps Overlap) addresses the first problem by allowing the introduction of gaps and mismatches between query and template pathways and permits, when necessary, supervised queries incorporating a priori biological information. It then addresses the second issue by relying directly on the rich graph topology described in the Systems Biology Markup Language (SBML) standard, and uses multidigraphs to efficiently handle multiple queries on biological graph databases. Here, the algorithm has been successfully used to highlight the contact points between various human pathways in the Reactome database. Conclusions MIMO offers a flexible and efficient graph-matching tool for comparing complex biological pathways. PMID:23672344
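To illustrate what tolerating gaps means when a query pathway is matched against a graph, here is a toy Python sketch. It is not MIMO's algorithm (which works directly on SBML-derived multidigraphs and also handles mismatches and supervised queries); it assumes a hypothetical adjacency dict in which repeated successors stand for parallel edges, node labels given as a dict, and a linear query whose consecutive elements may be separated by up to `max_gap` unmatched intermediate nodes.

```python
from collections import deque


def reachable_within(graph, start, max_steps):
    """Map each node reachable from `start` in 1..max_steps edges to its shortest distance."""
    dist = {}
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == max_steps:
            continue
        # Parallel edges (repeated successors) are harmless: `seen` deduplicates them.
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                dist[nxt] = d + 1
                frontier.append((nxt, d + 1))
    return dist


def match_with_gaps(graph, labels, query, max_gap):
    """Return node tuples whose labels spell `query`, allowing up to `max_gap`
    skipped nodes between consecutive query elements (a toy notion of a 'gap')."""
    matches = []

    def extend(path):
        if len(path) == len(query):
            matches.append(tuple(path))
            return
        wanted = query[len(path)]
        for node in reachable_within(graph, path[-1], max_gap + 1):
            if labels.get(node) == wanted and node not in path:
                extend(path + [node])

    for node, label in labels.items():
        if label == query[0]:
            extend([node])
    return matches


# Toy directed multigraph: the duplicated "b" entry stands for two parallel edges a -> b.
graph = {"a": ["b", "b"], "b": ["c"], "c": ["d"]}
labels = {"a": "EGFR", "b": "GRB2", "c": "SOS1", "d": "RAS"}
print(match_with_gaps(graph, labels, ["EGFR", "SOS1", "RAS"], max_gap=0))  # []
print(match_with_gaps(graph, labels, ["EGFR", "SOS1", "RAS"], max_gap=1))  # [('a', 'c', 'd')]
```

With no gaps allowed the query fails because EGFR is not directly linked to SOS1 in the toy graph; allowing one skipped node lets the match pass through GRB2.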
The 2014 Nucleic Acids Research Database Issue and an updated NAR online Molecular Biology Database Collection.

PubMed

Fernández-Suárez, Xosé M; Rigden, Daniel J; Galperin, Michael Y

2014-01-01

The 2014 Nucleic Acids Research Database Issue includes descriptions of 58 new molecular biology databases and recent updates to 123 databases previously featured in NAR or other journals. For convenience, the issue is now divided into eight sections that reflect major subject categories. Among the highlights of this issue are six databases of transcription factor binding sites in various organisms and updates on such popular databases as CAZy, Database of Genomic Variants (DGV), dbGaP, DrugBank, KEGG, miRBase, Pfam, Reactome, SEED, TCDB and UniProt. There is a strong block of structural databases, which includes, among others, the new RNA Bricks database, updates on PDBe, PDBsum, ArchDB, Gene3D, ModBase, Nucleic Acid Database and the recently revived iPfam database. An update on the NCBI's MMDB describes VAST+, an improved tool for protein structure comparison. Two articles highlight the development of the Structural Classification of Proteins (SCOP) database: one describes SCOPe, which automates assignment of new structures to the existing SCOP hierarchy; the other describes the first version of SCOP2, with its more flexible approach to classifying protein structures. This issue also includes a collection of articles on bacterial taxonomy and metagenomics, which includes updates on the List of Prokaryotic Names with Standing in Nomenclature (LPSN), Ribosomal Database Project (RDP), the Silva/LTP project and several new metagenomics resources. The NAR online Molecular Biology Database Collection, http://www.oxfordjournals.org/nar/database/c/, has been expanded to 1552 databases. The entire Database Issue is freely available online on the Nucleic Acids Research website (http://nar.oxfordjournals.org/).
WWW Entrez: A Hypertext Retrieval Tool for Molecular Biology.

ERIC Educational Resources Information Center

Epstein, Jonathan A.; Kans, Jonathan A.; Schuler, Gregory D.

This article describes the World Wide Web (WWW) Entrez server, which is based on the National Center for Biotechnology Information's (NCBI) Entrez retrieval database and software. Entrez is a molecular sequence retrieval system that contains an integrated view of portions of Medline and all publicly available nucleotide and protein databases,…
Information resources at the National Center for Biotechnology Information.

PubMed Central

Woodsmall, R M; Benson, D A

1993-01-01

The National Center for Biotechnology Information (NCBI), part of the National Library of Medicine, was established in 1988 to perform basic research in the field of computational molecular biology as well as build and distribute molecular biology databases. The basic research has led to new algorithms and analysis tools for interpreting genomic data and has been instrumental in the discovery of human disease genes for neurofibromatosis and Kallmann syndrome. The principal database responsibility is the National Institutes of Health (NIH) genetic sequence database, GenBank. NCBI, in collaboration with international partners, builds, distributes, and provides online and CD-ROM access to over 112,000 DNA sequences. Another major program is the integration of multiple sequence databases and related bibliographic information and the development of network-based retrieval systems for Internet access. PMID:8374583
BIOSPIDA: A Relational Database Translator for NCBI.

PubMed

Hagen, Matthew S; Lee, Eva K

2010-11-13

As the volume and availability of biological databases continue to grow, it has become increasingly difficult for research scientists to identify all relevant information about biological entities of interest. Details of nucleotide sequences, gene expression, molecular interactions, and three-dimensional structures are maintained across many different databases. Retrieving all necessary information requires an integrated system that can query multiple databases with minimal overhead. This paper introduces a universal parser and relational schema translator that can be used for all NCBI databases in Abstract Syntax Notation One (ASN.1). The data models for OMIM, Entrez-Gene, PubMed, MMDB and GenBank have been successfully converted into relational databases that are easily linked, helping to answer complex biological questions. These tools allow research scientists to integrate NCBI databases locally without significant workload or development time.
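The core idea of translating nested ASN.1 records into relational tables can be sketched in Python. This is not BIOSPIDA's parser: it assumes the ASN.1 value has already been decoded into nested dicts and lists, and it uses a hypothetical naming scheme in which each nested field becomes a child table whose rows point back to their parent row.

```python
import itertools


def flatten(record, table, tables, parent=None, ids=None):
    """Flatten a nested dict into rows of `tables` (table name -> list of row dicts).

    Scalars stay as columns of the current row; nested dicts and list items
    become rows in a child table named '<table>_<field>' that point back to
    the parent row via (parent_table, parent_id)."""
    ids = ids if ids is not None else itertools.count(1)
    row = {"id": next(ids)}
    if parent is not None:
        row["parent_table"], row["parent_id"] = parent
    for field, value in record.items():
        child_table = f"{table}_{field}"
        if isinstance(value, dict):
            flatten(value, child_table, tables, (table, row["id"]), ids)
        elif isinstance(value, list):
            for item in value:
                # Wrap scalar list items so they get a 'value' column.
                child = item if isinstance(item, dict) else {"value": item}
                flatten(child, child_table, tables, (table, row["id"]), ids)
        else:
            row[field] = value
    tables.setdefault(table, []).append(row)
    return row["id"]


# Hypothetical, heavily simplified gene record (assumed already decoded from ASN.1).
record = {"gene_id": 7157, "symbol": "TP53",
          "synonyms": ["p53", "LFS1"],
          "location": {"chrom": "17", "band": "p13.1"}}
tables = {}
flatten(record, "gene", tables)
for name, rows in tables.items():
    print(name, rows)
```

Because each child row carries its parent's table and id, ordinary SQL joins can re-link the pieces once the rows are loaded into a relational database.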
Design and implementation of a library-based information service in molecular biology and genetics at the University of Pittsburgh

PubMed Central

Chattopadhyay, Ansuman; Tannery, Nancy Hrinya; Silverman, Deborah A. L.; Bergen, Phillip; Epstein, Barbara A.

2006-01-01

Setting: In summer 2002, the Health Sciences Library System (HSLS) at the University of Pittsburgh initiated an information service in molecular biology and genetics to assist researchers with identifying and utilizing bioinformatics tools. Program Components: This novel information service comprises hands-on training workshops and consultation on the use of bioinformatics tools. The HSLS also provides an electronic portal and networked access to public and commercial molecular biology databases and software packages. Evaluation Mechanisms: Researcher feedback gathered during the first three years of workshops and individual consultation indicates that the information service is meeting user needs. Next Steps/Future Directions: The service's workshop offerings will expand to include emerging bioinformatics topics. A frequently asked questions database is also being developed to reuse advice on complex bioinformatics questions. PMID:16888665
Bibliometric analysis of original molecular biology research in anaesthesia.

PubMed

Schreiber, K; Girard, T; Kindler, C H

2004-10-01

Molecular biology has expanded the horizons of anaesthesia during the last 20 years and has led to an increase in basic science articles that are published in the specialised anaesthetic journals or originate in anaesthetic institutions. We searched for and analysed the specific features, such as year of publication, publishing journal, and country of origin, of all such molecular biology articles stored in the MEDLINE database during the period 1986-2002. We identified 1265 original articles that used molecular biology techniques; 223 (18%) of these articles were published in anaesthetic journals and 1042 (82%) articles in 556 other biomedical journals. While in the late 1980s only a few molecular biology articles were published each year by anaesthetic institutions, worldwide this number reached approximately 200 basic science articles by the end of 2002. The USA clearly dominates the field of anaesthesia with respect to molecular biology research, with 839 (66%) such articles.
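The percentages reported above follow directly from the counts. A quick Python check (rounding to whole percentages, as the abstract does) confirms them, and confirms that the two journal groups together account for all 1265 articles.

```python
total = 1265
counts = {"anaesthetic journals": 223, "other biomedical journals": 1042, "USA": 839}

# Share of the 1265 original articles, rounded to whole percentages.
shares = {group: round(100 * n / total) for group, n in counts.items()}
print(shares)  # {'anaesthetic journals': 18, 'other biomedical journals': 82, 'USA': 66}

# The two journal groups partition the full set of articles.
assert counts["anaesthetic journals"] + counts["other biomedical journals"] == total
```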
AN OVERVIEW OF COMPUTATIONAL LIFE SCIENCE DATABASES & EXCHANGE FORMATS OF RELEVANCE TO CHEMICAL BIOLOGY RESEARCH

PubMed Central

Hall, Aaron Smalter; Shan, Yunfeng; Lushington, Gerald; Visvanathan, Mahesh

2016-01-01

Databases and exchange formats describing biological entities such as chemicals and proteins, along with their relationships, are a critical component of research in life sciences disciplines, including chemical biology, wherein information about small-molecule properties converges with cellular and molecular biology. Databases for storing biological entities are growing not only in size, but also in type, with many similarities between them and often subtle differences. The data formats available to describe and exchange these entities are numerous as well. In general, each format is optimized for a particular purpose or database, and hence some understanding of these formats is required when choosing one for research purposes. This paper reviews a selection of different databases and data formats with the goal of summarizing their purposes, features, and limitations. Databases are reviewed under the categories of 1) protein interactions, 2) metabolic pathways, 3) chemical interactions, and 4) drug discovery. Representation formats are discussed according to whether they describe chemical structures or genomic/proteomic entities. PMID:22934944
An overview of computational life science databases & exchange formats of relevance to chemical biology research.

PubMed

Smalter Hall, Aaron; Shan, Yunfeng; Lushington, Gerald; Visvanathan, Mahesh

2013-03-01

Databases and exchange formats describing biological entities such as chemicals and proteins, along with their relationships, are a critical component of research in life sciences disciplines, including chemical biology, wherein information about small-molecule properties converges with cellular and molecular biology. Databases for storing biological entities are growing not only in size, but also in type, with many similarities between them and often subtle differences. The data formats available to describe and exchange these entities are numerous as well. In general, each format is optimized for a particular purpose or database, and hence some understanding of these formats is required when choosing one for research purposes. This paper reviews a selection of different databases and data formats with the goal of summarizing their purposes, features, and limitations. Databases are reviewed under the categories of 1) protein interactions, 2) metabolic pathways, 3) chemical interactions, and 4) drug discovery. Representation formats are discussed according to whether they describe chemical structures or genomic/proteomic entities.
BIOSPIDA: A Relational Database Translator for NCBI

PubMed Central

Hagen, Matthew S.; Lee, Eva K.

2010-01-01

As the volume and availability of biological databases continue to grow, it has become increasingly difficult for research scientists to identify all relevant information about biological entities of interest. Details of nucleotide sequences, gene expression, molecular interactions, and three-dimensional structures are maintained across many different databases. Retrieving all necessary information requires an integrated system that can query multiple databases with minimal overhead. This paper introduces a universal parser and relational schema translator that can be used for all NCBI databases in Abstract Syntax Notation One (ASN.1). The data models for OMIM, Entrez-Gene, PubMed, MMDB and GenBank have been successfully converted into relational databases that are easily linked, helping to answer complex biological questions. These tools allow research scientists to integrate NCBI databases locally without significant workload or development time. PMID:21347013
BISQUE: locus- and variant-specific conversion of genomic, transcriptomic and proteomic database identifiers.

PubMed

Meyer, Michael J; Geske, Philip; Yu, Haiyuan

2016-05-15

Biological sequence databases are integral to efforts to characterize and understand biological molecules and to share biological data. However, when analyzing these data, scientists are often left holding disparate biological currency: molecular identifiers from different databases. For downstream applications that require converting the identifiers themselves, many resources are available, but analyzing the associated loci and variants can be cumbersome if the data are not given in a form amenable to particular analyses. Here we present BISQUE, a web server and customizable command-line tool for converting molecular identifiers and their contained loci and variants between different database conventions. BISQUE uses a graph traversal algorithm to generalize the conversion process for residues in the human genome, genes, transcripts and proteins, allowing for conversion across classes of molecules and in all directions through an intuitive web interface and a URL-based web service. BISQUE is freely available via the web using any major web browser (http://bisque.yulab.org/). Source code is available in a public GitHub repository (https://github.com/hyulab/BISQUE). Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.
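Treating identifiers as nodes and known mappings as edges turns identifier conversion into a graph search. The Python sketch below uses breadth-first search to find a shortest mapping chain in either direction. It only illustrates graph-traversal conversion in general, not BISQUE's actual algorithm or data; the identifiers are made-up placeholders, and the locus- and variant-level coordinate translation that BISQUE performs is left out.

```python
from collections import deque


def build_graph(mappings):
    """Undirected adjacency list over (namespace, identifier) nodes."""
    graph = {}
    for a, b in mappings:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def convert(graph, source, target_namespace):
    """Breadth-first search from `source` to the nearest node in `target_namespace`.

    Returns the chain of nodes traversed, or None if no such node is reachable."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node[0] == target_namespace:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Placeholder identifiers; real mappings would come from the source databases.
mappings = [(("gene", "GENE_A"), ("transcript", "TX_1")),
            (("transcript", "TX_1"), ("protein", "PROT_1"))]
graph = build_graph(mappings)
print(convert(graph, ("gene", "GENE_A"), "protein"))
print(convert(graph, ("protein", "PROT_1"), "gene"))
```

Storing each mapping as an undirected edge is what makes the same search work "in all directions" (gene to protein and protein to gene) without separate conversion tables.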
Exploring molecular networks using MONET ontology.

PubMed

Silva, João Paulo Müller da; Lemke, Ney; Mombach, José Carlos; Souza, José Guilherme Camargo de; Sinigaglia, Marialva; Vieira, Renata

2006-03-31

The description of the complex molecular network responsible for cell behavior requires new tools to integrate large quantities of experimental data in the design of biological information systems. These tools could be used in the characterization of these networks and in the formulation of relevant biological hypotheses. The building of an ontology is a crucial step because it integrates in a coherent framework the concepts necessary to accomplish such a task. We present MONET (molecular network), an extensible ontology and an architecture designed to facilitate the integration of data originating from different public databases in a single, well-documented relational database that is compatible with the MONET formal definition. We also present an example of an application that can easily be implemented using these tools.
National Center for Biotechnology Information Celebrates 25th Anniversary | NIH MedlinePlus the Magazine

MedlinePlus

... is a national and international resource for molecular biology information. It creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and ...

BioFrameNet: A FrameNet Extension to the Domain of Molecular Biology

ERIC Educational Resources Information Center

Dolbey, Andrew Eric

2009-01-01

In this study I introduce BioFrameNet, an extension of the Berkeley FrameNet lexical database to the domain of molecular biology. I examine the syntactic and semantic combinatorial possibilities exhibited in the lexical items used in this domain in order to get a better understanding of the grammatical properties of the language used in scientific…
Searching molecular structure databases with tandem mass spectra using CSI:FingerID

PubMed Central

Dührkop, Kai; Shen, Huibin; Meusel, Marvin; Rousu, Juho; Böcker, Sebastian

2015-01-01

Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics experiments usually rely on tandem MS to identify the thousands of compounds in a biological sample. Today, the vast majority of metabolites remain unknown. We present a method for searching molecular structure databases using tandem MS data of small molecules. Our method computes a fragmentation tree that best explains the fragmentation spectrum of an unknown molecule. We use the fragmentation tree to predict the molecular structure fingerprint of the unknown compound using machine learning. This fingerprint is then used to search a molecular structure database such as PubChem. Our method is shown to improve on the competing methods for computational metabolite identification by a considerable margin. PMID:26392543
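The final database-search step can be pictured as ranking candidate structures by how closely their fingerprints agree with the predicted one. The Python sketch below swaps in plain Tanimoto similarity over binary fingerprints (stored as sets of "on" bit positions) as a simple stand-in; CSI:FingerID itself predicts fingerprints with machine learning and uses its own scoring, and the candidate names and bit sets here are invented.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints given as sets of on-bits."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def rank_candidates(predicted: set, database: dict, top_k: int = 3):
    """Return the top_k (score, name) pairs, best first; ties broken by name."""
    scored = [(tanimoto(predicted, fp), name) for name, fp in database.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_k]


predicted = {1, 2, 3, 5}             # fingerprint predicted for the unknown compound
database = {"cand_A": {1, 2, 3, 5},  # hypothetical candidate structures
            "cand_B": {1, 2, 4},
            "cand_C": {7, 8}}
print(rank_candidates(predicted, database))
# [(1.0, 'cand_A'), (0.4, 'cand_B'), (0.0, 'cand_C')]
```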
Just working with the cellular machine: A high school game for teaching molecular biology.

PubMed

Cardoso, Fernanda Serpa; Dumpel, Renata; da Silva, Luisa B Gomes; Rodrigues, Carlos R; Santos, Dilvani O; Cabral, Lucio Mendes; Castro, Helena C

2008-03-01

Molecular biology is a difficult subject to comprehend because of its high complexity, and it therefore requires new teaching approaches. Herein, we developed an interdisciplinary board game involving the human immune system response against a bacterial infection for teaching molecular biology at high school. Initially, we created a database with several questions and a game story that invites students to help the human immune system produce antibodies (IgG) and fight back a second invasion by a pathogenic bacterium. The game involves answering questions to complete the game board, on which the antibodies "are synthesized" through the molecular biology process. At the end, a problem-based learning approach is used, and a final question is raised about proteins. Biology teachers and high school students evaluated the game and considered it an easy and interesting tool for teaching the theme. An increase of about 5-30% in performance on molecular biology questions revealed that the game improves learning and induces a more engaged and proactive learning profile in high school students. Copyright © 2008 International Union of Biochemistry and Molecular Biology, Inc.
Atlas - a data warehouse for integrative bioinformatics.

PubMed

Shah, Sohrab P; Huang, Yong; Xu, Tao; Yuen, Macaire M S; Ling, John; Ouellette, B F Francis

2005-02-21

We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure, for bioinformatics research and development. The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs are implemented in three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications that facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to these data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations. The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological data sources, enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. First, Atlas stores data of similar types using common data models, enforcing the relationships between data types. Second, integration is achieved through a combination of APIs, ontology, and tools. The Atlas software is freely available under the GNU General Public License at http://bioinformatics.ubc.ca/atlas/
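The pattern of loader applications that fill relational tables, paired with retrieval calls that wrap SQL, can be sketched with Python's built-in sqlite3 module. The two-table schema, the function names and the accessions below are hypothetical simplifications, not Atlas's actual data models or its C++, Java and Perl APIs.

```python
import sqlite3

# Hypothetical, minimal schema: one table of sequence records and one table of
# pairwise interactions, each row tagged with the source database it came from.
SCHEMA = """
CREATE TABLE sequence (
    accession TEXT PRIMARY KEY,
    source_db TEXT NOT NULL,
    tax_id    INTEGER
);
CREATE TABLE interaction (
    acc_a     TEXT NOT NULL REFERENCES sequence(accession),
    acc_b     TEXT NOT NULL REFERENCES sequence(accession),
    source_db TEXT NOT NULL
);
"""


def open_warehouse(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def load(conn, sequences, interactions):
    """Loader step: bulk-insert parsed records from several sources."""
    conn.executemany("INSERT INTO sequence VALUES (?, ?, ?)", sequences)
    conn.executemany("INSERT INTO interaction VALUES (?, ?, ?)", interactions)
    conn.commit()


def interaction_partners(conn, accession):
    """Retrieval step: partners of `accession` across all interaction sources."""
    return conn.execute(
        """SELECT CASE WHEN acc_a = ? THEN acc_b ELSE acc_a END AS partner, source_db
           FROM interaction
           WHERE acc_a = ? OR acc_b = ?
           ORDER BY partner, source_db""",
        (accession, accession, accession),
    ).fetchall()


conn = open_warehouse()
load(conn,
     [("P1", "UniProt", 9606), ("P2", "RefSeq", 9606), ("P3", "UniProt", 10090)],
     [("P1", "P2", "BIND"), ("P3", "P1", "IntAct"), ("P1", "P2", "MINT")])
print(interaction_partners(conn, "P1"))
# [('P2', 'BIND'), ('P2', 'MINT'), ('P3', 'IntAct')]
```

Keeping a `source_db` column on every row is one simple way to let a single query span interactions loaded from several databases while still reporting where each one came from.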
Atlas – a data warehouse for integrative bioinformatics

PubMed Central

Shah, Sohrab P; Huang, Yong; Xu, Tao; Yuen, Macaire MS; Ling, John; Ouellette, BF Francis

2005-01-01

Background We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure, for bioinformatics research and development. Description The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs are implemented in three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications that facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to these data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations. Conclusion The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological data sources, enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. First, Atlas stores data of similar types using common data models, enforcing the relationships between data types. Second, integration is achieved through a combination of APIs, ontology, and tools. The Atlas software is freely available under the GNU General Public License at: PMID:15723693
Text-mining and information-retrieval services for molecular biology

PubMed Central

Krallinger, Martin; Valencia, Alfonso

2005-01-01

Text-mining in molecular biology - defined as the automatic extraction of information about genes, proteins and their functional relationships from text documents - has emerged as a hybrid discipline on the edges of the fields of information science, bioinformatics and computational linguistics. A range of text-mining applications have been developed recently that will improve access to knowledge for biologists and database annotators. PMID:15998455
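One of the simplest strategies for extracting candidate gene and protein relationships from text is dictionary-based recognition of names followed by sentence-level co-occurrence counting. The Python sketch below illustrates only that baseline; the tiny lexicon and the example text are hypothetical, and practical systems add name normalization, disambiguation and relation classification on top.

```python
import itertools
import re


def cooccurrences(text, lexicon):
    """Count unordered pairs of lexicon terms that appear in the same sentence."""
    counts = {}
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = set(re.findall(r"[A-Za-z0-9-]+", sentence))
        found = sorted(lexicon & tokens)
        for pair in itertools.combinations(found, 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


lexicon = {"TP53", "MDM2", "CDKN1A"}
text = "MDM2 binds TP53. TP53 activates CDKN1A. MDM2 is amplified in some tumours."
print(cooccurrences(text, lexicon))
# {('MDM2', 'TP53'): 1, ('CDKN1A', 'TP53'): 1}
```

Co-occurrence only signals that two names appear together; it cannot say what the relationship is, which is why database annotators still review such candidates.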
PROFESS: a PROtein Function, Evolution, Structure and Sequence database

PubMed Central

Triplet, Thomas; Shortridge, Matthew D.; Griep, Mark A.; Stark, Jaime L.; Powers, Robert; Revesz, Peter

2010-01-01

The proliferation of biological databases and the easy access enabled by the Internet is having a beneficial impact on biological sciences and transforming the way research is conducted. There are ∼1100 molecular biology databases dispersed throughout the Internet. To assist in the functional, structural and evolutionary analysis of the large number of novel proteins continually identified by whole-genome sequencing, we introduce the PROFESS (PROtein Function, Evolution, Structure and Sequence) database. Our database is designed to be versatile and expandable and will not confine analysis to a pre-existing set of data relationships. A fundamental component of this approach is the development of an intuitive query system that incorporates a variety of similarity functions capable of generating data relationships not conceived during the creation of the database. The utility of PROFESS is demonstrated by the analysis of the structural drift of homologous proteins and the identification of potential pancreatic cancer therapeutic targets based on the observation of protein–protein interaction networks. Database URL: http://cse.unl.edu/∼profess/ PMID:20624718
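A query system with pluggable similarity functions derives relationships at query time rather than storing them in advance. The sketch below shows that general idea. PROFESS's actual similarity functions are not described here, so the Jaccard index, the overlap coefficient and the GO-style annotation sets below are assumptions chosen for illustration.

```python
from typing import Callable, Dict, List, Set, Tuple

Similarity = Callable[[Set[str], Set[str]], float]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared features over all features; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_coefficient(a: Set[str], b: Set[str]) -> float:
    """Shared features over the size of the smaller set."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


# New measures can be registered without changing the query code below.
SIMILARITY_FUNCTIONS: Dict[str, Similarity] = {
    "jaccard": jaccard,
    "overlap": overlap_coefficient,
}


def related_proteins(
    annotations: Dict[str, Set[str]], query: str, measure: str, threshold: float
) -> List[Tuple[str, float]]:
    """Proteins whose similarity to `query` is at least `threshold`, best first."""
    similarity = SIMILARITY_FUNCTIONS[measure]
    target = annotations[query]
    scored = [
        (name, similarity(target, features))
        for name, features in annotations.items()
        if name != query
    ]
    return sorted(
        [hit for hit in scored if hit[1] >= threshold], key=lambda hit: (-hit[1], hit[0])
    )


# Hypothetical proteins annotated with hypothetical GO-style term sets.
annotations = {
    "protA": {"GO:1", "GO:2", "GO:3"},
    "protB": {"GO:1", "GO:2"},
    "protC": {"GO:9"},
}
print(related_proteins(annotations, "protA", "overlap", 0.5))  # [('protB', 1.0)]
```

Because relationships are computed at query time, adding a measure to the registry creates new relationships without changing the stored data.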
The EBI SRS server-new features.

PubMed

Zdobnov, Evgeny M; Lopez, Rodrigo; Apweiler, Rolf; Etzold, Thure

2002-08-01

Here we report on recent developments at the EBI SRS server (http://srs.ebi.ac.uk). SRS has become an integration system for both data retrieval and sequence analysis applications. The EBI SRS server is a primary gateway to major databases in the field of molecular biology produced and supported at EBI, as well as the European public access point to the MEDLINE database provided by the US National Library of Medicine (NLM). It is a reference server for the latest developments in data and application integration. The new additions include the concept of virtual databases; integration of XML databases such as the Integrated Resource of Protein Domains and Functional Sites (InterPro), Gene Ontology (GO), MEDLINE and metabolic pathways; user-friendly data representation in 'Nice views'; and SRSQuickSearch bookmarklets. SRS6 is a licensed product of LION Bioscience AG and is freely available to academics. The EBI SRS server (http://srs.ebi.ac.uk) is a free central resource for molecular biology data as well as a reference server for the latest developments in data integration.
The Molecular Signatures Database (MSigDB) hallmark gene set collection.

PubMed

Liberzon, Arthur; Birger, Chet; Thorvaldsdóttir, Helga; Ghandi, Mahmoud; Mesirov, Jill P; Tamayo, Pablo

2015-12-23

The Molecular Signatures Database (MSigDB) is one of the most widely used and comprehensive databases of gene sets for performing gene set enrichment analysis. Since its creation, MSigDB has grown beyond its roots in metabolic disease and cancer to include >10,000 gene sets. These better represent a wider range of biological processes and diseases, but the utility of the database is reduced by increased redundancy across, and heterogeneity within, gene sets. To address this challenge, here we use a combination of automated approaches and expert curation to develop a collection of "hallmark" gene sets as part of MSigDB. Each hallmark in this collection consists of a "refined" gene set, derived from multiple "founder" sets, that conveys a specific biological state or process and displays coherent expression. The hallmarks effectively summarize most of the relevant information of the original founder sets and, by reducing both variation and redundancy, provide more refined and concise inputs for gene set enrichment analysis.
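Deriving a hallmark from several founder sets involves two ideas: finding a consensus core and measuring redundancy across sets. The simplified sketch below illustrates both. It is not the published procedure, which also relied on coherent expression and expert curation. The support threshold, the use of mean pairwise Jaccard as a redundancy score and the gene symbols are assumptions.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """|A & B| / |A | B|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def consensus_set(founders, min_support):
    """Genes that appear in at least `min_support` of the founder sets."""
    counts = Counter(gene for founder in founders for gene in set(founder))
    return {gene for gene, n in counts.items() if n >= min_support}


def mean_pairwise_jaccard(gene_sets):
    """Average Jaccard index over all pairs: a simple redundancy score."""
    pairs = list(combinations(gene_sets, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Toy founder sets (hypothetical gene symbols).
founders = [{"A", "B", "C"}, {"A", "B", "D"}, {"A", "E"}]
print(sorted(consensus_set(founders, min_support=2)))  # ['A', 'B']
print(round(mean_pairwise_jaccard(founders), 3))  # 0.333
```

A high redundancy score among founders is what motivates collapsing them into one refined set.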
The 24th annual Nucleic Acids Research database issue: a look back and upcoming changes

PubMed Central

Rigden, Daniel J

2017-01-01

Abstract This year's Database Issue of Nucleic Acids Research contains 152 papers that include descriptions of 54 new databases and update papers on 98 databases, of which 16 have not been previously featured in NAR. As always, these databases cover a broad range of molecular biology subjects, including genome structure, gene expression and its regulation, proteins, protein domains, and protein–protein interactions. Following the recent trend, an increasing number of new and established databases deal with the issues of human health, from cancer-causing mutations to drugs and drug targets. In accordance with this trend, three recently compiled databases that have been selected by NAR reviewers and editors as ‘breakthrough’ contributions, denovo-db, the Monarch Initiative, and Open Targets, cover human de novo gene variants, disease-related phenotypes in model organisms, and a bioinformatics platform for therapeutic target identification and validation, respectively. We expect these databases to attract the attention of numerous researchers working in various areas of genetics and genomics. Looking back at the past 12 years, we present here the ‘golden set’ of databases that have consistently served as authoritative, comprehensive, and convenient data resources widely used by the entire community and offer some lessons on what makes a successful database. The Database Issue is freely available online at the https://academic.oup.com/nar web site. An updated version of the NAR Molecular Biology Database Collection is available at http://www.oxfordjournals.org/nar/database/a/. PMID:28053160
Molecular signatures database (MSigDB) 3.0.

PubMed

Liberzon, Arthur; Subramanian, Aravind; Pinchback, Reid; Thorvaldsdóttir, Helga; Tamayo, Pablo; Mesirov, Jill P

2011-06-15

Well-annotated gene sets representing the universe of biological processes are critical for meaningful and insightful interpretation of large-scale genomic data. The Molecular Signatures Database (MSigDB) is one of the most widely used repositories of such sets. We report the availability of a new version of the database, MSigDB 3.0, with over 6700 gene sets, a complete revision of the collection of canonical pathways and experimental signatures from publications, enhanced annotations and upgrades to the web site. MSigDB is freely available for non-commercial use at http://www.broadinstitute.org/msigdb.
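MSigDB gene sets are commonly distributed as tab-delimited GMT files. Each line holds a set name, a description field, and then the member genes. The sketch below is a minimal standard-library reader assuming that layout. The set names and description values in the example are invented.

```python
import io


def read_gmt(handle):
    """Parse GMT lines: name <TAB> description <TAB> gene1 <TAB> gene2 ..."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = [g for g in genes if g]  # drop empty trailing fields
    return gene_sets


example = io.StringIO(
    "EXAMPLE_SET_A\thttp://example.org/a\tTP53\tMDM2\tCDKN1A\n"
    "EXAMPLE_SET_B\tna\tMYC\tMAX\t\n"
)
sets = read_gmt(example)
print({name: len(genes) for name, genes in sets.items()})
```

Keeping members as ordered lists preserves file order. Call `set()` on them when only membership matters, for example for overlap statistics.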
The European Bioinformatics Institute's data resources: towards systems biology

PubMed Central

Brooksbank, Catherine; Cameron, Graham; Thornton, Janet

2005-01-01

Genomic and post-genomic biological research has provided fine-grain insights into the molecular processes of life, but also threatens to drown biomedical researchers in data. Moreover, as new high-throughput technologies are developed, the types of data that are gathered en masse are diversifying. The need to collect, store and curate all this information in ways that allow its efficient retrieval and exploitation is greater than ever. The European Bioinformatics Institute's (EBI's) databases and tools have evolved to meet the changing needs of molecular biologists: since we last wrote about our services in the 2003 issue of Nucleic Acids Research, we have launched new databases covering protein–protein interactions (IntAct), pathways (Reactome) and small molecules (ChEBI). Our existing core databases have continued to evolve to meet the changing needs of biomedical researchers, and we have developed new data-access tools that help biologists to move intuitively through the different data types, thereby helping them to put the parts together to understand biology at the systems level. The EBI's data resources are all available on our website at http://www.ebi.ac.uk. PMID:15608238
MelanomaDB: A Web Tool for Integrative Analysis of Melanoma Genomic Information to Identify Disease-Associated Molecular Pathways

PubMed Central

Trevarton, Alexander J.; Mann, Michael B.; Knapp, Christoph; Araki, Hiromitsu; Wren, Jonathan D.; Stones-Havas, Steven; Black, Michael A.; Print, Cristin G.

2013-01-01

Despite ongoing research, metastatic melanoma survival rates remain low and treatment options are limited. Researchers can now access a rapidly growing amount of molecular and clinical information about melanoma. This information is becoming difficult to assemble and interpret due to its dispersed nature, yet as it grows it becomes increasingly valuable for understanding melanoma. Integration of this information into a comprehensive resource to aid rational experimental design and patient stratification is needed. As an initial step in this direction, we have assembled a web-accessible melanoma database, MelanomaDB, which incorporates clinical and molecular data from publicly available sources and will be regularly updated as new information becomes available. This database allows complex links to be drawn between many different aspects of melanoma biology: genetic changes (e.g., mutations) in individual melanomas revealed by DNA sequencing, associations between gene expression and patient survival, data concerning drug targets, biomarkers, druggability, and clinical trials, as well as our own statistical analysis of relationships between molecular pathways and clinical parameters that have been produced using these data sets. The database is freely available at http://genesetdb.auckland.ac.nz/melanomadb/about.html. A subset of the information in the database can also be accessed through a freely available web application in the Illumina genomic cloud computing platform BaseSpace at http://www.biomatters.com/apps/melanoma-profiler-for-research. The MelanomaDB database illustrates dysregulation of specific signaling pathways across 310 exome-sequenced melanomas and in individual tumors and identifies the distribution of somatic variants in melanoma. We suggest that MelanomaDB can provide a context in which to interpret the tumor molecular profiles of individual melanoma patients relative to biological information and available drug therapies. PMID:23875173
ISMB Conference Funding to Support Attendance of Early Researchers and Students

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gaasterland, Terry

ISMB Conference Funding for Students and Young Scientists. Historical description: The Intelligent Systems for Molecular Biology (ISMB) conference has provided a general forum for disseminating the latest developments in bioinformatics on an annual basis for the past 22 years. ISMB is a multidisciplinary conference that brings together scientists from computer science, molecular biology, mathematics and statistics. The goal of the ISMB meeting is to bring together biologists and computational scientists in a focus on actual biological problems, i.e., not simply theoretical calculations. The combined focus on “intelligent systems” and actual biological data makes ISMB a unique and highly important meeting. Twenty-one years of experience in holding the conference has resulted in a consistently well-organized, well-attended, and highly respected annual conference. "Intelligent systems" include any software that goes beyond straightforward, closed-form algorithms or standard database technologies, encompassing systems that view data in a symbolic fashion, learn from examples, consolidate multiple levels of abstraction, or synthesize results to be cognitively tractable to a human, including the development and application of advanced computational methods for biological problems. Relevant computational techniques include, but are not limited to: machine learning, pattern recognition, knowledge representation, databases, combinatorics, stochastic modeling, string and graph algorithms, linguistic methods, robotics, constraint satisfaction, and parallel computation. Biological areas of interest include molecular structure, genomics, molecular sequence analysis, evolution and phylogenetics, molecular interactions, metabolic pathways, regulatory networks, developmental control, and molecular biology generally.
Emphasis is placed on the validation of methods using real data sets, on practical applications in the biological sciences, and on development of novel computational techniques. The ISMB conferences are distinguished from many other conferences in computational biology or artificial intelligence by an insistence that the researchers work with real molecular biology data, not theoretical or toy examples; and from many other biological conferences by providing a forum for technical advances as they occur, which otherwise may be shunned until a firm experimental result is published. The resulting intellectual richness and cross-disciplinary diversity provide an important opportunity for both students and senior researchers. ISMB has become the premier conference series in this field with refereed, published proceedings, establishing an infrastructure to promote the growing body of research.
BioPAX – A community standard for pathway data sharing

PubMed Central

Demir, Emek; Cary, Michael P.; Paley, Suzanne; Fukuda, Ken; Lemer, Christian; Vastrik, Imre; Wu, Guanming; D’Eustachio, Peter; Schaefer, Carl; Luciano, Joanne; Schacherer, Frank; Martinez-Flores, Irma; Hu, Zhenjun; Jimenez-Jacinto, Veronica; Joshi-Tope, Geeta; Kandasamy, Kumaran; Lopez-Fuentes, Alejandra C.; Mi, Huaiyu; Pichler, Elgar; Rodchenkov, Igor; Splendiani, Andrea; Tkachev, Sasha; Zucker, Jeremy; Gopinath, Gopal; Rajasimha, Harsha; Ramakrishnan, Ranjani; Shah, Imran; Syed, Mustafa; Anwar, Nadia; Babur, Ozgun; Blinov, Michael; Brauner, Erik; Corwin, Dan; Donaldson, Sylva; Gibbons, Frank; Goldberg, Robert; Hornbeck, Peter; Luna, Augustin; Murray-Rust, Peter; Neumann, Eric; Reubenacker, Oliver; Samwald, Matthias; van Iersel, Martijn; Wimalaratne, Sarala; Allen, Keith; Braun, Burk; Whirl-Carrillo, Michelle; Dahlquist, Kam; Finney, Andrew; Gillespie, Marc; Glass, Elizabeth; Gong, Li; Haw, Robin; Honig, Michael; Hubaut, Olivier; Kane, David; Krupa, Shiva; Kutmon, Martina; Leonard, Julie; Marks, Debbie; Merberg, David; Petri, Victoria; Pico, Alex; Ravenscroft, Dean; Ren, Liya; Shah, Nigam; Sunshine, Margot; Tang, Rebecca; Whaley, Ryan; Letovksy, Stan; Buetow, Kenneth H.; Rzhetsky, Andrey; Schachter, Vincent; Sobral, Bruno S.; Dogrusoz, Ugur; McWeeney, Shannon; Aladjem, Mirit; Birney, Ewan; Collado-Vides, Julio; Goto, Susumu; Hucka, Michael; Le Novère, Nicolas; Maltsev, Natalia; Pandey, Akhilesh; Thomas, Paul; Wingender, Edgar; Karp, Peter D.; Sander, Chris; Bader, Gary D.

2010-01-01

BioPAX (Biological Pathway Exchange) is a standard language to represent biological pathways at the molecular and cellular level. Its major use is to facilitate the exchange of pathway data (http://www.biopax.org). Pathway data captures our understanding of biological processes, but its rapid growth necessitates development of databases and computational tools to aid interpretation. However, the current fragmentation of pathway information across many databases with incompatible formats presents barriers to its effective use. BioPAX solves this problem by making pathway data substantially easier to collect, index, interpret and share. BioPAX can represent metabolic and signaling pathways, molecular and genetic interactions and gene regulation networks. BioPAX was created through a community process. Through BioPAX, millions of interactions organized into thousands of pathways across many organisms, from a growing number of sources, are available. Thus, large amounts of pathway data are available in a computable form to support visualization, analysis and biological discovery. PMID:20829833
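BioPAX exchange files are OWL documents, often serialized as RDF/XML. The sketch below uses the standard-library `xml.etree.ElementTree` to list each pathway's components from a tiny hand-written document, assuming the BioPAX Level 3 namespace. The document, its identifiers and display names are invented. Treating RDF/XML as plain XML is only adequate for toy input like this, and production code would use a dedicated RDF or BioPAX library.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BP = "http://www.biopax.org/release/biopax-level3.owl#"

# Hand-written toy document; real exports are far larger and richer.
TOY_DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:bp="{BP}">
  <bp:Pathway rdf:ID="pathway1">
    <bp:displayName>Toy signalling pathway</bp:displayName>
    <bp:pathwayComponent rdf:resource="#reaction1"/>
    <bp:pathwayComponent rdf:resource="#reaction2"/>
  </bp:Pathway>
  <bp:BiochemicalReaction rdf:ID="reaction1">
    <bp:displayName>Phosphorylation step</bp:displayName>
  </bp:BiochemicalReaction>
  <bp:BiochemicalReaction rdf:ID="reaction2">
    <bp:displayName>Dephosphorylation step</bp:displayName>
  </bp:BiochemicalReaction>
</rdf:RDF>"""


def _identifier(element):
    """rdf:ID or rdf:about, without a leading '#', so it matches rdf:resource."""
    value = element.get(f"{{{RDF}}}ID") or element.get(f"{{{RDF}}}about") or ""
    return value.lstrip("#")


def pathway_components(xml_text):
    """Map each Pathway's displayName to the displayNames of its components."""
    root = ET.fromstring(xml_text)
    by_id = {_identifier(el): el for el in root if _identifier(el)}
    result = {}
    for pathway in root.findall(f"{{{BP}}}Pathway"):
        names = []
        for ref in pathway.findall(f"{{{BP}}}pathwayComponent"):
            target = by_id.get(ref.get(f"{{{RDF}}}resource", "").lstrip("#"))
            if target is not None:
                names.append(target.findtext(f"{{{BP}}}displayName"))
        result[pathway.findtext(f"{{{BP}}}displayName")] = names
    return result


print(pathway_components(TOY_DOC))
```

Because every object carries an identifier and references it by `rdf:resource`, tools from different sources can follow the same links. This shared structure is what makes the pathway data computable.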
Update of KDBI: Kinetic Data of Bio-molecular Interaction database

PubMed Central

Kumar, Pankaj; Han, B. C.; Shi, Z.; Jia, J.; Wang, Y. P.; Zhang, Y. T.; Liang, L.; Liu, Q. F.; Ji, Z. L.; Chen, Y. Z.

2009-01-01

Knowledge of the kinetics of biomolecular interactions is important for facilitating the study of cellular processes and underlying molecular events, and is essential for quantitative study and simulation of biological systems. Kinetic Data of Bio-molecular Interaction database (KDBI) has been developed to provide information about experimentally determined kinetic data of protein–protein, protein–nucleic acid, protein–ligand, nucleic acid–ligand binding or reaction events described in the literature. To accommodate increasing demand for studying and simulating biological systems, numerous improvements and updates have been made to KDBI, including new ways to access data by pathway and molecule names, data files in Systems Biology Markup Language format, a more efficient search engine, access to published parameter sets of simulation models of 63 pathways, and a 2.3-fold increase in data (19 263 entries of 10 532 distinct biomolecular binding and 11 954 interaction events, involving 2635 proteins/protein complexes, 847 nucleic acids, 1603 small molecules and 45 multi-step processes). KDBI is publicly available at http://bidd.nus.edu.sg/group/kdbi/kdbi.asp. PMID:18971255
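Kinetic constants support simulation through standard relations for simple 1:1 binding. The equilibrium dissociation constant is K_d = k_off / k_on. Equilibrium occupancy is [L] / (K_d + [L]). With ligand in excess, complex formation from zero approaches equilibrium with observed rate k_on[L] + k_off. The sketch below implements these textbook relations. The rate constants are illustrative values, not entries from KDBI.

```python
import math


def dissociation_constant(k_on, k_off):
    """K_d = k_off / k_on, for k_on in M^-1 s^-1 and k_off in s^-1 (result in M)."""
    return k_off / k_on


def fraction_bound_at_equilibrium(ligand, k_d):
    """Occupancy [L] / (K_d + [L]); assumes free ligand ~ total ligand (ligand in excess)."""
    return ligand / (k_d + ligand)


def fraction_bound_at_time(t, ligand, k_on, k_off):
    """Pseudo-first-order approach to equilibrium, starting from no complex."""
    k_obs = k_on * ligand + k_off
    k_d = dissociation_constant(k_on, k_off)
    return fraction_bound_at_equilibrium(ligand, k_d) * (1.0 - math.exp(-k_obs * t))


# Illustrative rate constants, not values taken from KDBI.
k_on, k_off = 1e5, 1e-3
k_d = dissociation_constant(k_on, k_off)  # about 1e-8 M (10 nM)
print(k_d, fraction_bound_at_equilibrium(k_d, k_d))
```

At a ligand concentration equal to K_d, half of the binding sites are occupied.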
Report of the matrix of biological knowledge workshop

DOE Office of Scientific and Technical Information (OSTI.GOV)

Morowitz, H.J.; Smith, T.

1987-10-30

Current understanding of biology involves complex relationships rooted in enormous amounts of data. These data include entries from biochemistry, ecology, genetics, human and veterinary medicine, molecular structure studies, agriculture, embryology, systematics, and many other disciplines. The present wealth of biological data goes beyond past accumulations and now includes new understandings from molecular biology. Several important biological databases are currently being supported, and more are planned; however, major problems of interdatabase communication and management efficiency abound. Few scientists are currently capable of keeping up with this ever-increasing wealth of knowledge, let alone searching it efficiently for new or unsuspected links and important analogies. Yet this is what is required if the continued rapid generation of such data is to lead most effectively to the major conceptual, medical, and agricultural advances anticipated over the coming decades in the United States. The opportunity exists to combine the potential of modern computer science, database management, and artificial intelligence in a major effort to organize the vast wealth of biological and clinical data. The time is right because the amount of data is still manageable even in its current highly fragmented form; important hardware and computer science tools have been greatly improved; and there have been recent fundamental advances in our comprehension of biology. The latter is particularly true at the molecular level, where the information for nearly all higher structure and function is encoded. The organization of all biological experimental data coordinately within a structure incorporating our current understanding (the Matrix of Biological Knowledge) will provide the data and structure for the major advances foreseen in the years ahead.
Integrative Systems Biology for Data Driven Knowledge Discovery

PubMed Central

Greene, Casey S.; Troyanskaya, Olga G.

2015-01-01

Integrative systems biology is an approach that brings together diverse high throughput experiments and databases to gain new insights into biological processes or systems at molecular through physiological levels. These approaches rely on diverse high-throughput experimental techniques that generate heterogeneous data by assaying varying aspects of complex biological processes. Computational approaches are necessary to provide an integrative view of these experimental results and enable data-driven knowledge discovery. Hypotheses generated from these approaches can direct definitive molecular experiments in a cost-effective manner. Using integrative systems biology approaches, we can leverage existing biological knowledge and large-scale data to improve our understanding of yet unknown components of a system of interest and how its malfunction leads to disease. PMID:21044756

Heterogeneous database integration in biomedicine.

PubMed

Sujansky, W

2001-08-01

The rapid expansion of biomedical knowledge, reduction in computing costs, and spread of internet access have created an ocean of electronic data. The decentralized nature of our scientific community and healthcare system, however, has resulted in a patchwork of diverse, or heterogeneous, database implementations, making access to and aggregation of data across databases very difficult. The database heterogeneity problem applies equally to clinical data describing individual patients and biological data characterizing our genome. Specifically, databases are highly heterogeneous with respect to the data models they employ, the data schemas they specify, the query languages they support, and the terminologies they recognize. Heterogeneous database systems attempt to unify disparate databases by providing uniform conceptual schemas that resolve representational heterogeneities, and by providing querying capabilities that aggregate and integrate distributed data. Research in this area has applied a variety of database and knowledge-based techniques, including semantic data modeling, ontology definition, query translation, query optimization, and terminology mapping. Existing systems have addressed heterogeneous database integration in the realms of molecular biology, hospital information systems, and application portability.
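Two of the techniques named above are schema mapping onto a uniform conceptual schema and terminology mapping. The sketch below shows both for two sources that store the same facts under different field names and terms. The source names, field mappings and vocabulary are hypothetical. For brevity it translates each record into the shared schema, whereas many mediator systems instead translate the query into each source's own schema.

```python
# Hypothetical field mappings from each local schema onto one shared schema.
FIELD_MAP = {
    "clinic_db": {"pt_id": "patient_id", "dx": "diagnosis"},
    "registry": {"PatientID": "patient_id", "Condition": "diagnosis"},
}

# Hypothetical terminology mapping from local terms to one shared vocabulary.
TERM_MAP = {"MI": "myocardial infarction", "heart attack": "myocardial infarction"}


def to_uniform(source, record):
    """Rename fields into the shared schema and normalize the diagnosis term."""
    mapping = FIELD_MAP[source]
    out = {mapping[key]: value for key, value in record.items() if key in mapping}
    if "diagnosis" in out:
        out["diagnosis"] = TERM_MAP.get(out["diagnosis"], out["diagnosis"])
    return out


def patients_with(sources, diagnosis):
    """One query against the shared schema, answered by every source."""
    matches = []
    for source, records in sources.items():
        for record in records:
            uniform = to_uniform(source, record)
            if uniform.get("diagnosis") == diagnosis:
                matches.append(uniform["patient_id"])
    return sorted(matches)


sources = {
    "clinic_db": [{"pt_id": "c1", "dx": "MI"}],
    "registry": [
        {"PatientID": "r7", "Condition": "heart attack"},
        {"PatientID": "r8", "Condition": "asthma"},
    ],
}
print(patients_with(sources, "myocardial infarction"))  # ['c1', 'r7']
```

Without the terminology map, "MI" and "heart attack" would never match each other even after field names were aligned.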
Pleurochrysome: A Web Database of Pleurochrysis Transcripts and Orthologs Among Heterogeneous Algae

PubMed Central

Fujiwara, Shoko; Takatsuka, Yukiko; Hirokawa, Yasutaka; Tsuzuki, Mikio; Takano, Tomoyuki; Kobayashi, Masaaki; Suda, Kunihiro; Asamizu, Erika; Yokoyama, Koji; Shibata, Daisuke; Tabata, Satoshi; Yano, Kentaro

2016-01-01

Pleurochrysis is a coccolithophorid genus, which belongs to the Coccolithales in the Haptophyta. The genus has been used extensively for biological research, together with Emiliania in the Isochrysidales, to understand distinctive features between the two coccolithophorid-including orders. However, molecular biological research on Pleurochrysis, such as elucidation of the molecular mechanism behind coccolith formation, has not made great progress, at least in part because of a lack of comprehensive gene information. To provide such information to the research community, we built an open web database, the Pleurochrysome (http://bioinf.mind.meiji.ac.jp/phapt/), which currently stores 9,023 unique gene sequences (designated as UNIGENEs) assembled from expressed sequence tag sequences of P. haptonemofera as core information. The UNIGENEs were annotated with gene sequences sharing significant homology, conserved domains, Gene Ontology, KEGG Orthology, predicted subcellular localization, open reading frames and orthologous relationship with genes of 10 other algal species, a cyanobacterium and the yeast Saccharomyces cerevisiae. This sequence and annotation information can be easily accessed via several search functions. Besides fundamental functions such as BLAST and keyword searches, this database also offers search functions to explore orthologous genes in the 12 organisms and to seek novel genes. The Pleurochrysome will promote molecular biological and phylogenetic research on coccolithophorids and other haptophytes by helping scientists mine data from the primary transcriptome of P. haptonemofera. PMID:26746174
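The abstract does not say how the orthologous relationships were inferred. One common and simple criterion is reciprocal best hits between two species' all-versus-all similarity searches, and the sketch below implements it as an illustration of that technique only. The gene identifiers and scores are invented. In practice they would come from a search tool such as BLAST.

```python
def best_hits(scores):
    """scores maps (query, subject) -> alignment score; keep each query's top subject."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical bit scores between genes of two species.
haptophyte_vs_yeast = {("u1", "y1"): 250.0, ("u1", "y2"): 90.0, ("u2", "y2"): 120.0}
yeast_vs_haptophyte = {("y1", "u1"): 240.0, ("y2", "u1"): 100.0, ("y2", "u2"): 80.0}
print(reciprocal_best_hits(haptophyte_vs_yeast, yeast_vs_haptophyte))  # [('u1', 'y1')]
```

Here `u2` is rejected: its best hit `y2` prefers `u1` in the reverse search. That reciprocity check filters out many paralog-driven false pairs.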
The European Bioinformatics Institute’s data resources 2014

PubMed Central

Brooksbank, Catherine; Bergman, Mary Todd; Apweiler, Rolf; Birney, Ewan; Thornton, Janet

2014-01-01

Molecular Biology has been at the heart of the ‘big data’ revolution from its very beginning, and the need for access to biological data is a common thread running from the 1965 publication of Dayhoff’s ‘Atlas of Protein Sequence and Structure’ through the Human Genome Project in the late 1990s and early 2000s to today’s population-scale sequencing initiatives. The European Bioinformatics Institute (EMBL-EBI; http://www.ebi.ac.uk) is one of three organizations worldwide that provides free access to comprehensive, integrated molecular data sets. Here, we summarize the principles underpinning the development of these public resources and provide an overview of EMBL-EBI’s database collection to complement the reviews of individual databases provided elsewhere in this issue. PMID:24271396
MetNetAPI: A flexible method to access and manipulate biological network data from MetNet

PubMed Central

2010-01-01

Background Convenient programmatic access to different biological databases allows automated integration of scientific knowledge. Many databases support a function to download files or data snapshots, or a webservice that offers "live" data. However, the functionality that a database offers cannot be represented in a static data download file, and webservices may consume considerable computational resources from the host server. Results MetNetAPI is a versatile Application Programming Interface (API) to the MetNetDB database. It abstracts, captures and retains operations away from a biological network repository and website. A range of database functions, previously only available online, can be immediately (and independently from the website) applied to a dataset of interest. Data is available in four layers: molecular entities, localized entities (linked to a specific organelle), interactions, and pathways. Navigation between these layers is intuitive (e.g. one can request the molecular entities in a pathway, as well as request in what pathways a specific entity participates). Data retrieval can be customized: Network objects allow the construction of new and integration of existing pathways and interactions, which can be uploaded back to our server. In contrast to webservices, the computational demand on the host server is limited to processing data-related queries only. Conclusions An API provides several advantages to a systems biology software platform. MetNetAPI illustrates an interface with a central repository of data that represents the complex interrelationships of a metabolic and regulatory network. As an alternative to data-dumps and webservices, it allows access to a current and "live" database and exposes analytical functions to application developers. Yet it only requires limited resources on the server-side (thin server/fat client setup). 
The API is available for Java, Microsoft .NET and R programming environments and offers flexible query and broad data-retrieval methods. Data retrieval can be customized to client needs and the API offers a framework to construct and manipulate user-defined networks. The design principles can be used as a template to build programmable interfaces for other biological databases. The API software and tutorials are available at http://www.metnetonline.org/api. PMID:21083943
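The layered navigation described above runs from pathway to entities and from entity to pathways. The sketch below illustrates it with a tiny in-memory model. It is hypothetical Python, not MetNetAPI, which targets Java, .NET and R. The class, its methods and the example network are assumptions, and the localized-entity layer is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Network:
    """Hypothetical in-memory model of three layers (entities, interactions, pathways)."""

    interactions: Dict[str, Set[str]] = field(default_factory=dict)  # id -> entities
    pathways: Dict[str, Set[str]] = field(default_factory=dict)  # name -> interaction ids

    def entities_in_pathway(self, pathway: str) -> List[str]:
        """Navigate down: pathway -> interactions -> molecular entities."""
        return sorted({e for i in self.pathways[pathway] for e in self.interactions[i]})

    def pathways_of_entity(self, entity: str) -> List[str]:
        """Navigate up: molecular entity -> interactions -> pathways."""
        return sorted(
            name
            for name, ids in self.pathways.items()
            if any(entity in self.interactions[i] for i in ids)
        )


net = Network(
    interactions={
        "r1": {"glucose", "glucose-6-phosphate"},
        "r2": {"glucose-6-phosphate", "fructose-6-phosphate"},
        "r3": {"sucrose", "glucose"},
    },
    pathways={"glycolysis": {"r1", "r2"}, "sucrose degradation": {"r3"}},
)
print(net.entities_in_pathway("glycolysis"))
print(net.pathways_of_entity("glucose"))  # ['glycolysis', 'sucrose degradation']
```

Both queries run locally against the client's copy of the data. This mirrors the thin-server/fat-client split the abstract describes.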
The Changing Face of Scientific Discourse: Analysis of Genomic and Proteomic Database Usage and Acceptance.

ERIC Educational Resources Information Center

Brown, Cecelia

2003-01-01

Discusses the growth in use and acceptance of Web-based genomic and proteomic databases (GPD) in scholarly communication. Confirms the role of GPD in the scientific literature cycle, suggests GPD are a storage and retrieval mechanism for molecular biology information, and recommends that existing models of scientific communication be updated to…
From metaphor to practices: The introduction of "information engineers" into the first DNA sequence database.

PubMed

García-Sancho, Miguel

2011-01-01

This paper explores the introduction of professional systems engineers and information management practices into the first centralized DNA sequence database, developed at the European Molecular Biology Laboratory (EMBL) during the 1980s. In so doing, it complements the literature on the emergence of an information discourse after World War II and its subsequent influence in biological research. By analyzing the careers of the database creators and the computer algorithms they designed, I show how, from the mid-1960s onwards, information in biology gradually shifted from a pervasive metaphor to something embodied in practices and professionals such as those incorporated at the EMBL. I then investigate the reception of these database professionals by the EMBL biological staff, which evolved from initial disregard to necessary collaboration as the relationship between DNA, genes, and proteins turned out to be more complex than expected. The trajectories of the database professionals at the EMBL suggest that the initial subject matter of the historiography of genomics should be the long-standing practices that emerged after World War II and to a large extent originated outside biomedicine and academia. Only after addressing these practices may historians turn to their further disciplinary assemblage in fields such as bioinformatics or biotechnology.
Creating and virtually screening databases of fluorescently-labelled compounds for the discovery of target-specific molecular probes

NASA Astrophysics Data System (ADS)

Kamstra, Rhiannon L.; Dadgar, Saedeh; Wigg, John; Chowdhury, Morshed A.; Phenix, Christopher P.; Floriano, Wely B.

2014-11-01

Our group has recently demonstrated that virtual screening is a useful technique for the identification of target-specific molecular probes. In this paper, we discuss some of our proof-of-concept results involving two biologically relevant target proteins, and report the development of a computational script to generate large databases of fluorescence-labelled compounds for computer-assisted molecular design. The virtual screening of a small library of 1,153 fluorescently-labelled compounds against two targets, and the experimental testing of selected hits, reveal that this approach is efficient at identifying molecular probes, and that the screening of a labelled library is preferred over the screening of base compounds followed by conjugation of confirmed hits. The automated script for library generation explores the known reactivity of commercially available dyes, such as NHS-esters, to create large virtual databases of fluorescence-tagged small molecules that can be easily synthesized in a laboratory. A database of 14,862 compounds, each tagged with the ATTO680 fluorophore, was generated with the automated script reported here. This library is available for downloading and it is suitable for virtual ligand screening aiming at the identification of target-specific fluorescent molecular probes.
DrugPath: a database for academic investigators to match oncology molecular targets with drugs in development.

PubMed

Shah, Eric D; Fisch, Brandon M A; Arceci, Robert J; Buckley, Jonathan D; Reaman, Gregory H; Sorensen, Poul H; Triche, Timothy J; Reynolds, C Patrick

2014-05-01

Academic laboratories are generating increasingly large amounts of data that describe the genomic landscape and gene expression patterns of various types of cancers. Such data can potentially identify novel oncology molecular targets in cancer types that may not be the primary focus of a drug sponsor's initial research for an investigational new drug. Obtaining preclinical data that point toward the potential of a given molecularly targeted agent, or of a novel combination of agents, requires knowledge of drugs currently in development in both the academic and commercial sectors. We have developed the DrugPath database (http://www.drugpath.org) as a comprehensive, free-of-charge resource for academic investigators to identify agents being developed in academia or industry that may act against molecular targets of interest. DrugPath data on molecular targets overlay the Michigan Molecular Interactions (http://mimi.ncibi.org) gene-gene interaction map to facilitate identification of related agents in the same pathway. The database catalogs 2,081 drug development programs representing 751 drug sponsors and 722 molecular and genetic targets. DrugPath should assist investigators in identifying and obtaining drugs acting on specific molecular targets for biological and preclinical therapeutic studies.
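Overlaying drug targets on a gene-gene interaction map lets a query return not only agents that hit a gene directly but also agents that hit its interaction partners. The sketch below illustrates that idea with a one-hop neighbourhood lookup; it is not DrugPath's implementation, and the agent names and the tiny interaction map are hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected gene-gene interaction map as an adjacency dict."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def agents_near_target(target, programs, graph):
    """Split agents into those acting on `target` directly and those acting on
    an interaction partner of `target` (one hop away in the map).

    programs: dict mapping agent name -> set of target genes.
    """
    neighbours = graph.get(target, set())  # .get avoids inserting into the defaultdict
    direct, related = [], []
    for agent, targets in sorted(programs.items()):
        if target in targets:
            direct.append(agent)
        elif targets & neighbours:
            related.append(agent)
    return direct, related
```

Restricting the search to one hop keeps results interpretable; widening it to whole pathways would return many more, weaker candidates.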
SCRIPDB: a portal for easy access to syntheses, chemicals and reactions in patents

PubMed Central

Heifets, Abraham; Jurisica, Igor

2012-01-01

The patent literature is a rich catalog of biologically relevant chemicals; many public and commercial molecular databases contain the structures disclosed in patent claims. However, patents are an equally rich source of metadata about bioactive molecules, including mechanism of action, disease class, homologous experimental series, structural alternatives, or the synthetic pathways used to produce molecules of interest. Unfortunately, this metadata is discarded when chemical structures are deposited separately in databases. SCRIPDB is a chemical structure database designed to make this metadata accessible. SCRIPDB provides the full original patent text, reactions and relationships described within any individual patent, in addition to the molecular files common to structural databases. We discuss how such information is valuable in medical text mining, chemical image analysis, reaction extraction and in silico pharmaceutical lead optimization. SCRIPDB may be searched by exact chemical structure, substructure or molecular similarity and the results may be restricted to patents describing synthetic routes. SCRIPDB is available at http://dcv.uhnres.utoronto.ca/SCRIPDB. PMID:22067445
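Searching "by molecular similarity" is commonly done by comparing structural fingerprints with the Tanimoto coefficient. The sketch below assumes fingerprints are already computed and stored as sets of set-bit indices, uses a hypothetical 0.7 threshold and hypothetical patent identifiers, and adds the option of restricting results to patents that describe synthetic routes. It is an illustration of the general technique, not SCRIPDB's search engine.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) coefficient of two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def similarity_search(query_fp, records, threshold=0.7, synthesis_only=False):
    """Return (patent, score) pairs at or above threshold, best first.

    records: dicts with 'patent', 'fingerprint' (set of ints) and
    'describes_synthesis' (bool) keys; this layout is hypothetical.
    """
    hits = []
    for rec in records:
        if synthesis_only and not rec["describes_synthesis"]:
            continue
        score = tanimoto(query_fp, rec["fingerprint"])
        if score >= threshold:
            hits.append((rec["patent"], round(score, 3)))
    return sorted(hits, key=lambda h: h[1], reverse=True)
```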
APPLYING DATA MINING APPROACHES TO FURTHER UNDERSTANDING CHEMICAL EFFECTS ON BIOLOGICAL SYSTEMS.

EPA Science Inventory

Correlations of bioassays and toxicity cannot be assessed at the compound level with the current toxicity database. Further work is planned for gaining molecular-level knowledge from these experiments.
Toward unification of taxonomy databases in a distributed computer environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kitakami, Hajime; Tateno, Yoshio; Gojobori, Takashi

1994-12-31

All the taxonomy databases constructed with the DNA databases of the international DNA data banks are powerful electronic dictionaries that aid computer-based biological research. The taxonomy databases are, however, not consistently unified in a relational format. If we can achieve consistent unification of the taxonomy databases, it will be useful for comparing many research results and for identifying future research directions from existing results. In particular, it will be useful in comparing phylogenetic trees inferred from molecular data with those constructed from morphological data. The goal of the present study is to unify the existing taxonomy databases and eliminate the inconsistencies (errors) present in them. Inconsistencies occur particularly when the existing taxonomy databases are restructured, since the classification rules for constructing the taxonomy have changed rapidly with biological advances. A repair system is needed to remove inconsistencies within each data bank and mismatches among data banks. This paper describes a new methodology for removing both inconsistencies and mismatches from the databases in a distributed computer environment. The methodology is implemented in a relational database management system, SYBASE.
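The two error classes named above, inconsistencies within one data bank and mismatches between data banks, can both be detected with plain relational queries once each taxonomy is loaded as a (taxon, parent) table. The sketch below uses Python's built-in SQLite as a stand-in for SYBASE and a deliberately simplified schema; it only detects errors and does not implement the paper's repair methodology.

```python
import sqlite3


def check_taxonomies(bank_a, bank_b):
    """Compare two lists of (taxon, parent) rows.

    Returns (mismatches, orphans): mismatches are taxa present in both banks
    whose parent differs; orphans are taxa whose parent is missing from the
    same bank (an internal inconsistency).
    """
    con = sqlite3.connect(":memory:")
    for table, rows in (("tax_a", bank_a), ("tax_b", bank_b)):
        con.execute(f"CREATE TABLE {table} (taxon TEXT PRIMARY KEY, parent TEXT)")
        con.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    # IS NOT is SQLite's NULL-safe inequality, so two NULL roots do not count as a mismatch.
    mismatches = con.execute(
        """SELECT a.taxon, a.parent, b.parent
           FROM tax_a AS a JOIN tax_b AS b ON a.taxon = b.taxon
           WHERE a.parent IS NOT b.parent
           ORDER BY a.taxon"""
    ).fetchall()
    orphans = {}
    for table in ("tax_a", "tax_b"):
        orphans[table] = [row[0] for row in con.execute(
            f"""SELECT taxon FROM {table}
                WHERE parent IS NOT NULL
                  AND parent NOT IN (SELECT taxon FROM {table})
                ORDER BY taxon""")]
    con.close()
    return mismatches, orphans
```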
Cellular and Molecular Biological Approaches to Interpreting Ancient Biomarkers

NASA Astrophysics Data System (ADS)

Newman, Dianne K.; Neubauer, Cajetan; Ricci, Jessica N.; Wu, Chia-Hung; Pearson, Ann

2016-06-01

Our ability to read the molecular fossil record has advanced significantly in the past decade. Improvements in biomarker sampling and quantification methods, expansion of molecular sequence databases, and the application of genetic and cellular biological tools to problems in biomarker research have enabled much of this progress. By way of example, we review how attempts to understand the biological function of 2-methylhopanoids in modern bacteria have changed our interpretation of what their molecular fossils tell us about the early history of life. They were once thought to be biomarkers of cyanobacteria and hence the evolution of oxygenic photosynthesis, but we now believe that 2-methylhopanoid biosynthetic capacity originated in the Alphaproteobacteria, that 2-methylhopanoids are regulated in response to stress, and that hopanoid 2-methylation enhances membrane rigidity. We present a new interpretation of 2-methylhopanes that bridges the gap between studies of the functions of 2-methylhopanoids and their patterns of occurrence in the rock record.
CREDO: a structural interactomics database for drug discovery

PubMed Central

Schreyer, Adrian M.; Blundell, Tom L.

2013-01-01

CREDO is a unique relational database storing all pairwise atomic interactions of inter- as well as intra-molecular contacts between small molecules and macromolecules found in experimentally determined structures from the Protein Data Bank. These interactions are integrated with further chemical and biological data. The database implements useful data structures and algorithms such as cheminformatics routines to create a comprehensive analysis platform for drug discovery. The database can be accessed through a web-based interface, downloads of data sets and web services at http://www-cryst.bioc.cam.ac.uk/credo. Database URL: http://www-cryst.bioc.cam.ac.uk/credo PMID:23868908
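At its simplest, enumerating pairwise atomic contacts between a small molecule and a macromolecule means computing interatomic distances and keeping pairs below a cutoff. The sketch below shows only that geometric core for inter-molecular contacts; the 4.0 Å cutoff and the atom labels are hypothetical, and CREDO's actual contact definitions (atom typing, interaction classes, intra-molecular contacts) are not reproduced here.

```python
import math
from itertools import product


def atomic_contacts(ligand_atoms, protein_atoms, cutoff=4.0):
    """Return (ligand_atom, protein_atom, distance) for every pair within cutoff.

    Atoms are (name, (x, y, z)) tuples with coordinates in angstroms.
    The default cutoff is an illustrative assumption.
    """
    contacts = []
    for (l_name, l_xyz), (p_name, p_xyz) in product(ligand_atoms, protein_atoms):
        distance = math.dist(l_xyz, p_xyz)  # Euclidean distance, Python 3.8+
        if distance <= cutoff:
            contacts.append((l_name, p_name, round(distance, 2)))
    return contacts
```

The all-pairs loop is quadratic; a database built over the whole Protein Data Bank would typically use a spatial index instead.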
Targeted Therapy Database (TTD): A Model to Match Patient's Molecular Profile with Current Knowledge on Cancer Biology

PubMed Central

Mocellin, Simone; Shrager, Jeff; Scolyer, Richard; Pasquali, Sandro; Verdi, Daunia; Marincola, Francesco M.; Briarava, Marta; Gobbel, Randy; Rossi, Carlo; Nitti, Donato

2010-01-01

Background The efficacy of current anticancer treatments is far from satisfactory and many patients still die of their disease. A general agreement exists on the urgency of developing molecularly targeted therapies, although their implementation in the clinical setting is in its infancy. In fact, despite the wealth of preclinical studies addressing these issues, the difficulty of testing each targeted therapy hypothesis in the clinical arena represents an intrinsic obstacle. As a consequence, we are witnessing a paradoxical situation where most hypotheses about the molecular and cellular biology of cancer remain clinically untested and therefore do not translate into a therapeutic benefit for patients. Objective To present a computational method aimed to comprehensively exploit the scientific knowledge in order to foster the development of personalized cancer treatment by matching the patient's molecular profile with the available evidence on targeted therapy. Methods To this aim we focused on melanoma, an increasingly diagnosed malignancy for which the need for novel therapeutic approaches is paradigmatic since no effective treatment is available in the advanced setting. Relevant data were manually extracted from peer-reviewed full-text original articles describing any type of anti-melanoma targeted therapy tested in any type of experimental or clinical model. To this purpose, Medline, Embase, Cancerlit and the Cochrane databases were searched. Results and Conclusions We created a manually annotated database (Targeted Therapy Database, TTD) where the relevant data are gathered in a formal representation that can be computationally analyzed. Dedicated algorithms were set up for the identification of the prevalent therapeutic hypotheses based on the available evidence and for ranking treatments based on the molecular profile of individual patients. 
In this essay we describe the principles and computational algorithms of an original method developed to fully exploit the available knowledge on cancer biology with the ultimate goal of fruitfully driving both preclinical and clinical research on anticancer targeted therapy. In the light of its theoretical nature, the prediction performance of this model must be validated before it can be implemented in the clinical setting. PMID:20706624
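The abstract does not give TTD's ranking algorithm, so the following is only a hedged illustration of the general idea of matching a patient's molecular profile against annotated evidence. It assumes each evidence record links a treatment to a gene alteration with a hypothetical numeric weight, and ranks treatments by the summed weight of evidence whose alteration is present in the patient.

```python
from collections import defaultdict


def rank_treatments(evidence, patient_profile):
    """Rank treatments by summed weight of evidence matching the patient.

    evidence: iterable of (treatment, gene, alteration, weight) records
              (a hypothetical representation, not TTD's schema).
    patient_profile: dict mapping gene -> alteration observed in the patient.
    Ties are broken alphabetically by treatment name.
    """
    scores = defaultdict(float)
    for treatment, gene, alteration, weight in evidence:
        if patient_profile.get(gene) == alteration:
            scores[treatment] += weight
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

As the authors stress for their own model, any such ranking would need validation before clinical use.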
Targeted Therapy Database (TTD): a model to match patient's molecular profile with current knowledge on cancer biology.

PubMed

Mocellin, Simone; Shrager, Jeff; Scolyer, Richard; Pasquali, Sandro; Verdi, Daunia; Marincola, Francesco M; Briarava, Marta; Gobbel, Randy; Rossi, Carlo; Nitti, Donato

2010-08-10

The efficacy of current anticancer treatments is far from satisfactory and many patients still die of their disease. A general agreement exists on the urgency of developing molecularly targeted therapies, although their implementation in the clinical setting is in its infancy. In fact, despite the wealth of preclinical studies addressing these issues, the difficulty of testing each targeted therapy hypothesis in the clinical arena represents an intrinsic obstacle. As a consequence, we are witnessing a paradoxical situation where most hypotheses about the molecular and cellular biology of cancer remain clinically untested and therefore do not translate into a therapeutic benefit for patients. To present a computational method aimed to comprehensively exploit the scientific knowledge in order to foster the development of personalized cancer treatment by matching the patient's molecular profile with the available evidence on targeted therapy. To this aim we focused on melanoma, an increasingly diagnosed malignancy for which the need for novel therapeutic approaches is paradigmatic since no effective treatment is available in the advanced setting. Relevant data were manually extracted from peer-reviewed full-text original articles describing any type of anti-melanoma targeted therapy tested in any type of experimental or clinical model. To this purpose, Medline, Embase, Cancerlit and the Cochrane databases were searched. We created a manually annotated database (Targeted Therapy Database, TTD) where the relevant data are gathered in a formal representation that can be computationally analyzed. Dedicated algorithms were set up for the identification of the prevalent therapeutic hypotheses based on the available evidence and for ranking treatments based on the molecular profile of individual patients. 
In this essay we describe the principles and computational algorithms of an original method developed to fully exploit the available knowledge on cancer biology with the ultimate goal of fruitfully driving both preclinical and clinical research on anticancer targeted therapy. In the light of its theoretical nature, the prediction performance of this model must be validated before it can be implemented in the clinical setting.
GIDL: a rule based expert system for GenBank Intelligent Data Loading into the Molecular Biodiversity database

PubMed Central

2012-01-01

Background In the scientific biodiversity community, there is an increasingly perceived need to build a bridge between molecular and traditional biodiversity studies. We believe that information technology could play a preeminent role in integrating the information generated by these studies with the large amount of molecular data available in public bioinformatics databases. This work is primarily aimed at building a bioinformatic infrastructure for the integration of public and private biodiversity data through the development of GIDL, an Intelligent Data Loader coupled with the Molecular Biodiversity Database. The system presented here organizes, in an ontological way, and locally stores the sequence and annotation data contained in the GenBank primary database. Methods The GIDL architecture consists of a relational database and intelligent data loader software. The relational database schema is designed to manage biodiversity information (Molecular Biodiversity Database) and is organized in four areas: MolecularData, Experiment, Collection and Taxonomy. The MolecularData area is inspired by an established standard in Generic Model Organism Databases, the Chado relational schema. The peculiarity of Chado, and also its strength, is the adoption of an ontological schema that makes use of the Sequence Ontology. The Intelligent Data Loader (IDL) component of GIDL is Extract, Transform and Load software able to parse data, discover hidden information in the GenBank entries and populate the Molecular Biodiversity Database. The IDL is composed of three main modules: the Parser, able to parse GenBank flat files; the Reasoner, which automatically builds CLIPS facts mapping the biological knowledge expressed by the Sequence Ontology; and the DBFiller, which translates the CLIPS facts into ordered SQL statements used to populate the database. 
In GIDL, Semantic Web technologies have been adopted due to their advantages in data representation, integration and processing. Results and conclusions Entries from the Virus (814,122), Plant (1,365,360) and Invertebrate (959,065) divisions of GenBank rel.180 have been loaded into the Molecular Biodiversity Database by GIDL. Our system, combining the Sequence Ontology and the Chado schema, allows more expressive queries than the most commonly used sequence retrieval systems such as Entrez or SRS. PMID:22536971
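The Parser → Reasoner → DBFiller pipeline can be illustrated as a three-stage extract-transform-load sketch. This is not GIDL: the "reasoner" is reduced to a dictionary lookup (GIDL builds CLIPS facts against the Sequence Ontology), the two-table schema is hypothetical and far simpler than Chado, SQLite stands in for the real database, and the LOCUS parsing assumes the modern layout with a topology column. The GenBank division codes VRL, PLN and INV correspond to the Virus, Plant and Invertebrate divisions mentioned above.

```python
import sqlite3

DIVISIONS = {"VRL": "Virus", "PLN": "Plant", "INV": "Invertebrate"}

# Hypothetical minimal schema (not Chado).
SCHEMA = """
CREATE TABLE organism (organism_id INTEGER PRIMARY KEY, name TEXT UNIQUE, division TEXT);
CREATE TABLE sequence (accession TEXT PRIMARY KEY,
                       organism_id INTEGER REFERENCES organism(organism_id),
                       mol_type TEXT, length INTEGER, definition TEXT);
"""


def parse_header(flat_text):
    """Parser stage: map each header keyword (columns 1-12) to its value lines."""
    fields, key = {}, None
    for line in flat_text.splitlines():
        if line.startswith(("FEATURES", "ORIGIN", "//")):
            break  # only the header section is handled in this sketch
        tag, value = line[:12].strip(), line[12:].strip()
        if tag:
            key = tag
            fields.setdefault(key, []).append(value)
        elif key:
            fields[key].append(value)  # continuation line
    return fields


def to_facts(fields):
    """'Reasoner' stage, reduced to a lookup: derive typed facts from raw fields."""
    locus = fields["LOCUS"][0].split()
    return {
        "accession": fields["ACCESSION"][0].split()[0],
        "length": int(locus[1]),
        "mol_type": locus[3],
        "division": DIVISIONS.get(locus[5], locus[5]),
        "organism": fields["ORGANISM"][0],  # first line only; the rest is lineage
        "definition": " ".join(fields.get("DEFINITION", [])),
    }


def to_sql(facts):
    """DBFiller stage: ordered, parameterised statements (organism first so the
    sequence row can resolve its foreign key)."""
    return [
        ("INSERT OR IGNORE INTO organism (name, division) VALUES (?, ?)",
         (facts["organism"], facts["division"])),
        ("INSERT INTO sequence (accession, organism_id, mol_type, length, definition) "
         "VALUES (?, (SELECT organism_id FROM organism WHERE name = ?), ?, ?, ?)",
         (facts["accession"], facts["organism"], facts["mol_type"],
          facts["length"], facts["definition"])),
    ]
```

Running `con.executescript(SCHEMA)` and then executing each `(sql, params)` pair from `to_sql(to_facts(parse_header(text)))` loads one entry; statement order matters because the sequence row looks up the organism row.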
Network-based drug discovery by integrating systems biology and computational technologies

PubMed Central

Leung, Elaine L.; Cao, Zhi-Wei; Jiang, Zhi-Hong; Zhou, Hua

2013-01-01

Network-based intervention has become a trend in treating systemic diseases, but it relies on regimen optimization and valid multi-target actions of the drugs. The complex multi-component nature of medicinal herbs may make them valuable resources for network-based multi-target drug discovery, owing to their potential synergistic treatment effects. Recently, multiple systems biology platforms have proven powerful in uncovering molecular mechanisms and the connections between drugs and their dynamic target networks. However, methods for optimizing drug combinations remain insufficient, owing to a lack of tight integration across multiple ‘-omics’ databases. Newly developed algorithm- or network-based computational models can tightly integrate ‘-omics’ databases and optimize combinational regimens in drug development, which encourages the development of medicinal herbs into a new wave of network-based multi-target drugs. However, challenges to further integration across herbal medicine databases and multiple systems biology platforms for multi-target drug optimization remain, owing to the uncertain reliability of individual data sets and the limited breadth, depth and degree of standardization of herbal medicine data. Standardizing the methodology and terminology of systems biology platforms and herbal databases would facilitate this integration. Enhancing publicly accessible databases and increasing the number of studies that apply systems biology platforms to herbal medicine would also help. Further integration across various ‘-omics’ platforms and computational tools would accelerate the development of network-based drug discovery and network medicine. PMID:22877768
Teaching the Extracellular Matrix and Introducing Online Databases within a Multidisciplinary Course with i-Cell-MATRIX: A Student-Centered Approach

ERIC Educational Resources Information Center

Sousa, Joao Carlos; Costa, Manuel Joao; Palha, Joana Almeida

2010-01-01

The biochemistry and molecular biology of the extracellular matrix (ECM) are difficult to convey to students in a classroom setting in ways that capture their interest. Students' understanding of the matrix's roles in physiological and pathological conditions will presumably be hampered by insufficient knowledge of its molecular structure.…
XML-based approaches for the integration of heterogeneous bio-molecular data.

PubMed

Mesiti, Marco; Jiménez-Ruiz, Ernesto; Sanz, Ismael; Berlanga-Llavori, Rafael; Perlasca, Paolo; Valentini, Giorgio; Manset, David

2009-10-15

Today's public database infrastructure spans a very large collection of heterogeneous biological data, opening new opportunities for molecular biology, biomedical and bioinformatics research but also raising new problems for their integration and computational processing. In this paper we survey the most interesting and novel approaches for the representation, integration and management of different kinds of biological data by exploiting XML and the related recommendations and approaches. Moreover, we present new cutting-edge approaches for the appropriate management of heterogeneous biological data represented through XML. XML has succeeded in the integration of heterogeneous biomolecular information and has established itself as the syntactic glue for biological data sources. Nevertheless, a large variety of XML-based data formats have been proposed, making effective integration of bioinformatics data schemes difficult. The adoption of a few semantically rich standard formats is urgently needed to achieve seamless integration of current biological resources.
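The integration problem described above, many XML formats carrying the same kind of information, is often handled by mapping each source format onto a common record. The sketch below shows that mapping approach with Python's standard `xml.etree.ElementTree`; both source formats and the field mapping are hypothetical. Because ElementTree supports only a limited XPath subset, attribute paths such as `Taxon/@scientificName` are resolved by a small helper.

```python
import xml.etree.ElementTree as ET

# Per-source mapping from common field -> path relative to the record element.
# Both source layouts are hypothetical examples of heterogeneous formats.
MAPPINGS = {
    "source_a": {"record": "entry", "id": "accession",
                 "name": "protein/name", "organism": "organism"},
    "source_b": {"record": "Protein", "id": "@id",
                 "name": "Name", "organism": "Taxon/@scientificName"},
}


def _extract(elem, path):
    """Resolve an element path, or 'elem/path/@attr' for attributes."""
    if "@" in path:
        elem_path, _, attr = path.rpartition("@")
        elem_path = elem_path.rstrip("/")
        target = elem.find(elem_path) if elem_path else elem
        return None if target is None else target.get(attr)
    return elem.findtext(path)


def integrate(documents):
    """documents: iterable of (source_name, xml_text); returns common records."""
    records = []
    for source, xml_text in documents:
        spec = MAPPINGS[source]
        root = ET.fromstring(xml_text)
        for elem in root.iter(spec["record"]):
            record = {field: _extract(elem, path)
                      for field, path in spec.items() if field != "record"}
            record["source"] = source
            records.append(record)
    return records
```

Keeping the mappings as data rather than code means a new source format needs only a new `MAPPINGS` entry, which echoes the abstract's point that fewer, richer standard formats would simplify integration further.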
The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide

PubMed Central

Liolios, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Philip; Kyrpides, Nikos C.

2006-01-01

The Genomes On Line Database (GOLD) is a web resource for comprehensive access to information regarding complete and ongoing genome sequencing projects worldwide. The database currently incorporates information on over 1500 sequencing projects, of which 294 have been completed and the data deposited in the public databases. GOLD v.2 has been expanded to provide information related to organism properties such as phenotype, ecotype and disease. Furthermore, project relevance and availability information is now included. GOLD is available at . It is also mirrored at the Institute of Molecular Biology and Biotechnology, Crete, Greece at PMID:16381880

TranscriptomeBrowser 3.0: introducing a new compendium of molecular interactions and a new visualization tool for the study of gene regulatory networks.

PubMed

Lepoivre, Cyrille; Bergon, Aurélie; Lopez, Fabrice; Perumal, Narayanan B; Nguyen, Catherine; Imbert, Jean; Puthier, Denis

2012-01-31

Deciphering gene regulatory networks by in silico approaches is a crucial step in the study of the molecular perturbations that occur in diseases. The development of regulatory maps is a tedious process requiring the comprehensive integration of various kinds of evidence scattered over biological databases. Thus, the research community would greatly benefit from having a unified database storing known and predicted molecular interactions. Furthermore, given the intrinsic complexity of the data, new tools offering integrated and meaningful visualizations of molecular interactions are needed to help users draw new hypotheses without being overwhelmed by the density of the resulting graph. We extend the previously developed TranscriptomeBrowser database with a set of tables containing 1,594,978 human and mouse molecular interactions. The database includes: (i) predicted regulatory interactions (computed by scanning vertebrate alignments with a set of 1,213 position weight matrices), (ii) potential regulatory interactions inferred from systematic analysis of ChIP-seq experiments, (iii) regulatory interactions curated from the literature, (iv) predicted post-transcriptional regulation by micro-RNA, (v) protein kinase-substrate interactions and (vi) physical protein-protein interactions. To retrieve and analyze these interactions easily and efficiently, we developed InteractomeBrowser, a graph-based knowledge browser that comes as a plug-in for TranscriptomeBrowser. The first objective of InteractomeBrowser is to provide a user-friendly tool for gaining new insight into any gene list by providing a context-specific display of putative regulatory and physical interactions. To achieve this, InteractomeBrowser relies on a "cell compartments-based layout" that uses a subset of the Gene Ontology to map gene products onto relevant cell compartments. 
This layout is particularly powerful for visual integration of heterogeneous biological information and is a productive avenue for generating new hypotheses. The second objective of InteractomeBrowser is to fill the gap between interaction databases and dynamic modeling. It is thus compatible with the network analysis software Cytoscape and with the Gene Interaction Network simulation software (GINsim). We provide examples illustrating the benefits of this visualization tool for large gene set analysis related to thymocyte differentiation. The InteractomeBrowser plugin is a powerful tool for quick access to a knowledge database that includes both predicted and validated molecular interactions. InteractomeBrowser is available through the TranscriptomeBrowser framework and can be found at: http://tagc.univ-mrs.fr/tbrowser/. Our database is updated on a regular basis.
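Predicted regulatory interactions of type (i) come from scanning sequences with position weight matrices. The sketch below shows the basic technique on a single forward-strand sequence: per-position base counts are converted to log2-odds scores against a uniform background, and every window scoring at or above a threshold is reported. The pseudocount, background, threshold and toy "TATA" counts are assumptions; the authors' scan over vertebrate alignments (and any conservation filtering) is not reproduced.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts (list of dicts) into a log2-odds PWM."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, score) for each forward-strand window scoring >= threshold."""
    width = len(pwm)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, round(score, 3)))
    return hits
```

With counts of 10 for the consensus base at each of four positions, a perfect match scores 4 × log2(3.5) ≈ 7.23 bits, so `scan("GGTATAGG", pwm, 5.0)` reports a single hit at position 2.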
Metabolite signal identification in accurate mass metabolomics data with MZedDB, an interactive m/z annotation tool utilising predicted ionisation behaviour 'rules'

PubMed Central

Draper, John; Enot, David P; Parker, David; Beckmann, Manfred; Snowdon, Stuart; Lin, Wanchang; Zubair, Hassan

2009-01-01

Background Metabolomics experiments using Mass Spectrometry (MS) technology measure the mass to charge ratio (m/z) and intensity of ionised molecules in crude extracts of complex biological samples to generate high dimensional metabolite 'fingerprint' or metabolite 'profile' data. High resolution MS instruments perform routinely with a mass accuracy of < 5 ppm (parts per million), thus potentially providing a direct method for putative signal annotation using databases containing metabolite mass information. Most database interfaces support only simple queries with the default assumption that molecules either gain or lose a single proton when ionised. In reality the annotation process is confounded by the fact that many ionisation products include not only molecular isotopes but also salt/solvent adducts and neutral loss fragments of original metabolites. This report describes an annotation strategy that will allow searching based on all potential ionisation products predicted to form during electrospray ionisation (ESI). Results Metabolite 'structures' harvested from publicly accessible databases were converted into a common format to generate a comprehensive archive in MZedDB. 'Rules' were derived from chemical information that allowed MZedDB to generate a list of adducts and neutral loss fragments putatively able to form for each structure and calculate, on the fly, the exact molecular weight of every potential ionisation product to provide targets for annotation searches based on accurate mass. We demonstrate that data matrices representing populations of ionisation products generated from different biological matrices contain a large proportion (sometimes > 50%) of molecular isotopes, salt adducts and neutral loss fragments. Correlation analysis of ESI-MS data features confirmed the predicted relationships of m/z signals. An integrated isotope enumerator in MZedDB allowed verification of exact isotopic pattern distributions to corroborate experimental data. 
Conclusion We conclude that although ultra-high accurate mass instruments provide major insight into the chemical diversity of biological extracts, the facile annotation of a large proportion of signals is not possible by simple, automated query of current databases using computed molecular formulae. Parameterising MZedDB to take into account predicted ionisation behaviour and the biological source of any sample greatly improves both the frequency and accuracy of potential annotation 'hits' in ESI-MS data. PMID:19622150
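The core of adduct-aware annotation is simple arithmetic: each candidate ion's m/z is the neutral monoisotopic mass plus an adduct-specific shift, and an observed peak is annotated when it falls within a ppm tolerance. The sketch below covers a handful of common singly charged ESI species and is not MZedDB's rule set, which also handles isotopes, multiple charges and source-specific rules. The 5 ppm default mirrors the instrument accuracy quoted above; the metabolite masses in the test are standard monoisotopic values.

```python
PROTON = 1.007276  # Da

# Mass shifts (Da) added to the neutral monoisotopic mass M for a few common
# singly charged ESI species; the suffix encodes the ion polarity.
ADDUCTS = {
    "[M+H]+": PROTON,
    "[M+Na]+": 22.989218,
    "[M+K]+": 38.963158,
    "[M+NH4]+": 18.033823,
    "[M+H-H2O]+": PROTON - 18.010565,
    "[M-H]-": -PROTON,
}


def ppm_error(observed, theoretical):
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def annotate(observed_mz, metabolites, tolerance_ppm=5.0, mode="+"):
    """Return (metabolite, adduct, theoretical m/z, ppm error) for every
    candidate ion of the given polarity within tolerance of observed_mz.

    metabolites: dict mapping name -> neutral monoisotopic mass (Da).
    """
    hits = []
    for name, neutral_mass in metabolites.items():
        for adduct, shift in ADDUCTS.items():
            if not adduct.endswith(mode):
                continue
            theoretical = neutral_mass + shift
            err = ppm_error(observed_mz, theoretical)
            if abs(err) <= tolerance_ppm:
                hits.append((name, adduct, round(theoretical, 6), round(err, 2)))
    return hits
```

A peak at m/z 203.0526 is explained as the sodium adduct of a hexose, but glucose and fructose share the formula C6H12O6, so accurate mass alone returns both; this is one reason the abstract emphasises biological source as an additional filter.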
ChlamyCyc: an integrative systems biology database and web-portal for Chlamydomonas reinhardtii.

PubMed

May, Patrick; Christian, Jan-Ole; Kempa, Stefan; Walther, Dirk

2009-05-04

The unicellular green alga Chlamydomonas reinhardtii is an important eukaryotic model organism for the study of photosynthesis and plant growth. In the era of modern high-throughput technologies there is an imperative need to integrate large-scale data sets from high-throughput experimental techniques using computational methods and database resources to provide comprehensive information about the molecular and cellular organization of a single organism. In the framework of the German Systems Biology initiative GoFORSYS, a pathway database and web-portal for Chlamydomonas (ChlamyCyc) was established, which currently features about 250 metabolic pathways with associated genes, enzymes, and compound information. ChlamyCyc was assembled using an integrative approach combining the recently published genome sequence, bioinformatics methods, and experimental data from metabolomics and proteomics experiments. We analyzed and integrated a combination of primary and secondary database resources, such as existing genome annotations from JGI, EST collections, orthology information, and MapMan classification. ChlamyCyc provides a curated and integrated systems biology repository that will enable and assist in systematic studies of fundamental cellular processes in Chlamydomonas. The ChlamyCyc database and web-portal is freely available under http://chlamycyc.mpimp-golm.mpg.de.
biochem4j: Integrated and extensible biochemical knowledge through graph databases.

PubMed

Swainston, Neil; Batista-Navarro, Riza; Carbonell, Pablo; Dobson, Paul D; Dunstan, Mark; Jervis, Adrian J; Vinaixa, Maria; Williams, Alan R; Ananiadou, Sophia; Faulon, Jean-Loup; Mendes, Pedro; Kell, Douglas B; Scrutton, Nigel S; Breitling, Rainer

2017-01-01

Biologists and biochemists have at their disposal a number of excellent, publicly available data resources such as UniProt, KEGG, and NCBI Taxonomy, which catalogue biological entities. Despite the usefulness of these resources, they remain fundamentally unconnected. While links may appear between entries across these databases, users are typically only able to follow such links by manual browsing or through specialised workflows. Although many of the resources provide web-service interfaces for computational access, performing federated queries across databases remains a non-trivial but essential activity in interdisciplinary systems and synthetic biology programmes. What is needed are integrated repositories to catalogue both biological entities and, crucially, the relationships between them. Such a resource should be extensible, such that newly discovered relationships (for example, those between novel, synthetic enzymes and non-natural products) can be added over time. With the introduction of graph databases, the barrier to the rapid generation, extension and querying of such a resource has been lowered considerably. With a particular focus on metabolic engineering as an illustrative application domain, biochem4j, freely available at http://biochem4j.org, is introduced to provide an integrated, queryable database that warehouses chemical, reaction, enzyme and taxonomic data from a range of reliable resources. The biochem4j framework establishes a starting point for the flexible integration and exploitation of an ever-wider range of biological data sources, from public databases to laboratory-specific experimental datasets, for the benefit of systems biologists, biosystems engineers and the wider community of molecular biologists and biological chemists.
biochem4j: Integrated and extensible biochemical knowledge through graph databases

PubMed Central

Batista-Navarro, Riza; Dunstan, Mark; Jervis, Adrian J.; Vinaixa, Maria; Ananiadou, Sophia; Faulon, Jean-Loup; Kell, Douglas B.

2017-01-01

Biologists and biochemists have at their disposal a number of excellent, publicly available data resources such as UniProt, KEGG, and NCBI Taxonomy, which catalogue biological entities. Despite the usefulness of these resources, they remain fundamentally unconnected. While links may appear between entries across these databases, users are typically only able to follow such links by manual browsing or through specialised workflows. Although many of the resources provide web-service interfaces for computational access, performing federated queries across databases remains a non-trivial but essential activity in interdisciplinary systems and synthetic biology programmes. What is needed are integrated repositories to catalogue both biological entities and–crucially–the relationships between them. Such a resource should be extensible, such that newly discovered relationships–for example, those between novel, synthetic enzymes and non-natural products–can be added over time. With the introduction of graph databases, the barrier to the rapid generation, extension and querying of such a resource has been lowered considerably. With a particular focus on metabolic engineering as an illustrative application domain, biochem4j, freely available at http://biochem4j.org, is introduced to provide an integrated, queryable database that warehouses chemical, reaction, enzyme and taxonomic data from a range of reliable resources. The biochem4j framework establishes a starting point for the flexible integration and exploitation of an ever-wider range of biological data sources, from public databases to laboratory-specific experimental datasets, for the benefit of systems biologists, biosystems engineers and the wider community of molecular biologists and biological chemists. PMID:28708831
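The advantage of storing relationships explicitly is that multi-step questions, such as "which enzymes, from which organisms, catalyse a reaction producing this chemical?", become short traversals. A graph database answers such questions with a declarative query; the pure-Python sketch below only illustrates the traversal idea with an in-memory graph. The node identifiers and relationship names are hypothetical and do not reflect biochem4j's actual schema.

```python
from collections import defaultdict


class PropertyGraph:
    """Tiny in-memory graph of (node)-[RELATIONSHIP]->(node) triples."""

    def __init__(self):
        self.outgoing = defaultdict(list)
        self.incoming = defaultdict(list)

    def add(self, src, rel, dst):
        self.outgoing[src].append((rel, dst))
        self.incoming[dst].append((rel, src))

    def neighbours(self, nodes, rel, reverse=False):
        """Nodes one `rel` hop from any of `nodes` (against edge direction if reverse)."""
        index = self.incoming if reverse else self.outgoing
        return {other for n in nodes for r, other in index.get(n, ()) if r == rel}


def producers_of(graph, chemical):
    """Enzymes (with source organisms) catalysing any reaction that yields `chemical`."""
    reactions = graph.neighbours({chemical}, "HAS_PRODUCT", reverse=True)
    enzymes = graph.neighbours(reactions, "CATALYSES", reverse=True)
    return sorted((e, sorted(graph.neighbours({e}, "FROM_ORGANISM"))) for e in enzymes)
```

Indexing edges in both directions keeps reverse hops (product back to reaction, reaction back to enzyme) as cheap as forward ones, which is the property that makes this kind of query natural in a graph store.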
Database constraints applied to metabolic pathway reconstruction tools.

PubMed

Vilaplana, Jordi; Solsona, Francesc; Teixido, Ivan; Usié, Anabel; Karathia, Hiren; Alves, Rui; Mateo, Jordi

2014-01-01

Our group developed two biological applications, Biblio-MetReS and Homol-MetReS, accessing the same database of organisms with annotated genes. Biblio-MetReS is a data-mining application that facilitates the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (re)annotation of proteomes, to properly identify both the individual proteins involved in the process(es) of interest and their function. It also enables the sets of proteins involved in the process(es) in different organisms to be compared directly. The efficiency of these biological applications is directly related to the design of the shared database. We classified and analyzed the different kinds of access to the database. Based on this study, we tried to adjust and tune the configurable parameters of the database server to reach the best performance of the communication data link to/from the database system. Different database technologies were analyzed. We started the study with a public relational SQL database, MySQL. Then, the same database was implemented by a MapReduce-based database named HBase. The results indicated that the standard configuration of MySQL gives an acceptable performance for low or medium size databases. Nevertheless, tuning database parameters can greatly improve the performance and lead to very competitive runtimes.
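Tuning starts from knowing the workload, which here means classifying the different kinds of access to the shared database. The sketch below assumes a hypothetical log of SQL statements and counts (access kind, table) pairs using simple regular expressions; it is not the authors' analysis, and it only extracts the first table each statement touches. A read-dominated profile would typically motivate different server settings than a write-heavy one.

```python
import re
from collections import Counter

# Per-verb patterns that pull out the first table a statement touches.
TABLE_PATTERNS = {
    "SELECT": re.compile(r"\bFROM\s+(\w+)", re.IGNORECASE),
    "DELETE": re.compile(r"\bFROM\s+(\w+)", re.IGNORECASE),
    "INSERT": re.compile(r"\bINTO\s+(\w+)", re.IGNORECASE),
    "UPDATE": re.compile(r"^\s*UPDATE\s+(\w+)", re.IGNORECASE),
}


def classify_accesses(statements):
    """Count (access kind, table) pairs in a log of SQL statements.

    Kinds are 'read' (SELECT), 'write' (INSERT/UPDATE/DELETE) and 'other'.
    """
    counts = Counter()
    for sql in statements:
        words = sql.split(None, 1)
        if not words:
            continue  # skip blank log lines
        verb = words[0].upper()
        pattern = TABLE_PATTERNS.get(verb)
        match = pattern.search(sql) if pattern else None
        kind = "read" if verb == "SELECT" else "write" if pattern else "other"
        counts[(kind, match.group(1).lower() if match else "?")] += 1
    return counts
```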
Evaluation of MALDI-TOF mass spectrometry for identification of environmental yeasts and development of supplementary database.

PubMed

Agustini, Bruna Carla; Silva, Luciano Paulino; Bloch, Carlos; Bonfim, Tania M B; da Silva, Gildo Almeida

2014-06-01

Yeast identification using traditional methods based on morphological, physiological, and biochemical characteristics can be a difficult task, because it requires experienced microbiologists and rigorous control of culture conditions, variations in which can lead to different outcomes. For clinical and industrial applications, there is a growing demand for fast and accurate identification of microorganisms. Hence, molecular biology approaches have been used extensively and, more recently, protein profiling using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) has proved to be an even more efficient tool for taxonomic purposes. Nonetheless, for mass spectrometry, the data available for differentiating yeast species of industrial interest are limited, and commercially available reference databases comprise almost exclusively clinical microorganisms. Studies focusing on environmental isolates are therefore required to extend the existing databases. The aims of this study were to develop a supplementary database and to assess a commercial database for the taxonomic identification of environmental yeasts. We used MALDI-TOF MS to generate protein profiles for 845 yeast strains isolated from grape must; 67.7% of the strains were successfully identified using the manufacturer's existing database. The remaining 32.3% of strains were not identified because no reference spectrum was available for them. After the correct taxon for these strains had been determined using molecular biology approaches, spectra for the missing species were added to a supplementary database. With this new library, MALDI-TOF MS accurately identified the previously unidentified species at the first attempt, demonstrating that it is a powerful tool for the identification of environmental yeasts.
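The abstract hinges on one mechanism: a strain is identified only if a sufficiently similar reference spectrum exists, so adding references for missing species makes those strains identifiable. The sketch below illustrates that logic with a generic binned cosine-similarity match. This is an assumption for illustration: commercial MALDI-TOF systems use their own scoring, and the bin width, the 0.7 threshold and all spectra here are hypothetical toy values.

```python
import math
from collections import defaultdict

def bin_spectrum(peaks, width=10.0):
    """Sum the intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    bins = defaultdict(float)
    for mz, intensity in peaks:
        bins[int(mz // width)] += intensity
    return bins

def cosine(a, b):
    """Cosine similarity between two sparse binned spectra."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def identify(query_peaks, library, threshold=0.7):
    """Best-matching species and score, or None when no reference reaches the threshold."""
    query = bin_spectrum(query_peaks)
    best, best_score = None, 0.0
    for species, ref_peaks in library.items():
        score = cosine(query, bin_spectrum(ref_peaks))
        if score > best_score:
            best, best_score = species, score
    return (best, best_score) if best_score >= threshold else (None, best_score)

# Hypothetical reference spectra (toy values, not real yeast profiles).
manufacturer_db = {
    "species_A": [(4000.0, 1.0), (5000.0, 0.5)],
    "species_B": [(6000.0, 1.0), (7000.0, 0.8)],
}
supplementary_db = {"species_C": [(8000.0, 1.0), (9000.0, 0.3)]}
combined_db = {**manufacturer_db, **supplementary_db}

query_known = [(4002.0, 0.9), (5003.0, 0.6)]
query_unknown = [(8004.0, 0.95), (9001.0, 0.35)]
print(identify(query_known, manufacturer_db)[0])    # species_A
print(identify(query_unknown, manufacturer_db)[0])  # None
print(identify(query_unknown, combined_db)[0])      # species_C
```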
Database citation in full text biomedical articles.

PubMed

Kafkas, Şenay; Kim, Jee-Hyub; McEntyre, Johanna R

2013-01-01

Molecular biology and literature databases represent essential infrastructure for life science research. Effective integration of these data resources requires that there are structured cross-references at the level of individual articles and biological records. Here, we describe the current patterns of how database entries are cited in research articles, based on analysis of the full text Open Access articles available from Europe PMC. Focusing on citation of entries in the European Nucleotide Archive (ENA), UniProt and Protein Data Bank, Europe (PDBe), we demonstrate that text mining doubles the number of structured annotations of database record citations supplied in journal articles by publishers. Many thousands of new literature-database relationships are found by text mining, since these relationships are also not present in the set of articles cited by database records. We recommend that structured annotation of database records in articles is extended to other databases, such as ArrayExpress and Pfam, entries from which are also cited widely in the literature. The very high precision and high throughput of this text-mining pipeline make this activity possible both accurately and at low cost, which will allow the development of new integrated data services.
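A first step in mining database citations from full text is spotting candidate accession numbers. As a minimal sketch (not the Europe PMC pipeline, whose details the abstract does not give), the regular expression below encodes the UniProt accession format published in UniProt's documentation. Pattern matching alone over-generates, so a real pipeline would still need to validate candidates against the database and use surrounding context to reach high precision.

```python
import re

# UniProt accession format (6- or 10-character accessions) as documented by UniProt.
UNIPROT_ACC = re.compile(
    r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
)

def find_uniprot_accessions(text):
    """Candidate UniProt accessions in order of first appearance, without duplicates."""
    seen = []
    for match in UNIPROT_ACC.finditer(text):
        if match.group(0) not in seen:
            seen.append(match.group(0))
    return seen

sentence = "Binding to P69905 was confirmed (see also A0A023GPI8 and P69905) in 2013."
print(find_uniprot_accessions(sentence))  # ['P69905', 'A0A023GPI8']
```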
Database Citation in Full Text Biomedical Articles

PubMed Central

Kafkas, Şenay; Kim, Jee-Hyub; McEntyre, Johanna R.

2013-01-01

Molecular biology and literature databases represent essential infrastructure for life science research. Effective integration of these data resources requires that there are structured cross-references at the level of individual articles and biological records. Here, we describe the current patterns of how database entries are cited in research articles, based on analysis of the full text Open Access articles available from Europe PMC. Focusing on citation of entries in the European Nucleotide Archive (ENA), UniProt and Protein Data Bank, Europe (PDBe), we demonstrate that text mining doubles the number of structured annotations of database record citations supplied in journal articles by publishers. Many thousands of new literature-database relationships are found by text mining, since these relationships are also not present in the set of articles cited by database records. We recommend that structured annotation of database records in articles is extended to other databases, such as ArrayExpress and Pfam, entries from which are also cited widely in the literature. The very high precision and high throughput of this text-mining pipeline make this activity possible both accurately and at low cost, which will allow the development of new integrated data services. PMID:23734176
Computing biological functions using BioΨ, a formal description of biological processes based on elementary bricks of actions

PubMed Central

Pérès, Sabine; Felicori, Liza; Rialle, Stéphanie; Jobard, Elodie; Molina, Franck

2010-01-01

Motivation: In the available databases, biological processes are described from molecular and cellular points of view, but these descriptions are represented as text annotations, which makes them difficult to handle computationally. Consequently, there is a clear need for formal descriptions of biological processes. Results: We present a formalism that uses the BioΨ concepts to model biological processes from molecular details to networks. This computational approach, based on elementary bricks of actions, allows us to compute on biological functions (e.g. process comparison, mapping structure–function relationships, etc.). We illustrate its application with two examples: the functional comparison of proteases and the functional description of the glycolysis network. This computational approach is compatible with detailed biological knowledge and can be applied to different kinds of simulation systems. Availability: www.sysdiag.cnrs.fr/publications/supplementary-materials/BioPsi_Manager/ Contact: sabine.peres@sysdiag.cnrs.fr; franck.molina@sysdiag.cnrs.fr Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20448138
When insect endosymbionts and plant endophytes mediate biological control outcomes

USDA-ARS's Scientific Manuscript database

The identification of endosymbionts and endophytes within insect and plant tissues, respectively, has increased exponentially over the past 10-15 years, enabled largely by the proliferation of sensitive molecular techniques and publicly accessible databases of nucleotide sequences. However, the rate...
The LncRNA Connectivity Map: Using LncRNA Signatures to Connect Small Molecules, LncRNAs, and Diseases.

PubMed

Yang, Haixiu; Shang, Desi; Xu, Yanjun; Zhang, Chunlong; Feng, Li; Sun, Zeguo; Shi, Xinrui; Zhang, Yunpeng; Han, Junwei; Su, Fei; Li, Chunquan; Li, Xia

2017-07-27

Well-characterized connections among diseases, long non-coding RNAs (lncRNAs) and drugs are important for elucidating the key roles of lncRNAs in biological mechanisms across various biological states. In this study, we constructed a database called LNCmap (LncRNA Connectivity Map), available at http://www.bio-bigdata.com/LNCmap/, to establish correlations among diseases, physiological processes, and the action of small-molecule therapeutics by attempting to describe all biological states in terms of lncRNA signatures. By reannotating the microarray data from the Connectivity Map database, LNCmap obtained 237 lncRNA signatures of 5916 instances corresponding to 1262 small-molecule drugs. We provide a user-friendly interface for convenient browsing, retrieval and download of the database, including detailed information on drugs and their associations with the lncRNAs they affect. Additionally, we developed two enrichment analysis methods that let users identify candidate drugs for a particular disease by inputting the corresponding lncRNA expression profiles or an associated lncRNA list, which are then compared with the lncRNA signatures in our database. Overall, LNCmap could significantly improve our understanding of the biological roles of lncRNAs and provide a unique resource for revealing the connections among drugs, lncRNAs and diseases.
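The abstract does not specify the statistics behind LNCmap's two enrichment methods. For the list-based input, one common choice is a hypergeometric overlap test, sketched below as an assumption rather than LNCmap's actual method: for each drug, it asks how surprising the overlap between the user's lncRNA list and that drug's signature is, given a background of lncRNAs. The drug names, lncRNA identifiers and background size are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: lncRNAs in the background; K: lncRNAs in a drug's signature;
    n: lncRNAs in the user's list; k: observed overlap between the two.
    """
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def rank_drugs(user_lncrnas, signatures, background_size):
    """Sort (p-value, drug) pairs, most significant overlap first."""
    user = set(user_lncrnas)
    scored = []
    for drug, signature in signatures.items():
        sig = set(signature)
        k = len(user & sig)
        p = hypergeom_enrichment_p(background_size, len(sig), len(user), k)
        scored.append((p, drug))
    return sorted(scored)

# Hypothetical signatures over a hypothetical 10-lncRNA background.
signatures = {"drug_A": ["L1", "L2", "L3"], "drug_B": ["L7", "L8", "L9"]}
print(rank_drugs(["L1", "L2", "L3"], signatures, background_size=10))
```

With these toy inputs, drug_A's complete overlap gives p = 1/120, while drug_B (no overlap) gives p = 1.0.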
SS-Wrapper: a package of wrapper applications for similarity searches on Linux clusters.

PubMed

Wang, Chunlin; Lefkowitz, Elliot J

2004-10-28

Large-scale sequence comparison is a powerful tool for biological inference in modern molecular biology. Comparing new sequences to those in annotated databases is a useful source of functional and structural information about these sequences. Using software such as the basic local alignment search tool (BLAST) or HMMPFAM to identify statistically significant matches between newly sequenced segments of genetic material and those in databases is an important task for most molecular biologists. Searching algorithms are intrinsically slow and data-intensive, especially in light of the rapid growth of biological sequence databases due to the emergence of high-throughput DNA sequencing techniques. Thus, traditional bioinformatics tools are impractical on PCs and even on dedicated UNIX servers. To take advantage of larger databases and more reliable methods, high-performance computation becomes necessary. We describe the implementation of SS-Wrapper (Similarity Search Wrapper), a package of wrapper applications that can parallelize similarity search applications on a Linux cluster. Our wrapper uses a query segmentation-search (QS-search) approach to parallelize sequence database search applications. It balances the load across the nodes of the cluster to maximize resource usage. QS-search is designed to wrap many different search tools, such as BLAST and HMMPFAM, using the same interface. This implementation does not alter the original program, so newly obtained programs and program updates should be accommodated easily. Benchmark experiments using QS-search to optimize BLAST and HMMPFAM showed that QS-search accelerated these programs almost linearly in proportion to the number of CPUs used. We have also implemented a wrapper that uses a database segmentation approach (DS-BLAST), which provides a complementary solution for BLAST searches when the database is too large to fit into the memory of a single node.
Used together, QS-search and DS-BLAST provide a flexible solution to adapt sequential similarity searching applications in high performance computing environments. Their ease of use and their ability to wrap a variety of database search programs provide an analytical architecture to assist both the seasoned bioinformaticist and the wet-bench biologist.
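The abstract says QS-search splits the query set across nodes with load balancing, but it does not state the balancing rule. The sketch below assumes a greedy longest-first heuristic, with query length standing in for search cost. The query names and lengths are hypothetical, and the function is not SS-Wrapper's code.

```python
import heapq

def partition_queries(queries, n_nodes):
    """Assign (name, length) query sequences to n_nodes work lists.

    Longest queries are placed first, each on the node with the smallest total
    assigned length so far (greedy longest-processing-time heuristic).
    """
    heap = [(0, node) for node in range(n_nodes)]  # (assigned residues, node index)
    heapq.heapify(heap)
    assignment = {node: [] for node in range(n_nodes)}
    for name, length in sorted(queries, key=lambda q: q[1], reverse=True):
        load, node = heapq.heappop(heap)
        assignment[node].append(name)
        heapq.heappush(heap, (load + length, node))
    return assignment

queries = [("q1", 900), ("q2", 700), ("q3", 400), ("q4", 300), ("q5", 200)]
print(partition_queries(queries, n_nodes=2))
# {0: ['q1', 'q4'], 1: ['q2', 'q3', 'q5']}
```

Each node would then run the unmodified search program on its own query subset, which matches the abstract's point that the wrapped program itself is not altered.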
SS-Wrapper: a package of wrapper applications for similarity searches on Linux clusters

PubMed Central

Wang, Chunlin; Lefkowitz, Elliot J

2004-01-01

Background Large-scale sequence comparison is a powerful tool for biological inference in modern molecular biology. Comparing new sequences to those in annotated databases is a useful source of functional and structural information about these sequences. Using software such as the basic local alignment search tool (BLAST) or HMMPFAM to identify statistically significant matches between newly sequenced segments of genetic material and those in databases is an important task for most molecular biologists. Searching algorithms are intrinsically slow and data-intensive, especially in light of the rapid growth of biological sequence databases due to the emergence of high-throughput DNA sequencing techniques. Thus, traditional bioinformatics tools are impractical on PCs and even on dedicated UNIX servers. To take advantage of larger databases and more reliable methods, high-performance computation becomes necessary. Results We describe the implementation of SS-Wrapper (Similarity Search Wrapper), a package of wrapper applications that can parallelize similarity search applications on a Linux cluster. Our wrapper uses a query segmentation-search (QS-search) approach to parallelize sequence database search applications. It balances the load across the nodes of the cluster to maximize resource usage. QS-search is designed to wrap many different search tools, such as BLAST and HMMPFAM, using the same interface. This implementation does not alter the original program, so newly obtained programs and program updates should be accommodated easily. Benchmark experiments using QS-search to optimize BLAST and HMMPFAM showed that QS-search accelerated these programs almost linearly in proportion to the number of CPUs used.
We have also implemented a wrapper that utilizes a database segmentation approach (DS-BLAST) that provides a complementary solution for BLAST searches when the database is too large to fit into the memory of a single node. Conclusions Used together, QS-search and DS-BLAST provide a flexible solution to adapt sequential similarity searching applications in high performance computing environments. Their ease of use and their ability to wrap a variety of database search programs provide an analytical architecture to assist both the seasoned bioinformaticist and the wet-bench biologist. PMID:15511296
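DS-BLAST splits the database rather than the queries, so each node only has to hold one segment in memory. The sketch below assumes a residue budget as a stand-in for per-node memory and shows how per-segment results could be merged; the segment rule, identifiers and scores are hypothetical, not DS-BLAST's code. One point worth keeping in mind: E-values computed against a segment reflect the segment's size, not the whole database's (BLAST+ exposes a `-dbsize` option to set the effective database length). Bit scores do not depend on database size, which is why the sketch merges on bit score.

```python
def segment_database(seqs, max_residues):
    """Split (id, length) database entries into consecutive segments whose total
    length stays within max_residues; an oversized entry gets its own segment."""
    segments, current, size = [], [], 0
    for seq_id, length in seqs:
        if current and size + length > max_residues:
            segments.append(current)
            current, size = [], 0
        current.append(seq_id)
        size += length
    if current:
        segments.append(current)
    return segments

def merge_hits(per_segment_hits, top_n):
    """Merge (subject_id, bitscore) hit lists from all segments, best first."""
    merged = [hit for hits in per_segment_hits for hit in hits]
    merged.sort(key=lambda hit: hit[1], reverse=True)
    return merged[:top_n]

db = [("s1", 400), ("s2", 300), ("s3", 500), ("s4", 200)]
print(segment_database(db, max_residues=700))  # [['s1', 's2'], ['s3', 's4']]
print(merge_hits([[("s1", 50.0), ("s2", 80.5)], [("s4", 120.0)]], top_n=2))
# [('s4', 120.0), ('s2', 80.5)]
```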
Exploring Genetic, Genomic, and Phenotypic Data at the Rat Genome Database

PubMed Central

Laulederkind, Stanley J. F.; Hayman, G. Thomas; Wang, Shur-Jen; Lowry, Timothy F.; Nigam, Rajni; Petri, Victoria; Smith, Jennifer R.; Dwinell, Melinda R.; Jacob, Howard J.; Shimoyama, Mary

2013-01-01

The laboratory rat, Rattus norvegicus, is an important model of human health and disease, and experimental findings in the rat have relevance to human physiology and disease. The Rat Genome Database (RGD, http://rgd.mcw.edu) is a model organism database that provides access to a wide variety of curated rat data including disease associations, phenotypes, pathways, molecular functions, biological processes and cellular components for genes, quantitative trait loci, and strains. We present an overview of the database followed by specific examples that can be used to gain experience in employing RGD to explore the wealth of functional data available for the rat. PMID:23255149
Broad issues to consider for library involvement in bioinformatics*

PubMed Central

Geer, Renata C.

2006-01-01

Background: The information landscape in biological and medical research has grown far beyond literature to include a wide variety of databases generated by research fields such as molecular biology and genomics. The traditional role of libraries to collect, organize, and provide access to information can expand naturally to encompass these new data domains. Methods: This paper discusses the current and potential role of libraries in bioinformatics using empirical evidence and experience from eleven years of work in user services at the National Center for Biotechnology Information. Findings: Medical and science libraries over the last decade have begun to establish educational and support programs to address the challenges users face in the effective and efficient use of a plethora of molecular biology databases and retrieval and analysis tools. As more libraries begin to establish a role in this area, the issues they face include assessment of user needs and skills, identification of existing services, development of plans for new services, recruitment and training of specialized staff, and establishment of collaborations with bioinformatics centers at their institutions. Conclusions: Increasing library involvement in bioinformatics can help address information needs of a broad range of students, researchers, and clinicians and ultimately help realize the power of bioinformatics resources in making new biological discoveries. PMID:16888662
Automated detection of discourse segment and experimental types from the text of cancer pathway results sections.

PubMed

Burns, Gully A P C; Dasigi, Pradeep; de Waard, Anita; Hovy, Eduard H

2016-01-01

Automated machine-reading biocuration systems typically use sentence-by-sentence information extraction to construct meaning representations for use by curators. This does not directly reflect the typical discourse structure that scientists use to construct an argument from the experimental data available within an article, and is therefore less likely to correspond to representations typically used in biomedical informatics systems (let alone to the mental models that scientists have). In this study, we develop Natural Language Processing methods to locate, extract, and classify the individual passages of text from articles' Results sections that refer to experimental data. In our domain of interest (molecular biology studies of cancer signal transduction pathways), individual articles may contain as many as 30 small-scale individual experiments describing a variety of findings, upon which authors base their overall research conclusions. Our system automatically classifies discourse segments in these texts into seven categories (fact, hypothesis, problem, goal, method, result, implication) with an F-score of 0.68. These segments describe the essential building blocks of scientific discourse to (i) provide context for each experiment, (ii) report experimental details and (iii) explain the data's meaning in context. We evaluate our system on text passages from articles that were curated in molecular biology databases (the Pathway Logic Datum repository, the Molecular Interaction MINT and INTACT databases) linking individual experiments in articles to the type of assay used (coprecipitation, phosphorylation, translocation etc.). We use supervised machine learning techniques on text passages containing unambiguous references to experiments to obtain baseline F1 scores of 0.59 for MINT, 0.71 for INTACT and 0.63 for Pathway Logic.
Although preliminary, these results support the notion that targeting information extraction methods to experimental results could provide accurate, automated methods for biocuration. We also suggest the need for finer-grained curation of experimental methods used when constructing molecular biology databases. © The Author(s) 2016. Published by Oxford University Press.
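The abstract reports an F-score of 0.68 over seven discourse categories without saying how per-category scores are combined. The sketch below uses macro-averaging, one common option, to show how such a figure can be computed from gold and predicted labels. The label sequences are hypothetical toy data.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def macro_f1(gold, predicted):
    """Average per-label F1 over every label seen in either sequence."""
    labels = sorted(set(gold) | set(predicted))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, predicted))
        fp = sum(g != label and p == label for g, p in zip(gold, predicted))
        fn = sum(g == label and p != label for g, p in zip(gold, predicted))
        scores.append(precision_recall_f1(tp, fp, fn)[2])
    return sum(scores) / len(scores)

gold = ["fact", "result", "method", "result"]
predicted = ["fact", "method", "method", "result"]
print(round(macro_f1(gold, predicted), 4))  # 0.7778
```

Micro-averaging, which pools the counts across labels before computing F1, can give a different number on the same predictions.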
A Systems Biology Approach Reveals Converging Molecular Mechanisms that Link Different POPs to Common Metabolic Diseases.

PubMed

Ruiz, Patricia; Perlina, Ally; Mumtaz, Moiz; Fowler, Bruce A

2016-07-01

A number of epidemiological studies have identified statistical associations between persistent organic pollutants (POPs) and metabolic diseases, but testable hypotheses regarding underlying molecular mechanisms to explain these linkages have not been published. We assessed the underlying mechanisms of POPs that have been associated with metabolic diseases; three well-known POPs [2,3,7,8-tetrachlorodibenzodioxin (TCDD), 2,2´,4,4´,5,5´-hexachlorobiphenyl (PCB 153), and 4,4´-dichlorodiphenyldichloroethylene (p,p´-DDE)] were studied. We used advanced database search tools to delineate testable hypotheses and to guide laboratory-based research studies into underlying mechanisms by which this POP mixture could produce or exacerbate metabolic diseases. For our searches, we used proprietary systems biology software (MetaCore™/MetaDrug™) to conduct advanced search queries for the underlying interactions database, followed by directional network construction to identify common mechanisms for these POPs within two or fewer interaction steps downstream of their primary targets. These common downstream pathways belong to various cytokine and chemokine families with experimentally well-documented causal associations with type 2 diabetes. Our systems biology approach allowed identification of converging pathways leading to activation of common downstream targets. To our knowledge, this is the first study to propose an integrated global set of step-by-step molecular mechanisms for a combination of three common POPs using a systems biology approach, which may link POP exposure to diseases. Experimental evaluation of the proposed pathways may lead to development of predictive biomarkers of the effects of POPs, which could translate into disease prevention and effective clinical treatment strategies. Ruiz P, Perlina A, Mumtaz M, Fowler BA. 2016. A systems biology approach reveals converging molecular mechanisms that link different POPs to common metabolic diseases. 
Environ Health Perspect 124:1034-1041; http://dx.doi.org/10.1289/ehp.1510308.
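The study used proprietary software (MetaCore™/MetaDrug™) to find mechanisms shared by the three POPs within two or fewer interaction steps downstream of their primary targets. The underlying idea can be sketched with a generic depth-limited breadth-first search on a directed interaction network, followed by intersecting the reachable sets. The network, target names and compound labels below are hypothetical, and this is not the commercial tool's algorithm.

```python
from collections import deque

def downstream_within(graph, start_nodes, max_steps=2):
    """Nodes reachable from any start node in 1..max_steps directed edges (BFS)."""
    visited = set(start_nodes)
    reached = set()
    queue = deque((node, 0) for node in start_nodes)
    while queue:
        node, depth = queue.popleft()
        if depth == max_steps:
            continue
        for nxt in graph.get(node, ()):
            reached.add(nxt)
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, depth + 1))
    return reached

def common_downstream(graph, targets_by_compound, max_steps=2):
    """Nodes lying downstream of the primary targets of every compound."""
    sets = [downstream_within(graph, t, max_steps) for t in targets_by_compound.values()]
    return set.intersection(*sets) if sets else set()

# Hypothetical directed interaction network and primary targets (toy identifiers).
network = {
    "T1": ["A"], "A": ["C1"],
    "T2": ["C1", "B"], "B": ["C2"],
    "T3": ["D"], "D": ["C1"],
}
targets = {"POP_1": ["T1"], "POP_2": ["T2"], "POP_3": ["T3"]}
print(common_downstream(network, targets))  # {'C1'}
```

Tightening the limit to one step (`max_steps=1`) removes the shared node in this toy network, which shows how sensitive such convergence results are to the chosen step limit.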
Database Constraints Applied to Metabolic Pathway Reconstruction Tools

PubMed Central

Vilaplana, Jordi; Solsona, Francesc; Teixido, Ivan; Usié, Anabel; Karathia, Hiren; Alves, Rui; Mateo, Jordi

2014-01-01

Our group developed two biological applications, Biblio-MetReS and Homol-MetReS, that access the same database of organisms with annotated genes. Biblio-MetReS is a data-mining application that facilitates the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (re)annotation of proteomes, to properly identify both the individual proteins involved in the process(es) of interest and their function. It also enables the sets of proteins involved in the process(es) in different organisms to be compared directly. The efficiency of these biological applications is directly related to the design of the shared database. We classified and analyzed the different kinds of access to the database. Based on this study, we sought to adjust and tune the configurable parameters of the database server to achieve the best performance of the data link to and from the database system. Different database technologies were analyzed. We started the study with an open-source relational SQL database, MySQL. Then, the same database was implemented in HBase, a MapReduce-based database. The results indicated that the standard configuration of MySQL gives acceptable performance for small or medium-size databases. Nevertheless, tuning database parameters can greatly improve performance and lead to very competitive runtimes. PMID:25202745
The transcriptome of Lutzomyia longipalpis (Diptera: Psychodidae) male reproductive organs.

PubMed

Azevedo, Renata V D M; Dias, Denise B S; Bretãs, Jorge A C; Mazzoni, Camila J; Souza, Nataly A; Albano, Rodolpho M; Wagner, Glauber; Davila, Alberto M R; Peixoto, Alexandre A

2012-01-01

It has been suggested that genes involved in the reproductive biology of insect disease vectors are potential targets for future alternative methods of control. Little is known about the molecular biology of reproduction in phlebotomine sand flies and there is no information available concerning genes that are expressed in male reproductive organs of Lutzomyia longipalpis, the main vector of American visceral leishmaniasis and a species complex. We generated 2678 high quality ESTs ("Expressed Sequence Tags") of L. longipalpis male reproductive organs that were grouped in 1391 non-redundant sequences (1136 singlets and 255 clusters). BLAST analysis revealed that only 57% of these sequences share similarity with an L. longipalpis female EST database. Although no more than 36% of the non-redundant sequences showed similarity to protein sequences deposited in databases, more than half of them presented the best-match hits with mosquito genes. Gene ontology analysis identified subsets of genes involved in biological processes such as protein biosynthesis and DNA replication, which are probably associated with spermatogenesis. A number of non-redundant sequences were also identified as putative male reproductive gland proteins (mRGPs), also known as male accessory gland protein genes (Acps). The transcriptome analysis of L. longipalpis male reproductive organs is a further step in the study of the molecular basis of the reproductive biology of this important species complex. It has allowed the identification of genes potentially involved in spermatogenesis as well as putative mRGP sequences, which have been studied in many insect species because of their effects on female post-mating behavior and physiology and their potential role in sexual selection and speciation. These data open a number of new avenues for further research in the molecular and evolutionary reproductive biology of sand flies.

The Transcriptome of Lutzomyia longipalpis (Diptera: Psychodidae) Male Reproductive Organs

PubMed Central

Bretãs, Jorge A. C.; Mazzoni, Camila J.; Souza, Nataly A.; Albano, Rodolpho M.; Wagner, Glauber; Davila, Alberto M. R.; Peixoto, Alexandre A.

2012-01-01

Background It has been suggested that genes involved in the reproductive biology of insect disease vectors are potential targets for future alternative methods of control. Little is known about the molecular biology of reproduction in phlebotomine sand flies and there is no information available concerning genes that are expressed in male reproductive organs of Lutzomyia longipalpis, the main vector of American visceral leishmaniasis and a species complex. Methods/Principal Findings We generated 2678 high quality ESTs (“Expressed Sequence Tags”) of L. longipalpis male reproductive organs that were grouped in 1391 non-redundant sequences (1136 singlets and 255 clusters). BLAST analysis revealed that only 57% of these sequences share similarity with an L. longipalpis female EST database. Although no more than 36% of the non-redundant sequences showed similarity to protein sequences deposited in databases, more than half of them presented the best-match hits with mosquito genes. Gene ontology analysis identified subsets of genes involved in biological processes such as protein biosynthesis and DNA replication, which are probably associated with spermatogenesis. A number of non-redundant sequences were also identified as putative male reproductive gland proteins (mRGPs), also known as male accessory gland protein genes (Acps). Conclusions The transcriptome analysis of L. longipalpis male reproductive organs is a further step in the study of the molecular basis of the reproductive biology of this important species complex. It has allowed the identification of genes potentially involved in spermatogenesis as well as putative mRGP sequences, which have been studied in many insect species because of their effects on female post-mating behavior and physiology and their potential role in sexual selection and speciation. These data open a number of new avenues for further research in the molecular and evolutionary reproductive biology of sand flies. PMID:22496818
Learning about Intermolecular Interactions from the Cambridge Structural Database

ERIC Educational Resources Information Center

Battle, Gary M.; Allen, Frank H.

2012-01-01

A clear understanding and appreciation of noncovalent interactions, especially hydrogen bonding, are vitally important to students of chemistry and the life sciences, including biochemistry, molecular biology, pharmacology, and medicine. The opportunities afforded by the IsoStar knowledge base of intermolecular interactions to enhance the…
[The thirty years of Acta Genetica Sinica].

PubMed

Li, Shao-Wu; Zhou, Su; Xue, Yong-Biao; Zhu, Li-Huang

2003-04-01

Acta Genetica Sinica (AGS) is sponsored by the Genetics Society of China and the Institute of Genetics and Developmental Biology of the Chinese Academy of Sciences, and is published by Science Press. The journal is a leading national academic periodical and one of the key Chinese periodicals in the natural sciences. AGS is currently indexed by several well-known domestic and international indexing systems, such as Chemical Abstracts (CA), the BIOSIS database, Biological Abstracts (BA), Index Medicus and the Russian abstracts journal (Referativnyi Zhurnal). Papers in the areas of genetics, developmental biology, cell and molecular biology, and evolution are regularly published by AGS.
Relax with CouchDB - Into the non-relational DBMS era of Bioinformatics

PubMed Central

Manyam, Ganiraju; Payton, Michelle A.; Roth, Jack A.; Abruzzo, Lynne V.; Coombes, Kevin R.

2012-01-01

With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services. PMID:22609849
Bioinformatics in Undergraduate Education: Practical Examples

ERIC Educational Resources Information Center

Boyle, John A.

2004-01-01

Bioinformatics has emerged as an important research tool in recent years. The ability to mine large databases for relevant information has become increasingly central to many different aspects of biochemistry and molecular biology. It is important that undergraduates be introduced to the available information and methodologies. We present a…
TIPdb-3D: the three-dimensional structure database of phytochemicals from Taiwan indigenous plants

PubMed Central

Tung, Chun-Wei; Lin, Ying-Chi; Chang, Hsun-Shuo; Wang, Chia-Chi; Chen, Ih-Sheng; Jheng, Jhao-Liang; Li, Jih-Heng

2014-01-01

The rich indigenous and endemic plants in Taiwan serve as a resourceful bank for biologically active phytochemicals. Based on our TIPdb database curating bioactive phytochemicals from Taiwan indigenous plants, this study presents a three-dimensional (3D) chemical structure database named TIPdb-3D to support the discovery of novel pharmacologically active compounds. The Merck Molecular Force Field (MMFF94) was used to generate 3D structures of phytochemicals in TIPdb. The 3D structures could facilitate the analysis of 3D quantitative structure–activity relationship, the exploration of chemical space and the identification of potential pharmacologically active compounds using protein–ligand docking. Database URL: http://cwtung.kmu.edu.tw/tipdb. PMID:24930145
Omics databases on kidney disease: where they can be found and how to benefit from them.

PubMed

Papadopoulos, Theofilos; Krochmal, Magdalena; Cisek, Katryna; Fernandes, Marco; Husi, Holger; Stevens, Robert; Bascands, Jean-Loup; Schanstra, Joost P; Klein, Julie

2016-06-01

In recent decades, the evolution of omics technologies has led to advances in all biological fields, creating a demand for effective storage, management and exchange of rapidly generated data and research discoveries. To address this need, the development of databases of experimental outputs has become a common part of scientific practice; such databases serve as knowledge sources and data-sharing platforms, providing information about genes, transcripts, proteins or metabolites. In this review, we present currently available omics databases, with a special focus on their application in kidney research and possibly in clinical practice. Databases are divided into two categories: general databases with a broad information scope and kidney-specific databases distinctively concentrated on kidney pathologies. In research, databases can be used as a rich source of information about pathophysiological mechanisms and molecular targets. In the future, databases will support clinicians with their decisions, providing better and faster diagnoses and setting the direction towards more preventive, personalized medicine. We also provide a test case demonstrating the potential of biological databases in comparing multi-omics datasets and generating new hypotheses to answer a critical and common diagnostic problem in nephrology practice. In the future, employment of databases combined with data integration and data mining should provide powerful insights into unlocking the mysteries of kidney disease, leading to a potential impact on pharmacological intervention and therapeutic disease management.
High-throughput Crystallography for Structural Genomics

PubMed Central

Joachimiak, Andrzej

2009-01-01

Protein X-ray crystallography recently celebrated its 50th anniversary. The structures of myoglobin and hemoglobin determined by Kendrew and Perutz provided the first glimpses into the complex architecture and chemistry of proteins. Since then, the field of structural molecular biology has made extraordinary progress, and over 53,000 protein structures have now been deposited in the Protein Data Bank. In the past decade, many advances in macromolecular crystallography have been driven by worldwide structural genomics efforts. These were made possible by third-generation synchrotron sources, structure phasing approaches using anomalous signal, and cryo-crystallography. Complementary progress in molecular biology, proteomics, hardware and software for crystallographic data collection, structure determination and refinement, computer science, databases, robotics and automation improved and accelerated many processes. These advances provide a robust foundation for structural molecular biology and ensure its strong contribution to science in the future. In this report we focus mainly on reviewing high-throughput X-ray crystallography technologies for structural genomics and their impact. PMID:19765976
Transcriptome Analysis of the Octopus vulgaris Central Nervous System

PubMed Central

Zhang, Xiang; Mao, Yong; Huang, Zixia; Qu, Meng; Chen, Jun; Ding, Shaoxiong; Hong, Jingni; Sun, Tiantian

2012-01-01

Background Cephalopoda are a class of Mollusca found in all the world's oceans. They are important model organisms in neurobiology. Unfortunately, the lack of neuronal molecular sequences, such as ESTs, transcriptomic or genomic information, has limited the development of molecular neurobiology research in this unique model organism. Results Using high-throughput Illumina Solexa sequencing technology, we generated 59,859 high-quality sequences from 12,918,391 paired-end reads. Using BLASTx/BLASTn, 12,227 contigs had BLAST hits in the Swiss-Prot, NR protein and NT nucleotide databases with an E-value cutoff of 1e−5. Comparing the Octopus vulgaris central nervous system (CNS) library with the Aplysia californica/Lymnaea stagnalis CNS EST libraries yielded 5.93%/13.45% of O. vulgaris sequences with significant matches (1e−5) using BLASTn/tBLASTx. Meanwhile, the hit percentage of the recently published Schistocerca gregaria, Tilapia or Hirudo medicinalis CNS libraries against the O. vulgaris CNS library ranged from 21.03% to 46.19%. We constructed a phylogenetic tree using two genes related to CNS function, Synaptotagmin-7 and Synaptophysin. Lastly, we showed, based on bioinformatic analysis, that O. vulgaris may have a vertebrate-like blood-brain barrier. Conclusion This study provides a large body of molecular information that will contribute to further molecular biology research on O. vulgaris. By presenting the first CNS transcriptome analysis of O. vulgaris, we hope to accelerate the study of functional molecular neurobiology and comparative evolutionary biology. PMID:22768275
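The E-value filtering and hit-percentage figures above can be illustrated with a short Python sketch. This is not the authors' pipeline: it assumes BLAST+ tabular output (-outfmt 6, default 12 columns), counts a query as matched if any of its hits has an E-value at or below the cutoff, and the function names are hypothetical.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def significant_queries(lines, evalue_cutoff=1e-5):
    """Return the IDs of queries with at least one hit at or below the cutoff."""
    hits = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers (-outfmt 7)
        row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        if float(row["evalue"]) <= evalue_cutoff:
            hits.add(row["qseqid"])
    return hits


def percent_with_hits(lines, total_queries, evalue_cutoff=1e-5):
    """Percentage of all query sequences (including those without hits) that matched."""
    return 100.0 * len(significant_queries(lines, evalue_cutoff)) / total_queries
```

The total number of queries is passed in separately because sequences with no hits do not appear in tabular output at all, so it cannot be recovered from the hit lines alone.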
The National DNA Data Bank of Canada: a Quebecer perspective

PubMed Central

Milot, Emmanuel; Lecomte, Marie M. J.; Germain, Hugo; Crispino, Frank

2013-01-01

The Canadian National DNA Database was created in 1998 and first used in the mid-2000s. Managed by the RCMP, the National DNA Data Bank of Canada reports satisfactory statistics on its use and efficiency each year. Built on two indexes (convicted offenders and crime scene indexes), the database not only provides increasing numbers of matches to offenders or linked traces to the various police forces of the nation, but also serves as a memory repository for cold cases. Despite these achievements, the data bank now faces new challenges that will inevitably change the way the database is currently used. These arise from the increasing sensitivity of DNA trace detection, the diversity of demands from police investigators and the growth of the bank itself. New requirements for the database now include familial searches, low-copy-number analyses and the correct interpretation of mixed samples. This paper describes the original approach taken in Québec to address some of these challenges. Analytical and technological advances will also inevitably lead to the introduction of new technologies in forensic laboratories, such as single-cell sequencing, phenotyping, and proteomics. These will require not only a new holistic approach to forensic molecular biology (through academia and a more investigative role for the laboratory), but also new legal developments. Far from being exhaustive, this paper highlights some of the current uses of the database, its potential for the future, and opportunities to expand as a result of recent technological developments in molecular biology, including, but not limited to, DNA identification. PMID:24312124
The National DNA Data Bank of Canada: a Quebecer perspective.

PubMed

Milot, Emmanuel; Lecomte, Marie M J; Germain, Hugo; Crispino, Frank

2013-11-20

The Canadian National DNA Database was created in 1998 and first used in the mid-2000s. Managed by the RCMP, the National DNA Data Bank of Canada reports satisfactory statistics on its use and efficiency each year. Built on two indexes (convicted offenders and crime scene indexes), the database not only provides increasing numbers of matches to offenders or linked traces to the various police forces of the nation, but also serves as a memory repository for cold cases. Despite these achievements, the data bank now faces new challenges that will inevitably change the way the database is currently used. These arise from the increasing sensitivity of DNA trace detection, the diversity of demands from police investigators and the growth of the bank itself. New requirements for the database now include familial searches, low-copy-number analyses and the correct interpretation of mixed samples. This paper describes the original approach taken in Québec to address some of these challenges. Analytical and technological advances will also inevitably lead to the introduction of new technologies in forensic laboratories, such as single-cell sequencing, phenotyping, and proteomics. These will require not only a new holistic approach to forensic molecular biology (through academia and a more investigative role for the laboratory), but also new legal developments. Far from being exhaustive, this paper highlights some of the current uses of the database, its potential for the future, and opportunities to expand as a result of recent technological developments in molecular biology, including, but not limited to, DNA identification.
A unique large-scale undergraduate research experience in molecular systems biology for non-mathematics majors.

PubMed

Kappler, Ulrike; Rowland, Susan L; Pedwell, Rhianna K

2017-05-01

Systems biology is frequently taught with an emphasis on mathematical modeling approaches. This focus effectively excludes most biology, biochemistry, and molecular biology students, who are not mathematics majors. The mathematical focus can also present a misleading picture of systems biology, which is a multi-disciplinary pursuit requiring collaboration between biochemists, bioinformaticians, and mathematicians. This article describes an authentic large-scale undergraduate research experience (ALURE) in systems biology that incorporates proteomics, bacterial genomics, and bioinformatics in a single exercise. This project is designed to engage students who have a basic grounding in protein chemistry and metabolism but no mathematical modeling skills. The pedagogy around the research experience is designed to help students tackle complex datasets and use their emerging metabolic knowledge to extract meaning from large amounts of raw data. On completing the ALURE, participants reported a significant increase in their confidence in analyzing large datasets, while the majority of the cohort reported good or great gains in a variety of skills, including "analysing data for patterns" and "conducting database or internet searches." An environmental scan shows that this ALURE is the only undergraduate-level systems-biology research project offered on a large scale in Australia; this speaks to the perceived difficulty of implementing such an opportunity for students. We argue, however, that based on the student feedback, allowing undergraduate students to complete a systems-biology project is both feasible and desirable, even if the students are not maths and computing majors. © 2016 by The International Union of Biochemistry and Molecular Biology, 45(3):235-248, 2017.
R.E.DD.B.: A database for RESP and ESP atomic charges, and force field libraries

PubMed Central

Dupradeau, François-Yves; Cézard, Christine; Lelong, Rodolphe; Stanislawiak, Élodie; Pêcher, Julien; Delepine, Jean Charles; Cieplak, Piotr

2008-01-01

The web-based RESP ESP charge DataBase (R.E.DD.B., http://q4md-forcefieldtools.org/REDDB) is a new, free source of RESP and ESP atomic charge values and force field libraries for model systems and/or small molecules. R.E.DD.B. stores highly effective and reproducible charge values and molecular structures in the Tripos mol2 file format, together with information about the charge derivation procedure and scripts to integrate the charges and molecular topology into the most common molecular dynamics packages. Moreover, R.E.DD.B. allows users to freely store and distribute RESP or ESP charges and force field libraries to the scientific community via a web interface. The first version of R.E.DD.B., released in January 2006, contains force field libraries for molecules as well as molecular fragments for standard residues and their analogs (amino acids, monosaccharides, nucleotides and ligands), hence covering a wide range of biological applications. PMID:17962302
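Because R.E.DD.B. distributes charges as Tripos mol2 files, a quick sanity check on a downloaded file is to read the partial charges and confirm they sum to the molecule's formal charge. The minimal Python sketch below assumes the standard column order of @<TRIPOS>ATOM records (partial charge in the ninth field); the function names are hypothetical and the water-like charges in SAMPLE_MOL2 are illustrative TIP3P-style values, not entries from R.E.DD.B.

```python
# Illustrative three-atom mol2 record (TIP3P-style charges, not R.E.DD.B. data).
SAMPLE_MOL2 = """@<TRIPOS>MOLECULE
WAT
 3 2 1 0 0
SMALL
USER_CHARGES

@<TRIPOS>ATOM
      1 O1     0.0000  0.0000  0.0000 O.3   1 WAT  -0.8340
      2 H1     0.9572  0.0000  0.0000 H     1 WAT   0.4170
      3 H2    -0.2400  0.9266  0.0000 H     1 WAT   0.4170
@<TRIPOS>BOND
     1     1     2 1
     2     1     3 1
"""


def read_mol2_charges(text):
    """Return (atom_name, partial_charge) pairs from the @<TRIPOS>ATOM section."""
    charges = []
    in_atoms = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            # Only parse records while inside the ATOM section.
            in_atoms = stripped == "@<TRIPOS>ATOM"
            continue
        if in_atoms and stripped:
            # atom_id atom_name x y z atom_type subst_id subst_name charge [status]
            fields = stripped.split()
            charges.append((fields[1], float(fields[8])))
    return charges


def net_charge(charges, ndigits=4):
    """Sum of partial charges, rounded to absorb floating-point noise."""
    return round(sum(q for _, q in charges), ndigits)
```

For a neutral molecule the rounded total should be zero, and for an ion it should equal the formal charge; a non-integer total suggests something went wrong, for example charges rounded too aggressively when a file was edited by hand.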
Perspectives in biological physics: the nDDB project for a neutron Dynamics Data Bank for biological macromolecules.

PubMed

Rusevich, Leonid; García Sakai, Victoria; Franzetti, Bruno; Johnson, Mark; Natali, Francesca; Pellegrini, Eric; Peters, Judith; Pieper, Jörg; Weik, Martin; Zaccai, Giuseppe

2013-07-01

Neutron spectroscopy provides experimental data on time-dependent trajectories, which can be directly compared to molecular dynamics simulations. Its importance in helping us to understand biological macromolecules at a molecular level is demonstrated by the results of a literature survey over the last two to three decades. Around 300 articles in refereed journals relate to neutron scattering studies of biological macromolecular dynamics, and the results of the survey are presented here. The scope of the publications ranges from the general physics of protein and solvent dynamics, to the biologically relevant dynamics-function relationships in live cells. As a result of the survey we are currently setting up a neutron Dynamics Data Bank (nDDB) with the aim to make the neutron data on biological systems widely available. This will benefit, in particular, the MD simulation community to validate and improve their force fields. The aim of the database is to expose and give easy access to a body of experimental data to the scientific community. The database will be populated with as much of the existing data as possible. In the future it will give value, as part of a bigger whole, to high throughput data, as well as more detailed studies. A range and volume of experimental data will be of interest in determining how quantitatively MD simulations can reproduce trends across a range of systems and to what extent such trends may depend on sample preparation and data reduction and analysis methods. In this context, we strongly encourage researchers in the field to deposit their data in the nDDB.
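One common point of contact between neutron data and MD simulations is the atomic mean-square displacement (MSD), which elastic incoherent neutron scattering probes and which can be computed directly from a simulated trajectory. The abstract does not specify which quantities the nDDB will store, so the Python sketch below is a generic illustration under that assumption: it computes the MSD of a single atom at one time lag, averaged over all time origins, and the function name is hypothetical.

```python
def mean_square_displacement(traj, lag):
    """MSD of one atom at a lag of `lag` frames, averaged over all time origins.

    traj is a list of (x, y, z) positions, one per frame, in consistent units,
    assumed to be unwrapped (no jumps across periodic boundaries).
    """
    if not 0 < lag < len(traj):
        raise ValueError("lag must be between 1 and len(traj) - 1")
    n_origins = len(traj) - lag
    total = 0.0
    for t in range(n_origins):
        (x0, y0, z0), (x1, y1, z1) = traj[t], traj[t + lag]
        total += (x1 - x0) ** 2 + (y1 - y0) ** 2 + (z1 - z0) ** 2
    return total / n_origins
```

A real comparison would average over many atoms (weighted by their scattering cross-sections) and match the lag to the instrument's energy resolution; those steps are omitted here for brevity.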
The Mouse Tumor Biology Database: A Comprehensive Resource for Mouse Models of Human Cancer.

PubMed

Krupke, Debra M; Begley, Dale A; Sundberg, John P; Richardson, Joel E; Neuhauser, Steven B; Bult, Carol J

2017-11-01

Research using laboratory mice has led to fundamental insights into the molecular genetic processes that govern cancer initiation, progression, and treatment response. Although thousands of scientific articles have been published about mouse models of human cancer, collating information and data for a specific model is hampered by the fact that many authors do not adhere to existing annotation standards when describing models. The interpretation of experimental results in mouse models can also be confounded when researchers do not factor in the effect of genetic background on tumor biology. The Mouse Tumor Biology (MTB) database is an expertly curated, comprehensive compendium of mouse models of human cancer. Through the enforcement of nomenclature and related annotation standards, MTB supports aggregation of data about a cancer model from diverse sources and assessment of how genetic background of a mouse strain influences the biological properties of a specific tumor type and model utility. Cancer Res; 77(21); e67-70. ©2017 AACR.
The BioExtract Server: a web-based bioinformatic workflow platform

PubMed Central

Lushbough, Carol M.; Jennewein, Douglas M.; Brendel, Volker P.

2011-01-01

The BioExtract Server (bioextract.org) is an open, web-based system designed to aid researchers in the analysis of genomic data by providing a platform for the creation of bioinformatic workflows. Scientific workflows are created within the system by recording tasks performed by the user. These tasks may include querying multiple, distributed data sources, saving query results as searchable data extracts, and executing local and web-accessible analytic tools. The series of recorded tasks can then be saved as a reproducible, sharable workflow available for subsequent execution with the original or modified inputs and parameter settings. Integrated data resources include interfaces to the National Center for Biotechnology Information (NCBI) nucleotide and protein databases, the European Molecular Biology Laboratory (EMBL-Bank) non-redundant nucleotide database, the Universal Protein Resource (UniProt), and the UniProt Reference Clusters (UniRef) database. The system offers access to numerous preinstalled, curated analytic tools and also provides researchers with the option of selecting computational tools from a large list of web services including the European Molecular Biology Open Software Suite (EMBOSS), BioMoby, and the Kyoto Encyclopedia of Genes and Genomes (KEGG). The system further allows users to integrate local command line tools residing on their own computers through a client-side Java applet. PMID:21546552
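BioExtract builds workflows by recording the tasks a user performs and then re-executing them with original or modified inputs and parameters. The abstract does not describe how the server represents a recorded workflow internally, so the Python sketch below only illustrates the general record-and-replay idea; WorkflowRecorder and the two toy tasks are hypothetical.

```python
class WorkflowRecorder:
    """Hypothetical sketch: record each task with its parameters, then replay the chain."""

    def __init__(self):
        self.steps = []  # list of (name, function, keyword parameters)

    def run(self, name, func, data, **params):
        """Execute one task now and remember it for later replay."""
        self.steps.append((name, func, params))
        return func(data, **params)

    def replay(self, data, overrides=None):
        """Re-run every recorded task in order on new input, optionally changing parameters."""
        overrides = overrides or {}
        for name, func, params in self.steps:
            data = func(data, **{**params, **overrides.get(name, {})})
        return data


def keep_min_length(seqs, min_len):
    """Toy task: drop sequences shorter than min_len."""
    return [s for s in seqs if len(s) >= min_len]


def to_upper(seqs):
    """Toy task: normalize sequences to upper case."""
    return [s.upper() for s in seqs]
```

Recording a step as a function plus keyword parameters is what makes the replay both reproducible (same parameters by default) and adjustable (per-step overrides keyed by task name).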
Text mining for the biocuration workflow

PubMed Central

Hirschman, Lynette; Burns, Gully A. P. C; Krallinger, Martin; Arighi, Cecilia; Cohen, K. Bretonnel; Valencia, Alfonso; Wu, Cathy H.; Chatr-Aryamontri, Andrew; Dowell, Karen G.; Huala, Eva; Lourenço, Anália; Nash, Robert; Veuthey, Anne-Lise; Wiegers, Thomas; Winter, Andrew G.

2012-01-01

Molecular biology has become heavily dependent on biological knowledge encoded in expert-curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-)automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes in improving biocuration throughput using text mining. Our initial investigations took place for the workshop on ‘Text Mining for the BioCuration Workflow’ at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed considerable variability. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provides a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community. PMID:22513129
Text mining for the biocuration workflow.

PubMed

Hirschman, Lynette; Burns, Gully A P C; Krallinger, Martin; Arighi, Cecilia; Cohen, K Bretonnel; Valencia, Alfonso; Wu, Cathy H; Chatr-Aryamontri, Andrew; Dowell, Karen G; Huala, Eva; Lourenço, Anália; Nash, Robert; Veuthey, Anne-Lise; Wiegers, Thomas; Winter, Andrew G

2012-01-01

Molecular biology has become heavily dependent on biological knowledge encoded in expert-curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-)automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes in improving biocuration throughput using text mining. Our initial investigations took place for the workshop on 'Text Mining for the BioCuration Workflow' at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed considerable variability. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provides a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community.
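Step (i) of the curation workflow described above, selecting documents for curation, is one of the priorities the surveyed biocurators named. The deliberately simple Python sketch below ranks documents by how many distinct genes from a curator-supplied lexicon they mention. Exact dictionary matching is an assumption made for brevity; production text-mining systems typically use trained named-entity recognition and gene normalization instead. The function name and example data are hypothetical.

```python
import re


def triage(documents, gene_lexicon, min_mentions=1):
    """Rank documents by the number of distinct lexicon genes they mention.

    documents: dict of document ID -> text; gene_lexicon: set of gene symbols.
    Matching is exact and case-sensitive on word-like tokens.
    """
    scored = []
    for doc_id, text in documents.items():
        tokens = set(re.findall(r"[A-Za-z0-9-]+", text))
        genes = sorted(tokens & gene_lexicon)
        if len(genes) >= min_mentions:
            scored.append((doc_id, genes))
    # Most gene-rich documents first; ties broken by document ID for stable output.
    scored.sort(key=lambda item: (-len(item[1]), item[0]))
    return scored


abstracts = {
    "doc1": "TP53 binds MDM2 in hepatocytes.",
    "doc2": "A survey of curation practice.",
    "doc3": "BRCA1 expression was measured.",
}
queue = triage(abstracts, {"TP53", "MDM2", "BRCA1"})
```

The returned queue lists each selected document with the genes that triggered its selection, which also gives curators a head start on step (ii), entity indexing.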
WormQTLHD—a web database for linking human disease to natural variation data in C. elegans

PubMed Central

van der Velde, K. Joeri; de Haan, Mark; Zych, Konrad; Arends, Danny; Snoek, L. Basten; Kammenga, Jan E.; Jansen, Ritsert C.; Swertz, Morris A.; Li, Yang

2014-01-01

Interactions between proteins are highly conserved across species. As a result, the molecular basis of multiple diseases affecting humans can be studied in model organisms that offer many alternative experimental opportunities. One such organism, Caenorhabditis elegans, has been used to produce a wealth of molecular quantitative genetics and systems biology data over the past decade. We present WormQTLHD (Human Disease), a database that quantitatively and systematically links expression Quantitative Trait Loci (eQTL) findings in C. elegans to gene–disease associations in humans. WormQTLHD, available online at http://www.wormqtl-hd.org, is a user-friendly set of tools to reveal functionally coherent, evolutionarily conserved gene networks. These can be used to predict novel gene-to-gene associations and the functions of genes underlying the disease of interest. We created a new database that links C. elegans eQTL data sets to human diseases (34 337 gene–disease associations from OMIM, DGA, GWAS Central and NHGRI GWAS Catalogue) based on overlapping sets of orthologous genes associated with phenotypes in these two species. We utilized QTL results, high-throughput molecular phenotypes, classical phenotypes and genotype data covering different developmental stages and environments from the WormQTL database. All software is available as open source, built on MOLGENIS and xQTL workbench. PMID:24217915
WormQTLHD--a web database for linking human disease to natural variation data in C. elegans.

PubMed

van der Velde, K Joeri; de Haan, Mark; Zych, Konrad; Arends, Danny; Snoek, L Basten; Kammenga, Jan E; Jansen, Ritsert C; Swertz, Morris A; Li, Yang

2014-01-01

Interactions between proteins are highly conserved across species. As a result, the molecular basis of multiple diseases affecting humans can be studied in model organisms that offer many alternative experimental opportunities. One such organism, Caenorhabditis elegans, has been used to produce a wealth of molecular quantitative genetics and systems biology data over the past decade. We present WormQTLHD (Human Disease), a database that quantitatively and systematically links expression Quantitative Trait Loci (eQTL) findings in C. elegans to gene-disease associations in humans. WormQTLHD, available online at http://www.wormqtl-hd.org, is a user-friendly set of tools to reveal functionally coherent, evolutionarily conserved gene networks. These can be used to predict novel gene-to-gene associations and the functions of genes underlying the disease of interest. We created a new database that links C. elegans eQTL data sets to human diseases (34 337 gene-disease associations from OMIM, DGA, GWAS Central and NHGRI GWAS Catalogue) based on overlapping sets of orthologous genes associated with phenotypes in these two species. We utilized QTL results, high-throughput molecular phenotypes, classical phenotypes and genotype data covering different developmental stages and environments from the WormQTL database. All software is available as open source, built on MOLGENIS and xQTL workbench.
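WormQTLHD links worm eQTL data to human diseases through overlapping sets of orthologous genes. The Python sketch below illustrates that idea under stated assumptions: ortholog mappings are supplied as a plain dictionary, and the hypergeometric tail probability is added here as one common way to judge whether an overlap is larger than chance. The abstract does not say whether or how WormQTLHD scores overlaps statistically, and all gene identifiers are placeholders.

```python
from math import comb


def disease_links(eqtl_genes, orthologs, disease_genes):
    """Map worm eQTL genes to human orthologs, then intersect with each disease gene set."""
    human = {h for w in eqtl_genes for h in orthologs.get(w, ())}
    links = {}
    for disease, genes in disease_genes.items():
        shared = human & set(genes)
        if shared:
            links[disease] = sorted(shared)
    return links


def hypergeom_sf(k, N, K, n):
    """P(X >= k): chance of drawing at least k disease genes when picking n of N genes,
    K of which are disease genes (sampling without replacement)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)
```

In practice N would be the number of genes with a usable worm-human ortholog pair, not the whole genome, since genes without orthologs can never contribute to an overlap.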

OptoBase: A web platform for molecular optogenetics.

PubMed

Kolar, Katja; Knobloch, Christian; Stork, Hendrik; Žnidarič, Matej; Weber, Wilfried

2018-06-18

OptoBase is an online platform for molecular optogenetics. At its core is a hand-annotated, ontology-supported database that aims to cover all existing optogenetic switches and publications, complemented by a collection of convenient optogenetics-related web tools. OptoBase is intended both for expert optogeneticists, helping them keep track of the field, and for any researcher who sees optogenetics as a powerful tool for addressing their biological questions. It is available at https://www.optobase.org. This work also presents an OptoBase-based analysis of trends in molecular optogenetics.
TIPdb-3D: the three-dimensional structure database of phytochemicals from Taiwan indigenous plants.

PubMed

Tung, Chun-Wei; Lin, Ying-Chi; Chang, Hsun-Shuo; Wang, Chia-Chi; Chen, Ih-Sheng; Jheng, Jhao-Liang; Li, Jih-Heng

2014-01-01

The rich indigenous and endemic plants in Taiwan serve as a resourceful bank for biologically active phytochemicals. Based on our TIPdb database curating bioactive phytochemicals from Taiwan indigenous plants, this study presents a three-dimensional (3D) chemical structure database named TIPdb-3D to support the discovery of novel pharmacologically active compounds. The Merck Molecular Force Field (MMFF94) was used to generate 3D structures of phytochemicals in TIPdb. The 3D structures could facilitate the analysis of 3D quantitative structure-activity relationship, the exploration of chemical space and the identification of potential pharmacologically active compounds using protein-ligand docking. Database URL: http://cwtung.kmu.edu.tw/tipdb. © The Author(s) 2014. Published by Oxford University Press.
Cell Line Data Base: structure and recent improvements towards molecular authentication of human cell lines

PubMed Central

Romano, Paolo; Manniello, Assunta; Aresu, Ottavia; Armento, Massimiliano; Cesaro, Michela; Parodi, Barbara

2009-01-01

The Cell Line Data Base (CLDB) is a well-known reference information source on human and animal cell lines, including information on more than 6000 cell lines. Main biological features are coded according to controlled vocabularies derived from international lists and taxonomies. HyperCLDB (http://bioinformatics.istge.it/hypercldb/) is a hypertext version of CLDB that improves data accessibility by also allowing information retrieval through web spiders. Access to HyperCLDB is provided through indexes of biological characteristics, and navigation in the hypertext is supported by many internal links. HyperCLDB also includes links to external resources. Recently, interest has arisen in a reference nomenclature for cell lines, and CLDB was seen as an authoritative system. Furthermore, to overcome the cell line misidentification problem, molecular authentication methods such as fingerprinting, single-locus short tandem repeat (STR) profiling and single nucleotide polymorphism (SNP) validation have been proposed. Since these data are distributed, a reference portal on authentication of human cell lines is needed. We present here the architecture and contents of CLDB, its recent enhancements and perspectives. We also present a new related database, the Cell Line Integrated Molecular Authentication (CLIMA) database (http://bioinformatics.istge.it/clima/), which links authentication data to actual cell lines. PMID:18927105
Cell Line Data Base: structure and recent improvements towards molecular authentication of human cell lines.

PubMed

Romano, Paolo; Manniello, Assunta; Aresu, Ottavia; Armento, Massimiliano; Cesaro, Michela; Parodi, Barbara

2009-01-01

The Cell Line Data Base (CLDB) is a well-known reference information source on human and animal cell lines, including information on more than 6000 cell lines. Main biological features are coded according to controlled vocabularies derived from international lists and taxonomies. HyperCLDB (http://bioinformatics.istge.it/hypercldb/) is a hypertext version of CLDB that improves data accessibility by also allowing information retrieval through web spiders. Access to HyperCLDB is provided through indexes of biological characteristics, and navigation in the hypertext is supported by many internal links. HyperCLDB also includes links to external resources. Recently, interest has arisen in a reference nomenclature for cell lines, and CLDB was seen as an authoritative system. Furthermore, to overcome the cell line misidentification problem, molecular authentication methods such as fingerprinting, single-locus short tandem repeat (STR) profiling and single nucleotide polymorphism (SNP) validation have been proposed. Since these data are distributed, a reference portal on authentication of human cell lines is needed. We present here the architecture and contents of CLDB, its recent enhancements and perspectives. We also present a new related database, the Cell Line Integrated Molecular Authentication (CLIMA) database (http://bioinformatics.istge.it/clima/), which links authentication data to actual cell lines.
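STR profiles of the kind CLIMA links to cell lines are usually compared with a percent-match score. The abstract does not prescribe a matching rule, so the Python sketch below assumes the widely used Tanabe score, 2 × shared alleles / (alleles in query + alleles in reference) × 100, with a score of roughly 80% or more commonly treated as indicating related lines; check that threshold against current authentication guidelines. The function name and the allele calls in the example are made up for illustration.

```python
def tanabe_percent_match(query, reference):
    """Tanabe score: 2 * shared alleles / (query alleles + reference alleles) * 100.

    Profiles map STR locus name -> iterable of allele calls. Only loci typed in
    both profiles are compared; a homozygous locus contributes a single allele.
    """
    loci = query.keys() & reference.keys()
    shared = sum(len(set(query[locus]) & set(reference[locus])) for locus in loci)
    total = sum(len(set(query[locus])) + len(set(reference[locus])) for locus in loci)
    return 100.0 * 2 * shared / total if total else 0.0


# Made-up profiles; vWA is typed only in the reference and is therefore ignored.
query = {"TH01": {"6", "9.3"}, "D5S818": {"11", "12"}, "TPOX": {"8"}}
reference = {"TH01": {"6", "9.3"}, "D5S818": {"11", "13"}, "TPOX": {"8"},
             "vWA": {"16", "17"}}
score = tanabe_percent_match(query, reference)
```

Here 4 alleles are shared out of 5 + 5 typed on the common loci, giving a score of 80%, right at the commonly cited boundary.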
Molecular nutrition research: the modern way of performing nutritional science.

PubMed

Norheim, Frode; Gjelstad, Ingrid Merethe Fange; Hjorth, Marit; Vinknes, Kathrine J; Langleite, Torgrim M; Holen, Torgeir; Jensen, Jørgen; Dalen, Knut Tomas; Karlsen, Anette S; Kielland, Anders; Rustan, Arild C; Drevon, Christian A

2012-12-03

Despite amazing progress in food supply and nutritional science, and a striking increase in life expectancy of approximately 2.5 months per year in many countries during the previous 150 years, modern nutritional research still has great potential to contribute to improved health for future generations, provided that the revolutions in molecular and systems technologies are applied to nutritional questions. Descriptive and mechanistic studies using state-of-the-art epidemiology, food intake registration, genomics with single nucleotide polymorphisms (SNPs) and epigenomics, transcriptomics, proteomics, metabolomics, advanced biostatistics, imaging, calorimetry, cell biology, challenge tests (meals, exercise, etc.), and integration of all data by systems biology will provide much deeper insight than is available today, in a field we may name molecular nutrition research. To take advantage of all the new technologies, scientists should develop international collaboration and gather data in large open-access databases such as the proposed Nutritional Phenotype database (dbNP). This collaboration will promote standardization of procedures (standard operating procedures, SOPs) and make it possible to reuse collected data in future research projects. The ultimate goals of future nutritional research are to understand the detailed mechanisms by which nutrients and foods interact with the body, and thereby to enhance health and treat diet-related diseases.
Molecular Nutrition Research—The Modern Way Of Performing Nutritional Science

PubMed Central

Norheim, Frode; Gjelstad, Ingrid M. F.; Hjorth, Marit; Vinknes, Kathrine J.; Langleite, Torgrim M.; Holen, Torgeir; Jensen, Jørgen; Dalen, Knut Tomas; Karlsen, Anette S.; Kielland, Anders; Rustan, Arild C.; Drevon, Christian A.

2012-01-01

Despite amazing progress in food supply and nutritional science, and a striking increase in life expectancy of approximately 2.5 months per year in many countries during the previous 150 years, modern nutritional research still has great potential to contribute to improved health for future generations, provided that the revolutions in molecular and systems technologies are applied to nutritional questions. Descriptive and mechanistic studies using state-of-the-art epidemiology, food intake registration, genomics with single nucleotide polymorphisms (SNPs) and epigenomics, transcriptomics, proteomics, metabolomics, advanced biostatistics, imaging, calorimetry, cell biology, challenge tests (meals, exercise, etc.), and integration of all data by systems biology will provide much deeper insight than is available today, in a field we may name molecular nutrition research. To take advantage of all the new technologies, scientists should develop international collaboration and gather data in large open-access databases such as the proposed Nutritional Phenotype database (dbNP). This collaboration will promote standardization of procedures (standard operating procedures, SOPs) and make it possible to reuse collected data in future research projects. The ultimate goals of future nutritional research are to understand the detailed mechanisms by which nutrients and foods interact with the body, and thereby to enhance health and treat diet-related diseases. PMID:23208524
Technology-Enhanced Research in the Science Classroom.

ERIC Educational Resources Information Center

Francis, Joseph W.

1997-01-01

Describes a project where students use the Internet as a research tool. Discusses using e-mail to access molecular biology databases and identify proteins using amino acid sequences, obtaining complete amino acid sequences using the world wide web, using telnet to access library resources on the Internet, and various stages of protein analysis…
LigandBox: A database for 3D structures of chemical compounds

PubMed Central

Kawabata, Takeshi; Sugihara, Yusuke; Fukunishi, Yoshifumi; Nakamura, Haruki

2013-01-01

A database of the 3D structures of available compounds is essential for virtual screening by molecular docking. We have developed the LigandBox database (http://ligandbox.protein.osaka-u.ac.jp/ligandbox/) containing four million available compounds collected from the catalogues of 37 commercial suppliers, together with approved drugs and biochemical compounds taken from the KEGG_DRUG, KEGG_COMPOUND and PDB databases. Each chemical compound in the database has several 3D conformers with hydrogen atoms and atomic charges, which are ready to be docked into receptors using docking programs. The 3D conformations were generated using our molecular simulation program package, myPresto. Various physical properties, such as aqueous solubility (LogS) and carcinogenicity, have also been calculated to characterize the ADME-Tox properties of the compounds. The Web database provides two services for compound searches: a property/chemical ID search and a chemical structure search. The chemical structure search combines a descriptor search and a maximum common substructure (MCS) search, using our program kcombu. By specifying a query chemical structure, users can find similar compounds among the millions of compounds in the database within a few minutes. Our database is expected to assist a wide range of researchers in the fields of medical science, chemical biology, and biochemistry who are seeking to discover active chemical compounds by virtual screening. PMID:27493549
Relax with CouchDB--into the non-relational DBMS era of bioinformatics.

PubMed

Manyam, Ganiraju; Payton, Michelle A; Roth, Jack A; Abruzzo, Lynne V; Coombes, Kevin R

2012-07-01

With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services. Copyright © 2012 Elsevier Inc. All rights reserved.
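CouchDB exposes its data over plain HTTP, which is what makes programmatic web-service access of the kind mentioned above straightforward. A minimal sketch of building a query URL for a CouchDB view follows; the database name `genesmash`, design document `genes`, and view `by_symbol` are hypothetical and are not taken from the geneSmash resource. The one CouchDB-specific detail shown is that view keys are passed as JSON values, so a string key must be JSON-quoted before URL encoding.

```python
import json
from typing import Any, Optional
from urllib.parse import quote, urlencode


def view_url(base: str, db: str, design_doc: str, view: str,
             key: Optional[Any] = None, include_docs: bool = False) -> str:
    """Build a GET URL for /{db}/_design/{design_doc}/_view/{view}."""
    parts = [base.rstrip("/"), quote(db, safe=""), "_design",
             quote(design_doc, safe=""), "_view", quote(view, safe="")]
    params = {}
    if key is not None:
        # View keys are JSON values: the string TP53 is sent as "TP53" (with quotes).
        params["key"] = json.dumps(key)
    if include_docs:
        params["include_docs"] = "true"
    url = "/".join(parts)
    return f"{url}?{urlencode(params)}" if params else url
```

For example, `view_url("http://localhost:5984", "genesmash", "genes", "by_symbol", key="TP53")` returns `http://localhost:5984/genesmash/_design/genes/_view/by_symbol?key=%22TP53%22` (5984 is CouchDB's default port); fetching that URL from a running server would return the matching view rows as JSON.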
From Discovery to Function: The Expanding Roles of Long NonCoding RNAs in Physiology and Disease

PubMed Central

Sun, Miao

2015-01-01

Long noncoding RNAs (lncRNAs) are a relatively poorly understood class of RNAs with little or no coding capacity transcribed from a set of incompletely annotated genes. They have received considerable attention in the past few years and are emerging as potentially important players in biological regulation. Here we discuss the evolving understanding of this new class of molecular regulators that has emerged from ongoing research, which continues to expand our databases of annotated lncRNAs and provide new insights into their physical properties, molecular mechanisms of action, and biological functions. We outline the current strategies and approaches that have been employed to identify and characterize lncRNAs, which have been instrumental in revealing their multifaceted roles ranging from cis- to trans-regulation of gene expression and from epigenetic modulation in the nucleus to posttranscriptional control in the cytoplasm. In addition, we highlight the molecular and biological functions of some of the best characterized lncRNAs in physiology and disease, especially those relevant to endocrinology, reproduction, metabolism, immunology, neurobiology, muscle biology, and cancer. Finally, we discuss the tremendous diagnostic and therapeutic potential of lncRNAs in cancer and other diseases. PMID:25426780
From discovery to function: the expanding roles of long noncoding RNAs in physiology and disease.

PubMed

Sun, Miao; Kraus, W Lee

2015-02-01

Long noncoding RNAs (lncRNAs) are a relatively poorly understood class of RNAs with little or no coding capacity transcribed from a set of incompletely annotated genes. They have received considerable attention in the past few years and are emerging as potentially important players in biological regulation. Here we discuss the evolving understanding of this new class of molecular regulators that has emerged from ongoing research, which continues to expand our databases of annotated lncRNAs and provide new insights into their physical properties, molecular mechanisms of action, and biological functions. We outline the current strategies and approaches that have been employed to identify and characterize lncRNAs, which have been instrumental in revealing their multifaceted roles ranging from cis- to trans-regulation of gene expression and from epigenetic modulation in the nucleus to posttranscriptional control in the cytoplasm. In addition, we highlight the molecular and biological functions of some of the best characterized lncRNAs in physiology and disease, especially those relevant to endocrinology, reproduction, metabolism, immunology, neurobiology, muscle biology, and cancer. Finally, we discuss the tremendous diagnostic and therapeutic potential of lncRNAs in cancer and other diseases.
The Nuclear Protein Database (NPD): sub-nuclear localisation and functional annotation of the nuclear proteome

PubMed Central

Dellaire, G.; Farrall, R.; Bickmore, W.A.

2003-01-01

The Nuclear Protein Database (NPD) is a curated database that contains information on more than 1300 vertebrate proteins that are thought, or are known, to localise to the cell nucleus. Each entry is annotated with information on predicted protein size and isoelectric point, as well as any repeats, motifs or domains within the protein sequence. In addition, information on the sub-nuclear localisation of each protein is provided and the biological and molecular functions are described using Gene Ontology (GO) terms. The database is searchable by keyword, protein name, sub-nuclear compartment and protein domain/motif. Links to other databases are provided (e.g. Entrez, SWISS-PROT, OMIM, PubMed, PubMed Central). Thus, NPD provides a gateway through which the nuclear proteome may be explored. The database can be accessed at http://npd.hgu.mrc.ac.uk and is updated monthly. PMID:12520015
Rhizoma Dioscoreae extract protects against alveolar bone loss by regulating the cell cycle: A predictive study based on the protein‑protein interaction network.

PubMed

Zhang, Zhi-Guo; Song, Chang-Heng; Zhang, Fang-Zhen; Chen, Yan-Jing; Xiang, Li-Hua; Xiao, Gary Guishan; Ju, Da-Hong

2016-06-01

Rhizoma Dioscoreae extract (RDE) exhibits a protective effect on alveolar bone loss in ovariectomized (OVX) rats. The aim of this study was to predict the pathways or targets that are regulated by RDE, by re‑assessing our previously reported data and conducting a protein‑protein interaction (PPI) network analysis. In total, 383 differentially expressed genes (≥3‑fold) between alveolar bone samples from the RDE and OVX group rats were identified, and a PPI network was constructed based on these genes. Furthermore, four molecular clusters (A‑D) with the smallest P‑values in the PPI network were detected by the molecular complex detection (MCODE) algorithm. Using the Database for Annotation, Visualization and Integrated Discovery (DAVID) and Ingenuity Pathway Analysis (IPA) tools, two molecular clusters (A and B) were found to be enriched for biological processes in Gene Ontology (GO). Only cluster A was associated with biological pathways in the IPA database. GO and pathway analysis results showed that cluster A, associated with cell cycle regulation, was the most important molecular cluster in the PPI network. In addition, cyclin‑dependent kinase 1 (CDK1) may be a key molecule in the cell‑cycle‑regulatory function of cluster A. From the PPI network analysis, it was predicted that delayed cell cycle progression in excessive alveolar bone remodeling via downregulation of CDK1 may be another mechanism underlying the anti‑osteopenic effect of RDE on alveolar bone.
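The first steps of this workflow, selecting genes changed at least 3-fold and building a PPI network among them, can be sketched as below. This is a hedged illustration only: it ranks genes by degree (a simple hub measure) rather than running MCODE, which instead weights vertices by the density of their local neighbourhood and grows clusters outward from high-weight seed vertices. The gene symbols, fold changes, and interactions used with it are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def differentially_expressed(fold_changes: Dict[str, float],
                             min_fold: float = 3.0) -> Set[str]:
    """Genes up- or down-regulated by at least `min_fold` (ratio treated/control)."""
    return {gene for gene, fc in fold_changes.items()
            if fc >= min_fold or fc <= 1.0 / min_fold}


def hub_ranking(genes: Set[str],
                interactions: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Degree of each gene in the PPI subnetwork induced by `genes`, highest first."""
    # Keep only edges between selected genes; frozenset makes A-B equal to B-A.
    edges = {frozenset(pair) for pair in interactions
             if pair[0] != pair[1] and pair[0] in genes and pair[1] in genes}
    degree: Dict[str, int] = {}
    for edge in edges:  # each undirected edge counted once, duplicates removed
        for gene in edge:
            degree[gene] = degree.get(gene, 0) + 1
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
```

With hypothetical fold changes such as `{"Cdk1": 0.25, "Bub1": 4.0, "Gapdh": 1.1}`, only `Cdk1` (4-fold down) and `Bub1` (4-fold up) pass the filter; interactions touching `Gapdh` are then ignored when degrees are counted.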
Functional Analysis of OMICs Data and Small Molecule Compounds in an Integrated "Knowledge-Based" Platform.

PubMed

Dubovenko, Alexey; Nikolsky, Yuri; Rakhmatulin, Eugene; Nikolskaya, Tatiana

2017-01-01

Analysis of NGS and other sequencing data, gene variants, gene expression, proteomics, and other high-throughput (OMICs) data is challenging because of its biological complexity and high level of technical and biological noise. One way to deal with both problems is to perform analysis with a high-fidelity annotated knowledgebase of protein interactions, pathways, and functional ontologies. This knowledgebase has to be structured in a computer-readable format and must include software tools for managing experimental data, analysis, and reporting. Here, we present MetaCore™ and Key Pathway Advisor (KPA), an integrated platform for functional data analysis. On the content side, MetaCore and KPA encompass a comprehensive database of molecular interactions of different types, pathways, network models, and ten functional ontologies covering human, mouse, and rat genes. The analytical toolkit includes tools for gene/protein list enrichment analysis, a statistical "interactome" tool for the identification of over- and under-connected proteins in the dataset, and a biological network analysis module made up of network generation algorithms and filters. The suite also features Advanced Search, an application for combinatorial search of the database content, as well as a Java-based tool called Pathway Map Creator for drawing and editing custom pathway maps. Applications of MetaCore and KPA include research on the molecular mode of action of diseases, identification of potential biomarkers and drug targets, pathway hypothesis generation, analysis of the biological effects of novel small molecule compounds, and clinical applications (analysis of large cohorts of patients, and translational and personalized medicine).
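Gene/protein list enrichment analysis of the kind listed above is commonly scored with a one-sided hypergeometric test: given N background genes of which K carry an annotation, how surprising is it to find at least k annotated genes in a list of n? The abstract does not state which statistic MetaCore uses, so the sketch below is a generic illustration of this test, not the product's method.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background (e.g. all genes measured)
    K: background genes annotated with the term or pathway
    n: genes in the user's list
    k: genes in the list that carry the annotation
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("require 0 <= k <= n <= N and 0 <= K <= N")
    # Sum the probability mass of every overlap at least as large as the observed one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)
```

For example, with N = 10, K = 5 and n = 5, finding all five annotated genes in the list gives p = 1/252 (about 0.004). When many terms are tested at once, the resulting p-values are normally corrected for multiple testing, for instance with the Benjamini-Hochberg procedure.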
Integration of Molecular Networking and In-Silico MS/MS Fragmentation for Natural Products Dereplication.

PubMed

Allard, Pierre-Marie; Péresse, Tiphaine; Bisson, Jonathan; Gindro, Katia; Marcourt, Laurence; Pham, Van Cuong; Roussi, Fanny; Litaudon, Marc; Wolfender, Jean-Luc

2016-03-15

Dereplication represents a key step for rapidly identifying known secondary metabolites in complex biological matrices. In this context, liquid chromatography coupled to high-resolution mass spectrometry (LC-HRMS) is increasingly used and, via untargeted data-dependent MS/MS experiments, massive amounts of detailed information on the chemical composition of crude extracts can be generated. Efficient exploitation of such data sets requires automated data treatment and access to dedicated fragmentation databases. Various novel bioinformatics approaches such as molecular networking (MN) and in-silico fragmentation tools have emerged recently and provide new perspectives for early metabolite identification in natural products (NPs) research. Here we propose an innovative dereplication strategy based on the combination of MN with an extensive in-silico MS/MS fragmentation database of NPs. Using two case studies, we demonstrate that this combined approach offers a powerful tool to navigate through the chemistry of complex NPs extracts, dereplicate metabolites, and annotate analogues of database entries.
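Molecular networking links MS/MS spectra whose fragmentation patterns are similar, so that structurally related metabolites cluster together. The core comparison can be sketched as a cosine score between binned spectra. This is a simplified illustration under stated assumptions: established MN workflows such as GNPS use a modified cosine that also aligns peaks shifted by the precursor mass difference, and the 1 Da bin width and 0.7 edge threshold below are hypothetical defaults.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Spectrum = List[Tuple[float, float]]  # (m/z, intensity) peaks


def bin_spectrum(peaks: Spectrum, bin_width: float = 1.0) -> Dict[int, float]:
    """Sum intensities into fixed-width m/z bins."""
    binned: Dict[int, float] = {}
    for mz, intensity in peaks:
        idx = int(mz // bin_width)
        binned[idx] = binned.get(idx, 0.0) + intensity
    return binned


def cosine(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Cosine similarity of two binned spectra (0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def network_edges(spectra: Dict[str, Spectrum],
                  min_cosine: float = 0.7) -> List[Tuple[str, str, float]]:
    """Connect every pair of spectra whose cosine score reaches `min_cosine`."""
    binned = {name: bin_spectrum(peaks) for name, peaks in spectra.items()}
    edges = []
    for a, b in combinations(sorted(binned), 2):
        score = cosine(binned[a], binned[b])
        if score >= min_cosine:
            edges.append((a, b, score))
    return edges
```

Dereplication then amounts to adding database spectra (for example, in-silico predicted ones) as extra nodes: an unknown that links to a database entry is a candidate annotation, and its network neighbours are candidate analogues.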
NaviCom: a web application to create interactive molecular network portraits using multi-level omics data.

PubMed

Dorel, Mathurin; Viara, Eric; Barillot, Emmanuel; Zinovyev, Andrei; Kuperstein, Inna

2017-01-01

Human diseases such as cancer are routinely characterized by high-throughput molecular technologies, and multi-level omics data are accumulating in public databases at an increasing rate. Retrieval and visualization of these data in the context of molecular network maps can provide insights into the pattern of regulation of molecular functions reflected by an omics profile. To make this task easy, we developed NaviCom, a Python package and web platform for visualization of multi-level omics data on top of biological network maps. NaviCom bridges the gap between cBioPortal, the most widely used resource of large-scale cancer omics data, and NaviCell, a data visualization web service that contains several molecular network map collections. NaviCom offers several standardized modes of data display on top of molecular network maps, allowing users to address specific biological questions. We illustrate how users can easily create interactive network-based cancer molecular portraits via the NaviCom web interface using the maps of the Atlas of Cancer Signalling Network (ACSN) and other maps. Analysis of these molecular portraits can help in formulating scientific hypotheses on the molecular mechanisms deregulated in the studied disease. NaviCom is available at https://navicom.curie.fr. © The Author(s) 2017. Published by Oxford University Press.
Estrogen alters the profile of the transcriptome in river snail Bellamya aeruginosa.

PubMed

Lei, Kun; Liu, Ruizhi; An, Li-Hui; Luo, Ying-Feng; LeBlanc, Gerald A

2015-03-01

We evaluated the transcriptome dynamics of the freshwater river snail Bellamya aeruginosa exposed to 17β-estradiol (E2) using the Roche/454 GS-FLX platform. In total, 41,869 unigenes with an average length of 586 bp, representing 36,181 contigs and 5,688 singlets, were obtained. Of these, 18.08%, 36.85%, and 25.47% matched sequences in the GenBank non-redundant nucleic acid database, the non-redundant protein database, and the Swiss protein database, respectively. Annotation of the unigenes with gene ontology, and then mapping them to biological pathways, revealed large groups of genes related to growth, development, reproduction, signal transduction, and defense mechanisms. Significant differences were found in gene expression in both liver and testicular tissues between control and E2-exposed organisms. These changes in gene expression will help in understanding the molecular mechanisms of the response to physiological stress in the river snail exposed to estrogen, and will facilitate research into biological processes and underlying physiological adaptations to xenoestrogen exposure in gastropods.
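The reported figures can be cross-checked with simple arithmetic: contigs plus singlets should equal the number of unigenes, and the match percentages translate into approximate gene counts. The counts computed below are back-calculated estimates from the rounded percentages, not numbers reported by the authors; the dictionary labels are shorthand chosen for this sketch.

```python
from typing import Dict

UNIGENES = 41_869
CONTIGS, SINGLETS = 36_181, 5_688

# Percentages of unigenes with database hits, as reported in the abstract.
MATCH_PERCENT: Dict[str, float] = {
    "GenBank non-redundant nucleotide": 18.08,
    "non-redundant protein": 36.85,
    "Swiss protein": 25.47,
}


def implied_counts(total: int, percentages: Dict[str, float]) -> Dict[str, int]:
    """Approximate number of unigenes behind each rounded percentage."""
    return {db: round(total * pct / 100) for db, pct in percentages.items()}


assert CONTIGS + SINGLETS == UNIGENES  # 36,181 + 5,688 = 41,869
print(implied_counts(UNIGENES, MATCH_PERCENT))
# {'GenBank non-redundant nucleotide': 7570, 'non-redundant protein': 15429, 'Swiss protein': 10664}
```

Because the percentages are rounded to two decimals, each estimate could be off by a few genes in either direction.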
Using PCR-RFLP technology to teach single nucleotide polymorphism for undergraduates.

PubMed

Zhang, Bo; Wang, Yan; Xu, Xiaofeng; Guan, Xingying; Bai, Yun

2013-01-01

Recent studies have found aberrant expression of peroxiredoxin-6 (Prdx6) in various kinds of cancer. Given its biochemical function and expression pattern in cancer cells, the association between genetic polymorphism of Prdx6 and cancer onset is of interest. In this report, we developed and implemented a series of experiments in a molecular biology laboratory course to teach single nucleotide polymorphism (SNP) to undergraduate students majoring in molecular biology or genetics. The flanking sequence of rs4382766, located in the Prdx6 gene, contains an SspI restriction site and was used as the target in this lab course. Students could mimic real research by integrating different techniques, such as database retrieval, genomic DNA isolation, PCR, and restriction enzyme assay. This PCR-RFLP experimental series helps students build an integrated understanding of molecular biology and of the relations among individual experiments. Students were found to be more enthusiastic during the laboratory classes than those in the former curriculum. Copyright © 2013 Wiley Periodicals, Inc.
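The in-silico part of a PCR-RFLP exercise, predicting the fragment sizes each allele should give after digestion, can be sketched in a few lines. SspI recognises AATATT and cuts between AAT and ATT, leaving blunt ends; because the site is palindromic, scanning one strand is enough. The amplicon sequences used with this sketch are made up for illustration and are not the real rs4382766 flanking sequence.

```python
from typing import List

SSPI_SITE = "AATATT"
SSPI_CUT_OFFSET = 3  # SspI cuts AAT^ATT (blunt ends), 3 nt into the site


def cut_positions(seq: str, site: str = SSPI_SITE,
                  offset: int = SSPI_CUT_OFFSET) -> List[int]:
    """0-based positions at which the enzyme cuts `seq` (top strand)."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)  # continue scanning past this site
    return positions


def fragment_lengths(seq: str) -> List[int]:
    """Lengths of the linear fragments produced by complete digestion."""
    bounds = [0] + cut_positions(seq) + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]
```

For a hypothetical 30 bp amplicon with the site at positions 10 to 15, the sketch predicts fragments of 13 and 17 bp, whereas an allele in which the SNP disrupts the site (for example AATGTT) stays uncut at 30 bp; on a gel, students would see these as different band patterns.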
ExplorEnz: the primary source of the IUBMB enzyme list

PubMed Central

McDonald, Andrew G.; Boyce, Sinéad; Tipton, Keith F.

2009-01-01

ExplorEnz is the MySQL database that is used for the curation and dissemination of the International Union of Biochemistry and Molecular Biology (IUBMB) Enzyme Nomenclature. A simple web-based query interface is provided, along with an advanced search engine for more complex Boolean queries. The WWW front-end is accessible at http://www.enzyme-database.org, from where downloads of the database as SQL and XML are also available. An associated form-based curatorial application has been developed to facilitate the curation of enzyme data as well as the internal and public review processes that occur before an enzyme entry is made official. Suggestions for new enzyme entries, or modifications to existing ones, can be made using the forms provided at http://www.enzyme-database.org/forms.php. PMID:18776214
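Because the enzyme list can be downloaded as SQL, Boolean queries like those offered by the advanced search can also be run against a local copy. The sketch below uses an in-memory SQLite table; the table and column names (`entry`, `ec_num`, `accepted_name`) are hypothetical and are not claimed to match the ExplorEnz schema, and loading the real MySQL dump into SQLite may need conversion. The three EC numbers and names are real IUBMB entries, used only as sample rows.

```python
import sqlite3
from typing import List, Optional, Sequence


def demo_db() -> sqlite3.Connection:
    """In-memory stand-in for a local copy of the enzyme list (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entry (ec_num TEXT PRIMARY KEY, accepted_name TEXT)")
    conn.executemany("INSERT INTO entry VALUES (?, ?)", [
        ("1.1.1.1", "alcohol dehydrogenase"),
        ("1.1.1.27", "L-lactate dehydrogenase"),
        ("3.2.1.1", "alpha-amylase"),
    ])
    return conn


def boolean_search(conn: sqlite3.Connection, all_of: Sequence[str] = (),
                   none_of: Sequence[str] = (),
                   ec_prefix: Optional[str] = None) -> List[str]:
    """AND together 'name contains' / 'name lacks' terms and an optional EC-class prefix."""
    clauses, params = [], []
    for term in all_of:
        clauses.append("accepted_name LIKE ?")
        params.append(f"%{term}%")
    for term in none_of:
        clauses.append("accepted_name NOT LIKE ?")
        params.append(f"%{term}%")
    if ec_prefix:
        clauses.append("ec_num LIKE ?")
        params.append(f"{ec_prefix}.%")
    where = " AND ".join(clauses) or "1"  # no filters: return every entry
    rows = conn.execute(f"SELECT ec_num FROM entry WHERE {where} ORDER BY ec_num", params)
    return [row[0] for row in rows]
```

Only the fixed clause strings are interpolated into the SQL; user-supplied terms always go through `?` placeholders, so search text cannot alter the query structure.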

Molecular differences between mature and immature dental pulp cells: Bioinformatics and preliminary results.

PubMed

Chen, Long; Jiang, Yifeng; Du, Zhen

2018-04-01

Although previous studies have demonstrated that dental pulp stem cells (DPSCs) from mature and immature teeth exhibit potential for multi-directional differentiation, the molecular and biological differences between DPSCs from mature and immature permanent teeth have not been fully investigated. In the present study, 500 differentially expressed genes from dental pulp cells (DPCs) in mature and immature permanent teeth were obtained from the Gene Expression Omnibus online database. Based on bioinformatics analysis using the Database for Annotation, Visualization and Integrated Discovery, these genes were divided into a number of subgroups associated with immunity, inflammation and cell signaling. The results of the present study suggest that immune features, response to infection and cell signaling may differ between DPCs from mature and immature permanent teeth; furthermore, DPCs from immature permanent teeth may be more suitable for use in tissue engineering or stem cell therapy. According to the Online Mendelian Inheritance in Man database, Sonic Hedgehog (SHH), a differentially expressed gene in DPCs from mature and immature permanent teeth, serves a crucial role in the development of craniofacial tissues, including teeth, which further suggests that SHH may cause DPCs from mature and immature permanent teeth to exhibit different biological characteristics. The Search Tool for the Retrieval of Interacting Genes/Proteins database revealed that SHH has functional protein associations with a number of other proteins, including Glioma-associated oncogene (GLI)1, GLI2, growth arrest-specific protein 1, bone morphogenetic protein (BMP)2 and BMP4, in mice and humans. It was also suggested that SHH may interact with other genes to regulate the biological characteristics of DPCs. The results of the present study may provide a useful reference basis for selecting suitable DPSCs and molecules for the treatment of these cells to optimize features for tissue engineering or stem cell therapy. Quantitative polymerase chain reaction should be performed to confirm the differential expression of these genes prior to the beginning of a functional study.
Biopython: freely available Python tools for computational molecular biology and bioinformatics.

PubMed

Cock, Peter J A; Antao, Tiago; Chang, Jeffrey T; Chapman, Brad A; Cox, Cymon J; Dalke, Andrew; Friedberg, Iddo; Hamelryck, Thomas; Kauff, Frank; Wilczynski, Bartek; de Hoon, Michiel J L

2009-06-01

The Biopython project is a mature open source international collaboration of volunteer developers, providing Python libraries for a wide range of bioinformatics problems. Biopython includes modules for reading and writing different sequence file formats and multiple sequence alignments, dealing with 3D macromolecular structures, interacting with common tools such as BLAST, ClustalW and EMBOSS, accessing key online databases, as well as providing numerical methods for statistical learning. Biopython is freely available, with documentation and source code at www.biopython.org under the Biopython license.
Conceptual biology, hypothesis discovery, and text mining: Swanson's legacy.

PubMed

Bekhuis, Tanja

2006-04-03

Innovative biomedical librarians and information specialists who want to expand their roles as expert searchers need to know about profound changes in biology and parallel trends in text mining. In recent years, conceptual biology has emerged as a complement to empirical biology. This is partly in response to the availability of massive digital resources such as the network of databases for molecular biologists at the National Center for Biotechnology Information. Developments in text mining and hypothesis discovery systems based on the early work of Swanson, a mathematician and information scientist, are coincident with the emergence of conceptual biology. Very little has been written to introduce biomedical digital librarians to these new trends. In this paper, background for data and text mining, as well as for knowledge discovery in databases (KDD) and in text (KDT) is presented, then a brief review of Swanson's ideas, followed by a discussion of recent approaches to hypothesis discovery and testing. 'Testing' in the context of text mining involves partially automated methods for finding evidence in the literature to support hypothetical relationships. Concluding remarks follow regarding (a) the limits of current strategies for evaluation of hypothesis discovery systems and (b) the role of literature-based discovery in concert with empirical research. Report of an informatics-driven literature review for biomarkers of systemic lupus erythematosus is mentioned. Swanson's vision of the hidden value in the literature of science and, by extension, in biomedical digital databases, is still remarkably generative for information scientists, biologists, and physicians.
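Swanson's approach is usually summarised as the ABC model: if the literature links A to B, and separately links B to C, but never links A to C directly, then the A and C pair is a candidate hypothesis. The sketch below applies that idea to a toy co-occurrence map loosely modelled on Swanson's well-known Raynaud's syndrome and fish oil example; the term links are illustrative and were not mined from a real corpus, and ranking by the number of bridging B terms is one simple choice among many.

```python
from typing import Dict, List, Set, Tuple


def abc_hypotheses(cooccurs: Dict[str, Set[str]], a: str) -> List[Tuple[str, Set[str]]]:
    """Terms C reachable from A only through intermediate B terms, ranked by B count."""
    direct = cooccurs.get(a, set())
    bridges: Dict[str, Set[str]] = {}
    for b in direct:
        for c in cooccurs.get(b, set()):
            if c != a and c not in direct:  # skip A itself and already-known links
                bridges.setdefault(c, set()).add(b)
    # More independent B terms means more indirect evidence for an A-C link.
    return sorted(bridges.items(), key=lambda item: (-len(item[1]), item[0]))
```

"Testing" in the sense used above would then mean searching the literature for evidence supporting the top-ranked pairs.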
PathNER: a tool for systematic identification of biological pathway mentions in the literature

PubMed Central

2013-01-01

Background: Biological pathways are central to many biomedical studies and are frequently discussed in the literature. Several curated databases have been established to collate the knowledge of molecular processes constituting pathways. Yet, there has been little focus on enabling systematic detection of pathway mentions in the literature. Results: We developed a tool, named PathNER (Pathway Named Entity Recognition), for the systematic identification of pathway mentions in the literature. PathNER is based on soft dictionary matching and rules, with the dictionary generated from public pathway databases. The rules utilise general pathway-specific keywords, syntactic information and gene/protein mentions. Detection results from both components are merged. On a gold-standard corpus, PathNER achieved an F1-score of 84%. To illustrate its potential, we applied PathNER on a collection of articles related to Alzheimer's disease to identify associated pathways, highlighting cases that can complement an existing manually curated knowledgebase. Conclusions: In contrast to existing text-mining efforts that target the automatic reconstruction of pathway details from molecular interactions mentioned in the literature, PathNER focuses on identifying specific named pathway mentions. These mentions can be used to support large-scale curation and pathway-related systems biology applications, as demonstrated in the example of Alzheimer's disease. PathNER is implemented in Java and made freely available online at http://sourceforge.net/projects/pathner/. PMID:24555844
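Soft dictionary matching tolerates small spelling differences between a mention and a dictionary entry, such as British versus American spelling. The sketch below uses the Python standard library's `difflib.SequenceMatcher` similarity ratio as a stand-in; PathNER's actual matching algorithm and thresholds are not described in the abstract, so the normalisation rules, the 0.85 threshold, and the dictionary entries used with it are assumptions. The `f1_score` helper shows how an F1-score such as the reported 84% combines precision and recall.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional


def normalise(text: str) -> str:
    """Lower-case, treat hyphens as spaces, collapse whitespace."""
    return " ".join(text.lower().replace("-", " ").split())


def soft_match(mention: str, dictionary: Iterable[str],
               min_ratio: float = 0.85) -> Optional[str]:
    """Best-scoring dictionary entry, or None if nothing is similar enough."""
    target = normalise(mention)
    best, best_score = None, 0.0
    for entry in dictionary:
        score = SequenceMatcher(None, target, normalise(entry)).ratio()
        if score > best_score:
            best, best_score = entry, score
    return best if best_score >= min_ratio else None


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With a hypothetical dictionary containing "MAPK signaling pathway", the mention "MAPK signalling pathway" scores about 0.98 and is matched, while an unrelated word such as "apoptosis" falls below the threshold and returns `None`.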
Finding mouse models of human lymphomas and leukemias using the Jackson laboratory mouse tumor biology database.

PubMed

Begley, Dale A; Sundberg, John P; Krupke, Debra M; Neuhauser, Steven B; Bult, Carol J; Eppig, Janan T; Morse, Herbert C; Ward, Jerrold M

2015-12-01

Many mouse models have been created to study hematopoietic cancer types. There are over thirty hematopoietic tumor types and subtypes, both human and mouse, with various origins, characteristics and clinical prognoses. Determining the specific type of hematopoietic lesion produced in a mouse model and identifying mouse models that correspond to the human subtypes of these lesions has been a continuing challenge for the scientific community. The Mouse Tumor Biology Database (MTB; http://tumor.informatics.jax.org) is designed to facilitate use of mouse models of human cancer by providing detailed histopathologic and molecular information on lymphoma subtypes, including expertly annotated online whole-slide scans, and by providing a repository for storing and querying information on specific lymphoma models. Copyright © 2015 Elsevier Inc. All rights reserved.
Mutation databases and other online sites as a resource for transfusion medicine: history and attributes.

PubMed

Blumenfeld, Olga O

2002-04-01

Recent advances in molecular biology and technology have provided evidence, at a molecular level, for long-known observations that the human genome is not unique but is characterized by individual sequence variation. At the present time, documentation of genetic variation occurring in a large number of genes is increasing exponentially. The characterization of alleles that encode a variety of blood group antigens has been particularly fruitful for transfusion medicine. Phenotypic variation, as identified by the serologic study of blood group variants, is required to identify the presence of a variant allele. Many of the other alleles currently recorded have been selected and identified on the basis of inherited disease traits. New approaches document single nucleotide polymorphisms that occur throughout the genome and best show how the DNA sequence varies in the human population. The primary data dealing with variant alleles or more general genomic variation are scattered throughout the scientific literature and only within the last few years has information begun to be organized into databases. This article provides guidance on how to access those databases online as a source of information about genetic variation for purposes of molecular, clinical, and diagnostic medicine, research, and teaching. The attributes of the sites are described. A more detailed view of the database dealing specifically with alleles of genes encoding the blood group antigens includes a brief preliminary analysis of the molecular basis for observed polymorphisms. Other online sites that may be particularly useful to the transfusion medicine readership as well as a brief historical account are also presented. Copyright 2002, Elsevier Science (USA). All rights reserved.
Molecular Evolution in Historical Perspective.

PubMed

Suárez-Díaz, Edna

2016-12-01

In the 1960s, advances in protein chemistry and molecular genetics provided new means for the study of biological evolution. Amino acid sequencing, nucleic acid hybridization, zone gel electrophoresis, and immunochemistry were some of the experimental techniques that brought new perspectives to the study of the patterns and mechanisms of evolution. New concepts, such as the molecular evolutionary clock, and the discovery of unexpected molecular phenomena, like the presence of repetitive sequences in eukaryotic genomes, eventually led to the realization that evolution might occur at a different pace at the organismic and the molecular levels, and according to different mechanisms. These developments sparked important debates between proponents of the molecular and organismic approaches. The most vocal confrontations focused on the relation between primates and humans, and the neutral theory of molecular evolution. By the 1980s and 1990s, the construction of large protein and DNA sequence databases, and the development of computer-based statistical tools, facilitated the coming together of molecular and evolutionary biology. Although in its contemporary form the field of molecular evolution dates back roughly five decades, it has deep roots in twentieth-century experimental life sciences. For historians of science, the origins and consolidation of molecular evolution provide a privileged field for the study of scientific debates, the relation between technological advances and scientific knowledge, and the connection between science and broader social concerns.
Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.

PubMed

Lan, Haidong; Chan, Yuandong; Xu, Kai; Schmidt, Bertil; Peng, Shaoliang; Liu, Weiguo

2016-07-19

Computing alignments between two or more sequences is a common operation in computational molecular biology. The continuing growth of biological sequence databases calls for efficient parallel implementations on modern accelerators. This paper presents new approaches to high-performance biological sequence database scanning with the Smith-Waterman algorithm, and to the first stage of progressive multiple sequence alignment based on the ClustalW heuristic, on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture: cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we reorganize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. Evaluations show that our method achieves a peak overall performance of up to 220 GCUPS (billion cell updates per second) for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and number of compute nodes, for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive in comparison to optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi .
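For readers unfamiliar with the kernel being parallelised, a minimal serial Smith-Waterman local alignment score is sketched below, together with the GCUPS (billions of cell updates per second) metric used to report performance. The linear gap penalty and simple match/mismatch scores are illustrative assumptions: protein database scanning normally uses a substitution matrix such as BLOSUM62 and affine gap penalties, and this sketch has none of the cluster, thread, or vector parallelism the paper describes.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> int:
    """Best local alignment score; fills len(a) * len(b) DP cells using two rows."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero (restart the alignment).
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def gcups(query_length: int, database_residues: int, seconds: float) -> float:
    """Giga cell updates per second for scanning a database with one query."""
    return query_length * database_residues / seconds / 1e9
```

Scanning a database repeats this computation for every database sequence, and because each database sequence is scored independently, the work spreads naturally across nodes, threads, and vector lanes. A 100-residue query scanned against 10^9 database residues in one second corresponds to 100 GCUPS.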
Systems biology of cancer biomarker detection.

PubMed

Mitra, Sanga; Das, Smarajit; Chakrabarti, Jayprokas

2013-01-01

Cancer systems biology is an ever-growing area of research owing to the explosion of data; the problem is how to mine these data and extract useful information. To gain insight into carcinogenesis, one needs to systematically mine several resources, such as databases, microarray data and next-generation sequencing data. This review encompasses management and analysis of cancer data, database construction and data deposition, whole-transcriptome and genome comparison, analysis of results from high-throughput experiments to uncover cellular pathways and molecular interactions, and the design of effective algorithms to identify potential biomarkers. Recent technical advances such as ChIP-on-chip, ChIP-seq and RNA-seq can be applied to turn the gathering of epigenetic information into a high-throughput endeavour, one in which systems biology and bioinformatics are making significant inroads. The data from the ENCODE and GENCODE projects, available through the UCSC Genome Browser, can be considered a benchmark for comparison and meta-analysis. A pipeline for integrating next-generation sequencing data and microarray data with existing databases is discussed. The understanding of cancer genomics is changing the way we approach cancer diagnosis and treatment. To give a better understanding of how to use the available resources, we have chosen oral cancer to show how, and what kind of, analysis can be done. This review is a computational genomics primer that provides a bird's-eye view of the computational and bioinformatics tools currently available to perform integrated genomic and systems biology analyses of several carcinomas.
Morphinome Database - The database of proteins altered by morphine administration - An update.

PubMed

Bodzon-Kulakowska, Anna; Padrtova, Tereza; Drabik, Anna; Ner-Kluza, Joanna; Antolak, Anna; Kulakowski, Konrad; Suder, Piotr

2018-04-13

Morphine is considered a gold standard in pain treatment. Nevertheless, its use can be associated with severe side effects, including drug addiction. It is therefore very important to understand the molecular mechanism of morphine action in order to develop new methods of pain therapy, or at least to attenuate the side effects of opioid use. Proteomics allows the identification of proteins involved in particular biological processes, but the number of proteins identified in a single study is usually overwhelming. Researchers thus face the difficult problem of choosing the proteins that are truly important for the processes under investigation and worth further study. Therefore, based on 29 published articles, we created a database of proteins regulated by morphine administration: the Morphinome Database (addiction-proteomics.org). This web tool shows which proteins were identified in different proteomics studies. Moreover, collecting and organizing such a vast amount of data allows us to find the same proteins identified in various studies and to rank them by how frequently they were identified. Analysis with the STRING and KEGG databases indicated the metabolic pathways in which these molecules are involved. These molecular pathways therefore appear to be strongly affected by morphine administration and could be important targets for further investigation. The data on proteins identified by different proteomics studies of molecular changes caused by morphine administration (29 published articles) were gathered in the Morphinome Database. Unifying these data allowed us to identify proteins that were reported several times by distinct proteomics studies, which suggests that they are well verified and important for the entire process. These proteins may now be considered promising targets for more detailed studies of their role in the molecular mechanism of morphine action. Copyright © 2018. Published by Elsevier B.V.
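The frequency-based ranking described above can be sketched as a simple count over per-study protein lists. This is an illustrative sketch, not the Morphinome Database's implementation; the study and protein identifiers are hypothetical placeholders, and each protein is assumed to count at most once per study.

```python
from collections import Counter


def rank_by_identification_frequency(studies: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Rank proteins by the number of studies in which they were identified."""
    counts = Counter()
    for proteins in studies.values():
        counts.update(set(proteins))  # a protein counts at most once per study
    # Most frequently identified first; ties broken alphabetically for a stable order
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


studies = {
    "study_1": {"protein_A", "protein_B", "protein_C"},
    "study_2": {"protein_A", "protein_C"},
    "study_3": {"protein_A", "protein_D"},
}
print(rank_by_identification_frequency(studies))
# [('protein_A', 3), ('protein_C', 2), ('protein_B', 1), ('protein_D', 1)]
```

Proteins at the top of such a ranking are the ones reported independently by several studies, which is the property the abstract uses to prioritize candidates.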
A Chado case study: an ontology-based modular schema for representing genome-associated biological information.

PubMed

Mungall, Christopher J; Emmert, David B

2007-07-01

A few years ago, FlyBase undertook to design a new database schema to store Drosophila data. It would fully integrate genomic sequence and annotation data with bibliographic, genetic, phenotypic and molecular data from the literature, representing a distillation of the first 100 years of research on this major animal model system. In developing this new integrated schema, FlyBase also made a commitment to ensure that its design was generic, extensible and available as open source, so that it could be employed as the core schema of any model organism data repository, thereby avoiding redundant software development and potentially increasing interoperability. Our question was whether we could create a relational database schema that would be successfully reused. Chado is a relational database schema now used to manage biological knowledge for a wide variety of organisms, from humans to pathogens, especially the classes of information that can be associated, directly or indirectly, with genome sequences or with the primary RNA and protein products encoded by a genome. Biological databases that conform to this schema can interoperate with one another, and with application software from the Generic Model Organism Database (GMOD) toolkit. Chado is distinctive because its design is driven by ontologies. Ontologies (or controlled vocabularies) are used throughout the schema as a means of typing entities. The Chado schema is partitioned into integrated subschemas (modules), each encapsulating a different biological domain, and each described using representations in appropriate ontologies. To illustrate this methodology, we describe here the Chado modules used for describing genomic sequences. GMOD is a collaboration of several model organism database groups, including FlyBase, to develop a set of open-source software for managing model organism data. 
The Chado schema is freely distributed under the terms of the Artistic License (http://www.opensource.org/licenses/artistic-license.php) from GMOD (www.gmod.org).
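The idea of typing entities with ontology terms rather than free-text columns can be illustrated with a heavily simplified subset of Chado's `cv`, `cvterm` and `feature` tables in SQLite. This is a sketch under stated assumptions: the real schema has many more columns, constraints and modules, and the feature names below are hypothetical.

```python
import sqlite3

# Heavily simplified subset of Chado's cv, cvterm and feature tables
# (assumption: most columns, constraints and modules are omitted).
SCHEMA = """
CREATE TABLE cv (
    cv_id INTEGER PRIMARY KEY,
    name  TEXT NOT NULL UNIQUE
);
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    cv_id     INTEGER NOT NULL REFERENCES cv (cv_id),
    name      TEXT NOT NULL
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type_id    INTEGER NOT NULL REFERENCES cvterm (cvterm_id)  -- ontology-typed
);
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO cv (cv_id, name) VALUES (1, 'sequence')")
    conn.executemany("INSERT INTO cvterm (cvterm_id, cv_id, name) VALUES (?, 1, ?)",
                     [(1, "gene"), (2, "mRNA")])
    # Hypothetical features: each one's type is an ontology term, not a string column
    conn.executemany("INSERT INTO feature (feature_id, uniquename, type_id) VALUES (?, ?, ?)",
                     [(1, "gene_1", 1), (2, "gene_1-RA", 2), (3, "gene_2", 1)])
    return conn


def features_of_type(conn: sqlite3.Connection, cv_name: str, term_name: str) -> list[str]:
    """Names of all features typed by the given term of the given vocabulary."""
    rows = conn.execute(
        """SELECT f.uniquename
             FROM feature f
             JOIN cvterm t ON f.type_id = t.cvterm_id
             JOIN cv c ON t.cv_id = c.cv_id
            WHERE c.name = ? AND t.name = ?
            ORDER BY f.uniquename""",
        (cv_name, term_name),
    )
    return [row[0] for row in rows]


conn = build_demo_db()
print(features_of_type(conn, "sequence", "gene"))  # ['gene_1', 'gene_2']
```

Because new feature kinds are added as rows in `cvterm` rather than as new tables or columns, the same schema can hold data for very different organisms.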
Generation of a foveomacular transcriptome

PubMed Central

Bernstein, Steven; Wong, Paul W.

2014-01-01

Purpose: Organizing molecular biologic data is a growing challenge since the rate of data accumulation is steadily increasing. Information relevant to a particular biologic query can be difficult to extract from the comprehensive databases currently available. We present a data collection and organization model designed to ameliorate these problems and applied it to generate an expressed sequence tag (EST)–based foveomacular transcriptome. Methods: Using Perl, MySQL, EST libraries, screening, and human foveomacular gene expression as a model system, we generated a foveomacular transcriptome database enriched for molecularly relevant data. Results: Using foveomacula as a gene expression model tissue, we identified and organized 6,056 genes expressed in that tissue. Of those identified genes, 3,480 had not been previously described as expressed in the foveomacula. Internal experimental controls as well as comparison of our data set to published data sets suggest we do not yet have a complete description of the foveomacula transcriptome. Conclusions: We present an organizational method designed to amplify the utility of data pertinent to a specific research interest. Our method is generic enough to be applicable to a variety of conditions yet focused enough to allow for specialized study. PMID:24991187
Radiation damage of biomolecules (RADAM) database development: current status

NASA Astrophysics Data System (ADS)

Denifl, S.; Garcia, G.; Huber, B. A.; Marinković, B. P.; Mason, N.; Postler, J.; Rabus, H.; Rixon, G.; Solov'yov, A. V.; Suraud, E.; Yakubovich, A. V.

2013-06-01

Ion beam therapy offers the possibility of excellent dose localization for the treatment of malignant tumours, minimizing radiation damage in normal tissue while maximizing cell killing within the tumour. However, because the underlying physical, chemical and biological processes are too complex to treat on a purely analytical level, most of our current and future understanding will rely on computer simulations based on mathematical equations, algorithms and, last but not least, the available atomic and molecular data. The validity of the simulated output, and the success of any computer simulation, will be determined by these data, which serve as the input variables of each simulation. The radiation research community lacks a complete database of cross sections for all the different processes involved in ion-beam-induced damage: ionization and excitation cross sections for ions in liquid water and biological molecules, all possible electron-medium interactions, dielectric response data, electron attachment to biomolecules, etc. In this paper we discuss current progress in the creation of such a database, outline the roadmap of the project and review plans for exploiting such a database in future simulations.
HBVPathDB: a database of HBV infection-related molecular interaction network.

PubMed

Zhang, Yi; Bo, Xiao-Chen; Yang, Jing; Wang, Sheng-Qi

2005-03-21

This study aimed to describe the interactions between the molecules and genes of hepatitis B virus (HBV) and its host, in order to understand how viral and host genes and molecules are networked to form a biological system and to elucidate the mechanism of HBV infection. Knowledge of HBV infection-related reactions was organized into various kinds of pathways with carefully drawn graphs in HBVPathDB. Pathway information is stored in a relational database management system (DBMS), which is currently the most efficient way to manage large amounts of data, and queries are implemented in Structured Query Language (SQL). The search engine is written in PHP (Personal Home Page) with embedded SQL, and the web retrieval interface is built with Hypertext Markup Language (HTML). We present the first version of HBVPathDB, an HBV infection-related molecular interaction network database composed of 306 pathways involving 1,050 molecules. With carefully drawn graphs, the pathway information stored in HBVPathDB can be browsed intuitively. We developed an easy-to-use interface for flexible access to the details of the database, together with convenient software to query and browse its pathway information. The search engine supports four search page layouts: category search, gene search, description search and unitized search. The database is freely available at http://www.bio-inf.net/HBVPathDB/HBV/. HBVPathDB already contains a considerable amount of HBV infection-related pathway information, suitable for in-depth analysis of the virus-host molecular interaction network. HBVPathDB integrates pathway datasets with convenient software for querying, browsing and visualization, giving users more opportunities to identify key regulatory molecules as potential drug targets and to explore possible mechanisms of HBV infection based on gene expression datasets.
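A relational layout for pathway data and a "gene search" query can be sketched as follows. This is an illustration only: the table names, columns and placeholder pathway and gene identifiers are hypothetical (the abstract does not describe HBVPathDB's schema), and Python's `sqlite3` stands in for the PHP front end and DBMS actually used.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE pathway (
            pathway_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            category   TEXT NOT NULL
        );
        CREATE TABLE molecule (
            molecule_id INTEGER PRIMARY KEY,
            gene_symbol TEXT NOT NULL UNIQUE
        );
        -- many-to-many link: a molecule can take part in several pathways
        CREATE TABLE pathway_molecule (
            pathway_id  INTEGER NOT NULL REFERENCES pathway (pathway_id),
            molecule_id INTEGER NOT NULL REFERENCES molecule (molecule_id),
            PRIMARY KEY (pathway_id, molecule_id)
        );
    """)
    conn.executemany("INSERT INTO pathway VALUES (?, ?, ?)",
                     [(1, "pathway_A", "category_1"), (2, "pathway_B", "category_2")])
    conn.executemany("INSERT INTO molecule VALUES (?, ?)",
                     [(1, "GENE1"), (2, "GENE2"), (3, "GENE3")])
    conn.executemany("INSERT INTO pathway_molecule VALUES (?, ?)",
                     [(1, 1), (1, 2), (2, 2), (2, 3)])
    return conn


def gene_search(conn: sqlite3.Connection, gene_symbol: str) -> list[str]:
    """Names of all pathways containing the given gene (parameterized query)."""
    rows = conn.execute(
        """SELECT p.name
             FROM pathway p
             JOIN pathway_molecule pm ON pm.pathway_id = p.pathway_id
             JOIN molecule m ON m.molecule_id = pm.molecule_id
            WHERE m.gene_symbol = ?
            ORDER BY p.name""",
        (gene_symbol,),
    )
    return [row[0] for row in rows]


conn = build_demo_db()
print(gene_search(conn, "GENE2"))  # ['pathway_A', 'pathway_B']
```

Passing the user's search term as a bound parameter (`?`) rather than splicing it into the SQL string is the usual way to keep a web search form safe from SQL injection.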
Advanced techniques in placental biology -- workshop report.

PubMed

Nelson, D M; Sadovsky, Y; Robinson, J M; Croy, B A; Rice, G; Kniss, D A

2006-04-01

Major advances in placental biology have been realized as new technologies have been developed and existing methods refined in many areas of biological research. Classical anatomy and whole-organ physiology tools once used to analyze placental structure and function have been supplanted by more sophisticated techniques adapted from molecular biology, proteomics, and computational biology and bioinformatics. In addition, significant refinements in the morphological study of the placenta and its constituent cell types have improved our ability to assess form and function in a highly integrated manner. To offer an overview of modern technologies used by investigators to study the placenta, the workshop "Advanced techniques in placental biology" assembled experts who discussed the fundamental principles and real-time examples of four separate methodologies. Y. Sadovsky presented the principles of microRNA function as an endogenous mechanism of gene regulation. J. Robinson demonstrated the utility of correlative microscopy, in which light-level and transmission electron microscopy are combined to provide cellular and subcellular views of placental cells. A. Croy lectured on microdissection techniques, which are invaluable for isolating very small subsets of cell types for molecular analysis. Finally, G. Rice presented an overview of methods for profiling complex protein mixtures within tissue and/or fluid samples that, when refined, will yield databases underpinning a systems approach to modern trophoblast biology.
Sialyldisaccharide conformations: a molecular dynamics perspective

NASA Astrophysics Data System (ADS)

Selvin, Jeyasigamani F. A.; Priyadarzini, Thanu R. K.; Veluraja, Kasinadar

2012-04-01

Sialyldisaccharides are significant terminal components of glycoconjugates, and their negative charge and conformation are extensively utilized in molecular recognition processes. The conformation and flexibility of four biologically important sialyldisaccharides [Neu5Acα(2-3)Gal, Neu5Acα(2-6)Gal, Neu5Acα(2-8)Neu5Ac and Neu5Acα(2-9)Neu5Ac] are studied using 20 ns molecular dynamics (MD) simulations to deduce their conformational preferences and the interactions that stabilize these conformations. This study describes the possible conformational models of sialyldisaccharides deduced from the 20 ns MD simulations, and our results confirm the role of water in the structural stabilization of sialyldisaccharides. An extensive analysis of the sialyldisaccharide structures available in the PDB also confirms that the conformational regions found by experiment are detected in the 20 ns MD simulations. The three-dimensional structural coordinates of all the MD-derived sialyldisaccharide conformations are deposited in the 3DSDSCAR database, and these conformational models will be useful for glycobiologists and biotechnologists seeking to understand the biological functions of sialic acid-containing glycoconjugates.
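Disaccharide conformations sampled by MD are commonly summarized by glycosidic torsion angles (usually written φ and ψ, with additional torsions about exocyclic bonds for linkages such as 2-6, 2-8 and 2-9), each defined by four consecutive atoms. The sketch below computes a signed torsion angle from four coordinates using the IUPAC sign convention. Which atoms define φ and ψ for each linkage follows the convention adopted by the authors, which the abstract does not state, so the atom choice is left to the caller.

```python
import math

Vec3 = tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def dihedral(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees, in [-180, 180]."""
    b1 = _sub(p2, p1)                                  # central bond p1 -> p2
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))     # unit vector along it
    b0 = _sub(p0, p1)
    b2 = _sub(p3, p2)
    # Project the outer bonds onto the plane perpendicular to the central bond
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 6))  # 90.0
```

Applying this to every saved MD frame gives a (φ, ψ) time series that can be histogrammed into the conformational regions discussed above.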
Effects of small particle numbers on long-term behaviour in discrete biochemical systems.

PubMed

Kreyssig, Peter; Wozar, Christian; Peter, Stephan; Veloz, Tomás; Ibrahim, Bashar; Dittrich, Peter

2014-09-01

The functioning of many biological processes depends on the presence of only a small number of molecules of a single species. Additionally, the observation of molecular crowding leads to the insight that even a high copy number of a species does not guarantee its interaction. How single particles contribute to stabilizing biological systems is not yet well understood. Hence, we aim to determine the influence of single molecules on the long-term behaviour of biological systems, i.e. whether they can reach a steady state. We provide theoretical considerations and a tool for analysing Systems Biology Markup Language models for their potential to stabilize because of these effects. The theory is an extension of chemical organization theory, which we call discrete chemical organization theory. Furthermore, we scanned the BioModels Database for the occurrence of discrete chemical organizations. To exemplify our method, we describe an application to the Template model of the mitotic spindle assembly checkpoint mechanism. Availability: http://www.biosys.uni-jena.de/Services.html. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
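In chemical organization theory, a set of species is an organization if it is closed (every reaction whose reactants all lie in the set produces only species in the set) and self-maintaining (some flux vector, positive on every reaction applicable within the set, leaves no species in the set with a negative net production rate). The sketch below covers only the closure part, with each reaction reduced to its reactant and product species sets. Checking self-maintenance requires a linear program, and the discrete extension described in the abstract, which accounts for small particle numbers, is not represented here.

```python
Reaction = tuple[frozenset[str], frozenset[str]]  # (reactant species, product species)


def is_closed(species: set[str], reactions: list[Reaction]) -> bool:
    """True if every reaction applicable within `species` produces only species in it."""
    return all(products <= species
               for reactants, products in reactions
               if reactants <= species)


def closure(species: set[str], reactions: list[Reaction]) -> frozenset[str]:
    """Smallest closed set containing `species`: add products until nothing changes."""
    closed = set(species)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            if reactants <= closed and not products <= closed:
                closed |= products
                changed = True
    return frozenset(closed)


# Toy network (hypothetical): a -> b, b + c -> d
reactions = [
    (frozenset({"a"}), frozenset({"b"})),
    (frozenset({"b", "c"}), frozenset({"d"})),
]
print(sorted(closure({"a", "c"}, reactions)))  # ['a', 'b', 'c', 'd']
```

Reactions with an empty reactant set (inflows) are applicable to every set, so their products always end up in the closure.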
Using an international p53 mutation database as a foundation for an online laboratory in an upper level undergraduate biology class.

PubMed

Melloy, Patricia G

2015-01-01

A two-part laboratory exercise was developed to enhance classroom instruction on the significance of p53 mutations in cancer development. Students were asked to mine key information from an international database of p53 genetic changes related to cancer, the IARC TP53 database. Using this database, students designed several data mining activities to look at the changes in the p53 gene from a number of perspectives, including potential cancer-causing agents leading to particular changes and the prevalence of certain p53 variations in certain cancers. In addition, students gained a global perspective on cancer prevalence in different parts of the world. Students learned how to use the database in the first part of the exercise, and then used that knowledge to search particular cancers and cancer-causing agents of their choosing in the second part of the exercise. Students also connected the information gathered from the p53 exercise to a previous laboratory exercise looking at risk factors for cancer development. The goal of the experience was to increase student knowledge of the link between p53 genetic variation and cancer. Students also were able to walk a similar path through the website as a cancer researcher using the database to enhance bench work-based experiments with complementary large-scale database p53 variation information. © 2014 The International Union of Biochemistry and Molecular Biology.
The salinity tolerant poplar database (STPD): a comprehensive database for studying tree salt-tolerant adaption and poplar genomics.

PubMed

Ma, Yazhen; Xu, Ting; Wan, Dongshi; Ma, Tao; Shi, Sheng; Liu, Jianquan; Hu, Quanjun

2015-03-17

Soil salinity is a significant factor that impairs plant growth and agricultural productivity, and numerous efforts are underway to enhance the salt tolerance of economically important plants. Populus species are widely cultivated for diverse uses. Notably, they grow in different habitats, from salty soils to mesophytic environments, and are therefore used as a model genus for elucidating the physiological and molecular mechanisms of stress tolerance in woody plants. The Salinity Tolerant Poplar Database (STPD) is an integrative database for salt-tolerant poplar genome biology. Currently, the STPD contains the Populus euphratica genome and its related genetic resources. P. euphratica, which prefers salty habitats, has become a valuable genetic resource for exploiting tolerance characteristics in trees. This database contains curated data including the genomic sequence, genes and gene functional information, non-coding RNA sequences, transposable elements, simple sequence repeats and single nucleotide polymorphism information for P. euphratica, gene expression data for P. euphratica and Populus tomentosa, and whole-genome alignments between Populus trichocarpa, P. euphratica and Salix suchowensis. The STPD provides useful search and data mining tools, including the GBrowse genome browser, BLAST servers and a genome alignment viewer, which can be used to browse genome regions, identify similar sequences and visualize genome alignments. Datasets within the STPD can also be downloaded for local searches. This new Salinity Tolerant Poplar Database has been developed to assist studies of salt tolerance in trees and of poplar genomics. The database will be continuously updated to incorporate new genome-wide data from related poplar species. It will serve as infrastructure for research on the molecular function of genes, comparative genomics, and evolution in closely related species, and will promote advances in molecular breeding within Populus. 
The STPD can be accessed at http://me.lzu.edu.cn/stpd/ .
Database resources of the National Center for Biotechnology Information: 2002 update

PubMed Central

Wheeler, David L.; Church, Deanna M.; Lash, Alex E.; Leipe, Detlef D.; Madden, Thomas L.; Pontius, Joan U.; Schuler, Gregory D.; Schriml, Lynn M.; Tatusova, Tatiana A.; Wagner, Lukas; Rapp, Barbara A.

2002-01-01

In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources that operate on the data in GenBank and a variety of other biological data made available through NCBI’s web site. NCBI data retrieval resources include Entrez, PubMed, LocusLink and the Taxonomy Browser. Data analysis resources include BLAST, Electronic PCR, OrfFinder, RefSeq, UniGene, HomoloGene, Database of Single Nucleotide Polymorphisms (dbSNP), Human Genome Sequencing, Human MapViewer, Human-Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB) and the Conserved Domain Database (CDD). Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at http://www.ncbi.nlm.nih.gov. PMID:11752242
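Besides the web pages, Entrez databases can be queried programmatically through NCBI's E-utilities. The sketch below only builds an ESearch request URL and sends nothing; the endpoint reflects the current service rather than anything described in this 2002 update, and NCBI's usage guidelines (rate limits, optional API key) apply once requests are actually made.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch request URL; the query is URL-encoded, nothing is sent."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


print(esearch_url("pubmed", "p53 cancer", retmax=5))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=p53+cancer&retmax=5
```

The response lists matching record identifiers, which can then be fetched with the companion EFetch utility.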

The EMBL nucleotide sequence database

PubMed Central

Stoesser, Guenter; Baker, Wendy; van den Broek, Alexandra; Camon, Evelyn; Garcia-Pastor, Maria; Kanz, Carola; Kulikova, Tamara; Lombard, Vincent; Lopez, Rodrigo; Parkinson, Helen; Redaschi, Nicole; Sterk, Peter; Stoehr, Peter; Tuli, Mary Ann

2001-01-01

The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank at the NCBI (USA). Data is exchanged amongst the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via ftp, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many specialized databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. PMID:11125039
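Entries in the database are distributed as flat files in which each line starts with a two-letter line code (ID, AC, DE, SQ and so on), the sequence follows the SQ line, and `//` ends a record. The sketch below is a minimal parser for just those line types, run on a hypothetical record. The ID line layout has changed over the years, so only its first token is kept. Real entries carry many more line types (for example, the FT feature table), and production code would normally use an established parser such as Biopython's `SeqIO` with the "embl" format.

```python
def parse_embl(text: str) -> list[dict]:
    """Parse ID, AC, DE and sequence data from EMBL flat-file records."""
    records = []
    current = None
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):                      # end of record
            if current is not None:
                current["description"] = " ".join(current["description"])
                current["sequence"] = "".join(current["sequence"])
                records.append(current)
            current, in_sequence = None, False
            continue
        if not line.strip():
            continue
        if current is None:
            current = {"id": None, "accessions": [], "description": [], "sequence": []}
        code, data = line[:2], line[5:].strip()
        if in_sequence:
            # Sequence lines hold blocks of 10 bases plus a running position count
            current["sequence"].append("".join(ch for ch in line if ch.isalpha()))
        elif code == "ID":
            current["id"] = data.split()[0].rstrip(";")
        elif code == "AC":
            current["accessions"].extend(a.strip() for a in data.split(";") if a.strip())
        elif code == "DE":
            current["description"].append(data)
        elif code == "SQ":
            in_sequence = True
    return records


SAMPLE = """\
ID   EX000001; SV 1; linear; mRNA; STD; PLN; 20 BP.
AC   EX000001; EX000002;
DE   Hypothetical example record
SQ   Sequence 20 BP; 5 A; 5 C; 5 G; 5 T; 0 other;
     acgtacgtac gtacgtacgt                                               20
//
"""

print(parse_embl(SAMPLE)[0]["sequence"])  # acgtacgtacgtacgtacgt
```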
Systems Biology Approaches for Discovering Biomarkers for Traumatic Brain Injury

PubMed Central

Feala, Jacob D.; AbdulHameed, Mohamed Diwan M.; Yu, Chenggang; Dutta, Bhaskar; Yu, Xueping; Schmid, Kara; Dave, Jitendra; Tortella, Frank

2013-01-01

The rate of traumatic brain injury (TBI) in service members with wartime injuries has risen rapidly in recent years, and complex, variable links have emerged between TBI and long-term neurological disorders. The multifactorial nature of the secondary cellular response to TBI has confounded attempts to find cellular biomarkers for its diagnosis and prognosis or for guiding therapy for brain injury. One possibility is to apply emerging systems biology strategies to holistically probe and analyze the complex, interwoven molecular pathways and networks that mediate the secondary cellular response, using computational models that integrate diverse data sets. Here, we review available systems biology strategies, databases, and tools. In addition, we describe opportunities for applying this methodology to existing TBI data sets to identify new biomarker candidates and gain insights into the underlying molecular mechanisms of the TBI response. As an exemplar, we apply network and pathway analysis to a manually compiled list of 32 protein biomarker candidates from the literature, recover known TBI-related mechanisms, and generate hypothetical new biomarker candidates. PMID:23510232
Non-Metastatic Cutaneous Melanoma Induces Chronodisruption in Central and Peripheral Circadian Clocks.

PubMed

de Assis, Leonardo Vinícius Monteiro; Moraes, Maria Nathália; Magalhães-Marques, Keila Karoline; Kinker, Gabriela Sarti; da Silveira Cruz-Machado, Sanseray; Castrucci, Ana Maria de Lauro

2018-04-03

The biological clock has received increasing interest owing to its key role in regulating body homeostasis in a time-dependent manner. Cancer development and progression have been linked to a disrupted molecular clock; however, the role of the biological clock in melanoma is largely unknown. We investigated the effects of the tumor on its micro- (TME) and macro-environments (TMaE) in a non-metastatic melanoma model. C57BL/6J mice were inoculated with murine B16-F10 melanoma cells, and 2 weeks later the animals were euthanized every 6 h over 24 h. The presence of a localized tumor significantly impaired the biological clock of tumor-adjacent skin and affected the oscillatory expression of genes involved in light- and thermo-reception, proliferation, melanogenesis, and DNA repair. Expression of the tumor molecular clock was significantly reduced compared with healthy skin but still displayed an oscillatory profile. Using a human database, we were able to cluster the affected genes and distinguish between primary melanoma and healthy skin. The molecular clocks of the lungs and liver (common sites of metastasis) and of the suprachiasmatic nucleus (SCN) were significantly affected by the presence of the tumor, leading to chronodisruption in each organ. Taken together, the presence of non-metastatic melanoma significantly impairs the organism's biological clocks. We suggest that the clock alterations found in the TME and TMaE could affect the development, progression, and metastasis of melanoma, making the molecular clock an interesting pharmacological target.
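A common way to quantify oscillatory expression sampled every 6 h over 24 h is a single-component cosinor fit, y(t) = M + A cos(2πt/24 + φ), which gives a mean level (MESOR) M, an amplitude A and a peak time. The abstract does not say which rhythm analysis the authors used, so this is an illustration of the general technique rather than their method. The sketch assumes equally spaced time points covering whole periods with at least three points per cycle, in which case the least-squares estimates reduce to simple sums; the expression values in the usage line are made up.

```python
import math


def cosinor(times_h: list[float], values: list[float], period_h: float = 24.0):
    """Fit y = M + A*cos(2*pi*t/period + phi); return (M, A, peak_time_h).

    Assumes equally spaced sampling covering whole periods (e.g. every 6 h
    over 24 h), so the cosine and sine regressors are orthogonal and the
    least-squares estimates reduce to simple sums.
    """
    n = len(values)
    w = 2 * math.pi / period_h
    mesor = sum(values) / n
    beta = 2 / n * sum(y * math.cos(w * t) for t, y in zip(times_h, values))
    gamma = 2 / n * sum(y * math.sin(w * t) for t, y in zip(times_h, values))
    amplitude = math.hypot(beta, gamma)
    phi = math.atan2(-gamma, beta)          # acrophase in radians
    peak_time_h = (-phi / w) % period_h     # time at which the fitted curve peaks
    return mesor, amplitude, peak_time_h


# Hypothetical relative expression values at 0, 6, 12 and 18 h
print(cosinor([0, 6, 12, 18], [5.0, 7.0, 5.0, 3.0]))
```

For these values the fit gives a MESOR of 5, an amplitude of 2 and a peak near 6 h; a reduced amplitude relative to control tissue would be one way to express the dampened tumor clock described above.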
Property Graph vs RDF Triple Store: A Comparison on Glycan Substructure Search

PubMed Central

Alocci, Davide; Mariethoz, Julien; Horlacher, Oliver; Bolleman, Jerven T.; Campbell, Matthew P.; Lisacek, Frederique

2015-01-01

Resource Description Framework (RDF) and Property Graph databases are emerging technologies used for storing graph-structured data. We compare these technologies through a molecular biology use case: glycan substructure search. Glycans are branched, tree-like molecules composed of building blocks linked together by chemical bonds. The molecular structure of a glycan can be encoded as a directed acyclic graph in which each node represents a building block and each edge represents a chemical linkage between two building blocks. In this context, graph databases are possible software solutions for storing glycan structures, and graph query languages such as SPARQL and Cypher can be used to perform substructure searches. Glycan substructure searching is an important feature for querying structural and experimental glycan databases and retrieving biologically meaningful data. This applies, for example, to identifying the region of a glycan recognised by a glycan-binding protein (GBP). In this study, 19,404 glycan structures were selected from GlycomeDB (www.glycome-db.org) and modelled for storage in an RDF triple store and in a Property Graph. We then performed two different sets of searches and compared the query response times and the results from both technologies to assess performance and accuracy. The two implementations produced the same results, but interestingly, we noted a difference in query response times. Qualitative measures such as portability were also used to define further criteria for choosing the technology best suited to glycan substructure search and other comparable problems. PMID:26656740
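SPARQL and Cypher express substructure search declaratively as graph patterns. The plain-Python sketch below shows the underlying matching logic for glycans modelled as rooted trees whose edges carry linkage labels. It is an illustration, not the queries used in the study: a query matches if it can be embedded at some residue of the target, with each query child mapped to a distinct target child of the same residue name and linkage. Extra branches in the target are allowed.

```python
from dataclasses import dataclass, field


@dataclass
class Residue:
    name: str                                                   # e.g. "GlcNAc", "Man"
    children: list[tuple[str, "Residue"]] = field(default_factory=list)  # (linkage, child)


def _match_at(q: Residue, t: Residue) -> bool:
    """Does query tree q embed in target tree t with q's root on t's root?"""
    return q.name == t.name and _assign(q.children, t.children, frozenset())


def _assign(q_children, t_children, used: frozenset) -> bool:
    """Map each query child to a distinct, compatible target child (backtracking)."""
    if not q_children:
        return True
    (q_link, q_child), rest = q_children[0], q_children[1:]
    for i, (t_link, t_child) in enumerate(t_children):
        if i not in used and q_link == t_link and _match_at(q_child, t_child):
            if _assign(rest, t_children, used | {i}):
                return True
    return False


def contains_substructure(target: Residue, query: Residue) -> bool:
    """Try to anchor the query at every residue of the target."""
    stack = [target]
    while stack:
        node = stack.pop()
        if _match_at(query, node):
            return True
        stack.extend(child for _, child in node.children)
    return False


# N-glycan core: Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc
core = Residue("GlcNAc", [("b1-4", Residue("GlcNAc", [("b1-4", Residue("Man", [
    ("a1-3", Residue("Man")), ("a1-6", Residue("Man"))]))]))])
print(contains_substructure(core, Residue("Man", [("a1-6", Residue("Man"))])))  # True
```

The distinct-child requirement matters: a query with two a1-3 branches must not match a residue that has only one.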
A Web interface generator for molecular biology programs in Unix.

PubMed

Letondal, C

2001-01-01

Almost all users encounter problems using sequence analysis programs. Not only are these programs difficult to learn because of their parameters, syntax and semantics, but many of them also differ from one another. That is why we have developed a Web interface generator for more than 150 command-line-driven molecular biology programs, including phylogeny, gene prediction, alignment, RNA, DNA and protein analysis, motif discovery, structure analysis and database searching programs. The generator uses XML as a high-level language for describing the parameters of legacy software. Its aim is to provide users with the equivalent of a basic Unix environment, with program combination, customization and basic scripting through macro registration. The program has been used for three years by about 15,000 users throughout the world; it has recently been installed at other sites and evaluated as a standard user interface for EMBOSS programs.
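The generator's core idea, describing a program's parameters once in XML and deriving the web form from that description, can be sketched as follows. The element and attribute names (`program`, `parameter`, `type`, `default`, `prompt`) and the program name are invented for illustration and are not the generator's actual XML format.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical description format (element and attribute names are invented)
SAMPLE = """
<program name="example_aligner" prompt="Example pairwise aligner">
  <parameter name="seqfile" type="string" prompt="Input sequence file"/>
  <parameter name="gapopen" type="float" default="10.0" prompt="Gap opening penalty"/>
  <parameter name="verbose" type="boolean" default="false" prompt="Verbose output"/>
</program>
"""


def xml_to_form(xml_text: str, action: str = "/run") -> str:
    """Turn a parameter description into an HTML form (one field per parameter)."""
    program = ET.fromstring(xml_text.strip())
    lines = [
        f'<form method="post" action="{escape(action)}">',
        f'  <h2>{escape(program.get("prompt", program.get("name", "")))}</h2>',
        f'  <input type="hidden" name="program" value="{escape(program.get("name", ""))}">',
    ]
    for param in program.findall("parameter"):
        raw_name = param.get("name", "")
        name = escape(raw_name)
        prompt = escape(param.get("prompt", raw_name))
        default = param.get("default", "")
        kind = param.get("type", "string")
        if kind == "boolean":
            checked = " checked" if default == "true" else ""
            widget = f'<input type="checkbox" name="{name}" value="true"{checked}>'
        elif kind in ("integer", "float"):
            step = "1" if kind == "integer" else "any"
            widget = f'<input type="number" step="{step}" name="{name}" value="{escape(default)}">'
        else:
            widget = f'<input type="text" name="{name}" value="{escape(default)}">'
        lines.append(f"  <label>{prompt} {widget}</label>")
    lines.append('  <input type="submit" value="Run">')
    lines.append("</form>")
    return "\n".join(lines)


print(xml_to_form(SAMPLE))
```

Keeping the description separate from the rendering means the same XML could also drive command-line construction or parameter validation on the server side.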
Bioinformatics and molecular modeling in glycobiology

PubMed Central

Schloissnig, Siegfried

2010-01-01

The field of glycobiology is concerned with the study of the structure, properties, and biological functions of the family of biomolecules called carbohydrates. Bioinformatics for glycobiology is a particularly challenging field, because carbohydrates exhibit high structural diversity and their chains are often branched. Significant improvements in experimental analytical methods in recent years have led to a tremendous increase in the amount of carbohydrate structure data generated. Consequently, the availability of databases and tools to store, retrieve and analyze these data efficiently is of fundamental importance to progress in glycobiology. In this review, the various graphical representations and sequence formats of carbohydrates are introduced, and an overview is presented of newly developed databases, the latest developments in sequence alignment and data mining, and tools to support experimental glycan analysis. Finally, structural glycoinformatics and the molecular modeling of carbohydrates, glycoproteins, and protein–carbohydrate interactions are reviewed. PMID:20364395
Knowledge environments representing molecular entities for the virtual physiological human.

PubMed

Hofmann-Apitius, Martin; Fluck, Juliane; Furlong, Laura; Fornes, Oriol; Kolárik, Corinna; Hanser, Susanne; Boeker, Martin; Schulz, Stefan; Sanz, Ferran; Klinger, Roman; Mevissen, Theo; Gattermayer, Tobias; Oliva, Baldo; Friedrich, Christoph M

2008-09-13

In essence, the virtual physiological human (VPH) is a multiscale representation of human physiology spanning from the molecular level via cellular processes and multicellular organization of tissues to complex organ function. The different scales of the VPH deal with different entities, relationships and processes, and in consequence the models used to describe and simulate biological functions vary significantly. Here, we describe methods and strategies to generate knowledge environments representing molecular entities that can be used for modelling the molecular scale of the VPH. Our strategy to generate knowledge environments representing molecular entities is based on the combination of information extraction from scientific text and the integration of information from biomolecular databases. We introduce @neuLink, a first prototype of an automatically generated, disease-specific knowledge environment combining biomolecular, chemical, genetic and medical information. Finally, we provide a perspective for the future implementation and use of knowledge environments representing molecular entities for the VPH.
Workflow based framework for life science informatics.

PubMed

Tiwari, Abhishek; Sekhar, Arvind K T

2007-10-01

Workflow technology is a generic mechanism for integrating diverse types of available resources (databases, servers, software applications and different services), which facilitates knowledge exchange among traditionally divergent fields such as molecular biology, clinical research, computational science, physics, chemistry and statistics. Researchers can easily incorporate and access diverse, distributed tools and data to develop their own research protocols for scientific analysis. Applications of workflow technology have been reported in areas such as drug discovery, genomics, large-scale gene expression analysis, proteomics, and systems biology. In this article, we discuss existing workflow systems and trends in the application of workflow-based systems.
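At its core, a workflow system runs each step only after the steps it depends on, passing results along. The sketch below shows that idea with the standard-library `graphlib` module (Python 3.9+). The steps are hypothetical toy functions standing in for database queries or analysis tools; real workflow systems add distributed execution, provenance tracking, retries and service integration.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable


def run_workflow(steps: dict[str, Callable[..., Any]],
                 deps: dict[str, list[str]]) -> dict[str, Any]:
    """Run each step after its dependencies, passing their outputs as keyword arguments."""
    unknown = {d for ds in deps.values() for d in ds} - set(steps)
    if unknown:
        raise ValueError(f"undefined dependencies: {sorted(unknown)}")
    graph = {name: deps.get(name, []) for name in steps}
    results: dict[str, Any] = {}
    # static_order() yields each step after all of its predecessors
    # (graphlib.CycleError is raised if the dependencies contain a cycle)
    for name in TopologicalSorter(graph).static_order():
        inputs = {dep: results[dep] for dep in deps.get(name, [])}
        results[name] = steps[name](**inputs)
    return results


# Hypothetical toy steps: "fetch" stands in for a database query
steps = {
    "fetch": lambda: "ATGGCC",
    "gc": lambda fetch: (fetch.count("G") + fetch.count("C")) / len(fetch),
    "length": lambda fetch: len(fetch),
    "report": lambda gc, length: f"length={length} gc={gc:.2f}",
}
deps = {"gc": ["fetch"], "length": ["fetch"], "report": ["gc", "length"]}
print(run_workflow(steps, deps)["report"])  # length=6 gc=0.67
```

Because "gc" and "length" depend only on "fetch", a parallel executor could run them concurrently; the sequential loop here simply picks one valid order.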
An interactive web-tool for molecular analyses links naturally occurring mutation data with three-dimensional structures of the rhodopsin-like glycoprotein hormone receptors.

PubMed

Kleinau, Gunnar; Kreuchwig, Annika; Worth, Catherine L; Krause, Gerd

2010-06-01

The collection, description and molecular analysis of naturally occurring (pathogenic) mutations are important for understanding the functional mechanisms and malfunctions of biological units such as proteins. Numerous databases collate a huge amount of functional data or descriptions of mutations, but tools to analyse the molecular effects of genetic variations are still scarce. The goal of this work was therefore to develop a translational web application that facilitates the interactive linkage of functional and structural data and helps improve our understanding of the molecular basis of naturally occurring gain- or loss-of-function mutations. Here we focus on the human glycoprotein hormone receptors (GPHRs), for which a huge number of disease-causing mutations are known. We describe new options for interactive data analysis within three-dimensional structures, which enable the assignment of molecular relationships between structure and function. Notably, because the functional data are converted into relational percentage values, the system allows the comparison and classification of data from different GPHR subtypes and different experimental approaches. Our new application has been incorporated into a freely available database and website for the GPHRs (http://www.ssfa-gphr.de), but the underlying approach would also be applicable to other macromolecules.
Freshwater Biological Traits Database (Final Report)

EPA Science Inventory

EPA announced the release of the final report, Freshwater Biological Traits Database. This report discusses the development of a database of freshwater biological traits. The database combines several existing traits databases into an online format. The database is also...
Three-Dimensional Biologically Relevant Spectrum (BRS-3D): Shape Similarity Profile Based on PDB Ligands as Molecular Descriptors.

PubMed

Hu, Ben; Kuang, Zheng-Kun; Feng, Shi-Yu; Wang, Dong; He, Song-Bing; Kong, De-Xin

2016-11-17

The crystallized ligands in the Protein Data Bank (PDB) can be treated as the inverse shapes of the active sites of the corresponding proteins. Therefore, the shape similarity between a molecule and PDB ligands indicates how likely the molecule is to bind the corresponding targets. In this paper, we propose a shape similarity profile that can be used as a molecular descriptor for ligand-based virtual screening. First, through three-dimensional (3D) structural clustering, 300 diverse ligands were extracted from the druggable protein-ligand database sc-PDB. Then, each molecule under scrutiny was flexibly superimposed onto the 300 ligands. Superimpositions were scored by shape overlap and property similarity, producing a 300-dimensional similarity array termed the "Three-Dimensional Biologically Relevant Spectrum (BRS-3D)". Finally, quantitative or discriminant models were developed from the 300-dimensional descriptor using a machine learning method (support vector machine). The effectiveness of this approach was evaluated using 42 benchmark data sets from the G protein-coupled receptor (GPCR) ligand library and the GPCR decoy database (GLL/GDD). We compared the performance of BRS-3D with other state-of-the-art 2D and 3D molecular descriptors. The results showed that models built with BRS-3D performed best on most GLL/GDD data sets. We also applied BRS-3D to histone deacetylase 1 inhibitor screening and GPCR subtype selectivity prediction. The advantages and disadvantages of this approach are discussed.
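To make the descriptor concrete, the sketch below builds a similarity profile from precomputed superimpositions. It assumes that an external alignment tool (not shown) has already provided, for each reference ligand, the reference volume, the overlap volume with the query, and a property similarity in [0, 1]. Shape is scored with the common shape Tanimoto, and the two terms are combined with an equal, hypothetical weighting; the BRS-3D authors' exact scoring may differ. With 300 reference ligands the result would be a 300-dimensional vector suitable as input to a classifier such as an SVM.

```python
from typing import List, Sequence, Tuple

def shape_tanimoto(vol_query: float, vol_ref: float, vol_overlap: float) -> float:
    """Shape Tanimoto: overlap / (V_query + V_ref - overlap)."""
    union = vol_query + vol_ref - vol_overlap
    return vol_overlap / union if union > 0 else 0.0

def similarity_profile(
    vol_query: float,
    references: Sequence[Tuple[float, float, float]],
    shape_weight: float = 0.5,  # hypothetical equal weighting of shape and properties
) -> List[float]:
    """One score per reference: (ref volume, overlap volume, property similarity)."""
    profile = []
    for vol_ref, vol_overlap, property_sim in references:
        shape = shape_tanimoto(vol_query, vol_ref, vol_overlap)
        profile.append(shape_weight * shape + (1.0 - shape_weight) * property_sim)
    return profile

# Hypothetical alignments of one query molecule against three reference ligands.
refs = [(100.0, 100.0, 1.0), (100.0, 50.0, 0.0), (300.0, 80.0, 0.4)]
print(similarity_profile(100.0, refs))
```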
NCBI-compliant genome submissions: tips and tricks to save time and money.

PubMed

Pirovano, Walter; Boetzer, Marten; Derks, Martijn F L; Smit, Sandra

2017-03-01

Genome sequences nowadays play a central role in molecular biology and bioinformatics. These sequences are shared with the scientific community through sequence databases. The sequence repositories of the International Nucleotide Sequence Database Collaboration (INSDC, comprising GenBank, ENA and DDBJ) are the largest in the world. Preparing an annotated sequence in such a way that it will be accepted by the database is challenging because many validation criteria apply. In our opinion, it is an undesirable situation that researchers who want to submit their sequence need either a lot of experience or help from partners to get the job done. To save valuable time and money, we list a number of recommendations for people who want to submit an annotated genome to a sequence database, as well as for tool developers, who could help to ease the process. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
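Some rejections can be caught before upload with simple local checks. The sketch below flags a few illustrative problems: non-IUPAC characters, leading or trailing Ns, and very short sequences. The 200 bp minimum is an assumed default for illustration only; the authoritative rules are NCBI's own submission guidelines and validation tools, which check far more than this.

```python
from typing import List

# IUPAC nucleotide codes for DNA (ambiguity codes included, no U).
IUPAC_DNA = set("ACGTRYSWKMBDHVN")

def presubmission_issues(seq_id: str, seq: str, min_length: int = 200) -> List[str]:
    """Return human-readable problems found in one sequence (empty list if none).

    min_length=200 is an assumed illustrative threshold; check the current
    NCBI submission guidelines for the rules that actually apply.
    """
    issues = []
    s = seq.upper()
    bad = sorted(set(s) - IUPAC_DNA)
    if bad:
        issues.append(f"{seq_id}: non-IUPAC characters {''.join(bad)}")
    if s.startswith("N") or s.endswith("N"):
        issues.append(f"{seq_id}: sequence begins or ends with N")
    if len(s) < min_length:
        issues.append(f"{seq_id}: length {len(s)} is below {min_length}")
    return issues

print(presubmission_issues("contig_1", "NNACGTX" + "A" * 300))
```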
A computational chemistry perspective on the current status and future direction of hepatitis B antiviral drug discovery.

PubMed

Morgnanesi, Dante; Heinrichs, Eric J; Mele, Anthony R; Wilkinson, Sean; Zhou, Suzanne; Kulp, John L

2015-11-01

Computational chemical biology, applied to research on hepatitis B virus (HBV), has two major branches: bioinformatics (statistical models) and first-principles methods (molecular physics). While bioinformatics focuses on statistical tools and biological databases, molecular physics uses mathematics and chemical theory to study the interactions of biomolecules. The three computational techniques most commonly used in HBV research are homology modeling, molecular docking, and molecular dynamics. Homology modeling is a computational method for predicting protein structure; it has been used to construct conformers of the viral polymerase (reverse transcriptase domain and RNase H domain) and the HBV X protein. Molecular docking is used to predict the most likely orientation of a ligand bound to a protein and to compute an energy score for the docked conformation. Molecular dynamics is a simulation that follows biomolecular motions over time to determine conformational and stability patterns. All of these modeling techniques have aided understanding of how resistance mutations affect the binding of HBV non-nucleos(t)ide reverse-transcriptase inhibitors. Finally, bioinformatics can be used to study the DNA, RNA and protein sequences of viruses, both to analyze drug resistance and to genotype viral genomes. Overall, with these and other techniques, computational chemical biology is becoming increasingly necessary in hepatitis B research. This article forms part of a symposium in Antiviral Research on "An unfinished story: from the discovery of the Australia antigen to the development of new curative therapies for hepatitis B." Copyright © 2015 Elsevier B.V. All rights reserved.
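To illustrate what "simulating motion" means in molecular dynamics, the toy sketch below integrates Newton's equations for a single one-dimensional harmonic "bond" with the velocity Verlet scheme, an integrator widely used in MD codes. Mass, force constant and time step are arbitrary; real simulations use full force fields and many interacting atoms.

```python
import math
from typing import Callable, List, Tuple

def velocity_verlet(x: float, v: float, force: Callable[[float], float],
                    mass: float, dt: float, n_steps: int) -> List[Tuple[float, float]]:
    """Integrate one particle in 1D; returns (position, velocity) at each step."""
    a = force(x) / mass
    trajectory = [(x, v)]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity from averaged acceleration
        a = a_new
        trajectory.append((x, v))
    return trajectory

K_BOND = 1.0  # arbitrary force constant

def harmonic_force(x: float) -> float:
    return -K_BOND * x

dt = 0.01
n_steps = round(2 * math.pi / dt)  # roughly one oscillation period (m = k = 1)
traj = velocity_verlet(1.0, 0.0, harmonic_force, mass=1.0, dt=dt, n_steps=n_steps)
x_end, v_end = traj[-1]
energy_end = 0.5 * v_end ** 2 + 0.5 * K_BOND * x_end ** 2
print(x_end, energy_end)
```

After one period the particle returns close to its starting position and the total energy stays close to its initial value of 0.5, the stability property that makes velocity Verlet popular for long simulations.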
Teaching the extracellular matrix and introducing online databases within a multidisciplinary course with i-cell-MATRIX: A student-centered approach.

PubMed

Sousa, João Carlos; Costa, Manuel João; Palha, Joana Almeida

2010-03-01

The biochemistry and molecular biology of the extracellular matrix (ECM) are difficult to convey to students in a classroom setting in ways that capture their interest. Understanding the matrix's roles in physiological and pathological conditions will presumably be hampered by insufficient knowledge of its molecular structure. Internet-available resources can bridge the divide between the molecular details and the ECM's biological properties and associated processes. This article presents an approach to teaching the ECM, developed for first-year medical undergraduates who, working in teams: (i) explore a specific molecular component of the matrix, (ii) identify a disease in which the component is implicated, (iii) investigate how the component's structure/function contributes to the ECM's supramolecular organization in physiological and pathological conditions, and (iv) share their findings with colleagues. The approach, designated i-cell-MATRIX, focuses on the contribution of individual components to the overall organization and biological functions of the ECM. i-cell-MATRIX is student centered and uses 5 hours of class time. Summary of results and take home message: A "1-minute paper" has been used to gather student feedback on the impact of i-cell-MATRIX. Qualitative analysis of student feedback gathered in three consecutive years revealed that students appreciate the approach's reliance on self-directed learning, its embedded interactivity and its demand for deeper insights into the ECM. Learning how to use internet biomedical resources is another positive outcome. Ninety percent of students recommend the activity for subsequent years. i-cell-MATRIX can be adapted by other medical schools that may be looking for an approach that achieves higher student engagement with the ECM. Copyright © 2010 International Union of Biochemistry and Molecular Biology, Inc.
dbAMEPNI: a database of alanine mutagenic effects for protein–nucleic acid interactions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liu, Ling; Xiong, Yi; Gao, Hongyun

Protein–nucleic acid interactions play essential roles in various biological activities such as gene regulation, transcription, DNA repair and DNA packaging. Understanding the effects of amino acid substitutions on protein–nucleic acid binding affinities can help elucidate the molecular mechanism of protein–nucleic acid recognition. Until now, no comprehensive and up-to-date database of quantitative binding data on alanine mutagenic effects for protein–nucleic acid interactions has been publicly accessible. We therefore developed a new database of Alanine Mutagenic Effects for Protein-Nucleic Acid Interactions (dbAMEPNI). dbAMEPNI is a manually curated, literature-derived database comprising over 577 alanine mutagenic data points with experimentally determined binding affinities for protein–nucleic acid complexes. It contains several important parameters, such as the dissociation constant (Kd), Gibbs free energy change (ΔΔG), experimental conditions and structural parameters of mutant residues. In addition, the database provides an extended dataset of 282 single alanine mutations with only qualitative data (or descriptive effects) of thermodynamic information.
dbAMEPNI: a database of alanine mutagenic effects for protein–nucleic acid interactions

DOE PAGES

Liu, Ling; Xiong, Yi; Gao, Hongyun; ...

2018-04-02

Protein–nucleic acid interactions play essential roles in various biological activities such as gene regulation, transcription, DNA repair and DNA packaging. Understanding the effects of amino acid substitutions on protein–nucleic acid binding affinities can help elucidate the molecular mechanism of protein–nucleic acid recognition. Until now, no comprehensive and up-to-date database of quantitative binding data on alanine mutagenic effects for protein–nucleic acid interactions has been publicly accessible. We therefore developed a new database of Alanine Mutagenic Effects for Protein-Nucleic Acid Interactions (dbAMEPNI). dbAMEPNI is a manually curated, literature-derived database comprising over 577 alanine mutagenic data points with experimentally determined binding affinities for protein–nucleic acid complexes. It contains several important parameters, such as the dissociation constant (Kd), Gibbs free energy change (ΔΔG), experimental conditions and structural parameters of mutant residues. In addition, the database provides an extended dataset of 282 single alanine mutations with only qualitative data (or descriptive effects) of thermodynamic information.
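The ΔΔG values in such a database are related to the measured dissociation constants. Under one common sign convention, ΔΔG = ΔG(mutant) − ΔG(wild type) = RT ln(Kd,mutant / Kd,wild type), so a positive value means the alanine substitution weakens binding. The sketch below applies this relation at an assumed temperature of 298.15 K; the Kd values are hypothetical, and dbAMEPNI entries may use other temperatures or conventions.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)

def ddg_from_kd(kd_mutant: float, kd_wild_type: float,
                temperature_k: float = 298.15) -> float:
    """ddG = RT ln(Kd_mut / Kd_wt) in kcal/mol; positive means weaker binding.

    Both Kd values must be in the same units (their ratio is unitless).
    """
    if kd_mutant <= 0 or kd_wild_type <= 0:
        raise ValueError("dissociation constants must be positive")
    return R_KCAL * temperature_k * math.log(kd_mutant / kd_wild_type)

# Hypothetical example: the alanine mutant binds 10-fold more weakly than wild type.
print(round(ddg_from_kd(kd_mutant=500e-9, kd_wild_type=50e-9), 2))
```

A 10-fold loss in affinity corresponds to roughly 1.4 kcal/mol at room temperature, a handy rule of thumb when reading mutagenesis tables.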
GenBank.

PubMed

Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Wheeler, David L

2008-01-01

GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 260 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Molecular Biology Laboratory Nucleotide Sequence Database in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage: www.ncbi.nlm.nih.gov.
GenBank

PubMed Central

Benson, Dennis A.; Karsch-Mizrachi, Ilene; Lipman, David J.; Ostell, James; Wheeler, David L.

2008-01-01

GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 260 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Molecular Biology Laboratory Nucleotide Sequence Database in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage: www.ncbi.nlm.nih.gov PMID:18073190
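As an illustration of programmatic access, GenBank records can be retrieved through the NCBI E-utilities `efetch` endpoint. The sketch below only builds the request URL and makes no network call; the accession numbers are examples, and production use should follow NCBI's E-utilities usage guidelines (for example, identifying the calling tool and respecting request-rate limits).

```python
from typing import Sequence
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def efetch_url(accessions: Sequence[str], db: str = "nuccore",
               rettype: str = "fasta", retmode: str = "text") -> str:
    """Build an E-utilities efetch URL for one or more sequence accessions."""
    params = {
        "db": db,
        "id": ",".join(accessions),  # comma-separated list of identifiers
        "rettype": rettype,          # e.g. "fasta" or "gb" (GenBank flat file)
        "retmode": retmode,
    }
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)

print(efetch_url(["NM_000546", "NM_000059"], rettype="gb"))
```

The resulting URL can be opened with any HTTP client; keeping URL construction separate from the request makes the query easy to log and test.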
The Binding Database: data management and interface design.

PubMed

Chen, Xi; Lin, Yuhmei; Liu, Ming; Gilson, Michael K

2002-01-01

The large and growing body of experimental data on biomolecular binding is of enormous value in developing a deeper understanding of molecular biology, in developing new therapeutics, and in various molecular design applications. However, most of these data are found only in the published literature and are therefore difficult to access and use. No existing public database has focused on measured binding affinities and has provided query capabilities that include chemical structure and sequence homology searches. We have created Binding DataBase (BindingDB), a public, web-accessible database of measured binding affinities. BindingDB is based upon a relational data specification for describing binding measurements via Isothermal Titration Calorimetry (ITC) and enzyme inhibition. A corresponding XML Document Type Definition (DTD) is used to create and parse intermediate files during the on-line deposition process and will also be used for data interchange, including collection of data from other sources. The on-line query interface, which is constructed with Java Servlet technology, supports standard SQL queries as well as searches for molecules by chemical structure and sequence homology. The on-line deposition interface uses Java Server Pages and JavaBean objects to generate dynamic HTML and to store intermediate results. The resulting data resource provides a range of functionality with brisk response-times, and lends itself well to continued development and enhancement.
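The relational idea behind BindingDB can be sketched with a minimal SQLite schema. The table and column names below are hypothetical and far simpler than BindingDB's actual data specification. They only show how ligands, targets and per-method measurements (ITC and enzyme inhibition) can be linked and queried with standard SQL.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical, simplified schema (not BindingDB's actual specification).
SCHEMA = """
CREATE TABLE ligand (ligand_id INTEGER PRIMARY KEY, name TEXT NOT NULL, smiles TEXT);
CREATE TABLE target (target_id INTEGER PRIMARY KEY, name TEXT NOT NULL, sequence TEXT);
CREATE TABLE measurement (
    measurement_id INTEGER PRIMARY KEY,
    ligand_id INTEGER NOT NULL REFERENCES ligand(ligand_id),
    target_id INTEGER NOT NULL REFERENCES target(target_id),
    method TEXT NOT NULL CHECK (method IN ('ITC', 'enzyme_inhibition')),
    kd_nm REAL,   -- dissociation constant from ITC, nM
    ki_nm REAL    -- inhibition constant from enzyme assays, nM
);
"""

def ligands_with_ki_below(conn: sqlite3.Connection, target_name: str,
                          max_ki_nm: float) -> List[Tuple[str, float]]:
    """Ligands whose measured Ki against the named target is at most max_ki_nm."""
    return conn.execute(
        """SELECT l.name, m.ki_nm
           FROM measurement AS m
           JOIN ligand AS l ON l.ligand_id = m.ligand_id
           JOIN target AS t ON t.target_id = m.target_id
           WHERE t.name = ? AND m.ki_nm IS NOT NULL AND m.ki_nm <= ?
           ORDER BY m.ki_nm""",
        (target_name, max_ki_nm),
    ).fetchall()

# Hypothetical example data.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO target (target_id, name) VALUES (1, 'protease-X')")
conn.executemany("INSERT INTO ligand (ligand_id, name) VALUES (?, ?)",
                 [(1, "ligand-A"), (2, "ligand-B")])
conn.executemany(
    "INSERT INTO measurement (ligand_id, target_id, method, kd_nm, ki_nm) VALUES (?, ?, ?, ?, ?)",
    [(1, 1, "enzyme_inhibition", None, 12.0),
     (2, 1, "enzyme_inhibition", None, 450.0),
     (2, 1, "ITC", 800.0, None)],
)
print(ligands_with_ki_below(conn, "protease-X", 100.0))
```

Keeping measurements in their own table lets one ligand-target pair carry several results from different methods without duplicating ligand or target records.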
Scrubchem: Building Bioactivity Datasets from Pubchem ...

EPA Pesticide Factsheets

The PubChem Bioassay database is a non-curated public repository with data from 64 sources, including ChEMBL, BindingDb, DrugBank, EPA Tox21, the NIH Molecular Libraries Screening Program, and various other academic, government, and industrial contributors. Extracting these public data into quality datasets usable for analytical research presents several big-data challenges, for which we have designed manageable solutions. According to our preliminary work, there are approximately 549 million bioactivity values and related metadata within PubChem that can be mapped to over 10,000 biological targets. However, these data are not ready for use in data-driven research, mainly because they lack structured annotations. We used a pragmatic approach that provides increasing access to bioactivity values in the PubChem Bioassay database. This included restructuring individual PubChem Bioassay files into a relational database (ScrubChem). ScrubChem contains all primary PubChem Bioassay data that were: reparsed; error-corrected (when applicable); enriched with additional data links from other NCBI databases; and improved by adding key biological and assay annotations derived from logic-based language processing rules. The utility of ScrubChem and the curation process were illustrated using an example bioactivity dataset for the androgen receptor protein. This initial work serves as a trial ground for establishing the technical framework for accessing, integrating, cu
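The "reparse and error-correct" step can be pictured as normalizing raw assay rows. The sketch below assumes a PubChem-style CSV with `PUBCHEM_CID`, `PUBCHEM_ACTIVITY_OUTCOME` and `PUBCHEM_ACTIVITY_SCORE` columns. It skips descriptive rows that carry no compound identifier, maps numeric outcome codes to labels (the code table is an assumption to verify against PubChem documentation), and coerces scores to numbers. ScrubChem's actual processing rules are not given in the abstract and are certainly more extensive.

```python
import csv
import io
from typing import Dict, List, Optional

# Assumed mapping of numeric outcome codes to labels; verify against PubChem docs.
OUTCOME_CODES = {"1": "inactive", "2": "active", "3": "inconclusive",
                 "4": "unspecified", "5": "probe"}

def clean_bioassay_rows(csv_text: str) -> List[Dict[str, object]]:
    """Normalize raw bioassay rows into {cid, outcome, score} records."""
    records: List[Dict[str, object]] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        cid = (row.get("PUBCHEM_CID") or "").strip()
        if not cid.isdigit():
            continue  # descriptive/metadata rows or rows without a compound ID
        raw_outcome = (row.get("PUBCHEM_ACTIVITY_OUTCOME") or "").strip().lower()
        outcome = OUTCOME_CODES.get(raw_outcome, raw_outcome or "unspecified")
        score: Optional[float]
        try:
            score = float((row.get("PUBCHEM_ACTIVITY_SCORE") or "").strip())
        except ValueError:
            score = None  # missing or non-numeric score
        records.append({"cid": int(cid), "outcome": outcome, "score": score})
    return records

# Hypothetical input mixing a metadata row, a row without a CID and coded outcomes.
sample = ("PUBCHEM_RESULT_TAG,PUBCHEM_SID,PUBCHEM_CID,PUBCHEM_ACTIVITY_OUTCOME,PUBCHEM_ACTIVITY_SCORE\n"
          "RESULT_TYPE,,,,\n"
          "1,101,2244,Active,85\n"
          "2,102,,Inactive,0\n"
          "3,103,3672,1,n/a\n")
print(clean_bioassay_rows(sample))
```

Once rows are in this uniform shape they can be bulk-loaded into relational tables and joined with annotations from other sources.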

An NMR database for simulations of membrane dynamics.

PubMed

Leftin, Avigdor; Brown, Michael F

2011-03-01

Computational methods are powerful in capturing the results of experimental studies in terms of force fields that both explain and predict biological structures. Validation of molecular simulations requires comparison with experimental data to test and confirm computational predictions. Here we report a comprehensive database of NMR results for membrane phospholipids with interpretations intended to be accessible by non-NMR specialists. Experimental ¹³C-¹H and ²H NMR segmental order parameters (S(CH) or S(CD)) and spin-lattice (Zeeman) relaxation times (T(1Z)) are summarized in convenient tabular form for various saturated, unsaturated, and biological membrane phospholipids. Segmental order parameters give direct information about bilayer structural properties, including the area per lipid and volumetric hydrocarbon thickness. In addition, relaxation rates provide complementary information about molecular dynamics. Particular attention is paid to the magnetic field dependence (frequency dispersion) of the NMR relaxation rates in terms of various simplified power laws. Model-free reduction of the T(1Z) studies in terms of a power-law formalism shows that the relaxation rates for saturated phosphatidylcholines follow a single frequency-dispersive trend within the MHz regime. We show how analytical models can guide the continued development of atomistic and coarse-grained force fields. Our interpretation suggests that lipid diffusion and collective order fluctuations are implicitly governed by the viscoelastic nature of the liquid-crystalline ensemble. Collective bilayer excitations are emergent over mesoscopic length scales that fall between the molecular and bilayer dimensions, and are important for lipid organization and lipid-protein interactions. Future conceptual advances and theoretical reductions will foster understanding of biomembrane structural dynamics through a synergy of NMR measurements and molecular simulations. Copyright © 2010 Elsevier B.V. All rights reserved.
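A power-law analysis of relaxation dispersion can be sketched as a linear least-squares fit in log-log space, assuming the simple form R1Z = A·ν^(−n) with no frequency-independent offset (the models discussed in the paper may include additional terms). The frequencies and rates below are synthetic, generated with n = 0.5 purely to check that the fit recovers the exponent; they are not data from the database.

```python
import math
from typing import Sequence, Tuple

def fit_power_law(freqs: Sequence[float], rates: Sequence[float]) -> Tuple[float, float]:
    """Fit rate = A * freq**(-n) by least squares on log-log axes; returns (A, n)."""
    xs = [math.log(f) for f in freqs]
    ys = [math.log(r) for r in rates]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # slope of log(rate) vs log(freq) is -n
    intercept = y_mean - slope * x_mean   # intercept is log(A)
    return math.exp(intercept), -slope

# Synthetic check: rates generated with A = 10 and n = 0.5 (arbitrary units).
freqs_mhz = [15.0, 30.0, 60.0, 120.0]
rates = [10.0 * f ** -0.5 for f in freqs_mhz]
amplitude, exponent = fit_power_law(freqs_mhz, rates)
print(round(amplitude, 3), round(exponent, 3))
```

With real measurements, the fitted exponent summarizes how strongly the relaxation rate depends on magnetic field strength.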
RNA-Seq Analysis of Cocos nucifera: Transcriptome Sequencing and De Novo Assembly for Subsequent Functional Genomics Approaches

PubMed Central

Xia, Wei; Mason, Annaliese S.; Xia, Zhihui; Qiao, Fei; Zhao, Songlin; Tang, Haoru

2013-01-01

Background: Cocos nucifera (coconut), a member of the Arecaceae family, is an economically important woody palm grown in tropical regions. Despite its agronomic importance, previous germplasm assessment studies have relied solely on morphological and agronomical traits. Molecular biology techniques have been scarcely used in assessment of genetic resources and for improvement of important agronomic and quality traits in Cocos nucifera, mostly due to the absence of available sequence information. Methodology/Principal Findings: To provide basic information for molecular breeding and further molecular biological analysis in Cocos nucifera, we applied RNA-seq technology and de novo assembly to gain a global overview of the Cocos nucifera transcriptome from mixed tissue samples. Using Illumina sequencing, we obtained 54.9 million short reads and conducted de novo assembly to obtain 57,304 unigenes with an average length of 752 base pairs. Sequence comparison between assembled unigenes and released cDNA sequences of Cocos nucifera and Elaeis guineensis indicated that the assembled sequences were of high quality. Approximately 99.9% of unigenes were novel compared to the released coconut EST sequences. Using BLASTX, 68.2% of unigenes were successfully annotated based on the GenBank non-redundant (Nr) protein database. The annotated unigenes were then further classified using the Gene Ontology (GO), Clusters of Orthologous Groups (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Conclusions/Significance: Our study provides a large quantity of novel genetic information for Cocos nucifera. This information will act as a valuable resource for further molecular genetic studies and breeding in coconut, as well as for isolation and characterization of functional genes involved in different biochemical pathways in this important tropical crop species. PMID:23555859
RNA-Seq analysis of Cocos nucifera: transcriptome sequencing and de novo assembly for subsequent functional genomics approaches.

PubMed

Fan, Haikuo; Xiao, Yong; Yang, Yaodong; Xia, Wei; Mason, Annaliese S; Xia, Zhihui; Qiao, Fei; Zhao, Songlin; Tang, Haoru

2013-01-01

Cocos nucifera (coconut), a member of the Arecaceae family, is an economically important woody palm grown in tropical regions. Despite its agronomic importance, previous germplasm assessment studies have relied solely on morphological and agronomical traits. Molecular biology techniques have been scarcely used in assessment of genetic resources and for improvement of important agronomic and quality traits in Cocos nucifera, mostly due to the absence of available sequence information. To provide basic information for molecular breeding and further molecular biological analysis in Cocos nucifera, we applied RNA-seq technology and de novo assembly to gain a global overview of the Cocos nucifera transcriptome from mixed tissue samples. Using Illumina sequencing, we obtained 54.9 million short reads and conducted de novo assembly to obtain 57,304 unigenes with an average length of 752 base pairs. Sequence comparison between assembled unigenes and released cDNA sequences of Cocos nucifera and Elaeis guineensis indicated that the assembled sequences were of high quality. Approximately 99.9% of unigenes were novel compared to the released coconut EST sequences. Using BLASTX, 68.2% of unigenes were successfully annotated based on the GenBank non-redundant (Nr) protein database. The annotated unigenes were then further classified using the Gene Ontology (GO), Clusters of Orthologous Groups (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Our study provides a large quantity of novel genetic information for Cocos nucifera. This information will act as a valuable resource for further molecular genetic studies and breeding in coconut, as well as for isolation and characterization of functional genes involved in different biochemical pathways in this important tropical crop species.
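Summary statistics such as the unigene count and average length quoted above are straightforward to compute from a list of sequence lengths. The sketch below also reports N50, a common assembly metric that the abstract does not mention; the input lengths are hypothetical.

```python
from typing import Dict, Sequence

def assembly_stats(lengths: Sequence[int]) -> Dict[str, float]:
    """Count, total, mean length and N50 for a set of assembled sequences."""
    if not lengths:
        raise ValueError("no sequences given")
    total = sum(lengths)
    running = 0
    n50 = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # this length and longer hold at least half the bases
            n50 = length
            break
    return {"count": len(lengths), "total_bp": total,
            "mean_bp": total / len(lengths), "n50_bp": n50}

# Hypothetical unigene lengths in base pairs.
print(assembly_stats([100, 200, 300, 400]))
```

The mean alone can hide a long tail of short fragments, which is why assemblies are usually reported with both the mean and N50.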
Molecular Biology and Prevention of Endometrial Cancer

DTIC Science & Technology

2009-07-01

us time to complete the study. Aim 2: To analyze vaginal and cervical adenocarcinomas that have arisen in women exposed to DES in utero, for... therapy. Methods: 1) Oligonucleotide microarray analysis was performed on a panel of endometrial cancers. 2) A subset of adenocarcinoma cases... from the International DES Registry (IDESR) was analyzed for MSI. 3) A case-control study of the CASH database was performed to evaluate the
Whole Transcriptome Analysis Provides Insights into Molecular Mechanisms for Molting in Litopenaeus vannamei

PubMed Central

Gao, Yi; Zhang, Xiaojun; Wei, Jiankai; Sun, Xiaoqing; Yuan, Jianbo; Li, Fuhua; Xiang, Jianhai

2015-01-01

Molting is one of the most important biological processes in shrimp growth and development. All shrimp molt periodically to shed and replace their exoskeletons, a process essential for growth, metamorphosis, and reproduction. However, the molecular mechanisms underlying shrimp molting remain poorly understood. In this study, we investigated global expression changes in the transcriptome of the Pacific white shrimp, Litopenaeus vannamei, the most commonly cultured shrimp species worldwide. The transcriptome of whole L. vannamei was investigated by RNA sequencing (RNA-seq) throughout the molting cycle, including the inter-molt (C), pre-molt (D0, D1, D2, D3, D4), and post-molt (P1 and P2) stages, and 93,756 unigenes were identified. Among these, 5,117 genes were differentially expressed (log2 ratio ≥ 1 and FDR ≤ 0.001) between adjacent molt stages. The results were compared against the National Center for Biotechnology Information (NCBI) non-redundant protein/nucleotide sequence database, Swiss-Prot, the PFAM database, the Gene Ontology database, and the Kyoto Encyclopedia of Genes and Genomes database in order to annotate gene descriptions, associate them with gene ontology terms, and assign them to pathways. The expression patterns of genes involved in several molecular events critical for molting, such as hormone regulation, triggering events, implementation phases, skelemin and immune responses, were characterized and considered as mechanisms underlying molting in L. vannamei. Comparisons with transcriptomic analyses in other arthropods were also performed. The characterization of major transcriptional changes in genes involved in the molting cycle provides candidates for future investigation of the molecular mechanisms. The data generated in this study will serve as an important transcriptomic resource for the shrimp research community to facilitate gene and genome annotation and to characterize key molecular processes underlying shrimp development. PMID:26650402
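A differential-expression filter of the kind quoted above (log2 ratio ≥ 1, FDR ≤ 0.001) can be sketched as follows. It assumes per-gene p-values have already been computed by some statistical test (not shown), converts them to false discovery rates with the Benjamini-Hochberg procedure, and interprets the fold-change threshold as an absolute value so that both up- and down-regulated genes pass. The pseudocount and gene values are hypothetical, and the original study's exact statistics may differ.

```python
import math
from typing import List, Sequence, Tuple

def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def differentially_expressed(
    genes: Sequence[Tuple[str, float, float, float]],  # (id, expr_a, expr_b, p_value)
    min_abs_log2: float = 1.0,
    max_fdr: float = 0.001,
    pseudocount: float = 1.0,  # hypothetical; avoids division by zero
) -> List[Tuple[str, float, float]]:
    """Genes passing both the |log2 ratio| and the FDR thresholds."""
    fdrs = benjamini_hochberg([g[3] for g in genes])
    hits = []
    for (gene_id, expr_a, expr_b, _), fdr in zip(genes, fdrs):
        log2_ratio = math.log2((expr_b + pseudocount) / (expr_a + pseudocount))
        if abs(log2_ratio) >= min_abs_log2 and fdr <= max_fdr:
            hits.append((gene_id, log2_ratio, fdr))
    return hits

# Hypothetical expression values in two adjacent molt stages.
genes = [("g1", 10.0, 100.0, 1e-6), ("g2", 50.0, 60.0, 1e-8), ("g3", 5.0, 80.0, 0.5)]
print([gene_id for gene_id, _, _ in differentially_expressed(genes)])
```

Here `g2` is highly significant but changes too little, and `g3` changes a lot but is not significant; only a gene passing both thresholds is kept.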
[The interpretation and integration of traditional Chinese phytotherapy into Western-type medicine with the possession of knowledge of the human genome].

PubMed

Blázovics, Anna

2018-05-01

The terminology of traditional Chinese medicine (TCM) is hardly interpretable in the context of the human genome; the human genome program therefore drew attention in China towards the Western practice of medicine. In the last two decades, several important steps towards bringing traditional Chinese and Western medicine closer together could be observed in China. The Chinese government supports building information databases for research aimed at clarifying, at the level of molecular biology, the associations between gene expression, signal transduction pathways and protein-protein interactions, as well as the effects and effectiveness of the bioactive components of Chinese drugs. The values of TCM are becoming increasingly important for Western medicine as well, because molecular biological therapies have not lived up to expectations, e.g., in tumor therapy. Orv Hetil. 2018; 159(18): 696-702.
Effects of small particle numbers on long-term behaviour in discrete biochemical systems

PubMed Central

Ibrahim, Bashar; Dittrich, Peter

2014-01-01

Motivation: The functioning of many biological processes depends on the presence of only a small number of molecules of a single species. Additionally, molecular crowding suggests that even a high copy number of a species does not guarantee that it will interact. How single particles contribute to stabilizing biological systems is not yet well understood. Hence, we aim to determine the influence of single molecules on the long-term behaviour of biological systems, i.e. whether they can reach a steady state. Results: We provide theoretical considerations and a tool to analyse Systems Biology Markup Language models for their potential to stabilize because of these effects. The theory is an extension of chemical organization theory, which we call discrete chemical organization theory. Furthermore, we scanned the BioModels Database for the occurrence of discrete chemical organizations. To exemplify our method, we describe an application to the Template model of the mitotic spindle assembly checkpoint mechanism. Availability and implementation: http://www.biosys.uni-jena.de/Services.html. Contact: bashar.ibrahim@uni-jena.de or dittrich@minet.uni-jena.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25161236
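For readers unfamiliar with chemical organization theory, the sketch below checks two of its basic properties on a reaction network: closure (no applicable reaction produces a species outside the set) and semi-self-maintenance (every species net-consumed by an applicable reaction is also net-produced by one). Reactions are treated as plain sets of reactants and products, ignoring stoichiometry. Full self-maintenance requires a flux (linear programming) check, and the discrete, small-particle-number extension described in the abstract is not modeled here.

```python
from typing import FrozenSet, List, Sequence, Set, Tuple

Reaction = Tuple[FrozenSet[str], FrozenSet[str]]  # (reactants, products)

def applicable(species: Set[str], reactions: Sequence[Reaction]) -> List[Reaction]:
    """Reactions whose reactants are all present in the species set."""
    return [(r, p) for r, p in reactions if r <= species]

def is_closed(species: Set[str], reactions: Sequence[Reaction]) -> bool:
    """No applicable reaction produces a species outside the set."""
    return all(p <= species for _, p in applicable(species, reactions))

def is_semi_self_maintaining(species: Set[str], reactions: Sequence[Reaction]) -> bool:
    """Every species net-consumed by an applicable reaction is net-produced by one."""
    active = applicable(species, reactions)
    consumed = set().union(*(r - p for r, p in active))
    produced = set().union(*(p - r for r, p in active))
    return consumed <= produced

def rxn(reactants: str, products: str) -> Reaction:
    """Helper: rxn("a b", "c") builds the reaction a + b -> c."""
    return frozenset(reactants.split()), frozenset(products.split())

# Toy network: a <-> b, and b -> c.
network = [rxn("a", "b"), rxn("b", "a"), rxn("b", "c")]
print(is_closed({"a", "b"}, network), is_closed({"a", "b", "c"}, network))
print(is_semi_self_maintaining({"a", "b", "c"}, network))
```

In this toy network {a, b} is not closed because b can react to c, while {a, b, c} is both closed and semi-self-maintaining.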
Database resources of the National Center for Biotechnology Information

PubMed Central

2015-01-01

The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. Additional NCBI resources focus on literature (Bookshelf, PubMed Central (PMC) and PubReader); medical genetics (ClinVar, dbMHC, the Genetic Testing Registry, HIV-1/Human Protein Interaction Database and MedGen); genes and genomics (BioProject, BioSample, dbSNP, dbVar, Epigenomics, Gene, Gene Expression Omnibus (GEO), Genome, HomoloGene, the Map Viewer, Nucleotide, PopSet, Probe, RefSeq, Sequence Read Archive, the Taxonomy Browser, Trace Archive and UniGene); and proteins and chemicals (Biosystems, COBALT, the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), the Molecular Modeling Database (MMDB), Protein Clusters, Protein and the PubChem suite of small molecule databases). The Entrez system provides search and retrieval operations for many of these databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at http://www.ncbi.nlm.nih.gov. PMID:25398906
Database resources of the National Center for Biotechnology Information

PubMed Central

2016-01-01

The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. Additional NCBI resources focus on literature (PubMed Central (PMC), Bookshelf and PubReader), health (ClinVar, dbGaP, dbMHC, the Genetic Testing Registry, HIV-1/Human Protein Interaction Database and MedGen), genomes (BioProject, Assembly, Genome, BioSample, dbSNP, dbVar, Epigenomics, the Map Viewer, Nucleotide, Probe, RefSeq, Sequence Read Archive, the Taxonomy Browser and the Trace Archive), genes (Gene, Gene Expression Omnibus (GEO), HomoloGene, PopSet and UniGene), proteins (Protein, the Conserved Domain Database (CDD), COBALT, Conserved Domain Architecture Retrieval Tool (CDART), the Molecular Modeling Database (MMDB) and Protein Clusters) and chemicals (Biosystems and the PubChem suite of small molecule databases). The Entrez system provides search and retrieval operations for most of these databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized datasets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov. PMID:26615191
Integrative Sparse K-Means With Overlapping Group Lasso in Genomic Applications for Disease Subtype Discovery

PubMed Central

Huo, Zhiguang; Tseng, George

2017-01-01

Cancer subtype discovery is the first step towards delivering personalized medicine to cancer patients. With the accumulation of massive multi-level omics datasets and established biological knowledge databases, integrating omics data while incorporating rich existing biological knowledge is essential for deciphering the biological mechanisms behind complex diseases. In this manuscript, we propose an integrative sparse K-means (is-K means) approach to discover disease subtypes with the guidance of prior biological knowledge via sparse overlapping group lasso. An algorithm using the alternating direction method of multipliers (ADMM) is applied for fast optimization. Simulations and three real applications in breast cancer and leukemia are used to compare is-K means with existing methods and to demonstrate its superior clustering accuracy, feature selection, functional annotation of detected molecular features and computing efficiency. PMID:28959370
Integrative Sparse K-Means With Overlapping Group Lasso in Genomic Applications for Disease Subtype Discovery.

PubMed

Huo, Zhiguang; Tseng, George

2017-06-01

Cancer subtype discovery is the first step towards delivering personalized medicine to cancer patients. With the accumulation of massive multi-level omics datasets and established biological knowledge databases, integrating omics data while incorporating rich existing biological knowledge is essential for deciphering the biological mechanisms behind complex diseases. In this manuscript, we propose an integrative sparse K-means (is-K means) approach to discover disease subtypes with the guidance of prior biological knowledge via sparse overlapping group lasso. An algorithm using the alternating direction method of multipliers (ADMM) is applied for fast optimization. Simulations and three real applications in breast cancer and leukemia are used to compare is-K means with existing methods and to demonstrate its superior clustering accuracy, feature selection, functional annotation of detected molecular features and computing efficiency.
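ADMM solvers for group-lasso-penalized problems repeatedly apply the proximal operator of the group penalty, which shrinks each group of coefficients toward zero as a block. The sketch below implements that operator (group soft-thresholding) for non-overlapping groups. One common way to handle overlapping groups is to duplicate shared features so that each copy belongs to a single group. This is a generic building block, not the authors' is-K means implementation.

```python
import math
from typing import List, Sequence

def group_soft_threshold(v: Sequence[float], groups: Sequence[Sequence[int]],
                         lam: float) -> List[float]:
    """Proximal operator of lam * sum_g ||v_g||_2 for non-overlapping groups.

    Each group is scaled by max(0, 1 - lam / ||v_g||), so weak groups become
    exactly zero as a block; indices not in any group pass through unchanged.
    """
    out = list(v)
    for group in groups:
        norm = math.sqrt(sum(v[i] ** 2 for i in group))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in group:
            out[i] = scale * v[i]
    return out

print(group_soft_threshold([3.0, 4.0, 0.1, 0.1], groups=[[0, 1], [2, 3]], lam=1.0))
```

The first group (norm 5) is shrunk by 20%, while the second group (norm about 0.14) is removed entirely, which is how group penalties select or drop whole sets of related features at once.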
Database resources of the National Center for Biotechnology Information

PubMed Central

Wheeler, David L.; Barrett, Tanya; Benson, Dennis A.; Bryant, Stephen H.; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M.; DiCuccio, Michael; Edgar, Ron; Federhen, Scott; Geer, Lewis Y.; Helmberg, Wolfgang; Kapustin, Yuri; Kenton, David L.; Khovayko, Oleg; Lipman, David J.; Madden, Thomas L.; Maglott, Donna R.; Ostell, James; Pruitt, Kim D.; Schuler, Gregory D.; Schriml, Lynn M.; Sequeira, Edwin; Sherry, Stephen T.; Sirotkin, Karl; Souvorov, Alexandre; Starchenko, Grigory; Suzek, Tugba O.; Tatusov, Roman; Tatusova, Tatiana A.; Wagner, Lukas; Yaschenko, Eugene

2006-01-01

In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups, Retroviral Genotyping Tools, HIV-1, Human Protein Interaction Database, SAGEmap, Gene Expression Omnibus, Entrez Probe, GENSAT, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized datasets. All of the resources can be accessed through the NCBI home page at: . PMID:16381840
Database resources of the National Center for Biotechnology Information.

PubMed

Sayers, Eric W; Barrett, Tanya; Benson, Dennis A; Bolton, Evan; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; Dicuccio, Michael; Federhen, Scott; Feolo, Michael; Fingerman, Ian M; Geer, Lewis Y; Helmberg, Wolfgang; Kapustin, Yuri; Krasnov, Sergey; Landsman, David; Lipman, David J; Lu, Zhiyong; Madden, Thomas L; Madej, Tom; Maglott, Donna R; Marchler-Bauer, Aron; Miller, Vadim; Karsch-Mizrachi, Ilene; Ostell, James; Panchenko, Anna; Phan, Lon; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Stephen T; Shumway, Martin; Sirotkin, Karl; Slotta, Douglas; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A; Wagner, Lukas; Wang, Yanli; Wilbur, W John; Yaschenko, Eugene; Ye, Jian

2012-01-01

In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Website. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Probe, Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
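The NCBI abstracts list the Entrez Programming Utilities (E-utilities) among the center's resources. A minimal sketch of how a script might address them, assuming the documented ESearch endpoint under https://eutils.ncbi.nlm.nih.gov/entrez/eutils/ and its `db`, `term`, `retmax` and `retmode` parameters. The helper name `esearch_url` is hypothetical, and the function only builds the request URL; it does not contact NCBI.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities; ESearch is served as esearch.fcgi below it.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20, retmode: str = "json") -> str:
    """Build (but do not send) an ESearch query URL.

    db      -- Entrez database name, e.g. "pubmed" or "gene"
    term    -- Entrez query string
    retmax  -- maximum number of UIDs to return
    retmode -- response format ("json" or "xml")
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": retmode}
    # urlencode percent-encodes the values and turns spaces into '+'.
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)
```

Passing the result to `urllib.request.urlopen` would perform the search. NCBI's E-utilities documentation describes request-rate limits and optional API keys that a real client should respect.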
Database resources of the National Center for Biotechnology Information

PubMed Central

Acland, Abigail; Agarwala, Richa; Barrett, Tanya; Beck, Jeff; Benson, Dennis A.; Bollin, Colleen; Bolton, Evan; Bryant, Stephen H.; Canese, Kathi; Church, Deanna M.; Clark, Karen; DiCuccio, Michael; Dondoshansky, Ilya; Federhen, Scott; Feolo, Michael; Geer, Lewis Y.; Gorelenkov, Viatcheslav; Hoeppner, Marilu; Johnson, Mark; Kelly, Christopher; Khotomlianski, Viatcheslav; Kimchi, Avi; Kimelman, Michael; Kitts, Paul; Krasnov, Sergey; Kuznetsov, Anatoliy; Landsman, David; Lipman, David J.; Lu, Zhiyong; Madden, Thomas L.; Madej, Tom; Maglott, Donna R.; Marchler-Bauer, Aron; Karsch-Mizrachi, Ilene; Murphy, Terence; Ostell, James; O'Sullivan, Christopher; Panchenko, Anna; Phan, Lon; Preuss, Don; Pruitt, Kim D.; Rubinstein, Wendy; Sayers, Eric W.; Schneider, Valerie; Schuler, Gregory D.; Sequeira, Edwin; Sherry, Stephen T.; Shumway, Martin; Sirotkin, Karl; Siyan, Karanjit; Slotta, Douglas; Soboleva, Alexandra; Soussov, Vladimir; Starchenko, Grigory; Tatusova, Tatiana A.; Trawick, Bart W.; Vakatov, Denis; Wang, Yanli; Ward, Minghong; Wilbur, W. John; Yaschenko, Eugene; Zbicz, Kerry

2014-01-01

In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, PubReader, Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link, Primer-BLAST, COBALT, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, the Genetic Testing Registry, Genome and related tools, the Map Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, ClinVar, MedGen, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Probe, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All these resources can be accessed through the NCBI home page. PMID:24259429
Database resources of the National Center for Biotechnology Information

PubMed Central

Wheeler, David L.; Church, Deanna M.; Lash, Alex E.; Leipe, Detlef D.; Madden, Thomas L.; Pontius, Joan U.; Schuler, Gregory D.; Schriml, Lynn M.; Tatusova, Tatiana A.; Wagner, Lukas; Rapp, Barbara A.

2001-01-01

In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources that operate on the data in GenBank and a variety of other biological data made available through NCBI’s Web site. NCBI data retrieval resources include Entrez, PubMed, LocusLink and the Taxonomy Browser. Data analysis resources include BLAST, Electronic PCR, OrfFinder, RefSeq, UniGene, HomoloGene, Database of Single Nucleotide Polymorphisms (dbSNP), Human Genome Sequencing, Human MapViewer, GeneMap’99, Human–Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, Cancer Genome Anatomy Project (CGAP), SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB) and the Conserved Domain Database (CDD). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov. PMID:11125038
Database resources of the National Center for Biotechnology Information

PubMed Central

Wheeler, David L.; Church, Deanna M.; Federhen, Scott; Lash, Alex E.; Madden, Thomas L.; Pontius, Joan U.; Schuler, Gregory D.; Schriml, Lynn M.; Sequeira, Edwin; Tatusova, Tatiana A.; Wagner, Lukas

2003-01-01

In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's Web site. NCBI resources include Entrez, PubMed, PubMed Central (PMC), LocusLink, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR (e-PCR), Open Reading Frame (ORF) Finder, Reference Sequence (RefSeq), UniGene, HomoloGene, ProtEST, Database of Single Nucleotide Polymorphisms (dbSNP), Human/Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes and related tools, the Map Viewer, Model Maker (MM), Evidence Viewer (EV), Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), and the Conserved Domain Architecture Retrieval Tool (CDART). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov. PMID:12519941
Investigating Evolutionary Questions Using Online Molecular Databases.

ERIC Educational Resources Information Center

Puterbaugh, Mary N.; Burleigh, J. Gordon

2001-01-01

Recommends using online molecular databases as teaching tools to illustrate evolutionary questions and concepts while introducing students to public molecular databases. Provides activities in which students make molecular comparisons between species. (YDS)
Defining a Computational Framework for the Assessment of ...

EPA Pesticide Factsheets

The Adverse Outcome Pathway (AOP) framework describes the effects of environmental stressors across multiple scales of biological organization and function. This includes an evaluation of the potential for each key event to occur across a broad range of species in order to determine the taxonomic applicability of each AOP. Computational tools are needed to facilitate this process. Recently, we developed a tool that uses sequence homology to evaluate the applicability of molecular initiating events across species (Lalone et al., Toxicol. Sci., 2016). To extend our ability to make computational predictions at higher levels of biological organization, we have created the AOPdb. This database links molecular targets associated with key events in the AOP-Wiki to publicly available data (e.g. gene-protein, pathway, species orthology, ontology, chemical, disease), including ToxCast assay information. The AOPdb combines different data types in order to characterize the impacts of chemicals on human health and the environment, and serves as a decision support tool for case study development in the area of taxonomic applicability. As a proof of concept, the AOPdb allows identification of relevant molecular targets, biological pathways, and chemical and disease associations across species for four AOPs from the AOP-Wiki (https://aopwiki.org): Estrogen receptor antagonism leading to reproductive dysfunction (Aop:30); Aromatase inhibition leading to reproductive d
Simple system--substantial share: the use of Dictyostelium in cell biology and molecular medicine.

PubMed

Müller-Taubenberger, Annette; Kortholt, Arjan; Eichinger, Ludwig

2013-02-01

Dictyostelium discoideum offers unique advantages for studying fundamental cellular processes, host-pathogen interactions as well as the molecular causes of human diseases. The organism can be easily grown in large amounts and is amenable to diverse biochemical, cell biological and genetic approaches. Throughout their life cycle Dictyostelium cells are motile, and thus are perfectly suited to study random and directed cell motility with the underlying changes in signal transduction and the actin cytoskeleton. Dictyostelium is also increasingly used for the investigation of human disease genes and the crosstalk between host and pathogen. As a professional phagocyte it can be infected with several human bacterial pathogens and used to study the infection process. The availability of a large number of knock-out mutants renders Dictyostelium particularly useful for the elucidation and investigation of host cell factors. A powerful armory of molecular genetic techniques that have been continuously expanded over the years and a well curated genome sequence, which is accessible via the online database dictyBase, considerably strengthened Dictyostelium's experimental attractiveness and its value as model organism. Copyright © 2012 Elsevier GmbH. All rights reserved.
PyPathway: Python Package for Biological Network Analysis and Visualization.

PubMed

Xu, Yang; Luo, Xiao-Chun

2018-05-01

Life science studies represent one of the biggest generators of large data sets, mainly because of rapid advances in sequencing technology. Biological networks, including interaction networks and human-curated pathways, are essential for understanding these high-throughput data sets. Biological network analysis offers a way to systematically explore not only the molecular complexity of a particular disease but also the molecular relationships among apparently distinct phenotypes. Several packages, such as Biopython and Goatools, have been developed for the Python community. However, tools to perform comprehensive network analysis and visualization are still needed. Here, we have developed PyPathway, an extensible, free and open-source Python package for functional enrichment analysis, network modeling, and network visualization. The network process module supports various interaction network and pathway databases such as Reactome, WikiPathways, STRING, and BioGRID. The network analysis module implements overrepresentation analysis, gene set enrichment analysis, network-based enrichment, and de novo network modeling. Finally, the visualization and data publishing modules enable users to share their analysis through an easy-to-use web application. For package availability, see the first Reference.
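Overrepresentation analysis, one of the methods the PyPathway abstract lists, is commonly computed as a one-sided hypergeometric test: given N background genes, K of them in a pathway, and n selected genes of which k fall in the pathway, the p-value is P(X >= k). The sketch below is a generic illustration using only the standard library; it is not PyPathway's API, the function name `ora_pvalue` is hypothetical, and multiple-testing correction across pathways is omitted.

```python
from math import comb


def ora_pvalue(N: int, K: int, n: int, k: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N -- number of genes in the background
    K -- number of background genes annotated to the pathway
    n -- number of selected (e.g. differentially expressed) genes
    k -- number of selected genes that fall in the pathway
    """
    total = comb(N, n)
    # Sum the probabilities of observing k, k+1, ..., min(K, n) pathway genes.
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total
```

For instance, if all 4 selected genes come from a 5-gene pathway in a 20-gene background, `ora_pvalue(20, 5, 4, 4)` gives 5/4845, roughly 0.001.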

bcl::Cluster : A method for clustering biological molecules coupled with visualization in the Pymol Molecular Graphics System.

PubMed

Alexander, Nathan; Woetzel, Nils; Meiler, Jens

2011-02-01

Clustering algorithms are used as data analysis tools in a wide variety of applications in biology. Clustering has become especially important in protein structure prediction and virtual high-throughput screening methods. In protein structure prediction, clustering is used to structure the conformational space of thousands of protein models. In virtual high-throughput screening, databases with millions of drug-like molecules are organized by structural similarity, e.g. common scaffolds. The tree-like dendrogram structure obtained from hierarchical clustering can provide a qualitative overview of the results, which is important for focusing detailed analysis. However, in practice it is difficult to relate specific components of the dendrogram directly back to the objects it comprises and to display all desired information within the two dimensions of the dendrogram. The current work presents a hierarchical agglomerative clustering method termed bcl::Cluster. bcl::Cluster utilizes the Pymol Molecular Graphics System to graphically depict dendrograms in three dimensions. This allows simultaneous display of relevant biological molecules as well as additional information about the clusters and the members comprising them.
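To make the idea of a dendrogram concrete, here is a naive hierarchical agglomerative clustering sketch with average linkage that records each merge and its distance, which is the information a dendrogram draws. It is not bcl::Cluster's algorithm or interface; the linkage choice, the function name and the pluggable `metric` argument (for protein models one might pass an RMSD function) are assumptions. The loop re-evaluates every cluster pair at each step, so it is only suitable for small examples.

```python
from itertools import combinations
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def average_linkage(
    items: Sequence[T], metric: Callable[[T, T], float]
) -> list[tuple[frozenset, frozenset, float]]:
    """Naive agglomerative clustering with average linkage.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples;
    each cluster is a frozenset of indices into `items`.
    """
    clusters = [frozenset([i]) for i in range(len(items))]
    merges = []

    def linkage(a: frozenset, b: frozenset) -> float:
        # Average of all pairwise distances between members of the two clusters.
        return sum(metric(items[i], items[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of current clusters and merge them.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c != a and c != b] + [a | b]
    return merges
```

With points 0, 1 and 10 and absolute difference as the metric, the first merge joins 0 and 1 at distance 1.0, and the second joins that cluster with 10 at the average distance (10 + 9)/2 = 9.5.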
BiologicalNetworks 2.0 - an integrative view of genome biology data

PubMed Central

2010-01-01

Background A significant problem in the study of the mechanisms of an organism's development is the elucidation of interrelated factors that affect different levels of the organism, such as genes, biological molecules, cells, and cell systems. The numerous sources of heterogeneous data that exist for these subsystems are still not integrated well enough to let researchers analyze them together in the same study. Systematic application of data integration methods is also hampered by factors such as the orthogonal nature of the integrated data and naming problems. Results Here we report on a new version of BiologicalNetworks, a research environment for the integral visualization and analysis of heterogeneous biological data. BiologicalNetworks can be queried for properties of thousands of different types of biological entities (genes/proteins, promoters, COGs, pathways, binding sites, and others) and their relations (interactions, co-expression, co-citations, and others). The system includes the build-pathways infrastructure for molecular interactions/relations and module discovery in high-throughput experiments. Also implemented in BiologicalNetworks are the Integrated Genome Viewer and Comparative Genomics Browser applications, which allow for the search and analysis of gene regulatory regions and their conservation in multiple species in conjunction with molecular pathways/networks, experimental data and functional annotations. Conclusions The new release of BiologicalNetworks together with its back-end database introduces extensive functionality for a more efficient integrated multi-level analysis of microarray, sequence, regulatory, and other data. BiologicalNetworks is freely available at http://www.biologicalnetworks.org. PMID:21190573
An Integrated Korean Biodiversity and Genetic Information Retrieval System

PubMed Central

Lim, Jeongheui; Bhak, Jong; Oh, Hee-Mock; Kim, Chang-Bae; Park, Yong-Ha; Paek, Woon Kee

2008-01-01

Background Online biodiversity information databases are growing quickly and are being integrated into general bioinformatics systems owing to advances in fast gene sequencing technologies and the Internet. These can reduce the cost and effort of performing biodiversity surveys and genetic searches, allowing scientists to spend more time on research and less time collecting and maintaining data. This will increase the rate of knowledge build-up and improve conservation. The biodiversity databases in Korea have been scattered among several institutes and local natural history museums with incompatible data types. Therefore, a comprehensive database and a nationwide web portal for biodiversity information are needed to integrate diverse information resources, including molecular and genomic databases. Results The Korean Natural History Research Information System (NARIS) was built and put into service as the central biodiversity information system to collect and integrate the biodiversity data of various institutes and natural history museums in Korea. This database aims to be an integrated resource that contains additional biological information, such as genome sequences and molecular-level diversity. Currently, twelve institutes and museums in Korea are integrated by the DiGIR (Distributed Generic Information Retrieval) protocol, with Darwin Core 2.0 format as its metadata standard for data exchange. Data quality control and statistical analysis functions have been implemented. In particular, molecular and genetic information from the National Center for Biotechnology Information (NCBI) databases was recently integrated with NARIS. NARIS can also be extended to accommodate other institutes abroad, and the whole system can be exported to establish local biodiversity management servers. Conclusion A Korean data portal, NARIS, has been developed to efficiently manage and utilize biodiversity data, including genetic resources. NARIS aims to be integral in maximizing bio-resource utilization for conservation, management, research, education, industrial applications, and integration with other bioinformation data resources. It can be found at . PMID:19091024
The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata.

PubMed

Liolios, Konstantinos; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Kyrpides, Nikos C

2008-01-01

The Genomes On Line Database (GOLD) is a comprehensive resource that provides information on genome and metagenome projects worldwide. Complete and ongoing projects and their associated metadata can be accessed in GOLD through pre-computed lists and a search page. As of September 2007, GOLD contains information on more than 2900 sequencing projects, out of which 639 have been completed and their sequence data deposited in the public databases. GOLD continues to expand with the goal of providing metadata information related to the projects and the organisms/environments towards the 'Minimum Information about a Genome Sequence' (MIGS) guideline. GOLD is available at http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece at http://gold.imbb.forth.gr/
The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata

PubMed Central

Liolios, Konstantinos; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Kyrpides, Nikos C.

2008-01-01

The Genomes On Line Database (GOLD) is a comprehensive resource that provides information on genome and metagenome projects worldwide. Complete and ongoing projects and their associated metadata can be accessed in GOLD through pre-computed lists and a search page. As of September 2007, GOLD contains information on more than 2900 sequencing projects, out of which 639 have been completed and their sequence data deposited in the public databases. GOLD continues to expand with the goal of providing metadata information related to the projects and the organisms/environments towards the ‘Minimum Information about a Genome Sequence’ (MIGS) guideline. GOLD is available at http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece at http://gold.imbb.forth.gr/ PMID:17981842
Molecular Biology and Prevention of Endometrial Cancer

DTIC Science & Technology

2006-07-01

adenocarcinoma cases from the International DES Registry (IDESR) was analyzed for MSI 3) A case-control study of the CASH database was performed to...that have arisen in women exposed to DES in utero, for methylation and mutation of PTEN and MLH1 in order to determine if estrogen induces genetic...and analyzed, which would most likely take an additional 3-6 months after enrollment. Aim 2: To analyze vaginal and cervical adenocarcinomas
Molecular Biology and Prevention of Endometrial Cancer. Addendum

DTIC Science & Technology

2008-07-01

2) A subset of adenocarcinoma cases from the International DES Registry (IDESR) was analyzed for MSI 3) A case-control study of the CASH database... DES in utero, for methylation and mutation of PTEN and MLH1 in order to determine if estrogen induces genetic alterations in these tumors...current trial within the “Gynecologic Disease Program”. Aim 2: To analyze vaginal and cervical adenocarcinomas that have arisen in women exposed to
FBIS: A regional DNA barcode archival & analysis system for Indian fishes.

PubMed

Nagpure, Naresh Sahebrao; Rashid, Iliyas; Pathak, Ajey Kumar; Singh, Mahender; Singh, Shri Prakash; Sarkar, Uttam Kumar

2012-01-01

DNA barcoding is a new tool for the recognition and classification of biological taxa based on the sequence of a fragment of the mitochondrial gene cytochrome c oxidase I (COI). In view of the growing importance of fish DNA barcoding for species identification, molecular taxonomy and fish diversity conservation, we developed a Fish Barcode Information System (FBIS) for Indian fishes, which will serve as a regional DNA barcode archival and analysis system. The database presently contains 2334 sequence records of the COI gene for 472 aquatic species belonging to 39 orders and 136 families, collected from available published data sources. Additionally, it contains information on the phenotype, distribution and IUCN Red List status of fishes. The web version of FBIS was designed using MySQL, Perl and PHP on a Linux platform to (a) store and manage the acquired data, (b) analyze and explore DNA barcode records, and (c) identify species and estimate genetic divergence. FBIS has also been integrated with appropriate tools for retrieving and viewing information about the database statistics and taxonomy. FBIS is expected to be a useful information system for fish molecular taxonomy, phylogeny and genomics. The database is available for free at http://mail.nbfgr.res.in/fbis/
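The FBIS abstract mentions estimating genetic divergence between COI barcodes but does not say which distance model it uses. The Kimura two-parameter (K2P) distance is a common choice in DNA barcoding, so the sketch below assumes it: d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of compared sites showing transitions and transversions. The function name and the handling of gaps and ambiguous bases (such sites are simply skipped) are illustrative assumptions, not FBIS's implementation.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites with a gap or an ambiguous base in either sequence are skipped
    (an assumption of this sketch).
    """
    valid = PURINES | PYRIMIDINES
    transitions = transversions = sites = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in valid or b not in valid:
            continue
        sites += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P model")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

Identical sequences give a distance of 0; one transition in ten sites (P = 0.1, Q = 0) gives -1/2 ln 0.8, about 0.112.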
TISSUES 2.0: an integrative web resource on mammalian tissue expression

PubMed Central

Palasca, Oana; Santos, Alberto; Stolte, Christian; Gorodkin, Jan; Jensen, Lars Juhl

2018-01-01

Physiological and molecular similarities between organisms make it possible to translate findings from simpler experimental systems (model organisms) into more complex ones, such as humans. This translation facilitates the understanding of biological processes under normal or disease conditions. Researchers aiming to identify the similarities and differences between organisms at the molecular level need resources collecting multi-organism tissue expression data. We have developed a database of gene–tissue associations in human, mouse, rat and pig by integrating multiple sources of evidence: transcriptomics covering all four species and proteomics (human only), manually curated and mined from the scientific literature. Through a scoring scheme, these associations are made comparable across all sources of evidence and across organisms. Furthermore, the scoring assigns a confidence score to each of the associations. The TISSUES database (version 2.0) is publicly accessible through a user-friendly web interface and as part of the STRING app for Cytoscape. In addition, we analyzed the agreement between datasets, across and within organisms, and found that the agreement is mainly affected by the quality of the datasets rather than by the technologies used or organisms compared. Database URL: http://tissues.jensenlab.org/ PMID:29617745
Reactivity of 12-tungstophosphoric acid and its inhibitor potency toward Na+/K+-ATPase: A combined 31P NMR study, ab initio calculations and crystallographic analysis.

PubMed

Bošnjaković-Pavlović, Nada; Bajuk-Bogdanović, Danica; Zakrzewska, Joanna; Yan, Zeyin; Holclajtner-Antunović, Ivanka; Gillet, Jean-Michel; Spasojević-de Biré, Anne

2017-11-01

The influence of 12-tungstophosphoric acid (WPA) on the conversion of adenosine triphosphate (ATP) to adenosine diphosphate (ADP) in the presence of Na+/K+-ATPase was monitored by 31P NMR spectroscopy. WPA was shown to inhibit Na+/K+-ATPase activity. To study WPA reactivity and the intermolecular interactions between WPA oxygen atoms and different proton donor types (D = O, N, C), we considered data for WPA-based compounds from the Cambridge Structural Database (CSD), the Crystallography Open Database (COD) and the Inorganic Crystal Structure Database (ICSD). Binding properties of the Keggin anion in biological systems are illustrated using the Protein Data Bank (PDB). This work constitutes the first determination of theoretical Bader charges on a polyoxotungstate compound via the Atoms in Molecules theory. An analysis of electrostatic potential maps at the molecular surface and of the charge of WPA, resulting from DFT calculations, suggests that the preferred protonation site is a bridging oxygen of WPA. These results shed light on WPA chemical reactivity and its potential biological applications, such as the inhibition of ATPase activity. Copyright © 2017 Elsevier Inc. All rights reserved.
BioAssay Research Database (BARD): chemical biology and probe-development enabled by structured metadata and result types

PubMed Central

Howe, E.A.; de Souza, A.; Lahr, D.L.; Chatwin, S.; Montgomery, P.; Alexander, B.R.; Nguyen, D.-T.; Cruz, Y.; Stonich, D.A.; Walzer, G.; Rose, J.T.; Picard, S.C.; Liu, Z.; Rose, J.N.; Xiang, X.; Asiedu, J.; Durkin, D.; Levine, J.; Yang, J.J.; Schürer, S.C.; Braisted, J.C.; Southall, N.; Southern, M.R.; Chung, T.D.Y.; Brudz, S.; Tanega, C.; Schreiber, S.L.; Bittker, J.A.; Guha, R.; Clemons, P.A.

2015-01-01

BARD, the BioAssay Research Database (https://bard.nih.gov/) is a public database and suite of tools developed to provide access to bioassay data produced by the NIH Molecular Libraries Program (MLP). Data from 631 MLP projects were migrated to a new structured vocabulary designed to capture bioassay data in a formalized manner, with particular emphasis placed on the description of assay protocols. New data can be submitted to BARD with a user-friendly set of tools that assist in the creation of appropriately formatted datasets and assay definitions. Data published through the BARD application program interface (API) can be accessed by researchers using web-based query tools or a desktop client. Third-party developers wishing to create new tools can use the API to produce stand-alone tools or new plug-ins that can be integrated into BARD. The entire BARD suite of tools therefore supports three classes of researcher: those who wish to publish data, those who wish to mine data for testable hypotheses, and those in the developer community who wish to build tools that leverage this carefully curated chemical biology resource. PMID:25477388
The Comparative Toxicogenomics Database (CTD): A Resource for Comparative Toxicological Studies

PubMed Central

Mattingly, CJ; Rosenstein, MC; Colby, GT; Forrest, JN; Boyer, JL

2006-01-01

The etiology of most chronic diseases involves interactions between environmental factors and genes that modulate important biological processes (Olden and Wilson, 2000). We are developing the publicly available Comparative Toxicogenomics Database (CTD) to promote understanding about the effects of environmental chemicals on human health. CTD identifies interactions between chemicals and genes and facilitates cross-species comparative studies of these genes. The use of diverse animal models and cross-species comparative sequence studies has been critical for understanding basic physiological mechanisms and gene and protein functions. Similarly, these approaches will be valuable for exploring the molecular mechanisms of action of environmental chemicals and the genetic basis of differential susceptibility. PMID:16902965
Application of molecular docking for the degradation of organic pollutants in the environmental remediation: A review.

PubMed

Liu, Zhifeng; Liu, Yujie; Zeng, Guangming; Shao, Binbin; Chen, Ming; Li, Zhigang; Jiang, Yilin; Liu, Yang; Zhang, Yu; Zhong, Hua

2018-07-01

Molecular docking has been employed successfully in the past few years to study the mechanisms of biodegradation in environmental remediation, although medical science and biology remain its main application areas. Molecular docking is a convenient, low-cost method for understanding, with high accuracy, the reaction mechanisms of proteins or enzymes with ligands. This paper mainly reviews the application of molecular docking between organic pollutants and enzymes. It summarizes the fundamentals of molecular docking, such as its theory, available software and main databases. Moreover, five types of pollutants, including phenols, BTEX (benzene, toluene, ethylbenzene, and xylenes), nitriles, polycyclic aromatic hydrocarbons (PAHs), and high polymers (e.g., lignin and cellulose), are discussed at the molecular level. Different removal mechanisms are also explained in detail via docking technology. Although this method shows promise in biodegradation research, further studies are still needed to relate its results to actual conditions. Copyright © 2018 Elsevier Ltd. All rights reserved.
Freshwater Biological Traits Database (Data Sources)

EPA Science Inventory

When EPA released the final report, Freshwater Biological Traits Database, it referenced numerous data sources, which are included below. The Traits Database report covers the development of a database of freshwater biological traits with additional traits that are relevan...
Database resources of the National Center for Biotechnology Information

PubMed Central

Wheeler, David L.; Barrett, Tanya; Benson, Dennis A.; Bryant, Stephen H.; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M.; DiCuccio, Michael; Edgar, Ron; Federhen, Scott; Feolo, Michael; Geer, Lewis Y.; Helmberg, Wolfgang; Kapustin, Yuri; Khovayko, Oleg; Landsman, David; Lipman, David J.; Madden, Thomas L.; Maglott, Donna R.; Miller, Vadim; Ostell, James; Pruitt, Kim D.; Schuler, Gregory D.; Shumway, Martin; Sequeira, Edwin; Sherry, Steven T.; Sirotkin, Karl; Souvorov, Alexandre; Starchenko, Grigory; Tatusov, Roman L.; Tatusova, Tatiana A.; Wagner, Lukas; Yaschenko, Eugene

2008-01-01

In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data available through NCBI's web site. NCBI resources include Entrez, the Entrez Programming Utilities, My NCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link, Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genome, Genome Project and related tools, the Trace, Assembly, and Short Read Archives, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups, Influenza Viral Resources, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Entrez Probe, GENSAT, Database of Genotype and Phenotype, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool and the PubChem suite of small molecule databases. Augmenting the web applications are custom implementations of the BLAST program optimized to search specialized data sets. These resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov. PMID:18045790
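The Entrez Programming Utilities listed above are reached over plain HTTP. The sketch below only builds request URLs for the ESearch and EFetch endpoints and sends nothing; it assumes the current public base URL (https://eutils.ncbi.nlm.nih.gov/entrez/eutils/), which may differ from what was offered in 2008, and the accession used in the usage example is purely illustrative.

```python
from urllib.parse import urlencode

# Current public E-utilities base URL (assumption: may differ from 2008-era endpoints).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """URL for an ESearch query returning up to `retmax` matching record IDs."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def efetch_url(db: str, ids: list, rettype: str = "fasta", retmode: str = "text") -> str:
    """URL for an EFetch request that downloads the given records."""
    params = {
        "db": db,
        "id": ",".join(ids),  # EFetch takes a comma-separated ID list
        "rettype": rettype,
        "retmode": retmode,
    }
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    # Illustrative accession; fetching this URL would return a FASTA record as text.
    print(efetch_url("nuccore", ["NM_000546"]))
    print(esearch_url("pubmed", "GenBank database", retmax=5))
```

Building the query with `urlencode` keeps spaces and commas correctly escaped. NCBI also asks heavy users to identify themselves (for example with an API key), which this sketch leaves out.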
CORUM: the comprehensive resource of mammalian protein complexes

PubMed Central

Ruepp, Andreas; Brauner, Barbara; Dunger-Kaltenbach, Irmtraud; Frishman, Goar; Montrone, Corinna; Stransky, Michael; Waegele, Brigitte; Schmidt, Thorsten; Doudieu, Octave Noubibou; Stümpflen, Volker; Mewes, H. Werner

2008-01-01

Protein complexes are key molecular entities that integrate multiple gene products to perform cellular functions. The CORUM (http://mips.gsf.de/genre/proj/corum/index.html) database is a collection of experimentally verified mammalian protein complexes. Information is derived manually by expert annotators through critical reading of the scientific literature. Information about protein complexes includes complex names, subunits, literature references and the function of the complexes. For functional annotation, we use the FunCat catalogue, which organizes the protein complex space into biologically meaningful subsets. The database contains more than 1750 protein complexes built from 2400 different genes, representing 12% of the protein-coding genes in human. A web-based system is available to query, view and download the data. CORUM provides a comprehensive dataset of protein complexes for discoveries in systems biology, analyses of protein networks and protein complex-associated diseases. Comparable to the MIPS reference dataset of protein complexes from yeast, CORUM is intended to serve as a reference for mammalian protein complexes. PMID:17965090
Bioinformatics Goes to School—New Avenues for Teaching Contemporary Biology

PubMed Central

Wood, Louisa; Gebhardt, Philipp

2013-01-01

Since 2010, the European Molecular Biology Laboratory's (EMBL) Heidelberg laboratory and the European Bioinformatics Institute (EMBL-EBI) have jointly run bioinformatics training courses developed specifically for secondary school science teachers within Europe and EMBL member states. These courses focus on introducing bioinformatics, databases, and data-intensive biology, allowing participants to explore resources and providing classroom-ready materials to support them in sharing this new knowledge with their students. In this article, we chart our progress made in creating and running three bioinformatics training courses, including how the course resources are received by participants and how these, and bioinformatics in general, are subsequently used in the classroom. We assess the strengths and challenges of our approach, and share what we have learned through our interactions with European science teachers. PMID:23785266
Strategies for Advancing Disease Definition Using Biomarkers and Genetics: The Bipolar and Schizophrenia Network for Intermediate Phenotypes.

PubMed

Tamminga, Carol A; Pearlson, Godfrey D; Stan, Ana D; Gibbons, Robert D; Padmanabhan, Jaya; Keshavan, Matcheri; Clementz, Brett A

2017-01-01

It is critical for psychiatry as a field to develop approaches to define the molecular, cellular, and circuit basis of its brain diseases, especially for serious mental illnesses, and then to use these definitions to generate biologically based disease categories, as well as to explore disease mechanisms and illness etiologies. Our current reliance on phenomenology is inadequate to support exploration of molecular treatment targets and disease formulations, and the leap directly from phenomenology to disease biology has been limiting because of broad heterogeneity within conventional diagnoses. The questions addressed in this review are formulated around how we can use brain biomarkers to achieve disease categories that are biologically based. We have grouped together a series of vignettes as examples of early approaches, all using the Bipolar and Schizophrenia Network on Intermediate Phenotypes (BSNIP) biomarker database and collaborators, starting off with describing the foundational statistical methods for these goals. We use primarily criterion-free statistics to identify pertinent groups of involved genes related to psychosis as well as symptoms, and finally, to create new biologically based disease cohorts within the psychopathological dimension of psychosis. Although we do not put these results forward as final formulations, they represent a novel effort to rely minimally on phenomenology as a diagnostic tool and to fully embrace brain characteristics of structure, as well as molecular and cellular characteristics and function, to support disease definition in psychosis. Copyright © 2016. Published by Elsevier Inc.
STRUCTURAL BIOLOGY AND MOLECULAR MEDICINE RESEARCH PROGRAM (LSBMM)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Eisenberg, David S.

2008-07-15

The UCLA-DOE Institute of Genomics and Proteomics is an organized research unit of the University of California, sponsored by the Department of Energy through the mechanism of a Cooperative Agreement. Today the Institute consists of 10 Principal Investigators and 7 Associate Members, developing and applying technologies to promote the biological and environmental missions of the Department of Energy, and 5 Core Technology Centers to sustain this work. The focus is on understanding genomes, pathways and molecular machines in organisms of interest to DOE, with special emphasis on developing enabling technologies. Since it was founded in 1947, the UCLA-DOE Institute has adapted its mission to the research needs of DOE and its progenitor agencies as these research needs have changed. The Institute started as the AEC Laboratory of Nuclear Medicine, directed by Stafford Warren, who later became the founding Dean of the UCLA School of Medicine. In this sense, the entire UCLA medical center grew out of the precursor of our Institute. In 1963, the mission of the Institute was expanded into environmental studies by Director Ray Lunt. I became the third director in 1993 and, in close consultation with David Galas and John Wooley of DOE, shifted the mission of the Institute towards genomics and proteomics. Since 1993, the Principal Investigators and Core Technology Centers have been entirely renewed, and the Institute has separated from its former division concerned with PET imaging. The UCLA-DOE Institute shares the space of Boyer Hall with the Molecular Biology Institute, and assumes responsibility for the operation of the main core facilities. Fig. 1 gives the organizational chart of the Institute.
Some of the benefits to the public of research carried out at the UCLA-DOE Institute include the following: The development of publicly accessible, web-based databases, including the Database of Protein Interactions, and the ProLinks database of genomically inferred protein function linkages. The development of publicly accessible, web-based servers, including the HOTPATCH server, the ProKnow Server and the SAVEs server. All of these are accessible from the home page of the Institute. Advancing the science of bioenergy, in the laboratories of the Principal Investigators of the Institute, including the laboratories of Shimon Weiss, James Liao, James Bowie, Todd Yeates and Rob Gunsalus.
Mining and gene ontology based annotation of SSR markers from expressed sequence tags of Humulus lupulus

PubMed Central

Singh, Swati; Gupta, Sanchita; Mani, Ashutosh; Chaturvedi, Anoop

2012-01-01

Humulus lupulus, commonly known as hops, is a member of the family Moraceae. Many projects are currently underway, leading to the accumulation of voluminous genomic and expressed sequence tag (EST) sequences in public databases. The genetically characterized domains in these databases are limited by the lack of reliable molecular markers. A large amount of EST data is available for hops. In the present study, simple sequence repeat (SSR) markers extracted from EST data are used as molecular markers for genetic characterization. A total of 25,495 EST sequences were examined and assembled to obtain full-length sequences. Mononucleotide SSR motifs showed the highest frequency (60.44% in contigs and 62.16% in singletons), whereas the lowest frequencies were observed for hexanucleotide SSRs in contigs (0.09%) and pentanucleotide SSRs in singletons (0.12%). Most trinucleotide motifs code for glutamic acid (GAA), while AT/TA was the most frequent dinucleotide repeat. Flanking primer pairs were designed in silico for the SSR-containing sequences. Functional categorization of the SSR-containing sequences was done through the Gene Ontology categories biological process, cellular component and molecular function. PMID:22368382
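The SSR-mining step described above amounts to scanning each sequence for perfect tandem repeats of 1 to 6 bp motifs. The sketch below does this with regular expressions. The minimum repeat counts in `MIN_REPEATS` are an assumption, since the abstract does not state the study's thresholds, and compound or imperfect repeats are not handled.

```python
import re

# Minimum number of tandem copies per motif length (1 = mono ... 6 = hexa).
# Assumed thresholds for illustration; the study's own criteria are not given.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT', 'AA')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq: str):
    """Return sorted (start, motif, copies) tuples for perfect SSRs in a DNA sequence."""
    seq = seq.upper()
    hits = []
    for k, n in MIN_REPEATS.items():
        # ([ACGT]{k}) captures one motif; \1{n-1,} requires at least n copies in total.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip e.g. 'AA' repeats at k=2, which are really mononucleotide runs.
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    print(find_ssrs("GG" + "A" * 12 + "C" + "GAA" * 6 + "T"))
```

Rejecting non-primitive motifs keeps a single poly-A run from being reported again as a di-, tri- or tetranucleotide repeat.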

A Combined Pharmacophore Modeling, 3D QSAR and Virtual Screening Studies on Imidazopyridines as B-Raf Inhibitors

PubMed Central

Xie, Huiding; Chen, Lijun; Zhang, Jianqiang; Xie, Xiaoguang; Qiu, Kaixiong; Fu, Jijun

2015-01-01

B-Raf kinase is an important target in cancer treatment. To design and identify potent B-Raf inhibitors (BRIs), 3D pharmacophore models were created using the Genetic Algorithm with Linear Assignment of Hypermolecular Alignment of Database (GALAHAD). The best pharmacophore model, which was used to align the data set, contains two acceptor atoms, three donor atoms and three hydrophobes. Subsequently, comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) were performed on 39 imidazopyridine BRIs to build three-dimensional quantitative structure-activity relationship (3D QSAR) models based on both pharmacophore and docking alignments. The CoMSIA model based on the pharmacophore alignment gives the best result (q² = 0.621, r²pred = 0.885). This 3D QSAR approach provides significant insights that are useful for designing potent BRIs. In addition, the best pharmacophore model was used for virtual screening against the NCI2000 database. The hit compounds were further filtered by molecular docking, their biological activities were predicted with the CoMSIA model, and three potential BRIs with new skeletons were obtained. PMID:26035757
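The q² reported above is a cross-validated r². The sketch below computes a leave-one-out q² = 1 - PRESS / SS_total. CoMFA and CoMSIA use partial least squares (PLS) regression; as a plainly swapped-in stand-in, this sketch fits a one-variable least-squares line, so it illustrates the validation statistic rather than the study's models. The abstract does not state the exact cross-validation protocol.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares y = a + b*x (stand-in for the PLS models of CoMFA/CoMSIA)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def loo_q2(xs, ys, fit=fit_line):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_total."""
    press = 0.0
    for i in range(len(xs)):
        # Refit without compound i, then predict its held-out activity.
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - model(xs[i])) ** 2
    y_bar = mean(ys)
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - press / ss_total


if __name__ == "__main__":
    print(loo_q2([1, 2, 3, 4, 5], [3, 5, 7, 9, 11]))
```

Unlike r² on the training data, q² can fall below zero when held-out predictions are worse than simply predicting the mean activity, which is why it is reported as a check on model robustness.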
A Combined Pharmacophore Modeling, 3D QSAR and Virtual Screening Studies on Imidazopyridines as B-Raf Inhibitors.

PubMed

Xie, Huiding; Chen, Lijun; Zhang, Jianqiang; Xie, Xiaoguang; Qiu, Kaixiong; Fu, Jijun

2015-05-29

B-Raf kinase is an important target in cancer treatment. To design and identify potent B-Raf inhibitors (BRIs), 3D pharmacophore models were created using the Genetic Algorithm with Linear Assignment of Hypermolecular Alignment of Database (GALAHAD). The best pharmacophore model, which was used to align the data set, contains two acceptor atoms, three donor atoms and three hydrophobes. Subsequently, comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) were performed on 39 imidazopyridine BRIs to build three-dimensional quantitative structure-activity relationship (3D QSAR) models based on both pharmacophore and docking alignments. The CoMSIA model based on the pharmacophore alignment gives the best result (q² = 0.621, r²pred = 0.885). This 3D QSAR approach provides significant insights that are useful for designing potent BRIs. In addition, the best pharmacophore model was used for virtual screening against the NCI2000 database. The hit compounds were further filtered by molecular docking, their biological activities were predicted with the CoMSIA model, and three potential BRIs with new skeletons were obtained.
Banking biological collections: data warehousing, data mining, and data dilemmas in genomics and global health policy.

PubMed

Blatt, R J R

2000-01-01

While DNA databases may offer the opportunity to (1) assess population-based prevalence of specific genes and variants, (2) simplify the search for molecular markers, (3) improve targeted drug discovery and development for disease management, (4) refine strategies for disease prevention, and (5) provide the data necessary for evidence-based decision-making, serious scientific and social questions remain. Whether samples are identified, coded, or anonymous, biological banking raises profound ethical and legal issues pertaining to access, informed consent, privacy and confidentiality of genomic information, civil liberties, patenting, and proprietary rights. This paper provides an overview of key policy issues and questions pertaining to biological banking, with a focus on developments in specimen collection, transnational distribution, and public health and academic-industry research alliances. It highlights the challenges posed by the commercialization of genomics, and proposes the need for harmonization of biological banking policies.
REDIdb: an upgraded bioinformatics resource for organellar RNA editing sites.

PubMed

Picardi, Ernesto; Regina, Teresa M R; Verbitskiy, Daniil; Brennicke, Axel; Quagliariello, Carla

2011-03-01

RNA editing is a post-transcriptional molecular process whereby the information in a genetic message is modified from that in the corresponding DNA template by means of nucleotide substitutions, insertions and/or deletions. It occurs mostly in organelles, through diverse, clade-specific and unrelated biochemical mechanisms. RNA editing events have been annotated in primary databases such as GenBank and, at a more sophisticated level, in the specialized databases REDIdb, dbRES and EdRNA. At present, REDIdb is the only freely available database that focuses on the organellar RNA editing process and annotates each editing modification in its biological context. Here we present an updated and upgraded release of REDIdb with a refurbished web interface offering graphical and computational facilities that improve RNA editing investigations. The features and novelties of REDIdb are illustrated in detail and compared with those of other RNA editing databases. REDIdb can be queried freely at http://biologia.unical.it/py_script/REDIdb/. Copyright © 2010 Elsevier B.V. and Mitochondria Research Society. All rights reserved.
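For substitution-type editing, candidate sites can be found by comparing a transcript (RNA or cDNA) with its DNA template. The sketch below assumes the two sequences are already aligned without gaps, so insertion and deletion editing, which the abstract also mentions, is out of scope. It reports 1-based positions and reads T in the template as U in the transcript. It is an illustration only, not REDIdb's own annotation procedure.

```python
def substitution_edits(dna: str, rna: str):
    """Return (position, template_base, transcript_base) for each mismatch between a
    gap-free aligned DNA template and its transcript (RNA or cDNA)."""
    if len(dna) != len(rna):
        raise ValueError("sequences must be aligned to the same length (indels not handled)")
    dna = dna.upper()
    rna = rna.upper().replace("T", "U")  # accept cDNA written with T
    edits = []
    for pos, (d, r) in enumerate(zip(dna, rna), start=1):
        expected = "U" if d == "T" else d  # base expected in an unedited transcript
        if r != expected:
            edits.append((pos, d, r))
    return edits


if __name__ == "__main__":
    # Toy example: a single C-to-U change at position 5.
    print(substitution_edits("ATGCCATCG", "AUGCUAUCG"))
```

Because T is normalized to U before comparison, the same function works whether the transcript was recorded as RNA or as cDNA.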
EcoCyc: a comprehensive database resource for Escherichia coli

PubMed Central

Keseler, Ingrid M.; Collado-Vides, Julio; Gama-Castro, Socorro; Ingraham, John; Paley, Suzanne; Paulsen, Ian T.; Peralta-Gil, Martín; Karp, Peter D.

2005-01-01

The EcoCyc database (http://EcoCyc.org/) is a comprehensive source of information on the biology of the prototypical model organism Escherichia coli K12. The mission for EcoCyc is to contain both computable descriptions of, and detailed comments describing, all genes, proteins, pathways and molecular interactions in E. coli. Through ongoing manual curation, extensive information such as summary comments, regulatory information, literature citations and evidence types has been extracted from 8862 publications and added to Version 8.5 of the EcoCyc database. The EcoCyc database can be accessed through a World Wide Web interface, while the downloadable Pathway Tools software and data files enable computational exploration of the data and provide enhanced querying capabilities that web interfaces cannot support. For example, EcoCyc contains carefully curated information that can be used as training sets for bioinformatics prediction of entities such as promoters, operons, genetic networks, transcription factor binding sites, metabolic pathways, functionally related genes, protein complexes and protein–ligand interactions. PMID:15608210
EDULISS: a small-molecule database with data-mining and pharmacophore searching capabilities

PubMed Central

Hsin, Kun-Yi; Morgan, Hugh P.; Shave, Steven R.; Hinton, Andrew C.; Taylor, Paul; Walkinshaw, Malcolm D.

2011-01-01

We present the relational database EDULISS (EDinburgh University Ligand Selection System), which stores structural, physicochemical and pharmacophoric properties of small molecules. The database comprises a collection of over 4 million commercially available compounds from 28 different suppliers. A user-friendly web-based interface for EDULISS (available at http://eduliss.bch.ed.ac.uk/) has been established providing a number of data-mining possibilities. For each compound a single 3D conformer is stored along with over 1600 calculated descriptor values (molecular properties). A very efficient method for unique compound recognition, especially for a large scale database, is demonstrated by making use of small subgroups of the descriptors. Many of the shape and distance descriptors are held as pre-calculated bit strings permitting fast and efficient similarity and pharmacophore searches which can be used to identify families of related compounds for biological testing. Two ligand searching applications are given to demonstrate how EDULISS can be used to extract families of molecules with selected structural and biophysical features. PMID:21051336
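Pre-calculated bit strings make similarity search cheap because each comparison reduces to bitwise AND/OR operations and bit counts. The sketch below scores fingerprints stored as Python integers with the Tanimoto coefficient, a common choice for binary fingerprints. The abstract does not say which similarity measure or bit layout EDULISS uses, so both are assumptions here, as are the toy fingerprints and compound names.

```python
def popcount(x: int) -> int:
    """Number of set bits (portable alternative to int.bit_count, which needs Python 3.10+)."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for two bit-string fingerprints."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 1.0


def has_all_bits(query: int, fp: int) -> bool:
    """Feature screen: True if fp sets every bit that the query sets."""
    return (query & fp) == query


def search(query: int, library: dict, threshold: float = 0.7):
    """Return (name, score) pairs at or above the threshold, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted((h for h in scored if h[1] >= threshold), key=lambda h: -h[1])


if __name__ == "__main__":
    # Hypothetical 4-bit fingerprints for three library compounds.
    library = {"cpd_a": 0b1011, "cpd_b": 0b0011, "cpd_c": 0b0100}
    print(search(0b0011, library, threshold=0.6))
```

The superset test in `has_all_bits` is the kind of cheap prefilter that can discard compounds lacking a required feature before any more expensive pharmacophore matching.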
Can all heritable biology really be reduced to a single dimension?

PubMed

Babbitt, Gregory A; Coppola, Erin E; Alawad, Mohammed A; Hudson, André O

2016-03-10

A long-held presupposition in the field of bioinformatics holds that genetic, and now even epigenetic, 'information' can be abstracted from the physicochemical details of the macromolecular polymers in which it resides. It is perhaps rather ironic that this basic conjecture originated with the first observations of DNA structure itself. This static model of DNA led very quickly to the conclusion that only the nucleobase sequence itself is rich enough in molecular complexity to replicate a complex biology. This idea has been pervasive throughout genomic science, higher education and popular culture ever since, to the point that most of us would accept it unquestioningly as fact. What is more alarming is that this conjecture is driving a significant portion of the technological development in modern genomics towards methods strongly rooted in DNA sequencing, thereby reducing a dynamic multi-dimensional biology to single-dimensional forms of data. Evidence countering this central tenet of bioinformatics has been quietly mounting over many decades, prompting some to propose that the genome must be studied from the perspective of its molecular reality, rather than as a body of information to be represented symbolically. Here, we explore the epistemological boundary between bioinformatics and molecular biology, and warn against an 'overtly' bioinformatic perspective. We review a selection of new bioinformatic methods that move beyond sequence-based approaches to include consideration of three-dimensional structures held in databases. However, we also note that these hybrid methods still ignore the most important element of gene function when attempting to improve outcomes: the fourth dimension of molecular dynamics over time. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
GenBank.

PubMed

Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Sayers, Eric W

2010-01-01

GenBank is a comprehensive database that contains publicly available nucleotide sequences for more than 300,000 organisms named at the genus level or lower, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Molecular Biology Laboratory Nucleotide Sequence Database in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bi-monthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI homepage: www.ncbi.nlm.nih.gov.
GenBank.

PubMed

Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Sayers, Eric W

2009-01-01

GenBank is a comprehensive database that contains publicly available nucleotide sequences for more than 300,000 organisms named at the genus level or lower, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank(R) staff upon receipt. Daily data exchange with the European Molecular Biology Laboratory Nucleotide Sequence Database in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the National Center for Biotechnology Information (NCBI) Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage: www.ncbi.nlm.nih.gov.
Search extension transforms Wiki into a relational system: a case for flavonoid metabolite database.

PubMed

Arita, Masanori; Suwa, Kazuhiro

2008-09-17

In computer science, database systems are based on the relational model founded by Edgar Codd in 1970. In biology, on the other hand, the word 'database' often refers to loosely formatted, very large text files. Although such bio-databases can describe conflicts or ambiguities (e.g. a protein pair reported both to interact and not to interact, or unknown parameters) in a positive sense, the flexibility of the data format sacrifices a systematic query mechanism equivalent to the widely used SQL. To overcome this disadvantage, we propose embeddable string-search commands on a Wiki-based system and designed a half-formatted database. As proof of principle, a flavonoid database with 6902 molecular structures from over 1687 plant species was implemented on MediaWiki, the system underlying Wikipedia. Registered users can describe any information in an arbitrary format. The structured part is subject to text-string searches that realize relational operations. The system was written in PHP as an extension of MediaWiki. All modifications are open-source and publicly available. This scheme benefits from both the free-formatted Wiki style and the concise, structured relational-database style. MediaWiki supports multi-user environments for document management, and the cost of database maintenance is reduced.
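The idea of running relational operations over the structured part of otherwise free-form pages can be sketched in a few lines. The `|field=value` line syntax below is hypothetical, since the abstract does not describe the extension's markup, and the original system is a MediaWiki extension written in PHP rather than Python. The toy pages are illustrative only.

```python
import re

# Hypothetical markup: structured lines look like "|field=value"; everything else is free text.
FIELD_LINE = re.compile(r"^\|(\w+)=(.*)$", re.MULTILINE)


def fields(page_text: str) -> dict:
    """Collect the structured part of a page as field -> list of values."""
    out = {}
    for key, value in FIELD_LINE.findall(page_text):
        out.setdefault(key, []).append(value.strip())
    return out


def select(pages: dict, key: str, value: str) -> list:
    """Selection: sorted titles of pages whose `key` field includes `value`."""
    return sorted(t for t, text in pages.items() if value in fields(text).get(key, []))


def project(pages: dict, key: str) -> dict:
    """Projection: each page title mapped to the values of one field."""
    return {t: fields(text).get(key, []) for t, text in pages.items()}


# Illustrative pages: free text is allowed alongside the structured lines.
pages = {
    "Quercetin": "Free-form notes are allowed.\n|class=flavonol\n|species=Allium cepa\n|species=Camellia sinensis",
    "Kaempferol": "|class=flavonol\n|species=Camellia sinensis",
    "Naringenin": "|class=flavanone\n|species=Citrus paradisi",
}

if __name__ == "__main__":
    print(select(pages, "class", "flavonol"))
    print(project(pages, "species"))
```

Because only lines matching the field pattern are parsed, users keep full freedom in the rest of the page while queries still behave like selections and projections over a table.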
Search extension transforms Wiki into a relational system: A case for flavonoid metabolite database

PubMed Central

Arita, Masanori; Suwa, Kazuhiro

2008-01-01

Background: In computer science, database systems are based on the relational model founded by Edgar Codd in 1970. In biology, on the other hand, the word 'database' often refers to loosely formatted, very large text files. Although such bio-databases can describe conflicts or ambiguities (e.g. a protein pair reported both to interact and not to interact, or unknown parameters) in a positive sense, the flexibility of the data format sacrifices a systematic query mechanism equivalent to the widely used SQL. Results: To overcome this disadvantage, we propose embeddable string-search commands on a Wiki-based system and designed a half-formatted database. As proof of principle, a flavonoid database with 6902 molecular structures from over 1687 plant species was implemented on MediaWiki, the system underlying Wikipedia. Registered users can describe any information in an arbitrary format. The structured part is subject to text-string searches that realize relational operations. The system was written in PHP as an extension of MediaWiki. All modifications are open-source and publicly available. Conclusion: This scheme benefits from both the free-formatted Wiki style and the concise, structured relational-database style. MediaWiki supports multi-user environments for document management, and the cost of database maintenance is reduced. PMID:18822113
PrenDB, a Substrate Prediction Database to Enable Biocatalytic Use of Prenyltransferases.

PubMed

Gunera, Jakub; Kindinger, Florian; Li, Shu-Ming; Kolb, Peter

2017-03-10

Prenyltransferases of the dimethylallyltryptophan synthase (DMATS) superfamily catalyze the attachment of prenyl or prenyl-like moieties to diverse acceptor compounds. These acceptor molecules are generally aromatic in nature and mostly indole or indole-like. Their catalytic transformation represents a major skeletal diversification step in the biosynthesis of secondary metabolites, including the indole alkaloids. DMATS enzymes thus contribute significantly to the biological and pharmacological diversity of small molecule metabolites. Understanding the substrate specificity of these enzymes could create opportunities for their biocatalytic use in preparing complex synthetic scaffolds. However, there has been no framework to achieve this in a rational way. Here, we report a chemoinformatic pipeline to enable prenyltransferase substrate prediction. We systematically catalogued 32 unique prenyltransferases and 167 unique substrates to create possible reaction matrices and compiled these data into a browsable database named PrenDB. We then used a newly developed algorithm based on molecular fragmentation to automatically extract reactive chemical epitopes. The analysis of the collected data sheds light on the thus far explored substrate space of DMATS enzymes. To assess the predictive performance of our virtual reaction extraction tool, 38 potential substrates were tested as prenyl acceptors in assays with three prenyltransferases, and we were able to detect turnover in >55% of the cases. The database, PrenDB (www.kolblab.org/prendb.php), enables the prediction of potential substrates for chemoenzymatic synthesis through substructure similarity and virtual chemical transformation techniques. It aims at making prenyltransferases and their highly regio- and stereoselective reactions accessible to the research community for integration in synthetic work flows. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
neXtA5: accelerating annotation of articles via automated approaches in neXtProt.

PubMed

Mottin, Luc; Gobeill, Julien; Pasche, Emilie; Michel, Pierre-André; Cusin, Isabelle; Gaudet, Pascale; Ruch, Patrick

2016-01-01

The rapid increase in the number of published articles poses a challenge for curated databases to remain up-to-date. To help the scientific community and database curators deal with this issue, we have developed an application, neXtA5, which prioritizes the literature for specific curation requirements. Our system, neXtA5, is a curation service composed of three main elements. The first component is a named-entity recognition module, which annotates MEDLINE along some predefined axes. This report focuses on three axes: Diseases, and the Molecular Function and Biological Process sub-ontologies of the Gene Ontology (GO). The automatic annotations are then stored in a local database, BioMed, for each annotation axis. Additional entities such as species and chemical compounds are also identified. The second component is an existing search engine, which retrieves the most relevant MEDLINE records for any given query. The third component uses the content of BioMed to generate an axis-specific ranking, which takes into account the density of named entities as stored in the BioMed database. The two ranked lists are ultimately merged using a linear combination, which has been specifically tuned to support the annotation of each axis. The fine-tuning of the coefficients is formally reported for each axis-driven search. Compared with PubMed, the system used by most curators, the improvements are as follows: +231% for Diseases, +236% for Molecular Functions and +3153% for Biological Process when measuring the precision of the top-returned PMID (P0 or mean reciprocal rank). The current search methods significantly improve the search effectiveness of curators for three important curation axes. Further experiments are being performed to extend the curation types, in particular protein-protein interactions, which require specific relationship extraction capabilities.
In parallel, user-friendly interfaces powered with a set of JSON web services are currently being implemented into the neXtProt annotation pipeline. Available at: http://babar.unige.ch:8082/neXtA5. Database URL: http://babar.unige.ch:8082/neXtA5/fetcher.jsp. © The Author(s) 2016. Published by Oxford University Press.
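The merging step described above, a tuned linear combination of a search-engine ranking and an entity-density ranking, can be sketched as follows, together with mean reciprocal rank (MRR) for evaluation. The min-max normalization, the weight name `lam`, the grid search, and the toy PMIDs are all assumptions; the abstract does not give neXtA5's exact scoring or tuning procedure.

```python
def minmax(scores: dict) -> dict:
    """Rescale one non-empty list's scores to [0, 1] so the two lists are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {doc: (s - lo) / span if span else 1.0 for doc, s in scores.items()}


def fuse(search: dict, density: dict, lam: float) -> list:
    """Rank PMIDs by lam * search_score + (1 - lam) * density_score.
    A PMID missing from one list contributes 0 for that component; ties break by PMID."""
    s, d = minmax(search), minmax(density)
    docs = set(s) | set(d)
    combined = {doc: lam * s.get(doc, 0.0) + (1 - lam) * d.get(doc, 0.0) for doc in docs}
    return sorted(docs, key=lambda doc: (-combined[doc], doc))


def mean_reciprocal_rank(rankings: list, relevant: list) -> float:
    """Average over queries of 1 / rank of the first relevant PMID (0 if none is found)."""
    if not rankings:
        return 0.0
    total = 0.0
    for ranking, rel in zip(rankings, relevant):
        for rank, doc in enumerate(ranking, start=1):
            if doc in rel:
                total += 1.0 / rank
                break
    return total / len(rankings)


def tune_lambda(queries: list, grid=(0.0, 0.25, 0.5, 0.75, 1.0)) -> float:
    """Grid-search the weight that maximizes MRR over (search, density, relevant) triples."""
    return max(
        grid,
        key=lambda lam: mean_reciprocal_rank(
            [fuse(s, d, lam) for s, d, _ in queries],
            [rel for _, _, rel in queries],
        ),
    )


if __name__ == "__main__":
    search = {"p1": 10.0, "p2": 8.0, "p3": 2.0}  # toy search-engine scores
    density = {"p2": 0.9, "p3": 0.8, "p4": 0.1}  # toy entity-density scores
    print(fuse(search, density, 0.5))
```

Tuning one weight per axis, as `tune_lambda` does on a labelled query set, mirrors the abstract's statement that the combination was tuned separately for Diseases, Molecular Function and Biological Process.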
Meta-All: a system for managing metabolic pathway information.

PubMed

Weise, Stephan; Grosse, Ivo; Klukas, Christian; Koschützki, Dirk; Scholz, Uwe; Schreiber, Falk; Junker, Björn H

2006-10-23

Many attempts are being made to understand biological subjects at a systems level. Biological databases are a major resource for these approaches, storing manifold information about DNA, RNA and protein sequences including their functional and structural motifs, molecular markers, mRNA expression levels, metabolite concentrations, protein-protein interactions, phenotypic traits or taxonomic relationships. The use of these databases is often hampered by the fact that they are designed for special application areas and thus lack universality. Databases on metabolic pathways, which provide an increasingly important foundation for many analyses of biochemical processes at a systems level, are no exception to the rule. Data stored in central databases such as KEGG, BRENDA or SABIO-RK is often limited to read-only access. If experimentalists want to store their own data, possibly still under investigation, there are two possibilities. They can either develop their own information system for managing that data, which is very time-consuming and costly, or they can try to store their data in existing systems, which is often restricted. Hence, an out-of-the-box information system for managing metabolic pathway data is needed. We have designed META-ALL, an information system that allows the management of metabolic pathways, including reaction kinetics, detailed locations, environmental factors and taxonomic information. Data can be stored together with quality tags and in different parallel versions. META-ALL uses Oracle DBMS and Oracle Application Express. We provide the META-ALL information system for download and use. In this paper, we describe the database structure and give information about the tools for submitting and accessing the data. As a first application of META-ALL, we show how the information contained in a detailed kinetic model can be stored and accessed. META-ALL is a system for managing information about metabolic pathways.
It facilitates the handling of pathway-related data and is designed to help biochemists and molecular biologists in their daily research. It is available on the Web at http://bic-gh.de/meta-all and can be downloaded free of charge and installed locally.
neXtA5: accelerating annotation of articles via automated approaches in neXtProt

PubMed Central

Mottin, Luc; Gobeill, Julien; Pasche, Emilie; Michel, Pierre-André; Cusin, Isabelle; Gaudet, Pascale; Ruch, Patrick

2016-01-01

The rapid increase in the number of published articles makes it challenging for curated databases to remain up to date. To help the scientific community and database curators deal with this issue, we have developed an application, neXtA5, which prioritizes the literature for specific curation requirements. Our system, neXtA5, is a curation service composed of three main elements. The first component is a named-entity recognition module, which annotates MEDLINE along several predefined axes. This report focuses on three axes: Diseases, and the Molecular Function and Biological Process sub-ontologies of the Gene Ontology (GO). The automatic annotations are then stored in a local database, BioMed, for each annotation axis. Additional entities such as species and chemical compounds are also identified. The second component is an existing search engine, which retrieves the most relevant MEDLINE records for any given query. The third component uses the content of BioMed to generate an axis-specific ranking, which takes into account the density of named entities as stored in the BioMed database. The two ranked lists are ultimately merged using a linear combination, which has been specifically tuned to support the annotation of each axis. The fine-tuning of the coefficients is formally reported for each axis-driven search. Compared with PubMed, the system used by most curators, neXtA5 improves the precision of the top-returned PMID (P0, or mean reciprocal rank) by 231% for Diseases, 236% for Molecular Functions and 3153% for Biological Process. The current search methods significantly improve the search effectiveness of curators for three important curation axes. Further experiments are being performed to extend the curation types, in particular to protein–protein interactions, which require specific relationship extraction capabilities. 
In parallel, user-friendly interfaces powered by a set of JSON web services are currently being implemented in the neXtProt annotation pipeline. Available at: http://babar.unige.ch:8082/neXtA5 Database URL: http://babar.unige.ch:8082/neXtA5/fetcher.jsp PMID:27374119
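The abstract states that the search-engine ranking and the BioMed entity-density ranking are merged by a linear combination with axis-specific coefficients, and that effectiveness is reported as P0 or mean reciprocal rank. The sketch below illustrates both ideas under stated assumptions: each component returns a score per PMID, scores are min-max normalized before mixing, and a single hypothetical coefficient `alpha` stands in for the tuned per-axis weights. The function names are illustrative only; this is not neXtA5's code.

```python
"""Sketch: linear fusion of two ranked PMID lists plus mean reciprocal rank.

Assumptions (not taken from neXtA5): both components return a score per
PMID, scores are min-max normalized, and `alpha` is a hypothetical
axis-specific coefficient that would be tuned per curation axis.
"""


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal they map to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in scores.items()}


def fuse(search: dict[str, float], density: dict[str, float], alpha: float) -> list[str]:
    """Rank PMIDs by alpha * search + (1 - alpha) * density.

    A PMID missing from one list contributes 0.0 for that component.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    s, d = min_max(search), min_max(density)
    combined = {p: alpha * s.get(p, 0.0) + (1 - alpha) * d.get(p, 0.0) for p in set(s) | set(d)}
    # Highest combined score first; ties broken by PMID for a stable order.
    return sorted(combined, key=lambda p: (-combined[p], p))


def reciprocal_rank(ranking: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant PMID, or 0.0 if none is retrieved."""
    for rank, pmid in enumerate(ranking, start=1):
        if pmid in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average reciprocal rank over (ranking, relevant PMIDs) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical scores for four made-up PMIDs.
    ranking = fuse({"1001": 12.0, "1002": 9.0, "1003": 3.0},
                   {"1002": 0.9, "1003": 0.8, "1004": 0.1}, alpha=0.5)
    print(ranking, reciprocal_rank(ranking, {"1001"}))
```

Normalizing first keeps a search-engine score on an arbitrary scale from swamping the density score; in practice `alpha` would be chosen per axis by maximizing the evaluation metric on curated queries.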
Meta-All: a system for managing metabolic pathway information

PubMed Central

Weise, Stephan; Grosse, Ivo; Klukas, Christian; Koschützki, Dirk; Scholz, Uwe; Schreiber, Falk; Junker, Björn H

2006-01-01

Background Many attempts are being made to understand biological subjects at a systems level. Biological databases are a major resource for these approaches, storing manifold information about DNA, RNA and protein sequences including their functional and structural motifs, molecular markers, mRNA expression levels, metabolite concentrations, protein-protein interactions, phenotypic traits or taxonomic relationships. The use of these databases is often hampered by the fact that they are designed for specific application areas and thus lack universality. Databases on metabolic pathways, which provide an increasingly important foundation for many analyses of biochemical processes at a systems level, are no exception to the rule. Data stored in central databases such as KEGG, BRENDA or SABIO-RK is often limited to read-only access. Experimentalists who want to store their own data, possibly still under investigation, have two options. They can either develop their own information system for managing those data, which is very time-consuming and costly, or they can try to store their data in existing systems, where storage is often restricted. Hence, an out-of-the-box information system for managing metabolic pathway data is needed. Results We have designed META-ALL, an information system that allows the management of metabolic pathways, including reaction kinetics, detailed locations, environmental factors and taxonomic information. Data can be stored together with quality tags and in different parallel versions. META-ALL uses Oracle DBMS and Oracle Application Express. We provide the META-ALL information system for download and use. In this paper, we describe the database structure and give information about the tools for submitting and accessing the data. As a first application of META-ALL, we show how the information contained in a detailed kinetic model can be stored and accessed. 
Conclusion META-ALL is a system for managing information about metabolic pathways. It facilitates the handling of pathway-related data and is designed to help biochemists and molecular biologists in their daily research. It is available on the Web at http://bic-gh.de/meta-all and can be downloaded free of charge and installed locally. PMID:17059592
The Transporter Classification Database: recent advances.

PubMed

Saier, Milton H; Yen, Ming Ren; Noto, Keith; Tamang, Dorjee G; Elkan, Charles

2009-01-01

The Transporter Classification Database (TCDB), freely accessible at http://www.tcdb.org, is a relational database containing sequence, structural, functional and evolutionary information about transport systems from a variety of living organisms, based on the transporter classification (TC) system approved by the International Union of Biochemistry and Molecular Biology. It is a curated repository of factual information compiled largely from published references. It uses a functional/phylogenetic system of classification and currently encompasses about 5000 representative transporters and putative transporters in more than 500 families. Here we describe novel software designed to support and extend the usefulness of TCDB. Our recent efforts make it more user-friendly, incorporate machine learning to input novel data in a semiautomatic fashion, and allow analyses that are more accurate and less time-consuming. The availability of these tools has led to the recognition of distant phylogenetic relationships and a tremendous expansion of the information available to TCDB users.
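In the TC system a transporter is identified by a five-part TC number (class, subclass letter, family, subfamily, transporter), for example 2.A.1.1.1. The sketch below parses such identifiers and groups them by family, the level at which TCDB's "more than 500 families" are counted. The helper names are hypothetical and are not part of TCDB's software.

```python
"""Sketch: parsing five-level TC numbers and grouping them by family.

Helper names are illustrative; they are not TCDB's own tools.
"""
import re
from typing import NamedTuple

# class (number) . subclass (letter) . family . subfamily . transporter
TC_PATTERN = re.compile(r"^(\d+)\.([A-Z])\.(\d+)\.(\d+)\.(\d+)$")


class TCNumber(NamedTuple):
    tc_class: int
    subclass: str
    family: int
    subfamily: int
    transporter: int

    def family_id(self) -> str:
        """First three levels, e.g. '2.A.1' for '2.A.1.1.1'."""
        return f"{self.tc_class}.{self.subclass}.{self.family}"


def parse_tc(tc: str) -> TCNumber:
    """Split a full TC number into its five components."""
    m = TC_PATTERN.match(tc.strip())
    if m is None:
        raise ValueError(f"not a five-level TC number: {tc!r}")
    c, sub, fam, subfam, tr = m.groups()
    return TCNumber(int(c), sub, int(fam), int(subfam), int(tr))


def group_by_family(tc_numbers: list[str]) -> dict[str, list[str]]:
    """Map each family identifier to the TC numbers that belong to it."""
    groups: dict[str, list[str]] = {}
    for tc in tc_numbers:
        groups.setdefault(parse_tc(tc).family_id(), []).append(tc)
    return groups
```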
RiboDB Database: A Comprehensive Resource for Prokaryotic Systematics.

PubMed

Jauffrit, Frédéric; Penel, Simon; Delmotte, Stéphane; Rey, Carine; de Vienne, Damien M; Gouy, Manolo; Charrier, Jean-Philippe; Flandrois, Jean-Pierre; Brochier-Armanet, Céline

2016-08-01

Ribosomal proteins (r-proteins) are increasingly used as an alternative to ribosomal RNA (rRNA) for prokaryotic systematics. However, their routine use is difficult because r-proteins are often unannotated or misannotated in complete genome sequences, and there is currently no dedicated, exhaustive database of r-proteins. RiboDB aims to fill this gap. This weekly updated, comprehensive database allows fast and easy retrieval of r-protein sequences from publicly available complete prokaryotic genome sequences. The current version of RiboDB contains 90 r-proteins from 3,750 complete prokaryotic genomes encompassing 38 phyla/major classes and 1,759 different species. RiboDB is accessible at http://ribodb.univ-lyon1.fr and through ACNUC interfaces. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
IntNetDB v1.0: an integrated protein-protein interaction network database generated by a probabilistic model

PubMed Central

Xia, Kai; Dong, Dong; Han, Jing-Dong J

2006-01-01

Background Although protein-protein interaction (PPI) networks have been explored by various experimental methods, the maps built so far are still limited in coverage and accuracy. To further expand the PPI network and to extract more accurate information from existing maps, studies have integrated various types of functional relationship data. Still lacking, however, is a frequently updated database of computationally analyzed potential PPIs that gives biological researchers rapid and easy access to the original data, analyzed as a biological network. Results By applying a probabilistic model, we integrated 27 heterogeneous genomic, proteomic and functional annotation datasets to predict PPI networks in humans. In addition to previously studied data types, we show that phenotypic distances and genetic interactions can also be integrated to predict PPIs. We further built an easy-to-use, updatable integrated PPI database, the Integrated Network Database (IntNetDB) online, to provide automatic prediction and visualization of PPI networks among genes of interest. The networks can be visualized in SVG (Scalable Vector Graphics) format for zooming in or out. IntNetDB also provides a tool to extract topologically highly connected network neighborhoods from a specific network for further exploration and research. Using the MCODE (Molecular Complex Detection) algorithm, 190 such neighborhoods were detected among all the predicted interactions. The predicted PPIs can also be mapped to worm, fly and mouse interologs. Conclusion IntNetDB includes 180,010 predicted protein-protein interactions among 9,901 human proteins and represents a useful resource for the research community. Our study has increased prediction coverage fivefold. IntNetDB also provides easy-to-use network visualization and analysis tools that allow biological researchers unfamiliar with computational biology to access and analyze data over the internet. 
The web interface of IntNetDB is freely accessible at . Visualization requires Mozilla version 1.8 (or higher) or Internet Explorer with installation of SVGviewer. PMID:17112386
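The abstract says only that a "probabilistic model" integrates 27 datasets. A common technique for this kind of evidence integration is a naive Bayes combination of likelihood ratios, which assumes the datasets are conditionally independent given the interaction status. The sketch below shows that technique as a stand-in; it is not confirmed to be IntNetDB's model, and the prior and likelihood ratios used with it are hypothetical.

```python
"""Sketch: naive Bayes integration of PPI evidence via likelihood ratios.

Assumption: evidence sources are conditionally independent, so the posterior
odds are the prior odds times the product of per-source likelihood ratios.
This is a generic technique, not necessarily IntNetDB's model.
"""
import math


def likelihood_ratio(p_given_pos: float, p_given_neg: float) -> float:
    """LR for one evidence bin: P(evidence | interacting) / P(evidence | not interacting)."""
    if p_given_pos <= 0 or p_given_neg <= 0:
        raise ValueError("probabilities must be positive")
    return p_given_pos / p_given_neg


def posterior_probability(prior: float, lrs: list[float]) -> float:
    """Posterior P(interaction) from a prior and independent likelihood ratios.

    Works in log-odds to avoid overflow when many datasets are combined.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be in (0, 1)")
    log_odds = math.log(prior / (1 - prior)) + sum(math.log(lr) for lr in lrs)
    return 1.0 / (1.0 + math.exp(-log_odds))
```

For example, with a hypothetical prior of 0.1 (prior odds 1:9) and a single source with LR = 9, the posterior odds are 1:1, i.e. a probability of 0.5; a predicted network would then keep pairs whose posterior (or combined LR) exceeds a chosen cutoff.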
Getting the most out of parasitic helminth transcriptomes using HelmDB: implications for biology and biotechnology.

PubMed

Mangiola, Stefano; Young, Neil D; Korhonen, Pasi; Mondal, Alinda; Scheerlinck, Jean-Pierre; Sternberg, Paul W; Cantacessi, Cinzia; Hall, Ross S; Jex, Aaron R; Gasser, Robin B

2013-12-01

Many parasitic diseases have a devastating, long-term impact on animal and human health and welfare worldwide, an impact compounded by a massive global food shortage. Parasitic helminths (worms) affect the health of billions of animals. Unlocking the systems biology of these neglected pathogens will underpin the design of new and improved interventions against them. Currently, the functional annotation of genomic and transcriptomic sequence data for socio-economically important parasitic worms relies almost exclusively on comparative bioinformatic analyses using model organism databases and other databases. However, many genes and gene products of parasitic helminths (often >50%) cannot be annotated using this approach, because they are specific to parasites and/or have no identifiable homologs in other organisms for which sequence data are available. This inability to fully annotate transcriptomes and predicted proteomes is a major challenge and constrains our understanding of the biology of parasites, their interactions with their hosts, and parasitism and the pathogenesis of disease at the molecular level. In the present article, we compiled transcriptomic data sets of key, socio-economically important parasitic helminths, and constructed and validated a curated database, called HelmDB (www.helmdb.org). We demonstrate how this database can be used effectively to improve functional annotation by employing data integration and clustering. Importantly, HelmDB provides a practical and user-friendly toolkit for sequence browsing and comparative analyses among divergent helminth groups (including nematodes and trematodes), and should be readily adaptable and applicable to a wide range of other organisms. This web-based, integrative database should assist 'systems biology' studies of parasitic helminths, and the discovery and prioritization of novel drug and vaccine targets. 
This focus provides a pathway toward developing new and improved approaches for the treatment and control of parasitic diseases, with the potential for important biotechnological outcomes. Copyright © 2012 Elsevier Inc. All rights reserved.

Biopython: freely available Python tools for computational molecular biology and bioinformatics

PubMed Central

Cock, Peter J. A.; Antao, Tiago; Chang, Jeffrey T.; Chapman, Brad A.; Cox, Cymon J.; Dalke, Andrew; Friedberg, Iddo; Hamelryck, Thomas; Kauff, Frank; Wilczynski, Bartek; de Hoon, Michiel J. L.

2009-01-01

Summary: The Biopython project is a mature open source international collaboration of volunteer developers, providing Python libraries for a wide range of bioinformatics problems. Biopython includes modules for reading and writing different sequence file formats and multiple sequence alignments, dealing with 3D macromolecular structures, interacting with common tools such as BLAST, ClustalW and EMBOSS, and accessing key online databases, as well as providing numerical methods for statistical learning. Availability: Biopython is freely available, with documentation and source code at www.biopython.org under the Biopython license. Contact: All queries should be directed to the Biopython mailing lists, see www.biopython.org/wiki/_Mailing_lists; peter.cock@scri.ac.uk. PMID:19304878
Identification of an expressed gene in Dipylidium caninum.

PubMed

Miranda, Rodrigo R C; Costa-Júnior, Livio M; Campos, Artur K; Santos, Hudson A; Rabelo, Elida M L

2004-10-01

Recombinant DNA studies have focused on developing vaccines against various cestodes, but few studies have addressed the molecular biology and genes of Dipylidium caninum. Only partial sequences of mitochondrial DNA and of the ribosomal RNA gene are available in databases. Any molecular work with this parasite, including epidemiology, the study of drug-resistant strains, and vaccine development, is hampered by the lack of knowledge of its genome. Knowledge of specific genes from the different developmental stages of D. caninum is therefore crucial for locating potential targets that could serve as candidates for developing a vaccine and/or new drugs against this parasite. Here we report, for the first time, the sequencing of a fragment of a D. caninum expressed gene.
Structural-functional diversity of the natural oligopeptides.

PubMed

Zamyatnin, Alexander A

2018-03-01

Natural oligopeptides may regulate nearly all vital processes. To date, the chemical structures of many oligopeptides have been identified from more than 2,000 organisms representing all the biological kingdoms. We have considered a number of mathematical (sequence length), chemical, physical, and biological features of an array of natural oligopeptides on the basis of data from the oligopeptide database EROP-Moscow (http://erop.inbi.ras.ru, 15,351 entries). These substances differ substantially from the polypeptide molecules of proteins in their physicochemical characteristics. These characteristics may be critical for understanding the molecular mechanisms by which oligopeptides produce physiological effects. Copyright © 2017 Elsevier Ltd. All rights reserved.
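Two of the simplest features mentioned here are sequence length (the "mathematical" feature) and molecular mass (a physical one). The sketch below computes both from a one-letter sequence using standard average residue masses plus one water molecule for the free termini. It is an illustration under stated assumptions: only the 20 standard residues are handled, and modifications common in natural oligopeptides (for example C-terminal amidation) are ignored. The function name is hypothetical and not part of EROP-Moscow.

```python
"""Sketch: length and average molecular mass of an unmodified oligopeptide.

Uses standard average residue masses (Da); post-translational or terminal
modifications are deliberately not handled.
"""

AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # average mass of H2O added for the free N- and C-termini


def peptide_features(seq: str) -> dict[str, float]:
    """Return the length and average mass (Da) of a linear, unmodified peptide."""
    seq = seq.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    mass = sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER
    return {"length": len(seq), "average_mass_da": mass}
```

For example, the pentapeptide Leu-enkephalin (YGGFL) gives a length of 5 and an average mass of about 555.6 Da.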
Toward genome-enabled mycology.

PubMed

Hibbett, David S; Stajich, Jason E; Spatafora, Joseph W

2013-01-01

Genome-enabled mycology is a rapidly expanding field that is characterized by the pervasive use of genome-scale data and associated computational tools in all aspects of fungal biology. Genome-enabled mycology is integrative and often requires teams of researchers with diverse skills in organismal mycology, bioinformatics and molecular biology. This issue of Mycologia presents the first complete fungal genomes in the history of the journal, reflecting the ongoing transformation of mycology into a genome-enabled science. Here, we consider the prospects for genome-enabled mycology and the technical and social challenges that will need to be overcome to grow the database of complete fungal genomes and enable all fungal biologists to make use of the new data.
Biologically active ligands for yersinia outer protein H (YopH): feature based pharmacophore screening, docking and molecular dynamics studies.

PubMed

Tamilvanan, Thangaraju; Hopper, Waheeta

2014-01-01

Yersinia pestis, a Gram-negative bacillus, spreads via the lymphatic system to lymph nodes and through the bloodstream to all organs, causing plague. Yersinia outer protein H (YopH) is one of the important effector proteins; it paralyzes lymphocytes and macrophages by dephosphorylating critical tyrosine kinases and signal transduction molecules. The purpose of the study was to generate a three-dimensional (3D) pharmacophore model from diverse sets of YopH inhibitors, which would be useful for designing potential antitoxins. In this study, we selected 60 biologically active inhibitors of YopH to perform a ligand-based pharmacophore study to elucidate the structural features responsible for biological activity. The pharmacophore model demonstrated the importance of two acceptor, one hydrophobic and two aromatic features for biological activity. Based on these features, different databases were screened to identify novel compounds, and these ligands were subjected to docking, ADME property prediction and binding energy prediction. Post-docking validation was performed using molecular dynamics simulation of selected ligands to calculate the root mean square deviation (RMSD) and root mean square fluctuation (RMSF). The ligands ASN03270114, Mol_252138, Mol_31073 and ZINC04237078 may act as inhibitors of YopH of Y. pestis.
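RMSD compares one conformation with a reference, RMSD = sqrt((1/N) Σ_i |r_i − r_i^ref|²), while RMSF measures how much each atom fluctuates about its mean position over T frames, RMSF_i = sqrt((1/T) Σ_t |r_i(t) − ⟨r_i⟩|²). The pure-Python sketch below implements these standard definitions. It assumes the frames have already been superposed onto a common reference; the fitting step that MD packages normally perform first is omitted.

```python
"""Sketch: RMSD and per-atom RMSF from pre-aligned coordinates.

Assumption: frames are already superposed; no rotational/translational
fitting is performed here.
"""
import math

Coord = tuple[float, float, float]


def _sq_dist(a: Coord, b: Coord) -> float:
    """Squared Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rmsd(coords: list[Coord], ref: list[Coord]) -> float:
    """Root mean square deviation between two equally sized atom lists."""
    if not coords or len(coords) != len(ref):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    return math.sqrt(sum(_sq_dist(a, b) for a, b in zip(coords, ref)) / len(coords))


def rmsf(trajectory: list[list[Coord]]) -> list[float]:
    """Per-atom RMSF; trajectory[t][i] is atom i's position in frame t."""
    if not trajectory:
        raise ValueError("empty trajectory")
    n_frames, n_atoms = len(trajectory), len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean = tuple(sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3))
        msf = sum(_sq_dist(frame[i], mean) for frame in trajectory) / n_frames
        result.append(math.sqrt(msf))
    return result
```

RMSD is typically plotted per frame against the starting structure to judge whether a docked complex is stable, while RMSF is plotted per residue to locate flexible regions.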
Teaching the structure of immunoglobulins by molecular visualization and SDS-PAGE analysis.

PubMed

Rižner, Tea Lanišnik

2014-01-01

This laboratory class combines molecular visualization and laboratory experimentation to teach the structure of the immunoglobulins (Ig). In the first part of the class, the three-dimensional structures of the human IgG and IgM molecules available through the RCSB PDB database are visualized using freely available software. In the second part, IgG and IgM are studied using electrophoretic methods. Through SDS-PAGE analysis under reducing conditions, the students determine the number and molecular masses of the polypeptide chains, while through SDS-PAGE under nonreducing conditions, the students assess the oligomerization of these Ig molecules. The class aims to expand upon the knowledge and understanding of Ig structure that the students have gained from classroom lectures. The combination of molecular visualization of the Ig molecules and SDS-PAGE experimentation ensures variety in the teaching techniques, while the involvement of Ig molecules in human disease promotes interest among biomedical students. © 2014 by The International Union of Biochemistry and Molecular Biology.
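Determining the molecular masses of polypeptide chains by SDS-PAGE usually relies on a standard curve: over the gel's linear range, log10(molecular mass) of the marker proteins is approximately linear in their relative mobility (Rf). The sketch below fits that line by ordinary least squares and interpolates an unknown band, for example a heavy- or light-chain band under reducing conditions. The marker values in the usage note are hypothetical, and the function names are illustrative.

```python
"""Sketch: SDS-PAGE standard curve, log10(MW) = slope * Rf + intercept.

Assumes all bands fall within the gel's linear range; marker data are
supplied by the user (no real marker ladder is assumed here).
"""
import math


def fit_standard_curve(markers: list[tuple[float, float]]) -> tuple[float, float]:
    """Least-squares fit of log10(MW in kDa) against Rf; returns (slope, intercept)."""
    n = len(markers)
    if n < 2:
        raise ValueError("need at least two markers")
    xs = [rf for rf, _ in markers]
    ys = [math.log10(mw) for _, mw in markers]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("markers must have distinct Rf values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw(rf: float, slope: float, intercept: float) -> float:
    """Interpolated molecular mass (kDa) of a band with relative mobility rf."""
    return 10 ** (slope * rf + intercept)
```

Usage: measure Rf for each marker band (migration distance divided by dye-front distance), call `fit_standard_curve` on the (Rf, kDa) pairs, then pass each sample band's Rf to `estimate_mw`.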
Database resources of the National Center for Biotechnology Information

PubMed Central

Sayers, Eric W.; Barrett, Tanya; Benson, Dennis A.; Bolton, Evan; Bryant, Stephen H.; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M.; DiCuccio, Michael; Federhen, Scott; Feolo, Michael; Fingerman, Ian M.; Geer, Lewis Y.; Helmberg, Wolfgang; Kapustin, Yuri; Krasnov, Sergey; Landsman, David; Lipman, David J.; Lu, Zhiyong; Madden, Thomas L.; Madej, Tom; Maglott, Donna R.; Marchler-Bauer, Aron; Miller, Vadim; Karsch-Mizrachi, Ilene; Ostell, James; Panchenko, Anna; Phan, Lon; Pruitt, Kim D.; Schuler, Gregory D.; Sequeira, Edwin; Sherry, Stephen T.; Shumway, Martin; Sirotkin, Karl; Slotta, Douglas; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A.; Wagner, Lukas; Wang, Yanli; Wilbur, W. John; Yaschenko, Eugene; Ye, Jian

2012-01-01

In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Website. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Probe, Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov. PMID:22140104
Database resources of the National Center for Biotechnology Information

PubMed Central

2013-01-01

In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, the Genetic Testing Registry, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Probe, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page. PMID:23193264
Database resources of the National Center for Biotechnology Information.

PubMed

Wheeler, David L; Barrett, Tanya; Benson, Dennis A; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; DiCuccio, Michael; Edgar, Ron; Federhen, Scott; Geer, Lewis Y; Kapustin, Yuri; Khovayko, Oleg; Landsman, David; Lipman, David J; Madden, Thomas L; Maglott, Donna R; Ostell, James; Miller, Vadim; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Steven T; Sirotkin, Karl; Souvorov, Alexandre; Starchenko, Grigory; Tatusov, Roman L; Tatusova, Tatiana A; Wagner, Lukas; Yaschenko, Eugene

2007-01-01

In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's Web site. NCBI resources include Entrez, the Entrez Programming Utilities, My NCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genome, Genome Project and related tools, the Trace and Assembly Archives, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Viral Genotyping Tools, Influenza Viral Resources, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART) and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. These resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
Database resources of the National Center for Biotechnology Information.

PubMed

Sayers, Eric W; Barrett, Tanya; Benson, Dennis A; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; DiCuccio, Michael; Edgar, Ron; Federhen, Scott; Feolo, Michael; Geer, Lewis Y; Helmberg, Wolfgang; Kapustin, Yuri; Landsman, David; Lipman, David J; Madden, Thomas L; Maglott, Donna R; Miller, Vadim; Mizrachi, Ilene; Ostell, James; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Stephen T; Shumway, Martin; Sirotkin, Karl; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A; Wagner, Lukas; Yaschenko, Eugene; Ye, Jian

2009-01-01

In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART) and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
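The Entrez Programming Utilities (E-utilities) listed in these records expose Entrez searches and record retrieval as plain HTTP requests. The sketch below only builds request URLs for the `esearch` and `efetch` utilities with the standard library; it sends no request, so NCBI's usage policies (identifying `tool`/`email` parameters, request-rate limits, API keys) are left to the caller. The helper function names are hypothetical.

```python
"""Sketch: building E-utilities request URLs (no network access performed)."""
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """URL that searches an Entrez database and returns matching UIDs."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}esearch.fcgi?{query}"


def efetch_url(db: str, ids: list[str], rettype: str = "abstract", retmode: str = "text") -> str:
    """URL that fetches full records for a list of UIDs (e.g. PubMed IDs)."""
    query = urlencode({"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": retmode})
    return f"{EUTILS_BASE}efetch.fcgi?{query}"


if __name__ == "__main__":
    # Abstracts of two NCBI resource papers cited in this listing.
    print(efetch_url("pubmed", ["22140104", "23193264"]))
```

A typical workflow calls `esearch` to obtain UIDs for a query and then passes those UIDs to `efetch`, fetching in batches rather than one record per request.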
NABIC marker database: A molecular markers information network of agricultural crops.

PubMed

Kim, Chang-Kug; Seol, Young-Joo; Lee, Dong-Jun; Jeong, In-Seon; Yoon, Ung-Han; Lee, Gang-Seob; Hahn, Jang-Ho; Park, Dong-Suk

2013-01-01

In 2013, the National Agricultural Biotechnology Information Center (NABIC) reconstructed a molecular marker database for useful genetic resources. The web-based marker database consists of three major functional categories: map viewer, RSN marker and gene annotation. It provides 7,250 marker locations, 3,301 RSN marker properties and 3,280 molecular marker annotations for agricultural plants. Each molecular marker entry provides information such as marker name, expressed sequence tag number, gene definition and general marker information. This updated marker database provides useful information through a user-friendly web interface that assists in tracing new chromosome structures and gene positional functions using specific molecular markers. The database is available for free at http://nabic.rda.go.kr/gere/rice/molecularMarkers/
Unicellular eukaryotes as models in cell and molecular biology: critical appraisal of their past and future value.

PubMed

Simon, Martin; Plattner, Helmut

2014-01-01

Unicellular eukaryotes have been appreciated as model systems for the analysis of crucial questions in cell and molecular biology. These include Dictyostelium (chemotaxis, amoeboid movement, phagocytosis), Tetrahymena (telomere structure, telomerase function), Paramecium (variant surface antigens, exocytosis, phagocytosis cycle) or both ciliates (ciliary beat regulation, surface pattern formation), Chlamydomonas (flagellar biogenesis and beat), and yeast (S. cerevisiae) for innumerable aspects. Nowadays many problems can be tackled with "higher" eukaryotic/metazoan cells, for which full genomic information as well as domain databases, etc., were available long before they were for protozoa. Established molecular tools, commercial antibodies, and established pharmacology are additional advantages of higher eukaryotic cells. Moreover, an increasing number of inherited genetic disturbances in humans have been elucidated and can serve as new models. Among lower eukaryotes, yeast will remain a standard model because of its peculiarities, including its reduced genome and availability in the haploid form. But do protists still have a future as models? This question touches not only the basic understanding of biology but also practical aspects of research, such as fund raising. As we try to show, owing to specific advantages, some protozoa should and will remain favorable models for analyzing novel genes or specific aspects of cell structure and function. Outstanding examples are epigenetic phenomena, a field of rising interest. © 2014 Elsevier Inc. All rights reserved.
Genetics and attribution issues that confront the microbial forensics field.

PubMed

Budowle, Bruce

2004-12-02

The commission of an act of bioterrorism or biocrime is a real concern for law enforcement and society. Efforts are underway to develop a strong microbial forensic program to help identify perpetrators of acts of bioterrorism and biocrimes, and to serve as a deterrent to those who might commit such illicit acts. Genetic analyses of microbial organisms will likely be a powerful tool for attributing criminal acts. There are some similarities to forensic human DNA analysis practices, such as molecular biology technology, the use of population databases, qualitative conclusions of test results, and the application of QA/QC practices. Differences include database size and composition, statistical interpretation methods, and confidence/uncertainty in the outcome of an interpretation.
Bioinformatics Analysis Reveals Distinct Molecular Characteristics of Hepatitis B-Related Hepatocellular Carcinomas from Very Early to Advanced Barcelona Clinic Liver Cancer Stages.

PubMed

Kong, Fan-Yun; Wei, Xiao; Zhou, Kai; Hu, Wei; Kou, Yan-Bo; You, Hong-Juan; Liu, Xiao-Mei; Zheng, Kui-Yang; Tang, Ren-Xian

2016-01-01

Hepatocellular carcinoma (HCC) is the fifth most common malignancy and is associated with high mortality. One of the risk factors for HCC is chronic hepatitis B virus (HBV) infection. The treatment strategy for the disease depends on the stage of HCC, and the Barcelona Clinic Liver Cancer (BCLC) staging system is used in most HCC cases. However, the molecular characteristics of HBV-related HCC in different BCLC stages are still unknown. Using GSE14520 microarray data from HBV-related HCC cases with BCLC stages from 0 (very early stage) to C (advanced stage) in the Gene Expression Omnibus (GEO) database, differentially expressed genes (DEGs) were identified, including DEGs common to the different BCLC stages and DEGs unique to individual stages. These DEGs were located on different chromosomes. The molecular functions and biological pathways of the DEGs were identified by Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, and the interactome networks of the DEGs were constructed using the NetVenn online tool. The results revealed that both common DEGs and stage-specific DEGs were associated with various molecular functions and were involved in specific biological pathways. In addition, several hub genes were found in the interactome networks of DEGs. The identified DEGs and hub genes advance our understanding of the molecular mechanisms underlying the development of HBV-related HCC through the different BCLC stages, and might be used as staging biomarkers or molecular targets for the treatment of HCC with HBV infection.
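The split into DEGs common to all BCLC stages and DEGs unique to one stage is a set operation over per-stage DEG lists. The sketch below shows that step together with a simple DEG filter on log2 fold change and adjusted p-value. The thresholds, gene names and function names are hypothetical and are not the cutoffs used in the study.

```python
"""Sketch: selecting DEGs and splitting them into common vs stage-specific sets.

Thresholds and identifiers are hypothetical, not those of the GSE14520 study.
"""


def select_degs(log2fc: dict[str, float], adj_p: dict[str, float],
                fc_cut: float = 1.0, p_cut: float = 0.05) -> set[str]:
    """Genes with |log2 fold change| >= fc_cut and adjusted p-value < p_cut."""
    return {g for g, fc in log2fc.items() if abs(fc) >= fc_cut and adj_p.get(g, 1.0) < p_cut}


def common_and_unique(degs_by_stage: dict[str, set[str]]) -> tuple[set[str], dict[str, set[str]]]:
    """Return (DEGs shared by every stage, {stage: DEGs found only in that stage})."""
    if not degs_by_stage:
        return set(), {}
    common = set.intersection(*degs_by_stage.values())
    unique = {}
    for stage, genes in degs_by_stage.items():
        # Union of the DEG sets of all other stages.
        others = set().union(*(g for s, g in degs_by_stage.items() if s != stage))
        unique[stage] = genes - others
    return common, unique
```

Genes that fall in neither output are shared by some, but not all, stages; a Venn-style tool reports those intermediate overlaps as well.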
The Gene Ontology (GO) project: structured vocabularies for molecular biology and their application to genome and expression analysis.

PubMed

Blake, Judith A; Harris, Midori A

2008-09-01

Scientists wishing to utilize genomic data have quickly come to realize the benefit of standardizing descriptions of experimental procedures and results for computer-driven information retrieval systems. The focus of the Gene Ontology project is three-fold. First, the project compiles the Gene Ontologies: structured vocabularies describing domains of molecular biology. Second, it supports the use of these structured vocabularies in the annotation of gene products. Third, participating groups provide the gene product-to-GO annotation sets to the public through open access to the GO database and Web resource. This unit describes the current ontologies and what lies beyond the scope of the Gene Ontology project. It addresses how GO vocabularies are constructed and related to genes and gene products, and concludes with a discussion of how researchers can access, browse, and use the GO project in the course of their own research. Copyright 2008 by John Wiley & Sons, Inc.
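Because GO terms form a directed acyclic graph in which a term can have more than one parent, an annotation to a specific term is conventionally taken to imply annotation to all of its ancestors (the "true path rule"). A minimal sketch of that propagation follows; it uses hypothetical term IDs and a plain parent map, not any real GO file format or GO library.

```python
from typing import Dict, Set

def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every term reachable from `term` by following parent (is_a/part_of) links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def propagate(annotations: Dict[str, Set[str]],
              parents: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Expand each gene product's direct annotations with all ancestor terms."""
    return {
        gene: set(terms).union(*(ancestors(t, parents) for t in terms))
        for gene, terms in annotations.items()
    }

# Hypothetical term IDs: T3 has two parents (a DAG, not a tree); T0 is the root.
parents = {"T3": {"T1", "T2"}, "T1": {"T0"}, "T2": {"T0"}}
annotations = {"geneA": {"T3"}, "geneB": {"T1"}}
print(propagate(annotations, parents))
```

The explicit stack with a `seen` set visits each shared ancestor (here T0) only once, which matters in real ontologies where many paths converge on the same high-level terms.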
libChEBI: an API for accessing the ChEBI database.

PubMed

Swainston, Neil; Hastings, Janna; Dekker, Adriano; Muthukrishnan, Venkatesh; May, John; Steinbeck, Christoph; Mendes, Pedro

2016-01-01

ChEBI is a database and ontology of chemical entities of biological interest. It is widely used as a source of identifiers to facilitate unambiguous reference to chemical entities within biological models, databases, ontologies and literature. ChEBI contains a wealth of chemical data, covering over 46,500 distinct chemical entities and related data such as chemical formula, charge, molecular mass, structure, synonyms and links to external databases. Furthermore, ChEBI is an ontology, and thus provides meaningful links between chemical entities. Unlike many other resources, ChEBI is fully human-curated, providing a reliable, non-redundant collection of chemical entities and related data. While ChEBI is supported by a web service for programmatic access and a number of download files, it does not have an API library to facilitate the use of ChEBI and its data in cheminformatics software. To provide this missing functionality, we introduce libChEBI, a comprehensive API library for accessing ChEBI data. libChEBI is available in Java, Python and MATLAB versions from http://github.com/libChEBI, and provides full programmatic access to all data held within the ChEBI database through a simple and documented API. libChEBI relies on the (automated) download and regular update of flat files that are held locally. As such, libChEBI can be embedded in both online and offline software applications. libChEBI allows better support of ChEBI and its data in the development of new cheminformatics software. Covering three key programming languages, it allows the entire ChEBI database to be accessed easily and quickly through a simple API. All code is open access and freely available.
MARRVEL: Integration of Human and Model Organism Genetic Resources to Facilitate Functional Annotation of the Human Genome.

PubMed

Wang, Julia; Al-Ouran, Rami; Hu, Yanhui; Kim, Seon-Young; Wan, Ying-Wooi; Wangler, Michael F; Yamamoto, Shinya; Chao, Hsiao-Tuan; Comjean, Aram; Mohr, Stephanie E; Perrimon, Norbert; Liu, Zhandong; Bellen, Hugo J

2017-06-01

One major challenge encountered in interpreting human genetic variants is the limited understanding of the functional impact of genetic alterations on biological processes. Furthermore, there remains an unmet demand for an efficient survey of the wealth of information on human homologs in model organisms across numerous databases. To efficiently assess the large volume of publicly available information, it is important to provide a concise summary of the most relevant information in a rapid, user-friendly format. To this end, we created MARRVEL (model organism aggregated resources for rare variant exploration). MARRVEL is a publicly available website that integrates information from six human genetic databases and seven model organism databases. For any given variant or gene, MARRVEL displays information from OMIM, ExAC, ClinVar, Geno2MP, DGV, and DECIPHER. Importantly, it curates model organism-specific databases to concurrently display a concise summary of the human gene homologs in budding and fission yeast, worm, fly, fish, mouse, and rat on a single webpage. Experiment-based information on tissue expression, protein subcellular localization, biological process, and molecular function for the human gene and its homologs in the seven model organisms is arranged into a concise output. Hence, rather than visiting multiple separate databases for variant and gene analysis, users can obtain important information by searching once through MARRVEL. Altogether, MARRVEL dramatically improves the efficiency and accessibility of data collection and facilitates the analysis of human genes and variants through cross-disciplinary integration of 18 million records available in public databases, supporting clinical diagnosis and basic research. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Identification of Biological Targets of Therapeutic Intervention for Hepatocellular Carcinoma by Integrated Bioinformatical Analysis.

PubMed

Hu, Wei Qi; Wang, Wei; Fang, Di Long; Yin, Xue Feng

2018-05-24

BACKGROUND We screened potential molecular targets and investigated the molecular mechanisms of hepatocellular carcinoma (HCC). MATERIAL AND METHODS Microarray data of GSE47786, comprising samples of the HepG2 human hepatoma cell line treated with 40 μM berberine and control cells treated with 0.08% DMSO, were downloaded from the GEO database. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed; protein-protein interaction (PPI) networks were constructed using the STRING database and Cytoscape; genetic alterations, neighboring gene networks, and survival analysis of hub genes were explored with cBioPortal; and mRNA expression levels of hub genes were obtained from the Oncomine database. RESULTS A total of 56 upregulated and 8 downregulated differentially expressed genes (DEGs) were identified. GO terms significantly enriched among the DEGs included cell-cycle arrest, "regulation of transcription, DNA-dependent", protein amino acid phosphorylation, cell cycle, and apoptosis. KEGG pathway analysis showed that the DEGs were enriched in the MAPK, ErbB, and p53 signaling pathways. JUN, EGR1, MYC, and CDKN1A were identified as hub genes in the PPI networks. Genetic alterations of the hub genes were mainly amplifications. TP53, NDRG1, and MAPK15 were found in the neighboring gene networks. Cases with altered hub genes had worse overall survival and disease-free survival than cases without alterations. The expression of EGR1, MYC, and CDKN1A, but not JUN, was significantly increased in the Roessler Liver datasets. CONCLUSIONS We found that JUN, EGR1, MYC, and CDKN1A might be used as diagnostic and therapeutic molecular biomarkers, and our results broaden the understanding of the molecular mechanisms of HCC.
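GO and KEGG enrichment of a DEG list is commonly scored with a one-sided hypergeometric (Fisher's exact) test. The abstract does not say which test or tool was used, so the following is a generic sketch with toy counts, not the authors' pipeline.

```python
from math import comb

def enrichment_p(k: int, N: int, K: int, n: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background (e.g. all genes measured on the array)
    K: background genes annotated to the GO term or KEGG pathway
    n: number of DEGs tested
    k: DEGs annotated to the GO term or KEGG pathway
    """
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Toy numbers: 20 background genes, 5 in the pathway, 4 DEGs, 3 of them in the pathway.
print(enrichment_p(3, 20, 5, 4))  # 155/4845, about 0.032
```

In practice such p-values are computed for every term tested and then corrected for multiple testing (for example with Benjamini-Hochberg) before terms are reported as significant.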
[Computational chemistry in structure-based drug design].

PubMed

Cao, Ran; Li, Wei; Sun, Han-Zi; Zhou, Yu; Huang, Niu

2013-07-01

Today, the understanding of the sequence and structure of biologically relevant targets is growing rapidly, and researchers from many disciplines, physics and computational science in particular, are making significant contributions to modern biology and drug discovery. However, it remains challenging to rationally design small-molecule ligands with desired biological characteristics based on the structural information of the drug targets, which demands more accurate calculation of ligand binding free energy. With rapid advances in computer power and extensive efforts in algorithm development, physics-based computational chemistry approaches have come to play a more important role in structure-based drug design. Here we review newly developed computational chemistry methods in structure-based drug design as well as their elegant applications, including binding-site druggability assessment, large-scale virtual screening of chemical databases, and lead compound optimization. Importantly, we also address the current bottlenecks and propose practical solutions.
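The reason accurate binding free energies matter is the standard thermodynamic relation between free energy and the dissociation constant (1 M standard state). The numbers below are an illustrative worked example at 298.15 K, not values taken from the review:

$$
\begin{aligned}
\Delta G^\circ_{\mathrm{bind}} &= RT \ln\frac{K_d}{1\,\mathrm{M}}, \qquad RT \approx 0.593\ \mathrm{kcal\,mol^{-1}}\ \text{at}\ T = 298.15\ \mathrm{K},\\
K_d = 1\ \mathrm{nM}:\quad \Delta G^\circ_{\mathrm{bind}} &\approx 0.593 \times \ln(10^{-9}) \approx 0.593 \times (-20.72) \approx -12.3\ \mathrm{kcal\,mol^{-1}},\\
\Delta\Delta G^\circ\ \text{for a tenfold change in}\ K_d &= RT \ln 10 \approx 1.36\ \mathrm{kcal\,mol^{-1}}.
\end{aligned}
$$

An error of roughly 1.4 kcal/mol in a computed binding free energy therefore already corresponds to an order-of-magnitude error in predicted affinity.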
Constraints on signaling network logic reveal functional subgraphs on Multiple Myeloma OMIC data.

PubMed

Miannay, Bertrand; Minvielle, Stéphane; Magrangeas, Florence; Guziolowski, Carito

2018-03-21

The integration of gene expression profiles (GEPs) with large-scale biological networks derived from pathway databases is a widely explored subject. Existing methods are based on network distance measures among significantly measured species, and only a small number of them include the directionality and underlying logic of biological networks. In this study, we approach the GEP-network integration problem by considering the network logic; however, our approach does not require a prior selection of species according to their gene expression levels. We start by modeling the biological network and its underlying logic using Logic Programming. This model points to reachable discrete network states that maximize a notion of harmony between the possible active or inactive states of the molecular species and the directionality of the pathway reactions, according to their activator or inhibitor control role. Only then do we confront these network states with the GEP. From this confrontation, independent graph components are derived, each related to a fixed and optimal assignment of active or inactive states. These components allow us to decompose a large-scale network into subgraphs whose molecular species state assignments show different degrees of similarity when compared with the same GEP. We apply our method to study the set of possible states derived from a subgraph of the NCI-PID Pathway Interaction Database. This graph links Multiple Myeloma (MM) genes to known receptors for this blood cancer. We find that the NCI-PID MM graph has 15 independent components and, when they are confronted with 611 MM GEPs, that one component is more specific in representing the difference between cancer and healthy profiles.
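The idea of scoring network states by their agreement with edge signs, and only afterwards comparing them with expression data, can be illustrated in a deliberately simplified form. This is not the authors' Logic Programming model: the sketch brute-forces every active/inactive assignment on a hypothetical three-node network (feasible only for toy graphs), and the `harmony` and `agreement` scores are illustrative assumptions.

```python
from itertools import product
from typing import Dict, List, Tuple

Edge = Tuple[str, str, int]  # (source, target, +1 for activation, -1 for inhibition)

def harmony(state: Dict[str, int], edges: List[Edge]) -> int:
    """Count edges whose sign agrees with the node states (+1 active, -1 inactive)."""
    return sum(1 for u, v, sign in edges if state[v] == sign * state[u])

def best_states(nodes: List[str], edges: List[Edge]) -> List[Dict[str, int]]:
    """Enumerate all 2^n assignments and keep those with maximal harmony (tiny graphs only)."""
    best: List[Dict[str, int]] = []
    best_score = -1
    for values in product((1, -1), repeat=len(nodes)):
        state = dict(zip(nodes, values))
        score = harmony(state, edges)
        if score > best_score:
            best, best_score = [state], score
        elif score == best_score:
            best.append(state)
    return best

def agreement(state: Dict[str, int], gep_signs: Dict[str, int]) -> float:
    """Fraction of measured species whose expression sign matches the network state."""
    shared = [n for n in gep_signs if n in state]
    return sum(state[n] == gep_signs[n] for n in shared) / len(shared)

# Hypothetical toy network: receptor R activates kinase K, which inhibits target T.
nodes = ["R", "K", "T"]
edges = [("R", "K", 1), ("K", "T", -1)]
for s in best_states(nodes, edges):
    print(s, agreement(s, {"R": 1, "T": 1}))
```

Both maximal-harmony states here are fully sign-consistent, yet each matches only half of the hypothetical expression signs, which is the kind of discrepancy that confronting network states with a GEP is meant to expose.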

Biological sequence compression algorithms.

PubMed

Matsumoto, T; Sadakane, K; Imai, H

2000-01-01

Today, more and more DNA sequences are becoming available, and information about them is stored in molecular biology databases. These databases will keep growing in size and importance, so this information must be stored and communicated efficiently. Furthermore, sequence compression can be used to define similarities between biological sequences. Standard compression algorithms such as gzip or compress cannot compress DNA sequences; they only expand them. CTW (Context Tree Weighting Method), on the other hand, can compress DNA sequences to less than two bits per symbol. However, these algorithms do not exploit the special structures of biological sequences. Two characteristic structures of DNA sequences are known: palindromes (reverse complements) and approximate repeats. Several algorithms designed specifically for DNA use these structures to compress sequences to less than two bits per symbol. In this paper, we improve CTW so that it exploits the characteristic structures of DNA sequences. Before encoding the next symbol, the algorithm searches for an approximate repeat or palindrome using hashing and dynamic programming. If it finds a palindrome or approximate repeat of sufficient length, it represents it by its length and distance. With this preprocessing, the new program achieves a slightly higher compression ratio than existing DNA-oriented compression algorithms. We also describe a new compression algorithm for protein sequences.
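For context, a four-letter alphabet needs log2(4) = 2 bits per symbol with no modeling at all, which is the baseline the abstract refers to. The sketch below illustrates the kind of preprocessing described: a k-mer hash table over the already-encoded prefix proposes candidate positions, and each candidate is extended to find the longest earlier repeat or reverse complement, reported as (length, distance). It is a simplification under stated assumptions: it finds exact matches only (the paper handles approximate repeats with dynamic programming), rebuilds the hash table on every call, and omits the CTW coder itself.

```python
from typing import Dict, List, Optional, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(s: str) -> str:
    """Reverse complement of a DNA string."""
    return s.translate(COMPLEMENT)[::-1]

def longest_match(seq: str, i: int, k: int = 4) -> Optional[Tuple[str, int, int]]:
    """Longest exact repeat or reverse-complement copy of seq[i:] found in seq[:i].

    Returns (kind, length, distance), where distance = i - start of the earlier copy,
    or None if no k-mer seed matches.
    """
    head = seq[i:i + k]
    if len(head) < k:
        return None
    # Hash table of k-mer -> start positions over the already-encoded prefix.
    index: Dict[str, List[int]] = {}
    for p in range(i - k + 1):
        index.setdefault(seq[p:p + k], []).append(p)

    best: Optional[Tuple[str, int, int]] = None
    # Forward repeats: seq[p + t] == seq[i + t], earlier copy kept inside the prefix.
    for p in index.get(head, []):
        length = 0
        while i + length < len(seq) and p + length < i and seq[p + length] == seq[i + length]:
            length += 1
        if best is None or length > best[1]:
            best = ("repeat", length, i - p)
    # Reverse complements: seq[i + t] pairs with the complement of seq[end - 1 - t].
    for p in index.get(revcomp(head), []):
        end = p + k
        length = 0
        while (i + length < len(seq) and end - 1 - length >= 0
               and seq[i + length] == seq[end - 1 - length].translate(COMPLEMENT)):
            length += 1
        if best is None or length > best[1]:
            best = ("revcomp", length, i - (end - length))
    return best

print(longest_match("GATTACAGATTACA", 7))  # ('repeat', 7, 7)
print(longest_match("AACCGCGGTT", 5))      # ('revcomp', 5, 5)
```

A compressor built on this idea would emit the (length, distance) pair only when the match is long enough to cost fewer bits than coding the symbols directly, and fall back to the statistical coder otherwise.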
Taverna: a tool for building and running workflows of services

PubMed Central

Hull, Duncan; Wolstencroft, Katy; Stevens, Robert; Goble, Carole; Pocock, Mathew R.; Li, Peter; Oinn, Tom

2006-01-01

Taverna is an application that eases the use and integration of the growing number of molecular biology tools and databases available on the web, especially web services. It allows bioinformaticians to construct workflows or pipelines of services to perform a range of different analyses, such as sequence analysis and genome annotation. These high-level workflows can integrate many different resources into a single analysis. Taverna is available freely under the terms of the GNU Lesser General Public License (LGPL) from . PMID:16845108
IDAAPM: integrated database of ADMET and adverse effects of predictive modeling based on FDA approved drug data.

PubMed

Legehar, Ashenafi; Xhaard, Henri; Ghemtio, Leo

2016-01-01

The disposition of a pharmaceutical compound within an organism, i.e. its Absorption, Distribution, Metabolism, Excretion, Toxicity (ADMET) properties and adverse effects, critically affects late-stage failure of drug candidates and has led to the withdrawal of approved drugs. Computational methods are effective approaches for reducing the number of safety issues by analyzing possible links between chemical structures and ADMET properties or adverse effects, but they are limited by the size, quality, and heterogeneity of the data available from individual sources. Thus, large, clean and integrated databases of approved drug data, together with fast and efficient predictive tools, are desirable early in the drug discovery process. We have built a relational database (IDAAPM) that integrates available approved drug data such as drug approval information, ADMET properties and adverse effects, chemical structures and molecular descriptors, targets, bioactivity and related references. The database is coupled with a searchable web interface and a modern data analytics platform (KNIME) to allow data access, data transformation, initial analysis and further predictive modeling. Data were extracted from FDA resources and supplemented from other publicly available databases. Currently, the database contains information on about 19,226 FDA approval applications for 31,815 products (small molecules and biologics) with their approval history, 2505 active ingredients, together with as many ADMET properties, 1629 molecular structures, 2.5 million adverse effects and 36,963 experimental drug-target bioactivity data. IDAAPM is a unique resource that, in a single relational database, provides detailed information on FDA approved drugs, including their ADMET properties and adverse effects and the corresponding targets with bioactivity data, coupled with a data analytics platform. It can be used to perform basic to complex drug-target ADMET or adverse effects analysis and predictive modeling. IDAAPM is freely accessible at http://idaapm.helsinki.fi and can be exploited through a KNIME workflow connected to the database. Graphical abstract: FDA approved drug data integration for predictive modeling.
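The value of holding approval, adverse-effect and bioactivity data in one relational database is that questions spanning them become a single join. A minimal sketch of such a query follows; the table and column names, drug names and values are hypothetical and do not reflect IDAAPM's actual schema, and the example uses Python's built-in `sqlite3` module with an in-memory database.

```python
import sqlite3

def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with a hypothetical three-table schema."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE ingredient (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE adverse_effect (ingredient_id INTEGER REFERENCES ingredient(id), term TEXT);
        CREATE TABLE bioactivity (ingredient_id INTEGER REFERENCES ingredient(id),
                                  target TEXT, ic50_nm REAL);
    """)
    con.executemany("INSERT INTO ingredient VALUES (?, ?)", [(1, "drug_a"), (2, "drug_b")])
    con.executemany("INSERT INTO adverse_effect VALUES (?, ?)",
                    [(1, "nausea"), (2, "rash"), (2, "nausea")])
    con.executemany("INSERT INTO bioactivity VALUES (?, ?, ?)",
                    [(1, "target_x", 12.0), (2, "target_y", 350.0)])
    return con

def targets_for_effect(con: sqlite3.Connection, term: str):
    """Ingredients reported with a given adverse effect, with their targets and IC50s."""
    sql = """
        SELECT i.name, b.target, b.ic50_nm
        FROM ingredient AS i
        JOIN adverse_effect AS a ON a.ingredient_id = i.id
        JOIN bioactivity AS b ON b.ingredient_id = i.id
        WHERE a.term = ?
        ORDER BY i.name
    """
    return con.execute(sql, (term,)).fetchall()

con = build_demo_db()
print(targets_for_effect(con, "nausea"))
# [('drug_a', 'target_x', 12.0), ('drug_b', 'target_y', 350.0)]
```

The `?` placeholder passes the search term as a bound parameter rather than by string formatting, which avoids SQL injection when the term comes from a web form.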
Computational Approaches to Phenotyping

PubMed Central

Lussier, Yves A.; Liu, Yang

2007-01-01

The recent completion of the Human Genome Project has made possible a high-throughput “systems approach” for accelerating the elucidation of molecular underpinnings of human diseases, and subsequent derivation of molecular-based strategies to more effectively prevent, diagnose, and treat these diseases. Although altered phenotypes are among the most reliable manifestations of altered gene functions, research using systematic analysis of phenotype relationships to study human biology is still in its infancy. This article focuses on the emerging field of high-throughput phenotyping (HTP) phenomics research, which aims to capitalize on novel high-throughput computation and informatics technology developments to derive genomewide molecular networks of genotype–phenotype associations, or “phenomic associations.” The HTP phenomics research field faces the challenge of technological research and development to generate novel tools in computation and informatics that will allow researchers to amass, access, integrate, organize, and manage phenotypic databases across species and enable genomewide analysis to associate phenotypic information with genomic data at different scales of biology. Key state-of-the-art technological advancements critical for HTP phenomics research are covered in this review. In particular, we highlight the power of computational approaches to conduct large-scale phenomics studies. PMID:17202287
Exploring new scaffolds for angiotensin II receptor antagonism.

PubMed

Kritsi, Eftichia; Matsoukas, Minos-Timotheos; Potamitis, Constantinos; Karageorgos, Vlasios; Detsi, Anastasia; Magafa, Vasilliki; Liapakis, George; Mavromoustakos, Thomas; Zoumpoulakis, Panagiotis

2016-09-15

Nowadays, AT1 receptor (AT1R) antagonists (ARBs) constitute one of the most prevalent classes of antihypertensive drugs that modulate the renin-angiotensin system (RAS). Their main uses also include the treatment of diabetic nephropathy (kidney damage due to diabetes) and congestive heart failure. In this direction, our study focused on the discovery of novel agents bearing different scaffolds which may evolve into a new class of AT1 receptor antagonists. To fulfill this aim, a combination of computational approaches and biological assays was implemented. Specifically, a pharmacophore model was established and served as a 3D search query to screen the ChEMBL15 database. The reliability and accuracy of the virtual screening results were improved using molecular docking studies. In total, four compounds with completely diverse chemical scaffolds were selected from the potential ARBs and tested for their binding affinity to the AT1 receptor. All compounds showed affinities (IC50) ranging from high nanomolar to micromolar. In particular, compound 4 exhibited a binding affinity of 199 nM. Molecular dynamics simulations were used to provide a molecular basis for their binding to AT1R in accordance with their biological activities. Copyright © 2016 Elsevier Ltd. All rights reserved.
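Affinities spanning the nanomolar to micromolar range are often compared on a logarithmic scale as pIC50 = -log10(IC50 in mol/L). A small conversion helper, applied here to the 199 nM value reported for compound 4; the function name is illustrative.

```python
import math

def pic50_from_nm(ic50_nm: float) -> float:
    """pIC50 = -log10(IC50 expressed in mol/L); the input is in nanomolar."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm * 1e-9)

print(round(pic50_from_nm(199), 2))  # 6.7
```

On this scale a 1 µM compound has pIC50 = 6, so compound 4 sits about 0.7 log units (roughly fivefold) more potent than a 1 µM binder.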
Digging into the low molecular weight peptidome with the OligoNet web server.

PubMed

Liu, Youzhong; Forcisi, Sara; Lucio, Marianna; Harir, Mourad; Bahut, Florian; Deleris-Bou, Magali; Krieger-Weber, Sibylle; Gougeon, Régis D; Alexandre, Hervé; Schmitt-Kopplin, Philippe

2017-09-15

Bioactive peptides play critical roles in regulating many biological processes. Recently, natural short-peptide biomarkers have been drawing significant attention and are considered a "hidden treasure" of drug candidates. The high resolution and high mass accuracy provided by mass spectrometry (MS)-based untargeted metabolomics should enable rapid detection and wide coverage of the low-molecular-weight peptidome. However, translating unknown masses (<1500 Da) into putative peptides is often limited by the lack of automatic data processing tools and by the limitations of peptide databases. The web server OligoNet responds to this challenge by attempting to decompose each individual mass from metabolomics datasets into a combination of amino acids. It provides an additional network-based data interpretation named "Peptide degradation network" (PDN), which unravels interesting relations between annotated peptides and generates potential functional patterns. The ab initio PDN built from yeast metabolic profiling data shows great similarity to well-known metabolic networks and could aid biological interpretation. OligoNet also allows easy evaluation and interpretation of annotated peptides in systems biology, and is freely accessible at https://daniellyz200608105.shinyapps.io/OligoNet/ .
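Decomposing a measured mass into amino-acid combinations can be sketched as a bounded search over residue compositions. This is an illustration under stated assumptions, not OligoNet's implementation: it considers only linear, unmodified peptides (sum of monoisotopic residue masses plus one water), treats leucine and isoleucine as a single "L", uses a fixed absolute tolerance in Da, and caps peptide length to keep the search small.

```python
from typing import List, Tuple

# Monoisotopic residue masses (Da). Leucine and isoleucine are isobaric, so "L" stands for both.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water per linear peptide (free N- and C-termini)

def decompose(mass: float, tol: float = 0.005, max_len: int = 5) -> List[Tuple[str, ...]]:
    """Amino-acid compositions whose neutral linear-peptide mass is within tol Da of mass."""
    residues = sorted(RESIDUE_MASS)
    hits: List[Tuple[str, ...]] = []

    def search(start: int, remaining: float, combo: List[str]) -> None:
        if combo and abs(remaining) <= tol:
            hits.append(tuple(combo))
        if len(combo) == max_len:
            return
        # Only residues at index >= start are tried, so each composition is produced once.
        for idx in range(start, len(residues)):
            aa = residues[idx]
            if RESIDUE_MASS[aa] > remaining + tol:
                continue
            combo.append(aa)
            search(idx, remaining - RESIDUE_MASS[aa], combo)
            combo.pop()

    search(0, mass - WATER, [])
    return hits

print(decompose(132.05348))  # [('G', 'G'), ('N',)]
```

Even at a tight tolerance, some masses are intrinsically ambiguous: Gly-Gly and Asn are isobaric, as are Ala-Gly and Gln, which is one reason mass-only annotations remain putative. The output is a composition, not a sequence; ordering the residues would require fragmentation data.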
Use of Graph Database for the Integration of Heterogeneous Biological Data.

PubMed

Yoon, Byoung-Ha; Kim, Seon-Kyu; Kim, Seon-Young

2017-03-01

Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships through multiple-join statements. Recently, a new type of database, the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used in the computer science community and the IT industry. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance of MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j responded very quickly to various queries, MySQL showed delayed or unfinished responses for complex queries with multiple-join statements. These results show that graph-based databases, such as Neo4j, are an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.
Use of Graph Database for the Integration of Heterogeneous Biological Data

PubMed Central

Yoon, Byoung-Ha; Kim, Seon-Kyu

2017-01-01

Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships through multiple-join statements. Recently, a new type of database, the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used in the computer science community and the IT industry. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance of MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j responded very quickly to various queries, MySQL showed delayed or unfinished responses for complex queries with multiple-join statements. These results show that graph-based databases, such as Neo4j, are an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data. PMID:28416946
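The performance gap described above comes largely from how multi-hop questions are answered: in a relational schema each hop is another join, whereas a graph store keeps each node's relationships with the node, so a hop is a neighbour lookup. A minimal in-memory sketch of that traversal style follows; it is plain Python, not Neo4j, and the node names and relationship types are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# node -> list of (relationship type, neighbour) pairs
Graph = Dict[str, List[Tuple[str, str]]]

def add_edge(g: Graph, src: str, rel: str, dst: str) -> None:
    """Store a directed, typed relationship src -[rel]-> dst."""
    g.setdefault(src, []).append((rel, dst))

def follow(g: Graph, start: str, rel_path: List[str]) -> Set[str]:
    """Return the nodes reached from start by following the relationship types in order."""
    frontier = {start}
    for rel in rel_path:
        frontier = {dst for node in frontier for r, dst in g.get(node, []) if r == rel}
    return frontier

g: Graph = {}
add_edge(g, "drug1", "TARGETS", "geneA")
add_edge(g, "drug1", "TARGETS", "geneB")
add_edge(g, "geneA", "ASSOCIATED_WITH", "disease1")
add_edge(g, "geneB", "ASSOCIATED_WITH", "disease2")
add_edge(g, "geneC", "ASSOCIATED_WITH", "disease3")
print(sorted(follow(g, "drug1", ["TARGETS", "ASSOCIATED_WITH"])))  # ['disease1', 'disease2']
```

In Neo4j the same drug-to-disease question would typically be written as a Cypher pattern such as `MATCH (:Drug {name: 'drug1'})-[:TARGETS]->(:Gene)-[:ASSOCIATED_WITH]->(d) RETURN d`, where the labels and relationship types are again hypothetical; in SQL it would need one join per hop.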
Tcof1-Related Molecular Networks in Treacher Collins Syndrome.

PubMed

Dai, Jiewen; Si, Jiawen; Wang, Minjiao; Huang, Li; Fang, Bing; Shi, Jun; Wang, Xudong; Shen, Guofang

2016-09-01

Treacher Collins syndrome (TCS) is a rare, autosomal-dominant disorder characterized by craniofacial deformities, and is primarily caused by mutations in the Tcof1 gene. This article aimed to perform a comprehensive literature review and systematic bioinformatic analysis of Tcof1-related molecular networks in TCS. First, the up- and down-regulated genes in Tcof1 heterozygous haploinsufficient mutant mouse embryos and in Tcof1 knockdown and Tcof1 over-expressing neuroblastoma N1E-115 cells were obtained from the Gene Expression Omnibus database. The GeneDecks database was used to identify the 500 genes most closely related to Tcof1. Then, the relationships between 4 gene sets (a predicted set and sets comparing the wild type with the 3 Gene Expression Omnibus datasets) were analyzed using the DAVID, GeneMANIA and STRING databases. The analysis showed that the Tcof1-related genes were enriched in various biological processes, including cell proliferation, apoptosis, cell cycle, differentiation, and migration. They were also enriched in several signaling pathways, such as the ribosome, p53, cell cycle, and WNT signaling pathways. Additionally, these genes clearly had direct or indirect interactions with Tcof1 and with each other. The findings of the literature review and bioinformatic analysis imply that special attention should be given to these pathways, as they may offer targets for TCS therapies.
FBIS: A regional DNA barcode archival & analysis system for Indian fishes

PubMed Central

Nagpure, Naresh Sahebrao; Rashid, Iliyas; Pathak, Ajey Kumar; Singh, Mahender; Singh, Shri Prakash; Sarkar, Uttam Kumar

2012-01-01

DNA barcoding is a new tool for recognizing and classifying taxa based on the sequence of a fragment of the mitochondrial gene cytochrome c oxidase I (COI). In view of the growing importance of fish DNA barcoding for species identification, molecular taxonomy and fish diversity conservation, we developed a Fish Barcode Information System (FBIS) for Indian fishes, which will serve as a regional DNA barcode archival and analysis system. The database presently contains 2334 sequence records of the COI gene for 472 aquatic species belonging to 39 orders and 136 families, collected from available published data sources. Additionally, it contains information on the phenotype, distribution and IUCN Red List status of fishes. The web version of FBIS was designed using MySQL, Perl and PHP on a Linux operating platform to (a) store and manage the acquired records, (b) analyze and explore DNA barcode records, and (c) identify species and estimate genetic divergence. FBIS has also been integrated with appropriate tools for retrieving and viewing information about the database statistics and taxonomy. FBIS is expected to be a useful information system for fish molecular taxonomy, phylogeny and genomics. Availability: The database is freely available at http://mail.nbfgr.res.in/fbis/ PMID:22715304
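FBIS estimates genetic divergence between barcode sequences, but the abstract does not say which substitution model it uses. The Kimura 2-parameter (K2P) distance is widely used in COI barcoding, so the sketch below assumes it; the input sequences must already be aligned, and gaps or ambiguous bases are simply skipped.

```python
import math

PURINES = {"A", "G"}

def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps ('-') and ambiguous bases such as 'N'
        sites += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    # Saturated divergence makes a log argument non-positive and raises ValueError.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)

print(round(k2p_distance("ACGTACGTAC", "GCGTACGTAC"), 4))  # 0.1116 (one transition in 10 sites)
```

K2P weights transitions and transversions separately because transitions occur more often in practice, so its distances are slightly larger than the raw proportion of differing sites (0.1 in this example).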
NCBI Bookshelf: books and documents in life sciences and health care

PubMed Central

Hoeppner, Marilu A.

2013-01-01

Bookshelf (http://www.ncbi.nlm.nih.gov/books/) is a full-text electronic literature resource of books and documents in life sciences and health care at the National Center for Biotechnology Information (NCBI). Created in 1999 with a single book as an encyclopedic reference for resources such as PubMed and GenBank, it has grown to its current size of >1300 titles. Unlike other NCBI databases, such as GenBank and Gene, which have a strict data structure, books come in all forms; they are diverse in publication types, formats, sizes and authoring models. The Bookshelf data format is XML tagged in the NCBI Book DTD (Document Type Definition), modeled after the National Library of Medicine journal article DTDs. The book DTD has been used for systematically tagging the diverse data formats of books, a move that has set the foundation for the growth of this resource. Books at NCBI followed the route of journal articles in the PubMed Central project, using the PubMed Central architectural framework, workflows and processes. Through integration with other NCBI molecular databases, books at NCBI can be used to provide reference information for biological data and facilitate its discovery. This article describes Bookshelf at NCBI: its growth, data handling and retrieval and integration with molecular databases. PMID:23203889
MareyMap Online: A User-Friendly Web Application and Database Service for Estimating Recombination Rates Using Physical and Genetic Maps.

PubMed

Siberchicot, Aurélie; Bessy, Adrien; Guéguen, Laurent; Marais, Gabriel A B

2017-10-01

Given the importance of meiotic recombination in biology, there is a need to develop robust methods to estimate meiotic recombination rates. A popular approach, called the Marey map approach, relies on comparing genetic and physical maps of a chromosome to estimate local recombination rates. We previously implemented this approach in an R package called MareyMap, which includes many functionalities for obtaining reliable recombination rate estimates in a semi-automated way. MareyMap has been used repeatedly in studies of the effect of recombination on genome evolution. Here, we propose a simpler, user-friendly web service version of MareyMap, called MareyMap Online, which allows users to obtain recombination rates in a few clicks, either from their own data or from a publicly available database that we provide. When the analysis is done, the user is asked whether her/his curated data can be placed in the database and shared with other users, which we hope will make future meta-analyses of recombination rates across many species easy. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
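The Marey map approach treats the local recombination rate as the slope of genetic position (cM) against physical position (Mb) along a chromosome. MareyMap provides its own interpolation methods; the sketch below substitutes a simple sliding-window least-squares slope as an assumption, and the marker data are made up.

```python
from typing import List, Tuple

def local_rates(markers: List[Tuple[float, float]], window_mb: float) -> List[Tuple[float, float]]:
    """Estimate the recombination rate (cM/Mb) at each marker.

    markers: (physical position in Mb, genetic position in cM) pairs.
    The rate at a marker is the least-squares slope of genetic on physical
    position over all markers within +/- window_mb / 2 of it.
    """
    ordered = sorted(markers)
    rates = []
    for x0, _ in ordered:
        pts = [(x, y) for x, y in ordered if abs(x - x0) <= window_mb / 2]
        n = len(pts)
        mean_x = sum(x for x, _ in pts) / n
        mean_y = sum(y for _, y in pts) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in pts)
        if sxx == 0:
            continue  # fewer than two distinct physical positions in the window
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
        rates.append((x0, sxy / sxx))
    return rates

# Made-up markers: 2 cM per Mb except a faster stretch between 3 and 4 Mb.
markers = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 10.0), (5.0, 12.0)]
for pos, rate in local_rates(markers, window_mb=2.0):
    print(f"{pos:.1f} Mb: {rate:.2f} cM/Mb")
```

The window size trades resolution for noise: a narrow window tracks local hotspots but is driven by few markers, while a wide one smooths them away, which is why curated marker data and a sensible smoothing choice matter for reliable estimates.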
Planform: an application and database of graph-encoded planarian regenerative experiments.

PubMed

Lobo, Daniel; Malone, Taylor J; Levin, Michael

2013-04-15

Understanding the mechanisms governing the regeneration capabilities of many organisms is a fundamental interest in biology and medicine. An ever-increasing number of manipulation and molecular experiments are attempting to discover a comprehensive model for regeneration, with the planarian flatworm being one of the most important model species. Despite much effort, no comprehensive, constructive, mechanistic models exist yet, and it is now clear that computational tools are needed to mine this huge dataset. However, until now there has been no database of regenerative experiments, and the current genotype-phenotype ontologies and databases are based on textual descriptions, which computers cannot readily interpret. To overcome these difficulties, we present Planform (Planarian formalization), a manually curated database and software tool for planarian regenerative experiments, based on a mathematical graph formalism. The database contains more than a thousand experiments from the main publications in the planarian literature. The software tool provides the user with a graphical interface to easily interact with and mine the database. The presented system is a valuable resource for the regeneration community and, more importantly, will pave the way for the application of novel artificial intelligence tools to extract knowledge from this dataset. The database and software tool are freely available at http://planform.daniel-lobo.com.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Rustad, James

Since they first puzzled over the geometric regularity of faceted crystals, geologists have been striving for a molecular-level understanding of the processes that control the transformation of earth materials. The relative lack of success in this endeavor can be revealed by asking why, if everyone knows what a molecular biologist is, there is no such corresponding occupation as a molecular geologist. That this should be so is even more surprising considering the vast amount of effort devoted over the 20th century to the determination of thousands of crystal structures of minerals of geological importance. Up through the 1970s every geology department in a major research university had at least one specialist in X-ray mineralogy and crystallography. Roughly contemporaneous with the understanding of plate tectonics, geology had completed a remarkably comprehensive database of the crystal structures of thousands of minerals making up the Earth's crust and the more remote mineral assemblages making up the Earth's mantle. Uncovering the fundamental atomic structures of earth materials should have had the same transformational effect on geology that, for example, protein crystallography had on biology. The most basic and most interesting questions, such as the motions of tectonic plates, the rates of dissolution and weathering of rocks at the earth's surface into primary oxides and clay minerals, the process of replacing and preserving biological materials with minerals on deep time-scales, and the fractionation of isotopes during establishment of the earth's rock record have a molecular component that is no less central or less fascinating than those underpinning biological processes.
An editor for pathway drawing and data visualization in the Biopathways Workbench.

PubMed

Byrnes, Robert W; Cotter, Dawn; Maer, Andreia; Li, Joshua; Nadeau, David; Subramaniam, Shankar

2009-10-02

Pathway models serve as the basis for much of systems biology. They are often built using programs designed for the purpose. Constructing new models generally requires simultaneous access to experimental data of diverse types, to databases of well-characterized biological compounds and molecular intermediates, and to reference model pathways. However, few if any software applications provide all such capabilities within a single user interface. The Pathway Editor is a program written in the Java programming language that allows de-novo pathway creation and downloading of LIPID MAPS (Lipid Metabolites and Pathways Strategy) and KEGG lipid metabolic pathways, and of measured time-dependent changes to lipid components of metabolism. Accessed through Java Web Start, the program downloads pathways from the LIPID MAPS Pathway database (Pathway) as well as from the LIPID MAPS web server http://www.lipidmaps.org. Data arises from metabolomic (lipidomic), microarray, and protein array experiments performed by the LIPID MAPS consortium of laboratories and is arranged by experiment. Facility is provided to create, connect, and annotate nodes and processes on a drawing panel with reference to database objects and time course data. Node and interaction layout as well as data display may be configured in pathway diagrams as desired. Users may extend diagrams, and may also read and write data and non-lipidomic KEGG pathways to and from files. Pathway diagrams in XML format, containing database identifiers referencing specific compounds and experiments, can be saved to a local file for subsequent use. The program is built upon a library of classes, referred to as the Biopathways Workbench, that convert between different file formats and database objects. An example of this feature is provided in the form of read/construct/write access to models in SBML (Systems Biology Markup Language) contained in the local file system. 
Inclusion of access to multiple experimental data types and of pathway diagrams within a single interface, automatic updating through connectivity to an online database, and a focus on annotation, including reference to standardized lipid nomenclature as well as common lipid names, supports the view that the Pathway Editor represents a significant, practicable contribution to current pathway modeling tools.
GEneSTATION 1.0: a synthetic resource of diverse evolutionary and functional genomic data for studying the evolution of pregnancy-associated tissues and phenotypes

PubMed Central

Kim, Mara; Cooper, Brian A.; Venkat, Rohit; Phillips, Julie B.; Eidem, Haley R.; Hirbo, Jibril; Nutakki, Sashank; Williams, Scott M.; Muglia, Louis J.; Capra, J. Anthony; Petren, Kenneth; Abbot, Patrick; Rokas, Antonis; McGary, Kriston L.

2016-01-01

Mammalian gestation and pregnancy are fast evolving processes that involve the interaction of the fetal, maternal and paternal genomes. Version 1.0 of the GEneSTATION database (http://genestation.org) integrates diverse types of omics data across mammals to advance understanding of the genetic basis of gestation and pregnancy-associated phenotypes and to accelerate the translation of discoveries from model organisms to humans. GEneSTATION is built using tools from the Generic Model Organism Database project, including the biology-aware database CHADO, new tools for rapid data integration, and algorithms that streamline synthesis and user access. GEneSTATION contains curated life history information on pregnancy and reproduction from 23 high-quality mammalian genomes. For every human gene, GEneSTATION contains diverse evolutionary (e.g. gene age, population genetic and molecular evolutionary statistics), organismal (e.g. tissue-specific gene and protein expression, differential gene expression, disease phenotype), and molecular data types (e.g. Gene Ontology Annotation, protein interactions), as well as links to many general (e.g. Entrez, PubMed) and pregnancy disease-specific (e.g. PTBgene, dbPTB) databases. By facilitating the synthesis of diverse functional and evolutionary data in pregnancy-associated tissues and phenotypes and enabling their quick, intuitive, accurate and customized meta-analysis, GEneSTATION provides a novel platform for comprehensive investigation of the function and evolution of mammalian pregnancy. PMID:26567549
BioM2MetDisease: a manually curated database for associations between microRNAs, metabolites, small molecules and metabolic diseases

PubMed Central

Xu, Yanjun; Yang, Haixiu; Wu, Tan; Dong, Qun; Sun, Zeguo; Shang, Desi; Li, Feng; Xu, Yingqi; Su, Fei; Liu, Siyao

2017-01-01

BioM2MetDisease is a manually curated database that aims to provide a comprehensive and experimentally supported resource of associations between metabolic diseases and various biomolecules. Recently, metabolic diseases such as diabetes have become one of the leading threats to people's health. Metabolic diseases are associated with alterations of multiple types of biomolecules, such as miRNAs and metabolites. An integrated, high-quality data source that collects metabolic disease-associated biomolecules is essential for exploring the underlying molecular mechanisms and discovering novel therapeutics. Here, we developed the BioM2MetDisease database, which currently documents 2681 entries of relationships between 1147 biomolecules (miRNAs, metabolites and small molecules/drugs) and 78 metabolic diseases across 14 species. Each entry includes the biomolecule category, species, biomolecule name, disease name, dysregulation pattern, experimental technique, a brief description of the metabolic disease-biomolecule relationship, the reference, additional annotation information, etc. BioM2MetDisease provides a user-friendly interface to explore and retrieve all data conveniently. A submission page is also offered for researchers to submit new associations between biomolecules and metabolic diseases. BioM2MetDisease provides a comprehensive resource for studying how biomolecules act in metabolic diseases, and it is helpful for understanding the molecular mechanisms and developing novel therapeutics for metabolic diseases. Database URL: http://www.bio-bigdata.com/BioM2MetDisease/ PMID:28605773
Gene Unprediction with Spurio: A tool to identify spurious protein sequences.

PubMed

Höps, Wolfram; Jeffryes, Matt; Bateman, Alex

2018-01-01

We now have access to the sequences of tens of millions of proteins. These protein sequences are essential for modern molecular biology and computational biology. The vast majority of protein sequences are derived from gene prediction tools and have no experimental supporting evidence for their translation. Despite the increasing accuracy of gene prediction tools, there likely exists a large number of spurious protein predictions in the sequence databases. We have developed the Spurio tool to help identify spurious protein predictions in prokaryotes. Spurio searches the query protein sequence against a prokaryotic nucleotide database using tblastn and identifies homologous sequences. The tblastn matches are used to score the query sequence's likelihood of being a spurious protein prediction using a Gaussian process model. The most informative feature is the appearance of stop codons within the presumed translation of homologous DNA sequences. Benchmarking shows that the Spurio tool is able to distinguish spurious from true proteins. However, transposon proteins are prone to be predicted as spurious because of the frequency of degraded homologs found in the DNA sequence databases. Our initial experiments suggest that less than 1% of the proteins in the UniProtKB sequence database are likely to be spurious and that Spurio is able to identify over 60 times more spurious proteins than the AntiFam resource. The Spurio software and source code are available under an MIT license at the following URL: https://bitbucket.org/bateman-group/spurio.
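The abstract names in-frame stop codons in the translated homologous DNA as Spurio's most informative feature. As a minimal sketch of that single feature only (not Spurio's tblastn search or its Gaussian process model), the hypothetical helper below counts internal stop codons in one reading frame of a DNA segment, assuming the standard genetic code and ignoring a trailing stop.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def count_internal_stops(dna: str, frame: int = 0) -> int:
    """Count stop codons in one reading frame, excluding a terminal stop."""
    seq = dna.upper().replace("U", "T")
    # Complete codons only: i + 3 <= len(seq).
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    if codons and codons[-1] in STOP_CODONS:
        codons = codons[:-1]  # a final stop is expected, not suspicious
    return sum(1 for codon in codons if codon in STOP_CODONS)


print(count_internal_stops("ATGAAATAA"))     # intact ORF
print(count_internal_stops("ATGTAAAAATGA"))  # one premature stop
```

In a real pipeline the frame and span would come from each tblastn alignment, and the counts across many homologs would feed a scoring model.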
De Novo Assembly, Gene Annotation, and Marker Discovery in Stored-Product Pest Liposcelis entomophila (Enderlein) Using Transcriptome Sequences

PubMed Central

Wei, Dan-Dan; Chen, Er-Hu; Ding, Tian-Bo; Chen, Shi-Chun; Dou, Wei; Wang, Jin-Jun

2013-01-01

Background As a major stored-product pest insect, Liposcelis entomophila has developed high levels of resistance to various insecticides in grain storage systems. However, the molecular mechanisms underlying resistance and environmental stress have not been characterized. To date, there is a lack of genomic information for this species. Therefore, studies aimed at profiling the L. entomophila transcriptome would provide a better understanding of its biological functions at the molecular level. Methodology/Principal Findings We applied Illumina sequencing technology to sequence the transcriptome of L. entomophila. A total of 54,406,328 clean reads were obtained and de novo assembled into 54,220 unigenes, with an average length of 571 bp. Through a similarity search, 33,404 (61.61%) unigenes were matched to known proteins in the NCBI non-redundant (Nr) protein database. These unigenes were further functionally annotated with the gene ontology (GO), cluster of orthologous groups of proteins (COG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. A large number of genes potentially involved in insecticide resistance were manually curated, including 68 putative cytochrome P450 genes, 37 putative glutathione S-transferase (GST) genes, 19 putative carboxyl/cholinesterase (CCE) genes, and 126 other transcripts containing target site sequences or encoding detoxification genes, representing eight types of resistance enzymes. Furthermore, to gain insight into the molecular basis of the L. entomophila response to thermal stress, 25 heat shock protein (Hsp) genes were identified. In addition, 1,100 SSRs and 57,757 SNPs were detected, and 231 pairs of SSR primers were designed for investigating genetic diversity in the future. Conclusions/Significance We developed a comprehensive transcriptomic database for L. entomophila.
These sequences and putative molecular markers would further promote our understanding of the molecular mechanisms underlying insecticide resistance or environmental stress, and will facilitate studies on population genetics for psocids, as well as providing useful information for functional genomic research in the future. PMID:24244605
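SSR (microsatellite) detection of the kind reported above amounts to scanning assembled sequences for short motifs repeated in tandem. The sketch below finds perfect SSRs with regular expressions; the minimum copy numbers per motif length are an assumption in the spirit of common SSR-finder defaults, not the thresholds used in this study, and all function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Assumed minimum tandem copies per motif length (not from the paper).
MIN_COPIES: Dict[int, int] = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_compound_motif(motif: str) -> bool:
    """True if the motif is a repeat of a shorter unit (e.g. 'ATAT', 'AA')."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Return sorted (start, motif, copies) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_copies in MIN_COPIES.items():
        # A k-mer followed by at least (min_copies - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_compound_motif(motif):
                continue  # already reported under its shorter unit
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


print(find_ssrs("GG" + "AT" * 7 + "CC"))
```

Primer design would then take flanking sequence on both sides of each hit, which requires the SSR to sit far enough from the contig ends.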

Surface similarity-based molecular query-retrieval

PubMed Central

Singh, Rahul

2007-01-01

Background Discerning the similarity between molecules is a challenging problem in drug discovery as well as in molecular biology. The importance of this problem is due to the fact that the biochemical characteristics of a molecule are closely related to its structure. Therefore molecular similarity is a key notion in investigations targeting exploration of molecular structural space, query-retrieval in molecular databases, and structure-activity modelling. Determining molecular similarity is related to the choice of molecular representation. Currently, representations with high descriptive power and physical relevance like 3D surface-based descriptors are available. Information from such representations is both surface-based and volumetric. However, most techniques for determining molecular similarity tend to focus on idealized 2D graph-based descriptors due to the complexity that accompanies reasoning with more elaborate representations. Results This paper addresses the problem of determining similarity when molecules are described using complex surface-based representations. It proposes an intrinsic, spherical representation that systematically maps points on a molecular surface to points on a standard coordinate system (a sphere). Molecular surface properties such as shape, field strengths, and effects due to field super-positioning can then be captured as distributions on the surface of the sphere. Surface-based molecular similarity is subsequently determined by computing the similarity of the surface-property distributions using a novel formulation of histogram-intersection. The similarity formulation is not only sensitive to the 3D distribution of the surface properties, but is also highly efficient to compute.
Conclusion The proposed method obviates the computationally expensive step of molecular pose-optimisation, can incorporate conformational variations, and facilitates highly efficient determination of similarity by directly comparing molecular surfaces and surface-based properties. Retrieval performance, applications in structure-activity modeling of complex biological properties, and comparisons with existing research and commercial methods demonstrate the validity and effectiveness of the approach. PMID:17634096
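To make the histogram-intersection step concrete, here is a minimal sketch that bins per-point surface property values on a unit sphere into a latitude/longitude grid and compares two such histograms with the classic normalized intersection (sum of bin-wise minima). The paper describes a novel formulation of histogram intersection; this is the standard version, and the binning scheme, grid sizes, and function names are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def spherical_histogram(
    points: Sequence[Tuple[float, float, float]],
    weights: Sequence[float],
    n_theta: int = 4,
    n_phi: int = 8,
) -> List[float]:
    """Accumulate per-point weights (e.g. a surface property) into a
    latitude/longitude grid of bins on the sphere. Points must be non-zero."""
    hist = [0.0] * (n_theta * n_phi)
    for (x, y, z), w in zip(points, weights):
        r = math.sqrt(x * x + y * y + z * z)
        theta = math.acos(max(-1.0, min(1.0, z / r)))  # polar angle in [0, pi]
        phi = math.atan2(y, x) % (2.0 * math.pi)       # azimuth in [0, 2*pi)
        i = min(int(theta / math.pi * n_theta), n_theta - 1)
        j = min(int(phi / (2.0 * math.pi) * n_phi), n_phi - 1)
        hist[i * n_phi + j] += w
    return hist


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Sum of bin-wise minima after normalizing each histogram to unit mass.
    Returns a similarity in [0, 1]; 1 means identical distributions."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    s1, s2 = float(sum(h1)), float(sum(h2))
    if s1 <= 0.0 or s2 <= 0.0:
        raise ValueError("histograms must have positive total mass")
    return sum(min(a / s1, b / s2) for a, b in zip(h1, h2))


north = spherical_histogram([(0, 0, 1), (1, 0, 0)], [1.0, 1.0])
south = spherical_histogram([(0, 0, -1)], [1.0])
print(histogram_intersection(north, north), histogram_intersection(north, south))
```

Because the comparison is a single pass over bins, its cost is linear in the number of bins, which fits the efficiency argument made in the abstract.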
The State of the Art of the Zebrafish Model for Toxicology and Toxicologic Pathology Research—Advantages and Current Limitations

PubMed Central

Spitsbergen, Jan M.; Kent, Michael L.

2007-01-01

The zebrafish (Danio rerio) is now the pre-eminent vertebrate model system for clarification of the roles of specific genes and signaling pathways in development. The zebrafish genome will be completely sequenced within the next 1–2 years. Together with the substantial historical database regarding basic developmental biology, toxicology, and gene transfer, the rich foundation of molecular genetic and genomic data makes zebrafish a powerful model system for clarifying mechanisms in toxicity. In contrast to the highly advanced knowledge base on molecular developmental genetics in zebrafish, our database regarding infectious and noninfectious diseases and pathologic lesions in zebrafish lags far behind the information available on most other domestic mammalian and avian species, particularly rodents. Currently, minimal data are available regarding spontaneous neoplasm rates or spontaneous aging lesions in any of the commonly used wild-type or mutant lines of zebrafish. Therefore, to fully utilize the potential of zebrafish as an animal model for understanding human development, disease, and toxicology we must greatly advance our knowledge on zebrafish diseases and pathology. PMID:12597434
Bayesian screening for active compounds in high-dimensional chemical spaces combining property descriptors and molecular fingerprints.

PubMed

Vogt, Martin; Bajorath, Jürgen

2008-01-01

Bayesian classifiers are increasingly being used to distinguish active from inactive compounds and search large databases for novel active molecules. We introduce an approach to directly combine the contributions of property descriptors and molecular fingerprints in the search for active compounds that is based on a Bayesian framework. Conventionally, property descriptors and fingerprints are used as alternative features for virtual screening methods. Following the approach introduced here, probability distributions of descriptor values and fingerprint bit settings are calculated for active and database molecules and the divergence between the resulting combined distributions is determined as a measure of biological activity. In test calculations on a large number of compound activity classes, this methodology was found to consistently perform better than similarity searching using fingerprints and multiple reference compounds or Bayesian screening calculations using probability distributions calculated only from property descriptors. These findings demonstrate that there is considerable synergy between different types of property descriptors and fingerprints in recognizing diverse structure-activity relationships, at least in the context of Bayesian modeling.
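As a rough illustration of comparing bit-setting distributions for active and database molecules, the sketch below estimates smoothed per-bit frequencies, computes a Kullback-Leibler divergence between the two (assuming independent bits), and ranks compounds with a naive-Bayes log-odds score. This is a generic Bayesian screening sketch under those independence and Laplace-smoothing assumptions, not the authors' framework; property descriptors could be added in the same way after discretizing them into bins.

```python
import math
from typing import List, Sequence


def bit_frequencies(fps: Sequence[Sequence[int]], alpha: float = 1.0) -> List[float]:
    """Laplace-smoothed probability that each fingerprint bit is set."""
    n, n_bits = len(fps), len(fps[0])
    return [(sum(fp[i] for fp in fps) + alpha) / (n + 2 * alpha) for i in range(n_bits)]


def bernoulli_kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL divergence D(P || Q) between products of independent Bernoullis."""
    return sum(
        pi * math.log(pi / qi) + (1 - pi) * math.log((1 - pi) / (1 - qi))
        for pi, qi in zip(p, q)
    )


def log_odds_score(fp: Sequence[int], p_active: Sequence[float], p_db: Sequence[float]) -> float:
    """Log-likelihood ratio of a fingerprint under the active vs database model."""
    score = 0.0
    for bit, pa, pd in zip(fp, p_active, p_db):
        score += math.log(pa / pd) if bit else math.log((1 - pa) / (1 - pd))
    return score


actives = [[1, 1, 0], [1, 0, 0]]
database = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [0, 0, 0]]
p_act, p_db = bit_frequencies(actives), bit_frequencies(database)
print(bernoulli_kl(p_act, p_db))
print(sorted(database, key=lambda fp: log_odds_score(fp, p_act, p_db), reverse=True))
```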
Informatics approaches in the Biological Characterization of ...

EPA Pesticide Factsheets

Adverse Outcome Pathways (AOPs) are a conceptual framework to characterize toxicity pathways by a series of mechanistic steps from a molecular initiating event to population outcomes. This framework helps to direct risk assessment research, for example by aiding in computational prioritization of chemicals, genes, and tissues relevant to an adverse health outcome. We have designed and implemented a computational workflow to access a wealth of public data relating genes, chemicals, diseases, pathways, and species, to provide a biological context for putative AOPs. We selected three AOP case studies: ER/Aromatase Antagonism Leading to Reproductive Dysfunction, AHR1 Activation Leading to Cardiotoxicity, and AChE Inhibition Leading to Acute Mortality, and deduced a taxonomic range of applicability for each AOP. We developed computational tools to automatically access and analyze the pathway activity of AOP-relevant protein orthologs, finding broad similarity among vertebrate species for the ER/Aromatase and AHR1 AOPs, and similarity extending to invertebrate animal species for AChE inhibition. Additionally, we used public gene expression data to find groups of highly co-expressed genes, and compared those groups across organisms. To interpret these findings at a higher level of biological organization, we created the AOPdb, a relational database that mines results from sources including NCBI, KEGG, Reactome, CTD, and OMIM. This multi-source database connects genes,
The Arabidopsis Information Resource: Making and Mining the ‘Gold Standard’ Annotated Reference Plant Genome

PubMed Central

Berardini, Tanya Z.; Reiser, Leonore; Li, Donghui; Mezheritsky, Yarik; Muller, Robert; Strait, Emily; Huala, Eva

2015-01-01

The Arabidopsis Information Resource (TAIR) is a continuously updated, online database of genetic and molecular biology data for the model plant Arabidopsis thaliana that provides a global research community with centralized access to data for over 30,000 Arabidopsis genes. TAIR's biocurators systematically extract, organize, and interconnect experimental data from the literature along with computational predictions, community submissions, and high throughput datasets to present a high quality and comprehensive picture of Arabidopsis gene function. TAIR provides tools for data visualization and analysis, and enables ordering of seed and DNA stocks, protein chips and other experimental resources. TAIR actively engages with its users, who contribute expertise and data that augment the work of the curatorial staff. Within an extensive and evolving ecosystem of online resources for plant biology, TAIR's focus is on the critically important role of extracting experimentally-based research findings from the literature and making that information computationally accessible. In response to the loss of government grant funding, the TAIR team founded a nonprofit entity, Phoenix Bioinformatics, with the aim of developing sustainable funding models for biological databases, using TAIR as a test case. Phoenix has successfully transitioned TAIR to subscription-based funding while still keeping its data relatively open and accessible. PMID:26201819
DASMiner: discovering and integrating data from DAS sources

PubMed Central

2009-01-01

Background DAS is a widely adopted protocol for providing syntactic interoperability among biological databases. The popularity of DAS is due to a simplified and elegant mechanism for data exchange that consists of sources exposing their RESTful interfaces for data access. As a growing number of DAS services are available for molecular biology resources, there is an incentive to explore this protocol in order to advance data discovery and integration among these resources. Results We developed DASMiner, a Matlab toolkit for querying DAS data sources that enables creation of integrated biological models using the information available in DAS-compliant repositories. DASMiner is composed of a browser application and an API that work together to facilitate gathering of data from different DAS sources, which can be used for creating enriched datasets from multiple sources. The browser is used to formulate queries and navigate data contained in DAS sources. Users can execute queries against these sources in an intuitive fashion, without needing to know the specific DAS syntax for the particular source. Using the source's metadata provided by the DAS Registry, the browser's layout adapts to expose only the set of commands and coordinate systems supported by the specific source. For this reason, the browser can interrogate any DAS source, independently of the type of data being served. The API component of DASMiner may be used for programmatic access to DAS sources by programs in Matlab. Once the desired data are found during navigation, the query is exported in the format of an API call to be used within any Matlab application. We illustrate the use of DASMiner by creating integrative models of histone modification maps and protein-protein interaction networks. These enriched datasets were built by retrieving and integrating distributed genomic and proteomic DAS sources using the API.
Conclusion Support for the DAS protocol allows hundreds of molecular biology databases to be treated as a federated, online collection of resources. DASMiner enables full exploration of these resources, and can be used to deploy applications and create integrated views of biological systems using the information deposited in DAS repositories. PMID:19919683
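DASMiner itself is a Matlab toolkit; the Python sketch below only illustrates the RESTful pattern the abstract describes, namely a DAS "features" request for a genomic segment and parsing of the DASGFF XML that comes back. The server URL, source name, and sample response are hypothetical placeholders, and the sample omits optional DASGFF elements such as METHOD, ORIENTATION, and PHASE.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def das_features_url(server: str, source: str, ref: str, start: int, stop: int) -> str:
    """Build a DAS 'features' request for one segment (ref:start,stop)."""
    return f"{server.rstrip('/')}/das/{source}/features?segment={ref}:{start},{stop}"


def parse_dasgff(xml_text: str) -> List[Tuple[str, str, int, int]]:
    """Extract (feature id, type id, start, end) tuples from a DASGFF document."""
    root = ET.fromstring(xml_text)
    features = []
    for feat in root.iter("FEATURE"):
        ftype = feat.find("TYPE")
        features.append((
            feat.get("id", ""),
            ftype.get("id", "") if ftype is not None else "",
            int(feat.findtext("START", default="0")),
            int(feat.findtext("END", default="0")),
        ))
    return features


# Hypothetical response, shaped like a minimal DASGFF document.
SAMPLE = """<DASGFF>
  <GFF href="https://das.example.org/das/demo_source/features?segment=1:1,1000">
    <SEGMENT id="1" start="1" stop="1000">
      <FEATURE id="feat1" label="example">
        <TYPE id="exon">exon</TYPE>
        <START>100</START>
        <END>250</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""

print(das_features_url("https://das.example.org/", "demo_source", "1", 1, 1000))
print(parse_dasgff(SAMPLE))
```

Against a live server, the XML would be fetched with an HTTP GET of the built URL; keeping URL construction and parsing separate makes both easy to test offline.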
Database resources of the National Center for Biotechnology Information.

PubMed

2016-01-04

The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank(®) nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. Additional NCBI resources focus on literature (PubMed Central (PMC), Bookshelf and PubReader), health (ClinVar, dbGaP, dbMHC, the Genetic Testing Registry, HIV-1/Human Protein Interaction Database and MedGen), genomes (BioProject, Assembly, Genome, BioSample, dbSNP, dbVar, Epigenomics, the Map Viewer, Nucleotide, Probe, RefSeq, Sequence Read Archive, the Taxonomy Browser and the Trace Archive), genes (Gene, Gene Expression Omnibus (GEO), HomoloGene, PopSet and UniGene), proteins (Protein, the Conserved Domain Database (CDD), COBALT, Conserved Domain Architecture Retrieval Tool (CDART), the Molecular Modeling Database (MMDB) and Protein Clusters) and chemicals (Biosystems and the PubChem suite of small molecule databases). The Entrez system provides search and retrieval operations for most of these databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized datasets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov. Published by Oxford University Press on behalf of Nucleic Acids Research 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US.
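Programmatic access to the Entrez databases listed above goes through NCBI's E-utilities. The sketch below only builds an ESearch request URL and parses a JSON ESearch response, so it runs without network access; the sample response and its IDs are made-up placeholders. For live use, NCBI asks clients to identify themselves (for example with `tool` and `email` parameters) and to respect its request-rate limits.

```python
import json
from typing import List, Tuple
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL asking for JSON output."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def parse_esearch_json(text: str) -> Tuple[int, List[str]]:
    """Return (total hit count, list of UIDs) from an ESearch JSON response."""
    result = json.loads(text)["esearchresult"]
    return int(result["count"]), list(result["idlist"])


print(esearch_url("pubmed", "GenBank[Title]"))

# Placeholder response with invented IDs, shaped like ESearch JSON output.
sample = '{"esearchresult": {"count": "2", "retmax": "2", "idlist": ["111", "222"]}}'
print(parse_esearch_json(sample))
```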
Database resources of the National Center for Biotechnology Information.

PubMed

2015-01-01

The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank(®) nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. Additional NCBI resources focus on literature (Bookshelf, PubMed Central (PMC) and PubReader); medical genetics (ClinVar, dbMHC, the Genetic Testing Registry, HIV-1/Human Protein Interaction Database and MedGen); genes and genomics (BioProject, BioSample, dbSNP, dbVar, Epigenomics, Gene, Gene Expression Omnibus (GEO), Genome, HomoloGene, the Map Viewer, Nucleotide, PopSet, Probe, RefSeq, Sequence Read Archive, the Taxonomy Browser, Trace Archive and UniGene); and proteins and chemicals (Biosystems, COBALT, the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), the Molecular Modeling Database (MMDB), Protein Clusters, Protein and the PubChem suite of small molecule databases). The Entrez system provides search and retrieval operations for many of these databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at http://www.ncbi.nlm.nih.gov. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.
A dedicated database system for handling multi-level data in systems biology.

PubMed

Pornputtapong, Natapol; Wanichthanarak, Kwanjeera; Nilsson, Avlant; Nookaew, Intawat; Nielsen, Jens

2014-01-01

Advances in high-throughput technologies have enabled extensive generation of multi-level omics data. These data are crucial for systems biology research, though they are complex, heterogeneous, highly dynamic, incomplete and distributed among public databases. This leads to difficulties in data accessibility and often results in errors when data are merged and integrated from varied resources. Therefore, integration and management of systems biological data remain very challenging. To overcome this, we designed and developed a dedicated database system that addresses these key data-management issues and thereby facilitates data integration, modeling and analysis in systems biology within a single database. In addition, a yeast data repository was implemented as an integrated database environment operated by the database system. Two applications were implemented to demonstrate the extensibility and utility of the system. Both illustrate how the user can access the database via the web query function and implemented scripts. These scripts are specific to two sample cases: 1) detecting the pheromone pathway in protein interaction networks; and 2) finding metabolic reactions regulated by Snf1 kinase. In this study we present the design of a database system that offers an extensible environment to efficiently capture the majority of biological entities and relations encountered in systems biology. Critical functions and control processes were designed and implemented to ensure consistent, efficient, secure and reliable transactions. The two sample cases on the yeast integrated data clearly demonstrate the value of a single database environment for systems biology research.
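One common way to hold heterogeneous biological entities and relations in a single database is a generic entity table plus a typed relation table. The sketch below uses SQLite with a hypothetical schema and placeholder entity names (not the authors' design or real yeast data) to show how a query in the spirit of the second sample case, reactions regulated by a given kinase, can be expressed as a join along "phosphorylates" then "catalyzes" edges.

```python
import sqlite3
from typing import List

# Hypothetical generic schema: every entity in one table, every edge in another.
SCHEMA = """
CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL, kind TEXT NOT NULL);
CREATE TABLE relation (
    subject_id INTEGER NOT NULL REFERENCES entity(id),
    predicate  TEXT    NOT NULL,
    object_id  INTEGER NOT NULL REFERENCES entity(id)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database filled with placeholder entities."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    entities = [("kinase_A", "protein"), ("enzyme_B", "protein"), ("enzyme_C", "protein"),
                ("reaction_R1", "reaction"), ("reaction_R2", "reaction")]
    con.executemany("INSERT INTO entity (name, kind) VALUES (?, ?)", entities)
    ids = {name: eid for eid, name in con.execute("SELECT id, name FROM entity")}
    edges = [("kinase_A", "phosphorylates", "enzyme_B"),
             ("enzyme_B", "catalyzes", "reaction_R1"),
             ("enzyme_C", "catalyzes", "reaction_R2")]
    con.executemany("INSERT INTO relation VALUES (?, ?, ?)",
                    [(ids[s], p, ids[o]) for s, p, o in edges])
    return con


def reactions_regulated_by(con: sqlite3.Connection, kinase: str) -> List[str]:
    """Reactions catalyzed by enzymes that the given kinase phosphorylates."""
    query = """
    SELECT DISTINCT r.name
    FROM entity k
    JOIN relation reg ON reg.subject_id = k.id AND reg.predicate = 'phosphorylates'
    JOIN relation cat ON cat.subject_id = reg.object_id AND cat.predicate = 'catalyzes'
    JOIN entity r ON r.id = cat.object_id AND r.kind = 'reaction'
    WHERE k.name = ?
    ORDER BY r.name
    """
    return [row[0] for row in con.execute(query, (kinase,))]


print(reactions_regulated_by(build_demo_db(), "kinase_A"))
```

A generic entity/relation layout trades some query speed for the ability to add new entity and relation types without schema changes.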
Plant Reactome: a resource for plant pathways and comparative analysis

PubMed Central

Naithani, Sushma; Preece, Justin; D'Eustachio, Peter; Gupta, Parul; Amarasinghe, Vindhya; Dharmawardhana, Palitha D.; Wu, Guanming; Fabregat, Antonio; Elser, Justin L.; Weiser, Joel; Keays, Maria; Fuentes, Alfonso Munoz-Pomer; Petryszak, Robert; Stein, Lincoln D.; Ware, Doreen; Jaiswal, Pankaj

2017-01-01

Plant Reactome (http://plantreactome.gramene.org/) is a free, open-source, curated plant pathway database portal, provided as part of the Gramene project. The database provides intuitive bioinformatics tools for the visualization, analysis and interpretation of pathway knowledge to support genome annotation, genome analysis, modeling, systems biology, basic research and education. Plant Reactome employs the structural framework of a plant cell to show metabolic, transport, genetic, developmental and signaling pathways. We manually curate molecular details of pathways in these domains for reference species Oryza sativa (rice) supported by published literature and annotation of well-characterized genes. Two hundred twenty-two rice pathways, 1025 reactions associated with 1173 proteins, 907 small molecules and 256 literature references have been curated to date. These reference annotations were used to project pathways for 62 model, crop and evolutionarily significant plant species based on gene homology. Database users can search and browse various components of the database, visualize curated baseline expression of pathway-associated genes provided by the Expression Atlas and upload and analyze their Omics datasets. The database also offers data access via Application Programming Interfaces (APIs) and in various standardized pathway formats, such as SBML and BioPAX. PMID:27799469
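Projecting curated rice pathways onto other species "based on gene homology" can be pictured as mapping each reference gene to its orthologs in the target species. The sketch below does that with plain dictionaries; the gene IDs are placeholders, and the rule of projecting a pathway only when at least half of its genes have orthologs is a hypothetical threshold, not Plant Reactome's documented criterion.

```python
from typing import Dict, List, Set


def project_pathways(
    reference: Dict[str, Set[str]],
    orthologs: Dict[str, List[str]],
    min_fraction: float = 0.5,
) -> Dict[str, Set[str]]:
    """Project reference pathways onto a target species via ortholog mapping.

    reference: pathway name -> set of reference-species gene IDs.
    orthologs: reference gene ID -> list of target-species ortholog IDs.
    A pathway is kept only if at least min_fraction of its genes have one or
    more orthologs (hypothetical coverage threshold).
    """
    projected: Dict[str, Set[str]] = {}
    for pathway, genes in reference.items():
        if not genes:
            continue
        covered = [g for g in genes if orthologs.get(g)]
        if len(covered) / len(genes) >= min_fraction:
            projected[pathway] = {o for g in covered for o in orthologs[g]}
    return projected


reference = {"pathway_1": {"ref_g1", "ref_g2"}, "pathway_2": {"ref_g3", "ref_g4", "ref_g5"}}
orthologs = {"ref_g1": ["tgt_a"], "ref_g2": ["tgt_b", "tgt_c"], "ref_g3": ["tgt_d"]}
print(project_pathways(reference, orthologs))
```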
ESCAPE: database for integrating high-content published data collected from human and mouse embryonic stem cells.

PubMed

Xu, Huilei; Baroukh, Caroline; Dannenfelser, Ruth; Chen, Edward Y; Tan, Christopher M; Kou, Yan; Kim, Yujin E; Lemischka, Ihor R; Ma'ayan, Avi

2013-01-01

High content studies that profile mouse and human embryonic stem cells (m/hESCs) using various genome-wide technologies such as transcriptomics and proteomics are constantly being published. However, efforts to integrate such data to obtain a global view of the molecular circuitry in m/hESCs are lagging behind. Here, we present an m/hESC-centered database called Embryonic Stem Cell Atlas from Pluripotency Evidence integrating data from many recent diverse high-throughput studies including chromatin immunoprecipitation followed by deep sequencing, genome-wide inhibitory RNA screens, gene expression microarrays or RNA-seq after knockdown (KD) or overexpression of critical factors, immunoprecipitation followed by mass spectrometry proteomics and phosphoproteomics. The database provides web-based interactive search and visualization tools that can be used to build subnetworks and to identify known and novel regulatory interactions across various regulatory layers. The web-interface also includes tools to predict the effects of combinatorial KDs by additive effects controlled by sliders, or through simulation software implemented in MATLAB. Overall, the Embryonic Stem Cell Atlas from Pluripotency Evidence database is a comprehensive resource for the stem cell systems biology community. Database URL: http://www.maayanlab.net/ESCAPE
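The abstract mentions predicting combinatorial knockdowns "by additive effects controlled by sliders" without giving the model. A minimal sketch, assuming additivity in log fold-change space and sliders that scale each single-knockdown effect between 0 and 1, might look like this; the function name and data layout are hypothetical and the numbers are toy values.

```python
from typing import Dict


def additive_combinatorial_kd(
    single_kd_effects: Dict[str, Dict[str, float]],
    sliders: Dict[str, float],
) -> Dict[str, float]:
    """Predict per-gene log fold changes for a combinatorial knockdown as the
    slider-weighted sum of single-knockdown log fold changes."""
    combined: Dict[str, float] = {}
    for factor, weight in sliders.items():
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"slider for {factor} must lie in [0, 1]")
        for gene, lfc in single_kd_effects.get(factor, {}).items():
            combined[gene] = combined.get(gene, 0.0) + weight * lfc
    return combined


effects = {"factor_1": {"gene_1": -1.0, "gene_2": 0.5}, "factor_2": {"gene_1": -0.5}}
print(additive_combinatorial_kd(effects, {"factor_1": 1.0, "factor_2": 0.5}))
```

An additive model ignores epistatic interactions between factors, which is why simulation-based predictions are offered alongside it.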
MetReS, an Efficient Database for Genomic Applications.

PubMed

Vilaplana, Jordi; Alves, Rui; Solsona, Francesc; Mateo, Jordi; Teixidó, Ivan; Pifarré, Marc

2018-02-01

MetReS (Metabolic Reconstruction Server) is a genomic database that is shared between two software applications that address important biological problems. Biblio-MetReS is a data-mining tool that enables the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (re)annotation of proteomes, to properly identify both the individual proteins involved in the processes of interest and their function. The main goal of this work was to identify the areas where the performance of the MetReS database could be improved and to test whether this improvement would scale to larger datasets and more complex types of analysis. The study started with a relational database, MySQL, which is the database server currently used by the applications. We also tested the performance of an alternative data-handling framework, Apache Hadoop, which is widely used for large-scale data processing. We found that this data-handling framework is likely to greatly improve the efficiency of the MetReS applications as the dataset and the processing needs increase by several orders of magnitude, as expected to happen in the near future.
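Hadoop's appeal for a text-mining workload like Biblio-MetReS comes from the MapReduce model: independent map tasks over documents, a shuffle that groups by key, and reduce tasks that aggregate. The pure-Python sketch below mimics those three phases for counting gene-name co-occurrences in abstracts; it is not Hadoop's API, and the naive tokenizer and lowercase vocabulary are simplifying assumptions.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def map_phase(doc: str, vocabulary: Set[str]) -> Iterable[Tuple[Pair, int]]:
    """Emit ((name_a, name_b), 1) for every pair of vocabulary terms in a document."""
    tokens = {t.strip(".,;:()").lower() for t in doc.split()}
    for pair in combinations(sorted(tokens & vocabulary), 2):
        yield pair, 1


def shuffle(pairs: Iterable[Tuple[Pair, int]]) -> Dict[Pair, List[int]]:
    """Group emitted values by key, as the framework would between phases."""
    groups: Dict[Pair, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[Pair, List[int]]) -> Dict[Pair, int]:
    """Sum the counts for each co-occurring pair."""
    return {key: sum(values) for key, values in groups.items()}


def cooccurrence_counts(docs: Iterable[str], vocabulary: Set[str]) -> Dict[Pair, int]:
    emitted = (kv for doc in docs for kv in map_phase(doc, vocabulary))
    return reduce_phase(shuffle(emitted))


docs = ["geneA activates geneB.", "geneB binds geneC and geneA", "no genes here"]
print(cooccurrence_counts(docs, {"genea", "geneb", "genec"}))
```

Because each map call touches one document and each reduce call one key, both phases can be distributed across machines, which is where the scaling benefit reported for larger datasets would come from.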
Using GenBank.

PubMed

Wheeler, David

2007-01-01

GenBank® is a comprehensive database of publicly available DNA sequences for more than 205,000 named organisms and for more than 60,000 within the embryophyta, obtained through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Daily data exchange with the European Molecular Biology Laboratory (EMBL) in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the National Center for Biotechnology Information (NCBI) retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases with taxonomy, genome, mapping, protein structure, and domain information and the biomedical journal literature through PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available through FTP. GenBank usage scenarios ranging from local analyses of the data available through FTP to online analyses supported by the NCBI Web-based tools are discussed. To access GenBank and its related retrieval and analysis services, go to the NCBI Homepage at http://www.ncbi.nlm.nih.gov.
Textpresso site-specific recombinases: A text-mining server for the recombinase literature including Cre mice and conditional alleles.

PubMed

Urbanski, William M; Condie, Brian G

2009-12-01

Textpresso Site Specific Recombinases (http://ssrc.genetics.uga.edu/) is a text-mining web server for searching a database of more than 9,000 full-text publications. The papers and abstracts in this database represent a wide range of topics related to site-specific recombinase (SSR) research tools. Included in the database are most of the papers that report the characterization or use of mouse strains that express Cre recombinase as well as papers that describe or analyze mouse lines that carry conditional (floxed) alleles or SSR-activated transgenes/knockins. The database also includes reports describing SSR-based cloning methods such as the Gateway or the Creator systems, papers reporting the development or use of SSR-based tools in systems such as Drosophila, bacteria, parasites, stem cells, yeast, plants, zebrafish, and Xenopus as well as publications that describe the biochemistry, genetics, or molecular structure of the SSRs themselves. Textpresso Site Specific Recombinases is the only comprehensive text-mining resource available for the literature describing the biology and technical applications of SSRs. (c) 2009 Wiley-Liss, Inc.
Chemical Informatics and the Drug Discovery Knowledge Pyramid

PubMed Central

Lushington, Gerald H.; Dong, Yinghua; Theertham, Bhargav

2012-01-01

The magnitude of the challenges in preclinical drug discovery is evident in the large amount of capital invested in such efforts in pursuit of a small static number of eventually successful marketable therapeutics. An explosion in the availability of potentially drug-like compounds and chemical biology data on these molecules can provide us with the means to improve the eventual success rates for compounds being considered at the preclinical level, but only if the community is able to access available information in an efficient and meaningful way. Thus, chemical database resources are critical to any serious drug discovery effort. This paper explores the basic principles underlying the development and implementation of chemical databases, and examines key issues of how molecular information may be encoded within these databases so as to enhance the likelihood that users will be able to extract meaningful information from data queries. In addition to a broad survey of conventional data representation and query strategies, key enabling technologies such as new context-sensitive chemical similarity measures and chemical cartridges are examined, with recommendations on how such resources may be integrated into a practical database environment. PMID:23782037
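This abstract does not specify the "new context-sensitive chemical similarity measures" it discusses. For orientation, the Python sketch below implements a widely used conventional measure instead: the Tanimoto coefficient on binary fingerprints. Each fingerprint is represented as a set of on-bit indices. The fingerprints, compound IDs and the 0.7 cut-off are illustrative assumptions, not values from the paper.

```python
from typing import AbstractSet, Dict, List, Tuple


def tanimoto(fp_a: AbstractSet[int], fp_b: AbstractSet[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen for this sketch: two empty fingerprints score 0
    return len(fp_a & fp_b) / union


def similarity_search(query: AbstractSet[int],
                      database: Dict[str, AbstractSet[int]],
                      threshold: float = 0.7) -> List[Tuple[str, float]]:
    """Return (compound_id, score) pairs at or above the threshold, best first."""
    hits = [(cid, tanimoto(query, fp)) for cid, fp in database.items()]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: h[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical fingerprints keyed by placeholder compound IDs.
    db = {"cmpd-A": {1, 2, 3, 4}, "cmpd-B": {1, 2, 3, 5}, "cmpd-C": {1, 2, 3, 4, 5}}
    print(similarity_search({1, 2, 3, 4}, db))  # [('cmpd-A', 1.0), ('cmpd-C', 0.8)]
```

A chemical cartridge would typically run this kind of thresholded search inside the database engine rather than in application code.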
Accessing biological actions of Ganoderma secondary metabolites by in silico profiling

PubMed Central

Grienke, Ulrike; Kaserer, Teresa; Pfluger, Florian; Mair, Christina E.; Langer, Thierry; Schuster, Daniela; Rollinger, Judith M.

2016-01-01

The species complex around the medicinal fungus Ganoderma lucidum Karst. (Ganodermataceae) is widely known in traditional medicines as well as in modern applications such as functional food or nutraceuticals. A considerable number of publications reflects its abundance and variety in biological actions either provoked by primary metabolites such as polysaccharides or secondary metabolites such as lanostane-type triterpenes. However, due to this remarkable amount of information, a rationalization of the individual Ganoderma constituents to biological actions on a molecular level is quite challenging. To overcome this issue, a database was generated containing meta-information, i.e. chemical structures and biological actions of hitherto identified Ganoderma constituents (279). This was followed by a computational approach subjecting this 3D multi-conformational molecular dataset to in silico parallel screening against an in-house collection of validated structure- and ligand-based 3D pharmacophore models. The predictive power of the evaluated in silico tools and hints from traditional application fields served as criteria for the model selection. Thus, we focused on representative druggable targets in the field of viral infections (5) and diseases related to the metabolic syndrome (22). The results obtained from this in silico approach were compared to bioactivity data available from the literature to distinguish between true and false positives or negatives. In total, 89 and 197 Ganoderma compounds were predicted as ligands of at least one of the selected pharmacological targets in the antiviral and the metabolic syndrome screening, respectively. Among them, only a minority of individual compounds (around 10%) has ever been investigated on these targets or for the associated biological activity.
Accordingly, this study discloses putative ligand-target interactions for a plethora of Ganoderma constituents in the empirically manifested field of viral diseases and metabolic syndrome, which serve as a basis for future applications to access yet undiscovered biological actions of Ganoderma secondary metabolites on a molecular level. PMID:25457486
Protein-protein interaction networks: unraveling the wiring of molecular machines within the cell.

PubMed

De Las Rivas, Javier; Fontanillo, Celia

2012-11-01

Mapping and understanding the protein interaction networks, with their key modules and hubs, can provide deeper insights into the molecular machinery underlying complex phenotypes. In this article, we present the basic characteristics and definitions of protein networks, starting with a distinction between the different types of associations between proteins. We focus the review on protein-protein interactions (PPIs), a subset of associations defined as physical contacts between proteins that occur by selective molecular docking in a particular biological context. We present this definition as opposed to other types of protein associations derived from regulatory, genetic, structural or functional relations. To determine PPIs, a variety of binary and co-complex methods exist; however, not all the technologies provide the same information and data quality. One way of increasing confidence in a given protein interaction is to integrate orthogonal experimental evidence. Testing each single interaction with several complementary methods assesses the accuracy of PPI data and helps minimize the occurrence of false interactions. Following this approach, there have been important efforts to unify primary databases of experimentally proven PPIs into integrated databases. These meta-databases provide a measure of the confidence of interactions based on the number of experimental proofs that report them. In conclusion, integrated information allows the building of more reliable interaction networks. Identification of communities, cliques, modules and hubs by analysing the topological parameters and graph properties of the protein networks allows the discovery of central/critical nodes, which are candidates to regulate cellular flux and dynamics.
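The integration strategy described above scores each interaction by the amount of independent evidence that supports it and then looks for hubs. The Python sketch below illustrates it under stated assumptions. Records are hypothetical (protein_a, protein_b, method) tuples pooled from several primary databases. Confidence is the number of distinct detection methods, which is one possible reading of "number of experimental proofs". A hub is simply a node whose degree reaches a chosen cut-off.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Record = Tuple[str, str, str]  # (protein_a, protein_b, detection_method), hypothetical format
Pair = FrozenSet[str]


def evidence_by_pair(records: Iterable[Record]) -> Dict[Pair, Set[str]]:
    """Collect the distinct detection methods supporting each undirected interaction."""
    support: Dict[Pair, Set[str]] = defaultdict(set)
    for a, b, method in records:
        if a != b:  # self-interactions are ignored in this sketch
            support[frozenset((a, b))].add(method)
    return dict(support)


def confident_edges(records: Iterable[Record], min_methods: int = 2) -> Set[Pair]:
    """Keep interactions confirmed by at least min_methods orthogonal methods."""
    return {pair for pair, methods in evidence_by_pair(records).items() if len(methods) >= min_methods}


def hubs(edges: Iterable[Pair], min_degree: int) -> List[str]:
    """Return proteins whose degree in the filtered network is at least min_degree."""
    degree: Dict[str, int] = defaultdict(int)
    for pair in edges:
        for protein in pair:
            degree[protein] += 1
    return sorted(p for p, d in degree.items() if d >= min_degree)


if __name__ == "__main__":
    pooled = [
        ("A", "B", "Y2H"), ("B", "A", "AP-MS"),
        ("A", "C", "Y2H"), ("A", "C", "co-IP"),
        ("A", "D", "AP-MS"), ("A", "D", "Y2H"),
        ("C", "D", "Y2H"),
    ]
    network = confident_edges(pooled)
    print(len(network), hubs(network, min_degree=3))  # 3 ['A']
```

Requiring two distinct methods discards the single-method C-D interaction. That is the trade-off the review describes: fewer false interactions at the cost of some coverage.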
DOE Office of Scientific and Technical Information (OSTI.GOV)

Prather, J. C.; Smith, S. K.; Watson, C. R.

The National Radiobiology Archives is a comprehensive effort to gather, organize, and catalog original data, representative specimens, and supporting materials related to significant radiobiology studies. This provides researchers with information for analyses which compare or combine results of these and other studies and with materials for analysis by advanced molecular biology techniques. This Programmer's Guide document describes the database access software, NRADEMO, and the subset loading script NRADEMO/MAINT/MAINTAIN, which comprise the National Laboratory Archives Distributed Access Package. The guide is intended for use by an experienced database management specialist. It contains information about the physical and logical organization of the software and data files. It also contains printouts of all the scripts and associated batch processing files. It is part of a suite of documents published by the National Radiobiology Archives.
PomBase: a comprehensive online resource for fission yeast

PubMed Central

Wood, Valerie; Harris, Midori A.; McDowall, Mark D.; Rutherford, Kim; Vaughan, Brendan W.; Staines, Daniel M.; Aslett, Martin; Lock, Antonia; Bähler, Jürg; Kersey, Paul J.; Oliver, Stephen G.

2012-01-01

PomBase (www.pombase.org) is a new model organism database established to provide access to comprehensive, accurate, and up-to-date molecular data and biological information for the fission yeast Schizosaccharomyces pombe to effectively support both exploratory and hypothesis-driven research. PomBase encompasses annotation of genomic sequence and features, comprehensive manual literature curation and genome-wide data sets, and supports sophisticated user-defined queries. The implementation of PomBase integrates a Chado relational database that houses manually curated data with Ensembl software that supports sequence-based annotation and web access. PomBase will provide user-friendly tools to promote curation by experts within the fission yeast community. This will make a key contribution to shaping its content and ensuring its comprehensiveness and long-term relevance. PMID:22039153
Pharmacophore-based virtual screening, molecular docking, molecular dynamics simulation, and biological evaluation for the discovery of novel BRD4 inhibitors.

PubMed

Yan, Guoyi; Hou, Manzhou; Luo, Jiang; Pu, Chunlan; Hou, Xueyan; Lan, Suke; Li, Rui

2018-02-01

Bromodomain is a recognition module in the signal transduction of acetylated histones. BRD4, one of the bromodomain members, is emerging as an attractive therapeutic target for several types of cancer. Therefore, in this study, an attempt has been made to screen compounds from an integrated database containing 5.5 million compounds for BRD4 inhibitors using pharmacophore-based virtual screening, molecular docking, and molecular dynamics simulations. As a result, two of the twelve hits were found to be active in bioactivity tests. Among them, compound 5 exhibited potent anticancer activity, with IC50 values of 4.2, 7.1, and 11.6 μM against the human cancer cell lines MV4-11, A375, and HeLa, respectively. After that, colony formation assay, cell cycle, apoptosis analysis, wound-healing migration assay, and Western blotting were carried out to characterize the bioactivity of compound 5. © 2017 John Wiley & Sons A/S.

Experimental medical mycological research in Latin America - a 2000-2009 overview.

PubMed

San-Blas, Gioconda; Burger, Eva

2011-01-01

An overview of current trends in Latin American experimental medical mycological research since the beginning of the 21st century is presented (search from January 2000 to December 2009). Using the PubMed and LILACS databases, the authors have chosen publications on medically important fungi which, in our opinion, are the most relevant because of their novelty, interest, and international impact, based on research made entirely in the Latin American region or as part of collaborative efforts with laboratories elsewhere. In this way, the following areas are discussed: 1) molecular identification of fungal pathogens; 2) molecular and clinical epidemiology of fungal pathogens prevalent in the region; 3) cell biology; 4) transcriptome, genome, molecular taxonomy and phylogeny; 5) immunology; 6) vaccines; 7) new and experimental antifungals. Copyright © 2010 Revista Iberoamericana de Micología. Published by Elsevier España. All rights reserved.
CRISPR-Cas in Medicinal Chemistry: Applications and Regulatory Concerns.

PubMed

Duardo-Sanchez, Aliuska

2017-01-01

A rapid search of scientific publication databases shows how far the use of the CRISPR-Cas genome-editing technique has expanded, and how important it has become, in modern molecular biology. In PubMed alone, a search for the term gives more than 3000 results. Specifically, in drug discovery, medicinal chemistry and chemical biology in general, the CRISPR method may have multiple applications. Some of these applications are: resistance-selection studies of antimalarial lead organic compounds; investigation of druggability; development of animal models for chemical compounds testing, etc. In this paper, we offer a review of the most relevant scientific literature illustrated with specific examples of application of the CRISPR technique to medicinal chemistry and chemical biology. We also present a general overview of the main legal and ethical trends regarding this method of genome editing. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Cell illustrator 4.0: a computational platform for systems biology.

PubMed

Nagasaki, Masao; Saito, Ayumu; Jeong, Euna; Li, Chen; Kojima, Kaname; Ikeda, Emi; Miyano, Satoru

2011-01-01

Cell Illustrator is a software platform for Systems Biology that uses the concept of Petri nets for modeling and simulating biopathways. It is intended for biological scientists working at the bench. The latest version, Cell Illustrator 4.0, uses Java Web Start technology and is enhanced with new capabilities, including: automatic graph grid layout algorithms using ontology information; tools using Cell System Markup Language (CSML) 3.0 and Cell System Ontology 3.0; parameter search module; high-performance simulation module; CSML database management system; conversion from CSML model to programming languages (FORTRAN, C, C++, Java, Python and Perl); import from SBML, CellML, and BioPAX; and export to SVG and HTML. Cell Illustrator employs an extension of hybrid Petri nets in an object-oriented style so that biopathway models can include objects such as DNA sequence, molecular density, 3D localization information, transcription with frame-shift, translation with codon table, as well as biochemical reactions.
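Cell Illustrator's hybrid Petri nets combine discrete and continuous elements, which is beyond the scope of a short example. The basic firing rule can still be shown with a minimal discrete Petri net in Python. A transition is enabled when every input place holds at least as many tokens as its arc weight. Firing the transition consumes tokens from the input places and produces tokens in the output places. The class, the place names and the toy translation step are hypothetical. They are not Cell Illustrator's API or CSML.

```python
from typing import Dict, Tuple

Arcs = Dict[str, int]  # place name -> arc weight


class PetriNet:
    """Minimal discrete place/transition net (no continuous or hybrid elements)."""

    def __init__(self, marking: Dict[str, int]) -> None:
        self.marking: Dict[str, int] = dict(marking)
        self.transitions: Dict[str, Tuple[Arcs, Arcs]] = {}

    def add_transition(self, name: str, inputs: Arcs, outputs: Arcs) -> None:
        self.transitions[name] = (inputs, outputs)

    def enabled(self, name: str) -> bool:
        inputs, _ = self.transitions[name]
        return all(self.marking.get(place, 0) >= w for place, w in inputs.items())

    def fire(self, name: str) -> None:
        if not self.enabled(name):
            raise RuntimeError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for place, w in inputs.items():
            self.marking[place] -= w  # consume input tokens
        for place, w in outputs.items():
            self.marking[place] = self.marking.get(place, 0) + w  # produce output tokens


if __name__ == "__main__":
    # Toy biopathway step: one mRNA plus three amino acids yield one protein; the mRNA is returned.
    net = PetriNet({"mRNA": 1, "amino_acid": 3})
    net.add_transition("translate", {"mRNA": 1, "amino_acid": 3}, {"mRNA": 1, "protein": 1})
    net.fire("translate")
    print(net.marking)  # {'mRNA': 1, 'amino_acid': 0, 'protein': 1}
```

In a hybrid net, places such as molecular concentrations would hold real-valued quantities, and transitions would fire at rates rather than in discrete steps.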
Opportunities and challenges for digital morphology

PubMed Central

2010-01-01

Advances in digital data acquisition, analysis, and storage have revolutionized the work in many biological disciplines such as genomics, molecular phylogenetics, and structural biology, but have not yet found satisfactory acceptance in morphology. Improvements in non-invasive imaging and three-dimensional visualization techniques, however, permit high-throughput analyses of whole biological specimens as well, including museum material. These developments pave the way towards a digital era in morphology. Using sea urchins (Echinodermata: Echinoidea), we provide examples illustrating the power of these techniques. However, remote visualization, the creation of a specialized database, and the implementation of standardized, world-wide accepted data deposition practices prior to publication are essential to cope with the foreseeable exponential increase in digital morphological data. Reviewers: This article was reviewed by Marc D. Sutton (nominated by Stephan Beck), Gonzalo Giribet (nominated by Lutz Walter), and Lennart Olsson (nominated by Purificación López-García). PMID:20604956
Cell Illustrator 4.0: a computational platform for systems biology.

PubMed

Nagasaki, Masao; Saito, Ayumu; Jeong, Euna; Li, Chen; Kojima, Kaname; Ikeda, Emi; Miyano, Satoru

2010-01-01

Cell Illustrator is a software platform for Systems Biology that uses the concept of Petri nets for modeling and simulating biopathways. It is intended for biological scientists working at the bench. The latest version, Cell Illustrator 4.0, uses Java Web Start technology and is enhanced with new capabilities, including: automatic graph grid layout algorithms using ontology information; tools using Cell System Markup Language (CSML) 3.0 and Cell System Ontology 3.0; parameter search module; high-performance simulation module; CSML database management system; conversion from CSML model to programming languages (FORTRAN, C, C++, Java, Python and Perl); import from SBML, CellML, and BioPAX; and export to SVG and HTML. Cell Illustrator employs an extension of hybrid Petri nets in an object-oriented style so that biopathway models can include objects such as DNA sequence, molecular density, 3D localization information, transcription with frame-shift, translation with codon table, as well as biochemical reactions.
An integrated database-pipeline system for studying single nucleotide polymorphisms and diseases.

PubMed

Yang, Jin Ok; Hwang, Sohyun; Oh, Jeongsu; Bhak, Jong; Sohn, Tae-Kwon

2008-12-12

Studies on the relationship between disease and genetic variations such as single nucleotide polymorphisms (SNPs) are important. Genetic variations can cause disease by influencing important biological regulation processes. Despite the need to analyze correlations between SNPs and disease, most existing databases provide information only on functional variants at specific locations on the genome, or deal with only a few genes associated with disease. There is no combined resource to widely support gene-, SNP-, and disease-related information, and to capture relationships among such data. Therefore, we developed an integrated database-pipeline system for studying SNPs and diseases. To implement the pipeline system for the integrated database, we first unified complicated and redundant disease terms and gene names using the Unified Medical Language System (UMLS) for classification and noun modification, and the HUGO Gene Nomenclature Committee (HGNC) and NCBI gene databases. Next, we collected and integrated representative databases for three categories of information. For genes and proteins, we examined the NCBI mRNA, UniProt, UCSC Table Track and MitoDat databases. For genetic variants, we used the dbSNP, JSNP, ALFRED, and HGVbase databases. For disease, we employed the OMIM, GAD, and HGMD databases. The database-pipeline system provides a disease thesaurus, including genes and SNPs associated with disease. The search results for these categories are available on the web page http://diseasome.kobic.re.kr/, and a genome browser is also available to highlight findings, as well as to permit the convenient review of potentially deleterious SNPs among genes strongly associated with specific diseases and clinical phenotypes. Our system is designed to capture the relationships between SNPs associated with disease and disease-causing genes.
The integrated database-pipeline provides a list of candidate genes and SNP markers for evaluation in both epidemiological and molecular biological approaches to disease-gene association studies. Furthermore, researchers can then semi-automatically select the data set for association studies while considering the relationships between genetic variation and diseases. The database can also make disease-association studies more economical, as well as facilitate an understanding of the processes which cause disease. Currently, the database contains 14,674 SNP records and 109,715 gene records associated with human diseases and it is updated at regular intervals.
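The first step of the pipeline unifies redundant disease terms and gene names against reference vocabularies such as UMLS and HGNC. In practice, this means mapping every synonym onto one preferred name. The minimal Python sketch below assumes a small hand-made thesaurus. The entries, the function names and the case-insensitive matching rule are illustrative choices. Real UMLS or HGNC files have their own formats, which this sketch does not model.

```python
from typing import Dict, Iterable, List, Tuple


def build_alias_index(thesaurus: Dict[str, Iterable[str]]) -> Dict[str, str]:
    """Map each preferred name and synonym (case-insensitively) to the preferred name."""
    index: Dict[str, str] = {}
    for preferred, synonyms in thesaurus.items():
        for name in [preferred, *synonyms]:
            key = name.strip().lower()
            if index.get(key, preferred) != preferred:
                raise ValueError(f"{name!r} maps to more than one preferred term")
            index[key] = preferred
    return index


def normalize(names: Iterable[str], index: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (normalized names, names that could not be mapped)."""
    mapped: List[str] = []
    unmapped: List[str] = []
    for name in names:
        key = name.strip().lower()
        if key in index:
            mapped.append(index[key])
        else:
            unmapped.append(name)
    return mapped, unmapped


if __name__ == "__main__":
    # Hypothetical mini-thesaurus for illustration only.
    idx = build_alias_index({"Type 2 diabetes mellitus": ["T2DM", "NIDDM"]})
    print(normalize(["t2dm", "NIDDM ", "asthma"], idx))
```

The sketch rejects ambiguous synonyms rather than guessing, and it reports unmapped names separately so that they can be curated manually.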
bioDBnet - Biological Database Network

Cancer.gov

bioDBnet is a comprehensive resource of most of the biological databases available from different sites like NCBI, UniProt, EMBL, Ensembl and Affymetrix. It provides a queryable interface to all the databases available, converts identifiers from one database into another and generates comprehensive reports.
Database resources of the National Center for Biotechnology Information.

PubMed

Sayers, Eric W; Barrett, Tanya; Benson, Dennis A; Bolton, Evan; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; DiCuccio, Michael; Federhen, Scott; Feolo, Michael; Fingerman, Ian M; Geer, Lewis Y; Helmberg, Wolfgang; Kapustin, Yuri; Landsman, David; Lipman, David J; Lu, Zhiyong; Madden, Thomas L; Madej, Tom; Maglott, Donna R; Marchler-Bauer, Aron; Miller, Vadim; Mizrachi, Ilene; Ostell, James; Panchenko, Anna; Phan, Lon; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Stephen T; Shumway, Martin; Sirotkin, Karl; Slotta, Douglas; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A; Wagner, Lukas; Wang, Yanli; Wilbur, W John; Yaschenko, Eugene; Ye, Jian

2011-01-01

In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Electronic PCR, OrfFinder, Splign, ProSplign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), IBIS, Biosystems, Peptidome, OMSSA, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
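The resources listed above include the Entrez Programming Utilities (E-utilities), which provide Entrez searches and record retrieval over HTTP. The minimal Python sketch below only builds ESearch and EFetch request URLs with the standard library. The search term and identifiers are placeholders. Sending the requests (for example with urllib.request.urlopen) and complying with NCBI's usage guidelines are outside the scope of this sketch.

```python
from typing import Iterable
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """URL for an ESearch query returning matching record IDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def efetch_url(db: str, ids: Iterable[str], rettype: str = "fasta") -> str:
    """URL for an EFetch request retrieving the given records as plain text."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": "text"}
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    print(esearch_url("nucleotide", "Squalus acanthias[Organism]", retmax=5))
    print(efetch_url("nucleotide", ["ID1", "ID2"]))  # placeholder IDs
```

A typical workflow runs ESearch first to obtain record IDs and then passes those IDs to EFetch.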
Molecular and comparative genetics of mental retardation.

PubMed Central

Inlow, Jennifer K; Restifo, Linda L

2004-01-01

Affecting 1-3% of the population, mental retardation (MR) poses significant challenges for clinicians and scientists. Understanding the biology of MR is complicated by the extraordinary heterogeneity of genetic MR disorders. Detailed analyses of >1000 Online Mendelian Inheritance in Man (OMIM) database entries and literature searches through September 2003 revealed 282 molecularly identified MR genes. We estimate that hundreds more MR genes remain to be identified. A novel test, in which we distributed unmapped MR disorders proportionately across the autosomes, failed to eliminate the well-known X-chromosome overrepresentation of MR genes and candidate genes. This evidence argues against ascertainment bias as the main cause of the skewed distribution. On the basis of a synthesis of clinical and laboratory data, we developed a biological functions classification scheme for MR genes. Metabolic pathways, signaling pathways, and transcription are the most common functions, but numerous other aspects of neuronal and glial biology are controlled by MR genes as well. Using protein sequence and domain-organization comparisons, we found a striking conservation of MR genes and genetic pathways across the approximately 700 million years that separate Homo sapiens and Drosophila melanogaster. Eighty-seven percent have one or more fruit fly homologs and 76% have at least one candidate functional ortholog. We propose that D. melanogaster can be used in a systematic manner to study MR and possibly to develop bioassays for therapeutic drug discovery. We selected 42 Drosophila orthologs as most likely to reveal molecular and cellular mechanisms of nervous system development or plasticity relevant to MR. PMID:15020472
Multi-tissue RNA-seq and transcriptome characterisation of the spiny dogfish shark (Squalus acanthias) provides a molecular tool for biological research and reveals new genes involved in osmoregulation.

PubMed

Chana-Munoz, Andres; Jendroszek, Agnieszka; Sønnichsen, Malene; Kristiansen, Rune; Jensen, Jan K; Andreasen, Peter A; Bendixen, Christian; Panitz, Frank

2017-01-01

The spiny dogfish shark (Squalus acanthias) is one of the most commonly used cartilaginous fishes in biological research, especially in the fields of nitrogen metabolism, ion transporters and osmoregulation. Nonetheless, transcriptomic data for this organism are scarce. In the present study, a multi-tissue RNA-seq experiment and de novo transcriptome assembly were performed in four different spiny dogfish tissues (brain, liver, kidney and ovary), providing an annotated sequence resource. The characterization of the transcriptome greatly increases the scarce sequence information for shark species. Reads were assembled with the Trinity de novo assembler both within each tissue and across all tissues combined, resulting in 362,690 transcripts in the combined assembly, which represent 289,515 Trinity genes. BUSCO analysis determined a level of 87% completeness for the combined transcriptome. In total, 123,110 proteins were predicted, of which 78,679 and 83,164 had significant hits against the SwissProt and UniRef90 protein databases, respectively. Additionally, 61,215 proteins aligned to known protein domains, 7,208 carried a signal peptide and 15,971 possessed at least one transmembrane region. Based on the annotation, 81,582 transcripts were assigned to gene ontology terms and 42,078 belong to known clusters of orthologous groups (eggNOG). To demonstrate the value of our molecular resource, we show that the improved transcriptome data enhance the current possibilities of osmoregulation research in spiny dogfish by utilizing the novel gene and protein annotations to investigate a set of genes involved in urea synthesis and urea, ammonia and water transport, all of them crucial in osmoregulation. We describe the presence of different gene copies and isoforms of key enzymes involved in this process, including arginases and transporters of urea and ammonia, for which sequence information is currently absent in the databases for this model species.
The transcriptome assemblies and the derived annotations generated in this study will support the ongoing research for this particular animal model and provide a new molecular tool to assist biological research in cartilaginous fishes.
Multi-tissue RNA-seq and transcriptome characterisation of the spiny dogfish shark (Squalus acanthias) provides a molecular tool for biological research and reveals new genes involved in osmoregulation

PubMed Central

Chana-Munoz, Andres; Jendroszek, Agnieszka; Sønnichsen, Malene; Kristiansen, Rune; Jensen, Jan K.; Bendixen, Christian

2017-01-01

The spiny dogfish shark (Squalus acanthias) is one of the most commonly used cartilaginous fishes in biological research, especially in the fields of nitrogen metabolism, ion transporters and osmoregulation. Nonetheless, transcriptomic data for this organism are scarce. In the present study, a multi-tissue RNA-seq experiment and de novo transcriptome assembly were performed in four different spiny dogfish tissues (brain, liver, kidney and ovary), providing an annotated sequence resource. The characterization of the transcriptome greatly increases the scarce sequence information for shark species. Reads were assembled with the Trinity de novo assembler both within each tissue and across all tissues combined, resulting in 362,690 transcripts in the combined assembly, which represent 289,515 Trinity genes. BUSCO analysis determined a level of 87% completeness for the combined transcriptome. In total, 123,110 proteins were predicted, of which 78,679 and 83,164 had significant hits against the SwissProt and UniRef90 protein databases, respectively. Additionally, 61,215 proteins aligned to known protein domains, 7,208 carried a signal peptide and 15,971 possessed at least one transmembrane region. Based on the annotation, 81,582 transcripts were assigned to gene ontology terms and 42,078 belong to known clusters of orthologous groups (eggNOG). To demonstrate the value of our molecular resource, we show that the improved transcriptome data enhance the current possibilities of osmoregulation research in spiny dogfish by utilizing the novel gene and protein annotations to investigate a set of genes involved in urea synthesis and urea, ammonia and water transport, all of them crucial in osmoregulation. We describe the presence of different gene copies and isoforms of key enzymes involved in this process, including arginases and transporters of urea and ammonia, for which sequence information is currently absent in the databases for this model species.
The transcriptome assemblies and the derived annotations generated in this study will support the ongoing research for this particular animal model and provide a new molecular tool to assist biological research in cartilaginous fishes. PMID:28832628
The Importance of Biological Databases in Biological Discovery.

PubMed

Baxevanis, Andreas D; Bateman, Alex

2015-06-19

Biological databases play a central role in bioinformatics. They offer scientists the opportunity to access a wide variety of biologically relevant data, including the genomic sequences of an increasingly broad range of organisms. This unit provides a brief overview of major sequence databases and portals, such as GenBank, the UCSC Genome Browser, and Ensembl. Model organism databases, including WormBase, The Arabidopsis Information Resource (TAIR), and those made available through the Mouse Genome Informatics (MGI) resource, are also covered. Non-sequence-centric databases, such as Online Mendelian Inheritance in Man (OMIM), the Protein Data Bank (PDB), MetaCyc, and the Kyoto Encyclopedia of Genes and Genomes (KEGG), are also discussed. Copyright © 2015 John Wiley & Sons, Inc.
SPARQLGraph: a web-based platform for graphically querying biological Semantic Web databases.

PubMed

Schweiger, Dominik; Trajanoski, Zlatko; Pabinger, Stephan

2014-08-15

Semantic Web has established itself as a framework for using and sharing data across applications and database boundaries. Here, we present a web-based platform for querying biological Semantic Web databases in a graphical way. SPARQLGraph offers an intuitive drag & drop query builder, which converts the visual graph into a query and executes it on a public endpoint. The tool integrates several publicly available Semantic Web databases, including the databases of the recently released EBI RDF platform. Furthermore, it provides several predefined template queries for answering biological questions. Users can easily create and save new query graphs, which can also be shared with other researchers. This new graphical way of creating queries for biological Semantic Web databases considerably improves usability, as it removes the requirement of knowing specific query languages and database structures. The system is freely available at http://sparqlgraph.i-med.ac.at.
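SPARQLGraph turns a drag-and-drop graph into a query. The core idea is that each edge of the drawn graph becomes one triple pattern, and the Python sketch below illustrates it. This is not SPARQLGraph's implementation. The ex: prefix, the predicate names and the example IRI are hypothetical placeholders, not terms from the EBI RDF platform.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object); terms starting with "?" are variables


def graph_to_sparql(edges: List[Edge], prefixes: Dict[str, str], limit: int = 100) -> str:
    """Translate graph edges into a SPARQL SELECT query over all variables used."""
    variables = sorted({term for edge in edges for term in edge if term.startswith("?")})
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()]
    lines.append(f"SELECT {' '.join(variables) or '*'} WHERE {{")
    lines.extend(f"  {s} {p} {o} ." for s, p, o in edges)
    lines.append("}")
    lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


if __name__ == "__main__":
    query = graph_to_sparql(
        [("?protein", "ex:encodedBy", "?gene"), ("?gene", "ex:locatedOn", "ex:chromosome1")],
        {"ex": "http://example.org/"},
        limit=10,
    )
    print(query)
```

Shared variables such as ?gene join the two patterns, which is how connected nodes in the drawn graph constrain each other in the generated query.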
Liverome: a curated database of liver cancer-related gene signatures with self-contained context information.

PubMed

Lee, Langho; Wang, Kai; Li, Gang; Xie, Zhi; Wang, Yuli; Xu, Jiangchun; Sun, Shaoxian; Pocalyko, David; Bhak, Jong; Kim, Chulhong; Lee, Kee-Ho; Jang, Ye Jin; Yeom, Young Il; Yoo, Hyang-Sook; Hwang, Seungwoo

2011-11-30

Hepatocellular carcinoma (HCC) is the fifth most common cancer worldwide. A number of molecular profiling studies have investigated the changes in gene and protein expression that are associated with various clinicopathological characteristics of HCC and have generated a wealth of scattered information, usually in the form of gene signature tables. A database of published HCC gene signatures would be useful to liver cancer researchers seeking to retrieve existing differential expression information on a candidate gene and to compare signatures in order to prioritize common genes. A challenge in constructing such a database is that importing the signatures directly as they appear in articles would lose or obscure their context information, which is essential for correct biological interpretation of a gene's expression change. This challenge arises because the designation of the compared sample groups is most often abbreviated, ad hoc, or even missing from published signature tables. Without manual curation, the context information is lost, leaving the database contents uninformative. Although several databases of gene signatures are available, none of them stores signatures in an informative form or offers comprehensive coverage of liver cancer. We therefore constructed Liverome, a curated database of liver cancer-related gene signatures with self-contained context information. Liverome's data coverage is more than three times larger than that of any other signature database, consisting of 143 signatures taken from 98 HCC studies, mostly microarray and proteome, and involving 6,927 genes. The signatures were post-processed into an informative and uniform representation and annotated with an itemized summary so that all context information is unambiguously self-contained within the database. The signatures were further given informative names and organized into ten functional categories for guided browsing.
Its web interface enables straightforward retrieval of known differential expression information on a query gene and comparison of signatures to prioritize common genes. The utility of the data collected in Liverome is shown by case studies that yield useful biological insights into HCC. The Liverome database provides a comprehensive collection of well-curated HCC gene signatures, together with straightforward interfaces for gene search and signature comparison. Liverome is available at http://liverome.kobic.re.kr.
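One simple way to "compare signatures to prioritize common genes" is to count how many signatures contain each gene and then rank genes by that count. The sketch below does this in Python. It is an illustration under that assumption, not Liverome's actual ranking method, and the signature names and gene lists are toy data, not Liverome entries.

```python
from collections import Counter


def prioritize_common_genes(signatures, min_signatures=2):
    """Rank genes by how many signatures contain them (ties broken alphabetically)."""
    counts = Counter()
    for genes in signatures.values():
        counts.update(set(genes))  # a gene counts at most once per signature
    ranked = [(gene, n) for gene, n in counts.items() if n >= min_signatures]
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Toy signatures for illustration only; these are not Liverome entries.
signatures = {
    "signature_A": ["GPC3", "AFP", "TOP2A"],
    "signature_B": ["GPC3", "TOP2A", "SPP1"],
    "signature_C": ["GPC3", "AFP"],
}
print(prioritize_common_genes(signatures))  # [('GPC3', 3), ('AFP', 2), ('TOP2A', 2)]
```

A real comparison would also need the curated context (up- vs down-regulation and the compared sample groups) that the abstract stresses. Otherwise, genes changing in opposite directions would be counted together.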
ISCB Ebola Award for Important Future Research on the Computational Biology of Ebola Virus

PubMed Central

Karp, Peter D.; Berger, Bonnie; Kovats, Diane; Lengauer, Thomas; Linial, Michal; Sabeti, Pardis; Hide, Winston; Rost, Burkhard

2015-01-01

Speed is of the essence in combating Ebola; thus, computational approaches should form a significant component of Ebola research. As with the development of any modern drug, computational biology is uniquely positioned to contribute through comparative analysis of the genome sequences of Ebola strains as well as 3-D protein modeling. Other computational approaches to Ebola may include large-scale docking studies of Ebola proteins with human proteins and with small-molecule libraries, computational modeling of the spread of the virus, computational mining of the Ebola literature, and creation of a curated Ebola database. Taken together, such computational efforts could significantly accelerate traditional scientific approaches. In recognition of the need for important and immediate solutions from the field of computational biology against Ebola, the International Society for Computational Biology (ISCB) announces a prize for an important computational advance in fighting the Ebola virus. ISCB will confer the ISCB Fight against Ebola Award, along with a prize of US$2,000, at its July 2016 annual meeting (ISCB Intelligent Systems for Molecular Biology (ISMB) 2016, Orlando, Florida). PMID:26097686
ISCB Ebola Award for Important Future Research on the Computational Biology of Ebola Virus.

PubMed

Karp, Peter D; Berger, Bonnie; Kovats, Diane; Lengauer, Thomas; Linial, Michal; Sabeti, Pardis; Hide, Winston; Rost, Burkhard

2015-01-01

Speed is of the essence in combating Ebola; thus, computational approaches should form a significant component of Ebola research. As with the development of any modern drug, computational biology is uniquely positioned to contribute through comparative analysis of the genome sequences of Ebola strains as well as 3-D protein modeling. Other computational approaches to Ebola may include large-scale docking studies of Ebola proteins with human proteins and with small-molecule libraries, computational modeling of the spread of the virus, computational mining of the Ebola literature, and creation of a curated Ebola database. Taken together, such computational efforts could significantly accelerate traditional scientific approaches. In recognition of the need for important and immediate solutions from the field of computational biology against Ebola, the International Society for Computational Biology (ISCB) announces a prize for an important computational advance in fighting the Ebola virus. ISCB will confer the ISCB Fight against Ebola Award, along with a prize of US$2,000, at its July 2016 annual meeting (ISCB Intelligent Systems for Molecular Biology (ISMB) 2016, Orlando, Florida).
Genetics and Forensics: Making the National DNA Database

PubMed Central

Johnson, Paul; Williams, Robin; Martin, Paul

2005-01-01

This paper is based on a current study of the growing police use of the epistemic authority of molecular biology for the identification of criminal suspects in support of crime investigation. It discusses the development of DNA profiling and the establishment and development of the UK National DNA Database (NDNAD) as an instance of the ‘scientification of police work’ (Ericson and Shearing 1986), in which police uses of science and technology have a recursive effect on their future development. The NDNAD, owned by the Association of Chief Police Officers of England and Wales, is the first of its kind in the world and currently contains the genetic profiles of more than 2 million people. The paper provides a framework for the examination of this socio-technical innovation, begins to tease out the dense and compact history of the database and accounts for the way in which changes and developments across disparate scientific, governmental and policing contexts have all contributed to the range of uses to which it is put. PMID:16467921
MSDB: A Comprehensive Database of Simple Sequence Repeats.

PubMed

Avvaru, Akshay Kumar; Saxena, Saketh; Sowpati, Divya Tej; Mishra, Rakesh Kumar

2017-06-01

Microsatellites, also known as Simple Sequence Repeats (SSRs), are short tandem repeats of 1-6 nt motifs present in all genomes, particularly eukaryotes. Besides their usefulness as genome markers, SSRs have been shown to perform important regulatory functions, and variations in their length at coding regions are linked to several disorders in humans. Microsatellites show a taxon-specific enrichment in eukaryotic genomes, and some may be functional. MSDB (Microsatellite Database) is a collection of >650 million SSRs from 6,893 species including Bacteria, Archaea, Fungi, Plants, and Animals. This database is by far the most exhaustive resource to access and analyze SSR data of multiple species. In addition to exploring data in a customizable tabular format, users can view and compare the data of multiple species simultaneously using our interactive plotting system. MSDB is developed using the Django framework and MySQL. It is freely available at http://tdb.ccmb.res.in/msdb. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
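The abstract does not say how MSDB detects SSRs. To show what "tandem repeats of 1-6 nt motifs" means computationally, here is a minimal Python finder for perfect repeats that uses regular-expression backreferences. The 12 bp minimum length, the restriction to uppercase A/C/G/T and the names `find_ssrs` / `is_primitive` are assumptions for this sketch, not MSDB's criteria. Imperfect (interrupted) repeats are not handled.

```python
import re


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ACAC' is not)."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq, max_motif=6, min_length=12):
    """Return (start, end, motif, copies) for perfect tandem repeats of 1..max_motif nt."""
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif + 1):
        # A k-nt unit followed by one or more exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1+" % k)
        for m in pattern.finditer(seq):
            motif, span = m.group(1), m.group(0)
            # Skip non-primitive motifs so an 'AC' repeat is not re-reported as 'ACAC'.
            if is_primitive(motif) and len(span) >= min_length:
                hits.append((m.start(), m.end(), motif, len(span) // k))
    hits.sort()
    return hits


example = "GG" + "AC" * 8 + "TTT" + "CAG" * 5 + "G"
print(find_ssrs(example))  # [(2, 18, 'AC', 8), (21, 36, 'CAG', 5)]
```

The primitivity check prevents a dinucleotide run from being reported again as a tetranucleotide or hexanucleotide repeat.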
Genome sequence analysis of a flocculant-producing bacterium, Paenibacillus shenyangensis.

PubMed

Fu, Lili; Jiang, Binhui; Liu, Jinliang; Zhao, Xin; Liu, Qian; Hu, Xiaomin

2016-03-01

This study aimed to explore the metabolic processes of Paenibacillus shenyangensis, an efficient bioflocculant-producing bacterium. The genome of Paenibacillus shenyangensis was analysed to investigate the biosynthesis mechanism of bioflocculation and to provide a basis for molecular genetics and functional genomics analyses. In the de novo assembly, a total of 5,501,467 bp of clean reads were generated and assembled into 92 contigs. Of 4800 predicted unigenes, 4393 were annotated with a specific gene function in the NCBI-Nr database, and 3423 genes were found in the Clusters of Orthologous Groups (COG) database. Among the 168 pathways assigned in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, cell growth and metabolism were the main biological processes, and a potential metabolic pathway from glucose to exopolysaccharide was predicted within the starch and sucrose metabolism pathway. Using high-throughput sequencing, we provide a genome analysis of Paenibacillus shenyangensis that predicts its main metabolic processes and a potential pathway of exopolysaccharide biosynthesis.
(abstract) Modeling Protein Families and Human Genes: Hidden Markov Models and a Little Beyond

NASA Technical Reports Server (NTRS)

Baldi, Pierre

1994-01-01

We will first give a brief overview of Hidden Markov Models (HMMs) and their use in Computational Molecular Biology. In particular, we will describe a detailed application of HMMs to the G-Protein-Coupled-Receptor Superfamily. We will also describe a number of analytical results on HMMs that can be used in discrimination tests and database mining. We will then discuss the limitations of HMMs and some new directions of research. We will conclude with some recent results on the application of HMMs to human gene modeling and parsing.
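The central computation behind using an HMM to score a sequence (for discrimination tests or database mining) is the forward algorithm. This algorithm sums the probability of the observed sequence over all hidden state paths. The sketch below is a generic log-space implementation in Python, not the specific G-protein-coupled-receptor model from the talk. The two-state toy model and its parameters are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def safe_log(p):
    # log(0) is treated as -infinity so impossible transitions simply drop out.
    return math.log(p) if p > 0 else -math.inf


def forward_log_likelihood(obs, states, start_p, trans_p, emit_p):
    """Forward algorithm: log P(obs | HMM), summed over all hidden state paths."""
    alpha = {s: safe_log(start_p[s]) + safe_log(emit_p[s][obs[0]]) for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: logsumexp([alpha[r] + safe_log(trans_p[r][s]) for r in states])
            + safe_log(emit_p[s][symbol])
            for s in states
        }
    return logsumexp(list(alpha.values()))


# Toy two-state model (hypothetical parameters): G/C-rich state "H", uniform state "L".
states = ("H", "L")
start_p = {"H": 0.5, "L": 0.5}
trans_p = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.1, "L": 0.9}}
emit_p = {
    "H": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "L": {b: 0.25 for b in "ACGT"},
}
print(forward_log_likelihood("GCGCAT", states, start_p, trans_p, emit_p))
```

Working in log space avoids numerical underflow on long sequences. Subtracting the log-likelihood under a background (null) model gives the log-odds score that is commonly used to decide whether a sequence belongs to the modelled family.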

A Systems Biology Strategy to Identify Molecular Mechanisms of Action and Protein Indicators of Traumatic Brain Injury

DTIC Science & Technology

2014-11-14

Xueping Yu, Bhaskar Dutta, Jacob D. Feala, Kara Schmid, Jitendra Dave, Gregory J. Tawa, Anders Wallqvist, and Jaques Reifman ... pathway.html), downloaded in December, 2011. KEGG, one of the largest and most widely used publicly available pathway databases, annotates pathways ... Ansari MA, Roberts KN, Scheff SW. 2008b. A time course of contusion-induced oxidative stress and synaptic proteins in cortex in a rat model of TBI. J
Biological Databases for Human Research

PubMed Central

Zou, Dong; Ma, Lina; Yu, Jun; Zhang, Zhang

2015-01-01

The completion of the Human Genome Project lays a foundation for systematically studying the human genome from evolutionary history to precision medicine against diseases. With the explosive growth of biological data, there is an increasing number of biological databases that have been developed in aid of human-related research. Here we present a collection of human-related biological databases and provide a mini-review by classifying them into different categories according to their data types. As human-related databases continue to grow not only in count but also in volume, challenges are ahead in big data storage, processing, exchange and curation. PMID:25712261
Identifying relevant data for a biological database: handcrafted rules versus machine learning.

PubMed

Sehgal, Aditya Kumar; Das, Sanmay; Noto, Keith; Saier, Milton H; Elkan, Charles

2011-01-01

With well over 1,000 specialized biological databases in use today, the task of automatically identifying novel, relevant data for such databases is increasingly important. In this paper, we describe practical machine learning approaches for identifying MEDLINE documents and Swiss-Prot/TrEMBL protein records, for incorporation into a specialized biological database of transport proteins named TCDB. We show that both learning approaches outperform rules created by hand by a human expert. As one of the first case studies involving two different approaches to updating a deployed database, both the methods compared and the results will be of interest to curators of many specialized databases.
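The abstract does not name the learning algorithms that were compared with the expert's rules. As a generic illustration of learning relevance from labelled examples rather than hand-writing rules, the sketch below uses a multinomial naive Bayes text classifier with add-one smoothing. This is a swapped-in technique, not necessarily one used for TCDB. The class name and the four training snippets are hypothetical toy data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesRelevance:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, doc, c):
        """Unnormalised log P(c | doc): log prior plus smoothed log word likelihoods."""
        score = math.log(self.doc_counts[c] / sum(self.doc_counts.values()))
        denom = self.totals[c] + len(self.vocab)
        for word in tokenize(doc):
            if word in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[c][word] + 1) / denom)
        return score

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_posterior(doc, c))


# Hypothetical training snippets, not MEDLINE records.
docs = [
    "ABC transporter mediates efflux of drugs across the membrane",
    "sodium symporter transports glucose across membrane",
    "crystal structure of a cytosolic kinase domain",
    "kinase phosphorylation regulates cell cycle",
]
labels = ["relevant", "relevant", "irrelevant", "irrelevant"]
model = NaiveBayesRelevance().fit(docs, labels)
print(model.predict("membrane transporter for efflux"))
```

Unlike a fixed keyword rule, the classifier's weights are re-estimated whenever curators add newly labelled examples. This is the practical advantage of learned approaches that the abstract points to.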
A systems biology pipeline identifies new immune and disease related molecular signatures and networks in human cells during microgravity exposure

NASA Astrophysics Data System (ADS)

Mukhopadhyay, Sayak; Saha, Rohini; Palanisamy, Anbarasi; Ghosh, Madhurima; Biswas, Anupriya; Roy, Saheli; Pal, Arijit; Sarkar, Kathakali; Bagh, Sangram

2016-05-01

Microgravity is a prominent health hazard for astronauts, yet we understand little about its effect at the molecular systems level. In this study, we have integrated a set of systems-biology tools and databases and have analysed more than 8000 molecular pathways on published global gene expression datasets of human cells in microgravity. Hundreds of new pathways have been identified with statistical confidence for each dataset, and despite differences in cell types and experiments, around 100 of the new pathways appear common across the datasets. They are related to reduced inflammation, autoimmunity, diabetes and asthma. We have identified downregulation of the NF-κB pathway via Notch1 signalling as a new pathway for reduced immunity in microgravity. Induction of a few cancer types, including liver cancer and leukaemia, and increased drug response to cancer in microgravity are also found. An increase in olfactory signal transduction is also identified. Genes are clustered based on their expression pattern, and mathematically stable clusters are identified. The network mapping of genes within a cluster indicates plausible functional connections in microgravity. This pipeline gives a new systems-level picture of human cells under microgravity, generates testable hypotheses and may help in estimating risk and developing medicine for space missions.
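The abstract does not state which statistics underlie "identified with statistical confidence". A common choice for pathway over-representation is a one-sided hypergeometric test. Because more than 8000 pathways are tested, this is usually followed by a Benjamini-Hochberg false-discovery-rate correction. The Python sketch below shows both under that assumption, and the function names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric: N background genes, K in the pathway,
    n differentially expressed genes, k of which fall in the pathway."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical numbers: 20,000 genes, 300 changed, 15 of them in a 100-gene pathway.
p = hypergeom_enrichment_p(k=15, n=300, K=100, N=20000)
print(p, benjamini_hochberg([p, 0.2, 0.04]))
```

The exact integer arithmetic in `math.comb` avoids the rounding problems that factorial-based formulas run into with genome-sized backgrounds.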
An Overview of the Evolution of Infrared Spectroscopy Applied to Bacterial Typing.

PubMed

Quintelas, Cristina; Ferreira, Eugénio C; Lopes, João A; Sousa, Clara

2018-01-01

The sustained emergence of newly declared bacterial species makes typing a continuous challenge for microbiologists. Molecular biology techniques play a very significant role in bacterial typing, but they are often laborious and time-consuming, and they may fail when dealing with very closely related species. In some situations, spectroscopic techniques offer a viable alternative to molecular methods, with advantages in analysis time and cost. Infrared spectroscopy and mass spectrometry are among the most exploited techniques in this context; infrared spectroscopy in particular has emerged as a very promising method, with multiple reported successful applications. This article presents a systematic review of infrared spectroscopy applications for bacterial typing, covering the fundamental aspects of infrared spectroscopy, a detailed literature review (across different taxonomic levels and bacterial species), the advantages and limitations of the technique relative to molecular biology methods, and a comparison with competing spectroscopic techniques such as MALDI-TOF MS, Raman, and intrinsic fluorescence. Infrared spectroscopy has high potential for bacterial typing at distinct taxonomic levels and is worthy of further development and systematization. The development of databases appears fundamental to establishing infrared spectroscopy as a viable method for bacterial typing. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Capturing cooperative interactions with the PSI-MI format

PubMed Central

Van Roey, Kim; Orchard, Sandra; Kerrien, Samuel; Dumousseau, Marine; Ricard-Blum, Sylvie; Hermjakob, Henning; Gibson, Toby J.

2013-01-01

The complex biological processes that control cellular function are mediated by intricate networks of molecular interactions. Accumulating evidence indicates that these interactions are often interdependent, thus acting cooperatively. Cooperative interactions are prevalent in and indispensable for reliable and robust control of cell regulation, as they underlie the conditional decision-making capability of large regulatory complexes. Despite an increased focus on experimental elucidation of the molecular details of cooperative binding events, as evidenced by their growing occurrence in the literature, they are currently lacking from the main bioinformatics resources. One of the contributing factors to this deficiency is the lack of a computer-readable standard representation and exchange format for cooperative interaction data. To tackle this shortcoming, we added functionality to the widely used PSI-MI interchange format for molecular interaction data by defining new controlled vocabulary terms that allow annotation of different aspects of cooperativity without making structural changes to the underlying XML schema. As a result, we are able to capture cooperative interaction data in a structured format that is backward compatible with PSI-MI–based data and applications. This will facilitate the storage, exchange and analysis of cooperative interaction data, which in turn will advance experimental research on this fundamental principle in biology. Database URL: http://psi-mi-cooperativeinteractions.embl.de/ PMID:24067240
A course-based undergraduate research experience investigating p300 bromodomain mutations.

PubMed

Shanle, Erin K; Tsun, Ian K; Strahl, Brian D

2016-01-01

Course-based undergraduate research experiences (CUREs) provide an opportunity for students to engage in experiments with outcomes that are unknown to both the instructor and students. These experiences allow students and instructors to collaboratively bridge the research laboratory and classroom, and provide research experiences for a large number of students relative to traditional individual mentored research. Here, we describe a molecular biology CURE investigating the impact of clinically relevant mutations found in the bromodomain of the p300 transcriptional regulator on acetylated histone interaction. In the CURE, students identified missense mutations in the p300 bromodomain using the Catalogue of Somatic Mutations in Cancer (COSMIC) database and hypothesized the effects of the mutation on the acetyl-binding function of the domain. They cloned and purified the mutated bromodomain and performed peptide pulldown assays to define its potential to bind to acetylated histones. Upon completion of the course, students showed increased confidence performing molecular techniques and reported positively on doing a research project in class. In addition, results generated in the classroom were further validated in the research laboratory setting thereby providing a new model for faculty to engage in both course-based and individual undergraduate research experiences. © 2015 The International Union of Biochemistry and Molecular Biology.
Diverse data supports the transition of filamentous fungal model organisms into the post-genomics era

DOE PAGES

McCluskey, Kevin; Baker, Scott E.

2017-02-17

As model organisms, filamentous fungi have been important since the beginning of modern biological inquiry and have benefitted from open data since the earliest genetic maps were shared. From early origins in the simple Mendelian genetics of mating types, the parasexual genetics of colony colour, and the foundational demonstration of the segregation of a nutritional requirement, research systems utilising filamentous fungi have contributed throughout the biochemical genetics era and the molecular genetics era, and they now lie at the very foundation of diverse omics approaches to research and development. Fungal model organisms have come from most major taxonomic groups, although Ascomycete filamentous fungi have seen the most sustained effort. In addition to the published material about filamentous fungi, shared molecular tools have found application in every area of fungal biology. Likewise, shared data has contributed to the success of model systems. Furthermore, the scale of data supporting research with filamentous fungi has grown by 10 to 12 orders of magnitude. From genetic to molecular maps, expression databases, and finally genome resources, the open and collaborative nature of the research communities has ensured that the rising tide of data has lifted all of the research systems together.
Diverse data supports the transition of filamentous fungal model organisms into the post-genomics era

DOE Office of Scientific and Technical Information (OSTI.GOV)

McCluskey, Kevin; Baker, Scott E.

As model organisms, filamentous fungi have been important since the beginning of modern biological inquiry and have benefitted from open data since the earliest genetic maps were shared. From early origins in the simple Mendelian genetics of mating types, the parasexual genetics of colony colour, and the foundational demonstration of the segregation of a nutritional requirement, research systems utilising filamentous fungi have contributed throughout the biochemical genetics era and the molecular genetics era, and they now lie at the very foundation of diverse omics approaches to research and development. Fungal model organisms have come from most major taxonomic groups, although Ascomycete filamentous fungi have seen the most sustained effort. In addition to the published material about filamentous fungi, shared molecular tools have found application in every area of fungal biology. Likewise, shared data has contributed to the success of model systems. Furthermore, the scale of data supporting research with filamentous fungi has grown by 10 to 12 orders of magnitude. From genetic to molecular maps, expression databases, and finally genome resources, the open and collaborative nature of the research communities has ensured that the rising tide of data has lifted all of the research systems together.
A systems biology pipeline identifies new immune and disease related molecular signatures and networks in human cells during microgravity exposure.

PubMed

Mukhopadhyay, Sayak; Saha, Rohini; Palanisamy, Anbarasi; Ghosh, Madhurima; Biswas, Anupriya; Roy, Saheli; Pal, Arijit; Sarkar, Kathakali; Bagh, Sangram

2016-05-17

Microgravity is a prominent health hazard for astronauts, yet we understand little about its effect at the molecular systems level. In this study, we have integrated a set of systems-biology tools and databases and have analysed more than 8000 molecular pathways on published global gene expression datasets of human cells in microgravity. Hundreds of new pathways have been identified with statistical confidence for each dataset, and despite differences in cell types and experiments, around 100 of the new pathways appear common across the datasets. They are related to reduced inflammation, autoimmunity, diabetes and asthma. We have identified downregulation of the NF-κB pathway via Notch1 signalling as a new pathway for reduced immunity in microgravity. Induction of a few cancer types, including liver cancer and leukaemia, and increased drug response to cancer in microgravity are also found. An increase in olfactory signal transduction is also identified. Genes are clustered based on their expression pattern, and mathematically stable clusters are identified. The network mapping of genes within a cluster indicates plausible functional connections in microgravity. This pipeline gives a new systems-level picture of human cells under microgravity, generates testable hypotheses and may help in estimating risk and developing medicine for space missions.
TOXCAST, A TOOL FOR CATEGORIZATION AND ...

EPA Pesticide Factsheets

Across several EPA Program Offices (e.g., OPPTS, OW, OAR), there is a clear need to develop strategies and methods to screen large numbers of chemicals for potential toxicity, and to use the resulting information to prioritize the use of testing resources towards those entities and endpoints that present the greatest likelihood of risk to human health and the environment. This need could be addressed using the experience of the pharmaceutical industry in the use of advanced modern molecular biology and computational chemistry tools for the development of new drugs, with appropriate adjustment to the needs and desires of environmental toxicology. A conceptual approach named ToxCast has been developed to address the needs of EPA Program Offices in the area of prioritization and screening. Modern computational chemistry and molecular biology tools bring enabling technologies forward that can provide information about the physical and biological properties of large numbers of chemicals. The essence of the proposal is to conduct a demonstration project based upon a rich toxicological database (e.g., registered pesticides, or the chemicals tested in the NTP bioassay program), select a fairly large number (50-100 or more chemicals) representative of a number of differing structural classes and phenotypic outcomes (e.g., carcinogens, reproductive toxicants, neurotoxicants), and evaluate them across a broad spectrum of information domains that modern technology has pro
A life scientist's gateway to distributed data management and computing: the PathPort/ToolBus framework.

PubMed

Eckart, J Dana; Sobral, Bruno W S

2003-01-01

The emergent needs of the bioinformatics community challenge current information systems. The pace of biological data generation far outstrips Moore's Law. Therefore, a gap continues to widen between the capability to produce biological (molecular and cell) data sets and the capability to manage and analyze these data sets. As a result, Federal investments in large data set generation produce diminishing returns in terms of the community's capabilities of understanding biology and leveraging that understanding to make scientific and technological advances that improve society. We are building an open framework to address various data management issues, including data and tool interoperability, nomenclature and data communication standardization, and database integration. PathPort, short for Pathogen Portal, employs a generic, web-services based framework to deal with some of the problems identified by the bioinformatics community. The motivating research goal of a scalable system to provide data management and analysis for key pathosystems, especially relating to molecular data, has resulted in a generic framework using two major components. On the server-side, we employ web-services. On the client-side, a Java application called ToolBus acts as a client-side "bus" for contacting data and tools and viewing results through a single, consistent user interface.
The Biological Macromolecule Crystallization Database and NASA Protein Crystal Growth Archive

PubMed Central

Gilliland, Gary L.; Tung, Michael; Ladner, Jane

1996-01-01

The NIST/NASA/CARB Biological Macromolecule Crystallization Database (BMCD), NIST Standard Reference Database 21, contains crystal data and crystallization conditions for biological macromolecules. The database entries include data abstracted from published crystallographic reports. Each entry consists of information describing the biological macromolecule crystallized and crystal data and the crystallization conditions for each crystal form. The BMCD serves as the NASA Protein Crystal Growth Archive in that it contains protocols and results of crystallization experiments undertaken in microgravity (space). These database entries report the results, whether successful or not, from NASA-sponsored protein crystal growth experiments in microgravity and from microgravity crystallization studies sponsored by other international organizations. The BMCD was designed as a tool to assist x-ray crystallographers in developing protocols to crystallize biological macromolecules, both those that have previously been crystallized and those that have not. PMID:11542472
Functional genomics approaches in parasitic helminths.

PubMed

Hagen, J; Lee, E F; Fairlie, W D; Kalinna, B H

2012-01-01

As research on parasitic helminths moves into the post-genomic era, an enormous effort is being directed towards deciphering gene function and achieving gene annotation. The sequences available in public databases undoubtedly hold information that can be used for new interventions and control, but exploiting these resources has until recently remained difficult. Only now, with the emergence of methods to genetically manipulate and transform parasitic worms, will it be possible to gain a comprehensive understanding of the molecular mechanisms involved in nutrition, metabolism, developmental switches/maturation and interaction with the host immune system. This review focuses on the functional genomics approaches currently used in parasitic helminths and highlights potential applications of these technologies in the cell biology, systems biology and immunobiology of parasitic helminths. © 2011 Blackwell Publishing Ltd.
POLLUX: a program for simulated cloning, mutagenesis and database searching of DNA constructs.

PubMed

Dayringer, H E; Sammons, S A

1991-04-01

Computer support for research in biotechnology has developed rapidly and has provided several tools to aid the researcher. This report describes the capabilities of new computer software developed in this laboratory to aid in the documentation and planning of experiments in molecular biology. The program, POLLUX, provides a graphical medium for the entry, editing and manipulation of DNA constructs and a textual format for the display and editing of construct descriptive data. Program operation and procedures are designed to mimic the actual laboratory experiments with respect to capability and the order in which they are performed. Flexible control over the content of the computer-generated displays and program facilities is provided by a mouse-driven menu interface. Programmed facilities for mutagenesis, simulated cloning and searching of the database from networked workstations are described.
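To illustrate what "simulated cloning" and "mutagenesis" mean at the sequence level, the Python sketch below locates restriction sites, cuts a linear construct, and applies a point substitution. It is not POLLUX's implementation, which the abstract does not describe. Only the top strand is modelled, so sticky-end overhangs are ignored. The function names are hypothetical, and the EcoRI site GAATTC with a cut after the G (offset 1) is used as the example enzyme.

```python
def find_sites(seq, site):
    """Return 0-based start positions of every occurrence of `site` (overlaps allowed)."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]


def digest_linear(seq, site, cut_offset):
    """Cut a linear construct at each site; cut_offset = bases from site start to the top-strand cut."""
    cuts = [pos + cut_offset for pos in find_sites(seq, site)]
    fragments, prev = [], 0
    for cut in cuts:
        fragments.append(seq[prev:cut])
        prev = cut
    fragments.append(seq[prev:])
    return fragments


def point_mutate(seq, position, new_base):
    """Simulate a site-directed substitution at a 0-based position."""
    if len(new_base) != 1 or new_base.upper() not in "ACGT":
        raise ValueError("new_base must be a single A, C, G or T")
    return seq[:position] + new_base.upper() + seq[position + 1:]


construct = "AAGAATTCTTGAATTCAA"  # toy construct with two EcoRI sites
print(digest_linear(construct, "GAATTC", 1))  # ['AAG', 'AATTCTTG', 'AATTCAA']
mutant = point_mutate(construct, 10, "A")     # destroys the second site
print(find_sites(mutant, "GAATTC"))           # [2]
```

Checking which sites survive a planned mutation mirrors the "plan the experiment before doing it" workflow that the abstract describes.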
Marine molecular biology: an emerging field of biological sciences.

PubMed

Thakur, Narsinh L; Jain, Roopesh; Natalio, Filipe; Hamer, Bojan; Thakur, Archana N; Müller, Werner E G

2008-01-01

An appreciation of the potential applications of molecular biology is of growing importance in many areas of the life sciences, including marine biology. During the past two decades, the development of sophisticated molecular technologies and instruments for biomedical research has resulted in significant advances in the biological sciences. However, the value of molecular techniques for addressing problems in marine biology has only recently begun to be appreciated. It has been proven that exploiting molecular biological techniques will allow difficult research questions about marine organisms and ocean processes to be addressed. Marine molecular biology is a discipline that strives to define and solve problems regarding the sustainable exploration of marine life for human health and welfare, through cooperation between scientists working in marine biology, molecular biology, microbiology and chemistry. Several success stories of the application of molecular techniques in marine biology are guiding further research in this area. This review discusses different molecular techniques with applications in marine microbiology, marine invertebrate biology, marine ecology, marine natural products, material sciences, fisheries, conservation and bio-invasion. In summary, if marine biologists and molecular biologists continue to work towards a strong partnership during the next decade and recognize the intellectual and technological advantages and benefits of such a partnership, an exciting new frontier of marine molecular biology will emerge.
MUBII-TB-DB: a database of mutations associated with antibiotic resistance in Mycobacterium tuberculosis.

PubMed

Flandrois, Jean-Pierre; Lina, Gérard; Dumitrescu, Oana

2014-04-14

Tuberculosis is an infectious bacterial disease caused by Mycobacterium tuberculosis. It remains a major health threat, killing over one million people every year worldwide. Early antibiotic therapy is the basis of treatment, and the emergence and spread of multidrug-resistant and extensively drug-resistant mutant strains raise significant challenges. As these bacteria grow very slowly, drug resistance mutations are currently detected using molecular biology techniques. Resistance mutations are identified by sequencing the resistance-linked genes and comparing the results with literature data. The only existing online database is the TB Drug Resistance Mutation database (TBDReaM database); however, it requires mutation detection before use, and querying it is complex due to its loose syntax and grammar. The MUBII-TB-DB database is a simple, highly structured text-based database that contains a set of Mycobacterium tuberculosis mutations (DNA and proteins) occurring at seven loci: rpoB, pncA, katG, mabA(fabG1)-inhA, gyrA, gyrB, and rrs. Resistance mutation data were extracted through a systematic review of MEDLINE-referenced publications prior to March 2013. MUBII analyzes the query sequence obtained by PCR-sequencing using two parallel strategies: i) a BLAST search against a set of previously reconstructed mutated sequences and ii) the alignment of the query sequences (DNA and its protein translation) with the wild-type sequences. Post-processing includes the extraction of the aligned sequences together with their descriptors (position and nature of mutations). The whole procedure is performed over the Internet. The results are presented as graphs (alignments) and text (description of the mutation, therapeutic significance). The system is quick and easy to use, even for technicians without bioinformatics training. MUBII-TB-DB is a structured database of the mutations occurring at seven loci of major therapeutic value in tuberculosis management.
Moreover, the system provides interpretation of the mutations in biological and therapeutic terms and can evolve by the addition of newly described mutations. Its goal is to provide easy and comprehensive access through a client-server model over the Web to an up-to-date database of mutations that lead to the resistance of M. tuberculosis to antibiotics.
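The second MUBII strategy compares the query with the wild-type sequence at both the DNA and the protein level. It then reports the position and nature of each mutation. The following is a minimal sketch of that comparison step, and it rests on several assumptions. The query is already aligned to the wild type and has the same length. There are no insertions or deletions. The sequence is in frame from codon 1. The helper names `translate` and `describe_mutations` are hypothetical, and the sketch uses the standard genetic code. It does not reproduce MUBII's BLAST step or its locus-specific numbering.

```python
# Standard genetic code, built from the usual TCAG ordering of codon positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame DNA string; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(dna[p:p + 3], "X") for p in range(0, len(dna) - 2, 3))


def describe_mutations(wild_type, query):
    """List substitutions as e.g. 'C5T' (DNA) and 'S2L' (protein), with 1-based positions.

    Assumes both sequences are already aligned, gap-free, equal in length and in frame.
    """
    wild_type, query = wild_type.upper(), query.upper()
    if len(wild_type) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    dna_changes = [
        f"{w}{pos}{q}"
        for pos, (w, q) in enumerate(zip(wild_type, query), start=1)
        if w != q
    ]
    protein_changes = [
        f"{w}{pos}{q}"
        for pos, (w, q) in enumerate(zip(translate(wild_type), translate(query)), start=1)
        if w != q
    ]
    return dna_changes, protein_changes


print(describe_mutations("ATGTCGAAA", "ATGTTGAAA"))  # (['C5T'], ['S2L'])
```

A synonymous substitution appears in the DNA list but not in the protein list. This split is what allows a system like MUBII to separate silent changes from amino-acid changes of potential therapeutic relevance.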
MIR@NT@N: a framework integrating transcription factors, microRNAs and their targets to identify sub-network motifs in a meta-regulation network model

PubMed Central

2011-01-01

Background To understand biological processes and diseases, it is crucial to unravel the concerted interplay of transcription factors (TFs), microRNAs (miRNAs) and their targets within regulatory networks and fundamental sub-networks. An integrative computational resource generating a comprehensive, genome-wide view of these regulatory molecular interactions would be of great interest to biologists, but none is available to date. Results To identify and analyze molecular interaction networks, we developed MIR@NT@N, an integrative approach based on a meta-regulation network model and a large-scale database. MIR@NT@N uses a graph-based approach to predict novel molecular actors across multiple regulatory processes (i.e. TFs acting on protein-coding or miRNA genes, or miRNAs acting on messenger RNAs). Exploiting these predictions, the user can generate networks and further analyze them to identify sub-networks, including motifs such as feedback and feedforward loops (FBL and FFL). In addition, networks can be built from lists of molecular actors with an a priori role in a given biological process to predict novel and unanticipated interactions. Analyses can be contextualized and filtered by integrating additional information such as microarray expression data. All results, including generated graphs, can be visualized, saved and exported into various formats. MIR@NT@N's performance was evaluated using published data, and the tool was then applied to the regulatory program underlying the epithelium-to-mesenchyme transition (EMT), an evolutionarily conserved process that is implicated in embryonic development and disease. Conclusions MIR@NT@N is an effective computational approach to identify novel molecular regulations and to predict gene regulatory networks and sub-networks, including conserved motifs, within a given biological context. 
Taking advantage of the M@IA environment, MIR@NT@N is a user-friendly web resource freely available at http://mironton.uni.lu which will be updated on a regular basis. PMID:21375730
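Feedforward and feedback loops are small directed sub-graphs. A sketch of how they can be enumerated in a regulatory network follows. It represents the network as a plain list of (regulator, target) edges. A feedforward loop is a triple (a, b, c) with edges a→b, b→c and a→c. A two-node feedback loop is a pair with edges in both directions. The function names and the node names in the example are hypothetical. The sketch does not use MIR@NT@N's database or its prediction step.

```python
def feed_forward_loops(edges):
    """Return sorted (a, b, c) triples with a->b, b->c and a->c in a directed graph."""
    successors = {}
    for source, target in edges:
        successors.setdefault(source, set()).add(target)
    loops = []
    for a, targets in successors.items():
        for b in targets:
            if b == a:
                continue  # ignore self-regulation
            for c in successors.get(b, ()):
                if c not in (a, b) and c in targets:
                    loops.append((a, b, c))
    return sorted(loops)


def feedback_pairs(edges):
    """Return sorted node pairs that regulate each other (a->b and b->a)."""
    edge_set = set(edges)
    return sorted({tuple(sorted((a, b))) for a, b in edge_set if a != b and (b, a) in edge_set})


network = [
    ("TF1", "miR-1"), ("miR-1", "geneX"), ("TF1", "geneX"),  # a TF -> miRNA -> target FFL
    ("TF2", "geneY"),
    ("TF3", "miR-2"), ("miR-2", "TF3"),                      # a TF <-> miRNA feedback loop
]
print(feed_forward_loops(network))  # [('TF1', 'miR-1', 'geneX')]
print(feedback_pairs(network))      # [('TF3', 'miR-2')]
```

In a real meta-regulation network, each node would also carry a type (TF, miRNA or protein-coding gene). Filtering the triples by type would then distinguish, for example, TF-led from miRNA-led feedforward loops.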
SIDD: A Semantically Integrated Database towards a Global View of Human Disease

PubMed Central

Cheng, Liang; Wang, Guohua; Li, Jie; Zhang, Tianjiao; Xu, Peigang; Wang, Yadong

2013-01-01

Background A number of databases have been developed to collect disease-related molecular, phenotypic and environmental features (DR-MPEs), such as genes, non-coding RNAs, genetic variations, drugs, phenotypes and environmental factors. However, each current database focuses on only one or two types of DR-MPE. There is an urgent demand for an integrated database that can establish semantic associations among disease-related databases and link them to provide a global view of human disease at the biological level. Such a database would allow researchers to query various DR-MPEs by disease and to investigate disease mechanisms from different types of data. Methodology To establish an integrated disease-associated database, disease vocabularies used in different databases are mapped to the Disease Ontology (DO) through semantic matching. In total, 4,284 disease terms from Medical Subject Headings (MeSH) and 4,186 from Online Mendelian Inheritance in Man (OMIM) are mapped to DO. The relationships between DR-MPEs and diseases are then extracted from the different source databases and merged to reduce data redundancy. Conclusions A semantically integrated disease-associated database (SIDD) is developed, which integrates 18 disease-associated databases so that researchers can browse multiple types of DR-MPEs in a single view. A web interface allows easy navigation for querying information by browsing a disease ontology tree or searching for a disease term. Furthermore, a network visualization tool using the Cytoscape Web plugin has been implemented in SIDD, which makes it easier to view the relationships between diseases and DR-MPEs. The current version of SIDD (Jul 2013) documents 4,465,131 entries relating to 139,365 DR-MPEs and to 3,824 human diseases. The database can be freely accessed from: http://mlg.hit.edu.cn/SIDD. PMID:24146757
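SIDD's pipeline has two steps: map each source vocabulary onto DO terms, then merge disease-feature relationships from all sources so that duplicates collapse. A rough sketch of those two steps follows. Plain string normalization (case, hyphens, whitespace) stands in for the authors' semantic matching, which the abstract does not detail. The ontology identifiers are placeholders rather than real DO IDs. The gene-disease pairs are only illustrative inputs.

```python
def normalize(term):
    """Crude term normalization: lower-case, hyphens to spaces, collapse whitespace."""
    return " ".join(term.lower().replace("-", " ").split())


def map_to_ontology(source_terms, ontology_labels):
    """Map source disease terms to ontology IDs by exact match on normalized labels/synonyms.

    ontology_labels: {ontology_id: [preferred label, synonym, ...]}
    Terms with no match are left out of the returned mapping.
    """
    index = {}
    for ontology_id, labels in ontology_labels.items():
        for label in labels:
            index.setdefault(normalize(label), ontology_id)
    return {term: index[normalize(term)] for term in source_terms if normalize(term) in index}


def merge_associations(mapping, *sources):
    """Union (ontology_id, feature) pairs from several sources; duplicates collapse in the set."""
    merged = set()
    for source in sources:
        for term, feature in source:
            if term in mapping:
                merged.add((mapping[term], feature))
    return merged


ontology = {"DO:example-1": ["Breast Cancer", "breast carcinoma"]}  # placeholder ID
source_a = [("breast-cancer", "BRCA1")]
source_b = [("Breast Carcinoma", "BRCA1"), ("Gout", "SLC2A9")]
mapping = map_to_ontology(["breast-cancer", "Breast Carcinoma", "Gout"], ontology)
print(merge_associations(mapping, source_a, source_b))  # {('DO:example-1', 'BRCA1')}
```

The two source records for the same gene end up as one entry because both disease names resolve to the same ontology term. A real semantic matcher would also need to handle synonyms and hierarchy that exact matching misses (here, "Gout" is simply dropped).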
Automated bond order assignment as an optimization problem.

PubMed

Dehof, Anna Katharina; Rurainski, Alexander; Bui, Quang Bao Anh; Böcker, Sebastian; Lenhof, Hans-Peter; Hildebrandt, Andreas

2011-03-01

Numerous applications in computational biology process molecular structures and hence rely strongly not only on correct atomic coordinates but also on correct bond order information. For proteins and nucleic acids, bond orders can be easily deduced, but this does not hold for other types of molecules such as ligands. For ligands, bond order information is not always provided in molecular databases, and thus a variety of approaches tackling this problem have been developed. In this work, we extend an ansatz proposed by Wang et al. that assigns connectivity-based penalty scores and tries to heuristically approximate its optimum. We present three efficient and exact solvers for the problem that replace the heuristic approximation scheme of the original approach: an A* approach, an integer linear programming (ILP) approach and a fixed-parameter tractable (FPT) approach. We evaluated the original implementation and our A*, ILP and FPT formulations on the MMFF94 validation suite and the KEGG Drug database. We show the benefit of computing exact solutions of the penalty minimization problem and the additional gain when computing all optimal (or even suboptimal) solutions. We close with a detailed comparison of our methods. The A* and ILP solutions are integrated into the open-source C++ LGPL library BALL and the molecular visualization and modelling tool BALLView and can be downloaded from our homepage www.ball-project.org. The FPT implementation can be downloaded from http://bio.informatik.uni-jena.de/software/.
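The optimization problem can be stated concretely. Assign each bond an order in {1, 2, 3}. Each atom's valence is then the sum of its bond orders plus its implicit hydrogens, and each atom contributes a penalty that depends on that valence. The task is to minimize the total penalty. The sketch below solves this by brute-force enumeration, which is exact but exponential in the number of bonds. That cost is the reason the paper turns to A*, ILP and FPT solvers. Like the paper's approach, the sketch returns all optimal assignments. The penalty values are hypothetical placeholders, not the published Wang et al. scores.

```python
from itertools import product

# Hypothetical penalty tables: atom valence (bond orders + implicit H) -> penalty.
# Illustrative values only, NOT the scores used by Wang et al. or BALL.
PENALTY = {
    "C": {4: 0, 3: 1, 5: 32},
    "O": {2: 0, 1: 1, 3: 64},
}
UNLISTED = 1000  # penalty for any valence missing from an atom's table


def total_penalty(elements, hydrogens, bonds, orders):
    valence = list(hydrogens)
    for (a, b), order in zip(bonds, orders):
        valence[a] += order
        valence[b] += order
    return sum(PENALTY[el].get(v, UNLISTED) for el, v in zip(elements, valence))


def all_optimal_assignments(elements, hydrogens, bonds, max_order=3):
    """Exhaustively score every bond-order assignment; return (best score, all optima)."""
    best, optima = None, []
    for orders in product(range(1, max_order + 1), repeat=len(bonds)):
        score = total_penalty(elements, hydrogens, bonds, orders)
        if best is None or score < best:
            best, optima = score, [orders]
        elif score == best:
            optima.append(orders)
    return best, optima


# Benzene skeleton: six CH atoms in a ring; both Kekule structures are optimal.
ring = [(i, (i + 1) % 6) for i in range(6)]
print(all_optimal_assignments(["C"] * 6, [1] * 6, ring))
# (0, [(1, 2, 1, 2, 1, 2), (2, 1, 2, 1, 2, 1)])
```

The benzene case shows why computing all optima matters. The two Kekulé structures score identically, and a solver that returns only one of them would hide an equally valid assignment.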

SM2PH-db: an interactive system for the integrated analysis of phenotypic consequences of missense mutations in proteins involved in human genetic diseases.

PubMed

Friedrich, Anne; Garnier, Nicolas; Gagnière, Nicolas; Nguyen, Hoan; Albou, Laurent-Philippe; Biancalana, Valérie; Bettler, Emmanuel; Deléage, Gilbert; Lecompte, Odile; Muller, Jean; Moras, Dino; Mandel, Jean-Louis; Toursel, Thierry; Moulinier, Luc; Poch, Olivier

2010-02-01

Understanding how genetic alterations affect gene products at the molecular level represents a first step in elucidating the complex relationships between genotypic and phenotypic variations, and is thus a major challenge in the postgenomic era. Here, we present SM2PH-db (http://decrypthon.igbmc.fr/sm2ph), a new database designed to investigate the structural and functional impacts of missense mutations and their phenotypic effects in the context of human genetic diseases. A wealth of up-to-date interconnected information is provided for each of the 2,249 disease-related entry proteins (August 2009), including data retrieved from biological databases and data generated from a Sequence-Structure-Evolution Inference in Systems-based approach, such as multiple alignments, three-dimensional structural models, and multidimensional (physicochemical, functional, structural, and evolutionary) characterizations of mutations. SM2PH-db provides a robust infrastructure associated with interactive analysis tools supporting in-depth study and interpretation of the molecular consequences of mutations, with the longer-term goal of elucidating the chain of events leading from a molecular defect to its pathology. The entire content of SM2PH-db is regularly and automatically updated thanks to the computational grid data federation facilities provided in the context of the Decrypthon program. (c) 2009 Wiley-Liss, Inc.
KECSA-Movable Type Implicit Solvation Model (KMTISM)

PubMed Central

2015-01-01

Computation of the solvation free energy for chemical and biological processes has long been of significant interest. The key challenges to effective solvation modeling center on the choice of potential function and configurational sampling. Herein, an energy sampling approach termed the “Movable Type” (MT) method, and a statistical energy function for solvation modeling, “Knowledge-based and Empirical Combined Scoring Algorithm” (KECSA) are developed and utilized to create an implicit solvation model: KECSA-Movable Type Implicit Solvation Model (KMTISM) suitable for the study of chemical and biological systems. KMTISM is an implicit solvation model, but the MT method performs energy sampling at the atom pairwise level. For a specific molecular system, the MT method collects energies from prebuilt databases for the requisite atom pairs at all relevant distance ranges, which by its very construction encodes all possible molecular configurations simultaneously. Unlike traditional statistical energy functions, KECSA converts structural statistical information into categorized atom pairwise interaction energies as a function of the radial distance instead of a mean force energy function. Within the implicit solvent model approximation, aqueous solvation free energies are then obtained from the NVT ensemble partition function generated by the MT method. Validation is performed against several subsets selected from the Minnesota Solvation Database v2012. Results are compared with several solvation free energy calculation methods, including a one-to-one comparison against two commonly used classical implicit solvation models: MM-GBSA and MM-PBSA. Comparison against a quantum mechanics based polarizable continuum model is also discussed (Cramer and Truhlar’s Solvation Model 12). PMID:25691832
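The abstract states that solvation free energies are obtained from an NVT (canonical) ensemble partition function. In the canonical ensemble the free energy is A = -k_B T ln Z, so a free-energy difference between two states is ΔA = -k_B T ln(Z_solution / Z_gas). The following sketch evaluates that generic relation from two lists of configuration energies, using a log-sum-exp for numerical stability. It assumes that both sums run over the same equally weighted configurations. It does not reproduce how the Movable Type method assembles its partition function from pairwise energy databases. The function names are hypothetical.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def log_partition(energies, temperature):
    """ln Z for a discrete set of configuration energies (kcal/mol), via log-sum-exp."""
    beta = 1.0 / (K_B * temperature)
    scaled = [-beta * e for e in energies]
    m = max(scaled)
    return m + math.log(sum(math.exp(s - m) for s in scaled))


def solvation_free_energy(gas_energies, solution_energies, temperature=298.15):
    """Delta A = -kT ln(Z_solution / Z_gas) in kcal/mol.

    Assumes both energy lists enumerate the same, equally weighted configurations.
    """
    kt = K_B * temperature
    return -kt * (log_partition(solution_energies, temperature)
                  - log_partition(gas_energies, temperature))


gas = [0.0, 0.5, 1.0]
solution = [e - 1.0 for e in gas]  # every configuration stabilized by 1 kcal/mol
print(solvation_free_energy(gas, solution))  # approximately -1.0
```

The check in the example is deliberately simple. A uniform shift of every configuration's energy must shift the free energy by exactly that amount, whatever the temperature.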
Molecular taxonomy of phytopathogenic fungi: a case study in Peronospora.

PubMed

Göker, Markus; García-Blázquez, Gema; Voglmayr, Hermann; Tellería, M Teresa; Martín, María P

2009-07-29

Inappropriate taxon definitions may have severe consequences in many areas. For instance, biologically sensible species delimitation of plant pathogens is crucial for measures such as plant protection or biological control and for comparative studies involving model organisms. However, delimiting species is challenging in the case of organisms for which often only molecular data are available, such as prokaryotes, fungi, and many unicellular eukaryotes. Even in the case of organisms with well-established morphological characteristics, molecular taxonomy is often necessary to emend current taxonomic concepts and to analyze DNA sequences directly sampled from the environment. Typically, clustering approaches that delineate molecular operational taxonomic units have been applied for this purpose, with arbitrary choices of distance threshold values and clustering algorithms. Here, we report on a clustering optimization method to establish a molecular taxonomy of Peronospora based on ITS nrDNA sequences. Peronospora is the largest genus within the downy mildews, which are obligate parasites of higher plants, and includes various economically important pathogens. The method determines the distance function and clustering setting that result in an optimal agreement with selected reference data. Optimization was based on both taxonomy-based and host-based reference information, yielding the same outcome. Resampling and permutation methods indicate that the method is robust regarding taxon sampling and errors in the reference data. Tests with newly obtained ITS sequences demonstrate the use of the re-classified dataset in molecular identification of downy mildews. A corrected taxonomy is provided for all Peronospora ITS sequences contained in public databases. Clustering optimization appears to be broadly applicable in automated, sequence-based taxonomy. 
The method connects traditional and modern taxonomic disciplines by specifically addressing the issue of how to optimally account for both traditional species concepts and genetic divergence.
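The core idea of clustering optimization is to try candidate clustering settings and keep the one that agrees best with reference data. A minimal sketch follows, under simplifying assumptions. Pairwise distances are precomputed. Single-linkage clustering at a distance threshold is the only setting varied. The Rand index stands in as the agreement measure. The study's own distance functions, clustering algorithms and agreement score are not specified in the abstract, and all names here are hypothetical.

```python
from itertools import combinations


def single_linkage(n, dist, threshold):
    """Cluster items 0..n-1, joining any pair with dist <= threshold (union-find)."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if dist[i][j] <= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two partitions agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j]) for i, j in pairs)
    return agree / len(pairs)


def best_threshold(dist, reference, thresholds):
    """Return (score, threshold) for the threshold that best matches the reference labels."""
    best = None
    for t in thresholds:
        score = rand_index(single_linkage(len(reference), dist, t), reference)
        if best is None or score > best[0]:
            best = (score, t)
    return best


# Four hypothetical sequences: two close pairs, far apart from each other.
D = [[0.00, 0.01, 0.10, 0.10],
     [0.01, 0.00, 0.10, 0.10],
     [0.10, 0.10, 0.00, 0.02],
     [0.10, 0.10, 0.02, 0.00]]
print(best_threshold(D, ["sp1", "sp1", "sp2", "sp2"], [0.005, 0.03, 0.2]))  # (1.0, 0.03)
```

Swapping the reference labels (for example, taxonomy-based versus host-based, as in the study) and checking whether the chosen setting stays the same is a cheap robustness test built on the same loop.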
Molecular Taxonomy of Phytopathogenic Fungi: A Case Study in Peronospora

PubMed Central

Göker, Markus; García-Blázquez, Gema; Voglmayr, Hermann; Tellería, M. Teresa; Martín, María P.

2009-01-01

Background Inappropriate taxon definitions may have severe consequences in many areas. For instance, biologically sensible species delimitation of plant pathogens is crucial for measures such as plant protection or biological control and for comparative studies involving model organisms. However, delimiting species is challenging in the case of organisms for which often only molecular data are available, such as prokaryotes, fungi, and many unicellular eukaryotes. Even in the case of organisms with well-established morphological characteristics, molecular taxonomy is often necessary to emend current taxonomic concepts and to analyze DNA sequences directly sampled from the environment. Typically, clustering approaches that delineate molecular operational taxonomic units have been applied for this purpose, with arbitrary choices of distance threshold values and clustering algorithms. Methodology Here, we report on a clustering optimization method to establish a molecular taxonomy of Peronospora based on ITS nrDNA sequences. Peronospora is the largest genus within the downy mildews, which are obligate parasites of higher plants, and includes various economically important pathogens. The method determines the distance function and clustering setting that result in an optimal agreement with selected reference data. Optimization was based on both taxonomy-based and host-based reference information, yielding the same outcome. Resampling and permutation methods indicate that the method is robust regarding taxon sampling and errors in the reference data. Tests with newly obtained ITS sequences demonstrate the use of the re-classified dataset in molecular identification of downy mildews. Conclusions A corrected taxonomy is provided for all Peronospora ITS sequences contained in public databases. Clustering optimization appears to be broadly applicable in automated, sequence-based taxonomy. 
The method connects traditional and modern taxonomic disciplines by specifically addressing the issue of how to optimally account for both traditional species concepts and genetic divergence. PMID:19641601
Product Differences in Intra-articular Hyaluronic Acids for Osteoarthritis of the Knee.

PubMed

Altman, Roy D; Bedi, Asheesh; Karlsson, Jon; Sancheti, Parag; Schemitsch, Emil

2016-08-01

Knee osteoarthritis (OA) is a common and often disabling joint disorder among adults that may result in impaired activity and daily function. A variety of treatment options are currently available and prescribed for knee OA depending on the severity of the disorder and physician preference. Intra-articular hyaluronic acid (IA-HA) injection is a treatment for knee OA that reportedly provides numerous biochemical and biological benefits, including shock absorption, chondroprotection, and anti-inflammatory effects within the knee. Clarity is needed as to whether the available IA-HA products should be considered for therapy as a group or whether there are significant differences in the products that need to be considered in the treatment of knee OA. This meta-analysis aimed to determine whether there are differences in efficacy and safety with respect to intrinsic properties of available IA-HA injections for knee OA. A comprehensive literature search of the Medline, EMBASE, and PubMed databases was conducted for all existing randomized trials of IA-HA. The primary outcome measure analyzed was the mean pain score at the reported follow-up nearest to 26 weeks after injection. Pooled efficacy and safety results were recorded for subgroupings of HA product characteristics. A total of 68 studies were included for analysis. Products with an average molecular weight ≥3000 kDa provided favorable efficacy results when compared with products of an average molecular weight <3000 kDa. Products with a molecular weight ≥3000 kDa demonstrated significantly fewer discontinuations due to treatment-related adverse events than did ≤1500 kDa counterparts, while trial discontinuation rates were similar between biological fermentation-derived HA products and avian-derived HA. The results did not demonstrate a significant difference in the occurrence of effusion across molecular weight subgroups. 
Additionally, biological fermentation-derived HA had a significantly smaller incidence of effusion than did avian-derived HA. Biological fermentation-derived HA demonstrated fewer acute flare-ups at the injection site than did avian-derived HA products, while high-molecular-weight products demonstrated the highest rate of injection site flare-up. Despite similarities, IA-HA products should not be treated as a group, as there are differences in IA-HA products that influence both efficacy and safety. In the available literature, IA-HA products with a molecular weight ≥3000 kDa and those derived from biological fermentation are associated with superior efficacy and safety, factors that may influence the selection of an IA-HA product for OA of the knee. © 2015 The Author(s).
Interleukins and their signaling pathways in the Reactome biological pathway database.

PubMed

Jupe, Steve; Ray, Keith; Roca, Corina Duenas; Varusai, Thawfeek; Shamovsky, Veronica; Stein, Lincoln; D'Eustachio, Peter; Hermjakob, Henning

2018-04-01

There is a wealth of biological pathway information available in the scientific literature, but it is spread across many thousands of publications. Alongside publications that contain definitive experimental discoveries are many others that have been dismissed as spurious, found to be irreproducible, or are contradicted by later results and consequently now considered controversial. Many descriptions and images of pathways are incomplete stylized representations that assume the reader is an expert and familiar with the established details of the process, which are consequently not fully explained. Pathway representations in publications frequently do not represent a complete, detailed, and unambiguous description of the molecules involved; their precise posttranslational state; or a full account of the molecular events they undergo while participating in a process. Although this might be sufficient to be interpreted by an expert reader, the lack of detail makes such pathways less useful and difficult to understand for anyone unfamiliar with the area and of limited use as the basis for computational models. Reactome was established as a freely accessible knowledge base of human biological pathways. It is manually populated with interconnected molecular events that fully detail the molecular participants linked to published experimental data and background material by using a formal and open data structure that facilitates computational reuse. These data are accessible on a Web site in the form of pathway diagrams that have descriptive summaries and annotations and as downloadable data sets in several formats that can be reused with other computational tools. The entire database and all supporting software can be downloaded and reused under a Creative Commons license. Pathways are authored by expert biologists who work with Reactome curators and editorial staff to represent the consensus in the field. 
Pathways are represented as interactive diagrams that include as much molecular detail as possible and are linked to literature citations that contain supporting experimental details. All newly created events undergo a peer-review process before they are added to the database and made available on the associated Web site. New content is added quarterly. The 63rd release of Reactome in December 2017 contains 10,996 human proteins participating in 11,426 events in 2,179 pathways. In addition, analytic tools allow data set submission for the identification and visualization of pathway enrichment and representation of expression profiles as an overlay on Reactome pathways. Protein-protein and compound-protein interactions from several sources, including custom user data sets, can be added to extend pathways. Pathway diagrams and analytic result displays can be downloaded as editable images, human-readable reports, and files in several standard formats that are suitable for computational reuse. Reactome content is available programmatically through a REpresentational State Transfer (REST)-based content service and as a Neo4J graph database. Signaling pathways for IL-1 to IL-38 are hierarchically classified within the pathway "signaling by interleukins." The classification used is largely derived from Akdis et al. The addition to Reactome of a complete set of the known human interleukins, their receptors, and established signaling pathways linked to annotations of relevant aspects of immune function provides a significant computationally accessible resource of information about this important family. This information can be extended easily as new discoveries become accepted as the consensus in the field. A key aim for the future is to increase coverage of gene expression changes induced by interleukin signaling. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
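Reactome's analysis tools accept a submitted data set and identify pathway enrichment. The abstract does not spell out the statistics, so the following is only a generic over-representation test of the kind commonly used for this purpose: the upper-tail hypergeometric probability of seeing at least k pathway members in a list of n genes drawn from a background of N genes, K of which belong to the pathway. It is not Reactome's own code. Correction for testing many pathways at once is left out. The function name is hypothetical.

```python
from math import comb


def overrepresentation_p(total_genes, pathway_genes, submitted_genes, overlap):
    """Upper-tail hypergeometric p-value P(X >= overlap).

    total_genes: background size N; pathway_genes: genes in the pathway K;
    submitted_genes: size of the user's list n; overlap: list genes in the pathway k.
    """
    upper = min(pathway_genes, submitted_genes)
    # Sum exact integer counts first, then divide once, to avoid rounding drift.
    numerator = sum(
        comb(pathway_genes, i) * comb(total_genes - pathway_genes, submitted_genes - i)
        for i in range(overlap, upper + 1)
    )
    return numerator / comb(total_genes, submitted_genes)


# Toy background of 10 genes, a 5-gene pathway, and a 5-gene list fully inside it.
print(overrepresentation_p(10, 5, 5, 5))  # 1/252, about 0.00397
```

`math.comb` returns 0 when asked to choose more items than are available, so impossible overlaps contribute nothing to the sum without any special-casing.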
Computational tools for exploring sequence databases as a resource for antimicrobial peptides.

PubMed

Porto, W F; Pires, A S; Franco, O L

Data mining has been recognized by many researchers as a hot topic in different areas. In the post-genomic era, the growing number of sequences deposited in databases has turned these databases into a resource for novel biological information. In recent years, the identification of antimicrobial peptides (AMPs) in databases has gained attention. The identification of unannotated AMPs has shed some light on the distribution and evolution of AMPs and, in some cases, indicated suitable candidates for developing novel antimicrobial agents. The data mining process has been performed mainly by local alignments and/or regular expressions. Nevertheless, identifying distantly homologous sequences requires other techniques, such as antimicrobial activity prediction and molecular modelling. In this context, this review addresses the tools and techniques for mining AMPs from databases, along with their limitations. These methods could be helpful not only for the development of novel AMPs, but also for other kinds of proteins, at a higher level of structural genomics. Moreover, solving the problem of unannotated proteins could bring immeasurable benefits to society, especially in the case of AMPs, which could be helpful for developing novel antimicrobial agents and combating resistant bacteria. Copyright © 2017 Elsevier Inc. All rights reserved.
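Regular-expression mining, one of the two main approaches the review names, scans every database sequence for a motif pattern. The following sketch uses Python's `re` module to scan a set of sequences and report each hit with its position. The cysteine-spacing pattern is hypothetical, written only to resemble the style of a disulfide-rich peptide signature. It is not a validated AMP family signature, and the sequence names are made up.

```python
import re

# Hypothetical cysteine-spacing pattern (C-x(3,6)-C-x(2,4)-C-x(4,10)-C),
# for illustration only; it is NOT a validated AMP family signature.
MOTIF = re.compile(r"C.{3,6}C.{2,4}C.{4,10}C")


def scan(sequences, motif=MOTIF):
    """Return (name, 0-based start, matched text) for non-overlapping motif hits."""
    hits = []
    for name, seq in sequences.items():
        for match in motif.finditer(seq.upper()):
            hits.append((name, match.start(), match.group()))
    return hits


database = {
    "pep1": "AAACKKKCGGCLLLLLCAA",  # contains four suitably spaced cysteines
    "pep2": "MKTLLAVA",             # no cysteines, so no hit
}
print(scan(database))  # [('pep1', 3, 'CKKKCGGCLLLLLC')]
```

As the review notes, a fixed pattern like this finds only sequences that keep the exact spacing. That limitation is why distant homologs call for activity prediction or modelling instead.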
PDB-wide collection of binding data: current status of the PDBbind database.

PubMed

Liu, Zhihai; Li, Yan; Han, Li; Li, Jie; Liu, Jie; Zhao, Zhixiong; Nie, Wei; Liu, Yuchen; Wang, Renxiao

2015-02-01

Molecular recognition between biological macromolecules and organic small molecules plays an important role in various life processes. Both structural information and binding data of biomolecular complexes are indispensable for depicting the underlying mechanism in such an event. The PDBbind database was created to collect experimentally measured binding data for the biomolecular complexes throughout the Protein Data Bank (PDB). It thus links the structural information and energetic properties of biomolecular complexes, which is especially desirable for computational studies or statistical analyses. Since its first public release in 2004, the PDBbind database has been updated on an annual basis. The latest release (version 2013) provides experimental binding affinity data for 10,776 biomolecular complexes in PDB, including 8302 protein-ligand complexes and 2474 other types of complexes. In this article, we describe the current methods used for compiling PDBbind and the updated status of this database. We also review some typical applications of PDBbind published in the scientific literature. All contents of this database are freely accessible at the PDBbind-CN Web server at http://www.pdbbind-cn.org/. Contact: wangrx@mail.sioc.ac.cn. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved.
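Computational studies that use binding data like PDBbind's usually convert a measured constant into a log-scale affinity, pK = -log10(K / 1 M). Some also convert it into a standard binding free energy, ΔG = RT ln(Kd / 1 M). These are standard textbook relations, shown in the sketch below. The sketch does not describe how PDBbind itself processes or reports its data, and the function names are hypothetical.

```python
import math

R_KCAL = 1.98720425864083e-3  # gas constant in kcal/(mol*K)


def p_affinity(value_molar):
    """pKd/pKi-style value: -log10 of a binding constant given in mol/L."""
    return -math.log10(value_molar)


def binding_free_energy(kd_molar, temperature=298.15):
    """Standard binding free energy in kcal/mol: Delta G = RT ln(Kd / 1 M)."""
    return R_KCAL * temperature * math.log(kd_molar)


kd = 1e-9  # a 1 nM binder
print(p_affinity(kd))           # approximately 9.0
print(binding_free_energy(kd))  # approximately -12.28 kcal/mol
```

The free-energy relation applies to dissociation constants. Inhibition constants fit it only under assumptions, and IC50 values depend on assay conditions, so treating them the same way is at best an approximation.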
Plant Reactome: a resource for plant pathways and comparative analysis.

PubMed

Naithani, Sushma; Preece, Justin; D'Eustachio, Peter; Gupta, Parul; Amarasinghe, Vindhya; Dharmawardhana, Palitha D; Wu, Guanming; Fabregat, Antonio; Elser, Justin L; Weiser, Joel; Keays, Maria; Fuentes, Alfonso Munoz-Pomer; Petryszak, Robert; Stein, Lincoln D; Ware, Doreen; Jaiswal, Pankaj

2017-01-04

Plant Reactome (http://plantreactome.gramene.org/) is a free, open-source, curated plant pathway database portal, provided as part of the Gramene project. The database provides intuitive bioinformatics tools for the visualization, analysis and interpretation of pathway knowledge to support genome annotation, genome analysis, modeling, systems biology, basic research and education. Plant Reactome employs the structural framework of a plant cell to show metabolic, transport, genetic, developmental and signaling pathways. We manually curate molecular details of pathways in these domains for reference species Oryza sativa (rice) supported by published literature and annotation of well-characterized genes. Two hundred twenty-two rice pathways, 1025 reactions associated with 1173 proteins, 907 small molecules and 256 literature references have been curated to date. These reference annotations were used to project pathways for 62 model, crop and evolutionarily significant plant species based on gene homology. Database users can search and browse various components of the database, visualize curated baseline expression of pathway-associated genes provided by the Expression Atlas and upload and analyze their Omics datasets. The database also offers data access via Application Programming Interfaces (APIs) and in various standardized pathway formats, such as SBML and BioPAX. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
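Plant Reactome projects curated rice pathways onto other species based on gene homology. The following is a simplified sketch of such a projection. Each pathway's rice genes are mapped through an ortholog table. A pathway is kept only if enough of its genes have at least one ortholog. The coverage threshold, the function name and the gene identifiers are hypothetical. The abstract does not describe Plant Reactome's actual homology source or inclusion rules.

```python
def project_pathways(reference_pathways, orthologs, min_coverage=0.5):
    """Project reference pathways onto a target species through an ortholog table.

    reference_pathways: {pathway_id: set of reference-species gene IDs}
    orthologs: {reference gene ID: iterable of target-species gene IDs}
    A pathway is kept only if at least min_coverage of its reference genes
    have one or more orthologs (a hypothetical inclusion rule).
    """
    projected = {}
    for pathway, genes in reference_pathways.items():
        target_genes = set()
        covered = 0
        for gene in genes:
            hits = orthologs.get(gene, ())
            if hits:
                covered += 1
                target_genes.update(hits)
        if genes and covered / len(genes) >= min_coverage:
            projected[pathway] = target_genes
    return projected


rice = {"PW1": {"Os01g01", "Os01g02"}, "PW2": {"Os02g01", "Os02g02", "Os02g03"}}
ortho = {"Os01g01": ["Zm001", "Zm002"], "Os01g02": ["Zm003"], "Os02g01": ["Zm010"]}
print(project_pathways(rice, ortho))  # {'PW1': {'Zm001', 'Zm002', 'Zm003'}} (set order may vary)
```

One-to-many orthologs, such as `Os01g01` here, all enter the projected pathway. This reflects the gene duplications common in crop genomes.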
Creating and Using a Consumer Chemical Molecular Graphics Database: The "Molecule of the Day" - A Great Way To Begin Your Lecture

NASA Astrophysics Data System (ADS)

Scharberg, Maureen A.; Cox, Oran E.; Barelli, Carl A.

1997-07-01

"The Molecule of the Day" consumer chemical database has been created to allow introductory chemistry students to explore molecular structures of chemicals in household products, and to provide opportunities in molecular modeling for undergraduate chemistry students. Before class begins, an overhead transparency is displayed that shows a three-dimensional molecular structure of a household chemical and lists relevant features and uses of this chemical. In questionnaire responses, students have commented that this molecular graphics database has helped them to visually connect the microscopic structure of a molecule with its physical and chemical properties, as well as with its uses in consumer products. It is anticipated that this database will be incorporated into a navigational software package such as Netscape.
Improved Infrastructure for CDMS and JPL Molecular Spectroscopy Catalogues

NASA Astrophysics Data System (ADS)

Endres, Christian; Schlemmer, Stephan; Drouin, Brian; Pearson, John; Müller, Holger S. P.; Schilke, P.; Stutzki, Jürgen

2014-06-01

Over the past years, a new infrastructure for atomic and molecular databases has been developed within the framework of the Virtual Atomic and Molecular Data Centre (VAMDC). Standards for the representation of atomic and molecular data, as well as a set of protocols, have been established that now allow data to be retrieved from various databases through one portal and combined easily. Apart from spectroscopic databases such as the Cologne Database for Molecular Spectroscopy (CDMS), the Jet Propulsion Laboratory microwave, millimeter and submillimeter spectral line catalogue (JPL) and the HITRAN database, various databases on molecular collisions (BASECOL, KIDA) and reactions (UMIST) are connected. Together with other groups within the VAMDC consortium, we are working on common user tools to simplify access for new users and to tailor data requests for users with specific needs. This includes, in particular, tools to support the analysis of complex observational data obtained with the ALMA telescope. In this presentation, requests to CDMS and JPL will be used to explain the basic concepts and the tools provided by VAMDC. In addition, a new portal to CDMS will be presented, which has a number of new features, in particular meaningful quantum numbers, references linked to data points, access to state energies and improved documentation. Fit files are available for download, and queries to other databases are possible.
Predicting Protein Relationships to Human Pathways through a Relational Learning Approach Based on Simple Sequence Features.

PubMed

García-Jiménez, Beatriz; Pons, Tirso; Sanchis, Araceli; Valencia, Alfonso

2014-01-01

Biological pathways are important elements of systems biology, and in the past decade an increasing number of pathway databases have been set up to document the growing understanding of complex cellular processes. Although more genome-sequence data are becoming available, a large fraction of it remains functionally uncharacterized. Thus, it is important to be able to predict the mapping of poorly annotated proteins to original pathway models. We have developed a Relational Learning-based Extension (RLE) system to investigate pathway membership through a function prediction approach that mainly relies on combinations of simple properties attributed to each protein. RLE searches for proteins with molecular similarities to specific pathway components. Using RLE, we associated 383 uncharacterized proteins with 28 pre-defined human Reactome pathways, demonstrating relative confidence after proper evaluation. Indeed, in specific cases, manual inspection of the database annotations and the related literature supported the proposed classifications. Examples of possible additional components of the Electron transport system, Telomere maintenance and Integrin cell surface interactions pathways are discussed in detail. All predicted human proteins for Reactome releases 30 (2009) and 40 (2012) are available at http://rle.bioinfo.cnio.es.
Psmir: a database of potential associations between small molecules and miRNAs

PubMed Central

Meng, Fanlin; Wang, Jing; Dai, Enyu; Yang, Feng; Chen, Xiaowen; Wang, Shuyuan; Yu, Xuexin; Liu, Dianming; Jiang, Wei

2016-01-01

miRNAs are key post-transcriptional regulators of many essential biological processes, and their dysregulation has been validated in almost all human cancers. Restoring aberrantly expressed miRNAs might be a novel therapeutic strategy. Recently, many studies have demonstrated that small molecular compounds can affect miRNA expression. Thus, predicting associations between small molecules and miRNAs is important for the investigation of miRNA-targeted drugs. Here, we analyzed 39 miRNA-perturbed gene expression profiles and then calculated the similarity of transcriptional responses between miRNA perturbation and drug treatment to predict drug-miRNA associations. At a significance level of 0.05, we obtained 6501 candidate associations between 1295 small molecules and 25 miRNAs, including 624 FDA-approved drugs. Finally, we constructed the Psmir database to store all potential associations and the related materials. In summary, Psmir serves as a valuable resource for dissecting the biological significance of small molecules’ effects on miRNA expression, which will facilitate the development of novel therapeutic targets or treatments for human cancers. Psmir is supported by all major browsers and is freely available at http://www.bio-bigdata.com/Psmir/. PMID:26759061
Psmir: a database of potential associations between small molecules and miRNAs.

PubMed

Meng, Fanlin; Wang, Jing; Dai, Enyu; Yang, Feng; Chen, Xiaowen; Wang, Shuyuan; Yu, Xuexin; Liu, Dianming; Jiang, Wei

2016-01-13

miRNAs are key post-transcriptional regulators of many essential biological processes, and their dysregulation has been validated in almost all human cancers. Restoring aberrantly expressed miRNAs might be a novel therapeutic strategy. Recently, many studies have demonstrated that small molecular compounds can affect miRNA expression. Thus, predicting associations between small molecules and miRNAs is important for the investigation of miRNA-targeted drugs. Here, we analyzed 39 miRNA-perturbed gene expression profiles and then calculated the similarity of transcriptional responses between miRNA perturbation and drug treatment to predict drug-miRNA associations. At a significance level of 0.05, we obtained 6501 candidate associations between 1295 small molecules and 25 miRNAs, including 624 FDA-approved drugs. Finally, we constructed the Psmir database to store all potential associations and the related materials. In summary, Psmir serves as a valuable resource for dissecting the biological significance of small molecules' effects on miRNA expression, which will facilitate the development of novel therapeutic targets or treatments for human cancers. Psmir is supported by all major browsers and is freely available at http://www.bio-bigdata.com/Psmir/.
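Psmir predicts associations by comparing transcriptional responses to miRNA perturbation and to drug treatment, keeping those significant at 0.05. A minimal sketch of such a test follows, under two assumptions. The similarity is a Pearson correlation between two signatures, and significance comes from an empirical permutation p-value. The abstract does not state Psmir's actual statistic, and the signatures below are made up.

```python
import random
import statistics

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_pvalue(mirna_sig, drug_sig, n_perm=1000, seed=0):
    """Empirical two-sided p-value for |r|, shuffling the drug signature's gene order."""
    rng = random.Random(seed)
    observed = abs(pearson(mirna_sig, drug_sig))
    shuffled = list(drug_sig)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(mirna_sig, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical log fold-changes for the same ten genes under each perturbation.
mirna = [2.1, -1.3, 0.4, 1.8, -0.9, 0.2, -1.7, 1.1, 0.0, -0.5]
drug = [1.9, -1.1, 0.6, 1.5, -1.0, 0.1, -1.4, 0.9, 0.2, -0.3]
p = permutation_pvalue(mirna, drug)
print(p, p < 0.05)
```

Screening thousands of drug-miRNA pairs this way would also call for a multiple-testing correction. This sketch does not attempt one.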
BioBenchmark Toyama 2012: an evaluation of the performance of triple stores on biological data

PubMed Central

2014-01-01

Background Biological databases vary enormously in size and data complexity, from small databases that contain a few million Resource Description Framework (RDF) triples to large databases that contain billions of triples. In this paper, we evaluate whether RDF native stores can be used to meet the needs of a biological database provider. Prior evaluations have used synthetic data with a limited database size. For example, the largest BSBM benchmark uses 1 billion synthetic e-commerce knowledge RDF triples on a single node. However, real-world biological data differ substantially from simple synthetic data, and it is difficult to determine whether synthetic e-commerce data are representative enough of biological databases. Therefore, for this evaluation, we used five real data sets from biological databases. Results We evaluated five triple stores, 4store, Bigdata, Mulgara, Virtuoso, and OWLIM-SE, with five biological data sets, Cell Cycle Ontology, Allie, PDBj, UniProt, and DDBJ, ranging in size from approximately 10 million to 8 billion triples. For each database, we loaded all the data into our single node and prepared the database for use in a classical data warehouse scenario. Then, we ran a series of SPARQL queries against each endpoint and recorded the execution time and the accuracy of the query response. Conclusions Our paper shows that, with appropriate configuration, Virtuoso and OWLIM-SE can satisfy the basic requirements to load and query biological data sets of up to roughly 8 billion triples on a single node, with 64 clients accessing them simultaneously. OWLIM-SE performs best for databases with approximately 11 million triples. For data sets that contain 94 million and 590 million triples, OWLIM-SE and Virtuoso perform best, and neither shows an overwhelming advantage over the other. For data sets of over 4 billion triples, Virtuoso works best. 
4store performs well on small data sets with limited features (fewer than 100 million triples), but our tests show that its scalability is poor. Bigdata demonstrates average performance and is a good open-source triple store for mid-sized data sets (around 500 million triples). Mulgara shows some fragility. PMID:25089180
BioBenchmark Toyama 2012: an evaluation of the performance of triple stores on biological data.

PubMed

Wu, Hongyan; Fujiwara, Toyofumi; Yamamoto, Yasunori; Bolleman, Jerven; Yamaguchi, Atsuko

2014-01-01

Biological databases vary enormously in size and data complexity, from small databases that contain a few million Resource Description Framework (RDF) triples to large databases that contain billions of triples. In this paper, we evaluate whether RDF native stores can be used to meet the needs of a biological database provider. Prior evaluations have used synthetic data with a limited database size. For example, the largest BSBM benchmark uses 1 billion synthetic e-commerce knowledge RDF triples on a single node. However, real-world biological data differ substantially from simple synthetic data, and it is difficult to determine whether synthetic e-commerce data are representative enough of biological databases. Therefore, for this evaluation, we used five real data sets from biological databases. We evaluated five triple stores, 4store, Bigdata, Mulgara, Virtuoso, and OWLIM-SE, with five biological data sets, Cell Cycle Ontology, Allie, PDBj, UniProt, and DDBJ, ranging in size from approximately 10 million to 8 billion triples. For each database, we loaded all the data into our single node and prepared the database for use in a classical data warehouse scenario. Then, we ran a series of SPARQL queries against each endpoint and recorded the execution time and the accuracy of the query response. Our paper shows that, with appropriate configuration, Virtuoso and OWLIM-SE can satisfy the basic requirements to load and query biological data sets of up to roughly 8 billion triples on a single node, with 64 clients accessing them simultaneously. OWLIM-SE performs best for databases with approximately 11 million triples. For data sets that contain 94 million and 590 million triples, OWLIM-SE and Virtuoso perform best, and neither shows an overwhelming advantage over the other. For data sets of over 4 billion triples, Virtuoso works best. 
4store performs well on small data sets with limited features (fewer than 100 million triples), but our tests show that its scalability is poor. Bigdata demonstrates average performance and is a good open-source triple store for mid-sized data sets (around 500 million triples). Mulgara shows some fragility.
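The benchmark ran SPARQL queries against each endpoint, recording execution times, including under 64 simultaneous clients. A minimal timing harness in that spirit is sketched below. `sparql_get` sends the query as the `query` parameter of an HTTP GET, as the SPARQL 1.1 Protocol allows. The endpoint URL in the usage comment is a placeholder, and this is not the authors' benchmark code.

```python
import time
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def sparql_get(endpoint, query, timeout=60):
    """Send a SPARQL query via HTTP GET and return the raw response body."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    req = urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()

def timed(run_query, query):
    """Run one query and return its wall-clock latency in seconds."""
    start = time.perf_counter()
    run_query(query)
    return time.perf_counter() - start

def benchmark(run_query, queries, clients=64):
    """Issue all queries from `clients` concurrent workers; return per-query latencies."""
    with ThreadPoolExecutor(max_workers=clients) as pool:
        return list(pool.map(lambda q: timed(run_query, q), queries))

# Usage against a hypothetical local endpoint (not executed here):
# run = lambda q: sparql_get("http://localhost:8890/sparql", q)
# latencies = benchmark(run, ["SELECT * WHERE { ?s ?p ?o } LIMIT 10"] * 64)
```

Passing the query runner as a function keeps the timing logic independent of the triple store. The same harness can also be exercised offline with a stub.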
Biological network extraction from scientific literature: state of the art and challenges.

PubMed

Li, Chen; Liakata, Maria; Rebholz-Schuhmann, Dietrich

2014-09-01

Networks of molecular interactions explain complex biological processes, and all known information on molecular events is contained in a number of public repositories including the scientific literature. Metabolic and signalling pathways are often viewed separately, even though both types are composed of interactions involving proteins and other chemical entities. It is necessary to be able to combine data from all available resources to judge the functionality, complexity and completeness of any given network overall, but especially the full integration of relevant information from the scientific literature is still an ongoing and complex task. Currently, the text-mining research community is steadily moving towards processing the full body of the scientific literature by making use of rich linguistic features such as full text parsing, to extract biological interactions. The next step will be to combine these with information from scientific databases to support hypothesis generation for the discovery of new knowledge and the extension of biological networks. The generation of comprehensive networks requires technologies such as entity grounding, coordination resolution and co-reference resolution, which are not fully solved and are required to further improve the quality of results. Here, we analyse the state of the art for the extraction of network information from the scientific literature and the evaluation of extraction methods against reference corpora, discuss challenges involved and identify directions for future research. © The Author 2013. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
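As a point of reference for the extraction methods reviewed here, the sketch below builds the simplest possible baseline. It creates a weighted network from sentence-level co-occurrence of entity names found by exact dictionary lookup. This is a deliberately crude stand-in. It performs none of the entity grounding, coordination resolution or co-reference resolution that the review identifies as necessary, and the dictionary and sentences are hypothetical.

```python
import itertools
import re
from collections import Counter

def extract_edges(sentences, dictionary):
    """Count co-occurrences of dictionary entities within each sentence.

    `dictionary` maps lower-case surface strings to canonical identifiers,
    a crude form of grounding by exact match.
    """
    edges = Counter()
    for sentence in sentences:
        tokens = re.findall(r"[A-Za-z0-9-]+", sentence.lower())
        found = {dictionary[t] for t in tokens if t in dictionary}
        for a, b in itertools.combinations(sorted(found), 2):
            edges[(a, b)] += 1
    return edges

dictionary = {"tp53": "TP53", "p53": "TP53", "mdm2": "MDM2", "cdkn1a": "CDKN1A", "p21": "CDKN1A"}
sentences = [
    "MDM2 binds p53 and promotes its degradation.",
    "p53 induces p21 expression.",
    "TP53 activation increases MDM2 transcription.",
]
network = extract_edges(sentences, dictionary)
print(network)
```

Co-occurrence cannot tell "A activates B" from "A does not bind B". This limitation is why the review points towards full-text parsing and richer linguistic features.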
A reductionist approach to extract robust molecular markers from microarray data series - Isolating markers to track osseointegration.

PubMed

Barik, Anwesha; Banerjee, Satarupa; Dhara, Santanu; Chakravorty, Nishant

2017-04-01

The complexity of whole-genome expression studies hinders the extraction of tracker genes for analyzing the course of biological events. In this study, we demonstrate the application of supervised machine learning methods to reduce irrelevant information in microarray data series and thereby extract robust molecular markers to track biological processes. The methodology is illustrated by analyzing whole-genome expression studies on bone-implant integration (osseointegration). As a biological process, osseointegration is known to leave a genetic footprint over its course. Despite the enormous amount of raw data in public repositories, researchers still do not have access to a panel of genes that can definitively track osseointegration. Our results revealed that panels comprising matrix metalloproteinase and collagen genes were able to track osseointegration on implant surfaces (MMP9 and COL1A2 on micro-textured surfaces; MMP12 and COL6A3 on superimposed nano-textured surfaces) with 100% classification accuracy, specificity and sensitivity. Further, our analysis showed the importance of the time course in establishing the mechanical connection at the bone-implant surface. The findings from this study are expected to be useful to researchers investigating osseointegration of novel implant materials, especially at the early stage. The methodology demonstrated can be easily adapted by scientists in different fields to analyze large databases for other biological processes. Copyright © 2017 Elsevier Inc. All rights reserved.
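The reported classification accuracy, sensitivity and specificity are all derived from a binary confusion matrix. For reference, here is a minimal sketch of those definitions. The labels are made up for illustration and are not data from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (true-positive rate) and specificity (true-negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return accuracy, sensitivity, specificity

# Hypothetical labels: 1 = later integration stage, 0 = earlier stage.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(binary_metrics(y_true, y_pred))  # (0.75, 0.75, 0.75)
```

All three metrics reach 100% only when there are no false positives and no false negatives. With small sample sizes, that outcome is worth confirming by cross-validation.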
Relational Databases: A Transparent Framework for Encouraging Biology Students to Think Informatically

ERIC Educational Resources Information Center

Rice, Michael; Gladstone, William; Weir, Michael

2004-01-01

We discuss how relational databases constitute an ideal framework for representing and analyzing large-scale genomic data sets in biology. As a case study, we describe a Drosophila splice-site database that we recently developed at Wesleyan University for use in research and teaching. The database stores data about splice sites computed by a…
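To make the relational framing concrete, here is a minimal sketch of how splice-site data might be held in two related tables and queried with a join, using Python's built-in `sqlite3` module. The schema, column names and rows are hypothetical. The abstract does not describe the Wesleyan database's actual design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript("""
CREATE TABLE gene (
    gene_id TEXT PRIMARY KEY,
    symbol  TEXT NOT NULL
);
CREATE TABLE splice_site (
    site_id  INTEGER PRIMARY KEY,
    gene_id  TEXT NOT NULL REFERENCES gene(gene_id),
    kind     TEXT CHECK (kind IN ('donor', 'acceptor')),
    position INTEGER NOT NULL,
    score    REAL
);
""")
conn.executemany("INSERT INTO gene VALUES (?, ?)", [("G1", "geneA"), ("G2", "geneB")])
conn.executemany(
    "INSERT INTO splice_site (gene_id, kind, position, score) VALUES (?, ?, ?, ?)",
    [("G1", "donor", 1200, 8.1), ("G1", "acceptor", 1450, 6.3), ("G2", "donor", 300, 4.2)],
)
# Count donor sites per gene whose (hypothetical) score exceeds a threshold.
rows = conn.execute("""
    SELECT g.symbol, COUNT(*)
    FROM splice_site AS s JOIN gene AS g ON g.gene_id = s.gene_id
    WHERE s.kind = 'donor' AND s.score > 5.0
    GROUP BY g.symbol
""").fetchall()
print(rows)
```

Keeping genes and sites in separate tables lets students see normalization and joins directly in SQL. This transparency is part of the article's argument for relational databases in teaching.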
Generating Gene Ontology-Disease Inferences to Explore Mechanisms of Human Disease at the Comparative Toxicogenomics Database

PubMed Central

Davis, Allan Peter; Wiegers, Thomas C.; King, Benjamin L.; Wiegers, Jolene; Grondin, Cynthia J.; Sciaky, Daniela; Johnson, Robin J.; Mattingly, Carolyn J.

2016-01-01

Strategies for discovering common molecular events among disparate diseases hold promise for improving understanding of disease etiology and expanding treatment options. One technique is to leverage curated datasets found in the public domain. The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) manually curates chemical-gene, chemical-disease, and gene-disease interactions from the scientific literature. The use of official gene symbols in CTD interactions enables this information to be combined with the Gene Ontology (GO) file from NCBI Gene. By integrating these GO-gene annotations with CTD’s gene-disease dataset, we produce 753,000 inferences between 15,700 GO terms and 4,200 diseases, providing opportunities to explore presumptive molecular underpinnings of diseases and identify biological similarities. Through a variety of applications, we demonstrate the utility of this novel resource. As a proof-of-concept, we first analyze known repositioned drugs (e.g., raloxifene and sildenafil) and see that their target diseases have a greater degree of similarity when comparing GO terms vs. genes. Next, a computational analysis predicts seemingly non-intuitive diseases (e.g., stomach ulcers and atherosclerosis) as being similar to bipolar disorder, and these are validated in the literature as reported co-diseases. Additionally, we leverage other CTD content to develop testable hypotheses about thalidomide-gene networks to treat seemingly disparate diseases. Finally, we illustrate how CTD tools can rank a series of drugs as potential candidates for repositioning against B-cell chronic lymphocytic leukemia and predict cisplatin and the small molecule inhibitor JQ1 as lead compounds. The CTD dataset is freely available for users to navigate pathologies within the context of extensive biological processes, molecular functions, and cellular components conferred by GO. 
This inference set should aid researchers, bioinformaticists, and pharmaceutical drug makers in finding commonalities in disease mechanisms, which in turn could help identify new therapeutics, new indications for existing pharmaceuticals, potential disease comorbidities, and alerts for side effects. PMID:27171405

Generating Gene Ontology-Disease Inferences to Explore Mechanisms of Human Disease at the Comparative Toxicogenomics Database.

PubMed

Davis, Allan Peter; Wiegers, Thomas C; King, Benjamin L; Wiegers, Jolene; Grondin, Cynthia J; Sciaky, Daniela; Johnson, Robin J; Mattingly, Carolyn J

2016-01-01

Strategies for discovering common molecular events among disparate diseases hold promise for improving understanding of disease etiology and expanding treatment options. One technique is to leverage curated datasets found in the public domain. The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) manually curates chemical-gene, chemical-disease, and gene-disease interactions from the scientific literature. The use of official gene symbols in CTD interactions enables this information to be combined with the Gene Ontology (GO) file from NCBI Gene. By integrating these GO-gene annotations with CTD's gene-disease dataset, we produce 753,000 inferences between 15,700 GO terms and 4,200 diseases, providing opportunities to explore presumptive molecular underpinnings of diseases and identify biological similarities. Through a variety of applications, we demonstrate the utility of this novel resource. As a proof-of-concept, we first analyze known repositioned drugs (e.g., raloxifene and sildenafil) and see that their target diseases have a greater degree of similarity when comparing GO terms vs. genes. Next, a computational analysis predicts seemingly non-intuitive diseases (e.g., stomach ulcers and atherosclerosis) as being similar to bipolar disorder, and these are validated in the literature as reported co-diseases. Additionally, we leverage other CTD content to develop testable hypotheses about thalidomide-gene networks to treat seemingly disparate diseases. Finally, we illustrate how CTD tools can rank a series of drugs as potential candidates for repositioning against B-cell chronic lymphocytic leukemia and predict cisplatin and the small molecule inhibitor JQ1 as lead compounds. The CTD dataset is freely available for users to navigate pathologies within the context of extensive biological processes, molecular functions, and cellular components conferred by GO. 
This inference set should aid researchers, bioinformaticists, and pharmaceutical drug makers in finding commonalities in disease mechanisms, which in turn could help identify new therapeutics, new indications for existing pharmaceuticals, potential disease comorbidities, and alerts for side effects.
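The inference step can be pictured as a join on shared gene symbols. A GO term and a disease are linked when at least one gene is both annotated to the term and associated with the disease, and the supporting genes are kept as evidence. The sketch below shows that join, plus a Jaccard index for comparing two diseases by their inferred GO terms versus by their genes. It is a simplification with made-up identifiers, not CTD's actual pipeline or similarity score.

```python
from collections import defaultdict

def infer_go_disease(go_gene, gene_disease):
    """Join (GO term, gene) and (gene, disease) pairs on the gene symbol.

    Returns {(go_term, disease): set_of_supporting_genes}.
    """
    genes_by_go = defaultdict(set)
    for go, gene in go_gene:
        genes_by_go[go].add(gene)
    diseases_by_gene = defaultdict(set)
    for gene, disease in gene_disease:
        diseases_by_gene[gene].add(disease)
    inferences = defaultdict(set)
    for go, genes in genes_by_go.items():
        for gene in genes:
            for disease in diseases_by_gene.get(gene, ()):
                inferences[(go, disease)].add(gene)
    return dict(inferences)

def jaccard(a, b):
    """Jaccard index of two sets (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def go_profile(inferences, disease):
    """Set of GO terms inferred for one disease."""
    return {go for go, d in inferences if d == disease}

def genes_of(gene_disease, disease):
    """Set of genes directly associated with one disease."""
    return {g for g, d in gene_disease if d == disease}

# Hypothetical annotations.
go_gene = [("GO:A", "G1"), ("GO:A", "G2"), ("GO:B", "G3")]
gene_disease = [("G1", "D1"), ("G2", "D2"), ("G3", "D2")]
inf = infer_go_disease(go_gene, gene_disease)
print(jaccard(go_profile(inf, "D1"), go_profile(inf, "D2")))
print(jaccard(genes_of(gene_disease, "D1"), genes_of(gene_disease, "D2")))
```

In this toy case, D1 and D2 share no genes but do share a GO term. This mirrors the abstract's observation that diseases can look more similar at the GO-term level than at the gene level.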
Gene-Disease Network Analysis Reveals Functional Modules in Mendelian, Complex and Environmental Diseases

PubMed Central

Bauer-Mehren, Anna; Bundschus, Markus; Rautschka, Michael; Mayer, Miguel A.; Sanz, Ferran; Furlong, Laura I.

2011-01-01

Background Scientists have been trying to understand the molecular mechanisms of diseases to design preventive and therapeutic strategies for a long time. For some diseases, it has become evident that it is not enough to obtain a catalogue of disease-related genes; it is also necessary to uncover how disruptions of molecular networks in the cell give rise to disease phenotypes. Moreover, with the unprecedented wealth of information available, even obtaining such a catalogue is extremely difficult. Principal Findings We developed a comprehensive gene-disease association database by integrating associations from several sources that cover different biomedical aspects of diseases. In particular, we focus on the current knowledge of human genetic diseases including mendelian, complex and environmental diseases. To assess the concept of modularity of human diseases, we performed a systematic study of the emergent properties of human gene-disease networks by means of network topology and functional annotation analysis. The results indicate a highly shared genetic origin of human diseases and show that for most diseases, including mendelian, complex and environmental diseases, functional modules exist. Moreover, a core set of biological pathways is found to be associated with most human diseases. We obtained similar results when studying clusters of diseases, suggesting that related diseases might arise due to dysfunction of common biological processes in the cell. Conclusions For the first time, we include mendelian, complex and environmental diseases in an integrated gene-disease association database and show that the concept of modularity applies to all of them. We furthermore provide a functional analysis of disease-related modules, providing important new biological insights that might not be discovered when considering each of the gene-disease association repositories independently. 
Hence, we present a suitable framework for the study of how genetic and environmental factors, such as drugs, contribute to diseases. Availability The gene-disease networks used in this study and part of the analysis are available at http://ibi.imim.es/DisGeNET/DisGeNETweb.html#Download. PMID:21695124
Gene-disease network analysis reveals functional modules in mendelian, complex and environmental diseases.

PubMed

Bauer-Mehren, Anna; Bundschus, Markus; Rautschka, Michael; Mayer, Miguel A; Sanz, Ferran; Furlong, Laura I

2011-01-01

Scientists have been trying to understand the molecular mechanisms of diseases to design preventive and therapeutic strategies for a long time. For some diseases, it has become evident that it is not enough to obtain a catalogue of disease-related genes; it is also necessary to uncover how disruptions of molecular networks in the cell give rise to disease phenotypes. Moreover, with the unprecedented wealth of information available, even obtaining such a catalogue is extremely difficult. We developed a comprehensive gene-disease association database by integrating associations from several sources that cover different biomedical aspects of diseases. In particular, we focus on the current knowledge of human genetic diseases including mendelian, complex and environmental diseases. To assess the concept of modularity of human diseases, we performed a systematic study of the emergent properties of human gene-disease networks by means of network topology and functional annotation analysis. The results indicate a highly shared genetic origin of human diseases and show that for most diseases, including mendelian, complex and environmental diseases, functional modules exist. Moreover, a core set of biological pathways is found to be associated with most human diseases. We obtained similar results when studying clusters of diseases, suggesting that related diseases might arise due to dysfunction of common biological processes in the cell. For the first time, we include mendelian, complex and environmental diseases in an integrated gene-disease association database and show that the concept of modularity applies to all of them. We furthermore provide a functional analysis of disease-related modules, providing important new biological insights that might not be discovered when considering each of the gene-disease association repositories independently. Hence, we present a suitable framework for the study of how genetic and environmental factors, such as drugs, contribute to diseases. 
The gene-disease networks used in this study and part of the analysis are available at http://ibi.imim.es/DisGeNET/DisGeNETweb.html#Download.
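A common first step in studying such networks, and a simple way to see shared genetic origin between diseases, is to project the bipartite gene-disease network onto diseases. In the projection, two diseases are linked with a weight equal to the number of genes they share. The sketch below does this with plain dictionaries. The associations are hypothetical, and the study's own topology and functional-annotation analyses go well beyond this step.

```python
import itertools
from collections import defaultdict

def project_diseases(associations):
    """Project (gene, disease) pairs onto a disease-disease graph weighted by shared genes."""
    genes_by_disease = defaultdict(set)
    for gene, disease in associations:
        genes_by_disease[disease].add(gene)
    edges = {}
    for d1, d2 in itertools.combinations(sorted(genes_by_disease), 2):
        shared = genes_by_disease[d1] & genes_by_disease[d2]
        if shared:
            edges[(d1, d2)] = len(shared)
    return edges

# Hypothetical gene-disease associations.
associations = [
    ("G1", "Disease X"), ("G2", "Disease X"),
    ("G2", "Disease Y"), ("G3", "Disease Y"),
    ("G4", "Disease Z"),
]
print(project_diseases(associations))
```

The pairwise loop is quadratic in the number of diseases. For very large association sets, indexing diseases by gene and counting pairs per gene scales better.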
EWET: Data collection and interface for the genetic analysis of Echinococcus multilocularis based on EmsB microsatellite.

PubMed

Knapp, Jenny; Damy, Sylvie; Brillaud, Jonathan; Tissot, Jean-Daniel; Navion, Jérémy; Mélior, Raphael; Afonso, Eve; Hormaz, Vanessa; Gottstein, Bruno; Umhang, Gérald; Casulli, Adriano; Dadeau, Frédéric; Millon, Laurence; Raoul, Francis

2017-01-01

The evolutionary and dispersal history of organisms can best be studied through biological markers in molecular epidemiological studies. The biological diversity of the cestode Echinococcus multilocularis has been investigated using different cladistic approaches. First, morphological aspects were explored in connection with its ecology. More recently, molecular aspects were investigated to better understand the nature of the variations observed among isolates. The study of the tandemly repeated multilocus microsatellite EmsB allowed us to resolve a high level of genetic diversity where other classic markers had failed. Since 2006, EmsB data have been collected on specimens from various endemic foci of the parasite in Europe (in historic and newly endemic areas), Asia (China, Japan and Kyrgyzstan), and North America (Canada and Alaska). Biological data and metadata on the isolates were also recorded (e.g. host, geographical location, EmsB analysis, citation in the literature). To make available the data set of 1,166 isolates from classic and aberrant domestic and wild animal hosts (larval lesions and adult worms) and of human origin, an open-access web interface, written in PHP and connected to a PostgreSQL database, was developed as part of the EmsB Website for the Echinococcus Typing (EWET) project. It allows researchers to access the data collection, perform genetic analyses online (e.g. defining the genetic distance between their own samples and the samples in the database), consult distribution maps of EmsB profiles, and record and share their new EmsB genotyping data. To standardize the EmsB analyses performed in different laboratories throughout the world, a calibrator was developed. The final aim of this project was to gather and arrange available data to better understand the dispersion and transmission patterns of the parasite among definitive and intermediate hosts, in order to organize control strategies on the ground.
EWET: Data collection and interface for the genetic analysis of Echinococcus multilocularis based on EmsB microsatellite

PubMed Central

Damy, Sylvie; Brillaud, Jonathan; Tissot, Jean-Daniel; Navion, Jérémy; Mélior, Raphael; Afonso, Eve; Hormaz, Vanessa; Gottstein, Bruno; Umhang, Gérald; Casulli, Adriano; Dadeau, Frédéric; Millon, Laurence; Raoul, Francis

2017-01-01

The evolutionary and dispersal history of organisms can best be studied through biological markers in molecular epidemiological studies. The biological diversity of the cestode Echinococcus multilocularis has been investigated using different cladistic approaches. First, morphological aspects were explored in connection with its ecology. More recently, molecular aspects were investigated to better understand the nature of the variations observed among isolates. The study of the tandemly repeated multilocus microsatellite EmsB allowed us to resolve a high level of genetic diversity where other classic markers had failed. Since 2006, EmsB data have been collected on specimens from various endemic foci of the parasite in Europe (in historic and newly endemic areas), Asia (China, Japan and Kyrgyzstan), and North America (Canada and Alaska). Biological data and metadata on the isolates were also recorded (e.g. host, geographical location, EmsB analysis, citation in the literature). To make available the data set of 1,166 isolates from classic and aberrant domestic and wild animal hosts (larval lesions and adult worms) and of human origin, an open-access web interface, written in PHP and connected to a PostgreSQL database, was developed as part of the EmsB Website for the Echinococcus Typing (EWET) project. It allows researchers to access the data collection, perform genetic analyses online (e.g. defining the genetic distance between their own samples and the samples in the database), consult distribution maps of EmsB profiles, and record and share their new EmsB genotyping data. To standardize the EmsB analyses performed in different laboratories throughout the world, a calibrator was developed. The final aim of this project was to gather and arrange available data to better understand the dispersion and transmission patterns of the parasite among definitive and intermediate hosts, in order to organize control strategies on the ground. PMID:28972978
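EWET lets users compute the genetic distance between their own samples and those in the database. Here is a minimal sketch of that kind of comparison, under stated assumptions. Each EmsB profile is taken to be a vector of peak heights over the same ordered allele positions, normalized to sum to 1 and compared by Euclidean distance. The isolate names and values are hypothetical, and EWET's actual implementation (PHP with PostgreSQL) is not described in the abstract.

```python
import math

def normalize(profile):
    """Scale peak heights so they sum to 1 (relative intensities)."""
    total = sum(profile)
    return [h / total for h in profile]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closest_isolates(query, database, k=3):
    """Return the k database isolates whose normalized profiles are nearest the query."""
    q = normalize(query)
    dists = [(name, euclidean(q, normalize(p))) for name, p in database.items()]
    return sorted(dists, key=lambda item: item[1])[:k]

# Hypothetical peak heights at five allele positions.
database = {
    "isolate_A": [120, 340, 560, 300, 80],
    "isolate_B": [10, 50, 400, 600, 200],
    "isolate_C": [130, 330, 540, 310, 90],
}
print(closest_isolates([125, 335, 550, 305, 85], database, k=2))
```

Normalizing first makes profiles from runs with different overall signal strength comparable. A shared calibrator, as the project describes, addresses the related problem of between-laboratory variation.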
Towards BioDBcore: a community-defined information specification for biological databases

PubMed Central

Gaudet, Pascale; Bairoch, Amos; Field, Dawn; Sansone, Susanna-Assunta; Taylor, Chris; Attwood, Teresa K.; Bateman, Alex; Blake, Judith A.; Bult, Carol J.; Cherry, J. Michael; Chisholm, Rex L.; Cochrane, Guy; Cook, Charles E.; Eppig, Janan T.; Galperin, Michael Y.; Gentleman, Robert; Goble, Carole A.; Gojobori, Takashi; Hancock, John M.; Howe, Douglas G.; Imanishi, Tadashi; Kelso, Janet; Landsman, David; Lewis, Suzanna E.; Mizrachi, Ilene Karsch; Orchard, Sandra; Ouellette, B. F. Francis; Ranganathan, Shoba; Richardson, Lorna; Rocca-Serra, Philippe; Schofield, Paul N.; Smedley, Damian; Southan, Christopher; Tan, Tin Wee; Tatusova, Tatiana; Whetzel, Patricia L.; White, Owen; Yamasaki, Chisato

2011-01-01

The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases. PMID:21097465
Towards BioDBcore: a community-defined information specification for biological databases

PubMed Central

Gaudet, Pascale; Bairoch, Amos; Field, Dawn; Sansone, Susanna-Assunta; Taylor, Chris; Attwood, Teresa K.; Bateman, Alex; Blake, Judith A.; Bult, Carol J.; Cherry, J. Michael; Chisholm, Rex L.; Cochrane, Guy; Cook, Charles E.; Eppig, Janan T.; Galperin, Michael Y.; Gentleman, Robert; Goble, Carole A.; Gojobori, Takashi; Hancock, John M.; Howe, Douglas G.; Imanishi, Tadashi; Kelso, Janet; Landsman, David; Lewis, Suzanna E.; Karsch Mizrachi, Ilene; Orchard, Sandra; Ouellette, B.F. Francis; Ranganathan, Shoba; Richardson, Lorna; Rocca-Serra, Philippe; Schofield, Paul N.; Smedley, Damian; Southan, Christopher; Tan, Tin W.; Tatusova, Tatiana; Whetzel, Patricia L.; White, Owen; Yamasaki, Chisato

2011-01-01

The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases. PMID:21205783
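A uniform description of core database attributes lends itself to a machine-readable record that can be checked for completeness and exchanged as JSON. The sketch below illustrates this with a Python dataclass. Its field names are a hypothetical, illustrative subset chosen for this example and may not match the attribute list actually defined by BioDBCore. The database described is fictional.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class DatabaseDescription:
    """Illustrative core-attribute record (field names are assumptions, not the BioDBCore list)."""
    name: str
    url: str
    contact: str
    scope: str
    conditions_of_use: str
    standards: list = field(default_factory=list)
    taxonomic_coverage: list = field(default_factory=list)

    def missing_fields(self):
        """Names of required text fields that are empty or whitespace."""
        required = ("name", "url", "contact", "scope", "conditions_of_use")
        return [f for f in required if not getattr(self, f).strip()]

record = DatabaseDescription(
    name="ExampleDB",
    url="https://example.org/exampledb",
    contact="curator@example.org",
    scope="Curated gene-disease associations",
    conditions_of_use="",
    standards=["GO"],
    taxonomic_coverage=["Homo sapiens"],
)
print(record.missing_fields())
print(json.dumps(asdict(record), indent=2))
```

A completeness check like `missing_fields` is one way a registry could encourage the consistency between resources that the proposal aims for.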
An Integrated Molecular Database on Indian Insects.

PubMed

Pratheepa, Maria; Venkatesan, Thiruvengadam; Gracy, Gandhi; Jalali, Sushil Kumar; Rangheswaran, Rajagopal; Antony, Jomin Cruz; Rai, Anil

2018-01-01

MOlecular Database on Indian Insects (MODII) is an online database linking several databases such as Insect Pest Info, Insect Barcode Information System (IBIn), Insect Whole Genome sequence, Other Genomic Resources of National Bureau of Agricultural Insect Resources (NBAIR), Whole Genome sequencing of Honey bee viruses, Insecticide resistance gene database and Genomic tools. This database was developed with a holistic approach to collecting phenomic and genomic information on agriculturally important insects. This insect resource database is available online for free at http://cib.res.in/.
Object-oriented parsing of biological databases with Python.

PubMed

Ramu, C; Gemünd, C; Gibson, T J

2000-07-01

While database activities in the biological area are increasing rapidly, rather little has been done to parse these databases in a simple, object-oriented way. We present here an elegant, simple yet powerful way of parsing biological flat-file databases. We have taken EMBL, SWISS-PROT and GENBANK as examples. EMBL and SWISS-PROT do not differ much in format structure, whereas GENBANK's format structure is very different from both. Extracting the desired fields in an entry (for example a sub-sequence with an associated feature) for later analysis is a constant need in the biological sequence-analysis community; this is illustrated with tools to make new splice-site databases. The interface to the parser is abstract in the sense that access to all the databases is independent of their different formats, since the parsing instructions are hidden.
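The sketch below illustrates the idea of hiding format-specific parsing behind a common entry interface. It parses EMBL/SWISS-PROT style text, where each line begins with a two-character line code, data start at column 6, and `//` ends an entry. It is not the authors' parser. A GENBANK reader would need its own function behind the same `FlatFileEntry` interface. The sample entry, accession and sequence are made up.

```python
from collections import defaultdict

class FlatFileEntry:
    """One database entry; fields are looked up by line code, hiding the source format."""

    def __init__(self, fields):
        self._fields = fields

    def get(self, code):
        """All values recorded under `code` (e.g. 'ID', 'AC', 'DE'), in file order."""
        return self._fields.get(code, [])

def parse_embl_like(lines):
    """Yield entries from EMBL/SWISS-PROT style text.

    Sequence lines (which start with blanks) are stored under the
    hypothetical key 'SQ_DATA', keeping only residue letters; this also
    drops the position numbers that EMBL appends to sequence lines.
    """
    fields = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("//"):
            yield FlatFileEntry(dict(fields))
            fields = defaultdict(list)
        elif line.startswith("  "):
            fields["SQ_DATA"].append("".join(ch for ch in line if ch.isalpha()))
        elif line:
            fields[line[:2]].append(line[5:].strip())

record = """ID   TEST_HUMAN              Reviewed;          10 AA.
AC   P00000;
DE   RecName: Full=Hypothetical test protein;
SQ   SEQUENCE   10 AA;
     MKTAYIAKQR
//
"""
entry = next(parse_embl_like(record.splitlines()))
print(entry.get("AC"), "".join(entry.get("SQ_DATA")))
```

Because callers only use `get`, code that extracts fields does not change when a different format-specific parser produces the entries. This is the kind of format independence the abstract describes.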
Evolving Strategies for the Incorporation of Bioinformatics within the Undergraduate Cell Biology Curriculum

ERIC Educational Resources Information Center

Honts, Jerry E.

2003-01-01

Recent advances in genomics and structural biology have resulted in an unprecedented increase in biological data available from Internet-accessible databases. In order to help students effectively use this vast repository of information, undergraduate biology students at Drake University were introduced to bioinformatics software and databases in…
Database-independent Protein Sequencing (DiPS) Enables Full-length de Novo Protein and Antibody Sequence Determination.

PubMed

Savidor, Alon; Barzilay, Rotem; Elinger, Dalia; Yarden, Yosef; Lindzen, Moshit; Gabashvili, Alexandra; Adiv Tal, Ophir; Levin, Yishai

2017-06-01

Traditional "bottom-up" proteomic approaches use proteolytic digestion, LC-MS/MS, and database searching to elucidate peptide identities and their parent proteins. Protein sequences absent from the database cannot be identified, and even if present in the database, complete sequence coverage is rarely achieved even for the most abundant proteins in the sample. Thus, sequencing of unknown proteins such as antibodies or constituents of metaproteomes remains a challenging problem. To date, there is no available method for full-length protein sequencing, independent of a reference database, in high throughput. Here, we present Database-independent Protein Sequencing, a method for unambiguous, rapid, database-independent, full-length protein sequencing. The method is a novel combination of non-enzymatic, semi-random cleavage of the protein, LC-MS/MS analysis, peptide de novo sequencing, extraction of peptide tags, and their assembly into a consensus sequence using an algorithm named "Peptide Tag Assembler." As proof-of-concept, the method was applied to samples of three known proteins representing three size classes and to a previously un-sequenced, clinically relevant monoclonal antibody. Excluding leucine/isoleucine and glutamic acid/deamidated glutamine ambiguities, end-to-end full-length de novo sequencing was achieved with 99-100% accuracy for all benchmarking proteins and the antibody light chain. Accuracy of the sequenced antibody heavy chain, including the entire variable region, was also 100%, but there was a 23-residue gap in the constant region sequence. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
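The final DiPS step assembles overlapping peptide tags into a consensus sequence using the authors' "Peptide Tag Assembler", which the abstract does not describe. The sketch below therefore substitutes a textbook greedy overlap merge to show the general idea of building one sequence from overlapping fragments. The tags are made up, and the sketch ignores the leucine/isoleucine ambiguity, sequencing errors and scoring that a real assembler must handle.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b` (at least min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def greedy_assemble(tags, min_overlap=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    # Drop duplicates and tags contained in another tag; they add no new sequence.
    frags = [t for t in sorted(set(tags)) if not any(t != o and t in o for o in tags)]
    while len(frags) > 1:
        best = (0, None, None)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # no overlaps left: fragments stay separate, leaving gaps between them
        merged = frags[i] + frags[j][k:]
        frags = [f for n, f in enumerate(frags) if n not in (i, j)] + [merged]
    return frags

tags = ["EVQLVESGG", "VESGGGLVQ", "GLVQPGGSL", "QPGGSLRLS"]
print(greedy_assemble(tags))
```

When no remaining fragments overlap, the function returns several contigs rather than forcing a join. Regions without tag coverage therefore show up as gaps between fragments instead of invented sequence.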
Atomic and Molecular Databases, VAMDC (Virtual Atomic and Molecular Data Centre)

NASA Astrophysics Data System (ADS)

Dubernet, Marie-Lise; Zwölf, Carlo Maria; Moreau, Nicolas; Awa Ba, Yaya; VAMDC Consortium

2015-08-01

The Virtual Atomic and Molecular Data Centre Consortium (VAMDC Consortium, http://www.vamdc.eu) is a consortium bound by a Memorandum of Understanding that aims to ensure the sustainability of the VAMDC e-infrastructure. The current VAMDC e-infrastructure interconnects about 30 atomic and molecular databases, and the number of connected databases increases every year. Some are well-known databases such as CDMS, JPL, HITRAN, and VALD, while others have been created since the start of VAMDC. About 90% of our databases are used for astrophysical applications. The data can be queried, retrieved, and visualized in a single format from a general portal (http://portal.vamdc.eu), and VAMDC is also developing standalone tools to retrieve and handle the data. VAMDC provides software and support for including databases within the VAMDC e-infrastructure. One current feature of VAMDC is its constrained data-description environment, which ensures higher-quality data distribution; a planned feature is linking VAMDC with data evaluation/validation groups. The talk will present the VAMDC Consortium and the VAMDC e-infrastructure, with its underlying technology, its services, its science use cases, and its extension to communities beyond academic research.
The Molecular Biology Capstone Assessment: A Concept Assessment for Upper-Division Molecular Biology Students

ERIC Educational Resources Information Center

Couch, Brian A.; Wood, William B.; Knight, Jennifer K.

2015-01-01

Measuring students' conceptual understandings has become increasingly important to biology faculty members involved in evaluating and improving departmental programs. We developed the Molecular Biology Capstone Assessment (MBCA) to gauge comprehension of fundamental concepts in molecular and cell biology and the ability to apply these concepts in…
KEGG Bioinformatics Resource for Plant Genomics and Metabolomics.

PubMed

Kanehisa, Minoru

2016-01-01

In the era of high-throughput biology it is necessary to develop not only elaborate computational methods but also well-curated databases that can be used as reference for data interpretation. KEGG ( http://www.kegg.jp/ ) is such a reference knowledge base with two specific aims. One is to compile knowledge on high-level functions of the cell and the organism in terms of the molecular interaction and reaction networks, which is implemented in KEGG pathway maps, BRITE functional hierarchies, and KEGG modules. The other is to expand knowledge on genes and proteins involved in the molecular networks from experimentally observed organisms to other organisms using the concept of orthologs, which is implemented in the KEGG Orthology (KO) system. Thus, KEGG is a generic resource applicable to all organisms and enables interpretation of high-level functions from genomic and molecular data. Here we first present a brief overview of the entire KEGG resource, and then give an introduction to how KEGG can be used in plant genomics and metabolomics research.
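The ortholog-based expansion described above can be illustrated with a toy mapping. Pathways are defined in terms of ortholog groups, so once an organism's genes are assigned to those groups, pathway membership carries over. This is a minimal sketch that assumes simple in-memory tables. The identifiers are hypothetical placeholders, not real KO or pathway entries, and the code is not KEGG's own software or API.

```python
from collections import defaultdict


def transfer_pathways(gene_to_ko, ko_to_pathways):
    """Return {pathway: sorted genes} for one organism.

    gene_to_ko: the organism's genes mapped to ortholog groups (KO-like IDs).
    ko_to_pathways: reference pathway maps, keyed by the ortholog groups
    that make them up. Genes whose group appears in no pathway are skipped.
    """
    pathway_genes = defaultdict(set)
    for gene, ko in gene_to_ko.items():
        for pathway in ko_to_pathways.get(ko, ()):
            pathway_genes[pathway].add(gene)
    return {pathway: sorted(genes) for pathway, genes in pathway_genes.items()}


# Toy, hypothetical identifiers (not real KEGG entries).
ko_to_pathways = {"K_A": {"map_1"}, "K_B": {"map_1", "map_2"}}
gene_to_ko = {"gene1": "K_A", "gene2": "K_B", "gene3": "K_C"}
print(transfer_pathways(gene_to_ko, ko_to_pathways))
```

In this toy run, `gene3` has an ortholog group that belongs to no reference pathway, so it receives no pathway annotation.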
ChemoPy: freely available python package for computational biology and chemoinformatics.

PubMed

Cao, Dong-Sheng; Xu, Qing-Song; Hu, Qian-Nan; Liang, Yi-Zeng

2013-04-15

Molecular representation for small molecules has been routinely used in QSAR/SAR, virtual screening, database search, ranking, drug ADME/T prediction and other drug discovery processes. To facilitate extensive studies of drug molecules, we developed a freely available, open-source Python package called chemoinformatics in Python (ChemoPy) for calculating the commonly used structural and physicochemical features. It computes 16 drug feature groups composed of 19 descriptors that include 1135 descriptor values. In addition, it provides seven types of molecular fingerprint systems for drug molecules, including topological fingerprints, electro-topological state (E-state) fingerprints, MACCS keys, FP4 keys, atom pairs fingerprints, topological torsion fingerprints and Morgan/circular fingerprints. By using the semi-empirical quantum chemistry program MOPAC, ChemoPy can also conveniently compute a large number of 3D molecular descriptors. The Python package ChemoPy is freely available via http://code.google.com/p/pychem/downloads/list, and it runs on Linux and MS-Windows. Supplementary data are available at Bioinformatics online.
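Binary fingerprints such as those listed above are commonly compared in database search with the Tanimoto coefficient. This is a minimal sketch in plain Python, with each fingerprint represented as the set of its "on" bit indices. It is not ChemoPy's API, and the function names are hypothetical.

```python
from __future__ import annotations


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| of two binary fingerprints.

    Each fingerprint is the set of indices of its "on" bits. Two all-zero
    fingerprints are given a similarity of 0.0 by convention in this sketch.
    """
    union = fp_a | fp_b
    if not union:
        return 0.0
    return len(fp_a & fp_b) / len(union)


def similarity_search(
    query: set[int], database: dict[str, set[int]], top_k: int = 3
) -> list[tuple[str, float]]:
    """Rank database entries by decreasing Tanimoto similarity to the query."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    # list.sort is stable (also with reverse=True), so ties keep insertion order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

For example, `similarity_search({1, 2, 3}, {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {7}}, top_k=2)` returns `[("a", 1.0), ("b", 0.5)]`. Sets keep the sketch short. Production code would more likely use fixed-length bit vectors.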
Synthetic biology for microbial heavy metal biosensors.

PubMed

Kim, Hyun Ju; Jeong, Haeyoung; Lee, Sang Jun

2018-02-01

Using recombinant DNA technology, various whole-cell biosensors have been developed for detection of environmental pollutants, including heavy metal ions. Whole-cell biosensors have several advantages: easy and inexpensive cultivation, multiple assays, and no need for special analytical techniques. In the era of synthetic biology, cutting-edge DNA sequencing and gene synthesis technologies have accelerated the development of cell-based biosensors. Here, we summarize current technological advances in whole-cell heavy metal biosensors, including the synthetic biological components (bioparts), sensing and reporter modules, genetic circuits, and chassis cells. We discuss several opportunities for improving synthetic cell-based biosensors. First, new functional modules must be discovered in genome databases, and this knowledge must be used to upgrade specific bioparts through molecular engineering. Second, modules must be assembled into functional biosystems in chassis cells. Third, heterogeneity of individual cells in the microbial population must be eliminated. Finally, future prospects for whole-cell biosensors are discussed in terms of cultivation methods and synthetic cells.
Financing a future for public biological data.

PubMed

Ellis, L B; Kalumbi, D

1999-09-01

The public web-based biological database infrastructure is a source of both wonder and worry. Users delight in the ever-increasing amounts of information available; database administrators and curators worry about long-term financial support. An earlier study of 153 biological databases (Ellis and Kalumbi, Nature Biotechnol., 16, 1323-1324, 1998) determined that near-future (1-5 year) funding for over two-thirds of them was uncertain. More detailed data are required to determine the magnitude of the problem and offer possible solutions. This study examines the finances and use statistics of a few of these organizations in more depth, and reviews several economic models that may help sustain them. Six organizations were studied. Their administrative overhead is fairly low; non-administrative personnel and computer-related costs account for 77% of expenses. One smaller, more specialized US database, in 1997, had 60% of total access from US domains; a majority (56%) of its US accesses came from commercial domains, although only 2% of the 153 databases originally studied received any industrial support. The most popular model used to gain industrial support is asymmetric pricing: preferentially charging the commercial users of a database. At least five biological databases have recently begun using this model. Advertising is another model that may be useful for the more general, more heavily used sites. Microcommerce has promise, especially for databases that do not attract advertisers, but needs further testing. The least income reported for any of the databases studied was $50,000/year; applying this rate to 400 biological databases (a lower limit of the number of such databases, many of which require far larger resources) would imply an annual support need of at least $20 million. Obtaining this level of support is challenging, yet failing to accept the challenge could be catastrophic. lynda@tc.umn.edu
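The $20 million lower bound follows directly from the two figures given in the abstract: the smallest reported income per database multiplied by the lower-limit count of databases.

```latex
400\ \text{databases} \times \$50{,}000\ \text{per database per year} = \$20{,}000{,}000\ \text{per year}
```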
Introducing meta-services for biomedical information extraction

PubMed Central

Leitner, Florian; Krallinger, Martin; Rodriguez-Penagos, Carlos; Hakenberg, Jörg; Plake, Conrad; Kuo, Cheng-Ju; Hsu, Chun-Nan; Tsai, Richard Tzong-Han; Hung, Hsi-Chuan; Lau, William W; Johnson, Calvin A; Sætre, Rune; Yoshida, Kazuhiro; Chen, Yan Hua; Kim, Sun; Shin, Soo-Yong; Zhang, Byoung-Tak; Baumgartner, William A; Hunter, Lawrence; Haddow, Barry; Matthews, Michael; Wang, Xinglong; Ruch, Patrick; Ehrler, Frédéric; Özgür, Arzucan; Erkan, Güneş; Radev, Dragomir R; Krauthammer, Michael; Luong, ThaiBinh; Hoffmann, Robert; Sander, Chris; Valencia, Alfonso

2008-01-01

We introduce the first meta-service for information extraction in molecular biology, the BioCreative MetaServer (BCMS; ). This prototype platform is a joint effort of 13 research groups and provides automatically generated annotations for PubMed/Medline abstracts. Annotation types cover gene names, gene IDs, species, and protein-protein interactions. The annotations are distributed by the meta-server in both human and machine readable formats (HTML/XML). This service is intended to be used by biomedical researchers and database annotators, and in biomedical language processing. The platform allows direct comparison, unified access, and result aggregation of the annotations. PMID:18834497
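The abstract mentions result aggregation but does not say how BCMS combines the annotations from its participating groups. One simple scheme, assumed here purely for illustration, is voting: an annotation is kept when enough independent servers report it. The server names and gene IDs below are hypothetical.

```python
from collections import Counter


def aggregate_annotations(per_server, min_votes=2):
    """Combine annotations from several annotation servers by voting.

    per_server: mapping of server name -> iterable of annotations for one
    abstract (for example, gene IDs). An annotation is kept when at least
    `min_votes` distinct servers report it. Returns (annotation, votes)
    pairs sorted by decreasing votes, then alphabetically.
    """
    votes = Counter()
    for annotations in per_server.values():
        votes.update(set(annotations))  # one vote per server per annotation
    kept = [(annotation, n) for annotation, n in votes.items() if n >= min_votes]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Hypothetical output from three servers for a single abstract.
print(aggregate_annotations({
    "server1": ["G1", "G2"],
    "server2": ["G1"],
    "server3": ["G1", "G2", "G2", "G3"],
}))
```

Deduplicating with `set()` before counting stops one server from outvoting the others by reporting the same annotation repeatedly.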
Looking for Cancer Clues in Publicly Accessible Databases

PubMed Central

Lemkin, Peter F.; Smythers, Gary W.; Munroe, David J.

2004-01-01

What started out as a mere attempt to tentatively identify proteins in experimental cancer-related 2D-PAGE maps developed into VIRTUAL2D, a web-accessible repository for theoretical pI/MW charts for 92 organisms. Using publicly available expression data, we developed a collection of tissue-specific plots based on differential gene expression between normal and diseased states. We use this comparative cancer proteomics knowledge base, known as the tissue molecular anatomy project (TMAP), to uncover threads of cancer markers common to several types of cancer and to relate this information to established biological pathways. PMID:18629065
Looking for cancer clues in publicly accessible databases.

PubMed

Medjahed, Djamel; Lemkin, Peter F; Smythers, Gary W; Munroe, David J

2004-01-01

What started out as a mere attempt to tentatively identify proteins in experimental cancer-related 2D-PAGE maps developed into VIRTUAL2D, a web-accessible repository for theoretical pI/MW charts for 92 organisms. Using publicly available expression data, we developed a collection of tissue-specific plots based on differential gene expression between normal and diseased states. We use this comparative cancer proteomics knowledge base, known as the tissue molecular anatomy project (TMAP), to uncover threads of cancer markers common to several types of cancer and to relate this information to established biological pathways.
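The MW half of a theoretical pI/MW chart can be computed directly from a protein sequence: sum the average residue masses and add one water molecule for the free termini. This is a minimal sketch using commonly tabulated average residue masses, which should be checked against a reference table before use. It does not compute pI and is not VIRTUAL2D's code.

```python
# Commonly tabulated average residue masses (Da) for the 20 standard amino
# acids; check against a reference table before relying on them.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the free N- and C-termini


def average_mw(sequence: str) -> float:
    """Theoretical average molecular weight of an unmodified protein chain."""
    if not sequence:
        raise ValueError("empty sequence")
    try:
        residues = sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper())
    except KeyError as err:
        raise ValueError(f"non-standard residue: {err.args[0]}") from None
    return residues + WATER


print(round(average_mw("GG"), 2))  # glycylglycine, about 132.12 Da
```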

ANN expert system screening for illicit amphetamines using molecular descriptors

NASA Astrophysics Data System (ADS)

Gosav, S.; Praisler, M.; Dorohoi, D. O.

2007-05-01

The goal of this study was to develop an artificial neural network (ANN) based on computed descriptors that would be able to classify the molecular structures of potential illicit amphetamines and to derive their biological activity from the similarity of their molecular structures to amphetamines of known toxicity. Such a system is needed for testing new molecular structures for epidemiological, clinical, and forensic purposes. It was built using a database of 146 compounds representing drugs of abuse (mainly central stimulants, hallucinogens, sympathomimetic amines, narcotics and other potent analgesics), precursors, or derivatized counterparts. Their molecular structures were characterized by computing three types of descriptors: 38 constitutional descriptors (CDs), 69 topological descriptors (TDs) and 160 3D-MoRSE descriptors (3DDs). An ANN system was built for each category of variables. All three networks (CD-NN, TD-NN and 3DD-NN) were trained to distinguish between stimulant amphetamines, hallucinogenic amphetamines, and nonamphetamines. Variable selection was performed when necessary. The efficiency with which each network identifies the class of an unknown sample was evaluated by calculating several figures of merit. The results of the comparative analysis are presented.
Quantum probability ranking principle for ligand-based virtual screening.

PubMed

Al-Dabbagh, Mohammed Mumtaz; Salim, Naomie; Himmat, Mubarak; Ahmed, Ali; Saeed, Faisal

2017-04-01

Chemical libraries contain thousands of compounds that need screening, which increases the need for computational methods that can rank or prioritize compounds. Virtual screening tools are widely used to improve the cost-effectiveness of lead discovery programs by ranking chemical compound databases in decreasing probability of biological activity, based on the probability ranking principle (PRP). In this paper, we developed a novel ranking approach for molecular compounds inspired by quantum mechanics, called the quantum probability ranking principle (QPRP). The QPRP ranking criteria attempt to draw an analogy between a physical experiment and the ranking of molecular structures by 2D fingerprints in ligand-based virtual screening (LBVS). The QPRP criteria apply quantum concepts at three levels. First, at the representation level, the model develops a new framework for molecular representation by connecting molecular compounds with a mathematical quantum space. Second, it estimates the similarity between chemical libraries and reference compounds using a quantum-based similarity searching method. Finally, it ranks the molecules using the QPRP approach. Simulated virtual screening experiments with MDL Drug Data Report (MDDR) data sets showed that QPRP outperformed the classical probability ranking principle (PRP) for molecular chemical compounds.
Quantum probability ranking principle for ligand-based virtual screening

NASA Astrophysics Data System (ADS)

Al-Dabbagh, Mohammed Mumtaz; Salim, Naomie; Himmat, Mubarak; Ahmed, Ali; Saeed, Faisal

2017-04-01

Chemical libraries contain thousands of compounds that need screening, which increases the need for computational methods that can rank or prioritize compounds. Virtual screening tools are widely used to improve the cost-effectiveness of lead discovery programs by ranking chemical compound databases in decreasing probability of biological activity, based on the probability ranking principle (PRP). In this paper, we developed a novel ranking approach for molecular compounds inspired by quantum mechanics, called the quantum probability ranking principle (QPRP). The QPRP ranking criteria attempt to draw an analogy between a physical experiment and the ranking of molecular structures by 2D fingerprints in ligand-based virtual screening (LBVS). The QPRP criteria apply quantum concepts at three levels. First, at the representation level, the model develops a new framework for molecular representation by connecting molecular compounds with a mathematical quantum space. Second, it estimates the similarity between chemical libraries and reference compounds using a quantum-based similarity searching method. Finally, it ranks the molecules using the QPRP approach. Simulated virtual screening experiments with MDL Drug Data Report (MDDR) data sets showed that QPRP outperformed the classical probability ranking principle (PRP) for molecular chemical compounds.
A Review of the Composition of the Essential Oils and Biological Activities of Angelica Species.

PubMed

Sowndhararajan, Kandasamy; Deepa, Ponnuvel; Kim, Minju; Park, Se Jin; Kim, Songmun

2017-09-20

A number of Angelica species have been used in traditional systems of medicine to treat many ailments. In particular, essential oils (EOs) from Angelica species have been used for the treatment of various health problems, including malaria, gynecological diseases, fever, anemia, and arthritis. EOs are complex mixtures of low-molecular-weight compounds, especially terpenoids and their oxygenated derivatives. These components give essential oils their specific fragrance and biological properties. In this review, we summarize the chemical composition and biological activities of EOs from different species of Angelica. For this purpose, a literature search was carried out to obtain information about the EOs of Angelica species and their bioactivities from electronic databases such as PubMed, Science Direct, Wiley, Springer, ACS, Google, and other journal publications. EO composition varies considerably among Angelica species. EOs from Angelica species have been reported to show various biological activities, such as antioxidant, anti-inflammatory, antimicrobial, immunotoxic, and insecticidal activities. The present review is an attempt to consolidate the available data for different Angelica species on the basis of the major constituents of their EOs and their biological activities.
pClone: Synthetic Biology Tool Makes Promoter Research Accessible to Beginning Biology Students

PubMed Central

Eckdahl, Todd; Cronk, Brian; Andresen, Corinne; Frederick, Paul; Huckuntod, Samantha; Shinneman, Claire; Wacker, Annie; Yuan, Jason

2014-01-01

The Vision and Change report recommended genuine research experiences for undergraduate biology students. Authentic research improves science education, increases the number of scientifically literate citizens, and encourages students to pursue research. Synthetic biology is well suited for undergraduate research and is a growing area of science. We developed a laboratory module called pClone that empowers students to use advances in molecular cloning methods to discover new promoters for use by synthetic biologists. Our educational goals are consistent with Vision and Change and emphasize core concepts and competencies. pClone is a family of three plasmids that students use to clone a new transcriptional promoter or mutate a canonical promoter and measure promoter activity in Escherichia coli. We also developed the Registry of Functional Promoters, an open-access database of student promoter research results. Using pre- and posttests, we measured significant learning gains among students using pClone in introductory biology and genetics classes. Student posttest scores were significantly better than scores of students who did not use pClone. pClone is an easy and affordable mechanism for large-enrollment labs to meet the high standards of Vision and Change. PMID:26086659
HGDB: A web retrieving cardiovascular-associated gene data.

PubMed

Noorabad-Ghahroodi, Faezeh; Abdi, Samaneh; Zand, Amir Hossein; Najafi, Mohammad

2017-04-01

The use of data obtained from high-throughput techniques in genetic studies is an essential subject in biology. Systems approaches such as networking and enrichment may improve data management. Here, we annotated the molecular features of cardiovascular-associated genes and present HGDB, a search-based database (www.hgdb.ir). The initial seed data were taken primarily from Gene Ontology and were automatically enriched with other molecular features. The data were managed in a popular open-source SQL database system. The search tabs on the HGDB homepage cover gene ID/name, chromosome, cell organelle, and all-gene options. Search results are presented as text-based gene descriptions and source links. HGDB is a user-friendly website presenting gene data in the cardiovascular field. Copyright © 2017 Elsevier B.V. All rights reserved.
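The search tabs described above map naturally onto single-column queries against an SQL table. This is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for HGDB's actual SQL system. The abstract does not describe HGDB's schema, so the table, its columns, and the gene rows are all hypothetical.

```python
import sqlite3

# Hypothetical schema: HGDB's real tables are not described in the abstract.
SCHEMA = """
CREATE TABLE gene (
    gene_id    TEXT PRIMARY KEY,
    symbol     TEXT NOT NULL,
    chromosome TEXT,
    organelle  TEXT
)
"""
SEARCHABLE = {"gene_id", "symbol", "chromosome", "organelle"}


def build_db(rows):
    """Create an in-memory database holding (gene_id, symbol, chromosome, organelle) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO gene VALUES (?, ?, ?, ?)", rows)
    return conn


def search(conn, field, value):
    """Return gene symbols whose `field` equals `value`, sorted by symbol."""
    if field not in SEARCHABLE:
        raise ValueError(f"unsupported search field: {field}")
    # Column names cannot be bound as parameters, so they are whitelisted
    # above; the search value itself is passed as a bound parameter.
    sql = f"SELECT symbol FROM gene WHERE {field} = ? ORDER BY symbol"
    return [row[0] for row in conn.execute(sql, (value,))]


conn = build_db([
    ("1", "GENE_A", "19", "nucleus"),
    ("2", "GENE_B", "19", "mitochondrion"),
    ("3", "GENE_C", "1", "nucleus"),
])
print(search(conn, "chromosome", "19"))
```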
Strategies for drug delivery to the central nervous system by systemic route.

PubMed

Kasinathan, Narayanan; Jagani, Hitesh V; Alex, Angel Treasa; Volety, Subrahmanyam M; Rao, J Venkata

2015-05-01

Delivering a drug into the central nervous system (CNS) is considered difficult. Most of the drugs discovered over the past decade are biologicals, which have high molecular weight and are polar in nature. Delivering such drugs across the blood-brain barrier is problematic. This review discusses some of the options available to reach the CNS by the systemic route. The focus is mainly on recent developments in systemic delivery of drugs to the CNS. Databases such as Scopus, Google Scholar, Science Direct, SciFinder, and online journals were consulted in preparing this article, which includes 89 references. There are at least nine strategies that could be adopted to achieve the required drug concentration in the CNS. Recent developments in drug delivery are very promising for delivering biologicals into the CNS.
Correcting ligands, metabolites, and pathways

PubMed Central

Ott, Martin A; Vriend, Gert

2006-01-01

Background A wide range of research areas in bioinformatics, molecular biology and medicinal chemistry require precise chemical structure information about molecules and reactions, e.g. drug design, ligand docking, metabolic network reconstruction, and systems biology. Most available databases, however, treat chemical structures more as illustrations than as a data field in its own right. Lack of chemical accuracy impedes progress in the areas mentioned above. We present a database of metabolites called BioMeta that augments the existing pathway databases by explicitly assessing the validity, correctness, and completeness of chemical structure and reaction information. Description The main bulk of the data in BioMeta were obtained from the KEGG Ligand database. We developed a tool for chemical structure validation which assesses the chemical validity and stereochemical completeness of a molecule description. The validation tool was used to examine the compounds in BioMeta, showing that a relatively small number of compounds had an incorrect constitution (connectivity only, not considering stereochemistry) and that a considerable number (about one third) had incomplete or even incorrect stereochemistry. We made a large effort to correct the errors and to complete the structural descriptions. A total of 1468 structures were corrected and/or completed. We also established the reaction balance of the reactions in BioMeta and corrected 55% of the unbalanced (stoichiometrically incorrect) reactions using an automatic procedure. The BioMeta database was implemented in PostgreSQL and provided with a web-based interface. Conclusion We demonstrate that the validation of metabolite structures and reactions is a feasible and worthwhile undertaking, and that the validation results can be used to trigger corrections and improvements to BioMeta, our metabolite database. BioMeta provides some tools for rational drug design, reaction searches, and visualization. 
It is freely available at provided that the copyright notice of all original data is cited. The database will be useful for querying and browsing biochemical pathways, and to obtain reference information for identifying compounds. However, these applications require that the underlying data be correct, and that is the focus of BioMeta. PMID:17132165
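Establishing reaction balance, as described above, amounts at minimum to checking that every element appears in the same total number on both sides of a reaction. This is a minimal sketch that works from molecular formulas with a deliberately simple grammar: element symbols with optional counts, and no parentheses, hydrates, or charges. It does not check charge balance, does not reproduce BioMeta's own procedure, and does not attempt the automatic correction step.

```python
import re
from collections import Counter

TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def atom_counts(formula: str) -> Counter:
    """Count atoms in a simple molecular formula such as 'C6H12O6'."""
    counts = Counter()
    pos = 0
    for match in TOKEN.finditer(formula):
        if match.start() != pos:  # unparsed characters before this token
            raise ValueError(f"cannot parse formula: {formula!r}")
        element, number = match.groups()
        counts[element] += int(number) if number else 1
        pos = match.end()
    if pos != len(formula):  # unparsed characters at the end
        raise ValueError(f"cannot parse formula: {formula!r}")
    return counts


def side_counts(side):
    """Total atoms on one side, given (stoichiometric coefficient, formula) pairs."""
    total = Counter()
    for coefficient, formula in side:
        for element, n in atom_counts(formula).items():
            total[element] += coefficient * n
    return total


def is_balanced(reactants, products):
    """True when every element has the same total count on both sides."""
    return side_counts(reactants) == side_counts(products)


# Glucose oxidation: C6H12O6 + 6 O2 -> 6 CO2 + 6 H2O
print(is_balanced([(1, "C6H12O6"), (6, "O2")], [(6, "CO2"), (6, "H2O")]))  # True
```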
Lynx web services for annotations and systems analysis of multi-gene disorders.

PubMed

Sulakhe, Dinanath; Taylor, Andrew; Balasubramanian, Sandhya; Feng, Bo; Xie, Bingqing; Börnigen, Daniela; Dave, Utpal J; Foster, Ian T; Gilliam, T Conrad; Maltsev, Natalia

2014-07-01

Lynx is a web-based integrated systems biology platform that supports annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Lynx has integrated multiple classes of biomedical data (genomic, proteomic, pathways, phenotypic, toxicogenomic, contextual and others) from various public databases as well as manually curated data from our group and collaborators (LynxKB). Lynx provides tools for gene list enrichment analysis using multiple functional annotations and network-based gene prioritization. Lynx provides access to the integrated database and the analytical tools via REST based Web Services (http://lynx.ci.uchicago.edu/webservices.html). This comprises data retrieval services for specific functional annotations, services to search across the complete LynxKB (powered by Lucene), and services to access the analytical tools built within the Lynx platform. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Towards the ophthalmology patentome: a comprehensive patent database of ocular drugs and biomarkers.

PubMed

Mucke, Hermann A M; Mucke, Eva; Mucke, Peter M

2013-01-01

We are currently building a database of all patent documents that contain substantial information related to pharmacology, drug delivery, tissue technology, and molecular diagnostics in ophthalmology. The goal is to establish a 'patentome', a body of cleaned and annotated data where all text-based, chemistry and pharmacology information can be accessed and mined in its context. We provide metrics on Patent Cooperation Treaty (PCT) documents, which demonstrate that ocular-related patenting has shown stronger growth than general PCT patenting during the past 25 years, and that, while the majority of applications of this type have always provided substantial biological data, both data support and objections by patent examiners have been increasing since 2006-2007. Separately, we present a case study of chemistry information extraction from patents published during the 1950s and 1970s, which reveals compounds with corneal anesthesia potential that were never published in the peer-reviewed literature.
Identification of a New Isoindole-2-yl Scaffold as a Qo and Qi Dual Inhibitor of Cytochrome bc1 Complex: Virtual Screening, Synthesis, and Biochemical Assay.

PubMed

Azizian, Homa; Bagherzadeh, Kowsar; Shahbazi, Sophia; Sharifi, Niusha; Amanlou, Massoud

2017-09-18

Respiratory chain ubiquinol-cytochrome (cyt) c oxidoreductase (cyt bc1 or complex III) has been shown to be a promising target for numerous antibiotic and fungicide applications. In this study, a virtual screening of the NCI diversity database was carried out to find novel Qo/Qi cyt bc1 complex inhibitors. Structure-based virtual screening and molecular docking were employed to further screen compounds for inhibitory activity against the cyt bc1 complex, after an extensive reliability validation protocol using cross-docking and identification of the best scoring functions. Subsequently, applying a rational filtering procedure to the target database led to the identification of a novel class of potent cyt bc1 complex inhibitors with binding energies and biological activities comparable to those of the standard inhibitor, antimycin.
The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata.

PubMed

Liolios, Konstantinos; Chen, I-Min A; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Philip; Markowitz, Victor M; Kyrpides, Nikos C

2010-01-01

The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification. GOLD is available at: http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece, at: http://gold.imbb.forth.gr/
The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata

PubMed Central

Liolios, Konstantinos; Chen, I-Min A.; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Philip; Markowitz, Victor M.; Kyrpides, Nikos C.

2010-01-01

The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification. GOLD is available at: http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece, at: http://gold.imbb.forth.gr/ PMID:19914934
Proteomic profile of dormant Trichophyton rubrum conidia

PubMed Central

Leng, Wenchuan; Liu, Tao; Li, Rui; Yang, Jian; Wei, Candong; Zhang, Wenliang; Jin, Qi

2008-01-01

Background Trichophyton rubrum is the most common dermatophyte causing fungal skin infections in humans. Asexual sporulation is an important means of propagation for T. rubrum, and conidia produced in this way are thought to be the primary cause of human infections. Despite their importance in pathogenesis, the conidia of T. rubrum remain understudied. We set out to investigate the proteome of dormant T. rubrum conidia in depth, to characterize its molecular and cellular features and to support the development of novel therapeutic strategies. Results The proteome of T. rubrum conidia was analyzed by combining shotgun proteomics with sample prefractionation and multiple enzyme digestion. In total, 1026 proteins were identified. All identified proteins were compared to those in the NCBI non-redundant protein database, the eukaryotic orthologous groups database, and the gene ontology database to obtain functional annotation information. Functional classification revealed that the identified proteins covered nearly all major biological processes. Some proteins were spore specific and related to the survival and dispersal of T. rubrum conidia, and many proteins were important to conidial germination and response to environmental conditions. Conclusion Our results suggest that the proteome of T. rubrum conidia is considerably complex, and that the maintenance of conidial dormancy is an intricate and elaborate process. This data set provides the first global framework for the dormant T. rubrum conidia proteome and is a stepping stone on the way to further study of the molecular mechanisms of T. rubrum conidial germination and the maintenance of conidial dormancy. PMID:18578874
Metaproteomics as a Complementary Approach to Gut Microbiota in Health and Disease

NASA Astrophysics Data System (ADS)

Petriz, Bernardo A.; Franco, Octávio L.

2017-01-01

Classic studies on phylotype profiling are limited to the identification of microbial constituents, and they lack information about the molecular interactions of these bacterial communities with the host genome and the possible outcomes for host biology. A range of OMICs approaches have provided great progress in linking the microbiota to health and disease. However, proteomic mass spectrometry-based tools for investigating this context are still being improved. Metaproteomics, or community proteogenomics, has therefore emerged as a complementary approach to metagenomic data: a field of proteomics that aims to perform large-scale characterization of proteins from environmental microbiota such as the human gut. Advances in molecular separation methods coupled with mass spectrometry (e.g. LC-MS/MS) and in proteome bioinformatics have been fundamental to these novel large-scale metaproteomic studies, which have been performed on a wide range of samples, including soil, plant and human environments. Metaproteomic studies will make major progress if a comprehensive database covering the genes and expressed proteins of all gut microbial species is developed. To this end, we present some of the main limitations of metaproteomic studies in complex microbiota environments such as the gut, and we also address up-to-date pipelines for sample preparation prior to fractionation/separation and mass spectrometry analysis. In addition, a novel approach to the limitations of metagenomic databases is discussed. Finally, prospects are addressed regarding the application of metaproteomic analysis using a unified host-microbiome gene database and other meta-OMICs platforms.
BioM2MetDisease: a manually curated database for associations between microRNAs, metabolites, small molecules and metabolic diseases.

PubMed

Xu, Yanjun; Yang, Haixiu; Wu, Tan; Dong, Qun; Sun, Zeguo; Shang, Desi; Li, Feng; Xu, Yingqi; Su, Fei; Liu, Siyao; Zhang, Yunpeng; Li, Xia

2017-01-01

BioM2MetDisease is a manually curated database that aims to provide a comprehensive and experimentally supported resource of associations between metabolic diseases and various biomolecules. Metabolic diseases such as diabetes have recently become one of the leading threats to human health. Metabolic diseases are associated with alterations in multiple types of biomolecules, such as miRNAs and metabolites. An integrated, high-quality data source that collects metabolic disease-associated biomolecules is essential for exploring the underlying molecular mechanisms and discovering novel therapeutics. Here, we developed the BioM2MetDisease database, which currently documents 2681 entries of relationships between 1147 biomolecules (miRNAs, metabolites and small molecules/drugs) and 78 metabolic diseases across 14 species. Each entry includes the biomolecule category, species, biomolecule name, disease name, dysregulation pattern, experimental technique, a brief description of the metabolic disease-biomolecule relationship, the reference, and additional annotation information. BioM2MetDisease provides a user-friendly interface to explore and retrieve all data conveniently. A submission page is also offered for researchers to submit new associations between biomolecules and metabolic diseases. BioM2MetDisease provides a comprehensive resource for studying how biomolecules act in metabolic diseases, and it is helpful for understanding the molecular mechanisms and developing novel therapeutics for metabolic diseases. http://www.bio-bigdata.com/BioM2MetDisease/. © The Author(s) 2017. Published by Oxford University Press.
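As an illustration of how an entry carrying the fields listed above might be modelled and queried, the following is a minimal Python sketch. The class name `AssociationEntry`, its attribute names, and the helper functions are hypothetical; they are not the schema or interface of BioM2MetDisease, and the example values are placeholders rather than database records.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AssociationEntry:
    """One biomolecule-disease association (hypothetical schema based on the fields listed in the abstract)."""
    biomolecule_category: str   # e.g. "miRNA", "metabolite", "small molecule/drug"
    species: str
    biomolecule_name: str
    disease_name: str
    dysregulation_pattern: str  # e.g. "up-regulated" or "down-regulated"
    experimental_technique: str
    description: str
    reference: str


def entries_for_disease(entries, disease):
    """Return all entries whose disease name matches `disease` (case-insensitive)."""
    target = disease.casefold()
    return [e for e in entries if e.disease_name.casefold() == target]


def count_by_category(entries):
    """Count entries per biomolecule category."""
    return Counter(e.biomolecule_category for e in entries)


# Placeholder records for illustration only (not real associations).
entries = [
    AssociationEntry("miRNA", "Homo sapiens", "miRNA-A", "Disease X",
                     "up-regulated", "qRT-PCR", "placeholder", "ref-1"),
    AssociationEntry("metabolite", "Homo sapiens", "metabolite-B", "Disease X",
                     "down-regulated", "LC-MS", "placeholder", "ref-2"),
    AssociationEntry("miRNA", "Mus musculus", "miRNA-C", "Disease Y",
                     "up-regulated", "microarray", "placeholder", "ref-3"),
]
print(count_by_category(entries_for_disease(entries, "disease x")))
```

A frozen dataclass keeps each curated record immutable once loaded, which suits a read-mostly reference resource; filtering and counting are then plain list and `Counter` operations.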
Practices and exploration on competition of molecular biological detection technology among students in food quality and safety major.

PubMed

Chang, Yaning; Peng, Yuke; Li, Pengfei; Zhuang, Yingping

2017-07-08

With the increasing importance of molecular biological detection technology in food safety, strengthening education in molecular biology experimental techniques is increasingly necessary for students majoring in food quality and safety. However, molecular biology experiments are not always part of the curricula for food quality and safety majors. This paper introduces a project, "competition of molecular biological detection technology for food safety among undergraduate sophomore students in food quality and safety major". Students participating in this project needed to learn fundamental molecular biology experimental techniques, such as the principles of molecular biology experiments, genome extraction, PCR, and agarose gel electrophoresis analysis, and then to design experiments in groups to identify the meat species in pork and beef products using molecular biological methods. The students were required to complete an experimental report after the basic experiments, and to write essays and give a presentation after completing the designed experiments. The project aims to provide another way for food quality and safety majors to improve their knowledge of molecular biology, especially experimental technology, to help them understand scientific research activities, and to give them a chance to learn how to write a professional thesis. In addition, in line with the principle of an open laboratory, the project is also open to students of other majors at East China University of Science and Technology, to help them understand the fields of molecular biology and food safety. © 2017 by The International Union of Biochemistry and Molecular Biology, 45(4):343-350, 2017.
The ANISEED database: digital representation, formalization, and elucidation of a chordate developmental program.

PubMed

Tassy, Olivier; Dauga, Delphine; Daian, Fabrice; Sobral, Daniel; Robin, François; Khoueiry, Pierre; Salgado, David; Fox, Vanessa; Caillol, Danièle; Schiappa, Renaud; Laporte, Baptiste; Rios, Anne; Luxardi, Guillaume; Kusakabe, Takehiro; Joly, Jean-Stéphane; Darras, Sébastien; Christiaen, Lionel; Contensin, Magali; Auger, Hélène; Lamy, Clément; Hudson, Clare; Rothbächer, Ute; Gilchrist, Michael J; Makabe, Kazuhiro W; Hotta, Kohji; Fujiwara, Shigeki; Satoh, Nori; Satou, Yutaka; Lemaire, Patrick

2010-10-01

Developmental biology aims to understand how the dynamics of embryonic shapes and organ functions are encoded in linear DNA molecules. Thanks to recent progress in genomics and imaging technologies, systemic approaches are now used in parallel with small-scale studies to establish links between genomic information and phenotypes, often described at the subcellular level. Current model organism databases, however, do not integrate heterogeneous data sets at different scales into a global view of the developmental program. Here, we present a novel, generic digital system, NISEED, and its implementation for ascidians, ANISEED. Ascidians are invertebrate chordates suitable for developmental systems biology approaches. ANISEED hosts an unprecedented combination of anatomical and molecular data on ascidian development. These data include the first detailed anatomical ontologies for these embryos and quantitative geometrical descriptions of developing cells obtained from reconstructed three-dimensional (3D) embryos up to the gastrula stages. Fully annotated gene model sets are linked to 30,000 high-resolution spatial gene expression patterns in wild-type and experimentally manipulated conditions and to 528 experimentally validated cis-regulatory regions imported from specialized databases or extracted from 160 literature articles. This highly structured data set can be explored via a Developmental Browser, a Genome Browser, and a 3D Virtual Embryo module. We show how integration of heterogeneous data in ANISEED can provide a system-level understanding of the developmental program through the automatic inference of gene regulatory interactions, the identification of inducing signals, and the discovery and explanation of novel asymmetric divisions.
Systematic Analysis of Arabidopsis Organelles and a Protein Localization Database for Facilitating Fluorescent Tagging of Full-Length Arabidopsis Proteins

PubMed Central

Li, Shijun; Ehrhardt, David W.; Rhee, Seung Y.

2006-01-01

Cells are organized into a complex network of subcellular compartments that are specialized for various biological functions. Subcellular location is an important attribute of protein function. To facilitate systematic elucidation of protein subcellular location, we analyzed experimentally verified protein localization data for 1,300 Arabidopsis (Arabidopsis thaliana) proteins. These 1,300 proteins are distributed among 40 different compartments, with most of them localized to four compartments: mitochondria (36%), nucleus (28%), plastid (17%), and cytosol (13.3%). About 19% of the proteins are found in multiple compartments, and a high proportion of these (36.4%) are localized to both cytosol and nucleus. Characterization of the overrepresented Gene Ontology molecular functions and biological processes suggests that the Golgi apparatus and peroxisome may perform more diverse functions but are involved in more specialized processes than other compartments. To support systematic empirical determination of protein subcellular localization using a technology called fluorescent tagging of full-length proteins, we developed a database and Web application that provides preselected green fluorescent protein insertion positions and primer sequences for all Arabidopsis proteins to study their subcellular localization, and that stores experimentally verified protein localization images and videos, with their annotations, generated using the fluorescent tagging of full-length proteins technology. The database can be searched, browsed, and downloaded using a Web browser at http://aztec.stanford.edu/gfp/. The software can also be downloaded from the same Web site for local installation. PMID:16617091
The ANISEED database: Digital representation, formalization, and elucidation of a chordate developmental program

PubMed Central

Tassy, Olivier; Dauga, Delphine; Daian, Fabrice; Sobral, Daniel; Robin, François; Khoueiry, Pierre; Salgado, David; Fox, Vanessa; Caillol, Danièle; Schiappa, Renaud; Laporte, Baptiste; Rios, Anne; Luxardi, Guillaume; Kusakabe, Takehiro; Joly, Jean-Stéphane; Darras, Sébastien; Christiaen, Lionel; Contensin, Magali; Auger, Hélène; Lamy, Clément; Hudson, Clare; Rothbächer, Ute; Gilchrist, Michael J.; Makabe, Kazuhiro W.; Hotta, Kohji; Fujiwara, Shigeki; Satoh, Nori; Satou, Yutaka; Lemaire, Patrick

2010-01-01

Developmental biology aims to understand how the dynamics of embryonic shapes and organ functions are encoded in linear DNA molecules. Thanks to recent progress in genomics and imaging technologies, systemic approaches are now used in parallel with small-scale studies to establish links between genomic information and phenotypes, often described at the subcellular level. Current model organism databases, however, do not integrate heterogeneous data sets at different scales into a global view of the developmental program. Here, we present a novel, generic digital system, NISEED, and its implementation for ascidians, ANISEED. Ascidians are invertebrate chordates suitable for developmental systems biology approaches. ANISEED hosts an unprecedented combination of anatomical and molecular data on ascidian development. These data include the first detailed anatomical ontologies for these embryos and quantitative geometrical descriptions of developing cells obtained from reconstructed three-dimensional (3D) embryos up to the gastrula stages. Fully annotated gene model sets are linked to 30,000 high-resolution spatial gene expression patterns in wild-type and experimentally manipulated conditions and to 528 experimentally validated cis-regulatory regions imported from specialized databases or extracted from 160 literature articles. This highly structured data set can be explored via a Developmental Browser, a Genome Browser, and a 3D Virtual Embryo module. We show how integration of heterogeneous data in ANISEED can provide a system-level understanding of the developmental program through the automatic inference of gene regulatory interactions, the identification of inducing signals, and the discovery and explanation of novel asymmetric divisions. PMID:20647237

Clinical value of miR-182-5p in lung squamous cell carcinoma: a study combining data from TCGA, GEO, and RT-qPCR validation.

PubMed

Luo, Jie; Shi, Ke; Yin, Shu-Ya; Tang, Rui-Xue; Chen, Wen-Jie; Huang, Lin-Zhen; Gan, Ting-Qing; Cai, Zheng-Wen; Chen, Gang

2018-04-10

MiR-182-5p, a member of the miRNA family, can be detected in lung cancer and plays an important role in it. This study aimed to explore the clinical value of miR-182-5p in lung squamous cell carcinoma (LUSC) and to unveil the molecular mechanism of LUSC. The clinical value of miR-182-5p in LUSC was investigated by collecting and analyzing data from The Cancer Genome Atlas (TCGA) database, the Gene Expression Omnibus (GEO) database, and real-time quantitative polymerase chain reaction (RT-qPCR). Twelve prediction platforms were used to predict the target genes of miR-182-5p. Protein-protein interaction (PPI) network, gene ontology (GO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were used to explore the molecular mechanism of LUSC. MiR-182-5p was significantly overexpressed in LUSC compared with non-cancerous tissues, as evidenced by various approaches, including the TCGA database, GEO microarrays, RT-qPCR, and a comprehensive meta-analysis of 501 LUSC cases and 148 non-cancerous cases. Furthermore, a total of 81 potential target genes were chosen from the union of predicted genes and the TCGA database. GO and KEGG analyses demonstrated that the target genes are involved in pathways related to biological processes. The PPI network revealed the relationships between these genes, with EPAS1, PRKCE, NR3C1, and RHOB located at its center. MiR-182-5p upregulation greatly contributes to LUSC, and miR-182-5p may serve as a biomarker in LUSC.
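The abstract places EPAS1, PRKCE, NR3C1, and RHOB at the center of the PPI network but does not say how centrality was measured. As a minimal sketch, assuming node degree (the number of distinct interaction partners) as the centrality measure, hub genes can be ranked from an edge list as below. The function name `rank_by_degree` and the toy genes `A` to `D` are hypothetical; in practice the edges would come from a PPI resource.

```python
from collections import defaultdict


def rank_by_degree(edges):
    """Rank nodes of an undirected interaction network by degree.

    `edges` is an iterable of (protein_a, protein_b) pairs; duplicate pairs and
    self-interactions are ignored. Returns (node, degree) tuples sorted by
    descending degree, with ties broken alphabetically.
    """
    partners = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # a self-interaction adds no new partner
        partners[a].add(b)
        partners[b].add(a)
    return sorted(((node, len(p)) for node, p in partners.items()),
                  key=lambda item: (-item[1], item[0]))


# Toy network with hypothetical genes A-D (not data from the study);
# ("B", "A") duplicates ("A", "B") and is counted once.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "A")]
print(rank_by_degree(toy_edges))  # [('A', 3), ('B', 2), ('C', 2), ('D', 1)]
```

Degree is only one choice; betweenness or closeness centrality could rank the same network differently, so the sketch should not be read as the authors' criterion.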
Integrated web visualizations for protein-protein interaction databases.

PubMed

Jeanquartier, Fleur; Jean-Quartier, Claire; Holzinger, Andreas

2015-06-16

Understanding living systems is crucial for curing diseases. To achieve this, we have to understand biological networks based on protein-protein interactions. Bioinformatics has produced a large number of databases and tools that support analysts in exploring protein-protein interactions on an integrated level for knowledge discovery. They provide predictions and correlations, indicate possibilities for future experimental research and fill the gaps to complete the picture of biochemical processes. There are numerous large databases of protein-protein interactions used to gain insights into answering some of the many questions of systems biology. Many computational resources integrate interaction data with additional information on molecular background. However, the vast number of diverse bioinformatics resources poses an obstacle to the goal of understanding. We present a survey of databases that enable the visual analysis of protein networks. We selected M=10 out of N=53 resources supporting visualization and tested them against the following criteria: interoperability, data integration, quantity of possible interactions, data visualization quality and data coverage. The study reveals differences in usability, visualization features and quality, as well as in the quantity of interactions. StringDB is the recommended first choice. CPDB presents a comprehensive dataset, and IntAct lets the user change the network layout. A comprehensive comparison table is available via the web; the supplementary table can be accessed at http://tinyurl.com/PPI-DB-Comparison-2015. Only some web resources featuring graph visualization can be successfully applied to interactive visual analysis of protein-protein interactions. The study results underline the need for further enhancements of visualization integration in biochemical analysis tools. The identified challenges are data comprehensiveness, confidence, interactive features and visualization maturity.
TDR Targets: a chemogenomics resource for neglected diseases.

PubMed

Magariños, María P; Carmona, Santiago J; Crowther, Gregory J; Ralph, Stuart A; Roos, David S; Shanmugam, Dhanasekaran; Van Voorhis, Wesley C; Agüero, Fernán

2012-01-01

The TDR Targets Database (http://tdrtargets.org) has been designed and developed as an online resource to facilitate the rapid identification and prioritization of molecular targets for drug development, focusing on pathogens responsible for neglected human diseases. The database integrates pathogen-specific genomic information with functional data (e.g. expression, phylogeny, essentiality) for genes collected from various sources, including literature curation. This information can be browsed and queried using an extensive web interface with functionalities for combining, saving, exporting and sharing the query results. Target genes can be ranked and prioritized using numerical weights assigned to the criteria used for querying. In this report we describe recent updates to the TDR Targets database, including the addition of new genomes (specifically helminths), and integration of chemical structure, property and bioactivity information for biological ligands, drugs and inhibitors and cheminformatic tools for querying and visualizing these chemical data. These changes greatly facilitate exploration of linkages (both known and predicted) between genes and small molecules, yielding insight into whether particular proteins may be druggable, effectively allowing the navigation of chemical space in a genomics context.
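The weighted prioritization described above can be illustrated with a simple weighted sum: each gene scores the sum of the weights of the query criteria it satisfies, and genes are sorted by score. This is a minimal sketch under that assumption, not the TDR Targets scoring code; the function name, criterion names, weights, and gene identifiers are all hypothetical.

```python
def rank_targets(gene_criteria, weights):
    """Rank genes by the summed weights of the criteria they satisfy.

    gene_criteria: dict mapping gene ID -> set of criterion names the gene meets.
    weights: dict mapping criterion name -> numerical weight
             (criteria without a weight contribute 0).
    Returns (gene, score) tuples, highest score first, ties broken by gene ID.
    """
    scores = {
        gene: sum(weights.get(c, 0.0) for c in criteria)
        for gene, criteria in gene_criteria.items()
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical criteria and user-chosen weights.
weights = {"essential": 2.0, "has_known_inhibitor": 1.5, "no_human_ortholog": 1.0}
genes = {
    "gene_1": {"essential", "has_known_inhibitor"},
    "gene_2": {"essential", "no_human_ortholog"},
    "gene_3": {"no_human_ortholog"},
}
print(rank_targets(genes, weights))
# [('gene_1', 3.5), ('gene_2', 3.0), ('gene_3', 1.0)]
```

Changing a single weight reorders the list without re-running the underlying queries, which is the practical appeal of weight-based prioritization.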
TDR Targets: a chemogenomics resource for neglected diseases

PubMed Central

Magariños, María P.; Carmona, Santiago J.; Crowther, Gregory J.; Ralph, Stuart A.; Roos, David S.; Shanmugam, Dhanasekaran; Van Voorhis, Wesley C.; Agüero, Fernán

2012-01-01

The TDR Targets Database (http://tdrtargets.org) has been designed and developed as an online resource to facilitate the rapid identification and prioritization of molecular targets for drug development, focusing on pathogens responsible for neglected human diseases. The database integrates pathogen-specific genomic information with functional data (e.g. expression, phylogeny, essentiality) for genes collected from various sources, including literature curation. This information can be browsed and queried using an extensive web interface with functionalities for combining, saving, exporting and sharing the query results. Target genes can be ranked and prioritized using numerical weights assigned to the criteria used for querying. In this report we describe recent updates to the TDR Targets database, including the addition of new genomes (specifically helminths), and integration of chemical structure, property and bioactivity information for biological ligands, drugs and inhibitors and cheminformatic tools for querying and visualizing these chemical data. These changes greatly facilitate exploration of linkages (both known and predicted) between genes and small molecules, yielding insight into whether particular proteins may be druggable, effectively allowing the navigation of chemical space in a genomics context. PMID:22116064
MetaboLights: towards a new COSMOS of metabolomics data management.

PubMed

Steinbeck, Christoph; Conesa, Pablo; Haug, Kenneth; Mahendraker, Tejasvi; Williams, Mark; Maguire, Eamonn; Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Salek, Reza M; Griffin, Julian L

2012-10-01

Exciting funding initiatives are emerging in Europe and the US for metabolomics data production, storage, dissemination and analysis. They build on a rich ecosystem of resources around the world, developed over the past ten years, including but not limited to resources such as MassBank in Japan and the Human Metabolome Database in Canada. Now, the European Bioinformatics Institute has launched MetaboLights, a database for metabolomics experiments and the associated metadata (http://www.ebi.ac.uk/metabolights). It is the first comprehensive, cross-species, cross-platform metabolomics database maintained by one of the major open access data providers in molecular biology. In October, the European COSMOS consortium will start its work on metabolomics data standardization, publication and dissemination workflows. The NIH in the US is establishing 6-8 metabolomics service cores as well as a national metabolomics repository. This communication reports on MetaboLights as a new resource for metabolomics research, summarises the related developments and outlines how they may consolidate knowledge management in this third large omics field alongside proteomics and genomics.
Characterization of pathogenic human MSH2 missense mutations using yeast as a model system: a laboratory course in molecular biology.

PubMed

Gammie, Alison E; Erdeniz, Naz

2004-01-01

This work describes the project for an advanced undergraduate laboratory course in cell and molecular biology. One objective of the course is to teach students a variety of cellular and molecular techniques while they conduct original research. A second objective is to provide instruction in science writing and data presentation by requiring comprehensive laboratory reports modeled on the primary literature. The project for the course focuses on a gene, MSH2, implicated in the most common form of inherited colorectal cancer. Msh2 is important for maintaining the fidelity of genetic material, functioning as an important component of the DNA mismatch repair machinery. The project has two parts. The first part is to create, in the cognate yeast MSH2 gene, mapped missense mutations listed in the human databases and to assay for defects in DNA mismatch repair. The second part is directed towards understanding in what way the variant proteins are defective for mismatch repair. Protein levels are analyzed to determine whether the missense alleles display decreased expression. Furthermore, the students establish whether the Msh2p variants are properly localized to the nucleus, using indirect immunofluorescence, and whether the altered proteins have lost their ability to interact with other subunits of the MMR complex, by creating recombinant DNA molecules and employing the yeast 2-hybrid assay.
Global open data management in metabolomics.

PubMed

Haug, Kenneth; Salek, Reza M; Steinbeck, Christoph

2017-02-01

Chemical Biology employs chemical synthesis, analytical chemistry and other tools to study biological systems. Recent advances in molecular biology, such as next-generation sequencing (NGS), have led to unprecedented insights into the evolution of organisms' biochemical repertoires. Because of the specific data-sharing culture in genomics, genomes from all kingdoms of life are becoming readily available for further analysis by other researchers. While the genome expresses the potential of an organism to adapt to external influences, the metabolome presents a molecular phenotype that allows us to assess, in a dynamic way, the external influences under which an organism exists and develops. Steady advancements in instrumentation towards high-throughput and high-resolution methods have led to a revival of analytical chemistry methods for the measurement and analysis of the metabolome of organisms. This steady growth of metabolomics as a field is leading to an accumulation of big data across laboratories worldwide similar to that observed in all the other omics areas. This calls for the development of methods and technologies for handling such large datasets, for efficiently distributing them and for enabling re-analysis. Here we describe the recently emerging ecosystem of global open-access databases and the data exchange efforts between them, as well as the foundations and obstacles that enable or prevent the sharing and reanalysis of these data. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
An overview of the challenges in designing, integrating, and delivering BARD: a public chemical biology resource and query portal across multiple organizations, locations, and disciplines

PubMed Central

de Souza, Andrea; Bittker, Joshua; Lahr, David; Brudz, Steve; Chatwin, Simon; Oprea, Tudor I.; Waller, Anna; Yang, Jeremy; Southall, Noel; Guha, Rajarshi; Schurer, Stephan; Vempati, Uma; Southern, Mark R.; Dawson, Eric S.; Clemons, Paul A.; Chung, Thomas D.Y.

2015-01-01

Recent industry-academic partnerships involve collaboration across disciplines, locations, and organizations using publicly funded “open-access” and proprietary commercial data sources. These require effective integration of chemical and biological information from diverse data sources, presenting key informatics, personnel, and organizational challenges. BARD (BioAssay Research Database) was conceived to address these challenges and to serve as a community-wide resource and intuitive web portal for public-sector chemical biology data. Its initial focus is to enable scientists to more effectively use the NIH Roadmap Molecular Libraries Program (MLP) data generated from 3-year pilot and 6-year production phases of the Molecular Libraries Probe Production Centers Network (MLPCN), currently in its final year. BARD evolves the current data standards through structured assay and result annotations that leverage the BioAssay Ontology (BAO) and other industry-standard ontologies, and a core hierarchy of assay definition terms and data standards defined specifically for small-molecule assay data. We have initially focused on migrating the highest-value MLP data into BARD and bringing it up to this new standard. We review the technical and organizational challenges overcome by the inter-disciplinary BARD team, veterans of public and private sector data-integration projects, collaborating to describe (functional specifications), design (technical specifications), and implement this next-generation software solution. PMID:24441647
CellBase, a comprehensive collection of RESTful web services for retrieving relevant biological information from heterogeneous sources.

PubMed

Bleda, Marta; Tarraga, Joaquin; de Maria, Alejandro; Salavert, Francisco; Garcia-Alonso, Luz; Celma, Matilde; Martin, Ainoha; Dopazo, Joaquin; Medina, Ignacio

2012-07-01

In recent years, advances in high-throughput technologies have produced unprecedented growth in the number and size of repositories and databases storing relevant biological data. Today there is more biological information than ever but, unfortunately, the current status of many of these repositories is far from optimal. Some of the most common problems are that the information is spread across many small databases, that standards frequently differ among repositories, and that some databases are no longer supported or contain overly specific and unconnected information. In addition, data size is increasingly becoming an obstacle when accessing or storing biological data. All these issues make it very difficult to extract and integrate information from different sources, to analyze experiments, or to access and query this information programmatically. CellBase addresses the growing need for integration by easing access to biological data. CellBase implements a set of RESTful web services that query a centralized database containing the most relevant biological data sources. The database is hosted on our servers and is regularly updated. CellBase documentation can be found at http://docs.bioinfo.cipf.es/projects/cellbase.
Market Pressure and Government Intervention in the Administration and Development of Molecular Databases.

ERIC Educational Resources Information Center

Sillince, J. A. A.; Sillince, M.

1993-01-01

Discusses molecular databases and the role that government and private companies play in their administration and development. Highlights include copyright and patent issues relating to public databases and the information contained in them; data quality; data structures and technological questions; the international organization of molecular…
[Molecular Biology on the Mechanisms of Autism Spectrum Disorder for Clinical Psychiatrists].

PubMed

Makinodan, Manabu

2015-01-01

Although some clinical psychiatrists may not be familiar with molecular biology, molecular biology has been uncovering the mechanisms of mental illnesses for decades. Among mental illnesses, even biological psychiatrists and neuroscientists have paid less attention to the biological treatment of autism spectrum disorder (ASD) than to that of Alzheimer's disease and schizophrenia, since ASD has been regarded as a developmental disorder that was seemingly untreatable. However, multifaceted methods of molecular biology have revealed mechanisms that could lead to medication for ASD. This article describes how molecular biology dissects the pathobiology of ASD, in order to introduce clinical psychiatrists to the possibilities of biological treatment.
A Systems Biology Approach to the Coordination of Defensive and Offensive Molecular Mechanisms in the Innate and Adaptive Host–Pathogen Interaction Networks

PubMed Central

Wu, Chia-Chou; Chen, Bor-Sen

2016-01-01

Infected zebrafish coordinates defensive and offensive molecular mechanisms in response to Candida albicans infections, and invasive C. albicans coordinates corresponding molecular mechanisms to interact with the host. However, knowledge of the ensuing infection-activated signaling networks in both host and pathogen and their interspecific crosstalk during the innate and adaptive phases of the infection processes remains incomplete. In the present study, dynamic network modeling, protein interaction databases, and dual transcriptome data from zebrafish and C. albicans during infection were used to infer infection-activated host–pathogen dynamic interaction networks. The consideration of host–pathogen dynamic interaction systems as innate and adaptive loops and subsequent comparisons of inferred innate and adaptive networks indicated previously unrecognized crosstalk between known pathways and suggested roles of immunological memory in the coordination of host defensive and offensive molecular mechanisms to achieve specific and powerful defense against pathogens. Moreover, pathogens enhance intraspecific crosstalk and abrogate host apoptosis to accommodate enhanced host defense mechanisms during the adaptive phase. Accordingly, links between physiological phenomena and changes in the coordination of defensive and offensive molecular mechanisms highlight the importance of host–pathogen molecular interaction networks, and consequent inferences of the host–pathogen relationship could be translated into biomedical applications. PMID:26881892
A review of methods used for studying the molecular epidemiology of Brachyspira hyodysenteriae.

PubMed

Zeeh, Friederike; Nathues, Heiko; Frey, Joachim; Muellner, Petra; Fellström, Claes

2017-08-01

Brachyspira (B.) spp. are intestinal spirochaetes isolated from pigs, other mammals, birds and humans. In pigs, seven Brachyspira spp. have been described, i.e. B. hyodysenteriae, B. pilosicoli, B. intermedia, B. murdochii, B. innocens, B. suanatina and B. hampsonii. Brachyspira hyodysenteriae is especially relevant in pigs as it causes swine dysentery and hence considerable economic losses to the pig industry. Furthermore, reduced susceptibility of B. hyodysenteriae to antimicrobials is of increasing concern. The epidemiology of B. hyodysenteriae infections is only partially understood, but different methods for detection, identification and typing have supported recent improvements in knowledge and understanding. In recent years, molecular methods have been increasingly used. Molecular epidemiology links molecular biology with epidemiology, offering unique opportunities to advance the study of diseases. This review is based on papers published in the field of epidemiology and molecular epidemiology of B. hyodysenteriae in pigs. Electronic databases were screened for potentially relevant papers using title and abstract and finally, Barcellos et al. papers were systematically selected and assessed. The review briefly summarises the current knowledge on B. hyodysenteriae epidemiology and elaborates on the molecular typing techniques available. Results of the studies are compared and gaps in the knowledge are addressed. Finally, potential areas for future research are proposed. Copyright © 2017 Elsevier B.V. All rights reserved.
A Systems Biology Approach to the Coordination of Defensive and Offensive Molecular Mechanisms in the Innate and Adaptive Host-Pathogen Interaction Networks.

PubMed

Wu, Chia-Chou; Chen, Bor-Sen

2016-01-01

Infected zebrafish coordinates defensive and offensive molecular mechanisms in response to Candida albicans infections, and invasive C. albicans coordinates corresponding molecular mechanisms to interact with the host. However, knowledge of the ensuing infection-activated signaling networks in both host and pathogen and their interspecific crosstalk during the innate and adaptive phases of the infection processes remains incomplete. In the present study, dynamic network modeling, protein interaction databases, and dual transcriptome data from zebrafish and C. albicans during infection were used to infer infection-activated host-pathogen dynamic interaction networks. The consideration of host-pathogen dynamic interaction systems as innate and adaptive loops and subsequent comparisons of inferred innate and adaptive networks indicated previously unrecognized crosstalk between known pathways and suggested roles of immunological memory in the coordination of host defensive and offensive molecular mechanisms to achieve specific and powerful defense against pathogens. Moreover, pathogens enhance intraspecific crosstalk and abrogate host apoptosis to accommodate enhanced host defense mechanisms during the adaptive phase. Accordingly, links between physiological phenomena and changes in the coordination of defensive and offensive molecular mechanisms highlight the importance of host-pathogen molecular interaction networks, and consequent inferences of the host-pathogen relationship could be translated into biomedical applications.
An improved Pearson's correlation proximity-based hierarchical clustering for mining biological association between genes.

PubMed

Booma, P M; Prabhakaran, S; Dhanalakshmi, R

2014-01-01

Microarray gene expression datasets have attracted great attention from molecular biologists, statisticians, and computer scientists. Data mining, which extracts hidden and useful information from datasets, fails to identify the most significant biological associations between genes. A heuristic search for standard biological processes measures only the gene expression level, threshold, and response time. Heuristic search identifies and mines the best biological solution, but the association process has not been efficiently addressed. To monitor higher rates of expression levels between genes, a hierarchical clustering model was proposed in which the biological association between genes is measured simultaneously using a proximity measure based on improved Pearson's correlation (PCPHC). Additionally, the Seed Augment algorithm adopts average linkage methods on rows and columns in order to expand a seed PCPHC model into a maximal global PCPHC (GL-PCPHC) model and to identify associations between the clusters. Moreover, GL-PCPHC applies a pattern-growing method to mine the PCPHC patterns. Compared with existing gene expression analysis methods, the PCPHC model achieves better performance. Experimental evaluations of the GL-PCPHC model were conducted with standard benchmark gene expression datasets extracted from the UCI repository and the GenBank database in terms of execution time, pattern size, significance level, biological association efficiency, and pattern quality.
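The abstract builds on two standard ingredients: a Pearson-correlation proximity measure and average linkage. The following Python sketch is a plain, textbook version of those two pieces. It assumes that genes are rows of equal-length expression vectors and that distance is taken as 1 - r. It is not the authors' improved PCPHC measure, their Seed Augment algorithm, or their clustering over both rows and columns, because the abstract does not describe these in enough detail to reproduce.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles.

    Assumes neither profile is constant (zero variance would divide by zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering of genes with distance 1 - r and average linkage.

    Returns clusters as sorted lists of row indices into `profiles`.
    """
    # Pairwise gene-gene distances, computed once and stored for i < j.
    dist = {
        (i, j): 1.0 - pearson(profiles[i], profiles[j])
        for i, j in combinations(range(len(profiles)), 2)
    }

    def d(i, j):
        return dist[(i, j)] if i < j else dist[(j, i)]

    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster gene pairs.
            pair_dists = [d(i, j) for i in clusters[a] for j in clusters[b]]
            score = sum(pair_dists) / len(pair_dists)
            if best is None or score < best[0]:
                best = (score, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)
```

On a toy matrix where rows 0 and 1 rise together and rows 2 and 3 fall together, the two correlated pairs end up in the same clusters. This naive loop recomputes linkages on every merge, so it is only suitable for a handful of genes, not a genome-scale microarray.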
An Improved Pearson's Correlation Proximity-Based Hierarchical Clustering for Mining Biological Association between Genes

PubMed Central

Booma, P. M.; Prabhakaran, S.; Dhanalakshmi, R.

2014-01-01

Microarray gene expression datasets have attracted great attention from molecular biologists, statisticians, and computer scientists. Data mining, which extracts hidden and useful information from datasets, fails to identify the most significant biological associations between genes. A heuristic search for standard biological processes measures only the gene expression level, threshold, and response time. Heuristic search identifies and mines the best biological solution, but the association process has not been efficiently addressed. To monitor higher rates of expression levels between genes, a hierarchical clustering model was proposed in which the biological association between genes is measured simultaneously using a proximity measure based on improved Pearson's correlation (PCPHC). Additionally, the Seed Augment algorithm adopts average linkage methods on rows and columns in order to expand a seed PCPHC model into a maximal global PCPHC (GL-PCPHC) model and to identify associations between the clusters. Moreover, GL-PCPHC applies a pattern-growing method to mine the PCPHC patterns. Compared with existing gene expression analysis methods, the PCPHC model achieves better performance. Experimental evaluations of the GL-PCPHC model were conducted with standard benchmark gene expression datasets extracted from the UCI repository and the GenBank database in terms of execution time, pattern size, significance level, biological association efficiency, and pattern quality. PMID:25136661
Environmental surveillance and monitoring. The next frontiers ...

EPA Pesticide Factsheets

High throughput toxicity testing (HTT) technologies, along with the world-wide web, are revolutionizing both the generation of and access to data regarding the bioactivities that chemicals can elicit when they interact with specific proteins, genes, or other targets in the body of an organism. However, to date, most of the focus has been on the application of such data to the assessment of individual chemicals. We suggest that environmental surveillance and monitoring represent the next frontiers for HTT. Resources already exist in curated databases of chemical-biological interactions, including highly standardized quantitative dose-response data generated from nascent HTT programs like ToxCast and Tox21, to link chemicals detected through environmental analytical chemistry to known biological activities. The adverse outcome pathway framework and its associated knowledgebase link molecular or pathway-level perturbations of biological systems, through a series of measurable biological changes, to the adverse outcomes traditionally considered in risk assessment and regulatory decision-making; their emergence provides a critical link between activity and hazard. Furthermore, environmental samples can be directly analyzed via HTT platforms to provide an unprecedented breadth of biological activity characterization that integrates the effects of all compounds present in a mixture, whether known or not. Novel application of these chemical-biological interaction data provide an oppor
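Quantitative dose-response data of the kind mentioned for ToxCast and Tox21 are usually summarized by fitted concentration-response curves. As an illustration only, here is a generic four-parameter Hill model and its inverse. The parameter values are hypothetical, and the sketch assumes an increasing curve (top > bottom). The abstract does not describe how the ToxCast or Tox21 pipelines fit or choose among curve models, so none of that is represented here.

```python
def hill_response(conc, bottom, top, ac50, n):
    """Four-parameter Hill model: response at concentration `conc`.

    bottom/top are the lower/upper asymptotes, ac50 is the concentration giving
    a half-maximal response, and n is the Hill slope (all hypothetical here).
    """
    if conc <= 0:
        return bottom
    return bottom + (top - bottom) / (1.0 + (ac50 / conc) ** n)


def conc_for_response(resp, bottom, top, ac50, n):
    """Invert the Hill model: concentration needed to reach `resp`.

    Only defined for bottom < resp < top (an increasing curve is assumed).
    """
    if not bottom < resp < top:
        raise ValueError("response must lie strictly between bottom and top")
    ratio = (top - bottom) / (resp - bottom) - 1.0
    return ac50 * ratio ** (-1.0 / n)


if __name__ == "__main__":
    # Hypothetical assay: 0-100 % activity, AC50 of 10 (concentration units), slope 1.
    for c in (0.1, 1, 10, 100):
        print(c, round(hill_response(c, 0, 100, 10, 1), 1))
```

At `conc == ac50`, the model returns exactly halfway between `bottom` and `top`. This midpoint is what makes the AC50 a convenient single-number potency summary for linking detected chemicals to known bioactivities.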
Global analysis of the rat and human platelet proteome – the molecular blueprint for illustrating multi-functional platelets and cross-species function evolution

PubMed Central

Yu, Yanbao; Leng, Taohua; Yun, Dong; Liu, Na; Yao, Jun; Dai, Ying; Yang, Pengyuan; Chen, Xian

2013-01-01

Emerging evidence indicates that blood platelets function in multiple biological processes, including immune response, bone metastasis and liver regeneration, in addition to their known roles in hemostasis and thrombosis. Global elucidation of the platelet proteome will provide the molecular basis of these platelet functions. Here, we set up a high-throughput platform for maximum exploration of the rat/human platelet proteome using integrated proteomics technologies, and then applied it to identify the largest number of proteins expressed in both rat and human platelets. After stringent statistical filtration, a total of 837 unique proteins matched with at least two unique peptides were precisely identified, making this the first comprehensive protein database so far for rat platelets. Meanwhile, quantitative analyses of thrombin-stimulated platelets offered great insights into the biological functions of platelet proteins and therefore confirmed our global profiling data. A comparative proteomic analysis between rat and human platelets was also conducted. It revealed not only significant similarity but also a cross-species evolutionary link: the orthologous proteins represent a ‘core proteome’, and the ‘evolutionary proteome’ is actually a relatively static proteome. PMID:20443191
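The identification criterion quoted in the abstract (proteins matched with at least two unique peptides) can be sketched as a simple counting step. Here, "unique" is assumed to mean a peptide sequence that maps to only one protein in the search results. The input format, a list of (protein, peptide) matches, and all identifiers are hypothetical. The stringent statistical filtration the authors applied beforehand is not modeled.

```python
from collections import defaultdict


def filter_proteins(matches, min_unique_peptides=2):
    """Keep proteins supported by at least `min_unique_peptides` unique peptides.

    matches: iterable of (protein_id, peptide_sequence) pairs, possibly with
    repeats (e.g. several spectra for the same peptide). A peptide counts as
    unique only if it maps to exactly one protein (an assumption).
    """
    protein_peptides = defaultdict(set)  # protein -> distinct peptide sequences
    peptide_proteins = defaultdict(set)  # peptide -> proteins it matched
    for protein, peptide in matches:
        protein_peptides[protein].add(peptide)
        peptide_proteins[peptide].add(protein)

    kept = []
    for protein, peptides in protein_peptides.items():
        unique = [p for p in peptides if len(peptide_proteins[p]) == 1]
        if len(unique) >= min_unique_peptides:
            kept.append(protein)
    return sorted(kept)
```

In this sketch, repeated spectra for the same peptide count once, and a peptide shared between two proteins supports neither of them.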
Systems biology impact on antiepileptic drug discovery.

PubMed

Margineanu, Doru Georg

2012-02-01

Systems biology (SB), a recent trend in bioscience research that considers the complex interactions in biological systems from a holistic perspective, sees disease as a disturbed network of interactions rather than as an alteration of single molecular component(s). SB-based network pharmacology replaces the prevailing focus on specific drug-receptor interactions, and its corollary of rationally designed "magic bullets", with the search for multi-target drugs that would act on biological networks as "magic shotguns". Because epilepsy is a multi-factorial, polygenic and dynamic pathology, the SB approach appears particularly fit and promising for antiepileptic drug (AED) discovery. In fact, long before the advent of SB, AED discovery already involved some SB-like elements. One reported SB project aimed at finding new drug targets in epilepsy relies on a relational database. This database integrates clinical information, recordings from deep electrodes and 3D brain imagery with histology and molecular biology data on modified expression of specific genes in the brain regions displaying spontaneous epileptic activity. Since hitting a single target does not treat complex diseases, a proper pharmacological promiscuity might give an AED the merit of being multi-potent. However, multi-target drug discovery entails the complicated task of optimizing multiple activities of compounds while balancing drug-like properties and controlling unwanted effects. Specific design tools for this new approach in drug discovery are only beginning to emerge. However, computational methods making reliable in silico predictions of polypharmacology have appeared, and their progress might be quite rapid. The current move away from reductionism toward network pharmacology suggests that properly integrating the intrinsic complexity of epileptic pathology into AED discovery might result in literally anti-epileptic drugs. Copyright © 2011 Elsevier B.V. All rights reserved.
Biological agents database in the armed forces.

PubMed

Niemcewicz, Marcin; Kocik, Janusz; Bielecka, Anna; Wierciński, Michał

2014-10-01

Rapid detection and identification of the biological agent during either a natural or a deliberate outbreak is crucial for implementing appropriate control measures and procedures to mitigate the spread of disease. Determining pathogen etiology may not only support epidemiological investigation and the safety of human beings, but also enhance forensic efforts in pathogen tracing, collection of evidence and correct inference. The article presents the objectives of the Biological Agents Database, which was developed for the Ministry of National Defense of the Republic of Poland within the European Defence Agency framework. The Biological Agents Database is an electronic catalogue of genetic markers of highly dangerous pathogens and of biological agents of concern as weapons of mass destruction. It provides full identification of biological threats emerging in Poland and in locations where Polish troops operate. The Biological Agents Database is a supportive tool for tracing the origin of biological agents and for rapidly identifying the agent causing a disease of unknown etiology. It also supports diagnosis, analysis, response and the exchange of information between institutions that use the information it contains. Therefore, it can be used not only for military purposes but also in a civilian setting.

Huntington's Disease and its therapeutic target genes: a global functional profile based on the HD Research Crossroads database

PubMed Central

2012-01-01

Background Huntington’s disease (HD) is a fatal progressive neurodegenerative disorder caused by the expansion of the polyglutamine repeat region in the huntingtin gene. Although the disease is triggered by the mutation of a single gene, intensive research has linked numerous other genes to its pathogenesis. To obtain a systematic overview of these genes, which may serve as therapeutic targets, the CHDI Foundation has recently established the HD Research Crossroads database. With over 800 genes currently cataloged, this web-based resource constitutes the most extensive curation of genes relevant to HD. It provides an unprecedented opportunity to survey the molecular mechanisms involved in HD in a holistic manner. Methods To gain a synoptic view of therapeutic targets for HD, we carried out a variety of bioinformatic and statistical analyses to scrutinize the functional association of genes curated in the HD Research Crossroads database. In particular, enrichment analyses were performed with respect to Gene Ontology categories, KEGG signaling pathways, and Pfam protein families. For selected processes, we also analyzed differential expression using published microarray data. Additionally, we generated a candidate set of novel genetic modifiers of HD by combining information from the HD Research Crossroads database with previous genome-wide linkage studies. Results Our analyses led to a comprehensive identification of molecular mechanisms associated with HD. Remarkably, we not only recovered processes and pathways that have frequently been linked to HD (such as cytotoxicity, apoptosis, and calcium signaling), but also found strong indications for other potentially disease-relevant mechanisms that have been less intensively studied in the context of HD (such as the cell cycle and RNA splicing, as well as Wnt and ErbB signaling). For follow-up studies, we provide a regularly updated compendium of molecular mechanisms associated with HD at http://hdtt.sysbiolab.eu. Additionally, we derived a candidate set of 24 novel genetic modifiers, including histone deacetylase 3 (HDAC3), metabotropic glutamate receptor 1 (GRM1), CDK5 regulatory subunit 2 (CDK5R2), and coactivator 1β of the peroxisome proliferator-activated receptor gamma (PPARGC1B). Conclusions The results of our study give an intriguing picture of the molecular complexity of HD. Our analyses can be seen as a first step towards a comprehensive list of biological processes, molecular functions, and pathways involved in HD, and may provide a basis for the development of more holistic disease models and new therapeutics. PMID:22741533
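Enrichment of a curated gene list in a Gene Ontology category, KEGG pathway, or Pfam family is commonly assessed with a one-sided hypergeometric test. The abstract does not say which test or multiple-testing correction the authors used. The following is therefore a generic sketch with hypothetical counts: N background genes, K of which are annotated to the term, and a study set of n genes, k of which carry the annotation.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the study set, k: study-set genes annotated to the term.
    """
    upper = min(n, K)  # X cannot exceed the study-set size or the annotated count
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

In practice, a p-value like this would be computed for every tested term and then corrected for multiple testing, for example with Bonferroni or Benjamini-Hochberg.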
A computational platform to maintain and migrate manual functional annotations for BioCyc databases.

PubMed

Walsh, Jesse R; Sen, Taner Z; Dickerson, Julie A

2014-10-12

BioCyc databases are an important resource for information on biological pathways and genomic data. Such databases represent the accumulation of biological data, some of which has been manually curated from the literature. An essential feature of these databases is continuing data integration as new knowledge is discovered. As functional annotations are improved, scalable methods are needed for curators to manage annotations without detailed knowledge of the specific design of the BioCyc database. We have developed CycTools, a software tool that allows curators to maintain functional annotations in a model organism database. This tool builds on existing software to improve and simplify the import of user-provided annotation data into BioCyc databases. Additionally, CycTools automatically resolves synonyms and alternate identifiers contained within the database into the appropriate internal identifiers. Automating steps in the manual data entry process can improve curation efforts for major biological databases. The functionality of CycTools is demonstrated by transferring GO term annotations from MaizeCyc to matching proteins in CornCyc, both maize metabolic pathway databases available at MaizeGDB, and by creating strain-specific databases for metabolic engineering.
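At its core, resolving synonyms and alternate identifiers to internal identifiers is a lookup through a normalized index. Here is a minimal sketch. It assumes a hypothetical mapping from internal IDs to their synonyms and returns no answer when a name is ambiguous. It is not CycTools' code and does not use the Pathway Tools API.

```python
def build_resolver(entries):
    """Index every internal ID and its synonyms / alternate identifiers.

    entries: mapping internal_id -> iterable of synonyms (hypothetical data model).
    Keys are normalized by stripping whitespace and lower-casing.
    """
    index = {}
    for internal_id, names in entries.items():
        for name in [internal_id, *names]:
            index.setdefault(name.strip().lower(), set()).add(internal_id)
    return index


def resolve(index, name):
    """Return the single internal ID for `name`, or None if unknown or ambiguous."""
    hits = index.get(name.strip().lower(), set())
    return next(iter(hits)) if len(hits) == 1 else None
```

Returning `None` for a name shared by two entries, rather than picking one, leaves such cases for a curator to decide.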
Re-thinking organisms: The impact of databases on model organism biology.

PubMed

Leonelli, Sabina; Ankeny, Rachel A

2012-03-01

Community databases have become crucial to the collection, ordering and retrieval of data gathered on model organisms, as well as to the ways in which these data are interpreted and used across a range of research contexts. This paper analyses the impact of community databases on research practices in model organism biology by focusing on the history and current use of four community databases: FlyBase, Mouse Genome Informatics, WormBase and The Arabidopsis Information Resource. We discuss the standards used by the curators of these databases for what counts as reliable evidence, acceptable terminology, appropriate experimental set-ups and adequate materials (e.g., specimens). On the one hand, these choices are informed by the collaborative research ethos characterising most model organism communities. On the other hand, the deployment of these standards in databases reinforces this ethos and gives it concrete and precise instantiations by shaping the skills, practices, values and background knowledge required of the database users. We conclude that the increasing reliance on community databases as vehicles to circulate data is having a major impact on how researchers conduct and communicate their research, which affects how they understand the biology of model organisms and its relation to the biology of other species. Copyright © 2011 Elsevier Ltd. All rights reserved.
New tools and methods for direct programmatic access to the dbSNP relational database.

PubMed

Saccone, Scott F; Quan, Jiaxi; Mehta, Gaurang; Bolze, Raphael; Thomas, Prasanth; Deelman, Ewa; Tischfield, Jay A; Rice, John P

2011-01-01

Genome-wide association studies often incorporate information from public biological databases in order to provide a biological reference for interpreting the results. The dbSNP database is an extensive source of information on single nucleotide polymorphisms (SNPs) for many different organisms, including humans. We have developed free software that will download and install a local MySQL implementation of the dbSNP relational database for a specified organism. We have also designed a system for classifying dbSNP tables in terms of common tasks we wish to accomplish using the database. For each task, we have designed a small set of custom tables that facilitate task-related queries, and we provide entity-relationship diagrams for each task composed from the relevant dbSNP tables. In order to expose these concepts and methods to a wider audience, we have developed web tools for querying the database and browsing documentation on the tables and columns to clarify the relevant relational structure. All web tools and software are freely available to the public at http://cgsmd.isi.edu/dbsnpq. Resources such as these for programmatically querying biological databases are essential for viably integrating biological information into genetic association experiments on a genome-wide scale.
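An in-memory SQLite database can illustrate the idea of small, task-specific tables that make common queries easy. The two tables, their columns, and the rows are invented toy data. They are not the real dbSNP schema, and the tools described in the abstract use a local MySQL installation rather than SQLite. The hypothetical task shown is "which SNPs fall in a given gene".

```python
import sqlite3


def build_demo_db():
    """Create a toy, task-specific SNP database (hypothetical schema and rows)."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE snp (rs_id INTEGER PRIMARY KEY, chrom TEXT, pos INTEGER);
        CREATE TABLE snp_gene (rs_id INTEGER REFERENCES snp(rs_id), gene TEXT);
        CREATE INDEX idx_snp_gene ON snp_gene(gene);
        """
    )
    con.executemany(
        "INSERT INTO snp VALUES (?, ?, ?)",
        [(101, "1", 1000), (102, "1", 1500), (103, "2", 500)],
    )
    con.executemany(
        "INSERT INTO snp_gene VALUES (?, ?)",
        [(101, "GENE_A"), (102, "GENE_A"), (103, "GENE_B")],
    )
    return con


def snps_in_gene(con, gene):
    """Task query: SNPs annotated to `gene`, ordered by position."""
    return con.execute(
        "SELECT s.rs_id, s.chrom, s.pos "
        "FROM snp AS s JOIN snp_gene AS g ON s.rs_id = g.rs_id "
        "WHERE g.gene = ? ORDER BY s.pos",
        (gene,),
    ).fetchall()
```

The parameterized `?` placeholder keeps user-supplied gene names out of the SQL text. The index on `snp_gene(gene)` is the kind of task-driven design choice that keeps such lookups fast as tables grow.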
A method for automatically extracting infectious disease-related primers and probes from the literature

PubMed Central

2010-01-01

Background Primer and probe sequences are the main components of nucleic acid-based detection systems. Biologists use primers and probes for different tasks, some related to the diagnosis and prescription of infectious diseases. The biological literature is the main information source for empirically validated primer and probe sequences. Therefore, it is becoming increasingly important for researchers to be able to navigate this information. In this paper, we present a four-phase method for extracting and annotating primer/probe sequences from the literature. These phases are: (1) convert each document into a tree of paper sections, (2) detect the candidate sequences using a set of finite state machine-based recognizers, (3) refine problem sequences using a rule-based expert system, and (4) annotate the extracted sequences with their related organism/gene information. Results We tested our approach using a test set composed of 297 manuscripts. The extracted sequences and their organism/gene annotations were manually evaluated by a panel of molecular biologists. The results of the evaluation show that our approach is suitable for automatically extracting DNA sequences, achieving precision/recall rates of 97.98% and 95.77%, respectively. In addition, 76.66% of the detected sequences were correctly annotated with their organism name. The system also provided correct gene-related information for 46.18% of the sequences assigned a correct organism name. Conclusions We believe that the proposed method can facilitate routine tasks for biomedical researchers using molecular methods to diagnose and prescribe different infectious diseases. In addition, the proposed method can be expanded to detect and extract other biological sequences from the literature. The extracted information can also be used to readily update available primer/probe databases or to create new databases from scratch. PMID:20682041
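Phase (2) of the method detects candidate sequences with finite-state-machine recognizers. The abstract does not specify the recognizers themselves, so this is a hypothetical two-state machine. It collects runs of uppercase A/C/G/T, tolerates spaces and hyphens inside a run, and keeps runs of at least a minimum length. The expert-system refinement and the organism/gene annotation phases are not modeled.

```python
NUCLEOTIDES = set("ACGT")  # assumption: primers are printed in uppercase
SEPARATORS = set(" -")     # tolerated inside a run, e.g. "ACGT ACGT" or "5'-...-3'"


def find_candidates(text, min_len=15):
    """Two-state FSM (OUTSIDE / IN_SEQ) returning candidate DNA sequences."""
    candidates = []
    current = []
    state = "OUTSIDE"

    def flush():
        seq = "".join(current)
        if len(seq) >= min_len:
            candidates.append(seq)
        current.clear()

    for ch in text + "\n":  # sentinel character forces a final flush
        if state == "OUTSIDE":
            if ch in NUCLEOTIDES:
                current.append(ch)
                state = "IN_SEQ"
        else:  # state == "IN_SEQ"
            if ch in NUCLEOTIDES:
                current.append(ch)
            elif ch in SEPARATORS:
                pass  # stay in IN_SEQ; separators are not part of the sequence
            else:
                flush()
                state = "OUTSIDE"
    return candidates
```

Because only uppercase bases are accepted, ordinary lowercase prose is skipped, which is itself an assumption. Sequences printed in lowercase or with IUPAC ambiguity codes would need extra transitions.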
An overview of methods using (13)C for improved compound identification in metabolomics and natural products.

PubMed

Clendinen, Chaevien S; Stupp, Gregory S; Ajredini, Ramadan; Lee-McMullen, Brittany; Beecher, Chris; Edison, Arthur S

2015-01-01

Compound identification is a major bottleneck in metabolomics studies. In nuclear magnetic resonance (NMR) investigations, resonance overlap often hinders unambiguous database matching or de novo compound identification. In liquid chromatography-mass spectrometry (LC-MS), discriminating between biological signals and background artifacts and reliable determination of molecular formulae are not always straightforward. We have designed and implemented several NMR and LC-MS approaches that utilize (13)C, either enriched or at natural abundance, in metabolomics applications. For LC-MS applications, we describe a technique called isotopic ratio outlier analysis (IROA), which utilizes samples that are isotopically labeled with 5% (test) and 95% (control) (13)C. This labeling strategy leads to characteristic isotopic patterns that allow the differentiation of biological signals from artifacts and yield the exact number of carbons, significantly reducing possible molecular formulae. The relative abundance between the test and control samples for every IROA feature can be determined simply by integrating the peaks that arise from the 5 and 95% channels. For NMR applications, we describe two (13)C-based approaches. For samples at natural abundance, we have developed a workflow to obtain (13)C-(13)C and (13)C-(1)H statistical correlations using 1D (13)C and (1)H NMR spectra. For samples that can be isotopically labeled, we describe another NMR approach to obtain direct (13)C-(13)C spectroscopic correlations. These methods both provide extensive information about the carbon framework of compounds in the mixture for either database matching or de novo compound identification. We also discuss strategies in which (13)C NMR can be used to identify unknown compounds from IROA experiments. By combining technologies with the same samples, we can identify important biomarkers and corresponding metabolites of interest.
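The abstract states that the IROA isotopic patterns yield the exact number of carbons. The underlying arithmetic is that the all-12C and all-13C forms of a molecule with n carbons differ in mass by n times the 13C-12C mass difference (about 1.0033548 Da). So n can be recovered from the peak spacing. Here is a minimal sketch of that arithmetic and of the test/control ratio, using hypothetical m/z values and peak areas. Real IROA software also has to recognize the isotopic envelopes themselves, which is not shown.

```python
C13_C12_DELTA = 1.0033548  # mass difference between 13C and 12C, in Da


def carbon_count(mz_light, mz_heavy, charge=1):
    """Estimate the number of carbons from the spacing between the all-12C
    monoisotopic peak (5% 13C channel) and the all-13C peak (95% 13C channel).
    """
    delta_mass = (mz_heavy - mz_light) * charge  # convert m/z spacing to mass
    return round(delta_mass / C13_C12_DELTA)


def relative_abundance(test_area, control_area):
    """Ratio of the integrated 5% (test) to 95% (control) channel peak areas."""
    return test_area / control_area
```

For a hypothetical singly charged ion at m/z 181.0707 whose fully labeled partner appears at m/z 187.0908, the spacing is about 6.02 Da. That corresponds to six carbons, which sharply narrows the candidate molecular formulae.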
International consensus for neuroblastoma molecular diagnostics: report from the International Neuroblastoma Risk Group (INRG) Biology Committee

PubMed Central

Ambros, P F; Ambros, I M; Brodeur, G M; Haber, M; Khan, J; Nakagawara, A; Schleiermacher, G; Speleman, F; Spitz, R; London, W B; Cohn, S L; Pearson, A D J; Maris, J M

2009-01-01

Neuroblastoma serves as a paradigm for utilising tumour genomic data for determining patient prognosis and treatment allocation. However, before the establishment of the International Neuroblastoma Risk Group (INRG) Task Force in 2004, international consensus on markers, methodology, and data interpretation did not exist, compromising the reliability of decisive genetic markers and inhibiting translational research efforts. The objectives of the INRG Biology Committee were to identify highly prognostic genetic aberrations to be included in the new INRG risk classification schema and to develop precise definitions, decisive biomarkers, and technique standardisation. The review of the INRG database (n=8800 patients) by the INRG Task Force finally enabled the identification of the most significant neuroblastoma biomarkers. In addition, the Biology Committee compared the standard operating procedures of different cooperative groups to arrive at international consensus for methodology, nomenclature, and future directions. Consensus was reached to include MYCN status, 11q23 allelic status, and ploidy in the INRG classification system on the basis of an evidence-based review of the INRG database. Standardised operating procedures for analysing these genetic factors were adopted, and criteria for proper nomenclature were developed. Neuroblastoma treatment planning is highly dependent on tumour cell genomic features, and it is likely that a comprehensive panel of DNA-based biomarkers will be used in future risk assignment algorithms applying genome-wide techniques. Consensus on methodology and interpretation is essential for uniform INRG classification and will greatly facilitate international and cooperative clinical and translational research studies. PMID:19401703
PathCase-SB architecture and database design

PubMed Central

2011-01-01

Background Integration of metabolic pathway resources and regulatory metabolic network models, and deployment of new tools on the integrated platform, can help make systems biology research on the regulation of metabolic networks more effective and efficient. Therefore, the tasks of (a) integrating regulatory metabolic networks and existing models under a single database environment, and (b) building tools to help with modeling and analysis are desirable and intellectually challenging computational tasks. Description PathCase Systems Biology (PathCase-SB) has been built and released. The PathCase-SB database provides data and an API for multiple user interfaces and software tools. The current PathCase-SB system provides a database-enabled framework and web-based computational tools to facilitate the development of kinetic models for biological systems. PathCase-SB aims to integrate data from selected biological data sources on the web (currently, the BioModels database and KEGG), and to provide more powerful and/or new capabilities via the new web-based integrative framework. This paper describes the architecture and database design issues encountered in PathCase-SB's design and implementation, and presents the current design of PathCase-SB's architecture and database. Conclusions The PathCase-SB architecture and database provide a highly extensible and scalable environment with easy and fast (real-time) access to the data in the database. PathCase-SB is already being used by researchers across the world. PMID:22070889
Construction of a Linux based chemical and biological information system.

PubMed

Molnár, László; Vágó, István; Fehér, András

2003-01-01

A chemical and biological information system with an easy-to-use web-based interface and corresponding databases has been developed. The system incorporates all chemical, numerical and textual data related to the chemical compounds, including numerical biological screening results. Users can search the database through the web interface with traditional textual/numerical queries and/or substructure or similarity queries. To build our chemical database management system, we used existing IT components such as ORACLE or Tripos SYBYL for database management and the Zope application server for the web interface. We chose Linux as the main platform; however, almost every component can be used under various operating systems.
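Similarity queries in chemical databases are often implemented by comparing binary structural fingerprints with the Tanimoto coefficient. Substructure queries are often prescreened by checking that every fingerprint bit of the query is also set in the candidate. The abstract does not say which fingerprints or metric the described system used; it relied on ORACLE or Tripos SYBYL components. The following is therefore a generic sketch, with fingerprints represented as hypothetical sets of on-bit indices.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def similarity_search(query_fp, library, threshold=0.7):
    """Compounds with Tanimoto >= threshold, best hits first.

    library: mapping compound_id -> fingerprint (hypothetical data).
    """
    hits = [(cid, tanimoto(query_fp, fp)) for cid, fp in library.items()]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: -h[1])


def substructure_prescreen(query_fp, library):
    """Compounds whose fingerprint contains every bit of the query.

    This is only a necessary condition; a true substructure match still
    requires graph matching on the candidates that pass.
    """
    return sorted(cid for cid, fp in library.items() if query_fp <= fp)
```

The prescreen is cheap set arithmetic, which is why it is typically run before the far more expensive atom-by-atom substructure comparison.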
RISE: a database of RNA interactome from sequencing experiments

PubMed Central

Gong, Jing; Shao, Di; Xu, Kui

2018-01-01

Abstract We present RISE (http://rise.zhanglab.net), a database of RNA Interactome from Sequencing Experiments. RNA-RNA interactions (RRIs) are essential for RNA regulation and function. RISE provides a comprehensive collection of RRIs that mainly come from recent transcriptome-wide sequencing-based experiments like PARIS, SPLASH, LIGR-seq, and MARIO, as well as targeted studies like RIA-seq, RAP-RNA and CLASH. It also includes interactions aggregated from other primary databases and publications. The RISE database currently contains 328,811 RNA-RNA interactions, mainly in human, mouse and yeast. Most existing RNA databases mainly contain miRNA-target interactions; notably, more than half of the RRIs in RISE are between mRNAs and long non-coding RNAs. We compared different RRI datasets in RISE and found limited overlaps between interactions resolved by different techniques and in different cell lines. This may reflect technology preferences as well as the dynamic nature of RRIs. We also analyzed the basic features of the human and mouse RRI networks and found that they tend to be scale-free, small-world, hierarchical and modular. The analysis may nominate important RNAs or RRIs for further investigation. Finally, RISE provides a Circos plot and several table views for integrative visualization, with extensive molecular and functional annotations to facilitate exploration of the biological functions of any RRI of interest. PMID:29040625
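Two of the descriptive network statistics behind claims such as "scale-free" and "small-world" are the degree distribution and the clustering coefficient. Here is a minimal sketch that computes both on an undirected interaction list. The RNA names are hypothetical. Actually supporting those claims would also require fitting the degree distribution and comparing against randomized networks, which is not shown.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(interactions):
    """Undirected adjacency sets from (rna_a, rna_b) pairs; self-loops are dropped."""
    adj = defaultdict(set)
    for a, b in interactions:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def local_clustering(adj, node):
    """Fraction of pairs of `node`'s neighbours that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


def degree_histogram(adj):
    """Mapping degree -> number of nodes with that degree."""
    hist = defaultdict(int)
    for n in adj:
        hist[len(adj[n])] += 1
    return dict(sorted(hist.items()))
```

For a triangle of three RNAs with a fourth RNA attached to one of them, the triangle members with two neighbours have clustering 1.0. The hub has 1/3, and the pendant node has 0.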
MeDReaders: a database for transcription factors that bind to methylated DNA.

PubMed

Wang, Guohua; Luo, Ximei; Wang, Jianan; Wan, Jun; Xia, Shuli; Zhu, Heng; Qian, Jiang; Wang, Yadong

2018-01-04

Understanding the molecular principles governing interactions between transcription factors (TFs) and their DNA targets is one of the main subjects in the study of transcriptional regulation. Recently, emerging evidence has demonstrated that some TFs can bind to DNA motifs containing highly methylated CpGs both in vitro and in vivo. Identifying such TFs and elucidating their physiological roles has now become an important stepping-stone toward understanding the mechanisms underlying methylation-mediated biological processes, which have crucial implications for human disease and disease development. Hence, we constructed a database, named MeDReaders, to collect information about methylated DNA binding activities. A total of 731 TFs that could bind to methylated DNA sequences were manually curated from human and mouse studies reported in the literature. In silico approaches were applied to predict methylated and unmethylated motifs of 292 TFs by integrating whole genome bisulfite sequencing (WGBS) and ChIP-Seq datasets in six human cell lines and one mouse cell line extracted from the ENCODE and GEO databases. The MeDReaders database will provide a comprehensive resource for further studies and aid the design of related experiments. The database provides users with unified access to most TFs involved in such methylation-associated binding activities. The website is available at http://medreader.org/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
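One way to integrate WGBS with ChIP-Seq, so that a motif occurrence can be called methylated or unmethylated, is to average the bisulfite methylation levels of the CpGs inside each motif site. The abstract does not describe MeDReaders' actual in silico procedure. The thresholds (0.8 and 0.2), the data layout, and the site names are all hypothetical assumptions.

```python
def classify_motif_sites(sites, methylation, high=0.8, low=0.2):
    """Label each motif occurrence by the mean methylation of its CpGs.

    sites: mapping site_id -> list of (chrom, pos) CpG positions inside the motif
           occurrence (e.g. occurrences lying within ChIP-Seq peaks).
    methylation: mapping (chrom, pos) -> methylated fraction from WGBS (0..1).
    high/low: hypothetical cut-offs for calling a site methylated/unmethylated.
    """
    labels = {}
    for site_id, cpgs in sites.items():
        levels = [methylation[c] for c in cpgs if c in methylation]
        if not levels:
            labels[site_id] = "no_cpg_data"  # no CpG or no WGBS coverage
            continue
        mean_level = sum(levels) / len(levels)
        if mean_level >= high:
            labels[site_id] = "methylated"
        elif mean_level <= low:
            labels[site_id] = "unmethylated"
        else:
            labels[site_id] = "intermediate"
    return labels
```

Sites without CpGs or without bisulfite coverage are kept but labeled separately, so they are not silently counted as unmethylated.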
Assessment of knowledge of participants on basic molecular biology techniques after 5-day intensive molecular biology training workshops in Nigeria.

PubMed

Yisau, J I; Adagbada, A O; Bamidele, T; Fowora, M; Brai, B I C; Adebesin, O; Bamidele, M; Fesobi, T; Nwaokorie, F O; Ajayi, A; Smith, S I

2017-07-08

The deployment of molecular biology techniques for diagnosis and research in Nigeria faces a number of challenges, including the cost of equipment and reagents and the dearth of personnel skilled in the procedures and in handling equipment. Short molecular biology training workshops were conducted at the Nigerian Institute of Medical Research (NIMR) to improve the knowledge and skills of laboratory personnel and academics in health, research, and educational facilities. Five-day molecular biology workshops were conducted annually between 2011 and 2014, with participants drawn from health and research facilities and from academia. The courses consisted of theoretical and practical sessions. The impact of the workshops on knowledge and skill acquisition was evaluated by pre- and post-tests, which consisted of 25 multiple-choice and other questions. Sixty-five participants took part in the workshops. The mean molecular biology knowledge scores on the pre- and post-test assessments were 8.4 (95% CI 7.6-9.1) and 13.0 (95% CI 11.9-14.1), respectively. The mean post-test score was significantly greater than the mean pre-test score (p < 0.0001). The five-day molecular biology workshop significantly increased the knowledge and skills of participants in molecular biology techniques. © 2017 by The International Union of Biochemistry and Molecular Biology, 45(4):313-317, 2017.
The human pain genetics database: an interview with Luda Diatchenko.

PubMed

Diatchenko, Luda

2018-06-05

Luda Diatchenko, MD, PhD is a Canada Excellence Research Chair in Human Pain Genetics, Professor, Faculty of Medicine, Department of Anesthesia and Faculty of Dentistry at McGill University, Alan Edwards Centre for Research on Pain. She earned her MD and PhD in the field of molecular biology from the Russian State Medical University. She started her career in industry, where she was Leader of the RNA Expression Group at Clontech, Inc., and subsequently Director of Gene Discovery at Attagene, Inc. During this time, she was actively involved in the development of several widely used and widely cited molecular tools for the analysis of gene expression and regulation. Her academic career started in 2000 in the Center for Neurosensory Disorders at the University of North Carolina. Since then, her research has focused on determining the cellular and molecular biological mechanisms by which functional genetic variations affect human pain perception and the risk of developing chronic pain conditions. This work enables new approaches to identifying drug targets, predicting treatment responses to analgesics and developing diagnostics. Multiple collaborative activities allow the Diatchenko group to take basic genetic findings all the way from human association studies, through molecular and cellular mechanisms, to animal models and ultimately to human clinical trials. In total, she has authored or co-authored over 120 peer-reviewed research papers and ten book chapters, and has edited a book on human pain genetics. She is a member and an active officer of several national and international scientific societies, including the International Association for the Study of Pain and the American Pain Society.
Urtica dioica pollen allergy: Clinical, biological, and allergomics analysis.

PubMed

Tiotiu, Angelica; Brazdova, Andrea; Longé, Cyril; Gallet, Patrice; Morisset, Martine; Leduc, Virginie; Hilger, Christiane; Broussard, Cédric; Couderc, Rémy; Sutra, Jean-Pierre; Sénéchal, Hélène; Poncet, Pascal

2016-11-01

The most emblematic members of the Urticaceae in terms of allergic risk are wall pellitories (Parietaria), whereas nettle (Urtica) pollen is considered poorly allergenic. No allergen from nettle pollen has yet been characterized, whereas 4 are listed for Parietaria pollen by the International Union of Immunological Societies. The clinical and biological profiles of 2 adult men who developed symptoms in response to nettle pollen and/or leaves were studied. The objective was to characterize the allergic reaction and identify the potential sensitizing allergens in nettle pollen. The IgE-mediated reaction to nettle pollen extract was evaluated by skin prick test, immunoassay, nasal provocation, and basophil activation test. To characterize specific nettle pollen allergens, an allergomic (IgE immunoproteomic) analysis was performed, combining 1- and 2-dimensional electrophoresis, IgE immunoblots of nettle pollen extract, identification of allergens by mass spectrometry, and database queries. The results of biological and immunochemical analyses revealed that the allergic rhinitis was due to Urtica dioica pollen in both patients. The allergomic analysis of nettle pollen extract allowed the characterization of 4 basic protein allergens: a thaumatin-like protein (osmotin) with a relative molecular mass of 27 to 29 kDa, a pectinesterase (relative molecular mass, 40 kDa), and 2 other basic proteins with relative molecular masses of 14 to 16 kDa and 43 kDa. There are no, or only very weak, allergen associations between pellitory and nettle pollen. Exposure to nettle pollen can be responsible for allergic symptoms, and several allergens were characterized. Unravelling the allergens of this underestimated allergy might help to improve diagnosis and care for patients, predict cross-reactivities, and design adapted specific immunotherapy. Copyright © 2016 American College of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.
Prior knowledge guided active modules identification: an integrated multi-objective approach.

PubMed

Chen, Weiqi; Liu, Jing; He, Shan

2017-03-14

An active module, defined as a region of a biological network that shows striking changes in molecular activity or phenotypic signatures, is important for revealing dynamic and process-specific information that correlates with cellular or disease states. A prior-information-guided active module identification approach is proposed to detect modules that are both active and enriched in prior knowledge. We formulate active module identification as a multi-objective optimisation problem with two conflicting objective functions: simultaneously maximising the coverage of known biological pathways and the activity of the active module. The network is constructed from a protein-protein interaction database. A beta-uniform mixture model is used to estimate the distribution of p-values and generate activity scores from microarray data. A multi-objective evolutionary algorithm is used to search for Pareto-optimal solutions. We also incorporate a novel constraint based on algebraic connectivity to ensure that the identified active modules are connected. Application of the proposed algorithm to a small yeast molecular network shows that it can identify modules with high activity and with more cross-talk nodes between related functional groups. The Pareto solutions generated by the algorithm offer different trade-offs between prior knowledge and novel information from the data. The approach is then applied to microarray data from diclofenac-treated yeast cells to build a network and identify modules that elucidate the molecular mechanisms of diclofenac toxicity and resistance. Gene ontology analysis is applied to the identified modules for biological interpretation. Integrating knowledge of functional groups into active module identification is an effective method and provides flexible control over the balance between purely data-driven methods and prior-information guidance.
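The two ideas at the core of this approach are a connectedness constraint and Pareto optimality over two objectives. The sketch below illustrates both on a handful of hand-scored candidate modules.

It makes several simplifications:
- The authors' constraint uses algebraic connectivity (the second-smallest Laplacian eigenvalue), which is positive exactly when a module's subgraph is connected. This sketch checks that same condition with a breadth-first search, so it does not reproduce any threshold the paper may apply.
- Plain filtering of a small candidate list stands in for the evolutionary search.
- The objective values are made up for illustration.

```python
from collections import deque


def is_connected(nodes, edges):
    """True if the subgraph induced by `nodes` is connected (algebraic connectivity > 0)."""
    nodes = set(nodes)
    if not nodes:
        return False
    adj = {n: set() for n in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:  # keep only edges inside the module
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == nodes


def dominates(a, b):
    """Maximisation: a is no worse on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """candidates: list of (module, (activity, pathway_coverage)); keep the non-dominated ones."""
    return [c for c in candidates
            if not any(dominates(o[1], c[1]) for o in candidates if o is not c)]
```

In use, candidates would first be filtered with `is_connected` and then passed to `pareto_front`. Each surviving module represents a different trade-off between activity and prior-knowledge coverage.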
A bio-inspired structural health monitoring system based on ambient vibration

NASA Astrophysics Data System (ADS)

Lin, Tzu-Kang; Kiremidjian, Anne; Lei, Chi-Yang

2010-11-01

A structural health monitoring (SHM) system based on naïve Bayesian (NB) damage classification and DNA-like expression data was developed in this research. Adapted from the deoxyribonucleic acid (DNA) array concept in molecular biology, the proposed structural health monitoring system uses a double-tier regression process to extract the expression array from the structural time history recorded during external excitations. From the viewpoint of molecular biology, the extracted array represents the various "genes" of the structure and reflects the possible damage conditions prevalent in it. A scaled-down, six-story steel building mounted on the shaking table of the National Center for Research on Earthquake Engineering (NCREE) was used as the benchmark. The structural response at different damage levels and locations under ambient vibration was collected to build the database for the proposed SHM system. To improve detection precision in practical applications, the system was enhanced by an optimization process using the likelihood selection method. The resulting expression array, which plays the role of a DNA array of the structure's health condition, was first evaluated and ranked. A total of 12 groups of expression arrays were regenerated from a combination of four damage conditions. To keep the length of the array unchanged, the best 16 coefficients from every expression array were selected to form the optimized SHM system. Tests under ambient vibration showed that the optimized expression array greatly improved damage-detection accuracy compared with the original system. Practical verification also showed that the final system could deliver a rapid and reliable result within 1 min. The proposed system transplants the DNA array concept from molecular biology into the field of SHM.
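A naive Bayes damage classifier assigns an expression array to the damage class with the highest posterior probability. It treats each coefficient as independent given the class. The sketch below assumes Gaussian per-coefficient likelihoods and uses toy labels. The abstract does not state the likelihood model, so this is an illustration of the general technique rather than the authors' system.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate log class priors and per-coefficient mean/variance for each damage class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # a small floor keeps the variance positive if a coefficient is constant within a class
        variances = [max(sum((v - m) ** 2 for v in c) / len(c), 1e-9)
                     for c, m in zip(cols, means)]
        model[label] = (math.log(len(rows) / len(X)), means, variances)
    return model


def predict(model, x):
    """Return the class with the largest log posterior (log prior + summed log likelihoods)."""
    def log_posterior(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            for xi, m, var in zip(x, means, variances))
    return max(model, key=lambda label: log_posterior(model[label]))
```

Each row of `X` would hold the selected coefficients of one expression array (16 in the optimized system described above).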
Mentor-mentee interaction and laboratory social environment: Do they matter in doctoral students' publication productivity?

PubMed

Ynalvez, Marcus Antonius; Ynalvez, Ruby A; Ramírez, Enrique

2017-03-04

We explored the social shaping of science at the micro-level reality of face-to-face interaction in one of the traditional places for scientific activity: the scientific lab. Specifically, we examined how doctoral students' perceptions of (i) their interaction with doctoral mentors (MMI) and (ii) their lab social environment (LSE) influenced productivity. Construing productivity as the production of peer-reviewed articles, we measured it using the total number of articles (TOTAL), the number of articles with an impact factor greater than or equal to 4.00 (IFGE4), and the number of first-authored articles (NFA). Via face-to-face interviews, we obtained data from n = 210 molecular biology Ph.D. students at selected universities in Japan, Singapore, and Taiwan. Additional productivity data (NFA) were obtained from online bibliometric databases. To summarize the original 13 MMI and 13 LSE semantic-differential items used to measure students' perceptions, we performed principal component (PC) analyses, which yielded smaller sets of 4 MMI PCs and 4 LSE PCs. To identify which PCs influenced publication counts, we performed Poisson regression analyses. Although perceived MMI was not linked to productivity, perceived LSE was: students who perceived their LSE as intellectually stimulating reported high levels of productivity in both TOTAL and IFGE4, but not in NFA. Our findings not only highlight how students' perceptions of their training environment factor into the production of scientific output but also carry important implications for improving mentoring programs in science. © 2016 by The International Union of Biochemistry and Molecular Biology, 45(2):130-144, 2017.
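Poisson regression models a count (here, articles per student) through a log link: log E[count] = b0 + b1 * score. The coefficients are found by maximum likelihood. The sketch below fits a single hypothetical predictor, such as one LSE component score, by Newton-Raphson. The study used several PCs at once, and a statistics package would normally be used in practice.

```python
import math


def poisson_regression(x, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood fit of log E[y] = b0 + b1 * x by Newton-Raphson."""
    if sum(y) <= 0:
        raise ValueError("need at least one positive count")
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # score: gradient of the log-likelihood
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix [[h00, h01], [h01, h11]]
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: inverse information times score
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1
```

`exp(b1)` is the multiplicative change in the expected article count per unit increase in the component score.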
Mechanisms of action of sacubitril/valsartan on cardiac remodeling: a systems biology approach.

PubMed

Iborra-Egea, Oriol; Gálvez-Montón, Carolina; Roura, Santiago; Perea-Gil, Isaac; Prat-Vidal, Cristina; Soler-Botija, Carolina; Bayes-Genis, Antoni

2017-01-01

Sacubitril/Valsartan has proved superior to other conventional heart failure treatments, but its mechanisms of action remain obscure. In this study, we sought to explore the mechanistic details of Sacubitril/Valsartan in heart failure and post-myocardial infarction remodeling using an in silico systems biology approach. The myocardial transcriptome obtained in response to myocardial infarction in swine was analyzed to address post-infarction ventricular remodeling. Swine transcriptome hits were mapped to their human equivalents using Reciprocal Best (BLAST) Hits, Gene Name Correspondence, and the InParanoid database. Heart failure remodeling was studied using public data available in the Gene Expression Omnibus (accession GSE57345, subseries GSE57338), processed with the GEO2R tool. Using the Therapeutic Performance Mapping System technology, dedicated mathematical models were generated, trained to fit a set of molecular criteria that define both pathologies and include all the information available on Sacubitril/Valsartan. All relationships incorporated into the biological network were drawn from public resources (including KEGG, REACTOME, INTACT, BIOGRID, and MINT). An artificial neural network analysis revealed that Sacubitril/Valsartan acts synergistically against cardiomyocyte cell death and left ventricular extracellular matrix remodeling via eight principal synergistic nodes. When each pathway was studied independently, Valsartan was found to improve cardiac remodeling by inhibiting members of the guanine nucleotide-binding protein family, while Sacubitril attenuated cardiomyocyte cell death, hypertrophy, and impaired myocyte contractility by inhibiting PTEN. The complex molecular mechanisms of action of Sacubitril/Valsartan on post-myocardial infarction and heart failure cardiac remodeling were delineated using a systems biology approach. Further, this dataset provides a pathophysiological rationale for the use of Sacubitril/Valsartan to prevent post-infarct remodeling.
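Reciprocal best hits pair a swine gene with a human gene only when each is the other's top-scoring match. The sketch below assumes the BLAST searches have already been run in both directions and reduced to `{(query, subject): bitscore}` dictionaries. That input format and the gene names are hypothetical. Ties are broken arbitrarily here.

```python
def best_hits(scores):
    """scores: {(query, subject): bitscore}. Return {query: best-scoring subject}."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Keep pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {a: b for a, b in forward.items() if reverse.get(b) == a}
```

A gene whose best hit prefers some other gene (as `pig2` does in the test) is left unmapped. This is why RBH is usually combined with other mapping methods, such as the name correspondence and InParanoid lookups mentioned above.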
The nucleic acid revolution continues - will forensic biology become forensic molecular biology?

PubMed

Gunn, Peter; Walsh, Simon; Roux, Claude

2014-01-01

Molecular biology has evolved far beyond what could have been predicted when DNA identity testing was established. Indeed, we should now perhaps be referring to "forensic molecular biology." Aside from DNA's established role in identifying the "who" in crime investigations, other developments in medical and developmental molecular biology are now ripe for application to forensic challenges. The impact of DNA methylation and other post-fertilization DNA modifications, plus the emerging role of small RNAs in the control of gene expression, is rewriting our understanding of human biology. It is apparent that these emerging technologies will expand forensic molecular biology to allow inferences about "when" a crime took place and "what" took place. However, just as the introduction of DNA identity testing engendered many challenges, the expansion of molecular biology into these domains will again raise issues of scientific validity, interpretation, probative value, and infringement of personal liberties. This Commentary ponders some of these emerging issues and presents some ideas on how they will affect the conduct of forensic molecular biology in the foreseeable future.
3D visualization of molecular structures in the MOGADOC database

NASA Astrophysics Data System (ADS)

Vogt, Natalja; Popov, Evgeny; Rudert, Rainer; Kramer, Rüdiger; Vogt, Jürgen

2010-08-01

The MOGADOC database (Molecular Gas-Phase Documentation) is a powerful tool for retrieving information about compounds that have been studied in the gas phase by electron diffraction, microwave spectroscopy, and molecular radio astronomy. Presently, the database contains over 34,500 bibliographic references (dating from the beginning of each method) for about 10,000 inorganic, organic, and organometallic compounds, and structural data (bond lengths, bond angles, dihedral angles, etc.) for about 7800 compounds. Most of the implemented molecular structures are given in a three-dimensional (3D) presentation. To create, edit, and visualize the 3D images of molecules, new tools (a special editor and a Java-based 3D applet) were developed. Molecular structures given in internal coordinates were converted to Cartesian coordinates.
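Converting internal coordinates to Cartesian coordinates means placing each new atom D from three already-placed atoms A, B and C. The inputs are a bond length |CD|, a bond angle B-C-D and a dihedral angle A-B-C-D. The sketch below uses the Natural Extension Reference Frame (NeRF) construction. The abstract does not say which algorithm MOGADOC used, so treat this as one standard way to do the conversion.

```python
import math


def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _unit(a):
    n = math.sqrt(sum(x * x for x in a))
    return [x / n for x in a]


def place_atom(a, b, c, bond, angle_deg, dihedral_deg):
    """Cartesian position of atom D given A, B, C, |CD|, angle B-C-D and dihedral A-B-C-D."""
    theta, phi = math.radians(angle_deg), math.radians(dihedral_deg)
    bc = _unit(_sub(c, b))                 # local x axis along B -> C
    n = _unit(_cross(_sub(b, a), bc))      # normal to the A-B-C plane
    m = _cross(n, bc)                      # completes the orthonormal frame
    # D in the local frame, then rotated into the global frame and shifted to C
    d2 = [-bond * math.cos(theta),
          bond * math.sin(theta) * math.cos(phi),
          bond * math.sin(theta) * math.sin(phi)]
    return [c[i] + d2[0] * bc[i] + d2[1] * m[i] + d2[2] * n[i] for i in range(3)]
```

A dihedral of 0° puts D on the same side as A (cis), and 180° puts it opposite (trans). A whole molecule is built by placing the first three atoms in a fixed frame and then calling `place_atom` for each remaining row of the internal-coordinate table.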

The NCBI BioSystems database.

PubMed

Geer, Lewis Y; Marchler-Bauer, Aron; Geer, Renata C; Han, Lianyi; He, Jane; He, Siqian; Liu, Chunlei; Shi, Wenyao; Bryant, Stephen H

2010-01-01

The NCBI BioSystems database, found at http://www.ncbi.nlm.nih.gov/biosystems/, centralizes and cross-links existing biological systems databases, increasing their utility and target audience by integrating their pathways and systems into NCBI resources. This integration allows users of NCBI's Entrez databases to quickly categorize proteins, genes and small molecules by metabolic pathway, disease state or other BioSystem type, without requiring time-consuming inference of biological relationships from the literature or multiple experimental datasets.
Databases and coordinated research projects at the IAEA on atomic processes in plasmas

NASA Astrophysics Data System (ADS)

Braams, Bastiaan J.; Chung, Hyun-Kyung

2012-05-01

The Atomic and Molecular Data Unit at the IAEA works with a network of national data centres to encourage and coordinate production and dissemination of fundamental data for atomic, molecular and plasma-material interaction (A+M/PMI) processes that are relevant to the realization of fusion energy. The Unit maintains numerical and bibliographical databases and has started a Wiki-style knowledge base. The Unit also contributes to A+M database interface standards and provides a search engine that offers a common interface to multiple numerical A+M/PMI databases. Coordinated Research Projects (CRPs) bring together fusion energy researchers and atomic, molecular and surface physicists for joint work towards the development of new data and new methods. The databases and current CRPs on A+M/PMI processes are briefly described here.
Systems biology for molecular life sciences and its impact in biomedicine.

PubMed

Medina, Miguel Ángel

2013-03-01

Modern systems biology is already contributing to a radical transformation of the molecular life sciences and biomedicine, and it is expected to have a real impact in the clinical setting in the coming years. In this review, the emergence of systems biology is placed in historical context, and its present state is described. The present and expected future contributions of systems biology to the development of molecular medicine are highlighted. Regarding the present situation, this review reflects on the "inflation" of biological data and the urgent need for tools and procedures that bring hidden information to light. It also describes the impact of networks and models and the resources and tools available for applying them in systems biology approaches to molecular medicine. The current impact of systems biology on molecular medicine is illustrated by reviewing two cases, namely systems pharmacology and cancer systems biology. Finally, some of the expected contributions of systems biology to the immediate future of molecular medicine are discussed.
The systematic annotation of the three main GPCR families in Reactome.

PubMed

Jassal, Bijay; Jupe, Steven; Caudy, Michael; Birney, Ewan; Stein, Lincoln; Hermjakob, Henning; D'Eustachio, Peter

2010-07-29

Reactome is an open-source, freely available database of human biological pathways and processes. A major goal of our work is to provide an integrated view of cellular signalling processes that spans from ligand-receptor interactions to molecular readouts at the level of metabolic and transcriptional events. To this end, we have built the first catalogue of all human G protein-coupled receptors (GPCRs) known to bind endogenous or natural ligands. The UniProt database has records for 797 proteins classified as GPCRs and sorted into families A/1, B/2 and C/3 on the basis of amino acid sequence. To these records we have added details from the IUPHAR database and our own manual curation of relevant literature to create reactions in which 563 GPCRs bind ligands and also interact with specific G-proteins to initiate signalling cascades. We believe the remaining 234 GPCRs are true orphans. The Reactome GPCR pathway can be viewed as a detailed interactive diagram and can be exported in many forms. It provides a template for the orthology-based inference of GPCR reactions for diverse model organism species, and can be overlaid with protein-protein interaction and gene expression datasets to facilitate overrepresentation studies and other forms of pathway analysis. Database URL: http://www.reactome.org.
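Overrepresentation analysis, mentioned above as one use of the pathway, asks whether a gene list hits a pathway more often than chance. One common form of the test is a one-sided hypergeometric p-value. The sketch below computes it exactly with the standard library. It shows the general technique and is not a description of Reactome's own analysis service.

```python
from math import comb


def overrepresentation_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background, K: background genes annotated to the pathway,
    n: genes in the user's list, k: genes in the list that fall in the pathway.
    """
    total = comb(N, n)
    upper = min(n, K)  # the overlap cannot exceed the list or the pathway size
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total
```

For example, drawing 3 genes from a background of 10 that contains a 3-gene pathway gives P(all 3 in the pathway) = 1/120.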
A prototype molecular interactive collaborative environment (MICE).

PubMed

Bourne, P; Gribskov, M; Johnson, G; Moreland, J; Wavra, S; Weissig, H

1998-01-01

Illustrations of macromolecular structure in the scientific literature contain a high level of semantic content through which the authors convey, among other features, the biological function of that macromolecule. We refer to these illustrations as molecular scenes. Such scenes, even if available electronically, are not readily accessible for further interactive interrogation. The basic PDB format does not retain features of the scene; formats like PostScript retain the scene but are not interactive; and the many formats used by individual graphics programs, while capable of reproducing the scene, are not interchangeable and cannot be stored in a database and queried for features of the scene. MICE defines a Molecular Scene Description Language (MSDL) that allows scenes to be stored in a relational database (a molecular scene gallery) and queried. Scenes retrieved from the gallery are rendered in Virtual Reality Modeling Language (VRML) and currently displayed in WebView, a VRML browser modified to support the Virtual Reality Behavior System (VRBS) protocol. VRBS provides communication between multiple client browsers, each capable of manipulating the scene. This level of collaboration works well over standard Internet connections and holds promise for collaborative research at a distance and for distance learning. Further, via VRBS, the VRML world can be used as a visual cue to trigger an application such as a remote MEME search. MICE is very much a work in progress. Current work seeks to replace WebView with Netscape, Cosmoplayer (a standard VRML plug-in), and a Java-based console. The console consists of a generic kernel suitable for multiple collaborative applications and additional application-specific controls. Further details of the MICE project are available at http://mice.sdsc.edu.
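The key point above is that a scene stored relationally can be queried by its features. The sketch below shows that idea with SQLite. The two-table schema, feature names and PDB codes are hypothetical placeholders and do not represent MSDL itself.

```python
import sqlite3


def build_gallery(conn):
    """Create a minimal, hypothetical scene gallery: one row per scene plus its features."""
    conn.executescript("""
        CREATE TABLE scene (id INTEGER PRIMARY KEY, pdb_id TEXT, caption TEXT);
        CREATE TABLE feature (scene_id INTEGER REFERENCES scene(id), kind TEXT, value TEXT);
    """)


def add_scene(conn, pdb_id, caption, features):
    """Store a scene and its (kind, value) features; return the new scene id."""
    cur = conn.execute("INSERT INTO scene (pdb_id, caption) VALUES (?, ?)", (pdb_id, caption))
    conn.executemany("INSERT INTO feature (scene_id, kind, value) VALUES (?, ?, ?)",
                     [(cur.lastrowid, kind, value) for kind, value in features])
    return cur.lastrowid


def scenes_with(conn, kind, value):
    """Structures whose stored scenes carry a given feature."""
    rows = conn.execute(
        "SELECT DISTINCT s.pdb_id FROM scene s JOIN feature f ON f.scene_id = s.id "
        "WHERE f.kind = ? AND f.value = ? ORDER BY s.pdb_id", (kind, value))
    return [row[0] for row in rows]
```

Keeping features in their own table means new kinds of scene properties can be queried without changing the schema.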
Wikidata as a semantic framework for the Gene Wiki initiative.

PubMed

Burgstaller-Muehlbacher, Sebastian; Waagmeester, Andra; Mitraka, Elvira; Turner, Julia; Putman, Tim; Leong, Justin; Naik, Chinmay; Pavlidis, Paul; Schriml, Lynn; Good, Benjamin M; Su, Andrew I

2016-01-01

Open biological data are distributed over many resources, which makes them challenging to integrate, update, and disseminate quickly. Wikidata is a growing, open community database that can serve this purpose and also provides tight integration with Wikipedia. To improve the state of biological data and to facilitate data management and dissemination, we imported all human and mouse genes and all human and mouse proteins into Wikidata. In total, 59,721 human genes and 73,355 mouse genes were imported from NCBI, and 27,306 human proteins and 16,728 mouse proteins were imported from the Swiss-Prot subset of UniProt. As Wikidata is open and can be edited by anybody, our corpus of imported data serves as the starting point for the integration of further data by scientists, the Wikidata community, and citizen scientists alike. The first use case for these data is to populate Wikipedia Gene Wiki infoboxes directly from Wikidata with the integrated data. This enables immediate updates of the Gene Wiki infoboxes as soon as the data in Wikidata are modified. Although Gene Wiki pages currently exist only in the English-language Wikipedia, the multilingual nature of Wikidata allows the imported data to be used in all 280 language editions of Wikipedia. Apart from the Gene Wiki infobox use case, a SPARQL endpoint and export functionality to several standard formats (e.g. JSON, XML) enable use of the data by scientists. In summary, we created a fully open and extensible data resource for human and mouse molecular biology and biochemistry data. This resource enriches all the Wikipedias with structured information and serves as a new linking hub for the biological semantic web. Database URL: https://www.wikidata.org/. © The Author(s) 2016. Published by Oxford University Press.
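The sketch below builds a SPARQL request that looks up Wikidata gene items by NCBI Entrez Gene ID, as a way to use the SPARQL endpoint mentioned above. It assumes the public query service at query.wikidata.org and the identifiers P351 (Entrez Gene ID), P703 (found in taxon) and Q15978631 (Homo sapiens). Check these against Wikidata before relying on them. The code only constructs the URL and performs no network call.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
HOMO_SAPIENS = "Q15978631"  # assumed Wikidata item for Homo sapiens; verify before use


def gene_query(entrez_ids, taxon_qid=HOMO_SAPIENS):
    """SPARQL that looks up gene items by Entrez Gene ID (assumed property P351)."""
    values = " ".join(f'"{gene_id}"' for gene_id in entrez_ids)
    return (
        "SELECT ?gene ?geneLabel ?entrez WHERE {\n"
        f"  VALUES ?entrez {{ {values} }}\n"
        "  ?gene wdt:P351 ?entrez ;\n"
        f"        wdt:P703 wd:{taxon_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


def query_url(sparql):
    """GET URL asking the query service for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})
```

The resulting URL can be fetched with any HTTP client, ideally sending a descriptive User-Agent header. Swapping `taxon_qid` for the mouse taxon item would query the imported mouse genes instead.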
Virtual screening of B-Raf kinase inhibitors: A combination of pharmacophore modelling, molecular docking, 3D-QSAR model and binding free energy calculation studies.

PubMed

Zhang, Wen; Qiu, Kai-Xiong; Yu, Fang; Xie, Xiao-Guang; Zhang, Shu-Qun; Chen, Ya-Juan; Xie, Hui-Ding

2017-10-01

B-Raf kinase has been identified as an important target in recent cancer treatment. To discover structurally diverse and novel B-Raf inhibitors (BRIs), we performed a virtual screening of the ZINC database for BRIs using a combination of pharmacophore modelling, molecular docking, a 3D-QSAR model, and binding free energy (ΔGbind) calculations. The virtual screening yielded six promising hit compounds, which were then tested for inhibitory activity against the A375 cell line. Five of these hits showed good biological activity (IC50 < 50 μM). This virtual screening method can be applied to find structurally diverse inhibitors, and the five structurally diverse compounds obtained are expected to lead to novel BRIs. Copyright © 2017. Published by Elsevier Ltd.
BioMAJ: a flexible framework for databanks synchronization and processing.

PubMed

Filangi, Olivier; Beausse, Yoann; Assi, Anthony; Legrand, Ludovic; Larré, Jean-Marc; Martin, Véronique; Collin, Olivier; Caron, Christophe; Leroy, Hugues; Allouche, David

2008-08-15

Large- and medium-scale computational molecular biology projects require accurate bioinformatics software and numerous heterogeneous biological databanks, which are distributed around the world. BioMAJ provides a flexible, robust, fully automated environment for managing such massive amounts of data. The Java application automates the data update cycle and supervises the locally mirrored data repository. We have developed workflows that handle some of the most commonly used bioinformatics databases. A set of scripts is also available for post-synchronization data processing, such as indexing or format conversion (for NCBI BLAST, SRS, EMBOSS, GCG, etc.). BioMAJ can easily be extended with custom processing scripts. Source history can be kept via HTML reports containing statements of the locally managed databanks. http://biomaj.genouest.org. BioMAJ is free, open-source software, available under the CeCILL version 2 license.
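The update cycle described above follows a simple pattern:
1. Compare the remote release with the locally mirrored one.
2. Download only when they differ.
3. Run post-processing steps such as indexing or format conversion.
4. Record the new release only after everything succeeded.

The sketch below illustrates that cycle. BioMAJ itself is a Java application with its own configuration. `Bank`, `update_cycle` and the callback signatures are hypothetical names, not BioMAJ's API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

PostStep = Callable[[str, str], None]  # (bank name, release) -> None


@dataclass
class Bank:
    name: str
    local_release: Optional[str] = None          # release currently mirrored locally
    post_process: List[PostStep] = field(default_factory=list)


def update_cycle(bank: Bank, remote_release: str, download: PostStep) -> bool:
    """Mirror a newer remote release, then run post-processing; return True if updated."""
    if remote_release == bank.local_release:
        return False                             # already up to date, nothing to do
    download(bank.name, remote_release)          # fetch files for the new release
    for step in bank.post_process:               # e.g. indexing or format conversion
        step(bank.name, remote_release)
    bank.local_release = remote_release          # record success only after all steps ran
    return True
```

If `download` or any post-processing step raises, `local_release` is left unchanged. The next cycle therefore retries the same release instead of treating a half-processed mirror as current.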
Bioinformatics approach reveals systematic mechanism underlying lung adenocarcinoma.

PubMed

Wu, Xiya; Zhang, Wei; Hu, Yunhua; Yi, Xianghua

2015-01-01

The purpose of this work was to explore the systematic molecular mechanism of lung adenocarcinoma and gain deeper insight into it. Comprehensive bioinformatics methods were applied. First, significant differentially expressed genes (DEGs) were identified from the Affymetrix microarray data (GSE27262) deposited in the Gene Expression Omnibus (GEO). Next, gene ontology (GO) analysis was performed using the online Database for Annotation, Visualization and Integrated Discovery (DAVID). Finally, significant pathway crosstalk was investigated based on information derived from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. According to our results, the N-terminal globular domain of the type X collagen (COL10A1) gene and the transmembrane protein 100 (TMEM100) gene were identified as the most significant DEGs in tumor tissue compared with the adjacent normal tissues. The main GO categories were biological process, cellular component, and molecular function. In addition, the crosstalk was significantly different between non-small cell lung cancer pathways and the inositol phosphate metabolism pathway, focal adhesion signal pathway, vascular smooth muscle contraction signal pathway, peroxisome proliferator-activated receptor (PPAR) signaling pathway, and calcium signaling pathway in tumor. Dysfunctional genes and pathways may play key roles in the progression and development of lung adenocarcinoma. Our data provide a systematic perspective for understanding this mechanism and may be helpful in discovering an effective treatment for lung adenocarcinoma.
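Selecting significant DEGs usually combines a fold-change cutoff with a multiple-testing correction of the per-gene p-values. The abstract does not state the thresholds used. The sketch below assumes Benjamini-Hochberg adjustment with |log2 fold change| ≥ 1 and adjusted p < 0.05 as illustrative defaults.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, enforcing monotonicity of the adjusted values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, log2fc, pvalues, fc_cutoff=1.0, q_cutoff=0.05):
    """Genes passing both the fold-change and the adjusted p-value thresholds (assumed defaults)."""
    q = benjamini_hochberg(pvalues)
    return [g for g, fc, qi in zip(genes, log2fc, q)
            if abs(fc) >= fc_cutoff and qi < q_cutoff]
```

In the test, gene B has a raw p-value of 0.04 but fails after adjustment. This is the kind of false positive the correction is meant to remove when thousands of probes are tested at once.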
Guide RNA selection for CRISPR-Cas9 transfections in Plasmodium falciparum.

PubMed

Ribeiro, Jose M; Garriga, Meera; Potchen, Nicole; Crater, Anna K; Gupta, Ankit; Ito, Daisuke; Desai, Sanjay A

2018-06-12

CRISPR-Cas9-mediated genome editing is addressing key limitations in the transfection of malaria parasites. While this method has already simplified the required molecular cloning and reduced the time needed to generate mutants in the human pathogen Plasmodium falciparum, optimal selection of guide RNAs and guidelines for successful transfections have not been well characterized, leading workers to use time-consuming trial-and-error approaches. We used a genome-wide computational approach to create a comprehensive and publicly accessible database of possible guide RNA sequences in the P. falciparum genome. For each guide, we report on-target efficiency and specificity scores, as well as information about the genomic site relevant to the optimal design of CRISPR-Cas9 transfections to modify, disrupt, or conditionally knock down any gene. As many antimalarial drug and vaccine targets are encoded by multigene families, we also developed a new paralog specificity score that should facilitate modification of either a single family member of interest or multiple paralogs that serve overlapping roles. Finally, we tabulated features of successful transfections in our laboratory, providing broadly useful guidelines for parasite transfections. Molecular studies aimed at understanding parasite biology or characterizing drug and vaccine targets in P. falciparum should be facilitated by this comprehensive database. Published by Elsevier Ltd.
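The first step in building a genome-wide guide database is enumerating candidate guide RNAs. For SpCas9, each candidate is a 20-nt protospacer immediately 5' of an NGG PAM, on either strand. The sketch below shows only that enumeration. It assumes SpCas9 and a 20-nt length and does not compute the on-target, specificity or paralog scores described above.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_guides(seq, length=20):
    """Candidate SpCas9 protospacers: `length` nt directly 5' of an NGG PAM, both strands.

    Returns (strand, start on the + strand, protospacer 5'->3', PAM) tuples.
    """
    seq = seq.upper()
    # lookahead so that overlapping candidates are all reported
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % length)
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for m in pattern.finditer(s):
            start = m.start()
            # map minus-strand hits back to + strand coordinates
            pos = start if strand == "+" else len(seq) - (start + length)
            hits.append((strand, pos, m.group(1), s[start + length:start + length + 3]))
    return hits
```

Scoring and off-target searches would then be run on each returned protospacer. In an AT-rich genome such as P. falciparum's, NGG sites are relatively sparse, so this enumeration step can already restrict the options near a target site.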
A Systems Biology-Based Investigation into the Pharmacological Mechanisms of Sheng-ma-bie-jia-tang Acting on Systemic Lupus Erythematosus by Multi-Level Data Integration.

PubMed

Huang, Lin; Lv, Qi; Liu, Fenfen; Shi, Tieliu; Wen, Chengping

2015-11-12

Sheng-ma-bie-jia-tang (SMBJT) is a Traditional Chinese Medicine (TCM) formula that is widely used for the treatment of Systemic Lupus Erythematosus (SLE) in China. However, the molecular mechanism behind this formula remains unknown. Here, we systematically analyzed the targets of the ingredients in SMBJT to evaluate its potential molecular mechanism. First, we collected 1,267 targets from our previously published database, the Traditional Chinese Medicine Integrated Database (TCMID). Next, we conducted gene ontology and pathway enrichment analyses for these targets and determined that they were enriched in metabolism (amino acids, fatty acids, etc.) and signaling pathways (chemokines, Toll-like receptors, adipocytokines, etc.). Ninety-six targets that are known SLE disease proteins were identified as essential targets, and the remaining 1,171 targets were defined as common targets of this formula. The essential targets directly interacted with SLE disease proteins. In addition, some common targets also had essential connections to both key targets and SLE disease proteins in enriched signaling pathways, e.g. the Toll-like receptor signaling pathway. We also found distinct functions of essential and common targets in immune system processes. This multi-level approach to deciphering the underlying mechanism of SMBJT in the treatment of SLE offers a new perspective that will further our understanding of TCM formulas.
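The split into essential and common targets is a set operation. Essential targets are formula targets that are also SLE disease proteins, and the rest are common (96 + 1,171 = 1,267). The sketch below performs the split and flags common targets that connect to both groups in a protein-protein interaction network. The "bridging" definition used here is an assumption: a PPI neighbour among the essential targets and another among disease proteins not targeted directly. It is not necessarily the authors' criterion, and all names are hypothetical.

```python
def classify_targets(targets, disease_proteins, ppi_edges):
    """Split formula targets into essential/common and flag bridging common targets."""
    essential = targets & disease_proteins        # targets that are known disease proteins
    common = targets - disease_proteins           # all remaining targets
    other_disease = disease_proteins - targets    # disease proteins not hit directly
    neighbours = {}
    for u, v in ppi_edges:                        # undirected interaction network
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    bridging = {
        t for t in common
        if neighbours.get(t, set()) & essential and neighbours.get(t, set()) & other_disease
    }
    return essential, common, bridging
```

Because essential and common targets partition the target set, their sizes always add up to the number of targets collected.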
Genetic diversity of Histoplasma and Sporothrix complexes based on sequences of their ITS1-5.8S-ITS2 regions from the BOLD System.

PubMed

Estrada-Bárcenas, Daniel Alfonso; Vite-Garín, Tania; Navarro-Barranco, Hortensia; de la Torre-Arciniega, Raúl; Pérez-Mejía, Amelia; Rodríguez-Arellanes, Gabriela; Ramirez, Jose Antonio; Humberto Sahaza, Jorge; Taylor, Maria Lucia; Toriello, Conchita

2014-01-01

The high sensitivity and specificity of molecular biology techniques have proven useful for the detection, identification, and typing of different pathogens. The ITS (Internal Transcribed Spacer) regions of the ribosomal DNA are highly conserved non-coding regions and have been widely used in different studies, including the determination of the genetic diversity of human fungal pathogens. This article aims to contribute to the understanding of the intra- and interspecific genetic diversity of isolates of the Histoplasma capsulatum and Sporothrix schenckii species complexes through an analysis of the available ITS sequences from different sequence databases. ITS1-5.8S-ITS2 sequences of each fungus, either deposited in GenBank or generated by our research groups (registered in the Fungi Barcode of Life Database), were analyzed using the maximum likelihood (ML) method. ML analysis of the ITS sequences discriminated isolates from distant geographic origins and particular wild hosts, depending on the fungal species analyzed. This manuscript is part of the series of works presented at the "V International Workshop: Molecular genetic approaches to the study of human pathogenic fungi" (Oaxaca, Mexico, 2012). Copyright © 2013 Revista Iberoamericana de Micología. Published by Elsevier España. All rights reserved.
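The authors used maximum likelihood, which requires a substitution model and tree search. A much simpler way to summarise diversity among aligned ITS sequences is the pairwise p-distance: the fraction of compared sites that differ. The sketch below computes p-distances as a plainly swapped-in, simpler measure, not the ML analysis itself. It skips gaps and ambiguity codes.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguity code are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            differences += x != y
    return differences / compared if compared else float("nan")


def distance_matrix(aligned):
    """Pairwise p-distances for a {name: aligned_sequence} mapping."""
    names = sorted(aligned)
    return {(i, j): p_distance(aligned[i], aligned[j]) for i in names for j in names}
```

A matrix like this can feed a quick distance-based tree, but it ignores multiple substitutions at the same site. That limitation is one reason model-based methods such as ML are preferred for phylogenetic inference.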
Databases for Microbiologists

DOE PAGES

Zhulin, Igor B.

2015-05-26

Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it is becoming increasingly difficult to keep up with their development. The purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists.
Databases for Microbiologists

PubMed Central

2015-01-01

Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. The purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists. PMID:26013493
Venkat Subramanian | NREL

Science.gov Websites

Venkataramanan Subramanian, Researcher IV-Molecular … for production of biofuels and bioproducts. Areas of Expertise: Molecular biology and biotechnology. … Molecular Biology and Biotechnology, University of Cincinnati, 2008. M.S., Molecular Biology, University of …
An attempt to understand glioma stem cell biology through centrality analysis of a protein interaction network.

PubMed

Mallik, Mrinmay Kumar

2018-02-07

Biological networks can be analyzed using "Centrality Analysis" to identify the more influential nodes and interactions in the network. This study was undertaken to create and visualize a biological network of protein-protein interactions (PPIs) among proteins that are preferentially over-expressed in the glioma cancer stem cell component (GCSC) of glioblastomas compared with the glioma non-stem cancer cell (GNSC) component. The network was then analyzed through centrality analyses (CA) to identify its essential proteins and their interactions. In addition, this study proposes a new centrality analysis method that applies exclusively to transcription factors (TFs) and the interactions among them. Moreover, the relevant molecular functions, biological processes and biochemical pathways among these proteins were sought through enrichment analysis. A protein interaction network was created from a list of proteins shown to be preferentially expressed or over-expressed in GCSCs isolated from glioblastomas compared with GNSCs. This list of 38 proteins, compiled by manual literature mining, was submitted to the Reactome FIViz tool to produce the network. FIViz is a web-based application integrated into Cytoscape, an open-source software platform for visualizing and analyzing molecular interaction networks and biological pathways. The network was subjected to centrality analyses using ranked lists of six centrality measures from the FIViz application and (for the first time) a dedicated centrality analysis plug-in, CytoNCA. The interactions exclusively among the transcription factors were analyzed through a newly proposed centrality analysis method called "Gene Expression Associated Degree Centrality Analysis (GEADCA)". Enrichment analysis was performed using the "network function analysis" tool on Reactome. 
The CA identified a small set of proteins with consistently high centrality ranks, indicating their strong influence in the protein-protein interaction network. Similarly, the newly proposed GEADCA helped identify transcription factors with high centrality values, indicating their key roles in transcriptional regulation. The enrichment studies provided a list of molecular functions, biological processes and biochemical pathways associated with the constructed network. The study shows how pathway-based databases may be used to create and analyze a relevant protein interaction network in glioma cancer stem cells. It also shows how to identify the essential elements of that network to gain insights into the molecular interactions that regulate the properties of glioma stem cells. How these insights may guide future research toward new management strategies is discussed from a theoretical standpoint. Copyright © 2017 Elsevier Ltd. All rights reserved.
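Degree centrality is the simplest centrality measure and the basis of the GEADCA name. It counts how many interaction partners each protein has. The following is a minimal sketch in plain Python, assuming an undirected, unweighted interaction list. The node names are hypothetical placeholders, and the functions are not the FIViz, CytoNCA or GEADCA implementations.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality for an undirected, unweighted network.

    Each node's number of distinct neighbors is divided by (n - 1), the
    largest degree possible in a simple graph with n nodes.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # self-interactions do not count toward degree
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


def rank_by_centrality(scores):
    """Node names sorted by descending score, ties broken alphabetically."""
    return sorted(scores, key=lambda node: (-scores[node], node))


if __name__ == "__main__":
    # Hypothetical toy interactions, not the study's 38-protein network.
    toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
    scores = degree_centrality(toy_edges)
    for node in rank_by_centrality(scores):
        print(node, round(scores[node], 2))
```

In the toy network, node A has three partners out of four possible, so it ranks first with a score of 0.75. Other measures such as betweenness or closeness would need shortest-path computations on top of this adjacency structure.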
Molecular biology of pancreatic cancer: how useful is it in clinical practice?

PubMed

Sakorafas, George H; Smyrniotis, Vasileios

2012-07-10

Over the past two decades, dramatic advances in molecular biology have allowed an in-depth understanding of pancreatic carcinogenesis. It is currently accepted that pancreatic cancer has a genetic component. The real challenge now is how these impressive advances can be used in clinical practice. This review aims to critically present currently available data on the clinical application of molecular biology in pancreatic cancer. Reports about the clinical implications of molecular biology in patients with pancreatic cancer were retrieved from PubMed. These reports were selected on the basis of their clinical relevance and the date of their publication (preferentially within the last 5 years). Emphasis was placed on reports investigating diagnostic, prognostic, and therapeutic implications. Molecular biology can be used to identify individuals at high risk of developing pancreatic cancer. Intensive surveillance is indicated in these patients to detect pancreatic neoplasia, ideally at a preinvasive stage, when curative resection is still possible. Molecular biology can also be used to diagnose pancreatic cancer by analyzing samples of biological material. Suitable samples include serum or plasma, duodenal fluid, pure pancreatic juice (preferred), pancreatic cells or tissue, and stools. Molecular indices also have prognostic significance. Finally, molecular biology may have therapeutic implications through various approaches, such as antiangiogenic factors, purine synthesis inhibitors, matrix metalloproteinase inhibitors, factors modulating tumor-stroma interaction, inactivation of the hedgehog pathway, gene therapy, oncolytic viral therapy, and immunotherapy (both passive and active). Molecular biology may have important clinical implications in patients with pancreatic cancer and represents one of the most active areas in cancer research. 
Hopefully, clinical applications of molecular biology in pancreatic cancer will expand in the future, improving the effectiveness of treatment and the prognosis of patients with pancreatic cancer.
Practices and Exploration on Competition of Molecular Biological Detection Technology among Students in Food Quality and Safety Major

ERIC Educational Resources Information Center

Chang, Yaning; Peng, Yuke; Li, Pengfei; Zhuang, Yingping

2017-01-01

As molecular biological detection technology becomes increasingly important in food safety, it is more necessary than ever to strengthen education in molecular biology experimental techniques for students majoring in food quality and safety. However, molecular biology experiments are not always in curricula…
Spectroscopic data for an astronomy database

NASA Technical Reports Server (NTRS)

Parkinson, W. H.; Smith, Peter L.

1995-01-01

Very few of the atomic and molecular data used in analyses of astronomical spectra are currently available in World Wide Web (WWW) databases that are searchable with hypertext browsers. We have begun to rectify this situation by making extensive atomic data files available with simple search procedures. We have also established links to other on-line atomic and molecular databases. All can be accessed from our database homepage at http://cfa-www.harvard.edu/amp/data/amdata.html.

The postgenomic era: implications for the clinical laboratory.

PubMed

Kiechle, Frederick L; Zhang, Xinbo

2002-03-01

To review the advances in clinically useful molecular biological techniques and to identify their applications in clinical practice, as presented at the Tenth Annual William Beaumont Hospital DNA Symposium. The 11 submitted manuscripts were reviewed, and their major findings were compared with the literature on the same topic. The manuscripts address creative thinking techniques applied to DNA discovery, extraction of DNA from clotted blood, the role of mitochondrial dysfunction in neurodegenerative disorders, and molecular methods to identify human leukocyte antigen class I and class II loci. Two other manuscripts review current issues in molecular microbiology, including detection of hepatitis C virus and biological warfare. The last 5 manuscripts describe current issues in molecular cardiovascular disease, including assessing thrombotic risk, genomic analysis, gene therapy, and a device for aiding cardiac angiogenesis. Novel problem-solving techniques have been used in the past and will be required in the future in DNA discovery. The extraction of DNA from clotted blood demonstrates a potentially cost-effective strategy. Cybrids created from mitochondrial DNA-depleted cells and mitochondrial DNA from a platelet donor have been useful in defining the role mitochondria play in neurodegeneration. Mitochondrial depletion has been reported both as a genetically inherited disorder and after human immunodeficiency virus therapy. Hepatitis C viral detection by qualitative, quantitative, or genotyping techniques is clinically useful. Preparedness for potential biological warfare is a responsibility of all clinical laboratorians. Thrombotic risk in cardiovascular disorders may be assessed by coagulation screening assays and further defined by mutation analysis of specific genes for prothrombin and factor V Leiden. Gene therapy for reducing arteriosclerotic risk has been hindered primarily by complications caused by the vectors used to deliver the therapeutic genes. 
Neovascularization in cardiac muscle with occluded vessels represents a promising method for recovery of viable tissue following ischemia. The sequence of the human genome was reported by 2 groups in February 2001. The postgenomic era will emphasize the use of microarrays and database software for genomic and proteomic screening in the search for useful clinical assays. The number of molecular pathologic techniques and assays will expand as additional disease-associated mutations are defined. Gene therapy and tissue engineering will represent successful therapeutic adjuncts.
The NCBI BioSystems database

PubMed Central

Geer, Lewis Y.; Marchler-Bauer, Aron; Geer, Renata C.; Han, Lianyi; He, Jane; He, Siqian; Liu, Chunlei; Shi, Wenyao; Bryant, Stephen H.

2010-01-01

The NCBI BioSystems database, found at http://www.ncbi.nlm.nih.gov/biosystems/, centralizes and cross-links existing biological systems databases, increasing their utility and target audience by integrating their pathways and systems into NCBI resources. This integration allows users of NCBI’s Entrez databases to quickly categorize proteins, genes and small molecules by metabolic pathway, disease state or other BioSystem type, without requiring time-consuming inference of biological relationships from the literature or multiple experimental datasets. PMID:19854944
New tools and methods for direct programmatic access to the dbSNP relational database

PubMed Central

Saccone, Scott F.; Quan, Jiaxi; Mehta, Gaurang; Bolze, Raphael; Thomas, Prasanth; Deelman, Ewa; Tischfield, Jay A.; Rice, John P.

2011-01-01

Genome-wide association studies often incorporate information from public biological databases in order to provide a biological reference for interpreting the results. The dbSNP database is an extensive source of information on single nucleotide polymorphisms (SNPs) for many different organisms, including humans. We have developed free software that will download and install a local MySQL implementation of the dbSNP relational database for a specified organism. We have also designed a system for classifying dbSNP tables in terms of common tasks we wish to accomplish using the database. For each task we have designed a small set of custom tables that facilitate task-related queries and provide entity-relationship diagrams for each task composed from the relevant dbSNP tables. In order to expose these concepts and methods to a wider audience we have developed web tools for querying the database and browsing documentation on the tables and columns to clarify the relevant relational structure. All web tools and software are freely available to the public at http://cgsmd.isi.edu/dbsnpq. Resources such as these for programmatically querying biological databases are essential for viably integrating biological information into genetic association experiments on a genome-wide scale. PMID:21037260
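The idea of task-specific custom tables can be sketched as follows. This is a minimal illustration, assuming an in-memory SQLite database as a stand-in for the MySQL installation. The tables `snp_location`, `snp_gene` and `task_snp_annotation`, and all rows in them, are hypothetical; they do not reflect the actual dbSNP schema or the authors' tools.

```python
import sqlite3


def build_demo_db():
    """In-memory stand-in for a local dbSNP installation.

    Table names, columns and rows are hypothetical; the real dbSNP schema differs.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE snp_location (rs_id INTEGER PRIMARY KEY, chrom TEXT, pos INTEGER);
        CREATE TABLE snp_gene (rs_id INTEGER, gene_symbol TEXT);
    """)
    conn.executemany("INSERT INTO snp_location VALUES (?, ?, ?)",
                     [(1, "1", 1000), (2, "1", 5000), (3, "2", 750)])
    conn.executemany("INSERT INTO snp_gene VALUES (?, ?)",
                     [(1, "GENE_A"), (2, "GENE_B"), (3, "GENE_C")])
    return conn


def build_task_table(conn):
    """Precompute one task-specific table so routine queries avoid repeated joins."""
    conn.execute("""
        CREATE TABLE task_snp_annotation AS
        SELECT l.rs_id, l.chrom, l.pos, g.gene_symbol
        FROM snp_location AS l
        JOIN snp_gene AS g ON g.rs_id = l.rs_id
    """)


def snps_in_region(conn, chrom, start, end):
    """(rs_id, gene_symbol) pairs in [start, end] on one chromosome, ordered by position."""
    cur = conn.execute(
        "SELECT rs_id, gene_symbol FROM task_snp_annotation "
        "WHERE chrom = ? AND pos BETWEEN ? AND ? ORDER BY pos",
        (chrom, start, end),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    build_task_table(conn)
    print(snps_in_region(conn, "1", 0, 10_000))
```

With the demo rows, the region query on chromosome "1" returns `[(1, 'GENE_A'), (2, 'GENE_B')]`. The design choice being illustrated is to pay the join cost once, when the task table is built, so that each analysis query reads from a single table.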
The nucleic acid revolution continues – will forensic biology become forensic molecular biology?

PubMed Central

Gunn, Peter; Walsh, Simon; Roux, Claude

2014-01-01

Molecular biology has evolved far beyond what could have been predicted at the time DNA identity testing was established. Indeed, we should now perhaps be referring to “forensic molecular biology.” Aside from DNA’s established role in identifying the “who” in crime investigations, other developments in medical and developmental molecular biology are now ripe for application to forensic challenges. The impact of DNA methylation and other post-fertilization DNA modifications, plus the emerging role of small RNAs in the control of gene expression, is rewriting our understanding of human biology. It is apparent that these emerging technologies will expand forensic molecular biology to allow for inferences about “when” a crime took place and “what” took place. However, just as the introduction of DNA identity testing engendered many challenges, so the expansion of molecular biology into these domains will raise again the issues of scientific validity, interpretation, probative value, and infringement of personal liberties. This Commentary ponders some of these emerging issues and presents some ideas on how they will affect the conduct of forensic molecular biology in the foreseeable future. PMID:24634675
Future Technology-Driven Revolutions in Military Operations. Results of a Workshop

DTIC Science & Technology

1994-01-01

…sensor missions. • Biomolecular Electronics - The use of techniques from molecular biology and biotechnology to develop new molecular electronic materials, components, and … occurring in molecular biology. … Biotechnology: Molecular Biologists Are Developing "Magical" Capabilities • To synthesize genes (from scratch) with …
New perspectives in toxicological information management, and the role of ISSTOX databases in assessing chemical mutagenicity and carcinogenicity.

PubMed

Benigni, Romualdo; Battistelli, Chiara Laura; Bossa, Cecilia; Tcheremenskaia, Olga; Crettaz, Pierre

2013-07-01

Currently, the public has access to a variety of databases containing mutagenicity and carcinogenicity data. These resources are crucial for toxicologists and regulators involved in the risk assessment of chemicals. Risk assessment requires access to all the relevant literature and the ability to search across toxicity databases using both biological and chemical criteria. Towards the larger goal of screening chemicals for a wide range of toxicity end points of potential interest, publicly available resources across a large spectrum of biological and chemical data space must be effectively harnessed with current and evolving information technologies (i.e. systematised, integrated and mined) if long-term screening and prediction objectives are to be achieved. A key to rapid progress in the field of chemical toxicity databases is combining information technology with the chemical structure as the identifier of the molecules. This permits an enormous range of operations (e.g. retrieving chemicals or chemical classes, describing the content of databases, finding similar chemicals, combining biological and chemical queries, etc.) that more classical databases do not support. This article describes the progress in the technology of toxicity databases, including the concepts of Chemical Relational Database and Toxicological Standardized Controlled Vocabularies (Ontology). It then describes the ISSTOX cluster of toxicological databases at the Istituto Superiore di Sanità, which consists of freely available databases characterised by the use of modern information technologies and by curation of the quality of the biological data. Finally, this article provides examples of analyses and results made possible by ISSTOX.
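Using the chemical structure as identifier enables operations such as finding similar chemicals. One common way to do this, assumed here rather than taken from ISSTOX, encodes each structure as a set of substructure features. The feature sets are then compared with the Tanimoto coefficient (intersection size divided by union size). The feature labels and compound names below are hypothetical; real systems derive fingerprints from structures with a cheminformatics toolkit.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets: size of intersection / size of union."""
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints: treat as no evidence of similarity
    return len(a & b) / len(union)


def most_similar(query, library, threshold=0.5):
    """(name, score) pairs scoring at least `threshold`, best match first."""
    hits = [(name, tanimoto(query, feats)) for name, feats in library.items()]
    hits = [(name, score) for name, score in hits if score >= threshold]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Hypothetical substructure-key fingerprints; names and features are illustrative only.
QUERY = {"aromatic_amine", "nitro", "benzene_ring"}
LIBRARY = {
    "compound_1": {"aromatic_amine", "benzene_ring"},
    "compound_2": {"epoxide"},
    "compound_3": {"aromatic_amine", "nitro", "benzene_ring", "halogen"},
}

if __name__ == "__main__":
    for name, score in most_similar(QUERY, LIBRARY):
        print(name, round(score, 3))
```

Here compound_3 shares three of four features with the query (score 0.75) and compound_1 shares two of three (about 0.667). Both pass the illustrative 0.5 threshold, while compound_2 shares nothing and is dropped.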
The Cologne Database for Molecular Spectroscopy, CDMS, in the Virtual Atomic and Molecular Data Centre, VAMDC

NASA Astrophysics Data System (ADS)

Endres, Christian P.; Schlemmer, Stephan; Schilke, Peter; Stutzki, Jürgen; Müller, Holger S. P.

2016-09-01

The Cologne Database for Molecular Spectroscopy, CDMS, was founded in 1998 to provide, in its catalog section, line lists of mostly molecular species that are or may be observed in various astronomical sources, usually by radio astronomical means. The line lists contain transition frequencies with qualified accuracies, intensities, quantum numbers, and further auxiliary information. They have been generated from critically evaluated experimental line lists, mostly from laboratory experiments, employing established Hamiltonian models. Separate entries exist for different isotopic species and usually also for different vibrational states. As of December 2015, the number of entries is 792. They are available online as ASCII tables with additional files documenting information on the entries. The Virtual Atomic and Molecular Data Centre, VAMDC, was founded more than 5 years ago as a common platform for atomic and molecular data. This platform facilitates exchange not only between spectroscopic databases related to astrophysics or astrochemistry, but also with collisional and kinetic databases. A dedicated infrastructure was developed to provide a common data format across the various databases, enabling queries to a large variety of atomic and molecular databases at once. For CDMS, the incorporation into VAMDC was combined with several modifications to the generation of CDMS catalog entries. Here we introduce the related changes to the data structure and the data content in the CDMS. The new data scheme accommodates all previous data entries and, in addition, allows us to include entries based on new theoretical descriptions. Moreover, the CDMS entries have been transferred into a MySQL database format. 
These developments within the VAMDC framework have in part been driven by the needs of the astronomical community to be able to deal efficiently with large data sets obtained with the Herschel Space Telescope or, more recently, with the Atacama Large Millimeter Array.
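A common data format lets one query go to many databases and the answers be merged without per-database conversion. The sketch below illustrates only that idea, in Python, under stated assumptions. The `Transition` record, the demo sources and all frequencies are hypothetical. Nothing here reproduces the actual VAMDC standards or protocols, or the CDMS catalog layout.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Transition:
    """A line-list entry in a shared format (hypothetical, minimal fields)."""
    species: str
    frequency_mhz: float
    source: str


# A data source maps a species name to its transitions in the shared format.
# Real VAMDC nodes are remote services; plain functions stand in for them here.
Source = Callable[[str], Iterable[Transition]]


def federated_query(species: str, sources: Dict[str, Source],
                    fmin: float, fmax: float) -> List[Transition]:
    """Send the same query to every source and merge hits inside [fmin, fmax]."""
    hits = []
    for fetch in sources.values():
        hits.extend(t for t in fetch(species) if fmin <= t.frequency_mhz <= fmax)
    return sorted(hits, key=lambda t: t.frequency_mhz)


def demo_source_1(species: str) -> List[Transition]:
    # Made-up values for illustration only.
    table = {"species_A": [1500.0, 5000.0]}
    return [Transition(species, f, "demo_source_1") for f in table.get(species, [])]


def demo_source_2(species: str) -> List[Transition]:
    # Made-up values for illustration only.
    table = {"species_A": [1200.0]}
    return [Transition(species, f, "demo_source_2") for f in table.get(species, [])]


if __name__ == "__main__":
    sources = {"db1": demo_source_1, "db2": demo_source_2}
    for t in federated_query("species_A", sources, 1000.0, 2000.0):
        print(t.source, t.frequency_mhz)
```

Because both sources emit the same record type, the merge step is a single filter and sort. Without a shared format, each source would need its own conversion code inside `federated_query`.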
Databases and coordinated research projects at the IAEA on atomic processes in plasmas

DOE Office of Scientific and Technical Information (OSTI.GOV)

Braams, Bastiaan J.; Chung, Hyun-Kyung

2012-05-25

The Atomic and Molecular Data Unit at the IAEA works with a network of national data centres to encourage and coordinate production and dissemination of fundamental data for atomic, molecular and plasma-material interaction (A+M/PMI) processes that are relevant to the realization of fusion energy. The Unit maintains numerical and bibliographical databases and has started a Wiki-style knowledge base. The Unit also contributes to A+M database interface standards and provides a search engine that offers a common interface to multiple numerical A+M/PMI databases. Coordinated Research Projects (CRPs) bring together fusion energy researchers and atomic, molecular and surface physicists for joint work towards the development of new data and new methods. The databases and current CRPs on A+M/PMI processes are briefly described here.
Assessment of the health effects of chemicals in humans: II. Construction of an adverse effects database for QSAR modeling.

PubMed

Matthews, Edwin J; Kruhlak, Naomi L; Weaver, James L; Benz, R Daniel; Contrera, Joseph F

2004-12-01

The FDA's Spontaneous Reporting System (SRS) database contains over 1.5 million adverse drug reaction (ADR) reports for 8620 drugs/biologics that are listed for 1191 Coding Symbols for Thesaurus of Adverse Reaction (COSTAR) terms of adverse effects. We have linked the trade names of the drugs to 1861 generic names and retrieved molecular structures for each chemical to obtain a set of 1515 organic chemicals that are suitable for modeling with commercially available QSAR software packages. ADR report data for 631 of these compounds were extracted and pooled for the first five years that each drug was marketed. Patient exposure was estimated during this period using pharmaceutical shipping units obtained from IMS Health. Significant drug effects were identified using a Reporting Index (RI), where RI = (number of ADR reports / number of shipping units) × 1,000,000. MCASE/MC4PC software was used to identify the optimal conditions for defining a significant adverse effect finding. Results suggest that a significant effect in our database is characterized by ≥ 4 ADR reports and ≥ 20,000 shipping units during five years of marketing, and an RI ≥ 4.0. Furthermore, for a test chemical to be evaluated as active, it must contain a statistically significant molecular structural alert, called a decision alert, in two or more toxicologically related endpoints. We also report the use of a composite module, which pools observations from two or more toxicologically related COSTAR term endpoints to provide signal enhancement for detecting adverse effects.
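The Reporting Index and the three significance criteria stated in this abstract are simple enough to encode directly. The Python sketch below takes its thresholds from the abstract. The function names and the example counts are hypothetical and are not taken from the SRS data or from the MCASE/MC4PC software.

```python
# Thresholds as stated in the abstract (five-year marketing window).
MIN_REPORTS = 4
MIN_SHIPPING_UNITS = 20_000
MIN_RI = 4.0


def reporting_index(adr_reports: int, shipping_units: int) -> float:
    """RI = (ADR reports / shipping units) x 1,000,000."""
    if shipping_units <= 0:
        raise ValueError("shipping_units must be positive")
    # Multiply before dividing so integer inputs are not rounded prematurely.
    return adr_reports * 1_000_000 / shipping_units


def is_significant(adr_reports: int, shipping_units: int) -> bool:
    """Apply all three criteria for a significant adverse effect finding."""
    return (adr_reports >= MIN_REPORTS
            and shipping_units >= MIN_SHIPPING_UNITS
            and reporting_index(adr_reports, shipping_units) >= MIN_RI)


if __name__ == "__main__":
    # Hypothetical counts, not data from the SRS database.
    for reports, units in [(12, 1_500_000), (3, 100_000), (5, 2_000_000)]:
        print(reports, units, reporting_index(reports, units), is_significant(reports, units))
```

For instance, 12 reports over 1,500,000 shipping units give RI = 12 / 1,500,000 × 1,000,000 = 8.0, which meets all three criteria. Three reports over 100,000 units give a high RI (30.0) but fail the minimum report count. Five reports over 2,000,000 units give RI = 2.5, which falls below the RI cutoff.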
Database of traditional Chinese medicine and its application to studies of mechanism and to prescription validation.

PubMed

Chen, X; Zhou, H; Liu, Y B; Wang, J F; Li, H; Ung, C Y; Han, L Y; Cao, Z W; Chen, Y Z

2006-12-01

Traditional Chinese Medicine (TCM) is widely practised and is viewed as an attractive alternative to conventional medicine. Quantitative information about TCM prescriptions, constituent herbs and herbal ingredients is necessary for studying and exploring TCM. We manually collected information on TCM from books and other printed sources in Medline. The Traditional Chinese Medicine Information Database, TCM-ID, at http://tcm.cz3.nus.edu.sg/group/tcm-id/tcmid.asp, was introduced to provide comprehensive information about all aspects of TCM. This includes prescriptions, constituent herbs, herbal ingredients, the molecular structure and functional properties of active ingredients, therapeutic and side effects, clinical indication and application, and related matters. TCM-ID currently contains information for 1,588 prescriptions, 1,313 herbs, 5,669 herbal ingredients, and the 3D structures of 3,725 herbal ingredients. The value of the data in TCM-ID was illustrated by using some of the data for an in-silico study of the molecular mechanism of the therapeutic effects of herbal ingredients and for developing a computer program to validate TCM multi-herb preparations. The development of systems biology has led to a new design principle for therapeutic intervention strategies: the concept of 'magic shrapnel' (rather than the 'magic bullet'), in which many drugs against multiple targets are administered in a single treatment. TCM offers an extensive source of examples of this concept, in which several active ingredients in one prescription are aimed at numerous targets and work together to provide therapeutic benefit. The database and its mining applications described here represent early efforts toward exploring TCM for new theories in drug discovery.
Message from the ISCB: ISCB Ebola award for important future research on the computational biology of Ebola virus.

PubMed

Karp, Peter D; Berger, Bonnie; Kovats, Diane; Lengauer, Thomas; Linial, Michal; Sabeti, Pardis; Hide, Winston; Rost, Burkhard

2015-02-15

Speed is of the essence in combating Ebola; thus, computational approaches should form a significant component of Ebola research. As for the development of any modern drug, computational biology is uniquely positioned to contribute through comparative analysis of the genome sequences of Ebola strains and three-dimensional protein modeling. Other computational approaches to Ebola may include large-scale docking studies of Ebola proteins with human proteins and with small-molecule libraries, computational modeling of the spread of the virus, computational mining of the Ebola literature and creation of a curated Ebola database. Taken together, such computational efforts could significantly accelerate traditional scientific approaches. In recognition of the need for important and immediate solutions from the field of computational biology against Ebola, the International Society for Computational Biology (ISCB) announces a prize for an important computational advance in fighting the Ebola virus. ISCB will confer the ISCB Fight against Ebola Award, along with a prize of US$2000, at its July 2016 annual meeting (ISCB Intelligent Systems for Molecular Biology 2016, Orlando, FL). dkovats@iscb.org or rost@in.tum.de. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Assessment of Knowledge of Participants on Basic Molecular Biology Techniques after 5-Day Intensive Molecular Biology Training Workshops in Nigeria

ERIC Educational Resources Information Center

Yisau, J. I.; Adagbada, A. O.; Bamidele, T.; Fowora, M.; Brai, B. I. C.; Adebesin, O.; Bamidele, M.; Fesobi, T.; Nwaokorie, F. O.; Ajayi, A.; Smith, S. I.

2017-01-01

The deployment of molecular biology techniques for diagnosis and research in Nigeria is faced with a number of challenges, including the cost of equipment and reagents coupled with the dearth of personnel skilled in the procedures and handling of equipment. Short molecular biology training workshops were conducted at the Nigerian Institute of…
A Comprehensive Experiment for Molecular Biology: Determination of Single Nucleotide Polymorphism in Human REV3 Gene Using PCR-RFLP

ERIC Educational Resources Information Center

Zhang, Xu; Shao, Meng; Gao, Lu; Zhao, Yuanyuan; Sun, Zixuan; Zhou, Liping; Yan, Yongmin; Shao, Qixiang; Xu, Wenrong; Qian, Hui

2017-01-01

Laboratory exercises help medical students understand the basic principles of molecular biology and learn about its practical applications. We have designed a molecular biology lab course on the determination of a single nucleotide polymorphism (SNP) in the human REV3 gene, the product of which is a subunit of…
Virtual Atomic and Molecular Data Center (VAMDC) and Stark-B Database

NASA Astrophysics Data System (ADS)

Dimitrijevic, M. S.; Sahal-Brechot, S.; Kovacevic, A.; Jevremovic, D.; Popovic, L. C.; VAMDC Consortium; Dubernet, Marie-Lise

2012-01-01

Virtual Atomic and Molecular Data Center (VAMDC) is a European FP7 project that aims to build a flexible and interoperable e-science environment-based interface to existing atomic and molecular data. VAMDC will build on the expertise of existing atomic and molecular databases, data producers and service providers. Its specific aim is to create an infrastructure that is easily tuned to the requirements of a wide variety of users in academic, governmental, industrial or public communities. VAMDC will also include the STARK-B database, which contains Stark broadening parameters for a large number of lines. These parameters were obtained by the semiclassical perturbation method over more than 30 years of collaboration between two of the authors of this work (MSD and SSB) and their co-workers. In this contribution we review the VAMDC project and the STARK-B database and discuss the benefits of both for data users.
A National Comparison of Biochemistry and Molecular Biology Capstone Experiences

ERIC Educational Resources Information Center

Aguanno, Ann; Mertz, Pamela; Martin, Debra; Bell, Ellis

2015-01-01

Recognizing the increasingly integrative nature of the molecular life sciences, the "American Society for Biochemistry and Molecular Biology" (ASBMB) recommends that Biochemistry and Molecular Biology (BMB) programs develop curricula based on concepts, content, topics, and expected student outcomes, rather than courses. To that end,…
Transcriptome analysis of duck liver and identification of differentially expressed transcripts in response to duck hepatitis A virus genotype C infection.

PubMed

Tang, Cheng; Lan, Daoliang; Zhang, Huanrong; Ma, Jing; Yue, Hua

2013-01-01

The duck is an economically important poultry species and an animal model for human viral hepatitis B. However, the molecular mechanisms underlying host-virus interaction remain unclear because of limited information on the duck genome. This study aims to characterize the normal duck liver transcriptome and to identify the transcripts differentially expressed 24 h after duck hepatitis A virus genotype C (DHAV-C) infection using Illumina-Solexa sequencing. After removal of low-quality sequences and assembly, a total of 52,757 unigenes were obtained from the normal liver group. Further BLAST analysis showed that 18,918 unigenes matched known genes in the database. GO analysis classified 25,116 unigenes into 61 categories of biological processes, cellular components, and molecular functions. Among the 25 clusters of orthologous groups (COG) categories, the cluster for "General function prediction only" represented the largest group, followed by "Transcription" and "Replication, recombination, and repair." KEGG analysis showed that 17,628 unigenes were involved in 301 pathways. By comparing normal and infected transcriptome data, we identified 20 significantly differentially expressed unigenes, which were further confirmed by real-time polymerase chain reaction. Of the 20 unigenes, nine matched known genes in the database, including three up-regulated genes (virus replicase polyprotein, LRRC3B, and PCK1) and six down-regulated genes (CRP, AICL-like 2, L1CAM, CYB26A1, CHAC1, and ADAM32). The remaining 11 novel unigenes, which did not match any known genes in the database, may provide a basis for discovering new transcripts associated with infection. This study provides a gene expression pattern for normal duck liver and reveals previously unrecognized changes in gene transcription during DHAV-C infection. 
Our data revealed useful information for future studies on the duck genome and provided new insights into the molecular mechanism of host-DHAV-C interaction.
Systematic revision of the adeleid haemogregarines, with creation of Bartazoon n. g., reassignment of Hepatozoon argantis Garnham, 1954 to Hemolivia, and molecular data on Hemolivia stellata

PubMed Central

Karadjian, Grégory; Chavatte, Jean-Marc; Landau, Irène

2015-01-01

Life cycles and molecular data for terrestrial haemogregarines are reviewed in this article. Collection material was re-examined. Hepatozoon argantis Garnham, 1954 in Argas brumpti was reassigned to Hemolivia as Hemolivia argantis (Garnham, 1954) n. comb. Parasite DNA was extracted from a tick crush on a smear from an archived slide of Hemolivia stellata in Amblyomma rotondatum, and the 18S ssrRNA gene was then amplified by PCR. A systematic revision of the group is proposed, based on biological life cycles and phylogenetic reconstruction. Four types of life cycles are defined, based on the parasite vector, the vertebrate host and the characteristics of their development. We propose grouping species, on the basis of their biology, into four groups (types I, II, III and IV). The characters of each type are defined, and each type is associated with a different type genus and type species. The phylogenetic reconstruction using sequences deposited in the databases and our own new sequence of Hemolivia stellata is consistent with this classification. The classification is as follows: Type I, Hepatozoon Miller, 1908, type species H. perniciosum Miller, 1908; Type II, Karyolysus Labbé, 1894, type species K. lacertae (Danilewsky, 1886) Reichenow, 1913; Type III, Hemolivia Petit et al., 1990, type species H. stellata Petit et al., 1990; and Type IV, Bartazoon n. g., type species B. breinli (Mackerras, 1960). PMID:26551414
An Overview of the Challenges in Designing, Integrating, and Delivering BARD: A Public Chemical-Biology Resource and Query Portal for Multiple Organizations, Locations, and Disciplines.

PubMed

de Souza, Andrea; Bittker, Joshua A; Lahr, David L; Brudz, Steve; Chatwin, Simon; Oprea, Tudor I; Waller, Anna; Yang, Jeremy J; Southall, Noel; Guha, Rajarshi; Schürer, Stephan C; Vempati, Uma D; Southern, Mark R; Dawson, Eric S; Clemons, Paul A; Chung, Thomas D Y

2014-06-01

Recent industry-academic partnerships involve collaboration among disciplines, locations, and organizations using publicly funded "open-access" and proprietary commercial data sources. These require the effective integration of chemical and biological information from diverse data sources, which presents key informatics, personnel, and organizational challenges. The BioAssay Research Database (BARD) was conceived to address these challenges and serve as a community-wide resource and intuitive web portal for public-sector chemical-biology data. Its initial focus is to enable scientists to more effectively use the National Institutes of Health Roadmap Molecular Libraries Program (MLP) data generated from the 3-year pilot and 6-year production phases of the Molecular Libraries Probe Production Centers Network (MLPCN), which is currently in its final year. BARD evolves the current data standards through structured assay and result annotations that leverage BioAssay Ontology and other industry-standard ontologies, and a core hierarchy of assay definition terms and data standards defined specifically for small-molecule assay data. We initially focused on migrating the highest-value MLP data into BARD and bringing it up to this new standard. We review the technical and organizational challenges overcome by the interdisciplinary BARD team, veterans of public- and private-sector data-integration projects, who are collaborating to describe (functional specifications), design (technical specifications), and implement this next-generation software solution. © 2014 Society for Laboratory Automation and Screening.
A new biology for a new century.

PubMed

Woese, Carl R

2004-06-01

Biology today is at a crossroads. The molecular paradigm, which so successfully guided the discipline throughout most of the 20th century, is no longer a reliable guide. Its vision of biology now realized, the molecular paradigm has run its course. Biology, therefore, has a choice to make, between the comfortable path of continuing to follow molecular biology's lead or the more invigorating one of seeking a new and inspiring vision of the living world, one that addresses the major problems in biology that 20th century biology, molecular biology, could not handle and, so, avoided. The former course, though highly productive, is certain to turn biology into an engineering discipline. The latter holds the promise of making biology an even more fundamental science, one that, along with physics, probes and defines the nature of reality. This is a choice between a biology that solely does society's bidding and a biology that is society's teacher.
Analysis of the ergosterol biosynthesis pathway cloning, molecular characterization and phylogeny of lanosterol 14 α-demethylase (ERG11) gene of Moniliophthora perniciosa.

PubMed

de Oliveira Ceita, Geruza; Vilas-Boas, Laurival Antônio; Castilho, Marcelo Santos; Carazzolle, Marcelo Falsarella; Pirovani, Carlos Priminho; Selbach-Schnadelbach, Alessandra; Gramacho, Karina Peres; Ramos, Pablo Ivan Pereira; Barbosa, Luciana Veiga; Pereira, Gonçalo Amarante Guimarães; Góes-Neto, Aristóteles

2014-10-01

The phytopathogenic fungus Moniliophthora perniciosa (Stahel) Aime & Philips-Mora, causal agent of witches' broom disease of cocoa, causes extensive damage to cocoa production in Brazil. Molecular studies have attempted to identify genes that play important roles in fungal survival and virulence. In this study, sequences deposited in the M. perniciosa Genome Sequencing Project database were analyzed to identify potential biological targets. For the first time, the ergosterol biosynthetic pathway in M. perniciosa was studied and the lanosterol 14α-demethylase gene (ERG11) that encodes the main enzyme of this pathway and is a target for fungicides was cloned, characterized molecularly and its phylogeny analyzed. ERG11 genomic DNA and cDNA were characterized and sequence analysis of the ERG11 protein identified highly conserved domains typical of this enzyme, such as SRS1, SRS4, EXXR and the heme-binding region (HBR). Comparison of the protein sequences and phylogenetic analysis revealed that the M. perniciosa enzyme was most closely related to that of Coprinopsis cinerea.

Analysis of the ergosterol biosynthesis pathway cloning, molecular characterization and phylogeny of lanosterol 14 α-demethylase (ERG11) gene of Moniliophthora perniciosa

PubMed Central

de Oliveira Ceita, Geruza; Vilas-Boas, Laurival Antônio; Castilho, Marcelo Santos; Carazzolle, Marcelo Falsarella; Pirovani, Carlos Priminho; Selbach-Schnadelbach, Alessandra; Gramacho, Karina Peres; Ramos, Pablo Ivan Pereira; Barbosa, Luciana Veiga; Pereira, Gonçalo Amarante Guimarães; Góes-Neto, Aristóteles

2014-01-01

The phytopathogenic fungus Moniliophthora perniciosa (Stahel) Aime & Philips-Mora, causal agent of witches’ broom disease of cocoa, causes extensive damage to cocoa production in Brazil. Molecular studies have attempted to identify genes that play important roles in fungal survival and virulence. In this study, sequences deposited in the M. perniciosa Genome Sequencing Project database were analyzed to identify potential biological targets. For the first time, the ergosterol biosynthetic pathway in M. perniciosa was studied and the lanosterol 14α-demethylase gene (ERG11) that encodes the main enzyme of this pathway and is a target for fungicides was cloned, characterized molecularly and its phylogeny analyzed. ERG11 genomic DNA and cDNA were characterized and sequence analysis of the ERG11 protein identified highly conserved domains typical of this enzyme, such as SRS1, SRS4, EXXR and the heme-binding region (HBR). Comparison of the protein sequences and phylogenetic analysis revealed that the M. perniciosa enzyme was most closely related to that of Coprinopsis cinerea. PMID:25505843
Molecular biomarkers of resistance to anti-EGFR treatment in metastatic colorectal cancer, from classical to innovation.

PubMed

Giampieri, Riccardo; Scartozzi, Mario; Del Prete, Michela; Maccaroni, Elena; Bittoni, Alessandro; Faloppi, Luca; Bianconi, Maristella; Cecchini, Luca; Cascinu, Stefano

2013-11-01

Systematic dissection of the EGFR pathway was considered the best way to identify putative markers of resistance to anti-EGFR therapies. This kind of approach leaves aside other, less well known but by no means less important, putative mechanisms of resistance. We tried to shed some light on these mechanisms of resistance. We searched the PubMed database for all articles published in 2000-2012 that highlighted mechanisms of resistance to cetuximab- and panitumumab-based therapies. We reviewed the "classical" molecular factors extensively analyzed as predictors of the efficacy of anti-EGFR therapy, such as K-ras, B-raf, and PI3K-mTOR-Akt, focusing on their predictive or prognostic value and on the controversial aspects of biomarker analysis for clinical practice. In the second part, we move on to other, less well known molecular markers relevant to understanding the biological mechanisms underlying resistance to anti-EGFR therapy, such as non-canonical heterodimer candidates, microRNA, IGF1-IGF1R, HGF-cMET and secondary mutations of EGFR. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
OralCard: a bioinformatic tool for the study of oral proteome.

PubMed

Arrais, Joel P; Rosa, Nuno; Melo, José; Coelho, Edgar D; Amaral, Diana; Correia, Maria José; Barros, Marlene; Oliveira, José Luís

2013-07-01

The molecular complexity of the human oral cavity can only be clarified through identification of the components that participate within it. However, current proteomic techniques produce high volumes of information that are dispersed over several online databases. Collecting all of these data and using an integrative approach capable of identifying unknown associations is still an unsolved problem. This is the main motivation for this work. We present the online bioinformatic tool OralCard, which comprises results from 55 manually curated articles reflecting the oral molecular ecosystem (OralPhysiOme). It contains experimental information available from the oral proteome of both human (OralOme) and microbial origin (MicroOralOme), structured by protein, disease and organism. This tool is a key resource for researchers seeking to understand the molecular foundations of the biology and disease mechanisms of the oral cavity. The usefulness of this tool is illustrated with an analysis of the oral proteome associated with type 2 diabetes mellitus. OralCard is available at http://bioinformatics.ua.pt/oralcard. Copyright © 2013 Elsevier Ltd. All rights reserved.
Assessing the binding of cholinesterase inhibitors by docking and molecular dynamics studies.

PubMed

Ali, M Rejwan; Sadoqi, Mostafa; Møller, Simon G; Boutajangout, Allal; Mezei, Mihaly

2017-09-01

In this report we assessed by docking and molecular dynamics the binding mechanisms of three FDA-approved Alzheimer drugs that inhibit the enzyme acetylcholinesterase (AChE): donepezil, galantamine and rivastigmine. Docking with AutoDock Vina, PatchDock and Plant reproduced the conformations of the inhibitor-enzyme complexes to within 2 Å RMSD of the X-ray structures. Free-energy scores show strong affinity of the inhibitors for the enzyme binding pocket. Three independent molecular dynamics simulation runs indicated general stability of donepezil, galantamine and rivastigmine in their respective enzyme binding pockets (also referred to as the gorge), as well as a tendency to form hydrogen bonds with water molecules. The binding of rivastigmine in the Torpedo californica AChE binding pocket is of particular interest because, according to the X-ray structure of the complex, the inhibitor eventually undergoes carbamylation and breaks apart. A similarity search in the ZINC database and targeted docking on the gorge region of the AChE enzyme gave new putative inhibitor molecules with high predicted binding affinity, suitable for potential biophysical and biological assessment. Copyright © 2017 Elsevier Inc. All rights reserved.
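The pose-reproduction criterion above (a docked pose within 2 Å RMSD of the crystal pose) can be illustrated with a minimal sketch. It assumes that the docked and X-ray poses list the same ligand atoms in the same order and share the receptor's coordinate frame, so no superposition or symmetry correction is applied; the coordinates are hypothetical and this is not the evaluation code used in the study.

```python
import math


def rmsd(pose, reference):
    """RMSD (in Å) between two poses given as equally ordered (x, y, z) tuples.

    No superposition is applied: both poses are assumed to share the
    receptor's coordinate frame, as when redocking into a crystal structure.
    """
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and list the same atoms in the same order")
    squared = sum(
        (xp - xr) ** 2 + (yp - yr) ** 2 + (zp - zr) ** 2
        for (xp, yp, zp), (xr, yr, zr) in zip(pose, reference)
    )
    return math.sqrt(squared / len(pose))


def reproduces_crystal_pose(pose, reference, cutoff=2.0):
    """True if the docked pose lies within `cutoff` Å RMSD of the reference pose."""
    return rmsd(pose, reference) <= cutoff


# Hypothetical three-atom ligand; the docked pose is shifted 1 Å along z.
xray = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(x, y, z + 1.0) for x, y, z in xray]
print(round(rmsd(docked, xray), 3), reproduces_crystal_pose(docked, xray))
```

Real comparisons usually also account for symmetry-equivalent atoms, which this sketch deliberately ignores.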
Use of mutation spectra analysis software.

PubMed

Rogozin, I; Kondrashov, F; Glazko, G

2001-02-01

The study and comparison of mutation(al) spectra is an important problem in molecular biology, because these spectra often reflect important features of mutations and their fixation. Such features include the interaction of DNA with various mutagens, the function of repair/replication enzymes, and properties of target proteins. It is known that mutability varies significantly along nucleotide sequences, such that mutations often concentrate at certain positions, called "hotspots," in a sequence. In this paper, we discuss in detail two approaches for mutation spectra analysis: the comparison of mutation spectra with the HG-PUBL program (FTP: sunsite.unc.edu/pub/academic/biology/dna-mutations/hyperg) and hotspot prediction with the CLUSTERM program (www.itba.mi.cnr.it/webmutation; ftp.bionet.nsc.ru/pub/biology/dbms/clusterm.zip). Several other approaches for mutational spectra analysis, such as the analysis of target protein structure, identification of hotspot contexts, and multiple spectra comparisons, as well as a number of mutation databases, are briefly described. Mutation spectra in the lacI gene of E. coli and the human p53 gene are used to illustrate various difficulties of such analysis. Copyright 2001 Wiley-Liss, Inc.
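The spectrum comparison mentioned above rests on a hypergeometric test: two spectra form an r x 2 table of mutation counts per site, and the question is how often a table no more probable than the observed one arises if both spectra were drawn from the same distribution. The sketch below follows the general Adams-Skopek idea with a Monte Carlo p-value; it is an illustration under that assumption, not a reimplementation of HG-PUBL, and the site counts are hypothetical.

```python
import math
import random


def table_log_prob(spectrum_a, spectrum_b):
    """Log hypergeometric probability of an r x 2 table of mutation counts
    (rows: sites, columns: the two spectra), conditional on its margins."""

    def log_factorial(k):
        return math.lgamma(k + 1)

    row_totals = [a + b for a, b in zip(spectrum_a, spectrum_b)]
    total_a, total_b = sum(spectrum_a), sum(spectrum_b)
    numerator = (sum(log_factorial(r) for r in row_totals)
                 + log_factorial(total_a) + log_factorial(total_b))
    denominator = (log_factorial(total_a + total_b)
                   + sum(log_factorial(k) for k in list(spectrum_a) + list(spectrum_b)))
    return numerator - denominator


def compare_spectra(spectrum_a, spectrum_b, trials=10_000, seed=0):
    """Monte Carlo p-value: the fraction of random re-partitions of the pooled
    mutations whose table is no more probable than the observed table."""
    if len(spectrum_a) != len(spectrum_b):
        raise ValueError("both spectra must cover the same sites")
    rng = random.Random(seed)
    observed = table_log_prob(spectrum_a, spectrum_b)
    row_totals = [a + b for a, b in zip(spectrum_a, spectrum_b)]
    # One entry per mutation, recording the site at which it occurred.
    pooled = [site for site, total in enumerate(row_totals) for _ in range(total)]
    n_a = sum(spectrum_a)
    as_or_less_likely = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        # Assign the first n_a shuffled mutations to spectrum A, the rest to B.
        sim_a = [0] * len(row_totals)
        for site in pooled[:n_a]:
            sim_a[site] += 1
        sim_b = [r - a for r, a in zip(row_totals, sim_a)]
        if table_log_prob(sim_a, sim_b) <= observed + 1e-9:
            as_or_less_likely += 1
    return as_or_less_likely / trials


# Hypothetical counts at three sites: identical spectra vs. disjoint hotspots.
print(compare_spectra([5, 5, 5], [5, 5, 5], trials=2000))
print(compare_spectra([20, 0, 0], [0, 0, 20], trials=2000))
```

The number of trials controls the precision of the estimate; a result of 0 only means that no simulated table was as extreme, not that the true probability is zero.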
Design of a comprehensive biochemistry and molecular biology experiment: phase variation caused by recombinational regulation of bacterial gene expression.

PubMed

Sheng, Xiumei; Xu, Shungao; Lu, Renyun; Isaac, Dadzie; Zhang, Xueyi; Zhang, Haifang; Wang, Huifang; Qiao, Zheng; Huang, Xinxiang

2014-01-01

Scientific experiments are indispensable parts of Biochemistry and Molecular Biology. In this study, a comprehensive Biochemistry and Molecular Biology experiment on flagellar phase variation in Salmonella enterica serovar Typhi was designed. It consisted of three parts: induction of bacterial flagellar phase variation, an antibody agglutination test, and PCR analysis. Phase variation was observed by a bacterial motility assay and identified by the antibody agglutination test and PCR analysis. This comprehensive experiment can help students improve their ability to apply the knowledge acquired in Biochemistry and Molecular Biology. Copyright © 2014 by The International Union of Biochemistry and Molecular Biology.
Recent Progress in the Development of Metabolome Databases for Plant Systems Biology

PubMed Central

Fukushima, Atsushi; Kusano, Miyako

2013-01-01

Metabolomics has grown greatly as a functional genomics tool, and has become an invaluable diagnostic tool for biochemical phenotyping of biological systems. Over the past decades, a number of databases containing information on mass spectra, compound names and structures, statistical/mathematical models, metabolic pathways, and metabolite profile data have been developed. Such databases complement each other and support efficient growth in this area, although the data resources remain scattered across the World Wide Web. Here, we review available metabolome databases and summarize the present status of development of related tools, particularly focusing on the plant metabolome. The data sharing discussed here will pave the way for robust interpretation of metabolomic data and advances in plant systems biology. PMID:23577015
[Construction of chemical information database based on optical structure recognition technique].

PubMed

Lv, C Y; Li, M N; Zhang, L R; Liu, Z M

2018-04-18

This study aimed to create a protocol for quickly and automatically constructing a chemical information database from the scientific literature. Scientific literature, patents and technical reports from different chemical disciplines were collected and stored in PDF format as the fundamental datasets. Chemical structures in published documents and images were converted to machine-readable data using name conversion technology and the optical structure recognition tool CLiDE. During molecular structure extraction, Markush structures were enumerated into well-defined monomer molecules by means of QueryTools in the molecule editor ChemDraw. The document management software EndNote X8 was used to acquire bibliographical references, including title, author, journal and year of publication. The text mining toolkit ChemDataExtractor was adopted to retrieve information from figures, tables, and textual paragraphs that could be used to populate a structured chemical database. After this step, detailed manual revision and annotation were conducted to ensure the accuracy and completeness of the data. In addition to the literature data, the computing simulation platform Pipeline Pilot 7.5 was used to calculate physical and chemical properties and predict molecular attributes. Furthermore, the open database ChEMBL was linked to fetch known bioactivities, such as indications and targets. After information extraction and data expansion, five separate metadata files were generated: a molecular structure data file, molecular information, bibliographical references, predictable attributes and known bioactivities. With the canonical simplified molecular input line entry specification (SMILES) as the primary key, the metadata files were associated through common key nodes, including molecule number and PDF number, to construct an integrated chemical information database. A reasonable protocol for constructing chemical information databases was successfully created.
A total of 174 research articles and 25 reviews published in Marine Drugs from January 2015 to June 2016 were collected as the essential data source, and an elementary marine natural product database named PKU-MNPD was built in accordance with this protocol; it contains 3,262 molecules and 19,821 records. This data aggregation protocol is of great help in constructing chemical information databases accurately, comprehensively and efficiently from original documents. The structured chemical information database can facilitate access to medical intelligence and accelerate the translation of scientific research achievements.
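The key-based integration step described above can be pictured with a small sketch that merges several metadata tables into one record per molecule. It is a hypothetical illustration, not the authors' pipeline: it joins only on a canonical SMILES string, assumes every SMILES was already canonicalized by the same toolkit (plain string equality fails otherwise), and uses made-up field names.

```python
def integrate_metadata(*tables, key="canonical_smiles"):
    """Merge metadata tables (lists of dicts) into one record per molecule.

    Records are matched on exact string equality of `key`, so SMILES must be
    canonicalized upstream by a single toolkit. When two tables share a field,
    the value from the later table overwrites the earlier one.
    """
    merged = {}
    for table in tables:
        for record in table:
            molecule_key = record[key]
            entry = merged.setdefault(molecule_key, {key: molecule_key})
            entry.update({field: value for field, value in record.items() if field != key})
    return merged


# Hypothetical rows from three of the metadata files; field names are illustrative.
structures = [{"canonical_smiles": "CCO", "molecule_number": "M0001", "pdf_number": "P001"}]
properties = [{"canonical_smiles": "CCO", "mol_weight": 46.07}]
references = [{"canonical_smiles": "CCO", "journal": "Marine Drugs"}]
database = integrate_metadata(structures, properties, references)
print(database["CCO"])
```

A dictionary keyed on the primary key keeps the join linear in the number of records; a production database would enforce the key as a constraint instead.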
A comprehensive experiment for molecular biology: Determination of single nucleotide polymorphism in human REV3 gene using PCR-RFLP.

PubMed

Zhang, Xu; Shao, Meng; Gao, Lu; Zhao, Yuanyuan; Sun, Zixuan; Zhou, Liping; Yan, Yongmin; Shao, Qixiang; Xu, Wenrong; Qian, Hui

2017-07-08

Laboratory exercise is helpful for medical students to understand the basic principles of molecular biology and to learn about the practical applications of molecular biology. We have designed a lab course on molecular biology about the determination of single nucleotide polymorphism (SNP) in human REV3 gene, the product of which is a subunit of DNA polymerase ζ and SNPs in this gene are associated with altered susceptibility to cancer. This newly designed experiment is composed of three parts, including genomic DNA extraction, gene amplification by PCR, and genotyping by RFLP. By combining these activities, the students are not only able to learn a series of biotechniques in molecular biology, but also acquire the ability to link the learned knowledge with practical applications. This comprehensive experiment will help the medical students improve the conceptual understanding of SNP and the technical understanding of SNP detection. © 2017 by The International Union of Biochemistry and Molecular Biology, 45(4):299-304, 2017.
PhyloExplorer: a web server to validate, explore and query phylogenetic trees

PubMed Central

Ranwez, Vincent; Clairon, Nicolas; Delsuc, Frédéric; Pourali, Saeed; Auberval, Nicolas; Diser, Sorel; Berry, Vincent

2009-01-01

Background: Many important problems in evolutionary biology require molecular phylogenies to be reconstructed. Phylogenetic trees must then be manipulated for subsequent inclusion in publications or analyses such as supertree inference and tree comparisons. However, no tool is currently available to facilitate the management of tree collections providing, for instance: standardisation of taxon names among trees with respect to a reference taxonomy; selection of relevant subsets of trees or sub-trees according to a taxonomic query; or simply computation of descriptive statistics on the collection. Moreover, although several databases of phylogenetic trees exist, there is currently no easy way to find trees that are both relevant and complementary to a given collection of trees. Results: We propose a tool to facilitate assessment and management of phylogenetic tree collections. Given an input collection of rooted trees, PhyloExplorer provides facilities for obtaining statistics describing the collection, correcting invalid taxon names, extracting taxonomically relevant parts of the collection using a dedicated query language, and identifying related trees in the TreeBASE database. Conclusion: PhyloExplorer is a simple and interactive website implemented through underlying Python libraries and MySQL databases. It is available at: and the source code can be downloaded from: . PMID:19450253
dbCPG: A web resource for cancer predisposition genes.

PubMed

Wei, Ran; Yao, Yao; Yang, Wu; Zheng, Chun-Hou; Zhao, Min; Xia, Junfeng

2016-06-21

Cancer predisposition genes (CPGs) are genes in which inherited mutations confer highly or moderately increased risks of developing cancer. Identification of these genes and understanding of the biological mechanisms that underlie them are crucial for the prevention, early diagnosis, and optimized management of cancer. Over the past decades, great efforts have been made to identify CPGs through multiple strategies. However, information on these CPGs and their molecular functions is scattered. To address this issue and provide a comprehensive resource for researchers, we developed the Cancer Predisposition Gene Database (dbCPG, Database URL: http://bioinfo.ahu.edu.cn:8080/dbCPG/index.jsp), the first literature-based gene resource for exploring human CPGs. It contains 827 human (724 protein-coding, 23 non-coding, and 80 unknown type genes), 637 rat, and 658 mouse CPGs. Furthermore, data mining was performed to gain insight into the CPG data, including functional annotation, gene prioritization, network analysis of prioritized genes and overlap analysis across multiple cancer types. A user-friendly web interface with multiple browse, search, and upload functions was also developed to facilitate access to the latest information on CPGs. Taken together, the dbCPG database provides a comprehensive data resource for further studies of cancer predisposition genes.
A proteomics study of barley powdery mildew haustoria.

PubMed

Godfrey, Dale; Zhang, Ziguo; Saalbach, Gerhard; Thordal-Christensen, Hans

2009-06-01

A number of fungal and oomycete plant pathogens of major economic importance feed on their hosts by means of haustoria, which they place inside living plant cells. The underlying mechanisms are poorly understood, partly due to difficulty in preparing haustoria. We have therefore developed a procedure for isolating haustoria from the barley powdery mildew fungus (Blumeria graminis f.sp. hordei, Bgh). We subsequently aimed to understand the molecular mechanisms of haustoria through a study of their proteome. Extracted proteins were digested using trypsin, separated by LC, and analysed by MS/MS. Searches of a custom Bgh EST sequence database and the NCBI-NR fungal protein database, using the MS/MS data, identified 204 haustoria proteins. The majority of the proteins appear to have roles in protein metabolic pathways and biological energy production. Surprisingly, pyruvate decarboxylase (PDC), involved in alcoholic fermentation and commonly abundant in fungi and plants, was absent in our Bgh proteome data set. A sequence encoding this enzyme was also absent in our EST sequence database. Significantly, BLAST searches of the recently available Bgh genome sequence data also failed to identify a sequence encoding this enzyme, strongly indicating that Bgh does not have a gene for PDC.
PATIKAweb: a Web interface for analyzing biological pathways through advanced querying and visualization.

PubMed

Dogrusoz, U; Erson, E Z; Giral, E; Demir, E; Babur, O; Cetintas, A; Colak, R

2006-02-01

Patikaweb provides a Web interface for retrieving and analyzing biological pathways in the Patika database, which contains data integrated from various prominent public pathway databases. It features a user-friendly interface, dynamic visualization and automated layout, advanced graph-theoretic queries for extracting biologically important phenomena, local persistence capability and exporting facilities to various pathway exchange formats.
Pivotal role of the muscle-contraction pathway in cryptorchidism and evidence for genomic connections with cardiomyopathy pathways in RASopathies.

PubMed

Cannistraci, Carlo V; Ogorevc, Jernej; Zorc, Minja; Ravasi, Timothy; Dovc, Peter; Kunej, Tanja

2013-02-14

Cryptorchidism is the most frequent congenital disorder in male children; however, the genetic causes of cryptorchidism remain poorly investigated. Comparative integratomics combined with a systems biology approach was employed to elucidate genetic factors and molecular pathways underlying testis descent. Literature mining was performed to collect genomic loci associated with cryptorchidism in seven mammalian species. Information regarding the collected candidate genes was stored in a MySQL relational database. A genomic view of the loci was presented using the Flash GViewer web tool (http://gmod.org/wiki/Flashgviewer/). DAVID Bioinformatics Resources 6.7 was used for pathway enrichment analysis. The Cytoscape plug-in PiNGO 1.11 was employed for protein-network-based prediction of novel candidate genes. Relevant protein-protein interactions were confirmed and visualized using the STRING database (version 9.0). The developed cryptorchidism gene atlas includes 217 candidate loci (genes, regions involved in chromosomal mutations, and copy number variations) identified at the genomic, transcriptomic, and proteomic level. Human orthologs of the collected candidate loci were presented using a genomic map viewer. The cryptorchidism gene atlas is freely available online: http://www.integratomics-time.com/cryptorchidism/. Pathway analysis suggested the presence of twelve enriched pathways associated with the list of 179 literature-derived candidate genes. Additionally, a list of 43 network-predicted novel candidate genes was significantly associated with four enriched pathways. Joint pathway analysis of the collected and predicted candidate genes revealed the pivotal importance of the muscle-contraction pathway in cryptorchidism and evidence for genomic associations with cardiomyopathy pathways in RASopathies. The developed gene atlas represents an important resource for the scientific community researching the genetics of cryptorchidism.
The collected data will further facilitate development of novel genetic markers and could be of interest for functional studies in animals and humans. The proposed network-based systems biology approach elucidates molecular mechanisms underlying the co-presence of cryptorchidism and cardiomyopathy in RASopathies. Such an approach could also aid in the molecular explanation of the co-presence of diverse and apparently unrelated clinical manifestations in other syndromes.
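The pathway enrichment step above asks whether a candidate gene list hits a pathway more often than chance. A minimal sketch of the underlying one-sided hypergeometric tail is shown below; it is not DAVID's method (DAVID reports the EASE score, a modified Fisher exact p-value, so its numbers will differ), and all counts in the example are hypothetical.

```python
from math import comb


def enrichment_p(population, annotated, selected, overlap):
    """One-sided hypergeometric tail P(X >= overlap).

    population: genes in the background set
    annotated:  background genes annotated to the pathway
    selected:   candidate genes in the query list
    overlap:    candidate genes annotated to the pathway
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    hits = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return hits / total


# Hypothetical: 20,000 background genes, 80 in a pathway, 150 candidates, 6 hits.
print(enrichment_p(20_000, 80, 150, 6))
```

Exact integer binomials avoid rounding problems for small tail probabilities; when many pathways are tested, the resulting p-values would still need multiple-testing correction.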
Impact of cultivation on characterisation of species composition of soil bacterial communities.

PubMed

McCaig, A E.; Grayston, S J.; Prosser, J I.; Glover, L A.

2001-03-01

The species composition of culturable bacteria in Scottish grassland soils was investigated using a combination of Biolog and 16S rDNA analysis for characterisation of isolates. The inclusion of a molecular approach allowed direct comparison of sequences from culturable bacteria with sequences obtained during analysis of DNA extracted directly from the same soil samples. Bacterial strains were isolated on Pseudomonas isolation agar (PIA), a selective medium, and on tryptone soya agar (TSA), a general laboratory medium. In total, 12 and 21 morphologically different bacterial cultures were isolated on PIA and TSA, respectively. Biolog and sequencing placed PIA isolates in the same taxonomic groups, the majority of cultures belonging to the Pseudomonas (sensu stricto) group. However, analysis of 16S rDNA sequences proved more efficient than Biolog for characterising TSA isolates due to limitations of the Microlog database for identifying environmental bacteria. In general, 16S rDNA sequences from TSA isolates showed high similarities to cultured species represented in sequence databases, although TSA-8 showed only 92.5% similarity to the nearest relative, Bacillus insolitus. In general, there was very little overlap between the culturable and uncultured bacterial communities, although two sequences, PIA-2 and TSA-13, showed >99% similarity to soil clones. A cloning step was included prior to sequence analysis of two isolates, TSA-5 and TSA-14, and analysis of several clones confirmed that these cultures comprised at least four and three sequence types, respectively. All isolate clones were most closely related to uncultured bacteria, with clone TSA-5.1 showing 99.8% similarity to a sequence amplified directly from the same soil sample. Interestingly, one clone, TSA-5.4, clustered within a novel group comprising only uncultured sequences. 
This group, which is associated with the novel, deep-branching Acidobacterium capsulatum lineage, also included clones isolated during direct analysis of the same soil and from a wide range of other sample types studied elsewhere. The study demonstrates the value of fine-scale molecular analysis for identification of laboratory isolates and indicates the culturability of approximately 1% of the total population but under a restricted range of media and cultivation conditions.
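The similarity values quoted above (for example 92.5% to the nearest relative, >99% to soil clones) are percent identities between aligned 16S rDNA sequences. A minimal sketch of that calculation follows; it assumes the sequences are already aligned to equal length and skips any column with a gap in either sequence, which is only one of several conventions, so it may not match the figures produced by the tools used in the study.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity of two pre-aligned sequences, ignoring gapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments.
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
```

Whether gaps are skipped, counted as mismatches, or counted once per gap run changes the result, so reported similarities should state the convention used.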
Bibliographical database of radiation biological dosimetry and risk assessment: Part 1, through June 1988

DOE Office of Scientific and Technical Information (OSTI.GOV)

Straume, T.; Ricker, Y.; Thut, M.

1988-08-29

This database was constructed to support research in radiation biological dosimetry and risk assessment. Relevant publications were identified through detailed searches of national and international electronic databases and through our personal knowledge of the subject. Publications were numbered, keyworded, and referenced in an electronic data-retrieval system that permits quick access through computerized searches on publication number, authors, key words, title, year, and journal name. Photocopies of all publications contained in the database are maintained in a file that is numerically arranged by citation number. This report of the database is provided as a useful reference and overview. It should be emphasized that the database will grow as new citations are added to it. With that in mind, we arranged this report in order of ascending citation number so that follow-up reports will simply extend this document. The database cites 1212 publications. Publications are from 119 different scientific journals, 27 of which are cited at least 5 times. It also contains references to 42 books and published symposia, and 129 reports. Information relevant to radiation biological dosimetry and risk assessment is widely distributed among the scientific literature, although a few journals clearly dominate. The four journals publishing the largest number of relevant papers are Health Physics, Mutation Research, Radiation Research, and International Journal of Radiation Biology. Publications in Health Physics make up almost 10% of the current database.
BioCarian: search engine for exploratory searches in heterogeneous biological databases.

PubMed

Zaki, Nazar; Tennakoon, Chandana

2017-10-02

A large number of biological databases are publicly available to scientists on the web. There are also many private databases generated in the course of research projects. These databases come in a wide variety of formats. Web standards have evolved in recent times, and semantic web technologies are now available to interconnect diverse and heterogeneous sources of data. Therefore, integration and querying of biological databases can be facilitated by techniques used in the semantic web. Heterogeneous databases can be converted into Resource Description Framework (RDF) and queried using the SPARQL query language. Searching for exact queries in these databases is trivial. However, exploratory searches need customized solutions, especially when multiple databases are involved. This process is cumbersome and time consuming for those without a sufficient background in computer science. In this context, a search engine facilitating exploratory searches of databases would be of great help to the scientific community. We present BioCarian, an efficient and user-friendly search engine for performing exploratory searches on biological databases. The search engine is an interface for SPARQL queries over RDF databases. We note that many of the databases can be converted to tabular form. We first convert the tabular databases to RDF. The search engine provides a graphical interface based on facets to explore the converted databases. The facet interface is more advanced than conventional facets: it allows complex queries to be constructed and has additional features, such as ranking facet values by several criteria, visually indicating the relevance of a facet value, and presenting the most important facet values when a large number of choices are available. For advanced users, SPARQL queries can be run directly on the databases. Using this feature, users can incorporate federated searches of SPARQL endpoints.
We used the search engine to do an exploratory search on previously published viral integration data and were able to deduce the main conclusions of the original publication. BioCarian is accessible via http://www.biocarian.com . We have developed a search engine to explore RDF databases that can be used by both novice and advanced users.
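The tabular-to-RDF step and the facet panel described above can be sketched as follows. This is a minimal illustration under stated assumptions, not BioCarian's code: the namespace `http://example.org/`, the helper names and the toy table are all hypothetical. Each row becomes one subject, each remaining column a predicate, and each cell a literal object, serialized as N-Triples; a facet is approximated as a frequency-ranked list of one column's values.

```python
import csv
import io
from collections import Counter
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, not BioCarian's own IRIs


def escape_literal(value: str) -> str:
    """Escape characters that may not appear raw inside an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def table_to_ntriples(text: str, key: str) -> list[str]:
    """Convert CSV text to N-Triples: one subject per row (named by the key
    column), one predicate per remaining column, cell values as literals."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}record/{quote(row[key])}>"
        for column, value in row.items():
            if column == key or not value:
                continue  # the key already names the subject; skip empty cells
            predicate = f"<{BASE}property/{quote(column)}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


def facet_values(text: str, column: str) -> list[tuple[str, int]]:
    """Rank one column's values by frequency, as a simple facet panel might."""
    counts = Counter(row[column] for row in csv.DictReader(io.StringIO(text)))
    return counts.most_common()


# Illustrative rows only; not data from the publication.
TABLE = """id,virus,chromosome
s1,HBV,chr5
s2,HBV,chr11
s3,HPV,chr8
"""

triples = table_to_ntriples(TABLE, key="id")
for t in triples:
    print(t)
print(facet_values(TABLE, "virus"))
```

Once such triples are loaded into a triple store, a SPARQL endpoint can serve both the facet counts and the direct queries mentioned for advanced users; a production converter would more likely use an RDF library and typed literals.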
[Advance in molecular biology of Dendrobium (Orchidaceae)].

PubMed

Li, Qing; Li, Biao; Guo, Shun-Xing

2016-08-01

With the development of molecular biology, progress in molecular biology research on Dendrobium has been rapid. This research has not only provided new ways to identify Dendrobium quickly and revealed its genetic diversity and relationships, but has also laid a vital foundation for explaining the mechanisms of Dendrobium growth and metabolism. The present paper reviews recent progress in molecular biology research on Dendrobium from three aspects: molecular identification, genetic diversity and functional genes. This review is intended to facilitate the development of this research area and of Dendrobium. Copyright© by the Chinese Pharmaceutical Association.
Performance of Matrix-Assisted Laser Desorption Ionization–Time of Flight Mass Spectrometry for Identifying Clinical Malassezia Isolates

PubMed Central

Machouart, Marie; Morio, Florent; Sabou, Marcela; Kauffmann-LaCroix, Catherine; Contet-Audonneau, Nelly; Candolfi, Ermanno; Letscher-Bru, Valérie

2016-01-01

The genus Malassezia comprises commensal yeasts on human skin. These yeasts are involved in superficial infections but are also isolated in deeper infections, such as fungemia, particularly in certain at-risk patients, such as neonates or patients with parenteral nutrition catheters. Very little is known about Malassezia epidemiology and virulence. This is due mainly to the difficulty of distinguishing species. Currently, species identification is based on morphological and biochemical characteristics. Only molecular biology techniques identify species with certainty, but they are time-consuming and expensive. The aim of this study was to develop and evaluate a matrix-assisted laser desorption ionization–time of flight (MALDI-TOF) database for identifying Malassezia species by mass spectrometry. Eighty-five Malassezia isolates from patients in three French university hospitals were investigated. Each strain was identified by internal transcribed spacer sequencing. Forty-five strains of the six species Malassezia furfur, M. sympodialis, M. slooffiae, M. globosa, M. restricta, and M. pachydermatis allowed the creation of a MALDI-TOF database. Forty other strains were used to test this database. All strains were identified by our Malassezia database with log scores of >2.0, according to the manufacturer's criteria. Repeatability and reproducibility tests showed a coefficient of variation of the log score values of <10%. In conclusion, our new Malassezia database allows easy, fast, and reliable identification of Malassezia species. Implementation of this database will contribute to a better, more rapid identification of Malassezia species and will be helpful in gaining a better understanding of their epidemiology. PMID:27795342
Performance of Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry for Identifying Clinical Malassezia Isolates.

PubMed

Denis, Julie; Machouart, Marie; Morio, Florent; Sabou, Marcela; Kauffmann-LaCroix, Catherine; Contet-Audonneau, Nelly; Candolfi, Ermanno; Letscher-Bru, Valérie

2017-01-01

The genus Malassezia comprises commensal yeasts on human skin. These yeasts are involved in superficial infections but are also isolated in deeper infections, such as fungemia, particularly in certain at-risk patients, such as neonates or patients with parenteral nutrition catheters. Very little is known about Malassezia epidemiology and virulence. This is due mainly to the difficulty of distinguishing species. Currently, species identification is based on morphological and biochemical characteristics. Only molecular biology techniques identify species with certainty, but they are time-consuming and expensive. The aim of this study was to develop and evaluate a matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) database for identifying Malassezia species by mass spectrometry. Eighty-five Malassezia isolates from patients in three French university hospitals were investigated. Each strain was identified by internal transcribed spacer sequencing. Forty-five strains of the six species Malassezia furfur, M. sympodialis, M. slooffiae, M. globosa, M. restricta, and M. pachydermatis allowed the creation of a MALDI-TOF database. Forty other strains were used to test this database. All strains were identified by our Malassezia database with log scores of >2.0, according to the manufacturer's criteria. Repeatability and reproducibility tests showed a coefficient of variation of the log score values of <10%. In conclusion, our new Malassezia database allows easy, fast, and reliable identification of Malassezia species. Implementation of this database will contribute to a better, more rapid identification of Malassezia species and will be helpful in gaining a better understanding of their epidemiology. Copyright © 2016 Denis et al.
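The two acceptance criteria in this abstract (log score >2.0, coefficient of variation <10%) are simple to compute. The sketch below assumes the standard definition CV = standard deviation / mean × 100 with the sample standard deviation; the abstract does not say which variant the authors used, and the replicate scores are hypothetical.

```python
from statistics import mean, stdev

IDENTIFICATION_THRESHOLD = 2.0  # log-score cut-off quoted in the abstract


def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return stdev(values) / mean(values) * 100


def is_reliable_identification(log_scores, threshold=IDENTIFICATION_THRESHOLD):
    """True when every replicate log score exceeds the threshold."""
    return all(score > threshold for score in log_scores)


# Hypothetical replicate log scores for one isolate; not values from the study.
replicates = [2.21, 2.30, 2.18, 2.25]
cv = coefficient_of_variation(replicates)
print(f"CV = {cv:.1f}%, reliable: {is_reliable_identification(replicates)}")
```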

dbMDEGA: a database for meta-analysis of differentially expressed genes in autism spectrum disorder.

PubMed

Zhang, Shuyun; Deng, Libin; Jia, Qiyue; Huang, Shaoting; Gu, Junwang; Zhou, Fankun; Gao, Meng; Sun, Xinyi; Feng, Chang; Fan, Guangqin

2017-11-16

Autism spectrum disorders (ASD) are hereditary, heterogeneous and biologically complex neurodevelopmental disorders. Individual studies on gene expression in ASD cannot provide clear consensus conclusions. Therefore, a systematic review to synthesize the current findings from brain tissues and a search tool to share the meta-analysis results are urgently needed. Here, we conducted a meta-analysis of brain gene expression profiles in the currently reported human ASD expression datasets (with 84 frozen male cortex samples, 17 female cortex samples, 32 cerebellum samples and 4 formalin-fixed samples) and knock-out mouse ASD model expression datasets (with 80 collective brain samples). We then used the R language to develop an interactive, shared and updated database (dbMDEGA) displaying the results of the meta-analysis of differentially expressed genes (DEGs) in the brain from ASD studies. This database, dbMDEGA ( https://dbmdega.shinyapps.io/dbMDEGA/ ), is a publicly available web portal for the manual annotation and visualization of DEGs in the brain from ASD study data. It uniquely presents meta-analysis values and homologous forest plots of DEGs in brain tissues. Gene entries are annotated with meta-values, statistical values and forest plots of DEGs in brain samples. The database aims to provide searchable meta-analysis results based on the currently reported brain gene expression datasets of ASD to help detect candidate genes underlying this disorder. This new analytical tool may provide valuable assistance in the discovery of DEGs and the elucidation of the molecular pathogenicity of ASD. This database model may be replicated to study other disorders.
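The abstract does not state which meta-analytic model dbMDEGA uses, so the following is only a sketch of one common choice: fixed-effect, inverse-variance pooling of a single gene's per-dataset effects (for example log2 fold changes), together with a plain-text version of the forest-plot rows. Function names and numbers are hypothetical.

```python
import math


def fixed_effect_meta(effects, std_errors, z_crit=1.96):
    """Inverse-variance (fixed-effect) pooling of one gene's per-dataset effects.

    effects: e.g. log2 fold changes (ASD vs control), one per dataset
    std_errors: the matching standard errors
    Returns the pooled effect, its standard error, a ~95% CI and a two-sided p-value.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    ci = (pooled - z_crit * pooled_se, pooled + z_crit * pooled_se)
    return pooled, pooled_se, ci, p_value


def text_forest_row(label, effect, se, z_crit=1.96):
    """One line of a plain-text forest plot: estimate and confidence interval."""
    low, high = effect - z_crit * se, effect + z_crit * se
    return f"{label:<12} {effect:+.2f} [{low:+.2f}, {high:+.2f}]"


# Hypothetical numbers for a single gene in three datasets; not study results.
effects = [0.8, 0.5, 1.1]
ses = [0.30, 0.25, 0.40]
for name, e, s in zip(["dataset_1", "dataset_2", "dataset_3"], effects, ses):
    print(text_forest_row(name, e, s))
pooled, pooled_se, ci, p = fixed_effect_meta(effects, ses)
print(text_forest_row("pooled", pooled, pooled_se), f"p={p:.3g}")
```

When datasets differ strongly (for example cortex vs cerebellum, human vs mouse), a random-effects model would usually be preferred over this fixed-effect version.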
PhytoREF: a reference database of the plastidial 16S rRNA gene of photosynthetic eukaryotes with curated taxonomy.

PubMed

Decelle, Johan; Romac, Sarah; Stern, Rowena F; Bendif, El Mahdi; Zingone, Adriana; Audic, Stéphane; Guiry, Michael D; Guillou, Laure; Tessier, Désiré; Le Gall, Florence; Gourvil, Priscillia; Dos Santos, Adriana L; Probert, Ian; Vaulot, Daniel; de Vargas, Colomban; Christen, Richard

2015-11-01

Photosynthetic eukaryotes play a critical role as the main producers in most ecosystems of the biosphere. The ongoing environmental metabarcoding revolution opens up the prospect of holistic, ecosystem-level biological studies of these organisms, in particular the unicellular microalgae, which often lack distinctive morphological characters and have complex life cycles. To interpret environmental sequences, metabarcoding necessarily relies on taxonomically curated databases containing reference sequences of the targeted gene (or barcode) from identified organisms. To date, no such reference framework exists for photosynthetic eukaryotes. In this study, we built the PhytoREF database, which contains 6490 plastidial 16S rDNA reference sequences originating from a large diversity of eukaryotes representing all known major photosynthetic lineages. We compiled 3333 amplicon sequences available from public databases and 879 sequences extracted from plastidial genomes, and generated 411 novel sequences from cultured marine microalgal strains belonging to different eukaryotic lineages. A total of 1867 environmental Sanger 16S rDNA sequences were also included in the database. Stringent quality filtering and a phylogeny-based taxonomic classification were applied to each 16S rDNA sequence. The database mainly focuses on marine microalgae, but sequences from land plants (representing half of the PhytoREF sequences) and freshwater taxa were also included to broaden the applicability of PhytoREF to different aquatic and terrestrial habitats. PhytoREF, accessible via a web interface (http://phytoref.fr), is a new resource in molecular ecology to foster the discovery, assessment and monitoring of the diversity of photosynthetic eukaryotes using high-throughput sequencing. © 2015 John Wiley & Sons Ltd.
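The four sources of sequences listed in the abstract account exactly for the stated total of 6490 reference sequences:

```latex
\underbrace{3333}_{\text{public amplicons}}
+ \underbrace{879}_{\text{plastid genomes}}
+ \underbrace{411}_{\text{new culture sequences}}
+ \underbrace{1867}_{\text{environmental Sanger}}
= 6490
```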
Sting_RDB: a relational database of structural parameters for protein analysis with support for data warehousing and data mining.

PubMed

Oliveira, S R M; Almeida, G V; Souza, K R R; Rodrigues, D N; Kuser-Falcão, P R; Yamagishi, M E B; Santos, E H; Vieira, F D; Jardine, J G; Neshich, G

2007-10-05

An effective strategy for managing protein databases is to provide mechanisms that transform raw data into consistent, accurate and reliable information. Such mechanisms will greatly reduce operational inefficiencies and improve one's ability to handle scientific objectives and interpret research results. To achieve this challenging goal for the STING project, we introduce Sting_RDB, a relational database of structural parameters for protein analysis with support for data warehousing and data mining. In this article, we highlight the main features of Sting_RDB and show how a user can explore it with efficient and biologically relevant queries. Considering its importance for molecular biologists, effort has been made to advance Sting_RDB toward data quality assessment. To the best of our knowledge, Sting_RDB is one of the most comprehensive data repositories for protein analysis, and it is now also capable of providing its users with a data quality indicator. This paper differs from our previous study in many respects. First, we introduce Sting_RDB, a relational database with mechanisms for efficient and relevant queries using SQL. Sting_RDB evolved from the earlier text (flat file)-based database, in which data consistency and integrity were not guaranteed. Second, we provide support for data warehousing and mining. Third, we introduce the data quality indicator. Finally, and probably most importantly, complex queries that could not be posed on a text-based database are now easily implemented. Further details are accessible at the Sting_RDB demo web page: http://www.cbi.cnptia.embrapa.br/StingRDB.
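The contrast drawn here between a flat-file store and a relational database can be illustrated with Python's built-in `sqlite3`. The schema below is hypothetical and far simpler than Sting_RDB's. It shows two things a flat file does not give for free: constraints that reject inconsistent rows (a range check, a foreign key, a primary key), and a join with aggregation expressed in a single SQL query.

```python
import sqlite3

# Hypothetical schema for illustration; not the actual Sting_RDB tables.
SCHEMA = """
CREATE TABLE chain (
    chain_id INTEGER PRIMARY KEY,
    pdb_code TEXT NOT NULL,
    chain    TEXT NOT NULL,
    UNIQUE (pdb_code, chain)
);
CREATE TABLE residue (
    chain_id      INTEGER NOT NULL REFERENCES chain (chain_id),
    position      INTEGER NOT NULL,
    residue_name  TEXT    NOT NULL,
    accessibility REAL    NOT NULL CHECK (accessibility BETWEEN 0 AND 1),
    PRIMARY KEY (chain_id, position)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO chain VALUES (?, ?, ?)", [(1, "XXXX", "A"), (2, "XXXX", "B")])
conn.executemany(
    "INSERT INTO residue VALUES (?, ?, ?, ?)",
    [(1, 1, "ALA", 0.8), (1, 2, "GLY", 0.1), (1, 3, "TRP", 0.6), (2, 1, "SER", 0.9)],
)
conn.commit()

# A join plus aggregate: exposed residues (relative accessibility > 0.5) per chain.
exposed_counts = conn.execute(
    """
    SELECT c.pdb_code, c.chain, COUNT(*)
    FROM residue AS r JOIN chain AS c ON r.chain_id = c.chain_id
    WHERE r.accessibility > 0.5
    GROUP BY c.pdb_code, c.chain
    ORDER BY c.chain
    """
).fetchall()
print(exposed_counts)


def insert_is_rejected(row):
    """Return True if the database refuses the row because of a constraint."""
    try:
        with conn:  # commits on success, rolls back and re-raises on error
            conn.execute("INSERT INTO residue VALUES (?, ?, ?, ?)", row)
    except sqlite3.IntegrityError:
        return True
    return False
```

For example, `insert_is_rejected((1, 4, "LYS", 1.5))` fails the accessibility range check, and a row pointing at a non-existent chain fails the foreign key.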
History of the molecular biology of cytomegaloviruses.

PubMed

Stinski, Mark F

2014-01-01

The history of the molecular biology of cytomegaloviruses from the purification of the virus and the viral DNA to the cloning and expression of the viral genes is reviewed. A key genetic element of cytomegalovirus (the CMV promoter) contributed to our understanding of eukaryotic cell molecular biology and to the development of lifesaving therapeutic proteins. The study of the molecular biology of cytomegaloviruses also contributed to the development of antivirals to control the viral infection.
Teaching Molecular Biological Techniques in a Research Content

ERIC Educational Resources Information Center

Stiller, John W.; Coggins, T. Chad

2006-01-01

Molecular biological methods, such as the polymerase chain reaction (PCR) and gel electrophoresis, are now commonly taught to students in introductory biology courses at the college and even high school levels. This often includes hands-on experience with one or more molecular techniques as part of a general biology laboratory. To assure that most…
Tagging and Purifying Proteins to Teach Molecular Biology and Advanced Biochemistry

ERIC Educational Resources Information Center

Roecklein-Canfield, Jennifer A.; Lopilato, Jane

2004-01-01

Two distinct courses, "Molecular Biology" taught by the Biology Department and "Advanced Biochemistry" taught by the Chemistry Department, complement each other and, when taught in a coordinated and integrated way, can enhance student learning and understanding of complex material. "Molecular Biology" is a comprehensive lecture-based course with a…
Implementation and Assessment of a Molecular Biology and Bioinformatics Undergraduate Degree Program

ERIC Educational Resources Information Center

Pham, Daphne Q. -D.; Higgs, David C.; Statham, Anne; Schleiter, Mary Kay

2008-01-01

The Department of Biological Sciences at the University of Wisconsin-Parkside has developed and implemented an innovative, multidisciplinary undergraduate curriculum in Molecular Biology and Bioinformatics (MBB). The objective of the MBB program is to give students a hands-on facility with molecular biology theories and laboratory techniques, an…
In Silico PCR Tools for a Fast Primer, Probe, and Advanced Searching.

PubMed

Kalendar, Ruslan; Muterko, Alexandr; Shamekova, Malika; Zhambakin, Kabyl

2017-01-01

The polymerase chain reaction (PCR) is fundamental to molecular biology and is the most important practical molecular technique for the research laboratory. The principle of this technique has been applied in many other simple and complex nucleic acid amplification technologies (NAAT). In parallel to laboratory "wet bench" experiments for nucleic acid amplification technologies, in silico or virtual (bioinformatics) approaches have been developed, including in silico PCR analysis. In silico NAAT analysis is a useful and efficient complementary method to ensure the specificity of primers or probes for an extensive range of PCR applications, such as homologous gene discovery, molecular diagnosis, DNA fingerprinting and repeat searching. Predicting the sensitivity and specificity of primers and probes requires a search to determine whether they match a database with an optimal number of mismatches, similarity and stability. In developing in silico bioinformatics tools for nucleic acid amplification technologies, the prospects for new NAAT or similar approaches should be taken into account, including forward-looking and comprehensive analysis that is not limited to a single PCR technique variant. The FastPCR software and the online Java web tool are integrated tools for in silico PCR of linear and circular DNA, multiple primer or probe searches in large or small databases, and advanced searching. These tools support batch processing of files, which is essential for automation when working with large amounts of data. The FastPCR software is available for download at http://primerdigital.com/fastpcr.html and the online Java version at http://primerdigital.com/tools/pcr.html .
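The core idea of in silico PCR, checking whether primers match a template within an allowed number of mismatches and whether a forward and reverse primer bracket a product, can be sketched as below. This is a deliberately simplified illustration, not FastPCR's algorithm: it scores only Hamming mismatches on a linear template and ignores melting temperature, 3′-end weighting and circular DNA. All function names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_sites(primer, template, max_mismatches=1):
    """List (start, strand, mismatches) for every template window matching the
    primer ('+') or its reverse complement ('-') within max_mismatches."""
    primer, template = primer.upper(), template.upper()
    probes = (("+", primer), ("-", reverse_complement(primer)))
    k = len(primer)
    hits = []
    for start in range(len(template) - k + 1):
        window = template[start:start + k]
        for strand, probe in probes:
            mm = mismatches(probe, window)
            if mm <= max_mismatches:
                hits.append((start, strand, mm))
    return hits


def predict_amplicons(forward, reverse, template, max_mismatches=1, max_length=5000):
    """Pair forward-primer sites on the top strand with downstream reverse-primer
    sites (its reverse complement on the top strand); return (start, length)."""
    fwd = [s for s, strand, _ in find_sites(forward, template, max_mismatches) if strand == "+"]
    rev = [s for s, strand, _ in find_sites(reverse, template, max_mismatches) if strand == "-"]
    products = []
    for i in fwd:
        for j in rev:
            length = j + len(reverse) - i
            if j >= i + len(forward) and length <= max_length:
                products.append((i, length))
    return products


# Toy linear template: forward site, spacer, reverse-primer site.
TEMPLATE = "TTTT" + "GATTACAGG" + "CGCGCG" + "CCTTGGAAC" + "TTTT"
print(predict_amplicons("GATTACAGG", "GTTCCAAGG", TEMPLATE, max_mismatches=0))
print(find_sites("GATTACAGC", TEMPLATE))  # one mismatch at the primer's 3' end
```

A search against a large sequence database would normally use an index (for example k-mer seeds) rather than this window-by-window scan.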
A review on computational systems biology of pathogen–host interactions

PubMed Central

Durmuş, Saliha; Çakır, Tunahan; Özgür, Arzucan; Guthke, Reinhard

2015-01-01

Pathogens manipulate the cellular mechanisms of host organisms via pathogen–host interactions (PHIs) in order to take advantage of the capabilities of host cells, leading to infections. The crucial role of these interspecies molecular interactions in initiating and sustaining infections necessitates a thorough understanding of the corresponding mechanisms. Unlike the traditional approach of considering the host or pathogen separately, a systems-level approach that considers the PHI system as a whole is indispensable for elucidating the mechanisms of infection. Following technological advances in the post-genomic era, PHI data have been produced on a large scale within the last decade. Systems biology-based methods for the inference and analysis of PHI regulatory, metabolic and protein–protein networks, which shed light on infection mechanisms, are increasingly in demand thanks to the availability of omics data. The knowledge derived from PHIs may contribute substantially to the identification of new and more efficient therapeutics to prevent or cure infections. Recent efforts have documented these experimentally verified PHI data in detail through Web-based databases. Despite these advances in data archiving, large amounts of PHI data in the biomedical literature are yet to be discovered, and novel text mining methods are in development to unearth such hidden data. Here, we review a collection of recent studies on computational systems biology of PHIs with a special focus on methods for the inference and analysis of PHI networks, also covering the Web-based databases and text-mining efforts to unravel the data hidden in the literature. PMID:25914674
Perigone Lobe Transcriptome Analysis Provides Insights into Rafflesia cantleyi Flower Development.

PubMed

Lee, Xin-Wei; Mat-Isa, Mohd-Noor; Mohd-Elias, Nur-Atiqah; Aizat-Juhari, Mohd Afiq; Goh, Hoe-Han; Dear, Paul H; Chow, Keng-See; Haji Adam, Jumaat; Mohamed, Rahmah; Firdaus-Raih, Mohd; Wan, Kiew-Lian

2016-01-01

Rafflesia is a biologically enigmatic genus that is very rare in occurrence and possesses an extraordinary morphology. This parasitic plant produces a gigantic flower up to one metre in diameter, with no leaves, stem or roots. However, little is known about its floral biology, especially at the molecular level. To address this issue, we generated and characterised the transcriptome of the Rafflesia cantleyi flower and compared it with the transcriptome of its floral bud to predict genes that are expressed and regulated during flower development. Approximately 40 million sequencing reads were generated and assembled de novo into 18,053 transcripts with an average length of 641 bp. More than 79% of these transcripts had significant matches to annotated sequences in the public protein database. A total of 11,756 and 7,891 transcripts were assigned to Gene Ontology categories and clusters of orthologous groups, respectively. In addition, 6,019 transcripts could be mapped to 129 pathways in the Kyoto Encyclopaedia of Genes and Genomes Pathway database. Digital abundance analysis identified 52 transcripts with very high expression in the flower transcriptome of R. cantleyi. Subsequently, analysis of differential expression between the developing flower and the floral bud revealed a set of 105 transcripts with a potential role in flower development. Our work presents a deep transcriptome resource for the developing flower of R. cantleyi. The genes identified as potentially involved in the growth and development of the R. cantleyi flower provide insights into biological processes that occur during flower development.
Enhancing navigation in biomedical databases by community voting and database-driven text classification

PubMed Central

Duchrow, Timo; Shtatland, Timur; Guettler, Daniel; Pivovarov, Misha; Kramer, Stefan; Weissleder, Ralph

2009-01-01

Background The breadth of biological databases and their information content continue to increase exponentially. Unfortunately, our ability to query such sources is often still suboptimal. Here, we introduce and apply community voting, database-driven text classification and visual aids as a means to incorporate distributed expert knowledge, automatically classify database entries and retrieve them efficiently. Results Using a previously developed peptide database as an example, we compared several machine learning algorithms in their ability to classify abstracts of published literature into categories relevant to peptide research, such as related or not related to cancer, angiogenesis, molecular imaging, etc. Ensembles of bagged decision trees met the requirements of our application best; no other algorithm consistently performed better in comparative testing. Moreover, we show that the algorithm produces meaningful class probability estimates, which can be used to visualize the confidence of automatic classification during the retrieval process. To allow viewing long lists of search results enriched by automatic classifications, we added a dynamic heat map to the web interface. We take advantage of community knowledge by enabling users to cast votes in Web 2.0 style to correct automated classification errors, which triggers reclassification of all entries. We used a novel framework in which the database "drives" the entire vote aggregation and reclassification process to increase speed while conserving computational resources and keeping the method scalable. In our experiments, we simulate community voting by adding various levels of noise to nearly perfectly labelled instances, and show that, under such conditions, classification can be improved significantly. 
Conclusion Using PepBank as a model database, we show how to build a classification-aided retrieval system that gathers training data from the community, is completely controlled by the database, scales well with concurrent change events, and can be adapted to add text classification capability to other biomedical databases. The system can be accessed at . PMID:19799796
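The vote-aggregation step described for PepBank can be sketched as follows. The abstract does not give the aggregation rule, so the thresholds (`min_votes`, `min_agreement`), the label names and the helper functions are hypothetical: user votes override the classifier's label for an entry only when enough users agree, and the merged labels would then serve as training data for the next round of reclassification.

```python
from collections import Counter, defaultdict


def aggregate_votes(votes, min_votes=3, min_agreement=0.7):
    """votes: iterable of (entry_id, label) pairs cast by users.

    Returns {entry_id: label} for entries whose votes meet both thresholds;
    entries with too few or too conflicting votes are left out.
    """
    tallies = defaultdict(Counter)
    for entry_id, label in votes:
        tallies[entry_id][label] += 1
    accepted = {}
    for entry_id, counts in tallies.items():
        total = sum(counts.values())
        label, top = counts.most_common(1)[0]
        if total >= min_votes and top / total >= min_agreement:
            accepted[entry_id] = label
    return accepted


def merge_labels(predicted, votes, **thresholds):
    """Community-confirmed labels override classifier predictions; the merged
    labels would then serve as training data for the next reclassification."""
    merged = dict(predicted)
    merged.update(aggregate_votes(votes, **thresholds))
    return merged


# Hypothetical entries and votes.
predicted = {"p1": "cancer", "p2": "not_cancer", "p3": "cancer"}
votes = [("p2", "cancer")] * 3 + [("p3", "not_cancer"), ("p3", "cancer"), ("p3", "not_cancer")]
print(merge_labels(predicted, votes))
```

Requiring a minimum agreement keeps a few noisy votes from flipping labels, which matches the abstract's concern with noisy community input; the actual system's rule may differ.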
The biobank for the molecular classification of kidney disease: research translation and precision medicine in nephrology.

PubMed

Muruve, Daniel A; Mann, Michelle C; Chapman, Kevin; Wong, Josee F; Ravani, Pietro; Page, Stacey A; Benediktsson, Hallgrimur

2017-07-26

Technological advances and the ability to interrogate disease pathogenesis using systems biology approaches are growing rapidly. As exemplified by the substantial progress in the personalized diagnosis and treatment of cancer, the application of systems biology to enable precision medicine in other disciplines, such as nephrology, is well underway. Institutions require infrastructure that permits the integration of clinical data, patient biospecimens and advanced technologies in order to contribute to, and benefit from, research in molecular disease classification and to devise specific, patient-oriented treatments. We describe the establishment of the Biobank for the Molecular Classification of Kidney Disease (BMCKD) at the University of Calgary, Alberta, Canada. The BMCKD consists of a fully equipped wet laboratory, an information technology infrastructure, and a formal operational, ethical and legal framework for banking human biospecimens and storing clinical data. The BMCKD first consolidated a large retrospective cohort of kidney biopsy specimens to create a population-based renal pathology database and tissue inventory of glomerular and other kidney diseases. The BMCKD will continue to prospectively bank all kidney biopsies performed in Southern Alberta. The BMCKD is equipped to perform molecular, clinical and epidemiologic studies in renal pathology. The BMCKD has also developed formal biobanking procedures for human specimens such as blood, urine and nucleic acids collected for basic and clinical research studies or for advanced diagnostic technologies in clinical care. The BMCKD is guided by standard operating procedures, an ethics framework and legal agreements with stakeholders that include researchers, data custodians and patients. The design and structure of the BMCKD permit its inclusion in a wide variety of research and clinical activities. 
The BMCKD is a core multidisciplinary facility that will bridge basic and clinical research and integrate precision medicine into renal pathology and nephrology.
Remodeling Cildb, a popular database for cilia and links for ciliopathies

PubMed Central

2014-01-01

Background New-generation technologies in cell and molecular biology generate large amounts of data that are hard to exploit for individual proteins. This is particularly true for ciliary and centrosomal research. Cildb is a multi-species knowledgebase gathering high-throughput studies, which allows advanced searches to identify proteins involved in centrosome, basal body or cilia biogenesis, composition and function. Combined with the localization of genetic diseases on human chromosomes given by OMIM links, Cildb searches can be used to compile candidate ciliopathy proteins. Methods Orthology between recent versions of the whole proteomes was computed using Inparanoid, and ciliary high-throughput studies were remapped onto these recent versions. Results Owing to the constant evolution of the ciliary and centrosomal field, Cildb has recently been upgraded twice, with new species' whole proteomes and new ciliary studies, and the latest version features a novel BioMart interface that is much more intuitive than the previous ones. Conclusions This already popular database is now designed for easier use and is up to date with respect to high-throughput ciliary studies. PMID:25422781
Molecular epidemiology biomarkers-Sample collection and processing considerations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Holland, Nina T.; Pfleger, Laura; Berger, Eileen

2005-08-07

Biomarker studies require processing and storage of numerous biological samples with the goals of obtaining a large amount of information and minimizing future research costs. An efficient study design includes provisions for processing of the original samples, such as cryopreservation, DNA isolation, and preparation of specimens for exposure assessment. Use of standard, two-dimensional and nano-barcodes, together with customized electronic databases, ensures efficient management of large sample collections and tracking of data-analysis results. Standard operating procedures and quality control plans help to protect sample quality and to ensure the validity of the biomarker data. Specific state, federal and international regulations are in place regarding research with human samples, governing areas including custody, safety of handling, and transport of human samples. Appropriate informed consent must be obtained from the study subjects prior to sample collection, and confidentiality of results must be maintained. Finally, examples of three biorepositories of different scale (European Cancer Study, National Cancer Institute and School of Public Health Biorepository, University of California, Berkeley) are used to illustrate challenges faced by investigators and the ways to overcome them. New software and biorepository technologies are being developed by many companies that will help bring biological banking to the new level required by the molecular epidemiology of the 21st century.
Design Principles of Nanoparticles as Contrast Agents for Magnetic Resonance Imaging

NASA Astrophysics Data System (ADS)

Shan, Liang; Gu, Xinbin; Wang, Paul

2013-09-01

Molecular imaging is an emerging field that introduces molecular agents into traditional imaging techniques, enabling visualization, characterization and measurement of biological processes at the molecular and cellular levels in humans and other living systems. The promise of molecular imaging lies in its potential for selective potency by targeting biomarkers or molecular targets, with the imaging agents serving as reporters for the selectivity of targeting. Development of an efficient molecular imaging agent depends on well-controlled, high-quality experimental design involving target selection, agent synthesis, in vitro characterization, and in vivo animal characterization before it is applied in humans. According to the analysis from the Molecular Imaging and Contrast Agent Database (MICAD, ), more than 6000 molecular imaging agents with sufficient preclinical evaluation have been reported to date in the literature, and this number increases by 250-300 novel agents each year. The majority of these agents are radionuclides, which are developed for positron emission tomography (PET) and single photon emission computed tomography (SPECT). Contrast agents for magnetic resonance imaging (MRI) account for only a small part. This is largely because MRI is currently not a fully quantitative imaging technique and is less sensitive than PET and SPECT. However, because of its superior ability to simultaneously extract molecular and anatomic information, molecular MRI is attracting significant interest, and various targeted nanoparticle contrast agents have been synthesized for MRI. The first and one of the most critical steps in developing a targeted nanoparticle contrast agent is target selection, which plays the central role and forms the basis for the success of molecular imaging. This chapter discusses the design principles of targeted contrast agents in the emerging frontiers of molecular MRI.
The 2015 edition of the GEISA spectroscopic database

NASA Astrophysics Data System (ADS)

Jacquinet-Husson, N.; Armante, R.; Scott, N. A.; Chédin, A.; Crépeau, L.; Boutammine, C.; Bouhdaoui, A.; Crevoisier, C.; Capelle, V.; Boonne, C.; Poulet-Crovisier, N.; Barbe, A.; Chris Benner, D.; Boudon, V.; Brown, L. R.; Buldyreva, J.; Campargue, A.; Coudert, L. H.; Devi, V. M.; Down, M. J.; Drouin, B. J.; Fayt, A.; Fittschen, C.; Flaud, J.-M.; Gamache, R. R.; Harrison, J. J.; Hill, C.; Hodnebrog, Ø.; Hu, S.-M.; Jacquemart, D.; Jolly, A.; Jiménez, E.; Lavrentieva, N. N.; Liu, A.-W.; Lodi, L.; Lyulin, O. M.; Massie, S. T.; Mikhailenko, S.; Müller, H. S. P.; Naumenko, O. V.; Nikitin, A.; Nielsen, C. J.; Orphal, J.; Perevalov, V. I.; Perrin, A.; Polovtseva, E.; Predoi-Cross, A.; Rotger, M.; Ruth, A. A.; Yu, S. S.; Sung, K.; Tashkun, S. A.; Tennyson, J.; Tyuterev, Vl. G.; Vander Auwera, J.; Voronin, B. A.; Makie, A.

2016-09-01

The GEISA database (Gestion et Etude des Informations Spectroscopiques Atmosphériques: Management and Study of Atmospheric Spectroscopic Information) has been developed and maintained by the http://ara.abct.lmd.polytechnique.fr. The "line parameters database" contains 52 molecular species (118 isotopologues) and transitions in the spectral range from 10⁻⁶ to 35,877.031 cm⁻¹, representing 5,067,351 entries, against 3,794,297 in GEISA-2011. Among the previously existing molecules, 20 molecular species have been updated. A new molecule (SO3) has been added. HDO, isotopologue of H2O, is now identified as an independent molecular species. Seven new isotopologues have been added to the GEISA-2015 database. The "cross section sub-database" has been enriched by the addition of 43 new molecular species in its infrared part, 4 molecules (ethane, propane, acetone, acetonitrile) are also updated; they represent 3% of the update. A new section is added, in the near-infrared spectral region, involving 7 molecular species: CH3CN, CH3I, CH3O2, H2CO, HO2, HONO, NH3. The "microphysical and optical properties of atmospheric aerosols sub-database" has been updated for the first time since 2003. It contains more than 40 species originating from NCAR and 20 from the http://eodg.atm.ox.ac.uk/ARIA/introduction_nocol.html. As for the previous versions, this new release of GEISA and associated management software facilities are implemented and freely accessible on the http://cds-espri.ipsl.fr/etherTypo/?id=950.
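To put the line-parameter figures in more familiar terms: wavelength is the reciprocal of wavenumber, so the upper limit of the spectral range lies in the near ultraviolet and the lower limit corresponds to a wavelength of 10 km, while the entry count grew by roughly a third relative to GEISA-2011:

```latex
\lambda = \frac{1}{\tilde{\nu}}, \qquad
\frac{1}{35\,877.031\ \mathrm{cm^{-1}}} \approx 2.787 \times 10^{-5}\ \mathrm{cm} \approx 278.7\ \mathrm{nm}, \qquad
\frac{1}{10^{-6}\ \mathrm{cm^{-1}}} = 10^{6}\ \mathrm{cm} = 10\ \mathrm{km}

\frac{5\,067\,351 - 3\,794\,297}{3\,794\,297} = \frac{1\,273\,054}{3\,794\,297} \approx 0.336
```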
Molecular Force Spectroscopy on Cells

NASA Astrophysics Data System (ADS)

Liu, Baoyu; Chen, Wei; Zhu, Cheng

2015-04-01

Molecular force spectroscopy has become a powerful tool to study how mechanics regulates biology, especially the mechanical regulation of molecular interactions and its impact on cellular functions. This force-driven methodology has uncovered a wealth of new information on the physical chemistry of molecular bonds in various biological systems. The new concepts, qualitative and quantitative measures describing bond behavior under force, and structural bases underlying these phenomena have substantially advanced our fundamental understanding of the inner workings of biological systems from the nanoscale (molecule) to the microscale (cell), elucidated basic molecular mechanisms of a wide range of important biological processes, and provided opportunities for engineering applications. Here, we review major force spectroscopic assays, conceptual developments of mechanically regulated kinetics of molecular interactions, and their biological relevance. We also present current challenges and highlight future directions.
Permanent Genetic Resources added to Molecular Ecology Resources Database 1 December 2009–31 January 2010

USDA-ARS's Scientific Manuscript database

This article documents the addition of 220 microsatellite marker loci to the Molecular Ecology Resources Database. Loci were developed for the following species: Allanblackia floribunda, Amblyraja radiata, Bactrocera cucurbitae, Brachycaudus helichrysi, Calopogonium mucunoides, Dissodactylus primiti...
MMDB: Entrez’s 3D-structure database

PubMed Central

Wang, Yanli; Anderson, John B.; Chen, Jie; Geer, Lewis Y.; He, Siqian; Hurwitz, David I.; Liebert, Cynthia A.; Madej, Thomas; Marchler, Gabriele H.; Marchler-Bauer, Aron; Panchenko, Anna R.; Shoemaker, Benjamin A.; Song, James S.; Thiessen, Paul A.; Yamashita, Roxanne A.; Bryant, Stephen H.

2002-01-01

Three-dimensional structures are now known within many protein families, and it is quite likely, in searching a sequence database, that one will encounter a homolog with known structure. The goal of Entrez’s 3D-structure database is to make this information, and the functional annotation it can provide, easily accessible to molecular biologists. To this end Entrez’s search engine provides three powerful features. (i) Sequence and structure neighbors; one may select all sequences similar to one of interest, for example, and link to any known 3D structures. (ii) Links between databases; one may search by term matching in MEDLINE, for example, and link to 3D structures reported in these articles. (iii) Sequence and structure visualization; having identified a homolog with known structure, one may view molecular-graphic and alignment displays to infer approximate 3D structure. In this article we focus on two features of Entrez’s Molecular Modeling Database (MMDB) not described previously: links from individual biopolymer chains within 3D structures to a systematic taxonomy of organisms represented in molecular databases, and links from individual chains (and compact 3D domains within them) to structure neighbors, other chains (and 3D domains) with similar 3D structure. MMDB may be accessed at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure. PMID:11752307
The molecular biology in wound healing & non-healing wound.

PubMed

Qing, Chun

2017-08-01

Over the past 30 years, the development of molecular biology and other new biotechnologies has helped us understand skin wound healing and non-healing wounds. This review mainly focuses on the molecular biology of many cytokines (including growth factors) and of other molecular factors, such as the extracellular matrix (ECM), in wound healing. The molecular biology of cell movement, such as that of epidermal cells, in wound healing is also discussed. Moreover, many common chronic wounds, such as pressure ulcers, leg ulcers, diabetic foot wounds, and venous stasis ulcers, usually deteriorate into non-healing wounds. Therefore, the molecular biology of advanced glycation end products (AGEs) and other molecular factors in diabetic non-healing wounds is also reviewed. Copyright © 2017 Daping Hospital and the Research Institute of Surgery of the Third Military Medical University. Production and hosting by Elsevier B.V. All rights reserved.