GenomeRNAi: a database for cell-based RNAi phenotypes.
Horn, Thomas; Arziman, Zeynep; Berger, Juerg; Boutros, Michael
2007-01-01
RNA interference (RNAi) has emerged as a powerful tool to generate loss-of-function phenotypes in a variety of organisms. Combined with the sequence information of almost completely annotated genomes, RNAi technologies have opened new avenues to conduct systematic genetic screens for every annotated gene in the genome. As increasingly large datasets of RNAi-induced phenotypes become available, an important challenge remains the systematic integration and annotation of functional information. Genome-wide RNAi screens have been performed both in Caenorhabditis elegans and Drosophila for a variety of phenotypes and several RNAi libraries have become available to assess phenotypes for almost every gene in the genome. These screens were performed using different types of assays from visible phenotypes to focused transcriptional readouts and provide a rich data source for functional annotation across different species. The GenomeRNAi database provides access to published RNAi phenotypes obtained from cell-based screens and maps them to their genomic locus, including possible non-specific regions. The database also gives access to sequence information of RNAi probes used in various screens. It can be searched by phenotype, by gene, by RNAi probe or by sequence and is accessible at http://rnai.dkfz.de. PMID:17135194
A comprehensive global genotype-phenotype database for rare diseases.
Trujillano, Daniel; Oprea, Gabriela-Elena; Schmitz, Yvonne; Bertoli-Avella, Aida M; Abou Jamra, Rami; Rolfs, Arndt
2017-01-01
The ability to discover genetic variants in a patient runs far ahead of the ability to interpret them. Databases with accurate descriptions of the causal relationship between the variants and the phenotype are valuable since these are critical tools in clinical genetic diagnostics. Here, we introduce a comprehensive and global genotype-phenotype database focusing on rare diseases. This database (CentoMD®) is a browser-based tool that enables access to a comprehensive, independently curated system utilizing stringent high-quality criteria and a quickly growing repository of genetic and human phenotype ontology (HPO)-based clinical information. Its main goals are to aid the evaluation of genetic variants, to enhance the validity of the genetic analytical workflow, to increase the quality of genetic diagnoses, and to improve evaluation of treatment options for patients with hereditary diseases. The database software correlates clinical information from consented patients and probands of different geographical backgrounds with a large dataset of genetic variants and, when available, biomarker information. An automated follow-up tool is incorporated that informs all users whenever a variant classification has changed. These unique features, fully embedded in a CLIA/CAP-accredited quality management system, allow appropriate data quality and enhanced patient safety. More than 100,000 genetically screened individuals are documented in the database, resulting in more than 470 million variant detections. Approximately 57% of the clinically relevant and uncertain variants in the database are novel. Notably, 3% of the genetic variants identified and previously reported in the literature as being associated with a particular rare disease were reclassified, based on internal evidence, as clinically irrelevant.
The database offers a comprehensive summary of the clinical validity and causality of detected gene variants with their associated phenotypes, and is a valuable tool for identifying new disease genes through the correlation of novel genetic variants with specific, well-defined phenotypes.
Milc, Justyna; Sala, Antonio; Bergamaschi, Sonia; Pecchioni, Nicola
2011-01-01
The CEREALAB database aims to store genotypic and phenotypic data obtained by the CEREALAB project and to integrate them with already existing data sources in order to create a tool for plant breeders and geneticists. The database can help them in unravelling the genetics of economically important phenotypic traits; in identifying and choosing molecular markers associated with key traits; and in choosing the desired parentals for breeding programs. The database is divided into three sub-schemas corresponding to the species of interest: wheat, barley and rice; each sub-schema is then divided into two sub-ontologies, regarding genotypic and phenotypic data, respectively. Database URL: http://www.cerealab.unimore.it/jws/cerealab.jnlp PMID:21247929
Multi-source and ontology-based retrieval engine for maize mutant phenotypes
Green, Jason M.; Harnsomburana, Jaturon; Schaeffer, Mary L.; Lawrence, Carolyn J.; Shyu, Chi-Ren
2011-01-01
Model Organism Databases, including the various plant genome databases, collect and enable access to massive amounts of heterogeneous information, including sequence data, gene product information, images of mutant phenotypes, etc., as well as textual descriptions of many of these entities. While a variety of basic browsing and search capabilities are available to allow researchers to query and peruse the names and attributes of phenotypic data, next-generation search mechanisms that allow querying and ranking of text descriptions are much less common. In addition, the plant community needs an innovative way to leverage the existing links in these databases to search groups of text descriptions simultaneously. Furthermore, though much time and effort have been afforded to the development of plant-related ontologies, the knowledge embedded in these ontologies remains largely unused in available plant search mechanisms. Addressing these issues, we have developed a unique search engine for mutant phenotypes from MaizeGDB. This advanced search mechanism integrates various text description sources in MaizeGDB to aid a user in retrieving desired mutant phenotype information. Currently, descriptions of mutant phenotypes, loci and gene products are utilized collectively for each search, though expansion of the search mechanism to include other sources is straightforward. The retrieval engine, to our knowledge, is the first engine to exploit the content and structure of available domain ontologies, currently the Plant and Gene Ontologies, to expand and enrich retrieval results in major plant genomic databases. Database URL: http://www.PhenomicsWorld.org/QBTA.php PMID:21558151
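The ontology-based expansion mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a toy ontology given as a parent-to-children mapping and plain substring matching for ranking; the function names, the ontology fragment and the mutant descriptions are all hypothetical and do not reproduce the actual MaizeGDB retrieval engine.

```python
def expand_terms(term, children):
    """Return the query term plus all of its descendants in a toy ontology."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, []))
    return seen


def search(query, children, descriptions):
    """Rank descriptions by how many expanded terms they mention."""
    terms = expand_terms(query, children)
    hits = []
    for name, text in descriptions.items():
        score = sum(1 for t in terms if t in text.lower())
        if score:
            hits.append((score, name))
    # Highest score first; ties broken alphabetically by name.
    return [name for score, name in sorted(hits, key=lambda h: (-h[0], h[1]))]


# Hypothetical ontology fragment and mutant descriptions.
children = {"leaf": ["blade", "sheath"]}
descriptions = {
    "m1": "narrow blade with stripes",
    "m2": "dwarf plant",
    "m3": "pale leaf sheath",
}
ranked = search("leaf", children, descriptions)
```

In this sketch a query for "leaf" also retrieves the description that only mentions "blade", which a plain keyword search would miss.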
Stade, Björn; Seelow, Dominik; Thomsen, Ingo; Krawczak, Michael; Franke, Andre
2014-01-01
Next Generation Sequencing (NGS) of whole exomes or genomes is increasingly being used in human genetic research and diagnostics. Sharing NGS data with third parties can help physicians and researchers to identify causative or predisposing mutations for a specific sample of interest more efficiently. In many cases, however, the exchange of such data may collide with data privacy regulations. GrabBlur is a newly developed tool to aggregate and share NGS-derived single nucleotide variant (SNV) data in a public database, keeping individual samples unidentifiable. In contrast to other currently existing SNV databases, GrabBlur includes phenotypic information and contact details of the submitter of a given database entry. By means of GrabBlur human geneticists can securely and easily share SNV data from resequencing projects. GrabBlur can ease the interpretation of SNV data by offering basic annotations, genotype frequencies and in particular phenotypic information - given that this information was shared - for the SNV of interest. GrabBlur facilitates the combination of phenotypic and NGS data (VCF files) via a local interface or command line operations. Data submissions may include HPO (Human Phenotype Ontology) terms, other trait descriptions, NGS technology information and the identity of the submitter. Most of this information is optional and its provision at the discretion of the submitter. Upon initial intake, GrabBlur merges and aggregates all sample-specific data. If a certain SNV is rare, the sample-specific information is replaced with the submitter identity. Generally, all data in GrabBlur are highly aggregated so that they can be shared with others while ensuring maximum privacy. Thus, it is impossible to reconstruct complete exomes or genomes from the database or to re-identify single individuals. 
After the individual information has been sufficiently "blurred", the data can be uploaded into a publicly accessible domain where aggregated genotypes are provided alongside phenotypic information. A web interface allows querying the database and the extraction of gene-wise SNV information. If an interesting SNV is found, the interrogator can get in contact with the submitter to exchange further information on the carrier and clarify, for example, whether the latter's phenotype matches the phenotype of their own patient.
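The aggregation idea described above can be sketched roughly as follows. This is an illustration only, not GrabBlur's code or file format: the input structure, the function name `aggregate_snvs`, the `rare_threshold` parameter and the choice to keep genotype counts for rare SNVs are all assumptions made for the example.

```python
from collections import defaultdict


def aggregate_snvs(samples, submitter, rare_threshold=3):
    """Merge per-sample SNV calls into per-SNV aggregate counts.

    samples: list of dicts with "genotypes" ({snv_id: genotype}) and an
    optional "hpo_terms" list. For SNVs seen in fewer than rare_threshold
    samples (a hypothetical cutoff), phenotype details are withheld and
    only the submitter contact is kept.
    """
    genotype_counts = defaultdict(lambda: defaultdict(int))
    phenotype_counts = defaultdict(lambda: defaultdict(int))
    for sample in samples:
        for snv, genotype in sample["genotypes"].items():
            genotype_counts[snv][genotype] += 1
            for term in sample.get("hpo_terms", []):
                phenotype_counts[snv][term] += 1

    result = {}
    for snv, counts in genotype_counts.items():
        entry = {"genotype_counts": dict(counts)}
        if sum(counts.values()) < rare_threshold:
            entry["contact"] = submitter  # rare: no sample-level phenotype
        else:
            entry["phenotype_counts"] = dict(phenotype_counts[snv])
        result[snv] = entry
    return result


# Hypothetical input: three samples from one submitter.
samples = [
    {"genotypes": {"chr1:100A>G": "0/1", "chr2:200C>T": "0/1"},
     "hpo_terms": ["HP:0000365"]},
    {"genotypes": {"chr1:100A>G": "1/1"}, "hpo_terms": ["HP:0000365"]},
    {"genotypes": {"chr1:100A>G": "0/1"}, "hpo_terms": []},
]
blurred = aggregate_snvs(samples, "lab@example.org")
```

The output holds only counts per SNV, so no single sample's full set of variants can be read back from it.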
Nishio, Shin-Ya; Usami, Shin-Ichi
2017-03-01
Recent advances in next-generation sequencing (NGS) have given rise to new challenges due to the difficulties in variant pathogenicity interpretation and large dataset management, including many kinds of public population databases as well as public or commercial disease-specific databases. Here, we report a new database development tool, named the "Clinical NGS Database," for improving clinical NGS workflow through the unified management of variant information and clinical information. This database software offers a two-feature approach to variant pathogenicity classification. The first of these approaches is a phenotype similarity-based approach. This database allows the easy comparison of the detailed phenotype of each patient with the average phenotype of the same gene mutation at the variant or gene level. It is also possible to browse patients with the same gene mutation quickly. The other approach is a statistical approach to variant pathogenicity classification based on the use of the odds ratio for comparisons between the case and the control for each inheritance mode (families with apparently autosomal dominant inheritance vs. control, and families with apparently autosomal recessive inheritance vs. control). A number of case studies are also presented to illustrate the utility of this database. © 2016 The Authors. Human Mutation published by Wiley Periodicals, Inc.
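The statistical approach above rests on the odds ratio between case and control groups. The abstract does not give the exact computation used by the Clinical NGS Database, so the following is a generic sketch. It assumes a 2x2 table of variant carriers versus non-carriers, a Woolf (log-based) 95% confidence interval, and a Haldane-Anscombe correction for zero cells; the function name and the counts are illustrative.

```python
import math


def odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Odds ratio and Woolf 95% confidence interval for a 2x2 table.

    If any cell is zero, 0.5 is added to every cell (Haldane-Anscombe
    correction) so that the ratio and its standard error stay finite.
    """
    a = case_carriers
    b = case_total - case_carriers
    c = control_carriers
    d = control_total - control_carriers
    if min(a, b, c, d) == 0:
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - 1.96 * se)
    upper = math.exp(math.log(ratio) + 1.96 * se)
    return ratio, lower, upper


# Illustrative counts: 10 of 100 families in one inheritance-mode group
# carry the variant, versus 2 of 200 controls.
ratio, lower, upper = odds_ratio(10, 100, 2, 200)
```

In practice the comparison would be run once per inheritance mode (apparently dominant families vs. control, apparently recessive families vs. control), as the abstract describes.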
Peng, Zhi-yu; Zhou, Xin; Li, Linchuan; Yu, Xiangchun; Li, Hongjiang; Jiang, Zhiqiang; Cao, Guangyu; Bai, Mingyi; Wang, Xingchun; Jiang, Caifu; Lu, Haibin; Hou, Xianhui; Qu, Lijia; Wang, Zhiyong; Zuo, Jianru; Fu, Xiangdong; Su, Zhen; Li, Songgang; Guo, Hongwei
2009-01-01
Plant hormones are small organic molecules that influence almost every aspect of plant growth and development. Genetic and molecular studies have revealed a large number of genes that are involved in responses to numerous plant hormones, including auxin, gibberellin, cytokinin, abscisic acid, ethylene, jasmonic acid, salicylic acid, and brassinosteroid. Here, we develop an Arabidopsis hormone database, which aims to provide a systematic and comprehensive view of genes participating in plant hormonal regulation, as well as morphological phenotypes controlled by plant hormones. Based on data from mutant studies, transgenic analysis and gene ontology (GO) annotation, we have identified a total of 1026 genes in the Arabidopsis genome that participate in plant hormone functions. Meanwhile, a phenotype ontology is developed to precisely describe myriad hormone-regulated morphological processes with standardized vocabularies. A web interface (http://ahd.cbi.pku.edu.cn) would allow users to quickly get access to information about these hormone-related genes, including sequences, functional category, mutant information, phenotypic description, microarray data and linked publications. Several applications of this database in studying plant hormonal regulation and hormone cross-talk will be presented and discussed. PMID:19015126
PGMapper: a web-based tool linking phenotype to genes.
Xiong, Qing; Qiu, Yuhui; Gu, Weikuan
2008-04-01
With the availability of whole genome sequence in many species, linkage analysis, positional cloning and microarray are gradually becoming powerful tools for investigating the links between phenotype and genotype or genes. However, in these methods, causative genes underlying a quantitative trait locus, or a disease, are usually located within a large genomic region or a large set of genes. Examining the function of every gene is very time consuming and requires retrieving and integrating information from multiple databases or genome resources. PGMapper is a software tool for automatically matching phenotype to genes from a defined genome region or a group of given genes by combining the mapping information from the Ensembl database and gene function information from the OMIM and PubMed databases. PGMapper is currently available for candidate gene search of human, mouse, rat, zebrafish and 12 other species. Available online at http://www.genediscovery.org/pgmapper/index.jsp.
The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data
Köhler, Sebastian; Doelken, Sandra C.; Mungall, Christopher J.; Bauer, Sebastian; Firth, Helen V.; Bailleul-Forestier, Isabelle; Black, Graeme C. M.; Brown, Danielle L.; Brudno, Michael; Campbell, Jennifer; FitzPatrick, David R.; Eppig, Janan T.; Jackson, Andrew P.; Freson, Kathleen; Girdea, Marta; Helbig, Ingo; Hurst, Jane A.; Jähn, Johanna; Jackson, Laird G.; Kelly, Anne M.; Ledbetter, David H.; Mansour, Sahar; Martin, Christa L.; Moss, Celia; Mumford, Andrew; Ouwehand, Willem H.; Park, Soo-Mi; Riggs, Erin Rooney; Scott, Richard H.; Sisodiya, Sanjay; Vooren, Steven Van; Wapner, Ronald J.; Wilkie, Andrew O. M.; Wright, Caroline F.; Vulto-van Silfhout, Anneke T.; de Leeuw, Nicole; de Vries, Bert B. A.; Washingthon, Nicole L.; Smith, Cynthia L.; Westerfield, Monte; Schofield, Paul; Ruef, Barbara J.; Gkoutos, Georgios V.; Haendel, Melissa; Smedley, Damian; Lewis, Suzanna E.; Robinson, Peter N.
2014-01-01
The Human Phenotype Ontology (HPO) project, available at http://www.human-phenotype-ontology.org, provides a structured, comprehensive and well-defined set of 10,088 classes (terms) describing human phenotypic abnormalities and 13,326 subclass relations between the HPO classes. In addition we have developed logical definitions for 46% of all HPO classes using terms from ontologies for anatomy, cell types, function, embryology, pathology and other domains. This allows interoperability with several resources, especially those containing phenotype information on model organisms such as mouse and zebrafish. Here we describe the updated HPO database, which provides annotations of 7,278 human hereditary syndromes listed in OMIM, Orphanet and DECIPHER to classes of the HPO. Various meta-attributes such as frequency, references and negations are associated with each annotation. Several large-scale projects worldwide utilize the HPO for describing phenotype information in their datasets. We have therefore generated equivalence mappings to other phenotype vocabularies such as LDDB, Orphanet, MedDRA, UMLS and phenoDB, allowing integration of existing datasets and interoperability with multiple biomedical resources. We have created various ways to access the HPO database content using flat files, a MySQL database, and Web-based tools. All data and documentation on the HPO project can be found online. PMID:24217912
DRUMS: a human disease related unique gene mutation search engine.
Li, Zuofeng; Liu, Xingnan; Wen, Jingran; Xu, Ye; Zhao, Xin; Li, Xuan; Liu, Lei; Zhang, Xiaoyan
2011-10-01
With the completion of the human genome project and the development of new methods for gene variant detection, the integration of mutation data and its phenotypic consequences has become more important than ever. Among all available resources, locus-specific databases (LSDBs) curate one or more specific genes' mutation data along with high-quality phenotypes. Although some genotype-phenotype data from LSDBs have been integrated into central databases, little effort has been made to integrate all these data by a search engine approach. In this work, we have developed the disease related unique gene mutation search engine (DRUMS), a search engine for human disease related unique gene mutations, as a convenient tool for biologists or physicians to retrieve gene variant and related phenotype information. Gene variant and phenotype information were stored in a gene-centred relational database. Moreover, the relationships between mutations and diseases were indexed by the uniform resource identifier from an LSDB or another central database. By querying DRUMS, users can access the most popular mutation databases under one interface. DRUMS could be treated as a domain specific search engine. By using web crawling, indexing, and searching technologies, it provides a competitively efficient interface for searching and retrieving mutation data and their relationships to diseases. The present system is freely accessible at http://www.scbit.org/glif/new/drums/index.html. © 2011 Wiley-Liss, Inc.
eCOMPAGT – efficient Combination and Management of Phenotypes and Genotypes for Genetic Epidemiology
Schönherr, Sebastian; Weißensteiner, Hansi; Coassin, Stefan; Specht, Günther; Kronenberg, Florian; Brandstätter, Anita
2009-01-01
Background High-throughput genotyping and phenotyping projects of large epidemiological study populations require sophisticated laboratory information management systems. Most epidemiological studies include subject-related personal information, which needs to be handled with care by following data privacy protection guidelines. In addition, genotyping core facilities handling cooperative projects require a straightforward solution to monitor the status and financial resources of the different projects. Description We developed a database system for an efficient combination and management of phenotypes and genotypes (eCOMPAGT) deriving from genetic epidemiological studies. eCOMPAGT securely stores and manages genotype and phenotype data and enables different user modes with different rights. Special attention was paid to the import of data deriving from TaqMan and SNPlex genotyping assays. However, the database solution is adjustable to other genotyping systems by programming additional interfaces. Further important features are the scalability of the database and an export interface to statistical software. Conclusion eCOMPAGT can store, administer and connect phenotype data with all kinds of genotype data and is available as a downloadable version at . PMID:19432954
CRAVE: a database, middleware and visualization system for phenotype ontologies.
Gkoutos, Georgios V; Green, Eain C J; Greenaway, Simon; Blake, Andrew; Mallon, Ann-Marie; Hancock, John M
2005-04-01
A major challenge in modern biology is to link genome sequence information to organismal function. In many organisms this is being done by characterizing phenotypes resulting from mutations. Efficiently expressing phenotypic information requires combinatorial use of ontologies. However tools are not currently available to visualize combinations of ontologies. Here we describe CRAVE (Concept Relation Assay Value Explorer), a package allowing storage, active updating and visualization of multiple ontologies. CRAVE is a web-accessible JAVA application that accesses an underlying MySQL database of ontologies via a JAVA persistent middleware layer (Chameleon). This maps the database tables into discrete JAVA classes and creates memory resident, interlinked objects corresponding to the ontology data. These JAVA objects are accessed via calls through the middleware's application programming interface. CRAVE allows simultaneous display and linking of multiple ontologies and searching using Boolean and advanced searches.
Akiyama, Kenji; Kurotani, Atsushi; Iida, Kei; Kuromori, Takashi; Shinozaki, Kazuo; Sakurai, Tetsuya
2014-01-01
Arabidopsis thaliana is one of the most popular experimental plants. However, only 40% of its genes have at least one experimental Gene Ontology (GO) annotation assigned. Systematic observation of mutant phenotypes is an important technique for elucidating gene functions. Indeed, several large-scale phenotypic analyses have been performed and have generated phenotypic data sets from many Arabidopsis mutant lines and overexpressing lines, which are freely available online. Since each Arabidopsis mutant line database uses individual phenotype expression, the differences in the structured term sets used by each database make it difficult to compare data sets and make it impossible to search across databases. Therefore, we obtained publicly available information for a total of 66,209 Arabidopsis mutant lines, including loss-of-function (RATM and TARAPPER) and gain-of-function (AtFOX and OsFOX) lines, and integrated the phenotype data by mapping the descriptions onto Plant Ontology (PO) and Phenotypic Quality Ontology (PATO) terms. This approach made it possible to manage the four different phenotype databases as one large data set. Here, we report a publicly accessible web-based database, the RIKEN Arabidopsis Genome Encyclopedia II (RARGE II; http://rarge-v2.psc.riken.jp/), in which all of the data described in this study are included. Using the database, we demonstrated consistency (in terms of protein function) with a previous study and identified the presumed function of an unknown gene. We provide examples of AT1G21600, which is a subunit in the plastid-encoded RNA polymerase complex, and AT5G56980, which is related to the jasmonic acid signaling pathway.
Multi-source and ontology-based retrieval engine for maize mutant phenotypes
USDA-ARS's Scientific Manuscript database
In the midst of this genomics era, major plant genome databases are collecting massive amounts of heterogeneous information, including sequence data, gene product information, images of mutant phenotypes, etc., as well as textual descriptions of many of these entities. While basic browsing and sear...
AgeFactDB--the JenAge Ageing Factor Database--towards data integration in ageing research.
Hühne, Rolf; Thalheim, Torsten; Sühnel, Jürgen
2014-01-01
AgeFactDB (http://agefactdb.jenage.de) is a database aimed at the collection and integration of ageing phenotype data including lifespan information. Ageing factors are considered to be genes, chemical compounds or other factors such as dietary restriction, whose action results in a changed lifespan or another ageing phenotype. Any information related to the effects of ageing factors is called an observation and is presented on observation pages. To provide concise access to the complete information for a particular ageing factor, corresponding observations are also summarized on ageing factor pages. In a first step, ageing-related data were primarily taken from existing databases such as the Ageing Gene Database--GenAge, the Lifespan Observations Database and the Dietary Restriction Gene Database--GenDR. In addition, we have started to include new ageing-related information. Based on homology data taken from the HomoloGene Database, AgeFactDB also provides observation and ageing factor pages of genes that are homologous to known ageing-related genes. These homologues are considered as candidate or putative ageing-related genes. AgeFactDB offers a variety of search and browse options, and also allows the download of ageing factor or observation lists in TSV, CSV and XML formats.
Chloroplast 2010: A Database for Large-Scale Phenotypic Screening of Arabidopsis Mutants
Lu, Yan; Savage, Linda J.; Larson, Matthew D.; Wilkerson, Curtis G.; Last, Robert L.
2011-01-01
Large-scale phenotypic screening presents challenges and opportunities not encountered in typical forward or reverse genetics projects. We describe a modular database and laboratory information management system that was implemented in support of the Chloroplast 2010 Project, an Arabidopsis (Arabidopsis thaliana) reverse genetics phenotypic screen of more than 5,000 mutants (http://bioinfo.bch.msu.edu/2010_LIMS; www.plastid.msu.edu). The software and laboratory work environment were designed to minimize operator error and detect systematic process errors. The database uses Ruby on Rails and Flash technologies to present complex quantitative and qualitative data and pedigree information in a flexible user interface. Examples are presented where the database was used to find opportunities for process changes that improved data quality. We also describe the use of the data-analysis tools to discover mutants defective in enzymes of leucine catabolism (heteromeric mitochondrial 3-methylcrotonyl-coenzyme A carboxylase [At1g03090 and At4g34030] and putative hydroxymethylglutaryl-coenzyme A lyase [At2g26800]) based upon a syndrome of pleiotropic seed amino acid phenotypes that resembles previously described isovaleryl coenzyme A dehydrogenase (At3g45300) mutants. In vitro assay results support the computational annotation of At2g26800 as hydroxymethylglutaryl-coenzyme A lyase. PMID:21224340
The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide
Liolios, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Philip; Kyrpides, Nikos C.
2006-01-01
The Genomes On Line Database (GOLD) is a web resource for comprehensive access to information regarding complete and ongoing genome sequencing projects worldwide. The database currently incorporates information on over 1500 sequencing projects, of which 294 have been completed and the data deposited in the public databases. GOLD v.2 has been expanded to provide information related to organism properties such as phenotype, ecotype and disease. Furthermore, project relevance and availability information is now included. GOLD is available at . It is also mirrored at the Institute of Molecular Biology and Biotechnology, Crete, Greece at PMID:16381880
Schofield, Paul N; Sundberg, John P; Hoehndorf, Robert; Gkoutos, Georgios V
2011-09-01
The systematic investigation of the phenotypes associated with genotypes in model organisms holds the promise of revealing genotype-phenotype relations directly and without additional, intermediate inferences. Large-scale projects are now underway to catalog the complete phenome of a species, notably the mouse. With the increasing amount of phenotype information becoming available, a major challenge that biology faces today is the systematic analysis of this information and the translation of research results across species and into an improved understanding of human disease. The challenge is to integrate and combine phenotype descriptions within a species and to systematically relate them to phenotype descriptions in other species, in order to form a comprehensive understanding of the relations between those phenotypes and the genotypes involved in human disease. We distinguish between two major approaches for comparative phenotype analyses: the first relies on evolutionary relations to bridge the species gap, while the other approach compares phenotypes directly. In particular, the direct comparison of phenotypes relies heavily on the quality and coherence of phenotype and disease databases. We discuss major achievements and future challenges for these databases in light of their potential to contribute to the understanding of the molecular mechanisms underlying human disease. In particular, we discuss how the use of ontologies and automated reasoning can significantly contribute to the analysis of phenotypes and demonstrate their potential for enabling translational research.
Townend, Gillian S; Ehrhart, Friederike; van Kranen, Henk J; Wilkinson, Mark; Jacobsen, Annika; Roos, Marco; Willighagen, Egon L; van Enckevort, David; Evelo, Chris T; Curfs, Leopold M G
2018-04-27
Rett syndrome (RTT) is a monogenic rare disorder that causes severe neurological problems. In most cases, it results from a loss-of-function mutation in the gene encoding methyl-CpG-binding protein 2 (MECP2). Currently, about 900 unique MECP2 variations (benign and pathogenic) have been identified and it is suspected that the different mutations contribute to different levels of disease severity. For researchers and clinicians, it is important that genotype-phenotype information is available to identify disease-causing mutations for diagnosis, to aid in clinical management of the disorder, and to provide counseling for parents. In this study, 13 genotype-phenotype databases were surveyed for their general functionality and availability of RTT-specific MECP2 variation data. For each database, we investigated findability and interoperability alongside practical user functionality, and type and amount of genetic and phenotype data. The main conclusions are that these databases, and the specific MECP2 variants held within them, are challenging to find, and that interoperability is as yet poorly developed, so that searching across databases requires effort. Nevertheless, we found several thousand online database entries for MECP2 variations and their associated phenotypes, diagnosis, or predicted variant effects, which is a good starting point for researchers and clinicians who want to provide, annotate, and use the data. © 2018 The Authors. Human Mutation published by Wiley Periodicals, Inc.
Open Window: When Easily Identifiable Genomes and Traits Are in the Public Domain
Angrist, Misha
2014-01-01
“One can't be of an enquiring and experimental nature, and still be very sensible.” - Charles Fort [1] As the costs of personal genetic testing “self-quantification” fall, publicly accessible databases housing people's genotypic and phenotypic information are gradually increasing in number and scope. The latest entrant is openSNP, which allows participants to upload their personal genetic/genomic and self-reported phenotypic data. I believe the emergence of such open repositories of human biological data is a natural reflection of inquisitive and digitally literate people's desires to make genomic and phenotypic information more easily available to a community beyond the research establishment. Such unfettered databases hold the promise of contributing mightily to science, science education and medicine. That said, in an age of increasingly widespread governmental and corporate surveillance, we would do well to be mindful that genomic DNA is uniquely identifying. Participants in open biological databases are engaged in a real-time experiment whose outcome is unknown. PMID:24647311
2011-01-01
Background Renewed interest in plant × environment interactions has risen in the post-genomic era. In this context, high-throughput phenotyping platforms have been developed to create reproducible environmental scenarios in which the phenotypic responses of multiple genotypes can be analysed in a reproducible way. These platforms benefit hugely from the development of suitable databases for storage, sharing and analysis of the large amount of data collected. In the model plant Arabidopsis thaliana, most databases available to the scientific community contain data related to genetic and molecular biology and are characterised by an inadequacy in the description of plant developmental stages and experimental metadata such as environmental conditions. Our goal was to develop a comprehensive information system for sharing of the data collected in PHENOPSIS, an automated platform for Arabidopsis thaliana phenotyping, with the scientific community. Description PHENOPSIS DB is a publicly available (URL: http://bioweb.supagro.inra.fr/phenopsis/) information system developed for storage, browsing and sharing of online data generated by the PHENOPSIS platform and offline data collected by experimenters and experimental metadata. It provides modules coupled to a Web interface for (i) the visualisation of environmental data of an experiment, (ii) the visualisation and statistical analysis of phenotypic data, and (iii) the analysis of Arabidopsis thaliana plant images. Conclusions Firstly, data stored in the PHENOPSIS DB are of interest to the Arabidopsis thaliana community, particularly in allowing phenotypic meta-analyses directly linked to environmental conditions on which publications are still scarce. Secondly, data or image analysis modules can be downloaded from the Web interface for direct usage or as the basis for modifications according to new requirements. Finally, the structure of PHENOPSIS DB provides a useful template for the development of other similar databases related to genotype × environment interactions. PMID:21554668
Transactional Database Transformation and Its Application in Prioritizing Human Disease Genes
Xiang, Yang; Payne, Philip R.O.; Huang, Kun
2013-01-01
Binary (0,1) matrices, commonly known as transactional databases, can represent many kinds of application data, including gene-phenotype data where “1” represents a confirmed gene-phenotype relation and “0” represents an unknown relation. It is natural to ask what information is hidden behind these “0”s and “1”s. Unfortunately, recent matrix completion methods, though very effective in many cases, are less likely to infer something interesting from these (0,1)-matrices. To address this challenge, we propose IndEvi, a very succinct and effective algorithm to perform independent-evidence-based transactional database transformation. Each entry of a (0,1)-matrix is evaluated by “independent evidence” (maximal supporting patterns) extracted from the whole matrix for this entry. The value of an entry, whether 0 or 1, has no effect on its independent evidence. The experiment on a gene-phenotype database shows that our method is highly promising in ranking candidate genes and predicting unknown disease genes. PMID:21422495
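The (0,1)-matrix representation described in this abstract can be illustrated with a minimal sketch. It only shows how confirmed gene-phenotype pairs might be turned into a binary matrix in which 0 means "unknown" rather than "absent"; it is not the IndEvi algorithm, whose details are not given here. The function name and the gene and phenotype labels are hypothetical.

```python
def build_transaction_matrix(confirmed_pairs):
    """Build a (0,1) matrix from confirmed (gene, phenotype) pairs.

    Rows are genes and columns are phenotypes, both in sorted order.
    An entry of 1 marks a confirmed relation; 0 marks an unknown one.
    """
    genes = sorted({gene for gene, _ in confirmed_pairs})
    phenotypes = sorted({phenotype for _, phenotype in confirmed_pairs})
    row_index = {gene: i for i, gene in enumerate(genes)}
    col_index = {phenotype: j for j, phenotype in enumerate(phenotypes)}

    # Start from all zeros: every relation is unknown until confirmed.
    matrix = [[0] * len(phenotypes) for _ in genes]
    for gene, phenotype in confirmed_pairs:
        matrix[row_index[gene]][col_index[phenotype]] = 1
    return genes, phenotypes, matrix


# Hypothetical labels, used purely for illustration.
pairs = [("geneA", "ph1"), ("geneA", "ph2"), ("geneB", "ph2")]
genes, phenotypes, matrix = build_transaction_matrix(pairs)
for gene, row in zip(genes, matrix):
    print(gene, row)
```

Any ranking method applied to such a matrix would then score the 0 entries as candidate relations; how IndEvi does this is left to the original paper.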
The Pathogen-Host Interactions database (PHI-base): additions and future developments
Urban, Martin; Pant, Rashmi; Raghunath, Arathi; Irvine, Alistair G.; Pedro, Helder; Hammond-Kosack, Kim E.
2015-01-01
Rapidly evolving pathogens cause a diverse array of diseases and epidemics that threaten crop yield, food security as well as human, animal and ecosystem health. To combat infection greater comparative knowledge is required on the pathogenic process in multiple species. The Pathogen-Host Interactions database (PHI-base) catalogues experimentally verified pathogenicity, virulence and effector genes from bacterial, fungal and protist pathogens. Mutant phenotypes are associated with gene information. The included pathogens infect a wide range of hosts including humans, animals, plants, insects, fish and other fungi. The current version, PHI-base 3.6, available at http://www.phi-base.org, stores information on 2875 genes, 4102 interactions, 110 host species, 160 pathogenic species (103 plant, 3 fungal and 54 animal infecting species) and 181 diseases drawn from 1243 references. Phenotypic and gene function information has been obtained by manual curation of the peer-reviewed literature. A controlled vocabulary consisting of nine high-level phenotype terms permits comparisons and data analysis across the taxonomic space. PHI-base phenotypes were mapped via their associated gene information to reference genomes available in Ensembl Genomes. Virulence genes and hotspots can be visualized directly in genome browsers. Future plans for PHI-base include development of tools facilitating community-led curation and inclusion of the corresponding host target(s). PMID:25414340
DNA Data Bank of Japan: 30th anniversary.
Kodama, Yuichi; Mashima, Jun; Kosuge, Takehide; Kaminuma, Eli; Ogasawara, Osamu; Okubo, Kousaku; Nakamura, Yasukazu; Takagi, Toshihisa
2018-01-04
The DNA Data Bank of Japan (DDBJ) Center (http://www.ddbj.nig.ac.jp) has been providing public data services for 30 years since 1987. We are collecting nucleotide sequence data and associated biological information from researchers as a member of the International Nucleotide Sequence Database Collaboration (INSDC), in collaboration with the US National Center for Biotechnology Information and the European Bioinformatics Institute. The DDBJ Center also services the Japanese Genotype-phenotype Archive (JGA) with the National Bioscience Database Center to collect genotype and phenotype data of human individuals. Here, we outline our database activities for INSDC and JGA over the past year, and introduce submission, retrieval and analysis services running on our supercomputer system and their recent developments. Furthermore, we highlight our responses to the amended Japanese rules for the protection of personal information and the launch of the DDBJ Group Cloud service for sharing pre-publication data among research groups. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Riise Stensland, Hilde Monica Frostad; Frantzen, Gabrio; Kuokkanen, Elina; Buvang, Elisabeth Kjeldsen; Klenow, Helle Bagterp; Heikinheimo, Pirkko; Malm, Dag; Nilssen, Øivind
2015-06-01
α-Mannosidosis is an autosomal recessive lysosomal storage disorder caused by mutations in the MAN2B1 gene, encoding lysosomal α-mannosidase. The disorder is characterized by a range of clinical phenotypes of which the major manifestations are mental impairment, hearing impairment, skeletal changes, and immunodeficiency. Here, we report an α-mannosidosis mutation database, amamutdb.no, which has been constructed as a publicly accessible online resource for recording and analyzing MAN2B1 variants (http://amamutdb.no). Our aim has been to offer structured and relational information on MAN2B1 mutations and genotypes along with associated clinical phenotypes. Classifying missense mutations, as pathogenic or benign, is a challenge. Therefore, they have been given special attention as we have compiled all available data that relate to their biochemical, functional, and structural properties. The α-mannosidosis mutation database is comprehensive and relational in the sense that information can be retrieved and compiled across datasets; hence, it will facilitate diagnostics and increase our understanding of the clinical and molecular aspects of α-mannosidosis. We believe that the amamutdb.no structure and architecture will be applicable for the development of databases for any monogenic disorder. © 2015 WILEY PERIODICALS, INC.
Manual Gene Ontology annotation workflow at the Mouse Genome Informatics Database.
Drabkin, Harold J; Blake, Judith A
2012-01-01
The Mouse Genome Database, the Gene Expression Database and the Mouse Tumor Biology database are integrated components of the Mouse Genome Informatics (MGI) resource (http://www.informatics.jax.org). The MGI system presents both a consensus view and an experimental view of the knowledge concerning the genetics and genomics of the laboratory mouse. From genotype to phenotype, this information resource integrates information about genes, sequences, maps, expression analyses, alleles, strains and mutant phenotypes. Comparative mammalian data are also presented, particularly with regard to the use of the mouse as a model for the investigation of molecular and genetic components of human diseases. These data are collected from literature curation as well as downloads of large datasets (SwissProt, LocusLink, etc.). MGI is one of the founding members of the Gene Ontology (GO) and uses the GO for functional annotation of genes. Here, we discuss the workflow associated with manual GO annotation at MGI, from literature collection to display of the annotations. Peer-reviewed literature is collected mostly from a set of journals available electronically. Selected articles are entered into a master bibliography and indexed to one of eight areas of interest such as 'GO' or 'homology' or 'phenotype'. Each article is then either indexed to a gene already contained in the database or funneled through a separate nomenclature database to add genes. The master bibliography and associated indexing provide information for various curator-reports such as 'papers selected for GO that refer to genes with NO GO annotation'. Once indexed, curators who have expertise in appropriate disciplines enter pertinent information. MGI makes use of several controlled vocabularies that ensure uniform data encoding, enable robust analysis and support the construction of complex queries. These vocabularies range from pick-lists to structured vocabularies such as the GO. All data associations are supported with statements of evidence as well as access to source publications.
AgeFactDB—the JenAge Ageing Factor Database—towards data integration in ageing research
Hühne, Rolf; Thalheim, Torsten; Sühnel, Jürgen
2014-01-01
AgeFactDB (http://agefactdb.jenage.de) is a database aimed at the collection and integration of ageing phenotype data including lifespan information. Ageing factors are considered to be genes, chemical compounds or other factors such as dietary restriction, whose action results in a changed lifespan or another ageing phenotype. Any information related to the effects of ageing factors is called an observation and is presented on observation pages. To provide concise access to the complete information for a particular ageing factor, corresponding observations are also summarized on ageing factor pages. In a first step, ageing-related data were primarily taken from existing databases such as the Ageing Gene Database—GenAge, the Lifespan Observations Database and the Dietary Restriction Gene Database—GenDR. In addition, we have started to include new ageing-related information. Based on homology data taken from the HomoloGene Database, AgeFactDB also provides observation and ageing factor pages of genes that are homologous to known ageing-related genes. These homologues are considered as candidate or putative ageing-related genes. AgeFactDB offers a variety of search and browse options, and also allows the download of ageing factor or observation lists in TSV, CSV and XML formats. PMID:24217911
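The homology step described above, in which genes homologous to known ageing-related genes are treated as candidates, can be sketched roughly as follows. This is an illustrative assumption about the logic only, not AgeFactDB's actual implementation; the function name, the representation of homology groups as lists, and all gene identifiers are hypothetical.

```python
def candidate_ageing_genes(known_ageing_genes, homology_groups):
    """Return genes that share a homology group with a known ageing gene.

    known_ageing_genes: set of gene identifiers already annotated.
    homology_groups: iterable of collections, each holding the members
    of one homology group (for example, one HomoloGene-style cluster).
    Known genes themselves are excluded from the result.
    """
    known = set(known_ageing_genes)
    candidates = set()
    for group in homology_groups:
        members = set(group)
        # A group contributes candidates only if it contains a known gene.
        if members & known:
            candidates |= members - known
    return candidates


# Hypothetical identifiers, used purely for illustration.
known = {"worm_gene1"}
groups = [
    ["worm_gene1", "fly_gene1", "mouse_gene1"],
    ["worm_gene2", "mouse_gene2"],
]
print(sorted(candidate_ageing_genes(known, groups)))
```

Keeping candidates separate from the known set mirrors the abstract's distinction between established ageing factors and putative ones.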
Dececchi, T. Alex; Mabee, Paula M.; Blackburn, David C.
2016-01-01
Databases of organismal traits that aggregate information from one or multiple sources can be leveraged for large-scale analyses in biology. Yet the differences among these data streams and how well they capture trait diversity have never been explored. We present the first analysis of the differences between phenotypes captured in free text of descriptive publications (‘monographs’) and those used in phylogenetic analyses (‘matrices’). We focus our analysis on osteological phenotypes of the limbs of four extinct vertebrate taxa critical to our understanding of the fin-to-limb transition. We find that there is low overlap between the anatomical entities used in these two sources of phenotype data, indicating that phenotypes represented in matrices are not simply a subset of those found in monographic descriptions. Perhaps as expected, compared to characters found in matrices, phenotypes in monographs tend to emphasize descriptive and positional morphology, be somewhat more complex, and relate to fewer additional taxa. While based on a small set of focal taxa, these qualitative and quantitative data suggest that either source of phenotypes alone will result in incomplete knowledge of variation for a given taxon. As a broader community develops to use and expand databases characterizing organismal trait diversity, it is important to recognize the limitations of the data sources and develop strategies to more fully characterize variation both within species and across the tree of life. PMID:27191170
Zhang, Shihua; Xuan, Hongdong; Zhang, Liang; Fu, Sicong; Wang, Yijun; Yang, Hua; Tai, Yuling; Song, Youhong; Zhang, Jinsong; Ho, Chi-Tang; Li, Shaowen; Wan, Xiaochun
2017-09-01
Tea is one of the most consumed beverages in the world. Numerous studies show the exceptional health benefits (e.g. antioxidation, cancer prevention) of tea owing to its various bioactive components. However, data from these extensively published papers had not been made available in a central database. To lay a foundation for improving the understanding of the health functions of tea, we established the TBC2health database, which currently documents 1338 relationships between 497 tea bioactive compounds and 206 diseases (or phenotypes) manually culled from over 300 published articles. Each entry in TBC2health contains comprehensive information about a bioactive relationship that can be accessed in three aspects: (i) compound information, (ii) disease (or phenotype) information and (iii) evidence and reference. Using the curated bioactive relationships, a bipartite network was reconstructed and the corresponding network (or sub-network) visualization and topological analyses are provided for users. This database has a user-friendly interface for browsing, searching and downloading entries. In addition, TBC2health provides a submission page and several useful tools (e.g. BLAST, molecular docking) to facilitate use of the database. Consequently, TBC2health can serve as a valuable bioinformatics platform for the exploration of beneficial effects of tea on human health. TBC2health is freely available at http://camellia.ahau.edu.cn/TBC2health. © The Author 2016. Published by Oxford University Press.
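As a rough illustration of the bipartite network mentioned in this abstract, the sketch below builds a compound-disease network from curated relationship pairs and computes node degree, one of the simplest topological measures. It is an assumption about how such a network could be represented, not TBC2health's own code; the function names and the compound and disease labels are hypothetical placeholders.

```python
from collections import defaultdict


def build_bipartite_network(relationships):
    """Build both sides of a compound-disease bipartite network.

    relationships: iterable of (compound, disease) pairs.
    Returns two adjacency maps: compound -> set of diseases and
    disease -> set of compounds. Duplicate pairs collapse into one edge.
    """
    compound_to_diseases = defaultdict(set)
    disease_to_compounds = defaultdict(set)
    for compound, disease in relationships:
        compound_to_diseases[compound].add(disease)
        disease_to_compounds[disease].add(compound)
    return compound_to_diseases, disease_to_compounds


def node_degrees(adjacency):
    """Degree of each node: the number of distinct neighbours."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


# Placeholder labels, not real curated relationships.
edges = [
    ("compound_1", "disease_A"),
    ("compound_1", "disease_B"),
    ("compound_2", "disease_A"),
    ("compound_1", "disease_A"),  # duplicate record, same edge
]
by_compound, by_disease = build_bipartite_network(edges)
print(node_degrees(by_compound))
print(node_degrees(by_disease))
```

Storing both directions makes it cheap to ask either "which diseases is this compound linked to" or "which compounds are linked to this disease" without rescanning the edge list.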
Hassani-Pak, Keywan; Rawlings, Christopher
2017-06-13
Genetics and "omics" studies designed to uncover genotype to phenotype relationships often identify large numbers of potential candidate genes, among which the causal genes are hidden. Scientists generally lack the time and technical expertise to review all relevant information available from the literature, from key model species and from a potentially wide range of related biological databases in a variety of data formats with variable quality and coverage. Computational tools are needed for the integration and evaluation of heterogeneous information in order to prioritise candidate genes and components of interaction networks that, if perturbed through potential interventions, have a positive impact on the biological outcome in the whole organism without producing negative side effects. Here we review several bioinformatics tools and databases that play an important role in biological knowledge discovery and candidate gene prioritization. We conclude with several key challenges that need to be addressed in order to facilitate biological knowledge discovery in the future.
Evaluating the quality of Marfan genotype-phenotype correlations in existing FBN1 databases.
Groth, Kristian A; Von Kodolitsch, Yskert; Kutsche, Kerstin; Gaustadnes, Mette; Thorsen, Kasper; Andersen, Niels H; Gravholt, Claus H
2017-07-01
Genetic FBN1 testing is pivotal for confirming the clinical diagnosis of Marfan syndrome. In an effort to evaluate variant causality, FBN1 databases are often used. We evaluated the current databases regarding FBN1 variants and validated associated phenotype records with a new Marfan syndrome geno-phenotyping tool called the Marfan score. We evaluated four databases (UMD-FBN1, ClinVar, the Human Gene Mutation Database (HGMD), and Uniprot) containing 2,250 FBN1 variants supported by 4,904 records presented in 307 references. The Marfan score calculated for phenotype data from the records quantified variant associations with Marfan syndrome phenotype. We calculated a Marfan score for 1,283 variants, of which we confirmed the database diagnosis of Marfan syndrome in 77.1%. This represented only 35.8% of the total registered variants; 18.5-33.3% (UMD-FBN1 versus HGMD) of variants associated with Marfan syndrome in the databases could not be confirmed by the recorded phenotype. FBN1 databases can be imprecise and incomplete. Data should be used with caution when evaluating FBN1 variants. At present, the UMD-FBN1 database seems to be the biggest and best curated; therefore, it is the most comprehensive database. However, the need for better genotype-phenotype curated databases is evident, and we hereby present such a database. Genet Med advance online publication 01 December 2016.
Knowlton, Michelle N; Li, Tongbin; Ren, Yongliang; Bill, Brent R; Ellis, Lynda Bm; Ekker, Stephen C
2008-01-07
The zebrafish is a powerful model vertebrate amenable to high-throughput in vivo genetic analyses. Examples include reverse genetic screens using morpholino knockdown, expression-based screening using enhancer trapping and forward genetic screening using transposon insertional mutagenesis. We have created a database to facilitate web-based distribution of data from such genetic studies. The MOrpholino DataBase is a MySQL relational database with an online, PHP interface. Multiple quality control levels allow differential access to data in raw and finished formats. MODBv1 includes sequence information relating to almost 800 morpholinos and their targets and phenotypic data regarding the dose effect of each morpholino (mortality, toxicity and defects). To improve the searchability of this database, we have incorporated a fixed-vocabulary defect ontology that allows for the organization of morpholino effects based on anatomical structure affected and defect produced. This also allows comparison between species utilizing Phenotypic Attribute Trait Ontology (PATO) designated terminology. MODB is also cross-linked with ZFIN, allowing full searches between the two databases. MODB offers users the ability to retrieve morpholino data by sequence of morpholino or target, name of target, anatomical structure affected and defect produced. MODB data can be used for functional genomic analysis of morpholino design to maximize efficacy and minimize toxicity. MODB also serves as a template for future sequence-based functional genetic screen databases, and it is currently being used as a model for the creation of a mutagenic insertional transposon database.
Fish Karyome: A karyological information network database of Indian Fishes.
Nagpure, Naresh Sahebrao; Pathak, Ajey Kumar; Pati, Rameshwar; Singh, Shri Prakash; Singh, Mahender; Sarkar, Uttam Kumar; Kushwaha, Basdeo; Kumar, Ravindra
2012-01-01
'Fish Karyome', a database of karyological information on Indian fishes, has been developed to serve as a central source for karyotype data about Indian fishes compiled from the published literature. Fish Karyome is intended to serve as a liaison tool for researchers; it contains karyological information about 171 of the 2438 finfish species reported in India and is publicly available via the World Wide Web. The database provides information on chromosome number, morphology, sex chromosomes, karyotype formula, cytogenetic markers, etc. It also provides phenotypic information that includes species name, classification, locality of sample collection, common name, local name, sex, geographical distribution, and IUCN Red List status. In addition to fish and karyotype images, references for the 171 finfish species have been included in the database. Fish Karyome has been developed using SQL Server 2008, a relational database management system, Microsoft's ASP.NET-2008 and Macromedia's FLASH Technology under the Windows 7 operating environment. The system also enables users to input new information and images into the database, and to search and view the information and images of interest using various search options. Fish Karyome has a wide range of applications in species characterization and identification, sex determination, chromosomal mapping, karyo-evolution and systematics of fishes.
MIPS: curated databases and comprehensive secondary data resources in 2010
Mewes, H. Werner; Ruepp, Andreas; Theis, Fabian; Rattei, Thomas; Walter, Mathias; Frishman, Dmitrij; Suhre, Karsten; Spannagl, Manuel; Mayer, Klaus F.X.; Stümpflen, Volker; Antonov, Alexey
2011-01-01
The Munich Information Center for Protein Sequences (MIPS at the Helmholtz Center for Environmental Health, Neuherberg, Germany) has many years of experience in providing annotated collections of biological data. Selected data sets of high relevance, such as model genomes, are subjected to careful manual curation, while the bulk of high-throughput data is annotated by automatic means. High-quality reference resources developed in the past and still actively maintained include Saccharomyces cerevisiae, Neurospora crassa and Arabidopsis thaliana genome databases as well as several protein interaction data sets (MPACT, MPPI and CORUM). More recent projects are PhenomiR, the database on microRNA-related phenotypes, and MIPS PlantsDB for integrative and comparative plant genome research. The interlinked resources SIMAP and PEDANT provide homology relationships as well as up-to-date and consistent annotation for 38 000 000 protein sequences. PPLIPS and CCancer are versatile tools for proteomics and functional genomics interfacing to a database of compilations from gene lists extracted from literature. A novel literature-mining tool, EXCERBT, gives access to structured information on classified relations between genes, proteins, phenotypes and diseases extracted from Medline abstracts by semantic analysis. All databases described here, as well as the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.helmholtz-muenchen.de). PMID:21109531
Toward a mtDNA locus-specific mutation database using the LOVD platform.
Elson, Joanna L; Sweeney, Mary G; Procaccio, Vincent; Yarham, John W; Salas, Antonio; Kong, Qing-Peng; van der Westhuizen, Francois H; Pitceathly, Robert D S; Thorburn, David R; Lott, Marie T; Wallace, Douglas C; Taylor, Robert W; McFarland, Robert
2012-09-01
The Human Variome Project (HVP) is a global effort to collect and curate all human genetic variation affecting health. Mutations of mitochondrial DNA (mtDNA) are an important cause of neurogenetic disease in humans; however, identification of the pathogenic mutations responsible can be problematic. In this article, we provide explanations as to why and suggest how such difficulties might be overcome. We put forward a case in support of a new Locus Specific Mutation Database (LSDB) implemented using the Leiden Open-source Variation Database (LOVD) system that will not only list primary mutations, but also present the evidence supporting their role in disease. Critically, we feel that this new database should have the capacity to store information on the observed phenotypes alongside the genetic variation, thereby facilitating our understanding of the complex and variable presentation of mtDNA disease. LOVD supports fast queries of both seen and hidden data and allows storage of sequence variants from high-throughput sequence analysis. The LOVD platform will allow construction of a secure mtDNA database; one that can fully utilize currently available data, as well as that being generated by high-throughput sequencing, to link genotype with phenotype enhancing our understanding of mitochondrial disease, with a view to providing better prognostic information. © 2012 Wiley Periodicals, Inc. PMID:22581690
Ilic, Katica; Kellogg, Elizabeth A.; Jaiswal, Pankaj; Zapata, Felipe; Stevens, Peter F.; Vincent, Leszek P.; Avraham, Shulamit; Reiser, Leonore; Pujar, Anuradha; Sachs, Martin M.; Whitman, Noah T.; McCouch, Susan R.; Schaeffer, Mary L.; Ware, Doreen H.; Stein, Lincoln D.; Rhee, Seung Y.
2007-01-01
Formal description of plant phenotypes and standardized annotation of gene expression and protein localization data require uniform terminology that accurately describes plant anatomy and morphology. This facilitates cross species comparative studies and quantitative comparison of phenotypes and expression patterns. A major drawback is variable terminology that is used to describe plant anatomy and morphology in publications and genomic databases for different species. The same terms are sometimes applied to different plant structures in different taxonomic groups. Conversely, similar structures are named by their species-specific terms. To address this problem, we created the Plant Structure Ontology (PSO), the first generic ontological representation of anatomy and morphology of a flowering plant. The PSO is intended for a broad plant research community, including bench scientists, curators in genomic databases, and bioinformaticians. The initial releases of the PSO integrated existing ontologies for Arabidopsis (Arabidopsis thaliana), maize (Zea mays), and rice (Oryza sativa); more recent versions of the ontology encompass terms relevant to Fabaceae, Solanaceae, additional cereal crops, and poplar (Populus spp.). Databases such as The Arabidopsis Information Resource, Nottingham Arabidopsis Stock Centre, Gramene, MaizeGDB, and SOL Genomics Network are using the PSO to describe expression patterns of genes and phenotypes of mutants and natural variants and are regularly contributing new annotations to the Plant Ontology database. The PSO is also used in specialized public databases, such as BRENDA, GENEVESTIGATOR, NASCArrays, and others. Over 10,000 gene annotations and phenotype descriptions from participating databases can be queried and retrieved using the Plant Ontology browser. The PSO, as well as contributed gene associations, can be obtained at www.plantontology.org. PMID:17142475
PhenoTips: patient phenotyping software for clinical and research use.
Girdea, Marta; Dumitriu, Sergiu; Fiume, Marc; Bowdin, Sarah; Boycott, Kym M; Chénier, Sébastien; Chitayat, David; Faghfoury, Hanna; Meyn, M Stephen; Ray, Peter N; So, Joyce; Stavropoulos, Dimitri J; Brudno, Michael
2013-08-01
We have developed PhenoTips: open source software for collecting and analyzing phenotypic information for patients with genetic disorders. Our software combines an easy-to-use interface, compatible with any device that runs a Web browser, with a standardized database back end. The PhenoTips' user interface closely mirrors clinician workflows so as to facilitate the recording of observations made during the patient encounter. Collected data include demographics, medical history, family history, physical and laboratory measurements, physical findings, and additional notes. Phenotypic information is represented using the Human Phenotype Ontology; however, the complexity of the ontology is hidden behind a user interface, which combines simple selection of common phenotypes with error-tolerant, predictive search of the entire ontology. PhenoTips supports accurate diagnosis by analyzing the entered data, then suggesting additional clinical investigations and providing Online Mendelian Inheritance in Man (OMIM) links to likely disorders. By collecting, classifying, and analyzing phenotypic information during the patient encounter, PhenoTips allows for streamlining of clinic workflow, efficient data entry, improved diagnosis, standardization of collected patient phenotypes, and sharing of anonymized patient phenotype data for the study of rare disorders. Our source code and a demo version of PhenoTips are available at http://phenotips.org. © 2013 WILEY PERIODICALS, INC.
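The error-tolerant, predictive search described above can be illustrated with a minimal sketch. This is not the PhenoTips implementation: the vocabulary, the placeholder identifiers (`EX:0001` and so on, which are not real Human Phenotype Ontology identifiers), the `suggest` function, and the similarity cutoff are all assumptions made for illustration, using only Python's standard `difflib` module.

```python
import difflib

# Hypothetical mini-vocabulary with placeholder identifiers (not real HPO IDs).
TERMS = {
    "EX:0001": "Seizure",
    "EX:0002": "Microcephaly",
    "EX:0003": "Hearing impairment",
    "EX:0004": "Global developmental delay",
}


def suggest(query, terms=TERMS, limit=3, cutoff=0.6):
    """Return up to `limit` (id, label) pairs matching a possibly misspelled query."""
    by_label = {label.lower(): term_id for term_id, label in terms.items()}
    q = query.strip().lower()
    # Predictive part: labels that start with what the user has typed so far.
    prefix_hits = [label for label in by_label if label.startswith(q)]
    # Error-tolerant part: labels that are similar to the query despite typos.
    fuzzy_hits = difflib.get_close_matches(q, list(by_label), n=limit, cutoff=cutoff)
    # Keep prefix hits first and drop duplicates while preserving order.
    ordered = list(dict.fromkeys(prefix_hits + fuzzy_hits))
    return [(by_label[label], terms[by_label[label]]) for label in ordered[:limit]]


print(suggest("seiz"))         # partial input
print(suggest("microcefaly"))  # misspelled input
```

A production system would search the full ontology, including synonyms and parent terms; the sketch only shows how prefix and fuzzy matching can be combined.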
StreptomycesInforSys: A web-enabled information repository
Jain, Chakresh Kumar; Gupta, Vidhi; Gupta, Ashvarya; Gupta, Sanjay; Wadhwa, Gulshan; Sharma, Sanjeev Kumar; Sarethy, Indira P
2012-01-01
Members of Streptomyces produce 70% of natural bioactive products. A considerable amount of information based on the polyphasic approach to the classification of Streptomyces is available. This information, based on phenotypic, genotypic and bioactive component production profiles, is crucial for pharmacological screening programmes. However, it is scattered across various journals, books and other resources, many of which are not freely accessible. The designed database incorporates polyphasic typing information and uses combinations of search options to aid in efficient screening of new isolates. This will help in the preliminary categorization of appropriate groups. It is a free relational database compatible with existing operating systems. A cross-platform technology, the XAMPP Web server, has been used to develop and manage the database and to handle user queries effectively. Employment of PHP, a platform-independent scripting language embedded in HTML, and the database management software MySQL will facilitate dynamic information storage and retrieval. The user-friendly, open and flexible freeware (PHP, MySQL and Apache) is expected to reduce running and maintenance costs. Availability: www.sis.biowaves.org PMID:23275736
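Searching a relational database by combinations of options, as described above, might look like the following sketch. The abstract names PHP and MySQL; Python's built-in `sqlite3` is used here purely as a stand-in so that the example is self-contained. The `isolates` table, its columns, and the `find_isolates` helper are hypothetical and are not taken from the actual database schema.

```python
import sqlite3

# Hypothetical columns that a user may combine in a search.
SEARCHABLE = ("spore_colour", "melanin", "antibacterial")


def build_demo_db():
    """Create an in-memory table with three made-up isolates."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE isolates "
        "(name TEXT, spore_colour TEXT, melanin INTEGER, antibacterial INTEGER)"
    )
    conn.executemany(
        "INSERT INTO isolates VALUES (?, ?, ?, ?)",
        [
            ("isolate-A", "grey", 1, 1),
            ("isolate-B", "grey", 0, 1),
            ("isolate-C", "white", 1, 0),
        ],
    )
    return conn


def find_isolates(conn, **criteria):
    """Return isolate names matching every supplied criterion (logical AND)."""
    clauses, params = [], []
    for column, value in criteria.items():
        # Column names cannot be bound as parameters, so check them against a whitelist.
        if column not in SEARCHABLE:
            raise ValueError(f"unknown search option: {column}")
        clauses.append(f"{column} = ?")
        params.append(value)
    where = " AND ".join(clauses) if clauses else "1"
    sql = f"SELECT name FROM isolates WHERE {where} ORDER BY name"
    return [row[0] for row in conn.execute(sql, params)]


conn = build_demo_db()
print(find_isolates(conn, spore_colour="grey"))             # ['isolate-A', 'isolate-B']
print(find_isolates(conn, spore_colour="grey", melanin=1))  # ['isolate-A']
```

Each added criterion narrows the candidate set, which is the sense in which combined search options support preliminary categorization of a new isolate.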
The Resistome: A Comprehensive Database of Escherichia coli Resistance Phenotypes.
Winkler, James D; Halweg-Edwards, Andrea L; Erickson, Keesha E; Choudhury, Alaksh; Pines, Gur; Gill, Ryan T
2016-12-16
The microbial ability to resist stressful environmental conditions and chemical inhibitors is of great industrial and medical interest. Much of the data related to mutation-based stress resistance, however, is scattered through the academic literature, making it difficult to apply systematic analyses to this wealth of information. To address this issue, we introduce the Resistome database: a literature-curated collection of Escherichia coli genotypes-phenotypes containing over 5,000 mutants that resist hundreds of compounds and environmental conditions. We use the Resistome to understand our current state of knowledge regarding resistance and to detect potential synergy or antagonism between resistance phenotypes. Our data set represents one of the most comprehensive collections of genomic data related to resistance currently available. Future development will focus on the construction of a combined genomic-transcriptomic-proteomic framework for understanding E. coli's resistance biology. The Resistome can be downloaded at https://bitbucket.org/jdwinkler/resistome_release/overview .
Kim, Mara; Cooper, Brian A.; Venkat, Rohit; Phillips, Julie B.; Eidem, Haley R.; Hirbo, Jibril; Nutakki, Sashank; Williams, Scott M.; Muglia, Louis J.; Capra, J. Anthony; Petren, Kenneth; Abbot, Patrick; Rokas, Antonis; McGary, Kriston L.
2016-01-01
Mammalian gestation and pregnancy are fast evolving processes that involve the interaction of the fetal, maternal and paternal genomes. Version 1.0 of the GEneSTATION database (http://genestation.org) integrates diverse types of omics data across mammals to advance understanding of the genetic basis of gestation and pregnancy-associated phenotypes and to accelerate the translation of discoveries from model organisms to humans. GEneSTATION is built using tools from the Generic Model Organism Database project, including the biology-aware database CHADO, new tools for rapid data integration, and algorithms that streamline synthesis and user access. GEneSTATION contains curated life history information on pregnancy and reproduction from 23 high-quality mammalian genomes. For every human gene, GEneSTATION contains diverse evolutionary (e.g. gene age, population genetic and molecular evolutionary statistics), organismal (e.g. tissue-specific gene and protein expression, differential gene expression, disease phenotype), and molecular data types (e.g. Gene Ontology Annotation, protein interactions), as well as links to many general (e.g. Entrez, PubMed) and pregnancy disease-specific (e.g. PTBgene, dbPTB) databases. By facilitating the synthesis of diverse functional and evolutionary data in pregnancy-associated tissues and phenotypes and enabling their quick, intuitive, accurate and customized meta-analysis, GEneSTATION provides a novel platform for comprehensive investigation of the function and evolution of mammalian pregnancy. PMID:26567549
Temporal abstraction-based clinical phenotyping with Eureka!
Post, Andrew R; Kurc, Tahsin; Willard, Richie; Rathod, Himanshu; Mansour, Michel; Pai, Akshatha Kalsanka; Torian, William M; Agravat, Sanjay; Sturm, Suzanne; Saltz, Joel H
2013-01-01
Temporal abstraction, a method for specifying and detecting temporal patterns in clinical databases, is very expressive and performs well, but it is difficult for clinical investigators and data analysts to understand. Such patterns are critical in phenotyping patients using their medical records in research and quality improvement. We have previously developed the Analytic Information Warehouse (AIW), which computes such phenotypes using temporal abstraction but requires software engineers to use. We have extended the AIW's web user interface, Eureka! Clinical Analytics, to support specifying phenotypes using an alternative model that we developed with clinical stakeholders. The software converts phenotypes from this model to that of temporal abstraction prior to data processing. The model can represent all phenotypes in a quality improvement project and a growing set of phenotypes in a multi-site research study. Phenotyping that is accessible to investigators and IT personnel may enable its broader adoption.
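As a rough illustration of what temporal abstraction means, the sketch below turns timestamped measurements into state intervals and then tests for a sustained pattern. It is not the AIW or Eureka! model: the `Interval` class, the two-state classification, the threshold, and the gap and duration parameters are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Interval:
    state: str
    start: datetime
    end: datetime


def abstract_states(observations, threshold, max_gap):
    """Classify each (time, value) pair and merge adjacent same-state points into intervals."""
    intervals = []
    for when, value in sorted(observations):
        state = "HIGH" if value >= threshold else "NORMAL"
        last = intervals[-1] if intervals else None
        if last and last.state == state and when - last.end <= max_gap:
            last.end = when  # extend the current interval
        else:
            intervals.append(Interval(state, when, when))
    return intervals


def has_sustained_high(intervals, min_duration):
    """Pattern: at least one HIGH interval lasting min_duration or longer."""
    return any(
        i.state == "HIGH" and i.end - i.start >= min_duration for i in intervals
    )


# Made-up measurements for a single hypothetical patient.
obs = [
    (datetime(2013, 1, 1), 210),
    (datetime(2013, 1, 2), 230),
    (datetime(2013, 1, 3), 250),
    (datetime(2013, 1, 4), 120),
]
intervals = abstract_states(obs, threshold=200, max_gap=timedelta(days=2))
print(has_sustained_high(intervals, min_duration=timedelta(days=2)))  # True
```

A phenotype is then a named pattern over such intervals, so an investigator reasons about "sustained high values" rather than about individual rows in the clinical database.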
The androgen receptor gene mutations database.
Patterson, M N; Hughes, I A; Gottlieb, B; Pinsky, L
1994-09-01
The androgen receptor gene mutations database is a comprehensive listing of mutations published in journals and meeting proceedings. The majority of mutations are point mutations identified in patients with androgen insensitivity syndrome. Information is included regarding the phenotype, the nature and location of the mutations, as well as the effects of the mutations on the androgen binding activity of the receptor. The current version of the database contains 149 entries, of which 114 are unique mutations. The database is available from EMBL (NetServ@EMBL-Heidelberg.DE) or as a Macintosh Filemaker file (mc33001@musica.mcgill.ca).
Pantazatos, Spiro P.; Li, Jianrong; Pavlidis, Paul; Lussier, Yves A.
2009-01-01
An approach towards heterogeneous neuroscience dataset integration is proposed that uses Natural Language Processing (NLP) and a knowledge-based phenotype organizer system (PhenOS) to link ontology-anchored terms to underlying data from each database, and then maps these terms based on a computable model of disease (SNOMED CT®). The approach was implemented using sample datasets from fMRIDC, GEO, The Whole Brain Atlas and Neuronames, and allowed for complex queries such as “List all disorders with a finding site of brain region X, and then find the semantically related references in all participating databases based on the ontological model of the disease or its anatomical and morphological attributes”. Precision of the NLP-derived coding of the unstructured phenotypes in each dataset was 88% (n = 50), and precision of the semantic mapping between these terms across datasets was 98% (n = 100). To our knowledge, this is the first example of the use of both semantic decomposition of disease relationships and hierarchical information found in ontologies to integrate heterogeneous phenotypes across clinical and molecular datasets. PMID:20495688
Manual Gene Ontology annotation workflow at the Mouse Genome Informatics Database
Drabkin, Harold J.; Blake, Judith A.
2012-01-01
The Mouse Genome Database, the Gene Expression Database and the Mouse Tumor Biology database are integrated components of the Mouse Genome Informatics (MGI) resource (http://www.informatics.jax.org). The MGI system presents both a consensus view and an experimental view of the knowledge concerning the genetics and genomics of the laboratory mouse. From genotype to phenotype, this information resource integrates information about genes, sequences, maps, expression analyses, alleles, strains and mutant phenotypes. Comparative mammalian data are also presented, particularly with regard to the use of the mouse as a model for the investigation of molecular and genetic components of human diseases. These data are collected from literature curation as well as downloads of large datasets (SwissProt, LocusLink, etc.). MGI is one of the founding members of the Gene Ontology (GO) and uses the GO for functional annotation of genes. Here, we discuss the workflow associated with manual GO annotation at MGI, from literature collection to display of the annotations. Peer-reviewed literature is collected mostly from a set of journals available electronically. Selected articles are entered into a master bibliography and indexed to one of eight areas of interest such as ‘GO’ or ‘homology’ or ‘phenotype’. Each article is then either indexed to a gene already contained in the database or funneled through a separate nomenclature database to add genes. The master bibliography and associated indexing provide information for various curator-reports such as ‘papers selected for GO that refer to genes with NO GO annotation’. Once articles are indexed, curators with expertise in the appropriate disciplines enter pertinent information. MGI makes use of several controlled vocabularies that ensure uniform data encoding, enable robust analysis and support the construction of complex queries. These vocabularies range from pick-lists to structured vocabularies such as the GO. All data associations are supported with statements of evidence as well as access to source publications. PMID:23110975
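The curator report quoted above, ‘papers selected for GO that refer to genes with NO GO annotation’, amounts to a filter over the bibliography index. The sketch below shows one way such a report could be computed. The data layout, the paper and gene identifiers, and the function name are hypothetical and do not reflect MGI's actual schema or tooling.

```python
def papers_for_unannotated_genes(paper_index, go_annotations):
    """Map each GO-selected paper to the genes it mentions that lack any GO annotation.

    paper_index:    paper id -> {"areas": set of interest areas, "genes": set of gene ids}
    go_annotations: gene id  -> list of GO term ids (missing or empty means unannotated)
    """
    report = {}
    for paper_id, entry in paper_index.items():
        if "GO" not in entry["areas"]:
            continue  # paper was not selected for GO curation
        missing = sorted(g for g in entry["genes"] if not go_annotations.get(g))
        if missing:
            report[paper_id] = missing
    return report


# Made-up bibliography index and annotation table.
paper_index = {
    "paper-1": {"areas": {"GO", "phenotype"}, "genes": {"GeneA", "GeneB"}},
    "paper-2": {"areas": {"homology"}, "genes": {"GeneC"}},
    "paper-3": {"areas": {"GO"}, "genes": {"GeneA"}},
}
go_annotations = {"GeneA": ["GO:0008150"], "GeneB": []}

print(papers_for_unannotated_genes(paper_index, go_annotations))
# {'paper-1': ['GeneB']}
```

Reports of this kind let curators prioritize literature that would give a gene its first functional annotation.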
LAILAPS: the plant science search engine.
Esch, Maria; Chen, Jinbo; Colmsee, Christian; Klapperstück, Matthias; Grafahrend-Belau, Eva; Scholz, Uwe; Lange, Matthias
2015-01-01
With the number of sequenced plant genomes growing, the number of predicted genes and functional annotations is also increasing. The association between genes and phenotypic traits is currently of great interest. Unfortunately, the information available today is widely scattered over a number of different databases. Information retrieval (IR) has become an all-encompassing bioinformatics methodology for extracting knowledge from complex, heterogeneous and distributed databases, and therefore can be a useful tool for obtaining a comprehensive view of plant genomics, from genes to traits. Here we describe LAILAPS (http://lailaps.ipk-gatersleben.de), an IR system designed to link plant genomic data in the context of phenotypic attributes for a detailed forward genetic research. LAILAPS comprises around 65 million indexed documents, encompassing >13 major life science databases with around 80 million links to plant genomic resources. The LAILAPS search engine allows fuzzy querying for candidate genes linked to specific traits over a loosely integrated system of indexed and interlinked genome databases. Query assistance and an evidence-based annotation system enable time-efficient and comprehensive information retrieval. An artificial neural network incorporating user feedback and behavior tracking allows relevance sorting of results. We fully describe LAILAPS's functionality and capabilities by comparing this system's performance with other widely used systems and by reporting both a validation in maize and a knowledge discovery use-case focusing on candidate genes in barley. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists.
Rabal, Obdulia; Link, Wolfgang; Serelde, Beatriz G; Bischoff, James R; Oyarzabal, Julen
2010-04-01
Here we report the development and validation of a complete solution to manage and analyze the data produced by image-based phenotypic screening campaigns of small-molecule libraries. In one step, initial crude images are analyzed for multiple cytological features, statistical analysis is performed and molecules that produce the desired phenotypic profile are identified. A naïve Bayes classifier, integrating chemical and phenotypic spaces, is built and utilized during the process to assess those images initially classified as "fuzzy", which provides automated iterative feedback tuning. Simultaneously, all this information is directly annotated in a relational database containing the chemical data. This novel fully automated method was validated by conducting a re-analysis of results from a high-content screening campaign involving 33 992 molecules used to identify inhibitors of the PI3K/Akt signaling pathway. Ninety-two percent of confirmed hits identified by the conventional multistep analysis method were identified using this integrated one-step system, as well as 40 new hits (14.9% of the total) that were originally false negatives. Ninety-six percent of true negatives were also properly recognized. Web-based access to the database, with customizable data retrieval and visualization tools, facilitates the subsequent analysis of annotated cytological features, which allows identification of additional phenotypic profiles; thus, further analysis of original crude images is not required.
The androgen receptor gene mutations database.
Gottlieb, B; Trifiro, M; Lumbroso, R; Pinsky, L
1997-01-01
The current version of the androgen receptor (AR) gene mutations database is described. The total number of reported mutations has risen from 212 to 272. We have expanded the database: (i) by adding a large amount of new data on somatic mutations in prostatic cancer tissue; (ii) by defining a new constitutional phenotype, mild androgen insensitivity (MAI); (iii) by placing additional relevant information on an internet site (http://www.mcgill.ca/androgendb/ ). The database has allowed us to examine the contribution of CpG sites to the multiplicity of reports of the same mutation in different families. The database is also available from EMBL (ftp.ebi.ac.uk/pub/databases/androgen) or as a Macintosh Filemaker Pro or Word file (MC33@musica.mcgill.ca). PMID:9016528
TOMATOMA Update: Phenotypic and Metabolite Information in the Micro-Tom Mutant Resource.
Shikata, Masahito; Hoshikawa, Ken; Ariizumi, Tohru; Fukuda, Naoya; Yamazaki, Yukiko; Ezura, Hiroshi
2016-01-01
TOMATOMA (http://tomatoma.nbrp.jp/) is a tomato mutant database providing visible phenotypic data of tomato mutant lines generated by ethyl methanesulfonate (EMS) treatment or γ-ray irradiation in the genetic background of Micro-Tom, a small and rapidly growing variety. To increase mutation efficiency further, mutagenized M3 seeds were subjected to a second round of EMS treatment, generating M3M1 populations. These plants were self-pollinated, and 4,952 lines of M3M2 mutagenized seeds were generated. We checked for visible phenotypes in the M3M2 plants, and 618 mutant lines with 1,194 phenotypic categories were identified. In addition to the phenotypic information, we investigated Brix values and carotenoid contents in the fruits of individual mutants. Across 466 samples from 171 mutant lines, Brix values ranged from 3.2% to 11.6% and carotenoid contents from 6.9 to 37.3 µg g⁻¹ FW. This metabolite information concerning the mutant fruits would be useful in breeding programs as well as for the elucidation of metabolic regulation. Researchers are able to browse and search this phenotypic and metabolite information and order seeds of individual mutants via TOMATOMA. Our new Micro-Tom double-mutagenized populations and the metabolic information could provide a valuable genetic toolkit to accelerate tomato research and potential breeding programs. © The Author 2015. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists.
CTGA: the database for genetic disorders in Arab populations.
Tadmouri, Ghazi O; Al Ali, Mahmoud Taleb; Al-Haj Ali, Sarah; Al Khaja, Najib
2006-01-01
The Arabs comprise a genetically heterogeneous group that resulted from the admixture of different populations throughout history. They share many common characteristics responsible for a considerable proportion of perinatal and neonatal mortalities. To this end, the Centre for Arab Genomic Studies (CAGS) launched a pilot project to construct the 'Catalogue of Transmission Genetics in Arabs' (CTGA) database for genetic disorders in Arabs. Information in CTGA is drawn from published research and mined hospital records. The database offers web-based basic and advanced search approaches. In either case, the final search result is a detailed HTML record that includes text-, URL- and graphic-based fields. At present, CTGA hosts entries for 692 phenotypes and 235 related genes described in Arab individuals. Of these, 213 phenotypic descriptions and 22 related genes were observed in the Arab population of the United Arab Emirates (UAE). These results emphasize the role of CTGA as an essential tool to promote scientific research on genetic disorders in the region. The priority of CTGA is to provide timely information on the occurrence of genetic disorders in Arab individuals. It is anticipated that data from Arab countries other than the UAE will be exhaustively searched and incorporated in CTGA (http://www.cags.org.ae). PMID:16381941
Chang, Yi-Chien; Hu, Zhenjun; Rachlin, John; Anton, Brian P; Kasif, Simon; Roberts, Richard J; Steffen, Martin
2016-01-04
The COMBREX database (COMBREX-DB; combrex.bu.edu) is an online repository of information related to (i) experimentally determined protein function, (ii) predicted protein function, (iii) relationships among proteins of unknown function and various types of experimental data, including molecular function, protein structure, and associated phenotypes. The database was created as part of the novel COMBREX (COMputational BRidges to EXperiments) effort aimed at accelerating the rate of gene function validation. It currently holds information on ∼ 3.3 million known and predicted proteins from over 1000 completely sequenced bacterial and archaeal genomes. The database also contains a prototype recommendation system for helping users identify those proteins whose experimental determination of function would be most informative for predicting function for other proteins within protein families. The emphasis on documenting experimental evidence for function predictions, and the prioritization of uncharacterized proteins for experimental testing distinguish COMBREX from other publicly available microbial genomics resources. This article describes updates to COMBREX-DB since an initial description in the 2011 NAR Database Issue. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
An annotated database of Arabidopsis mutants of acyl lipid metabolism
McGlew, Kathleen; Shaw, Vincent; Zhang, Meng; ...
2014-12-10
Mutants have played a fundamental role in gene discovery and in understanding the function of genes involved in plant acyl lipid metabolism. The first mutant in Arabidopsis lipid metabolism (fad4) was described in 1985. Since that time, characterization of mutants in more than 280 genes associated with acyl lipid metabolism has been reported. This review provides a brief background and history on identification of mutants in acyl lipid metabolism, an analysis of the distribution of mutants in different areas of acyl lipid metabolism and presents an annotated database (ARALIPmutantDB) of these mutants. The database provides information on the phenotypes of mutants, pathways and enzymes/proteins associated with the mutants, and allows rapid access via hyperlinks to summaries of information about each mutant and to literature that provides information on the lipid composition of the mutants. Mutants for at least 30 % of the genes in the database have multiple names, which have been compiled here to reduce ambiguities in searches for information. Furthermore, the database should also provide a tool for exploring the relationships between mutants in acyl lipid-related genes and their lipid phenotypes and point to opportunities for further research.
e-GRASP: an integrated evolutionary and GRASP resource for exploring disease associations.
Karim, Sajjad; NourEldin, Hend Fakhri; Abusamra, Heba; Salem, Nada; Alhathli, Elham; Dudley, Joel; Sanderford, Max; Scheinfeldt, Laura B; Chaudhary, Adeel G; Al-Qahtani, Mohammed H; Kumar, Sudhir
2016-10-17
Genome-wide association studies (GWAS) have become a mainstay of biological research concerned with discovering genetic variation linked to phenotypic traits and diseases. Both discrete and continuous traits can be analyzed in GWAS to discover associations between single nucleotide polymorphisms (SNPs) and traits of interest. Associations are typically determined by estimating the significance of the statistical relationship between genetic loci and the given trait. However, the prioritization of bona fide, reproducible genetic associations from GWAS results remains a central challenge in identifying genomic loci underlying common complex diseases. Evolutionary-aware meta-analysis of the growing GWAS literature is one way to address this challenge and to advance from association to causation in the discovery of genotype-phenotype relationships. We have created an evolutionary GWAS resource to enable in-depth query and exploration of published GWAS results. This resource uses the publicly available GWAS results annotated in the GRASP2 database. The GRASP2 database includes results from 2082 studies, 177 broad phenotype categories, and ~8.87 million SNP-phenotype associations. For each SNP in e-GRASP, we present information from the GRASP2 database for convenience as well as evolutionary information (e.g., rate and timespan). Users can, therefore, identify not only SNPs with highly significant phenotype-association P-values, but also SNPs that are highly replicated and/or occur at evolutionarily conserved sites that are likely to be functionally important. Additionally, we provide an evolutionary-adjusted SNP association ranking (E-rank) that uses cross-species evolutionary conservation scores and population allele frequencies to transform P-values in an effort to enhance the discovery of SNPs with a greater probability of biologically meaningful disease associations.
By adding an evolutionary dimension to the GWAS results available in the GRASP2 database, our e-GRASP resource will enable a more effective exploration of SNPs not only by the statistical significance of trait associations, but also by the number of studies in which associations have been replicated, and the evolutionary context of the associated mutations. Therefore, e-GRASP will be a valuable resource for aiding researchers in the identification of bona fide, reproducible genetic associations from GWAS results. This resource is freely available at http://www.mypeg.info/egrasp .
Topological Phenotypes Constitute a New Dimension in the Phenotypic Space of Leaf Venation Networks
Ronellenfitsch, Henrik; Lasser, Jana; Daly, Douglas C.; Katifori, Eleni
2015-01-01
The leaves of angiosperms contain highly complex venation networks consisting of recursively nested, hierarchically organized loops. We describe a new phenotypic trait of reticulate vascular networks based on the topology of the nested loops. This phenotypic trait encodes information orthogonal to widely used geometric phenotypic traits, and thus constitutes a new dimension in the leaf venation phenotypic space. We apply our metric to a database of 186 leaves and leaflets representing 137 species, predominantly from the Burseraceae family, revealing diverse topological network traits even within this single family. We show that topological information significantly improves identification of leaves from fragments by calculating a “leaf venation fingerprint” from topology and geometry. Further, we present a phenomenological model suggesting that the topological traits can be explained by noise effects unique to specimen during development of each leaf which leave their imprint on the final network. This work opens the path to new quantitative identification techniques for leaves which go beyond simple geometric traits such as vein density and is directly applicable to other planar or sub-planar networks such as blood vessels in the brain. PMID:26700471
Comparison of genomic-enhanced EPD systems using an external phenotypic database
USDA-ARS's Scientific Manuscript database
The American Angus Association (AAA) is currently evaluating two methods to incorporate genomic information into their genetic evaluation program: 1) multi-trait incorporation of an externally produced molecular breeding value as an indicator trait (MT) and 2) single-step evaluation with an unweight...
Bouwman, Jildau; Dragsted, Lars O.; Drevon, Christian A.; Elliott, Ruan; de Groot, Philip; Kaput, Jim; Mathers, John C.; Müller, Michael; Pepping, Fre; Saito, Jahn; Scalbert, Augustin; Radonjic, Marijana; Rocca-Serra, Philippe; Travis, Anthony; Wopereis, Suzan; Evelo, Chris T.
2010-01-01
The challenge of modern nutrition and health research is to identify food-based strategies promoting life-long optimal health and well-being. This research is complex because it exploits a multitude of bioactive compounds acting on an extensive network of interacting processes. Whereas nutrition research can profit enormously from the revolution in ‘omics’ technologies, it has discipline-specific requirements for analytical and bioinformatic procedures. In addition to measurements of the parameters of interest (measures of health), extensive description of the subjects of study and foods or diets consumed is central for describing the nutritional phenotype. We propose and pursue an infrastructural activity of constructing the “Nutritional Phenotype database” (dbNP). When fully developed, dbNP will be a research and collaboration tool and a publicly available data and knowledge repository. Creation and implementation of the dbNP will maximize benefits to the research community by enabling integration and interrogation of data from multiple studies, from different research groups, different countries and different -omics levels. The dbNP is designed to facilitate storage of biologically relevant, pre-processed -omics data, as well as study descriptive and study participant phenotype data. It is also important to enable the combination of this information at different levels (e.g. to facilitate linkage of data describing participant phenotype, genotype and food intake with information on study design and -omics measurements, and to combine all of this with existing knowledge). The biological information stored in the database (i.e. genetics, transcriptomics, proteomics, biomarkers, metabolomics, functional assays, food intake and food composition) is tailored to nutrition research and embedded in an environment of standard procedures and protocols, annotations, modular data-basing, networking and integrated bioinformatics.
The dbNP is an evolving enterprise, which is only sustainable if it is accepted and adopted by the wider nutrition and health research community as an open source, pre-competitive and publicly available resource where many partners both can contribute and profit from its developments. We introduce the Nutrigenomics Organisation (NuGO, http://www.nugo.org) as a membership association responsible for establishing and curating the dbNP. Within NuGO, all efforts related to dbNP (i.e. usage, coordination, integration, facilitation and maintenance) will be directed towards a sustainable and federated infrastructure. PMID:21052526
Davis, Allan Peter; Wiegers, Thomas C.; Roberts, Phoebe M.; King, Benjamin L.; Lay, Jean M.; Lennon-Hopkins, Kelley; Sciaky, Daniela; Johnson, Robin; Keating, Heather; Greene, Nigel; Hernandez, Robert; McConnell, Kevin J.; Enayetallah, Ahmed E.; Mattingly, Carolyn J.
2013-01-01
Improving the prediction of chemical toxicity is a goal common to both environmental health research and pharmaceutical drug development. To improve safety detection assays, it is critical to have a reference set of molecules with well-defined toxicity annotations for training and validation purposes. Here, we describe a collaboration between safety researchers at Pfizer and the research team at the Comparative Toxicogenomics Database (CTD) to text mine and manually review a collection of 88 629 articles relating over 1 200 pharmaceutical drugs to their potential involvement in cardiovascular, neurological, renal and hepatic toxicity. In 1 year, CTD biocurators curated 254 173 toxicogenomic interactions (152 173 chemical–disease, 58 572 chemical–gene, 5 345 gene–disease and 38 083 phenotype interactions). All chemical–gene–disease interactions are fully integrated with public CTD, and phenotype interactions can be downloaded. We describe Pfizer’s text-mining process to collate the articles, and CTD’s curation strategy, performance metrics, enhanced data content and new module to curate phenotype information. As well, we show how data integration can connect phenotypes to diseases. This curation can be leveraged for information about toxic endpoints important to drug safety and help develop testable hypotheses for drug–disease events. The availability of these detailed, contextualized, high-quality annotations curated from seven decades’ worth of the scientific literature should help facilitate new mechanistic screening assays for pharmaceutical compound survival. This unique partnership demonstrates the importance of resource sharing and collaboration between public and private entities and underscores the complementary needs of the environmental health science and pharmaceutical communities. Database URL: http://ctdbase.org/ PMID:24288140
CYP21A2 mutation update: Comprehensive analysis of databases and published genetic variants.
Simonetti, Leandro; Bruque, Carlos D; Fernández, Cecilia S; Benavides-Mori, Belén; Delea, Marisol; Kolomenski, Jorge E; Espeche, Lucía D; Buzzalino, Noemí D; Nadra, Alejandro D; Dain, Liliana
2018-01-01
Congenital adrenal hyperplasia (CAH) is a group of autosomal recessive disorders of adrenal steroidogenesis. Disorders in steroid 21-hydroxylation account for over 95% of patients with CAH. Clinically, 21-hydroxylase deficiency has been classified into a broad spectrum of clinical forms, ranging from severe or classical to mild late-onset or non-classical. Known allelic variants in the disease-causing CYP21A2 gene are spread among different sources. Until recently, most reported variants were identified in the clinical setting, which presumably biases the described variants toward pathogenic ones, such as those found in the CYPAlleles database. Nevertheless, a large number of variants are being described in massive genome projects, many of which are found in dbSNP but lack information on functional implications and/or phenotypic effect. In this work, we gathered a total of 1,340 genetic variants (GVs) in the CYP21A2 gene, of which 899 were unique and 230 have an effect on human health, and compiled all this information in an integrated database. We also connected CYP21A2 sequence information to phenotypic effects for all available mutations, including double mutants in cis. Data compiled in the present work could help physicians in the genetic counseling of families affected with 21-hydroxylase deficiency. © 2017 Wiley Periodicals, Inc.
SIDD: A Semantically Integrated Database towards a Global View of Human Disease
Cheng, Liang; Wang, Guohua; Li, Jie; Zhang, Tianjiao; Xu, Peigang; Wang, Yadong
2013-01-01
Background: A number of databases have been developed to collect disease-related molecular, phenotypic and environmental features (DR-MPEs), such as genes, non-coding RNAs, genetic variations, drugs, phenotypes and environmental factors. However, each of current databases focused on only one or two DR-MPEs. There is an urgent demand to develop an integrated database, which can establish semantic associations among disease-related databases and link them to provide a global view of human disease at the biological level. This database, once developed, will facilitate researchers to query various DR-MPEs through disease, and investigate disease mechanisms from different types of data. Methodology: To establish an integrated disease-associated database, disease vocabularies used in different databases are mapped to Disease Ontology (DO) through semantic match. 4,284 and 4,186 disease terms from Medical Subject Headings (MeSH) and Online Mendelian Inheritance in Man (OMIM) respectively are mapped to DO. Then, the relationships between DR-MPEs and diseases are extracted and merged from different source databases for reducing the data redundancy. Conclusions: A semantically integrated disease-associated database (SIDD) is developed, which integrates 18 disease-associated databases, for researchers to browse multiple types of DR-MPEs in a view. A web interface allows easy navigation for querying information through browsing a disease ontology tree or searching a disease term. Furthermore, a network visualization tool using Cytoscape Web plugin has been implemented in SIDD. It enhances the SIDD usage when viewing the relationships between diseases and DR-MPEs. The current version of SIDD (Jul 2013) documents 4,465,131 entries relating to 139,365 DR-MPEs, and to 3,824 human diseases. The database can be freely accessed from: http://mlg.hit.edu.cn/SIDD. PMID:24146757
Translational genomics for plant breeding with the genome sequence explosion.
Kang, Yang Jae; Lee, Taeyoung; Lee, Jayern; Shim, Sangrea; Jeong, Haneul; Satyawan, Dani; Kim, Moon Young; Lee, Suk-Ha
2016-04-01
The use of next-generation sequencers and advanced genotyping technologies has propelled the field of plant genomics in model crops and plants and enhanced the discovery of hidden bridges between genotypes and phenotypes. The newly generated reference sequences of unstudied minor plants can be annotated by the knowledge of model plants via translational genomics approaches. Here, we reviewed the strategies of translational genomics and suggested perspectives on the current databases of genomic resources and the database structures of translated information on the new genome. As a draft picture of phenotypic annotation, translational genomics on newly sequenced plants will provide valuable assistance for breeders and researchers who are interested in genetic studies. © 2015 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
Shen, Lishuang; Diroma, Maria Angela; Gonzalez, Michael; Navarro-Gomez, Daniel; Leipzig, Jeremy; Lott, Marie T; van Oven, Mannis; Wallace, Douglas C; Muraresku, Colleen Clarke; Zolkipli-Cunningham, Zarazuela; Chinnery, Patrick F; Attimonelli, Marcella; Zuchner, Stephan; Falk, Marni J; Gai, Xiaowu
2016-06-01
MSeqDR is the Mitochondrial Disease Sequence Data Resource, a centralized and comprehensive genome and phenome bioinformatics resource built by the mitochondrial disease community to facilitate clinical diagnosis and research investigations of individual patient phenotypes, genomes, genes, and variants. A central Web portal (https://mseqdr.org) integrates community knowledge from expert-curated databases with genomic and phenotype data shared by clinicians and researchers. MSeqDR also functions as a centralized application server for Web-based tools to analyze data across both mitochondrial and nuclear DNA, including investigator-driven whole exome or genome dataset analyses through MSeqDR-Genesis. MSeqDR-GBrowse genome browser supports interactive genomic data exploration and visualization with custom tracks relevant to mtDNA variation and mitochondrial disease. MSeqDR-LSDB is a locus-specific database that currently manages 178 mitochondrial diseases, 1,363 genes associated with mitochondrial biology or disease, and 3,711 pathogenic variants in those genes. MSeqDR Disease Portal allows hierarchical tree-style disease exploration to evaluate their unique descriptions, phenotypes, and causative variants. Automated genomic data submission tools are provided that capture ClinVar compliant variant annotations. PhenoTips will be used for phenotypic data submission on deidentified patients using human phenotype ontology terminology. The development of a dynamic informed patient consent process to guide data access is underway to realize the full potential of these resources. © 2016 WILEY PERIODICALS, INC.
Shen, Lishuang; Diroma, Maria Angela; Gonzalez, Michael; Navarro-Gomez, Daniel; Leipzig, Jeremy; Lott, Marie T.; van Oven, Mannis; Wallace, Douglas C.; Muraresku, Colleen Clarke; Zolkipli-Cunningham, Zarazuela; Chinnery, Patrick F.; Attimonelli, Marcella; Zuchner, Stephan
2016-01-01
MSeqDR is the Mitochondrial Disease Sequence Data Resource, a centralized and comprehensive genome and phenome bioinformatics resource built by the mitochondrial disease community to facilitate clinical diagnosis and research investigations of individual patient phenotypes, genomes, genes, and variants. A central Web portal (https://mseqdr.org) integrates community knowledge from expert-curated databases with genomic and phenotype data shared by clinicians and researchers. MSeqDR also functions as a centralized application server for Web-based tools to analyze data across both mitochondrial and nuclear DNA, including investigator-driven whole exome or genome dataset analyses through MSeqDR-Genesis. MSeqDR-GBrowse supports interactive genomic data exploration and visualization with custom tracks relevant to mtDNA variation and disease. MSeqDR-LSDB is a locus specific database that currently manages 178 mitochondrial diseases, 1,363 genes associated with mitochondrial biology or disease, and 3,711 pathogenic variants in those genes. MSeqDR Disease Portal allows hierarchical tree-style disease exploration to evaluate their unique descriptions, phenotypes, and causative variants. Automated genomic data submission tools are provided that capture ClinVar-compliant variant annotations. PhenoTips is used for phenotypic data submission on de-identified patients using human phenotype ontology terminology. Development of a dynamic informed patient consent process to guide data access is underway to realize the full potential of these resources. PMID:26919060
Addition of a breeding database in the Genome Database for Rosaceae
Evans, Kate; Jung, Sook; Lee, Taein; Brutcher, Lisa; Cho, Ilhyung; Peace, Cameron; Main, Dorrie
2013-01-01
Breeding programs produce large datasets that require efficient management systems to keep track of performance, pedigree, geographical and image-based data. With the development of DNA-based screening technologies, more breeding programs perform genotyping in addition to phenotyping for performance evaluation. The integration of breeding data with other genomic and genetic data is instrumental for the refinement of marker-assisted breeding tools, enhances genetic understanding of important crop traits and maximizes access and utility by crop breeders and allied scientists. Development of new infrastructure in the Genome Database for Rosaceae (GDR) was designed and implemented to enable secure and efficient storage, management and analysis of large datasets from the Washington State University apple breeding program and subsequently expanded to fit datasets from other Rosaceae breeders. The infrastructure was built using the software Chado and Drupal, making use of the Natural Diversity module to accommodate large-scale phenotypic and genotypic data. Breeders can search accessions within the GDR to identify individuals with specific trait combinations. Results from Search by Parentage lists individuals with parents in common and results from Individual Variety pages link to all data available on each chosen individual including pedigree, phenotypic and genotypic information. Genotypic data are searchable by markers and alleles; results are linked to other pages in the GDR to enable the user to access tools such as GBrowse and CMap. This breeding database provides users with the opportunity to search datasets in a fully targeted manner and retrieve and compare performance data from multiple selections, years and sites, and to output the data needed for variety release publications and patent applications. The breeding database facilitates efficient program management. 
Storing publicly available breeding data in a database together with genomic and genetic data will further accelerate the cross-utilization of diverse data types by researchers from various disciplines. Database URL: http://www.rosaceae.org/breeders_toolbox PMID:24247530
Joslin, A C; Green, R; German, J B; Lange, M C
2014-09-01
Advances in the development of bioinformatic tools continue to improve investigators' ability to interrogate, organize, and derive knowledge from large amounts of heterogeneous information. These tools often require advanced technical skills not possessed by life scientists. User-friendly, low-barrier-to-entry methods of visualizing nutrigenomics information are yet to be developed. We utilized concept mapping software from the Institute for Human and Machine Cognition to create a conceptual model of diet and health-related data that provides a foundation for future nutrigenomics ontologies describing published nutrient-gene/polymorphism-phenotype data. In this model, maps containing phenotype, nutrient, gene product, and genetic polymorphism interactions are visualized as triples of two concepts linked together by a linking phrase. These triples, or "knowledge propositions," contextualize aggregated data and information into easy-to-read knowledge maps. Maps of these triples enable visualization of genes spanning the One-Carbon Metabolism (OCM) pathway, their sequence variants, and multiple literature-mined associations including concepts relevant to nutrition, phenotypes, and health. The concept map development process documents the incongruity of information derived from pathway databases versus literature resources. This conceptual model highlights the importance of incorporating information about genes in upstream pathways that provide substrates, as well as downstream pathways that utilize products of the pathway under investigation, in this case OCM. Other genes and their polymorphisms, such as TCN2 and FUT2, although not directly involved in OCM, potentially alter OCM pathway functionality. These upstream gene products regulate substrates such as B12. Constellations of polymorphisms affecting the functionality of genes along OCM, together with substrate and cofactor availability, may impact resultant phenotypes. 
These conceptual maps provide a foundational framework for development of nutrient-gene/polymorphism-phenotype ontologies and systems visualization.
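The "knowledge proposition" triples described in the abstract above (two concepts joined by a linking phrase) can be pictured with a small data structure. The following is only an illustrative sketch: the names `Proposition`, `build_map` and `concepts_linked_to` are hypothetical, and the linking phrases are placeholder wording based on the abstract's statement that upstream gene products regulate substrates such as B12, not entries taken from the authors' concept maps.

```python
from collections import defaultdict
from typing import NamedTuple


class Proposition(NamedTuple):
    """One triple: two concepts joined by a linking phrase."""
    subject: str
    linking_phrase: str
    obj: str


# Placeholder triples (assumed wording, for illustration only).
props = [
    Proposition("TCN2", "regulates", "B12"),
    Proposition("FUT2", "regulates", "B12"),
    Proposition("B12", "is a substrate for", "OCM"),
]


def build_map(propositions):
    """Group triples by subject so each concept lists its outgoing links."""
    graph = defaultdict(list)
    for p in propositions:
        graph[p.subject].append((p.linking_phrase, p.obj))
    return dict(graph)


def concepts_linked_to(propositions, concept):
    """Return, sorted, every concept with a link pointing at `concept`."""
    return sorted({p.subject for p in propositions if p.obj == concept})


concept_map = build_map(props)
upstream_of_b12 = concepts_linked_to(props, "B12")
```

Storing each link as a flat triple keeps the map easy to merge across sources, which is the property the abstract relies on when it contrasts pathway databases with literature-derived associations.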
Recent Progress in the Development of Metabolome Databases for Plant Systems Biology
Fukushima, Atsushi; Kusano, Miyako
2013-01-01
Metabolomics has grown greatly as a functional genomics tool, and has become an invaluable diagnostic tool for biochemical phenotyping of biological systems. Over the past decades, a number of databases involving information related to mass spectra, compound names and structures, statistical/mathematical models and metabolic pathways, and metabolite profile data have been developed. Such databases complement each other and support efficient growth in this area, although the data resources remain scattered across the World Wide Web. Here, we review available metabolome databases and summarize the present status of development of related tools, particularly focusing on the plant metabolome. The data sharing discussed here will pave the way for the robust interpretation of metabolomic data and advances in plant systems biology. PMID:23577015
Database resources of the National Center for Biotechnology Information
Wheeler, David L.; Barrett, Tanya; Benson, Dennis A.; Bryant, Stephen H.; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M.; DiCuccio, Michael; Edgar, Ron; Federhen, Scott; Feolo, Michael; Geer, Lewis Y.; Helmberg, Wolfgang; Kapustin, Yuri; Khovayko, Oleg; Landsman, David; Lipman, David J.; Madden, Thomas L.; Maglott, Donna R.; Miller, Vadim; Ostell, James; Pruitt, Kim D.; Schuler, Gregory D.; Shumway, Martin; Sequeira, Edwin; Sherry, Steven T.; Sirotkin, Karl; Souvorov, Alexandre; Starchenko, Grigory; Tatusov, Roman L.; Tatusova, Tatiana A.; Wagner, Lukas; Yaschenko, Eugene
2008-01-01
In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data available through NCBI's web site. NCBI resources include Entrez, the Entrez Programming Utilities, My NCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link, Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genome, Genome Project and related tools, the Trace, Assembly, and Short Read Archives, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups, Influenza Viral Resources, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Entrez Probe, GENSAT, Database of Genotype and Phenotype, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool and the PubChem suite of small molecule databases. Augmenting the web applications are custom implementations of the BLAST program optimized to search specialized data sets. These resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov. PMID:18045790
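The Entrez Programming Utilities named in the abstract above are reached through plain HTTP requests. As a hedged sketch (the base URL and the `db`, `term` and `retmax` parameters reflect the public E-utilities interface as commonly documented today; none of them are stated in the abstract), the snippet below only builds a search URL and does not contact NCBI. The helper name `esearch_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed base address of the public E-utilities service (not given in the abstract).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """Build an ESearch request URL for one Entrez database and query term."""
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode escapes the query; spaces become '+'.
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


url = esearch_url("pubmed", "RNAi phenotype database", retmax=5)
```

Passing the resulting URL to any HTTP client would return the matching record identifiers; NCBI's usage guidelines on request rates should be checked before running queries in bulk.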
A probabilistic model to predict clinical phenotypic traits from genome sequencing.
Chen, Yun-Ching; Douville, Christopher; Wang, Cheng; Niknafs, Noushin; Yeo, Grace; Beleva-Guthrie, Violeta; Carter, Hannah; Stenson, Peter D; Cooper, David N; Li, Biao; Mooney, Sean; Karchin, Rachel
2014-09-01
Genetic screening is becoming possible on an unprecedented scale. However, its utility remains controversial. Although most variant genotypes cannot be easily interpreted, many individuals nevertheless attempt to interpret their genetic information. Initiatives such as the Personal Genome Project (PGP) and Illumina's Understand Your Genome are sequencing thousands of adults, collecting phenotypic information and developing computational pipelines to identify the most important variant genotypes harbored by each individual. These pipelines consider database and allele frequency annotations and bioinformatics classifications. We propose that the next step will be to integrate these different sources of information to estimate the probability that a given individual has specific phenotypes of clinical interest. To this end, we have designed a Bayesian probabilistic model to predict the probability of dichotomous phenotypes. When applied to a cohort from PGP, predictions of Gilbert syndrome, Graves' disease, non-Hodgkin lymphoma, and various blood groups were accurate, as individuals manifesting the phenotype in question exhibited the highest, or among the highest, predicted probabilities. Thirty-eight PGP phenotypes (26%) were predicted with area-under-the-ROC curve (AUC)>0.7, and 23 (15.8%) of these were statistically significant, based on permutation tests. Moreover, in a Critical Assessment of Genome Interpretation (CAGI) blinded prediction experiment, the models were used to match 77 PGP genomes to phenotypic profiles, generating the most accurate prediction of 16 submissions, according to an independent assessor. Although the models are currently insufficiently accurate for diagnostic utility, we expect their performance to improve with growth of publicly available genomics data and model refinement by domain experts.
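The abstract above reports area-under-the-ROC-curve (AUC) values whose significance was judged by permutation tests, without giving the procedure. The sketch below shows one generic way such an evaluation can be done; it is an assumption for illustration and not the authors' pipeline. The function names and the toy scores are hypothetical.

```python
import random


def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_p_value(scores, labels, n_perm=1000, seed=0):
    """Estimate how often shuffled labels reach an AUC at least as high as the observed one."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if auc(scores, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Toy predicted probabilities for four individuals; 1 = phenotype present.
toy_scores = [0.9, 0.8, 0.3, 0.2]
toy_labels = [1, 1, 0, 0]
toy_auc = auc(toy_scores, toy_labels)
toy_p = permutation_p_value(toy_scores, toy_labels, n_perm=200)
```

With very few individuals per phenotype, even a perfect ranking yields a large permutation p-value, which is one plausible reason a phenotype can exceed an AUC threshold without reaching significance.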
An XML-based interchange format for genotype-phenotype data.
Whirl-Carrillo, M; Woon, M; Thorn, C F; Klein, T E; Altman, R B
2008-02-01
Recent advances in high-throughput genotyping and phenotyping have accelerated the creation of pharmacogenomic data. Consequently, the community requires standard formats to exchange large amounts of diverse information. To facilitate the transfer of pharmacogenomics data between databases and analysis packages, we have created a standard XML (eXtensible Markup Language) schema that describes both genotype and phenotype data as well as associated metadata. The schema accommodates information regarding genes, drugs, diseases, experimental methods, genomic/RNA/protein sequences, subjects, subject groups, and literature. The Pharmacogenetics and Pharmacogenomics Knowledge Base (PharmGKB; www.pharmgkb.org) has used this XML schema for more than 5 years to accept and process submissions containing more than 1,814,139 SNPs on 20,797 subjects using 8,975 assays. Although developed in the context of pharmacogenomics, the schema is of general utility for exchange of genotype and phenotype data. We have written syntactic and semantic validators to check documents using this format. The schema and code for validation is available to the community at http://www.pharmgkb.org/schema/index.html (last accessed: 8 October 2007). (c) 2007 Wiley-Liss, Inc.
Yang, Guijun; Liu, Jiangang; Zhao, Chunjiang; Li, Zhenhong; Huang, Yanbo; Yu, Haiyang; Xu, Bo; Yang, Xiaodong; Zhu, Dongmei; Zhang, Xiaoyan; Zhang, Ruyang; Feng, Haikuan; Zhao, Xiaoqing; Li, Zhenhai; Li, Heli; Yang, Hao
2017-01-01
Phenotyping plays an important role in crop science research; the accurate and rapid acquisition of phenotypic information of plants or cells in different environments is helpful for exploring the inheritance and expression patterns of the genome to determine the association of genomic and phenotypic information to increase the crop yield. Traditional methods for acquiring crop traits, such as plant height, leaf color, leaf area index (LAI), chlorophyll content, biomass and yield, rely on manual sampling, which is time-consuming and laborious. Unmanned aerial vehicle remote sensing platforms (UAV-RSPs) equipped with different sensors have recently become an important approach for fast and non-destructive high throughput phenotyping and have the advantage of flexible and convenient operation, on-demand access to data and high spatial resolution. UAV-RSPs are a powerful tool for studying phenomics and genomics. Because users who wish to derive phenotypic parameters from large fields and tests with minimal field work and highly reliable results need methods and applications for field phenotyping using UAVs, the current status and perspectives of UAV-RSPs for field-based phenotyping were reviewed based on a literature survey of crop phenotyping using UAV-RSPs in the Web of Science™ Core Collection database and case studies by NERCITA. The reference for the selection of UAV platforms and remote sensing sensors, the commonly adopted methods and typical applications for analyzing phenotypic traits by UAV-RSPs, and the challenge for crop phenotyping by UAV-RSPs were considered. The review can provide theoretical and technical support to promote the applications of UAV-RSPs for crop phenotyping. PMID:28713402
Chuartzman, Silvia G; Schuldiner, Maya
2018-03-25
In the last decade several collections of Saccharomyces cerevisiae yeast strains have been created. In these collections every gene is modified in a similar manner such as by a deletion or the addition of a protein tag. Such libraries have enabled a diversity of systematic screens, giving rise to large amounts of information regarding gene functions. However, often papers describing such screens focus on a single gene or a small set of genes and all other loci affecting the phenotype of choice ('hits') are only mentioned in tables that are provided as supplementary material and are often hard to retrieve or search. To help unify and make such data accessible, we have created a Database of High Throughput Screening Hits (dHITS). The dHITS database enables information to be obtained about screens in which genes of interest were found as well as the other genes that came up in that screen - all in a readily accessible and downloadable format. The ability to query large lists of genes at the same time provides a platform to easily analyse hits obtained from transcriptional analyses or other screens. We hope that this platform will serve as a tool to facilitate investigation of protein functions to the yeast community. © 2018 The Authors Yeast Published by John Wiley & Sons Ltd.
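The batch lookup described in the abstract above (finding the screens in which queried genes were hits, and the other genes from those screens) can be illustrated with a minimal sketch. The screen names, gene names and function names below are hypothetical placeholders, not the dHITS schema or interface.

```python
from collections import defaultdict

# Hypothetical screen records: screen name -> set of hit genes (illustrative only).
SCREENS = {
    "screen_A": {"YFG1", "YFG2", "YFG3"},
    "screen_B": {"YFG2", "YFG4"},
}


def screens_for_genes(genes, screens=SCREENS):
    """Map each queried gene to the sorted list of screens in which it was a hit."""
    index = defaultdict(list)
    for name, hits in screens.items():
        for gene in hits:
            index[gene].append(name)
    # Genes that never appear as hits map to an empty list.
    return {gene: sorted(index.get(gene, [])) for gene in genes}


def co_hits(gene, screens=SCREENS):
    """Return the other genes that came up in any screen containing `gene`."""
    others = set()
    for hits in screens.values():
        if gene in hits:
            others |= hits - {gene}
    return sorted(others)
```

Querying a whole list at once, as in `screens_for_genes(["YFG2", "YFG9"])`, mirrors the use case of checking hits from a transcriptional analysis against earlier screens.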
Sousa, Filipa L; Parente, Daniel J; Hessman, Jacob A; Chazelle, Allen; Teichmann, Sarah A; Swint-Kruse, Liskin
2016-09-01
The AlloRep database (www.AlloRep.org) (Sousa et al., 2016) [1] compiles extensive sequence, mutagenesis, and structural information for the LacI/GalR family of transcription regulators. Sequence alignments are presented for >3000 proteins in 45 paralog subfamilies and as a subsampled alignment of the whole family. Phenotypic and biochemical data on almost 6000 mutants have been compiled from an exhaustive search of the literature; citations for these data are included herein. These data include information about oligomerization state, stability, DNA binding and allosteric regulation. Protein structural data for 65 proteins are presented as easily-accessible, residue-contact networks. Finally, this article includes example queries to enable the use of the AlloRep database. See the related article, "AlloRep: a repository of sequence, structural and mutagenesis data for the LacI/GalR transcription regulators" (Sousa et al., 2016) [1].
Simmons, Michael; Singhal, Ayush; Lu, Zhiyong
2018-01-01
The key question of precision medicine is whether it is possible to find clinically actionable granularity in diagnosing disease and classifying patient risk. The advent of next generation sequencing and the widespread adoption of electronic health records (EHRs) have provided clinicians and researchers a wealth of data and made possible the precise characterization of individual patient genotypes and phenotypes. Unstructured text — found in biomedical publications and clinical notes — is an important component of genotype and phenotype knowledge. Publications in the biomedical literature provide essential information for interpreting genetic data. Likewise, clinical notes contain the richest source of phenotype information in EHRs. Text mining can render these texts computationally accessible and support information extraction and hypothesis generation. This chapter reviews the mechanics of text mining in precision medicine and discusses several specific use cases, including database curation for personalized cancer medicine, patient outcome prediction from EHR-derived cohorts, and pharmacogenomic research. Taken as a whole, these use cases demonstrate how text mining enables effective utilization of existing knowledge sources and thus promotes increased value for patients and healthcare systems. Text mining is an indispensable tool for translating genotype-phenotype data into effective clinical care that will undoubtedly play an important role in the eventual realization of precision medicine. PMID:27807747
Simmons, Michael; Singhal, Ayush; Lu, Zhiyong
2016-01-01
The key question of precision medicine is whether it is possible to find clinically actionable granularity in diagnosing disease and classifying patient risk. The advent of next-generation sequencing and the widespread adoption of electronic health records (EHRs) have provided clinicians and researchers a wealth of data and made possible the precise characterization of individual patient genotypes and phenotypes. Unstructured text, found in biomedical publications and clinical notes, is an important component of genotype and phenotype knowledge. Publications in the biomedical literature provide essential information for interpreting genetic data. Likewise, clinical notes contain the richest source of phenotype information in EHRs. Text mining can render these texts computationally accessible and support information extraction and hypothesis generation. This chapter reviews the mechanics of text mining in precision medicine and discusses several specific use cases, including database curation for personalized cancer medicine, patient outcome prediction from EHR-derived cohorts, and pharmacogenomic research. Taken as a whole, these use cases demonstrate how text mining enables effective utilization of existing knowledge sources and thus promotes increased value for patients and healthcare systems. Text mining is an indispensable tool for translating genotype-phenotype data into effective clinical care that will undoubtedly play an important role in the eventual realization of precision medicine.
Hindt, Maria; Socha, Amanda L.; Zuber, Hélène
2013-01-01
Here we present approaches for using multi-elemental imaging (specifically synchrotron X-ray fluorescence microscopy, SXRF) in ionomics, with examples using the model plant Arabidopsis thaliana. The complexity of each approach depends on the amount of a priori information available for the gene and/or phenotype being studied. Three approaches are outlined, which apply to experimental situations where a gene of interest has been identified but has an unknown phenotype (Phenotyping), an unidentified gene is associated with a known phenotype (Gene Cloning) and finally, a Screening approach, where both gene and phenotype are unknown. These approaches make use of open-access, online databases with which plant molecular genetics researchers working in the model plant Arabidopsis will be familiar, in particular the Ionomics Hub and online transcriptomic databases such as the Arabidopsis eFP browser. The approaches and examples we describe are based on the assumption that altering the expression of ion transporters can result in changes in elemental distribution. We provide methodological details on using elemental imaging to aid or accelerate gene functional characterization by narrowing down the search for candidate genes to the tissues in which elemental distributions are altered. We use synchrotron X-ray microprobes as a technique of choice, which can now be used to image all parts of an Arabidopsis plant in a hydrated state. We present elemental images of leaves, stem, root, siliques and germinating hypocotyls. PMID:23912758
MGIS: managing banana (Musa spp.) genetic resources information and high-throughput genotyping data
Guignon, V.; Sempere, G.; Sardos, J.; Hueber, Y.; Duvergey, H.; Andrieu, A.; Chase, R.; Jenny, C.; Hazekamp, T.; Irish, B.; Jelali, K.; Adeka, J.; Ayala-Silva, T.; Chao, C.P.; Daniells, J.; Dowiya, B.; Effa effa, B.; Gueco, L.; Herradura, L.; Ibobondji, L.; Kempenaers, E.; Kilangi, J.; Muhangi, S.; Ngo Xuan, P.; Paofa, J.; Pavis, C.; Thiemele, D.; Tossou, C.; Sandoval, J.; Sutanto, A.; Vangu Paka, G.; Yi, G.; Van den houwe, I.; Roux, N.
2017-01-01
Unraveling the genetic diversity held in genebanks on a large scale is underway, due to advances in next-generation sequencing (NGS)-based technologies that produce high-density genetic markers for a large number of samples at low cost. Genebank users should be in a position to identify and select germplasm from the global genepool based on a combination of passport, genotypic and phenotypic data. To facilitate this, a new generation of information systems is being designed to efficiently handle data and link it with other external resources such as genome or breeding databases. The Musa Germplasm Information System (MGIS), the database for global ex situ-held banana genetic resources, has been developed to address those needs in a user-friendly way. In developing MGIS, we selected a generic database schema (Chado), the robust content management system Drupal for the user interface, and Tripal, a set of Drupal modules which links the Chado schema to Drupal. MGIS allows germplasm collection examination, accession browsing, advanced search functions, and germplasm orders. Additionally, we developed unique graphical interfaces to compare accessions and to explore them based on their taxonomic information. Accession-based data has been enriched with publications, genotyping studies and associated genotyping datasets reporting on germplasm use. Finally, an interoperability layer has been implemented to facilitate the link with complementary databases like the Banana Genome Hub and the MusaBase breeding database. Database URL: https://www.crop-diversity.org/mgis/ PMID:29220435
Yokochi, Masashi; Kobayashi, Naohiro; Ulrich, Eldon L; Kinjo, Akira R; Iwata, Takeshi; Ioannidis, Yannis E; Livny, Miron; Markley, John L; Nakamura, Haruki; Kojima, Chojiro; Fujiwara, Toshimichi
2016-05-05
The nuclear magnetic resonance (NMR) spectroscopic data for biological macromolecules archived at the BioMagResBank (BMRB) provide a rich resource of biophysical information at atomic resolution. The NMR data archived in NMR-STAR ASCII format have been implemented in a relational database. However, it is still fairly difficult for users to retrieve data from the NMR-STAR files or the relational database in association with data from other biological databases. To enhance the interoperability of the BMRB database, we present a full conversion of BMRB entries to two standard structured data formats, XML and RDF, as common open representations of the NMR-STAR data. Moreover, a SPARQL endpoint has been deployed. The described case study demonstrates that a simple query of the SPARQL endpoints of the BMRB, UniProt, and Online Mendelian Inheritance in Man (OMIM), can be used in NMR and structure-based analysis of proteins combined with information of single nucleotide polymorphisms (SNPs) and their phenotypes. We have developed BMRB/XML and BMRB/RDF and demonstrate their use in performing a federated SPARQL query linking the BMRB to other databases through standard semantic web technologies. This will facilitate data exchange across diverse information resources.
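A federated SPARQL query of the kind mentioned in the abstract above joins a local graph pattern with a pattern evaluated at a remote endpoint through the SPARQL 1.1 `SERVICE` keyword. The sketch below only assembles the query text; the endpoint URL, the `ex:` prefix and the predicates used in the usage example are assumptions for illustration and are not the actual BMRB, UniProt or OMIM vocabularies.

```python
def federated_query(prefixes, local_pattern, remote_endpoint, remote_pattern,
                    variables, limit=10):
    """Build the text of a SPARQL 1.1 SELECT query with one SERVICE clause.

    prefixes: dict mapping prefix name -> IRI (hypothetical in the example).
    local_pattern / remote_pattern: triple patterns written by the caller.
    """
    prefix_lines = "\n".join(
        f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()
    )
    select = " ".join(f"?{v}" for v in variables)
    return (
        f"{prefix_lines}\n"
        f"SELECT {select} WHERE {{\n"
        f"  {local_pattern}\n"
        f"  SERVICE <{remote_endpoint}> {{\n"
        f"    {remote_pattern}\n"
        f"  }}\n"
        f"}} LIMIT {limit}"
    )


# Usage with placeholder names (all hypothetical):
query = federated_query(
    prefixes={"ex": "http://example.org/"},
    local_pattern="?entry ex:protein ?p .",
    remote_endpoint="https://example.org/sparql",
    remote_pattern="?p ex:variant ?v .",
    variables=["entry", "v"],
    limit=5,
)
```

The shared variable `?p` is what links the local and remote patterns, which is the basic mechanism behind joining NMR entries with protein and variant information held elsewhere.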
Towards an Age-Phenome Knowledge-base
2011-01-01
Background: Currently, data about age-phenotype associations are not systematically organized and cannot be studied methodically. Searching for scientific articles describing phenotypic changes reported as occurring at a given age is not possible for most ages. Results: Here we present the Age-Phenome Knowledge-base (APK), in which knowledge about age-related phenotypic patterns and events can be modeled and stored for retrieval. The APK contains evidence connecting specific ages or age groups with phenotypes, such as disease and clinical traits. Using a simple text mining tool developed for this purpose, we extracted instances of age-phenotype associations from journal abstracts related to non-insulin-dependent Diabetes Mellitus. In addition, links between age and phenotype were extracted from clinical data obtained from the NHANES III survey. The knowledge stored in the APK is made available for the relevant research community in the form of 'Age-Cards'; each card holds the collection of all the information stored in the APK about a particular age. These Age-Cards are presented in a wiki, allowing community review, amendment and contribution of additional information. In addition to the wiki interaction, complex searches can also be conducted, which require the user to have some knowledge of database query construction. Conclusions: The combination of a knowledge-model-based repository with community participation in the evolution and refinement of the knowledge-base makes the APK a useful and valuable environment for collecting and curating existing knowledge of the connections between age and phenotypes. PMID:21651792
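The abstract above does not describe how its text mining tool works, so the following is only a rough sketch of one way age-phenotype co-mentions could be pulled from a sentence: a regular expression finds age expressions, and a caller-supplied term list stands in for a phenotype vocabulary. The pattern, function name and example terms are all assumptions.

```python
import re

# Matches "age 45", "aged 60 to 70", "ages 30-40" (an assumed, deliberately narrow pattern).
AGE_RE = re.compile(r"\bage[ds]?\s+(\d{1,3})(?:\s*(?:-|to)\s*(\d{1,3}))?", re.IGNORECASE)


def extract_age_phenotype(sentence, phenotype_terms):
    """Return (age_low, age_high, phenotype) tuples co-mentioned in one sentence."""
    lowered = sentence.lower()
    # Case-insensitive substring match against the supplied phenotype terms.
    found = [term for term in phenotype_terms if term.lower() in lowered]
    pairs = []
    for match in AGE_RE.finditer(sentence):
        low = int(match.group(1))
        # A single age is stored as a degenerate range (low == high).
        high = int(match.group(2)) if match.group(2) else low
        for term in found:
            pairs.append((low, high, term))
    return pairs
```

Co-occurrence in a sentence is weak evidence on its own, which is one reason a community-reviewed wiki layer of the kind the abstract describes is useful.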
VaProS: a database-integration approach for protein/genome information retrieval.
Gojobori, Takashi; Ikeo, Kazuho; Katayama, Yukie; Kawabata, Takeshi; Kinjo, Akira R; Kinoshita, Kengo; Kwon, Yeondae; Migita, Ohsuke; Mizutani, Hisashi; Muraoka, Masafumi; Nagata, Koji; Omori, Satoshi; Sugawara, Hideaki; Yamada, Daichi; Yura, Kei
2016-12-01
Life science research now relies heavily on databases of genome sequences, transcription, protein three-dimensional (3D) structures, protein-protein interactions, phenotypes and so forth. The knowledge accumulated by omics research is so vast that a computer-aided search of data is now a prerequisite for starting a new study. In addition, a combined search across these databases can yield new ideas and new hypotheses that can be examined by wet-lab experiments. By virtually integrating the related databases on the Internet, we have built a new web application that helps life science researchers retrieve the expert knowledge stored in the databases and build new hypotheses about their research target. This web application, named VaProS, emphasizes the interconnection between the functional information of genome sequences and protein 3D structures, such as the structural effect of a gene mutation. In this manuscript, we present the concept of VaProS, the databases and tools that can be accessed without any knowledge of database locations and data formats, and the power of the search, exemplified by an investigation of the molecular mechanisms of lysosomal storage disease. VaProS can be freely accessed at http://p4d-info.nig.ac.jp/vapros/.
Hu, Yuming; Callebert, Pieter; Vandemoortel, Ilse; Nguyen, Long; Audenaert, Dominique; Verschraegen, Luc; Vandenbussche, Filip; Van Der Straeten, Dominique
2014-02-01
Small molecules which act as hormone agonists or antagonists represent useful tools in fundamental research and are widely applied in agriculture to control hormone effects. High-throughput screening of large chemical compound libraries has yielded new findings in plant biology, with possible future applications in agriculture and horticulture. To further understand ethylene biosynthesis/signaling and its crosstalk with other hormones, we screened a 12,000 compound chemical library based on an ethylene-related bioassay of dark-grown Arabidopsis thaliana (L.) Heynh. seedlings. From the initial screening, 1313 (∼11%) biologically active small molecules altering the phenotype triggered by the ethylene precursor 1-aminocyclopropane-1-carboxylic acid (ACC), were identified. Selection and sorting in classes were based on the angle of curvature of the apical hook, the length and width of the hypocotyl and the root. A MySQL-database was constructed (https://chaos.ugent.be/WE15/) including basic chemical information on the compounds, images illustrating the phenotypes, phenotype descriptions and classification. The research perspectives for different classes of hit compounds will be evaluated, and some general screening tips for customized high-throughput screening and pitfalls will be discussed. Copyright © 2013 Elsevier Masson SAS. All rights reserved.
Gene: a gene-centered information resource at NCBI.
Brown, Garth R; Hem, Vichet; Katz, Kenneth S; Ovetsky, Michael; Wallin, Craig; Ermolaeva, Olga; Tolstoy, Igor; Tatusova, Tatiana; Pruitt, Kim D; Maglott, Donna R; Murphy, Terence D
2015-01-01
The National Center for Biotechnology Information's (NCBI) Gene database (www.ncbi.nlm.nih.gov/gene) integrates gene-specific information from multiple data sources. NCBI Reference Sequence (RefSeq) genomes for viruses, prokaryotes and eukaryotes are the primary foundation for Gene records in that they form the critical association between sequence and a tracked gene upon which additional functional and descriptive content is anchored. Additional content is integrated based on the genomic location and RefSeq transcript and protein sequence data. The content of a Gene record represents the integration of curation and automated processing from RefSeq, collaborating model organism databases, consortia such as Gene Ontology, and other databases within NCBI. Records in Gene are assigned unique, tracked integers as identifiers. The content (citations, nomenclature, genomic location, gene products and their attributes, phenotypes, sequences, interactions, variation details, maps, expression, homologs, protein domains and external databases) is available via interactive browsing through NCBI's Entrez system, via NCBI's Entrez programming utilities (E-Utilities and Entrez Direct) and for bulk transfer by FTP. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.
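For the programmatic access route mentioned in the abstract above, a search against the Gene database through E-utilities is an HTTP request to the `esearch.fcgi` endpoint with `db=gene`. The sketch below only builds the request URL and performs no network call; the helper name is hypothetical, and the field tags in the usage line should be checked against the NCBI documentation before use.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def gene_esearch_url(term, retmax=20):
    """Build an ESearch URL that queries the Gene database for `term`."""
    params = {
        "db": "gene",       # search the Gene database
        "term": term,       # Entrez query string
        "retmode": "json",  # ask for a JSON response
        "retmax": retmax,   # maximum number of Gene IDs to return
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Usage: the field tags here are an assumption to verify against NCBI's help pages.
url = gene_esearch_url("BRCA1[Gene Name] AND Homo sapiens[Organism]", retmax=5)
```

The response lists the integer Gene identifiers described in the abstract, which can then be passed to other E-utilities calls for the full records.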
Dahdul, Wasila M; Balhoff, James P; Engeman, Jeffrey; Grande, Terry; Hilton, Eric J; Kothari, Cartik; Lapp, Hilmar; Lundberg, John G; Midford, Peter E; Vision, Todd J; Westerfield, Monte; Mabee, Paula M
2010-05-20
The wealth of phenotypic descriptions documented in the published articles, monographs, and dissertations of phylogenetic systematics is traditionally reported in a free-text format, and it is therefore largely inaccessible for linkage to biological databases for genetics, development, and phenotypes, and difficult to manage for large-scale integrative work. The Phenoscape project aims to represent these complex and detailed descriptions with rich and formal semantics that are amenable to computation and integration with phenotype data from other fields of biology. This entails reconceptualizing the traditional free-text characters into the computable Entity-Quality (EQ) formalism using ontologies. We used ontologies and the EQ formalism to curate a collection of 47 phylogenetic studies on ostariophysan fishes (including catfishes, characins, minnows, knifefishes) and their relatives with the goal of integrating these complex phenotype descriptions with information from an existing model organism database (zebrafish, http://zfin.org). We developed a curation workflow for the collection of character, taxonomic and specimen data from these publications. A total of 4,617 phenotypic characters (10,512 states) for 3,449 taxa, primarily species, were curated into EQ formalism (for a total of 12,861 EQ statements) using anatomical and taxonomic terms from teleost-specific ontologies (Teleost Anatomy Ontology and Teleost Taxonomy Ontology) in combination with terms from a quality ontology (Phenotype and Trait Ontology). Standards and guidelines for consistently and accurately representing phenotypes were developed in response to the challenges that were evident from two annotation experiments and from feedback from curators. The challenges we encountered and many of the curation standards and methods for improving consistency that we developed are generally applicable to any effort to represent phenotypes using ontologies. 
This is because an ontological representation of the detailed variations in phenotype, whether between mutant or wildtype, among individual humans, or across the diversity of species, requires a process by which a precise combination of terms from domain ontologies are selected and organized according to logical relations. The efficiencies that we have developed in this process will be useful for any attempt to annotate complex phenotypic descriptions using ontologies. We also discuss some ramifications of EQ representation for the domain of systematics.
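The Entity-Quality (EQ) formalism described above pairs an anatomical entity term with a quality term for a given taxon. A minimal sketch of such a record follows; the class, field names and example labels are illustrative assumptions, and real curation would store ontology term identifiers rather than free-text labels.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EQStatement:
    """One Entity-Quality statement for a taxon (labels are placeholders)."""
    taxon: str                            # term from a taxonomy ontology
    entity: str                           # term from an anatomy ontology
    quality: str                          # term from a quality ontology
    related_entity: Optional[str] = None  # second entity for relational qualities

    def render(self) -> str:
        text = f"{self.taxon}: E={self.entity}, Q={self.quality}"
        if self.related_entity is not None:
            text += f", E2={self.related_entity}"
        return text


def group_by_entity(statements):
    """Group EQ statements by entity so related phenotypes can be compared."""
    grouped = {}
    for statement in statements:
        grouped.setdefault(statement.entity, []).append(statement)
    return grouped
```

Because each free-text character state becomes one or more structured statements, a query such as "all taxa with a quality recorded for a given entity" reduces to a lookup in the grouped dictionary.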
Taboada, María; Martínez, Diego; Pilo, Belén; Jiménez-Escrig, Adriano; Robinson, Peter N; Sobrido, María J
2012-07-31
Semantic Web technology can considerably catalyze translational genetics and genomics research in medicine, where the interchange of information between basic research and clinical levels becomes crucial. This exchange involves mapping abstract phenotype descriptions from research resources, such as knowledge databases and catalogs, to unstructured datasets produced through experimental methods and clinical practice. This is especially true for the construction of mutation databases. This paper presents a way of harmonizing abstract phenotype descriptions with patient data from clinical practice, and querying this dataset about relationships between phenotypes and genetic variants, at different levels of abstraction. Due to the current availability of ontological and terminological resources that have already reached some consensus in biomedicine, a reuse-based ontology engineering approach was followed. The proposed approach uses the Web Ontology Language (OWL) to represent the phenotype ontology and the patient model, the Semantic Web Rule Language (SWRL) to bridge the gap between phenotype descriptions and clinical data, and the Semantic Query-Enhanced Web Rule Language (SQWRL) to query relevant phenotype-genotype bidirectional relationships. The work tests the use of semantic web technology in the biomedical research domain of cerebrotendinous xanthomatosis (CTX), using a real dataset and ontologies. A framework to query relevant phenotype-genotype bidirectional relationships is provided. Phenotype descriptions and patient data were harmonized by defining 28 Horn-like rules in terms of the OWL concepts. In total, 24 patterns of SQWRL queries were designed following the initial list of competency questions. As the approach is based on OWL, the semantics of the framework adopt the standard logical model of the open world assumption. 
This work demonstrates how semantic web technologies can be used to support flexible representation and computational inference mechanisms required to query patient datasets at different levels of abstraction. The open world assumption is especially good for describing only partially known phenotype-genotype relationships, in a way that is easily extensible. In future, this type of approach could offer researchers a valuable resource to infer new data from patient data for statistical analysis in translational research. In conclusion, phenotype description formalization and mapping to clinical data are two key elements for interchanging knowledge between basic and clinical research.
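Horn-like rules of the kind mentioned above have a conjunction of conditions as the body and a single conclusion as the head. The sketch below applies such rules by naive forward chaining over plain string facts. It is a propositional simplification: SWRL rules range over OWL individuals with variables, and the fact names in the usage example are hypothetical.

```python
def forward_chain(facts, rules):
    """Derive all facts implied by Horn rules.

    facts: iterable of known facts (strings).
    rules: list of (body, head) pairs, where body is a frozenset of facts
           that must all hold before head is added.
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            # Fire the rule when every condition is already derived.
            if body <= derived and head not in derived:
                derived.add(head)
                changed = True
    return derived


# Usage with placeholder fact names (hypothetical):
rules = [
    (frozenset({"finding_a", "finding_b"}), "abstract_phenotype_x"),
    (frozenset({"abstract_phenotype_x"}), "candidate_for_variant_query"),
]
result = forward_chain({"finding_a", "finding_b"}, rules)
```

Only positive facts are derived here. A fact missing from the result is simply unknown rather than false, which is in keeping with the open world reading described in the abstract.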
Li, Min; Dong, Xiang-yu; Liang, Hao; Leng, Li; Zhang, Hui; Wang, Shou-zhi; Li, Hui; Du, Zhi-Qiang
2017-05-20
Effective management and analysis of precisely recorded phenotypic traits are important components of the selection and breeding of superior livestock. Over two decades, we divergently selected chicken lines for abdominal fat content at Northeast Agricultural University (Northeast Agricultural University High and Low Fat, NEAUHLF), and collected a large volume of phenotypic data related to the investigation of the molecular genetic basis of adipose tissue deposition in broilers. To effectively and systematically store, manage and analyze phenotypic data, we built the NEAUHLF Phenome Database (NEAUHLFPD). NEAUHLFPD included the following phenotypic records: pedigree (generations 1-19) and 29 phenotypes, such as body sizes and weights, carcass traits and their corresponding rates. The design and construction strategy of NEAUHLFPD were executed as follows: (1) Framework design. We used Apache as our web server, MySQL and Navicat as database management tools, and PHP as the HTML-embedded language to create a dynamic interactive website. (2) Structural components. The main interface provides a detailed introduction to the composition, function, and index buttons of the basic structure of the database. The functional modules of NEAUHLFPD had two main components: the first module referred to the physical storage space for phenotypic data, in which functional manipulation of data can be realized, such as data indexing, filtering, range-setting, searching, etc.; the second module related to the calculation of basic descriptive statistics, where data filtered from the database can be used for the computation of basic statistical parameters and for simultaneous conditional sorting. 
NEAUHLFPD could be used to effectively store and manage not only phenotypic, but also genotypic and genomics data, which can facilitate further investigation on the molecular genetic basis of chicken adipose tissue growth and development, and expedite the selection and breeding of broilers with low fat content.
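The two functional modules described above (filtering with range-setting, then basic descriptive statistics on the filtered data) can be sketched in a few lines. The database itself is described as PHP and MySQL; Python is used here only for illustration, and the record fields and function names are hypothetical.

```python
from statistics import mean, stdev


def filter_records(records, trait, low=None, high=None, **equals):
    """Keep records whose `trait` lies in [low, high] and whose other fields match."""
    selected = []
    for rec in records:
        value = rec.get(trait)
        if value is None:
            continue  # skip records with no measurement for this trait
        if low is not None and value < low:
            continue
        if high is not None and value > high:
            continue
        if any(rec.get(key) != wanted for key, wanted in equals.items()):
            continue
        selected.append(rec)
    return selected


def describe(values):
    """Basic descriptive statistics for a list of numeric values."""
    values = list(values)
    return {
        "n": len(values),
        "mean": mean(values),
        "sd": stdev(values) if len(values) > 1 else 0.0,  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }
```

A typical call would filter one selection line and generation, then pass the trait values of the result to `describe`.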
EuroPhenome: a repository for high-throughput mouse phenotyping data
Morgan, Hugh; Beck, Tim; Blake, Andrew; Gates, Hilary; Adams, Niels; Debouzy, Guillaume; Leblanc, Sophie; Lengger, Christoph; Maier, Holger; Melvin, David; Meziane, Hamid; Richardson, Dave; Wells, Sara; White, Jacqui; Wood, Joe; de Angelis, Martin Hrabé; Brown, Steve D. M.; Hancock, John M.; Mallon, Ann-Marie
2010-01-01
The broad aim of biomedical science in the postgenomic era is to link genomic and phenotype information to allow deeper understanding of the processes leading from genomic changes to altered phenotype and disease. The EuroPhenome project (http://www.EuroPhenome.org) is a comprehensive resource for raw and annotated high-throughput phenotyping data arising from projects such as EUMODIC. EUMODIC is gathering data from the EMPReSSslim pipeline (http://www.empress.har.mrc.ac.uk/) which is performed on inbred mouse strains and knock-out lines arising from the EUCOMM project. The EuroPhenome interface allows the user to access the data via the phenotype or genotype. It also allows the user to access the data in a variety of ways, including graphical display, statistical analysis and access to the raw data via web services. The raw phenotyping data captured in EuroPhenome is annotated by an annotation pipeline which automatically identifies statistically different mutants from the appropriate baseline and assigns ontology terms for that specific test. Mutant phenotypes can be quickly identified using two EuroPhenome tools: PhenoMap, a graphical representation of statistically relevant phenotypes, and mining for a mutant using ontology terms. To assist with data definition and cross-database comparisons, phenotype data is annotated using combinations of terms from biological ontologies. PMID:19933761
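The annotation step described above compares each mutant line with its baseline and attaches a term when the difference is statistically notable. The abstract does not state which test EuroPhenome uses, so the sketch below substitutes a simple z-score of the mutant mean against the baseline distribution; the threshold, function names and output labels are assumptions, and the labels are plain strings rather than real ontology terms.

```python
from math import sqrt
from statistics import mean, stdev


def z_score(baseline, mutant):
    """z-score of the mutant mean relative to the baseline mean and spread."""
    # Standard error of a mean of len(mutant) values drawn from the baseline.
    standard_error = stdev(baseline) / sqrt(len(mutant))
    return (mean(mutant) - mean(baseline)) / standard_error


def annotate(parameter, baseline, mutant, threshold=3.0):
    """Return a direction label if the mutant differs from baseline, else None."""
    z = z_score(baseline, mutant)
    if abs(z) < threshold:
        return None
    direction = "increased" if z > 0 else "decreased"
    return f"{direction} {parameter}"
```

In a real pipeline the returned label would be replaced by a combination of ontology terms, as the abstract notes, and the test would be chosen per parameter type.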
Computational Approaches to Phenotyping
Lussier, Yves A.; Liu, Yang
2007-01-01
The recent completion of the Human Genome Project has made possible a high-throughput “systems approach” for accelerating the elucidation of molecular underpinnings of human diseases, and subsequent derivation of molecular-based strategies to more effectively prevent, diagnose, and treat these diseases. Although altered phenotypes are among the most reliable manifestations of altered gene functions, research using systematic analysis of phenotype relationships to study human biology is still in its infancy. This article focuses on the emerging field of high-throughput phenotyping (HTP) phenomics research, which aims to capitalize on novel high-throughput computation and informatics technology developments to derive genomewide molecular networks of genotype–phenotype associations, or “phenomic associations.” The HTP phenomics research field faces the challenge of technological research and development to generate novel tools in computation and informatics that will allow researchers to amass, access, integrate, organize, and manage phenotypic databases across species and enable genomewide analysis to associate phenotypic information with genomic data at different scales of biology. Key state-of-the-art technological advancements critical for HTP phenomics research are covered in this review. In particular, we highlight the power of computational approaches to conduct large-scale phenomics studies. PMID:17202287
Lynx: a database and knowledge extraction engine for integrative medicine.
Sulakhe, Dinanath; Balasubramanian, Sandhya; Xie, Bingqing; Feng, Bo; Taylor, Andrew; Wang, Sheng; Berrocal, Eduardo; Dave, Utpal; Xu, Jinbo; Börnigen, Daniela; Gilliam, T Conrad; Maltsev, Natalia
2014-01-01
We have developed Lynx (http://lynx.ci.uchicago.edu), a web-based database and knowledge extraction engine that supports annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Its underlying knowledge base (LynxKB) integrates various classes of information from >35 public databases and private collections, as well as manually curated data from our group and collaborators. Lynx provides advanced search capabilities and a variety of algorithms for enrichment analysis and network-based gene prioritization to assist the user in extracting meaningful knowledge from LynxKB and experimental data, whereas its service-oriented architecture provides public access to LynxKB and its analytical tools via user-friendly web services and interfaces.
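Enrichment analysis, mentioned above, is commonly based on a hypergeometric test: given a gene list, how surprising is its overlap with an annotated category? The abstract does not specify which algorithms Lynx implements, so the following is a generic sketch of the one-sided hypergeometric p-value with a hypothetical function name.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(population, annotated, sample).

    population: total number of genes considered
    annotated:  genes in the population carrying the annotation
    sample:     size of the user's gene list
    overlap:    genes in the list that carry the annotation
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / total
```

Exact integer arithmetic with `math.comb` avoids rounding problems for moderate sizes. When many categories are tested, a multiple-testing correction would be needed on top of this value.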
The YeastGenome app: the Saccharomyces Genome Database at your fingertips.
Wong, Edith D; Karra, Kalpana; Hitz, Benjamin C; Hong, Eurie L; Cherry, J Michael
2013-01-01
The Saccharomyces Genome Database (SGD) is a scientific database that provides researchers with high-quality curated data about the genes and gene products of Saccharomyces cerevisiae. To provide instant and easy access to this information on mobile devices, we have developed YeastGenome, a native application for the Apple iPhone and iPad. YeastGenome can be used to quickly find basic information about S. cerevisiae genes and chromosomal features regardless of internet connectivity. With or without network access, you can view basic information and Gene Ontology annotations about a gene of interest by searching gene names and gene descriptions or by browsing the database within the app to find the gene of interest. With internet access, the app provides more detailed information about the gene, including mutant phenotypes, references and protein and genetic interactions, as well as provides hyperlinks to retrieve detailed information by showing SGD pages and views of the genome browser. SGD provides online help describing basic ways to navigate the mobile version of SGD, highlights key features and answers frequently asked questions related to the app. The app is available from iTunes (http://itunes.com/apps/yeastgenome). The YeastGenome app is provided freely as a service to our community, as part of SGD's mission to provide free and open access to all its data and annotations.
Ran, Xia; Cai, Wei-Jun; Huang, Xiu-Feng; Liu, Qi; Lu, Fan; Qu, Jia; Wu, Jinyu; Jin, Zi-Bing
2014-01-01
Inherited retinal degeneration (IRD), a leading cause of human blindness worldwide, is exceptionally heterogeneous, both clinically and genetically. During the past decades, tremendous efforts have been made to explore this heterogeneity, and with the significant advancement of sequencing technology a large number of mutations have been identified in different genes underlying IRD. In this study, we developed a comprehensive database, 'RetinoGenetics', which contains information about all known IRD-related genes and mutations. 'RetinoGenetics' currently contains 4270 mutations in 186 genes, with detailed information associated with 164 phenotypes from 934 publications and various types of functional annotations. Extensive annotations were then added for each gene using various resources, including Gene Ontology, KEGG pathways, protein-protein interaction, mutational annotations and gene-disease network. Furthermore, with its search functions, convenient browsing options and intuitive graphical displays, 'RetinoGenetics' could serve as a valuable resource for unveiling the genetic basis of IRD. Taken together, 'RetinoGenetics' is an integrative, informative and updatable resource for IRD-related genetic predispositions. Database URL: http://www.retinogenetics.org/. © The Author(s) 2014. Published by Oxford University Press.
Badisco, Liesbeth; Huybrechts, Jurgen; Simonet, Gert; Verlinden, Heleen; Marchal, Elisabeth; Huybrechts, Roger; Schoofs, Liliane; De Loof, Arnold; Vanden Broeck, Jozef
2011-03-21
The desert locust (Schistocerca gregaria) displays a fascinating type of phenotypic plasticity, designated as 'phase polyphenism'. Depending on environmental conditions, one genome can be translated into two highly divergent phenotypes, termed the solitarious and gregarious (swarming) phase. Although many of the underlying molecular events remain elusive, the central nervous system (CNS) is expected to play a crucial role in the phase transition process. Locusts have also proven to be interesting model organisms in a physiological and neurobiological research context. However, molecular studies in locusts are hampered by the fact that genome/transcriptome sequence information available for this branch of insects is still limited. We have generated 34,672 raw expressed sequence tags (EST) from the CNS of desert locusts in both phases. These ESTs were assembled in 12,709 unique transcript sequences and nearly 4,000 sequences were functionally annotated. Moreover, the obtained S. gregaria EST information is highly complementary to the existing orthopteran transcriptomic data. Since many novel transcripts encode neuronal signaling and signal transduction components, this paper includes an overview of these sequences. Furthermore, several transcripts being differentially represented in solitarious and gregarious locusts were retrieved from this EST database. The findings highlight the involvement of the CNS in the phase transition process and indicate that this novel annotated database may also add to the emerging knowledge of concomitant neuronal signaling and neuroplasticity events. In summary, we met the need for novel sequence data from desert locust CNS. To our knowledge, we hereby also present the first insect EST database that is derived from the complete CNS. The obtained S. gregaria EST data constitute an important new source of information that will be instrumental in further unraveling the molecular principles of phase polyphenism, in further establishing locusts as valuable research model organisms and in molecular evolutionary and comparative entomology.
Application of Genetic/Genomic Approaches to Allergic Disorders
Baye, Tesfaye M.; Martin, Lisa J.; Khurana Hershey, Gurjit K.
2010-01-01
Completion of the human genome project and rapid progress in genetics and bioinformatics have enabled the development of large public databases, which include genetic and genomic data linked to clinical health data. With the massive amount of information available, clinicians and researchers have the unique opportunity to complement and integrate their daily practice with the existing resources to clarify the underlying etiology of complex phenotypes such as allergic diseases. The genome itself is now often utilized as a starting point for many studies and multiple innovative approaches have emerged applying genetic/genomic strategies to key questions in the field of allergy and immunology. There have been several successes, which have uncovered new insights into the biologic underpinnings of allergic disorders. Herein, we will provide an in depth review of genomic approaches to identifying genes and biologic networks involved in allergic diseases. We will discuss genetic and phenotypic variation, statistical approaches for gene discovery, public databases, functional genomics, clinical implications, and the challenges that remain. PMID:20638111
Flavitrack: an annotated database of flavivirus sequences
Misra, Milind
2009-01-01
Motivation: Properly annotated sequence data for flaviviruses, which cause diseases such as tick-borne encephalitis (TBE), dengue fever (DF), West Nile (WN) and yellow fever (YF), can aid in the design of antiviral drugs and vaccines to prevent their spread. Flavitrack was designed to help identify conserved sequence motifs, interpret mutational and structural data and track evolution of phenotypic properties. Summary: Flavitrack contains over 590 complete flavivirus genome/protein sequences and information on known mutations and literature references. Each sequence has been manually annotated according to its date and place of isolation, phenotype and lethality. Internal tools are provided to rapidly determine relationships between viruses in Flavitrack and sequences provided by the user. Availability: http://carnot.utmb.edu/flavitrack Contact: chschein@utmb.edu Supplementary information: http://carnot.utmb.edu/flavitrack/B1S1.html PMID:17660525
van den Akker, Peter C; Jonkman, Marcel F; Rengaw, Trebor; Bruckner-Tuderman, Leena; Has, Cristina; Bauer, Johann W; Klausegger, Alfred; Zambruno, Giovanna; Castiglia, Daniele; Mellerio, Jemima E; McGrath, John A; van Essen, Anthonie J; Hofstra, Robert M W; Swertz, Morris A
2011-10-01
Dystrophic epidermolysis bullosa (DEB) is a heritable blistering disorder that can be inherited autosomal dominantly (DDEB) or recessively (RDEB) and covers a group of several distinctive phenotypes. A large number of unique COL7A1 mutations have been shown to underlie DEB. Although general genotype-phenotype correlation rules have emerged, many exceptions to these rules exist, compromising disease diagnosing and genetic counseling. We therefore constructed the International DEB Patient Registry (http://www.deb-central.org), aimed at worldwide collection and sharing of phenotypic and genotypic information on DEB. As of May 2011, this MOLGENIS-based registry contains detailed information on 508 published and 71 unpublished patients and their 388 unique COL7A1 mutations, and includes all combinations of mutations. The current registry RDEB versus DDEB ratio of 4:1, if compared to prevalence figures, suggests underreporting of DDEB in the literature. Thirty-eight percent of mutations stored introduce a premature termination codon (PTC) and 43% an amino acid change. Submission wizards allow users to quickly and easily share novel information. This registry will be of great help in disease diagnosing and genetic counseling and will lead to novel insights, especially in the rare phenotypes of which there is often lack of understanding. Altogether, this registry will greatly benefit the DEB patients. © 2011 Wiley-Liss, Inc.
FBIS: A regional DNA barcode archival & analysis system for Indian fishes.
Nagpure, Naresh Sahebrao; Rashid, Iliyas; Pathak, Ajey Kumar; Singh, Mahender; Singh, Shri Prakash; Sarkar, Uttam Kumar
2012-01-01
DNA barcode is a new tool for taxon recognition and classification of biological organisms based on sequence of a fragment of mitochondrial gene, cytochrome c oxidase I (COI). In view of the growing importance of the fish DNA barcoding for species identification, molecular taxonomy and fish diversity conservation, we developed a Fish Barcode Information System (FBIS) for Indian fishes, which will serve as a regional DNA barcode archival and analysis system. The database presently contains 2334 sequence records of COI gene for 472 aquatic species belonging to 39 orders and 136 families, collected from available published data sources. Additionally, it contains information on phenotype, distribution and IUCN Red List status of fishes. The web version of FBIS was designed using MySQL, Perl and PHP under Linux operating platform to (a) store and manage the acquisition (b) analyze and explore DNA barcode records (c) identify species and estimate genetic divergence. FBIS has also been integrated with appropriate tools for retrieving and viewing information about the database statistics and taxonomy. It is expected that FBIS would be useful as a potent information system in fish molecular taxonomy, phylogeny and genomics. The database is available for free at http://mail.nbfgr.res.in/fbis/
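As a hedged illustration of the "estimate genetic divergence" step, the sketch below computes an uncorrected p-distance between two aligned COI fragments. The abstract does not state which divergence measure FBIS uses (barcoding studies often report Kimura 2-parameter distances instead), so the function name `p_distance`, the toy sequences and the choice of measure are all assumptions made for illustration.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped. Illustrative only; not FBIS's documented method.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # skip sites that cannot be compared
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Hypothetical aligned fragments (not real FBIS records).
frag_1 = "ATGGCACTAAGC"
frag_2 = "ATGGCGCTAAGT"
print(p_distance(frag_1, frag_2))
```

The two toy fragments differ at 2 of 12 compared sites. A real pipeline would first align the barcodes and would likely apply a substitution model rather than the raw proportion.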
Bohland, Jason W; Myers, Emma M; Kim, Esther
2014-01-01
A number of heritable disorders impair the normal development of speech and language processes and occur in large numbers within the general population. While candidate genes and loci have been identified, the gap between genotype and phenotype is vast, limiting current understanding of the biology of normal and disordered processes. This gap exists not only in our scientific knowledge, but also in our research communities, where genetics researchers and speech, language, and cognitive scientists tend to operate independently. Here we describe a web-based, domain-specific, curated database that represents information about genotype-phenotype relations specific to speech and language disorders, as well as neuroimaging results demonstrating focal brain differences in relevant patients versus controls. Bringing these two distinct data types into a common database ( http://neurospeech.org/sldb ) is a first step toward bringing molecular level information into cognitive and computational theories of speech and language function. One bridge between these data types is provided by densely sampled profiles of gene expression in the brain, such as those provided by the Allen Brain Atlases. Here we present results from exploratory analyses of human brain gene expression profiles for genes implicated in speech and language disorders, which are annotated in our database. We then discuss how such datasets can be useful in the development of computational models that bridge levels of analysis, necessary to provide a mechanistic understanding of heritable language disorders. We further describe our general approach to information integration, discuss important caveats and considerations, and offer a specific but speculative example based on genes implicated in stuttering and basal ganglia function in speech motor control.
Cyclebase 3.0: a multi-organism database on cell-cycle regulation and phenotypes.
Santos, Alberto; Wernersson, Rasmus; Jensen, Lars Juhl
2015-01-01
The eukaryotic cell division cycle is a highly regulated process that consists of a complex series of events and involves thousands of proteins. Researchers have studied the regulation of the cell cycle in several organisms, employing a wide range of high-throughput technologies, such as microarray-based mRNA expression profiling and quantitative proteomics. Due to its complexity, the cell cycle can also fail or otherwise change in many different ways if important genes are knocked out, which has been studied in several microscopy-based knockdown screens. The data from these many large-scale efforts are not easily accessed, analyzed and combined due to their inherent heterogeneity. To address this, we have created Cyclebase--available at http://www.cyclebase.org--an online database that allows users to easily visualize and download results from genome-wide cell-cycle-related experiments. In Cyclebase version 3.0, we have updated the content of the database to reflect changes to genome annotation, added new mRNA and protein expression data, and integrated cell-cycle phenotype information from high-content screens and model-organism databases. The new version of Cyclebase also features a new web interface, designed around an overview figure that summarizes all the cell-cycle-related data for a gene. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Oliver, Sarah; Willard, Francis S.; Heidler, Steven; Peery, Robert B.; Oler, Jennifer; Chu, Shaoyou; Southall, Noel; Dexheimer, Thomas S.; Smallwood, Jeffrey; Huang, Ruili; Guha, Rajarshi; Jadhav, Ajit; Cox, Karen; Austin, Christopher P.; Simeonov, Anton; Sittampalam, G. Sitta; Husain, Saba; Franklin, Natalie; Wild, David J.; Yang, Jeremy J.; Sutherland, Jeffrey J.; Thomas, Craig J.
2015-01-01
Phenotypic assays have a proven track record for generating leads that become first-in-class therapies. Whole cell assays that inform on a phenotype or mechanism also possess great potential in drug repositioning studies by illuminating new activities for the existing pharmacopeia. The National Center for Advancing Translational Sciences (NCATS) pharmaceutical collection (NPC) is the largest reported collection of approved small molecule therapeutics that is available for screening in a high-throughput setting. Via a wide-ranging collaborative effort, this library was analyzed in the Open Innovation Drug Discovery (OIDD) phenotypic assay modules publicly offered by Lilly. The results of these tests are publicly available online at www.ncats.nih.gov/expertise/preclinical/pd2 and via the PubChem Database (https://pubchem.ncbi.nlm.nih.gov/) (AID 1117321). Phenotypic outcomes for numerous drugs were confirmed, including sulfonylureas as insulin secretagogues and the anti-angiogenesis actions of multikinase inhibitors sorafenib, axitinib and pazopanib. Several novel outcomes were also noted including the Wnt potentiating activities of rotenone and the antifolate class of drugs, and the anti-angiogenic activity of cetaben. PMID:26177200
Lee, Jonathan A; Shinn, Paul; Jaken, Susan; Oliver, Sarah; Willard, Francis S; Heidler, Steven; Peery, Robert B; Oler, Jennifer; Chu, Shaoyou; Southall, Noel; Dexheimer, Thomas S; Smallwood, Jeffrey; Huang, Ruili; Guha, Rajarshi; Jadhav, Ajit; Cox, Karen; Austin, Christopher P; Simeonov, Anton; Sittampalam, G Sitta; Husain, Saba; Franklin, Natalie; Wild, David J; Yang, Jeremy J; Sutherland, Jeffrey J; Thomas, Craig J
2015-01-01
Phenotypic assays have a proven track record for generating leads that become first-in-class therapies. Whole cell assays that inform on a phenotype or mechanism also possess great potential in drug repositioning studies by illuminating new activities for the existing pharmacopeia. The National Center for Advancing Translational Sciences (NCATS) pharmaceutical collection (NPC) is the largest reported collection of approved small molecule therapeutics that is available for screening in a high-throughput setting. Via a wide-ranging collaborative effort, this library was analyzed in the Open Innovation Drug Discovery (OIDD) phenotypic assay modules publicly offered by Lilly. The results of these tests are publicly available online at www.ncats.nih.gov/expertise/preclinical/pd2 and via the PubChem Database (https://pubchem.ncbi.nlm.nih.gov/) (AID 1117321). Phenotypic outcomes for numerous drugs were confirmed, including sulfonylureas as insulin secretagogues and the anti-angiogenesis actions of multikinase inhibitors sorafenib, axitinib and pazopanib. Several novel outcomes were also noted including the Wnt potentiating activities of rotenone and the antifolate class of drugs, and the anti-angiogenic activity of cetaben.
Choosing a genome browser for a Model Organism Database: surveying the Maize community
Sen, Taner Z.; Harper, Lisa C.; Schaeffer, Mary L.; Andorf, Carson M.; Seigfried, Trent E.; Campbell, Darwin A.; Lawrence, Carolyn J.
2010-01-01
As the B73 maize genome sequencing project neared completion, MaizeGDB began to integrate a graphical genome browser with its existing web interface and database. To ensure that maize researchers would optimally benefit from the potential addition of a genome browser to the existing MaizeGDB resource, personnel at MaizeGDB surveyed researchers’ needs. Collected data indicate that existing genome browsers for maize were inadequate and suggest implementation of a browser with quick interface and intuitive tools would meet most researchers’ needs. Here, we document the survey’s outcomes, review functionalities of available genome browser software platforms and offer our rationale for choosing the GBrowse software suite for MaizeGDB. Because the genome as represented within the MaizeGDB Genome Browser is tied to detailed phenotypic data, molecular marker information, available stocks, etc., the MaizeGDB Genome Browser represents a novel mechanism by which the researchers can leverage maize sequence information toward crop improvement directly. Database URL: http://gbrowse.maizegdb.org/ PMID:20627860
Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders.
Hamosh, Ada; Scott, Alan F; Amberger, Joanna S; Bocchini, Carol A; McKusick, Victor A
2005-01-01
Online Mendelian Inheritance in Man (OMIM) is a comprehensive, authoritative and timely knowledgebase of human genes and genetic disorders compiled to support human genetics research and education and the practice of clinical genetics. Started by Dr Victor A. McKusick as the definitive reference Mendelian Inheritance in Man, OMIM (http://www.ncbi.nlm.nih.gov/omim/) is now distributed electronically by the National Center for Biotechnology Information, where it is integrated with the Entrez suite of databases. Derived from the biomedical literature, OMIM is written and edited at Johns Hopkins University with input from scientists and physicians around the world. Each OMIM entry has a full-text summary of a genetically determined phenotype and/or gene and has numerous links to other genetic databases such as DNA and protein sequence, PubMed references, general and locus-specific mutation databases, HUGO nomenclature, MapViewer, GeneTests, patient support groups and many others. OMIM is an easy and straightforward portal to the burgeoning information in human genetics.
BTKbase, mutation database for X-linked agammaglobulinemia (XLA).
Vihinen, M; Brandau, O; Brandén, L J; Kwan, S P; Lappalainen, I; Lester, T; Noordzij, J G; Ochs, H D; Ollila, J; Pienaar, S M; Riikonen, P; Saha, B K; Smith, C I
1998-01-01
X-linked agammaglobulinemia (XLA) is an immunodeficiency caused by mutations in the gene coding for Bruton's agammaglobulinemia tyrosine kinase (BTK). A database (BTKbase) of BTK mutations has been compiled and the recent update lists 463 mutation entries from 406 unrelated families showing 303 unique molecular events. In addition to mutations, the database also lists variants or polymorphisms. Each patient is given a unique patient identity number (PIN). Information is included regarding the phenotype including symptoms. Mutations in all the five domains of BTK have been noticed to cause the disease, the most common event being missense mutations. The mutations appear almost uniformly throughout the molecule and frequently affect CpG sites that code for arginine residues. The putative structural implications of all the missense mutations are given in the database. The improved version of the registry having a number of new features is available at http://www.helsinki.fi/science/signal/btkbase.html PMID:9399844
Mashima, Jun; Kodama, Yuichi; Fujisawa, Takatomo; Katayama, Toshiaki; Okuda, Yoshihiro; Kaminuma, Eli; Ogasawara, Osamu; Okubo, Kousaku; Nakamura, Yasukazu; Takagi, Toshihisa
2017-01-01
The DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) has been providing public data services for thirty years (since 1987). We are collecting nucleotide sequence data from researchers as a member of the International Nucleotide Sequence Database Collaboration (INSDC, http://www.insdc.org), in collaboration with the US National Center for Biotechnology Information (NCBI) and European Bioinformatics Institute (EBI). The DDBJ Center also services Japanese Genotype-phenotype Archive (JGA), with the National Bioscience Database Center to collect human-subjected data from Japanese researchers. Here, we report our database activities for INSDC and JGA over the past year, and introduce retrieval and analytical services running on our supercomputer system and their recent modifications. Furthermore, with the Database Center for Life Science, the DDBJ Center improves semantic web technologies to integrate and to share biological data, for providing the RDF version of the sequence data. PMID:27924010
Lynx: a database and knowledge extraction engine for integrative medicine
Sulakhe, Dinanath; Balasubramanian, Sandhya; Xie, Bingqing; Feng, Bo; Taylor, Andrew; Wang, Sheng; Berrocal, Eduardo; Dave, Utpal; Xu, Jinbo; Börnigen, Daniela; Gilliam, T. Conrad; Maltsev, Natalia
2014-01-01
We have developed Lynx (http://lynx.ci.uchicago.edu)—a web-based database and a knowledge extraction engine, supporting annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Its underlying knowledge base (LynxKB) integrates various classes of information from >35 public databases and private collections, as well as manually curated data from our group and collaborators. Lynx provides advanced search capabilities and a variety of algorithms for enrichment analysis and network-based gene prioritization to assist the user in extracting meaningful knowledge from LynxKB and experimental data, whereas its service-oriented architecture provides public access to LynxKB and its analytical tools via user-friendly web services and interfaces. PMID:24270788
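The abstract does not specify which enrichment algorithms Lynx implements. As a minimal sketch of one common form of enrichment analysis, the block below computes a one-sided hypergeometric (over-representation) p-value; the function name and the toy numbers are assumptions, and nothing here reflects Lynx's actual code or web-service API.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           sample: int, hits: int) -> float:
    """P(X >= hits) for a hypergeometric draw.

    population: genes in the background set
    annotated:  background genes carrying the annotation term
    sample:     genes in the user's list
    hits:       genes in the user's list carrying the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass for k = hits .. upper annotated genes drawn.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy example: 3 of 10 background genes carry a term; all 3 genes in a
# 3-gene list carry it, which has probability 1 / C(10, 3) = 1/120.
print(hypergeom_enrichment_p(population=10, annotated=3, sample=3, hits=3))
```

In practice such p-values are computed for many terms at once and therefore need multiple-testing correction, which this sketch omits.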
Brunet, Marie A; Levesque, Sébastien A; Hunting, Darel J; Cohen, Alan A; Roucou, Xavier
2018-05-01
Technological advances promise unprecedented opportunities for whole exome sequencing and proteomic analyses of populations. Currently, data from genome and exome sequencing or proteomic studies are searched against reference genome annotations. This provides the foundation for research and clinical screening for genetic causes of pathologies. However, current genome annotations substantially underestimate the proteomic information encoded within a gene. Numerous studies have now demonstrated the expression and function of alternative (mainly small, sometimes overlapping) ORFs within mature gene transcripts. This has important consequences for the correlation of phenotypes and genotypes. Most alternative ORFs are not yet annotated because of a lack of evidence, and this absence from databases precludes their detection by standard proteomic methods, such as mass spectrometry. Here, we demonstrate how current approaches tend to overlook alternative ORFs, hindering the discovery of new genetic drivers and fundamental research. We discuss available tools and techniques to improve identification of proteins from alternative ORFs and finally suggest a novel annotation system to permit a more complete representation of the transcriptomic and proteomic information contained within a gene. Given the crucial challenge of distinguishing functional ORFs from random ones, the suggested pipeline emphasizes both experimental data and conservation signatures. The addition of alternative ORFs in databases will render identification less serendipitous and advance the pace of research and genomic knowledge. This review highlights the urgent medical and research need to incorporate alternative ORFs in current genome annotations and thus permit their inclusion in hypotheses and models, which relate phenotypes and genotypes. © 2018 Brunet et al.; Published by Cold Spring Harbor Laboratory Press.
Validation and discovery of genotype-phenotype associations in chronic diseases using linked data.
Pathak, Jyotishman; Kiefer, Richard; Freimuth, Robert; Chute, Christopher
2012-01-01
This study investigates federated SPARQL queries over Linked Open Data (LOD) in the Semantic Web to validate existing, and potentially discover new genotype-phenotype associations from public datasets. In particular, we report our preliminary findings for identifying such associations for commonly occurring chronic diseases using the Online Mendelian Inheritance in Man (OMIM) and Database for SNPs (dbSNP) within the LOD knowledgebase and compare them with Gene Wiki for coverage and completeness. Our results indicate that Semantic Web technologies can play an important role for in-silico identification of novel disease-gene-SNP associations, although additional verification is required before such information can be applied and used effectively.
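To make the idea of a federated query concrete, the following sketch assembles a SPARQL 1.1 query string that joins two endpoints through `SERVICE` clauses. It is only an illustration: the endpoint URLs, the `ex:` prefix and the predicates `ex:hasPhenotypeLabel`, `ex:associatedGene` and `ex:locatedInGene` are hypothetical placeholders, not the vocabulary or endpoints used in the study.

```python
def build_federated_query(disease_label: str,
                          omim_endpoint: str,
                          dbsnp_endpoint: str) -> str:
    """Return a SPARQL 1.1 query that joins two endpoints on ?gene.

    All prefixes and predicates below are hypothetical placeholders.
    """
    # Escape backslashes and double quotes so the label is a valid literal.
    escaped = disease_label.replace("\\", "\\\\").replace('"', '\\"')
    return f"""PREFIX ex: <http://example.org/vocab#>
SELECT ?gene ?snp WHERE {{
  SERVICE <{omim_endpoint}> {{
    ?entry ex:hasPhenotypeLabel "{escaped}" ;
           ex:associatedGene ?gene .
  }}
  SERVICE <{dbsnp_endpoint}> {{
    ?snp ex:locatedInGene ?gene .
  }}
}}"""


query = build_federated_query(
    "diabetes",
    "http://example.org/omim/sparql",   # hypothetical endpoint
    "http://example.org/dbsnp/sparql",  # hypothetical endpoint
)
print(query)
```

The shared variable `?gene` is what links the disease-gene pattern from one source to the gene-SNP pattern from the other; the resulting string would be sent to any SPARQL 1.1 endpoint that supports federation.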
Howe, Douglas G.; Bradford, Yvonne M.; Eagle, Anne; Fashena, David; Frazer, Ken; Kalita, Patrick; Mani, Prita; Martin, Ryan; Moxon, Sierra Taylor; Paddock, Holly; Pich, Christian; Ramachandran, Sridhar; Ruzicka, Leyla; Schaper, Kevin; Shao, Xiang; Singer, Amy; Toro, Sabrina; Van Slyke, Ceri; Westerfield, Monte
2017-01-01
The Zebrafish Model Organism Database (ZFIN; http://zfin.org) is the central resource for zebrafish (Danio rerio) genetic, genomic, phenotypic and developmental data. ZFIN curators provide expert manual curation and integration of comprehensive data involving zebrafish genes, mutants, transgenic constructs and lines, phenotypes, genotypes, gene expressions, morpholinos, TALENs, CRISPRs, antibodies, anatomical structures, models of human disease and publications. We integrate curated, directly submitted, and collaboratively generated data, making these available to the zebrafish research community. Among the vertebrate model organisms, zebrafish are superbly suited for rapid generation of sequence-targeted mutant lines, characterization of phenotypes including gene expression patterns, and generation of human disease models. The recent rapid adoption of zebrafish as human disease models is making management of these data particularly important to both the research and clinical communities. Here, we describe recent enhancements to ZFIN including use of the zebrafish experimental conditions ontology, ‘Fish’ records in the ZFIN database, support for gene expression phenotypes, models of human disease, mutation details at the DNA, RNA and protein levels, and updates to the ZFIN single box search. PMID:27899582
2012-01-01
Background: Semantic Web technology can considerably catalyze translational genetics and genomics research in medicine, where the interchange of information between basic research and clinical levels becomes crucial. This exchange involves mapping abstract phenotype descriptions from research resources, such as knowledge databases and catalogs, to unstructured datasets produced through experimental methods and clinical practice. This is especially true for the construction of mutation databases. This paper presents a way of harmonizing abstract phenotype descriptions with patient data from clinical practice, and querying this dataset about relationships between phenotypes and genetic variants, at different levels of abstraction. Methods: Due to the current availability of ontological and terminological resources that have already reached some consensus in biomedicine, a reuse-based ontology engineering approach was followed. The proposed approach uses the Web Ontology Language (OWL) to represent the phenotype ontology and the patient model, the Semantic Web Rule Language (SWRL) to bridge the gap between phenotype descriptions and clinical data, and the Semantic Query-Enhanced Web Rule Language (SQWRL) to query relevant phenotype-genotype bidirectional relationships. The work tests the use of semantic web technology in the biomedical research domain of cerebrotendinous xanthomatosis (CTX), using a real dataset and ontologies. Results: A framework to query relevant phenotype-genotype bidirectional relationships is provided. Phenotype descriptions and patient data were harmonized by defining 28 Horn-like rules in terms of the OWL concepts. In total, 24 patterns of SQWRL queries were designed following the initial list of competency questions. As the approach is based on OWL, the semantics of the framework adopt the standard logical model of the open world assumption. 
Conclusions: This work demonstrates how semantic web technologies can be used to support flexible representation and computational inference mechanisms required to query patient datasets at different levels of abstraction. The open world assumption is especially good for describing only partially known phenotype-genotype relationships, in a way that is easily extensible. In future, this type of approach could offer researchers a valuable resource to infer new data from patient data for statistical analysis in translational research. In conclusion, phenotype description formalization and mapping to clinical data are two key elements for interchanging knowledge between basic and clinical research. PMID:22849591
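The rule-based harmonization described above can be pictured with a small forward-chaining sketch. This is not SWRL and does not reproduce any of the paper's 28 rules; the fact labels (`finding_a`, `variant_v` and so on) and the function `apply_rules` are hypothetical, chosen only to show how Horn-like rules of the form "body implies head" lift raw patient findings to more abstract phenotype descriptions.

```python
def apply_rules(facts, rules):
    """Forward-chain Horn-like rules until no new fact can be derived.

    Each rule is a (body, head) pair: if every fact in `body` is known,
    `head` is added to the known facts.
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and body <= derived:
                derived.add(head)
                changed = True
    return derived


# Hypothetical rules: labels are placeholders, not clinical knowledge.
rules = [
    (frozenset({"finding_a", "finding_b"}), "phenotype_x"),
    (frozenset({"phenotype_x", "variant_v"}), "genotype_phenotype_link"),
]
patient_facts = {"finding_a", "finding_b", "variant_v"}
result = apply_rules(patient_facts, rules)
print(sorted(result))
```

Under an open world reading, a phenotype that is not derived for a patient is treated as unknown rather than absent, which is why partially recorded patient data does not lead to false negative conclusions.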
Huerta, Mario; Munyi, Marc; Expósito, David; Querol, Enric; Cedano, Juan
2014-06-15
The number of microarray experiments performed by scientific teams is growing exponentially. These microarray data could be useful for researchers around the world, but unfortunately they are underused. To fully exploit these data, it is necessary (i) to extract these data from a repository of high-throughput gene expression data such as Gene Expression Omnibus (GEO) and (ii) to make the data from different microarrays comparable using tools that are easy for scientists to use. We have developed these two solutions in our server, implementing a database of microarray marker genes (Marker Genes Data Base). This database contains the marker genes of all GEO microarray datasets and it is updated monthly with the new microarrays from GEO. Thus, researchers can see whether the marker genes of their microarray are marker genes in other microarrays in the database, expanding the analysis of their microarray to the rest of the public microarrays. This solution helps not only to corroborate the conclusions regarding a researcher's microarray but also to identify the phenotype of different subsets of individuals under investigation, to frame the results with microarray experiments from other species, pathologies or tissues, to search for drugs that promote the transition between the studied phenotypes, to detect undesirable side effects of the treatment applied, etc. Thus, the researcher can quickly add relevant information to his/her studies from all of the previous analyses performed in other studies as long as they have been deposited in public repositories. Marker-gene database tool: http://ibb.uab.es/mgdb © The Author 2014. Published by Oxford University Press.
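One simple way to picture the comparison of a researcher's marker genes against those of other datasets is a set-overlap ranking. The sketch below uses the Jaccard index for this; the abstract does not say how the Marker Genes Data Base scores overlap, so the metric, the function names and the dataset identifiers are assumptions for illustration only.

```python
def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_datasets_by_marker_overlap(query_markers, datasets):
    """Return (dataset_name, score) pairs, highest overlap first."""
    query = set(query_markers)
    scores = [(name, jaccard(query, set(markers)))
              for name, markers in datasets.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical marker lists; dataset names are placeholders, not GEO IDs.
my_markers = {"TP53", "MYC", "EGFR"}
public_datasets = {
    "dataset_A": {"TP53", "MYC", "BRCA1"},
    "dataset_B": {"GAPDH"},
}
ranked = rank_datasets_by_marker_overlap(my_markers, public_datasets)
print(ranked)
```

Here `dataset_A` shares two of four distinct genes with the query and ranks first, while `dataset_B` shares none.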
Exploring Genetic, Genomic, and Phenotypic Data at the Rat Genome Database
Laulederkind, Stanley J. F.; Hayman, G. Thomas; Wang, Shur-Jen; Lowry, Timothy F.; Nigam, Rajni; Petri, Victoria; Smith, Jennifer R.; Dwinell, Melinda R.; Jacob, Howard J.; Shimoyama, Mary
2013-01-01
The laboratory rat, Rattus norvegicus, is an important model of human health and disease, and experimental findings in the rat have relevance to human physiology and disease. The Rat Genome Database (RGD, http://rgd.mcw.edu) is a model organism database that provides access to a wide variety of curated rat data including disease associations, phenotypes, pathways, molecular functions, biological processes and cellular components for genes, quantitative trait loci, and strains. We present an overview of the database followed by specific examples that can be used to gain experience in employing RGD to explore the wealth of functional data available for the rat. PMID:23255149
Argo: enabling the development of bespoke workflows and services for disease annotation.
Batista-Navarro, Riza; Carter, Jacob; Ananiadou, Sophia
2016-01-01
Argo (http://argo.nactem.ac.uk) is a generic text mining workbench that can cater to a variety of use cases, including the semi-automatic annotation of literature. It enables its technical users to build their own customised text mining solutions by providing a wide array of interoperable and configurable elementary components that can be seamlessly integrated into processing workflows. With Argo's graphical annotation interface, domain experts can then make use of the workflows' automatically generated output to curate information of interest.With the continuously rising need to understand the aetiology of diseases as well as the demand for their informed diagnosis and personalised treatment, the curation of disease-relevant information from medical and clinical documents has become an indispensable scientific activity. In the Fifth BioCreative Challenge Evaluation Workshop (BioCreative V), there was substantial interest in the mining of literature for disease-relevant information. Apart from a panel discussion focussed on disease annotations, the chemical-disease relations (CDR) track was also organised to foster the sharing and advancement of disease annotation tools and resources.This article presents the application of Argo's capabilities to the literature-based annotation of diseases. As part of our participation in BioCreative V's User Interactive Track (IAT), we demonstrated and evaluated Argo's suitability to the semi-automatic curation of chronic obstructive pulmonary disease (COPD) phenotypes. Furthermore, the workbench facilitated the development of some of the CDR track's top-performing web services for normalising disease mentions against the Medical Subject Headings (MeSH) database. 
In this work, we highlight Argo's support for developing various types of bespoke workflows ranging from ones which enabled us to easily incorporate information from various databases, to those which train and apply machine learning-based concept recognition models, through to user-interactive ones which allow human curators to manually provide their corrections to automatically generated annotations. Our participation in the BioCreative V challenges shows Argo's potential as an enabling technology for curating disease and phenotypic information from literature. Database URL: http://argo.nactem.ac.uk. © The Author(s) 2016. Published by Oxford University Press.
Argo: enabling the development of bespoke workflows and services for disease annotation
Batista-Navarro, Riza; Carter, Jacob; Ananiadou, Sophia
2016-01-01
Argo (http://argo.nactem.ac.uk) is a generic text mining workbench that can cater to a variety of use cases, including the semi-automatic annotation of literature. It enables its technical users to build their own customised text mining solutions by providing a wide array of interoperable and configurable elementary components that can be seamlessly integrated into processing workflows. With Argo's graphical annotation interface, domain experts can then make use of the workflows' automatically generated output to curate information of interest. With the continuously rising need to understand the aetiology of diseases as well as the demand for their informed diagnosis and personalised treatment, the curation of disease-relevant information from medical and clinical documents has become an indispensable scientific activity. In the Fifth BioCreative Challenge Evaluation Workshop (BioCreative V), there was substantial interest in the mining of literature for disease-relevant information. Apart from a panel discussion focussed on disease annotations, the chemical-disease relations (CDR) track was also organised to foster the sharing and advancement of disease annotation tools and resources. This article presents the application of Argo’s capabilities to the literature-based annotation of diseases. As part of our participation in BioCreative V’s User Interactive Track (IAT), we demonstrated and evaluated Argo’s suitability to the semi-automatic curation of chronic obstructive pulmonary disease (COPD) phenotypes. Furthermore, the workbench facilitated the development of some of the CDR track’s top-performing web services for normalising disease mentions against the Medical Subject Headings (MeSH) database. 
In this work, we highlight Argo’s support for developing various types of bespoke workflows ranging from ones which enabled us to easily incorporate information from various databases, to those which train and apply machine learning-based concept recognition models, through to user-interactive ones which allow human curators to manually provide their corrections to automatically generated annotations. Our participation in the BioCreative V challenges shows Argo’s potential as an enabling technology for curating disease and phenotypic information from literature. Database URL: http://argo.nactem.ac.uk PMID:27189607
The androgen receptor gene mutations database.
Gottlieb, B; Trifiro, M; Lumbroso, R; Vasiliou, D M; Pinsky, L
1996-01-01
The current version of the androgen receptor (AR) gene mutations database is described. We have added (if available) data on the androgen binding phenotype of the mutant AR, the clinical phenotype of the affected persons, the family history and whether the pathogenicity of a mutation has been proven. Exonic mutations are now listed in 5'→3' sequence regardless of type, and single base pair changes are presented in codon context. Splice site and intronic mutations are listed separately. The database has allowed us to substantiate and amplify the observation of mutational hot spots within exons encoding the AR androgen binding domain. The database is available from EMBL (ftp://www.ebi.ac.uk/pub/databases/androgen) or as a Macintosh Filemaker file (MC33@musica.mcgill.ca).
Life at the extreme limit: phenotypic characteristics of supercentenarians in Okinawa.
Willcox, D Craig; Willcox, Bradley J; Wang, Nien-Chiang; He, Qimei; Rosenbaum, Matthew; Suzuki, Makoto
2008-11-01
As elite representatives of the rapidly increasing "oldest-old" population, centenarians have become an important model population for understanding human aging. However, as we are beginning to understand more about this important phenotype, another demographic group of even more elite survivors is emerging: so-called "supercentenarians," or those who survive 110-plus years. Little is known about these exceptional survivors. We assessed the Okinawa Centenarian Study (OCS) database for all information on supercentenarians. The database includes dates of birth and year of death for all residents of Okinawa 99 years old or older and a yearly geriatric assessment of all centenarians who consented, enabling prospective study of age-related traits. Of 20 potential supercentenarians identified, 15 had agreed to participate in the OCS interview, physical examination, and blood draw. Of these 15, 12 (3 men and 9 women) met our age validation criteria and were accepted as supercentenarians. Phenotypic variables studied include medical and social history, activities of daily living (ADLs), and clinical phenotypes (physiology, hematology, biochemistry, and immunology). Age at death ranged from 110 to 112 years. The majority of supercentenarians had minimal clinically apparent disease until late in life, with cataracts (42%) and fractures (33%) being common and coronary heart disease (8%), stroke (8%), cancer (0%), and diabetes (0%) rare or not evident on clinical examination. Functionally, most supercentenarians were independent in ADLs at age 100 years, and few were institutionalized before the age of 105 years. Most had normal clinical parameters at age 100 years, but by age 105 exhibited multiple clinical markers of frailty coincident with a rapid ADL decline. Supercentenarians displayed an exceptionally healthy aging phenotype where clinically apparent major chronic diseases and disabilities were markedly delayed, often beyond age 100.
They had little clinical history of cardiovascular disease and reported no history of cancer or diabetes. This phenotype is consistent with a more elite phenotype than has been observed in prior studies of centenarians. The genetic and environmental antecedents of this exceptionally healthy aging phenotype deserve further study.
Harmonising phenomics information for a better interoperability in the rare disease field.
Maiella, Sylvie; Olry, Annie; Hanauer, Marc; Lanneau, Valérie; Lourghi, Halima; Donadille, Bruno; Rodwell, Charlotte; Köhler, Sebastian; Seelow, Dominik; Jupp, Simon; Parkinson, Helen; Groza, Tudor; Brudno, Michael; Robinson, Peter N; Rath, Ana
2018-02-07
HIPBI-RD (Harmonising phenomics information for a better interoperability in the rare disease field) is a three-year project which started in 2016 funded via the E-Rare 3 ERA-NET program. This project builds on three resources largely adopted by the rare disease (RD) community: Orphanet, its ontology ORDO (the Orphanet Rare Disease Ontology), HPO (the Human Phenotype Ontology) as well as PhenoTips software for the capture and sharing of structured phenotypic data for RD patients. Our project is further supported by resources developed by the European Bioinformatics Institute and the Garvan Institute. HIPBI-RD aims to provide the community with an integrated, RD-specific bioinformatics ecosystem that will harmonise the way phenomics information is stored in databases and patient files worldwide, and thereby contribute to interoperability. This ecosystem will consist of a suite of tools and ontologies, optimized to work together, and made available through commonly used software repositories. The project workplan follows three main objectives: The HIPBI-RD ecosystem will contribute to the interpretation of variants identified through exome and full genome sequencing by harmonising the way phenotypic information is collected, thus improving diagnostics and delineation of RD. The ultimate goal of HIPBI-RD is to provide a resource that will contribute to bridging genome-scale biology and a disease-centered view on human pathobiology. Achievements in Year 1. Copyright © 2018. Published by Elsevier Masson SAS.
CardioTF, a database of deconstructing transcriptional circuits in the heart system
2016-01-01
Background: Information on cardiovascular gene transcription is fragmented and far behind the present requirements of the systems biology field. To create a comprehensive source of data for cardiovascular gene regulation and to facilitate a deeper understanding of genomic data, the CardioTF database was constructed. The purpose of this database is to collate information on cardiovascular transcription factors (TFs), position weight matrices (PWMs), and enhancer sequences discovered using the ChIP-seq method. Methods: The Naïve-Bayes algorithm was used to classify literature and identify all PubMed abstracts on cardiovascular development. The natural language learning tool GNAT was then used to identify corresponding gene names embedded within these abstracts. Local Perl scripts were used to integrate and dump data from public databases into the MariaDB management system (MySQL). In-house R scripts were written to analyze and visualize the results. Results: Known cardiovascular TFs from humans and human homologs from fly, Ciona, zebrafish, frog, chicken, and mouse were identified and deposited in the database. PWMs from the Jaspar, hPDI, and UniPROBE databases were deposited in the database and can be retrieved using their corresponding TF names. Gene enhancer regions from various sources of ChIP-seq data were deposited into the database and can be visualized graphically. Besides biocuration, mouse homologs of the 81 core cardiac TFs were selected using a Naïve-Bayes approach and then by intersecting four independent data sources: RNA profiling, expert annotation, PubMed abstracts and phenotype. Discussion: The CardioTF database can be used as a portal to construct a transcriptional network of cardiac development. Availability and Implementation: Database URL: http://www.cardiosignal.org/database/cardiotf.html. PMID:27635320
CardioTF, a database of deconstructing transcriptional circuits in the heart system.
Zhen, Yisong
2016-01-01
Information on cardiovascular gene transcription is fragmented and far behind the present requirements of the systems biology field. To create a comprehensive source of data for cardiovascular gene regulation and to facilitate a deeper understanding of genomic data, the CardioTF database was constructed. The purpose of this database is to collate information on cardiovascular transcription factors (TFs), position weight matrices (PWMs), and enhancer sequences discovered using the ChIP-seq method. The Naïve-Bayes algorithm was used to classify literature and identify all PubMed abstracts on cardiovascular development. The natural language learning tool GNAT was then used to identify corresponding gene names embedded within these abstracts. Local Perl scripts were used to integrate and dump data from public databases into the MariaDB management system (MySQL). In-house R scripts were written to analyze and visualize the results. Known cardiovascular TFs from humans and human homologs from fly, Ciona, zebrafish, frog, chicken, and mouse were identified and deposited in the database. PWMs from the Jaspar, hPDI, and UniPROBE databases were deposited in the database and can be retrieved using their corresponding TF names. Gene enhancer regions from various sources of ChIP-seq data were deposited into the database and can be visualized graphically. Besides biocuration, mouse homologs of the 81 core cardiac TFs were selected using a Naïve-Bayes approach and then by intersecting four independent data sources: RNA profiling, expert annotation, PubMed abstracts and phenotype. The CardioTF database can be used as a portal to construct a transcriptional network of cardiac development. Database URL: http://www.cardiosignal.org/database/cardiotf.html.
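To illustrate how a position weight matrix (PWM) of the kind collected in CardioTF might be applied, the following minimal sketch converts base counts into log-odds scores and scans a DNA sequence for the best-scoring window. The toy counts, the uniform background, the pseudocount and all function names are illustrative assumptions; none of them is taken from CardioTF, Jaspar, hPDI or UniPROBE.

```python
import math

BASES = "ACGT"

def counts_to_log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into a log2-odds PWM.

    Assumes a uniform background and adds a pseudocount to every base.
    """
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def best_hit(pwm, sequence):
    """Return (score, offset) of the highest-scoring window in sequence.

    The sequence must contain only the upper-case letters A, C, G and T.
    """
    width = len(pwm)
    best = None
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        score = sum(pwm[j][base] for j, base in enumerate(window))
        if best is None or score > best[0]:
            best = (score, i)
    return best

# Hypothetical counts for a 4-bp motif resembling "GATA" (not real data).
toy_counts = [
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
toy_pwm = counts_to_log_odds(toy_counts)
score, offset = best_hit(toy_pwm, "CCTGATAAGC")
```

In practice a score threshold and both DNA strands would also need to be considered; the sketch omits them for brevity.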
PRGdb: a bioinformatics platform for plant resistance gene analysis
Sanseverino, Walter; Roma, Guglielmo; De Simone, Marco; Faino, Luigi; Melito, Sara; Stupka, Elia; Frusciante, Luigi; Ercolano, Maria Raffaella
2010-01-01
PRGdb is a web accessible open-source (http://www.prgdb.org) database that represents the first bioinformatic resource providing a comprehensive overview of resistance genes (R-genes) in plants. PRGdb holds more than 16 000 known and putative R-genes belonging to 192 plant species challenged by 115 different pathogens and linked with useful biological information. The complete database includes a set of 73 manually curated reference R-genes, 6308 putative R-genes collected from NCBI and 10463 computationally predicted putative R-genes. Thanks to a user-friendly interface, data can be examined using different query tools. A home-made prediction pipeline called Disease Resistance Analysis and Gene Orthology (DRAGO), based on reference R-gene sequence data, was developed to search for plant resistance genes in public datasets such as Unigene and Genbank. New putative R-gene classes containing unknown domain combinations were discovered and characterized. The development of the PRG platform represents an important starting point to conduct various experimental tasks. The inferred cross-link between genomic and phenotypic information allows access to a large body of information to find answers to several biological questions. The database structure also permits easy integration with other data types and opens up prospects for future implementations. PMID:19906694
FBIS: A regional DNA barcode archival & analysis system for Indian fishes
Nagpure, Naresh Sahebrao; Rashid, Iliyas; Pathak, Ajey Kumar; Singh, Mahender; Singh, Shri Prakash; Sarkar, Uttam Kumar
2012-01-01
DNA barcoding is a new tool for taxon recognition and classification of biological organisms based on the sequence of a fragment of the mitochondrial gene cytochrome c oxidase I (COI). In view of the growing importance of fish DNA barcoding for species identification, molecular taxonomy and fish diversity conservation, we developed a Fish Barcode Information System (FBIS) for Indian fishes, which will serve as a regional DNA barcode archival and analysis system. The database presently contains 2334 sequence records of the COI gene for 472 aquatic species belonging to 39 orders and 136 families, collected from available published data sources. Additionally, it contains information on phenotype, distribution and IUCN Red List status of fishes. The web version of FBIS was designed using MySQL, Perl and PHP under the Linux operating platform to (a) store and manage the acquisition, (b) analyze and explore DNA barcode records, and (c) identify species and estimate genetic divergence. FBIS has also been integrated with appropriate tools for retrieving and viewing information about the database statistics and taxonomy. It is expected that FBIS will be useful as a potent information system in fish molecular taxonomy, phylogeny and genomics. Availability: The database is available for free at http://mail.nbfgr.res.in/fbis/ PMID:22715304
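The abstract does not state which model FBIS uses to estimate genetic divergence. As a hedged illustration only, the sketch below computes the Kimura two-parameter (K2P) distance, a measure commonly used in DNA barcoding, between two aligned COI fragments. The function name and the toy sequences are assumptions made for this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    d = -0.5 * ln((1 - 2P - Q) * sqrt(1 - 2Q)), where P and Q are the
    proportions of transitions and transversions among compared sites.
    Sites with gaps or ambiguous bases are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    if 1 - 2 * q <= 0 or 1 - 2 * p - q <= 0:
        raise ValueError("sequences too divergent for the K2P model")
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))

# Toy 10-bp fragments differing by a single A<->G transition.
divergence = k2p_distance("ACGTACGTAC", "GCGTACGTAC")
```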
Applying the archetype approach to the database of a biobank information management system.
Späth, Melanie Bettina; Grimson, Jane
2011-03-01
The purpose of this study is to investigate the feasibility of applying the openEHR archetype approach to modelling the data in the database of an existing proprietary biobank information management system. A biobank information management system stores the clinical/phenotypic data of the sample donor and sample related information. The clinical/phenotypic data is potentially sourced from the donor's electronic health record (EHR). The study evaluates the reuse of openEHR archetypes that have been developed for the creation of an interoperable EHR in the context of biobanking, and proposes a new set of archetypes specifically for biobanks. The ultimate goal of the research is the development of an interoperable electronic biomedical research record (eBMRR) to support biomedical knowledge discovery. The database of the prostate cancer biobank of the Irish Prostate Cancer Research Consortium (PCRC), which supports the identification of novel biomarkers for prostate cancer, was taken as the basis for the modelling effort. First the database schema of the biobank was analyzed and reorganized into archetype-friendly concepts. Then, archetype repositories were searched for matching archetypes. Some existing archetypes were reused without change, some were modified or specialized, and new archetypes were developed where needed. The fields of the biobank database schema were then mapped to the elements in the archetypes. Finally, the archetypes were arranged into templates specifically to meet the requirements of the PCRC biobank. A set of 47 archetypes was found to cover all the concepts used in the biobank. Of these, 29 (62%) were reused without change, 6 were modified and/or extended, 1 was specialized, and 11 were newly defined. These archetypes were arranged into 8 templates specifically required for this biobank. A number of issues were encountered in this research. 
Some arose from the immaturity of the archetype approach, such as immature modelling support tools, difficulties in defining high-quality archetypes and the problem of overlapping archetypes. In addition, the identification of suitable existing archetypes was time-consuming and many semantic conflicts were encountered during the process of mapping the PCRC BIMS database to existing archetypes. These include differences in the granularity of documentation, in metadata-level versus data-level modelling, in terminologies and vocabularies used, and in the amount of structure imposed on the information to be recorded. Furthermore, the current way of modelling the sample entity was found to be cumbersome in the sample-centric activity of biobanking. The archetype approach is a promising approach to create a shareable eBMRR based on the study participant/donor for biobanks. Many archetypes originally developed for the EHR domain can be reused to model the clinical/phenotypic and sample information in the biobank context, which validates the genericity of these archetypes and their potential for reuse in the context of biomedical research. However, finding suitable archetypes in the repositories and establishing an exact mapping between the fields in the PCRC BIMS database and the elements of existing archetypes that have been designed for clinical practice can be challenging and time-consuming and involves resolving many common system integration conflicts. These may be attributable to differences in the requirements for information documentation between clinical practice and biobanking. This research also recognized the need for better support tools, modelling guidelines and best practice rules and reconfirmed the need for better domain knowledge governance. Furthermore, the authors propose that the establishment of an independent sample record with the sample as record subject should be investigated. 
The research presented in this paper is limited by the fact that the new archetypes developed during this research are based on a single biobank instance. These new archetypes may not be complete, representing only those subsets of items required by this particular database. Nevertheless, this exercise exposes some of the gaps that exist in the archetype modelling landscape and highlights the concepts that need to be modelled with archetypes to enable the development of an eBMRR. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
Prediction of gene-phenotype associations in humans, mice, and plants using phenologs.
Woods, John O; Singh-Blom, Ulf Martin; Laurent, Jon M; McGary, Kriston L; Marcotte, Edward M
2013-06-21
Phenotypes and diseases may be related to seemingly dissimilar phenotypes in other species by means of the orthology of underlying genes. Such "orthologous phenotypes," or "phenologs," are examples of deep homology, and may be used to predict additional candidate disease genes. In this work, we develop an unsupervised algorithm for ranking phenolog-based candidate disease genes through the integration of predictions from the k nearest neighbor phenologs, comparing classifiers and weighting functions by cross-validation. We also improve upon the original method by extending the theory to paralogous phenotypes. Our algorithm makes use of additional phenotype data (from chicken, zebrafish, and E. coli, as well as new datasets for C. elegans), establishing that several types of annotations may be treated as phenotypes. We demonstrate the use of our algorithm to predict novel candidate genes for human atrial fibrillation (such as HRH2, ATP4A, ATP4B, and HOPX) and epilepsy (e.g., PAX6 and NKX2-1). We suggest gene candidates for pharmacologically-induced seizures in mouse, solely based on orthologous phenotypes from E. coli. We also explore the prediction of plant gene-phenotype associations, as for the Arabidopsis response to vernalization phenotype. We are able to rank gene predictions for a significant portion of the diseases in the Online Mendelian Inheritance in Man database. Additionally, our method suggests candidate genes for mammalian seizures based only on bacterial phenotypes and gene orthology. We demonstrate that phenotype information may come from diverse sources, including drug sensitivities, gene ontology biological processes, and in situ hybridization annotations. Finally, we offer testable candidates for a variety of human diseases, plant traits, and other classes of phenotypes across a wide array of species.
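One way to make the idea of a phenolog concrete is to ask whether two phenotypes from different species share more orthologous genes than expected by chance. The sketch below does this with a hypergeometric tail probability. The abstract does not specify the statistic or the k-nearest-neighbor weighting, so the test, the orthogroup identifiers and the universe size here are all illustrative assumptions.

```python
from math import comb

def hypergeom_tail(overlap, size_a, size_b, universe):
    """P(X >= overlap) for a hypergeometric draw.

    size_b genes are drawn from a universe of orthologous gene groups
    in which size_a groups are associated with the first phenotype.
    """
    upper = min(size_a, size_b)
    total = comb(universe, size_b)
    favourable = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / total

# Hypothetical orthogroup sets for a human disease and a worm phenotype.
human_phenotype = {"og1", "og2", "og3", "og4"}
worm_phenotype = {"og2", "og3", "og4", "og9"}
shared = human_phenotype & worm_phenotype

p_value = hypergeom_tail(
    overlap=len(shared),
    size_a=len(human_phenotype),
    size_b=len(worm_phenotype),
    universe=20,  # assumed number of orthogroups shared by both species
)
```

Under this reading, genes annotated to the worm phenotype whose orthologs are not yet linked to the human disease (here the ortholog in "og9") would be the candidate disease genes.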
Wain, Karen E; Riggs, Erin; Hanson, Karen; Savage, Melissa; Riethmaier, Darlene; Muirhead, Andrea; Mitchell, Elyse; Packard, Bethanny Smith; Faucett, W Andrew
2012-10-01
The International Standards for Cytogenomic Arrays (ISCA) Consortium is a worldwide collaborative effort dedicated to optimizing patient care by improving the quality of chromosomal microarray testing. The primary effort of the ISCA Consortium has been the development of a database of copy number variants (CNVs) identified during the course of clinical microarray testing. This database is a powerful resource for clinicians, laboratories, and researchers, and can be utilized for a variety of applications, such as facilitating standardized interpretations of certain CNVs across laboratories or providing phenotypic information for counseling purposes when published data is sparse. A recognized limitation to the clinical utility of this database, however, is the quality of clinical information available for each patient. Clinical genetic counselors are uniquely suited to facilitate the communication of this information to the laboratory by virtue of their existing clinical responsibilities, case management skills, and appreciation of the evolving nature of scientific knowledge. We intend to highlight the critical role that genetic counselors play in ensuring optimal patient care through contributing to the clinical utility of the ISCA Consortium's database, as well as the quality of individual patient microarray reports provided by contributing laboratories. Current tools, paper and electronic forms, created to maximize this collaboration are shared. In addition to making a professional commitment to providing complete clinical information, genetic counselors are invited to become ISCA members and to become involved in the discussions and initiatives within the Consortium.
Semantic Web Ontology and Data Integration: a Case Study in Aiding Psychiatric Drug Repurposing.
Liang, Chen; Sun, Jingchun; Tao, Cui
2015-01-01
There remain significant difficulties in selecting probable candidate drugs from existing databases. We describe an ontology-oriented approach to represent the nexus between genes, drugs, phenotypes, symptoms, and diseases from multiple information sources. We also report a case study in which we attempted to explore candidate drugs effective for bipolar disorder and epilepsy. We constructed an ontology incorporating knowledge about the two diseases and performed semantic reasoning tasks with the ontology. The results suggested 48 candidate drugs that hold promise for further study. The evaluation demonstrated the validity of our approach. Our approach prioritizes the candidate drugs that have potential associations among genes, phenotypes and symptoms, and thus facilitates data integration and drug repurposing in psychiatric disorders.
SNPs selection using support vector regression and genetic algorithms in GWAS
2014-01-01
Introduction: This paper proposes a new methodology to simultaneously select the most relevant SNP markers for the characterization of any measurable phenotype described by a continuous variable, using Support Vector Regression with the Pearson Universal kernel (PUK) as the fitness function of a binary genetic algorithm. The proposed methodology is multi-attribute, in that it considers several markers simultaneously to explain the phenotype, and is based jointly on statistical tools, machine learning and computational intelligence. Results: The suggested method has shown potential in simulated database 1, with additive effects only, and in the real database. In this simulated database, with a total of 1,000 markers, of which 7 have a major effect on the phenotype and the other 993 SNPs represent noise, the method identified 21 markers. Of these, 5 are among the 7 relevant SNPs and 16 are false positives. In the real database, initially with 50,752 SNPs, we reduced the set to 3,073 markers, increasing the accuracy of the model. In simulated database 2, with additive effects and interactions (epistasis), the proposed method matched the methodology most commonly used in GWAS. Conclusions: The method suggested in this paper is effective in explaining the real phenotype (PTA for milk): with the application of the wrapper based on a genetic algorithm and Support Vector Regression with the Pearson Universal kernel, many redundant markers were eliminated, increasing the predictive accuracy of the model on the real database without quality control filters. The PUK demonstrated that it can replicate the performance of linear and RBF kernels. PMID:25573332
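For readers unfamiliar with the Pearson Universal kernel, the sketch below evaluates it, in the Pearson VII form in which it is commonly written, on two genotype vectors after a binary mask of the kind a genetic algorithm chromosome would supply has selected a subset of SNPs. The 0/1/2 genotype coding, the parameter defaults, the toy data and the function names are assumptions for illustration; the paper's own implementation and parameter settings are not given in the abstract.

```python
import math

def apply_mask(genotypes, mask):
    """Keep only the SNPs whose bit in the binary chromosome is 1."""
    return [g for g, keep in zip(genotypes, mask) if keep]

def puk_kernel(x, y, sigma=1.0, omega=1.0):
    """Pearson VII universal kernel (PUK) between two vectors.

    K(x, y) = 1 / (1 + (2 * ||x - y|| * sqrt(2**(1/omega) - 1) / sigma)**2)**omega
    sigma controls the width and omega the tailing of the peak.
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    scaled = 2.0 * dist * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + scaled ** 2) ** omega

# Toy genotypes coded as minor-allele counts (0, 1 or 2) at four SNPs.
animal_a = [0, 1, 2, 1]
animal_b = [1, 1, 0, 1]
chromosome = [1, 0, 1, 0]  # hypothetical GA individual selecting SNPs 1 and 3

similarity = puk_kernel(
    apply_mask(animal_a, chromosome),
    apply_mask(animal_b, chromosome),
)
```

In a wrapper of the kind the abstract describes, each chromosome would be scored by fitting a support vector regression with this kernel on the selected SNPs and using its prediction error as the fitness; that step is omitted here.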
Locus-Specific Mutation Databases for Neurodegenerative Brain Diseases
Cruts, Marc; Theuns, Jessie; Van Broeckhoven, Christine
2012-01-01
The Alzheimer disease and frontotemporal dementia (AD&FTLD) and Parkinson disease (PD) Mutation Databases make available curated information on sequence variations in genes causing Mendelian forms of the most common neurodegenerative brain diseases: AD, frontotemporal lobar degeneration (FTLD), and PD. They are established resources for clinical geneticists, neurologists, and researchers in need of comprehensive, referenced genetic, epidemiologic, clinical, neuropathological, and/or cell biological information on specific gene mutations in these diseases. In addition, the aggregate analysis of all information available in the databases provides unique opportunities to extract mutation characteristics and genotype–phenotype correlations, which would otherwise go unnoticed and unexplored. Such analyses revealed that 61.4% of mutations are private to one single family, while only 5.7% of mutations occur in 10 or more families. The five mutations with the most frequent independent observations occur in 21% of AD, 43% of FTLD, and 48% of PD families recorded in the Mutation Databases, respectively. Although these figures are inevitably biased by a publishing policy favoring novel mutations, they probably also reflect the occurrence of multiple rare and few relatively common mutations in the inherited forms of these diseases. Finally, with the exception of the PD genes PARK2 and PINK1, all other genes are associated with more than one clinical diagnosis or characteristics thereof. Hum Mutat 33:1340–1344, 2012. © 2012 Wiley Periodicals, Inc. PMID:22581678
Laplacian normalization and random walk on heterogeneous networks for disease-gene prioritization.
Zhao, Zhi-Qin; Han, Guo-Sheng; Yu, Zu-Guo; Li, Jinyan
2015-08-01
Random walk on heterogeneous networks is a recently emerging approach to effective disease gene prioritization. Laplacian normalization is a technique capable of normalizing the weight of edges in a network. We use this technique to normalize the gene matrix and the phenotype matrix before the construction of the heterogeneous network, and also use this idea to define the transition matrices of the heterogeneous network. Our method has remarkably better performance than the existing methods for recovering known gene-phenotype relationships. The Shannon information entropy of the distribution of the transition probabilities in our networks is found to be smaller than that of the networks constructed by the existing methods, implying that a higher number of top-ranked genes can be verified as disease genes. In fact, the most probable gene-phenotype relationships ranked within the top 3 or top 5 in our gene lists can be confirmed by the OMIM database in many cases. Our algorithms have shown remarkably superior performance over the state-of-the-art algorithms for recovering gene-phenotype relationships. All Matlab code is available upon email request. Copyright © 2015 Elsevier Ltd. All rights reserved.
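The two ingredients named in this abstract can be sketched on a single small network. The code below applies symmetric Laplacian normalization, dividing each edge weight by the square root of the product of its endpoint degrees, and then iterates a random walk with restart from a seed node. It is a simplified illustration on one toy network, not the authors' heterogeneous gene-phenotype construction; the restart probability, the toy graph and the function names are assumptions.

```python
import math

def laplacian_normalize(weights):
    """Return S with S[i][j] = W[i][j] / sqrt(d_i * d_j), d_i = sum of row i."""
    degrees = [sum(row) for row in weights]
    n = len(weights)
    return [
        [
            weights[i][j] / math.sqrt(degrees[i] * degrees[j])
            if degrees[i] > 0 and degrees[j] > 0 else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]

def random_walk_with_restart(matrix, seeds, restart=0.5, tol=1e-12, max_iter=1000):
    """Iterate p <- (1 - restart) * matrix @ p + restart * p0 until convergence."""
    n = len(matrix)
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1.0 - restart) * sum(matrix[i][j] * p[j] for j in range(n))
            + restart * p0[i]
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p

# Toy path graph 0 - 1 - 2 with unit edge weights; node 0 is the seed gene.
toy_network = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
normalized = laplacian_normalize(toy_network)
scores = random_walk_with_restart(normalized, seeds={0})
```

Nodes are then ranked by their steady-state scores, so that nodes closer to the seed in the normalized network rank higher.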
Huang, Haiyan; Liu, Chun-Chi; Zhou, Xianghong Jasmine
2010-04-13
The rapid accumulation of gene expression data has offered unprecedented opportunities to study human diseases. The National Center for Biotechnology Information Gene Expression Omnibus is currently the largest database that systematically documents the genome-wide molecular basis of diseases. However, thus far, this resource has been far from fully utilized. This paper describes the first study to transform public gene expression repositories into an automated disease diagnosis database. Particularly, we have developed a systematic framework, including a two-stage Bayesian learning approach, to achieve the diagnosis of one or multiple diseases for a query expression profile along a hierarchical disease taxonomy. Our approach, including standardizing cross-platform gene expression data and heterogeneous disease annotations, allows analyzing both sources of information in a unified probabilistic system. A high level of overall diagnostic accuracy was shown by cross validation. It was also demonstrated that the power of our method can increase significantly with the continued growth of public gene expression repositories. Finally, we showed how our disease diagnosis system can be used to characterize complex phenotypes and to construct a disease-drug connectivity map.
Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders.
Hamosh, Ada; Scott, Alan F; Amberger, Joanna; Bocchini, Carol; Valle, David; McKusick, Victor A
2002-01-01
Online Mendelian Inheritance in Man (OMIM) is a comprehensive, authoritative and timely knowledgebase of human genes and genetic disorders compiled to support research and education in human genomics and the practice of clinical genetics. Started by Dr Victor A. McKusick as the definitive reference Mendelian Inheritance in Man, OMIM (www.ncbi.nlm.nih.gov/omim) is now distributed electronically by the National Center for Biotechnology Information (NCBI), where it is integrated with the Entrez suite of databases. Derived from the biomedical literature, OMIM is written and edited at Johns Hopkins University with input from scientists and physicians around the world. Each OMIM entry has a full-text summary of a genetically determined phenotype and/or gene and has numerous links to other genetic databases such as DNA and protein sequence, PubMed references, general and locus-specific mutation databases, approved gene nomenclature, and the highly detailed mapviewer, as well as patient support groups and many others. OMIM is an easy and straightforward portal to the burgeoning information in human genetics.
Garver, William S.; Jelinek, David; Meaney, F. John; Flynn, James; Pettit, Kathleen M.; Shepherd, Glen; Heidenreich, Randall A.; Vockley, Cate M. Walsh; Castro, Graciela; Francis, Gordon A.
2010-01-01
Niemann-Pick type C1 disease (NPC1) is an autosomal recessive lysosomal storage disorder characterized by neonatal jaundice, hepatosplenomegaly, and progressive neurodegeneration. The present study provides the lipid profiles, mutations, and corresponding associations with the biochemical phenotype obtained from NPC1 patients who participated in the National NPC1 Disease Database. Lipid profiles were obtained from 34 patients (39%) in the survey and demonstrated significantly reduced plasma LDL cholesterol (LDL-C) and increased plasma triglycerides in the majority of patients. Reduced plasma HDL cholesterol (HDL-C) was the most consistent lipoprotein abnormality found in male and female NPC1 patients across age groups and occurred independent of changes in plasma triglycerides. A subset of 19 patients for whom the biochemical severity of known NPC1 mutations could be correlated with their lipid profile showed a strong inverse correlation between plasma HDL-C and severity of the biochemical phenotype. Gene mutations were available for 52 patients (59%) in the survey, including 52 different mutations and five novel mutations (Y628C, P887L, I923V, A1151T, and 3741_3744delACTC). Together, these findings provide novel information regarding the plasma lipoprotein changes and mutations in NPC1 disease, and suggest plasma HDL-C represents a potential biomarker of NPC1 disease severity. PMID:19744920
Friedrich, Anne; Garnier, Nicolas; Gagnière, Nicolas; Nguyen, Hoan; Albou, Laurent-Philippe; Biancalana, Valérie; Bettler, Emmanuel; Deléage, Gilbert; Lecompte, Odile; Muller, Jean; Moras, Dino; Mandel, Jean-Louis; Toursel, Thierry; Moulinier, Luc; Poch, Olivier
2010-02-01
Understanding how genetic alterations affect gene products at the molecular level represents a first step in the elucidation of the complex relationships between genotypic and phenotypic variations, and is thus a major challenge in the postgenomic era. Here, we present SM2PH-db (http://decrypthon.igbmc.fr/sm2ph), a new database designed to investigate structural and functional impacts of missense mutations and their phenotypic effects in the context of human genetic diseases. A wealth of up-to-date interconnected information is provided for each of the 2,249 disease-related entry proteins (August 2009), including data retrieved from biological databases and data generated from a Sequence-Structure-Evolution Inference in Systems-based approach, such as multiple alignments, three-dimensional structural models, and multidimensional (physicochemical, functional, structural, and evolutionary) characterizations of mutations. SM2PH-db provides a robust infrastructure associated with interactive analysis tools supporting in-depth study and interpretation of the molecular consequences of mutations, with the longer-term goal of elucidating the chain of events leading from a molecular defect to its pathology. The entire content of SM2PH-db is regularly and automatically updated thanks to the computational grid data federation facilities provided in the context of the Decrypthon program. (c) 2009 Wiley-Liss, Inc.
Sönksen, Ute Wolff; Christensen, Jens Jørgen; Nielsen, Lisbeth; Hesselbjerg, Annemarie; Hansen, Dennis Schrøder; Bruun, Brita
2010-12-31
Taxonomy and identification of fastidious Gram negatives are evolving and challenging. We compared identifications achieved with the Vitek 2 Neisseria-Haemophilus (NH) card and partial 16S rRNA gene sequence (526 bp stretch) analysis with identifications obtained with extensive phenotypic characterization using 100 fastidious Gram negative bacteria. Seventy-five strains represented 21 of the 26 taxa included in the Vitek 2 NH database and 25 strains represented related species not included in the database. Of the 100 strains, 31 were the type strains of the species. Vitek 2 NH identification results: 48 of 75 database strains were correctly identified, 11 strains gave 'low discrimination', seven strains were unidentified, and nine strains were misidentified. Identification of 25 non-database strains resulted in 14 strains incorrectly identified as belonging to species in the database. Partial 16S rRNA gene sequence analysis results: For 76 strains phenotypic and sequencing identifications were identical, for 23 strains the sequencing identifications were either probable or possible, and for one strain only the genus was confirmed. Thus, the Vitek 2 NH system identifies most of the commonly occurring species included in the database. Some strains of rarely occurring species and strains of non-database species closely related to database species cause problems. Partial 16S rRNA gene sequence analysis performs well, but does not always suffice, additional phenotypical characterization being useful for final identification.
Sönksen, Ute Wolff; Christensen, Jens Jørgen; Nielsen, Lisbeth; Hesselbjerg, Annemarie; Hansen, Dennis Schrøder; Bruun, Brita
2010-01-01
Taxonomy and identification of fastidious Gram negatives are evolving and challenging. We compared identifications achieved with the Vitek 2 Neisseria-Haemophilus (NH) card and partial 16S rRNA gene sequence (526 bp stretch) analysis with identifications obtained with extensive phenotypic characterization using 100 fastidious Gram negative bacteria. Seventy-five strains represented 21 of the 26 taxa included in the Vitek 2 NH database and 25 strains represented related species not included in the database. Of the 100 strains, 31 were the type strains of the species. Vitek 2 NH identification results: 48 of 75 database strains were correctly identified, 11 strains gave 'low discrimination', seven strains were unidentified, and nine strains were misidentified. Identification of 25 non-database strains resulted in 14 strains incorrectly identified as belonging to species in the database. Partial 16S rRNA gene sequence analysis results: For 76 strains phenotypic and sequencing identifications were identical, for 23 strains the sequencing identifications were either probable or possible, and for one strain only the genus was confirmed. Thus, the Vitek 2 NH system identifies most of the commonly occurring species included in the database. Some strains of rarely occurring species and strains of non-database species closely related to database species cause problems. Partial 16S rRNA gene sequence analysis performs well, but does not always suffice, additional phenotypical characterization being useful for final identification. PMID:21347215
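As a quick arithmetic check on the Vitek 2 NH figures quoted in the abstract above, the counts can be turned into proportions. This is only a minimal Python sketch: the counts come from the abstract, while the variable names are illustrative and the percentages are derived here, not reported by the authors.

```python
# Outcome counts for the 75 database strains, as stated in the abstract.
database_strains = {
    "correct": 48,
    "low discrimination": 11,
    "unidentified": 7,
    "misidentified": 9,
}

total_database = sum(database_strains.values())  # should equal 75

# Share of each outcome among database strains.
rates = {outcome: count / total_database
         for outcome, count in database_strains.items()}

# 14 of the 25 non-database strains were wrongly assigned to a database species.
non_database_misassigned = 14 / 25

for outcome, rate in rates.items():
    print(f"{outcome}: {rate:.1%}")
print(f"non-database strains misassigned: {non_database_misassigned:.1%}")
```

The four outcome counts sum to the 75 database strains, which confirms that the categories in the abstract are mutually exclusive.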
Povey, Sue; Al Aqeel, Aida I; Cambon-Thomsen, Anne; Dalgleish, Raymond; den Dunnen, Johan T; Firth, Helen V; Greenblatt, Marc S; Barash, Carol Isaacson; Parker, Michael; Patrinos, George P; Savige, Judith; Sobrido, Maria-Jesus; Winship, Ingrid; Cotton, Richard GH
2010-01-01
More than 1,000 Web-based locus-specific variation databases (LSDBs) are listed on the Website of the Human Genetic Variation Society (HGVS). These individual efforts, which often relate phenotype to genotype, are a valuable source of information for clinicians, patients, and their families, as well as for basic research. The initiators of the Human Variome Project recently recognized that having access to some of the immense resources of unpublished information already present in diagnostic laboratories would provide critical data to help manage genetic disorders. However, there are significant ethical issues involved in sharing these data worldwide. An international working group presents second-generation guidelines addressing ethical issues relating to the curation of human LSDBs that provide information via a Web-based interface. It is intended that these should help current and future curators and may also inform the future decisions of ethics committees and legislators. These guidelines have been reviewed by the Ethics Committee of the Human Genome Organization (HUGO). Hum Mutat 31:–6, 2010. © 2010 Wiley-Liss, Inc. PMID:20683926
OryzaGenome: Genome Diversity Database of Wild Oryza Species.
Ohyanagi, Hajime; Ebata, Toshinobu; Huang, Xuehui; Gong, Hao; Fujita, Masahiro; Mochizuki, Takako; Toyoda, Atsushi; Fujiyama, Asao; Kaminuma, Eli; Nakamura, Yasukazu; Feng, Qi; Wang, Zi-Xuan; Han, Bin; Kurata, Nori
2016-01-01
The species in the genus Oryza, encompassing nine genome types and 23 species, are a rich genetic resource and may have applications in deeper genomic analyses aiming to understand the evolution of plant genomes. With the advancement of next-generation sequencing (NGS) technology, a flood of Oryza species reference genomes and genomic variation information has become available in recent years. This genomic information, combined with the comprehensive phenotypic information that we are accumulating in our Oryzabase, can serve as an excellent genotype-phenotype association resource for analyzing rice functional and structural evolution, and the associated diversity of the Oryza genus. Here we integrate our previous and future phenotypic/habitat information and newly determined genotype information into a united repository, named OryzaGenome, providing the variant information with hyperlinks to Oryzabase. The current version of OryzaGenome includes genotype information of 446 O. rufipogon accessions derived by imputation and of 17 accessions derived by imputation-free deep sequencing. Two variant viewers are implemented: SNP Viewer as a conventional genome browser interface and Variant Table as a text-based browser for precise inspection of each variant one by one. Portable VCF (variant call format) file or tab-delimited file download is also available. Following these SNP (single nucleotide polymorphism) data, reference pseudomolecules/scaffolds/contigs and genome-wide variation information for almost all of the closely and distantly related wild Oryza species from the NIG Wild Rice Collection will be available in future releases. All of the resources can be accessed through http://viewer.shigen.info/oryzagenome/. © The Author 2015. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists.
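To illustrate what working with a downloaded VCF file might look like, here is a minimal Python sketch that reads the fixed leading columns of VCF data lines (CHROM, POS, ID, REF, ALT). The two records in `VCF_TEXT` are made up for illustration and are not OryzaGenome data; the function name `read_variants` is likewise hypothetical, and a real analysis would more likely rely on a dedicated VCF library.

```python
import io

# Made-up example content (NOT real OryzaGenome data). VCF is tab-delimited;
# lines starting with '#' are meta-information or the column header.
VCF_TEXT = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr01\t1203\t.\tA\tG\t50\tPASS\tDP=14\n"
    "chr01\t2450\t.\tC\tT,G\t99\tPASS\tDP=30\n"
)

def read_variants(handle):
    """Yield one dict per VCF data line, skipping header lines."""
    for line in handle:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:  # a data line has at least 8 fixed columns
            continue
        chrom, pos, _variant_id, ref, alt = fields[:5]
        yield {
            "chrom": chrom,
            "pos": int(pos),         # 1-based position
            "ref": ref,
            "alts": alt.split(","),  # multiple ALT alleles are comma-separated
        }

variants = list(read_variants(io.StringIO(VCF_TEXT)))
for variant in variants:
    print(variant)
```

Passing an open file object instead of `io.StringIO` would process a downloaded file in the same way.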
Ghandikota, Sudhir; Hershey, Gurjit K Khurana; Mersha, Tesfaye B
2018-03-24
Advances in high-throughput sequencing technologies have made it possible to generate multiple omics data at an unprecedented rate and scale. The accumulation of these omics data far outpaces the rate at which biologists can mine them and generate new hypotheses to test experimentally. There is an urgent need to develop a myriad of powerful tools to efficiently and effectively search and filter these resources to address specific post-GWAS functional genomics questions. However, to date, these resources are scattered across several databases and often lack a unified portal for data annotation and analytics. In addition, existing tools to analyze and visualize these databases are highly fragmented, requiring researchers to use multiple applications and perform manual interventions for each gene or variant in an ad hoc fashion until all the questions are answered. In this study, we present GENEASE, a web-based one-stop bioinformatics tool designed to not only query and explore multi-omics and phenotype databases (e.g., GTEx, ClinVar, dbGaP, GWAS Catalog, ENCODE, Roadmap Epigenomics, KEGG, Reactome, Gene and Phenotype Ontology) in a single web interface but also to perform seamless post genome-wide association downstream functional and overlap analysis for non-coding regulatory variants. GENEASE accesses over 50 different databases in the public domain, including model organism-specific databases, to facilitate gene/variant and disease exploration, enrichment and overlap analysis in real time. It is a user-friendly tool with a point-and-click interface containing links to support information, including a user manual and examples. GENEASE can be accessed freely at http://research.cchmc.org/mershalab/genease_new/login.html. Contact: Tesfaye.Mersha@cchmc.org, Sudhir.Ghandikota@cchmc.org. Supplementary data are available at Bioinformatics online.
Automatic categorization of diverse experimental information in the bioscience literature
2012-01-01
Background: Curation of information from bioscience literature into biological knowledge databases is a crucial way of capturing experimental information in a computable form. During the biocuration process, a critical first step is to identify from all published literature the papers that contain results for a specific data type the curator is interested in annotating. This step normally requires curators to manually examine many papers to ascertain which few contain information of interest and thus is usually time consuming. We developed an automatic method for identifying papers containing these curation data types among a large pool of published scientific papers based on the machine learning method Support Vector Machine (SVM). This classification system is completely automatic and can be readily applied to diverse experimental data types. It has been in use in production for automatic categorization of 10 different experimental data types in the biocuration process at WormBase for the past two years and it is in the process of being adopted in the biocuration process at FlyBase and the Saccharomyces Genome Database (SGD). We anticipate that this method can be readily adopted by various databases in the biocuration community and thereby greatly reduce the time spent on an otherwise laborious and demanding task. We also developed a simple, readily automated procedure to utilize training papers of similar data types from different bodies of literature such as C. elegans and D. melanogaster to identify papers with any of these data types for a single database. This approach has great significance because for some data types, especially those of low occurrence, a single corpus often does not have enough training papers to achieve satisfactory performance. Results: We successfully tested the method on ten data types from WormBase, fifteen data types from FlyBase and three data types from Mouse Genomics Informatics (MGI). It is being used in the curation workflow at WormBase for automatic association of newly published papers with ten data types including RNAi, antibody, phenotype, gene regulation, mutant allele sequence, gene expression, gene product interaction, overexpression phenotype, gene interaction, and gene structure correction. Conclusions: Our methods are applicable to a variety of data types with training sets containing several hundred to a few thousand documents. The approach is completely automatic and thus can be readily incorporated into different workflows at different literature-based databases. We believe that the work presented here can contribute greatly to the tremendous task of automating the important yet labor-intensive biocuration effort. PMID:22280404
Automatic categorization of diverse experimental information in the bioscience literature.
Fang, Ruihua; Schindelman, Gary; Van Auken, Kimberly; Fernandes, Jolene; Chen, Wen; Wang, Xiaodong; Davis, Paul; Tuli, Mary Ann; Marygold, Steven J; Millburn, Gillian; Matthews, Beverley; Zhang, Haiyan; Brown, Nick; Gelbart, William M; Sternberg, Paul W
2012-01-26
Curation of information from bioscience literature into biological knowledge databases is a crucial way of capturing experimental information in a computable form. During the biocuration process, a critical first step is to identify from all published literature the papers that contain results for a specific data type the curator is interested in annotating. This step normally requires curators to manually examine many papers to ascertain which few contain information of interest and thus is usually time consuming. We developed an automatic method for identifying papers containing these curation data types among a large pool of published scientific papers based on the machine learning method Support Vector Machine (SVM). This classification system is completely automatic and can be readily applied to diverse experimental data types. It has been in use in production for automatic categorization of 10 different experimental data types in the biocuration process at WormBase for the past two years and it is in the process of being adopted in the biocuration process at FlyBase and the Saccharomyces Genome Database (SGD). We anticipate that this method can be readily adopted by various databases in the biocuration community and thereby greatly reduce the time spent on an otherwise laborious and demanding task. We also developed a simple, readily automated procedure to utilize training papers of similar data types from different bodies of literature such as C. elegans and D. melanogaster to identify papers with any of these data types for a single database. This approach has great significance because for some data types, especially those of low occurrence, a single corpus often does not have enough training papers to achieve satisfactory performance. We successfully tested the method on ten data types from WormBase, fifteen data types from FlyBase and three data types from Mouse Genomics Informatics (MGI). It is being used in the curation workflow at WormBase for automatic association of newly published papers with ten data types including RNAi, antibody, phenotype, gene regulation, mutant allele sequence, gene expression, gene product interaction, overexpression phenotype, gene interaction, and gene structure correction. Our methods are applicable to a variety of data types with training sets containing several hundred to a few thousand documents. The approach is completely automatic and thus can be readily incorporated into different workflows at different literature-based databases. We believe that the work presented here can contribute greatly to the tremendous task of automating the important yet labor-intensive biocuration effort.
2011-01-01
Background: Most information on genomic variations and their associations with phenotypes is covered exclusively in scientific publications rather than in structured databases. These texts commonly describe variations using natural language; database identifiers are seldom mentioned. This complicates the retrieval of variations and associated articles, as well as information extraction, e.g. the search for biological implications. To overcome these challenges, procedures to map textual mentions of variations to database identifiers need to be developed. Results: This article describes a workflow for normalization of variation mentions, i.e. their association with unique database identifiers. Common pitfalls in the interpretation of single nucleotide polymorphism (SNP) mentions are highlighted and discussed. The developed normalization procedure achieves a precision of 98.1% and a recall of 67.5% for unambiguous association of variation mentions with dbSNP identifiers on a text corpus based on 296 MEDLINE abstracts containing 527 mentions of SNPs. The annotated corpus is freely available at http://www.scai.fraunhofer.de/snp-normalization-corpus.html. Conclusions: Comparable approaches usually focus on variations mentioned on the protein sequence and neglect problems for other SNP mentions. The results presented here indicate that normalizing SNPs described on DNA level is more difficult than the normalization of SNPs described on protein level. The challenges associated with normalization are exemplified with ambiguities and errors, which occur in this corpus. PMID:21992066
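The precision and recall reported in the abstract above can be combined into a single F1 score (their harmonic mean). The abstract itself does not report an F1 value; the short Python sketch below only derives one from the two stated figures, and the function name `f1_score` is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Figures as stated in the abstract: precision 98.1%, recall 67.5%.
f1 = f1_score(0.981, 0.675)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.800"
```

The harmonic mean is pulled towards the lower of the two values, so the moderate recall dominates the combined score despite the very high precision.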
Claustres, Mireille; Thèze, Corinne; des Georges, Marie; Baux, David; Girodon, Emmanuelle; Bienvenu, Thierry; Audrezet, Marie-Pierre; Dugueperoux, Ingrid; Férec, Claude; Lalau, Guy; Pagin, Adrien; Kitzis, Alain; Thoreau, Vincent; Gaston, Véronique; Bieth, Eric; Malinge, Marie-Claire; Reboul, Marie-Pierre; Fergelot, Patricia; Lemonnier, Lydie; Mekki, Chadia; Fanen, Pascale; Bergougnoux, Anne; Sasorith, Souphatta; Raynal, Caroline; Bareil, Corinne
2017-10-01
Most of the 2,000 variants identified in the CFTR (cystic fibrosis transmembrane regulator) gene are rare or private. Their interpretation is hampered by the lack of available data and resources, making patient care and genetic counseling challenging. We developed a patient-based database dedicated to the annotations of rare CFTR variants in the context of their cis- and trans-allelic combinations. Based on almost 30 years of experience of CFTR testing, CFTR-France (https://cftr.iurc.montp.inserm.fr/cftr) currently compiles 16,819 variant records from 4,615 individuals with cystic fibrosis (CF) or CFTR-RD (related disorders), fetuses with ultrasound bowel anomalies, newborns awaiting clinical diagnosis, and asymptomatic compound heterozygotes. For each of the 736 different variants reported in the database, patient characteristics and genetic information (other variations in cis or in trans) have been thoroughly checked by a dedicated curator. Combining updated clinical, epidemiological, in silico, or in vitro functional data helps with the interpretation of unclassified variants and the reassessment of misclassified ones. This comprehensive CFTR database is now an invaluable tool for diagnostic laboratories gathering information on rare variants, especially in the context of genetic counseling, prenatal and preimplantation genetic diagnosis. CFTR-France is thus highly complementary to the international database CFTR2, which has so far focused on the most common CF-causing alleles. © 2017 Wiley Periodicals, Inc.
Meta-All: a system for managing metabolic pathway information.
Weise, Stephan; Grosse, Ivo; Klukas, Christian; Koschützki, Dirk; Scholz, Uwe; Schreiber, Falk; Junker, Björn H
2006-10-23
Many attempts are being made to understand biological subjects at a systems level. A major resource for these approaches is biological databases, which store manifold information about DNA, RNA and protein sequences including their functional and structural motifs, molecular markers, mRNA expression levels, metabolite concentrations, protein-protein interactions, phenotypic traits or taxonomic relationships. The use of these databases is often hampered by the fact that they are designed for special application areas and thus lack universality. Databases on metabolic pathways, which provide an increasingly important foundation for many analyses of biochemical processes at a systems level, are no exception to the rule. Data stored in central databases such as KEGG, BRENDA or SABIO-RK is often limited to read-only access. If experimentalists want to store their own data, possibly still under investigation, there are two possibilities. They can either develop their own information system for managing their own data, which is very time-consuming and costly, or they can try to store their data in existing systems, which is often restricted. Hence, an out-of-the-box information system for managing metabolic pathway data is needed. We have designed META-ALL, an information system that allows the management of metabolic pathways, including reaction kinetics, detailed locations, environmental factors and taxonomic information. Data can be stored together with quality tags and in different parallel versions. META-ALL uses Oracle DBMS and Oracle Application Express. We provide the META-ALL information system for download and use. In this paper, we describe the database structure and give information about the tools for submitting and accessing the data. As a first application of META-ALL, we show how the information contained in a detailed kinetic model can be stored and accessed. META-ALL is a system for managing information about metabolic pathways. It facilitates the handling of pathway-related data and is designed to help biochemists and molecular biologists in their daily research. It is available on the Web at http://bic-gh.de/meta-all and can be downloaded free of charge and installed locally.
Meta-All: a system for managing metabolic pathway information
Weise, Stephan; Grosse, Ivo; Klukas, Christian; Koschützki, Dirk; Scholz, Uwe; Schreiber, Falk; Junker, Björn H
2006-01-01
Background: Many attempts are being made to understand biological subjects at a systems level. A major resource for these approaches is biological databases, which store manifold information about DNA, RNA and protein sequences including their functional and structural motifs, molecular markers, mRNA expression levels, metabolite concentrations, protein-protein interactions, phenotypic traits or taxonomic relationships. The use of these databases is often hampered by the fact that they are designed for special application areas and thus lack universality. Databases on metabolic pathways, which provide an increasingly important foundation for many analyses of biochemical processes at a systems level, are no exception to the rule. Data stored in central databases such as KEGG, BRENDA or SABIO-RK is often limited to read-only access. If experimentalists want to store their own data, possibly still under investigation, there are two possibilities. They can either develop their own information system for managing their own data, which is very time-consuming and costly, or they can try to store their data in existing systems, which is often restricted. Hence, an out-of-the-box information system for managing metabolic pathway data is needed. Results: We have designed META-ALL, an information system that allows the management of metabolic pathways, including reaction kinetics, detailed locations, environmental factors and taxonomic information. Data can be stored together with quality tags and in different parallel versions. META-ALL uses Oracle DBMS and Oracle Application Express. We provide the META-ALL information system for download and use. In this paper, we describe the database structure and give information about the tools for submitting and accessing the data. As a first application of META-ALL, we show how the information contained in a detailed kinetic model can be stored and accessed. 
Conclusion: META-ALL is a system for managing information about metabolic pathways. It facilitates the handling of pathway-related data and is designed to help biochemists and molecular biologists in their daily research. It is available on the Web at http://bic-gh.de/meta-all and can be downloaded free of charge and installed locally. PMID:17059592
Retrospective Mining of Toxicology Data to Discover ...
In vivo toxicology data is subject to multiple sources of uncertainty: observer severity bias (a pathologist may record only more severe effects and ignore less severe ones); dose spacing issues (this can lead to missing data, e.g. if a severe effect has a less severe precursor, but both occur at the same tested dose); imperfect control of key independent variables (in databases, one can rarely control key input variables such as animal strain or dosing schedules); effect description heterogeneity (terminology changes over time which can lead to information loss); statistical issues (too few chemicals with a given phenotype, or too few animals in dose groups). These issues directly contribute to uncertainties in models built from the data. We are investigating the use of collections of endpoints (toxicity syndromes) to address these issues. These are identical in concept to medical syndromes which allow a physician to diagnose an underlying disease more accurately than can be done when relying on examination of one symptom at a time. Our test case is anemia, for several reasons: most of the phenotypes (e.g. cell counts) are quantitative; related effects are measured in an automated way; anemia is relatively common, at least at high doses (~30% of chemicals in our database show significant drops in red cell count); the causes of anemia are well understood; and, there is a standard clinical decision tree to classify anemia. Using a database of 658 chemicals, we ha
PedNavigator: a pedigree drawing servlet for large and inbred populations.
Mancosu, Gianmaria; Ledda, Giuseppe; Melis, Paola M
2003-03-22
PedNavigator is a pedigree drawing application for large and complex pedigrees. It has been developed especially for genetic and epidemiological studies of isolated populations characterized by high inbreeding and multiple matrimonies. PedNavigator is written in Java and is intended as a server-side web application, allowing researchers to 'walk' through family ties by pointing and clicking on the symbols that represent individuals. The application is able to enrich the pedigree drawings with genotypic and phenotypic information taken from the underlying relational database.
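The idea of 'walking' through family ties in an inbred pedigree can be sketched as a graph traversal. The Python example below is only an illustration under stated assumptions: PedNavigator itself is a Java servlet backed by a relational database, and the `PEDIGREE` table, person identifiers, and function names here are hypothetical.

```python
from collections import deque

# Hypothetical pedigree: person -> (father, mother); None marks an unknown
# parent (founder). "c" and "d" are first cousins who have a child together.
PEDIGREE = {
    "gf": (None, None), "gm": (None, None),
    "a": ("gf", "gm"), "b": ("gf", "gm"),
    "x": (None, None), "y": (None, None),
    "c": ("a", "x"), "d": ("b", "y"),
    "child": ("c", "d"),
}

def ancestors(person, pedigree):
    """Return the set of all ancestors of `person` via breadth-first search."""
    seen = set()
    queue = deque([person])
    while queue:
        current = queue.popleft()
        for parent in pedigree.get(current, (None, None)):
            if parent is not None and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

def common_ancestors(person_1, person_2, pedigree):
    """Ancestors shared by two individuals; a non-empty result for a couple
    indicates a consanguineous union."""
    return ancestors(person_1, pedigree) & ancestors(person_2, pedigree)

shared = common_ancestors("c", "d", PEDIGREE)
print(sorted(shared))  # prints ['gf', 'gm']
```

Tracking visited individuals in `seen` matters in inbred populations, where the same ancestor is reachable along several paths.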
Firnkorn, D; Ganzinger, M; Muley, T; Thomas, M; Knaup, P
2015-01-01
Joint data analysis is a key requirement in medical research networks. Data are available in heterogeneous formats at each network partner and their harmonization is often rather complex. The objective of our paper is to provide a generic approach for the harmonization process in research networks. We applied the process when harmonizing data from three sites for the Lung Cancer Phenotype Database within the German Center for Lung Research. We developed a spreadsheet-based solution as tool to support the harmonization process for lung cancer data and a data integration procedure based on Talend Open Studio. The harmonization process consists of eight steps describing a systematic approach for defining and reviewing source data elements and standardizing common data elements. The steps for defining common data elements and harmonizing them with local data definitions are repeated until consensus is reached. Application of this process for building the phenotype database led to a common basic data set on lung cancer with 285 structured parameters. The Lung Cancer Phenotype Database was realized as an i2b2 research data warehouse. Data harmonization is a challenging task requiring informatics skills as well as domain knowledge. Our approach facilitates data harmonization by providing guidance through a uniform process that can be applied in a wide range of projects.
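The core of such a harmonization step, mapping site-specific data elements onto common data elements, can be sketched in a few lines. This Python example is an assumption-laden illustration, not the spreadsheet-based tool or the Talend Open Studio procedure described in the abstract; the site names, field names, value codes, and the `harmonize` function are all hypothetical.

```python
# Hypothetical mapping tables: for each site, a local field name maps to a
# (common data element, value translation) pair.
SITE_MAPPINGS = {
    "site_a": {
        "sex": ("gender", {"m": "male", "f": "female"}),
        "smoker": ("smoking_status", {"y": "current", "n": "never"}),
    },
    "site_b": {
        "Geschlecht": ("gender", {"1": "male", "2": "female"}),
        "Raucher": ("smoking_status", {"ja": "current", "nein": "never"}),
    },
}

def harmonize(site, record):
    """Translate one local record into common data elements.

    Returns (harmonized_record, unmapped_fields). Unmapped fields are the
    ones that would go back into a review round until consensus is reached.
    """
    mapping = SITE_MAPPINGS[site]
    harmonized, unmapped = {}, []
    for field, value in record.items():
        if field not in mapping:
            unmapped.append(field)
            continue
        common_name, value_map = mapping[field]
        key = str(value).strip().lower()
        if key in value_map:
            harmonized[common_name] = value_map[key]
        else:
            unmapped.append(field)
    return harmonized, unmapped

result_a = harmonize("site_a", {"sex": "M", "smoker": "n", "zip": "69120"})
result_b = harmonize("site_b", {"Geschlecht": 2, "Raucher": "ja"})
print(result_a)
print(result_b)
```

Returning the unmapped fields explicitly, rather than silently dropping them, mirrors the iterative nature of the process: definitions are revisited until every source element is either mapped or deliberately excluded.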
Chen, Tsute; Yu, Wen-Han; Izard, Jacques; Baranova, Oxana V.; Lakshmanan, Abirami; Dewhirst, Floyd E.
2010-01-01
The human oral microbiome is the most studied human microflora, but 53% of the species have not yet been validly named and 35% remain uncultivated. The uncultivated taxa are known primarily from 16S rRNA sequence information. Sequence information tied solely to obscure isolate or clone numbers, and usually lacking accurate phylogenetic placement, is a major impediment to working with human oral microbiome data. The goal of creating the Human Oral Microbiome Database (HOMD) is to provide the scientific community with a body site-specific comprehensive database for the more than 600 prokaryote species that are present in the human oral cavity based on a curated 16S rRNA gene-based provisional naming scheme. Currently, two primary types of information are provided in HOMD: taxonomic and genomic. Named oral species and taxa identified from 16S rRNA gene sequence analysis of oral isolates and cloning studies were placed into defined 16S rRNA phylotypes and each given a unique Human Oral Taxon (HOT) number. The HOT interlinks phenotypic, phylogenetic, genomic, clinical and bibliographic information for each taxon. A BLAST search tool is provided to match user 16S rRNA gene sequences to a curated, full-length 16S rRNA gene reference data set. For genomic analysis, HOMD provides a comprehensive set of analysis tools and maintains frequently updated annotations for all the human oral microbial genomes that have been sequenced and publicly released. Oral bacterial genome sequences, determined as part of the Human Microbiome Project, are being added to the HOMD as they become available. We provide HOMD as a conceptual model for the presentation of microbiome data for other human body sites. Database URL: http://www.homd.org PMID:20624719
Investigating genotype-phenotype relationships in Rett syndrome using an international data set.
Bebbington, A; Anderson, A; Ravine, D; Fyfe, S; Pineda, M; de Klerk, N; Ben-Zeev, B; Yatawara, N; Percy, A; Kaufmann, W E; Leonard, H
2008-03-11
Rett syndrome is an uncommon neurodevelopmental disorder with an incidence of 1:9,000 live female births. The principal genetic cause was first reported in 1999 when the association with mutations in the methyl-CpG-binding protein 2 (or MECP2) gene was identified. This study uses data from a large international database, InterRett, to examine genotype-phenotype relationships and compares these with previous findings in a population-based cohort. The data set for these analyses was derived from a subset of InterRett cases with subject information collected from the family, the clinician, or both. Individual phenotypic characteristics and clinical severity using three scales were compared among those with eight known recurrent pathogenic MECP2 mutations as well as those with C-terminal deletions (n = 272). Overall, p.R270X and p.R255X were the most severe and p.R133C and p.R294X were the mildest mutations. Significant differences by mutation were seen for individual phenotypic characteristics such as hand use, ambulation, and language. This multicenter investigation into the phenotypic correlates of MECP2 mutations in Rett syndrome has provided a greater depth of understanding than hitherto available about the specific phenotypic characteristics associated with commonly occurring mutations. Although the modifying influence of X inactivation on clinical severity could not be included in the analysis, the findings confirm clear genotype-phenotype relationships in Rett syndrome and show the benefits of collaboration crucial to effective research in rare disorders.
Sen, Arjune; Dugan, Patricia; Perucca, Piero; Costello, Daniel; Choi, Hyunmi; Bazil, Carl; Radtke, Rod; Andrade, Danielle; Depondt, Chantal; Heavin, Sinead; Adcock, Jane; Pickrell, W Owen; McGinty, Ronan; Nascimento, Fábio; Smith, Philip; Rees, Mark I; Kwan, Patrick; O'Brien, Terence J; Goldstein, David; Delanty, Norman
2018-06-14
There is little detailed phenotypic characterization of bilateral hippocampal sclerosis (HS). We therefore conducted a multicenter review of people with pharmacoresistant epilepsy and bilateral HS to better determine their clinical characteristics. Databases from 11 EPIGEN centers were searched. For identified cases, clinicians reviewed the medical notes, imaging, and electroencephalographic (EEG), video-EEG, and neuropsychometric data. Data were irretrievably anonymized, and a single database was populated to capture all phenotypic information. These data were compared with phenotyped cases of unilateral HS from the same centers. In total, 96 patients with pharmacoresistant epilepsy and bilateral HS were identified (43 female, 53 male; age range = 8-80 years). Twenty-five percent had experienced febrile convulsions, and 27% of patients had experienced status epilepticus. The mean number of previously tried antiepileptic drugs was 5.32, and the average number of currently prescribed medications was 2.99; 44.8% of patients had cognitive difficulties, and 47.9% had psychiatric comorbidity; 35.4% (34/96) of patients continued with long-term medical therapy alone, another 4 being seizure-free on medication. Sixteen patients proceeded to, or were awaiting, neurostimulation, and 11 underwent surgical resection. One patient was rendered seizure-free postresection, with an improvement in seizures for 3 other cases. By comparison, of 201 patients with unilateral HS, a significantly higher number (44.3%) had febrile convulsions and only 11.4% had experienced status epilepticus. Importantly, 41.8% (84/201) of patients with unilateral HS had focal aware seizures, whereas such seizures were less frequently observed in people with bilateral HS, and were never observed exclusively (P = .002; Fisher's exact test). 
The current work describes the phenotypic spectrum of people with pharmacoresistant epilepsy and bilateral HS, highlights salient clinical differences from patients with unilateral HS, and provides a large platform from which to develop further studies, both epidemiological and genomic, to better understand etiopathogenesis and optimal treatment regimes in this condition. Wiley Periodicals, Inc. © 2018 International League Against Epilepsy.
NASA Astrophysics Data System (ADS)
Albers, D. J.; Hripcsak, George
2010-02-01
Statistical physics and information theory are applied to the clinical chemistry measurements present in a patient database containing 2.5 million patients' data over a 20-year period. Despite the seemingly naive approach of aggregating all patients over all times (with respect to particular clinical chemistry measurements), both a diurnal signal in the decay of the time-delayed mutual information and the presence of two sub-populations with differing health are detected. This provides a proof in principle that the highly fragmented data in electronic health records has potential for being useful in defining disease and human phenotypes.
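The time-delayed mutual information named in this abstract can be illustrated with a minimal sketch. It assumes a single, evenly sampled series and a simple equal-width binning with a plug-in estimator; the abstract does not describe the authors' estimator, their binning, or how irregularly sampled patient records were aggregated, so the function names and the synthetic series below are purely illustrative.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete symbols."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def discretize(values, n_bins):
    """Map real values to equal-width bin indices 0 .. n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant series
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def time_delayed_mi(series, delay, n_bins=4):
    """Estimate I(x_t ; x_{t+delay}) for one evenly sampled series."""
    symbols = discretize(series, n_bins)
    if delay == 0:
        return mutual_information(symbols, symbols)
    return mutual_information(symbols[:-delay], symbols[delay:])


# Synthetic stand-in for an hourly measurement with a 24-hour cycle.
series = [abs((t % 24) - 12) for t in range(480)]
for delay in (6, 12, 24):
    print(delay, round(time_delayed_mi(series, delay), 3))
```

On a series with a daily cycle, the estimate rises again at delays that are multiples of the period, which is the kind of diurnal signature the abstract refers to.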
Automated 3D Phenotype Analysis Using Data Mining
Plyusnin, Ilya; Evans, Alistair R.; Karme, Aleksis; Gionis, Aristides; Jernvall, Jukka
2008-01-01
The ability to analyze and classify three-dimensional (3D) biological morphology has lagged behind the analysis of other biological data types such as gene sequences. Here, we introduce the techniques of data mining to the study of 3D biological shapes to bring the analyses of phenomes closer to the efficiency of studying genomes. We compiled five training sets of highly variable morphologies of mammalian teeth from the MorphoBrowser database. Samples were labeled either by dietary class or by conventional dental types (e.g. carnassial, selenodont). We automatically extracted a multitude of topological attributes using Geographic Information Systems (GIS)-like procedures that were then used in several combinations of feature selection schemes and probabilistic classification models to build and optimize classifiers for predicting the labels of the training sets. In terms of classification accuracy, computational time and size of the feature sets used, non-repeated best-first search combined with 1-nearest neighbor classifier was the best approach. However, several other classification models combined with the same searching scheme proved practical. The current study represents a first step in the automatic analysis of 3D phenotypes, which will be increasingly valuable with the future increase in 3D morphology and phenomics databases. PMID:18320060
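The combination of feature selection with a 1-nearest-neighbor classifier can be sketched as follows. This is a simplified illustration, not the authors' pipeline: greedy forward selection stands in for the best-first search named in the abstract, leave-one-out accuracy is an assumed scoring scheme, and the feature values and the labels "a" and "b" are made up.

```python
import math


def nearest_label(query, samples, labels, features):
    """Label of the training sample closest to `query` on the chosen features."""
    best = min(
        range(len(samples)),
        key=lambda i: math.dist(
            [query[f] for f in features], [samples[i][f] for f in features]
        ),
    )
    return labels[best]


def loo_accuracy(samples, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier."""
    hits = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        rest = samples[:i] + samples[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        hits += nearest_label(x, rest, rest_labels, features) == y
    return hits / len(samples)


def greedy_forward_selection(samples, labels):
    """Add one feature at a time while leave-one-out accuracy improves."""
    remaining = set(range(len(samples[0])))
    chosen, best_score = [], 0.0
    while remaining:
        candidates = sorted(remaining)
        scores = {f: loo_accuracy(samples, labels, chosen + [f]) for f in candidates}
        feature = max(candidates, key=scores.get)
        if scores[feature] <= best_score:
            break  # no candidate improves the current feature set
        chosen.append(feature)
        remaining.remove(feature)
        best_score = scores[feature]
    return chosen, best_score


# Toy data: feature 0 separates the two classes, feature 1 is noise.
samples = [[1.0, 5.0], [1.2, 1.0], [0.9, 9.0], [3.0, 5.1], [3.1, 0.9], [2.9, 9.2]]
labels = ["a", "a", "a", "b", "b", "b"]
print(greedy_forward_selection(samples, labels))
```

Keeping the selected feature set small matters here because a nearest-neighbor classifier weighs every retained attribute equally, so a noisy attribute can undo the separation provided by an informative one.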
Mouse Tumor Biology (MTB): a database of mouse models for human cancer.
Bult, Carol J; Krupke, Debra M; Begley, Dale A; Richardson, Joel E; Neuhauser, Steven B; Sundberg, John P; Eppig, Janan T
2015-01-01
The Mouse Tumor Biology (MTB; http://tumor.informatics.jax.org) database is a unique online compendium of mouse models for human cancer. MTB provides online access to expertly curated information on diverse mouse models for human cancer and interfaces for searching and visualizing data associated with these models. The information in MTB is designed to facilitate the selection of strains for cancer research and is a platform for mining data on tumor development and patterns of metastases. MTB curators acquire data through manual curation of peer-reviewed scientific literature and from direct submissions by researchers. Data in MTB are also obtained from other bioinformatics resources including PathBase, the Gene Expression Omnibus and ArrayExpress. Recent enhancements to MTB improve the association between mouse models and human genes commonly mutated in a variety of cancers as identified in large-scale cancer genomics studies, provide new interfaces for exploring regions of the mouse genome associated with cancer phenotypes and incorporate data and information related to Patient-Derived Xenograft models of human cancers. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
An integrated database-pipeline system for studying single nucleotide polymorphisms and diseases.
Yang, Jin Ok; Hwang, Sohyun; Oh, Jeongsu; Bhak, Jong; Sohn, Tae-Kwon
2008-12-12
Studies on the relationship between disease and genetic variations such as single nucleotide polymorphisms (SNPs) are important. Genetic variations can cause disease by influencing important biological regulation processes. Despite the need for analyzing SNP and disease correlation, most existing databases provide information only on functional variants at specific locations on the genome, or deal with only a few genes associated with disease. There is no combined resource to widely support gene-, SNP-, and disease-related information, and to capture relationships among such data. Therefore, we developed an integrated database-pipeline system for studying SNPs and diseases. To implement the pipeline system for the integrated database, we first unified complicated and redundant disease terms and gene names using the Unified Medical Language System (UMLS) for classification and noun modification, and the HUGO Gene Nomenclature Committee (HGNC) and NCBI gene databases. Next, we collected and integrated representative databases for three categories of information. For genes and proteins, we examined the NCBI mRNA, UniProt, UCSC Table Track and MitoDat databases. For genetic variants we used the dbSNP, JSNP, ALFRED, and HGVbase databases. For disease, we employed OMIM, GAD, and HGMD databases. The database-pipeline system provides a disease thesaurus, including genes and SNPs associated with disease. The search results for these categories are available on the web page http://diseasome.kobic.re.kr/, and a genome browser is also available to highlight findings, as well as to permit the convenient review of potentially deleterious SNPs among genes strongly associated with specific diseases and clinical phenotypes. Our system is designed to capture the relationships between SNPs associated with disease and disease-causing genes.
The integrated database-pipeline provides a list of candidate genes and SNP markers for evaluation in both epidemiological and molecular biological approaches to disease-gene association studies. Furthermore, researchers can then select the data set for association studies semi-automatically while considering the relationships between genetic variation and diseases. The database can also make disease-association studies more economical and facilitate an understanding of the processes which cause disease. Currently, the database contains 14,674 SNP records and 109,715 gene records associated with human diseases and it is updated at regular intervals.
Online Mendelian Inheritance in Man (OMIM).
Hamosh, A; Scott, A F; Amberger, J; Valle, D; McKusick, V A
2000-01-01
Online Mendelian Inheritance In Man (OMIM) is a public database of bibliographic information about human genes and genetic disorders. Begun by Dr. Victor McKusick as the authoritative reference Mendelian Inheritance in Man, it is now distributed electronically by the National Center for Biotechnology Information (NCBI). Material in OMIM is derived from the biomedical literature and is written by Dr. McKusick and his colleagues at Johns Hopkins University and elsewhere. Each OMIM entry has a full text summary of a genetic phenotype and/or gene and has copious links to other genetic resources such as DNA and protein sequence, PubMed references, mutation databases, approved gene nomenclature, and more. In addition, NCBI's neighboring feature allows users to identify related articles from PubMed selected on the basis of key words in the OMIM entry. Through its many features, OMIM is increasingly becoming a major gateway for clinicians, students, and basic researchers to the ever-growing literature and resources of human genetics. Copyright 2000 Wiley-Liss, Inc.
Intelligent Interfaces for Mining Large-Scale RNAi-HCS Image Databases
Lin, Chen; Mak, Wayne; Hong, Pengyu; Sepp, Katharine; Perrimon, Norbert
2010-01-01
Recently, High-content screening (HCS) has been combined with RNA interference (RNAi) to become an essential image-based high-throughput method for studying genes and biological networks through RNAi-induced cellular phenotype analyses. However, a genome-wide RNAi-HCS screen typically generates tens of thousands of images, most of which remain uncategorized due to the inadequacies of existing HCS image analysis tools. Until now, it still requires highly trained scientists to browse a prohibitively large RNAi-HCS image database and produce only a handful of qualitative results regarding cellular morphological phenotypes. For this reason we have developed intelligent interfaces to facilitate the application of the HCS technology in biomedical research. Our new interfaces empower biologists with computational power not only to effectively and efficiently explore large-scale RNAi-HCS image databases, but also to apply their knowledge and experience to interactive mining of cellular phenotypes using Content-Based Image Retrieval (CBIR) with Relevance Feedback (RF) techniques. PMID:21278820
Engel, Stacia R.; Cherry, J. Michael
2013-01-01
The first completed eukaryotic genome sequence was that of the yeast Saccharomyces cerevisiae, and the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is the original model organism database. SGD remains the authoritative community resource for the S. cerevisiae reference genome sequence and its annotation, and continues to provide comprehensive biological information correlated with S. cerevisiae genes and their products. A diverse set of yeast strains have been sequenced to explore commercial and laboratory applications, and a brief history of those strains is provided. The publication of these new genomes has motivated the creation of new tools, and SGD will annotate and provide comparative analyses of these sequences, correlating changes with variations in strain phenotypes and protein function. We are entering a new era at SGD, as we incorporate these new sequences and make them accessible to the scientific community, all in an effort to continue in our mission of educating researchers and facilitating discovery. Database URL: http://www.yeastgenome.org/ PMID:23487186
Semi-Automated Annotation of Biobank Data Using Standard Medical Terminologies in a Graph Database.
Hofer, Philipp; Neururer, Sabrina; Goebel, Georg
2016-01-01
Data describing biobank resources frequently contain unstructured free-text information or insufficient coding standards. (Bio-) medical ontologies like Orphanet Rare Diseases Ontology (ORDO) or the Human Disease Ontology (DOID) provide a high number of concepts, synonyms and entity relationship properties. Such standard terminologies increase quality and granularity of input data by adding comprehensive semantic background knowledge from validated entity relationships. Moreover, cross-references between terminology concepts facilitate data integration across databases using different coding standards. In order to encourage the use of standard terminologies, our aim is to identify and link relevant concepts with free-text diagnosis inputs within a biobank registry. Relevant concepts are selected automatically by lexical matching and SPARQL queries against an RDF triplestore. To ensure correctness of annotations, proposed concepts have to be confirmed by medical data administration experts before they are entered into the registry database. Relevant (bio-) medical terminologies describing diseases and phenotypes were identified and stored in a graph database which was tied to a local biobank registry. Concept recommendations during data input trigger a structured description of medical data and facilitate data linkage between heterogeneous systems.
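The lexical matching step can be pictured with a small sketch. It assumes a token-overlap score against concept labels and synonyms held in memory; the abstract does not specify the matching rule, and the real system queries an RDF triplestore via SPARQL. The identifiers `EX:0001` and `EX:0002` are hypothetical placeholders, not ORDO or DOID codes.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens of a label or free-text entry."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def suggest_concepts(free_text, terminology):
    """Rank concept ids by the best fraction of label tokens found in the text.

    `terminology` maps a concept id to its preferred label and synonyms.
    Suggestions are proposals only; an expert still has to confirm them.
    """
    words = tokens(free_text)
    hits = []
    for concept_id, labels in terminology.items():
        score = max(len(tokens(label) & words) / len(tokens(label)) for label in labels)
        if score > 0:
            hits.append((score, concept_id))
    # Highest score first, ties broken by concept id for a stable order.
    return [cid for score, cid in sorted(hits, key=lambda h: (-h[0], h[1]))]


# Hypothetical mini-terminology with placeholder identifiers.
terminology = {
    "EX:0001": ["cystic fibrosis", "mucoviscidosis"],
    "EX:0002": ["pulmonary fibrosis"],
}
print(suggest_concepts("Pat. with Cystic Fibrosis, stable", terminology))
```

Returning a ranked list rather than a single match fits the workflow described above, in which proposed concepts are reviewed by data administration experts before entering the registry.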
Bao, Y M; Liu, X L; Liu, X L; Chen, J H; Zheng, Y J
2017-11-02
Objective: To summarize the clinical characteristics of diffuse parenchymal lung disease in a child caused by a novel compound heterozygous ABCA3 mutation and explore the association between the phenotype and the ABCA3 mutation. Method: The clinical data of a patient diagnosed with diffuse parenchymal lung disease with ABCA3 mutation in December 2016 in Shenzhen Children's Hospital were analyzed. Information on ABCA3 gene mutations available before April 2017 was collected from gene databases (including 1000 Genomes, HGMD, and ExAC) and the literature (including the Wanfang Chinese database and PubMed). Result: The girl was one year and nine months old. She had presented with chronic cough, tachypnea, cyanosis and failure to thrive since she was one year and three months old. Her condition gradually deteriorated after she was empirically treated. Physical examination showed malnutrition, tachypnea and clubbed fingers. Her high resolution computed tomography (HRCT) revealed diffuse ground-glass opacities, thickened interlobular septa, and multiple subpleural small air-filled lung cysts. Second-generation sequencing identified a novel compound heterozygous mutation (c.1755delC+c.2890G>A) in her ABCA3 gene, whose two variants were inherited from her respective parents and which has not been reported in the databases or literature mentioned above. Conclusion: c.1755delC+c.2890G>A is a novel compound heterozygous mutation in ABCA3 that can cause diffuse parenchymal lung disease in children. Its phenotype is related to its genotype.
Database of cattle candidate genes and genetic markers for milk production and mastitis
Ogorevc, J; Kunej, T; Razpet, A; Dovc, P
2009-01-01
A cattle database of candidate genes and genetic markers for milk production and mastitis has been developed to provide an integrated research tool incorporating different types of information supporting a genomic approach to study lactation, udder development and health. The database contains 943 genes and genetic markers involved in mammary gland development and function, representing candidates for further functional studies. The candidate loci were drawn on a genetic map to reveal positional overlaps. For identification of candidate loci, data from seven different research approaches were exploited: (i) gene knockouts or transgenes in mice that result in specific phenotypes associated with mammary gland (143 loci); (ii) cattle QTL for milk production (344) and mastitis related traits (71); (iii) loci with sequence variations that show specific allele-phenotype interactions associated with milk production (24) or mastitis (10) in cattle; (iv) genes with expression profiles associated with milk production (207) or mastitis (107) in cattle or mouse; (v) cattle milk protein genes that exist in different genetic variants (9); (vi) miRNAs expressed in bovine mammary gland (32) and (vii) epigenetically regulated cattle genes associated with mammary gland function (1). Forty-four genes found by multiple independent analyses were suggested as the most promising candidates and were further in silico analysed for expression levels in lactating mammary gland, genetic variability and top biological functions in functional networks. A miRNA target search for mammary gland expressed miRNAs identified 359 putative binding sites in 3′UTRs of candidate genes. PMID:19508288
2010-01-01
Background: The maturing field of genomics is rapidly increasing the number of sequenced genomes and producing more information from those previously sequenced. Much of this additional information is variation data derived from sampling multiple individuals of a given species with the goal of discovering new variants and characterising the population frequencies of the variants that are already known. These data have immense value for many studies, including those designed to understand evolution and connect genotype to phenotype. Maximising the utility of the data requires that it be stored in an accessible manner that facilitates the integration of variation data with other genome resources such as gene annotation and comparative genomics. Description: The Ensembl project provides comprehensive and integrated variation resources for a wide variety of chordate genomes. This paper provides a detailed description of the sources of data and the methods for creating the Ensembl variation databases. It also explores the utility of the information by explaining the range of query options available, from using interactive web displays, to online data mining tools and connecting directly to the data servers programmatically. It gives a good overview of the variation resources and future plans for expanding the variation data within Ensembl. Conclusions: Variation data is an important key to understanding the functional and phenotypic differences between individuals. The development of new sequencing and genotyping technologies is greatly increasing the amount of variation data known for almost all genomes. The Ensembl variation resources are integrated into the Ensembl genome browser and provide a comprehensive way to access this data in the context of a widely used genome bioinformatics system. All Ensembl data is freely available at http://www.ensembl.org and from the public MySQL database server at ensembldb.ensembl.org. PMID:20459805
Analysis and visualization of Arabidopsis thaliana GWAS using web 2.0 technologies.
Huang, Yu S; Horton, Matthew; Vilhjálmsson, Bjarni J; Seren, Umit; Meng, Dazhe; Meyer, Christopher; Ali Amer, Muhammad; Borevitz, Justin O; Bergelson, Joy; Nordborg, Magnus
2011-01-01
With large-scale genomic data becoming the norm in biological studies, the storing, integrating, viewing and searching of such data have become a major challenge. In this article, we describe the development of an Arabidopsis thaliana database that hosts the geographic information and genetic polymorphism data for over 6000 accessions and genome-wide association study (GWAS) results for 107 phenotypes representing the largest collection of Arabidopsis polymorphism data and GWAS results to date. Taking advantage of a series of the latest web 2.0 technologies, such as Ajax (Asynchronous JavaScript and XML), GWT (Google-Web-Toolkit), MVC (Model-View-Controller) web framework and Object Relationship Mapper, we have created a web-based application (web app) for the database, that offers an integrated and dynamic view of geographic information, genetic polymorphism and GWAS results. Essential search functionalities are incorporated into the web app to aid reverse genetics research. The database and its web app have proven to be a valuable resource to the Arabidopsis community. The whole framework serves as an example of how biological data, especially GWAS, can be presented and accessed through the web. In the end, we illustrate the potential to gain new insights through the web app by two examples, showcasing how it can be used to facilitate forward and reverse genetics research. Database URL: http://arabidopsis.usc.edu/
Makita, Yuko; Kobayashi, Norio; Mochizuki, Yoshiki; Yoshida, Yuko; Asano, Satomi; Heida, Naohiko; Deshpande, Mrinalini; Bhatia, Rinki; Matsushima, Akihiro; Ishii, Manabu; Kawaguchi, Shuji; Iida, Kei; Hanada, Kosuke; Kuromori, Takashi; Seki, Motoaki; Shinozaki, Kazuo; Toyoda, Tetsuro
2009-07-01
Molecular breeding of crops is an efficient way to upgrade plant functions useful to mankind. A key step is forward genetics or positional cloning to identify the genes that confer useful functions. In order to accelerate the whole research process, we have developed an integrated database system powered by an intelligent data-retrieval engine termed PosMed-plus (Positional Medline for plant upgrading science), allowing us to prioritize highly promising candidate genes in a given chromosomal interval(s) of Arabidopsis thaliana and rice, Oryza sativa. By inferentially integrating cross-species information resources including genomes, transcriptomes, proteomes, localizomes, phenomes and literature, the system compares a user's query, such as phenotypic or functional keywords, with the literature associated with the relevant genes located within the interval. By utilizing orthologous and paralogous correspondences, PosMed-plus efficiently integrates cross-species information to facilitate the ranking of rice candidate genes based on evidence from other model species such as Arabidopsis. PosMed-plus is a plant science version of the PosMed system widely used by mammalian researchers, and provides both a powerful integrative search function and a rich integrative display of the integrated databases. PosMed-plus is the first cross-species integrated database that inferentially prioritizes candidate genes for forward genetics approaches in plant science, and will be expanded for wider use in plant upgrading in many species.
Zhou, Bailing; Zhao, Huiying; Yu, Jiafeng; Guo, Chengang; Dou, Xianghua; Song, Feng; Hu, Guodong; Cao, Zanxia; Qu, Yuanxu; Yang, Yuedong; Zhou, Yaoqi; Wang, Jihua
2018-01-04
Long non-coding RNAs (lncRNAs) play important functional roles in various biological processes. Early databases were utilized to deposit all lncRNA candidates produced by high-throughput experimental and/or computational techniques to facilitate classification, assessment and validation. As more lncRNAs are validated by low-throughput experiments, several databases were established for experimentally validated lncRNAs. However, these databases are small in scale (with only a few hundred lncRNAs) and specific in their focus (plants, diseases or interactions). Thus, it is highly desirable to have a comprehensive dataset for experimentally validated lncRNAs as a central repository for all of their structures, functions and phenotypes. Here, we established EVLncRNAs by curating lncRNAs validated by low-throughput experiments (up to 1 May 2016) and integrating specific databases (lncRNAdb, LncRNADisease, Lnc2Cancer and PLNlncRbase) with additional functional and disease-specific information not covered previously. The current version of EVLncRNAs contains 1543 lncRNAs from 77 species, which is 2.9 times larger than the current largest database for experimentally validated lncRNAs. Seventy-four percent of lncRNA entries are partially or completely new compared to all existing experimentally validated databases. The established database allows users to browse, search and download as well as to submit experimentally validated lncRNAs. The database is available at http://biophy.dzu.edu.cn/EVLncRNAs. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Zhao, Huiying; Yu, Jiafeng; Guo, Chengang; Dou, Xianghua; Song, Feng; Hu, Guodong; Cao, Zanxia; Qu, Yuanxu
2018-01-01
Long non-coding RNAs (lncRNAs) play important functional roles in various biological processes. Early databases were utilized to deposit all lncRNA candidates produced by high-throughput experimental and/or computational techniques to facilitate classification, assessment and validation. As more lncRNAs are validated by low-throughput experiments, several databases were established for experimentally validated lncRNAs. However, these databases are small in scale (with only a few hundred lncRNAs) and specific in their focus (plants, diseases or interactions). Thus, it is highly desirable to have a comprehensive dataset for experimentally validated lncRNAs as a central repository for all of their structures, functions and phenotypes. Here, we established EVLncRNAs by curating lncRNAs validated by low-throughput experiments (up to 1 May 2016) and integrating specific databases (lncRNAdb, LncRNADisease, Lnc2Cancer and PLNlncRbase) with additional functional and disease-specific information not covered previously. The current version of EVLncRNAs contains 1543 lncRNAs from 77 species, which is 2.9 times larger than the current largest database for experimentally validated lncRNAs. Seventy-four percent of lncRNA entries are partially or completely new compared to all existing experimentally validated databases. The established database allows users to browse, search and download as well as to submit experimentally validated lncRNAs. The database is available at http://biophy.dzu.edu.cn/EVLncRNAs. PMID:28985416
Zdrazil, B.; Neefs, J.-M.; Van Vlijmen, H.; Herhaus, C.; Caracoti, A.; Brea, J.; Roibás, B.; Loza, M. I.; Queralt-Rosinach, N.; Furlong, L. I.; Gaulton, A.; Bartek, L.; Senger, S.; Chichester, C.; Engkvist, O.; Evelo, C. T.; Franklin, N. I.; Marren, D.; Ecker, G. F.
2016-01-01
Phenotypic screening is in a renaissance phase and is expected by many academic and industry leaders to accelerate the discovery of new drugs for new biology. Given that phenotypic screening is by definition target agnostic, the emphasis of in silico and in vitro follow-up work is on the exploration of possible molecular mechanisms and efficacy targets underlying the biological processes interrogated by the phenotypic screening experiments. Herein, we present six exemplar computational protocols for the interpretation of cellular phenotypic screens based on the integration of compound, target, pathway, and disease data established by the IMI Open PHACTS project. The protocols annotate phenotypic hit lists and allow follow-up experiments and mechanistic conclusions. The annotations included are from ChEMBL, ChEBI, GO, WikiPathways and DisGeNET. Also provided are protocols which select, from the IUPHAR/BPS Guide to PHARMACOLOGY interaction file, selective compounds to probe potential targets, and a correlation robot which systematically aims to identify an overlap of active compounds in both the phenotypic assay and any kinase assay. The protocols are applied to a phenotypic pre-lamin A/C splicing assay selected from the ChEMBL database to illustrate the process. The computational protocols make use of the Open PHACTS API and data and are built within the Pipeline Pilot and KNIME workflow tools. PMID:27774140
Zheng, Jie; Erzurumluoglu, A Mesut; Elsworth, Benjamin L; Kemp, John P; Howe, Laurence; Haycock, Philip C; Hemani, Gibran; Tansey, Katherine; Laurin, Charles; Pourcain, Beate St; Warrington, Nicole M; Finucane, Hilary K; Price, Alkes L; Bulik-Sullivan, Brendan K; Anttila, Verneri; Paternoster, Lavinia; Gaunt, Tom R; Evans, David M; Neale, Benjamin M
2017-01-15
LD score regression is a reliable and efficient method of using genome-wide association study (GWAS) summary-level results data to estimate the SNP heritability of complex traits and diseases, partition this heritability into functional categories, and estimate the genetic correlation between different phenotypes. Because the method relies on summary level results data, LD score regression is computationally tractable even for very large sample sizes. However, publicly available GWAS summary-level data are typically stored in different databases and have different formats, making it difficult to apply LD score regression to estimate genetic correlations across many different traits simultaneously. In this manuscript, we describe LD Hub - a centralized database of summary-level GWAS results for 173 diseases/traits from different publicly available resources/consortia and a web interface that automates the LD score regression analysis pipeline. To demonstrate functionality and validate our software, we replicated previously reported LD score regression analyses of 49 traits/diseases using LD Hub; and estimated SNP heritability and the genetic correlation across the different phenotypes. We also present new results obtained by uploading a recent atopic dermatitis GWAS meta-analysis to examine the genetic correlation between the condition and other potentially related traits. In response to the growing availability of publicly accessible GWAS summary-level results data, our database and the accompanying web interface will ensure maximal uptake of the LD score regression methodology, provide a useful database for the public dissemination of GWAS results, and provide a method for easily screening hundreds of traits for overlapping genetic aetiologies. The web interface and instructions for using LD Hub are available at http://ldsc.broadinstitute.org/ Contact: jie.zheng@bristol.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author 2016. Published by Oxford University Press.
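The regression behind the heritability estimates described above can be illustrated with a minimal sketch. It assumes the standard LD score regression relationship, in which the expected chi-square statistic of a SNP equals N * h2 * (LD score) / M plus an intercept, where N is the GWAS sample size and M the number of SNPs. The function names, the toy values and the use of unweighted least squares are assumptions made for illustration only; the published method uses weighted regression, and this is not the LD Hub pipeline itself.

```python
def ols(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def snp_heritability(ld_scores, chi2, n_samples, n_snps):
    """Estimate SNP heritability from the slope of chi-square on LD score.

    Model assumed: E[chi2_j] = n_samples * h2 * ld_j / n_snps + intercept.
    Unweighted fit, for illustration only.
    """
    slope, intercept = ols(ld_scores, chi2)
    return slope * n_snps / n_samples, intercept


# Hypothetical toy inputs generated exactly from the model (h2 = 0.4).
N, M, TRUE_H2 = 50_000, 1_000_000, 0.4
ld = [10.0, 50.0, 100.0, 200.0, 400.0]
chi2_stats = [1.0 + N * TRUE_H2 / M * score for score in ld]

h2_hat, intercept_hat = snp_heritability(ld, chi2_stats, N, M)
print(round(h2_hat, 6), round(intercept_hat, 6))
```

Because the toy statistics contain no noise, the fit recovers the heritability and intercept used to generate them; with real summary statistics the estimates would carry sampling error.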
Collod-Béroud, G; Béroud, C; Adès, L; Black, C; Boxer, M; Brock, D J; Godfrey, M; Hayward, C; Karttunen, L; Milewicz, D; Peltonen, L; Richards, R I; Wang, M; Junien, C; Boileau, C
1997-01-01
Fibrillin is the major component of extracellular microfibrils. Mutations in the fibrillin gene on chromosome 15 (FBN1) were first described in the heritable connective tissue disorder, Marfan syndrome (MFS). More recently, FBN1 has also been shown to harbor mutations related to a spectrum of conditions phenotypically related to MFS. These mutations are private, essentially missense, generally non-recurrent and widely distributed throughout the gene. To date no clear genotype/phenotype relationship has been observed, except for the localization of neonatal mutations in a cluster between exons 24 and 32. The second version of the computerized Marfan database contains 89 entries. The software has been modified to accommodate new functions and routines. PMID:9016526
Classifying compound mechanism of action for linking whole cell phenotypes to molecular targets
Bourne, Christina R.; Wakeham, Nancy; Bunce, Richard A.; Berlin, K. Darrell; Barrow, William W.
2013-01-01
Drug development programs have proven successful when performed at a whole cell level, thus incorporating solubility and permeability into the primary screen. However, linking those results to the target within the cell has been a major set-back. The Phenotype Microarray system, marketed and sold by Biolog, seeks to address this need by assessing the phenotype in combination with a variety of chemicals with known mechanism of action (MOA). We have evaluated this system for usefulness in deducing the MOA for three test compounds. To achieve this, we constructed a database with 21 known antimicrobials, which served as a comparison for grouping our unknown MOA compounds. Pearson correlation and Ward linkage calculations were used to generate a dendrogram that produced clustering largely by known MOA, although there were exceptions. Of the three unknown compounds, one was definitively placed as an anti-folate. The second and third compounds’ MOA were not clearly identified, likely due to unique MOA not represented within the commercial database. The availability of the database generated in this report for S. aureus ATCC 29213 will increase the accessibility of this technique to other investigators. From our analysis, the Phenotype Microarray system can group compounds with clear MOA, but distinction of unique or broadly acting MOA at this time is less clear. PMID:22434711
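The grouping step described above (Pearson correlation combined with Ward linkage) can be sketched as follows. The sketch relies on the fact that, for profiles standardized to zero mean and unit variance, the squared Euclidean distance equals 2n(1 - r), so Ward clustering on standardized profiles orders pairs by their Pearson correlation. The compound names and response values are hypothetical, and the code is an illustration rather than the analysis software used in the study.

```python
from itertools import combinations
from math import sqrt


def zscore(profile):
    """Standardize a profile to zero mean and unit (population) variance."""
    n = len(profile)
    mean = sum(profile) / n
    sd = sqrt(sum((v - mean) ** 2 for v in profile) / n)
    return [(v - mean) / sd for v in profile]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length profiles."""
    return sum(x * y for x, y in zip(zscore(a), zscore(b))) / len(a)


def ward_cost(cluster_a, cluster_b):
    """Increase in within-cluster sum of squares if the two clusters merge."""
    (size_a, cen_a), (size_b, cen_b) = cluster_a, cluster_b
    d2 = sum((x - y) ** 2 for x, y in zip(cen_a, cen_b))
    return size_a * size_b / (size_a + size_b) * d2


def ward_merges(profiles):
    """Return the sequence of merges chosen by Ward's criterion."""
    clusters = {(name,): (1, zscore(p)) for name, p in profiles.items()}
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: ward_cost(clusters[pair[0]], clusters[pair[1]]))
        (size_a, cen_a), (size_b, cen_b) = clusters.pop(a), clusters.pop(b)
        total = size_a + size_b
        centroid = [(size_a * x + size_b * y) / total for x, y in zip(cen_a, cen_b)]
        clusters[tuple(sorted(a + b))] = (total, centroid)
        merges.append((a, b))
    return merges


# Hypothetical response profiles for four compounds across four conditions.
toy_profiles = {
    "cmpd_A1": [1.0, 0.2, 0.9, 0.1],
    "cmpd_A2": [0.9, 0.3, 1.0, 0.2],
    "cmpd_B1": [0.1, 0.9, 0.2, 1.0],
    "cmpd_B2": [0.2, 1.0, 0.1, 0.8],
}
merge_order = ward_merges(toy_profiles)
for left, right in merge_order:
    print(left, "+", right)
```

The merge sequence is the information a dendrogram displays: compounds with similar profiles join early, and the last merge connects the two dissimilar groups.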
Towards linked open gene mutations data
2012-01-01
Background With the advent of high-throughput technologies, a great wealth of variation data is being produced. Such information may constitute the basis for correlation analyses between genotypes and phenotypes and, in the future, for personalized medicine. Several databases on gene variation exist, but this kind of information is still scarce in the Semantic Web framework. In this paper, we discuss issues related to the integration of mutation data in the Linked Open Data infrastructure, part of the Semantic Web framework. We present the development of a mapping from the IARC TP53 Mutation database to RDF and the implementation of servers publishing this data. Methods A version of the IARC TP53 Mutation database implemented in a relational database was used as a first test set. Automatic mappings to RDF were first created by using D2RQ and later manually refined by introducing concepts and properties from domain vocabularies and ontologies, as well as links to Linked Open Data implementations of various systems of biomedical interest. Since D2RQ query performance is lower than that achievable with an RDF archive, generated data was also loaded into a dedicated system based on tools from the Jena software suite. Results We have implemented a D2RQ Server for TP53 mutation data, providing data on a subset of the IARC database, including gene variations, somatic mutations, and bibliographic references. The server allows users to browse the RDF graph by using links both between classes and to external systems. An alternative interface offers improved performance for SPARQL queries. The resulting data can be explored by using any Semantic Web browser or application. Conclusions This has been the first case of a mutation database exposed as Linked Data. A revised version of our prototype, including further concepts and IARC TP53 Mutation database data sets, is under development. 
The publication of variation information as Linked Data opens new perspectives: the exploitation of SPARQL searches on mutation data and other biological databases may support data retrieval which is presently not possible. Moreover, reasoning on integrated variation data may support discoveries towards personalized medicine. PMID:22536974
Towards linked open gene mutations data.
Zappa, Achille; Splendiani, Andrea; Romano, Paolo
2012-03-28
With the advent of high-throughput technologies, a great wealth of variation data is being produced. Such information may constitute the basis for correlation analyses between genotypes and phenotypes and, in the future, for personalized medicine. Several databases on gene variation exist, but this kind of information is still scarce in the Semantic Web framework. In this paper, we discuss issues related to the integration of mutation data in the Linked Open Data infrastructure, part of the Semantic Web framework. We present the development of a mapping from the IARC TP53 Mutation database to RDF and the implementation of servers publishing this data. A version of the IARC TP53 Mutation database implemented in a relational database was used as a first test set. Automatic mappings to RDF were first created by using D2RQ and later manually refined by introducing concepts and properties from domain vocabularies and ontologies, as well as links to Linked Open Data implementations of various systems of biomedical interest. Since D2RQ query performance is lower than that achievable with an RDF archive, generated data was also loaded into a dedicated system based on tools from the Jena software suite. We have implemented a D2RQ Server for TP53 mutation data, providing data on a subset of the IARC database, including gene variations, somatic mutations, and bibliographic references. The server allows users to browse the RDF graph by using links both between classes and to external systems. An alternative interface offers improved performance for SPARQL queries. The resulting data can be explored by using any Semantic Web browser or application. This has been the first case of a mutation database exposed as Linked Data. 
A revised version of our prototype, including further concepts and IARC TP53 Mutation database data sets, is under development. The publication of variation information as Linked Data opens new perspectives: the exploitation of SPARQL searches on mutation data and other biological databases may support data retrieval which is presently not possible. Moreover, reasoning on integrated variation data may support discoveries towards personalized medicine.
WormQTLHD—a web database for linking human disease to natural variation data in C. elegans
van der Velde, K. Joeri; de Haan, Mark; Zych, Konrad; Arends, Danny; Snoek, L. Basten; Kammenga, Jan E.; Jansen, Ritsert C.; Swertz, Morris A.; Li, Yang
2014-01-01
Interactions between proteins are highly conserved across species. As a result, the molecular basis of multiple diseases affecting humans can be studied in model organisms that offer many alternative experimental opportunities. One such organism, Caenorhabditis elegans, has been used to produce much molecular quantitative genetics and systems biology data over the past decade. We present WormQTLHD (Human Disease), a database that quantitatively and systematically links expression Quantitative Trait Loci (eQTL) findings in C. elegans to gene–disease associations in man. WormQTLHD, available online at http://www.wormqtl-hd.org, is a user-friendly set of tools to reveal functionally coherent, evolutionarily conserved gene networks. These can be used to predict novel gene-to-gene associations and the functions of genes underlying the disease of interest. We created a new database that links C. elegans eQTL data sets to human diseases (34 337 gene–disease associations from OMIM, DGA, GWAS Central and NHGRI GWAS Catalogue) based on overlapping sets of orthologous genes associated with phenotypes in these two species. We utilized QTL results, high-throughput molecular phenotypes, classical phenotypes and genotype data covering different developmental stages and environments from the WormQTL database. All software is available as open source, built on MOLGENIS and xQTL workbench. PMID:24217915
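The linking idea described above, overlapping sets of orthologous genes, can be sketched in a few lines. All identifiers below (worm gene IDs, human gene symbols, disease names) are hypothetical placeholders, and the function is an illustration under that assumption, not the WormQTLHD implementation.

```python
def link_diseases(eqtl_genes, orthologs, disease_genes):
    """Map worm genes to human orthologs and report overlapping disease genes.

    eqtl_genes    -- iterable of worm gene IDs with an eQTL signal
    orthologs     -- dict: worm gene ID -> list of human ortholog symbols
    disease_genes -- dict: disease name -> list of associated human genes
    """
    human_hits = set()
    for gene in eqtl_genes:
        human_hits.update(orthologs.get(gene, ()))

    overlaps = {}
    for disease, genes in disease_genes.items():
        shared = human_hits & set(genes)
        if shared:  # keep only diseases with at least one shared gene
            overlaps[disease] = sorted(shared)
    return overlaps


# Hypothetical placeholder data.
toy_orthologs = {"worm-1": ["HUM1"], "worm-2": ["HUM2", "HUM3"], "worm-3": []}
toy_diseases = {
    "disease_X": ["HUM1", "HUM9"],
    "disease_Y": ["HUM3", "HUM2"],
    "disease_Z": ["HUM7"],
}
linked = link_diseases(["worm-1", "worm-2", "worm-4"], toy_orthologs, toy_diseases)
print(linked)
```

Genes without a known ortholog simply contribute nothing, and diseases with no shared gene are omitted from the result.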
WormQTLHD--a web database for linking human disease to natural variation data in C. elegans.
van der Velde, K Joeri; de Haan, Mark; Zych, Konrad; Arends, Danny; Snoek, L Basten; Kammenga, Jan E; Jansen, Ritsert C; Swertz, Morris A; Li, Yang
2014-01-01
Interactions between proteins are highly conserved across species. As a result, the molecular basis of multiple diseases affecting humans can be studied in model organisms that offer many alternative experimental opportunities. One such organism, Caenorhabditis elegans, has been used to produce much molecular quantitative genetics and systems biology data over the past decade. We present WormQTL(HD) (Human Disease), a database that quantitatively and systematically links expression Quantitative Trait Loci (eQTL) findings in C. elegans to gene-disease associations in man. WormQTL(HD), available online at http://www.wormqtl-hd.org, is a user-friendly set of tools to reveal functionally coherent, evolutionarily conserved gene networks. These can be used to predict novel gene-to-gene associations and the functions of genes underlying the disease of interest. We created a new database that links C. elegans eQTL data sets to human diseases (34 337 gene-disease associations from OMIM, DGA, GWAS Central and NHGRI GWAS Catalogue) based on overlapping sets of orthologous genes associated with phenotypes in these two species. We utilized QTL results, high-throughput molecular phenotypes, classical phenotypes and genotype data covering different developmental stages and environments from the WormQTL database. All software is available as open source, built on MOLGENIS and xQTL workbench.
Pullman, Daryl; Perrot-Daley, Astrid; Hodgkinson, Kathy; Street, Catherine; Rahman, Proton
2013-01-01
Objective To provide a legal and ethical analysis of some of the implementation challenges faced by the Population Therapeutics Research Group (PTRG) at Memorial University (Canada), in using genealogical information offered by individuals for its genetics research database. Materials and methods This paper describes the unique historical and genetic characteristics of the Newfoundland and Labrador founder population, which gave rise to the opportunity for PTRG to build the Newfoundland Genealogy Database containing digitized records of all pre-confederation (1949) census records of the Newfoundland founder population. In addition to building the database, PTRG has developed the Heritability Analytics Infrastructure, a data management structure that stores genotype, phenotype, and pedigree information in a single database, and custom linkage software (KINNECT) to perform pedigree linkages on the genealogy database. Discussion A newly adopted legal regimen in Newfoundland and Labrador is discussed. It incorporates health privacy legislation with a unique research ethics statute governing the composition and activities of research ethics boards and, for the first time in Canada, elevating the status of national research ethics guidelines into law. The discussion looks at this integration of legal and ethical principles which provides a flexible and seamless framework for balancing the privacy rights and welfare interests of individuals, families, and larger societies in the creation and use of research data infrastructures as public goods. Conclusion The complementary legal and ethical frameworks that now coexist in Newfoundland and Labrador provide the legislative authority, ethical legitimacy, and practical flexibility needed to find a workable balance between privacy interests and public goods. Such an approach may also be instructive for other jurisdictions as they seek to construct and use biobanks and related research platforms for genetic research. 
PMID:22859644
Kosseim, Patricia; Pullman, Daryl; Perrot-Daley, Astrid; Hodgkinson, Kathy; Street, Catherine; Rahman, Proton
2013-01-01
To provide a legal and ethical analysis of some of the implementation challenges faced by the Population Therapeutics Research Group (PTRG) at Memorial University (Canada), in using genealogical information offered by individuals for its genetics research database. This paper describes the unique historical and genetic characteristics of the Newfoundland and Labrador founder population, which gave rise to the opportunity for PTRG to build the Newfoundland Genealogy Database containing digitized records of all pre-confederation (1949) census records of the Newfoundland founder population. In addition to building the database, PTRG has developed the Heritability Analytics Infrastructure, a data management structure that stores genotype, phenotype, and pedigree information in a single database, and custom linkage software (KINNECT) to perform pedigree linkages on the genealogy database. A newly adopted legal regimen in Newfoundland and Labrador is discussed. It incorporates health privacy legislation with a unique research ethics statute governing the composition and activities of research ethics boards and, for the first time in Canada, elevating the status of national research ethics guidelines into law. The discussion looks at this integration of legal and ethical principles which provides a flexible and seamless framework for balancing the privacy rights and welfare interests of individuals, families, and larger societies in the creation and use of research data infrastructures as public goods. The complementary legal and ethical frameworks that now coexist in Newfoundland and Labrador provide the legislative authority, ethical legitimacy, and practical flexibility needed to find a workable balance between privacy interests and public goods. Such an approach may also be instructive for other jurisdictions as they seek to construct and use biobanks and related research platforms for genetic research.
NutriChem: a systems chemical biology resource to explore the medicinal value of plant-based foods.
Jensen, Kasper; Panagiotou, Gianni; Kouskoumvekaki, Irene
2015-01-01
There is rising evidence of an inverse association between chronic diseases and diets rich in fruit and vegetables. Dietary components may act directly or indirectly on the human genome and modulate multiple processes involved in disease risk and disease progression. However, there is currently no exhaustive resource on the health benefits associated with specific dietary interventions, or a resource covering the broad molecular content of food. Here we present the first release of NutriChem, available at http://cbs.dtu.dk/services/NutriChem-1.0, a database generated by text mining of 21 million MEDLINE abstracts for information that links plant-based foods with their small molecule components and human disease phenotypes. NutriChem contains text-mined data for 18478 pairs of 1772 plant-based foods and 7898 phytochemicals, and 6242 pairs of 1066 plant-based foods and 751 diseases. In addition, it includes predicted associations for 548 phytochemicals and 252 diseases. To the best of our knowledge this database is the only resource linking the chemical space of plant-based foods with human disease phenotypes and provides a foundation for understanding mechanistically the consequences of eating behaviors on health. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
1-CMDb: A Curated Database of Genomic Variations of the One-Carbon Metabolism Pathway.
Bhat, Manoj K; Gadekar, Veerendra P; Jain, Aditya; Paul, Bobby; Rai, Padmalatha S; Satyamoorthy, Kapaettu
2017-01-01
The one-carbon metabolism pathway is vital in maintaining tissue homeostasis by driving the critical reactions of folate and methionine cycles. A myriad of genetic and epigenetic events mark the rate of reactions in a tissue-specific manner. Integration of these to predict and provide personalized health management requires robust computational tools that can process multiomics data. The DNA sequences that may determine the chain of biological events and the endpoint reactions within one-carbon metabolism genes remain to be comprehensively recorded. Hence, we designed the one-carbon metabolism database (1-CMDb) as a platform to interrogate its association with a host of human disorders. DNA sequence and network information of a total of 48 genes were extracted from a literature survey and KEGG pathway that are involved in the one-carbon folate-mediated pathway. The information generated, collected, and compiled for all these genes from the UCSC genome browser included the single nucleotide polymorphisms (SNPs), CpGs, copy number variations (CNVs), and miRNAs, and a comprehensive database was created. Furthermore, a significant correlation analysis was performed for SNPs in the pathway genes. Detailed data of SNPs, CNVs, CpG islands, and miRNAs for 48 folate pathway genes were compiled. The SNPs in CNVs (9670), CpGs (984), and miRNAs (14) were also compiled for all pathway genes. The SIFT score, the prediction and PolyPhen score, as well as the prediction for each of the SNPs were tabulated and represented for folate pathway genes. Also included in the database for folate pathway genes were the links to 124 various phenotypes and disease associations as reported in the literature and from publicly available information. 
A comprehensive database was generated consisting of genomic elements within and among SNPs, CNVs, CpGs, and miRNAs of one-carbon metabolism pathways to facilitate (a) single source of information and (b) integration into large-genome scale network analysis to be developed in the future by the scientific community. The database can be accessed at http://slsdb.manipal.edu/ocm/. © 2017 S. Karger AG, Basel.
2009-01-01
Background The majority of the genes even in well-studied multi-cellular model organisms have not been functionally characterized yet. Mining the numerous genome wide data sets related to protein function to retrieve potential candidate genes for a particular biological process remains a challenge. Description GExplore has been developed to provide a user-friendly database interface for data mining at the gene expression/protein function level to help in hypothesis development and experiment design. It supports combinatorial searches for proteins with certain domains, tissue- or developmental stage-specific expression patterns, and mutant phenotypes. GExplore operates on a stand-alone database and has fast response times, which is essential for exploratory searches. The interface is not only user-friendly, but also modular so that it accommodates additional data sets in the future. Conclusion GExplore is an online database for quick mining of data related to gene and protein function, providing a multi-gene display of data sets related to the domain composition of proteins as well as expression and phenotype data. GExplore is publicly available at: http://genome.sfu.ca/gexplore/ PMID:19917126
The first Malay database toward the ethnic-specific target molecular variation.
Halim-Fikri, Hashim; Etemad, Ali; Abdul Latif, Ahmad Zubaidi; Merican, Amir Feisal; Baig, Atif Amin; Annuar, Azlina Ahmad; Ismail, Endom; Salahshourifar, Iman; Liza-Sharmini, Ahmad Tajudin; Ramli, Marini; Shah, Mohamed Irwan; Johan, Muhammad Farid; Hassan, Nik Norliza Nik; Abdul-Aziz, Noraishah Mydin; Mohd Noor, Noor Haslina; Nur-Shafawati, Ab Rajab; Hassan, Rosline; Bahar, Rosnah; Zain, Rosnah Binti; Yusoff, Shafini Mohamed; Yusoff, Surini; Tan, Soon Guan; Thong, Meow-Keong; Wan-Isa, Hatin; Abdullah, Wan Zaidah; Mohamed, Zahurin; Abdul Latiff, Zarina; Zilfalil, Bin Alwi
2015-04-30
The Malaysian Node of the Human Variome Project (MyHVP) is one of the eighteen official Human Variome Project (HVP) country-specific nodes. Since its inception on 9 October 2010, MyHVP has attracted a significant number of Malaysian clinicians and researchers to participate and contribute their data to this project. MyHVP also acts as the center of coordination for genotypic and phenotypic variation studies of the Malaysian population. A specialized database was developed to store and manage data on genetic variations that are associated with health and disease in Malaysian ethnic groups. This ethnic-specific database is called the Malaysian Node of the Human Variome Project database (MyHVPDb). Currently, MyHVPDb provides information only about the genetic variations and mutations found in the Malays. In the near future, it will be expanded to cover the other Malaysian ethnic groups as well. The data sets are organized by disease or genetic mutation type, with three main subcategories: Single Nucleotide Polymorphism (SNP), Copy Number Variation (CNV) and the mutations underlying common diseases among Malaysians. MyHVPDb has been open to local researchers, academicians and students through registration at the MyHVP portal ( http://hvpmalaysia.kk.usm.my/mhgvc/index.php?id=register ). This database would be useful for clinicians and researchers interested in studying population genomics and genetic diseases who need up-to-date and accurate information on population-specific variations, and for those in countries with a similar ethnic background.
Astuti, Dewi; Sabir, Ataf; Fulton, Piers; Zatyka, Malgorzata; Williams, Denise; Hardy, Carol; Milan, Gabriella; Favaretto, Francesca; Yu‐Wai‐Man, Patrick; Rohayem, Julia; López de Heredia, Miguel; Hershey, Tamara; Tranebjaerg, Lisbeth; Chen, Jian‐Hua; Chaussenot, Annabel; Nunes, Virginia; Marshall, Bess; McAfferty, Susan; Tillmann, Vallo; Maffei, Pietro; Paquis‐Flucklinger, Veronique; Geberhiwot, Tarekign; Mlynarski, Wojciech; Parkinson, Kay; Picard, Virginie; Bueno, Gema Esteban; Dias, Renuka; Arnold, Amy; Richens, Caitlin; Paisey, Richard; Urano, Fumihiko; Semple, Robert; Sinnott, Richard
2017-01-01
Abstract We developed a variant database for diabetes syndrome genes, using the Leiden Open Variation Database platform, containing observed phenotypes matched to the genetic variations. We populated it with 628 published disease‐associated variants (December 2016) for: WFS1 (n = 309), CISD2 (n = 3), ALMS1 (n = 268), and SLC19A2 (n = 48) for Wolfram type 1, Wolfram type 2, Alström, and Thiamine‐responsive megaloblastic anemia syndromes, respectively; and included 23 previously unpublished novel germline variants in WFS1 and 17 variants in ALMS1. We then investigated genotype–phenotype relations for the WFS1 gene. The presence of biallelic loss‐of‐function variants predicted Wolfram syndrome defined by insulin‐dependent diabetes and optic atrophy, with a sensitivity of 79% (95% CI 75%–83%) and specificity of 92% (83%–97%). The presence of minor loss‐of‐function variants in WFS1 predicted isolated diabetes, isolated deafness, or isolated congenital cataracts without development of the full syndrome (sensitivity 100% [93%–100%]; specificity 78% [73%–82%]). The ability to provide a prognostic prediction based on genotype will lead to improvements in patient care and counseling. The development of the database as a repository for monogenic diabetes gene variants will allow prognostic predictions for other diabetes syndromes as next‐generation sequencing expands the repertoire of genotypes and phenotypes. The database is publicly available online at https://lovd.euro-wabb.org. PMID:28432734
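The sensitivity and specificity figures quoted above follow from a two-by-two table of predicted versus observed status. The sketch below shows the arithmetic with a Wilson score interval for the 95% confidence limits. The counts are hypothetical, and the choice of the Wilson interval is an assumption, since the abstract does not state which interval method was used.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return center - half, center + half


def diagnostic_summary(tp, fn, tn, fp):
    """Sensitivity and specificity with 95% Wilson intervals."""
    return {
        "sensitivity": (tp / (tp + fn), wilson_interval(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_interval(tn, tn + fp)),
    }


# Hypothetical counts: 40 true positives, 10 false negatives,
# 45 true negatives, 5 false positives.
summary = diagnostic_summary(tp=40, fn=10, tn=45, fp=5)
for name, (estimate, (low, high)) in summary.items():
    print(f"{name}: {estimate:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Sensitivity is computed among individuals who have the condition and specificity among those who do not, so the two intervals are based on different denominators.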
Bernstein, Inge T; Lindorff-Larsen, Karen; Timshel, Susanne; Brandt, Carsten A; Dinesen, Birger; Fenger, Mogens; Gerdes, Anne-Marie; Iversen, Lene H; Madsen, Mogens R; Okkels, Henrik; Sunde, Lone; Rahr, Hans B; Wikman, Friedrick P; Rossing, Niels
2011-05-01
The Danish HNPCC register is a publicly financed national database. The register gathers epidemiological and genomic data in HNPCC families to improve prognosis by screening and identifying family members at risk. Diagnostic data are generated throughout the country and collected over several decades. Until recently, paper-based reports were sent to the register and typed into the database. In the EC-cofunded INFOBIOMED network of excellence, the register was a model for electronic exchange of epidemiological and genomic data between diagnosing/treating departments and the central database. The aim of digitization was to optimize the organization of screening by facilitating combination of genotype-phenotype information, and to generate IT tools sufficiently usable and generic to be implemented in other countries and for other oncogenetic diseases. The focus was on integration of heterogeneous data, elaboration and dissemination of classification systems, and development of communication standards. At the conclusion of the EU project in 2007 the system was implemented in 12 pilot departments. In the surgical departments this resulted in a 192% increase of reports to the database. Several gaps were identified: lack of standards for data to be exchanged, lack of local databases suitable for direct communication, and reporting being time-consuming and dependent on interest and feedback. © 2011 Wiley-Liss, Inc.
Software and database for the analysis of mutations in the human FBN1 gene.
Collod, G; Béroud, C; Soussi, T; Junien, C; Boileau, C
1996-01-01
Fibrillin is the major component of extracellular microfibrils. Mutations in the fibrillin gene on chromosome 15 (FBN1) were first described in the heritable connective tissue disorder, Marfan syndrome (MFS). More recently, FBN1 has also been shown to harbor mutations related to a spectrum of conditions phenotypically related to MFS, and many mutations will have to be accumulated before genotype/phenotype relationships emerge. To facilitate mutational analysis of the FBN1 gene, a software package along with a computerized database (currently listing 63 entries) has been created. PMID:8594563
EuroPhenome and EMPReSS: online mouse phenotyping resource
Mallon, Ann-Marie; Hancock, John M.
2008-01-01
EuroPhenome (http://www.europhenome.org) and EMPReSS (http://empress.har.mrc.ac.uk/) form an integrated resource to provide access to data and procedures for mouse phenotyping. EMPReSS describes 96 Standard Operating Procedures for mouse phenotyping. EuroPhenome contains data resulting from carrying out EMPReSS protocols on four inbred laboratory mouse strains. As well as web interfaces, both resources support web services to enable integration with other mouse phenotyping and functional genetics resources, and are committed to initiatives to improve integration of mouse phenotype databases. EuroPhenome will be the repository for a recently initiated effort to carry out large-scale phenotyping on a large number of knockout mouse lines (EUMODIC). PMID:17905814
EuroPhenome and EMPReSS: online mouse phenotyping resource.
Mallon, Ann-Marie; Blake, Andrew; Hancock, John M
2008-01-01
EuroPhenome (http://www.europhenome.org) and EMPReSS (http://empress.har.mrc.ac.uk/) form an integrated resource to provide access to data and procedures for mouse phenotyping. EMPReSS describes 96 Standard Operating Procedures for mouse phenotyping. EuroPhenome contains data resulting from carrying out EMPReSS protocols on four inbred laboratory mouse strains. As well as web interfaces, both resources support web services to enable integration with other mouse phenotyping and functional genetics resources, and are committed to initiatives to improve integration of mouse phenotype databases. EuroPhenome will be the repository for a recently initiated effort to carry out large-scale phenotyping on a large number of knockout mouse lines (EUMODIC).
Saunders, Rebecca E; Instrell, Rachael; Rispoli, Rossella; Jiang, Ming; Howell, Michael
2013-01-01
High-throughput screening (HTS) uses technologies such as RNA interference to generate loss-of-function phenotypes on a genomic scale. As these technologies become more popular, many research institutes have established core facilities of expertise to deal with the challenges of large-scale HTS experiments. As the efforts of core facility screening projects come to fruition, focus has shifted towards managing the results of these experiments and making them available in a useful format that can be further mined for phenotypic discovery. The HTS-DB database provides a public view of data from screening projects undertaken by the HTS core facility at the CRUK London Research Institute. All projects and screens are described with comprehensive assay protocols, and datasets are provided with complete descriptions of analysis techniques. This format allows users to browse and search data from large-scale studies in an informative and intuitive way. It also provides a repository for additional measurements obtained from screens that were not the focus of the project, such as cell viability, and groups these data so that they can provide a gene-centric summary across several different cell lines and conditions. All datasets from our screens that can be made available can be viewed interactively and mined for further hit lists. We believe that in this format, the database provides researchers with rapid access to results of large-scale experiments that might facilitate their understanding of genes/compounds identified in their own research. DATABASE URL: http://hts.cancerresearchuk.org/db/public.
Zaitlen, Noah; Kraft, Peter; Patterson, Nick; Pasaniuc, Bogdan; Bhatia, Gaurav; Pollack, Samuela; Price, Alkes L.
2013-01-01
Important knowledge about the determinants of complex human phenotypes can be obtained from the estimation of heritability, the fraction of phenotypic variation in a population that is determined by genetic factors. Here, we make use of extensive phenotype data in Iceland, long-range phased genotypes, and a population-wide genealogical database to examine the heritability of 11 quantitative and 12 dichotomous phenotypes in a sample of 38,167 individuals. Most previous estimates of heritability are derived from family-based approaches such as twin studies, which may be biased upwards by epistatic interactions or shared environment. Our estimates of heritability, based on both closely and distantly related pairs of individuals, are significantly lower than those from previous studies. We examine phenotypic correlations across a range of relationships, from siblings to first cousins, and find that the excess phenotypic correlation in these related individuals is predominantly due to shared environment as opposed to dominance or epistasis. We also develop a new method to jointly estimate narrow-sense heritability and the heritability explained by genotyped SNPs. Unlike existing methods, this approach permits the use of information from both closely and distantly related pairs of individuals, thereby reducing the variance of estimates of heritability explained by genotyped SNPs while preventing upward bias. Our results show that common SNPs explain a larger proportion of the heritability than previously thought, with SNPs present on Illumina 300K genotyping arrays explaining more than half of the heritability for the 23 phenotypes examined in this study. Much of the remaining heritability is likely to be due to rare alleles that are not captured by standard genotyping arrays. PMID:23737753
Prader-Willi-like phenotypes: a systematic review of their chromosomal abnormalities.
Rocha, C F; Paiva, C L A
2014-03-31
Prader-Willi syndrome (PWS) is caused by the lack of expression of genes located on paternal chromosome 15q11-q13. This lack of gene expression may be due to a deletion in this chromosomal segment, to maternal uniparental disomy of chromosome 15, or to a defect in the imprinting center on 15q11-q13. PWS is characterized by hypotonia during the neonatal stage and in childhood, accompanied by a delay in neuropsychomotor development. Overeating, obesity, and mental deficiency arise later on. The syndrome has a clinical overlap with other diseases, which makes it difficult to accurately diagnose. The purpose of this article is to review the Prader-Willi-like phenotype in the scientific literature from 2000 to 2013, i.e., to review the cases of PWS caused by chromosomal abnormalities different from those found on chromosome 15. A search was carried out using the "National Center for Biotechnology Information" (www.pubmed.com) and "Scientific Electronic Library Online" (www.scielo.br) databases and combinations of key words such as "Prader-Willi-like phenotype" and "Prader-Willi syndrome phenotype". Editorials, letters, reviews, and guidelines were excluded. Articles chosen contained descriptions of patients diagnosed with the PWS phenotype but who were negative for alterations on 15q11-q13. Our search found 643 articles about PWS, but only 14 of these matched the Prader-Willi-like phenotype and the selected years of publication (2000-2013). If two or more articles reported the same chromosomal alterations for Prader-Willi-like phenotype, the most recent was chosen. Twelve of the 14 articles were case reports and 2 reported series of cases.
Cabré, Eduard; Mañosa, Míriam; García-Sánchez, Valle; Gutiérrez, Ana; Ricart, Elena; Esteve, Maria; Guardiola, Jordi; Aguas, Mariam; Merino, Olga; Ponferrada, Angel; Gisbert, Javier P; Garcia-Planella, Esther; Ceña, Gloria; Cabriada, José L; Montoro, Miguel; Domènech, Eugeni
2014-07-01
Disease outcome has been found to be poorer in familial inflammatory bowel disease (IBD) than in sporadic forms, but assessment of phenotypic concordance in familial IBD provided controversial results. We assessed the concordance for disease type and phenotypic features in IBD families. Patients with familial IBD were identified from the IBD Spanish database ENEIDA. Families in whom at least two members were in the database were selected for concordance analysis (κ index). Concordance for type of IBD [Crohn's disease (CD) vs. ulcerative colitis (UC)], as well as for disease extent, localization and behaviour, perianal disease, extraintestinal manifestations, and indicators of severe disease (i.e., need for immunosuppressors, biological agents, and surgery) for those pairs concordant for IBD type, were analyzed. 798 out of 11,905 IBD patients (7%) in ENEIDA had familial history of IBD. Complete data of 107 families (231 patients and 144 consanguineous pairs) were available for concordance analyses. The youngest members of the pairs were diagnosed with IBD at a significantly younger age (p<0.001) than the oldest ones. Seventy-six percent of pairs matched up for the IBD type (κ=0.58; 95%CI: 0.42-0.73, moderate concordance). There was no relevant concordance for any of the phenotypic items assessed in both diseases. Familial IBD is associated with diagnostic anticipation in younger individuals. Familial history does not allow predicting any phenotypic feature other than IBD type. Copyright © 2013 European Crohn's and Colitis Organisation. Published by Elsevier B.V. All rights reserved.
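The κ index used for the concordance analysis can be illustrated with a minimal sketch of Cohen's kappa for paired categorical labels. The pair data below are hypothetical, and the sketch assumes each pair is ordered (for example, older member first), which the abstract does not specify; it is not the study's analysis code.

```python
from collections import Counter


def cohen_kappa(pairs):
    """Cohen's kappa for a list of (label_a, label_b) pairs."""
    n = len(pairs)
    # Observed agreement: fraction of pairs with identical labels.
    observed = sum(a == b for a, b in pairs) / n
    # Chance agreement from the marginal label frequencies of each position.
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    labels = set(first) | set(second)
    expected = sum(first[label] * second[label] for label in labels) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical family pairs: CD = Crohn's disease, UC = ulcerative colitis.
pairs = (
    [("CD", "CD")] * 4
    + [("UC", "UC")] * 4
    + [("CD", "UC")]
    + [("UC", "CD")]
)
print(cohen_kappa(pairs))
```

In this made-up sample, 8 of 10 pairs agree and chance agreement is 0.5, so kappa is approximately 0.6, which falls in the range conventionally described as moderate concordance.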
Grubb, Stephen C.; Maddatu, Terry P.; Bult, Carol J.; Bogue, Molly A.
2009-01-01
The Mouse Phenome Database (MPD; http://www.jax.org/phenome) is an open source, web-based repository of phenotypic and genotypic data on commonly used and genetically diverse inbred strains of mice and their derivatives. MPD is also a facility for query, analysis and in silico hypothesis testing. Currently MPD contains about 1400 phenotypic measurements contributed by research teams worldwide, including phenotypes relevant to human health such as cancer susceptibility, aging, obesity, susceptibility to infectious diseases, atherosclerosis, blood disorders and neurosensory disorders. Electronic access to centralized strain data enables investigators to select optimal strains for many systems-based research applications, including physiological studies, drug and toxicology testing, modeling disease processes and complex trait analysis. The ability to select strains for specific research applications by accessing existing phenotype data can bypass the need to (re)characterize strains, precluding major investments of time and resources. This functionality, in turn, accelerates research and leverages existing community resources. Since our last NAR reporting in 2007, MPD has added more community-contributed data covering more phenotypic domains and implemented several new tools and features, including a new interactive Tool Demo available through the MPD homepage (quick link: http://phenome.jax.org/phenome/trytools). PMID:18987003
A multilocus database for the identification of Aspergillus and Penicillium species
USDA-ARS's Scientific Manuscript database
Identification of Aspergillus and Penicillium isolates using phenotypic methods is increasingly complex and difficult but genetic tools allow recognition and description of species formerly unrecognized or cryptic. We constructed a web-based taxonomic database using BIGSdb for the identification of ...
The triticeae toolbox: combining phenotype and genotype data to advance small-grains breeding
USDA-ARS's Scientific Manuscript database
The Triticeae Toolbox (http://triticeaetoolbox.org; T3) is the database schema enabling plant breeders and researchers to combine, visualize, and interrogate the wealth of phenotype and genotype data generated by the Triticeae Coordinated Agricultural Project (TCAP). T3 enables users to define speci...
Methodology for the inference of gene function from phenotype data.
Ascensao, Joao A; Dolan, Mary E; Hill, David P; Blake, Judith A
2014-12-12
Biomedical ontologies are increasingly instrumental in the advancement of biological research primarily through their use to efficiently consolidate large amounts of data into structured, accessible sets. However, ontology development and usage can be hampered by the segregation of knowledge by domain that occurs due to independent development and use of the ontologies. The ability to infer data associated with one ontology to data associated with another ontology would prove useful in expanding information content and scope. We here focus on relating two ontologies: the Gene Ontology (GO), which encodes canonical gene function, and the Mammalian Phenotype Ontology (MP), which describes non-canonical phenotypes, using statistical methods to suggest GO functional annotations from existing MP phenotype annotations. This work is in contrast to previous studies that have focused on inferring gene function from phenotype primarily through lexical or semantic similarity measures. We have designed and tested a set of algorithms that represents a novel methodology to define rules for predicting gene function by examining the emergent structure and relationships between the gene functions and phenotypes rather than inspecting the terms semantically. The algorithms inspect relationships among multiple phenotype terms to deduce if there are cases where they all arise from a single gene function. We apply this methodology to data about genes in the laboratory mouse that are formally represented in the Mouse Genome Informatics (MGI) resource. From the data, 7444 rule instances were generated from five generalized rules, resulting in 4818 unique GO functional predictions for 1796 genes. We show that our method is capable of inferring high-quality functional annotations from curated phenotype data. 
As well as creating inferred annotations, our method has the potential to allow for the elucidation of unforeseen, biologically significant associations between gene function and phenotypes that would be overlooked by a semantics-based approach. Future work will include the implementation of the described algorithms for a variety of other model organism databases, taking full advantage of the abundance of available high quality curated data.
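The general idea of suggesting functional annotations from phenotype annotations can be sketched with a simple co-occurrence rule. This is a deliberately simplified, single-term illustration and not the five generalized rules described in the abstract; the function name, thresholds, and the placeholder gene, MP, and GO identifiers are all hypothetical.

```python
from collections import defaultdict


def suggest_go_terms(gene_to_mp, gene_to_go, min_support=2, min_confidence=0.8):
    """Derive (phenotype term -> function term) rules and apply them.

    A rule is kept when at least `min_support` genes with the phenotype
    term already have function annotations and at least `min_confidence`
    of those genes carry the same function term.
    """
    # Index genes by phenotype term.
    mp_genes = defaultdict(set)
    for gene, mp_terms in gene_to_mp.items():
        for mp in mp_terms:
            mp_genes[mp].add(gene)

    rules = {}
    for mp, genes in mp_genes.items():
        annotated = [g for g in genes if gene_to_go.get(g)]
        if len(annotated) < min_support:
            continue
        counts = defaultdict(int)
        for g in annotated:
            for go in gene_to_go[g]:
                counts[go] += 1
        for go, count in counts.items():
            confidence = count / len(annotated)
            if confidence >= min_confidence:
                rules[(mp, go)] = confidence

    # Propose the function term for genes that show the phenotype but lack it.
    predictions = defaultdict(set)
    for mp, go in rules:
        for g in mp_genes[mp]:
            if go not in gene_to_go.get(g, set()):
                predictions[g].add(go)
    return rules, dict(predictions)


# Placeholder identifiers, not real MP or GO accessions.
gene_to_mp = {"g1": {"MP:A"}, "g2": {"MP:A"}, "g3": {"MP:A"}, "g4": {"MP:B"}}
gene_to_go = {"g1": {"GO:X"}, "g2": {"GO:X"}, "g4": {"GO:Y"}}

rules, predictions = suggest_go_terms(gene_to_mp, gene_to_go)
print(rules)
print(predictions)
```

Here the unannotated gene g3 shares the phenotype term MP:A with two genes that both carry GO:X, so GO:X is proposed for g3. A real implementation would also need to account for the ontology hierarchies and for combinations of phenotype terms, which this sketch ignores.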
VarioML framework for comprehensive variation data representation and exchange.
Byrne, Myles; Fokkema, Ivo Fac; Lancaster, Owen; Adamusiak, Tomasz; Ahonen-Bishopp, Anni; Atlan, David; Béroud, Christophe; Cornell, Michael; Dalgleish, Raymond; Devereau, Andrew; Patrinos, George P; Swertz, Morris A; Taschner, Peter Em; Thorisson, Gudmundur A; Vihinen, Mauno; Brookes, Anthony J; Muilu, Juha
2012-10-03
Sharing of data about variation and the associated phenotypes is a critical need, yet variant information can be arbitrarily complex, making a single standard vocabulary elusive and re-formatting difficult. Complex standards have proven too time-consuming to implement. The GEN2PHEN project addressed these difficulties by developing a comprehensive data model for capturing biomedical observations, Observ-OM, and building the VarioML format around it. VarioML pairs a simplified open specification for describing variants, with a toolkit for adapting the specification into one's own research workflow. Straightforward variant data can be captured, federated, and exchanged with no overhead; more complex data can be described, without loss of compatibility. The open specification enables push-button submission to gene variant databases (LSDBs) e.g., the Leiden Open Variation Database, using the Cafe Variome data publishing service, while VarioML bidirectionally transforms data between XML and web-application code formats, opening up new possibilities for open source web applications building on shared data. A Java implementation toolkit makes VarioML easily integrated into biomedical applications. VarioML is designed primarily for LSDB data submission and transfer scenarios, but can also be used as a standard variation data format for JSON and XML document databases and user interface components. VarioML is a set of tools and practices improving the availability, quality, and comprehensibility of human variation information. It enables researchers, diagnostic laboratories, and clinics to share that information with ease, clarity, and without ambiguity.
VarioML framework for comprehensive variation data representation and exchange
2012-01-01
Background: Sharing of data about variation and the associated phenotypes is a critical need, yet variant information can be arbitrarily complex, making a single standard vocabulary elusive and re-formatting difficult. Complex standards have proven too time-consuming to implement. Results: The GEN2PHEN project addressed these difficulties by developing a comprehensive data model for capturing biomedical observations, Observ-OM, and building the VarioML format around it. VarioML pairs a simplified open specification for describing variants, with a toolkit for adapting the specification into one's own research workflow. Straightforward variant data can be captured, federated, and exchanged with no overhead; more complex data can be described, without loss of compatibility. The open specification enables push-button submission to gene variant databases (LSDBs) e.g., the Leiden Open Variation Database, using the Cafe Variome data publishing service, while VarioML bidirectionally transforms data between XML and web-application code formats, opening up new possibilities for open source web applications building on shared data. A Java implementation toolkit makes VarioML easily integrated into biomedical applications. VarioML is designed primarily for LSDB data submission and transfer scenarios, but can also be used as a standard variation data format for JSON and XML document databases and user interface components. Conclusions: VarioML is a set of tools and practices improving the availability, quality, and comprehensibility of human variation information. It enables researchers, diagnostic laboratories, and clinics to share that information with ease, clarity, and without ambiguity. PMID:23031277
genenames.org: the HGNC resources in 2011
Seal, Ruth L.; Gordon, Susan M.; Lush, Michael J.; Wright, Mathew W.; Bruford, Elspeth A.
2011-01-01
The HUGO Gene Nomenclature Committee (HGNC) aims to assign a unique gene symbol and name to every human gene. The HGNC database currently contains almost 30 000 approved gene symbols, over 19 000 of which represent protein-coding genes. The public website, www.genenames.org, displays all approved nomenclature within Symbol Reports that contain data curated by HGNC editors and links to related genomic, phenotypic and proteomic information. Here we describe improvements to our resources, including a new Quick Gene Search, a new List Search, an integrated HGNC BioMart and a new Statistics and Downloads facility. PMID:20929869
Shrestha, Rosemary; Matteis, Luca; Skofic, Milko; Portugal, Arllet; McLaren, Graham; Hyman, Glenn; Arnaud, Elizabeth
2012-01-01
The Crop Ontology (CO) of the Generation Challenge Program (GCP) (http://cropontology.org/) is developed for the Integrated Breeding Platform (IBP) (http://www.integratedbreeding.net/) by several centers of The Consultative Group on International Agricultural Research (CGIAR): Bioversity, CIMMYT, CIP, ICRISAT, IITA, and IRRI. Integrated breeding necessitates that breeders access genotypic and phenotypic data related to a given trait. The CO provides validated trait names used by the crop communities of practice (CoP) for harmonizing the annotation of phenotypic and genotypic data and thus supporting data accessibility and discovery through web queries. The trait information is completed by the description of the measurement methods and scales, and images. The trait dictionaries used to produce the Integrated Breeding (IB) fieldbooks are synchronized with the CO terms for an automatic annotation of the phenotypic data measured in the field. The IB fieldbook provides breeders with direct access to the CO to get additional descriptive information on the traits. Ontologies and trait dictionaries are online for cassava, chickpea, common bean, groundnut, maize, Musa, potato, rice, sorghum, and wheat. Online curation and annotation tools (http://cropontology.org) facilitate direct maintenance of the trait information and production of trait dictionaries by the crop communities. An important feature is the cross referencing of CO terms with the Crop database trait ID and with their synonyms in Plant Ontology (PO) and Trait Ontology (TO). Web links between cross referenced terms in CO provide online access to data annotated with similar ontological terms, particularly the genetic data in Gramene (University of Cornell) or the evaluation and climatic data in the Global Repository of evaluation trials of the Climate Change, Agriculture and Food Security programme (CCAFS). Cross-referencing and annotation will be further applied in the IBP. PMID:22934074
A web accessible resource for investigating cassava phenomics and genomics information: BIOGEN BASE
Jayakodi, Murukarthick; Selvan, Sreedevi Ghokhilamani; Natesan, Senthil; Muthurajan, Raveendran; Duraisamy, Raghu; Ramineni, Jana Jeevan; Rathinasamy, Sakthi Ambothi; Karuppusamy, Nageswari; Lakshmanan, Pugalenthi; Chokkappan, Mohan
2011-01-01
The goal of our research is to establish a unique portal to bring out the potential outcome of the research in the cassava crop. The Biogen base for cassava clearly brings out the variations of different traits of the germplasms maintained at the Tapioca and Castor Research Station, Tamil Nadu Agricultural University. Phenotypic and genotypic variations of the accessions are clearly depicted, for the users to browse and interpret the variations using the microsatellite markers. The database (BIOGEN BASE - CASSAVA) is designed using PHP and MySQL and is equipped with extensive search options. It is user-friendly and made publicly available, to improve the research and development of cassava by making a wealth of genetics and genomics data available through an open, common, and worldwide forum for all individuals interested in the field. Availability: The database is available for free at http://www.tnaugenomics.com/biogenbase/casava.php PMID:21904428
Said, Joseph I; Knapka, Joseph A; Song, Mingzhou; Zhang, Jinfa
2015-08-01
A specialized database currently containing more than 2200 QTL is established, which allows graphic presentation, visualization and submission of QTL. In cotton, quantitative trait loci (QTL) studies are focused on intraspecific Gossypium hirsutum and interspecific G. hirsutum × G. barbadense populations. These two populations are commercially important for the textile industry and are evaluated for fiber quality, yield, seed quality, resistance, physiological, and morphological trait QTL. With meta-analysis data based on the vast amount of QTL studies in cotton, it will be beneficial to organize the data into a functional database for the cotton community. Here we provide a tool for cotton researchers to visualize previously identified QTL and submit their own QTL to the CottonQTLdb database. The database provides the user with the option of selecting various QTL trait types from either the G. hirsutum or G. hirsutum × G. barbadense populations. Based on the user's QTL trait selection, graphical representations of chromosomes of the population selected are displayed in publication-ready images. The database also provides users with trait information on QTL, LOD scores, and explained phenotypic variances for all QTL selected. The CottonQTLdb database provides cotton geneticists and breeders with statistical data on cotton QTL previously identified and provides a visualization tool to view QTL positions on chromosomes. Currently the database (Release 1) contains 2274 QTLs, and succeeding QTL studies will be updated regularly by the curators and members of the cotton community that contribute their data to keep the database current. The database is accessible from http://www.cottonqtldb.org.
dndDB: a database focused on phosphorothioation of the DNA backbone.
Ou, Hong-Yu; He, Xinyi; Shao, Yucheng; Tai, Cui; Rajakumar, Kumar; Deng, Zixin
2009-01-01
The Dnd DNA degradation phenotype was first observed during electrophoresis of genomic DNA from Streptomyces lividans more than 20 years ago. It was subsequently shown to be governed by the five-gene dnd cluster. Similar gene clusters have now been found to be widespread among many other distantly related bacteria. Recently the dnd cluster was shown to mediate the incorporation of sulphur into the DNA backbone via a sequence-selective, stereo-specific phosphorothioate modification in Escherichia coli B7A. Intriguingly, to date all identified dnd clusters lie within mobile genetic elements, the vast majority in laterally transferred genomic islands. We organized available data from experimental and bioinformatics analyses about the DNA phosphorothioation phenomenon and associated documentation as a dndDB database. It contains the following detailed information: (i) Dnd phenotype; (ii) dnd gene clusters; (iii) genomic islands harbouring dnd genes; (iv) Dnd proteins and conserved domains. As of 25 December 2008, dndDB contained data corresponding to 24 bacterial species exhibiting the Dnd phenotype reported in the scientific literature. In addition, via in silico analysis, dndDB identified 26 syntenic dnd clusters from 25 species of Eubacteria and Archaea, 25 dnd-bearing genomic islands and one dnd plasmid containing 114 dnd genes. A further 397 other genes coding for proteins with varying levels of similarity to Dnd proteins were also included in dndDB. A broad range of similarity search, sequence alignment and phylogenetic tools are readily accessible to allow for individualized directions of research focused on dnd genes. dndDB can facilitate efficient investigation of a wide range of aspects relating to dnd DNA modification and other island-encoded functions in host organisms. dndDB version 1.0 is freely available at http://mml.sjtu.edu.cn/dndDB/.
He, Wenyin; Sun, Xiaofang; Liu, Lian; Li, Man; Jin, Hua; Wang, Wei-Hua
2014-01-01
Chromosomal anomalies in human embryos produced by in vitro fertilization are very common and include numerical (aneuploidy) and structural (deletion, duplication or others) anomalies. Our previous study indicated that chromosomal deletion(s) is the most common structural anomaly, accounting for approximately 8% of euploid blastocysts. It is still unknown if these deletions in human euploid blastocysts have clinical significance. In this study, we analyzed 15 previously diagnosed euploid blastocysts that had chromosomal deletion(s) using the Agilent oligonucleotide DNA microarray platform and localized the genes in each deletion. Then, we used the OMIM gene map and phenotype database to investigate if these deletions are related to important genes that cause genetic diseases, especially developmental delay or intellectual disability. As a result, we found that the detectable chromosomal deletion size with the Agilent microarray is above 2.38 Mb, while the deletions observed in human blastocysts are between 11.6 and 103 Mb. With OMIM gene map and phenotype database information, we found that deletions can result in loss of 81-464 genes. Out of these genes, 34-149 genes are related to known genetic problems. Furthermore, we found that 5 out of 15 samples lost genes in the deleted region that were related to developmental delay and/or intellectual disability. In conclusion, our data indicate that all human euploid blastocysts with chromosomal deletion(s) are abnormal and transfer of these embryos may cause birth defects and/or developmental and intellectual disabilities. Therefore, the embryos with chromosomal deletion revealed by DNA microarray should not be transferred to the patients, or further gene map and/or phenotype seeking is necessary before making a final decision.
The Mouse Genome Database (MGD): facilitating mouse as a model for human biology and disease.
Eppig, Janan T; Blake, Judith A; Bult, Carol J; Kadin, James A; Richardson, Joel E
2015-01-01
The Mouse Genome Database (MGD, http://www.informatics.jax.org) serves the international biomedical research community as the central resource for integrated genomic, genetic and biological data on the laboratory mouse. To facilitate use of mouse as a model in translational studies, MGD maintains a core of high-quality curated data and integrates experimentally and computationally generated data sets. MGD maintains a unified catalog of genes and genome features, including functional RNAs, QTL and phenotypic loci. MGD curates and provides functional and phenotype annotations for mouse genes using the Gene Ontology and Mammalian Phenotype Ontology. MGD integrates phenotype data and associates mouse genotypes to human diseases, providing critical mouse-human relationships and access to repositories holding mouse models. MGD is the authoritative source of nomenclature for genes, genome features, alleles and strains following guidelines of the International Committee on Standardized Genetic Nomenclature for Mice. A new addition to MGD, the Human-Mouse: Disease Connection, allows users to explore gene-phenotype-disease relationships between human and mouse. MGD has also updated search paradigms for phenotypic allele attributes, incorporated incidental mutation data, added a module for display and exploration of genes and microRNA interactions and adopted the JBrowse genome browser. MGD resources are freely available to the scientific community. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Grubb, Stephen C.; Bult, Carol J.; Bogue, Molly A.
2014-01-01
The Mouse Phenome Database (MPD; phenome.jax.org) was launched in 2001 as the data coordination center for the international Mouse Phenome Project. MPD integrates quantitative phenotype, gene expression and genotype data into a common annotated framework to facilitate query and analysis. MPD contains >3500 phenotype measurements or traits relevant to human health, including cancer, aging, cardiovascular disorders, obesity, infectious disease susceptibility, blood disorders, neurosensory disorders, drug addiction and toxicity. Since our 2012 NAR report, we have added >70 new data sets, including data from Collaborative Cross lines and Diversity Outbred mice. During this time we have completely revamped our homepage, improved search and navigational aspects of the MPD application, developed several web-enabled data analysis and visualization tools, annotated phenotype data to public ontologies, developed an ontology browser and released new single nucleotide polymorphism query functionality with much higher density coverage than before. Here, we summarize recent data acquisitions and describe our latest improvements. PMID:24243846
PedAM: a database for Pediatric Disease Annotation and Medicine.
Jia, Jinmeng; An, Zhongxin; Ming, Yue; Guo, Yongli; Li, Wei; Li, Xin; Liang, Yunxiang; Guo, Dongming; Tai, Jun; Chen, Geng; Jin, Yaqiong; Liu, Zhimei; Ni, Xin; Shi, Tieliu
2018-01-04
There is a significant number of children around the world suffering from the consequences of misdiagnosis and ineffective treatment for various diseases. To facilitate precision medicine in pediatrics, a database, namely the Pediatric Disease Annotations & Medicines (PedAM), has been built to standardize and classify pediatric diseases. The PedAM integrates both biomedical resources and clinical data from Electronic Medical Records to support the development of computational tools, which enables robust data analysis and integration. It also uses disease-manifestation (D-M) integrated from existing biomedical ontologies as prior knowledge to automatically recognize text-mined, D-M-specific syntactic patterns from 774 514 full-text articles and 8 848 796 abstracts in MEDLINE. Additionally, disease connections based on phenotypes or genes can be visualized on the web page of PedAM. Currently, the PedAM contains 8528 standardized pediatric disease terms (4542 unique disease concepts and 3986 synonyms) with eight annotation fields for each disease, including definition, synonyms, gene, symptom, cross-reference (Xref), human phenotypes and their corresponding phenotypes in the mouse. The database PedAM is freely accessible at http://www.unimd.org/pedam/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Neural/Bayes network predictor for inheritable cardiac disease pathogenicity and phenotype.
Burghardt, Thomas P; Ajtai, Katalin
2018-04-11
The cardiac muscle sarcomere contains multiple proteins contributing to contraction energy transduction and its regulation during a heartbeat. Inheritable heart disease mutants affect most of them but none more frequently than the ventricular myosin motor and cardiac myosin binding protein c (mybpc3). These co-localizing proteins have mybpc3 playing a regulatory role to the energy transducing motor. Residue substitution and functional domain assignment of each mutation in the protein sequence decides, under the direction of a sensible disease model, phenotype and pathogenicity. The unknown model mechanism is decided here using a method combining neural and Bayes networks. Missense single nucleotide polymorphisms (SNPs) are clues for the disease mechanism summarized in an extensive database collecting mutant sequence location and residue substitution as independent variables that imply the dependent disease phenotype and pathogenicity characteristics in 4-dimensional data points (4ddps). The SNP database contains entries with the majority having one or both dependent data entries unfulfilled. A neural network relating causes (mutant residue location and substitution) and effects (phenotype and pathogenicity) is trained, validated, and optimized using fulfilled 4ddps. It then predicts unfulfilled 4ddps providing the implicit disease model. A discrete Bayes network interprets fulfilled and predicted 4ddps with conditional probabilities for phenotype and pathogenicity given mutation location and residue substitution thus relating the neural network implicit model to explicit features of the motor and mybpc3 sequence and structural domains. Neural/Bayes network forecasting automates disease mechanism modeling by leveraging the world wide human missense SNP database that is in place and expanding. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
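The conditional probabilities for phenotype given mutation location and residue substitution can be illustrated with a minimal counting sketch. It only tabulates relative frequencies from complete records; it is not the neural or Bayes network of the study, and the domain, substitution, and phenotype labels are hypothetical placeholders.

```python
from collections import Counter, defaultdict


def conditional_table(records):
    """Estimate P(phenotype | location, substitution) by relative frequency.

    `records` is an iterable of (location, substitution, phenotype) tuples
    in which all three fields are known.
    """
    counts = defaultdict(Counter)
    for location, substitution, phenotype in records:
        counts[(location, substitution)][phenotype] += 1
    table = {}
    for condition, phenotype_counts in counts.items():
        total = sum(phenotype_counts.values())
        table[condition] = {p: c / total for p, c in phenotype_counts.items()}
    return table


# Placeholder labels standing in for sequence domains, residue
# substitution classes, and disease phenotypes.
records = [
    ("domain_1", "sub_x", "phenotype_A"),
    ("domain_1", "sub_x", "phenotype_A"),
    ("domain_1", "sub_x", "phenotype_A"),
    ("domain_1", "sub_x", "phenotype_B"),
    ("domain_2", "sub_y", "phenotype_B"),
]

table = conditional_table(records)
print(table[("domain_1", "sub_x")])
```

With these made-up records, three of the four mutations at domain_1 with sub_x show phenotype_A, so its estimated conditional probability is 0.75. Plain frequency counts like this become unreliable when a condition has few records, which is one reason a model that can fill in incomplete entries is attractive.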
Properties and biotechnological applications of ice-binding proteins in bacteria.
Cid, Fernanda P; Rilling, Joaquín I; Graether, Steffen P; Bravo, Leon A; Mora, María de La Luz; Jorquera, Milko A
2016-06-01
Ice-binding proteins (IBPs), such as antifreeze proteins (AFPs) and ice-nucleating proteins (INPs), have been described in diverse cold-adapted organisms, and their potential applications in biotechnology have been recognized in various fields. Currently, both IBPs are being applied to biotechnological processes, primarily in medicine and the food industry. However, our knowledge regarding the diversity of bacterial IBPs is limited; few studies have purified and characterized AFPs and INPs from bacteria. Phenotypically verified IBPs have been described in members belonging to Gammaproteobacteria, Actinobacteria and Flavobacteriia classes, whereas putative IBPs have been found in Gammaproteobacteria, Alphaproteobacteria and Bacilli classes. Thus, the main goal of this minireview is to summarize the current information on bacterial IBPs and their application in biotechnology, emphasizing the potential application in less explored fields such as agriculture. Investigations have suggested the use of INP-producing bacteria antagonists and AFPs-producing bacteria (or their AFPs) as a very attractive strategy to prevent frost damages in crops. UniProt database analyses of reported IBPs (phenotypically verified) and putative IBPs also show the limited information available on bacterial IBPs and indicate that major studies are required. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
eRAM: encyclopedia of rare disease annotations for precision medicine.
Jia, Jinmeng; An, Zhongxin; Ming, Yue; Guo, Yongli; Li, Wei; Liang, Yunxiang; Guo, Dongming; Li, Xin; Tai, Jun; Chen, Geng; Jin, Yaqiong; Liu, Zhimei; Ni, Xin; Shi, Tieliu
2018-01-04
Rare diseases affect over a hundred million people worldwide, and most of these patients are not accurately diagnosed or effectively treated. The limited knowledge of rare diseases forms the biggest obstacle for improving their treatment. Detailed clinical phenotyping is considered as a keystone of deciphering genes and realizing the precision medicine for rare diseases. Here, we present a standardized system for various types of rare diseases, called encyclopedia of Rare disease Annotations for Precision Medicine (eRAM). eRAM was built by text-mining nearly 10 million scientific publications and electronic medical records, and integrating various data in existing recognized databases (such as Unified Medical Language System (UMLS), Human Phenotype Ontology, Orphanet, OMIM, GWAS). eRAM systematically incorporates currently available data on clinical manifestations and molecular mechanisms of rare diseases and uncovers many novel associations among diseases. eRAM provides enriched annotations for 15 942 rare diseases, yielding 6147 human disease related phenotype terms, 31 661 mammalian phenotype terms, 10 202 symptoms from UMLS, 18 815 genes and 92 580 genotypes. eRAM can not only provide information about rare disease mechanism but also facilitate clinicians to make accurate diagnostic and therapeutic decisions towards rare diseases. eRAM can be freely accessed at http://www.unimd.org/eram/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Fast Raman single bacteria identification: toward a routine in-vitro diagnostic
NASA Astrophysics Data System (ADS)
Douet, Alice; Josso, Quentin; Marchant, Adrien; Dutertre, Bertrand; Filiputti, Delphine; Novelli-Rousseau, Armelle; Espagnon, Isabelle; Kloster-Landsberg, Meike; Mallard, Frédéric; Perraut, Francois
2016-04-01
Timely microbiological results are essential to allow clinicians to optimize the prescribed treatment, ideally at the initial stage of the therapeutic process. Several approaches, such as molecular biology, have been proposed to solve this issue and to provide the microbiological result in a few hours directly from the sample. However fast and sensitive they are, those methods are not based on single phenotypic information, which presents several drawbacks and limitations. Optical methods have the advantage of allowing single-cell sensitivity and probing the phenotype of measured cells. Here we present a process and a prototype that allow automated single-bacteria phenotypic analysis. This prototype is based on the use of Digital In-line Holography techniques combined with a specially designed Raman spectrometer using a dedicated device to capture bacteria. The localization of a single cell is finely determined by using holograms and a proper propagation kernel. Holographic images are also used to analyze bacteria in the sample to sort potential pathogens from flora dwelling species or other biological particles. This accurate localization enables the use of a small confocal volume adapted to the measurement of a single cell. Along with the confocal volume adaptation, we have also modified every component of the spectrometer to optimize single-bacteria Raman measurements. This optimization allowed us to acquire informative single-cell spectra using an integration time of only 0.5 s. Identification results obtained with this prototype are presented based on a database of 65144 Raman spectra acquired automatically on 48 bacteria strains belonging to 8 species.
Fernandez-Ricaud, Luciano; Kourtchenko, Olga; Zackrisson, Martin; Warringer, Jonas; Blomberg, Anders
2016-06-23
Phenomics is a field in functional genomics that records variation in organismal phenotypes in the genetic, epigenetic or environmental context at a massive scale. For microbes, the key phenotype is the growth in population size because it contains information that is directly linked to fitness. Due to technical innovations and extensive automation our capacity to record complex and dynamic microbial growth data is rapidly outpacing our capacity to dissect and visualize this data and extract the fitness components it contains, hampering progress in all fields of microbiology. To automate visualization, analysis and exploration of complex and highly resolved microbial growth data as well as standardized extraction of the fitness components it contains, we developed the software PRECOG (PREsentation and Characterization Of Growth-data). PRECOG allows the user to quality control, interact with and evaluate microbial growth data with ease, speed and accuracy, also in cases of non-standard growth dynamics. Quality indices filter high- from low-quality growth experiments, reducing false positives. The pre-processing filters in PRECOG are computationally inexpensive and yet functionally comparable to more complex neural network procedures. We provide examples where data calibration, project design and feature extraction methodologies have a clear impact on the estimated growth traits, emphasising the need for proper standardization in data analysis. PRECOG is a tool that streamlines growth data pre-processing, phenotypic trait extraction, visualization, distribution and the creation of vast and informative phenomics databases.
Post, Andrew R.; Kurc, Tahsin; Cholleti, Sharath; Gao, Jingjing; Lin, Xia; Bornstein, William; Cantrell, Dedra; Levine, David; Hohmann, Sam; Saltz, Joel H.
2013-01-01
Objective To create an analytics platform for specifying and detecting clinical phenotypes and other derived variables in electronic health record (EHR) data for quality improvement investigations. Materials and Methods We have developed an architecture for an Analytic Information Warehouse (AIW). It supports transforming data represented in different physical schemas into a common data model, specifying derived variables in terms of the common model to enable their reuse, computing derived variables while enforcing invariants and ensuring correctness and consistency of data transformations, long-term curation of derived data, and export of derived data into standard analysis tools. It includes software that implements these features and a computing environment that enables secure high-performance access to and processing of large datasets extracted from EHRs. Results We have implemented and deployed the architecture in production locally. The software is available as open source. We have used it as part of hospital operations in a project to reduce rates of hospital readmission within 30 days. The project examined the association of over 100 derived variables representing disease and co-morbidity phenotypes with readmissions in five years of data from our institution’s clinical data warehouse and the UHC Clinical Database (CDB). The CDB contains administrative data from over 200 hospitals that are in academic medical centers or affiliated with such centers. Discussion and Conclusion A widely available platform for managing and detecting phenotypes in EHR data could accelerate the use of such data in quality improvement and comparative effectiveness studies. PMID:23402960
USDA-ARS's Scientific Manuscript database
Because less than one-third of clinically relevant fusaria can be accurately identified to species level using phenotypic data (i.e., morphological species recognition), we constructed a three-locus DNA sequence database to facilitate molecular identification of the 69 Fusarium species associated wi...
D'Souza, Mark; Sulakhe, Dinanath; Wang, Sheng; Xie, Bing; Hashemifar, Somaye; Taylor, Andrew; Dubchak, Inna; Conrad Gilliam, T; Maltsev, Natalia
2017-01-01
Recent technological advances in genomics allow the production of biological data at unprecedented tera- and petabyte scales. Efficient mining of these vast and complex datasets for the needs of biomedical research critically depends on a seamless integration of the clinical, genomic, and experimental information with prior knowledge about genotype-phenotype relationships. Such experimental data accumulated in publicly available databases should be accessible to a variety of algorithms and analytical pipelines that drive computational analysis and data mining. We present an integrated computational platform Lynx (Sulakhe et al., Nucleic Acids Res 44:D882-D887, 2016) (http://lynx.cri.uchicago.edu), a web-based database and knowledge extraction engine. It provides advanced search capabilities and a variety of algorithms for enrichment analysis and network-based gene prioritization. It gives public access to the Lynx integrated knowledge base (LynxKB) and its analytical tools via user-friendly web services and interfaces. The Lynx service-oriented architecture supports annotation and analysis of high-throughput experimental data. Lynx tools assist the user in extracting meaningful knowledge from LynxKB and experimental data, and in the generation of weighted hypotheses regarding the genes and molecular mechanisms contributing to human phenotypes or conditions of interest. The goal of this integrated platform is to support the end-to-end analytical needs of various translational projects.
Wang, Yongcui; Chen, Shilong; Deng, Naiyang; Wang, Yong
2013-01-01
Computational inference of novel therapeutic values for existing drugs, i.e., drug repositioning, offers the great prospect for faster and low-risk drug development. Previous research has indicated that chemical structures, target proteins, and side-effects could provide rich information in drug similarity assessment and further disease similarity. However, each single data source is important in its own way and data integration holds the great promise to reposition drugs more accurately. Here, we propose a new method for drug repositioning, PreDR (Predict Drug Repositioning), to integrate molecular structure, molecular activity, and phenotype data. Specifically, we characterize drugs by profiling in chemical structure, target protein, and side-effects space, and define a kernel function to correlate drugs with diseases. Then we train a support vector machine (SVM) to computationally predict novel drug-disease interactions. PreDR is validated on a well-established drug-disease network with 1,933 interactions among 593 drugs and 313 diseases. By cross-validation, we find that chemical structure, drug target, and side-effects information are all predictive for drug-disease relationships. More experimentally observed drug-disease interactions can be revealed by integrating these three data sources. Comparison with existing methods demonstrates that PreDR is competitive both in accuracy and coverage. Follow-up database search and pathway analysis indicate that our new predictions are worthy of further experimental validation. In particular, several novel predictions are supported by clinical trials databases, and this shows the significant prospects of PreDR in future drug treatment. In conclusion, our new method, PreDR, can serve as a useful tool in drug discovery to efficiently identify novel drug-disease interactions. In addition, our heterogeneous data integration framework can be applied to other problems. PMID:24244318
Breast MRI radiomics: comparison of computer- and human-extracted imaging phenotypes.
Sutton, Elizabeth J; Huang, Erich P; Drukker, Karen; Burnside, Elizabeth S; Li, Hui; Net, Jose M; Rao, Arvind; Whitman, Gary J; Zuley, Margarita; Ganott, Marie; Bonaccio, Ermelinda; Giger, Maryellen L; Morris, Elizabeth A
2017-01-01
In this study, we sought to investigate if computer-extracted magnetic resonance imaging (MRI) phenotypes of breast cancer could replicate human-extracted size and Breast Imaging-Reporting and Data System (BI-RADS) imaging phenotypes using MRI data from The Cancer Genome Atlas (TCGA) project of the National Cancer Institute. Our retrospective interpretation study involved analysis of Health Insurance Portability and Accountability Act-compliant breast MRI data from The Cancer Imaging Archive, an open-source database from the TCGA project. This study was exempt from institutional review board approval at Memorial Sloan Kettering Cancer Center and the need for informed consent was waived. Ninety-one pre-operative breast MRIs with verified invasive breast cancers were analysed. Three fellowship-trained breast radiologists evaluated the index cancer in each case according to size and the BI-RADS lexicon for shape, margin, and enhancement (human-extracted image phenotypes [HEIP]). Human inter-observer agreement was analysed by the intra-class correlation coefficient (ICC) for size and Krippendorff's α for other measurements. Quantitative MRI radiomics of computerised three-dimensional segmentations of each cancer generated computer-extracted image phenotypes (CEIP). Spearman's rank correlation coefficients were used to compare HEIP and CEIP. Inter-observer agreement for HEIP varied, with the highest agreement seen for size (ICC 0.679) and shape (ICC 0.527). The computer-extracted maximum linear size replicated the human measurement with p < 10^-12. CEIP of shape, specifically sphericity and irregularity, replicated HEIP with both p values < 0.001. CEIP did not demonstrate agreement with HEIP of tumour margin or internal enhancement. Quantitative radiomics of breast cancer may replicate human-extracted tumour size and BI-RADS imaging phenotypes, thus enabling precision medicine.
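As a rough illustration of the Spearman rank correlation used above to compare HEIP and CEIP, the following sketch computes the coefficient with the standard library only. The measurement values are made up, the function names are hypothetical, and this is not the study's analysis code; it assumes neither input is constant.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical tumour sizes in mm (illustrative values, not study data).
human_size = [12.0, 25.0, 18.0, 40.0, 31.0]
computer_size = [11.5, 27.0, 17.0, 38.0, 33.5]
rho = spearman_rho(human_size, computer_size)
```

Because the two toy lists order the cases identically, the coefficient here is 1; real reader and computer measurements would typically give a lower value.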
Rationale and uses of a public HIV drug-resistance database.
Shafer, Robert W
2006-09-15
Knowledge regarding the drug resistance of human immunodeficiency virus (HIV) is critical for surveillance of drug resistance, development of antiretroviral drugs, and management of infections with drug-resistant viruses. Such knowledge is derived from studies that correlate genetic variation in the targets of therapy with the antiretroviral treatments received by persons from whom the variant was obtained (genotype-treatment), with drug-susceptibility data on genetic variants (genotype-phenotype), and with virological and clinical response to a new treatment regimen (genotype-outcome). An HIV drug-resistance database is required to represent, store, and analyze the diverse forms of data underlying our knowledge of drug resistance and to make these data available to the broad community of researchers studying drug resistance in HIV and clinicians using HIV drug-resistance tests. Such genotype-treatment, genotype-phenotype, and genotype-outcome correlations are contained in the Stanford HIV RT and Protease Sequence Database and have specific usefulness.
Czech multicenter research database of severe COPD
Novotna, Barbora; Koblizek, Vladimir; Zatloukal, Jaromir; Plutinsky, Marek; Hejduk, Karel; Zbozinkova, Zuzana; Jarkovsky, Jiri; Sobotik, Ondrej; Dvorak, Tomas; Safranek, Petr
2014-01-01
Purpose Chronic obstructive pulmonary disease (COPD) has been recognized as a heterogeneous, multiple organ system-affecting disorder. The Global Initiative for Chronic Obstructive Lung Disease (GOLD) places emphasis on symptom and exacerbation management. The aim of this study is to examine the course of COPD and its impact on morbidity and all-cause mortality of patients, with respect to individual phenotypes and GOLD categories. This study will also evaluate COPD real-life patient care in the Czech Republic. Patients and methods The Czech Multicentre Research Database of COPD is projected to last for 5 years, with the aim of enrolling 1,000 patients. This is a multicenter, observational, and prospective study of patients with severe COPD (post-bronchodilator forced expiratory volume in 1 second ≤60%). Every consecutive patient who fulfils the inclusion criteria is asked to participate in the study. Patient recruitment is done on the basis of signed informed consent. The study was approved by the Multicentre Ethical Committee in Brno, Czech Republic. Results The objective of this paper was to outline the methodology of this study. Conclusion The establishment of the database is a useful step in improving care for COPD subjects. Additionally, it will serve as a source of data elucidating the natural course of COPD, comorbidities, and overall impact on the patients. Moreover, it will provide information on the diverse course of the COPD syndrome in the Czech Republic. PMID:25419124
The Human Phenotype Ontology in 2017
Köhler, Sebastian; Vasilevsky, Nicole A.; Engelstad, Mark; Foster, Erin; McMurry, Julie; Aymé, Ségolène; Baynam, Gareth; Bello, Susan M.; Boerkoel, Cornelius F.; Boycott, Kym M.; Brudno, Michael; Buske, Orion J.; Chinnery, Patrick F.; Cipriani, Valentina; Connell, Laureen E.; Dawkins, Hugh J.S.; DeMare, Laura E.; Devereau, Andrew D.; de Vries, Bert B.A.; Firth, Helen V.; Freson, Kathleen; Greene, Daniel; Hamosh, Ada; Helbig, Ingo; Hum, Courtney; Jähn, Johanna A.; James, Roger; Krause, Roland; F. Laulederkind, Stanley J.; Lochmüller, Hanns; Lyon, Gholson J.; Ogishima, Soichi; Olry, Annie; Ouwehand, Willem H.; Pontikos, Nikolas; Rath, Ana; Schaefer, Franz; Scott, Richard H.; Segal, Michael; Sergouniotis, Panagiotis I.; Sever, Richard; Smith, Cynthia L.; Straub, Volker; Thompson, Rachel; Turner, Catherine; Turro, Ernest; Veltman, Marijcke W.M.; Vulliamy, Tom; Yu, Jing; von Ziegenweidt, Julie; Zankl, Andreas; Züchner, Stephan; Zemojtel, Tomasz; Jacobsen, Julius O.B.; Groza, Tudor; Smedley, Damian; Mungall, Christopher J.; Haendel, Melissa; Robinson, Peter N.
2017-01-01
Deep phenotyping has been defined as the precise and comprehensive analysis of phenotypic abnormalities in which the individual components of the phenotype are observed and described. The three components of the Human Phenotype Ontology (HPO; www.human-phenotype-ontology.org) project are the phenotype vocabulary, disease-phenotype annotations and the algorithms that operate on these. These components are being used for computational deep phenotyping and precision medicine as well as integration of clinical data into translational research. The HPO is being increasingly adopted as a standard for phenotypic abnormalities by diverse groups such as international rare disease organizations, registries, clinical labs, biomedical resources, and clinical software tools and will thereby contribute toward nascent efforts at global data exchange for identifying disease etiologies. This update article reviews the progress of the HPO project since the debut Nucleic Acids Research database article in 2014, including specific areas of expansion such as common (complex) disease, new algorithms for phenotype driven genomic discovery and diagnostics, integration of cross-species mapping efforts with the Mammalian Phenotype Ontology, an improved quality control pipeline, and the addition of patient-friendly terminology. PMID:27899602
The Human Phenotype Ontology in 2017
Köhler, Sebastian; Vasilevsky, Nicole A.; Engelstad, Mark; ...
2016-11-24
Deep phenotyping has been defined as the precise and comprehensive analysis of phenotypic abnormalities in which the individual components of the phenotype are observed and described. The three components of the Human Phenotype Ontology (HPO; www.human-phenotype-ontology.org) project are the phenotype vocabulary, disease-phenotype annotations and the algorithms that operate on these. These components are being used for computational deep phenotyping and precision medicine as well as integration of clinical data into translational research. The HPO is being increasingly adopted as a standard for phenotypic abnormalities by diverse groups such as international rare disease organizations, registries, clinical labs, biomedical resources, and clinical software tools and will thereby contribute toward nascent efforts at global data exchange for identifying disease etiologies. This update article reviews the progress of the HPO project since the debut Nucleic Acids Research database article in 2014, including specific areas of expansion such as common (complex) disease, new algorithms for phenotype driven genomic discovery and diagnostics, integration of cross-species mapping efforts with the Mammalian Phenotype Ontology, an improved quality control pipeline, and the addition of patient-friendly terminology.
Phenome-driven disease genetics prediction toward drug discovery.
Chen, Yang; Li, Li; Zhang, Guo-Qiang; Xu, Rong
2015-06-15
Discerning genetic contributions to diseases not only enhances our understanding of disease mechanisms, but also leads to translational opportunities for drug discovery. Recent computational approaches incorporate disease phenotypic similarities to improve the prediction power of disease gene discovery. However, most current studies used only one data source of human disease phenotype. We present an innovative and generic strategy for combining multiple different data sources of human disease phenotype and predicting disease-associated genes from integrated phenotypic and genomic data. To demonstrate our approach, we explored a new phenotype database from biomedical ontologies and constructed Disease Manifestation Network (DMN). We combined DMN with mimMiner, which was a widely used phenotype database in disease gene prediction studies. Our approach achieved significantly improved performance over a baseline method, which used only one phenotype data source. In the leave-one-out cross-validation and de novo gene prediction analysis, our approach achieved the area under the curves of 90.7% and 90.3%, which are significantly higher than 84.2% (P < e(-4)) and 81.3% (P < e(-12)) for the baseline approach. We further demonstrated that our predicted genes have the translational potential in drug discovery. We used Crohn's disease as an example and ranked the candidate drugs based on the rank of drug targets. Our gene prediction approach prioritized druggable genes that are likely to be associated with Crohn's disease pathogenesis, and our rank of candidate drugs successfully prioritized the Food and Drug Administration-approved drugs for Crohn's disease. We also found literature evidence to support a number of drugs among the top 200 candidates. In summary, we demonstrated that a novel strategy combining unique disease phenotype data with system approaches can lead to rapid drug discovery. nlp. edu/public/data/DMN © The Author 2015. Published by Oxford University Press.
DNA variant databases improve test accuracy and phenotype prediction in Alport syndrome.
Savige, Judy; Ars, Elisabet; Cotton, Richard G H; Crockett, David; Dagher, Hayat; Deltas, Constantinos; Ding, Jie; Flinter, Frances; Pont-Kingdon, Genevieve; Smaoui, Nizar; Torra, Roser; Storey, Helen
2014-06-01
X-linked Alport syndrome is a form of progressive renal failure caused by pathogenic variants in the COL4A5 gene. More than 700 variants have been described and a further 400 are estimated to be known to individual laboratories but are unpublished. The major genetic testing laboratories for X-linked Alport syndrome worldwide have established a Web-based database for published and unpublished COL4A5 variants ( https://grenada.lumc.nl/LOVD2/COL4A/home.php?select_db=COL4A5 ). This conforms with the recommendations of the Human Variome Project: it uses the Leiden Open Variation Database (LOVD) format, describes variants according to the human reference sequence with standardized nomenclature, indicates likely pathogenicity and associated clinical features, and credits the submitting laboratory. The database includes non-pathogenic and recurrent variants, and is linked to another COL4A5 mutation database and relevant bioinformatics sites. Access is free. Increasing the number of COL4A5 variants in the public domain helps patients, diagnostic laboratories, clinicians, and researchers. The database improves the accuracy and efficiency of genetic testing because its variants are already categorized for pathogenicity. The description of further COL4A5 variants and clinical associations will improve our ability to predict phenotype and our understanding of collagen IV biochemistry. The database for X-linked Alport syndrome represents a model for databases in other inherited renal diseases.
Increasing rigor in NMR-based metabolomics through validated and open source tools
Eghbalnia, Hamid R; Romero, Pedro R; Westler, William M; Baskaran, Kumaran; Ulrich, Eldon L; Markley, John L
2016-01-01
The metabolome, the collection of small molecules associated with an organism, is a growing subject of inquiry, with the data utilized for data-intensive systems biology, disease diagnostics, biomarker discovery, and the broader characterization of small molecules in mixtures. Owing to their close proximity to the functional endpoints that govern an organism’s phenotype, metabolites are highly informative about functional states. The field of metabolomics identifies and quantifies endogenous and exogenous metabolites in biological samples. Information acquired from nuclear magnetic resonance (NMR) spectroscopy, mass spectrometry (MS), and the published literature, as processed by statistical approaches, is driving increasingly wider applications of metabolomics. This review focuses on the role of databases and software tools in advancing the rigor, robustness, reproducibility, and validation of metabolomics studies. PMID:27643760
Increasing rigor in NMR-based metabolomics through validated and open source tools.
Eghbalnia, Hamid R; Romero, Pedro R; Westler, William M; Baskaran, Kumaran; Ulrich, Eldon L; Markley, John L
2017-02-01
The metabolome, the collection of small molecules associated with an organism, is a growing subject of inquiry, with the data utilized for data-intensive systems biology, disease diagnostics, biomarker discovery, and the broader characterization of small molecules in mixtures. Owing to their close proximity to the functional endpoints that govern an organism's phenotype, metabolites are highly informative about functional states. The field of metabolomics identifies and quantifies endogenous and exogenous metabolites in biological samples. Information acquired from nuclear magnetic resonance (NMR) spectroscopy, mass spectrometry (MS), and the published literature, as processed by statistical approaches, is driving increasingly wider applications of metabolomics. This review focuses on the role of databases and software tools in advancing the rigor, robustness, reproducibility, and validation of metabolomics studies. Copyright © 2016. Published by Elsevier Ltd.
Trezza, Alfonso; Bernini, Andrea; Langella, Andrea; Ascher, David B; Pires, Douglas E V; Sodi, Andrea; Passerini, Ilaria; Pelo, Elisabetta; Rizzo, Stanislao; Niccolai, Neri; Spiga, Ottavia
2017-10-01
The aim of this article is to report the investigation of the structural features of ABCA4, a protein associated with a genetic retinal disease. A new database collecting knowledge of ABCA4 structure may facilitate predictions about the possible functional consequences of gene mutations observed in clinical practice. In order to correlate structural and functional effects of the observed mutations, the structure of mouse P-glycoprotein was used as a template for homology modeling. The obtained structural information and genetic data are the basis of our relational database (ABCA4Database). Sequence variability among all ABCA4-deposited entries was calculated and reported as Shannon entropy score at the residue level. The three-dimensional model of ABCA4 structure was used to locate the spatial distribution of the observed variable regions. Our predictions from structural in silico tools were able to accurately link the functional effects of mutations to phenotype. The development of the ABCA4Database gathers all the available genetic and structural information, yielding a global view of the molecular basis of some retinal diseases. ABCA4 modeled structure provides a molecular basis on which to analyze protein sequence mutations related to genetic retinal disease in order to predict the risk of retinal disease across all possible ABCA4 mutations. Additionally, our ABCA4 predicted structure is a good starting point for the creation of a new data analysis model, appropriate for precision medicine, in order to develop a deeper knowledge network of the disease and to improve the management of patients.
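The per-residue Shannon entropy score described above can be sketched as follows. This is a minimal illustration under the assumption that the sequences are already aligned and of equal length; the toy sequences and function names are hypothetical and unrelated to the ABCA4Database entries.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy, in bits, of the residues observed in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def per_residue_entropy(aligned_sequences):
    """Entropy score for each column of equal-length aligned sequences."""
    return [column_entropy(column) for column in zip(*aligned_sequences)]


# Made-up aligned sequences for illustration only.
toy_alignment = ["MKV", "MRV", "MKL", "MRL"]
scores = per_residue_entropy(toy_alignment)
```

In this toy alignment the first position is fully conserved and scores zero, while the other two positions each split evenly between two residues and score one bit. Higher scores mark more variable positions.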
Blumenfeld, Olga O
2002-04-01
Recent advances in molecular biology and technology have provided evidence, at a molecular level, for long-known observations that the human genome is not unique but is characterized by individual sequence variation. At the present time, documentation of genetic variation occurring in a large number of genes is increasing exponentially. The characterization of alleles that encode a variety of blood group antigens has been particularly fruitful for transfusion medicine. Phenotypic variation, as identified by the serologic study of blood group variants, is required to identify the presence of a variant allele. Many of the other alleles currently recorded have been selected and identified on the basis of inherited disease traits. New approaches document single nucleotide polymorphisms that occur throughout the genome and best show how the DNA sequence varies in the human population. The primary data dealing with variant alleles or more general genomic variation are scattered throughout the scientific literature and only within the last few years has information begun to be organized into databases. This article provides guidance on how to access those databases online as a source of information about genetic variation for purposes of molecular, clinical, and diagnostic medicine, research, and teaching. The attributes of the sites are described. A more detailed view of the database dealing specifically with alleles of genes encoding the blood group antigens includes a brief preliminary analysis of the molecular basis for observed polymorphisms. Other online sites that may be particularly useful to the transfusion medicine readership as well as a brief historical account are also presented. Copyright 2002, Elsevier Science (USA). All rights reserved.
Towards improving phenotype representation in OWL
2012-01-01
Background Phenotype ontologies are used in species-specific databases for the annotation of mutagenesis experiments and to characterize human diseases. The Entity-Quality (EQ) formalism is a means to describe complex phenotypes based on one or more affected entities and a quality. EQ-based definitions have been developed for many phenotype ontologies, including the Human and Mammalian Phenotype ontologies. Methods We analyze formalizations of complex phenotype descriptions in the Web Ontology Language (OWL) that are based on the EQ model, identify several representational challenges and analyze potential solutions to address these challenges. Results In particular, we suggest a novel, role-based approach to represent relational qualities such as concentration of iron in spleen, discuss its ontological foundation in the General Formal Ontology (GFO) and evaluate its representation in OWL and the benefits it can bring to the representation of phenotype annotations. Conclusion Our analysis of OWL-based representations of phenotypes can contribute to improving consistency and expressiveness of formal phenotype descriptions. PMID:23046625
Combined metopic and unilateral coronal synostoses: a phenotypic conundrum.
Sauerhammer, Tina M; Patel, Kamlesh; Oh, Albert K; Proctor, Mark R; Mulliken, John B; Rogers, Gary F
2014-03-01
Most types of craniosynostosis cause predictable changes in cranial shape. However, the phenotype of combined metopic and unilateral coronal synostoses is anomalous. The purpose of this observational study was to better clarify the clinical and radiographic features of this rare entity. A retrospective review of a craniofacial database was performed. Patients with combined metopic and unilateral coronal synostoses were included in this study. Data collected included demographic information, physical and radiographic findings, genetic evaluation, treatment, and operative outcomes. Of 687 patients treated between 1989 and 2010, only 3 patients had combined metopic and unilateral coronal synostoses. All patients were diagnosed through computed tomography on the first day of life. Phenotypic features included the following: (1) narrowed forehead with a prominent midline ridge, (2) severe bilateral brow retrusion with an acute indentation on the side of the patent coronal suture, (3) facial and nasal angulation similar to isolated unilateral coronal synostosis, and (4) anterior displacement of the ear on the fused side. In addition, the cranial vertex was deviated toward the side of the open coronal suture. Two patients had a head circumference below the 25th percentile; 2 of the 3 had a TWIST gene mutation consistent with Saethre-Chotzen syndrome. One patient was managed through fronto-orbital advancement and required a revision. The other 2 patients had early endoscopic release, followed by postoperative helmet therapy; one improved but still required open cranial remodeling. The other has near-normal phenotype, and no further surgery is planned. Combined metopic and unilateral coronal synostoses present a rare and unusual phenotype. Although early intervention improves the deformity, revisional procedures are usually required.
Coffin-Siris syndrome and the BAF complex: genotype-phenotype study in 63 patients.
Santen, Gijs W E; Aten, Emmelien; Vulto-van Silfhout, Anneke T; Pottinger, Caroline; van Bon, Bregje W M; van Minderhout, Ivonne J H M; Snowdowne, Ronelle; van der Lans, Christian A C; Boogaard, Merel; Linssen, Margot M L; Vijfhuizen, Linda; van der Wielen, Michiel J R; Vollebregt, M J Ellen; Breuning, Martijn H; Kriek, Marjolein; van Haeringen, Arie; den Dunnen, Johan T; Hoischen, Alexander; Clayton-Smith, Jill; de Vries, Bert B A; Hennekam, Raoul C M; van Belzen, Martine J
2013-11-01
De novo germline variants in several components of the SWI/SNF-like BAF complex can cause Coffin-Siris syndrome (CSS), Nicolaides-Baraitser syndrome (NCBRS), and nonsyndromic intellectual disability. We screened 63 patients with a clinical diagnosis of CSS for these genes (ARID1A, ARID1B, SMARCA2, SMARCA4, SMARCB1, and SMARCE1) and identified pathogenic variants in 45 (71%) patients. We found a high proportion of variants in ARID1B (68%). All four pathogenic variants in ARID1A appeared to be mosaic. By using all variants from the Exome Variant Server as test data, we were able to classify variants in ARID1A, ARID1B, and SMARCB1 reliably as being pathogenic or nonpathogenic. For SMARCA2, SMARCA4, and SMARCE1 several variants in the EVS remained unclassified, underlining the importance of parental testing. We have entered all variant and clinical information in LOVD-powered databases to facilitate further genotype-phenotype correlations, as these will become increasingly important because of the uptake of targeted and untargeted next generation sequencing in diagnostics. The emerging phenotype-genotype correlation is that SMARCB1 patients have the most marked physical phenotype and severe cognitive and growth delay. The variability in phenotype seems most marked in ARID1A and ARID1B patients. Distal limbs anomalies are most marked in ARID1A patients and least in SMARCB1 patients. Numbers are small however, and larger series are needed to confirm this correlation. © 2013 WILEY PERIODICALS, INC.
Damming the genomic data flood using a comprehensive analysis and storage data structure
Bouffard, Marc; Phillips, Michael S.; Brown, Andrew M.K.; Marsh, Sharon; Tardif, Jean-Claude; van Rooij, Tibor
2010-01-01
Data generation, driven by rapid advances in genomic technologies, is fast outpacing our analysis capabilities. Faced with this flood of data, more hardware and software resources are added to accommodate data sets whose structure has not specifically been designed for analysis. This leads to unnecessarily lengthy processing times and excessive data handling and storage costs. Current efforts to address this have centered on developing new indexing schemas and analysis algorithms, whereas the root of the problem lies in the format of the data itself. We have developed a new data structure for storing and analyzing genotype and phenotype data. By leveraging data normalization techniques, database management system capabilities and the use of a novel multi-table, multidimensional database structure we have eliminated the following: (i) unnecessarily large data set size due to high levels of redundancy, (ii) sequential access to these data sets and (iii) common bottlenecks in analysis times. The resulting novel data structure horizontally divides the data to circumvent traditional problems associated with the use of databases for very large genomic data sets. The resulting data set required 86% less disk space and performed analytical calculations 6248 times faster compared to a standard approach without any loss of information. Database URL: http://castor.pharmacogenomics.ca PMID:21159730
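The general idea of reducing redundancy through normalization can be sketched as follows. This is a generic illustration using an in-memory SQLite database, not the authors' data structure; the table names, column names, and marker identifiers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Samples and markers are stored once; genotype rows refer to them by integer key
# instead of repeating sample labels and marker identifiers on every row.
conn.executescript("""
CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, label TEXT UNIQUE);
CREATE TABLE marker (marker_id INTEGER PRIMARY KEY, rsid TEXT UNIQUE);
CREATE TABLE genotype (
    sample_id   INTEGER REFERENCES sample(sample_id),
    marker_id   INTEGER REFERENCES marker(marker_id),
    allele_call TEXT,
    PRIMARY KEY (sample_id, marker_id)
);
""")

conn.executemany("INSERT INTO sample VALUES (?, ?)",
                 [(1, "S1"), (2, "S2"), (3, "S3")])
conn.executemany("INSERT INTO marker VALUES (?, ?)",
                 [(1, "rs_example_1"), (2, "rs_example_2")])
conn.executemany("INSERT INTO genotype VALUES (?, ?, ?)",
                 [(1, 1, "AA"), (2, 1, "AG"), (3, 1, "AA"),
                  (1, 2, "CC"), (2, 2, "CC"), (3, 2, "CT")])

# Genotype counts for one marker, answered by keyed lookup rather than a file scan.
counts = dict(conn.execute(
    """SELECT g.allele_call, COUNT(*)
       FROM genotype g JOIN marker m ON m.marker_id = g.marker_id
       WHERE m.rsid = ?
       GROUP BY g.allele_call""",
    ("rs_example_1",),
).fetchall())
print(counts)
```

The disk-space and speed figures reported in the abstract apply to the authors' own structure and cannot be inferred from a toy schema like this one.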
DOE Office of Scientific and Technical Information (OSTI.GOV)
Köhler, Sebastian; Vasilevsky, Nicole A.; Engelstad, Mark
Deep phenotyping has been defined as the precise and comprehensive analysis of phenotypic abnormalities in which the individual components of the phenotype are observed and described. The three components of the Human Phenotype Ontology (HPO; www.human-phenotype-ontology.org) project are the phenotype vocabulary, disease-phenotype annotations and the algorithms that operate on these. These components are being used for computational deep phenotyping and precision medicine as well as integration of clinical data into translational research. The HPO is being increasingly adopted as a standard for phenotypic abnormalities by diverse groups such as international rare disease organizations, registries, clinical labs, biomedical resources, and clinical software tools and will thereby contribute toward nascent efforts at global data exchange for identifying disease etiologies. This update article reviews the progress of the HPO project since the debut Nucleic Acids Research database article in 2014, including specific areas of expansion such as common (complex) disease, new algorithms for phenotype driven genomic discovery and diagnostics, integration of cross-species mapping efforts with the Mammalian Phenotype Ontology, an improved quality control pipeline, and the addition of patient-friendly terminology.
Mungall, Christopher J; Emmert, David B
2007-07-01
A few years ago, FlyBase undertook to design a new database schema to store Drosophila data. It would fully integrate genomic sequence and annotation data with bibliographic, genetic, phenotypic and molecular data from the literature representing a distillation of the first 100 years of research on this major animal model system. In developing this new integrated schema, FlyBase also made a commitment to ensure that its design was generic, extensible and available as open source, so that it could be employed as the core schema of any model organism data repository, thereby avoiding redundant software development and potentially increasing interoperability. Our question was whether we could create a relational database schema that would be successfully reused. Chado is a relational database schema now being used to manage biological knowledge for a wide variety of organisms, from human to pathogens, especially the classes of information that directly or indirectly can be associated with genome sequences or the primary RNA and protein products encoded by a genome. Biological databases that conform to this schema can interoperate with one another, and with application software from the Generic Model Organism Database (GMOD) toolkit. Chado is distinctive because its design is driven by ontologies. The use of ontologies (or controlled vocabularies) is ubiquitous across the schema, as they are used as a means of typing entities. The Chado schema is partitioned into integrated subschemas (modules), each encapsulating a different biological domain, and each described using representations in appropriate ontologies. To illustrate this methodology, we describe here the Chado modules used for describing genomic sequences. GMOD is a collaboration of several model organism database groups, including FlyBase, to develop a set of open-source software for managing model organism data. 
The Chado schema is freely distributed under the terms of the Artistic License (http://www.opensource.org/licenses/artistic-license.php) from GMOD (www.gmod.org).
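The ontology-driven typing described in this abstract can be sketched with a heavily reduced schema. The table names `cv`, `cvterm` and `feature`, and the `type_id` column, follow Chado, but the example keeps only a few columns, runs on SQLite for convenience, and uses invented feature names; it should be read as an assumption-laden illustration rather than the actual Chado DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced sketch: a feature row carries no hard-coded type column per kind of entity.
# Its type is a reference to a controlled-vocabulary term.
conn.executescript("""
CREATE TABLE cv     (cv_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY,
                     cv_id INTEGER REFERENCES cv(cv_id),
                     name TEXT);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY,
                      uniquename TEXT,
                      type_id INTEGER REFERENCES cvterm(cvterm_id));
""")

conn.execute("INSERT INTO cv VALUES (1, 'sequence')")
conn.executemany("INSERT INTO cvterm VALUES (?, 1, ?)",
                 [(10, "gene"), (11, "mRNA")])
# Hypothetical features, typed by pointing at vocabulary terms.
conn.executemany("INSERT INTO feature VALUES (?, ?, ?)",
                 [(100, "example_gene_1", 10),
                  (101, "example_transcript_1", 11),
                  (102, "example_gene_2", 10)])

# Select all features whose type is the vocabulary term 'gene'.
genes = [row[0] for row in conn.execute(
    """SELECT f.uniquename
       FROM feature f JOIN cvterm t ON t.cvterm_id = f.type_id
       WHERE t.name = 'gene'
       ORDER BY f.feature_id""")]
print(genes)
```

Because the type is data rather than schema, a new kind of feature can be supported by adding a vocabulary term instead of altering tables.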
Bedside Back to Bench: Building Bridges between Basic and Clinical Genomic Research.
Manolio, Teri A; Fowler, Douglas M; Starita, Lea M; Haendel, Melissa A; MacArthur, Daniel G; Biesecker, Leslie G; Worthey, Elizabeth; Chisholm, Rex L; Green, Eric D; Jacob, Howard J; McLeod, Howard L; Roden, Dan; Rodriguez, Laura Lyman; Williams, Marc S; Cooper, Gregory M; Cox, Nancy J; Herman, Gail E; Kingsmore, Stephen; Lo, Cecilia; Lutz, Cathleen; MacRae, Calum A; Nussbaum, Robert L; Ordovas, Jose M; Ramos, Erin M; Robinson, Peter N; Rubinstein, Wendy S; Seidman, Christine; Stranger, Barbara E; Wang, Haoyi; Westerfield, Monte; Bult, Carol
2017-03-23
Genome sequencing has revolutionized the diagnosis of genetic diseases. Close collaborations between basic scientists and clinical genomicists are now needed to link genetic variants with disease causation. To facilitate such collaborations, we recommend prioritizing clinically relevant genes for functional studies, developing reference variant-phenotype databases, adopting phenotype description standards, and promoting data sharing. Published by Elsevier Inc.
Bedside Back to Bench: Building Bridges between Basic and Clinical Genomic Research
Manolio, Teri A.; Fowler, Douglas M.; Starita, Lea M.; Haendel, Melissa A.; MacArthur, Daniel G.; Biesecker, Leslie G.; Worthey, Elizabeth; Chisholm, Rex L.; Green, Eric D.; Jacob, Howard J.; McLeod, Howard L.; Roden, Dan; Rodriguez, Laura Lyman; Williams, Marc S.; Cooper, Gregory M.; Cox, Nancy J.; Herman, Gail E.; Kingsmore, Stephen; Lo, Cecilia; Lutz, Cathleen; MacRae, Calum A.; Nussbaum, Robert L.; Ordovas, Jose M.; Ramos, Erin M.; Robinson, Peter N.; Rubinstein, Wendy S.; Seidman, Christine; Stranger, Barbara E.; Wang, Haoyi; Westerfield, Monte; Bult, Carol
2017-01-01
Summary Genome sequencing has revolutionized the diagnosis of genetic diseases. Close collaborations between basic scientists and clinical genomicists are now needed to link genetic variants with disease causation. To facilitate such collaborations we recommend prioritizing clinically relevant genes for functional studies, developing reference variant-phenotype databases, adopting phenotype description standards, and promoting data sharing. PMID:28340351
Klein, Stanley B.; Cosmides, Leda; Gangi, Cynthia E.; Jackson, Betsy; Tooby, John; Costabile, Kristi A.
2013-01-01
Over the past two decades, an abundance of evidence has shown that individuals typically rely on semantic summary knowledge when making trait judgments about self and others (for reviews, see Klein, 2004; Klein, Robertson, Gangi, & Loftus, 2008). But why form trait summaries if one can consult the original episodes on which the summary was based? Conversely, why retain episodes after having abstracted a summary representation from them? Are there functional reasons to have trait information represented in two different, independently retrievable databases? Evolution does not produce new phenotypic systems that are complex and functionally organized by chance. Such systems acquire their functional organization because they solved some evolutionarily recurrent problems for the organism. In this article we explore some of the functional properties of episodic memory. Specifically, in a series of studies we demonstrate that maintaining a database of episodic memories enables its owner to reevaluate an individual’s past behavior in light of new information, sometimes drastically changing one’s impression in the process. We conclude that some of the most important functions of episodic memory have to do with its role in human social interaction. PMID:23378680
Gene Expression Profiling of Benign and Malignant Pheochromocytoma
BROUWERS, FREDERIEKE M.; ELKAHLOUN, ABDEL G.; MUNSON, PETER J.; EISENHOFER, GRAEME; BARB, JENNIFER; LINEHAN, W. MARSTON; LENDERS, JACQUES W.M.; DE KRIJGER, RONALD; MANNELLI, MASSIMO; UDELSMAN, ROBERT; OCAL, IDRIS T.; SHULKIN, BARRY L.; BORNSTEIN, STEFAN R.; BREZA, JAN; KSINANTOVA, LUCIA; PACAK, KAREL
2016-01-01
There are currently no reliable diagnostic and prognostic markers or effective treatments for malignant pheochromocytoma. This study used oligonucleotide microarrays to examine gene expression profiles in pheochromocytomas from 90 patients, including 20 with malignant tumors, the latter including metastases and primary tumors from which metastases developed. Other subgroups of tumors included those defined by tissue norepinephrine compared to epinephrine contents (i.e., noradrenergic versus adrenergic phenotypes), adrenal versus extra-adrenal locations, and presence of germline mutations of genes predisposing to the tumor. Correcting for the confounding influence of noradrenergic versus adrenergic catecholamine phenotype by the analysis of variance revealed a larger and more accurate number of genes that discriminated benign from malignant pheochromocytomas than when the confounding influence of catecholamine phenotype was not considered. Seventy percent of these genes were underexpressed in malignant compared to benign tumors. Similarly, 89% of genes were underexpressed in malignant primary tumors compared to benign tumors, suggesting that malignant potential is largely characterized by a less-differentiated pattern of gene expression. The present database of differentially expressed genes provides a unique resource for mapping the pathways leading to malignancy and for establishing new targets for treatment and diagnostic and prognostic markers of malignant disease. The database may also be useful for examining mechanisms of tumorigenesis and genotype–phenotype relationships. Further progress on the basis of this database can be made from follow-up confirmatory studies, application of bioinformatics approaches for data mining and pathway analyses, testing in pheochromocytoma cell culture and animal model systems, and retrospective and prospective studies of diagnostic markers. PMID:17102123
Rallapalli, P M; Kemball-Cook, G; Tuddenham, E G; Gomez, K; Perkins, S J
2013-07-01
Factor IX (FIX) is important in the coagulation cascade, being activated to FIXa on cleavage. Defects in the human F9 gene frequently lead to hemophilia B. The objective was to assess 1113 unique F9 mutations corresponding to 3721 patient entries in a new and up-to-date interactive web database alongside the FIXa protein structure. The mutations database was built using MySQL, and structural analyses used a homology model for the human FIXa structure built from closely related crystal structures. Mutations have been found in 336 (73%) out of 461 residues in FIX. There were 812 unique point mutations, 182 deletions, 54 polymorphisms, 39 insertions and 26 others that together comprise a total of 1113 unique variants. The 64 unique mild severity mutations in the mature protein with known circulating protein phenotypes include 15 (23%) quantitative type I mutations and 41 (64%) predominantly qualitative type II mutations. Inhibitors were described in 59 reports (1.6%) corresponding to 25 unique mutations. The interactive database provides insights into mechanisms of hemophilia B. Type II mutations are deduced to disrupt predominantly those structural regions involved with functional interactions. The interactive features of the database will assist in making judgments about patient management. © 2013 International Society on Thrombosis and Haemostasis.
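The tallies quoted in this abstract are internally consistent, as the short check below shows. All numbers are taken from the abstract; the variable names are chosen for this example, and the assumption that the 1.6% inhibitor figure is relative to the 3721 patient entries is inferred from the arithmetic rather than stated in the text.

```python
# Variant categories and counts as reported in the abstract.
variant_counts = {
    "point mutations": 812,
    "deletions": 182,
    "polymorphisms": 54,
    "insertions": 39,
    "others": 26,
}
total_variants = sum(variant_counts.values())

# Percentages quoted in the abstract, recomputed from the raw counts.
residue_coverage_pct = round(100 * 336 / 461)   # residues with at least one mutation
type1_pct = round(100 * 15 / 64)                # quantitative type I among mild mutations
type2_pct = round(100 * 41 / 64)                # qualitative type II among mild mutations
inhibitor_pct = round(100 * 59 / 3721, 1)       # assumed denominator: patient entries

print(total_variants, residue_coverage_pct, type1_pct, type2_pct, inhibitor_pct)
```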
Deal, Cheri; Hasselmann, Caroline; Pfäffle, Roland W; Zimmermann, Alan G; Quigley, Charmian A; Child, Christopher J; Shavrikova, Elena P; Cutler, Gordon B; Blum, Werner F
2013-01-01
Magnetic resonance imaging (MRI) is used to investigate the etiology of growth hormone deficiency (GHD). This study examined relationships between MRI findings and clinical/hormonal phenotypes in children with GHD in the observational Genetics and Neuroendocrinology of Short Stature International Study, GeNeSIS. Clinical presentation, hormonal status and first-year GH response were compared between patients with pituitary imaging abnormalities (n = 1,071), patients with mutations in genes involved in pituitary development/GH secretion (n = 120) and patients with idiopathic GHD (n = 7,039). Patients with hypothalamic-pituitary abnormalities had more severe phenotypes than patients with idiopathic GHD. Additional hormonal deficiencies were found in 35% of patients with structural abnormalities (thyroid-stimulating hormone > adrenocorticotropic hormone > luteinizing hormone/follicle-stimulating hormone > antidiuretic hormone), most frequently in patients with septo-optic dysplasia (SOD). Patients with the triad [ectopic posterior pituitary (EPP), pituitary aplasia/hypoplasia and stalk defects] had a more severe phenotype and better response to GH treatment than patients with isolated abnormalities. The sex ratio was approximately equal for patients with SOD, but there was a significantly higher proportion of males (approximately 70%) in the EPP, pituitary hypoplasia, stalk defects, and triad categories. This large, international database demonstrates the value of classification of GH-deficient patients by the presence and type of hypothalamic-pituitary imaging abnormalities. This information may assist family counseling and patient management. Copyright © 2013 S. Karger AG, Basel.
Abraham, Paul E; Wang, Xiaojing; Ranjan, Priya; Nookaew, Intawat; Zhang, Bing; Tuskan, Gerald A; Hettich, Robert L
2015-12-04
Next-generation sequencing has transformed the ability to link genotypes to phenotypes and facilitates the dissection of genetic contribution to complex traits. However, it is challenging to link genetic variants with the perturbed functional effects on proteins encoded by such genes. Here we show how RNA sequencing can be exploited to construct genotype-specific protein sequence databases to assess natural variation in proteins, providing information about the molecular toolbox driving cellular processes. For this study, we used two natural genotypes selected from a recent genome-wide association study of Populus trichocarpa, an obligate outcrosser with tremendous phenotypic variation across the natural population. This strategy allowed us to comprehensively catalogue proteins containing single amino acid polymorphisms (SAAPs), as well as insertions and deletions. We profiled the frequency of 128 types of naturally occurring amino acid substitutions, including both expected (neutral) and unexpected (non-neutral) SAAPs, with a subset occurring in regions of the genome having strong polymorphism patterns consistent with recent positive and/or divergent selection. By zeroing in on the molecular signatures of these important regions that might have previously been uncharacterized, we now provide a high-resolution molecular inventory that should improve accessibility and subsequent identification of natural protein variants in future genotype-to-phenotype studies.
Vascular phenotypes in nonvascular subtypes of the Ehlers-Danlos syndrome: a systematic review
D'hondt, Sanne; Van Damme, Tim; Malfait, Fransiska
2018-01-01
Purpose Within the spectrum of the Ehlers-Danlos syndromes (EDS), vascular complications are usually associated with the vascular subtype of EDS. Vascular complications are also observed in other EDS subtypes, but the reports are anecdotal and the information is dispersed. To better document the nature of vascular complications among “nonvascular” EDS subtypes, we performed a systematic review. Methods We queried three databases for English-language studies from inception until May 2017, documenting both phenotypes and genotypes of patients with nonvascular EDS subtypes. The outcome included the number and nature of vascular complications. Results A total of 112 papers were included and data were collected from 467 patients, of whom 77 presented with a vascular phenotype. Severe complications included mainly hematomas (53%), frequently reported in musculocontractural and classical-like EDS; intracranial hemorrhages (18%), with a high risk in dermatosparaxis EDS; and arterial dissections (16%), frequently reported in kyphoscoliotic and classical EDS. Other, more minor, vascular complications were reported in cardiac-valvular, arthrochalasia, spondylodysplastic, and periodontal EDS. Conclusion Potentially life-threatening vascular complications are a rare but important finding in several nonvascular EDS subtypes, highlighting a need for more systematic documentation. This review will help familiarize clinicians with the spectrum of vascular complications in EDS and guide follow-up and management. PMID:28981071
Adamusiak, Tomasz; Parkinson, Helen; Muilu, Juha; Roos, Erik; van der Velde, Kasper Joeri; Thorisson, Gudmundur A; Byrne, Myles; Pang, Chao; Gollapudi, Sirisha; Ferretti, Vincent; Hillege, Hans; Brookes, Anthony J; Swertz, Morris A
2012-05-01
Genetic and epidemiological research increasingly employs large collections of phenotypic and molecular observation data from high quality human and model organism samples. Standardization efforts have produced a few simple formats for exchange of these various data, but a lightweight and convenient data representation scheme for all data modalities does not exist, hindering successful data integration, such as assignment of mouse models to orphan diseases and phenotypic clustering for pathways. We report a unified system to integrate and compare observation data across experimental projects, disease databases, and clinical biobanks. The core object model (Observ-OM) comprises only four basic concepts to represent any kind of observation: Targets, Features, Protocols (and their Applications), and Values. An easy-to-use file format (Observ-TAB) employs Excel to represent individual and aggregate data in straightforward spreadsheets. The systems have been tested successfully on human biobank, genome-wide association studies, quantitative trait loci, model organism, and patient registry data using the MOLGENIS platform to quickly set up custom data portals. Our system will dramatically lower the barrier for future data sharing and facilitate integrated search across panels and species. All models, formats, documentation, and software are freely available as open source (LGPLv3) at http://www.observ-om.org. © 2012 Wiley Periodicals, Inc.
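The four concepts named in this abstract (Targets, Features, Protocols with their Applications, and Values) can be sketched as plain data classes. The class layout, field names, and example records below are assumptions made for illustration; they are not the Observ-OM or Observ-TAB specification.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Target:
    """The thing being observed, e.g. an individual or a sample."""
    name: str


@dataclass(frozen=True)
class Feature:
    """What is observed about a target, e.g. height."""
    name: str
    unit: str = ""


@dataclass
class Protocol:
    """A procedure that measures one or more features."""
    name: str
    features: List[Feature]


@dataclass
class ProtocolApplication:
    """One concrete execution of a protocol."""
    protocol: Protocol
    performed_on: str  # hypothetical: an ISO date string


@dataclass
class ObservedValue:
    """A single value of a feature for a target, produced by a protocol application."""
    target: Target
    feature: Feature
    value: str
    application: ProtocolApplication


def to_table(values: List[ObservedValue]) -> Dict[str, Dict[str, str]]:
    """Pivot observed values into one row per target, one column per feature."""
    table: Dict[str, Dict[str, str]] = {}
    for v in values:
        table.setdefault(v.target.name, {})[v.feature.name] = v.value
    return table


# Hypothetical example records.
height = Feature("height", "cm")
weight = Feature("weight", "kg")
visit = ProtocolApplication(Protocol("baseline_exam", [height, weight]), "2012-01-15")
person = Target("individual_1")
values = [
    ObservedValue(person, height, "172", visit),
    ObservedValue(person, weight, "68", visit),
]
table = to_table(values)
print(table)
```

The pivoted form resembles a spreadsheet with one row per target, which is roughly the shape a tabular exchange format would need.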
regSNPs: a strategy for prioritizing regulatory single nucleotide substitutions
Teng, Mingxiang; Ichikawa, Shoji; Padgett, Leah R.; Wang, Yadong; Mort, Matthew; Cooper, David N.; Koller, Daniel L.; Foroud, Tatiana; Edenberg, Howard J.; Econs, Michael J.; Liu, Yunlong
2012-01-01
Motivation: One of the fundamental problems in genetic studies is identifying functional DNA variants that are responsible for a disease or phenotype of interest. Results from large-scale genetics studies, such as genome-wide association studies (GWAS), and the availability of high-throughput sequencing technologies provide opportunities for identifying causal variants. Despite the technical advances, informatics methodologies need to be developed to prioritize thousands of variants for potential causative effects. Results: We present regSNPs, an informatics strategy that integrates several established bioinformatics tools, for prioritizing regulatory SNPs, i.e. the SNPs in the promoter regions that potentially affect phenotype through changing transcription of downstream genes. Compared with existing tools, regSNPs has two distinct features. It considers degenerative features of binding motifs by calculating the differences in binding affinity caused by the candidate variants, and it integrates potential phenotypic effects of various transcription factors. When tested using the disease-causing variants documented in the Human Gene Mutation Database, regSNPs showed mixed performance on various diseases. regSNPs predicted three SNPs that can potentially affect bone density in a region detected in an earlier linkage study. Potential effects of one of the variants were validated using a luciferase reporter assay. Contact: yunliu@iupui.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22611130
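The notion of a variant-induced difference in binding affinity can be illustrated with a position weight matrix (PWM). The sketch below is a generic log-odds scoring example with an invented four-position matrix and invented sequences; it is not the scoring scheme used by regSNPs, which the abstract does not describe in detail.

```python
# Hypothetical log-odds PWM for a 4-bp motif (one dict per motif position).
pwm = [
    {"A": 1.2, "C": -0.8, "G": -0.5, "T": -1.0},
    {"A": -1.0, "C": 1.5, "G": -0.7, "T": -0.9},
    {"A": -0.6, "C": -0.9, "G": 1.4, "T": -1.1},
    {"A": -1.2, "C": -0.4, "G": -0.8, "T": 1.3},
]


def site_score(site):
    """Sum of per-position log-odds scores for one candidate binding site."""
    return sum(column[base] for column, base in zip(pwm, site))


def best_score(sequence):
    """Best-scoring window of motif length anywhere in the sequence."""
    width = len(pwm)
    return max(site_score(sequence[i:i + width])
               for i in range(len(sequence) - width + 1))


# Hypothetical promoter fragment carrying a G>A substitution at index 4.
ref_allele_seq = "TTACGTAA"
alt_allele_seq = "TTACATAA"

ref_score = best_score(ref_allele_seq)
alt_score = best_score(alt_allele_seq)
delta = alt_score - ref_score  # negative means the variant weakens the best site

print(f"ref={ref_score:.2f} alt={alt_score:.2f} delta={delta:.2f}")
```

A prioritization strategy would typically rank candidate variants by the magnitude of such a difference across many transcription factor motifs.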
E-Learning for Rare Diseases: An Example Using Fabry Disease.
Cimmaruta, Chiara; Liguori, Ludovica; Monticelli, Maria; Andreotti, Giuseppina; Citro, Valentina
2017-09-24
Rare diseases represent a challenge for physicians because patients are rarely seen, and they can manifest with symptoms similar to those of common diseases. In this work, genetic confirmation of diagnosis is derived from DNA sequencing. We present a tutorial for the molecular analysis of a rare disease using Fabry disease as an example. An exonic sequence derived from a hypothetical male patient was matched against human reference data using a genome browser. The missense mutation was identified by running BlastX, and information on the affected protein was retrieved from the database UniProt. The pathogenic nature of the mutation was assessed with PolyPhen-2. Disease-specific databases were used to assess whether the missense mutation led to a severe phenotype, and whether pharmacological therapy was an option. An inexpensive bioinformatics approach is presented to get the reader acquainted with the diagnosis of Fabry disease. The reader is introduced to the field of pharmacological chaperones, a therapeutic approach that can be applied only to certain Fabry genotypes. The principle underlying the analysis of exome sequencing can be explained in simple terms using web applications and databases which facilitate diagnosis and therapeutic choices.
MitoBreak: the mitochondrial DNA breakpoints database.
Damas, Joana; Carneiro, João; Amorim, António; Pereira, Filipe
2014-01-01
Mitochondrial DNA (mtDNA) rearrangements are key events in the development of many diseases. Investigations of mtDNA regions affected by rearrangements (i.e. breakpoints) can lead to important discoveries about rearrangement mechanisms and can offer important clues about the causes of mitochondrial diseases. Here, we present the mitochondrial DNA breakpoints database (MitoBreak; http://mitobreak.portugene.com), a free, web-accessible comprehensive list of breakpoints from three classes of somatic mtDNA rearrangements: circular deleted (deletions), circular partially duplicated (duplications) and linear mtDNAs. Currently, MitoBreak contains >1400 mtDNA rearrangements from seven species (Homo sapiens, Mus musculus, Rattus norvegicus, Macaca mulatta, Drosophila melanogaster, Caenorhabditis elegans and Podospora anserina) and their associated phenotypic information collected from nearly 400 publications. The database allows researchers to perform multiple types of data analyses through user-friendly interfaces with full or partial datasets. It also permits the download of curated data and the submission of new mtDNA rearrangements. For each reported case, MitoBreak also documents the precise breakpoint positions, junction sequences, disease or associated symptoms and links to the related publications, providing a useful resource to study the causes and consequences of mtDNA structural alterations.
Establishment of an Italian chronic migraine database: a multicenter pilot study.
Barbanti, Piero; Fofi, L; Cevoli, S; Torelli, P; Aurilia, C; Egeo, G; Grazzi, L; D'Amico, D; Manzoni, G C; Cortelli, P; Infarinato, F; Vanacore, N
2018-05-01
To optimize chronic migraine (CM) ascertainment and phenotype definition, provide adequate clinical management and health care procedures, and rationalize the allocation of economic resources, we performed an exploratory multicenter pilot study aimed at establishing a CM database, the first step toward a future Italian CM registry. We enrolled 63 consecutive CM patients in four tertiary headache centers. Patients were screened in face-to-face interviews using an ad hoc semi-structured questionnaire gathering detailed information on lifestyle, behavioral and socio-demographic factors, comorbidities, migraine features before and after chronicization, and healthcare resource use. Our pilot study provided useful insights, revealing that CM patients (1) presented in most cases with symptoms of peripheral trigeminal sensitization, a relatively unexpected feature which could be useful to unravel different CM endophenotypes and to predict responsiveness to trigeminal-targeted treatments; (2) had frequently been admitted to emergency departments; (3) had undergone, sometimes repeatedly, unnecessary or inappropriate investigations; (4) had only rarely obtained illness benefit exemption or disability allowance. We expect that the expansion of the database, which will shortly include many other Italian headache centers, will help to outline CM endophenotypes more precisely, hence improving management, treatment, and economic resource allocation, and ultimately reducing the burden of CM on both patients and the health system.
Measures for interoperability of phenotypic data: minimum information requirements and formatting.
Ćwiek-Kupczyńska, Hanna; Altmann, Thomas; Arend, Daniel; Arnaud, Elizabeth; Chen, Dijun; Cornut, Guillaume; Fiorani, Fabio; Frohmberg, Wojciech; Junker, Astrid; Klukas, Christian; Lange, Matthias; Mazurek, Cezary; Nafissi, Anahita; Neveu, Pascal; van Oeveren, Jan; Pommier, Cyril; Poorter, Hendrik; Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Scholz, Uwe; van Schriek, Marco; Seren, Ümit; Usadel, Björn; Weise, Stephan; Kersey, Paul; Krajewski, Paweł
2016-01-01
Plant phenotypic data shrouds a wealth of information which, when accurately analysed and linked to other data types, brings to light the knowledge about the mechanisms of life. As phenotyping is a field of research comprising manifold, diverse and time-consuming experiments, the findings can be fostered by reusing and combining existing datasets. Their correct interpretation, and thus replicability, comparability and interoperability, is possible provided that the collected observations are equipped with an adequate set of metadata. So far there have been no common standards governing phenotypic data description, which has hampered data exchange and reuse. In this paper we propose guidelines for proper handling of the information about plant phenotyping experiments, in terms of both the recommended content of the description and its formatting. We provide a document called "Minimum Information About a Plant Phenotyping Experiment", which specifies what information about each experiment should be given, and a Phenotyping Configuration for the ISA-Tab format, which allows this information to be organised in practice within a dataset. We provide examples of ISA-Tab-formatted phenotypic data, and a general description of a few systems where the recommendations have been implemented. Acceptance of the rules described in this paper by the plant phenotyping community will help to achieve findable, accessible, interoperable and reusable data.
Pituitary gene mutations and the growth hormone pathway.
Moseley, C T; Phillips, J A
2000-01-01
Hereditary forms of pituitary insufficiency not associated with anatomic defects of the central nervous system, hypothalamus, or pituitary are a heterogeneous group of disorders that result from interruptions at different points in the hypothalamic-pituitary-somatomedin-peripheral tissue axis. These different types of pituitary dwarfism can be classified on the level of the defect; mode of inheritance; whether the phenotype is isolated growth hormone deficiency (IGHD) or combined pituitary hormone deficiency (CPHD); whether the hormone is absent, deficient, or abnormal; and, in patients with GH resistance, whether insulin-like growth factor 1 (IGF1) is deficient due to GH receptor or IGF1 defects. Information on each disorder is summarized. More detailed information can be obtained through the electronic database Online Mendelian Inheritance in Man which is available at http://www3.ncbi.nlm.nih.gov/Omim/.
Interoperability between phenotype and anatomy ontologies.
Hoehndorf, Robert; Oellrich, Anika; Rebholz-Schuhmann, Dietrich
2010-12-15
Phenotypic information is important for the analysis of the molecular mechanisms underlying disease. A formal ontological representation of phenotypic information can help to identify, interpret and infer phenotypic traits based on experimental findings. The methods that are currently used to represent data and information about phenotypes fail to make the semantics of the phenotypic trait explicit and do not interoperate with ontologies of anatomy and other domains. Therefore, valuable resources for the analysis of phenotype studies remain unconnected and inaccessible to automated analysis and reasoning. We provide a framework to formalize phenotypic descriptions and make their semantics explicit. Based on this formalization, we provide the means to integrate phenotypic descriptions with ontologies of other domains, in particular anatomy and physiology. We demonstrate how our framework leads to the capability to represent disease phenotypes, perform powerful queries that were not possible before and infer additional knowledge. http://bioonto.de/pmwiki.php/Main/PheneOntology.
An overview and online registry of microvillus inclusion disease patients and their MYO5B mutations.
van der Velde, K Joeri; Dhekne, Herschel S; Swertz, Morris A; Sirigu, Serena; Ropars, Virginie; Vinke, Petra C; Rengaw, Trebor; van den Akker, Peter C; Rings, Edmond H H M; Houdusse, Anne; van Ijzendoorn, Sven C D
2013-12-01
Microvillus inclusion disease (MVID) is one of the most severe congenital intestinal disorders and is characterized by neonatal secretory diarrhea and the inability to absorb nutrients from the intestinal lumen. MVID is associated with patient-, family-, and ancestry-unique mutations in the MYO5B gene, encoding the actin-based motor protein myosin Vb. Here, we review the MYO5B gene and all currently known MYO5B mutations and for the first time methodologically categorize these with regard to functional protein domains and recurrence in MYO7A associated with Usher syndrome and other myosins. We also review animal models for MVID and the latest data on functional studies related to the myosin Vb protein. To congregate existing and future information on MVID geno-/phenotypes and facilitate its quick and easy sharing among clinicians and researchers, we have constructed an online MOLGENIS-based international patient registry (www.MVID-central.org). This easily accessible database currently contains detailed information on 137 MVID patients together with reported clinical/phenotypic details and 41 unique MYO5B mutations, several of which are unpublished. The future expansion and prospective nature of this registry are expected to improve disease diagnosis, prognosis, and genetic counseling. © 2013 WILEY PERIODICALS, INC.
Knoppers, Bartha M; Isasi, Rosario; Benvenisty, Nissim; Kim, Ock-Joo; Lomax, Geoffrey; Morris, Clive; Murray, Thomas H; Lee, Eng Hin; Perry, Margery; Richardson, Genevra; Sipp, Douglas; Tanner, Klaus; Wahlström, Jan; de Wert, Guido; Zeng, Fanyi
2011-09-01
Novel methods and associated tools permitting individual identification in publicly accessible SNP databases have become a debatable issue. There is growing concern that current technical and ethical safeguards to protect the identities of donors could be insufficient. In the context of human embryonic stem cell research, there are no studies focusing on the probability that an hESC line donor could be identified by analyzing published SNP profiles and associated genotypic and phenotypic information. We present the International Stem Cell Forum (ISCF) Ethics Working Party's Policy Statement on "Publishing SNP Genotypes of Human Embryonic Stem Cell Lines (hESC)". The Statement prospectively addresses issues surrounding the publication of genotypic data and associated annotations of hESC lines in open access databases. It proposes a balanced approach between the goals of open science and data sharing with the respect for fundamental bioethical principles (autonomy, privacy, beneficence, justice and research merit and integrity).
Ortholog Identification and Comparative Analysis of Microbial Genomes Using MBGD and RECOG.
Uchiyama, Ikuo
2017-01-01
Comparative genomics is becoming an essential approach for identification of genes associated with a specific function or phenotype. Here, we introduce the microbial genome database for comparative analysis (MBGD), which is a comprehensive ortholog database among the microbial genomes available so far. MBGD contains several precomputed ortholog tables including the standard ortholog table covering the entire taxonomic range and taxon-specific ortholog tables for various major taxa. In addition, MBGD allows the users to create an ortholog table within any specified set of genomes through dynamic calculations. In particular, MBGD has a "My MBGD" mode where users can upload their original genome sequences and incorporate them into orthology analysis. The created ortholog table can serve as the basis for various comparative analyses. Here, we describe the use of MBGD and briefly explain how to utilize the orthology information during comparative genome analysis in combination with the stand-alone comparative genomics software RECOG, focusing on the application to comparison of closely related microbial genomes.
JAX Colony Management System (JCMS): an extensible colony and phenotype data management system.
Donnelly, Chuck J; McFarland, Mike; Ames, Abigail; Sundberg, Beth; Springer, Dave; Blauth, Peter; Bult, Carol J
2010-04-01
The Jackson Laboratory Colony Management System (JCMS) is a software application for managing data and information related to research mouse colonies, associated biospecimens, and experimental protocols. JCMS runs directly on computers that run one of the PC Windows operating systems, but can be accessed via web browser interfaces from any computer running a Windows, Macintosh, or Linux operating system. JCMS can be configured for a single user or multiple users in small- to medium-size work groups. The target audience for JCMS includes laboratory technicians, animal colony managers, and principal investigators. The application provides operational support for colony management and experimental workflows, sample and data tracking through transaction-based data entry forms, and date-driven work reports. Flexible query forms allow researchers to retrieve database records based on user-defined criteria. Recent advances in handheld computers with integrated barcode readers, middleware technologies, web browsers, and wireless networks add to the utility of JCMS by allowing real-time access to the database from any networked computer.
Mouse Genome Database: From sequence to phenotypes and disease models
Richardson, Joel E.; Kadin, James A.; Smith, Cynthia L.; Blake, Judith A.; Bult, Carol J.
2015-01-01
The Mouse Genome Database (MGD, www.informatics.jax.org) is the international scientific database for genetic, genomic, and biological data on the laboratory mouse to support the research requirements of the biomedical community. To accomplish this goal, MGD provides broad data coverage, serves as the authoritative standard for mouse nomenclature for genes, mutants, and strains, and curates and integrates many types of data from literature and electronic sources. Among the key data sets MGD supports are: the complete catalog of mouse genes and genome features, comparative homology data for mouse and vertebrate genes, the authoritative set of Gene Ontology (GO) annotations for mouse gene functions, a comprehensive catalog of mouse mutations and their phenotypes, and a curated compendium of mouse models of human diseases. Here, we describe the data acquisition process, specifics about MGD's key data areas, methods to access and query MGD data, and outreach and user help facilities. genesis 53:458–473, 2015. © 2015 The Authors. Genesis Published by Wiley Periodicals, Inc. PMID:26150326
ALDB: a domestic-animal long noncoding RNA database.
Li, Aimin; Zhang, Junying; Zhou, Zhongyin; Wang, Lei; Liu, Yujuan; Liu, Yajun
2015-01-01
Long noncoding RNAs (lncRNAs) have attracted significant attention in recent years due to their important roles in many biological processes. Domestic animals constitute a unique resource for understanding the genetic basis of phenotypic variation and are ideal models relevant to diverse areas of biomedical research. With improving sequencing technologies, numerous domestic-animal lncRNAs are now available. Thus, there is an immediate need for a database resource that can assist researchers to store, organize, analyze and visualize domestic-animal lncRNAs. The domestic-animal lncRNA database, named ALDB, is the first comprehensive database with a focus on the domestic-animal lncRNAs. It currently archives 12,103 pig intergenic lncRNAs (lincRNAs), 8,923 chicken lincRNAs and 8,250 cow lincRNAs. In addition to the annotations of lincRNAs, it offers related data that is not available yet in existing lncRNA databases (lncRNAdb and NONCODE), such as genome-wide expression profiles and animal quantitative trait loci (QTLs) of domestic animals. Moreover, a collection of interfaces and applications, such as the Basic Local Alignment Search Tool (BLAST), the Generic Genome Browser (GBrowse) and flexible search functionalities, are available to help users effectively explore, analyze and download data related to domestic-animal lncRNAs. ALDB enables the exploration and comparative analysis of lncRNAs in domestic animals. A user-friendly web interface, integrated information and tools make it valuable to researchers in their studies. ALDB is freely available from http://res.xaut.edu.cn/aldb/index.jsp.
SeedStor: A Germplasm Information Management System and Public Database
Horler, RSP; Turner, AS; Fretter, P; Ambrose, M
2018-01-01
SeedStor (https://www.seedstor.ac.uk) acts as the publicly available database for the seed collections held by the Germplasm Resources Unit (GRU) based at the John Innes Centre, Norwich, UK. The GRU is a national capability supported by the Biotechnology and Biological Sciences Research Council (BBSRC). The GRU curates germplasm collections of a range of temperate cereal, legume and Brassica crops and their associated wild relatives, as well as precise genetic stocks, near-isogenic lines and mapping populations. With >35,000 accessions, the GRU forms part of the UK’s plant conservation contribution to the Multilateral System (MLS) of the International Treaty for Plant Genetic Resources for Food and Agriculture (ITPGRFA) for wheat, barley, oat and pea. SeedStor is a fully searchable system that allows our various collections to be browsed species by species through to complicated multipart phenotype criteria-driven queries. The results from these searches can be downloaded for later analysis or used to order germplasm via our shopping cart. The user community for SeedStor is the plant science research community, plant breeders, specialist growers, hobby farmers and amateur gardeners, and educationalists. Furthermore, SeedStor is much more than a database; it has been developed to act internally as a Germplasm Information Management System that allows team members to track and process germplasm requests, determine regeneration priorities, handle cost recovery and Material Transfer Agreement paperwork, manage the Seed Store holdings and easily report on a wide range of the aforementioned tasks. PMID:29228298
Iwasaki, Wataru; Fukunaga, Tsukasa; Isagozawa, Ryota; Yamada, Koichiro; Maeda, Yasunobu; Satoh, Takashi P.; Sado, Tetsuya; Mabuchi, Kohji; Takeshima, Hirohiko; Miya, Masaki; Nishida, Mutsumi
2013-01-01
MitoFish is a database of fish mitochondrial genomes (mitogenomes) that includes powerful and precise de novo annotations for mitogenome sequences. Fish occupy an important position in the evolution of vertebrates and the ecology of the hydrosphere, and mitogenomic sequence data have served as a rich source of information for resolving fish phylogenies and identifying new fish species. The importance of a mitogenomic database continues to grow at a rapid pace as massive amounts of mitogenomic data are generated with the advent of new sequencing technologies. A severe bottleneck seems likely to occur with regard to mitogenome annotation because of the overwhelming pace of data accumulation and the intrinsic difficulties in annotating sequences with degenerating transfer RNA structures, divergent start/stop codons of the coding elements, and the overlapping of adjacent elements. To ease this data backlog, we developed an annotation pipeline named MitoAnnotator. MitoAnnotator automatically annotates a fish mitogenome with a high degree of accuracy in approximately 5 min; thus, it is readily applicable to data sets of dozens of sequences. MitoFish also contains re-annotations of previously sequenced fish mitogenomes, enabling researchers to refer to them when they find annotations that are likely to be erroneous or while conducting comparative mitogenomic analyses. For users who need more information on the taxonomy, habitats, phenotypes, or life cycles of fish, MitoFish provides links to related databases. MitoFish and MitoAnnotator are freely available at http://mitofish.aori.u-tokyo.ac.jp/ (last accessed August 28, 2013); all of the data can be batch downloaded, and the annotation pipeline can be used via a web interface. PMID:23955518
Henderson, Jette; Ke, Junyuan; Ho, Joyce C; Ghosh, Joydeep; Wallace, Byron C
2018-05-04
Researchers are developing methods to automatically extract clinically relevant and useful patient characteristics from raw healthcare datasets. These characteristics, often capturing essential properties of patients with common medical conditions, are called computational phenotypes. Being generated by automated or semiautomated, data-driven methods, such potential phenotypes need to be validated as clinically meaningful (or not) before they are acceptable for use in decision making. The objective of this study was to present Phenotype Instance Verification and Evaluation Tool (PIVET), a framework that uses co-occurrence analysis on an online corpus of publicly available medical journal articles to build clinical relevance evidence sets for user-supplied phenotypes. PIVET adopts a conceptual framework similar to the pioneering prototype tool PheKnow-Cloud that was developed for the phenotype validation task. PIVET completely refactors each part of the PheKnow-Cloud pipeline to deliver vast improvements in speed without sacrificing the quality of the insights PheKnow-Cloud achieved. PIVET leverages indexing in NoSQL databases to efficiently generate evidence sets. Specifically, PIVET uses a succinct representation of the phenotypes that corresponds to the index on the corpus database and an optimized co-occurrence algorithm inspired by the Aho-Corasick algorithm. We compare PIVET's phenotype representation with PheKnow-Cloud's by using PheKnow-Cloud's experimental setup. In PIVET's framework, we also introduce a statistical model trained on domain expert-verified phenotypes to automatically classify phenotypes as clinically relevant or not. Additionally, we show how the classification model can be used to examine user-supplied phenotypes in an online, rather than batch, manner.
PIVET maintains the discriminative power of PheKnow-Cloud in terms of identifying clinically relevant phenotypes for the same corpus with which PheKnow-Cloud was originally developed, but PIVET's analysis is an order of magnitude faster than that of PheKnow-Cloud. Not only is PIVET much faster, it can be scaled to a larger corpus and still retain speed. We evaluated multiple classification models on top of the PIVET framework and found ridge regression to perform best, realizing an average F1 score of 0.91 when predicting clinically relevant phenotypes. Our study shows that PIVET improves on the most notable existing computational tool for phenotype validation in terms of speed and automation and is comparable in terms of accuracy. ©Jette Henderson, Junyuan Ke, Joyce C Ho, Joydeep Ghosh, Byron C Wallace. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 04.05.2018.
Ke, Junyuan; Ho, Joyce C; Ghosh, Joydeep; Wallace, Byron C
2018-01-01
Background: Researchers are developing methods to automatically extract clinically relevant and useful patient characteristics from raw healthcare datasets. These characteristics, often capturing essential properties of patients with common medical conditions, are called computational phenotypes. Being generated by automated or semiautomated, data-driven methods, such potential phenotypes need to be validated as clinically meaningful (or not) before they are acceptable for use in decision making. Objective: The objective of this study was to present Phenotype Instance Verification and Evaluation Tool (PIVET), a framework that uses co-occurrence analysis on an online corpus of publicly available medical journal articles to build clinical relevance evidence sets for user-supplied phenotypes. PIVET adopts a conceptual framework similar to the pioneering prototype tool PheKnow-Cloud that was developed for the phenotype validation task. PIVET completely refactors each part of the PheKnow-Cloud pipeline to deliver vast improvements in speed without sacrificing the quality of the insights PheKnow-Cloud achieved. Methods: PIVET leverages indexing in NoSQL databases to efficiently generate evidence sets. Specifically, PIVET uses a succinct representation of the phenotypes that corresponds to the index on the corpus database and an optimized co-occurrence algorithm inspired by the Aho-Corasick algorithm. We compare PIVET’s phenotype representation with PheKnow-Cloud’s by using PheKnow-Cloud’s experimental setup. In PIVET’s framework, we also introduce a statistical model trained on domain expert–verified phenotypes to automatically classify phenotypes as clinically relevant or not. Additionally, we show how the classification model can be used to examine user-supplied phenotypes in an online, rather than batch, manner.
Results: PIVET maintains the discriminative power of PheKnow-Cloud in terms of identifying clinically relevant phenotypes for the same corpus with which PheKnow-Cloud was originally developed, but PIVET’s analysis is an order of magnitude faster than that of PheKnow-Cloud. Not only is PIVET much faster, it can be scaled to a larger corpus and still retain speed. We evaluated multiple classification models on top of the PIVET framework and found ridge regression to perform best, realizing an average F1 score of 0.91 when predicting clinically relevant phenotypes. Conclusions: Our study shows that PIVET improves on the most notable existing computational tool for phenotype validation in terms of speed and automation and is comparable in terms of accuracy. PMID:29728351
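The co-occurrence analysis mentioned in the abstract above can be illustrated with a deliberately naive sketch. This is not PIVET's implementation: the abstract describes an indexed, Aho-Corasick-inspired algorithm, whereas the hypothetical function `cooccurrence_counts` below simply uses substring matching over a small in-memory list of documents.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, terms):
    """Count, for each pair of terms, how many documents mention both.

    Naive substring matching; an illustrative stand-in only, not the
    indexed algorithm described in the abstract.
    """
    counts = Counter()
    for text in documents:
        lowered = text.lower()
        # Terms present in this document, sorted so pairs have a stable order.
        present = sorted(t for t in terms if t.lower() in lowered)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Hypothetical toy corpus and phenotype terms.
docs = [
    "Hypertension and diabetes often occur together.",
    "Diabetes is linked to obesity.",
    "Hypertension alone.",
]
terms = ["hypertension", "diabetes", "obesity"]
counts = cooccurrence_counts(docs, terms)
```

Pairs that never appear together are simply absent from the counter, so looking them up yields zero.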
Qiu, Jingya; Darabos, Christian
2016-01-01
Genome‐wide association studies (GWAS) have led to the discovery of over 200 single nucleotide polymorphisms (SNPs) associated with type 2 diabetes mellitus (T2DM). Additionally, East Asians develop T2DM at a higher rate, younger age, and lower body mass index than their European ancestry counterparts. The reason behind this occurrence remains elusive. With comprehensive searches through the National Human Genome Research Institute (NHGRI) GWAS catalog literature, we compiled a database of 2,800 ancestry‐specific SNPs associated with T2DM and 70 other related traits. Manual data extraction was necessary because the GWAS catalog reports statistics such as odds ratio and P‐value, but does not consistently include ancestry information. Currently, many statistics are derived by combining initial and replication samples from study populations of mixed ancestry. Analysis of all‐inclusive data can be misleading, as not all SNPs are transferable across diverse populations. We used ancestry data to construct ancestry‐specific human phenotype networks (HPN) centered on T2DM. Quantitative and visual analysis of network models reveals the genetic disparities between ancestry groups. Of the 27 phenotypes in the East Asian HPN, six phenotypes were unique to the network, revealing the underlying ancestry‐specific nature of some SNPs associated with T2DM. We studied the relationship between T2DM and five phenotypes unique to the East Asian HPN to generate new interaction hypotheses in a clinical context. The genetic differences found in our ancestry‐specific HPNs suggest different pathways are involved in the pathogenesis of T2DM among different populations. Our study underlines the importance of ancestry in the development of T2DM and its implications in pharmacogenetics and personalized medicine. PMID:27061195
Phenome-driven disease genetics prediction toward drug discovery
Chen, Yang; Li, Li; Zhang, Guo-Qiang; Xu, Rong
2015-01-01
Motivation: Discerning genetic contributions to diseases not only enhances our understanding of disease mechanisms, but also leads to translational opportunities for drug discovery. Recent computational approaches incorporate disease phenotypic similarities to improve the prediction power of disease gene discovery. However, most current studies used only one data source of human disease phenotype. We present an innovative and generic strategy for combining multiple different data sources of human disease phenotype and predicting disease-associated genes from integrated phenotypic and genomic data. Results: To demonstrate our approach, we explored a new phenotype database from biomedical ontologies and constructed Disease Manifestation Network (DMN). We combined DMN with mimMiner, which was a widely used phenotype database in disease gene prediction studies. Our approach achieved significantly improved performance over a baseline method, which used only one phenotype data source. In the leave-one-out cross-validation and de novo gene prediction analysis, our approach achieved the area under the curves of 90.7% and 90.3%, which are significantly higher than 84.2% (P < e−4) and 81.3% (P < e−12) for the baseline approach. We further demonstrated that our predicted genes have the translational potential in drug discovery. We used Crohn’s disease as an example and ranked the candidate drugs based on the rank of drug targets. Our gene prediction approach prioritized druggable genes that are likely to be associated with Crohn’s disease pathogenesis, and our rank of candidate drugs successfully prioritized the Food and Drug Administration-approved drugs for Crohn’s disease. We also found literature evidence to support a number of drugs among the top 200 candidates. In summary, we demonstrated that a novel strategy combining unique disease phenotype data with system approaches can lead to rapid drug discovery. 
Availability and implementation: nlp.case.edu/public/data/DMN Contact: rxx@case.edu PMID:26072493
Urban, Martin; Cuzick, Alayne; Rutherford, Kim; Irvine, Alistair; Pedro, Helder; Pant, Rashmi; Sadanadan, Vidyendra; Khamari, Lokanath; Billal, Santoshkumar; Mohanty, Sagar; Hammond-Kosack, Kim E.
2017-01-01
The pathogen–host interactions database (PHI-base) is available at www.phi-base.org. PHI-base contains expertly curated molecular and biological information on genes proven to affect the outcome of pathogen–host interactions reported in peer reviewed research articles. In addition, literature that indicates specific gene alterations that did not affect the disease interaction phenotype are curated to provide complete datasets for comparative purposes. Viruses are not included. Here we describe a revised PHI-base Version 4 data platform with improved search, filtering and extended data display functions. A PHIB-BLAST search function is provided and a link to PHI-Canto, a tool for authors to directly curate their own published data into PHI-base. The new release of PHI-base Version 4.2 (October 2016) has an increased data content containing information from 2219 manually curated references. The data provide information on 4460 genes from 264 pathogens tested on 176 hosts in 8046 interactions. Prokaryotic and eukaryotic pathogens are represented in almost equal numbers. Host species belong ∼70% to plants and 30% to other species of medical and/or environmental importance. Additional data types included into PHI-base 4 are the direct targets of pathogen effector proteins in experimental and natural host organisms. The curation problems encountered and the future directions of the PHI-base project are briefly discussed. PMID:27915230
Panou, Manthos
2016-01-01
Background: Currently, cyanobacterial diversity is examined using a polyphasic approach by assessing morphological and molecular data (Komárek 2015). However, the comparison of morphological and genetic data is sometimes hindered by the lack of cultures of several cyanobacterial morphospecies and inadequate morphological data of sequenced strains (Rajaniemi et al. 2005). Furthermore, in order to evaluate the phenotypic plasticity within defined taxa, the variability observed in cultures has to be compared to the range in natural variation (Komárek and Mareš 2012). Thus, new tools are needed to aggregate, link and process data in a meaningful way, in order to properly study and understand cyanodiversity. New information: An online database on cyanobacteria has been created, namely the Cyanobacteria culture collection (CCC) (http://cyanobacteria.myspecies.info/), using as case studies cyanobacterial strains isolated from lakes of Greece, which are part of the AUTH culture collection (School of Biology, Aristotle University of Thessaloniki). The database hosts, for the first time, information and data such as morphology/morphometry, biogeography, phylogeny, microphotographs, distribution maps, toxicology and biochemical traits of the strains. All these data are structured, managed and presented online and are publicly accessible with a recently developed tool, namely “Scratchpads”, a taxon-centric virtual research environment that allows browsing the taxonomic classification and retrieving various kinds of relevant information for each taxon. PMID:27226753
Unsupervised automated high throughput phenotyping of RNAi time-lapse movies.
Failmezger, Henrik; Fröhlich, Holger; Tresch, Achim
2013-10-04
Gene perturbation experiments in combination with fluorescence time-lapse cell imaging are a powerful tool in reverse genetics. High content applications require tools for the automated processing of the large amounts of data. These tools include in general several image processing steps, the extraction of morphological descriptors, and the grouping of cells into phenotype classes according to their descriptors. This phenotyping can be applied in a supervised or an unsupervised manner. Unsupervised methods are suitable for the discovery of formerly unknown phenotypes, which are expected to occur in high-throughput RNAi time-lapse screens. We developed an unsupervised phenotyping approach based on Hidden Markov Models (HMMs) with multivariate Gaussian emissions for the detection of knockdown-specific phenotypes in RNAi time-lapse movies. The automated detection of abnormal cell morphologies allows us to assign a phenotypic fingerprint to each gene knockdown. By applying our method to the Mitocheck database, we show that a phenotypic fingerprint is indicative of a gene's function. Our fully unsupervised HMM-based phenotyping is able to automatically identify cell morphologies that are specific for a certain knockdown. Beyond the identification of genes whose knockdown affects cell morphology, phenotypic fingerprints can be used to find modules of functionally related genes.
Empirical data on 220 families with de novo or inherited paracentric inversions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Eyre, J.; McConkie-Rosell, A.; Tripp, T.
Six new cases of paracentric inversions (3 detected prenatally) are presented and added to an expanding database of paracentric inversions. Three inversions were associated with an abnormal phenotype and detected postnatally: inv(2)(p21p23), inv(13)(q14q34), and inv(18)(q12.3q23). The present database of paracentric inversions includes 220 families reported. All chromosomes were involved except chromosome 20. The most frequent inversions were found on chromosomes 1, 3, 7, 11, and 14. 48 index cases had an abnormal phenotype not explainable by other causes such as additional chromosome abnormalities. Of these, 12 were de novo and 36 familial. By contrast, of the 122 index cases with normal phenotype, there were 8 de novo and 87 familial cases (rest unknown). Ascertainment bias probably accounts for some of the abnormal inherited inversions cases. Maternally inherited inversions were more frequent than paternally inherited (72 versus 55). Inversions were found in males more than females (ratio of 4 to 3). There were some paracentric inversions that appear to be less involved with abnormal phenotypes (e.g., 11q21q23) than other inversions (e.g., inv X and Turner syndrome). An interesting observation which warrants further investigation is the excess number of fetal losses and karyotypically abnormal progeny in paracentric inversion carriers. The presence of additional karyotypic abnormalities in the children might be explainable by interchromosomal effects and chromosome position changes in the nucleus. Genetic counseling for paracentric inversions should take into consideration mode of ascertainment, inheritance, and chromosome involved. We solicit other cases of paracentric inversions to make this database more useful in counseling patients and families.
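The counts reported in the abstract above can be turned into de novo proportions with a few lines of arithmetic. The sketch below uses only the figures stated in the abstract; the dictionary layout and function names are assumptions made for illustration, and no statistical test is implied.

```python
from fractions import Fraction

# Counts as stated in the abstract (index cases by phenotype).
abnormal = {"index_cases": 48, "de_novo": 12, "familial": 36}
normal = {"index_cases": 122, "de_novo": 8, "familial": 87}


def de_novo_fraction(group):
    """Fraction of de novo cases among cases with known inheritance."""
    known = group["de_novo"] + group["familial"]
    return Fraction(group["de_novo"], known)


def unknown_inheritance(group):
    """Index cases whose inheritance is neither de novo nor familial."""
    return group["index_cases"] - group["de_novo"] - group["familial"]


print(de_novo_fraction(abnormal))   # 1/4
print(de_novo_fraction(normal))     # 8/95
print(unknown_inheritance(normal))  # 27
```

Among cases with known inheritance, a quarter of the abnormal-phenotype index cases were de novo, compared with 8 of 95 (about 8%) of the normal-phenotype cases; 27 normal-phenotype cases had unknown inheritance.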
Phenotypic and genotypic data integration and exploration through a web-service architecture.
Nuzzo, Angelo; Riva, Alberto; Bellazzi, Riccardo
2009-10-15
Linking genotypic and phenotypic information is one of the greatest challenges of current genetics research. The definition of an Information Technology infrastructure to support this kind of study, and in particular studies aimed at the analysis of complex traits, which require the definition of multifaceted phenotypes and the integration of genotypic information to discover the most prevalent diseases, is a paradigmatic goal of Biomedical Informatics. This paper describes the use of Information Technology methods and tools to develop a system for the management, inspection and integration of phenotypic and genotypic data. We present the design and architecture of the Phenotype Miner, a software system able to flexibly manage phenotypic information, and its extended functionalities to retrieve genotype information from external repositories and to relate it to phenotypic data. For this purpose we developed a module to allow customized data upload by the user and a SOAP-based communications layer to retrieve data from existing biomedical knowledge management tools. We also demonstrate the system functionality through an example application in which we analyze two related genomic datasets. We show how a comprehensive, integrated and automated workbench for genotype and phenotype integration can facilitate and improve the hypothesis generation process underlying modern genetic studies.
Sandhu, Maninder; Sureshkumar, V; Prakash, Chandra; Dixit, Rekha; Solanke, Amolkumar U; Sharma, Tilak Raj; Mohapatra, Trilochan; S V, Amitha Mithra
2017-09-30
Genome-wide microarrays have enabled the development of robust databases for functional genomics studies in rice. However, such databases do not directly cater to the needs of breeders. Here, we have attempted to develop a web interface which combines the information from functional genomic studies across different genetic backgrounds with DNA markers so that they can be readily deployed in crop improvement. In the current version of the database, we have included drought and salinity stress studies since these two are the major abiotic stresses in rice. RiceMetaSys, a user-friendly and freely available web interface, provides comprehensive information on salt responsive genes (SRGs) and drought responsive genes (DRGs) across genotypes, crop development stages and tissues, identified from multiple microarray datasets. 'Physical position search' is an attractive tool for those using a QTL-based approach for dissecting tolerance to salt and drought stress since it can provide the list of SRGs and DRGs in any physical interval. To identify robust candidate genes for use in crop improvement, the 'common genes across varieties' search tool is useful. Graphical visualization of expression profiles across genes and rice genotypes has been enabled to assist the user and to make comparisons easier. Simple Sequence Repeat (SSR) search in the SRGs and DRGs is a valuable tool for fine mapping and marker assisted selection since it provides primers for survey of polymorphism. An external link to intron specific markers is also provided for this purpose. Bulk retrieval of data without any limit has been enabled for locus and SSR searches. The aim of this database is to provide users with simple and straightforward search options for identifying robust candidate genes from among thousands of SRGs and DRGs, so as to facilitate linking variation in expression profiles to variation in phenotype. Database URL: http://14.139.229.201.
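The two search tools described in this abstract can be illustrated with a minimal sketch. It is not RiceMetaSys code: the `Gene` record, the locus identifiers, the coordinates and the variety names are all hypothetical, and the only assumption is that a 'physical position search' returns genes overlapping a chromosomal interval while 'common genes across varieties' intersects per-variety gene sets.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    locus_id: str  # hypothetical locus identifier
    chrom: str
    start: int     # bp, inclusive
    end: int       # bp, inclusive


def genes_in_interval(genes, chrom, lo, hi):
    """Return genes on `chrom` overlapping [lo, hi], sorted by start position."""
    hits = [g for g in genes if g.chrom == chrom and g.start <= hi and g.end >= lo]
    return sorted(hits, key=lambda g: g.start)


def common_genes(responsive_by_variety):
    """Return loci reported as stress-responsive in every variety."""
    sets = [set(loci) for loci in responsive_by_variety.values()]
    return set.intersection(*sets) if sets else set()


# Toy data, invented for illustration only.
GENES = [
    Gene("LOC_A", "chr1", 1_000, 2_500),
    Gene("LOC_B", "chr1", 5_000, 7_000),
    Gene("LOC_C", "chr2", 1_200, 1_900),
]
DRG_BY_VARIETY = {
    "variety_1": {"LOC_A", "LOC_B"},
    "variety_2": {"LOC_B", "LOC_C"},
}

interval_hits = [g.locus_id for g in genes_in_interval(GENES, "chr1", 2_000, 6_000)]
shared = common_genes(DRG_BY_VARIETY)
print(interval_hits, shared)
```

A QTL interval would be passed as `lo` and `hi`; combining both functions narrows the candidates to genes that lie inside the interval and respond in all varieties.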
NASA Astrophysics Data System (ADS)
Hoehndorf, Robert; Schofield, Paul N.; Gkoutos, Georgios V.
2015-06-01
Phenotypes are the observable characteristics of an organism arising from its response to the environment. Phenotypes associated with engineered and natural genetic variation are widely recorded using phenotype ontologies in model organisms, as are signs and symptoms of human Mendelian diseases in databases such as OMIM and Orphanet. Exploiting these resources, several computational methods have been developed for integration and analysis of phenotype data to identify the genetic etiology of diseases or suggest plausible interventions. A similar resource would be highly useful not only for rare and Mendelian diseases, but also for common, complex and infectious diseases. We apply a semantic text-mining approach to identify the phenotypes (signs and symptoms) associated with over 6,000 diseases. We evaluate our text-mined phenotypes by demonstrating that they can correctly identify known disease-associated genes in mice and humans with high accuracy. Using a phenotypic similarity measure, we generate a human disease network in which diseases that have similar signs and symptoms cluster together, and we use this network to identify closely related diseases based on common etiological, anatomical as well as physiological underpinnings.
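As a rough illustration of how a phenotypic similarity measure can yield a disease network, the sketch below uses plain Jaccard overlap between phenotype sets. This is a deliberately simple stand-in, not the semantic similarity measure used by the authors, and the disease names, phenotype terms and threshold are invented.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two phenotype sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_edges(disease_phenotypes, threshold):
    """Return (disease, disease, score) edges whose score reaches the threshold."""
    edges = []
    for d1, d2 in combinations(sorted(disease_phenotypes), 2):
        score = jaccard(disease_phenotypes[d1], disease_phenotypes[d2])
        if score >= threshold:
            edges.append((d1, d2, score))
    return edges


# Toy phenotype annotations, invented for illustration only.
TOY = {
    "disease_A": {"fever", "cough", "fatigue"},
    "disease_B": {"fever", "cough", "headache"},
    "disease_C": {"rash"},
}

edges = similarity_edges(TOY, threshold=0.5)
print(edges)
```

Diseases sharing many signs and symptoms end up connected, which is the clustering behaviour the abstract describes; an ontology-aware measure would additionally credit related but non-identical phenotype terms.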
Breeding and Genetics Symposium: networks and pathways to guide genomic selection.
Snelling, W M; Cushman, R A; Keele, J W; Maltecca, C; Thomas, M G; Fortes, M R S; Reverter, A
2013-02-01
Many traits affecting profitability and sustainability of meat, milk, and fiber production are polygenic, with no single gene having an overwhelming influence on observed variation. No knowledge of the specific genes controlling these traits has been needed to make substantial improvement through selection. Significant gains have been made through phenotypic selection enhanced by pedigree relationships and continually improving statistical methodology. Genomic selection, recently enabled by assays for dense SNP located throughout the genome, promises to increase selection accuracy and accelerate genetic improvement by emphasizing the SNP most strongly correlated to phenotype although the genes and sequence variants affecting phenotype remain largely unknown. These genomic predictions theoretically rely on linkage disequilibrium (LD) between genotyped SNP and unknown functional variants, but familial linkage may increase effectiveness when predicting individuals related to those in the training data. Genomic selection with functional SNP genotypes should be less reliant on LD patterns shared by training and target populations, possibly allowing robust prediction across unrelated populations. Although the specific variants causing polygenic variation may never be known with certainty, a number of tools and resources can be used to identify those most likely to affect phenotype. Associations of dense SNP genotypes with phenotype provide a 1-dimensional approach for identifying genes affecting specific traits; in contrast, associations with multiple traits allow defining networks of genes interacting to affect correlated traits. Such networks are especially compelling when corroborated by existing functional annotation and established molecular pathways. The SNP occurring within network genes, obtained from public databases or derived from genome and transcriptome sequences, may be classified according to expected effects on gene products. 
As illustrated by functionally informed genomic predictions being more accurate than naive whole-genome predictions of beef tenderness, coupling evidence from livestock genotypes, phenotypes, gene expression, and genomic variants with existing knowledge of gene functions and interactions may provide greater insight into the genes and genomic mechanisms affecting polygenic traits and facilitate functional genomic selection for economically important traits.
Spontaneous belief attribution in younger siblings of children on the autism spectrum.
Gliga, Teodora; Senju, Atsushi; Pettinato, Michèle; Charman, Tony; Johnson, Mark H
2014-03-01
The recent development in the measurements of spontaneous mental state understanding, employing eye-movements instead of verbal responses, has opened new opportunities for understanding the developmental origin of "mind-reading" impairments frequently described in autism spectrum disorders (ASDs). Our main aim was to characterize the relationship between mental state understanding and the broader autism phenotype, early in childhood. An eye-tracker was used to capture anticipatory looking as a measure of false beliefs attribution in 3-year-old children with a family history of autism (at-risk participants, n = 47) and controls (control participants, n = 39). Unlike controls, the at-risk group, independent of their clinical outcome (ASD, broader autism phenotype or typically developing), performed at chance. Performance was not related to children's verbal or general IQ, nor was it explained by children "missing out" on crucial information, as shown by an analysis of visual scanning during the task. We conclude that difficulties with using mental state understanding for action prediction may be an endophenotype of autism spectrum disorders. PsycINFO Database Record (c) 2014 APA, all rights reserved.
PERSON-Personalized Expert Recommendation System for Optimized Nutrition.
Chen, Chih-Han; Karvela, Maria; Sohbati, Mohammadreza; Shinawatra, Thaksin; Toumazou, Christofer
2018-02-01
The rise of personalized diets is due to the emergence of nutrigenetics and genetic testing services. However, recommendation systems are far from mature enough to provide personalized food suggestions to consumers for daily use. The main barriers to connecting genetic information to personalized diets are the complexity of the data and the scalability of the applied systems. Aiming to cross such barriers and provide direct applications, a personalized expert recommendation system for optimized nutrition is introduced in this paper, which performs direct-to-consumer personalized grocery product filtering and recommendation. A deep learning neural network model is applied to achieve automatic product categorization. The ability to scale to unknown new data is achieved through the generalized representation of word embedding. Furthermore, the categorized products are filtered with a model based on individual genetic data with associated phenotypic information, and a case study with databases from three different sources is carried out to confirm the system.
Using text mining techniques to extract phenotypic information from the PhenoCHF corpus
Alnazzawi, Noha; Thompson, Paul; Batista-Navarro, Riza; Ananiadou, Sophia
2015-01-01
Background: Phenotypic information locked away in unstructured narrative text presents significant barriers to information accessibility, both for clinical practitioners and for computerised applications used for clinical research purposes. Text mining (TM) techniques have previously been applied successfully to extract different types of information from text in the biomedical domain. They have the potential to be extended to allow the extraction of information relating to phenotypes from free text. Methods: To stimulate the development of TM systems that are able to extract phenotypic information from text, we have created a new corpus (PhenoCHF) that is annotated by domain experts with several types of phenotypic information relating to congestive heart failure. To ensure that systems developed using the corpus are robust to multiple text types, it integrates text from heterogeneous sources, i.e., electronic health records (EHRs) and scientific articles from the literature. We have developed several different phenotype extraction methods to demonstrate the utility of the corpus, and tested these methods on a further corpus, i.e., ShARe/CLEF 2013. Results: Evaluation of our automated methods showed that PhenoCHF can facilitate the training of reliable phenotype extraction systems, which are robust to variations in text type. These results have been reinforced by evaluating our trained systems on the ShARe/CLEF corpus, which contains clinical records of various types. Like other studies within the biomedical domain, we found that solutions based on conditional random fields produced the best results, when coupled with a rich feature set. Conclusions: PhenoCHF is the first annotated corpus aimed at encoding detailed phenotypic information. The unique heterogeneous composition of the corpus has been shown to be advantageous in the training of systems that can accurately extract phenotypic information from a range of different text types. Although the scope of our annotation is currently limited to a single disease, the promising results achieved can stimulate further work into the extraction of phenotypic information for other diseases. The PhenoCHF annotation guidelines and annotations are publicly available at https://code.google.com/p/phenochf-corpus. PMID:26099853
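To make the phrase "conditional random fields ... coupled with a rich feature set" concrete, here is a minimal sketch of per-token feature extraction of the kind typically fed to a CRF sequence labeller. The specific features and the example sentence are assumptions chosen for illustration; they are not the feature set used for PhenoCHF, and training an actual model would additionally require a CRF library.

```python
def token_features(tokens, i):
    """Build a feature dictionary for the token at position i."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_title": tok.istitle(),
        "is_upper": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "suffix3": tok[-3:].lower(),
        # Context features: neighbouring tokens, with boundary markers.
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def sentence_features(tokens):
    """Return one feature dictionary per token, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Invented example sentence.
sent = ["Patient", "has", "congestive", "heart", "failure"]
feats = sentence_features(sent)
print(feats[2])
```

Each sentence becomes a sequence of feature dictionaries paired with a sequence of labels (for example, begin/inside/outside tags for phenotype mentions), which is the input shape most CRF toolkits expect.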
Genetic and environmental continuity in personality development: a meta-analysis.
Briley, Daniel A; Tucker-Drob, Elliot M
2014-09-01
The longitudinal stability of personality is low in childhood but increases substantially into adulthood. Theoretical explanations for this trend differ in the emphasis placed on intrinsic maturation and socializing influences. To what extent does the increasing stability of personality result from the continuity and crystallization of genetically influenced individual differences, and to what extent does the increasing stability of life experiences explain increases in personality trait stability? Behavioral genetic studies, which decompose longitudinal stability into sources associated with genetic and environmental variation, can help to address this question. We aggregated effect sizes from 24 longitudinal behavioral genetic studies containing information on a total of 21,057 sibling pairs from 6 types that varied in terms of genetic relatedness and ranged in age from infancy to old age. A combination of linear and nonlinear meta-analytic regression models were used to evaluate age trends in levels of heritability and environmentality, stabilities of genetic and environmental effects, and the contributions of genetic and environmental effects to overall phenotypic stability. Both the genetic and environmental influences on personality increase in stability with age. The contribution of genetic effects to phenotypic stability is moderate in magnitude and relatively constant with age, in part because of small-to-moderate decreases in the heritability of personality over child development that offset increases in genetic stability. In contrast, the contribution of environmental effects to phenotypic stability increases from near zero in early childhood to moderate in adulthood. The life-span trend of increasing phenotypic stability, therefore, predominantly results from environmental mechanisms. PsycINFO Database Record (c) 2014 APA, all rights reserved.
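The decomposition of phenotypic stability into genetic and environmental contributions can be sketched numerically. The sketch assumes the standard bivariate behavior-genetic identity, in which each contribution is the geometric mean of the standardized variance components at the two ages multiplied by the corresponding cross-time correlation, and a two-component model in which environmentality equals one minus heritability. The input values are invented and are not estimates from the meta-analysis.

```python
import math


def stability_contributions(h2_t1, h2_t2, r_g, r_e):
    """Split test-retest stability into genetic and environmental parts.

    h2_t1, h2_t2: heritability at time 1 and time 2 (proportions of variance)
    r_g, r_e:     cross-time genetic and environmental correlations
    Assumes environmentality = 1 - heritability at each time point.
    """
    e2_t1, e2_t2 = 1.0 - h2_t1, 1.0 - h2_t2
    genetic = math.sqrt(h2_t1 * h2_t2) * r_g
    environmental = math.sqrt(e2_t1 * e2_t2) * r_e
    return genetic, environmental, genetic + environmental


# Illustrative inputs only.
g, e, total = stability_contributions(h2_t1=0.4, h2_t2=0.4, r_g=0.9, r_e=0.5)
print(round(g, 2), round(e, 2), round(total, 2))
```

Under this identity, falling heritability can offset rising genetic stability, which is how the genetic contribution can stay roughly constant with age while the environmental contribution grows.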
Ivy, Reid A; Farber, Jeffrey M; Pagotto, Franco; Wiedmann, Martin
2013-01-01
Foodborne pathogen isolate collections are important for the development of detection methods, for validation of intervention strategies, and to develop an understanding of pathogenesis and virulence. We have assembled a publicly available Cronobacter (formerly Enterobacter sakazakii) isolate set that consists of (i) 25 Cronobacter sakazakii isolates, (ii) two Cronobacter malonaticus isolates, (iii) one Cronobacter muytjensii isolate, which displays some atypical phenotypic characteristics, biochemical profiles, and colony color on selected differential media, and (iv) two nonclinical Enterobacter asburiae isolates, which show some phenotypic characteristics similar to those of Cronobacter spp. The set consists of human (n = 10), food (n = 11), and environmental (n = 9) isolates. Analysis of partial 16S rDNA sequence and seven-gene multilocus sequence typing data allowed for reliable identification of these isolates to species and identification of 14 isolates as sequence type 4, which had previously been shown to be the most common C. sakazakii sequence type associated with neonatal meningitis. Phenotypic characterization was carried out with API 20E and API 32E test strips and streaking on two selective chromogenic agars; isolates were also assessed for sorbitol fermentation and growth at 45°C. Although these strategies typically produced the same classification as sequence-based strategies, based on a panel of four biochemical tests, one C. sakazakii isolate yielded inconclusive data and one was classified as C. malonaticus. EcoRI automated ribotyping and pulsed-field gel electrophoresis (PFGE) with XbaI separated the set into 23 unique ribotypes and 30 unique PFGE types, respectively, indicating subtype diversity within the set. Subtype and source data for the collection are publicly available in the PathogenTracker database (www.pathogentracker.net), which allows for continuous updating of information on the set, including links to publications that include information on isolates from this collection.
Wendelsdorf, Katherine; Shah, Sohela
2015-09-01
There is an ongoing effort in the biomedical research community to leverage Next Generation Sequencing (NGS) technology to identify genetic variants that affect our health. The main challenge facing researchers is getting enough samples from individuals, either sick or healthy, to be able to reliably identify the few variants that are causal for a phenotype among all other variants typically seen among individuals. At the same time, more and more individuals are having their genome sequenced either out of curiosity or to identify the cause of an illness. These individuals may benefit from a way to view and understand their data. QIAGEN's Ingenuity Variant Analysis is an online application that allows users with and without extensive bioinformatics training to incorporate information from published experiments, genetic databases, and a variety of statistical models to identify variants, from a long list of candidates, that are most likely causal for a phenotype as well as annotate variants with what is already known about them in the literature and databases. Ingenuity Variant Analysis is also an information sharing platform where users may exchange samples and analyses. The Empowered Genome Community (EGC) is a new program in which QIAGEN is making this online tool freely available to any individual who wishes to analyze their own genetic sequence. EGC members are then able to make their data available to other Ingenuity Variant Analysis users to be used in research. Here we present and describe the Empowered Genome Community in detail. We also present a preliminary, proof-of-concept study that utilizes the 200 genomes currently available through the EGC. The goal of this program is to allow individuals to access and understand their own data as well as facilitate citizen-scientist collaborations that can drive research forward and spur quality scientific dialogue in the general public.
Siew, Edward D; Basu, Rajit K; Wunsch, Hannah; Shaw, Andrew D; Goldstein, Stuart L; Ronco, Claudio; Kellum, John A; Bagshaw, Sean M
2016-01-01
The purpose of this review is to report how administrative data have been used to study AKI, identify current limitations, and suggest how these data sources might be enhanced to address knowledge gaps in the field. 1) To review the existing evidence-base on how AKI is coded across administrative datasets, 2) To identify limitations, gaps in knowledge, and major barriers to scientific progress in AKI related to coding in administrative data, 3) To discuss how administrative data for AKI might be enhanced to enable "communication" and "translation" within and across administrative jurisdictions, and 4) To suggest how administrative databases might be configured to inform 'registry-based' pragmatic studies. Literature review of English language articles through PubMed search for relevant AKI literature focusing on the validation of AKI in administrative data or used administrative data to describe the epidemiology of AKI. Acute Dialysis Quality Initiative (ADQI) Consensus Conference September 6-7(th), 2015, Banff, Canada. Hospitalized patients with AKI. The coding structure for AKI in many administrative datasets limits understanding of true disease burden (especially less severe AKI), its temporal trends, and clinical phenotyping. Important opportunities exist to improve the quality and coding of AKI data to better address critical knowledge gaps in AKI and improve care. A modified Delphi consensus building process consisting of review of the literature and summary statements were developed through a series of alternating breakout and plenary sessions. Administrative codes for AKI are limited by poor sensitivity, lack of standardization to classify severity, and poor contextual phenotyping. These limitations are further hampered by reduced awareness of AKI among providers and the subjective nature of reporting. 
While an idealized definition of AKI may be difficult to implement, improving standardization of reporting by using laboratory-based definitions and providing complementary information on the context in which AKI occurs are possible. Administrative databases may also help enhance the conduct of and inform clinical or registry-based pragmatic studies. Data sources were largely restricted to North America and Europe. Administrative data are rapidly growing and evolving, and represent an unprecedented opportunity to address knowledge gaps in AKI. Progress will require continued efforts to improve awareness of the impact of AKI on public health, engage key stakeholders, and develop tangible strategies to reconfigure infrastructure to improve the reporting and phenotyping of AKI. WHY IS THIS REVIEW IMPORTANT? Rapid growth in the size and availability of administrative data has enhanced the clinical study of acute kidney injury (AKI). However, significant limitations exist in coding that hinder our ability to better understand its epidemiology and address knowledge gaps. The following consensus-based review discusses how administrative data have been used to study AKI, identifies current limitations, and suggests how these data sources might be enhanced to improve the future study of this disease. WHAT ARE THE KEY MESSAGES? The current coding structure of administrative data is hindered by a lack of sensitivity, a lack of standardization to properly classify severity, and limited clinical phenotyping. These limitations, combined with reduced awareness of AKI and the subjective nature of reporting, limit understanding of disease burden across settings and time periods. As administrative data become more sophisticated and complex, important opportunities exist to employ more objective criteria to diagnose and stage AKI and to improve contextual phenotyping, which can help address knowledge gaps and improve care.
Secure count query on encrypted genomic data.
Hasan, Mohammad Zahidul; Mahdi, Md Safiur Rahman; Sadat, Md Nazmus; Mohammed, Noman
2018-05-01
Human genomic information can yield more effective healthcare by guiding medical decisions. Therefore, genomics research is gaining popularity as it can identify potential correlations between a disease and a certain gene, which improves the safety and efficacy of drug treatment and can also develop more effective prevention strategies [1]. To reduce the sampling error and to increase the statistical accuracy of this type of research projects, data from different sources need to be brought together since a single organization does not necessarily possess required amount of data. In this case, data sharing among multiple organizations must satisfy strict policies (for instance, HIPAA and PIPEDA) that have been enforced to regulate privacy-sensitive data sharing. Storage and computation on the shared data can be outsourced to a third party cloud service provider, equipped with enormous storage and computation resources. However, outsourcing data to a third party is associated with a potential risk of privacy violation of the participants, whose genomic sequence or clinical profile is used in these studies. In this article, we propose a method for secure sharing and computation on genomic data in a semi-honest cloud server. In particular, there are two main contributions. Firstly, the proposed method can handle biomedical data containing both genotype and phenotype. Secondly, our proposed index tree scheme reduces the computational overhead significantly for executing secure count query operation. In our proposed method, the confidentiality of shared data is ensured through encryption, while making the entire computation process efficient and scalable for cutting-edge biomedical applications. 
We evaluated our proposed method in terms of efficiency on a database of Single-Nucleotide Polymorphism (SNP) sequences, and experimental results demonstrate that the execution time for a query of 50 SNPs in a database of 50,000 records is approximately 5 s, where each record contains 500 SNPs. And, it requires 69.7 s to execute the query on the same database that also includes phenotypes. Copyright © 2018 Elsevier Inc. All rights reserved.
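For readers unfamiliar with the operation being secured, the sketch below shows what a count query over combined genotype and phenotype records computes, on plaintext toy data. It deliberately omits everything that makes the proposed method secure or fast: there is no encryption and no index tree here, and the field names and records are hypothetical.

```python
def count_query(records, predicate):
    """Count records whose fields match every (field, value) pair in predicate."""
    return sum(
        1
        for rec in records
        if all(rec.get(field) == value for field, value in predicate.items())
    )


# Toy plaintext records, invented for illustration only.
RECORDS = [
    {"snp1": "AG", "snp2": "CC", "phenotype": "case"},
    {"snp1": "AG", "snp2": "CT", "phenotype": "control"},
    {"snp1": "AA", "snp2": "CC", "phenotype": "case"},
    {"snp1": "AG", "snp2": "CC", "phenotype": "control"},
]

# Genotype-only query, then the same query restricted by phenotype.
n_geno = count_query(RECORDS, {"snp1": "AG", "snp2": "CC"})
n_both = count_query(RECORDS, {"snp1": "AG", "snp2": "CC", "phenotype": "case"})
print(n_geno, n_both)
```

This naive version scans every record for every query; the abstract's index tree is presented as the means of reducing that overhead when the same question is asked over encrypted data.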
Richard, Annie E; Scheffer, Ingrid E; Wilson, Sarah J
2017-04-01
Richard, A.E., I.E. Scheffer and S.J. Wilson. Features of the broader autism phenotype in people with epilepsy support shared mechanisms between epilepsy and autism spectrum disorder. NEUROSCI BIOBEHAV REV 21(1) XXX-XXX, 2016. To inform on mechanisms underlying the comorbidity of epilepsy and autism spectrum disorder (ASD), we conducted meta-analyses to test whether impaired facial emotion recognition (FER) and theory of mind (ToM), key phenotypic traits of ASD, are more common in people with epilepsy (PWE) than controls. We contrasted these findings with those of relatives of individuals with ASD (ASD-relatives) compared to controls. Furthermore, we examined the relationship of demographic (age, IQ, sex) and epilepsy-related factors (epilepsy onset age, duration, seizure laterality and origin) to FER and ToM. Thirty-one eligible studies of PWE (including 1449 individuals: 77% with temporal lobe epilepsy), and 22 of ASD-relatives (N=1295) were identified by a systematic database search. Analyses revealed reduced FER and ToM in PWE compared to controls (p<0.001), but only reduced ToM in ASD-relatives (p<0.001). ToM was poorer in PWE than ASD-relatives. Only weak associations were found between FER and ToM and epilepsy-related factors. These findings suggest shared mechanisms between epilepsy and ASD, independent of intellectual disability. Copyright © 2017 Elsevier Ltd. All rights reserved.
Bethell, Richard; Scherer, Joseph; Witvrouw, Myriam; Paquet, Agnes; Coakley, Eoin; Hall, David
2012-09-01
To test tipranavir (TPV) or darunavir (DRV) as treatment options for patients with phenotypic resistance to protease inhibitors (PIs), including lopinavir, saquinavir, atazanavir, and fosamprenavir, the PhenoSense GT database was analyzed for susceptibility to DRV or TPV among PI-resistant isolates. The Monogram Biosciences HIV database (South San Francisco, CA) containing 7775 clinical isolates (2006-2008) not susceptible to at least one first-generation PI was analyzed. Phenotypic responses [resistant (R), partially susceptible (PS), or susceptible (S)] were defined by upper and lower clinical cut-offs to each PI. Genotypes were screened for amino acid substitutions associated with TPV-R/DRV-S and TPV-S/DRV-R phenotypes. In all, 4.9% (378) of isolates were resistant to all six PIs and 31.0% (2407) were resistant to none. Among isolates resistant to all four first-generation PIs, DRV resistance increased from 21.2% to 41.9% from 2006 to 2008, respectively, and resistance to TPV remained steady (53.9 to 57.3%, respectively). Higher prevalence substitutions in DRV-S/TPV-R isolates versus DRV-R/TPV-S isolates, respectively, were 82L/T (44.4% vs. 0%) and 83D (5.8% vs. 0%). Higher prevalence substitutions in DRV-R/TPV-S virus were 50V (0.0% vs. 28.9%), 54L (1.0% vs. 36.1%), and 76V (0.4% vs. 15.5%). Mutations to help predict discordant susceptibility to DRV and TPV in isolates with reduced susceptibility to other PIs were identified. DRV resistance mutations associated with improved virologic response to TPV were more prevalent in DRV-R/TPV-S isolates. TPV resistance mutations were more prevalent in TPV-R and DRV-S isolates. These results confirm the impact of genotype on phenotype, illustrating how HIV genotype and phenotype data assist regimen optimization.
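The three-way phenotypic call defined by lower and upper clinical cut-offs can be sketched as a small classifier. The cut-off values, the fold-change inputs and the boundary convention (values exactly at a cut-off fall into the partially susceptible band) are assumptions made for illustration and are not taken from the abstract.

```python
def classify(fold_change, lower_cutoff, upper_cutoff):
    """Return 'S', 'PS' or 'R' for a fold-change in drug susceptibility."""
    if lower_cutoff > upper_cutoff:
        raise ValueError("lower cut-off must not exceed upper cut-off")
    if fold_change < lower_cutoff:
        return "S"   # susceptible: below the lower clinical cut-off
    if fold_change > upper_cutoff:
        return "R"   # resistant: above the upper clinical cut-off
    return "PS"      # partially susceptible: between the two cut-offs


def drv_tpv_profile(fold_changes, cutoffs):
    """Label an isolate such as 'DRV-S/TPV-R' from per-drug fold changes."""
    calls = {drug: classify(fold_changes[drug], *cutoffs[drug]) for drug in ("DRV", "TPV")}
    return f"DRV-{calls['DRV']}/TPV-{calls['TPV']}"


# Placeholder (lower, upper) cut-offs and fold changes, for illustration only.
CUTOFFS = {"DRV": (10.0, 90.0), "TPV": (2.0, 8.0)}
isolate = {"DRV": 3.0, "TPV": 12.0}

profile = drv_tpv_profile(isolate, CUTOFFS)
print(profile)
```

Isolates labelled DRV-S/TPV-R or DRV-R/TPV-S are the discordant groups whose genotypes the study screened for distinguishing substitutions.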
DNApod: DNA polymorphism annotation database from next-generation sequence read archives
Mochizuki, Takako; Tanizawa, Yasuhiro; Fujisawa, Takatomo; Ohta, Tazro; Nikoh, Naruo; Shimizu, Tokurou; Toyoda, Atsushi; Fujiyama, Asao; Kurata, Nori; Nagasaki, Hideki; Kaminuma, Eli; Nakamura, Yasukazu
2017-01-01
With the rapid advances in next-generation sequencing (NGS), datasets for DNA polymorphisms among various species and strains have been produced, stored, and distributed. However, reliability varies among these datasets because the experimental and analytical conditions used differ among assays. Furthermore, such datasets have been frequently distributed from the websites of individual sequencing projects. It is desirable to integrate DNA polymorphism data into one database featuring uniform quality control that is distributed from a single platform at a single place. DNA polymorphism annotation database (DNApod; http://tga.nig.ac.jp/dnapod/) is an integrated database that stores genome-wide DNA polymorphism datasets acquired under uniform analytical conditions, and this includes uniformity in the quality of the raw data, the reference genome version, and evaluation algorithms. DNApod genotypic data are re-analyzed whole-genome shotgun datasets extracted from sequence read archives, and DNApod distributes genome-wide DNA polymorphism datasets and known-gene annotations for each DNA polymorphism. This new database was developed for storing genome-wide DNA polymorphism datasets of plants, with crops being the first priority. Here, we describe our analyzed data for 679, 404, and 66 strains of rice, maize, and sorghum, respectively. The analytical methods are available as a DNApod workflow in an NGS annotation system of the DNA Data Bank of Japan and a virtual machine image. Furthermore, DNApod provides tables of links of identifiers between DNApod genotypic data and public phenotypic data. To advance the sharing of organism knowledge, DNApod offers basic and ubiquitous functions for multiple alignment and phylogenetic tree construction by using orthologous gene information. PMID:28234924
HEROD: a human ethnic and regional specific omics database.
Zeng, Xian; Tao, Lin; Zhang, Peng; Qin, Chu; Chen, Shangying; He, Weidong; Tan, Ying; Xia Liu, Hong; Yang, Sheng Yong; Chen, Zhe; Jiang, Yu Yang; Chen, Yu Zong
2017-10-15
Genetic and gene expression variations within and between populations and across geographical regions have substantial effects on the biological phenotypes, diseases, and therapeutic response. The development of precision medicines can be facilitated by the OMICS studies of the patients of specific ethnicity and geographic region. However, facilities for broadly and conveniently accessing ethnic and regional specific OMICS data are inadequate. Here, we introduce a new free database, HEROD, a human ethnic and regional specific OMICS database. Its first version contains the gene expression data of 53 070 patients of 169 diseases in seven ethnic populations from 193 cities/regions in 49 nations curated from the Gene Expression Omnibus (GEO), the ArrayExpress Archive of Functional Genomics Data (ArrayExpress), the Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC). Geographic region information of curated patients was mainly manually extracted from referenced publications of each original study. These data can be accessed and downloaded via keyword search, world map search, and menu-bar search of disease name, the international classification of disease code, geographical region, location of sample collection, ethnic population, gender, age, sample source organ, patient type (patient or healthy), sample type (disease or normal tissue) and assay type on the web interface. The HEROD database is freely accessible at http://bidd2.nus.edu.sg/herod/index.php. The database and web interface are implemented in MySQL, PHP and HTML with all major browsers supported. Contact: phacyz@nus.edu.sg. © The Author (2017). Published by Oxford University Press.
HormoneBase, a population-level database of steroid hormone levels across vertebrates
Vitousek, Maren N.; Johnson, Michele A.; Donald, Jeremy W.; Francis, Clinton D.; Fuxjager, Matthew J.; Goymann, Wolfgang; Hau, Michaela; Husak, Jerry F.; Kircher, Bonnie K.; Knapp, Rosemary; Martin, Lynn B.; Miller, Eliot T.; Schoenle, Laura A.; Uehling, Jennifer J.; Williams, Tony D.
2018-01-01
Hormones are central regulators of organismal function and flexibility that mediate a diversity of phenotypic traits from early development through senescence. Yet despite these important roles, basic questions about how and why hormone systems vary within and across species remain unanswered. Here we describe HormoneBase, a database of circulating steroid hormone levels and their variation across vertebrates. This database aims to provide all available data on the mean, variation, and range of plasma glucocorticoids (both baseline and stress-induced) and androgens in free-living and un-manipulated adult vertebrates. HormoneBase (www.HormoneBase.org) currently includes >6,580 entries from 476 species, reported in 648 publications from 1967 to 2015, and unpublished datasets. Entries are associated with data on the species and population, sex, year and month of study, geographic coordinates, life history stage, method and latency of hormone sampling, and analysis technique. This novel resource could be used for analyses of the function and evolution of hormone systems, and the relationships between hormonal variation and a variety of processes including phenotypic variation, fitness, and species distributions. PMID:29786693
Exercises in Anatomy, Connectivity, and Morphology using Neuromorpho.org and the Allen Brain Atlas.
Chu, Philip; Peck, Joshua; Brumberg, Joshua C
2015-01-01
Laboratory instruction of neuroscience is often limited by the lack of physical resources and supplies (e.g., brain specimens, dissection kits, physiological equipment). Online databases can serve as supplements to material labs by providing professionally collected images of brain specimens and their underlying cellular populations with resolution and quality that is extremely difficult to access for strictly pedagogical purposes. We describe a method using two online databases, Neuromorpho.org and the Allen Brain Atlas (ABA), that freely provide access to data from working brain scientists that can be modified for laboratory instruction/exercises. Neuromorpho.org is the first neuronal morphology database that provides qualitative and quantitative data from reconstructed cells analyzed in published scientific reports. The Neuromorpho.org database contains cross-species and multiple neuronal phenotype datasets, which allow for comparative examinations. The ABA provides modules that allow students to study the anatomy of the rodent brain, as well as observe the different cellular phenotypes that exist using histochemical labeling. Using these tools in conjunction, advanced students can ask questions about qualitative and quantitative neuronal morphology, then examine the distribution of the same cell types across the entire brain to gain a full appreciation of the magnitude of the brain's complexity.
The BioGRID interaction database: 2013 update.
Chatr-Aryamontri, Andrew; Breitkreutz, Bobby-Joe; Heinicke, Sven; Boucher, Lorrie; Winter, Andrew; Stark, Chris; Nixon, Julie; Ramage, Lindsay; Kolas, Nadine; O'Donnell, Lara; Reguly, Teresa; Breitkreutz, Ashton; Sellam, Adnane; Chen, Daici; Chang, Christie; Rust, Jennifer; Livstone, Michael; Oughtred, Rose; Dolinski, Kara; Tyers, Mike
2013-01-01
The Biological General Repository for Interaction Datasets (BioGRID: http://thebiogrid.org) is an open access archive of genetic and protein interactions that are curated from the primary biomedical literature for all major model organism species. As of September 2012, BioGRID houses more than 500 000 manually annotated interactions from more than 30 model organisms. BioGRID maintains complete curation coverage of the literature for the budding yeast Saccharomyces cerevisiae, the fission yeast Schizosaccharomyces pombe and the model plant Arabidopsis thaliana. A number of themed curation projects in areas of biomedical importance are also supported. BioGRID has established collaborations and/or shares data records for the annotation of interactions and phenotypes with most major model organism databases, including Saccharomyces Genome Database, PomBase, WormBase, FlyBase and The Arabidopsis Information Resource. BioGRID also actively engages with the text-mining community to benchmark and deploy automated tools to expedite curation workflows. BioGRID data are freely accessible through both a user-defined interactive interface and in batch downloads in a wide variety of formats, including PSI-MI2.5 and tab-delimited files. BioGRID records can also be interrogated and analyzed with a series of new bioinformatics tools, which include a post-translational modification viewer, a graphical viewer, a REST service and a Cytoscape plugin.
Protein-protein interaction networks: unraveling the wiring of molecular machines within the cell.
De Las Rivas, Javier; Fontanillo, Celia
2012-11-01
Mapping and understanding of the protein interaction networks with their key modules and hubs can provide deeper insights into the molecular machinery underlying complex phenotypes. In this article, we present the basic characteristics and definitions of protein networks, starting with a distinction of the different types of associations between proteins. We focus the review on protein-protein interactions (PPIs), a subset of associations defined as physical contacts between proteins that occur by selective molecular docking in a particular biological context. We present such definition as opposed to other types of protein associations derived from regulatory, genetic, structural or functional relations. To determine PPIs, a variety of binary and co-complex methods exist; however, not all the technologies provide the same information and data quality. A way of increasing confidence in a given protein interaction is to integrate orthogonal experimental evidences. The use of several complementary methods testing each single interaction assesses the accuracy of PPI data and tries to minimize the occurrence of false interactions. Following this approach there have been important efforts to unify primary databases of experimentally proven PPIs into integrated databases. These meta-databases provide a measure of the confidence of interactions based on the number of experimental proofs that report them. As a conclusion, we can state that integrated information allows the building of more reliable interaction networks. Identification of communities, cliques, modules and hubs by analysing the topological parameters and graph properties of the protein networks allows the discovery of central/critical nodes, which are candidates to regulate cellular flux and dynamics.
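As an illustration of the two ideas in the abstract above (scoring an interaction by the number of independent experimental methods that report it, and finding hubs from network topology), here is a minimal Python sketch. The records, the field layout, and the degree threshold are hypothetical and are not taken from the review or from any particular meta-database.

```python
from collections import defaultdict

# Hypothetical records: (protein_a, protein_b, detection_method, source_database).
records = [
    ("P1", "P2", "two-hybrid", "dbA"),
    ("P2", "P1", "co-complex", "dbB"),   # same pair, orthogonal method
    ("P1", "P3", "two-hybrid", "dbA"),
    ("P1", "P4", "two-hybrid", "dbA"),
    ("P1", "P4", "two-hybrid", "dbB"),   # same pair, same method: no extra evidence
    ("P3", "P4", "co-complex", "dbB"),
]


def integrate(records):
    """Map each undirected protein pair to the set of distinct methods reporting it."""
    evidence = defaultdict(set)
    for a, b, method, _source in records:
        pair = tuple(sorted((a, b)))     # (P2, P1) and (P1, P2) are the same interaction
        evidence[pair].add(method)
    return dict(evidence)


def confidence(evidence):
    """Score each interaction by its number of distinct supporting methods."""
    return {pair: len(methods) for pair, methods in evidence.items()}


def degrees(pairs):
    """Count interaction partners (node degree) for every protein."""
    deg = defaultdict(int)
    for a, b in pairs:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


def hubs(deg, min_degree):
    """Return proteins whose degree reaches an (assumed) hub threshold."""
    return sorted(p for p, d in deg.items() if d >= min_degree)


evidence = integrate(records)
scores = confidence(evidence)
deg = degrees(evidence.keys())
print(scores)
print(hubs(deg, min_degree=3))
```

Counting distinct methods rather than raw records keeps a pair reported twice by the same technique from looking better supported than it is; real integrated databases may weight methods differently.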
SeedStor: A Germplasm Information Management System and Public Database.
Horler, R S P; Turner, A S; Fretter, P; Ambrose, M
2018-01-01
SeedStor (https://www.seedstor.ac.uk) acts as the publicly available database for the seed collections held by the Germplasm Resources Unit (GRU) based at the John Innes Centre, Norwich, UK. The GRU is a national capability supported by the Biotechnology and Biological Sciences Research Council (BBSRC). The GRU curates germplasm collections of a range of temperate cereal, legume and Brassica crops and their associated wild relatives, as well as precise genetic stocks, near-isogenic lines and mapping populations. With >35,000 accessions, the GRU forms part of the UK's plant conservation contribution to the Multilateral System (MLS) of the International Treaty for Plant Genetic Resources for Food and Agriculture (ITPGRFA) for wheat, barley, oat and pea. SeedStor is a fully searchable system that allows our various collections to be browsed species by species through to complicated multipart phenotype criteria-driven queries. The results from these searches can be downloaded for later analysis or used to order germplasm via our shopping cart. The user community for SeedStor is the plant science research community, plant breeders, specialist growers, hobby farmers and amateur gardeners, and educationalists. Furthermore, SeedStor is much more than a database; it has been developed to act internally as a Germplasm Information Management System that allows team members to track and process germplasm requests, determine regeneration priorities, handle cost recovery and Material Transfer Agreement paperwork, manage the Seed Store holdings and easily report on a wide range of the aforementioned tasks. © The Author(s) 2017. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists.
Big data in sleep medicine: prospects and pitfalls in phenotyping
Bianchi, Matt T; Russo, Kathryn; Gabbidon, Harriett; Smith, Tiaundra; Goparaju, Balaji; Westover, M Brandon
2017-01-01
Clinical polysomnography (PSG) databases are a rich resource in the era of “big data” analytics. We explore the uses and potential pitfalls of clinical data mining of PSG using statistical principles and analysis of clinical data from our sleep center. We performed retrospective analysis of self-reported and objective PSG data from adults who underwent overnight PSG (diagnostic tests, n=1835). Self-reported symptoms overlapped markedly between the two most common categories, insomnia and sleep apnea, with the majority reporting symptoms of both disorders. Standard clinical metrics routinely reported on objective data were analyzed for basic properties (missing values, distributions), pairwise correlations, and descriptive phenotyping. Of 41 continuous variables, including clinical and PSG derived, none passed testing for normality. Objective findings of sleep apnea and periodic limb movements were common, with 51% having an apnea–hypopnea index (AHI) >5 per hour and 25% having a leg movement index >15 per hour. Different visualization methods are shown for common variables to explore population distributions. Phenotyping methods based on clinical databases are discussed for sleep architecture, sleep apnea, and insomnia. Inferential pitfalls are discussed using the current dataset and case examples from the literature. The increasing availability of clinical databases for large-scale analytics holds important promise in sleep medicine, especially as it becomes increasingly important to demonstrate the utility of clinical testing methods in management of sleep disorders. Awareness of the strengths, as well as caution regarding the limitations, will maximize the productive use of big data analytics in sleep medicine. PMID:28243157
Melo, Thaise P; Takada, Luciana; Baldi, Fernando; Oliveira, Henrique N; Dias, Marina M; Neves, Haroldo H R; Schenkel, Flavio S; Albuquerque, Lucia G; Carvalheiro, Roberto
2016-06-21
QTL mapping through genome-wide association studies (GWAS) is challenging, especially in the case of low heritability complex traits and when few animals possess genotypic and phenotypic information. When most of the phenotypic information is from non-genotyped animals, GWAS can be performed using the weighted single-step GBLUP (WssGBLUP) method, which permits combining all available information, even that of non-genotyped animals. However, it is not clear to what extent phenotypic information from non-genotyped animals increases the power of QTL detection, and whether factors such as the extent of linkage disequilibrium (LD) in the population and weighting SNPs in WssGBLUP affect the importance of using information from non-genotyped animals in GWAS. These questions were investigated in this study using real and simulated data. Analysis of real data showed that the use of phenotypes of non-genotyped animals affected SNP effect estimates and, consequently, QTL mapping. Despite some coincidence, the most important genomic regions identified by the analyses, either using or ignoring phenotypes of non-genotyped animals, were not the same. The simulation results indicated that the inclusion of all available phenotypic information, even that of non-genotyped animals, tends to improve QTL detection for low heritability complex traits. For populations with low levels of LD, this trend of improvement was less pronounced. Stronger shrinkage on SNPs explaining lower variance was not necessarily associated with better QTL mapping. The use of phenotypic information from non-genotyped animals in GWAS may improve the ability to detect QTL for low heritability complex traits, especially in populations in which the level of LD is high.
The FlyBase database of the Drosophila genome projects and community literature
2003-01-01
FlyBase (http://flybase.bio.indiana.edu/) provides an integrated view of the fundamental genomic and genetic data on the major genetic model Drosophila melanogaster and related species. FlyBase has primary responsibility for the continual reannotation of the D. melanogaster genome. The ultimate goal of the reannotation effort is to decorate the euchromatic sequence of the genome with as much biological information as is available from the community and from the major genome project centers. A complete revision of the annotations of the now-finished euchromatic genomic sequence has been completed. There are many points of entry to the genome within FlyBase, most notably through maps, gene products and ontologies, structured phenotypic and gene expression data, and anatomy. PMID:12519974
Zhang, Hongkai; Torkamani, Ali; Jones, Teresa M; Ruiz, Diana I; Pons, Jaume; Lerner, Richard A
2011-08-16
Use of large combinatorial antibody libraries and next-generation sequencing of nucleic acids are two of the most powerful methods in modern molecular biology. The libraries are screened using the principles of evolutionary selection, albeit in real time, to enrich for members with a particular phenotype. This selective process necessarily results in the loss of information about less-fit molecules. On the other hand, sequencing of the library, by itself, gives information that is mostly unrelated to phenotype. If the two methods could be combined, the full potential of very large molecular libraries could be realized. Here we report the implementation of a phenotype-information-phenotype cycle that integrates information and gene recovery. After selection for phage-encoded antibodies that bind to targets expressed on the surface of Escherichia coli, the information content of the selected pool is obtained by pyrosequencing. Sequences that encode specific antibodies are identified by a bioinformatic analysis and recovered by a stringent affinity method that is uniquely suited for gene isolation from a highly degenerate collection of nucleic acids. This approach can be generalized for selection of antibodies against targets that are present as minor components of complex systems.
Urban, Martin; Cuzick, Alayne; Rutherford, Kim; Irvine, Alistair; Pedro, Helder; Pant, Rashmi; Sadanadan, Vidyendra; Khamari, Lokanath; Billal, Santoshkumar; Mohanty, Sagar; Hammond-Kosack, Kim E
2017-01-04
The pathogen-host interactions database (PHI-base) is available at www.phi-base.org. PHI-base contains expertly curated molecular and biological information on genes proven to affect the outcome of pathogen-host interactions reported in peer reviewed research articles. In addition, literature that indicates specific gene alterations that did not affect the disease interaction phenotype is curated to provide complete datasets for comparative purposes. Viruses are not included. Here we describe a revised PHI-base Version 4 data platform with improved search, filtering and extended data display functions. A PHIB-BLAST search function is provided, along with a link to PHI-Canto, a tool for authors to directly curate their own published data into PHI-base. The new release of PHI-base Version 4.2 (October 2016) has an increased data content containing information from 2219 manually curated references. The data provide information on 4460 genes from 264 pathogens tested on 176 hosts in 8046 interactions. Prokaryotic and eukaryotic pathogens are represented in almost equal numbers. Approximately 70% of host species are plants and 30% are other species of medical and/or environmental importance. Additional data types included in PHI-base 4 are the direct targets of pathogen effector proteins in experimental and natural host organisms. The curation problems encountered and the future directions of the PHI-base project are briefly discussed. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Metabolic pathways for the whole community.
Hanson, Niels W; Konwar, Kishori M; Hawley, Alyse K; Altman, Tomer; Karp, Peter D; Hallam, Steven J
2014-07-22
A convergence of high-throughput sequencing and computational power is transforming biology into information science. Despite these technological advances, converting bits and bytes of sequence information into meaningful insights remains a challenging enterprise. Biological systems operate on multiple hierarchical levels from genomes to biomes. Holistic understanding of biological systems requires agile software tools that permit comparative analyses across multiple information levels (DNA, RNA, protein, and metabolites) to identify emergent properties, diagnose system states, or predict responses to environmental change. Here we adopt the MetaPathways annotation and analysis pipeline and Pathway Tools to construct environmental pathway/genome databases (ePGDBs) that describe microbial community metabolism using MetaCyc, a highly curated database of metabolic pathways and components covering all domains of life. We evaluate Pathway Tools' performance on three datasets with different complexity and coding potential, including simulated metagenomes, a symbiotic system, and the Hawaii Ocean Time-series. We define accuracy and sensitivity relationships between read length, coverage and pathway recovery and evaluate the impact of taxonomic pruning on ePGDB construction and interpretation. Resulting ePGDBs provide interactive metabolic maps, predict emergent metabolic pathways associated with biosynthesis and energy production and differentiate between genomic potential and phenotypic expression across defined environmental gradients. This multi-tiered analysis provides the user community with specific operating guidelines, performance metrics and prediction hazards for more reliable ePGDB construction and interpretation. Moreover, it demonstrates the power of Pathway Tools in predicting metabolic interactions in natural and engineered ecosystems.
Phenotypic assortment in wild primate networks: implications for the dissemination of information.
Carter, Alecia J; Lee, Alexander E G; Marshall, Harry H; Ticó, Miquel Torrents; Cowlishaw, Guy
2015-05-01
Individuals' access to social information can depend on their social network. Homophily, a preference to associate with similar phenotypes, may cause assortment within social networks that could preclude information transfer from individuals who generate information to those who would benefit from acquiring it. Thus, understanding phenotypic assortment may lead to a greater understanding of the factors that could limit the transfer of information between individuals. We tested whether there was assortment in wild baboon (Papio ursinus) networks, using data collected from two troops over 6 years for six phenotypic traits (boldness, age, dominance rank, sex and the propensity to generate/exploit information) and two methods for defining a connection between individuals (time spent in proximity and grooming). Our analysis indicated that assortment was more common in grooming than proximity networks. In general, there was homophily for boldness, age, rank and the propensity to both generate and exploit information, but heterophily for sex. However, there was considerable variability both between troops and years. The patterns of homophily we observed for these phenotypes may impede information transfer between them. However, the inconsistency in the strength of assortment between troops and years suggests that the limitations to information flow may be quite variable.
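The abstract does not say how assortment was quantified. As one common way to express homophily and heterophily for a categorical trait, the sketch below computes Newman's assortativity coefficient on an unweighted, undirected network; using this coefficient here is an assumption, and the individuals, traits, and edges are invented for illustration only.

```python
from collections import defaultdict


def assortativity(edges, trait):
    """Newman's assortativity coefficient for a categorical trait.

    edges: list of (u, v) undirected connections (e.g. grooming partners).
    trait: dict mapping each individual to its category.
    Returns a value in [-1, 1]: positive means homophily, negative heterophily.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("network has no edges")
    # e[(i, j)]: fraction of edge ends joining category i to category j (symmetric).
    e = defaultdict(float)
    for u, v in edges:
        tu, tv = trait[u], trait[v]
        e[(tu, tv)] += 0.5 / m
        e[(tv, tu)] += 0.5 / m
    # a[i]: fraction of edge ends attached to category i.
    a = defaultdict(float)
    for (ti, _tj), weight in e.items():
        a[ti] += weight
    observed = sum(w for (ti, tj), w in e.items() if ti == tj)
    expected = sum(x * x for x in a.values())
    if expected == 1.0:
        raise ValueError("all individuals share one category; coefficient undefined")
    return (observed - expected) / (1.0 - expected)


# Hypothetical troop: two bold and two shy individuals.
trait = {"a": "bold", "b": "bold", "c": "shy", "d": "shy"}
homophilous = [("a", "b"), ("c", "d")]      # like associates with like
heterophilous = [("a", "c"), ("b", "d")]    # like avoids like
print(assortativity(homophilous, trait))
print(assortativity(heterophilous, trait))
```

Weighted edges (for example, time spent grooming) and significance testing by permutation would be needed for a real analysis; both are omitted here.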
Chen, Yi-Ping Phoebe; Hanan, Jim
2002-01-01
Models of plant architecture allow us to explore how genotype-environment interactions affect the development of plant phenotypes. Such models generate masses of data organised in complex hierarchies. This paper presents a generic system for creating and automatically populating a relational database from data generated by the widely used L-system approach to modelling plant morphogenesis. Techniques from compiler technology are applied to generate attributes (new fields) in the database, to simplify query development for the recursively-structured branching relationship. Use of biological terminology in an interactive query builder contributes towards making the system biologist-friendly.
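To make the idea of derived attributes concrete, the following Python sketch parses a bracketed L-system string into relational rows and stores a precomputed branch order next to each segment's parent link, so that a query about branching level needs no recursion. The table name, column names, and the treatment of `F` as the only segment symbol are assumptions made for this illustration and do not reproduce the authors' schema.

```python
import sqlite3


def parse_lsystem(string):
    """Turn a bracketed L-system string into (id, parent_id, branch_order, symbol) rows.

    'F' is treated as a segment, '[' opens a side branch, ']' closes it;
    every other symbol (turns such as '+' and '-') is ignored in this sketch.
    """
    rows = []
    stack = []        # (parent_id, branch_order) saved at each '['
    parent = None     # id of the segment the next one attaches to
    order = 0         # 0 = main axis, 1 = first-order branch, ...
    for ch in string:
        if ch == "[":
            stack.append((parent, order))
            order += 1
        elif ch == "]":
            parent, order = stack.pop()
        elif ch == "F":
            seg_id = len(rows) + 1
            rows.append((seg_id, parent, order, ch))
            parent = seg_id
    return rows


def load(rows):
    """Create an in-memory database and populate it with the parsed segments."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE segment ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER REFERENCES segment(id),"
        " branch_order INTEGER NOT NULL,"   # derived attribute
        " symbol TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO segment VALUES (?, ?, ?, ?)", rows)
    return conn


rows = parse_lsystem("F[+F]F[-F[+F]]F")
conn = load(rows)
# Thanks to the derived column, "which segments form the main axis?" is a flat query.
main_axis = [r[0] for r in conn.execute(
    "SELECT id FROM segment WHERE branch_order = 0 ORDER BY id")]
print(main_axis)
```

Without the `branch_order` column, the same question would require walking `parent_id` links recursively, which is the kind of query complexity the abstract says the generated attributes are meant to avoid.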
Neuroimaging Data Sharing on the Neuroinformatics Database Platform
Book, Gregory A; Stevens, Michael; Assaf, Michal; Glahn, David; Pearlson, Godfrey D
2015-01-01
We describe the Neuroinformatics Database (NiDB), an open-source database platform for archiving, analysis, and sharing of neuroimaging data. Data from the multi-site projects Autism Brain Imaging Data Exchange (ABIDE), Bipolar-Schizophrenia Network on Intermediate Phenotypes parts one and two (B-SNIP1, B-SNIP2), and Monetary Incentive Delay task (MID) are available for download from the public instance of NiDB, with more projects sharing data as it becomes available. As demonstrated by making several large datasets available, NiDB is an extensible platform appropriately suited to archive and distribute shared neuroimaging data. PMID:25888923
Discovering cancer vulnerabilities using high-throughput micro-RNA screening.
Nikolic, Iva; Elsworth, Benjamin; Dodson, Eoin; Wu, Sunny Z; Gould, Cathryn M; Mestdagh, Pieter; Marshall, Glenn M; Horvath, Lisa G; Simpson, Kaylene J; Swarbrick, Alexander
2017-12-15
Micro-RNAs (miRNAs) are potent regulators of gene expression and cellular phenotype. Each miRNA has the potential to target hundreds of transcripts within the cell thus controlling fundamental cellular processes such as survival and proliferation. Here, we exploit this important feature of miRNA networks to discover vulnerabilities in cancer phenotype, and map miRNA-target relationships across different cancer types. More specifically, we report the results of a functional genomics screen of 1280 miRNA mimics and inhibitors in eight cancer cell lines, and its presentation in a sophisticated interactive data portal. This resource represents the most comprehensive survey of miRNA function in oncology, incorporating breast cancer, prostate cancer and neuroblastoma. A user-friendly web portal couples this experimental data with multiple tools for miRNA target prediction, pathway enrichment analysis and visualization. In addition, the database integrates publicly available gene expression and perturbation data enabling tailored and context-specific analysis of miRNA function in a particular disease. As a proof-of-principle, we use the database and its innovative features to uncover novel determinants of the neuroblastoma malignant phenotype. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Ajayi, Oluwaseun Jessica; Smith, Ebony Jeannae; Viangteeravat, Teeradache; Huang, Eunice Y; Nagisetty, Naga Satya V Rao; Urraca, Nora; Lusk, Laina; Finucane, Brenda; Arkilo, Dimitrios; Young, Jennifer; Jeste, Shafali; Thibert, Ronald; Reiter, Lawrence T
2017-10-18
Chromosome 15q11.2-q13.1 duplication syndrome (Dup15q syndrome) is a rare disorder caused by duplications of chromosome 15q11.2-q13.1, resulting in a wide range of developmental disabilities in affected individuals. The Dup15q Alliance is an organization that provides family support and promotes research to improve the quality of life of patients living with Dup15q syndrome. Because of the low prevalence of this condition, the establishment of a single research repository would have been difficult and more time consuming without collaboration across multiple institutions. The goal of this project is to establish a national deidentified database with clinical and survey information on individuals diagnosed with Dup15q syndrome. The development of a multiclinic site repository for clinical and survey data on individuals with Dup15q syndrome was initiated and supported by the Dup15q Alliance. Using collaborative workflows, communication protocols, and stakeholder engagement tools, a comprehensive database of patient-centered information was built. We successfully established a self-report populating, centralized repository for Dup15q syndrome research. This repository also resulted in the development of standardized instruments that can be used for other studies relating to developmental disorders. Standardizing the data collection instruments allows us to integrate our data with other national databases, such as the National Database for Autism Research. A substantial portion of the data collected from the questionnaires was facilitated through direct engagement of participants and their families. This allowed for a more complete set of information to be collected with a minimal turnaround time. We developed a repository that can efficiently be mined for shared clinical phenotypes observed at multiple clinic sites and used as a springboard for future clinical and basic research studies.
©Oluwaseun Jessica Ajayi, Ebony Jeannae Smith, Teeradache Viangteeravat, Eunice Y Huang, Naga Satya V Rao Nagisetty, Nora Urraca, Laina Lusk, Brenda Finucane, Dimitrios Arkilo, Jennifer Young, Shafali Jeste, Ronald Thibert, The Dup15q Alliance, Lawrence T Reiter. Originally published in JMIR Research Protocols (http://www.researchprotocols.org), 18.10.2017.
Lynx web services for annotations and systems analysis of multi-gene disorders.
Sulakhe, Dinanath; Taylor, Andrew; Balasubramanian, Sandhya; Feng, Bo; Xie, Bingqing; Börnigen, Daniela; Dave, Utpal J; Foster, Ian T; Gilliam, T Conrad; Maltsev, Natalia
2014-07-01
Lynx is a web-based integrated systems biology platform that supports annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Lynx has integrated multiple classes of biomedical data (genomic, proteomic, pathways, phenotypic, toxicogenomic, contextual and others) from various public databases as well as manually curated data from our group and collaborators (LynxKB). Lynx provides tools for gene list enrichment analysis using multiple functional annotations and network-based gene prioritization. Lynx provides access to the integrated database and the analytical tools via REST based Web Services (http://lynx.ci.uchicago.edu/webservices.html). This comprises data retrieval services for specific functional annotations, services to search across the complete LynxKB (powered by Lucene), and services to access the analytical tools built within the Lynx platform. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Clayton, Stephen; Prigmore, Elena; Langley, Elizabeth; Yang, Fengtang; Maguire, Sean; Fu, Beiyuan; Rajan, Diana; Sheppard, Olivia; Scott, Carol; Hauser, Heidi; Stephens, Philip J.; Stebbings, Lucy A.; Ng, Bee Ling; Fitzgerald, Tomas; Quail, Michael A.; Banerjee, Ruby; Rothkamm, Kai; Tybulewicz, Victor L. J.; Fisher, Elizabeth M. C.; Carter, Nigel P.
2013-01-01
Down syndrome (DS) is caused by trisomy of chromosome 21 (Hsa21) and presents a complex phenotype that arises from abnormal dosage of genes on this chromosome. However, the individual dosage-sensitive genes underlying each phenotype remain largely unknown. To help dissect genotype-phenotype correlations in this complex syndrome, the first fully transchromosomic mouse model, the Tc1 mouse, which carries a copy of human chromosome 21, was produced in 2005. The Tc1 strain is trisomic for the majority of genes that cause phenotypes associated with DS, and this freely available mouse strain has become widely used to study DS, the effects of gene dosage abnormalities, and the effect on the basic biology of cells when a mouse carries a freely segregating human chromosome. Tc1 mice were created by a process that included irradiation microcell-mediated chromosome transfer of Hsa21 into recipient mouse embryonic stem cells. Here, the combination of next generation sequencing, array-CGH and fluorescence in situ hybridization technologies has enabled us to identify unsuspected rearrangements of Hsa21 in this mouse model, revealing one deletion, six duplications and more than 25 de novo structural rearrangements. Our study is not only essential for informing functional studies of the Tc1 mouse but also (1) presents for the first time a detailed sequence analysis of the effects of gamma radiation on an entire human chromosome, which gives some mechanistic insight into the effects of radiation damage on DNA, and (2) overcomes specific technical difficulties of assaying a human chromosome on a mouse background where highly conserved sequences may confound the analysis. Sequence data generated in this study is deposited in the ENA database, Study Accession number: ERP000439. PMID:23596509
McAdams, Tom A; Neiderhiser, Jenae M; Rijsdijk, Fruhling V; Narusyte, Jurgita; Lichtenstein, Paul; Eley, Thalia C
2014-07-01
Parental psychopathology, parenting style, and the quality of intrafamilial relationships are all associated with child mental health outcomes. However, most research can say little about the causal pathways underlying these associations. This is because most studies are not genetically informative and are therefore not able to account for the possibility that associations are confounded by gene-environment correlation. That is, biological parents not only provide a rearing environment for their child, but also contribute 50% of their genes. Any associations between parental phenotype and child phenotype are therefore potentially confounded. One technique for disentangling genetic from environmental effects is the children-of-twins (COT) method. This involves using data sets comprising twin parents and their children to distinguish genetic from environmental associations between parent and child phenotypes. The COT technique has grown in popularity in the last decade, and we predict that this surge in popularity will continue. In the present article we explain the COT method for those unfamiliar with its use. We present the logic underlying this approach, discuss strengths and weaknesses, and highlight important methodological considerations for researchers interested in the COT method. We also cover variations on basic COT approaches, including the extended-COT method, capable of distinguishing forms of gene-environment correlation. We then present a systematic review of all the behavioral COT studies published to date. These studies cover such diverse phenotypes as psychosis, substance abuse, internalizing, externalizing, parenting, and marital difficulties. In reviewing this literature, we highlight past applications, identify emergent patterns, and suggest avenues for future research. PsycINFO Database Record (c) 2014 APA, all rights reserved.
Discovery of novel biomarkers and phenotypes by semantic technologies
2013-01-01
Background Biomarkers and target-specific phenotypes are important to targeted drug design and individualized medicine, thus constituting an important aspect of modern pharmaceutical research and development. More and more, the discovery of relevant biomarkers is aided by in silico techniques based on applying data mining and computational chemistry to large molecular databases. However, there is an even larger source of valuable information available that can potentially be tapped for such discoveries: repositories constituted by research documents. Results This paper reports on a pilot experiment to discover potential novel biomarkers and phenotypes for diabetes and obesity by self-organized text mining of about 120,000 PubMed abstracts, public clinical trial summaries, and internal Merck research documents. These documents were directly analyzed by the InfoCodex semantic engine, without prior human manipulations such as parsing. Recall and precision against established but different benchmarks range up to 30% and 50%, respectively. Retrieval of known entities missed by other traditional approaches could be demonstrated. Finally, the InfoCodex semantic engine was shown to discover new diabetes and obesity biomarkers and phenotypes. Amongst these were many interesting candidates with a high potential, although noticeable noise (uninteresting or obvious terms) was generated. Conclusions The reported approach of employing autonomous self-organising semantic engines to aid biomarker discovery, supplemented by appropriate manual curation processes, shows promise. Conservatively, it offers a faster alternative to vocabulary processes dependent on humans having to read and analyze all the texts. More optimistically, it could impact pharmaceutical research, for example to shorten time-to-market of novel drugs, or speed up early recognition of dead ends and adverse reactions. PMID:23402646
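The recall and precision figures quoted in this abstract follow the usual set-based definitions. As a minimal sketch of how such figures are computed, the following compares a set of extracted terms against a benchmark set; the term lists and the function name are hypothetical illustrations, not data or code from the study.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) of a retrieved set against a benchmark set."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Precision: share of retrieved terms that appear in the benchmark.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: share of benchmark terms that were retrieved.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: terms proposed by a text-mining run vs. a curated benchmark.
benchmark = {"HbA1c", "leptin", "adiponectin", "insulin", "ghrelin"}
extracted = {"HbA1c", "leptin", "glucose meter", "insulin"}

p, r = precision_recall(extracted, benchmark)
print(f"precision={p:.2f} recall={r:.2f}")
```

Because the abstract reports results against "established, but different" benchmarks, both numbers would be expected to change with the benchmark set that is passed in.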
Accelerating root system phenotyping of seedlings through a computer-assisted processing pipeline.
Dupuy, Lionel X; Wright, Gladys; Thompson, Jacqueline A; Taylor, Anna; Dekeyser, Sebastien; White, Christopher P; Thomas, William T B; Nightingale, Mark; Hammond, John P; Graham, Neil S; Thomas, Catherine L; Broadley, Martin R; White, Philip J
2017-01-01
There are numerous systems and techniques to measure the growth of plant roots. However, phenotyping large numbers of plant roots for breeding and genetic analyses remains challenging. One major difficulty is to achieve high throughput and resolution at a reasonable cost per plant sample. Here we describe a cost-effective root phenotyping pipeline, on which we perform time and accuracy benchmarking to identify bottlenecks in such pipelines and strategies for their acceleration. Our root phenotyping pipeline was assembled with custom software and low-cost material and equipment. Results show that sample preparation and handling of samples during screening are the most time-consuming tasks in root phenotyping. Algorithms can be used to speed up the extraction of root traits from image data, but when applied to large numbers of images, there is a trade-off between the time spent processing the data and the errors contained in the database. Scaling up root phenotyping to large numbers of genotypes will require not only automation of sample preparation and sample handling, but also efficient algorithms for error detection for more reliable replacement of manual interventions.
Jessen, Leon Eyrich; Hoof, Ilka; Lund, Ole; Nielsen, Morten
2013-07-01
Identifying which mutation(s) within a given genotype is responsible for an observable phenotype is important in many aspects of molecular biology. Here, we present SigniSite, an online application for subgroup-free residue-level genotype-phenotype correlation. In contrast to similar methods, SigniSite does not require any pre-definition of subgroups or binary classification. Input is a set of protein sequences where each sequence has an associated real number, quantifying a given phenotype. SigniSite will then identify which amino acid residues are significantly associated with the data set phenotype. As output, SigniSite displays a sequence logo, depicting the strength of the phenotype association of each residue and a heat-map identifying 'hot' or 'cold' regions. SigniSite was benchmarked against SPEER, a state-of-the-art method for the prediction of specificity determining positions (SDP) using a set of human immunodeficiency virus protease-inhibitor genotype-phenotype data and corresponding resistance mutation scores from the Stanford University HIV Drug Resistance Database, and a data set of protein families with experimentally annotated SDPs. For both data sets, SigniSite was found to outperform SPEER. SigniSite is available at: http://www.cbs.dtu.dk/services/SigniSite/.
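The input format described here (protein sequences, each paired with a real-valued phenotype) can be illustrated with a deliberately simple stand-in statistic. The sketch below assumes aligned, equal-length sequences and reports, for each position and residue, the difference between the mean phenotype of sequences carrying that residue and the mean of those that do not. It is not SigniSite's actual statistic, which the abstract does not specify, and it performs no significance testing; all names and data are hypothetical.

```python
from statistics import mean


def residue_effects(seqs, values):
    """Map (position, residue) -> mean phenotype with residue minus mean without.

    Assumes all sequences are aligned and of equal length.
    Positions where every sequence has the same residue are skipped.
    """
    effects = {}
    for pos in range(len(seqs[0])):
        for res in {s[pos] for s in seqs}:
            with_res = [v for s, v in zip(seqs, values) if s[pos] == res]
            without = [v for s, v in zip(seqs, values) if s[pos] != res]
            if with_res and without:
                effects[(pos, res)] = mean(with_res) - mean(without)
    return effects


# Hypothetical toy data: four aligned tripeptides with a resistance-like score.
seqs = ["ACD", "ACE", "GCD", "GCE"]
values = [1.0, 1.2, 5.0, 5.2]

effects = residue_effects(seqs, values)
for (pos, res), diff in sorted(effects.items(), key=lambda kv: -abs(kv[1])):
    print(pos, res, round(diff, 2))
```

In this toy data the largest differences fall on position 0, which is the kind of residue-level signal a subgroup-free method is meant to surface without first splitting the sequences into classes.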
An automated field phenotyping pipeline for application in grapevine research.
Kicherer, Anna; Herzog, Katja; Pflanz, Michael; Wieland, Markus; Rüger, Philipp; Kecke, Steffen; Kuhlmann, Heiner; Töpfer, Reinhard
2015-02-26
Due to its perennial nature and size, the acquisition of phenotypic data in grapevine research is almost exclusively restricted to the field and done by visual estimation. This kind of evaluation procedure is limited by time, cost and the subjectivity of records. As a consequence, objectivity, automation and more precision of phenotypic data evaluation are needed to increase the number of samples, manage grapevine repositories, enable genetic research of new phenotypic traits and, therefore, increase the efficiency in plant research. In the present study, an automated field phenotyping pipeline was set up and applied in a plot of genetic resources. The application of the PHENObot allows image acquisition from at least 250 individual grapevines per hour directly in the field without user interaction. Data management is handled by a database (IMAGEdata). The automatic image analysis tool BIVcolor (Berries in Vineyards-color) permitted the collection of precise phenotypic data of two important fruit traits, berry size and color, within a large set of plants. The application of the PHENObot represents an automated tool for high-throughput sampling of image data in the field. The automated analysis of these images facilitates the generation of objective and precise phenotypic data on a larger scale.
Systematic Association of Genes to Phenotypes by Genome and Literature Mining
Jensen, Lars J; Perez-Iratxeta, Carolina; Kaczanowski, Szymon; Hooper, Sean D; Andrade, Miguel A
2005-01-01
One of the major challenges of functional genomics is to unravel the connection between genotype and phenotype. So far no global analysis has attempted to explore those connections in the light of the large phenotypic variability seen in nature. Here, we use an unsupervised, systematic approach for associating genes and phenotypic characteristics that combines literature mining with comparative genome analysis. We first mine the MEDLINE literature database for terms that reflect phenotypic similarities of species. Subsequently we predict the likely genomic determinants: genes specifically present in the respective genomes. In a global analysis involving 92 prokaryotic genomes we retrieve 323 clusters containing a total of 2,700 significant gene–phenotype associations. Some clusters contain mostly known relationships, such as genes involved in motility or plant degradation, often with additional hypothetical proteins associated with those phenotypes. Other clusters comprise unexpected associations; for example, a group of terms related to food and spoilage is linked to genes predicted to be involved in bacterial food poisoning. Among the clusters, we observe an enrichment of pathogenicity-related associations, suggesting that the approach reveals many novel genes likely to play a role in infectious diseases. PMID:15799710
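One common way to score whether a gene is "specifically present" in genomes sharing a phenotype is a hypergeometric tail probability. The sketch below is an assumption about how such an association could be scored, not the statistic used by the authors; only the total of 92 genomes comes from the abstract, and the other counts are hypothetical.

```python
from math import comb


def hypergeom_tail(k, N, K, n):
    """P(X >= k): chance that at least k of the n genomes carrying a gene
    fall among the K phenotype-positive genomes, out of N genomes in total."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


N = 92   # genomes analysed (from the abstract)
K = 10   # hypothetical: genomes whose literature mentions the phenotype term
n = 12   # hypothetical: genomes that contain the gene
k = 9    # hypothetical: genomes that have both the gene and the phenotype

p_value = hypergeom_tail(k, N, K, n)
print(f"p = {p_value:.2e}")
```

A small tail probability indicates that the gene's presence pattern overlaps the phenotype more than chance would suggest; in a genome-wide scan such values would still need correction for multiple testing.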
An Automated Field Phenotyping Pipeline for Application in Grapevine Research
Kicherer, Anna; Herzog, Katja; Pflanz, Michael; Wieland, Markus; Rüger, Philipp; Kecke, Steffen; Kuhlmann, Heiner; Töpfer, Reinhard
2015-01-01
Due to its perennial nature and size, the acquisition of phenotypic data in grapevine research is almost exclusively restricted to the field and done by visual estimation. This kind of evaluation procedure is limited by time, cost and the subjectivity of records. As a consequence, objectivity, automation and more precision of phenotypic data evaluation are needed to increase the number of samples, manage grapevine repositories, enable genetic research of new phenotypic traits and, therefore, increase the efficiency in plant research. In the present study, an automated field phenotyping pipeline was set up and applied in a plot of genetic resources. The application of the PHENObot allows image acquisition from at least 250 individual grapevines per hour directly in the field without user interaction. Data management is handled by a database (IMAGEdata). The automatic image analysis tool BIVcolor (Berries in Vineyards-color) permitted the collection of precise phenotypic data of two important fruit traits, berry size and color, within a large set of plants. The application of the PHENObot represents an automated tool for high-throughput sampling of image data in the field. The automated analysis of these images facilitates the generation of objective and precise phenotypic data on a larger scale. PMID:25730485
Qiu, Jingya; Moore, Jason H; Darabos, Christian
2016-05-01
Genome-wide association studies (GWAS) have led to the discovery of over 200 single nucleotide polymorphisms (SNPs) associated with type 2 diabetes mellitus (T2DM). Additionally, East Asians develop T2DM at a higher rate, younger age, and lower body mass index than their European ancestry counterparts. The reason behind this occurrence remains elusive. With comprehensive searches through the National Human Genome Research Institute (NHGRI) GWAS catalog literature, we compiled a database of 2,800 ancestry-specific SNPs associated with T2DM and 70 other related traits. Manual data extraction was necessary because the GWAS catalog reports statistics such as odds ratio and P-value, but does not consistently include ancestry information. Currently, many statistics are derived by combining initial and replication samples from study populations of mixed ancestry. Analysis of all-inclusive data can be misleading, as not all SNPs are transferable across diverse populations. We used ancestry data to construct ancestry-specific human phenotype networks (HPN) centered on T2DM. Quantitative and visual analysis of network models reveals the genetic disparities between ancestry groups. Of the 27 phenotypes in the East Asian HPN, six phenotypes were unique to the network, revealing the underlying ancestry-specific nature of some SNPs associated with T2DM. We studied the relationship between T2DM and five phenotypes unique to the East Asian HPN to generate new interaction hypotheses in a clinical context. The genetic differences found in our ancestry-specific HPNs suggest different pathways are involved in the pathogenesis of T2DM among different populations. Our study underlines the importance of ancestry in the development of T2DM and its implications in pharmacogenetics and personalized medicine.
An integrative, translational approach to understanding rare and orphan genetically based diseases
Hoehndorf, Robert; Schofield, Paul N.; Gkoutos, Georgios V.
2013-01-01
PhenomeNet is an approach for integrating phenotypes across species and identifying candidate genes for genetic diseases based on the similarity between a disease and animal model phenotypes. In contrast to ‘guilt-by-association’ approaches, PhenomeNet relies exclusively on the comparison of phenotypes to suggest candidate genes, and can, therefore, be applied to study the molecular basis of rare and orphan diseases for which the molecular basis is unknown. In addition to disease phenotypes from the Online Mendelian Inheritance in Man (OMIM) database, we have now integrated the clinical signs from Orphanet into PhenomeNet. We demonstrate that our approach can efficiently identify known candidate genes for genetic diseases in Orphanet and OMIM. Furthermore, we find evidence that mutations in the HIP1 gene might cause Bassoe syndrome, a rare disorder with unknown genetic aetiology. Our results demonstrate that integration and computational analysis of human disease and animal model phenotypes using PhenomeNet has the potential to reveal novel insights into the pathobiology underlying genetic diseases. PMID:23853703
Mimvec: a deep learning approach for analyzing the human phenome.
Gan, Mingxin; Li, Wenran; Zeng, Wanwen; Wang, Xiaojian; Jiang, Rui
2017-09-21
The human phenome has been widely used with a variety of genomic data sources in the inference of disease genes. However, most existing methods thus far derive phenotype similarity based on the analysis of biomedical databases by using the traditional term frequency-inverse document frequency (TF-IDF) formulation. This framework, though intuitive, not only ignores semantic relationships between words but also tends to produce high-dimensional vectors, and hence lacks the ability to precisely capture intrinsic semantic characteristics of biomedical documents. To overcome these limitations, we propose a framework called mimvec to analyze the human phenome by making use of the state-of-the-art deep learning technique in natural language processing. We converted 24,061 records in the Online Mendelian Inheritance in Man (OMIM) database to low-dimensional vectors using our method. We demonstrated that the vector representation not only effectively enabled classification of phenotype records against gene ones, but also succeeded in discriminating diseases of different inheritance styles and different mechanisms. We further derived pairwise phenotype similarities between 7988 human inherited diseases using their vector representations. With a joint analysis of this phenome with multiple genomic data, we showed that phenotype overlap indeed implied genotype overlap. We finally used the derived phenotype similarities with genomic data to prioritize candidate genes and demonstrated advantages of this method over existing ones. Our method is capable of not only capturing semantic relationships between words in biomedical records but also alleviating the dimensional disaster accompanying the traditional TF-IDF framework. With the approach of precision medicine, there will be abundant electronic medical and health records awaiting deep analysis, and we expect to see a wide spectrum of applications borrowing the idea of our method in the near future.
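The TF-IDF baseline that this abstract contrasts itself with can be sketched in a few lines. The example below is a minimal standard-library illustration of TF-IDF vectors and cosine similarity over hypothetical keyword-style phenotype records; it is not the mimvec method, and the records are invented for illustration only.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) from whitespace-tokenized text."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of records that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(n_docs / df[t])
                        for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical phenotype records reduced to keywords.
records = [
    "muscle weakness cardiomyopathy",
    "muscle weakness delayed motor development",
    "retinal degeneration night blindness",
]
vecs = tfidf_vectors(records)
print(round(cosine(vecs[0], vecs[1]), 3))  # shared terms give a positive score
print(cosine(vecs[0], vecs[2]))            # no shared terms give a score of zero
```

The sketch also makes the two limitations named in the abstract visible: records that describe related phenotypes with different words score zero, and each vector has one dimension per vocabulary term, so dimensionality grows with the corpus.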
Krassowski, Michal; Paczkowska, Marta; Cullion, Kim; Huang, Tina; Dzneladze, Irakli; Ouellette, B F Francis; Yamada, Joseph T; Fradet-Turcotte, Amelie
2018-01-01
Interpretation of genetic variation is needed for deciphering genotype-phenotype associations, mechanisms of inherited disease, and cancer driver mutations. Millions of single nucleotide variants (SNVs) in human genomes are known and thousands are associated with disease. An estimated 21% of disease-associated amino acid substitutions corresponding to missense SNVs are located in protein sites of post-translational modifications (PTMs), chemical modifications of amino acids that extend protein function. ActiveDriverDB is a comprehensive human proteo-genomics database that annotates disease mutations and population variants through the lens of PTMs. We integrated >385,000 published PTM sites with ∼3.6 million substitutions from The Cancer Genome Atlas (TCGA), the ClinVar database of disease genes, and human genome sequencing projects. The database includes site-specific interaction networks of proteins, upstream enzymes such as kinases, and drugs targeting these enzymes. We also predicted network-rewiring impact of mutations by analyzing gains and losses of kinase-bound sequence motifs. ActiveDriverDB provides detailed visualization, filtering, browsing and searching options for studying PTM-associated mutations. Users can upload mutation datasets interactively and use our application programming interface in pipelines. Integrative analysis of mutations and PTMs may help decipher molecular mechanisms of phenotypes and disease, as exemplified by case studies of TP53, BRCA2 and VHL. The open-source database is available at https://www.ActiveDriverDB.org. PMID:29126202
2012-01-01
Background Roses (Rosa sp.), which belong to the family Rosaceae, are the most economically important ornamental plants, making up 30% of the floriculture market. However, despite high demand for roses, rose breeding programs are limited in the molecular resources that could greatly enhance and speed breeding efforts. A better understanding of important genes that contribute to important floral development and desired phenotypes will lead to improved rose cultivars. For this study, we analyzed rose miRNAs and the rose flower transcriptome in order to generate a database to expound upon current knowledge regarding regulation of important floral characteristics. A rose genetic database will enable comprehensive analysis of gene expression and regulation via miRNA among different Rosa cultivars. Results We produced more than 0.5 million reads from expressed sequences, totalling more than 110 million bp. From these, we generated 35,657, 31,434, 34,725, and 39,722 flower unigenes from Rosa hybrid: ‘Vital’, ‘Maroussia’, and ‘Sympathy’ and Rosa rugosa Thunb., respectively. The unigenes were assigned functional annotations, domains, metabolic pathways, Gene Ontology (GO) terms, Plant Ontology (PO) terms, and MIPS Functional Catalogue (FunCat) terms. Rose flower transcripts were compared with genes from whole genome sequences of Rosaceae members (apple, strawberry, and peach) and grape. We also produced approximately 40 million small RNA reads from flower tissue for Rosa, representing 267 unique miRNA tags. Among identified miRNAs, 25 of them were novel and 242 of them were conserved miRNAs. Statistical analyses of miRNA profiles revealed both shared and species-specific miRNAs, which presumably affect flower development and phenotypes. Conclusions In this study, we constructed a Rose miRNA and transcriptome database, and we analyzed the miRNAs and transcriptome generated from the flower tissues of four Rosa cultivars. The database provides a comprehensive genetic resource which can be used to better understand rose flower development and to identify candidate genes for important phenotypes. PMID:23171001
Kim, Jungeun; Park, June Hyun; Lim, Chan Ju; Lim, Jae Yun; Ryu, Jee-Youn; Lee, Bong-Woo; Choi, Jae-Pil; Kim, Woong Bom; Lee, Ha Yeon; Choi, Yourim; Kim, Donghyun; Hur, Cheol-Goo; Kim, Sukweon; Noh, Yoo-Sun; Shin, Chanseok; Kwon, Suk-Yoon
2012-11-21
Roses (Rosa sp.), which belong to the family Rosaceae, are the most economically important ornamental plants, making up 30% of the floriculture market. However, despite high demand for roses, rose breeding programs are limited in the molecular resources that could greatly enhance and speed breeding efforts. A better understanding of important genes that contribute to important floral development and desired phenotypes will lead to improved rose cultivars. For this study, we analyzed rose miRNAs and the rose flower transcriptome in order to generate a database to expound upon current knowledge regarding regulation of important floral characteristics. A rose genetic database will enable comprehensive analysis of gene expression and regulation via miRNA among different Rosa cultivars. We produced more than 0.5 million reads from expressed sequences, totalling more than 110 million bp. From these, we generated 35,657, 31,434, 34,725, and 39,722 flower unigenes from Rosa hybrid: 'Vital', 'Maroussia', and 'Sympathy' and Rosa rugosa Thunb., respectively. The unigenes were assigned functional annotations, domains, metabolic pathways, Gene Ontology (GO) terms, Plant Ontology (PO) terms, and MIPS Functional Catalogue (FunCat) terms. Rose flower transcripts were compared with genes from whole genome sequences of Rosaceae members (apple, strawberry, and peach) and grape. We also produced approximately 40 million small RNA reads from flower tissue for Rosa, representing 267 unique miRNA tags. Among identified miRNAs, 25 of them were novel and 242 of them were conserved miRNAs. Statistical analyses of miRNA profiles revealed both shared and species-specific miRNAs, which presumably affect flower development and phenotypes. In this study, we constructed a Rose miRNA and transcriptome database, and we analyzed the miRNAs and transcriptome generated from the flower tissues of four Rosa cultivars. The database provides a comprehensive genetic resource which can be used to better understand rose flower development and to identify candidate genes for important phenotypes.
The Matchmaker Exchange: a platform for rare disease gene discovery.
Philippakis, Anthony A; Azzariti, Danielle R; Beltran, Sergi; Brookes, Anthony J; Brownstein, Catherine A; Brudno, Michael; Brunner, Han G; Buske, Orion J; Carey, Knox; Doll, Cassie; Dumitriu, Sergiu; Dyke, Stephanie O M; den Dunnen, Johan T; Firth, Helen V; Gibbs, Richard A; Girdea, Marta; Gonzalez, Michael; Haendel, Melissa A; Hamosh, Ada; Holm, Ingrid A; Huang, Lijia; Hurles, Matthew E; Hutton, Ben; Krier, Joel B; Misyura, Andriy; Mungall, Christopher J; Paschall, Justin; Paten, Benedict; Robinson, Peter N; Schiettecatte, François; Sobreira, Nara L; Swaminathan, Ganesh J; Taschner, Peter E; Terry, Sharon F; Washington, Nicole L; Züchner, Stephan; Boycott, Kym M; Rehm, Heidi L
2015-10-01
There are few better examples of the need for data sharing than in the rare disease community, where patients, physicians, and researchers must search for "the needle in a haystack" to uncover rare, novel causes of disease within the genome. Impeding the pace of discovery has been the existence of many small siloed datasets within individual research or clinical laboratory databases and/or disease-specific organizations, hoping for serendipitous occasions when two distant investigators happen to learn they have a rare phenotype in common and can "match" these cases to build evidence for causality. However, serendipity has never proven to be a reliable or scalable approach in science. As such, the Matchmaker Exchange (MME) was launched to provide a robust and systematic approach to rare disease gene discovery through the creation of a federated network connecting databases of genotypes and rare phenotypes using a common application programming interface (API). The core building blocks of the MME have been defined and assembled. Three MME services have now been connected through the API and are available for community use. Additional databases that support internal matching are anticipated to join the MME network as it continues to grow.
The Matchmaker Exchange: A Platform for Rare Disease Gene Discovery
Philippakis, Anthony A.; Azzariti, Danielle R.; Beltran, Sergi; Brookes, Anthony J.; Brownstein, Catherine A.; Brudno, Michael; Brunner, Han G.; Buske, Orion J.; Carey, Knox; Doll, Cassie; Dumitriu, Sergiu; Dyke, Stephanie O.M.; den Dunnen, Johan T.; Firth, Helen V.; Gibbs, Richard A.; Girdea, Marta; Gonzalez, Michael; Haendel, Melissa A.; Hamosh, Ada; Holm, Ingrid A.; Huang, Lijia; Hurles, Matthew E.; Hutton, Ben; Krier, Joel B.; Misyura, Andriy; Mungall, Christopher J.; Paschall, Justin; Paten, Benedict; Robinson, Peter N.; Schiettecatte, François; Sobreira, Nara L.; Swaminathan, Ganesh J.; Taschner, Peter E.; Terry, Sharon F.; Washington, Nicole L.; Züchner, Stephan; Boycott, Kym M.; Rehm, Heidi L.
2015-01-01
There are few better examples of the need for data sharing than in the rare disease community, where patients, physicians, and researchers must search for “the needle in a haystack” to uncover rare, novel causes of disease within the genome. Impeding the pace of discovery has been the existence of many small siloed datasets within individual research or clinical laboratory databases and/or disease-specific organizations, hoping for serendipitous occasions when two distant investigators happen to learn they have a rare phenotype in common and can “match” these cases to build evidence for causality. However, serendipity has never proven to be a reliable or scalable approach in science. As such, the Matchmaker Exchange (MME) was launched to provide a robust and systematic approach to rare disease gene discovery through the creation of a federated network connecting databases of genotypes and rare phenotypes using a common application programming interface (API). The core building blocks of the MME have been defined and assembled. Three MME services have now been connected through the API and are available for community use. Additional databases that support internal matching are anticipated to join the MME network as it continues to grow. PMID:26295439
The Matchmaker Exchange: A Platform for Rare Disease Gene Discovery
Philippakis, Anthony A.; Azzariti, Danielle R.; Beltran, Sergi; ...
2015-09-17
There are few better examples of the need for data sharing than in the rare disease community, where patients, physicians, and researchers must search for "the needle in a haystack" to uncover rare, novel causes of disease within the genome. Impeding the pace of discovery has been the existence of many small siloed datasets within individual research or clinical laboratory databases and/or disease-specific organizations, hoping for serendipitous occasions when two distant investigators happen to learn they have a rare phenotype in common and can "match" these cases to build evidence for causality. However, serendipity has never proven to be a reliable or scalable approach in science. As such, the Matchmaker Exchange (MME) was launched to provide a robust and systematic approach to rare disease gene discovery through the creation of a federated network connecting databases of genotypes and rare phenotypes using a common application programming interface (API). The core building blocks of the MME have been defined and assembled. In conclusion, three MME services have now been connected through the API and are available for community use. Additional databases that support internal matching are anticipated to join the MME network as it continues to grow.
The Matchmaker Exchange: A Platform for Rare Disease Gene Discovery
DOE Office of Scientific and Technical Information (OSTI.GOV)
Philippakis, Anthony A.; Azzariti, Danielle R.; Beltran, Sergi
There are few better examples of the need for data sharing than in the rare disease community, where patients, physicians, and researchers must search for "the needle in a haystack" to uncover rare, novel causes of disease within the genome. Impeding the pace of discovery has been the existence of many small siloed datasets within individual research or clinical laboratory databases and/or disease-specific organizations, hoping for serendipitous occasions when two distant investigators happen to learn they have a rare phenotype in common and can "match" these cases to build evidence for causality. However, serendipity has never proven to be a reliable or scalable approach in science. As such, the Matchmaker Exchange (MME) was launched to provide a robust and systematic approach to rare disease gene discovery through the creation of a federated network connecting databases of genotypes and rare phenotypes using a common application programming interface (API). The core building blocks of the MME have been defined and assembled. In conclusion, three MME services have now been connected through the API and are available for community use. Additional databases that support internal matching are anticipated to join the MME network as it continues to grow.
A genome-scale metabolic flux model of Escherichia coli K–12 derived from the EcoCyc database
2014-01-01
Background Constraint-based models of Escherichia coli metabolic flux have played a key role in computational studies of cellular metabolism at the genome scale. We sought to develop a next-generation constraint-based E. coli model that achieved improved phenotypic prediction accuracy while being frequently updated and easy to use. We also sought to compare model predictions with experimental data to highlight open questions in E. coli biology. Results We present EcoCyc–18.0–GEM, a genome-scale model of the E. coli K–12 MG1655 metabolic network. The model is automatically generated from the current state of EcoCyc using the MetaFlux software, enabling the release of multiple model updates per year. EcoCyc–18.0–GEM encompasses 1445 genes, 2286 unique metabolic reactions, and 1453 unique metabolites. We demonstrate a three-part validation of the model that breaks new ground in breadth and accuracy: (i) Comparison of simulated growth in aerobic and anaerobic glucose culture with experimental results from chemostat culture and simulation results from the E. coli modeling literature. (ii) Essentiality prediction for the 1445 genes represented in the model, in which EcoCyc–18.0–GEM achieves an improved accuracy of 95.2% in predicting the growth phenotype of experimental gene knockouts. (iii) Nutrient utilization predictions under 431 different media conditions, for which the model achieves an overall accuracy of 80.7%. The model’s derivation from EcoCyc enables query and visualization via the EcoCyc website, facilitating model reuse and validation by inspection. We present an extensive investigation of disagreements between EcoCyc–18.0–GEM predictions and experimental data to highlight areas of interest to E. coli modelers and experimentalists, including 70 incorrect predictions of gene essentiality on glucose, 80 incorrect predictions of gene essentiality on glycerol, and 83 incorrect predictions of nutrient utilization. 
Conclusion Significant advantages can be derived from the combination of model organism databases and flux balance modeling represented by MetaFlux. Interpretation of the EcoCyc database as a flux balance model results in a highly accurate metabolic model and provides a rigorous consistency check for information stored in the database. PMID:24974895
The NKI-Rockland Sample: A Model for Accelerating the Pace of Discovery Science in Psychiatry
Nooner, Kate Brody; Colcombe, Stanley J.; Tobe, Russell H.; Mennes, Maarten; Benedict, Melissa M.; Moreno, Alexis L.; Panek, Laura J.; Brown, Shaquanna; Zavitz, Stephen T.; Li, Qingyang; Sikka, Sharad; Gutman, David; Bangaru, Saroja; Schlachter, Rochelle Tziona; Kamiel, Stephanie M.; Anwar, Ayesha R.; Hinz, Caitlin M.; Kaplan, Michelle S.; Rachlin, Anna B.; Adelsberg, Samantha; Cheung, Brian; Khanuja, Ranjit; Yan, Chaogan; Craddock, Cameron C.; Calhoun, Vincent; Courtney, William; King, Margaret; Wood, Dylan; Cox, Christine L.; Kelly, A. M. Clare; Di Martino, Adriana; Petkova, Eva; Reiss, Philip T.; Duan, Nancy; Thomsen, Dawn; Biswal, Bharat; Coffey, Barbara; Hoptman, Matthew J.; Javitt, Daniel C.; Pomara, Nunzio; Sidtis, John J.; Koplewicz, Harold S.; Castellanos, Francisco Xavier; Leventhal, Bennett L.; Milham, Michael P.
2012-01-01
The National Institute of Mental Health strategic plan for advancing psychiatric neuroscience calls for an acceleration of discovery and the delineation of developmental trajectories for risk and resilience across the lifespan. To attain these objectives, sufficiently powered datasets with broad and deep phenotypic characterization, state-of-the-art neuroimaging, and genetic samples must be generated and made openly available to the scientific community. The enhanced Nathan Kline Institute-Rockland Sample (NKI-RS) is a response to this need. NKI-RS is an ongoing, institutionally centered endeavor aimed at creating a large-scale (N > 1000), deeply phenotyped, community-ascertained, lifespan sample (ages 6–85 years old) with advanced neuroimaging and genetics. These data will be shared publicly, openly, and prospectively (i.e., on a weekly basis). Herein, we describe the conceptual basis of the NKI-RS, including study design, sampling considerations, and steps to synchronize phenotypic and neuroimaging assessment. Additionally, we describe our process for sharing the data with the scientific community while protecting participant confidentiality, maintaining an adequate database, and certifying data integrity. The pilot phase of the NKI-RS, including challenges in recruiting, characterizing, imaging, and sharing data, is discussed while also explaining how this experience informed the final design of the enhanced NKI-RS. It is our hope that familiarity with the conceptual underpinnings of the enhanced NKI-RS will facilitate harmonization with future data collection efforts aimed at advancing psychiatric neuroscience and nosology. PMID:23087608
Costanzo, Maria C.; Crawford, Matthew E.; Hirschman, Jodi E.; Kranz, Janice E.; Olsen, Philip; Robertson, Laura S.; Skrzypek, Marek S.; Braun, Burkhard R.; Hopkins, Kelley Lennon; Kondu, Pinar; Lengieza, Carey; Lew-Smith, Jodi E.; Tillberg, Michael; Garrels, James I.
2001-01-01
The BioKnowledge Library is a relational database and web site (http://www.proteome.com) composed of protein-specific information collected from the scientific literature. Each Protein Report on the web site summarizes and displays published information about a single protein, including its biochemical function, role in the cell and in the whole organism, localization, mutant phenotype and genetic interactions, regulation, domains and motifs, interactions with other proteins and other relevant data. This report describes four species-specific volumes of the BioKnowledge Library, concerned with the model organisms Saccharomyces cerevisiae (YPD), Schizosaccharomyces pombe (PombePD) and Caenorhabditis elegans (WormPD), and with the fungal pathogen Candida albicans (CalPD™). Protein Reports of each species are unified in format, easily searchable and extensively cross-referenced between species. The relevance of these comprehensively curated resources to analysis of proteins in other species is discussed, and is illustrated by a survey of model organism proteins that have similarity to human proteins involved in disease. PMID:11125054
Buxbaum, Joseph D; Bolshakova, Nadia; Brownfeld, Jessica M; Anney, Richard Jl; Bender, Patrick; Bernier, Raphael; Cook, Edwin H; Coon, Hilary; Cuccaro, Michael; Freitag, Christine M; Hallmayer, Joachim; Geschwind, Daniel; Klauck, Sabine M; Nurnberger, John I; Oliveira, Guiomar; Pinto, Dalila; Poustka, Fritz; Scherer, Stephen W; Shih, Andy; Sutcliffe, James S; Szatmari, Peter; Vicente, Astrid M; Vieland, Veronica; Gallagher, Louise
2014-01-01
There is an urgent need for expanding and enhancing autism spectrum disorder (ASD) samples, in order to better understand causes of ASD. In a unique public-private partnership, 13 sites with extensive experience in both the assessment and diagnosis of ASD embarked on an ambitious, 2-year program to collect samples for genetic and phenotypic research and begin analyses on these samples. The program was called The Autism Simplex Collection (TASC). TASC sample collection began in 2008 and was completed in 2010, and included nine sites from North America and four sites from Western Europe, as well as a centralized Data Coordinating Center. Over 1,700 trios are part of this collection, with DNA from transformed cells now available through the National Institute of Mental Health (NIMH). Autism Diagnostic Interview-Revised (ADI-R) and Autism Diagnostic Observation Schedule-Generic (ADOS-G) measures are available for all probands, as are standardized IQ measures, Vineland Adaptive Behavioral Scales (VABS), the Social Responsiveness Scale (SRS), Peabody Picture Vocabulary Test (PPVT), and physical measures (height, weight, and head circumference). At almost every site, additional phenotypic measures were collected, including the Broad Autism Phenotype Questionnaire (BAPQ) and Repetitive Behavior Scale-Revised (RBS-R), as well as the non-word repetition scale, Communication Checklist (Children's or Adult), and Aberrant Behavior Checklist (ABC). Moreover, for nearly 1,000 trios, the Autism Genome Project Consortium (AGP) has carried out Illumina 1 M SNP genotyping and called copy number variation (CNV) in the samples, with data being made available through the National Institutes of Health (NIH). Whole exome sequencing (WES) has been carried out in over 500 probands, together with ancestry matched controls, and this data is also available through the NIH. 
Additional WES is being carried out by the Autism Sequencing Consortium (ASC), where the focus is on sequencing complete trios. ASC sequencing for the first 1,000 samples (all from whole-blood DNA) is complete and data will be released in 2014. Data is being made available through NIH databases (database of Genotypes and Phenotypes (dbGaP) and National Database for Autism Research (NDAR)) with DNA released in Dist 11.0. Primary funding for the collection, genotyping, sequencing and distribution of TASC samples was provided by Autism Speaks and the NIH, including the National Institute of Mental Health (NIMH) and the National Human Genome Research Institute (NHGRI). TASC represents an important sample set that leverages expert sites. Similar approaches, leveraging expert sites and ongoing studies, represent an important path towards further enhancing available ASD samples.
Kulaeva, Olga A; Zhernakov, Aleksandr I; Afonin, Alexey M; Boikov, Sergei S; Sulima, Anton S; Tikhonovich, Igor A; Zhukov, Vladimir A
2017-01-01
Pea (Pisum sativum L.) is the oldest model object of plant genetics and one of the most agriculturally important legumes in the world. Since the pea genome has not been sequenced yet, identification of genes responsible for mutant phenotypes or desirable agricultural traits is usually performed via genetic mapping followed by candidate gene search. Such mapping is best carried out using gene-based molecular markers, as it opens the possibility of exploiting genome synteny between pea and its close relative Medicago truncatula Gaertn., which has a sequenced and annotated genome. In the last 5 years, a large number of pea gene-based molecular markers have been designed and mapped owing to the rapid evolution of "next-generation sequencing" technologies. However, access to the complete set of markers designed worldwide is limited because the data are not standardized and are therefore hard to use. The Pea Marker Database was designed to combine the information about pea markers in the form of a user-friendly and practical online tool. Version 1 (PMD1) comprises information about 2484 genic markers, including their locations in linkage groups, the sequences of corresponding pea transcripts and the names of related genes in M. truncatula. Version 2 (PMD2) is an updated version comprising 15944 pea markers in the same format with several advanced features. To test the performance of the PMD, fine mapping of pea symbiotic genes Sym13 and Sym27 in linkage groups VII and V, respectively, was carried out. The results of mapping allowed us to propose the Sen1 gene (a homologue of SEN1 gene of Lotus japonicus (Regel) K. Larsen) as the best candidate gene for Sym13, and to narrow the list of possible candidate genes for Sym27 to ten, thus proving PMD to be useful for pea gene mapping and cloning. All information contained in PMD1 and PMD2 is available at www.peamarker.arriam.ru.
Text mining and expert curation to develop a database on psychiatric diseases and their genes
Gutiérrez-Sacristán, Alba; Bravo, Àlex; Portero-Tresserra, Marta; Valverde, Olga; Armario, Antonio; Blanco-Gandía, M.C.; Farré, Adriana; Fernández-Ibarrondo, Lierni; Fonseca, Francina; Giraldo, Jesús; Leis, Angela; Mané, Anna; Mayer, M.A.; Montagud-Romero, Sandra; Nadal, Roser; Ortiz, Jordi; Pavon, Francisco Javier; Perez, Ezequiel Jesús; Rodríguez-Arias, Marta; Serrano, Antonia; Torrens, Marta; Warnault, Vincent; Sanz, Ferran
2017-01-01
Psychiatric disorders constitute one of the main causes of disability worldwide. In recent years, considerable research has been conducted on the genetic architecture of such diseases, although little understanding of their etiology has been achieved. Difficulty in accessing up-to-date, relevant genotype-phenotype information has hampered the application of this wealth of knowledge to translational research and clinical practice in order to improve diagnosis and treatment of psychiatric patients. PsyGeNET (http://www.psygenet.org/) has been developed with the aim of supporting research on the genetic architecture of psychiatric diseases, by providing integrated and structured accessibility to their genotype–phenotype association data, together with analysis and visualization tools. In this article, we describe the protocol developed for the sustainable update of this knowledge resource. It includes the recruitment of a team of domain experts in order to perform the curation of the data extracted by text mining. Annotation guidelines and a web-based annotation tool were developed to support the curators’ tasks. A curation workflow was designed including a pilot phase and two rounds of curation and analysis phases. Negative evidence from the literature on gene–disease associations (GDAs) was taken into account in the curation process. We report the results of the application of this workflow to the curation of GDAs for PsyGeNET, including the analysis of the inter-annotator agreement and suggest this model as a suitable approach for the sustainable development and update of knowledge resources. Database URL: http://www.psygenet.org PsyGeNET corpus: http://www.psygenet.org/ds/PsyGeNET/results/psygenetCorpus.tar PMID:29220439
Investigation of mutations in the HBB gene using the 1,000 genomes database.
Carlice-Dos-Reis, Tânia; Viana, Jaime; Moreira, Fabiano Cordeiro; Cardoso, Greice de Lemos; Guerreiro, João; Santos, Sidney; Ribeiro-Dos-Santos, Ândrea
2017-01-01
Mutations in the HBB gene are responsible for several serious hemoglobinopathies, such as sickle cell anemia and β-thalassemia. Sickle cell anemia is one of the most common monogenic diseases worldwide. Due to its prevalence, diverse strategies have been developed for a better understanding of its molecular mechanisms. In silico analysis has been increasingly used to investigate the genotype-phenotype relationship of many diseases, and the sequences of healthy individuals deposited in the 1,000 Genomes database appear to be an excellent tool for such analysis. The objective of this study is to analyze the variations in the HBB gene in the 1,000 Genomes database, to describe the mutation frequencies in the different population groups, and to investigate the pattern of pathogenicity. The computational tool SNPEFF was used to align the data from 2,504 samples of the 1,000 Genomes database with the HG19 genome reference. The pathogenicity of each amino acid change was investigated using the databases CLINVAR, dbSNP and HbVar and five different predictors. Twenty different mutations were found in 209 healthy individuals. The African group had the highest number of individuals with mutations, and the European group had the lowest number. Thus, it is concluded that approximately 8.3% of phenotypically healthy individuals from the 1,000 Genomes database have some mutation in the HBB gene. The frequency of mutated genes was estimated at 0.042, so that the expected frequency of being homozygous or compound heterozygous for these variants in the next generation is approximately 0.002. In total, 193 subjects had a non-synonymous mutation, of whom 186 (7.4%) had a deleterious mutation. Considering that the 1,000 Genomes database is representative of the world's population, it can be estimated that fourteen out of every 10,000 individuals in the world will have a hemoglobinopathy in the next generation.
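The frequencies quoted in this abstract appear consistent with a simple Hardy-Weinberg calculation, and the sketch below reproduces them from the reported counts. It rests on an assumption that is ours rather than the authors': each carrier is treated as heterozygous for a single variant allele, so the allele frequency is carriers / (2 × samples). The function names are illustrative only.

```python
# Counts reported in the abstract.
N_SAMPLES = 2504        # individuals in the 1,000 Genomes data set
N_CARRIERS = 209        # individuals with any HBB mutation
N_DELETERIOUS = 186     # individuals with a deleterious mutation


def allele_frequency(carriers: int, samples: int) -> float:
    """Variant allele frequency, ASSUMING one variant allele per carrier."""
    return carriers / (2 * samples)


def expected_two_variant_alleles(q: float) -> float:
    """Hardy-Weinberg expectation of carrying two variant alleles (q squared)."""
    return q * q


carrier_fraction = N_CARRIERS / N_SAMPLES
q_all = allele_frequency(N_CARRIERS, N_SAMPLES)
q_del = allele_frequency(N_DELETERIOUS, N_SAMPLES)

print(f"carriers: {carrier_fraction:.1%}")
print(f"allele frequency (any mutation): {q_all:.3f}")
print(f"expected homozygous/compound heterozygous: {expected_two_variant_alleles(q_all):.3f}")
print(f"expected affected per 10,000 (deleterious only): "
      f"{expected_two_variant_alleles(q_del) * 10_000:.0f}")
```

Under that assumption the carrier fraction, the allele frequency of 0.042, the expected frequency of about 0.002, and the figure of roughly fourteen per 10,000 all follow from the counts given.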
Rusbridge, Clare; Knowler, Penny; Rouleau, Guy A; Minassian, Berge A; Rothuizen, Jan
2005-01-01
Inherited diseases commonly emerge within pedigree dog populations, often due to use of repeatedly bred carrier sire(s) within a small gene pool. Accurate family records are usually available, making linkage analysis possible. However, many factors make collecting DNA and collating pedigree information from a large canine population intrinsically difficult. The keys to a successful DNA collection program include (1) the need to establish and maintain support from the pedigree breed clubs and pet owners; (2) committed individual(s) who can devote the considerable amount of time and energy to coordinating sample collection and communicating with breeders and clubs; and (3) providing means by which genotypic and phenotypic information can be easily collected and stored. In this article we describe the clinical characteristics of inherited occipital hypoplasia/syringomyelia (Chiari type I malformation) in the cavalier King Charles spaniel and our experiences in establishing a pedigree and DNA database to study the disease.
Code of Federal Regulations, 2011 CFR
2011-01-01
... AVAILABLE CONSUMER PRODUCT SAFETY INFORMATION DATABASE (Eff. Jan. 10, 2011) Background and Definitions... Product Safety Information Database. (2) Commission or CPSC means the Consumer Product Safety Commission... Information Database, also referred to as the Database, means the database on the safety of consumer products...
Matsuda, Fumio; Nakabayashi, Ryo; Sawada, Yuji; Suzuki, Makoto; Hirai, Masami Y.; Kanaya, Shigehiko; Saito, Kazuki
2011-01-01
A novel framework for automated elucidation of metabolite structures in liquid chromatography–mass spectrometry metabolome data was constructed by integrating databases. High-resolution tandem mass spectra data automatically acquired from each metabolite signal were used for database searches. Three distinct databases, KNApSAcK, ReSpect, and the PRIMe standard compound database, were employed for the structural elucidation. The outputs were retrieved using the CAS metabolite identifier for identification and putative annotation. A simple metabolite ontology system was also introduced to attain putative characterization of the metabolite signals. The automated method was applied to the metabolome data sets obtained from the rosette leaves of 20 Arabidopsis accessions. Phenotypic variations in novel Arabidopsis metabolites among these accessions could be investigated using this method. PMID:22645535
Patterns of developmental plasticity in response to incubation temperature in reptiles.
While, Geoffrey M; Noble, Daniel W A; Uller, Tobias; Warner, Daniel A; Riley, Julia L; Du, Wei-Guo; Schwanz, Lisa E
2018-05-28
Early life environments shape phenotypic development in important ways that can lead to long-lasting effects on phenotype and fitness. In reptiles, one aspect of the early environment that impacts development is temperature (termed 'thermal developmental plasticity'). Indeed, the thermal environment during incubation is known to influence morphological, physiological, and behavioral traits, some of which have important consequences for many ecological and evolutionary processes. Despite this, few studies have attempted to synthesize and collate data from this expansive and important body of research. Here, we systematically review research into thermal developmental plasticity across reptiles, structured around the key papers and findings that have shaped the field over the past 50 years. From these papers, we introduce a large database (the 'Reptile Development Database') consisting of 9,773 trait means across 300 studies examining thermal developmental plasticity. This dataset encompasses data on a range of phenotypes, including morphological, physiological, behavioral, and performance traits along with growth rate, incubation duration, sex ratio, and survival (e.g., hatching success) across all major reptile clades. Finally, from our literature synthesis and data exploration, we identify key research themes associated with thermal developmental plasticity, important gaps in empirical research, and demonstrate how future progress can be made through targeted empirical, meta-analytic, and comparative work. © 2018 Wiley Periodicals, Inc.
Genic insights from integrated human proteomics in GeneCards.
Fishilevich, Simon; Zimmerman, Shahar; Kohn, Asher; Iny Stein, Tsippi; Olender, Tsviya; Kolker, Eugene; Safran, Marilyn; Lancet, Doron
2016-01-01
GeneCards is a one-stop shop for searchable human gene annotations (http://www.genecards.org/). Data are automatically mined from ∼120 sources and presented in an integrated web card for every human gene. We report the application of recent advances in proteomics to enhance gene annotation and classification in GeneCards. First, we constructed the Human Integrated Protein Expression Database (HIPED), a unified database of protein abundance in human tissues, based on the publicly available mass spectrometry (MS)-based proteomics sources ProteomicsDB, Multi-Omics Profiling Expression Database, Protein Abundance Across Organisms and The MaxQuant DataBase. The integrated database, residing within GeneCards, compares favourably with its individual sources, covering nearly 90% of human protein-coding genes. For gene annotation and comparisons, we first defined a protein expression vector for each gene, based on normalized abundances in 69 normal human tissues. This vector is portrayed in the GeneCards expression section as a bar graph, allowing visual inspection and comparison. These data are juxtaposed with transcriptome bar graphs. Using the protein expression vectors, we further defined a pairwise metric that helps assess expression-based pairwise proximity. This new metric for finding functional partners complements eight others, including sharing of pathways, gene ontology (GO) terms and domains, implemented in the GeneCards Suite. In parallel, we calculated proteome-based differential expression, highlighting a subset of tissues that overexpress a gene and subserving gene classification. This textual annotation allows users of VarElect, the suite's next-generation phenotyper, to more effectively discover causative disease variants. Finally, we define the protein-RNA expression ratio and correlation as yet another attribute of every gene in each tissue, adding further annotative information.
The results constitute a significant enhancement of several GeneCards sections and help promote and organize the genome-wide structural and functional knowledge of the human proteome. Database URL:http://www.genecards.org/. © The Author(s) 2016. Published by Oxford University Press.
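The abstract above mentions a pairwise metric computed from per-gene protein expression vectors but does not give its formula. As a rough illustration of the general idea only, the sketch below compares two expression vectors with Pearson correlation; the choice of Pearson correlation, the function name, and the four-tissue toy vectors are all assumptions made here and should not be read as the GeneCards method.

```python
import math
from typing import Sequence


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation between two equal-length expression vectors."""
    if len(u) != len(v) or len(u) < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_u * var_v)


# Hypothetical normalized abundances across four tissues (toy data,
# standing in for the 69-tissue vectors described in the abstract).
gene_a = [0.1, 0.4, 0.9, 0.2]
gene_b = [0.2, 0.5, 1.0, 0.3]   # same profile as gene_a, shifted by a constant
gene_c = [0.9, 0.4, 0.1, 0.8]

print(pearson(gene_a, gene_b))  # close to 1: identical tissue profile shape
print(pearson(gene_a, gene_c))  # negative: opposite tissue preference
```

A correlation-style measure is insensitive to overall abundance, which is one reason it is a common first choice for comparing tissue profiles; a distance-based measure would behave differently.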
Internet-accessible DNA sequence database for identifying fusaria from human and animal infections.
O'Donnell, Kerry; Sutton, Deanna A; Rinaldi, Michael G; Sarver, Brice A J; Balajee, S Arunmozhi; Schroers, Hans-Josef; Summerbell, Richard C; Robert, Vincent A R G; Crous, Pedro W; Zhang, Ning; Aoki, Takayuki; Jung, Kyongyong; Park, Jongsun; Lee, Yong-Hwan; Kang, Seogchan; Park, Bongsoo; Geiser, David M
2010-10-01
Because less than one-third of clinically relevant fusaria can be accurately identified to species level using phenotypic data (i.e., morphological species recognition), we constructed a three-locus DNA sequence database to facilitate molecular identification of the 69 Fusarium species associated with human or animal mycoses encountered in clinical microbiology laboratories. The database comprises partial sequences from three nuclear genes: translation elongation factor 1α (EF-1α), the largest subunit of RNA polymerase (RPB1), and the second largest subunit of RNA polymerase (RPB2). These three gene fragments can be amplified by PCR and sequenced using primers that are conserved across the phylogenetic breadth of Fusarium. Phylogenetic analyses of the combined data set reveal that, with the exception of two monotypic lineages, all clinically relevant fusaria are nested in one of eight variously sized and strongly supported species complexes. The monophyletic lineages have been named informally to facilitate communication of an isolate's clade membership and genetic diversity. To identify isolates to the species included within the database, partial DNA sequence data from one or more of the three genes can be used as a BLAST query against the database which is Web accessible at FUSARIUM-ID (http://isolate.fusariumdb.org) and the Centraalbureau voor Schimmelcultures (CBS-KNAW) Fungal Biodiversity Center (http://www.cbs.knaw.nl/fusarium). Alternatively, isolates can be identified via phylogenetic analysis by adding sequences of unknowns to the DNA sequence alignment, which can be downloaded from the two aforementioned websites. 
The utility of this database should increase significantly as members of the clinical microbiology community deposit in internationally accessible culture collections (e.g., CBS-KNAW or the Fusarium Research Center) cultures of novel mycosis-associated fusaria, along with associated, corrected sequence chromatograms and data, so that the sequence results can be verified and isolates are made available for future study.
Carr, Brian I.; Giannini, Edoardo G.; Farinati, Fabio; Ciccarese, Francesca; Rapaccini, Gian Ludovico; Marco, Maria Di; Benvegnù, Luisa; Zoli, Marco; Borzio, Franco; Caturelli, Eugenio; Chiaramonte, Maria; Trevisani, Franco
2014-01-01
Background Previous work has shown that 2 general processes contribute to hepatocellular cancer (HCC) prognosis. They are: a. liver damage, monitored by indices such as blood bilirubin, prothrombin time and AST; as well as b. tumor biology, monitored by indices such as tumor size, tumor number, presence of PVT and blood AFP levels. These 2 processes may affect one another, with prognostically significant interactions between multiple tumor and host parameters. These interactions form a context that provides personalization of the prognostic meaning of these factors for every patient. Thus, a given level of bilirubin or tumor diameter might have a different significance in different personal contexts. We previously applied Network Phenotyping Strategy (NPS) to characterize interactions between liver function indices of Asian HCC patients and recognized two clinical phenotypes, S and L, differing in tumor size and tumor nodule numbers. Aims To validate the applicability of the NPS-based HCC S/L classification on an independent European HCC cohort, for which survival information was additionally available. Methods Four sets of peripheral blood parameters, including AFP-platelets, derived from routine blood parameter levels and tumor indices from the ITA.LI.CA database, were analyzed using NPS, a graph-theory based approach, which compares personal patterns of complete relationships between clinical data values to reference patterns with significant association to disease outcomes. Results Without reference to the actual tumor sizes, patients were classified by NPS into 2 subgroups with S and L phenotypes. These two phenotypes were recognized using solely the HCC screening test results, consisting of eight common blood parameters, paired by their significant correlations, including an AFP-Platelets relationship. These trends were combined with patient age, gender and self-reported alcoholism into NPS personal patient profiles.
We subsequently validated (using actual scan data) that patients in L phenotype group had 1.5× larger mean tumor masses relative to S, p = 6×10^−16. Importantly, with the new data, liver test pattern-identified S-phenotype patients had typically 1.7× longer survival compared to L-phenotype. NPS integrated the liver, tumor and basic demographic factors. Cirrhosis associated thrombocytopenia was typical for smaller S-tumors. In L-tumor phenotype, typical platelet levels increased with the tumor mass. Hepatic inflammation and tumor factors contributed to more aggressive L tumors, with parenchymal destruction and shorter survival. Summary NPS provides integrative interpretation for HCC behavior, identifying two tumor and survival phenotypes by clinical parameter patterns. The NPS classifier is provided as an Excel tool. The NPS system shows the importance of considering each tumor marker and parameter in the total context of all the other parameters of an individual patient. PMID:25023357
Williams, L. Keoki; Buu, Anne
2017-01-01
We propose a multivariate genome-wide association test for mixed continuous, binary, and ordinal phenotypes. A latent response model is used to estimate the correlation between phenotypes with different measurement scales so that the empirical distribution of the Fisher’s combination statistic under the null hypothesis is estimated efficiently. The simulation study shows that our proposed correlation estimation methods have high levels of accuracy. More importantly, our approach conservatively estimates the variance of the test statistic so that the type I error rate is controlled. The simulation also shows that the proposed test maintains the power at the level very close to that of the ideal analysis based on known latent phenotypes while controlling the type I error. In contrast, conventional approaches (dichotomizing all observed phenotypes or treating them as continuous variables) could either reduce the power or employ a linear regression model unfit for the data. Furthermore, the statistical analysis on the database of the Study of Addiction: Genetics and Environment (SAGE) demonstrates that conducting a multivariate test on multiple phenotypes can increase the power of identifying markers that may not be, otherwise, chosen using marginal tests. The proposed method also offers a new approach to analyzing the Fagerström Test for Nicotine Dependence as multivariate phenotypes in genome-wide association studies. PMID:28081206
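For readers unfamiliar with the Fisher's combination statistic named in this abstract, the sketch below computes it for a set of marginal p-values. It covers only the textbook case of independent tests, where the statistic follows a chi-square distribution with 2k degrees of freedom. The paper's contribution is the correlated case, whose null distribution is estimated empirically; that part is not reproduced here. The function names are illustrative.

```python
import math
from typing import Sequence


def fisher_statistic(p_values: Sequence[float]) -> float:
    """Fisher's combination statistic: -2 times the sum of log p-values."""
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    return -2.0 * sum(math.log(p) for p in p_values)


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combined_p(p_values: Sequence[float]) -> float:
    """Combined p-value, valid ONLY when the tests are independent."""
    return chi2_sf_even_df(fisher_statistic(p_values), 2 * len(p_values))


# Two marginal tests that are each borderline combine into stronger evidence.
print(fisher_combined_p([0.05, 0.05]))
```

When phenotypes are correlated, as in the setting of the abstract, this independent-case p-value is generally too small, which is why the null distribution has to be estimated rather than taken from the chi-square reference.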
Rappaport, Noa; Fishilevich, Simon; Nudel, Ron; Twik, Michal; Belinky, Frida; Plaschkes, Inbar; Stein, Tsippi Iny; Cohen, Dana; Oz-Levi, Danit; Safran, Marilyn; Lancet, Doron
2017-08-18
A key challenge in the realm of human disease research is next generation sequencing (NGS) interpretation, whereby identified filtered variant-harboring genes are associated with a patient's disease phenotypes. This necessitates bioinformatics tools linked to comprehensive knowledgebases. The GeneCards suite databases, which include GeneCards (human genes), MalaCards (human diseases) and PathCards (human pathways) together with additional tools, are presented with the focus on MalaCards utility for NGS interpretation as well as for large scale bioinformatic analyses. VarElect, our NGS interpretation tool, leverages the broad information in the GeneCards suite databases. MalaCards algorithms unify disease-related terms and annotations from 69 sources. Further, MalaCards defines hierarchical relatedness: aliases, disease families, a related diseases network, categories and ontological classifications. GeneCards and MalaCards delineate and share a multi-tiered, scored gene-disease network, with stringency levels, including the definition of elite status (high quality gene-disease pairs, coming from manually curated trustworthy sources) that includes 4500 genes for 8000 diseases. This unique resource is key to NGS interpretation by VarElect. VarElect, a comprehensive search tool that helps infer both direct and indirect links between genes and user-supplied disease/phenotype terms, is robustly strengthened by the information found in MalaCards. The indirect mode benefits from GeneCards' diverse gene-to-gene relationships, including SuperPaths (integrated biological pathways from 12 information sources). We are currently adding an important information layer in the form of "disease SuperPaths", generated from the gene-disease matrix by an algorithm similar to that previously employed for biological pathway unification. This allows the discovery of novel gene-disease and disease-disease relationships.
The advent of whole genome sequencing necessitates capacities to go beyond protein coding genes. GeneCards is highly useful in this respect, as it also addresses 101,976 non-protein-coding RNA genes. In a more recent development, we are currently adding an inclusive map of regulatory elements and their inferred target genes, generated by integration from 4 resources. MalaCards provides a rich big-data scaffold for in silico biomedical discovery within the gene-disease universe. VarElect, which depends significantly on both GeneCards and MalaCards power, is a potent tool for supporting the interpretation of wet-lab experiments, notably NGS analyses of disease. The GeneCards suite has thus transcended its 2-decade role in biomedical research, maturing into a key player in clinical investigation.
Gao, Zhen; Chen, Yang; Cai, Xiaoshu; Xu, Rong
2017-01-01
Motivation: The Blood–Brain Barrier (BBB) is a rigorous permeability barrier for maintaining homeostasis of the Central Nervous System (CNS). Determination of a compound’s permeability to the BBB is a prerequisite in CNS drug discovery. Existing computational methods usually predict drug BBB permeability from chemical structure and they generally apply to small compounds passing the BBB through passive diffusion. As abundant information on drug side effects and indications has been recorded over time through extensive clinical usage, we aim to explore BBB permeability prediction from a new angle and introduce a novel approach to predict BBB permeability from drug clinical phenotypes (drug side effects and drug indications). This method can apply to both small compounds and macro-molecules penetrating the BBB through various mechanisms besides passive diffusion. Results: We composed a training dataset of 213 drugs with known brain and blood steady-state concentrations ratio and extracted their side effects and indications as features. Next, we trained SVM models with polynomial kernel and obtained accuracy of 76.0%, AUC 0.739, and F1 score (macro weighted) 0.760 with Monte Carlo cross validation. The independent test accuracy was 68.3%, AUC 0.692, F1 score 0.676. When both chemical features and clinical phenotypes were available, combining the two types of features achieved significantly better performance than the chemical feature based approach (accuracy 85.5% versus 72.9%, AUC 0.854 versus 0.733, F1 score 0.854 versus 0.725; P < e^−90). We also conducted de novo prediction and identified 110 drugs in the SIDER database having the potential to penetrate the BBB, which could serve as a starting point for CNS drug repositioning research. Availability and Implementation: https://github.com/bioinformatics-gao/CASE-BBB-prediction-Data Contact: rxx@case.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27993785
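The evaluation scheme named in this abstract, Monte Carlo cross validation, amounts to repeated random train/test splits with scores averaged over the repeats. The sketch below shows the splitting step and a macro-averaged F1 score in plain Python. The test fraction, number of repeats, seed, and the reading of "macro weighted" as an unweighted mean of per-class F1 are assumptions made here; the abstract does not state them, and the SVM itself is omitted.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def monte_carlo_splits(
    n_items: int, test_fraction: float = 0.2, n_repeats: int = 100, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for repeated random splits."""
    rng = random.Random(seed)
    n_test = max(1, round(n_items * test_fraction))
    indices = list(range(n_items))
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield shuffled[n_test:], shuffled[:n_test]


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# 213 is the training-set size reported in the abstract; the rest is illustrative.
for train_idx, test_idx in monte_carlo_splits(213, test_fraction=0.2, n_repeats=3):
    print(len(train_idx), len(test_idx))
```

In a full pipeline, a model would be fitted on each `train_idx`, scored on the matching `test_idx`, and the per-repeat scores averaged to give figures like those quoted above.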
Svahn, Johanna; Bagnasco, Francesca; Cappelli, Enrico; Onofrillo, Daniela; Caruso, Silvia; Corsolini, Fabio; De Rocco, Daniela; Savoia, Anna; Longoni, Daniela; Pillon, Marta; Marra, Nicoletta; Ramenghi, Ugo; Farruggia, Piero; Locasciulli, Anna; Addari, Carmen; Cerri, Carla; Mastrodicasa, Elena; Casazza, Gabriella; Verzegnassi, Federico; Riccardi, Francesca; Haupt, Riccardo; Barone, Angelica; Cesaro, Simone; Cugno, Chiara; Dufour, Carlo
2016-07-01
We analyzed 97 Fanconi anemia patients from a clinic/biological database for genotype, somatic, and hematologic phenotype, adverse hematological events, solid tumors, and treatment. Seventy-two patients belonged to complementation group A. Eighty percent of patients presented with mild/moderate somatic phenotype and most with cytopenia. No correlation was seen between somatic/hematologic phenotype and number of missense mutations of FANCA alleles. Over follow-up, 33% of patients improved or maintained mild/moderate cytopenia or normal blood count, whereas the remainder had worsening cytopenia. Eleven patients developed a hematological adverse event (MDS, AML, pathological cytogenetics) and three developed solid tumors. The 10-year cumulative risk of death for the whole cohort was 25.6%, with a median follow-up of 5.8 years. In patients eligible for hematopoietic stem cell transplantation because of moderate cytopenia, mortality was significantly higher in subjects transplanted from a matched unrelated donor than in nontransplanted subjects, whereas there was no significant difference between matched sibling donor transplants and nontransplanted patients. In patients eligible for transplant because of severe cytopenia and clonal disease, mortality risk was not significantly different among subjects transplanted from a matched unrelated donor, those transplanted from a matched sibling donor, and nontransplanted subjects. The decision to transplant should rely on various elements, including type of donor, HLA matching, patient comorbidities, impairment, and clonal evolution of hematopoiesis. Am. J. Hematol. 91:666-671, 2016. © 2016 Wiley Periodicals, Inc.
Validating a strategy for psychosocial phenotyping using a large corpus of clinical text.
Gundlapalli, Adi V; Redd, Andrew; Carter, Marjorie; Divita, Guy; Shen, Shuying; Palmer, Miland; Samore, Matthew H
2013-12-01
To develop algorithms to improve efficiency of patient phenotyping using natural language processing (NLP) on text data. Of a large number of note titles available in our database, we sought to determine those with highest yield and precision for psychosocial concepts. From a database of over 1 billion documents from US Department of Veterans Affairs medical facilities, a random sample of 1500 documents from each of 218 enterprise note titles were chosen. Psychosocial concepts were extracted using a UIMA-AS-based NLP pipeline (v3NLP), using a lexicon of relevant concepts with negation and template format annotators. Human reviewers evaluated a subset of documents for false positives and sensitivity. High-yield documents were identified by hit rate and precision. Reasons for false positivity were characterized. A total of 58 707 psychosocial concepts were identified from 316 355 documents for an overall hit rate of 0.2 concepts per document (median 0.1, range 1.6-0). Of 6031 concepts reviewed from a high-yield set of note titles, the overall precision for all concept categories was 80%, with variability among note titles and concept categories. Reasons for false positivity included templating, negation, context, and alternate meaning of words. The sensitivity of the NLP system was noted to be 49% (95% CI 43% to 55%). Phenotyping using NLP need not involve the entire document corpus. Our methods offer a generalizable strategy for scaling NLP pipelines to large free text corpora with complex linguistic annotations in attempts to identify patients of a certain phenotype.
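The overall hit rate quoted in this abstract can be reproduced from the reported counts. The sketch below assumes that "hit rate" means extracted concepts per document and that "precision" means the fraction of reviewed concepts judged correct. The function names are illustrative and are not part of the v3NLP pipeline.

```python
def hit_rate(concepts_found: int, documents: int) -> float:
    """Concepts extracted per document (assumed definition of hit rate)."""
    return concepts_found / documents


def precision(true_positives: int, reviewed: int) -> float:
    """Fraction of human-reviewed concepts that were judged correct."""
    return true_positives / reviewed


# Counts taken from the abstract: 58 707 concepts in 316 355 documents.
overall_hit_rate = hit_rate(58_707, 316_355)
print(round(overall_hit_rate, 1))
```

The abstract does not give the number of true positives among the 6031 reviewed concepts, so `precision` is shown only as a definition.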
Validating a strategy for psychosocial phenotyping using a large corpus of clinical text
Gundlapalli, Adi V; Redd, Andrew; Carter, Marjorie; Divita, Guy; Shen, Shuying; Palmer, Miland; Samore, Matthew H
2013-01-01
Objective To develop algorithms to improve efficiency of patient phenotyping using natural language processing (NLP) on text data. Of a large number of note titles available in our database, we sought to determine those with highest yield and precision for psychosocial concepts. Materials and methods From a database of over 1 billion documents from US Department of Veterans Affairs medical facilities, a random sample of 1500 documents from each of 218 enterprise note titles were chosen. Psychosocial concepts were extracted using a UIMA-AS-based NLP pipeline (v3NLP), using a lexicon of relevant concepts with negation and template format annotators. Human reviewers evaluated a subset of documents for false positives and sensitivity. High-yield documents were identified by hit rate and precision. Reasons for false positivity were characterized. Results A total of 58 707 psychosocial concepts were identified from 316 355 documents for an overall hit rate of 0.2 concepts per document (median 0.1, range 1.6–0). Of 6031 concepts reviewed from a high-yield set of note titles, the overall precision for all concept categories was 80%, with variability among note titles and concept categories. Reasons for false positivity included templating, negation, context, and alternate meaning of words. The sensitivity of the NLP system was noted to be 49% (95% CI 43% to 55%). Conclusions Phenotyping using NLP need not involve the entire document corpus. Our methods offer a generalizable strategy for scaling NLP pipelines to large free text corpora with complex linguistic annotations in attempts to identify patients of a certain phenotype. PMID:24169276
[Phenotypic heterogeneity of chronic obstructive pulmonary disease].
Garcia-Aymerich, Judith; Agustí, Alvar; Barberà, Joan A; Belda, José; Farrero, Eva; Ferrer, Antoni; Ferrer, Jaume; Gáldiz, Juan B; Gea, Joaquim; Gómez, Federico P; Monsó, Eduard; Morera, Josep; Roca, Josep; Sauleda, Jaume; Antó, Josep M
2009-03-01
A functional definition of chronic obstructive pulmonary disease (COPD) based on airflow limitation has largely dominated the field. However, a view has emerged that COPD involves a complex array of cellular, organic, functional, and clinical events, with a growing interest in disentangling the phenotypic heterogeneity of COPD. The present review is based on the opinion of the authors, who have extensive research experience in several aspects of COPD. The starting assumption of the review is that current knowledge on the pathophysiology and clinical features of COPD allows us to classify phenotypic information in terms of the following dimensions: respiratory symptoms and health status, acute exacerbations, lung function, structural changes, local and systemic inflammation, and systemic effects. Twenty-six phenotypic traits were identified and assigned to one of the 6 dimensions. For each dimension, a summary is provided of the best evidence on the relationships among phenotypic traits, in particular among those corresponding to different dimensions, and on the relationship between these traits and relevant events in the natural history of COPD. The information has been organized graphically into a phenotypic matrix where each cell representing a pair of phenotypic traits is linked to relevant references. The information provided has the potential to increase our understanding of the heterogeneity of COPD phenotypes and help us plan future studies on aspects that are as yet unexplored.
Ananiadou, Sophia
2016-01-01
Biomedical literature articles and narrative content from Electronic Health Records (EHRs) both constitute rich sources of disease-phenotype information. Phenotype concepts may be mentioned in text in multiple ways, using phrases with a variety of structures. This variability stems partly from the different backgrounds of the authors, but also from the different writing styles typically used in each text type. Since EHR narrative reports and literature articles contain different but complementary types of valuable information, combining details from each text type can help to uncover new disease-phenotype associations. However, the alternative ways in which the same concept may be mentioned in each source constitutes a barrier to the automatic integration of information. Accordingly, identification of the unique concepts represented by phrases in text can help to bridge the gap between text types. We describe our development of a novel method, PhenoNorm, which integrates a number of different similarity measures to allow automatic linking of phenotype concept mentions to known concepts in the UMLS Metathesaurus, a biomedical terminological resource. PhenoNorm was developed using the PhenoCHF corpus—a collection of literature articles and narratives in EHRs, annotated for phenotypic information relating to congestive heart failure (CHF). We evaluate the performance of PhenoNorm in linking CHF-related phenotype mentions to Metathesaurus concepts, using a newly enriched version of PhenoCHF, in which each phenotype mention has an expert-verified link to a concept in the UMLS Metathesaurus. We show that PhenoNorm outperforms a number of alternative methods applied to the same task. Furthermore, we demonstrate PhenoNorm’s wider utility, by evaluating its ability to link mentions of various other types of medically-related information, occurring in texts covering wider subject areas, to concepts in different terminological resources. 
We show that PhenoNorm can maintain performance levels, and that its accuracy compares favourably to other methods applied to these tasks. PMID:27643689
Butler, Merlin G.; Rafi, Syed K.; Manzardo, Ann M.
2015-01-01
Recently, autism-related research has focused on the identification of various genes and disturbed pathways causing the genetically heterogeneous group of autism spectrum disorders (ASD). The list of autism-related genes has significantly increased due to better awareness with advances in genetic technology and expanding searchable genomic databases. We compiled a master list of known and clinically relevant autism spectrum disorder genes identified with supporting evidence from peer-reviewed medical literature sources by searching key words related to autism and genetics and from authoritative autism-related public access websites, such as the Simons Foundation Autism Research Institute autism genomic database dedicated to gene discovery and characterization. Our list consists of 792 genes arranged in alphabetical order in tabular form with gene symbols placed on high-resolution human chromosome ideograms, thereby enabling clinical and laboratory geneticists and genetic counsellors to access convenient visual images of the location and distribution of ASD genes. Meaningful correlations of the observed phenotype in patients with suspected/confirmed ASD gene(s) at the chromosome region or breakpoint band site can be made to inform diagnosis and gene-based personalized care and provide genetic counselling for families. PMID:25803107
IVS-II-648/649 (-T) (HBB: c.316-202del) Triggers a Novel β-Thalassemia Phenotype.
Azimi, Azam; Alibakhshi, Reza; Hayati, Hasibeh; Tahmasebi, Soosan; Alimoradi, Sasan
2017-01-01
Thalassemia is the most common inherited disorder in Iran. There are approximately 800 different genomic alterations of the β-globin gene described in the HbVar database. In this study, we identified a novel mutation in a 21-year-old woman [IVS-II-648/649 (-T); HBB: c.316-202del] and describe its clinical implications. Two other members of this family, both with hematological and clinical features associated with β-thalassemia (β-thal), also carried this mutation. The molecular diagnosis of the β-globin gene mutation was performed by direct sequencing. Based on the observed β-thal phenotype and in silico analysis results, we concluded that this novel β-globin gene mutation was associated with the mild phenotype of β-thal.
Evaluation of personal digital assistant drug information databases for the managed care pharmacist.
Lowry, Colleen M; Kostka-Rokosz, Maria D; McCloskey, William W
2003-01-01
Personal digital assistants (PDAs) are becoming a necessity for practicing pharmacists. They offer a time-saving and convenient way to obtain current drug information. Several software companies now offer general drug information databases for use on handheld computers. PDAs priced less than 200 US dollars often have limited memory capacity; therefore, the user must choose from a growing list of general drug information database options in order to maximize utility without exceeding memory capacity. This paper reviews the attributes of available general drug information software databases for the PDA. It provides information on the content, advantages, limitations, pricing, memory requirements, and accessibility of drug information software databases. Ten drug information databases were subjectively analyzed and evaluated based on information from the product's Web site, vendor Web sites, and our experience. Some of these databases have attractive auxiliary features such as kinetics calculators, disease references, drug-drug and drug-herb interaction tools, and clinical guidelines, which may make them more useful to the PDA user. Not all drug information databases are equal with regard to content, author credentials, frequency of updates, and memory requirements. The user must therefore evaluate databases for completeness, currency, and cost effectiveness before purchase. In addition, consideration should be given to the ease of use and flexibility of individual programs.
Savige, Judy; Dagher, Hayat; Povey, Sue
2014-07-01
This study examined whether gene-specific DNA variant databases for inherited diseases of the kidney fulfilled the Human Variome Project recommendations of being complete, accurate, clinically relevant and freely available. A recent review identified 60 inherited renal diseases caused by mutations in 132 genes. The disease name, MIM number, gene name, together with "mutation" or "database," were used to identify web-based databases. Fifty-nine diseases (98%) due to mutations in 128 genes had a variant database. Altogether there were 349 databases (a median of 3 per gene, range 0-6), but no gene had two databases with the same number of variants, and 165 (50%) databases included fewer than 10 variants. About half the databases (180, 54%) had been updated in the previous year. Few (77, 23%) were curated by "experts" but these included nine of the 11 with the most variants. Even fewer databases (41, 12%) included clinical features apart from the name of the associated disease. Most (223, 67%) could be accessed without charge, including those for 50 genes (40%) with the maximum number of variants. Future efforts should focus on encouraging experts to collaborate on a single database for each gene affected in inherited renal disease, including both unpublished variants, and clinical phenotypes. © 2014 WILEY PERIODICALS, INC.
Computable visually observed phenotype ontological framework for plants
2011-01-01
Background The ability to search for and precisely compare similar phenotypic appearances within and across species has vast potential in plant science and genetic research. The difficulty in doing so lies in the fact that many visual phenotypic data, especially visually observed phenotypes that often times cannot be directly measured quantitatively, are in the form of text annotations, and these descriptions are plagued by semantic ambiguity, heterogeneity, and low granularity. Though several bio-ontologies have been developed to standardize phenotypic (and genotypic) information and permit comparisons across species, these semantic issues persist and prevent precise analysis and retrieval of information. A framework suitable for the modeling and analysis of precise computable representations of such phenotypic appearances is needed. Results We have developed a new framework called the Computable Visually Observed Phenotype Ontological Framework for plants. This work provides a novel quantitative view of descriptions of plant phenotypes that leverages existing bio-ontologies and utilizes a computational approach to capture and represent domain knowledge in a machine-interpretable form. This is accomplished by means of a robust and accurate semantic mapping module that automatically maps high-level semantics to low-level measurements computed from phenotype imagery. The framework was applied to two different plant species with semantic rules mined and an ontology constructed. Rule quality was evaluated and showed high quality rules for most semantics. This framework also facilitates automatic annotation of phenotype images and can be adopted by different plant communities to aid in their research. Conclusions The Computable Visually Observed Phenotype Ontological Framework for plants has been developed for more efficient and accurate management of visually observed phenotypes, which play a significant role in plant genomics research. 
The uniqueness of this framework is its ability to bridge the knowledge of informaticians and plant science researchers by translating descriptions of visually observed phenotypes into standardized, machine-understandable representations, thus enabling the development of advanced information retrieval and phenotype annotation analysis tools for the plant science community. PMID:21702966
[A web-based integrated clinical database for laryngeal cancer].
E, Qimin; Liu, Jialin; Li, Yong; Liang, Chuanyu
2014-08-01
To establish an integrated database for laryngeal cancer that provides an information platform for clinical and basic research and meets the needs of clinical and scientific use. Under the guidance of clinical experts, we constructed a web-based integrated clinical database for laryngeal carcinoma on the basis of clinical data standards, Apache+PHP+MySQL technology, the specialist characteristics of laryngeal cancer, and tumor genetic information. A web-based integrated clinical database for laryngeal carcinoma was developed. The database has a user-friendly interface, and data can be entered and queried conveniently. In addition, the system uses clinical data standards and exchanges information with the existing electronic medical records system to avoid information silos. Furthermore, the database forms integrate laryngeal cancer specialist characteristics and tumor genetic information. The web-based integrated clinical database for laryngeal carcinoma offers comprehensive specialist information, strong expandability, and high technical feasibility, and it conforms to the clinical characteristics of the laryngeal cancer specialty. By using clinical data standards and structured handling of clinical data, the database can better meet the needs of scientific research and facilitate information exchange, and the information collected and entered about patients with tumors is highly informative. In addition, users can access and work with the database quickly and conveniently over the Internet.
Social Cognition, Social Skill, and the Broad Autism Phenotype
ERIC Educational Resources Information Center
Sasson, Noah J.; Nowlin, Rachel B.; Pinkham, Amy E.
2013-01-01
Social-cognitive deficits differentiate parents with the "broad autism phenotype" from non-broad autism phenotype parents more robustly than other neuropsychological features of autism, suggesting that this domain may be particularly informative for identifying genetic and brain processes associated with the phenotype. The current study…
Federal Register 2010, 2011, 2012, 2013, 2014
2011-08-30
...] FDA's Public Database of Products With Orphan-Drug Designation: Replacing Non-Informative Code Names... replaced non-informative code names with descriptive identifiers on its public database of products that... on our public database with non-informative code names. After careful consideration of this matter...
The FlyBase database of the Drosophila genome projects and community literature
2002-01-01
FlyBase (http://flybase.bio.indiana.edu/) provides an integrated view of the fundamental genomic and genetic data on the major genetic model Drosophila melanogaster and related species. Following on the success of the Drosophila genome project, FlyBase has primary responsibility for the continual reannotation of the D. melanogaster genome. The ultimate goal of the reannotation effort is to decorate the euchromatic sequence of the genome with as much biological information as is available from the community and from the major genome project centers. The current cycle of reannotation focuses on establishing a comprehensive data set of gene models (i.e. transcription units and CDSs). There are many points of entry to the genome within FlyBase, most notably through maps, gene ontologies, structured phenotypic and gene expression data, and anatomy. PMID:11752267
Envirotyping for deciphering environmental impacts on crop plants.
Xu, Yunbi
2016-04-01
Global climate change imposes increasing impacts on our environments and crop production. To decipher environmental impacts on crop plants, the concept of "envirotyping" is proposed as a third "typing" technology, complementing genotyping and phenotyping. Environmental factors can be collected through multiple environmental trials, geographic and soil information systems, measurement of soil and canopy properties, and evaluation of companion organisms. Envirotyping contributes to crop modeling and phenotype prediction through its functional components, including genotype-by-environment interaction (GEI), genes responsive to environmental signals, biotic and abiotic stresses, and integrative phenotyping. Envirotyping, driven by information and support systems, has a wide range of applications, including environmental characterization, GEI analysis, phenotype prediction, near-iso-environment construction, agronomic genomics, precision agriculture and breeding, and development of a four-dimensional profile of crop science involving genotype (G), phenotype (P), envirotype (E) and time (T) (developmental stage). In the future, envirotyping needs to zoom into specific experimental plots and individual plants, along with the development of high-throughput and precision envirotyping platforms, to integrate genotypic, phenotypic and envirotypic information for establishing a highly efficient precision breeding and sustainable crop production system based on deciphered environmental impacts.
The whole genome sequences and experimentally phased haplotypes of over 100 personal genomes.
Mao, Qing; Ciotlos, Serban; Zhang, Rebecca Yu; Ball, Madeleine P; Chin, Robert; Carnevali, Paolo; Barua, Nina; Nguyen, Staci; Agarwal, Misha R; Clegg, Tom; Connelly, Abram; Vandewege, Ward; Zaranek, Alexander Wait; Estep, Preston W; Church, George M; Drmanac, Radoje; Peters, Brock A
2016-10-11
Since the completion of the Human Genome Project in 2003, it is estimated that more than 200,000 individual whole human genomes have been sequenced, a stunning accomplishment in such a short period of time. However, most of these were sequenced without experimental haplotype data and are therefore missing an important aspect of genome biology. In addition, much of the genomic data is not available to the public and lacks phenotypic information. As part of the Personal Genome Project, blood samples from 184 participants were collected and processed using Complete Genomics' Long Fragment Read technology. Here, we present the experimental whole genome haplotyping and sequencing of these samples to an average read coverage depth of 100X. This is approximately three-fold higher than the read coverage applied to most whole human genome assemblies and ensures the highest quality results. Currently, 114 genomes from this dataset are freely available in the GigaDB repository and are associated with rich phenotypic data; the remaining 70 should be added in the near future as they are approved through the PGP data release process. For reproducibility analyses, 20 genomes were sequenced at least twice using independent LFR barcoded libraries. Seven genomes were also sequenced using Complete Genomics' standard non-barcoded library process. In addition, we report 2.6 million high-quality, rare variants not previously identified in the Single Nucleotide Polymorphisms database or the 1000 Genomes Project Phase 3 data. These genomes represent a unique source of haplotype and phenotype data for the scientific community and should help to expand our understanding of human genome evolution and function.
2014-01-01
Background There is an urgent need for expanding and enhancing autism spectrum disorder (ASD) samples, in order to better understand causes of ASD. Methods In a unique public-private partnership, 13 sites with extensive experience in both the assessment and diagnosis of ASD embarked on an ambitious, 2-year program to collect samples for genetic and phenotypic research and begin analyses on these samples. The program was called The Autism Simplex Collection (TASC). TASC sample collection began in 2008 and was completed in 2010, and included nine sites from North America and four sites from Western Europe, as well as a centralized Data Coordinating Center. Results Over 1,700 trios are part of this collection, with DNA from transformed cells now available through the National Institute of Mental Health (NIMH). Autism Diagnostic Interview-Revised (ADI-R) and Autism Diagnostic Observation Schedule-Generic (ADOS-G) measures are available for all probands, as are standardized IQ measures, Vineland Adaptive Behavioral Scales (VABS), the Social Responsiveness Scale (SRS), Peabody Picture Vocabulary Test (PPVT), and physical measures (height, weight, and head circumference). At almost every site, additional phenotypic measures were collected, including the Broad Autism Phenotype Questionnaire (BAPQ) and Repetitive Behavior Scale-Revised (RBS-R), as well as the non-word repetition scale, Communication Checklist (Children’s or Adult), and Aberrant Behavior Checklist (ABC). Moreover, for nearly 1,000 trios, the Autism Genome Project Consortium (AGP) has carried out Illumina 1 M SNP genotyping and called copy number variation (CNV) in the samples, with data being made available through the National Institutes of Health (NIH). Whole exome sequencing (WES) has been carried out in over 500 probands, together with ancestry matched controls, and this data is also available through the NIH. 
Additional WES is being carried out by the Autism Sequencing Consortium (ASC), where the focus is on sequencing complete trios. ASC sequencing for the first 1,000 samples (all from whole-blood DNA) is complete and data will be released in 2014. Data is being made available through NIH databases (database of Genotypes and Phenotypes (dbGaP) and National Database for Autism Research (NDAR)) with DNA released in Dist 11.0. Primary funding for the collection, genotyping, sequencing and distribution of TASC samples was provided by Autism Speaks and the NIH, including the National Institute of Mental Health (NIMH) and the National Human Genome Research Institute (NHGRI). Conclusions TASC represents an important sample set that leverages expert sites. Similar approaches, leveraging expert sites and ongoing studies, represent an important path towards further enhancing available ASD samples. PMID:25392729
Genetic and environmental pathways to complex diseases.
Gohlke, Julia M; Thomas, Reuben; Zhang, Yonqing; Rosenstein, Michael C; Davis, Allan P; Murphy, Cynthia; Becker, Kevin G; Mattingly, Carolyn J; Portier, Christopher J
2009-05-05
Pathogenesis of complex diseases involves the integration of genetic and environmental factors over time, making it particularly difficult to tease apart relationships between phenotype, genotype, and environmental factors using traditional experimental approaches. Using gene-centered databases, we have developed a network of complex diseases and environmental factors through the identification of key molecular pathways associated with both genetic and environmental contributions. Comparison with known chemical disease relationships and analysis of transcriptional regulation from gene expression datasets for several environmental factors and phenotypes clustered in a metabolic syndrome and neuropsychiatric subnetwork supports our network hypotheses. This analysis identifies natural and synthetic retinoids, antipsychotic medications, Omega 3 fatty acids, and pyrethroid pesticides as potential environmental modulators of metabolic syndrome phenotypes through PPAR and adipocytokine signaling and organophosphate pesticides as potential environmental modulators of neuropsychiatric phenotypes. Identification of key regulatory pathways that integrate genetic and environmental modulators define disease associated targets that will allow for efficient screening of large numbers of environmental factors, screening that could set priorities for further research and guide public health decisions.
Updating the profile of C-terminal MECP2 deletions in Rett syndrome
Bebbington, A; Percy, A; Christodoulou, J; Ravine, D; Ho, G; Jacoby, P; Anderson, A; Pineda, M; Ben Zeev, B; Bahi-Buisson, N; Smeets, E; Leonard, H
2014-01-01
Objectives This study aimed to compare the phenotype of Rett syndrome cases with C-terminal deletions to that of cases with different MECP2 mutations and to examine the phenotypic variation within C-terminal deletions. Methods Cases were selected from InterRett, an international database, and from the population-based Australian Rett Syndrome Database. Cases (n=832) were included if they had a pathogenic MECP2 mutation in which the nature of the amino acid change was known. Three severity scale systems were used, and individual aspects of the phenotype were also compared. Results Lower severity was associated with C-terminal deletions (n=79) compared to all other MECP2 mutations (e.g. Pineda scale C-terminals mean 15.0 (95% CI 14.0–16.0) vs 16.2 (15.9–16.5)). Cases with C-terminal deletions were more likely to have a normal head circumference (odds ratio 3.22, 95% CI 1.53–6.79) and weight (odds ratio 2.97, 95% CI 1.25–5.76). Onset of stereotypies tended to be later (median age 2.5 years vs 2 years, p<0.001 from survival analysis), and age of learning to walk tended to be earlier (median age 1.6 years vs 2 years, p=0.002 from survival analysis). Those with C-terminal deletions occurring later in the region had lower average severity scores than those occurring earlier in the region. Conclusion In terms of overall severity, C-terminal deletion cases would appear to be in the middle of the range. In terms of individual aspects of phenotype, growth and ability to ambulate appear to be particular strengths. By pooling data internationally, this study has achieved the case numbers to provide a phenotypic profile of C-terminal deletions in Rett syndrome. PMID:19914908
Miller, P Elliott; Martin, Seth S; Toth, Peter P; Santos, Raul D; Blaha, Michael J; Nasir, Khurram; Virani, Salim S; Post, Wendy S; Blumenthal, Roger S; Jones, Steven R
2015-01-01
Familial hypercholesterolemia (FH) is an autosomal dominant dyslipidemia characterized by defective low-density lipoprotein (LDL) clearance. The aim of this study was to compare Friedewald-estimated LDL cholesterol (LDL-C) to biologic LDL-C in individuals screening positive for FH and then further characterize FH phenotypes. We studied 1,320,581 individuals from the Very Large Database of Lipids, referred from 2009 to 2011 for Vertical Auto Profile ultracentrifugation testing. Friedewald LDL-C was defined as the cholesterol content of LDL-C, intermediate-density lipoprotein cholesterol, and lipoprotein(a) cholesterol (Lp(a)-C), with LDL-C representing biologic LDL-C. Using Friedewald LDL-C, we phenotypically categorized patients by the National Lipid Association guideline age-based screening thresholds for FH. In those meeting criteria, we categorized patients using population percentile-equivalent biologic LDL-C cutpoints and explored Lp(a)-C and remnant lipoprotein cholesterol (RLP-C) levels. Overall, 3829 patients met phenotypic criteria for FH by Friedewald LDL-C screening (FH+). Of those screening FH+, 78.8% were above and 21.2% were below the population percentile-equivalent biologic LDL-C. The mean difference in Friedewald biologic LDL-C percentiles was -0.01 (standard deviation, 0.17) for those above, and 1.92 (standard deviation, 9.16) for those below, respectively. Over 1 of 3 were found to have an elevated Lp(a)-C and over 50% had RLP-C greater than 95th percentile of the entire VLDL population. Of those who screened FH+, Friedewald and biologic LDL-C levels were closely correlated. Large proportions of the FH+ group had excess levels of Lp(a)-C and RLP-C. Future studies are warranted to study these mixed phenotypic groups and determine the role for further risk stratification and treatment algorithms. Copyright © 2015 National Lipid Association. Published by Elsevier Inc. All rights reserved.
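For readers unfamiliar with the term, the classic Friedewald equation estimates LDL-C in mg/dL as total cholesterol minus HDL-C minus triglycerides divided by 5. The sketch below shows that standard equation alongside the composite the abstract describes (LDL-C plus IDL-C plus Lp(a)-C). It is an illustration under those assumptions, not the study's analysis code, and the function names and sample values are hypothetical.

```python
def friedewald_ldl_c(total_chol: float, hdl_c: float, triglycerides: float) -> float:
    """Classic Friedewald estimate of LDL-C, all values in mg/dL.

    The TG/5 term approximates VLDL cholesterol; the estimate is generally
    not used when triglycerides are 400 mg/dL or higher.
    """
    if triglycerides >= 400:
        raise ValueError("Friedewald estimate is unreliable for TG >= 400 mg/dL")
    return total_chol - hdl_c - triglycerides / 5.0


def composite_ldl_c(biologic_ldl_c: float, idl_c: float, lpa_c: float) -> float:
    """Sum of measured fractions, as the abstract defines its Friedewald LDL-C."""
    return biologic_ldl_c + idl_c + lpa_c


# Hypothetical lipid panel (illustrative values, not taken from the study).
example_estimate = friedewald_ldl_c(total_chol=300.0, hdl_c=50.0, triglycerides=150.0)
print(example_estimate)
```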
Buske, Orion J.; Girdea, Marta; Dumitriu, Sergiu; Gallinger, Bailey; Hartley, Taila; Trang, Heather; Misyura, Andriy; Friedman, Tal; Beaulieu, Chandree; Bone, William P.; Links, Amanda E.; Washington, Nicole L.; Haendel, Melissa A.; Robinson, Peter N.; Boerkoel, Cornelius F.; Adams, David; Gahl, William A.; Boycott, Kym M.; Brudno, Michael
2017-01-01
The discovery of disease-causing mutations typically requires confirmation of the variant or gene in multiple unrelated individuals, and a large number of rare genetic diseases remain unsolved due to difficulty identifying second families. To enable the secure sharing of case records by clinicians and rare disease scientists, we have developed the PhenomeCentral portal (https://phenomecentral.org). Each record includes a phenotypic description and relevant genetic information (exome or candidate genes). PhenomeCentral identifies similar patients in the database based on semantic similarity between clinical features, automatically prioritized genes from whole-exome data, and candidate genes entered by the users, enabling both hypothesis-free and hypothesis-driven matchmaking. Users can then contact other submitters to follow up on promising matches. PhenomeCentral incorporates data for over 1,000 patients with rare genetic diseases, contributed by the FORGE and Care4Rare Canada projects, the US NIH Undiagnosed Diseases Program, the EU Neuromics and ANDDIrare projects, as well as numerous independent clinicians and scientists. Though the majority of these records have associated exome data, most lack a molecular diagnosis. PhenomeCentral has already been used to identify causative mutations for several patients, and its ability to find matching patients and diagnose these diseases will grow with each additional patient that is entered. PMID:26251998
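To make the idea of phenotype-based patient matching concrete, the sketch below ranks records by Jaccard overlap between sets of phenotype term identifiers. This is a deliberately simple stand-in: the abstract says PhenomeCentral uses semantic similarity between clinical features, which this sketch does not implement, and the record names and term sets are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared terms divided by all distinct terms; 0.0 if both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def rank_matches(query: Set[str], records: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Return (record id, score) pairs sorted from most to least similar."""
    scored = [(record_id, jaccard(query, terms)) for record_id, terms in records.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical records keyed by made-up IDs; term IDs are illustrative only.
records = {
    "patient_a": {"HP:0001250", "HP:0000252"},
    "patient_b": {"HP:0001250", "HP:0001249", "HP:0000252"},
    "patient_c": {"HP:0001263"},
}
query = {"HP:0001250", "HP:0001249", "HP:0000252"}
ranking = rank_matches(query, records)
print(ranking[0][0])
```

A set-overlap score ignores the ontology hierarchy, so two closely related but non-identical terms count as a complete mismatch; ontology-aware measures address that gap.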
A RESTful application programming interface for the PubMLST molecular typing and genome databases
Bray, James E.; Maiden, Martin C. J.
2017-01-01
Molecular typing is used to differentiate microorganisms at the subspecies or strain level for epidemiological investigations, infection control, public health and environmental sampling. DNA sequence-based typing methods require authoritative databases that link sequence variants to nomenclature in order to facilitate communication and comparison of identified types in national or global settings. The PubMLST website (https://pubmlst.org/) fulfils this role for over a hundred microorganisms for which it hosts curated molecular sequence typing data, providing sequence and allelic profile definitions for multi-locus sequence typing (MLST) and single-gene typing approaches. In recent years, these have expanded to cover the whole genome with schemes such as core genome MLST (cgMLST) and whole genome MLST (wgMLST) which catalogue the allelic diversity found in hundreds to thousands of genes. These approaches provide a common nomenclature for high-resolution strain characterization and comparison. Molecular typing information is linked to isolate provenance, phenotype, and increasingly genome assemblies, providing a resource for outbreak investigation and research into population structure, gene association, global epidemiology and vaccine coverage. A Representational State Transfer (REST) Application Programming Interface (API) has been developed for the PubMLST website to make these large quantities of structured molecular typing and whole genome sequence data available for programmatic access by any third party application. The API is an integral component of the Bacterial Isolate Genome Sequence Database (BIGSdb) platform that is used to host PubMLST resources, and exposes all public data within the site. In addition to data browsing, searching and download, the API supports authentication and submission of new data to curator queues. Database URL: http://rest.pubmlst.org/ PMID:29220452
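Programmatic access of the kind described above could be sketched as follows. Only the base URL comes from the abstract. That the service answers with JSON is an assumption here, and `fetch_json` and its `opener` parameter are hypothetical helper names, not part of the PubMLST API. The demonstration uses an offline stub, so no particular response structure is claimed.

```python
import io
import json
from urllib.request import urlopen

# Base URL as given in the abstract.
BASE_URL = "http://rest.pubmlst.org/"


def fetch_json(url, opener=urlopen):
    """Fetch a URL and decode the body as JSON (assumed response format).

    `opener` is injectable so the function can be exercised without network access.
    """
    with opener(url) as response:
        return json.loads(response.read().decode("utf-8"))


def fake_opener(url):
    """Offline stand-in returning a made-up payload; not a real API response."""
    return io.BytesIO(b'{"example": "placeholder"}')


print(fetch_json(BASE_URL, opener=fake_opener))
```

Calling `fetch_json(BASE_URL)` without the stub would issue a live request. The resource paths available beneath the base URL should be taken from the service's own documentation rather than guessed.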
Genetic identification of missing persons: DNA analysis of human remains and compromised samples.
Alvarez-Cubero, M J; Saiz, M; Martinez-Gonzalez, L J; Alvarez, J C; Eisenberg, A J; Budowle, B; Lorente, J A
2012-01-01
Human identification has made great strides over the past 2 decades due to the advent of DNA typing. Forensic DNA typing provides genetic data from a variety of materials and individuals, and is applied to many important issues that confront society. Part of the success of DNA typing is the generation of DNA databases to help identify missing persons and to develop investigative leads to assist law enforcement. DNA databases house DNA profiles from convicted felons (and in some jurisdictions arrestees), forensic evidence, human remains, and direct and family reference samples of missing persons. These databases are essential tools, which are becoming quite large (for example the US Database contains 10 million profiles). The scientific, governmental and private communities continue to work together to standardize genetic markers for more effective worldwide data sharing, to develop and validate robust DNA typing kits that contain the reagents necessary to type core identity genetic markers, to develop technologies that facilitate a number of analytical processes and to develop policies to make human identity testing more effective. Indeed, DNA typing is integral to resolving a number of serious criminal and civil concerns, such as solving missing person cases and identifying victims of mass disasters and children who may have been victims of human trafficking, and provides information for historical studies. As more refined capabilities are still required, novel approaches are being sought, such as genetic testing by next-generation sequencing, mass spectrometry, chip arrays and pyrosequencing. Single nucleotide polymorphisms offer the potential to analyze severely compromised biological samples, to determine the facial phenotype of decomposed human remains and to predict the bioancestry of individuals, a new focus in analyzing this type of markers. Copyright © 2012 S. Karger AG, Basel.
Tassy, Olivier; Dauga, Delphine; Daian, Fabrice; Sobral, Daniel; Robin, François; Khoueiry, Pierre; Salgado, David; Fox, Vanessa; Caillol, Danièle; Schiappa, Renaud; Laporte, Baptiste; Rios, Anne; Luxardi, Guillaume; Kusakabe, Takehiro; Joly, Jean-Stéphane; Darras, Sébastien; Christiaen, Lionel; Contensin, Magali; Auger, Hélène; Lamy, Clément; Hudson, Clare; Rothbächer, Ute; Gilchrist, Michael J; Makabe, Kazuhiro W; Hotta, Kohji; Fujiwara, Shigeki; Satoh, Nori; Satou, Yutaka; Lemaire, Patrick
2010-10-01
Developmental biology aims to understand how the dynamics of embryonic shapes and organ functions are encoded in linear DNA molecules. Thanks to recent progress in genomics and imaging technologies, systemic approaches are now used in parallel with small-scale studies to establish links between genomic information and phenotypes, often described at the subcellular level. Current model organism databases, however, do not integrate heterogeneous data sets at different scales into a global view of the developmental program. Here, we present a novel, generic digital system, NISEED, and its implementation, ANISEED, to ascidians, which are invertebrate chordates suitable for developmental systems biology approaches. ANISEED hosts an unprecedented combination of anatomical and molecular data on ascidian development. This includes the first detailed anatomical ontologies for these embryos, and quantitative geometrical descriptions of developing cells obtained from reconstructed three-dimensional (3D) embryos up to the gastrula stages. Fully annotated gene model sets are linked to 30,000 high-resolution spatial gene expression patterns in wild-type and experimentally manipulated conditions and to 528 experimentally validated cis-regulatory regions imported from specialized databases or extracted from 160 literature articles. This highly structured data set can be explored via a Developmental Browser, a Genome Browser, and a 3D Virtual Embryo module. We show how integration of heterogeneous data in ANISEED can provide a system-level understanding of the developmental program through the automatic inference of gene regulatory interactions, the identification of inducing signals, and the discovery and explanation of novel asymmetric divisions.
Tassy, Olivier; Dauga, Delphine; Daian, Fabrice; Sobral, Daniel; Robin, François; Khoueiry, Pierre; Salgado, David; Fox, Vanessa; Caillol, Danièle; Schiappa, Renaud; Laporte, Baptiste; Rios, Anne; Luxardi, Guillaume; Kusakabe, Takehiro; Joly, Jean-Stéphane; Darras, Sébastien; Christiaen, Lionel; Contensin, Magali; Auger, Hélène; Lamy, Clément; Hudson, Clare; Rothbächer, Ute; Gilchrist, Michael J.; Makabe, Kazuhiro W.; Hotta, Kohji; Fujiwara, Shigeki; Satoh, Nori; Satou, Yutaka; Lemaire, Patrick
2010-01-01
Developmental biology aims to understand how the dynamics of embryonic shapes and organ functions are encoded in linear DNA molecules. Thanks to recent progress in genomics and imaging technologies, systemic approaches are now used in parallel with small-scale studies to establish links between genomic information and phenotypes, often described at the subcellular level. Current model organism databases, however, do not integrate heterogeneous data sets at different scales into a global view of the developmental program. Here, we present a novel, generic digital system, NISEED, and its implementation, ANISEED, to ascidians, which are invertebrate chordates suitable for developmental systems biology approaches. ANISEED hosts an unprecedented combination of anatomical and molecular data on ascidian development. This includes the first detailed anatomical ontologies for these embryos, and quantitative geometrical descriptions of developing cells obtained from reconstructed three-dimensional (3D) embryos up to the gastrula stages. Fully annotated gene model sets are linked to 30,000 high-resolution spatial gene expression patterns in wild-type and experimentally manipulated conditions and to 528 experimentally validated cis-regulatory regions imported from specialized databases or extracted from 160 literature articles. This highly structured data set can be explored via a Developmental Browser, a Genome Browser, and a 3D Virtual Embryo module. We show how integration of heterogeneous data in ANISEED can provide a system-level understanding of the developmental program through the automatic inference of gene regulatory interactions, the identification of inducing signals, and the discovery and explanation of novel asymmetric divisions. PMID:20647237
Comparison of Online Agricultural Information Services.
ERIC Educational Resources Information Center
Reneau, Fred; Patterson, Richard
1984-01-01
Outlines major online agricultural information services--agricultural databases, databases with agricultural services, educational databases in agriculture--noting services provided, access to the database, and costs. Benefits of online agricultural database sources (availability of agricultural marketing, weather, commodity prices, management…
WMC Database Evaluation. Case Study Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Palounek, Andrea P. T
The WMC Database is ultimately envisioned to hold a collection of experimental data, design information, and information from computational models. This project was a first attempt at using the Database to access experimental data and extract information from it. This evaluation shows that the Database concept is sound and robust, and that the Database, once fully populated, should remain eminently usable for future researchers.
MIPS: analysis and annotation of proteins from whole genomes
Mewes, H. W.; Amid, C.; Arnold, R.; Frishman, D.; Güldener, U.; Mannhaupt, G.; Münsterkötter, M.; Pagel, P.; Strack, N.; Stümpflen, V.; Warfsmann, J.; Ruepp, A.
2004-01-01
The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein–protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de). PMID:14681354
MIPS: analysis and annotation of proteins from whole genomes.
Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A
2004-01-01
The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).
Rationale of the FIBROTARGETS study designed to identify novel biomarkers of myocardial fibrosis
Ferreira, João Pedro; Machu, Jean‐Loup; Girerd, Nicolas; Jaisser, Frederic; Thum, Thomas; Butler, Javed; González, Arantxa; Diez, Javier; Heymans, Stephane; McDonald, Kenneth; Gyöngyösi, Mariann; Firat, Hueseyin; Rossignol, Patrick; Pizard, Anne
2017-01-01
Aims: Myocardial fibrosis alters the cardiac architecture favouring the development of cardiac dysfunction, including arrhythmias and heart failure. Reducing myocardial fibrosis may improve outcomes through the targeted diagnosis and treatment of emerging fibrotic pathways. The European‐Commission‐funded ‘FIBROTARGETS’ is a multinational academic and industrial consortium with the main aims of (i) characterizing novel key mechanistic pathways involved in the metabolism of fibrillary collagen that may serve as biotargets, (ii) evaluating the potential anti‐fibrotic properties of novel or repurposed molecules interfering with the newly identified biotargets, and (iii) characterizing bioprofiles based on distinct mechanistic phenotypes involving the aforementioned biotargets. These pathways will be explored by performing a systematic and collaborative search for mechanisms and targets of myocardial fibrosis. These mechanisms will then be translated into individualized diagnostic tools and specific therapeutic pharmacological options for heart failure. Methods and results: The FIBROTARGETS consortium has merged data from 12 patient cohorts in a common database available to individual consortium partners. The database consists of >12 000 patients with a large spectrum of cardiovascular clinical phenotypes. It integrates community‐based population cohorts, cardiovascular risk cohorts, and heart failure cohorts. Conclusions: The FIBROTARGETS biomarker programme is aimed at exploring fibrotic pathways allowing the bioprofiling of patients into specific ‘fibrotic’ phenotypes and identifying new therapeutic targets that will potentially enable the development of novel and tailored anti‐fibrotic therapies for heart failure. PMID:28988439
The U.S. Environmental Protection Agency (EPA), through its ToxCast program, is developing predictive toxicity approaches that will use in vitro high-throughput screening (HTS), high-content screening (HCS) and toxicogenomic data to predict in vivo toxicity phenotypes. There are ...
How to integrate quantitative information into imaging reports for oncologic patients.
Martí-Bonmatí, L; Ruiz-Martínez, E; Ten, A; Alberich-Bayarri, A
2018-05-01
Nowadays, the images and information generated in imaging tests, as well as the reports that are issued, are digital and represent a reliable source of data. Reports can be classified according to their content and to the type of information they include into three main types: organized (free text in natural language), predefined (with templates and guidelines elaborated with previously determined natural language like that used in BI-RADS and PI-RADS), or structured (with drop-down menus displaying questions with various possible answers that have been agreed on with the rest of the multidisciplinary team, which use standardized lexicons and are structured in the form of a database with data that can be traced and exploited with statistical tools and data mining). The structured report, compatible with Management of Radiology Report Templates (MRRT), makes it possible to incorporate quantitative information related with the digital analysis of the data from the acquired images to accurately and precisely describe the properties and behavior of tissues by means of radiomics (characteristics and parameters). In conclusion, structured digital information (images, text, measurements, radiomic features, and imaging biomarkers) should be integrated into computerized reports so that they can be indexed in large repositories. Radiologic databanks are fundamental for exploiting health information, phenotyping lesions and diseases, and extracting conclusions in personalized medicine. Copyright © 2018 SERAM. Publicado por Elsevier España, S.L.U. All rights reserved.
Epigenetic Inheritance and Its Role in Evolutionary Biology: Re-Evaluation and New Perspectives
Burggren, Warren
2016-01-01
Epigenetics increasingly occupies a pivotal position in our understanding of inheritance, natural selection and, perhaps, even evolution. A survey of the PubMed database, however, reveals that the great majority (>93%) of epigenetic papers have an intra-, rather than an inter-generational focus, primarily on mechanisms and disease. Only approximately 1% of epigenetic papers even mention the nexus of epigenetics, natural selection and evolution. Yet, when environments are dynamic (e.g., climate change effects), there may be an “epigenetic advantage” to phenotypic switching by epigenetic inheritance, rather than by gene mutation. An epigenetically-inherited trait can arise simultaneously in many individuals, as opposed to a single individual with a gene mutation. Moreover, a transient epigenetically-modified phenotype can be quickly “sunsetted”, with individuals reverting to the original phenotype. Thus, epigenetic phenotype switching is dynamic and temporary and can help bridge periods of environmental stress. Epigenetic inheritance likely contributes to evolution both directly and indirectly. While there is as yet incomplete evidence of direct permanent incorporation of a complex epigenetic phenotype into the genome, doubtlessly, the presence of epigenetic markers and the phenotypes they create (which may sort quite separately from the genotype within a population) will influence natural selection and, so, drive the collective genotype of a population. PMID:27231949
Wu, Huiqun; Wei, Yufang; Shang, Yujuan; Shi, Wei; Wang, Lei; Li, Jingjing; Sang, Aimin; Shi, Lili; Jiang, Kui; Dong, Jiancheng
2018-06-06
Type 2 diabetes mellitus (T2DM) is a common chronic disease, and the fragmented data collected through separate vendors make continuous management of DM patients difficult. The lack of standards for these fragmented data also makes further phenotyping based on diabetic data difficult. Traditional T2DM data repositories only support data collection from T2DM patients, lack phenotyping ability, and rely on standalone database designs, limiting the secondary use of these valuable data. To solve these issues, we proposed a novel standards-based T2DM data repository framework. This repository can integrate data from various sources and can serve as a standardized record for further data transfer and integration. Phenotyping was conducted based on clinical guidelines with a KNIME workflow. To evaluate the phenotyping performance of the proposed system, data were collected from the local community by healthcare providers and then tested using algorithms. The results indicated that the proposed system could detect DR cases with an average accuracy of about 82.8%. Furthermore, these results showed promising potential for addressing fragmented data. The proposed system has integrating and phenotyping abilities, which could be used for diabetes research in future studies.
Initiative for standardization of reporting genetics of male infertility.
Traven, Eva; Ogrinc, Ana; Kunej, Tanja
2017-02-01
The number of publications on research of male infertility is increasing. Technologies used in research of male infertility generate complex results and various types of data that need to be appropriately managed, arranged, and made available to other researchers for further use. In our previous study, we collected over 800 candidate loci for male fertility in seven mammalian species. However, the continuation of the work towards a comprehensive database of candidate genes associated with different types of idiopathic human male infertility is challenging due to fragmented information, obtained from a variety of technologies and various omics approaches. Results are published in different forms and usually need to be extracted from the text, which hinders the gathering of information. Standardized reporting of genetic anomalies as well as causative and risk factors of male infertility therefore presents an important issue. The aim of the study was to collect examples of diverse genomic loci published in association with human male infertility and to propose a standardized format for reporting genetic causes of male infertility. From the currently available data we have selected 75 studies reporting 186 representative genomic loci which have been proposed as genetic risk factors for male infertility. Based on collected and formatted data, we suggested a first step towards unification of reporting the genetics of male infertility in original and review studies. The proposed initiative consists of five relevant data types: 1) genetic locus, 2) race/ethnicity, number of participants (infertile/controls), 3) methodology, 4) phenotype (clinical data, disease ontology, and disease comorbidity), and 5) reference. The proposed form for standardized reporting presents a baseline for further optimization with additional genetic and clinical information. This data standardization initiative will enable faster multi-omics data integration, database development and sharing, establishing more targeted hypotheses, and facilitating biomarker discovery.
De novo assembly and transcriptomic profiling of the grazing response in Stipa grandis.
Wan, Dongli; Wan, Yongqing; Hou, Xiangyang; Ren, Weibo; Ding, Yong; Sa, Rula
2015-01-01
Stipa grandis (Poaceae) is one of the dominant species in a typical steppe of the Inner Mongolian Plateau. However, primarily due to heavy grazing, the grasslands have become seriously degraded, and S. grandis has developed a special growth-inhibition phenotype against the stressful habitat. Because of the lack of transcriptomic and genomic information, the understanding of the molecular mechanisms underlying the grazing response of S. grandis has been hindered. Using the Illumina HiSeq 2000 platform, two libraries prepared from non-grazing (FS) and overgrazing samples (OS) were sequenced. De novo assembly produced 94,674 unigenes, of which 65,047 unigenes had BLAST hits in the National Center for Biotechnology Information (NCBI) non-redundant (nr) database (E-value < 10^-5). In total, 47,747, 26,156 and 40,842 unigenes were assigned to the Gene Ontology (GO), Clusters of Orthologous Group (COG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases, respectively. A total of 13,221 unigenes showed significant differences in expression under the overgrazing condition, with a threshold false discovery rate ≤ 0.001 and an absolute value of log2Ratio ≥ 1. These differentially expressed genes (DEGs) were assigned to 43,257 GO terms and were significantly enriched in 32 KEGG pathways (q-value ≤ 0.05). The alterations in the wound-, drought- and defense-related genes indicate that stressors have an additive effect on the growth inhibition of this species. This first large-scale transcriptome study will provide important information for further gene expression and functional genomics studies, and it facilitated our investigation of the molecular mechanisms of the S. grandis grazing response and the associated morphological and physiological characteristics.
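The two-part DEG criterion quoted above (false discovery rate ≤ 0.001 and |log2Ratio| ≥ 1) can be illustrated with a short filter. This is only a sketch: the `Unigene` record, the unigene names, and the values are hypothetical, and taking the ratio as OS over FS is an assumption not stated in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Unigene:
    name: str
    log2_ratio: float  # assumed to be log2(OS expression / FS expression)
    fdr: float         # false discovery rate for the expression difference


def is_deg(unigene, max_fdr=0.001, min_abs_log2=1.0):
    """Apply both thresholds from the abstract: FDR <= 0.001 and |log2Ratio| >= 1."""
    return unigene.fdr <= max_fdr and abs(unigene.log2_ratio) >= min_abs_log2


# Made-up records for illustration; none of these values come from the study.
unigenes = [
    Unigene("unigene_a", 2.3, 0.0001),   # passes both thresholds
    Unigene("unigene_b", -1.0, 0.001),   # passes: thresholds are inclusive
    Unigene("unigene_c", 0.4, 0.00001),  # fails: fold change too small
    Unigene("unigene_d", 3.1, 0.02),     # fails: FDR too high
]

degs = [u.name for u in unigenes if is_deg(u)]
print(degs)  # ['unigene_a', 'unigene_b']
```

Using the absolute value keeps both up-regulated and down-regulated unigenes, which matches the abstract's "absolute value of log2Ratio" wording.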
De novo Assembly and Transcriptomic Profiling of the Grazing Response in Stipa grandis
Hou, Xiangyang; Ren, Weibo; Ding, Yong; Sa, Rula
2015-01-01
Background: Stipa grandis (Poaceae) is one of the dominant species in a typical steppe of the Inner Mongolian Plateau. However, primarily due to heavy grazing, the grasslands have become seriously degraded, and S. grandis has developed a special growth-inhibition phenotype against the stressful habitat. Because of the lack of transcriptomic and genomic information, the understanding of the molecular mechanisms underlying the grazing response of S. grandis has been hindered. Results: Using the Illumina HiSeq 2000 platform, two libraries prepared from non-grazing (FS) and overgrazing samples (OS) were sequenced. De novo assembly produced 94,674 unigenes, of which 65,047 unigenes had BLAST hits in the National Center for Biotechnology Information (NCBI) non-redundant (nr) database (E-value < 10^-5). In total, 47,747, 26,156 and 40,842 unigenes were assigned to the Gene Ontology (GO), Clusters of Orthologous Group (COG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases, respectively. A total of 13,221 unigenes showed significant differences in expression under the overgrazing condition, with a threshold false discovery rate ≤ 0.001 and an absolute value of log2Ratio ≥ 1. These differentially expressed genes (DEGs) were assigned to 43,257 GO terms and were significantly enriched in 32 KEGG pathways (q-value ≤ 0.05). The alterations in the wound-, drought- and defense-related genes indicate that stressors have an additive effect on the growth inhibition of this species. Conclusions: This first large-scale transcriptome study will provide important information for further gene expression and functional genomics studies, and it facilitated our investigation of the molecular mechanisms of the S. grandis grazing response and the associated morphological and physiological characteristics. PMID:25875617
ERIC Educational Resources Information Center
Blackwell, Michael Lind
This study evaluates the "Education Resources Information Center" (ERIC), "Library and Information Science Abstracts" (LISA), and "Library Literature" (LL) databases, determining how long the databases take to enter records (indexing delay), how much duplication of effort exists among the three databases (indexing…
Redefining Aging in HIV Infection Using Phenotypes.
Stoff, David M; Goodkin, Karl; Jeste, Dilip; Marquine, Maria
2017-10-01
This article critically reviews the utility of "phenotypes" as behavioral descriptors in aging/HIV research that inform biological underpinnings and treatment development. We adopt a phenotypic redefinition of aging conceptualized within a broader context of HIV infection and of aging. Phenotypes are defined as dimensions of behavior, closely related to fundamental mechanisms, and, thus, may be more informative than chronological age. Primary emphasis in this review is given to comorbid aging and cognitive aging, though other phenotypes (i.e., disability, frailty, accelerated aging, successful aging) are also discussed in relation to comorbid aging and cognitive aging. The main findings that emerged from this review are as follows: (1) the phenotypes, comorbid aging and cognitive aging, are distinct from each other, yet overlapping; (2) associative relationships are the rule in HIV for comorbid and cognitive aging phenotypes; and (3) HIV behavioral interventions for both comorbid aging and cognitive aging have been limited. Three paths for research progress are identified for phenotype-defined aging/HIV research (i.e., clinical and behavioral specification, biological mechanisms, intervention targets), and some important research questions are suggested within each of these research paths.
Detecting Genetic Interactions for Quantitative Traits Using m-Spacing Entropy Measure
Yee, Jaeyong; Kwon, Min-Seok; Park, Taesung; Park, Mira
2015-01-01
A number of statistical methods for detecting gene-gene interactions have been developed in genetic association studies with binary traits. However, many phenotype measures are intrinsically quantitative and categorizing continuous traits may not always be straightforward and meaningful. Association of gene-gene interactions with an observed distribution of such phenotypes needs to be investigated directly without categorization. Information gain based on entropy measure has previously been successful in identifying genetic associations with binary traits. We extend the usefulness of this information gain by proposing a nonparametric evaluation method of conditional entropy of a quantitative phenotype associated with a given genotype. Hence, the information gain can be obtained for any phenotype distribution. Because any functional form, such as Gaussian, is not assumed for the entire distribution of a trait or a given genotype, this method is expected to be robust enough to be applied to any phenotypic association data. Here, we show its use to successfully identify the main effect, as well as the genetic interactions, associated with a quantitative trait. PMID:26339620
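To make the idea above concrete, the following sketch estimates the entropy of a quantitative phenotype with a Vasicek-style m-spacing estimator and takes the information gain as the total entropy minus the genotype-weighted conditional entropy. The boundary handling, the default choice of m, and the simulated data are assumptions made for illustration; the authors' exact estimator may differ.

```python
import math
import random


def m_spacing_entropy(values, m=None):
    """Vasicek-style m-spacing estimate of differential entropy, in nats."""
    x = sorted(values)
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations")
    if m is None:
        m = max(1, int(round(math.sqrt(n))))  # assumed rule of thumb
    m = min(m, (n - 1) // 2)                  # keep the window inside the sample
    total = 0.0
    for i in range(n):
        lo = x[max(i - m, 0)]                 # clamp at the smallest observation
        hi = x[min(i + m, n - 1)]             # clamp at the largest observation
        spacing = max(hi - lo, 1e-12)         # guard against log(0) when values tie
        total += math.log(n * spacing / (2 * m))
    return total / n


def information_gain(phenotype, genotype, m=None):
    """H(phenotype) minus the genotype-weighted entropy within each genotype group."""
    groups = {}
    for y, g in zip(phenotype, genotype):
        groups.setdefault(g, []).append(y)
    n = len(phenotype)
    h_cond = sum(len(ys) / n * m_spacing_entropy(ys, m) for ys in groups.values())
    return m_spacing_entropy(phenotype, m) - h_cond


# Simulated example: the phenotype mean shifts with genotype (0, 1 or 2 copies).
rng = random.Random(0)
genotype = [rng.choice([0, 1, 2]) for _ in range(300)]
phenotype = [rng.gauss(3.0 * g, 1.0) for g in genotype]
shuffled = genotype[:]
rng.shuffle(shuffled)  # breaks the genotype-phenotype association

print(information_gain(phenotype, genotype))  # associated genotype
print(information_gain(phenotype, shuffled))  # permuted genotype, expected near zero
```

No distributional form is assumed for the phenotype, which is the property the abstract emphasizes. For a gene-gene interaction, the same function can be applied with each genotype label replaced by a pair of genotypes.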
An Integrated Molecular Database on Indian Insects.
Pratheepa, Maria; Venkatesan, Thiruvengadam; Gracy, Gandhi; Jalali, Sushil Kumar; Rangheswaran, Rajagopal; Antony, Jomin Cruz; Rai, Anil
2018-01-01
MOlecular Database on Indian Insects (MODII) is an online database linking several databases such as Insect Pest Info, Insect Barcode Information System (IBIn), Insect Whole Genome sequence, Other Genomic Resources of National Bureau of Agricultural Insect Resources (NBAIR), Whole Genome sequencing of Honey bee viruses, Insecticide resistance gene database and Genomic tools. This database was developed with a holistic approach for collecting phenomic and genomic information on agriculturally important insects. This insect resource database is available online for free at http://cib.res.in/.
Mukherjee, Vaskar; Radecka, Dorota; Aerts, Guido; Verstrepen, Kevin J; Lievens, Bart; Thevelein, Johan M
2017-01-01
Non-conventional yeasts present a huge, yet barely exploited, resource of yeast biodiversity for industrial applications. This presents a great opportunity to explore alternative ethanol-fermenting yeasts that are more adapted to some of the stress factors present in the harsh environmental conditions in second-generation (2G) bioethanol fermentation. Extremely tolerant yeast species are interesting candidates to investigate the underlying tolerance mechanisms and to identify genes that when transferred to existing industrial strains could help to design more stress-tolerant cell factories. For this purpose, we performed a high-throughput phenotypic evaluation of a large collection of non-conventional yeast species to identify the tolerance limits of the different yeast species for desirable stress tolerance traits in 2G bioethanol production. Next, 12 multi-tolerant strains were selected and used in fermentations under different stressful conditions. Five strains out of which, showing desirable fermentation characteristics, were then evaluated in small-scale, semi-anaerobic fermentations with lignocellulose hydrolysates. Our results revealed the phenotypic landscape of many non-conventional yeast species which have not been previously characterized for tolerance to stress conditions relevant for bioethanol production. This has identified for each stress condition evaluated several extremely tolerant non- Saccharomyces yeasts. It also revealed multi-tolerance in several yeast species, which makes those species good candidates to investigate the molecular basis of a robust general stress tolerance. The results showed that some non-conventional yeast species have similar or even better fermentation efficiency compared to S. cerevisiae in the presence of certain stressful conditions. Prior to this study, our knowledge on extreme stress-tolerant phenotypes in non-conventional yeasts was limited to only few species. 
Our work has now revealed in a systematic way the potential of non-Saccharomyces species to emerge either as alternative host species or as a source of valuable genetic information for construction of more robust industrial S. cerevisiae bioethanol production yeasts. Striking examples include yeast species like Pichia kudriavzevii and Wickerhamomyces anomalus that show very high tolerance to diverse stress factors. This large-scale phenotypic analysis has yielded a detailed database useful as a resource for future studies to understand and benefit from the molecular mechanisms underlying the extreme phenotypes of non-conventional yeast species.
Federated Tensor Factorization for Computational Phenotyping
Kim, Yejin; Sun, Jimeng; Yu, Hwanjo; Jiang, Xiaoqian
2017-01-01
Tensor factorization models offer an effective approach to convert massive electronic health records into meaningful clinical concepts (phenotypes) for data analysis. These models need a large amount of diverse samples to avoid population bias. An open challenge is how to derive phenotypes jointly across multiple hospitals, in which direct patient-level data sharing is not possible (e.g., due to institutional policies). In this paper, we developed a novel solution to enable federated tensor factorization for computational phenotyping without sharing patient-level data. We developed secure data harmonization and federated computation procedures based on alternating direction method of multipliers (ADMM). Using this method, the multiple hospitals iteratively update tensors and transfer secure summarized information to a central server, and the server aggregates the information to generate phenotypes. We demonstrated with real medical datasets that our method resembles the centralized training model (based on combined datasets) in terms of accuracy and phenotypes discovery while respecting privacy. PMID:29071165
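The iterate-locally, aggregate-centrally pattern described above can be illustrated with a much simpler problem than tensor factorization. The sketch below is not the authors' method: it applies scaled-form consensus ADMM to a toy objective in which each site holds one private number and the sites jointly estimate the mean. The function name, the `rho` penalty and the toy objective are all assumptions made for illustration.

```python
def federated_mean_admm(local_values, rho=1.0, iterations=100):
    """Toy consensus ADMM (illustrative assumption, not the paper's model).

    Site i holds a private scalar a_i and minimizes 0.5 * (x_i - a_i)**2
    subject to x_i == z, where z is the consensus value on the server.
    """
    n = len(local_values)
    x = [0.0] * n  # local estimates, kept at each site
    u = [0.0] * n  # scaled dual variables, kept at each site
    z = 0.0        # consensus value held by the central server
    for _ in range(iterations):
        # Local step: each site updates x_i from its own data and z only.
        x = [(a + rho * (z - ui)) / (1.0 + rho)
             for a, ui in zip(local_values, u)]
        # Server step: aggregate the summaries x_i + u_i sent by the sites.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual step: each site updates its multiplier locally.
        u = [ui + xi - z for xi, ui in zip(x, u)]
    return z
```

Calling `federated_mean_admm([1.0, 2.0, 6.0])` converges to the mean of the three private values while the server only ever receives the per-site summaries. In the paper's setting the local and server updates act on factor matrices rather than scalars.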
47 CFR 69.120 - Line information database.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 47 Telecommunication 3 2011-10-01 2011-10-01 false Line information database. 69.120 Section 69...) ACCESS CHARGES Computation of Charges § 69.120 Line information database. (a) A charge that is expressed... from a local exchange carrier database to recover the costs of: (1) The transmission facilities between...
47 CFR 69.120 - Line information database.
Code of Federal Regulations, 2013 CFR
2013-10-01
... 47 Telecommunication 3 2013-10-01 2013-10-01 false Line information database. 69.120 Section 69...) ACCESS CHARGES Computation of Charges § 69.120 Line information database. (a) A charge that is expressed... from a local exchange carrier database to recover the costs of: (1) The transmission facilities between...
47 CFR 69.120 - Line information database.
Code of Federal Regulations, 2014 CFR
2014-10-01
... 47 Telecommunication 3 2014-10-01 2014-10-01 false Line information database. 69.120 Section 69...) ACCESS CHARGES Computation of Charges § 69.120 Line information database. (a) A charge that is expressed... from a local exchange carrier database to recover the costs of: (1) The transmission facilities between...
47 CFR 69.120 - Line information database.
Code of Federal Regulations, 2010 CFR
2010-10-01
... 47 Telecommunication 3 2010-10-01 2010-10-01 false Line information database. 69.120 Section 69...) ACCESS CHARGES Computation of Charges § 69.120 Line information database. (a) A charge that is expressed... from a local exchange carrier database to recover the costs of: (1) The transmission facilities between...
47 CFR 69.120 - Line information database.
Code of Federal Regulations, 2012 CFR
2012-10-01
... 47 Telecommunication 3 2012-10-01 2012-10-01 false Line information database. 69.120 Section 69...) ACCESS CHARGES Computation of Charges § 69.120 Line information database. (a) A charge that is expressed... from a local exchange carrier database to recover the costs of: (1) The transmission facilities between...
The Protein Information Resource: an integrated public resource of functional annotation of proteins
Wu, Cathy H.; Huang, Hongzhan; Arminski, Leslie; Castro-Alvear, Jorge; Chen, Yongxing; Hu, Zhang-Zhi; Ledley, Robert S.; Lewis, Kali C.; Mewes, Hans-Werner; Orcutt, Bruce C.; Suzek, Baris E.; Tsugita, Akira; Vinayaka, C. R.; Yeh, Lai-Su L.; Zhang, Jian; Barker, Winona C.
2002-01-01
The Protein Information Resource (PIR) serves as an integrated public resource of functional annotation of protein data to support genomic/proteomic research and scientific discovery. The PIR, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the PIR-International Protein Sequence Database (PSD), the major annotated protein sequence database in the public domain, containing about 250 000 proteins. To improve protein annotation and the coverage of experimentally validated data, a bibliography submission system is developed for scientists to submit, categorize and retrieve literature information. Comprehensive protein information is available from iProClass, which includes family classification at the superfamily, domain and motif levels, structural and functional features of proteins, as well as cross-references to over 40 biological databases. To provide timely and comprehensive protein data with source attribution, we have introduced a non-redundant reference protein database, PIR-NREF. The database consists of about 800 000 proteins collected from PIR-PSD, SWISS-PROT, TrEMBL, GenPept, RefSeq and PDB, with composite protein names and literature data. To promote database interoperability, we provide XML data distribution and open database schema, and adopt common ontologies. The PIR web site (http://pir.georgetown.edu/) features data mining and sequence analysis tools for information retrieval and functional identification of proteins based on both sequence and annotation information. The PIR databases and other files are also available by FTP (ftp://nbrfa.georgetown.edu/pir_databases). PMID:11752247
Weston, David J; Gunter, Lee E; Rogers, Alistair; Wullschleger, Stan D
2008-01-01
Background One of the eminent opportunities afforded by modern genomic technologies is the potential to provide a mechanistic understanding of the processes by which genetic change translates to phenotypic variation and the resultant appearance of distinct physiological traits. Indeed much progress has been made in this area, particularly in biomedicine where functional genomic information can be used to determine the physiological state (e.g., diagnosis) and predict phenotypic outcome (e.g., patient survival). Ecology currently lacks an analogous approach where genomic information can be used to diagnose the presence of a given physiological state (e.g., stress response) and then predict likely phenotypic outcomes (e.g., stress duration and tolerance, fitness). Results Here, we demonstrate that a compendium of genomic signatures can be used to classify the plant abiotic stress phenotype in Arabidopsis according to the architecture of the transcriptome, and then be linked with gene coexpression network analysis to determine the underlying genes governing the phenotypic response. Using this approach, we confirm the existence of known stress responsive pathways and marker genes, report a common abiotic stress responsive transcriptome and relate phenotypic classification to stress duration. Conclusion Linking genomic signatures to gene coexpression analysis provides a unique method of relating an observed plant phenotype to changes in gene expression that underlie that phenotype. Such information is critical to current and future investigations in plant biology and, in particular, to evolutionary ecology, where a mechanistic understanding of adaptive physiological responses to abiotic stress can provide researchers with a tool of great predictive value in understanding species and population level adaptation to climate change. PMID:18248680
Bonatti, Francesco; Adorni, Alessia; Matichecchia, Annalisa; Mozzoni, Paola; Uliana, Vera; Pisani, Francesco; Garavelli, Livia; Graziano, Claudio; Gnoli, Maria; Bigoni, Stefania; Boschi, Elena; Martorana, Davide; Percesepe, Antonio
2017-01-01
Neurofibromatosis type I, a genetic disorder due to mutations in the NF1 gene, is characterized by a high mutation rate (about 50% of the cases are de novo) but, with the exception of whole gene deletions associated with a more severe phenotype, no specific hotspots and few solid genotype/phenotype correlations. After retrospectively re-evaluating all NF1 gene variants found in the diagnostic activity, we studied 108 patients affected by neurofibromatosis type I who harbored mutations that had not been previously reported in the international databases, with the aim of analyzing their type and distribution along the gene and of correlating them with the phenotypic features of the affected patients. Out of the 108 previously unreported variants, 14 were inherited from one of the affected parents and 94 were de novo. Twenty-nine (26.9%) mutations were of uncertain significance, whereas 79 (73.2%) were predicted as pathogenic or probably pathogenic. No differential distribution in the exons or in the protein domains was observed and no statistically significant genotype/phenotype correlation was found, confirming previous evidence. PMID:28961165
Kim, Changkug; Park, Dongsuk; Seol, Youngjoo; Hahn, Jangho
2011-01-01
The National Agricultural Biotechnology Information Center (NABIC) constructed an agricultural biology-based infrastructure and developed a Web based relational database for agricultural plants with biotechnology information. The NABIC has concentrated on functional genomics of major agricultural plants, building an integrated biotechnology database for agro-biotech information that focuses on genomics of major agricultural resources. This genome database provides annotated genome information from 1,039,823 records mapped to rice, Arabidopsis, and Chinese cabbage.
Benchmarking distributed data warehouse solutions for storing genomic variant information
Wiewiórka, Marek S.; Wysakowicz, Dawid P.; Okoniewski, Michał J.
2017-01-01
Abstract Genomic-based personalized medicine encompasses storing, analysing and interpreting genomic variants as its central issues. At a time when thousands of patients' sequenced exomes and genomes are becoming available, there is a growing need for efficient database storage and querying. The answer could be the application of modern distributed storage systems and query engines. However, the application of large genomic variant databases to this problem has not been sufficiently explored so far in the literature. To investigate the effectiveness of modern columnar storage [column-oriented Database Management System (DBMS)] and query engines, we have developed a prototypic genomic variant data warehouse, populated with large generated content of genomic variants and phenotypic data. Next, we have benchmarked performance of a number of combinations of distributed storages and query engines on a set of SQL queries that address biological questions essential for both research and medical applications. In addition, a non-distributed, analytical database (MonetDB) has been used as a baseline. Comparison of query execution times confirms that distributed data warehousing solutions outperform classic relational DBMSs. Moreover, pre-aggregation and further denormalization of data, which reduce the number of distributed join operations, significantly improve query performance by several orders of magnitude. Most distributed back-ends offer good performance for complex analytical queries, while the Optimized Row Columnar (ORC) format paired with Presto and Parquet with Spark 2 query engines provide, on average, the lowest execution times. Apache Kudu, on the other hand, is the only solution that guarantees sub-second performance for simple genome range queries returning a small subset of data, where low-latency response is expected, while still offering decent performance for running analytical queries.
In summary, research and clinical applications that require the storage and analysis of variants from thousands of samples can benefit from the scalability and performance of distributed data warehouse solutions. Database URL: https://github.com/ZSI-Bio/variantsdwh PMID:29220442
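To make the ideas of pre-aggregation and a simple genome range query concrete, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite is a single-node engine and is not one of the benchmarked systems; the table names, columns and rows are hypothetical and chosen only for illustration.

```python
import sqlite3

def build_demo():
    """Create a tiny in-memory variant table plus a pre-aggregated table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE variants "
        "(sample_id TEXT, chrom TEXT, pos INTEGER, alt TEXT)"
    )
    rows = [  # hypothetical example data
        ("s1", "1", 100, "A"),
        ("s2", "1", 100, "A"),
        ("s1", "1", 250, "T"),
        ("s3", "2", 100, "G"),
    ]
    conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?)", rows)
    # Pre-aggregation: one row per variant with its carrier count, so
    # later queries need no GROUP BY or join against the per-sample table.
    conn.execute(
        "CREATE TABLE variant_counts AS "
        "SELECT chrom, pos, alt, COUNT(DISTINCT sample_id) AS n_samples "
        "FROM variants GROUP BY chrom, pos, alt"
    )
    return conn

def range_query(conn, chrom, start, end):
    """Return (pos, alt, n_samples) for variants in [start, end] on chrom."""
    cur = conn.execute(
        "SELECT pos, alt, n_samples FROM variant_counts "
        "WHERE chrom = ? AND pos BETWEEN ? AND ? ORDER BY pos",
        (chrom, start, end),
    )
    return cur.fetchall()
```

The range query reads only the small pre-aggregated table, which is the same trade-off the abstract describes: extra storage and preparation work in exchange for fewer joins at query time.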
Liu, Mei; Wu, Yonghui; Chen, Yukun; Sun, Jingchun; Zhao, Zhongming; Chen, Xue-wen; Matheny, Michael Edwin; Xu, Hua
2012-06-01
Adverse drug reaction (ADR) is one of the major causes of failure in drug development. Severe ADRs that go undetected until the post-marketing phase of a drug often lead to patient morbidity. Accurate prediction of potential ADRs is required in the entire life cycle of a drug, including early stages of drug design, different phases of clinical trials, and post-marketing surveillance. Many studies have utilized either chemical structures or molecular pathways of the drugs to predict ADRs. Here, the authors propose a machine-learning-based approach for ADR prediction by integrating the phenotypic characteristics of a drug, including indications and other known ADRs, with the drug's chemical structures and biological properties, including protein targets and pathway information. A large-scale study was conducted to predict 1385 known ADRs of 832 approved drugs, and five machine-learning algorithms for this task were compared. This evaluation, based on a fivefold cross-validation, showed that the support vector machine algorithm outperformed the others. Of the three types of information, phenotypic data were the most informative for ADR prediction. When biological and phenotypic features were added to the baseline chemical information, the ADR prediction model achieved significant improvements in area under the curve (from 0.9054 to 0.9524), precision (from 43.37% to 66.17%), and recall (from 49.25% to 63.06%). Most importantly, the proposed model successfully predicted the ADRs associated with withdrawal of rofecoxib and cerivastatin. The results suggest that phenotypic information on drugs is valuable for ADR prediction. Moreover, they demonstrate that different models that combine chemical, biological, or phenotypic information can be built from approved drugs, and they have the potential to detect clinically important ADRs in both preclinical and post-marketing phases.
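The evaluation quantities named above (fivefold cross-validation, precision, recall) can be sketched in a few lines of plain Python. This is only an illustration of the metrics, not the authors' pipeline; the function names and the round-robin fold assignment are assumptions.

```python
def five_fold_indices(n_items, k=5):
    """Split indices 0..n_items-1 into k disjoint test folds (round-robin)."""
    return [list(range(i, n_items, k)) for i in range(k)]

def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = drug has the ADR)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

In a cross-validated study, each fold in turn serves as the test set, a model is trained on the remaining folds, and metrics such as these are averaged over the folds.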
Chantreau, Maxime; Grec, Sébastien; Gutierrez, Laurent; Dalmais, Marion; Pineau, Christophe; Demailly, Hervé; Paysant-Leroux, Christine; Tavernier, Reynald; Trouvé, Jean-Paul; Chatterjee, Manash; Guillot, Xavier; Brunaud, Véronique; Chabbert, Brigitte; van Wuytswinkel, Olivier; Bendahmane, Abdelhafid; Thomasset, Brigitte; Hawkins, Simon
2013-10-15
Flax (Linum usitatissimum L.) is an economically important fiber and oil crop that has been grown for thousands of years. The genome has been recently sequenced and transcriptomics are providing information on candidate genes potentially related to agronomically-important traits. In order to accelerate functional characterization of these genes we have generated a flax EMS mutant population that can be used as a TILLinG (Targeting Induced Local Lesions in Genomes) platform for forward and reverse genetics. A population of 4,894 M2 mutant seed families was generated using 3 different EMS concentrations (0.3%, 0.6% and 0.75%) and used to produce M2 plants for subsequent phenotyping and DNA extraction. 10,839 viable M2 plants (4,033 families) were obtained and 1,552 families (38.5%) showed a visual developmental phenotype (stem size and diameter, plant architecture, flower-related). The majority of these families showed more than one phenotype. Mutant phenotype data are organised in a database and can be accessed and searched at UTILLdb (http://urgv.evry.inra.fr/UTILLdb). Preliminary screens were also performed for atypical fiber and seed phenotypes. Genomic DNA was extracted from 3,515 M2 families and eight-fold pooled for subsequent mutant detection by ENDO1 nuclease mis-match cleavage. In order to validate the collection for reverse genetics, DNA pools were screened for two genes coding enzymes of the lignin biosynthesis pathway: Coumarate-3-Hydroxylase (C3H) and Cinnamyl Alcohol Dehydrogenase (CAD). We identified 79 and 76 mutations in the C3H and CAD genes, respectively. The average mutation rate was calculated as 1/41 Kb giving rise to approximately 9,000 mutations per genome. Thirty-five out of the 52 flax cad mutant families containing missense or codon stop mutations showed the typical orange-brown xylem phenotype observed in CAD down-regulated/mutant plants in other species. 
We have developed a flax mutant population that can be used as an efficient forward and reverse genetics tool. The collection has an extremely high mutation rate that enables the detection of large numbers of independent mutant families by screening a comparatively low number of M2 families. The population will prove to be a valuable resource for both fundamental research and the identification of agronomically-important genes for crop improvement in flax.
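The figures quoted in this abstract can be checked with simple arithmetic. The sketch below recomputes the share of families with a visible phenotype and the expected number of mutations per genome from the 1/41 kb rate. The genome size is not stated in the abstract; the value of roughly 370 Mb used here is an assumption chosen because it is consistent with the reported figure of about 9,000 mutations.

```python
ASSUMED_GENOME_SIZE_BP = 370_000_000  # assumption, not given in the abstract

def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

def mutations_per_genome(genome_size_bp, bp_per_mutation=41_000):
    """Expected mutation count at one mutation per bp_per_mutation bases."""
    return genome_size_bp / bp_per_mutation

# 1,552 of 4,033 families showed a visual developmental phenotype.
phenotype_share = percent(1552, 4033)
expected_mutations = mutations_per_genome(ASSUMED_GENOME_SIZE_BP)
```

Under these assumptions `phenotype_share` rounds to the reported 38.5% and `expected_mutations` lands close to 9,000.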
2013-01-01
Background Flax (Linum usitatissimum L.) is an economically important fiber and oil crop that has been grown for thousands of years. The genome has been recently sequenced and transcriptomics are providing information on candidate genes potentially related to agronomically-important traits. In order to accelerate functional characterization of these genes we have generated a flax EMS mutant population that can be used as a TILLinG (Targeting Induced Local Lesions in Genomes) platform for forward and reverse genetics. Results A population of 4,894 M2 mutant seed families was generated using 3 different EMS concentrations (0.3%, 0.6% and 0.75%) and used to produce M2 plants for subsequent phenotyping and DNA extraction. 10,839 viable M2 plants (4,033 families) were obtained and 1,552 families (38.5%) showed a visual developmental phenotype (stem size and diameter, plant architecture, flower-related). The majority of these families showed more than one phenotype. Mutant phenotype data are organised in a database and can be accessed and searched at UTILLdb (http://urgv.evry.inra.fr/UTILLdb). Preliminary screens were also performed for atypical fiber and seed phenotypes. Genomic DNA was extracted from 3,515 M2 families and eight-fold pooled for subsequent mutant detection by ENDO1 nuclease mis-match cleavage. In order to validate the collection for reverse genetics, DNA pools were screened for two genes coding enzymes of the lignin biosynthesis pathway: Coumarate-3-Hydroxylase (C3H) and Cinnamyl Alcohol Dehydrogenase (CAD). We identified 79 and 76 mutations in the C3H and CAD genes, respectively. The average mutation rate was calculated as 1/41 Kb giving rise to approximately 9,000 mutations per genome. Thirty-five out of the 52 flax cad mutant families containing missense or codon stop mutations showed the typical orange-brown xylem phenotype observed in CAD down-regulated/mutant plants in other species. 
Conclusions We have developed a flax mutant population that can be used as an efficient forward and reverse genetics tool. The collection has an extremely high mutation rate that enables the detection of large numbers of independent mutant families by screening a comparatively low number of M2 families. The population will prove to be a valuable resource for both fundamental research and the identification of agronomically-important genes for crop improvement in flax. PMID:24128060
The 24th annual Nucleic Acids Research database issue: a look back and upcoming changes
Rigden, Daniel J
2017-01-01
Abstract This year's Database Issue of Nucleic Acids Research contains 152 papers that include descriptions of 54 new databases and update papers on 98 databases, of which 16 have not been previously featured in NAR. As always, these databases cover a broad range of molecular biology subjects, including genome structure, gene expression and its regulation, proteins, protein domains, and protein–protein interactions. Following the recent trend, an increasing number of new and established databases deal with the issues of human health, from cancer-causing mutations to drugs and drug targets. In accordance with this trend, three recently compiled databases that have been selected by NAR reviewers and editors as ‘breakthrough’ contributions, denovo-db, the Monarch Initiative, and Open Targets, cover human de novo gene variants, disease-related phenotypes in model organisms, and a bioinformatics platform for therapeutic target identification and validation, respectively. We expect these databases to attract the attention of numerous researchers working in various areas of genetics and genomics. Looking back at the past 12 years, we present here the ‘golden set’ of databases that have consistently served as authoritative, comprehensive, and convenient data resources widely used by the entire community and offer some lessons on what makes a successful database. The Database Issue is freely available online at the https://academic.oup.com/nar web site. An updated version of the NAR Molecular Biology Database Collection is available at http://www.oxfordjournals.org/nar/database/a/. PMID:28053160
Code of Federal Regulations, 2014 CFR
2014-04-01
... Unique Device Identification Database. 830.350 Section 830.350 Food and Drugs FOOD AND DRUG... Global Unique Device Identification Database § 830.350 Correction of information submitted to the Global Unique Device Identification Database. (a) If FDA becomes aware that any information submitted to the...
Design and Establishment of Quality Model of Fundamental Geographic Information Database
NASA Astrophysics Data System (ADS)
Ma, W.; Zhang, J.; Zhao, Y.; Zhang, P.; Dang, Y.; Zhao, T.
2018-04-01
In order to make the quality evaluation of Fundamental Geographic Information Databases (FGIDB) more comprehensive, objective and accurate, this paper studies and establishes a quality model of FGIDB, which is formed by the standardization of database construction and quality control, the conformity of data set quality and the functionality of the database management system. It also designs the overall principles, contents and methods of quality evaluation for FGIDB, providing the basis and reference for carrying out quality control and quality evaluation for FGIDB. Based on the quality model framework, this paper progressively designs the quality elements, evaluation items and properties of the Fundamental Geographic Information Database. Connected organically, these quality elements and evaluation items constitute the quality model of the Fundamental Geographic Information Database. This model is the foundation for quality demand stipulation and quality evaluation of the Fundamental Geographic Information Database, and is of great significance for quality assurance in the design and development stage, demand formulation in the testing and evaluation stage, and the construction of a standard system for quality evaluation technology of the Fundamental Geographic Information Database.
Evaluation of consumer drug information databases.
Choi, J A; Sullivan, J; Pankaskie, M; Brufsky, J
1999-01-01
To evaluate prescription drug information contained in six consumer drug information databases available on CD-ROM, and to make health care professionals aware of the information provided, so that they may appropriately recommend these databases for use by their patients. Observational study of six consumer drug information databases: The Corner Drug Store, Home Medical Advisor, Mayo Clinic Family Pharmacist, Medical Drug Reference, Mosby's Medical Encyclopedia, and PharmAssist. Not applicable. Not applicable. Information on 20 frequently prescribed drugs was evaluated in each database. The databases were ranked using a point-scale system based on primary and secondary assessment criteria. For the primary assessment, 20 categories of information based on those included in the 1998 edition of the USP DI Volume II, Advice for the Patient: Drug Information in Lay Language were evaluated for each of the 20 drugs, and each database could earn up to 400 points (for example, 1 point was awarded if the database mentioned a drug's mechanism of action). For the secondary assessment, the inclusion of 8 additional features that could enhance the utility of the databases was evaluated (for example, 1 point was awarded if the database contained a picture of the drug), and each database could earn up to 8 points. The results of the primary and secondary assessments, listed in order of highest to lowest number of points earned, are as follows: Primary assessment--Mayo Clinic Family Pharmacist (379), Medical Drug Reference (251), PharmAssist (176), Home Medical Advisor (113.5), The Corner Drug Store (98), and Mosby's Medical Encyclopedia (18.5); secondary assessment--The Mayo Clinic Family Pharmacist (8), The Corner Drug Store (5), Mosby's Medical Encyclopedia (5), Home Medical Advisor (4), Medical Drug Reference (4), and PharmAssist (3). 
The Mayo Clinic Family Pharmacist was the most accurate and complete source of prescription drug information based on the USP DI Volume II and would be an appropriate database for health care professionals to recommend to patients.
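The point-scale system described above can be sketched as follows. The primary scores are the ones reported in the abstract; the assumption that each of the 20 categories is worth one point per drug (which gives the stated maximum of 400) and the helper names are illustrative.

```python
# Primary-assessment scores as reported in the abstract.
PRIMARY_SCORES = {
    "Mayo Clinic Family Pharmacist": 379,
    "Medical Drug Reference": 251,
    "PharmAssist": 176,
    "Home Medical Advisor": 113.5,
    "The Corner Drug Store": 98,
    "Mosby's Medical Encyclopedia": 18.5,
}

def max_primary_points(n_categories=20, n_drugs=20):
    """Maximum score, assuming one point per category per drug."""
    return n_categories * n_drugs

def rank(scores):
    """Database names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)

def coverage_percent(score, max_points):
    """Score expressed as a percentage of the maximum."""
    return 100.0 * score / max_points
```

For example, `coverage_percent(379, max_primary_points())` expresses the top primary score as a share of the 400 available points.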
Kim, ChangKug; Park, DongSuk; Seol, YoungJoo; Hahn, JangHo
2011-01-01
The National Agricultural Biotechnology Information Center (NABIC) constructed an agricultural biology-based infrastructure and developed a Web based relational database for agricultural plants with biotechnology information. The NABIC has concentrated on functional genomics of major agricultural plants, building an integrated biotechnology database for agro-biotech information that focuses on genomics of major agricultural resources. This genome database provides annotated genome information from 1,039,823 records mapped to rice, Arabidopsis, and Chinese cabbage. PMID:21887015
CartograTree: connecting tree genomes, phenotypes and environment.
Vasquez-Gross, Hans A; Yu, John J; Figueroa, Ben; Gessler, Damian D G; Neale, David B; Wegrzyn, Jill L
2013-05-01
Today, researchers spend a tremendous amount of time gathering, formatting, filtering and visualizing data collected from disparate sources. Under the umbrella of forest tree biology, we seek to provide a platform and leverage modern technologies to connect biotic and abiotic data. Our goal is to provide an integrated web-based workspace that connects environmental, genomic and phenotypic data via geo-referenced coordinates. Here, we connect the genomic query web-based workspace, DiversiTree and a novel geographical interface called CartograTree to data housed on the TreeGenes database. To accomplish this goal, we implemented Simple Semantic Web Architecture and Protocol to enable the primary genomics database, TreeGenes, to communicate with semantic web services regardless of platform or back-end technologies. The novelty of CartograTree lies in the interactive workspace that allows for geographical visualization and engagement of high performance computing (HPC) resources. The application provides a unique tool set to facilitate research on the ecology, physiology and evolution of forest tree species. CartograTree can be accessed at: http://dendrome.ucdavis.edu/cartogratree. © 2013 Blackwell Publishing Ltd.
The development of digital library system for drug research information.
Kim, H J; Kim, S R; Yoo, D S; Lee, S H; Suh, O K; Cho, J H; Shin, H T; Yoon, J P
1998-01-01
The sophistication of computer technology and information transmission on the Internet has made various cyber information repositories available to information consumers. In the era of the information super-highway, the digital library, which can be accessed from remote sites at any time, is considered the prototype of an information repository. Using an object-oriented DBMS, the very first model of a digital library for pharmaceutical researchers and related professionals in Korea has been developed. Published research papers and researchers' personal information were included in the database. For the database of research papers, 13 domestic journals were abstracted and scanned into full-text image files which can be viewed by Internet web browsers. The database of researchers' personal information was also developed and interlinked with the database of research papers. These databases will be continuously updated and will be combined with world-wide information as the unique digital library in the field of pharmacy.
Federal Register 2010, 2011, 2012, 2013, 2014
2010-08-13
... information. Access to any such database system is limited to system administrators, individuals responsible... during the certification process. The above information will be contained in one or more databases (such as Lotus Notes) that reside on servers in EPA offices. The database(s) may be specific to one...
Kohonen-Corish, Maija R J; Macrae, Finlay; Genuardi, Maurizio; Aretz, Stefan; Bapat, Bharati; Bernstein, Inge T; Burn, John; Cotton, Richard G H; den Dunnen, Johan T; Frebourg, Thierry; Greenblatt, Marc S; Hofstra, Robert; Holinski-Feder, Elke; Lappalainen, Ilkka; Lindblom, Annika; Maglott, Donna; Møller, Pål; Morreau, Hans; Möslein, Gabriela; Sijmons, Rolf; Spurdle, Amanda B; Tavtigian, Sean; Tops, Carli M J; Weber, Thomas K; de Wind, Niels; Woods, Michael O
2011-04-01
The Human Variome Project (HVP) has established a pilot program with the International Society for Gastrointestinal Hereditary Tumours (InSiGHT) to compile all inherited variation affecting colon cancer susceptibility genes. An HVP-InSiGHT Workshop was held on May 10, 2010, prior to the HVP Integration and Implementation Meeting at UNESCO in Paris, to review the progress of this pilot program. A wide range of topics were covered, including issues relating to genotype-phenotype data submission to the InSiGHT Colon Cancer Gene Variant Databases (chromium.liacs.nl/LOVD2/colon_cancer/home.php). The meeting also canvassed the recent exciting developments in models to evaluate the pathogenicity of unclassified variants using in silico data, tumor pathology information, and functional assays, and made further plans for the future progress and sustainability of the pilot program. © 2011 Wiley-Liss, Inc.
Zhang, Yaogong; Liu, Jiahui; Liu, Xiaohu; Hong, Yuxiang; Fan, Xin; Huang, Yalou; Wang, Yuan; Xie, Maoqiang
2018-04-24
Gene-phenotype association prediction can be applied to reveal the inherited basis of human diseases and facilitate drug development. Gene-phenotype associations are related to complex biological processes and influenced by various factors, such as the relationships between phenotypes and those among genes. However, owing to the sparseness of curated gene-phenotype associations and the lack of integrated analysis of the joint effect of multiple factors, existing applications are limited in prediction accuracy and in the detection of potential gene-phenotype associations. In this paper, we propose a novel method that exploits a weighted graph constraint learned from hierarchical structures of phenotype data and group prior information among genes, while inheriting the advantages of Non-negative Matrix Factorization (NMF), called Weighted Graph Constraint and Group Centric Non-negative Matrix Factorization (GC[Formula: see text]NMF). Specifically, first we introduce the depth of parent-child relationships between two adjacent phenotypes in hierarchical phenotypic data as a weighted graph constraint for a better phenotype understanding. Second, we utilize intra-group correlation among genes in a gene group as a group constraint for gene understanding. Such information provides us with the intuition that genes in a group probably result in similar phenotypes. The model not only allows us to achieve a high-grade prediction performance, but also helps us to learn interpretable representations of genes and phenotypes simultaneously to facilitate future biological analysis. Experimental results on biological gene-phenotype association datasets of mouse and human demonstrate that GC[Formula: see text]NMF can obtain superior prediction accuracy and good understandability for biological explanation over other state-of-the-art methods.
NABIC marker database: A molecular markers information network of agricultural crops.
Kim, Chang-Kug; Seol, Young-Joo; Lee, Dong-Jun; Jeong, In-Seon; Yoon, Ung-Han; Lee, Gang-Seob; Hahn, Jang-Ho; Park, Dong-Suk
2013-01-01
In 2013, the National Agricultural Biotechnology Information Center (NABIC) reconstructed a molecular marker database for useful genetic resources. The web-based marker database consists of three major functional categories: map viewer, RSN marker and gene annotation. It provides 7250 marker locations, 3301 RSN marker properties and 3280 molecular marker annotations for agricultural plants. Each molecular marker record provides information such as marker name, expressed sequence tag number, gene definition and general marker information. This updated marker database provides useful information through a user-friendly web interface that assists in tracing any new structures of the chromosomes and gene positional functions using specific molecular markers. The database is available for free at http://nabic.rda.go.kr/gere/rice/molecularMarkers/
77 FR 24925 - Privacy Act of 1974; System of Records
Federal Register 2010, 2011, 2012, 2013, 2014
2012-04-26
... CES Personnel Information System database of NIFA. This database is updated annually from data provided by 1862 and 1890 land-grant universities. This database is maintained by the Agricultural Research... reviewer. NIFA maintains a database of potential reviewers. Information in the database is used to match...
Suarez-Kurtz, Guilherme; Fuchshuber-Moraes, Mateus; Struchiner, Claudio J; Parra, Esteban J
2016-08-01
Several algorithms have been proposed to reduce the genotyping effort and cost, while retaining the accuracy of N-acetyltransferase-2 (NAT2) phenotype prediction. Data from the 1000 Genomes (1KG) project and an admixed cohort of Black Brazilians were used to assess the accuracy of NAT2 phenotype prediction using algorithms based on paired single nucleotide polymorphisms (SNPs) (rs1041983 and rs1801280) or a tag SNP (rs1495741). NAT2 haplotypes comprising SNPs rs1801279, rs1041983, rs1801280, rs1799929, rs1799930, rs1208 and rs1799931 were assigned according to the arylamine N-acetyltransferases database. Contingency tables were used to visualize the agreement between the NAT2 acetylator phenotypes on the basis of these haplotypes versus phenotypes inferred by the prediction algorithms. The paired and tag SNP algorithms provided more than 96% agreement with the 7-SNP derived phenotypes in Europeans, East Asians, South Asians and Admixed Americans, but discordance of phenotype prediction occurred in 30.2 and 24.8% of 1KG Africans and in 14.4 and 18.6% of Black Brazilians, respectively. Paired SNP panel misclassification occurs in carriers of NAT2 haplotypes *13A (282T alone), *12B (282T and 803G), *6B (590A alone) and *14A (191A alone), whereas haplotype *14, defined by the 191A allele, is the major culprit of misclassification by the tag allele. Both the paired SNP and the tag SNP algorithms may be used, with economy of scale, to infer NAT2 acetylator phenotypes, including the ultra-slow phenotype, in European, East Asian, South Asian and American populations represented in the 1KG cohort. Both algorithms, however, perform poorly in populations of predominant African descent, including admixed African-Americans, African Caribbeans and Black Brazilians.
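The agreement comparison described in this abstract can be illustrated with a small sketch. It builds a contingency table of reference versus predicted phenotype calls and reports the percentage of concordant calls. The phenotype calls below are hypothetical and are not data from the study; the function names are illustrative only.

```python
from collections import Counter

def contingency_table(reference, predicted):
    """Count how often each (reference, predicted) phenotype pair occurs."""
    return Counter(zip(reference, predicted))

def percent_agreement(table):
    """Percentage of individuals whose two phenotype calls are identical."""
    total = sum(table.values())
    concordant = sum(n for (ref, pred), n in table.items() if ref == pred)
    return 100.0 * concordant / total

# Hypothetical calls for 10 individuals (NOT data from the study).
reference = ["slow", "slow", "rapid", "intermediate", "slow",
             "rapid", "intermediate", "slow", "rapid", "slow"]
predicted = ["slow", "slow", "rapid", "intermediate", "intermediate",
             "rapid", "intermediate", "slow", "rapid", "rapid"]

table = contingency_table(reference, predicted)
agreement = percent_agreement(table)
print(f"agreement: {agreement:.1f}%")
```

The off-diagonal cells of the table (pairs whose two labels differ) show which phenotype classes are being confused, which is the information a bare agreement percentage hides.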
Federal Register 2010, 2011, 2012, 2013, 2014
2013-10-31
... Extension of Approval; Comment Request--Publicly Available Consumer Product Safety Information Database... Publicly Available Consumer Product Safety Information Database. The Commission will consider all comments... intention to seek extension of approval of a collection of information for a database on the safety of...
78 FR 18232 - Amendment of VOR Federal Airway V-233, Springfield, IL
Federal Register 2010, 2011, 2012, 2013, 2014
2013-03-26
... it matches the information contained in the FAA's aeronautical database, matches the depiction on the... description did not match the airway information contained in the FAA's aeronautical database or the charted... information that should have been used. The FAA aeronautical database contains the correct radial information...
ERIC Educational Resources Information Center
American Society for Information Science, Washington, DC.
This document contains abstracts of papers on database design and management which were presented at the 1986 mid-year meeting of the American Society for Information Science (ASIS). Topics considered include: knowledge representation in a bilingual art history database; proprietary database design; relational database design; in-house databases;…
Zhao, Jiangsan; Rewald, Boris; Leitner, Daniel; Nagel, Kerstin A.; Nakhforoosh, Alireza
2017-01-01
Root phenotyping provides trait information for plant breeding. A shortcoming of high-throughput root phenotyping is the limitation to seedling plants and failure to make inferences on mature root systems. We suggest root system architecture (RSA) models to predict mature root traits and overcome the inference problem. Sixteen pea genotypes were phenotyped in (i) seedling (Petri dishes) and (ii) mature (sand-filled columns) root phenotyping platforms. The RSA model RootBox was parameterized with seedling traits to simulate the fully developed root systems. Measured and modelled root length, first-order lateral number, and root distribution were compared to determine key traits for model-based prediction. No direct relationship in root traits (tap, lateral length, interbranch distance) was evident between phenotyping systems. RootBox significantly improved the inference over phenotyping platforms. Seedling plant tap and lateral root elongation rates and interbranch distance were sufficient model parameters to predict genotype ranking in total root length with a Spearman rank correlation coefficient of 0.83. Parameterization including uneven lateral spacing via a scaling function substantially improved the prediction of architectures underlying the differently sized root systems. We conclude that RSA models can solve the inference problem of seedling root phenotyping. RSA models should be included in the phenotyping pipeline to provide reliable information on mature root systems to breeding research. PMID:28168270
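The genotype-ranking comparison reported in this abstract relies on the Spearman rank correlation, which can be sketched as the Pearson correlation of the ranks. The measured and simulated values below are hypothetical and are not taken from the study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical total root lengths for five genotypes (NOT study data).
measured = [10.0, 20.0, 30.0, 40.0, 50.0]
simulated = [12.0, 25.0, 22.0, 48.0, 60.0]
rho = spearman(measured, simulated)
print(round(rho, 3))
```

Because only ranks enter the calculation, the coefficient measures whether the model orders the genotypes correctly, not whether it reproduces absolute root lengths.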
Buu, Anne; Williams, L Keoki; Yang, James J
2018-03-01
We propose a new genome-wide association test for mixed binary and continuous phenotypes that uses an efficient numerical method to estimate the empirical distribution of the Fisher's combination statistic under the null hypothesis. Our simulation study shows that the proposed method controls the type I error rate and also maintains its power at the level of the permutation method. More importantly, the computational efficiency of the proposed method is much higher than that of the permutation method. The simulation results also indicate that the power of the test increases when the genetic effect increases, the minor allele frequency increases, and the correlation between responses decreases. The statistical analysis on the database of the Study of Addiction: Genetics and Environment demonstrates that the proposed method combining multiple phenotypes can increase the power of identifying markers that may not otherwise be chosen using marginal tests.
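For orientation, the Fisher's combination statistic named in this abstract is $X = -2\sum_{i=1}^{k} \ln p_i$. The sketch below computes it together with the textbook chi-square p-value on $2k$ degrees of freedom, which is valid only when the $k$ tests are independent. That independence is an assumption of this sketch, not of the paper: the abstract's method instead estimates the empirical null distribution, since phenotypes are generally correlated. The function names are illustrative.

```python
import math

def fisher_statistic(p_values):
    """Fisher's combination statistic: -2 times the sum of log p-values."""
    return -2.0 * sum(math.log(p) for p in p_values)

def chi2_sf_even_df(x, k):
    """Survival function of a chi-square with 2k degrees of freedom.

    For even degrees of freedom there is a closed form:
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total

def fisher_combined_p(p_values):
    """Combined p-value, ASSUMING the individual tests are independent."""
    stat = fisher_statistic(p_values)
    return stat, chi2_sf_even_df(stat, len(p_values))

# Hypothetical marginal p-values for one marker and two phenotypes.
stat, p_combined = fisher_combined_p([0.5, 0.5])
print(stat, p_combined)
```

With correlated phenotypes the chi-square reference distribution is no longer correct, which is why a permutation procedure or a numerical approximation of the null distribution is needed in practice.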
Warburton, Marilyn L; Williams, William Paul; Hawkins, Leigh; Bridges, Susan; Gresham, Cathy; Harper, Jonathan; Ozkan, Seval; Mylroie, J Erik; Shan, Xueyan
2011-07-01
A public candidate gene testing pipeline for resistance to aflatoxin accumulation or Aspergillus flavus infection in maize is presented here. The pipeline consists of steps for identifying, testing, and verifying the association of selected maize gene sequences with resistance under field conditions. Resources include a database of genetic and protein sequences associated with the reduction in aflatoxin contamination from previous studies; eight diverse inbred maize lines for polymorphism identification within any maize gene sequence; four Quantitative Trait Loci (QTL) mapping populations and one association mapping panel, all phenotyped for aflatoxin accumulation resistance and associated phenotypes; and capacity for Insertion/Deletion (InDel) and SNP genotyping in the population(s) for mapping. To date, ten genes have been identified as possible candidate genes and put through the candidate gene testing pipeline, and results are presented here to demonstrate the utility of the pipeline.
Improving Microbial Genome Annotations in an Integrated Database Context
Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken; Anderson, Iain; Mavromatis, Konstantinos; Kyrpides, Nikos C.; Ivanova, Natalia N.
2013-01-01
Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency regards the biological coherence of annotations, while completeness regards the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG) family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule-based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations, whereby the phenotypic traits of an organism are inferred based on the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems is available at http://img.jgi.doe.gov/. PMID:23424620
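The idea of inferring phenotypes from pathway presence and comparing them with observed phenotypes can be sketched as follows. This is a generic illustration, not IMG's actual rule engine or rule set; the rule table, pathway names and phenotype names are all hypothetical placeholders.

```python
# Hypothetical rules (NOT IMG's rules): a phenotype is predicted when
# every pathway listed for it is present in the genome annotation.
RULES = {
    "phenotype_A": {"pathway_1", "pathway_2"},
    "phenotype_B": {"pathway_3"},
    "phenotype_C": {"pathway_2", "pathway_4"},
}

def predict_phenotypes(annotated_pathways, rules=RULES):
    """Return the phenotypes whose required pathways are all annotated."""
    return {name for name, required in rules.items()
            if required <= annotated_pathways}

def compare_with_observed(predicted, observed):
    """Split phenotypes into consistent calls and two kinds of discrepancy."""
    return {
        "consistent": predicted & observed,
        "predicted_only": predicted - observed,  # possible over-annotation
        "observed_only": observed - predicted,   # possible missing annotation
    }

# Hypothetical genome annotation and experimentally observed phenotypes.
annotated = {"pathway_1", "pathway_2", "pathway_3"}
observed = {"phenotype_A", "phenotype_C"}

predicted = predict_phenotypes(annotated)
report = compare_with_observed(predicted, observed)
print(report)
```

In this toy case the `observed_only` entry flags a phenotype whose supporting pathway is not annotated, which is the kind of discrepancy that would prompt a curator to revisit the annotation.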
Leveraging Collaborative Filtering to Accelerate Rare Disease Diagnosis
Shen, Feichen; Liu, Sijia; Wang, Yanshan; Wang, Liwei; Afzal, Naveed; Liu, Hongfang
2017-01-01
In the USA, rare diseases are defined as those affecting fewer than 200,000 patients at any given time. Patients with rare diseases are frequently misdiagnosed or undiagnosed, which may be due to the lack of knowledge and experience of care providers. We hypothesize that patients’ phenotypic information available in electronic medical records (EMR) can be leveraged to accelerate disease diagnosis, based on the intuition that providers need to document associated phenotypic information to support the diagnosis decision, especially for rare diseases. In this study, we proposed a collaborative filtering system enriched with natural language processing and semantic techniques to assist rare disease diagnosis based on phenotypic characterization. Specifically, we applied four similarity measurements with two neighborhood algorithms to a large unstructured 2010-2015 Mayo Clinic patient cohort and evaluated the different approaches. Preliminary results demonstrated that the use of collaborative filtering with phenotypic information is able to stratify patients with relatively similar rare diseases. PMID:29854225
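A neighborhood-based collaborative filtering step of the kind this abstract describes can be sketched as below. The abstract does not name its four similarity measures or two neighborhood algorithms, so cosine similarity over binary phenotype sets and a k-nearest-neighbor vote are assumptions chosen for illustration. All patient records, phenotype codes and disease labels are hypothetical.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two sets viewed as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def rank_diagnoses(query, patients, k=2):
    """Rank candidate diagnoses by similarity-weighted votes of the
    k patients most similar to the query phenotype set."""
    scored = sorted(
        ((cosine(query, phenotypes), diagnosis)
         for phenotypes, diagnosis in patients),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for similarity, diagnosis in scored:
        votes[diagnosis] += similarity
    return sorted(votes.items(), key=lambda item: item[1], reverse=True)

# Hypothetical diagnosed patients: (phenotype set, diagnosis).
patients = [
    ({"p1", "p2", "p3"}, "disease_X"),
    ({"p1", "p2"}, "disease_X"),
    ({"p4", "p5"}, "disease_Y"),
    ({"p3", "p5"}, "disease_Y"),
]

ranking = rank_diagnoses({"p1", "p2", "p5"}, patients, k=2)
print(ranking)
```

In a real pipeline the phenotype sets would come from natural language processing of clinical notes, and the choice of `k` and of the similarity measure would be evaluated empirically, as the abstract indicates.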
Law, MeiYee; Shaw, David R
2018-01-01
Mouse Genome Informatics (MGI, http://www.informatics.jax.org/) web resources provide free access to meticulously curated information about the laboratory mouse. MGI's primary goal is to help researchers investigate the genetic foundations of human diseases by translating information from mouse phenotype and disease model studies to human systems. MGI provides comprehensive phenotypes for over 50,000 mutant alleles in mice and provides experimental model descriptions for over 1500 human diseases. Curated data from scientific publications are integrated with those from high-throughput phenotyping and gene expression centers. Data are standardized using defined, hierarchical vocabularies such as the Mammalian Phenotype (MP) Ontology, Mouse Developmental Anatomy and the Gene Ontologies (GO). This chapter introduces you to Gene and Allele Detail pages and provides step-by-step instructions for simple searches and those that take advantage of the breadth of MGI data integration.
Lenert, L.; Lopez-Campos, G.
2014-01-01
Objectives: Given the quickening speed of discovery of variant disease drivers from combined patient genotype and phenotype data, the objective is to provide methodology using big data technology to support the definition of deep phenotypes in medical records. Methods: As the vast stores of genomic information increase with next generation sequencing, the importance of deep phenotyping increases. The growth of genomic data and adoption of Electronic Health Records (EHR) in medicine provide a unique opportunity to integrate phenotype and genotype data into medical records. The method by which collections of clinical findings and other health related data are leveraged to form meaningful phenotypes is an active area of research. Longitudinal data stored in EHRs provide a wealth of information that can be used to construct phenotypes of patients. We focus on a practical problem around data integration for deep phenotype identification within EHR data. Big data approaches are described that enable scalable markup of EHR events that can be used for semantic and temporal similarity analysis to support the identification of phenotype and genotype relationships. Conclusions: Stead and colleagues’ 2005 concept of using light standards to increase the productivity of software systems by riding on the wave of hardware/processing power is described as a harbinger for designing future healthcare systems. The big data solution, using flexible markup, provides a route to improved utilization of processing power for organizing patient records in genotype and phenotype research. PMID:25123744
Annual Review of Database Developments: 1993.
ERIC Educational Resources Information Center
Basch, Reva
1993-01-01
Reviews developments in the database industry for 1993. Topics addressed include scientific and technical information; environmental issues; social sciences; legal information; business and marketing; news services; documentation; databases and document delivery; electronic bulletin boards and the Internet; and information industry organizational…
16 CFR 1102.24 - Designation of confidential information.
Code of Federal Regulations, 2014 CFR
2014-01-01
... ACT REGULATIONS PUBLICLY AVAILABLE CONSUMER PRODUCT SAFETY INFORMATION DATABASE Procedural... allegedly confidential information is not placed in the database, a request for designation of confidential... publication in the Database until it makes a determination regarding confidential treatment. (e) Assistance...
16 CFR 1102.24 - Designation of confidential information.
Code of Federal Regulations, 2012 CFR
2012-01-01
... ACT REGULATIONS PUBLICLY AVAILABLE CONSUMER PRODUCT SAFETY INFORMATION DATABASE Procedural... allegedly confidential information is not placed in the database, a request for designation of confidential... publication in the Database until it makes a determination regarding confidential treatment. (e) Assistance...
Chen, X; Yang, L; Wang, H J; Wu, B B; Lu, Y L; Dong, X R; Zhou, W H
2018-05-02
Objective: To analyze the hotspots of known pathogenic disease-causing variants of glucose-6-phosphate dehydrogenase (G6PD) and the phenotype spectrum of neonatal patients with known pathogenic disease-causing variants of G6PD. Methods: The known pathogenic disease-causing variants of G6PD were collected from the Human Gene Mutation Database. Screening was performed for these variants among the 7,966 cases (2,357 neonatal, 5,609 non-neonatal) in the sequencing database of the Molecular Diagnosis Center, Children's Hospital of Fudan University. All samples were from patients suspected of having a genetic disorder. The database contained whole-exome sequencing data and clinical exome sequencing data. We identified the patients with known pathogenic disease-causing variants of G6PD and analyzed the hotspots of G6PD and the phenotype spectrum of neonatal patients with these variants. Results: (1) Among the next generation sequencing data of the 7,966 samples, 86 samples (1.1%) were positive for the known pathogenic disease-causing variants of G6PD (positive sample set). In the positive sample set, 51 patients (33 males, 18 females) were newborn babies. Forty-three patients (26 males, 17 females) had G6PD enzyme activity data. (2) Among the 86 samples, Arg463His, Arg459Leu, Leu342Phe and Val291Met were the 4 leading disease-causing variants, found in 72 samples (84%). (3) Male neonatal patients with the same variants had statistically significant differences in enzyme activity: among 13 patients with Arg463His, the enzyme activity of 9 patients was ranked as grade Ⅲ, 1 case was ranked as Ⅳ, and 3 cases had no activity data; among 10 patients with Arg459Leu, the enzyme activity of 4 patients was ranked as Ⅱ, 4 cases were ranked as Ⅲ, and 2 cases had no activity data; among 2 patients with His32Arg, the enzyme activity of one patient was ranked as Ⅱ and the other as Ⅲ. Male neonatal patients with the same mutation and enzyme activity also had statistically significant differences in phenotype spectrum: among 9 patients with Arg463His and level Ⅲ enzyme activity, 6 presented hyperbilirubinemia, 2 met the criteria for exchange transfusion therapy, and 2 showed hemolysis; among 4 patients with Arg459Leu and level Ⅱ enzyme activity, 3 presented hyperbilirubinemia; among 4 patients with Arg459Leu and level Ⅲ enzyme activity, 2 presented hyperbilirubinemia and 1 met the criteria for exchange transfusion therapy; among 3 patients with Val291Met and level Ⅲ enzyme activity, 1 presented hyperbilirubinemia. Conclusions: Arg463His, Arg459Leu, Leu342Phe and Val291Met were the hotspot variants of G6PD. Patients with the same G6PD variants and sex present different phenotypes; patients with the same G6PD variants, sex and enzyme activity also present different phenotypes.
Gehrmann, Sebastian; Dernoncourt, Franck; Li, Yeran; Carlson, Eric T; Wu, Joy T; Welt, Jonathan; Foote, John; Moseley, Edward T; Grant, David W; Tyler, Patrick D; Celi, Leo A
2018-01-01
In secondary analysis of electronic health records, a crucial task consists in correctly identifying the patient cohort under investigation. In many cases, the most valuable and relevant information for an accurate classification of medical conditions exists only in clinical narratives. Therefore, it is necessary to use natural language processing (NLP) techniques to extract and evaluate these narratives. The most commonly used approach to this problem relies on extracting a number of clinician-defined medical concepts from text and using machine learning techniques to identify whether a particular patient has a certain condition. However, recent advances in deep learning and NLP enable models to learn a rich representation of (medical) language. Convolutional neural networks (CNN) for text classification can augment the existing techniques by leveraging the representation of language to learn which phrases in a text are relevant for a given medical condition. In this work, we compare concept extraction based methods with CNNs and other commonly used models in NLP in ten phenotyping tasks using 1,610 discharge summaries from the MIMIC-III database. We show that CNNs outperform concept extraction based methods in almost all of the tasks, with an improvement in F1-score of up to 26 and up to 7 percentage points in area under the ROC curve (AUC). We additionally assess the interpretability of both approaches by presenting and evaluating methods that calculate and extract the most salient phrases for a prediction. The results indicate that CNNs are a valid alternative to existing approaches in patient phenotyping and cohort identification, and should be further investigated. Moreover, the deep learning approach presented in this paper can be used to assist clinicians during chart review or support the extraction of billing codes from text by identifying and highlighting relevant phrases for various medical conditions.
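The F1-score used as a headline metric in this abstract is the harmonic mean of precision and recall. A minimal sketch of the calculation for one binary phenotyping task follows; the labels are hypothetical and unrelated to the MIMIC-III experiments.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = condition present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical gold labels and model predictions for eight summaries.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Because F1 ignores true negatives, it is better suited than plain accuracy to phenotyping tasks where the condition of interest is present in only a minority of records.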
Torous, John; Kiang, Mathew V; Lorme, Jeanette; Onnela, Jukka-Pekka
2016-05-05
A longstanding barrier to progress in psychiatry, both in clinical settings and research trials, has been the persistent difficulty of accurately and reliably quantifying disease phenotypes. Mobile phone technology combined with data science has the potential to offer medicine a wealth of additional information on disease phenotypes, but the large majority of existing smartphone apps are not intended for use as biomedical research platforms and, as such, do not generate research-quality data. Our aim is not the creation of yet another app per se but rather the establishment of a platform to collect research-quality smartphone raw sensor and usage pattern data. Our ultimate goal is to develop statistical, mathematical, and computational methodology to enable us and others to extract biomedical and clinical insights from smartphone data. We report on the development and early testing of Beiwe, a research platform featuring a study portal, smartphone app, database, and data modeling and analysis tools designed and developed specifically for transparent, customizable, and reproducible biomedical research use, in particular for the study of psychiatric and neurological disorders. We also outline a proposed study using the platform for patients with schizophrenia. We demonstrate the passive data capabilities of the Beiwe platform and early results of its analytical capabilities. Smartphone sensors and phone usage patterns, when coupled with appropriate statistical learning tools, are able to capture various social and behavioral manifestations of illnesses, in naturalistic settings, as lived and experienced by patients. The ubiquity of smartphones makes this type of moment-by-moment quantification of disease phenotypes highly scalable and, when integrated within a transparent research platform, presents tremendous opportunities for research, discovery, and patient health.
Integration of Multiple Genomic and Phenotype Data to Infer Novel miRNA-Disease Associations
Zhou, Meng; Cheng, Liang; Yang, Haixiu; Wang, Jing; Sun, Jie; Wang, Zhenzhen
2016-01-01
MicroRNAs (miRNAs) play an important role in the development and progression of human diseases. The identification of disease-associated miRNAs will be helpful for understanding the molecular mechanisms of diseases at the post-transcriptional level. Based on different types of genomic data sources, computational methods for miRNA-disease association prediction have been proposed. However, individual sources of genomic data tend to be incomplete and noisy; therefore, the integration of various types of genomic data for inferring reliable miRNA-disease associations is urgently needed. In this study, we present a computational framework, CHNmiRD, for identifying miRNA-disease associations by integrating multiple genomic and phenotype data, including protein-protein interaction data, gene ontology data, experimentally verified miRNA-target relationships, disease phenotype information and known miRNA-disease connections. The performance of CHNmiRD was evaluated by experimentally verified miRNA-disease associations, which achieved an area under the ROC curve (AUC) of 0.834 for 5-fold cross-validation. In particular, CHNmiRD displayed excellent performance for diseases without any known related miRNAs. The results of case studies for three human diseases (glioblastoma, myocardial infarction and type 1 diabetes) showed that all of the top 10 ranked miRNAs having no known associations with these three diseases in existing miRNA-disease databases were directly or indirectly confirmed by our latest literature mining. All these results demonstrated the reliability and efficiency of CHNmiRD, and it is anticipated that CHNmiRD will serve as a powerful bioinformatics method for mining novel disease-related miRNAs and providing a new perspective into molecular mechanisms underlying human diseases at the post-transcriptional level. CHNmiRD is freely available at http://www.bio-bigdata.com/CHNmiRD. PMID:26849207
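The AUC reported in this abstract can be read as the probability that a randomly chosen true association receives a higher score than a randomly chosen non-association. The sketch below computes it directly from that definition. The scores are hypothetical and the code is not part of CHNmiRD.

```python
def auc_from_scores(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive scores higher; ties count as half a win.
    """
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical prediction scores (NOT results from the paper).
known_associations = [0.9, 0.8, 0.4]
unlabelled_pairs = [0.7, 0.3, 0.2]

auc = auc_from_scores(known_associations, unlabelled_pairs)
print(round(auc, 3))
```

In a 5-fold cross-validation, the known associations would be split into five parts, each part held out in turn and scored by a model trained on the rest, and the AUC computed on the held-out scores.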
Torous, John; Kiang, Mathew V; Lorme, Jeanette
2016-01-01
Background: A longstanding barrier to progress in psychiatry, both in clinical settings and research trials, has been the persistent difficulty of accurately and reliably quantifying disease phenotypes. Mobile phone technology combined with data science has the potential to offer medicine a wealth of additional information on disease phenotypes, but the large majority of existing smartphone apps are not intended for use as biomedical research platforms and, as such, do not generate research-quality data. Objective: Our aim is not the creation of yet another app per se but rather the establishment of a platform to collect research-quality smartphone raw sensor and usage pattern data. Our ultimate goal is to develop statistical, mathematical, and computational methodology to enable us and others to extract biomedical and clinical insights from smartphone data. Methods: We report on the development and early testing of Beiwe, a research platform featuring a study portal, smartphone app, database, and data modeling and analysis tools designed and developed specifically for transparent, customizable, and reproducible biomedical research use, in particular for the study of psychiatric and neurological disorders. We also outline a proposed study using the platform for patients with schizophrenia. Results: We demonstrate the passive data capabilities of the Beiwe platform and early results of its analytical capabilities. Conclusions: Smartphone sensors and phone usage patterns, when coupled with appropriate statistical learning tools, are able to capture various social and behavioral manifestations of illnesses, in naturalistic settings, as lived and experienced by patients. The ubiquity of smartphones makes this type of moment-by-moment quantification of disease phenotypes highly scalable and, when integrated within a transparent research platform, presents tremendous opportunities for research, discovery, and patient health. PMID:27150677
Integration of Multiple Genomic and Phenotype Data to Infer Novel miRNA-Disease Associations.
Shi, Hongbo; Zhang, Guangde; Zhou, Meng; Cheng, Liang; Yang, Haixiu; Wang, Jing; Sun, Jie; Wang, Zhenzhen
2016-01-01
MicroRNAs (miRNAs) play an important role in the development and progression of human diseases. The identification of disease-associated miRNAs will be helpful for understanding the molecular mechanisms of diseases at the post-transcriptional level. Based on different types of genomic data sources, computational methods for miRNA-disease association prediction have been proposed. However, individual sources of genomic data tend to be incomplete and noisy; therefore, the integration of various types of genomic data for inferring reliable miRNA-disease associations is urgently needed. In this study, we present a computational framework, CHNmiRD, for identifying miRNA-disease associations by integrating multiple genomic and phenotype data, including protein-protein interaction data, gene ontology data, experimentally verified miRNA-target relationships, disease phenotype information and known miRNA-disease connections. The performance of CHNmiRD was evaluated by experimentally verified miRNA-disease associations, which achieved an area under the ROC curve (AUC) of 0.834 for 5-fold cross-validation. In particular, CHNmiRD displayed excellent performance for diseases without any known related miRNAs. The results of case studies for three human diseases (glioblastoma, myocardial infarction and type 1 diabetes) showed that all of the top 10 ranked miRNAs having no known associations with these three diseases in existing miRNA-disease databases were directly or indirectly confirmed by our latest literature mining. All these results demonstrated the reliability and efficiency of CHNmiRD, and it is anticipated that CHNmiRD will serve as a powerful bioinformatics method for mining novel disease-related miRNAs and providing a new perspective into molecular mechanisms underlying human diseases at the post-transcriptional level. CHNmiRD is freely available at http://www.bio-bigdata.com/CHNmiRD.
Overview of FEED, the feeding experiments end-user database.
Wall, Christine E; Vinyard, Christopher J; Williams, Susan H; Gapeyev, Vladimir; Liu, Xianhua; Lapp, Hilmar; German, Rebecca Z
2011-08-01
The Feeding Experiments End-user Database (FEED) is a research tool developed by the Mammalian Feeding Working Group at the National Evolutionary Synthesis Center that permits synthetic, evolutionary analyses of the physiology of mammalian feeding. The tasks of the Working Group are to compile physiologic data sets into a uniform digital format stored at a central source, develop a standardized terminology for describing and organizing the data, and carry out a set of novel analyses using FEED. FEED contains raw physiologic data linked to extensive metadata. It serves as an archive for a large number of existing data sets and a repository for future data sets. The metadata are stored as text and images that describe experimental protocols, research subjects, and anatomical information. The metadata incorporate controlled vocabularies to allow consistent use of the terms used to describe and organize the physiologic data. The planned analyses address long-standing questions concerning the phylogenetic distribution of phenotypes involving muscle anatomy and feeding physiology among mammals, the presence and nature of motor pattern conservation in the mammalian feeding muscles, and the extent to which suckling constrains the evolution of feeding behavior in adult mammals. We expect FEED to be a growing digital archive that will facilitate new research into understanding the evolution of feeding anatomy.
MiDAS: the field guide to the microbes of activated sludge
McIlroy, Simon Jon; Saunders, Aaron Marc; Albertsen, Mads; Nierychlo, Marta; McIlroy, Bianca; Hansen, Aviaja Anna; Karst, Søren Michael; Nielsen, Jeppe Lund; Nielsen, Per Halkjær
2015-01-01
The Microbial Database for Activated Sludge (MiDAS) field guide is a freely available online resource linking the identity of abundant and process-critical microorganisms in activated sludge wastewater treatment systems to available data related to their functional importance. Phenotypic properties of some of these genera are described, but most are known only from sequence data. The MiDAS taxonomy is a manual curation of the SILVA taxonomy that proposes a name for all genus-level taxa observed to be abundant by large-scale 16S rRNA gene amplicon sequencing of full-scale activated sludge communities. The taxonomy can be used to classify unknown sequences, and the online MiDAS field guide links the identity to the available information about their morphology, diversity, physiology and distribution. The use of a common taxonomy across the field will provide a solid foundation for the study of microbial ecology of the activated sludge process and related treatment processes. The online MiDAS field guide is a collaborative workspace intended to facilitate a better understanding of the ecology of activated sludge and related treatment processes, knowledge that will be an invaluable resource for the optimal design and operation of these systems. Database URL: http://www.midasfieldguide.org PMID:26120139
Valletta, Elisa; Kučera, Lukáš; Prokeš, Lubomír; Amato, Filippo; Pivetta, Tiziana; Hampl, Aleš; Havel, Josef; Vaňhara, Petr
2016-01-01
Cross-contamination of eukaryotic cell lines used in biomedical research represents a highly relevant problem. Analysis of repetitive DNA sequences, such as Short Tandem Repeats (STR), or Simple Sequence Repeats (SSR), is a widely accepted, simple, and commercially available technique to authenticate cell lines. However, it provides only qualitative information that depends on the extent of reference databases for interpretation. In this work, we developed and validated a rapid and routinely applicable method for evaluation of cell culture cross-contamination levels based on mass spectrometric fingerprints of intact mammalian cells coupled with artificial neural networks (ANNs). We used human embryonic stem cells (hESCs) contaminated by either mouse embryonic stem cells (mESCs) or mouse embryonic fibroblasts (MEFs) as a model. We determined the contamination level using a mass spectra database of known calibration mixtures that served as training input for an ANN. The ANN was then capable of correct quantification of the level of contamination of hESCs by mESCs or MEFs. We demonstrate that MS analysis, when linked to proper mathematical instruments, is a tangible tool for unraveling and quantifying heterogeneity in cell cultures. The analysis is applicable in routine scenarios for cell authentication and/or cell phenotyping in general.
Prokeš, Lubomír; Amato, Filippo; Pivetta, Tiziana; Hampl, Aleš; Havel, Josef; Vaňhara, Petr
2016-01-01
Cross-contamination of eukaryotic cell lines used in biomedical research represents a highly relevant problem. Analysis of repetitive DNA sequences, such as Short Tandem Repeats (STR), or Simple Sequence Repeats (SSR), is a widely accepted, simple, and commercially available technique to authenticate cell lines. However, it provides only qualitative information that depends on the extent of reference databases for interpretation. In this work, we developed and validated a rapid and routinely applicable method for evaluation of cell culture cross-contamination levels based on mass spectrometric fingerprints of intact mammalian cells coupled with artificial neural networks (ANNs). We used human embryonic stem cells (hESCs) contaminated by either mouse embryonic stem cells (mESCs) or mouse embryonic fibroblasts (MEFs) as a model. We determined the contamination level using a mass spectra database of known calibration mixtures that served as training input for an ANN. The ANN was then capable of correct quantification of the level of contamination of hESCs by mESCs or MEFs. We demonstrate that MS analysis, when linked to proper mathematical instruments, is a tangible tool for unraveling and quantifying heterogeneity in cell cultures. The analysis is applicable in routine scenarios for cell authentication and/or cell phenotyping in general. PMID:26821236
Getting ready for the Human Phenome Project: the 2012 forum of the Human Variome Project.
Oetting, William S; Robinson, Peter N; Greenblatt, Marc S; Cotton, Richard G; Beck, Tim; Carey, John C; Doelken, Sandra C; Girdea, Marta; Groza, Tudor; Hamilton, Carol M; Hamosh, Ada; Kerner, Berit; MacArthur, Jacqueline A L; Maglott, Donna R; Mons, Barend; Rehm, Heidi L; Schofield, Paul N; Searle, Beverly A; Smedley, Damian; Smith, Cynthia L; Bernstein, Inge Thomsen; Zankl, Andreas; Zhao, Eric Y
2013-04-01
A forum of the Human Variome Project (HVP) was held as a satellite to the 2012 Annual Meeting of the American Society of Human Genetics in San Francisco, California. The theme of this meeting was "Getting Ready for the Human Phenome Project." Understanding the genetic contribution to both rare single-gene "Mendelian" disorders and more complex common diseases will require integration of research efforts among many fields and better defined phenotypes. The HVP is dedicated to bringing together researchers and research populations throughout the world to provide the resources to investigate the impact of genetic variation on disease. To this end, there needs to be a greater sharing of phenotype and genotype data. For this to occur, many databases that currently exist will need to become interoperable to allow for the combining of cohorts with similar phenotypes to increase statistical power for studies attempting to identify novel disease genes or causative genetic variants. Improved systems and tools that enhance the collection of phenotype data from clinicians are urgently needed. This meeting begins the HVP's effort toward this important goal. © 2013 Wiley Periodicals, Inc.
Code of Federal Regulations, 2014 CFR
2014-01-01
... AVAILABLE CONSUMER PRODUCT SAFETY INFORMATION DATABASE Notice and Disclosure Requirements § 1102.42... Consumer Product Safety Information Database, particularly with respect to the accuracy, completeness, or adequacy of information submitted by persons outside of the CPSC. The Database will contain a notice to...
16 CFR § 1102.24 - Designation of confidential information.
Code of Federal Regulations, 2013 CFR
2013-01-01
... SAFETY ACT REGULATIONS PUBLICLY AVAILABLE CONSUMER PRODUCT SAFETY INFORMATION DATABASE Procedural... allegedly confidential information is not placed in the database, a request for designation of confidential... publication in the Database until it makes a determination regarding confidential treatment. (e) Assistance...
Code of Federal Regulations, 2012 CFR
2012-01-01
... AVAILABLE CONSUMER PRODUCT SAFETY INFORMATION DATABASE Notice and Disclosure Requirements § 1102.42... Consumer Product Safety Information Database, particularly with respect to the accuracy, completeness, or adequacy of information submitted by persons outside of the CPSC. The Database will contain a notice to...
Code of Federal Regulations, 2011 CFR
2011-01-01
... AVAILABLE CONSUMER PRODUCT SAFETY INFORMATION DATABASE (Eff. Jan. 10, 2011) Notice and Disclosure... of the contents of the Consumer Product Safety Information Database, particularly with respect to the accuracy, completeness, or adequacy of information submitted by persons outside of the CPSC. The Database...
[Establishment of a comprehensive database for laryngeal cancer related genes and the miRNAs].
Li, Mengjiao; E, Qimin; Liu, Jialin; Huang, Tingting; Liang, Chuanyu
2015-09-01
To make research and teaching more convenient and efficient, laryngeal cancer related genes and miRNAs were collected and analyzed to build a comprehensive laryngeal cancer-related gene database that focuses on genes and miRNAs, in contrast to current biological information databases with complex and clumsy structures. Based on the B/S (browser/server) architecture, using Apache as the Web server, MySQL for the database and PHP as the coding language for the web design, a comprehensive database for laryngeal cancer-related genes was established, providing gene tables, protein tables, miRNA tables and clinical information tables for patients with laryngeal cancer. The established database contained 207 laryngeal cancer related genes, 243 proteins, 26 miRNAs, and their particular information such as mutations, methylations, diversified expressions, and the empirical references of laryngeal cancer relevant molecules. The database could be accessed and operated via the Internet, through which the information could be browsed and retrieved. The database was maintained and updated regularly. The database for laryngeal cancer related genes is resource-integrated and user-friendly, providing a genetic information query tool for the study of laryngeal cancer.
Gainotti, Sabina; Torreri, Paola; Wang, Chiuhui Mary; Reihs, Robert; Mueller, Heimo; Heslop, Emma; Roos, Marco; Badowska, Dorota Mazena; de Paulis, Federico; Kodra, Yllka; Carta, Claudio; Martìn, Estrella Lopez; Miller, Vanessa Rangel; Filocamo, Mirella; Mora, Marina; Thompson, Mark; Rubinstein, Yaffa; Posada de la Paz, Manuel; Monaco, Lucia; Lochmüller, Hanns; Taruscio, Domenica
2018-05-01
In rare disease (RD) research, there is a huge need to systematically collect biomaterials, phenotypic, and genomic data in a standardized way and to make them findable, accessible, interoperable and reusable (FAIR). RD-Connect is a 6-year global infrastructure project initiated in November 2012 that links genomic data with patient registries, biobanks, and clinical bioinformatics tools to create a central research resource for RDs. Here, we present RD-Connect Registry & Biobank Finder, a tool that helps RD researchers to find RD biobanks and registries and provides information on the availability and accessibility of content in each database. The finder concentrates information that is currently scattered across different repositories (inventories, websites, scientific journals, technical reports, etc.), including aggregated data and metadata from participating databases. Aggregated data provided by the finder, if appropriately checked, can be used by researchers who are trying to estimate the prevalence of an RD, to organize a clinical trial on an RD, or to estimate the volume of patients seen by different clinical centers. The finder is also a portal to other RD-Connect tools, providing a link to the RD-Connect Sample Catalogue, a large inventory of RD biological samples available in participating biobanks for RD research. There are several kinds of users and potential uses for the RD-Connect Registry & Biobank Finder, including researchers collaborating with academia and the industry, dealing with the questions of basic, translational, and/or clinical research. As of November 2017, the finder is populated with aggregated data for 222 registries and 21 biobanks.
Seliske, Laura; Pickett, William; Bates, Rebecca; Janssen, Ian
2012-01-01
Many studies examining the food retail environment rely on geographic information system (GIS) databases for location information. The purpose of this study was to validate information provided by two GIS databases, comparing the positional accuracy of food service places within a 1 km circular buffer surrounding 34 schools in Ontario, Canada. A commercial database (InfoCanada) and an online database (Yellow Pages) provided the addresses of food service places. Actual locations were measured using a global positioning system (GPS) device. The InfoCanada and Yellow Pages GIS databases provided the locations for 973 and 675 food service places, respectively. Overall, 749 (77.1%) and 595 (88.2%) of these were located in the field. The online database had a higher proportion of food service places found in the field. The GIS locations of 25% of the food service places were within approximately 15 m of their actual location, 50% were within 25 m, and 75% were within 50 m. This validation study provided a detailed assessment of errors in the measurement of the location of food service places in the two databases. The location information was more accurate for the online database; however, when matching criteria were more conservative, there were no observed differences in error between the databases. PMID:23066385
Seliske, Laura; Pickett, William; Bates, Rebecca; Janssen, Ian
2012-08-01
Many studies examining the food retail environment rely on geographic information system (GIS) databases for location information. The purpose of this study was to validate information provided by two GIS databases, comparing the positional accuracy of food service places within a 1 km circular buffer surrounding 34 schools in Ontario, Canada. A commercial database (InfoCanada) and an online database (Yellow Pages) provided the addresses of food service places. Actual locations were measured using a global positioning system (GPS) device. The InfoCanada and Yellow Pages GIS databases provided the locations for 973 and 675 food service places, respectively. Overall, 749 (77.1%) and 595 (88.2%) of these were located in the field. The online database had a higher proportion of food service places found in the field. The GIS locations of 25% of the food service places were within approximately 15 m of their actual location, 50% were within 25 m, and 75% were within 50 m. This validation study provided a detailed assessment of errors in the measurement of the location of food service places in the two databases. The location information was more accurate for the online database; however, when matching criteria were more conservative, there were no observed differences in error between the databases.
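The positional-error summary reported above (distances between database locations and GPS-measured locations, summarized at the 25th, 50th and 75th percentiles) can be illustrated with a minimal sketch. This is not the authors' procedure: the haversine formula, the function name `haversine_m`, and the coordinate pairs are all assumptions made up for illustration.

```python
import math
import statistics


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    earth_radius_m = 6371000.0  # mean Earth radius, an approximation
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


# Hypothetical (database location, GPS location) pairs; not data from the study.
pairs = [
    ((44.2312, -76.4860), (44.2313, -76.4861)),
    ((44.2400, -76.5000), (44.2402, -76.5003)),
    ((44.2250, -76.4950), (44.2251, -76.4950)),
    ((44.2600, -76.5200), (44.2604, -76.5206)),
    ((44.2100, -76.5100), (44.2101, -76.5102)),
]

# Positional error of each place, in metres.
errors = [haversine_m(*gis, *gps) for gis, gps in pairs]

# Three cut points: 25th percentile, median, 75th percentile.
q1, median, q3 = statistics.quantiles(errors, n=4)
print(f"25%: {q1:.1f} m, 50%: {median:.1f} m, 75%: {q3:.1f} m")
```

A projected coordinate system would serve equally well at this scale; the haversine formula is used here only because it needs nothing beyond the standard library.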
Effect of Inherited Genetic Information on Stochastic Predator-Prey Model
NASA Astrophysics Data System (ADS)
Duda, Artur; Dyś, Paweł; Nowicka, Alekandra; Dudek, Mirosław R.
We discuss the Lotka-Volterra dynamics of two populations, prey and predators, in the case when the predators possess genetic information. The genetic information is inherited according to the rules of the Penna model of genetic evolution. Each individual of the predator population is uniquely determined by sex, genotype and phenotype. In our case, the genes are represented by 8-bit integers and the phenotypes are defined with the help of the 8-state Potts model Hamiltonian. We showed that during time evolution, the population of the predators can experience a series of dynamical phase transitions which are connected with the different types of the dominant phenotypes present in the population.
16 CFR 1102.24 - Designation of confidential information.
Code of Federal Regulations, 2011 CFR
2011-01-01
... ACT REGULATIONS PUBLICLY AVAILABLE CONSUMER PRODUCT SAFETY INFORMATION DATABASE (Eff. Jan. 10, 2011... allegedly confidential information is not placed in the database, a request for designation of confidential... publication in the Database until it makes a determination regarding confidential treatment. (e) Assistance...
16 CFR § 1102.42 - Disclaimers.
Code of Federal Regulations, 2013 CFR
2013-01-01
... AVAILABLE CONSUMER PRODUCT SAFETY INFORMATION DATABASE Notice and Disclosure Requirements § 1102.42... Consumer Product Safety Information Database, particularly with respect to the accuracy, completeness, or adequacy of information submitted by persons outside of the CPSC. The Database will contain a notice to...
49 CFR 535.8 - Reporting requirements.
Code of Federal Regulations, 2011 CFR
2011-10-01
... information. (2) Manufacturers must submit information electronically through the EPA database system as the... year 2012 the agencies are not prepared to receive information through the EPA database system... applications for certificates of conformity in accordance through the EPA database including both GHG emissions...
MIPS: analysis and annotation of proteins from whole genomes in 2005
Mewes, H. W.; Frishman, D.; Mayer, K. F. X.; Münsterkötter, M.; Noubibou, O.; Pagel, P.; Rattei, T.; Oesterheld, M.; Ruepp, A.; Stümpflen, V.
2006-01-01
The Munich Information Center for Protein Sequences (MIPS at the GSF), Neuherberg, Germany, provides resources related to genome information. Manually curated databases for several reference organisms are maintained. Several of these databases are described elsewhere in this and other recent NAR database issues. In a complementary effort, a comprehensive set of >400 genomes automatically annotated with the PEDANT system are maintained. The main goal of our current work on creating and maintaining genome databases is to extend gene centered information to information on interactions within a generic comprehensive framework. We have concentrated our efforts along three lines (i) the development of suitable comprehensive data structures and database technology, communication and query tools to include a wide range of different types of information enabling the representation of complex information such as functional modules or networks Genome Research Environment System, (ii) the development of databases covering computable information such as the basic evolutionary relations among all genes, namely SIMAP, the sequence similarity matrix and the CABiNet network analysis framework and (iii) the compilation and manual annotation of information related to interactions such as protein–protein interactions or other types of relations (e.g. MPCDB, MPPI, CYGD). All databases described and the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.gsf.de). PMID:16381839
MIPS: analysis and annotation of proteins from whole genomes in 2005.
Mewes, H W; Frishman, D; Mayer, K F X; Münsterkötter, M; Noubibou, O; Pagel, P; Rattei, T; Oesterheld, M; Ruepp, A; Stümpflen, V
2006-01-01
The Munich Information Center for Protein Sequences (MIPS at the GSF), Neuherberg, Germany, provides resources related to genome information. Manually curated databases for several reference organisms are maintained. Several of these databases are described elsewhere in this and other recent NAR database issues. In a complementary effort, a comprehensive set of >400 genomes automatically annotated with the PEDANT system are maintained. The main goal of our current work on creating and maintaining genome databases is to extend gene centered information to information on interactions within a generic comprehensive framework. We have concentrated our efforts along three lines (i) the development of suitable comprehensive data structures and database technology, communication and query tools to include a wide range of different types of information enabling the representation of complex information such as functional modules or networks Genome Research Environment System, (ii) the development of databases covering computable information such as the basic evolutionary relations among all genes, namely SIMAP, the sequence similarity matrix and the CABiNet network analysis framework and (iii) the compilation and manual annotation of information related to interactions such as protein-protein interactions or other types of relations (e.g. MPCDB, MPPI, CYGD). All databases described and the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.gsf.de).
[Software for performing a global phenotypic and genotypic nutritional assessment].
García de Diego, L; Cuervo, M; Martínez, J A
2013-01-01
The nutritional assessment of a patient requires the simultaneous management of extensive information and a great number of databases, as both aspects of the process of nutrition and the clinical situation of the patient are analyzed. The introduction of computers in the nutritional area constitutes an extraordinary advance in the administration of nutrition information, providing a complete assessment of nutritional aspects in a quick and easy way. The aim was to develop a computer program that can be used as a tool for assessing the nutritional status of the patient, for the education of clinical staff, for epidemiological studies and for educational purposes. The work is based on a computer program that assists the health specialist in performing a full nutritional evaluation of the patient, through the registration and assessment of phenotypic and genotypic features. The application provides nutritional prognosis based on anthropometric and biochemical parameters, images of states of malnutrition, questionnaires to characterize diseases, diagnostic criteria, identification of alleles associated with the development of specific metabolic illnesses and questionnaires of quality of life, for customized action. The program includes, as part of the nutritional assessment of the patient, food intake analysis, design of diets and promotion of physical activity, introducing food frequency questionnaires, dietary recalls, healthy eating indexes, model diets, fitness tests, and recommendations, recalls and questionnaires of physical activity. The computer program was developed with Java Swing, using an SQLite database and some external libraries such as JFreeChart for plotting graphs. This newly designed software is composed of five blocks categorized into ten modules named: Patients, Anthropometry, Clinical History, Biochemistry, Dietary History, Diagnostic (with genetic make up), Quality of life, Physical activity, Energy expenditure and Diets.
Each module has a specific function which evaluates a different aspect of the nutritional status of the patient. UNyDIET is a global computer program, customized and upgradeable, easy to use and versatile, aimed at health specialists, medical staff, dietitians, nutritionists, scientists and educators. This tool can be used as a working instrument in programs promoting health, nutritional and clinical assessments as well as in the evaluation of health care quality, in epidemiological studies, in nutrition intervention programs and teaching. Copyright © AULA MEDICA EDICIONES 2013. Published by AULA MEDICA. All rights reserved.
SORTEZ: a relational translator for NCBI's ASN.1 database.
Hart, K W; Searls, D B; Overton, G C
1994-07-01
The National Center for Biotechnology Information (NCBI) has created a database collection that includes several protein and nucleic acid sequence databases, a biosequence-specific subset of MEDLINE, as well as value-added information such as links between similar sequences. Information in the NCBI database is modeled in Abstract Syntax Notation 1 (ASN.1), an Open Systems Interconnection protocol designed for the purpose of exchanging structured data between software applications rather than as a data model for database systems. While the NCBI database is distributed with an easy-to-use information retrieval system, ENTREZ, the ASN.1 data model currently lacks an ad hoc query language for general-purpose data access. For that reason, we have developed a software package, SORTEZ, that transforms the ASN.1 database (or other databases with nested data structures) to a relational data model and subsequently to a relational database management system (Sybase) where information can be accessed through the relational query language, SQL. Because the need to transform data from one data model and schema to another arises naturally in several important contexts, including efficient execution of specific applications, access to multiple databases and adaptation to database evolution, this work also serves as a practical study of the issues involved in the various stages of database transformation. We show that transformation from the ASN.1 data model to a relational data model can be largely automated, but that schema transformation and data conversion require considerable domain expertise and would greatly benefit from additional support tools.
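The general idea of turning nested records into relational tables that can be queried with SQL can be sketched as follows. This is only an illustration of the technique under stated assumptions, not SORTEZ itself: it uses Python dictionaries in place of ASN.1 values and SQLite in place of Sybase, and the helper names (`flatten`, `load`) and the sample record are hypothetical.

```python
import sqlite3


def flatten(record, table, parent_id, rows, counter):
    """Split a nested dict into per-table rows linked through parent_id."""
    counter[0] += 1
    row_id = counter[0]
    row = {"id": row_id, "parent_id": parent_id}
    for key, value in record.items():
        if isinstance(value, dict):
            # A nested structure becomes a row in its own child table.
            flatten(value, key, row_id, rows, counter)
        elif isinstance(value, list):
            # A repeated field becomes one child row per element.
            for item in value:
                child = item if isinstance(item, dict) else {"value": item}
                flatten(child, key, row_id, rows, counter)
        else:
            row[key] = value
    rows.setdefault(table, []).append(row)
    return row_id


def load(rows, conn):
    """Create one table per nested field name and insert the flattened rows."""
    for table, table_rows in rows.items():
        columns = sorted({column for row in table_rows for column in row})
        column_sql = ", ".join(f'"{column}"' for column in columns)
        placeholders = ", ".join("?" for _ in columns)
        conn.execute(f'CREATE TABLE "{table}" ({column_sql})')
        conn.executemany(
            f'INSERT INTO "{table}" ({column_sql}) VALUES ({placeholders})',
            [[row.get(column) for column in columns] for row in table_rows],
        )


# Hypothetical nested sequence entry; the field names are made up.
entry = {
    "accession": "X00001",
    "organism": "E. coli",
    "features": [
        {"kind": "CDS", "start": 10, "stop": 99},
        {"kind": "gene", "start": 1, "stop": 120},
    ],
}

rows = {}
flatten(entry, "entry", None, rows, [0])
conn = sqlite3.connect(":memory:")
load(rows, conn)

# An ad hoc SQL query across the parent and child tables.
result = conn.execute(
    'SELECT e.accession, f.kind FROM entry AS e '
    'JOIN features AS f ON f.parent_id = e.id ORDER BY f.start'
).fetchall()
print(result)
```

The mechanical part shown here (one table per nested field, a foreign key back to the parent) is the portion that lends itself to automation; choosing meaningful table names, keys and column types is where the domain expertise mentioned in the abstract would come in.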
Development and Implementation of Kumamoto Technopolis Regional Database T-KIND
NASA Astrophysics Data System (ADS)
Onoue, Noriaki
T-KIND (Techno-Kumamoto Information Network for Data-Base) is a system for effectively searching information on the technology, human resources and industries that are necessary to realize the Kumamoto Technopolis. It is composed of a coded database, an image database and a LAN inside the techno-research park, which is the center of R&D in the Technopolis. It forms an online system by networking general-purpose computers, minicomputers, optical disk file systems and so on, and provides its service through public telephone lines. Two databases are now available, on enterprise information and human resource information. The former covers about 4,000 enterprises, and the latter about 2,000 persons.
The methodology of database design in organization management systems
NASA Astrophysics Data System (ADS)
Chudinov, I. L.; Osipova, V. V.; Bobrova, Y. V.
2017-01-01
The paper describes a unified methodology of database design for management information systems. Designing the conceptual information model for the domain area is the most important and labor-intensive stage in database design. Based on the proposed integrated approach to designing the conceptual information model, the main principles of developing relational databases are provided and users' information needs are considered. According to the methodology, the process of designing the conceptual information model includes three basic stages, which are defined in detail. Finally, the article describes the process of presenting the results of analyzing users' information needs and the rationale for the use of classifiers.
Rhode Island Water Supply System Management Plan Database (WSSMP-Version 1.0)
Granato, Gregory E.
2004-01-01
In Rhode Island, the availability of water of sufficient quality and quantity to meet current and future environmental and economic needs is vital to life and the State's economy. Water suppliers, the Rhode Island Water Resources Board (RIWRB), and other State agencies responsible for water resources in Rhode Island need information about available resources, the water-supply infrastructure, and water use patterns. These decision makers need historical, current, and future water-resource information. In 1997, the State of Rhode Island formalized a system of Water Supply System Management Plans (WSSMPs) to characterize and document relevant water-supply information. All major water suppliers (those that obtain, transport, purchase, or sell more than 50 million gallons of water per year) are required to prepare, maintain, and carry out WSSMPs. An electronic database for this WSSMP information has been deemed necessary by the RIWRB for water suppliers and State agencies to consistently document, maintain, and interpret the information in these plans. Availability of WSSMP data in standard formats will allow water suppliers and State agencies to improve the understanding of water-supply systems and to plan for future needs or water-supply emergencies. In 2002, however, the Rhode Island General Assembly passed a law that classifies some of the WSSMP information as confidential to protect the water-supply infrastructure from potential terrorist threats. Therefore the WSSMP database was designed for an implementation method that will balance security concerns with the information needs of the RIWRB, suppliers, other State agencies, and the public. A WSSMP database was developed by the U.S. Geological Survey in cooperation with the RIWRB. The database was designed to catalog WSSMP information in a format that would accommodate synthesis of current and future information about Rhode Island's water-supply infrastructure. 
This report documents the design and implementation of the WSSMP database. All WSSMP information in the database is, ultimately, linked to the individual water suppliers and to a WSSMP 'cycle' (which is currently a 5-year planning cycle for compiling WSSMP information). The database file contains 172 tables - 47 data tables, 61 association tables, 61 domain tables, and 3 example import-link tables. This database is currently implemented in the Microsoft Access database software because it is widely used within and outside of government and is familiar to many existing and potential customers. Design documentation facilitates current use and potential modification for future use of the database. Information within the structure of the WSSMP database file (WSSMPv01.mdb), a data dictionary file (WSSMPDD1.pdf), a detailed database-design diagram (WSSMPPL1.pdf), and this database-design report (OFR2004-1231.pdf) documents the design of the database. This report includes a discussion of each WSSMP data structure with an accompanying database-design diagram. Appendix 1 of this report is an index of the diagrams in the report and on the plate; this index is organized by table name in alphabetical order. Each of these products is included in digital format on the enclosed CD-ROM to facilitate use or modification of the database.
Krischer, Jeffrey P; Gopal-Srivastava, Rashmi; Groft, Stephen C; Eckstein, David J
2014-08-01
Established in 2003 by the Office of Rare Diseases Research (ORDR), in collaboration with several National Institutes of Health (NIH) Institutes/Centers, the Rare Diseases Clinical Research Network (RDCRN) consists of multiple clinical consortia conducting research in more than 200 rare diseases. The RDCRN supports longitudinal or natural history, pilot, Phase I, II, and III, case-control, cross-sectional, chart review, physician survey, bio-repository, and RDCRN Contact Registry (CR) studies. To date, there have been 24,684 participants enrolled on 120 studies from 446 sites worldwide. An additional 11,533 individuals participate in the CR. Through a central data management and coordinating center (DMCC), the RDCRN's platform for the conduct of observational research encompasses electronic case report forms, federated databases, and an online CR for epidemiological and survey research. An ORDR-governed data repository (through dbGaP, a database for genotype and phenotype information from the National Library of Medicine) has been created. DMCC coordinates with ORDR to register and upload study data to dbGaP for data sharing with the scientific community. The platform provided by the RDCRN DMCC has supported 128 studies, six of which were successfully conducted through the online CR, with 2,352 individuals accrued and a median enrollment time of just 2 months. The RDCRN has built a powerful suite of web-based tools that provide for integration of federated and online database support that can accommodate a large number of rare diseases on a global scale. RDCRN studies have made important advances in the diagnosis and treatment of rare diseases.
User assumptions about information retrieval systems: Ethical concerns
DOE Office of Scientific and Technical Information (OSTI.GOV)
Froehlich, T.J.
Information professionals, whether designers, intermediaries, database producers or vendors, bear some responsibility for the information that they make available to users of information systems. The users of such systems may tend to make many assumptions about the information that a system provides, such as believing: that the data are comprehensive, current and accurate; that the information resources or databases have the same degree of quality and consistency of indexing; that the abstracts, if they exist, correctly and adequately reflect the content of the article; that there is consistency in forms of author names or journal titles or indexing within and across databases; that there is standardization in and across databases; that once errors are detected, they are corrected; that appropriate choices of databases or information resources are a relatively easy matter, etc. The truth is that few of these assumptions are valid in commercial or corporate or organizational databases. However, given these beliefs and assumptions by many users, often promoted by information providers, information professionals, if possible, should intervene to warn users about the limitations and constraints of the databases they are using. With the growth of the Internet and end-user products (e.g., CD-ROMs), such interventions have significantly declined. In such cases, information should be provided on start-up or through interface screens, indicating to users the constraints and orientation of the system they are using. The principle of "caveat emptor" is naive and socially irresponsible: information professionals or systems have an obligation to provide some framework or context for the information that users are accessing.
From genotype to phenotype: genetics and medical practice in the new millennium.
Weatherall, D
1999-01-01
The completion of the human genome project will provide a vast amount of information about human genetic diversity. One of the major challenges for the medical sciences will be to relate genotype to phenotype. Over recent years considerable progress has been made in relating the molecular pathology of monogenic diseases to the associated clinical phenotypes. Studies of the inherited disorders of haemoglobin, notably the thalassaemias, have shown how even in these, the simplest of monogenic diseases, there is remarkable complexity with respect to their phenotypic expression. Although studies of other monogenic diseases are less far advanced, it is clear that the same level of complexity will exist. This information provides some indication of the difficulties that will be met when trying to define the genes that are involved in common multigenic disorders and, in particular, in trying to relate disease phenotypes to the complex interactions between many genes and multiple environmental factors. PMID:10670020
A New Approach To Secure Federated Information Bases Using Agent Technology.
ERIC Educational Resources Information Center
Weippl, Edgar; Klug, Ludwig; Essmayr, Wolfgang
2003-01-01
Discusses database agents which can be used to establish federated information bases by integrating heterogeneous databases. Highlights include characteristics of federated information bases, including incompatible database management systems, schemata, and frequently changing context; software agent technology; Java agents; system architecture;…
A Database of Historical Information on Landslides and Floods in Italy
NASA Astrophysics Data System (ADS)
Guzzetti, F.; Tonelli, G.
2003-04-01
For the past 12 years we have maintained and updated a database of historical information on landslides and floods in Italy, known as the National Research Council's AVI (Damaged Urban Areas) Project archive. The database was originally designed to respond to a specific request of the Minister of Civil Protection, and was aimed at helping the regional assessment of landslide and flood risk in Italy. The database was first constructed in 1991-92 to cover the period 1917 to 1990. Information on damaging landslide and flood events was collected by searching archives, by screening thousands of newspaper issues, by reviewing the existing technical and scientific literature on landslides and floods in Italy, and by interviewing landslide and flood experts. The database was then updated chiefly through the analysis of hundreds of newspaper articles, and it now covers systematically the period 1900 to 1998, and non-systematically the periods 1900 to 1916 and 1999 to 2002. Non-systematic information on landslide and flood events older than the 20th century is also present in the database. The database currently contains information on more than 32,000 landslide events that occurred at more than 25,700 sites, and on more than 28,800 flood events that occurred at more than 15,600 sites. After a brief outline of the history and evolution of the AVI Project archive, we present and discuss: (a) the present structure of the database, including the hardware and software solutions adopted to maintain, manage, use and disseminate the information stored in the database, (b) the type and amount of information stored in the database, including an estimate of its completeness, and (c) examples of recent applications of the database, including a web-based GIS system to show the location of sites historically affected by landslides and floods, and an estimate of geo-hydrological (i.e., landslide and flood) risk in Italy based on the available historical information.
2012-01-01
Background: The feline genome is valuable to the veterinary and model organism genomics communities because the cat is an obligate carnivore and a model for endangered felids. The initial public release of the Felis catus genome assembly provided a framework for investigating the genomic basis of feline biology. However, the entire set of protein coding genes has not been elucidated. Results: We identified and characterized 1227 protein coding feline sequences, of which 913 map to public sequences and 314 are novel. These sequences have been deposited into NCBI's GenBank database and complement public genomic resources by providing additional protein coding sequences that fill in some of the gaps in the feline genome assembly. Through functional and comparative genomic analyses, we gained an understanding of the role of these sequences in feline development, nutrition and health. Specifically, we identified 104 orthologs of human genes associated with Mendelian disorders. We detected negative selection within sequences with gene ontology annotations associated with intracellular trafficking, cytoskeleton and muscle functions. We detected relatively less negative selection on protein sequences encoding extracellular networks, apoptotic pathways and mitochondrial gene ontology annotations. Additionally, we characterized feline cDNA sequences that have mouse orthologs associated with clinical, nutritional and developmental phenotypes. Together, this analysis provides an overview of the value of our cDNA sequences and enhances our understanding of how the feline genome is similar to, and different from, other mammalian genomes. Conclusions: The cDNA sequences reported here expand existing feline genomic resources by providing high-quality sequences annotated with comparative genomic information providing functional, clinical, nutritional and orthologous gene information. PMID:22257742
Luque, Raúl M; Ibáñez-Costa, Alejandro; Sánchez-Tejada, Laura; Rivero-Cortés, Esther; Robledo, Mercedes; Madrazo-Atutxa, Ainara; Mora, Mireia; Álvarez, Clara V; Lucas-Morante, Tomás; Álvarez-Escolá, Cristina; Fajardo, Carmen; Castaño, Luis; Gaztambide, Sonia; Venegas-Moreno, Eva; Soto-Moreno, Alfonso; Gálvez, María Ángeles; Salvador, Javier; Valassi, Elena; Webb, Susan M; Picó, Antonio; Puig-Domingo, Manel; Gilabert, Montserrat; Bernabéu, Ignacio; Marazuela, Mónica; Leal-Cerro, Alfonso; Castaño, Justo P
2016-01-01
Pituitary adenomas are uncommon, difficult to diagnose tumors whose heterogeneity and low incidence complicate large-scale studies. The Molecular Registry of Pituitary Adenomas (REMAH) was promoted by the Andalusian Society of Endocrinology and Nutrition (SAEN) in 2008 as a cooperative clinical-basic multicenter strategy aimed at improving diagnosis and treatment of pituitary adenomas by combining clinical, pathological, and molecular information. In 2010, the Spanish Society of Endocrinology and Nutrition (SEEN) extended this project to national level and established 6 nodes with common protocols and methods for sample and clinical data collection, molecular analysis, and data recording in a common registry (www.remahnacional.com). The registry combines clinical data with molecular phenotyping of the resected pituitary adenoma using quantitative real-time PCR of expression of 26 genes: Pituitary hormones (GH-PRL-LH-FSH-PRL-ACTH-CGA), receptors (somatostatin, dopamine, GHRH, GnRH, CRH, arginine-vasopressin, ghrelin), other markers (Ki67, PTTG1), and control genes. Until 2015, molecular information has been collected from 704 adenomas, out of 1179 patients registered. This strategy allows for comparative and relational analysis between the molecular profile of the different types of adenoma and the clinical phenotype of patients, which may provide a better understanding of the condition and potentially help in treatment selection. The REMAH is therefore a unique multicenter, interdisciplinary network founded on a shared database that provides a far-reaching translational approach for management of pituitary adenomas, and paves the way for the conduct of combined clinical-basic innovative studies on large patient samples. Copyright © 2016 SEEN. Published by Elsevier España, S.L.U. All rights reserved.
Puerto Rican Phenotype: Understanding Its Historical Underpinnings and Psychological Associations
ERIC Educational Resources Information Center
Lopez, Irene
2008-01-01
The following is a historically informed review of Puerto Rican phenotype. Geared toward educating psychologists, this review discusses how various psychological issues associated with phenotype may have arisen as a result of historical legacies and policies associated with race and racial mixing. It discusses how these policies used various…
Salvador-Severo, Karina; Gómez-Caudillo, Leopoldo; Quezada, Héctor; García-Trejo, José de Jesús; Cárdenas-Conejo, Alan; Vázquez-Memije, Martha Elisa; Minauro-Sanmiguel, Fernando
Mitochondriopathies are multisystem diseases affecting the oxidative phosphorylation (OXPHOS) system. Skin fibroblasts are a good model for the study of these diseases. Fibroblasts with a complex IV mitochondriopathy were used to determine the molecular mechanism and the main affected functions in this disease. Skin fibroblasts were grown to confirm the disease phenotype. Mitochondria were isolated from these cells and their proteome extracted for protein identification. Identified proteins were validated with the MitoMiner database. Disease phenotype was corroborated on skin fibroblasts, which presented a complex IV defect. The mitochondrial proteome of these cells showed that the most affected proteins belonged to the OXPHOS system, mainly to the complexes that form supercomplexes or respirosomes (I, III, IV, and V). Defects in complex IV seemed to be due to assembly issues, which might prevent supercomplex formation and efficient substrate channeling. It was also found that this mitochondriopathy affects other processes that are related to DNA genetic information flow (replication, transcription, and translation) as well as beta oxidation and the tricarboxylic acid cycle. These data, as a whole, could be used for the better stratification of these diseases, as well as to optimize management and treatment options. Copyright © 2017 Hospital Infantil de México Federico Gómez. Publicado por Masson Doyma México S.A. All rights reserved.
Metabolic genes in cancer: their roles in tumor progression and clinical implications
Furuta, Eiji; Okuda, Hiroshi; Kobayashi, Aya; Watabe, Kounosuke
2010-01-01
Re-programming of metabolic pathways is a hallmark of physiological changes in cancer cells. The expression of certain genes that directly control the rate of key metabolic pathways including glycolysis, lipogenesis and nucleotide synthesis are drastically altered at different stages of tumor progression. These alterations are generally considered as an adaptation of tumor cells; however, they also contribute to the progression of tumor cells to become more aggressive phenotypes. This review summarizes the recent information about the mechanistic link of these genes to oncogenesis and their potential utility as diagnostic markers as well as for therapeutic targets. We particularly focus on three groups of genes; GLUT1, G6PD, TKTL1 and PGI/AMF in glycolytic pathway, ACLY, ACC1 and FAS in lipogenesis and RRM1, RRM2 and TYMS for nucleotide synthesis. All these genes are highly up-regulated in a variety of tumor cells in cancer patients, and they play active roles in tumor progression rather than expressing merely as a consequence of phenotypic change of the cancer cells. Molecular dissection of their orchestrated networks and understanding the exact mechanism of their expression will provide a window of opportunity to target these genes for specific cancer therapy. We also reviewed existing database of gene microarray to validate the utility of these genes for cancer diagnosis. PMID:20122995
Sollie, Annet; Sijmons, Rolf H; Lindhout, Dick; van der Ploeg, Ans T; Rubio Gozalbo, M Estela; Smit, G Peter A; Verheijen, Frans; Waterham, Hans R; van Weely, Sonja; Wijburg, Frits A; Wijburg, Rudolph; Visser, Gepke
2013-07-01
Data sharing is essential for a better understanding of genetic disorders. Good phenotype coding plays a key role in this process. Unfortunately, the two most widely used coding systems in medicine, ICD-10 and SNOMED-CT, lack information necessary for the detailed classification and annotation of rare and genetic disorders. This prevents the optimal registration of such patients in databases and thus data-sharing efforts. To improve care and to facilitate research for patients with metabolic disorders, we developed a new coding system for metabolic diseases with a dedicated group of clinical specialists. Next, we compared the resulting codes with those in ICD and SNOMED-CT. No matches were found in 76% of cases in ICD-10 and in 54% in SNOMED-CT. We conclude that there are sizable gaps in the SNOMED-CT and ICD coding systems for metabolic disorders. There may be similar gaps for other classes of rare and genetic disorders. We have demonstrated that expert groups can help in addressing such coding issues. Our coding system has been made available to the ICD and SNOMED-CT organizations as well as to the Orphanet and HPO organizations for further public application and updates will be published online (www.ddrmd.nl and www.cineas.org). © 2013 WILEY PERIODICALS, INC.
Mimouni-Bloch, Aviva; Yeshaya, Josepha; Kahana, Sarit; Maya, Idit; Basel-Vanagaite, Lina
2015-11-01
Microdeletions of various sizes in the 2p16.1-p15 chromosomal region have been grouped together under the 2p16.1-p15 microdeletion syndrome. Children with this syndrome generally share certain features including microcephaly, developmental delay, facial dysmorphism, urogenital and skeletal abnormalities. We present a child with a de-novo interstitial 1665 kb duplication of 2p16.1-p15. Clinical features of this child are distinct from those of children with the 2p16.1-p15 microdeletion syndrome, specifically the head circumference which is within the normal range and mild intellectual disability with absence of autistic behaviors. Microduplications many times bear milder clinical phenotypes in comparison with corresponding microdeletion syndromes. Indeed, as compared to the microdeletion syndrome patients, the 2p16.1-p15 microduplication seems to have a milder cognitive effect and no effect on other body systems. Limited information available in genetic databases about cases with overlapping duplications indicates that they all have abnormal developmental phenotypes. The involvement of genes in this location including BCL11A, USP34 and PEX13, affecting fundamental developmental processes both within and outside the nervous system may explain the clinical features of the individual described in this report. Copyright © 2015 European Paediatric Neurology Society. Published by Elsevier Ltd. All rights reserved.
77 FR 66617 - HIT Policy and Standards Committees; Workgroup Application Database
Federal Register 2010, 2011, 2012, 2013, 2014
2012-11-06
... Database AGENCY: Office of the National Coordinator for Health Information Technology, HHS. ACTION: Notice of New ONC HIT FACA Workgroup Application Database. The Office of the National Coordinator (ONC) has launched a new Health Information Technology Federal Advisory Committee Workgroup Application Database...
Zhao, Jiangsan; Bodner, Gernot; Rewald, Boris; Leitner, Daniel; Nagel, Kerstin A; Nakhforoosh, Alireza
2017-02-01
Root phenotyping provides trait information for plant breeding. A shortcoming of high-throughput root phenotyping is the limitation to seedling plants and failure to make inferences on mature root systems. We suggest root system architecture (RSA) models to predict mature root traits and overcome the inference problem. Sixteen pea genotypes were phenotyped in (i) seedling (Petri dishes) and (ii) mature (sand-filled columns) root phenotyping platforms. The RSA model RootBox was parameterized with seedling traits to simulate the fully developed root systems. Measured and modelled root length, first-order lateral number, and root distribution were compared to determine key traits for model-based prediction. No direct relationship in root traits (tap, lateral length, interbranch distance) was evident between phenotyping systems. RootBox significantly improved the inference over phenotyping platforms. Seedling plant tap and lateral root elongation rates and interbranch distance were sufficient model parameters to predict genotype ranking in total root length with an RSpearman of 0.83. Parameterization including uneven lateral spacing via a scaling function substantially improved the prediction of architectures underlying the differently sized root systems. We conclude that RSA models can solve the inference problem of seedling root phenotyping. RSA models should be included in the phenotyping pipeline to provide reliable information on mature root systems to breeding research. © The Author 2017. Published by Oxford University Press on behalf of the Society for Experimental Biology.
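The genotype-ranking agreement quoted in the abstract above is a Spearman rank correlation between measured and modelled root lengths. As a minimal sketch of how such a value is computed, the following uses made-up numbers (the lists `measured` and `modelled` are illustrative placeholders, not data from the study) and hypothetical helper names.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Placeholder total root lengths for five hypothetical genotypes.
measured = [10, 20, 30, 40, 50]
modelled = [12, 18, 35, 33, 60]
rho = spearman(measured, modelled)
print(round(rho, 3))
```

Only the ordering of genotypes matters here, which is why a rank statistic suits the question of whether a model reproduces genotype ranking rather than absolute root length.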
Co-clustering phenome–genome for phenotype classification and disease gene discovery
Hwang, TaeHyun; Atluri, Gowtham; Xie, MaoQiang; Dey, Sanjoy; Hong, Changjin; Kumar, Vipin; Kuang, Rui
2012-01-01
Understanding the categorization of human diseases is critical for reliably identifying disease causal genes. Recently, genome-wide studies of abnormal chromosomal locations related to diseases have mapped >2000 phenotype–gene relations, which provide valuable information for classifying diseases and identifying candidate genes as drug targets. In this article, a regularized non-negative matrix tri-factorization (R-NMTF) algorithm is introduced to co-cluster phenotypes and genes, and simultaneously detect associations between the detected phenotype clusters and gene clusters. The R-NMTF algorithm factorizes the phenotype–gene association matrix under the prior knowledge from phenotype similarity network and protein–protein interaction network, supervised by the label information from known disease classes and biological pathways. In the experiments on disease phenotype–gene associations in OMIM and KEGG disease pathways, R-NMTF significantly improved the classification of disease phenotypes and disease pathway genes compared with support vector machines and Label Propagation in cross-validation on the annotated phenotypes and genes. The newly predicted phenotypes in each disease class are highly consistent with human phenotype ontology annotations. The roles of the new member genes in the disease pathways are examined and validated in the protein–protein interaction subnetworks. Extensive literature review also confirmed many new members of the disease classes and pathways as well as the predicted associations between disease phenotype classes and pathways. PMID:22735708
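For orientation, a generic graph-regularized non-negative matrix tri-factorization can be written as below. This is a textbook-style sketch and not the R-NMTF objective itself, which the abstract does not state; all symbols ($X$, $F$, $S$, $G$, $L_P$, $L_G$, $\alpha$, $\beta$) are assumptions introduced here for illustration, and the supervision by known disease classes and pathways is omitted.

```latex
\min_{F \ge 0,\; S \ge 0,\; G \ge 0}
\; \lVert X - F S G^{\top} \rVert_F^2
\;+\; \alpha \, \operatorname{tr}\!\left(F^{\top} L_P F\right)
\;+\; \beta \, \operatorname{tr}\!\left(G^{\top} L_G G\right)
```

In this reading, $X$ would be the phenotype-by-gene association matrix, the rows of $F$ and $G$ would hold cluster memberships for phenotypes and genes, $S$ would hold the associations between phenotype clusters and gene clusters, and $L_P$ and $L_G$ would be graph Laplacians of the phenotype similarity network and the protein-protein interaction network.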
Selection against canine hip dysplasia: success or failure?
Wilson, Bethany; Nicholas, Frank W; Thomson, Peter C
2011-08-01
Canine hip dysplasia (CHD) is a multifactorial skeletal disorder which is very common in pedigree dogs and represents a huge concern for canine welfare. Control schemes based on selective breeding have been in operation for decades. The aim of these schemes is to reduce the impact of CHD on canine welfare by selecting for reduced radiographic evidence of CHD pathology as assessed by a variety of phenotypes. There is less information regarding the genotypic correlation between these phenotypes and the impact of CHD on canine welfare. Although the phenotypes chosen as the basis for these control schemes have displayed heritable phenotypic variation in many studies, success in achieving improvement in the phenotypes has been mixed. There is significant room for improvement in the current schemes through the use of estimated breeding values (EBVs), which can combine a dog's CHD phenotype with CHD phenotypes of relatives, other phenotypes as they are proven to be genetically correlated with CHD (especially elbow dysplasia phenotypes), and information from genetic tests for population-relevant DNA markers, as such tests become available. Additionally, breed clubs should be encouraged and assisted to formulate rational, evidenced-based breeding recommendations for CHD which suit their individual circumstances and dynamically to adjust the breeding recommendations based on continuous tracking of CHD genetic trends. These improvements can assist in safely and effectively reducing the impact of CHD on pedigree dog welfare. Copyright © 2011 Elsevier Ltd. All rights reserved.
E-MSD: an integrated data resource for bioinformatics.
Velankar, S; McNeil, P; Mittard-Runte, V; Suarez, A; Barrell, D; Apweiler, R; Henrick, K
2005-01-01
The Macromolecular Structure Database (MSD) group (http://www.ebi.ac.uk/msd/) continues to enhance the quality and consistency of macromolecular structure data in the worldwide Protein Data Bank (wwPDB) and to work towards the integration of various bioinformatics data resources. One of the major obstacles to the improved integration of structural databases such as MSD and sequence databases like UniProt is the absence of up to date and well-maintained mapping between corresponding entries. We have worked closely with the UniProt group at the EBI to clean up the taxonomy and sequence cross-reference information in the MSD and UniProt databases. This information is vital for the reliable integration of the sequence family databases such as Pfam and Interpro with the structure-oriented databases of SCOP and CATH. This information has been made available to the eFamily group (http://www.efamily.org.uk/) and now forms the basis of the regular interchange of information between the member databases (MSD, UniProt, Pfam, Interpro, SCOP and CATH). This exchange of annotation information has enriched the structural information in the MSD database with annotation from wider sequence-oriented resources. This work was carried out under the 'Structure Integration with Function, Taxonomy and Sequences (SIFTS)' initiative (http://www.ebi.ac.uk/msd-srv/docs/sifts) in the MSD group.
Initiation of a Database of CEUS Ground Motions for NGA East
NASA Astrophysics Data System (ADS)
Cramer, C. H.
2007-12-01
The Nuclear Regulatory Commission has funded the first stage of development of a database of central and eastern US (CEUS) broadband and accelerograph records, along the lines of the existing Next Generation Attenuation (NGA) database for active tectonic areas. This database will form the foundation of an NGA East project for the development of CEUS ground-motion prediction equations that include the effects of soils. This initial effort covers the development of a database design and the beginning of data collection to populate the database. It also includes some processing for important source parameters (Brune corner frequency and stress drop) and site parameters (kappa, Vs30). Besides collecting appropriate earthquake recordings and information, existing information about site conditions at recording sites will also be gathered, including geology and geotechnical information. The long-range goal of the database development is to complete the database and make it available in 2010. The database design is centered on CEUS ground motion information needs but is built on the Pacific Earthquake Engineering Research Center's (PEER) NGA experience. Documentation from the PEER NGA website was reviewed and relevant fields incorporated into the CEUS database design. CEUS database tables include ones for earthquake, station, component, record, and references. As was done for NGA, a CEUS ground- motion flat file of key information will be extracted from the CEUS database for use in attenuation relation development. A short report on the CEUS database and several initial design-definition files are available at https://umdrive.memphis.edu:443/xythoswfs/webui/_xy-7843974_docstore1. Comments and suggestions on the database design can be sent to the author. More details will be presented in a poster at the meeting.
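The table layout and flat-file extraction described in the abstract above can be sketched with an in-memory SQLite database. The abstract names only the tables (earthquake, station, component, record, references); every column name, the table name `source_ref`, and all inserted values below are hypothetical placeholders, not the project's actual design or data.

```python
import sqlite3

# In-memory database; all column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE earthquake (eq_id INTEGER PRIMARY KEY, name TEXT, magnitude REAL,
                         corner_freq_hz REAL, stress_drop_bar REAL);
CREATE TABLE station    (sta_id INTEGER PRIMARY KEY, code TEXT,
                         kappa_s REAL, vs30_m_s REAL);
CREATE TABLE record     (rec_id INTEGER PRIMARY KEY,
                         eq_id INTEGER REFERENCES earthquake(eq_id),
                         sta_id INTEGER REFERENCES station(sta_id),
                         distance_km REAL);
CREATE TABLE component  (comp_id INTEGER PRIMARY KEY,
                         rec_id INTEGER REFERENCES record(rec_id),
                         orientation TEXT, pga_g REAL);
CREATE TABLE source_ref (ref_id INTEGER PRIMARY KEY, citation TEXT);
""")

# Placeholder rows, purely to make the join below return something.
conn.execute("INSERT INTO earthquake VALUES (1, 'placeholder event', 5.0, 1.0, 100.0)")
conn.execute("INSERT INTO station VALUES (1, 'STA1', 0.01, 760.0)")
conn.execute("INSERT INTO record VALUES (1, 1, 1, 50.0)")
conn.executemany(
    "INSERT INTO component VALUES (?, ?, ?, ?)",
    [(1, 1, "N", 0.02), (2, 1, "E", 0.03)],
)

# The "flat file": one denormalized row per recorded component.
flat_rows = conn.execute("""
SELECT e.name, e.magnitude, s.code, s.vs30_m_s, r.distance_km,
       c.orientation, c.pga_g
FROM component AS c
JOIN record     AS r ON c.rec_id = r.rec_id
JOIN earthquake AS e ON r.eq_id  = e.eq_id
JOIN station    AS s ON r.sta_id = s.sta_id
ORDER BY c.comp_id
""").fetchall()

for row in flat_rows:
    print(row)
```

Keeping source and site parameters in their own tables avoids repeating them for every record, while the joined flat file gives attenuation-relation developers a single table to work from.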
75 FR 29155 - Publicly Available Consumer Product Safety Information Database
Federal Register 2010, 2011, 2012, 2013, 2014
2010-05-24
...The Consumer Product Safety Commission (``Commission,'' ``CPSC,'' or ``we'') is issuing a notice of proposed rulemaking that would establish a publicly available consumer product safety information database (``database''). Section 212 of the Consumer Product Safety Improvement Act of 2008 (``CPSIA'') amended the Consumer Product Safety Act (``CPSA'') to require the Commission to establish and maintain a publicly available, searchable database on the safety of consumer products, and other products or substances regulated by the Commission. The proposed rule would interpret various statutory requirements pertaining to the information to be included in the database and also would establish provisions regarding submitting reports of harm; providing notice of reports of harm to manufacturers; publishing reports of harm and manufacturer comments in the database; and dealing with confidential and materially inaccurate information.
The CIS Database: Occupational Health and Safety Information Online.
ERIC Educational Resources Information Center
Siegel, Herbert; Scurr, Erica
1985-01-01
Describes document acquisition, selection, indexing, and abstracting and discusses online searching of the CIS database, an online system produced by the International Occupational Safety and Health Information Centre. This database comprehensively covers information in the field of occupational health and safety. Sample searches and search…
Knowledge representation in metabolic pathway databases.
Stobbe, Miranda D; Jansen, Gerbert A; Moerland, Perry D; van Kampen, Antoine H C
2014-05-01
The accurate representation of all aspects of a metabolic network in a structured format, such that it can be used for a wide variety of computational analyses, is a challenge faced by a growing number of researchers. Analysis of five major metabolic pathway databases reveals that each database has made widely different choices to address this challenge, including how to deal with knowledge that is uncertain or missing. In concise overviews, we show how concepts such as compartments, enzymatic complexes and the direction of reactions are represented in each database. Importantly, also concepts which a database does not represent are described. Which aspects of the metabolic network need to be available in a structured format and to what detail differs per application. For example, for in silico phenotype prediction, a detailed representation of gene-protein-reaction relations and the compartmentalization of the network is essential. Our analysis also shows that current databases are still limited in capturing all details of the biology of the metabolic network, further illustrated with a detailed analysis of three metabolic processes. Finally, we conclude that the conceptual differences between the databases, which make knowledge exchange and integration a challenge, have not been resolved, so far, by the exchange formats in which knowledge representation is standardized.
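To make the representation concepts named in the abstract above concrete (compartments, enzymatic complexes, reaction direction, gene-protein-reaction relations, missing knowledge), here is a minimal sketch of one possible data structure. It is not the schema of any of the five databases analysed; the class names, fields, and gene and metabolite names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(frozen=True)
class Metabolite:
    name: str
    compartment: str  # e.g. "cytosol"; part of the metabolite's identity here


@dataclass
class Reaction:
    identifier: str
    substrates: List[Metabolite]
    products: List[Metabolite]
    reversible: bool
    # Gene-protein-reaction rule: the outer list holds alternative enzymes (OR),
    # each inner list holds the subunits of one enzymatic complex (AND).
    # An empty rule means the catalysing genes are unknown.
    gpr: List[List[str]] = field(default_factory=list)

    def can_carry_flux(self, expressed_genes: Set[str]) -> Optional[bool]:
        """True/False if the rule decides it, None when knowledge is missing."""
        if not self.gpr:
            return None
        return any(all(gene in expressed_genes for gene in subunits)
                   for subunits in self.gpr)


# Hypothetical reaction catalysed by a two-subunit complex or by one isozyme.
reaction = Reaction(
    identifier="R_example",
    substrates=[Metabolite("substrate_x", "cytosol")],
    products=[Metabolite("product_y", "mitochondrion")],
    reversible=False,
    gpr=[["geneA", "geneB"], ["geneC"]],
)

print(reaction.can_carry_flux({"geneA"}))           # complex incomplete, no isozyme
print(reaction.can_carry_flux({"geneA", "geneB"}))  # complex complete
print(reaction.can_carry_flux({"geneC"}))           # isozyme present
```

Returning `None` rather than `False` for an empty rule keeps "unknown" distinct from "inactive", which is one way to handle the uncertain or missing knowledge the abstract highlights.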
Planform: an application and database of graph-encoded planarian regenerative experiments.
Lobo, Daniel; Malone, Taylor J; Levin, Michael
2013-04-15
Understanding the mechanisms governing the regeneration capabilities of many organisms is a fundamental interest in biology and medicine. An ever-increasing number of manipulation and molecular experiments are attempting to discover a comprehensive model for regeneration, with the planarian flatworm being one of the most important model species. Despite much effort, no comprehensive, constructive, mechanistic models exist yet, and it is now clear that computational tools are needed to mine this huge dataset. However, until now, there is no database of regenerative experiments, and the current genotype-phenotype ontologies and databases are based on textual descriptions, which are not understandable by computers. To overcome these difficulties, we present here Planform (Planarian formalization), a manually curated database and software tool for planarian regenerative experiments, based on a mathematical graph formalism. The database contains more than a thousand experiments from the main publications in the planarian literature. The software tool provides the user with a graphical interface to easily interact with and mine the database. The presented system is a valuable resource for the regeneration community and, more importantly, will pave the way for the application of novel artificial intelligence tools to extract knowledge from this dataset. The database and software tool are freely available at http://planform.daniel-lobo.com.
48 CFR 804.1102 - Vendor Information Pages (VIP) Database.
Code of Federal Regulations, 2011 CFR
2011-10-01
... (VIP) Database. 804.1102 Section 804.1102 Federal Acquisition Regulations System DEPARTMENT OF VETERANS AFFAIRS GENERAL ADMINISTRATIVE MATTERS Contract Execution 804.1102 Vendor Information Pages (VIP) Database. Prior to January 1, 2012, all VOSBs and SDVOSBs must be listed in the VIP database, available at http...
48 CFR 804.1102 - Vendor Information Pages (VIP) Database.
Code of Federal Regulations, 2013 CFR
2013-10-01
... (VIP) Database. 804.1102 Section 804.1102 Federal Acquisition Regulations System DEPARTMENT OF VETERANS AFFAIRS GENERAL ADMINISTRATIVE MATTERS Contract Execution 804.1102 Vendor Information Pages (VIP) Database. Prior to January 1, 2012, all VOSBs and SDVOSBs must be listed in the VIP database, available at http...
48 CFR 804.1102 - Vendor Information Pages (VIP) Database.
Code of Federal Regulations, 2014 CFR
2014-10-01
... (VIP) Database. 804.1102 Section 804.1102 Federal Acquisition Regulations System DEPARTMENT OF VETERANS AFFAIRS GENERAL ADMINISTRATIVE MATTERS Contract Execution 804.1102 Vendor Information Pages (VIP) Database. Prior to January 1, 2012, all VOSBs and SDVOSBs must be listed in the VIP database, available at http...
48 CFR 804.1102 - Vendor Information Pages (VIP) Database.
Code of Federal Regulations, 2012 CFR
2012-10-01
... (VIP) Database. 804.1102 Section 804.1102 Federal Acquisition Regulations System DEPARTMENT OF VETERANS AFFAIRS GENERAL ADMINISTRATIVE MATTERS Contract Execution 804.1102 Vendor Information Pages (VIP) Database. Prior to January 1, 2012, all VOSBs and SDVOSBs must be listed in the VIP database, available at http...
48 CFR 804.1102 - Vendor Information Pages (VIP) Database.
Code of Federal Regulations, 2010 CFR
2010-10-01
... (VIP) Database. 804.1102 Section 804.1102 Federal Acquisition Regulations System DEPARTMENT OF VETERANS AFFAIRS GENERAL ADMINISTRATIVE MATTERS Contract Execution 804.1102 Vendor Information Pages (VIP) Database. Prior to January 1, 2012, all VOSBs and SDVOSBs must be listed in the VIP database, available at http...
Alternative treatment technology information center computer database system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sullivan, D.
1995-10-01
The Alternative Treatment Technology Information Center (ATTIC) computer database system was developed pursuant to the 1986 Superfund law amendments. It provides up-to-date information on innovative treatment technologies to clean up hazardous waste sites. ATTIC v2.0 provides access to several independent databases as well as a mechanism for retrieving full-text documents of key literature. It can be accessed with a personal computer and modem 24 hours a day, and there are no user fees. ATTIC provides "one-stop shopping" for information on alternative treatment options by accessing several databases: (1) treatment technology database; this contains abstracts from the literature on all types of treatment technologies, including biological, chemical, physical, and thermal methods. The best literature as viewed by experts is highlighted. (2) treatability study database; this provides performance information on technologies to remove contaminants from wastewaters and soils. It is derived from treatability studies. This database is available through ATTIC or separately as a disk that can be mailed to you. (3) underground storage tank database; this presents information on underground storage tank corrective actions, surface spills, emergency response, and remedial actions. (4) oil/chemical spill database; this provides abstracts on treatment and disposal of spilled oil and chemicals. In addition to these separate databases, ATTIC allows immediate access to other disk-based systems such as the Vendor Information System for Innovative Treatment Technologies (VISITT) and the Bioremediation in the Field Search System (BFSS). The user may download these programs to their own PC via a high-speed modem. Also via modem, users are able to download entire documents through the ATTIC system. Currently, about fifty publications are available, including Superfund Innovative Technology Evaluation (SITE) program documents.
Potentials of Advanced Database Technology for Military Information Systems
2001-04-01
UNCLASSIFIED Defense Technical Information Center Compilation Part Notice ADP010866 TITLE: Potentials of Advanced Database Technology for Military... Technology for Military Information Systems. Sunil Choenni (a), Ben Bruggeman (b). (a) National Aerospace Laboratory, NLR, P.O. Box 90502, 1006 BM Amsterdam...application of advanced information technology, including database technology, as underpinning is ... actions X and Y as dangerous or not?
TOXCAST, A TOOL FOR CATEGORIZATION AND ...
Across several EPA Program Offices (e.g., OPPTS, OW, OAR), there is a clear need to develop strategies and methods to screen large numbers of chemicals for potential toxicity, and to use the resulting information to prioritize the use of testing resources towards those entities and endpoints that present the greatest likelihood of risk to human health and the environment. This need could be addressed using the experience of the pharmaceutical industry in the use of advanced modern molecular biology and computational chemistry tools for the development of new drugs, with appropriate adjustment to the needs and desires of environmental toxicology. A conceptual approach named ToxCast has been developed to address the needs of EPA Program Offices in the area of prioritization and screening. Modern computational chemistry and molecular biology tools bring enabling technologies forward that can provide information about the physical and biological properties of large numbers of chemicals. The essence of the proposal is to conduct a demonstration project based upon a rich toxicological database (e.g., registered pesticides, or the chemicals tested in the NTP bioassay program), select a fairly large number (50-100 or more chemicals) representative of a number of differing structural classes and phenotypic outcomes (e.g., carcinogens, reproductive toxicants, neurotoxicants), and evaluate them across a broad spectrum of information domains that modern technology has pro
48 CFR 52.204-10 - Reporting Executive Compensation and First-Tier Subcontract Awards.
Code of Federal Regulations, 2013 CFR
2013-10-01
... System for Award Management (SAM) database (FAR provision 52.204-7), the Contractor shall report the... information from SAM and FPDS databases. If FPDS information is incorrect, the contractor should notify the contracting officer. If the SAM database information is incorrect, the contractor is responsible for...
48 CFR 52.204-10 - Reporting Executive Compensation and First-Tier Subcontract Awards.
Code of Federal Regulations, 2014 CFR
2014-10-01
... System for Award Management (SAM) database (FAR provision 52.204-7), the Contractor shall report the... information from SAM and FPDS databases. If FPDS information is incorrect, the contractor should notify the contracting officer. If the SAM database information is incorrect, the contractor is responsible for...
ERIC Educational Resources Information Center
Lundquist, Carol; Frieder, Ophir; Holmes, David O.; Grossman, David
1999-01-01
Describes a scalable, parallel, relational database-driven information retrieval engine. To support portability across a wide range of execution environments, all algorithms adhere to the SQL-92 standard. By incorporating relevance feedback algorithms, accuracy is enhanced over prior database-driven information retrieval efforts. Presents…
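As a rough illustration of what retrieval expressed in standard SQL can look like, the sketch below stores term frequencies in a relational table and ranks documents with a single join and GROUP BY. It is not the engine described in the record: the table names (`doc_term`, `query_term`), the function `rank_documents`, and the plain term-frequency score are all assumptions made for this example, and SQLite stands in for a parallel database.

```python
import sqlite3


def rank_documents(docs, query):
    """Rank documents by summed term frequency of the query terms.

    docs  -- dict mapping a hypothetical integer doc id to its text
    query -- free-text query string
    Returns a list of (doc_id, score) rows, best match first.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per (document, term) with its frequency.
    conn.execute(
        "CREATE TABLE doc_term (doc_id INTEGER, term VARCHAR(64), tf INTEGER)"
    )
    conn.execute("CREATE TABLE query_term (term VARCHAR(64))")

    for doc_id, text in docs.items():
        counts = {}
        for word in text.lower().split():
            counts[word] = counts.get(word, 0) + 1
        conn.executemany(
            "INSERT INTO doc_term VALUES (?, ?, ?)",
            [(doc_id, term, n) for term, n in counts.items()],
        )

    conn.executemany(
        "INSERT INTO query_term VALUES (?)",
        [(term,) for term in set(query.lower().split())],
    )

    # The ranking itself is one join plus aggregation, with no vendor extensions.
    rows = conn.execute(
        "SELECT d.doc_id, SUM(d.tf) AS score "
        "FROM doc_term d, query_term q "
        "WHERE d.term = q.term "
        "GROUP BY d.doc_id "
        "ORDER BY score DESC, d.doc_id"
    ).fetchall()
    conn.close()
    return rows
```

Keeping the ranking inside one SQL statement is what would let such a query run unchanged on different relational back ends; a relevance feedback step could, under the same assumptions, add rows to `query_term` and rerun the query.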
Federal Register 2010, 2011, 2012, 2013, 2014
2012-08-09
... DEPARTMENT OF STATE [Public Notice 7976] 30-Day Notice of Proposed Information Collection: Civilian Response Corps Database In-Processing Electronic Form, OMB Control Number 1405-0168, Form DS-4096.... Title of Information Collection: Civilian Response Corps Database In-Processing Electronic Form. OMB...
Integrated Primary Care Information Database (IPCI)
The Integrated Primary Care Information Database is a longitudinal observational database that was created specifically for pharmacoepidemiological and pharmacoeconomic studies, including data from computer-based patient records supplied voluntarily by general practitioners.
Fernández, José M; Valencia, Alfonso
2004-10-12
Downloading the information stored in relational databases into XML and other flat formats is a common task in bioinformatics. This periodical dumping of information requires considerable CPU time, disk and memory resources. YAdumper has been developed as a purpose-specific tool to deal with the integral structured information download of relational databases. YAdumper is a Java application that organizes database extraction following an XML template based on an external Document Type Declaration. Compared with other non-native alternatives, YAdumper substantially reduces memory requirements and considerably improves writing performance.
Silva, Cristina; Fresco, Paula; Monteiro, Joaquim; Rama, Ana Cristina Ribeiro
2013-08-01
Evidence-Based Practice requires health care decisions to be based on the best available evidence. The "Information Mastery" model proposes that clinicians should use sources of information whose relevance and validity have previously been evaluated, provided at the point of care. Drug databases (DB) allow easy and fast access to information and have the benefit of more frequent content updates. Relevant information, in the context of drug therapy, is that which supports safe and effective use of medicines. Accordingly, the European Guideline on the Summary of Product Characteristics (EG-SmPC) was used as a standard to evaluate the inclusion of relevant information contents in DB. Objective: To develop and test a method to evaluate relevancy of DB contents, by assessing the inclusion of information items deemed relevant for effective and safe drug use. Method: Hierarchical organisation and selection of the principles defined in the EG-SmPC; definition of criteria to assess inclusion of selected information items; creation of a categorisation and quantification system that allows score calculation; calculation of relative differences (RD) of scores for comparison with an "ideal" database, defined as the one that achieves the best quantification possible for each of the information items; pilot test on a sample of 9 drug databases, using 10 drugs frequently associated in the literature with morbidity-mortality and also widely consumed in Portugal. Main outcome measure: Individual and global scores for clinically relevant information items of drug monographs in databases, calculated using the categorisation and quantification system created. Results: A. Method development: selection of sections, subsections, relevant information items and corresponding requisites; system to categorise and quantify their inclusion; score and RD calculation procedure.
B. Pilot test: calculated scores for the 9 databases; globally, all databases evaluated differed significantly from the "ideal" database; some DB performed better, but performance was inconsistent at the subsection level within the same DB. Conclusion: The method developed allows quantification of the inclusion of relevant information items in DB and comparison with an "ideal database". It is necessary to consult diverse DB in order to find all the relevant information needed to support clinical drug use.
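The abstract does not give the scoring formula, so the following is only a sketch of how a relative difference (RD) against an "ideal" database might be computed. It assumes every information item is scored on the same 0 to `max_per_item` scale and that RD is the shortfall from the ideal total divided by the ideal total; the function name, the item names, and the scale are hypothetical.

```python
def relative_difference(item_scores, max_per_item):
    """Return the relative difference between a database and the ideal one.

    item_scores  -- dict mapping a hypothetical information item to its score
    max_per_item -- best possible score for a single item (assumed uniform)
    The "ideal" database scores max_per_item on every item, so RD is 0.0
    for an ideal database and 1.0 for one that includes nothing.
    """
    if not item_scores:
        raise ValueError("at least one information item is required")
    ideal_total = max_per_item * len(item_scores)
    actual_total = sum(item_scores.values())
    return (ideal_total - actual_total) / ideal_total
```

For instance, under these assumptions a monograph scored 2, 1 and 0 on three items with a maximum of 2 per item reaches half of the ideal total, giving an RD of 0.5.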
Reijnders, Margot R F; Miller, Kerry A; Alvi, Mohsan; Goos, Jacqueline A C; Lees, Melissa M; de Burca, Anna; Henderson, Alex; Kraus, Alison; Mikat, Barbara; de Vries, Bert B A; Isidor, Bertrand; Kerr, Bronwyn; Marcelis, Carlo; Schluth-Bolard, Caroline; Deshpande, Charu; Ruivenkamp, Claudia A L; Wieczorek, Dagmar; Baralle, Diana; Blair, Edward M; Engels, Hartmut; Lüdecke, Hermann-Josef; Eason, Jacqueline; Santen, Gijs W E; Clayton-Smith, Jill; Chandler, Kate; Tatton-Brown, Katrina; Payne, Katelyn; Helbig, Katherine; Radtke, Kelly; Nugent, Kimberly M; Cremer, Kirsten; Strom, Tim M; Bird, Lynne M; Sinnema, Margje; Bitner-Glindzicz, Maria; van Dooren, Marieke F; Alders, Marielle; Koopmans, Marije; Brick, Lauren; Kozenko, Mariya; Harline, Megan L; Klaassens, Merel; Steinraths, Michelle; Cooper, Nicola S; Edery, Patrick; Yap, Patrick; Terhal, Paulien A; van der Spek, Peter J; Lakeman, Phillis; Taylor, Rachel L; Littlejohn, Rebecca O; Pfundt, Rolph; Mercimek-Andrews, Saadet; Stegmann, Alexander P A; Kant, Sarina G; McLean, Scott; Joss, Shelagh; Swagemakers, Sigrid M A; Douzgou, Sofia; Wall, Steven A; Küry, Sébastien; Calpena, Eduardo; Koelling, Nils; McGowan, Simon J; Twigg, Stephen R F; Mathijssen, Irene M J; Nellaker, Christoffer; Brunner, Han G; Wilkie, Andrew O M
2018-06-07
Next-generation sequencing is a powerful tool for the discovery of genes related to neurodevelopmental disorders (NDDs). Here, we report the identification of a distinct syndrome due to de novo or inherited heterozygous mutations in Tousled-like kinase 2 (TLK2) in 38 unrelated individuals and two affected mothers, using whole-exome and whole-genome sequencing technologies, matchmaker databases, and international collaborations. Affected individuals had a consistent phenotype, characterized by mild-borderline neurodevelopmental delay (86%), behavioral disorders (68%), severe gastro-intestinal problems (63%), and facial dysmorphism including blepharophimosis (82%), telecanthus (74%), prominent nasal bridge (68%), broad nasal tip (66%), thin vermilion of the upper lip (62%), and upslanting palpebral fissures (55%). Analysis of cell lines from three affected individuals showed that mutations act through a loss-of-function mechanism in at least two case subjects. Genotype-phenotype analysis and comparison of computationally modeled faces showed that phenotypes of these and other individuals with loss-of-function variants significantly overlapped with phenotypes of individuals with other variant types (missense and C-terminal truncating). This suggests that haploinsufficiency of TLK2 is the most likely underlying disease mechanism, leading to a consistent neurodevelopmental phenotype. This work illustrates the power of international data sharing, by the identification of 40 individuals from 26 different centers in 7 different countries, allowing the identification, clinical delineation, and genotype-phenotype evaluation of a distinct NDD caused by mutations in TLK2. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
Bueno, Anibal; Rodríguez-López, Rocío; Reyes-Palomares, Armando; Rojano, Elena; Corpas, Manuel; Nevado, Julián; Lapunzina, Pablo; Sánchez-Jiménez, Francisca; Ranea, Juan A G
2018-06-26
Copy number variations (CNVs) are genomic structural variations (deletions, duplications, or translocations) that represent 4.8-9.5% of human genome variation in healthy individuals. In some cases, CNVs can also lead to disease, being the etiology of many known rare genetic/genomic disorders. Despite recent advances in genomic sequencing and diagnosis, the pathological effects of many rare genetic variations remain unresolved, largely due to the low number of patients available for these cases, making it difficult to identify consistent patterns of genotype-phenotype relationships. We aimed to improve the identification of statistically consistent genotype-phenotype relationships by integrating all the genetic and clinical data of thousands of patients with rare genomic disorders (obtained from the DECIPHER database) into a phenotype-patient-genotype tripartite network. Then we assessed how our network approach could help in the characterization and diagnosis of novel cases in clinical genetics. The systematic approach implemented in this work is able to better define the relationships between phenotypes and specific loci, by exploiting large-scale association networks of phenotypes and genotypes in thousands of rare disease patients. The application of the described methodology facilitated the diagnosis of novel clinical cases, ranking phenotypes by locus specificity and reporting putative new clinical features that may suggest additional clinical follow-ups. In this work, the proof of concept developed over a set of novel clinical cases demonstrates that this network-based methodology might help improve the precision of patient clinical records and the characterization of rare syndromes.
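A phenotype-patient-genotype tripartite network can be pictured as two mappings that share the patient layer. The sketch below is a minimal, assumed representation, not the authors' implementation: it walks the patient layer and counts how many patients link each phenotype to each locus. The function name and all identifiers in the usage note are placeholders, not DECIPHER data, and the statistical assessment of consistency described in the abstract is not reproduced.

```python
from collections import defaultdict


def phenotype_locus_counts(patient_phenotypes, patient_loci):
    """Count patients connecting each (phenotype, locus) pair.

    patient_phenotypes -- dict: patient id -> set of phenotype labels
    patient_loci       -- dict: patient id -> set of affected loci
    Returns a dict mapping (phenotype, locus) to the number of patients
    who carry both, i.e. the number of two-step paths through the
    patient layer of the tripartite network.
    """
    counts = defaultdict(int)
    for patient, phenotypes in patient_phenotypes.items():
        # Patients without genotype data contribute no paths.
        for locus in patient_loci.get(patient, ()):
            for phenotype in phenotypes:
                counts[(phenotype, locus)] += 1
    return dict(counts)
```

With three placeholder patients, two of whom share a phenotype and a locus, the shared pair receives a count of 2; such counts would be the raw input to any later specificity ranking.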
Database Systems. Course Three. Information Systems Curriculum.
ERIC Educational Resources Information Center
O'Neil, Sharon Lund; Everett, Donna R.
This course is the third of seven in the Information Systems curriculum. The purpose of the course is to familiarize students with database management concepts and standard database management software. Databases and their roles, advantages, and limitations are explained. An overview of the course sets forth the condition and performance standard…
40 CFR 1400.13 - Read-only database.
Code of Federal Regulations, 2012 CFR
2012-07-01
... 40 Protection of Environment 34 2012-07-01 2012-07-01 false Read-only database. 1400.13 Section... INFORMATION Other Provisions § 1400.13 Read-only database. The Administrator is authorized to establish... public off-site consequence analysis information by means of a central database under the control of the...
Tourism through Travel Club: A Database Project
ERIC Educational Resources Information Center
Pratt, Renée M. E.; Smatt, Cindi T.; Wynn, Donald E.
2017-01-01
This applied database exercise utilizes a scenario-based case study to teach the basics of Microsoft Access and database management in introduction to information systems and introduction to database courses. The case includes background information on a start-up business (i.e., Carol's Travel Club), description of functional business requirements,…
40 CFR 1400.13 - Read-only database.
Code of Federal Regulations, 2014 CFR
2014-07-01
... 40 Protection of Environment 33 2014-07-01 2014-07-01 false Read-only database. 1400.13 Section... INFORMATION Other Provisions § 1400.13 Read-only database. The Administrator is authorized to establish... public off-site consequence analysis information by means of a central database under the control of the...
40 CFR 1400.13 - Read-only database.
Code of Federal Regulations, 2011 CFR
2011-07-01
... 40 Protection of Environment 33 2011-07-01 2011-07-01 false Read-only database. 1400.13 Section... INFORMATION Other Provisions § 1400.13 Read-only database. The Administrator is authorized to establish... public off-site consequence analysis information by means of a central database under the control of the...
Federal Register 2010, 2011, 2012, 2013, 2014
2010-01-29
...; Comment Request Clinical Trials Reporting Program (CTRP) Database (NCI) Summary: Under the provisions of... Collection: Title: Clinical Trials Reporting Program (CTRP) Database. Type of Information Collection Request... Program (CTRP) Database, to serve as a single, definitive source of information about all NCI-supported...
40 CFR 1400.13 - Read-only database.
Code of Federal Regulations, 2013 CFR
2013-07-01
... 40 Protection of Environment 34 2013-07-01 2013-07-01 false Read-only database. 1400.13 Section... INFORMATION Other Provisions § 1400.13 Read-only database. The Administrator is authorized to establish... public off-site consequence analysis information by means of a central database under the control of the...
Marine and Hydrokinetic Data | Geospatial Data Science | NREL
... wave energy resource using a 51-month Wavewatch III hindcast database developed by the National ... Database: The U.S. Department of Energy's Marine and Hydrokinetic Technology Database provides information ... database includes wave, tidal, current, and ocean thermal energy and contains information about energy ...
19 CFR 351.304 - Establishing business proprietary treatment of information.
Code of Federal Regulations, 2012 CFR
2012-04-01
... information. 351.304 Section 351.304 Customs Duties INTERNATIONAL TRADE ADMINISTRATION, DEPARTMENT OF COMMERCE...) Electronic databases. In accordance with § 351.303(c)(3), an electronic database need not contain brackets... in the database. The public version of the database must be publicly summarized and ranged in...