Sample records for model organism database

  1. Re-thinking organisms: The impact of databases on model organism biology.

    PubMed

    Leonelli, Sabina; Ankeny, Rachel A

    2012-03-01

    Community databases have become crucial to the collection, ordering and retrieval of data gathered on model organisms, as well as to the ways in which these data are interpreted and used across a range of research contexts. This paper analyses the impact of community databases on research practices in model organism biology by focusing on the history and current use of four community databases: FlyBase, Mouse Genome Informatics, WormBase and The Arabidopsis Information Resource. We discuss the standards used by the curators of these databases for what counts as reliable evidence, acceptable terminology, appropriate experimental set-ups and adequate materials (e.g., specimens). On the one hand, these choices are informed by the collaborative research ethos characterising most model organism communities. On the other hand, the deployment of these standards in databases reinforces this ethos and gives it concrete and precise instantiations by shaping the skills, practices, values and background knowledge required of the database users. We conclude that the increasing reliance on community databases as vehicles to circulate data is having a major impact on how researchers conduct and communicate their research, which affects how they understand the biology of model organisms and its relation to the biology of other species. Copyright © 2011 Elsevier Ltd. All rights reserved.

  2. GMODWeb: a web framework for the generic model organism database

    PubMed Central

    O'Connor, Brian D; Day, Allen; Cain, Scott; Arnaiz, Olivier; Sperling, Linda; Stein, Lincoln D

    2008-01-01

    The Generic Model Organism Database (GMOD) initiative provides species-agnostic data models and software tools for representing curated model organism data. Here we describe GMODWeb, a GMOD project designed to speed the development of model organism database (MOD) websites. Sites created with GMODWeb provide integration with other GMOD tools and allow users to browse and search through a variety of data types. GMODWeb was built using the open source Turnkey web framework and is available from . PMID:18570664

  3. Using semantic data modeling techniques to organize an object-oriented database for extending the mass storage model

    NASA Technical Reports Server (NTRS)

    Campbell, William J.; Short, Nicholas M., Jr.; Roelofs, Larry H.; Dorfman, Erik

    1991-01-01

    A methodology for optimizing organization of data obtained by NASA earth and space missions is discussed. The methodology uses a concept based on semantic data modeling techniques implemented in a hierarchical storage model. The modeling is used to organize objects in mass storage devices, relational database systems, and object-oriented databases. The semantic data modeling at the metadata record level is examined, including the simulation of a knowledge base and semantic metadata storage issues. The semantic data model hierarchy and its application for efficient data storage are addressed, as is the mapping of the application structure to the mass storage.

  4. MaizeGDB update: New tools, data, and interface for the maize model organism database

    USDA-ARS's Scientific Manuscript database

    MaizeGDB is a highly curated, community-oriented database and informatics service to researchers focused on the crop plant and model organism Zea mays ssp. mays. Although some form of the maize community database has existed over the last 25 years, there have only been two major releases. In 1991, ...

  5. The Microphysiology Systems Database for Analyzing and Modeling Compound Interactions with Human and Animal Organ Models

    PubMed Central

    Vernetti, Lawrence; Bergenthal, Luke; Shun, Tong Ying; Taylor, D. Lansing

    2016-01-01

    Microfluidic human organ models, microphysiology systems (MPS), are currently being developed as predictive models of drug safety and efficacy in humans. Designing and validating MPS as predictive of human safety liabilities requires safety data for a reference set of compounds, combined with in vitro data from the human organ models. To address this need, we have developed an internet database, the MPS database (MPS-Db), as a powerful platform for experimental design, data management, and analysis, and to combine experimental data with reference data, to enable computational modeling. The present study demonstrates the capability of the MPS-Db in early safety testing using a human liver MPS to relate the effects of tolcapone and entacapone in the in vitro model to human in vivo effects. These two compounds were chosen to be evaluated as a representative pair of marketed drugs because they are structurally similar, have the same target, and were found safe or had an acceptable risk in preclinical and clinical trials, yet tolcapone induced unacceptable levels of hepatotoxicity while entacapone was found to be safe. Results demonstrate the utility of the MPS-Db as an essential resource for relating in vitro organ model data to the multiple biochemical, preclinical, and clinical data sources on in vivo drug effects. PMID:28781990

  6. IntPath--an integrated pathway gene relationship database for model organisms and important pathogens

    PubMed Central

    2012-01-01

    Background: Pathway data are important for understanding the relationship between genes, proteins and many other molecules in living organisms. Pathway gene relationships are crucial information for guidance, prediction, reference and assessment in biochemistry, computational biology, and medicine. Many well-established databases (e.g., KEGG, WikiPathways, and BioCyc) are dedicated to collecting pathway data for public access. However, the effectiveness of these databases is hindered by issues such as incompatible data formats, inconsistent molecular representations, inconsistent molecular relationship representations, inconsistent referrals to pathway names, and incomprehensive data from different databases. Results: In this paper, we overcome these issues through extraction, normalization and integration of pathway data from several major public databases (KEGG, WikiPathways, BioCyc, etc). We build a database that not only hosts our integrated pathway gene relationship data for public access but also maintains the necessary updates in the long run. This public repository is named IntPath (Integrated Pathway gene relationship database for model organisms and important pathogens). Four organisms (S. cerevisiae, M. tuberculosis H37Rv, H. sapiens and M. musculus) are included in this version (V2.0) of IntPath. IntPath uses the "full unification" approach to ensure no deletion and no introduced noise in this process. Therefore, IntPath contains much richer pathway-gene and pathway-gene pair relationships and a much larger number of non-redundant genes and gene pairs than any of the single-source databases. The gene relationships of each gene (measured by average node degree) per pathway are significantly richer. The gene relationships in each pathway (measured by average number of gene pairs per pathway) are also considerably richer in the integrated pathways. Moderate manual curation is involved to remove errors and noise from source data (e.g., the gene ID errors

  7. IntPath--an integrated pathway gene relationship database for model organisms and important pathogens.

    PubMed

    Zhou, Hufeng; Jin, Jingjing; Zhang, Haojun; Yi, Bo; Wozniak, Michal; Wong, Limsoon

    2012-01-01

    Pathway data are important for understanding the relationship between genes, proteins and many other molecules in living organisms. Pathway gene relationships are crucial information for guidance, prediction, reference and assessment in biochemistry, computational biology, and medicine. Many well-established databases (e.g., KEGG, WikiPathways, and BioCyc) are dedicated to collecting pathway data for public access. However, the effectiveness of these databases is hindered by issues such as incompatible data formats, inconsistent molecular representations, inconsistent molecular relationship representations, inconsistent referrals to pathway names, and incomprehensive data from different databases. In this paper, we overcome these issues through extraction, normalization and integration of pathway data from several major public databases (KEGG, WikiPathways, BioCyc, etc). We build a database that not only hosts our integrated pathway gene relationship data for public access but also maintains the necessary updates in the long run. This public repository is named IntPath (Integrated Pathway gene relationship database for model organisms and important pathogens). Four organisms (S. cerevisiae, M. tuberculosis H37Rv, H. sapiens and M. musculus) are included in this version (V2.0) of IntPath. IntPath uses the "full unification" approach to ensure no deletion and no introduced noise in this process. Therefore, IntPath contains much richer pathway-gene and pathway-gene pair relationships and a much larger number of non-redundant genes and gene pairs than any of the single-source databases. The gene relationships of each gene (measured by average node degree) per pathway are significantly richer. The gene relationships in each pathway (measured by average number of gene pairs per pathway) are also considerably richer in the integrated pathways. Moderate manual curation is involved to remove errors and noise from source data (e.g., the gene ID errors in WikiPathways and

  8. The Zebrafish Model Organism Database: new support for human disease models, mutation details, gene expression phenotypes and searching

    PubMed Central

    Howe, Douglas G.; Bradford, Yvonne M.; Eagle, Anne; Fashena, David; Frazer, Ken; Kalita, Patrick; Mani, Prita; Martin, Ryan; Moxon, Sierra Taylor; Paddock, Holly; Pich, Christian; Ramachandran, Sridhar; Ruzicka, Leyla; Schaper, Kevin; Shao, Xiang; Singer, Amy; Toro, Sabrina; Van Slyke, Ceri; Westerfield, Monte

    2017-01-01

    The Zebrafish Model Organism Database (ZFIN; http://zfin.org) is the central resource for zebrafish (Danio rerio) genetic, genomic, phenotypic and developmental data. ZFIN curators provide expert manual curation and integration of comprehensive data involving zebrafish genes, mutants, transgenic constructs and lines, phenotypes, genotypes, gene expressions, morpholinos, TALENs, CRISPRs, antibodies, anatomical structures, models of human disease and publications. We integrate curated, directly submitted, and collaboratively generated data, making these available to the zebrafish research community. Among the vertebrate model organisms, zebrafish are superbly suited for rapid generation of sequence-targeted mutant lines, characterization of phenotypes including gene expression patterns, and generation of human disease models. The recent rapid adoption of zebrafish as human disease models is making management of these data particularly important to both the research and clinical communities. Here, we describe recent enhancements to ZFIN including use of the zebrafish experimental conditions ontology, ‘Fish’ records in the ZFIN database, support for gene expression phenotypes, models of human disease, mutation details at the DNA, RNA and protein levels, and updates to the ZFIN single box search. PMID:27899582

  9. Curation accuracy of model organism databases

    PubMed Central

    Keseler, Ingrid M.; Skrzypek, Marek; Weerasinghe, Deepika; Chen, Albert Y.; Fulcher, Carol; Li, Gene-Wei; Lemmer, Kimberly C.; Mladinich, Katherine M.; Chow, Edmond D.; Sherlock, Gavin; Karp, Peter D.

    2014-01-01

    Manual extraction of information from the biomedical literature, or biocuration, is the central methodology used to construct many biological databases. For example, the UniProt protein database, the EcoCyc Escherichia coli database and the Candida Genome Database (CGD) are all based on biocuration. Biological databases are used extensively by life science researchers, as online encyclopedias, as aids in the interpretation of new experimental data and as gold standards for the development of new bioinformatics algorithms. Although manual curation has been assumed to be highly accurate, we are aware of only one previous study of biocuration accuracy. We assessed the accuracy of EcoCyc and CGD by manually selecting curated assertions within randomly chosen EcoCyc and CGD gene pages and by then validating that the data found in the referenced publications supported those assertions. A database assertion is considered to be in error if that assertion could not be found in the publication cited for that assertion. We identified 10 errors in the 633 facts that we validated across the two databases, for an overall error rate of 1.58%, and individual error rates of 1.82% for CGD and 1.40% for EcoCyc. These data suggest that manual curation of the experimental literature by PhD-level scientists is highly accurate. Database URL: http://ecocyc.org/, http://www.candidagenome.org/ PMID:24923819
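
The overall error rate in the record above follows from the two counts the abstract reports. Below is a minimal sketch of that arithmetic, assuming only the 10 errors and 633 validated facts stated there. The function name is illustrative, and the per-database counts are not given in the abstract, so the individual CGD and EcoCyc rates are not recomputed.

```python
def error_rate_percent(errors: int, validated: int) -> float:
    """Return the share of validated assertions found in error, as a percentage."""
    if validated <= 0:
        raise ValueError("validated must be a positive count")
    return 100.0 * errors / validated


# Counts taken from the abstract: 10 errors among 633 validated facts.
overall = error_rate_percent(10, 633)
print(f"{overall:.2f}%")
```

Rounded to two decimal places, this matches the overall rate quoted in the abstract.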

  10. Building a Database for a Quantitative Model

    NASA Technical Reports Server (NTRS)

    Kahn, C. Joseph; Kleinhammer, Roger

    2014-01-01

    A database can greatly benefit a quantitative analysis. The defining characteristic of a quantitative risk, or reliability, model is the use of failure estimate data. Models can easily contain a thousand Basic Events, relying on hundreds of individual data sources. Obviously, entering so much data by hand will eventually lead to errors. Less obviously, entering data this way does not help link the Basic Events to the data sources. The best way to organize large amounts of data on a computer is with a database. But a model does not require a large, enterprise-level database with dedicated developers and administrators. A database built in Excel can be quite sufficient. A simple spreadsheet database can link every Basic Event to the individual data source selected for it. This database can also contain the manipulations appropriate for how the data is used in the model. These manipulations include stressing factors based on use and maintenance cycles, dormancy, unique failure modes, the modeling of multiple items as a single "Super component" Basic Event, and Bayesian Updating based on flight and testing experience. A simple, unique metadata field in both the model and database provides a link from any Basic Event in the model to its data source and all relevant calculations. The credibility of the entire model often rests on the credibility and traceability of the data.
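
The linking idea in the record above, a unique metadata field shared by the model and the database, can be illustrated with a small sketch. This is not the authors' spreadsheet. It assumes a hypothetical key named `source_id`, made-up source entries and rates, and a single multiplicative stressing factor standing in for the manipulations the abstract lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    source_id: str              # hypothetical unique key shared with the model
    description: str
    failure_rate: float         # illustrative units: failures per hour
    stress_factor: float = 1.0  # assumed multiplier for use and maintenance conditions


@dataclass(frozen=True)
class BasicEvent:
    name: str
    source_id: str              # the link back to the data source


def build_index(sources):
    """Index data sources by key and reject duplicate keys so every link is unambiguous."""
    index = {}
    for source in sources:
        if source.source_id in index:
            raise ValueError(f"duplicate source_id: {source.source_id}")
        index[source.source_id] = source
    return index


def adjusted_rate(event, index):
    """Look up the event's data source and apply its stressing factor."""
    source = index[event.source_id]  # raises KeyError if the event is not traceable
    return source.failure_rate * source.stress_factor


# Made-up entries for illustration only.
sources = [
    DataSource("SRC-001", "hypothetical pump handbook entry", 2.0e-6, stress_factor=1.5),
    DataSource("SRC-002", "hypothetical valve test report", 5.0e-7),
]
index = build_index(sources)
event = BasicEvent("PUMP-A fails to run", "SRC-001")
print(adjusted_rate(event, index))
```

Because each Basic Event stores only the key, the data source and any adjustment applied to it stay in one place, which is the traceability property the abstract emphasizes.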

  11. Integration of an Evidence Base into a Probabilistic Risk Assessment Model. The Integrated Medical Model Database: An Organized Evidence Base for Assessing In-Flight Crew Health Risk and System Design

    NASA Technical Reports Server (NTRS)

    Saile, Lynn; Lopez, Vilma; Bickham, Grandin; FreiredeCarvalho, Mary; Kerstman, Eric; Byrne, Vicky; Butler, Douglas; Myers, Jerry; Walton, Marlei

    2011-01-01

    This slide presentation reviews the Integrated Medical Model (IMM) database, which is an organized evidence base for assessing in-flight crew health risk. The database is a relational database accessible to many people. The database quantifies the model inputs by a ranking based on the highest value of the data as Level of Evidence (LOE) and the quality of evidence (QOE) score that provides an assessment of the evidence base for each medical condition. The IMM evidence base has already been able to provide invaluable information for designers, and for other uses.

  12. Development and Mining of a Volatile Organic Compound Database

    PubMed Central

    Abdullah, Azian Azamimi; Ono, Naoaki; Sugiura, Tadao; Morita, Aki Hirai; Katsuragi, Tetsuo; Muto, Ai; Nishioka, Takaaki; Kanaya, Shigehiko

    2015-01-01

    Volatile organic compounds (VOCs) are small molecules that exhibit high vapor pressure under ambient conditions and have low boiling points. Although VOCs contribute only a small proportion of the total metabolites produced by living organisms, they play an important role in chemical ecology, specifically in the biological interactions between organisms and ecosystems. VOCs are also important in the health care field, as they are presently used as biomarkers to detect various human diseases. Information on VOCs has so far been scattered in the literature, and there is still no available database describing VOCs and their biological activities. To address this, we have developed the KNApSAcK Metabolite Ecology Database, which contains information on the relationships between VOCs and their emitting organisms. The KNApSAcK Metabolite Ecology Database is also linked with the KNApSAcK Core and KNApSAcK Metabolite Activity Database to provide further information on the metabolites and their biological activities. The VOC database can be accessed online. PMID:26495281

  13. Modeling and Databases for Teaching Petrology

    NASA Astrophysics Data System (ADS)

    Asher, P.; Dutrow, B.

    2003-12-01

    With the widespread availability of high-speed computers with massive storage and ready transport capability of large amounts of data, computational and petrologic modeling and the use of databases provide new tools with which to teach petrology. Modeling can be used to gain insights into a system, predict system behavior, describe a system's processes, compare with a natural system or simply to be illustrative. These aspects result from data driven or empirical, analytical or numerical models or the concurrent examination of multiple lines of evidence. At the same time, use of models can enhance core foundations of the geosciences by improving critical thinking skills and by reinforcing prior knowledge gained. However, the use of modeling to teach petrology is dictated by the level of expectation we have for students and their facility with modeling approaches. For example, do we expect students to push buttons and navigate a program, understand the conceptual model and/or evaluate the results of a model? Whatever the desired level of sophistication, specific elements of design should be incorporated into a modeling exercise for effective teaching. These include, but are not limited to: use of the scientific method, use of prior knowledge, a clear statement of purpose and goals, attainable goals, a connection to the natural/actual system, a demonstration that complex heterogeneous natural systems are amenable to analyses by these techniques and, ideally, connections to other disciplines and the larger earth system. Databases offer another avenue with which to explore petrology. Large datasets are available that allow integration of multiple lines of evidence to attack a petrologic problem or understand a petrologic process. These are collected into a database that offers a tool for exploring, organizing and analyzing the data. For example, datasets may be geochemical, mineralogic, experimental and/or visual in nature, covering global, regional to local scales

  14. Community Organizing for Database Trial Buy-In by Patrons

    ERIC Educational Resources Information Center

    Pionke, J. J.

    2015-01-01

    Database trials do not often garner a lot of feedback. Using community-organizing techniques can not only potentially increase the amount of feedback received but also deepen the relationship between the librarian and his or her constituent group. This is a case study of the use of community-organizing techniques in a series of database trials for…

  15. MODBASE, a database of annotated comparative protein structure models

    PubMed Central

    Pieper, Ursula; Eswar, Narayanan; Stuart, Ashley C.; Ilyin, Valentin A.; Sali, Andrej

    2002-01-01

    MODBASE (http://guitar.rockefeller.edu/modbase) is a relational database of annotated comparative protein structure models for all available protein sequences matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on PSI-BLAST, IMPALA and MODELLER. MODBASE uses the MySQL relational database management system for flexible and efficient querying, and the MODVIEW Netscape plugin for viewing and manipulating multiple sequences and structures. It is updated regularly to reflect the growth of the protein sequence and structure databases, as well as improvements in the software for calculating the models. For ease of access, MODBASE is organized into different datasets. The largest dataset contains models for domains in 304 517 out of 539 171 unique protein sequences in the complete TrEMBL database (23 March 2001); only models based on significant alignments (PSI-BLAST E-value < 10^-4) and models assessed to have the correct fold are included. Other datasets include models for target selection and structure-based annotation by the New York Structural Genomics Research Consortium, models for prediction of genes in the Drosophila melanogaster genome, models for structure determination of several ribosomal particles and models calculated by the MODWEB comparative modeling web server. PMID:11752309

  16. DEPOT: A Database of Environmental Parameters, Organizations and Tools

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Carson, Susan D.; Hunter, Regina Lee; Malczynski, Leonard A.

    2000-12-19

    The Database of Environmental Parameters, Organizations, and Tools (DEPOT) has been developed by the Department of Energy (DOE) as a central warehouse for access to data essential for environmental risk assessment analyses. Initial efforts have concentrated on groundwater and vadose zone transport data and bioaccumulation factors. DEPOT seeks to provide a source of referenced data that, wherever possible, includes the level of uncertainty associated with these parameters. Based on the amount of data available for a particular parameter, uncertainty is expressed as a standard deviation or a distribution function. DEPOT also provides DOE site-specific performance assessment data, pathway-specific transport data, and links to environmental regulations, disposal site waste acceptance criteria, other environmental parameter databases, and environmental risk assessment models.

  17. Design and Establishment of Quality Model of Fundamental Geographic Information Database

    NASA Astrophysics Data System (ADS)

    Ma, W.; Zhang, J.; Zhao, Y.; Zhang, P.; Dang, Y.; Zhao, T.

    2018-04-01

    In order to make the quality evaluation of Fundamental Geographic Information Databases (FGIDB) more comprehensive, objective and accurate, this paper studies and establishes a quality model of FGIDB, which is formed by the standardization of database construction and quality control, the conformity of data set quality and the functionality of the database management system. It also designs the overall principles, contents and methods of quality evaluation for FGIDB, providing the basis and reference for carrying out quality control and quality evaluation for FGIDB. This paper designs the quality elements, evaluation items and properties of the Fundamental Geographic Information Database step by step, based on the quality model framework. Connected organically, these quality elements and evaluation items constitute the quality model of the Fundamental Geographic Information Database. This model is the foundation for the quality demand stipulation and quality evaluation of the Fundamental Geographic Information Database, and is of great significance for quality assurance in the design and development stage, demand formulation in the testing evaluation stage, and the construction of a standard system for quality evaluation technology of the Fundamental Geographic Information Database.

  18. The CoFactor database: organic cofactors in enzyme catalysis.

    PubMed

    Fischer, Julia D; Holliday, Gemma L; Thornton, Janet M

    2010-10-01

    Organic enzyme cofactors are involved in many enzyme reactions. Therefore, the analysis of cofactors is crucial to gain a better understanding of enzyme catalysis. To aid this, we have created the CoFactor database. CoFactor provides a web interface to access hand-curated data extracted from the literature on organic enzyme cofactors in biocatalysis, as well as automatically collected information. CoFactor includes information on the conformational and solvent accessibility variation of the enzyme-bound cofactors, as well as mechanistic and structural information about the hosting enzymes. The database is publicly available and can be accessed at http://www.ebi.ac.uk/thornton-srv/databases/CoFactor.

  19. Software Engineering Laboratory (SEL) database organization and user's guide

    NASA Technical Reports Server (NTRS)

    So, Maria; Heller, Gerard; Steinberg, Sandra; Spiegel, Douglas

    1989-01-01

    The organization of the Software Engineering Laboratory (SEL) database is presented. Included are definitions and detailed descriptions of the database tables and views, the SEL data, and system support data. The mapping from the SEL and system support data to the base tables is described. In addition, techniques for accessing the database, through the Database Access Manager for the SEL (DAMSEL) system and via the ORACLE structured query language (SQL), are discussed.

  20. The PMDB Protein Model Database

    PubMed Central

    Castrignanò, Tiziana; De Meo, Paolo D'Onorio; Cozzetto, Domenico; Talamo, Ivano Giuseppe; Tramontano, Anna

    2006-01-01

    The Protein Model Database (PMDB) is a public resource aimed at storing manually built 3D models of proteins. The database is designed to provide access to models published in the scientific literature, together with validating experimental data. It is a relational database and it currently contains >74 000 models for ∼240 proteins. The system is accessible at and allows predictors to submit models along with related supporting evidence and users to download them through a simple and intuitive interface. Users can navigate in the database and retrieve models referring to the same target protein or to different regions of the same protein. Each model is assigned a unique identifier that allows interested users to directly access the data. PMID:16381873

  1. Organizing a breast cancer database: data management.

    PubMed

    Yi, Min; Hunt, Kelly K

    2016-06-01

    Developing and organizing a breast cancer database can provide data and serve as valuable research tools for those interested in the etiology, diagnosis, and treatment of cancer. Depending on the research setting, the quality of the data can be a major issue. Assuring that the data collection process does not contribute inaccuracies can help to assure the overall quality of subsequent analyses. Data management is work that involves the planning, development, implementation, and administration of systems for the acquisition, storage, and retrieval of data while protecting it by implementing high security levels. A properly designed database provides you with access to up-to-date, accurate information. Database design is an important component of application design. If you take the time to design your databases properly, you'll be rewarded with a solid application foundation on which you can build the rest of your application.

  2. Development of a database for chemical mechanism assignments for volatile organic emissions.

    PubMed

    Carter, William P L

    2015-10-01

    The development of a database for making model species assignments when preparing total organic gas (TOG) emissions input for atmospheric models is described. This database currently has assignments of model species for 12 different gas-phase chemical mechanisms for over 1700 chemical compounds and covers over 3000 chemical categories used in five different anthropogenic TOG profile databases or output by two different biogenic emissions models. This involved developing a unified chemical classification system, assigning compounds to mixtures, assigning model species for the mechanisms to the compounds, and making assignments for unknown, unassigned, and nonvolatile mass. The comprehensiveness of the assignments, the contributions of various types of speciation categories to current profile and total emissions data, inconsistencies with existing undocumented model species assignments, and remaining speciation issues and areas of needed work are also discussed. The use of the system to prepare input for SMOKE, the Speciation Tool, and for biogenic models is described in the supplementary materials. The database, associated programs and files, and a user's manual are available online at http://www.cert.ucr.edu/~carter/emitdb. Assigning air quality model species to the hundreds of emitted chemicals is a necessary link between emissions data and modeling effects of emissions on air quality. This is not easy and makes it difficult to implement new and more chemically detailed mechanisms in models. If done incorrectly, the effect is similar to errors in emissions speciation or the chemical mechanism used. Nevertheless, making such assignments is often an afterthought in chemical mechanism development and emissions processing, and existing assignments are usually undocumented and have errors and inconsistencies. This work is designed to address some of these problems.

  3. MaizeGDB, the maize model organism database

    USDA-ARS's Scientific Manuscript database

    MaizeGDB is the maize research community's database for maize genetic and genomic information. In this seminar I will outline our current endeavors including a full website redesign, the status of maize genome assembly and annotation projects, and work toward genome functional annotation. Mechanis...

  4. Xenbase, the Xenopus model organism database; new virtualized system, data types and genomes

    PubMed Central

    Karpinka, J. Brad; Fortriede, Joshua D.; Burns, Kevin A.; James-Zorn, Christina; Ponferrada, Virgilio G.; Lee, Jacqueline; Karimi, Kamran; Zorn, Aaron M.; Vize, Peter D.

    2015-01-01

    Xenbase (http://www.xenbase.org), the Xenopus frog model organism database, integrates a wide variety of data from this biomedical model genus. Two closely related species are represented: the allotetraploid Xenopus laevis that is widely used for microinjection and tissue explant-based protocols, and the diploid Xenopus tropicalis which is used for genetics and gene targeting. The two species are extremely similar and protocols, reagents and results from each species are often interchangeable. Xenbase imports, indexes, curates and manages data from both species; all of which are mapped via unique IDs and can be queried in either a species-specific or species agnostic manner. All our services have now migrated to a private cloud to achieve better performance and reliability. We have added new content, including providing full support for morpholino reagents, used to inhibit mRNA translation or splicing and binding to regulatory microRNAs. New genomes assembled by the JGI for both species are displayed in GBrowse and are also available for searches using BLAST. Researchers can easily navigate from genome content to gene page reports, literature, experimental reagents and many other features using hyperlinks. Xenbase has also greatly expanded image content for figures published in papers describing Xenopus research via PubMedCentral. PMID:25313157

  5. Organization and dissemination of multimedia medical databases on the WWW.

    PubMed

    Todorovski, L; Ribaric, S; Dimec, J; Hudomalj, E; Lunder, T

    1999-01-01

    In the paper, we focus on the problem of building and disseminating multimedia medical databases on the World Wide Web (WWW). The current results of the ongoing project of building a prototype dermatology images database and its WWW presentation are presented. The dermatology database is part of an ambitious plan concerning an organization of a network of medical institutions building distributed and federated multimedia databases of a much wider scale.

  6. Xenbase, the Xenopus model organism database; new virtualized system, data types and genomes.

    PubMed

    Karpinka, J Brad; Fortriede, Joshua D; Burns, Kevin A; James-Zorn, Christina; Ponferrada, Virgilio G; Lee, Jacqueline; Karimi, Kamran; Zorn, Aaron M; Vize, Peter D

    2015-01-01

    Xenbase (http://www.xenbase.org), the Xenopus frog model organism database, integrates a wide variety of data from this biomedical model genus. Two closely related species are represented: the allotetraploid Xenopus laevis that is widely used for microinjection and tissue explant-based protocols, and the diploid Xenopus tropicalis which is used for genetics and gene targeting. The two species are extremely similar and protocols, reagents and results from each species are often interchangeable. Xenbase imports, indexes, curates and manages data from both species, all of which are mapped via unique IDs and can be queried in either a species-specific or species-agnostic manner. All our services have now migrated to a private cloud to achieve better performance and reliability. We have added new content, including providing full support for morpholino reagents, used to inhibit mRNA translation or splicing and binding to regulatory microRNAs. New genomes assembled by the JGI for both species are displayed in GBrowse and are also available for searches using BLAST. Researchers can easily navigate from genome content to gene page reports, literature, experimental reagents and many other features using hyperlinks. Xenbase has also greatly expanded image content for figures published in papers describing Xenopus research via PubMedCentral. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  7. Database for propagation models

    NASA Astrophysics Data System (ADS)

    Kantak, Anil V.

    1991-07-01

    A propagation researcher or a systems engineer who intends to use the results of a propagation experiment is generally faced with various database tasks such as the selection of the computer software, the hardware, and the writing of the programs to pass the data through the models of interest. This task is repeated every time a new experiment is conducted or the same experiment is carried out at a different location generating different data. Thus the users of this data have to spend a considerable portion of their time learning how to implement the computer hardware and the software towards the desired end. This situation may be eased considerably if an easily accessible propagation database is created that has all the accepted (standardized) propagation phenomena models approved by the propagation research community. Also, the handling of data will become easier for the user. Such a database construction can only stimulate the growth of the propagation research if it is available to all the researchers, so that the results of the experiment conducted by one researcher can be examined independently by another, without different hardware and software being used. The database may be made flexible so that the researchers need not be confined only to the contents of the database. Another way in which the database may help the researchers is that they will not have to document the software and hardware tools used in their research, since the propagation research community will know the database already. The following sections show a possible database construction, as well as properties of the database for the propagation research.

  8. Software Engineering Laboratory (SEL) database organization and user's guide, revision 2

    NASA Technical Reports Server (NTRS)

    Morusiewicz, Linda; Bristow, John

    1992-01-01

    The organization of the Software Engineering Laboratory (SEL) database is presented. Included are definitions and detailed descriptions of the database tables and views, the SEL data, and system support data. The mapping from the SEL and system support data to the base table is described. In addition, techniques for accessing the database through the Database Access Manager for the SEL (DAMSEL) system and via the ORACLE structured query language (SQL) are discussed.

  9. Conceptual and logical level of database modeling

    NASA Astrophysics Data System (ADS)

    Hunka, Frantisek; Matula, Jiri

    2016-06-01

    Conceptual and logical levels form the topmost levels of database modeling. Usually, ORM (Object Role Modeling) and ER diagrams are utilized to capture the corresponding schema. The final aim of business process modeling is to store its results in the form of a database solution. For this reason, value-oriented business process modeling, which utilizes ER diagrams to express the modeling entities and the relationships between them, is used. However, ER diagrams form the logical level of the database schema. To extend the possibilities of different business process modeling methodologies, the conceptual level of database modeling is needed. The paper deals with the REA value modeling approach to business process modeling using ER diagrams, and derives a conceptual model using the ORM modeling approach. The conceptual model extends the possibilities for value modeling to other business modeling approaches.

  10. Dynamic publication model for neurophysiology databases.

    PubMed

    Gardner, D; Abato, M; Knuth, K H; DeBellis, R; Erde, S M

    2001-08-29

    We have implemented a pair of database projects, one serving cortical electrophysiology and the other invertebrate neurones and recordings. The design for each combines aspects of two proven schemes for information interchange. The journal article metaphor determined the type, scope, organization and quantity of data to comprise each submission. Sequence databases encouraged intuitive tools for data viewing, capture, and direct submission by authors. Neurophysiology required transcending these models with new datatypes. Time-series, histogram and bivariate datatypes, including illustration-like wrappers, were selected by their utility to the community of investigators. As interpretation of neurophysiological recordings depends on context supplied by metadata attributes, searches are via visual interfaces to sets of controlled-vocabulary metadata trees. Neurones, for example, can be specified by metadata describing functional and anatomical characteristics. Permanence is advanced by data model and data formats largely independent of contemporary technology or implementation, including Java and the XML standard. All user tools, including dynamic data viewers that serve as a virtual oscilloscope, are Java-based, free, multiplatform, and distributed by our application servers to any contemporary networked computer. Copyright is retained by submitters; viewer displays are dynamic and do not violate copyright of related journal figures. Panels of neurophysiologists view and test schemas and tools, enhancing community support.

  11. dSED: A database tool for modeling sediment early diagenesis

    NASA Astrophysics Data System (ADS)

    Katsev, S.; Rancourt, D. G.; L'Heureux, I.

    2003-04-01

    Sediment early diagenesis reaction transport models (RTMs) are becoming powerful tools in providing kinetic descriptions of the metal and nutrient diagenetic cycling in marine, lacustrine, estuarine, and other aquatic sediments, as well as of exchanges with the water column. Whereas there exist several good database/program combinations for thermodynamic equilibrium calculations in aqueous systems, at present there exist no database tools for classification and analysis of the kinetic data essential to RTM development. We present a database tool that is intended to serve as an online resource for information about chemical reactions, solid phase and solute reactants, sorption reactions, transport mechanisms, and kinetic and equilibrium parameters that are relevant to sediment diagenesis processes. The list of reactive substances includes but is not limited to organic matter, Fe and Mn oxides and oxyhydroxides, sulfides and sulfates, calcium, iron, and manganese carbonates, phosphorus-bearing minerals, and silicates. Aqueous phases include dissolved carbon dioxide, oxygen, methane, hydrogen sulfide, sulfate, nitrate, phosphate, some organic compounds, and dissolved metal species. A number of filters allow extracting information according to user-specified criteria, e.g., about a class of substances contributing to the cycling of iron. The database also includes bibliographic information about published diagenetic models and the reactions and processes that they consider. At the time of preparing this abstract, dSED contained 128 reactions and 12 pre-defined filters. dSED is maintained by the Lake Sediment Structure and Evolution (LSSE) group at the University of Ottawa (www.science.uottawa.ca/LSSE/dSED) and we invite input from the geochemical community.

  12. The methodology of database design in organization management systems

    NASA Astrophysics Data System (ADS)

    Chudinov, I. L.; Osipova, V. V.; Bobrova, Y. V.

    2017-01-01

    The paper describes a unified methodology of database design for management information systems. Designing the conceptual information model for the domain area is the most important and labor-intensive stage in database design. Based on the proposed integrated approach to designing the conceptual information model, the main principles of developing relational databases are provided and users’ information needs are considered. According to the methodology, the process of designing the conceptual information model includes three basic stages, which are defined in detail. Finally, the article describes the process of performing the results of analyzing users’ information needs and the rationale for use of classifiers.

  13. Nonbibliographic Databases in a Corporate Health, Safety, and Environment Organization.

    ERIC Educational Resources Information Center

    Cubillas, Mary M.

    1981-01-01

    Summarizes the characteristics of TOXIN, CHEMFILE, and the Product Profile Information System (PPIS), nonbibliographic databases used by Shell Oil Company's Health, Safety, and Environment Organization. (FM)

  14. Fish Karyome version 2.1: a chromosome database of fishes and other aquatic organisms

    PubMed Central

    Nagpure, Naresh Sahebrao; Pathak, Ajey Kumar; Pati, Rameshwar; Rashid, Iliyas; Sharma, Jyoti; Singh, Shri Prakash; Singh, Mahender; Sarkar, Uttam Kumar; Kushwaha, Basdeo; Kumar, Ravindra; Murali, S.

    2016-01-01

    Voluminous information is available on karyological studies of fishes; however, limited efforts have been made to compile and curate the available karyological data in digital form. The ‘Fish Karyome’ database was a preliminary attempt to compile and digitize the available karyological information on finfishes belonging to the Indian subcontinent. But the database had limitations since it covered data only on Indian finfishes with limited search options. In response to feedback from users and given its utility in fish cytogenetic studies, the Fish Karyome database was upgraded by applying Linux, Apache, MySQL and PHP (Hypertext Preprocessor) (LAMP) technologies. In the present version, the scope of the system was increased by compiling and curating the available chromosomal information over the globe on fishes and other aquatic organisms, such as echinoderms, molluscs and arthropods, especially of aquaculture importance. Thus, Fish Karyome version 2.1 presently covers 866 chromosomal records for 726 species supported with 253 published articles and the information is being updated regularly. The database provides information on chromosome number and morphology, sex chromosomes, chromosome banding, molecular cytogenetic markers, etc. supported by fish and karyotype images through interactive tools. It also enables the users to browse and view chromosomal information based on habitat, family, conservation status and chromosome number. The system also displays chromosome number in model organisms, protocol for chromosome preparation and allied techniques and glossary of cytogenetic terms. A data submission facility has also been provided through a data submission panel. The database can serve as a unique and useful resource for cytogenetic characterization, sex determination, chromosomal mapping, cytotaxonomy, karyo-evolution and systematics of fishes. Database URL: http://mail.nbfgr.res.in/Fish_Karyome PMID:26980518

  15. Developing High-resolution Soil Database for Regional Crop Modeling in East Africa

    NASA Astrophysics Data System (ADS)

    Han, E.; Ines, A. V. M.

    2014-12-01

    The most readily available soil data for regional crop modeling in Africa is the World Inventory of Soil Emission potentials (WISE) dataset, which has 1125 soil profiles for the world, but does not extensively cover the East African countries Ethiopia, Kenya, Uganda and Tanzania. Another available dataset is HC27 (Harvest Choice by IFPRI), in a gridded format (10km) but composed of generic soil profiles based on only three criteria (texture, rooting depth, and organic carbon content). In this paper, we present the development and application of a high-resolution (1km), gridded soil database for regional crop modeling in East Africa. Basic soil information is extracted from the Africa Soil Information Service (AfSIS), which provides essential soil properties (bulk density, soil organic carbon, soil pH and percentages of sand, silt and clay) for 6 different standardized soil layers (5, 15, 30, 60, 100 and 200 cm) at 1km resolution. Soil hydraulic properties (e.g., field capacity and wilting point) are derived from the AfSIS soil dataset using well-proven pedo-transfer functions and are customized for DSSAT-CSM soil data requirements. The crop model is used to evaluate crop yield forecasts using the new high-resolution soil database, which are compared with those from WISE and HC27. We also present the results of DSSAT loosely coupled with a hydrologic model (VIC) to assimilate root-zone soil moisture. Creating a grid-based soil database, which provides a consistent soil input for two different models (DSSAT and VIC), is a critical part of this work. The created soil database is expected to contribute to future applications of DSSAT crop simulation in East Africa, where food security is highly vulnerable.

  16. Database of natural matrix reference materials (NMRM) for organic constituents.

    PubMed

    Iyengar, G V; Bleise, A R

    2001-06-01

    The International Atomic Energy Agency maintains a database of internationally available certified reference materials (CRM) of natural matrices. This database is periodically updated, and presently documents nearly 25,000 measurands in 1,700 materials. The organic constituents are classified in five major groups of analytes: aliphatic and aromatic hydrocarbons (A), chlorinated hydrocarbons (B), pesticides (C), organometallic compounds (D) and other organic constituents (nutrients, etc.) (E). The matrices include natural materials such as body fluids, food products, soils, and sediments, terrestrial (e.g. plants), and anthropogenic products (e.g. dust, fly ash). These five organic groups of analytes encompass more than 2000 measurands for 420 different analytes in 230 materials. Of these measurands, 1,682 (68%) are certified, and 768 (32%) are provided as informational values. Within each major group of analytes, measurands reported as informational values accounted for: A (35%); B (35%); C (26%); D (10%), and E (22%). The high proportion of informational values (i.e. non-certified values) for A, B, and C compares well with a similar but undesirable situation faced in the nineteen-seventies in the inorganic area when simultaneous multielement techniques became available. In the case of D and E, it appears that mostly targeted analytes are measured, leading to a cohesive certification profile. Although the IAEA database is not equally comprehensive for all groups of analytes cited above, it can still serve as a useful indicator of the status of organic constituents in RMs.

  17. A Conceptual Model of the Information Requirements of Nursing Organizations

    PubMed Central

    Miller, Emmy

    1989-01-01

    Three related issues play a role in the identification of the information requirements of nursing organizations. These issues are the current state of computer systems in health care organizations, the lack of a well-defined data set for nursing, and the absence of models representing data and information relevant to clinical and administrative nursing practice. This paper will examine current methods of data collection, processing, and storage in clinical and administrative nursing practice for the purpose of identifying the information requirements of nursing organizations. To satisfy these information requirements, database technology can be used; however, a model for database design is needed that reflects the conceptual framework of nursing and the professional concerns of nurses. A conceptual model of the types of data necessary to produce the desired information will be presented and the relationships among data will be delineated.

  18. Assessment of the SFC database for analysis and modeling

    NASA Technical Reports Server (NTRS)

    Centeno, Martha A.

    1994-01-01

    SFC is one of the four clusters that make up the Integrated Work Control System (IWCS), which will integrate the shuttle processing databases at Kennedy Space Center (KSC). The IWCS framework will enable communication among the four clusters and add new data collection protocols. The Shop Floor Control (SFC) module has been operational for two and a half years; however, at this stage, automatic links to the other 3 modules have not been implemented yet, except for a partial link to IOS (CASPR). SFC revolves around a DB/2 database with PFORMS acting as the database management system (DBMS). PFORMS is an off-the-shelf DB/2 application that provides a set of data entry screens and query forms. The main dynamic entity in the SFC and IOS database is a task; thus, the physical storage location and update privileges are driven by the status of the WAD. As we explored the SFC values, we realized that there was much to do before actually engaging in continuous analysis of the SFC data. Half way into this effort, it was realized that full scale analysis would have to be a future third phase of this effort. So, we concentrated on getting to know the contents of the database, and in establishing an initial set of tools to start the continuous analysis process. Specifically, we set out to: (1) provide specific procedures for statistical models, so as to enhance the TP-OAO office analysis and modeling capabilities; (2) design a data exchange interface; (3) prototype the interface to provide inputs to SCRAM; and (4) design a modeling database. These objectives were set with the expectation that, if met, they would provide former TP-OAO engineers with tools that would help them demonstrate the importance of process-based analyses. The latter, in return, will help them obtain the cooperation of various organizations in charting out their individual processes.

  19. Cyclebase 3.0: a multi-organism database on cell-cycle regulation and phenotypes.

    PubMed

    Santos, Alberto; Wernersson, Rasmus; Jensen, Lars Juhl

    2015-01-01

    The eukaryotic cell division cycle is a highly regulated process that consists of a complex series of events and involves thousands of proteins. Researchers have studied the regulation of the cell cycle in several organisms, employing a wide range of high-throughput technologies, such as microarray-based mRNA expression profiling and quantitative proteomics. Due to its complexity, the cell cycle can also fail or otherwise change in many different ways if important genes are knocked out, which has been studied in several microscopy-based knockdown screens. The data from these many large-scale efforts are not easily accessed, analyzed and combined due to their inherent heterogeneity. To address this, we have created Cyclebase--available at http://www.cyclebase.org--an online database that allows users to easily visualize and download results from genome-wide cell-cycle-related experiments. In Cyclebase version 3.0, we have updated the content of the database to reflect changes to genome annotation, added new mRNA and protein expression data, and integrated cell-cycle phenotype information from high-content screens and model-organism databases. The new version of Cyclebase also features a new web interface, designed around an overview figure that summarizes all the cell-cycle-related data for a gene. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  20. Fish Karyome version 2.1: a chromosome database of fishes and other aquatic organisms.

    PubMed

    Nagpure, Naresh Sahebrao; Pathak, Ajey Kumar; Pati, Rameshwar; Rashid, Iliyas; Sharma, Jyoti; Singh, Shri Prakash; Singh, Mahender; Sarkar, Uttam Kumar; Kushwaha, Basdeo; Kumar, Ravindra; Murali, S

    2016-01-01

    Voluminous information is available on karyological studies of fishes; however, limited efforts have been made to compile and curate the available karyological data in digital form. The 'Fish Karyome' database was a preliminary attempt to compile and digitize the available karyological information on finfishes belonging to the Indian subcontinent. But the database had limitations since it covered data only on Indian finfishes with limited search options. In response to feedback from users and given its utility in fish cytogenetic studies, the Fish Karyome database was upgraded by applying Linux, Apache, MySQL and PHP (Hypertext Preprocessor) (LAMP) technologies. In the present version, the scope of the system was increased by compiling and curating the available chromosomal information over the globe on fishes and other aquatic organisms, such as echinoderms, molluscs and arthropods, especially of aquaculture importance. Thus, Fish Karyome version 2.1 presently covers 866 chromosomal records for 726 species supported with 253 published articles and the information is being updated regularly. The database provides information on chromosome number and morphology, sex chromosomes, chromosome banding, molecular cytogenetic markers, etc. supported by fish and karyotype images through interactive tools. It also enables the users to browse and view chromosomal information based on habitat, family, conservation status and chromosome number. The system also displays chromosome number in model organisms, protocol for chromosome preparation and allied techniques and glossary of cytogenetic terms. A data submission facility has also been provided through a data submission panel. The database can serve as a unique and useful resource for cytogenetic characterization, sex determination, chromosomal mapping, cytotaxonomy, karyo-evolution and systematics of fishes. Database URL: http://mail.nbfgr.res.in/Fish_Karyome. © The Author(s) 2016. Published by Oxford University Press.

  1. Models in Translational Oncology: A Public Resource Database for Preclinical Cancer Research.

    PubMed

    Galuschka, Claudia; Proynova, Rumyana; Roth, Benjamin; Augustin, Hellmut G; Müller-Decker, Karin

    2017-05-15

    The devastating diseases of human cancer are mimicked in basic and translational cancer research by a steadily increasing number of tumor models, a situation requiring a platform with standardized reports to share model data. Models in Translational Oncology (MiTO) database was developed as a unique Web platform aiming for a comprehensive overview of preclinical models covering genetically engineered organisms, models of transplantation, chemical/physical induction, or spontaneous development, reviewed here. MiTO serves data entry for metastasis profiles and interventions. Moreover, cell lines and animal lines including tool strains can be recorded. Hyperlinks for connection with other databases and file uploads as supplementary information are supported. Several communication tools are offered to facilitate exchange of information. Notably, intellectual property can be protected prior to publication by inventor-defined accessibility of any given model. Data recall is via a highly configurable keyword search. Genome editing is expected to result in changes of the spectrum of model organisms, a reason to open MiTO for species-independent data. Registered users may deposit own model fact sheets (FS). MiTO experts check them for plausibility. Independently, manually curated FS are provided to principal investigators for revision and publication. Importantly, noneditable versions of reviewed FS can be cited in peer-reviewed journals. Cancer Res; 77(10); 2557-63. ©2017 American Association for Cancer Research.

  2. MOSAIC: An organic geochemical and sedimentological database for marine surface sediments

    NASA Astrophysics Data System (ADS)

    Tavagna, Maria Luisa; Usman, Muhammed; De Avelar, Silvania; Eglinton, Timothy

    2015-04-01

    Modern ocean sediments serve as the interface between the biosphere and the geosphere, play a key role in biogeochemical cycles and provide a window on how contemporary processes are written into the sedimentary record. Research over past decades has resulted in a wealth of information on the content and composition of organic matter in marine sediments, with ever-more sophisticated techniques continuing to yield information of greater detail and at an accelerating pace. However, there has been no attempt to synthesize this wealth of information. We are establishing a new database that incorporates information relevant to local, regional and global-scale assessment of the content, source and fate of organic materials accumulating in contemporary marine sediments. In the MOSAIC (Modern Ocean Sediment Archive and Inventory of Carbon) database, particular emphasis is placed on molecular and isotopic information, coupled with contextual information (e.g., sedimentological properties) relevant to elucidating factors that influence the efficiency and nature of organic matter burial. The main features of MOSAIC include: (i) Emphasis on continental margin sediments as major loci of carbon burial, and as the interface between terrestrial and oceanic realms; (ii) Bulk to molecular-level organic geochemical properties and parameters, including concentration and isotopic compositions; (iii) Inclusion of extensive contextual data regarding the depositional setting, in particular with respect to sedimentological and redox characteristics. The ultimate goal is to create an open-access instrument, available on the web, to be utilized for research and education by the international community who can both contribute to, and interrogate the database. The submission will be accomplished by means of a pre-configured table available on the MOSAIC webpage. The information on the filled tables will be checked and eventually imported, via the Structured Query Language (SQL), into

  3. Organizing, exploring, and analyzing antibody sequence data: the case for relational-database managers.

    PubMed

    Owens, John

    2009-01-01

    Technological advances in the acquisition of DNA and protein sequence information and the resulting onrush of data can quickly overwhelm the scientist unprepared for the volume of information that must be evaluated and carefully dissected to discover its significance. Few laboratories have the luxury of dedicated personnel to organize, analyze, or consistently record a mix of arriving sequence data. A methodology based on a modern relational-database manager is presented that is both a natural storage vessel for antibody sequence information and a conduit for organizing and exploring sequence data and accompanying annotation text. The expertise necessary to implement such a plan is equal to that required by electronic word processors or spreadsheet applications. Antibody sequence projects maintained as independent databases are selectively unified by the relational-database manager into larger database families that contribute to local analyses, reports, interactive HTML pages, or exported to facilities dedicated to sophisticated sequence analysis techniques. Database files are transposable among current versions of Microsoft, Macintosh, and UNIX operating systems.

  4. Expanding on Successful Concepts, Models, and Organization

    EPA Science Inventory

    If the goal of the AEP framework was to replace existing exposure models or databases for organizing exposure data with a concept, we would share Dr. von Göetz's concerns. Instead, the outcome we promote is broader use of an organizational framework for exposure science. The f...

  5. A case study for a digital seabed database: Bohai Sea engineering geology database

    NASA Astrophysics Data System (ADS)

    Tianyun, Su; Shikui, Zhai; Baohua, Liu; Ruicai, Liang; Yanpeng, Zheng; Yong, Wang

    2006-07-01

    This paper discusses the design of the ORACLE-based Bohai Sea engineering geology database structure, covering requirements analysis, conceptual structure analysis, logical structure analysis, physical structure analysis and security design. In the study, we used the object-oriented Unified Modeling Language (UML) to model the conceptual structure of the database and used the data management functions which the object-oriented and relational database ORACLE provides to organize and manage the storage space and improve its security performance. By this means, the database can provide rapid and highly effective performance in data storage, maintenance and query to satisfy the application requirements of the Bohai Sea Oilfield Paradigm Area Information System.

  6. PHYTOTOX: DATABASE DEALING WITH THE EFFECT OF ORGANIC CHEMICALS ON TERRESTRIAL VASCULAR PLANTS

    EPA Science Inventory

    A new database, PHYTOTOX, dealing with the direct effects of exogenously supplied organic chemicals on terrestrial vascular plants is described. The database consists of two files, a Reference File and Effects File. The Reference File is a bibliographic file of published research...

  7. DAMIT: a database of asteroid models

    NASA Astrophysics Data System (ADS)

    Durech, J.; Sidorin, V.; Kaasalainen, M.

    2010-04-01

    Context. Apart from a few targets that were directly imaged by spacecraft, remote sensing techniques are the main source of information about the basic physical properties of asteroids, such as the size, the spin state, or the spectral type. The most widely used observing technique - time-resolved photometry - provides us with data that can be used for deriving asteroid shapes and spin states. In the past decade, inversion of asteroid lightcurves has led to more than a hundred asteroid models. In the next decade, when data from all-sky surveys are available, the number of asteroid models will increase. Combining photometry with, e.g., adaptive optics data produces more detailed models. Aims: We created the Database of Asteroid Models from Inversion Techniques (DAMIT) with the aim of providing the astronomical community access to reliable and up-to-date physical models of asteroids - i.e., their shapes, rotation periods, and spin axis directions. Models from DAMIT can be used for further detailed studies of individual objects, as well as for statistical studies of the whole set. Methods: Most DAMIT models were derived from photometric data by the lightcurve inversion method. Some of them have been further refined or scaled using adaptive optics images, infrared observations, or occultation data. A substantial number of the models were derived also using sparse photometric data from astrometric databases. Results: At present, the database contains models of more than one hundred asteroids. For each asteroid, DAMIT provides the polyhedral shape model, the sidereal rotation period, the spin axis direction, and the photometric data used for the inversion. The database is updated when new models are available or when already published models are updated or refined. We have also released the C source code for the lightcurve inversion and for the direct problem (updates and extensions will follow).

  8. A Framework for Cloudy Model Optimization and Database Storage

    NASA Astrophysics Data System (ADS)

    Calvén, Emilia; Helton, Andrew; Sankrit, Ravi

    2018-01-01

    We present a framework for producing Cloudy photoionization models of the nebular emission from novae ejecta and storing a subset of the results in SQL database format for later usage. The database can be searched for models best fitting observed spectral line ratios. Additionally, the framework includes an optimization feature that can be used in tandem with the database to search for and improve on models by creating new Cloudy models while varying the parameters. The database search and optimization can be used to explore the structures of nebulae by deriving their properties from the best-fit models. The goal is to provide the community with a large database of Cloudy photoionization models, generated from parameters reflecting conditions within novae ejecta, that can be easily fitted to observed spectral lines, either by directly accessing the database using the framework code or by usage of a website specifically made for this purpose.
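
    The database search described in this record can be illustrated with a minimal sketch. The table layout, the column names (log_density, log_luminosity, ratio_a, ratio_b) and the least-squares misfit are assumptions made for illustration only; the record does not describe the framework's actual schema or fitting criterion.

```python
import sqlite3

# Hypothetical schema: one row per stored model, holding two input
# parameters and two predicted line ratios. All names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE models ("
    " id INTEGER PRIMARY KEY,"
    " log_density REAL,"
    " log_luminosity REAL,"
    " ratio_a REAL,"
    " ratio_b REAL)"
)
conn.executemany(
    "INSERT INTO models (log_density, log_luminosity, ratio_a, ratio_b)"
    " VALUES (?, ?, ?, ?)",
    [
        (6.0, 36.0, 2.0, 0.50),
        (7.0, 37.0, 3.1, 0.82),
        (8.0, 38.0, 4.5, 1.40),
    ],
)


def best_fit(conn, observed_a, observed_b, limit=1):
    """Return the stored models closest to the observed line ratios,
    ranked by the sum of squared differences (an assumed misfit)."""
    return conn.execute(
        "SELECT id, log_density, log_luminosity,"
        " (ratio_a - ?) * (ratio_a - ?) + (ratio_b - ?) * (ratio_b - ?)"
        " AS misfit"
        " FROM models ORDER BY misfit ASC LIMIT ?",
        (observed_a, observed_a, observed_b, observed_b, limit),
    ).fetchall()


# Query with made-up observed ratios.
best = best_fit(conn, 3.0, 0.8)
```

    An optimization step in the spirit of the record could start from the parameters of the returned row and generate new models around them; that part is not sketched here.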

  9. BioModels Database: a repository of mathematical models of biological processes.

    PubMed

    Chelliah, Vijayalakshmi; Laibe, Camille; Le Novère, Nicolas

    2013-01-01

    BioModels Database is a public online resource that allows storing and sharing of published, peer-reviewed quantitative, dynamic models of biological processes. The model components and behaviour are thoroughly checked to correspond to the original publication and manually curated to ensure reliability. Furthermore, the model elements are annotated with terms from controlled vocabularies as well as linked to relevant external data resources. This greatly helps in model interpretation and reuse. Models are stored in SBML format, accepted in SBML and CellML formats, and are available for download in various other common formats such as BioPAX, Octave, SciLab, VCML, XPP and PDF, in addition to SBML. The reaction network diagram of the models is also available in several formats. BioModels Database features a search engine, which provides simple and more advanced searches. Features such as online simulation and creation of smaller models (submodels) from the selected model elements of a larger one are provided. BioModels Database can be accessed both via a web interface and programmatically via web services. New models are available in BioModels Database at regular releases, about every 4 months.

  10. The BioGRID interaction database: 2017 update

    PubMed Central

    Chatr-aryamontri, Andrew; Oughtred, Rose; Boucher, Lorrie; Rust, Jennifer; Chang, Christie; Kolas, Nadine K.; O'Donnell, Lara; Oster, Sara; Theesfeld, Chandra; Sellam, Adnane; Stark, Chris; Breitkreutz, Bobby-Joe; Dolinski, Kara; Tyers, Mike

    2017-01-01

    The Biological General Repository for Interaction Datasets (BioGRID: https://thebiogrid.org) is an open access database dedicated to the annotation and archival of protein, genetic and chemical interactions for all major model organism species and humans. As of September 2016 (build 3.4.140), the BioGRID contains 1 072 173 genetic and protein interactions, and 38 559 post-translational modifications, as manually annotated from 48 114 publications. This dataset represents interaction records for 66 model organisms and represents a 30% increase compared to the previous 2015 BioGRID update. BioGRID curates the biomedical literature for major model organism species, including humans, with a recent emphasis on central biological processes and specific human diseases. To facilitate network-based approaches to drug discovery, BioGRID now incorporates 27 501 chemical–protein interactions for human drug targets, as drawn from the DrugBank database. A new dynamic interaction network viewer allows the easy navigation and filtering of all genetic and protein interaction data, as well as for bioactive compounds and their established targets. BioGRID data are directly downloadable without restriction in a variety of standardized formats and are freely distributed through partner model organism databases and meta-databases. PMID:27980099

  11. Applications of the Cambridge Structural Database in organic chemistry and crystal chemistry.

    PubMed

    Allen, Frank H; Motherwell, W D Samuel

    2002-06-01

    The Cambridge Structural Database (CSD) and its associated software systems have formed the basis for more than 800 research applications in structural chemistry, crystallography and the life sciences. Relevant references, dating from the mid-1970s, and brief synopses of these papers are collected in a database, DBUse, which is freely available via the CCDC website. This database has been used to review research applications of the CSD in organic chemistry, including supramolecular applications, and in organic crystal chemistry. The review concentrates on applications that have been published since 1990 and covers a wide range of topics, including structure correlation, conformational analysis, hydrogen bonding and other intermolecular interactions, studies of crystal packing, extended structural motifs, crystal engineering and polymorphism, and crystal structure prediction. Applications of CSD information in studies of crystal structure precision, the determination of crystal structures from powder diffraction data, together with applications in chemical informatics, are also discussed.

  12. Using LUCAS topsoil database to estimate soil organic carbon content in local spectral libraries

    NASA Astrophysics Data System (ADS)

    Castaldi, Fabio; van Wesemael, Bas; Chabrillat, Sabine; Chartin, Caroline

    2017-04-01

    The quantification of the soil organic carbon (SOC) content over large areas is mandatory to obtain accurate soil characterization and classification, which can improve site specific management at local or regional scale exploiting the strong relationship between SOC and crop growth. The estimation of the SOC is not only important for agricultural purposes: in recent years, the increasing attention towards global warming highlighted the crucial role of the soil in the global carbon cycle. In this context, soil spectroscopy is a well consolidated and widespread method to estimate soil variables exploiting the interaction between chromophores and electromagnetic radiation. The importance of spectroscopy in soil science is reflected by the increasing number of large soil spectral libraries collected in the world. These large libraries contain soil samples derived from a consistent number of pedological regions and thus from different parent material and soil types; this heterogeneity entails, in turn, a large variability in terms of mineralogical and organic composition. In the light of the huge variability of the spectral responses to SOC content and composition, a rigorous classification process is necessary to subset large spectral libraries and to avoid the calibration of global models failing to predict local variation in SOC content. In this regard, this study proposes a method to subset the European LUCAS topsoil database into soil classes using a clustering analysis based on a large number of soil properties. The LUCAS database was chosen to apply a standardized multivariate calibration approach valid for large areas without the need for extensive field and laboratory work for calibration of local models. Seven soil classes were detected by the clustering analyses and the samples belonging to each class were used to calibrate specific partial least square regression (PLSR) models to estimate SOC content of three local libraries collected in Belgium (Loam belt

  13. Techniques to Access Databases and Integrate Data for Hydrologic Modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Whelan, Gene; Tenney, Nathan D.; Pelton, Mitchell A.

    2009-06-17

    This document addresses techniques to access and integrate data for defining site-specific conditions and behaviors associated with ground-water and surface-water radionuclide transport applicable to U.S. Nuclear Regulatory Commission reviews. Environmental models typically require input data from multiple internal and external sources that may include, but are not limited to, stream and rainfall gage data, meteorological data, hydrogeological data, habitat data, and biological data. These data may be retrieved from a variety of organizations (e.g., federal, state, and regional) and source types (e.g., HTTP, FTP, and databases). Available data sources relevant to hydrologic analyses for reactor licensing are identified and reviewed. The data sources described can be useful to define model inputs and parameters, including site features (e.g., watershed boundaries, stream locations, reservoirs, site topography), site properties (e.g., surface conditions, subsurface hydraulic properties, water quality), and site boundary conditions, input forcings, and extreme events (e.g., stream discharge, lake levels, precipitation, recharge, flood and drought characteristics). Available software tools for accessing established databases, retrieving the data, and integrating it with models were identified and reviewed. The emphasis in this review was on existing software products with minimal required modifications to enable their use with the FRAMES modeling framework. The ability of four of these tools to access and retrieve the identified data sources was reviewed. These four software tools were the Hydrologic Data Acquisition and Processing System (HDAPS), Integrated Water Resources Modeling System (IWRMS) External Data Harvester, Data for Environmental Modeling Environmental Data Download Tool (D4EM EDDT), and the FRAMES Internet Database Tools. The IWRMS External Data Harvester and the D4EM EDDT were identified as the most promising tools based on their ability to

  14. MARRVEL: Integration of Human and Model Organism Genetic Resources to Facilitate Functional Annotation of the Human Genome.

    PubMed

    Wang, Julia; Al-Ouran, Rami; Hu, Yanhui; Kim, Seon-Young; Wan, Ying-Wooi; Wangler, Michael F; Yamamoto, Shinya; Chao, Hsiao-Tuan; Comjean, Aram; Mohr, Stephanie E; Perrimon, Norbert; Liu, Zhandong; Bellen, Hugo J

    2017-06-01

    One major challenge encountered with interpreting human genetic variants is the limited understanding of the functional impact of genetic alterations on biological processes. Furthermore, there remains an unmet demand for an efficient survey of the wealth of information on human homologs in model organisms across numerous databases. To efficiently assess the large volume of publicly available information, it is important to provide a concise summary of the most relevant information in a rapid user-friendly format. To this end, we created MARRVEL (model organism aggregated resources for rare variant exploration). MARRVEL is a publicly available website that integrates information from six human genetic databases and seven model organism databases. For any given variant or gene, MARRVEL displays information from OMIM, ExAC, ClinVar, Geno2MP, DGV, and DECIPHER. Importantly, it curates model organism-specific databases to concurrently display a concise summary regarding the human gene homologs in budding and fission yeast, worm, fly, fish, mouse, and rat on a single webpage. Experiment-based information on tissue expression, protein subcellular localization, biological process, and molecular function for the human gene and homologs in the seven model organisms are arranged into a concise output. Hence, rather than visiting multiple separate databases for variant and gene analysis, users can obtain important information by searching once through MARRVEL. Altogether, MARRVEL dramatically improves efficiency and accessibility to data collection and facilitates analysis of human genes and variants by cross-disciplinary integration of 18 million records available in public databases to facilitate clinical diagnosis and basic research. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  15. Carotenoids Database: structures, chemical fingerprints and distribution among organisms

    PubMed Central

    2017-01-01

    To promote understanding of how organisms are related via carotenoids, either evolutionarily or symbiotically, or in food chains through natural histories, we built the Carotenoids Database. This provides chemical information on 1117 natural carotenoids with 683 source organisms. For extracting organisms closely related through the biosynthesis of carotenoids, we offer a new similarity search system ‘Search similar carotenoids’ using our original chemical fingerprint ‘Carotenoid DB Chemical Fingerprints’. These Carotenoid DB Chemical Fingerprints describe the chemical substructure and the modification details based upon International Union of Pure and Applied Chemistry (IUPAC) semi-systematic names of the carotenoids. The fingerprints also allow (i) easier prediction of six biological functions of carotenoids: provitamin A, membrane stabilizers, odorous substances, allelochemicals, antiproliferative activity and reverse MDR activity against cancer cells, (ii) easier classification of carotenoid structures, (iii) partial and exact structure searching and (iv) easier extraction of structural isomers and stereoisomers. We believe this to be the first attempt to establish fingerprints using the IUPAC semi-systematic names. For extracting close profiled organisms, we provide a new tool ‘Search similar profiled organisms’. Our current statistics show some insights into natural history: carotenoids seem to have been spread largely by bacteria, as they produce C30, C40, C45 and C50 carotenoids, with the widest range of end groups, and they share a small portion of C40 carotenoids with eukaryotes. Archaea share an even smaller portion with eukaryotes. Eukaryotes then have evolved a considerable variety of C40 carotenoids. Considering carotenoids, eukaryotes seem more closely related to bacteria than to archaea aside from 16S rRNA lineage analysis. Database URL: http://carotenoiddb.jp PMID:28365725

  16. Insertion algorithms for network model database management systems

    NASA Astrophysics Data System (ADS)

    Mamadolimov, Abdurashid; Khikmat, Saburov

    2017-12-01

    The network model is a database model conceived as a flexible way of representing objects and their relationships. Its distinguishing feature is that the schema, viewed as a graph in which object types are nodes and relationship types are arcs, forms a partial order. When a database is large and a query comparison is expensive, the efficiency requirement for management algorithms is to minimize the number of query comparisons. We consider the updating operation for network model database management systems. We develop a new sequential algorithm for the updating operation. We also suggest a distributed version of the algorithm.

  17. Filling Terrorism Gaps: VEOs, Evaluating Databases, and Applying Risk Terrain Modeling to Terrorism

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hagan, Ross F.

    2016-08-29

    This paper aims to address three issues: the lack of literature differentiating terrorism and violent extremist organizations (VEOs), terrorism incident databases, and the applicability of Risk Terrain Modeling (RTM) to terrorism. Current open source literature and publicly available government sources do not differentiate between terrorism and VEOs; furthermore, they fail to define them. Addressing the lack of a comprehensive comparison of existing terrorism data sources, a matrix comparing a dozen terrorism databases is constructed, providing insight toward the array of data available. RTM, a method for spatial risk analysis at a micro level, has some applicability to terrorism research, particularly for studies looking at risk indicators of terrorism. Leveraging attack data from multiple databases, combined with RTM, offers one avenue for closing existing research gaps in terrorism literature.

  18. SIMS: addressing the problem of heterogeneity in databases

    NASA Astrophysics Data System (ADS)

    Arens, Yigal

    1997-02-01

    The heterogeneity of remotely accessible databases -- with respect to contents, query language, semantics, organization, etc. -- presents serious obstacles to convenient querying. The SIMS (single interface to multiple sources) system addresses this global integration problem. It does so by defining a single language for describing the domain about which information is stored in the databases and using this language as the query language. Each database to which SIMS is to provide access is modeled using this language. The model describes a database's contents, organization, and other relevant features. SIMS uses these models, together with a planning system drawing on techniques from artificial intelligence, to decompose a given user's high-level query into a series of queries against the databases and other data manipulation steps. The retrieval plan is constructed so as to minimize data movement over the network and maximize parallelism to increase execution speed. SIMS can recover from network failures during plan execution by obtaining data from alternate sources, when possible. SIMS has been demonstrated in the domains of medical informatics and logistics, using real databases.

  19. Database integration in a multimedia-modeling environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dorow, Kevin E.

    2002-09-02

    Integration of data from disparate remote sources has direct applicability to modeling, which can support Brownfield assessments. To accomplish this task, a data integration framework needs to be established. A key element in this framework is the metadata that creates the relationship between the pieces of information that are important in the multimedia modeling environment and the information that is stored in the remote data source. The design philosophy is to allow modelers and database owners to collaborate by defining this metadata in such a way that allows interaction between their components. The main parts of this framework include tools to facilitate metadata definition, database extraction plan creation, automated extraction plan execution / data retrieval, and a central clearing house for metadata and modeling / database resources. Cross-platform compatibility (using Java) and standard communications protocols (http / https) allow these parts to run in a wide variety of computing environments (Local Area Networks, Internet, etc.), and, therefore, this framework provides many benefits. Because of the specific data relationships described in the metadata, the amount of data that have to be transferred is kept to a minimum (only the data that fulfill a specific request are provided as opposed to transferring the complete contents of a data source). This allows for real-time data extraction from the actual source. Also, the framework sets up collaborative responsibilities such that the different types of participants have control over the areas in which they have domain knowledge: the modelers are responsible for defining the data relevant to their models, while the database owners are responsible for mapping the contents of the database using the metadata definitions. Finally, the data extraction mechanism allows for the ability to control access to the data and what data are made available.

  20. Immediate Dissemination of Student Discoveries to a Model Organism Database Enhances Classroom-Based Research Experiences

    PubMed Central

    Wiley, Emily A.; Stover, Nicholas A.

    2014-01-01

    Use of inquiry-based research modules in the classroom has soared over recent years, largely in response to national calls for teaching that provides experience with scientific processes and methodologies. To increase the visibility of in-class studies among interested researchers and to strengthen their impact on student learning, we have extended the typical model of inquiry-based labs to include a means for targeted dissemination of student-generated discoveries. This initiative required: 1) creating a set of research-based lab activities with the potential to yield results that a particular scientific community would find useful and 2) developing a means for immediate sharing of student-generated results. Working toward these goals, we designed guides for course-based research aimed to fulfill the need for functional annotation of the Tetrahymena thermophila genome, and developed an interactive Web database that links directly to the official Tetrahymena Genome Database for immediate, targeted dissemination of student discoveries. This combination of research via the course modules and the opportunity for students to immediately “publish” their novel results on a Web database actively used by outside scientists culminated in a motivational tool that enhanced students’ efforts to engage the scientific process and pursue additional research opportunities beyond the course. PMID:24591511

  1. Immediate dissemination of student discoveries to a model organism database enhances classroom-based research experiences.

    PubMed

    Wiley, Emily A; Stover, Nicholas A

    2014-01-01

    Use of inquiry-based research modules in the classroom has soared over recent years, largely in response to national calls for teaching that provides experience with scientific processes and methodologies. To increase the visibility of in-class studies among interested researchers and to strengthen their impact on student learning, we have extended the typical model of inquiry-based labs to include a means for targeted dissemination of student-generated discoveries. This initiative required: 1) creating a set of research-based lab activities with the potential to yield results that a particular scientific community would find useful and 2) developing a means for immediate sharing of student-generated results. Working toward these goals, we designed guides for course-based research aimed to fulfill the need for functional annotation of the Tetrahymena thermophila genome, and developed an interactive Web database that links directly to the official Tetrahymena Genome Database for immediate, targeted dissemination of student discoveries. This combination of research via the course modules and the opportunity for students to immediately "publish" their novel results on a Web database actively used by outside scientists culminated in a motivational tool that enhanced students' efforts to engage the scientific process and pursue additional research opportunities beyond the course.

  2. A Model Based Mars Climate Database for the Mission Design

    NASA Technical Reports Server (NTRS)

    2005-01-01

    A viewgraph presentation on a model based climate database is shown. The topics include: 1) Why a model based climate database?; 2) Mars Climate Database v3.1: Who uses it? (approx. 60 users!); 3) The new Mars Climate database MCD v4.0; 4) MCD v4.0: what's new?; 5) Simulation of Water ice clouds; 6) Simulation of Water ice cycle; 7) A new tool for surface pressure prediction; 8) Access to the database MCD 4.0; 9) How to access the database; and 10) New web access

  3. A database for estimating organ dose for coronary angiography and brain perfusion CT scans for arbitrary spectra and angular tube current modulation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rupcich, Franco; Badal, Andreu; Kyprianou, Iacovos

    Purpose: The purpose of this study was to develop a database for estimating organ dose in a voxelized patient model for coronary angiography and brain perfusion CT acquisitions with any spectra and angular tube current modulation setting. The database enables organ dose estimation for existing and novel acquisition techniques without requiring Monte Carlo simulations. Methods: The study simulated transport of monoenergetic photons between 5 and 150 keV for 1000 projections over 360° through anthropomorphic voxelized female chest and head (0° and 30° tilt) phantoms and standard head and body CTDI dosimetry cylinders. The simulations resulted in tables of normalized dose deposition for several radiosensitive organs quantifying the organ dose per emitted photon for each incident photon energy and projection angle for coronary angiography and brain perfusion acquisitions. The values in a table can be multiplied by an incident spectrum and number of photons at each projection angle and then summed across all energies and angles to estimate total organ dose. Scanner-specific organ dose may be approximated by normalizing the database-estimated organ dose by the database-estimated CTDI_vol and multiplying by a physical CTDI_vol measurement. Two examples are provided demonstrating how to use the tables to estimate relative organ dose. In the first, the change in breast and lung dose during coronary angiography CT scans is calculated for reduced kVp, angular tube current modulation, and partial angle scanning protocols relative to a reference protocol. In the second example, the change in dose to the eye lens is calculated for a brain perfusion CT acquisition in which the gantry is tilted 30° relative to a nontilted scan. Results: Our database provides tables of normalized dose deposition for several radiosensitive organs irradiated during coronary angiography and brain perfusion CT scans. Validation results
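
    The table arithmetic described in the Methods of this record (multiply by the spectrum and the photon count at each angle, sum over energies and angles, then rescale by CTDI_vol) can be sketched as follows. The function names, the nested-list table layout and the toy numbers are assumptions for illustration; they are not taken from the database itself, and the units are arbitrary.

```python
def organ_dose(table, spectrum, photons_per_angle):
    """Sum normalized dose over all energies and projection angles.

    table[i][j]          dose per emitted photon, energy bin i, angle j
    spectrum[i]          fraction of emitted photons in energy bin i
    photons_per_angle[j] number of photons emitted at projection angle j
    """
    total = 0.0
    for i, weight in enumerate(spectrum):
        for j, n_photons in enumerate(photons_per_angle):
            total += table[i][j] * weight * n_photons
    return total


def scanner_specific_dose(db_organ_dose, db_ctdi_vol, measured_ctdi_vol):
    """Normalize the database dose by the database CTDI_vol and
    multiply by a physically measured CTDI_vol."""
    return db_organ_dose / db_ctdi_vol * measured_ctdi_vol


# Toy values: two energy bins, two projection angles.
toy_table = [[1.0, 2.0],
             [3.0, 4.0]]
toy_spectrum = [0.5, 0.5]
# Unequal photon counts stand in for angular tube current modulation.
toy_photons = [10.0, 20.0]

dose = organ_dose(toy_table, toy_spectrum, toy_photons)
scaled = scanner_specific_dose(dose, db_ctdi_vol=40.0, measured_ctdi_vol=10.0)
```

    A relative comparison between two protocols, as in the record's two examples, would call organ_dose once per protocol with the same table and take the ratio of the results.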

  4. Java Web Simulation (JWS); a web based database of kinetic models.

    PubMed

    Snoep, J L; Olivier, B G

    2002-01-01

    Software to make a database of kinetic models accessible via the internet has been developed and a core database has been set up at http://jjj.biochem.sun.ac.za/. This repository of models, available to everyone with internet access, opens a whole new way in which we can make our models public. Via the database, a user can change enzyme parameters and run time simulations or steady state analyses. The interface is user friendly and no additional software is necessary. The database currently contains 10 models, but since the generation of the program code to include new models has largely been automated the addition of new models is straightforward and people are invited to submit their models to be included in the database.

  5. Intrusion Detection in Database Systems

    NASA Astrophysics Data System (ADS)

    Javidi, Mohammad M.; Sohrabi, Mina; Rafsanjani, Marjan Kuchaki

    Data today represent a valuable asset for organizations and companies and must be protected. Ensuring the security and privacy of data assets is a crucial and very difficult problem in our modern networked world. Despite the necessity of protecting information stored in database systems (DBS), existing security models are insufficient to prevent misuse, especially insider abuse by legitimate users. One mechanism to safeguard the information in these databases is to use an intrusion detection system (IDS). The purpose of intrusion detection in database systems is to detect transactions that access data without permission. In this paper several database intrusion detection approaches are evaluated.

  6. PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification.

    PubMed

    Thomas, Paul D; Kejariwal, Anish; Campbell, Michael J; Mi, Huaiyu; Diemer, Karen; Guo, Nan; Ladunga, Istvan; Ulitsky-Lazareva, Betty; Muruganujan, Anushya; Rabkin, Steven; Vandergriff, Jody A; Doremieux, Olivier

    2003-01-01

    The PANTHER database was designed for high-throughput analysis of protein sequences. One of the key features is a simplified ontology of protein function, which allows browsing of the database by biological functions. Biologist curators have associated the ontology terms with groups of protein sequences rather than individual sequences. Statistical models (Hidden Markov Models, or HMMs) are built from each of these groups. The advantage of this approach is that new sequences can be automatically classified as they become available. To ensure accurate functional classification, HMMs are constructed not only for families, but also for functionally distinct subfamilies. Multiple sequence alignments and phylogenetic trees, including curator-assigned information, are available for each family. The current version of the PANTHER database includes training sequences from all organisms in the GenBank non-redundant protein database, and the HMMs have been used to classify gene products across the entire genomes of human and Drosophila melanogaster. The ontology terms and protein families and subfamilies, as well as Drosophila gene classifications, can be browsed and searched for free. Due to outstanding contractual obligations, access to human gene classifications and to protein family trees and multiple sequence alignments will temporarily require a nominal registration fee. PANTHER is publicly available on the web at http://panther.celera.com.

  7. Examining the Factors That Contribute to Successful Database Application Implementation Using the Technology Acceptance Model

    ERIC Educational Resources Information Center

    Nworji, Alexander O.

    2013-01-01

    Most organizations spend millions of dollars due to the impact of improperly implemented database application systems as evidenced by poor data quality problems. The purpose of this quantitative study was to use, and extend, the technology acceptance model (TAM) to assess the impact of information quality and technical quality factors on database…

  8. Nonparametric Bayesian Modeling for Automated Database Schema Matching

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ferragut, Erik M; Laska, Jason A

    2015-01-01

    The problem of merging databases arises in many government and commercial applications. Schema matching, a common first step, identifies equivalent fields between databases. We introduce a schema matching framework that builds nonparametric Bayesian models for each field and compares them by computing the probability that a single model could have generated both fields. Our experiments show that our method is more accurate and faster than the existing instance-based matching algorithms in part because of the use of nonparametric Bayesian models.
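
    The comparison this record describes, scoring two fields by the probability that a single model generated both, can be illustrated with a much simpler stand-in. The sketch below is not the authors' nonparametric method: it assumes categorical field values and a symmetric Dirichlet-multinomial model over the combined vocabulary, and every name in it (log_marginal, log_bayes_factor, alpha) is hypothetical.

```python
from collections import Counter
from math import lgamma


def log_marginal(counts, vocab, alpha=1.0):
    """Log marginal likelihood of a sequence of categorical values
    under a symmetric Dirichlet(alpha) prior over `vocab`."""
    n = sum(counts.values())
    k = len(vocab)
    result = lgamma(k * alpha) - lgamma(k * alpha + n)
    for value in vocab:
        result += lgamma(alpha + counts.get(value, 0)) - lgamma(alpha)
    return result


def log_bayes_factor(field_a, field_b, alpha=1.0):
    """Log ratio of 'one shared model generated both fields' to
    'each field has its own model'. Positive values favor a match."""
    a, b = Counter(field_a), Counter(field_b)
    vocab = set(a) | set(b)
    shared = log_marginal(a + b, vocab, alpha)
    separate = log_marginal(a, vocab, alpha) + log_marginal(b, vocab, alpha)
    return shared - separate


# Toy fields: two columns of state codes and one column of colors.
states_1 = ["TN", "CA", "TN", "NY"] * 5
states_2 = ["CA", "TN", "NY", "TN"] * 5
colors = ["red", "blue", "green", "red"] * 5

match_score = log_bayes_factor(states_1, states_2)
mismatch_score = log_bayes_factor(states_1, colors)
```

    With these toy inputs the two state-code fields score above zero and the state-versus-color pair scores below zero, which is the ordering a matcher would rely on. Real fields with numeric or free-text values would need a different likelihood.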

  9. SSER: Species specific essential reactions database.

    PubMed

    Labena, Abraham A; Ye, Yuan-Nong; Dong, Chuan; Zhang, Fa-Z; Guo, Feng-Biao

    2017-04-19

    Essential reactions are vital components of cellular networks. They are the foundations of synthetic biology and are potential candidate targets for antimetabolic drug design. Especially if a single reaction is catalyzed by multiple enzymes, then inhibiting the reaction would be a better option than targeting the enzymes or the corresponding enzyme-encoding gene. The existing databases such as BRENDA, BiGG, KEGG, Bio-models, Biosilico, and many others offer useful and comprehensive information on biochemical reactions. But none of these databases especially focus on essential reactions. Therefore, building a centralized repository for this class of reactions would be of great value. Here, we present a species-specific essential reactions database (SSER). The current version comprises essential biochemical and transport reactions of twenty-six organisms which are identified via flux balance analysis (FBA) combined with manual curation on experimentally validated metabolic network models. Quantitative data on the number of essential reactions, number of the essential reactions associated with their respective enzyme-encoding genes and shared essential reactions across organisms are the main contents of the database. SSER would be a prime source to obtain essential reactions data and related gene and metabolite information and it can significantly facilitate the metabolic network models reconstruction and analysis, and drug target discovery studies. Users can browse, search, compare and download the essential reactions of organisms of their interest through the website http://cefg.uestc.edu.cn/sser .

  10. Exploring performance issues for a clinical database organized using an entity-attribute-value representation.

    PubMed

    Chen, R S; Nadkarni, P; Marenco, L; Levin, F; Erdos, J; Miller, P L

    2000-01-01

    The entity-attribute-value representation with classes and relationships (EAV/CR) provides a flexible and simple database schema to store heterogeneous biomedical data. In certain circumstances, however, the EAV/CR model is known to retrieve data less efficiently than conventionally based database schemas. To perform a pilot study that systematically quantifies performance differences for database queries directed at real-world microbiology data modeled with EAV/CR and conventional representations, and to explore the relative merits of different EAV/CR query implementation strategies. Clinical microbiology data obtained over a ten-year period were stored using both database models. Query execution times were compared for four clinically oriented attribute-centered and entity-centered queries operating under varying conditions of database size and system memory. The performance characteristics of three different EAV/CR query strategies were also examined. Performance was similar for entity-centered queries in the two database models. Performance in the EAV/CR model was approximately three to five times less efficient than its conventional counterpart for attribute-centered queries. The differences in query efficiency became slightly greater as database size increased, although they were reduced with the addition of system memory. The authors found that EAV/CR queries formulated using multiple, simple SQL statements executed in batch were more efficient than single, large SQL statements. This paper describes a pilot project to explore issues in and compare query performance for EAV/CR and conventional database representations. Although attribute-centered queries were less efficient in the EAV/CR model, these inefficiencies may be addressable, at least in part, by the use of more powerful hardware or more memory, or both.
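
    The difference between the two representations can be illustrated with a minimal sketch. It assumes an in-memory SQLite database and hypothetical microbiology fields (`organism`, `specimen`); the table names, function names and data are illustrative only and are not taken from the study, and the classes and relationships of EAV/CR are left out.

```python
import sqlite3


def build_database():
    """Create the same three hypothetical cultures in both representations."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Conventional schema: one column per attribute.
    cur.execute(
        "CREATE TABLE culture (id INTEGER PRIMARY KEY, organism TEXT, specimen TEXT)"
    )
    # EAV schema: one row per (entity, attribute, value) triple.
    cur.execute("CREATE TABLE eav (entity_id INTEGER, attribute TEXT, value TEXT)")
    rows = [(1, "E. coli", "urine"), (2, "S. aureus", "blood"), (3, "E. coli", "blood")]
    cur.executemany("INSERT INTO culture VALUES (?, ?, ?)", rows)
    for entity_id, organism, specimen in rows:
        cur.executemany(
            "INSERT INTO eav VALUES (?, ?, ?)",
            [(entity_id, "organism", organism), (entity_id, "specimen", specimen)],
        )
    conn.commit()
    return conn


def attribute_query_conventional(conn, organism, specimen):
    """Attribute-centered query: a single scan over one table."""
    cur = conn.execute(
        "SELECT id FROM culture WHERE organism = ? AND specimen = ? ORDER BY id",
        (organism, specimen),
    )
    return [row[0] for row in cur.fetchall()]


def attribute_query_eav(conn, organism, specimen):
    """Same question in EAV form: one self-join per attribute tested."""
    cur = conn.execute(
        """
        SELECT a.entity_id
        FROM eav AS a JOIN eav AS b ON a.entity_id = b.entity_id
        WHERE a.attribute = 'organism' AND a.value = ?
          AND b.attribute = 'specimen' AND b.value = ?
        ORDER BY a.entity_id
        """,
        (organism, specimen),
    )
    return [row[0] for row in cur.fetchall()]


def entity_query_eav(conn, entity_id):
    """Entity-centered query: fetch every attribute of one entity."""
    cur = conn.execute(
        "SELECT attribute, value FROM eav WHERE entity_id = ?", (entity_id,)
    )
    return dict(cur.fetchall())
```

    In this sketch the attribute-centered EAV query needs one self-join for each attribute it tests, which is one plausible source of the extra cost described in the abstract, whereas the entity-centered query remains a simple lookup by entity.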

  11. BioModels Database: An enhanced, curated and annotated resource for published quantitative kinetic models

    PubMed Central

    2010-01-01

    Background: Quantitative models of biochemical and cellular systems are used to answer a variety of questions in the biological sciences. The number of published quantitative models is growing steadily thanks to increasing interest in the use of models as well as the development of improved software systems and the availability of better, cheaper computer hardware. To maximise the benefits of this growing body of models, the field needs centralised model repositories that will encourage, facilitate and promote model dissemination and reuse. Ideally, the models stored in these repositories should be extensively tested and encoded in community-supported and standardised formats. In addition, the models and their components should be cross-referenced with other resources in order to allow their unambiguous identification. Description: BioModels Database (http://www.ebi.ac.uk/biomodels/) is aimed at addressing exactly these needs. It is a freely-accessible online resource for storing, viewing, retrieving, and analysing published, peer-reviewed quantitative models of biochemical and cellular systems. The structure and behaviour of each simulation model distributed by BioModels Database are thoroughly checked; in addition, model elements are annotated with terms from controlled vocabularies as well as linked to relevant data resources. Models can be examined online or downloaded in various formats. Reaction network diagrams generated from the models are also available in several formats. BioModels Database also provides features such as online simulation and the extraction of components from large scale models into smaller submodels. Finally, the system provides a range of web services that external software systems can use to access up-to-date data from the database. Conclusions: BioModels Database has become a recognised reference resource for systems biology. It is being used by the community in a variety of ways; for example, it is used to benchmark different simulation

  12. Compartmental and Data-Based Modeling of Cerebral Hemodynamics: Linear Analysis.

    PubMed

    Henley, B C; Shin, D C; Zhang, R; Marmarelis, V Z

    Compartmental and data-based modeling of cerebral hemodynamics are alternative approaches that utilize distinct model forms and have been employed in the quantitative study of cerebral hemodynamics. This paper examines the relation between a compartmental equivalent-circuit and a data-based input-output model of dynamic cerebral autoregulation (DCA) and CO2-vasomotor reactivity (DVR). The compartmental model is constructed as an equivalent-circuit utilizing putative first principles and previously proposed hypothesis-based models. The linear input-output dynamics of this compartmental model are compared with data-based estimates of the DCA-DVR process. This comparative study indicates that there are some qualitative similarities between the two-input compartmental model and experimental results.

  13. 3MdB: the Mexican Million Models database

    NASA Astrophysics Data System (ADS)

    Morisset, C.; Delgado-Inglada, G.

    2014-10-01

    The 3MdB is an original effort to construct a large multipurpose database of photoionization models. This is a more modern version of a previous attempt based on Cloudy3D and IDL tools. It is accessed by MySQL requests. The models are obtained using the well known and widely used Cloudy photoionization code (Ferland et al., 2013). The database aims to host grids of models, with different references to identify each project and to facilitate the extraction of the desired data. We present here a description of the way the database is managed and some of the projects that use 3MdB. Anybody can ask for a grid to be run and stored in 3MdB, to increase the visibility of the grid and its potential side applications.

  14. Updated regulation curation model at the Saccharomyces Genome Database

    PubMed Central

    Engel, Stacia R; Skrzypek, Marek S; Hellerstedt, Sage T; Wong, Edith D; Nash, Robert S; Weng, Shuai; Binkley, Gail; Sheppard, Travis K; Karra, Kalpana; Cherry, J Michael

    2018-01-01

    The Saccharomyces Genome Database (SGD) provides comprehensive, integrated biological information for the budding yeast Saccharomyces cerevisiae, along with search and analysis tools to explore these data, enabling the discovery of functional relationships between sequence and gene products in fungi and higher organisms. We have recently expanded our data model for regulation curation to address regulation at the protein level in addition to transcription, and are presenting the expanded data on the ‘Regulation’ pages at SGD. These pages include a summary describing the context under which the regulator acts, manually curated and high-throughput annotations showing the regulatory relationships for that gene and a graphical visualization of its regulatory network and connected networks. For genes whose products regulate other genes or proteins, the Regulation page includes Gene Ontology enrichment analysis of the biological processes in which those targets participate. For DNA-binding transcription factors, we also provide other information relevant to their regulatory function, such as DNA binding site motifs and protein domains. As with other data types at SGD, all regulatory relationships and accompanying data are available through YeastMine, SGD’s data warehouse based on InterMine. Database URL: http://www.yeastgenome.org PMID:29688362

  15. The Importance of Biological Databases in Biological Discovery.

    PubMed

    Baxevanis, Andreas D; Bateman, Alex

    2015-06-19

    Biological databases play a central role in bioinformatics. They offer scientists the opportunity to access a wide variety of biologically relevant data, including the genomic sequences of an increasingly broad range of organisms. This unit provides a brief overview of major sequence databases and portals, such as GenBank, the UCSC Genome Browser, and Ensembl. Model organism databases, including WormBase, The Arabidopsis Information Resource (TAIR), and those made available through the Mouse Genome Informatics (MGI) resource, are also covered. Non-sequence-centric databases, such as Online Mendelian Inheritance in Man (OMIM), the Protein Data Bank (PDB), MetaCyc, and the Kyoto Encyclopedia of Genes and Genomes (KEGG), are also discussed. Copyright © 2015 John Wiley & Sons, Inc.

  16. Orthology for comparative genomics in the mouse genome database.

    PubMed

    Dolan, Mary E; Baldarelli, Richard M; Bello, Susan M; Ni, Li; McAndrews, Monica S; Bult, Carol J; Kadin, James A; Richardson, Joel E; Ringwald, Martin; Eppig, Janan T; Blake, Judith A

    2015-08-01

    The mouse genome database (MGD) is the model organism database component of the mouse genome informatics system at The Jackson Laboratory. MGD is the international data resource for the laboratory mouse and facilitates the use of mice in the study of human health and disease. Since its beginnings, MGD has included comparative genomics data with a particular focus on human-mouse orthology, an essential component of the use of mouse as a model organism. Over the past 25 years, novel algorithms and addition of orthologs from other model organisms have enriched comparative genomics in MGD data, extending the use of orthology data to support the laboratory mouse as a model of human biology. Here, we describe current comparative data in MGD and review the history and refinement of orthology representation in this resource.

  17. Applying an Object-Oriented Database Model to a Scientific Database Problem: Managing Experimental Data at CEBAF.

    NASA Astrophysics Data System (ADS)

    Ehlmann, Bryon K.

    Current scientific experiments are often characterized by massive amounts of very complex data and the need for complex data analysis software. Object-oriented database (OODB) systems have the potential of improving the description of the structure and semantics of this data and of integrating the analysis software with the data. This dissertation results from research to enhance OODB functionality and methodology to support scientific databases (SDBs) and, more specifically, to support a nuclear physics experiments database for the Continuous Electron Beam Accelerator Facility (CEBAF). This research to date has identified a number of problems related to the practical application of OODB technology to the conceptual design of the CEBAF experiments database and other SDBs: the lack of a generally accepted OODB design methodology, the lack of a standard OODB model, the lack of a clear conceptual level in existing OODB models, and the limited support in existing OODB systems for many common object relationships inherent in SDBs. To address these problems, the dissertation describes an Object-Relationship Diagram (ORD) and an Object-oriented Database Definition Language (ODDL) that provide tools that allow SDB design and development to proceed systematically and independently of existing OODB systems. These tools define multi-level, conceptual data models for SDB design, which incorporate a simple notation for describing common types of relationships that occur in SDBs. ODDL allows these relationships and other desirable SDB capabilities to be supported by an extended OODB system. A conceptual model of the CEBAF experiments database is presented in terms of ORDs and the ODDL to demonstrate their functionality and use and provide a foundation for future development of experimental nuclear physics software using an OODB approach.

  18. WholeCellSimDB: a hybrid relational/HDF database for whole-cell model predictions

    PubMed Central

    Karr, Jonathan R.; Phillips, Nolan C.; Covert, Markus W.

    2014-01-01

    Mechanistic ‘whole-cell’ models are needed to develop a complete understanding of cell physiology. However, extracting biological insights from whole-cell models requires running and analyzing large numbers of simulations. We developed WholeCellSimDB, a database for organizing whole-cell simulations. WholeCellSimDB was designed to enable researchers to search simulation metadata to identify simulations for further analysis, and quickly slice and aggregate simulation results data. In addition, WholeCellSimDB enables users to share simulations with the broader research community. The database uses a hybrid relational/hierarchical data format architecture to efficiently store and retrieve both simulation setup metadata and results data. WholeCellSimDB provides a graphical Web-based interface to search, browse, plot and export simulations; a JavaScript Object Notation (JSON) Web service to retrieve data for Web-based visualizations; a command-line interface to deposit simulations; and a Python API to retrieve data for advanced analysis. Overall, we believe WholeCellSimDB will help researchers use whole-cell models to advance basic biological science and bioengineering. Database URL: http://www.wholecellsimdb.org Source code repository URL: http://github.com/CovertLab/WholeCellSimDB PMID:25231498

  19. Crystallography Open Database – an open-access collection of crystal structures

    PubMed Central

    Gražulis, Saulius; Chateigner, Daniel; Downs, Robert T.; Yokochi, A. F. T.; Quirós, Miguel; Lutterotti, Luca; Manakova, Elena; Butkus, Justas; Moeck, Peter; Le Bail, Armel

    2009-01-01

    The Crystallography Open Database (COD), which is a project that aims to gather all available inorganic, metal–organic and small organic molecule structural data in one database, is described. The database adopts an open-access model. The COD currently contains ∼80 000 entries in crystallographic information file format, with nearly full coverage of the International Union of Crystallography publications, and is growing in size and quality. PMID:22477773

  20. A database for propagation models

    NASA Technical Reports Server (NTRS)

    Kantak, Anil V.; Suwitra, Krisjani; Le, Choung

    1994-01-01

    A database of various propagation phenomena models that can be used by telecommunications systems engineers to obtain parameter values for systems design is presented. This is an easy-to-use tool and is currently available for either a PC using Excel software under Windows environment or a Macintosh using Excel software for Macintosh. All the steps necessary to use the software are easy and many times self-explanatory; however, a sample run of the CCIR rain attenuation model is presented.

  1. An Object-Relational IFC Storage Model Based on Oracle Database

    NASA Astrophysics Data System (ADS)

    Li, Hang; Liu, Hua; Liu, Yong; Wang, Yuan

    2016-06-01

    As building models become increasingly complicated, collaboration among professionals is attracting more attention in the architecture, engineering and construction (AEC) industry. To adapt to this change, buildingSMART developed Industry Foundation Classes (IFC) to facilitate interoperability between software platforms. However, IFC data are currently shared in the form of text files, which has drawbacks. In this paper, considering the object-based inheritance hierarchy of IFC and the storage features of different database management systems (DBMS), we propose a novel object-relational storage model that uses an Oracle database to store IFC data. First, we establish the mapping rules between data types in the IFC specification and the Oracle database. Second, we design the IFC database according to the relationships among IFC entities. Third, we parse the IFC file and extract the IFC data. Finally, we store the IFC data in the corresponding tables of the IFC database. In the experiment, three different building models are selected to demonstrate the effectiveness of our storage model. The comparison of experimental statistics shows that no IFC data are lost during data exchange.

  2. ECOS E-MATRIX Methane and Volatile Organic Carbon (VOC) Emissions Best Practices Database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Parisien, Lia

    2016-01-31

    This final scientific/technical report on the ECOS e-MATRIX Methane and Volatile Organic Carbon (VOC) Emissions Best Practices Database provides a disclaimer and acknowledgement, table of contents, executive summary, description of project activities, and briefing/technical presentation link.

  3. Greedy Sampling and Incremental Surrogate Model-Based Tailoring of Aeroservoelastic Model Database for Flexible Aircraft

    NASA Technical Reports Server (NTRS)

    Wang, Yi; Pant, Kapil; Brenner, Martin J.; Ouellette, Jeffrey A.

    2018-01-01

    This paper presents a data analysis and modeling framework to tailor and develop a linear parameter-varying (LPV) aeroservoelastic (ASE) model database for flexible aircraft in a broad 2D flight parameter space. The Kriging surrogate model is constructed using ASE models at a fraction of the grid points within the original model database, and then the ASE model at any flight condition can be obtained simply through surrogate model interpolation. The greedy sampling algorithm is developed to select the next sample point that carries the worst relative error between the surrogate model prediction and the benchmark model in the frequency domain among all input-output channels. The process is iterated to incrementally improve surrogate model accuracy until a pre-determined tolerance or iteration budget is met. The methodology is applied to the ASE model database of a flexible aircraft currently being tested at NASA/AFRC for flutter suppression and gust load alleviation. Our studies indicate that the proposed method can reduce the number of models in the original database by 67%. Even so, the ASE models obtained through Kriging interpolation match the models in the original database, constructed directly from the physics-based tool, with the worst relative error far below 1%. The interpolated ASE model exhibits continuously-varying gains along a set of prescribed flight conditions. More importantly, the selected grid points are distributed non-uniformly in the parameter space, a) capturing the distinctly different dynamic behavior and its dependence on flight parameters, and b) reiterating the need and utility for adaptive space sampling techniques for ASE model database compaction. The present framework is directly extendible to high-dimensional flight parameter space, and can be used to guide the ASE model development, model order reduction, robust control synthesis and novel vehicle design of flexible aircraft.
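
    The greedy loop described in this abstract can be sketched in a few lines. This is only an illustration under loud assumptions: piecewise-linear interpolation stands in for the Kriging surrogate, a one-dimensional scalar function (`benchmark`, hypothetical) stands in for the physics-based ASE models over a 2D flight parameter space, and the error is a plain relative error rather than a frequency-domain measure over all input-output channels.

```python
import math


def benchmark(x):
    """Hypothetical stand-in for the expensive physics-based model (always >= 1)."""
    return 2.0 + math.sin(3.0 * x)


def surrogate(samples, x):
    """Piecewise-linear interpolation through the sampled points.

    Assumption: this replaces the Kriging surrogate used in the paper.
    """
    xs = sorted(samples)
    for left, right in zip(xs, xs[1:]):
        if left <= x <= right:
            t = (x - left) / (right - left)
            return (1.0 - t) * samples[left] + t * samples[right]
    raise ValueError("x lies outside the sampled range")


def worst_error(samples, grid):
    """Return the unsampled grid point with the largest relative error."""
    worst_x, worst = None, 0.0
    for x in grid:
        if x in samples:
            continue
        err = abs(surrogate(samples, x) - benchmark(x)) / abs(benchmark(x))
        if err > worst:
            worst_x, worst = x, err
    return worst_x, worst


def greedy_sample(grid, tol, max_iter):
    """Add the worst-error grid point until the tolerance or budget is met."""
    # Start from the two end points so that interpolation is always defined.
    samples = {grid[0]: benchmark(grid[0]), grid[-1]: benchmark(grid[-1])}
    for _ in range(max_iter):
        x_worst, err = worst_error(samples, grid)
        if err <= tol:
            break
        samples[x_worst] = benchmark(x_worst)
    return samples, worst_error(samples, grid)[1]


if __name__ == "__main__":
    grid = [i / 50.0 for i in range(101)]  # 101 candidate conditions on [0, 2]
    samples, err = greedy_sample(grid, tol=0.01, max_iter=200)
    print(len(samples), "of", len(grid), "grid points kept; worst error", err)
```

    Because each iteration adds only the point where the surrogate is currently worst, the retained points cluster where the benchmark curves most strongly, which mirrors the non-uniform distribution of selected grid points noted in the abstract.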

  4. Organization's Orderly Interest Exploration: Inception, Development and Insights of AIAA's Topics Database

    NASA Technical Reports Server (NTRS)

    Marshall, Joseph R.; Morris, Allan T.

    2007-01-01

    Since 2003, AIAA's Computer Systems and Software Systems Technical Committees (TCs) have developed a database that aids technical committee management to map technical topics to their members. This Topics/Interest (T/I) database grew out of a collection of charts and spreadsheets maintained by the TCs. Since its inception, the tool has evolved into a multi-dimensional database whose dimensions include the importance, interest and expertise of TC members and whether or not a member and/or a TC is actively involved with the topic. In 2005, the database was expanded to include the TCs in AIAA's Information Systems Group and then expanded further to include all AIAA TCs. It was field tested at an AIAA Technical Activities Committee (TAC) Workshop in early 2006 through live access by over 80 users. Through the use of the topics database, TC and program committee (PC) members can accomplish relevant tasks such as: to identify topic experts (for Aerospace America articles or external contacts), to determine the interest of its members, to identify overlapping topics between diverse TCs and PCs, to guide new member drives and to reveal emerging topics. This paper will describe the origins, inception, initial development, field test and current version of the tool as well as elucidate the benefits and insights gained by using the database to aid the management of various TC functions. Suggestions will be provided to guide future development of the database for the purpose of providing dynamics and system level benefits to AIAA that currently do not exist in any technical organization.

  5. First Database Course--Keeping It All Organized

    ERIC Educational Resources Information Center

    Baugh, Jeanne M.

    2015-01-01

    All Computer Information Systems programs require a database course for their majors. This paper describes an approach to such a course in which real world examples, both design projects and actual database application projects are incorporated throughout the semester. Students are expected to apply the traditional database concepts to actual…

  6. SynechoNET: integrated protein-protein interaction database of a model cyanobacterium Synechocystis sp. PCC 6803.

    PubMed

    Kim, Woo-Yeon; Kang, Sungsoo; Kim, Byoung-Chul; Oh, Jeehyun; Cho, Seongwoong; Bhak, Jong; Choi, Jong-Soon

    2008-01-01

    Cyanobacteria are model organisms for studying photosynthesis, carbon and nitrogen assimilation, evolution of plant plastids, and adaptability to environmental stresses. Despite many studies on cyanobacteria, there is no web-based database of their regulatory and signaling protein-protein interaction networks to date. We report a database and website SynechoNET that provides predicted protein-protein interactions. SynechoNET shows cyanobacterial domain-domain interactions as well as their protein-level interactions using the model cyanobacterium, Synechocystis sp. PCC 6803. It predicts the protein-protein interactions using public interaction databases that contain mutually complementary and redundant data. Furthermore, SynechoNET provides information on transmembrane topology, signal peptide, and domain structure in order to support the analysis of regulatory membrane proteins. Such biological information can be queried and visualized in user-friendly web interfaces that include the interactive network viewer and search pages by keyword and functional category. SynechoNET is an integrated protein-protein interaction database designed to analyze regulatory membrane proteins in cyanobacteria. It provides a platform for biologists to extend the genomic data of cyanobacteria by predicting interaction partners, membrane association, and membrane topology of Synechocystis proteins. SynechoNET is freely available at http://synechocystis.org/ or directly at http://bioportal.kobic.kr/SynechoNET/.

  7. Using chemical organization theory for model checking

    PubMed Central

    Kaleta, Christoph; Richter, Stephan; Dittrich, Peter

    2009-01-01

    Motivation: The increasing number and complexity of biomodels makes automatic procedures for checking the models' properties and quality necessary. Approaches like elementary mode analysis, flux balance analysis, deficiency analysis and chemical organization theory (OT) require only the stoichiometric structure of the reaction network for derivation of valuable information. In formalisms like Systems Biology Markup Language (SBML), however, information about the stoichiometric coefficients required for an analysis of chemical organizations can be hidden in kinetic laws. Results: First, we introduce an algorithm that uncovers stoichiometric information that might be hidden in the kinetic laws of a reaction network. This allows us to apply OT to SBML models using modifiers. Second, using the new algorithm, we performed a large-scale analysis of the 185 models contained in the manually curated BioModels Database. We found that for 41 models (22%) the set of organizations changes when modifiers are considered correctly. We discuss one of these models in detail (BIOMD149, a combined model of the ERK- and Wnt-signaling pathways), whose set of organizations drastically changes when modifiers are considered. Third, we found inconsistencies in 5 models (3%) and identified their characteristics. Compared with flux-based methods, OT is able to identify those species and reactions more accurately [in 26 cases (14%)] that can be present in a long-term simulation of the model. We conclude that our approach is a valuable tool that helps to improve the consistency of biomodels and their repositories. Availability: All data and a JAVA applet to check SBML-models is available from http://www.minet.uni-jena.de/csb/prj/ot/tools Contact: dittrich@minet.uni-jena.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19468053
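
    The effect of modifiers on the set of organizations can be illustrated with a minimal sketch of the closure test alone. It assumes that a modifier required by a kinetic law is treated as a catalyst appearing on both sides of the reaction; the reaction network, species names and helper functions are hypothetical, the paper's algorithm for inspecting kinetic laws is not reproduced, and the self-maintenance condition that an organization must also satisfy is omitted.

```python
def is_closed(species, reactions):
    """A species set is closed if no applicable reaction produces a new species.

    `reactions` is a list of (reactants, products) pairs of frozensets.
    """
    for reactants, products in reactions:
        applicable = reactants <= species
        if applicable and not products <= species:
            return False
    return True


def ignore_modifiers(network):
    """Drop modifiers: only the declared reactants and products are used."""
    return [(reactants, products) for reactants, products, _ in network]


def modifiers_as_catalysts(network):
    """Assumption: each modifier is needed for the reaction and is regenerated."""
    return [
        (reactants | modifiers, products | modifiers)
        for reactants, products, modifiers in network
    ]


# Hypothetical network: A -> B, with enzyme E listed only as a modifier.
NETWORK = [(frozenset({"A"}), frozenset({"B"}), frozenset({"E"}))]

if __name__ == "__main__":
    candidate = {"A"}
    print(is_closed(candidate, ignore_modifiers(NETWORK)))        # A alone yields B
    print(is_closed(candidate, modifiers_as_catalysts(NETWORK)))  # reaction needs E
```

    In this toy network the set {A} is not closed when the modifier is ignored, because the reaction can fire and produce B, but it is closed once E is required, since the reaction can no longer fire without E. This is the kind of change in the set of organizations that the abstract reports for models using modifiers.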

  8. Integrating heterogeneous databases in clustered medical care environments using object-oriented technology

    NASA Astrophysics Data System (ADS)

    Thakore, Arun K.; Sauer, Frank

    1994-05-01

    The organization of modern medical care environments into disease-related clusters, such as a cancer center, a diabetes clinic, etc., has the side-effect of introducing multiple heterogeneous databases, often containing similar information, within the same organization. This heterogeneity fosters incompatibility and prevents the effective sharing of data amongst applications at different sites. Although integration of heterogeneous databases is now feasible, in the medical arena this is often an ad hoc process, not founded on proven database technology or formal methods. In this paper we illustrate the use of a high-level object-oriented semantic association method to model information found in different databases into an integrated conceptual global model that integrates the databases. We provide examples from the medical domain to illustrate an integration approach resulting in a consistent global view, without attacking the autonomy of the underlying databases.

  9. Effects of distributed database modeling on evaluation of transaction rollbacks

    NASA Technical Reports Server (NTRS)

    Mukkamala, Ravi

    1991-01-01

    Data distribution, degree of data replication, and transaction access patterns are key factors in determining the performance of distributed database systems. In order to simplify the evaluation of performance measures, database designers and researchers tend to make simplistic assumptions about the system. The effect of modeling assumptions on the evaluation of one such measure, the number of transaction rollbacks in a partitioned distributed database system, is studied. Six probabilistic models and expressions are developed for the numbers of rollbacks under each of these models. Essentially, the models differ in terms of the available system information. The analytical results so obtained are compared to results from simulation. From this comparison, it is concluded that most of the probabilistic models yield overly conservative estimates of the number of rollbacks. The effect of transaction commutativity on system throughput is also grossly undermined when such models are employed.

  10. Effects of distributed database modeling on evaluation of transaction rollbacks

    NASA Technical Reports Server (NTRS)

    Mukkamala, Ravi

    1991-01-01

    Data distribution, degree of data replication, and transaction access patterns are key factors in determining the performance of distributed database systems. In order to simplify the evaluation of performance measures, database designers and researchers tend to make simplistic assumptions about the system. Here, researchers investigate the effect of modeling assumptions on the evaluation of one such measure, the number of transaction rollbacks in a partitioned distributed database system. The researchers developed six probabilistic models and expressions for the number of rollbacks under each of these models. Essentially, the models differ in terms of the available system information. The analytical results obtained are compared to results from simulation. It was concluded that most of the probabilistic models yield overly conservative estimates of the number of rollbacks. The effect of transaction commutativity on system throughput is also grossly undermined when such models are employed.

  11. Human Thermal Model Evaluation Using the JSC Human Thermal Database

    NASA Technical Reports Server (NTRS)

    Cognata, T.; Bue, G.; Makinen, J.

    2011-01-01

    The human thermal database developed at the Johnson Space Center (JSC) is used to evaluate a set of widely used human thermal models. This database will facilitate a more accurate evaluation of human thermoregulatory response in a variety of situations, including those situations that might otherwise prove too dangerous for actual testing, such as extreme hot or cold splashdown conditions. This set includes the Wissler human thermal model, a model that has been widely used to predict the human thermoregulatory response to a variety of cold and hot environments. These models are statistically compared to the current database, which contains experiments of human subjects primarily in air from a literature survey ranging between 1953 and 2004 and from a suited experiment recently performed by the authors, for a quantitative study of relative strength and predictive quality of the models. Human thermal modeling has considerable long term utility to human space flight. Such models provide a tool to predict crew survivability in support of vehicle design and to evaluate crew response in untested environments. It is to the benefit of any such model not only to collect relevant experimental data to correlate it against, but also to maintain an experimental standard or benchmark for future development in a readily and rapidly searchable and software accessible format. The Human thermal database project is intended to do just that: to collect relevant data from literature and experimentation and to store the data in a database structure for immediate and future use as a benchmark to judge human thermal models against, in identifying model strengths and weaknesses, to support model development and improve correlation, and to statistically quantify a model's predictive quality.

  12. PAMDB: a comprehensive Pseudomonas aeruginosa metabolome database.

    PubMed

    Huang, Weiliang; Brewer, Luke K; Jones, Jace W; Nguyen, Angela T; Marcu, Ana; Wishart, David S; Oglesby-Sherrouse, Amanda G; Kane, Maureen A; Wilks, Angela

    2018-01-04

    The Pseudomonas aeruginosa Metabolome Database (PAMDB, http://pseudomonas.umaryland.edu) is a searchable, richly annotated metabolite database specific to P. aeruginosa. P. aeruginosa is a soil organism and significant opportunistic pathogen that adapts to its environment through a versatile energy metabolism network. Furthermore, P. aeruginosa is a model organism for the study of biofilm formation, quorum sensing, and bioremediation processes, each of which is dependent on unique pathways and metabolites. The PAMDB is modelled on the Escherichia coli (ECMDB), yeast (YMDB) and human (HMDB) metabolome databases and contains >4370 metabolites and 938 pathways with links to over 1260 genes and proteins. The database information was compiled from electronic databases, journal articles and mass spectrometry (MS) metabolomic data obtained in our laboratories. For each metabolite entered, we provide detailed compound descriptions, names and synonyms, structural and physicochemical information, nuclear magnetic resonance (NMR) and MS spectra, enzymes and pathway information, as well as gene and protein sequences. The database allows extensive searching via chemical names, structure and molecular weight, together with gene, protein and pathway relationships. The PAMDB and its future iterations will provide a valuable resource to biologists, natural product chemists and clinicians in identifying active compounds, potential biomarkers and clinical diagnostics. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. Physiological Parameters Database for PBPK Modeling (External Review Draft)

    EPA Science Inventory

    EPA released for public comment a physiological parameters database (created using Microsoft ACCESS) intended to be used in PBPK modeling. The database contains physiological parameter values for humans from early childhood through senescence. It also contains similar data for an...

  14. Human Thermal Model Evaluation Using the JSC Human Thermal Database

    NASA Technical Reports Server (NTRS)

    Bue, Grant; Makinen, Janice; Cognata, Thomas

    2012-01-01

    Human thermal modeling has considerable long term utility to human space flight. Such models provide a tool to predict crew survivability in support of vehicle design and to evaluate crew response in untested space environments. It is to the benefit of any such model not only to collect relevant experimental data to correlate it against, but also to maintain an experimental standard or benchmark for future development in a readily and rapidly searchable and software accessible format. The Human thermal database project is intended to do just that: to collect relevant data from literature and experimentation and to store the data in a database structure for immediate and future use as a benchmark to judge human thermal models against, in identifying model strengths and weaknesses, to support model development and improve correlation, and to statistically quantify a model's predictive quality. The human thermal database developed at the Johnson Space Center (JSC) is intended to evaluate a set of widely used human thermal models. This set includes the Wissler human thermal model, a model that has been widely used to predict the human thermoregulatory response to a variety of cold and hot environments. These models are statistically compared to the current database, which contains experiments of human subjects primarily in air from a literature survey ranging between 1953 and 2004 and from a suited experiment recently performed by the authors, for a quantitative study of relative strength and predictive quality of the models.

  15. ASGARD: an open-access database of annotated transcriptomes for emerging model arthropod species.

    PubMed

    Zeng, Victor; Extavour, Cassandra G

    2012-01-01

    The increased throughput and decreased cost of next-generation sequencing (NGS) have shifted the bottleneck in genomic research from sequencing to annotation, analysis and accessibility. This is particularly challenging for research communities working on organisms that lack the basic infrastructure of a sequenced genome, or an efficient way to utilize whatever sequence data may be available. Here we present a new database, the Assembled Searchable Giant Arthropod Read Database (ASGARD). This database is a repository and search engine for transcriptomic data from arthropods that are of high interest to multiple research communities but currently lack sequenced genomes. We demonstrate the functionality and utility of ASGARD using de novo assembled transcriptomes from the milkweed bug Oncopeltus fasciatus, the cricket Gryllus bimaculatus and the amphipod crustacean Parhyale hawaiensis. We have annotated these transcriptomes to assign putative orthology, coding region determination, protein domain identification and Gene Ontology (GO) term annotation to all possible assembly products. ASGARD allows users to search all assemblies by orthology annotation, GO term annotation or Basic Local Alignment Search Tool. User-friendly features of ASGARD include search term auto-completion suggestions based on database content, the ability to download assembly product sequences in FASTA format, direct links to NCBI data for predicted orthologs and graphical representation of the location of protein domains and matches to similar sequences from the NCBI non-redundant database. ASGARD will be a useful repository for transcriptome data from future NGS studies on these and other emerging model arthropods, regardless of sequencing platform, assembly or annotation status. This database thus provides easy, one-stop access to multi-species annotated transcriptome information. We anticipate that this database will be useful for members of multiple research communities, including developmental

  16. Geospatial Database for Strata Objects Based on Land Administration Domain Model (LADM)

    NASA Astrophysics Data System (ADS)

    Nasorudin, N. N.; Hassan, M. I.; Zulkifli, N. A.; Rahman, A. Abdul

    2016-09-01

    Building construction in our country has recently become more complex, and a strata objects database is becoming more important for registering the real world now that people own and use multiple levels of space. Strata titles are also increasingly important and need to be well managed. The Land Administration Domain Model (LADM), also known as ISO 19152, is a standard model for land administration that allows integrated 2D and 3D representation of spatial units. The aim of this paper is to develop a strata objects database using LADM and to analyze the resulting database against the LADM data model. The paper discusses the current cadastre system in Malaysia, including strata title, lists the problems of the current 2D geospatial database, and sets out the need for a 3D geospatial database in the future. The database design process comprises conceptual, logical and physical design. The strata objects database allows both spatial and non-spatial strata title information to be retrieved, and so shows the location of a strata unit. Its development may help in handling strata titles and related information.

  17. A database for propagation models

    NASA Technical Reports Server (NTRS)

    Kantak, Anil V.; Suwitra, Krisjani; Le, Chuong

    1995-01-01

    A database of various propagation phenomena models that can be used by telecommunications systems engineers to obtain parameter values for systems design is presented. This is an easy-to-use tool and is currently available for either a PC using Excel software under the Windows environment or a Macintosh using Excel software for Macintosh. All the steps necessary to use the software are easy and often self-explanatory.

  18. Discovery of Possible Gene Relationships through the Application of Self-Organizing Maps to DNA Microarray Databases

    PubMed Central

    Chavez-Alvarez, Rocio; Chavoya, Arturo; Mendez-Vazquez, Andres

    2014-01-01

    DNA microarrays and cell cycle synchronization experiments have made possible the study of the mechanisms of cell cycle regulation of Saccharomyces cerevisiae by simultaneously monitoring the expression levels of thousands of genes at specific time points. On the other hand, pattern recognition techniques can contribute to the analysis of such massive measurements, providing a model of gene expression level evolution through the cell cycle process. In this paper, we propose the use of one such technique, an unsupervised artificial neural network called a Self-Organizing Map (SOM), which has been successfully applied to processes involving very noisy signals, classifying and organizing them, and assisting in the discovery of behavior patterns without requiring prior knowledge about the process under analysis. As a test bed for the use of SOMs in finding possible relationships among genes and their possible contribution in some biological processes, we selected 282 S. cerevisiae genes that have been shown through biological experiments to have an activity during the cell cycle. The expression level of these genes was analyzed in five of the most cited time series DNA microarray databases used in the study of the cell cycle of this organism. With the use of SOM, it was possible to find clusters of genes with similar behavior in the five databases along two cell cycles. This result suggested that some of these genes might be biologically related or might have a regulatory relationship, as was corroborated by comparing some of the clusters obtained with SOMs against a previously reported regulatory network that was generated using biological knowledge, such as protein-protein interactions, gene expression levels, metabolism dynamics, promoter binding, and modification, regulation and transport of proteins. The methodology described in this paper could be applied to the study of gene relationships of other biological processes in different organisms. PMID:24699245

  19. Using the Cambridge Structural Database to Teach Molecular Geometry Concepts in Organic Chemistry

    ERIC Educational Resources Information Center

    Wackerly, Jay Wm.; Janowicz, Philip A.; Ritchey, Joshua A.; Caruso, Mary M.; Elliott, Erin L.; Moore, Jeffrey S.

    2009-01-01

    This article reports a set of two homework assignments that can be used in a second-year undergraduate organic chemistry class. These assignments were designed to help reinforce concepts of molecular geometry and to give students the opportunity to use a technological database and data mining to analyze experimentally determined chemical…

  20. Database Administrator

    ERIC Educational Resources Information Center

    Moore, Pam

    2010-01-01

    The Internet and electronic commerce (e-commerce) generate lots of data. Data must be stored, organized, and managed. Database administrators, or DBAs, work with database software to find ways to do this. They identify user needs, set up computer databases, and test systems. They ensure that systems perform as they should and add people to the…

  1. Gene and protein nomenclature in public databases

    PubMed Central

    Fundel, Katrin; Zimmer, Ralf

    2006-01-01

    Background: Frequently, several alternative names are in use for biological objects such as genes and proteins. Applications like manual literature search, automated text-mining, named entity identification, gene/protein annotation, and linking of knowledge from different information sources require the knowledge of all used names referring to a given gene or protein. Various organism-specific or general public databases aim at organizing knowledge about genes and proteins. These databases can be used for deriving gene and protein name dictionaries. So far, little is known about the differences between databases in terms of size, ambiguities and overlap. Results: We compiled five gene and protein name dictionaries for each of the five model organisms (yeast, fly, mouse, rat, and human) from different organism-specific and general public databases. We analyzed the degree of ambiguity of gene and protein names within and between dictionaries, to a lexicon of common English words and domain-related non-gene terms, and we compared different data sources in terms of size of extracted dictionaries and overlap of synonyms between those. The study shows that the number of genes/proteins and synonyms covered in individual databases varies significantly for a given organism, and that the degree of ambiguity of synonyms varies significantly between different organisms. Furthermore, it shows that, despite considerable efforts of co-curation, the overlap of synonyms in different data sources is rather moderate and that the degree of ambiguity of gene names with common English words and domain-related non-gene terms varies depending on the considered organism. Conclusion: In conclusion, these results indicate that the combination of data contained in different databases allows the generation of gene and protein name dictionaries that contain significantly more used names than dictionaries obtained from individual data sources. Furthermore, curation of combined dictionaries

  2. Imprecision and Uncertainty in the UFO Database Model.

    ERIC Educational Resources Information Center

    Van Gyseghem, Nancy; De Caluwe, Rita

    1998-01-01

    Discusses how imprecision and uncertainty are dealt with in the UFO (Uncertainty and Fuzziness in an Object-oriented) database model. Such information is expressed by means of possibility distributions, and modeled by means of the proposed concept of "role objects." The role objects model uncertain, tentative information about objects,…

  3. VerSeDa: vertebrate secretome database

    PubMed Central

    Cortazar, Ana R.; Oguiza, José A.

    2017-01-01

    Based on the current tools, de novo secretome (full set of proteins secreted by an organism) prediction is a time consuming bioinformatic task that requires a multifactorial analysis in order to obtain reliable in silico predictions. Hence, to accelerate this process and offer researchers a reliable repository where secretome information can be obtained for vertebrates and model organisms, we have developed VerSeDa (Vertebrate Secretome Database). This freely available database stores information about proteins that are predicted to be secreted through the classical and non-classical mechanisms, for the wide range of vertebrate species deposited at the NCBI, UCSC and ENSEMBL sites. To our knowledge, VerSeDa is the only state-of-the-art database designed to store secretome data from multiple vertebrate genomes, thus, saving an important amount of time spent in the prediction of protein features that can be retrieved from this repository directly. Database URL: VerSeDa is freely available at http://genomics.cicbiogune.es/VerSeDa/index.php PMID:28365718

  4. Modeling biology using relational databases.

    PubMed

    Peitzsch, Robert M

    2003-02-01

    There are several different methodologies that can be used for designing a database schema; no single one is best for all occasions. This unit demonstrates two different techniques for designing relational tables and discusses when each should be used. The two techniques presented are (1) traditional Entity-Relationship (E-R) modeling and (2) a hybrid method that combines aspects of data warehousing and E-R modeling. The method of choice depends on (1) how well the information and all its inherent relationships are understood, (2) what types of questions will be asked, (3) how many different types of data will be included, and (4) how much data exists.
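
    As an illustration of the traditional E-R approach named in this abstract, the sketch below maps three entities (organism, gene, protein) and their one-to-many relationships onto relational tables with Python's built-in sqlite3 module. The table names, column names and the example rows are assumptions chosen for illustration; they are not taken from the unit itself.

```python
import sqlite3


def build_schema(conn):
    """Create a small E-R style schema: one organism has many genes,
    and one gene has many proteins (hypothetical tables)."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE organism (
            organism_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL UNIQUE
        );
        CREATE TABLE gene (
            gene_id     INTEGER PRIMARY KEY,
            symbol      TEXT NOT NULL,
            organism_id INTEGER NOT NULL REFERENCES organism(organism_id)
        );
        CREATE TABLE protein (
            protein_id  INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            gene_id     INTEGER NOT NULL REFERENCES gene(gene_id)
        );
        """
    )


def proteins_for_organism(conn, organism_name):
    """Follow the relationships organism -> gene -> protein with joins."""
    rows = conn.execute(
        """
        SELECT p.name
        FROM protein AS p
        JOIN gene AS g ON g.gene_id = p.gene_id
        JOIN organism AS o ON o.organism_id = g.organism_id
        WHERE o.name = ?
        ORDER BY p.name
        """,
        (organism_name,),
    )
    return [row[0] for row in rows]


# Illustrative rows only.
conn = sqlite3.connect(":memory:")
build_schema(conn)
conn.execute("INSERT INTO organism VALUES (1, 'Saccharomyces cerevisiae')")
conn.execute("INSERT INTO gene VALUES (1, 'CDC28', 1)")
conn.execute("INSERT INTO protein VALUES (1, 'Cdc28p', 1)")
print(proteins_for_organism(conn, "Saccharomyces cerevisiae"))
```

    Each entity becomes a table and each relationship a foreign key, which suits the case where the relationships are well understood in advance. The hybrid data-warehousing method mentioned in the abstract is not shown here.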

  5. Carotenoids Database: structures, chemical fingerprints and distribution among organisms.

    PubMed

    Yabuzaki, Junko

    2017-01-01

    To promote understanding of how organisms are related via carotenoids, either evolutionarily or symbiotically, or in food chains through natural histories, we built the Carotenoids Database. This provides chemical information on 1117 natural carotenoids with 683 source organisms. For extracting organisms closely related through the biosynthesis of carotenoids, we offer a new similarity search system 'Search similar carotenoids' using our original chemical fingerprint 'Carotenoid DB Chemical Fingerprints'. These Carotenoid DB Chemical Fingerprints describe the chemical substructure and the modification details based upon International Union of Pure and Applied Chemistry (IUPAC) semi-systematic names of the carotenoids. The fingerprints also allow (i) easier prediction of six biological functions of carotenoids: provitamin A, membrane stabilizers, odorous substances, allelochemicals, antiproliferative activity and reverse MDR activity against cancer cells, (ii) easier classification of carotenoid structures, (iii) partial and exact structure searching and (iv) easier extraction of structural isomers and stereoisomers. We believe this to be the first attempt to establish fingerprints using the IUPAC semi-systematic names. For extracting organisms with similar carotenoid profiles, we provide a new tool 'Search similar profiled organisms'. Our current statistics show some insights into natural history: carotenoids seem to have been spread largely by bacteria, as they produce C30, C40, C45 and C50 carotenoids, with the widest range of end groups, and they share a small portion of C40 carotenoids with eukaryotes. Archaea share an even smaller portion with eukaryotes. Eukaryotes then have evolved a considerable variety of C40 carotenoids. Considering carotenoids, eukaryotes seem more closely related to bacteria than to archaea aside from 16S rRNA lineage analysis. Database URL: http://carotenoiddb.jp. © The Author(s) 2017. Published by Oxford University Press.

  6. Data-based mathematical modeling of vectorial transport across double-transfected polarized cells.

    PubMed

    Bartholomé, Kilian; Rius, Maria; Letschert, Katrin; Keller, Daniela; Timmer, Jens; Keppler, Dietrich

    2007-09-01

    Vectorial transport of endogenous small molecules, toxins, and drugs across polarized epithelial cells contributes to their half-life in the organism and to detoxification. To study vectorial transport in a quantitative manner, an in vitro model was used that includes polarized MDCKII cells stably expressing the recombinant human uptake transporter OATP1B3 in their basolateral membrane and the recombinant ATP-driven efflux pump ABCC2 in their apical membrane. These double-transfected cells enabled mathematical modeling of the vectorial transport of the anionic prototype substance bromosulfophthalein (BSP) that has frequently been used to examine hepatobiliary transport. Time-dependent analyses of (3)H-labeled BSP in the basolateral, intracellular, and apical compartments of cells cultured on filter membranes and efflux experiments in cells preloaded with BSP were performed. A mathematical model was fitted to the experimental data. Data-based modeling was optimized by including endogenous transport processes in addition to the recombinant transport proteins. The predominant contributions to the overall vectorial transport of BSP were mediated by OATP1B3 (44%) and ABCC2 (28%). Model comparison predicted a previously unrecognized endogenous basolateral efflux process as a negative contribution to total vectorial transport, amounting to 19%, which is in line with the detection of the basolateral efflux pump Abcc4 in MDCKII cells. Rate-determining steps in the vectorial transport were identified by calculating control coefficients. Data-based mathematical modeling of vectorial transport of BSP as a model substance resulted in a quantitative description of this process and its components. The same systems biology approach may be applied to other cellular systems and to different substances.
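
    To make the idea of a compartment model for vectorial transport concrete, the sketch below integrates a three-compartment, first-order system (basolateral, intracellular, apical) with a simple forward-Euler scheme. It is not the authors' model: the linear kinetics, the rate constants and the function name are all assumptions made for illustration, and no values from the paper are reproduced.

```python
def simulate(k_up, k_bl, k_ap, b0=1.0, dt=0.01, steps=1000):
    """Forward-Euler integration of a hypothetical linear compartment model.

    k_up: basolateral uptake rate (basolateral -> intracellular)
    k_bl: basolateral efflux rate (intracellular -> basolateral)
    k_ap: apical efflux rate (intracellular -> apical)
    Returns the final amounts (basolateral, intracellular, apical).
    """
    b, c, a = b0, 0.0, 0.0
    for _ in range(steps):
        uptake = k_up * b      # flux into the cell
        back = k_bl * c        # flux back to the basolateral side
        out = k_ap * c         # flux into the apical compartment
        b += dt * (back - uptake)
        c += dt * (uptake - back - out)
        a += dt * out
    return b, c, a


# Hypothetical rate constants, chosen only to show the qualitative effect
# of adding a basolateral efflux process to the model.
without_efflux = simulate(k_up=0.5, k_bl=0.0, k_ap=0.3)
with_efflux = simulate(k_up=0.5, k_bl=0.2, k_ap=0.3)
print("apical amount without basolateral efflux:", without_efflux[2])
print("apical amount with basolateral efflux:", with_efflux[2])
```

    In this toy system a non-zero basolateral efflux lowers the amount reaching the apical compartment, which is the sense in which such a process acts as a negative contribution to net vectorial transport. Fitting rate constants to measured time courses, as described in the abstract, would require an optimizer and real data and is outside this sketch.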

  7. Combining Soil Databases for Topsoil Organic Carbon Mapping in Europe.

    PubMed

    Aksoy, Ece; Yigini, Yusuf; Montanarella, Luca

    2016-01-01

    Accuracy in assessing the distribution of soil organic carbon (SOC) is an important issue because SOC plays key roles in the functions of both natural ecosystems and agricultural systems. There are several studies in the literature with the aim of finding the best method to assess and map the distribution of SOC content for Europe. This study therefore examines another aspect of the issue: the performance of using aggregated soil samples coming from different studies and land uses. The total number of soil samples in this study was 23,835, and they were collected from the "Land Use/Cover Area frame Statistical Survey" (LUCAS) Project (samples from agricultural soil), BioSoil Project (samples from forest soil), and "Soil Transformations in European Catchments" (SoilTrEC) Project (samples from local soil data coming from six different critical zone observatories (CZOs) in Europe). Moreover, 15 spatial indicators (slope, aspect, elevation, compound topographic index (CTI), CORINE land-cover classification, parent material, texture, world reference base (WRB) soil classification, geological formations, annual average temperature, min-max temperature, total precipitation and average precipitation (for years 1960-1990 and 2000-2010)) were used as auxiliary variables in this prediction. One of the most popular geostatistical techniques, Regression-Kriging (RK), was applied to build the model and assess the distribution of SOC. This study showed that, even though the RK method was appropriate for successful SOC mapping, using combined databases did not help to increase the statistical significance of the method results for assessing the SOC distribution. According to our results, SOC variation at the European scale was mainly affected by elevation, slope, CTI, average temperature, average and total precipitation, texture, WRB and CORINE variables in our model. Moreover, the highest average SOC contents were found in the wetland areas; agricultural

  8. Combining Soil Databases for Topsoil Organic Carbon Mapping in Europe

    PubMed Central

    Aksoy, Ece

    2016-01-01

    Accuracy in assessing the distribution of soil organic carbon (SOC) is an important issue because SOC plays key roles in the functions of both natural ecosystems and agricultural systems. There are several studies in the literature with the aim of finding the best method to assess and map the distribution of SOC content for Europe. This study therefore examines another aspect of the issue: the performance of using aggregated soil samples coming from different studies and land uses. The total number of soil samples in this study was 23,835, and they were collected from the “Land Use/Cover Area frame Statistical Survey” (LUCAS) Project (samples from agricultural soil), BioSoil Project (samples from forest soil), and “Soil Transformations in European Catchments” (SoilTrEC) Project (samples from local soil data coming from six different critical zone observatories (CZOs) in Europe). Moreover, 15 spatial indicators (slope, aspect, elevation, compound topographic index (CTI), CORINE land-cover classification, parent material, texture, world reference base (WRB) soil classification, geological formations, annual average temperature, min-max temperature, total precipitation and average precipitation (for years 1960–1990 and 2000–2010)) were used as auxiliary variables in this prediction. One of the most popular geostatistical techniques, Regression-Kriging (RK), was applied to build the model and assess the distribution of SOC. This study showed that, even though the RK method was appropriate for successful SOC mapping, using combined databases did not help to increase the statistical significance of the method results for assessing the SOC distribution. According to our results, SOC variation at the European scale was mainly affected by elevation, slope, CTI, average temperature, average and total precipitation, texture, WRB and CORINE variables in our model. Moreover, the highest average SOC contents were found in the wetland areas

  9. Combining computational models, semantic annotations and simulation experiments in a graph database

    PubMed Central

    Henkel, Ron; Wolkenhauer, Olaf; Waltemath, Dagmar

    2015-01-01

    Model repositories such as the BioModels Database, the CellML Model Repository or JWS Online are frequently accessed to retrieve computational models of biological systems. However, their storage concepts support only restricted types of queries and not all data inside the repositories can be retrieved. In this article we present a storage concept that meets this challenge. It is based on a graph database, reflects the models’ structure, incorporates semantic annotations and simulation descriptions and ultimately connects different types of model-related data. The connections between heterogeneous model-related data and bio-ontologies enable efficient search via biological facts and grant access to new model features. The introduced concept notably improves access to computational models and associated simulations in a model repository. This has positive effects on tasks such as model search, retrieval, ranking, matching and filtering. Furthermore, our work for the first time enables CellML- and Systems Biology Markup Language-encoded models to be effectively maintained in one database. We show how these models can be linked via annotations and queried. Database URL: https://sems.uni-rostock.de/projects/masymos/ PMID:25754863

  10. Data-mining analysis of the global distribution of soil carbon in observational databases and Earth system models

    NASA Astrophysics Data System (ADS)

    Hashimoto, Shoji; Nanko, Kazuki; Ťupek, Boris; Lehtonen, Aleksi

    2017-03-01

    Future climate change will dramatically change the carbon balance in the soil, and this change will affect the terrestrial carbon stock and the climate itself. Earth system models (ESMs) are used to understand the current climate and to project future climate conditions, but the soil organic carbon (SOC) stock simulated by ESMs and those of observational databases are not well correlated when the two are compared at fine grid scales. However, the specific key processes and factors, as well as the relationships among these factors that govern the SOC stock, remain unclear; the inclusion of such missing information would improve the agreement between modeled and observational data. In this study, we sought to identify the influential factors that govern global SOC distribution in observational databases, as well as those simulated by ESMs. We used a data-mining (machine-learning) scheme, boosted regression trees (BRT), to identify the factors affecting the SOC stock. We applied the BRT scheme to three observational databases and 15 ESM outputs from the fifth phase of the Coupled Model Intercomparison Project (CMIP5) and examined the effects of 13 variables/factors categorized into five groups (climate, soil property, topography, vegetation, and land-use history). Globally, the contributions of mean annual temperature, clay content, carbon-to-nitrogen (CN) ratio, wetland ratio, and land cover were high in observational databases, whereas the contributions of the mean annual temperature, land cover, and net primary productivity (NPP) were predominant in the SOC distribution in ESMs. A comparison of the influential factors at a global scale revealed that the most distinct differences between the SOCs from the observational databases and ESMs were the low clay content and CN ratio contributions, and the high NPP contribution in the ESMs. The results of this study will aid in identifying the causes of the current mismatches between observational SOC databases and ESM outputs

  11. Modeling Powered Aerodynamics for the Orion Launch Abort Vehicle Aerodynamic Database

    NASA Technical Reports Server (NTRS)

    Chan, David T.; Walker, Eric L.; Robinson, Philip E.; Wilson, Thomas M.

    2011-01-01

    Modeling the aerodynamics of the Orion Launch Abort Vehicle (LAV) has presented many technical challenges to the developers of the Orion aerodynamic database. During a launch abort event, the aerodynamic environment around the LAV is very complex as multiple solid rocket plumes interact with each other and the vehicle. It is further complicated by vehicle separation events such as between the LAV and the launch vehicle stack or between the launch abort tower and the crew module. The aerodynamic database for the LAV was developed mainly from wind tunnel tests involving powered jet simulations of the rocket exhaust plumes, supported by computational fluid dynamic simulations. However, limitations in both methods have made it difficult to properly capture the aerodynamics of the LAV in experimental and numerical simulations. These limitations have also influenced decisions regarding the modeling and structure of the aerodynamic database for the LAV and led to compromises and creative solutions. Two database modeling approaches are presented in this paper (incremental aerodynamics and total aerodynamics), with examples showing strengths and weaknesses of each approach. In addition, the unique problems presented to the database developers by the large data space required for modeling a launch abort event illustrate the complexities of working with multi-dimensional data.

  12. Supervised Learning for Detection of Duplicates in Genomic Sequence Databases.

    PubMed

    Chen, Qingyu; Zobel, Justin; Zhang, Xiuzhen; Verspoor, Karin

    2016-01-01

    First identified as an issue in 1996, duplication in biological databases introduces redundancy and even leads to inconsistency when contradictory information appears. The amount of data makes purely manual de-duplication impractical, and existing automatic systems cannot detect duplicates as precisely as can experts. Supervised learning has the potential to address such problems by building automatic systems that learn from expert curation to detect duplicates precisely and efficiently. While machine learning is a mature approach in other duplicate detection contexts, it has seen only preliminary application in genomic sequence databases. We developed and evaluated a supervised duplicate detection method based on an expert curated dataset of duplicates, containing over one million pairs across five organisms derived from genomic sequence databases. We selected 22 features to represent distinct attributes of the database records, and developed a binary model and a multi-class model. Both models achieve promising performance; under cross-validation, the binary model had over 90% accuracy in each of the five organisms, while the multi-class model maintains high accuracy and is more robust in generalisation. We performed an ablation study to quantify the impact of different sequence record features, finding that features derived from meta-data, sequence identity, and alignment quality impact performance most strongly. The study demonstrates machine learning can be an effective additional tool for de-duplication of genomic sequence databases. All data are available as described in the supplementary material.
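
    The general shape of supervised duplicate detection over record pairs can be sketched as follows: compute features for each pair of records, then learn a decision rule from labelled pairs. This is a deliberately simplified stand-in, not the method of the paper. The three features, the use of difflib's similarity ratio in place of a real alignment-based sequence identity, the single-feature threshold learner and the toy records are all assumptions.

```python
from difflib import SequenceMatcher


def pair_features(rec_a, rec_b):
    """Return [similarity, length ratio, same-organism flag] for two records.

    SequenceMatcher.ratio() is used as a crude proxy for sequence identity;
    a real system would use an alignment tool instead.
    """
    seq_a, seq_b = rec_a["sequence"], rec_b["sequence"]
    similarity = SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()
    length_ratio = min(len(seq_a), len(seq_b)) / max(len(seq_a), len(seq_b))
    same_organism = 1.0 if rec_a["organism"] == rec_b["organism"] else 0.0
    return [similarity, length_ratio, same_organism]


def fit_stump(features, labels, index=0):
    """Learn a threshold on one feature that maximises training accuracy.

    A pair is predicted to be a duplicate when its feature value is at or
    above the returned threshold.
    """
    best_threshold, best_correct = 0.0, -1
    for threshold in sorted({f[index] for f in features}):
        correct = sum(
            (f[index] >= threshold) == bool(label)
            for f, label in zip(features, labels)
        )
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


# Toy records and labels, invented for illustration (1 = duplicate pair).
r1 = {"sequence": "ATGGCGTACGTTAGC", "organism": "Danio rerio"}
r2 = {"sequence": "ATGGCGTACGTTAGC", "organism": "Danio rerio"}
r3 = {"sequence": "TTTTCCCCAAAAGGGG", "organism": "Zea mays"}
pairs = [(r1, r2), (r1, r3), (r2, r3)]
labels = [1, 0, 0]
features = [pair_features(a, b) for a, b in pairs]
threshold = fit_stump(features, labels)
print("learned similarity threshold:", threshold)
```

    A single threshold on one feature is the simplest possible supervised model; the abstract describes a 22-feature binary model and a multi-class model, which would call for a proper learning library and cross-validation.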

  13. Models for liquid-liquid partition in the system dimethyl sulfoxide-organic solvent and their use for estimating descriptors for organic compounds.

    PubMed

    Karunasekara, Thushara; Poole, Colin F

    2011-07-15

    Partition coefficients for varied compounds were determined for the organic solvent-dimethyl sulfoxide biphasic partition system where the organic solvent is n-heptane or isopentyl ether. These partition coefficient databases are analyzed using the solvation parameter model facilitating a quantitative comparison of the dimethyl sulfoxide-based partition systems with other totally organic partition systems. Dimethyl sulfoxide is a moderately cohesive solvent, reasonably dipolar/polarizable and strongly hydrogen-bond basic. Although generally considered to be non-hydrogen-bond acidic, analysis of the partition coefficient database strongly supports reclassification as a weak hydrogen-bond acid in agreement with recent literature. The system constants for the n-heptane-dimethyl sulfoxide biphasic system provide an explanation of the mechanism for the selective isolation of polycyclic aromatic compounds from mixtures containing low-polarity hydrocarbons based on the capability of the polar interactions (dipolarity/polarizability and hydrogen-bonding) to overcome the opposing cohesive forces in dimethyl sulfoxide that are absent for the interactions with hydrocarbons of low polarity. In addition, dimethyl sulfoxide-organic solvent systems afford a complementary approach to other totally organic biphasic partition systems for descriptor measurements of compounds virtually insoluble in water. Copyright © 2011 Elsevier B.V. All rights reserved.

  14. Firefighters' hearing: a comparison with population databases from the International Standards Organization.

    PubMed

    Kales, S N; Freyman, R L; Hill, J M; Polyhronopoulos, G N; Aldrich, J M; Christiani, D C

    2001-07-01

    We investigated firefighters' hearing relative to general population data to adjust for age-expected hearing loss. For five groups of male firefighters with increasing mean ages, we compared their hearing thresholds at the 50th and 90th percentiles with normative and age- and sex-matched hearing data from the International Standards Organization (databases A and B). At the 50th percentile, from a mean age of 28 to a mean age of 53 years, relative to databases A and B, the firefighters lost an excess of 19 to 23 dB, 20 to 23 dB, and 16 to 19 dB at 3000, 4000, and 6000 Hz, respectively. At the 90th percentile, from a mean age of 28 to a mean age of 53 years, relative to databases A and B, the firefighters lost an excess of 12 to 20 dB, 38 to 44 dB, 41 to 45 dB, and 22 to 28 dB at 2000, 3000, 4000, and 6000 Hz, respectively. The results are consistent with accelerated hearing loss in excess of age-expected loss among the firefighters, especially at or above the 90th percentile.

  15. Sequence modelling and an extensible data model for genomic database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Peter Wei-Der

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of this information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMSs do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the "Extensible Object Model", to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implemented the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  16. Sequence modelling and an extensible data model for genomic database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Peter Wei-Der

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of this information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMSs do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the "Extensible Object Model", to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implemented the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  17. CyanOmics: an integrated database of omics for the model cyanobacterium Synechococcus sp. PCC 7002.

    PubMed

    Yang, Yaohua; Feng, Jie; Li, Tao; Ge, Feng; Zhao, Jindong

    2015-01-01

    Cyanobacteria are an important group of organisms that carry out oxygenic photosynthesis and play vital roles in both the carbon and nitrogen cycles of the Earth. The annotated genome of Synechococcus sp. PCC 7002, as an ideal model cyanobacterium, is available. A series of transcriptomic and proteomic studies of Synechococcus sp. PCC 7002 cells grown under different conditions have been reported. However, no database of such integrated omics studies has been constructed. Here we present CyanOmics, a database based on the results of Synechococcus sp. PCC 7002 omics studies. CyanOmics comprises one genomic dataset, 29 transcriptomic datasets and one proteomic dataset and should prove useful for systematic and comprehensive analysis of all those data. Powerful browsing and searching tools are integrated to help users directly access information of interest with enhanced visualization of the analytical results. Furthermore, Blast is included for sequence-based similarity searching, and Cluster 3.0 and the R hclust function are provided for cluster analyses, to increase CyanOmics's usefulness. To the best of our knowledge, it is the first integrated omics analysis database for cyanobacteria. This database should further understanding of the transcriptional patterns and proteomic profiling of Synechococcus sp. PCC 7002 and other cyanobacteria. Additionally, the entire database framework is applicable to any sequenced prokaryotic genome and could be applied to other integrated omics analysis projects. Database URL: http://lag.ihb.ac.cn/cyanomics. © The Author(s) 2015. Published by Oxford University Press.

  18. Technical Work Plan for: Thermodynamic Database for Chemical Modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    C.F. Jovecolon

    The objective of the work scope covered by this Technical Work Plan (TWP) is to correct and improve the Yucca Mountain Project (YMP) thermodynamic databases, to update their documentation, and to ensure reasonable consistency among them. In addition, the work scope will continue to generate database revisions, which are organized and named so as to be transparent to internal and external users and reviewers. Regarding consistency among databases, it is noted that aqueous speciation and mineral solubility data for a given system may differ according to how solubility was determined, and the method used for subsequent retrieval of thermodynamic parameter values from measured data. Of particular concern are the details of the determination of "infinite dilution" constants, which involve the use of specific methods for activity coefficient corrections. That is, equilibrium constants developed for a given system for one set of conditions may not be consistent with constants developed for other conditions, depending on the species considered in the chemical reactions and the methods used in the reported studies. Hence, there will be some differences (for example in log K values) between the Pitzer and "B-dot" database parameters for the same reactions or species.

  19. Just-in-time Database-Driven Web Applications

    PubMed Central

    2003-01-01

    "Just-in-time" database-driven Web applications are inexpensive, quickly-developed software that can be put to many uses within a health care organization. Database-driven Web applications garnered 73873 hits on our system-wide intranet in 2002. They enabled collaboration and communication via user-friendly Web browser-based interfaces for both mission-critical and patient-care-critical functions. Nineteen database-driven Web applications were developed. The application categories that comprised 80% of the hits were results reporting (27%), graduate medical education (26%), research (20%), and bed availability (8%). The mean number of hits per application was 3888 (SD = 5598; range, 14-19879). A model is described for just-in-time database-driven Web application development and an example given with a popular HTML editor and database program. PMID:14517109
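
    The usage figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration built from the numbers in the abstract; the variable names are hypothetical and nothing here comes from the paper's underlying data.

```python
# Figures as quoted in the abstract above (not taken from the paper's raw data).
total_hits = 73873          # hits on the intranet in 2002
n_apps = 19                 # database-driven Web applications developed
category_share = {          # percentage of hits per category, as reported
    "results reporting": 27,
    "graduate medical education": 26,
    "research": 20,
    "bed availability": 8,
}

# Mean hits per application implied by the total and the application count.
mean_hits = total_hits / n_apps

# Combined share of the four named categories.
top_share = sum(category_share.values())

print(f"mean hits per application: {mean_hits:.0f}")            # 3888
print(f"combined share of the four categories: {top_share}%")   # 81%
```

    The implied mean agrees with the reported 3888 hits per application. The four rounded percentages sum to 81%, which is consistent with the "80%" in the abstract if the individual shares were rounded.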

  20. Data-based Non-Markovian Model Inference

    NASA Astrophysics Data System (ADS)

    Ghil, Michael

    2015-04-01

    This talk concentrates on obtaining stable and efficient data-based models for simulation and prediction in the geosciences and life sciences. The proposed model derivation relies on using a multivariate time series of partial observations from a large-dimensional system, and the resulting low-order models are compared with the optimal closures predicted by the non-Markovian Mori-Zwanzig formalism of statistical physics. Multilayer stochastic models (MSMs) are introduced as both a very broad generalization and a time-continuous limit of existing multilevel, regression-based approaches to data-based closure, in particular of empirical model reduction (EMR). We show that the multilayer structure of MSMs can provide a natural Markov approximation to the generalized Langevin equation (GLE) of the Mori-Zwanzig formalism. A simple correlation-based stopping criterion for an EMR-MSM model is derived to assess how well it approximates the GLE solution. Sufficient conditions are given for the nonlinear cross-interactions between the constitutive layers of a given MSM to guarantee the existence of a global random attractor. This existence ensures that no blow-up can occur for a very broad class of MSM applications. The EMR-MSM methodology is first applied to a conceptual, nonlinear, stochastic climate model of coupled slow and fast variables, in which only slow variables are observed. The resulting reduced model with energy-conserving nonlinearities captures the main statistical features of the slow variables, even when there is no formal scale separation and the fast variables are quite energetic. Second, an MSM is shown to successfully reproduce the statistics of a partially observed, generalized Lotka-Volterra model of population dynamics in its chaotic regime. The positivity constraint on the solutions' components replaces here the quadratic-energy-preserving constraint of fluid-flow problems and it successfully prevents blow-up. This work is based on a close

  1. CycADS: an annotation database system to ease the development and update of BioCyc databases

    PubMed Central

    Vellozo, Augusto F.; Véron, Amélie S.; Baa-Puyoulet, Patrice; Huerta-Cepas, Jaime; Cottret, Ludovic; Febvay, Gérard; Calevro, Federica; Rahbé, Yvan; Douglas, Angela E.; Gabaldón, Toni; Sagot, Marie-France; Charles, Hubert; Colella, Stefano

    2011-01-01

    In recent years, genomes from an increasing number of organisms have been sequenced, but their annotation remains a time-consuming process. The BioCyc databases offer a framework for the integrated analysis of metabolic networks. The Pathway Tools software suite allows the automated construction of a database starting from an annotated genome, but it requires prior integration of all annotations into a specific summary file or into a GenBank file. To allow the easy creation and update of a BioCyc database starting from the multiple genome annotation resources available over time, we have developed an ad hoc data management system that we called Cyc Annotation Database System (CycADS). CycADS is centred on a specific database model and on a set of Java programs to import, filter and export relevant information. Data from GenBank and other annotation sources (including for example: KAAS, PRIAM, Blast2GO and PhylomeDB) are collected into a database to be subsequently filtered and extracted to generate a complete annotation file. This file is then used to build an enriched BioCyc database using the PathoLogic program of Pathway Tools. The CycADS pipeline for annotation management was used to build the AcypiCyc database for the pea aphid (Acyrthosiphon pisum) whose genome was recently sequenced. The AcypiCyc database webpage also includes, for comparative analyses, two other metabolic reconstruction BioCyc databases generated using CycADS: TricaCyc for Tribolium castaneum and DromeCyc for Drosophila melanogaster. Linked to its flexible design, CycADS offers a powerful software tool for the generation and regular updating of enriched BioCyc databases. The CycADS system is particularly suited for metabolic gene annotation and network reconstruction in newly sequenced genomes. Because of the uniform annotation used for metabolic network reconstruction, CycADS is particularly useful for comparative analysis of the metabolism of different organisms. Database URL: http

  2. An online database for informing ecological network models: http://kelpforest.ucsc.edu.

    PubMed

    Beas-Luna, Rodrigo; Novak, Mark; Carr, Mark H; Tinker, Martin T; Black, August; Caselle, Jennifer E; Hoban, Michael; Malone, Dan; Iles, Alison

    2014-01-01

    Ecological network models and analyses are recognized as valuable tools for understanding the dynamics and resiliency of ecosystems, and for informing ecosystem-based approaches to management. However, few databases exist that can provide the life history, demographic and species interaction information necessary to parameterize ecological network models. Faced with the difficulty of synthesizing the information required to construct models for kelp forest ecosystems along the West Coast of North America, we developed an online database (http://kelpforest.ucsc.edu/) to facilitate the collation and dissemination of such information. Many of the database's attributes are novel yet the structure is applicable and adaptable to other ecosystem modeling efforts. Information for each taxonomic unit includes stage-specific life history, demography, and body-size allometries. Species interactions include trophic, competitive, facilitative, and parasitic forms. Each data entry is temporally and spatially explicit. The online data entry interface allows researchers anywhere to contribute and access information. Quality control is facilitated by attributing each entry to unique contributor identities and source citations. The database has proven useful as an archive of species and ecosystem-specific information in the development of several ecological network models, for informing management actions, and for education purposes (e.g., undergraduate and graduate training). To facilitate adaptation of the database by other researchers for other ecosystems, the code and technical details on how to customize this database and apply it to other ecosystems are freely available and located at the following link (https://github.com/kelpforest-cameo/databaseui).

  3. WholeCellSimDB: a hybrid relational/HDF database for whole-cell model predictions.

    PubMed

    Karr, Jonathan R; Phillips, Nolan C; Covert, Markus W

    2014-01-01

    Mechanistic 'whole-cell' models are needed to develop a complete understanding of cell physiology. However, extracting biological insights from whole-cell models requires running and analyzing large numbers of simulations. We developed WholeCellSimDB, a database for organizing whole-cell simulations. WholeCellSimDB was designed to enable researchers to search simulation metadata to identify simulations for further analysis, and quickly slice and aggregate simulation results data. In addition, WholeCellSimDB enables users to share simulations with the broader research community. The database uses a hybrid relational/hierarchical data format architecture to efficiently store and retrieve both simulation setup metadata and results data. WholeCellSimDB provides a graphical Web-based interface to search, browse, plot and export simulations; a JavaScript Object Notation (JSON) Web service to retrieve data for Web-based visualizations; a command-line interface to deposit simulations; and a Python API to retrieve data for advanced analysis. Overall, we believe WholeCellSimDB will help researchers use whole-cell models to advance basic biological science and bioengineering. Database URL: http://www.wholecellsimdb.org. Source code repository: http://github.com/CovertLab/WholeCellSimDB. © The Author(s) 2014. Published by Oxford University Press.

  4. IDAAPM: integrated database of ADMET and adverse effects of predictive modeling based on FDA approved drug data.

    PubMed

    Legehar, Ashenafi; Xhaard, Henri; Ghemtio, Leo

    2016-01-01

    The disposition of a pharmaceutical compound within an organism, i.e. its Absorption, Distribution, Metabolism, Excretion, Toxicity (ADMET) properties and adverse effects, critically affects late stage failure of drug candidates and has led to the withdrawal of approved drugs. Computational methods are effective approaches to reduce the number of safety issues by analyzing possible links between chemical structures and ADMET or adverse effects, but this is limited by the size, quality, and heterogeneity of the data available from individual sources. Thus, large, clean and integrated databases of approved drug data, associated with fast and efficient predictive tools, are desirable early in the drug discovery process. We have built a relational database (IDAAPM) to integrate available approved drug data such as drug approval information, ADMET and adverse effects, chemical structures and molecular descriptors, targets, bioactivity and related references. The database has been coupled with a searchable web interface and modern data analytics platform (KNIME) to allow data access, data transformation, initial analysis and further predictive modeling. Data were extracted from FDA resources and supplemented from other publicly available databases. Currently, the database contains information on 19,226 FDA approval applications for 31,815 products (small molecules and biologics) with their approval history, 2505 active ingredients, together with as many ADMET properties, 1629 molecular structures, 2.5 million adverse effects and 36,963 experimental drug-target bioactivity data. IDAAPM is a unique resource that, in a single relational database, provides detailed information on FDA approved drugs including their ADMET properties and adverse effects, the corresponding targets with bioactivity data, coupled with a data analytics platform. It can be used to perform basic to complex drug-target ADMET or adverse effects analysis and predictive modeling. IDAAPM is

  5. Choosing a genome browser for a Model Organism Database: surveying the Maize community

    PubMed Central

    Sen, Taner Z.; Harper, Lisa C.; Schaeffer, Mary L.; Andorf, Carson M.; Seigfried, Trent E.; Campbell, Darwin A.; Lawrence, Carolyn J.

    2010-01-01

    As the B73 maize genome sequencing project neared completion, MaizeGDB began to integrate a graphical genome browser with its existing web interface and database. To ensure that maize researchers would optimally benefit from the potential addition of a genome browser to the existing MaizeGDB resource, personnel at MaizeGDB surveyed researchers’ needs. Collected data indicate that existing genome browsers for maize were inadequate and suggest that implementation of a browser with a quick interface and intuitive tools would meet most researchers’ needs. Here, we document the survey’s outcomes, review functionalities of available genome browser software platforms and offer our rationale for choosing the GBrowse software suite for MaizeGDB. Because the genome as represented within the MaizeGDB Genome Browser is tied to detailed phenotypic data, molecular marker information, available stocks, etc., the MaizeGDB Genome Browser represents a novel mechanism by which researchers can leverage maize sequence information toward crop improvement directly. Database URL: http://gbrowse.maizegdb.org/ PMID:20627860

  6. Global search tool for the Advanced Photon Source Integrated Relational Model of Installed Systems (IRMIS) database.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Quock, D. E. R.; Cianciarulo, M. B.; APS Engineering Support Division

    2007-01-01

    The Integrated Relational Model of Installed Systems (IRMIS) is a relational database tool that has been implemented at the Advanced Photon Source to maintain an updated account of approximately 600 control system software applications, 400,000 process variables, and 30,000 control system hardware components. To effectively display this large amount of control system information to operators and engineers, IRMIS was initially built with nine Web-based viewers: Applications Organizing Index, IOC, PLC, Component Type, Installed Components, Network, Controls Spares, Process Variables, and Cables. However, since each viewer is designed to provide details from only one major category of the control system, the necessity for a one-stop global search tool for the entire database became apparent. The user requirements for extremely fast database search time and ease of navigation through search results led to the choice of Asynchronous JavaScript and XML (AJAX) technology in the implementation of the IRMIS global search tool. Unique features of the global search tool include a two-tier level of displayed search results, and a database data integrity validation and reporting mechanism.

  7. Saccharomyces genome database informs human biology

    PubMed Central

    Skrzypek, Marek S; Nash, Robert S; Wong, Edith D; MacPherson, Kevin A; Karra, Kalpana; Binkley, Gail; Simison, Matt; Miyasato, Stuart R

    2018-01-01

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is an expertly curated database of literature-derived functional information for the model organism budding yeast, Saccharomyces cerevisiae. SGD constantly strives to synergize new types of experimental data and bioinformatics predictions with existing data, and to organize them into a comprehensive and up-to-date information resource. The primary mission of SGD is to facilitate research into the biology of yeast and to provide this wealth of information to advance, in many ways, research on other organisms, even those as evolutionarily distant as humans. To build such a bridge between biological kingdoms, SGD is curating data regarding yeast-human complementation, in which a human gene can successfully replace the function of a yeast gene, and/or vice versa. These data are manually curated from published literature, made available for download, and incorporated into a variety of analysis tools provided by SGD. PMID:29140510

  8. Saccharomyces genome database informs human biology.

    PubMed

    Skrzypek, Marek S; Nash, Robert S; Wong, Edith D; MacPherson, Kevin A; Hellerstedt, Sage T; Engel, Stacia R; Karra, Kalpana; Weng, Shuai; Sheppard, Travis K; Binkley, Gail; Simison, Matt; Miyasato, Stuart R; Cherry, J Michael

    2018-01-04

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is an expertly curated database of literature-derived functional information for the model organism budding yeast, Saccharomyces cerevisiae. SGD constantly strives to synergize new types of experimental data and bioinformatics predictions with existing data, and to organize them into a comprehensive and up-to-date information resource. The primary mission of SGD is to facilitate research into the biology of yeast and to provide this wealth of information to advance, in many ways, research on other organisms, even those as evolutionarily distant as humans. To build such a bridge between biological kingdoms, SGD is curating data regarding yeast-human complementation, in which a human gene can successfully replace the function of a yeast gene, and/or vice versa. These data are manually curated from published literature, made available for download, and incorporated into a variety of analysis tools provided by SGD. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. Genome databases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Courteau, J.

    1991-10-11

    Since the Genome Project began several years ago, a plethora of databases have been developed or are in the works. They range from the massive Genome Data Base at Johns Hopkins University, the central repository of all gene mapping information, to small databases focusing on single chromosomes or organisms. Some are publicly available, others are essentially private electronic lab notebooks. Still others limit access to a consortium of researchers working on, say, a single human chromosome. An increasing number incorporate sophisticated search and analytical software, while others operate as little more than data lists. In consultation with numerous experts in the field, a list has been compiled of some key genome-related databases. The list was not limited to map and sequence databases but also included the tools investigators use to interpret and elucidate genetic data, such as protein sequence and protein structure databases. Because a major goal of the Genome Project is to map and sequence the genomes of several experimental animals, including E. coli, yeast, fruit fly, nematode, and mouse, the available databases for those organisms are listed as well. The author also includes several databases that are still under development - including some ambitious efforts that go beyond data compilation to create what are being called electronic research communities, enabling many users, rather than just one or a few curators, to add or edit the data and tag it as raw or confirmed.

  10. Filling a missing link between biogeochemical, climate and ecosystem studies: a global database of atmospheric water-soluble organic nitrogen

    NASA Astrophysics Data System (ADS)

    Cornell, Sarah

    2015-04-01

    It is time to collate a global community database of atmospheric water-soluble organic nitrogen deposition. Organic nitrogen (ON) has long been known to be globally ubiquitous in atmospheric aerosol and precipitation, with implications for air and water quality, climate, biogeochemical cycles, ecosystems and human health. The number of studies of atmospheric ON deposition has increased steadily in recent years, but to date there is no accessible global dataset, for either bulk ON or its major components. Improved qualitative and quantitative understanding of the organic nitrogen component is needed to complement the well-established knowledge base pertaining to other components of atmospheric deposition (cf. Vet et al 2014). Without this basic information, we are increasingly constrained in addressing the current dynamics and potential interactions of atmospheric chemistry, climate and ecosystem change. To see the full picture we need global data synthesis, more targeted data gathering, and models that let us explore questions about the natural and anthropogenic dynamics of atmospheric ON. Collectively, our research community already has a substantial amount of atmospheric ON data. Published reports extend back over a century and now have near-global coverage. However, datasets available from the literature are very piecemeal and too often lack crucially important information that would enable aggregation or re-use. I am initiating an open collaborative process to construct a community database, so we can begin to systematically synthesize these datasets (generally from individual studies at a local and temporally limited scale) to increase their scientific usability and statistical power for studies of global change and anthropogenic perturbation. In drawing together our disparate knowledge, we must address various challenges and concerns, not least about the comparability of analysis and sampling methodologies, and the known complexity of composition of ON. We

  11. An Online Database for Informing Ecological Network Models: http://kelpforest.ucsc.edu

    PubMed Central

    Beas-Luna, Rodrigo; Novak, Mark; Carr, Mark H.; Tinker, Martin T.; Black, August; Caselle, Jennifer E.; Hoban, Michael; Malone, Dan; Iles, Alison

    2014-01-01

    Ecological network models and analyses are recognized as valuable tools for understanding the dynamics and resiliency of ecosystems, and for informing ecosystem-based approaches to management. However, few databases exist that can provide the life history, demographic and species interaction information necessary to parameterize ecological network models. Faced with the difficulty of synthesizing the information required to construct models for kelp forest ecosystems along the West Coast of North America, we developed an online database (http://kelpforest.ucsc.edu/) to facilitate the collation and dissemination of such information. Many of the database's attributes are novel yet the structure is applicable and adaptable to other ecosystem modeling efforts. Information for each taxonomic unit includes stage-specific life history, demography, and body-size allometries. Species interactions include trophic, competitive, facilitative, and parasitic forms. Each data entry is temporally and spatially explicit. The online data entry interface allows researchers anywhere to contribute and access information. Quality control is facilitated by attributing each entry to unique contributor identities and source citations. The database has proven useful as an archive of species and ecosystem-specific information in the development of several ecological network models, for informing management actions, and for education purposes (e.g., undergraduate and graduate training). To facilitate adaptation of the database by other researchers for other ecosystems, the code and technical details on how to customize this database and apply it to other ecosystems are freely available and located at the following link (https://github.com/kelpforest-cameo/databaseui). PMID:25343723

  12. An online database for informing ecological network models: http://kelpforest.ucsc.edu

    USGS Publications Warehouse

    Beas-Luna, Rodrigo; Tinker, M. Tim; Novak, Mark; Carr, Mark H.; Black, August; Caselle, Jennifer E.; Hoban, Michael; Malone, Dan; Iles, Alison C.

    2014-01-01

    Ecological network models and analyses are recognized as valuable tools for understanding the dynamics and resiliency of ecosystems, and for informing ecosystem-based approaches to management. However, few databases exist that can provide the life history, demographic and species interaction information necessary to parameterize ecological network models. Faced with the difficulty of synthesizing the information required to construct models for kelp forest ecosystems along the West Coast of North America, we developed an online database (http://kelpforest.ucsc.edu/) to facilitate the collation and dissemination of such information. Many of the database's attributes are novel yet the structure is applicable and adaptable to other ecosystem modeling efforts. Information for each taxonomic unit includes stage-specific life history, demography, and body-size allometries. Species interactions include trophic, competitive, facilitative, and parasitic forms. Each data entry is temporally and spatially explicit. The online data entry interface allows researchers anywhere to contribute and access information. Quality control is facilitated by attributing each entry to unique contributor identities and source citations. The database has proven useful as an archive of species and ecosystem-specific information in the development of several ecological network models, for informing management actions, and for education purposes (e.g., undergraduate and graduate training). To facilitate adaptation of the database by other researchers for other ecosystems, the code and technical details on how to customize this database and apply it to other ecosystems are freely available and located at the following link (https://github.com/kelpforest-cameo/databaseui).

  13. A data model and database for high-resolution pathology analytical image informatics.

    PubMed

    Wang, Fusheng; Kong, Jun; Cooper, Lee; Pan, Tony; Kurc, Tahsin; Chen, Wenjin; Sharma, Ashish; Niedermayr, Cristobal; Oh, Tae W; Brat, Daniel; Farris, Alton B; Foran, David J; Saltz, Joel

    2011-01-01

    The systematic analysis of imaged pathology specimens often results in a vast amount of morphological information at both the cellular and sub-cellular scales. While microscopy scanners and computerized analysis are capable of capturing and analyzing data rapidly, microscopy image data remain underutilized in research and clinical settings. One major obstacle which tends to reduce wider adoption of these new technologies throughout the clinical and scientific communities is the challenge of managing, querying, and integrating the vast amounts of data resulting from the analysis of large digital pathology datasets. This paper presents a data model, which addresses these challenges, and demonstrates its implementation in a relational database system. This paper describes a data model, referred to as Pathology Analytic Imaging Standards (PAIS), and a database implementation, which are designed to support the data management and query requirements of detailed characterization of micro-anatomic morphology through many interrelated analysis pipelines on whole-slide images and tissue microarrays (TMAs). (1) Development of a data model capable of efficiently representing and storing virtual slide related image, annotation, markup, and feature information. (2) Development of a database, based on the data model, capable of supporting queries for data retrieval based on analysis and image metadata, queries for comparison of results from different analyses, and spatial queries on segmented regions, features, and classified objects. The work described in this paper is motivated by the challenges associated with characterization of micro-scale features for comparative and correlative analyses involving whole-slides tissue images and TMAs. Technologies for digitizing tissues have advanced significantly in the past decade. Slide scanners are capable of producing high-magnification, high-resolution images from whole slides and TMAs within several minutes. Hence, it is becoming

  14. VerSeDa: vertebrate secretome database.

    PubMed

    Cortazar, Ana R; Oguiza, José A; Aransay, Ana M; Lavín, José L

    2017-01-01

    Based on the current tools, de novo secretome (full set of proteins secreted by an organism) prediction is a time consuming bioinformatic task that requires a multifactorial analysis in order to obtain reliable in silico predictions. Hence, to accelerate this process and offer researchers a reliable repository where secretome information can be obtained for vertebrates and model organisms, we have developed VerSeDa (Vertebrate Secretome Database). This freely available database stores information about proteins that are predicted to be secreted through the classical and non-classical mechanisms, for the wide range of vertebrate species deposited at the NCBI, UCSC and ENSEMBL sites. To our knowledge, VerSeDa is the only state-of-the-art database designed to store secretome data from multiple vertebrate genomes, thus, saving an important amount of time spent in the prediction of protein features that can be retrieved from this repository directly. VerSeDa is freely available at http://genomics.cicbiogune.es/VerSeDa/index.php. © The Author(s) 2017. Published by Oxford University Press.

  15. The BioGRID interaction database: 2013 update.

    PubMed

    Chatr-Aryamontri, Andrew; Breitkreutz, Bobby-Joe; Heinicke, Sven; Boucher, Lorrie; Winter, Andrew; Stark, Chris; Nixon, Julie; Ramage, Lindsay; Kolas, Nadine; O'Donnell, Lara; Reguly, Teresa; Breitkreutz, Ashton; Sellam, Adnane; Chen, Daici; Chang, Christie; Rust, Jennifer; Livstone, Michael; Oughtred, Rose; Dolinski, Kara; Tyers, Mike

    2013-01-01

    The Biological General Repository for Interaction Datasets (BioGRID: http://thebiogrid.org) is an open access archive of genetic and protein interactions that are curated from the primary biomedical literature for all major model organism species. As of September 2012, BioGRID houses more than 500 000 manually annotated interactions from more than 30 model organisms. BioGRID maintains complete curation coverage of the literature for the budding yeast Saccharomyces cerevisiae, the fission yeast Schizosaccharomyces pombe and the model plant Arabidopsis thaliana. A number of themed curation projects in areas of biomedical importance are also supported. BioGRID has established collaborations and/or shares data records for the annotation of interactions and phenotypes with most major model organism databases, including Saccharomyces Genome Database, PomBase, WormBase, FlyBase and The Arabidopsis Information Resource. BioGRID also actively engages with the text-mining community to benchmark and deploy automated tools to expedite curation workflows. BioGRID data are freely accessible through both a user-defined interactive interface and in batch downloads in a wide variety of formats, including PSI-MI2.5 and tab-delimited files. BioGRID records can also be interrogated and analyzed with a series of new bioinformatics tools, which include a post-translational modification viewer, a graphical viewer, a REST service and a Cytoscape plugin.

  16. Feasibility and utility of applications of the common data model to multiple, disparate observational health databases

    PubMed Central

    Makadia, Rupa; Matcho, Amy; Ma, Qianli; Knoll, Chris; Schuemie, Martijn; DeFalco, Frank J; Londhe, Ajit; Zhu, Vivienne; Ryan, Patrick B

    2015-01-01

    Objectives: To evaluate the utility of applying the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) across multiple observational databases within an organization and to apply standardized analytics tools for conducting observational research. Materials and methods: Six deidentified patient-level datasets were transformed to the OMOP CDM. We evaluated the extent of information loss that occurred through the standardization process. We developed a standardized analytic tool to replicate the cohort construction process from a published epidemiology protocol and applied the analysis to all 6 databases to assess time-to-execution and comparability of results. Results: Transformation to the CDM resulted in minimal information loss across all 6 databases. Patients and observations excluded were due to identified data quality issues in the source system; 96% to 99% of condition records and 90% to 99% of drug records were successfully mapped into the CDM using the standard vocabulary. The full cohort replication and descriptive baseline summary was executed for 2 cohorts in 6 databases in less than 1 hour. Discussion: The standardization process improved data quality, increased efficiency, and facilitated cross-database comparisons to support a more systematic approach to observational research. Comparisons across data sources showed consistency in the impact of inclusion criteria, using the protocol and identified differences in patient characteristics and coding practices across databases. Conclusion: Standardizing data structure (through a CDM), content (through a standard vocabulary with source code mappings), and analytics can enable an institution to apply a network-based approach to observational research across multiple, disparate observational health databases. PMID:25670757

  17. Data model and relational database design for the New England Water-Use Data System (NEWUDS)

    USGS Publications Warehouse

    Tessler, Steven

    2001-01-01

    The New England Water-Use Data System (NEWUDS) is a database for the storage and retrieval of water-use data. NEWUDS can handle data covering many facets of water use, including (1) tracking various types of water-use activities (withdrawals, returns, transfers, distributions, consumptive-use, wastewater collection, and treatment); (2) the description, classification and location of places and organizations involved in water-use activities; (3) details about measured or estimated volumes of water associated with water-use activities; and (4) information about data sources and water resources associated with water use. In NEWUDS, each water transaction occurs unidirectionally between two site objects, and the sites and conveyances form a water network. The core entities in the NEWUDS model are site, conveyance, transaction/rate, location, and owner. Other important entities include water resources (used for withdrawals and returns), data sources, and aliases. Multiple water-exchange estimates can be stored for individual transactions based on different methods or data sources. Storage of user-defined details is accommodated for several of the main entities. Numerous tables containing classification terms facilitate detailed descriptions of data items and can be used for routine or custom data summarization. NEWUDS handles single-user and aggregate-user water-use data, can be used for large or small water-network projects, and is available as a stand-alone Microsoft Access database structure. Users can customize and extend the database, link it to other databases, or implement the design in other relational database applications.
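
    The network structure described in this abstract (sites joined by one-directional conveyances, with several volume estimates allowed per link) can be illustrated with a small sketch. This is not the NEWUDS schema: the class names, fields, the `downstream_sites` helper, and the unit are all hypothetical and chosen only to show the idea.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Site:
    site_id: str
    name: str


@dataclass
class Estimate:
    volume_mgd: float  # hypothetical unit: million gallons per day
    method: str        # e.g. "metered" or "estimated"


@dataclass
class Conveyance:
    """A one-directional link from one site to another."""
    from_site: Site
    to_site: Site
    # Several estimates may coexist for the same link.
    estimates: List[Estimate] = field(default_factory=list)


def downstream_sites(conveyances: List[Conveyance], start_id: str) -> Set[str]:
    """Return IDs of all sites reachable from start_id by following conveyances."""
    reached: Set[str] = set()
    frontier = [start_id]
    while frontier:
        current = frontier.pop()
        for c in conveyances:
            target = c.to_site.site_id
            if c.from_site.site_id == current and target not in reached:
                reached.add(target)
                frontier.append(target)
    return reached


well = Site("W1", "Well")
plant = Site("P1", "Treatment plant")
town = Site("T1", "Town distribution system")
network = [
    Conveyance(well, plant, [Estimate(1.2, "metered"), Estimate(1.0, "estimated")]),
    Conveyance(plant, town),
]
print(sorted(downstream_sites(network, "W1")))
```

    Keeping estimates in a list on the link, rather than a single volume field, mirrors the abstract's point that estimates from different methods or data sources can be stored side by side.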

  18. Hydroacoustic forcing function modeling using DNS database

    NASA Technical Reports Server (NTRS)

    Zawadzki, I.; Gershfield, J. L.; Na, Y.; Wang, M.

    1996-01-01

    A wall pressure frequency spectrum model (Blake 1971) has been evaluated using databases from Direct Numerical Simulations (DNS) of a turbulent boundary layer (Na & Moin 1996). Good agreement is found for moderate to strong adverse pressure gradient flows in the absence of separation. In the separated flow region, the model underpredicts the directly calculated spectra by an order of magnitude. The discrepancy is attributed to the violation of the model assumptions in that part of the flow domain. DNS computed coherence length scales and the normalized wall pressure cross-spectra are compared with experimental data. The DNS results are consistent with experimental observations.

  19. QSAR Modeling Using Large-Scale Databases: Case Study for HIV-1 Reverse Transcriptase Inhibitors.

    PubMed

    Tarasova, Olga A; Urusova, Aleksandra F; Filimonov, Dmitry A; Nicklaus, Marc C; Zakharov, Alexey V; Poroikov, Vladimir V

    2015-07-27

    Large-scale databases are important sources of training sets for various QSAR modeling approaches. Generally, these databases contain information extracted from different sources. This variety of sources can produce inconsistency in the data, defined as sometimes widely diverging activity results for the same compound against the same target. Because such inconsistency can reduce the accuracy of predictive models built from these data, we are addressing the question of how best to use data from publicly and commercially accessible databases to create accurate and predictive QSAR models. We investigate the suitability of commercially and publicly available databases to QSAR modeling of antiviral activity (HIV-1 reverse transcriptase (RT) inhibition). We present several methods for the creation of modeling (i.e., training and test) sets from two, either commercially or freely available, databases: Thomson Reuters Integrity and ChEMBL. We found that the typical predictivities of QSAR models obtained using these different modeling set compilation methods differ significantly from each other. The best results were obtained using training sets compiled for compounds tested using only one method and material (i.e., a specific type of biological assay). Compound sets aggregated by target only typically yielded poorly predictive models. We discuss the possibility of "mix-and-matching" assay data across aggregating databases such as ChEMBL and Integrity and their current severe limitations for this purpose. One of them is the general lack of complete and semantic/computer-parsable descriptions of assay methodology carried by these databases that would allow one to determine mix-and-matchability of result sets at the assay level.

  20. Metabolic network modeling with model organisms.

    PubMed

    Yilmaz, L Safak; Walhout, Albertha Jm

    2017-02-01

    Flux balance analysis (FBA) with genome-scale metabolic network models (GSMNM) allows systems level predictions of metabolism in a variety of organisms. Different types of predictions with different accuracy levels can be made depending on the applied experimental constraints ranging from measurement of exchange fluxes to the integration of gene expression data. Metabolic network modeling with model organisms has pioneered method development in this field. In addition, model organism GSMNMs are useful for basic understanding of metabolism, and in the case of animal models, for the study of metabolic human diseases. Here, we discuss GSMNMs of most highly used model organisms with the emphasis on recent reconstructions. Published by Elsevier Ltd.
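
    For readers unfamiliar with FBA, the optimization it refers to is commonly written as the linear program below. This is the standard textbook formulation, not a formula quoted from the paper. The symbols are introduced here for illustration: $S$ is the stoichiometric matrix, $v$ the vector of reaction fluxes, $c$ the objective weights (for example, selecting a biomass reaction), and $l$, $u$ the lower and upper flux bounds.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& l \le v \le u .
\end{aligned}
```

    In this notation, the experimental constraints the abstract mentions, such as measured exchange fluxes, would typically enter by tightening the bounds $l$ and $u$ on the corresponding reactions.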

  1. Metabolic network modeling with model organisms

    PubMed Central

    Yilmaz, L. Safak; Walhout, Albertha J.M.

    2017-01-01

    Flux balance analysis (FBA) with genome-scale metabolic network models (GSMNM) allows systems level predictions of metabolism in a variety of organisms. Different types of predictions with different accuracy levels can be made depending on the applied experimental constraints ranging from measurement of exchange fluxes to the integration of gene expression data. Metabolic network modeling with model organisms has pioneered method development in this field. In addition, model organism GSMNMs are useful for basic understanding of metabolism, and in the case of animal models, for the study of metabolic human diseases. Here, we discuss GSMNMs of most highly used model organisms with the emphasis on recent reconstructions. PMID:28088694

  2. A Support Database System for Integrated System Health Management (ISHM)

    NASA Technical Reports Server (NTRS)

    Schmalzel, John; Figueroa, Jorge F.; Turowski, Mark; Morris, John

    2007-01-01

    The development, deployment, operation and maintenance of Integrated Systems Health Management (ISHM) applications require the storage and processing of tremendous amounts of low-level data. This data must be shared in a secure and cost-effective manner between developers, and processed within several heterogeneous architectures. Modern database technology allows this data to be organized efficiently, while ensuring the integrity and security of the data. The extensibility and interoperability of the current database technologies also allows for the creation of an associated support database system. A support database system provides additional capabilities by building applications on top of the database structure. These applications can then be used to support the various technologies in an ISHM architecture. This presentation and paper propose a detailed structure and application description for a support database system, called the Health Assessment Database System (HADS). The HADS provides a shared context for organizing and distributing data as well as a definition of the applications that provide the required data-driven support to ISHM. This approach provides another powerful tool for ISHM developers, while also enabling novel functionality. This functionality includes: automated firmware updating and deployment, algorithm development assistance and electronic datasheet generation. The architecture for the HADS has been developed as part of the ISHM toolset at Stennis Space Center for rocket engine testing. A detailed implementation has begun for the Methane Thruster Testbed Project (MTTP) in order to assist in developing health assessment and anomaly detection algorithms for ISHM. The structure of this implementation is shown in Figure 1. The database structure consists of three primary components: the system hierarchy model, the historical data archive and the firmware codebase. The system hierarchy model replicates the physical relationships between

  3. The relational database model and multiple multicenter clinical trials.

    PubMed

    Blumenstein, B A

    1989-12-01

    The Southwest Oncology Group (SWOG) chose to use a relational database management system (RDBMS) for the management of data from multiple clinical trials because of the underlying relational model's inherent flexibility and the natural way multiple entity types (patients, studies, and participants) can be accommodated. The tradeoffs to using the relational model as compared to using the hierarchical model include added computing cycles due to deferred data linkages and added procedural complexity due to the necessity of implementing protections against referential integrity violations. The SWOG uses its RDBMS as a platform on which to build data operations software. This data operations software, which is written in a compiled computer language, allows multiple users to simultaneously update the database and is interactive with respect to the detection of conditions requiring action and the presentation of options for dealing with those conditions. The relational model facilitates the development and maintenance of data operations software.
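
    The referential integrity protection mentioned in this abstract can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The tables (`patient`, `study`, `enrollment`) and the function name are hypothetical and are not taken from the SWOG system, which predates this tooling; the sketch only shows a relational engine rejecting a row that points at a non-existent parent record.

```python
import sqlite3


def orphan_enrollment_rejected() -> bool:
    """Return True if the database refuses an enrollment for an unknown patient."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE patient (patient_id INTEGER PRIMARY KEY);
        CREATE TABLE study (study_id INTEGER PRIMARY KEY);
        CREATE TABLE enrollment (
            patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
            study_id INTEGER NOT NULL REFERENCES study(study_id),
            PRIMARY KEY (patient_id, study_id)
        );
        INSERT INTO patient VALUES (1);
        INSERT INTO study VALUES (100);
        INSERT INTO enrollment VALUES (1, 100);
        """
    )
    try:
        # Patient 2 does not exist, so this row would be an orphan.
        conn.execute("INSERT INTO enrollment VALUES (2, 100)")
    except sqlite3.IntegrityError:
        return True
    finally:
        conn.close()
    return False


print(orphan_enrollment_rejected())
```

    Here the check is declarative and done by the engine. The abstract describes the need to implement such protections as added procedural complexity, which suggests that in the system it describes this work fell to the application software.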

  4. GlobTherm, a global database on thermal tolerances for aquatic and terrestrial organisms.

    PubMed

    Bennett, Joanne M; Calosi, Piero; Clusella-Trullas, Susana; Martínez, Brezo; Sunday, Jennifer; Algar, Adam C; Araújo, Miguel B; Hawkins, Bradford A; Keith, Sally; Kühn, Ingolf; Rahbek, Carsten; Rodríguez, Laura; Singer, Alexander; Villalobos, Fabricio; Ángel Olalla-Tárraga, Miguel; Morales-Castilla, Ignacio

    2018-03-13

    How climate affects species distributions is a longstanding question receiving renewed interest owing to the need to predict the impacts of global warming on biodiversity. Is climate change forcing species to live near their critical thermal limits? Are these limits likely to change through natural selection? These and other important questions can be addressed with models relating geographical distributions of species with climate data, but inferences made with these models are highly contingent on non-climatic factors such as biotic interactions. Improved understanding of climate change effects on species will require extensive analysis of thermal physiological traits, but such data are both scarce and scattered. To overcome current limitations, we created the GlobTherm database. The database contains experimentally derived species' thermal tolerance data currently comprising over 2,000 species of terrestrial, freshwater, intertidal and marine multicellular algae, plants, fungi, and animals. The GlobTherm database will be maintained and curated by iDiv with the aim to keep expanding it, and enable further investigations on the effects of climate on the distribution of life on Earth.

  5. CyanoBase: the cyanobacteria genome database update 2010.

    PubMed

    Nakao, Mitsuteru; Okamoto, Shinobu; Kohara, Mitsuyo; Fujishiro, Tsunakazu; Fujisawa, Takatomo; Sato, Shusei; Tabata, Satoshi; Kaneko, Takakazu; Nakamura, Yasukazu

    2010-01-01

    CyanoBase (http://genome.kazusa.or.jp/cyanobase) is the genome database for cyanobacteria, which are model organisms for photosynthesis. The database houses cyanobacteria species information, complete genome sequences, genome-scale experiment data, gene information, gene annotations and mutant information. In this version, we updated these datasets and improved the navigation and the visual display of the data views. In addition, a web service API now enables users to retrieve the data in various formats with other tools, seamlessly.

  6. A novel model for estimating organic chemical bioconcentration in agricultural plants

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hung, H.; Mackay, D.; Di Guardo, A.

    1995-12-31

    There is increasing recognition that much human and wildlife exposure to organic contaminants can be traced through the food chain to bioconcentration in vegetation. For risk assessment, there is a need for an accurate model to predict organic chemical concentrations in plants. Existing models range from relatively simple correlations of concentrations using octanol-water or octanol-air partition coefficients, to complex models involving extensive physiological data. To satisfy the need for a relatively accurate model of intermediate complexity, a novel approach has been devised to predict organic chemical concentrations in agricultural plants as a function of soil and air concentrations, without the need for extensive plant physiological data. The plant is treated as three compartments, namely, leaves, roots and stems (including fruit and seeds). Data readily available from the literature, including chemical properties, volume, density and composition of each compartment; metabolic and growth rate of plant; and readily obtainable environmental conditions at the site are required as input. Results calculated from the model are compared with observed and experimentally-determined concentrations. It is suggested that the model, which includes a physiological database for agricultural plants, gives acceptably accurate predictions of chemical partitioning between plants, air and soil.

  7. Artificial intelligence techniques for modeling database user behavior

    NASA Technical Reports Server (NTRS)

    Tanner, Steve; Graves, Sara J.

    1990-01-01

    The design and development of the adaptive modeling system is described. This system models how a user accesses a relational database management system in order to improve its performance by discovering use access patterns. In the current system, these patterns are used to improve the user interface and may be used to speed data retrieval, support query optimization and support a more flexible data representation. The system models both syntactic and semantic information about the user's access and employs both procedural and rule-based logic to manipulate the model.

  8. Evolution of computational models in BioModels Database and the Physiome Model Repository.

    PubMed

    Scharm, Martin; Gebhardt, Tom; Touré, Vasundra; Bagnacani, Andrea; Salehzadeh-Yazdi, Ali; Wolkenhauer, Olaf; Waltemath, Dagmar

    2018-04-12

    A useful model is one that is being (re)used. The development of a successful model does not finish with its publication. During reuse, models are being modified, i.e. expanded, corrected, and refined. Even small changes in the encoding of a model can, however, significantly affect its interpretation. Our motivation for the present study is to identify changes in models and make them transparent and traceable. We analysed 13734 models from BioModels Database and the Physiome Model Repository. For each model, we studied the frequencies and types of updates between its first and latest release. To demonstrate the impact of changes, we explored the history of a Repressilator model in BioModels Database. We observed continuous updates in the majority of models. Surprisingly, even the early models are still being modified. We furthermore detected that many updates target annotations, which improves the information one can gain from models. To support the analysis of changes in model repositories we developed MoSt, an online tool for visualisations of changes in models. The scripts used to generate the data and figures for this study are available from GitHub https://github.com/binfalse/BiVeS-StatsGenerator and as a Docker image at https://hub.docker.com/r/binfalse/bives-statsgenerator/ . The website https://most.bio.informatik.uni-rostock.de/ provides interactive access to model versions and their evolutionary statistics. The reuse of models is still impeded by a lack of trust and documentation. A detailed and transparent documentation of all aspects of the model, including its provenance, will improve this situation. Knowledge about a model's provenance can avoid the repetition of mistakes that others already faced. More insights are gained into how the system evolves from initial findings to a profound understanding. We argue that it is the responsibility of the maintainers of model repositories to offer transparent model provenance to their users.

  9. Public Opinion Poll Question Databases: An Evaluation

    ERIC Educational Resources Information Center

    Woods, Stephen

    2007-01-01

    This paper evaluates five polling resources: iPOLL, Polling the Nations, Gallup Brain, Public Opinion Poll Question Database, and Polls and Surveys. Content was evaluated on disclosure standards from major polling organizations, scope on a model for public opinion polls, and presentation on a flow chart discussing search limitations and usability.

  10. Feasibility and utility of applications of the common data model to multiple, disparate observational health databases.

    PubMed

    Voss, Erica A; Makadia, Rupa; Matcho, Amy; Ma, Qianli; Knoll, Chris; Schuemie, Martijn; DeFalco, Frank J; Londhe, Ajit; Zhu, Vivienne; Ryan, Patrick B

    2015-05-01

    To evaluate the utility of applying the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) across multiple observational databases within an organization and to apply standardized analytics tools for conducting observational research. Six deidentified patient-level datasets were transformed to the OMOP CDM. We evaluated the extent of information loss that occurred through the standardization process. We developed a standardized analytic tool to replicate the cohort construction process from a published epidemiology protocol and applied the analysis to all 6 databases to assess time-to-execution and comparability of results. Transformation to the CDM resulted in minimal information loss across all 6 databases. Patients and observations excluded were due to identified data quality issues in the source system; 96% to 99% of condition records and 90% to 99% of drug records were successfully mapped into the CDM using the standard vocabulary. The full cohort replication and descriptive baseline summary was executed for 2 cohorts in 6 databases in less than 1 hour. The standardization process improved data quality, increased efficiency, and facilitated cross-database comparisons to support a more systematic approach to observational research. Comparisons across data sources showed consistency in the impact of inclusion criteria, using the protocol and identified differences in patient characteristics and coding practices across databases. Standardizing data structure (through a CDM), content (through a standard vocabulary with source code mappings), and analytics can enable an institution to apply a network-based approach to observational research across multiple, disparate observational health databases. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.

  11. CyanoBase: the cyanobacteria genome database update 2010

    PubMed Central

    Nakao, Mitsuteru; Okamoto, Shinobu; Kohara, Mitsuyo; Fujishiro, Tsunakazu; Fujisawa, Takatomo; Sato, Shusei; Tabata, Satoshi; Kaneko, Takakazu; Nakamura, Yasukazu

    2010-01-01

    CyanoBase (http://genome.kazusa.or.jp/cyanobase) is the genome database for cyanobacteria, which are model organisms for photosynthesis. The database houses cyanobacteria species information, complete genome sequences, genome-scale experiment data, gene information, gene annotations and mutant information. In this version, we updated these datasets and improved the navigation and the visual display of the data views. In addition, a web service API now enables users to retrieve the data in various formats with other tools, seamlessly. PMID:19880388

  12. Verification of road databases using multiple road models

    NASA Astrophysics Data System (ADS)

    Ziems, Marcel; Rottensteiner, Franz; Heipke, Christian

    2017-08-01

    In this paper a new approach for automatic road database verification based on remote sensing images is presented. In contrast to existing methods, the applicability of the new approach is not restricted to specific road types, context areas or geographic regions. This is achieved by combining several state-of-the-art road detection and road verification approaches that work well under different circumstances. Each one serves as an independent module representing a unique road model and a specific processing strategy. All modules provide independent solutions for the verification problem of each road object stored in the database in form of two probability distributions, the first one for the state of a database object (correct or incorrect), and a second one for the state of the underlying road model (applicable or not applicable). In accordance with the Dempster-Shafer Theory, both distributions are mapped to a new state space comprising the classes correct, incorrect and unknown. Statistical reasoning is applied to obtain the optimal state of a road object. A comparison with state-of-the-art road detection approaches using benchmark datasets shows that in general the proposed approach provides results with larger completeness. Additional experiments reveal that based on the proposed method a highly reliable semi-automatic approach for road data base verification can be designed.
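
    The mapping to the classes correct, incorrect and unknown, and the fusion of several modules, can be sketched as follows. The abstract does not give the actual mapping, so the one used here is an assumption: the probability that a module's road model is not applicable is moved to "unknown", and the remainder is split by the module's correct/incorrect probability. The function names are hypothetical. The combination step is Dempster's rule for a two-element frame.

```python
from typing import Dict

Masses = Dict[str, float]


def to_masses(p_correct: float, p_applicable: float) -> Masses:
    """Assumed mapping: a non-applicable road model contributes only ignorance."""
    return {
        "correct": p_applicable * p_correct,
        "incorrect": p_applicable * (1.0 - p_correct),
        "unknown": 1.0 - p_applicable,
    }


def combine(m1: Masses, m2: Masses) -> Masses:
    """Dempster's rule of combination on the frame {correct, incorrect}."""
    # Conflict: one module says "correct" while the other says "incorrect".
    conflict = m1["correct"] * m2["incorrect"] + m1["incorrect"] * m2["correct"]
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict and cannot be combined")
    norm = 1.0 - conflict
    fused = {}
    for state in ("correct", "incorrect"):
        fused[state] = (
            m1[state] * m2[state]
            + m1[state] * m2["unknown"]
            + m1["unknown"] * m2[state]
        ) / norm
    fused["unknown"] = m1["unknown"] * m2["unknown"] / norm
    return fused


module_a = to_masses(p_correct=0.9, p_applicable=0.8)
module_b = to_masses(p_correct=0.6, p_applicable=0.5)
fused = combine(module_a, module_b)
print(max(fused, key=fused.get), fused)
```

    A module whose model rarely applies adds mostly "unknown" mass and therefore has little influence on the fused result, which is one plausible reading of why the modules can be combined across different road types and regions.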

  13. Exploring Genetic, Genomic, and Phenotypic Data at the Rat Genome Database

    PubMed Central

    Laulederkind, Stanley J. F.; Hayman, G. Thomas; Wang, Shur-Jen; Lowry, Timothy F.; Nigam, Rajni; Petri, Victoria; Smith, Jennifer R.; Dwinell, Melinda R.; Jacob, Howard J.; Shimoyama, Mary

    2013-01-01

    The laboratory rat, Rattus norvegicus, is an important model of human health and disease, and experimental findings in the rat have relevance to human physiology and disease. The Rat Genome Database (RGD, http://rgd.mcw.edu) is a model organism database that provides access to a wide variety of curated rat data including disease associations, phenotypes, pathways, molecular functions, biological processes and cellular components for genes, quantitative trait loci, and strains. We present an overview of the database followed by specific examples that can be used to gain experience in employing RGD to explore the wealth of functional data available for the rat. PMID:23255149

  14. Teaching Database Modeling and Design: Areas of Confusion and Helpful Hints

    ERIC Educational Resources Information Center

    Philip, George C.

    2007-01-01

    This paper identifies several areas of database modeling and design that have been problematic for students and even are likely to confuse faculty. Major contributing factors are the lack of clarity and inaccuracies that persist in the presentation of some basic database concepts in textbooks. The paper analyzes the problems and discusses ways to…

  15. Private and Efficient Query Processing on Outsourced Genomic Databases.

    PubMed

    Ghasemi, Reza; Al Aziz, Md Momin; Mohammed, Noman; Dehkordi, Massoud Hadian; Jiang, Xiaoqian

    2017-09-01

    Applications of genomic studies are spreading rapidly in many domains of science and technology such as healthcare, biomedical research, direct-to-consumer services, and legal and forensic. However, there are a number of obstacles that make it hard to access and process a big genomic database for these applications. First, sequencing genomic sequence is a time consuming and expensive process. Second, it requires large-scale computation and storage systems to process genomic sequences. Third, genomic databases are often owned by different organizations, and thus, not available for public usage. Cloud computing paradigm can be leveraged to facilitate the creation and sharing of big genomic databases for these applications. Genomic data owners can outsource their databases in a centralized cloud server to ease the access of their databases. However, data owners are reluctant to adopt this model, as it requires outsourcing the data to an untrusted cloud service provider that may cause data breaches. In this paper, we propose a privacy-preserving model for outsourcing genomic data to a cloud. The proposed model enables query processing while providing privacy protection of genomic databases. Privacy of the individuals is guaranteed by permuting and adding fake genomic records in the database. These techniques allow cloud to evaluate count and top-k queries securely and efficiently. Experimental results demonstrate that a count and a top-k query over 40 Single Nucleotide Polymorphisms (SNPs) in a database of 20 000 records takes around 100 and 150 s, respectively.

  16. Private and Efficient Query Processing on Outsourced Genomic Databases

    PubMed Central

    Ghasemi, Reza; Al Aziz, Momin; Mohammed, Noman; Dehkordi, Massoud Hadian; Jiang, Xiaoqian

    2017-01-01

    Applications of genomic studies are spreading rapidly in many domains of science and technology such as healthcare, biomedical research, direct-to-consumer services, and legal and forensic. However, there are a number of obstacles that make it hard to access and process a big genomic database for these applications. First, sequencing genomic sequence is a time-consuming and expensive process. Second, it requires large-scale computation and storage systems to process genomic sequences. Third, genomic databases are often owned by different organizations and thus not available for public usage. Cloud computing paradigm can be leveraged to facilitate the creation and sharing of big genomic databases for these applications. Genomic data owners can outsource their databases in a centralized cloud server to ease the access of their databases. However, data owners are reluctant to adopt this model, as it requires outsourcing the data to an untrusted cloud service provider that may cause data breaches. In this paper, we propose a privacy-preserving model for outsourcing genomic data to a cloud. The proposed model enables query processing while providing privacy protection of genomic databases. Privacy of the individuals is guaranteed by permuting and adding fake genomic records in the database. These techniques allow cloud to evaluate count and top-k queries securely and efficiently. Experimental results demonstrate that a count and a top-k query over 40 SNPs in a database of 20,000 records takes around 100 and 150 seconds, respectively. PMID:27834660

  17. Hydrologic Derivatives for Modeling and Analysis—A new global high-resolution database

    USGS Publications Warehouse

    Verdin, Kristine L.

    2017-07-17

    The U.S. Geological Survey has developed a new global high-resolution hydrologic derivative database. Loosely modeled on the HYDRO1k database, this new database, entitled Hydrologic Derivatives for Modeling and Analysis, provides comprehensive and consistent global coverage of topographically derived raster layers (digital elevation model data, flow direction, flow accumulation, slope, and compound topographic index) and vector layers (streams and catchment boundaries). The coverage of the data is global, and the underlying digital elevation model is a hybrid of three datasets: HydroSHEDS (Hydrological data and maps based on SHuttle Elevation Derivatives at multiple Scales), GMTED2010 (Global Multi-resolution Terrain Elevation Data 2010), and the SRTM (Shuttle Radar Topography Mission). For most of the globe south of 60°N., the raster resolution of the data is 3 arc-seconds, corresponding to the resolution of the SRTM. For the areas north of 60°N., the resolution is 7.5 arc-seconds (the highest resolution of the GMTED2010 dataset) except for Greenland, where the resolution is 30 arc-seconds. The streams and catchments are attributed with Pfafstetter codes, based on a hierarchical numbering system, that carry important topological information. This database is appropriate for use in continental-scale modeling efforts. The work described in this report was conducted by the U.S. Geological Survey in cooperation with the National Aeronautics and Space Administration Goddard Space Flight Center.
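
    As a rough aid for reading the resolutions quoted in this abstract, the sketch below converts an angular cell size into an approximate ground distance. It assumes a spherical Earth with a mean radius of 6,371 km, which is a simplification not taken from the report, and the function name `arcsec_to_metres` is hypothetical.

```python
import math

# Assumption: spherical Earth with the commonly quoted mean radius.
EARTH_RADIUS_M = 6_371_000.0


def arcsec_to_metres(arcsec, latitude_deg=0.0):
    """Return the approximate (north-south, east-west) size in metres
    of a raster cell spanning `arcsec` arc-seconds at the given latitude."""
    angle_rad = math.radians(arcsec / 3600.0)
    north_south = EARTH_RADIUS_M * angle_rad
    # East-west extent shrinks with the cosine of latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


# Resolutions named in the abstract: 3, 7.5 and 30 arc-seconds.
for resolution in (3, 7.5, 30):
    ns, ew = arcsec_to_metres(resolution)
    print(f"{resolution} arc-seconds is about {ns:.0f} m at the equator")
```

    Under these assumptions a 3 arc-second cell is on the order of 90 m across at the equator, and its east-west extent halves at 60° latitude.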

  18. Clinical Prediction Models for Cardiovascular Disease: Tufts Predictive Analytics and Comparative Effectiveness Clinical Prediction Model Database.

    PubMed

    Wessler, Benjamin S; Lai Yh, Lana; Kramer, Whitney; Cangelosi, Michael; Raman, Gowri; Lutz, Jennifer S; Kent, David M

    2015-07-01

    Clinical prediction models (CPMs) estimate the probability of clinical outcomes and hold the potential to improve decision making and individualize care. For patients with cardiovascular disease, there are numerous CPMs available although the extent of this literature is not well described. We conducted a systematic review for articles containing CPMs for cardiovascular disease published between January 1990 and May 2012. Cardiovascular disease includes coronary heart disease, heart failure, arrhythmias, stroke, venous thromboembolism, and peripheral vascular disease. We created a novel database and characterized CPMs based on the stage of development, population under study, performance, covariates, and predicted outcomes. There are 796 models included in this database. The number of CPMs published each year is increasing steadily over time. Seven hundred seventeen (90%) are de novo CPMs, 21 (3%) are CPM recalibrations, and 58 (7%) are CPM adaptations. This database contains CPMs for 31 index conditions, including 215 CPMs for patients with coronary artery disease, 168 CPMs for population samples, and 79 models for patients with heart failure. There are 77 distinct index/outcome pairings. Of the de novo models in this database, 450 (63%) report a c-statistic and 259 (36%) report some information on calibration. There is an abundance of CPMs available for a wide assortment of cardiovascular disease conditions, with substantial redundancy in the literature. The comparative performance of these models, the consistency of effects and risk estimates across models and the actual and potential clinical impact of this body of literature is poorly understood. © 2015 American Heart Association, Inc.
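
    The counts and percentages reported in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the variable names are illustrative, and the denominator for the c-statistic and calibration shares is taken to be the de novo models, as the abstract states.

```python
# Figures quoted in the abstract.
total_models = 796
by_type = {"de novo": 717, "recalibration": 21, "adaptation": 58}

# The three categories should account for every model in the database.
assert sum(by_type.values()) == total_models


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to an integer."""
    return round(100 * part / whole)


shares = {name: pct(count, total_models) for name, count in by_type.items()}

# Reporting rates among de novo models only.
de_novo = by_type["de novo"]
c_statistic_share = pct(450, de_novo)
calibration_share = pct(259, de_novo)

print(shares, c_statistic_share, calibration_share)
```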

  19. Use of model organism and disease databases to support matchmaking for human disease gene discovery.

    PubMed

    Mungall, Christopher J; Washington, Nicole L; Nguyen-Xuan, Jeremy; Condit, Christopher; Smedley, Damian; Köhler, Sebastian; Groza, Tudor; Shefchek, Kent; Hochheiser, Harry; Robinson, Peter N; Lewis, Suzanna E; Haendel, Melissa A

    2015-10-01

    The Matchmaker Exchange application programming interface (API) allows searching a patient's genotypic or phenotypic profiles across clinical sites, for the purposes of cohort discovery and variant disease causal validation. This API can be used not only to search for matching patients, but also to match against public disease and model organism data. These public disease data enable matching known diseases and variant-phenotype associations using phenotype semantic similarity algorithms developed by the Monarch Initiative. The model data can provide additional evidence to aid diagnosis, suggest relevant models for disease mechanism and treatment exploration, and identify collaborators across the translational divide. The Monarch Initiative provides an implementation of this API for searching multiple integrated sources of data that contextualize the knowledge about any given patient or patient family into the greater biomedical knowledge landscape. While this corpus of data can aid diagnosis, it is also the beginning of research to improve understanding of rare human diseases. © 2015 WILEY PERIODICALS, INC.

  20. The future application of GML database in GIS

    NASA Astrophysics Data System (ADS)

    Deng, Yuejin; Cheng, Yushu; Jing, Lianwen

    2006-10-01

    In 2004, the Geography Markup Language (GML) Implementation Specification (version 3.1.1) was published by Open Geospatial Consortium, Inc. Now more and more applications in geospatial data sharing and interoperability depend on GML. The primary purpose of designing GML is the exchange and transportation of geo-information by standard modeling and encoding of geographic phenomena. However, the problem of how to organize and access large amounts of GML data effectively arises in applications. Research on GML databases focuses on this problem. The effective storage of GML data is a hot topic in GIS communities today. A GML Database Management System (GDBMS) mainly deals with the storage and management of GML data. XML databases are currently classified into two types: Native XML Databases and XML-Enabled Databases. Since GML is an application of the XML standard to geographic data, XML database systems can also be used for the management of GML. In this paper, we review the state of the art of XML databases, including storage, indexing and query languages, management systems and so on, then move on to the GML database. At the end, the future prospect of GML databases in GIS applications is presented.
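
    Because GML is an XML application, generic XML tooling can read it. The sketch below is a minimal illustration using Python's standard `xml.etree.ElementTree`; the sample point, its coordinates, and the helper name `read_point` are made up for the example and do not come from the paper.

```python
import xml.etree.ElementTree as ET

# Namespace used by GML 3.1.x documents.
GML_NS = {"gml": "http://www.opengis.net/gml"}

# Illustrative fragment: a single point with a coordinate reference system.
SAMPLE = """<gml:Point xmlns:gml="http://www.opengis.net/gml" srsName="EPSG:4326">
  <gml:pos>30.54 114.36</gml:pos>
</gml:Point>"""


def read_point(gml_text):
    """Parse a gml:Point and return (srsName, coordinate tuple)."""
    root = ET.fromstring(gml_text)
    pos = root.find("gml:pos", GML_NS)  # direct child of the Point element
    coords = tuple(float(value) for value in pos.text.split())
    return root.get("srsName"), coords


print(read_point(SAMPLE))
```

    A dedicated GML store would add spatial indexing and query support on top of this kind of parsing, which is the gap the paper discusses.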

  1. An object model and database for functional genomics.

    PubMed

    Jones, Andrew; Hunt, Ela; Wastling, Jonathan M; Pizarro, Angel; Stoeckert, Christian J

    2004-07-10

    Large-scale functional genomics analysis is now feasible and presents significant challenges in data analysis, storage and querying. Data standards are required to enable the development of public data repositories and to improve data sharing. There is an established data format for microarrays (microarray gene expression markup language, MAGE-ML) and a draft standard for proteomics (PEDRo). We believe that all types of functional genomics experiments should be annotated in a consistent manner, and we hope to open up new ways of comparing multiple datasets used in functional genomics. We have created a functional genomics experiment object model (FGE-OM), developed from the microarray model, MAGE-OM and two models for proteomics, PEDRo and our own model (Gla-PSI-Glasgow Proposal for the Proteomics Standards Initiative). FGE-OM comprises three namespaces representing (i) the parts of the model common to all functional genomics experiments; (ii) microarray-specific components; and (iii) proteomics-specific components. We believe that FGE-OM should initiate discussion about the contents and structure of the next version of MAGE and the future of proteomics standards. A prototype database called RNA And Protein Abundance Database (RAPAD), based on FGE-OM, has been implemented and populated with data from microbial pathogenesis. FGE-OM and the RAPAD schema are available from http://www.gusdb.org/fge.html, along with a set of more detailed diagrams. RAPAD can be accessed by registration at the site.

  2. S-World: A high resolution global soil database for simulation modelling (Invited)

    NASA Astrophysics Data System (ADS)

    Stoorvogel, J. J.

    2013-12-01

    There is an increasing call for high resolution soil information at the global level. A good example for such a call is the Global Gridded Crop Model Intercomparison carried out within AgMIP. While local studies can make use of surveying techniques to collect additional techniques this is practically impossible at the global level. It is therefore important to rely on legacy data like the Harmonized World Soil Database. Several efforts do exist that aim at the development of global gridded soil property databases. These estimates of the variation of soil properties can be used to assess e.g., global soil carbon stocks. However, they do not allow for simulation runs with e.g., crop growth simulation models as these models require a description of the entire pedon rather than a few soil properties. This study provides the required quantitative description of pedons at a 1 km resolution for simulation modelling. It uses the Harmonized World Soil Database (HWSD) for the spatial distribution of soil types, the ISRIC-WISE soil profile database to derive information on soil properties per soil type, and a range of co-variables on topography, climate, and land cover to further disaggregate the available data. The methodology aims to take stock of these available data. The soil database is developed in five main steps. Step 1: All 148 soil types are ordered on the basis of their expected topographic position using e.g., drainage, salinization, and pedogenesis. Using the topographic ordering and combining the HWSD with a digital elevation model allows for the spatial disaggregation of the composite soil units. This results in a new soil map with homogeneous soil units. Step 2: The ranges of major soil properties for the topsoil and subsoil of each of the 148 soil types are derived from the ISRIC-WISE soil profile database. Step 3: A model of soil formation is developed that focuses on the basic conceptual question where we are within the range of a particular soil property

  3. Integrated Functional and Executional Modelling of Software Using Web-Based Databases

    NASA Technical Reports Server (NTRS)

    Kulkarni, Deepak; Marietta, Roberta

    1998-01-01

    NASA's software subsystems undergo extensive modification and updates over their operational lifetimes. It is imperative that modified software satisfy safety goals. This report discusses the difficulties encountered in doing so and presents a solution based on integrated modelling of software, use of automatic information extraction tools, web technology and databases. To appear in an article of Journal of Database Management.

  4. Geroprotectors.org: a new, structured and curated database of current therapeutic interventions in aging and age-related disease.

    PubMed

    Moskalev, Alexey; Chernyagina, Elizaveta; de Magalhães, João Pedro; Barardo, Diogo; Thoppil, Harikrishnan; Shaposhnikov, Mikhail; Budovsky, Arie; Fraifeld, Vadim E; Garazha, Andrew; Tsvetkov, Vasily; Bronovitsky, Evgeny; Bogomolov, Vladislav; Scerbacov, Alexei; Kuryan, Oleg; Gurinovich, Roman; Jellen, Leslie C; Kennedy, Brian; Mamoshina, Polina; Dobrovolskaya, Evgeniya; Aliper, Alex; Kaminsky, Dmitry; Zhavoronkov, Alex

    2015-09-01

    As the level of interest in aging research increases, there is a growing number of geroprotectors, or therapeutic interventions that aim to extend the healthy lifespan and repair or reduce aging-related damage in model organisms and, eventually, in humans. There is a clear need for a manually-curated database of geroprotectors to compile and index their effects on aging and age-related diseases and link these effects to relevant studies and multiple biochemical and drug databases. Here, we introduce the first such resource, Geroprotectors (http://geroprotectors.org). Geroprotectors is a public, rapidly explorable database that catalogs over 250 experiments involving over 200 known or candidate geroprotectors that extend lifespan in model organisms. Each compound has a comprehensive profile complete with biochemistry, mechanisms, and lifespan effects in various model organisms, along with information ranging from chemical structure, side effects, and toxicity to FDA drug status. These are presented in a visually intuitive, efficient framework fit for casual browsing or in-depth research alike. Data are linked to the source studies or databases, providing quick and convenient access to original data. The Geroprotectors database facilitates cross-study, cross-organism, and cross-discipline analysis and saves countless hours of inefficient literature and web searching. Geroprotectors is a one-stop, knowledge-sharing, time-saving resource for researchers seeking healthy aging solutions.

  5. Geroprotectors.org: a new, structured and curated database of current therapeutic interventions in aging and age-related disease

    PubMed Central

    Moskalev, Alexey; Chernyagina, Elizaveta; de Magalhães, João Pedro; Barardo, Diogo; Thoppil, Harikrishnan; Shaposhnikov, Mikhail; Budovsky, Arie; Fraifeld, Vadim E.; Garazha, Andrew; Tsvetkov, Vasily; Bronovitsky, Evgeny; Bogomolov, Vladislav; Scerbacov, Alexei; Kuryan, Oleg; Gurinovich, Roman; Jellen, Leslie C.; Kennedy, Brian; Mamoshina, Polina; Dobrovolskaya, Evgeniya; Aliper, Alex; Kaminsky, Dmitry; Zhavoronkov, Alex

    2015-01-01

    As the level of interest in aging research increases, there is a growing number of geroprotectors, or therapeutic interventions that aim to extend the healthy lifespan and repair or reduce aging-related damage in model organisms and, eventually, in humans. There is a clear need for a manually-curated database of geroprotectors to compile and index their effects on aging and age-related diseases and link these effects to relevant studies and multiple biochemical and drug databases. Here, we introduce the first such resource, Geroprotectors (http://geroprotectors.org). Geroprotectors is a public, rapidly explorable database that catalogs over 250 experiments involving over 200 known or candidate geroprotectors that extend lifespan in model organisms. Each compound has a comprehensive profile complete with biochemistry, mechanisms, and lifespan effects in various model organisms, along with information ranging from chemical structure, side effects, and toxicity to FDA drug status. These are presented in a visually intuitive, efficient framework fit for casual browsing or in-depth research alike. Data are linked to the source studies or databases, providing quick and convenient access to original data. The Geroprotectors database facilitates cross-study, cross-organism, and cross-discipline analysis and saves countless hours of inefficient literature and web searching. Geroprotectors is a one-stop, knowledge-sharing, time-saving resource for researchers seeking healthy aging solutions. PMID:26342919

  6. Accelerating Information Retrieval from Profile Hidden Markov Model Databases.

    PubMed

    Tamimi, Ahmad; Ashhab, Yaqoub; Tamimi, Hashem

    2016-01-01

    Profile Hidden Markov Model (Profile-HMM) is an efficient statistical approach to represent protein families. Currently, several databases maintain valuable protein sequence information as profile-HMMs. There is an increasing interest to improve the efficiency of searching Profile-HMM databases to detect sequence-profile or profile-profile homology. However, most efforts to enhance searching efficiency have been focusing on improving the alignment algorithms. Although the performance of these algorithms is fairly acceptable, the growing size of these databases, as well as the increasing demand for using batch query searching approach, are strong motivations that call for further enhancement of information retrieval from profile-HMM databases. This work presents a heuristic method to accelerate the current profile-HMM homology searching approaches. The method works by cluster-based remodeling of the database to reduce the search space, rather than focusing on the alignment algorithms. Using different clustering techniques, 4284 TIGRFAMs profiles were clustered based on their similarities. A representative for each cluster was assigned. To enhance sensitivity, we proposed an extended step that allows overlapping among clusters. A validation benchmark of 6000 randomly selected protein sequences was used to query the clustered profiles. To evaluate the efficiency of our approach, speed and recall values were measured and compared with the sequential search approach. Using hierarchical, k-means, and connected component clustering techniques followed by the extended overlapping step, we obtained an average reduction in time of 41%, and an average recall of 96%. Our results demonstrate that representation of profile-HMMs using a clustering-based approach can significantly accelerate data retrieval from profile-HMM databases.
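
    The cluster-then-search idea described in this abstract can be sketched generically. The example below is not the authors' method: it stands in short numeric tuples for profile-HMMs and a negative squared distance for the alignment score, and every name in it (`similarity`, `clustered_search`, the cluster labels) is hypothetical. It only shows how scoring one representative per cluster first reduces the number of full comparisons.

```python
def similarity(a, b):
    """Toy score: negative squared Euclidean distance (higher is more similar)."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def clustered_search(query, clusters, top_clusters=1):
    """Score cluster representatives first, then only the members of the
    best-scoring clusters. Returns (best member name, comparisons made)."""
    ranked = sorted(
        clusters,
        key=lambda name: similarity(query, clusters[name]["representative"]),
        reverse=True,
    )
    comparisons = len(clusters)  # one comparison per representative
    best_name, best_score = None, float("-inf")
    for cluster_name in ranked[:top_clusters]:
        for member, profile in clusters[cluster_name]["members"].items():
            comparisons += 1
            score = similarity(query, profile)
            if score > best_score:
                best_name, best_score = member, score
    return best_name, comparisons


# Illustrative data: two clusters holding five "profiles" in total.
clusters = {
    "A": {"representative": (0.0, 0.0),
          "members": {"a1": (0.0, 0.0), "a2": (1.0, 0.0), "a3": (0.0, 1.0)}},
    "B": {"representative": (10.0, 10.0),
          "members": {"b1": (10.0, 10.0), "b2": (9.0, 11.0)}},
}

result = clustered_search((9.0, 10.5), clusters)
print(result)
```

    Raising `top_clusters` searches more clusters per query, trading speed for recall in a way loosely analogous to the overlapping step the authors describe.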

  7. CARD 2017: expansion and model-centric curation of the comprehensive antibiotic resistance database

    PubMed Central

    Jia, Baofeng; Raphenya, Amogelang R.; Alcock, Brian; Waglechner, Nicholas; Guo, Peiyao; Tsang, Kara K.; Lago, Briony A.; Dave, Biren M.; Pereira, Sheldon; Sharma, Arjun N.; Doshi, Sachin; Courtot, Mélanie; Lo, Raymond; Williams, Laura E.; Frye, Jonathan G.; Elsayegh, Tariq; Sardar, Daim; Westman, Erin L.; Pawlowski, Andrew C.; Johnson, Timothy A.; Brinkman, Fiona S.L.; Wright, Gerard D.; McArthur, Andrew G.

    2017-01-01

    The Comprehensive Antibiotic Resistance Database (CARD; http://arpcard.mcmaster.ca) is a manually curated resource containing high quality reference data on the molecular basis of antimicrobial resistance (AMR), with an emphasis on the genes, proteins and mutations involved in AMR. CARD is ontologically structured, model centric, and spans the breadth of AMR drug classes and resistance mechanisms, including intrinsic, mutation-driven and acquired resistance. It is built upon the Antibiotic Resistance Ontology (ARO), a custom built, interconnected and hierarchical controlled vocabulary allowing advanced data sharing and organization. Its design allows the development of novel genome analysis tools, such as the Resistance Gene Identifier (RGI) for resistome prediction from raw genome sequence. Recent improvements include extensive curation of additional reference sequences and mutations, development of a unique Model Ontology and accompanying AMR detection models to power sequence analysis, new visualization tools, and expansion of the RGI for detection of emergent AMR threats. CARD curation is updated monthly based on an interplay of manual literature curation, computational text mining, and genome analysis. PMID:27789705

  8. Environmental Education Organizations and Programs in Texas: Identifying Patterns through a Database and Survey Approach for Establishing Frameworks for Assessment and Progress

    ERIC Educational Resources Information Center

    Lloyd-Strovas, Jenny D.; Arsuffi, Thomas L.

    2016-01-01

    We examined the diversity of environmental education (EE) in Texas, USA, by developing a framework to assess EE organizations and programs at a large scale: the Environmental Education Database of Organizations and Programs (EEDOP). This framework consisted of the following characteristics: organization/visitor demographics, pedagogy/curriculum,…

  9. Developing a database for pedestrians' earthquake emergency evacuation in indoor scenarios.

    PubMed

    Zhou, Junxue; Li, Sha; Nie, Gaozhong; Fan, Xiwei; Tan, Jinxian; Li, Huayue; Pang, Xiaoke

    2018-01-01

    With the booming development of evacuation simulation software, developing an extensive database in indoor scenarios for evacuation models is imperative. In this paper, we conduct a qualitative and quantitative analysis of the collected videotapes and aim to provide a complete and unitary database of pedestrians' earthquake emergency response behaviors in indoor scenarios, including human-environment interactions. Using the qualitative analysis method, we extract keyword groups and keywords that code the response modes of pedestrians and construct a general decision flowchart using chronological organization. Using the quantitative analysis method, we analyze data on the delay time, evacuation speed, evacuation route and emergency exit choices. Furthermore, we study the effect of classroom layout on emergency evacuation. The database for indoor scenarios provides reliable input parameters and allows the construction of real and effective constraints for use in software and mathematical models. The database can also be used to validate the accuracy of evacuation models.

  10. Human Ageing Genomic Resources: new and updated databases

    PubMed Central

    Tacutu, Robi; Thornton, Daniel; Johnson, Emily; Budovsky, Arie; Barardo, Diogo; Craig, Thomas; Diana, Eugene; Lehmann, Gilad; Toren, Dmitri; Wang, Jingwei; Fraifeld, Vadim E

    2018-01-01

    In spite of a growing body of research and data, human ageing remains a poorly understood process. Over 10 years ago we developed the Human Ageing Genomic Resources (HAGR), a collection of databases and tools for studying the biology and genetics of ageing. Here, we present HAGR’s main functionalities, highlighting new additions and improvements. HAGR consists of six core databases: (i) the GenAge database of ageing-related genes, in turn composed of a dataset of >300 human ageing-related genes and a dataset with >2000 genes associated with ageing or longevity in model organisms; (ii) the AnAge database of animal ageing and longevity, featuring >4000 species; (iii) the GenDR database with >200 genes associated with the life-extending effects of dietary restriction; (iv) the LongevityMap database of human genetic association studies of longevity with >500 entries; (v) the DrugAge database with >400 ageing or longevity-associated drugs or compounds; (vi) the CellAge database with >200 genes associated with cell senescence. All our databases are manually curated by experts and regularly updated to ensure high-quality data. Cross-links across our databases and to external resources help researchers locate and integrate relevant information. HAGR is freely available online (http://genomics.senescence.info/). PMID:29121237

  11. Leaf respiration (GlobResp) - global trait database supports Earth System Models

    DOE PAGES

    Wullschleger, Stan D.; Warren, Jeffrey; Thornton, Peter E.

    2015-03-20

    Here we detail how Atkin and his colleagues compiled a global database (GlobResp) that details rates of leaf dark respiration and associated traits from sites that span Arctic tundra to tropical forests. This compilation builds upon earlier research (Reich et al., 1998; Wright et al., 2006) and was supplemented by recent field campaigns and unpublished data. In keeping with other trait databases, GlobResp provides insights on how physiological traits, especially rates of dark respiration, vary as a function of environment and how that variation can be used to inform terrestrial biosphere models and land surface components of Earth System Models. Although an important component of plant and ecosystem carbon (C) budgets (Wythers et al., 2013), respiration has only limited representation in models. Seen through the eyes of a plant scientist, Atkin et al. (2015) give readers a unique perspective on the climatic controls on respiration, thermal acclimation and evolutionary adaptation of dark respiration, and insights into the covariation of respiration with other leaf traits. We find there is ample evidence that once large databases are compiled, like GlobResp, they can reveal new knowledge of plant function and provide a valuable resource for hypothesis testing and model development.

  12. A genome-scale metabolic flux model of Escherichia coli K–12 derived from the EcoCyc database

    PubMed Central

    2014-01-01

    advantages can be derived from the combination of model organism databases and flux balance modeling represented by MetaFlux. Interpretation of the EcoCyc database as a flux balance model results in a highly accurate metabolic model and provides a rigorous consistency check for information stored in the database. PMID:24974895

  13. The BioImage Database Project: organizing multidimensional biological images in an object-relational database.

    PubMed

    Carazo, J M; Stelzer, E H

    1999-01-01

    The BioImage Database Project collects and structures multidimensional data sets recorded by various microscopic techniques relevant to modern life sciences. It provides, as precisely as possible, the circumstances in which the sample was prepared and the data were recorded. It grants access to the actual data and maintains links between related data sets. In order to promote the interdisciplinary approach of modern science, it offers a large set of key words, which covers essentially all aspects of microscopy. Nonspecialists can, therefore, access and retrieve significant information recorded and submitted by specialists in other areas. A key issue of the undertaking is to exploit the available technology and to provide a well-defined yet flexible structure for dealing with data. Its pivotal element is, therefore, a modern object relational database that structures the metadata and ameliorates the provision of a complete service. The BioImage database can be accessed through the Internet. Copyright 1999 Academic Press.

  14. Mycobacteriophage genome database.

    PubMed

    Joseph, Jerrine; Rajendran, Vasanthi; Hassan, Sameer; Kumar, Vanaja

    2011-01-01

    Mycobacteriophage genome database (MGDB) is an exclusive repository of the 64 completely sequenced mycobacteriophages with annotated information. It is a comprehensive compilation of the various gene parameters captured from several databases pooled together to empower mycobacteriophage researchers. The MGDB (Version No.1.0) comprises 6086 genes from 64 mycobacteriophages classified into 72 families based on the ACLAME database. Manual curation was aided by information available from public databases, which was enriched further by analysis. Its web interface allows browsing as well as querying the classification. The main objective is to collect and organize the complexity inherent to mycobacteriophage protein classification in a rational way. The other objective is to browse the existing and new genomes and describe their functional annotation. The database is available for free at http://mpgdb.ibioinformatics.org/mpgdb.php.

  15. Constructing distributed Hippocratic video databases for privacy-preserving online patient training and counseling.

    PubMed

    Peng, Jinye; Babaguchi, Noboru; Luo, Hangzai; Gao, Yuli; Fan, Jianping

    2010-07-01

    Digital video now plays an important role in supporting more profitable online patient training and counseling, and integration of patient training videos from multiple competitive organizations in the health care network will result in better offerings for patients. However, privacy concerns often prevent multiple competitive organizations from sharing and integrating their patient training videos. In addition, patients with infectious or chronic diseases may not want the online patient training organizations to identify who they are or even which video clips they are interested in. Thus, there is an urgent need to develop more effective techniques to protect both video content privacy and access privacy. In this paper, we have developed a new approach to construct a distributed Hippocratic video database system for supporting more profitable online patient training and counseling. First, a new database modeling approach is developed to support concept-oriented video database organization and assign a degree of privacy of the video content for each database level automatically. Second, a new algorithm is developed to protect the video content privacy at the level of individual video clip by filtering out the privacy-sensitive human objects automatically. In order to integrate the patient training videos from multiple competitive organizations for constructing a centralized video database indexing structure, a privacy-preserving video sharing scheme is developed to support privacy-preserving distributed classifier training and prevent the statistical inferences from the videos that are shared for cross-validation of video classifiers. Our experiments on large-scale video databases have also provided very convincing results.

  16. The Chinchilla Research Resource Database: resource for an otolaryngology disease model

    PubMed Central

    Shimoyama, Mary; Smith, Jennifer R.; De Pons, Jeff; Tutaj, Marek; Khampang, Pawjai; Hong, Wenzhou; Erbe, Christy B.; Ehrlich, Garth D.; Bakaletz, Lauren O.; Kerschner, Joseph E.

    2016-01-01

    The long-tailed chinchilla (Chinchilla lanigera) is an established animal model for diseases of the inner and middle ear, among others. In particular, chinchilla is commonly used to study diseases involving viral and bacterial pathogens and polymicrobial infections of the upper respiratory tract and the ear, such as otitis media. The value of the chinchilla as a model for human diseases prompted the sequencing of its genome in 2012 and the more recent development of the Chinchilla Research Resource Database (http://crrd.mcw.edu) to provide investigators with easy access to relevant datasets and software tools to enhance their research. The Chinchilla Research Resource Database contains a complete catalog of genes for chinchilla and, for comparative purposes, human. Chinchilla genes can be viewed in the context of their genomic scaffold positions using the JBrowse genome browser. In contrast to the corresponding records at NCBI, individual gene reports at CRRD include functional annotations for Disease, Gene Ontology (GO) Biological Process, GO Molecular Function, GO Cellular Component and Pathway assigned to chinchilla genes based on annotations from the corresponding human orthologs. Data can be retrieved via keyword and gene-specific searches. Lists of genes with similar functional attributes can be assembled by leveraging the hierarchical structure of the Disease, GO and Pathway vocabularies through the Ontology Search and Browser tool. Such lists can then be further analyzed for commonalities using the Gene Annotator (GA) Tool. All data in the Chinchilla Research Resource Database is freely accessible and downloadable via the CRRD FTP site or using the download functions available in the search and analysis tools. The Chinchilla Research Resource Database is a rich resource for researchers using, or considering the use of, chinchilla as a model for human disease. Database URL: http://crrd.mcw.edu PMID:27173523
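
    The ortholog-based annotation described in this abstract can be pictured as a simple lookup. The sketch below is an illustration only, not CRRD's pipeline: the gene identifiers and term labels are invented placeholders, and `transfer_annotations` is a hypothetical helper.

```python
def transfer_annotations(orthologs, human_annotations):
    """Copy each human gene's annotation terms to its chinchilla ortholog.

    `orthologs` maps a chinchilla gene ID to a human gene ID;
    `human_annotations` maps a human gene ID to a set of term IDs.
    Genes whose human ortholog has no annotations are left out."""
    transferred = {}
    for chinchilla_gene, human_gene in orthologs.items():
        terms = human_annotations.get(human_gene, set())
        if terms:
            transferred[chinchilla_gene] = set(terms)  # copy, do not alias
    return transferred


# Placeholder identifiers, not real gene or ontology IDs.
orthologs = {"chin_gene_1": "human_gene_1", "chin_gene_2": "human_gene_2"}
human_annotations = {"human_gene_1": {"TERM:process-x", "TERM:disease-y"}}

annotations = transfer_annotations(orthologs, human_annotations)
print(annotations)
```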

  17. Evaluation of low wind modeling approaches for two tall-stack databases.

    PubMed

    Paine, Robert; Samani, Olga; Kaplan, Mary; Knipping, Eladio; Kumar, Naresh

    2015-11-01

    The performance of the AERMOD air dispersion model under low wind speed conditions, especially for applications with only one level of meteorological data and no direct turbulence measurements or vertical temperature gradient observations, is the focus of this study. The analysis documented in this paper addresses evaluations for low wind conditions involving tall stack releases for which multiple years of concurrent emissions, meteorological data, and monitoring data are available. AERMOD was tested on two field-study databases involving several SO2 monitors and hourly emissions data that had sub-hourly meteorological data (e.g., 10-min averages) available using several technical options: default mode, with various low wind speed beta options, and using the available sub-hourly meteorological data. These field study databases included (1) Mercer County, a North Dakota database featuring five SO2 monitors within 10 km of the Dakota Gasification Company's plant and the Antelope Valley Station power plant in an area of both flat and elevated terrain, and (2) a flat-terrain setting database with four SO2 monitors within 6 km of the Gibson Generating Station in southwest Indiana. Both sites featured regionally representative 10-m meteorological databases, with no significant terrain obstacles between the meteorological site and the emission sources. The low wind beta options show improvement in model performance helping to reduce some of the over-prediction biases currently present in AERMOD when run with regulatory default options. The overall findings with the low wind speed testing on these tall stack field-study databases indicate that AERMOD low wind speed options have a minor effect for flat terrain locations, but can have a significant effect for elevated terrain locations. The performance of AERMOD using low wind speed options leads to improved consistency of meteorological conditions associated with the highest observed and predicted concentration events. 

  18. Expanding on Successful Concepts, Models, and Organization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Teeguarden, Justin G.; Tan, Yu-Mei; Edwards, Stephen W.

    In her letter to the editor [1] regarding our recent Feature Article “Completing the Link between Exposure Science and Toxicology for Improved Environmental Health Decision Making: The Aggregate Exposure Pathway Framework” [2], Dr. von Göetz expressed several concerns about terminology, and about the perception that we propose the replacement of successful approaches and models for exposure assessment with a concept. We are glad to have the opportunity to address these issues here. If the goal of the AEP framework were to replace existing exposure models or databases for organizing exposure data with a concept, we would share Dr. von Göetz's concerns. Instead, the outcome we promote is broader use of an organizational framework for exposure science. The framework would support improved generation, organization, and interpretation of data as well as modeling and prediction, not replacement of models. The field of toxicology has seen the benefits of wide use of one or more organizational frameworks (e.g., mode and mechanism of action, adverse outcome pathway). These frameworks influence how experiments are designed, how data are collected, curated, stored and interpreted, and ultimately how data are used in risk assessment. Exposure science is poised to similarly benefit from broader use of a parallel organizational framework, which, as Dr. von Göetz correctly points out, is currently used in the exposure modeling community. In our view, the concepts used so effectively in the exposure modeling community, expanded upon in the AEP framework, could see wider adoption by the field as a whole. The value of such a framework was recognized by the National Academy of Sciences [3]. Replacement of models, databases, or any application with the AEP framework was not proposed in our article. The positive role broader, more consistent use of such a framework might have in enabling and advancing “general activities such as data acquisition, organization…,” and exposure modeling was

  19. The BioGRID Interaction Database: 2011 update

    PubMed Central

    Stark, Chris; Breitkreutz, Bobby-Joe; Chatr-aryamontri, Andrew; Boucher, Lorrie; Oughtred, Rose; Livstone, Michael S.; Nixon, Julie; Van Auken, Kimberly; Wang, Xiaodong; Shi, Xiaoqi; Reguly, Teresa; Rust, Jennifer M.; Winter, Andrew; Dolinski, Kara; Tyers, Mike

    2011-01-01

    The Biological General Repository for Interaction Datasets (BioGRID) is a public database that archives and disseminates genetic and protein interaction data from model organisms and humans (http://www.thebiogrid.org). BioGRID currently holds 347 966 interactions (170 162 genetic, 177 804 protein) curated from both high-throughput data sets and individual focused studies, as derived from over 23 000 publications in the primary literature. Complete coverage of the entire literature is maintained for budding yeast (Saccharomyces cerevisiae), fission yeast (Schizosaccharomyces pombe) and thale cress (Arabidopsis thaliana), and efforts to expand curation across multiple metazoan species are underway. The BioGRID houses 48 831 human protein interactions that have been curated from 10 247 publications. Current curation drives are focused on particular areas of biology to enable insights into conserved networks and pathways that are relevant to human health. The BioGRID 3.0 web interface contains new search and display features that enable rapid queries across multiple data types and sources. An automated Interaction Management System (IMS) is used to prioritize, coordinate and track curation across international sites and projects. BioGRID provides interaction data to several model organism databases, resources such as Entrez-Gene and other interaction meta-databases. The entire BioGRID 3.0 data collection may be downloaded in multiple file formats, including PSI MI XML. Source code for BioGRID 3.0 is freely available without any restrictions. PMID:21071413

  20. UGTA Photograph Database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    NSTec Environmental Restoration

    One of the advantages of the Nevada Test Site (NTS) is that most of the geologic and hydrologic features such as hydrogeologic units (HGUs), hydrostratigraphic units (HSUs), and faults, which are important aspects of flow and transport modeling, are exposed at the surface somewhere in the vicinity of the NTS and thus are available for direct observation. However, due to access restrictions and the remote locations of many of the features, most Underground Test Area (UGTA) participants cannot observe these features directly in the field. Fortunately, National Security Technologies, LLC, geologists and their predecessors have photographed many of these features through the years. During fiscal year 2009, work was done to develop an online photograph database for use by the UGTA community. Photographs were organized, compiled, and imported into Adobe® Photoshop® Elements 7. The photographs were then assigned keyword tags such as alteration type, HGU, HSU, location, rock feature, rock type, and stratigraphic unit. Some fully tagged photographs were then selected and uploaded to the UGTA website. This online photograph database provides easy access for all UGTA participants and can help “ground truth” their analytical and modeling tasks. It also provides new participants a resource to more quickly learn the geology and hydrogeology of the NTS.

  1. PDXliver: a database of liver cancer patient derived xenograft mouse models.

    PubMed

    He, Sheng; Hu, Bo; Li, Chao; Lin, Ping; Tang, Wei-Guo; Sun, Yun-Fan; Feng, Fang-You-Min; Guo, Wei; Li, Jia; Xu, Yang; Yao, Qian-Lan; Zhang, Xin; Qiu, Shuang-Jian; Zhou, Jian; Fan, Jia; Li, Yi-Xue; Li, Hong; Yang, Xin-Rong

    2018-05-09

    Liver cancer is the second leading cause of cancer-related deaths and is characterized by heterogeneity and drug resistance. Patient-derived xenograft (PDX) models have been widely used in cancer research because they reproduce the characteristics of original tumors. However, the current studies of liver cancer PDX mice are scattered and the number of available PDX models is too small to represent the heterogeneity of liver cancer patients. To improve this situation and to complement available PDX model-related resources, here we constructed a comprehensive database, PDXliver, to integrate and analyze liver cancer PDX models. Currently, PDXliver contains 116 PDX models from Chinese liver cancer patients, 51 of which were established by the in-house PDX platform; the others were curated from the public literature. These models are annotated with complete information, including clinical characteristics of patients, genome-wide expression profiles, germline variations, somatic mutations and copy number alterations. Analysis of expression subtypes and mutated genes shows that PDXliver represents the diversity of human patients. Another feature of PDXliver is that it stores drug response data of PDX mice, which makes it possible to explore the association between molecular profiles and drug sensitivity. All data can be accessed via the Browse and Search pages. Additionally, two tools are provided to interactively visualize the omics data of selected PDXs or to compare two groups of PDXs. As far as we know, PDXliver is the first public database of liver cancer PDX models. We hope that this comprehensive resource will accelerate the use of PDX models and facilitate liver cancer research. The PDXliver database is freely available online at: http://www.picb.ac.cn/PDXliver/.

  2. Sediment-Hosted Copper Deposits of the World: Deposit Models and Database

    USGS Publications Warehouse

    Cox, Dennis P.; Lindsey, David A.; Singer, Donald A.; Diggles, Michael F.

    2003-01-01

    Introduction: This publication contains four descriptive models and four grade-tonnage models for sediment-hosted copper deposits. Descriptive models are useful in exploration planning and resource assessment because they enable the user to identify deposits in the field and to identify areas on geologic and geophysical maps where deposits could occur. Grade and tonnage models are used in resource assessment to predict the likelihood of different combinations of grades and tonnages that could occur in undiscovered deposits in a specific area. They are also useful in exploration in deciding what deposit types meet the economic objectives of the exploration company. The models in this report supersede the sediment-hosted copper models in USGS Bulletin 1693 (Cox, 1986, and Mosier and others, 1986) and are subdivided into a general type and three subtypes. The general model is useful in classifying deposits whose features are obscured by metamorphism or are otherwise poorly described, and for assessing regions in which the geologic environments are poorly understood. The three subtypes are based on differences in deposit form and environments of deposition. These differences are described under subtypes in the general model. Deposit models are based on the descriptions of geologic environments and physical characteristics, and on metal grades and tonnages of many individual deposits. Data used in this study are presented in a database representing 785 deposits in nine continents. This database was derived partly from data published by Kirkham and others (1994) and from new information in recent publications. To facilitate the construction of grade and tonnage models, the information, presented by Kirkham in disaggregated form, was brought together to provide a single grade and a single tonnage for each deposit. Throughout the report individual deposits are defined as being more than 2,000 meters from the nearest adjacent deposit.
The deposit models are presented here as

  3. Guidelines for the Effective Use of Entity-Attribute-Value Modeling for Biomedical Databases

    PubMed Central

    Dinu, Valentin; Nadkarni, Prakash

    2007-01-01

    Purpose: To introduce the goals of EAV database modeling, to describe the situations where Entity-Attribute-Value (EAV) modeling is a useful alternative to conventional relational methods of database modeling, and to describe the fine points of implementation in production systems. Methods: We analyze the following circumstances: 1) data are sparse and have a large number of applicable attributes, but only a small fraction will apply to a given entity; 2) numerous classes of data need to be represented, each class has a limited number of attributes, but the number of instances of each class is very small. We also consider situations calling for a mixed approach where both conventional and EAV design are used for appropriate data classes. Results and Conclusions: In robust production systems, EAV-modeled databases trade a modest data sub-schema for a complex metadata sub-schema. The need to design the metadata effectively makes EAV design potentially more challenging than conventional design. PMID:17098467
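    The trade-off described above (a small data sub-schema paired with a metadata sub-schema) can be illustrated with a minimal sketch. It is not taken from the paper: the table names, the attribute names and the sample values are all hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Metadata sub-schema (hypothetical): one row per recordable attribute.
    CREATE TABLE attribute (
        attr_id  INTEGER PRIMARY KEY,
        name     TEXT UNIQUE NOT NULL,
        datatype TEXT NOT NULL          -- 'real' or 'text' in this sketch
    );
    -- Data sub-schema: one row per (entity, attribute, value) fact.
    CREATE TABLE eav (
        entity_id INTEGER NOT NULL,
        attr_id   INTEGER NOT NULL REFERENCES attribute(attr_id),
        value     TEXT NOT NULL,
        PRIMARY KEY (entity_id, attr_id)
    );
""")

# Illustrative attributes and sparse facts: each entity uses only a few attributes.
conn.executemany(
    "INSERT INTO attribute VALUES (?, ?, ?)",
    [(1, "serum_potassium", "real"), (2, "blood_group", "text"), (3, "hba1c", "real")],
)
conn.executemany(
    "INSERT INTO eav VALUES (?, ?, ?)",
    [(101, 1, "4.1"), (101, 2, "A+"), (102, 3, "6.8")],
)


def entity_record(conn, entity_id):
    """Pivot the EAV rows of one entity into a dict, using metadata to cast values."""
    rows = conn.execute(
        "SELECT a.name, a.datatype, e.value "
        "FROM eav e JOIN attribute a ON a.attr_id = e.attr_id "
        "WHERE e.entity_id = ? ORDER BY a.name",
        (entity_id,),
    ).fetchall()
    return {
        name: float(value) if datatype == "real" else value
        for name, datatype, value in rows
    }


print(entity_record(conn, 101))
```

    Adding a new attribute here means inserting a row into the metadata table rather than altering the schema, which is the flexibility EAV buys; the cost is that every read needs the metadata join and cast shown in `entity_record`.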

  4. Biocuration at the Saccharomyces Genome Database

    PubMed Central

    Skrzypek, Marek S.; Nash, Robert S.

    2015-01-01

    Saccharomyces Genome Database is an online resource dedicated to managing information about the biology and genetics of the model organism, yeast (Saccharomyces cerevisiae). This information is derived primarily from scientific publications through a process of human curation that involves manual extraction of data and their organization into a comprehensive system of knowledge. This system provides a foundation for further analysis of experimental data coming from research on yeast as well as other organisms. In this review we will demonstrate how biocuration and biocurators add a key component, the biological context, to our understanding of how genes, proteins, genomes and cells function and interact. We will explain the role biocurators play in sifting through the wealth of biological data to incorporate and connect key information. We will also discuss the many ways we assist researchers with their various research needs. We hope to convince the reader that manual curation is vital in converting the flood of data into organized and interconnected knowledge, and that biocurators play an essential role in the integration of scientific information into a coherent model of the cell. PMID:25997651

  5. Accounting for natural organic matter in aqueous chemical equilibrium models: a review of the theories and applications

    NASA Astrophysics Data System (ADS)

    Dudal, Yves; Gérard, Frédéric

    2004-08-01

    Soil organic matter consists of a highly complex and diversified blend of organic molecules, ranging from low molecular weight organic acids (LMWOAs), sugars, amines, alcohols, etc., to high apparent molecular weight fulvic and humic acids. The presence of a wide range of functional groups on these molecules makes them very reactive and influential in soil chemistry, in regards to acid-base chemistry, metal complexation, precipitation and dissolution of minerals and microbial reactions. Out of these functional groups, the carboxylic and phenolic ones are the most abundant and most influential in regards to metal complexation. Therefore, chemical equilibrium models have progressively dealt with organic matter in their calculations. This paper presents a review of six chemical equilibrium models, namely NICA-Donnan, EQ3/6, GEOCHEM, MINTEQA2, PHREEQC and WHAM, in light of the account they make of natural organic matter (NOM) with the objective of helping potential users in choosing a modelling approach. The account has taken various faces, mainly by adding specific molecules within the existing model databases (EQ3/6, GEOCHEM, and PHREEQC) or by using either a discrete (WHAM) or a continuous (NICA-Donnan and MINTEQA2) distribution of the deprotonated carboxylic and phenolic groups. The different ways in which soil organic matter has been integrated into these models are discussed in regards to the model-experiment comparisons that were found in the literature, concerning applications to either laboratory or natural systems. Much of the attention has been focused on the two most advanced models, WHAM and NICA-Donnan, which are able to reasonably describe most of the experimental results. Nevertheless, a better knowledge of the humic substances metal-binding properties is needed to better constrain model inputs with site-specific parameter values. This represents the main axis of research that needs to be carried out to improve the models.

  6. Analysis of a virtual memory model for maintaining database views

    NASA Technical Reports Server (NTRS)

    Kinsley, Kathryn C.; Hughes, Charles E.

    1992-01-01

    This paper presents an analytical model for predicting the performance of a new support strategy for database views. This strategy, called the virtual method, is compared with traditional methods for supporting views. The analytical model's predictions of improved performance by the virtual method are then validated by comparing these results with those achieved in an experimental implementation.

  7. Solubility Database

    National Institute of Standards and Technology Data Gateway

    SRD 106 IUPAC-NIST Solubility Database (Web, free access). These solubilities are compiled from 18 volumes of the International Union of Pure and Applied Chemistry (IUPAC)-NIST Solubility Data Series. The database includes liquid-liquid, solid-liquid, and gas-liquid systems. Typical solvents and solutes include water, seawater, heavy water, inorganic compounds, and a variety of organic compounds such as hydrocarbons, halogenated hydrocarbons, alcohols, acids, esters and nitrogen compounds. There are over 67,500 solubility measurements and over 1800 references.

  8. Asteroid models from the Lowell photometric database

    NASA Astrophysics Data System (ADS)

    Ďurech, J.; Hanuš, J.; Oszkiewicz, D.; Vančo, R.

    2016-03-01

    Context. Information about shapes and spin states of individual asteroids is important for the study of the whole asteroid population. For asteroids from the main belt, most of the shape models available now have been reconstructed from disk-integrated photometry by the lightcurve inversion method. Aims: We want to significantly enlarge the current sample (~350) of available asteroid models. Methods: We use the lightcurve inversion method to derive new shape models and spin states of asteroids from the sparse-in-time photometry compiled in the Lowell Photometric Database. To speed up the time-consuming process of scanning the period parameter space through the use of convex shape models, we use the distributed computing project Asteroids@home, running on the Berkeley Open Infrastructure for Network Computing (BOINC) platform. This way, the period-search interval is divided into hundreds of smaller intervals. These intervals are scanned separately by different volunteers and then joined together. We also use an alternative, faster, approach when searching the best-fit period by using a model of triaxial ellipsoid. By this, we can independently confirm periods found with convex models and also find rotation periods for some of those asteroids for which the convex-model approach gives too many solutions. Results: From the analysis of Lowell photometric data of the first 100 000 numbered asteroids, we derived 328 new models. This almost doubles the number of available models. We tested the reliability of our results by comparing models that were derived from purely Lowell data with those based on dense lightcurves, and we found that the rate of false-positive solutions is very low. We also present updated plots of the distribution of spin obliquities and pole ecliptic longitudes that confirm previous findings about a non-uniform distribution of spin axes. 
However, the models reconstructed from noisy sparse data are heavily biased towards more elongated bodies with high

  9. Risk model of valve surgery in Japan using the Japan Adult Cardiovascular Surgery Database.

    PubMed

    Motomura, Noboru; Miyata, Hiroaki; Tsukihara, Hiroyuki; Takamoto, Shinichi

    2010-11-01

    Risk models of cardiac valve surgery using a large database are useful for improving surgical quality. In order to obtain accurate, high-quality assessments of surgical outcome, each geographic area should maintain its own database. The study aim was to collect Japanese data and to prepare a risk stratification of cardiac valve procedures, using the Japan Adult Cardiovascular Surgery Database (JACVSD). A total of 6562 valve procedure records from 97 participating sites throughout Japan was analyzed, using a data entry form with 255 variables that was sent to the JACVSD office from a web-based data collection system. The statistical model was constructed using multiple logistic regression. Model discrimination was tested using the area under the receiver operating characteristic curve (C-index). The model calibration was tested using the Hosmer-Lemeshow (H-L) test. Among 6562 operated cases, 15% had diabetes mellitus, 5% were urgent, and 12% involved preoperative renal failure. The observed 30-day and operative mortality rates were 2.9% and 4.0%, respectively. Significant variables with high odds ratios included emergent or salvage status (3.83), reoperation (3.43), and left ventricular dysfunction (3.01). The H-L test and C-index values for 30-day mortality were satisfactory (0.44 and 0.80, respectively). The results obtained in Japan were at least as good as those reported elsewhere. The performance of this risk model also matched that of the STS National Adult Cardiac Database and the European Society Database.
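    As a rough illustration of the two ideas in this abstract (odds ratios from a logistic model, and discrimination measured by the C-index), the sketch below may help. It is not the JACVSD model: only the three odds ratios quoted above are used, the baseline risk is a hypothetical input, and the full model has many more covariates.

```python
# Odds ratios quoted in the abstract; everything else here is illustrative.
ODDS_RATIOS = {
    "emergent_or_salvage": 3.83,
    "reoperation": 3.43,
    "lv_dysfunction": 3.01,
}


def predicted_risk(baseline_risk, factors):
    """Multiply baseline odds by each factor's odds ratio, then convert back to risk.

    `baseline_risk` is a hypothetical risk for a patient with none of the factors.
    """
    odds = baseline_risk / (1.0 - baseline_risk)
    for factor in factors:
        odds *= ODDS_RATIOS[factor]
    return odds / (1.0 + odds)


def c_index(predicted, died):
    """Area under the ROC curve computed by pair counting.

    Returns the fraction of (death, survivor) pairs in which the death was
    assigned the higher predicted risk; tied predictions count as half.
    """
    deaths = [p for p, d in zip(predicted, died) if d]
    survivors = [p for p, d in zip(predicted, died) if not d]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for p_death in deaths:
        for p_surv in survivors:
            if p_death > p_surv:
                concordant += 1.0
            elif p_death == p_surv:
                concordant += 0.5
    return concordant / (len(deaths) * len(survivors))


# Hypothetical example: a 1% baseline risk for a reoperation case.
print(round(predicted_risk(0.01, ["reoperation"]), 4))
```

    A C-index of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking, which gives context for the 0.80 reported for 30-day mortality.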

  10. Scaling laws and model of words organization in spoken and written language

    NASA Astrophysics Data System (ADS)

    Bian, Chunhua; Lin, Ruokuang; Zhang, Xiaoyu; Ma, Qianli D. Y.; Ivanov, Plamen Ch.

    2016-01-01

    A broad range of complex physical and biological systems exhibits scaling laws. The human language is a complex system of words organization. Studies of written texts have revealed intriguing scaling laws that characterize the frequency of words occurrence, rank of words, and growth in the number of distinct words with text length. While studies have predominantly focused on the language system in its written form, such as books, little attention is given to the structure of spoken language. Here we investigate a database of spoken language transcripts and written texts, and we uncover that words organization in both spoken language and written texts exhibits scaling laws, although with different crossover regimes and scaling exponents. We propose a model that provides insight into words organization in spoken language and written texts, and successfully accounts for all scaling laws empirically observed in both language forms.
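    The quantities named in this abstract (word frequency against rank, and growth of the number of distinct words with text length) can be computed from any token list. The sketch below is an assumption-laden illustration, not the authors' method: it fits a single straight line in log-log space, whereas the paper reports crossover regimes with different exponents.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, count) pairs with rank 1 for the most frequent word."""
    counts = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(counts, start=1))


def vocabulary_growth(tokens):
    """Number of distinct words seen after each token of the text."""
    seen = set()
    growth = []
    for token in tokens:
        seen.add(token)
        growth.append(len(seen))
    return growth


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x), i.e. a scaling exponent."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    den = sum((a - mean_x) ** 2 for a in lx)
    return num / den


tokens = "a b a c".split()
print(rank_frequency(tokens), vocabulary_growth(tokens))
```

    To look for a crossover, the same `loglog_slope` fit could be applied separately to low-rank and high-rank portions of the rank-frequency list.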

  11. Planform: an application and database of graph-encoded planarian regenerative experiments.

    PubMed

    Lobo, Daniel; Malone, Taylor J; Levin, Michael

    2013-04-15

    Understanding the mechanisms governing the regeneration capabilities of many organisms is a fundamental interest in biology and medicine. An ever-increasing number of manipulation and molecular experiments are attempting to discover a comprehensive model for regeneration, with the planarian flatworm being one of the most important model species. Despite much effort, no comprehensive, constructive, mechanistic models exist yet, and it is now clear that computational tools are needed to mine this huge dataset. However, until now, there is no database of regenerative experiments, and the current genotype-phenotype ontologies and databases are based on textual descriptions, which are not understandable by computers. To overcome these difficulties, we present here Planform (Planarian formalization), a manually curated database and software tool for planarian regenerative experiments, based on a mathematical graph formalism. The database contains more than a thousand experiments from the main publications in the planarian literature. The software tool provides the user with a graphical interface to easily interact with and mine the database. The presented system is a valuable resource for the regeneration community and, more importantly, will pave the way for the application of novel artificial intelligence tools to extract knowledge from this dataset. The database and software tool are freely available at http://planform.daniel-lobo.com.

  12. ADASS Web Database XML Project

    NASA Astrophysics Data System (ADS)

    Barg, M. I.; Stobie, E. B.; Ferro, A. J.; O'Neil, E. J.

    In the spring of 2000, at the request of the ADASS Program Organizing Committee (POC), we began organizing information from previous ADASS conferences in an effort to create a centralized database. The beginnings of this database originated from data (invited speakers, participants, papers, etc.) extracted from HyperText Markup Language (HTML) documents from past ADASS host sites. Unfortunately, not all HTML documents are well formed and parsing them proved to be an iterative process. It was evident at the beginning that if these Web documents were organized in a standardized way, such as XML (Extensible Markup Language), the processing of this information across the Web could be automated, more efficient, and less error prone. This paper will briefly review the many programming tools available for processing XML, including Java, Perl and Python, and will explore the mapping of relational data from our MySQL database to XML.
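    A minimal sketch of mapping relational rows to XML, in the spirit of the abstract, is shown below. It is not the project's code: SQLite stands in for MySQL so the example runs with the Python standard library alone, and the `paper` table, its columns and its rows are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(cursor, root_tag, row_tag):
    """Turn the result set of an executed SELECT into an XML element tree.

    Each row becomes a <row_tag> element and each column a child element
    named after the column; NULL values become empty elements.
    """
    columns = [description[0] for description in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        row_elem = ET.SubElement(root, row_tag)
        for column, value in zip(columns, row):
            child = ET.SubElement(row_elem, column)
            child.text = "" if value is None else str(value)
    return root


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE paper (paper_id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO paper VALUES (?, ?, ?)",
    [(1, "Example talk A", 1999), (2, "Example talk B", None)],  # made-up rows
)

cursor = conn.execute("SELECT paper_id, title, year FROM paper ORDER BY paper_id")
root = rows_to_xml(cursor, "papers", "paper")
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

    Because the output is generated by a serializer rather than written by hand, it is well formed by construction, which is the property the abstract found lacking in the legacy HTML pages.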

  13. Inorganic bromine in organic molecular crystals: Database survey and four case studies

    NASA Astrophysics Data System (ADS)

    Nemec, Vinko; Lisac, Katarina; Stilinović, Vladimir; Cinčić, Dominik

    2017-01-01

    We present a Cambridge Structural Database and experimental study of multicomponent molecular crystals containing bromine. The CSD study covers supramolecular behaviour of bromide and tribromide anions as well as halogen bonded dibromine molecules in crystal structures of organic salts and cocrystals, and a study of the geometries and complexities in polybromide anion systems. In addition, we present four case studies of organic structures with bromide, tribromide and polybromide anions as well as the neutral dibromine molecule. These include the first observed crystal with diprotonated phenazine, a double salt of phenazinium bromide and tribromide, a cocrystal of 4-methoxypyridine with the neutral dibromine molecule as a halogen bond donor, as well as bis(4-methoxypyridine)bromonium polybromide. Structural features of the four case studies are for the most part consistent with the statistically prevalent behaviour indicated by the CSD study for given bromine species, although they do exhibit some unorthodox structural features and thereby indicate possible supramolecular causes for aberrations from the statistically most abundant (and presumably most favourable) geometries.

  14. Relational-database model for improving quality assurance and process control in a composite manufacturing environment

    NASA Astrophysics Data System (ADS)

    Gentry, Jeffery D.

    2000-05-01

    A relational database is a powerful tool for collecting and analyzing the vast amounts of interrelated data associated with the manufacture of composite materials. A relational database contains many individual database tables that store data that are related in some fashion. Manufacturing process variables as well as quality assurance measurements can be collected and stored in database tables indexed according to lot numbers, part type or individual serial numbers. Relationships between manufacturing process and product quality can then be correlated over a wide range of product types and process variations. This paper presents details on how relational databases are used to collect, store, and analyze process variables and quality assurance data associated with the manufacture of advanced composite materials. Important considerations are covered including how the various types of data are organized and how relationships between the data are defined. Employing relational database techniques to establish correlative relationships between process variables and quality assurance measurements is then explored. Finally, the benefits of database techniques such as data warehousing, data mining and web based client/server architectures are discussed in the context of composite material manufacturing.
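    The approach outlined in this abstract, process variables and quality measurements stored in separate tables keyed by lot number and then correlated, might look like the sketch below. The schema, the column names (`cure_temp_c`, `void_content_pct`) and all values are hypothetical and chosen only to make the join and the correlation step concrete.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables, both indexed by lot number.
    CREATE TABLE process_run (
        lot_number  TEXT PRIMARY KEY,
        cure_temp_c REAL NOT NULL
    );
    CREATE TABLE qa_measurement (
        lot_number       TEXT NOT NULL REFERENCES process_run(lot_number),
        void_content_pct REAL NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO process_run VALUES (?, ?)",
    [("L1", 170.0), ("L2", 175.0), ("L3", 180.0), ("L4", 185.0)],  # made-up data
)
conn.executemany(
    "INSERT INTO qa_measurement VALUES (?, ?)",
    [("L1", 2.0), ("L2", 1.5), ("L3", 1.0), ("L4", 0.5)],  # made-up data
)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Join process variables to QA results through the shared lot number.
joined = conn.execute(
    "SELECT p.cure_temp_c, q.void_content_pct "
    "FROM process_run p JOIN qa_measurement q ON q.lot_number = p.lot_number "
    "ORDER BY p.lot_number"
).fetchall()
temps = [row[0] for row in joined]
voids = [row[1] for row in joined]
r = pearson(temps, voids)
print(round(r, 3))
```

    The join is what the relational layout provides; the statistic applied afterwards is interchangeable, and a correlation coefficient is used here only as the simplest example.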

  15. Automated Hierarchical to CODASYL (Conference on Data Systems Languages) Database Interface Schema Translator.

    DTIC Science & Technology

    1983-12-16

    management system (DBMS) is to record and maintain information used by an organization in the organization’s decision-making process. Some advantages of a...independence. Database Management Systems are classified into three major models; relational, network, and hierarchical. Each model uses a software...feeling impedes the overall effectiveness of the Acquisition Management Information System (AMIS), which currently uses S2k. The size of the AMIS

  16. Organizations challenged by global database development

    USGS Publications Warehouse

    Sturdevant, J.A.; Eidenshink, J.C.; Loveland, Thomas R.

    1991-01-01

    Several international programs have identified the need for a global 1-kilometer spatial database for land cover and land characterization studies. In 1992, the US Geological Survey (USGS) EROS Data Center (EDC), the European Space Agency (ESA), the National Oceanic and Atmospheric Administration (NOAA) and the National Aeronautics and Space Administration (NASA) will collect and archive all 1-kilometer Advanced Very High Resolution Radiometer (AVHRR) data acquired during afternoon orbital passes over land.

  17. Transport and Environment Database System (TRENDS): Maritime air pollutant emission modelling

    NASA Astrophysics Data System (ADS)

    Georgakaki, Aliki; Coffey, Robert A.; Lock, Graham; Sorenson, Spencer C.

    This paper reports the development of the maritime module within the framework of the Transport and Environment Database System (TRENDS) project. A detailed database has been constructed for the calculation of energy consumption and air pollutant emissions. Based on an in-house database of commercial vessels kept at the Technical University of Denmark, relationships between the fuel consumption and size of different vessels have been developed, taking into account the fleet's age and service speed. The technical assumptions and factors incorporated in the database are presented, including changes from findings reported in Methodologies for Estimating air pollutant Emissions from Transport (MEET). The database operates on statistical data provided by Eurostat, which describe vessel and freight movements from and towards EU 15 major ports. Data are at port to Maritime Coastal Area (MCA) level, so a bottom-up approach is used. A port to MCA distance database has also been constructed for the purpose of the study. This was the first attempt to use Eurostat maritime statistics for emission modelling; and the problems encountered, since the statistical data collection was not undertaken with a view to this purpose, are mentioned. Examples of the results obtained by the database are presented. These include detailed air pollutant emission calculations for bulk carriers entering the port of Helsinki, as an example of the database operation, and aggregate results for different types of movements for France. Overall estimates of SOx and NOx emissions caused by shipping traffic between the EU 15 countries are in the area of 1 and 1.5 million tonnes, respectively.

  18. Tree-Structured Digital Organisms Model

    NASA Astrophysics Data System (ADS)

    Suzuki, Teruhiko; Nobesawa, Shiho; Tahara, Ikuo

    Tierra and Avida are well-known models of digital organisms. They describe a life process as a sequence of computation codes. A linear sequence model may not be the only way to describe a digital organism, though it is very simple for a computer-based model. Thus we propose a new digital organism model based on a tree structure, which is rather similar to genetic programming. With our model, a life process is a combination of various functions, as life in the real world is. This implies that our model can easily describe the hierarchical structure of life, and it can simulate evolutionary computation through mutual interaction of functions. We verified by simulation that our model can be regarded as a digital organism model according to its definitions. Our model even succeeded in creating species such as viruses and parasites.
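
    To make the contrast with a linear instruction sequence concrete, the following minimal Python sketch represents a program as a tree of functions, in the way tree-based evolutionary methods commonly do. It is an illustration only and not the authors' model: the names `Node`, `evaluate` and `replace_subtree`, the choice of arithmetic functions, and the example tree are all assumptions made here.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Tuple, Union


@dataclass(frozen=True)
class Node:
    """Internal node: a binary function applied to two subtrees (hypothetical design)."""
    name: str
    func: Callable[[float, float], float]
    left: "Tree"
    right: "Tree"


# Leaves are either numeric constants or the name of an input variable.
Tree = Union[Node, float, str]


def evaluate(tree: Tree, inputs: dict) -> float:
    """Evaluate the tree bottom-up, looking up named leaves in `inputs`."""
    if isinstance(tree, Node):
        return tree.func(evaluate(tree.left, inputs), evaluate(tree.right, inputs))
    if isinstance(tree, str):
        return inputs[tree]
    return tree


def replace_subtree(tree: Tree, path: Tuple[str, ...], new: Tree) -> Tree:
    """Return a copy of `tree` with the subtree at `path` ('L'/'R' steps) replaced."""
    if not path:
        return new
    if not isinstance(tree, Node):
        raise ValueError("path descends below a leaf")
    if path[0] == "L":
        return Node(tree.name, tree.func,
                    replace_subtree(tree.left, path[1:], new), tree.right)
    return Node(tree.name, tree.func,
                tree.left, replace_subtree(tree.right, path[1:], new))


# Example tree encoding (x + 2) * x.
organism = Node("mul", operator.mul, Node("add", operator.add, "x", 2.0), "x")

# A mutation that swaps one leaf: the result encodes (x + 5) * x.
mutant = replace_subtree(organism, ("L", "R"), 5.0)

print(evaluate(organism, {"x": 3.0}), evaluate(mutant, {"x": 3.0}))
```

    Because a mutation replaces a whole subtree rather than a single instruction, a change stays local to one branch of the hierarchy, which is the property a tree representation offers over a flat sequence.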

  19. Subject and authorship of records related to the Organization for Tropical Studies (OTS) in BINABITROP, a comprehensive database about Costa Rican biology.

    PubMed

    Monge-Nájera, Julián; Nielsen-Muñoz, Vanessa; Azofeifa-Mora, Ana Beatriz

    2013-06-01

    BINABITROP is a bibliographical database of more than 38000 records about the ecosystems and organisms of Costa Rica. In contrast with commercial databases, such as Web of Knowledge and Scopus, which exclude most of the scientific journals published in tropical countries, BINABITROP is a comprehensive record of knowledge on the tropical ecosystems and organisms of Costa Rica. We analyzed its contents in three sites (La Selva, Palo Verde and Las Cruces) and recorded scientific field, taxonomic group and authorship. We found that most records dealt with ecology and systematics, and that most authors published only one article in the study period (1963-2011). Most research was published in four journals: Biotropica, Revista de Biología Tropical/ International Journal of Tropical Biology and Conservation, Zootaxa and Brenesia. This may be the first study of such a comprehensive database for any case of tropical biology literature.

  20. DSSTOX WEBSITE LAUNCH: IMPROVING PUBLIC ACCESS TO DATABASES FOR BUILDING STRUCTURE-TOXICITY PREDICTION MODELS

    EPA Science Inventory

    DSSTox Website Launch: Improving Public Access to Databases for Building Structure-Toxicity Prediction Models
    Ann M. Richard
    US Environmental Protection Agency, Research Triangle Park, NC, USA

    Distributed: Decentralized set of standardized, field-delimited databases,...

  1. The Primate Life History Database: A unique shared ecological data resource

    PubMed Central

    Strier, Karen B.; Altmann, Jeanne; Brockman, Diane K.; Bronikowski, Anne M.; Cords, Marina; Fedigan, Linda M.; Lapp, Hilmar; Liu, Xianhua; Morris, William F.; Pusey, Anne E.; Stoinski, Tara S.; Alberts, Susan C.

    2011-01-01

    Summary The importance of data archiving, data sharing, and public access to data has received considerable attention. Awareness is growing among scientists that collaborative databases can facilitate these activities. We provide a detailed description of the collaborative life history database developed by our Working Group at the National Evolutionary Synthesis Center (NESCent) to address questions about life history patterns and the evolution of mortality and demographic variability in wild primates. Examples from each of the seven primate species included in our database illustrate the range of data incorporated and the challenges, decision-making processes, and criteria applied to standardize data across diverse field studies. In addition to the descriptive and structural metadata associated with our database, we also describe the process metadata (how the database was designed and delivered) and the technical specifications of the database. Our database provides a useful model for other researchers interested in developing similar types of databases for other organisms, while our process metadata may be helpful to other groups of researchers interested in developing databases for other types of collaborative analyses. PMID:21698066

  2. Organic carbon stock modelling for the quantification of the carbon sinks in terrestrial ecosystems

    NASA Astrophysics Data System (ADS)

    Durante, Pilar; Algeet, Nur; Oyonarte, Cecilio

    2017-04-01

    Given the recent environmental policies derived from the serious threats caused by global change, practical measures to decrease net CO2 emissions have to be put in place. In this regard, carbon sequestration is a major measure to reduce atmospheric CO2 concentrations in the short and medium term, where terrestrial ecosystems play a basic role as carbon sinks. Development of tools for quantification, assessment and management of organic carbon in ecosystems at different scales and under different management scenarios is essential to achieve these commitments. The aim of this study is to establish a methodological framework for the modeling of this tool, applied to sustainable land use planning and management at spatial and temporal scales. The methodology for carbon stock estimation in ecosystems is based on merger techniques between carbon stored in soils and aerial biomass. For this purpose, both a spatial variability map of soil organic carbon (SOC) and algorithms for calculation of forest species biomass will be created. For the modelling of the SOC spatial distribution at different map scales, it is necessary to fit in and screen the available information from legacy soil databases. Subsequently, SOC modelling will be based on the SCORPAN model, a quantitative model used to assess the correlation among soil-forming factors measured at the same site location. These factors will be selected from both static (terrain morphometric variables) and dynamic variables (climatic variables and vegetation indexes -NDVI-), giving the model its spatio-temporal character. After the predictive model, spatial inference techniques will be used to achieve the final map and to extrapolate the data to areas where information is unavailable (automated random forest regression kriging). The estimated uncertainty will be calculated to assess the model performance at different scale approaches. Organic carbon modelling of aerial biomass will be estimated using LiDAR (Light Detection And Ranging

  3. Data model and relational database design for the New Jersey Water-Transfer Data System (NJWaTr)

    USGS Publications Warehouse

    Tessler, Steven

    2003-01-01

    The New Jersey Water-Transfer Data System (NJWaTr) is a database design for the storage and retrieval of water-use data. NJWaTr can manage data encompassing many facets of water use, including (1) the tracking of various types of water-use activities (withdrawals, returns, transfers, distributions, consumptive-use, wastewater collection, and treatment); (2) the storage of descriptions, classifications and locations of places and organizations involved in water-use activities; (3) the storage of details about measured or estimated volumes of water associated with water-use activities; and (4) the storage of information about data sources and water resources associated with water use. In NJWaTr, each water transfer occurs unidirectionally between two site objects, and the sites and conveyances form a water network. The core entities in the NJWaTr model are site, conveyance, transfer/volume, location, and owner. Other important entities include water resource (used for withdrawals and returns), data source, permit, and alias. Multiple water-exchange estimates based on different methods or data sources can be stored for individual transfers. Storage of user-defined details is accommodated for several of the main entities. Many tables contain classification terms to facilitate the detailed description of data items and can be used for routine or custom data summarization. NJWaTr accommodates single-user and aggregate-user water-use data, can be used for large or small water-network projects, and is available as a stand-alone Microsoft Access database. Data stored in the NJWaTr structure can be retrieved in user-defined combinations to serve visualization and analytical applications. Users can customize and extend the database, link it to other databases, or implement the design in other relational database applications.
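
    The relationships among the core entities described above (sites joined by unidirectional conveyances, with several volume estimates allowed per transfer) can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the NJWaTr schema: the class and field names (`Site`, `Conveyance`, `VolumeEstimate`, `volume_mgal`), the units and the example network are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Site:
    """A place involved in water use (hypothetical fields)."""
    site_id: str
    name: str


@dataclass(frozen=True)
class VolumeEstimate:
    """One measured or estimated volume, tagged with its method or data source."""
    method: str
    volume_mgal: float  # the unit is an assumption made for this sketch


@dataclass
class Conveyance:
    """A unidirectional link that carries water from one site to another."""
    from_site: Site
    to_site: Site
    estimates: List[VolumeEstimate] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A transfer connects two distinct site objects.
        if self.from_site == self.to_site:
            raise ValueError("a transfer must connect two distinct sites")


def downstream_sites(network: List[Conveyance], site: Site) -> List[Site]:
    """Return the sites that receive water directly from `site`."""
    return [c.to_site for c in network if c.from_site == site]


# Invented example network: well field -> treatment plant -> distribution system.
well = Site("S1", "Well field")
plant = Site("S2", "Treatment plant")
town = Site("S3", "Distribution system")

network = [
    Conveyance(well, plant, [VolumeEstimate("metered", 12.5),
                             VolumeEstimate("permit-based estimate", 14.0)]),
    Conveyance(plant, town, [VolumeEstimate("metered", 11.8)]),
]

print([s.name for s in downstream_sites(network, well)])
```

    Keeping the estimates in a list on the conveyance mirrors the statement that multiple water-exchange estimates from different methods or data sources can be stored for one transfer; in a relational implementation this would typically be a separate table keyed to the transfer.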

  4. Comparison of the NCI open database with seven large chemical structural databases.

    PubMed

    Voigt, J H; Bienfait, B; Wang, S; Nicklaus, M C

    2001-01-01

    Eight large chemical databases have been analyzed and compared to each other. Central to this comparison is the open National Cancer Institute (NCI) database, consisting of approximately 250 000 structures. The other databases analyzed are the Available Chemicals Directory ("ACD," from MDL, release 1.99, 3D-version); the ChemACX ("ACX," from CamSoft, Version 4.5); the Maybridge Catalog and the Asinex database (both as distributed by CamSoft as part of ChemInfo 4.5); the Sigma-Aldrich Catalog (CD-ROM, 1999 Version); the World Drug Index ("WDI," Derwent, version 1999.03); and the organic part of the Cambridge Crystallographic Database ("CSD," from Cambridge Crystallographic Data Center, 1999 Version 5.18). The database properties analyzed are internal duplication rates; compounds unique to each database; cumulative occurrence of compounds in an increasing number of databases; overlap of identical compounds between two databases; similarity overlap; diversity; and others. The crystallographic database CSD and the WDI show somewhat less overlap with the other databases than those with each other. In particular the collections of commercial compounds and compilations of vendor catalogs have a substantial degree of overlap among each other. Still, no database is completely a subset of any other, and each appears to have its own niche and thus "raison d'être". The NCI database has by far the highest number of compounds that are unique to it. Approximately 200 000 of the NCI structures were not found in any of the other analyzed databases.
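
    Several of the properties compared above (compounds unique to each database, pairwise overlap, and occurrence across an increasing number of databases) reduce to set operations once every structure has a canonical identifier. The Python sketch below illustrates this on invented toy data; the identifiers `c1` to `c6`, the database contents and the function names are assumptions, and the abstract does not describe the matching method the authors actually used.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set, Tuple


def unique_to_each(dbs: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Compounds that appear in one database and in no other."""
    result = {}
    for name, compounds in dbs.items():
        others = set().union(*(c for n, c in dbs.items() if n != name))
        result[name] = compounds - others
    return result


def pairwise_overlap(dbs: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Number of identical compounds shared by each pair of databases."""
    return {(a, b): len(dbs[a] & dbs[b]) for a, b in combinations(sorted(dbs), 2)}


def occurrence_histogram(dbs: Dict[str, Set[str]]) -> Dict[int, int]:
    """Map k to the number of compounds present in exactly k databases."""
    per_compound = Counter(c for compounds in dbs.values() for c in compounds)
    return dict(Counter(per_compound.values()))


# Toy data: the labels reuse three database names, the contents are made up.
toy = {
    "NCI": {"c1", "c2", "c3", "c4"},
    "ACD": {"c3", "c4", "c5"},
    "WDI": {"c4", "c6"},
}

print(unique_to_each(toy))
print(pairwise_overlap(toy))
print(occurrence_histogram(toy))
```

    Exact set membership only captures identical compounds; the similarity overlap and diversity measures mentioned in the abstract need structure-based comparisons that this sketch does not attempt.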

  5. Improved AIOMFAC model parameterisation of the temperature dependence of activity coefficients for aqueous organic mixtures

    NASA Astrophysics Data System (ADS)

    Ganbavale, G.; Zuend, A.; Marcolli, C.; Peter, T.

    2015-01-01

    This study presents a new, improved parameterisation of the temperature dependence of activity coefficients in the AIOMFAC (Aerosol Inorganic-Organic Mixtures Functional groups Activity Coefficients) model applicable for aqueous as well as water-free organic solutions. For electrolyte-free organic and organic-water mixtures the AIOMFAC model uses a group-contribution approach based on UNIFAC (UNIversal quasi-chemical Functional-group Activity Coefficients). This group-contribution approach explicitly accounts for interactions among organic functional groups and between organic functional groups and water. The previous AIOMFAC version uses a simple parameterisation of the temperature dependence of activity coefficients, aimed to be applicable in the temperature range from ~ 275 to ~ 400 K. With the goal to improve the description of a wide variety of organic compounds found in atmospheric aerosols, we extend the AIOMFAC parameterisation for the functional groups carboxyl, hydroxyl, ketone, aldehyde, ether, ester, alkyl, aromatic carbon-alcohol, and aromatic hydrocarbon to atmospherically relevant low temperatures. To this end we introduce a new parameterisation for the temperature dependence. The improved temperature dependence parameterisation is derived from classical thermodynamic theory by describing effects from changes in molar enthalpy and heat capacity of a multi-component system. Thermodynamic equilibrium data of aqueous organic and water-free organic mixtures from the literature are carefully assessed and complemented with new measurements to establish a comprehensive database, covering a wide temperature range (~ 190 to ~ 440 K) for many of the functional group combinations considered. Different experimental data types and their processing for the estimation of AIOMFAC model parameters are discussed. The new AIOMFAC parameterisation for the temperature dependence of activity coefficients from low to high temperatures shows an overall improvement of 28% in

  6. External validation and comparison with other models of the International Metastatic Renal-Cell Carcinoma Database Consortium prognostic model: a population-based study

    PubMed Central

    Heng, Daniel Y C; Xie, Wanling; Regan, Meredith M; Harshman, Lauren C; Bjarnason, Georg A; Vaishampayan, Ulka N; Mackenzie, Mary; Wood, Lori; Donskov, Frede; Tan, Min-Han; Rha, Sun-Young; Agarwal, Neeraj; Kollmannsberger, Christian; Rini, Brian I; Choueiri, Toni K

    2014-01-01

    Summary Background: The International Metastatic Renal-Cell Carcinoma Database Consortium model offers prognostic information for patients with metastatic renal-cell carcinoma. We tested the accuracy of the model in an external population and compared it with other prognostic models. Methods: We included patients with metastatic renal-cell carcinoma who were treated with first-line VEGF-targeted treatment at 13 international cancer centres and who were registered in the Consortium’s database but had not contributed to the initial development of the Consortium Database model. The primary endpoint was overall survival. We compared the Database Consortium model with the Cleveland Clinic Foundation (CCF) model, the International Kidney Cancer Working Group (IKCWG) model, the French model, and the Memorial Sloan-Kettering Cancer Center (MSKCC) model by concordance indices and other measures of model fit. Findings: Overall, 1028 patients were included in this study, of whom 849 had complete data to assess the Database Consortium model. Median overall survival was 18·8 months (95% CI 17·6–21·4). The predefined Database Consortium risk factors (anaemia, thrombocytosis, neutrophilia, hypercalcaemia, Karnofsky performance status <80%, and <1 year from diagnosis to treatment) were independent predictors of poor overall survival in the external validation set (hazard ratios ranged between 1·27 and 2·08, concordance index 0·71, 95% CI 0·68–0·73). When patients were segregated into three risk categories, median overall survival was 43·2 months (95% CI 31·4–50·1) in the favourable risk group (no risk factors; 157 patients), 22·5 months (18·7–25·1) in the intermediate risk group (one to two risk factors; 440 patients), and 7·8 months (6·5–9·7) in the poor risk group (three or more risk factors; 252 patients; p<0·0001; concordance index 0·664, 95% CI 0·639–0·689). 672 patients had complete data to test all five models. The concordance index of the CCF

  7. Development of a database of organ doses for paediatric and young adult CT scans in the United Kingdom

    PubMed Central

    Kim, K. P.; Berrington de González, A.; Pearce, M. S.; Salotti, J. A.; Parker, L.; McHugh, K.; Craft, A. W.; Lee, C.

    2012-01-01

    Despite great potential benefits, there are concerns about the possible harm from medical imaging including the risk of radiation-related cancer. There are particular concerns about computed tomography (CT) scans in children because both radiation dose and sensitivity to radiation for children are typically higher than for adults undergoing equivalent procedures. As direct empirical data on the cancer risks from CT scans are lacking, the authors are conducting a retrospective cohort study of over 240 000 children in the UK who underwent CT scans. The main objective of the study is to quantify the magnitude of the cancer risk in relation to the radiation dose from CT scans. In this paper, the methods used to estimate typical organ-specific doses delivered by CT scans to children are described. An organ dose database from Monte Carlo radiation transport-based computer simulations using a series of computational human phantoms from newborn to adults for both male and female was established. Organ doses vary with patient size and sex, examination types and CT technical settings. Therefore, information on patient age, sex and examination type from electronic radiology information systems and technical settings obtained from two national surveys in the UK were used to estimate radiation dose. Absorbed doses to the brain, thyroid, breast and red bone marrow were calculated for reference male and female individuals with the ages of newborns, 1, 5, 10, 15 and 20 y for a total of 17 different scan types in the pre- and post-2001 time periods. In general, estimated organ doses were slightly higher for females than males which might be attributed to the smaller body size of the females. The younger children received higher doses in pre-2001 period when adult CT settings were typically used for children. Paediatric-specific adjustments were assumed to be used more frequently after 2001, since then radiation doses to children have often been smaller than those to adults. The

  8. Development of a database of organ doses for paediatric and young adult CT scans in the United Kingdom.

    PubMed

    Kim, K P; Berrington de González, A; Pearce, M S; Salotti, J A; Parker, L; McHugh, K; Craft, A W; Lee, C

    2012-07-01

    Despite great potential benefits, there are concerns about the possible harm from medical imaging including the risk of radiation-related cancer. There are particular concerns about computed tomography (CT) scans in children because both radiation dose and sensitivity to radiation for children are typically higher than for adults undergoing equivalent procedures. As direct empirical data on the cancer risks from CT scans are lacking, the authors are conducting a retrospective cohort study of over 240,000 children in the UK who underwent CT scans. The main objective of the study is to quantify the magnitude of the cancer risk in relation to the radiation dose from CT scans. In this paper, the methods used to estimate typical organ-specific doses delivered by CT scans to children are described. An organ dose database from Monte Carlo radiation transport-based computer simulations using a series of computational human phantoms from newborn to adults for both male and female was established. Organ doses vary with patient size and sex, examination types and CT technical settings. Therefore, information on patient age, sex and examination type from electronic radiology information systems and technical settings obtained from two national surveys in the UK were used to estimate radiation dose. Absorbed doses to the brain, thyroid, breast and red bone marrow were calculated for reference male and female individuals with the ages of newborns, 1, 5, 10, 15 and 20 y for a total of 17 different scan types in the pre- and post-2001 time periods. In general, estimated organ doses were slightly higher for females than males which might be attributed to the smaller body size of the females. The younger children received higher doses in pre-2001 period when adult CT settings were typically used for children. Paediatric-specific adjustments were assumed to be used more frequently after 2001, since then radiation doses to children have often been smaller than those to adults. The

  9. ChlamyCyc: an integrative systems biology database and web-portal for Chlamydomonas reinhardtii.

    PubMed

    May, Patrick; Christian, Jan-Ole; Kempa, Stefan; Walther, Dirk

    2009-05-04

    The unicellular green alga Chlamydomonas reinhardtii is an important eukaryotic model organism for the study of photosynthesis and plant growth. In the era of modern high-throughput technologies there is an imperative need to integrate large-scale data sets from high-throughput experimental techniques using computational methods and database resources to provide comprehensive information about the molecular and cellular organization of a single organism. In the framework of the German Systems Biology initiative GoFORSYS, a pathway database and web-portal for Chlamydomonas (ChlamyCyc) was established, which currently features about 250 metabolic pathways with associated genes, enzymes, and compound information. ChlamyCyc was assembled using an integrative approach combining the recently published genome sequence, bioinformatics methods, and experimental data from metabolomics and proteomics experiments. We analyzed and integrated a combination of primary and secondary database resources, such as existing genome annotations from JGI, EST collections, orthology information, and MapMan classification. ChlamyCyc provides a curated and integrated systems biology repository that will enable and assist in systematic studies of fundamental cellular processes in Chlamydomonas. The ChlamyCyc database and web-portal is freely available under http://chlamycyc.mpimp-golm.mpg.de.

  10. Biocuration at the Saccharomyces genome database.

    PubMed

    Skrzypek, Marek S; Nash, Robert S

    2015-08-01

    Saccharomyces Genome Database is an online resource dedicated to managing information about the biology and genetics of the model organism, yeast (Saccharomyces cerevisiae). This information is derived primarily from scientific publications through a process of human curation that involves manual extraction of data and their organization into a comprehensive system of knowledge. This system provides a foundation for further analysis of experimental data coming from research on yeast as well as other organisms. In this review we will demonstrate how biocuration and biocurators add a key component, the biological context, to our understanding of how genes, proteins, genomes and cells function and interact. We will explain the role biocurators play in sifting through the wealth of biological data to incorporate and connect key information. We will also discuss the many ways we assist researchers with their various research needs. We hope to convince the reader that manual curation is vital in converting the flood of data into organized and interconnected knowledge, and that biocurators play an essential role in the integration of scientific information into a coherent model of the cell. © 2015 Wiley Periodicals, Inc.

  11. Integrated Space Asset Management Database and Modeling

    NASA Technical Reports Server (NTRS)

    MacLeod, Todd; Gagliano, Larry; Percy, Thomas; Mason, Shane

    2015-01-01

    Effective Space Asset Management is one key to addressing the ever-growing issue of space congestion. It is imperative that agencies around the world have access to data regarding the numerous active assets and pieces of space junk currently tracked in orbit around the Earth. At the center of this issue is the effective management of data of many types related to orbiting objects. As the population of tracked objects grows, so too should the data management structure used to catalog technical specifications, orbital information, and metadata related to those populations. Marshall Space Flight Center's Space Asset Management Database (SAM-D) was implemented in order to effectively catalog a broad set of data related to known objects in space by ingesting information from a variety of databases and processing that data into useful technical information. Using the universal NORAD number as a unique identifier, the SAM-D processes two-line element data into orbital characteristics and cross-references this technical data with metadata related to functional status, country of ownership, and application category. The SAM-D began as an Excel spreadsheet and was later upgraded to an Access database. While SAM-D performs its task very well, it is limited by its current platform and is not available outside of the local user base. Further, while modeling and simulation can be powerful tools to exploit the information contained in SAM-D, the current system does not allow proper integration options for combining the data with both legacy and new M&S tools. This paper provides a summary of SAM-D development efforts to date and outlines a proposed data management infrastructure that extends SAM-D to support the larger data sets to be generated. A service-oriented architecture model using an information sharing platform named SIMON will allow it to easily expand to incorporate new capabilities, including advanced analytics, M&S tools, fusion techniques and user interface for

  12. Data-based mechanistic modeling of dissolved organic carbon load through storms using continuous 15-minute resolution observations within UK upland watersheds

    NASA Astrophysics Data System (ADS)

    Jones, T.; Chappell, N. A.

    2013-12-01

    Few watershed modeling studies have addressed DOC dynamics through storm hydrographs (notable exceptions include Boyer et al., 1997 Hydrol Process; Jutras et al., 2011 Ecol Model; Xu et al., 2012 Water Resour Res). In part this has been a consequence of an incomplete understanding of the biogeochemical processes leading to DOC export to streams (Neff & Asner, 2001, Ecosystems) & an insufficient frequency of DOC monitoring to capture sometimes complex time-varying relationships between DOC & storm hydrographs (Kirchner et al., 2004, Hydrol Process). We present the results of a new & ongoing UK study that integrates two components - 1/ New observations of DOC concentrations (& derived load) continuously monitored at 15 minute intervals through multiple seasons for replicated watersheds; & 2/ A dynamic modeling technique that is able to quantify storage-decay effects, plus hysteretic, nonlinear, lagged & non-stationary relationships between DOC & controlling variables (including rainfall, streamflow, temperature & specific biogeochemical variables e.g., pH, nitrate). DOC concentration is being monitored continuously using the latest generation of UV spectrophotometers (i.e. S::CAN spectro::lysers) with in situ calibrations to laboratory analyzed DOC. The controlling variables are recorded simultaneously at the same stream stations. The watersheds selected for study are among the most intensively studied basins in the UK uplands, namely the Plynlimon & Llyn Brianne experimental basins. All contain areas of organic soils, with three having improved grasslands & three conifer afforested. The dynamic response characteristics (DRCs) that describe detailed DOC behaviour through sequences of storms are simulated using the latest identification routines for continuous time transfer function (CT-TF) models within the Matlab-based CAPTAIN toolbox (some incorporating nonlinear components). To our knowledge this is the first application of CT-TFs to modelling DOC processes

  13. Kidney transplantation after previous liver transplantation: analysis of the organ procurement transplant network database.

    PubMed

    Gonwa, Thomas A; McBride, Maureen A; Mai, Martin L; Wadei, Hani M

    2011-07-15

    Patients after liver transplant have a high incidence of chronic kidney disease and end-stage renal disease (ESRD). We investigated kidney transplantation after liver transplantation using the Organ Procurement Transplant Network database. The Organ Procurement Transplant Network database was queried for patients who received kidney transplantation after previous liver transplantation. These patients were compared with patients who received primary kidney transplantation alone during the same time period. Between 1997 and 2008, 157,086 primary kidney transplants were performed. Of these, 680 deceased donor kidney transplants and 410 living donor kidney transplants were performed in previous recipients of liver transplants. The number of kidney after liver transplants performed each year has increased from 37 per year to 124 per year in 2008. The time from liver transplant to kidney transplant increased from 8.2 to 9.0 years for living donor transplants and from 5.4 to 9.6 years for deceased donor. The 1, 3, and 5 year actuarial graft survival in both living donor kidney after liver transplant and deceased donor kidney after liver transplant are less than the kidney transplant alone patients. However, the death-censored graft survivals are equal. The patient survival is also less but is similar to what would be expected in liver transplant recipients who did not have ESRD. In 2008, kidney after liver transplantation represented 0.9% of the total kidney alone transplants performed in the United States. Kidney transplantation is an appropriate therapy for selected patients who develop ESRD after liver transplantation.

  14. Publication Trends in Model Organism Research

    PubMed Central

    Dietrich, Michael R.; Ankeny, Rachel A.; Chen, Patrick M.

    2014-01-01

    In 1990, the National Institutes of Health (NIH) gave some organisms special status as designated model organisms. This article documents publication trends for these NIH-designated model organisms over the past 40 years. We find that being designated a model organism by the NIH does not guarantee an increasing publication trend. An analysis of model and nonmodel organisms included in GENETICS since 1960 does reveal a sharp decline in the number of publications using nonmodel organisms yet no decline in the overall species diversity. We suggest that organisms with successful publication records tend to share critical characteristics, such as being well developed as standardized, experimental systems and being used by well-organized communities with good networks of exchange and methods of communication. PMID:25381363

  15. TREATABILITY DATABASE DESCRIPTION

    EPA Science Inventory

    The Drinking Water Treatability Database (TDB) presents referenced information on the control of contaminants in drinking water. It allows drinking water utilities, first responders to spills or emergencies, treatment process designers, research organizations, academics, regulato...

  16. Large image microscope array for the compilation of multimodality whole organ image databases.

    PubMed

    Namati, Eman; De Ryk, Jessica; Thiesse, Jacqueline; Towfic, Zaid; Hoffman, Eric; Mclennan, Geoffrey

    2007-11-01

    Three-dimensional, structural and functional digital image databases have many applications in education, research, and clinical medicine. However, to date, apart from cryosectioning, there have been no reliable means to obtain whole-organ, spatially conserving histology. Our aim was to generate a system capable of acquiring high-resolution images, featuring microscopic detail that could still be spatially correlated to the whole organ. To fulfill these objectives required the construction of a system physically capable of creating very fine whole-organ sections and collecting high-magnification and resolution digital images. We therefore designed a large image microscope array (LIMA) to serially section and image entire unembedded organs while maintaining the structural integrity of the tissue. The LIMA consists of several integrated components: a novel large-blade vibrating microtome, a 1.3 megapixel peltier cooled charge-coupled device camera, a high-magnification microscope, and a three axis gantry above the microtome. A custom control program was developed to automate the entire sectioning and automated raster-scan imaging sequence. The system is capable of sectioning unembedded soft tissue down to a thickness of 40 µm at specimen dimensions of 200 x 300 mm to a total depth of 350 mm. The LIMA system has been tested on fixed lung from sheep and mice, resulting in large high-quality image data sets, with minimal distinguishable disturbance in the delicate alveolar structures. Copyright 2007 Wiley-Liss, Inc.

  17. Using FlyBase, a Database of Drosophila Genes & Genomes

    PubMed Central

    Marygold, Steven J.; Crosby, Madeline A.; Goodman, Joshua L.

    2016-01-01

    SUMMARY For nearly 25 years, FlyBase (flybase.org) has provided a freely available online database of biological information about Drosophila species, focusing on the model organism D. melanogaster. The need for a centralized, integrated view of Drosophila research has never been greater as advances in genomic, proteomic and high-throughput technologies add to the quantity and diversity of available data and resources. FlyBase has taken several approaches to respond to these changes in the research landscape. Novel report pages have been generated for new reagent types and physical interaction data; Drosophila models of human disease are now represented and showcased in dedicated Human Disease Model Reports; other integrated reports have been established that bring together related genes, datasets or reagents; Gene Reports have been revised to improve access to new data types and to highlight functional data; links to external sites have been organized and expanded; and new tools have been developed to display and interrogate all these data, including improved batch processing and bulk file availability. In addition, several new community initiatives have served to enhance interactions between researchers and FlyBase, resulting in direct user contributions and improved feedback. This chapter provides an overview of the data content, organization and available tools within FlyBase, focusing on recent improvements. We hope it serves as a guide for our diverse user base, enabling efficient and effective exploration of the database and thereby accelerating research discoveries. PMID:27730573

  18. FOAM (Functional Ontology Assignments for Metagenomes): A Hidden Markov Model (HMM) database with environmental focus

    DOE PAGES

    Prestat, Emmanuel; David, Maude M.; Hultman, Jenni; ...

    2014-09-26

    A new functional gene database, FOAM (Functional Ontology Assignments for Metagenomes), was developed to screen environmental metagenomic sequence datasets. FOAM provides a new functional ontology dedicated to classify gene functions relevant to environmental microorganisms based on Hidden Markov Models (HMMs). Sets of aligned protein sequences (i.e. ‘profiles’) were tailored to a large group of target KEGG Orthologs (KOs) from which HMMs were trained. The alignments were checked and curated to make them specific to the targeted KO. Within this process, sequence profiles were enriched with the most abundant sequences available to maximize the yield of accurate classifier models. An associated functional ontology was built to describe the functional groups and hierarchy. FOAM allows the user to select the target search space before HMM-based comparison steps and to easily organize the results into different functional categories and subcategories. FOAM is publicly available at http://portal.nersc.gov/project/m1317/FOAM/.

  19. Team X Spacecraft Instrument Database Consolidation

    NASA Technical Reports Server (NTRS)

    Wallenstein, Kelly A.

    2005-01-01

    In the past decade, many changes have been made to Team X's process of designing each spacecraft, with the purpose of making the overall procedure more efficient over time. One such improvement is the use of information databases from previous missions, designs, and research. By referring to these databases, members of the design team can locate relevant instrument data and significantly reduce the total time they spend on each design. The files in these databases were stored in several different formats with various levels of accuracy. During the past 2 months, efforts have been made in an attempt to combine and organize these files. The main focus was in the Instruments department, where spacecraft subsystems are designed based on mission measurement requirements. A common database was developed for all instrument parameters using Microsoft Excel to minimize the time and confusion experienced when searching through files stored in several different formats and locations. By making this collection of information more organized, the files within them have become more easily searchable. Additionally, the new Excel database offers the option of importing its contents into a more efficient database management system in the future. This potential for expansion enables the database to grow and acquire more search features as needed.

  20. Biological Databases for Behavioral Neurobiology

    PubMed Central

    Baker, Erich J.

    2014-01-01

    Databases are, at their core, abstractions of data and their intentionally derived relationships. They serve as a central organizing metaphor and repository, supporting or augmenting nearly all bioinformatics. Behavioral domains provide a unique stage for contemporary databases, as research in this area spans diverse data types, locations, and data relationships. This chapter provides foundational information on the diversity and prevalence of databases and on how data structures support the various needs of behavioral neuroscience analysis and interpretation. The focus is on the classes of databases, data curation, and advanced applications in bioinformatics using examples largely drawn from research efforts in behavioral neuroscience. PMID:23195119

  1. Analysis and fit of stellar spectra using a mega-database of CMFGEN models

    NASA Astrophysics Data System (ADS)

    Fierro-Santillán, Celia; Zsargó, Janos; Klapp, Jaime; Díaz-Azuara, Santiago Alfredo; Arrieta, Anabel; Arias, Lorena

    2017-11-01

    We present a tool for analysis and fit of stellar spectra using a mega database of 15,000 atmosphere models for OB stars. We have developed software tools that allow us to find the model that best fits an observed spectrum by comparing equivalent widths and line ratios in the observed spectrum with all models of the database. We use the Hα, Hβ, Hγ, and Hδ lines as criteria for stellar gravity and the ratios He II λ4541/He I λ4471, He II λ4200/(He I+He II λ4026), He II λ4541/He I λ4387, and He II λ4200/He I λ4144 as criteria for T_eff.
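
    The comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' software: the three-model grid, the equivalent-width values and the distance measure (a root sum of squared relative differences) are all assumptions chosen for illustration. Each model is reduced to one gravity proxy (the Hγ equivalent width) and one temperature proxy (the He II λ4541/He I λ4471 ratio), and the model closest to the observation is returned.

```python
import math

# Hypothetical miniature grid: equivalent widths (in angstroms) for a few lines.
# The values are illustrative and are not taken from any CMFGEN model.
MODELS = {
    "model_A": {"Hgamma": 2.1, "HeII_4541": 0.30, "HeI_4471": 0.90},
    "model_B": {"Hgamma": 2.6, "HeII_4541": 0.60, "HeI_4471": 0.60},
    "model_C": {"Hgamma": 3.0, "HeII_4541": 0.90, "HeI_4471": 0.30},
}


def features(ew):
    """Return the compared quantities: a gravity proxy and a T_eff proxy."""
    return (ew["Hgamma"], ew["HeII_4541"] / ew["HeI_4471"])


def best_fit(observed, models=MODELS):
    """Return the name of the model whose features lie closest to the observation."""
    obs = features(observed)

    def distance(name):
        mod = features(models[name])
        # Relative differences keep features on different scales comparable.
        return math.sqrt(sum(((m - o) / o) ** 2 for m, o in zip(mod, obs)))

    return min(models, key=distance)


observed = {"Hgamma": 2.5, "HeII_4541": 0.58, "HeI_4471": 0.62}
print(best_fit(observed))
```

    With a grid of thousands of models the same exhaustive comparison still applies; only the number of candidates passed to min() changes.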

  2. Software for pest-management science: computer models and databases from the United States Department of Agriculture-Agricultural Research Service.

    PubMed

    Wauchope, R Don; Ahuja, Lajpat R; Arnold, Jeffrey G; Bingner, Ron; Lowrance, Richard; van Genuchten, Martinus T; Adams, Larry D

    2003-01-01

    We present an overview of USDA Agricultural Research Service (ARS) computer models and databases related to pest-management science, emphasizing current developments in environmental risk assessment and management simulation models. The ARS has a unique national interdisciplinary team of researchers in surface and sub-surface hydrology, soil and plant science, systems analysis and pesticide science, who have networked to develop empirical and mechanistic computer models describing the behavior of pests, pest responses to controls and the environmental impact of pest-control methods. Historically, much of this work has been in support of production agriculture and in support of the conservation programs of our 'action agency' sister, the Natural Resources Conservation Service (formerly the Soil Conservation Service). Because we are a public agency, our software/database products are generally offered without cost, unless they are developed in cooperation with a private-sector cooperator. Because ARS is a basic and applied research organization, with development of new science as our highest priority, these products tend to be offered on an 'as-is' basis with limited user support except for cooperating R&D relationship with other scientists. However, rapid changes in the technology for information analysis and communication continually challenge our way of doing business.

  3. The MAR databases: development and implementation of databases specific for marine metagenomics

    PubMed Central

    Klemetsen, Terje; Raknes, Inge A; Fu, Juan; Agafonov, Alexander; Balasundaram, Sudhagar V; Tartari, Giacomo; Robertsen, Espen

    2018-01-01

    We introduce the marine databases MarRef, MarDB and MarCat (https://mmp.sfb.uit.no/databases/), which are publicly available resources that promote marine research and innovation. These data resources, which have been implemented in the Marine Metagenomics Portal (MMP) (https://mmp.sfb.uit.no/), are collections of richly annotated and manually curated contextual (metadata) and sequence databases representing three tiers of accuracy. While MarRef is a database for completely sequenced marine prokaryotic genomes, which represent a marine prokaryote reference genome database, MarDB includes all incompletely sequenced prokaryotic genomes regardless of level of completeness. The last database, MarCat, represents a gene (protein) catalog of uncultivable (and cultivable) marine genes and proteins derived from marine metagenomics samples. The first versions of MarRef and MarDB contain 612 and 3726 records, respectively. Each record is built up of 106 metadata fields including attributes for sampling, sequencing, assembly and annotation in addition to the organism and taxonomic information. Currently, MarCat contains 1227 records with 55 metadata fields. Ontologies and controlled vocabularies are used in the contextual databases to enhance consistency. The user-friendly web interface lets the visitors browse, filter and search in the contextual databases and perform BLAST searches against the corresponding sequence databases. All contextual and sequence databases are freely accessible and downloadable from https://s1.sfb.uit.no/public/mar/. PMID:29106641

  4. TogoTable: cross-database annotation system using the Resource Description Framework (RDF) data model.

    PubMed

    Kawano, Shin; Watanabe, Tsutomu; Mizuguchi, Sohei; Araki, Norie; Katayama, Toshiaki; Yamaguchi, Atsuko

    2014-07-01

    TogoTable (http://togotable.dbcls.jp/) is a web tool that adds user-specified annotations to a table that a user uploads. Annotations are drawn from several biological databases that use the Resource Description Framework (RDF) data model. TogoTable uses database identifiers (IDs) in the table as a query key for searching. RDF data, which form a network called Linked Open Data (LOD), can be searched from SPARQL endpoints using the SPARQL query language. Because TogoTable uses RDF, it can integrate annotations from not only the reference database to which the IDs originally belong, but also externally linked databases via the LOD network. For example, annotations in the Protein Data Bank can be retrieved using GeneID through links provided by the UniProt RDF. Because RDF has been standardized by the World Wide Web Consortium, any database with annotations based on the RDF data model can be easily incorporated into this tool. We believe that TogoTable is a valuable Web tool, particularly for experimental biologists who need to process huge amounts of data such as high-throughput experimental output. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
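
    As a rough illustration of using table IDs as a query key, the sketch below reads an ID column from an uploaded table and builds one SPARQL query covering all of the IDs. It is not TogoTable's implementation: the table contents, the column name gene_id and the http://example.org/gene/ namespace are hypothetical, and rdfs:seeAlso merely stands in for whatever linking predicate a given RDF dataset actually uses. No endpoint is contacted; the code only constructs the query text.

```python
import csv
import io

# Hypothetical uploaded table; the "gene_id" column holds the database IDs.
TABLE = "gene_id,sample\n7157,tumour\n672,control\n7157,control\n"


def ids_from_table(text, column):
    """Collect the distinct IDs in one column, preserving first-seen order."""
    seen = []
    for row in csv.DictReader(io.StringIO(text)):
        if row[column] not in seen:
            seen.append(row[column])
    return seen


def build_query(ids, base_iri="http://example.org/gene/"):
    """Build a SPARQL SELECT that looks up links for every ID in one request.

    base_iri is a placeholder namespace, not a real dataset's IRI scheme.
    """
    values = " ".join(f"<{base_iri}{i}>" for i in ids)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?id ?linked WHERE {\n"
        f"  VALUES ?id {{ {values} }}\n"
        "  ?id rdfs:seeAlso ?linked .\n"
        "}"
    )


ids = ids_from_table(TABLE, "gene_id")
query = build_query(ids)
print(query)
```

    Batching the IDs in a single VALUES clause is one possible design; sending one query per row would also work but costs a round trip for each ID.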

  5. Producing a Climate-Quality Database of Global Upper Ocean Profile Temperatures - The IQuOD (International Quality-controlled Ocean Database) Project.

    NASA Astrophysics Data System (ADS)

    Sprintall, J.; Cowley, R.; Palmer, M. D.; Domingues, C. M.; Suzuki, T.; Ishii, M.; Boyer, T.; Goni, G. J.; Gouretski, V. V.; Macdonald, A. M.; Thresher, A.; Good, S. A.; Diggs, S. C.

    2016-02-01

    Historical ocean temperature profile observations provide a critical element for a host of ocean and climate research activities. These include providing initial conditions for seasonal-to-decadal prediction systems, evaluating past variations in sea level and Earth's energy imbalance, ocean state estimation for studying variability and change, and climate model evaluation and development. The International Quality controlled Ocean Database (IQuOD) initiative represents a community effort to create the most globally complete temperature profile dataset, with (intelligent) metadata and assigned uncertainties. With an internationally coordinated effort organized by oceanographers, with data and ocean instrumentation expertise, and in close consultation with end users (e.g., climate modelers), the IQuOD initiative will assess and maximize the potential of an irreplaceable collection of ocean temperature observations (tens of millions of profiles collected at a cost of tens of billions of dollars, since 1772) to fulfil the demand for a climate-quality global database that can be used with greater confidence in a vast range of climate change related research and services of societal benefit. Progress towards version 1 of the IQuOD database, ongoing and future work will be presented. More information on IQuOD is available at www.iquod.org.

  6. A Taxonomic Search Engine: Federating taxonomic databases using web services

    PubMed Central

    Page, Roderic DM

    2005-01-01

    Background The taxonomic name of an organism is a key link between different databases that store information on that organism. However, in the absence of a single, comprehensive database of organism names, individual databases lack an easy means of checking the correctness of a name. Furthermore, the same organism may have more than one name, and the same name may apply to more than one organism. Results The Taxonomic Search Engine (TSE) is a web application written in PHP that queries multiple taxonomic databases (ITIS, Index Fungorum, IPNI, NCBI, and uBIO) and summarises the results in a consistent format. It supports "drill-down" queries to retrieve a specific record. The TSE can optionally suggest alternative spellings the user can try. It also acts as a Life Science Identifier (LSID) authority for the source taxonomic databases, providing globally unique identifiers (and associated metadata) for each name. Conclusion The Taxonomic Search Engine is available at and provides a simple demonstration of the potential of the federated approach to providing access to taxonomic names. PMID:15757517
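
    The federated approach summarised above can be sketched as follows. The TSE itself is written in PHP and queries live services; this Python sketch is only an analogy under stated assumptions: source_a and source_b are hypothetical in-memory stand-ins for remote databases, and NameRecord is an invented common format. Each source answers in its own shape, and a per-source adapter converts the answers so that results can be listed consistently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameRecord:
    """Common result format (hypothetical) shared by all sources."""
    source: str
    name: str
    identifier: str


# Stand-ins for remote services: each returns results in its own native shape.
def source_a(query):
    data = {"Homo sapiens": "A:1"}
    return [{"sciname": n, "id": i} for n, i in data.items() if n.lower() == query.lower()]


def source_b(query):
    data = [("B-77", "Homo sapiens"), ("B-78", "Homo erectus")]
    return [(i, n) for i, n in data if n.lower() == query.lower()]


# One adapter per source converts its native shape into a NameRecord.
ADAPTERS = {
    "source_a": (source_a, lambda r: NameRecord("source_a", r["sciname"], r["id"])),
    "source_b": (source_b, lambda r: NameRecord("source_b", r[1], r[0])),
}


def federated_search(query):
    """Query every source and return the hits in the common format."""
    results = []
    for search, adapt in ADAPTERS.values():
        try:
            results.extend([adapt(r) for r in search(query)])
        except Exception:
            # A failing source should not prevent the others from answering.
            continue
    return results


for record in federated_search("homo sapiens"):
    print(record)
```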

  7. THE CTEPP DATABASE

    EPA Science Inventory

    The CTEPP (Children's Total Exposure to Persistent Pesticides and Other Persistent Organic Pollutants) database contains a wealth of data on children's aggregate exposures to pollutants in their everyday surroundings. Chemical analysis data for the environmental media and ques...

  8. RPG: the Ribosomal Protein Gene database.

    PubMed

    Nakao, Akihiro; Yoshihama, Maki; Kenmochi, Naoya

    2004-01-01

    RPG (http://ribosome.miyazaki-med.ac.jp/) is a new database that provides detailed information about ribosomal protein (RP) genes. It contains data from humans and other organisms, including Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Methanococcus jannaschii and Escherichia coli. Users can search the database by gene name and organism. Each record includes sequences (genomic, cDNA and amino acid sequences), intron/exon structures, genomic locations and information about orthologs. In addition, users can view and compare the gene structures of the above organisms and make multiple amino acid sequence alignments. RPG also provides information on small nucleolar RNAs (snoRNAs) that are encoded in the introns of RP genes.

  9. PlantTFDB: a comprehensive plant transcription factor database

    PubMed Central

    Guo, An-Yuan; Chen, Xin; Gao, Ge; Zhang, He; Zhu, Qi-Hui; Liu, Xiao-Chuan; Zhong, Ying-Fu; Gu, Xiaocheng; He, Kun; Luo, Jingchu

    2008-01-01

    Transcription factors (TFs) play key roles in controlling gene expression. Systematic identification and annotation of TFs, followed by construction of TF databases may serve as useful resources for studying the function and evolution of transcription factors. We developed a comprehensive plant transcription factor database PlantTFDB (http://planttfdb.cbi.pku.edu.cn), which contains 26 402 TFs predicted from 22 species, including five model organisms with available whole genome sequence and 17 plants with available EST sequences. To provide comprehensive information for those putative TFs, we made extensive annotation at both family and gene levels. A brief introduction and key references were presented for each family. Functional domain information and cross-references to various well-known public databases were available for each identified TF. In addition, we predicted putative orthologs of those TFs among the 22 species. PlantTFDB has a simple interface to allow users to search the database by IDs or free texts, to make sequence similarity search against TFs of all or individual species, and to download TF sequences for local analysis. PMID:17933783

  10. The ChArMEx database

    NASA Astrophysics Data System (ADS)

    Ferré, Hélène; Belmahfoud, Nizar; Boichard, Jean-Luc; Brissebrat, Guillaume; Cloché, Sophie; Descloitres, Jacques; Fleury, Laurence; Focsa, Loredana; Henriot, Nicolas; Mière, Arnaud; Ramage, Karim; Vermeulen, Anne; Boulanger, Damien

    2015-04-01

    The Chemistry-Aerosol Mediterranean Experiment (ChArMEx, http://charmex.lsce.ipsl.fr/) aims at a scientific assessment of the present and future state of the atmospheric environment in the Mediterranean Basin, and of its impacts on the regional climate, air quality, and marine biogeochemistry. The project includes long term monitoring of environmental parameters, intensive field campaigns, use of satellite data and modelling studies. Therefore ChArMEx scientists produce and need to access a wide diversity of data. In this context, the objective of the database task is to organize data management, distribution system and services, such as facilitating the exchange of information and stimulating the collaboration between researchers within the ChArMEx community, and beyond. The database relies on a strong collaboration between ICARE, IPSL and OMP data centers and has been set up in the framework of the Mediterranean Integrated Studies at Regional And Local Scales (MISTRALS) program data portal. ChArMEx data, either produced or used by the project, are documented and accessible through the database website: http://mistrals.sedoo.fr/ChArMEx. The website offers the usual but user-friendly functionalities: data catalog, user registration procedure, search tool to select and access data... The metadata (data description) are standardized, and comply with international standards (ISO 19115-19139; INSPIRE European Directive; Global Change Master Directory Thesaurus). A Digital Object Identifier (DOI) assignment procedure automatically registers the datasets, in order to make them easier to access, cite, reuse and verify. At present, the ChArMEx database contains about 120 datasets, including more than 80 in situ datasets (2012, 2013 and 2014 summer campaigns, background monitoring station of Ersa...), 25 model output sets (dust model intercomparison, MEDCORDEX scenarios...), a high resolution emission inventory over the Mediterranean... Many in situ datasets

  11. Emissions databases for polycyclic aromatic compounds in the Canadian Athabasca oil sands region - development using current knowledge and evaluation with passive sampling and air dispersion modelling data

    NASA Astrophysics Data System (ADS)

    Qiu, Xin; Cheng, Irene; Yang, Fuquan; Horb, Erin; Zhang, Leiming; Harner, Tom

    2018-03-01

    Two speciated and spatially resolved emissions databases for polycyclic aromatic compounds (PACs) in the Athabasca oil sands region (AOSR) were developed. The first database was derived from volatile organic compound (VOC) emissions data provided by the Cumulative Environmental Management Association (CEMA) and the second database was derived from additional data collected within the Joint Canada-Alberta Oil Sands Monitoring (JOSM) program. CALPUFF modelling results for atmospheric polycyclic aromatic hydrocarbons (PAHs), alkylated PAHs, and dibenzothiophenes (DBTs), obtained using each of the emissions databases, are presented and compared with measurements from a passive air monitoring network. The JOSM-derived emissions resulted in better model-measurement agreement in the total PAH concentrations and for most PAH species concentrations compared to results using CEMA-derived emissions. At local sites near oil sands mines, the percent error of the model compared to observations decreased from 30 % using the CEMA-derived emissions to 17 % using the JOSM-derived emissions. The improvement at local sites was likely attributed to the inclusion of updated tailings pond emissions estimated from JOSM activities. In either the CEMA-derived or JOSM-derived emissions scenario, the model underestimated PAH concentrations by a factor of 3 at remote locations. Potential reasons for the disagreement include forest fire emissions, re-emissions of previously deposited PAHs, and long-range transport not considered in the model. Alkylated PAH and DBT concentrations were also significantly underestimated. The CALPUFF model is expected to predict higher concentrations because of the limited chemistry and deposition modelling. Thus the model underestimation of PACs is likely due to gaps in the emissions database for these compounds and uncertainties in the methodology for estimating the emissions. Future work is required that focuses on improving the PAC emissions estimation and

  12. BDVC (Bimodal Database of Violent Content): A database of violent audio and video

    NASA Astrophysics Data System (ADS)

    Rivera Martínez, Jose Luis; Mijes Cruz, Mario Humberto; Rodríguez Vázqu, Manuel Antonio; Rodríguez Espejo, Luis; Montoya Obeso, Abraham; García Vázquez, Mireya Saraí; Ramírez Acosta, Alejandro Álvaro

    2017-09-01

    Nowadays there is a trend towards the use of unimodal databases for multimedia content description, organization and retrieval applications that handle a single type of content such as text, voice or images. Bimodal databases instead allow two different types of content, such as audio-video or image-text, to be associated semantically. The generation of a bimodal database of audio-video implies the creation of a connection between the multimedia content through the semantic relation that associates the actions of both types of information. This paper describes in detail the characteristics and methodology used to create the bimodal database of violent content; the semantic relationship is established by the proposed concepts that describe the audiovisual information. The use of bimodal databases in applications related to audiovisual content processing allows an increase in semantic performance if and only if these applications process both types of content. This bimodal database contains 580 annotated audiovisual segments, with a duration of 28 minutes, divided into 41 classes. Bimodal databases are a tool in the generation of applications for the semantic web.

  13. Reflective Database Access Control

    ERIC Educational Resources Information Center

    Olson, Lars E.

    2009-01-01

    "Reflective Database Access Control" (RDBAC) is a model in which a database privilege is expressed as a database query itself, rather than as a static privilege contained in an access control list. RDBAC aids the management of database access controls by improving the expressiveness of policies. However, such policies introduce new interactions…

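    The idea of expressing a privilege as a query, as described in the record above, can be illustrated with a small sketch using Python's built-in sqlite3 module. This is not the formal RDBAC model: the employees table, the POLICY query and the visible_salaries helper are hypothetical. The point is only that the set of readable rows is computed from the current database contents instead of being listed in a static access control list, so a change in the data changes the privilege.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT PRIMARY KEY, manager TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('alice', NULL, 120), ('bob', 'alice', 90),
        ('carol', 'alice', 95), ('dave', 'bob', 70);
""")

# The policy is itself a query: a user may read a salary row if the row is
# their own or belongs to someone they directly manage (hypothetical rule).
POLICY = "SELECT name FROM employees WHERE name = :user OR manager = :user"


def visible_salaries(user):
    """Return the (name, salary) rows the policy query grants to this user."""
    sql = f"SELECT name, salary FROM employees WHERE name IN ({POLICY}) ORDER BY name"
    return conn.execute(sql, {"user": user}).fetchall()


print(visible_salaries("bob"))
# Reassigning dave to another manager changes bob's privilege without
# touching any access control list.
conn.execute("UPDATE employees SET manager = 'carol' WHERE name = 'dave'")
print(visible_salaries("bob"))
```
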
  14. Space Object Radiometric Modeling for Hardbody Optical Signature Database Generation

    DTIC Science & Technology

    2009-09-01

    Introduction This presentation summarizes recent activity in monitoring spacecraft health status using passive remote optical nonimaging ...Approved for public release; distribution is unlimited. Space Object Radiometric Modeling for Hardbody Optical Signature Database Generation...It is beneficial to the observer/analyst to understand the fundamental optical signature variability associated with these detection and

  15. REDIdb: the RNA editing database.

    PubMed

    Picardi, Ernesto; Regina, Teresa Maria Rosaria; Brennicke, Axel; Quagliariello, Carla

    2007-01-01

    The RNA Editing Database (REDIdb) is an interactive, web-based database created and designed with the aim to allocate RNA editing events such as substitutions, insertions and deletions occurring in a wide range of organisms. The database contains both fully and partially sequenced DNA molecules for which editing information is available either by experimental inspection (in vitro) or by computational detection (in silico). Each record of REDIdb is organized in a specific flat-file containing a description of the main characteristics of the entry, a feature table with the editing events and related details and a sequence zone with both the genomic sequence and the corresponding edited transcript. REDIdb is a relational database in which the browsing and identification of editing sites has been simplified by means of two facilities to either graphically display genomic or cDNA sequences or to show the corresponding alignment. In both cases, all editing sites are highlighted in colour and their relative positions are detailed by mousing over. New editing positions can be directly submitted to REDIdb after a user-specific registration to obtain authorized secure access. This first version of REDIdb database stores 9964 editing events and can be freely queried at http://biologia.unical.it/py_script/search.html.

  16. Model Organisms and Traditional Chinese Medicine Syndrome Models

    PubMed Central

    Xu, Jin-Wen

    2013-01-01

    Traditional Chinese medicine (TCM) is an ancient medical system with a unique cultural background. Nowadays, more and more Western countries are accepting it because of its therapeutic efficacy. However, the safety and pharmacological mechanisms of action of TCM remain uncertain. Due to the potential application of TCM in healthcare, it is necessary to construct a scientific evaluation system with TCM characteristics and benchmark the difference from the standard of Western medicine. Model organisms have played an important role in the understanding of basic biological processes. They are easier to study in certain respects and can yield information about other species. Despite the controversy over suitable syndrome animal models under TCM theoretical guidance, it is unquestionable that many model organisms should be used in the studies of TCM modernization, which will bring modern scientific standards into mysterious ancient Chinese medicine. In this review, we aim to summarize the utilization of model organisms in the construction of TCM syndrome models and highlight the relevance of modern medicine to TCM syndrome animal models. It will serve as the foundation for further research on model organisms and for their application in TCM syndrome models. PMID:24381636

  17. The ChArMEx database

    NASA Astrophysics Data System (ADS)

    Ferré, Helene; Belmahfoud, Nizar; Boichard, Jean-Luc; Brissebrat, Guillaume; Descloitres, Jacques; Fleury, Laurence; Focsa, Loredana; Henriot, Nicolas; Mastrorillo, Laurence; Mière, Arnaud; Vermeulen, Anne

    2014-05-01

    The Chemistry-Aerosol Mediterranean Experiment (ChArMEx, http://charmex.lsce.ipsl.fr/) aims at a scientific assessment of the present and future state of the atmospheric environment in the Mediterranean Basin, and of its impacts on the regional climate, air quality, and marine biogeochemistry. The project includes long term monitoring of environmental parameters, intensive field campaigns, use of satellite data and modelling studies. Therefore ChArMEx scientists produce and need to access a wide diversity of data. In this context, the objective of the database task is to organize data management, distribution system and services, such as facilitating the exchange of information and stimulating the collaboration between researchers within the ChArMEx community, and beyond. The database relies on a strong collaboration between OMP and ICARE data centres and has been set up in the framework of the Mediterranean Integrated Studies at Regional And Local Scales (MISTRALS) program data portal. All the data produced by or of interest for the ChArMEx community will be documented in the data catalogue and accessible through the database website: http://mistrals.sedoo.fr/ChArMEx. At present, the ChArMEx database contains about 75 datasets, including 50 in situ datasets (2012 and 2013 campaigns, Ersa background monitoring station), 25 model outputs (dust model intercomparison, MEDCORDEX scenarios), and a high resolution emission inventory over the Mediterranean. Many in situ datasets have been inserted in a relational database, in order to enable more accurate data selection and download of different datasets in a shared format. The database website offers different tools: - A registration procedure which enables any scientist to accept the data policy and apply for a user database account. - A data catalogue that complies with metadata international standards (ISO 19115-19139; INSPIRE European Directive; Global Change Master Directory Thesaurus).
- Metadata forms to document

  18. GIS-based hydrogeological databases and groundwater modelling

    NASA Astrophysics Data System (ADS)

    Gogu, Radu Constantin; Carabin, Guy; Hallet, Vincent; Peters, Valerie; Dassargues, Alain

    2001-12-01

    Reliability and validity of groundwater analysis strongly depend on the availability of large volumes of high-quality data. Putting all data into a coherent and logical structure supported by a computing environment helps ensure validity and availability and provides a powerful tool for hydrogeological studies. A hydrogeological geographic information system (GIS) database that offers facilities for groundwater-vulnerability analysis and hydrogeological modelling has been designed in Belgium for the Walloon region. Data from five river basins, chosen for their contrasting hydrogeological characteristics, have been included in the database, and a set of applications that have been developed now allow further advances. Interest is growing in the potential for integrating GIS technology and groundwater simulation models. A "loose-coupling" tool was created between the spatial-database scheme and the groundwater numerical model interface GMS (Groundwater Modelling System). Following time and spatial queries, the hydrogeological data stored in the database can be easily used within different groundwater numerical models.

  19. A new Volcanic managEment Risk Database desIgn (VERDI): Application to El Hierro Island (Canary Islands)

    NASA Astrophysics Data System (ADS)

    Bartolini, S.; Becerril, L.; Martí, J.

    2014-11-01

    One of the most important issues in modern volcanology is the assessment of volcanic risk, which will depend - among other factors - on both the quantity and quality of the available data and an optimum storage mechanism. This will require the design of purpose-built databases that take into account data format and availability and afford easy data storage and sharing, and will provide for a more complete risk assessment that combines different analyses but avoids any duplication of information. Data contained in any such database should facilitate spatial and temporal analysis that will (1) produce probabilistic hazard models for future vent opening, (2) simulate volcanic hazards and (3) assess their socio-economic impact. We describe the design of a new spatial database structure, VERDI (Volcanic managEment Risk Database desIgn), which allows different types of data, including geological, volcanological, meteorological, monitoring and socio-economic information, to be manipulated, organized and managed. The central aim is to ensure that VERDI will serve as a tool for connecting different kinds of data sources, GIS platforms and modeling applications. We present an overview of the database design, its components and the attributes that play an important role in the database model. The potential of the VERDI structure and the possibilities it offers in regard to data organization are shown here through its application to El Hierro (Canary Islands). The VERDI database will provide scientists and decision makers with a useful tool that will assist in conducting volcanic risk assessment and management.

  20. The MAR databases: development and implementation of databases specific for marine metagenomics.

    PubMed

    Klemetsen, Terje; Raknes, Inge A; Fu, Juan; Agafonov, Alexander; Balasundaram, Sudhagar V; Tartari, Giacomo; Robertsen, Espen; Willassen, Nils P

    2018-01-04

    We introduce the marine databases MarRef, MarDB and MarCat (https://mmp.sfb.uit.no/databases/), which are publicly available resources that promote marine research and innovation. These data resources, which have been implemented in the Marine Metagenomics Portal (MMP) (https://mmp.sfb.uit.no/), are collections of richly annotated and manually curated contextual (metadata) and sequence databases representing three tiers of accuracy. While MarRef is a database for completely sequenced marine prokaryotic genomes, which represent a marine prokaryote reference genome database, MarDB includes all incompletely sequenced prokaryotic genomes regardless of level of completeness. The last database, MarCat, represents a gene (protein) catalog of uncultivable (and cultivable) marine genes and proteins derived from marine metagenomics samples. The first versions of MarRef and MarDB contain 612 and 3726 records, respectively. Each record is built up of 106 metadata fields including attributes for sampling, sequencing, assembly and annotation in addition to the organism and taxonomic information. Currently, MarCat contains 1227 records with 55 metadata fields. Ontologies and controlled vocabularies are used in the contextual databases to enhance consistency. The user-friendly web interface lets the visitors browse, filter and search in the contextual databases and perform BLAST searches against the corresponding sequence databases. All contextual and sequence databases are freely accessible and downloadable from https://s1.sfb.uit.no/public/mar/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. The Neotoma Paleoecology Database

    NASA Astrophysics Data System (ADS)

    Grimm, E. C.; Ashworth, A. C.; Barnosky, A. D.; Betancourt, J. L.; Bills, B.; Booth, R.; Blois, J.; Charles, D. F.; Graham, R. W.; Goring, S. J.; Hausmann, S.; Smith, A. J.; Williams, J. W.; Buckland, P.

    2015-12-01

    The Neotoma Paleoecology Database (www.neotomadb.org) is a multiproxy, open-access, relational database that includes fossil data for the past 5 million years (the late Neogene and Quaternary Periods). Modern distributional data for various organisms are also being made available for calibration and paleoecological analyses. The project is a collaborative effort among individuals from more than 20 institutions worldwide, including domain scientists representing a spectrum of Pliocene-Quaternary fossil data types, as well as experts in information technology. Working groups are active for diatoms, insects, ostracodes, pollen and plant macroscopic remains, testate amoebae, rodent middens, vertebrates, age models, geochemistry and taphonomy. Groups are also active in developing online tools for data analyses and for developing modules for teaching at different levels. A key design concept of NeotomaDB is that stewards for various data types are able to remotely upload and manage data. Cooperatives for different kinds of paleo data, or from different regions, can appoint their own stewards. Over the past year, much progress has been made on development of the steward software-interface that will enable this capability. The steward interface uses web services that provide access to the database. More generally, these web services enable remote programmatic access to the database, which both desktop and web applications can use and which provide real-time access to the most current data. Use of these services can alleviate the need to download the entire database, which can be out-of-date as soon as new data are entered. In general, the Neotoma web services deliver data either from an entire table or from the results of a view. Upon request, new web services can be quickly generated. Future developments will likely expand the spatial and temporal dimensions of the database. NeotomaDB is open to receiving new datasets and stewards from the global Quaternary community

  2. Phase Equilibria Diagrams Database

    National Institute of Standards and Technology Data Gateway

    SRD 31 NIST/ACerS Phase Equilibria Diagrams Database (PC database for purchase)   The Phase Equilibria Diagrams Database contains commentaries and more than 21,000 diagrams for non-organic systems, including those published in all 21 hard-copy volumes produced as part of the ACerS-NIST Phase Equilibria Diagrams Program (formerly titled Phase Diagrams for Ceramists): Volumes I through XIV (blue books); Annuals 91, 92, 93; High Tc Superconductors I & II; Zirconium & Zirconia Systems; and Electronic Ceramics I. Materials covered include oxides as well as non-oxide systems such as chalcogenides and pnictides, phosphates, salt systems, and mixed systems of these classes.

  3. The Cambridge Structural Database

    PubMed Central

    Groom, Colin R.; Bruno, Ian J.; Lightfoot, Matthew P.; Ward, Suzanna C.

    2016-01-01

    The Cambridge Structural Database (CSD) contains a complete record of all published organic and metal–organic small-molecule crystal structures. The database has been in operation for over 50 years and continues to be the primary means of sharing structural chemistry data and knowledge across disciplines. As well as structures that are made public to support scientific articles, it includes many structures published directly as CSD Communications. All structures are processed both computationally and by expert structural chemistry editors prior to entering the database. A key component of this processing is the reliable association of the chemical identity of the structure studied with the experimental data. This important step helps ensure that data is widely discoverable and readily reusable. Content is further enriched through selective inclusion of additional experimental data. Entries are available to anyone through free CSD community web services. Linking services developed and maintained by the CCDC, combined with the use of standard identifiers, facilitate discovery from other resources. Data can also be accessed through CCDC and third party software applications and through an application programming interface. PMID:27048719

  4. The Cambridge Structural Database.

    PubMed

    Groom, Colin R; Bruno, Ian J; Lightfoot, Matthew P; Ward, Suzanna C

    2016-04-01

    The Cambridge Structural Database (CSD) contains a complete record of all published organic and metal-organic small-molecule crystal structures. The database has been in operation for over 50 years and continues to be the primary means of sharing structural chemistry data and knowledge across disciplines. As well as structures that are made public to support scientific articles, it includes many structures published directly as CSD Communications. All structures are processed both computationally and by expert structural chemistry editors prior to entering the database. A key component of this processing is the reliable association of the chemical identity of the structure studied with the experimental data. This important step helps ensure that data is widely discoverable and readily reusable. Content is further enriched through selective inclusion of additional experimental data. Entries are available to anyone through free CSD community web services. Linking services developed and maintained by the CCDC, combined with the use of standard identifiers, facilitate discovery from other resources. Data can also be accessed through CCDC and third party software applications and through an application programming interface.

  5. THE ECOTOX DATABASE

    EPA Science Inventory

    The database provides chemical-specific toxicity information for aquatic life, terrestrial plants, and terrestrial wildlife. ECOTOX is a comprehensive ecotoxicology database and is therefore essential for providing and supporting high quality models needed to estimate population...

  6. Microporoelastic Modeling of Organic-Rich Shales

    NASA Astrophysics Data System (ADS)

    Khosh Sokhan Monfared, S.; Abedi, S.; Ulm, F. J.

    2014-12-01

    Organic-rich shale is an extremely complex, naturally occurring geo-composite. The heterogeneous nature of organic-rich shale and its anisotropic behavior pose grand challenges for characterization, modeling and engineering design. The intricacy of organic-rich shale, in the context of its mechanical and poromechanical properties, originates in the presence of organic/inorganic constituents and their interfaces as well as the occurrence of porosity and elastic anisotropy, at multiple length scales. To capture the first-order mechanisms responsible for the complex behavior of organic-rich shale, we introduce an original approach for micromechanical modeling of organic-rich shales which accounts for the effect of maturity of organics on the overall elasticity through morphology considerations. This morphology contribution is captured by means of an effective media theory that bridges the gap between immature and mature systems through the choice of the system's microtexture; namely a matrix-inclusion morphology (Mori-Tanaka) for immature systems and a polycrystal/granular morphology for mature systems. Also, we show that interfaces play a role in the effective elasticity of mature, organic-rich shales. The models are calibrated by means of ultrasonic pulse velocity measurements of elastic properties and validated by means of nanoindentation results. Sensitivity analyses using Spearman's Partial Rank Correlation Coefficient show the importance of porosity and Total Organic Carbon (TOC) as key input parameters for accurate model predictions. These modeling developments pave the way to reach a "unique" set of clay properties and highlight the importance of depositional environment, burial and diagenetic processes on overall mechanical and poromechanical behavior of organic-rich shale. These developments also emphasize the importance of understanding and modeling clay elasticity and organic maturity on the overall rock behavior which is of critical importance for a

  7. RPG: the Ribosomal Protein Gene database

    PubMed Central

    Nakao, Akihiro; Yoshihama, Maki; Kenmochi, Naoya

    2004-01-01

    RPG (http://ribosome.miyazaki-med.ac.jp/) is a new database that provides detailed information about ribosomal protein (RP) genes. It contains data from humans and other organisms, including Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Methanococcus jannaschii and Escherichia coli. Users can search the database by gene name and organism. Each record includes sequences (genomic, cDNA and amino acid sequences), intron/exon structures, genomic locations and information about orthologs. In addition, users can view and compare the gene structures of the above organisms and make multiple amino acid sequence alignments. RPG also provides information on small nucleolar RNAs (snoRNAs) that are encoded in the introns of RP genes. PMID:14681386

  8. Livestock Anaerobic Digester Database

    EPA Pesticide Factsheets

    The Anaerobic Digester Database provides basic information about anaerobic digesters on livestock farms in the United States, organized in Excel spreadsheets. It includes projects that are under construction, operating, or shut down.

  9. BioWarehouse: a bioinformatics database warehouse toolkit.

    PubMed

    Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David W J; Tenenbaum, Jessica D; Karp, Peter D

    2006-03-23

    This article addresses the problem of interoperation of heterogeneous bioinformatics databases. We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) and also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and Java languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. BioWarehouse embodies significant progress on the database integration problem for bioinformatics.
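
    The kind of cross-database question described above (what fraction of EC-numbered enzyme activities has no sequence) can be sketched as a single SQL query once the sources share one schema. The sketch below is illustrative only: it uses Python's built-in sqlite3 rather than MySQL or Oracle, and the table names, column names and toy rows are hypothetical, not the BioWarehouse schema.

```python
import sqlite3


def build_toy_warehouse():
    """Create an in-memory database with two hypothetical warehouse tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- One row per enzyme activity (as if loaded from an enzyme database).
        CREATE TABLE enzyme_activity (ec_number TEXT PRIMARY KEY);
        -- One row per protein sequence (as if loaded from a sequence database).
        CREATE TABLE protein_sequence (accession TEXT PRIMARY KEY, ec_number TEXT);

        INSERT INTO enzyme_activity VALUES
            ('1.1.1.1'), ('2.7.1.1'), ('3.4.21.4'), ('4.2.1.1');
        -- Toy accessions; '4.2.1.1' deliberately has no sequence here.
        INSERT INTO protein_sequence VALUES
            ('SEQ_A', '1.1.1.1'), ('SEQ_B', '1.1.1.1'),
            ('SEQ_C', '2.7.1.1'), ('SEQ_D', '3.4.21.4');
        """
    )
    return conn


def fraction_without_sequence(conn):
    """Fraction of EC-numbered activities with no linked protein sequence."""
    total, missing = conn.execute(
        """
        SELECT COUNT(*),
               SUM(CASE WHEN s.ec_number IS NULL THEN 1 ELSE 0 END)
        FROM enzyme_activity AS e
        LEFT JOIN (SELECT DISTINCT ec_number FROM protein_sequence) AS s
               ON s.ec_number = e.ec_number
        """
    ).fetchone()
    # SUM over zero rows yields NULL, so guard against an empty table.
    return missing / total if total else 0.0


if __name__ == "__main__":
    print(fraction_without_sequence(build_toy_warehouse()))
```

    The LEFT JOIN keeps every activity and leaves NULL where no sequence matches, which is what makes the "missing" count possible in one pass.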

  10. Using FlyBase, a Database of Drosophila Genes and Genomes.

    PubMed

    Marygold, Steven J; Crosby, Madeline A; Goodman, Joshua L

    2016-01-01

    For nearly 25 years, FlyBase (flybase.org) has provided a freely available online database of biological information about Drosophila species, focusing on the model organism D. melanogaster. The need for a centralized, integrated view of Drosophila research has never been greater as advances in genomic, proteomic, and high-throughput technologies add to the quantity and diversity of available data and resources. FlyBase has taken several approaches to respond to these changes in the research landscape. Novel report pages have been generated for new reagent types and physical interaction data; Drosophila models of human disease are now represented and showcased in dedicated Human Disease Model Reports; other integrated reports have been established that bring together related genes, datasets, or reagents; Gene Reports have been revised to improve access to new data types and to highlight functional data; links to external sites have been organized and expanded; and new tools have been developed to display and interrogate all these data, including improved batch processing and bulk file availability. In addition, several new community initiatives have served to enhance interactions between researchers and FlyBase, resulting in direct user contributions and improved feedback. This chapter provides an overview of the data content, organization, and available tools within FlyBase, focusing on recent improvements. We hope it serves as a guide for our diverse user base, enabling efficient and effective exploration of the database and thereby accelerating research discoveries.

  11. MMDB: Entrez’s 3D-structure database

    PubMed Central

    Wang, Yanli; Anderson, John B.; Chen, Jie; Geer, Lewis Y.; He, Siqian; Hurwitz, David I.; Liebert, Cynthia A.; Madej, Thomas; Marchler, Gabriele H.; Marchler-Bauer, Aron; Panchenko, Anna R.; Shoemaker, Benjamin A.; Song, James S.; Thiessen, Paul A.; Yamashita, Roxanne A.; Bryant, Stephen H.

    2002-01-01

    Three-dimensional structures are now known within many protein families and it is quite likely, in searching a sequence database, that one will encounter a homolog with known structure. The goal of Entrez’s 3D-structure database is to make this information, and the functional annotation it can provide, easily accessible to molecular biologists. To this end Entrez’s search engine provides three powerful features. (i) Sequence and structure neighbors; one may select all sequences similar to one of interest, for example, and link to any known 3D structures. (ii) Links between databases; one may search by term matching in MEDLINE, for example, and link to 3D structures reported in these articles. (iii) Sequence and structure visualization; identifying a homolog with known structure, one may view molecular-graphic and alignment displays, to infer approximate 3D structure. In this article we focus on two features of Entrez’s Molecular Modeling Database (MMDB) not described previously: links from individual biopolymer chains within 3D structures to a systematic taxonomy of organisms represented in molecular databases, and links from individual chains (and compact 3D domains within them) to structure neighbors, other chains (and 3D domains) with similar 3D structure. MMDB may be accessed at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure. PMID:11752307

  12. Modeling the High Speed Research Cycle 2B Longitudinal Aerodynamic Database Using Multivariate Orthogonal Functions

    NASA Technical Reports Server (NTRS)

    Morelli, E. A.; Proffitt, M. S.

    1999-01-01

    The data for longitudinal non-dimensional, aerodynamic coefficients in the High Speed Research Cycle 2B aerodynamic database were modeled using polynomial expressions identified with an orthogonal function modeling technique. The discrepancy between the tabular aerodynamic data and the polynomial models was tested and shown to be less than 15 percent for drag, lift, and pitching moment coefficients over the entire flight envelope. Most of this discrepancy was traced to smoothing local measurement noise and to the omission of mass case 5 data in the modeling process. A simulation check case showed that the polynomial models provided a compact and accurate representation of the nonlinear aerodynamic dependencies contained in the HSR Cycle 2B tabular aerodynamic database.
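
    The abstract does not spell out the algorithm, so the following is only a generic sketch of regression with orthogonalized polynomial regressors, not the authors' procedure. It assumes Gram-Schmidt orthogonalization of ordinary polynomial terms; the variable names (alpha as an angle-like input, lift as a tabulated coefficient) and the data are hypothetical. Because the orthogonal columns do not interact, each coefficient is computed independently and each term's contribution to the fit can be read off separately.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def orthogonalize(columns):
    """Modified Gram-Schmidt: make candidate regressor columns mutually orthogonal."""
    ortho = []
    for col in columns:
        p = list(col)
        for q in ortho:
            gamma = dot(q, p) / dot(q, q)
            p = [pi - gamma * qi for pi, qi in zip(p, q)]
        ortho.append(p)
    return ortho


def fit_orthogonal(columns, y):
    """Least-squares fit of y on the orthogonalized columns.

    Returns the coefficients, the fitted values, and the reduction in
    squared error attributable to each orthogonal term.
    """
    ortho = orthogonalize(columns)
    coeffs = [dot(p, y) / dot(p, p) for p in ortho]
    fitted = [sum(a * p[i] for a, p in zip(coeffs, ortho)) for i in range(len(y))]
    reductions = [a * a * dot(p, p) for a, p in zip(coeffs, ortho)]
    return coeffs, fitted, reductions


# Hypothetical tabulated data: a coefficient that is quadratic in alpha.
alpha = [-2.0, -1.0, 0.0, 1.0, 2.0]
columns = [[1.0] * len(alpha), alpha, [a * a for a in alpha]]  # 1, alpha, alpha^2
lift = [0.1 + 0.5 * a + 0.2 * a * a for a in alpha]

coeffs, fitted, reductions = fit_orthogonal(columns, lift)
worst = max(abs(f - y) for f, y in zip(fitted, lift))
print("largest residual:", worst)
```

    The per-term reductions are what make orthogonal regressors convenient for choosing which polynomial terms to keep: dropping a term removes exactly its own contribution and leaves the other coefficients unchanged.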

  13. Database assessment of CMIP5 and hydrological models to determine flood risk areas

    NASA Astrophysics Data System (ADS)

    Limlahapun, Ponthip; Fukui, Hiromichi

    2016-11-01

    Water-related disasters may not be solved with a single scientific method. Based on this premise, we combined logical concepts, sequentially linked model results, and database applications to analyse historical and future scenarios in the context of flooding. The three main models used in this study are (1) the fifth phase of the Coupled Model Intercomparison Project (CMIP5) to derive precipitation; (2) the Integrated Flood Analysis System (IFAS) to extract the amount of discharge; and (3) the Hydrologic Engineering Center (HEC) model to generate inundated areas. This research focused on integrating data regardless of system-design complexity; database approaches are flexible, manageable, and well supported for system data transfer, which makes them suitable for monitoring a flood. The resulting flood map, together with real-time stream data, can help local communities identify areas at risk of flooding in advance.

  14. Database constraints applied to metabolic pathway reconstruction tools.

    PubMed

    Vilaplana, Jordi; Solsona, Francesc; Teixido, Ivan; Usié, Anabel; Karathia, Hiren; Alves, Rui; Mateo, Jordi

    2014-01-01

    Our group developed two biological applications, Biblio-MetReS and Homol-MetReS, accessing the same database of organisms with annotated genes. Biblio-MetReS is a data-mining application that facilitates the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (re)annotation of proteomes, to properly identify both the individual proteins involved in the process(es) of interest and their function. It also enables the sets of proteins involved in the process(es) in different organisms to be compared directly. The efficiency of these biological applications is directly related to the design of the shared database. We classified and analyzed the different kinds of access to the database. Based on this study, we tried to adjust and tune the configurable parameters of the database server to reach the best performance of the communication data link to/from the database system. Different database technologies were analyzed. We started the study with a public relational SQL database, MySQL. Then, the same database was implemented by a MapReduce-based database named HBase. The results indicated that the standard configuration of MySQL gives an acceptable performance for low or medium size databases. Nevertheless, tuning database parameters can greatly improve the performance and lead to very competitive runtimes.

  15. Database Constraints Applied to Metabolic Pathway Reconstruction Tools

    PubMed Central

    Vilaplana, Jordi; Solsona, Francesc; Teixido, Ivan; Usié, Anabel; Karathia, Hiren; Alves, Rui; Mateo, Jordi

    2014-01-01

    Our group developed two biological applications, Biblio-MetReS and Homol-MetReS, accessing the same database of organisms with annotated genes. Biblio-MetReS is a data-mining application that facilitates the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (re)annotation of proteomes, to properly identify both the individual proteins involved in the process(es) of interest and their function. It also enables the sets of proteins involved in the process(es) in different organisms to be compared directly. The efficiency of these biological applications is directly related to the design of the shared database. We classified and analyzed the different kinds of access to the database. Based on this study, we tried to adjust and tune the configurable parameters of the database server to reach the best performance of the communication data link to/from the database system. Different database technologies were analyzed. We started the study with a public relational SQL database, MySQL. Then, the same database was implemented by a MapReduce-based database named HBase. The results indicated that the standard configuration of MySQL gives an acceptable performance for low or medium size databases. Nevertheless, tuning database parameters can greatly improve the performance and lead to very competitive runtimes. PMID:25202745

  16. Databases for Microbiologists

    DOE PAGES

    Zhulin, Igor B.

    2015-05-26

    Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. Finally, the purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists.

  17. Databases for Microbiologists

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhulin, Igor B.

    Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. Finally, the purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists.

  18. Databases for Microbiologists

    PubMed Central

    2015-01-01

    Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. The purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists. PMID:26013493

  19. Fuzzy queries above relational database

    NASA Astrophysics Data System (ADS)

    Smolka, Pavel; Bradac, Vladimir

    2017-11-01

    The aim is to introduce the possibility of implementing fuzzy queries in relational databases. The issue is described using a model that identifies the part of the problem domain appropriate for a fuzzy approach. The model is demonstrated on a wine database, with a focus on searching it. The construction of the database complies with the law of the Czech Republic.
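
    As a rough illustration of what a fuzzy query over a relational table can look like, the sketch below registers membership functions as SQL functions in Python's built-in sqlite3 and ranks rows by their degree of match. It is not the authors' model: the table layout, the wine rows, the linguistic terms ("moderately priced", "dry") and every threshold are hypothetical, and the thresholds are not legal definitions.

```python
import sqlite3


def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear in between."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def descending(x, lo, hi):
    """Membership that is 1 up to lo, falls linearly, and is 0 from hi on."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def build_wine_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE wine (name TEXT, price REAL, sugar REAL)")
    conn.executemany(
        "INSERT INTO wine VALUES (?, ?, ?)",
        [("Wine A", 200.0, 2.0), ("Wine B", 300.0, 6.0),
         ("Wine C", 500.0, 1.0), ("Wine D", 125.0, 10.0)],
    )
    # Expose the linguistic terms to SQL; all thresholds are illustrative.
    conn.create_function("moderate", 1, lambda p: trapezoid(p, 100, 150, 250, 350))
    conn.create_function("dry", 1, lambda s: descending(s, 4, 12))
    return conn


def fuzzy_search(conn, threshold=0.0):
    """Wines that are 'moderately priced AND dry', best match first.

    Fuzzy AND is taken as the minimum of the two membership degrees.
    """
    return conn.execute(
        """
        SELECT name, degree FROM (
            SELECT name, min(moderate(price), dry(sugar)) AS degree FROM wine
        )
        WHERE degree > ?
        ORDER BY degree DESC, name
        """,
        (threshold,),
    ).fetchall()


if __name__ == "__main__":
    for name, degree in fuzzy_search(build_wine_db()):
        print(name, degree)
```

    Unlike a crisp WHERE clause, the query returns partial matches with a degree between 0 and 1, so a wine slightly above the price band is ranked lower rather than excluded.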

  20. MODEL-BASED HYDROACOUSTIC BLOCKAGE ASSESSMENT AND DEVELOPMENT OF AN EXPLOSIVE SOURCE DATABASE

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Matzel, E; Ramirez, A; Harben, P

    2005-07-11

    We are continuing the development of the Hydroacoustic Blockage Assessment Tool (HABAT), which is designed for use by analysts to predict which hydroacoustic monitoring stations can be used in discrimination analysis for any particular event. The research involves two approaches: (1) model-based assessment of blockage, and (2) ground-truth data-based assessment of blockage. The tool presents the analyst with a map of the world, and plots raypath blockages from stations to sources. The analyst inputs source locations and blockage criteria, and the tool returns a list of blockage status from all source locations to all hydroacoustic stations. We are currently using the tool in an assessment of blockage criteria for simple direct-path arrivals. Hydroacoustic data, predominantly from earthquake sources, are read in and assessed for blockage at all available stations. Several measures are taken. First, can the event be observed at a station above background noise? Second, can we establish backazimuth from the station to the source? Third, how large is the decibel drop at one station relative to other stations? These observational results are then compared with model estimates to identify the best set of blockage criteria and used to create a set of blockage maps for each station. The model-based estimates are currently limited by the coarse bathymetry of existing databases and by the limitations inherent in the raytrace method. In collaboration with BBN Inc., the Hydroacoustic Coverage Assessment Model (HydroCAM), which generates the blockage files that serve as input to HABAT, is being extended to include high-resolution bathymetry databases in key areas that increase model-based blockage assessment reliability. An important aspect of this capability is to eventually include reflected T-phases where they reliably occur and to identify the associated reflectors. To assess how well any given hydroacoustic discriminant works in separating earthquake and in

  1. Model of the Dynamic Construction Process of Texts and Scaling Laws of Words Organization in Language Systems

    PubMed Central

    Li, Shan; Lin, Ruokuang; Bian, Chunhua; Ma, Qianli D. Y.

    2016-01-01

    Scaling laws characterize diverse complex systems in a broad range of fields, including physics, biology, finance, and social science. The human language is another example of a complex system of words organization. Studies on written texts have shown that scaling laws characterize the occurrence frequency of words, words rank, and the growth of distinct words with increasing text length. However, these studies have mainly concentrated on the western linguistic systems, and the laws that govern the lexical organization, structure and dynamics of the Chinese language remain not well understood. Here we study a database of Chinese and English language books. We report that three distinct scaling laws characterize words organization in the Chinese language. We find that these scaling laws have different exponents and crossover behaviors compared to English texts, indicating different words organization and dynamics of words in the process of text growth. We propose a stochastic feedback model of words organization and text growth, which successfully accounts for the empirically observed scaling laws with their corresponding scaling exponents and characteristic crossover regimes. Further, by varying key model parameters, we reproduce differences in the organization and scaling laws of words between the Chinese and English language. We also identify functional relationships between model parameters and the empirically observed scaling exponents, thus providing new insights into the words organization and growth dynamics in the Chinese and English language. PMID:28006026

  2. Model of the Dynamic Construction Process of Texts and Scaling Laws of Words Organization in Language Systems.

    PubMed

    Li, Shan; Lin, Ruokuang; Bian, Chunhua; Ma, Qianli D Y; Ivanov, Plamen Ch

    2016-01-01

    Scaling laws characterize diverse complex systems in a broad range of fields, including physics, biology, finance, and social science. The human language is another example of a complex system of words organization. Studies on written texts have shown that scaling laws characterize the occurrence frequency of words, words rank, and the growth of distinct words with increasing text length. However, these studies have mainly concentrated on the western linguistic systems, and the laws that govern the lexical organization, structure and dynamics of the Chinese language remain not well understood. Here we study a database of Chinese and English language books. We report that three distinct scaling laws characterize words organization in the Chinese language. We find that these scaling laws have different exponents and crossover behaviors compared to English texts, indicating different words organization and dynamics of words in the process of text growth. We propose a stochastic feedback model of words organization and text growth, which successfully accounts for the empirically observed scaling laws with their corresponding scaling exponents and characteristic crossover regimes. Further, by varying key model parameters, we reproduce differences in the organization and scaling laws of words between the Chinese and English language. We also identify functional relationships between model parameters and the empirically observed scaling exponents, thus providing new insights into the words organization and growth dynamics in the Chinese and English language.
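
    The quantities the abstract refers to (word occurrence frequency, word rank, and the growth of distinct words with text length) can be measured with a few lines of code. The sketch below only computes these empirical curves; it does not implement the authors' stochastic feedback model. Whitespace tokenization is an assumption made for brevity, and Chinese text would first need word segmentation.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, word, count) triples, most frequent word first."""
    counts = Counter(tokens)
    return [
        (rank, word, n)
        for rank, (word, n) in enumerate(counts.most_common(), start=1)
    ]


def vocabulary_growth(tokens):
    """Number of distinct words seen after each token of the text."""
    seen = set()
    growth = []
    for tok in tokens:
        seen.add(tok)
        growth.append(len(seen))
    return growth


if __name__ == "__main__":
    text = "the cat saw the dog and the dog saw the cat"
    tokens = text.split()
    for rank, word, n in rank_frequency(tokens):
        print(rank, word, n)
    print(vocabulary_growth(tokens))
```

    Plotting count against rank, and vocabulary size against text length, on log-log axes is the usual way to look for the power-law behaviour and crossovers that scaling-law studies report.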

  3. Integrating the Allen Brain Institute Cell Types Database into Automated Neuroscience Workflow.

    PubMed

    Stockton, David B; Santamaria, Fidel

    2017-10-01

    We developed software tools to download, extract features, and organize the Cell Types Database from the Allen Brain Institute (ABI) in order to integrate its whole cell patch clamp characterization data into the automated modeling/data analysis cycle. To expand the potential user base we employed both Python and MATLAB. The basic set of tools downloads selected raw data and extracts cell, sweep, and spike features, using ABI's feature extraction code. To facilitate data manipulation we added a tool to build a local specialized database of raw data plus extracted features. Finally, to maximize automation, we extended our NeuroManager workflow automation suite to include these tools plus a separate investigation database. The extended suite allows the user to integrate ABI experimental and modeling data into an automated workflow deployed on heterogeneous computer infrastructures, from local servers, to high performance computing environments, to the cloud. Since our approach is focused on workflow procedures our tools can be modified to interact with the increasing number of neuroscience databases being developed to cover all scales and properties of the nervous system.

  4. Partial automation of database processing of simulation outputs from L-systems models of plant morphogenesis.

    PubMed

    Chen, Yi-Ping Phoebe; Hanan, Jim

    2002-01-01

    Models of plant architecture allow us to explore how genotype-environment interactions affect the development of plant phenotypes. Such models generate masses of data organised in complex hierarchies. This paper presents a generic system for creating and automatically populating a relational database from data generated by the widely used L-system approach to modelling plant morphogenesis. Techniques from compiler technology are applied to generate attributes (new fields) in the database, to simplify query development for the recursively-structured branching relationship. Use of biological terminology in an interactive query builder contributes towards making the system biologist-friendly.
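
    To illustrate why a recursively structured branching relationship complicates queries, here is a minimal sketch using Python's built-in sqlite3 module. The table `segment`, its columns, and the sample plant are all hypothetical. The paper generates extra attributes to simplify such queries; this sketch instead uses a recursive common table expression, a different technique, to walk the branching hierarchy directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: each plant segment points at the segment it grows from.
conn.execute(
    "CREATE TABLE segment ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER REFERENCES segment(id),"
    " kind TEXT NOT NULL,"
    " length_cm REAL NOT NULL)"
)

# Invented sample data: one stem, two branches, three leaves.
rows = [
    (1, None, "stem", 10.0),
    (2, 1, "branch", 4.0),
    (3, 1, "branch", 3.0),
    (4, 2, "leaf", 1.5),
    (5, 3, "leaf", 1.0),
    (6, 3, "leaf", 0.5),
]
conn.executemany("INSERT INTO segment VALUES (?, ?, ?, ?)", rows)


def descendants(conn, root_id):
    """Return the segment root_id and everything growing from it, with depth."""
    sql = """
    WITH RECURSIVE subtree(id, kind, length_cm, depth) AS (
        SELECT id, kind, length_cm, 0 FROM segment WHERE id = ?
        UNION ALL
        SELECT s.id, s.kind, s.length_cm, t.depth + 1
        FROM segment AS s JOIN subtree AS t ON s.parent_id = t.id
    )
    SELECT id, kind, length_cm, depth FROM subtree ORDER BY id
    """
    return conn.execute(sql, (root_id,)).fetchall()
```

    Calling `descendants(conn, 3)` returns the second branch and its two leaves. Precomputed attributes of the kind the abstract describes would let a biologist ask the same question with a plain, non-recursive query.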

  5. A dynamic clinical dental relational database.

    PubMed

    Taylor, D; Naguib, R N G; Boulton, S

    2004-09-01

    The traditional approach to relational database design is based on the logical organization of data into a number of related normalized tables. One assumption is that the nature and structure of the data is known at the design stage. In the case of designing a relational database to store historical dental epidemiological data from individual clinical surveys, the structure of the data is not known until the data is presented for inclusion into the database. This paper addresses the issues concerned with the theoretical design of a clinical dynamic database capable of adapting the internal table structure to accommodate clinical survey data, and presents a prototype database application capable of processing, displaying, and querying the dental data.

  6. The International Experimental Thermal Hydraulic Systems database – TIETHYS: A new NEA validation tool

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rohatgi, Upendra S.

    Nuclear reactor codes require validation with appropriate data representing the plant for specific scenarios. The thermal-hydraulic data is scattered in different locations and in different formats. Some of the data is in danger of being lost. A relational database is being developed to organize the international thermal hydraulic test data for various reactor concepts and different scenarios. At the reactor system level, that data is organized to include separate effect tests and integral effect tests for specific scenarios and corresponding phenomena. The database relies on the phenomena identification sections of expert developed PIRTs. The database will provide a summary of appropriate data, review of facility information, test description, instrumentation, references for the experimental data and some examples of application of the data for validation. The current database platform includes scenarios for PWR, BWR, VVER, and specific benchmarks for CFD modelling data and is to be expanded to include references for molten salt reactors. There are place holders for high temperature gas cooled reactors, CANDU and liquid metal reactors. This relational database is called The International Experimental Thermal Hydraulic Systems (TIETHYS) database and currently resides at Nuclear Energy Agency (NEA) of the OECD and is freely open to public access. Going forward the database will be extended to include additional links and data as they become available. https://www.oecd-nea.org/tiethysweb/

  7. BioWarehouse: a bioinformatics database warehouse toolkit

    PubMed Central

    Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David WJ; Tenenbaum, Jessica D; Karp, Peter D

    2006-01-01

    Background: This article addresses the problem of interoperation of heterogeneous bioinformatics databases. Results: We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. Conclusion: BioWarehouse embodies significant progress on the database integration problem for

  8. Integrating Variances into an Analytical Database

    NASA Technical Reports Server (NTRS)

    Sanchez, Carlos

    2010-01-01

    For this project, I enrolled in numerous SATERN courses that taught the basics of database programming. These include: Basic Access 2007 Forms, Introduction to Database Systems, Overview of Database Design, and others. My main job was to create an analytical database that can handle many stored forms and make it easy to interpret and organize. Additionally, I helped improve an existing database and populate it with information. These databases were designed to be used with data from Safety Variances and DCR forms. The research consisted of analyzing the database and comparing the data to find out which entries were repeated the most. If an entry happened to be repeated several times in the database, that would mean that the rule or requirement targeted by that variance has been bypassed many times already and so the requirement may not really be needed, but rather should be changed to allow the variance's conditions permanently. This project did not only restrict itself to the design and development of the database system, but also worked on exporting the data from the database to a different format (e.g. Excel or Word) so it could be analyzed in a simpler fashion. Thanks to the change in format, the data was organized in a spreadsheet that made it possible to sort the data by categories or types and helped speed up searches. Once my work with the database was done, the records of variances could be arranged so that they were displayed in numerical order, or one could search for a specific document targeted by the variances and restrict the search to only include variances that modified a specific requirement. A great part that contributed to my learning was SATERN, NASA's resource for education. Thanks to the SATERN online courses I took over the summer, I was able to learn many new things about computers and databases and also go more in depth into topics I already knew about.
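
    The analysis described above, finding which requirements are targeted by the most variances, amounts to a frequency count. A minimal sketch follows, assuming each variance record is a dictionary with a hypothetical `requirement` field; the field name and the sample identifiers in the usage note are illustrative and do not come from the report.

```python
from collections import Counter


def most_bypassed(variances, top_n=3):
    """Return the top_n (requirement, count) pairs, most frequently bypassed first."""
    counts = Counter(v["requirement"] for v in variances)
    return counts.most_common(top_n)
```

    Given records for requirements `R-1`, `R-2`, `R-1`, `R-3`, `R-1`, `R-2`, the call `most_bypassed(records, top_n=2)` ranks `R-1` first with three variances and `R-2` second with two, which is the kind of signal the report uses to flag requirements that may need revising.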

  9. Ontological interpretation of biomedical database content.

    PubMed

    Santana da Silva, Filipe; Jansen, Ludger; Freitas, Fred; Schulz, Stefan

    2017-06-26

    Biological databases store data about laboratory experiments, together with semantic annotations, in order to support data aggregation and retrieval. The exact meaning of such annotations in the context of a database record is often ambiguous. We address this problem by grounding implicit and explicit database content in a formal-ontological framework. By using a typical extract from the databases UniProt and Ensembl, annotated with content from GO, PR, ChEBI and NCBI Taxonomy, we created four ontological models (in OWL), which generate explicit, distinct interpretations under the BioTopLite2 (BTL2) upper-level ontology. The first three models interpret database entries as individuals (IND), defined classes (SUBC), and classes with dispositions (DISP), respectively; the fourth model (HYBR) is a combination of SUBC and DISP. For the evaluation of these four models, we consider (i) database content retrieval, using ontologies as query vocabulary; (ii) information completeness; and, (iii) DL complexity and decidability. The models were tested under these criteria against four competency questions (CQs). IND does not raise any ontological claim, besides asserting the existence of sample individuals and relations among them. Modelling patterns have to be created for each type of annotation referent. SUBC is interpreted regarding maximally fine-grained defined subclasses under the classes referred to by the data. DISP attempts to extract truly ontological statements from the database records, claiming the existence of dispositions. HYBR is a hybrid of SUBC and DISP and is more parsimonious regarding expressiveness and query answering complexity. For each of the four models, the four CQs were submitted as DL queries. This shows the ability to retrieve individuals with IND, and classes in SUBC and HYBR. DISP does not retrieve anything because the axioms with disposition are embedded in General Class Inclusion (GCI) statements. Ambiguity of biological database content is

  10. Abstraction of the Relational Model from a Department of Veterans Affairs DHCP Database: Bridging Theory and Working Application

    PubMed Central

    Levy, C.; Beauchamp, C.

    1996-01-01

    This poster describes the methods used and working prototype that was developed from an abstraction of the relational model from the VA's hierarchical DHCP database. Overlaying the relational model on DHCP permits multiple user views of the physical data structure, enhances access to the database by providing a link to commercial (SQL based) software, and supports a conceptual managed care data model based on primary and longitudinal patient care. The goal of this work was to create a relational abstraction of the existing hierarchical database; to construct, using SQL data definition language, user views of the database which reflect the clinical conceptual view of DHCP, and to allow the user to work directly with the logical view of the data using GUI based commercial software of their choosing. The workstation is intended to serve as a platform from which a managed care information model could be implemented and evaluated.
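
    The idea of overlaying relational user views on hierarchical records can be sketched with Python's built-in sqlite3 module. Everything below is hypothetical: the nested records, the `patient` and `visit` tables, and the `primary_care_visits` view are invented for illustration and are not the DHCP data structures or the poster's schema.

```python
import sqlite3

# Hypothetical hierarchical records: each patient node nests its own visits.
patients = [
    {"id": 1, "name": "A. Smith", "visits": [
        {"date": "1995-03-01", "clinic": "primary care"},
        {"date": "1995-09-12", "clinic": "cardiology"},
    ]},
    {"id": 2, "name": "B. Jones", "visits": [
        {"date": "1995-05-20", "clinic": "primary care"},
    ]},
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE visit (
    patient_id INTEGER NOT NULL REFERENCES patient(id),
    visit_date TEXT NOT NULL,
    clinic TEXT NOT NULL
);
-- A user view defined with SQL data definition language.
CREATE VIEW primary_care_visits AS
    SELECT p.name, v.visit_date
    FROM patient AS p JOIN visit AS v ON v.patient_id = p.id
    WHERE v.clinic = 'primary care';
""")

# Flatten the hierarchy into the two relational tables.
for p in patients:
    conn.execute("INSERT INTO patient VALUES (?, ?)", (p["id"], p["name"]))
    conn.executemany(
        "INSERT INTO visit VALUES (?, ?, ?)",
        [(p["id"], v["date"], v["clinic"]) for v in p["visits"]],
    )


def primary_care(conn):
    """Query the view as any SQL-based client tool would."""
    sql = "SELECT name, visit_date FROM primary_care_visits ORDER BY visit_date"
    return conn.execute(sql).fetchall()
```

    Once the view exists, a client only needs to know the view name, not the nesting of the underlying records, which is the benefit the poster attributes to linking commercial SQL-based software to the hierarchical store.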

  11. Prototype of web-based database of surface wave investigation results for site classification

    NASA Astrophysics Data System (ADS)

    Hayashi, K.; Cakir, R.; Martin, A. J.; Craig, M. S.; Lorenzo, J. M.

    2016-12-01

    As active and passive surface wave methods become popular for evaluating the site response of earthquake ground motion, demand for a database of investigation results is also increasing. Seismic ground motion depends not only on 1D velocity structure but also on 2D and 3D structures, so spatial information on S-wave velocity must be considered in ground motion prediction. Such a database can support the construction of 2D and 3D underground models. Inversion in surface wave processing is essentially non-unique, so other information must be combined into the processing. A database of existing geophysical, geological and geotechnical investigation results can provide indispensable information to improve the accuracy and reliability of investigations. Most investigations, however, are carried out by individual organizations, and their results are rarely stored in a unified and organized database. To study and discuss an appropriate database and digital standard format for surface wave investigations, we developed a prototype web-based database to store observed data and processing results of surface wave investigations that we have performed at more than 400 sites in the U.S. and Japan. The database was constructed on a web server using MySQL and PHP so that users can access it through the internet from anywhere with any device. All data are registered in the database with location, and users can search geophysical data through Google Maps. The database stores dispersion curves, horizontal-to-vertical spectral ratios and S-wave velocity profiles at each site, saved in XML files as digital data so that users can review and reuse them. The database also stores a published 3D deep basin and crustal structure, which users can refer to during the processing of surface wave data.

  12. The Halophile protein database.

    PubMed

    Sharma, Naveen; Farooqi, Mohammad Samir; Chaturvedi, Krishna Kumar; Lal, Shashi Bhushan; Grover, Monendra; Rai, Anil; Pandey, Pankaj

    2014-01-01

    Halophilic archaea/bacteria adapt to different salt concentrations, namely extreme, moderate and low. These types of adaptations may occur as a result of modification of protein structure and other changes in different cell organelles. Thus proteins may play an important role in the adaptation of halophilic archaea/bacteria to saline conditions. The Halophile protein database (HProtDB) is a systematic attempt to document the biochemical and biophysical properties of proteins from halophilic archaea/bacteria which may be involved in adaptation of these organisms to saline conditions. In this database, various physicochemical properties such as molecular weight, theoretical pI, amino acid composition, atomic composition, estimated half-life, instability index, aliphatic index and grand average of hydropathicity (Gravy) have been listed. These physicochemical properties play an important role in identifying the protein structure, bonding pattern and function of the specific proteins. This database is a comprehensive, manually curated, non-redundant catalogue of proteins. The database currently contains the properties of 59 897 proteins extracted from 21 different strains of halophilic archaea/bacteria. Database URL: http://webapp.cabgrid.res.in/protein/ © The Author(s) 2014. Published by Oxford University Press.

  13. Integrated Functional and Executional Modelling of Software Using Web-Based Databases

    NASA Technical Reports Server (NTRS)

    Kulkarni, Deepak; Marietta, Roberta

    1998-01-01

    NASA's software subsystems undergo extensive modification and updates over the operational lifetimes. It is imperative that modified software should satisfy safety goals. This report discusses the difficulties encountered in doing so and discusses a solution based on integrated modelling of software, use of automatic information extraction tools, web technology and databases.

  14. SmallSat Database

    NASA Technical Reports Server (NTRS)

    Petropulos, Dolores; Bittner, David; Murawski, Robert; Golden, Bert

    2015-01-01

    The SmallSat has an unrealized potential in both private industry and the federal government. Currently over 70 companies, 50 universities and 17 governmental agencies are involved in SmallSat research and development. In 1994, the U.S. Army Missile and Defense mapped the moon using smallSat imagery. Since then Smart Phones have introduced this imagery to the people of the world as diverse industries watched this trend. The deployment cost of smallSats is also greatly reduced compared to traditional satellites due to the fact that multiple units can be deployed in a single mission. Imaging payloads have become more sophisticated, smaller and lighter. In addition, the growth of small technology obtained from private industries has led to the more widespread use of smallSats. This includes greater revisit rates in imagery, significantly lower costs, the ability to update technology more frequently and the ability to decrease vulnerability to enemy attacks. The popularity of smallSats shows a changing mentality in this fast paced world of tomorrow. What impact has this created on the NASA communication networks now and in future years? In this project, we are developing the SmallSat Relational Database which can support a simulation of smallSats within the NASA SCaN Compatibility Environment for Networks and Integrated Communications (SCENIC) Modeling and Simulation Lab. The NASA Space Communications and Networks (SCaN) Program can use this modeling to project required network support needs in the next 10 to 15 years. The SmallSat Relational Database could model smallSats just as the other SCaN databases model the more traditional larger satellites, with a few exceptions. One exception is that the SmallSat Database is designed to be built-to-order. The SmallSat database holds various hardware configurations that can be used to model a smallSat. It will require significant effort to develop as the research material can only be populated by hand to obtain the unique data

  15. Chess databases as a research vehicle in psychology: Modeling large data.

    PubMed

    Vaci, Nemanja; Bilalić, Merim

    2017-08-01

    The game of chess has often been used for psychological investigations, particularly in cognitive science. The clear-cut rules and well-defined environment of chess provide a model for investigations of basic cognitive processes, such as perception, memory, and problem solving, while the precise rating system for the measurement of skill has enabled investigations of individual differences and expertise-related effects. In the present study, we focus on another appealing feature of chess, namely, the large archive databases associated with the game. The German national chess database presented in this study represents a fruitful ground for the investigation of multiple longitudinal research questions, since it collects the data of over 130,000 players and spans over 25 years. The German chess database collects the data of all players, including hobby players, and all tournaments played. This results in a rich and complete collection of the skill, age, and activity of the whole population of chess players in Germany. The database therefore complements the commonly used expertise approach in cognitive science by opening up new possibilities for the investigation of multiple factors that underlie expertise and skill acquisition. Since large datasets are not common in psychology, their introduction also raises the question of optimal and efficient statistical analysis. We offer the database for download and illustrate how it can be used by providing concrete examples and a step-by-step tutorial using different statistical analyses on a range of topics, including skill development over the lifetime, birth cohort effects, effects of activity and inactivity on skill, and gender differences.

  16. A virtual observatory for photoionized nebulae: the Mexican Million Models database (3MdB).

    NASA Astrophysics Data System (ADS)

    Morisset, C.; Delgado-Inglada, G.; Flores-Fajardo, N.

    2015-04-01

    Photoionization models obtained with numerical codes are widely used to study the physics of the interstellar medium (planetary nebulae, HII regions, etc). Grids of models are performed to understand the effects of the different parameters used to describe the regions on the observables (mainly emission line intensities). Most of the time, only a small part of the computed results of such grids are published, and they are sometimes hard to obtain in a user-friendly format. We present here the Mexican Million Models dataBase (3MdB), an effort to resolve both of these issues in the form of a database of photoionization models, easily accessible through the MySQL protocol, and containing a lot of useful outputs from the models, such as the intensities of 178 emission lines, the ionic fractions of all the ions, etc. Some examples of the use of the 3MdB are also presented.

  17. Effects of Soil Data and Simulation Unit Resolution on Quantifying Changes of Soil Organic Carbon at Regional Scale with a Biogeochemical Process Model

    PubMed Central

    Zhang, Liming; Yu, Dongsheng; Shi, Xuezheng; Xu, Shengxiang; Xing, Shihe; Zhao, Yongcong

    2014-01-01

    Soil organic carbon (SOC) models were often applied to regions with high heterogeneity, but limited spatially differentiated soil information and simulation unit resolution. This study, carried out in the Tai-Lake region of China, defined the uncertainty derived from application of the DeNitrification-DeComposition (DNDC) biogeochemical model in an area with heterogeneous soil properties and different simulation units. Three soil attribute databases of different resolution, a polygonal capture of mapping units at 1∶50,000 (P5), a county-based database of 1∶50,000 (C5) and a county-based database of 1∶14,000,000 (C14), were used as inputs for regional DNDC simulation. The P5 and C5 databases were combined with the 1∶50,000 digital soil map, which is the most detailed soil database for the Tai-Lake region. The C14 database was combined with the 1∶14,000,000 digital soil map, which is a coarse database and is often used for modeling at a national or regional scale in China. The soil polygons of the P5 database and the county boundaries of the C5 and C14 databases were used as basic simulation units. Results project that from 1982 to 2000, total SOC change in the top layer (0–30 cm) of the 2.3 M ha of paddy soil in the Tai-Lake region was +1.48 Tg C, −3.99 Tg C and −15.38 Tg C based on P5, C5 and C14 databases, respectively. With the total SOC change as modeled with P5 inputs as the baseline, which has the advantage of using a detailed, polygon-based soil dataset, the relative deviations of C5 and C14 were 368% and 1126%, respectively. The comparison illustrates that DNDC simulation is strongly influenced by choice of fundamental geographic resolution as well as input soil attribute detail. The results also indicate that improving the framework of DNDC is essential in creating accurate models of the soil carbon cycle. PMID:24523922

  18. Adding Hierarchical Objects to Relational Database General-Purpose XML-Based Information Managements

    NASA Technical Reports Server (NTRS)

    Lin, Shu-Chun; Knight, Chris; La, Tracy; Maluf, David; Bell, David; Tran, Khai Peter; Gawdiak, Yuri

    2006-01-01

    NETMARK is a flexible, high-throughput software system for managing, storing, and rapidly searching unstructured and semi-structured documents. NETMARK transforms such documents from their original highly complex, constantly changing, heterogeneous data formats into well-structured, common data formats using Hypertext Markup Language (HTML) and/or Extensible Markup Language (XML). The software implements an object-relational database system that combines the best practices of the relational model utilizing Structured Query Language (SQL) with those of the object-oriented, semantic database model for creating complex data. In particular, NETMARK takes advantage of the Oracle 8i object-relational database model using physical-address data types for very efficient keyword searches of records across both context and content. NETMARK also supports multiple international standards such as WEBDAV for drag-and-drop file management and SOAP for integrated information management using Web services. The document-organization and -searching capabilities afforded by NETMARK are likely to make this software attractive for use in disciplines as diverse as science, auditing, and law enforcement.

  19. Pathway Analysis and Omics Data Visualization Using Pathway Genome Databases: FragariaCyc, a Case Study.

    PubMed

    Naithani, Sushma; Jaiswal, Pankaj

    2017-01-01

    The species-specific plant Pathway Genome Databases (PGDBs) based on the BioCyc platform provide a conceptual model of the cellular metabolic network of an organism. Such frameworks allow analysis of genome-scale expression data to understand changes in the overall metabolism of an organism (or organs, tissues, and cells) in response to various intrinsic (e.g. developmental and differentiation) and/or extrinsic signals (e.g. pathogens and abiotic stresses) from the surrounding environment. Using FragariaCyc, a pathway database for the diploid strawberry Fragaria vesca, we show (1) the basic navigation across a PGDB; (2) a case study of pathway comparison across plant species; and (3) an example of RNA-Seq data analysis using the Omics Viewer tool. The protocols described here generally apply to other Pathway Tools-based PGDBs.

  20. Respiratory cancer database: An open access database of respiratory cancer gene and miRNA.

    PubMed

    Choubey, Jyotsna; Choudhari, Jyoti Kant; Patel, Ashish; Verma, Mukesh Kumar

    2017-01-01

    Respiratory cancer database (RespCanDB) is a genomic and proteomic database of cancers of the respiratory organs. It also includes information on medicinal plants used for the treatment of various respiratory cancers, with the structures of their active constituents, as well as pharmacological and chemical information on drugs associated with various respiratory cancers. Data in RespCanDB have been manually collected from published research articles and from other databases. Data have been integrated using MySQL, a relational database management system. MySQL manages all data in the back-end and provides commands to retrieve and store the data in the database. The web interface of the database has been built in ASP. RespCanDB is expected to contribute to the scientific community's understanding of respiratory cancer biology as well as to the development of new ways of diagnosing and treating respiratory cancer. Currently, the database consists of oncogenomic information on lung cancer, laryngeal cancer, and nasopharyngeal cancer. Data for other cancers, such as oral and tracheal cancers, will be added in the near future. The URL of RespCanDB is http://ridb.subdic-bioinformatics-nitrr.in/.

  1. The Génolevures database.

    PubMed

    Martin, Tiphaine; Sherman, David J; Durrens, Pascal

    2011-01-01

    The Génolevures online database (URL: http://www.genolevures.org) stores and provides the data and results obtained by the Génolevures Consortium through several campaigns of genome annotation of the yeasts in the Saccharomycotina subphylum (hemiascomycetes). This database is dedicated to large-scale comparison of these genomes, storing not only the different chromosomal elements detected in the sequences, but also the logical relations between them. The database is divided into a public part, accessible to anyone through Internet, and a private part where the Consortium members make genome annotations with our Magus annotation system; this system is used to annotate several related genomes in parallel. The public database is widely consulted and offers structured data, organized using a REST web site architecture that allows for automated requests. The implementation of the database, as well as its associated tools and methods, is evolving to cope with the influx of genome sequences produced by Next Generation Sequencing (NGS). Copyright © 2011 Académie des sciences. Published by Elsevier SAS. All rights reserved.

  2. 3D Bioprinting of Tissue/Organ Models.

    PubMed

    Pati, Falguni; Gantelius, Jesper; Svahn, Helene Andersson

    2016-04-04

    In vitro tissue/organ models are useful platforms that can facilitate systematic, repetitive, and quantitative investigations of drugs/chemicals. The primary objective when developing tissue/organ models is to reproduce physiologically relevant functions that typically require complex culture systems. Bioprinting offers exciting prospects for constructing 3D tissue/organ models, as it enables the reproducible, automated production of complex living tissues. Bioprinted tissues/organs may prove useful for screening novel compounds or predicting toxicity, as the spatial and chemical complexity inherent to native tissues/organs can be recreated. In this Review, we highlight the importance of developing 3D in vitro tissue/organ models by 3D bioprinting techniques, characterization of these models for evaluating their resemblance to native tissue, and their application in the prioritization of lead candidates, toxicity testing, and as disease/tumor models. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  3. The Degradome database: expanding roles of mammalian proteases in life and disease

    PubMed Central

    Pérez-Silva, José G.; Español, Yaiza; Velasco, Gloria; Quesada, Víctor

    2016-01-01

    Since the definition of the degradome as the complete repertoire of proteases in a given organism, the combined effort of numerous laboratories has greatly expanded our knowledge of its roles in biology and pathology. Once the genomic sequences of several important model organisms were made available, we presented the Degradome database containing the curated sets of known protease genes in human, chimpanzee, mouse and rat. Here, we describe the updated Degradome database, featuring 81 new protease genes and 7 new protease families. Notably, in this short time span, the number of known hereditary diseases caused by mutations in protease genes has increased from 77 to 119. This increase reflects the growing interest in the roles of the degradome in multiple diseases, including cancer and ageing. Finally, we have leveraged the widespread adoption of new webtools to provide interactive graphic views that show information about proteases in the global context of the degradome. The Degradome database can be accessed through its web interface at http://degradome.uniovi.es. PMID:26553809

  4. The Transcriptome Analysis and Comparison Explorer--T-ACE: a platform-independent, graphical tool to process large RNAseq datasets of non-model organisms.

    PubMed

    Philipp, E E R; Kraemer, L; Mountfort, D; Schilhabel, M; Schreiber, S; Rosenstiel, P

    2012-03-15

    Next generation sequencing (NGS) technologies allow a rapid and cost-effective compilation of large RNA sequence datasets in model and non-model organisms. However, the storage and analysis of transcriptome information from different NGS platforms is still a significant bottleneck, leading to a delay in data dissemination and subsequent biological understanding. Especially database interfaces with transcriptome analysis modules going beyond mere read counts are missing. Here, we present the Transcriptome Analysis and Comparison Explorer (T-ACE), a tool designed for the organization and analysis of large sequence datasets, and especially suited for transcriptome projects of non-model organisms with little or no a priori sequence information. T-ACE offers a TCL-based interface, which accesses a PostgreSQL database via a php-script. Within T-ACE, information belonging to single sequences or contigs, such as annotation or read coverage, is linked to the respective sequence and immediately accessible. Sequences and assigned information can be searched via keyword- or BLAST-search. Additionally, T-ACE provides within and between transcriptome analysis modules on the level of expression, GO terms, KEGG pathways and protein domains. Results are visualized and can be easily exported for external analysis. We developed T-ACE for laboratory environments, which have only a limited amount of bioinformatics support, and for collaborative projects in which different partners work on the same dataset from different locations or platforms (Windows/Linux/MacOS). For laboratories with some experience in bioinformatics and programming, the low complexity of the database structure and open-source code provides a framework that can be customized according to the different needs of the user and transcriptome project.

  5. The risk of paradoxical embolism (RoPE) study: initial description of the completed database.

    PubMed

    Thaler, David E; Di Angelantonio, Emanuele; Di Tullio, Marco R; Donovan, Jennifer S; Griffith, John; Homma, Shunichi; Jaigobin, Cheryl; Mas, Jean-Louis; Mattle, Heinrich P; Michel, Patrik; Mono, Marie-Luise; Nedeltchev, Krassen; Papetti, Federica; Ruthazer, Robin; Serena, Joaquín; Weimar, Christian; Elkind, Mitchell S V; Kent, David M

    2013-12-01

    Detecting a benefit from closure of patent foramen ovale in patients with cryptogenic stroke is hampered by low rates of stroke recurrence and uncertainty about the causal role of patent foramen ovale in the index event. A method to predict patent foramen ovale-attributable recurrence risk is needed. However, individual databases generally have too few stroke recurrences to support risk modeling. Prior studies of this population have been limited by low statistical power for examining factors related to recurrence. The aim of this study was to develop a database to support modeling of patent foramen ovale-attributable recurrence risk by combining extant data sets. We identified investigators with extant databases including subjects with cryptogenic stroke investigated for patent foramen ovale, determined the availability and characteristics of data in each database, collaboratively specified the variables to be included in the Risk of Paradoxical Embolism database, harmonized the variables across databases, and collected new primary data when necessary and feasible. The Risk of Paradoxical Embolism database has individual clinical, radiologic, and echocardiographic data from 12 component databases, including subjects with cryptogenic stroke both with (n = 1925) and without (n = 1749) patent foramen ovale. In the patent foramen ovale subjects, a total of 381 outcomes (stroke, transient ischemic attack, death) occurred (median follow-up 2·2 years). While there were substantial variations in data collection between studies, there was sufficient overlap to define a common set of variables suitable for risk modeling. While individual studies are inadequate for modeling patent foramen ovale-attributable recurrence risk, collaboration between investigators has yielded a database with sufficient power to identify those patients at highest risk for a patent foramen ovale-related stroke recurrence who may have the greatest potential benefit from patent foramen ovale

  6. Heterogeneous database integration in biomedicine.

    PubMed

    Sujansky, W

    2001-08-01

    The rapid expansion of biomedical knowledge, reduction in computing costs, and spread of internet access have created an ocean of electronic data. The decentralized nature of our scientific community and healthcare system, however, has resulted in a patchwork of diverse, or heterogeneous, database implementations, making access to and aggregation of data across databases very difficult. The database heterogeneity problem applies equally to clinical data describing individual patients and biological data characterizing our genome. Specifically, databases are highly heterogeneous with respect to the data models they employ, the data schemas they specify, the query languages they support, and the terminologies they recognize. Heterogeneous database systems attempt to unify disparate databases by providing uniform conceptual schemas that resolve representational heterogeneities, and by providing querying capabilities that aggregate and integrate distributed data. Research in this area has applied a variety of database and knowledge-based techniques, including semantic data modeling, ontology definition, query translation, query optimization, and terminology mapping. Existing systems have addressed heterogeneous database integration in the realms of molecular biology, hospital information systems, and application portability.

  7. The usefulness of administrative databases for identifying disease cohorts is increased with a multivariate model.

    PubMed

    van Walraven, Carl; Austin, Peter C; Manuel, Douglas; Knoll, Greg; Jennings, Allison; Forster, Alan J

    2010-12-01

    Administrative databases commonly use codes to indicate diagnoses. These codes alone are often inadequate to accurately identify patients with particular conditions. In this study, we determined whether we could quantify the probability that a person has a particular disease (in this case renal failure) using other routinely collected information available in an administrative data set. This would allow the accurate identification of a disease cohort in an administrative database. We determined whether patients in a randomly selected 100,000 hospitalizations had kidney disease (defined as two or more sequential serum creatinines or the single admission creatinine indicating a calculated glomerular filtration rate less than 60 mL/min/1.73 m²). The independent association of patient- and hospitalization-level variables with renal failure was measured using a multivariate logistic regression model in a random 50% sample of the patients. The model was validated in the remaining patients. Twenty thousand seven hundred thirteen patients had kidney disease (20.7%). A diagnostic code of kidney disease was strongly associated with kidney disease (relative risk: 34.4), but the accuracy of the code was poor (sensitivity: 37.9%; specificity: 98.9%). Twenty-nine patient- and hospitalization-level variables entered the kidney disease model. This model had excellent discrimination (c-statistic: 90.1%) and accurately predicted the probability of true renal failure. The probability threshold that maximized sensitivity and specificity for the identification of true kidney disease was 21.3% (sensitivity: 80.0%; specificity: 82.2%). Multiple variables available in administrative databases can be combined to quantify the probability that a person has a particular disease. This process permits accurate identification of a disease cohort in an administrative database. These methods may be extended to other diagnoses or procedures and could both facilitate and clarify the use of
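
    The threshold step described in this abstract can be illustrated with a minimal Python sketch. It assumes that "maximized sensitivity and specificity" means maximizing their sum (Youden's J), which the abstract does not state; the function names and the toy data are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(probs, truths, threshold):
    """Classify probabilities >= threshold as positive; return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for p, has_disease in zip(probs, truths):
        predicted = p >= threshold
        if has_disease and predicted:
            tp += 1
        elif has_disease:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def best_threshold(probs, truths):
    """Assumed criterion: pick the observed probability that maximizes sens + spec."""
    candidates = sorted(set(probs))
    return max(candidates, key=lambda c: sum(sensitivity_specificity(probs, truths, c)))


# Toy data (hypothetical): model probabilities and true disease status.
probs = [0.1, 0.2, 0.6, 0.9]
truths = [False, False, True, True]
print(best_threshold(probs, truths))
```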

  8. [Establishment of the database of the 3D facial models for the plastic surgery based on network].

    PubMed

    Liu, Zhe; Zhang, Hai-Lin; Zhang, Zheng-Guo; Qiao, Qun

    2008-07-01

    To collect three-dimensional (3D) facial data from 30 patients with facial deformities using a 3D scanner and to establish a professional Internet-based database that can support clinical intervention. The raw point data of the facial topography were collected with the 3D scanner. The 3D point cloud was then edited with reverse engineering software to reconstruct a 3D model of the face. The database system was divided into three parts: basic information, disease information, and surgery information. The web system was programmed in Java. The linkages between the tables of the database are reliable, and query operations and data mining are convenient. Users can visit the database via the Internet and use the image analysis system to observe the 3D facial models interactively. In this paper we present a database and a web system adapted to plastic surgery of the human face. It can be used both in the clinic and in basic research.

  9. Development and validation of a facial expression database based on the dimensional and categorical model of emotions.

    PubMed

    Fujimura, Tomomi; Umemura, Hiroyuki

    2018-01-15

    The present study describes the development and validation of a facial expression database comprising five different horizontal face angles in dynamic and static presentations. The database includes twelve expression types portrayed by eight Japanese models. This database was inspired by the dimensional and categorical model of emotions: surprise, fear, sadness, anger with open mouth, anger with closed mouth, disgust with open mouth, disgust with closed mouth, excitement, happiness, relaxation, sleepiness, and neutral (static only). The expressions were validated using emotion classification and Affect Grid rating tasks [Russell, Weiss, & Mendelsohn, 1989. Affect Grid: A single-item scale of pleasure and arousal. Journal of Personality and Social Psychology, 57(3), 493-502]. The results indicate that most of the expressions were recognised as the intended emotions and could systematically represent affective valence and arousal. Furthermore, face angle and facial motion information influenced emotion classification and valence and arousal ratings. Our database will be available online at the following URL. https://www.dh.aist.go.jp/database/face2017/ .

  10. Hydrodynamic interaction of two swimming model micro-organisms

    NASA Astrophysics Data System (ADS)

    Ishikawa, Takuji; Simmonds, M. P.; Pedley, T. J.

    2006-12-01

    In order to understand the rheological and transport properties of a suspension of swimming micro-organisms, it is necessary to analyse the fluid-dynamical interaction of pairs of such swimming cells. In this paper, a swimming micro-organism is modelled as a squirming sphere with prescribed tangential surface velocity, referred to as a squirmer. The centre of mass of the sphere may be displaced from the geometric centre (bottom-heaviness). The effects of inertia and Brownian motion are neglected, because real micro-organisms swim at very low Reynolds numbers but are too large for Brownian effects to be important. The interaction of two squirmers is calculated analytically for the limits of small and large separations and is also calculated numerically using a boundary-element method. The analytical and the numerical results for the translational and rotational velocities and for the stresslet of two squirmers correspond very well. We sought to generate a database for an interacting pair of squirmers from which one can easily predict the motion of a collection of squirmers. The behaviour of two interacting squirmers is discussed phenomenologically, too. The results for the trajectories of two squirmers show that first the squirmers attract each other, then they change their orientation dramatically when they are in near contact and finally they separate from each other. The effect of bottom-heaviness is considerable. Restricting the trajectories to two dimensions is shown to give misleading results. Some movies of interacting squirmers are available with the online version of the paper.

  11. Database for Safety-Oriented Tracking of Chemicals

    NASA Technical Reports Server (NTRS)

    Stump, Jacob; Carr, Sandra; Plumlee, Debrah; Slater, Andy; Samson, Thomas M.; Holowaty, Toby L.; Skeete, Darren; Haenz, Mary Alice; Hershman, Scot; Raviprakash, Pushpa

    2010-01-01

    SafetyChem is a computer program that maintains a relational database for tracking chemicals and associated hazards at Johnson Space Center (JSC) by use of a Web-based graphical user interface. The SafetyChem database is accessible to authorized users via a JSC intranet. All new chemicals pass through a safety office, where information on hazards, required personal protective equipment (PPE), fire-protection warnings, and target organ effects (TOEs) is extracted from material safety data sheets (MSDSs) and recorded in the database. The database facilitates real-time management of inventory with attention to such issues as stability, shelf life, reduction of waste through transfer of unused chemicals to laboratories that need them, quantification of chemical wastes, and identification of chemicals for which disposal is required. Upon searching the database for a chemical, the user receives information on physical properties of the chemical, hazard warnings, required PPE, a link to the MSDS, and references to the applicable International Standards Organization (ISO) 9000 standard work instructions and the applicable job hazard analysis. Also, to reduce the labor hours needed to comply with reporting requirements of the Occupational Safety and Health Administration, the data can be directly exported into the JSC hazardous-materials database.

  12. EcoCyc: a comprehensive database resource for Escherichia coli

    PubMed Central

    Keseler, Ingrid M.; Collado-Vides, Julio; Gama-Castro, Socorro; Ingraham, John; Paley, Suzanne; Paulsen, Ian T.; Peralta-Gil, Martín; Karp, Peter D.

    2005-01-01

    The EcoCyc database (http://EcoCyc.org/) is a comprehensive source of information on the biology of the prototypical model organism Escherichia coli K12. The mission for EcoCyc is to contain both computable descriptions of, and detailed comments describing, all genes, proteins, pathways and molecular interactions in E.coli. Through ongoing manual curation, extensive information such as summary comments, regulatory information, literature citations and evidence types has been extracted from 8862 publications and added to Version 8.5 of the EcoCyc database. The EcoCyc database can be accessed through a World Wide Web interface, while the downloadable Pathway Tools software and data files enable computational exploration of the data and provide enhanced querying capabilities that web interfaces cannot support. For example, EcoCyc contains carefully curated information that can be used as training sets for bioinformatics prediction of entities such as promoters, operons, genetic networks, transcription factor binding sites, metabolic pathways, functionally related genes, protein complexes and protein–ligand interactions. PMID:15608210

  13. Application of Large-Scale Database-Based Online Modeling to Plant State Long-Term Estimation

    NASA Astrophysics Data System (ADS)

    Ogawa, Masatoshi; Ogai, Harutoshi

    Recently, attention has been drawn to a new local modeling technique called “Just-In-Time (JIT) modeling”. To apply JIT modeling online to a large database, “Large-scale database-based Online Modeling (LOM)” has been proposed. LOM is a technique that makes the retrieval of neighboring data more efficient by using both “stepwise selection” and quantization. In order to predict the long-term state of the plant without using future data of manipulated variables, an Extended Sequential Prediction method of LOM (ESP-LOM) has been proposed. In this paper, the LOM and the ESP-LOM are introduced.
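
    The general idea of JIT modeling, building a local model from stored neighbors only when a query arrives, can be sketched in Python as follows. This is a generic illustration under stated assumptions: it uses a brute-force nearest-neighbor search and an inverse-distance-weighted average as the local model, and it does not reproduce LOM's stepwise selection or quantization. The function name and data are hypothetical.

```python
import math


def jit_predict(database, query, k=3):
    """Predict an output for `query` from its k nearest stored samples.

    database: list of (input_vector, output) pairs (hypothetical plant records).
    The local model is an inverse-distance-weighted mean, chosen for brevity.
    """
    # Retrieve the k stored samples closest to the query (brute force).
    neighbours = sorted(database, key=lambda rec: math.dist(rec[0], query))[:k]
    # Weight each neighbour by inverse distance; epsilon avoids division by zero.
    weights = [1.0 / (math.dist(x, query) + 1e-9) for x, _ in neighbours]
    return sum(w * y for w, (_, y) in zip(weights, neighbours)) / sum(weights)


# Toy database (hypothetical) sampled from y = 2x.
database = [((0.0,), 0.0), ((1.0,), 2.0), ((2.0,), 4.0), ((10.0,), 20.0)]
print(jit_predict(database, (1.0,)))
```

    A production system would replace the brute-force sort with an indexed neighbor search, which is the part LOM is described as making more efficient.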

  14. OperomeDB: A Database of Condition-Specific Transcription Units in Prokaryotic Genomes.

    PubMed

    Chetal, Kashish; Janga, Sarath Chandra

    2015-01-01

    Background. In prokaryotic organisms, a substantial fraction of adjacent genes are organized into operons: codirectionally organized genes in prokaryotic genomes that share a common promoter and terminator. Although several available operon databases provide information with varying levels of reliability, very few resources provide experimentally supported results. Therefore, we believe that the biological community could benefit from having a new operon prediction database with operons predicted using next-generation RNA-seq datasets. Description. We present operomeDB, a database which provides an ensemble of all the predicted operons for bacterial genomes using available RNA-sequencing datasets across a wide range of experimental conditions. Although several studies have recently confirmed that prokaryotic operon structure is dynamic with significant alterations across environmental and experimental conditions, there are no comprehensive databases for studying such variations across prokaryotic transcriptomes. Currently our database contains nine bacterial organisms and 168 transcriptomes for which we predicted operons. The user interface is simple and easy to use, in terms of visualization, downloading, and querying of data. In addition, because of its ability to load custom datasets, users can also compare their datasets with publicly available transcriptomic data of an organism. Conclusion. OperomeDB as a database should not only aid experimental groups working on transcriptome analysis of specific organisms but also enable studies related to computational and comparative operomics.

  15. Mathematical models for exploring different aspects of genotoxicity and carcinogenicity databases.

    PubMed

    Benigni, R; Giuliani, A

    1991-12-01

    One great obstacle to understanding and using the information contained in the genotoxicity and carcinogenicity databases is the very size of such databases. Their vastness makes them difficult to read; this leads to inadequate exploitation of the information, which becomes costly in terms of time, labor, and money. In its search for adequate approaches to the problem, the scientific community has, curiously, almost entirely neglected an existent series of very powerful methods of data analysis: the multivariate data analysis techniques. These methods were specifically designed for exploring large data sets. This paper presents the multivariate techniques and reports a number of applications to genotoxicity problems. These studies show how biology and mathematical modeling can be combined and how successful this combination is.

  16. The Xeno-glycomics database (XDB): a relational database of qualitative and quantitative pig glycome repertoire.

    PubMed

    Park, Hae-Min; Park, Ju-Hyeong; Kim, Yoon-Woo; Kim, Kyoung-Jin; Jeong, Hee-Jin; Jang, Kyoung-Soon; Kim, Byung-Gee; Kim, Yun-Gon

    2013-11-15

    In recent years, the improvement of mass spectrometry-based glycomics techniques (i.e. highly sensitive, quantitative and high-throughput analytical tools) has enabled us to obtain a large dataset of glycans. Here we present a database named Xeno-glycomics database (XDB) that contains cell- or tissue-specific pig glycomes analyzed with mass spectrometry-based techniques, including a comprehensive pig glycan information on chemical structures, mass values, types and relative quantities. It was designed as a user-friendly web-based interface that allows users to query the database according to pig tissue/cell types or glycan masses. This database will contribute in providing qualitative and quantitative information on glycomes characterized from various pig cells/organs in xenotransplantation and might eventually provide new targets in the α1,3-galactosyltransferase gene-knock out pigs era. The database can be accessed on the web at http://bioinformatics.snu.ac.kr/xdb.

  17. A scalable database model for multiparametric time series: a volcano observatory case study

    NASA Astrophysics Data System (ADS)

    Montalto, Placido; Aliotta, Marco; Cassisi, Carmelo; Prestifilippo, Michele; Cannata, Andrea

    2014-05-01

    The variables collected by a sensor network constitute a heterogeneous data source that needs to be properly organized in order to be used in research and geophysical monitoring. By time series we mean a set of observations of a given phenomenon acquired sequentially in time. When the time intervals are equally spaced, one speaks of a period or sampling frequency. Our work describes in detail a possible methodology for the storage and management of time series using a specific data structure. We designed a framework, hereinafter called TSDSystem (Time Series Database System), to acquire time series from different data sources and standardize them within a relational database. Standardization makes it possible to perform operations, such as query and visualization, on many measures by synchronizing them on a common time scale. The proposed architecture follows a multiple-layer paradigm (Loaders layer, Database layer and Business Logic layer). Each layer is specialized in performing particular operations for the reorganization and archiving of data from different sources such as ASCII, Excel, ODBC (Open DataBase Connectivity), and files accessible from the Internet (web pages, XML). In particular, the loader layer checks the working status of each running program through a heartbeat system, in order to automate the discovery of acquisition issues and other warning conditions. Although our system has to manage huge amounts of data, performance is guaranteed by a smart table-partitioning strategy that keeps the percentage of data stored in each database table balanced. TSDSystem also contains modules for the visualization of acquired data, which make it possible to query different time series over a specified time range or to follow the real-time signal acquisition, according to a data access policy for the users.
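
    The standardization step, placing several time series on a common time scale, can be sketched in Python as below. This is only an illustration under assumptions: timestamps are plain numbers in seconds, each slot of the common grid holds the mean of the samples that fall in it, and empty slots are reported as None. The function names are hypothetical and are not part of TSDSystem.

```python
def align_to_grid(series, start, step, n_slots):
    """Average (timestamp, value) samples into n_slots equal slots beginning at `start`."""
    sums = [0.0] * n_slots
    counts = [0] * n_slots
    for t, value in series:
        slot = int((t - start) // step)
        if slot in range(n_slots):  # ignore samples outside the requested window
            sums[slot] += value
            counts[slot] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


def synchronize(named_series, start, step, n_slots):
    """Put every named series on the same grid so slot i refers to the same time for all."""
    return {name: align_to_grid(s, start, step, n_slots) for name, s in named_series.items()}


# Hypothetical sensors sampled at different, irregular times.
sensors = {
    "tremor": [(0, 1.0), (5, 3.0), (12, 10.0)],
    "so2_flux": [(3, 7.0), (25, 9.0)],
}
print(synchronize(sensors, start=0, step=10, n_slots=3))
```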

  18. Integrating ecological risk assessments across levels of organization using the Franklin-Noss model of biodiversity

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brugger, K.E.; Tiebout, H.M. III

    1994-12-31

    Wildlife toxicologists pioneered methodologies for assessing ecological risk to nontarget species. Historically, ecological risk assessments (ERAs) focused on a limited array of species and were based on relatively few population-level endpoints (mortality, reproduction). Currently, risk assessment models are becoming increasingly complex, factoring in multi-species interactions (across trophic levels) and utilizing an increasingly diverse number of ecologically significant endpoints. This trend suggests the increasing importance of safeguarding not only populations of individual species, but also the overall integrity of the larger biotic systems that support them. In this sense, ERAs are in alignment with Conservation Biology, an applied science of ecological knowledge used to conserve biodiversity. A theoretical conservation biology model could be incorporated in ERAs to quantify impacts to biodiversity (structure, function or composition across levels of biological organization). The authors suggest that the Franklin-Noss model for evaluating biodiversity, with its nested, hierarchical approach, may provide a suitable paradigm for assessing and integrating the ecological risk that chemical contaminants pose to biological systems from the simplest levels (genotypes, individual organisms) to the most complex levels of organization (communities and ecosystems). The Franklin-Noss model can accommodate the existing ecotoxicological database and, perhaps more importantly, indicate new areas in which critical endpoints should be identified and investigated.

  19. A database and tool for boundary conditions for regional air quality modeling: description and evaluation

    NASA Astrophysics Data System (ADS)

    Henderson, B. H.; Akhtar, F.; Pye, H. O. T.; Napelenok, S. L.; Hutzell, W. T.

    2013-09-01

    Transported air pollutants receive increasing attention as regulations tighten and global concentrations increase. The need to represent international transport in regional air quality assessments requires improved representation of boundary concentrations. Currently available observations are too sparse vertically to provide boundary information, particularly for ozone precursors, but global simulations can be used to generate spatially and temporally varying Lateral Boundary Conditions (LBC). This study presents a public database of global simulations designed and evaluated for use as LBC for air quality models (AQMs). The database covers the contiguous United States (CONUS) for the years 2000-2010 and contains hourly varying concentrations of ozone, aerosols, and their precursors. The database is complemented by a tool for configuring the global results as inputs to regional scale models (e.g., Community Multiscale Air Quality or Comprehensive Air quality Model with extensions). This study also presents an example application based on the CONUS domain, which is evaluated against satellite retrieved ozone vertical profiles. The results show performance is largely within uncertainty estimates for the Tropospheric Emission Spectrometer (TES) with some exceptions. The major difference shows a high bias in the upper troposphere along the southern boundary in January. This publication documents the global simulation database, the tool for conversion to LBC, and the fidelity of concentrations on the boundaries. This documentation is intended to support applications that require representation of long-range transport of air pollutants.

  20. A computational platform to maintain and migrate manual functional annotations for BioCyc databases.

    PubMed

    Walsh, Jesse R; Sen, Taner Z; Dickerson, Julie A

    2014-10-12

    BioCyc databases are an important resource for information on biological pathways and genomic data. Such databases represent the accumulation of biological data, some of which has been manually curated from literature. An essential feature of these databases is the continuing data integration as new knowledge is discovered. As functional annotations are improved, scalable methods are needed for curators to manage annotations without detailed knowledge of the specific design of the BioCyc database. We have developed CycTools, a software tool which allows curators to maintain functional annotations in a model organism database. This tool builds on existing software to improve and simplify annotation data imports of user provided data into BioCyc databases. Additionally, CycTools automatically resolves synonyms and alternate identifiers contained within the database into the appropriate internal identifiers. Automating steps in the manual data entry process can improve curation efforts for major biological databases. The functionality of CycTools is demonstrated by transferring GO term annotations from MaizeCyc to matching proteins in CornCyc, both maize metabolic pathway databases available at MaizeGDB, and by creating strain specific databases for metabolic engineering.
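
    The synonym-resolution step mentioned in this abstract can be illustrated with a small Python sketch. It is not CycTools code: the lookup structure, the case-insensitive matching, the choice to return None for unknown or ambiguous names, and the identifiers in the example are all assumptions made for illustration.

```python
def build_lookup(records):
    """Map every known name (internal ID or synonym, lowercased) to internal IDs.

    records: dict of internal_id -> list of synonyms or alternate identifiers.
    """
    lookup = {}
    for internal_id, names in records.items():
        for name in [internal_id, *names]:
            lookup.setdefault(name.lower(), set()).add(internal_id)
    return lookup


def resolve(lookup, name):
    """Return the single internal ID for `name`, or None if unknown or ambiguous."""
    matches = lookup.get(name.lower(), set())
    return next(iter(matches)) if len(matches) == 1 else None


# Hypothetical records; these identifiers are invented for the example.
records = {
    "GENE-0001": ["adh1", "alcohol dehydrogenase 1"],
    "GENE-0002": ["sh1", "shrunken1"],
    "GENE-0003": ["sh1"],  # deliberately ambiguous synonym
}
lookup = build_lookup(records)
print(resolve(lookup, "ADH1"), resolve(lookup, "sh1"))
```

    Returning None for ambiguous names leaves those cases to a curator rather than guessing, which seems in keeping with a tool meant to support manual curation.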

  1. Modeling self-organization of novel organic materials

    NASA Astrophysics Data System (ADS)

    Sayar, Mehmet

    In this thesis, the structural organization of oligomeric multi-block molecules is analyzed by computational analysis of coarse-grained models. These molecules form nanostructures with different dimensionalities, and the nanostructured nature of these materials leads to novel structural properties at different length scales. Previously, a number of oligomeric triblock rodcoil molecules have been shown to self-organize into mushroom shaped noncentrosymmetric nanostructures. Interestingly, thin films of these molecules contain polar domains and a finite macroscopic polarization. However, the fully polarized state is not the equilibrium state. In the first chapter, by solving a model with dipolar and Ising-like short range interactions, we show that polar domains are stable in films composed of aggregates as opposed to isolated molecules. Unlike classical molecular systems, these nanoaggregates have large intralayer spacings (a ≈ 6 nm), leading to a reduction in the repulsive dipolar interactions that oppose polar order within layers. This enables the formation of a striped pattern with polar domains of alternating directions. The energies of the possible structures at zero temperature are computed exactly and results of Monte Carlo simulations are provided at non-zero temperatures. In the second chapter, the macroscopic polarization of such nanostructured films is analyzed in the presence of a short range surface interaction. The surface interaction leads to a periodic domain structure where the balance between the up and down domains is broken, and therefore films of finite thickness have a net macroscopic polarization. The polarization per unit volume is a function of film thickness and strength of the surface interaction. Finally, in chapter three, self-organization of organic molecules into a network of one dimensional objects is analyzed. Multi-block organic dendron rodcoil molecules were found to self-organize into supramolecular nanoribbons (threads) and

  2. The Co-regulation Data Harvester: Automating gene annotation starting from a transcriptome database

    NASA Astrophysics Data System (ADS)

    Tsypin, Lev M.; Turkewitz, Aaron P.

    Identifying co-regulated genes provides a useful approach for defining pathway-specific machinery in an organism. To be efficient, this approach relies on thorough genome annotation, a process much slower than genome sequencing per se. Tetrahymena thermophila, a unicellular eukaryote, has been a useful model organism and has a fully sequenced but sparsely annotated genome. One important resource for studying this organism has been an online transcriptomic database. We have developed an automated approach to gene annotation in the context of transcriptome data in T. thermophila, called the Co-regulation Data Harvester (CDH). Beginning with a gene of interest, the CDH identifies co-regulated genes by accessing the Tetrahymena transcriptome database. It then identifies their closely related genes (orthologs) in other organisms by using reciprocal BLAST searches. Finally, it collates the annotations of those orthologs' functions, which provides the user with information to help predict the cellular role of the initial query. The CDH, which is freely available, represents a powerful new tool for analyzing cell biological pathways in Tetrahymena. Moreover, to the extent that genes and pathways are conserved between organisms, the inferences obtained via the CDH should be relevant, and can be explored, in many other systems.
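
    The reciprocal-search criterion used to call orthologs can be sketched in Python once the best hits are known. This sketch does not run BLAST and is not CDH code: it assumes the best hit in each direction has already been extracted into a dictionary, and the gene identifiers are invented for the example.

```python
def reciprocal_best_hits(forward, reverse):
    """Keep query genes whose best hit in the other organism hits them back.

    forward: query gene -> best hit in the other organism.
    reverse: gene in the other organism -> its best hit in the query organism.
    """
    return {q: hit for q, hit in forward.items() if reverse.get(hit) == q}


# Hypothetical best-hit tables (identifiers invented for illustration).
forward = {"TTHERM_A": "yeast_X", "TTHERM_B": "yeast_Y", "TTHERM_C": "yeast_X"}
reverse = {"yeast_X": "TTHERM_A", "yeast_Y": "TTHERM_Z"}
print(reciprocal_best_hits(forward, reverse))
```

    Only TTHERM_A survives here: TTHERM_B's hit points back to a different gene, and TTHERM_C shares a hit that prefers TTHERM_A.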

  3. Organic acid modeling and model validation: Workshop summary

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sullivan, T.J.; Eilers, J.M.

    1992-08-14

    A workshop was held in Corvallis, Oregon on April 9-10, 1992 at the offices of E&S Environmental Chemistry, Inc. The purpose of this workshop was to initiate research efforts on the project entitled "Incorporation of an organic acid representation into MAGIC (Model of Acidification of Groundwater in Catchments) and testing of the revised model using independent data sources." The workshop was attended by a team of internationally recognized experts in the fields of surface water acid-base chemistry, organic acids, and watershed modeling. The rationale for the proposed research is based on the recent comparison between MAGIC model hindcasts and paleolimnological inferences of historical acidification for a set of 33 statistically selected Adirondack lakes. Agreement between diatom-inferred and MAGIC-hindcast lakewater chemistry in the earlier research had been less than satisfactory. Based on preliminary analyses, it was concluded that incorporation of a reasonable organic acid representation into the version of MAGIC used for hindcasting was the logical next step toward improving model agreement.

  4. Empirical cost models for estimating power and energy consumption in database servers

    NASA Astrophysics Data System (ADS)

    Valdivia Garcia, Harold Dwight

    The explosive growth in the size of data centers, coupled with the widespread use of virtualization technology, has made power and energy consumption major concerns for data center administrators. Provisioning decisions must take into consideration not only target application performance but also the power demands and total energy consumption incurred by the hardware and software to be deployed at the data center. Failure to do so will result in damaged equipment, power outages, and inefficient operation. Since database servers comprise one of the most popular and important server applications deployed in such facilities, it becomes necessary to have accurate cost models that can predict the power and energy demands that each database workload will impose on the system. In this work we present an empirical methodology to estimate the power and energy cost of database operations. Our methodology uses multiple linear regression to derive accurate cost models that depend only on readily available statistics such as selectivity factors, tuple size, number of columns and relational cardinality. Moreover, our method does not need measurement of individual hardware components, but rather total power and energy consumption measured at a server. We have implemented our methodology, and ran experiments with several server configurations. Our experiments indicate that we can predict power and energy more accurately than alternative methods found in the literature.
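
    A multiple linear regression of the kind this abstract describes can be sketched in plain Python. This is an illustration under assumptions, not the author's implementation: the two predictors (selectivity and tuple size), the synthetic measurements, and the function names are hypothetical, and the fit solves the normal equations by Gauss-Jordan elimination, which is adequate only for small, well-conditioned problems.

```python
def fit_linear_model(rows, targets):
    """Ordinary least squares with an intercept; returns [intercept, coef_1, ...]."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    n = len(X[0])
    # Augmented normal equations: (X^T X | X^T y).
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) == 0.0:
            raise ValueError("predictors are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(beta, row):
    """Evaluate the fitted cost model for one operation."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


# Synthetic measurements (hypothetical): (selectivity, tuple size in bytes) -> watts,
# generated from power = 50 + 20 * selectivity + 0.1 * tuple_size.
rows = [(0.1, 100), (0.5, 100), (0.1, 200), (0.9, 300)]
watts = [62.0, 70.0, 72.0, 98.0]
beta = fit_linear_model(rows, watts)
print(beta, predict(beta, (0.5, 200)))
```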

  5. State Analysis Database Tool

    NASA Technical Reports Server (NTRS)

    Rasmussen, Robert; Bennett, Matthew

    2006-01-01

    The State Analysis Database Tool software establishes a productive environment for collaboration among software and system engineers engaged in the development of complex interacting systems. The tool embodies State Analysis, a model-based system engineering methodology founded on a state-based control architecture (see figure). A state represents a momentary condition of an evolving system, and a model may describe how a state evolves and is affected by other states. The State Analysis methodology is a process for capturing system and software requirements in the form of explicit models and states, and defining goal-based operational plans consistent with the models. Requirements, models, and operational concerns have traditionally been documented in a variety of system engineering artifacts that address different aspects of a mission's lifecycle. In State Analysis, requirements, models, and operations information are State Analysis artifacts that are consistent and stored in a State Analysis Database. The tool includes a back-end database, a multi-platform front-end client, and Web-based administrative functions. The tool is structured to prompt an engineer to follow the State Analysis methodology, to encourage state discovery and model description, and to make software requirements and operations plans consistent with model descriptions.

  6. NASA Records Database

    NASA Technical Reports Server (NTRS)

    Callac, Christopher; Lunsford, Michelle

    2005-01-01

    The NASA Records Database, comprising a Web-based application program and a database, is used to administer an archive of paper records at Stennis Space Center. The system begins with an electronic form, into which a user enters information about records that the user is sending to the archive. The form is "smart": it provides instructions for entering information correctly and prompts the user to enter all required information. Once complete, the form is digitally signed and submitted to the database. The system determines which storage locations are not in use, assigns the user's boxes of records to some of them, and enters these assignments in the database. Thereafter, the software tracks the boxes and can be used to locate them. By use of search capabilities of the software, specific records can be sought by box storage locations, accession numbers, record dates, submitting organizations, or details of the records themselves. Boxes can be marked with such statuses as checked out, lost, transferred, and destroyed. The system can generate reports showing boxes awaiting destruction or transfer. When boxes are transferred to the National Archives and Records Administration (NARA), the system can automatically fill out NARA records-transfer forms. Currently, several other NASA Centers are considering deploying the NASA Records Database to help automate their records archives.

  7. Associative memory model for searching an image database by image snippet

    NASA Astrophysics Data System (ADS)

    Khan, Javed I.; Yun, David Y.

    1994-09-01

    This paper presents an associative memory called multidimensional holographic associative computing (MHAC), which can potentially be used to perform feature-based image database queries using an image snippet. MHAC has the unique capability to selectively focus on specific segments of a query frame during associative retrieval. As a result, this model can perform search on the basis of featural significance described by a subset of the snippet pixels. This capability is critical for visual query in image databases because quite often the cognitive index features in the snippet are statistically weak. Unlike conventional artificial associative memories, MHAC uses a two-level representation and incorporates additional meta-knowledge about the reliability status of segments of information it receives and forwards. In this paper we present the analysis of focus characteristics of MHAC.

  8. The Listeria monocytogenes strain 10403S BioCyc database

    PubMed Central

    Orsi, Renato H.; Bergholz, Teresa M.; Wiedmann, Martin; Boor, Kathryn J.

    2015-01-01

    Listeria monocytogenes is a food-borne pathogen of humans and other animals. The striking ability to survive several stresses usually used for food preservation makes L. monocytogenes one of the biggest concerns to the food industry, while the high mortality of listeriosis in specific groups of humans makes it a great concern for public health. Previous studies have shown that a regulatory network involving alternative sigma (σ) factors and transcription factors is pivotal to stress survival. However, few studies have evaluated the metabolic networks controlled by these regulatory mechanisms. The L. monocytogenes BioCyc database uses the strain 10403S as a model. Computer-generated initial annotation for all genes also allowed for identification, annotation and display of predicted reactions and pathways carried out by a single cell. Further ongoing manual curation based on published data as well as database mining for selected genes allowed the more refined annotation of functions, which, in turn, allowed for annotation of new pathways and fine-tuning of previously defined pathways to more L. monocytogenes-specific pathways. Using RNA-Seq data, several transcription start sites and promoter regions were mapped to the 10403S genome and annotated within the database. Additionally, the identification of promoter regions and a comprehensive review of available literature allowed the annotation of several regulatory interactions involving σ factors and transcription factors. The L. monocytogenes 10403S BioCyc database is a new resource for researchers studying Listeria and related organisms. It allows users to (i) have a comprehensive view of all reactions and pathways predicted to take place within the cell in the cellular overview, as well as to (ii) upload their own data, such as differential expression data, to visualize the data in the scope of predicted pathways and regulatory networks and to carry out enrichment analyses using several different annotations.

  9. LeishCyc: a guide to building a metabolic pathway database and visualization of metabolomic data.

    PubMed

    Saunders, Eleanor C; MacRae, James I; Naderer, Thomas; Ng, Milica; McConville, Malcolm J; Likić, Vladimir A

    2012-01-01

    The complexity of the metabolic networks in even the simplest organisms has raised new challenges in organizing metabolic information. To address this, specialized computer frameworks have been developed to capture, manage, and visualize metabolic knowledge. The leading databases of metabolic information are those organized under the umbrella of the BioCyc project, which consists of the reference database MetaCyc, and a number of pathway/genome databases (PGDBs) each focussed on a specific organism. A number of PGDBs have been developed for bacterial, fungal, and protozoan pathogens, greatly facilitating dissection of the metabolic potential of these organisms and the identification of new drug targets. Leishmania are protozoan parasites belonging to the family Trypanosomatidae that cause a broad spectrum of diseases in humans. In this work we use the LeishCyc database, the BioCyc database for Leishmania major, to describe how to build a BioCyc database from genomic sequences and associated annotations. By using metabolomic data generated in our group, we show how such databases can be utilized to elucidate specific changes in parasite metabolism.

  10. Physiological Information Database (PID)

    EPA Science Inventory

    EPA has developed a physiological information database (created using Microsoft ACCESS) intended to be used in PBPK modeling. The database contains physiological parameter values for humans from early childhood through senescence as well as similar data for laboratory animal spec...

  11. FDA toxicity databases and real-time data entry.

    PubMed

    Arvidson, Kirk B

    2008-11-15

    Structure-searchable electronic databases are valuable new tools that are assisting the FDA in its mission to promptly and efficiently review incoming submissions for regulatory approval of new food additives and food contact substances. The Center for Food Safety and Applied Nutrition's Office of Food Additive Safety (CFSAN/OFAS), in collaboration with Leadscope, Inc., is consolidating genetic toxicity data submitted in food additive petitions from the 1960s to the present day. The Center for Drug Evaluation and Research, Office of Pharmaceutical Science's Informatics and Computational Safety Analysis Staff (CDER/OPS/ICSAS) is separately gathering similar information from their submissions. Presently, these data are distributed in various locations such as paper files, microfiche, and non-standardized toxicology memoranda. The organization of the data into a consistent, searchable format will reduce paperwork, expedite the toxicology review process, and provide valuable information to industry that is currently available only to the FDA. Furthermore, by combining chemical structures with genetic toxicity information, biologically active moieties can be identified and used to develop quantitative structure-activity relationship (QSAR) modeling and testing guidelines. Additionally, chemicals devoid of toxicity data can be compared to known structures, allowing for improved safety review through the identification and analysis of structural analogs. Four database frameworks have been created: bacterial mutagenesis, in vitro chromosome aberration, in vitro mammalian mutagenesis, and in vivo micronucleus. Controlled vocabularies for these databases have been established. The four separate genetic toxicity databases are compiled into a single, structurally-searchable database for easy accessibility of the toxicity information. Beyond the genetic toxicity databases described here, additional databases for subchronic, chronic, and teratogenicity studies have been prepared.

  12. FDA toxicity databases and real-time data entry

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arvidson, Kirk B.

    Structure-searchable electronic databases are valuable new tools that are assisting the FDA in its mission to promptly and efficiently review incoming submissions for regulatory approval of new food additives and food contact substances. The Center for Food Safety and Applied Nutrition's Office of Food Additive Safety (CFSAN/OFAS), in collaboration with Leadscope, Inc., is consolidating genetic toxicity data submitted in food additive petitions from the 1960s to the present day. The Center for Drug Evaluation and Research, Office of Pharmaceutical Science's Informatics and Computational Safety Analysis Staff (CDER/OPS/ICSAS) is separately gathering similar information from their submissions. Presently, these data are distributed in various locations such as paper files, microfiche, and non-standardized toxicology memoranda. The organization of the data into a consistent, searchable format will reduce paperwork, expedite the toxicology review process, and provide valuable information to industry that is currently available only to the FDA. Furthermore, by combining chemical structures with genetic toxicity information, biologically active moieties can be identified and used to develop quantitative structure-activity relationship (QSAR) modeling and testing guidelines. Additionally, chemicals devoid of toxicity data can be compared to known structures, allowing for improved safety review through the identification and analysis of structural analogs. Four database frameworks have been created: bacterial mutagenesis, in vitro chromosome aberration, in vitro mammalian mutagenesis, and in vivo micronucleus. Controlled vocabularies for these databases have been established. The four separate genetic toxicity databases are compiled into a single, structurally-searchable database for easy accessibility of the toxicity information. Beyond the genetic toxicity databases described here, additional databases for subchronic, chronic, and teratogenicity studies have been

  13. Using SQL Databases for Sequence Similarity Searching and Analysis.

    PubMed

    Pearson, William R; Mackey, Aaron J

    2017-09-13

    Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc.
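
    The idea of loading similarity search results into a relational database and summarizing them by taxon can be sketched with Python's built-in sqlite3 module. The two-table schema, the accession names, and the E-value threshold below are hypothetical and chosen for illustration; they are not the seqdb_demo or search_demo schemas described in the unit.

```python
import sqlite3

# Illustrative sketch only: hypothetical schema and toy data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE protein (
    acc   TEXT PRIMARY KEY,
    taxon TEXT NOT NULL
);
CREATE TABLE hit (
    query_acc   TEXT NOT NULL,
    subject_acc TEXT NOT NULL REFERENCES protein(acc),
    evalue      REAL NOT NULL
);
""")

conn.executemany("INSERT INTO protein VALUES (?, ?)", [
    ("subj_A1", "Archaea"),
    ("subj_B1", "Bacteria"),
    ("subj_E1", "Eukaryota"),
    ("subj_E2", "Eukaryota"),
])
conn.executemany("INSERT INTO hit VALUES (?, ?, ?)", [
    ("query_1", "subj_B1", 1e-50),
    ("query_1", "subj_E1", 1e-20),
    ("query_1", "subj_A1", 0.5),    # not significant, filtered out below
    ("query_2", "subj_E1", 1e-8),
    ("query_2", "subj_E2", 1e-30),
])

# For each query protein: in how many taxa does it have a significant
# hit, and what is its best (smallest) E-value?
rows = conn.execute("""
    SELECT h.query_acc,
           COUNT(DISTINCT p.taxon) AS n_taxa,
           MIN(h.evalue)           AS best_evalue
    FROM hit AS h
    JOIN protein AS p ON p.acc = h.subject_acc
    WHERE h.evalue <= 1e-6
    GROUP BY h.query_acc
    ORDER BY h.query_acc
""").fetchall()

for query_acc, n_taxa, best_evalue in rows:
    print(query_acc, n_taxa, best_evalue)
```

    Keeping hits and taxonomy in separate tables lets one join answer questions that would otherwise need custom parsing of search output.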

  14. PCoM-DB Update: A Protein Co-Migration Database for Photosynthetic Organisms.

    PubMed

    Takabayashi, Atsushi; Takabayashi, Saeka; Takahashi, Kaori; Watanabe, Mai; Uchida, Hiroko; Murakami, Akio; Fujita, Tomomichi; Ikeuchi, Masahiko; Tanaka, Ayumi

    2017-01-01

    The identification of protein complexes is important for the understanding of protein structure and function and the regulation of cellular processes. We used blue-native PAGE and tandem mass spectrometry to identify protein complexes systematically, and built a web database, the protein co-migration database (PCoM-DB, http://pcomdb.lowtem.hokudai.ac.jp/proteins/top), to provide prediction tools for protein complexes. PCoM-DB provides migration profiles for any given protein of interest, and allows users to compare them with migration profiles of other proteins, showing the oligomeric states of proteins and thus identifying potential interaction partners. The initial version of PCoM-DB (launched in January 2013) included protein complex data for Synechocystis whole cells and Arabidopsis thaliana thylakoid membranes. Here we report PCoM-DB version 2.0, which includes new data sets and analytical tools. Additional data are included from whole cells of the pelagic marine picocyanobacterium Prochlorococcus marinus, the thermophilic cyanobacterium Thermosynechococcus elongatus, the unicellular green alga Chlamydomonas reinhardtii and the bryophyte Physcomitrella patens. The Arabidopsis protein data now include data for intact mitochondria, intact chloroplasts, chloroplast stroma and chloroplast envelopes. The new tools comprise a multiple-protein search form and a heat map viewer for protein migration profiles. Users can compare migration profiles of a protein of interest among different organelles or compare migration profiles among different proteins within the same sample. For Arabidopsis proteins, users can compare migration profiles of a protein of interest with putative homologous proteins from non-Arabidopsis organisms. The updated PCoM-DB will help researchers find novel protein complexes and estimate their evolutionary changes in the green lineage. © The Author 2017. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. 

  15. A survey of commercial object-oriented database management systems

    NASA Technical Reports Server (NTRS)

    Atkins, John

    1992-01-01

    The object-oriented data model is the culmination of over thirty years of database research. Initially, database research focused on the need to provide information in a consistent and efficient manner to the business community. Early data models such as the hierarchical model and the network model met the goal of consistent and efficient access to data and were substantial improvements over simple file mechanisms for storing and accessing data. However, these models required highly skilled programmers to provide access to the data. Consequently, in the early 1970s E.F. Codd, an IBM research computer scientist, proposed a new data model based on the simple mathematical notion of the relation. This model is known as the Relational Model. In the relational model, data is represented in flat tables (or relations) which have no physical or internal links between them. The simplicity of this model fostered the development of powerful but relatively simple query languages that made data directly accessible to the general database user. Except for large, multi-user database systems, a database professional was in general no longer necessary. Database professionals found that traditional data in the form of character data, dates, and numeric data were easily represented and managed via the relational model. Commercial relational database management systems proliferated and performance of relational databases improved dramatically. However, there was a growing community of potential database users whose needs were not met by the relational model. These users needed to store data with data types not available in the relational model and required a far richer modelling environment than that provided by the relational model. Indeed, the complexity of the objects to be represented in the model mandated a new approach to database technology. The Object-Oriented Model was the result.

  16. The 2014 Nucleic Acids Research Database Issue and an updated NAR online Molecular Biology Database Collection.

    PubMed

    Fernández-Suárez, Xosé M; Rigden, Daniel J; Galperin, Michael Y

    2014-01-01

    The 2014 Nucleic Acids Research Database Issue includes descriptions of 58 new molecular biology databases and recent updates to 123 databases previously featured in NAR or other journals. For convenience, the issue is now divided into eight sections that reflect major subject categories. Among the highlights of this issue are six databases of the transcription factor binding sites in various organisms and updates on such popular databases as CAZy, Database of Genomic Variants (DGV), dbGaP, DrugBank, KEGG, miRBase, Pfam, Reactome, SEED, TCDB and UniProt. There is a strong block of structural databases, which includes, among others, the new RNA Bricks database, updates on PDBe, PDBsum, ArchDB, Gene3D, ModBase, Nucleic Acid Database and the recently revived iPfam database. An update on the NCBI's MMDB describes VAST+, an improved tool for protein structure comparison. Two articles highlight the development of the Structural Classification of Proteins (SCOP) database: one describes SCOPe, which automates assignment of new structures to the existing SCOP hierarchy; the other one describes the first version of SCOP2, with its more flexible approach to classifying protein structures. This issue also includes a collection of articles on bacterial taxonomy and metagenomics, which includes updates on the List of Prokaryotic Names with Standing in Nomenclature (LPSN), Ribosomal Database Project (RDP), the Silva/LTP project and several new metagenomics resources. The NAR online Molecular Biology Database Collection, http://www.oxfordjournals.org/nar/database/c/, has been expanded to 1552 databases. The entire Database Issue is freely available online on the Nucleic Acids Research website (http://nar.oxfordjournals.org/).

  17. QSPR model for bioconcentration factors of nonpolar organic compounds using molecular electronegativity distance vector descriptors.

    PubMed

    Qin, Li-Tang; Liu, Shu-Shen; Liu, Hai-Ling

    2010-02-01

    A five-variable model (model M2) was developed for the bioconcentration factors (BCFs) of nonpolar organic compounds (NPOCs) by using molecular electronegativity distance vector (MEDV) to characterize the structures of NPOCs and variable selection and modeling based on prediction (VSMP) to select the optimum descriptors. The estimated correlation coefficient (r²) and the leave-one-out cross-validation correlation coefficient (q²) of model M2 were 0.9271 and 0.9171, respectively. The model was externally validated by splitting the whole data set into a representative training set of 85 chemicals and a validation set of 29 chemicals. The results show that the main structural factors influencing the BCFs of NPOCs are -cCc, cCcc, -Cl, and -Br (where "-" refers to a single bond and "c" refers to a conjugated bond). The quantitative structure-property relationship (QSPR) model can effectively predict the BCFs of NPOCs, and the predictions of the model can also extend the current BCF database of experimental values.
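
    The relationship between a fitted r² and a leave-one-out q² can be sketched as follows. This is a minimal illustration under stated assumptions, not the MEDV/VSMP procedure: it uses a single hypothetical descriptor with made-up values rather than five descriptors, and it takes q² = 1 - PRESS / SS_tot with SS_tot computed around the mean of all observations, which is one common convention among several.

```python
# Illustrative sketch only: one hypothetical descriptor and toy data.

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys):
    """Coefficient of determination of the fit on all data."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def q_squared_loo(xs, ys):
    """Leave-one-out q^2: refit without each point, then predict it."""
    mean_y = sum(ys) / len(ys)
    press = 0.0  # predictive residual sum of squares
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot


descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # hypothetical descriptor values
log_bcf = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]      # hypothetical responses

r2 = r_squared(descriptor, log_bcf)
q2 = q_squared_loo(descriptor, log_bcf)
```

    For a least-squares fit, q² computed this way cannot exceed r², so a small gap between the two (as in the values reported above) suggests the model is not relying heavily on any single compound.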

  18. A PATO-compliant zebrafish screening database (MODB): management of morpholino knockdown screen information.

    PubMed

    Knowlton, Michelle N; Li, Tongbin; Ren, Yongliang; Bill, Brent R; Ellis, Lynda Bm; Ekker, Stephen C

    2008-01-07

    The zebrafish is a powerful model vertebrate amenable to high throughput in vivo genetic analyses. Examples include reverse genetic screens using morpholino knockdown, expression-based screening using enhancer trapping and forward genetic screening using transposon insertional mutagenesis. We have created a database to facilitate web-based distribution of data from such genetic studies. The MOrpholino DataBase is a MySQL relational database with an online, PHP interface. Multiple quality control levels allow differential access to data in raw and finished formats. MODBv1 includes sequence information relating to almost 800 morpholinos and their targets and phenotypic data regarding the dose effect of each morpholino (mortality, toxicity and defects). To improve the searchability of this database, we have incorporated a fixed-vocabulary defect ontology that allows for the organization of morpholino effects based on anatomical structure affected and defect produced. This also allows comparison between species utilizing Phenotypic Attribute Trait Ontology (PATO) designated terminology. MODB is also cross-linked with ZFIN, allowing full searches between the two databases. MODB offers users the ability to retrieve morpholino data by sequence of morpholino or target, name of target, anatomical structure affected and defect produced. MODB data can be used for functional genomic analysis of morpholino design to maximize efficacy and minimize toxicity. MODB also serves as a template for future sequence-based functional genetic screen databases, and it is currently being used as a model for the creation of a mutagenic insertional transposon database.

  19. Investigation of an artificial intelligence technology--Model trees. Novel applications for an immediate release tablet formulation database.

    PubMed

    Shao, Q; Rowe, R C; York, P

    2007-06-01

    This study has investigated an artificial intelligence technology - model trees - as a modelling tool applied to an immediate release tablet formulation database. The modelling performance was compared with artificial neural networks that have been well established and widely applied in the pharmaceutical product formulation fields. The predictability of generated models was validated on unseen data and judged by correlation coefficient R². Output from the model tree analyses produced multivariate linear equations which predicted tablet tensile strength, disintegration time, and drug dissolution profiles of similar quality to neural network models. However, additional and valuable knowledge hidden in the formulation database was extracted from these equations. It is concluded that, as a transparent technology, model trees are useful tools to formulators.

  20. Prediction of pelvic organ prolapse using an artificial neural network.

    PubMed

    Robinson, Christopher J; Swift, Steven; Johnson, Donna D; Almeida, Jonas S

    2008-08-01

    The objective of this investigation was to test the ability of a feedforward artificial neural network (ANN) to differentiate patients who have pelvic organ prolapse (POP) from those who retain good pelvic organ support. Following institutional review board approval, patients with POP (n = 87) and controls with good pelvic organ support (n = 368) were identified from the urogynecology research database. Historical and clinical information was extracted from the database. Data analysis included the training of a feedforward ANN, variable selection, and external validation of the model with an independent data set. Twenty variables were used. The median-performing ANN model used a median of 3 (quartile 1: 3; quartile 3: 5) variables and achieved an area under the receiver operating characteristic curve of 0.90 (external, independent validation set). Ninety percent sensitivity and 83% specificity were obtained in the external validation by ANN classification. Feedforward ANN modeling is applicable to the identification and prediction of POP.

  1. International forensic automotive paint database

    NASA Astrophysics Data System (ADS)

    Bishea, Gregory A.; Buckle, Joe L.; Ryland, Scott G.

    1999-02-01

    The Technical Working Group for Materials Analysis (TWGMAT) is supporting an international forensic automotive paint database. The Federal Bureau of Investigation and the Royal Canadian Mounted Police (RCMP) are collaborating on this effort through TWGMAT. This paper outlines the support and further development of the RCMP's Automotive Paint Database, 'Paint Data Query'. This cooperative agreement augments and supports a current, validated, searchable, automotive paint database that is used to identify make(s), model(s), and year(s) of questioned paint samples in hit-and-run fatalities and other associated investigations involving automotive paint.

  2. Very fast road database verification using textured 3D city models obtained from airborne imagery

    NASA Astrophysics Data System (ADS)

    Bulatov, Dimitri; Ziems, Marcel; Rottensteiner, Franz; Pohl, Melanie

    2014-10-01

    Road databases are known to be an important part of any geodata infrastructure, e.g. as the basis for urban planning or emergency services. Updating road databases for crisis events must be performed quickly and with the highest possible degree of automation. We present a semi-automatic algorithm for road verification using textured 3D city models, starting from aerial or even UAV images. This algorithm contains two processes, which exchange input and output, but basically run independently from each other. These processes are textured urban terrain reconstruction and road verification. The first process contains a dense photogrammetric reconstruction of 3D geometry of the scene using depth maps. The second process is our core procedure, since it contains various methods for road verification. Each method represents a unique road model and a specific strategy, and thus is able to deal with a specific type of road. Each method is designed to provide two probability distributions, where the first describes the state of a road object (correct, incorrect), and the second describes the state of its underlying road model (applicable, not applicable). Based on the Dempster-Shafer Theory, both distributions are mapped to a single distribution that refers to three states: correct, incorrect, and unknown. With respect to the interaction of both processes, the normalized elevation map and the digital orthophoto generated during 3D reconstruction are the necessary input - together with initial road database entries - for the road verification process. If the entries of the database are too obsolete or not available at all, sensor data evaluation enables classification of the road pixels of the elevation map followed by road map extraction by means of vectorization and filtering of the geometrically and topologically inconsistent objects. Depending on the time issue and availability of a geo-database for buildings, the urban terrain reconstruction procedure has semantic models
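
    The three-state representation (correct, incorrect, unknown) mentioned in this abstract can be illustrated with a small Dempster-Shafer sketch. Two things here are assumptions rather than details taken from the paper: the mapping in to_mass, which treats the probability that a road model is not applicable as mass on "unknown", and the use of Dempster's rule of combination to merge the outputs of two verification methods. The function names and input values are hypothetical.

```python
# Illustrative sketch only: the mapping and the inputs are assumptions.

def to_mass(p_correct, p_applicable):
    """Map (road state, model state) probabilities to three-state masses.

    Assumed mapping: belief in the road state is scaled by how applicable
    the road model is; the remainder is assigned to 'unknown'.
    """
    return {
        "correct": p_correct * p_applicable,
        "incorrect": (1.0 - p_correct) * p_applicable,
        "unknown": 1.0 - p_applicable,
    }


def combine(m1, m2):
    """Dempster's rule on the frame {correct, incorrect}.

    'unknown' is the mass on the whole frame, so it is compatible with
    both singletons; only correct-vs-incorrect pairs are in conflict.
    """
    conflict = (m1["correct"] * m2["incorrect"]
                + m1["incorrect"] * m2["correct"])
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    norm = 1.0 - conflict
    return {
        "correct": (m1["correct"] * m2["correct"]
                    + m1["correct"] * m2["unknown"]
                    + m1["unknown"] * m2["correct"]) / norm,
        "incorrect": (m1["incorrect"] * m2["incorrect"]
                      + m1["incorrect"] * m2["unknown"]
                      + m1["unknown"] * m2["incorrect"]) / norm,
        "unknown": m1["unknown"] * m2["unknown"] / norm,
    }


method_a = to_mass(p_correct=0.9, p_applicable=0.8)  # hypothetical method output
method_b = to_mass(p_correct=0.6, p_applicable=0.5)  # hypothetical method output
fused = combine(method_a, method_b)
```

    A method whose road model does not apply at all contributes only "unknown" mass and therefore leaves the other method's result unchanged, which is the behaviour one would want from a verification method applied to the wrong road type.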

  3. MIPS PlantsDB: a database framework for comparative plant genome research.

    PubMed

    Nussbaumer, Thomas; Martis, Mihaela M; Roessner, Stephan K; Pfeifer, Matthias; Bader, Kai C; Sharma, Sapna; Gundlach, Heidrun; Spannagl, Manuel

    2013-01-01

    The rapidly increasing amount of plant genome (sequence) data enables powerful comparative analyses and integrative approaches and also requires structured and comprehensive information resources. Databases are needed for both model and crop plant organisms and both intuitive search/browse views and comparative genomics tools should communicate the data to researchers and help them interpret it. MIPS PlantsDB (http://mips.helmholtz-muenchen.de/plant/genomes.jsp) was initially described in NAR in 2007 [Spannagl,M., Noubibou,O., Haase,D., Yang,L., Gundlach,H., Hindemitt, T., Klee,K., Haberer,G., Schoof,H. and Mayer,K.F. (2007) MIPSPlantsDB-plant database resource for integrative and comparative plant genome research. Nucleic Acids Res., 35, D834-D840] and was set up from the start to provide data and information resources for individual plant species as well as a framework for integrative and comparative plant genome research. PlantsDB comprises database instances for tomato, Medicago, Arabidopsis, Brachypodium, Sorghum, maize, rice, barley and wheat. Building on that, state-of-the-art comparative genomics tools such as CrowsNest are integrated to visualize and investigate syntenic relationships between monocot genomes. Results from novel genome analysis strategies targeting the complex and repetitive genomes of Triticeae species (wheat and barley) are provided and cross-linked with model species. The MIPS Repeat Element Database (mips-REdat) and Catalog (mips-REcat) as well as tight connections to other databases, e.g. via web services, are further important components of PlantsDB.

  4. MIPS PlantsDB: a database framework for comparative plant genome research

    PubMed Central

    Nussbaumer, Thomas; Martis, Mihaela M.; Roessner, Stephan K.; Pfeifer, Matthias; Bader, Kai C.; Sharma, Sapna; Gundlach, Heidrun; Spannagl, Manuel

    2013-01-01

    The rapidly increasing amount of plant genome (sequence) data enables powerful comparative analyses and integrative approaches and also requires structured and comprehensive information resources. Databases are needed for both model and crop plant organisms and both intuitive search/browse views and comparative genomics tools should communicate the data to researchers and help them interpret it. MIPS PlantsDB (http://mips.helmholtz-muenchen.de/plant/genomes.jsp) was initially described in NAR in 2007 [Spannagl,M., Noubibou,O., Haase,D., Yang,L., Gundlach,H., Hindemitt, T., Klee,K., Haberer,G., Schoof,H. and Mayer,K.F. (2007) MIPSPlantsDB–plant database resource for integrative and comparative plant genome research. Nucleic Acids Res., 35, D834–D840] and was set up from the start to provide data and information resources for individual plant species as well as a framework for integrative and comparative plant genome research. PlantsDB comprises database instances for tomato, Medicago, Arabidopsis, Brachypodium, Sorghum, maize, rice, barley and wheat. Building on that, state-of-the-art comparative genomics tools such as CrowsNest are integrated to visualize and investigate syntenic relationships between monocot genomes. Results from novel genome analysis strategies targeting the complex and repetitive genomes of Triticeae species (wheat and barley) are provided and cross-linked with model species. The MIPS Repeat Element Database (mips-REdat) and Catalog (mips-REcat) as well as tight connections to other databases, e.g. via web services, are further important components of PlantsDB. PMID:23203886

  5. Guide on Data Models in the Selection and Use of Database Management Systems. Final Report.

    ERIC Educational Resources Information Center

    Gallagher, Leonard J.; Draper, Jesse M.

    A tutorial introduction to data models in general is provided, with particular emphasis on the relational and network models defined by the two proposed ANSI (American National Standards Institute) database language standards. Examples based on the network and relational models include specific syntax and semantics, while examples from the other…

  6. Uses and limitations of registry and academic databases.

    PubMed

    Williams, William G

    2010-01-01

    A database is simply a structured collection of information. A clinical database may be a Registry (a limited amount of data for every patient undergoing heart surgery) or Academic (an organized and extensive dataset of an inception cohort of carefully selected subset of patients). A registry and an academic database have different purposes and cost. The data to be collected for a database is defined by its purpose and the output reports required for achieving that purpose. A Registry's purpose is to ensure quality care, an Academic Database, to discover new knowledge through research. A database is only as good as the data it contains. Database personnel must be exceptionally committed and supported by clinical faculty. A system to routinely validate and verify data integrity is essential to ensure database utility. Frequent use of the database improves its accuracy. For congenital heart surgeons, routine use of a Registry Database is an essential component of clinical practice. Copyright (c) 2010 Elsevier Inc. All rights reserved.

  7. An Internet service for manipulating 3D models of human organs reconstructed from computer tomography and magnetic resonance imaging.

    PubMed

    Clapworthy, G; Krokos, M; Vasilonikolidakis, N

    1997-11-01

    Our paper describes an integrated methodology addressing the development of an Internet service for medical professionals, medical students and generally, people interested in medicine. The service (currently developed in the framework of IAEVA, a Telematics Application Programme project of the European Union), incorporates a mechanism for retrieving from a relational database (reference library) 3D volumetric models of human organs reconstructed from computer tomography (CT) and/or magnetic resonance imaging (MRI). Retrieval is implemented in a way transparent to the actual physical location of the database. Prospective users are provided with a Solid Object Viewer that offers them manipulation (rotation, zooming, dissection etc.) of 3D volumetric models. The service constitutes an excellent foundation of understanding for medical professionals/students and a mechanism for broad and rapid dissemination of information related to particular pathological conditions; although pathological conditions of the knee and skin are supported currently, our methodology allows easy service extension into other human organs ultimately covering the entire human body. The service accepts most Internet browsers and supports MS-Windows 32 platforms; no graphics accelerators or any specialised hardware are necessary, thereby allowing service availability to the widest possible audience. Nevertheless, the service operates in near real-time not only over high speed expensive network lines but also over low/medium network connections.

  8. A linear solvation energy relationship model of organic chemical partitioning to dissolved organic carbon.

    PubMed

    Kipka, Undine; Di Toro, Dominic M

    2011-09-01

    Predicting the association of contaminants with both particulate and dissolved organic matter is critical in determining the fate and bioavailability of chemicals in environmental risk assessment. To date, the association of a contaminant to particulate organic matter is considered in many multimedia transport models, but the effect of dissolved organic matter is typically ignored due to a lack of either reliable models or experimental data. The partition coefficient to dissolved organic carbon (K(DOC)) may be used to estimate the fraction of a contaminant that is associated with dissolved organic matter. Models relating K(DOC) to the octanol-water partition coefficient (K(OW)) have not been successful for many types of dissolved organic carbon in the environment. Instead, linear solvation energy relationships are proposed to model the association of chemicals with dissolved organic matter. However, more chemically diverse K(DOC) data are needed to produce a more robust model. For humic acid dissolved organic carbon, the linear solvation energy relationship predicts log K(DOC) with a root mean square error of 0.43. Copyright © 2011 SETAC.

  9. PathCase-SB architecture and database design

    PubMed Central

    2011-01-01

    Background: Integration of metabolic pathways resources and regulatory metabolic network models, and deploying new tools on the integrated platform can help perform more effective and more efficient systems biology research on understanding the regulation in metabolic networks. Therefore, the tasks of (a) integrating under a single database environment regulatory metabolic networks and existing models, and (b) building tools to help with modeling and analysis are desirable and intellectually challenging computational tasks. Description: PathCase Systems Biology (PathCase-SB) is built and released. The PathCase-SB database provides data and API for multiple user interfaces and software tools. The current PathCase-SB system provides a database-enabled framework and web-based computational tools towards facilitating the development of kinetic models for biological systems. PathCase-SB aims to integrate data of selected biological data sources on the web (currently, BioModels database and KEGG), and to provide more powerful and/or new capabilities via the new web-based integrative framework. This paper describes architecture and database design issues encountered in PathCase-SB's design and implementation, and presents the current design of PathCase-SB's architecture and database. Conclusions: PathCase-SB architecture and database provide a highly extensible and scalable environment with easy and fast (real-time) access to the data in the database. PathCase-SB itself is already being used by researchers across the world. PMID:22070889

  10. OBSIFRAC: database-supported software for 3D modeling of rock mass fragmentation

    NASA Astrophysics Data System (ADS)

    Empereur-Mot, Luc; Villemin, Thierry

    2003-03-01

    Under stress, fractures in rock masses tend to form fully connected networks. The mass can thus be thought of as a 3D series of blocks produced by fragmentation processes. A numerical model has been developed that uses a relational database to describe such a mass. The model, which assumes the fractures to be plane, allows data from natural networks to test theories concerning fragmentation processes. In the model, blocks are bordered by faces that are composed of edges and vertices. A fracture can originate from a seed point, its orientation being controlled by the stress field specified by an orientation matrix. Alternatively, it can be generated from a discrete set of given orientations and positions. Both kinds of fracture can occur together in a model. From an original simple block, a given fracture produces two simple polyhedral blocks, and the original block becomes compound. Compound and simple blocks created throughout fragmentation are stored in the database. Several fragmentation processes have been studied. In one scenario, a constant proportion of blocks is fragmented at each step of the process. The resulting distribution appears to be fractal, although seed points are random in each fragmented block. In a second scenario, division affects only one random block at each stage of the process, and gives a Weibull volume distribution law. This software can be used for a large number of other applications.
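    The bookkeeping described above (a fracture turns one simple block into a compound block with two simple children, all stored in a relational database) can be sketched as follows. This is a minimal illustration, not the OBSIFRAC schema: the table name `block`, its columns, and the `split` helper are hypothetical, and the geometry (faces, edges, vertices) is replaced here by a single assumed volume fraction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: each block records its parent and whether it is
# still 'simple' or has become 'compound' after being fractured.
conn.execute(
    "CREATE TABLE block (id INTEGER PRIMARY KEY, parent_id INTEGER, "
    "kind TEXT, volume REAL)"
)
conn.execute(
    "INSERT INTO block (parent_id, kind, volume) VALUES (NULL, 'simple', 1.0)"
)


def split(conn, block_id, fraction):
    """Fracture a simple block into two simple children (assumed volume split)."""
    kind, volume = conn.execute(
        "SELECT kind, volume FROM block WHERE id = ?", (block_id,)
    ).fetchone()
    if kind != "simple":
        raise ValueError("only simple blocks can be fractured")
    # The original block is kept in the database but becomes compound.
    conn.execute("UPDATE block SET kind = 'compound' WHERE id = ?", (block_id,))
    conn.executemany(
        "INSERT INTO block (parent_id, kind, volume) VALUES (?, 'simple', ?)",
        [(block_id, volume * fraction), (block_id, volume * (1 - fraction))],
    )


split(conn, 1, 0.25)
simple_volumes = conn.execute(
    "SELECT volume FROM block WHERE kind = 'simple' ORDER BY id"
).fetchall()
compound_ids = conn.execute(
    "SELECT id FROM block WHERE kind = 'compound'"
).fetchall()
```

    Keeping compound blocks in the table, rather than deleting them, preserves the fragmentation history, which is what allows volume distributions to be examined at each step.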

  11. Statistical modeling of occupational chlorinated solvent exposures for case–control studies using a literature-based database

    PubMed Central

    Hein, Misty J.; Waters, Martha A.; Ruder, Avima M.; Stenzel, Mark R.; Blair, Aaron; Stewart, Patricia A.

    2010-01-01

    Objectives: Occupational exposure assessment for population-based case–control studies is challenging due to the wide variety of industries and occupations encountered by study participants. We developed and evaluated statistical models to estimate the intensity of exposure to three chlorinated solvents—methylene chloride, 1,1,1-trichloroethane, and trichloroethylene—using a database of air measurement data and associated exposure determinants. Methods: A measurement database was developed after an extensive review of the published industrial hygiene literature. The database of nearly 3000 measurements or summary measurements included sample size, measurement characteristics (year, duration, and type), and several potential exposure determinants associated with the measurements: mechanism of release (e.g. evaporation), process condition, temperature, usage rate, type of ventilation, location, presence of a confined space, and proximity to the source. The natural log-transformed measurement levels in the exposure database were modeled as a function of the measurement characteristics and exposure determinants using maximum likelihood methods. Assuming a single lognormal distribution of the measurements, an arithmetic mean exposure intensity level was estimated for each unique combination of exposure determinants and decade. Results: The proportions of variability in the measurement data explained by the modeled measurement characteristics and exposure determinants were 36, 38, and 54% for methylene chloride, 1,1,1-trichloroethane, and trichloroethylene, respectively. Model parameter estimates for the exposure determinants were in the anticipated direction. Exposure intensity estimates were plausible and exhibited internal consistency, but the ability to evaluate validity was limited. Conclusions: These prediction models can be used to estimate chlorinated solvent exposure intensity for jobs reported by population-based case–control study participants that

  12. A prediction model-based algorithm for computer-assisted database screening of adverse drug reactions in the Netherlands.

    PubMed

    Scholl, Joep H G; van Hunsel, Florence P A M; Hak, Eelko; van Puijenbroek, Eugène P

    2018-02-01

    The statistical screening of pharmacovigilance databases containing spontaneously reported adverse drug reactions (ADRs) is mainly based on disproportionality analysis. The aim of this study was to improve the efficiency of full database screening using a prediction model-based approach. A logistic regression-based prediction model containing 5 candidate predictors was developed and internally validated using the Summary of Product Characteristics as the gold standard for the outcome. All drug-ADR associations, with the exception of those related to vaccines, with a minimum of 3 reports formed the training data for the model. Performance was based on the area under the receiver operating characteristic curve (AUC). Results were compared with the current method of database screening based on the number of previously analyzed associations. A total of 25 026 unique drug-ADR associations formed the training data for the model. The final model contained all 5 candidate predictors (number of reports, disproportionality, reports from healthcare professionals, reports from marketing authorization holders, Naranjo score). The AUC for the full model was 0.740 (95% CI; 0.734-0.747). The internal validity was good based on the calibration curve and bootstrapping analysis (AUC after bootstrapping = 0.739). Compared with the old method, the AUC increased from 0.649 to 0.740, and the proportion of potential signals increased by approximately 50% (from 12.3% to 19.4%). A prediction model-based approach can be a useful tool to create priority-based listings for signal detection in databases consisting of spontaneous ADRs. © 2017 The Authors. Pharmacoepidemiology & Drug Safety Published by John Wiley & Sons Ltd.
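    The performance measure used above, the area under the ROC curve, can be illustrated with a short sketch. It uses the rank-based (Mann-Whitney) formulation, in which the AUC is the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The function name and the toy scores and labels are hypothetical and are not taken from the study.

```python
def auc(scores, labels):
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: predicted probabilities and whether the association
# is a labeled reaction (1) or not (0). Values are made up.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
toy_auc = auc(toy_scores, toy_labels)
```

    An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the reported change from 0.649 to 0.740.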

  13. Teaching Case: Introduction to NoSQL in a Traditional Database Course

    ERIC Educational Resources Information Center

    Fowler, Brad; Godin, Joy; Geddy, Margaret

    2016-01-01

    Many organizations are dealing with the increasing demands of big data, so they are turning to NoSQL databases as their preferred system for handling the unique problems of capturing and storing massive amounts of data. Therefore, it is likely that employees in all sizes of organizations will encounter NoSQL databases. Thus, to be more job-ready,…

  14. A Conceptual Model and Database to Integrate Data and Project Management

    NASA Astrophysics Data System (ADS)

    Guarinello, M. L.; Edsall, R.; Helbling, J.; Evaldt, E.; Glenn, N. F.; Delparte, D.; Sheneman, L.; Schumaker, R.

    2015-12-01

    Data management is critically foundational to doing effective science in our data-intensive research era and done well can enhance collaboration, increase the value of research data, and support requirements by funding agencies to make scientific data and other research products available through publicly accessible online repositories. However, there are few examples (but see the Long-term Ecological Research Network Data Portal) of these data being provided in such a manner that allows exploration within the context of the research process - what specific research questions do these data seek to answer? what data were used to answer these questions? what data would have been helpful to answer these questions but were not available? We propose an agile conceptual model and database design, as well as example results, that integrate data management with project management not only to maximize the value of research data products but to enhance collaboration during the project and the process of project management itself. In our project, which we call 'Data Map,' we used agile principles by adopting a user-focused approach and by designing our database to be simple, responsive, and expandable. We initially designed Data Map for the Idaho EPSCoR project "Managing Idaho's Landscapes for Ecosystem Services (MILES)" (see https://www.idahoecosystems.org//) and will present example results for this work. We consulted with our primary users (project managers, data managers, and researchers) to design the Data Map. Results will be useful to project managers and to funding agencies reviewing progress because they will readily provide answers to the questions "For which research projects/questions are data available and/or being generated by MILES researchers?" and "Which research projects/questions are associated with each of the 3 primary questions from the MILES proposal?" To be responsive to the needs of the project, we chose to streamline our design for the prototype

  15. SORTEZ: a relational translator for NCBI's ASN.1 database.

    PubMed

    Hart, K W; Searls, D B; Overton, G C

    1994-07-01

    The National Center for Biotechnology Information (NCBI) has created a database collection that includes several protein and nucleic acid sequence databases, a biosequence-specific subset of MEDLINE, as well as value-added information such as links between similar sequences. Information in the NCBI database is modeled in Abstract Syntax Notation 1 (ASN.1), an Open Systems Interconnection protocol designed for the purpose of exchanging structured data between software applications rather than as a data model for database systems. While the NCBI database is distributed with an easy-to-use information retrieval system, ENTREZ, the ASN.1 data model currently lacks an ad hoc query language for general-purpose data access. For that reason, we have developed a software package, SORTEZ, that transforms the ASN.1 database (or other databases with nested data structures) to a relational data model and subsequently to a relational database management system (Sybase) where information can be accessed through the relational query language, SQL. Because the need to transform data from one data model and schema to another arises naturally in several important contexts, including efficient execution of specific applications, access to multiple databases and adaptation to database evolution, this work also serves as a practical study of the issues involved in the various stages of database transformation. We show that transformation from the ASN.1 data model to a relational data model can be largely automated, but that schema transformation and data conversion require considerable domain expertise and would greatly benefit from additional support tools.
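    The general idea of turning a nested record into relational tables that can be queried ad hoc with SQL can be sketched as below. This is only an illustration under stated assumptions: it uses SQLite rather than Sybase, a Python dictionary stands in for an ASN.1 entry, and the record contents, table names, column names, and the `to_rows` helper are all hypothetical rather than part of SORTEZ.

```python
import sqlite3

# Hypothetical nested record standing in for an ASN.1-style entry.
entry = {
    "accession": "X00001",
    "title": "example sequence",
    "features": [
        {"kind": "gene", "start": 10, "stop": 200},
        {"kind": "CDS", "start": 50, "stop": 180},
    ],
}


def to_rows(entry, entry_id):
    """Split one nested entry into a parent row and its child rows."""
    parent = (entry_id, entry["accession"], entry["title"])
    children = [
        (entry_id, f["kind"], f["start"], f["stop"]) for f in entry["features"]
    ]
    return parent, children


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry (id INTEGER PRIMARY KEY, accession TEXT, title TEXT)"
)
# The nested list becomes a child table keyed back to its parent entry.
conn.execute(
    "CREATE TABLE feature (entry_id INTEGER REFERENCES entry(id), "
    "kind TEXT, feat_start INTEGER, feat_stop INTEGER)"
)

parent, children = to_rows(entry, 1)
conn.execute("INSERT INTO entry VALUES (?, ?, ?)", parent)
conn.executemany("INSERT INTO feature VALUES (?, ?, ?, ?)", children)

# An ad hoc query that the nested form alone does not support directly.
rows = conn.execute(
    "SELECT e.accession, f.kind FROM entry e "
    "JOIN feature f ON f.entry_id = e.id WHERE f.feat_start >= 50"
).fetchall()
```

    The mechanical part (one child table per nested list, linked by a foreign key) is the kind of step that lends itself to automation; deciding which nested structures deserve their own tables is where domain knowledge comes in.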

  16. Creating a model to detect dairy cattle farms with poor welfare using a national database.

    PubMed

    Krug, C; Haskell, M J; Nunes, T; Stilwell, G

    2015-12-01

    The objective of this study was to determine whether dairy farms with poor cow welfare could be identified using a national database for bovine identification and registration that monitors cattle deaths and movements. The welfare of dairy cattle was assessed using the Welfare Quality(®) protocol (WQ) on 24 Portuguese dairy farms and on 1930 animals. Five farms were classified as having poor welfare and the other 19 were classified as having good welfare. Fourteen million records from the national cattle database were analysed to identify potential welfare indicators for dairy farms. Fifteen potential national welfare indicators were calculated based on that database, and the link between the results on the WQ evaluation and the national cattle database was made using the identification code of each farm. Within the potential national welfare indicators, only two were significantly different between farms with good welfare and poor welfare, 'proportion of on-farm deaths' (p<0.01) and 'female/male birth ratio' (p<0.05). To determine whether the database welfare indicators could be used to distinguish farms with good welfare from farms with poor welfare, we created a model using the classifier J48 of Waikato Environment for Knowledge Analysis. The model was a decision tree based on two variables, 'proportion of on-farm deaths' and 'calving-to-calving interval', and it was able to correctly identify 70% and 79% of the farms classified as having poor and good welfare, respectively. The national cattle database analysis could be useful in helping official veterinary services in detecting farms that have poor welfare and also in determining which welfare indicators are poor on each particular farm. Copyright © 2015 Elsevier B.V. All rights reserved.

  17. A Canadian upland forest soil profile and carbon stocks database.

    PubMed

    Shaw, Cindy; Hilger, Arlene; Filiatrault, Michelle; Kurz, Werner

    2018-04-01

    "A Canadian upland forest soil profile and carbon stocks database" was compiled in phases over a period of 10 years to address various questions related to modeling upland forest soil carbon in a national forest carbon accounting model. For 3,253 pedons, the SITES table contains estimates for soil organic carbon stocks (Mg/ha) in organic horizons and mineral horizons to a 100-cm depth, soil taxonomy, leading tree species, mean annual temperature, annual precipitation, province or territory, terrestrial ecozone, and latitude and longitude, with an assessment of the quality of information about location. The PROFILES table contains profile data (16,167 records by horizon) used to estimate the carbon stocks that appear in the SITES table, plus additional soil chemical and physical data, where provided by the data source. The exceptions to this are estimates for soil carbon stocks based on Canadian National Forest Inventory data (NFI [2006] in REFERENCES table), where data were collected by depth increment rather than horizon and, therefore, total soil carbon stocks were calculated separately before being entered into the SITES table. Data in the PROFILES table include the carbon stock estimate for each horizon (corrected for coarse fragment content), and the data used to calculate the carbon stock estimate, such as horizon thickness, bulk density, and percent organic carbon. The PROFILES table also contains data, when reported by the source, for percent carbonate carbon, pH, percent total nitrogen, particle size distribution (percent sand, silt, clay), texture class, exchangeable cations, cation and total exchange capacity, and percent Fe and Al. An additional table provides references (REFERENCES table) for the source data. Earlier versions of the database were used to develop national soil carbon modeling categories based on differences in carbon stocks linked to soil taxonomy and to examine the potential of using soil taxonomy and leading tree species to improve

  18. A database and tool for boundary conditions for regional air quality modeling: description and evaluation

    NASA Astrophysics Data System (ADS)

    Henderson, B. H.; Akhtar, F.; Pye, H. O. T.; Napelenok, S. L.; Hutzell, W. T.

    2014-02-01

    Transported air pollutants receive increasing attention as regulations tighten and global concentrations increase. The need to represent international transport in regional air quality assessments requires improved representation of boundary concentrations. Currently available observations are too sparse vertically to provide boundary information, particularly for ozone precursors, but global simulations can be used to generate spatially and temporally varying lateral boundary conditions (LBC). This study presents a public database of global simulations designed and evaluated for use as LBC for air quality models (AQMs). The database covers the contiguous United States (CONUS) for the years 2001-2010 and contains hourly varying concentrations of ozone, aerosols, and their precursors. The database is complemented by a tool for configuring the global results as inputs to regional scale models (e.g., Community Multiscale Air Quality or Comprehensive Air quality Model with extensions). This study also presents an example application based on the CONUS domain, which is evaluated against satellite retrieved ozone and carbon monoxide vertical profiles. The results show performance is largely within uncertainty estimates for ozone from the Ozone Monitoring Instrument and carbon monoxide from the Measurements Of Pollution In The Troposphere (MOPITT), but there were some notable biases compared with Tropospheric Emission Spectrometer (TES) ozone. Compared with TES, our ozone predictions are high-biased in the upper troposphere, particularly in the south during January. This publication documents the global simulation database, the tool for conversion to LBC, and the evaluation of concentrations on the boundaries. This documentation is intended to support applications that require representation of long-range transport of air pollutants.

  19. The Cardiac Atlas Project--an imaging database for computational modeling and statistical atlases of the heart.

    PubMed

    Fonseca, Carissa G; Backhaus, Michael; Bluemke, David A; Britten, Randall D; Chung, Jae Do; Cowan, Brett R; Dinov, Ivo D; Finn, J Paul; Hunter, Peter J; Kadish, Alan H; Lee, Daniel C; Lima, Joao A C; Medrano-Gracia, Pau; Shivkumar, Kalyanam; Suinesiaputra, Avan; Tao, Wenchao; Young, Alistair A

    2011-08-15

    Integrative mathematical and statistical models of cardiac anatomy and physiology can play a vital role in understanding cardiac disease phenotype and planning therapeutic strategies. However, the accuracy and predictive power of such models is dependent upon the breadth and depth of noninvasive imaging datasets. The Cardiac Atlas Project (CAP) has established a large-scale database of cardiac imaging examinations and associated clinical data in order to develop a shareable, web-accessible, structural and functional atlas of the normal and pathological heart for clinical, research and educational purposes. A goal of CAP is to facilitate collaborative statistical analysis of regional heart shape and wall motion and characterize cardiac function among and within population groups. Three main open-source software components were developed: (i) a database with web-interface; (ii) a modeling client for 3D + time visualization and parametric description of shape and motion; and (iii) open data formats for semantic characterization of models and annotations. The database was implemented using a three-tier architecture utilizing MySQL, JBoss and Dcm4chee, in compliance with the DICOM standard to provide compatibility with existing clinical networks and devices. Parts of Dcm4chee were extended to access image specific attributes as search parameters. To date, approximately 3000 de-identified cardiac imaging examinations are available in the database. All software components developed by the CAP are open source and are freely available under the Mozilla Public License Version 1.1 (http://www.mozilla.org/MPL/MPL-1.1.txt). Availability: http://www.cardiacatlas.org; contact: a.young@auckland.ac.nz. Supplementary data are available at Bioinformatics online.

  20. MEPD: a Medaka gene expression pattern database

    PubMed Central

    Henrich, Thorsten; Ramialison, Mirana; Quiring, Rebecca; Wittbrodt, Beate; Furutani-Seiki, Makoto; Wittbrodt, Joachim; Kondoh, Hisato

    2003-01-01

    The Medaka Expression Pattern Database (MEPD) stores and integrates information of gene expression during embryonic development of the small freshwater fish Medaka (Oryzias latipes). Expression patterns of genes identified by ESTs are documented by images and by descriptions through parameters such as staining intensity, category and comments and through a comprehensive, hierarchically organized dictionary of anatomical terms. Sequences of the ESTs are available and searchable through BLAST. ESTs in the database are clustered upon entry and have been blasted against public databases. The BLAST results are updated regularly, stored within the database and searchable. The MEPD is a project within the Medaka Genome Initiative (MGI) and entries will be interconnected to integrated genomic map databases. MEPD is accessible through the WWW at http://medaka.dsp.jst.go.jp/MEPD. PMID:12519950

  1. Creation of the NaSCoRD Database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Denman, Matthew R.; Jankovsky, Zachary Kyle; Stuart, William

    This report was written as part of a United States Department of Energy (DOE), Office of Nuclear Energy, Advanced Reactor Technologies program funded project to re-create the capabilities of the legacy Centralized Reliability Database Organization (CREDO) database. The CREDO database provided a record of component design and performance documentation across various systems that used sodium as a working fluid. Regaining this capability will allow the DOE complex and the domestic sodium reactor industry to better understand how previous systems were designed and built for use in improving the design and operations of future loops. The contents of this report include: an overview of the current state of domestic sodium reliability databases; a summary of the ongoing effort to improve, understand, and process the CREDO information; a summary of the initial efforts to develop a unified sodium reliability database called the Sodium System Component Reliability Database (NaSCoRD); and an explanation of both how potential users can access the domestic sodium reliability databases and the type of information that can be accessed from these databases.

  2. Mouse Genome Database: From sequence to phenotypes and disease models

    PubMed Central

    Richardson, Joel E.; Kadin, James A.; Smith, Cynthia L.; Blake, Judith A.; Bult, Carol J.

    2015-01-01

    Summary: The Mouse Genome Database (MGD, www.informatics.jax.org) is the international scientific database for genetic, genomic, and biological data on the laboratory mouse to support the research requirements of the biomedical community. To accomplish this goal, MGD provides broad data coverage, serves as the authoritative standard for mouse nomenclature for genes, mutants, and strains, and curates and integrates many types of data from literature and electronic sources. Among the key data sets MGD supports are: the complete catalog of mouse genes and genome features, comparative homology data for mouse and vertebrate genes, the authoritative set of Gene Ontology (GO) annotations for mouse gene functions, a comprehensive catalog of mouse mutations and their phenotypes, and a curated compendium of mouse models of human diseases. Here, we describe the data acquisition process, specifics about MGD's key data areas, methods to access and query MGD data, and outreach and user help facilities. genesis 53:458–473, 2015. © 2015 The Authors. Genesis Published by Wiley Periodicals, Inc. PMID:26150326

  3. Concept-oriented indexing of video databases: toward semantic sensitive retrieval and browsing.

    PubMed

    Fan, Jianping; Luo, Hangzai; Elmagarmid, Ahmed K

    2004-07-01

    Digital video now plays an important role in medical education, health care, telemedicine and other medical applications. Several content-based video retrieval (CBVR) systems have been proposed in the past, but they still suffer from the following challenging problems: semantic gap, semantic video concept modeling, semantic video classification, and concept-oriented video database indexing and access. In this paper, we propose a novel framework to make some advances toward the final goal to solve these problems. Specifically, the framework includes: 1) a semantic-sensitive video content representation framework by using principal video shots to enhance the quality of features; 2) semantic video concept interpretation by using flexible mixture model to bridge the semantic gap; 3) a novel semantic video-classifier training framework by integrating feature selection, parameter estimation, and model selection seamlessly in a single algorithm; and 4) a concept-oriented video database organization technique through a certain domain-dependent concept hierarchy to enable semantic-sensitive video retrieval and browsing.

  4. Longitudinal driver model and collision warning and avoidance algorithms based on human driving databases

    NASA Astrophysics Data System (ADS)

    Lee, Kangwon

    Intelligent vehicle systems, such as Adaptive Cruise Control (ACC) or Collision Warning/Collision Avoidance (CW/CA), are currently under development, and several companies have already offered ACC on selected models. Control or decision-making algorithms of these systems are commonly evaluated under extensive computer simulations and well-defined scenarios on test tracks. However, they have rarely been validated with large quantities of naturalistic human driving data. This dissertation utilized two University of Michigan Transportation Research Institute databases (Intelligent Cruise Control Field Operational Test and System for Assessment of Vehicle Motion Environment) in the development and evaluation of longitudinal driver models and CW/CA algorithms. First, to examine how drivers normally follow other vehicles, the vehicle motion data from the databases were processed using a Kalman smoother. The processed data was then used to fit and evaluate existing longitudinal driver models (e.g., the linear follow-the-leader model, the Newell's special model, the nonlinear follow-the-leader model, the linear optimal control model, the Gipps model and the optimal velocity model). A modified version of the Gipps model was proposed and found to be accurate in both microscopic (vehicle) and macroscopic (traffic) senses. Second, to examine emergency braking behavior and to evaluate CW/CA algorithms, the concepts of signal detection theory and a performance index suitable for unbalanced situations (few threatening data points vs. many safe data points) are introduced. Selected existing CW/CA algorithms were found to have a performance index (geometric mean of true-positive rate and precision) not exceeding 20%. To optimize the parameters of the CW/CA algorithms, a new numerical optimization scheme was developed to replace the original data points with their representative statistics. A new CW/CA algorithm was proposed, which was found to score higher than 55% in the

  5. Data Mining on Distributed Medical Databases: Recent Trends and Future Directions

    NASA Astrophysics Data System (ADS)

    Atilgan, Yasemin; Dogan, Firat

    As computerization in healthcare services increases, the amount of available digital data is growing at an unprecedented rate, and as a result healthcare organizations are much more able to store data than to extract knowledge from it. Today the major challenge is to transform these data into useful information and knowledge. It is important for healthcare organizations to use stored data to improve quality while reducing cost. This paper first investigates data mining applications on centralized medical databases and how they are used for diagnostics and population health, then introduces distributed databases. The integration needs and issues of distributed medical databases are described. Finally, the paper focuses on data mining studies on distributed medical databases.

  6. Databases for multilevel biophysiology research available at Physiome.jp.

    PubMed

    Asai, Yoshiyuki; Abe, Takeshi; Li, Li; Oka, Hideki; Nomura, Taishin; Kitano, Hiroaki

    2015-01-01

    Physiome.jp (http://physiome.jp) is a portal site inaugurated in 2007 to support model-based research in physiome and systems biology. At Physiome.jp, several tools and databases are available to support construction of physiological, multi-hierarchical, large-scale models. There are three databases in Physiome.jp, housing mathematical models, morphological data, and time-series data. In late 2013, the site was fully renovated, and in May 2015, new functions were implemented to provide information infrastructure to support collaborative activities for developing models and performing simulations within the database framework. This article describes updates to the databases implemented since 2013, including cooperation among the three databases, interactive model browsing, user management, version management of models, management of parameter sets, and interoperability with applications.

  7. Observational database for studies of nearby universe

    NASA Astrophysics Data System (ADS)

    Kaisina, E. I.; Makarov, D. I.; Karachentsev, I. D.; Kaisin, S. S.

    2012-01-01

    We present the description of a database of galaxies of the Local Volume (LVG), located within 10 Mpc around the Milky Way. It contains more than 800 objects. Based on an analysis of functional capabilities, we used the PostgreSQL DBMS as a management system for our LVG database. Applying semantic modelling methods, we developed a physical ER-model of the database. We describe the developed architecture of the database table structure, and the implemented web-access, available at http://www.sao.ru/lv/lvgdb.

  8. Evaluation of a vortex-based subgrid stress model using DNS databases

    NASA Technical Reports Server (NTRS)

    Misra, Ashish; Lund, Thomas S.

    1996-01-01

    The performance of a SubGrid Stress (SGS) model for Large-Eddy Simulation (LES) developed by Misra & Pullin (1996) is studied for forced and decaying isotropic turbulence on a 32^3 grid. The physical viability of the model assumptions is tested using DNS databases. The results from LES of forced turbulence at Taylor Reynolds number R_λ ≈ 90 are compared with filtered DNS fields. Probability density functions (pdfs) of the subgrid energy transfer, total dissipation, and the stretch of the subgrid vorticity by the resolved velocity-gradient tensor show reasonable agreement with the DNS data. The model is also tested in LES of decaying isotropic turbulence where it correctly predicts the decay rate and energy spectra measured by Comte-Bellot & Corrsin (1971).

  9. DIMA quick start, database for inventory, monitoring and assessment

    USDA-ARS's Scientific Manuscript database

    The Database for Inventory, Monitoring and Assessment (DIMA) is a highly-customized Microsoft Access database for collecting data electronically in the field and for organizing, storing and reporting those data for monitoring and assessment. While DIMA can be used for any number of different monito...

  10. Outreach and online training services at the Saccharomyces Genome Database.

    PubMed

    MacPherson, Kevin A; Starr, Barry; Wong, Edith D; Dalusag, Kyla S; Hellerstedt, Sage T; Lang, Olivia W; Nash, Robert S; Skrzypek, Marek S; Engel, Stacia R; Cherry, J Michael

    2017-01-01

    The Saccharomyces Genome Database (SGD; www.yeastgenome.org), the primary genetics and genomics resource for the budding yeast S. cerevisiae, provides free public access to expertly curated information about the yeast genome and its gene products. As the central hub for the yeast research community, SGD engages in a variety of social outreach efforts to inform our users about new developments, promote collaboration, increase public awareness of the importance of yeast to biomedical research, and facilitate scientific discovery. Here we describe these various outreach methods, from networking at scientific conferences to the use of online media such as blog posts and webinars, and include our perspectives on the benefits provided by outreach activities for model organism databases. http://www.yeastgenome.org. © The Author(s) 2017. Published by Oxford University Press.

  11. WMC Database Evaluation. Case Study Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Palounek, Andrea P. T

    The WMC Database is ultimately envisioned to hold a collection of experimental data, design information, and information from computational models. This project was a first attempt at using the Database to access experimental data and extract information from it. This evaluation shows that the Database concept is sound and robust, and that the Database, once fully populated, should remain eminently usable for future researchers.

  12. SNPdbe: constructing an nsSNP functional impacts database.

    PubMed

    Schaefer, Christian; Meier, Alice; Rost, Burkhard; Bromberg, Yana

    2012-02-15

    Many existing databases annotate experimentally characterized single nucleotide polymorphisms (SNPs). Each non-synonymous SNP (nsSNP) changes one amino acid in the gene product (single amino acid substitution; SAAS). This change can either affect protein function or be neutral in that respect. Most polymorphisms lack experimental annotation of their functional impact. Here, we introduce SNPdbe (SNP database of effects), with predictions of computationally annotated functional impacts of SNPs. Database entries represent nsSNPs in dbSNP and the 1000 Genomes collection, as well as variants from UniProt and PMD. SAASs come from >2600 organisms, 'human' being the most prevalent. The impact of each SAAS on protein function is predicted using the SNAP and SIFT algorithms and augmented with experimentally derived function/structure information and disease associations from PMD, OMIM and UniProt. SNPdbe is consistently updated and easily augmented with new sources of information. The database is available as a MySQL dump and via a web front end that allows searches with any combination of organism names, sequences and mutation IDs. http://www.rostlab.org/services/snpdbe.

  13. INTERCOMPARISON OF ALTERNATIVE VEGETATION DATABASES FOR REGIONAL AIR QUALITY MODELING

    EPA Science Inventory

    Vegetation cover data are used to characterize several regional air quality modeling processes, including the calculation of heat, moisture, and momentum fluxes with the Mesoscale Meteorological Model (MM5) and the estimate of biogenic volatile organic compound and nitric oxide...

  14. Research Directions in Database Security IV

    DTIC Science & Technology

    1993-07-01

    second algorithm, which is based on multiversion timestamp ordering, is that high level transactions can be forced to read arbitrarily old data values...system. The first, the single version model, stores only the latest version of each data item, while the second, the multiversion model, stores... Multiversion Database Model In the standard database model, where there is only one version of each data item, all transactions compete for the most recent

  15. Benchmarking density functional tight binding models for barrier heights and reaction energetics of organic molecules.

    PubMed

    Gruden, Maja; Andjeklović, Ljubica; Jissy, Akkarapattiakal Kuriappan; Stepanović, Stepan; Zlatar, Matija; Cui, Qiang; Elstner, Marcus

    2017-09-30

    Density Functional Tight Binding (DFTB) models are two to three orders of magnitude faster than ab initio and Density Functional Theory (DFT) methods and therefore are particularly attractive in applications to large molecules and condensed phase systems. To establish the applicability of DFTB models to general chemical reactions, we conduct benchmark calculations for barrier heights and reaction energetics of organic molecules using existing databases and several new ones compiled in this study. Structures for the transition states and stable species have been fully optimized at the DFTB level, making it possible to characterize the reliability of DFTB models in a more thorough fashion compared to conducting single point energy calculations as done in previous benchmark studies. The encouraging results for the diverse sets of reactions studied here suggest that DFTB models, especially the most recent third-order version (DFTB3/3OB augmented with dispersion correction), in most cases provide a satisfactory description of organic chemical reactions with accuracy almost comparable to popular DFT methods with large basis sets, although larger errors are also seen for certain cases. Therefore, DFTB models can be effective for mechanistic analysis (e.g., transition state search) of large (bio)molecules, especially when coupled with single point energy calculations at higher levels of theory. © 2017 Wiley Periodicals, Inc.

  16. Design of a Multi Dimensional Database for the Archimed DataWarehouse.

    PubMed

    Bréant, Claudine; Thurler, Gérald; Borst, François; Geissbuhler, Antoine

    2005-01-01

    The Archimed data warehouse project started in 1993 at the Geneva University Hospital. It has progressively integrated seven data marts (or domains of activity) archiving medical data such as Admission/Discharge/Transfer (ADT) data, laboratory results, radiology exams, diagnoses, and procedure codes. The objective of the Archimed data warehouse is to facilitate access to an integrated and coherent view of patient medical data in order to support analytical activities such as medical statistics, clinical studies, retrieval of similar cases and data mining processes. This paper discusses three principal design aspects relative to the conception of the database of the data warehouse: 1) the granularity of the database, which refers to the level of detail or summarization of data, 2) the database model and architecture, describing how data will be presented to end users and how new data is integrated, 3) the life cycle of the database, in order to ensure long term scalability of the environment. Both the organization of patient medical data using a standardized elementary fact representation and the use of the multi dimensional model have proved to be powerful design tools to integrate data coming from the multiple heterogeneous database systems that are part of the transactional Hospital Information System (HIS). Concurrently, the building of the data warehouse in an incremental way has helped to control the evolution of the data content. These three design aspects bring clarity and performance regarding data access. They also provide long term scalability to the system and resilience to further changes that may occur in source systems feeding the data warehouse.

  17. BNDB - the Biochemical Network Database.

    PubMed

    Küntzer, Jan; Backes, Christina; Blum, Torsten; Gerasch, Andreas; Kaufmann, Michael; Kohlbacher, Oliver; Lenhof, Hans-Peter

    2007-10-02

    Technological advances in high-throughput techniques and efficient data acquisition methods have resulted in a massive amount of life science data. The data is stored in numerous databases that have been established over the last decades and are essential resources for scientists nowadays. However, the diversity of the databases and the underlying data models make it difficult to combine this information for solving complex problems in systems biology. Currently, researchers typically have to browse several, often highly focused, databases to obtain the required information. Hence, there is a pressing need for more efficient systems for integrating, analyzing, and interpreting these data. The standardization and virtual consolidation of the databases is a major challenge resulting in a unified access to a variety of data sources. We present the Biochemical Network Database (BNDB), a powerful relational database platform, allowing a complete semantic integration of an extensive collection of external databases. BNDB is built upon a comprehensive and extensible object model called BioCore, which is powerful enough to model most known biochemical processes and at the same time easily extensible to be adapted to new biological concepts. Besides a web interface for the search and curation of the data, a Java-based viewer (BiNA) provides a powerful platform-independent visualization and navigation of the data. BiNA uses sophisticated graph layout algorithms for an interactive visualization and navigation of BNDB. BNDB allows a simple, unified access to a variety of external data sources. Its tight integration with the biochemical network library BN++ offers the possibility for import, integration, analysis, and visualization of the data. BNDB is freely accessible at http://www.bndb.org.

  18. Modeling Secondary Organic Aerosol Formation From Emissions of Combustion Sources

    NASA Astrophysics Data System (ADS)

    Jathar, Shantanu Hemant

    Atmospheric aerosols exert a large influence on the Earth's climate and cause adverse public health effects, reduced visibility and material degradation. Secondary organic aerosol (SOA), defined as the aerosol mass arising from the oxidation products of gas-phase organic species, accounts for a significant fraction of the submicron atmospheric aerosol mass. Yet, there are large uncertainties surrounding the sources, atmospheric evolution and properties of SOA. This thesis combines laboratory experiments, extensive data analysis and global modeling to investigate the contribution of semi-volatile and intermediate volatility organic compounds (SVOC and IVOC) from combustion sources to SOA formation. The goals are to quantify the contribution of these emissions to ambient PM and to evaluate and improve models to simulate its formation. To create a database for model development and evaluation, a series of smog chamber experiments were conducted on evaporated fuels, which served as surrogates for real-world combustion emissions. Diesel formed the most SOA followed by conventional jet fuel / jet fuel derived from natural gas, gasoline and jet fuel derived from coal. The variability in SOA formation from actual combustion emissions can be partially explained by the composition of the fuel. Several models were developed and tested along with existing models using SOA data from smog chamber experiments conducted using evaporated fuel (this work: gasoline, Fischer-Tropsch fuels, jet fuel, diesels) and published data on dilute combustion emissions (aircraft, on- and off-road gasoline, on- and off-road diesel, wood burning, biomass burning). For all of the SOA data, existing models under-predicted SOA formation if SVOC/IVOC were not included. For the evaporated fuel experiments, when SVOC/IVOC were included predictions using the existing SOA model were brought to within a factor of two of measurements with minor adjustments to model parameterizations. Further, a volatility

  19. YPD™, PombePD™ and WormPD™: model organism volumes of the BioKnowledge™ Library, an integrated resource for protein information

    PubMed Central

    Costanzo, Maria C.; Crawford, Matthew E.; Hirschman, Jodi E.; Kranz, Janice E.; Olsen, Philip; Robertson, Laura S.; Skrzypek, Marek S.; Braun, Burkhard R.; Hopkins, Kelley Lennon; Kondu, Pinar; Lengieza, Carey; Lew-Smith, Jodi E.; Tillberg, Michael; Garrels, James I.

    2001-01-01

    The BioKnowledge Library is a relational database and web site (http://www.proteome.com) composed of protein-specific information collected from the scientific literature. Each Protein Report on the web site summarizes and displays published information about a single protein, including its biochemical function, role in the cell and in the whole organism, localization, mutant phenotype and genetic interactions, regulation, domains and motifs, interactions with other proteins and other relevant data. This report describes four species-specific volumes of the BioKnowledge Library, concerned with the model organisms Saccharomyces cerevisiae (YPD), Schizosaccharomyces pombe (PombePD) and Caenorhabditis elegans (WormPD), and with the fungal pathogen Candida albicans (CalPD™). Protein Reports of each species are unified in format, easily searchable and extensively cross-referenced between species. The relevance of these comprehensively curated resources to analysis of proteins in other species is discussed, and is illustrated by a survey of model organism proteins that have similarity to human proteins involved in disease. PMID:11125054

  20. Negative Example Selection for Protein Function Prediction: The NoGO Database

    PubMed Central

    Youngs, Noah; Penfold-Brown, Duncan; Bonneau, Richard; Shasha, Dennis

    2014-01-01

    Negative examples – genes that are known not to carry out a given protein function – are rarely recorded in genome and proteome annotation databases, such as the Gene Ontology database. Negative examples are required, however, for several of the most powerful machine learning methods for integrative protein function prediction. Most protein function prediction efforts have relied on a variety of heuristics for the choice of negative examples. Determining the accuracy of methods for negative example prediction is itself a non-trivial task, given that the Open World Assumption as applied to gene annotations rules out many traditional validation metrics. We present a rigorous comparison of these heuristics, utilizing a temporal holdout, and a novel evaluation strategy for negative examples. We add to this comparison several algorithms adapted from Positive-Unlabeled learning scenarios in text-classification, which are the current state of the art methods for generating negative examples in low-density annotation contexts. Lastly, we present two novel algorithms of our own construction, one based on empirical conditional probability, and the other using topic modeling applied to genes and annotations. We demonstrate that our algorithms achieve significantly fewer incorrect negative example predictions than the current state of the art, using multiple benchmarks covering multiple organisms. Our methods may be applied to generate negative examples for any type of method that deals with protein function, and to this end we provide a database of negative examples in several well-studied organisms, for general use (The NoGO database, available at: bonneaulab.bio.nyu.edu/nogo.html). PMID:24922051

  1. BioMart Central Portal: an open database network for the biological community

    PubMed Central

    Guberman, Jonathan M.; Ai, J.; Arnaiz, O.; Baran, Joachim; Blake, Andrew; Baldock, Richard; Chelala, Claude; Croft, David; Cros, Anthony; Cutts, Rosalind J.; Di Génova, A.; Forbes, Simon; Fujisawa, T.; Gadaleta, E.; Goodstein, D. M.; Gundem, Gunes; Haggarty, Bernard; Haider, Syed; Hall, Matthew; Harris, Todd; Haw, Robin; Hu, S.; Hubbard, Simon; Hsu, Jack; Iyer, Vivek; Jones, Philip; Katayama, Toshiaki; Kinsella, R.; Kong, Lei; Lawson, Daniel; Liang, Yong; Lopez-Bigas, Nuria; Luo, J.; Lush, Michael; Mason, Jeremy; Moreews, Francois; Ndegwa, Nelson; Oakley, Darren; Perez-Llamas, Christian; Primig, Michael; Rivkin, Elena; Rosanoff, S.; Shepherd, Rebecca; Simon, Reinhard; Skarnes, B.; Smedley, Damian; Sperling, Linda; Spooner, William; Stevenson, Peter; Stone, Kevin; Teague, J.; Wang, Jun; Wang, Jianxin; Whitty, Brett; Wong, D. T.; Wong-Erasmus, Marie; Yao, L.; Youens-Clark, Ken; Yung, Christina; Zhang, Junjun; Kasprzyk, Arek

    2011-01-01

    BioMart Central Portal is a first of its kind, community-driven effort to provide unified access to dozens of biological databases spanning genomics, proteomics, model organisms, cancer data, ontology information and more. Anybody can contribute an independently maintained resource to the Central Portal, allowing it to be exposed to and shared with the research community, and linking it with the other resources in the portal. Users can take advantage of the common interface to quickly utilize different sources without learning a new system for each. The system also simplifies cross-database searches that might otherwise require several complicated steps. Several integrated tools streamline common tasks, such as converting between ID formats and retrieving sequences. The combination of a wide variety of databases, an easy-to-use interface, robust programmatic access and the array of tools make Central Portal a one-stop shop for biological data querying. Here, we describe the structure of Central Portal and show example queries to demonstrate its capabilities. Database URL: http://central.biomart.org. PMID:21930507

  2. Toward An Unstructured Mesh Database

    NASA Astrophysics Data System (ADS)

    Rezaei Mahdiraji, Alireza; Baumann, Peter

    2014-05-01

    Unstructured meshes are used in several application domains such as earth sciences (e.g., seismology), medicine, oceanography, climate modeling, and GIS as approximate representations of physical objects. Meshes subdivide a domain into smaller geometric elements (called cells) which are glued together by incidence relationships. The subdivision of a domain allows computational manipulation of complicated physical structures. For instance, seismologists model earthquakes using elastic wave propagation solvers on hexahedral meshes. The hexahedral mesh contains several hundred million grid points and millions of hexahedral cells. Each vertex node in the hexahedral mesh stores a multitude of data fields. To run a simulation on such meshes, one needs to iterate over all the cells, iterate over the cells incident to a given cell, retrieve coordinates of cells, assign data values to cells, etc. Although meshes are used in many application domains, to the best of our knowledge there is no database vendor that supports unstructured mesh features. Currently, the main tools for querying and manipulating unstructured meshes are mesh libraries, e.g., CGAL and GRAL. Mesh libraries are dedicated libraries which include mesh algorithms and can be run on mesh representations. The libraries do not scale with dataset size, do not have a declarative query language, and need deep C++ knowledge for query implementations. Furthermore, due to high coupling between the implementations and input file structure, the implementations are less reusable and costly to maintain. A dedicated mesh database offers the following advantages: 1) declarative querying, 2) ease of maintenance, 3) hiding mesh storage structure from applications, and 4) transparent query optimization. To design a mesh database, the first challenge is to define a suitable generic data model for unstructured meshes. We proposed the ImG-Complexes data model as a generic topological mesh data model which extends the incidence graph model to multi
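The mesh operations listed in the abstract above (iterating over cells, visiting the cells incident to a given cell, attaching coordinates and data values) can be illustrated with a plain incidence graph. The sketch below is an assumption-laden toy, not the ImG-Complexes model and not the API of CGAL or GRAL: the class name, method names and the two-triangle mesh are all hypothetical.

```python
from collections import defaultdict


class IncidenceGraph:
    """Toy incidence graph: cells of dimension d linked to their (d-1)-faces."""

    def __init__(self):
        self.dim = {}                    # cell id -> dimension
        self.faces = defaultdict(set)    # cell -> cells on its boundary
        self.cofaces = defaultdict(set)  # cell -> cells it helps bound
        self.data = defaultdict(dict)    # cell -> named data fields

    def add_cell(self, cell, dim, faces=()):
        self.dim[cell] = dim
        for face in faces:
            self.faces[cell].add(face)
            self.cofaces[face].add(cell)

    def cells(self, dim):
        """All cells of one dimension, in a stable order."""
        return sorted(c for c, d in self.dim.items() if d == dim)

    def neighbours(self, cell):
        """Cells of the same dimension that share a face with `cell`."""
        shared = {o for f in self.faces[cell] for o in self.cofaces[f]}
        return sorted(shared - {cell})


# Hypothetical mesh: two triangles t0 and t1 glued along edge e12.
mesh = IncidenceGraph()
for v in ("v0", "v1", "v2", "v3"):
    mesh.add_cell(v, 0)
edges = {"e01": ("v0", "v1"), "e12": ("v1", "v2"), "e02": ("v0", "v2"),
         "e13": ("v1", "v3"), "e23": ("v2", "v3")}
for name, ends in edges.items():
    mesh.add_cell(name, 1, ends)
mesh.add_cell("t0", 2, ("e01", "e12", "e02"))
mesh.add_cell("t1", 2, ("e12", "e13", "e23"))

mesh.data["v0"]["coords"] = (0.0, 0.0)  # attach a data field to a vertex
for tri in mesh.cells(2):               # iterate over all 2-cells
    print(tri, "is adjacent to", mesh.neighbours(tri))
```

Every query here is hand-written traversal code, which is the coupling the abstract argues a declarative mesh database would remove.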

  3. ExtraTrain: a database of Extragenic regions and Transcriptional information in prokaryotic organisms

    PubMed Central

    Pareja, Eduardo; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Bonal, Javier; Tobes, Raquel

    2006-01-01

    Background Transcriptional regulation processes are the principal mechanisms of adaptation in prokaryotes. In these processes, the regulatory proteins and the regulatory DNA signals located in extragenic regions are the key elements involved. As all extragenic spaces are putative regulatory regions, ExtraTrain covers all extragenic regions of available genomes and regulatory proteins from bacteria and archaea included in the UniProt database. Description ExtraTrain provides integrated and easily manageable information for 679816 extragenic regions and for the genes delimiting each of them. In addition ExtraTrain supplies a tool to explore extragenic regions, named Palinsight, oriented to detect and search palindromic patterns. This interactive visual tool is totally integrated in the database, allowing the search for regulatory signals in user defined sets of extragenic regions. The 26046 regulatory proteins included in ExtraTrain belong to the families AraC/XylS, ArsR, AsnC, Cold shock domain, CRP-FNR, DeoR, GntR, IclR, LacI, LuxR, LysR, MarR, MerR, NtrC/Fis, OmpR and TetR. The database follows the InterPro criteria to define these families. The information about regulators includes manually curated sets of references specifically associated to regulator entries. In order to achieve a sustainable and maintainable knowledge database ExtraTrain is a platform open to the contribution of knowledge by the scientific community providing a system for the incorporation of textual knowledge. Conclusion ExtraTrain is a new database for exploring Extragenic regions and Transcriptional information in bacteria and archaea. ExtraTrain database is available at . PMID:16539733

  4. AgBase: supporting functional modeling in agricultural organisms

    PubMed Central

    McCarthy, Fiona M.; Gresham, Cathy R.; Buza, Teresia J.; Chouvarine, Philippe; Pillai, Lakshmi R.; Kumar, Ranjit; Ozkan, Seval; Wang, Hui; Manda, Prashanti; Arick, Tony; Bridges, Susan M.; Burgess, Shane C.

    2011-01-01

    AgBase (http://www.agbase.msstate.edu/) provides resources to facilitate modeling of functional genomics data and structural and functional annotation of agriculturally important animal, plant, microbe and parasite genomes. The website is redesigned to improve accessibility and ease of use, including improved search capabilities. Expanded capabilities include new dedicated pages for horse, cat, dog, cotton, rice and soybean. We currently provide 590 240 Gene Ontology (GO) annotations to 105 454 gene products in 64 different species, including GO annotations linked to transcripts represented on agricultural microarrays. For many of these arrays, this provides the only functional annotation available. GO annotations are available for download and we provide comprehensive, species-specific GO annotation files for 18 different organisms. The tools available at AgBase have been expanded and several existing tools improved based upon user feedback. One of seven new tools available at AgBase, GOModeler, supports hypothesis testing from functional genomics data. We host several associated databases and provide genome browsers for three agricultural pathogens. Moreover, we provide comprehensive training resources (including worked examples and tutorials) via links to Educational Resources at the AgBase website. PMID:21075795

  5. The Dfam database of repetitive DNA families.

    PubMed

    Hubley, Robert; Finn, Robert D; Clements, Jody; Eddy, Sean R; Jones, Thomas A; Bao, Weidong; Smit, Arian F A; Wheeler, Travis J

    2016-01-04

    Repetitive DNA, especially that due to transposable elements (TEs), makes up a large fraction of many genomes. Dfam is an open access database of families of repetitive DNA elements, in which each family is represented by a multiple sequence alignment and a profile hidden Markov model (HMM). The initial release of Dfam, featured in the 2013 NAR Database Issue, contained 1143 families of repetitive elements found in humans, and was used to produce more than 100 Mb of additional annotation of TE-derived regions in the human genome, with improved speed. Here, we describe recent advances, most notably expansion to 4150 total families including a comprehensive set of known repeat families from four new organisms (mouse, zebrafish, fly and nematode). We describe improvements to coverage, and to our methods for identifying and reducing false annotation. We also describe updates to the website interface. The Dfam website has moved to http://dfam.org. Seed alignments, profile HMMs, hit lists and other underlying data are available for download. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. Lessons Learned from 2 Decades of Modelling Forest Dead Organic Matter and Soil Carbon at the National Scale

    NASA Astrophysics Data System (ADS)

    Shaw, C.; Kurz, W. A.; Metsaranta, J.; Bona, K. A.; Hararuk, O.; Smyth, C.

    2017-12-01

    The Carbon Budget Model of the Canadian Forest Sector (CBM-CFS3) is a forest carbon budget model that operates on individual stands. It is applied from regional to national scales in Canada for national and international reporting of GHG emissions and removals and in support of analyses of forest sector mitigation options and other scientific and policy questions. This presentation will review the history and continuous improvement process of representations of dead organic matter (DOM) and soil carbon modelling, from early model versions, in which DOM pools included only litter, downed deadwood and soil, to the current version, where these pools are estimated separately to better compare model estimates against field measurements, or new pools have been added. Uncertainty analyses consistently point at soil C pools as large sources of uncertainty. With the new ground plot measurements from the National Forest Inventory, and with a newly compiled forest soil carbon database, we have recently completed a model data assimilation exercise that helped reduce parameter uncertainties. Lessons learned from the continuous improvement process will be summarised and we will discuss how model modifications have led to improved representation of DOM and soil carbon dynamics. We conclude by suggesting future research priorities that can advance DOM and soil carbon modelling in Canadian forest ecosystems.

  7. Modeling personnel turnover in the parametric organization

    NASA Technical Reports Server (NTRS)

    Dean, Edwin B.

    1991-01-01

    A model is developed for simulating the dynamics of a newly formed organization, credible during all phases of organizational development. The model development process is broken down into the activities of determining the tasks required for parametric cost analysis (PCA), determining the skills required for each PCA task, determining the skills available in the applicant marketplace, determining the structure of the model, implementing the model, and testing it. The model, parameterized by the likelihood of job function transition, has demonstrated the capability to represent the transition of personnel across functional boundaries within a parametric organization using a linear dynamical system, and the ability to predict required staffing profiles to meet functional needs at the desired time. The model can be extended by revisions of the state and transition structure to provide refinements in functional definition for the parametric and extended organization.
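A linear dynamical system parameterized by transition likelihoods, as the abstract above describes, can be sketched in a few lines. This is only an illustration under stated assumptions: the three states (data collection, modeling, departed), the transition likelihoods and the starting headcounts are hypothetical and are not taken from the report.

```python
def step(staff, transition):
    """One period of the linear system: new[j] = sum_i staff[i] * transition[i][j]."""
    n = len(staff)
    return [sum(staff[i] * transition[i][j] for i in range(n)) for j in range(n)]


def project(staff, transition, periods):
    """Staffing profile over time, starting with the initial headcounts."""
    profile = [list(staff)]
    for _ in range(periods):
        profile.append(step(profile[-1], transition))
    return profile


# Hypothetical states: 0 = data collection, 1 = modeling, 2 = departed.
# Row i holds the likelihood of moving from function i to each function;
# rows sum to 1, and "departed" is absorbing (nobody returns in this sketch).
transition = [
    [0.8, 0.1, 0.1],
    [0.0, 0.9, 0.1],
    [0.0, 0.0, 1.0],
]
staff = [10.0, 5.0, 0.0]

for period, counts in enumerate(project(staff, transition, periods=3)):
    print(period, [round(c, 2) for c in counts])
```

Because each row sums to 1, total headcount is conserved and turnover shows up as growth in the departed state; comparing a projected profile with required staffing levels would indicate when hiring is needed.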

  8. EPA U.S. NATIONAL MARKAL DATABASE: DATABASE DOCUMENTATION

    EPA Science Inventory

    This document describes in detail the U.S. Energy System database developed by EPA's Integrated Strategic Assessment Work Group for use with the MARKAL model. The group is part of the Office of Research and Development and is located in the National Risk Management Research Labor...

  9. Alaska IPASS database preparation manual.

    Treesearch

    P. McHugh; D. Olson; C. Schallau

    1989-01-01

    Describes the data, their sources, and the calibration procedures used in compiling a database for the Alaska IPASS (interactive policy analysis simulation system) model. Although this manual is for Alaska, it provides generic instructions for analysts preparing databases for other geographical areas.

  10. Mass and Reliability Source (MaRS) Database

    NASA Technical Reports Server (NTRS)

    Valdenegro, Wladimir

    2017-01-01

    The Mass and Reliability Source (MaRS) Database consolidates component mass and reliability data for all Orbital Replacement Units (ORU) on the International Space Station (ISS) into a single database. It was created to help engineers develop a parametric model that relates hardware mass and reliability. MaRS supplies relevant failure data at the lowest possible component level while providing support for risk, reliability, and logistics analysis. Random-failure data is usually linked to the ORU assembly. MaRS uses this data to identify and display the lowest possible component failure level. As seen in Figure 1, the failure point is identified to the lowest level: Component 2.1. This is useful for efficient planning of spare supplies, supporting long duration crewed missions, allowing quicker trade studies, and streamlining diagnostic processes. MaRS is composed of information from various databases: MADS (operating hours), VMDB (indentured part lists), and ISS PART (failure data). This information is organized in Microsoft Excel and accessed through a program made in Microsoft Access (Figure 2). The focus of the Fall 2017 internship tour was to identify the components that were the root cause of failure from the given random-failure data, develop a taxonomy for the database, and attach material headings to the component list. Secondary objectives included verifying the integrity of the data in MaRS, eliminating any part discrepancies, and generating documentation for future reference. Due to the nature of the random-failure data, data mining had to be done manually without the assistance of an automated program to ensure positive identification.

  11. Modeling Personnel Turnover in the Parametric Organization

    NASA Technical Reports Server (NTRS)

    Dean, Edwin B.

    1991-01-01

    A primary issue in organizing a new parametric cost analysis function is to determine the skill mix and number of personnel required. The skill mix can be obtained by a functional decomposition of the tasks required within the organization and a matrixed correlation with educational or experience backgrounds. The number of personnel is a function of the skills required to cover all tasks, personnel skill background and cross training, the intensity of the workload for each task, migration through various tasks by personnel along a career path, personnel hiring limitations imposed by management and the applicant marketplace, personnel training limitations imposed by management and personnel capability, and the rate at which personnel leave the organization for whatever reason. Faced with the task of relating all of these organizational facets in order to grow a parametric cost analysis (PCA) organization from scratch, it was decided that a dynamic model was required in order to account for the obvious dynamics of the forming organization. The challenge was to create such a simple model which would be credible during all phases of organizational development. The model development process was broken down into the activities of determining the tasks required for PCA, determining the skills required for each PCA task, determining the skills available in the applicant marketplace, determining the structure of the dynamic model, implementing the dynamic model, and testing the dynamic model.

  12. A trait database for marine copepods

    NASA Astrophysics Data System (ADS)

    Brun, Philipp; Payne, Mark R.; Kiørboe, Thomas

    2017-02-01

    The trait-based approach is gaining increasing popularity in marine plankton ecology, but the field urgently needs more, and more easily accessible, trait data to advance. We compiled trait information on marine pelagic copepods, a major group of zooplankton, from the published literature and from experts and organized the data into a structured database. We collected 9306 records for 14 functional traits. Particular attention was given to body size, feeding mode, egg size, spawning strategy, respiration rate, and myelination (presence of nerve sheathing). Most records were reported at the species level, but some phylogenetically conserved traits, such as myelination, were reported at higher taxonomic levels, allowing the entire diversity of around 10 800 recognized marine copepod species to be covered with a few records. Aside from myelination, data coverage was highest for spawning strategy and body size, while information was more limited for quantitative traits related to reproduction and physiology. The database may be used to investigate relationships between traits, to produce trait biogeographies, or to inform and validate trait-based marine ecosystem models. The data can be downloaded from PANGAEA, doi:10.1594/PANGAEA.862968.

  13. [The future of clinical laboratory database management system].

    PubMed

    Kambe, M; Imidy, D; Matsubara, A; Sugimoto, Y

    1999-09-01

    To assess the present status of the clinical laboratory database management system, the difference between the Clinical Laboratory Information System and Clinical Laboratory System was explained in this study. Although three kinds of database management systems (DBMS) were shown including the relational model, tree model and network model, the relational model was found to be the best DBMS for the clinical laboratory database based on our experience and developments of some clinical laboratory expert systems. As a future clinical laboratory database management system, the IC card system connected to an automatic chemical analyzer was proposed for personal health data management and a microscope/video system was proposed for dynamic data management of leukocytes or bacteria.

  14. Standards for Clinical Grade Genomic Databases.

    PubMed

    Yohe, Sophia L; Carter, Alexis B; Pfeifer, John D; Crawford, James M; Cushman-Vokoun, Allison; Caughron, Samuel; Leonard, Debra G B

    2015-11-01

    Next-generation sequencing performed in a clinical environment must meet clinical standards, which requires reproducibility of all aspects of the testing. Clinical-grade genomic databases (CGGDs) are required to classify a variant and to assist in the professional interpretation of clinical next-generation sequencing. Applying quality laboratory standards to the reference databases used for sequence-variant interpretation presents a new challenge for validation and curation. The objective was to define CGGDs and the categories of information they contain and to frame recommendations for the structure and use of these databases in clinical patient care. Members of the College of American Pathologists Personalized Health Care Committee reviewed the literature and existing state of genomic databases and developed a framework for guiding CGGD development in the future. Clinical-grade genomic databases may provide different types of information. This work group defined 3 layers of information in CGGDs: clinical genomic variant repositories, genomic medical data repositories, and genomic medicine evidence databases. The layers are differentiated by the types of genomic and medical information contained and the utility in assisting with clinical interpretation of genomic variants. Clinical-grade genomic databases must meet specific standards regarding submission, curation, and retrieval of data, as well as the maintenance of privacy and security. These organizing principles for CGGDs should serve as a foundation for future development of specific standards that support the use of such databases for patient care.

  15. The DrugAge database of aging-related drugs.

    PubMed

    Barardo, Diogo; Thornton, Daniel; Thoppil, Harikrishnan; Walsh, Michael; Sharifi, Samim; Ferreira, Susana; Anžič, Andreja; Fernandes, Maria; Monteiro, Patrick; Grum, Tjaša; Cordeiro, Rui; De-Souza, Evandro Araújo; Budovsky, Arie; Araujo, Natali; Gruber, Jan; Petrascheck, Michael; Fraifeld, Vadim E; Zhavoronkov, Alexander; Moskalev, Alexey; de Magalhães, João Pedro

    2017-06-01

    Aging is a major worldwide medical challenge. Not surprisingly, identifying drugs and compounds that extend lifespan in model organisms is a growing research area. Here, we present DrugAge (http://genomics.senescence.info/drugs/), a curated database of lifespan-extending drugs and compounds. At the time of writing, DrugAge contains 1316 entries featuring 418 different compounds from studies across 27 model organisms, including worms, flies, yeast and mice. Data were manually curated from 324 publications. Using drug-gene interaction data, we also performed a functional enrichment analysis of targets of lifespan-extending drugs. Enriched terms include various functional categories related to glutathione and antioxidant activity, ion transport and metabolic processes. In addition, we found a modest but significant overlap between targets of lifespan-extending drugs and known aging-related genes, suggesting that some but not most aging-related pathways have been targeted pharmacologically in longevity studies. DrugAge is freely available online for the scientific community and will be an important resource for biogerontologists. © 2017 The Authors. Aging Cell published by the Anatomical Society and John Wiley & Sons Ltd.

  16. The LAILAPS search engine: a feature model for relevance ranking in life science databases.

    PubMed

    Lange, Matthias; Spies, Karl; Colmsee, Christian; Flemming, Steffen; Klapperstück, Matthias; Scholz, Uwe

    2010-03-25

    Efficient and effective information retrieval in the life sciences is one of the most pressing challenges in bioinformatics. The incredible growth of life science databases into a vast network of interconnected information systems is at once a big challenge and a great opportunity for life science research. The knowledge found on the Web, in particular in life science databases, is a valuable major resource. In order to bring it to the scientist's desktop, it is essential to have well-performing search engines. Here, neither the response time nor the number of results is what matters most. The most crucial factor for millions of query results is the relevance ranking. In this paper, we present a feature model for relevance ranking in life science databases and its implementation in the LAILAPS search engine. Motivated by observing how users inspect search engine results, we condensed a set of 9 relevance-discriminating features. These features are intuitively used by scientists, who briefly screen database entries for potential relevance. The features are both sufficient to estimate the potential relevance and efficiently quantifiable. The derivation of a relevance prediction function that computes the relevance from these features constitutes a regression problem. To solve this problem, we used artificial neural networks that have been trained with a reference set of relevant database entries for 19 protein queries. Supporting a flexible text index and a simple data import format, these concepts are implemented in the LAILAPS search engine. It can easily be used both as a search engine for comprehensive integrated life science databases and for small in-house project databases. LAILAPS is publicly available for SWISSPROT data at http://lailaps.ipk-gatersleben.de.

  17. Modeling, Measurements, and Fundamental Database Development for Nonequilibrium Hypersonic Aerothermodynamics

    NASA Technical Reports Server (NTRS)

    Bose, Deepak

    2012-01-01

    The design of entry vehicles requires predictions of the aerothermal environment during the hypersonic phase of their flight trajectories. These predictions are made using computational fluid dynamics (CFD) codes that often rely on physics and chemistry models of nonequilibrium processes. The primary processes of interest are gas phase chemistry, internal energy relaxation, electronic excitation, nonequilibrium emission and absorption of radiation, and gas-surface interaction leading to surface recession and catalytic recombination. NASA's Hypersonics Project is advancing the state of the art in modeling of nonequilibrium phenomena by making detailed spectroscopic measurements in shock tubes and arcjets, using ab-initio quantum mechanical techniques to develop fundamental chemistry and spectroscopic databases, making fundamental measurements of finite-rate gas-surface interactions, and implementing detailed mechanisms in state-of-the-art CFD codes. The development of new models is based on validation with relevant experiments. We will present the latest developments and a roadmap for the technical areas mentioned above.

  18. Query Monitoring and Analysis for Database Privacy - A Security Automata Model Approach.

    PubMed

    Kumar, Anand; Ligatti, Jay; Tu, Yi-Cheng

    2015-11-01

    Privacy and usage restriction issues are important when valuable data are exchanged or acquired by different organizations. Standard access control mechanisms either restrict or completely grant access to valuable data. On the other hand, data obfuscation limits the overall usability and may result in loss of total value. There are no standard policy enforcement mechanisms for data acquired through mutual and copyright agreements. In practice, many different types of policies can be enforced in protecting data privacy. Hence there is a need for a unified framework that encapsulates multiple suites of policies to protect the data. We present our vision of an architecture named security automata model (SAM) to enforce privacy-preserving policies and usage restrictions. SAM analyzes the input queries and their outputs to enforce various policies, liberating data owners from the burden of monitoring data access. SAM allows administrators to specify various policies and enforces them to monitor queries and control the data access. Our goal is to address the problems of data usage control and protection through privacy policies that can be defined, enforced, and integrated with the existing access control mechanisms using SAM. In this paper, we lay out the theoretical foundation of SAM, which is based on an automaton named Mandatory Result Automata. We also discuss the major challenges of implementing SAM in a real-world database environment as well as ideas to meet such challenges.
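
    The idea of a monitor that sits between queries and their results can be illustrated with a toy example. The sketch below is not SAM and does not implement Mandatory Result Automata; the class name, the row-cap policy, and the callable-based query interface are all hypothetical choices made for illustration.

```python
class UsageMonitor:
    """Toy query monitor (hypothetical; not the paper's formal model).

    State:  cumulative number of rows already released to each user.
    Policy: a user may receive at most `max_rows` rows in total.
    """

    def __init__(self, max_rows):
        self.max_rows = max_rows
        self.released = {}  # user -> rows released so far

    def mediate(self, user, run_query):
        """Intercept one query.

        `run_query` is a zero-argument callable returning a list of rows.
        The monitor decides whether to execute it and how much of the
        result to release, then updates its own state.
        """
        used = self.released.get(user, 0)
        if used >= self.max_rows:
            return []  # quota exhausted: the query is not executed at all
        rows = run_query()
        allowed = rows[: self.max_rows - used]  # release only what the policy permits
        self.released[user] = used + len(allowed)
        return allowed


if __name__ == "__main__":
    monitor = UsageMonitor(max_rows=3)
    print(monitor.mediate("alice", lambda: ["r1", "r2"]))
    print(monitor.mediate("alice", lambda: ["r3", "r4", "r5"]))
    print(monitor.mediate("alice", lambda: ["r6"]))
```

    The point of the sketch is only that the monitor sees both the incoming query and its output, so a policy can depend on the history of what has already been released.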

  19. Exposure Modeling Tools and Databases for Consideration for Relevance to the Amended TSCA (ISES)

    EPA Science Inventory

    The Agency’s Office of Research and Development (ORD) has a number of ongoing exposure modeling tools and databases. These efforts are anticipated to be useful in supporting ongoing implementation of the amended Toxic Substances Control Act (TSCA). Under ORD’s Chemic...

  20. Demonstration of SLUMIS: a clinical database and management information system for a multi organ transplant program.

    PubMed Central

    Kurtz, M.; Bennett, T.; Garvin, P.; Manuel, F.; Williams, M.; Langreder, S.

    1991-01-01

    Because of the rapid evolution of the heart, heart/lung, liver, kidney and kidney/pancreas transplant programs at our institution, and because of a lack of an existing comprehensive database, we were required to develop a computerized management information system capable of supporting both clinical and research requirements of a multifaceted transplant program. SLUMIS (ST. LOUIS UNIVERSITY MULTI-ORGAN INFORMATION SYSTEM) was developed for the following reasons: 1) to comply with the reporting requirements of various transplant registries, 2) for reporting to an increasing number of government agencies and insurance carriers, 3) to obtain updates of our operative experience at regular intervals, 4) to integrate the Histocompatibility and Immunogenetics Laboratory (HLA) for online test result reporting, and 5) to facilitate clinical investigation. PMID:1807741

  1. Diverse data supports the transition of filamentous fungal model organisms into the post-genomics era

    DOE PAGES

    McCluskey, Kevin; Baker, Scott E.

    2017-02-17

    As model organisms, filamentous fungi have been important since the beginning of modern biological inquiry and have benefitted from open data since the earliest genetic maps were shared. From early origins in simple Mendelian genetics of mating types, parasexual genetics of colony colour, and the foundational demonstration of the segregation of a nutritional requirement, the contribution of research systems utilising filamentous fungi has spanned the biochemical genetics era, through the molecular genetics era, and is now at the very foundation of diverse omics approaches to research and development. Fungal model organisms have come from most major taxonomic groups, although Ascomycete filamentous fungi have seen the most major sustained effort. In addition to the published material about filamentous fungi, shared molecular tools have found application in every area of fungal biology. Likewise, shared data has contributed to the success of model systems. Furthermore, the scale of data supporting research with filamentous fungi has grown by 10 to 12 orders of magnitude. From genetic to molecular maps, expression databases, and finally genome resources, the open and collaborative nature of the research communities has assured that the rising tide of data has lifted all of the research systems together.

  2. Diverse data supports the transition of filamentous fungal model organisms into the post-genomics era

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McCluskey, Kevin; Baker, Scott E.

    As model organisms, filamentous fungi have been important since the beginning of modern biological inquiry and have benefitted from open data since the earliest genetic maps were shared. From early origins in simple Mendelian genetics of mating types, parasexual genetics of colony colour, and the foundational demonstration of the segregation of a nutritional requirement, the contribution of research systems utilising filamentous fungi has spanned the biochemical genetics era, through the molecular genetics era, and is now at the very foundation of diverse omics approaches to research and development. Fungal model organisms have come from most major taxonomic groups, although Ascomycete filamentous fungi have seen the most major sustained effort. In addition to the published material about filamentous fungi, shared molecular tools have found application in every area of fungal biology. Likewise, shared data has contributed to the success of model systems. Furthermore, the scale of data supporting research with filamentous fungi has grown by 10 to 12 orders of magnitude. From genetic to molecular maps, expression databases, and finally genome resources, the open and collaborative nature of the research communities has assured that the rising tide of data has lifted all of the research systems together.

  3. International energy: Research organizations, 1986--1990

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hendricks, P.; Jordan, S.

    The International Energy: Research Organizations publication contains the standardized names of energy research organizations used in energy information databases. Involved in this cooperative task are (1) the technical staff of the USDOE Office of Scientific and Technical Information (OSTI) in cooperation with the member countries of the Energy Technology Data Exchange (ETDE) and (2) the International Nuclear Information System (INIS). This publication identifies current organizations doing research in all energy fields, standardizes the format for recording these organization names in bibliographic citations, assigns a numeric code to facilitate data entry, and identifies report number prefixes assigned by these organizations. These research organization names may be used in searching the databases "Energy Science Technology" on DIALOG and "Energy" on STN International. These organization names are also used in USDOE databases on the Integrated Technical Information System. Research organizations active in the past five years, as indicated by database records, were identified to form this publication. This directory includes approximately 34,000 organizations that reported energy-related literature from 1986 to 1990 and updates the DOE Energy Data Base: Corporate Author Entries.

  4. Geospatial database for heritage building conservation

    NASA Astrophysics Data System (ADS)

    Basir, W. N. F. W. A.; Setan, H.; Majid, Z.; Chong, A.

    2014-02-01

    Heritage buildings are icons from the past that exist in the present. Through heritage architecture, we can learn about economic issues and social activities of the past. Nowadays, heritage buildings are under threat from natural disasters, uncertain weather, pollution and other factors. In order to preserve this heritage for future generations, recording and documenting of heritage buildings are required. With the development of information systems and data collection techniques, it is possible to create a 3D digital model. This 3D information plays an important role in recording and documenting heritage buildings. 3D modelling and virtual reality techniques have demonstrated the ability to visualize the real world in 3D. They can provide a better platform for communication and understanding of heritage buildings. Combining 3D modelling with Geographic Information System (GIS) technology will create a database that supports various analyses of spatial data in the form of a 3D model. The objectives of this research are to determine the reliability of the Terrestrial Laser Scanning (TLS) technique for data acquisition of heritage buildings and to develop a geospatial database for heritage building conservation purposes. The result from data acquisition will become a guideline for 3D model development. This 3D model will be exported to the GIS format in order to develop a database for heritage building conservation. In this database, requirements for the heritage building conservation process are included. Through this research, a proper database for storing and documenting heritage building conservation data will be developed.

  5. Organizations in America: Analyzing Their Structures and Human Resource Practices Based on the National Organizations Study.

    ERIC Educational Resources Information Center

    Kalleberg, Arne L.; Knoke, David; Marsden, Peter V.; Spaeth, Joe L.

    In 1991 the National Organizations Study (NOS) surveyed a number of U.S. businesses about their structure, context, and personnel practices to produce a database for answering questions about social behavior in work organizations. This book presents the results of that survey. The study aimed to create a national database on organizations--based…

  6. Influence of high-resolution surface databases on the modeling of local atmospheric circulation systems

    NASA Astrophysics Data System (ADS)

    Paiva, L. M. S.; Bodstein, G. C. R.; Pimentel, L. C. G.

    2014-08-01

    Large-eddy simulations are performed using the Advanced Regional Prediction System (ARPS) code at horizontal grid resolutions as fine as 300 m to assess the influence of detailed and updated surface databases on the modeling of local atmospheric circulation systems of urban areas with complex terrain. Applications to air pollution and wind energy are sought. These databases are comprised of 3 arc-sec topographic data from the Shuttle Radar Topography Mission, 10 arc-sec vegetation-type data from the European Space Agency (ESA) GlobCover project, and 30 arc-sec leaf area index and fraction of absorbed photosynthetically active radiation data from the ESA GlobCarbon project. Simulations are carried out for the metropolitan area of Rio de Janeiro using six one-way nested-grid domains that allow the choice of distinct parametric models and vertical resolutions associated with each grid. ARPS is initialized using the Global Forecasting System with 0.5°-resolution data from the National Center of Environmental Prediction, which is also used every 3 h as lateral boundary condition. Topographic shading is turned on and two soil layers are used to compute the soil temperature and moisture budgets in all runs. Results for two simulated runs covering three periods of time are compared to surface and upper-air observational data to explore the dependence of the simulations on initial and boundary conditions, grid resolution, and topographic and land-use databases. Our comparisons show overall good agreement between simulated and observational data, mainly for the potential temperature and the wind speed fields, and clearly indicate that the use of high-resolution databases significantly improves our ability to predict the local atmospheric circulation.

  7. System and method employing a self-organizing map load feature database to identify electric load types of different electric loads

    DOEpatents

    Lu, Bin; Harley, Ronald G.; Du, Liang; Yang, Yi; Sharma, Santosh K.; Zambare, Prachi; Madane, Mayura A.

    2014-06-17

    A method identifies electric load types of a plurality of different electric loads. The method includes providing a self-organizing map load feature database of a plurality of different electric load types and a plurality of neurons, each of the load types corresponding to a number of the neurons; employing a weight vector for each of the neurons; sensing a voltage signal and a current signal for each of the loads; determining a load feature vector including at least four different load features from the sensed voltage signal and the sensed current signal for a corresponding one of the loads; and identifying by a processor one of the load types by relating the load feature vector to the neurons of the database by identifying the weight vector of one of the neurons corresponding to the one of the load types that is a minimal distance to the load feature vector.
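
    The final identification step described above (relating a load feature vector to the neuron whose weight vector is at minimal distance) can be sketched as follows. This is a minimal sketch, not the patented implementation: Euclidean distance, the function names, the four-element vectors, and the example load-type labels are all assumptions made here, and training of the self-organizing map is not shown.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify_load_type(feature_vector, neurons):
    """Return the load type of the neuron closest to the feature vector.

    neurons -- list of (weight_vector, load_type) pairs standing in for a
               trained self-organizing map load feature database
    """
    _weights, load_type = min(
        neurons, key=lambda neuron: euclidean(neuron[0], feature_vector)
    )
    return load_type


if __name__ == "__main__":
    # Made-up weight vectors with four load features each.
    database = [
        ([1.0, 0.0, 0.0, 0.0], "resistive"),
        ([0.0, 1.0, 0.0, 0.0], "motor"),
        ([0.0, 0.0, 1.0, 1.0], "electronic"),
    ]
    print(identify_load_type([0.9, 0.1, 0.0, 0.0], database))
```

    Several neurons may map to the same load type, which is why the lookup returns the label attached to the winning neuron rather than the neuron itself.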

  8. Database Dictionary for Ethiopian National Ground-Water Database (ENGDA) Data Fields

    USGS Publications Warehouse

    Kuniansky, Eve L.; Litke, David W.; Tucci, Patrick

    2007-01-01

    Introduction This document describes the data fields that are used for both field forms and the Ethiopian National Ground-water Database (ENGDA) tables associated with information stored about production wells, springs, test holes, test wells, and water level or water-quality observation wells. Several different words are used in this database dictionary and in the ENGDA database to describe a narrow shaft constructed in the ground. The most general term is borehole, which is applicable to any type of hole. A well is a borehole specifically constructed to extract water from the ground; however, for this data dictionary and for the ENGDA database, the words well and borehole are used interchangeably. A production well is defined as any well used for water supply and includes hand-dug wells, small-diameter bored wells equipped with hand pumps, or large-diameter bored wells equipped with large-capacity motorized pumps. Test holes are borings made to collect information about the subsurface with continuous core or non-continuous core and/or where geophysical logs are collected. Test holes are not converted into wells. A test well is a well constructed for hydraulic testing of an aquifer in order to plan a larger ground-water production system. A water-level or water-quality observation well is a well that is used to collect information about an aquifer and not used for water supply. A spring is any naturally flowing, local, ground-water discharge site. The database dictionary is designed to help define all fields on both field data collection forms (provided in attachment 2 of this report) and for the ENGDA software screen entry forms (described in Litke, 2007). The data entered into each screen entry field are stored in relational database tables within the computer database. The organization of the database dictionary is designed based on field data collection and the field forms, because this is what the majority of people will use. After each field, however, the

  9. The Brain Database: A Multimedia Neuroscience Database for Research and Teaching

    PubMed Central

    Wertheim, Steven L.

    1989-01-01

    The Brain Database is an information tool designed to aid in the integration of clinical and research results in neuroanatomy and regional biochemistry. It can handle a wide range of data types including natural images, 2- and 3-dimensional graphics, video, numeric data and text. It is organized around three main entities: structures, substances and processes. The database will support a wide variety of graphical interfaces. Two sample interfaces have been made. This tool is intended to serve as one component of a system that would allow neuroscientists and clinicians 1) to represent clinical and experimental data within a common framework, 2) to compare results precisely between experiments and among laboratories, 3) to use computing tools as an aid in collaborative work and 4) to contribute to a shared and accessible body of knowledge about the nervous system.

  10. SAADA: Astronomical Databases Made Easier

    NASA Astrophysics Data System (ADS)

    Michel, L.; Nguyen, H. N.; Motch, C.

    2005-12-01

    Many astronomers wish to share datasets with their community but do not have enough manpower to develop databases having the functionalities required for high-level scientific applications. The SAADA project aims at automating the creation and deployment of such databases. A generic but scientifically relevant data model has been designed which allows one to build databases by providing only a limited number of product mapping rules. Databases created by SAADA rely on a relational database supporting JDBC and covered by a Java layer including a large amount of generated code. Such databases can simultaneously host spectra, images, source lists and plots. Data are grouped in user-defined collections whose content can be seen as one unique set per data type even if their formats differ. Datasets can be correlated with one another using qualified links. These links help, for example, to handle the nature of a cross-identification (e.g., a distance or a likelihood) or to describe their scientific content (e.g., by associating a spectrum to a catalog entry). The SAADA query engine is based on a language well suited to the data model which can handle constraints on linked data, in addition to classical astronomical queries. These constraints can be applied on the linked objects (number, class and attributes) and/or on the link qualifier values. Databases created by SAADA are accessed through a rich Web interface or a Java API. We are currently developing an interoperability module implementing VO protocols.

  11. Estimation of daily reference evapotranspiration (ETo) using artificial intelligence methods: Offering a new approach for lagged ETo data-based modeling

    NASA Astrophysics Data System (ADS)

    Mehdizadeh, Saeid

    2018-04-01

    Evapotranspiration (ET) is considered a key factor in hydrological and climatological studies, agricultural water management, irrigation scheduling, etc. It can be directly measured using lysimeters. Moreover, other methods such as empirical equations and artificial intelligence methods can be used to model ET. In recent years, artificial intelligence methods have been widely utilized to estimate reference evapotranspiration (ETo). In the present study, local and external performances of multivariate adaptive regression splines (MARS) and gene expression programming (GEP) were assessed for estimating daily ETo. For this aim, daily weather data of six stations with different climates in Iran, namely Urmia and Tabriz (semi-arid), Isfahan and Shiraz (arid), Yazd and Zahedan (hyper-arid) were employed during 2000-2014. Two types of input patterns consisting of weather data-based and lagged ETo data-based scenarios were considered to develop the models. Four statistical indicators including root mean square error (RMSE), mean absolute error (MAE), coefficient of determination (R2), and mean absolute percentage error (MAPE) were used to check the accuracy of models. The local performance of models revealed that the MARS and GEP approaches have the capability to estimate daily ETo using the meteorological parameters and the lagged ETo data as inputs. Nevertheless, the MARS had the best performance in the weather data-based scenarios. On the other hand, considerable differences were not observed in the models' accuracy for the lagged ETo data-based scenarios. As the innovation of this study, novel hybrid models were proposed in the lagged ETo data-based scenarios through combination of MARS and GEP models with the autoregressive conditional heteroscedasticity (ARCH) time series model. It was concluded that the proposed novel models named MARS-ARCH and GEP-ARCH improved the performance of ETo modeling compared to the single MARS and GEP. In addition, the external
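
    The four accuracy indicators named in this abstract have standard textbook definitions, sketched below. The abstract does not state which formula the study used for R2; the version here (one minus the ratio of residual to total sum of squares) is an assumption, and the squared Pearson correlation is a common alternative. The sample values are made up.

```python
import math


def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def r2(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mape(observed, predicted):
    """Mean absolute percentage error, in percent; observed values must be nonzero."""
    n = len(observed)
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n


if __name__ == "__main__":
    # Made-up daily ETo values (mm/day), for illustration only.
    obs = [3.1, 4.0, 5.2, 4.4]
    est = [3.0, 4.3, 5.0, 4.6]
    print(rmse(obs, est), mae(obs, est), r2(obs, est), mape(obs, est))
```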

  12. Establishment of an international database for genetic variants in esophageal cancer.

    PubMed

    Vihinen, Mauno

    2016-10-01

    The establishment of a database has been suggested in order to collect, organize, and distribute genetic information about esophageal cancer. The World Organization for Specialized Studies on Diseases of the Esophagus and the Human Variome Project will be in charge of a central database of information about esophageal cancer-related variations from publications, databases, and laboratories; in addition to genetic details, clinical parameters will also be included. The aim will be to get all the central players in research, clinical, and commercial laboratories to contribute. The database will follow established recommendations and guidelines. The database will require a team of dedicated curators with different backgrounds. Numerous layers of systematics will be applied to facilitate computational analyses. The data items will be extensively integrated with other information sources. The database will be distributed as open access to ensure exchange of the data with other databases. Variations will be reported in relation to reference sequences on three levels--DNA, RNA, and protein--whenever applicable. In the first phase, the database will concentrate on genetic variations including both somatic and germline variations for susceptibility genes. Additional types of information can be integrated at a later stage. © 2016 New York Academy of Sciences.

  13. The ChArMEx database

    NASA Astrophysics Data System (ADS)

    Ferré, Hélène; Descloitres, Jacques; Fleury, Laurence; Boichard, Jean-Luc; Brissebrat, Guillaume; Focsa, Loredana; Henriot, Nicolas; Mastrorillo, Laurence; Mière, Arnaud; Vermeulen, Anne

    2013-04-01

    The Chemistry-Aerosol Mediterranean Experiment (ChArMEx, http://charmex.lsce.ipsl.fr/) aims at a scientific assessment of the present and future state of the atmospheric environment in the Mediterranean Basin, and of its impacts on the regional climate, air quality, and marine biogeochemistry. The project includes long term monitoring of environmental parameters, intensive field campaigns, use of satellite data and modelling studies. Therefore ChArMEx scientists produce and need to access a wide diversity of data. In this context, the objective of the database task is to organize data management, distribution system and services such as facilitating the exchange of information and stimulating the collaboration between researchers within the ChArMEx community, and beyond. The database relies on a strong collaboration between OMP and ICARE data centres and falls within the scope of the Mediterranean Integrated Studies at Regional And Local Scales (MISTRALS) program data portal. All the data produced by or of interest for the ChArMEx community will be documented in the data catalogue and accessible through the database website: http://mistrals.sedoo.fr/ChArMEx. The database website offers different tools: - A registration procedure which enables any scientist to accept the data policy and apply for a user database account. - Forms to document observations or products that will be provided to the database in compliance with metadata international standards (ISO 19115-19139; INSPIRE; Global Change Master Directory Thesaurus). - A search tool to browse the catalogue using thematic, geographic and/or temporal criteria. - Sorted lists of the datasets by thematic keywords, by measured parameters, by instruments or by platform type. - A shopping-cart web interface to order in situ data files. At present datasets from the background monitoring station of Ersa, Cape Corsica and from the 2012 ChArMEx pre-campaign are available. - A user-friendly access to satellite products

  14. Database of the United States Coal Pellet Collection of the U.S. Geological Survey Organic Petrology Laboratory

    USGS Publications Warehouse

    Deems, Nikolaus J.; Hackley, Paul C.

    2012-01-01

    The Organic Petrology Laboratory (OPL) of the U.S. Geological Survey (USGS) Eastern Energy Resources Science Center in Reston, Virginia, contains several thousand processed coal sample materials that were loosely organized in laboratory drawers for the past several decades. The majority of these were prepared as 1-inch-diameter particulate coal pellets (more than 6,000 pellets; one sample usually was prepared as two pellets, although some samples were prepared in as many as four pellets), which were polished and used in reflected light petrographic studies. These samples represent the work of many scientists from the 1970s to the present, most notably Ron Stanton, who managed the OPL until 2001 (see Warwick and Ruppert, 2005, for a comprehensive bibliography of Ron Stanton's work). The purpose of the project described herein was to organize and catalog the U.S. part of the petrographic sample collection into a comprehensive database (available with this report as a Microsoft Excel file) and to compile and list published studies associated with the various sample sets. Through this work, the extent of the collection is publicly documented as a resource and sample library available to other scientists and researchers working in U.S. coal basins previously studied by organic petrologists affiliated with the USGS. Other researchers may obtain samples in the OPL collection on loan at the discretion of the USGS authors listed in this report and its associated Web page.

  15. BioMart Central Portal: an open database network for the biological community.

    PubMed

    Guberman, Jonathan M; Ai, J; Arnaiz, O; Baran, Joachim; Blake, Andrew; Baldock, Richard; Chelala, Claude; Croft, David; Cros, Anthony; Cutts, Rosalind J; Di Génova, A; Forbes, Simon; Fujisawa, T; Gadaleta, E; Goodstein, D M; Gundem, Gunes; Haggarty, Bernard; Haider, Syed; Hall, Matthew; Harris, Todd; Haw, Robin; Hu, S; Hubbard, Simon; Hsu, Jack; Iyer, Vivek; Jones, Philip; Katayama, Toshiaki; Kinsella, R; Kong, Lei; Lawson, Daniel; Liang, Yong; Lopez-Bigas, Nuria; Luo, J; Lush, Michael; Mason, Jeremy; Moreews, Francois; Ndegwa, Nelson; Oakley, Darren; Perez-Llamas, Christian; Primig, Michael; Rivkin, Elena; Rosanoff, S; Shepherd, Rebecca; Simon, Reinhard; Skarnes, B; Smedley, Damian; Sperling, Linda; Spooner, William; Stevenson, Peter; Stone, Kevin; Teague, J; Wang, Jun; Wang, Jianxin; Whitty, Brett; Wong, D T; Wong-Erasmus, Marie; Yao, L; Youens-Clark, Ken; Yung, Christina; Zhang, Junjun; Kasprzyk, Arek

    2011-01-01

    BioMart Central Portal is a first of its kind, community-driven effort to provide unified access to dozens of biological databases spanning genomics, proteomics, model organisms, cancer data, ontology information and more. Anybody can contribute an independently maintained resource to the Central Portal, allowing it to be exposed to and shared with the research community, and linking it with the other resources in the portal. Users can take advantage of the common interface to quickly utilize different sources without learning a new system for each. The system also simplifies cross-database searches that might otherwise require several complicated steps. Several integrated tools streamline common tasks, such as converting between ID formats and retrieving sequences. The combination of a wide variety of databases, an easy-to-use interface, robust programmatic access and the array of tools makes Central Portal a one-stop shop for biological data querying. Here, we describe the structure of Central Portal and show example queries to demonstrate its capabilities.

  16. The AMMA database

    NASA Astrophysics Data System (ADS)

    Boichard, Jean-Luc; Brissebrat, Guillaume; Cloche, Sophie; Eymard, Laurence; Fleury, Laurence; Mastrorillo, Laurence; Moulaye, Oumarou; Ramage, Karim

    2010-05-01

    The AMMA project includes aircraft, ground-based and ocean measurements, an intensive use of satellite data and diverse modelling studies. Therefore, the AMMA database aims at storing a great amount and a large variety of data, and at providing the data as rapidly and safely as possible to the AMMA research community. In order to stimulate the exchange of information and collaboration between researchers from different disciplines or using different tools, the database provides a detailed description of the products and uses standardized formats. The AMMA database contains: - AMMA field campaigns datasets; - historical data in West Africa from 1850 (operational networks and previous scientific programs); - satellite products from past and future satellites, (re-)mapped on a regular latitude/longitude grid and stored in NetCDF format (CF Convention); - model outputs from atmosphere or ocean operational (re-)analysis and forecasts, and from research simulations. The outputs are processed as the satellite products are. Before accessing the data, any user has to sign the AMMA data and publication policy. This charter only covers the use of data in the framework of scientific objectives and categorically excludes the redistribution of data to third parties and the usage for commercial applications. Some collaboration between data producers and users, and the mention of the AMMA project in any publication, are also required. The AMMA database and the associated on-line tools have been fully developed and are managed by two teams in France (IPSL Database Centre, Paris and OMP, Toulouse). Users can access data of both data centres using a single web portal. This website is composed of different modules: - Registration: forms to register, read and sign the data use charter when a user visits for the first time - Data access interface: a user-friendly tool for building a data extraction request by selecting various criteria like location, time, parameters... The request can

  17. Scalable Database Design of End-Game Model with Decoupled Countermeasure and Threat Information

    DTIC Science & Technology

    2017-11-01

    By Decetria Akole and Michael Chen. Approved for public release; distribution is unlimited... The Thurgood Marshall...

  18. A kinetics database and scripts for PHREEQC

    NASA Astrophysics Data System (ADS)

    Hu, B.; Zhang, Y.; Teng, Y.; Zhu, C.

    2017-12-01

    The kinetics of geochemical reactions have been increasingly used in numerical models to simulate coupled flow, mass transport, and chemical reactions. However, the kinetic data are scattered in the literature. Assembling a kinetic dataset for a modeling project is an intimidating task for most. In order to facilitate the application of kinetics in geochemical modeling, we assembled kinetics parameters into a database for the geochemical simulation program PHREEQC (version 3.0). Kinetics data were collected from the literature. Our database includes kinetic data for over 70 minerals. The rate equations are also programmed into scripts in the Basic language. Using the new kinetic database, we simulated the reaction path during the albite dissolution process using various rate equations from the literature. The simulations with three different rate equations gave different reaction paths at different time scales. Another application involves a coupled reactive transport model simulating the advancement of an acid plume at an acid mine drainage site associated with the Bear Creek Uranium tailings pond. Geochemical reactions involving calcite, gypsum, and illite were simulated with PHREEQC using the new kinetic database. The simulation results successfully demonstrated the utility of the new kinetic database.
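
    To illustrate the kind of rate equation such a database parameterizes, the sketch below evaluates a generic transition-state-theory-style rate law, r = k(T) * A * (1 - Omega), with an Arrhenius temperature correction. It is not taken from the paper or from PHREEQC's Basic scripts; the function name, its arguments, and any values passed to it are hypothetical placeholders.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J/(mol*K)

def dissolution_rate(k25, activation_energy, temp_k, surface_area, saturation_ratio):
    """Generic mineral dissolution rate in mol/s (illustrative form only).

    k25               rate constant at 298.15 K, mol/(m^2*s)   (hypothetical input)
    activation_energy apparent activation energy, J/mol         (hypothetical input)
    temp_k            temperature, K
    surface_area      reactive surface area, m^2
    saturation_ratio  Omega = IAP / K; 1.0 means equilibrium
    """
    # Arrhenius correction of the 25 degC rate constant to temperature temp_k.
    k_t = k25 * math.exp(-activation_energy / R_GAS * (1.0 / temp_k - 1.0 / 298.15))
    # The rate falls to zero as the solution approaches saturation (Omega -> 1).
    return k_t * surface_area * (1.0 - saturation_ratio)
```

    Published rate laws for the same mineral differ in their pH dependence and in the form of the affinity term, which is consistent with the abstract's observation that different rate equations yield different reaction paths.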

  19. Organic acid modeling and model validation: Workshop summary. Final report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sullivan, T.J.; Eilers, J.M.

    1992-08-14

    A workshop was held in Corvallis, Oregon, on April 9-10, 1992, at the offices of E&S Environmental Chemistry, Inc. The purpose of this workshop was to initiate research efforts on the project entitled "Incorporation of an organic acid representation into MAGIC (Model of Acidification of Groundwater in Catchments) and testing of the revised model using independent data sources." The workshop was attended by a team of internationally recognized experts in the fields of surface water acid-base chemistry, organic acids, and watershed modeling. The rationale for the proposed research is based on the recent comparison between MAGIC model hindcasts and paleolimnological inferences of historical acidification for a set of 33 statistically selected Adirondack lakes. Agreement between diatom-inferred and MAGIC-hindcast lakewater chemistry in the earlier research had been less than satisfactory. Based on preliminary analyses, it was concluded that incorporation of a reasonable organic acid representation into the version of MAGIC used for hindcasting was the logical next step toward improving model agreement.

  20. A geospatial database model for the management of remote sensing datasets at multiple spectral, spatial, and temporal scales

    NASA Astrophysics Data System (ADS)

    Ifimov, Gabriela; Pigeau, Grace; Arroyo-Mora, J. Pablo; Soffer, Raymond; Leblanc, George

    2017-10-01

    In this study, the development and implementation of a geospatial database model for the management of multiscale datasets encompassing airborne imagery and associated metadata is presented. To develop the multi-source geospatial database we have used a Relational Database Management System (RDBMS) on a Structured Query Language (SQL) server which was then integrated into ArcGIS and implemented as a geodatabase. The acquired datasets were compiled, standardized, and integrated into the RDBMS, where logical associations between different types of information were linked (e.g. location, date, and instrument). Airborne data, at different processing levels (digital numbers through geocorrected reflectance), were implemented in the geospatial database where the datasets are linked spatially and temporally. An example dataset consisting of airborne hyperspectral imagery, collected for inter- and intra-annual vegetation characterization and detection of potential hydrocarbon seepage events over pipeline areas, is presented. Our work provides a model for the management of airborne imagery, which is a challenging aspect of data management in remote sensing, especially when large volumes of data are collected.
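
    The idea of linking acquisitions by location, date, and instrument in a relational schema can be sketched with Python's built-in sqlite3 module. This is an illustrative assumption only: the authors used an SQL server integrated into ArcGIS, and every table name, column name, and sample row below is hypothetical.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with a hypothetical three-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE site (
            site_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL,
            lat     REAL,
            lon     REAL
        );
        CREATE TABLE instrument (
            instrument_id INTEGER PRIMARY KEY,
            name          TEXT NOT NULL
        );
        CREATE TABLE acquisition (
            acquisition_id   INTEGER PRIMARY KEY,
            site_id          INTEGER NOT NULL REFERENCES site(site_id),
            instrument_id    INTEGER NOT NULL REFERENCES instrument(instrument_id),
            acquired_on      TEXT NOT NULL,   -- ISO 8601 date
            processing_level TEXT NOT NULL
        );
    """)
    # Made-up sample rows, for demonstration only.
    conn.execute("INSERT INTO site VALUES (1, 'pipeline_site_1', 45.0, -75.0)")
    conn.execute("INSERT INTO instrument VALUES (1, 'hyperspectral_imager_A')")
    conn.executemany(
        "INSERT INTO acquisition VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, 1, "2016-06-15", "digital_numbers"),
            (2, 1, 1, "2016-06-15", "geocorrected_reflectance"),
            (3, 1, 1, "2017-07-01", "geocorrected_reflectance"),
        ],
    )
    return conn

def acquisitions_for(conn, site_name, year):
    """List (date, instrument, processing level) for one site and year."""
    return conn.execute(
        """
        SELECT a.acquired_on, i.name, a.processing_level
        FROM acquisition AS a
        JOIN site AS s ON s.site_id = a.site_id
        JOIN instrument AS i ON i.instrument_id = a.instrument_id
        WHERE s.name = ? AND substr(a.acquired_on, 1, 4) = ?
        ORDER BY a.acquisition_id
        """,
        (site_name, str(year)),
    ).fetchall()
```

    Storing each processing level as its own acquisition row keeps raw and geocorrected products of the same flight linked through shared site, date, and instrument keys.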

  1. Molecular DYNAmics of Soil Organic carbon (DYNAMOS ): a project focusing on soils and carbon through data and modeling

    NASA Astrophysics Data System (ADS)

    Mendez-Millan, Mercedes

    2010-05-01

    Here we present the first results of the DynaMOS project whose main issue is the build-up of a new generation of soil carbon model. The modeling will describe together soil organic geochemistry and soil carbon dynamics in a generalized, quantitative representation. The carbon dynamics time scale envisaged here will cover the 1 to 1000 yr range and describe the behaviour of molecules, i.e., carbohydrate, peptide, amino acid, lignin, lipids, their products of biodegradation and uncharacterized carbonaceous species of biological origin. Three main characteristics define DYNAMOS model originalities: it will consider organic matter at the molecular scale, integrate back to global scale and account for component vertical movements. In a first step, specific data acquisition will concern the production, fate and age of carbon of individual organic compounds. Dynamic parameters will be acquired by compound-specific carbon isotope analysis of both 13C and 14C, by GC/C/IR-MS and AMS. Sites for data acquisition, model calibration and model validation will be chosen on the basis of their isotopic history and environmental constraints: 13C natural labeling (with and without C3/C4 vegetation changes), 13C/15N-labelled litter application in both forest and cropland. They include some long-term experiments owned by the partners themselves plus a worldwide panel of sites. In a second step the depth distribution of organic species, isotopes and ages in soils (1D representation) will be modeled by coupling carbon dynamics and vertical movement. Besides the main objective of providing a robust soil carbon dynamics model, DYNAMOS will assess and model the alteration of the isotopic signature of molecules throughout decay and create a shared database of both already published and new data of compound specific information. Issues of the project will concern different scientific fields: global geochemical cycles by refining the description of the terrestrial carbon cycle and entering the chemical

  2. Molecular DYNAmics of Soil Organic carbon (DYNAMOS *): a project focusing on soils and carbon through data and modeling

    NASA Astrophysics Data System (ADS)

    Hatté, C.; Balesdent, J.; Derenne, S.; Derrien, D.; Dignac, M.; Egasse, C.; Ezat, U.; Gauthier, C.; Mendez-Millan, M.; Nguyen Tu, T.; Rumpel, C.; Sicre, M.; Zeller, B.

    2009-12-01

    Here we present the first results of the DynaMOS project whose main issue is the build-up of a new generation of soil carbon model. The modeling will describe together soil organic geochemistry and soil carbon dynamics in a generalized, quantitative representation. The carbon dynamics time scale envisaged here will cover the 1 to 1000 yr range and described molecules will be carbohydrate, peptide, amino acid, lignin, lipids, their products of biodegradation and uncharacterized carbonaceous species of biological origin. Three main characteristics define DYNAMOS model originalities: it will consider organic matter at the molecular scale, integrate back to global scale and account for component vertical movements. In a first step, specific data acquisition will concern the production, fate and age of carbon of individual organic compounds. Dynamic parameters will be acquired by compound-specific carbon isotope analysis of both 13C and 14C, by GC/C/IR-MS and AMS. Sites for data acquisition, model calibration and model validation will be chosen on the basis of their isotopic history and environmental constraints: 13C natural labeling (with and without C3/C4 vegetation changes), 13C/15N-labelled litter application in both forest and cropland. They include some long-term experiments owned by the partners themselves plus a worldwide panel of sites. In a second step the depth distribution of organic species, isotopes and ages in soils (1D representation) will be modeled by coupling carbon dynamics and vertical movement. Besides the main objective of providing a robust soil carbon dynamics model, DYNAMOS will assess and model the alteration of the isotopic signature of molecules throughout decay and create a shared database of both already published and new data of compound specific information. Issues of the project will concern different scientific fields: global geochemical cycles by refining the description of the terrestrial carbon cycle and entering the chemical

  3. TRY – a global database of plant traits

    PubMed Central

    Kattge, J; Díaz, S; Lavorel, S; Prentice, I C; Leadley, P; Bönisch, G; Garnier, E; Westoby, M; Reich, P B; Wright, I J; Cornelissen, J H C; Violle, C; Harrison, S P; Van Bodegom, P M; Reichstein, M; Enquist, B J; Soudzilovskaia, N A; Ackerly, D D; Anand, M; Atkin, O; Bahn, M; Baker, T R; Baldocchi, D; Bekker, R; Blanco, C C; Blonder, B; Bond, W J; Bradstock, R; Bunker, D E; Casanoves, F; Cavender-Bares, J; Chambers, J Q; Chapin, F S; Chave, J; Coomes, D; Cornwell, W K; Craine, J M; Dobrin, B H; Duarte, L; Durka, W; Elser, J; Esser, G; Estiarte, M; Fagan, W F; Fang, J; Fernández-Méndez, F; Fidelis, A; Finegan, B; Flores, O; Ford, H; Frank, D; Freschet, G T; Fyllas, N M; Gallagher, R V; Green, W A; Gutierrez, A G; Hickler, T; Higgins, S I; Hodgson, J G; Jalili, A; Jansen, S; Joly, C A; Kerkhoff, A J; Kirkup, D; Kitajima, K; Kleyer, M; Klotz, S; Knops, J M H; Kramer, K; Kühn, I; Kurokawa, H; Laughlin, D; Lee, T D; Leishman, M; Lens, F; Lenz, T; Lewis, S L; Lloyd, J; Llusià, J; Louault, F; Ma, S; Mahecha, M D; Manning, P; Massad, T; Medlyn, B E; Messier, J; Moles, A T; Müller, S C; Nadrowski, K; Naeem, S; Niinemets, Ü; Nöllert, S; Nüske, A; Ogaya, R; Oleksyn, J; Onipchenko, V G; Onoda, Y; Ordoñez, J; Overbeck, G; Ozinga, W A; Patiño, S; Paula, S; Pausas, J G; Peñuelas, J; Phillips, O L; Pillar, V; Poorter, H; Poorter, L; Poschlod, P; Prinzing, A; Proulx, R; Rammig, A; Reinsch, S; Reu, B; Sack, L; Salgado-Negret, B; Sardans, J; Shiodera, S; Shipley, B; Siefert, A; Sosinski, E; Soussana, J-F; Swaine, E; Swenson, N; Thompson, K; Thornton, P; Waldram, M; Weiher, E; White, M; White, S; Wright, S J; Yguel, B; Zaehle, S; Zanne, A E; Wirth, C

    2011-01-01

    Plant traits – the morphological, anatomical, physiological, biochemical and phenological characteristics of plants and their organs – determine how primary producers respond to environmental factors, affect other trophic levels, influence ecosystem processes and services and provide a link from species richness to ecosystem functional diversity. Trait data thus represent the raw material for a wide range of research from evolutionary biology, community and functional ecology to biogeography. Here we present the global database initiative named TRY, which has united a wide range of the plant trait research community worldwide and gained an unprecedented buy-in of trait data: so far 93 trait databases have been contributed. The data repository currently contains almost three million trait entries for 69 000 out of the world's 300 000 plant species, with a focus on 52 groups of traits characterizing the vegetative and regeneration stages of the plant life cycle, including growth, dispersal, establishment and persistence. A first data analysis shows that most plant traits are approximately log-normally distributed, with widely differing ranges of variation across traits. Most trait variation is between species (interspecific), but significant intraspecific variation is also documented, up to 40% of the overall variation. Plant functional types (PFTs), as commonly used in vegetation models, capture a substantial fraction of the observed variation – but for several traits most variation occurs within PFTs, up to 75% of the overall variation. In the context of vegetation models these traits would better be represented by state variables rather than fixed parameter values. The improved availability of plant trait data in the unified global database is expected to support a paradigm shift from species to trait-based ecology, offer new opportunities for synthetic plant trait research and enable a more realistic and empirically grounded representation of terrestrial

  4. GIDL: a rule based expert system for GenBank Intelligent Data Loading into the Molecular Biodiversity database

    PubMed Central

    2012-01-01

    Background In the scientific biodiversity community, the need to build a bridge between molecular and traditional biodiversity studies is increasingly perceived. We believe that information technology could have a preeminent role in integrating the information generated by these studies with the large amount of molecular data available in public bioinformatics databases. This work is primarily aimed at building a bioinformatic infrastructure for the integration of public and private biodiversity data through the development of GIDL, an Intelligent Data Loader coupled with the Molecular Biodiversity Database. The system presented here organizes in an ontological way and locally stores the sequence and annotation data contained in the GenBank primary database. Methods The GIDL architecture consists of a relational database and intelligent data loader software. The relational database schema is designed to manage biodiversity information (Molecular Biodiversity Database) and is organized in four areas: MolecularData, Experiment, Collection and Taxonomy. The MolecularData area is inspired by an established standard in Generic Model Organism Databases, the Chado relational schema. The peculiarity of Chado, and also its strength, is the adoption of an ontological schema which makes use of the Sequence Ontology. The Intelligent Data Loader (IDL) component of GIDL is Extract, Transform and Load software able to parse data, to discover hidden information in the GenBank entries and to populate the Molecular Biodiversity Database. The IDL is composed of three main modules: the Parser, able to parse GenBank flat files; the Reasoner, which automatically builds CLIPS facts mapping the biological knowledge expressed by the Sequence Ontology; the DBFiller, which translates the CLIPS facts into ordered SQL statements used to populate the database. In GIDL Semantic Web technologies have been adopted due to their advantages in data representation, integration and

  5. The 24th annual Nucleic Acids Research database issue: a look back and upcoming changes

    PubMed Central

    Rigden, Daniel J

    2017-01-01

    This year's Database Issue of Nucleic Acids Research contains 152 papers that include descriptions of 54 new databases and update papers on 98 databases, of which 16 have not been previously featured in NAR. As always, these databases cover a broad range of molecular biology subjects, including genome structure, gene expression and its regulation, proteins, protein domains, and protein–protein interactions. Following the recent trend, an increasing number of new and established databases deal with the issues of human health, from cancer-causing mutations to drugs and drug targets. In accordance with this trend, three recently compiled databases that have been selected by NAR reviewers and editors as ‘breakthrough’ contributions, denovo-db, the Monarch Initiative, and Open Targets, cover human de novo gene variants, disease-related phenotypes in model organisms, and a bioinformatics platform for therapeutic target identification and validation, respectively. We expect these databases to attract the attention of numerous researchers working in various areas of genetics and genomics. Looking back at the past 12 years, we present here the ‘golden set’ of databases that have consistently served as authoritative, comprehensive, and convenient data resources widely used by the entire community and offer some lessons on what makes a successful database. The Database Issue is freely available online at the https://academic.oup.com/nar web site. An updated version of the NAR Molecular Biology Database Collection is available at http://www.oxfordjournals.org/nar/database/a/. PMID:28053160

  6. A user-friendly phytoremediation database: creating the searchable database, the users, and the broader implications.

    PubMed

    Famulari, Stevie; Witz, Kyla

    2015-01-01

    Designers, students, teachers, gardeners, farmers, landscape architects, architects, engineers, homeowners, and others have uses for the practice of phytoremediation. This research looks at the creation of a phytoremediation database which is designed for ease of use for a non-scientific user, as well as for students in an educational setting ( http://www.steviefamulari.net/phytoremediation ). During 2012, Environmental Artist & Professor of Landscape Architecture Stevie Famulari, with assistance from Kyla Witz, a landscape architecture student, created an online searchable database designed for high public accessibility. The database is a record of research of plant species that aid in the uptake of contaminants, including metals, organic materials, biodiesels & oils, and radionuclides. The database consists of multiple interconnected indexes categorized into common and scientific plant name, contaminant name, and contaminant type. It includes photographs, hardiness zones, specific plant qualities, full citations to the original research, and other relevant information intended to aid those designing with phytoremediation search for potential plants which may be used to address their site's need. The objective of the terminology section is to remove uncertainty for more inexperienced users, and to clarify terms for a more user-friendly experience. Implications of the work, including education and ease of browsing, as well as use of the database in teaching, are discussed.

  7. Development of an inorganic and organic aerosol model (CHIMERE 2017β v1.0): seasonal and spatial evaluation over Europe

    NASA Astrophysics Data System (ADS)

    Couvidat, Florian; Bessagnet, Bertrand; Garcia-Vivanco, Marta; Real, Elsa; Menut, Laurent; Colette, Augustin

    2018-01-01

    A new aerosol module was developed and integrated in the air quality model CHIMERE. Developments include the use of the Model of Emissions and Gases and Aerosols from Nature (MEGAN) 2.1 for biogenic emissions, the implementation of the inorganic thermodynamic model ISORROPIA 2.1, revision of wet deposition processes and of the algorithms of condensation/evaporation and coagulation and the implementation of the secondary organic aerosol (SOA) mechanism H2O and the thermodynamic model SOAP. Concentrations of particles over Europe were simulated by the model for the year 2013. Model concentrations were compared to the European Monitoring and Evaluation Programme (EMEP) observations and other observations available in the EBAS database to evaluate the performance of the model. Performances were determined for several components of particles (sea salt, sulfate, ammonium, nitrate, organic aerosol) with a seasonal and regional analysis of results. The model gives satisfactory performance in general. For sea salt, the model succeeds in reproducing the seasonal evolution of concentrations for western and central Europe. For sulfate, except for an overestimation of sulfate in northern Europe, modeled concentrations are close to observations and the model succeeds in reproducing the seasonal evolution of concentrations. For organic aerosol, the model reproduces with satisfactory results concentrations for stations with strong modeled biogenic SOA concentrations. However, the model strongly overestimates ammonium nitrate concentrations during late autumn (possibly due to problems in the temporal evolution of emissions) and strongly underestimates summer organic aerosol concentrations over most of the stations (especially in the northern half of Europe). This underestimation could be due to a lack of anthropogenic SOA or biogenic emissions in northern Europe. A list of recommended tests and developments to improve the model is also given.

  8. The Cambridge Structural Database in retrospect and prospect.

    PubMed

    Groom, Colin R; Allen, Frank H

    2014-01-13

    The Cambridge Crystallographic Data Centre (CCDC) was established in 1965 to record numerical, chemical and bibliographic data relating to published organic and metal-organic crystal structures. The Cambridge Structural Database (CSD) now stores data for nearly 700,000 structures and is a comprehensive and fully retrospective historical archive of small-molecule crystallography. Nearly 40,000 new structures are added each year. As X-ray crystallography celebrates its centenary as a subject, and the CCDC approaches its own 50th year, this article traces the origins of the CCDC as a publicly funded organization and its onward development into a self-financing charitable institution. Principally, however, we describe the growth of the CSD and its extensive associated software system, and summarize its impact and value as a basis for research in structural chemistry, materials science and the life sciences, including drug discovery and drug development. Finally, the article considers the CCDC's funding model in relation to open access and open data paradigms. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  9. BioQ: tracing experimental origins in public genomic databases using a novel data provenance model.

    PubMed

    Saccone, Scott F; Quan, Jiaxi; Jones, Peter L

    2012-04-15

    Public genomic databases, which are often used to guide genetic studies of human disease, are now being applied to genomic medicine through in silico integrative genomics. These databases, however, often lack tools for systematically determining the experimental origins of the data. We introduce a new data provenance model that we have implemented in a public web application, BioQ, for assessing the reliability of the data by systematically tracing its experimental origins to the original subjects and biologics. BioQ allows investigators to both visualize data provenance as well as explore individual elements of experimental process flow using precise tools for detailed data exploration and documentation. It includes a number of human genetic variation databases such as the HapMap and 1000 Genomes projects. BioQ is freely available to the public at http://bioq.saclab.net.

  10. Chemical transport model simulations of organic aerosol in ...

    EPA Pesticide Factsheets

    Gasoline- and diesel-fueled engines are ubiquitous sources of air pollution in urban environments. They emit both primary particulate matter and precursor gases that react to form secondary particulate matter in the atmosphere. In this work, we updated the organic aerosol module and organic emissions inventory of a three-dimensional chemical transport model, the Community Multiscale Air Quality Model (CMAQ), using recent, experimentally derived inputs and parameterizations for mobile sources. The updated model included a revised volatile organic compound (VOC) speciation for mobile sources and secondary organic aerosol (SOA) formation from unspeciated intermediate volatility organic compounds (IVOCs). The updated model was used to simulate air quality in southern California during May and June 2010, when the California Research at the Nexus of Air Quality and Climate Change (CalNex) study was conducted. Compared to the traditional version of CMAQ, which is commonly used for regulatory applications, the updated model did not significantly alter the predicted organic aerosol (OA) mass concentrations but did substantially improve predictions of OA sources and composition (e.g., POA–SOA split), as well as ambient IVOC concentrations. The updated model, despite substantial differences in emissions and chemistry, performed similarly to a recently released research version of CMAQ (Woody et al., 2016) that did not include the updated VOC and IVOC emissions and SOA data

  11. Making Organisms Model Human Behavior: Situated Models in North-American Alcohol Research, 1950-onwards

    PubMed Central

    Leonelli, Sabina; Ankeny, Rachel A.; Nelson, Nicole C.; Ramsden, Edmund

    2014-01-01

    Argument: We examine the criteria used to validate the use of nonhuman organisms in North-American alcohol addiction research from the 1950s to the present day. We argue that this field, where the similarities between behaviors in humans and non-humans are particularly difficult to assess, has addressed questions of model validity by transforming the situatedness of non-human organisms into an experimental tool. We demonstrate that model validity does not hinge on the standardization of one type of organism in isolation, as is often the case with genetic model organisms. Rather, organisms are viewed as necessarily situated: they cannot be understood as a model for human behavior in isolation from their environmental conditions. Hence the environment itself is standardized as part of the modeling process, and model validity is assessed with reference to the environmental conditions under which organisms are studied. PMID:25233743

  12. Solution Kinetics Database on the Web

    National Institute of Standards and Technology Data Gateway

    SRD 40 NDRL/NIST Solution Kinetics Database on the Web (Web, free access)   Data for free radical processes involving primary radicals from water, inorganic radicals and carbon-centered radicals in solution, and singlet oxygen and organic peroxyl radicals in various solvents.

  13. A new relational database structure and online interface for the HITRAN database

    NASA Astrophysics Data System (ADS)

    Hill, Christian; Gordon, Iouli E.; Rothman, Laurence S.; Tennyson, Jonathan

    2013-11-01

    A new format for the HITRAN database is proposed. By storing the line-transition data in a number of linked tables described by a relational database schema, it is possible to overcome the limitations of the existing format, which have become increasingly apparent over the last few years as new and more varied data are being used by radiative-transfer models. Although the database in the new format can be searched using the well-established Structured Query Language (SQL), a web service, HITRANonline, has been deployed to allow users to make the most common queries of the database using a graphical user interface in a web page. The advantages of the relational form of the database for ensuring data integrity and consistency are explored, and the compatibility of the online interface with the emerging standards of the Virtual Atomic and Molecular Data Centre (VAMDC) project is discussed. In particular, the ability to access HITRAN data using a standard query language from other websites, command line tools and from within computer programs is described.
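
    As a rough illustration of the linked-table idea described in this abstract, the sketch below uses Python's built-in sqlite3 module. The two tables (molecule, transition), their columns, and all numeric values are hypothetical and chosen only for illustration; they are not the actual HITRAN schema or data.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two linked tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce the link
    conn.executescript("""
        CREATE TABLE molecule (
            id      INTEGER PRIMARY KEY,
            formula TEXT NOT NULL UNIQUE
        );
        CREATE TABLE transition (
            id          INTEGER PRIMARY KEY,
            molecule_id INTEGER NOT NULL REFERENCES molecule(id),
            wavenumber  REAL NOT NULL,  -- made-up values, for illustration only
            intensity   REAL NOT NULL
        );
    """)
    conn.executemany(
        "INSERT INTO molecule (id, formula) VALUES (?, ?)",
        [(1, "H2O"), (2, "CO2")],
    )
    conn.executemany(
        "INSERT INTO transition (molecule_id, wavenumber, intensity) VALUES (?, ?, ?)",
        [(1, 1500.0, 1e-20), (1, 3700.0, 2e-20), (2, 2349.0, 3e-18)],
    )
    return conn


def lines_in_range(conn, formula, lo, hi):
    """Return (wavenumber, intensity) rows for one molecule within [lo, hi]."""
    cursor = conn.execute(
        """SELECT t.wavenumber, t.intensity
           FROM transition AS t
           JOIN molecule AS m ON m.id = t.molecule_id
           WHERE m.formula = ? AND t.wavenumber BETWEEN ? AND ?
           ORDER BY t.wavenumber""",
        (formula, lo, hi),
    )
    return cursor.fetchall()
```

    Calling lines_in_range(build_demo_db(), "H2O", 1000, 2000) joins the two tables and returns only the matching rows. The foreign-key reference is one simple way a relational layout can help keep linked records consistent.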

  14. Self-organizing map models of language acquisition

    PubMed Central

    Li, Ping; Zhao, Xiaowei

    2013-01-01

    Connectionist models have had a profound impact on theories of language. While most early models were inspired by the classic parallel distributed processing architecture, recent models of language have explored various other types of models, including self-organizing models for language acquisition. In this paper, we aim at providing a review of the latter type of models, and highlight a number of simulation experiments that we have conducted based on these models. We show that self-organizing connectionist models can provide significant insights into long-standing debates in both monolingual and bilingual language development. We suggest future directions in which these models can be extended, to better connect with behavioral and neural data, and to make clear predictions in testing relevant psycholinguistic theories. PMID:24312061

  15. Web application and database modeling of traffic impact analysis using Google Maps

    NASA Astrophysics Data System (ADS)

    Yulianto, Budi; Setiono

    2017-06-01

    Traffic impact analysis (TIA) is a traffic study that aims to identify the impact of traffic generated by a development or a change in land use. In addition to identifying the traffic impact, a TIA includes mitigation measures to minimize that impact. TIA has become increasingly important since it was defined in the act as one of the requirements for a Building Permit proposal. The act has prompted a number of TIA studies in various cities in Indonesia, including Surakarta. For that reason, it is necessary to study the development of TIA by adopting the concept of Transportation Impact Control (TIC) in the implementation of the TIA standard document and multimodal modeling. This includes standardizing the TIA technical guidelines, database and inspection by providing TIA checklists, monitoring and evaluation. The research was undertaken by collecting historical junction data, modeling the data as a relational database, and building a web-based user interface, using Google Maps libraries, for CRUD (Create, Read, Update and Delete) operations on the TIA data. The result of the research is a system that provides information to help improve and revise existing TIA documents, making them more transparent, reliable and credible.

  16. Hymenoptera Genome Database: integrating genome annotations in HymenopteraMine

    PubMed Central

    Elsik, Christine G.; Tayal, Aditi; Diesh, Colin M.; Unni, Deepak R.; Emery, Marianne L.; Nguyen, Hung N.; Hagen, Darren E.

    2016-01-01

    We report an update of the Hymenoptera Genome Database (HGD) (http://HymenopteraGenome.org), a model organism database for insect species of the order Hymenoptera (ants, bees and wasps). HGD maintains genomic data for 9 bee species, 10 ant species and 1 wasp, including the versions of genome and annotation data sets published by the genome sequencing consortiums and those provided by NCBI. A new data-mining warehouse, HymenopteraMine, based on the InterMine data warehousing system, integrates the genome data with data from external sources and facilitates cross-species analyses based on orthology. New genome browsers and annotation tools based on JBrowse/WebApollo provide easy genome navigation, and viewing of high throughput sequence data sets and can be used for collaborative genome annotation. All of the genomes and annotation data sets are combined into a single BLAST server that allows users to select and combine sequence data sets to search. PMID:26578564

  17. The impact of extracerebral organ failure on outcome of patients after cardiac arrest: an observational study from the ICON database.

    PubMed

    Nobile, Leda; Taccone, Fabio S; Szakmany, Tamas; Sakr, Yasser; Jakob, Stephan M; Pellis, Tommaso; Antonelli, Massimo; Leone, Marc; Wittebole, Xavier; Pickkers, Peter; Vincent, Jean-Louis

    2016-11-14

    We used data from a large international database to assess the incidence and impact of extracerebral organ dysfunction on prognosis of patients admitted after cardiac arrest (CA). This was a sub-analysis of the Intensive Care Over Nations (ICON) database, which contains data from all adult patients admitted to one of 730 participating intensive care units (ICUs) in 84 countries from 8-18 May 2012, except admissions for routine postoperative surveillance. For this analysis, patients admitted after CA (defined as those with "post-anoxic coma" or "cardiac arrest" as the reason for ICU admission) were included. Data were collected daily in the ICU for a maximum of 28 days; patients were followed up for outcome data until death, hospital discharge, or a maximum of 60 days in-hospital. Favorable neurological outcome was defined as alive at hospital discharge with a last available neurological Sequential Organ Failure Assessment (SOFA) subscore of 0-2. Among the 469 patients admitted after CA, 250 (53 %) had had out-of-hospital CA; 210 (45 %) patients died in the ICU and 357 (76 %) had an unfavorable neurological outcome. Non-survivors had a higher incidence of renal (43 vs. 16 %), cardiovascular (56 vs. 45 %), and respiratory (62 vs. 48 %) failure on admission and during the ICU stay than survivors (all p < 0.05). Similar results were found for patients with unfavorable vs. favorable neurological outcomes. In multivariable analysis, independent predictors of ICU mortality were renal failure on admission, high admission Simplified Acute Physiology Score (SAPS) II, high maximum serum lactate levels within the first 24 h after ICU admission, and development of sepsis. Independent predictors of unfavorable neurological outcome were mechanical ventilation on admission, high admission SAPS II score, and neurological dysfunction on admission. In this multicenter cohort, extracerebral organ dysfunction was common in CA patients. Renal failure on admission was the

  18. Distributed Database Control and Allocation. Volume 3. Distributed Database System Designer’s Handbook.

    DTIC Science & Technology

    1983-10-01

    Multiversion Data 2-18 2.7.1 Multiversion Timestamping 2-20 2.7.2 Multiversion Locking 2-20 2.8 Combining the Techniques 2-22 3. Database Recovery Algorithms...See [THEM79, GIFF79] for details. 2.7 Multiversion Data Let us return to a database system model where each logical data item is stored at one DM...In a multiversion database each Write w_i[x] produces a new copy (or version) of x, denoted x_i. Thus, the value of x is a set of versions. For each
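
    The excerpt above is truncated, but the idea it states (each Write adds a version, so the value of x is a set of versions) can be sketched as follows. This is a minimal Python illustration, not the handbook's algorithm: the class name MultiversionItem and the read rule (return the version with the largest write timestamp not exceeding the reader's timestamp) are assumptions made for the example, and the bookkeeping a full multiversion timestamping scheme needs, such as rejecting writes that would invalidate earlier reads, is omitted.

```python
import bisect


class MultiversionItem:
    """One logical data item x whose value is a set of timestamped versions."""

    def __init__(self):
        self._timestamps = []  # write timestamps, kept in ascending order
        self._values = []      # version values, aligned with _timestamps

    def write(self, ts, value):
        """Add a new version of the item instead of overwriting the old one."""
        i = bisect.bisect_left(self._timestamps, ts)
        if i < len(self._timestamps) and self._timestamps[i] == ts:
            raise ValueError("a version with this timestamp already exists")
        self._timestamps.insert(i, ts)
        self._values.insert(i, value)

    def read(self, ts):
        """Return the version with the largest write timestamp <= ts."""
        i = bisect.bisect_right(self._timestamps, ts)
        if i == 0:
            raise KeyError("no version is visible at this timestamp")
        return self._values[i - 1]
```

    Because old versions are kept, a reader with an older timestamp can still be served a consistent value after newer writes have arrived.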

  19. Thermodynamic Modeling of Organic-Inorganic Aerosols with the Group-Contribution Model AIOMFAC

    NASA Astrophysics Data System (ADS)

    Zuend, A.; Marcolli, C.; Luo, B. P.; Peter, T.

    2009-04-01

    Liquid aerosol particles are - from a physicochemical viewpoint - mixtures of inorganic salts, acids, water and a large variety of organic compounds (Rogge et al., 1993; Zhang et al., 2007). Molecular interactions between these aerosol components lead to deviations from ideal thermodynamic behavior. Strong non-ideality between organics and dissolved ions may influence the aerosol phases at equilibrium by means of liquid-liquid phase separations into a mainly polar (aqueous) and a less polar (organic) phase. A number of activity models exist to successfully describe the thermodynamic equilibrium of aqueous electrolyte solutions. However, the large number of different, often multi-functional, organic compounds in mixed organic-inorganic particles is a challenging problem for the development of thermodynamic models. The group-contribution concept, as introduced in the UNIFAC model by Fredenslund et al. (1975), is a practical method to handle this difficulty and to add a certain predictability for unknown organic substances. We present the group-contribution model AIOMFAC (Aerosol Inorganic-Organic Mixtures Functional groups Activity Coefficients), which explicitly accounts for molecular interactions between solution constituents, both organic and inorganic, to calculate activities, chemical potentials and the total Gibbs energy of mixed systems (Zuend et al., 2008). This model enables the computation of vapor-liquid (VLE), liquid-liquid (LLE) and solid-liquid (SLE) equilibria within one framework. Focusing on atmospheric applications we considered eight different cations, five anions and a wide range of alcohols/polyols as organic compounds. With AIOMFAC, the activities of the components within an aqueous electrolyte solution are very well represented up to high ionic strength. We show that the semi-empirical middle-range parametrization of direct organic-inorganic interactions in alcohol-water-salt solutions enables accurate computations of vapor-liquid and liquid

  20. SolCyc: a database hub at the Sol Genomics Network (SGN) for the manual curation of metabolic networks in Solanum and Nicotiana specific databases

    PubMed Central

    Foerster, Hartmut; Bombarely, Aureliano; Battey, James N D; Sierro, Nicolas; Ivanov, Nikolai V; Mueller, Lukas A

    2018-01-01

    SolCyc is the entry portal to pathway/genome databases (PGDBs) for major species of the Solanaceae family hosted at the Sol Genomics Network. Currently, SolCyc comprises six organism-specific PGDBs for tomato, potato, pepper, petunia, tobacco and one Rubiaceae, coffee. The metabolic networks of those PGDBs have been computationally predicted by the PathoLogic component of the Pathway Tools software using the manually curated multi-domain database MetaCyc (http://www.metacyc.org/) as reference. SolCyc has been recently extended by taxon-specific databases, i.e. the family-specific SolanaCyc database, containing only curated data pertinent to species of the nightshade family, and NicotianaCyc, a genus-specific database that stores all relevant metabolic data of the Nicotiana genus. Through manual curation of the published literature, new metabolic pathways have been created in those databases, which are complemented by the continuously updated, relevant species-specific pathways from MetaCyc. At present, SolanaCyc comprises 199 pathways and 29 superpathways and NicotianaCyc accounts for 72 pathways and 13 superpathways. Curator-maintained, taxon-specific databases such as SolanaCyc and NicotianaCyc are characterized by an enrichment of data specific to these taxa and are free of falsely predicted pathways. Both databases have been used to update recently created Nicotiana-specific databases for Nicotiana tabacum, Nicotiana benthamiana, Nicotiana sylvestris and Nicotiana tomentosiformis by propagating verifiable data into those PGDBs. In addition, in-depth curation of the pathways in N. tabacum has been carried out, which resulted in the elimination of 156 pathways from the 569 pathways predicted by Pathway Tools. Together, in-depth curation of the predicted pathway network and the supplementation with curated data from taxon-specific databases have substantially improved the curation status of the species-specific N. tabacum PGDB. The implementation of this

  1. Virtual Organizations: Trends and Models

    NASA Astrophysics Data System (ADS)

    Nami, Mohammad Reza; Malekpour, Abbaas

    The use of ICT in business has changed views about traditional business. With a virtual organization (VO), organizations can collaborate without physical, geographical, or structural constraints in order to fulfill customer requests in a networked environment. This idea improves resource utilization, shortens the development process, reduces costs, and saves time. A VO is always a form of partnership, so managing partners and handling partnerships are crucial. Virtual organizations are defined as a temporary collection of enterprises that cooperate and share resources, knowledge, and competencies to better respond to business opportunities. This paper presents an overview of virtual organizations and the main issues in collaboration, such as security and management. It also presents a number of different modeling approaches according to their purpose and applications.

  2. SASD: the Synthetic Alternative Splicing Database for identifying novel isoform from proteomics

    PubMed Central

    2013-01-01

    Background: Alternative splicing is an important and widespread mechanism for generating protein diversity and regulating protein expression. High-throughput identification and analysis of alternative splicing at the protein level has more advantages than at the mRNA level. The combination of an alternative splicing database and tandem mass spectrometry provides a powerful technique for identification, analysis and characterization of potential novel alternative splicing protein isoforms from proteomics. Therefore, based on the peptidomic database of human protein isoforms for proteomics experiments, our objective is to design a new alternative splicing database to 1) provide more coverage of genes, transcripts and alternative splicing, 2) exclusively focus on alternative splicing, and 3) perform context-specific alternative splicing analysis. Results: We used a three-step pipeline to create a synthetic alternative splicing database (SASD) to identify novel alternative splicing isoforms and interpret them in the context of pathway, disease, drug and organ specificity or a custom gene set, with maximum coverage and exclusive focus on alternative splicing. First, we extracted information on gene structures of all genes in the Ensembl Genes 71 database and incorporated the Integrated Pathway Analysis Database. Then, we compiled artificial splicing transcripts. Lastly, we translated the artificial transcripts into alternative splicing peptides. The SASD is a comprehensive database containing 56,630 genes (Ensembl gene IDs), 95,260 transcripts (Ensembl transcript IDs), and 11,919,779 alternative splicing peptides, and also covering about 1,956 pathways, 6,704 diseases, 5,615 drugs, and 52 organs. The database has a web-based user interface that allows users to search, display and download alternative splicing related to a single gene/transcript/protein, custom gene set, pathway, disease, drug, or organ. Moreover, the quality of the database was validated with comparison to other

  3. New Directions in Library and Information Science Education. Final Report. Volume 2.6: Database Distributor/Service Professional Competencies.

    ERIC Educational Resources Information Center

    Griffiths, Jose-Marie; And Others

    This document contains validated activities and competencies needed by librarians working in a database distributor/service organization. The activities of professionals working in database distributor/service organizations are listed by function: Database Processing; Customer Support; System Administration; and Planning. The competencies are…

  4. FCDD: A Database for Fruit Crops Diseases.

    PubMed

    Chauhan, Rupal; Jasrai, Yogesh; Pandya, Himanshu; Chaudhari, Suman; Samota, Chand Mal

    2014-01-01

    The Fruit Crops Diseases Database (FCDD) requires a number of biotechnology and bioinformatics tools. The FCDD is a unique bioinformatics resource that compiles information about 162 details on fruit crop diseases, including disease type, causal organism, images, symptoms and control. The FCDD contains 171 phytochemicals from 25 fruits, their 2D images and their 20 possible sequences. This information has been manually extracted and manually verified from numerous sources, including other electronic databases, textbooks and scientific journals. FCDD is fully searchable and supports extensive text search. The main focus of the FCDD is on providing available information on fruit crop diseases, which will help in the discovery of potential drugs from one of the common bioresources, fruits. The database was developed using MySQL. The database interface is developed in PHP, HTML and JAVA. FCDD is freely available at http://www.fruitcropsdd.com/

  5. Influence of dissolved organic carbon content on modelling natural organic matter acid-base properties.

    PubMed

    Garnier, Cédric; Mounier, Stéphane; Benaïm, Jean Yves

    2004-10-01

    The behaviour of natural organic matter (NOM) towards protons is an important parameter for understanding NOM fate in the environment. Moreover, it is necessary to determine NOM acid-base properties before investigating trace metal complexation by natural organic matter. This work focuses on the possibility of determining these acid-base properties by accurate and simple titrations, even at low organic matter concentrations. The experiments were therefore conducted on concentrated and diluted solutions of humic and fulvic acids extracted from the Laurentian River, on concentrated and diluted model solutions of well-known simple molecules (acetic and phenolic acids), and on natural samples from the Seine river (France) that were not pre-concentrated. Titration experiments were modelled by a discrete model with 6 acidic sites, except for the model solutions. The modelling software used, called PROSECE (Programme d'Optimisation et de SpEciation Chimique dans l'Environnement), was developed in our laboratory and is based on the mass balance equilibrium resolution. The results obtained on extracted organic matter and model solutions point to a threshold value for a confident determination of the acid-base properties of the studied organic matter. They also show an aberrant decrease in the carboxylic/phenolic ratio with increasing sample dilution. This shift is due neither to any conformational effect, since it is also observed on model solutions, nor to ionic strength variations, since ionic strength was controlled during all experiments. On the other hand, it could be the result of an electrode problem occurring at basic pH values, whose effect is amplified at low total concentrations of acidic sites. So, in our conditions, the limit for a correct modelling of NOM acid-base properties is defined as 0.04 meq of total analysed acidic sites concentration. As for the analysed natural samples, due to their high acidic sites content, it is possible to model their behaviour despite the low organic carbon concentration.

  6. Designing Corporate Databases to Support Technology Innovation

    ERIC Educational Resources Information Center

    Gultz, Michael Jarett

    2012-01-01

    Based on a review of the existing literature on database design, this study proposed a unified database model to support corporate technology innovation. This study assessed potential support for the model based on the opinions of 200 technology industry executives, including Chief Information Officers, Chief Knowledge Officers and Chief Learning…

  7. Architecture Knowledge for Evaluating Scalable Databases

    DTIC Science & Technology

    2015-01-16

    problems, arising from the proliferation of new data models and distributed technologies for building scalable, available data stores. Architects must...longer are relational databases the de facto standard for building data repositories. Highly distributed, scalable “NoSQL” databases [11] have emerged...This is especially challenging at the data storage layer. The multitude of competing NoSQL database technologies creates a complex and rapidly

  8. Shewregdb: Database and visualization environment for experimental and predicted regulatory information in Shewanella oneidensis mr-1

    PubMed Central

    Syed, Mustafa H; Karpinets, Tatiana V; Leuze, Michael R; Kora, Guruprasad H; Romine, Margaret R; Uberbacher, Edward C

    2009-01-01

    Shewanella oneidensis MR-1 is an important model organism for environmental research as it has exceptional metabolic and respiratory versatility regulated by a complex regulatory network. We have developed a database to collect experimental and computational data relating to regulation of gene and protein expression, and a visualization environment that enables integration of these data types. The regulatory information in the database includes predictions of DNA regulator binding sites, sigma factor binding sites, transcription units, operons, promoters, and RNA regulators including non-coding RNAs, riboswitches, and different types of terminators. Availability: http://shewanella-knowledgebase.org:8080/Shewanella/gbrowserLanding.jsp PMID:20198195

  9. Semantic mediation in the national geologic map database (US)

    USGS Publications Warehouse

    Percy, D.; Richard, S.; Soller, D.

    2008-01-01

    Controlled language is the primary challenge in merging heterogeneous databases of geologic information. Each agency or organization produces databases with different schema, and different terminology for describing the objects within. In order to make some progress toward merging these databases using current technology, we have developed software and a workflow that allows for the "manual semantic mediation" of these geologic map databases. Enthusiastic support from many state agencies (stakeholders and data stewards) has shown that the community supports this approach. Future implementations will move toward a more Artificial Intelligence-based approach, using expert-systems or knowledge-bases to process data based on the training sets we have developed manually.

  10. Topsoil organic carbon content of Europe, a new map based on a generalised additive model

    NASA Astrophysics Data System (ADS)

    de Brogniez, Delphine; Ballabio, Cristiano; Stevens, Antoine; Jones, Robert J. A.; Montanarella, Luca; van Wesemael, Bas

    2014-05-01

    There is an increasing demand for up-to-date spatially continuous organic carbon (OC) data for global environment and climatic modeling. Whilst the current map of topsoil organic carbon content for Europe (Jones et al., 2005) was produced by applying expert-knowledge based pedo-transfer rules on large soil mapping units, the aim of this study was to replace it by applying digital soil mapping techniques on the first European harmonised geo-referenced topsoil (0-20 cm) database, which arises from the LUCAS (land use/cover area frame statistical survey) survey. A generalized additive model (GAM) was calibrated on 85% of the dataset (ca. 17 000 soil samples) and a backward stepwise approach selected slope, land cover, temperature, net primary productivity, latitude and longitude as environmental covariates (500 m resolution). The validation of the model (applied on 15% of the dataset) gave an R2 of 0.27. We observed that most organic soils were under-predicted by the model and that soils of Scandinavia were also poorly predicted. The model showed an RMSE of 42 g kg-1 for mineral soils and of 287 g kg-1 for organic soils. The map of predicted OC content showed the lowest values in Mediterranean countries and in croplands across Europe, whereas the highest OC contents were predicted in wetlands, woodlands and in mountainous areas. The map of standard error of the OC model predictions showed high values in northern latitudes, wetlands, moors and heathlands, whereas low uncertainty was mostly found in croplands. A comparison of our results with the map of Jones et al. (2005) showed a general agreement on the prediction of mineral soils' OC content, most probably because the models use some common covariates, namely land cover and temperature. Our model however failed to predict values of OC content greater than 200 g kg-1, which we explain by the imposed unimodal distribution of our model, whose mean is tilted towards the majority of soils, which are mineral. Finally, average
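
    The calibration/validation procedure summarized in this abstract (fit on 85% of the samples, report R2 and RMSE on the remaining 15%) can be sketched as below. This is a hedged illustration only: the function names are hypothetical, the GAM itself is not implemented, and R2 is assumed here to be the coefficient of determination, which the abstract does not specify.

```python
import math
import random


def split_indices(n, calib_fraction=0.85, seed=0):
    """Shuffle sample indices and split them into calibration and validation sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * calib_fraction)
    return idx[:cut], idx[cut:]


def rmse(observed, predicted):
    """Root mean square error, in the same units as the observations."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

    In such a workflow the model would be fitted using only the calibration indices, and rmse and r_squared would be evaluated on predictions for the validation indices, so that the reported scores reflect samples the model has not seen.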

  11. NanoE-Tox: New and in-depth database concerning ecotoxicity of nanomaterials.

    PubMed

    Juganson, Katre; Ivask, Angela; Blinova, Irina; Mortimer, Monika; Kahru, Anne

    2015-01-01

    The increasing production and use of engineered nanomaterials (ENMs) inevitably results in their higher concentrations in the environment. This may lead to undesirable environmental effects and thus warrants risk assessment. The ecotoxicity testing of a wide variety of ENMs rapidly evolving in the market is costly but also ethically questionable when bioassays with vertebrates are conducted. Therefore, alternative methods, e.g., models for predicting toxicity mechanisms of ENMs based on their physico-chemical properties (e.g., quantitative (nano)structure-activity relationships, QSARs/QNARs), should be developed. While the development of such models relies on good-quality experimental toxicity data, most of the available data in the literature even for the same test species are highly variable. In order to map and analyse the state of the art of the existing nanoecotoxicological information suitable for QNARs, we created a database NanoE-Tox that is available as Supporting Information File 1. The database is based on existing literature on ecotoxicology of eight ENMs with different chemical composition: carbon nanotubes (CNTs), fullerenes, silver (Ag), titanium dioxide (TiO2), zinc oxide (ZnO), cerium dioxide (CeO2), copper oxide (CuO), and iron oxide (FeOx; Fe2O3, Fe3O4). Altogether, the NanoE-Tox database consolidates data from 224 articles and lists 1,518 toxicity values (EC50/LC50/NOEC) with corresponding test conditions and physico-chemical parameters of the ENMs as well as reported toxicity mechanisms and uptake of ENMs in the organisms. 35% of the data in NanoE-Tox concerns ecotoxicity of Ag NPs, followed by TiO2 (22%), CeO2 (13%), and ZnO (10%). Most of the data originates from studies with crustaceans (26%), bacteria (17%), fish (13%), and algae (11%). Based on the median toxicity values of the most sensitive organism (data derived from three or more articles) the toxicity order was as follows: Ag > ZnO > CuO > CeO2 > CNTs > TiO2 > FeOx. We

  12. Specialized microbial databases for inductive exploration of microbial genome sequences

    PubMed Central

    Fang, Gang; Ho, Christine; Qiu, Yaowu; Cubas, Virginie; Yu, Zhou; Cabau, Cédric; Cheung, Frankie; Moszer, Ivan; Danchin, Antoine

    2005-01-01

    Background: The enormous amount of genome sequence data calls for user-oriented databases to manage sequences and annotations. Queries must include search tools permitting function identification through exploration of related objects. Methods: The GenoList package for collecting and mining microbial genome databases has been rewritten using MySQL as the database management system. Functions that were not available in MySQL, such as nested subquery, have been implemented. Results: Inductive reasoning in the study of genomes starts from "islands of knowledge", centered around genes with some known background. With this concept of "neighborhood" in mind, a modified version of the GenoList structure has been used for organizing sequence data from prokaryotic genomes of particular interest in China. GenoChore, a set of 17 specialized end-user-oriented microbial databases (including one instance of Microsporidia, Encephalitozoon cuniculi, a member of Eukarya) has been made publicly available. These databases allow the user to browse genome sequence and annotation data using standard queries. In addition they provide a weekly update of searches against the world-wide protein sequence data libraries, allowing one to monitor annotation updates on genes of interest. Finally, they allow users to search for patterns in DNA or protein sequences, taking into account a clustering of genes into formal operons, as well as providing extra facilities to query sequences using predefined sequence patterns. Conclusion: This growing set of specialized microbial databases organizes data created by the first Chinese bacterial genome programs (ThermaList, Thermoanaerobacter tencongensis, LeptoList, with two different genomes of Leptospira interrogans and SepiList, Staphylococcus epidermidis) associated with related organisms for comparison. PMID:15698474

  13. KaBOB: ontology-based semantic integration of biomedical databases.

    PubMed

    Livingston, Kevin M; Bada, Michael; Baumgartner, William A; Hunter, Lawrence E

    2015-04-23

    The ability to query many independent biological databases using a common ontology-based semantic model would facilitate deeper integration and more effective utilization of these diverse and rapidly growing resources. Despite ongoing work moving toward shared data formats and linked identifiers, significant problems persist in semantic data integration in order to establish shared identity and shared meaning across heterogeneous biomedical data sources. We present five processes for semantic data integration that, when applied collectively, solve seven key problems. These processes include making explicit the differences between biomedical concepts and database records, aggregating sets of identifiers denoting the same biomedical concepts across data sources, and using declaratively represented forward-chaining rules to take information that is variably represented in source databases and integrating it into a consistent biomedical representation. We demonstrate these processes and solutions by presenting KaBOB (the Knowledge Base Of Biomedicine), a knowledge base of semantically integrated data from 18 prominent biomedical databases using common representations grounded in Open Biomedical Ontologies. An instance of KaBOB with data about humans and seven major model organisms can be built using on the order of 500 million RDF triples. All source code for building KaBOB is available under an open-source license. KaBOB is an integrated knowledge base of biomedical data representationally based in prominent, actively maintained Open Biomedical Ontologies, thus enabling queries of the underlying data in terms of biomedical concepts (e.g., genes and gene products, interactions and processes) rather than features of source-specific data schemas or file formats. KaBOB resolves many of the issues that routinely plague biomedical researchers intending to work with data from multiple data sources and provides a platform for ongoing data integration and development and for

  14. NHEXAS PHASE I ARIZONA STUDY--STANDARD OPERATING PROCEDURE FOR DATABASE TREE AND DATA SOURCES (UA-D-41.0)

    EPA Science Inventory

    The purpose of this SOP is to describe the database storage organization, as well as describe the sources of data for each database used during the Arizona NHEXAS project and the "Border" study. Keywords: data; database; organization.

    The National Human Exposure Assessment Sur...

  15. Hmrbase: a database of hormones and their receptors

    PubMed Central

    Rashid, Mamoon; Singla, Deepak; Sharma, Arun; Kumar, Manish; Raghava, Gajendra PS

    2009-01-01

    Background Hormones are signaling molecules that play vital roles in various life processes, like growth and differentiation, physiology, and reproduction. These molecules are mostly secreted by endocrine glands, and transported to target organs through the bloodstream. Deficient, or excessive, levels of hormones are associated with several diseases such as cancer, osteoporosis and diabetes. Thus, it is important to collect and compile information about hormones and their receptors. Description This manuscript describes a database called Hmrbase which has been developed for managing information about hormones and their receptors. It is a highly curated database for which information has been collected from the literature and the public databases. The current version of Hmrbase contains comprehensive information about ~2000 hormones, e.g., about their function, source organism, receptors, mature sequences and structures. Hmrbase also contains information about ~3000 hormone receptors, in terms of amino acid sequences, subcellular localizations, ligands, and post-translational modifications. One of the major features of this database is that it provides data about ~4100 hormone-receptor pairs. A number of online tools have been integrated into the database to provide facilities such as keyword search, structure-based search, mapping of given peptide(s) onto the hormone/receptor sequence, and sequence similarity search. This database also provides a number of external links to other resources/databases to help retrieve further related information. Conclusion Owing to the high impact of endocrine research in the biomedical sciences, the Hmrbase could become a leading data portal for researchers. The salient features of Hmrbase are hormone-receptor pair-related information, mapping of peptide stretches on the protein sequences of hormones and receptors, Pfam domain annotations, categorical browsing options, online data submission, Drug

  16. Screening-level models to estimate partition ratios of organic chemicals between polymeric materials, air and water.

    PubMed

    Reppas-Chrysovitsinos, Efstathios; Sobek, Anna; MacLeod, Matthew

    2016-06-15

    Polymeric materials flowing through the technosphere are repositories of organic chemicals throughout their life cycle. Equilibrium partition ratios of organic chemicals between these materials and air (KMA) or water (KMW) are required for models of fate and transport, high-throughput exposure assessment and passive sampling. KMA and KMW have been measured for a growing number of chemical/material combinations, but significant data gaps still exist. We assembled a database of 363 KMA and 910 KMW measurements for 446 individual compounds and nearly 40 individual polymers and biopolymers, collected from 29 studies. We used the EPI Suite and ABSOLV software packages to estimate physicochemical properties of the compounds and we employed an empirical correlation based on Trouton's rule to adjust the measured KMA and KMW values to a standard reference temperature of 298 K. Then, we used a thermodynamic triangle with Henry's law constant to calculate a complete set of 1273 KMA and KMW values. Using simple linear regression, we developed a suite of single parameter linear free energy relationship (spLFER) models to estimate KMA from the EPI Suite-estimated octanol-air partition ratio (KOA) and KMW from the EPI Suite-estimated octanol-water (KOW) partition ratio. Similarly, using multiple linear regression, we developed a set of polyparameter linear free energy relationship (ppLFER) models to estimate KMA and KMW from ABSOLV-estimated Abraham solvation parameters. We explored the two LFER approaches to investigate (1) their performance in estimating partition ratios, and (2) uncertainties associated with treating all different polymers as a single "bulk" polymeric material compartment. The models we have developed are suitable for screening assessments of the tendency for organic chemicals to be emitted from materials, and for use in multimedia models of the fate of organic chemicals in the indoor environment. In screening applications we recommend that KMA and KMW be
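    The "thermodynamic triangle" mentioned in this abstract can be illustrated with a minimal sketch. It assumes that KMA, KMW and the air-water partition ratio KAW are all dimensionless, volume-based ratios at the same temperature, so that KMW = KMA x KAW, and that KAW is obtained from a Henry's law constant H given in Pa m3/mol as KAW = H/(RT). The function names are hypothetical, and the Trouton's-rule temperature adjustment used by the authors is not reproduced here.

```python
import math

GAS_CONSTANT = 8.314  # J/(mol K), equivalently Pa m3/(mol K)


def log_kaw_from_henry(h_pa_m3_per_mol, temperature_k=298.0):
    """Dimensionless air-water partition ratio (log10) from a Henry's law constant.

    Assumes H is given in Pa m3/mol, so that KAW = H / (R * T).
    """
    return math.log10(h_pa_m3_per_mol / (GAS_CONSTANT * temperature_k))


def log_kmw_from_kma(log_kma, log_kaw):
    """Close the triangle: log KMW = log KMA + log KAW."""
    return log_kma + log_kaw


def log_kma_from_kmw(log_kmw, log_kaw):
    """Close the triangle the other way: log KMA = log KMW - log KAW."""
    return log_kmw - log_kaw
```

    With such helpers, a compound that has only a measured KMA (or only a measured KMW) can be given the missing value, which is one way a complete set covering both kinds of measurement could be assembled.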

  17. ALDB: a domestic-animal long noncoding RNA database.

    PubMed

    Li, Aimin; Zhang, Junying; Zhou, Zhongyin; Wang, Lei; Liu, Yujuan; Liu, Yajun

    2015-01-01

    Long noncoding RNAs (lncRNAs) have attracted significant attention in recent years due to their important roles in many biological processes. Domestic animals constitute a unique resource for understanding the genetic basis of phenotypic variation and are ideal models relevant to diverse areas of biomedical research. With improving sequencing technologies, numerous domestic-animal lncRNAs are now available. Thus, there is an immediate need for a database resource that can assist researchers to store, organize, analyze and visualize domestic-animal lncRNAs. The domestic-animal lncRNA database, named ALDB, is the first comprehensive database with a focus on the domestic-animal lncRNAs. It currently archives 12,103 pig intergenic lncRNAs (lincRNAs), 8,923 chicken lincRNAs and 8,250 cow lincRNAs. In addition to the annotations of lincRNAs, it offers related data that is not available yet in existing lncRNA databases (lncRNAdb and NONCODE), such as genome-wide expression profiles and animal quantitative trait loci (QTLs) of domestic animals. Moreover, a collection of interfaces and applications, such as the Basic Local Alignment Search Tool (BLAST), the Generic Genome Browser (GBrowse) and flexible search functionalities, are available to help users effectively explore, analyze and download data related to domestic-animal lncRNAs. ALDB enables the exploration and comparative analysis of lncRNAs in domestic animals. A user-friendly web interface, integrated information and tools make it valuable to researchers in their studies. ALDB is freely available from http://res.xaut.edu.cn/aldb/index.jsp.

  18. Query Monitoring and Analysis for Database Privacy - A Security Automata Model Approach

    PubMed Central

    Kumar, Anand; Ligatti, Jay; Tu, Yi-Cheng

    2015-01-01

    Privacy and usage restriction issues are important when valuable data are exchanged or acquired by different organizations. Standard access control mechanisms either restrict or completely grant access to valuable data. On the other hand, data obfuscation limits the overall usability and may result in loss of total value. There are no standard policy enforcement mechanisms for data acquired through mutual and copyright agreements. In practice, many different types of policies can be enforced in protecting data privacy. Hence there is the need for a unified framework that encapsulates multiple suites of policies to protect the data. We present our vision of an architecture named security automata model (SAM) to enforce privacy-preserving policies and usage restrictions. SAM analyzes the input queries and their outputs to enforce various policies, liberating data owners from the burden of monitoring data access. SAM allows administrators to specify various policies and enforces them to monitor queries and control the data access. Our goal is to address the problems of data usage control and protection through privacy policies that can be defined, enforced, and integrated with the existing access control mechanisms using SAM. In this paper, we lay out the theoretical foundation of SAM, which is based on an automaton named Mandatory Result Automata. We also discuss the major challenges of implementing SAM in a real-world database environment as well as ideas to meet such challenges. PMID:26997936
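    The idea of a monitor that inspects both incoming queries and outgoing results can be illustrated with a toy sketch. This is not the SAM architecture or the Mandatory Result Automata formalism from the paper; the class name, the two example policies (a blocked-column list and a cumulative row budget) and the method names are all hypothetical, chosen only to show a stateful monitor sitting on both sides of a query.

```python
class RowBudgetMonitor:
    """Toy stateful monitor (hypothetical): checks queries and edits results."""

    def __init__(self, max_rows, blocked_columns):
        self.remaining = max_rows            # state carried across queries
        self.blocked = set(blocked_columns)  # columns no query may name

    def on_query(self, columns):
        """Input side: allow the query only if it names no blocked column."""
        return not (self.blocked & set(columns))

    def on_result(self, rows):
        """Output side: release at most the remaining row budget."""
        released = rows[: self.remaining]
        self.remaining -= len(released)
        return released
```

    For example, a monitor created as RowBudgetMonitor(3, ["ssn"]) would refuse a query that names the "ssn" column and, across successive results, release no more than three rows in total.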

  19. Illuminating the Depths of the MagIC (Magnetics Information Consortium) Database

    NASA Astrophysics Data System (ADS)

    Koppers, A. A. P.; Minnett, R.; Jarboe, N.; Jonestrask, L.; Tauxe, L.; Constable, C.

    2015-12-01

    The Magnetics Information Consortium (http://earthref.org/MagIC/) is a grass-roots cyberinfrastructure effort envisioned by the paleo-, geo-, and rock magnetic scientific community. Its mission is to archive their wealth of peer-reviewed raw data and interpretations from magnetics studies on natural and synthetic samples. Many of these valuable data are legacy datasets that were never published in their entirety, some resided in other databases that are no longer maintained, and others were never digitized from the field notebooks and lab work. Due to the volume of data collected, most studies, modern and legacy, only publish the interpreted results and, occasionally, a subset of the raw data. MagIC is making an extraordinary effort to archive these data in a single data model, including the raw instrument measurements if possible. This facilitates the reproducibility of the interpretations, the re-interpretation of the raw data as the community introduces new techniques, and the compilation of heterogeneous datasets that are otherwise distributed across multiple formats and physical locations. MagIC has developed tools to assist the scientific community in many stages of their workflow. Contributors easily share studies (in a private mode if so desired) in the MagIC Database with colleagues and reviewers prior to publication, publish the data online after the study is peer reviewed, and visualize their data in the context of the rest of the contributions to the MagIC Database. From organizing their data in the MagIC Data Model with an online editable spreadsheet, to validating the integrity of the dataset with automated plots and statistics, MagIC is continually lowering the barriers to transforming dark data into transparent and reproducible datasets. Additionally, this web application generalizes to other databases in MagIC's umbrella website (EarthRef.org) so that the Geochemical Earth Reference Model (http://earthref.org/GERM/) portal, Seamount Biogeosciences

  20. Who's Gonna Pay the Piper for Free Online Databases?

    ERIC Educational Resources Information Center

    Jacso, Peter

    1996-01-01

    Discusses new pricing models for some online services and considers the possibilities for the traditional online database market. Topics include multimedia music databases, including copyright implications; other retail-oriented databases; and paying for free databases with advertising. (LRW)

  1. Molecule database framework: a framework for creating database applications with chemical structure search capability

    PubMed Central

    2013-01-01

    Background Research in organic chemistry generates samples of novel chemicals together with their properties and other related data. The involved scientists must be able to store this data and search it by chemical structure. There are commercial solutions for common needs like chemical registration systems or electronic lab notebooks. However for specific requirements of in-house databases and processes no such solutions exist. Another issue is that commercial solutions have the risk of vendor lock-in and may require an expensive license of a proprietary relational database management system. To speed up and simplify the development for applications that require chemical structure search capabilities, I have developed Molecule Database Framework. The framework abstracts the storing and searching of chemical structures into method calls. Therefore software developers do not require extensive knowledge about chemistry and the underlying database cartridge. This decreases application development time. Results Molecule Database Framework is written in Java and I created it by integrating existing free and open-source tools and frameworks. The core functionality includes: • Support for multi-component compounds (mixtures) • Import and export of SD-files • Optional security (authorization) For chemical structure searching Molecule Database Framework leverages the capabilities of the Bingo Cartridge for PostgreSQL and provides type-safe searching, caching, transactions and optional method level security. Molecule Database Framework supports multi-component chemical compounds (mixtures). Furthermore the design of entity classes and the reasoning behind it are explained. By means of a simple web application I describe how the framework could be used. I then benchmarked this example application to create some basic performance expectations for chemical structure searches and import and export of SD-files. Conclusions By using a simple web application it was

  2. Molecule database framework: a framework for creating database applications with chemical structure search capability.

    PubMed

    Kiener, Joos

    2013-12-11

    Research in organic chemistry generates samples of novel chemicals together with their properties and other related data. The involved scientists must be able to store this data and search it by chemical structure. There are commercial solutions for common needs like chemical registration systems or electronic lab notebooks. However for specific requirements of in-house databases and processes no such solutions exist. Another issue is that commercial solutions have the risk of vendor lock-in and may require an expensive license of a proprietary relational database management system. To speed up and simplify the development for applications that require chemical structure search capabilities, I have developed Molecule Database Framework. The framework abstracts the storing and searching of chemical structures into method calls. Therefore software developers do not require extensive knowledge about chemistry and the underlying database cartridge. This decreases application development time. Molecule Database Framework is written in Java and I created it by integrating existing free and open-source tools and frameworks. The core functionality includes: • Support for multi-component compounds (mixtures) • Import and export of SD-files • Optional security (authorization) For chemical structure searching Molecule Database Framework leverages the capabilities of the Bingo Cartridge for PostgreSQL and provides type-safe searching, caching, transactions and optional method level security. Molecule Database Framework supports multi-component chemical compounds (mixtures). Furthermore the design of entity classes and the reasoning behind it are explained. By means of a simple web application I describe how the framework could be used. I then benchmarked this example application to create some basic performance expectations for chemical structure searches and import and export of SD-files. By using a simple web application it was shown that Molecule Database Framework

  3. ODIN. Online Database Information Network: ODIN Policy & Procedure Manual.

    ERIC Educational Resources Information Center

    Townley, Charles T.; And Others

    Policies and procedures are outlined for the Online Database Information Network (ODIN), a cooperative of libraries in south-central Pennsylvania, which was organized to improve library services through technology. The first section covers organization and goals, members, and responsibilities of the administrative council and libraries. Patrons…

  4. MatProps: Material Properties Database and Associated Access Library

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Durrenberger, J K; Becker, R C; Goto, D M

    2007-08-13

    Coefficients for analytic constitutive and equation of state models (EOS), which are used by many hydro codes at LLNL, are currently stored in a legacy material database (Steinberg, UCRL-MA-106349). Parameters for numerous materials are available through this database, and include Steinberg-Guinan and Steinberg-Lund constitutive models for metals, JWL equations of state for high explosives, and Mie-Gruneisen equations of state for metals. These constitutive models are used in most of the simulations done by ASC codes today at Livermore. Analytic EOSs are also still used, but have been superseded in many cases by tabular representations in LEOS (http://leos.llnl.gov). Numerous advanced constitutive models have been developed and implemented into ASC codes over the past 20 years. These newer models have more physics and better representations of material strength properties than their predecessors, and therefore more model coefficients. However, a material database of these coefficients is not readily available. Therefore incorporating these coefficients with those of the legacy models into a portable database that could be shared amongst codes would be most welcome. The goal of this paper is to describe the MatProp effort at LLNL to create such a database and associated access library that could be used by codes throughout the DOE complex and beyond. We have written an initial version of the MatProp database and access library and our DOE/ASC code ALE3D (Nichols et al., UCRL-MA-152204) is able to import information from the database. The database, a link to which exists on the Sourceforge server at LLNL, contains coefficients for many materials and models (see Appendix), and includes material parameters in the following categories--flow stress, shear modulus, strength, damage, and equation of state. Future versions of the Matprop database and access library will include the ability to read and write material descriptions that can be exchanged between codes

  5. Disbiome database: linking the microbiome to disease.

    PubMed

    Janssens, Yorick; Nielandt, Joachim; Bronselaer, Antoon; Debunne, Nathan; Verbeke, Frederick; Wynendaele, Evelien; Van Immerseel, Filip; Vandewynckel, Yves-Paul; De Tré, Guy; De Spiegeleer, Bart

    2018-06-04

    Recent research has provided fascinating indications and evidence that the host health is linked to its microbial inhabitants. Due to the development of high-throughput sequencing technologies, more and more data covering microbial composition changes in different disease types are emerging. However, this information is dispersed over a wide variety of medical and biomedical disciplines. Disbiome is a database which collects and presents published microbiota-disease information in a standardized way. The diseases are classified using the MedDRA classification system and the micro-organisms are linked to their NCBI and SILVA taxonomy. Finally, each study included in the Disbiome database is assessed for its reporting quality using a standardized questionnaire. Disbiome is the first database giving a clear, concise and up-to-date overview of microbial composition differences in diseases, together with the relevant information of the studies published. The strength of this database lies within the combination of the presence of references to other databases, which enables both specific and diverse search strategies within the Disbiome database, and the human annotation which ensures a simple and structured presentation of the available data.

  6. Quality control of EUVE databases

    NASA Technical Reports Server (NTRS)

    John, L. M.; Drake, J.

    1992-01-01

    The publicly accessible databases for the Extreme Ultraviolet Explorer include: the EUVE Archive mailserver; the CEA ftp site; the EUVE Guest Observer Mailserver; and the Astronomical Data System node. The EUVE Performance Assurance team is responsible for verifying that these public EUVE databases are working properly, and that the public availability of EUVE data contained therein does not infringe any data rights which may have been assigned. In this poster, we describe the Quality Assurance (QA) procedures we have developed from the approach of QA as a service organization, thus reflecting the overall EUVE philosophy of Quality Assurance integrated into normal operating procedures, rather than imposed as an external, post facto, control mechanism.

  7. Modeling the influence of organic acids on soil weathering

    NASA Astrophysics Data System (ADS)

    Lawrence, Corey; Harden, Jennifer; Maher, Kate

    2014-08-01

    Biological inputs and organic matter cycling have long been regarded as important factors in the physical and chemical development of soils. In particular, the extent to which low molecular weight organic acids, such as oxalate, influence geochemical reactions has been widely studied. Although the effects of organic acids are diverse, there is strong evidence that organic acids accelerate the dissolution of some minerals. However, the influence of organic acids at the field-scale and over the timescales of soil development has not been evaluated in detail. In this study, a reactive-transport model of soil chemical weathering and pedogenic development was used to quantify the extent to which organic acid cycling controls mineral dissolution rates and long-term patterns of chemical weathering. Specifically, oxalic acid was added to simulations of soil development to investigate a well-studied chronosequence of soils near Santa Cruz, CA. The model formulation includes organic acid input, transport, decomposition, organic-metal aqueous complexation and mineral surface complexation in various combinations. Results suggest that although organic acid reactions accelerate mineral dissolution rates near the soil surface, the net response is an overall decrease in chemical weathering. Model results demonstrate the importance of organic acid input concentrations, fluid flow, decomposition and secondary mineral precipitation rates on the evolution of mineral weathering fronts. In particular, model soil profile evolution is sensitive to kaolinite precipitation and oxalate decomposition rates. The soil profile-scale modeling presented here provides insights into the influence of organic carbon cycling on soil weathering and pedogenesis and supports the need for further field-scale measurements of the flux and speciation of reactive organic compounds.

  8. Modeling the influence of organic acids on soil weathering

    USGS Publications Warehouse

    Lawrence, Corey R.; Harden, Jennifer W.; Maher, Kate

    2014-01-01

    Biological inputs and organic matter cycling have long been regarded as important factors in the physical and chemical development of soils. In particular, the extent to which low molecular weight organic acids, such as oxalate, influence geochemical reactions has been widely studied. Although the effects of organic acids are diverse, there is strong evidence that organic acids accelerate the dissolution of some minerals. However, the influence of organic acids at the field-scale and over the timescales of soil development has not been evaluated in detail. In this study, a reactive-transport model of soil chemical weathering and pedogenic development was used to quantify the extent to which organic acid cycling controls mineral dissolution rates and long-term patterns of chemical weathering. Specifically, oxalic acid was added to simulations of soil development to investigate a well-studied chronosequence of soils near Santa Cruz, CA. The model formulation includes organic acid input, transport, decomposition, organic-metal aqueous complexation and mineral surface complexation in various combinations. Results suggest that although organic acid reactions accelerate mineral dissolution rates near the soil surface, the net response is an overall decrease in chemical weathering. Model results demonstrate the importance of organic acid input concentrations, fluid flow, decomposition and secondary mineral precipitation rates on the evolution of mineral weathering fronts. In particular, model soil profile evolution is sensitive to kaolinite precipitation and oxalate decomposition rates. The soil profile-scale modeling presented here provides insights into the influence of organic carbon cycling on soil weathering and pedogenesis and supports the need for further field-scale measurements of the flux and speciation of reactive organic compounds.

  9. Putting "Organizations" into an Organization Theory Course: A Hybrid CAO Model for Teaching Organization Theory

    ERIC Educational Resources Information Center

    Hannah, David R.; Venkatachary, Ranga

    2010-01-01

    In this article, the authors present a retrospective analysis of an instructor's multiyear redesign of a course on organization theory into what is called a hybrid Classroom-as-Organization model. It is suggested that this new course design served to apprentice students to function in quasi-real organizational structures. The authors further argue…

  10. Healing models for organizations: description, measurement, and outcomes.

    PubMed

    Malloch, K

    2000-01-01

    Healthcare leaders are continually searching for ways to improve their ability to provide optimal healthcare services, be financially viable, and retain quality caregivers, often feeling like such goals are impossible to achieve in today's intensely competitive environment. Many healthcare leaders intuitively recognize the need for more humanistic models and the probable connection with positive patient outcomes and financial success but are hesitant to make significant changes in their organizations because of the lack of model descriptions or documented recognition of the clinical and financial advantages of humanistic models. This article describes a study that was developed in response to the increasing work in humanistic or healing environment models and the need for validation of the advantages of such models. The healthy organization model, a framework for healthcare organizations that incorporates humanistic healing values within the traditional structure, is presented as a result of the study. This model addresses the importance of optimal clinical services, financial performance, and staff satisfaction. The five research-based organizational components that form the framework are described, and key indicators of organizational effectiveness over a five-year period are presented. The resulting empirical data are strongly supportive of the healing model and reflect positive outcomes for the organization.

  11. Application of the intelligent techniques in transplantation databases: a review of articles published in 2009 and 2010.

    PubMed

    Sousa, F S; Hummel, A D; Maciel, R F; Cohrs, F M; Falcão, A E J; Teixeira, F; Baptista, R; Mancini, F; da Costa, T M; Alves, D; Pisa, I T

    2011-05-01

    The replacement of defective organs with healthy ones is an old problem, but only a few years ago was this issue put into practice. Improvements in the whole transplantation process have been increasingly important in clinical practice. In this context are clinical decision support systems (CDSSs), which have reflected a significant amount of work to use mathematical and intelligent techniques. The aim of this article was to present consideration of intelligent techniques used in recent years (2009 and 2010) to analyze organ transplant databases. To this end, we performed a search of the PubMed and Institute for Scientific Information (ISI) Web of Knowledge databases to find articles published in 2009 and 2010 about intelligent techniques applied to transplantation databases. Among 69 retrieved articles, we chose according to inclusion and exclusion criteria. The main techniques were: Artificial Neural Networks (ANN), Logistic Regression (LR), Decision Trees (DT), Markov Models (MM), and Bayesian Networks (BN). Most articles used ANN. Some publications described comparisons between techniques or the use of various techniques together. The use of intelligent techniques to extract knowledge from databases of healthcare is increasingly common. Although authors preferred to use ANN, statistical techniques were equally effective for this enterprise. Copyright © 2011 Elsevier Inc. All rights reserved.

  12. Candidate gene database and transcript map for peach, a model species for fruit trees.

    PubMed

    Horn, Renate; Lecouls, Anne-Claire; Callahan, Ann; Dandekar, Abhaya; Garay, Lilibeth; McCord, Per; Howad, Werner; Chan, Helen; Verde, Ignazio; Main, Doreen; Jung, Sook; Georgi, Laura; Forrest, Sam; Mook, Jennifer; Zhebentyayeva, Tatyana; Yu, Yeisoo; Kim, Hye Ran; Jesudurai, Christopher; Sosinski, Bryon; Arús, Pere; Baird, Vance; Parfitt, Dan; Reighard, Gregory; Scorza, Ralph; Tomkins, Jeffrey; Wing, Rod; Abbott, Albert Glenn

    2005-05-01

    Peach (Prunus persica) is a model species for the Rosaceae, which includes a number of economically important fruit tree species. To develop an extensive Prunus expressed sequence tag (EST) database for identifying and cloning the genes important to fruit and tree development, we generated 9,984 high-quality ESTs from a peach cDNA library of developing fruit mesocarp. After assembly and annotation, a putative peach unigene set consisting of 3,842 ESTs was defined. Gene ontology (GO) classification was assigned based on the annotation of the single "best hit" match against the Swiss-Prot database. No significant homology could be found in the GenBank nr databases for 24.3% of the sequences. Using core markers from the general Prunus genetic map, we anchored bacterial artificial chromosome (BAC) clones on the genetic map, thereby providing a framework for the construction of a physical and transcript map. A transcript map was developed by hybridizing 1,236 ESTs from the putative peach unigene set and an additional 68 peach cDNA clones against the peach BAC library. Hybridizing ESTs to genetically anchored BACs immediately localized 11.2% of the ESTs on the genetic map. ESTs showed a clustering of expressed genes in defined regions of the linkage groups. [The data were built into a regularly updated Genome Database for Rosaceae (GDR), available at (http://www.genome.clemson.edu/gdr/).].

  13. Computational assessment of model-based wave separation using a database of virtual subjects.

    PubMed

    Hametner, Bernhard; Schneider, Magdalena; Parragh, Stephanie; Wassertheurer, Siegfried

    2017-11-07

    The quantification of arterial wave reflection is an important area of interest in arterial pulse wave analysis. It can be achieved by wave separation analysis (WSA) if both the aortic pressure waveform and the aortic flow waveform are known. For better applicability, several mathematical models have been established to estimate aortic flow solely based on pressure waveforms. The aim of this study is to investigate and verify the model-based wave separation of the ARCSolver method on virtual pulse wave measurements. The study is based on an open access virtual database generated via simulations. Seven cardiac and arterial parameters were varied within physiological healthy ranges, leading to a total of 3325 virtual healthy subjects. For assessing the model-based ARCSolver method computationally, this method was used to perform WSA based on the aortic root pressure waveforms of the virtual patients. As a reference, the values of WSA using both the pressure and flow waveforms provided by the virtual database were taken. The investigated parameters showed a good overall agreement between the model-based method and the reference. Mean differences and standard deviations were -0.05±0.02AU for characteristic impedance, -3.93±1.79mmHg for forward pressure amplitude, 1.37±1.56mmHg for backward pressure amplitude and 12.42±4.88% for reflection magnitude. The results indicate that the mathematical blood flow model of the ARCSolver method is a feasible surrogate for a measured flow waveform and provides a reasonable way to assess arterial wave reflection non-invasively in healthy subjects. Copyright © 2017 Elsevier Ltd. All rights reserved.
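
    As a rough illustration of what wave separation analysis computes, the sketch below applies the textbook decomposition Pf = (P + Zc·Q)/2 and Pb = (P − Zc·Q)/2 and takes reflection magnitude as the ratio of backward to forward amplitude. It is not the ARCSolver method: the flow waveform is assumed to be given rather than estimated from pressure, and the function names, the sample values and the characteristic impedance `zc` are hypothetical.

```python
def separate_waves(pressure, flow, zc):
    """Split a pressure waveform into forward and backward components.

    pressure, flow: equally sampled sequences of the same length
    zc: characteristic impedance (assumed known for this sketch)
    """
    forward = [(p + zc * q) / 2 for p, q in zip(pressure, flow)]
    backward = [(p - zc * q) / 2 for p, q in zip(pressure, flow)]
    return forward, backward


def amplitude(wave):
    """Peak-to-trough amplitude of a sampled waveform."""
    return max(wave) - min(wave)


def reflection_magnitude(forward, backward):
    """Ratio of backward to forward pressure amplitude."""
    return amplitude(backward) / amplitude(forward)


# Hypothetical toy samples, not taken from the virtual database.
pressure = [0.0, 10.0, 20.0, 10.0, 0.0]
flow = [0.0, 100.0, 50.0, 0.0, 0.0]
forward, backward = separate_waves(pressure, flow, zc=0.1)
rm = reflection_magnitude(forward, backward)
```

    By construction the forward and backward components sum back to the original pressure at every sample, which is a convenient sanity check on any implementation.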

  14. Advanced transportation system studies. Alternate propulsion subsystem concepts: Propulsion database

    NASA Technical Reports Server (NTRS)

    Levack, Daniel

    1993-01-01

    The Advanced Transportation System Studies alternate propulsion subsystem concepts propulsion database interim report is presented. The objective of the database development task is to produce a propulsion database which is easy to use and modify while also being comprehensive in the level of detail available. The database is to be available on the Macintosh computer system. The task is to extend across all three years of the contract. Consequently, a significant fraction of the effort in this first year of the task was devoted to the development of the database structure to ensure a robust base for the following years' efforts. Nonetheless, significant point design propulsion system descriptions and parametric models were also produced. Each of the two propulsion databases, parametric propulsion database and propulsion system database, are described. The descriptions include a user's guide to each code, write-ups for models used, and sample output. The parametric database has models for LOX/H2 and LOX/RP liquid engines, solid rocket boosters using three different propellants, a hybrid rocket booster, and a NERVA derived nuclear thermal rocket engine.

  15. International energy: Research organizations, 1988--1992. Revision 1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hendricks, P.; Jordan, S.

    This publication contains the standardized names of energy research organizations used in energy information databases. Involved in this cooperative task are (1) the technical staff of the US DOE Office of Scientific and Technical Information (OSTI) in cooperation with the member countries of the Energy Technology Data Exchange (ETDE) and (2) the International Nuclear Information System (INIS). ETDE member countries are also members of the International Nuclear Information System (INIS). Nuclear organization names recorded for INIS by these ETDE member countries are also included in the ETDE Energy Database. Therefore, these organization names are cooperatively standardized for use in both information systems. This publication identifies current organizations doing research in all energy fields, standardizes the format for recording these organization names in bibliographic citations, assigns a numeric code to facilitate data entry, and identifies report number prefixes assigned by these organizations. These research organization names may be used in searching the databases "Energy Science & Technology" on DIALOG and "Energy" on STN International. These organization names are also used in USDOE databases on the Integrated Technical Information System. Research organizations active in the past five years, as indicated by database records, were identified to form this publication. This directory includes approximately 31,000 organizations that reported energy-related literature from 1988 to 1992 and updates the DOE Energy Data Base: Corporate Author Entries.

  16. Follicle Online: an integrated database of follicle assembly, development and ovulation

    PubMed Central

    Hua, Juan; Xu, Bo; Yang, Yifan; Ban, Rongjun; Iqbal, Furhan; Zhang, Yuanwei; Shi, Qinghua

    2015-01-01

    Folliculogenesis is an important part of ovarian function as it provides the oocytes for female reproductive life. Characterizing genes/proteins involved in folliculogenesis is fundamental for understanding the mechanisms associated with this biological function and to cure the diseases associated with folliculogenesis. A large number of genes/proteins associated with folliculogenesis have been identified from different species. However, no dedicated public resource is currently available for folliculogenesis-related genes/proteins that are validated by experiments. Here, we are reporting a database ‘Follicle Online’ that provides the experimentally validated gene/protein map of the folliculogenesis in a number of species. Follicle Online is a web-based database system for storing and retrieving folliculogenesis-related experimental data. It provides detailed information for 580 genes/proteins (from 23 model organisms, including Homo sapiens, Mus musculus, Rattus norvegicus, Mesocricetus auratus, Bos taurus, Drosophila and Xenopus laevis) that have been reported to be involved in folliculogenesis, POF (premature ovarian failure) and PCOS (polycystic ovary syndrome). The literature was manually curated from more than 43 000 published articles (till 1 March 2014). The Follicle Online database is implemented in PHP + MySQL + JavaScript and this user-friendly web application provides access to the stored data. In summary, we have developed a centralized database that provides users with comprehensive information about genes/proteins involved in folliculogenesis. This database can be accessed freely and all the stored data can be viewed without any registration. Database URL: http://mcg.ustc.edu.cn/sdap1/follicle/index.php PMID:25931457

  17. Follicle Online: an integrated database of follicle assembly, development and ovulation.

    PubMed

    Hua, Juan; Xu, Bo; Yang, Yifan; Ban, Rongjun; Iqbal, Furhan; Cooke, Howard J; Zhang, Yuanwei; Shi, Qinghua

    2015-01-01

    Folliculogenesis is an important part of ovarian function as it provides the oocytes for female reproductive life. Characterizing genes/proteins involved in folliculogenesis is fundamental for understanding the mechanisms associated with this biological function and to cure the diseases associated with folliculogenesis. A large number of genes/proteins associated with folliculogenesis have been identified from different species. However, no dedicated public resource is currently available for folliculogenesis-related genes/proteins that are validated by experiments. Here, we are reporting a database 'Follicle Online' that provides the experimentally validated gene/protein map of the folliculogenesis in a number of species. Follicle Online is a web-based database system for storing and retrieving folliculogenesis-related experimental data. It provides detailed information for 580 genes/proteins (from 23 model organisms, including Homo sapiens, Mus musculus, Rattus norvegicus, Mesocricetus auratus, Bos taurus, Drosophila and Xenopus laevis) that have been reported to be involved in folliculogenesis, POF (premature ovarian failure) and PCOS (polycystic ovary syndrome). The literature was manually curated from more than 43,000 published articles (till 1 March 2014). The Follicle Online database is implemented in PHP + MySQL + JavaScript and this user-friendly web application provides access to the stored data. In summary, we have developed a centralized database that provides users with comprehensive information about genes/proteins involved in folliculogenesis. This database can be accessed freely and all the stored data can be viewed without any registration. Database URL: http://mcg.ustc.edu.cn/sdap1/follicle/index.php © The Author(s) 2015. Published by Oxford University Press.

  18. The Collaboration Model: The Effective Model for the Increasing Interdependence of Organizations.

    ERIC Educational Resources Information Center

    Doan, Sheila R.

    Scarce resources have facilitated increasing interdependence among organizations. This paper describes the group dynamics of the cooperation and collaboration models and examines which one is most suitable for maintaining effective group involvement. The cooperation model is comprised of two organizations that reach a mutual agreement; however,…

  19. Evaluating Organic Aerosol Model Performance: Impact of two Embedded Assumptions

    NASA Astrophysics Data System (ADS)

    Jiang, W.; Giroux, E.; Roth, H.; Yin, D.

    2004-05-01

    Organic aerosols are important due to their abundance in the polluted lower atmosphere and their impact on human health and vegetation. However, modeling organic aerosols is a very challenging task because of the complexity of aerosol composition, structure, and formation processes. Assumptions and their associated uncertainties in both models and measurement data make model performance evaluation a truly demanding job. Although some assumptions are obvious, others are hidden and embedded, and can significantly impact modeling results, possibly even changing conclusions about model performance. This paper focuses on analyzing the impact of two embedded assumptions on evaluation of organic aerosol model performance. One assumption is about the enthalpy of vaporization widely used in various secondary organic aerosol (SOA) algorithms. The other is about the conversion factor used to obtain ambient organic aerosol concentrations from measured organic carbon. These two assumptions reflect uncertainties in the model and in the ambient measurement data, respectively. For illustration purposes, various choices of the assumed values are implemented in the evaluation process for an air quality model based on CMAQ (the Community Multiscale Air Quality Model). Model simulations are conducted for the Lower Fraser Valley covering Southwest British Columbia, Canada, and Northwest Washington, United States, for a historical pollution episode in 1993. To understand the impact of the assumed enthalpy of vaporization on modeling results, its impact on instantaneous organic aerosol yields (IAY) through partitioning coefficients is analysed first. The analysis shows that utilizing different enthalpy of vaporization values causes changes in the shapes of IAY curves and in the response of SOA formation capability of reactive organic gases to temperature variations. These changes are then carried into the air quality model and cause substantial changes in the organic aerosol modeling

  20. GigaTON: an extensive publicly searchable database providing a new reference transcriptome in the pacific oyster Crassostrea gigas.

    PubMed

    Riviere, Guillaume; Klopp, Christophe; Ibouniyamine, Nabihoudine; Huvet, Arnaud; Boudry, Pierre; Favrel, Pascal

    2015-12-02

    The Pacific oyster, Crassostrea gigas, is one of the most important aquaculture shellfish resources worldwide. Important efforts have been undertaken towards a better knowledge of its genome and transcriptome, which is now making C. gigas a model organism among lophotrochozoans, the under-described sister clade of ecdysozoans within protostomes. These massive sequencing efforts offer the opportunity to assemble gene expression data and make such a resource accessible and exploitable for the scientific community. Therefore, we undertook this assembly into an up-to-date publicly available transcriptome database: the GigaTON (Gigas TranscriptOme pipeliNe) database. We assembled 2204 million sequences obtained from 114 publicly available RNA-seq libraries that were generated from all embryo-larval development stages, adult organs, and different environmental stressors including heavy metals, temperature, salinity and exposure to air, and which were mostly produced as part of the Crassostrea gigas genome project. These data were analyzed in silico and resulted in 56621 newly assembled contigs that were deposited into a publicly available database, the GigaTON database. This database also provides powerful and user-friendly request tools to browse and retrieve information about annotation, expression level, UTRs, splice and polymorphism, and gene ontology associated with all the contigs within each library and between all libraries. The GigaTON database provides a convenient, potent and versatile interface to browse, retrieve, confront and compare massive transcriptomic information in an extensive range of conditions, tissues and developmental stages in Crassostrea gigas. To our knowledge, the GigaTON database constitutes the most extensive transcriptomic database to date in marine invertebrates, and thereby a new reference transcriptome in the oyster, a highly valuable resource to physiologists and evolutionary biologists.

  1. From Population Databases to Research and Informed Health Decisions and Policy.

    PubMed

    Machluf, Yossy; Tal, Orna; Navon, Amir; Chaiter, Yoram

    2017-01-01

    In the era of big data, the medical community is inspired to maximize the utilization and processing of the rapidly expanding medical datasets for clinical-related and policy-driven research. This requires a medical database that can be aggregated, interpreted, and integrated at both the individual and population levels. Policymakers seek data as a lever for wise, evidence-based decision-making and information-driven policy. Yet, bridging the gap between data collection, research, and policymaking, is a major challenge. To bridge this gap, we propose a four-step model: (A) creating a conjoined task force of all relevant parties to declare a national program to promote collaborations; (B) promoting a national digital records project, or at least a network of synchronized and integrated databases, in an accessible transparent manner; (C) creating an interoperative national research environment to enable the analysis of the organized and integrated data and to generate evidence; and (D) utilizing the evidence to improve decision-making, to support a wisely chosen national policy. For the latter purpose, we also developed a novel multidimensional set of criteria to illuminate insights and estimate the risk for future morbidity based on current medical conditions. Used by policymakers, providers of health plans, caregivers, and health organizations, we presume this model will assist transforming evidence generation to support the design of health policy and programs, as well as improved decision-making about health and health care, at all levels: individual, communal, organizational, and national.

  2. Construction and completion of flux balance models from pathway databases.

    PubMed

    Latendresse, Mario; Krummenacker, Markus; Trupp, Miles; Karp, Peter D

    2012-02-01

    Flux balance analysis (FBA) is a well-known technique for genome-scale modeling of metabolic flux. Typically, an FBA formulation requires the accurate specification of four sets: biochemical reactions, biomass metabolites, nutrients and secreted metabolites. The development of FBA models can be time consuming and tedious because of the difficulty in assembling completely accurate descriptions of these sets, and in identifying errors in the composition of these sets. For example, the presence of a single non-producible metabolite in the biomass will make the entire model infeasible. Other difficulties in FBA modeling are that model distributions, and predicted fluxes, can be cryptic and difficult to understand. We present a multiple gap-filling method to accelerate the development of FBA models using a new tool, called MetaFlux, based on mixed integer linear programming (MILP). The method suggests corrections to the sets of reactions, biomass metabolites, nutrients and secretions. The method generates FBA models directly from Pathway/Genome Databases. Thus, FBA models developed in this framework are easily queried and visualized using the Pathway Tools software. Predicted fluxes are more easily comprehended by visualizing them on diagrams of individual metabolic pathways or of metabolic maps. MetaFlux can also remove redundant high-flux loops, solve FBA models once they are generated and model the effects of gene knockouts. MetaFlux has been validated through construction of FBA models for Escherichia coli and Homo sapiens. Pathway Tools with MetaFlux is freely available to academic users, and for a fee to commercial users. Download from: biocyc.org/download.shtml. Contact: mario.latendresse@sri.com. Supplementary data are available at Bioinformatics online.

  3. SSME environment database development

    NASA Technical Reports Server (NTRS)

    Reardon, John

    1987-01-01

    The internal environment of the Space Shuttle Main Engine (SSME) is being determined from hot firings of the prototype engines and from model tests using either air or water as the test fluid. The objectives are to develop a database system to facilitate management and analysis of test measurements and results, to enter available data into the database, and to analyze available data to establish conventions and procedures to provide consistency in data normalization and configuration geometry references.

  4. BioQ: tracing experimental origins in public genomic databases using a novel data provenance model

    PubMed Central

    Saccone, Scott F.; Quan, Jiaxi; Jones, Peter L.

    2012-01-01

    Motivation: Public genomic databases, which are often used to guide genetic studies of human disease, are now being applied to genomic medicine through in silico integrative genomics. These databases, however, often lack tools for systematically determining the experimental origins of the data. Results: We introduce a new data provenance model that we have implemented in a public web application, BioQ, for assessing the reliability of the data by systematically tracing its experimental origins to the original subjects and biologics. BioQ allows investigators to both visualize data provenance as well as explore individual elements of experimental process flow using precise tools for detailed data exploration and documentation. It includes a number of human genetic variation databases such as the HapMap and 1000 Genomes projects. Availability and implementation: BioQ is freely available to the public at http://bioq.saclab.net Contact: ssaccone@wustl.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22426342

  5. Soil organic carbon stocks in Alaska estimated with spatial and pedon data

    USGS Publications Warehouse

    Bliss, Norman B.; Maursetter, J.

    2010-01-01

    Temperatures in high-latitude ecosystems are increasing faster than the average rate of global warming, which may lead to a positive feedback for climate change by increasing the respiration rates of soil organic C. If a positive feedback is confirmed, soil C will represent a source of greenhouse gases that is not currently considered in international protocols to regulate C emissions. We present new estimates of the stocks of soil organic C in Alaska, calculated by linking spatial and field data developed by the USDA NRCS. The spatial data are from the State Soil Geographic database (STATSGO), and the field and laboratory data are from the National Soil Characterization Database, also known as the pedon database. The new estimates range from 32 to 53 Pg of soil organic C for Alaska, formed by linking the spatial and field data using the attributes of Soil Taxonomy. For modelers, we recommend an estimation method based on taxonomic subgroups with interpolation for missing areas, which yields an estimate of 48 Pg. This is a substantial increase over a magnitude of 13 Pg estimated from only the STATSGO data as originally distributed in 1994, but the increase reflects different estimation methods and is not a measure of the change in C on the landscape. Pedon samples were collected between 1952 and 2002, so the results do not represent a single point in time. The linked databases provide an improved basis for modeling the impacts of climate change on net ecosystem exchange.

  6. Podiform chromite deposits--database and grade and tonnage models

    USGS Publications Warehouse

    Mosier, Dan L.; Singer, Donald A.; Moring, Barry C.; Galloway, John P.

    2012-01-01

    Chromite ((Mg, Fe++)(Cr, Al, Fe+++)2O4) is the only source for the metallic element chromium, which is used in the metallurgical, chemical, and refractory industries. Podiform chromite deposits are small magmatic chromite bodies formed in the ultramafic section of an ophiolite complex in the oceanic crust. These deposits have been found in midoceanic ridge, off-ridge, and suprasubduction tectonic settings. Most podiform chromite deposits are found in dunite or peridotite near the contact of the cumulate and tectonite zones in ophiolites. We have identified 1,124 individual podiform chromite deposits, based on a 100-meter spatial rule, and have compiled them in a database. Of these, 619 deposits have been used to create three new grade and tonnage models for podiform chromite deposits. The major podiform chromite model has a median tonnage of 11,000 metric tons and a mean grade of 45 percent Cr2O3. The minor podiform chromite model has a median tonnage of 100 metric tons and a mean grade of 43 percent Cr2O3. The banded podiform chromite model has a median tonnage of 650 metric tons and a mean grade of 42 percent Cr2O3. Observed frequency distributions are also given for grades of rhodium, iridium, ruthenium, palladium, and platinum. In resource assessment applications, both major and minor podiform chromite models may be used for any ophiolite complex regardless of its tectonic setting or ophiolite zone. Expected sizes of undiscovered podiform chromite deposits, with respect to degree of deformation or ore-forming process, may determine which model is appropriate. The banded podiform chromite model may be applicable for ophiolites in both suprasubduction and midoceanic ridge settings.

  7. A rat RNA-Seq transcriptomic BodyMap across 11 organs and 4 developmental stages

    PubMed Central

    Yu, Ying; Fuscoe, James C.; Zhao, Chen; Guo, Chao; Jia, Meiwen; Qing, Tao; Bannon, Desmond I.; Lancashire, Lee; Bao, Wenjun; Du, Tingting; Luo, Heng; Su, Zhenqiang; Jones, Wendell D.; Moland, Carrie L.; Branham, William S.; Qian, Feng; Ning, Baitang; Li, Yan; Hong, Huixiao; Guo, Lei; Mei, Nan; Shi, Tieliu; Wang, Kevin Y.; Wolfinger, Russell D.; Nikolsky, Yuri; Walker, Stephen J.; Duerksen-Hughes, Penelope; Mason, Christopher E.; Tong, Weida; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Shi, Leming; Wang, Charles

    2014-01-01

    The rat has been used extensively as a model for evaluating chemical toxicities and for understanding drug mechanisms. However, its transcriptome across multiple organs, or developmental stages, has not yet been reported. Here we show, as part of the SEQC consortium efforts, a comprehensive rat transcriptomic BodyMap created by performing RNA-Seq on 320 samples from 11 organs of both sexes of juvenile, adolescent, adult and aged Fischer 344 rats. We catalogue the expression profiles of 40,064 genes, 65,167 transcripts, 31,909 alternatively spliced transcript variants and 2,367 non-coding genes/non-coding RNAs (ncRNAs) annotated in AceView. We find that organ-enriched, differentially expressed genes reflect the known organ-specific biological activities. A large number of transcripts show organ-specific, age-dependent or sex-specific differential expression patterns. We create a web-based, open-access rat BodyMap database of expression profiles with crosslinks to other widely used databases, anticipating that it will serve as a primary resource for biomedical research using the rat model. PMID:24510058

  8. Biomine: predicting links between biological entities using network models of heterogeneous databases.

    PubMed

    Eronen, Lauri; Toivonen, Hannu

    2012-06-06

    Biological databases contain large amounts of data concerning the functions and associations of genes and proteins. Integration of data from several such databases into a single repository can aid the discovery of previously unknown connections spanning multiple types of relationships and databases. Biomine is a system that integrates cross-references from several biological databases into a graph model with multiple types of edges, such as protein interactions, gene-disease associations and gene ontology annotations. Edges are weighted based on their type, reliability, and informativeness. We present Biomine and evaluate its performance in link prediction, where the goal is to predict pairs of nodes that will be connected in the future, based on current data. In particular, we formulate protein interaction prediction and disease gene prioritization tasks as instances of link prediction. The predictions are based on a proximity measure computed on the integrated graph. We consider and experiment with several such measures, and perform a parameter optimization procedure where different edge types are weighted to optimize link prediction accuracy. We also propose a novel method for disease-gene prioritization, defined as finding a subset of candidate genes that cluster together in the graph. We experimentally evaluate Biomine by predicting future annotations in the source databases and prioritizing lists of putative disease genes. The experimental results show that Biomine has strong potential for predicting links when a set of selected candidate links is available. The predictions obtained using the entire Biomine dataset are shown to clearly outperform ones obtained using any single source of data alone, when different types of links are suitably weighted. In the gene prioritization task, an established reference set of disease-associated genes is useful, but the results show that under favorable conditions, Biomine can also perform well when no such information is
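
    The abstract does not say which proximity measures were used, so the following is only a sketch of one plausible candidate: the probability of the best path between two nodes, where each edge weight in (0, 1] is read as a reliability and a path scores the product of its edge weights. The function name, the node labels and the weights are hypothetical, and the graph is treated as undirected for simplicity.

```python
import heapq
import math


def best_path_proximity(edges, source, target):
    """Return the highest product of edge weights over any source-target path.

    edges: iterable of (node_u, node_v, weight) with 0 < weight <= 1
    Maximizing the product equals minimizing the sum of -log(weight),
    so an ordinary Dijkstra search applies.
    """
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))

    best_cost = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            return math.exp(-cost)
        if cost > best_cost.get(node, math.inf):
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, []):
            new_cost = cost - math.log(weight)
            if new_cost < best_cost.get(neighbor, math.inf):
                best_cost[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return 0.0  # no path: the nodes are treated as unrelated


# Hypothetical toy graph mixing edge types.
toy_edges = [
    ("geneA", "proteinB", 0.9),     # e.g. a protein interaction
    ("proteinB", "diseaseC", 0.5),  # e.g. a gene-disease association
    ("geneA", "diseaseC", 0.3),     # a weak direct link
]
score = best_path_proximity(toy_edges, "geneA", "diseaseC")
```

    In this toy graph the two-step path (0.9 × 0.5) outscores the weak direct edge (0.3), which shows how indirect evidence can rank a candidate link higher than a single unreliable annotation.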

  9. The System Dynamics Model for Development of Organic Agriculture

    NASA Astrophysics Data System (ADS)

    Rozman, Črtomir; Škraba, Andrej; Kljajić, Miroljub; Pažek, Karmen; Bavec, Martina; Bavec, Franci

    2008-10-01

    Organic agriculture is the highest environmentally valuable agricultural system, and has strategic importance at national level that goes beyond the interests of agricultural sector. In this paper we address development of organic farming simulation model based on a system dynamics methodology (SD). The system incorporates relevant variables, which affect the development of the organic farming. The group decision support system (GDSS) was used in order to identify most relevant variables for construction of causal loop diagram and further model development. The model seeks answers to strategic questions related to the level of organically utilized area, levels of production and crop selection in a long term dynamic context and will be used for simulation of different policy scenarios for organic farming and their impact on economic and environmental parameters of organic production at an aggregate level.

  10. SSBD: a database of quantitative data of spatiotemporal dynamics of biological phenomena

    PubMed Central

    Tohsato, Yukako; Ho, Kenneth H. L.; Kyoda, Koji; Onami, Shuichi

    2016-01-01

    Motivation: Rapid advances in live-cell imaging analysis and mathematical modeling have produced a large amount of quantitative data on spatiotemporal dynamics of biological objects ranging from molecules to organisms. There is now a crucial need to bring these large amounts of quantitative biological dynamics data together centrally in a coherent and systematic manner. This will facilitate the reuse of this data for further analysis. Results: We have developed the Systems Science of Biological Dynamics database (SSBD) to store and share quantitative biological dynamics data. SSBD currently provides 311 sets of quantitative data for single molecules, nuclei and whole organisms in a wide variety of model organisms from Escherichia coli to Mus musculus. The data are provided in Biological Dynamics Markup Language format and also through a REST API. In addition, SSBD provides 188 sets of time-lapse microscopy images from which the quantitative data were obtained and software tools for data visualization and analysis. Availability and Implementation: SSBD is accessible at http://ssbd.qbic.riken.jp. Contact: sonami@riken.jp PMID:27412095

  11. SSBD: a database of quantitative data of spatiotemporal dynamics of biological phenomena.

    PubMed

    Tohsato, Yukako; Ho, Kenneth H L; Kyoda, Koji; Onami, Shuichi

    2016-11-15

    Rapid advances in live-cell imaging analysis and mathematical modeling have produced a large amount of quantitative data on spatiotemporal dynamics of biological objects ranging from molecules to organisms. There is now a crucial need to bring these large amounts of quantitative biological dynamics data together centrally in a coherent and systematic manner. This will facilitate the reuse of this data for further analysis. We have developed the Systems Science of Biological Dynamics database (SSBD) to store and share quantitative biological dynamics data. SSBD currently provides 311 sets of quantitative data for single molecules, nuclei and whole organisms in a wide variety of model organisms from Escherichia coli to Mus musculus. The data are provided in Biological Dynamics Markup Language format and also through a REST API. In addition, SSBD provides 188 sets of time-lapse microscopy images from which the quantitative data were obtained and software tools for data visualization and analysis. SSBD is accessible at http://ssbd.qbic.riken.jp. Contact: sonami@riken.jp. © The Author 2016. Published by Oxford University Press.

  12. Exploring Protein Function Using the Saccharomyces Genome Database.

    PubMed

    Wong, Edith D

    2017-01-01

    Elucidating the function of individual proteins will help to create a comprehensive picture of cell biology, as well as shed light on human disease mechanisms, possible treatments, and cures. Due to its compact genome, and extensive history of experimentation and annotation, the budding yeast Saccharomyces cerevisiae is an ideal model organism in which to determine protein function. This information can then be leveraged to infer functions of human homologs. Despite the large amount of research and biological data about S. cerevisiae, many proteins' functions remain unknown. Here, we explore ways to use the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org ) to predict the function of proteins and gain insight into their roles in various cellular processes.

  13. Using Online Databases in Corporate Issues Management.

    ERIC Educational Resources Information Center

    Thomsen, Steven R.

    1995-01-01

    Finds that corporate public relations practitioners felt they were able, using online database and information services, to intercept issues earlier in the "issue cycle" and thus enable their organizations to develop more "proactionary" or "catalytic" issues management response strategies. (SR)

  14. GETPrime: a gene- or transcript-specific primer database for quantitative real-time PCR.

    PubMed

    Gubelmann, Carine; Gattiker, Alexandre; Massouras, Andreas; Hens, Korneel; David, Fabrice; Decouttere, Frederik; Rougemont, Jacques; Deplancke, Bart

    2011-01-01

    The vast majority of genes in humans and other organisms undergo alternative splicing, yet the biological function of splice variants is still very poorly understood in large part because of the lack of simple tools that can map the expression profiles and patterns of these variants with high sensitivity. High-throughput quantitative real-time polymerase chain reaction (qPCR) is an ideal technique to accurately quantify nucleic acid sequences including splice variants. However, currently available primer design programs do not distinguish between splice variants and also differ substantially in overall quality, functionality or throughput mode. Here, we present GETPrime, a primer database supported by a novel platform that uniquely combines and automates several features critical for optimal qPCR primer design. These include the consideration of all gene splice variants to enable either gene-specific (covering the majority of splice variants) or transcript-specific (covering one splice variant) expression profiling, primer specificity validation, automated best primer pair selection according to strict criteria and graphical visualization of the latter primer pairs within their genomic context. GETPrime primers have been extensively validated experimentally, demonstrating high transcript specificity in complex samples. Thus, the free-access, user-friendly GETPrime database allows fast primer retrieval and visualization for genes or groups of genes of most common model organisms, and is available at http://updepla1srv1.epfl.ch/getprime/. Database URL: http://deplanckelab.epfl.ch.

  15. Aquatic information and retrieval (AQUIRE) database system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hunter, R.; Niemi, G.; Pilli, A.

    The AQUIRE database system is one of the foremost international resources for finding aquatic toxicity information. Information in the system is organized around the concept of an 'aquatic toxicity test.' A toxicity test record contains information about the chemical, species, endpoint, endpoint concentrations, and test conditions under which the toxicity test was conducted. For the past 10 years aquatic literature has been reviewed and entered into the system. Currently, the AQUIRE database system contains data on more than 2,400 species, 160 endpoints, 5,000 chemicals, 6,000 references, and 104,000 toxicity tests.

  16. MiCroKit 3.0: an integrated database of midbody, centrosome and kinetochore.

    PubMed

    Ren, Jian; Liu, Zexian; Gao, Xinjiao; Jin, Changjiang; Ye, Mingliang; Zou, Hanfa; Wen, Longping; Zhang, Zhaolei; Xue, Yu; Yao, Xuebiao

    2010-01-01

    During cell division/mitosis, a specific subset of proteins is spatially and temporally assembled into protein super complexes in three distinct regions, i.e. centrosome/spindle pole, kinetochore/centromere and midbody/cleavage furrow/phragmoplast/bud neck, and modulates cell division process faithfully. Although many experimental efforts have been carried out to investigate the characteristics of these proteins, no integrated database was available. Here, we present the MiCroKit database (http://microkit.biocuckoo.org) of proteins that localize in midbody, centrosome and/or kinetochore. We collected into the MiCroKit database experimentally verified microkit proteins from the scientific literature that have unambiguous supportive evidence for subcellular localization under fluorescent microscope. The current version of MiCroKit 3.0 provides detailed information for 1489 microkit proteins from seven model organisms, including Saccharomyces cerevisiae, Schizosaccharomyces pombe, Caenorhabditis elegans, Drosophila melanogaster, Xenopus laevis, Mus musculus and Homo sapiens. Moreover, the orthologous information was provided for these microkit proteins, and could be a useful resource for further experimental identification. The online service of MiCroKit database was implemented in PHP + MySQL + JavaScript, while the local packages were developed in JAVA 1.5 (J2SE 5.0).

  17. Evidence in the learning organization

    PubMed Central

    Crites, Gerald E; McNamara, Megan C; Akl, Elie A; Richardson, W Scott; Umscheid, Craig A; Nishikawa, James

    2009-01-01

    Background Organizational leaders in business and medicine have been experiencing a similar dilemma: how to ensure that their organizational members are adopting work innovations in a timely fashion. Organizational leaders in healthcare have attempted to resolve this dilemma by offering specific solutions, such as evidence-based medicine (EBM), but organizations are still not systematically adopting evidence-based practice innovations as rapidly as expected by policy-makers (the knowing-doing gap problem). Some business leaders have adopted a systems-based perspective, called the learning organization (LO), to address a similar dilemma. Three years ago, the Society of General Internal Medicine's Evidence-based Medicine Task Force began an inquiry to integrate the EBM and LO concepts into one model to address the knowing-doing gap problem. Methods During the model development process, the authors searched several databases for relevant LO frameworks and their related concepts by using a broad search strategy. To identify the key LO frameworks and consolidate them into one model, the authors used consensus-based decision-making and a narrative thematic synthesis guided by several qualitative criteria. The authors subjected the model to external, independent review and improved upon its design with this feedback. Results The authors found seven LO frameworks particularly relevant to evidence-based practice innovations in organizations. The authors describe their interpretations of these frameworks for healthcare organizations, the process they used to integrate the LO frameworks with EBM principles, and the resulting Evidence in the Learning Organization (ELO) model. They also provide a health organization scenario to illustrate ELO concepts in application. Conclusion The authors intend, by sharing the LO frameworks and the ELO model, to help organizations identify their capacities to learn and share knowledge about evidence-based practice innovations. 
The ELO model

  18. High-Performance Secure Database Access Technologies for HEP Grids

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Matthew Vranicar; John Weicher

    2006-04-17

    The Large Hadron Collider (LHC) at the CERN Laboratory will become the largest scientific instrument in the world when it starts operations in 2007. Large Scale Analysis Computer Systems (computational grids) are required to extract rare signals of new physics from petabytes of LHC detector data. In addition to file-based event data, LHC data processing applications require access to large amounts of data in relational databases: detector conditions, calibrations, etc. U.S. high energy physicists demand efficient performance of grid computing applications in LHC physics research where world-wide remote participation is vital to their success. To empower physicists with data-intensive analysis capabilities a whole hyperinfrastructure of distributed databases cross-cuts a multi-tier hierarchy of computational grids. The crosscutting allows separation of concerns across both the global environment of a federation of computational grids and the local environment of a physicist’s computer used for analysis. Very few efforts are on-going in the area of database and grid integration research. Most of these are outside of the U.S. and rely on traditional approaches to secure database access via an extraneous security layer separate from the database system core, preventing efficient data transfers. Our findings are shared by the Database Access and Integration Services Working Group of the Global Grid Forum, who states that "Research and development activities relating to the Grid have generally focused on applications where data is stored in files. However, in many scientific and commercial domains, database management systems have a central role in data storage, access, organization, authorization, etc, for numerous applications." There is a clear opportunity for a technological breakthrough, requiring innovative steps to provide high-performance secure database access technologies for grid computing. We believe that an innovative database architecture where

  19. The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.

    PubMed

    Keseler, Ingrid M; Mackie, Amanda; Santos-Zavaleta, Alberto; Billington, Richard; Bonavides-Martínez, César; Caspi, Ron; Fulcher, Carol; Gama-Castro, Socorro; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Muñiz-Rascado, Luis; Ong, Quang; Paley, Suzanne; Peralta-Gil, Martin; Subhraveti, Pallavi; Velázquez-Ramírez, David A; Weaver, Daniel; Collado-Vides, Julio; Paulsen, Ian; Karp, Peter D

    2017-01-04

    EcoCyc (EcoCyc.org) is a freely accessible, comprehensive database that collects and summarizes experimental data for Escherichia coli K-12, the best-studied bacterial model organism. New experimental discoveries about gene products, their function and regulation, new metabolic pathways, enzymes and cofactors are regularly added to EcoCyc. New SmartTable tools allow users to browse collections of related EcoCyc content. SmartTables can also serve as repositories for user- or curator-generated lists. EcoCyc now supports running and modifying E. coli metabolic models directly on the EcoCyc website. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  20. The model and moral justification for organ procurement in Japan.

    PubMed

    Bagheri, Alireza; Shoji, Shin'ichi

    2005-01-01

    Organ replacement therapy is a part of medical practice in today's world and many countries have adopted the required guidelines and regulations. Establishing the basis on which organs can be removed, is still one of the most controversial issues of health policy making in the debate. The critical disparity between supply and demand in organ replacement therapy, even with the existence of social acceptance and organ transplantation law, turns attention towards the importance of an appropriate model of organ procurement. This model should be able to expand the donor pool and increase the organ retrieval rate by converting potential donors to actual ones. In Japan the organ transplantation law which was enacted in 1997 allows organ procurement from brain death as well as non-heart beating cadavers according to restricted conditions. One such condition includes the necessity of both the donor's and the family's written consent. Under current organ procurement policy, organs from only 29 brain death cases have been so far procured. In this paper after examining the current organ procurement system in Japan and the moral justifications behind different organ procurement models we conclude that the Japanese system does not clearly fall into one of the popular organ procurement models.

  1. Integrating diverse databases into an unified analysis framework: a Galaxy approach

    PubMed Central

    Blankenberg, Daniel; Coraor, Nathan; Von Kuster, Gregory; Taylor, James; Nekrutenko, Anton

    2011-01-01

    Recent technological advances have led to the ability to generate large amounts of data for model and non-model organisms. Whereas, in the past, there have been a relatively small number of central repositories that serve genomic data, an increasing number of distinct specialized data repositories and resources have been established. Here, we describe a generic approach that provides for the integration of a diverse spectrum of data resources into a unified analysis framework, Galaxy (http://usegalaxy.org). This approach allows the simplified coupling of external data resources with the data analysis tools available to Galaxy users, while leveraging the native data mining facilities of the external data resources. Database URL: http://usegalaxy.org PMID:21531983

  2. MiDAS 2.0: an ecosystem-specific taxonomy and online database for the organisms of wastewater treatment systems expanded for anaerobic digester groups

    PubMed Central

    McIlroy, Simon Jon; Kirkegaard, Rasmus Hansen; McIlroy, Bianca; Nierychlo, Marta; Kristensen, Jannie Munk; Karst, Søren Michael; Albertsen, Mads

    2017-01-01

    Abstract Wastewater is increasingly viewed as a resource, with anaerobic digester technology being routinely implemented for biogas production. Characterising the microbial communities involved in wastewater treatment facilities and their anaerobic digesters is considered key to their optimal design and operation. Amplicon sequencing of the 16S rRNA gene allows high-throughput monitoring of these systems. The MiDAS field guide is a public resource providing amplicon sequencing protocols and an ecosystem-specific taxonomic database optimized for use with wastewater treatment facility samples. The curated taxonomy endeavours to provide a genus-level-classification for abundant phylotypes and the online field guide links this identity to published information regarding their ecology, function and distribution. This article describes the expansion of the database resources to cover the organisms of the anaerobic digester systems fed primary sludge and surplus activated sludge. The updated database includes descriptions of the abundant genus-level-taxa in influent wastewater, activated sludge and anaerobic digesters. Abundance information is also included to allow assessment of the role of emigration in the ecology of each phylotype. MiDAS is intended as a collaborative resource for the progression of research into the ecology of wastewater treatment, by providing a public repository for knowledge that is accessible to all interested in these biotechnologically important systems. Database URL: http://www.midasfieldguide.org PMID:28365734

  3. The Microbial Rosetta Stone Database: A compilation of global and emerging infectious microorganisms and bioterrorist threat agents

    PubMed Central

    Ecker, David J; Sampath, Rangarajan; Willett, Paul; Wyatt, Jacqueline R; Samant, Vivek; Massire, Christian; Hall, Thomas A; Hari, Kumar; McNeil, John A; Büchen-Osmond, Cornelia; Budowle, Bruce

    2005-01-01

    Background Thousands of different microorganisms affect the health, safety, and economic stability of populations. Many different medical and governmental organizations have created lists of the pathogenic microorganisms relevant to their missions; however, the nomenclature for biological agents on these lists and pathogens described in the literature is inexact. This ambiguity can be a significant block to effective communication among the diverse communities that must deal with epidemics or bioterrorist attacks. Results We have developed a database known as the Microbial Rosetta Stone. The database relates microorganism names, taxonomic classifications, diseases, specific detection and treatment protocols, and relevant literature. The database structure facilitates linkage to public genomic databases. This paper focuses on the information in the database for pathogens that impact global public health, emerging infectious organisms, and bioterrorist threat agents. Conclusion The Microbial Rosetta Stone is available at . The database provides public access to up-to-date taxonomic classifications of organisms that cause human diseases, improves the consistency of nomenclature in disease reporting, and provides useful links between different public genomic and public health databases. PMID:15850481

  4. Influence of high-resolution surface databases on the modeling of local atmospheric circulation systems

    NASA Astrophysics Data System (ADS)

    Paiva, L. M. S.; Bodstein, G. C. R.; Pimentel, L. C. G.

    2013-12-01

    Large-eddy simulations are performed using the Advanced Regional Prediction System (ARPS) code at horizontal grid resolutions as fine as 300 m to assess the influence of detailed and updated surface databases on the modeling of local atmospheric circulation systems of urban areas with complex terrain. Applications to air pollution and wind energy are sought. These databases are comprised of 3 arc-sec topographic data from the Shuttle Radar Topography Mission, 10 arc-sec vegetation type data from the European Space Agency (ESA) GlobCover Project, and 30 arc-sec Leaf Area Index and Fraction of Absorbed Photosynthetically Active Radiation data from the ESA GlobCarbon Project. Simulations are carried out for the Metropolitan Area of Rio de Janeiro using six one-way nested-grid domains that allow the choice of distinct parametric models and vertical resolutions associated to each grid. ARPS is initialized using the Global Forecasting System with 0.5°-resolution data from the National Center of Environmental Prediction, which is also used every 3 h as lateral boundary condition. Topographic shading is turned on and two soil layers with depths of 0.01 and 1.0 m are used to compute the soil temperature and moisture budgets in all runs. Results for two simulated runs covering the period from 6 to 7 September 2007 are compared to surface and upper-air observational data to explore the dependence of the simulations on initial and boundary conditions, topographic and land-use databases and grid resolution. Our comparisons show overall good agreement between simulated and observed data and also indicate that the low resolution of the 30 arc-sec soil database from United States Geological Survey, the soil moisture and skin temperature initial conditions assimilated from the GFS analyses and the synoptic forcing on the lateral boundaries of the finer grids may affect an adequate spatial description of the meteorological variables.

  5. Self-Organized Criticality in an Asexual Model?

    NASA Astrophysics Data System (ADS)

    Chisholm, Colin; Jan, Naeem; Gibbs, Peter; Erzan, Ayşe.

    Recent work has shown that the distribution of steady state mutations for an asexual "bacteria" model has features similar to those seen in the Self-Organized Critical (SOC) sandpile model of Bak et al. We investigate this coincidence further and search for a "self-organized critical" state for bacteria, but instead find that the SOC sandpile critical behavior is very sensitive: it is effectively destroyed by small perturbations when the absorption of sand is introduced. It is only in the limit when the length of the genome of the bacteria tends to infinity that SOC properties are recovered for the asexual model.

  6. Database Entity Persistence with Hibernate for the Network Connectivity Analysis Model

    DTIC Science & Technology

    2014-04-01

    time savings in the Java coding development process. Appendices A and B describe address setup procedures for installing the MySQL database...development environment is required: • The open source MySQL Database Management System (DBMS) from Oracle, which is a Java Database Connectivity (JDBC...compliant DBMS • MySQL JDBC Driver library that comes as a plug-in with the Netbeans distribution • The latest Java Development Kit with the latest

  7. DB-PABP: a database of polyanion-binding proteins

    PubMed Central

    Fang, Jianwen; Dong, Yinghua; Salamat-Miller, Nazila; Russell Middaugh, C.

    2008-01-01

    The interactions between polyanions (PAs) and polyanion-binding proteins (PABPs) have been found to play significant roles in many essential biological processes including intracellular organization, transport and protein folding. Furthermore, many neurodegenerative disease-related proteins are PABPs. Thus, a better understanding of PA/PABP interactions may not only enhance our understandings of biological systems but also provide new clues to these deadly diseases. The literature in this field is widely scattered, suggesting the need for a comprehensive and searchable database of PABPs. The DB-PABP is a comprehensive, manually curated and searchable database of experimentally characterized PABPs. It is freely available and can be accessed online at http://pabp.bcf.ku.edu/DB_PABP/. The DB-PABP was implemented as a MySQL relational database. An interactive web interface was created using Java Server Pages (JSP). The search page of the database is organized into a main search form and a section for utilities. The main search form enables custom searches via four menus: protein names, polyanion names, the source species of the proteins and the methods used to discover the interactions. Available utilities include a commonality matrix, a function of listing PABPs by the number of interacting polyanions and a string search for author surnames. The DB-PABP is maintained at the University of Kansas. We encourage users to provide feedback and submit new data and references. PMID:17916573
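    The four-menu search described above (protein names, polyanion names, source species, discovery methods) can be sketched as an optional-filter query over one relational table. This is a minimal sketch under stated assumptions: the table name `interactions`, the column names and the placeholder rows are hypothetical, and Python's built-in SQLite module stands in for the MySQL backend named in the abstract. It does not reproduce the actual DB-PABP schema.

```python
import sqlite3

# Assumed column names, one per search menu mentioned in the abstract.
FIELDS = ("protein", "polyanion", "species", "method")


def search(conn, **filters):
    """Return rows matching every supplied filter; omitted filters match everything."""
    unknown = set(filters) - set(FIELDS)
    if unknown:
        raise ValueError(f"unknown search field(s): {sorted(unknown)}")
    # Column names come only from the FIELDS whitelist; values are bound as parameters.
    clauses = [f"{name} = ?" for name in filters]
    sql = "SELECT protein, polyanion, species, method FROM interactions"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY protein, polyanion"
    return conn.execute(sql, tuple(filters.values())).fetchall()


def demo_connection():
    """Build an in-memory table holding placeholder rows (not real DB-PABP data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE interactions "
        "(protein TEXT, polyanion TEXT, species TEXT, method TEXT)"
    )
    conn.executemany(
        "INSERT INTO interactions VALUES (?, ?, ?, ?)",
        [
            ("protein-1", "polyanion-A", "species-X", "method-M"),
            ("protein-2", "polyanion-A", "species-Y", "method-N"),
            ("protein-1", "polyanion-B", "species-X", "method-N"),
        ],
    )
    return conn
```

    For example, `search(demo_connection(), polyanion="polyanion-A")` returns the two placeholder rows for that polyanion, and adding `species="species-Y"` narrows the result to one row.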

  8. DB-PABP: a database of polyanion-binding proteins.

    PubMed

    Fang, Jianwen; Dong, Yinghua; Salamat-Miller, Nazila; Middaugh, C Russell

    2008-01-01

    The interactions between polyanions (PAs) and polyanion-binding proteins (PABPs) have been found to play significant roles in many essential biological processes including intracellular organization, transport and protein folding. Furthermore, many neurodegenerative disease-related proteins are PABPs. Thus, a better understanding of PA/PABP interactions may not only enhance our understandings of biological systems but also provide new clues to these deadly diseases. The literature in this field is widely scattered, suggesting the need for a comprehensive and searchable database of PABPs. The DB-PABP is a comprehensive, manually curated and searchable database of experimentally characterized PABPs. It is freely available and can be accessed online at http://pabp.bcf.ku.edu/DB_PABP/. The DB-PABP was implemented as a MySQL relational database. An interactive web interface was created using Java Server Pages (JSP). The search page of the database is organized into a main search form and a section for utilities. The main search form enables custom searches via four menus: protein names, polyanion names, the source species of the proteins and the methods used to discover the interactions. Available utilities include a commonality matrix, a function of listing PABPs by the number of interacting polyanions and a string search for author surnames. The DB-PABP is maintained at the University of Kansas. We encourage users to provide feedback and submit new data and references.

  9. Definitions of database files and fields of the Personal Computer-Based Water Data Sources Directory

    USGS Publications Warehouse

    Green, J. Wayne

    1991-01-01

    This report describes the data-base files and fields of the personal computer-based Water Data Sources Directory (WDSD). The personal computer-based WDSD was derived from the U.S. Geological Survey (USGS) mainframe computer version. The mainframe version of the WDSD is a hierarchical data-base design. The personal computer-based WDSD is a relational data-base design. This report describes the data-base files and fields of the relational data-base design in dBASE IV (the use of brand names in this abstract is for identification purposes only and does not constitute endorsement by the U.S. Geological Survey) for the personal computer. The WDSD contains information on (1) the type of organization, (2) the major orientation of water-data activities conducted by each organization, (3) the names, addresses, and telephone numbers of offices within each organization from which water data may be obtained, (4) the types of data held by each organization and the geographic locations within which these data have been collected, (5) alternative sources of an organization's data, (6) the designation of liaison personnel in matters related to water-data acquisition and indexing, (7) the volume of water data indexed for the organization, and (8) information about other types of data and services available from the organization that are pertinent to water-resources activities.

  10. Artemis and ACT: viewing, annotating and comparing sequences stored in a relational database.

    PubMed

    Carver, Tim; Berriman, Matthew; Tivey, Adrian; Patel, Chinmay; Böhme, Ulrike; Barrell, Barclay G; Parkhill, Julian; Rajandream, Marie-Adèle

    2008-12-01

    Artemis and Artemis Comparison Tool (ACT) have become mainstream tools for viewing and annotating sequence data, particularly for microbial genomes. Since its first release, Artemis has been continuously developed and supported with additional functionality for editing and analysing sequences based on feedback from an active user community of laboratory biologists and professional annotators. Nevertheless, its utility has been somewhat restricted by its limitation to reading and writing from flat files. Therefore, a new version of Artemis has been developed, which reads from and writes to a relational database schema, and allows users to annotate more complex, often large and fragmented, genome sequences. Artemis and ACT have now been extended to read and write directly to the Generic Model Organism Database (GMOD, http://www.gmod.org) Chado relational database schema. In addition, a Gene Builder tool has been developed to provide structured forms and tables to edit coordinates of gene models and edit functional annotation, based on standard ontologies, controlled vocabularies and free text. Artemis and ACT are freely available (under a GPL licence) for download (for MacOSX, UNIX and Windows) at the Wellcome Trust Sanger Institute web sites: http://www.sanger.ac.uk/Software/Artemis/ http://www.sanger.ac.uk/Software/ACT/

  11. Proteomics: Protein Identification Using Online Databases

    ERIC Educational Resources Information Center

    Eurich, Chris; Fields, Peter A.; Rice, Elizabeth

    2012-01-01

    Proteomics is an emerging area of systems biology that allows simultaneous study of thousands of proteins expressed in cells, tissues, or whole organisms. We have developed this activity to enable high school or college students to explore proteomic databases using mass spectrometry data files generated from yeast proteins in a college laboratory…

  12. Linking Multiple Databases: Term Project Using "Sentences" DBMS.

    ERIC Educational Resources Information Center

    King, Ronald S.; Rainwater, Stephen B.

    This paper describes a methodology for use in teaching an introductory Database Management System (DBMS) course. Students master basic database concepts through the use of a multiple component project implemented in both relational and associative data models. The associative data model is a new approach for designing multi-user, Web-enabled…

  13. dbPAF: an integrative database of protein phosphorylation in animals and fungi.

    PubMed

    Ullah, Shahid; Lin, Shaofeng; Xu, Yang; Deng, Wankun; Ma, Lili; Zhang, Ying; Liu, Zexian; Xue, Yu

    2016-03-24

    Protein phosphorylation is one of the most important post-translational modifications (PTMs) and regulates a broad spectrum of biological processes. Recent progress in phosphoproteomic identification has generated a flood of phosphorylation sites, and the integration of these sites is an urgent need. In this work, we developed dbPAF, a curated database containing known phosphorylation sites in H. sapiens, M. musculus, R. norvegicus, D. melanogaster, C. elegans, S. pombe and S. cerevisiae. From the scientific literature and public databases, we collected and integrated a total of 54,148 phosphoproteins with 483,001 phosphorylation sites. Multiple options are provided for accessing the data, and original references and other annotations are also present for each phosphoprotein. Based on the new data set, we computationally detected significantly over-represented sequence motifs around phosphorylation sites, predicted potential kinases that are responsible for the modification of collected phospho-sites, and evolutionarily analyzed phosphorylation conservation states across different species. Besides being largely consistent with previous reports, our results also suggested new features of phospho-regulation. Taken together, our database can be useful for further analyses of protein phosphorylation in human and other model organisms. The dbPAF database was implemented in PHP + MySQL and is freely available at http://dbpaf.biocuckoo.org.

  14. Keeping Track of Our Treasures: Managing Historical Data with Relational Database Software.

    ERIC Educational Resources Information Center

    Gutmann, Myron P.; And Others

    1989-01-01

    Describes the way a relational database management system manages a large historical data collection project. Shows that such databases are practical to construct. States that the programing tasks involved are not for beginners, but the rewards of having data organized are worthwhile. (GG)

  15. System, method and apparatus for generating phrases from a database

    NASA Technical Reports Server (NTRS)

    McGreevy, Michael W. (Inventor)

    2004-01-01

    Phrase generation is a method of generating sequences of terms, such as phrases, that may occur within a database of subsets containing sequences of terms, such as text. A database is provided and a relational model of the database is created. A query is then input. The query includes a term or a sequence of terms or multiple individual terms or multiple sequences of terms or combinations thereof. Next, several sequences of terms that are contextually related to the query are assembled from contextual relations in the model of the database. The sequences of terms are then sorted and output. Phrase generation can also be an iterative process used to produce sequences of terms from a relational model of a database.
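    The steps in this abstract (build a relational model, take a query, assemble related term sequences, sort, output) can be sketched in a few lines. The sketch below is a deliberately simplified stand-in, not the patented method: it assumes the "relational model" is just a count of ordered adjacent word pairs, and the function names `build_relations` and `generate_phrases` are hypothetical.

```python
from collections import Counter


def build_relations(texts):
    """Count ordered adjacent term pairs; a simplified stand-in for a relational model."""
    relations = Counter()
    for text in texts:
        terms = text.lower().split()
        relations.update(zip(terms, terms[1:]))
    return relations


def generate_phrases(relations, query, length=3):
    """Assemble term sequences of the given length that contain the query term.

    Each sequence is scored by the weakest pair count along it.
    """
    # Start from every observed pair, then extend to the right one term at a time.
    candidates = dict(relations)
    for _ in range(length - 2):
        extended = {}
        for seq, score in candidates.items():
            for (first, second), count in relations.items():
                if first == seq[-1]:
                    extended[seq + (second,)] = min(score, count)
        candidates = extended
    hits = [(score, " ".join(seq)) for seq, score in candidates.items() if query in seq]
    # Highest score first; ties are broken alphabetically for a stable order.
    return [phrase for score, phrase in sorted(hits, key=lambda h: (-h[0], h[1]))]
```

    Scoring a sequence by its weakest link is one simple design choice among many; it keeps a phrase from ranking highly when only part of it is well attested in the text.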

  16. Exploring human disease using the Rat Genome Database.

    PubMed

    Shimoyama, Mary; Laulederkind, Stanley J F; De Pons, Jeff; Nigam, Rajni; Smith, Jennifer R; Tutaj, Marek; Petri, Victoria; Hayman, G Thomas; Wang, Shur-Jen; Ghiasvand, Omid; Thota, Jyothi; Dwinell, Melinda R

    2016-10-01

    Rattus norvegicus, the laboratory rat, has been a crucial model for studies of the environmental and genetic factors associated with human diseases for over 150 years. It is the primary model organism for toxicology and pharmacology studies, and has features that make it the model of choice in many complex-disease studies. Since 1999, the Rat Genome Database (RGD; http://rgd.mcw.edu) has been the premier resource for genomic, genetic, phenotype and strain data for the laboratory rat. The primary role of RGD is to curate rat data and validate orthologous relationships with human and mouse genes, and make these data available for incorporation into other major databases such as NCBI, Ensembl and UniProt. RGD also provides official nomenclature for rat genes, quantitative trait loci, strains and genetic markers, as well as unique identifiers. The RGD team adds enormous value to these basic data elements through functional and disease annotations, the analysis and visual presentation of pathways, and the integration of phenotype measurement data for strains used as disease models. Because much of the rat research community focuses on understanding human diseases, RGD provides a number of datasets and software tools that allow users to easily explore and make disease-related connections among these datasets. RGD also provides comprehensive human and mouse data for comparative purposes, illustrating the value of the rat in translational research. This article introduces RGD and its suite of tools and datasets to researchers - within and beyond the rat community - who are particularly interested in leveraging rat-based insights to understand human diseases. © 2016. Published by The Company of Biologists Ltd.

  17. Hymenoptera Genome Database: integrating genome annotations in HymenopteraMine.

    PubMed

    Elsik, Christine G; Tayal, Aditi; Diesh, Colin M; Unni, Deepak R; Emery, Marianne L; Nguyen, Hung N; Hagen, Darren E

    2016-01-04

    We report an update of the Hymenoptera Genome Database (HGD) (http://HymenopteraGenome.org), a model organism database for insect species of the order Hymenoptera (ants, bees and wasps). HGD maintains genomic data for 9 bee species, 10 ant species and 1 wasp, including the versions of genome and annotation data sets published by the genome sequencing consortiums and those provided by NCBI. A new data-mining warehouse, HymenopteraMine, based on the InterMine data warehousing system, integrates the genome data with data from external sources and facilitates cross-species analyses based on orthology. New genome browsers and annotation tools based on JBrowse/WebApollo provide easy genome navigation and viewing of high-throughput sequence data sets, and can be used for collaborative genome annotation. All of the genomes and annotation data sets are combined into a single BLAST server that allows users to select and combine sequence data sets to search. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. MPD: a pathogen genome and metagenome database

    PubMed Central

    Zhang, Tingting; Miao, Jiaojiao; Han, Na; Qiang, Yujun; Zhang, Wen

    2018-01-01

    Advances in high-throughput sequencing have led to unprecedented growth in the amount of available genome sequencing data, especially for bacterial genomes, which has been accompanied by a challenge for the storage and management of such huge datasets. To facilitate bacterial research and related studies, we have developed the Mypathogen database (MPD), which provides access to users for searching, downloading, storing and sharing bacterial genomics data. The MPD represents the first pathogenic database for microbial genomes and metagenomes, and currently covers pathogenic microbial genomes (6604 genera, 11 071 species, 41 906 strains) and metagenomic data from host, air, water and other sources (28 816 samples). The MPD also functions as a management system for statistical and storage data that can be used by different organizations, thereby facilitating data sharing among different organizations and research groups. A user-friendly local client tool is provided to maintain the steady transmission of big sequencing data. The MPD is a useful tool for analysis and management in genomic research, especially for clinical Centers for Disease Control and epidemiological studies, and is expected to contribute to advancing knowledge on pathogenic bacteria genomes and metagenomes. Database URL: http://data.mypathogen.org PMID:29917040

  19. Animal models of female pelvic organ prolapse: lessons learned

    PubMed Central

    Couri, Bruna M; Lenis, Andrew T; Borazjani, Ali; Paraiso, Marie Fidela R; Damaser, Margot S

    2012-01-01

    Pelvic organ prolapse is a vaginal protrusion of female pelvic organs. It has high prevalence worldwide and represents a great burden to the economy. The pathophysiology of pelvic organ prolapse is multifactorial and includes genetic predisposition, aberrant connective tissue, obesity, advancing age, vaginal delivery and other risk factors. Owing to the long course prior to patients becoming symptomatic and ethical questions surrounding human studies, animal models are necessary and useful. These models can mimic different human characteristics – histological, anatomical or hormonal, but none present all of the characteristics at the same time. Major animal models include knockout mice, rats, sheep, rabbits and nonhuman primates. In this article we discuss different animal models and their utility for investigating the natural progression of pelvic organ prolapse pathophysiology and novel treatment approaches. PMID:22707980

  20. Lessons Learned from Deploying an Analytical Task Management Database

    NASA Technical Reports Server (NTRS)

    O'Neil, Daniel A.; Welch, Clara; Arceneaux, Joshua; Bulgatz, Dennis; Hunt, Mitch; Young, Stephen

    2007-01-01

    Defining requirements, missions, technologies, and concepts for space exploration involves multiple levels of organizations, teams of people with complementary skills, and analytical models and simulations. Analytical activities range from filling a To-Be-Determined (TBD) in a requirement to creating animations and simulations of exploration missions. In a program as large as returning to the Moon, there are hundreds of simultaneous analysis activities. A way to manage and integrate efforts of this magnitude is to deploy a centralized database that provides the capability to define tasks, identify resources, describe products, schedule deliveries, and generate a variety of reports. This paper describes a web-accessible task management system and explains the lessons learned during the development and deployment of the database. Through the database, managers and team leaders can define tasks, establish review schedules, assign teams, link tasks to specific requirements, identify products, and link the task data records to external repositories that contain the products. Data filters and spreadsheet export utilities provide a powerful capability to create custom reports. Import utilities provide a means to populate the database from previously filled form files. Within a four-month period, a small team analyzed requirements, developed a prototype, conducted multiple system demonstrations, and deployed a working system supporting hundreds of users across the aerospace community. Open-source technologies and agile software development techniques, applied by a skilled team, enabled this impressive achievement. Topics in the paper cover the web application technologies, agile software development, an overview of the system's functions and features, dealing with increasing scope, and deploying new versions of the system.