Science.gov

Sample records for public dna databases

  1. Public participation in genetic databases: crossing the boundaries between biobanks and forensic DNA databases through the principle of solidarity

    PubMed Central

    Machado, Helena; Silva, Susana

    2015-01-01

    The ethical aspects of biobanks and forensic DNA databases are often treated as separate issues. As a reflection of this, public participation, or the involvement of citizens in genetic databases, has been approached differently in the fields of forensics and medicine. This paper aims to cross the boundaries between medicine and forensics by exploring the flows between the ethical issues presented in the two domains and the subsequent conceptualisation of public trust and legitimisation. We propose to introduce the concept of ‘solidarity’, traditionally applied only to medical and research biobanks, into a consideration of public engagement in medicine and forensics. Inclusion of a solidarity-based framework, in both medical biobanks and forensic DNA databases, raises new questions that should be included in the ethical debate, in relation to both health services/medical research and activities associated with the criminal justice system. PMID:26139851

  2. Public participation in genetic databases: crossing the boundaries between biobanks and forensic DNA databases through the principle of solidarity.

    PubMed

    Machado, Helena; Silva, Susana

    2015-10-01

    The ethical aspects of biobanks and forensic DNA databases are often treated as separate issues. As a reflection of this, public participation, or the involvement of citizens in genetic databases, has been approached differently in the fields of forensics and medicine. This paper aims to cross the boundaries between medicine and forensics by exploring the flows between the ethical issues presented in the two domains and the subsequent conceptualisation of public trust and legitimisation. We propose to introduce the concept of 'solidarity', traditionally applied only to medical and research biobanks, into a consideration of public engagement in medicine and forensics. Inclusion of a solidarity-based framework, in both medical biobanks and forensic DNA databases, raises new questions that should be included in the ethical debate, in relation to both health services/medical research and activities associated with the criminal justice system. PMID:26139851

  3. About DNA databasing and investigative genetic analysis of externally visible characteristics: A public survey.

    PubMed

    Zieger, Martin; Utz, Silvia

    2015-07-01

    During the last decade, DNA profiling and the use of DNA databases have become two of the most employed instruments of police investigations. This very rapid establishment of forensic genetics is yet far from being complete. In the last few years novel types of analyses have been presented to describe phenotypically a possible perpetrator. We conducted the present study among German speaking Swiss residents for two main reasons: firstly, we aimed at getting an impression of the public awareness and acceptance of the Swiss DNA database and the perception of a hypothetical DNA database containing all Swiss residents. Secondly, we wanted to get a broader picture of how people that are not working in the field of forensic genetics think about legal permission to establish phenotypic descriptions of alleged criminals by genetic means. Even though a significant number of study participants did not even know about the existence of the Swiss DNA database, its acceptance appears to be very high. Generally our results suggest that the current forensic use of DNA profiling is considered highly trustworthy. However, the acceptance of a hypothetical universal database would be only as low as about 30% among the 284 respondents to our study, mostly because people are concerned about the security of their genetic data, their privacy or a possible risk of abuse of such a database. Concerning the genetic analysis of externally visible characteristics and biogeographical ancestry, we discover a high degree of acceptance. The acceptance decreases slightly when precise characteristics are presented to the participants in detail. About half of the respondents would be in favor of the moderate use of physical traits analyses only for serious crimes threatening life, health or sexual integrity. 
    The possible risk of discrimination and reinforcement of racism, as discussed by scholars from anthropology, bioethics, law, philosophy and sociology, is mentioned less frequently by the study participants than we would have expected. A national DNA database and the widespread use of DNA analyses for police and justice have an impact on the entire society. Therefore the concerns of lay persons from the respective population should be heard and considered. The aims of this study were to draw a broader picture of the public opinion on DNA databasing and to contribute to the debate about the possible future use of genetics to reveal phenotypic characteristics. Our data might provide an additional perspective for experts involved in regulatory or legislative processes. PMID:26004189

  4. Spanish public awareness regarding DNA profile databases in forensic genetics: what type of DNA profiles should be included?

    PubMed Central

    Gamero, Joaquín J; Romero, Jose‐Luis; Peralta, Juan‐Luis; Carvalho, Mónica; Corte‐Real, Francisco

    2007-01-01

    The importance of non‐codifying DNA polymorphism for the administration of justice is now well known. In Spain, however, this type of test has given rise to questions in recent years: (a) Should consent be obtained before biological samples are taken from an individual for DNA analysis? (b) Does society perceive these techniques and methods of analysis as being reliable? (c) There appears to be lack of knowledge concerning the basic norms that regulate databases containing private or personal information and the protection that information of this type must be given. This opinion survey and the subsequent analysis of the results in ethical terms may serve to reveal the criteria and the degree of information that society has with regard to DNA databases. In the study, 73.20% (SE 1.12%) of the population surveyed was in favour of specific legislation for computer files in which DNA analysis results for forensic purposes are stored. PMID:17906059

  5. NCCDPHP PUBLICATION DATABASE

    EPA Science Inventory

    This database provides bibliographic citations and abstracts of publications produced by the CDC's National Center for Chronic Disease Prevention and Health Promotion (NCCDPHP) including journal articles, monographs, book chapters, reports, policy documents, and fact sheets. Full...

  6. Enhancing the DNA Patent Database

    SciTech Connect

    Walters, LeRoy B.

    2008-02-18

    Final Report on Award No. DE-FG0201ER63171. Principal Investigator: LeRoy B. Walters. February 18, 2008. This project successfully completed its goal of surveying and reporting on the DNA patenting and licensing policies at 30 major U.S. academic institutions. The report of survey results was published in the January 2006 issue of Nature Biotechnology under the title “The Licensing of DNA Patents by US Academic Institutions: An Empirical Survey.” Lori Pressman was the lead author on this feature article. A PDF reprint of the article will be submitted to our Program Officer under separate cover. The project team has continued to update the DNA Patent Database on a weekly basis since the conclusion of the project. The database can be accessed at dnapatents.georgetown.edu. This database provides a valuable research tool for academic researchers, policymakers, and citizens. A report entitled Reaping the Benefits of Genomic and Proteomic Research: Intellectual Property Rights, Innovation, and Public Health was published in 2006 by the Committee on Intellectual Property Rights in Genomic and Protein Research and Innovation, Board on Science, Technology, and Economic Policy at the National Academies. The report was edited by Stephen A. Merrill and Anne-Marie Mazza. This report employed and then adapted the methodology developed by our research project and quoted our findings at several points. (The full report can be viewed online at the following URL: http://www.nap.edu/openbook.php?record_id=11487&page=R1). My colleagues and I are grateful for the research support of the ELSI program at the U.S. Department of Energy.

  7. Compressing DNA sequence databases with coil

    PubMed Central

    White, W Timothy J; Hendy, Michael D

    2008-01-01

    Background: Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results: We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression – the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion: coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work. PMID:18489794
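
The gzip baseline mentioned in this abstract can be illustrated with a minimal Python sketch. It is not coil and does not implement edit-tree coding; it only measures the gzip ratio on a synthetic sequence and compares it with naive 2-bit packing. The helper names (`gzip_ratio`, `pack_2bit`, `unpack_2bit`, `random_dna`) and the synthetic data are assumptions made for illustration.

```python
import gzip
import random

BASES = "ACGT"
CODE = {base: i for i, base in enumerate(BASES)}  # A=0, C=1, G=2, T=3


def gzip_ratio(text: str) -> float:
    """Return uncompressed size divided by gzip-compressed size."""
    raw = text.encode("ascii")
    return len(raw) / len(gzip.compress(raw))


def pack_2bit(seq: str) -> bytes:
    """Pack an A/C/G/T string into 2 bits per base, 4 bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


def unpack_2bit(data: bytes, length: int) -> str:
    """Invert pack_2bit; `length` is the original number of bases."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])


def random_dna(n: int, seed: int = 0) -> str:
    """Synthetic stand-in for real sequence data (hypothetical input)."""
    rng = random.Random(seed)
    return "".join(rng.choice(BASES) for _ in range(n))


if __name__ == "__main__":
    seq = random_dna(100_000)
    print("gzip ratio:", round(gzip_ratio(seq), 2))
    print("2-bit ratio:", len(seq) / len(pack_2bit(seq)))
```

On a four-letter alphabet, plain 2-bit packing already gives a ratio of 4, which is one way to see why general-purpose compression of flat files leaves room for sequence-aware methods.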

  8. AIDS PUBLIC INFORMATION DATABASE

    EPA Science Inventory

    The AIDS Public Information Data Set is computer software designed to run on a Microsoft Windows microcomputer, and contains information abstracted from acquired immunodeficiency syndrome (AIDS) cases reported in the United States. The data set is created by the Division of HIV/A...

  9. Database Support for Research in Public Administration

    ERIC Educational Resources Information Center

    Tucker, James Cory

    2005-01-01

    This study examines the extent to which databases support student and faculty research in the area of public administration. A list of journals in public administration, public policy, political science, public budgeting and finance, and other related areas was compared to the journal content list of six business databases. These databases…

  10. Short Tandem Repeat DNA Internet Database

    National Institute of Standards and Technology Data Gateway

    SRD 130 Short Tandem Repeat DNA Internet Database (Web, free access)   Short Tandem Repeat DNA Internet Database is intended to benefit research and application of short tandem repeat DNA markers for human identity testing. Facts and sequence information on each STR system, population data, commonly used multiplex STR systems, PCR primers and conditions, and a review of various technologies for analysis of STR alleles have been included.

  11. Forensic DNA Profiling and Database

    PubMed Central

    Panneerchelvam, S.; Norazmi, M.N.

    2003-01-01

    The incredible power of DNA technology as an identification tool has brought a tremendous change in criminal justice. The DNA database is an information resource for the forensic DNA typing community, with details on commonly used short tandem repeat (STR) DNA markers. This article discusses the essential steps in compiling the COmbined DNA Index System (CODIS) from validated polymerase chain reaction (PCR) amplified STRs and their use in crime detection. PMID:23386793

  12. Database Support for Research in Public Administration

    ERIC Educational Resources Information Center

    Tucker, James Cory

    2005-01-01

    This study examines the extent to which databases support student and faculty research in the area of public administration. A list of journals in public administration, public policy, political science, public budgeting and finance, and other related areas was compared to the journal content list of six business databases. These databases…

  13. REPAIRtoire—a database of DNA repair pathways

    PubMed Central

    Milanowska, Kaja; Krwawicz, Joanna; Papaj, Grzegorz; Kosiński, Jan; Poleszak, Katarzyna; Lesiak, Justyna; Osińska, Ewelina; Rother, Kristian; Bujnicki, Janusz M.

    2011-01-01

    REPAIRtoire is the first comprehensive database resource for systems biology of DNA damage and repair. The database collects and organizes the following types of information: (i) DNA damage linked to environmental mutagenic and cytotoxic agents, (ii) pathways comprising individual processes and enzymatic reactions involved in the removal of damage, (iii) proteins participating in DNA repair and (iv) diseases correlated with mutations in genes encoding DNA repair proteins. REPAIRtoire provides also links to publications and external databases. REPAIRtoire contains information about eight main DNA damage checkpoint, repair and tolerance pathways: DNA damage signaling, direct reversal repair, base excision repair, nucleotide excision repair, mismatch repair, homologous recombination repair, nonhomologous end-joining and translesion synthesis. The pathway/protein dataset is currently limited to three model organisms: Escherichia coli, Saccharomyces cerevisiae and Homo sapiens. The DNA repair and tolerance pathways are represented as graphs and in tabular form with descriptions of each repair step and corresponding proteins, and individual entries are cross-referenced to supporting literature and primary databases. REPAIRtoire can be queried by the name of pathway, protein, enzymatic complex, damage and disease. In addition, a tool for drawing custom DNA–protein complexes is available online. REPAIRtoire is freely available and can be accessed at http://repairtoire.genesilico.pl/. PMID:21051355

  14. Plant rDNA database: update and new features

    PubMed Central

    Garcia, Sònia; Gálvez, Francisco; Gras, Airy; Kovařík, Aleš; Garnatje, Teresa

    2014-01-01

    The Plant rDNA database (www.plantrdnadatabase.com) is an open access online resource providing detailed information on numbers, structures and positions of 5S and 18S-5.8S-26S (35S) ribosomal DNA loci. The data have been obtained from >600 publications on plant molecular cytogenetics, mostly based on fluorescent in situ hybridization (FISH). This edition of the database contains information on 1609 species derived from 2839 records, which means an expansion of 55.76 and 94.45%, respectively. It holds the data for angiosperms, gymnosperms, bryophytes and pteridophytes available as of June 2013. Information from publications reporting data for a single rDNA (either 5S or 35S alone) and annotation regarding transcriptional activity of 35S loci now appears in the database. Preliminary analyses suggest greater variability in the number of rDNA loci in gymnosperms than in angiosperms. New applications provide ideograms of the species showing the positions of rDNA loci as well as a visual representation of their genome sizes. We have also introduced other features to boost the usability of the Web interface, such as an application for convenient data export and a new section with rDNA–FISH-related information (mostly detailing protocols and reagents). In addition, we upgraded and/or proofread tabs and links and modified the website for a more dynamic appearance. This manuscript provides a synopsis of these changes and developments. Database URL: http://www.plantrdnadatabase.com PMID:24980131
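
The expansion percentages quoted in this abstract can be checked with a short arithmetic sketch. The previous-edition counts are not stated in the abstract; the values below are back-calculated from the reported figures and should be read as an assumption, not as data from the source.

```python
def implied_previous(current: int, pct_increase: float) -> int:
    """Back-calculate the earlier count from the current count and the
    reported percentage expansion (rounded to the nearest integer)."""
    return round(current / (1 + pct_increase / 100))


def pct_increase(previous: int, current: int) -> float:
    """Percentage growth from `previous` to `current`."""
    return (current - previous) / previous * 100


if __name__ == "__main__":
    # Reported in the abstract: 1609 species (+55.76%), 2839 records (+94.45%).
    prev_species = implied_previous(1609, 55.76)   # inferred, not from the source
    prev_records = implied_previous(2839, 94.45)   # inferred, not from the source
    print(prev_species, round(pct_increase(prev_species, 1609), 2))
    print(prev_records, round(pct_increase(prev_records, 2839), 2))
```

Under this reading, the two percentages are consistent with integer counts for the earlier edition.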

  15. MitoBreak: the mitochondrial DNA breakpoints database

    PubMed Central

    Damas, Joana; Carneiro, João; Amorim, António; Pereira, Filipe

    2014-01-01

    Mitochondrial DNA (mtDNA) rearrangements are key events in the development of many diseases. Investigations of mtDNA regions affected by rearrangements (i.e. breakpoints) can lead to important discoveries about rearrangement mechanisms and can offer important clues about the causes of mitochondrial diseases. Here, we present the mitochondrial DNA breakpoints database (MitoBreak; http://mitobreak.portugene.com), a free, web-accessible comprehensive list of breakpoints from three classes of somatic mtDNA rearrangements: circular deleted (deletions), circular partially duplicated (duplications) and linear mtDNAs. Currently, MitoBreak contains >1400 mtDNA rearrangements from seven species (Homo sapiens, Mus musculus, Rattus norvegicus, Macaca mulatta, Drosophila melanogaster, Caenorhabditis elegans and Podospora anserina) and their associated phenotypic information collected from nearly 400 publications. The database allows researchers to perform multiple types of data analyses through user-friendly interfaces with full or partial datasets. It also permits the download of curated data and the submission of new mtDNA rearrangements. For each reported case, MitoBreak also documents the precise breakpoint positions, junction sequences, disease or associated symptoms and links to the related publications, providing a useful resource to study the causes and consequences of mtDNA structural alterations. PMID:24170808

  16. Should arrestee DNA databases extend to misdemeanors?

    PubMed

    Joh, Elizabeth E

    2014-01-01

    In the United States, those groups of persons eligible for compulsory DNA sampling by law enforcement authorities continue to expand. The collection of DNA samples from felony arrestees will likely be adopted by many more states after the U.S. Supreme Court's 2013 decision in Maryland v. King, which upheld a state law permitting the compulsory and warrantless DNA sampling from those arrested of serious offenses. At the time of the decision, 28 states and the federal government already had arrestee DNA collection statutes in place. Nevada became the 29th state to collect DNA from arrestees in May 2013, and several others have bills under consideration. Should states collect DNA from misdemeanor arrestees as well? This article considers this as yet largely unrealized but nevertheless important potential expansion of arrestee DNA databases. The collection of DNA samples from those arrested of relatively minor offenses would increase the number of samples, and perhaps consequently the number of "hits." On balance, however, such an expansion of current DNA laws raises enough serious concerns-chiefly about police discretion, inequitable enforcement, and cost-that legislators should refrain from changing arrestee DNA laws in this way. PMID:25669828

  17. On parallel search of DNA sequence databases

    SciTech Connect

    Guan, Xiaogun; Mann, R.; Mural, R.; Uberbacher, E.

    1991-01-01

    This paper describes the development of large-scale parallel search methods for DNA databases using a dynamic programming algorithm on an Intel iPSC/860 parallel computer. The performance of these methods has been measured and several strategies for improving performance are discussed. 6 refs., 2 figs., 2 tabs.
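
As a rough illustration of dynamic-programming database search, the sketch below scores a query against each database sequence with a Smith-Waterman-style local alignment. The abstract does not say which dynamic programming algorithm or scoring scheme the authors used, so the algorithm choice, the scoring parameters and the function names `sw_score` and `search` are all assumptions.

```python
from typing import Callable, Dict, List, Tuple


def sw_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score between a and b (linear gap penalty).

    Only two rows of the DP table are kept, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def search(db: Dict[str, str], query: str, mapper: Callable = map) -> List[Tuple[str, int]]:
    """Score every database sequence against the query, best hit first.

    Each sequence is scored independently, so `mapper` can be swapped for
    a pool's map (for example ThreadPoolExecutor().map) to spread the work.
    """
    names = list(db)
    scores = mapper(lambda name: sw_score(db[name], query), names)
    return sorted(zip(names, scores), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    toy_db = {"s1": "TTTACGTTT", "s2": "GGGGGG"}  # hypothetical sequences
    print(search(toy_db, "ACGT"))
```

Because the per-sequence scores do not depend on each other, the database can be partitioned across workers with no communication until the final ranking. A process pool would need a picklable, module-level scoring function in place of the lambda.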

  18. Ethical-legal problems of DNA databases in criminal investigation.

    PubMed

    Guillén, M; Lareu, M V; Pestoni, C; Salas, A; Carracedo, A

    2000-08-01

    Advances in DNA technology and the discovery of DNA polymorphisms have permitted the creation of DNA databases of individuals for the purpose of criminal investigation. Many ethical and legal problems arise in the preparation of a DNA database, and these problems are especially important when one analyses the legal regulations on the subject. In this paper three main groups of possibilities, three systems, are analysed in relation to databases. The first system is based on a general analysis of the population; the second one is based on the taking of samples for a particular list of crimes, and a third is based only on the specific analysis of each case. The advantages and disadvantages of each system are compared and controversial issues are then examined. We found the second system to be the best choice for Spain and other European countries with a similar tradition when we weighed the rights of an individual against the public's interest in the prosecution of a crime. PMID:10951922

  19. Analysis of commercial and public bioactivity databases.

    PubMed

    Tiikkainen, Pekka; Franke, Lutz

    2012-02-27

    Activity data for small molecules are invaluable in chemoinformatics. Various bioactivity databases exist containing detailed information of target proteins and quantitative binding data for small molecules extracted from journals and patents. In the current work, we have merged several public and commercial bioactivity databases into one bioactivity metabase. The molecular presentation, target information, and activity data of the vendor databases were standardized. The main motivation of the work was to create a single relational database which allows fast and simple data retrieval by in-house scientists. Second, we wanted to know the amount of overlap between databases by commercial and public vendors to see whether the former contain data complementing the latter. Third, we quantified the degree of inconsistency between data sources by comparing data points derived from the same scientific article cited by more than one vendor. We found that each data source contains unique data which is due to different scientific articles cited by the vendors. When comparing data derived from the same article we found that inconsistencies between the vendors are common. In conclusion, using databases of different vendors is still useful since the data overlap is not complete. It should be noted that this can be partially explained by the inconsistencies and errors in the source data. PMID:22145975
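
The overlap and inconsistency comparison described here might look roughly like the following sketch. The record key (article, compound, target), the activity values and the function name `compare_sources` are hypothetical; the abstract does not describe the actual schema or matching rules.

```python
from typing import Dict, Set, Tuple

# Hypothetical record key: (article id, compound id, target id).
Key = Tuple[str, str, str]


def compare_sources(a: Dict[Key, float], b: Dict[Key, float],
                    tol: float = 0.0) -> Dict[str, Set[Key]]:
    """Split records into unique, shared and inconsistent sets.

    A shared record is flagged as inconsistent when the two sources report
    activity values that differ by more than `tol`.
    """
    shared = set(a) & set(b)
    return {
        "only_a": set(a) - shared,
        "only_b": set(b) - shared,
        "shared": shared,
        "inconsistent": {k for k in shared if abs(a[k] - b[k]) > tol},
    }


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    vendor_a = {("art1", "cpd1", "tgtX"): 7.2, ("art1", "cpd2", "tgtX"): 6.5}
    vendor_b = {("art1", "cpd1", "tgtX"): 7.2, ("art1", "cpd2", "tgtX"): 5.6,
                ("art2", "cpd3", "tgtY"): 8.0}
    for name, keys in compare_sources(vendor_a, vendor_b).items():
        print(name, sorted(keys))
```

Keying on the source article is what allows the same data point to be compared across vendors.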

  20. The Dfam database of repetitive DNA families.

    PubMed

    Hubley, Robert; Finn, Robert D; Clements, Jody; Eddy, Sean R; Jones, Thomas A; Bao, Weidong; Smit, Arian F A; Wheeler, Travis J

    2016-01-01

    Repetitive DNA, especially that due to transposable elements (TEs), makes up a large fraction of many genomes. Dfam is an open access database of families of repetitive DNA elements, in which each family is represented by a multiple sequence alignment and a profile hidden Markov model (HMM). The initial release of Dfam, featured in the 2013 NAR Database Issue, contained 1143 families of repetitive elements found in humans, and was used to produce more than 100 Mb of additional annotation of TE-derived regions in the human genome, with improved speed. Here, we describe recent advances, most notably expansion to 4150 total families including a comprehensive set of known repeat families from four new organisms (mouse, zebrafish, fly and nematode). We describe improvements to coverage, and to our methods for identifying and reducing false annotation. We also describe updates to the website interface. The Dfam website has moved to http://dfam.org. Seed alignments, profile HMMs, hit lists and other underlying data are available for download. PMID:26612867

  1. The Dfam database of repetitive DNA families

    PubMed Central

    Hubley, Robert; Finn, Robert D.; Clements, Jody; Eddy, Sean R.; Jones, Thomas A.; Bao, Weidong; Smit, Arian F.A.; Wheeler, Travis J.

    2016-01-01

    Repetitive DNA, especially that due to transposable elements (TEs), makes up a large fraction of many genomes. Dfam is an open access database of families of repetitive DNA elements, in which each family is represented by a multiple sequence alignment and a profile hidden Markov model (HMM). The initial release of Dfam, featured in the 2013 NAR Database Issue, contained 1143 families of repetitive elements found in humans, and was used to produce more than 100 Mb of additional annotation of TE-derived regions in the human genome, with improved speed. Here, we describe recent advances, most notably expansion to 4150 total families including a comprehensive set of known repeat families from four new organisms (mouse, zebrafish, fly and nematode). We describe improvements to coverage, and to our methods for identifying and reducing false annotation. We also describe updates to the website interface. The Dfam website has moved to http://dfam.org. Seed alignments, profile HMMs, hit lists and other underlying data are available for download. PMID:26612867

  2. OriDB: a DNA replication origin database.

    PubMed

    Nieduszynski, Conrad A; Hiraga, Shin-ichiro; Ak, Prashanth; Benham, Craig J; Donaldson, Anne D

    2007-01-01

    Replication of eukaryotic chromosomes initiates at multiple sites called replication origins. Replication origins are best understood in the budding yeast Saccharomyces cerevisiae, where several complementary studies have mapped their locations genome-wide. We have collated these datasets, taking account of the resolution of each study, to generate a single list of distinct origin sites. OriDB provides a web-based catalogue of these confirmed and predicted S.cerevisiae DNA replication origin sites. Each proposed or confirmed origin site appears as a record in OriDB, with each record comprising seven pages. These pages provide, in text and graphical formats, the following information: genomic location and chromosome context of the origin site; time of origin replication; DNA sequence of proposed or experimentally confirmed origin elements; free energy required to open the DNA duplex (stress-induced DNA duplex destabilization or SIDD); and phylogenetic conservation of sequence elements. In addition, OriDB encourages community submission of additional information for each origin site through a User Notes facility. Origin sites are linked to several external resources, including the Saccharomyces Genome Database (SGD) and relevant publications at PubMed. Finally, a Chromosome Viewer utility allows users to interactively generate graphical representations of DNA replication data genome-wide. OriDB is available at www.oridb.org. PMID:17065467

  3. Data publication: towards a database of everything

    PubMed Central

    Smith, Vincent S

    2009-01-01

    The fabric of science is changing, driven by a revolution in digital technologies that facilitate the acquisition and communication of massive amounts of data. This is changing the nature of collaboration and expanding opportunities to participate in science. If digital technologies are the engine of this revolution, digital data are its fuel. But for many scientific disciplines, this fuel is in short supply. The publication of primary data is not a universal or mandatory part of science, and despite policies and proclamations to the contrary, calls to make data publicly available have largely gone unheeded. In this short essay I consider why, and explore some of the challenges that lie ahead, as we work toward a database of everything. PMID:19552813

  4. 24 CFR 81.72 - Public-use database and public information.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 24 Housing and Urban Development 1 2014-04-01 2014-04-01 false Public-use database and public... Public-use database and public information. (a) General. Except as provided in paragraph (c) of this section, the Secretary shall establish and make available for public use, a public-use database...

  5. 24 CFR 81.72 - Public-use database and public information.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 24 Housing and Urban Development 1 2013-04-01 2013-04-01 false Public-use database and public... Public-use database and public information. (a) General. Except as provided in paragraph (c) of this section, the Secretary shall establish and make available for public use, a public-use database...

  6. 24 CFR 81.72 - Public-use database and public information.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 24 Housing and Urban Development 1 2012-04-01 2012-04-01 false Public-use database and public... Public-use database and public information. (a) General. Except as provided in paragraph (c) of this section, the Secretary shall establish and make available for public use, a public-use database...

  7. 24 CFR 81.72 - Public-use database and public information.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 24 Housing and Urban Development 1 2010-04-01 2010-04-01 false Public-use database and public... Public-use database and public information. (a) General. Except as provided in paragraph (c) of this section, the Secretary shall establish and make available for public use, a public-use database...

  8. 24 CFR 81.72 - Public-use database and public information.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 24 Housing and Urban Development 1 2011-04-01 2011-04-01 false Public-use database and public... Public-use database and public information. (a) General. Except as provided in paragraph (c) of this section, the Secretary shall establish and make available for public use, a public-use database...

  9. A DNA recombinant database management system.

    PubMed Central

    Tolstoshev, C M; Jeltsch, J M; Fritz, R; Oudet, P

    1983-01-01

    A set of computer programs is described which constitutes a clone database management system. Maintenance of the database and the stocks of material is designed to be under the control of one person or group of people, who may insert, delete or modify data entries, and who may interrogate the database as to which stocks are in need of checking. The system is organised in such a way that information is freely and speedily available to all users. Database entries may be accessed by name or key word. PMID:6306595

  10. Information Access through Electronic Databases for Rural Public Libraries.

    ERIC Educational Resources Information Center

    Canepi, Kitti

    1997-01-01

    To compile a list of recommended electronic databases for rural libraries, public library patron questions received by the Arizona State Reference Center were searched on ten databases. The results indicated Books in Print, Magazine Database, ABI/INFORM, Public Affairs Information System (PAIS), and Government Printing Office (GPO) Publications…

  11. The Availability of Faculty Publication Databases from Library Web Pages

    ERIC Educational Resources Information Center

    Blummer, Barbara A.

    2007-01-01

    Faculty publication databases or author bibliographies offer libraries an opportunity to provide services to users. Initially, these databases remained initiatives of special libraries in the health-sciences fields. Librarians used the publication information derived from these databases to compile lists for annual reports. However, the advent of…

  12. Exploration of the Chemical Space of Public Genomic Databases

    EPA Science Inventory

    The current project aims to chemically index the content of public genomic databases to make these data accessible in relation to other publicly available, chemically-indexed toxicological information.

  13. Towards computational improvement of DNA database indexing and short DNA query searching

    PubMed Central

    Stojanov, Done; Koceski, Sašo; Mileva, Aleksandra; Koceska, Nataša; Bande, Cveta Martinovska

    2014-01-01

    In order to facilitate and speed up the search of massive DNA databases, the database is indexed at the beginning, employing a mapping function. By searching through the indexed data structure, exact query hits can be identified. If the database is searched against an annotated DNA query, such as a known promoter consensus sequence, then the starting locations and the number of potential genes can be determined. This is particularly relevant if unannotated DNA sequences have to be functionally annotated. However, indexing a massive DNA database and searching an indexed data structure with millions of entries is a time-demanding process. In this paper, we propose a fast DNA database indexing and searching approach, identifying all query hits in the database, without having to examine all entries in the indexed data structure, limiting the maximum length of a query that can be searched against the database. By applying the proposed indexing equation, the whole human genome could be indexed in 10 hours on a personal computer, under the assumption that there is enough RAM to store the indexed data structure. Analysing the methodology proposed by Reneker, we observed that hits at starting positions are not reported, if the database is searched against a query shorter than nucleotides, such that is the length of the DNA database words being mapped and is the length of the query. A solution of this drawback is also presented. PMID:26019584
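
    The idea of indexing a DNA database with a mapping function and then looking up exact query hits can be illustrated with a generic k-mer index. The following is only a minimal sketch: the base-4 encoding, the function names and the toy sequence are assumptions made for illustration, not the indexing equation proposed in the paper.

```python
from collections import defaultdict

# Assumed mapping: each base is one digit of a base-4 number.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(word):
    """Map a DNA word to an integer by reading it as a base-4 number."""
    value = 0
    for base in word:
        value = value * 4 + BASE_CODE[base]
    return value


def build_index(database, k):
    """Record the start positions of every overlapping k-length word."""
    index = defaultdict(list)
    for start in range(len(database) - k + 1):
        index[encode(database[start:start + k])].append(start)
    return index


def find_exact(index, database, query, k):
    """Return start positions of exact query hits; the query must have at least k bases."""
    if len(query) < k:
        raise ValueError("query must be at least k bases long")
    # Look up the first k-length word, then verify the rest of the query.
    candidates = index.get(encode(query[:k]), [])
    return [s for s in candidates if database[s:s + len(query)] == query]


database = "ACGTACGTTACG"  # toy sequence, not real data
index = build_index(database, 3)
hits = find_exact(index, database, "ACGT", 3)
print(hits)  # [0, 4]
```

    Only the positions stored under the first word of the query are examined, so most index entries are never touched; the memory needed for the index grows with the database length.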

  14. MethHC: a database of DNA methylation and gene expression in human cancer

    PubMed Central

    Huang, Wei-Yun; Hsu, Sheng-Da; Huang, Hsi-Yuan; Sun, Yi-Ming; Chou, Chih-Hung; Weng, Shun-Long; Huang, Hsien-Da

    2015-01-01

    We present MethHC (http://MethHC.mbc.nctu.edu.tw), a database comprising a systematic integration of a large collection of DNA methylation data and mRNA/microRNA expression profiles in human cancer. DNA methylation is an important epigenetic regulator of gene transcription, and genes with high levels of DNA methylation in their promoter regions are transcriptionally silent. Increasing numbers of DNA methylation and mRNA/microRNA expression profiles are being published in different public repositories. These data can help researchers to identify epigenetic patterns that are important for carcinogenesis. MethHC integrates data such as DNA methylation, mRNA expression, DNA methylation of microRNA gene and microRNA expression to identify correlations between DNA methylation and mRNA/microRNA expression from TCGA (The Cancer Genome Atlas), which includes 18 human cancers in more than 6000 samples, 6548 microarrays and 12 567 RNA sequencing data. PMID:25398901

  15. Comparisons of familial DNA database searching strategies.

    PubMed

    Ge, Jianye; Chakraborty, Ranajit; Eisenberg, Arthur; Budowle, Bruce

    2011-11-01

    The current familial searching strategies are generally based on either Identity-By-State (IBS) (i.e., number of shared alleles) or likelihood ratio (i.e., kinship index [KI]) assessments. In this study, the expected IBS match probabilities given relationships and the logic of the likelihood ratio method were addressed. Further, the false-positive and false-negative rates of the strategies were compared analytically or by simulations using Caucasian population data of the 13 CODIS Short Tandem Repeat (STR) loci. IBS ≥ 15, IBS ≥ 16, KI ≥ 1000, or KI ≥ 10,000 were found to be good thresholds for balancing false-positive and false-negative rates. IBS ≥ 17 and/or KI ≥ 1,000,000 can exclude the majority of candidate profiles in the database, either related or not, and may be an initial screening option if a small candidate list is desired. Policies combining both IBS and KI can provide higher accuracy. Typing additional STRs can provide better searching performance, and lineage markers can be extremely useful for reducing false rates. PMID:21827463
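
    The IBS score described above is a count of shared alleles summed over loci. A minimal sketch of that count follows; the three locus names are real CODIS loci, but the allele values, profile layout and function names are invented for illustration, and the kinship index (which needs allele frequencies) is not covered.

```python
from collections import Counter


def ibs_at_locus(genotype_a, genotype_b):
    """Number of alleles (0, 1 or 2) shared at one locus, counted as a multiset."""
    shared = Counter(genotype_a) & Counter(genotype_b)
    return sum(shared.values())


def ibs_score(profile_a, profile_b):
    """Sum the shared-allele counts over the loci present in both profiles."""
    return sum(
        ibs_at_locus(profile_a[locus], profile_b[locus])
        for locus in profile_a
        if locus in profile_b
    )


# Illustrative genotypes only; a full CODIS profile would have 13 loci,
# so the maximum score would be 26.
query = {"D3S1358": (15, 16), "vWA": (17, 17), "FGA": (21, 24)}
candidate = {"D3S1358": (15, 18), "vWA": (17, 19), "FGA": (24, 21)}

score = ibs_score(query, candidate)
print(score)  # 4
```

    A screening rule such as IBS ≥ 15 would then be a simple comparison of this score against the chosen threshold.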

  16. 76 FR 1137 - Publicly Available Consumer Product Safety Information Database: Notice of Public Web Conferences

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-01-07

    ... site. In the Federal Register of December 9, 2010 (75 FR 76832), we published a final rule to establish... search function of the publicly available consumer product safety information database (``Database... public will use to file a report of harm and the search function of the Database. The Web conference...

  17. The Israel DNA database--the establishment of a rapid, semi-automated analysis system.

    PubMed

    Zamir, Ashira; Dell'Ariccia-Carmon, Aviva; Zaken, Neomi; Oz, Carla

    2012-03-01

    The Israel Police DNA database, also known as IPDIS (Israel Police DNA Index System), has been operating since February 2007. During that time more than 135,000 reference samples have been uploaded and more than 2000 hits reported. We have developed an effective semi-automated system that includes two automated punchers, three liquid handler robots and four genetic analyzers. An in-house LIMS program enables full tracking of every sample through the entire process of registration, pre-PCR handling, analysis of profiles, uploading to the database, hit reports and ultimately storage. The LIMS is also responsible for the future tracking of samples and their profiles to be expunged from the database according to the Israeli DNA legislation. The database is administered by an in-house developed software program, where reference and evidentiary profiles are uploaded, stored, searched and matched. The DNA database has proven to be an effective investigative tool which has gained the confidence of the Israeli public and on which the Israel National Police force has grown to rely. PMID:21727053

  18. The Human Variome Project: ensuring the quality of DNA variant databases in inherited renal disease.

    PubMed

    Savige, Judy; Dalgleish, Raymond; Cotton, Richard Gh; den Dunnen, Johan T; Macrae, Finlay; Povey, Sue

    2015-11-01

    A recent review identified 60 common inherited renal diseases caused by DNA variants in 132 different genes. These diseases can be diagnosed with DNA sequencing, but each gene probably also has a thousand normal variants. Many more normal variants have been characterised by individual laboratories than are reported in the literature or found in publicly accessible collections. At present, testing laboratories must assess each novel change they identify for pathogenicity, even when this has been done elsewhere previously, and the distinction between normal and disease-associated variants is particularly an issue with the recent surge in exomic sequencing and gene discovery projects. The Human Variome Project recommends the establishment of gene-specific DNA variant databases to facilitate the sharing of DNA variants and decisions about likely disease causation. Databases improve diagnostic accuracy and testing efficiency, and reduce costs. They also help with genotype-phenotype correlations and predictive algorithms. The Human Variome Project advocates databases that use standardised descriptions, are up-to-date, include clinical information and are freely available. Currently, the genes affected in the most common inherited renal diseases correspond to 350 different variant databases, many of which are incomplete or have insufficient clinical details for genotype-phenotype correlations. Assistance is needed from nephrologists to maximise the usefulness of these databases for the diagnosis and management of inherited renal disease. PMID:25384529

  19. Building a Microforms Database at the New York Public Library.

    ERIC Educational Resources Information Center

    Thomas, Wendy

    1990-01-01

    Discusses some problems in accessing the microform collection of the Research Libraries of the New York Public Library using the library's printed and online catalogs. Creation of an in-house automated microforms database is described, including software, selection and downloading of records, updating, and searching the database. (MES)

  20. 75 FR 29155 - Publicly Available Consumer Product Safety Information Database

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-05-24

    ...The Consumer Product Safety Commission (``Commission,'' ``CPSC,'' or ``we'') is issuing a notice of proposed rulemaking that would establish a publicly available consumer product safety information database (``database''). Section 212 of the Consumer Product Safety Improvement Act of 2008 (``CPSIA'') amended the Consumer Product Safety Act (``CPSA'') to require the Commission to establish and......

  1. DNAVaxDB: the first web-based DNA vaccine database and its data analysis.

    PubMed

    Racz, Rebecca; Li, Xinna; Patel, Mukti; Xiang, Zuoshuang; He, Yongqun

    2014-01-01

    Since the first DNA vaccine studies were done in the 1990s, thousands more studies have followed. Here we report the development and analysis of DNAVaxDB (http://www.violinet.org/dnavaxdb), the first publicly available web-based DNA vaccine database that curates, stores, and analyzes experimentally verified DNA vaccines, DNA vaccine plasmid vectors, and protective antigens used in DNA vaccines. All data in DNAVaxDB are annotated from reliable resources, particularly peer-reviewed articles. Among over 140 DNA vaccine plasmids, some plasmids were more frequently used in one type of pathogen than others; for example, pCMVi-UB for Gram-negative bacterial DNA vaccines, and pCAGGS for viral DNA vaccines. Presently, over 400 DNA vaccines containing over 370 protective antigens from over 90 infectious and non-infectious diseases have been curated in DNAVaxDB. While extracellular and bacterial cell surface proteins and adhesin proteins were frequently used for DNA vaccine development, the majority of protective antigens used in Chlamydophila DNA vaccines are localized to the inner portion of the cell. The regimen of DNA vaccine priming followed by boosting with another vaccine has been widely used to induce protection against infection by different pathogens such as HIV. Parasitic and cancer DNA vaccines were also systematically analyzed. User-friendly web query and visualization interfaces are available in DNAVaxDB for interactive data search. To support data exchange, the information of DNA vaccines, plasmids, and protective antigens is stored in the Vaccine Ontology (VO). DNAVaxDB is targeted to become a timely and vital source of DNA vaccines and related data and facilitate advanced DNA vaccine research and development. PMID:25104313

  2. DNAVaxDB: the first web-based DNA vaccine database and its data analysis

    PubMed Central

    2014-01-01

    Since the first DNA vaccine studies were done in the 1990s, thousands more studies have followed. Here we report the development and analysis of DNAVaxDB (http://www.violinet.org/dnavaxdb), the first publicly available web-based DNA vaccine database that curates, stores, and analyzes experimentally verified DNA vaccines, DNA vaccine plasmid vectors, and protective antigens used in DNA vaccines. All data in DNAVaxDB are annotated from reliable resources, particularly peer-reviewed articles. Among over 140 DNA vaccine plasmids, some plasmids were more frequently used in one type of pathogen than others; for example, pCMVi-UB for Gram-negative bacterial DNA vaccines, and pCAGGS for viral DNA vaccines. Presently, over 400 DNA vaccines containing over 370 protective antigens from over 90 infectious and non-infectious diseases have been curated in DNAVaxDB. While extracellular and bacterial cell surface proteins and adhesin proteins were frequently used for DNA vaccine development, the majority of protective antigens used in Chlamydophila DNA vaccines are localized to the inner portion of the cell. The regimen of DNA vaccine priming followed by boosting with another vaccine has been widely used to induce protection against infection by different pathogens such as HIV. Parasitic and cancer DNA vaccines were also systematically analyzed. User-friendly web query and visualization interfaces are available in DNAVaxDB for interactive data search. To support data exchange, the information of DNA vaccines, plasmids, and protective antigens is stored in the Vaccine Ontology (VO). DNAVaxDB is targeted to become a timely and vital source of DNA vaccines and related data and facilitate advanced DNA vaccine research and development. PMID:25104313

  3. [Privacy and public benefit in using large scale health databases].

    PubMed

    Yamamoto, Ryuichi

    2014-01-01

    In Japan, large-scale health databases such as the National Claim insurance and health checkup database (NDB) and the Japanese Sentinel project have been constructed within a few years. However, there are legal issues in striking an adequate balance between privacy and public benefit when using such databases. The NDB is operated under the act for elderly persons' health care, but this act says nothing about using the database for general public benefit. Researchers who use this database are therefore forced to pay so much attention to anonymization and information security that the research work itself may be disturbed. The Japanese Sentinel project is a national project to detect adverse drug reactions using large-scale distributed clinical databases of large hospitals. Although patients give consent in advance for such general purposes for the public good, the use of insufficiently anonymized data is still under discussion. Generally speaking, researchers conducting studies for public benefit will not infringe patients' privacy, but vague and complex requirements of personal data protection legislation may disturb the research. Medical science does not progress without using clinical information; therefore, adequate legislation that is simple and clear for both researchers and patients is strongly required. In Japan, a specific act for balancing privacy and public benefit is now under discussion. The author recommends that researchers, including those in the field of pharmacology, pay attention to, participate in the discussion of, and make suggestions for such acts or regulations. PMID:24790041

  4. Navigating from Publications to Astronomical Databases

    NASA Astrophysics Data System (ADS)

    Ochsenbein, François; Bertout, Claude; Lequeux, James; Genova, Françoise

    The implementation of journals on the Web has opened new possibilities for the scientific usage of published results because it is now possible to link published articles to other types of information. The availability of published information in electronic form also allows for new types of content validation, complementary to the referee's validation, and to the layout performed by the publisher. For several years now, authors publishing in A&A have been offered the possibility of quoting the astronomical objects they are studying directly in their LaTeX manuscript (via the object macro). Since April 2001, this macro has been translated into an actual link from the article to the SIMBAD database. This experiment is still a prototype, and its various aspects are presented here.

  5. Enhancing thermal video using a public database of images

    NASA Astrophysics Data System (ADS)

    Qadir, Hemin; Kozaitis, S. P.; Ali, Ehsan

    2014-05-01

    We presented a system to display nighttime imagery with natural colors using a public database of images. We initially combined two spectral bands of images, thermal and visible, to enhance night vision imagery; however, the fused image gave an unnatural color appearance. Therefore, a color transfer based on a look-up table (LUT) was used to replace the false color appearance with a colormap derived from a daytime reference image obtained from a public database using the GPS coordinates of the vehicle. Because of the computational demand in deriving the colormap from the reference image, we created an additional local database of colormaps. Reference images from the public database were compared to a compact local database to retrieve one of a limited number of colormaps that represented several driving environments. Each colormap in the local database was stored with an image from which it was derived. To retrieve a colormap, we compared the histogram of the fused image with histograms of images in the local database. The colormap of the best match was then used for the fused image. Continuously selecting and applying colormaps using this approach offered a convenient way to color night vision imagery.
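
    The retrieval step, comparing the histogram of the fused image with histograms stored in the local database, might look roughly like the sketch below. The abstract does not say which similarity measure was used; histogram intersection is assumed here, and the pixel values, bin count and colormap labels are purely illustrative.

```python
def histogram(pixels, bins=4, max_value=256):
    """Normalized intensity histogram of a flat list of pixel values."""
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // max_value] += 1
    total = len(pixels)
    return [c / total for c in counts]


def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def best_colormap(fused_pixels, local_db):
    """Return the colormap stored with the image whose histogram matches best."""
    target = histogram(fused_pixels)
    best = max(
        local_db,
        key=lambda entry: intersection(target, histogram(entry["pixels"])),
    )
    return best["colormap"]


# Hypothetical local database: each colormap is stored with the image
# (here a tiny pixel list) from which it was derived.
local_db = [
    {"colormap": "urban", "pixels": [10, 20, 30, 200]},
    {"colormap": "rural", "pixels": [100, 120, 130, 140]},
]

choice = best_colormap([15, 25, 35, 220], local_db)
print(choice)  # urban
```

    Precomputing and storing the histograms alongside each colormap would avoid recomputing them for every frame.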

  6. The publication and database deposition of molecular interaction data.

    PubMed

    Orchard, Sandra; Aranda, Bruno; Hermjakob, Henning

    2010-04-01

    Depositing data to a public domain interaction database not only improves the quality and quantity of interactions available to the user community, but also increases the visibility of the data, with members of the International Molecular Exchange (IMEx) databases making the information available in all participating resources. Datasets submitted prior to publication will be issued an accession number that may be included in a publication and which increases user accessibility to the data. No dataset is too small for submission, and the database curators will provide assistance in ensuring the information is correctly represented. This unit provides several alternative protocols to assist the author in preparing and submitting data as an integral part of the manuscript-preparation process. Which method the author selects is largely dictated by the amount of data to be deposited. In addition, two support protocols describe assignment of unambiguous accession numbers and use of controlled vocabulary terms. PMID:20393973

  7. Prototype Food and Nutrient Database for Dietary Studies: Branded Food Products Database for Public Health Proof of Concept

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The Prototype Food and Nutrient Database for Dietary Studies (Prototype FNDDS) Branded Food Products Database for Public Health is a proof of concept database. The database contains a small selection of food products which is being used to exhibit the approach for incorporation of the Branded Food ...

  8. Genetics and Forensics: Making the National DNA Database

    PubMed Central

    Johnson, Paul; Williams, Robin; Martin, Paul

    2005-01-01

    This paper is based on a current study of the growing police use of the epistemic authority of molecular biology for the identification of criminal suspects in support of crime investigation. It discusses the development of DNA profiling and the establishment and development of the UK National DNA Database (NDNAD) as an instance of the ‘scientification of police work’ (Ericson and Shearing 1986) in which the police uses of science and technology have a recursive effect on their future development. The NDNAD, owned by the Association of Chief Police Officers of England and Wales, is the first of its kind in the world and currently contains the genetic profiles of more than 2 million people. The paper provides a framework for the examination of this socio-technical innovation, begins to tease out the dense and compact history of the database and accounts for the way in which changes and developments across disparate scientific, governmental and policing contexts, have all contributed to the range of uses to which it is put. PMID:16467921

  9. SSAHA: a fast search method for large DNA databases.

    PubMed

    Ning, Z; Cox, A J; Mullikin, J C

    2001-10-01

    We describe an algorithm, SSAHA (Sequence Search and Alignment by Hashing Algorithm), for performing fast searches on databases containing multiple gigabases of DNA. Sequences in the database are preprocessed by breaking them into consecutive k-tuples of k contiguous bases and then using a hash table to store the position of each occurrence of each k-tuple. Searching for a query sequence in the database is done by obtaining from the hash table the "hits" for each k-tuple in the query sequence and then performing a sort on the results. We discuss the effect of the tuple length k on the search speed, memory usage, and sensitivity of the algorithm and present the results of computational experiments which show that SSAHA can be three to four orders of magnitude faster than BLAST or FASTA, while requiring less memory than suffix tree methods. The SSAHA algorithm is used for high-throughput single nucleotide polymorphism (SNP) detection and very large scale sequence assembly. Also, it provides Web-based sequence search facilities for Ensembl projects. PMID:11591649
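
    The hashing and sorting steps described above can be sketched in a few lines. This is a simplified illustration and not the SSAHA implementation: it assumes a single database sequence, non-overlapping k-tuples in the database, overlapping k-tuples in the query, and a plain Python dictionary as the hash table; all names and the toy sequence are made up for the example.

```python
from collections import Counter, defaultdict


def build_hash_table(sequence, k):
    """Store the position of each non-overlapping k-tuple of the database sequence."""
    table = defaultdict(list)
    for pos in range(0, len(sequence) - k + 1, k):
        table[sequence[pos:pos + k]].append(pos)
    return table


def search(table, query, k):
    """Collect (shift, position) hits for every k-tuple in the query, then sort them."""
    hits = []
    for t in range(len(query) - k + 1):
        for pos in table.get(query[t:t + k], []):
            hits.append((pos - t, pos))
    hits.sort()  # hits with the same shift become adjacent runs
    return hits


def best_shift(hits):
    """Return (shift, hit_count) for the shift supported by the most hits."""
    counts = Counter(shift for shift, _ in hits)
    return counts.most_common(1)[0] if counts else None


database = "GATTACAGGCTTAACC"  # toy sequence
table = build_hash_table(database, 2)
hits = search(table, "ACAGGC", 2)
print(hits)              # [(4, 4), (4, 6), (4, 8)]
print(best_shift(hits))  # (4, 3)
```

    After sorting, a run of hits sharing one shift marks a candidate match region; a larger k shrinks the hit lists and speeds up the search but makes short or divergent matches easier to miss.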

  10. SSAHA: A Fast Search Method for Large DNA Databases

    PubMed Central

    Ning, Zemin; Cox, Anthony J.; Mullikin, James C.

    2001-01-01

    We describe an algorithm, SSAHA (Sequence Search and Alignment by Hashing Algorithm), for performing fast searches on databases containing multiple gigabases of DNA. Sequences in the database are preprocessed by breaking them into consecutive k-tuples of k contiguous bases and then using a hash table to store the position of each occurrence of each k-tuple. Searching for a query sequence in the database is done by obtaining from the hash table the “hits” for each k-tuple in the query sequence and then performing a sort on the results. We discuss the effect of the tuple length k on the search speed, memory usage, and sensitivity of the algorithm and present the results of computational experiments which show that SSAHA can be three to four orders of magnitude faster than BLAST or FASTA, while requiring less memory than suffix tree methods. The SSAHA algorithm is used for high-throughput single nucleotide polymorphism (SNP) detection and very large scale sequence assembly. Also, it provides Web-based sequence search facilities for Ensembl projects. PMID:11591649

  11. Pathway Analysis for Drug Repositioning Based on Public Database Mining

    PubMed Central

    2015-01-01

    Sixteen FDA-approved drugs were investigated to elucidate their mechanisms of action (MOAs) and clinical functions by pathway analysis based on retrieved drug targets interacting with or affected by the investigated drugs. Protein and gene targets and associated pathways were obtained by data-mining of public databases including the MMDB, PubChem BioAssay, GEO DataSets, and the BioSystems databases. Entrez E-Utilities were applied, and in-house Ruby scripts were developed for data retrieval and pathway analysis to identify and evaluate relevant pathways common to the retrieved drug targets. Pathways pertinent to clinical uses or MOAs were obtained for most drugs. Interestingly, some drugs identified pathways responsible for other diseases than their current therapeutic uses, and these pathways were verified retrospectively by in vitro tests, in vivo tests, or clinical trials. The pathway enrichment analysis based on drug target information from public databases could provide a novel approach for elucidating drug MOAs and repositioning, therefore benefiting the discovery of new therapeutic treatments for diseases. PMID:24460210
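
    The abstract does not state which statistic was used to evaluate pathways. One common way to score whether a pathway is over-represented among a drug's targets is a hypergeometric test, sketched below under that assumption; the function name and the numbers are illustrative only and are not taken from the study.

```python
from math import comb


def hypergeom_pvalue(population, pathway_size, targets, overlap):
    """P(X >= overlap) when `targets` genes are drawn from `population` genes,
    of which `pathway_size` belong to the pathway."""
    upper = min(pathway_size, targets)
    total = comb(population, targets)
    tail = sum(
        comb(pathway_size, i) * comb(population - pathway_size, targets - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 genes overall, 4 in the pathway, 3 drug targets, 2 of them in the pathway.
p = hypergeom_pvalue(population=10, pathway_size=4, targets=3, overlap=2)
print(round(p, 4))  # 0.3333
```

    A small p-value would suggest that the overlap between the drug's targets and the pathway is unlikely to arise by chance; with many pathways tested, a multiple-testing correction would normally be applied as well.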

  12. High-throughput STR analysis for DNA database using direct PCR.

    PubMed

    Sim, Jeong Eun; Park, Su Jeong; Lee, Han Chul; Kim, Se-Yong; Kim, Jong Yeol; Lee, Seung Hwan

    2013-07-01

    Since the Korean criminal DNA database was launched in 2010, we have focused on establishing an automated DNA database profiling system that analyzes short tandem repeat loci in a high-throughput and cost-effective manner. We established a DNA database profiling system without DNA purification using a direct PCR buffer system. The quality of direct PCR procedures was compared with that of conventional PCR system under their respective optimized conditions. The results revealed not only perfect concordance but also an excellent PCR success rate, good electropherogram quality, and an optimal intra/inter-loci peak height ratio. In particular, the proportion of DNA extraction required due to direct PCR failure could be minimized to <3%. In conclusion, the newly developed direct PCR system can be adopted for automated DNA database profiling systems to replace or supplement conventional PCR system in a time- and cost-saving manner. PMID:23683293

  13. Forensic DNA databases in Western Balkan region: retrospectives, perspectives, and initiatives

    PubMed Central

    Marjanović, Damir; Konjhodžić, Rijad; Butorac, Sara Sanela; Drobnič, Katja; Merkaš, Siniša; Lauc, Gordan; Primorac, Damir; Anđelinović, Šimun; Milosavljević, Mladen; Karan, Željko; Vidović, Stojko; Stojković, Oliver; Panić, Bojana; Vučetić Dragović, Anđelka; Kovačević, Sandra; Jakovski, Zlatko; Asplen, Chris; Primorac, Dragan

    2011-01-01

    The European Network of Forensic Science Institutes (ENFSI) recommended the establishment of forensic DNA databases and specific implementation and management legislations for all EU/ENFSI members. Therefore, forensic institutions from Bosnia and Herzegovina, Serbia, Montenegro, and Macedonia launched a wide set of activities to support these recommendations. To assess the current state, a regional expert team completed detailed screening and investigation of the existing forensic DNA data repositories and associated legislation in these countries. The scope also included relevant concurrent projects and a wide spectrum of different activities in relation to forensic DNA use. The state of forensic DNA analysis was also determined in the neighboring Slovenia and Croatia, which already have functional national DNA databases. There is a need for a ‘regional supplement’ to the current documentation and standards pertaining to forensic application of DNA databases, which should include regional-specific preliminary aims and recommendations. PMID:21674821

  14. The Web-Based DNA Vaccine Database DNAVaxDB and Its Usage for Rational DNA Vaccine Design.

    PubMed

    Racz, Rebecca; He, Yongqun

    2016-01-01

    A DNA vaccine is a vaccine that uses a mammalian expression vector to express one or more protein antigens and is administered in vivo to induce an adaptive immune response. Since the 1990s, a significant amount of research has been performed on DNA vaccines and the mechanisms behind them. To meet the needs of the DNA vaccine research community, we created DNAVaxDB (http://www.violinet.org/dnavaxdb), the first Web-based database and analysis resource of experimentally verified DNA vaccines. All the data in DNAVaxDB, which include plasmids, antigens, vaccines, and sources, are manually curated and experimentally verified. This chapter describes the DNAVaxDB system in detail and shows how the DNA vaccine database, combined with the Vaxign vaccine design tool, can be used for rational design of a DNA vaccine against a pathogen, such as Mycobacterium bovis. PMID:27076334

  15. Databases and Bioinformatics Tools for the Study of DNA Repair

    PubMed Central

    Milanowska, Kaja; Rother, Kristian; Bujnicki, Janusz M.

    2011-01-01

    DNA is continuously exposed to many different damaging agents such as environmental chemicals, UV light, ionizing radiation, and reactive cellular metabolites. DNA lesions can result in different phenotypical consequences ranging from a number of diseases, including cancer, to cellular malfunction, cell death, or aging. To counteract the deleterious effects of DNA damage, cells have developed various repair systems, including biochemical pathways responsible for the removal of single-strand lesions such as base excision repair (BER) and nucleotide excision repair (NER) or specialized polymerases temporarily taking over lesion-arrested DNA polymerases during the S phase in translesion synthesis (TLS). There are also other mechanisms of DNA repair such as homologous recombination repair (HRR), nonhomologous end-joining repair (NHEJ), or DNA damage response system (DDR). This paper reviews bioinformatics resources specialized in disseminating information about DNA repair pathways, proteins involved in repair mechanisms, damaging agents, and DNA lesions. PMID:22091405

  16. Exploring public databases to characterize urban flood risks in Amsterdam

    NASA Astrophysics Data System (ADS)

    Gaitan, Santiago; ten Veldhuis, Marie-claire; van de Giesen, Nick

    2015-04-01

    Cities worldwide are challenged by increasing urban flood risks. Precise and realistic measures are required to decide upon investment to reduce their impacts. Obvious flooding factors affecting flood risk include sewer systems performance and urban topography. However, currently implemented sewer and topographic models do not provide realistic predictions of local flooding occurrence during heavy rain events. Assessing other factors such as spatially distributed rainfall and socioeconomic characteristics may help to explain probability and impacts of urban flooding. Several public databases were analyzed: complaints about flooding made by citizens; rainfall depths (15 min and 100 ha spatio-temporal resolution); grids describing number of inhabitants, income, and housing price (1 ha and 25 ha resolution); and building age. Data analysis was done using Python and GIS programming, and included spatial indexing of data, cluster analysis, and multivariate regression on the complaints. Complaints were used as a proxy to characterize flooding impacts. The cluster analysis, run for all the variables except the complaints, grouped part of the grid-cells of central Amsterdam into a highly differentiated group, covering 10% of the analyzed area, and accounting for 25% of registered complaints. The configuration of the analyzed variables in central Amsterdam coincides with a high complaint count. Remaining complaints were evenly dispersed along other groups. An adjusted R² of 0.38 in the multivariate regression suggests that explanatory power can improve if additional variables are considered. While rainfall intensity explained 4% of the incidence of complaints, population density and building age significantly explained around 20% each. Data mining of public databases proved to be a valuable tool to identify factors explaining variability in occurrence of urban pluvial flooding, though additional variables must be considered to fully explain flood risk variability.

  17. MitBASE: a comprehensive and integrated mitochondrial DNA database. The present status

    PubMed Central

    Attimonelli, M.; Altamura, N.; Benne, R.; Brennicke, A.; Cooper, J. M.; D’Elia, D.; Montalvo, A. de; Pinto, B. de; De Robertis, M.; Golik, P.; Knoop, V.; Lanave, C.; Lazowska, J.; Licciulli, F.; Malladi, B. S.; Memeo, F.; Monnerot, M.; Pasimeni, R.; Pilbout, S.; Schapira, A. H. V.; Sloof, P.; Saccone, C.

    2000-01-01

    MitBASE is an integrated and comprehensive database of mitochondrial DNA data which collects, under a single interface, databases for Plant, Vertebrate, Invertebrate, Human, Protist and Fungal mtDNA and a Pilot database on nuclear genes involved in mitochondrial biogenesis in Saccharomyces cerevisiae. MitBASE reports all available information from different organisms and from intraspecies variants and mutants. Data have been drawn from the primary databases and from the literature; value-added information has been structured, e.g., editing information on protist mtDNA genomes, pathological information for human mtDNA variants, etc. The different databases, some of which are structured using commercial packages (Microsoft Access, File Maker Pro) while others use a flat-file format, have been integrated under ORACLE. Ad hoc retrieval systems have been devised for some of the above listed databases, taking into account their peculiarities. The database is resident at the EBI and is available at the following site: http://www3.ebi.ac.uk/Research/Mitbase/mitbase.pl. The impact of this project is intended for both basic and applied research. The study of mitochondrial genetic diseases and mitochondrial DNA intraspecies diversity are key topics in several biotechnological fields. The database has been funded within the EU Biotechnology programme. PMID:10592207

  18. The EpiSLI Database: A Publicly Available Database on Speech and Language

    ERIC Educational Resources Information Center

    Tomblin, J. Bruce

    2010-01-01

    Purpose: This article describes a database that was created in the process of conducting a large-scale epidemiologic study of specific language impairment (SLI). As such, this database will be referred to as the EpiSLI database. Children with SLI have unexpected and unexplained difficulties learning and using spoken language. Although there is no…

  19. DNA Fingerprint Database for Crapemyrtle Cultivar Identification, Hybrid Verification, and Parentage Analysis

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The objective of this study was to create DNA fingerprints for the Razzle Dazzle® crape myrtle series using simple sequence repeat (SSR) markers, and compare them with the DNA fingerprints of a database made up of over 50 popular crape myrtle cultivars currently available in the trade. Data consiste...

  20. The Moroccan Genetic Disease Database (MGDD): a database for DNA variations related to inherited disorders and disease susceptibility

    PubMed Central

    Charoute, Hicham; Nahili, Halima; Abidi, Omar; Gabi, Khalid; Rouba, Hassan; Fakiri, Malika; Barakat, Abdelhamid

    2014-01-01

    National and ethnic mutation databases provide comprehensive information about genetic variations reported in a population or an ethnic group. In this paper, we present the Moroccan Genetic Disease Database (MGDD), a catalogue of genetic data related to diseases identified in the Moroccan population. We used the PubMed, Web of Science and Google Scholar databases to identify available articles published until April 2013. The database is designed and implemented on a three-tier model using the MySQL relational database and the PHP programming language. To date, the database contains 425 mutations and 208 polymorphisms found in 301 genes and 259 diseases. Most Mendelian diseases in the Moroccan population follow an autosomal recessive mode of inheritance (74.17%) and affect endocrine, nutritional and metabolic physiology. The MGDD database provides reference information for researchers, clinicians and health professionals through a user-friendly Web interface. Its content should be useful to improve research in human molecular genetics, disease diagnoses and design of association studies. MGDD can be publicly accessed at http://mgdd.pasteur.ma. PMID:23860041

  1. The Moroccan Genetic Disease Database (MGDD): a database for DNA variations related to inherited disorders and disease susceptibility.

    PubMed

    Charoute, Hicham; Nahili, Halima; Abidi, Omar; Gabi, Khalid; Rouba, Hassan; Fakiri, Malika; Barakat, Abdelhamid

    2014-03-01

    National and ethnic mutation databases provide comprehensive information about genetic variations reported in a population or an ethnic group. In this paper, we present the Moroccan Genetic Disease Database (MGDD), a catalogue of genetic data related to diseases identified in the Moroccan population. We used the PubMed, Web of Science and Google Scholar databases to identify available articles published until April 2013. The database is designed and implemented on a three-tier model using the MySQL relational database and the PHP programming language. To date, the database contains 425 mutations and 208 polymorphisms found in 301 genes and 259 diseases. Most Mendelian diseases in the Moroccan population follow an autosomal recessive mode of inheritance (74.17%) and affect endocrine, nutritional and metabolic physiology. The MGDD database provides reference information for researchers, clinicians and health professionals through a user-friendly Web interface. Its content should be useful to improve research in human molecular genetics, disease diagnoses and design of association studies. MGDD can be publicly accessed at http://mgdd.pasteur.ma. PMID:23860041

  2. TabSQL: a MySQL tool to facilitate mapping user data to public databases

    PubMed Central

    2010-01-01

    Background: With advances in high-throughput genomics and proteomics, it is challenging for biologists to deal with large data files and to map their data to annotations in public databases. Results: We developed TabSQL, a MySQL-based application tool, for viewing, filtering and querying data files with large numbers of rows. TabSQL provides functions for downloading and installing table files from public databases including the Gene Ontology database (GO), the Ensembl databases, and genome databases from the UCSC genome bioinformatics site. Any other database that provides tab-delimited flat files can also be imported. The downloaded gene annotation tables can be queried together with users' data in TabSQL using either a graphic interface or command line. Conclusions: TabSQL allows queries across the user's data and public databases without programming. It is a convenient tool for biologists to annotate and enrich their data. PMID:20573251
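
    The idea of querying user data together with downloaded annotation tables can be sketched as a simple SQL join. This is not TabSQL's code: it swaps in Python's built-in sqlite3 module for MySQL so the example is self-contained, and the table names, column names and sample rows are hypothetical placeholders.

```python
import sqlite3


def annotate(user_rows, annotation_rows):
    """Join user measurements to an annotation table on a shared gene ID.

    user_rows:       iterable of (gene_id, score) tuples   (hypothetical schema)
    annotation_rows: iterable of (gene_id, go_term) tuples (hypothetical schema)
    Returns rows of (gene_id, score, go_term); go_term is None when a gene
    has no annotation, because a LEFT JOIN keeps every user row.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE user_data (gene_id TEXT, score REAL)")
        conn.execute("CREATE TABLE annotation (gene_id TEXT, go_term TEXT)")
        conn.executemany("INSERT INTO user_data VALUES (?, ?)", user_rows)
        conn.executemany("INSERT INTO annotation VALUES (?, ?)", annotation_rows)
        cursor = conn.execute(
            "SELECT u.gene_id, u.score, a.go_term "
            "FROM user_data AS u "
            "LEFT JOIN annotation AS a ON a.gene_id = u.gene_id "
            "ORDER BY u.gene_id, a.go_term"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Placeholder identifiers used only for illustration.
    for row in annotate([("g1", 2.5), ("g2", 0.1)], [("g1", "GO:0000001")]):
        print(row)
```

    A LEFT JOIN is used so that unannotated user rows remain visible rather than silently dropping out of the result.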

  3. Generation and analysis of a 29,745 unique Expressed Sequence Tags from the Pacific oyster (Crassostrea gigas) assembled into a publicly accessible database: the GigasDatabase

    PubMed Central

    2009-01-01

    Background: Although bivalves are among the most-studied marine organisms because of their ecological role and economic importance, very little information is available on the genome sequences of oyster species. This report documents three large-scale cDNA sequencing projects for the Pacific oyster Crassostrea gigas initiated to provide a large number of expressed sequence tags that were subsequently compiled in a publicly accessible database. This resource allowed for the identification of a large number of transcripts and provides valuable information for ongoing investigations of tissue-specific and stimulus-dependent gene expression patterns. These data are crucial for constructing comprehensive DNA microarrays, identifying single nucleotide polymorphisms and microsatellites in coding regions, and for identifying genes when the entire genome sequence of C. gigas becomes available. Description: In the present paper, we report the production of 40,845 high-quality ESTs that identify 29,745 unique transcribed sequences consisting of 7,940 contigs and 21,805 singletons. All of these new sequences, together with existing public sequence data, have been compiled into a publicly-available Website http://public-contigbrowser.sigenae.org:9090/Crassostrea_gigas/index.html. Approximately 43% of the unique ESTs had significant matches against the SwissProt database and 27% were annotated using Gene Ontology terms. In addition, we identified a total of 208 in silico microsatellites from the ESTs, with 173 having sufficient flanking sequence for primer design. We also identified a total of 7,530 putative in silico, single-nucleotide polymorphisms using existing and newly-generated EST resources for the Pacific oyster. Conclusion: A publicly-available database has been populated with 29,745 unique sequences for the Pacific oyster Crassostrea gigas. The database provides many tools to search cleaned and assembled ESTs. 
The user may input and submit several filters, such as protein or nucleotide hits, to select and download relevant elements. This database constitutes one of the most developed genomic resources accessible among Lophotrochozoans, an orphan clade of bilateral animals. These data will accelerate the development of both genomics and genetics in a commercially-important species with the highest annual, commercial production of any aquatic organism. PMID:19640306

  4. DNA patenting: implications for public health research.

    PubMed Central

    Dutfield, Graham

    2006-01-01

    I weigh the arguments for and against the patenting of functional DNA sequences including genes, and find the objections to be compelling. Is an outright ban on DNA patenting the right policy response? Not necessarily. Governments may wish to consider options ranging from patent law reforms to the creation of new rights. There are alternative ways to protect DNA sequences that industry may choose if DNA patenting is restricted or banned. Some of these alternatives may be more harmful than patents. Such unintended consequences of patent bans mean that we should think hard before concluding that prohibition is the only response to legitimate concerns about the appropriateness of patents in the field of human genomics. PMID:16710549

  5. Amerindian mitochondrial DNA haplogroups predominate in the population of Argentina: towards a first nationwide forensic mitochondrial DNA sequence database.

    PubMed

    Bobillo, Maria Cecilia; Zimmermann, Bettina; Sala, Andrea; Huber, Gabriela; Röck, Alexander; Bandelt, Hans-Jürgen; Corach, Daniel; Parson, Walther

    2010-07-01

    The study presents South American mitochondrial DNA (mtDNA) data from selected north (N = 98), central (N = 193) and south (N = 47) Argentinean populations. Sequence analysis of the complete mtDNA control region (CR, 16024-576) resulted in 288 unique haplotypes ignoring C-insertions around positions 16193, 309, and 573; the additional analysis of coding region single nucleotide polymorphisms enabled a fine classification of the described lineages. The Amerindian haplogroups were most frequent in the north and south representing more than 60% of the sequences. A slightly different situation was observed in central Argentina where the Amerindian haplogroups represented less than 50%, and the European contribution was more relevant. Particular clades of the Amerindian subhaplogroups turned out to be nearly region-specific. A minor contribution of African lineages was observed throughout the country. This comprehensive admixture of worldwide mtDNA lineages and the regional specificity of certain clades in the Argentinean population underscore the necessity of carefully selecting regional samples in order to develop a nationwide mtDNA database for forensic and anthropological purposes. The mtDNA sequencing and analysis were performed under EMPOP guidelines in order to attain high quality for the mtDNA database. PMID:19680675

  6. Database Changes (Post-Publication). ERIC Processing Manual, Section X.

    ERIC Educational Resources Information Center

    Brandhorst, Ted, Ed.

    The purpose of this section is to specify the procedure for making changes to the ERIC database after the data involved have been announced in the abstract journals RIE or CIJE. As a matter of general ERIC policy, a document or journal article is not re-announced or re-entered into the database as a new accession for the purpose of accomplishing a…

  7. Characterization and compilation of polymorphic simple sequence repeat (SSR) markers of peanut from public database

    PubMed Central

    2012-01-01

    Background: There are several reports describing thousands of SSR markers in the peanut (Arachis hypogaea L.) genome. There is a need to integrate various research reports of peanut DNA polymorphism into a single platform. Further, because of a lack of uniformity in the labeling of these markers across the publications, there is some confusion about the identities of many markers. We describe below an effort to develop a central comprehensive database of polymorphic SSR markers in peanut. Findings: Of a total of 9,274 markers, we compiled 1,343 SSR markers (14.5%) that detect polymorphism. Amongst all polymorphic SSRs examined, we found that the AG motif (36.5%) was the most abundant, followed by AAG (12.1%), AAT (10.9%), and AT (10.3%). The mean length of SSR repeats in dinucleotide SSRs was significantly longer than that in trinucleotide SSRs. Dinucleotide SSRs showed higher polymorphism frequency for genomic SSRs when compared to trinucleotide SSRs, while for EST-SSRs, the frequency of polymorphic SSRs was higher in trinucleotide SSRs than in dinucleotide SSRs. The correlation of the length of SSR and the frequency of polymorphism revealed that the frequency of polymorphism decreased as motif repeat number increased. Conclusions: The assembled polymorphic SSRs would enhance the density of the existing genetic maps of peanut, which could also be a useful source of DNA markers suitable for high-throughput QTL mapping and marker-assisted selection in peanut improvement and thus would be of value to breeders. PMID:22818284
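
    To make the dinucleotide and trinucleotide SSR motifs discussed above concrete, the following minimal sketch locates simple repeats in a DNA string with a regular expression. It is not the method used in the study; the function name and the minimum repeat threshold are assumptions chosen for illustration.

```python
import re


def find_ssrs(sequence, min_repeats=4):
    """Return (start, motif, repeat_count) for di- and trinucleotide repeats.

    min_repeats is an assumed threshold; the abstract does not state the
    criteria used to call an SSR. Homopolymer motifs such as "AA" are skipped
    because they are mononucleotide runs, not di- or trinucleotide repeats.
    """
    sequence = sequence.upper()
    hits = []
    for motif_len in (2, 3):
        # A motif of motif_len bases followed by at least min_repeats - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence containing an (AG)5 repeat starting at index 2.
    print(find_ssrs("TTAGAGAGAGAGCC"))
```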

  8. 75 FR 76831 - Publicly Available Consumer Product Safety Information Database

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-12-09

    ... available database. On May 24, 2010, we published a notice of proposed rulemaking at 75 FR 29156, which set... stakeholder input and comment, all of which were discussed in the preamble to the proposed rule at 75 FR 29156... (75 FR 29156, May 24, 2010) pertaining to each section. In addition to comments on each of...

  9. Toward a mtDNA Locus-Specific Mutation Database Using the LOVD Platform

    PubMed Central

    Elson, Joanna L.; Sweeney, Mary G.; Procaccio, Vincent; Yarham, John W.; Salas, Antonio; Kong, Qing-Peng; van der Westhuizen, Francois H.; Pitceathly, Robert D.S.; Thorburn, David R.; Lott, Marie T.; Wallace, Douglas C.; Taylor, Robert W.; McFarland, Robert

    2015-01-01

    The Human Variome Project (HVP) is a global effort to collect and curate all human genetic variation affecting health. Mutations of mitochondrial DNA (mtDNA) are an important cause of neurogenetic disease in humans; however, identification of the pathogenic mutations responsible can be problematic. In this article, we provide explanations as to why and suggest how such difficulties might be overcome. We put forward a case in support of a new Locus Specific Mutation Database (LSDB) implemented using the Leiden Open-source Variation Database (LOVD) system that will not only list primary mutations, but also present the evidence supporting their role in disease. Critically, we feel that this new database should have the capacity to store information on the observed phenotypes alongside the genetic variation, thereby facilitating our understanding of the complex and variable presentation of mtDNA disease. LOVD supports fast queries of both seen and hidden data and allows storage of sequence variants from high-throughput sequence analysis. The LOVD platform will allow construction of a secure mtDNA database; one that can fully utilize currently available data, as well as that being generated by high-throughput sequencing, to link genotype with phenotype enhancing our understanding of mitochondrial disease, with a view to providing better prognostic information. PMID:22581690

  10. An Internet-Accessible DNA Sequence Database for Identifying Fusaria from Human and Animal Infections

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Because less than one-third of clinically relevant fusaria can be accurately identified to species level using phenotypic data (i.e., morphological species recognition), we constructed a three-locus DNA sequence database to facilitate molecular identification of the 69 Fusarium species associated wi...

  11. DNAtraffic—a new database for systems biology of DNA dynamics during the cell life

    PubMed Central

    Kuchta, Krzysztof; Barszcz, Daniela; Grzesiuk, Elzbieta; Pomorski, Pawel; Krwawicz, Joanna

    2012-01-01

    DNAtraffic (http://dnatraffic.ibb.waw.pl/) is designed to be a unique, comprehensive and richly annotated database of genome dynamics during the cell life. It contains extensive data on the nomenclature, ontology, structure and function of proteins related to DNA integrity mechanisms such as chromatin remodeling, histone modifications, DNA repair and damage response from eight organisms: Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Escherichia coli and Arabidopsis thaliana. DNAtraffic contains comprehensive information on the diseases related to the assembled human proteins. DNAtraffic is richly annotated with systemic information on the nomenclature, chemistry and structure of DNA damage and its sources, including environmental agents or commonly used drugs targeting nucleic acids and/or proteins involved in the maintenance of genome stability. One aim of the DNAtraffic database is to create the first platform for the combinatorial complexity of DNA network analysis. The database includes illustrations of pathways, damage, proteins and drugs. Since DNAtraffic is designed to cover a broad spectrum of scientific disciplines, it has to be extensively linked to numerous external data sources. Our database represents the result of manual annotation work aimed at making DNAtraffic much more useful for a wide range of systems biology applications. PMID:22110027

  12. DSSTOX WEBSITE LAUNCH: IMPROVING PUBLIC ACCESS TO DATABASES FOR BUILDING STRUCTURE-TOXICITY PREDICTION MODELS

    EPA Science Inventory

    DSSTox Website Launch: Improving Public Access to Databases for Building Structure-Toxicity Prediction Models
    Ann M. Richard
    US Environmental Protection Agency, Research Triangle Park, NC, USA

    Distributed: Decentralized set of standardized, field-delimited databases,...

  13. Strengthening the United States' database protection laws: balancing public access and private control.

    PubMed

    Resnik, David B

    2003-07-01

    This paper develops three arguments for increasing the strength of database protection under U.S. law. First, stronger protections would encourage private investment in database development, and private databases have many potential benefits for science and industry. Second, stronger protections would discourage extensive use of private licenses to protect databases and would allow for greater public control over database laws and policies. Third, stronger database protections in the U.S. would harmonize U.S. and E.U. laws and would thus enhance international trade, commerce, and research. The U.S. should therefore follow the European example and develop two tiers of protection for databases: 1) protection for creative databases under copyright law; 2) protection for non-creative databases through a special type of sui generis protection. In order to balance private control of data and public access to data, sui generis protections should define a "fair use" exemption that permits some unauthorized extraction of data for private, educational, and research purposes, provided that such extraction does not adversely impact the economic value of the database. PMID:12971291

  14. 76 FR 53912 - FDA's Public Database of Products With Orphan-Drug Designation: Replacing Non-Informative Code...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-08-30

    ... HUMAN SERVICES Food and Drug Administration FDA's Public Database of Products With Orphan-Drug... its public database of products that have received orphan-drug designation. The Orphan Drug Act... received orphan designation were published on our public database with non-informative code names....

  15. Prisoners' expectations of the national forensic DNA database: surveillance and reconfiguration of individual rights.

    PubMed

    Machado, Helena; Santos, Filipe; Silva, Susana

    2011-07-15

    In this paper we aim to discuss what Portuguese prisoners know and how they feel about surveillance mechanisms related to the inclusion and deletion of the DNA profiles of convicted criminals in the national forensic database. Through a set of interviews with individuals currently imprisoned, we focus on the ways this group perceives forensic DNA technologies. While the institutional and political discourses maintain that the restricted use and application of DNA profiles within the national forensic database protects individuals' rights, the prisoners claim that police misuse of such technologies potentially makes it difficult to escape from surveillance and acts as a means of reinforcing the stigma of delinquency. The prisoners also argue that additional intensive and extensive use of surveillance devices might be more protective of their own individual rights and might possibly increase potential for exoneration. PMID:21414735

  16. Bioethical Biobanks: Three Concerns in Designing and Using Law Enforcement DNA Identification Databases

    SciTech Connect

    D.H. Kaye

    2006-10-19

    Federal and state law enforcement authorities have amassed large collections of DNA samples and the identifying profiles derived from them. These databases help to identify the guilty and to exonerate the innocent, but as the databanks grow, so do fears about civil liberties. The research reported here discusses three legal and social policy issues that have been raised in regard to these biobanks—the choice of loci to type for identifying individuals, the indefinite retention of DNA samples, and the use of the DNA samples or the identifying profiles for research purposes. It also considers the possible value of the databases for research into the genetics of human behavior and the ethics of using them for this purpose. It rejects the broad claim that such research is inherently unethical but proposes procedures for ensuring that the value of the proposed research justifies any psychosocial or other risks to the subjects of the research.

  17. An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system

    DOE PAGESBeta

    AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

    2015-11-19

    Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. 
Lastly, this database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.
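
    The position weight matrices mentioned in the abstract above summarize binding preferences per basepair position. The following minimal sketch shows one common way to build a log-odds matrix from aligned binding sites; it is not the database's own code, and the pseudocount, the uniform background frequency and the toy sites are assumptions.

```python
import math

BASES = "ACGT"


def position_weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds position weight matrix from equal-length aligned sites.

    pseudocount and background are assumed defaults, not values from the
    database. Returns one dict per position mapping each base to its score.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(length):
        column = [site[i].upper() for site in sites]
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in BASES
        })
    return pwm


def score(pwm, sequence):
    """Sum the per-position scores of a sequence of the same length as the matrix."""
    return sum(column[base] for column, base in zip(pwm, sequence.upper()))


if __name__ == "__main__":
    toy_sites = ["ACG", "ACG", "ACT", "ACG"]  # hypothetical aligned sites
    matrix = position_weight_matrix(toy_sites)
    print(score(matrix, "ACG"), score(matrix, "TTT"))
```

    Higher scores indicate sequences closer to the aligned sites; the pseudocount keeps unseen bases from producing a logarithm of zero.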

  18. An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system

    SciTech Connect

    AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

    2015-11-19

    Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. 
Lastly, this database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.

  19. FastStats: a public health statistics database.

    PubMed

    Vardell, Emily

    2014-01-01

    FastStats is a site that provides quick and easy access to public health statistics. The freely available website is maintained by the Centers for Disease Control and Prevention's National Center for Health Statistics. Users can browse alphabetically by topic and state/territory or search across the National Center for Health Statistics site. A description of the browsing capabilities and sample searches are presented. PMID:24735268

  20. Internet-Accessible DNA Sequence Database for Identifying Fusaria from Human and Animal Infections

    PubMed Central

    O'Donnell, Kerry; Sutton, Deanna A.; Rinaldi, Michael G.; Sarver, Brice A. J.; Balajee, S. Arunmozhi; Schroers, Hans-Josef; Summerbell, Richard C.; Robert, Vincent A. R. G.; Crous, Pedro W.; Zhang, Ning; Aoki, Takayuki; Jung, Kyongyong; Park, Jongsun; Lee, Yong-Hwan; Kang, Seogchan; Park, Bongsoo; Geiser, David M.

    2010-01-01

    Because less than one-third of clinically relevant fusaria can be accurately identified to species level using phenotypic data (i.e., morphological species recognition), we constructed a three-locus DNA sequence database to facilitate molecular identification of the 69 Fusarium species associated with human or animal mycoses encountered in clinical microbiology laboratories. The database comprises partial sequences from three nuclear genes: translation elongation factor 1α (EF-1α), the largest subunit of RNA polymerase (RPB1), and the second largest subunit of RNA polymerase (RPB2). These three gene fragments can be amplified by PCR and sequenced using primers that are conserved across the phylogenetic breadth of Fusarium. Phylogenetic analyses of the combined data set reveal that, with the exception of two monotypic lineages, all clinically relevant fusaria are nested in one of eight variously sized and strongly supported species complexes. The monophyletic lineages have been named informally to facilitate communication of an isolate's clade membership and genetic diversity. To identify isolates to the species included within the database, partial DNA sequence data from one or more of the three genes can be used as a BLAST query against the database which is Web accessible at FUSARIUM-ID (http://isolate.fusariumdb.org) and the Centraalbureau voor Schimmelcultures (CBS-KNAW) Fungal Biodiversity Center (http://www.cbs.knaw.nl/fusarium). Alternatively, isolates can be identified via phylogenetic analysis by adding sequences of unknowns to the DNA sequence alignment, which can be downloaded from the two aforementioned websites. 
The utility of this database should increase significantly as members of the clinical microbiology community deposit in internationally accessible culture collections (e.g., CBS-KNAW or the Fusarium Research Center) cultures of novel mycosis-associated fusaria, along with associated, corrected sequence chromatograms and data, so that the sequence results can be verified and isolates are made available for future study. PMID:20686083

  1. Internet-accessible DNA sequence database for identifying fusaria from human and animal infections.

    PubMed

    O'Donnell, Kerry; Sutton, Deanna A; Rinaldi, Michael G; Sarver, Brice A J; Balajee, S Arunmozhi; Schroers, Hans-Josef; Summerbell, Richard C; Robert, Vincent A R G; Crous, Pedro W; Zhang, Ning; Aoki, Takayuki; Jung, Kyongyong; Park, Jongsun; Lee, Yong-Hwan; Kang, Seogchan; Park, Bongsoo; Geiser, David M

    2010-10-01

    Because less than one-third of clinically relevant fusaria can be accurately identified to species level using phenotypic data (i.e., morphological species recognition), we constructed a three-locus DNA sequence database to facilitate molecular identification of the 69 Fusarium species associated with human or animal mycoses encountered in clinical microbiology laboratories. The database comprises partial sequences from three nuclear genes: translation elongation factor 1α (EF-1α), the largest subunit of RNA polymerase (RPB1), and the second largest subunit of RNA polymerase (RPB2). These three gene fragments can be amplified by PCR and sequenced using primers that are conserved across the phylogenetic breadth of Fusarium. Phylogenetic analyses of the combined data set reveal that, with the exception of two monotypic lineages, all clinically relevant fusaria are nested in one of eight variously sized and strongly supported species complexes. The monophyletic lineages have been named informally to facilitate communication of an isolate's clade membership and genetic diversity. To identify isolates to the species included within the database, partial DNA sequence data from one or more of the three genes can be used as a BLAST query against the database which is Web accessible at FUSARIUM-ID (http://isolate.fusariumdb.org) and the Centraalbureau voor Schimmelcultures (CBS-KNAW) Fungal Biodiversity Center (http://www.cbs.knaw.nl/fusarium). Alternatively, isolates can be identified via phylogenetic analysis by adding sequences of unknowns to the DNA sequence alignment, which can be downloaded from the two aforementioned websites. 
The utility of this database should increase significantly as members of the clinical microbiology community deposit in internationally accessible culture collections (e.g., CBS-KNAW or the Fusarium Research Center) cultures of novel mycosis-associated fusaria, along with associated, corrected sequence chromatograms and data, so that the sequence results can be verified and isolates are made available for future study. PMID:20686083

  2. From metaphor to practices: The introduction of "information engineers" into the first DNA sequence database.

    PubMed

    García-Sancho, Miguel

    2011-01-01

    This paper explores the introduction of professional systems engineers and information management practices into the first centralized DNA sequence database, developed at the European Molecular Biology Laboratory (EMBL) during the 1980s. In so doing, it complements the literature on the emergence of an information discourse after World War II and its subsequent influence in biological research. By analyzing the careers of the database creators and the computer algorithms they designed, I show that from the mid-1960s onwards information in biology gradually shifted from a pervasive metaphor to being embodied in practices and professionals such as those incorporated at the EMBL. I then investigate the reception of these database professionals by the EMBL biological staff, which evolved from initial disregard to necessary collaboration as the relationship between DNA, genes, and proteins turned out to be more complex than expected. The trajectories of the database professionals at the EMBL suggest that the initial subject matter of the historiography of genomics should be the long-standing practices that emerged after World War II and to a large extent originated outside biomedicine and academia. Only after addressing these practices may historians turn to their further disciplinary assemblage in fields such as bioinformatics or biotechnology. PMID:21789956

  3. [Brazilian Food Composition Database (TBCA-USP): Data compilation to serve the public good].

    PubMed

    Cardoso Lopes, Tássia do Vale; Cyrillo, Denise Cavallini; Giuntini, Eliana Bistriche; Lajolo, Franco Maria; De Menezes, Elizabete Wenzel

    2015-09-01

    The article traces the evolution of the Brazilian Food Composition Database (TBCA-USP) from its creation to its next update. It characterizes the TBCA-USP database as a public good and highlights the importance of food composition data compilation as a highly cost-effective activity. It reports the social relevance of information about food composition and the importance of this database in the national context. It also indicates strategies for extending and updating the TBCA-USP. PMID:26821491

  4. HUGE: a database for human large proteins identified in the Kazusa cDNA sequencing project.

    PubMed

    Kikuno, R; Nagase, T; Suyama, M; Waki, M; Hirosawa, M; Ohara, O

    2000-01-01

    HUGE is a database for human large proteins newly identified in the Kazusa cDNA project, the aim of which is to predict the primary structure of proteins from the sequences of human large cDNAs (>4 kb). In particular, cDNA clones capable of coding for large proteins (>50 kDa) are the current targets of the project. HUGE contains >1100 cDNA sequences and detailed information obtained through analysis of the sequences of cDNAs and the predicted proteins. Besides an increase in the number of cDNA entries, the amount of experimental data for expression profiling has greatly increased and data on chromosomal locations have been newly added. All of the protein-coding regions were examined by GeneMark analysis, and the results of a motif/domain search of each predicted protein sequence against the Pfam database have been newly added. HUGE is available through the WWW at http://www.kazusa.or.jp/huge PMID:10592264

  5. Development and Evaluation of a Quality-Controlled Ribosomal Sequence Database for 16S Ribosomal DNA-Based Identification of Staphylococcus Species

    PubMed Central

    Becker, Karsten; Harmsen, Dag; Mellmann, Alexander; Meier, Christian; Schumann, Peter; Peters, Georg; von Eiff, Christof

    2004-01-01

    To establish an improved ribosomal gene sequence database as part of the Ribosomal Differentiation of Microorganisms (RIDOM) project and to overcome the drawbacks of phenotypic identification systems and publicly accessible sequence databases, both strands of the 5′ end of the 16S ribosomal DNA (rDNA) of 81 type and reference strains comprising all validly described staphylococcal (sub)species were sequenced. Assuming a normal distribution for pairwise distances of all unique staphylococcal sequences and choosing a reporting criterion of ≥98.7% similarity for a “distinct species,” a statistical error probability of 1.0% was calculated. To evaluate this database, a 16S rDNA fragment (corresponding to Escherichia coli positions 54 to 510) of 55 clinical Staphylococcus isolates (including those of the small-colony variant phenotype) was sequenced and analyzed by the RIDOM approach. Of these isolates, 54 (98.2%) had a similarity score above the proposed threshold using RIDOM; 48 (87.3%) of the sequences gave a perfect match, whereas 83.6% were found by searching National Center for Biotechnology Information (NCBI) database entries. In contrast to RIDOM, which showed four ambiguities at the species level (mainly concerning Staphylococcus intermedius versus Staphylococcus delphini), the NCBI database search yielded 18 taxon-related ambiguities and showed numerous matches exhibiting redundant or unspecified entries. Comparing molecular results with those of biochemical procedures, ID 32 Staph (bioMérieux, Marcy l'Etoile, France) and VITEK 2 (bioMérieux) failed to identify 13 (23.6%) and 19 (34.5%) isolates, respectively, due to incorrect identification and/or categorization below acceptable values. In contrast to phenotypic methods and the NCBI database, the novel high-quality RIDOM sequence database provides excellent identification of staphylococci, including rarely isolated species and phenotypic variants. PMID:15528685

  6. Development and evaluation of a quality-controlled ribosomal sequence database for 16S ribosomal DNA-based identification of Staphylococcus species.

    PubMed

    Becker, Karsten; Harmsen, Dag; Mellmann, Alexander; Meier, Christian; Schumann, Peter; Peters, Georg; von Eiff, Christof

    2004-11-01

    To establish an improved ribosomal gene sequence database as part of the Ribosomal Differentiation of Microorganisms (RIDOM) project and to overcome the drawbacks of phenotypic identification systems and publicly accessible sequence databases, both strands of the 5' end of the 16S ribosomal DNA (rDNA) of 81 type and reference strains comprising all validly described staphylococcal (sub)species were sequenced. Assuming a normal distribution for pairwise distances of all unique staphylococcal sequences and choosing a reporting criterion of ≥98.7% similarity for a "distinct species," a statistical error probability of 1.0% was calculated. To evaluate this database, a 16S rDNA fragment (corresponding to Escherichia coli positions 54 to 510) of 55 clinical Staphylococcus isolates (including those of the small-colony variant phenotype) was sequenced and analyzed by the RIDOM approach. Of these isolates, 54 (98.2%) had a similarity score above the proposed threshold using RIDOM; 48 (87.3%) of the sequences gave a perfect match, whereas 83.6% were found by searching National Center for Biotechnology Information (NCBI) database entries. In contrast to RIDOM, which showed four ambiguities at the species level (mainly concerning Staphylococcus intermedius versus Staphylococcus delphini), the NCBI database search yielded 18 taxon-related ambiguities and showed numerous matches exhibiting redundant or unspecified entries. Comparing molecular results with those of biochemical procedures, ID 32 Staph (bioMérieux, Marcy l'Etoile, France) and VITEK 2 (bioMérieux) failed to identify 13 (23.6%) and 19 (34.5%) isolates, respectively, due to incorrect identification and/or categorization below acceptable values. In contrast to phenotypic methods and the NCBI database, the novel high-quality RIDOM sequence database provides excellent identification of staphylococci, including rarely isolated species and phenotypic variants. PMID:15528685

  7. PUBLIC HEALTH AND EPIDEMIOLOGICAL DATABASES FOR THE ENHANCEMENT OF MEDICAL EDUCATION.

    PubMed

    Jamal, Qazi Mohammad Sajid; Siddiqui, Mughees Uddin; Alzohairy, Mohammad Abdulrahman; Al Karaawi, Mohammed Abdullah

    2015-01-01

    The collaboration of public health education and information technology has made patient care safer and more reliable than before. Nurses and doctors use handheld computers to record a patient's medical history and check that they are administering the correct treatment. Fortunately, Public Health Informatics (PHI) is the intersecting point of technology and public health. Therefore, the inclusion of online medical and epidemiology databases in the course curriculum of budding medical professionals and postgraduate students would be beneficial in enhancing the quality of health care, extensive epidemiological research, health education, health policies, health planning and consumer satisfaction. The purpose of this article is to introduce and discuss various databases that hold a large amount of information and could be used to enhance public health education. PMID:26392847

  8. PUBLIC HEALTH AND EPIDEMIOLOGICAL DATABASES FOR THE ENHANCEMENT OF MEDICAL EDUCATION

    PubMed Central

    Jamal, Qazi Mohammad Sajid; Siddiqui, Mughees Uddin; Alzohairy, Mohammad Abdulrahman; Al Karaawi, Mohammed Abdullah

    2015-01-01

    The collaboration of public health education and information technology has made patient care safer and more reliable than before. Nurses and doctors use handheld computers to record a patient's medical history and check that they are administering the correct treatment. Fortunately, Public Health Informatics (PHI) is the intersecting point of technology and public health. Therefore, the inclusion of online medical and epidemiology databases in the course curriculum of budding medical professionals and postgraduate students would be beneficial in enhancing the quality of health care, extensive epidemiological research, health education, health policies, health planning and consumer satisfaction. The purpose of this article is to introduce and discuss various databases that hold a large amount of information and could be used to enhance public health education. PMID:26392847

  9. Feline Non-repetitive Mitochondrial DNA Control Region Database for Forensic Evidence

    PubMed Central

    Grahn, R. A.; Kurushima, J. D.; Billings, N. C.; Grahn, J.C.; Halverson, J. L.; Hammer, E.; Ho, C.K.; Kun, T. J.; Levy, J.K.; Lipinski, M. J.; Mwenda, J.M.; Ozpinar, H.; Schuster, R.K; Shoorijeh, S.J.; Tarditi, C. R.; Waly, N.E.; Wictum, E. J.; Lyons, L. A.

    2010-01-01

    The domestic cat is one of the most popular pets throughout the world. A by-product of owning, interacting with, or being in a household with a cat is the transfer of shed fur to clothing or personal objects. As trace evidence, transferred cat fur is a relatively untapped resource for forensic scientists. Both phenotypic and genotypic characteristics can be obtained from cat fur, but databases exist for neither aspect. Because cats incessantly groom, cat fur may have nucleated cells, not only in the hair bulb, but also as epithelial cells on the hair shaft deposited during the grooming process, thereby generally providing material for DNA profiling. To effectively exploit cat hair as a resource, representative databases must be established. This study evaluates 402 bp of the mtDNA control region (CR) from 1,394 cats, including cats from 25 distinct worldwide populations and 26 breeds. Eighty-three percent of the cats are represented by 12 major mitotypes. An additional 8.0% are clearly derived from the major mitotypes. Unique sequences were found in 7.5% of the cats. The overall genetic diversity for this data set was 0.8813 ± 0.0046 with a random match probability of 11.8%. This region of the cat mtDNA has discriminatory power suitable for forensic application worldwide. PMID:20457082
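
    The two summary statistics quoted in the abstract above can be illustrated with a short sketch. The abstract does not say which estimators were used, so this assumes two common textbook definitions: the random match probability as the sum of squared mitotype frequencies, and genetic diversity as Nei's unbiased estimator. The function name and the toy sample are hypothetical.

```python
from collections import Counter


def match_statistics(haplotypes):
    """Return (random_match_probability, diversity) for a list of haplotype labels.

    Assumed definitions (not taken from the abstract):
      random match probability = sum of squared haplotype frequencies
      diversity = n / (n - 1) * (1 - sum of squared frequencies)
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two samples")
    counts = Counter(haplotypes)
    # Probability that two individuals drawn at random share a haplotype.
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    # Small-sample correction n / (n - 1) gives the unbiased diversity estimate.
    diversity = (n / (n - 1)) * (1.0 - sum_sq)
    return sum_sq, diversity


# Toy sample: three hypothetical mitotypes A, B and C.
sample = ["A"] * 5 + ["B"] * 3 + ["C"] * 2
rmp, h = match_statistics(sample)
print(f"random match probability = {rmp:.3f}, diversity = {h:.3f}")
```

    With a large sample the correction factor is close to 1, so the two statistics are nearly complementary, which is roughly the relationship between the 0.8813 and 11.8% figures reported above.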

  10. The Neutron Monitor database as a tool for space weather, education, and public outreach

    NASA Astrophysics Data System (ADS)

    Steigies, Christian T.; Klein, Karl-Ludwig; Bütikofer, Rolf

    2014-05-01

    The Neutron Monitor database (NMDB) was created to make measurements from ground-based Neutron Monitors easily accessible. Data from more than 40 stations is available in the database and can be plotted via a webpage and downloaded as ASCII tables for further processing. Real-time applications, like the GLE Alert, can access the database directly. The NMDB project has also hosted training sessions and created extensive public outreach and training material that has been translated into 11 languages. This material is openly available on the NMDB website and is frequently used in high school and university courses. While the availability of data from currently operating stations is nearing completion, the availability of historical data, especially from stations that are no longer operating, is still limited. We are currently trying to fill these gaps. As a first step, a project to make NMDB compatible with the database of relativistic solar particle events (GLEs) is starting this year.

  11. Governing Software: Networks, Databases and Algorithmic Power in the Digital Governance of Public Education

    ERIC Educational Resources Information Center

    Williamson, Ben

    2015-01-01

    This article examines the emergence of "digital governance" in public education in England. Drawing on and combining concepts from software studies, policy and political studies, it identifies some specific approaches to digital governance facilitated by network-based communications and database-driven information processing software…

  13. HEDS - EPA DATABASE SYSTEM FOR PUBLIC ACCESS TO HUMAN EXPOSURE DATA

    EPA Science Inventory

    Human Exposure Database System (HEDS) is an Internet-based system developed to provide public access to human-exposure-related data from studies conducted by EPA's National Exposure Research Laboratory (NERL). HEDS was designed to work with the EPA Office of Research and Devel...

  14. An efficient similarity search based on indexing in large DNA databases.

    PubMed

    Jeong, In-Seon; Park, Kyoung-Wook; Kang, Seung-Ho; Lim, Hyeong-Seok

    2010-04-01

    Index-based search algorithms are an important part of genomic search, and index construction is the key to an index-based algorithm for computing similarities between two DNA sequences. In this paper, we propose an efficient query processing method that uses special transformations to construct an index. It requires little storage and rapidly finds the similarity between two sequences in a DNA sequence database. First, a sequence is partitioned into equal-length windows. We select the likely subsequences by computing the Hamming distance to the query sequence. The algorithm then transforms the subsequences in each window into a multidimensional vector space by indexing the frequencies of the characters, including the positional information of the characters in the subsequences. Our experiments show that the algorithm runs faster than other heuristic algorithms based on an index structure. The algorithm is also as accurate as those heuristic algorithms. PMID:20418167
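
    The three steps named in the abstract above (equal-length windows, a Hamming-distance filter, and a frequency-plus-position vector) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' method: the abstract does not give the transformation, so the mean position of each base stands in for "positional information", windows are taken as non-overlapping and as long as the query, and all function names are hypothetical.

```python
from collections import Counter

ALPHABET = "ACGT"


def windows(sequence, size):
    """Split a sequence into consecutive, non-overlapping windows of equal length."""
    return [sequence[i:i + size] for i in range(0, len(sequence) - size + 1, size)]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def feature_vector(subseq):
    """Assumed transform: four base counts followed by four mean base positions."""
    counts = Counter(subseq)
    freq = [counts.get(base, 0) for base in ALPHABET]
    mean_pos = []
    for base in ALPHABET:
        positions = [i for i, c in enumerate(subseq) if c == base]
        mean_pos.append(sum(positions) / len(positions) if positions else 0.0)
    return freq + mean_pos


def candidate_windows(sequence, query, max_mismatches):
    """Keep (offset, window) pairs within max_mismatches of the query."""
    size = len(query)
    return [
        (i * size, w)
        for i, w in enumerate(windows(sequence, size))
        if hamming(w, query) <= max_mismatches
    ]


db = "ACGTACGAACGTTTTT"
query = "ACGT"
hits = candidate_windows(db, query, 1)
print(hits)
print(feature_vector("ACGA"))
```

    In a full index the vectors of all windows would be stored in a multidimensional structure and searched by distance; here they are only computed, to show what such an index entry might contain.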

  15. Information Technologies in Public Health Management: A Database on Biocides to Improve Quality of Life

    PubMed Central

    Roman, C; Scripcariu, L; Diaconescu, RM; Grigoriu, A

    2012-01-01

    Background: Biocides for prolonging the shelf life of a large variety of materials have been extensively used over the last decades. Worldwide biocide consumption has been estimated at about 12.4 billion dollars in 2011 and is expected to increase in 2012. As biocides are substances we come into contact with in our everyday lives, access to this type of information is of paramount importance in order to ensure an appropriate living environment. Consequently, a database where information may be quickly processed, sorted, and easily accessed, according to different search criteria, is the most desirable solution. The main aim of this work was to design and implement a relational database with complete information about biocides used in public health management to improve the quality of life. Methods: Design and implementation of a relational database for biocides, by using the software “phpMyAdmin”. Results: A database, which allows for an efficient collection, storage, and management of information including chemical properties and applications of a large quantity of biocides, as well as its adequate dissemination into the public health environment. Conclusion: The information contained in the database herein presented promotes an adequate use of biocides, by means of information technologies, which in consequence may help achieve important improvement in our quality of life. PMID:23113190

  16. Publicly Available Database: Improved Spectral Line Measurements In SDSS DR7 Galaxies

    NASA Astrophysics Data System (ADS)

    Oh, Kyuseok; Sarzi, M.; Schawinski, K.; Yi, S. K.

    2012-01-01

    We present a new database of absorption and emission line measurements based on the Sloan Digital Sky Survey 7th data release for the galaxies within a redshift of 0.2. Our work makes use of the publicly available penalized pixel-fitting (pPXF) and GANDALF codes, aiming to improve the existing measurements for stellar kinematics, the strength of various absorption-line features, and the flux and width of the emissions from different species of ionized gas. The absorption line strengths measured by the SDSS pipeline are seriously contaminated by emission fill-in. We effectively separate emission lines from absorption lines. For instance, this work successfully extracts the [NI] doublet from Mgb, which leads to a more realistic result for alpha enhancement in late-type galaxies compared to the previous database. Besides accurately measuring line strengths, the database provides new parameters that are indicative of line strength measurement quality. Users can build a subset of the database optimal for their studies using specific cuts in the fitting quality parameters as well as empirical signal-to-noise. Applying these parameters, we found ‘hidden’ broad-line-region galaxies, which turned out to be Seyfert I nuclei that were not picked up as AGN by SDSS. The database is publicly available at http://gem.yonsei.ac.kr/ossy

  17. Local mitochondrial DNA haplotype databases needed for domestic dog populations that have experienced founder effect.

    PubMed

    Spadaro, Amanda; Ream, Kelsey; Braham, Caitlyn; Webb, Kristen M

    2015-03-01

    Biological material from pets is often collected as evidence from crime scenes. Due to sample type and quality, mitochondrial DNA (mtDNA) is frequently evaluated to identify the potential contributor. MtDNA has a lower discriminatory power than nuclear DNA, with multiple individuals in a population potentially carrying the same mtDNA sequence, or haplotype. The frequency distribution of mtDNA haplotypes in a population must be known in order to determine the evidentiary value of a match between crime scene evidence and the potential contributor of the biological material. This is especially important in geographic areas that include remote and/or isolated populations where founder effect may have led to a decrease in genetic diversity and a non-random distribution of haplotypes relative to the population at large. Here we compared the haplotype diversity in dogs from the noncontiguous states of Alaska and Hawaii relative to the contiguous United States (US). We report a greater proportion of dogs carrying an A haplotype in Alaska relative to any other US population. Significant variation in the distribution of haplotype frequencies was discovered when comparing the haplotype diversity of dogs in Hawaii to that of the continental US. Each of these regions exhibits reduced genetic diversity relative to the contiguous US, likely due to founder effect. We recommend that specific databases be created to accurately represent the mitochondrial haplotype diversity in these remote areas. Furthermore, our work demonstrates the importance of local surveys for populations that may have experienced founder effect. PMID:25612881

  18. Italian mitochondrial DNA database: results of a collaborative exercise and proficiency testing.

    PubMed

    Turchi, Chiara; Buscemi, Loredana; Previderè, Carlo; Grignani, Pierangela; Brandstätter, Anita; Achilli, Alessandro; Parson, Walther; Tagliabracci, Adriano

    2008-05-01

    This work is a review of a collaborative exercise on mtDNA analysis undertaken by the Italian working group (Ge.F.I.). A total of 593 samples from 11 forensic genetic laboratories were subjected to hypervariable region (HVS-I/HVS-II) sequence analysis. The raw lane data were sent to the MtDNA Population Database (EMPOP) for an independent evaluation. For inclusion of data in the Italian database, quality assurance procedures were applied to the control region profiles. Only eight laboratories with a final population sample of 395 subjects passed the quality conformance test. Control region haplogroup (hg) assignments were confirmed by restriction fragment length polymorphism (RFLP) typing of the most common European hg-diagnostic sites. A total of 306 unique haplotypes derived from the combined analysis of control and coding region polymorphisms were found; the most common haplotype--CRS, 263, 309.1C, 315.1C/ not7025 AluI--was shared by 20 subjects. The majority of mtDNAs detected in the Italian population fell into the most common west Eurasian hgs: R0a (0.76%), HV (4.81%), H (38.99%), HV0 (3.55%), J (7.85%), T (13.42%), U (11.65%), K (10.13%), I (1.52%), X (2.78%), and W (1.01%). PMID:17952451

  19. SABRE2: a database connecting plant EST/full-length cDNA clones with Arabidopsis information.

    PubMed

    Fukami-Kobayashi, Kaoru; Nakamura, Yasukazu; Tamura, Takuro; Kobayashi, Masatomo

    2014-01-01

    The SABRE (Systematic consolidation of Arabidopsis and other Botanical REsources) database cross-searches plant genetic resources through publicly available Arabidopsis information. In SABRE, plant expressed sequence tag (EST)/cDNA clones are related to TAIR (The Arabidopsis Information Resource) gene models and their annotations through sequence similarity. By entering a keyword, SABRE searches and retrieves TAIR gene models and annotations, together with homologous gene clones from various plant species. SABRE thus facilitates using TAIR annotations of Arabidopsis genes for research on homologous genes from other model plants. To expand the application range of SABRE to crop breeding, we have recently upgraded SABRE to SABRE2 (http://sabre.epd.brc.riken.jp/SABRE2.html), by newly adding six model plants (including the major crops barley, soybean, tomato and wheat), and by improving the retrieval interface. The present version has integrated information on >1.5 million plant EST/cDNA clones from the National BioResource Project (NBRP) of Japan. All clones are actual experimental resources from 14 plant species (Arabidopsis, barley, cassava, Chinese cabbage, lotus, morning glory, poplar, Physcomitrella patens, Striga hermonthica, soybean, Thellungiella halophila, tobacco, tomato and wheat), and are available from the core facilities of the NBRP. SABRE2 is thus a useful tool that can contribute towards the improvement of important crop breeds by connecting basic research and crop breeding. PMID:24323624

  20. 76 FR 77533 - Notice of Order: Revisions to Enterprise Public Use Database Incorporating High-Cost Single...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-12-13

    ... September 28, 2011 at 76 FR 60031, regarding FHFA's adoption of an Order revising FHFA's Public Use Database... From the Federal Register Online via the Government Publishing Office FEDERAL HOUSING FINANCE AGENCY Notice of Order: Revisions to Enterprise Public Use Database Incorporating High-Cost...

  1. Approaching the taxonomic affiliation of unidentified sequences in public databases – an example from the mycorrhizal fungi

    PubMed Central

    Nilsson, R Henrik; Kristiansson, Erik; Ryberg, Martin; Larsson, Karl-Henrik

    2005-01-01

    Background During the last few years, DNA sequence analysis has become one of the primary means of taxonomic identification of species, particularly so for species that are minute or otherwise lack distinct, readily obtainable morphological characters. Although the number of sequences available for comparison in public databases such as GenBank increases exponentially, only a minuscule fraction of all organisms have been sequenced, leaving taxon sampling a momentous problem for sequence-based taxonomic identification. When querying GenBank with a set of unidentified sequences, a considerable proportion typically lack fully identified matches, forming an ever-mounting pile of sequences that the researcher will have to monitor manually in the hope that new, clarifying sequences have been submitted by other researchers. To alleviate these concerns, a project to automatically monitor select unidentified sequences in GenBank for taxonomic progress through repeated local BLAST searches was initiated. Mycorrhizal fungi – a field where species identification often is prohibitively complex – and the much used ITS locus were chosen as test bed. Results A Perl script package called emerencia is presented. On a regular basis, it downloads select sequences from GenBank, separates the identified sequences from those insufficiently identified, and performs BLAST searches between these two datasets, storing all results in an SQL database. On the accompanying web service, users can monitor the taxonomic progress of insufficiently identified sequences over time, either through active searches or by signing up for e-mail notification upon disclosure of better matches. Other search categories, such as listing all insufficiently identified sequences (and their present best fully identified matches) publication-wise, are also available.
    Discussion The ever-increasing use of DNA sequences for identification purposes largely falls back on the assumption that public sequence databases contain a thorough sampling of taxonomically well-annotated sequences. Taxonomy, held by some to be an old-fashioned trade, has accordingly never been more important. emerencia does not automate the taxonomic process, but it does allow researchers to focus their efforts elsewhere than countless manual BLAST runs and arduous sieving of BLAST hit lists. The emerencia system is available on an open source basis for local installation with any organism and gene group as targets. PMID:16022740
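
    The workflow described in the Results above (separate identified from insufficiently identified sequences, compare the two sets, store the results in an SQL database) can be sketched in a few lines. This is not the emerencia Perl package: a toy position-by-position identity score stands in for BLAST, SQLite stands in for whatever SQL database the authors used, and the marker strings, table layout, accessions and sequences are all hypothetical.

```python
import sqlite3

# Assumed heuristics for spotting insufficiently identified records.
UNIDENTIFIED_MARKERS = ("uncultured", "unidentified", " sp.")


def is_fully_identified(name):
    lowered = name.lower()
    return not any(marker in lowered for marker in UNIDENTIFIED_MARKERS)


def toy_similarity(a, b):
    """Fraction of matching positions over the shorter length (BLAST stand-in)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def record_best_matches(conn, records, run_date):
    """Store, for each unidentified record, its best identified match for this run."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS best_match "
        "(run_date TEXT, query TEXT, hit TEXT, hit_name TEXT, score REAL)"
    )
    identified = [r for r in records if is_fully_identified(r[1])]
    unidentified = [r for r in records if not is_fully_identified(r[1])]
    for accession, _name, seq in unidentified:
        if not identified:
            continue
        best = max(identified, key=lambda r: toy_similarity(seq, r[2]))
        conn.execute(
            "INSERT INTO best_match VALUES (?, ?, ?, ?, ?)",
            (run_date, accession, best[0], best[1], toy_similarity(seq, best[2])),
        )
    conn.commit()


# Hypothetical records: (accession, organism name, sequence).
records = [
    ("ID1", "Amanita muscaria", "ACGTACGT"),
    ("ID2", "Russula emetica", "TTGTACGA"),
    ("UN1", "uncultured fungus", "ACGTACGA"),
]
conn = sqlite3.connect(":memory:")
record_best_matches(conn, records, "2005-01-01")
rows = conn.execute("SELECT query, hit, hit_name, score FROM best_match").fetchall()
print(rows)
```

    Because each run is stored with its date, repeating the call after new identified sequences appear would let a later query show how the best match for a given unidentified sequence changes over time.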

  2. Genomics and Public Health Research: Can the State Allow Access to Genomic Databases?

    PubMed Central

    Cousineau, J; Girard, N; Monardes, C; Leroux, T; Jean, M Stanton

    2012-01-01

    Because many diseases are multifactorial disorders, the scientific progress in genomics and genetics should be taken into consideration in public health research. In this context, genomic databases will constitute an important source of information. Consequently, it is important to identify and characterize the State’s role and authority on matters related to public health, in order to verify whether it has access to such databases while engaging in public health genomic research. We first consider the evolution of the concept of public health, as well as its core functions, using a comparative approach (e.g. WHO, PAHO, CDC and the Canadian province of Quebec). Following an analysis of relevant Quebec legislation, the precautionary principle is examined as a possible avenue to justify State access to and use of genomic databases for research purposes. Finally, we consider the influenza pandemic plans developed by WHO, Canada, and Quebec, as examples of key tools framing the public health decision-making process. We observed that State powers in public health are not, in Quebec, well adapted to the expansion of genomics research. We propose that the scope of the concept of research in public health should be clear and include the following characteristics: a commitment to the health and well-being of the population and to their determinants; the inclusion of both applied research and basic research; and an appropriate model of governance (authorization, follow-up, consent, etc.). We also suggest that the strategic approach version of the precautionary principle could guide collective choices in these matters. PMID:23113174

  3. Generation and analysis of end sequence database for T-DNA tagging lines in rice.

    PubMed

    An, Suyoung; Park, Sunhee; Jeong, Dong-Hoon; Lee, Dong-Yeon; Kang, Hong-Gyu; Yu, Jung-Hwa; Hur, Junghe; Kim, Sung-Ryul; Kim, Young-Hea; Lee, Miok; Han, Soonki; Kim, Soo-Jin; Yang, Jungwon; Kim, Eunjoo; Wi, Soo Jin; Chung, Hoo Sun; Hong, Jong-Pil; Choe, Vitnary; Lee, Hak-Kyung; Choi, Jung-Hee; Nam, Jongmin; Kim, Seong-Ryong; Park, Phun-Bum; Park, Ky Young; Kim, Woo Taek; Choe, Sunghwa; Lee, Chin-Bum; An, Gynheung

    2003-12-01

    We analyzed 6749 lines tagged by the gene trap vector pGA2707. This resulted in the isolation of 3793 genomic sequences flanking the T-DNA. Among the insertions, 1846 T-DNAs were integrated into genic regions, and 1864 were located in intergenic regions. Frequencies were also higher at the beginning and end of the coding regions and upstream near the ATG start codon. The overall GC content at the insertion sites was close to that measured from the entire rice (Oryza sativa) genome. Functional classification of these 1846 tagged genes showed a distribution similar to that observed for all the genes in the rice chromosomes. This indicates that T-DNA insertion is not biased toward a particular class of genes. There were 764, 327, and 346 T-DNA insertions in chromosomes 1, 4 and 10, respectively. Insertions were not evenly distributed; frequencies were higher at the ends of the chromosomes and lower near the centromere. At certain sites, the frequency was higher than in the surrounding regions. This sequence database will be valuable in identifying knockout mutants for elucidating gene function in rice. This resource is available to the scientific community at http://www.postech.ac.kr/life/pfg/risd. PMID:14630961

  4. Accessing the public MIMIC-II intensive care relational database for clinical research

    PubMed Central

    2013-01-01

    Background The Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) database is a free, public resource for intensive care research. The database was officially released in 2006, and has attracted a growing number of researchers in academia and industry. We present the two major software tools that facilitate accessing the relational database: the web-based QueryBuilder and a downloadable virtual machine (VM) image. Results QueryBuilder and the MIMIC-II VM have been developed successfully and are freely available to MIMIC-II users. Simple example SQL queries and the resulting data are presented. Clinical studies pertaining to acute kidney injury and prediction of fluid requirements in the intensive care unit are shown as typical examples of research performed with MIMIC-II. In addition, MIMIC-II has also provided data for annual PhysioNet/Computing in Cardiology Challenges, including the 2012 Challenge “Predicting mortality of ICU Patients”. Conclusions QueryBuilder is a web-based tool that provides easy access to MIMIC-II. For more computationally intensive queries, one can locally install a complete copy of MIMIC-II in a VM. Both publicly available tools provide the MIMIC-II research community with convenient querying interfaces and complement the value of the MIMIC-II relational database. PMID:23302652
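
    The abstract above mentions simple example SQL queries without reproducing them. The sketch below shows what a query of that kind could look like, run against an in-memory SQLite table. The table `icu_stay`, its columns and its four rows are entirely hypothetical; they are not the MIMIC-II schema or data, and SQLite is used only so that the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical toy table; not the real MIMIC-II schema.
conn.executescript(
    """
    CREATE TABLE icu_stay (
        stay_id     INTEGER PRIMARY KEY,
        subject_id  INTEGER,
        los_days    REAL,     -- length of stay in days
        died_in_icu INTEGER   -- 1 if the patient died during the stay
    );
    INSERT INTO icu_stay VALUES
        (1, 101, 2.5, 0),
        (2, 102, 7.0, 1),
        (3, 103, 1.0, 0),
        (4, 101, 4.5, 0);
    """
)

# Cohort summary: stays, distinct patients, mean length of stay, ICU mortality.
query = """
    SELECT COUNT(*),
           COUNT(DISTINCT subject_id),
           AVG(los_days),
           AVG(died_in_icu)
    FROM icu_stay
"""
n_stays, n_patients, mean_los, mortality = conn.execute(query).fetchone()
print(n_stays, n_patients, mean_los, mortality)
```

    A web-based query tool suits short aggregate queries like this one, whereas heavier joins over many tables are the case the abstract suggests running against a local copy.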

  5. A quality alert and call for improved curation of public chemistry databases.

    PubMed

    Williams, Antony J; Ekins, Sean

    2011-09-01

    In the last ten years, public online databases have rapidly become trusted valuable resources upon which researchers rely for their chemical structures and data for use in cheminformatics, bioinformatics, systems biology, translational medicine and now drug repositioning or repurposing efforts. Their utility depends on the quality of the underlying molecular structures used. Unfortunately, the quality of much of the chemical structure-based data introduced to the public domain is poor. As an example, we describe some of the errors found in the recently released NIH Chemical Genomics Center 'NPC browser' database. There is an urgent need for government-funded data curation to improve the quality of internet chemistry and to limit the proliferation of errors and wasted efforts. PMID:21871970

  6. A parallel and incremental algorithm for efficient unique signature discovery on DNA databases

    PubMed Central

    2010-01-01

    Background DNA signatures are distinct short nucleotide sequences that provide valuable information that is used for various purposes, such as the design of Polymerase Chain Reaction primers and microarray experiments. Biologists usually use a discovery algorithm to find unique signatures from DNA databases, and then apply the signatures to microarray experiments. Such discovery algorithms require the user to set some input factors, such as signature length l and mismatch tolerance d, which affect the discovery results. However, suggestions about how to select proper factor values are rare, especially when an unfamiliar DNA database is used. In most cases, biologists select factor values based on experience, or even by guessing. If the discovered result is unsatisfactory, biologists change the input factors of the algorithm to obtain a new result. This process is repeated until a proper result is obtained. Implicit signatures under the discovery condition (l, d) are defined as the signatures of length ≤ l with mismatch tolerance ≥ d. A discovery algorithm that could discover all implicit signatures, from which those that meet the requirements concerning the results can be selected, would be more helpful than one that depends on trial and error. However, existing discovery algorithms do not address the need to discover all implicit signatures. Results This work proposes two discovery algorithms - the consecutive multiple discovery (CMD) algorithm and the parallel and incremental signature discovery (PISD) algorithm. The PISD algorithm is designed for efficiently discovering signatures under a certain discovery condition. The algorithm finds new results by using previously discovered results as candidates, rather than by using the whole database. The PISD algorithm further increases discovery efficiency by applying parallel computing. The CMD algorithm is designed to discover implicit signatures efficiently. It uses the PISD algorithm as a kernel routine to discover implicit signatures efficiently under every feasible discovery condition. Conclusions The proposed algorithms discover implicit signatures efficiently. The presented CMD algorithm takes up to 97% less execution time than typical sequential discovery algorithms in the discovery of implicit signatures in experiments, when eight processing cores are used. PMID:20230647
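
    To make the roles of the signature length l and the mismatch tolerance d concrete, the following is a minimal brute-force sketch. It is not the CMD or PISD algorithm from the abstract. It assumes, for illustration only, that a window of length l is a unique signature when it differs from every other length-l window in the database in more than d positions; the function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def unique_signatures(sequences, l, d):
    """Brute-force search for unique signatures (illustrative only).

    Assumption: a window is unique under (l, d) if its Hamming distance
    to every other length-l window in the database is greater than d.
    Returns (sequence index, start offset, window) tuples.
    """
    # Enumerate every length-l window of every sequence.
    windows = [
        (si, i, seq[i:i + l])
        for si, seq in enumerate(sequences)
        for i in range(len(seq) - l + 1)
    ]
    found = []
    for si, i, w in windows:
        others = (w2 for sj, j, w2 in windows if (sj, j) != (si, i))
        if all(hamming(w, w2) > d for w2 in others):
            found.append((si, i, w))
    return found


if __name__ == "__main__":
    db = ["AAAA", "AATT"]
    print(unique_signatures(db, l=2, d=0))
    print(unique_signatures(db, l=2, d=1))
```

    Raising d makes the uniqueness requirement stricter, so fewer windows qualify. This quadratic scan over all windows is only practical for toy inputs, which is the kind of cost that the candidate reuse and parallelism described in the abstract are meant to reduce.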

  7. Assessment of Residential History Generation Using a Public-Record Database

    PubMed Central

    Wheeler, David C.; Wang, Aobo

    2015-01-01

    In studies of disease with potential environmental risk factors, residential location is often used as a surrogate for unknown environmental exposures or as a basis for assigning environmental exposures. These studies most typically use the residential location at the time of diagnosis due to ease of collection. However, previous residential locations may be more useful for risk analysis because of population mobility and disease latency. When residential histories have not been collected in a study, it may be possible to generate them through public-record databases. In this study, we evaluated the ability of a public-records database from LexisNexis to provide residential histories for subjects in a geographically diverse cohort study. We calculated 11 performance metrics comparing study-collected addresses and two address retrieval services from LexisNexis. We found 77% and 90% match rates for city and state and 72% and 87% detailed address match rates with the basic and enhanced services, respectively. The enhanced LexisNexis service covered 86% of the time at residential addresses recorded in the study. The mean match rate for detailed address matches varied spatially over states. The results suggest that public record databases can be useful for reconstructing residential histories for subjects in epidemiologic studies. PMID:26393626

  8. Privacy protection and public goods: building a genetic database for health research in Newfoundland and Labrador

    PubMed Central

    Pullman, Daryl; Perrot-Daley, Astrid; Hodgkinson, Kathy; Street, Catherine; Rahman, Proton

    2013-01-01

    Objective To provide a legal and ethical analysis of some of the implementation challenges faced by the Population Therapeutics Research Group (PTRG) at Memorial University (Canada), in using genealogical information offered by individuals for its genetics research database. Materials and methods This paper describes the unique historical and genetic characteristics of the Newfoundland and Labrador founder population, which gave rise to the opportunity for PTRG to build the Newfoundland Genealogy Database containing digitized records of all pre-confederation (1949) census records of the Newfoundland founder population. In addition to building the database, PTRG has developed the Heritability Analytics Infrastructure, a data management structure that stores genotype, phenotype, and pedigree information in a single database, and custom linkage software (KINNECT) to perform pedigree linkages on the genealogy database. Discussion A newly adopted legal regimen in Newfoundland and Labrador is discussed. It incorporates health privacy legislation with a unique research ethics statute governing the composition and activities of research ethics boards and, for the first time in Canada, elevating the status of national research ethics guidelines into law. The discussion looks at this integration of legal and ethical principles which provides a flexible and seamless framework for balancing the privacy rights and welfare interests of individuals, families, and larger societies in the creation and use of research data infrastructures as public goods. Conclusion The complementary legal and ethical frameworks that now coexist in Newfoundland and Labrador provide the legislative authority, ethical legitimacy, and practical flexibility needed to find a workable balance between privacy interests and public goods. Such an approach may also be instructive for other jurisdictions as they seek to construct and use biobanks and related research platforms for genetic research. PMID:22859644

  9. High-quality mtDNA control region sequences from 680 individuals sampled across the Netherlands to establish a national forensic mtDNA reference database.

    PubMed

    Chaitanya, Lakshmi; van Oven, Mannis; Brauer, Silke; Zimmermann, Bettina; Huber, Gabriela; Xavier, Catarina; Parson, Walther; de Knijff, Peter; Kayser, Manfred

    2016-03-01

    The use of mitochondrial DNA (mtDNA) for maternal lineage identification often marks the last resort when investigating forensic and missing-person cases involving highly degraded biological materials. As with all comparative DNA testing, a match between evidence and reference sample requires a statistical interpretation, for which high-quality mtDNA population frequency data are crucial. Here, we determined, under high quality standards, the complete mtDNA control-region sequences of 680 individuals from across the Netherlands sampled at 54 sites, covering the entire country with 10 geographic sub-regions. The complete mtDNA control region (nucleotide positions 16,024-16,569 and 1-576) was amplified with two PCR primers and sequenced with ten different sequencing primers using the EMPOP protocol. Haplotype diversity of the entire sample set was very high at 99.63% and, accordingly, the random-match probability was 0.37%. No population substructure within the Netherlands was detected with our dataset. Phylogenetic analyses were performed to determine mtDNA haplogroups. Inclusion of these high-quality data in the EMPOP database (accession number: EMP00666) will improve its overall data content and geographic coverage in the interest of all EMPOP users worldwide. Moreover, this dataset will serve as (the start of) a national reference database for mtDNA applications in forensic and missing person casework in the Netherlands. PMID:26774101
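
    The two summary statistics quoted in this abstract can be illustrated with a short sketch. It assumes the common definitions in which the random-match probability is the sum of squared haplotype frequencies and the haplotype diversity is one minus that sum, optionally scaled by n/(n-1); the abstract does not state which estimator was used, although the reported 99.63% and 0.37% do sum to 100%. The function names and the toy sample are hypothetical.

```python
from collections import Counter


def random_match_probability(haplotypes):
    """Sum of squared haplotype frequencies in the sample."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return sum((c / n) ** 2 for c in counts.values())


def haplotype_diversity(haplotypes, unbiased=False):
    """One minus the random-match probability.

    With unbiased=True the value is scaled by n / (n - 1), a
    small-sample correction that is often applied (assumption: the
    source abstract does not say whether it was used).
    """
    n = len(haplotypes)
    h = 1.0 - random_match_probability(haplotypes)
    return h * n / (n - 1) if unbiased else h


if __name__ == "__main__":
    sample = ["H1", "H1", "H2", "H3"]  # toy data, not from the study
    print(random_match_probability(sample))
    print(haplotype_diversity(sample))
    print(haplotype_diversity(sample, unbiased=True))
```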

  10. CoDNaS 2.0: a comprehensive database of protein conformational diversity in the native state.

    PubMed

    Monzon, Alexander Miguel; Rohr, Cristian Oscar; Fornasari, María Silvina; Parisi, Gustavo

    2016-01-01

    CoDNaS (conformational diversity of the native state) is a protein conformational diversity database. Conformational diversity describes structural differences between conformers that define the native state of proteins. It is a key concept to understand protein function and biological processes related to protein functions. CoDNaS offers a well curated database that is experimentally driven, thoroughly linked, and annotated. CoDNaS facilitates the extraction of key information on small structural differences based on protein movements. CoDNaS enables users to easily relate the degree of conformational diversity with physical, chemical and biological properties derived from experiments on protein structure and biological characteristics. The new version of CoDNaS includes ∼70% of all available protein structures, and new tools have been added that run sequence searches, display structural flexibility profiles and allow users to browse the database for different structural classes. These tools facilitate the exploration of protein conformational diversity and its role in protein function. Database URL: http://ufq.unq.edu.ar/codnas. PMID:27022160

  11. CoDNaS 2.0: a comprehensive database of protein conformational diversity in the native state

    PubMed Central

    Monzon, Alexander Miguel; Rohr, Cristian Oscar; Fornasari, María Silvina; Parisi, Gustavo

    2016-01-01

    CoDNaS (conformational diversity of the native state) is a protein conformational diversity database. Conformational diversity describes structural differences between conformers that define the native state of proteins. It is a key concept to understand protein function and biological processes related to protein functions. CoDNaS offers a well curated database that is experimentally driven, thoroughly linked, and annotated. CoDNaS facilitates the extraction of key information on small structural differences based on protein movements. CoDNaS enables users to easily relate the degree of conformational diversity with physical, chemical and biological properties derived from experiments on protein structure and biological characteristics. The new version of CoDNaS includes ∼70% of all available protein structures, and new tools have been added that run sequence searches, display structural flexibility profiles and allow users to browse the database for different structural classes. These tools facilitate the exploration of protein conformational diversity and its role in protein function. Database URL: http://ufq.unq.edu.ar/codnas PMID:27022160

  12. Update of MmtDB: a Metazoa mitochondrial DNA variants database.

    PubMed Central

    Attimonelli, M; Calò, D; De Montalvo, A; Lanave, C; Sasanelli, D; Tommaseo Ponzetta, M; Saccone, C

    1998-01-01

    The present paper describes the improvements in MmtDB, a specialised database designed to collect Metazoa mitochondrial DNA variants. Priority in the data collection has been given to Metazoa for which a large number of variants is available, e.g., for humans. Starting from the sequences available in the Nucleotide Sequence Databases, the redundant sequences have been removed and new sequences from other sources have been added. Value-added information is associated with each variant sequence, e.g., analysed region, experimental method, tissue and cell lines, population data, sex, age, family code and information about the variation events (nucleotide position, involved gene, restriction site gain or loss). Cross-references are introduced to the EMBL Data Library, as well as an internal cross-referencing among MmtDB entries according to tissue, heteroplasmic, familial and haplotypic correlation. Furthermore MmtDB has a new section, AMmtDB: Aligned Metazoan mitochondrial biosequences. MmtDB can be accessed through the World Wide Web at URL http://WWW.ba.cnr.it/[symbol: see text]areamt08/MmtDBWWW.htm PMID:9399815

  13. Similarity landscapes: An improved method for scientific visualization of information from protein and DNA database searches

    SciTech Connect

    Dogget, N.; Myers, G.; Wills, C.J.

    1998-12-01

    This is the final report of a three-year, Laboratory Directed Research and Development (LDRD) project at the Los Alamos National Laboratory (LANL). The authors have used computer simulations and examination of a variety of databases to address a wide range of evolutionary questions. The authors have found that there is a clear distinction in the evolution of HIV-1 and HIV-2, with the former and more virulent virus evolving more rapidly at a functional level. The authors have discovered highly non-random patterns in the evolution of HIV-1 that can be attributed to a variety of selective pressures. In the course of examination of microsatellite DNA (short repeat regions) in microorganisms, the authors have found clear differences between prokaryotes and eukaryotes in their distribution, differences that can be tied to different selective pressures. They have developed a new method (topiary pruning) for enhancing the phylogenetic information contained in DNA sequences. Most recently, the authors have discovered effects in complex rainforest ecosystems that indicate strong frequency-dependent interactions between host species and their parasites, leading to the maintenance of ecosystem variability.

  14. Dfam: a database of repetitive DNA based on profile hidden Markov models.

    PubMed

    Wheeler, Travis J; Clements, Jody; Eddy, Sean R; Hubley, Robert; Jones, Thomas A; Jurka, Jerzy; Smit, Arian F A; Finn, Robert D

    2013-01-01

    We present a database of repetitive DNA elements, called Dfam (http://dfam.janelia.org). Many genomes contain a large fraction of repetitive DNA, much of which is made up of remnants of transposable elements (TEs). Accurate annotation of TEs enables research into their biology and can shed light on the evolutionary processes that shape genomes. Identification and masking of TEs can also greatly simplify many downstream genome annotation and sequence analysis tasks. The commonly used TE annotation tools RepeatMasker and Censor depend on sequence homology search tools such as cross_match and BLAST variants, as well as Repbase, a collection of known TE families each represented by a single consensus sequence. Dfam contains entries corresponding to all Repbase TE entries for which instances have been found in the human genome. Each Dfam entry is represented by a profile hidden Markov model, built from alignments generated using RepeatMasker and Repbase. When used in conjunction with the hidden Markov model search tool nhmmer, Dfam produces a 2.9% increase in coverage over consensus sequence search methods on a large human benchmark, while maintaining low false discovery rates, and coverage of the full human genome is 54.5%. The website provides a collection of tools and data views to support improved TE curation and annotation efforts. Dfam is also available for download in flat file format or in the form of MySQL table dumps. PMID:23203985

  15. Dfam: a database of repetitive DNA based on profile hidden Markov models

    PubMed Central

    Wheeler, Travis J.; Clements, Jody; Eddy, Sean R.; Hubley, Robert; Jones, Thomas A.; Jurka, Jerzy; Smit, Arian F. A.; Finn, Robert D.

    2013-01-01

    We present a database of repetitive DNA elements, called Dfam (http://dfam.janelia.org). Many genomes contain a large fraction of repetitive DNA, much of which is made up of remnants of transposable elements (TEs). Accurate annotation of TEs enables research into their biology and can shed light on the evolutionary processes that shape genomes. Identification and masking of TEs can also greatly simplify many downstream genome annotation and sequence analysis tasks. The commonly used TE annotation tools RepeatMasker and Censor depend on sequence homology search tools such as cross_match and BLAST variants, as well as Repbase, a collection of known TE families each represented by a single consensus sequence. Dfam contains entries corresponding to all Repbase TE entries for which instances have been found in the human genome. Each Dfam entry is represented by a profile hidden Markov model, built from alignments generated using RepeatMasker and Repbase. When used in conjunction with the hidden Markov model search tool nhmmer, Dfam produces a 2.9% increase in coverage over consensus sequence search methods on a large human benchmark, while maintaining low false discovery rates, and coverage of the full human genome is 54.5%. The website provides a collection of tools and data views to support improved TE curation and annotation efforts. Dfam is also available for download in flat file format or in the form of MySQL table dumps. PMID:23203985

  16. First whole genome based microsatellite DNA marker database of tomato for mapping and variety identification

    PubMed Central

    2013-01-01

    Background The cultivated tomato is the second most consumed vegetable in the world and is an important part of a diverse and balanced diet as a rich source of vitamins, minerals, phenolic antioxidants and the antioxidant lycopene, which has anti-cancer properties. To reap the benefit of the genomics of the domestic tomato (Solanum lycopersicum L.) unravelled by the Tomato Genome Consortium (The Tomato Genome Consortium, 2012), bulk mining of its markers in totality is imperative. The solgenomics resource has a limited number of microsatellite DNA markers (2867) pertaining to the Solanaceae family. As these markers come from a linkage map with relative distances, it is not possible to choose markers based on absolute distance, as on a physical map. Only a limited number of microsatellite markers, with limitations, are reported for variety identification; thus there is a need for more markers to supplement the DUS test and also for traceability of the product in the global market. Description We present here the first whole genome based microsatellite DNA marker database of tomato, TomSatDB (Tomato MicroSatellite Database), with more than 1.4 million markers mined in silico using the MIcroSAtellite (MISA) tool. To cater to the customized needs of the wet lab, a novel automated primer designing tool has been added. TomSatDB (http://cabindb.iasri.res.in/tomsatdb), a user-friendly and freely accessible tool, offers chromosome-wise as well as location-wise search of primers. It is an online relational database based on a “three-tier architecture” that catalogues microsatellite information in MySQL, with a user-friendly interface developed using PHP (Hypertext Pre Processor). Conclusion Besides abiotic stress, tomato is known to suffer biotic stress due to its susceptibility to over 200 diseases caused by pathogenic fungi, bacteria, viruses and nematodes. These markers are expected to pave the way for germplasm management under abiotic and biotic stress as well as improvement through molecular breeding, leading to increased tomato productivity in India as well as other parts of the world. In the era of IPR, a new variety can be identified based on allelic variation among varieties, supplementing the DUS test and product traceability.
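
    As a rough illustration of what in silico microsatellite mining involves, the sketch below scans a sequence for perfect tandem repeats with a regular expression. It is not the MISA tool and does not reproduce the TomSatDB pipeline; the minimum repeat counts per motif length and the function name are assumptions chosen for the example.

```python
import re

# Assumed minimum repeat counts per motif length (illustrative values).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT"),
            # since those are reported under the shorter motif length.
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    toy = "GGC" + "AT" * 7 + "CCT" + "AAG" * 5 + "T"  # toy sequence
    for start, motif, count in find_ssrs(toy):
        print(start, motif, count)
```

    Start offsets are zero-based. A genome-scale run would also need to handle compound and interrupted repeats and the flanking sequence needed for primer design, none of which is attempted here.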

  17. Low template STR typing: effect of replicate number and consensus method on genotyping reliability and DNA database search results.

    PubMed

    Benschop, Corina C G; van der Beek, Cornelis P; Meiland, Hugo C; van Gorp, Ankie G M; Westen, Antoinette A; Sijen, Titia

    2011-08-01

    To analyze DNA samples with very low DNA concentrations, various methods have been developed that sensitize short tandem repeat (STR) typing. Sensitized DNA typing is accompanied by stochastic amplification effects, such as allele drop-outs and drop-ins. Therefore low template (LT) DNA profiles are interpreted with care. One can either try to infer the genotype by a consensus method that uses alleles confirmed in replicate analyses, or one can use a statistical model to evaluate the strength of the evidence in a direct comparison with a known DNA profile. In this study we focused on the first strategy and we show that the procedure by which the consensus profile is assembled will affect genotyping reliability. In order to gain insight into the roles of replicate number and requested level of reproducibility, we generated six independent amplifications of samples of known donors. The LT methods included both increased cycling and enhanced capillary electrophoresis (CE) injection [1]. Consensus profiles were assembled from two to six of the replications using four methods: composite (include all alleles), n-1 (include alleles detected in all but one replicate), n/2 (include alleles detected in at least half of the replicates) and 2× (include alleles detected twice). We compared the consensus DNA profiles with the DNA profile of the known donor, studied the stochastic amplification effects and examined the effect of the consensus procedure on DNA database search results. From all these analyses we conclude that the accuracy of LT DNA typing and the efficiency of database searching improve when the number of replicates is increased and the consensus method is n/2. The most functional number of replicates within this n/2 method is four (although a replicate number of three suffices for samples showing >25% of the alleles in standard STR typing). This approach was also the optimal strategy for the analysis of 2-person mixtures, although modified search strategies may be needed to retrieve the minor component in database searches. From the database searches follows the recommendation to specifically mark LT DNA profiles when entering them into the DNA database. PMID:20655289
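
    The four consensus rules named in this abstract can be sketched for a single locus as follows. This is an illustrative reading, not the authors' implementation: it assumes "n-1" means an allele is seen in at least n-1 replicates, "n/2" means at least half (rounded up), and "2x" means at least two replicates. The function name and the allele calls are hypothetical.

```python
import math
from collections import Counter


def consensus_alleles(replicates, method):
    """Assemble a consensus allele set for one locus.

    replicates: list of sets of allele labels, one set per amplification.
    method: "composite", "n-1", "n/2" or "2x" (thresholds are assumptions
    based on the one-line descriptions in the abstract).
    """
    n = len(replicates)
    # Count in how many replicates each allele was detected.
    counts = Counter(a for rep in replicates for a in set(rep))
    thresholds = {
        "composite": 1,
        "n-1": n - 1,
        "n/2": math.ceil(n / 2),
        "2x": 2,
    }
    need = max(1, thresholds[method])
    return {a for a, c in counts.items() if c >= need}


if __name__ == "__main__":
    # Toy allele calls at one locus from four replicate amplifications.
    reps = [{"12", "14", "16"}, {"12", "16"}, {"12", "14", "15"}, {"12", "14"}]
    for method in ("composite", "n-1", "n/2", "2x"):
        print(method, sorted(consensus_alleles(reps, method)))
```

    With four replicates, the n/2 and 2x rules coincide under these assumptions; they diverge once five or six replicates are available.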

  18. mirPub: a database for searching microRNA publications

    PubMed Central

    Vergoulis, Thanasis; Kanellos, Ilias; Kostoulas, Nikos; Georgakilas, Georgios; Sellis, Timos; Hatzigeorgiou, Artemis; Dalamagas, Theodore

    2015-01-01

    Summary: Identifying, amongst millions of publications available in MEDLINE, those that are relevant to specific microRNAs (miRNAs) of interest based on keyword search faces major obstacles. References to miRNA names in the literature often deviate from standard nomenclature for various reasons, since even the official nomenclature evolves. For instance, a single miRNA name may identify two completely different molecules or two different names may refer to the same molecule. mirPub is a database with a powerful and intuitive interface, which facilitates searching for miRNA literature, addressing the aforementioned issues. To provide effective search services, mirPub applies text mining techniques on MEDLINE, integrates data from several curated databases and exploits data from its user community following a crowdsourcing approach. Other key features include an interactive visualization service that illustrates intuitively the evolution of miRNA data, tag clouds summarizing the relevance of publications to particular diseases, cell types or tissues and access to TarBase 6.0 data to oversee genes related to miRNA publications. Availability and Implementation: mirPub is freely available at http://www.microrna.gr/mirpub/. Contact: vergoulis@imis.athena-innovation.gr or dalamag@imis.athena-innovation.gr Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25527833

  19. Fluorescence- and capillary electrophoresis (CE)-based SSR DNA fingerprinting and a molecular identity database for the Louisiana sugarcane industry

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A database of Louisiana sugarcane molecular identity has been constructed and is being updated annually using FAM or HEX or NED fluorescence- and capillary electrophoresis (CE)-based microsatellite (SSR) fingerprinting information. The fingerprints are PCR-amplified from leaf DNA samples of current ...

  20. A Two-locus DNA Sequence Database for Typing Plant and Human Pathogens Within the Fusarium oxysporum Species Complex

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We constructed a two-locus database, comprising partial translation elongation factor (EF-1alpha) gene sequences and nearly full-length sequences of the nuclear ribosomal intergenic spacer region (IGS rDNA) for 850 isolates spanning the phylogenetic breadth of the Fusarium oxysporum species complex ...

  1. MitoAge: a database for comparative analysis of mitochondrial DNA, with a special focus on animal longevity

    PubMed Central

    Toren, Dmitri; Barzilay, Thomer; Tacutu, Robi; Lehmann, Gilad; Muradian, Khachik K.; Fraifeld, Vadim E.

    2016-01-01

    Mitochondria are the only organelles in the animal cells that have their own genome. Due to a key role in energy production, generation of damaging factors (ROS, heat), and apoptosis, mitochondria and mtDNA in particular have long been considered one of the major players in the mechanisms of aging, longevity and age-related diseases. The rapidly increasing number of species with fully sequenced mtDNA, together with accumulated data on longevity records, provides a new fascinating basis for comparative analysis of the links between mtDNA features and animal longevity. To facilitate such analyses and to support the scientific community in carrying these out, we developed the MitoAge database containing calculated mtDNA compositional features of the entire mitochondrial genome, mtDNA coding (tRNA, rRNA, protein-coding genes) and non-coding (D-loop) regions, and codon usage/amino acids frequency for each protein-coding gene. MitoAge includes 922 species with fully sequenced mtDNA and maximum lifespan records. The database is available through the MitoAge website (www.mitoage.org or www.mitoage.info), which provides the necessary tools for searching, browsing, comparing and downloading the data sets of interest for selected taxonomic groups across the Kingdom Animalia. The MitoAge website assists in statistical analysis of different features of the mtDNA and their correlative links to longevity. PMID:26590258

  2. MitoAge: a database for comparative analysis of mitochondrial DNA, with a special focus on animal longevity.

    PubMed

    Toren, Dmitri; Barzilay, Thomer; Tacutu, Robi; Lehmann, Gilad; Muradian, Khachik K; Fraifeld, Vadim E

    2016-01-01

    Mitochondria are the only organelles in the animal cells that have their own genome. Due to a key role in energy production, generation of damaging factors (ROS, heat), and apoptosis, mitochondria and mtDNA in particular have long been considered one of the major players in the mechanisms of aging, longevity and age-related diseases. The rapidly increasing number of species with fully sequenced mtDNA, together with accumulated data on longevity records, provides a new fascinating basis for comparative analysis of the links between mtDNA features and animal longevity. To facilitate such analyses and to support the scientific community in carrying these out, we developed the MitoAge database containing calculated mtDNA compositional features of the entire mitochondrial genome, mtDNA coding (tRNA, rRNA, protein-coding genes) and non-coding (D-loop) regions, and codon usage/amino acids frequency for each protein-coding gene. MitoAge includes 922 species with fully sequenced mtDNA and maximum lifespan records. The database is available through the MitoAge website (www.mitoage.org or www.mitoage.info), which provides the necessary tools for searching, browsing, comparing and downloading the data sets of interest for selected taxonomic groups across the Kingdom Animalia. The MitoAge website assists in statistical analysis of different features of the mtDNA and their correlative links to longevity. PMID:26590258

  3. The Government Finance Database: A Common Resource for Quantitative Research in Public Financial Analysis.

    PubMed

    Pierson, Kawika; Hand, Michael L; Thompson, Fred

    2015-01-01

    Quantitative public financial management research focused on local governments is limited by the absence of a common database for empirical analysis. While the U.S. Census Bureau distributes government finance data that some scholars have utilized, the arduous process of collecting, interpreting, and organizing the data has led its adoption to be prohibitive and inconsistent. In this article we offer a single, coherent resource that contains all of the government financial data from 1967-2012, uses easy to understand natural-language variable names, and will be extended when new data is available. PMID:26107821

  4. The Government Finance Database: A Common Resource for Quantitative Research in Public Financial Analysis

    PubMed Central

    Pierson, Kawika; Hand, Michael L.; Thompson, Fred

    2015-01-01

    Quantitative public financial management research focused on local governments is limited by the absence of a common database for empirical analysis. While the U.S. Census Bureau distributes government finance data that some scholars have utilized, the arduous process of collecting, interpreting, and organizing the data has led its adoption to be prohibitive and inconsistent. In this article we offer a single, coherent resource that contains all of the government financial data from 1967-2012, uses easy to understand natural-language variable names, and will be extended when new data is available. PMID:26107821

  5. Near real-time operation of public image database for ground vehicle navigation

    NASA Astrophysics Data System (ADS)

    Ali, E.; Kozaitis, S. P.

    2015-02-01

    An effective color night vision system for ground vehicle navigation should operate in near real-time to be practical. We described a system that uses a public database as a source of color information to colorize night vision imagery. Such an approach presents several problems due to differences between acquired and reference imagery. Our system performed registration, colorizing, and reference updating in near real-time in an effort to help drivers of ground vehicles during night to see a colored view of a scene.

  6. Public Perceptions and Expectations of the Forensic Use of DNA: Results of a Preliminary Study

    ERIC Educational Resources Information Center

    Curtis, Cate

    2009-01-01

    The forensic use of Deoxyribonucleic Acid (DNA) is demonstrating significant success as a crime-solving tool. However, numerous concerns have been raised regarding the potential for DNA use to contravene cultural, ethical, and legal codes. In this article the expectations and level of knowledge of the New Zealand public of the DNA data-bank and…

  7. Public Perceptions and Expectations of the Forensic Use of DNA: Results of a Preliminary Study

    ERIC Educational Resources Information Center

    Curtis, Cate

    2009-01-01

    The forensic use of Deoxyribonucleic Acid (DNA) is demonstrating significant success as a crime-solving tool. However, numerous concerns have been raised regarding the potential for DNA use to contravene cultural, ethical, and legal codes. In this article the expectations and level of knowledge of the New Zealand public of the DNA data-bank and

  8. DNA banking and DNA databanking: Legal, ethical, and public policy issues. Progress report, [April 1, 1993--March 31, 1994]

    SciTech Connect

    Reilly, P.R.; McEwen, J.E.; Small, D.

    1994-02-18

    The purpose of the grant was to provide support to enable us to: (1) perform legal and empirical research and critically analyze DNA banking and DNA databanking as those activities are conducted by state forensic laboratories, the military, academic researchers, and commercial enterprises; and (2) develop a broadcast quality educational videotape for viewing by the general public about DNA technology and the privacy and related issues that it raises. The grant thus has both a research and analysis component and a public education component. This report outlines the work completed since the inception of the project and describes the activities still in progress.

  9. Publicly available database for spectral line measurements of SDSS DR7 galaxies

    NASA Astrophysics Data System (ADS)

    Oh, Kyuseok; Sarzi, Marc; Schawinski, Kevin; Yi, Sukyoung K.

    2012-08-01

    We present a new database of absorption and emission-line measurements based on the Sloan Digital Sky Survey (SDSS) 7th data release of galaxies within a redshift of 0.2. Using the publicly available penalized pixel-fitting (pPXF) and gas and absorption line fitting (gandalf) codes, our work improves the existing measurements for stellar kinematics, the strength of various absorption line features, and the flux and width of the emissions from different species of ionised gas. Most notably, we provide the quality of the fit to assess the reliability of the measurements. The quality assessment can be highly effective for finding new classes of objects. For example, based on the quality assessment around the Hα and [NII] nebular lines, we found that approximately 1% of the SDSS spectra classified as galaxies by the SDSS pipeline are in fact type I Seyfert AGN. This paper presents a summary of the recent paper, Oh et al. (2011). The database is publicly available at http://gem.yonsei.ac.kr/ossy/.

  10. Covariation of the Incidence of Type 1 Diabetes with Country Characteristics Available in Public Databases

    PubMed Central

    Diaz-Valencia, Paula Andrea; Bougnères, Pierre; Valleron, Alain-Jacques

    2015-01-01

    Background The incidence of Type 1 Diabetes (T1D) in children varies dramatically between countries. Part of the explanation must be sought in environmental factors. Increasingly, public databases provide information on country-to-country environmental differences. Methods Information on the incidence of T1D and country characteristics was searched for in the 194 World Health Organization (WHO) member countries. T1D incidence was extracted from a systematic literature review of all papers published between 1975 and 2014, including the 2013 update from the International Diabetes Federation. The information on country characteristics was searched in public databases. We considered all indicators with a plausible relation with T1D and those previously reported as correlated with T1D, and for which fewer than 5% of values were missing. This yielded 77 indicators. Four domains were explored: Climate and environment, Demography, Economy, and Health Conditions. Bonferroni correction to correct false discovery rate (FDR) was used in bivariate analyses. Stepwise multiple regressions served to identify independent predictors of the geographical variation of T1D. Findings T1D incidence was estimated for 80 WHO countries. Forty-one significant correlations between T1D and the selected indicators were found. Stepwise Multiple Linear Regressions performed in the four explored domains indicated that the percentages of variance explained by the indicators were 35% for Climate and environment, 33% for Demography, 45% for Economy, and 46% for Health conditions, respectively, and 51% in the Final model, where all variables selected by domain were considered. Significant environmental predictors of the country-to-country variation of T1D incidence included UV radiation, number of mobile cellular subscriptions in the country, health expenditure per capita, hepatitis B immunization and mean body mass index (BMI). Conclusions The increasing availability of public databases providing information in all global environmental domains should allow new analyses to identify further geographical, behavioral, social and economic factors, or indicators that point to latent causal factors of T1D. PMID:25706995
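
    The multiple-testing step described in this abstract can be illustrated with a minimal sketch. It assumes a conventional significance level of 0.05 (not stated in the abstract) and uses invented p-values and indicator names purely for illustration. Note that a Bonferroni correction, as usually defined, bounds the family-wise error rate by dividing the significance level by the number of tests; the function names below are hypothetical.

```python
from typing import List, Mapping, Optional


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_indicators(
    p_values: Mapping[str, float],
    alpha: float = 0.05,          # assumed level, not taken from the abstract
    n_tests: Optional[int] = None,
) -> List[str]:
    """Return indicator names whose p-value passes the corrected threshold.

    n_tests defaults to the number of p-values supplied; pass it explicitly
    when only a subset of the tested indicators is shown.
    """
    total = len(p_values) if n_tests is None else n_tests
    threshold = bonferroni_threshold(alpha, total)
    return [name for name, p in p_values.items() if p < threshold]


# Hypothetical p-values for three of the 77 indicators mentioned in the abstract.
example_p_values = {
    "uv_radiation": 1e-6,
    "mobile_subscriptions": 4e-4,
    "mean_bmi": 0.01,
}
selected = significant_indicators(example_p_values, alpha=0.05, n_tests=77)
```

    With 77 tests the per-test threshold is 0.05 / 77, so in this invented example the third indicator would not pass even though its raw p-value is below 0.05.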

  11. MethBank: a database integrating next-generation sequencing single-base-resolution DNA methylation programming data.

    PubMed

    Zou, Dong; Sun, Shixiang; Li, Rujiao; Liu, Jiang; Zhang, Jing; Zhang, Zhang

    2015-01-01

    DNA methylation plays crucial roles during embryonic development. Here we present MethBank (http://dnamethylome.org), a DNA methylome programming database that integrates the genome-wide single-base nucleotide methylomes of gametes and early embryos in different model organisms. Unlike extant relevant databases, MethBank incorporates the whole-genome single-base-resolution methylomes of gametes and early embryos at multiple different developmental stages in zebrafish and mouse. MethBank allows users to retrieve methylation levels, differentially methylated regions, CpG islands, gene expression profiles and genetic polymorphisms for a specific gene or genomic region. Moreover, it offers a methylome browser that can visualize high-resolution DNA methylation profiles as well as other related data in an interactive manner, which greatly helps users investigate methylation patterns and changes of gametes and early embryos at different developmental stages. Ongoing efforts are focused on incorporation of methylomes and related data from other organisms. Together, MethBank features integration and visualization of high-resolution DNA methylation data as well as other related data, enabling identification of potential DNA methylation signatures in different developmental stages and accordingly providing an important resource for epigenetic and developmental studies. PMID:25294826

  12. SBMDb: first whole genome putative microsatellite DNA marker database of sugarbeet for bioenergy and industrial applications

    PubMed Central

    Iquebal, Mir Asif; Jaiswal, Sarika; Angadi, U.B.; Sablok, Gaurav; Arora, Vasu; Kumar, Sunil; Rai, Anil; Kumar, Dinesh

    2015-01-01

    DNA markers are valuable tools for increasing crop productivity: they help explain genetic variation and link Quantitative Trait Loci (QTL) to beneficial traits. Earlier approaches to developing Short Tandem Repeat (STR) markers were time-consuming and inefficient. Recent methods that develop STR markers from whole-genome or transcriptome data have gained wide importance, with immense potential for breeding and cultivar improvement. Availability of whole genome sequences and in silico approaches has revolutionized bulk marker discovery. We report the world’s first sugarbeet whole-genome marker discovery, with 145 K markers along with 5 K functional domain markers unified in a common platform, SBMDb, built using MySQL, Apache and PHP. Embedded markers and the corresponding location information can be selected for a desired chromosome or location/interval, and primers can be generated using the Primer3 core integrated at the backend. Our analyses revealed an abundance of ‘mono’ repeats (76.82%) over ‘di’ repeats (13.68%). The highest density (671.05 markers/Mb) was found in chromosome 1 and the lowest (341.27 markers/Mb) in chromosome 6. The current investigation of sugarbeet genome marker density has direct implications for increasing mapping marker density. This will enable present linkage map having marker distance of ∼2 cM, i.e. from 200 to 2.6 Kb, thus facilitating QTL/gene mapping. We also report e-PCR-based detection of 2027 polymorphic markers in a panel of five genotypes. These markers can be used for DUS testing for variety identification and for MAS/GAS in variety improvement programs. The database offers a wide source of potential markers for developing and implementing new approaches for the molecular breeding required to accelerate industrial use of this crop, especially for sugar, health care products, medicines and color dye. Identified markers will also help improve the bioenergy traits of bioethanol and biogas production, while reaping the advantage of crop efficiency in terms of low water and carbon footprint, especially in an era of climate change. Database URL: http://webapp.cabgrid.res.in/sbmdb/ PMID:26647370

  13. SBMDb: first whole genome putative microsatellite DNA marker database of sugarbeet for bioenergy and industrial applications.

    PubMed

    Iquebal, Mir Asif; Jaiswal, Sarika; Angadi, U B; Sablok, Gaurav; Arora, Vasu; Kumar, Sunil; Rai, Anil; Kumar, Dinesh

    2015-01-01

    DNA markers are valuable tools for increasing crop productivity: they help explain genetic variation and link Quantitative Trait Loci (QTL) to beneficial traits. Earlier approaches to developing Short Tandem Repeat (STR) markers were time-consuming and inefficient. Recent methods that develop STR markers from whole-genome or transcriptome data have gained wide importance, with immense potential for breeding and cultivar improvement. Availability of whole genome sequences and in silico approaches has revolutionized bulk marker discovery. We report the world's first sugarbeet whole-genome marker discovery, with 145 K markers along with 5 K functional domain markers unified in a common platform, SBMDb, built using MySQL, Apache and PHP. Embedded markers and the corresponding location information can be selected for a desired chromosome or location/interval, and primers can be generated using the Primer3 core integrated at the backend. Our analyses revealed an abundance of 'mono' repeats (76.82%) over 'di' repeats (13.68%). The highest density (671.05 markers/Mb) was found in chromosome 1 and the lowest (341.27 markers/Mb) in chromosome 6. The current investigation of sugarbeet genome marker density has direct implications for increasing mapping marker density. This will enable present linkage map having marker distance of ∼2 cM, i.e. from 200 to 2.6 Kb, thus facilitating QTL/gene mapping. We also report e-PCR-based detection of 2027 polymorphic markers in a panel of five genotypes. These markers can be used for DUS testing for variety identification and for MAS/GAS in variety improvement programs. The database offers a wide source of potential markers for developing and implementing new approaches for the molecular breeding required to accelerate industrial use of this crop, especially for sugar, health care products, medicines and color dye. Identified markers will also help improve the bioenergy traits of bioethanol and biogas production, while reaping the advantage of crop efficiency in terms of low water and carbon footprint, especially in an era of climate change. Database URL: http://webapp.cabgrid.res.in/sbmdb/. PMID:26647370

  14. Familial searching: a specialist forensic DNA profiling service utilising the National DNA Database to identify unknown offenders via their relatives--the UK experience.

    PubMed

    Maguire, C N; McCallum, L A; Storey, C; Whitaker, J P

    2014-01-01

    The National DNA Database (NDNAD) of England and Wales was established on April 10th 1995. The NDNAD is governed by a variety of legislative instruments that mean that DNA samples can be taken if an individual is arrested and detained in a police station. The biological samples and the DNA profiles derived from them can be used for purposes related to the prevention and detection of crime, the investigation of an offence and for the conduct of a prosecution. Following the South East Asian Tsunami of December 2004, the legislation was amended to allow the use of the NDNAD to assist in the identification of a deceased person or of a body part where death has occurred from natural causes or from a natural disaster. The UK NDNAD now contains the DNA profiles of approximately 6 million individuals representing 9.6% of the UK population. As the science of DNA profiling advanced, the National DNA Database provided a potential resource for increased intelligence beyond the direct matching for which it was originally created. The familial searching service offered to the police by several UK forensic science providers exploits the size and geographic coverage of the NDNAD and the fact that close relatives of an offender may share a significant proportion of that offender's DNA profile and will often reside in close geographic proximity to him or her. Between 2002 and 2011 Forensic Science Service Ltd. (FSS) provided familial search services to support 188 police investigations, 70 of which are still active cases. This technique, which may be used in serious crime cases or in 'cold case' reviews when there are few or no investigative leads, has led to the identification of 41 perpetrators or suspects. In this paper we discuss the processes, utility, and governance of the familial search service in which the NDNAD is searched for close genetic relatives of an offender who has left DNA evidence at a crime scene, but whose DNA profile is not represented within the NDNAD. 
    We discuss the scientific basis of the familial search approach, other DNA-based methods for eliminating individuals from the candidate lists generated by these NDNAD searches, the value of filtering these lists by age, ethnic appearance and geography and the governance required by the NDNAD Strategy Board when a police force commissions a familial search. We present the FSS data in relation to the utility of the familial searching service and demonstrate the power of the technique by reference to casework examples. We comment on the uptake of familial searching of DNA databases in the USA, the Netherlands, Australia, and New Zealand. Finally, following the adverse ruling by the European Court of Human Rights against the UK in regard to the S & Marper cases and the consequent introduction of the Protection of Freedoms Act (2012), we discuss the impact that changes to regulations concerning the storage of DNA samples will have on the continuing provision of familial searching of the National DNA Database in England and Wales. PMID:24315582

  15. [Clinical data mining by exploring public MIMIC-II intensive care database].

    PubMed

    Wang, Jian; Zhang, Zhengbo; Wang, Weidong; Pan, Liang; Chai, Xiaoke

    2014-11-01

    This paper introduces a free and publicly open ICU database: multi-parameter intelligent monitoring in intensive care II (MIMIC-II), which has been built up and maintained by the laboratory of computational physiology at the Massachusetts Institute of Technology, Beth Israel Deaconess Medical Center and Philips Healthcare over the past decade. This paper briefly introduces its infrastructure, implementation and applications in clinical studies. A clinical study of circadian variation in heart rate and blood pressure during sepsis is shown as a typical example of research performed with MIMIC-II. This study found a significant difference in circadian variation in both heart rate and blood pressure between survival and non-survival groups in septic patients. It also tackled several important techniques necessary for the investigation of the circadian rhythm. PMID:25980124

  16. Documentation for the U.S. Geological Survey Public-Supply Database (PSDB): a database of permitted public-supply wells, surface-water intakes, and systems in the United States

    USGS Publications Warehouse

    Price, Curtis V.; Maupin, Molly A.

    2014-01-01

    The purpose of this report is to document the PSDB and explain the methods used to populate and update the data from the SDWIS, State datasets, and map and geospatial imagery. This report describes 3 data tables and 11 domain tables, including field contents, data sources, and relations between tables. Although the PSDB database is not available to the general public, this information should be useful for others who are developing other database systems to store and analyze public-supply system and facility data.

  17. Mitochondrial DNA in the Central European population. Human identification with the help of the forensic mt-DNA D-loop-base database.

    PubMed

    Wittig, H; Augustin, C; Baasner, A; Bulnheim, U; Dimo-Simonin, N; Edelmann, J; Hering, S; Jung, S; Lutz, S; Michael, M; Parson, W; Poetsch, M; Schneider, P M; Weichhold, G; Krause, D

    2000-09-11

    Sequencing of mtDNA is an advanced method for the individualisation of traces. Disadvantages of this method are expensive and time-consuming analysis and evaluation procedures as well as the necessary stock of population-genetic data which is still insufficient. Central European institutes of forensic medicine from Germany, Austria, and Switzerland have been working together since the beginning of 1998 to establish a mtDNA database. The aim is to build up a large stock of forensically established data and provide population-genetic data for frequency investigations, which will serve as a basis for expert opinions and scientific research. Good data quality is ensured by using original sequences only. Ring tests, which have been conducted to enhance analytical reliability, revealed a high correspondence rate of the analytical results obtained by the individual member institutes. Today 1410 sequences are available for comparison, of which 1285 sequences in the HV1 and HV2 regions cover the full ranges from 16051 to 16365 and from 73 to 340 (according to Anderson). The major part is formed by Central European sequences comprising 1256 data sets from Germany, Austria, and Switzerland. Today the database contains sequences from a total of 12 European, six African and three Asian countries including 100 sequences from Japan. This paper is aimed at discussing the individualisation potentials of mtDNA as well as the possibilities and limits of ethnic differentiation by means of pairwise sequence differences on the basis of the data stock available. PMID:10978611

  18. SkyDOT: a publicly accessible variability database, containing multiple sky surveys and real-time data

    SciTech Connect

    Starr, D. L.; Wozniak, P. R.; Vestrand, W. T.

    2002-01-01

    SkyDOT (Sky Database for Objects in Time-Domain) is a Virtual Observatory currently comprised of data from the RAPTOR, ROTSE I, and OGLE II survey projects. This makes it a very large time domain database. In addition, the RAPTOR project provides SkyDOT with real-time variability data as well as stereoscopic information. With its web interface, we believe SkyDOT will be a very useful tool for both astronomers and the public. Our main task has been to construct an efficient relational database containing all existing data, while handling a real-time inflow of data. We also provide a useful web interface allowing easy access to both astronomers and the public. Initially, this server will allow common searches, specific queries, and access to light curves. In the future we will include machine learning classification tools and access to spectral information.

  19. Complementary Value of Databases for Discovery of Scholarly Literature: A User Survey of Online Searching for Publications in Art History

    ERIC Educational Resources Information Center

    Nemeth, Erik

    2010-01-01

    Discovery of academic literature through Web search engines challenges the traditional role of specialized research databases. Creation of literature outside academic presses and peer-reviewed publications expands the content for scholarly research within a particular field. The resulting body of literature raises the question of whether scholars…

  20. Complementary Value of Databases for Discovery of Scholarly Literature: A User Survey of Online Searching for Publications in Art History

    ERIC Educational Resources Information Center

    Nemeth, Erik

    2010-01-01

    Discovery of academic literature through Web search engines challenges the traditional role of specialized research databases. Creation of literature outside academic presses and peer-reviewed publications expands the content for scholarly research within a particular field. The resulting body of literature raises the question of whether scholars…

  1. 76 FR 60031 - Notice of Order: Revisions to Enterprise Public Use Database Incorporating High-Cost Single...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-09-28

    ... census tract identifiers; and a National File B containing unit-level data on all single-family... 1, 2010 (75 FR 41180, 41189 (July 15, 2010)) and shall be effective until such time as FHFA... AGENCY Notice of Order: Revisions to Enterprise Public Use Database Incorporating High-Cost...

  2. Defining new criteria for selection of cell-based intestinal models using publicly available databases

    PubMed Central

    2012-01-01

    Background The criteria for choosing relevant cell lines among a vast panel of available intestinal-derived lines exhibiting a wide range of functional properties are still ill-defined. The objective of this study was, therefore, to establish objective criteria for choosing relevant cell lines to assess their appropriateness as tumor models as well as for drug absorption studies. Results We made use of publicly available expression signatures and cell based functional assays to delineate differences between various intestinal colon carcinoma cell lines and normal intestinal epithelium. We have compared a panel of intestinal cell lines with patient-derived normal and tumor epithelium and classified them according to traits relating to oncogenic pathway activity, epithelial-mesenchymal transition (EMT) and stemness, migratory properties, proliferative activity, transporter expression profiles and chemosensitivity. For example, SW480 represent an EMT-high, migratory phenotype and scored highest in terms of signatures associated to worse overall survival and higher risk of recurrence based on patient derived databases. On the other hand, differentiated HT29 and T84 cells showed gene expression patterns closest to tumor bulk derived cells. Regarding drug absorption, we confirmed that differentiated Caco-2 cells are the model of choice for active uptake studies in the small intestine. Regarding chemosensitivity we were unable to confirm a recently proposed association of chemo-resistance with EMT traits. However, a novel signature was identified through mining of NCI60 GI50 values that allowed to rank the panel of intestinal cell lines according to their drug responsiveness to commonly used chemotherapeutics. Conclusions This study presents a straightforward strategy to exploit publicly available gene expression data to guide the choice of cell-based models. 
    While this approach does not overcome the major limitations of such models, introducing a rank order of selected features may allow selecting model cell lines that are more adapted and pertinent to the addressed biological question. PMID:22726358

  3. Potential translational targets revealed by linking mouse grooming behavioral phenotypes to gene expression using public databases

    PubMed Central

    Roth, Andrew; Kyzar, Evan; Cachat, Jonathan; Stewart, Adam Michael; Green, Jeremy; Gaikwad, Siddharth; O’Leary, Timothy P.; Tabakoff, Boris; Brown, Richard E.; Kalueff, Allan V.

    2014-01-01

    Rodent self-grooming is an important, evolutionarily conserved behavior, highly sensitive to pharmacological and genetic manipulations. Mice with aberrant grooming phenotypes are currently used to model various human disorders. Therefore, it is critical to understand the biology of grooming behavior, and to assess its translational validity to humans. The present in-silico study used publicly available gene expression and behavioral data obtained from several inbred mouse strains in the open-field, light-dark box, elevated plus- and elevated zero-maze tests. As grooming duration differed between strains, our analysis revealed several candidate genes with significant correlations between gene expression in the brain and grooming duration. The Allen Brain Atlas, STRING, GoMiner and Mouse Genome Informatics databases were used to functionally map and analyze these candidate mouse genes against their human orthologs, assessing the strain ranking of their expression and the regional distribution of expression in the mouse brain. This allowed us to identify an interconnected network of candidate genes that have expression levels correlating with grooming behavior, display altered patterns of expression in key brain areas related to grooming, and underlie important functions in the brain. Collectively, our results demonstrate the utility of large-scale, high-throughput data-mining and in-silico modeling for linking genomic and behavioral data, as well as their potential to identify novel neural targets for complex neurobehavioral phenotypes, including grooming. PMID:23123364

  4. BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology

    PubMed Central

    Gilson, Michael K.; Liu, Tiqing; Baitaluk, Michael; Nicola, George; Hwang, Linda; Chong, Jenny

    2016-01-01

    BindingDB, www.bindingdb.org, is a publicly accessible database of experimental protein-small molecule interaction data. Its collection of over a million data entries derives primarily from scientific articles and, increasingly, US patents. BindingDB provides many ways to browse and search for data of interest, including an advanced search tool, which can cross searches of multiple query types, including text, chemical structure, protein sequence and numerical affinities. The PDB and PubMed provide links to data in BindingDB, and vice versa; and BindingDB provides links to pathway information, the ZINC catalog of available compounds, and other resources. The BindingDB website offers specialized tools that take advantage of its large data collection, including ones to generate hypotheses for the protein targets bound by a bioactive compound, and for the compounds bound by a new protein of known sequence; and virtual compound screening by maximal chemical similarity, binary kernel discrimination, and support vector machine methods. Specialized data sets are also available, such as binding data for hundreds of congeneric series of ligands, drawn from BindingDB and organized for use in validating drug design methods. BindingDB offers several forms of programmatic access, and comes with extensive background material and documentation. Here, we provide the first update of BindingDB since 2007, focusing on new and unique features and highlighting directions of importance to the field as a whole. PMID:26481362

  5. BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology.

    PubMed

    Gilson, Michael K; Liu, Tiqing; Baitaluk, Michael; Nicola, George; Hwang, Linda; Chong, Jenny

    2016-01-01

    BindingDB, www.bindingdb.org, is a publicly accessible database of experimental protein-small molecule interaction data. Its collection of over a million data entries derives primarily from scientific articles and, increasingly, US patents. BindingDB provides many ways to browse and search for data of interest, including an advanced search tool, which can cross searches of multiple query types, including text, chemical structure, protein sequence and numerical affinities. The PDB and PubMed provide links to data in BindingDB, and vice versa; and BindingDB provides links to pathway information, the ZINC catalog of available compounds, and other resources. The BindingDB website offers specialized tools that take advantage of its large data collection, including ones to generate hypotheses for the protein targets bound by a bioactive compound, and for the compounds bound by a new protein of known sequence; and virtual compound screening by maximal chemical similarity, binary kernel discrimination, and support vector machine methods. Specialized data sets are also available, such as binding data for hundreds of congeneric series of ligands, drawn from BindingDB and organized for use in validating drug design methods. BindingDB offers several forms of programmatic access, and comes with extensive background material and documentation. Here, we provide the first update of BindingDB since 2007, focusing on new and unique features and highlighting directions of importance to the field as a whole. PMID:26481362

  6. A search for obligatory paternal alleles in a DNA database to find an alleged rapist in a fatherless paternity case.

    PubMed

    Barash, Mark; Reshef, Ayeleth; Voskoboinik, Lev; Zamir, Ashira; Motro, Uzi; Gafny, Ron

    2012-07-01

    A sexual assault case resulted in a pregnancy, which was subsequently aborted. The alleged father of the fetus was unknown. Maternal and fetal types were obtained using the 11-locus AmpFℓSTR® SGM Plus® kit. The national DNA database was searched for the paternal obligatory alleles and detected two suspects who could not be excluded as father of the male fetus. Additional typing using the AmpFℓSTR® Minifiler™ kit, containing three additional autosomal loci, was not sufficient to exclude either suspect. Subsequent typing using the PowerPlex® 16, containing four additional loci, and Y-Filer™ kits resulted in excluding one suspect. Searching a database for paternal obligatory alleles can be fruitful, but is fraught with possible false positive results so that finding a match must be taken as only preliminary evidence. PMID:22390833
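
    The idea of searching for obligatory paternal alleles can be sketched as follows. This is a simplified illustration, not the procedure used in the cited case: it assumes each genotype is a pair of integer allele calls, ignores mutation and typing error, and treats a single locus without an obligatory allele as an exclusion. The locus names are real STR loci, but every genotype and all function names below are invented for the example.

```python
from typing import Dict, FrozenSet, Tuple

Genotype = Tuple[int, int]      # two allele calls at one STR locus
Profile = Dict[str, Genotype]   # locus name -> genotype


def obligate_paternal_alleles(mother: Genotype, child: Genotype) -> FrozenSet[int]:
    """Alleles the biological father must have contributed at one locus."""
    mother_set, child_set = set(mother), set(child)
    if not (mother_set & child_set):
        raise ValueError("mother and child share no allele at this locus")
    if len(child_set) == 1:
        # Homozygous child: one copy came from the father.
        return frozenset(child_set)
    non_maternal = child_set - mother_set
    # If both child alleles also occur in the mother, either may be paternal.
    return frozenset(non_maternal) if non_maternal else frozenset(child_set)


def is_excluded(candidate: Profile, mother: Profile, child: Profile) -> bool:
    """True if the candidate lacks every obligatory allele at some shared locus."""
    for locus, child_genotype in child.items():
        if locus not in candidate or locus not in mother:
            continue  # locus not typed in all three profiles
        required = obligate_paternal_alleles(mother[locus], child_genotype)
        if not required & set(candidate[locus]):
            return True
    return False


# Invented genotypes at two loci.
mother = {"D3S1358": (15, 16), "vWA": (17, 18)}
fetus = {"D3S1358": (15, 17), "vWA": (17, 18)}
candidate_a = {"D3S1358": (17, 17), "vWA": (14, 18)}
candidate_b = {"D3S1358": (14, 15), "vWA": (17, 17)}

not_excluded = [
    name
    for name, profile in (("A", candidate_a), ("B", candidate_b))
    if not is_excluded(profile, mother, fetus)
]
```

    With few loci, several unrelated database profiles can survive this filter by chance, which is consistent with the abstract's caution that a match is only preliminary evidence until more loci are typed.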

  7. A two-locus DNA sequence database for identifying host-specific pathogens and phylogenetic diversity within the Fusarium oxysporum species complex

    Technology Transfer Automated Retrieval System (TEKTRAN)

    An electronically portable two-locus DNA sequence database, comprising partial sequences of the translation elongation factor gene (EF-1a, 634 bp alignment) and nearly complete sequences of the nuclear ribosomal intergenic spacer region (IGS rDNA, 2220 bp alignment) for 850 isolates spanning the phy...

  8. Construction of a public CHO cell line transcript database using versatile bioinformatics analysis pipelines.

    PubMed

    Rupp, Oliver; Becker, Jennifer; Brinkrolf, Karina; Timmermann, Christina; Borth, Nicole; Pühler, Alfred; Noll, Thomas; Goesmann, Alexander

    2014-01-01

    Chinese hamster ovary (CHO) cell lines represent the most commonly used mammalian expression system for the production of therapeutic proteins. In this context, detailed knowledge of the CHO cell transcriptome might help to improve biotechnological processes conducted by specific cell lines. Nevertheless, very few assembled cDNA sequences of CHO cells were publicly released until recently, which puts a severe limitation on biotechnological research. Two extended annotation systems and web-based tools, one for browsing eukaryotic genomes (GenDBE) and one for viewing eukaryotic transcriptomes (SAMS), were established as the first step towards a publicly usable CHO cell genome/transcriptome analysis platform. This is complemented by the development of a new strategy to assemble the ca. 100 million reads, sequenced from a broad range of diverse transcripts, to a high quality CHO cell transcript set. The cDNA libraries were constructed from different CHO cell lines grown under various culture conditions and sequenced using Roche/454 and Illumina sequencing technologies in addition to sequencing reads from a previous study. Two pipelines to extend and improve the CHO cell line transcripts were established. First, de novo assemblies were carried out with the Trinity and Oases assemblers, using varying k-mer sizes. The resulting contigs were screened for potential CDS using ESTScan. Redundant contigs were filtered out using cd-hit-est. The remaining CDS contigs were re-assembled with CAP3. Second, a reference-based assembly with the TopHat/Cufflinks pipeline was performed, using the recently published draft genome sequence of CHO-K1 as reference. Additionally, the de novo contigs were mapped to the reference genome using GMAP and merged with the Cufflinks assembly using the cuffmerge software. With this approach 28,874 transcripts located on 16,492 gene loci could be assembled. 
    Combining the results of both approaches, 65,561 transcripts were identified for CHO cell lines, which could be clustered by sequence identity into 17,598 gene clusters. PMID:24427317

  9. Towards standards for data exchange and integration and their impact on a public database such as CEBS (Chemical Effects in Biological Systems)

    SciTech Connect

    Fostel, Jennifer M.

    2008-11-15

    Integration, re-use and meta-analysis of high content study data, typical of DNA microarray studies, can increase its scientific utility. Access to study data and design parameters would enhance the mining of data integrated across studies. However, without standards for which data to include in exchange, and common exchange formats, publication of high content data is time-consuming and often prohibitive. The MGED Society (www.mged.org) was formed in response to the widespread publication of microarray data, and the recognition of the utility of data re-use for meta-analysis. The NIEHS has developed the Chemical Effects in Biological Systems (CEBS) database, which can manage and integrate study data and design from biological and biomedical studies. As community standards are developed for study data and metadata it will become increasingly straightforward to publish high content data in CEBS, where they will be available for meta-analysis. Different exchange formats for study data are being developed: Standard for Exchange of Nonclinical Data (SEND; www.cdisc.org); Tox-ML (www.Leadscope.com) and Simple Investigation Formatted Text (SIFT) from the NIEHS. Data integration can be done at the level of conclusions about responsive genes and phenotypes, and this workflow is supported by CEBS. CEBS also integrates raw and preprocessed data within a given platform. The utility of, and a method for, integrating data within and across DNA microarray studies are shown in an example analysis using DrugMatrix data deposited in CEBS by Iconix Pharmaceuticals.

  10. The TTSMI database: a catalog of triplex target DNA sites associated with genes and regulatory elements in the human genome.

    PubMed

    Jenjaroenpun, Piroon; Chew, Chee Siang; Yong, Tai Pang; Choowongkomon, Kiattawee; Thammasorn, Wimada; Kuznetsov, Vladimir A

    2015-01-01

    A triplex target DNA site (TTS), a stretch of DNA that is composed of polypurines, is able to form a triple-helix (triplex) structure with triplex-forming oligonucleotides (TFOs) and is able to influence the site-specific modulation of gene expression and/or the modification of genomic DNA. The co-localization of a genomic TTS with gene regulatory signals and functional genome structures suggests that TFOs could potentially be exploited in antigene strategies for the therapy of cancers and other genetic diseases. Here, we present the TTS Mapping and Integration (TTSMI; http://ttsmi.bii.a-star.edu.sg) database, which provides a catalog of unique TTS locations in the human genome and tools for analyzing the co-localization of TTSs with genomic regulatory sequences and signals that were identified using next-generation sequencing techniques and/or predicted by computational models. TTSMI was designed as a user-friendly tool that facilitates (i) fast searching/filtering of TTSs using several search terms and criteria associated with sequence stability and specificity, (ii) interactive filtering of TTSs that co-localize with gene regulatory signals and non-B DNA structures, (iii) exploration of dynamic combinations of the biological signals of specific TTSs and (iv) visualization of a TTS simultaneously with diverse annotation tracks via the UCSC genome browser. PMID:25324314
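    The abstract defines a TTS as a stretch of DNA composed of polypurines. A minimal sketch of how such stretches could be located in a sequence follows. It is not the TTSMI pipeline: the function name and the minimum length of 15 bases are assumptions made for illustration, and no stability or specificity scoring is attempted.

```python
import re

MIN_LEN = 15  # assumed minimum tract length; not a value taken from the abstract


def find_polypurine_tracts(seq, min_len=MIN_LEN):
    """Return sorted (start, end, strand) tuples for maximal purine runs.

    A run of pyrimidines (C/T) on the given strand is a purine run on the
    complementary strand, so it is reported with strand "-".
    Coordinates are 0-based and end-exclusive on the given strand.
    """
    seq = seq.upper()
    hits = []
    for pattern, strand in ((r"[AG]+", "+"), (r"[CT]+", "-")):
        for match in re.finditer(pattern, seq):
            if match.end() - match.start() >= min_len:
                hits.append((match.start(), match.end(), strand))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence: a 16-base purine run and a 16-base pyrimidine run.
    demo = "CT" + "AG" * 8 + "CATG" + "CT" * 8 + "GA"
    for start, end, strand in find_polypurine_tracts(demo):
        print(start, end, strand, demo[start:end])
```

    Scanning both base classes in one pass avoids building the reverse complement; a real catalog would also need filters for uniqueness in the genome, which this sketch leaves out.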

  11. The electric dipole moment of DNA-binding HU protein calculated by the use of an NMR database.

    PubMed

    Takashima, S; Yamaoka, K

    1999-08-30

    Electric birefringence measurements indicated the presence of a large permanent dipole moment in the HU protein-DNA complex. In order to substantiate this observation, numerical computation of the dipole moment of HU protein homodimer was carried out by using NMR protein databases. The dipole moments of globular proteins have hitherto been calculated with X-ray databases and NMR data have never been used before. The advantages of NMR databases are: (a) NMR data are obtained, unlike X-ray databases, using protein solutions. Accordingly, this method eliminates the bothersome question as to the possible alteration of the protein structure due to the transition from the crystalline state to the solution state. This question is particularly important for proteins such as HU protein, which has some degree of internal flexibility; (b) the three-dimensional coordinates of hydrogen atoms in protein molecules can be determined with a sufficient resolution and this enables the N-H as well as C=O bond moments to be calculated. Since the NMR database of HU protein from Bacillus stearothermophilus consists of 25 models, the surface charge as well as the core dipole moments were computed for each of these structures. The results of these calculations show that the net permanent dipole moment of HU protein homodimer is approximately 500-530 D (1 D = 3.33 × 10⁻³⁰ C·m) at pH 7.5 and 600-630 D at the isoelectric point (pH 10.5). These permanent dipole moments are unusually large for a small protein of the size of 19.5 kDa. Nevertheless, the result of numerical calculations is compatible with the electro-optical observation, confirming a very large dipole moment in this protein. PMID:10483709
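    The kind of calculation described above can be illustrated with a point-charge dipole sum. This is only a sketch under stated assumptions: charges are given in units of the elementary charge, coordinates in ångströms, and the dipole is taken about the geometric centroid (the abstract does not state the origin the authors used, and for a molecule with net charge the result depends on that choice). The function name is hypothetical.

```python
import numpy as np

E_CHARGE = 1.602176634e-19  # elementary charge in coulombs
ANGSTROM = 1e-10            # metres per angstrom
DEBYE = 3.33564e-30         # coulomb-metres per debye


def dipole_moment_debye(charges_e, coords_angstrom):
    """Magnitude of the dipole moment (in debye) of a set of point charges.

    charges_e       : sequence of charges in units of e
    coords_angstrom : (N, 3) array of positions in angstroms
    The origin is the centroid of the coordinates (an assumption).
    """
    q = np.asarray(charges_e, dtype=float)
    r = np.asarray(coords_angstrom, dtype=float)
    origin = r.mean(axis=0)
    mu = (q[:, None] * (r - origin)).sum(axis=0)  # vector in e * angstrom
    return float(np.linalg.norm(mu) * E_CHARGE * ANGSTROM / DEBYE)


if __name__ == "__main__":
    # Unit charges of opposite sign separated by 1 angstrom.
    print(dipole_moment_debye([1.0, -1.0], [[0, 0, 0], [1, 0, 0]]))
```

    For an NMR ensemble such as the 25 models mentioned in the abstract, the same function could be applied to each model in turn and the values averaged.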

  12. Leading-edge forensic DNA analyses and the necessity of including crime scene investigators, police officers and technicians in a DNA elimination database.

    PubMed

    Lapointe, Martine; Rogic, Anita; Bourgoin, Sarah; Jolicoeur, Christine; Séguin, Diane

    2015-11-01

    In recent years, sophisticated technology has significantly increased the sensitivity and analytical power of genetic analyses so that very little starting material may now produce viable genetic profiles. This sensitivity, however, has also increased the risk of detecting unknown genetic profiles that are assumed to be those of the perpetrator yet originate from extraneous sources such as crime scene workers. These contaminants may mislead investigations, keeping criminal cases active and unresolved for long spans of time. Voluntary submission of DNA samples from crime scene workers is fairly low; we have therefore created a promotional method for our staff elimination database that has resulted in a significant increase in voluntary samples since 2011. Our database enforces privacy safeguards and allows for optional anonymity to all staff members. We also offer information sessions at various police precincts to advise crime scene workers of the importance and success of our staff elimination database. This study, a pioneer in its field, has obtained 327 voluntary submissions from crime scene workers to date, of which 46 individual profiles (14%) have been matched to 58 criminal cases. By implementing our methods and respect for individual privacy, forensic laboratories everywhere may see similar growth and success in explaining unidentified genetic profiles in stagnant criminal cases. PMID:26117338

  13. Contamination of cDNA libraries and expressed sequence-tags databases

    SciTech Connect

    Dean, M.; Allikmets, R.

    1995-11-01

    Partially sequenced cDNAs, or expressed sequence tags (ESTs), are claimed to represent an efficient strategy for characterizing an organism's genes. By necessity, these sequences are incompletely characterized, and examples of contamination of cDNA libraries with sequences from other species have been described. It has been suggested that a Human T-cell cDNA library (Clontech HL1963g) is contaminated by sequences from yeast (Saccharomyces cerevisiae) and an unknown bacterium. We are characterizing human ESTs that represent new members of the ATP-binding cassette transporter super-family. In examining human ESTs generated from the T-cell library, we have encountered one gene that was in fact a yeast sequence (Genbank Z15214 = SSH2 locus) and several genes that do not hybridize to human DNA or RNA. PCR primers from these sequences failed to amplify a product from human, yeast, or Escherichia coli DNA but did produce a product from a Clontech kidney cDNA library (HL1123a). To determine the source of the contamination, we amplified a conserved segment of the 16S rDNA (following a suggestion from Dr. C. Savakis) from the kidney library. The sequence of this product was nearly identical to that of the bacterium Leuconostoc lactis (300 of 304 bp). Leuconostoc species are commonly found in dairy products, fruits, vegetables, and wine and are nonpathogenic to humans. 6 refs., 1 fig.

  14. Searching for first-degree familial relationships in California's offender DNA database: validation of a likelihood ratio-based approach.

    PubMed

    Myers, Steven P; Timken, Mark D; Piucci, Matthew L; Sims, Gary A; Greenwald, Michael A; Weigand, James J; Konzak, Kenneth C; Buoncristiani, Martin R

    2011-11-01

    A validation study was performed to measure the effectiveness of using a likelihood ratio-based approach to search for possible first-degree familial relationships (full-sibling and parent-child) by comparing an evidence autosomal short tandem repeat (STR) profile to California's ∼1,000,000-profile State DNA Index System (SDIS) database. Test searches used autosomal STR and Y-STR profiles generated for 100 artificial test families. When the test sample and the first-degree relative in the database were characterized at the 15 Identifiler® (Applied Biosystems®, Foster City, CA) STR loci, the search procedure included 96% of the fathers and 72% of the full-siblings. When the relative profile was limited to the 13 Combined DNA Index System (CODIS) core loci, the search procedure included 93% of the fathers and 61% of the full-siblings. These results, combined with those of functional tests using three real families, support the effectiveness of this tool. Based upon these results, the validated approach was implemented as a key, pragmatic and demonstrably practical component of the California Department of Justice's Familial Search Program. An investigative lead created through this process recently led to an arrest in the Los Angeles Grim Sleeper serial murders. PMID:21056023
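    To make the idea of a likelihood ratio for a first-degree relationship concrete, here is a textbook-style sketch of a per-locus kinship likelihood ratio built from identity-by-descent (IBD) probabilities. It is not the validated California procedure: it assumes Hardy-Weinberg proportions, independent loci, no mutation and no population-structure correction, and all function names, locus names and allele frequencies are invented for illustration.

```python
from collections import Counter

# Probabilities of sharing 0, 1 or 2 alleles identical by descent.
IBD = {
    "parent_child": (0.0, 1.0, 0.0),
    "full_sibling": (0.25, 0.5, 0.25),
}


def genotype_prob(genotype, freq):
    """Hardy-Weinberg probability of an unordered genotype (a, b)."""
    a, b = genotype
    return freq[a] ** 2 if a == b else 2 * freq[a] * freq[b]


def locus_lr(evidence, candidate, freq, relation):
    """LR for 'evidence donor is a relative of candidate' versus 'unrelated'."""
    k0, k1, k2 = IBD[relation]
    x, y = evidence
    counts = Counter(candidate)
    p_evidence = genotype_prob(evidence, freq)
    # One allele is inherited from the candidate, the other is drawn at random.
    if x == y:
        p_ibd1 = (counts[x] / 2) * freq[x]
    else:
        p_ibd1 = (counts[x] / 2) * freq[y] + (counts[y] / 2) * freq[x]
    p_ibd2 = 1.0 if sorted(evidence) == sorted(candidate) else 0.0
    return k0 + (k1 * p_ibd1 + k2 * p_ibd2) / p_evidence


def profile_lr(evidence, candidate, freqs, relation):
    """Multiply per-locus LRs over the loci typed in the evidence profile."""
    lr = 1.0
    for locus, genotype in evidence.items():
        lr *= locus_lr(genotype, candidate[locus], freqs[locus], relation)
    return lr


if __name__ == "__main__":
    freqs = {"locus1": {"A": 0.1, "B": 0.3, "C": 0.2}}  # invented frequencies
    evidence = {"locus1": ("A", "C")}
    candidate = {"locus1": ("A", "B")}
    print(profile_lr(evidence, candidate, freqs, "parent_child"))
```

    In a database search of this style, each stored profile would be scored this way and candidates ranked by their combined ratio; how a threshold is chosen is outside the scope of the sketch.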

  15. Errors in the interpretation of copy number variations due to the use of public databases as a reference.

    PubMed

    Bastida-Lertxundi, Nerea; López-López, Elixabet; Piñán, M Angeles; Puiggros, Anna; Navajas, Aurora; Solé, Francesc; García-Orad, Africa

    2014-04-01

    The identification of new cryptic deletions and duplications can be used to improve prognostic classification in cancer. To obtain accurate results, it is necessary to discriminate between somatic alterations in the tumor cell and germline polymorphisms. For this purpose, copy number variation (CNV) public databases have been used as a reference. Nevertheless, the use of these databases may lead to erroneous results. Our main goal was to explore the limitations of the use of CNV databases, such as the Database of Genomic Variants (DGV), as the reference. To that end, we used pediatric acute lymphoblastic leukemia (ALL) as a model. We analyzed the genome-wide copy number profile of 23 ALL patients and conducted a comparison of the results obtained using the DGV with those obtained using the normal sample from the patient as the reference. Using only the DGV, 19% of alterations and 41% of polymorphisms were erroneously catalogued. Our results support the hypothesis that with the use of databases such as the DGV as the reference, a high percentage of the variations can be erroneously classified. PMID:24767712
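    The comparison described above, classifying each tumor alteration against a public database and against the patient's own normal sample, can be sketched as an interval-overlap exercise. The following is an illustration only: the 50% reciprocal-overlap rule, the tuple layout and the function names are assumptions, copy-number direction (gain or loss) is ignored, and none of this is taken from the study's methods.

```python
def overlap_fraction(a, b):
    """Fraction of interval a covered by interval b; intervals are (chrom, start, end)."""
    if a[0] != b[0]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    return max(shared, 0) / (a[2] - a[1])


def classify(tumor_cnvs, reference_cnvs, min_reciprocal=0.5):
    """Label each tumor CNV 'polymorphism' if a reference CNV matches it, else 'somatic'."""
    labels = {}
    for t in tumor_cnvs:
        matched = any(
            overlap_fraction(t, r) >= min_reciprocal
            and overlap_fraction(r, t) >= min_reciprocal
            for r in reference_cnvs
        )
        labels[t] = "polymorphism" if matched else "somatic"
    return labels


def discordant_calls(tumor_cnvs, database_cnvs, paired_normal_cnvs):
    """Tumor CNVs whose label differs between the two choices of reference."""
    by_database = classify(tumor_cnvs, database_cnvs)
    by_normal = classify(tumor_cnvs, paired_normal_cnvs)
    return [t for t in tumor_cnvs if by_database[t] != by_normal[t]]


if __name__ == "__main__":
    tumor = [("chr1", 100, 200), ("chr2", 500, 900)]  # toy coordinates
    database = [("chr1", 90, 210)]
    normal = [("chr2", 480, 920)]
    print(discordant_calls(tumor, database, normal))
```

    Treating the paired normal sample as the reference, the fraction of discordant calls gives a rough sense of how often a database-only reference would mislabel an alteration.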

  16. The Human Transcript Database: A Catalogue of Full Length cDNA Inserts

    SciTech Connect

    Bouck, John; McLeod, Michael; Worley, Kim; Gibbs, Richard

    1999-09-10

    The BCM Search Launcher provided improved access to web-based sequence analysis services during the granting period and beyond. The Search Launcher web site grouped analysis procedures by function and provided default parameters that provided reasonable search results for most applications. For instance, most queries were automatically masked for repeat sequences prior to sequence database searches to avoid spurious matches. In addition to the web-based access and arrangements that were made using the functions easier, the BCM Search Launcher provided unique value-added applications like the BEAUTY sequence database search tool that combined information about protein domains and sequence database search results to give an enhanced, more complete picture of the reliability and relative value of the information reported. This enhanced search tool made evaluating search results more straight-forward and consistent. Some of the favorite features of the web site are the sequence utilities and the batch client functionality that allows processing of multiple samples from the command line interface. One measure of the success of the BCM Search Launcher is the number of sites that have adopted the models first developed on the site. The graphic display on the BLAST search from the NCBI web site is one such outgrowth, as is the display of protein domain search results within BLAST search results, and the design of the Biology Workbench application. The logs of usage and comments from users confirm the great utility of this resource.

  17. PEDE (Pig EST Data Explorer): construction of a database for ESTs derived from porcine full-length cDNA libraries.

    PubMed

    Uenishi, Hirohide; Eguchi, Tomoko; Suzuki, Kohei; Sawazaki, Tetsuya; Toki, Daisuke; Shinkai, Hiroki; Okumura, Naohiko; Hamasima, Noriyuki; Awata, Takashi

    2004-01-01

    We generated the PEDE (Pig EST Data Explorer; http://pede.dna.affrc.go.jp/) database using sequences assembled from porcine 5' ESTs from oligo-capped full-length cDNA libraries. Thus far we have performed EST analysis of various organs (thymus, spleen, uterus, lung, liver, ovary and peripheral blood mononuclear cells) and assembled 68,076 high-quality sequences into 5546 contigs and 28,461 singlets. PEDE provides a search interface for getting results of homology searches and enables users to obtain information on sequence data and cDNA clones of interest. Single-nucleotide polymorphisms detected through comparison of the EST sequences are classified by origin (western and oriental breeds) and are searchable in the database. This database system can accelerate analyses of livestock traits and yields information that can lead to new applications in pigs as model systems for medical research. PMID:14681463

  18. Database and online map service on unstable rock slopes in Norway - From data perpetuation to public information

    NASA Astrophysics Data System (ADS)

    Oppikofer, Thierry; Nordahl, Bobo; Bunkholt, Halvor; Nicolaisen, Magnus; Jarna, Alexandra; Iversen, Sverre; Hermanns, Reginald L.; Böhme, Martina; Yugsi Molina, Freddy X.

    2015-11-01

    The unstable rock slope database is developed and maintained by the Geological Survey of Norway as part of the systematic mapping of unstable rock slopes in Norway. This mapping aims to detect catastrophic rock slope failures before they occur. More than 250 unstable slopes with post-glacial deformation are detected up to now. The main aims of the unstable rock slope database are (1) to serve as a national archive for unstable rock slopes in Norway; (2) to serve for data collection and storage during field mapping; (3) to provide decision-makers with hazard zones and other necessary information on unstable rock slopes for land-use planning and mitigation; and (4) to inform the public through an online map service. The database is organized hierarchically with a main point for each unstable rock slope to which several feature classes and tables are linked. This main point feature class includes several general attributes of the unstable rock slopes, such as site name, general and geological descriptions, executed works, recommendations, technical parameters (volume, lithology, mechanism and others), displacement rates, possible consequences, as well as hazard and risk classification. Feature classes and tables linked to the main feature class include different scenarios of an unstable rock slope, field observation points, sampling points for dating, displacement measurement stations, lineaments, unstable areas, run-out areas, areas affected by secondary effects, along with tables for hazard and risk classification and URL links to further documentation and references. The database on unstable rock slopes in Norway will be publicly consultable through an online map service. Factsheets with key information on unstable rock slopes can be automatically generated and downloaded for each site. 
Areas of possible rock avalanche run-out and their secondary effects displayed in the online map service, along with hazard and risk assessments, will become important tools for land-use planning. The present database will further evolve in the coming years as the systematic mapping progresses and as available techniques and tools evolve.

  19. Demographic and experiential correlates of public attitudes towards cell-free fetal DNA screening.

    PubMed

    Sayres, Lauren C; Allyse, Megan; Goodspeed, Taylor A; Cho, Mildred K

    2014-12-01

    This study seeks to inform clinical application of cell-free fetal DNA (cffDNA) screening as a novel method for prenatal trisomy detection by investigating public attitudes towards this technology and demographic and experiential characteristics related to these attitudes. Two versions of a 25-item survey assessing interest in cffDNA and existing first-trimester combined screening for either trisomy 13 and 18 or trisomy 21 were distributed among 3,164 members of the United States public. Logistic regression was performed to determine variables predictive of interest in screening options. Approximately 47% of respondents expressed an interest in cffDNA screening for trisomy 13, 18, and 21, with a majority interested in cffDNA screening as a stand-alone technique. A significantly greater percent would consider termination of pregnancy following a diagnosis of trisomy 13 or 18 (52%) over one of trisomy 21 (44%). Willingness to consider abortion of an affected pregnancy was the strongest correlate to interest in both cffDNA and first-trimester combined screening, although markedly more respondents expressed an interest in some form of screening (69% and 71%, respectively) than would consider termination. Greater educational attainment, higher income, and insurance coverage predicted interest in cffDNA screening; stronger religious identification also corresponded to decreased interest. Prior experience with disability and genetic testing was associated with increased interest in cffDNA screening. Several of these factors, in addition to advanced age and Asian race, were, in turn, predictive of respondents' increased willingness to consider post-diagnosis termination of pregnancy. In conclusion, divergent attitudes towards cffDNA screening--and prenatal options more generally--appear correlated with individual socioeconomic and religious backgrounds and experiences with disability and genetic testing. 
Clinical implementation and counseling for novel prenatal technologies should take these diverse stakeholder values into consideration. PMID:24715419

  20. Discovery of Possible Gene Relationships through the Application of Self-Organizing Maps to DNA Microarray Databases

    PubMed Central

    Chavez-Alvarez, Rocio; Chavoya, Arturo; Mendez-Vazquez, Andres

    2014-01-01

    DNA microarrays and cell cycle synchronization experiments have made possible the study of the mechanisms of cell cycle regulation of Saccharomyces cerevisiae by simultaneously monitoring the expression levels of thousands of genes at specific time points. On the other hand, pattern recognition techniques can contribute to the analysis of such massive measurements, providing a model of gene expression level evolution through the cell cycle process. In this paper, we propose the use of one such technique, an unsupervised artificial neural network called a Self-Organizing Map (SOM), which has been successfully applied to processes involving very noisy signals, classifying and organizing them, and assisting in the discovery of behavior patterns without requiring prior knowledge about the process under analysis. As a test bed for the use of SOMs in finding possible relationships among genes and their possible contribution in some biological processes, we selected 282 S. cerevisiae genes that have been shown through biological experiments to have an activity during the cell cycle. The expression level of these genes was analyzed in five of the most cited time series DNA microarray databases used in the study of the cell cycle of this organism. With the use of SOM, it was possible to find clusters of genes with similar behavior in the five databases along two cell cycles. This result suggested that some of these genes might be biologically related or might have a regulatory relationship, as was corroborated by comparing some of the clusters obtained with SOMs against a previously reported regulatory network that was generated using biological knowledge, such as protein-protein interactions, gene expression levels, metabolism dynamics, promoter binding, and modification, regulation and transport of proteins. The methodology described in this paper could be applied to the study of gene relationships of other biological processes in different organisms. PMID:24699245
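    For readers unfamiliar with Self-Organizing Maps, a minimal NumPy sketch of the training loop is given here. It is a generic SOM, not the authors' implementation: the grid size, the linear decay of learning rate and neighbourhood width, and all function and parameter names are assumptions chosen for illustration. Each row of the input would be one gene's expression profile over time.

```python
import numpy as np


def best_matching_unit(weights, x):
    """Grid index (row, col) of the unit whose weight vector is closest to x."""
    dist = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)


def train_som(data, rows=4, cols=4, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Train a rows x cols SOM on data of shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every unit, shape (rows, cols, 2).
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.1)   # neighbourhood shrinks, with a floor
            bmu = np.array(best_matching_unit(weights, x))
            d2 = ((grid - bmu) ** 2).sum(axis=-1)     # squared grid distance to the BMU
            h = np.exp(-d2 / (2.0 * sigma ** 2))      # Gaussian neighbourhood function
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    profiles = np.vstack([
        rng.normal(0.0, 0.05, size=(20, 6)),  # synthetic "low expression" profiles
        rng.normal(1.0, 0.05, size=(20, 6)),  # synthetic "high expression" profiles
    ])
    som = train_som(profiles)
    print(best_matching_unit(som, profiles[0]), best_matching_unit(som, profiles[-1]))
```

    Genes whose profiles map to the same or neighbouring units form the clusters that would then be compared across datasets.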

  1. Discovery of possible gene relationships through the application of self-organizing maps to DNA microarray databases.

    PubMed

    Chavez-Alvarez, Rocio; Chavoya, Arturo; Mendez-Vazquez, Andres

    2014-01-01

    DNA microarrays and cell cycle synchronization experiments have made possible the study of the mechanisms of cell cycle regulation of Saccharomyces cerevisiae by simultaneously monitoring the expression levels of thousands of genes at specific time points. On the other hand, pattern recognition techniques can contribute to the analysis of such massive measurements, providing a model of gene expression level evolution through the cell cycle process. In this paper, we propose the use of one such technique, an unsupervised artificial neural network called a Self-Organizing Map (SOM), which has been successfully applied to processes involving very noisy signals, classifying and organizing them, and assisting in the discovery of behavior patterns without requiring prior knowledge about the process under analysis. As a test bed for the use of SOMs in finding possible relationships among genes and their possible contribution in some biological processes, we selected 282 S. cerevisiae genes that have been shown through biological experiments to have an activity during the cell cycle. The expression level of these genes was analyzed in five of the most cited time series DNA microarray databases used in the study of the cell cycle of this organism. With the use of SOM, it was possible to find clusters of genes with similar behavior in the five databases along two cell cycles. This result suggested that some of these genes might be biologically related or might have a regulatory relationship, as was corroborated by comparing some of the clusters obtained with SOMs against a previously reported regulatory network that was generated using biological knowledge, such as protein-protein interactions, gene expression levels, metabolism dynamics, promoter binding, and modification, regulation and transport of proteins. The methodology described in this paper could be applied to the study of gene relationships of other biological processes in different organisms. PMID:24699245

  2. PreDREM: a database of predicted DNA regulatory motifs from 349 human cell and tissue samples.

    PubMed

    Zheng, Yiyu; Li, Xiaoman; Hu, Haiyan

    2015-01-01

    PreDREM is a database of DNA regulatory motifs and motif modules predicted from DNase I hypersensitive sites in 349 human cell and tissue samples. It contains 845-1325 predicted motifs in each sample, which result in a total of 2684 non-redundant motifs. In comparison with seven large collections of known motifs, more than 84% of the 2684 predicted motifs are similar to the known motifs, and 54-76% of the known motifs are similar to the predicted motifs. PreDREM also stores 43 663-20 13 288 motif modules in each sample, which provide the cofactor motifs of each predicted motif. Compared with motifs of known interacting transcription factor (TF) pairs in eight resources, on average, 84% of motif pairs corresponding to known interacting TF pairs are included in the predicted motif modules. Through its web interface, PreDREM allows users to browse motif information by tissues, datasets, individual non-redundant motifs, etc. Users can also search motifs, motif modules, instances of motifs and motif modules in given genomic regions, tissue or cell types in which a motif occurs, etc. PreDREM thus provides a useful resource for the understanding of cell- and tissue-specific gene regulation in the human genome. Database URL: http://server.cs.ucf.edu/predrem/. PMID:25725063

  3. PreDREM: a database of predicted DNA regulatory motifs from 349 human cell and tissue samples

    PubMed Central

    Zheng, Yiyu; Li, Xiaoman; Hu, Haiyan

    2015-01-01

    PreDREM is a database of DNA regulatory motifs and motif modules predicted from DNase I hypersensitive sites in 349 human cell and tissue samples. It contains 845-1325 predicted motifs in each sample, which result in a total of 2684 non-redundant motifs. In comparison with seven large collections of known motifs, more than 84% of the 2684 predicted motifs are similar to the known motifs, and 54-76% of the known motifs are similar to the predicted motifs. PreDREM also stores 43 663-20 13 288 motif modules in each sample, which provide the cofactor motifs of each predicted motif. Compared with motifs of known interacting transcription factor (TF) pairs in eight resources, on average, 84% of motif pairs corresponding to known interacting TF pairs are included in the predicted motif modules. Through its web interface, PreDREM allows users to browse motif information by tissues, datasets, individual non-redundant motifs, etc. Users can also search motifs, motif modules, instances of motifs and motif modules in given genomic regions, tissue or cell types in which a motif occurs, etc. PreDREM thus provides a useful resource for the understanding of cell- and tissue-specific gene regulation in the human genome. Database URL: http://server.cs.ucf.edu/predrem/. PMID:25725063

  4. Implications for DNA identification arising from an analysis of Australian forensic databases.

    PubMed

    Ayres, Karen L; Chaseling, Janet; Balding, David J

    2002-09-26

    Previous analyses of Australian samples have suggested that populations of the same broad racial group (Caucasian, Asian, Aboriginal) tend to be genetically similar across states. This suggests that a single national Australian database for each such group may be feasible, which would greatly facilitate casework. We have investigated samples drawn from each of these groups in different Australian states, and have quantified the genetic homogeneity across states within each racial group in terms of the "coancestry coefficient" F(ST). In accord with earlier results, we find that F(ST) values, as estimated from these data, are very small for Caucasians and Asians, usually <0.5%. We find that "declared" Aborigines (which includes many with partly Aboriginal genetic heritage) are also genetically similar across states, although they display some differentiation from a "pure" Aboriginal population (almost entirely of Aboriginal genetic heritage). PMID:12243876
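    As a rough illustration of what an F(ST) value measures, the sketch below computes the classical heterozygosity-based form, F_ST = (H_T - H_S) / H_T, for one locus from subpopulation allele frequencies. This simple estimator is swapped in for illustration and is not claimed to be the estimation method used in the study; equal subpopulation weights, the function names and the example frequencies are all assumptions.

```python
def expected_heterozygosity(freqs):
    """1 minus the sum of squared allele frequencies for one population."""
    return 1.0 - sum(p * p for p in freqs.values())


def fst(subpop_freqs):
    """Heterozygosity-based F_ST for one locus.

    subpop_freqs: list of {allele: frequency} dicts, one per subpopulation,
    weighted equally (an assumption of this sketch).
    """
    n = len(subpop_freqs)
    h_s = sum(expected_heterozygosity(f) for f in subpop_freqs) / n
    alleles = set().union(*subpop_freqs)
    pooled = {a: sum(f.get(a, 0.0) for f in subpop_freqs) / n for a in alleles}
    h_t = expected_heterozygosity(pooled)
    if h_t == 0.0:
        return 0.0  # a single fixed allele everywhere: no variation to partition
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Invented frequencies for two hypothetical state samples.
    state_a = {"12": 0.6, "13": 0.4}
    state_b = {"12": 0.4, "13": 0.6}
    print(fst([state_a, state_b]))
```

    Values near zero, such as the sub-0.5% figures reported above, indicate that allele frequencies barely differ between the subpopulations being compared.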

  5. Identifying contributors of two-person DNA mixtures by familial database search.

    PubMed

    Chung, Yuk-Ka; Fung, Wing K

    2013-01-01

    The role of familial database search as a crime-solving tool has been increasingly recognized by forensic scientists. As an enhancement to the existing familial search approach on single source cases, this article presents our current progress in exploring the potential use of familial search to mixture cases. A novel method was established to predict the outcome of the search, from which a simple strategy for determining an appropriate scale of investigation by the police force is developed. Illustrated by an example using Swedish data, our approach is shown to have the potential for assisting the police force to decide on the scale of investigation, thereby achieving desirable crime-solving rate with reasonable cost. PMID:22270047

  6. CARL Corporation to Market Knight Ridder DIALOG Databases to the Academic and Public Library Market.

    ERIC Educational Resources Information Center

    Machovec, George S.

    1996-01-01

    With the advent of CD-ROMs, libraries began to limit online searching via DIALOG. To increase DIALOG's market share, Colorado Alliance of Research Libraries (CARL) Corporation is developing graphical user interfaces using World Wide Web and Windows technology and has reached agreements with Knight Ridder Information and with most of their database

  7. [Organic Law 10/2007, of October 8, regulating the police database on identifiers obtained from DNA: historic background and genetic view].

    PubMed

    García, Oscar

    2007-01-01

    Organic Law 10/2007 of 8 October, which regulates the police database on identifiers obtained from DNA, recently entered into force. The author describes the process by which this law was approved and examines some of its aspects from a genetic perspective. PMID:18330105

  8. Database on natural polymorphisms and resistance-related non-synonymous mutations in thymidine kinase and DNA polymerase genes of herpes simplex virus types 1 and 2.

    PubMed

    Sauerbrei, Andreas; Bohn-Wippert, Kathrin; Kaspar, Marisa; Krumbholz, Andi; Karrasch, Matthias; Zell, Roland

    2016-01-01

    The use of genotypic resistance testing of herpes simplex virus types 1 and 2 (HSV-1 and HSV-2) is increasing because the rapid availability of results significantly improves the treatment of severe infections, especially in immunocompromised patients. However, an essential precondition is a broad knowledge of natural polymorphisms and resistance-associated mutations in the thymidine kinase (TK) and DNA polymerase (pol) genes, of which the DNA polymerase (Pol) enzyme is targeted by the highly effective antiviral drugs in clinical use. Thus, this review presents a database of all non-synonymous mutations of TK and DNA pol genes of HSV-1 and HSV-2 whose association with resistance or natural gene polymorphism has been clarified by phenotypic and/or functional assays. In addition, the laboratory methods for verifying natural polymorphisms or resistance mutations are summarized. This database can help considerably to facilitate the interpretation of genotypic resistance findings in clinical HSV-1 and HSV-2 strains. PMID:26433780

  9. Genotyping and interpretation of STR-DNA: Low-template, mixtures and database matches-Twenty years of research and development.

    PubMed

    Gill, Peter; Haned, Hinda; Bleka, Oyvind; Hansson, Oskar; Dørum, Guro; Egeland, Thore

    2015-09-01

    The introduction of Short Tandem Repeat (STR) DNA was a revolution within a revolution that transformed forensic DNA profiling into a tool that could be used, for the first time, to create National DNA databases. This transformation would not have been possible without the concurrent development of fluorescent automated sequencers, combined with the ability to multiplex several loci together. Use of the polymerase chain reaction (PCR) increased the sensitivity of the method to enable the analysis of a handful of cells. The first multiplexes were simple: 'the quad', introduced by the defunct UK Forensic Science Service (FSS) in 1994, rapidly followed by a more discriminating 'six-plex' (Second Generation Multiplex) in 1995 that was used to create the world's first national DNA database. The success of the database rapidly outgrew the functionality of the original system - by the year 2000 a new multiplex of ten-loci was introduced to reduce the chance of adventitious matches. The technology was adopted world-wide, albeit with different loci. The political requirement to introduce pan-European databases encouraged standardisation - the development of European Standard Set (ESS) of markers comprising twelve-loci is the latest iteration. Although development has been impressive, the methods used to interpret evidence have lagged behind. For example, the theory to interpret complex DNA profiles (low-level mixtures), had been developed fifteen years ago, but only in the past year or so, are the concepts starting to be widely adopted. A plethora of different models (some commercial and others non-commercial) have appeared. This has led to a confusing 'debate' about the 'best' to use. The different models available are described along with their advantages and disadvantages. A section discusses the development of national DNA databases, along with details of an associated controversy to estimate the strength of evidence of matches. 
Current methodology is limited to searches of complete profiles - another example where the interpretation of matches has not kept pace with development of theory. STRs have also transformed the area of Disaster Victim Identification (DVI) which frequently requires kinship analysis. However, genotyping efficiency is complicated by complex, degraded DNA profiles. Finally, there is now a detailed understanding of the causes of stochastic effects that cause DNA profiles to exhibit the phenomena of drop-out and drop-in, along with artefacts such as stutters. The phenomena discussed include: heterozygote balance; stutter; degradation; the effect of decreasing quantities of DNA; the dilution effect. PMID:25866376
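
    The record above notes that multiplexes grew from four to ten or more loci to reduce the chance of adventitious matches. A minimal sketch of why more loci help, assuming Hardy-Weinberg proportions and independent loci (the product rule). The allele frequencies, database size and function names are hypothetical and are not taken from the paper, which covers far more elaborate models (mixtures, drop-out, drop-in).

```python
from math import prod

def genotype_frequency(p, q=None):
    """Hardy-Weinberg genotype frequency: p^2 for a homozygote, 2pq for a heterozygote."""
    return p * p if q is None else 2.0 * p * q

def random_match_probability(profile):
    """Product rule: multiply per-locus genotype frequencies (assumes independent loci)."""
    return prod(genotype_frequency(*alleles) for alleles in profile)

def expected_adventitious_matches(rmp, database_size):
    """Expected number of chance matches when one profile is searched against
    database_size unrelated profiles."""
    return rmp * database_size

# Hypothetical allele frequencies, one tuple per locus (a single value = homozygote).
four_loci = [(0.1, 0.2), (0.25,), (0.1, 0.1), (0.2, 0.05)]
ten_loci = four_loci + [(0.1, 0.2)] * 6

rmp4 = random_match_probability(four_loci)
rmp10 = random_match_probability(ten_loci)
for label, rmp in (("4 loci", rmp4), ("10 loci", rmp10)):
    hits = expected_adventitious_matches(rmp, 5_000_000)
    print(f"{label}: match probability {rmp:.3e}, expected chance hits {hits:.3e}")
```

    With these made-up frequencies the four-locus profile would be expected to hit several unrelated records in a database of five million, while the ten-locus profile would not, which is the motivation the abstract gives for larger multiplexes.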

  10. BrassicaTED - a public database for utilization of miniature transposable elements in Brassica species

    PubMed Central

    2014-01-01

    Background MITE, TRIM and SINEs are miniature form transposable elements (mTEs) that are ubiquitous and dispersed throughout entire plant genomes. Tens of thousands of members cause insertion polymorphism at both the inter- and intra- species level. Therefore, mTEs are valuable targets and resources for development of markers that can be utilized for breeding, genetic diversity and genome evolution studies. Taking advantage of the completely sequenced genomes of Brassica rapa and B. oleracea, characterization of mTEs and building a curated database are prerequisite to extending their utilization for genomics and applied fields in Brassica crops. Findings We have developed BrassicaTED as a unique web portal containing detailed characterization information for mTEs of Brassica species. At present, BrassicaTED has datasets for 41 mTE families, including 5894 and 6026 members from 20 MITE families, 1393 and 1639 members from 5 TRIM families, 1270 and 2364 members from 16 SINE families in B. rapa and B. oleracea, respectively. BrassicaTED offers different sections to browse structural and positional characteristics for every mTE family. In addition, we have added data on 289 MITE insertion polymorphisms from a survey of seven Brassica relatives. Genes with internal mTE insertions are shown with detailed gene annotation and microarray-based comparative gene expression data in comparison with their paralogs in the triplicated B. rapa genome. This database also includes a novel tool, K BLAST (Karyotype BLAST), for clear visualization of the locations for each member in the B. rapa and B. oleracea pseudo-genome sequences. Conclusions BrassicaTED is a newly developed database of information regarding the characteristics and potential utility of mTEs including MITE, TRIM and SINEs in B. rapa and B. oleracea. The database will promote the development of desirable mTE-based markers, which can be utilized for genomics and breeding in Brassica species. 
BrassicaTED will be a valuable repository for scientists and breeders, promoting efficient research on Brassica species. BrassicaTED can be accessed at http://im-crop.snu.ac.kr/BrassicaTED/index.php. PMID:24948109

  11. CMD: A Cotton Microsatellite Database Resource for Gossypium Genomics

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The Cotton Microsatellite Database (CMD) is a curated and integrated web-based database providing centralized access to all publicly available cotton microsatellite markers (SSRs) (http://www.cottonssr.org). At present it contains DNA sequence, SSR marker, mapping and similarity data for nine cotton...

  12. Automatic detection of lung nodules in computed tomography images: training and validation of algorithms using public research databases

    NASA Astrophysics Data System (ADS)

    Camarlinghi, Niccolò

    2013-09-01

    Lung cancer is one of the main public health issues in developed countries. Lung cancer typically manifests itself as non-calcified pulmonary nodules that can be detected by reading lung Computed Tomography (CT) images. To assist radiologists in reading images, researchers started, a decade ago, the development of Computer Aided Detection (CAD) methods capable of detecting lung nodules. In this work, a CAD composed of two CAD subprocedures is presented: one devoted to the identification of parenchymal nodules, and one devoted to the identification of the nodules attached to the pleura surface. Both CADs are an upgrade of two methods previously presented as the Voxel Based Neural Approach (VBNA) CAD. The novelty of this paper consists in the massive training using the public research Lung Image Database Consortium (LIDC) database and in the implementation of new features for classification with respect to the original VBNA method. Finally, the proposed CAD is blindly validated on the ANODE09 dataset. The result of the validation is a score of 0.393, which corresponds to the average sensitivity of the CAD computed at seven predefined false positive rates: 1/8, 1/4, 1/2, 1, 2, 4, and 8 FP/CT.
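
    The validation score described above is an average of sensitivities read off a free-response ROC (FROC) curve at seven false-positive rates. A minimal sketch of that arithmetic, assuming linear interpolation between operating points. The operating points are invented for illustration and are not the paper's data, and the function names are hypothetical.

```python
from bisect import bisect_left

# The seven false-positive rates (FP per CT scan) named in the abstract.
FP_RATES = [1/8, 1/4, 1/2, 1, 2, 4, 8]

def interpolate(x, xs, ys):
    """Linear interpolation of y at x; xs must be sorted ascending.
    Values outside the range are clamped to the end points."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)  # xs[i - 1] < x <= xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def average_sensitivity(froc_fp, froc_sens, fp_rates=FP_RATES):
    """Mean sensitivity over the given FP/scan rates."""
    return sum(interpolate(r, froc_fp, froc_sens) for r in fp_rates) / len(fp_rates)

# Hypothetical FROC operating points: (false positives per scan, sensitivity).
fp_per_scan = [0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
sensitivity = [0.20, 0.25, 0.30, 0.40, 0.45, 0.55, 0.65]

score = average_sensitivity(fp_per_scan, sensitivity)
print(f"average sensitivity: {score:.3f}")
```

    Because the rates are spaced by factors of two, the score weights the low-false-positive end of the curve as heavily as the high end.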

  13. Seabird databases and the new paradigm for scientific publication and attribution

    USGS Publications Warehouse

    Hatch, Scott A.

    2010-01-01

    For more than 300 years, the peer-reviewed journal article has been the principal medium for packaging and delivering scientific data. With new tools for managing digital data, a new paradigm is emerging—one that demands open and direct access to data and that enables and rewards a broad-based approach to scientific questions. Ground-breaking papers in the future will increasingly be those that creatively mine and synthesize vast stores of data available on the Internet. This is especially true for conservation science, in which essential data can be readily captured in standard record formats. For seabird professionals, a number of globally shared databases are in the offing, or should be. These databases will capture the salient results of inventories and monitoring, pelagic surveys, diet studies, and telemetry. A number of real or perceived barriers to data sharing exist, but none is insurmountable. Our discipline should take an important stride now by adopting a specially designed markup language for annotating and sharing seabird data.

  14. Evaluation and Utilization as a Public Health Tool of a National Molecular Epidemiological Tuberculosis Outbreak Database within the United Kingdom from 1997 to 2001

    PubMed Central

    Drobniewski, F. A.; Gibson, A.; Ruddy, M.; Yates, M. D.

    2003-01-01

    The aim of this study was to develop a national model and analyze the value of a molecular epidemiological Mycobacterium tuberculosis DNA fingerprint-outbreak database. Incidents were investigated by the United Kingdom PHLS Mycobacterium Reference Unit (MRU) from June 1997 to December 2001, inclusive. A total of 124 incidents involving 972 tuberculosis cases, including 520 patient cultures from referred incidents and 452 patient cultures related to two population studies, were examined by using restriction fragment length polymorphism IS6110 fingerprinting and rapid epidemiological typing. Investigations were divided into the following three categories, reflecting different operational strategies: retrospective passive analysis, retrospective active analysis, and retrospective prospective analysis. The majority of incidents were in the retrospective passive analysis category, i.e., the individual submitting isolates has a suspicion they may be linked. Outbreaks were examined in schools, hospitals, farms, prisons, and public houses, and laboratory cross-contamination events and unusual clinical presentations were investigated. Retrospective active analysis involved a major outbreak centered on a high school. Contact tracing of a teenager with smear-positive pulmonary tuberculosis matched 14 individuals, including members of his class, and another 60 cases were identified in schools clinically and radiologically and by skin testing. Retrospective prospective analysis involved an outbreak of 94 isoniazid-resistant tuberculosis cases in London, United Kingdom, that began after cases were identified at one hospital in January 2000. Contact tracing and comparison with MRU databases indicated that the earliest matched case had occurred in 1995. Subsequently, the MRU changed to an active prospective analysis targeting linked isoniazid-monoresistant isolates for follow up. 
The patients were multiethnic, born mainly in the United Kingdom, and included professionals, individuals from the music industry, intravenous drug abusers, and prisoners. PMID:12734218

  15. DISTRIBUTED STRUCTURE-SEARCHABLE TOXICITY (DSSTOX) DATABASE NETWORK: MAKING PUBLIC TOXICITY DATA RESOURCES MORE ACCESSIBLE AND USABLE FOR DATA EXPLORATION AND SAR DEVELOPMENT

    EPA Science Inventory


    Distributed Structure-Searchable Toxicity (DSSTox) Database Network: Making Public Toxicity Data Resources More Accessible and Usable for Data Exploration and SAR Development

    Many sources of public toxicity data are not currently linked to chemical structure, are not ...

  16. Arylamine N-acetyltransferases in prokaryotic and eukaryotic genomes: a survey of public databases.

    PubMed

    Vagena, Eirini; Fakis, Giannoulis; Boukouvala, Sotiria

    2008-09-01

    Arylamine N-acetyltransferases (NATs) are xenobiotic metabolizing enzymes found in prokaryotes and eukaryotes. NATs have been characterized in bacteria (Bacilli, Mycobacteria, Salmonella etc.), laboratory animals (chicken, rabbit, rodents etc.) and humans, where the NAT loci occupy 230 kilobases on chromosome 8p22. Our previous comprehensive search for NAT genes involved 416 genomes (340 prokaryotic, 76 eukaryotic) and identified NAT homologues in several taxa, while also reporting on taxa that appeared to lack NAT genes [Boukouvala, S. and Fakis, G. (2005) Drug Metab. Rev. 37(3), 511-564]. Here, we present an update of this genomic search, covering 2138 genomes (1674 prokaryotic, 464 eukaryotic), of which 1167 (986 prokaryotic, 181 eukaryotic) were accessible using the advanced search algorithm tBLASTn. We have reconstructed the full-length open reading frames for putative proteins with sequence homology and features characteristic of NAT from 274 bacterial genomes (31 actinobacteria, 6 bacteroidetes/chlorobi, 2 cyanobacteria, 65 firmicutes and 170 proteobacteria) and 27 animals (1 sea-urchin, 5 fishes, 1 lizard, 1 bird and 19 mammals). Partial NAT sequences were recovered from several other organisms, including fungi, where NAT genes were found in 30 ascomycetes and 2 basidiomycetes. No NATs were found in archaea, plants and lower invertebrates (insects and worms), while it is also uncertain whether NAT genes exist in protista. We present comparative genomic and phylogenetic analyses of the identified NAT homologues and announce a new database that will maintain information on non-human NATs and will provide recommendations for a standardized nomenclature, along the lines of the NAT Gene Nomenclature Committee. PMID:18781915

  17. Creating a data exchange strategy for radiotherapy research: Towards federated databases and anonymised public datasets

    PubMed Central

    Skripcak, Tomas; Belka, Claus; Bosch, Walter; Brink, Carsten; Brunner, Thomas; Budach, Volker; Büttner, Daniel; Debus, Jürgen; Dekker, Andre; Grau, Cai; Gulliford, Sarah; Hurkmans, Coen; Just, Uwe; Krause, Mechthild; Lambin, Philippe; Langendijk, Johannes A.; Lewensohn, Rolf; Lühr, Armin; Maingon, Philippe; Masucci, Michele; Niyazi, Maximilian; Poortmans, Philip; Simon, Monique; Schmidberger, Heinz; Spezi, Emiliano; Stuschke, Martin; Valentini, Vincenzo; Verheij, Marcel; Whitfield, Gillian; Zackrisson, Björn; Zips, Daniel; Baumann, Michael

    2015-01-01

    Disconnected cancer research data management and lack of information exchange about planned and ongoing research are complicating the utilisation of internationally collected medical information for improving cancer patient care. Rapidly collecting/pooling data can accelerate translational research in radiation therapy and oncology. The exchange of study data is one of the fundamental principles behind data aggregation and data mining. The possibilities of reproducing the original study results, performing further analyses on existing research data to generate new hypotheses or developing computational models to support medical decisions (e.g. risk/benefit analysis of treatment options) represent just a fraction of the potential benefits of medical data-pooling. Distributed machine learning and knowledge exchange from federated databases can be considered as one among other attractive approaches for knowledge generation within “Big Data”. Data interoperability between research institutions should be the major concern behind a wider collaboration. Information captured in electronic patient records (EPRs) and study case report forms (eCRFs), linked together with medical imaging and treatment planning data, are deemed to be fundamental elements for large multi-centre studies in the field of radiation therapy and oncology. To fully utilise the captured medical information, the study data have to be more than just an electronic version of a traditional (un-modifiable) paper CRF. Challenges that have to be addressed are data interoperability, utilisation of standards, data quality and privacy concerns, data ownership, rights to publish, data pooling architecture and storage. This paper discusses a framework for conceptual packages of ideas focused on a strategic development for international research data exchange in the field of radiation therapy and oncology. PMID:25458128

  18. Creating a data exchange strategy for radiotherapy research: towards federated databases and anonymised public datasets.

    PubMed

    Skripcak, Tomas; Belka, Claus; Bosch, Walter; Brink, Carsten; Brunner, Thomas; Budach, Volker; Büttner, Daniel; Debus, Jürgen; Dekker, Andre; Grau, Cai; Gulliford, Sarah; Hurkmans, Coen; Just, Uwe; Krause, Mechthild; Lambin, Philippe; Langendijk, Johannes A; Lewensohn, Rolf; Lühr, Armin; Maingon, Philippe; Masucci, Michele; Niyazi, Maximilian; Poortmans, Philip; Simon, Monique; Schmidberger, Heinz; Spezi, Emiliano; Stuschke, Martin; Valentini, Vincenzo; Verheij, Marcel; Whitfield, Gillian; Zackrisson, Björn; Zips, Daniel; Baumann, Michael

    2014-12-01

    Disconnected cancer research data management and lack of information exchange about planned and ongoing research are complicating the utilisation of internationally collected medical information for improving cancer patient care. Rapidly collecting/pooling data can accelerate translational research in radiation therapy and oncology. The exchange of study data is one of the fundamental principles behind data aggregation and data mining. The possibilities of reproducing the original study results, performing further analyses on existing research data to generate new hypotheses or developing computational models to support medical decisions (e.g. risk/benefit analysis of treatment options) represent just a fraction of the potential benefits of medical data-pooling. Distributed machine learning and knowledge exchange from federated databases can be considered as one among other attractive approaches for knowledge generation within "Big Data". Data interoperability between research institutions should be the major concern behind a wider collaboration. Information captured in electronic patient records (EPRs) and study case report forms (eCRFs), linked together with medical imaging and treatment planning data, are deemed to be fundamental elements for large multi-centre studies in the field of radiation therapy and oncology. To fully utilise the captured medical information, the study data have to be more than just an electronic version of a traditional (un-modifiable) paper CRF. Challenges that have to be addressed are data interoperability, utilisation of standards, data quality and privacy concerns, data ownership, rights to publish, data pooling architecture and storage. This paper discusses a framework for conceptual packages of ideas focused on a strategic development for international research data exchange in the field of radiation therapy and oncology. PMID:25458128

  19. The Mission Accessible Near-Earth Object Survey Public Database Development Effort

    NASA Astrophysics Data System (ADS)

    Burt, Brian; Moskovitz, Nicholas; Putnam, Lowell

    2014-11-01

    The Mission Accessible Near-Earth Object Survey (MANOS) began in August 2013 as a multi-year physical characterization survey that was awarded large survey status by NOAO. MANOS will target several hundred mission-accessible NEOs across visible and near-infrared wavelengths, ultimately providing a comprehensive catalog of physical properties (astrometry, light curves, spectra). The MANOS project will provide a resource that not only helps to manage our survey in a fully transparent, publicly accessible forum, but will also help to coordinate minor planet characterization efforts and target prioritization across multiple research groups. Working towards that goal, we are developing a portal for rapid, up-to-date, public dissemination of our data. Migrating the Lowell Astorb dataset to a SQL framework is a major step towards the modernization of the system and will enable up-to-date deployment of data. This will further allow us to develop utilities of various complexity, such as a deltaV calculator, minor planet finder charts, and sophisticated ephemeris generation functions. We present the state of this effort and a preliminary timeline for functionality.

  20. UniPROBE, update 2015: new tools and content for the online database of protein-binding microarray data on protein–DNA interactions

    PubMed Central

    Hume, Maxwell A.; Barrera, Luis A.; Gisselbrecht, Stephen S.; Bulyk, Martha L.

    2015-01-01

    The Universal PBM Resource for Oligonucleotide Binding Evaluation (UniPROBE) serves as a convenient source of information on published data generated using universal protein-binding microarray (PBM) technology, which provides in vitro data about the relative DNA-binding preferences of transcription factors for all possible sequence variants of a length k (‘k-mers’). The database displays important information about the proteins and displays their DNA-binding specificity data in terms of k-mers, position weight matrices and graphical sequence logos. This update to the database documents the growth of UniPROBE since the last update 4 years ago, and introduces a variety of new features and tools, including a new streamlined pipeline that facilitates data deposition by universal PBM data generators in the research community, a tool that generates putative nonbinding (i.e. negative control) DNA sequences for one or more proteins and novel motifs obtained by analyzing the PBM data using the BEEML-PBM algorithm for motif inference. The UniPROBE database is available at http://uniprobe.org. PMID:25378322
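
    The record above describes binding data expressed as k-mers and position weight matrices. A minimal sketch of both ideas: enumerating all sequence variants of length k, and scoring them with a log-odds matrix. The matrix here is built by simple counting over a handful of invented binding sites with a pseudocount, which is a stand-in for illustration only; it is not the PBM-based motif inference used by UniPROBE (the abstract names BEEML-PBM), and all names and sequences are hypothetical.

```python
from itertools import product
from math import log2

BASES = "ACGT"

def all_kmers(k):
    """Every DNA sequence variant of length k (4**k of them)."""
    return ["".join(p) for p in product(BASES, repeat=k)]

def pwm_from_sites(sites, pseudocount=1.0, background=0.25):
    """Log-odds position weight matrix from aligned, equal-length sites."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: log2(counts[b] / total / background) for b in BASES})
    return pwm

def pwm_score(pwm, kmer):
    """Sum of per-position log-odds scores for a k-mer of matching length."""
    return sum(column[base] for column, base in zip(pwm, kmer))

# Hypothetical aligned binding sites (not UniPROBE data).
sites = ["TGACGT", "TGACGC", "TGATGT", "TGACGT"]
pwm = pwm_from_sites(sites)

kmers = all_kmers(6)
best = max(kmers, key=lambda kmer: pwm_score(pwm, kmer))
print(len(kmers), "6-mers; highest-scoring:", best)
```

    Ranking every k-mer this way mirrors, in a very reduced form, how a matrix summarises relative preferences across all possible sequence variants.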

  1. Exploration of Preterm Birth Rates Using the Public Health Exposome Database and Computational Analysis Methods

    PubMed Central

    Kershenbaum, Anne D.; Langston, Michael A.; Levine, Robert S.; Saxton, Arnold M.; Oyana, Tonny J.; Kilbourne, Barbara J.; Rogers, Gary L.; Gittner, Lisaann S.; Baktash, Suzanne H.; Matthews-Juarez, Patricia; Juarez, Paul D.

    2014-01-01

    Recent advances in informatics technology have made it possible to integrate, manipulate, and analyze variables from a wide range of scientific disciplines allowing for the examination of complex social problems such as health disparities. This study used 589 county-level variables to identify and compare geographical variation of high and low preterm birth rates. Data were collected from a number of publicly available sources, bringing together natality outcomes with attributes of the natural, built, social, and policy environments. The singleton early premature county birth rate, in counties with population size over 100,000 persons, provided the dependent variable. Graph theoretical techniques were used to identify a wide range of predictor variables from various domains, including black proportion, obesity and diabetes, sexually transmitted infection rates, mother’s age, income, marriage rates, pollution and temperature among others. Dense subgraphs (paracliques) representing groups of highly correlated variables were resolved into latent factors, which were then used to build a regression model explaining prematurity (R-squared = 76.7%). Two lists of counties with large positive and large negative residuals, indicating unusual prematurity rates given their circumstances, may serve as a starting point for ways to intervene and reduce health disparities for preterm births. PMID:25464130

  2. Exploration of preterm birth rates using the public health exposome database and computational analysis methods.

    PubMed

    Kershenbaum, Anne D; Langston, Michael A; Levine, Robert S; Saxton, Arnold M; Oyana, Tonny J; Kilbourne, Barbara J; Rogers, Gary L; Gittner, Lisaann S; Baktash, Suzanne H; Matthews-Juarez, Patricia; Juarez, Paul D

    2014-12-01

    Recent advances in informatics technology have made it possible to integrate, manipulate, and analyze variables from a wide range of scientific disciplines allowing for the examination of complex social problems such as health disparities. This study used 589 county-level variables to identify and compare geographical variation of high and low preterm birth rates. Data were collected from a number of publicly available sources, bringing together natality outcomes with attributes of the natural, built, social, and policy environments. The singleton early premature county birth rate, in counties with population size over 100,000 persons, provided the dependent variable. Graph theoretical techniques were used to identify a wide range of predictor variables from various domains, including black proportion, obesity and diabetes, sexually transmitted infection rates, mother's age, income, marriage rates, pollution and temperature among others. Dense subgraphs (paracliques) representing groups of highly correlated variables were resolved into latent factors, which were then used to build a regression model explaining prematurity (R-squared = 76.7%). Two lists of counties with large positive and large negative residuals, indicating unusual prematurity rates given their circumstances, may serve as a starting point for ways to intervene and reduce health disparities for preterm births. PMID:25464130

  3. De-identifying a public use microdata file from the Canadian national discharge abstract database

    PubMed Central

    2011-01-01

    Abstract Background The Canadian Institute for Health Information (CIHI) collects hospital discharge abstract data (DAD) from Canadian provinces and territories. There are many demands for the disclosure of this data for research and analysis to inform policy making. To expedite the disclosure of data for some of these purposes, the construction of a DAD public use microdata file (PUMF) was considered. Such purposes include: confirming some published results, providing broader feedback to CIHI to improve data quality, training students and fellows, providing an easily accessible data set for researchers to prepare for analyses on the full DAD data set, and serving as a large health data set for computer scientists and statisticians to evaluate analysis and data mining techniques. The objective of this study was to measure the probability of re-identification for records in a PUMF, and to de-identify a national DAD PUMF consisting of 10% of records. Methods Plausible attacks on a PUMF were evaluated. Based on these attacks, the 2008-2009 national DAD was de-identified. A new algorithm was developed to minimize the amount of suppression while maximizing the precision of the data. The acceptable threshold for the probability of correct re-identification of a record was set at between 0.04 and 0.05. Information loss was measured in terms of the extent of suppression and entropy. Results Two different PUMF files were produced, one with geographic information, and one with no geographic information but more clinical information. At a threshold of 0.05, the maximum proportion of records with the diagnosis code suppressed was 20%, but these suppressions represented only 8-9% of all values in the DAD. Our suppression algorithm has less information loss than a more traditional approach to suppression. Smaller regions, patients with longer stays, and age groups that are infrequently admitted to hospitals tend to be the ones with the highest rates of suppression. 
Conclusions The strategies we used to maximize data utility and minimize information loss can result in a PUMF that would be useful for the specific purposes noted earlier. However, to create a more detailed file with less information loss suitable for more complex health services research, the risk would need to be mitigated by requiring the data recipient to commit to a data sharing agreement. PMID:21861894
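
    The record above sets a threshold on the probability of correct re-identification. A minimal sketch of one common way to reason about such a threshold, assuming the risk for a record is one over the size of its equivalence class (the records sharing its quasi-identifier values), so that 0.05 corresponds to classes of at least 20. This is an illustrative simplification and not the algorithm or attack model developed in the paper; the field names and records are hypothetical.

```python
from collections import Counter

def reidentification_risk(records, quasi_identifiers):
    """Per-record risk = 1 / size of the record's equivalence class,
    where a class is the set of records sharing all quasi-identifier values."""
    keys = [tuple(record[q] for q in quasi_identifiers) for record in records]
    class_sizes = Counter(keys)
    return [1.0 / class_sizes[key] for key in keys]

def records_over_threshold(records, quasi_identifiers, threshold):
    """Indices of records whose risk exceeds the threshold; these are the
    candidates for suppression or generalisation."""
    risks = reidentification_risk(records, quasi_identifiers)
    return [i for i, risk in enumerate(risks) if risk > threshold]

# Hypothetical discharge records: 20 share the same values, 1 is unique.
records = [{"age_group": "40-49", "region": "A", "diagnosis": "J18"}] * 20
records = records + [{"age_group": "90+", "region": "B", "diagnosis": "Q99"}]

quasi_identifiers = ("age_group", "region", "diagnosis")
flagged = records_over_threshold(records, quasi_identifiers, threshold=0.05)
print("records needing suppression:", flagged)
```

    In this toy data only the unique record is flagged, which is consistent with the abstract's observation that small regions and rarely admitted age groups attract the most suppression.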

  4. Does language matter? A case study of epidemiological and public health journals, databases and professional education in French, German and Italian

    PubMed Central

    Baussano, Iacopo; Brzoska, Patrick; Fedeli, Ugo; Larouche, Claudia; Razum, Oliver; Fung, Isaac C-H

    2008-01-01

    Epidemiology and public health are usually context-specific. Journals published in different languages and countries play a role both as sources of data and as channels through which evidence is incorporated into local public health practice. Databases in these languages facilitate access to relevant journals, and professional education in these languages facilitates the growth of native expertise in epidemiology and public health. However, as English has become the lingua franca of scientific communication in the era of globalisation, many journals published in non-English languages face the difficult dilemma of either switching to English and competing internationally, or sticking to the native tongue and having a restricted circulation among a local readership. This paper discusses the historical development of epidemiology and the current scene of epidemiological and public health journals, databases and professional education in three Western European languages: French, German and Italian, and examines the dynamics and struggles they have today. PMID:18826570

  5. Does language matter? A case study of epidemiological and public health journals, databases and professional education in French, German and Italian.

    PubMed

    Baussano, Iacopo; Brzoska, Patrick; Fedeli, Ugo; Larouche, Claudia; Razum, Oliver; Fung, Isaac C-H

    2008-01-01

    Epidemiology and public health are usually context-specific. Journals published in different languages and countries play a role both as sources of data and as channels through which evidence is incorporated into local public health practice. Databases in these languages facilitate access to relevant journals, and professional education in these languages facilitates the growth of native expertise in epidemiology and public health. However, as English has become the lingua franca of scientific communication in the era of globalisation, many journals published in non-English languages face the difficult dilemma of either switching to English and competing internationally, or sticking to the native tongue and having a restricted circulation among a local readership. This paper discusses the historical development of epidemiology and the current scene of epidemiological and public health journals, databases and professional education in three Western European languages: French, German and Italian, and examines the dynamics and struggles they have today. PMID:18826570

  6. Comparative Study of Seven Commercial Kits for Human DNA Extraction from Urine Samples Suitable for DNA Biomarker-Based Public Health Studies

    PubMed Central

    El Bali, Latifa; Diman, Aurélie; Bernard, Alfred; Roosens, Nancy H. C.; De Keersmaecker, Sigrid C. J.

    2014-01-01

    Human genomic DNA extracted from urine could be an interesting tool for large-scale public health studies involving characterization of genetic variations or DNA biomarkers as a result of the simple and noninvasive collection method. These studies, involving many samples, require a rapid, easy, and standardized extraction protocol. Moreover, for practicability, there is a necessity to collect urine at a moment different from the first void and to store it appropriately until analysis. The present study compared seven commercial kits to select the most appropriate urinary human DNA extraction procedure for epidemiological studies. DNA yield has been determined using different quantification methods: two classical, i.e., NanoDrop and PicoGreen, and two species-specific real-time quantitative (q)PCR assays, as DNA extracted from urine contains, besides human, microbial DNA also, which largely contributes to the total DNA yield. In addition, the kits giving a good yield were also tested for the presence of PCR inhibitors. Further comparisons were performed regarding the sampling time and the storage conditions. Finally, as a proof-of-concept, an important gene related to smoking has been genotyped using the developed tools. We could select one well-performing kit for the human DNA extraction from urine suitable for molecular diagnostic real-time qPCR-based assays targeting genetic variations, applicable to large-scale studies. In addition, successful genotyping was possible using DNA extracted from urine stored at −20°C for several months, and an acceptable yield could also be obtained from urine collected at different moments during the day, which is particularly important for public health studies. PMID:25365790

  7. Clinical and public health research using methylated DNA Immunoprecipitation (MeDIP): A comparison of commercially available kits to examine differential DNA methylation across the genome

    PubMed Central

    Brebi-Mieville, Priscilla; Ili-Gangas, Carmen; Leal-Rojas, Pamela; Noordhuis, Maartje; Soudry, Ethan; Perez, Jimena; Roa, Juan Carlos; Sidransky, David; Guerrero-Preston, Rafael

    2012-01-01

    The methylated DNA immunoprecipitation method (MeDIP) is a genome-wide, high-resolution approach that detects DNA methylation with oligonucleotide tiling arrays or high throughput sequencing platforms. A simplified high-throughput MeDIP assay will enable translational research studies in clinics and populations, which will greatly enhance our understanding of the human methylome. We compared three commercial kits, MagMeDIP Kit TM (Diagenode), Methylated-DNA IP Kit (Zymo Research) and Methylamp Methylated DNA Capture Kit (Epigentek), in order to identify which one has better reliability and sensitivity for genomic DNA enrichment. Each kit was used to enrich two samples, one from fresh tissue and one from a cell line, with two different DNA amounts. The enrichment efficiency of each kit was evaluated by agarose gel band intensity after Nco I digestion and by reaction yield of methylated DNA. A successful enrichment is expected to have a 1:4 to 10:1 conversion ratio and a yield of 80% or higher. We also evaluated the hybridization efficiency to genome-wide methylation arrays in a separate cohort of tissue samples. We observed that the MagMeDIP kit had the highest yield for the two DNA amounts and for both the tissue and cell line samples, as well as for the positive control. In addition, the DNA was successfully enriched from a 1:4 to 10:1 ratio. Therefore, the MagMeDIP kit is a useful research tool that will enable clinical and public health genome-wide DNA methylation studies. PMID:22207357

  8. Genome databases

    SciTech Connect

    Courteau, J.

    1991-10-11

    Since the Genome Project began several years ago, a plethora of databases have been developed or are in the works. They range from the massive Genome Data Base at Johns Hopkins University, the central repository of all gene mapping information, to small databases focusing on single chromosomes or organisms. Some are publicly available, others are essentially private electronic lab notebooks. Still others limit access to a consortium of researchers working on, say, a single human chromosome. An increasing number incorporate sophisticated search and analytical software, while others operate as little more than data lists. In consultation with numerous experts in the field, a list has been compiled of some key genome-related databases. The list was not limited to map and sequence databases but also included the tools investigators use to interpret and elucidate genetic data, such as protein sequence and protein structure databases. Because a major goal of the Genome Project is to map and sequence the genomes of several experimental animals, including E. coli, yeast, fruit fly, nematode, and mouse, the available databases for those organisms are listed as well. The author also includes several databases that are still under development - including some ambitious efforts that go beyond data compilation to create what are being called electronic research communities, enabling many users, rather than just one or a few curators, to add or edit the data and tag it as raw or confirmed.

  9. E-SovTox: An online database of the main publicly-available sources of toxicity data concerning REACH-relevant chemicals published in the Russian language.

    PubMed

    Sihtmäe, Mariliis; Blinova, Irina; Aruoja, Villem; Dubourguier, Henri-Charles; Legrand, Nicolas; Kahru, Anne

    2010-08-01

    A new open-access online database, E-SovTox, is presented. E-SovTox provides toxicological data for substances relevant to the EU Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) system, from publicly-available Russian language data sources. The database contains information selected mainly from scientific journals published during the Soviet Union era. The main information source for this database - the journal, Gigiena Truda i Professional'nye Zabolevania [Industrial Hygiene and Occupational Diseases], published between 1957 and 1992 - features acute, but also chronic, toxicity data for numerous industrial chemicals, e.g. for rats, mice, guinea-pigs and rabbits. The main goal of the abovementioned toxicity studies was to derive the maximum allowable concentration limits for industrial chemicals in the occupational health settings of the former Soviet Union. Thus, articles featured in the database include mostly data on LD50 values, skin and eye irritation, skin sensitisation and cumulative properties. Currently, the E-SovTox database contains toxicity data selected from more than 500 papers covering more than 600 chemicals. The user is provided with the main toxicity information, as well as abstracts of these papers in Russian and in English (given as provided in the original publication). The search engine allows cross-searching of the database by the name or CAS number of the compound, and the author of the paper. The E-SovTox database can be used as a decision-support tool by researchers and regulators for the hazard assessment of chemical substances. PMID:20822322

  10. Conference examines admissibility of DNA evidence and other technical issues in public controversies

    SciTech Connect

    Field, T.G. Jr.

    1994-12-31

    A conference entitled "Which Scientist do you believe? Process Alternatives in Technological Controversies" was held in Concord, NH on October 6-7, 1994. It was organized by Arthur Kantrowitz (Thayer School of Engineering, Dartmouth College) and Thomas Field (Franklin Pierce Law Center [FPLC]) and attracted almost 40 US and Canadian conferees. Partial funding was provided by the Ethical, Legal and Social Issues component of the DOE Human Genome Project. Presentations included a discussion of the following: processes for resolving medical controversies; the need to separate scientific facts from social values in public controversies with large technical components; Congressional initiatives for regulating risks to humans and the environment; DNA evidence in the courtroom; and the need for non-technical input in framing issues to be resolved by scientists.

  11. Development of a DNA microarray to detect antimicrobial resistance genes identified in the national center for biotechnology information database

    Technology Transfer Automated Retrieval System (TEKTRAN)

    High density genotyping techniques are needed for investigating antimicrobial resistance, especially in the case of multi-drug resistant (MDR) isolates. To achieve this, all antimicrobial resistance genes in the NCBI GenBank database were identified by key word searches of sequence annotations and the...

  12. PFR²: a curated database of planktonic foraminifera 18S ribosomal DNA as a resource for studies of plankton ecology, biogeography and evolution.

    PubMed

    Morard, Raphaël; Darling, Kate F; Mahé, Frédéric; Audic, Stéphane; Ujiié, Yurika; Weiner, Agnes K M; André, Aurore; Seears, Heidi A; Wade, Christopher M; Quillévéré, Frédéric; Douady, Christophe J; Escarguel, Gilles; de Garidel-Thoron, Thibault; Siccha, Michael; Kucera, Michal; de Vargas, Colomban

    2015-11-01

    Planktonic foraminifera (Rhizaria) are ubiquitous marine pelagic protists producing calcareous shells with conspicuous morphology. They play an important role in the marine carbon cycle, and their exceptional fossil record serves as the basis for biochronostratigraphy and past climate reconstructions. A major worldwide sampling effort over the last two decades has resulted in the establishment of multiple large collections of cryopreserved individual planktonic foraminifera samples. Thousands of 18S rDNA partial sequences have been generated, representing all major known morphological taxa across their worldwide oceanic range. This comprehensive data coverage provides an opportunity to assess patterns of molecular ecology and evolution in a holistic way for an entire group of planktonic protists. We combined all available published and unpublished genetic data to build PFR(2), the Planktonic foraminifera Ribosomal Reference database. The first version of the database includes 3322 reference 18S rDNA sequences belonging to 32 of the 47 known morphospecies of extant planktonic foraminifera, collected from 460 oceanic stations. All sequences have been rigorously taxonomically curated using a six-rank annotation system fully resolved to the morphological species level and linked to a series of metadata. The PFR(2) website, available at http://pfr2.sb-roscoff.fr, allows downloading the entire database or specific sections, as well as the identification of new planktonic foraminiferal sequences. Its novel, fully documented curation process integrates advances in morphological and molecular taxonomy. It allows for an increase in its taxonomic resolution and assures that integrity is maintained by including a complete contingency tracking of annotations and assuring that the annotations remain internally consistent. PMID:25828689

  13. Reflections on a decade of research by ASEAN dental faculties: analysis of publications from ISI-WOS databases from 2000 to 2009.

    PubMed

    Sirisinha, Stitaya; Koontongkaew, Sittichai; Phantumvanit, Prathip; Wittayawuttikul, Ruchareka

    2011-05-01

    This communication analyzed research publications in dentistry in the Institute of Scientific Information Web of Science databases of 10 dental faculties in the Association of South-East Asian Nations (ASEAN) from 2000 to 2009. The term used for the "all-document types" search was "Faculty of Dentistry/College of Dentistry." Abstracts presented at regional meetings were also included in the analysis. The Times Higher Education System QS World University Rankings showed that universities in the region fare poorly in world university rankings. Only the National University of Singapore and Nanyang Technological University appeared in the top 100 in 2009; 19 universities in the region, including Indonesia, Malaysia, the Philippines, Singapore, and Thailand, appeared in the top 500. Data from the databases showed that research publications by dental institutes in the region fall short of their Asian counterparts. Singapore and Thailand are the most active in dental research of the ASEAN countries. PMID:25426599

  14. DNA.

    ERIC Educational Resources Information Center

    Felsenfeld, Gary

    1985-01-01

    Structural form, bonding scheme, and chromatin structure of and gene-modification experiments with deoxyribonucleic acid (DNA) are described. Indicates that DNA's double helix is variable and also flexible as it interacts with regulatory and other molecules to transfer hereditary messages. (DH)

  15. Non-B DB v2.0: a database of predicted non-B DNA-forming motifs and its associated tools.

    PubMed

    Cer, Regina Z; Donohue, Duncan E; Mudunuri, Uma S; Temiz, Nuri A; Loss, Michael A; Starner, Nathan J; Halusa, Goran N; Volfovsky, Natalia; Yi, Ming; Luke, Brian T; Bacolla, Albino; Collins, Jack R; Stephens, Robert M

    2013-01-01

    The non-B DB, available at http://nonb.abcc.ncifcrf.gov, catalogs predicted non-B DNA-forming sequence motifs, including Z-DNA, G-quadruplex, A-phased repeats, inverted repeats, mirror repeats, direct repeats and their corresponding subsets: cruciforms, triplexes and slipped structures, in several genomes. Version 2.0 of the database revises and re-implements the motif discovery algorithms to better align with accepted definitions and thresholds for motifs, expands the non-B DNA-forming motifs coverage by including short tandem repeats and adds key visualization tools to compare motif locations relative to other genomic annotations. Non-B DB v2.0 extends the ability for comparative genomics by including re-annotation of the five organisms reported in non-B DB v1.0, human, chimpanzee, dog, macaque and mouse, and adds seven additional organisms: orangutan, rat, cow, pig, horse, platypus and Arabidopsis thaliana. Additionally, the non-B DB v2.0 provides an overall improved graphical user interface and faster query performance. PMID:23125372
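
    One of the motif classes listed above, the G-quadruplex, is often approximated in the literature by a simple regular expression. The sketch below uses that common approximation (four runs of at least three G, separated by loops of 1 to 7 nucleotides). The pattern, its thresholds and the function name are assumptions made for illustration; the abstract does not state the definitions that Non-B DB v2.0 actually applies.

```python
import re

# Assumed pattern: four G-runs (>= 3 G each) separated by 1-7 nt loops.
# This is a widely used approximation, not necessarily Non-B DB's rule.
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def find_g4_motifs(sequence):
    """Return (start, end, motif) for non-overlapping matches on the given strand."""
    sequence = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(sequence)]


if __name__ == "__main__":
    # Telomere-like toy sequence, used only to exercise the function.
    for hit in find_g4_motifs("TTGGGTTAGGGTTAGGGTTAGGGAA"):
        print(hit)
```

    Only the strand passed in is scanned; motifs on the complementary strand would require running the same search on the reverse complement.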

  16. Development of a maize molecular evolutionary genomic database.

    PubMed

    Du, Chunguang; Buckler, Edward; Muse, Spencer

    2003-01-01

    PANZEA is the first public database for studying maize genomic diversity. It was initiated as a repository of genomic diversity for an NSF Plant Genome project on 'Maize Evolutionary Genomics'. PANZEA is hosted at the Bioinformatics Research Center, North Carolina State University, and is open to the public (http://statgen.ncsu.edu/panzea). PANZEA is designed to capture the interrelationships between germplasm, molecular diversity, phenotypic diversity and genome structure. It has the ability to store, integrate and visualize DNA sequence, enzymatic, SSR (simple sequence repeat) marker, germplasm and phenotypic data. The relational data model is selected and implemented in Oracle. An automated DNA sequence data submission tool has been created that allows project researchers to remotely submit their DNA sequence data directly to PANZEA. On-line database search forms and reports have been created to allow users to search or download germplasm, DNA sequence, gene/locus data and much more, directly from the web. PMID:18629130

  17. Prevalence of human cell material: DNA and RNA profiling of public and private objects and after activity scenarios.

    PubMed

    van den Berge, M; Ozcanhan, G; Zijlstra, S; Lindenbergh, A; Sijen, T

    2016-03-01

    Especially when minute evidentiary traces are analysed, background cell material unrelated to the crime may contribute to detectable levels in the genetic analyses. To gain understanding on the composition of human cell material residing on surfaces contributing to background traces, we performed DNA and mRNA profiling on samplings of various items. Samples were selected by considering events contributing to cell material deposits in exemplary activities (e.g. dragging a person by the trouser ankles), and can be grouped as public objects, private samples, transfer-related samples and washing machine experiments. Results show that high DNA yields do not necessarily relate to an increased number of contributors or to the detection of other cell types than skin. Background cellular material may be found on any type of public or private item. When a major contributor can be deduced in DNA profiles from private items, this can be a different person than the owner of the item. Also when a specific activity is performed and the areas of physical contact are analysed, the "perpetrator" does not necessarily represent the major contributor in the STR profile. Washing machine experiments show that transfer and persistence during laundry is limited for DNA and cell type dependent for RNA. Skin conditions such as the presence of sebum or sweat can promote DNA transfer. Results of this study, which encompasses 549 samples, increase our understanding regarding the prevalence of human cell material in background and activity scenarios. PMID:26736139

  18. Who owns what? Private ownership and the public interest in recombinant DNA technology in the 1970s.

    PubMed

    Yi, Doogab

    2011-09-01

    This essay analyzes how academic institutions, government agencies, and the nascent biotech industry contested the legal ownership of recombinant DNA technology in the name of the public interest. It reconstructs the way a small but influential group of government officials and university research administrators introduced a new framework for the commercialization of academic research in the context of a national debate over scientific research's contributions to American economic prosperity and public health. They claimed that private ownership of inventions arising from public support would provide a powerful means to liberate biomedical discoveries for public benefit. This articulation of the causal link between private ownership and the public interest, it is argued, justified a new set of expectations about the use of research results arising from government or public support, in which commercialization became a new public obligation for academic researchers. By highlighting the broader economic and legal shifts that prompted the reconfiguration of the ownership of public knowledge in late twentieth-century American capitalism, the essay examines the threads of policy-informed legal ideas that came together to affirm private ownership of biomedical knowledge as germane to the public interest in the coming of age of biotechnology and genetic medicine. PMID:22073770

  19. Aviation Safety Issues Database

    NASA Technical Reports Server (NTRS)

    Morello, Samuel A.; Ricks, Wendell R.

    2009-01-01

    The aviation safety issues database was instrumental in the refinement and substantiation of the National Aviation Safety Strategic Plan (NASSP). The issues database is a comprehensive set of issues from an extremely broad base of aviation functions, personnel, and vehicle categories, both nationally and internationally. Several aviation safety stakeholders such as the Commercial Aviation Safety Team (CAST) have already used the database. This broader interest was the genesis of making the database publicly accessible and writing this report.

  20. Morchella MLST database

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Welcome to the Morchella MLST database. This dedicated database was set up at the CBS-KNAW Biodiversity Center by Vincent Robert in February 2012, using BioloMICS software (Robert et al., 2011), to facilitate DNA sequence-based identifications of Morchella species via the Internet. The current datab...

  1. 'Show me your original face before you were born': the convergence of public fetuses and sacred DNA.

    PubMed

    Gilbert, Scott F; Howes-Mischel, Rebecca

    2004-01-01

    Embryology is an intensely visual field, and it has provided the public with images of human embryos and fetuses. The responses to these images can be extremely powerful and personal, and the images (as well as our reactions to them) are conditioned by social and political agendas. The image of the 'autonomous fetus' abstracts the fetus from the mother, the womb, and from all social contexts, thereby emphasizing 'individuality'. The image of 'sacred DNA' emphasizes DNA as the unmoved mover, the eidos, the soul of the human being. Since fertilization involves the forming of a new constellation of DNA in the zygote, the act of fertilization is being perceived as the secular and technical equivalent of ensoulment. This privileges fertilization above the other possible scientifically valued times when 'human life' begins. PMID:16302694

  2. Islander: a database of integrative islands in prokaryotic genomes, the associated integrases and their DNA site specificities.

    PubMed

    Mantri, Yogita; Williams, Kelly P

    2004-01-01

    Prokaryotic chromosomes often contain islands, such as temperate phages or pathogenicity islands, delivered by site-specific integrases. Integration usually occurs within a tRNA or tmRNA gene, splitting the gene, yet sequences within the island restore the disrupted gene. The regenerated RNA gene and the displaced fragment of that gene thus mark the endpoints of the island. We applied this principle to search for islands in genomic DNA sequences. Our algorithm generates a list of tRNA and tmRNA genes, uses each as the query for a BLAST search of the starting DNA and removes unlikely hits through a series of filters. A search for islands in 106 whole bacterial genomes produced 143 candidates, with the search itself providing an estimate of three false candidates among these. Preliminary phylogenetic analysis of the associated integrases reduced this set to 89 cases of independently evolved site specificity, which showed strong bias for the tmRNA gene. The website Islander (http://www.indiana.edu/~islander) presents the candidate islands in GenBank-style files and correlates integrase phylogeny with site specificity. PMID:14681358
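
    The endpoint principle described above (an intact RNA gene at one end of the island, a displaced fragment of the same gene at the other) can be illustrated with a toy search. The sketch below is not the Islander algorithm: it replaces the BLAST search with exact string matching, assumes the displaced fragment is a copy of the gene's 3' end, and applies a single length filter. The function name and all parameters are hypothetical.

```python
def find_candidate_islands(genome, gene_start, gene_end,
                           min_len=12, max_island=200_000):
    """Return (start, end) spans bounded by the gene and a second copy of its 3' end.

    Exact matching stands in for the BLAST search mentioned in the abstract,
    and the 3'-end assumption is made only for this illustration.
    """
    fragment = genome[gene_end - min_len:gene_end]
    candidates = []
    pos = genome.find(fragment)
    while pos != -1:
        overlaps_gene = pos < gene_end and pos + min_len > gene_start
        if not overlaps_gene:
            start = min(gene_start, pos)
            end = max(gene_end, pos + min_len)
            # Single illustrative filter: discard implausibly long spans.
            if end - start <= max_island:
                candidates.append((start, end))
        pos = genome.find(fragment, pos + 1)
    return candidates


if __name__ == "__main__":
    gene = "GCGGATTTAGCTCAGTTGGGAGAGCGCCAGACTGAAGATCTGGAGGTCCTGTGTTCGATCCACAGAATTCGCA"
    # Toy genome: flank, intact gene, 300-nt "island", displaced 3' fragment, flank.
    genome = "A" * 50 + gene + "T" * 300 + gene[-12:] + "C" * 50
    print(find_candidate_islands(genome, 50, 50 + len(gene)))
```

    A real search would tolerate mismatches and would need the further filters the abstract mentions, which are not specified there.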

  3. DNA

    ERIC Educational Resources Information Center

    Stent, Gunther S.

    1970-01-01

    This history for molecular genetics and its explanation of DNA begins with an analysis of the Golden Jubilee essay papers, 1955. The paper ends stating that the higher nervous system is the one major frontier of biological inquiry which still offers some romance of research. (Author/VW)

  4. The Institute of Public Administration's Document Center: From Paper to Electronic Records--A Full Image Government Documents Database.

    ERIC Educational Resources Information Center

    Al-Zahrani, Rashed S.

    Since its establishment in 1960, the Institute of Public Administration (IPA) in Riyadh, Saudi Arabia has had responsibility for documenting Saudi administrative literature, the official publications of Saudi Arabia, and the literature of regional and international organizations through establishment of the Document Center in 1961. This paper…

  5. Publications

    Cancer.gov

    Information about NCI publications including PDQ cancer information for patients and health professionals, patient-education publications, fact sheets, dictionaries, NCI blogs and newsletters and major reports.

  6. Contamination of sequence databases with adaptor sequences

    SciTech Connect

    Yoshikawa, Takeo; Sanders, A.R.; Detera-Wadleigh, S.D.

    1997-02-01

    Because of the exponential increase in the amount of DNA sequences being added to the public databases on a daily basis, it has become imperative to identify sources of contamination rapidly. Previously, contaminations of sequence databases have been reported to alert the scientific community to the problem. These contaminations can be divided into two categories. The first category comprises host sequences that have been difficult for submitters to manage or control. Examples include anomalous sequences derived from Escherichia coli, which are inserted into the chromosomes (and plasmids) of the bacterial hosts. Insertion sequences are highly mobile and are capable of transposing themselves into plasmids during cloning manipulation. Another example of the first category is the infection with yeast genomic DNA or with bacterial DNA of some commercially available cDNA libraries from Clontech. The second category of database contamination is due to the inadvertent inclusion of nonhost sequences. This category includes incorporation of cloning-vector sequences and multicloning sites in the database submission. M13-derived artifacts have been common, since M13-based vectors have been widely used for subcloning DNA fragments. Recognizing this problem, the National Center for Biotechnology Information (NCBI) started to screen, in April 1994, all sequences directly submitted to GenBank, against a set of vector data retrieved from GenBank by use of key-word searches, such as "vector." In this report, we present evidence for another sequence artifact that is widespread but that, to our knowledge, has not yet been reported. 11 refs., 1 tab.
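
    Screening a submission against a set of known vector or adaptor sequences, as described above, can be sketched as a shared-substring check on both strands. This is a minimal illustration only: it uses exact matches of a fixed length instead of a similarity search, and the function names, the match length and the example adaptor are hypothetical, not taken from the report or from NCBI's screening procedure.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def screen_for_contaminants(sequence, contaminants, min_match=16):
    """Return names of contaminants sharing an exact substring of length
    min_match with `sequence`, on either strand."""
    sequence = sequence.upper()
    windows = {sequence[i:i + min_match]
               for i in range(len(sequence) - min_match + 1)}
    hits = []
    for name, contaminant in contaminants.items():
        contaminant = contaminant.upper()
        for probe in (contaminant, reverse_complement(contaminant)):
            kmers = {probe[i:i + min_match]
                     for i in range(len(probe) - min_match + 1)}
            if windows & kmers:
                hits.append(name)
                break  # one strand is enough to flag this contaminant
    return hits


if __name__ == "__main__":
    # Made-up adaptor sequence, used only to exercise the function.
    known = {"example_adaptor": "ACGTTGCAAGGCCTTAGGCATCG"}
    submission = "TTTT" + known["example_adaptor"] + "GGGG"
    print(screen_for_contaminants(submission, known))
```

    Exact matching misses contaminants that carry sequencing errors, so a production screen would typically rely on an alignment tool instead.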

  7. Ionic Liquids Database- (ILThermo)

    National Institute of Standards and Technology Data Gateway

    SRD 147 Ionic Liquids Database- (ILThermo) (Web, free access)   IUPAC Ionic Liquids Database, ILThermo, is a free web research tool that allows users worldwide to access an up-to-date data collection from the publications on experimental investigations of thermodynamic and transport properties of ionic liquids as well as binary and ternary mixtures containing ionic liquids.

  8. National Vulnerability Database (NVD)

    National Institute of Standards and Technology Data Gateway

    National Vulnerability Database (NVD) (Web, free access)   NVD is a comprehensive cyber security vulnerability database that integrates all publicly available U.S. Government vulnerability resources and provides references to industry resources. It is based on and synchronized with the CVE vulnerability naming standard.

  9. The EMBL Nucleotide Sequence Database.

    PubMed Central

    Stoesser, G; Sterk, P; Tuli, M A; Stoehr, P J; Cameron, G N

    1997-01-01

    The EMBL Nucleotide Sequence Database is a comprehensive database of DNA and RNA sequences directly submitted from researchers and genome sequencing groups and collected from the scientific literature and patent applications. In collaboration with DDBJ and GenBank the database is produced, maintained and distributed at the European Bioinformatics Institute (EBI) and constitutes Europe's primary nucleotide sequence resource. Database releases are produced quarterly and are distributed on CD-ROM. EBI's network services allow access to the most up-to-date data collection via Internet and World Wide Web interface, providing database searching and sequence similarity facilities plus access to a large number of additional databases. PMID:9016493

  10. The PNNL quantitative infrared database for gas-phase sensing: a spectral library for environmental, hazmat, and public safety standoff detection

    NASA Astrophysics Data System (ADS)

    Johnson, Timothy J.; Sams, Robert L.; Sharpe, Steven W.

    2004-03-01

    Pacific Northwest National Laboratory (PNNL) continues to expand its library of quantitative infrared reference spectra for remote sensing. The gas-phase data are recorded at 0.1 cm-1 resolution, with nitrogen pressure broadening to one atmosphere to emulate spectra recorded in the field. It is planned that the PNNL library will consist of approximately 500 vapor-phase spectra associated with the U.S. Department of Energy's environmental, energy, and public safety missions. At present, the database is comprised of approximately 300 infrared spectra, many of which represent highly reactive or toxic species. For the 298 K data, each reported spectrum is in fact a composite spectrum generated by a Beer's law plot (at each wavelength) to typically 12 measured spectra. Recent additions to the database include the vapors of several semi-volatile and non-volatile liquids using an improved dissemination technique for vaporizing the liquid into the nitrogen carrier gas. Experimental and analytical methods are used to remove several known and new artifacts associated with FTIR gas-phase spectroscopy. Details concerning sample preparation and composite spectrum generation are discussed.
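
    The composite-spectrum step mentioned above (a Beer's law fit at each wavelength across several measured spectra) can be sketched as a per-wavelength linear fit of absorbance against burden, where burden means concentration times path length. The sketch assumes an unweighted least-squares fit through the origin; the abstract does not give the weighting or intercept treatment used at PNNL, and the function name and data layout are hypothetical.

```python
def composite_spectrum(burdens, spectra):
    """Fit A = k * burden at every spectral point and return the slopes k.

    burdens: one burden value (concentration * path length) per measurement.
    spectra: one absorbance list per measurement, all of equal length.
    Assumes an unweighted least-squares line through the origin.
    """
    if len(burdens) != len(spectra):
        raise ValueError("need one burden per measured spectrum")
    denominator = sum(b * b for b in burdens)
    n_points = len(spectra[0])
    return [
        sum(b * spectrum[j] for b, spectrum in zip(burdens, spectra)) / denominator
        for j in range(n_points)
    ]


if __name__ == "__main__":
    # Synthetic data: three measurements of a three-point spectrum.
    true_k = [0.5, 2.0, 0.0]
    burdens = [1.0, 2.0, 4.0]
    spectra = [[b * k for k in true_k] for b in burdens]
    print(composite_spectrum(burdens, spectra))
```

    Fitting across many burdens averages out noise in any single measurement, which is the reason for building the composite from typically 12 spectra.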

  11. The PNNL Quantitative Infrared Database for Gas-Phase Sensing: A spectral Library for Environmental, Hazmat, and Public Safety Standoff Detection

    SciTech Connect

    Johnson, Timothy J.; Sams, Robert L.; Sharpe, Steven W.; Arthur J. Sedlacek III, Richard Colton, Tuan Vo-Dinh

    2004-03-25

    Pacific Northwest National Laboratory (PNNL) continues to expand its library of quantitative infrared reference spectra for remote sensing. The gas-phase data are recorded at 0.1 cm-1 resolution, with nitrogen pressure broadening to one atmosphere to emulate spectra recorded in the field. It is planned that the PNNL library will consist of approximately 500 vapor-phase spectra associated with the U.S. Department of Energy's environmental, energy, and public safety missions. At present, the database is comprised of approximately 300 infrared spectra, many of which represent highly reactive or toxic species. For the 298 K data, each reported spectrum is in fact a composite spectrum generated by a Beer's law plot (at each wavelength) to typically 12 measured spectra. Recent additions to the database include the vapors of several semi-volatile and non-volatile liquids using an improved dissemination technique for vaporizing the liquid into the nitrogen carrier gas. Experimental and analytical methods are used to remove several known and new artifacts associated with FTIR gas-phase spectroscopy. Details concerning sample preparation and composite spectrum generation are discussed.

  12. The PNNL Quantitative Infrared Database for Gas-Phase Sensing: A Spectral Library for Environmental, Hazmat and Public Safety Standoff Detection

    SciTech Connect

    Johnson, Timothy J.; Sams, Robert L.; Sharpe, Steven W.

    2004-01-01

    Pacific Northwest National Laboratory (PNNL) continues to expand its library of quantitative infrared reference spectra for remote sensing. The gas-phase data are recorded at 0.1 cm-1 resolution, with nitrogen pressure broadening to one atmosphere to emulate spectra recorded in the field. It is planned that the PNNL library will consist of approximately 500 vapor-phase spectra associated with DOE’s environmental, energy, and public safety missions. At present, the database is comprised of approximately 300 infrared spectra, many of which represent highly reactive or toxic species. For the 298 K data, each reported spectrum is in fact a composite spectrum generated by a Beer’s law plot (at each wavelength) to typically 12 measured spectra. Recent additions to the database include the vapors of several semi-volatile and non-volatile liquids using an improved dissemination technique for vaporizing the liquid into the nitrogen carrier gas. Experimental and analytical methods are used to remove several known and new artifacts associated with FTIR gas-phase spectroscopy. Details concerning sample preparation and composite spectrum generation are discussed.

  13. HS3D, A Dataset of Homo Sapiens Splice Regions, and its Extraction Procedure from a Major Public Database

    NASA Astrophysics Data System (ADS)

    Pollastro, Pasquale; Rampone, Salvatore

    The aim of this work is to describe a procedure for cleaning GenBank data, producing material to train computational approaches for gene characterization and to assess their prediction accuracy. A procedure (GenBank2HS3D) has been defined, producing a dataset (HS3D - Homo Sapiens Splice Sites Dataset) of Homo Sapiens splice regions extracted from GenBank (Rel. 123 at this time). It selects, from the complete GenBank Primate Division, entries of human nuclear DNA according to several assessed criteria; then it extracts exons and introns from these entries (currently 4523 + 3802). Donor and acceptor sites are then extracted as windows of 140 nucleotides around each splice site (3799 + 3799). After discarding windows that do not include canonical GT-AG junctions (65 + 74), that include insufficient data (not enough material for a 140-nucleotide window) (686 + 589), that include non-ACGT bases (29 + 30), or that are redundant (218 + 226), the remaining windows (2796 + 2880) are reported in the dataset. Finally, windows of false splice sites are selected by searching for canonical GT-AG pairs in non-splicing positions (271 937 + 332 296). The false sites in a range of +/- 60 from a true splice site are marked as proximal. HS3D, release 1.2 at this time, is available at the Web server of the University of Sannio: http://www.sci.unisannio.it/docenti/rampone/.
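
    The window extraction and filtering steps described above can be sketched as follows. This is an illustration under stated assumptions, not the GenBank2HS3D code: the window is taken as 70 nucleotides on each side of the exon/intron boundary, so the donor GT falls at window positions 70-71 and the acceptor AG at 68-69 (0-based), and the function name and arguments are hypothetical.

```python
HALF_WINDOW = 70  # assumed split of the 140-nt window around the boundary


def extract_window(sequence, site, kind, seen):
    """Return the 140-nt window around a splice site, or None if it is filtered out.

    site: 0-based index of the first intron base (donor) or of the first
          base of the downstream exon (acceptor).
    kind: "donor" or "acceptor".
    seen: set of windows already accepted, used to drop redundant ones.
    """
    start, end = site - HALF_WINDOW, site + HALF_WINDOW
    if start < 0 or end > len(sequence):
        return None  # insufficient data for a full window
    window = sequence[start:end].upper()
    if set(window) - set("ACGT"):
        return None  # contains non-ACGT bases
    if kind == "donor":
        canonical = window[HALF_WINDOW:HALF_WINDOW + 2] == "GT"
    else:
        canonical = window[HALF_WINDOW - 2:HALF_WINDOW] == "AG"
    if not canonical:
        return None  # not a canonical GT-AG junction
    if window in seen:
        return None  # redundant window
    seen.add(window)
    return window


if __name__ == "__main__":
    # Toy entry: 70-nt exon, 100-nt intron (GT...AG), 70-nt exon.
    entry = "A" * 70 + "GT" + "C" * 96 + "AG" + "T" * 70
    accepted = set()
    print(extract_window(entry, 70, "donor", accepted))
    print(extract_window(entry, 170, "acceptor", accepted))
```

    The filters run in the same order as the abstract lists them, which makes it straightforward to count how many windows each criterion removes.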

  14. Publications.

    ERIC Educational Resources Information Center

    Aviation/Space, 1980

    1980-01-01

    Presents a variety of publications available from government and nongovernment sources. The government publications are from the Federal Aviation Administration (FAA) and the National Aeronautics and Space Administration (NASA) and are designed for educators, students, and the public. (Author/SA)

  15. First insight into the human liver proteome from PROTEOME(SKY)-LIVER(Hu) 1.0, a publicly available database.

    PubMed

    2010-01-01

    Herein, we report proteome and transcriptome profiles of the human adult liver and present an initial analysis. Overall, the human liver proteome (HLP) data set comprises 6788 identified proteins with at least two peptide matches at 95% confidence, including 3721 proteins newly identified in liver. The human liver transcriptome (HLT) data set consists of 11 205 expressed genes. The HLP is the largest proteome data set for a human organ and is the first direct association between a proteome and its transcriptome derived from the same sample. Although it is hard to approach complete coverage of the HLP currently, several conclusions based on this data set are clearly reached: (1) The 5816 protein-encoding genes (PEGs) represented by the HLP and the 11 104 PEGs represented in the HLT have been identified from 20 070 PEGs in IPI Human v3.07 and 19 478 PEGs in the integrated human transcriptome database, respectively. (2) The patterns of chromosomal distribution of the genes corresponding to the HLP are highly consistent with those of the HLT. Some chromosomal regions, such as 16p13.3, 19q13.31, 19q13.42, and Xq28, exhibit particularly high densities of liver-specific genes, which perform important functions related to normal physiology and/or pathology in this organ. (3) The HLP spans 6 orders of magnitude in relative protein abundance and 78% of the proteins fall in the middle of this range. Of newly identified liver proteins, 82.5% are of low abundance. (4) Proteins involved in metabolism, transport, and coagulation and those containing active domains for metabolism, transport, and biosynthesis are significantly enriched in liver. (5) All 94 metabolic pathways in KEGG are covered to varying extents; for 48 of them, particularly those involved in metabolism of carbohydrates and amino acids, more than 80% of the component proteins have been detected. The liver-specific pathways, such as those participating in metabolism of bile acid and bilirubin and in biotransformation, are identified with remarkably high coverage. A total of 31 members of the cytochrome P450 family are identified, four of which have been observed for the first time in human liver. (6) Transport proteins involved in energy metabolism and secretion of both protein and bile acid are highly abundant. Three ion channels are described for the first time in liver. (7) The HLP data set contains 800 proteins related to signal transduction and primarily involved in cellular recognition, localization, communication, and inflammation. Insulin and adipocytokine pathways, which are involved in the regulation of glucose and fatty acids, are highly covered. (8) Transcription factors (309 in total) have been recognized at relatively low detection rates and abundance; however, transcription factors regulating gene expression related to transport, metabolism, and biosynthesis are detected at relatively higher coverage and the protein products of their target genes (100 in total), such as metabolic enzymes and plasma proteins, are also identified. (9) The overlap between the human liver and plasma proteomes is particularly noteworthy in the coagulation/anticoagulation/fibrinolysis and complement system. There is a significantly positive linear correlation between the abundance of coagulator proteins in liver and plasma. PMID:19653699

  16. DSSTOX STRUCTURE-SEARCHABLE PUBLIC TOXICITY DATABASE NETWORK: CURRENT PROGRESS AND NEW INITIATIVES TO IMPROVE CHEMO-BIOINFORMATICS CAPABILITIES

    EPA Science Inventory

    The EPA DSSTox website (http://www.epa.gov/nheerl/dsstox) publishes standardized, structure-annotated toxicity databases, covering a broad range of toxicity disciplines. Each DSSTox database features documentation written in collaboration with the source authors and toxicity expe...

  17. A database for tracking toxicogenomic samples and procedures.

    PubMed

    Bao, Wenjun; Schmid, Judith E; Goetz, Amber K; Ren, Hongzu; Dix, David J

    2005-01-01

    Reproductive toxicogenomic studies generate large amounts of toxicological and genomic data. On the toxicology side, a substantial quantity of data accumulates from conventional endpoints such as histology, reproductive physiology and biochemistry. The largest source of genomics data is DNA microarrays, which generate enormous amounts of information in the course of profiling gene expression. Thus, data storage and management become essential and require a more sophisticated system than lab notebooks and electronic spreadsheets. We developed a database for tracking toxicogenomic samples and procedures (TSP 1.0) for our reproductive studies based on the MIAME-Tox guidelines and relational database theory. This database stores the various types of data from both toxicological and genomic assays in a hierarchical fashion. The user-friendly interface provides easy procedures for researchers to add, edit, save, delete, and navigate different records. Finally, TSP facilitates exporting microarray data into public databases. PMID:15686874

  18. Overview of the HUPO Plasma Proteome Project: Results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core dataset of 3020 proteins and a publicly-available database

    SciTech Connect

    Omenn, Gilbert; States, David J.; Adamski, Marcin; Blackwell, Thomas W.; Menon, Rajasree; Hermjakob, Henning; Apweiler, Rolf; Haab, Brian B.; Simpson, Richard; Eddes, James; Kapp, Eugene; Moritz, Rod; Chan, Daniel W.; Rai, Alex J.; Admon, Arie; Aebersold, Ruedi; Eng, Jimmy K.; Hancock, William S.; Hefta, Stanley A.; Meyer, Helmut; Paik, Young-Ki; Yoo, Jong-Shin; Ping, Peipei; Pounds, Joel G.; Adkins, Joshua N.; Qian, Xiaohong; Wang, Rong; Wasinger, Valerie; Wu, Chi Yue; Zhao, Xiaohang; Zeng, Rong; Archakov, Alexander; Tsugita, Akira; Beer, Ilan; Pandey, Akhilesh; Pisano, Michael; Andrews, Philip; Tammen, Harald; Speicher, David W.; Hanash, Samir M.

    2005-08-13

    HUPO initiated the Plasma Proteome Project (PPP) in 2002. Its pilot phase has (1) evaluated advantages and limitations of many depletion, fractionation, and MS technology platforms; (2) compared PPP reference specimens of human serum and EDTA, heparin, and citrate-anticoagulated plasma; and (3) created a publicly-available knowledge base (www.bioinformatics.med.umich.edu/hupo/ppp; www.ebi.ac.uk/pride). Thirty-five participating laboratories in 13 countries submitted datasets. Working groups addressed (a) specimen stability and protein concentrations; (b) protein identifications from 18 MS/MS datasets; (c) independent analyses from raw MS/MS spectra; (d) search engine performance, subproteome analyses, and biological insights; (e) antibody arrays; and (f) direct MS/SELDI analyses. MS/MS datasets had 15 710 different International Protein Index (IPI) protein IDs; our integration algorithm applied to multiple matches of peptide sequences yielded 9504 IPI proteins identified with one or more peptides and 3020 proteins identified with two or more peptides (the Core Dataset). These proteins have been characterized with Gene Ontology, InterPro, Novartis Atlas, OMIM, and immunoassay-based concentration determinations. The database permits examination of many other subsets, such as 1274 proteins identified with three or more peptides. Reverse protein to DNA matching identified proteins for 118 previously unidentified ORFs. We recommend use of plasma instead of serum, with EDTA (or citrate) for anticoagulation. To improve resolution, sensitivity and reproducibility of peptide identifications and protein matches, we recommend combinations of depletion, fractionation, and MS/MS technologies, with explicit criteria for evaluation of spectra, use of search algorithms, and integration of homologous protein matches. 
This Special Issue of PROTEOMICS presents papers integral to the collaborative analysis plus many reports of supplementary work on various aspects of the PPP workplan. These PPP results on complexity, dynamic range, incomplete sampling, false-positive matches, and integration of diverse datasets for plasma and serum proteins lay a foundation for development and validation of circulating protein biomarkers in health and disease.

  19. Biofuel Database

    National Institute of Standards and Technology Data Gateway

    Biofuel Database (Web, free access)   This database brings together structural, biological, and thermodynamic data for enzymes that are either in current use or are being considered for use in the production of biofuels.

  20. A practical tool for public health surveillance: Semi-automated coding of short injury narratives from large administrative databases using Naïve Bayes algorithms.

    PubMed

    Marucci-Wellman, Helen R; Lehto, Mark R; Corns, Helen L

    2015-11-01

    Public health surveillance programs in the U.S. are undergoing landmark changes with the availability of electronic health records and advancements in information technology. Injury narratives gathered from hospital records, workers compensation claims or national surveys can be very useful for identifying antecedents to injury or emerging risks. However, classifying narratives manually can become prohibitive for large datasets. The purpose of this study was to develop a human-machine system that could be relatively easily tailored to routinely and accurately classify injury narratives from large administrative databases such as workers compensation. We used a semi-automated approach based on two Naïve Bayesian algorithms to classify 15,000 workers compensation narratives into two-digit Bureau of Labor Statistics (BLS) event (leading to injury) codes. Narratives were filtered out for manual review if the algorithms disagreed or made weak predictions. This approach resulted in an overall accuracy of 87%, with consistently high positive predictive values across all two-digit BLS event categories including the very small categories (e.g., exposure to noise, needle sticks). The Naïve Bayes algorithms were able to identify and accurately machine code most narratives leaving only 32% (4853) for manual review. This strategy substantially reduces the need for resources compared with manual review alone. PMID:26412196
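
    The triage rule described in this abstract (auto-code a narrative only when two Naïve Bayes models agree and both are confident, otherwise send it to a human) can be sketched as follows. This is a minimal illustration, not the authors' system: the choice of single words and word pairs as the two feature sets, the 0.8 confidence threshold, the toy narratives, and the labels "fall" and "struck" are all assumptions made for the example and are not BLS codes.

```python
import math
from collections import Counter, defaultdict

def unigrams(text):
    """Single-word features (assumed feature set for model A)."""
    return text.lower().split()

def bigrams(text):
    """Adjacent word-pair features (assumed feature set for model B)."""
    words = text.lower().split()
    return [a + "_" + b for a, b in zip(words, words[1:])] or words

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def __init__(self, featurize):
        self.featurize = featurize
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for feature in self.featurize(text):
                self.feature_counts[label][feature] += 1
                self.vocab.add(feature)
        return self

    def predict(self, text):
        """Return (best label, posterior probability of that label)."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, n in self.class_counts.items():
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)
            for feature in self.featurize(text):
                score += math.log((self.feature_counts[label][feature] + 1) / denom)
            log_scores[label] = score
        best = max(log_scores, key=log_scores.get)
        top = log_scores[best]
        norm = sum(math.exp(s - top) for s in log_scores.values())
        return best, 1.0 / norm

def triage(text, model_a, model_b, min_confidence=0.8):
    """Return a machine-assigned label, or None to route to manual review."""
    label_a, p_a = model_a.predict(text)
    label_b, p_b = model_b.predict(text)
    if label_a != label_b or min(p_a, p_b) < min_confidence:
        return None  # models disagree or a prediction is weak
    return label_a

# Invented toy training narratives (hypothetical, for illustration only).
TRAINING = [
    ("worker slipped on wet floor and fell", "fall"),
    ("employee fell from ladder", "fall"),
    ("slipped on ice and fell in parking lot", "fall"),
    ("struck by falling box", "struck"),
    ("hit by forklift in warehouse", "struck"),
    ("struck by a moving cart", "struck"),
]
texts, labels = zip(*TRAINING)
model_words = NaiveBayes(unigrams).fit(texts, labels)
model_pairs = NaiveBayes(bigrams).fit(texts, labels)

for narrative in ["worker slipped and fell on wet floor", "worker in warehouse"]:
    print(narrative, "->", triage(narrative, model_words, model_pairs))
```

    Raising `min_confidence` sends more narratives to manual review and lowers the share that is machine coded; the abstract reports that 32% of narratives were left for manual review in the study, but the threshold that produced that figure is not stated there.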

  1. Method and system for normalizing biometric variations to authenticate users from a public database and that ensures individual biometric data privacy

    DOEpatents

    Strait, Robert S.; Pearson, Peter K.; Sengupta, Sailes K.

    2000-01-01

    A password system comprises a set of codewords spaced apart from one another by a Hamming distance (HD) that exceeds twice the variability that can be projected for a series of biometric measurements for a particular individual and that is less than the HD that can be encountered between two individuals. To enroll an individual, a biometric measurement is taken and exclusive-ORed with a random codeword to produce a "reference value." To verify the individual later, a biometric measurement is taken and exclusive-ORed with the reference value to reproduce the original random codeword or its approximation. If the reproduced value is not a codeword, the nearest codeword to it is found, and the bits that were corrected to produce the codeword are also toggled in the biometric measurement taken and the codeword generated during enrollment. The correction scheme can be implemented by any conventional error correction code such as Reed-Muller code R(m,n). In the implementation using a hand geometry device an R(2,5) code has been used in this invention. Such codeword and biometric measurement can then be used to see if the individual is an authorized user. Conventional Diffie-Hellman public key encryption schemes and hashing procedures can then be used to secure the communications lines carrying the biometric information and to secure the database of authorized users.
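
    The enroll and verify steps described above can be sketched in a few lines. This is only an illustration under stated assumptions: a simple repetition code with majority-vote decoding stands in for the Reed-Muller code named in the abstract, the 20-bit measurement is invented, and the function names are hypothetical. How the recovered codeword is then checked against the database of authorized users is outside the sketch.

```python
import secrets

REPEAT = 5  # assumed repetition factor: corrects up to 2 flipped bits per 5-bit group

def encode(message_bits):
    """Repetition code: repeat every message bit REPEAT times."""
    return [bit for bit in message_bits for _ in range(REPEAT)]

def decode(noisy_bits):
    """Majority vote per group, i.e. the message of the nearest codeword."""
    groups = [noisy_bits[i:i + REPEAT] for i in range(0, len(noisy_bits), REPEAT)]
    return [1 if 2 * sum(group) > REPEAT else 0 for group in groups]

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def enroll(biometric_bits):
    """XOR the measurement with a random codeword to get the reference value.

    The measurement length must be a multiple of REPEAT. The codeword is
    returned here only so the demo can compare against it.
    """
    message = [secrets.randbelow(2) for _ in range(len(biometric_bits) // REPEAT)]
    codeword = encode(message)
    return xor(biometric_bits, codeword), codeword

def verify(fresh_bits, reference):
    """XOR a fresh measurement with the reference, then snap to the nearest codeword."""
    approximate_codeword = xor(fresh_bits, reference)
    return encode(decode(approximate_codeword))

# Invented 20-bit "measurement" for illustration.
enrolled = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0]
reference, codeword = enroll(enrolled)

# Same person, slightly different reading: one flipped bit in three of the groups.
fresh = list(enrolled)
for position in (0, 7, 13):
    fresh[position] ^= 1

print("same user recovers codeword:", verify(fresh, reference) == codeword)
```

    The reference value alone reveals neither the measurement nor the codeword, which is why it can be stored; a reading that differs from the enrolled one by more than the code can correct decodes to a different codeword.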

  2. Computational tools and resources for metabolism-related property predictions. 1. Overview of publicly available (free and commercial) databases and software

    PubMed Central

    Peach, Megan L; Zakharov, Alexey V; Liu, Ruifeng; Pugliese, Angelo; Tawa, Gregory; Wallqvist, Anders; Nicklaus, Marc C

    2014-01-01

    Metabolism has been identified as a defining factor in drug development success or failure because of its impact on many aspects of drug pharmacology, including bioavailability, half-life and toxicity. In this article, we provide an outline and descriptions of the resources for metabolism-related property predictions that are currently either freely or commercially available to the public. These resources include databases with data on, and software for prediction of, several end points: metabolite formation, sites of metabolic transformation, binding to metabolizing enzymes and metabolic stability. We attempt to place each tool in historical context and describe, wherever possible, the data it was based on. For predictions of interactions with metabolizing enzymes, we show a typical set of results for a small test set of compounds. Our aim is to give a clear overview of the areas and aspects of metabolism prediction in which the currently available resources are useful and accurate, and the areas in which they are inadequate or missing entirely. PMID:23088273

  3. Database Administrator

    ERIC Educational Resources Information Center

    Moore, Pam

    2010-01-01

    The Internet and electronic commerce (e-commerce) generate lots of data. Data must be stored, organized, and managed. Database administrators, or DBAs, work with database software to find ways to do this. They identify user needs, set up computer databases, and test systems. They ensure that systems perform as they should and add people to the…

  4. Electronic Databases.

    ERIC Educational Resources Information Center

    Williams, Martha E.

    1985-01-01

    Presents examples of bibliographic, full-text, and numeric databases. Also discusses how to access these databases online, aids to online retrieval, and several issues and trends (including copyright and downloading, transborder data flow, use of optical disc/videodisc technology, and changing roles in database generation and processing). (JN)

  5. Database Administrator

    ERIC Educational Resources Information Center

    Moore, Pam

    2010-01-01

    The Internet and electronic commerce (e-commerce) generate lots of data. Data must be stored, organized, and managed. Database administrators, or DBAs, work with database software to find ways to do this. They identify user needs, set up computer databases, and test systems. They ensure that systems perform as they should and add people to the…

  6. Public Databases Supporting Computational Toxicology

    EPA Science Inventory

    A major goal of the emerging field of computational toxicology is the development of screening-level models that predict potential toxicity of chemicals from a combination of mechanistic in vitro assay data and chemical structure descriptors. In order to build these models, resea...

  7. Fast decision tree-based method to index large DNA-protein sequence databases using hybrid distributed-shared memory programming model.

    PubMed

    Jaber, Khalid Mohammad; Abdullah, Rosni; Rashid, Nur'Aini Abdul

    2014-01-01

    In recent times, the size of biological databases has increased significantly, along with continuous growth in the number of users and the rate of queries, such that some databases have reached terabyte size. There is therefore an increasing need to access databases as fast as possible. In this paper, the decision tree indexing model (PDTIM) was parallelised using a hybrid of distributed and shared memory on a resident database, with horizontal and vertical growth through the Message Passing Interface (MPI) and POSIX Threads (PThread), to accelerate index building. The PDTIM was implemented using 1, 2, 4 and 5 processors on 1, 2, 3 and 4 threads respectively. The results show that the hybrid technique improved the speedup compared to a sequential version. It can be concluded from the results that the proposed PDTIM is appropriate for large data sets in terms of index building time. PMID:24794073

  8. IDPredictor: predict database links in biomedical database.

    PubMed

    Mehlhorn, Hendrik; Lange, Matthias; Scholz, Uwe; Schreiber, Falk

    2012-01-01

    Knowledge found in biomedical databases, in particular in Web information systems, is a major bioinformatics resource. In general, this biological knowledge is represented worldwide in a network of databases. These data are spread among thousands of databases, which overlap in content but differ substantially with respect to content detail, interface, formats and data structure. To support functional annotation of lab data, such as protein sequences, metabolites or DNA sequences, as well as semi-automated data exploration in information retrieval environments, an integrated view of databases is essential. Search engines have the potential to assist in data retrieval from these structured sources, but fall short of providing comprehensive knowledge except out of the interlinked databases. A prerequisite for supporting the concept of an integrated data view is to acquire insights into cross-references among database entities. This is hampered by the fact that only a fraction of all possible cross-references are explicitly tagged in the particular biomedical information systems. In this work, we investigate to what extent an automated construction of an integrated data network is possible. We propose a method that predicts and extracts cross-references from multiple life science databases and possible referenced data targets. We study the retrieval quality of our method and report on first, promising results. The method is implemented as the tool IDPredictor, which is published under the DOI 10.5447/IPK/2012/4 and is freely available using the URL: http://dx.doi.org/10.5447/IPK/2012/4. PMID:22736059

  9. International Society of Human and Animal Mycology (ISHAM)-ITS reference DNA barcoding database--the quality controlled standard tool for routine identification of human and animal pathogenic fungi.

    PubMed

    Irinyi, Laszlo; Serena, Carolina; Garcia-Hermoso, Dea; Arabatzis, Michael; Desnos-Ollivier, Marie; Vu, Duong; Cardinali, Gianluigi; Arthur, Ian; Normand, Anne-Cécile; Giraldo, Alejandra; da Cunha, Keith Cassia; Sandoval-Denis, Marcelo; Hendrickx, Marijke; Nishikaku, Angela Satie; de Azevedo Melo, Analy Salles; Merseguel, Karina Bellinghausen; Khan, Aziza; Parente Rocha, Juliana Alves; Sampaio, Paula; da Silva Briones, Marcelo Ribeiro; e Ferreira, Renata Carmona; de Medeiros Muniz, Mauro; Castañón-Olivares, Laura Rosio; Estrada-Barcenas, Daniel; Cassagne, Carole; Mary, Charles; Duan, Shu Yao; Kong, Fanrong; Sun, Annie Ying; Zeng, Xianyu; Zhao, Zuotao; Gantois, Nausicaa; Botterel, Françoise; Robbertse, Barbara; Schoch, Conrad; Gams, Walter; Ellis, David; Halliday, Catriona; Chen, Sharon; Sorrell, Tania C; Piarroux, Renaud; Colombo, Arnaldo L; Pais, Célia; de Hoog, Sybren; Zancopé-Oliveira, Rosely Maria; Taylor, Maria Lucia; Toriello, Conchita; de Almeida Soares, Célia Maria; Delhaes, Laurence; Stubbe, Dirk; Dromer, Françoise; Ranque, Stéphane; Guarro, Josep; Cano-Lira, Jose F; Robert, Vincent; Velegraki, Aristea; Meyer, Wieland

    2015-05-01

    Human and animal fungal pathogens are a growing threat worldwide leading to emerging infections and creating new risks for established ones. There is a growing need for a rapid and accurate identification of pathogens to enable early diagnosis and targeted antifungal therapy. Morphological and biochemical identification methods are time-consuming and require trained experts. Alternatively, molecular methods, such as DNA barcoding, a powerful and easy tool for rapid monophasic identification, offer a practical approach for species identification and are less demanding in terms of taxonomical expertise. However, its wide-spread use is still limited by a lack of quality-controlled reference databases and the evolving recognition and definition of new fungal species/complexes. An international consortium of medical mycology laboratories was formed aiming to establish a quality controlled ITS database under the umbrella of the ISHAM working group on "DNA barcoding of human and animal pathogenic fungi." A new database was established, containing 2800 ITS sequences representing 421 fungal species and providing the medical community with a freely accessible tool at http://www.isham.org/ and http://its.mycologylab.org/ to rapidly and reliably identify most agents of mycoses. The generated sequences included in the new database were used to evaluate the variation and overall utility of the ITS region for the identification of pathogenic fungi at intra- and interspecies level. The average intraspecies variation ranged from 0 to 2.25%. This highlighted selected pathogenic fungal species, such as the dermatophytes and emerging yeast, for which additional molecular methods/genetic markers are required for their reliable identification from clinical and veterinary specimens. PMID:25802363
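
    To make the "average intraspecies variation" figure concrete, the sketch below computes the mean pairwise percent difference among aligned sequences of one species. This is only one common way to express such variation; the abstract does not say which measure the authors used, and the sequences and function names here are invented for illustration.

```python
from itertools import combinations

def percent_difference(seq_a, seq_b):
    """Percentage of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return 100.0 * mismatches / len(seq_a)

def mean_intraspecies_variation(sequences):
    """Mean pairwise percent difference over all sequence pairs of one species."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        return 0.0  # a single sequence shows no measurable variation
    return sum(percent_difference(a, b) for a, b in pairs) / len(pairs)

# Invented, pre-aligned toy sequences standing in for ITS barcodes of one species.
toy_species = [
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGTACGA",
]
print(f"{mean_intraspecies_variation(toy_species):.2f}%")
```

    Under this measure, a species whose sequences are identical scores 0%, and a higher value means a single ITS barcode is less likely to identify the species reliably on its own.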

  10. The Hawaiian Algal Database: a laboratory LIMS and online resource for biodiversity data

    PubMed Central

    Wang, Norman; Sherwood, Alison R; Kurihara, Akira; Conklin, Kimberly Y; Sauvage, Thomas; Presting, Gernot G

    2009-01-01

    Background: Organization and presentation of biodiversity data is greatly facilitated by databases that are specially designed to allow easy data entry and organized data display. Such databases also have the capacity to serve as Laboratory Information Management Systems (LIMS). The Hawaiian Algal Database was designed to showcase specimens collected from the Hawaiian Archipelago, enabling users around the world to compare their specimens with our photographs and DNA sequence data, and to provide lab personnel with an organizational tool for storing various biodiversity data types. Description: We describe the Hawaiian Algal Database, a comprehensive and searchable database containing photographs and micrographs, geo-referenced collecting information, taxonomic checklists and standardized DNA sequence data. All data for individual samples are linked through unique accession numbers. Users can search online for sample information by accession number, numerous levels of taxonomy, or collection site. At the present time the database contains data representing over 2,000 samples of marine, freshwater and terrestrial algae from the Hawaiian Archipelago. These samples are primarily red algae, although other taxa are being added. Conclusion: The Hawaiian Algal Database is a digital repository for Hawaiian algal samples and acts as a LIMS for the laboratory. Users can make use of the online search tool to view and download specimen photographs and micrographs, DNA sequences and relevant habitat data, including georeferenced collecting locations. It is publicly available at . PMID:19728892

  11. A database of expressed genes from Cochliomyia hominivorax (Diptera: Calliphoridae).

    PubMed

    Guerrero, F D; Dowd, S E; Djikeng, A; Wiley, G; Macmil, S; Saldivar, L; Najar, F; Roe, B A

    2009-09-01

    We used an expressed sequence tag and 454 pyrosequencing approach to initiate a study of the genome of the screwworm, Cochliomyia hominivorax (Coquerel) (Diptera: Calliphoridae). Two normalized cDNA libraries were constructed from RNA isolated from embryos and second instar larvae from the Panama 95 strain. Approximately 5,400 clones from each library were sequenced from both the 5' and 3' directions using the Sanger method. In addition, double-stranded cDNA was prepared from random-primed polyA RNA purified from embryos, second-instar larvae, adult males, and adult females. These four cDNA samples were used for 454 pyrosequencing that produced approximately 300,000 independent sequences. Sequences were assembled into a database of assembled contigs and singletons and used to search public protein databases and annotate the sequences. The full database consists of 6,076 contigs and 58,221 singletons assembled from both the traditional expressed sequence tag (EST) and 454 sequences. Annotation of the data led to the identification of several gene coding regions with possible roles in sex determination in the screwworm. This database will facilitate the design of microarray and other experiments to study screwworm gene expression on a larger scale than previously possible. PMID:19769042

  12. Database Manager

    ERIC Educational Resources Information Center

    Martin, Andrew

    2010-01-01

    It is normal practice today for organizations to store large quantities of records of related information as computer-based files or databases. Purposeful information is retrieved by performing queries on the data sets. The purpose of DATABASE MANAGER is to communicate to students the method by which the computer performs these queries. This…

  13. Maize databases

    Technology Transfer Automated Retrieval System (TEKTRAN)

    This chapter is a succinct overview of maize data held in the species-specific database MaizeGDB (the Maize Genomics and Genetics Database), and selected multi-species data repositories, such as Gramene/Ensembl Plants, Phytozome, UniProt and the National Center for Biotechnology Information (NCBI), ...

  14. National Ambient Radiation Database

    SciTech Connect

    Dziuban, J.; Sears, R.

    2003-02-25

    The U.S. Environmental Protection Agency (EPA) recently developed a searchable database and website for the Environmental Radiation Ambient Monitoring System (ERAMS) data. This site contains nationwide radiation monitoring data for air particulates, precipitation, drinking water, surface water and pasteurized milk. This site provides location-specific as well as national information on environmental radioactivity across several media. It provides high quality data for assessing public exposure and environmental impacts resulting from nuclear emergencies and provides baseline data during routine conditions. The database and website are accessible at www.epa.gov/enviro/. This site contains (1) an easy-to-use query for the general public, which limits the amount of information provided but includes the ability to graph the data with risk benchmarks; (2) a query for a more technical user, which allows access to all of the data in the database; and (3) background information on ERAMS.

  15. The Radiation Hybrid Database.

    PubMed Central

    Lijnzaad, P; Helgesen, C; Rodriguez-Tomé, P

    1998-01-01

    Since July 1995, the European Bioinformatics Institute (EBI) has maintained RHdb (http://www.ebi.ac.uk/RHdb/RHdb.html), a public database for radiation hybrid data. Radiation hybrid mapping is an important technique for determining high resolution maps. Recently, CORBA access has been added to RHdb. The EBI is an Outstation of the European Molecular Biology Laboratory (EMBL). PMID:9399810

  16. Investigating Evolutionary Questions Using Online Molecular Databases.

    ERIC Educational Resources Information Center

    Puterbaugh, Mary N.; Burleigh, J. Gordon

    2001-01-01

    Recommends using online molecular databases as teaching tools to illustrate evolutionary questions and concepts while introducing students to public molecular databases. Provides activities in which students make molecular comparisons between species. (YDS)

  17. The EUVE satellite survey database

    NASA Technical Reports Server (NTRS)

    Craig, N.; Chen, T.; Hawkins, I.; Fruscione, A.

    1993-01-01

    The EUVE survey database contains fundamental science data for 9000 potential source locations (pigeonholes) in the sky. The first release of the Bright Source List is now available to the public through an interface with the NASA Astrophysical Data System. We describe the database schema design and the EUVE source categorization algorithm that compares sources to the ROSAT Wide Field Camera source list.

  18. Database Licensing: A Future View.

    ERIC Educational Resources Information Center

    Flanagan, Michael

    1993-01-01

    Access to database information in libraries will increase as licenses for tape loading of data onto public access catalogs become more widespread. Institutions with adequate storage capacity will have full text databases, and the adoption of the Z39.50 standard, which allows differing computer systems to interface with each other, will increase…

  19. BIOMARKERS DATABASE

    EPA Science Inventory

    This database was developed by assembling and evaluating the literature relevant to human biomarkers. It catalogues and evaluates the usefulness of biomarkers of exposure, susceptibility and effect which may be relevant for a longitudinal cohort study. In addition to describing ...

  20. Database filters

    SciTech Connect

    Pramanik, S.

    1982-01-01

    Several hardware database-searchers for a large number of patterns or keys are presented. These searchers can be implemented by a random access memory and are suitable for VLSI implementation. Application of these searchers as database filters is described; a filter detects all the matched records in the database, as well as a few others. The percentage of unmatched records can be reduced to any arbitrary minimum value by using several filters together, or by passing the output records repeatedly through the same filters. The performance of the filters using the iterative approach depends very much on the regrouping algorithms of the patterns/keys. Several such algorithms are presented and their performances compared. A single pass is required if they are pipelined. Hardware organisations for different pipelined approaches are also studied. Experiments are performed for all the different hardware organisations mentioned above on an employee-name database. 25 references.
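
    The filter behaviour described above (every matching record passes, along with a few non-matching ones, and chaining filters removes more of the latter) can be imitated in software. The sketch below is an assumption-laden stand-in, not the hardware design from the report: it models the random access memory as a bit table addressed by a hash of the key, similar in spirit to a Bloom filter, and the class name, table size, salts and employee names are all invented.

```python
import hashlib

class RamFilter:
    """A bit table addressed by a hash of the key (hypothetical software model)."""

    def __init__(self, keys, table_bits=16, salt=0):
        self.table_bits = table_bits
        self.salt = salt  # different salts give independent addressing functions
        self.table = [False] * table_bits
        for key in keys:
            self.table[self._address(key)] = True

    def _address(self, key):
        digest = hashlib.sha256(f"{self.salt}:{key}".encode()).digest()
        return int.from_bytes(digest[:4], "big") % self.table_bits

    def passes(self, value):
        """True for every stored key, and for any other value that collides."""
        return self.table[self._address(value)]

def filter_records(records, filters, field):
    """Keep the records whose field passes every filter in the chain."""
    return [r for r in records if all(f.passes(r[field]) for f in filters)]

# Invented employee-name records and search keys.
wanted = {"Alice Smith", "Bob Jones"}
employees = [{"name": n} for n in (
    "Alice Smith", "Bob Jones", "Carol White", "Dan Brown", "Eve Black",
    "Frank Green", "Grace Hall", "Heidi King", "Ivan Long", "Judy Moss",
)]

one_filter = [RamFilter(wanted, salt=0)]
two_filters = one_filter + [RamFilter(wanted, salt=1)]

print("passed one filter: ", len(filter_records(employees, one_filter, "name")))
print("passed two filters:", len(filter_records(employees, two_filters, "name")))
```

    Records that survive the filters still need an exact comparison against the keys, because a surviving record may be a collision rather than a true match; the filters only shrink the set that has to be compared.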

  1. Fun Databases: My Top Ten.

    ERIC Educational Resources Information Center

    O'Leary, Mick

    1992-01-01

    Provides reviews of 10 online databases: Consumer Reports; Public Opinion Online; Encyclopedia of Associations; Official Airline Guide Adventure Atlas and Events Calendar; CENDATA; Hollywood Hotline; Fearless Taster; Soap Opera Summaries; and Human Sexuality. (LRW)

  2. Addition of a breeding database in the Genome Database for Rosaceae.

    PubMed

    Evans, Kate; Jung, Sook; Lee, Taein; Brutcher, Lisa; Cho, Ilhyung; Peace, Cameron; Main, Dorrie

    2013-01-01

    Breeding programs produce large datasets that require efficient management systems to keep track of performance, pedigree, geographical and image-based data. With the development of DNA-based screening technologies, more breeding programs perform genotyping in addition to phenotyping for performance evaluation. The integration of breeding data with other genomic and genetic data is instrumental for the refinement of marker-assisted breeding tools, enhances genetic understanding of important crop traits and maximizes access and utility by crop breeders and allied scientists. New infrastructure in the Genome Database for Rosaceae (GDR) was designed and implemented to enable secure and efficient storage, management and analysis of large datasets from the Washington State University apple breeding program and subsequently expanded to fit datasets from other Rosaceae breeders. The infrastructure was built using the software Chado and Drupal, making use of the Natural Diversity module to accommodate large-scale phenotypic and genotypic data. Breeders can search accessions within the GDR to identify individuals with specific trait combinations. Results from Search by Parentage list individuals with parents in common and results from Individual Variety pages link to all data available on each chosen individual including pedigree, phenotypic and genotypic information. Genotypic data are searchable by markers and alleles; results are linked to other pages in the GDR to enable the user to access tools such as GBrowse and CMap. This breeding database provides users with the opportunity to search datasets in a fully targeted manner and retrieve and compare performance data from multiple selections, years and sites, and to output the data needed for variety release publications and patent applications. The breeding database facilitates efficient program management. 
Storing publicly available breeding data in a database together with genomic and genetic data will further accelerate the cross-utilization of diverse data types by researchers from various disciplines. Database URL: http://www.rosaceae.org/breeders_toolbox. PMID:24247530

  3. Content based information retrieval in forensic image databases.

    PubMed

    Geradts, Zeno; Bijhold, Jurrien

    2002-03-01

    This paper gives an overview of the various available image databases and ways of searching these databases on image contents. The developments in research groups on searching in image databases are evaluated and compared with the forensic databases that exist. Forensic image databases of fingerprints, faces, shoeprints, handwriting, cartridge cases, drug tablets, and tool marks are described. The developments in these fields appear to be valuable for forensic databases, especially that of the framework in MPEG-7, where the searching in image databases is standardized. In the future, the combination of the databases (also DNA-databases) and possibilities to combine these can result in stronger forensic evidence. PMID:11908596

  4. NUCLEAR DATABASES FOR REACTOR APPLICATIONS.

    SciTech Connect

    PRITYCHENKO, B.; ARCILLA, R.; BURROWS, T.; HERMAN, M.W.; MUGHABGHAB, S.; OBLOZINSKY, P.; ROCHMAN, D.; SONZOGNI, A.A.; TULI, J.; WINCHELL, D.F.

    2006-06-05

    The National Nuclear Data Center (NNDC): An overview of nuclear databases, related products, nuclear data Web services and publications. The NNDC collects, evaluates, and disseminates nuclear physics data for basic research and applied nuclear technologies. The NNDC maintains and contributes to the nuclear reaction (ENDF, CSISRS) and nuclear structure databases along with several other databases (CapGam, MIRD, IRDF-2002) and provides coordination for the Cross Section Evaluation Working Group (CSEWG) and the US Nuclear Data Program (USNDP). The Center produces several publications, such as the Atlas of Neutron Resonances and the Nuclear Wallet Cards booklets, and develops codes, such as the nuclear reaction model code Empire.

  5. DNA data bank of Japan (DDBJ) progress report.

    PubMed

    Mashima, Jun; Kodama, Yuichi; Kosuge, Takehide; Fujisawa, Takatomo; Katayama, Toshiaki; Nagasaki, Hideki; Okuda, Yoshihiro; Kaminuma, Eli; Ogasawara, Osamu; Okubo, Kousaku; Nakamura, Yasukazu; Takagi, Toshihisa

    2016-01-01

    The DNA Data Bank of Japan Center (DDBJ Center; http://www.ddbj.nig.ac.jp) maintains and provides public archival, retrieval and analytical services for biological information. The contents of the DDBJ databases are shared with the US National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI) within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). Since 2013, the DDBJ Center has been operating the Japanese Genotype-phenotype Archive (JGA) in collaboration with the National Bioscience Database Center (NBDC) in Japan. In addition, the DDBJ Center develops semantic web technologies for data integration and sharing in collaboration with the Database Center for Life Science (DBCLS) in Japan. This paper briefly reports on the activities of the DDBJ Center over the past year including submissions to databases and improvements in our services for data retrieval, analysis, and integration. PMID:26578571

  6. DNA data bank of Japan (DDBJ) progress report

    PubMed Central

    Mashima, Jun; Kodama, Yuichi; Kosuge, Takehide; Fujisawa, Takatomo; Katayama, Toshiaki; Nagasaki, Hideki; Okuda, Yoshihiro; Kaminuma, Eli; Ogasawara, Osamu; Okubo, Kousaku; Nakamura, Yasukazu; Takagi, Toshihisa

    2016-01-01

    The DNA Data Bank of Japan Center (DDBJ Center; http://www.ddbj.nig.ac.jp) maintains and provides public archival, retrieval and analytical services for biological information. The contents of the DDBJ databases are shared with the US National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI) within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). Since 2013, the DDBJ Center has been operating the Japanese Genotype-phenotype Archive (JGA) in collaboration with the National Bioscience Database Center (NBDC) in Japan. In addition, the DDBJ Center develops semantic web technologies for data integration and sharing in collaboration with the Database Center for Life Science (DBCLS) in Japan. This paper briefly reports on the activities of the DDBJ Center over the past year including submissions to databases and improvements in our services for data retrieval, analysis, and integration. PMID:26578571

  7. ARTI Refrigerant Database

    SciTech Connect

    Calm, J.M.

    1992-04-30

    The Refrigerant Database consolidates and facilitates access to information to assist industry in developing equipment using alternative refrigerants. The underlying purpose is to accelerate phase out of chemical compounds of environmental concern. The database provides bibliographic citations and abstracts for publications that may be useful in research and design of air-conditioning and refrigeration equipment. The complete documents are not included, though some may be added at a later date. The database identifies sources of specific information on R-32, R-123, R-124, R-125, R-134a, R-141b, R-142b, R-143a, R-152a, R-290 (propane), R-717 (ammonia), ethers, and others as well as azeotropic and zeotropic blends of these fluids. It addresses polyalkylene glycol (PAG), ester, and other lubricants. It also references documents addressing compatibility of refrigerants and lubricants with metals, plastics, elastomers, motor insulation, and other materials used in refrigerant circuits.

  8. A comprehensive DNA barcode database for Central European beetles with a focus on Germany: adding more than 3500 identified species to BOLD.

    PubMed

    Hendrich, Lars; Morinière, Jérôme; Haszprunar, Gerhard; Hebert, Paul D N; Hausmann, Axel; Köhler, Frank; Balke, Michael

    2015-07-01

    Beetles are the most diverse group of animals and are crucial for ecosystem functioning. In many countries, they are well established for environmental impact assessment, but even in the well-studied Central European fauna, species identification can be very difficult. A comprehensive and taxonomically well-curated DNA barcode library could remedy this deficit and could also link hundreds of years of traditional knowledge with next generation sequencing technology. However, such a beetle library is missing to date. This study provides the globally largest DNA barcode reference library for Coleoptera for 15 948 individuals belonging to 3514 well-identified species (53% of the German fauna) with representatives from 97 of 103 families (94%). This study is the first comprehensive regional test of the efficiency of DNA barcoding for beetles with a focus on Germany. Sequences ≥500 bp were recovered from 63% of the specimens analysed (15 948 of 25 294) with short sequences from another 997 specimens. Whereas most specimens (92.2%) could be unambiguously assigned to a single known species by sequence diversity at CO1, 1089 specimens (6.8%) were assigned to more than one Barcode Index Number (BIN), creating 395 BINs which need further study to ascertain if they represent cryptic species, mitochondrial introgression, or simply regional variation in widespread species. We found 409 specimens (2.6%) that shared a BIN assignment with another species, most involving a pair of closely allied species as 43 BINs were involved. Most of these taxa were separated by barcodes although sequence divergences were low. Only 155 specimens (0.97%) show identical or overlapping clusters. PMID:25469559
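    The percentages quoted in the abstract above can be re-derived from the specimen counts it reports. The following is a minimal sketch for checking that arithmetic, not code from the study; the variable names are illustrative, and the choice of 15 948 barcoded specimens as the denominator for the BIN-related percentages is an assumption that happens to reproduce the quoted figures.

```python
# Counts quoted in the abstract above; all variable names are illustrative.
specimens_analysed = 25_294
specimens_barcoded = 15_948    # specimens with sequences >= 500 bp
families_covered = 97
families_total = 103
multi_bin_specimens = 1_089    # assigned to more than one BIN
shared_bin_specimens = 409     # share a BIN with another species
overlapping_specimens = 155    # identical or overlapping clusters


def pct(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Assumption: the BIN-related shares are taken relative to barcoded specimens.
summary = {
    "sequencing success": pct(specimens_barcoded, specimens_analysed),
    "family coverage": pct(families_covered, families_total),
    "multi-BIN specimens": pct(multi_bin_specimens, specimens_barcoded),
    "shared-BIN specimens": pct(shared_bin_specimens, specimens_barcoded),
    "overlapping clusters": pct(overlapping_specimens, specimens_barcoded),
}

for label, value in summary.items():
    print(f"{label}: {value:.2f}%")
```

    Under that assumption the computed shares round to the 63%, 94%, 6.8%, 2.6% and 0.97% given in the abstract.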

  9. Solubility Database

    National Institute of Standards and Technology Data Gateway

    SRD 106 IUPAC-NIST Solubility Database (Web, free access). These solubilities are compiled from 18 volumes of the International Union of Pure and Applied Chemistry (IUPAC)-NIST Solubility Data Series. The database includes liquid-liquid, solid-liquid, and gas-liquid systems. Typical solvents and solutes include water, seawater, heavy water, inorganic compounds, and a variety of organic compounds such as hydrocarbons, halogenated hydrocarbons, alcohols, acids, esters and nitrogen compounds. There are over 67,500 solubility measurements and over 1800 references.

  10. The Genopolis Microarray Database

    PubMed Central

    Splendiani, Andrea; Brandizi, Marco; Even, Gael; Beretta, Ottavio; Pavelka, Norman; Pelizzola, Mattia; Mayhaus, Manuel; Foti, Maria; Mauri, Giancarlo; Ricciardi-Castagnoli, Paola

    2007-01-01

    Background Gene expression databases are key resources for microarray data management and analysis and the importance of a proper annotation of their content is well understood. Public repositories as well as microarray database systems that can be implemented by single laboratories exist. However, there is not yet a tool that can easily support a collaborative environment where different users with different rights of access to data can interact to define a common highly coherent content. The scope of the Genopolis database is to provide a resource that allows different groups performing microarray experiments related to a common subject to create a common coherent knowledge base and to analyse it. The Genopolis database has been implemented as a dedicated system for the scientific community studying dendritic cell and macrophage functions and host-parasite interactions. Results The Genopolis Database system allows the community to build an object-based, MIAME-compliant annotation of their experiments and to store images, raw and processed data from the Affymetrix GeneChip® platform. It supports dynamical definition of controlled vocabularies and provides automated and supervised steps to control the coherence of data and annotations. It allows a precise control of the visibility of the database content to different sub groups in the community and facilitates exports of its content to public repositories. It provides an interactive user interface for data analysis: this allows users to visualize data matrices based on functional lists and sample characterization, and to navigate to other data matrices defined by similarity of expression values as well as functional characterizations of genes involved. A collaborative environment is also provided for the definition and sharing of functional annotation by users. Conclusion The Genopolis Database supports a community in building a common coherent knowledge base and analysing it. This fills a gap between a local database and a public repository, where the development of a common coherent annotation is important. In its current implementation, it provides a uniform coherently annotated dataset on dendritic cells and macrophage differentiation. PMID:17430566

  11. COXPRESdb in 2015: coexpression database for animal species by DNA-microarray and RNAseq-based expression data with multiple quality assessment systems

    PubMed Central

    Okamura, Yasunobu; Aoki, Yuichi; Obayashi, Takeshi; Tadaka, Shu; Ito, Satoshi; Narise, Takafumi; Kinoshita, Kengo

    2015-01-01

    The COXPRESdb (http://coxpresdb.jp) provides gene coexpression relationships for animal species. Here, we report the updates of the database, mainly focusing on the following two points. For the first point, we added RNAseq-based gene coexpression data for three species (human, mouse and fly), and largely increased the number of microarray experiments to nine species. The increase of the number of expression data with multiple platforms could enhance the reliability of coexpression data. For the second point, we refined the data assessment procedures, for each coexpressed gene list and for the total performance of a platform. The assessment of coexpressed gene list now uses more reasonable P-values derived from platform-specific null distribution. These developments greatly reduced pseudo-predictions for directly associated genes, thus expanding the reliability of coexpression data to design new experiments and to discuss experimental results. PMID:25392420

  12. FishTraits Database

    USGS Publications Warehouse

    Angermeier, Paul L.; Frimpong, Emmanuel A.

    2009-01-01

    The need for integrated and widely accessible sources of species traits data to facilitate studies of ecology, conservation, and management has motivated development of traits databases for various taxa. In spite of the increasing number of traits-based analyses of freshwater fishes in the United States, no consolidated database of traits of this group exists publicly, and much useful information on these species is documented only in obscure sources. The largely inaccessible and unconsolidated traits information makes large-scale analysis involving many fishes and/or traits particularly challenging. FishTraits is a database of >100 traits for 809 (731 native and 78 exotic) fish species found in freshwaters of the conterminous United States, including 37 native families and 145 native genera. The database contains information on four major categories of traits: (1) trophic ecology, (2) body size and reproductive ecology (life history), (3) habitat associations, and (4) salinity and temperature tolerances. Information on geographic distribution and conservation status is also included. Together, we refer to the traits, distribution, and conservation status information as attributes. Descriptions of attributes are available here. Many sources were consulted to compile attributes, including state and regional species accounts and other databases.

  13. Annual Review of Database Developments 1991.

    ERIC Educational Resources Information Center

    Basch, Reva

    1991-01-01

    Review of developments in databases highlights a new emphasis on accessibility. Topics discussed include the internationalization of databases; databases that deal with finance, drugs, and toxic waste; access to public records, both personal and corporate; media online; reducing large files of data to smaller, more manageable files; and…

  14. ECOTOX database; new additions and future direction

    EPA Science Inventory

    The ECOTOXicology database (ECOTOX) is a comprehensive, publicly available knowledgebase developed and maintained by ORD/NHEERL. It is used for environmental toxicity data on aquatic life, terrestrial plants and wildlife. Publications are identified for potential applicability af...

  15. Quality control of EUVE databases

    NASA Technical Reports Server (NTRS)

    John, L. M.; Drake, J.

    1992-01-01

    The publicly accessible databases for the Extreme Ultraviolet Explorer include: the EUVE Archive mailserver; the CEA ftp site; the EUVE Guest Observer Mailserver; and the Astronomical Data System node. The EUVE Performance Assurance team is responsible for verifying that these public EUVE databases are working properly, and that the public availability of EUVE data contained therein does not infringe any data rights which may have been assigned. In this poster, we describe the Quality Assurance (QA) procedures we have developed from the approach of QA as a service organization, thus reflecting the overall EUVE philosophy of Quality Assurance integrated into normal operating procedures, rather than imposed as an external, post facto, control mechanism.

  16. Progress towards a Spacecraft-Associated Microbial Meta-database (SAMM)

    NASA Astrophysics Data System (ADS)

    Mogul, Rakesh; Keagy, Laura; Nava, Argelia; Zerehi, Farah

    The microbial inventories within the assembly facilities for spacecraft represent the primary pool of forward contaminants that may compromise life-detection missions. Accordingly, we are constructing a meta-database of these microorganisms for the purpose of building a bioinformatic resource for planetary protection and astrobiology-related endeavors. Using student-led efforts, the meta-database is being constructed from literature reports and is inclusive of both isolated microorganisms and those solely detected through DNA-based techniques. The Spacecraft-Associated Microbial Meta-database (SAMM) currently includes over 800 entries that are organized using 32 meta-tags involving taxonomy, location of isolation (facility and component), category of characterization (culture and/or genetic), types of characterizations (e.g., culture, 16S rDNA, phylochip, FAME, and DNA hybridization), growth conditions, Gram stain, and general physiological traits (e.g., sporulation, extremotolerance, and respiration properties). Interrogations on the database show that the cleanrooms at Kennedy Space Center (KSC) are ~2-fold greater in diversity in bacterial genera when compared to the Jet Propulsion Laboratory (JPL), and that bacteria related to water, plant, and human environments are more often associated with the KSC-specific genera. These results parallel those reported in the literature, and hence serve as benchmarks demonstrating the bioinformatic potential of this meta-database. The ultimate plans for SAMM include public availability, expansion through crowdsourcing efforts, and potential use as a companion resource to the culture collections assembled by DSMZ and JPL.

  17. Genomic BLAST: custom-defined virtual databases for complete and unfinished genomes.

    PubMed

    Cummings, Leda; Riley, Leigh; Black, Lori; Souvorov, Alexander; Resenchuk, Sergei; Dondoshansky, Ilya; Tatusova, Tatiana

    2002-11-01

    BLAST (Basic Local Alignment Search Tool) searches against DNA and protein sequence databases have become an indispensable tool for biomedical research. The proliferation of the genome sequencing projects is steadily increasing the fraction of genome-derived sequences in the public databases and their importance as a public resource. We report here the availability of Genomic BLAST, a novel graphical tool for simplifying BLAST searches against complete and unfinished genome sequences. This tool allows the user to compare the query sequence against a virtual database of DNA and/or protein sequences from a selected group of organisms with finished or unfinished genomes. The organisms for such a database can be selected using either a graphic taxonomy-based tree or an alphabetical list of organism-specific sequences. The first option is designed to help explore the evolutionary relationships among organisms within a certain taxonomy group when performing BLAST searches. The use of an alphabetical list allows the user to perform a more elaborate set of selections, assembling any given number of organism-specific databases from unfinished or complete genomes. This tool, available at the NCBI web site http://www.ncbi.nlm.nih.gov/cgi-bin/Entrez/genom_table_cgi, currently provides access to over 170 bacterial and archaeal genomes and over 40 eukaryotic genomes. PMID:12435493

  18. MetaBase—the wiki-database of biological databases

    PubMed Central

    Bolser, Dan M.; Chibon, Pierre-Yves; Palopoli, Nicolas; Gong, Sungsam; Jacob, Daniel; Angel, Victoria Dominguez Del; Swan, Dan; Bassi, Sebastian; González, Virginia; Suravajhala, Prashanth; Hwang, Seungwoo; Romano, Paolo; Edwards, Rob; Bishop, Bryan; Eargle, John; Shtatland, Timur; Provart, Nicholas J.; Clements, Dave; Renfro, Daniel P.; Bhak, Daeui; Bhak, Jong

    2012-01-01

    Biology is generating more data than ever. As a result, there is an ever increasing number of publicly available databases that analyse, integrate and summarize the available data, providing an invaluable resource for the biological community. As this trend continues, there is a pressing need to organize, catalogue and rate these resources, so that the information they contain can be most effectively exploited. MetaBase (MB) (http://MetaDatabase.Org) is a community-curated database containing more than 2000 commonly used biological databases. Each entry is structured using templates and can carry various user comments and annotations. Entries can be searched, listed, browsed or queried. The database was created using the same MediaWiki technology that powers Wikipedia, allowing users to contribute on many different levels. The initial release of MB was derived from the content of the 2007 Nucleic Acids Research (NAR) Database Issue. Since then, approximately 100 databases have been manually collected from the literature, and users have added information for over 240 databases. MB is synchronized annually with the static Molecular Biology Database Collection provided by NAR. To date, there have been 19 significant contributors to the project; each one is listed as an author here to highlight the community aspect of the project. PMID:22139927

  19. Generation and Analysis of a Large-Scale Expressed Sequence Tag Database from a Full-Length Enriched cDNA Library of Developing Leaves of Gossypium hirsutum L

    PubMed Central

    Pang, Chaoyou; Fan, Shuli; Song, Meizhen; Yu, Shuxun

    2013-01-01

    Background Cotton (Gossypium hirsutum L.) is one of the world’s most economically important crops. However, its entire genome has not been sequenced, and limited resources are available in GenBank for understanding the molecular mechanisms underlying leaf development and senescence. Methodology/Principal Findings In this study, 9,874 high-quality ESTs were generated from a normalized, full-length cDNA library derived from pooled RNA isolated from throughout leaf development during the plant blooming stage. After clustering and assembly of these ESTs, 5,191 unique sequences, representing 1,652 contigs and 3,539 singletons, were obtained. The average unique sequence length was 682 bp. Annotation of these unique sequences revealed that 84.4% showed significant homology to sequences in the NCBI non-redundant protein database, and 57.3% had significant hits to known proteins in the Swiss-Prot database. Comparative analysis indicated that our library added 2,400 ESTs and 991 unique sequences to those known for cotton. The unigenes were functionally characterized by gene ontology annotation. We identified 1,339 and 200 unigenes as potential leaf senescence-related genes and transcription factors, respectively. Moreover, nine genes related to leaf senescence and eleven MYB transcription factors were randomly selected for quantitative real-time PCR (qRT-PCR), which revealed that these genes were regulated differentially during senescence. The qRT-PCR for three GhYLSs revealed that these genes are expressed preferentially in senescent leaves. Conclusions/Significance These EST resources will provide valuable sequence information for gene expression profiling analyses and functional genomics studies to elucidate their roles, as well as for studying the mechanisms of leaf development and senescence in cotton and discovering candidate genes related to important agronomic traits of cotton. These data will also facilitate future whole-genome sequence assembly and annotation in G. hirsutum and comparative genomics among Gossypium species. PMID:24146870

  20. Colombian forensic genetics as a form of public science: The role of race, nation and common sense in the stabilization of DNA populations

    PubMed Central

    Schwartz-Marín, Ernesto; Wade, Peter; Cruz-Santiago, Arely; Cárdenas, Roosbelinda

    2015-01-01

    This article examines the role that vernacular notions of racialized-regional difference play in the constitution and stabilization of DNA populations in Colombian forensic science, in what we frame as a process of public science. In public science, the imaginations of the scientific world and common-sense public knowledge are integral to the production and circulation of science itself. We explore the origins and circulation of a scientific object – ‘La Tabla’, published in Paredes et al. and used in genetic forensic identification procedures – among genetic research institutes, forensic genetics laboratories and courtrooms in Bogotá. We unveil the double life of this central object of forensic genetics. On the one hand, La Tabla enjoys an indisputable public place in the processing of forensic genetic evidence in Colombia (paternity cases, identification of bodies, etc.). On the other hand, the relations it establishes between ‘race’, geography and genetics are questioned among population geneticists in Colombia. Although forensic technicians are aware of the disputes among population geneticists, they use and endorse the relations established between genetics, ‘race’ and geography because these fit with common-sense notions of visible bodily difference and the regionalization of race in the Colombian nation.

  1. The AMMA database

    NASA Astrophysics Data System (ADS)

    Boichard, Jean-Luc; Brissebrat, Guillaume; Cloche, Sophie; Eymard, Laurence; Fleury, Laurence; Mastrorillo, Laurence; Moulaye, Oumarou; Ramage, Karim

    2010-05-01

    The AMMA project includes aircraft, ground-based and ocean measurements, an intensive use of satellite data and diverse modelling studies. Therefore, the AMMA database aims at storing a great amount and a large variety of data, and at providing the data as rapidly and safely as possible to the AMMA research community. In order to stimulate the exchange of information and collaboration between researchers from different disciplines or using different tools, the database provides a detailed description of the products and uses standardized formats. The AMMA database contains: - AMMA field campaigns datasets; - historical data in West Africa from 1850 (operational networks and previous scientific programs); - satellite products from past and future satellites, (re-)mapped on a regular latitude/longitude grid and stored in NetCDF format (CF Convention); - model outputs from atmosphere or ocean operational (re-)analysis and forecasts, and from research simulations. The outputs are processed as the satellite products are. Before accessing the data, any user has to sign the AMMA data and publication policy. This charter only covers the use of data in the framework of scientific objectives and categorically excludes the redistribution of data to third parties and the usage for commercial applications. Some collaboration between data producers and users, and the mention of the AMMA project in any publication, are also required. The AMMA database and the associated on-line tools have been fully developed and are managed by two teams in France (IPSL Database Centre, Paris and OMP, Toulouse). Users can access data of both data centres using a single web portal. This website is composed of different modules: - Registration: forms to register, read and sign the data use charter when a user visits for the first time; - Data access interface: a user-friendly tool for building a data extraction request by selecting criteria such as location, time and parameters. The request can concern local, satellite and model data; - Documentation: catalogue of all the available data and their metadata. These tools have been developed using standard and free languages and software: - Linux system with an Apache web server and a Tomcat application server; - J2EE tools: JSF and Struts frameworks, Hibernate; - relational database management systems: PostgreSQL and MySQL; - OpenLDAP directory. In order to facilitate the access to the data by African scientists, the complete system has been mirrored at the AGRHYMET Regional Centre in Niamey and has been operational there since January 2009. Users can now access metadata and request data through either of two equivalent portals: http://database.amma-international.org or http://amma.agrhymet.ne/amma-data.

  2. Targeted Sequencing for Discovery and Validation of DNA Methylation Markers of Colon Cancer Metastasis — EDRN Public Portal

    Cancer.gov

    Colon cancer is the second leading cause of cancer death in the United States. A key issue in treating colon cancer patients is the inability to accurately predict tumors that have metastatic potential and require adjuvant chemotherapy. This project will test the model that tumor metastases arise from intra-tumor heterogeneity generated by DNA methylation events, and that detecting these events can provide a predictive signature of tumors with poor outcome.

  3. Annotation of novel neuropeptide precursors in the migratory locust based on transcript screening of a public EST database and mass spectrometry

    PubMed Central

    Clynen, Elke; Huybrechts, Jurgen; Verleyen, Peter; De Loof, Arnold; Schoofs, Liliane

    2006-01-01

    Background For holometabolous insects there has been an explosion of proteomic and peptidomic information thanks to large genome sequencing projects. Heterometabolous insects, although comprising many important species, have been far less studied. The migratory locust Locusta migratoria, a heterometabolous insect, is one of the most infamous agricultural pests. Locusts undergo a well-known and profound phase transition from the relatively harmless solitary form to a ferocious gregarious form. The underlying regulatory mechanisms of this phase transition are not fully understood, but neuropeptides are undoubtedly involved. However, neuropeptide research in locusts is hampered by the absence of genomic information. Results Recently, EST (Expressed Sequence Tag) databases from Locusta migratoria were constructed. Using bioinformatic tools, we searched these EST databases specifically for neuropeptide precursors. Based on known locust neuropeptide sequences, we confirmed the sequence of several previously identified neuropeptide precursors (i.e. pacifastin-related peptides), which consolidated our method. In addition, we found two novel neuroparsin precursors and annotated the hitherto unknown tachykinin precursor. Besides one of the known tachykinin peptides, this EST contained an additional tachykinin-like sequence. Using neuropeptide precursors from Drosophila melanogaster as a query, we succeeded in annotating the Locusta neuropeptide F, allatostatin-C and ecdysis-triggering hormone precursor, which until now had not been identified in locusts or in any other heterometabolous insect. For the tachykinin precursor, the ecdysis-triggering hormone precursor and the allatostatin-C precursor, translation of the predicted neuropeptides in neural tissues was confirmed with mass spectrometric techniques. Conclusion In this study we describe the annotation of 6 novel neuropeptide precursors and the neuropeptides they encode from the migratory locust, Locusta migratoria. By combining the manual annotation of neuropeptides with experimental evidence provided by mass spectrometry, we demonstrate that the genes are not only transcribed but also translated into precursor proteins. In addition, we show which neuropeptides are cleaved from these precursor proteins and how they are post-translationally modified. PMID:16899111

  4. Publication Bias in Antipsychotic Trials: An Analysis of Efficacy Comparing the Published Literature to the US Food and Drug Administration Database

    PubMed Central

    Turner, Erick H.; Knoepflmacher, Daniel; Shapley, Lee

    2012-01-01

    Background Publication bias compromises the validity of evidence-based medicine, yet a growing body of research shows that this problem is widespread. Efficacy data from drug regulatory agencies, e.g., the US Food and Drug Administration (FDA), can serve as a benchmark or control against which data in journal articles can be checked. Thus one may determine whether publication bias is present and quantify the extent to which it inflates apparent drug efficacy. Methods and Findings FDA Drug Approval Packages for eight second-generation antipsychotics—aripiprazole, iloperidone, olanzapine, paliperidone, quetiapine, risperidone, risperidone long-acting injection (risperidone LAI), and ziprasidone—were used to identify a cohort of 24 FDA-registered premarketing trials. The results of these trials according to the FDA were compared with the results conveyed in corresponding journal articles. The relationship between study outcome and publication status was examined, and effect sizes derived from the two data sources were compared. Among the 24 FDA-registered trials, four (17%) were unpublished. Of these, three failed to show that the study drug had a statistical advantage over placebo, and one showed the study drug was statistically inferior to the active comparator. Among the 20 published trials, the five that were not positive, according to the FDA, showed some evidence of outcome reporting bias. However, the association between trial outcome and publication status did not reach statistical significance. Further, the apparent increase in the effect size point estimate due to publication bias was modest (8%) and not statistically significant. On the other hand, the effect size for unpublished trials (0.23, 95% confidence interval 0.07 to 0.39) was less than half that for the published trials (0.47, 95% confidence interval 0.40 to 0.54), a difference that was significant. 
Conclusions The magnitude of publication bias found for antipsychotics was less than that found previously for antidepressants, possibly because antipsychotics demonstrate superiority to placebo more consistently. Without increased access to regulatory agency data, publication bias will continue to blur distinctions between effective and ineffective drugs. PMID:22448149
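    The abstract above reports that the effect size for unpublished trials was significantly smaller than for published trials. As a rough illustration of how such a comparison can be checked from the reported confidence intervals alone, the sketch below recovers approximate standard errors from the 95% intervals and applies a two-sample z-test. This is not the authors' method: it assumes normally distributed, independent estimates and symmetric intervals, and all names are illustrative.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def se_from_ci(lower: float, upper: float) -> float:
    """Approximate standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * Z95)


# (point estimate, lower bound, upper bound) as quoted in the abstract
unpublished = (0.23, 0.07, 0.39)
published = (0.47, 0.40, 0.54)

diff = published[0] - unpublished[0]

# Assumption: the two estimates are independent, so their variances add.
se_diff = math.hypot(se_from_ci(*unpublished[1:]), se_from_ci(*published[1:]))

z = diff / se_diff
p_two_sided = math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z)) for z > 0

print(f"difference = {diff:.2f}, z = {z:.2f}, two-sided p = {p_two_sided:.4f}")
```

    Under these assumptions the difference of 0.24 lies more than two and a half standard errors from zero, which is consistent with the abstract's statement that the difference was significant.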

  5. Database of recent tsunami deposits

    USGS Publications Warehouse

    Peters, Robert; Jaffe, Bruce E.

    2010-01-01

    This report describes a database of sedimentary characteristics of tsunami deposits derived from published accounts of tsunami deposit investigations conducted shortly after the occurrence of a tsunami. The database contains 228 entries, each entry containing data from up to 71 categories. It includes data from 51 publications covering 15 tsunamis distributed between 16 countries. The database encompasses a wide range of depositional settings including tropical islands, beaches, coastal plains, river banks, agricultural fields, and urban environments. It includes data from both local tsunamis and teletsunamis. The data are valuable for interpreting prehistorical, historical, and modern tsunami deposits, and for the development of criteria to identify tsunami deposits in the geologic record.

  6. Molecular Identification and Databases in Fusarium

    Technology Transfer Automated Retrieval System (TEKTRAN)

    DNA sequence-based methods for identifying pathogenic and mycotoxigenic Fusarium isolates have become the gold standard worldwide. Moreover, fusarial DNA sequence data are increasing rapidly in several web-accessible databases for comparative purposes. Unfortunately, the use of Basic Alignment Sea...

  7. The Porcelain Crab Transcriptome and PCAD, the Porcelain Crab Microarray and Sequence Database

    SciTech Connect

    Tagmount, Abderrahmane; Wang, Mei; Lindquist, Erika; Tanaka, Yoshihiro; Teranishi, Kristen S.; Sunagawa, Shinichi; Wong, Mike; Stillman, Jonathon H.

    2010-01-27

    Background: With the emergence of a completed genome sequence of the freshwater crustacean Daphnia pulex, the construction of genomic-scale sequence databases for additional crustaceans is important for comparative genomics and annotation. Porcelain crabs, genus Petrolisthes, have been powerful crustacean models for environmental and evolutionary physiology with respect to thermal adaptation and understanding responses of marine organisms to climate change. Here, we present a large-scale EST sequencing and cDNA microarray database project for the porcelain crab Petrolisthes cinctipes. Methodology/Principal Findings: A set of ~30K unique sequences (UniSeqs) representing ~19K clusters were generated from ~98K high quality ESTs from a set of tissue-specific non-normalized and mixed-tissue normalized cDNA libraries from the porcelain crab Petrolisthes cinctipes. Homology for each UniSeq was assessed using BLAST, InterProScan, GO and KEGG database searches. Approximately 66% of the UniSeqs had homology in at least one of the databases. All EST and UniSeq sequences along with annotation results and coordinated cDNA microarray datasets have been made publicly accessible at the Porcelain Crab Array Database (PCAD), a feature-enriched version of the Stanford and Longhorn Array Databases. Conclusions/Significance: The EST project presented here represents the third largest sequencing effort for any crustacean, and the largest effort for any crab species. Our assembly and clustering results suggest that our porcelain crab EST data set is as diverse as the much larger EST set generated in the Daphnia pulex genome sequencing project, and thus will be an important resource to the Daphnia research community. Our homology results support the pancrustacea hypothesis and suggest that Malacostraca may be ancestral to Branchiopoda and Hexapoda. Our results also suggest that our cDNA microarrays cover as much of the transcriptome as can reasonably be captured in EST library sequencing approaches, and thus represent a rich resource for studies of environmental genomics.

  8. The 2013 Nucleic Acids Research Database Issue and the online Molecular Biology Database Collection

    PubMed Central

    Fernández-Suárez, Xosé M.; Galperin, Michael Y.

    2013-01-01

    The 20th annual Database Issue of Nucleic Acids Research includes 176 articles, half of which describe new online molecular biology databases while the other half provide updates on databases previously featured in NAR and other journals. This year’s highlights include two databases of DNA repeat elements; several databases of transcriptional factors and transcriptional factor-binding sites; databases on various aspects of protein structure and protein–protein interactions; databases for metagenomic and rRNA sequence analysis; and four databases specifically dedicated to Escherichia coli. The increased emphasis on using the genome data to improve human health is reflected in the development of the databases of genomic structural variation (NCBI’s dbVar and EBI’s DGVa), the NIH Genetic Testing Registry and several other databases centered on the genetic basis of human disease, potential drugs, their targets and the mechanisms of protein–ligand binding. Two new databases present genomic and RNAseq data for monkeys, providing a wealth of data on our closest relatives for comparative genomics purposes. The NAR online Molecular Biology Database Collection, available at http://www.oxfordjournals.org/nar/database/a/, has been updated and currently lists 1512 online databases. The full content of the Database Issue is freely available online on the Nucleic Acids Research website (http://nar.oxfordjournals.org/). PMID:23203983

  9. ARTI Refrigerant Database

    SciTech Connect

    Calm, J.M.

    1992-11-09

    The database provides bibliographic citations and abstracts for publications that may be useful in research and design of air-conditioning and refrigeration equipment. The database identifies sources of specific information on R-32, R-123, R-124, R-125, R-134, R-134a, R-141b, R-142b, R-143a, R-152a, R-245ca, R-290 (propane), R-717 (ammonia), ethers, and others as well as azeotropic and zeotropic blends of these fluids. It addresses lubricants including alkylbenzene, polyalkylene glycol, ester, and other synthetics as well as mineral oils. It also references documents on compatibility of refrigerants and lubricants with metals, plastics, elastomers, motor insulation, and other materials used in refrigerant circuits. A computerized version is available that includes retrieval software.

  10. REDIdb: the RNA editing database

    PubMed Central

    Picardi, Ernesto; Regina, Teresa Maria Rosaria; Brennicke, Axel; Quagliariello, Carla

    2007-01-01

    The RNA Editing Database (REDIdb) is an interactive, web-based database created and designed with the aim to allocate RNA editing events such as substitutions, insertions and deletions occurring in a wide range of organisms. The database contains both fully and partially sequenced DNA molecules for which editing information is available either by experimental inspection (in vitro) or by computational detection (in silico). Each record of REDIdb is organized in a specific flat-file containing a description of the main characteristics of the entry, a feature table with the editing events and related details and a sequence zone with both the genomic sequence and the corresponding edited transcript. REDIdb is a relational database in which the browsing and identification of editing sites has been simplified by means of two facilities to either graphically display genomic or cDNA sequences or to show the corresponding alignment. In both cases, all editing sites are highlighted in colour and their relative positions are detailed by mousing over. New editing positions can be directly submitted to REDIdb after a user-specific registration to obtain authorized secure access. This first version of REDIdb database stores 9964 editing events and can be freely queried at . PMID:17175530

  11. Stackfile Database

    NASA Technical Reports Server (NTRS)

    deVarvalho, Robert; Desai, Shailen D.; Haines, Bruce J.; Kruizinga, Gerhard L.; Gilmer, Christopher

    2013-01-01

    This software provides storage, retrieval, and analysis functionality for managing satellite altimetry data. It improves on the efficiency and analysis capabilities of existing database software, with greater flexibility and better documentation. It offers flexibility in the type of data that can be stored. Retrieval is efficient across either the spatial domain or the time domain. Built-in analysis tools are provided for frequently performed altimetry tasks. This software package is used for storing and manipulating satellite measurement data. It was developed with a focus on handling the requirements of repeat-track altimetry missions such as Topex and Jason. It was, however, designed to work with a wide variety of satellite measurement data (e.g., from the Gravity Recovery And Climate Experiment, GRACE). The software consists of several command-line tools for importing, retrieving, and analyzing satellite measurement data.

  12. Open Geoscience Database

    NASA Astrophysics Data System (ADS)

    Bashev, A.

    2012-04-01

    Currently there is an enormous number of geoscience databases. Unfortunately, the only users of most of these databases are their developers. There are several reasons for this: incompatibility, specificity of tasks and objects, and so on. However, the main obstacles to wide usage of geoscience databases are complexity for developers and complication for users. Architectural complexity leads to high costs that block public access. Complication prevents users from understanding when and how to use the database. Only databases associated with GoogleMaps lack these drawbacks, but they can hardly be called "geoscience" databases. Nevertheless, an open and simple geoscience database is necessary, at least for educational purposes (see our abstract for ESSI20/EOS12). We developed a database and a web interface to work with it, and it is now accessible at maps.sch192.ru. In this database a result is a value of a parameter (no matter which) at a station with a certain position, associated with metadata: the date when the result was obtained; the type of station (lake, soil, etc.); and the contributor who sent the result. Each contributor has their own profile, which allows the reliability of the data to be estimated. The results can be represented on a GoogleMaps satellite image as a point at a certain position, coloured according to the value of the parameter. There are default colour scales, and each registered user can create their own scale. The results can also be extracted to a *.csv file. For both types of representation one can select the data by date, object type, parameter type, area and contributor. The data are uploaded in *.csv format: Name of the station; Latitude(dd.dddddd); Longitude(ddd.dddddd); Station type; Parameter type; Parameter value; Date(yyyy-mm-dd). The contributor is recognised when logging in. This is the minimal set of features required to connect a value of a parameter with a position and see the results. 
All complicated data treatment can be conducted in other programs after extracting the filtered data into a *.csv file. This makes the database understandable for non-experts. The database employs an open data format (*.csv) and widespread tools: PHP as the programming language, MySQL as the database management system, JavaScript for interaction with GoogleMaps and JQueryUI for creating the user interface. The database is multilingual: there are association tables that link to elements of the database. In total the development required about 150 hours. The database still has several problems. The main problem is the reliability of the data. It really needs an expert system for estimating reliability, but developing such a system would take more resources than the database itself. The second problem is stream selection: how to select stations that are connected with each other (for example, that belong to one water stream) and indicate their sequence. Currently the interface is available in English and Russian; however, it can easily be translated into other languages. Some problems we have solved. One is "the problem of the same station" (sometimes the distance between stations is smaller than the position error): when a new station is added to the database, our application automatically finds stations near that place. We also solved the problem of object and parameter types (how to regard "EC" and "electrical conductivity" as the same parameter) using "associative tables". If you would like to see the interface in your language, just contact us, and we will send you the list of terms and phrases to translate. The main advantage of the database is that it is totally open: everybody can view and extract the data and use them for non-commercial purposes at no charge. Registered users can contribute to the database without getting paid. 
We hope that it will be widely used, first of all for educational purposes, but professional scientists could use it too.
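
    The upload format and the "same station" check described in this abstract can be illustrated with a minimal Python sketch. This is not the authors' PHP/MySQL code. It assumes the semicolons in the field list are the actual column delimiter, that dates are ISO formatted, and that a 50 m tolerance stands in for the position error; the names Result, PARAMETER_ALIASES, parse_upload, distance_m and nearby_stations are hypothetical.

```python
import csv
import io
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class Result:
    station: str
    lat: float
    lon: float
    station_type: str
    parameter: str
    value: float
    measured_on: date


# Hypothetical alias table standing in for the abstract's "associative tables".
PARAMETER_ALIASES = {"ec": "electrical conductivity"}


def parse_upload(text):
    """Parse rows in the field order listed in the abstract (assumed ';'-delimited)."""
    results = []
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if not row:
            continue  # skip blank lines
        name, lat, lon, stype, param, value, day = (f.strip() for f in row)
        key = param.lower()
        results.append(Result(name, float(lat), float(lon), stype,
                              PARAMETER_ALIASES.get(key, key),
                              float(value), date.fromisoformat(day)))
    return results


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (haversine formula, spherical Earth)."""
    radius = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def nearby_stations(new, known, tolerance_m=50.0):
    """Names of known stations closer to `new` than the assumed position error."""
    return [k.station for k in known
            if distance_m(new.lat, new.lon, k.lat, k.lon) <= tolerance_m]
```

    A real deployment would probably run the proximity check as a database query rather than a linear scan; the scan is used here only to keep the sketch self-contained.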

  13. The Hawaiian Freshwater Algal Database (HfwADB): a laboratory LIMS and online biodiversity resource

    PubMed Central

    2012-01-01

    Background: Biodiversity databases serve the important role of highlighting species-level diversity from defined geographical regions. Databases that are specially designed to accommodate the types of data gathered during regional surveys are valuable in allowing full data access and display to researchers not directly involved with the project, while serving as a Laboratory Information Management System (LIMS). The Hawaiian Freshwater Algal Database, or HfwADB, was modified from the Hawaiian Algal Database to showcase non-marine algal specimens collected from the Hawaiian Archipelago by accommodating the additional level of organization required for samples including multiple species. Description: The Hawaiian Freshwater Algal Database is a comprehensive and searchable database containing photographs and micrographs of samples and collection sites, geo-referenced collecting information, taxonomic data and standardized DNA sequence data. All data for individual samples are linked through unique 10-digit accession numbers (“Isolate Accession”), the first five of which correspond to the collection site (“Environmental Accession”). Users can search online for sample information by accession number, various levels of taxonomy, habitat or collection site. HfwADB is hosted at the University of Hawaii, and was made publicly accessible in October 2011. At the present time the database houses data for over 2,825 samples of non-marine algae from 1,786 collection sites from the Hawaiian Archipelago. These samples include cyanobacteria, red and green algae and diatoms, as well as lesser representation from some other algal lineages. Conclusions: HfwADB is a digital repository that acts as a Laboratory Information Management System for Hawaiian non-marine algal data. 
Users can interact with the repository through the web to view relevant habitat data (including geo-referenced collection locations) and download images of collection sites, specimen photographs and micrographs, and DNA sequences. It is publicly available at http://algae.manoa.hawaii.edu/hfwadb/. PMID:23095476
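
    The accession scheme described above (a 10-digit Isolate Accession whose first five digits give the Environmental Accession) can be sketched in Python as follows. The function names are hypothetical, and treating accessions as zero-padded digit strings is an assumption, since the abstract does not say how they are stored.

```python
from collections import defaultdict


def split_isolate_accession(accession):
    """Return (environmental_accession, remaining_digits) for a 10-digit accession."""
    if len(accession) != 10 or not (accession.isascii() and accession.isdigit()):
        raise ValueError(f"expected 10 digits, got {accession!r}")
    return accession[:5], accession[5:]


def group_by_site(accessions):
    """Group isolate accessions under their five-digit collection-site prefix."""
    sites = defaultdict(list)
    for acc in accessions:
        site, _ = split_isolate_accession(acc)
        sites[site].append(acc)
    return dict(sites)
```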

  14. Rapid and accurate identification of microorganisms contaminating cosmetic products based on DNA sequence homology.

    PubMed

    Fujita, Y; Shibayama, H; Suzuki, Y; Karita, S; Takamatsu, S

    2005-12-01

    The aim of this study was to develop rapid and accurate procedures to identify microorganisms contaminating cosmetic products, based on the identity of the nucleotide sequences of the internal transcribed spacer (ITS) region of the ribosomal RNA coding DNA (rDNA). Five types of microorganisms were isolated from the inner portion of lotion bottle caps, skin care lotions, and cleansing gels. The rDNA ITS region of the microorganisms was amplified by colony-direct PCR or by ordinal PCR using DNA extracts as templates. The nucleotide sequences of the amplified DNA were determined and subjected to a homology search of a publicly available DNA database. For all five organisms analyzed, we thereby obtained database sequences with high similarity to the query sequences. The traditional identification procedure requires expert skills and approximately 1 month to identify the microorganisms. In contrast, 3-7 days were sufficient to complete all the procedures employed in the current method, including isolation and cultivation of organisms, DNA sequencing, and the database homology search. Moreover, it was possible to develop the skills necessary to perform the molecular techniques required for the identification procedures within 1 week. Consequently, the current method is useful for rapid and accurate identification of microorganisms contaminating cosmetics. PMID:18492168

  15. Database Marketplace 2002: The Database Universe.

    ERIC Educational Resources Information Center

    Tenopir, Carol; Baker, Gayle; Robinson, William

    2002-01-01

    Reviews the database industry over the past year, including new companies and services, company closures, popular database formats, popular access methods, and changes in existing products and services. Lists 33 firms and their database services; 33 firms and their database products; and 61 company profiles. (LRW)

  16. The IPD and IMGT/HLA database: allele variant databases

    PubMed Central

    Robinson, James; Halliwell, Jason A.; Hayhurst, James D.; Flicek, Paul; Parham, Peter; Marsh, Steven G. E.

    2015-01-01

    The Immuno Polymorphism Database (IPD) was developed to provide a centralized system for the study of polymorphism in genes of the immune system. Through the IPD project we have established a central platform for the curation and publication of locus-specific databases involved either directly or related to the function of the Major Histocompatibility Complex in a number of different species. We have collaborated with specialist groups or nomenclature committees that curate the individual sections before they are submitted to IPD for online publication. IPD consists of five core databases, with the IMGT/HLA Database as the primary database. Through the work of the various nomenclature committees, the HLA Informatics Group and in collaboration with the European Bioinformatics Institute we are able to provide public access to this data through the website http://www.ebi.ac.uk/ipd/. The IPD project continues to develop with new tools being added to address scientific developments, such as Next Generation Sequencing, and to address user feedback and requests. Regular updates to the website ensure that new and confirmatory sequences are dispersed to the immunogenetics community, and the wider research and clinical communities. PMID:25414341

  17. Did intense adverse media publicity impact on prescribing of paroxetine and the notification of suspected adverse drug reactions? Analysis of routine databases, 2001–2004

    PubMed Central

    Martin, Richard M; May, Margaret; Gunnell, David

    2006-01-01

    Aim To document the impact on clinical practice in England of media attention around possible adverse effects of paroxetine. Design Analysis of national selective serotonin reuptake inhibitor (SSRI) prescribing trends and yellow-card adverse drug reaction reports, 2001–2004. Results From a steady state in 2001, paroxetine prescribing declined sharply from April 2002, coinciding with a USA regulatory action; the subsequent decline in paroxetine prescribing was 1.87% per month (95% confidence interval −2.06, −1.68). Other SSRI prescribing increased by 1% per month until a major UK review of SSRIs in children in December 2003, after which prescribing plateaued. Media publicity was associated with short-term peaks in yellow-card reports related to paroxetine. Conclusion Falls in paroxetine and other SSRI prescribing in the UK coincided, respectively, with regulatory communications from the USA and the UK, but associations may have noncausal or other explanations. Reports of adverse reactions to paroxetine appeared to increase after adverse media publicity about the drug. PMID:16433877

  18. Kazusa Marker DataBase: a database for genomics, genetics, and molecular breeding in plants.

    PubMed

    Shirasawa, Kenta; Isobe, Sachiko; Tabata, Satoshi; Hirakawa, Hideki

    2014-09-01

    In order to provide useful genomic information for agronomical plants, we have established a database, the Kazusa Marker DataBase (http://marker.kazusa.or.jp). This database includes information on DNA markers, e.g., SSR and SNP markers, genetic linkage maps, and physical maps, that were developed at the Kazusa DNA Research Institute. Keyword searches for the markers, sequence data used for marker development, and experimental conditions are also available through this database. Currently, 10 plant species have been targeted: tomato (Solanum lycopersicum), pepper (Capsicum annuum), strawberry (Fragaria × ananassa), radish (Raphanus sativus), Lotus japonicus, soybean (Glycine max), peanut (Arachis hypogaea), red clover (Trifolium pratense), white clover (Trifolium repens), and eucalyptus (Eucalyptus camaldulensis). In addition, the number of plant species registered in this database will be increased as our research progresses. The Kazusa Marker DataBase will be a useful tool for both basic and applied sciences, such as genomics, genetics, and molecular breeding in crops. PMID:25320561

  19. Kazusa Marker DataBase: a database for genomics, genetics, and molecular breeding in plants

    PubMed Central

    Shirasawa, Kenta; Isobe, Sachiko; Tabata, Satoshi; Hirakawa, Hideki

    2014-01-01

    In order to provide useful genomic information for agronomical plants, we have established a database, the Kazusa Marker DataBase (http://marker.kazusa.or.jp). This database includes information on DNA markers, e.g., SSR and SNP markers, genetic linkage maps, and physical maps, that were developed at the Kazusa DNA Research Institute. Keyword searches for the markers, sequence data used for marker development, and experimental conditions are also available through this database. Currently, 10 plant species have been targeted: tomato (Solanum lycopersicum), pepper (Capsicum annuum), strawberry (Fragaria × ananassa), radish (Raphanus sativus), Lotus japonicus, soybean (Glycine max), peanut (Arachis hypogaea), red clover (Trifolium pratense), white clover (Trifolium repens), and eucalyptus (Eucalyptus camaldulensis). In addition, the number of plant species registered in this database will be increased as our research progresses. The Kazusa Marker DataBase will be a useful tool for both basic and applied sciences, such as genomics, genetics, and molecular breeding in crops. PMID:25320561

  20. Mining of public sequencing databases supports a non-dietary origin for putative foreign miRNAs: underestimated effects of contamination in NGS

    PubMed Central

    Tosar, Juan Pablo; Rovira, Carlos; Naya, Hugo; Cayota, Alfonso

    2014-01-01

    The report that exogenous plant miRNAs are able to cross the mammalian gastrointestinal tract and exert gene-regulation mechanism in mammalian tissues has yielded a lot of controversy, both in the public press and the scientific literature. Despite the initial enthusiasm, reproducibility of these results was recently questioned by several authors. To analyze the causes of this unease, we searched for diet-derived miRNAs in deep-sequencing libraries performed by ourselves and others. We found variable amounts of plant miRNAs in publicly available small RNA-seq data sets of human tissues. In human spermatozoa, exogenous RNAs reached extreme, biologically meaningless levels. On the contrary, plant miRNAs were not detected in our sequencing of human sperm cells, which was performed in the absence of any known sources of plant contamination. We designed an experiment to show that cross-contamination during library preparation is a source of exogenous RNAs. These contamination-derived exogenous sequences even resisted oxidation with sodium periodate. To test the assumption that diet-derived miRNAs were actually contamination-derived, we sought in the literature for previous sequencing reports performed by the same group which reported the initial finding. We analyzed the spectra of plant miRNAs in a small RNA sequencing study performed in amphioxus by this group in 2009 and we found a very strong correlation with the plant miRNAs which they later reported in human sera. Even though contamination with exogenous sequences may be easy to detect, cross-contamination between samples from the same organism can go completely unnoticed, possibly affecting conclusions derived from NGS transcriptomics. PMID:24729469

  1. The taming of an impossible child: a standardized all-in approach to the phylogeny of Hymenoptera using public database sequences

    PubMed Central

    2011-01-01

    Background: An enormous amount of molecular sequence data has accumulated over the past several years and is still growing exponentially with the use of faster and cheaper sequencing techniques. There is high and widespread interest in using these data for phylogenetic analyses. However, the amount of data that one can retrieve from public sequence repositories is virtually impossible to tame without dedicated software that automates processes. Here we present a novel bioinformatics pipeline for downloading, formatting, filtering and analyzing public sequence data deposited in GenBank. It combines some well-established programs with numerous newly developed software tools (available at http://software.zfmk.de/). Results: We used the bioinformatics pipeline to investigate the phylogeny of the megadiverse insect order Hymenoptera (sawflies, bees, wasps and ants) by retrieving and processing more than 120,000 sequences and by selecting subsets under the criteria of compositional homogeneity and defined levels of density and overlap. Tree reconstruction was done with a partitioned maximum likelihood analysis from a supermatrix with more than 80,000 sites and more than 1,100 species. In the inferred tree, consistent with previous studies, "Symphyta" is paraphyletic. Within Apocrita, our analysis suggests a topology of Stephanoidea + (Ichneumonoidea + (Proctotrupomorpha + (Evanioidea + Aculeata))). Despite the huge amount of data, we identified several persistent problems in the Hymenoptera tree. Data coverage is still extremely low, and additional data have to be collected to reliably infer the phylogeny of Hymenoptera. Conclusions: While we applied our bioinformatics pipeline to Hymenoptera, we designed the approach to be as general as possible. With this pipeline, it is possible to produce phylogenetic trees for any taxonomic group and to monitor new data and tree robustness in a taxon of interest. 
It therefore has great potential to meet the challenges of the phylogenomic era and to deepen our understanding of the tree of life. PMID:21851592

  2. XMAn: a Homo sapiens mutated-peptide database for the MS analysis of cancerous cell states.

    PubMed

    Yang, Xu; Lazar, Iulia M

    2014-12-01

    To enable the identification of mutated peptide sequences in complex biological samples, in this work, two novel cancer- and disease-related protein databases with mutation information collected from several public resources such as COSMIC, IARC P53, OMIM, and UniProtKB were developed. In-house developed Perl scripts were used to search and process the data and to translate each gene-level mutation into a mutated peptide sequence. The cancer and disease mutation databases comprise a total of 872,125 and 27,148 peptide entries from 25,642 and 2,913 proteins, respectively. A description line for each entry provides the parent protein ID and name, the cDNA- and protein-level mutation site and type, the originating database, and the disease or cancer tissue type and corresponding hits. The two databases are FASTA-formatted to enable data retrieval by commonly used tandem MS search engines. While the largest number of mutations were encountered for the amino acids A/D/E/G/L/P/R/S, the global mutation profiles replicate closely the outcome of the 1000 Genomes Project aimed at cataloguing natural mutations in the human population. The affected proteins were primarily involved in transcription regulation, splicing, protein synthesis/folding/binding, redox/energy production, adhesion/motility, and to some extent in DNA damage repair and signaling. The applicability of the database to identifying the presence of mutated peptides was investigated with MCF-7 breast cancer cell extracts. PMID:25211293
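
    To make the idea of turning a protein-level mutation into a FASTA-formatted mutated peptide concrete, here is a minimal Python sketch. It is not the authors' Perl code. It handles only single amino acid substitutions written like "A4V", cuts the peptide as a fixed flanking window (an assumption; the abstract does not say how peptide boundaries were chosen), and uses a simplified, hypothetical header layout.

```python
import re


def apply_substitution(protein, mutation):
    """Apply a substitution such as 'A4V' (1-based position) to a protein sequence."""
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if m is None:
        raise ValueError(f"unsupported mutation syntax: {mutation!r}")
    ref, pos, alt = m.group(1), int(m.group(2)), m.group(3)
    if pos < 1 or pos > len(protein) or protein[pos - 1] != ref:
        raise ValueError(f"reference residue mismatch for {mutation!r}")
    return protein[:pos - 1] + alt + protein[pos:], pos


def mutated_peptide(protein, mutation, flank=7):
    """Return the mutated residue plus up to `flank` residues on each side."""
    mutated, pos = apply_substitution(protein, mutation)
    start = max(0, pos - 1 - flank)
    return mutated[start:pos + flank]


def fasta_entry(protein_id, mutation, peptide, source):
    """Format one entry; the header fields are a hypothetical, simplified layout."""
    return f">{protein_id}|{mutation}|{source}\n{peptide}\n"
```

    Checking the reference residue before substituting catches numbering mismatches between resources, which is a common failure mode when mutation lists are merged from several databases.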

  3. MtDNA GeneExtractor: a computer tool for mtDNA gene/region information extraction.

    PubMed

    Freitas, Fernando; Oliveira, Sandra; Rocha, Ricardo; Pereira, Luísa

    2009-02-01

    The analysis of considerable numbers of DNA sequences is largely dependent on the development of simple software tools to automatically process the genetic data deposited in public databases. However, the automation process is complicated by the diverse synonyms used as qualifiers for genes and by some inconsistencies in gene locations between related Primate species, which occur even in the carefully curated database RefSeq. Here, we present mtDNA GeneExtractor, a Windows-based computer tool developed for the extraction of information for particular genes/regions from mammal mitochondrial DNA sequences deposited in GenBank format. The tool was quite efficient in retrieving organized information for comparative mtDNA gene/region diversity analyses when tested for the evaluation of transition/transversion ratios in humans and between Primates. Taking phylogenetic information into account to avoid redundancy due to ancestry-sharing, the transition/transversion ratios in the 13 protein-coding genes had a mean value of 12.46 for Primates (from 6.46 in ND2 to 17.04 in COX1) and were higher (34.74) but more heterogeneous (ranging from 17.30 in ND5 to 74.39 in ND4) in a worldwide human database. The similar patterns of transition/transversion ratios in all positions and in only fourfold degenerate positions show no evidence for selection in the 13 mtDNA protein-coding genes. PMID:19070686
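
    As an illustration of the transition/transversion ratio mentioned above, the following Python sketch counts both kinds of difference between two aligned sequences. It is a naive pairwise count, not the phylogeny-aware procedure the abstract describes, and the function names are hypothetical. Transitions are purine-to-purine (A/G) or pyrimidine-to-pyrimidine (C/T) changes; all other base changes are transversions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ts_tv_counts(seq1, seq2):
    """Count transitions and transversions between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Ignore identical sites, gaps and ambiguity codes.
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


def ts_tv_ratio(seq1, seq2):
    """Return transitions / transversions, or None when there are no transversions."""
    ts, tv = ts_tv_counts(seq1, seq2)
    return ts / tv if tv else None
```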

  4. FishMicrosat: a microsatellite database of commercially important fishes and shellfishes of the Indian subcontinent

    PubMed Central

    2013-01-01

    Background: Microsatellite DNA is one of many powerful genetic markers used for the construction of genetic linkage maps and the study of population genetics. The biological databases in the public domain hold vast numbers of microsatellite sequences for many organisms including fishes. The microsatellite data available in these data sources were extracted and organized into a database that facilitates sequence analysis and browsing of relevant information. The system also helps to design primer sequences for flanking regions of repeat loci for PCR identification of polymorphism within populations. Description: FishMicrosat is a database of microsatellite sequences of fishes and shellfishes that includes important aquaculture species such as Lates calcarifer, Ctenopharyngodon idella, Hypophthalmichthys molitrix, Penaeus monodon, Labeo rohita, Oreochromis niloticus, Fenneropenaeus indicus and Macrobrachium rosenbergii. The database contains 4398 microsatellite sequences of 41 species belonging to 15 families from the Indian subcontinent. GenBank of NCBI was used as a prime data source for developing the database. The database presents information about simple and compound microsatellites, their clusters and locus orientation within sequences. The database has been integrated with different tools in a web interface such as primer designing, locus finding, mapping repeats, detecting similarities among sequences across species, and searching using motifs and keywords. In addition, the database has the ability to browse information on the top 10 families and the top 10 species, through record overview. Conclusions: The FishMicrosat database is a useful resource for fish and shellfish microsatellite analyses and locus identification across species, which has important applications in population genetics, evolutionary studies and genetic relatedness among species. 
The database can be expanded further to include the microsatellite data of fishes and shellfishes from other regions and available information from genome sequencing projects for species of aquaculture importance. PMID:24047532
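
    Locating simple microsatellites in a sequence, the kind of repeat finding the abstract mentions, can be sketched with a regular expression in Python. This is not the FishMicrosat implementation. The motif length range (2 to 6 bases) and the minimum of four tandem copies are assumptions chosen for illustration, and the function name is hypothetical.

```python
import re

# A motif of 2-6 bases followed by at least three more copies of itself.
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{3,}")


def find_microsatellites(sequence):
    """Return (start, motif, copies) for each simple repeat found, 0-based start."""
    hits = []
    for match in SSR_PATTERN.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAA
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits
```

    The lazy quantifier makes the engine try the shortest motif first, so a run of ACACACAC is reported with motif AC rather than ACAC. Compound microsatellites, which the database also covers, would need an extra pass that merges adjacent hits.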

  5. Curcumin Resource Database

    PubMed Central

    Kumar, Anil; Chetia, Hasnahana; Sharma, Swagata; Kabiraj, Debajyoti; Talukdar, Narayan Chandra; Bora, Utpal

    2015-01-01

    Curcumin is one of the most intensively studied diarylheptanoids, with Curcuma longa being its principal producer. In addition, a class of promising curcumin analogs, aptly named curcuminoids, has been generated in laboratories and shows great potential in fields such as medicine and food technology. The lack of a universal source of data on curcumin and curcuminoids has long been felt by the curcumin research community. Hence, in an attempt to address this stumbling block, we have developed the Curcumin Resource Database (CRDB), which aims to serve as a gateway-cum-repository for accessing all relevant data and related information on curcumin and its analogs. Currently, this database encompasses 1186 curcumin analogs, 195 molecular targets, 9075 peer reviewed publications, 489 patents and 176 varieties of C. longa obtained by extensive data mining and careful curation from numerous sources. Each data entry is identified by a unique CRDB ID (identifier). Furnished with a user-friendly web interface and in-built search engine, CRDB provides well-curated and cross-referenced information that is hyperlinked with external sources. CRDB is expected to be highly useful to researchers working on structure-based as well as ligand-based molecular design of curcumin analogs. Database URL: http://www.crdb.in PMID:26220923

  6. ARTI Refrigerant Database

    SciTech Connect

    Calm, J.M. (Great Falls, VA)

    1993-04-30

    The Refrigerant Database consolidates and facilitates access to information to assist industry in developing equipment using alternative refrigerants. The underlying purpose is to accelerate phase out of chemical compounds of environmental concern. The database provides bibliographic citations and abstracts for publications that may be useful in research and design of air-conditioning and refrigeration equipment. The complete documents are not included. The database identifies sources of specific information on R-32, R-123, R-124, R-125, R-134, R-134a, R-141b, R-142b, R-143a, R-152a, R-245ca, R-290 (propane), R-717 (ammonia), ethers, and others as well as azeotropic and zeotropic blends of these fluids. It addresses lubricants including alkylbenzene, polyalkylene glycol, ester, and other synthetics as well as mineral oils. It also references documents addressing compatibility of refrigerants and lubricants with metals, plastics, elastomers, motor insulation, and other materials used in refrigerant circuits. Incomplete citations or abstracts are provided for some documents to accelerate availability of the information and will be completed or replaced in future updates.

  7. The Cambridge Structural Database.

    PubMed

    Groom, Colin R; Bruno, Ian J; Lightfoot, Matthew P; Ward, Suzanna C

    2016-04-01

    The Cambridge Structural Database (CSD) contains a complete record of all published organic and metal-organic small-molecule crystal structures. The database has been in operation for over 50 years and continues to be the primary means of sharing structural chemistry data and knowledge across disciplines. As well as structures that are made public to support scientific articles, it includes many structures published directly as CSD Communications. All structures are processed both computationally and by expert structural chemistry editors prior to entering the database. A key component of this processing is the reliable association of the chemical identity of the structure studied with the experimental data. This important step helps ensure that data is widely discoverable and readily reusable. Content is further enriched through selective inclusion of additional experimental data. Entries are available to anyone through free CSD community web services. Linking services developed and maintained by the CCDC, combined with the use of standard identifiers, facilitate discovery from other resources. Data can also be accessed through CCDC and third party software applications and through an application programming interface. PMID:27048719

  8. ARTI refrigerant database

    SciTech Connect

    Calm, J.M.

    1998-08-01

    The Refrigerant Database is an information system on alternative refrigerants, associated lubricants, and their use in air conditioning and refrigeration. It consolidates and facilitates access to property, compatibility, environmental, safety, application and other information. It provides corresponding information on older refrigerants, to assist manufacturers and those using alternative refrigerants, to make comparisons and determine differences. The underlying purpose is to accelerate phase out of chemical compounds of environmental concern. The database provides bibliographic citations and abstracts for publications that may be useful in research and design of air-conditioning and refrigeration equipment. The complete documents are not included, though some may be added at a later date. The database identifies sources of specific information on many refrigerants including propane, ammonia, water, carbon dioxide, propylene, ethers, and others as well as azeotropic and zeotropic blends of these fluids. It addresses lubricants including alkylbenzene, polyalkylene glycol, polyolester, and other synthetics as well as mineral oils. It also references documents addressing compatibility of refrigerants and lubricants with metals, plastics, elastomers, motor insulation, and other materials used in refrigerant circuits. Incomplete citations or abstracts are provided for some documents. They are included to accelerate availability of the information and will be completed or replaced in future updates.

  9. The Cambridge Structural Database

    PubMed Central

    Groom, Colin R.; Bruno, Ian J.; Lightfoot, Matthew P.; Ward, Suzanna C.

    2016-01-01

    The Cambridge Structural Database (CSD) contains a complete record of all published organic and metal–organic small-molecule crystal structures. The database has been in operation for over 50 years and continues to be the primary means of sharing structural chemistry data and knowledge across disciplines. As well as structures that are made public to support scientific articles, it includes many structures published directly as CSD Communications. All structures are processed both computationally and by expert structural chemistry editors prior to entering the database. A key component of this processing is the reliable association of the chemical identity of the structure studied with the experimental data. This important step helps ensure that data is widely discoverable and readily reusable. Content is further enriched through selective inclusion of additional experimental data. Entries are available to anyone through free CSD community web services. Linking services developed and maintained by the CCDC, combined with the use of standard identifiers, facilitate discovery from other resources. Data can also be accessed through CCDC and third party software applications and through an application programming interface. PMID:27048719

  10. ARTI refrigerant database

    SciTech Connect

    Calm, J.M.

    1996-04-15

    The Refrigerant Database is an information system on alternative refrigerants, associated lubricants, and their use in air conditioning and refrigeration. It consolidates and facilitates access to property, compatibility, environmental, safety, application and other information. It provides corresponding information on older refrigerants, to assist manufacturers and those using alternative refrigerants, to make comparisons and determine differences. The underlying purpose is to accelerate phase out of chemical compounds of environmental concern. The database provides bibliographic citations and abstracts for publications that may be useful in research and design of air-conditioning and refrigeration equipment. The complete documents are not included, though some may be added at a later date. The database identifies sources of specific information on refrigerants. It addresses lubricants including alkylbenzene, polyalkylene glycol, polyolester, and other synthetics as well as mineral oils. It also references documents addressing compatibility of refrigerants and lubricants with metals, plastics, elastomers, motor insulation, and other materials used in refrigerant circuits. Incomplete citations or abstracts are provided for some documents. They are included to accelerate availability of the information and will be completed or replaced in future updates. Citations in this report are divided into the following topics: thermophysical properties; materials compatibility; lubricants and tribology; application data; safety; test and analysis methods; impacts; regulatory actions; substitute refrigerants; identification; absorption and adsorption; research programs; and miscellaneous documents. Information is also presented on ordering instructions for the computerized version.

  11. ARTI refrigerant database

    SciTech Connect

    Calm, J.M.

    1997-02-01

    The Refrigerant Database is an information system on alternative refrigerants, associated lubricants, and their use in air conditioning and refrigeration. It consolidates and facilitates access to property, compatibility, environmental, safety, application and other information. It provides corresponding information on older refrigerants, to assist manufacturers and those using alternative refrigerants, to make comparisons and determine differences. The underlying purpose is to accelerate phase out of chemical compounds of environmental concern. The database provides bibliographic citations and abstracts for publications that may be useful in research and design of air-conditioning and refrigeration equipment. The complete documents are not included, though some may be added at a later date. The database identifies sources of specific information on various refrigerants. It addresses lubricants including alkylbenzene, polyalkylene glycol, polyolester, and other synthetics as well as mineral oils. It also references documents addressing compatibility of refrigerants and lubricants with metals, plastics, elastomers, motor insulation, and other materials used in refrigerant circuits. Incomplete citations or abstracts are provided for some documents. They are included to accelerate availability of the information and will be completed or replaced in future updates.

  12. The EMBL Nucleotide Sequence Database.

    PubMed

    Stoesser, Guenter; Baker, Wendy; van den Broek, Alexandra; Camon, Evelyn; Garcia-Pastor, Maria; Kanz, Carola; Kulikova, Tamara; Leinonen, Rasko; Lin, Quan; Lombard, Vincent; Lopez, Rodrigo; Redaschi, Nicole; Stoehr, Peter; Tuli, Mary Ann; Tzouvara, Katerina; Vaughan, Robert

    2002-01-01

    The EMBL Nucleotide Sequence Database (aka EMBL-Bank; http://www.ebi.ac.uk/embl/) incorporates, organises and distributes nucleotide sequences from all available public sources. EMBL-Bank is located and maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK. In an international collaboration with DDBJ (Japan) and GenBank (USA), data are exchanged amongst the collaborating databases on a daily basis. Major contributors to the EMBL database are individual scientists and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via FTP, email and World Wide Web interfaces. EBI's Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many other specialized databases. For sequence similarity searching, a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. All resources can be accessed via the EBI home page at http://www.ebi.ac.uk. PMID:11752244

  13. Introducing a High-Risk HPV DNA Test Into a Public Sector Screening Program in El Salvador

    PubMed Central

    Cremer, Miriam L.; Maza, Mauricio; Alfaro, Karla M.; Kim, Jane J.; Ditzian, Lauren R.; Villalta, Sofia; Alonzo, Todd A.; Felix, Juan C.; Castle, Philip E.; Gage, Julia C.

    2016-01-01

    Objective In a primary human papillomavirus (HPV) screening program, we compared the 6-month follow-up among colposcopy and noncolposcopy-based management strategies for screen-positive women. Materials and Methods Women aged 30 to 49 years were screened with HPV DNA tests using both self-collection and provider collection of samples. Women testing positive received either (1) colposcopy management (CM) consisting of colposcopy and management per local guidelines or (2) screen-and-treat (ST) management using visual inspection with acetic acid to determine cryotherapy eligibility, with eligible women undergoing immediate cryotherapy. One thousand women were recruited in each cohort. Of these, 368 (18.4%) of 2000 women were recruited using a more intensive outreach strategy. Demographics, HPV positivity, and treatment compliance were compared across recruitment and management strategies. Results More women in the ST cohort received treatment within 6 months compared with those in the CM cohort (117/119 [98.3%] vs 64/93 [68.8%]; p < .001). Women recruited through more intensive outreach were more likely to be HPV positive, lived in urban areas, were more educated, and had higher numbers of lifetime sexual partners and fewer children. Conclusions Women in the CM arm were less likely to complete care than women in the ST arm. Targeted outreach to underscreened women successfully identified women with higher prevalence of HPV and possibly higher disease burden. PMID:26890683
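
    The percentages reported above can be reproduced from the raw counts. The sketch below recomputes them and, as an assumption, applies a pooled two-proportion z-test; the abstract does not say which test produced p < .001, so the function names and the choice of test are illustrative only.

```python
import math

def percent(numerator, denominator):
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * numerator / denominator, 1)

def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value from a pooled two-proportion z-test.

    Assumption: this test is a stand-in; the study's actual method is not
    stated in the abstract.
    """
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / standard_error
    # P(|Z| > |z|) for a standard normal variable
    return math.erfc(abs(z) / math.sqrt(2))

st_treated = percent(117, 119)      # screen-and-treat cohort
cm_treated = percent(64, 93)        # colposcopy management cohort
outreach_share = percent(368, 2000) # women recruited by intensive outreach
p_value = two_proportion_p_value(117, 119, 64, 93)

print(st_treated, cm_treated, outreach_share, p_value < 0.001)
```

    Under these assumptions the recomputed figures agree with the reported 98.3%, 68.8% and 18.4%, and the difference between cohorts remains significant at the .001 level.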

  14. REFEREE: BIBLIOGRAPHIC DATABASE MANAGER, DOCUMENTATION

    EPA Science Inventory

    The publication is the user's manual for 3.xx releases of REFEREE, a general-purpose bibliographic database management program for IBM-compatible microcomputers. The REFEREE software also is available from NTIS. The manual has two main sections--Quick Tour and References Guide--a...

  15. FlyVar: a database for genetic variation in Drosophila melanogaster

    PubMed Central

    Wang, Fei; Jiang, Lichun; Chen, Yong; Haelterman, Nele A.; Bellen, Hugo J.; Chen, Rui

    2015-01-01

    FlyVar is a publicly and freely available platform that addresses the increasing need of next generation sequencing data analysis in the Drosophila research community. It is composed of three parts. First, a database that contains 5.94 million DNA polymorphisms found in Drosophila melanogaster derived from whole genome shotgun sequencing of 612 genomes of D. melanogaster. In addition, a list of 1094 dispensable genes has been identified. Second, a graphical user interface (GUI) has been implemented to allow easy and flexible queries of the database. Third, a set of interactive online tools enables filtering and annotation of genomic sequences obtained from individual D. melanogaster strains to identify candidate mutations. FlyVar permits the analysis of next generation sequencing data without the need of extensive computational training or resources. Database URL: www.iipl.fudan.edu.cn/FlyVar. PMID:26289428

  16. The Molecule Pages database

    PubMed Central

    Saunders, Brian; Lyon, Stephen; Day, Matthew; Riley, Brenda; Chenette, Emily; Subramaniam, Shankar

    2008-01-01

    The UCSD-Nature Signaling Gateway Molecule Pages (http://www.signaling-gateway.org/molecule) provides essential information on more than 3800 mammalian proteins involved in cellular signaling. The Molecule Pages contain expert-authored and peer-reviewed information based on the published literature, complemented by regularly updated information derived from public data source references and sequence analysis. The expert-authored data includes both a full-text review about the molecule, with citations, and highly structured data for bioinformatics interrogation, including information on protein interactions and states, transitions between states and protein function. The expert-authored pages are anonymously peer reviewed by the Nature Publishing Group. The Molecule Pages data is present in an object-relational database format and is freely accessible to the authors, the reviewers and the public from a web browser that serves as a presentation layer. The Molecule Pages are supported by several applications that along with the database and the interfaces form a multi-tier architecture. The Molecule Pages and the Signaling Gateway are routinely accessed by a very large research community. PMID:17965093

  17. Physiological Parameters Database for PBPK Modeling (External Review Draft)

    EPA Science Inventory

    EPA released for public comment a physiological parameters database (created using Microsoft ACCESS) intended to be used in PBPK modeling. The database contains physiological parameter values for humans from early childhood through senescence. It also contains similar data for an...

  18. Re-identification of DNA through an automated linkage process.

    PubMed Central

    Malin, B.; Sweeney, L.

    2001-01-01

    This work demonstrates how seemingly anonymous DNA database entries can be related to publicly available health information to uniquely and specifically identify the persons who are the subjects of the information even though the DNA information contains no accompanying explicit identifiers such as name, address, or Social Security number and contains no additional fields of personal information. The software program, REID (Re-Identification of DNA), iteratively uncovers unique occurrences in visit-disease patterns across data collections that reveal inferences about the identities of the patients who are the subject of the DNA. Using real-world data, REID established identifiable linkages in 33-100% of the 10,886 cases explicitly surveyed over 8 gene-based diseases. PMID:11825223
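
    The linkage idea can be illustrated with a minimal sketch. It is not the REID program: it makes a single pass rather than iterating, and the record layout, identifiers and the function name `link_unique_patterns` are hypothetical. Each record is reduced to its set of (hospital, disease) visits, and a DNA entry is linked only when its pattern occurs exactly once in both collections.

```python
from collections import defaultdict

def index_by_pattern(records):
    """Group record keys by their (hospital, disease) visit pattern."""
    index = defaultdict(list)
    for key, visits in records.items():
        index[frozenset(visits)].append(key)
    return index

def link_unique_patterns(dna_records, health_records):
    """Link a DNA entry to a named record when the shared pattern is unique in both."""
    dna_index = index_by_pattern(dna_records)
    health_index = index_by_pattern(health_records)
    links = {}
    for pattern, dna_ids in dna_index.items():
        names = health_index.get(pattern, [])
        if len(dna_ids) == 1 and len(names) == 1:
            links[dna_ids[0]] = names[0]
    return links

# Hypothetical toy data: no names in the DNA collection, names in the health data.
dna_records = {
    "dna-1": {("H1", "CF"), ("H2", "CF")},
    "dna-2": {("H1", "HD")},
    "dna-3": {("H3", "HD")},
}
health_records = {
    "patient A": {("H1", "CF"), ("H2", "CF")},
    "patient B": {("H1", "HD")},
    "patient C": {("H1", "HD")},
    "patient D": {("H3", "HD")},
}
links = link_unique_patterns(dna_records, health_records)
print(links)
```

    In this toy data, "dna-2" stays unlinked because two named patients share its pattern, which shows why uniqueness of a pattern is what drives re-identification.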

  19. Overlap in Bibliographic Databases.

    ERIC Educational Resources Information Center

    Hood, William W.; Wilson, Concepcion S.

    2003-01-01

    Examines the topic of Fuzzy Set Theory to determine the overlap of coverage in bibliographic databases. Highlights include examples of comparisons of database coverage; frequency distribution of the degree of overlap; records with maximum overlap; records unique to one database; intra-database duplicates; and overlap in the top ten databases.…
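
    The quantities listed above can be sketched with ordinary (crisp) sets. This is an assumption made for illustration: the article applies Fuzzy Set Theory, which this example does not attempt, and the database names and record identifiers are hypothetical.

```python
from collections import Counter

def overlap_summary(databases):
    """Return per-record degree of overlap, its frequency distribution,
    and the records unique to each database."""
    degree = Counter()
    for records in databases.values():
        for record in set(records):  # set() drops intra-database duplicates
            degree[record] += 1
    distribution = Counter(degree.values())
    unique = {
        name: sorted(r for r in set(records) if degree[r] == 1)
        for name, records in databases.items()
    }
    return degree, distribution, unique

# Hypothetical coverage lists; "r3" appears twice in database A.
databases = {
    "A": ["r1", "r2", "r3", "r3"],
    "B": ["r2", "r3", "r4"],
    "C": ["r3", "r5"],
}
degree, distribution, unique = overlap_summary(databases)
max_overlap = [r for r, d in degree.items() if d == max(degree.values())]
print(dict(distribution), unique, max_overlap)
```

    Here `distribution` maps a degree of overlap to the number of records with that degree, and `max_overlap` lists the records covered by the most databases.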

  20. Database Access Systems.

    ERIC Educational Resources Information Center

    Dalrymple, Prudence W.; Roderer, Nancy K.

    1994-01-01

    Highlights the changes that have occurred from 1987-93 in database access systems. Topics addressed include types of databases, including CD-ROMs; enduser interface; database selection; database access management, including library instruction and use of primary literature; economic issues; database users; the search process; and improving…

  1. A Novel Approach: Chemical Relational Databases, and the Role of the ISSCAN Database on Assessing Chemical Carcinogenicity

    EPA Science Inventory

    Mutagenicity and carcinogenicity databases are crucial resources for toxicologists and regulators involved in chemicals risk assessment. Until recently, existing public toxicity databases have been constructed primarily as "look-up-tables" of existing data, and most often did no...

  2. Databases of the marine metagenomics.

    PubMed

    Mineta, Katsuhiko; Gojobori, Takashi

    2016-02-01

    Metagenomic data obtained from marine environments are highly useful for understanding marine microbial communities. In comparison with the conventional amplicon-based approach to metagenomics, the more recent shotgun sequencing-based approach has become a powerful tool that provides an efficient way of grasping the diversity of the entire microbial community at a sampling point in the sea. However, this approach accelerates the accumulation of metagenome data and increases data complexity. Moreover, when the metagenomic approach is used to monitor changes over time in marine environments at multiple seawater locations, metagenomic data will accumulate at enormous speed. Because this situation is becoming a reality at many marine research institutions and stations all over the world, it seems clear that data management and analysis will be confronted by so-called Big Data issues, such as how a database can be constructed efficiently and how useful knowledge should be extracted from a vast amount of data. In this review, we outline all the major marine metagenome databases that are currently publicly available, noting that no database is devoted exclusively to marine metagenomes and that the number of metagenome databases including marine metagenome data is six, which is still unexpectedly small. We also extend our explanation to what we call reference databases, which will be useful for constructing a marine metagenome database and for complementing it with important information. Finally, we point out a number of challenges to be overcome in constructing a marine metagenome database. PMID:26518717

  3. Mouse Resource Browser—a database of mouse databases

    PubMed Central

    Zouberakis, Michael; Chandras, Christina; Swertz, Morris; Smedley, Damian; Gruenberger, Michael; Bard, Jonathan; Schughart, Klaus; Rosenthal, Nadia; Hancock, John M.; Schofield, Paul N.; Kollias, George; Aidinis, Vassilis

    2010-01-01

    The laboratory mouse has become the organism of choice for discovering gene function and unravelling pathogenetic mechanisms of human diseases through the application of various functional genomic approaches. The resulting deluge of data has led to the deployment of numerous online resources and the concomitant need for formalized experimental descriptions, data standardization, database interoperability and integration, a need that has yet to be met. We present here the Mouse Resource Browser (MRB), a database of mouse databases that indexes 217 publicly available mouse resources under 22 categories and uses a standardised database description framework (the CASIMIR DDF) to provide information on their controlled vocabularies (ontologies and minimum information standards), and technical information on programmatic access and data availability. Focusing on interoperability and integration, MRB offers automatic generation of downloadable and re-distributable SOAP application-programming interfaces for resources that provide direct database access. MRB aims to provide useful information to both bench scientists, who can easily navigate and find all mouse related resources in one place, and bioinformaticians, who will be provided with interoperable resources containing data which can be mined and integrated. Database URL: http://bioit.fleming.gr/mrb PMID:20627861

  4. Correlates of Access to Business Research Databases

    ERIC Educational Resources Information Center

    Gottfried, John C.

    2010-01-01

    This study examines potential correlates of business research database access through academic libraries serving top business programs in the United States. Results indicate that greater access to research databases is related to enrollment in graduate business programs, but not to overall enrollment or status as a public or private institution.

  5. Library Instruction and Online Database Searching.

    ERIC Educational Resources Information Center

    Mercado, Heidi

    1999-01-01

    Reviews changes in online database searching in academic libraries. Topics include librarians conducting all searches; the advent of end-user searching and the need for user instruction; compact disk technology; online public catalogs; the Internet; full text databases; electronic information literacy; user education and the remote library user;…

  6. Annual Review of Database Development: 1992.

    ERIC Educational Resources Information Center

    Basch, Reva

    1992-01-01

    Reviews recent trends in databases and online systems. Topics discussed include new access points for established databases; acquisitions, consolidations, and competition between vendors; European coverage; international services; online reference materials, including telephone directories; political and legal materials and public records;…

  7. 24 CFR 902.24 - Database adjustment.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 24 Housing and Urban Development 4 2014-04-01 2014-04-01 false Database adjustment. 902.24 Section 902.24 Housing and Urban Development REGULATIONS RELATING TO HOUSING AND URBAN DEVELOPMENT (CONTINUED... PUBLIC HOUSING ASSESSMENT SYSTEM Physical Condition Indicator § 902.24 Database adjustment....

  8. Correlates of Access to Business Research Databases

    ERIC Educational Resources Information Center

    Gottfried, John C.

    2010-01-01

    This study examines potential correlates of business research database access through academic libraries serving top business programs in the United States. Results indicate that greater access to research databases is related to enrollment in graduate business programs, but not to overall enrollment or status as a public or private institution.…

  9. 24 CFR 902.24 - Database adjustment.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 24 Housing and Urban Development 4 2013-04-01 2013-04-01 false Database adjustment. 902.24 Section 902.24 Housing and Urban Development REGULATIONS RELATING TO HOUSING AND URBAN DEVELOPMENT (CONTINUED... PUBLIC HOUSING ASSESSMENT SYSTEM Physical Condition Indicator § 902.24 Database adjustment....

  10. 24 CFR 902.24 - Database adjustment.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 24 Housing and Urban Development 4 2012-04-01 2012-04-01 false Database adjustment. 902.24 Section 902.24 Housing and Urban Development REGULATIONS RELATING TO HOUSING AND URBAN DEVELOPMENT (CONTINUED... PUBLIC HOUSING ASSESSMENT SYSTEM Physical Condition Indicator § 902.24 Database adjustment....

  11. The urologic epithelial stem cell database (UESC) – a web tool for cell type-specific gene expression and immunohistochemistry images of the prostate and bladder

    PubMed Central

    Pascal, Laura E; Deutsch, Eric W; Campbell, David S; Korb, Martin; True, Lawrence D; Liu, Alvin Y

    2007-01-01

    Background Public databases are crucial for analysis of high-dimensional gene and protein expression data. The Urologic Epithelial Stem Cells (UESC) database is a public database that contains gene and protein information for the major cell types of the prostate, prostate cancer cell lines, and a cancer cell type isolated from a primary tumor. Similarly, such information is available for urinary bladder cell types. Description Two major data types were archived in the database, protein abundance localization data from immunohistochemistry images, and transcript abundance data principally from DNA microarray analysis. Data results were organized in modules that were made to operate independently but built upon a core functionality. Gene array data and immunostaining images for human and mouse prostate and bladder were made available for interrogation. Data analysis capabilities include: (1) CD (cluster designation) cell surface protein data. For each cluster designation molecule, a data summary allows easy retrieval of images (at multiple magnifications). (2) Microarray data. Single gene or batch search can be initiated with Affymetrix Probeset ID, Gene Name, or Accession Number together with options of coalescing probesets and/or replicates. Conclusion Databases are invaluable for biomedical research, and their utility depends on data quality and user friendliness. UESC provides for database queries and tools to examine cell type-specific gene expression (normal vs. cancer), whereas most other databases contain only whole tissue expression datasets. The UESC database provides a valuable tool in the analysis of differential gene expression in prostate cancer genes in cancer progression. PMID:18072977

  12. DMTB: the magnetotactic bacteria database

    NASA Astrophysics Data System (ADS)

    Pan, Y.; Lin, W.

    2012-12-01

    Magnetotactic bacteria (MTB) are of interest in biogeomagnetism, rock magnetism, microbiology, biomineralization, and advanced magnetic materials because of their ability to synthesize highly ordered intracellular nano-sized magnetic minerals, magnetite or greigite. Great strides in MTB studies have been made in the past few decades. More than 600 articles concerning MTB have been published. These rapidly growing data are stimulating cross-disciplinary studies in such fields as biogeomagnetism. We have compiled the first online database for MTB, i.e., Database of Magnetotactic Bacteria (DMTB, http://database.biomnsl.com). It contains useful information on 16S rRNA gene sequences, oligonucleotides, and magnetic properties of MTB, and corresponding ecological metadata of sampling sites. The 16S rRNA gene sequences are collected from the GenBank database, while all other data are collected from the scientific literature. Rock magnetic properties for both uncultivated and cultivated MTB species are also included. In the DMTB database, data are accessible through four main interfaces: Site Sort, Phylo Sort, Oligonucleotides, and Magnetic Properties. References in each entry serve as links to specific pages within public databases. The online comprehensive DMTB will provide a very useful data resource for researchers from various disciplines, e.g., microbiology, rock magnetism and paleomagnetism, biogeomagnetism, magnetic material sciences and others.

  13. DNA BARCODING IN LAND PLANTS: DEVELOPING STANDARDS TO QUANTIFY AND MAXIMIZE SUCCESS

    PubMed Central

    Erickson, David L.; Spouge, John; Resch, Alissa; Weigt, Lee A.; Kress, W. John

    2009-01-01

    The selection of a DNA barcode in plants has been impeded in part due to the relatively low rates of nucleotide substitution observed at the most accessible plastid markers. However, the absence of consensus also reflects a lack of standards for comparing potential barcode markers. While many publications have suggested a host of plant DNA barcodes, the studies cannot be readily compared with each other through any quantitative or statistical parameter, partly because they put forward no single compelling rationale relevant to the adoption of a DNA barcode in plants. Here, we argue that the efficacy of any particular plant DNA barcode selection should reflect the anticipated performance of the resulting barcode database in assignment of a query sequence to species. While legitimate scientific disagreement exists over the criteria relevant to database performance, the notion gives a unifying rationale for prioritizing selection criteria. Accordingly, we suggest a measure of barcode efficacy based on the rationale of database performance, the probability of correct identification (PCI). Moreover, the definition of PCI is left flexible enough to handle most of the scientific disagreement over how to best evaluate DNA barcodes. Finally, we consider how different types of barcodes might require different methods of analysis and database design and indicate how the analysis might affect the selection of the most broadly effective barcode for land plants. PMID:19779570
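
    One possible way to make the probability of correct identification (PCI) concrete is sketched below. The abstract deliberately leaves the definition flexible, so everything here is an assumption: aligned, equal-length sequences, Hamming distance, and a rule that a query counts as correct only when its nearest reference sequences all belong to the true species. The species names and sequences are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def pci(reference, queries):
    """Fraction of queries assigned unambiguously to their true species.

    reference: list of (species, sequence) pairs forming the barcode database
    queries:   list of (true_species, sequence) pairs
    """
    correct = 0
    for true_species, query in queries:
        distances = [(hamming(query, seq), species) for species, seq in reference]
        best = min(d for d, _ in distances)
        nearest = {species for d, species in distances if d == best}
        if nearest == {true_species}:  # ties across species count as failures
            correct += 1
    return correct / len(queries)

reference = [("sp_a", "ACGTAC"), ("sp_b", "ACGTTT"), ("sp_c", "GGGTAC")]
queries = [
    ("sp_a", "ACGTAA"),  # closest to sp_a: correct
    ("sp_b", "ACGTTT"),  # exact match to sp_b: correct
    ("sp_c", "ACGTAC"),  # exact match to sp_a: wrong species
    ("sp_a", "ACGTAT"),  # tie between sp_a and sp_b: ambiguous
]
score = pci(reference, queries)
print(score)
```

    Swapping in a different distance or assignment rule changes only the body of the loop, which is the kind of flexibility the abstract describes.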

  14. Searching and Indexing Genomic Databases via Kernelization

    PubMed Central

    Gagie, Travis; Puglisi, Simon J.

    2015-01-01

    The rapid advance of DNA sequencing technologies has yielded databases of thousands of genomes. To search and index these databases effectively, it is important that we take advantage of the similarity between those genomes. Several authors have recently suggested searching or indexing only one reference genome and the parts of the other genomes where they differ. In this paper, we survey the 20-year history of this idea and discuss its relation to kernelization in parameterized complexity. PMID:25710001
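
    The idea of searching one reference genome plus only the parts where other genomes differ can be sketched as follows. As a simplifying assumption, each genome differs from the reference by substitutions only, stored as a position-to-base mapping; the function names and toy sequences are hypothetical, and the sketch does not reproduce any specific index from the surveyed literature.

```python
def find_all(text, pattern, offset=0):
    """All start positions of pattern in text, shifted by offset."""
    hits, i = set(), text.find(pattern)
    while i != -1:
        hits.add(i + offset)
        i = text.find(pattern, i + 1)
    return hits

def search(reference, variants, pattern):
    """Find pattern in the reference and in each variant genome without
    materialising the variant genomes in full."""
    m = len(pattern)
    ref_hits = find_all(reference, pattern)
    results = {"reference": sorted(ref_hits)}
    for name, subs in variants.items():
        # Reference hits that touch no substitution carry over unchanged.
        hits = {h for h in ref_hits if not any(h <= p < h + m for p in subs)}
        for p in subs:
            # Any match overlapping position p lies inside this window.
            start, end = max(0, p - m + 1), min(len(reference), p + m)
            window = list(reference[start:end])
            for q, base in subs.items():
                if start <= q < end:
                    window[q - start] = base
            hits |= find_all("".join(window), pattern, start)
        results[name] = sorted(hits)
    return results

reference = "ACGTACGTAC"
variants = {"g1": {4: "T"}}  # g1 reads ACGTTCGTAC
print(search(reference, variants, "GTA"))
print(search(reference, variants, "TTC"))
```

    Only a window of at most 2m - 1 characters around each difference is rescanned, so the work per genome grows with the number of differences rather than with the genome length.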

  15. CD-ROM-aided Databases

    NASA Astrophysics Data System (ADS)

    Masuyama, Keiichi

    CD-ROM has rapidly evolved as a new information medium with large capacity. In the U.S. it is predicted that it will become a two-hundred-billion-yen market in three years, and thus CD-ROM is a strategic target of the database industry. Here in Japan, the movement toward its commercialization has been active since this year. Will the CD-ROM business ever conquer the information market as an on-disk database or electronic publication? Referring to some cases of applications in the U.S., the author views the marketability and future trend of this new optical disk medium.

  16. RiboaptDB: A Comprehensive Database of Ribozymes and Aptamers

    PubMed Central

    Thodima, Venkata; Pirooznia, Mehdi; Deng, Youping

    2006-01-01

    Background Catalytic RNA molecules are called ribozymes. Aptamers are DNA or RNA molecules that have been selected from vast populations of random sequences through a combinatorial approach known as SELEX. The selected oligonucleotide sequences (~200 bp in length) have the ability to recognize a broad range of specific ligands by forming binding pockets. These novel aptamer sequences can bind to nucleic acids, proteins or small organic and inorganic chemical compounds and have many potential uses in medicine and technology. Results Comprehensive sequence information on aptamers and ribozymes that have been generated by in vitro selection methods is included in the RiboaptDB database. Such unnatural data generated by in vitro methods are not available in the public 'natural' sequence databases such as GenBank and EMBL. The amount of sequence data generated by in vitro selection experiments has been accumulating exponentially. RiboaptDB holds 4212 sequences in total (370 artificial ribozyme sequences and 3842 aptamer sequences) from 423 citations. We included a general search feature, feature-wise search, an online submission form for new data, and local BLAST search. Conclusion This database, besides serving as a storehouse of sequences that may have diagnostic or therapeutic utility in medicine, provides valuable information for computational and theoretical biologists. RiboaptDB is extremely useful for garnering information about in vitro selection experiments as a whole and for better understanding the distribution of functional nucleic acids in sequence space. The database is updated regularly and is publicly available at . PMID:17118149

  17. Evidence for DNA Damage as a Biological Link Between Diabetes and Cancer

    PubMed Central

    Lee, Shao Chin; Chan, Juliana CN

    2015-01-01

    Objective: This review examines the evidence that diabetes is a state of DNA damage; that pathophysiological factors in diabetes can cause DNA damage; that DNA damage can cause mutations; and that DNA mutation is linked to carcinogenesis. Data Sources: We retrieved information from the PubMed database up to January 2014, using various search terms and their combinations including DNA damage, diabetes, cancer, high glucose, hyperglycemia, free fatty acids, palmitic acid, advanced glycation end products, mutation and carcinogenesis. Study Selection: We included data from peer-reviewed journals and a textbook printed in English on relationships between DNA damage and diabetes as well as pathophysiological factors in diabetes. Publications on relationships among DNA damage, mutagenesis, and carcinogenesis were also reviewed. We organized this information into a conceptual framework to explain the possible causal relationship between DNA damage and carcinogenesis in diabetes. Results: A large amount of data supports the view that DNA mutation is a typical feature of carcinogenesis. Patients with type 2 diabetes have increased production of reactive oxygen species, reduced antioxidant capacity, and increased levels of DNA damage. The pathophysiological factors and metabolic milieu in diabetes can cause DNA damage such as DNA strand breaks and base modification (i.e., oxidation). Emerging experimental data suggest that signal pathways (i.e., Akt/tuberin) link diabetes to DNA damage. This collective evidence indicates that diabetes is a pathophysiological state of oxidative stress and DNA damage, which can lead to various types of mutation that cause aberrations in cells and thereby increase cancer risk. Conclusions: This review highlights the interrelationships among diabetes, DNA damage, DNA mutation and carcinogenesis, which suggest that DNA damage can be a biological link between diabetes and cancer. PMID:26021514

  18. The ITPA disruption database

    NASA Astrophysics Data System (ADS)

    Eidietis, N. W.; Gerhardt, S. P.; Granetz, R. S.; Kawano, Y.; Lehnen, M.; Lister, J. B.; Pautasso, G.; Riccardo, V.; Tanna, R. L.; Thornton, A. J.; ITPA Disruption Database Participants, The

    2015-06-01

    A multi-device database of disruption characteristics has been developed under the auspices of the International Tokamak Physics Activity magneto-hydrodynamics topical group. The purpose of this ITPA disruption database (IDDB) is to find the commonalities between the disruption and disruption mitigation characteristics in a wide variety of tokamaks in order to elucidate the physics underlying tokamak disruptions and to extrapolate toward much larger devices, such as ITER and future burning plasma devices. In contrast to previous smaller disruption data collation efforts, the IDDB aims to provide significant context for each shot provided, allowing exploration of a wide array of relationships between pre-disruption and disruption parameters. The IDDB presently includes contributions from nine tokamaks, including both conventional aspect ratio and spherical tokamaks. An initial parametric analysis of the available data is presented. This analysis includes current quench rates, halo current fraction and peaking, and the effectiveness of massive impurity injection. The IDDB is publicly available, with instructions for access provided herein.

  19. Curcumin Resource Database.

    PubMed

    Kumar, Anil; Chetia, Hasnahana; Sharma, Swagata; Kabiraj, Debajyoti; Talukdar, Narayan Chandra; Bora, Utpal

    2015-01-01

    Curcumin is one of the most intensively studied diarylheptanoids, with Curcuma longa being its principal producer. In addition, a class of promising curcumin analogs, aptly named curcuminoids, has been generated in laboratories and shows huge potential in fields such as medicine and food technology. The curcumin research community has long felt the lack of a universal source of data on curcumin and curcuminoids. To address this stumbling block, we have developed the Curcumin Resource Database (CRDB), which aims to serve as a gateway-cum-repository for all relevant data and related information on curcumin and its analogs. Currently, this database encompasses 1186 curcumin analogs, 195 molecular targets, 9075 peer reviewed publications, 489 patents and 176 varieties of C. longa obtained by extensive data mining and careful curation from numerous sources. Each data entry is identified by a unique CRDB ID (identifier). Furnished with a user-friendly web interface and in-built search engine, CRDB provides well-curated and cross-referenced information that is hyperlinked with external sources. CRDB is expected to be highly useful to researchers working on structure-based as well as ligand-based molecular design of curcumin analogs. PMID:26220923

  20. Three Decades of Recombinant DNA.

    ERIC Educational Resources Information Center

    Palmer, Jackie

    1985-01-01

    Discusses highlights in the development of genetic engineering, examining techniques with recombinant DNA, legal and ethical issues, GenBank (a national database of nucleic acid sequences), and other topics. (JN)

  1. Mouse Phenome Database

    PubMed Central

    Grubb, Stephen C.; Bult, Carol J.; Bogue, Molly A.

    2014-01-01

    The Mouse Phenome Database (MPD; phenome.jax.org) was launched in 2001 as the data coordination center for the international Mouse Phenome Project. MPD integrates quantitative phenotype, gene expression and genotype data into a common annotated framework to facilitate query and analysis. MPD contains >3500 phenotype measurements or traits relevant to human health, including cancer, aging, cardiovascular disorders, obesity, infectious disease susceptibility, blood disorders, neurosensory disorders, drug addiction and toxicity. Since our 2012 NAR report, we have added >70 new data sets, including data from Collaborative Cross lines and Diversity Outbred mice. During this time we have completely revamped our homepage, improved search and navigational aspects of the MPD application, developed several web-enabled data analysis and visualization tools, annotated phenotype data to public ontologies, developed an ontology browser and released new single nucleotide polymorphism query functionality with much higher density coverage than before. Here, we summarize recent data acquisitions and describe our latest improvements. PMID:24243846

  2. Quantifying the Consistency of Scientific Databases

    PubMed Central

    Šubelj, Lovro; Bajec, Marko; Mileva Boshkoska, Biljana; Kastrin, Andrej; Levnajić, Zoran

    2015-01-01

    Science is a social process with far-reaching impact on our modern society. In recent years, for the first time, we have become able to study science itself scientifically. This is enabled by the massive amounts of data on scientific publications that are increasingly becoming available. The data are contained in several databases such as Web of Science or PubMed, maintained by various public and private entities. Unfortunately, these databases are not always consistent, which considerably hinders this study. Relying on the powerful framework of complex networks, we conduct a systematic analysis of the consistency among six major scientific databases. We found that identifying a single "best" database is far from easy. Nevertheless, our results indicate appreciable differences in mutual consistency of different databases, which we interpret as recipes for future bibliometric studies. PMID:25984946

  3. Human gene mutation database-a biomedical information and research resource.

    PubMed

    Krawczak, M; Ball, E V; Fenton, I; Stenson, P D; Abeysinghe, S; Thomas, N; Cooper, D N

    2000-01-01

    Although 20 years have elapsed since the first single basepair substitution underlying an inherited disease in humans was characterised at the DNA level, the initiative has only recently been taken to establish central database resources for pathological genetic variants. Disease-associated gene lesions are currently collected and publicised by the Human Gene Mutation Database (HGMD) in Cardiff, locus-specific mutation databases, and to some extent also by the Genome Database (GDB) and Online Mendelian Inheritance in Man (OMIM). To date, HGMD represents the only comprehensive and publicly available database of gene lesions underlying human inherited disease. By July 1999, HGMD contained over 18,000 different mutations from some 900 human genes, the majority being single basepair substitutions. In addition to its potential as an information resource for clinicians and genetic counsellors, HGMD has allowed molecular geneticists to address a variety of biological questions through meta-analysis of the collated data. HGMD also promises to assist research workers in optimising mutation search strategies for a given gene. A questionnaire sent out to, and answered by, the editors of 20 key journals revealed that human genetics journals are increasingly reluctant to publish mutation reports. Electronic data submission and publication facilities are therefore urgently required. The World Wide Web (WWW) provides an excellent medium within which to combine the centralised management of basic mutation data, including rigorous quality control, with the possibility of publishing additional mutation-related information. In response to these needs, HGMD has both instituted a collaboration with Springer-Verlag GmbH, Heidelberg, to potentiate free online submission and electronic publication of human gene mutation data and developed links with the curators of locus-specific mutation databases. PMID:10612821

  4. National Residential Efficiency Measures Database

    DOE Data Explorer

    The National Residential Efficiency Measures Database is a publicly available, centralized resource of residential building retrofit measures and costs for the U.S. building industry. With support from the U.S. Department of Energy, NREL developed this tool to help users determine the most cost-effective retrofit measures for improving energy efficiency of existing homes. Software developers who require residential retrofit performance and cost data for applications that evaluate residential efficiency measures are the primary audience for this database. In addition, home performance contractors and manufacturers of residential materials and equipment may find this information useful. The database offers the following types of retrofit measures: 1) Appliances, 2) Domestic Hot Water, 3) Enclosure, 4) Heating, Ventilating, and Air Conditioning (HVAC), 5) Lighting, 6) Miscellaneous.

  5. VoSeq: A Voucher and DNA Sequence Web Application

    PubMed Central

    Peña, Carlos; Malm, Tobias

    2012-01-01

    An ever-growing number of molecular phylogenetic studies is being published, due in part to the advent of new techniques that allow cheap and quick DNA sequencing. Hence, there is increasing demand for relational databases with which to manage and annotate the accumulating DNA sequences, genes, voucher specimens and associated biological data. In addition, a user-friendly interface is necessary for easy integration and management of the data stored in the database back-end. Available databases allow management of a wide variety of biological data. However, most database systems are not specifically constructed as an organizational tool for researchers working in phylogenetic inference. Here we report new software that facilitates easy management of voucher and sequence data, consisting of a relational database as back-end for a graphical user interface accessed via a web browser. The application, VoSeq, includes tools for creating molecular datasets of DNA or amino acid sequences ready to be used in commonly used phylogenetic software such as RAxML, TNT, MrBayes and PAUP, as well as for creating tables ready for publishing. It also has inbuilt BLAST capabilities against all DNA sequences stored in VoSeq as well as sequences in NCBI GenBank. By using mash-ups and calls to web services, VoSeq allows easy integration with public services such as Yahoo! Maps, Flickr, Encyclopedia of Life (EOL) and GBIF (by generating data-dumps that can be processed with GBIF's Integrated Publishing Toolkit). PMID:22720030
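
    The idea of a relational back-end that links voucher specimens to their sequences and exports a per-gene dataset can be illustrated with a minimal sketch. This is not VoSeq's actual schema or code: the table names, column names, the `fasta_for_gene` helper and the short sequences are all hypothetical, chosen only to show the join-then-export pattern using Python's standard `sqlite3` module.

```python
import sqlite3

# Hypothetical two-table schema: one row per voucher, one row per sequence.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vouchers (code TEXT PRIMARY KEY, genus TEXT, species TEXT);
    CREATE TABLE sequences (
        voucher_code TEXT REFERENCES vouchers(code),
        gene TEXT,
        dna TEXT
    );
    -- Made-up example data, for illustration only.
    INSERT INTO vouchers VALUES ('V001', 'Danaus', 'plexippus'),
                                ('V002', 'Morpho', 'helenor');
    INSERT INTO sequences VALUES ('V001', 'COI', 'ATGGCA'),
                                 ('V002', 'COI', 'ATGGCT'),
                                 ('V001', 'EF1a', 'GGTACC');
""")


def fasta_for_gene(gene):
    """Join sequences to their vouchers and return one gene as FASTA text."""
    rows = conn.execute(
        """SELECT v.code, v.genus, v.species, s.dna
           FROM sequences s JOIN vouchers v ON v.code = s.voucher_code
           WHERE s.gene = ?
           ORDER BY v.code""",
        (gene,),
    )
    return "".join(
        f">{code}_{genus}_{species}\n{dna}\n"
        for code, genus, species, dna in rows
    )


print(fasta_for_gene("COI"))
```

    Keeping taxonomy on the voucher row and sequences in their own table means a specimen's name is stored once, however many genes have been sequenced from it.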

  6. Online Databases in Physics.

    ERIC Educational Resources Information Center

    Sievert, MaryEllen C.; Verbeck, Alison F.

    1984-01-01

    This overview of 47 online sources for physics information available in the United States--including sub-field databases, transdisciplinary databases, and multidisciplinary databases-- notes content, print source, language, time coverage, and databank. Two discipline-specific databases (SPIN and PHYSICS BRIEFS) are also discussed. (EJS)

  7. Databases: Beyond the Basics.

    ERIC Educational Resources Information Center

    Whittaker, Robert

    This presented paper offers an elementary description of database characteristics and then provides a survey of databases that may be useful to the teacher and researcher in Slavic and East European languages and literatures. The survey focuses on commercial databases that are available, usable, and needed. Individual databases discussed include:…

  9. Reflective Database Access Control

    ERIC Educational Resources Information Center

    Olson, Lars E.

    2009-01-01

    "Reflective Database Access Control" (RDBAC) is a model in which a database privilege is expressed as a database query itself, rather than as a static privilege contained in an access control list. RDBAC aids the management of database access controls by improving the expressiveness of policies. However, such policies introduce new interactions…

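    The core idea in the abstract above, a privilege expressed as a query over the database rather than as a static access control list, can be sketched as follows. This is only an illustration under stated assumptions, not the formalism of the cited work: the `employees` table, the `POLICY` query and the `readable_salaries` helper are hypothetical, and Python's standard `sqlite3` module stands in for a real DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT PRIMARY KEY, manager TEXT, salary INTEGER);
    -- Made-up example data, for illustration only.
    INSERT INTO employees VALUES ('alice', NULL, 120),
                                 ('bob', 'alice', 90),
                                 ('carol', 'alice', 95);
""")

# Hypothetical policy: a user may read a salary row if it is their own row
# or if the data itself names them as that employee's manager. The privilege
# is a query, so it tracks the current contents of the table.
POLICY = """
    SELECT name, salary FROM employees
    WHERE name = :user OR manager = :user
    ORDER BY name
"""


def readable_salaries(user):
    """Return the (name, salary) rows the policy lets `user` see."""
    return conn.execute(POLICY, {"user": user}).fetchall()


print(readable_salaries("bob"))    # bob sees only his own row
print(readable_salaries("alice"))  # alice also sees the people she manages
```

    If a manager assignment changes in the table, the set of readable rows changes with it and no separate access control list needs editing, which is also why such policies interact with the data in ways a static list does not.
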
  10. [A database for research in public health].

    PubMed

    Ramírez-Sánchez, A L; Infante-Castañeda, C; Schlaepfer-Pedrazzini, L; Bobadilla, J L; Nájera, P; Ramírez, T J

    1990-01-01

    Epidemiological and health system research projects are often delayed by the difficulty of building validated databases on personal computers. This paper presents a new interactive computer program for transferring numeric data from a given questionnaire to a structured archive. The questionnaire includes the basic variables of the dwelling and of the members of the household. A list of sociodemographic and health variables is selected, although other variables can easily be added according to special needs. All the intermediate steps regularly needed to construct a database are included in the package: capture, verification, validation and record linkage. The package is equipped with the basic procedures needed to produce tabulations and basic statistical analyses. PMID:2263987

  11. Human Mitochondrial Protein Database

    National Institute of Standards and Technology Data Gateway

    SRD 131 Human Mitochondrial Protein Database (Web, free access)   The Human Mitochondrial Protein Database (HMPDb) provides comprehensive data on mitochondrial and human nuclear encoded proteins involved in mitochondrial biogenesis and function. This database consolidates information from SwissProt, LocusLink, Protein Data Bank (PDB), GenBank, Genome Database (GDB), Online Mendelian Inheritance in Man (OMIM), Human Mitochondrial Genome Database (mtDB), MITOMAP, Neuromuscular Disease Center and Human 2-D PAGE Databases. This database is intended as a tool not only to aid in studying the mitochondrion but in studying the associated diseases.

  12. Interactive bibliographical database on color

    NASA Astrophysics Data System (ADS)

    Caivano, Jose L.

    2002-06-01

    The paper describes the methodology and results of a project under development, aimed at the elaboration of an interactive bibliographical database on color in all fields of application: philosophy, psychology, semiotics, education, anthropology, physical and natural sciences, biology, medicine, technology, industry, architecture and design, arts, linguistics, geography, history. The project is initially based upon an already developed bibliography, published in different journals, updated on various occasions, and now available on the Internet, with more than 2,000 entries. The interactive database will extend that bibliography, incorporating hyperlinks and contents (indexes, abstracts, keywords, introductions, or eventually the complete document), and devising mechanisms for information retrieval. The sources to be included are: books, doctoral dissertations, multimedia publications, reference works. The main arrangement will be chronological, but the design of the database will allow rearrangements or selections by different fields: subject, Decimal Classification System, author, language, country, publisher, etc. A further project is to develop another database, including color-specialized journals or newsletters, and articles on color published in international journals, arranged in this case by journal name and date of publication, but also allowing rearrangements or selections by author, subject and keywords.

  13. ppdb: a plant promoter database

    PubMed Central

    Yamamoto, Yoshiharu Y.; Obokata, Junichi

    2008-01-01

    ppdb (http://www.ppdb.gene.nagoya-u.ac.jp) is a plant promoter database that provides promoter annotation of Arabidopsis and rice. The database contains information on promoter structures, transcription start sites (TSSs) that have been identified from full-length cDNA clones and also a vast amount of TSS tag data. In ppdb, the promoter structures are determined by sets of promoter elements identified by a position-sensitive extraction method called local distribution of short sequences (LDSS). By using this database, the core promoter structure, the presence of regulatory elements and the distribution of TSS clusters can be identified. Although no differentiation of promoter architecture among plant species has been reported, there is some divergence of utilized sequences for promoter elements. Therefore, ppdb is based on species-specific sets of promoter elements, rather than on general motifs for multiple species. Each regulatory sequence is hyperlinked to literary information, a PLACE entry served by a plant cis-element database, and a list of promoters containing the regulatory sequence. PMID:17947329

  14. ppdb: a plant promoter database.

    PubMed

    Yamamoto, Yoshiharu Y; Obokata, Junichi

    2008-01-01

    ppdb (http://www.ppdb.gene.nagoya-u.ac.jp) is a plant promoter database that provides promoter annotation of Arabidopsis and rice. The database contains information on promoter structures, transcription start sites (TSSs) that have been identified from full-length cDNA clones and also a vast amount of TSS tag data. In ppdb, the promoter structures are determined by sets of promoter elements identified by a position-sensitive extraction method called local distribution of short sequences (LDSS). By using this database, the core promoter structure, the presence of regulatory elements and the distribution of TSS clusters can be identified. Although no differentiation of promoter architecture among plant species has been reported, there is some divergence of utilized sequences for promoter elements. Therefore, ppdb is based on species-specific sets of promoter elements, rather than on general motifs for multiple species. Each regulatory sequence is hyperlinked to literary information, a PLACE entry served by a plant cis-element database, and a list of promoters containing the regulatory sequence. PMID:17947329

  15. 42 CFR 455.436 - Federal database checks.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... 42 Public Health 4 2013-10-01 2013-10-01 false Federal database checks. 455.436 Section 455.436....436 Federal database checks. The State Medicaid agency must do all of the following: (a) Confirm the... databases. (b) Check the Social Security Administration's Death Master File, the National Plan and...

  16. 42 CFR 455.436 - Federal database checks.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... 42 Public Health 4 2014-10-01 2014-10-01 false Federal database checks. 455.436 Section 455.436....436 Federal database checks. The State Medicaid agency must do all of the following: (a) Confirm the... databases. (b) Check the Social Security Administration's Death Master File, the National Plan and...

  17. 40 CFR 1400.13 - Read-only database.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 33 2014-07-01 2014-07-01 false Read-only database. 1400.13 Section... INFORMATION Other Provisions § 1400.13 Read-only database. The Administrator is authorized to establish... public off-site consequence analysis information by means of a central database under the control of...

  18. 42 CFR 455.436 - Federal database checks.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 42 Public Health 4 2011-10-01 2011-10-01 false Federal database checks. 455.436 Section 455.436....436 Federal database checks. The State Medicaid agency must do all of the following: (a) Confirm the... databases. (b) Check the Social Security Administration's Death Master File, the National Plan and...

  19. 40 CFR 1400.13 - Read-only database.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 34 2012-07-01 2012-07-01 false Read-only database. 1400.13 Section... INFORMATION Other Provisions § 1400.13 Read-only database. The Administrator is authorized to establish... public off-site consequence analysis information by means of a central database under the control of...

  20. 40 CFR 1400.13 - Read-only database.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 34 2013-07-01 2013-07-01 false Read-only database. 1400.13 Section... INFORMATION Other Provisions § 1400.13 Read-only database. The Administrator is authorized to establish... public off-site consequence analysis information by means of a central database under the control of...

  1. 42 CFR 455.436 - Federal database checks.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... 42 Public Health 4 2012-10-01 2012-10-01 false Federal database checks. 455.436 Section 455.436....436 Federal database checks. The State Medicaid agency must do all of the following: (a) Confirm the... databases. (b) Check the Social Security Administration's Death Master File, the National Plan and...

  2. DNA Sequencing apparatus

    DOEpatents

    Tabor, Stanley; Richardson, Charles C.

    1992-01-01

    An automated DNA sequencing apparatus having a reactor for providing at least two series of DNA products formed from a single primer and a DNA strand, each DNA product of a series differing in molecular weight and having a chain terminating agent at one end; separating means for separating the DNA products to form a series of bands, the intensity of substantially all nearby bands in a different series being different; band reading means for determining the position an… This invention was made with government support including a grant from the U.S. Public Health Service, contract number AI-06045. The U.S. government has certain rights in the invention.

  3. CEBAF Large Acceptance Spectrometer (CLAS) Physics Database

    DOE Data Explorer

    A username and password are required to access and search the entire database. However, the Overview page provides links to detailed data pages for each of the experiments available for public access. There are many experiments with data that the public can freely access.

  4. On-Line Databases in Mexico.

    ERIC Educational Resources Information Center

    Molina, Enzo

    1986-01-01

    Use of online bibliographic databases in Mexico is provided through Servicio de Consulta a Bancos de Informacion, a public service that provides information retrieval, document delivery, translation, technical support, and training services. Technical infrastructure is based on a public packet-switching network and institutional users may receive…

  5. Village Green Project: Web-accessible Database

    EPA Science Inventory

    The purpose of this web-accessible database is for the public to be able to view instantaneous readings from a solar-powered air monitoring station located in a public location (prototype pilot test is outside of a library in Durham County, NC). The data are wirelessly transmitte...

  6. The UCSC Genome Browser database: 2015 update.

    PubMed

    Rosenbloom, Kate R; Armstrong, Joel; Barber, Galt P; Casper, Jonathan; Clawson, Hiram; Diekhans, Mark; Dreszer, Timothy R; Fujita, Pauline A; Guruvadoo, Luvina; Haeussler, Maximilian; Harte, Rachel A; Heitner, Steve; Hickey, Glenn; Hinrichs, Angie S; Hubley, Robert; Karolchik, Donna; Learned, Katrina; Lee, Brian T; Li, Chin H; Miga, Karen H; Nguyen, Ngan; Paten, Benedict; Raney, Brian J; Smit, Arian F A; Speir, Matthew L; Zweig, Ann S; Haussler, David; Kuhn, Robert M; Kent, W James

    2015-01-01

    Launched in 2001 to showcase the draft human genome assembly, the UCSC Genome Browser database (http://genome.ucsc.edu) and associated tools continue to grow, providing a comprehensive resource of genome assemblies and annotations to scientists and students worldwide. Highlights of the past year include the release of a browser for the first new human genome reference assembly in 4 years in December 2013 (GRCh38, UCSC hg38), a watershed comparative genomics annotation (100-species multiple alignment and conservation) and a novel distribution mechanism for the browser (GBiB: Genome Browser in a Box). We created browsers for new species (Chinese hamster, elephant shark, minke whale), 'mined the web' for DNA sequences and expanded the browser display with stacked color graphs and region highlighting. As our user community increasingly adopts the UCSC track hub and assembly hub representations for sharing large-scale genomic annotation data sets and genome sequencing projects, our menu of public data hubs has tripled. PMID:25428374

  7. The UCSC Genome Browser database: 2015 update

    PubMed Central

    Rosenbloom, Kate R.; Armstrong, Joel; Barber, Galt P.; Casper, Jonathan; Clawson, Hiram; Diekhans, Mark; Dreszer, Timothy R.; Fujita, Pauline A.; Guruvadoo, Luvina; Haeussler, Maximilian; Harte, Rachel A.; Heitner, Steve; Hickey, Glenn; Hinrichs, Angie S.; Hubley, Robert; Karolchik, Donna; Learned, Katrina; Lee, Brian T.; Li, Chin H.; Miga, Karen H.; Nguyen, Ngan; Paten, Benedict; Raney, Brian J.; Smit, Arian F. A.; Speir, Matthew L.; Zweig, Ann S.; Haussler, David; Kuhn, Robert M.; Kent, W. James

    2015-01-01

    Launched in 2001 to showcase the draft human genome assembly, the UCSC Genome Browser database (http://genome.ucsc.edu) and associated tools continue to grow, providing a comprehensive resource of genome assemblies and annotations to scientists and students worldwide. Highlights of the past year include the release of a browser for the first new human genome reference assembly in 4 years in December 2013 (GRCh38, UCSC hg38), a watershed comparative genomics annotation (100-species multiple alignment and conservation) and a novel distribution mechanism for the browser (GBiB: Genome Browser in a Box). We created browsers for new species (Chinese hamster, elephant shark, minke whale), ‘mined the web’ for DNA sequences and expanded the browser display with stacked color graphs and region highlighting. As our user community increasingly adopts the UCSC track hub and assembly hub representations for sharing large-scale genomic annotation data sets and genome sequencing projects, our menu of public data hubs has tripled. PMID:25428374

  8. MitoZoa 2.0: a database resource and search tools for comparative and evolutionary analyses of mitochondrial genomes in Metazoa

    PubMed Central

    D'Onorio de Meo, Paolo; D'Antonio, Mattia; Griggio, Francesca; Lupi, Renato; Borsani, Massimiliano; Pavesi, Giulio; Castrignanò, Tiziana; Pesole, Graziano; Gissi, Carmela

    2012-01-01

    The MITOchondrial genome database of metaZOAns (MitoZoa) is a public resource for comparative analyses of metazoan mitochondrial genomes (mtDNA) at both the sequence and genomic organizational levels. The main characteristics of the MitoZoa database are the careful revision of mtDNA entry annotations and the possibility of retrieving gene order and non-coding region (NCR) data in appropriate formats. The MitoZoa retrieval system enables basic and complex queries at various taxonomic levels using different search menus. MitoZoa 2.0 has been enhanced in several aspects, including: a re-annotation pipeline to check the correctness of protein-coding gene predictions; a standardized annotation of introns and of precursor ORFs whose functionality is post-transcriptionally recovered by RNA editing or programmed translational frameshifting; updates of taxon-related fields and a BLAST sequence similarity search tool. Database novelties and the definition of standard mtDNA annotation rules, together with the user-friendly retrieval system and the BLAST service, make MitoZoa a valuable resource for comparative and evolutionary analyses as well as a reference database to assist in the annotation of novel mtDNA sequences. MitoZoa is freely accessible at http://www.caspur.it/mitozoa. PMID:22123747

  9. Medical database security policies.

    PubMed

    Pangalos, G J

    1993-11-01

    Database security plays an important role in the overall security of medical information systems. Security does not only involve fundamental ethical principles such as privacy and confidentiality, but is also an essential prerequisite for effective medical care. The general framework and the requirements for medical database security are presented. The three prominent proposals for medical database security are discussed in some detail, together with specific proposals for medical database security. A number of parameters for a secure medical database development are presented and discussed, and guidelines are given for the development of secure medical database systems. PMID:8295541

  10. Curation accuracy of model organism databases

    PubMed Central

    Keseler, Ingrid M.; Skrzypek, Marek; Weerasinghe, Deepika; Chen, Albert Y.; Fulcher, Carol; Li, Gene-Wei; Lemmer, Kimberly C.; Mladinich, Katherine M.; Chow, Edmond D.; Sherlock, Gavin; Karp, Peter D.

    2014-01-01

    Manual extraction of information from the biomedical literature, or biocuration, is the central methodology used to construct many biological databases. For example, the UniProt protein database, the EcoCyc Escherichia coli database and the Candida Genome Database (CGD) are all based on biocuration. Biological databases are used extensively by life science researchers as online encyclopedias, as aids in the interpretation of new experimental data and as gold standards for the development of new bioinformatics algorithms. Although manual curation has been assumed to be highly accurate, we are aware of only one previous study of biocuration accuracy. We assessed the accuracy of EcoCyc and CGD by manually selecting curated assertions within randomly chosen EcoCyc and CGD gene pages and then validating that the data found in the referenced publications supported those assertions. A database assertion is considered to be in error if that assertion could not be found in the publication cited for it. We identified 10 errors in the 633 facts that we validated across the two databases, for an overall error rate of 1.58%, and individual error rates of 1.82% for CGD and 1.40% for EcoCyc. These data suggest that manual curation of the experimental literature by PhD-level scientists is highly accurate. Database URL: http://ecocyc.org/, http://www.candidagenome.org/ PMID:24923819
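
    The overall error rate quoted above follows directly from the two counts in the abstract (10 errors among 633 validated facts). A minimal sketch of that arithmetic is given below; the function name `error_rate_percent` is hypothetical, and the per-database rates are not recomputed because the abstract does not give the per-database counts.

```python
def error_rate_percent(errors, facts_checked):
    """Return the share of checked facts found to be in error, as a percentage."""
    if facts_checked <= 0:
        raise ValueError("facts_checked must be positive")
    return 100.0 * errors / facts_checked


# Counts taken from the abstract: 10 errors in 633 validated facts.
overall = error_rate_percent(10, 633)
print(f"{overall:.2f}%")  # prints "1.58%"
```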

  11. A nation's genes for a cure to cancer: evolving ethical, social and legal issues regarding population genetic databases.

    PubMed

    Hsieh, Alice

    2004-01-01

    The advent of the human genome sequence has focused research on understanding underlying genetic links to complex diseases such as cancer, asthma and heart disease. In the past few years, individual countries, such as Iceland, Estonia, Singapore and the United Kingdom, have created national databases of their citizens' DNA for comparative research. Most recently, an international consortium including Nigeria, Japan, China and the United States launched a $100 million project called the International HapMap to map the human genome according to haplotypes, blocks of DNA that contain genetic variation. Such population genetic databases present challenging ethical, social and legal issues, yet regulation of genetic information has developed sporadically, from region to region, without a consistent international standard. Without a clear understanding of the consequences of genetic research in terms of individual and community-wide discrimination and stigmatization, genetic databases raise concerns about the protection of genetic information. This Note provides a survey of the evolving landscape of population genetic databases as a legislative and public policy tool for national and international regulators. It compares different approaches to regulating the collection and use of population genetic databases in order to understand what areas of consensus are formulating a foundation for an international standard. As the first population genetics project that will span multiple countries for the collection of DNA, the International HapMap has the potential to become an influential standard for the protection of population genetic information. This Note highlights issues among the national databases and the HapMap project that raise ethical, social and legal concerns for the future and recommends further protections for both individual donors and community interests. PMID:16755693

  12. THE ECOTOX DATABASE

    EPA Science Inventory

    The database provides chemical-specific toxicity information for aquatic life, terrestrial plants, and terrestrial wildlife. ECOTOX is a comprehensive ecotoxicology database and is therefore essential for providing and supporting high quality models needed to estimate population...

  13. Household Products Database: Pesticides

    MedlinePlus

    ... Names Types of Products Manufacturers Ingredients About the Database FAQ Product Recalls Help Glossary Contact Us More ... holders. Information is extracted from Consumer Product Information Database 2001-2015 by DeLima Associates. All rights reserved. ...

  14. Physiological Information Database (PID)

    EPA Science Inventory

    EPA has developed a physiological information database (created using Microsoft ACCESS) intended to be used in PBPK modeling. The database contains physiological parameter values for humans from early childhood through senescence as well as similar data for laboratory animal spec...

  15. Network II Database

    Energy Science and Technology Software Center (ESTSC)

    1994-11-07

    The Oak Ridge National Laboratory (ORNL) Rail and Barge Network II Database is a representation of the rail and barge system of the United States. The network is derived from the Federal Rail Administration (FRA) rail database.

  16. ECOTOX DATABASE SYSTEM

    EPA Science Inventory

    The ECOTOXicology database is a source for locating single chemical toxicity data for aquatic life, terrestrial plants and wildlife. ECOTOX integrates three toxicology effects databases: AQUIRE (aquatic life), PHYTOTOX (terrestrial plants), and TERRETOX (terrestrial wildlife). Th...

  17. Scopus database: a review

    PubMed Central

    Burnham, Judy F

    2006-01-01

    The Scopus database provides access to STM journal articles and the references included in those articles, allowing the searcher to search both forward and backward in time. The database can be used for collection development as well as for research. This review provides information on the key points of the database and compares it to Web of Science. Neither database is all-inclusive, but the two complement each other. If a library can afford only one, the choice must be based on institutional needs. PMID:16522216

  18. Plant and Crop Databases

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Databases have become an integral part of all aspects of biological research, including basic and applied plant biology. The importance of databases continues to increase as the volume of data from direct and indirect genomics approaches expands. What is not always obvious to users of databases is t...

  19. Mission and Assets Database

    NASA Technical Reports Server (NTRS)

    Baldwin, John; Zendejas, Silvino; Gutheinz, Sandy; Borden, Chester; Wang, Yeou-Fang

    2009-01-01

    Mission and Assets Database (MADB) Version 1.0 is an SQL database system with a Web user interface to centralize information. The database stores flight project support resource requirements, view periods, antenna information, schedule, and forecast results for use in mid-range and long-term planning of Deep Space Network (DSN) assets.

  20. State and Local Government Publications.

    ERIC Educational Resources Information Center

    Nakata, Yuri; Kopec, Karen

    1980-01-01

    Reviews trends in library programs for state and local government publications and documents the increased interest in microforms and databases. Discussion focuses on publication distribution and control, and efforts to support interstate networking. There are 28 references. (RAA)

  1. The Stanford Tissue Microarray Database.

    PubMed

    Marinelli, Robert J; Montgomery, Kelli; Liu, Chih Long; Shah, Nigam H; Prapong, Wijan; Nitzberg, Michael; Zachariah, Zachariah K; Sherlock, Gavin J; Natkunam, Yasodha; West, Robert B; van de Rijn, Matt; Brown, Patrick O; Ball, Catherine A

    2008-01-01

    The Stanford Tissue Microarray Database (TMAD; http://tma.stanford.edu) is a public resource for disseminating annotated tissue images and associated expression data. Stanford University pathologists, researchers and their collaborators worldwide use TMAD for designing, viewing, scoring and analyzing their tissue microarrays. The use of tissue microarrays allows hundreds of human tissue cores to be simultaneously probed by antibodies to detect protein abundance (Immunohistochemistry; IHC), or by labeled nucleic acids (in situ hybridization; ISH) to detect transcript abundance. TMAD archives multi-wavelength fluorescence and bright-field images of tissue microarrays for scoring and analysis. As of July 2007, TMAD contained 205 161 images archiving 349 distinct probes on 1488 tissue microarray slides. Of these, 31 306 images for 68 probes on 125 slides have been released to the public. To date, 12 publications have been based on these raw public data. TMAD incorporates the NCI Thesaurus ontology for searching tissues in the cancer domain. Image processing researchers can extract images and scores for training and testing classification algorithms. The production server uses the Apache HTTP Server, Oracle Database and Perl application code. Source code is available to interested researchers under a no-cost license. PMID:17989087

  2. The Stanford Tissue Microarray Database

    PubMed Central

    Marinelli, Robert J.; Montgomery, Kelli; Liu, Chih Long; Shah, Nigam H.; Prapong, Wijan; Nitzberg, Michael; Zachariah, Zachariah K.; Sherlock, Gavin J.; Natkunam, Yasodha; West, Robert B.; van de Rijn, Matt; Brown, Patrick O.; Ball, Catherine A.

    2008-01-01

    The Stanford Tissue Microarray Database (TMAD; http://tma.stanford.edu) is a public resource for disseminating annotated tissue images and associated expression data. Stanford University pathologists, researchers and their collaborators worldwide use TMAD for designing, viewing, scoring and analyzing their tissue microarrays. The use of tissue microarrays allows hundreds of human tissue cores to be simultaneously probed by antibodies to detect protein abundance (Immunohistochemistry; IHC), or by labeled nucleic acids (in situ hybridization; ISH) to detect transcript abundance. TMAD archives multi-wavelength fluorescence and bright-field images of tissue microarrays for scoring and analysis. As of July 2007, TMAD contained 205 161 images archiving 349 distinct probes on 1488 tissue microarray slides. Of these, 31 306 images for 68 probes on 125 slides have been released to the public. To date, 12 publications have been based on these raw public data. TMAD incorporates the NCI Thesaurus ontology for searching tissues in the cancer domain. Image processing researchers can extract images and scores for training and testing classification algorithms. The production server uses the Apache HTTP Server, Oracle Database and Perl application code. Source code is available to interested researchers under a no-cost license. PMID:17989087

  3. The Organelle Genome Database Project (GOBASE).

    PubMed Central

    Korab-Laskowska, M; Rioux, P; Brossard, N; Littlejohn, T G; Gray, M W; Lang, B F; Burger, G

    1998-01-01

    The taxonomically broad organelle genome database (GOBASE) organizes and integrates diverse data related to organelles (mitochondria and chloroplasts). The current version of GOBASE focuses on the mitochondrial subset of data and contains molecular sequences, RNA secondary structures and genetic maps, as well as taxonomic information for all eukaryotic species represented. The database has been designed so that complex biological queries, especially ones posed in a comparative genomics context, are supported. GOBASE has been implemented as a relational database with a web-based user interface (http://megasun.bch.umontreal.ca/gobase/gobase.html). Custom software tools have been written in house to assist in the population of the database, data validation, nomenclature standardization and front-end design. The database is fully operational and publicly accessible via the World Wide Web, allowing interactive browsing, sophisticated searching and easy downloading of data. PMID:9399818

  4. LOTUS-DB: an integrative and interactive database for Nelumbo nucifera study.

    PubMed

    Wang, Kun; Deng, Jiao; Damaris, Rebecca Njeri; Yang, Mei; Xu, Liming; Yang, Pingfang

    2015-01-01

    Besides its significance in plant taxonomy and phylogeny, sacred lotus (Nelumbo nucifera Gaertn.) might also hold the key to the secrets of aging, which attracts increasing attention from researchers all over the world. Genetic and molecular studies on this species depend on its genome information. In 2013, two publications reported the sequencing of its full genome, based on which we constructed a database named LOTUS-DB. It will provide comprehensive information on the annotation, gene function and expression for the sacred lotus. The information will help users to efficiently query and browse genes, graphically visualize the genome and download a variety of data on genomic DNA, coding sequences (CDS), transcripts or peptide sequences, promoters and markers. It will accelerate research on gene cloning and functional identification in sacred lotus, and hence promote studies on this species and on plant genomics as well. Database URL: http://lotus-db.wbgcas.cn PMID:25819075

  5. LOTUS-DB: an integrative and interactive database for Nelumbo nucifera study

    PubMed Central

    Wang, Kun; Deng, Jiao; Damaris, Rebecca Njeri; Yang, Mei; Xu, Liming; Yang, Pingfang

    2015-01-01

    Besides its significance in plant taxonomy and phylogeny, sacred lotus (Nelumbo nucifera Gaertn.) might also hold the key to the secrets of aging, which attracts increasing attention from researchers all over the world. Genetic and molecular studies on this species depend on its genome information. In 2013, two publications reported the sequencing of its full genome, based on which we constructed a database named LOTUS-DB. It will provide comprehensive information on the annotation, gene function and expression for the sacred lotus. The information will help users to efficiently query and browse genes, graphically visualize the genome and download a variety of data on genomic DNA, coding sequences (CDS), transcripts or peptide sequences, promoters and markers. It will accelerate research on gene cloning and functional identification in sacred lotus, and hence promote studies on this species and on plant genomics as well. Database URL: http://lotus-db.wbgcas.cn. PMID:25819075

  6. BlotBase: a northern blot database.

    PubMed

    Schlamp, K; Weinmann, A; Krupp, M; Maass, T; Galle, Pr; Teufel, A

    2008-12-31

    With the availability of high-throughput gene expression analysis, multiple public expression databases emerged, mostly based on microarray expression data. Although these databases are of significant biomedical value, they have significant drawbacks, especially concerning the reliability of single gene expression profiles obtained by microarray data. Simultaneously, reliable data on an individual gene's expression are often published as single northern blots in individual publications. These data were not yet available for high-throughput screening. To reduce the gap between high-throughput expression data and individual highly reliable expression data, we designed a novel database "BlotBase", a freely and easily accessible database, currently containing approximately 700 published northern blots of human or mouse origin (http://www.medicalgenomics.org/Databases/BlotBase). As the database is open for public data submission, we expect this database to quickly become a large expression profiling resource, eventually providing higher reliability in high-throughput gene expression analysis. To build BlotBase, PubMed was searched manually and by computer-based text-mining methods to obtain publications containing northern blot results. Subsequently, northern blots were extracted and expression values of different tissues were calculated using ImageJ. All data were made available through a user-friendly web front end. The data may be searched either by full-text search or by listing the available northern blots for a specific tissue. Northern blot expression profiles were displayed by three expression states as well as a bar chart, allowing for automated evaluation. Furthermore, we integrated additional features, e.g. instant access to the corresponding RNA sequence or primer design tools, making further expression analysis more convenient. Finally, through a semiautomatic submission system this database was opened to the bioinformatics community. PMID:18838116

  7. DNA Barcoding for Species Assignment: The Case of Mediterranean Marine Fishes

    PubMed Central

    Landi, Monica; Dimech, Mark; Arculeo, Marco; Biondo, Girolama; Martins, Rogelia; Carneiro, Miguel; Carvalho, Gary Robert; Brutto, Sabrina Lo; Costa, Filipe O.

    2014-01-01

    Background DNA barcoding enhances the prospects for species-level identifications globally using a standardized and authenticated DNA-based approach. Reference libraries comprising validated DNA barcodes (COI) constitute robust datasets for testing query sequences, providing considerable utility to identify marine fish and other organisms. Here we test the feasibility of using DNA barcoding to assign species to tissue samples from fish collected in the central Mediterranean Sea, a major contributor to the European marine ichthyofaunal diversity. Methodology/Principal Findings A dataset of 1278 DNA barcodes, representing 218 marine fish species, was used to test the utility of DNA barcodes to assign species from query sequences. We tested query sequences against 1) a reference library of ranked DNA barcodes from the neighbouring North East Atlantic, and 2) the public databases BOLD and GenBank. In the first case, a reference library comprising DNA barcodes with reliability grades for 146 fish species was used as diagnostic dataset to screen 486 query DNA sequences from fish specimens collected in the central basin of the Mediterranean Sea. Of all query sequences suitable for comparisons 98% were unambiguously confirmed through complete match with reference DNA barcodes. In the second case, it was possible to assign species to 83% (BOLD-IDS) and 72% (GenBank) of the sequences from the Mediterranean. Relatively high intraspecific genetic distances were found in 7 species (2.2%–18.74%), most of them of high commercial relevance, suggesting possible cryptic species. Conclusion/Significance We emphasize the discriminatory power of COI barcodes and their application to cases requiring species level resolution starting from query sequences. Results highlight the value of public reference libraries of reliability grade-annotated DNA barcodes, to identify species from different geographical origins. 
    The ability to assign species with high precision from DNA samples of disparate quality and origin has major utility in several fields, from fisheries and conservation programs to control of fish products authenticity. PMID:25222272

  8. The Porcelain Crab Transcriptome and PCAD, the Porcelain Crab Microarray and Sequence Database

    PubMed Central

    Tagmount, Abderrahmane; Wang, Mei; Lindquist, Erika; Tanaka, Yoshihiro; Teranishi, Kristen S.; Sunagawa, Shinichi; Wong, Mike; Stillman, Jonathon H.

    2010-01-01

    Background With the emergence of a completed genome sequence of the freshwater crustacean Daphnia pulex, construction of genomic-scale sequence databases for additional crustacean sequences is important for comparative genomics and annotation. Porcelain crabs, genus Petrolisthes, have been powerful crustacean models for environmental and evolutionary physiology with respect to thermal adaptation and understanding responses of marine organisms to climate change. Here, we present a large-scale EST sequencing and cDNA microarray database project for the porcelain crab Petrolisthes cinctipes. Methodology/Principal Findings A set of ∼30K unique sequences (UniSeqs) representing ∼19K clusters was generated from ∼98K high quality ESTs from a set of tissue specific non-normalized and mixed-tissue normalized cDNA libraries from the porcelain crab Petrolisthes cinctipes. Homology for each UniSeq was assessed using BLAST, InterProScan, GO and KEGG database searches. Approximately 66% of the UniSeqs had homology in at least one of the databases. All EST and UniSeq sequences along with annotation results and coordinated cDNA microarray datasets have been made publicly accessible at the Porcelain Crab Array Database (PCAD), a feature-enriched version of the Stanford and Longhorn Array Databases. Conclusions/Significance The EST project presented here represents the third largest sequencing effort for any crustacean, and the largest effort for any crab species. Our assembly and clustering results suggest that our porcelain crab EST data set is as diverse as the much larger EST set generated in the Daphnia pulex genome sequencing project, and thus will be an important resource to the Daphnia research community. Our homology results support the pancrustacea hypothesis and suggest that Malacostraca may be ancestral to Branchiopoda and Hexapoda. 
    Our results also suggest that our cDNA microarrays cover as much of the transcriptome as can reasonably be captured in EST library sequencing approaches, and thus represent a rich resource for studies of environmental genomics. PMID:20174471

  9. An Introduction to Database Structure and Database Machines.

    ERIC Educational Resources Information Center

    Detweiler, Karen

    1984-01-01

    Enumerates principal management objectives of database management systems (data independence, quality, security, multiuser access, central control) and criteria for comparison (response time, size, flexibility, other features). Conventional database management systems, relational databases, and database machines used for backend processing are

  10. An Introduction to Database Structure and Database Machines.

    ERIC Educational Resources Information Center

    Detweiler, Karen

    1984-01-01

    Enumerates principal management objectives of database management systems (data independence, quality, security, multiuser access, central control) and criteria for comparison (response time, size, flexibility, other features). Conventional database management systems, relational databases, and database machines used for backend processing are…

  11. Genotype-based databases for variants causing rare diseases.

    PubMed

    Lanthaler, Barbara; Wieser, Stefanie; Deutschmann, Andrea; Schossig, Anna; Fauth, Christine; Zschocke, Johannes; Witsch-Baumgartner, Martina

    2014-10-15

    Inherited diseases are the result of DNA sequence changes. In recessive diseases, the clinical phenotype results from the combined functional effects of variants in both copies of the gene. In some diseases there is often considerable variability of clinical presentation or disease severity, which may be predicted by the genotype. Additional effects may be triggered by environmental factors, as well as genetic modifiers which could be nucleotide polymorphisms in related genes, e.g. maternal ApoE or ABCA1 genotypes which may have an influence on the phenotype of SLOS individuals. Here we report the establishment of genotype variation databases for various rare diseases which provide individual clinical phenotypes associated with genotypes and include data about possible genetic modifiers. These databases aim to be an easy public access to information on rare and private variants with clinical data, which will facilitate the interpretation of genetic variants. The created databases include ACAD8 (isobutyryl-CoA dehydrogenase deficiency (IBD)), ACADSB (short-chain acyl-CoA dehydrogenase (SCAD) deficiency), AUH (3-methylglutaconic aciduria (3-MGCA)), DHCR7 (Smith-Lemli-Opitz syndrome), HMGCS2 (3-hydroxy-3-methylglutaryl-CoA synthase 2 deficiency), HSD17B10 (17-beta-hydroxysteroid dehydrogenase X deficiency), FKBP14 (Ehlers-Danlos syndrome with progressive kyphoscoliosis, myopathy, and hearing loss; EDSKMH) and ROGDI (Kohlschütter-Tönz syndrome). These genes have been selected because of our specific research interests in these rare and metabolic diseases. The aim of the database was to include all identified individuals with variants in these specific genes. Identical genotypes are listed multiple times if they were found in several patients, phenotypic descriptions and biochemical data are included as detailed as possible in view also of validating the proposed pathogenicity of these genotypes. 
    For DHCR7, genetic modifier data (maternal APOE and ABCA1 genotypes) are also included. Databases are available at http://databases.lovd.nl/shared/genes and will be updated based on periodic literature reviews and submitted reports. PMID:25111118

  12. The National Land Cover Database

    USGS Publications Warehouse

    Homer, Collin H.; Fry, Joyce A.; Barnes, Christopher A.

    2012-01-01

    The National Land Cover Database (NLCD) serves as the definitive Landsat-based, 30-meter resolution, land cover database for the Nation. NLCD provides spatial reference and descriptive data for characteristics of the land surface such as thematic class (for example, urban, agriculture, and forest), percent impervious surface, and percent tree canopy cover. NLCD supports a wide variety of Federal, State, local, and nongovernmental applications that seek to assess ecosystem status and health, understand the spatial patterns of biodiversity, predict effects of climate change, and develop land management policy. NLCD products are created by the Multi-Resolution Land Characteristics (MRLC) Consortium, a partnership of Federal agencies led by the U.S. Geological Survey. All NLCD data products are available for download at no charge to the public from the MRLC Web site: http://www.mrlc.gov.

  13. GOLD: The Genomes Online Database

    DOE Data Explorer

    Kyrpides, Nikos; Liolios, Dinos; Chen, Amy; Tavernarakis, Nektarios; Hugenholtz, Philip; Markowitz, Victor; Bernal, Alex

    Since its inception in 1997, GOLD has continuously monitored genome sequencing projects worldwide and has provided the community with a unique centralized resource that integrates diverse information related to archaeal, bacterial, eukaryotic and, more recently, metagenomic sequencing projects. As of September 2007, GOLD recorded 639 completed genome projects. These projects have their complete sequence deposited into the public archival sequence databases such as GenBank, EMBL, and DDBJ. From the total of 639 complete and published genome projects as of 9/2007, 527 were bacterial, 47 were archaeal and 65 were eukaryotic. In addition to the complete projects, there were 2158 ongoing sequencing projects: 1328 of those were bacterial, 59 archaeal and 771 eukaryotic projects. Two types of metadata are provided by GOLD: (i) project metadata and (ii) organism/environment metadata. GOLD CARD pages for every project are available from the link of every GOLD_STAMP ID. The information in every one of these pages is organized into three tables: (a) Organism information, (b) Genome project information and (c) External links. [The Genomes On Line Database (GOLD) in 2007: Status of genomic and metagenomic projects and their associated metadata, Konstantinos Liolios, Konstantinos Mavromatis, Nektarios Tavernarakis and Nikos C. Kyrpides, Nucleic Acids Research Advance Access published online on November 2, 2007, Nucleic Acids Research, doi:10.1093/nar/gkm884]

    The basic tables in the GOLD database that can be browsed or searched include the following information:

    • Gold Stamp ID
    • Organism name
    • Domain
    • Links to information sources
    • Size and link to a map, when available
    • Chromosome number, plasmid number, and GC content
    • A link for downloading the actual genome data
    • Institution that did the sequencing
    • Funding source
    • Database where information resides
    • Publication status and information

    (Specialized Interface)

  14. IPD--the Immuno Polymorphism Database.

    PubMed

    Robinson, James; Waller, Matthew J; Stoehr, Peter; Marsh, Steven G E

    2005-01-01

    The Immuno Polymorphism Database (IPD) (http://www.ebi.ac.uk/ipd/) is a set of specialist databases related to the study of polymorphic genes in the immune system. IPD currently consists of four databases: IPD-KIR, contains the allelic sequences of Killer-cell Immunoglobulin-like Receptors; IPD-MHC, a database of sequences of the Major Histocompatibility Complex of different species; IPD-HPA, alloantigens expressed only on platelets; and IPD-ESTAB, which provides access to the European Searchable Tumour Cell-Line Database, a cell bank of immunologically characterized melanoma cell lines. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. Those sections with similar data, such as IPD-KIR and IPD-MHC share the same database structure. The sharing of a common database structure makes it easier to implement common tools for data submission and retrieval. The data are currently available online from the website and ftp directory; files will also be made available in different formats to download from the website and ftp server. The data will also be included in SRS, BLAST and FASTA search engines at the European Bioinformatics Institute. PMID:15608253

  15. Database extraction strategies for low-template evidence.

    PubMed

    Bleka, Øyvind; Dørum, Guro; Haned, Hinda; Gill, Peter

    2014-03-01

    Often in forensic cases, the profile of at least one of the contributors to a DNA evidence sample is unknown and a database search is needed to discover possible perpetrators. In this article we consider two types of search strategies to extract suspects from a database using methods based on probability arguments. The performance of the proposed match scores is demonstrated by carrying out a study of each match score relative to the level of allele drop-out in the crime sample, simulating low-template DNA. The efficiency was measured by random man simulation and we compared the performance using the SGM Plus kit and the ESX 17 kit for the Norwegian population, demonstrating that the latter has greatly enhanced power to discover perpetrators of crime in large national DNA databases. The code for the database extraction strategies will be prepared for release in the R-package forensim. PMID:24528591

  16. PDS: A Performance Database Server

    DOE PAGESBeta

    Berry, Michael W.; Dongarra, Jack J.; Larose, Brian H.; Letsche, Todd A.

    1994-01-01

    The process of gathering, archiving, and distributing computer benchmark data is a cumbersome task usually performed by computer users and vendors with little coordination. Most important, there is no publicly available central depository of performance data for all ranges of machines from personal computers to supercomputers. We present an Internet-accessible performance database server (PDS) that can be used to extract current benchmark data and literature. As an extension to the X-Windows-based user interface (Xnetlib) to the Netlib archival system, PDS provides an on-line catalog of public domain computer benchmarks such as the LINPACK benchmark, Perfect benchmarks, and the NAS parallel benchmarks. PDS does not reformat or present the benchmark data in any way that conflicts with the original methodology of any particular benchmark; it is thereby devoid of any subjective interpretations of machine performance. We believe that all branches (research laboratories, academia, and industry) of the general computing community can use this facility to archive performance metrics and make them readily available to the public. PDS can provide a more manageable approach to the development and support of a large dynamic database of published performance metrics.

  17. A software system for gene sequence database construction.

    PubMed

    Liu, Z; Borneman, J; Jiang, T

    2004-01-01

    We propose a Web-based software system for sequence database construction. An example application of this system is to construct a ribosomal RNA gene (rDNA) sequence database to facilitate the study of microbial communities. A fast and accurate approximate string-matching algorithm is implemented to fetch rDNA sequences sandwiched by two given primers from GenBank. A homology search algorithm based on Basic-Local-Alignment-Search-Tool (BLAST) is then used to extract rDNA sequences that do not contain the primers. This two-step process leads to an rDNA sequence database for a specific taxonomic group. We consider the distance between two given primers, mismatches and degeneracy when performing string matching. In the homology search, a chaining algorithm is combined with BLAST to obtain global alignments based on local alignments. This system can be used in many biological applications. PMID:17270858
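
    The first step described in this abstract, fetching sequences sandwiched by two primers while allowing for mismatches and degeneracy, can be illustrated with a small sketch. This is not the authors' algorithm (the abstract calls theirs fast and accurate but gives no details); it is a brute-force Hamming-distance scan written under stated assumptions: primers may contain IUPAC degenerate codes, the reverse primer is supplied as its binding site on the same strand as the sequence, and the names `mismatches`, `find_sites` and `extract_amplicons` as well as the default length limits are hypothetical.

```python
# IUPAC nucleotide codes mapped to the concrete bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(primer: str, window: str) -> int:
    """Count positions where the sequence base is not allowed by the primer code."""
    return sum(base not in IUPAC[code] for code, base in zip(primer, window))


def find_sites(seq: str, primer: str, max_mismatch: int) -> list[int]:
    """Return every start index where the primer matches within max_mismatch."""
    k = len(primer)
    return [i for i in range(len(seq) - k + 1)
            if mismatches(primer, seq[i:i + k]) <= max_mismatch]


def extract_amplicons(seq: str, fwd: str, rev_site: str, max_mismatch: int = 1,
                      min_len: int = 0, max_len: int = 2000) -> list[str]:
    """Return subsequences running from a forward-primer site to a reverse site.

    min_len and max_len bound the distance between the two primer sites
    (hypothetical defaults, chosen only for illustration).
    """
    ends = find_sites(seq, rev_site, max_mismatch)
    found = []
    for start in find_sites(seq, fwd, max_mismatch):
        for end in ends:
            inner = end - (start + len(fwd))  # bases between the two primers
            if min_len <= inner <= max_len:
                found.append(seq[start:end + len(rev_site)])
    return found


example = "TTACGTAAAGGGCCCTTTGCATT"
print(extract_amplicons(example, "ACGW", "TGCA", max_mismatch=0))
```

    Sequences that lack the primers would, per the abstract, be recovered in a second step by a BLAST-based homology search, which this sketch does not attempt.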

  18. The Latin American Social Medicine database

    PubMed Central

    Eldredge, Jonathan D; Waitzkin, Howard; Buchanan, Holly S; Teal, Janis; Iriart, Celia; Wiley, Kevin; Tregear, Jonathan

    2004-01-01

    Background Public health practitioners and researchers for many years have been attempting to understand more clearly the links between social conditions and the health of populations. Until recently, most public health professionals in English-speaking countries were unaware that their colleagues in Latin America had developed an entire field of inquiry and practice devoted to making these links more clearly understood. The Latin American Social Medicine (LASM) database finally bridges this previous gap. Description This public health informatics case study describes the key features of a unique information resource intended to improve access to LASM literature and to augment understanding about the social determinants of health. This case study includes both quantitative and qualitative evaluation data. Currently the LASM database at The University of New Mexico brings important information, originally known mostly within professional networks located in Latin American countries to public health professionals worldwide via the Internet. The LASM database uses Spanish, Portuguese, and English language trilingual, structured abstracts to summarize classic and contemporary works. Conclusion This database provides helpful information for public health professionals on the social determinants of health and expands access to LASM. PMID:15627401

  19. 2010 Worldwide Gasification Database

    DOE Data Explorer

    The 2010 Worldwide Gasification Database describes the current world gasification industry and identifies near-term planned capacity additions. The database lists gasification projects and includes information (e.g., plant location, number and type of gasifiers, syngas capacity, feedstock, and products). The database reveals that the worldwide gasification capacity has continued to grow for the past several decades and is now at 70,817 megawatts thermal (MWth) of syngas output at 144 operating plants with a total of 412 gasifiers.

  20. Indexing in temporal databases

    SciTech Connect

    Novikov, B.A.

    1995-03-01

    The concepts of temporal databases and supporting physical structures are discussed. Most of the known access methods for temporal databases are variations of ordinary one-dimensional access methods and actually ignore time-dimensional features. An exception is TSB-trees, which support queries of different types. A modification of TSB-trees is described that is based on a trie hashing scheme and searches the current part of a database faster than TSB-trees.

  1. IPSec Database Query Acceleration

    NASA Astrophysics Data System (ADS)

    Ferrante, Alberto; Chandra, Satish; Piuri, Vincenzo

    IPSec is a suite of protocols that adds security to communications at the IP level. Protocols within IPSec make extensive use of two databases, namely the Security Policy Database (SPD) and the Security Association Database (SAD). The ability to query the SPD quickly is fundamental, as this operation needs to be done for each incoming or outgoing IP packet, even if no IPSec processing needs to be applied to it. This may easily result in millions of queries per second in gigabit networks.

  2. ITS-90 Thermocouple Database

    National Institute of Standards and Technology Data Gateway

    SRD 60 NIST ITS-90 Thermocouple Database (Web, free access)   Web version of Standard Reference Database 60 and NIST Monograph 175. The database gives temperature -- electromotive force (emf) reference functions and tables for the letter-designated thermocouple types B, E, J, K, N, R, S and T. These reference functions have been adopted as standards by the American Society for Testing and Materials (ASTM) and the International Electrotechnical Commission (IEC).

  3. Open access intrapartum CTG database

    PubMed Central

    2014-01-01

    Background Cardiotocography (CTG) is a monitoring of fetal heart rate and uterine contractions. Since 1960 it has been routinely used by obstetricians to assess fetal well-being. Many attempts to introduce methods of automatic signal processing and evaluation have appeared during the last 20 years, however still no significant progress similar to that in the domain of adult heart rate variability, where open access databases are available (e.g. MIT-BIH), is visible. Based on a thorough review of the relevant publications, presented in this paper, the shortcomings of the current state are obvious. A lack of common ground for clinicians and technicians in the field hinders clinically usable progress. Our open access database of digital intrapartum cardiotocographic recordings aims to change that. Description The intrapartum CTG database consists in total of 552 intrapartum recordings, which were acquired between April 2010 and August 2012 at the obstetrics ward of the University Hospital in Brno, Czech Republic. All recordings were stored in electronic form in the OB TraceVue® system. The recordings were selected from 9164 intrapartum recordings with clinical as well as technical considerations in mind. All recordings are at most 90 minutes long and start a maximum of 90 minutes before delivery. The time relation of CTG to delivery is known as well as the length of the second stage of labor which does not exceed 30 minutes. The majority of recordings (all but 46 cesarean sections) is – on purpose – from vaginal deliveries. All recordings have available biochemical markers as well as some more general clinical features. Full description of the database and reasoning behind selection of the parameters is presented in the paper. Conclusion A new open-access CTG database is introduced which should give the research community common ground for comparison of results on a reasonably large database. We anticipate that after reading the paper, the reader will understand the context of the field from clinical and technical perspectives which will enable him/her to use the database and also understand its limitations. PMID:24418387

  4. 16 CFR 1102.28 - Publication of reports of harm.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... REGULATIONS PUBLICLY AVAILABLE CONSUMER PRODUCT SAFETY INFORMATION DATABASE (Eff. Jan. 10, 2011) Procedural..., the Commission will publish reports of harm that meet the requirements for publication in the Database...(d) in the Database beyond the 10-business-day time frame set forth in paragraph (a) of this...

  5. Databases for Microbiologists

    PubMed Central

    2015-01-01

    Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. The purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists. PMID:26013493

  6. Nuclear Science References Database

    NASA Astrophysics Data System (ADS)

    Pritychenko, B.; Běták, E.; Singh, B.; Totans, J.

    2014-06-01

    The Nuclear Science References (NSR) database together with its associated Web interface, is the world's only comprehensive source of easily accessible low- and intermediate-energy nuclear physics bibliographic information for more than 210,000 articles since the beginning of nuclear science. The weekly-updated NSR database provides essential support for nuclear data evaluation, compilation and research activities. The principles of the database and Web application development and maintenance are described. Examples of nuclear structure, reaction and decay applications are specifically included. The complete NSR database is freely available at the websites of the National Nuclear Data Center http://www.nndc.bnl.gov/nsr.

  7. Backing up DMF Databases

    NASA Technical Reports Server (NTRS)

    Cardo, Nicholas P.; Woodrow, Thomas (Technical Monitor)

    1994-01-01

    A complete backup of the Cray Data Migration Facility (DMF) databases should include the data migration databases, all media specific process (MSP) databases, and the journal file. The backup should be able to be accomplished without impacting users or stopping DMF. The High Speed Processors group at the Numerical Aerodynamics Simulation (NAS) Facility at NASA Ames Research Center undertook the task of finding an effective and efficient way to back up all DMF databases. This has been accomplished by taking advantage of new features introduced in DMF 2.0 and adding a minor modification to the dmdaemon. This paper discusses the investigation and the changes necessary to implement these enhancements.

  8. Databases for LDEF results

    NASA Technical Reports Server (NTRS)

    Bohnhoff-Hlavacek, Gail

    1992-01-01

    One of the objectives of the team supporting the LDEF Systems and Materials Special Investigative Groups is to develop databases of experimental findings. These databases identify the hardware flown, summarize results and conclusions, and provide a system for acknowledging investigators, tracing sources of data, and future design suggestions. To date, databases covering the optical experiments, and thermal control materials (chromic acid anodized aluminum, silverized Teflon blankets, and paints) have been developed at Boeing. We used the Filemaker Pro software, the database manager for the Macintosh computer produced by the Claris Corporation. It is a flat, text-retrievable database that provides access to the data via an intuitive user interface, without tedious programming. Though this software is available only for the Macintosh computer at this time, copies of the databases can be saved to a format that is readable on a personal computer as well. Further, the data can be exported to more powerful relational databases. This paper describes the capabilities and use of the LDEF databases and how to get copies of the database for your own research.

  9. Veterans Administration Databases

    Cancer.gov

    The Veterans Administration Information Resource Center provides database and informatics experts, customer service, expert advice, information products, and web technology to VA researchers and others.

  10. Increased coverage of protein families with the Blocks Database servers

    PubMed Central

    Henikoff, Jorja G.; Greene, Elizabeth A.; Pietrokovski, Shmuel; Henikoff, Steven

    2000-01-01

    The Blocks Database WWW (http://blocks.fhcrc.org) and Email (blocks@blocks.fhcrc.org) servers provide tools to search DNA and protein queries against the Blocks+ Database of multiple alignments, which represent conserved protein regions. Blocks+ nearly doubles the number of protein families included in the database by adding families from the Pfam-A, ProDom and Domo databases to those from PROSITE and PRINTS. Other new features include improved Block Searcher statistics, searching with NCBI's IMPALA program and 3D display of blocks on PDB structures. PMID:10592233

  11. Challenges in the association of human single nucleotide polymorphism mentions with unique database identifiers

    PubMed Central

    2011-01-01

    Background Most information on genomic variations and their associations with phenotypes is covered exclusively in scientific publications rather than in structured databases. These texts commonly describe variations using natural language; database identifiers are seldom mentioned. This complicates the retrieval of variations and associated articles, as well as information extraction, e.g. the search for biological implications. To overcome these challenges, procedures to map textual mentions of variations to database identifiers need to be developed. Results This article describes a workflow for normalization of variation mentions, i.e. the association of them to unique database identifiers. Common pitfalls in the interpretation of single nucleotide polymorphism (SNP) mentions are highlighted and discussed. The developed normalization procedure achieves a precision of 98.1% and a recall of 67.5% for unambiguous association of variation mentions with dbSNP identifiers on a text corpus based on 296 MEDLINE abstracts containing 527 mentions of SNPs. The annotated corpus is freely available at http://www.scai.fraunhofer.de/snp-normalization-corpus.html. Conclusions Comparable approaches usually focus on variations mentioned on the protein sequence and neglect problems for other SNP mentions. The results presented here indicate that normalizing SNPs described on DNA level is more difficult than the normalization of SNPs described on protein level. The challenges associated with normalization are exemplified with ambiguities and errors, which occur in this corpus. PMID:21992066
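
    To put the reported precision and recall on a single scale, one common summary is the balanced F-measure (F1), the harmonic mean of the two. The abstract does not report an F1 value, so the figure computed below is derived here and is not a result from the paper; the function name f1_score is only illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as stated in the abstract (98.1% and 67.5%).
f1 = f1_score(0.981, 0.675)
print(f"F1 = {f1:.3f}")
```

    With these inputs the F1 comes out at roughly 0.80, which reflects how the high precision is pulled down by the much lower recall.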

  12. DNA Profiling of Convicted Offender Samples for the Combined DNA Index System

    ERIC Educational Resources Information Center

    Millard, Julie T

    2011-01-01

    The cornerstone of forensic chemistry is that a perpetrator inevitably leaves trace evidence at a crime scene. One important type of evidence is DNA, which has been instrumental in both the implication and exoneration of thousands of suspects in a wide range of crimes. The Combined DNA Index System (CODIS), a network of DNA databases, provides…

  14. DENdb: database of integrated human enhancers

    PubMed Central

    Ashoor, Haitham; Kleftogiannis, Dimitrios; Radovanovic, Aleksandar; Bajic, Vladimir B.

    2015-01-01

    Enhancers are cis-acting DNA regulatory regions that play a key role in distal control of transcriptional activities. Identification of enhancers, coupled with a comprehensive functional analysis of their properties, could improve our understanding of complex gene transcription mechanisms and gene regulation processes in general. We developed DENdb, a centralized on-line repository of predicted enhancers derived from multiple human cell-lines. DENdb integrates enhancers predicted by five different methods generating an enriched catalogue of putative enhancers for each of the analysed cell-lines. DENdb provides information about the overlap of enhancers with DNase I hypersensitive regions, ChIP-seq regions of a number of transcription factors and transcription factor binding motifs, means to explore enhancer interactions with DNA using several chromatin interaction assays and enhancer neighbouring genes. DENdb is designed as a relational database that facilitates fast and efficient searching, browsing and visualization of information. Database URL: http://www.cbrc.kaust.edu.sa/dendb/ PMID:26342387

  15. Long Valley caldera GIS Database

    NASA Astrophysics Data System (ADS)

    Williams, M. J.; Battaglia, M.; Hill, D.; Langbein, J.; Segall, P.

    2002-12-01

    In May of 1980, a strong earthquake swarm that included four magnitude 6 earthquakes struck the southern margin of Long Valley Caldera associated with a 25-cm, dome-shaped uplift of the caldera floor. These events marked the onset of the latest period of caldera unrest that continues to this day. This ongoing unrest includes recurring earthquake swarms and continued dome-shaped uplift of the central section of the caldera (the resurgent dome) accompanied by changes in thermal springs and gas emissions. Analysis of combined gravity and geodetic data confirms the intrusion of silicic magma beneath Long Valley caldera. In 1982, the U.S. Geological Survey under the Volcano Hazards Program began an intensive effort to monitor and study geologic unrest in Long Valley Caldera. This database provides an overview of the studies being conducted by the Long Valley Observatory in Eastern California from 1975 to 2000. The database includes geological, monitoring and topographic datasets related to the Long Valley Caldera, plus a number of USGS publications on Long Valley (e.g., fact-sheets, references). Datasets are available as text files or ArcView shapefiles. Database CD-ROM Table of Contents: - Geological data (digital geologic map) - Monitoring data: Deformation (EDM, GPS, Leveling); Earthquakes; Gravity; Hydrologic; CO2 - Topographic data: DEM, DRG, Landsat 7, Rivers, Roads, Water Bodies - ArcView Project File

  16. Development of a GIS Snowstorm Database

    NASA Astrophysics Data System (ADS)

    Squires, M. F.

    2010-12-01

    This paper describes the development of a GIS Snowstorm Database (GSDB) at NOAA’s National Climatic Data Center. The snowstorm database is a collection of GIS layers and tabular information for 471 snowstorms between 1900 and 2010. Each snowstorm has undergone automated and manual quality control. The beginning and ending date of each snowstorm is specified. The original purpose of this data was to serve as input for NCDC’s new Regional Snowfall Impact Scale (ReSIS). However, this data is being preserved and used to investigate the impacts of snowstorms on society. GSDB is used to summarize the impact of snowstorms on transportation (interstates) and various classes of facilities (roads, schools, hospitals, etc.). GSDB can also be linked to other sources of impacts such as insurance loss information and Storm Data. Thus the snowstorm database is suited for many different types of users including the general public, decision makers, and researchers. This paper summarizes quality control issues associated with using snowfall data, methods used to identify the starting and ending dates of a storm, and examples of the tables that combine snowfall and societal data.

  17. IDBD: Infectious Disease Biomarker Database

    PubMed Central

    Yang, In Seok; Ryu, Chunsun; Cho, Ki Joon; Kim, Jin Kwang; Ong, Swee Hoe; Mitchell, Wayne P.; Kim, Bong Su; Kim, Kyung Hyun

    2008-01-01

    Biomarkers enable early diagnosis, guide molecularly targeted therapy and monitor the activity and therapeutic responses across a variety of diseases. Despite intensified interest and research, however, the overall rate of development of novel biomarkers has been falling. Moreover, no solution is yet available that efficiently retrieves and processes biomarker information pertaining to infectious diseases. Infectious Disease Biomarker Database (IDBD) is one of the first efforts to build an easily accessible and comprehensive literature-derived database covering known infectious disease biomarkers. IDBD is a community annotation database, utilizing collaborative Web 2.0 features, providing a convenient user interface to input and revise data online. It allows users to link infectious diseases or pathogens to protein, gene or carbohydrate biomarkers through the use of search tools. It supports various types of data searches and application tools to analyze sequence and structure features of potential and validated biomarkers. Currently, IDBD integrates 611 biomarkers for 66 infectious diseases and 70 pathogens. It is publicly accessible at http://biomarker.cdc.go.kr and http://biomarker.korea.ac.kr. PMID:17982173

  18. ERGDB: Estrogen Responsive Genes Database.

    PubMed

    Tang, Suisheng; Han, Hao; Bajic, Vladimir B

    2004-01-01

    ERGDB is an integrated knowledge database dedicated to genes responsive to estrogen. Genes included in ERGDB are those whose expression levels are experimentally proven to be either up-regulated or down-regulated by estrogen. Genes included are identified based on publications from the PubMed database and each record has been manually examined, evaluated and selected for inclusion by biologists. ERGDB aims to be a unified gateway to store, search, retrieve and update information about estrogen responsive genes. Each record contains links to relevant databases, such as GenBank, LocusLink, Refseq, PubMed and ATCC. The unique feature of ERGDB is that it contains information on the dependence of gene reactions on experimental conditions. In addition to basic information about the genes, information for each record includes gene functional description, experimental methods used, tissue or cell type, gene reaction, estrogen exposure time and the summary of putative estrogen response elements if the gene's promoter sequence was available. Through a web interface at http://sdmc.i2r.a-star.edu.sg/ergdb/cgi-bin/explore.pl users can either browse or query ERGDB. Access is free for academic and non-profit users. PMID:14681475

  19. Biological Macromolecule Crystallization Database

    National Institute of Standards and Technology Data Gateway

    SRD 21 Biological Macromolecule Crystallization Database (Web, free access)   The Biological Macromolecule Crystallization Database and NASA Archive for Protein Crystal Growth Data (BMCD) contains the conditions reported for the crystallization of proteins and nucleic acids used in X-ray structure determinations and archives the results of microgravity macromolecule crystallization studies.

  20. HIV Structural Database

    National Institute of Standards and Technology Data Gateway

    SRD 102 HIV Structural Database (Web, free access)   The HIV Protease Structural Database is an archive of experimentally determined 3-D structures of Human Immunodeficiency Virus 1 (HIV-1), Human Immunodeficiency Virus 2 (HIV-2) and Simian Immunodeficiency Virus (SIV) Proteases and their complexes with inhibitors or products of substrate cleavage.

  1. First Look: TRADEMARKSCAN Database.

    ERIC Educational Resources Information Center

    Fernald, Anne Conway; Davidson, Alan B.

    1984-01-01

    Describes database produced by Thomson and Thomson and available on Dialog which contains over 700,000 records representing all active federal trademark registrations and applications for registrations filed in United States Patent and Trademark Office. A typical record, special features, database applications, learning to use TRADEMARKSCAN, and

  2. Dictionary as Database.

    ERIC Educational Resources Information Center

    Painter, Derrick

    1996-01-01

    Discussion of dictionaries as databases focuses on the digitizing of The Oxford English dictionary (OED) and the use of Standard Generalized Mark-Up Language (SGML). Topics include the creation of a consortium to digitize the OED, document structure, relational databases, text forms, sequence, and discourse. (LRW)

  3. Assignment to database industry

    NASA Astrophysics Data System (ADS)

    Abe, Kohichiroh

    Various kinds of databases are considered to be an essential part of future large-scale systems. Information provision by databases alone is also expected to grow as the market matures. This paper discusses how these circumstances came about and how they will develop from now on.

  4. BioImaging Database

    SciTech Connect

    David Nix, Lisa Simirenko

    2006-10-25

    The BioImaging Database (BID) is a relational database developed to store the data and meta-data for the 3D gene expression in early Drosophila embryo development on a cellular level. The schema was written to be used with the MySQL DBMS but with minor modifications can be used on any SQL compliant relational DBMS.

  5. Build Your Own Database.

    ERIC Educational Resources Information Center

    Jacso, Peter; Lancaster, F. W.

    This book is intended to help librarians and others to produce databases of better value and quality, especially if they have had little previous experience in database construction. Drawing upon almost 40 years of experience in the field of information retrieval, this book emphasizes basic principles and approaches rather than in-depth and…

  6. The intelligent database machine

    NASA Technical Reports Server (NTRS)

    Yancey, K. E.

    1985-01-01

    The IDM data base was compared with the data base crack to determine whether IDM 500 would better serve the needs of the MSFC data base management system than Oracle. The two were compared and the performance of the IDM was studied. Implementations that work best on which database are implicated. The choice is left to the database administrator.

  7. Database Reviews: Legal Information.

    ERIC Educational Resources Information Center

    Seiser, Virginia

    Detailed reviews of two legal information databases--"Laborlaw I" and "Legal Resource Index"--are presented in this paper. Each database review begins with a bibliographic entry listing the title; producer; vendor; cost per hour contact time; offline print cost per citation; time period covered; frequency of updates; and size of file. A detailed…

  8. Atomic Spectra Database (ASD)

    National Institute of Standards and Technology Data Gateway

    SRD 78 NIST Atomic Spectra Database (ASD) (Web, free access)   This database provides access and search capability for NIST critically evaluated data on atomic energy levels, wavelengths, and transition probabilities that are reasonably up-to-date. The NIST Atomic Spectroscopy Data Center has carried out these critical compilations.

  9. Structural Ceramics Database

    National Institute of Standards and Technology Data Gateway

    SRD 30 NIST Structural Ceramics Database (Web, free access)   The NIST Structural Ceramics Database (WebSCD) provides evaluated materials property data for a wide range of advanced ceramics known variously as structural ceramics, engineering ceramics, and fine ceramics.

  10. Knowledge Discovery in Databases.

    ERIC Educational Resources Information Center

    Norton, M. Jay

    1999-01-01

    Knowledge discovery in databases (KDD) revolves around the investigation and creation of knowledge, processes, algorithms, and mechanisms for retrieving knowledge from data collections. The article is an introductory overview of KDD. The rationale and environment of its development and applications are discussed. Issues related to database design…

  12. Database Searching by Managers.

    ERIC Educational Resources Information Center

    Arnold, Stephen E.

    Managers and executives need the easy and quick access to business and management information that online databases can provide, but many have difficulty articulating their search needs to an intermediary. One possible solution would be to encourage managers and their immediate support staff members to search textual databases directly as they now…

  13. A Quality System Database

    NASA Technical Reports Server (NTRS)

    Snell, William H.; Turner, Anne M.; Gifford, Luther; Stites, William

    2010-01-01

    A quality system database (QSD), and software to administer the database, were developed to support recording of administrative nonconformance activities that involve requirements for documentation of corrective and/or preventive actions, which can include ISO 9000 internal quality audits and customer complaints.

  14. SENTRA, a database of signal transduction proteins.

    SciTech Connect

    D'Souza, M.; Romine, M. F.; Maltsev, N.; Mathematics and Computer Science; PNNL

    2000-01-01

    SENTRA, available via URL http://wit.mcs.anl.gov/WIT2/Sentra/, is a database of proteins associated with microbial signal transduction. The database currently includes the classical two-component signal transduction pathway proteins and methyl-accepting chemotaxis proteins, but will be expanded to also include other classes of signal transduction systems that are modulated by phosphorylation or methylation reactions. Although the majority of database entries are from prokaryotic systems, eukaryotic proteins with bacterial-like signal transduction domains are also included. Currently SENTRA contains signal transduction proteins in 34 complete and almost completely sequenced prokaryotic genomes, as well as sequences from 243 organisms available in public databases (SWISS-PROT and EMBL). The analysis was carried out within the framework of the WIT2 system, which is designed and implemented to support genetic sequence analysis and comparative analysis of sequenced genomes.

  15. The HITRAN 2008 Molecular Spectroscopic Database

    NASA Technical Reports Server (NTRS)

    Rothman, Laurence S.; Gordon, Iouli E.; Barbe, Alain; Benner, D. Chris; Bernath, Peter F.; Birk, Manfred; Boudon, V.; Brown, Linda R.; Campargue, Alain; Champion, J.-P.; Chance, Kelly V.; Coudert, L. H.; Sung, K.; Toth, R. A.

    2009-01-01

    This paper describes the status of the 2008 edition of the HITRAN molecular spectroscopic database. The new edition is the first official public release since the 2004 edition, although a number of crucial updates had been made available online since 2004. The HITRAN compilation consists of several components that serve as input for radiative-transfer calculation codes: individual line parameters for the microwave through visible spectra of molecules in the gas phase; absorption cross-sections for molecules having dense spectral features, i.e., spectra in which the individual lines are not resolved; individual line parameters and absorption cross sections for bands in the ultra-violet; refractive indices of aerosols, tables and files of general properties associated with the database; and database management software. The line-by-line portion of the database contains spectroscopic parameters for forty-two molecules including many of their isotopologues.

  16. [Total quality management of clinical database].

    PubMed

    Okubo, Suguru; Miyata, Hiroaki; Tomotaki, Ai; Motomura, Noboru; Murakami, Arata; Ono, Minoru; Iwanaka, Tadashi

    2013-06-01

    A data entry system should be constructed with utility, accuracy, propriety, and feasibility in mind. The methods for developing useful and accurate clinical databases are 1) system development based on the concept of "error proofing", 2) system testing by real users, 3) guidance for participants, and 4) incentives for accurate data entry. In terms of propriety, it is necessary to obtain patients' consent for data collection and to publicly announce the objectives and methods of the clinical database. Confidentiality and anonymization of data are also important. Balancing efficacy and propriety to maximize patient and societal benefit is one of the important responsibilities of database management organizations. In addition, assessment of data quality, such as audit and feedback, is useful for enhancing the accuracy and reliability of clinical databases. PMID:23917055

  17. Dial-up remote access image database

    NASA Astrophysics Data System (ADS)

    Ho, Chung-Ding; Lee, Su-Ming; Liao, Pen-Kung; Tsai, Ming-Houng; Shieh, Wern-Sheng; Chang, Horng-Ren; Ju, Rong-Hauh

    1994-04-01

    In this paper, a prototype system for a dial-up remote access image database is proposed. As a videotex system, it includes an Information Customer, an Information Provider, a Communication Server, the Public Switched Telephone Network, and a database server containing an image database. Because color natural images are included in the database, a high-resolution visual medium is available and many applications become possible. Currently, a color image with a resolution of 400 by 400 can be accessed in about 25 seconds by using JPEG compression and a high-speed modem. The system can be employed in many applications, such as home shopping and remote education. It can also serve as a pioneer system for providing teleservices in the Integrated Services Digital Network.
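
    The 25-second figure above can be related to the amount of compression involved with a back-of-the-envelope calculation. The abstract gives neither the color depth nor the modem speed, so the values below (24-bit color, a 14,400 bit/s modem, no protocol overhead) are assumptions chosen only for illustration, and the function name is hypothetical.

```python
def implied_compression_ratio(width, height, bytes_per_pixel, modem_bps, seconds):
    """Ratio of the raw image size to the bytes a modem can carry in the given time."""
    raw_bytes = width * height * bytes_per_pixel
    transferable_bytes = modem_bps * seconds / 8  # 8 bits per byte, overhead ignored
    return raw_bytes / transferable_bytes


# Assumed values: 3 bytes per pixel (24-bit color) and a 14,400 bit/s modem.
ratio = implied_compression_ratio(400, 400, 3, 14_400, 25)
print(f"{ratio:.1f}:1")
```

    Under these assumptions the raw image is 480,000 bytes and the link carries 45,000 bytes in 25 seconds, so JPEG would need to compress by a factor of a little over 10. A different modem speed or color depth changes the result proportionally.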

  18. PubChem Substance and Compound databases

    PubMed Central

    Kim, Sunghwan; Thiessen, Paul A.; Bolton, Evan E.; Chen, Jie; Fu, Gang; Gindulyte, Asta; Han, Lianyi; He, Jane; He, Siqian; Shoemaker, Benjamin A.; Wang, Jiyao; Yu, Bo; Zhang, Jian; Bryant, Stephen H.

    2016-01-01

    PubChem (https://pubchem.ncbi.nlm.nih.gov) is a public repository for information on chemical substances and their biological activities, launched in 2004 as a component of the Molecular Libraries Roadmap Initiatives of the US National Institutes of Health (NIH). Over the past 11 years, PubChem has grown into a sizable system, serving as a chemical information resource for the scientific research community. PubChem consists of three inter-linked databases, Substance, Compound and BioAssay. The Substance database contains chemical information deposited by individual data contributors to PubChem, and the Compound database stores unique chemical structures extracted from the Substance database. Biological activity data of chemical substances tested in assay experiments are contained in the BioAssay database. This paper provides an overview of the PubChem Substance and Compound databases, including data sources and contents, data organization, data submission using PubChem Upload, chemical structure standardization, web-based interfaces for textual and non-textual searches, and programmatic access. It also gives a brief description of PubChem3D, a resource derived from theoretical three-dimensional structures of compounds in PubChem, as well as PubChemRDF, Resource Description Framework (RDF)-formatted PubChem data for data sharing, analysis and integration with information contained in other databases. PMID:26400175
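
    As an illustration of the programmatic access mentioned above, the sketch below builds a request against PubChem's PUG REST interface for a compound looked up by name. The abstract itself does not spell out any URL scheme; the pattern used here (compound/name/.../property/.../JSON) and the response layout (PropertyTable, Properties) follow the publicly documented PUG REST service and should be checked against the current PubChem documentation. The helper names property_url and fetch_properties are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base address of PubChem's PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(name, properties=("MolecularFormula", "MolecularWeight")):
    """Build a PUG REST URL asking for selected properties of a named compound."""
    encoded_name = quote(name, safe="")
    return f"{PUG_REST}/compound/name/{encoded_name}/property/{','.join(properties)}/JSON"


def fetch_properties(name, properties=("MolecularFormula", "MolecularWeight")):
    """Send the request and return the list of property records (needs network access)."""
    with urlopen(property_url(name, properties), timeout=30) as response:
        payload = json.load(response)
    return payload["PropertyTable"]["Properties"]


# Only the URL is built here; call fetch_properties("aspirin") to query the live service.
print(property_url("aspirin"))
```

    Keeping URL construction separate from the network call makes the request easy to inspect and test offline.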

  19. Protein sequence databases.

    PubMed

    Apweiler, Rolf; Bairoch, Amos; Wu, Cathy H

    2004-02-01

    A variety of protein sequence databases exist, ranging from simple sequence repositories, which store data with little or no manual intervention in the creation of the records, to expertly curated universal databases that cover all species and in which the original sequence data are enhanced by the manual addition of further information in each sequence record. As the focus of researchers moves from the genome to the proteins encoded by it, these databases will play an even more important role as central comprehensive resources of protein information. Several of the leading protein sequence databases are discussed here, with special emphasis on the databases now provided by the Universal Protein Knowledgebase (UniProt) consortium. PMID:15036160

  20. [Glaucoma Service Database].

    PubMed

    Jamrozy-Witkowska, Agnieszka M; Witkowski, Tomasz; Krzyzanowska, Patrycja

    2003-01-01

    We present the common problems related to clinical databases. The Glaucoma Service Database created in our clinic is an attempt to develop an optimal medical database. The system organizes our repository of clinical data. It consists of 3 modules: 1) the user list with predefined privileges and rights, 2) lists of coded data for further use, which facilitate filling in the fields, 3) clinical details of all patients. The user interface of our database is very simple, so it is easy to use even for unskilled staff. The accuracy of data is protected by the system's internal algorithms. The database could be used to investigate clinical epidemiology, risk assessment, post-marketing surveillance of drugs, practice variation and decision analysis. Data from the Glaucoma Service Database can also help in the management of health services. PMID:14969171

  1. Cascadia Tsunami Deposit Database

    USGS Publications Warehouse

    Peters, Robert; Jaffe, Bruce; Gelfenbaum, Guy; Peterson, Curt

    2003-01-01

    The Cascadia Tsunami Deposit Database contains data on the location and sedimentological properties of tsunami deposits found along the Cascadia margin. Data have been compiled from 52 studies, documenting 59 sites from northern California to Vancouver Island, British Columbia that contain known or potential tsunami deposits. Bibliographical references are provided for all sites included in the database. Cascadia tsunami deposits are usually seen as anomalous sand layers in coastal marsh or lake sediments. The studies cited in the database use numerous criteria based on sedimentary characteristics to distinguish tsunami deposits from sand layers deposited by other processes, such as river flooding and storm surges. Several studies cited in the database contain evidence for more than one tsunami at a site. Data categories include age, thickness, layering, grainsize, and other sedimentological characteristics of Cascadia tsunami deposits. The database documents the variability observed in tsunami deposits found along the Cascadia margin.

  2. RefSeq microbial genomes database: new representation and annotation strategy.

    PubMed

    Tatusova, Tatiana; Ciufo, Stacy; Fedorov, Boris; O'Neill, Kathleen; Tolstoy, Igor

    2014-01-01

    The source of the microbial genomic sequences in the RefSeq collection is the set of primary sequence records submitted to the International Nucleotide Sequence Database public archives. These can be accessed through the Entrez search and retrieval system at http://www.ncbi.nlm.nih.gov/genome. Next-generation sequencing has enabled researchers to perform genomic sequencing at rates that were unimaginable in the past. Microbial genomes can now be sequenced in a matter of hours, which has led to a significant increase in the number of assembled genomes deposited in the public archives. This huge increase in DNA sequence data presents new challenges for the annotation, analysis and visualization bioinformatics tools. New strategies have been developed for the annotation and representation of reference genomes and sequence variations derived from population studies and clinical outbreaks. PMID:24316578

  3. MINT, the molecular interaction database: 2012 update.

    PubMed

    Licata, Luana; Briganti, Leonardo; Peluso, Daniele; Perfetto, Livia; Iannuccelli, Marta; Galeota, Eugenia; Sacco, Francesca; Palma, Anita; Nardozza, Aurelio Pio; Santonico, Elena; Castagnoli, Luisa; Cesareni, Gianni

    2012-01-01

    The Molecular INTeraction Database (MINT, http://mint.bio.uniroma2.it/mint/) is a public repository for protein-protein interactions (PPI) reported in peer-reviewed journals. The database grows steadily over the years and at September 2011 contains approximately 235,000 binary interactions captured from over 4750 publications. The web interface allows the users to search, visualize and download interactions data. MINT is one of the members of the International Molecular Exchange consortium (IMEx) and adopts the Molecular Interaction Ontology of the Proteomics Standard Initiative (PSI-MI) standards for curation and data exchange. MINT data are freely accessible and downloadable at http://mint.bio.uniroma2.it/mint/download.do. We report here the growth of the database, the major changes in curation policy and a new algorithm to assign a confidence to each interaction. PMID:22096227

  4. Using the Proteomics Identifications Database (PRIDE).

    PubMed

    Martens, Lennart; Jones, Phil; Côté, Richard

    2008-03-01

    The Proteomics Identifications Database (PRIDE) is a public data repository designed to store, disseminate, and analyze mass spectrometry based proteomics datasets. The PRIDE database can accommodate any level of detailed metadata about the submitted results, which can be queried, explored, viewed, or downloaded via the PRIDE Web interface. The PRIDE database also provides a simple, yet powerful, access control mechanism that fully supports confidential peer-reviewing of data related to a manuscript, ensuring that these results remain invisible to the general public while allowing referees and journal editors anonymized access to the data. This unit describes in detail the functionality that PRIDE provides with regards to searching, viewing, and comparing the available data, as well as different options for submitting data to PRIDE. PMID:18428683

  5. Novel circular DNA viruses identified in Procordulia grayi and Xanthocnemis zealandica larvae using metagenomic approaches.

    PubMed

    Dayaram, Anisha; Galatowitsch, Mark; Harding, Jon S; Argüello-Astorga, Gerardo R; Varsani, Arvind

    2014-03-01

    Recent advances in sequencing and metagenomics have enabled the discovery of many novel single stranded DNA (ssDNA) viruses from various environments. We have previously demonstrated that adult dragonflies, as predatory insects, are useful indicators of ssDNA viruses in terrestrial ecosystems. Here we recover and characterise 13 viral genomes which represent 10 novel and diverse circular replication associated protein (Rep)-encoding single stranded (CRESS) DNA viruses (1628-2668nt) from Procordulia grayi and Xanthocnemis zealandica dragonfly larvae collected from four high-country lakes in the South Island of New Zealand. The dragonfly larvae associated CRESS DNA viruses have different genome architectures, however, they all encode two major open reading frames (ORFs) which either have bidirectional or unidirectional arrangement. The 13 viral genomes have a conserved NAGTATTAC nonanucleotide motif and in their predicted Rep proteins we identified the rolling circle replication (RCR) motif 1, 2 and 3, as well as superfamily 3 (SF3) helicase motifs. Maximum likelihood phylogenetic and pairwise identity analysis of the Rep amino acid sequences reveal that the dragonfly larvae novel CRESS DNA viruses share <63% pairwise amino acid identity to the Reps of other CRESS DNA viruses whose complete genomes have been determined and available in public databases and that these viruses are novel. CRESS DNA viruses are circulating in larval dragonfly populations; however, we are unable to ascertain whether these viruses are infecting the larvae directly or are transient within dragonflies via their diet. PMID:24462907
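
    The conserved NAGTATTAC nonanucleotide can be located in a circular genome with a short script. The following is a minimal sketch, assuming that N stands for any of the four bases and that only the forward strand is scanned; the function names and the toy sequences are illustrative and are not the viral genomes described in the paper.

```python
import re

MOTIF = "NAGTATTAC"  # nonanucleotide motif from the abstract; N = any base


def motif_to_regex(motif):
    """Translate the motif into a lookahead regex so overlapping hits are kept."""
    body = "".join("[ACGT]" if base == "N" else base for base in motif)
    return re.compile(f"(?={body})")


def find_motif_circular(genome, motif=MOTIF):
    """Return 0-based start positions of the motif in a circular genome."""
    genome = genome.upper()
    if len(genome) < len(motif):
        return []
    # Append the first len(motif) - 1 bases so hits spanning the origin are found.
    extended = genome + genome[: len(motif) - 1]
    return [match.start() for match in motif_to_regex(motif).finditer(extended)]


# Toy sequences (assumptions for illustration only).
print(find_motif_circular("GGTAGTATTACGG"))   # motif inside the linear string
print(find_motif_circular("TTACGGGGCAGTA"))   # motif wraps around the origin
```

    Scanning the reverse complement as well would be a natural extension, since the motif may lie on either strand.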

  6. Complete genomic DNA sequence of the East Asian spotted fever disease agent Rickettsia japonica.

    PubMed

    Matsutani, Minenosuke; Ogawa, Motohiko; Takaoka, Naohisa; Hanaoka, Nozomu; Toh, Hidehiro; Yamashita, Atsushi; Oshima, Kenshiro; Hirakawa, Hideki; Kuhara, Satoru; Suzuki, Harumi; Hattori, Masahira; Kishimoto, Toshio; Ando, Shuji; Azuma, Yoshinao; Shirai, Mutsunori

    2013-01-01

    Rickettsia japonica is an obligate intracellular alphaproteobacterium that causes tick-borne Japanese spotted fever, which has spread throughout East Asia. We determined the complete genomic DNA sequence of R. japonica type strain YH (VR-1363), which consists of 1,283,087 base pairs (bp) and 971 protein-coding genes. Comparison of the genomic DNA sequence of R. japonica with other rickettsiae in the public databases showed that 2 regions (4,323 and 216 bp) were conserved in a very narrow range of Rickettsia species, and the shorter one was inserted in, and disrupted, a preexisting open reading frame (ORF). While it is unknown how the DNA sequences were acquired in R. japonica genomes, it may be a useful signature for the diagnosis of Rickettsia species. Instead of the species-specific inserted DNA sequences, rickettsial genomes contain Rickettsia-specific palindromic elements (RPEs), which are also capable of locating in preexisting ORFs. Precise alignments of protein and DNA sequences involving RPEs showed that when a gene contains an inserted DNA sequence, each rickettsial ortholog carried an inserted DNA sequence at the same locus. The sequence, ATGAC, was shown to be highly frequent and thus characteristic in certain RPEs (RPE-4, RPE-6, and RPE-7). This finding implies that RPE-4, RPE-6, and RPE-7 were derived from a common inserted DNA sequence. PMID:24039725

  8. A Database System for Course Administration.

    ERIC Educational Resources Information Center

    Benbasat, Izak; And Others

    1982-01-01

    Describes a computer-assisted testing system which produces multiple-choice examinations for a college course in business administration. The system uses SPIRES (Stanford Public Information REtrieval System) to manage a database of questions and related data, mark-sense cards for machine grading tests, and ACL (6) (Audit Command Language) to…

  9. Multiple Database Access: More Users, More Considerations.

    ERIC Educational Resources Information Center

    Hardin, Steve

    1991-01-01

    Describes how the Educational Resources Information Center (ERIC) and several other databases were added to the online public access catalog at Indiana State University, and explores impacts of enhancing access in this manner at several academic libraries. Telecommunications and software design considerations are also discussed, together with…

  10. Plant databases and data analysis tools

    Technology Transfer Automated Retrieval System (TEKTRAN)

    It is anticipated that the coming years will see the generation of large datasets including diagnostic markers in several plant species with emphasis on crop plants. To use these datasets effectively in any plant breeding program, it is essential to have the information available via public database...

  11. THE DRINKING WATER TREATABILITY DATABASE (Conference Paper)

    EPA Science Inventory

    The Drinking Water Treatability Database (TDB) assembles referenced data on the control of contaminants in drinking water, housed on an interactive, publicly-available, USEPA web site (www.epa.gov/tdb). The TDB is of use to drinking water utilities, treatment process design engin...

  12. THE DRINKING WATER TREATABILITY DATABASE (Slides)

    EPA Science Inventory

    The Drinking Water Treatability Database (TDB) assembles referenced data on the control of contaminants in drinking water, housed on an interactive, publicly-available, USEPA web site (www.epa.gov/tdb). The TDB is of use to drinking water utilities, treatment process design engin...

  13. 24 CFR 902.24 - Database adjustment.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... the technical review process in § 902.68. (2) To adjust a physical condition score based on... PUBLIC HOUSING ASSESSMENT SYSTEM Physical Condition Indicator § 902.24 Database adjustment. (a) Adjustments for factors not reflected or inappropriately reflected in physical condition score....

  14. The RECONS 25 Parsec Database

    NASA Astrophysics Data System (ADS)

    Henry, Todd J.; Jao, Wei-Chun; Pewett, Tiffany; Riedel, Adric R.; Silverstein, Michele L.; Slatten, Kenneth J.; Winters, Jennifer G.; Recons Team

    2015-01-01

    The REsearch Consortium On Nearby Stars (RECONS, www.recons.org) Team has been mapping the solar neighborhood since 1994. Nearby stars provide the fundamental framework upon which all of stellar astronomy is based, both for individual stars and stellar populations. The nearest stars are also the primary targets for extrasolar planet searches, and will undoubtedly play key roles in understanding the prevalence and structure of solar systems, and ultimately, in our search for life elsewhere. We have built the RECONS 25 Parsec Database to encourage and enable exploration of the Sun's nearest neighbors. The Database, slated for public release in 2015, contains 3088 stars, brown dwarfs, and exoplanets in 2184 systems as of October 1, 2014. All of these systems have accurate trigonometric parallaxes in the refereed literature placing them closer than 25.0 parsecs, i.e., parallaxes greater than 40 mas with errors less than 10 mas. Carefully vetted astrometric, photometric, and spectroscopic data are incorporated into the Database from reliable sources, including significant original data collected by members of the RECONS Team. Current exploration of the solar neighborhood by RECONS, enabled by the Database, focuses on the ubiquitous red dwarfs, including: assessing the stellar companion population of ~1200 red dwarfs (Winters), investigating the astrophysical causes that spread red dwarfs of similar temperatures by a factor of 16 in luminosity (Pewett), and canvassing ~3000 red dwarfs for excess emission due to unseen companions and dust (Silverstein). In addition, a decade long astrometric survey of ~500 red dwarfs in the southern sky has begun, in an effort to understand the stellar, brown dwarf, and planetary companion populations for the stars that make up at least 75% of all stars in the Universe. This effort has been supported by the NSF through grants AST-0908402, AST-1109445, and AST-1412026, and via observations made possible by the SMARTS Consortium.
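
    The selection rule quoted in the abstract follows from the standard relation between trigonometric parallax and distance, d [pc] = 1000 / parallax [mas], so 40 mas corresponds to 25 pc. The sketch below works through that arithmetic and also converts the quoted factor of 16 in luminosity to magnitudes. The function names and the sample parallax values are assumptions for illustration, not entries from the Database.

```python
import math


def distance_pc(parallax_mas):
    """Distance in parsecs from a trigonometric parallax in milliarcseconds."""
    if parallax_mas <= 0:
        raise ValueError("parallax must be positive")
    return 1000.0 / parallax_mas


def in_25pc_sample(parallax_mas, error_mas):
    """Apply the cut quoted in the abstract: parallax > 40 mas, error < 10 mas."""
    return parallax_mas > 40.0 and error_mas < 10.0


def magnitude_spread(luminosity_ratio):
    """Magnitude difference corresponding to a luminosity ratio."""
    return 2.5 * math.log10(luminosity_ratio)


print(distance_pc(40.0))                 # boundary of the sample, in parsecs
print(in_25pc_sample(100.0, 2.0))        # illustrative star at 10 pc
print(in_25pc_sample(35.0, 2.0))         # illustrative star beyond 25 pc
print(round(magnitude_spread(16.0), 2))  # factor of 16 in luminosity, in mag
```

    A factor of 16 in luminosity is therefore a spread of roughly three magnitudes among red dwarfs of similar temperature.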

  15. Central Asia Active Fault Database

    NASA Astrophysics Data System (ADS)

    Mohadjer, Solmaz; Ehlers, Todd A.; Kakar, Najibullah

    2014-05-01

    The ongoing collision of the Indian subcontinent with Asia controls active tectonics and seismicity in Central Asia. This motion is accommodated by faults that have historically caused devastating earthquakes and continue to pose serious threats to the population at risk. Despite international and regional efforts to assess seismic hazards in Central Asia, little attention has been given to development of a comprehensive database for active faults in the region. To address this issue and to better understand the distribution and level of seismic hazard in Central Asia, we are developing a publicly available database for active faults of Central Asia (including but not limited to Afghanistan, Tajikistan, Kyrgyzstan, northern Pakistan and western China) using ArcGIS. The database is designed to allow users to store, map and query important fault parameters such as fault location, displacement history, rate of movement, and other data relevant to seismic hazard studies including fault trench locations, geochronology constraints, and seismic studies. Data sources integrated into the database include previously published maps and scientific investigations as well as strain rate measurements and historic and recent seismicity. In addition, high resolution Quickbird, Spot, and Aster imagery are used for selected features to locate and measure offset of landforms associated with Quaternary faulting. These features are individually digitized and linked to attribute tables that provide a description for each feature. Preliminary observations include inconsistent and sometimes inaccurate information for faults documented in different studies. For example, the Darvaz-Karakul fault which roughly defines the western margin of the Pamir, has been mapped with differences in location of up to 12 kilometers. The sense of motion for this fault ranges from unknown to thrust and strike-slip in three different studies despite documented left-lateral displacements of Holocene and late Pleistocene landforms observed near the fault trace.

  16. Pfam: the protein families database.

    PubMed

    Finn, Robert D; Bateman, Alex; Clements, Jody; Coggill, Penelope; Eberhardt, Ruth Y; Eddy, Sean R; Heger, Andreas; Hetherington, Kirstie; Holm, Liisa; Mistry, Jaina; Sonnhammer, Erik L L; Tate, John; Punta, Marco

    2014-01-01

    Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that are provided by Pfam over and above the basic family data. For each feature, we determined the relevance, computational burden, usage statistics and the functionality of the feature in a website context. As a consequence of this review, we have removed some features, enhanced others and developed new ones to meet the changing demands of computational biology. Here, we describe the changes to Pfam content. Notably, we now provide family alignments based on four different representative proteome sequence data sets and a new interactive DNA search interface. We also discuss the mapping between Pfam and known 3D structures. PMID:24288371

  17. Hazard Analysis Database Report

    SciTech Connect

    GAULT, G.W.

    1999-10-13

    The Hazard Analysis Database was developed in conjunction with the hazard analysis activities conducted in accordance with DOE-STD-3009-94, Preparation Guide for US Department of Energy Nonreactor Nuclear Facility Safety Analysis Reports, for the Tank Waste Remediation System (TWRS) Final Safety Analysis Report (FSAR). The FSAR is part of the approved TWRS Authorization Basis (AB). This document describes, identifies, and defines the contents and structure of the TWRS FSAR Hazard Analysis Database and documents the configuration control changes made to the database. The TWRS Hazard Analysis Database contains the collection of information generated during the initial hazard evaluations and the subsequent hazard and accident analysis activities. The database supports the preparation of Chapters 3, 4, and 5 of the TWRS FSAR and the USQ process and consists of two major, interrelated data sets: (1) Hazard Evaluation Database--Data from the results of the hazard evaluations; and (2) Hazard Topography Database--Data from the system familiarization and hazard identification.

  18. Database similarity searches.

    PubMed

    Plewniak, Frédéric

    2008-01-01

    With genome sequencing projects producing huge amounts of sequence data, database sequence similarity search has become a central tool in bioinformatics to identify potentially homologous sequences. It is thus widely used as an initial step for sequence characterization and annotation, phylogeny, genomics, transcriptomics, and proteomics studies. Database similarity search is based upon sequence alignment methods also used in pairwise sequence comparison. Sequence alignment can be global (whole sequence alignment) or local (partial sequence alignment) and there are algorithms to find the optimal alignment given particular comparison criteria. However, as database searches require the comparison of the query sequence with every single sequence in the database, heuristic algorithms have been designed to reduce the time required to build an alignment that has a reasonable chance to be the best one. Such algorithms have been implemented as fast and efficient programs (Blast, FastA) available in different types to address different kinds of problems. After searching the appropriate database, similarity search programs produce a list of similar sequences and local alignments. These results should be carefully examined before coming to any conclusion, as many traps await the similarity seeker: paralogues, multidomain proteins, pseudogenes, etc. This chapter presents points that should always be kept in mind when performing database similarity searches for various goals. It ends with a practical example of sequence characterization from a single protein database search using Blast. PMID:18592192
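
    The local (partial) alignment mentioned in the abstract can be illustrated with the classic Smith-Waterman dynamic-programming recurrence. This is a minimal sketch of the exact algorithm, not of the heuristics used by Blast or FastA, and the scoring values (match +2, mismatch -1, linear gap -2) are assumptions chosen only for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (Smith-Waterman).

    Uses a linear gap penalty and keeps only two rows of the score matrix.
    The scoring parameters are illustrative assumptions.
    """
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current[j] = max(
                0,                               # start a new local alignment
                previous[j - 1] + substitution,  # align a[i-1] with b[j-1]
                previous[j] + gap,               # gap in b
                current[j - 1] + gap,            # gap in a
            )
            best = max(best, current[j])
        previous = current
    return best


print(local_alignment_score("ACGT", "ACGT"))         # identical sequences
print(local_alignment_score("TTACGTTT", "GGACGGG"))  # shared ACG core
print(local_alignment_score("AAAA", "TTTT"))         # nothing in common
```

    Filling the full matrix costs time proportional to the product of the two sequence lengths, which is why comparing a query against every database entry this way is slow and why heuristic programs are used in practice.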

  19. ResPlan Database

    NASA Technical Reports Server (NTRS)

    Zellers, Michael L.

    2003-01-01

    The main project I was involved in was new application development for the existing CISO Database (ResPlan). This database application was developed in Microsoft Access. Initial meetings with Greg Follen, Linda McMillen, Griselle LaFontaine and others identified a few key weaknesses with the existing database. These weaknesses centered on the fact that, while the database correctly modeled the structure of Programs, Projects and Tasks, once the data was entered it did not capture any dynamic status information, and as such was of limited usefulness. After the initial meetings my goals were identified as follows: (1) enhance the ResPlan Database to include qualitative and quantitative status information about the Programs, Projects and Tasks; (2) train staff members on the ResPlan database from both the user perspective and the developer perspective; (3) give consideration to a Web Interface for reporting. Initially, the thought was that there would not be adequate time to actually develop the Web Interface, but Greg wanted it understood that this was an eventual goal and as such should be a consideration throughout the development process.

  20. Hazard Analysis Database Report

    SciTech Connect

    GRAMS, W.H.

    2000-12-28

    The Hazard Analysis Database was developed in conjunction with the hazard analysis activities conducted in accordance with DOE-STD-3009-94, Preparation Guide for U.S. Department of Energy Nonreactor Nuclear Facility Safety Analysis Reports, for HNF-SD-WM-SAR-067, Tank Farms Final Safety Analysis Report (FSAR). The FSAR is part of the approved Authorization Basis (AB) for the River Protection Project (RPP). This document describes, identifies, and defines the contents and structure of the Tank Farms FSAR Hazard Analysis Database and documents the configuration control changes made to the database. The Hazard Analysis Database contains the collection of information generated during the initial hazard evaluations and the subsequent hazard and accident analysis activities. The Hazard Analysis Database supports the preparation of Chapters 3, 4, and 5 of the Tank Farms FSAR and the Unreviewed Safety Question (USQ) process and consists of two major, interrelated data sets: (1) Hazard Analysis Database: Data from the results of the hazard evaluations, and (2) Hazard Topography Database: Data from the system familiarization and hazard identification.

  1. DNA repair

    SciTech Connect

    Friedberg, E.C.; Hanawalt, P.C.

    1988-01-01

    Topics covered in this book included: Eukaryote model systems for DNA repair study; Sensitive detection of DNA lesions and their repair; and Defined DNA sequence probes for analysis of mutagenesis and repair.

  2. Prolinks: a database of protein functional linkages derived from coevolution

    PubMed Central

    Bowers, Peter M; Pellegrini, Matteo; Thompson, Mike J; Fierro, Joe; Yeates, Todd O; Eisenberg, David

    2004-01-01

    The advent of whole-genome sequencing has led to methods that infer protein function and linkages. We have combined four such algorithms (phylogenetic profile, Rosetta Stone, gene neighbor and gene cluster) in a single database - Prolinks - that spans 83 organisms and includes 10 million high-confidence links. The Proteome Navigator tool allows users to browse predicted linkage networks interactively, providing accompanying annotation from public databases. The Prolinks database and the Proteome Navigator tool are available for use online at . PMID:15128449

  3. Database for propagation models

    NASA Technical Reports Server (NTRS)

    Kantak, Anil V.

    1991-01-01

    A propagation researcher or a systems engineer who intends to use the results of a propagation experiment is generally faced with various database tasks such as the selection of the computer software, the hardware, and the writing of the programs to pass the data through the models of interest. This task is repeated every time a new experiment is conducted or the same experiment is carried out at a different location generating different data. Thus the users of this data have to spend a considerable portion of their time learning how to implement the computer hardware and the software towards the desired end. This situation may be facilitated considerably if an easily accessible propagation database is created that has all the accepted (standardized) propagation phenomena models approved by the propagation research community. Also, the handling of data will become easier for the user. Such a database construction can only stimulate the growth of the propagation research if it is available to all the researchers, so that the results of the experiment conducted by one researcher can be examined independently by another, without different hardware and software being used. The database may be made flexible so that the researchers need not be confined only to the contents of the database. Another way in which the database may help the researchers is by the fact that they will not have to document the software and hardware tools used in their research since the propagation research community will know the database already. The following sections show a possible database construction, as well as properties of the database for the propagation research.

  4. The Gaia Parameter Database

    NASA Astrophysics Data System (ADS)

    de Bruijne, J. H. J.; Lammers, U.; Perryman, M. A. C.

    2005-01-01

    The parallel development of many aspects of a complex mission like Gaia, which includes numerous participants in ESA, industrial companies, and a large and active scientific collaboration throughout Europe, makes keeping track of the many design changes, instrument and operational complexities, and numerical values for the data analysis a very challenging problem. A comprehensive, easily-accessible, up-to-date, and definitive compilation of a large range of numerical quantities is required, and the Gaia parameter database has been established to satisfy these needs. The database is a centralised repository containing, besides mathematical, physical, and astronomical constants, many satellite and subsystem design parameters. At the end of 2004, more than 1600 parameters had been included. Version control has been implemented, providing, next to a `live' version with the most recent parameters, well-defined reference versions of the full database contents. The database can be queried or browsed using a regular Web browser (http://www.rssd.esa.int/Gaia/paramdb). Query results are formatted by default in HTML. Data can also be retrieved as Fortran-77, Fortran-90, Java, ANSI C, C++, or XML structures for direct inclusion into software codes in these languages. The idea is that all collaborating scientists can use the database parameters and values, once retrieved, directly linked to computational routines. An off-line access mode is also available, enabling users to automatically download the contents of the database. The database will be maintained actively, and significant extensions of the contents are planned. Consistent use in the future of the database by the Gaia community at large, including all industrial teams, will ensure correct numerical values throughout the complex software systems being built up as details of the Gaia design develop. The database is already being used for the telemetry simulation chain in ESTEC, and in the data simulations for GDAAS2.

  5. Numeric Databases in the Sciences.

    ERIC Educational Resources Information Center

    Meschel, S. V.

    1984-01-01

    Provides exploration into types of numeric databases available (also known as source databases, nonbibliographic databases, data-files, data-banks, fact banks); examines differences and similarities between bibliographic and numeric databases; identifies disciplines that utilize numeric databases; and surveys representative examples in the…

  6. Databases for materials selection

    SciTech Connect

    1996-06-01

    The Cambridge Materials Selector (CMS2.0) materials database was developed by the Engineering Dept. at Cambridge University in the United Kingdom. This database makes it possible to select a material for a specific application from essentially all classes of materials. Genera, Predict, and Socrates software programs from CLI International, Houston, Texas, automate materials selection and corrosion problem-solving tasks. They are said to significantly reduce the time necessary to select a suitable material and/or to assess a corrosion problem and reach cost-effective solutions. This article describes both databases and tells how to use them.

  7. Phase Equilibria Diagrams Database

    National Institute of Standards and Technology Data Gateway

    SRD 31 NIST/ACerS Phase Equilibria Diagrams Database (PC database for purchase)   The Phase Equilibria Diagrams Database contains commentaries and more than 21,000 diagrams for non-organic systems, including those published in all 21 hard-copy volumes produced as part of the ACerS-NIST Phase Equilibria Diagrams Program (formerly titled Phase Diagrams for Ceramists): Volumes I through XIV (blue books); Annuals 91, 92, 93; High Tc Superconductors I & II; Zirconium & Zirconia Systems; and Electronic Ceramics I. Materials covered include oxides as well as non-oxide systems such as chalcogenides and pnictides, phosphates, salt systems, and mixed systems of these classes.

  8. International Comparisons Database

    National Institute of Standards and Technology Data Gateway

    International Comparisons Database (Web, free access)   The International Comparisons Database (ICDB) serves the U.S. and the Inter-American System of Metrology (SIM) with information based on Appendices B (International Comparisons), C (Calibration and Measurement Capabilities) and D (List of Participating Countries) of the Comité International des Poids et Mesures (CIPM) Mutual Recognition Arrangement (MRA). The official source of the data is The BIPM key comparison database. The ICDB provides access to results of comparisons of measurements and standards organized by the consultative committees of the CIPM and the Regional Metrology Organizations.

  9. JICST Factual Database(2)

    NASA Astrophysics Data System (ADS)

    Araki, Keisuke

    A computer program that builds atom-bond connection tables from nomenclature has been developed. Chemical substances are entered with their nomenclature and a variety of trivial names or experimental code numbers. The chemical structures in the database are stored stereospecifically and can be searched and displayed according to stereochemistry. Source data come from the laws and regulations of Japan, RTECS of the US, and other sources. The database plays a central role within the integrated fact database service of JICST and makes interrelational retrieval possible.

  10. Optical DNA

    NASA Astrophysics Data System (ADS)

    Vijaywargi, Deepak; Lewis, Dave; Kirovski, Darko

    A certificate of authenticity (COA) is an inexpensive physical object with a random and unique structure S which is hard to near-exactly replicate. An inexpensive device should be able to scan object’s physical “fingerprint,” a set of features that represents S. In this paper, we explore one set of requirements that optical media such as DVDs should satisfy, to be considered as COAs. As manufacturing of such media produces inevitable errors, we use the locations and count of these errors as a “fingerprint” for each optical disc: its optical DNA. The “fingerprint” is signed using publisher’s private-key and the resulting signature is stored onto the optical medium using a post-production process. Standard DVD players with altered firmware that includes publisher’s public-key, should be able to verify the authenticity of DVDs protected with optical DNA. Our key finding is that for the proposed protocol, only DVDs with exceptional wear-and-tear characteristics would result in an inexpensive and viable anti-counterfeiting technology.

  11. Rice Annotation Database (RAD): a contig-oriented database for map-based rice genomics.

    PubMed

    Ito, Yuichi; Arikawa, Kohji; Antonio, Baltazar A; Ohta, Isamu; Naito, Shinji; Mukai, Yoshiyuki; Shimano, Atsuko; Masukawa, Masatoshi; Shibata, Michie; Yamamoto, Mayu; Ito, Yukiyo; Yokoyama, Junri; Sakai, Yasumichi; Sakata, Katsumi; Nagamura, Yoshiaki; Namiki, Nobukazu; Matsumoto, Takashi; Higo, Kenichi; Sasaki, Takuji

    2005-01-01

    A contig-oriented database for annotation of the rice genome has been constructed to facilitate map-based rice genomics. The Rice Annotation Database has the following functional features: (i) extensive effort of manual annotations of P1-derived artificial chromosome/bacterial artificial chromosome clones can be merged at chromosome and contig-level; (ii) concise visualization of the annotation information such as the predicted genes, results of various prediction programs (RiceHMM, Genscan, Genscan+, Fgenesh, GeneMark, etc.), homology to expressed sequence tag, full-length cDNA and protein; (iii) user-friendly clone / gene query system; (iv) download functions for nucleotide, amino acid and coding sequences; (v) analysis of various features of the genome (GC-content, average value, etc.); and (vi) genome-wide homology search (BLAST) of contig- and chromosome-level genome sequence to allow comparative analysis with the genome sequence of other organisms. As of October 2004, the database contains a total of 215 Mb sequence with relevant annotation results including 30 000 manually curated genes. The database can provide the latest information on manual annotation as well as a comprehensive structural analysis of various features of the rice genome. The database can be accessed at http://rad.dna.affrc.go.jp/. PMID:15608281

  13. The Vocational Guidance Research Database: A Scientometric Approach

    ERIC Educational Resources Information Center

    Flores-Buils, Raquel; Gil-Beltran, Jose Manuel; Caballer-Miedes, Antonio; Martinez-Martinez, Miguel Angel

    2012-01-01

    The scientometric study of scientific output through publications in specialized journals cannot be undertaken exclusively with the databases available today. For this reason, the objective of this article is to introduce the "Base de Datos de Investigacion en Orientacion Vocacional" [Vocational Guidance Research Database], based on the use of…

  14. Hungry for Nutrient Data? Navigating the USDA Nutrient Database

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The USDA National Nutrient Database for Standard Reference (SR) is the major source of food composition data in the United States, providing the foundation for most food composition databases in the public and private sectors. Most nutrition professionals are familiar with the basics of the SR onlin...

  15. The world bacterial biogeography and biodiversity through databases: a case study of NCBI Nucleotide Database and GBIF Database.

    PubMed

    Selama, Okba; James, Phillip; Nateche, Farida; Wellington, Elizabeth M H; Hacène, Hocine

    2013-01-01

    Databases are an essential tool and resource within the field of bioinformatics. The primary aim of this study was to generate an overview of global bacterial biodiversity and biogeography using available data from the two largest public online databases, NCBI Nucleotide and GBIF. The secondary aim was to highlight the contribution each geographic area has to each database. The basis for data analysis of this study was the metadata provided by both databases, mainly, the taxonomy and the geographical area origin of isolation of the microorganism (record). These were directly obtained from GBIF through the online interface, while E-utilities and Python were used in combination with a programmatic web service access to obtain data from the NCBI Nucleotide Database. Results indicate that the American continent, and more specifically the USA, is the top contributor, while Africa and Antarctica are less well represented. This highlights the imbalance of exploration within these areas rather than any reduction in biodiversity. This study describes a novel approach to generating global scale patterns of bacterial biodiversity and biogeography and indicates that the Proteobacteria are the most abundant and widely distributed phylum within both databases. PMID:24228241
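
    The abstract does not give the queries that were used. As a minimal sketch, assuming the public NCBI E-utilities ESearch endpoint, a request URL for counting Nucleotide records might be built as shown here. The helper name `build_esearch_url` and the example search term are illustrative assumptions, not the authors' code.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="nucleotide", retmax=0):
    """Return an ESearch URL for `term` (hypothetical helper).

    With retmax=0 no record IDs are listed; the response still
    carries the total hit count.
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH_URL + "?" + urlencode(params)


# Illustrative query only: bacterial records mentioning one country.
url = build_esearch_url('Bacteria[Organism] AND "Algeria"[All Fields]')
print(url)
```

    Building the URL separately from sending it keeps the sketch runnable offline; an actual study would also need to respect NCBI's request-rate guidelines.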

  16. The World Bacterial Biogeography and Biodiversity through Databases: A Case Study of NCBI Nucleotide Database and GBIF Database

    PubMed Central

    James, Phillip; Nateche, Farida; Wellington, Elizabeth M. H.; Hacène, Hocine

    2013-01-01

    Databases are an essential tool and resource within the field of bioinformatics. The primary aim of this study was to generate an overview of global bacterial biodiversity and biogeography using available data from the two largest public online databases, NCBI Nucleotide and GBIF. The secondary aim was to highlight the contribution each geographic area has to each database. The basis for data analysis of this study was the metadata provided by both databases, mainly, the taxonomy and the geographical area origin of isolation of the microorganism (record). These were directly obtained from GBIF through the online interface, while E-utilities and Python were used in combination with a programmatic web service access to obtain data from the NCBI Nucleotide Database. Results indicate that the American continent, and more specifically the USA, is the top contributor, while Africa and Antarctica are less well represented. This highlights the imbalance of exploration within these areas rather than any reduction in biodiversity. This study describes a novel approach to generating global scale patterns of bacterial biodiversity and biogeography and indicates that the Proteobacteria are the most abundant and widely distributed phylum within both databases. PMID:24228241

  17. TREATABILITY DATABASE DESCRIPTION

    EPA Science Inventory

    The Drinking Water Treatability Database (TDB) presents referenced information on the control of contaminants in drinking water. It allows drinking water utilities, first responders to spills or emergencies, treatment process designers, research organizations, academics, regulato...

  18. THE CTEPP DATABASE

    EPA Science Inventory

    The CTEPP (Children's Total Exposure to Persistent Pesticides and Other Persistent Organic Pollutants) database contains a wealth of data on children's aggregate exposures to pollutants in their everyday surroundings. Chemical analysis data for the environmental media and ques...

  19. Nuclear Science References Database

    SciTech Connect

    Pritychenko, B.; Běták, E.; Singh, B.; Totans, J.

    2014-06-15

    The Nuclear Science References (NSR) database, together with its associated Web interface, is the world's only comprehensive source of easily accessible low- and intermediate-energy nuclear physics bibliographic information for more than 210,000 articles since the beginning of nuclear science. The weekly-updated NSR database provides essential support for nuclear data evaluation, compilation and research activities. The principles of the database and Web application development and maintenance are described. Examples of nuclear structure, reaction and decay applications are specifically included. The complete NSR database is freely available at the websites of the National Nuclear Data Center (http://www.nndc.bnl.gov/nsr) and the International Atomic Energy Agency (http://www-nds.iaea.org/nsr).

  20. Requirements Management Database

    Energy Science and Technology Software Center (ESTSC)

    2009-08-13

    This application is a simplified and customized version of the RBA and CTS databases. It captures federal, site, and facility requirements and links them to the actions that must be performed to maintain compliance with contractual and other requirements.

  1. Chemical Kinetics Database

    National Institute of Standards and Technology Data Gateway

    SRD 17 NIST Chemical Kinetics Database (Web, free access)   The NIST Chemical Kinetics Database includes essentially all reported kinetics results for thermal gas-phase chemical reactions. The database is designed to be searched for kinetics data based on the specific reactants involved, for reactions resulting in specified products, for all the reactions of a particular species, or for various combinations of these. In addition, the bibliography can be searched by author name or combination of names. The database contains in excess of 38,000 separate reaction records for over 11,700 distinct reactant pairs. These data have been abstracted from over 12,000 papers with literature coverage through early 2000.

  2. Enhancing medical database security.

    PubMed

    Pangalos, G; Khair, M; Bozios, L

    1994-08-01

    A methodology for the enhancement of database security in a hospital environment is presented in this paper which is based on both the discretionary and the mandatory database security policies. In this way the advantages of both approaches are combined to enhance medical database security. An appropriate classification of the different types of users according to their different needs and roles and a User Role Definition Hierarchy has been used. The experience obtained from the experimental implementation of the proposed methodology in a major general hospital is briefly discussed. The implementation has shown that the combined discretionary and mandatory security enforcement effectively limits the unauthorized access to the medical database, without severely restricting the capabilities of the system. PMID:7829977
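
    The abstract does not describe the implementation. As a rough sketch of how a discretionary check and a mandatory check can be combined, the example below grants a read only when both agree. The roles, record types, sensitivity levels and the "no read up" rule are all hypothetical choices made for illustration.

```python
# Hypothetical sensitivity levels, ordered from least to most sensitive.
LEVELS = {"public": 0, "confidential": 1, "secret": 2}

# Discretionary part: (role, action, record type) triples that were granted.
GRANTS = {
    ("doctor", "read", "diagnosis"),
    ("doctor", "write", "diagnosis"),
    ("nurse", "read", "diagnosis"),
    ("clerk", "read", "billing"),
}


def can_read(role, clearance, record_type, sensitivity):
    """Allow a read only if both policies agree."""
    # Discretionary check: was this role granted read access?
    dac_ok = (role, "read", record_type) in GRANTS
    # Mandatory check: a simple "no read up" rule on sensitivity labels.
    mac_ok = LEVELS[clearance] >= LEVELS[sensitivity]
    return dac_ok and mac_ok


print(can_read("doctor", "secret", "diagnosis", "secret"))       # both checks pass
print(can_read("nurse", "confidential", "diagnosis", "secret"))  # fails the mandatory check
print(can_read("clerk", "secret", "diagnosis", "public"))        # fails the discretionary check
```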

  3. The PHARMSEARCH database.

    PubMed

    O'Hara, M P; Pagis, C

    1991-02-01

    PHARMSEARCH, a database produced by the French Patent and Trademark Office (INPI), covers pharmaceutical patents issued by the European, French, and United States patent offices from November 1986 onward. PHARMSEARCH is composed of MPHARM, a structure file searchable using Markush DARC software, and PHARM, the companion bibliographic file. Markush structures claimed in the patent documents are entered into the database as variable generic structures. Specific structures are also included in the database when they are not part of a Markush structure in the patent document. Chemical index terms describe all moieties of the structure. Indexing also describes the therapeutic activities and preparation processes for the compounds. The indexing policies used in the production of this database are described. PMID:2026662

  4. ARTI Refrigerant Database

    SciTech Connect

    Calm, J.M.

    1995-06-01

    The Refrigerant Database consolidates and facilitates access to information to assist industry in developing equipment using alternative refrigerants. The underlying purpose is to accelerate phase-out of chemical compounds of environmental concern.

  5. ARTI Refrigerant Database

    SciTech Connect

    Calm, J.M.

    1995-02-01

    The Refrigerant Database consolidates and facilitates access to information to assist industry in developing equipment using alternative refrigerants. The underlying purpose is to accelerate phase-out of chemical compounds of environmental concern.

  6. ARTI Refrigerant Database

    SciTech Connect

    Calm, J.M.

    1994-05-27

    The Refrigerant Database consolidates and facilitates access to information to assist industry in developing equipment using alternative refrigerants. The underlying purpose is to accelerate phase-out of chemical compounds of environmental concern.

  7. Toward unification of taxonomy databases in a distributed computer environment

    SciTech Connect

    Kitakami, Hajime; Tateno, Yoshio; Gojobori, Takashi

    1994-12-31

    All the taxonomy databases constructed with the DNA databases of the international DNA data banks are powerful electronic dictionaries which aid in biological research by computer. The taxonomy databases are, however, not consistently unified with a relational format. If we can achieve consistent unification of the taxonomy databases, it will be useful in comparing many research results and in investigating future research directions from existing research results. In particular, it will be useful in comparing relationships between phylogenetic trees inferred from molecular data and those constructed from morphological data. The goal of the present study is to unify the existing taxonomy databases and eliminate inconsistencies (errors) that are present in them. Inconsistencies occur particularly in the restructuring of the existing taxonomy databases, since classification rules for constructing the taxonomy have rapidly changed with biological advancements. A repair system is needed to remove inconsistencies in each data bank and mismatches among data banks. This paper describes a new methodology for removing both inconsistencies and mismatches from the databases in a distributed computer environment. The methodology is implemented in a relational database management system, SYBASE.

  8. The problems and promise of DNA barcodes for species diagnosis of primate biomaterials

    PubMed Central

    Lorenz, Joseph G; Jackson, Whitney E; Beck, Jeanne C; Hanner, Robert

    2005-01-01

    The Integrated Primate Biomaterials and Information Resource (www.IPBIR.org) provides essential research reagents to the scientific community by establishing, verifying, maintaining, and distributing DNA and RNA derived from primate cell cultures. The IPBIR uses mitochondrial cytochrome c oxidase subunit I sequences to verify the identity of samples for quality control purposes in the accession, cell culture, DNA extraction processes and prior to shipping to end users. As a result, IPBIR is accumulating a database of ‘DNA barcodes’ for many species of primates. However, this quality control process is complicated by taxon specific patterns of ‘universal primer’ failure, as well as the amplification or co-amplification of nuclear pseudogenes of mitochondrial origins. To overcome these difficulties, taxon specific primers have been developed, and reverse transcriptase PCR is utilized to exclude these extraneous sequences from amplification. DNA barcoding of primates has applications to conservation and law enforcement. Depositing barcode sequences in a public database, along with primer sequences, trace files and associated quality scores, makes this species identification technique widely accessible. Reference DNA barcode sequences should be derived from, and linked to, specimens of known provenance in web-accessible collections in order to validate this system of molecular diagnostics. PMID:16214744

  9. DNA sequencing

    SciTech Connect

    Tabor, S.; Richardson, C.C.

    1991-02-19

    This patent describes a method for determining the nucleotide base sequence of a DNA molecule. It comprises: providing the DNA molecule annealed with a primer molecule able to hybridize to the DNA molecule; incubating the annealed molecules in a vessel containing four different deoxynucleoside triphosphates, a processive DNA polymerase, wherein the polymerase is able to remain bound for at least 500 bases to the DNA molecule in an environmental condition used in the extension reaction of a DNA sequencing reaction, the polymerase having less than 500 units of exonuclease activity per mg of the polymerase, and one of four DNA synthesis terminating agents which terminate DNA synthesis at a specific nucleotide base, wherein each agent terminates DNA synthesis at a different nucleotide base; and separating the DNA products of the incubating reaction according to their size, whereby at least a part of the nucleotide base sequence of the DNA molecule can be determined.

  10. Comprehensive Thematic T-matrix Reference Database: a 2013-2014 Update

    NASA Technical Reports Server (NTRS)

    Mishchenko, Michael I.; Zakharova, Nadezhda T.; Khlebtsov, Nikolai G.; Wriedt, Thomas; Videen, Gorden

    2014-01-01

    This paper is the sixth update to the comprehensive thematic database of peer-reviewed T-matrix publications initiated by us in 2004 and includes relevant publications that have appeared since 2013. It also lists several earlier publications not incorporated in the original database and previous updates.

  11. Database computing in HEP

    NASA Technical Reports Server (NTRS)

    Day, C. T.; Loken, S.; Macfarlane, J. F.; May, E.; Lifka, D.; Lusk, E.; Price, L. E.; Baden, A.; Grossman, R.; Qin, X.

    1992-01-01

    The major SSC experiments are expected to produce up to 1 Petabyte of data per year each. Once the primary reconstruction is completed by farms of inexpensive processors, I/O becomes a major factor in further analysis of the data. We believe that the application of database techniques can significantly reduce the I/O performed in these analyses. We present examples of such I/O reductions in prototypes based on relational and object-oriented databases of CDF data samples.

  12. Database computing in HEP

    SciTech Connect

    Day, C.T.; Loken, S.; MacFarlane, J.F.; May, E.; Lifka, D.; Lusk, E.; Price, L.E.; Baden, A.; Grossman, R.; Qin, X.; Cormell, L.; Leibold, P.; Liu, D

    1992-01-01

    The major SSC experiments are expected to produce up to 1 Petabyte of data per year each. Once the primary reconstruction is completed by farms of inexpensive processors, I/O becomes a major factor in further analysis of the data. We believe that the application of database techniques can significantly reduce the I/O performed in these analyses. We present examples of such I/O reductions in prototypes based on relational and object-oriented databases of CDF data samples.

  13. Human mapping databases.

    PubMed

    Talbot, C; Cuticchia, A J

    2001-05-01

    This unit concentrates on the data contained within two human genome databases, GDB (Genome Database) and OMIM (Online Mendelian Inheritance in Man), and includes discussion of different methods for submitting and accessing data. An understanding of electronic mail, FTP, and the use of a World Wide Web (WWW) navigational tool such as Netscape or Internet Explorer is a prerequisite for utilizing the information in this unit. PMID:18428234

  14. Querying genomic databases

    SciTech Connect

    Baehr, A.; Hagstrom, R.; Joerg, D.; Overbeek, R.

    1991-09-01

    A natural-language interface has been developed that retrieves genomic information by using a simple subset of English. The interface spares the biologist from the task of learning database-specific query languages and computer programming. Currently, the interface deals with the E. coli genome. It can, however, be readily extended and shows promise as a means of easy access to other sequenced genomic databases as well.

  15. Steam Properties Database

    National Institute of Standards and Technology Data Gateway

    SRD 10 NIST/ASME Steam Properties Database (PC database for purchase)   Based upon the International Association for the Properties of Water and Steam (IAPWS) 1995 formulation for the thermodynamic properties of water and the most recent IAPWS formulations for transport and other properties, this updated version provides water properties over a wide range of conditions according to the accepted international standards.

  16. SSME environment database development

    NASA Technical Reports Server (NTRS)

    Reardon, John

    1987-01-01

    The internal environment of the Space Shuttle Main Engine (SSME) is being determined from hot firings of the prototype engines and from model tests using either air or water as the test fluid. The objectives are to develop a database system to facilitate management and analysis of test measurements and results, to enter available data into the database, and to analyze available data to establish conventions and procedures to provide consistency in data normalization and configuration geometry references.

  17. Corruption of genomic databases with anomalous sequence.

    PubMed Central

    Lamperti, E D; Kittelberger, J M; Smith, T F; Villa-Komaroff, L

    1992-01-01

    We describe evidence that DNA sequences from vectors used for cloning and sequencing have been incorporated accidentally into eukaryotic entries in the GenBank database. These incorporations were not restricted to one type of vector or to a single mechanism. Many minor instances may have been the result of simple editing errors, but some entries contained large blocks of vector sequence that had been incorporated by contamination or other accidents during cloning. Some cases involved unusual rearrangements and areas of vector distant from the normal insertion sites. Matches to vector were found in 0.23% of 20,000 sequences analyzed in GenBank Release 63. Although the possibility of anomalous sequence incorporation has been recognized since the inception of GenBank and should be easy to avoid, recent evidence suggests that this problem is increasing more quickly than the database itself. The presence of anomalous sequence may have serious consequences for the interpretation and use of database entries, and will have an impact on issues of database management. The incorporated vector fragments described here may also be useful for a crude estimate of the fidelity of sequence information in the database. In alignments with well-defined ends, the matching sequences showed 96.8% identity to vector; when poorer matches with arbitrary limits were included, the aggregate identity to vector sequence was 94.8%. PMID:1614861
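
    The abstract reports percent identity to vector without giving the computation. As a minimal sketch, assuming an ungapped alignment of two equal-length sequences, percent identity can be taken as the share of matching positions. The function name and the toy sequences are illustrative, not data from the paper.

```python
def percent_identity(a, b):
    """Percent of matching positions between two equal-length aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Toy sequences: 9 of 10 positions match.
print(percent_identity("ACGTACGTAC", "ACGTACGAAC"))

# 0.23% of the 20,000 sequences analyzed corresponds to about 46 entries.
print(round(20000 * 0.23 / 100))
```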

  18. Open systems and databases

    SciTech Connect

    Martire, G.S.; Nuttall, D.J.H.

    1993-05-01

    This paper is part of a series of papers invited by the IEEE POWER CONTROL CENTER WORKING GROUP concerning the changing designs of modern control centers. Papers invited by the Working Group discuss the following issues: "Benefits of Openness, Criteria for Evaluating Open EMS Systems, Hardware Design, Configuration Management, Security, Project Management, Databases, SCADA, Inter- and Intra-System Communications and Man-Machine Interfaces." The goal of this paper is to provide an introduction to the issues pertaining to "Open Systems and Databases." The intent is to assist understanding of some of the underlying factors that affect choices that must be made when selecting a database system for use in a control room environment. This paper describes and compares the major database information models which are in common use for database systems and provides an overview of SQL. A case for the control center community to follow the workings of the non-formal standards bodies is presented, along with possible uses and the benefits of commercially available databases within the control center. The reasons behind the emergence of industry-supported standards organizations such as the Open Software Foundation (OSF) and SQL Access are presented.

  19. Crude Oil Analysis Database

    DOE Data Explorer

    Shay, Johanna Y.

    The composition and physical properties of crude oil vary widely from one reservoir to another within an oil field, as well as from one field or region to another. Although all oils consist of hydrocarbons and their derivatives, the proportions of various types of compounds differ greatly. This makes some oils more suitable than others for specific refining processes and uses. To take advantage of this diversity, one needs access to information in a large database of crude oil analyses. The Crude Oil Analysis Database (COADB) currently satisfies this need by offering 9,056 crude oil analyses. Of these, 8,500 are United States domestic oils. The database contains results of analysis of the general properties and chemical composition, as well as the field, formation, and geographic location of the crude oil sample. [Taken from the Introduction to COAMDATA_DESC.pdf, part of the zipped software and database file at http://www.netl.doe.gov/technologies/oil-gas/Software/database.html] Save the zipped file to your PC. When opened, it will contain PDF documents and a large Excel spreadsheet. It will also contain the database in Microsoft Access 2002.

  20. The Halophile Protein Database

    PubMed Central

    Sharma, Naveen; Farooqi, Mohammad Samir; Chaturvedi, Krishna Kumar; Lal, Shashi Bhushan; Grover, Monendra; Rai, Anil; Pandey, Pankaj

    2014-01-01

    Halophilic archaea/bacteria adapt to different salt concentrations, namely extreme, moderate and low. These types of adaptations may occur as a result of modification of protein structure and other changes in different cell organelles. Thus proteins may play an important role in the adaptation of halophilic archaea/bacteria to saline conditions. The Halophile protein database (HProtDB) is a systematic attempt to document the biochemical and biophysical properties of proteins from halophilic archaea/bacteria which may be involved in adaptation of these organisms to saline conditions. In this database, various physicochemical properties such as molecular weight, theoretical pI, amino acid composition, atomic composition, estimated half-life, instability index, aliphatic index and grand average of hydropathicity (Gravy) have been listed. These physicochemical properties play an important role in identifying the protein structure, bonding pattern and function of the specific proteins. This database is a comprehensive, manually curated, non-redundant catalogue of proteins. The database currently contains properties of 59 897 proteins extracted from 21 different strains of halophilic archaea/bacteria. The database can be accessed through the following link. Database URL: http://webapp.cabgrid.res.in/protein/ PMID:25468930
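
    The abstract lists the grand average of hydropathicity among the stored properties but does not say how the database computes it. As a minimal sketch, assuming the common definition (the mean Kyte-Doolittle hydropathy value over all residues), it could be calculated as follows. The function name and the example peptide are illustrative.

```python
# Kyte-Doolittle hydropathy values, keyed by one-letter amino acid code.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Mean hydropathy of a protein sequence (standard residues only)."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    # A KeyError here signals a non-standard residue code.
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


# Toy peptide, not taken from the database.
print(round(gravy("ACDE"), 3))
```

    A negative value indicates a more hydrophilic sequence on this scale, and a positive value a more hydrophobic one.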

  1. Specialist Bibliographic Databases

    PubMed Central

    2016-01-01

    Specialist bibliographic databases offer essential online tools for researchers and authors who work on specific subjects and perform comprehensive and systematic syntheses of evidence. This article presents examples of the established specialist databases, which may be of interest to those engaged in multidisciplinary science communication. Access to most specialist databases is through subscription schemes and membership in professional associations. Several aggregators of information and database vendors, such as EBSCOhost and ProQuest, facilitate advanced searches supported by specialist keyword thesauri. Searches of items through specialist databases are complementary to those through multidisciplinary research platforms, such as PubMed, Web of Science, and Google Scholar. Familiarizing with the functional characteristics of biomedical and nonbiomedical bibliographic search tools is mandatory for researchers, authors, editors, and publishers. The database users are offered updates of the indexed journal lists, abstracts, author profiles, and links to other metadata. Editors and publishers may find particularly useful source selection criteria and apply for coverage of their peer-reviewed journals and grey literature sources. These criteria are aimed at accepting relevant sources with established editorial policies and quality controls. PMID:27134485

  2. Specialist Bibliographic Databases.

    PubMed

    Gasparyan, Armen Yuri; Yessirkepov, Marlen; Voronov, Alexander A; Trukhachev, Vladimir I; Kostyukova, Elena I; Gerasimov, Alexey N; Kitas, George D

    2016-05-01

    Specialist bibliographic databases offer essential online tools for researchers and authors who work on specific subjects and perform comprehensive and systematic syntheses of evidence. This article presents examples of the established specialist databases, which may be of interest to those engaged in multidisciplinary science communication. Access to most specialist databases is through subscription schemes and membership in professional associations. Several aggregators of information and database vendors, such as EBSCOhost and ProQuest, facilitate advanced searches supported by specialist keyword thesauri. Searches of items through specialist databases are complementary to those through multidisciplinary research platforms, such as PubMed, Web of Science, and Google Scholar. Familiarizing with the functional characteristics of biomedical and nonbiomedical bibliographic search tools is mandatory for researchers, authors, editors, and publishers. The database users are offered updates of the indexed journal lists, abstracts, author profiles, and links to other metadata. Editors and publishers may find particularly useful source selection criteria and apply for coverage of their peer-reviewed journals and grey literature sources. These criteria are aimed at accepting relevant sources with established editorial policies and quality controls. PMID:27134485

  3. Drinking Water Database

    NASA Technical Reports Server (NTRS)

    Murray, ShaTerea R.

    2004-01-01

    This summer I had the opportunity to work in the Environmental Management Office (EMO) under the Chemical Sampling and Analysis Team, or CS&AT. This team's mission is to support Glenn Research Center (GRC) and EMO by providing chemical sampling and analysis services and expert consulting. Services include sampling and chemical analysis of water, soil, fuels, oils, paint, insulation materials, etc. One of this team's major projects is the Drinking Water Project, which is carried out on Glenn's water coolers and ten percent of its sinks every two years. For the past two summers an intern had been putting together a database for this team to record the tests they had performed. She had successfully created a database but hadn't worked out all the quirks. So this summer William Wilder (an intern from Cleveland State University) and I worked together to perfect her database. We began by finding out exactly what every member of the team thought about the database and what, if anything, they would change. After collecting this data we both had to take some courses in Microsoft Access in order to fix the problems. Next we began looking at exactly how the database worked from the outside inward. Then we began trying to change the database but we quickly found out that this would be virtually impossible.

  4. The Immune Epitope Database 2.0

    PubMed Central

    Vita, Randi; Zarebski, Laura; Greenbaum, Jason A.; Emami, Hussein; Hoof, Ilka; Salimi, Nima; Damle, Rohini; Sette, Alessandro; Peters, Bjoern

    2010-01-01

    The Immune Epitope Database (IEDB, www.iedb.org) provides a catalog of experimentally characterized B and T cell epitopes, as well as data on Major Histocompatibility Complex (MHC) binding and MHC ligand elution experiments. The database represents the molecular structures recognized by adaptive immune receptors and the experimental contexts in which these molecules were determined to be immune epitopes. Epitopes recognized in humans, nonhuman primates, rodents, pigs, cats and all other tested species are included. Both positive and negative experimental results are captured. Over the course of 4 years, the data from 180 978 experiments were curated manually from the literature, which covers ∼99% of all publicly available information on peptide epitopes mapped in infectious agents (excluding HIV) and 93% of those mapped in allergens. In addition, data that would otherwise be unavailable to the public from 129 186 experiments were submitted directly by investigators. The curation of epitopes related to autoimmunity is expected to be completed by the end of 2010. The database can be queried by epitope structure, source organism, MHC restriction, assay type or host organism, among other criteria. The database structure, as well as its querying, browsing and reporting interfaces, was completely redesigned for the IEDB 2.0 release, which became publicly available in early 2009. PMID:19906713

  5. Database of Mechanical Properties of Textile Composites

    NASA Technical Reports Server (NTRS)

    Delbrey, Jerry

    1996-01-01

    This report describes the approach followed to develop a database for mechanical properties of textile composites. The data in this database is assembled from NASA Advanced Composites Technology (ACT) programs and from data in the public domain. This database meets the data documentation requirements of MIL-HDBK-17, Section 8.1.2, which describes in detail the type and amount of information needed to completely document composite material properties. The database focuses on mechanical properties of textile composite. Properties are available for a range of parameters such as direction, fiber architecture, materials, environmental condition, and failure mode. The composite materials in the database contain innovative textile architectures such as the braided, woven, and knitted materials evaluated under the NASA ACT programs. In summary, the database contains results for approximately 3500 coupon level tests, for ten different fiber/resin combinations, and seven different textile architectures. It also includes a limited amount of prepreg tape composites data from ACT programs where side-by-side comparisons were made.

  6. IPD - the Immuno Polymorphism Database

    PubMed Central

    Robinson, James; Halliwell, Jason A.; McWilliam, Hamish; Lopez, Rodrigo; Marsh, Steven G. E.

    2013-01-01

    The Immuno Polymorphism Database (IPD; http://www.ebi.ac.uk/ipd/) is a set of specialist databases related to the study of polymorphic genes in the immune system. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. IPD currently consists of four databases: IPD-KIR, which contains the allelic sequences of killer-cell immunoglobulin-like receptors; IPD-MHC, a database of sequences of the major histocompatibility complex of different species; IPD-HPA, alloantigens expressed only on platelets; and IPD-ESTDAB, which provides access to the European Searchable Tumour Cell-Line Database, a cell bank of immunologically characterized melanoma cell lines. The data is currently available online from the website and FTP directory. This article describes the latest updates and additional tools added to the IPD project. PMID:23180793

  7. IEEE Conference Publications in Libraries.

    ERIC Educational Resources Information Center

    Johnson, Karl E.

    1984-01-01

    Conclusions of surveys (63 libraries, OCLC database, University of Rhode Island users) assessing handling of Institute of Electrical and Electronics Engineers (IEEE) conference publications indicate that most libraries fully catalog these publications using LC cataloging, and library patrons frequently require series access to publications. Eight…

  8. NASA scientific and technical publications: A catalog of Special Publications, Reference Publications, Conference Publications, and Technical Papers, 1987

    NASA Technical Reports Server (NTRS)

    1988-01-01

    This catalog lists 239 citations of all NASA Special Publications, NASA Reference Publications, NASA Conference Publications, and NASA Technical Papers that were entered in the NASA scientific and technical information database during accession year 1987. The entries are grouped by subject category. Indexes of subject terms, personal authors, and NASA report numbers are provided.

  9. NASA scientific and technical publications: A catalog of special publications, reference publications, conference publications, and technical papers, 1987-1990

    NASA Technical Reports Server (NTRS)

    1991-01-01

    This catalog lists 783 citations of all NASA Special Publications, NASA Reference Publications, NASA Conference Publications, and NASA Technical Papers that were entered into the NASA Scientific and Technical Information Database during the years 1987 through 1990. The entries are grouped by subject category. Indexes of subject terms, personal authors, and NASA report numbers are provided.

  10. NASA scientific and technical publications: A catalog of special publications, reference publications, conference publications, and technical papers, 1989

    NASA Technical Reports Server (NTRS)

    1990-01-01

    This catalog lists 190 citations of all NASA Special Publications, NASA Reference Publications, NASA Conference Publications, and NASA Technical Papers that were entered into the NASA scientific and technical information database during accession year 1989. The entries are grouped by subject category. Indexes of subject terms, personal authors, and NASA report numbers are provided.

  11. IEEE Conference Publications in Libraries.

    ERIC Educational Resources Information Center

    Johnson, Karl E.

    1984-01-01

    Conclusions of surveys (63 libraries, OCLC database, University of Rhode Island users) assessing handling of Institute of Electrical and Electronics Engineers (IEEE) conference publications indicate that most libraries fully catalog these publications using LC cataloging, and library patrons frequently require series access to publications. Eight…

  12. NASA scientific and technical publications: A catalog of special publications, reference publications, conference publications, and technical papers, 1991-1992

    NASA Technical Reports Server (NTRS)

    1993-01-01

    This catalog lists 458 citations of all NASA Special Publications, NASA Reference Publications, NASA Conference Publications, and NASA Technical Papers that were entered into the NASA Scientific and Technical Information database during accession years 1991 through 1992. The entries are grouped by subject category. Indexes of subject terms, personal authors, and NASA report numbers are provided.

  13. SAbDab: the structural antibody database.

    PubMed

    Dunbar, James; Krawczyk, Konrad; Leem, Jinwoo; Baker, Terry; Fuchs, Angelika; Georges, Guy; Shi, Jiye; Deane, Charlotte M

    2014-01-01

    Structural antibody database (SAbDab; http://opig.stats.ox.ac.uk/webapps/sabdab) is an online resource containing all the publicly available antibody structures annotated and presented in a consistent fashion. The data are annotated with several properties including experimental information, gene details, correct heavy and light chain pairings, antigen details and, where available, antibody-antigen binding affinity. The user can select structures, according to these attributes as well as structural properties such as complementarity determining region loop conformation and variable domain orientation. Individual structures, datasets and the complete database can be downloaded. PMID:24214988

  14. NASA aerospace database subject scope: An overview

    NASA Technical Reports Server (NTRS)

    1993-01-01

    Outlined here is the subject scope of the NASA Aerospace Database, a publicly available subset of the NASA Scientific and Technical Information (STI) Database. Topics of interest to NASA are outlined and placed within the framework of the following broad aerospace subject categories: aeronautics, astronautics, chemistry and materials, engineering, geosciences, life sciences, mathematical and computer sciences, physics, social sciences, space sciences, and general. A brief discussion of the subject scope is given for each broad area, followed by a similar explanation of each of the narrower subject fields that follow. The subject category code is listed for each entry.

  15. A computerised database system for bovine traceability.

    PubMed

    Houston, R

    2001-08-01

    A computerised database system to record the details of all individual cattle, cattle holdings, cattle movements and cattle tests has been in use in Northern Ireland since 1988. This system was originally used purely to administer official tuberculosis and brucellosis eradication schemes, but subsequent developments have employed the traceability function to extend the use of the system to quality assurance, public health and marketing of beef and beef products. The database has evolved into the current, second generation system and this case study details that evolution from a manual system and describes potential future developments of the system. PMID:11548534

  16. wDBTF: an integrated database resource for studying wheat transcription factor families

    PubMed Central

    2010-01-01

    Background: Transcription factors (TFs) regulate gene expression by interacting with promoters of their target genes and are classified into families based on their DNA-binding domains. Genes coding for TFs have been identified in the sequences of model plant genomes. The rice (Oryza sativa spp. japonica) genome contains 2,384 TF gene models, which represent the mRNA transcript of a locus, classed into 63 families. Results: We have created an extensive list of wheat (Triticum aestivum L) TF sequences based on sequence homology with rice TFs identified and classified in the Database of Rice Transcription Factors (DRTF). We have identified 7,112 wheat sequences (contigs and singletons) from a dataset of 1,033,960 expressed sequence tag and mRNA (ET) sequences available. This number is about three times the number of TFs in rice so proportionally is very similar if allowance is made for the hexaploidy of wheat. Of these sequences 3,820 encode gene products with a DNA-binding domain and thus were confirmed as potential regulators. These 3,820 sequences were classified into 40 families and 84 subfamilies and some members defined orphan families. The results were compiled in the Database of Wheat Transcription Factor (wDBTF), an inventory available on the web http://wwwappli.nantes.inra.fr:8180/wDBFT/. For each accession, a link to its library source and its Affymetrix identification number is provided. The positions of Pfam (protein family database) motifs were given when known. Conclusions: wDBTF collates 3,820 wheat TF sequences validated by the presence of a DNA-binding domain out of 7,112 potential TF sequences identified from publicly available gene expression data. We also incorporated in silico expression data on these TFs into the database. Thus this database provides a major resource for systematic studies of TF families and their expression in wheat as illustrated here in a study of DOF family members expressed during seed development. PMID:20298594
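
    The proportions quoted in the abstract above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the three counts come from the abstract, while the variable names and the divide-by-three treatment of hexaploidy (three subgenomes contributing roughly equally) are simplifying assumptions made here, not the authors' method.

```python
# Counts quoted in the wDBTF abstract.
RICE_TF_GENE_MODELS = 2384        # TF gene models in the rice genome
WHEAT_CANDIDATE_SEQUENCES = 7112  # wheat contigs and singletons with homology to rice TFs
WHEAT_CONFIRMED_SEQUENCES = 3820  # candidates that encode a DNA-binding domain

# Assumption for illustration: hexaploid wheat carries three subgenomes,
# each contributing a similar share of the candidate sequences.
WHEAT_SUBGENOMES = 3

# "About three times the number of TFs in rice".
wheat_to_rice_ratio = WHEAT_CANDIDATE_SEQUENCES / RICE_TF_GENE_MODELS

# Candidates per subgenome, for comparison with the rice count.
candidates_per_subgenome = WHEAT_CANDIDATE_SEQUENCES / WHEAT_SUBGENOMES

# Share of candidates confirmed by the presence of a DNA-binding domain.
confirmed_fraction = WHEAT_CONFIRMED_SEQUENCES / WHEAT_CANDIDATE_SEQUENCES

if __name__ == "__main__":
    print(f"wheat/rice ratio:         {wheat_to_rice_ratio:.2f}")
    print(f"candidates per subgenome: {candidates_per_subgenome:.0f}")
    print(f"confirmed fraction:       {confirmed_fraction:.1%}")
```

    Under that assumption, the per-subgenome candidate count is close to the rice total, which is the sense in which the abstract calls the two numbers proportionally similar.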

  17. Publicity and public relations

    NASA Technical Reports Server (NTRS)

    Fosha, Charles E.

    1990-01-01

    This paper addresses approaches to using publicity and public relations to meet the goals of the NASA Space Grant College. Methods universities and colleges can use to publicize space activities are presented.

  18. Recent updates and developments to plant genome size databases

    PubMed Central

    Garcia, Sònia; Leitch, Ilia J.; Anadon-Rosell, Alba; Canela, Miguel Á.; Gálvez, Francisco; Garnatje, Teresa; Gras, Airy; Hidalgo, Oriane; Johnston, Emmeline; Mas de Xaxars, Gemma; Pellicer, Jaume; Siljak-Yakovlev, Sonja; Vallès, Joan; Vitales, Daniel; Bennett, Michael D.

    2014-01-01

    Two plant genome size databases have been recently updated and/or extended: the Plant DNA C-values database (http://data.kew.org/cvalues), and GSAD, the Genome Size in Asteraceae database (http://www.asteraceaegenomesize.com). While the first provides information on nuclear DNA contents across land plants and some algal groups, the second is focused on one of the largest and most economically important angiosperm families, Asteraceae. Genome size data have numerous applications: they can be used in comparative studies on genome evolution, or as a tool to appraise the cost of whole-genome sequencing programs. The growing interest in genome size and increasing rate of data accumulation has necessitated the continued update of these databases. Currently, the Plant DNA C-values database (Release 6.0, Dec. 2012) contains data for 8510 species, while GSAD has 1219 species (Release 2.0, June 2013), representing increases of 17 and 51%, respectively, in the number of species with genome size data, compared with previous releases. Here we provide overviews of the most recent releases of each database, and outline new features of GSAD. The latter include (i) a tool to visually compare genome size data between species, (ii) the option to export data and (iii) a webpage containing information about flow cytometry protocols. PMID:24288377

  19. Database Constraints Applied to Metabolic Pathway Reconstruction Tools

    PubMed Central

    Vilaplana, Jordi; Solsona, Francesc; Teixido, Ivan; Usié, Anabel; Karathia, Hiren; Alves, Rui; Mateo, Jordi

    2014-01-01

    Our group developed two biological applications, Biblio-MetReS and Homol-MetReS, accessing the same database of organisms with annotated genes. Biblio-MetReS is a data-mining application that facilitates the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (re)annotation of proteomes, to properly identify both the individual proteins involved in the process(es) of interest and their function. It also enables the sets of proteins involved in the process(es) in different organisms to be compared directly. The efficiency of these biological applications is directly related to the design of the shared database. We classified and analyzed the different kinds of access to the database. Based on this study, we tried to adjust and tune the configurable parameters of the database server to reach the best performance of the communication data link to/from the database system. Different database technologies were analyzed. We started the study with a public relational SQL database, MySQL. Then, the same database was implemented by a MapReduce-based database named HBase. The results indicated that the standard configuration of MySQL gives an acceptable performance for low or medium size databases. Nevertheless, tuning database parameters can greatly improve the performance and lead to very competitive runtimes. PMID:25202745

  20. Database constraints applied to metabolic pathway reconstruction tools.

    PubMed

    Vilaplana, Jordi; Solsona, Francesc; Teixido, Ivan; Usié, Anabel; Karathia, Hiren; Alves, Rui; Mateo, Jordi

    2014-01-01

    Our group developed two biological applications, Biblio-MetReS and Homol-MetReS, accessing the same database of organisms with annotated genes. Biblio-MetReS is a data-mining application that facilitates the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (re)annotation of proteomes, to properly identify both the individual proteins involved in the process(es) of interest and their function. It also enables the sets of proteins involved in the process(es) in different organisms to be compared directly. The efficiency of these biological applications is directly related to the design of the shared database. We classified and analyzed the different kinds of access to the database. Based on this study, we tried to adjust and tune the configurable parameters of the database server to reach the best performance of the communication data link to/from the database system. Different database technologies were analyzed. We started the study with a public relational SQL database, MySQL. Then, the same database was implemented by a MapReduce-based database named HBase. The results indicated that the standard configuration of MySQL gives an acceptable performance for low or medium size databases. Nevertheless, tuning database parameters can greatly improve the performance and lead to very competitive runtimes. PMID:25202745