public dna databases: Topics by Science.gov

Sample records for public dna databases

Public participation in genetic databases: crossing the boundaries between biobanks and forensic DNA databases through the principle of solidarity

PubMed Central

Machado, Helena; Silva, Susana

2015-01-01

The ethical aspects of biobanks and forensic DNA databases are often treated as separate issues. As a reflection of this, public participation, or the involvement of citizens in genetic databases, has been approached differently in the fields of forensics and medicine. This paper aims to cross the boundaries between medicine and forensics by exploring the flows between the ethical issues presented in the two domains and the subsequent conceptualisation of public trust and legitimisation. We propose to introduce the concept of ‘solidarity’, traditionally applied only to medical and research biobanks, into a consideration of public engagement in medicine and forensics. Inclusion of a solidarity-based framework, in both medical biobanks and forensic DNA databases, raises new questions that should be included in the ethical debate, in relation to both health services/medical research and activities associated with the criminal justice system. PMID:26139851
Public participation in genetic databases: crossing the boundaries between biobanks and forensic DNA databases through the principle of solidarity.

PubMed

Machado, Helena; Silva, Susana

2015-10-01

The ethical aspects of biobanks and forensic DNA databases are often treated as separate issues. As a reflection of this, public participation, or the involvement of citizens in genetic databases, has been approached differently in the fields of forensics and medicine. This paper aims to cross the boundaries between medicine and forensics by exploring the flows between the ethical issues presented in the two domains and the subsequent conceptualisation of public trust and legitimisation. We propose to introduce the concept of 'solidarity', traditionally applied only to medical and research biobanks, into a consideration of public engagement in medicine and forensics. Inclusion of a solidarity-based framework, in both medical biobanks and forensic DNA databases, raises new questions that should be included in the ethical debate, in relation to both health services/medical research and activities associated with the criminal justice system. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Canis mtDNA HV1 database: a web-based tool for collecting and surveying Canis mtDNA HV1 haplotype in public database.

PubMed

Thai, Quan Ke; Chung, Dung Anh; Tran, Hoang-Dung

2017-06-26

Canine and wolf mitochondrial DNA haplotypes, which can be used for forensic or phylogenetic analyses, have been defined in various schemes depending on the region analyzed. In recent studies, the 582 bp fragment of the HV1 region is most commonly used. 317 different canine HV1 haplotypes have been reported in the rapidly growing public database GenBank. These reported haplotypes contain several inconsistencies in their haplotype information. To overcome this issue, we have developed a Canis mtDNA HV1 database. This database collects data on the HV1 582 bp region in dog mitochondrial DNA from the GenBank to screen and correct the inconsistencies. It also supports users in detection of new novel mutation profiles and assignment of new haplotypes. The Canis mtDNA HV1 database (CHD) contains 5567 nucleotide entries originating from 15 subspecies in the species Canis lupus. Of these entries, 3646 were haplotypes and grouped into 804 distinct sequences. 319 sequences were recognized as previously assigned haplotypes, while the remaining 485 sequences had new mutation profiles and were marked as new haplotype candidates awaiting further analysis for haplotype assignment. Of the 3646 nucleotide entries, only 414 were annotated with correct haplotype information, while 3232 had insufficient or lacked haplotype information and were corrected or modified before storing in the CHD. The CHD can be accessed at http://chd.vnbiology.com . It provides sequences, haplotype information, and a web-based tool for mtDNA HV1 haplotyping. The CHD is updated monthly and supplies all data for download. The Canis mtDNA HV1 database contains information about canine mitochondrial DNA HV1 sequences with reconciled annotation. It serves as a tool for detection of inconsistencies in GenBank and helps identifying new HV1 haplotypes. Thus, it supports the scientific community in naming new HV1 haplotypes and to reconcile existing annotation of HV1 582 bp sequences.
MitoBreak: the mitochondrial DNA breakpoints database.

PubMed

Damas, Joana; Carneiro, João; Amorim, António; Pereira, Filipe

2014-01-01

Mitochondrial DNA (mtDNA) rearrangements are key events in the development of many diseases. Investigations of mtDNA regions affected by rearrangements (i.e. breakpoints) can lead to important discoveries about rearrangement mechanisms and can offer important clues about the causes of mitochondrial diseases. Here, we present the mitochondrial DNA breakpoints database (MitoBreak; http://mitobreak.portugene.com), a free, web-accessible comprehensive list of breakpoints from three classes of somatic mtDNA rearrangements: circular deleted (deletions), circular partially duplicated (duplications) and linear mtDNAs. Currently, MitoBreak contains >1400 mtDNA rearrangements from seven species (Homo sapiens, Mus musculus, Rattus norvegicus, Macaca mulatta, Drosophila melanogaster, Caenorhabditis elegans and Podospora anserina) and their associated phenotypic information collected from nearly 400 publications. The database allows researchers to perform multiple types of data analyses through user-friendly interfaces with full or partial datasets. It also permits the download of curated data and the submission of new mtDNA rearrangements. For each reported case, MitoBreak also documents the precise breakpoint positions, junction sequences, disease or associated symptoms and links to the related publications, providing a useful resource to study the causes and consequences of mtDNA structural alterations.
Compressing DNA sequence databases with coil.

PubMed

White, W Timothy J; Hendy, Michael D

2008-05-20

Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression - an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression - the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.
Compressing DNA sequence databases with coil

PubMed Central

White, W Timothy J; Hendy, Michael D

2008-01-01

Background Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression – the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work. PMID:18489794
Plant rDNA database: update and new features.

PubMed

Garcia, Sònia; Gálvez, Francisco; Gras, Airy; Kovařík, Aleš; Garnatje, Teresa

2014-01-01

The Plant rDNA database (www.plantrdnadatabase.com) is an open access online resource providing detailed information on numbers, structures and positions of 5S and 18S-5.8S-26S (35S) ribosomal DNA loci. The data have been obtained from >600 publications on plant molecular cytogenetics, mostly based on fluorescent in situ hybridization (FISH). This edition of the database contains information on 1609 species derived from 2839 records, which means an expansion of 55.76 and 94.45%, respectively. It holds the data for angiosperms, gymnosperms, bryophytes and pteridophytes available as of June 2013. Information from publications reporting data for a single rDNA (either 5S or 35S alone) and annotation regarding transcriptional activity of 35S loci now appears in the database. Preliminary analyses suggest greater variability in the number of rDNA loci in gymnosperms than in angiosperms. New applications provide ideograms of the species showing the positions of rDNA loci as well as a visual representation of their genome sizes. We have also introduced other features to boost the usability of the Web interface, such as an application for convenient data export and a new section with rDNA-FISH-related information (mostly detailing protocols and reagents). In addition, we upgraded and/or proofread tabs and links and modified the website for a more dynamic appearance. This manuscript provides a synopsis of these changes and developments. http://www.plantrdnadatabase.com. © The Author(s) 2014. Published by Oxford University Press.
High-throughput STR analysis for DNA database using direct PCR.

PubMed

Sim, Jeong Eun; Park, Su Jeong; Lee, Han Chul; Kim, Se-Yong; Kim, Jong Yeol; Lee, Seung Hwan

2013-07-01

Since the Korean criminal DNA database was launched in 2010, we have focused on establishing an automated DNA database profiling system that analyzes short tandem repeat loci in a high-throughput and cost-effective manner. We established a DNA database profiling system without DNA purification using a direct PCR buffer system. The quality of direct PCR procedures was compared with that of conventional PCR system under their respective optimized conditions. The results revealed not only perfect concordance but also an excellent PCR success rate, good electropherogram quality, and an optimal intra/inter-loci peak height ratio. In particular, the proportion of DNA extraction required due to direct PCR failure could be minimized to <3%. In conclusion, the newly developed direct PCR system can be adopted for automated DNA database profiling systems to replace or supplement conventional PCR system in a time- and cost-saving manner. © 2013 American Academy of Forensic Sciences Published 2013. This article is a U.S. Government work and is in the public domain in the U.S.A.
JICST Factual Database JICST DNA Database

NASA Astrophysics Data System (ADS)

Shirokizawa, Yoshiko; Abe, Atsushi

Japan Information Center of Science and Technology (JICST) has started the on-line service of DNA database in October 1988. This database is composed of EMBL Nucleotide Sequence Library and Genetic Sequence Data Bank. The authors outline the database system, data items and search commands. Examples of retrieval session are presented.
The Protein-DNA Interface database

PubMed Central

2010-01-01

The Protein-DNA Interface database (PDIdb) is a repository containing relevant structural information of Protein-DNA complexes solved by X-ray crystallography and available at the Protein Data Bank. The database includes a simple functional classification of the protein-DNA complexes that consists of three hierarchical levels: Class, Type and Subtype. This classification has been defined and manually curated by humans based on the information gathered from several sources that include PDB, PubMed, CATH, SCOP and COPS. The current version of the database contains only structures with resolution of 2.5 Å or higher, accounting for a total of 922 entries. The major aim of this database is to contribute to the understanding of the main rules that underlie the molecular recognition process between DNA and proteins. To this end, the database is focused on each specific atomic interface rather than on the separated binding partners. Therefore, each entry in this database consists of a single and independent protein-DNA interface. We hope that PDIdb will be useful to many researchers working in fields such as the prediction of transcription factor binding sites in DNA, the study of specificity determinants that mediate enzyme recognition events, engineering and design of new DNA binding proteins with distinct binding specificity and affinity, among others. Finally, due to its friendly and easy-to-use web interface, we hope that PDIdb will also serve educational and teaching purposes. PMID:20482798
The Protein-DNA Interface database.

PubMed

Norambuena, Tomás; Melo, Francisco

2010-05-18

The Protein-DNA Interface database (PDIdb) is a repository containing relevant structural information of Protein-DNA complexes solved by X-ray crystallography and available at the Protein Data Bank. The database includes a simple functional classification of the protein-DNA complexes that consists of three hierarchical levels: Class, Type and Subtype. This classification has been defined and manually curated by humans based on the information gathered from several sources that include PDB, PubMed, CATH, SCOP and COPS. The current version of the database contains only structures with resolution of 2.5 A or higher, accounting for a total of 922 entries. The major aim of this database is to contribute to the understanding of the main rules that underlie the molecular recognition process between DNA and proteins. To this end, the database is focused on each specific atomic interface rather than on the separated binding partners. Therefore, each entry in this database consists of a single and independent protein-DNA interface.We hope that PDIdb will be useful to many researchers working in fields such as the prediction of transcription factor binding sites in DNA, the study of specificity determinants that mediate enzyme recognition events, engineering and design of new DNA binding proteins with distinct binding specificity and affinity, among others. Finally, due to its friendly and easy-to-use web interface, we hope that PDIdb will also serve educational and teaching purposes.
Enhancing the DNA Patent Database

DOE Office of Scientific and Technical Information (OSTI.GOV)

Walters, LeRoy B.

Final Report on Award No. DE-FG0201ER63171 Principal Investigator: LeRoy B. Walters February 18, 2008 This project successfully completed its goal of surveying and reporting on the DNA patenting and licensing policies at 30 major U.S. academic institutions. The report of survey results was published in the January 2006 issue of Nature Biotechnology under the title “The Licensing of DNA Patents by US Academic Institutions: An Empirical Survey.” Lori Pressman was the lead author on this feature article. A PDF reprint of the article will be submitted to our Program Officer under separate cover. The project team has continued to updatemore » the DNA Patent Database on a weekly basis since the conclusion of the project. The database can be accessed at dnapatents.georgetown.edu. This database provides a valuable research tool for academic researchers, policymakers, and citizens. A report entitled Reaping the Benefits of Genomic and Proteomic Research: Intellectual Property Rights, Innovation, and Public Health was published in 2006 by the Committee on Intellectual Property Rights in Genomic and Protein Research and Innovation, Board on Science, Technology, and Economic Policy at the National Academies. The report was edited by Stephen A. Merrill and Anne-Marie Mazza. This report employed and then adapted the methodology developed by our research project and quoted our findings at several points. (The full report can be viewed online at the following URL: http://www.nap.edu/openbook.php?record_id=11487&page=R1). My colleagues and I are grateful for the research support of the ELSI program at the U.S. Department of Energy.« less
Attitudes regarding the national forensic DNA database: Survey data from the general public, prison inmates and prosecutors' offices in the Republic of Serbia.

PubMed

Teodorović, Smilja; Mijović, Dragan; Radovanović Nenadić, Una; Savić, Marina

2017-05-01

Worldwide, the establishment of national forensic DNA databases has transformed personal identification in the criminal justice system over the past two decades. It has also stimulated much debate centering on ethical issues, human rights, individual privacy, lack of safeguards and other standards. Therefore, a balance between effectiveness and intrusiveness of a national DNA repository is an imperative and needs to be achieved through a suitable legal framework. On its path to the European Union (EU), the Republic of Serbia is required to harmonize its national policies and legislation with the EU. Specifically, Chapter 24 of the EU acquis communautaire (Justice, Freedom and Security) stipulates the compulsory creation of a forensic DNA registry and adoption of corresponding legislation. This process is expected to occur in 2016. Thus, in light of launching the national DNA database, the goal of this work is to instigate a consultation with the Serbian public regarding their views on various aspects of the forensic DNA databank. Importantly, this study specifically assessed the opinions of distinct categories of citizens, including the general public, the prosecutors' offices staff, prisoners, prison guards, and students majoring in criminalistics. Our findings set a baseline for Serbian attitudes towards DNA databank custody, DNA sample and profile inclusion and retention criteria, ethical issues and concerns. Furthermore, results clearly demonstrate a permissive outlook of the respondents who are professional "beneficiaries" of genetic profiling and a restrictive position taken by the respondents whose genetic material has been acquired by the government. We believe that this opinion poll will be essential in discussions regarding a national DNA database, as well as in motivating further research on the reasons behind the observed views and subsequent development of educational strategies. All of these are, in turn, expected to aid the creation of suitable
Short Tandem Repeat DNA Internet Database

National Institute of Standards and Technology Data Gateway

SRD 130 Short Tandem Repeat DNA Internet Database (Web, free access) Short Tandem Repeat DNA Internet Database is intended to benefit research and application of short tandem repeat DNA markers for human identity testing. Facts and sequence information on each STR system, population data, commonly used multiplex STR systems, PCR primers and conditions, and a review of various technologies for analysis of STR alleles have been included.
DNApod: DNA polymorphism annotation database from next-generation sequence read archives.

PubMed

Mochizuki, Takako; Tanizawa, Yasuhiro; Fujisawa, Takatomo; Ohta, Tazro; Nikoh, Naruo; Shimizu, Tokurou; Toyoda, Atsushi; Fujiyama, Asao; Kurata, Nori; Nagasaki, Hideki; Kaminuma, Eli; Nakamura, Yasukazu

2017-01-01

With the rapid advances in next-generation sequencing (NGS), datasets for DNA polymorphisms among various species and strains have been produced, stored, and distributed. However, reliability varies among these datasets because the experimental and analytical conditions used differ among assays. Furthermore, such datasets have been frequently distributed from the websites of individual sequencing projects. It is desirable to integrate DNA polymorphism data into one database featuring uniform quality control that is distributed from a single platform at a single place. DNA polymorphism annotation database (DNApod; http://tga.nig.ac.jp/dnapod/) is an integrated database that stores genome-wide DNA polymorphism datasets acquired under uniform analytical conditions, and this includes uniformity in the quality of the raw data, the reference genome version, and evaluation algorithms. DNApod genotypic data are re-analyzed whole-genome shotgun datasets extracted from sequence read archives, and DNApod distributes genome-wide DNA polymorphism datasets and known-gene annotations for each DNA polymorphism. This new database was developed for storing genome-wide DNA polymorphism datasets of plants, with crops being the first priority. Here, we describe our analyzed data for 679, 404, and 66 strains of rice, maize, and sorghum, respectively. The analytical methods are available as a DNApod workflow in an NGS annotation system of the DNA Data Bank of Japan and a virtual machine image. Furthermore, DNApod provides tables of links of identifiers between DNApod genotypic data and public phenotypic data. To advance the sharing of organism knowledge, DNApod offers basic and ubiquitous functions for multiple alignment and phylogenetic tree construction by using orthologous gene information.
DNApod: DNA polymorphism annotation database from next-generation sequence read archives

PubMed Central

Mochizuki, Takako; Tanizawa, Yasuhiro; Fujisawa, Takatomo; Ohta, Tazro; Nikoh, Naruo; Shimizu, Tokurou; Toyoda, Atsushi; Fujiyama, Asao; Kurata, Nori; Nagasaki, Hideki; Kaminuma, Eli; Nakamura, Yasukazu

2017-01-01

With the rapid advances in next-generation sequencing (NGS), datasets for DNA polymorphisms among various species and strains have been produced, stored, and distributed. However, reliability varies among these datasets because the experimental and analytical conditions used differ among assays. Furthermore, such datasets have been frequently distributed from the websites of individual sequencing projects. It is desirable to integrate DNA polymorphism data into one database featuring uniform quality control that is distributed from a single platform at a single place. DNA polymorphism annotation database (DNApod; http://tga.nig.ac.jp/dnapod/) is an integrated database that stores genome-wide DNA polymorphism datasets acquired under uniform analytical conditions, and this includes uniformity in the quality of the raw data, the reference genome version, and evaluation algorithms. DNApod genotypic data are re-analyzed whole-genome shotgun datasets extracted from sequence read archives, and DNApod distributes genome-wide DNA polymorphism datasets and known-gene annotations for each DNA polymorphism. This new database was developed for storing genome-wide DNA polymorphism datasets of plants, with crops being the first priority. Here, we describe our analyzed data for 679, 404, and 66 strains of rice, maize, and sorghum, respectively. The analytical methods are available as a DNApod workflow in an NGS annotation system of the DNA Data Bank of Japan and a virtual machine image. Furthermore, DNApod provides tables of links of identifiers between DNApod genotypic data and public phenotypic data. To advance the sharing of organism knowledge, DNApod offers basic and ubiquitous functions for multiple alignment and phylogenetic tree construction by using orthologous gene information. PMID:28234924
The Israel DNA database--the establishment of a rapid, semi-automated analysis system.

PubMed

Zamir, Ashira; Dell'Ariccia-Carmon, Aviva; Zaken, Neomi; Oz, Carla

2012-03-01

The Israel Police DNA database, also known as IPDIS (Israel Police DNA Index System), has been operating since February 2007. During that time more than 135,000 reference samples have been uploaded and more than 2000 hits reported. We have developed an effective semi-automated system that includes two automated punchers, three liquid handler robots and four genetic analyzers. An inhouse LIMS program enables full tracking of every sample through the entire process of registration, pre-PCR handling, analysis of profiles, uploading to the database, hit reports and ultimately storage. The LIMS is also responsible for the future tracking of samples and their profiles to be expunged from the database according to the Israeli DNA legislation. The database is administered by an in-house developed software program, where reference and evidentiary profiles are uploaded, stored, searched and matched. The DNA database has proven to be an effective investigative tool which has gained the confidence of the Israeli public and on which the Israel National Police force has grown to rely. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Towards computational improvement of DNA database indexing and short DNA query searching.

PubMed

Stojanov, Done; Koceski, Sašo; Mileva, Aleksandra; Koceska, Nataša; Bande, Cveta Martinovska

2014-09-03

In order to facilitate and speed up the search of massive DNA databases, the database is indexed at the beginning, employing a mapping function. By searching through the indexed data structure, exact query hits can be identified. If the database is searched against an annotated DNA query, such as a known promoter consensus sequence, then the starting locations and the number of potential genes can be determined. This is particularly relevant if unannotated DNA sequences have to be functionally annotated. However, indexing a massive DNA database and searching an indexed data structure with millions of entries is a time-demanding process. In this paper, we propose a fast DNA database indexing and searching approach, identifying all query hits in the database, without having to examine all entries in the indexed data structure, limiting the maximum length of a query that can be searched against the database. By applying the proposed indexing equation, the whole human genome could be indexed in 10 hours on a personal computer, under the assumption that there is enough RAM to store the indexed data structure. Analysing the methodology proposed by Reneker, we observed that hits at starting positions [Formula: see text] are not reported, if the database is searched against a query shorter than [Formula: see text] nucleotides, such that [Formula: see text] is the length of the DNA database words being mapped and [Formula: see text] is the length of the query. A solution of this drawback is also presented.
Evaluation of DNA mixtures from database search.

PubMed

Chung, Yuk-Ka; Hu, Yue-Qing; Fung, Wing K

2010-03-01

With the aim of bridging the gap between DNA mixture analysis and DNA database search, a novel approach is proposed to evaluate the forensic evidence of DNA mixtures when the suspect is identified by the search of a database of DNA profiles. General formulae are developed for the calculation of the likelihood ratio for a two-person mixture under general situations including multiple matches and imperfect evidence. The influence of the prior probabilities on the weight of evidence under the scenario of multiple matches is demonstrated by a numerical example based on Hong Kong data. Our approach is shown to be capable of presenting the forensic evidence of DNA mixtures in a comprehensive way when the suspect is identified through database search.
Public variant databases: liability?

PubMed

Thorogood, Adrian; Cook-Deegan, Robert; Knoppers, Bartha Maria

2017-07-01

Public variant databases support the curation, clinical interpretation, and sharing of genomic data, thus reducing harmful errors or delays in diagnosis. As variant databases are increasingly relied on in the clinical context, there is concern that negligent variant interpretation will harm patients and attract liability. This article explores the evolving legal duties of laboratories, public variant databases, and physicians in clinical genomics and recommends a governance framework for databases to promote responsible data sharing.Genet Med advance online publication 15 December 2016.

Public variant databases: liability?

PubMed Central

Thorogood, Adrian; Cook-Deegan, Robert; Knoppers, Bartha Maria

2017-01-01

Public variant databases support the curation, clinical interpretation, and sharing of genomic data, thus reducing harmful errors or delays in diagnosis. As variant databases are increasingly relied on in the clinical context, there is concern that negligent variant interpretation will harm patients and attract liability. This article explores the evolving legal duties of laboratories, public variant databases, and physicians in clinical genomics and recommends a governance framework for databases to promote responsible data sharing. Genet Med advance online publication 15 December 2016 PMID:27977006
Searching mixed DNA profiles directly against profile databases.

PubMed

Bright, Jo-Anne; Taylor, Duncan; Curran, James; Buckleton, John

2014-03-01

DNA databases have revolutionised forensic science. They are a powerful investigative tool as they have the potential to identify persons of interest in criminal investigations. Routinely, a DNA profile generated from a crime sample could only be searched for in a database of individuals if the stain was from single contributor (single source) or if a contributor could unambiguously be determined from a mixed DNA profile. This meant that a significant number of samples were unsuitable for database searching. The advent of continuous methods for the interpretation of DNA profiles offers an advanced way to draw inferential power from the considerable investment made in DNA databases. Using these methods, each profile on the database may be considered a possible contributor to a mixture and a likelihood ratio (LR) can be formed. Those profiles which produce a sufficiently large LR can serve as an investigative lead. In this paper empirical studies are described to determine what constitutes a large LR. We investigate the effect on a database search of complex mixed DNA profiles with contributors in equal proportions with dropout as a consideration, and also the effect of an incorrect assignment of the number of contributors to a profile. In addition, we give, as a demonstration of the method, the results using two crime samples that were previously unsuitable for database comparison. We show that effective management of the selection of samples for searching and the interpretation of the output can be highly informative. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
STRBase: a short tandem repeat DNA database for the human identity testing community

PubMed Central

Ruitberg, Christian M.; Reeder, Dennis J.; Butler, John M.

2001-01-01

The National Institute of Standards and Technology (NIST) has compiled and maintained a Short Tandem Repeat DNA Internet Database (http://www.cstl.nist.gov/biotech/strbase/) since 1997 commonly referred to as STRBase. This database is an information resource for the forensic DNA typing community with details on commonly used short tandem repeat (STR) DNA markers. STRBase consolidates and organizes the abundant literature on this subject to facilitate on-going efforts in DNA typing. Observed alleles and annotated sequence for each STR locus are described along with a review of STR analysis technologies. Additionally, commercially available STR multiplex kits are described, published polymerase chain reaction (PCR) primer sequences are reported, and validation studies conducted by a number of forensic laboratories are listed. To supplement the technical information, addresses for scientists and hyperlinks to organizations working in this area are available, along with the comprehensive reference list of over 1300 publications on STRs used for DNA typing purposes. PMID:11125125
MICA: desktop software for comprehensive searching of DNA databases

PubMed Central

Stokes, William A; Glick, Benjamin S

2006-01-01

Background Molecular biologists work with DNA databases that often include entire genomes. A common requirement is to search a DNA database to find exact matches for a nondegenerate or partially degenerate query. The software programs available for such purposes are normally designed to run on remote servers, but an appealing alternative is to work with DNA databases stored on local computers. We describe a desktop software program termed MICA (K-Mer Indexing with Compact Arrays) that allows large DNA databases to be searched efficiently using very little memory. Results MICA rapidly indexes a DNA database. On a Macintosh G5 computer, the complete human genome could be indexed in about 5 minutes. The indexing algorithm recognizes all 15 characters of the DNA alphabet and fully captures the information in any DNA sequence, yet for a typical sequence of length L, the index occupies only about 2L bytes. The index can be searched to return a complete list of exact matches for a nondegenerate or partially degenerate query of any length. A typical search of a long DNA sequence involves reading only a small fraction of the index into memory. As a result, searches are fast even when the available RAM is limited. Conclusion MICA is suitable as a search engine for desktop DNA analysis software. PMID:17018144
24 CFR 81.72 - Public-use database and public information.

Code of Federal Regulations, 2012 CFR

2012-04-01

... 24 Housing and Urban Development 1 2012-04-01 2012-04-01 false Public-use database and public... Public-use database and public information. (a) General. Except as provided in paragraph (c) of this section, the Secretary shall establish and make available for public use, a public-use database containing...
24 CFR 81.72 - Public-use database and public information.

Code of Federal Regulations, 2013 CFR

2013-04-01

... 24 Housing and Urban Development 1 2013-04-01 2013-04-01 false Public-use database and public... Public-use database and public information. (a) General. Except as provided in paragraph (c) of this section, the Secretary shall establish and make available for public use, a public-use database containing...
24 CFR 81.72 - Public-use database and public information.

Code of Federal Regulations, 2010 CFR

2010-04-01

... 24 Housing and Urban Development 1 2010-04-01 2010-04-01 false Public-use database and public... Public-use database and public information. (a) General. Except as provided in paragraph (c) of this section, the Secretary shall establish and make available for public use, a public-use database containing...
24 CFR 81.72 - Public-use database and public information.

Code of Federal Regulations, 2014 CFR

2014-04-01

... 24 Housing and Urban Development 1 2014-04-01 2014-04-01 false Public-use database and public... Public-use database and public information. (a) General. Except as provided in paragraph (c) of this section, the Secretary shall establish and make available for public use, a public-use database containing...
24 CFR 81.72 - Public-use database and public information.

Code of Federal Regulations, 2011 CFR

2011-04-01

... 24 Housing and Urban Development 1 2011-04-01 2011-04-01 false Public-use database and public... Public-use database and public information. (a) General. Except as provided in paragraph (c) of this section, the Secretary shall establish and make available for public use, a public-use database containing...
[Integrated DNA barcoding database for identifying Chinese animal medicine].

PubMed

Shi, Lin-Chun; Yao, Hui; Xie, Li-Fang; Zhu, Ying-Jie; Song, Jing-Yuan; Zhang, Hui; Chen, Shi-Lin

2014-06-01

In order to construct an integrated DNA barcoding database for identifying Chinese animal medicine, the authors and their cooperators have completed a lot of researches for identifying Chinese animal medicines using DNA barcoding technology. Sequences from GenBank have been analyzed simultaneously. Three different methods, BLAST, barcoding gap and Tree building, have been used to confirm the reliabilities of barcode records in the database. The integrated DNA barcoding database for identifying Chinese animal medicine has been constructed using three different parts: specimen, sequence and literature information. This database contained about 800 animal medicines and the adulterants and closely related species. Unknown specimens can be identified by pasting their sequence record into the window on the ID page of species identification system for traditional Chinese medicine (www. tcmbarcode. cn). The integrated DNA barcoding database for identifying Chinese animal medicine is significantly important for animal species identification, rare and endangered species conservation and sustainable utilization of animal resources.
DDRprot: a database of DNA damage response-related proteins.

PubMed

Andrés-León, Eduardo; Cases, Ildefonso; Arcas, Aida; Rojas, Ana M

2016-01-01

The DNA Damage Response (DDR) signalling network is an essential system that protects the genome's integrity. The DDRprot database presented here is a resource that integrates manually curated information on the human DDR network and its sub-pathways. For each particular DDR protein, we present detailed information about its function. If involved in post-translational modifications (PTMs) with each other, we depict the position of the modified residue/s in the three-dimensional structures, when resolved structures are available for the proteins. All this information is linked to the original publication from where it was obtained. Phylogenetic information is also shown, including time of emergence and conservation across 47 selected species, family trees and sequence alignments of homologues. The DDRprot database can be queried by different criteria: pathways, species, evolutionary age or involvement in (PTM). Sequence searches using hidden Markov models can be also used.Database URL: http://ddr.cbbio.es. © The Author(s) 2016. Published by Oxford University Press.
Evolutionary trends in animal ribosomal DNA loci: introduction to a new online database.

PubMed

Sochorová, Jana; Garcia, Sònia; Gálvez, Francisco; Symonová, Radka; Kovařík, Aleš

2018-03-01

Ribosomal DNA (rDNA) loci encoding 5S and 45S (18S-5.8S-28S) rRNAs are important components of eukaryotic chromosomes. Here, we set up the animal rDNA database containing cytogenetic information about these loci in 1343 animal species (264 families) collected from 542 publications. The data are based on in situ hybridisation studies (both radioactive and fluorescent) carried out in major groups of vertebrates (fish, reptiles, amphibians, birds, and mammals) and invertebrates (mostly insects and mollusks). The database is accessible online at www.animalrdnadatabase.com . The median number of 45S and 5S sites was close to two per diploid chromosome set for both rDNAs despite large variation (1-74 for 5S and 1-54 for 45S sites). No significant correlation between the number of 5S and 45S rDNA loci was observed, suggesting that their distribution and amplification across the chromosomes follow independent evolutionary trajectories. Each group, irrespective of taxonomic classification, contained rDNA sites at any chromosome location. However, the distal and pericentromeric positions were the most prevalent (> 75% karyotypes) for 45S loci, while the position of 5S loci was more variable. We also examined potential relationships between molecular attributes of rDNA (homogenisation and expression) and cytogenetic parameters such as rDNA positions, chromosome number, and morphology.
MethHC: a database of DNA methylation and gene expression in human cancer.

PubMed

Huang, Wei-Yun; Hsu, Sheng-Da; Huang, Hsi-Yuan; Sun, Yi-Ming; Chou, Chih-Hung; Weng, Shun-Long; Huang, Hsien-Da

2015-01-01

We present MethHC (http://MethHC.mbc.nctu.edu.tw), a database comprising a systematic integration of a large collection of DNA methylation data and mRNA/microRNA expression profiles in human cancer. DNA methylation is an important epigenetic regulator of gene transcription, and genes with high levels of DNA methylation in their promoter regions are transcriptionally silent. Increasing numbers of DNA methylation and mRNA/microRNA expression profiles are being published in different public repositories. These data can help researchers to identify epigenetic patterns that are important for carcinogenesis. MethHC integrates data such as DNA methylation, mRNA expression, DNA methylation of microRNA gene and microRNA expression to identify correlations between DNA methylation and mRNA/microRNA expression from TCGA (The Cancer Genome Atlas), which includes 18 human cancers in more than 6000 samples, 6548 microarrays and 12 567 RNA sequencing data. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
hPDI: a database of experimental human protein-DNA interactions.

PubMed

Xie, Zhi; Hu, Shaohui; Blackshaw, Seth; Zhu, Heng; Qian, Jiang

2010-01-15

The human protein DNA Interactome (hPDI) database holds experimental protein-DNA interaction data for humans identified by protein microarray assays. The unique characteristics of hPDI are that it contains consensus DNA-binding sequences not only for nearly 500 human transcription factors but also for >500 unconventional DNA-binding proteins, which are completely uncharacterized previously. Users can browse, search and download a subset or the entire data via a web interface. This database is freely accessible for any academic purposes. http://bioinfo.wilmer.jhu.edu/PDI/.
TFBSshape: a motif database for DNA shape features of transcription factor binding sites.

PubMed

Yang, Lin; Zhou, Tianyin; Dror, Iris; Mathelier, Anthony; Wasserman, Wyeth W; Gordân, Raluca; Rohs, Remo

2014-01-01

Transcription factor binding sites (TFBSs) are most commonly characterized by the nucleotide preferences at each position of the DNA target. Whereas these sequence motifs are quite accurate descriptions of DNA binding specificities of transcription factors (TFs), proteins recognize DNA as a three-dimensional object. DNA structural features refine the description of TF binding specificities and provide mechanistic insights into protein-DNA recognition. Existing motif databases contain extensive nucleotide sequences identified in binding experiments based on their selection by a TF. To utilize DNA shape information when analysing the DNA binding specificities of TFs, we developed a new tool, the TFBSshape database (available at http://rohslab.cmb.usc.edu/TFBSshape/), for calculating DNA structural features from nucleotide sequences provided by motif databases. The TFBSshape database can be used to generate heat maps and quantitative data for DNA structural features (i.e., minor groove width, roll, propeller twist and helix twist) for 739 TF datasets from 23 different species derived from the motif databases JASPAR and UniPROBE. As demonstrated for the basic helix-loop-helix and homeodomain TF families, our TFBSshape database can be used to compare, qualitatively and quantitatively, the DNA binding specificities of closely related TFs and, thus, uncover differential DNA binding specificities that are not apparent from nucleotide sequence alone.
TFBSshape: a motif database for DNA shape features of transcription factor binding sites

PubMed Central

Yang, Lin; Zhou, Tianyin; Dror, Iris; Mathelier, Anthony; Wasserman, Wyeth W.; Gordân, Raluca; Rohs, Remo

2014-01-01

Transcription factor binding sites (TFBSs) are most commonly characterized by the nucleotide preferences at each position of the DNA target. Whereas these sequence motifs are quite accurate descriptions of DNA binding specificities of transcription factors (TFs), proteins recognize DNA as a three-dimensional object. DNA structural features refine the description of TF binding specificities and provide mechanistic insights into protein–DNA recognition. Existing motif databases contain extensive nucleotide sequences identified in binding experiments based on their selection by a TF. To utilize DNA shape information when analysing the DNA binding specificities of TFs, we developed a new tool, the TFBSshape database (available at http://rohslab.cmb.usc.edu/TFBSshape/), for calculating DNA structural features from nucleotide sequences provided by motif databases. The TFBSshape database can be used to generate heat maps and quantitative data for DNA structural features (i.e., minor groove width, roll, propeller twist and helix twist) for 739 TF datasets from 23 different species derived from the motif databases JASPAR and UniPROBE. As demonstrated for the basic helix-loop-helix and homeodomain TF families, our TFBSshape database can be used to compare, qualitatively and quantitatively, the DNA binding specificities of closely related TFs and, thus, uncover differential DNA binding specificities that are not apparent from nucleotide sequence alone. PMID:24214955
Development and expansion of high-quality control region databases to improve forensic mtDNA evidence interpretation.

PubMed

Irwin, Jodi A; Saunier, Jessica L; Strouss, Katharine M; Sturk, Kimberly A; Diegoli, Toni M; Just, Rebecca S; Coble, Michael D; Parson, Walther; Parsons, Thomas J

2007-06-01

In an effort to increase the quantity, breadth and availability of mtDNA databases suitable for forensic comparisons, we have developed a high-throughput process to generate approximately 5000 control region sequences per year from regional US populations, global populations from which the current US population is derived and global populations currently under-represented in available forensic databases. The system utilizes robotic instrumentation for all laboratory steps from pre-extraction through sequence detection, and a rigorous eight-step, multi-laboratory data review process with entirely electronic data transfer. Over the past 3 years, nearly 10,000 control region sequences have been generated using this approach. These data are being made publicly available and should further address the need for consistent, high-quality mtDNA databases for forensic testing.
Deppdb--DNA electrostatic potential properties database: electrostatic properties of genome DNA.

PubMed

Osypov, Alexander A; Krutinin, Gleb G; Kamzolova, Svetlana G

2010-06-01

The electrostatic properties of genome DNA influence its interactions with different proteins, in particular, the regulation of transcription by RNA-polymerases. DEPPDB--DNA Electrostatic Potential Properties Database--was developed to hold and provide all available information on the electrostatic properties of genome DNA combined with its sequence and annotation of biological and structural properties of genome elements and whole genomes. Genomes in DEPPDB are organized on a taxonomical basis. Currently, the database contains all the completely sequenced bacterial and viral genomes according to NCBI RefSeq. General properties of the genome DNA electrostatic potential profile and principles of its formation are revealed. This potential correlates with the GC content but does not correspond to it exactly and strongly depends on both the sequence arrangement and its context (flanking regions). Analysis of the promoter regions for bacterial and viral RNA polymerases revealed a correspondence between the scale of these proteins' physical properties and electrostatic profile patterns. We also discovered a direct correlation between the potential value and the binding frequency of RNA polymerase to DNA, supporting the idea of the role of electrostatics in these interactions. This matches a pronounced tendency of the promoter regions to possess higher values of the electrostatic potential.
RICD: a rice indica cDNA database resource for rice functional genomics.

PubMed

Lu, Tingting; Huang, Xuehui; Zhu, Chuanrang; Huang, Tao; Zhao, Qiang; Xie, Kabing; Xiong, Lizhong; Zhang, Qifa; Han, Bin

2008-11-26

The Oryza sativa L. indica subspecies is the most widely cultivated rice. During the last few years, we have collected over 20,000 putative full-length cDNAs and over 40,000 ESTs isolated from various cDNA libraries of two indica varieties Guangluai 4 and Minghui 63. A database of the rice indica cDNAs was therefore built to provide a comprehensive web data source for searching and retrieving the indica cDNA clones. Rice Indica cDNA Database (RICD) is an online MySQL-PHP driven database with a user-friendly web interface. It allows investigators to query the cDNA clones by keyword, genome position, nucleotide or protein sequence, and putative function. It also provides a series of information, including sequences, protein domain annotations, similarity search results, SNPs and InDels information, and hyperlinks to gene annotation in both The Rice Annotation Project Database (RAP-DB) and The TIGR Rice Genome Annotation Resource, expression atlas in RiceGE and variation report in Gramene of each cDNA. The online rice indica cDNA database provides cDNA resource with comprehensive information to researchers for functional analysis of indica subspecies and for comparative genomics. The RICD database is available through our website http://www.ncgr.ac.cn/ricd.
MitBASE : a comprehensive and integrated mitochondrial DNA database. The present status

PubMed Central

Attimonelli, M.; Altamura, N.; Benne, R.; Brennicke, A.; Cooper, J. M.; D’Elia, D.; Montalvo, A. de; Pinto, B. de; De Robertis, M.; Golik, P.; Knoop, V.; Lanave, C.; Lazowska, J.; Licciulli, F.; Malladi, B. S.; Memeo, F.; Monnerot, M.; Pasimeni, R.; Pilbout, S.; Schapira, A. H. V.; Sloof, P.; Saccone, C.

2000-01-01

MitBASE is an integrated and comprehensive database of mitochondrial DNA data which collects, under a single interface, databases for Plant, Vertebrate, Invertebrate, Human, Protist and Fungal mtDNA and a Pilot database on nuclear genes involved in mitochondrial biogenesis in Saccharomyces cerevisiae. MitBASE reports all available information from different organisms and from intraspecies variants and mutants. Data have been drawn from the primary databases and from the literature; value adding information has been structured, e.g., editing information on protist mtDNA genomes, pathological information for human mtDNA variants, etc. The different databases, some of which are structured using commercial packages (Microsoft Access, File Maker Pro) while others use a flat-file format, have been integrated under ORACLE. Ad hoc retrieval systems have been devised for some of the above listed databases keeping into account their peculiarities. The database is resident at the EBI and is available at the following site: http://www3.ebi.ac.uk/Research/Mitbase/mitbase.pl . The impact of this project is intended for both basic and applied research. The study of mitochondrial genetic diseases and mitochondrial DNA intraspecies diversity are key topics in several biotechnological fields. The database has been funded within the EU Biotechnology programme. PMID:10592207

DEPPDB - DNA electrostatic potential properties database. Electrostatic properties of genome DNA elements.

PubMed

Osypov, Alexander A; Krutinin, Gleb G; Krutinina, Eugenia A; Kamzolova, Svetlana G

2012-04-01

Electrostatic properties of genome DNA are important to its interactions with different proteins, in particular, related to transcription. DEPPDB - DNA Electrostatic Potential (and other Physical) Properties Database - provides information on the electrostatic and other physical properties of genome DNA combined with its sequence and annotation of biological and structural properties of genomes and their elements. Genomes are organized on taxonomical basis, supporting comparative and evolutionary studies. Currently, DEPPDB contains all completely sequenced bacterial, viral, mitochondrial, and plastids genomes according to the NCBI RefSeq, and some model eukaryotic genomes. Data for promoters, regulation sites, binding proteins, etc., are incorporated from established DBs and literature. The database is complemented by analytical tools. User sequences calculations are available. Case studies discovered electrostatics complementing DNA bending in E.coli plasmid BNT2 promoter functioning, possibly affecting host-environment metabolic switch. Transcription factors binding sites gravitate to high potential regions, confirming the electrostatics universal importance in protein-DNA interactions beyond the classical promoter-RNA polymerase recognition and regulation. Other genome elements, such as terminators, also show electrostatic peculiarities. Most intriguing are gene starts, exhibiting taxonomic correlations. The necessity of the genome electrostatic properties studies is discussed.
Forensic DNA databases in Western Balkan region: retrospectives, perspectives, and initiatives

PubMed Central

Marjanović, Damir; Konjhodžić, Rijad; Butorac, Sara Sanela; Drobnič, Katja; Merkaš, Siniša; Lauc, Gordan; Primorac, Damir; Anđelinović, Šimun; Milosavljević, Mladen; Karan, Željko; Vidović, Stojko; Stojković, Oliver; Panić, Bojana; Vučetić Dragović, Anđelka; Kovačević, Sandra; Jakovski, Zlatko; Asplen, Chris; Primorac, Dragan

2011-01-01

The European Network of Forensic Science Institutes (ENFSI) recommended the establishment of forensic DNA databases and specific implementation and management legislations for all EU/ENFSI members. Therefore, forensic institutions from Bosnia and Herzegovina, Serbia, Montenegro, and Macedonia launched a wide set of activities to support these recommendations. To assess the current state, a regional expert team completed detailed screening and investigation of the existing forensic DNA data repositories and associated legislation in these countries. The scope also included relevant concurrent projects and a wide spectrum of different activities in relation to forensics DNA use. The state of forensic DNA analysis was also determined in the neighboring Slovenia and Croatia, which already have functional national DNA databases. There is a need for a ‘regional supplement’ to the current documentation and standards pertaining to forensic application of DNA databases, which should include regional-specific preliminary aims and recommendations. PMID:21674821
Forensic DNA databases in Western Balkan region: retrospectives, perspectives, and initiatives.

PubMed

Marjanović, Damir; Konjhodzić, Rijad; Butorac, Sara Sanela; Drobnic, Katja; Merkas, Sinisa; Lauc, Gordan; Primorac, Damir; Andjelinović, Simun; Milosavljević, Mladen; Karan, Zeljko; Vidović, Stojko; Stojković, Oliver; Panić, Bojana; Vucetić Dragović, Andjelka; Kovacević, Sandra; Jakovski, Zlatko; Asplen, Chris; Primorac, Dragan

2011-06-01

The European Network of Forensic Science Institutes (ENFSI) recommended the establishment of forensic DNA databases and specific implementation and management legislations for all EU/ENFSI members. Therefore, forensic institutions from Bosnia and Herzegovina, Serbia, Montenegro, and Macedonia launched a wide set of activities to support these recommendations. To assess the current state, a regional expert team completed detailed screening and investigation of the existing forensic DNA data repositories and associated legislation in these countries. The scope also included relevant concurrent projects and a wide spectrum of different activities in relation to forensics DNA use. The state of forensic DNA analysis was also determined in the neighboring Slovenia and Croatia, which already have functional national DNA databases. There is a need for a 'regional supplement' to the current documentation and standards pertaining to forensic application of DNA databases, which should include regional-specific preliminary aims and recommendations.
NCCDPHP PUBLICATION DATABASE

EPA Science Inventory

This database provides bibliographic citations and abstracts of publications produced by the CDC's National Center for Chronic Disease Prevention and Health Promotion (NCCDPHP) including journal articles, monographs, book chapters, reports, policy documents, and fact sheets. Full...
Database Support for Research in Public Administration

ERIC Educational Resources Information Center

Tucker, James Cory

2005-01-01

This study examines the extent to which databases support student and faculty research in the area of public administration. A list of journals in public administration, public policy, political science, public budgeting and finance, and other related areas was compared to the journal content list of six business databases. These databases…
The Web-Based DNA Vaccine Database DNAVaxDB and Its Usage for Rational DNA Vaccine Design.

PubMed

Racz, Rebecca; He, Yongqun

2016-01-01

A DNA vaccine is a vaccine that uses a mammalian expression vector to express one or more protein antigens and is administered in vivo to induce an adaptive immune response. Since the 1990s, a significant amount of research has been performed on DNA vaccines and the mechanisms behind them. To meet the needs of the DNA vaccine research community, we created DNAVaxDB ( http://www.violinet.org/dnavaxdb ), the first Web-based database and analysis resource of experimentally verified DNA vaccines. All the data in DNAVaxDB, which includes plasmids, antigens, vaccines, and sources, is manually curated and experimentally verified. This chapter goes over the detail of DNAVaxDB system and shows how the DNA vaccine database, combined with the Vaxign vaccine design tool, can be used for rational design of a DNA vaccine against a pathogen, such as Mycobacterium bovis.
3DNALandscapes: a database for exploring the conformational features of DNA.

PubMed

Zheng, Guohui; Colasanti, Andrew V; Lu, Xiang-Jun; Olson, Wilma K

2010-01-01

3DNALandscapes, located at: http://3DNAscapes.rutgers.edu, is a new database for exploring the conformational features of DNA. In contrast to most structural databases, which archive the Cartesian coordinates and/or derived parameters and images for individual structures, 3DNALandscapes enables searches of conformational information across multiple structures. The database contains a wide variety of structural parameters and molecular images, computed with the 3DNA software package and known to be useful for characterizing and understanding the sequence-dependent spatial arrangements of the DNA sugar-phosphate backbone, sugar-base side groups, base pairs, base-pair steps, groove structure, etc. The data comprise all DNA-containing structures--both free and bound to proteins, drugs and other ligands--currently available in the Protein Data Bank. The web interface allows the user to link, report, plot and analyze this information from numerous perspectives and thereby gain insight into DNA conformation, deformability and interactions in different sequence and structural contexts. The data accumulated from known, well-resolved DNA structures can serve as useful benchmarks for the analysis and simulation of new structures. The collective data can also help to understand how DNA deforms in response to proteins and other molecules and undergoes conformational rearrangements.
An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system.

PubMed

AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

2015-11-19

Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. This database will facilitate the analysis of protein-DNA interactions and the development of
An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system

DOE Office of Scientific and Technical Information (OSTI.GOV)

AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database inmore » which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. Lastly, this database will facilitate the analysis of protein-DNA interactions and the
An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system

DOE PAGES

AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

2015-11-19

Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database inmore » which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. Lastly, this database will facilitate the analysis of protein-DNA interactions and the
Ribosomal DNA stability is supported by many 'buffer genes'-introduction to the Yeast rDNA Stability Database.

PubMed

Kobayashi, Takehiko; Sasaki, Mariko

2017-01-01

The ribosomal RNA gene (rDNA) is the most abundant gene in yeast and other eukaryotic organisms. Due to its heavy transcription, repetitive structure and programmed replication fork pauses, the rDNA is one of the most unstable regions in the genome. Thus, the rDNA is the best region to study the mechanisms responsible for maintaining genome integrity. Recently, we screened a library of ∼4800 budding yeast gene knockout strains to identify mutants defective in the maintenance of rDNA stability. The results of this screen are summarized in the Yeast rDNA Stability (YRS) Database, in which the stability and copy number of rDNA in each mutant are presented. From this screen, we identified ∼700 genes that may contribute to the maintenance of rDNA stability. In addition, ∼50 mutants had abnormally high or low rDNA copy numbers. Moreover, some mutants with unstable rDNA displayed abnormalities in another chromosome. In this review, we introduce the YRS Database and discuss the roles of newly identified genes that contribute to rDNA maintenance and genome integrity. © FEMS 2017. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Publications of Australian LIS Academics in Databases

ERIC Educational Resources Information Center

Wilson, Concepcion S.; Boell, Sebastian K.; Kennan, Mary Anne; Willard, Patricia

2011-01-01

This paper examines aspects of journal articles published from 1967 to 2008, located in eight databases, and authored or co-authored by academics serving for at least two years in Australian LIS programs from 1959 to 2008. These aspects are: inclusion of publications in databases, publications in journals, authorship characteristics of…
BioBarcode: a general DNA barcoding database and server platform for Asian biodiversity resources.

PubMed

Lim, Jeongheui; Kim, Sang-Yoon; Kim, Sungmin; Eo, Hae-Seok; Kim, Chang-Bae; Paek, Woon Kee; Kim, Won; Bhak, Jong

2009-12-03

DNA barcoding provides a rapid, accurate, and standardized method for species-level identification using short DNA sequences. Such a standardized identification method is useful for mapping all the species on Earth, particularly when DNA sequencing technology is cheaply available. There are many nations in Asia with many biodiversity resources that need to be mapped and registered in databases. We have built a general DNA barcode data processing system, BioBarcode, with open source software - which is a general purpose database and server. It uses mySQL RDBMS 5.0, BLAST2, and Apache httpd server. An exemplary database of BioBarcode has around 11,300 specimen entries (including GenBank data) and registers the biological species to map their genetic relationships. The BioBarcode database contains a chromatogram viewer which improves the performance in DNA sequence analyses. Asia has a very high degree of biodiversity and the BioBarcode database server system aims to provide an efficient bioinformatics protocol that can be freely used by Asian researchers and research organizations interested in DNA barcoding. The BioBarcode promotes the rapid acquisition of biological species DNA sequence data that meet global standards by providing specialized services, and provides useful tools that will make barcoding cheaper and faster in the biodiversity community such as standardization, depository, management, and analysis of DNA barcode data. The system can be downloaded upon request, and an exemplary server has been constructed with which to build an Asian biodiversity system http://www.asianbarcode.org.
Mock jurors' use of error rates in DNA database trawls.

PubMed

Scurich, Nicholas; John, Richard S

2013-12-01

Forensic science is not infallible, as data collected by the Innocence Project have revealed. The rate at which errors occur in forensic DNA testing-the so-called "gold standard" of forensic science-is not currently known. This article presents a Bayesian analysis to demonstrate the profound impact that error rates have on the probative value of a DNA match. Empirical evidence on whether jurors are sensitive to this effect is equivocal: Studies have typically found they are not, while a recent, methodologically rigorous study found that they can be. This article presents the results of an experiment that examined this issue within the context of a database trawl case in which one DNA profile was tested against a multitude of profiles. The description of the database was manipulated (i.e., "medical" or "offender" database, or not specified) as was the rate of error (i.e., one-in-10 or one-in-1,000). Jury-eligible participants were nearly twice as likely to convict in the offender database condition compared to the condition not specified. The error rates did not affect verdicts. Both factors, however, affected the perception of the defendant's guilt, in the expected direction, although the size of the effect was meager compared to Bayesian prescriptions. The results suggest that the disclosure of an offender database to jurors might constitute prejudicial evidence, and calls for proficiency testing in forensic science as well as training of jurors are echoed. (c) 2013 APA, all rights reserved
The collation of forensic DNA case data into a multi-dimensional intelligence database.

PubMed

Walsh, S J; Moss, D S; Kliem, C; Vintiner, G M

2002-01-01

The primary aim of any DNA Database is to link individuals to unsolved offenses and unsolved offenses to each other via DNA profiling. This aim has been successfully realised during the operation of the New Zealand (NZ) DNA Databank over the past five years. The DNA Intelligence Project (DIP), a collaborative project involving NZ forensic and law enforcement agencies, interrogated the forensic case data held on the NZ DNA databank and collated it into a functional intelligence database. This database has been used to identify significant trends which direct Police and forensic personnel towards the most appropriate use of DNA technology. Intelligence is being provided in areas such as the level of usage of DNA techniques in criminal investigation, the relative success of crime scene samples and the geographical distribution of crimes. The DIP has broadened the dimensions of the information offered through the NZ DNA Databank and has furthered the understanding and investigative capability of both Police and forensic scientists. The outcomes of this research fit soundly with the current policies of 'intelligence led policing', which are being adopted by Police jurisdictions locally and overseas.
"Would you accept having your DNA profile inserted in the National Forensic DNA database? Why?" Results of a questionnaire applied in Portugal.

PubMed

Machado, Helena; Silva, Susana

2014-01-01

The creation and expansion of forensic DNA databases might involve potential threats to the protection of a range of human rights. At the same time, such databases have social benefits. Based on data collected through an online questionnaire applied to 628 individuals in Portugal, this paper aims to analyze the citizens' willingness to donate voluntarily a sample for profiling and inclusion in the National Forensic DNA Database and the views underpinning such a decision. Nearly one-quarter of the respondents would indicate 'no', and this negative response increased significantly with age and education. The overriding willingness to accept the inclusion of the individual genetic profile indicates an acknowledgement of the investigative potential of forensic DNA technologies and a relegation of civil liberties and human rights to the background, owing to the perceived benefits of protecting both society and the individual from crime. This rationale is mostly expressed by the idea that all citizens should contribute to the expansion of the National Forensic DNA Database for reasons that range from the more abstract assumption that donating a sample for profiling would be helpful in fighting crime to the more concrete suggestion that everyone (criminals and non-criminals) should be in the database. The concerns with the risks of accepting the donation of a sample for genetic profiling and inclusion in the National Forensic DNA Database are mostly related to lack of control and insufficient or unclear regulations concerning safeguarding individuals' data and supervising the access and uses of genetic data. By providing an empirically-grounded understanding of the attitudes regarding willingness to donate voluntary a sample for profiling and inclusion in a National Forensic DNA Database, this study also considers the citizens' perceived benefits and risks of operating forensic DNA databases. These collective views might be useful for the formation of international common
DNAVaxDB: the first web-based DNA vaccine database and its data analysis

PubMed Central

2014-01-01

Since the first DNA vaccine studies were done in the 1990s, thousands more studies have followed. Here we report the development and analysis of DNAVaxDB (http://www.violinet.org/dnavaxdb), the first publically available web-based DNA vaccine database that curates, stores, and analyzes experimentally verified DNA vaccines, DNA vaccine plasmid vectors, and protective antigens used in DNA vaccines. All data in DNAVaxDB are annotated from reliable resources, particularly peer-reviewed articles. Among over 140 DNA vaccine plasmids, some plasmids were more frequently used in one type of pathogen than others; for example, pCMVi-UB for G- bacterial DNA vaccines, and pCAGGS for viral DNA vaccines. Presently, over 400 DNA vaccines containing over 370 protective antigens from over 90 infectious and non-infectious diseases have been curated in DNAVaxDB. While extracellular and bacterial cell surface proteins and adhesin proteins were frequently used for DNA vaccine development, the majority of protective antigens used in Chlamydophila DNA vaccines are localized to the inner portion of the cell. The DNA vaccine priming, other vaccine boosting vaccination regimen has been widely used to induce protection against infection of different pathogens such as HIV. Parasitic and cancer DNA vaccines were also systematically analyzed. User-friendly web query and visualization interfaces are available in DNAVaxDB for interactive data search. To support data exchange, the information of DNA vaccines, plasmids, and protective antigens is stored in the Vaccine Ontology (VO). DNAVaxDB is targeted to become a timely and vital source of DNA vaccines and related data and facilitate advanced DNA vaccine research and development. PMID:25104313
SAM: String-based sequence search algorithm for mitochondrial DNA database queries

PubMed Central

Röck, Alexander; Irwin, Jodi; Dür, Arne; Parsons, Thomas; Parson, Walther

2011-01-01

The analysis of the haploid mitochondrial (mt) genome has numerous applications in forensic and population genetics, as well as in disease studies. Although mtDNA haplotypes are usually determined by sequencing, they are rarely reported as a nucleotide string. Traditionally they are presented in a difference-coded position-based format relative to the corrected version of the first sequenced mtDNA. This convention requires recommendations for standardized sequence alignment that is known to vary between scientific disciplines, even between laboratories. As a consequence, database searches that are vital for the interpretation of mtDNA data can suffer from biased results when query and database haplotypes are annotated differently. In the forensic context that would usually lead to underestimation of the absolute and relative frequencies. To address this issue we introduce SAM, a string-based search algorithm that converts query and database sequences to position-free nucleotide strings and thus eliminates the possibility that identical sequences will be missed in a database query. The mere application of a BLAST algorithm would not be a sufficient remedy as it uses a heuristic approach and does not address properties specific to mtDNA, such as phylogenetically stable but also rapidly evolving insertion and deletion events. The software presented here provides additional flexibility to incorporate phylogenetic data, site-specific mutation rates, and other biologically relevant information that would refine the interpretation of mitochondrial DNA data. The manuscript is accompanied by freeware and example data sets that can be used to evaluate the new software (http://stringvalidation.org). PMID:21056022
The Dfam database of repetitive DNA families.

PubMed

Hubley, Robert; Finn, Robert D; Clements, Jody; Eddy, Sean R; Jones, Thomas A; Bao, Weidong; Smit, Arian F A; Wheeler, Travis J

2016-01-04

Repetitive DNA, especially that due to transposable elements (TEs), makes up a large fraction of many genomes. Dfam is an open access database of families of repetitive DNA elements, in which each family is represented by a multiple sequence alignment and a profile hidden Markov model (HMM). The initial release of Dfam, featured in the 2013 NAR Database Issue, contained 1143 families of repetitive elements found in humans, and was used to produce more than 100 Mb of additional annotation of TE-derived regions in the human genome, with improved speed. Here, we describe recent advances, most notably expansion to 4150 total families including a comprehensive set of known repeat families from four new organisms (mouse, zebrafish, fly and nematode). We describe improvements to coverage, and to our methods for identifying and reducing false annotation. We also describe updates to the website interface. The Dfam website has moved to http://dfam.org. Seed alignments, profile HMMs, hit lists and other underlying data are available for download. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Choice of population database for forensic DNA profile analysis.

PubMed

Steele, Christopher D; Balding, David J

2014-12-01

When evaluating the weight of evidence (WoE) for an individual to be a contributor to a DNA sample, an allele frequency database is required. The allele frequencies are needed to inform about genotype probabilities for unknown contributors of DNA to the sample. Typically databases are available from several populations, and a common practice is to evaluate the WoE using each available database for each unknown contributor. Often the most conservative WoE (most favourable to the defence) is the one reported to the court. However the number of human populations that could be considered is essentially unlimited and the number of contributors to a sample can be large, making it impractical to perform every possible WoE calculation, particularly for complex crime scene profiles. We propose instead the use of only the database that best matches the ancestry of the queried contributor, together with a substantial FST adjustment. To investigate the degree of conservativeness of this approach, we performed extensive simulations of one- and two-contributor crime scene profiles, in the latter case with, and without, the profile of the second contributor available for the analysis. The genotypes were simulated using five population databases, which were also available for the analysis, and evaluations of WoE using our heuristic rule were compared with several alternative calculations using different databases. Using FST=0.03, we found that our heuristic gave WoE more favourable to the defence than alternative calculations in well over 99% of the comparisons we considered; on average the difference in WoE was just under 0.2 bans (orders of magnitude) per locus. The degree of conservativeness of the heuristic rule can be adjusted through the FST value. We propose the use of this heuristic for DNA profile WoE calculations, due to its ease of implementation, and efficient use of the evidence while allowing a flexible degree of conservativeness. Copyright © 2014. Published by

BioBarcode: a general DNA barcoding database and server platform for Asian biodiversity resources

PubMed Central

2009-01-01

Background DNA barcoding provides a rapid, accurate, and standardized method for species-level identification using short DNA sequences. Such a standardized identification method is useful for mapping all the species on Earth, particularly when DNA sequencing technology is cheaply available. There are many nations in Asia with many biodiversity resources that need to be mapped and registered in databases. Results We have built a general DNA barcode data processing system, BioBarcode, with open source software - which is a general purpose database and server. It uses mySQL RDBMS 5.0, BLAST2, and Apache httpd server. An exemplary database of BioBarcode has around 11,300 specimen entries (including GenBank data) and registers the biological species to map their genetic relationships. The BioBarcode database contains a chromatogram viewer which improves the performance in DNA sequence analyses. Conclusion Asia has a very high degree of biodiversity and the BioBarcode database server system aims to provide an efficient bioinformatics protocol that can be freely used by Asian researchers and research organizations interested in DNA barcoding. The BioBarcode promotes the rapid acquisition of biological species DNA sequence data that meet global standards by providing specialized services, and provides useful tools that will make barcoding cheaper and faster in the biodiversity community such as standardization, depository, management, and analysis of DNA barcode data. The system can be downloaded upon request, and an exemplary server has been constructed with which to build an Asian biodiversity system http://www.asianbarcode.org. PMID:19958506
The effect of wild card designations and rare alleles in forensic DNA database searches.

PubMed

Tvedebrink, Torben; Bright, Jo-Anne; Buckleton, John S; Curran, James M; Morling, Niels

2015-05-01

Forensic DNA databases are powerful tools used for the identification of persons of interest in criminal investigations. Typically, they consist of two parts: (1) a database containing DNA profiles of known individuals and (2) a database of DNA profiles associated with crime scenes. The risk of adventitious or chance matches between crimes and innocent people increases as the number of profiles within a database grows and more data is shared between various forensic DNA databases, e.g. from different jurisdictions. The DNA profiles obtained from crime scenes are often partial because crime samples may be compromised in quantity or quality. When an individual's profile cannot be resolved from a DNA mixture, ambiguity is introduced. A wild card, F, may be used in place of an allele that has dropped out or when an ambiguous profile is resolved from a DNA mixture. Variant alleles that do not correspond to any marker in the allelic ladder or appear above or below the extent of the allelic ladder range are assigned the allele designation R for rare allele. R alleles are position specific with respect to the observed/unambiguous allele. The F and R designations are made when the exact genotype has not been determined. The F and R designation are treated as wild cards for searching, which results in increased chance of adventitious matches. We investigated the probability of adventitious matches given these two types of wild cards. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Influencing Database Use in Public Libraries.

ERIC Educational Resources Information Center

Tenopir, Carol

1999-01-01

Discusses results of a survey of factors influencing database use in public libraries. Highlights the importance of content; ease of use; and importance of instruction. Tabulates importance indications for number and location of workstations, library hours, availability of remote login, usefulness and quality of content, lack of other databases,…
Validation of SmartRank: A likelihood ratio software for searching national DNA databases with complex DNA profiles.

PubMed

Benschop, Corina C G; van de Merwe, Linda; de Jong, Jeroen; Vanvooren, Vanessa; Kempenaers, Morgane; Kees van der Beek, C P; Barni, Filippo; Reyes, Eusebio López; Moulin, Léa; Pene, Laurent; Haned, Hinda; Sijen, Titia

2017-07-01

Searching a national DNA database with complex and incomplete profiles usually yields very large numbers of possible matches that can present many candidate suspects to be further investigated by the forensic scientist and/or police. Current practice in most forensic laboratories consists of ordering these 'hits' based on the number of matching alleles with the searched profile. Thus, candidate profiles that share the same number of matching alleles are not differentiated and due to the lack of other ranking criteria for the candidate list it may be difficult to discern a true match from the false positives or notice that all candidates are in fact false positives. SmartRank was developed to put forward only relevant candidates and rank them accordingly. The SmartRank software computes a likelihood ratio (LR) for the searched profile and each profile in the DNA database and ranks database entries above a defined LR threshold according to the calculated LR. In this study, we examined for mixed DNA profiles of variable complexity whether the true donors are retrieved, what the number of false positives above an LR threshold is and the ranking position of the true donors. Using 343 mixed DNA profiles over 750 SmartRank searches were performed. In addition, the performance of SmartRank and CODIS were compared regarding DNA database searches and SmartRank was found complementary to CODIS. We also describe the applicable domain of SmartRank and provide guidelines. The SmartRank software is open-source and freely available. Using the best practice guidelines, SmartRank enables obtaining investigative leads in criminal cases lacking a suspect. Copyright © 2017 Elsevier B.V. All rights reserved.
Toward a mtDNA locus-specific mutation database using the LOVD platform.

PubMed

Elson, Joanna L; Sweeney, Mary G; Procaccio, Vincent; Yarham, John W; Salas, Antonio; Kong, Qing-Peng; van der Westhuizen, Francois H; Pitceathly, Robert D S; Thorburn, David R; Lott, Marie T; Wallace, Douglas C; Taylor, Robert W; McFarland, Robert

2012-09-01

The Human Variome Project (HVP) is a global effort to collect and curate all human genetic variation affecting health. Mutations of mitochondrial DNA (mtDNA) are an important cause of neurogenetic disease in humans; however, identification of the pathogenic mutations responsible can be problematic. In this article, we provide explanations as to why and suggest how such difficulties might be overcome. We put forward a case in support of a new Locus Specific Mutation Database (LSDB) implemented using the Leiden Open-source Variation Database (LOVD) system that will not only list primary mutations, but also present the evidence supporting their role in disease. Critically, we feel that this new database should have the capacity to store information on the observed phenotypes alongside the genetic variation, thereby facilitating our understanding of the complex and variable presentation of mtDNA disease. LOVD supports fast queries of both seen and hidden data and allows storage of sequence variants from high-throughput sequence analysis. The LOVD platform will allow construction of a secure mtDNA database; one that can fully utilize currently available data, as well as that being generated by high-throughput sequencing, to link genotype with phenotype enhancing our understanding of mitochondrial disease, with a view to providing better prognostic information. © 2012 Wiley Periodicals, Inc.
Toward a mtDNA Locus-Specific Mutation Database Using the LOVD Platform

PubMed Central

Elson, Joanna L.; Sweeney, Mary G.; Procaccio, Vincent; Yarham, John W.; Salas, Antonio; Kong, Qing-Peng; van der Westhuizen, Francois H.; Pitceathly, Robert D.S.; Thorburn, David R.; Lott, Marie T.; Wallace, Douglas C.; Taylor, Robert W.; McFarland, Robert

2015-01-01

The Human Variome Project (HVP) is a global effort to collect and curate all human genetic variation affecting health. Mutations of mitochondrial DNA (mtDNA) are an important cause of neurogenetic disease in humans; however, identification of the pathogenic mutations responsible can be problematic. In this article, we provide explanations as to why and suggest how such difficulties might be overcome. We put forward a case in support of a new Locus Specific Mutation Database (LSDB) implemented using the Leiden Open-source Variation Database (LOVD) system that will not only list primary mutations, but also present the evidence supporting their role in disease. Critically, we feel that this new database should have the capacity to store information on the observed phenotypes alongside the genetic variation, thereby facilitating our understanding of the complex and variable presentation of mtDNA disease. LOVD supports fast queries of both seen and hidden data and allows storage of sequence variants from high-throughput sequence analysis. The LOVD platform will allow construction of a secure mtDNA database; one that can fully utilize currently available data, as well as that being generated by high-throughput sequencing, to link genotype with phenotype enhancing our understanding of mitochondrial disease, with a view to providing better prognostic information. PMID:22581690
[Current status of DNA databases in the forensic field: new progress, new legal needs].

PubMed

Baeta, Miriam; Martínez-Jarreta, Begoña

2009-01-01

One of the most polemic issues regarding the use of deoxyribonucleic acid (DNA) in the legal sphere, refers to the creation of DNA databases. Until relatively recently, Spain did not have a law to support the establishment of a national DNA profile bank for forensic purposes, and preserve the fundamental rights of subjects whose data are archived therein. The regulatory law of police databases regarding identifiers obtained from DNA approved in 2007, covers this void in the Spanish legislation and responds to the incessant need to adapt the laws to continuous scientific and technological progress.
MeDReaders: a database for transcription factors that bind to methylated DNA.

PubMed

Wang, Guohua; Luo, Ximei; Wang, Jianan; Wan, Jun; Xia, Shuli; Zhu, Heng; Qian, Jiang; Wang, Yadong

2018-01-04

Understanding the molecular principles governing interactions between transcription factors (TFs) and DNA targets is one of the main subjects for transcriptional regulation. Recently, emerging evidence demonstrated that some TFs could bind to DNA motifs containing highly methylated CpGs both in vitro and in vivo. Identification of such TFs and elucidation of their physiological roles now become an important stepping-stone toward understanding the mechanisms underlying the methylation-mediated biological processes, which have crucial implications for human disease and disease development. Hence, we constructed a database, named as MeDReaders, to collect information about methylated DNA binding activities. A total of 731 TFs, which could bind to methylated DNA sequences, were manually curated in human and mouse studies reported in the literature. In silico approaches were applied to predict methylated and unmethylated motifs of 292 TFs by integrating whole genome bisulfite sequencing (WGBS) and ChIP-Seq datasets in six human cell lines and one mouse cell line extracted from ENCODE and GEO database. MeDReaders database will provide a comprehensive resource for further studies and aid related experiment designs. The database implemented unified access for users to most TFs involved in such methylation-associated binding actives. The website is available at http://medreader.org/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Genetics and Forensics: Making the National DNA Database

PubMed Central

Johnson, Paul; Williams, Robin; Martin, Paul

2005-01-01

This paper is based on a current study of the growing police use of the epistemic authority of molecular biology for the identification of criminal suspects in support of crime investigation. It discusses the development of DNA profiling and the establishment and development of the UK National DNA Database (NDNAD) as an instance of the ‘scientification of police work’ (Ericson and Shearing 1986) in which the police uses of science and technology have a recursive effect on their future development. The NDNAD, owned by the Association of Chief Police Officers of England and Wales, is the first of its kind in the world and currently contains the genetic profiles of more than 2 million people. The paper provides a framework for the examination of this socio-technical innovation, begins to tease out the dense and compact history of the database and accounts for the way in which changes and developments across disparate scientific, governmental and policing contexts, have all contributed to the range of uses to which it is put. PMID:16467921
DNA algorithms of implementing biomolecular databases on a biological computer.

PubMed

Chang, Weng-Long; Vasilakos, Athanasios V

2015-01-01

In this paper, DNA algorithms are proposed to perform eight operations of relational algebra (calculus), which include Cartesian product, union, set difference, selection, projection, intersection, join, and division, on biomolecular relational databases.
Launching the Greek forensic DNA database. The legal framework and arising ethical issues.

PubMed

Voultsos, Polychronis; Njau, Samuel; Tairis, Nikolaos; Psaroulis, Dimitrios; Kovatsi, Leda

2011-11-01

Since the creation of the first national DNA database in Europe in 1995, many European countries have legislated laws for initiating and regulating their own databases. The Greek government legislated a law in 2008, by which the National DNA Database of Greece was founded and regulated. According to this law, only DNA profiles from convicted criminals were recorded. Nevertheless, a year later, in 2009, the law was amended to permit the creation of an expanded database including innocent people and children. Unfortunately, the new law is very vague in many aspects and does not respect the principle of proportionality. Therefore, according to our opinion, it will soon need to be re-amended. Furthermore, prior to legislating the new law, there was no debate with the community itself in order to clarify what system would best suit Greece and what the citizens would be willing to accept. We present the current legal framework in Greece, we highlight issues that need to be clarified and we discuss possible ethical issues that may arise. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
Human Chromosome Y and Haplogroups; introducing YDHS Database.

PubMed

Tiirikka, Timo; Moilanen, Jukka S

2015-12-01

As the high throughput sequencing efforts generate more biological information, scientists from different disciplines are interpreting the polymorphisms that make us unique. In addition, there is an increasing trend in general public to research their own genealogy, find distant relatives and to know more about their biological background. Commercial vendors are providing analyses of mitochondrial and Y-chromosomal markers for such purposes. Clearly, an easy-to-use free interface to the existing data on the identified variants would be in the interest of general public and professionals less familiar with the field. Here we introduce a novel metadatabase YDHS that aims to provide such an interface for Y-chromosomal DNA (Y-DNA) haplogroups and sequence variants. The database uses ISOGG Y-DNA tree as the source of mutations and haplogroups and by using genomic positions of the mutations the database links them to genes and other biological entities. YDHS contains analysis tools for deeper Y-SNP analysis. YDHS addresses the shortage of Y-DNA related databases. We have tested our database using a set of different cases from literature ranging from infertility to autism. The database is at http://www.semanticgen.net/ydhs Y-chromosomal DNA (Y-DNA) haplogroups and sequence variants have not been in the scientific limelight, excluding certain specialized fields like forensics, mainly because there is not much freely available information or it is scattered in different sources. However, as we have demonstrated Y-SNPs do play a role in various cases on the haplogroup level and it is possible to create a free Y-DNA dedicated bioinformatics resource.
Analysis of commercial and public bioactivity databases.

PubMed

Tiikkainen, Pekka; Franke, Lutz

2012-02-27

Activity data for small molecules are invaluable in chemoinformatics. Various bioactivity databases exist containing detailed information of target proteins and quantitative binding data for small molecules extracted from journals and patents. In the current work, we have merged several public and commercial bioactivity databases into one bioactivity metabase. The molecular presentation, target information, and activity data of the vendor databases were standardized. The main motivation of the work was to create a single relational database which allows fast and simple data retrieval by in-house scientists. Second, we wanted to know the amount of overlap between databases by commercial and public vendors to see whether the former contain data complementing the latter. Third, we quantified the degree of inconsistency between data sources by comparing data points derived from the same scientific article cited by more than one vendor. We found that each data source contains unique data which is due to different scientific articles cited by the vendors. When comparing data derived from the same article we found that inconsistencies between the vendors are common. In conclusion, using databases of different vendors is still useful since the data overlap is not complete. It should be noted that this can be partially explained by the inconsistencies and errors in the source data.
Non-B DB: a database of predicted non-B DNA-forming motifs in mammalian genomes.

PubMed

Cer, Regina Z; Bruce, Kevin H; Mudunuri, Uma S; Yi, Ming; Volfovsky, Natalia; Luke, Brian T; Bacolla, Albino; Collins, Jack R; Stephens, Robert M

2011-01-01

Although the capability of DNA to form a variety of non-canonical (non-B) structures has long been recognized, the overall significance of these alternate conformations in biology has only recently become accepted en masse. In order to provide access to genome-wide locations of these classes of predicted structures, we have developed non-B DB, a database integrating annotations and analysis of non-B DNA-forming sequence motifs. The database provides the most complete list of alternative DNA structure predictions available, including Z-DNA motifs, quadruplex-forming motifs, inverted repeats, mirror repeats and direct repeats and their associated subsets of cruciforms, triplex and slipped structures, respectively. The database also contains motifs predicted to form static DNA bends, short tandem repeats and homo(purine•pyrimidine) tracts that have been associated with disease. The database has been built using the latest releases of the human, chimp, dog, macaque and mouse genomes, so that the results can be compared directly with other data sources. In order to make the data interpretable in a genomic context, features such as genes, single-nucleotide polymorphisms and repetitive elements (SINE, LINE, etc.) have also been incorporated. The database is accessed through query pages that produce results with links to the UCSC browser and a GBrowse-based genomic viewer. It is freely accessible at http://nonb.abcc.ncifcrf.gov.
The mitochondrial DNA makeup of Romanians: A forensic mtDNA control region database and phylogenetic characterization.

PubMed

Turchi, Chiara; Stanciu, Florin; Paselli, Giorgia; Buscemi, Loredana; Parson, Walther; Tagliabracci, Adriano

2016-09-01

To evaluate the pattern of Romanian population from a mitochondrial perspective and to establish an appropriate mtDNA forensic database, we generated a high-quality mtDNA control region dataset from 407 Romanian subjects belonging to four major historical regions: Moldavia, Transylvania, Wallachia and Dobruja. The entire control region (CR) was analyzed by Sanger-type sequencing assays and the resulting 306 different haplotypes were classified into haplogroups according to the most updated mtDNA phylogeny. The Romanian gene pool is mainly composed of West Eurasian lineages H (31.7%), U (12.8%), J (10.8%), R (10.1%), T (9.1%), N (8.1%), HV (5.4%),K (3.7%), HV0 (4.2%), with exceptions of East Asian haplogroup M (3.4%) and African haplogroup L (0.7%). The pattern of mtDNA variation observed in this study indicates that the mitochondrial DNA pool is geographically homogeneous across Romania and that the haplogroup composition reveals signals of admixture of populations of different origin. The PCA scatterplot supported this scenario, with Romania located in southeastern Europe area, close to Bulgaria and Hungary, and as a borderland with respect to east Mediterranean and other eastern European countries. High haplotype diversity (0.993) and nucleotide diversity indices (0.00838±0.00426), together with low random match probability (0.0087) suggest the usefulness of this control region dataset as a forensic database in routine forensic mtDNA analysis and in the investigation of maternal genetic lineages in the Romanian population. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
NREL: U.S. Life Cycle Inventory Database - Publications

Science.gov Websites

Publications Planning Documents U.S. Life Cycle Inventory Database Roadmap, February 2009 U.S. Life Cycle Inventory User Survey, February 2009 U.S. LCI Database Factsheet, March 2005 User's Guide for Life
DNAtraffic--a new database for systems biology of DNA dynamics during the cell life.

PubMed

Kuchta, Krzysztof; Barszcz, Daniela; Grzesiuk, Elzbieta; Pomorski, Pawel; Krwawicz, Joanna

2012-01-01

DNAtraffic (http://dnatraffic.ibb.waw.pl/) is dedicated to be a unique comprehensive and richly annotated database of genome dynamics during the cell life. It contains extensive data on the nomenclature, ontology, structure and function of proteins related to the DNA integrity mechanisms such as chromatin remodeling, histone modifications, DNA repair and damage response from eight organisms: Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Escherichia coli and Arabidopsis thaliana. DNAtraffic contains comprehensive information on the diseases related to the assembled human proteins. DNAtraffic is richly annotated in the systemic information on the nomenclature, chemistry and structure of DNA damage and their sources, including environmental agents or commonly used drugs targeting nucleic acids and/or proteins involved in the maintenance of genome stability. One of the DNAtraffic database aim is to create the first platform of the combinatorial complexity of DNA network analysis. Database includes illustrations of pathways, damage, proteins and drugs. Since DNAtraffic is designed to cover a broad spectrum of scientific disciplines, it has to be extensively linked to numerous external data sources. Our database represents the result of the manual annotation work aimed at making the DNAtraffic much more useful for a wide range of systems biology applications.
DNAtraffic—a new database for systems biology of DNA dynamics during the cell life

PubMed Central

Kuchta, Krzysztof; Barszcz, Daniela; Grzesiuk, Elzbieta; Pomorski, Pawel; Krwawicz, Joanna

2012-01-01

DNAtraffic (http://dnatraffic.ibb.waw.pl/) is dedicated to be a unique comprehensive and richly annotated database of genome dynamics during the cell life. It contains extensive data on the nomenclature, ontology, structure and function of proteins related to the DNA integrity mechanisms such as chromatin remodeling, histone modifications, DNA repair and damage response from eight organisms: Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Escherichia coli and Arabidopsis thaliana. DNAtraffic contains comprehensive information on the diseases related to the assembled human proteins. DNAtraffic is richly annotated in the systemic information on the nomenclature, chemistry and structure of DNA damage and their sources, including environmental agents or commonly used drugs targeting nucleic acids and/or proteins involved in the maintenance of genome stability. One of the DNAtraffic database aim is to create the first platform of the combinatorial complexity of DNA network analysis. Database includes illustrations of pathways, damage, proteins and drugs. Since DNAtraffic is designed to cover a broad spectrum of scientific disciplines, it has to be extensively linked to numerous external data sources. Our database represents the result of the manual annotation work aimed at making the DNAtraffic much more useful for a wide range of systems biology applications. PMID:22110027
Construction of a primary DNA fingerprint database for cotton cultivars.

PubMed

Zhang, Y C; Kuang, M; Yang, W H; Xu, H X; Zhou, D Y; Wang, Y Q; Feng, X A; Su, C; Wang, F

2013-06-13

Forty core primers were used to construct a DNA fingerprint database of 132 cotton species based on multiplex fluorescence detection technology. A high first successful ratio of 99.04% was demonstrated with tetraplex polymerase chain reaction. Forty primer pairs amplified a total of 262 genotypes among 132 species, with an average of 6.55 per primer and values of polymorphism information content varying from 0.340 to 0.882. Conflicting DNA homozygous ratios were found in various species. The highest DNA homozygous ratio was found in landrace standard cultivars, which had an 81.46% DNA homozygous ratio. The lowest occurred in a group of 2010 leading cultivars with a homozygous ratio of 63.04%. Genetic diversity of the 132 species was briefly analyzed using unweighted pair-group method with arithmetic means.
Public Opinion Poll Question Databases: An Evaluation

ERIC Educational Resources Information Center

Woods, Stephen

2007-01-01

This paper evaluates five polling resource: iPOLL, Polling the Nations, Gallup Brain, Public Opinion Poll Question Database, and Polls and Surveys. Content was evaluated on disclosure standards from major polling organizations, scope on a model for public opinion polls, and presentation on a flow chart discussing search limitations and usability.

[Privacy and public benefit in using large scale health databases].

PubMed

Yamamoto, Ryuichi

2014-01-01

In Japan, large scale heath databases were constructed in a few years, such as National Claim insurance and health checkup database (NDB) and Japanese Sentinel project. But there are some legal issues for making adequate balance between privacy and public benefit by using such databases. NDB is carried based on the act for elderly person's health care but in this act, nothing is mentioned for using this database for general public benefit. Therefore researchers who use this database are forced to pay much concern about anonymization and information security that may disturb the research work itself. Japanese Sentinel project is a national project to detecting drug adverse reaction using large scale distributed clinical databases of large hospitals. Although patients give the future consent for general such purpose for public good, it is still under discussion using insufficiently anonymized data. Generally speaking, researchers of study for public benefit will not infringe patient's privacy, but vague and complex requirements of legislation about personal data protection may disturb the researches. Medical science does not progress without using clinical information, therefore the adequate legislation that is simple and clear for both researchers and patients is strongly required. In Japan, the specific act for balancing privacy and public benefit is now under discussion. The author recommended the researchers including the field of pharmacology should pay attention to, participate in the discussion of, and make suggestion to such act or regulations.
Nucleotide Sequence Database Comparison for Routine Dermatophyte Identification by Internal Transcribed Spacer 2 Genetic Region DNA Barcoding.

PubMed

Normand, A C; Packeu, A; Cassagne, C; Hendrickx, M; Ranque, S; Piarroux, R

2018-05-01

Conventional dermatophyte identification is based on morphological features. However, recent studies have proposed to use the nucleotide sequences of the rRNA internal transcribed spacer (ITS) region as an identification barcode of all fungi, including dermatophytes. Several nucleotide databases are available to compare sequences and thus identify isolates; however, these databases often contain mislabeled sequences that impair sequence-based identification. We evaluated five of these databases on a clinical isolate panel. We selected 292 clinical dermatophyte strains that were prospectively subjected to an ITS2 nucleotide sequence analysis. Sequences were analyzed against the databases, and the results were compared to clusters obtained via DNA alignment of sequence segments. The DNA tree served as the identification standard throughout the study. According to the ITS2 sequence identification, the majority of strains (255/292) belonged to the genus Trichophyton , mainly T. rubrum complex ( n = 184), T. interdigitale ( n = 40), T. tonsurans ( n = 26), and T. benhamiae ( n = 5). Other genera included Microsporum (e.g., M. canis [ n = 21], M. audouinii [ n = 10], Nannizzia gypsea [ n = 3], and Epidermophyton [ n = 3]). Species-level identification of T. rubrum complex isolates was an issue. Overall, ITS DNA sequencing is a reliable tool to identify dermatophyte species given that a comprehensive and correctly labeled database is consulted. Since many inaccurate identification results exist in the DNA databases used for this study, reference databases must be verified frequently and amended in line with the current revisions of fungal taxonomy. Before describing a new species or adding a new DNA reference to the available databases, its position in the phylogenetic tree must be verified. Copyright © 2018 American Society for Microbiology.
HUNT: launch of a full-length cDNA database from the Helix Research Institute.

PubMed

Yudate, H T; Suwa, M; Irie, R; Matsui, H; Nishikawa, T; Nakamura, Y; Yamaguchi, D; Peng, Z Z; Yamamoto, T; Nagai, K; Hayashi, K; Otsuki, T; Sugiyama, T; Ota, T; Suzuki, Y; Sugano, S; Isogai, T; Masuho, Y

2001-01-01

The Helix Research Institute (HRI) in Japan is releasing 4356 HUman Novel Transcripts and related information in the newly established HUNT database. The institute is a joint research project principally funded by the Japanese Ministry of International Trade and Industry, and the clones were sequenced in the governmental New Energy and Industrial Technology Development Organization (NEDO) Human cDNA Sequencing Project. The HUNT database contains an extensive amount of annotation from advanced analysis and represents an essential bioinformatics contribution towards understanding of the gene function. The HRI human cDNA clones were obtained from full-length enriched cDNA libraries constructed with the oligo-capping method and have resulted in novel full-length cDNA sequences. A large fraction has little similarity to any proteins of known function and to obtain clues about possible function we have developed original analysis procedures. Any putative function deduced here can be validated or refuted by complementary analysis results. The user can also extract information from specific categories like PROSITE patterns, PFAM domains, PSORT localization, transmembrane helices and clones with GENIUS structure assignments. The HUNT database can be accessed at http://www.hri.co.jp/HUNT.
Genetic variants of the DNA repair genes from Exome Aggregation Consortium (EXAC) database: significance in cancer.

PubMed

Das, Raima; Ghosh, Sankar Kumar

2017-04-01

DNA repair pathway is a primary defense system that eliminates wide varieties of DNA damage. Any deficiencies in them are likely to cause the chromosomal instability that leads to cell malfunctioning and tumorigenesis. Genetic polymorphisms in DNA repair genes have demonstrated a significant association with cancer risk. Our study attempts to give a glimpse of the overall scenario of the germline polymorphisms in the DNA repair genes by taking into account of the Exome Aggregation Consortium (ExAC) database as well as the Human Gene Mutation Database (HGMD) for evaluating the disease link, particularly in cancer. It has been found that ExAC DNA repair dataset (which consists of 228 DNA repair genes) comprises 30.4% missense, 12.5% dbSNP reported and 3.2% ClinVar significant variants. 27% of all the missense variants has the deleterious SIFT score of 0.00 and 6% variants carrying the most damaging Polyphen-2 score of 1.00, thus affecting the protein structure and function. However, as per HGMD, only a fraction (1.2%) of ExAC DNA repair variants was found to be cancer-related, indicating remaining variants reported in both the databases to be further analyzed. This, in turn, may provide an increased spectrum of the reported cancer linked variants in the DNA repair genes present in ExAC database. Moreover, further in silico functional assay of the identified vital cancer-associated variants, which is essential to get their actual biological significance, may shed some lights in the field of targeted drug development in near future. Copyright © 2017. Published by Elsevier B.V.
Exploration of the Chemical Space of Public Genomic Databases

EPA Science Inventory

The current project aims to chemically index the content of public genomic databases to make these data accessible in relation to other publicly available, chemically-indexed toxicological information.
75 FR 29155 - Publicly Available Consumer Product Safety Information Database

Federal Register 2010, 2011, 2012, 2013, 2014

2010-05-24

...The Consumer Product Safety Commission (``Commission,'' ``CPSC,'' or ``we'') is issuing a notice of proposed rulemaking that would establish a publicly available consumer product safety information database (``database''). Section 212 of the Consumer Product Safety Improvement Act of 2008 (``CPSIA'') amended the Consumer Product Safety Act (``CPSA'') to require the Commission to establish and maintain a publicly available, searchable database on the safety of consumer products, and other products or substances regulated by the Commission. The proposed rule would interpret various statutory requirements pertaining to the information to be included in the database and also would establish provisions regarding submitting reports of harm; providing notice of reports of harm to manufacturers; publishing reports of harm and manufacturer comments in the database; and dealing with confidential and materially inaccurate information.
Prototype Food and Nutrient Database for Dietary Studies: Branded Food Products Database for Public Health Proof of Concept

USDA-ARS?s Scientific Manuscript database

The Prototype Food and Nutrient Database for Dietary Studies (Prototype FNDDS) Branded Food Products Database for Public Health is a proof of concept database. The database contains a small selection of food products which is being used to exhibit the approach for incorporation of the Branded Food ...
DR-GAS: a database of functional genetic variants and their phosphorylation states in human DNA repair systems.

PubMed

Sehgal, Manika; Singh, Tiratha Raj

2014-04-01

We present DR-GAS(1), a unique, consolidated and comprehensive DNA repair genetic association studies database of human DNA repair system. It presents information on repair genes, assorted mechanisms of DNA repair, linkage disequilibrium, haplotype blocks, nsSNPs, phosphorylation sites, associated diseases, and pathways involved in repair systems. DNA repair is an intricate process which plays an essential role in maintaining the integrity of the genome by eradicating the damaging effect of internal and external changes in the genome. Hence, it is crucial to extensively understand the intact process of DNA repair, genes involved, non-synonymous SNPs which perhaps affect the function, phosphorylated residues and other related genetic parameters. All the corresponding entries for DNA repair genes, such as proteins, OMIM IDs, literature references and pathways are cross-referenced to their respective primary databases. DNA repair genes and their associated parameters are either represented in tabular or in graphical form through images elucidated by computational and statistical analyses. It is believed that the database will assist molecular biologists, biotechnologists, therapeutic developers and other scientific community to encounter biologically meaningful information, and meticulous contribution of genetic level information towards treacherous diseases in human DNA repair systems. DR-GAS is freely available for academic and research purposes at: http://www.bioinfoindia.org/drgas. Copyright © 2014 Elsevier B.V. All rights reserved.
Database of amino acid-nucleotide contacts in contacts in DNA-homeodomain protein

NASA Astrophysics Data System (ADS)

Grokhlina, T. I.; Zrelov, P. V.; Ivanov, V. V.; Polozov, R. V.; Chirgadze, Yu. N.; Sivozhelezov, V. S.

2013-09-01

The analysis of amino acid-nucleotide contacts in interfaces of the protein-DNA complexes, intended to find consistencies in the protein-DNA recognition, is a complex problem that requires an analysis of the physicochemical characteristics of these contacts and the positions of the participating amino acids and nucleotides in the chains of the protein and the DNA, respectively, as well as conservatism of these contacts. Thus, those heterogeneous data should be systematized. For this purpose we have developed a database of amino acid-nucleotide contacts ANTPC (Amino acid Nucleotide Type Position Conservation) following the archetypal example of the proteins in the homeodomain family. We show that it can be used to compare and classify the interfaces of the protein-DNA complexes.
Academic impact of a public electronic health database: bibliometric analysis of studies using the general practice research database.

PubMed

Chen, Yu-Chun; Wu, Jau-Ching; Haschler, Ingo; Majeed, Azeem; Chen, Tzeng-Ji; Wetter, Thomas

2011-01-01

Studies that use electronic health databases as research material are getting popular but the influence of a single electronic health database had not been well investigated yet. The United Kingdom's General Practice Research Database (GPRD) is one of the few electronic health databases publicly available to academic researchers. This study analyzed studies that used GPRD to demonstrate the scientific production and academic impact by a single public health database. A total of 749 studies published between 1995 and 2009 with 'General Practice Research Database' as their topics, defined as GPRD studies, were extracted from Web of Science. By the end of 2009, the GPRD had attracted 1251 authors from 22 countries and been used extensively in 749 studies published in 193 journals across 58 study fields. Each GPRD study was cited 2.7 times by successive studies. Moreover, the total number of GPRD studies increased rapidly, and it is expected to reach 1500 by 2015, twice the number accumulated till the end of 2009. Since 17 of the most prolific authors (1.4% of all authors) contributed nearly half (47.9%) of GPRD studies, success in conducting GPRD studies may accumulate. The GPRD was used mainly in, but not limited to, the three study fields of "Pharmacology and Pharmacy", "General and Internal Medicine", and "Public, Environmental and Occupational Health". The UK and United States were the two most active regions of GPRD studies. One-third of GRPD studies were internationally co-authored. A public electronic health database such as the GPRD will promote scientific production in many ways. Data owners of electronic health databases at a national level should consider how to reduce access barriers and to make data more available for research.
Internet-accessible DNA sequence database for identifying fusaria from human and animal infections.

PubMed

O'Donnell, Kerry; Sutton, Deanna A; Rinaldi, Michael G; Sarver, Brice A J; Balajee, S Arunmozhi; Schroers, Hans-Josef; Summerbell, Richard C; Robert, Vincent A R G; Crous, Pedro W; Zhang, Ning; Aoki, Takayuki; Jung, Kyongyong; Park, Jongsun; Lee, Yong-Hwan; Kang, Seogchan; Park, Bongsoo; Geiser, David M

2010-10-01

Because less than one-third of clinically relevant fusaria can be accurately identified to species level using phenotypic data (i.e., morphological species recognition), we constructed a three-locus DNA sequence database to facilitate molecular identification of the 69 Fusarium species associated with human or animal mycoses encountered in clinical microbiology laboratories. The database comprises partial sequences from three nuclear genes: translation elongation factor 1α (EF-1α), the largest subunit of RNA polymerase (RPB1), and the second largest subunit of RNA polymerase (RPB2). These three gene fragments can be amplified by PCR and sequenced using primers that are conserved across the phylogenetic breadth of Fusarium. Phylogenetic analyses of the combined data set reveal that, with the exception of two monotypic lineages, all clinically relevant fusaria are nested in one of eight variously sized and strongly supported species complexes. The monophyletic lineages have been named informally to facilitate communication of an isolate's clade membership and genetic diversity. To identify isolates to the species included within the database, partial DNA sequence data from one or more of the three genes can be used as a BLAST query against the database which is Web accessible at FUSARIUM-ID (http://isolate.fusariumdb.org) and the Centraalbureau voor Schimmelcultures (CBS-KNAW) Fungal Biodiversity Center (http://www.cbs.knaw.nl/fusarium). Alternatively, isolates can be identified via phylogenetic analysis by adding sequences of unknowns to the DNA sequence alignment, which can be downloaded from the two aforementioned websites. The utility of this database should increase significantly as members of the clinical microbiology community deposit in internationally accessible culture collections (e.g., CBS-KNAW or the Fusarium Research Center) cultures of novel mycosis-associated fusaria, along with associated, corrected sequence chromatograms and data, so that the
Does an English appeal court ruling increase the risks of miscarriages of justice when complex DNA profiles are searched against the national DNA database?

PubMed

Gill, P; Bleka, Ø; Egeland, T

2014-11-01

Likelihood ratio (LR) methods to interpret multi-contributor, low template, complex DNA mixtures are becoming standard practice. The next major development will be to introduce search engines based on the new methods to interrogate very large national DNA databases, such as those held by China, the USA and the UK. Here we describe a rapid method that was used to assign a LR to each individual member of database of 5 million genotypes which can be ranked in order. Previous authors have only considered database trawls in the context of binary match or non-match criteria. However, the concept of match/non-match no longer applies within the new paradigm introduced, since the distribution of resultant LRs is continuous for practical purposes. An English appeal court decision allows scientists to routinely report complex DNA profiles using nothing more than their subjective personal 'experience of casework' and 'observations' in order to apply an expression of the rarity of an evidential sample. This ruling must be considered in context of a recent high profile English case, where an individual was extracted from a database and wrongly accused of a serious crime. In this case the DNA evidence was used to negate the overwhelming exculpatory (non-DNA) evidence. Demonstrable confirmation bias, also known as the 'CSI-effect, seriously affected the investigation. The case demonstrated that in practice, databases could be used to select and prosecute an individual, simply because he ranked high in the list of possible matches. We have identified this phenomenon as a cognitive error which we term: 'the naïve investigator effect'. We take the opportunity to test the performance of database extraction strategies either by using a simple matching allele count (MAC) method or LR. The example heard by the appeal court is used as the exemplar case. It is demonstrated that the LR search-method offers substantial benefits compared to searches based on simple matching allele count (MAC
Mitochondrial DNA control region sequences from Nairobi (Kenya): inferring phylogenetic parameters for the establishment of a forensic database.

PubMed

Brandstätter, Anita; Peterson, Christine T; Irwin, Jodi A; Mpoke, Solomon; Koech, Davy K; Parson, Walther; Parsons, Thomas J

2004-10-01

Large forensic mtDNA databases which adhere to strict guidelines for generation and maintenance, are not available for many populations outside of the United States and western Europe. We have established a high quality mtDNA control region sequence database for urban Nairobi as both a reference database for forensic investigations, and as a tool to examine the genetic variation of Kenyan sequences in the context of known African variation. The Nairobi sequences exhibited high variation and a low random match probability, indicating utility for forensic testing. Haplogroup identification and frequencies were compared with those reported from other published studies on African, or African-origin populations from Mozambique, Sierra Leone, and the United States, and suggest significant differences in the mtDNA compositions of the various populations. The quality of the sequence data in our study was investigated and supported using phylogenetic measures. Our data demonstrate the diversity and distinctiveness of African populations, and underline the importance of establishing additional forensic mtDNA databases of indigenous African populations.
76 FR 1137 - Publicly Available Consumer Product Safety Information Database: Notice of Public Web Conferences

Federal Register 2010, 2011, 2012, 2013, 2014

2011-01-07

...: Notice of Public Web Conferences AGENCY: Consumer Product Safety Commission. ACTION: Notice. SUMMARY: The Consumer Product Safety Commission (``Commission,'' ``CPSC,'' or ``we'') is announcing two Web conferences... database (``Database''). The Web conferences will be webcast live from the Commission's headquarters in...
Gene and protein nomenclature in public databases

PubMed Central

Fundel, Katrin; Zimmer, Ralf

2006-01-01

Background Frequently, several alternative names are in use for biological objects such as genes and proteins. Applications like manual literature search, automated text-mining, named entity identification, gene/protein annotation, and linking of knowledge from different information sources require the knowledge of all used names referring to a given gene or protein. Various organism-specific or general public databases aim at organizing knowledge about genes and proteins. These databases can be used for deriving gene and protein name dictionaries. So far, little is known about the differences between databases in terms of size, ambiguities and overlap. Results We compiled five gene and protein name dictionaries for each of the five model organisms (yeast, fly, mouse, rat, and human) from different organism-specific and general public databases. We analyzed the degree of ambiguity of gene and protein names within and between dictionaries, to a lexicon of common English words and domain-related non-gene terms, and we compared different data sources in terms of size of extracted dictionaries and overlap of synonyms between those. The study shows that the number of genes/proteins and synonyms covered in individual databases varies significantly for a given organism, and that the degree of ambiguity of synonyms varies significantly between different organisms. Furthermore, it shows that, despite considerable efforts of co-curation, the overlap of synonyms in different data sources is rather moderate and that the degree of ambiguity of gene names with common English words and domain-related non-gene terms varies depending on the considered organism. Conclusion In conclusion, these results indicate that the combination of data contained in different databases allows the generation of gene and protein name dictionaries that contain significantly more used names than dictionaries obtained from individual data sources. Furthermore, curation of combined dictionaries
FARME DB: a functional antibiotic resistance element database

PubMed Central

Wallace, James C.; Port, Jesse A.; Smith, Marissa N.; Faustman, Elaine M.

2017-01-01

Antibiotic resistance (AR) is a major global public health threat but few resources exist that catalog AR genes outside of a clinical context. Current AR sequence databases are assembled almost exclusively from genomic sequences derived from clinical bacterial isolates and thus do not include many microbial sequences derived from environmental samples that confer resistance in functional metagenomic studies. These environmental metagenomic sequences often show little or no similarity to AR sequences from clinical isolates using standard classification criteria. In addition, existing AR databases provide no information about flanking sequences containing regulatory or mobile genetic elements. To help address this issue, we created an annotated database of DNA and protein sequences derived exclusively from environmental metagenomic sequences showing AR in laboratory experiments. Our Functional Antibiotic Resistant Metagenomic Element (FARME) database is a compilation of publically available DNA sequences and predicted protein sequences conferring AR as well as regulatory elements, mobile genetic elements and predicted proteins flanking antibiotic resistant genes. FARME is the first database to focus on functional metagenomic AR gene elements and provides a resource to better understand AR in the 99% of bacteria which cannot be cultured and the relationship between environmental AR sequences and antibiotic resistant genes derived from cultured isolates. Database URL: http://staff.washington.edu/jwallace/farme PMID:28077567
Approaching the taxonomic affiliation of unidentified sequences in public databases--an example from the mycorrhizal fungi.

PubMed

Nilsson, R Henrik; Kristiansson, Erik; Ryberg, Martin; Larsson, Karl-Henrik

2005-07-18

During the last few years, DNA sequence analysis has become one of the primary means of taxonomic identification of species, particularly so for species that are minute or otherwise lack distinct, readily obtainable morphological characters. Although the number of sequences available for comparison in public databases such as GenBank increases exponentially, only a minuscule fraction of all organisms have been sequenced, leaving taxon sampling a momentous problem for sequence-based taxonomic identification. When querying GenBank with a set of unidentified sequences, a considerable proportion typically lack fully identified matches, forming an ever-mounting pile of sequences that the researcher will have to monitor manually in the hope that new, clarifying sequences have been submitted by other researchers. To alleviate these concerns, a project to automatically monitor select unidentified sequences in GenBank for taxonomic progress through repeated local BLAST searches was initiated. Mycorrhizal fungi--a field where species identification often is prohibitively complex--and the much used ITS locus were chosen as test bed. A Perl script package called emerencia is presented. On a regular basis, it downloads select sequences from GenBank, separates the identified sequences from those insufficiently identified, and performs BLAST searches between these two datasets, storing all results in an SQL database. On the accompanying web-service http://emerencia.math.chalmers.se, users can monitor the taxonomic progress of insufficiently identified sequences over time, either through active searches or by signing up for e-mail notification upon disclosure of better matches. Other search categories, such as listing all insufficiently identified sequences (and their present best fully identified matches) publication-wise, are also available. The ever-increasing use of DNA sequences for identification purposes largely falls back on the assumption that public sequence databases
A Public-Use, Full-Screen Interface for SPIRES Databases.

ERIC Educational Resources Information Center

Kriz, Harry M.

This paper describes the techniques for implementing a full-screen, custom SPIRES interface for a public-use library database. The database-independent protocol that controls the system is described in detail. Source code for an entire working application using this interface is included. The protocol, with less than 170 lines of procedural code,…
The barley EST DNA Replication and Repair Database (bEST-DRRD) as a tool for the identification of the genes involved in DNA replication and repair.

PubMed

Gruszka, Damian; Marzec, Marek; Szarejko, Iwona

2012-06-14

The high level of conservation of genes that regulate DNA replication and repair indicates that they may serve as a source of information on the origin and evolution of the species and makes them a reliable system for the identification of cross-species homologs. Studies that had been conducted to date shed light on the processes of DNA replication and repair in bacteria, yeast and mammals. However, there is still much to be learned about the process of DNA damage repair in plants. These studies, which were conducted mainly using bioinformatics tools, enabled the list of genes that participate in various pathways of DNA repair in Arabidopsis thaliana (L.) Heynh to be outlined; however, information regarding these mechanisms in crop plants is still very limited. A similar, functional approach is particularly difficult for a species whose complete genomic sequences are still unavailable. One of the solutions is to apply ESTs (Expressed Sequence Tags) as the basis for gene identification. For the construction of the barley EST DNA Replication and Repair Database (bEST-DRRD), presented here, the Arabidopsis nucleotide and protein sequences involved in DNA replication and repair were used to browse for and retrieve the deposited sequences, derived from four barley (Hordeum vulgare L.) sequence databases, including the "Barley Genome version 0.05" database (encompassing ca. 90% of barley coding sequences) and from two databases covering the complete genomes of two monocot models: Oryza sativa L. and Brachypodium distachyon L. in order to identify homologous genes. Sequences of the categorised Arabidopsis queries are used for browsing the repositories, which are located on the ViroBLAST platform. The bEST-DRRD is currently used in our project during the identification and validation of the barley genes involved in DNA repair. The presented database provides information about the Arabidopsis genes involved in DNA replication and repair, their expression patterns and models
Prisoners' expectations of the national forensic DNA database: surveillance and reconfiguration of individual rights.

PubMed

Machado, Helena; Santos, Filipe; Silva, Susana

2011-07-15

In this paper we aim to discuss how Portuguese prisoners know and what they feel about surveillance mechanisms related to the inclusion and deletion of the DNA profiles of convicted criminals in the national forensic database. Through a set of interviews with individuals currently imprisoned we focus on the ways this group perceives forensic DNA technologies. While the institutional and political discourses maintain that the restricted use and application of DNA profiles within the national forensic database protects individuals' rights, the prisoners claim that police misuse of such technologies potentially makes it difficult to escape from surveillance and acts as a mean of reinforcing the stigma of delinquency. The prisoners also argue that additional intensive and extensive use of surveillance devices might be more protective of their own individual rights and might possibly increase potential for exoneration. Crown Copyright © 2011. Published by Elsevier Ireland Ltd. All rights reserved.

Academic Impact of a Public Electronic Health Database: Bibliometric Analysis of Studies Using the General Practice Research Database

PubMed Central

Chen, Yu-Chun; Wu, Jau-Ching; Haschler, Ingo; Majeed, Azeem; Chen, Tzeng-Ji; Wetter, Thomas

2011-01-01

Background Studies that use electronic health databases as research material are getting popular but the influence of a single electronic health database had not been well investigated yet. The United Kingdom's General Practice Research Database (GPRD) is one of the few electronic health databases publicly available to academic researchers. This study analyzed studies that used GPRD to demonstrate the scientific production and academic impact by a single public health database. Methodology and Findings A total of 749 studies published between 1995 and 2009 with ‘General Practice Research Database’ as their topics, defined as GPRD studies, were extracted from Web of Science. By the end of 2009, the GPRD had attracted 1251 authors from 22 countries and been used extensively in 749 studies published in 193 journals across 58 study fields. Each GPRD study was cited 2.7 times by successive studies. Moreover, the total number of GPRD studies increased rapidly, and it is expected to reach 1500 by 2015, twice the number accumulated till the end of 2009. Since 17 of the most prolific authors (1.4% of all authors) contributed nearly half (47.9%) of GPRD studies, success in conducting GPRD studies may accumulate. The GPRD was used mainly in, but not limited to, the three study fields of “Pharmacology and Pharmacy”, “General and Internal Medicine”, and “Public, Environmental and Occupational Health”. The UK and United States were the two most active regions of GPRD studies. One-third of GRPD studies were internationally co-authored. Conclusions A public electronic health database such as the GPRD will promote scientific production in many ways. Data owners of electronic health databases at a national level should consider how to reduce access barriers and to make data more available for research. PMID:21731733
Feline Non-repetitive Mitochondrial DNA Control Region Database for Forensic Evidence

PubMed Central

Grahn, R. A.; Kurushima, J. D.; Billings, N. C.; Grahn, J.C.; Halverson, J. L.; Hammer, E.; Ho, C.K.; Kun, T. J.; Levy, J.K.; Lipinski, M. J.; Mwenda, J.M.; Ozpinar, H.; Schuster, R.K; Shoorijeh, S.J.; Tarditi, C. R.; Waly, N.E.; Wictum, E. J.; Lyons, L. A.

2010-01-01

The domestic cat is the one of the most popular pets throughout the world. A by-product of owning, interacting with, or being in a household with a cat is the transfer of shed fur to clothing or personal objects. As trace evidence, transferred cat fur is a relatively untapped resource for forensic scientists. Both phenotypic and genotypic characteristics can be obtained from cat fur, but databases for neither aspect exist. Because cats incessantly groom, cat fur may have nucleated cells, not only in the hair bulb, but also as epithelial cells on the hair shaft deposited during the grooming process, thereby generally providing material for DNA profiling. To effectively exploit cat hair as a resource, representative databases must be established. This study evaluates 402 bp of the mtDNA control region (CR) from 1,394 cats, including cats from 25 distinct worldwide populations and 26 breeds. Eighty-three percent of the cats are represented by 12 major mitotypes. An additional 8.0% are clearly derived from the major mitotypes. Unique sequences were found in 7.5% of the cats. The overall genetic diversity for this data set was 0.8813 ± 0.0046 with a random match probability of 11.8%. This region of the cat mtDNA has discriminatory power suitable for forensic application worldwide. PMID:20457082
Development of a 20-locus fluorescent multiplex system as a valuable tool for national DNA database.

PubMed

Jiang, Xianhua; Guo, Fei; Jia, Fei; Jin, Ping; Sun, Zhu

2013-02-01

The multiplex system allows the detection of 19 autosomal short tandem repeat (STR) loci [including all Combined DNA Index System (CODIS) STR loci as well as D2S1338, D6S1043, D12S391, D19S433, Penta D and Penta E] plus the sex-determining locus Amelogenin in a single reaction, comprising all STR loci in various commercial kits used in the China national DNA database (NDNAD). Primers are designed so that the amplicons are distributed ranging from 90 base pairs (bp) to 450 bp within a five-dye fluorescent design with the fifth dye reserved for the internal size standard. With 30 cycles, 125 pg to 2 ng DNA template showed optimal profiling result, while robust profiles could also be achieved by adjusting the cycle numbers for the DNA template beyond that optimal DNA input range. Mixture studies showed that 83% and 87% of minor alleles were detected at 9:1 and 1:9 ratios, respectively. When 4 ng of degraded DNA was digested by 2-min DNase and 1 ng undegraded DNA was added to 400 μM haematin, the complete profiles were still observed. Polymerase chain reaction (PCR)-based procedures were examined and optimized including the concentrations of primer set, magnesium and the Taq polymerase as well as volume, cycle number and annealing temperature. In addition, the system has been validated by 3000 bloodstain samples and 35 common case samples in line with the Chinese National Standards and Scientific Working Group on DNA Analysis Methods (SWGDAM) guidelines. The total probability of identity (TPI) can reach to 8×10(-24), where DNA database can be improved at the level of 10 million DNA profiles or more because the number of expected match is far from one person (4×10(-10)) and can be negligible. Further, our system also demonstrates its good performance in case samples and it will be an ideal tool for forensic DNA typing and databasing with potential application. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
MethSMRT: an integrative database for DNA N6-methyladenine and N4-methylcytosine generated by single-molecular real-time sequencing.

PubMed

Ye, Pohao; Luan, Yizhao; Chen, Kaining; Liu, Yizhi; Xiao, Chuanle; Xie, Zhi

2017-01-04

DNA methylation is an important type of epigenetic modifications, where 5- methylcytosine (5mC), 6-methyadenine (6mA) and 4-methylcytosine (4mC) are the most common types. Previous efforts have been largely focused on 5mC, providing invaluable insights into epigenetic regulation through DNA methylation. Recently developed single-molecule real-time (SMRT) sequencing technology provides a unique opportunity to detect the less studied DNA 6mA and 4mC modifications at single-nucleotide resolution. With a rapidly increased amount of SMRT sequencing data generated, there is an emerging demand to systematically explore DNA 6mA and 4mC modifications from these data sets. MethSMRT is the first resource hosting DNA 6mA and 4mC methylomes. All the data sets were processed using the same analysis pipeline with the same quality control. The current version of the database provides a platform to store, browse, search and download epigenome-wide methylation profiles of 156 species, including seven eukaryotes such as Arabidopsis, C. elegans, Drosophila, mouse and yeast, as well as 149 prokaryotes. It also offers a genome browser to visualize the methylation sites and related information such as single nucleotide polymorphisms (SNP) and genomic annotation. Furthermore, the database provides a quick summary of statistics of methylome of 6mA and 4mC and predicted methylation motifs for each species. MethSMRT is publicly available at http://sysbio.sysu.edu.cn/methsmrt/ without use restriction. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
High-quality mtDNA control region sequences from 680 individuals sampled across the Netherlands to establish a national forensic mtDNA reference database.

PubMed

Chaitanya, Lakshmi; van Oven, Mannis; Brauer, Silke; Zimmermann, Bettina; Huber, Gabriela; Xavier, Catarina; Parson, Walther; de Knijff, Peter; Kayser, Manfred

2016-03-01

The use of mitochondrial DNA (mtDNA) for maternal lineage identification often marks the last resort when investigating forensic and missing-person cases involving highly degraded biological materials. As with all comparative DNA testing, a match between evidence and reference sample requires a statistical interpretation, for which high-quality mtDNA population frequency data are crucial. Here, we determined, under high quality standards, the complete mtDNA control-region sequences of 680 individuals from across the Netherlands sampled at 54 sites, covering the entire country with 10 geographic sub-regions. The complete mtDNA control region (nucleotide positions 16,024-16,569 and 1-576) was amplified with two PCR primers and sequenced with ten different sequencing primers using the EMPOP protocol. Haplotype diversity of the entire sample set was very high at 99.63% and, accordingly, the random-match probability was 0.37%. No population substructure within the Netherlands was detected with our dataset. Phylogenetic analyses were performed to determine mtDNA haplogroups. Inclusion of these high-quality data in the EMPOP database (accession number: EMP00666) will improve its overall data content and geographic coverage in the interest of all EMPOP users worldwide. Moreover, this dataset will serve as (the start of) a national reference database for mtDNA applications in forensic and missing person casework in the Netherlands. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Retrovirus Integration Database (RID): a public database for retroviral insertion sites into host genomes.

PubMed

Shao, Wei; Shan, Jigui; Kearney, Mary F; Wu, Xiaolin; Maldarelli, Frank; Mellors, John W; Luke, Brian; Coffin, John M; Hughes, Stephen H

2016-07-04

The NCI Retrovirus Integration Database is a MySql-based relational database created for storing and retrieving comprehensive information about retroviral integration sites, primarily, but not exclusively, HIV-1. The database is accessible to the public for submission or extraction of data originating from experiments aimed at collecting information related to retroviral integration sites including: the site of integration into the host genome, the virus family and subtype, the origin of the sample, gene exons/introns associated with integration, and proviral orientation. Information about the references from which the data were collected is also stored in the database. Tools are built into the website that can be used to map the integration sites to UCSC genome browser, to plot the integration site patterns on a chromosome, and to display provirus LTRs in their inserted genome sequence. The website is robust, user friendly, and allows users to query the database and analyze the data dynamically. https://rid.ncifcrf.gov ; or http://home.ncifcrf.gov/hivdrp/resources.htm .
Low template STR typing: effect of replicate number and consensus method on genotyping reliability and DNA database search results.

PubMed

Benschop, Corina C G; van der Beek, Cornelis P; Meiland, Hugo C; van Gorp, Ankie G M; Westen, Antoinette A; Sijen, Titia

2011-08-01

To analyze DNA samples with very low DNA concentrations, various methods have been developed that sensitize short tandem repeat (STR) typing. Sensitized DNA typing is accompanied by stochastic amplification effects, such as allele drop-outs and drop-ins. Therefore low template (LT) DNA profiles are interpreted with care. One can either try to infer the genotype by a consensus method that uses alleles confirmed in replicate analyses, or one can use a statistical model to evaluate the strength of the evidence in a direct comparison with a known DNA profile. In this study we focused on the first strategy and we show that the procedure by which the consensus profile is assembled will affect genotyping reliability. In order to gain insight in the roles of replicate number and requested level of reproducibility, we generated six independent amplifications of samples of known donors. The LT methods included both increased cycling and enhanced capillary electrophoresis (CE) injection [1]. Consensus profiles were assembled from two to six of the replications using four methods: composite (include all alleles), n-1 (include alleles detected in all but one replicate), n/2 (include alleles detected in at least half of the replicates) and 2× (include alleles detected twice). We compared the consensus DNA profiles with the DNA profile of the known donor, studied the stochastic amplification effects and examined the effect of the consensus procedure on DNA database search results. From all these analyses we conclude that the accuracy of LT DNA typing and the efficiency of database searching improve when the number of replicates is increased and the consensus method is n/2. The most functional number of replicates within this n/2 method is four (although a replicate number of three suffices for samples showing >25% of the alleles in standard STR typing). This approach was also the optimal strategy for the analysis of 2-person mixtures, although modified search strategies may be
Molecular scaffold analysis of natural products databases in the public domain.

PubMed

Yongye, Austin B; Waddell, Jacob; Medina-Franco, José L

2012-11-01

Natural products represent important sources of bioactive compounds in drug discovery efforts. In this work, we compiled five natural products databases available in the public domain and performed a comprehensive chemoinformatic analysis focused on the content and diversity of the scaffolds with an overview of the diversity based on molecular fingerprints. The natural products databases were compared with each other and with a set of molecules obtained from in-house combinatorial libraries, and with a general screening commercial library. It was found that publicly available natural products databases have different scaffold diversity. In contrast to the common concept that larger libraries have the largest scaffold diversity, the largest natural products collection analyzed in this work was not the most diverse. The general screening library showed, overall, the highest scaffold diversity. However, considering the most frequent scaffolds, the general reference library was the least diverse. In general, natural products databases in the public domain showed low molecule overlap. In addition to benzene and acyclic compounds, flavones, coumarins, and flavanones were identified as the most frequent molecular scaffolds across the different natural products collections. The results of this work have direct implications in the computational and experimental screening of natural product databases for drug discovery. © 2012 John Wiley & Sons A/S.
TabSQL: a MySQL tool to facilitate mapping user data to public databases.

PubMed

Xia, Xiao-Qin; McClelland, Michael; Wang, Yipeng

2010-06-23

With advances in high-throughput genomics and proteomics, it is challenging for biologists to deal with large data files and to map their data to annotations in public databases. We developed TabSQL, a MySQL-based application tool, for viewing, filtering and querying data files with large numbers of rows. TabSQL provides functions for downloading and installing table files from public databases including the Gene Ontology database (GO), the Ensembl databases, and genome databases from the UCSC genome bioinformatics site. Any other database that provides tab-delimited flat files can also be imported. The downloaded gene annotation tables can be queried together with users' data in TabSQL using either a graphic interface or command line. TabSQL allows queries across the user's data and public databases without programming. It is a convenient tool for biologists to annotate and enrich their data.
TabSQL: a MySQL tool to facilitate mapping user data to public databases

PubMed Central

2010-01-01

Background With advances in high-throughput genomics and proteomics, it is challenging for biologists to deal with large data files and to map their data to annotations in public databases. Results We developed TabSQL, a MySQL-based application tool, for viewing, filtering and querying data files with large numbers of rows. TabSQL provides functions for downloading and installing table files from public databases including the Gene Ontology database (GO), the Ensembl databases, and genome databases from the UCSC genome bioinformatics site. Any other database that provides tab-delimited flat files can also be imported. The downloaded gene annotation tables can be queried together with users' data in TabSQL using either a graphic interface or command line. Conclusions TabSQL allows queries across the user's data and public databases without programming. It is a convenient tool for biologists to annotate and enrich their data. PMID:20573251
Estimating haplotype frequencies by combining data from large DNA pools with database information.

PubMed

Gasbarra, Dario; Kulathinal, Sangita; Pirinen, Matti; Sillanpää, Mikko J

2011-01-01

We assume that allele frequency data have been extracted from several large DNA pools, each containing genetic material of up to hundreds of sampled individuals. Our goal is to estimate the haplotype frequencies among the sampled individuals by combining the pooled allele frequency data with prior knowledge about the set of possible haplotypes. Such prior information can be obtained, for example, from a database such as HapMap. We present a Bayesian haplotyping method for pooled DNA based on a continuous approximation of the multinomial distribution. The proposed method is applicable when the sizes of the DNA pools and/or the number of considered loci exceed the limits of several earlier methods. In the example analyses, the proposed model clearly outperforms a deterministic greedy algorithm on real data from the HapMap database. With a small number of loci, the performance of the proposed method is similar to that of an EM-algorithm, which uses a multinormal approximation for the pooled allele frequencies, but which does not utilize prior information about the haplotypes. The method has been implemented using Matlab and the code is available upon request from the authors.
Digital Equipment Corporation's CRDOM Software and Database Publications.

ERIC Educational Resources Information Center

Adams, Michael Q.

1986-01-01

Acquaints information professionals with Digital Equipment Corporation's compact optical disk read-only-memory (CDROM) search and retrieval software and growing library of CDROM database publications (COMPENDEX, Chemical Abstracts Services). Highlights include MicroBASIS, boolean operators, range operators, word and phrase searching, proximity…
The DNA database search controversy revisited: bridging the Bayesian-frequentist gap.

PubMed

Storvik, Geir; Egeland, Thore

2007-09-01

Two different quantities have been suggested for quantification of evidence in cases where a suspect is found by a search through a database of DNA profiles. The likelihood ratio, typically motivated from a Bayesian setting, is preferred by most experts in the field. The so-called np rule has been suggested through frequentist arguments and has been suggested by the American National Research Council and Stockmarr (1999, Biometrics55, 671-677). The two quantities differ substantially and have given rise to the DNA database search controversy. Although several authors have criticized the different approaches, a full explanation of why these differences appear is still lacking. In this article we show that a P-value in a frequentist hypothesis setting is approximately equal to the result of the np rule. We argue, however, that a more reasonable procedure in this case is to use conditional testing, in which case a P-value directly related to posterior probabilities and the likelihood ratio is obtained. This way of viewing the problem bridges the gap between the Bayesian and frequentist approaches. At the same time it indicates that the np rule should not be used to quantify evidence.
Reporting discrepancies between the ClinicalTrials.gov results database and peer-reviewed publications.

PubMed

Hartung, Daniel M; Zarin, Deborah A; Guise, Jeanne-Marie; McDonagh, Marian; Paynter, Robin; Helfand, Mark

2014-04-01

ClinicalTrials.gov requires reporting of result summaries for many drug and device trials. To evaluate the consistency of reporting of trials that are registered in the ClinicalTrials.gov results database and published in the literature. ClinicalTrials.gov results database and matched publications identified through ClinicalTrials.gov and a manual search of 2 electronic databases. 10% random sample of phase 3 or 4 trials with results in the ClinicalTrials.gov results database, completed before 1 January 2009, with 2 or more groups. One reviewer extracted data about trial design and results from the results database and matching publications. A subsample was independently verified. Of 110 trials with results, most were industry-sponsored, parallel-design drug studies. The most common inconsistency was the number of secondary outcome measures reported (80%). Sixteen trials (15%) reported the primary outcome description inconsistently, and 22 (20%) reported the primary outcome value inconsistently. Thirty-eight trials inconsistently reported the number of individuals with a serious adverse event (SAE); of these, 33 (87%) reported more SAEs in ClinicalTrials.gov. Among the 84 trials that reported SAEs in ClinicalTrials.gov, 11 publications did not mention SAEs, 5 reported them as zero or not occurring, and 21 reported a different number of SAEs. Among 29 trials that reported deaths in ClinicalTrials.gov, 28% differed from the matched publication. Small sample that included earliest results posted to the database. Reporting discrepancies between the ClinicalTrials.gov results database and matching publications are common. Which source contains the more accurate account of results is unclear, although ClinicalTrials.gov may provide a more comprehensive description of adverse events than the publication. Agency for Healthcare Research and Quality.
Forensic DNA Profiling and Database

PubMed Central

Panneerchelvam, S.; Norazmi, M.N.

2003-01-01

The incredible power of DNA technology as an identification tool had brought a tremendous change in crimnal justice . DNA data base is an information resource for the forensic DNA typing community with details on commonly used short tandem repeat (STR) DNA markers. This article discusses the essential steps in compilation of COmbined DNA Index System (CODIS) on validated polymerase chain amplified STRs and their use in crime detection. PMID:23386793
76 FR 53912 - FDA's Public Database of Products With Orphan-Drug Designation: Replacing Non-Informative Code...

Federal Register 2010, 2011, 2012, 2013, 2014

2011-08-30

...] FDA's Public Database of Products With Orphan-Drug Designation: Replacing Non-Informative Code Names... replaced non- informative code names with descriptive identifiers on its public database of products that... on our public database with non-informative code names. After careful consideration of this matter...
MethBank: a database integrating next-generation sequencing single-base-resolution DNA methylation programming data.

PubMed

Zou, Dong; Sun, Shixiang; Li, Rujiao; Liu, Jiang; Zhang, Jing; Zhang, Zhang

2015-01-01

DNA methylation plays crucial roles during embryonic development. Here we present MethBank (http://dnamethylome.org), a DNA methylome programming database that integrates the genome-wide single-base nucleotide methylomes of gametes and early embryos in different model organisms. Unlike extant relevant databases, MethBank incorporates the whole-genome single-base-resolution methylomes of gametes and early embryos at multiple different developmental stages in zebrafish and mouse. MethBank allows users to retrieve methylation levels, differentially methylated regions, CpG islands, gene expression profiles and genetic polymorphisms for a specific gene or genomic region. Moreover, it offers a methylome browser that is capable of visualizing high-resolution DNA methylation profiles as well as other related data in an interactive manner and thus is of great helpfulness for users to investigate methylation patterns and changes of gametes and early embryos at different developmental stages. Ongoing efforts are focused on incorporation of methylomes and related data from other organisms. Together, MethBank features integration and visualization of high-resolution DNA methylation data as well as other related data, enabling identification of potential DNA methylation signatures in different developmental stages and accordingly providing an important resource for the epigenetic and developmental studies. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Familial searching: a specialist forensic DNA profiling service utilising the National DNA Database to identify unknown offenders via their relatives--the UK experience.

PubMed

Maguire, C N; McCallum, L A; Storey, C; Whitaker, J P

2014-01-01

The National DNA Database (NDNAD) of England and Wales was established on April 10th 1995. The NDNAD is governed by a variety of legislative instruments that mean that DNA samples can be taken if an individual is arrested and detained in a police station. The biological samples and the DNA profiles derived from them can be used for purposes related to the prevention and detection of crime, the investigation of an offence and for the conduct of a prosecution. Following the South East Asian Tsunami of December 2004, the legislation was amended to allow the use of the NDNAD to assist in the identification of a deceased person or of a body part where death has occurred from natural causes or from a natural disaster. The UK NDNAD now contains the DNA profiles of approximately 6 million individuals representing 9.6% of the UK population. As the science of DNA profiling advanced, the National DNA Database provided a potential resource for increased intelligence beyond the direct matching for which it was originally created. The familial searching service offered to the police by several UK forensic science providers exploits the size and geographic coverage of the NDNAD and the fact that close relatives of an offender may share a significant proportion of that offender's DNA profile and will often reside in close geographic proximity to him or her. Between 2002 and 2011 Forensic Science Service Ltd. (FSS) provided familial search services to support 188 police investigations, 70 of which are still active cases. This technique, which may be used in serious crime cases or in 'cold case' reviews when there are few or no investigative leads, has led to the identification of 41 perpetrators or suspects. In this paper we discuss the processes, utility, and governance of the familial search service in which the NDNAD is searched for close genetic relatives of an offender who has left DNA evidence at a crime scene, but whose DNA profile is not represented within the NDNAD. We
Collecting, archiving and processing DNA from wildlife samples using FTA® databasing paper

PubMed Central

Smith, LM; Burgoyne, LA

2004-01-01

Background Methods involving the analysis of nucleic acids have become widespread in the fields of traditional biology and ecology, however the storage and transport of samples collected in the field to the laboratory in such a manner to allow purification of intact nucleic acids can prove problematical. Results FTA® databasing paper is widely used in human forensic analysis for the storage of biological samples and for purification of nucleic acids. The possible uses of FTA® databasing paper in the purification of DNA from samples of wildlife origin were examined, with particular reference to problems expected due to the nature of samples of wildlife origin. The processing of blood and tissue samples, the possibility of excess DNA in blood samples due to nucleated erythrocytes, and the analysis of degraded samples were all examined, as was the question of long term storage of blood samples on FTA® paper. Examples of the end use of the purified DNA are given for all protocols and the rationale behind the processing procedures is also explained to allow the end user to adjust the protocols as required. Conclusions FTA® paper is eminently suitable for collection of, and purification of nucleic acids from, biological samples from a wide range of wildlife species. This technology makes the collection and storage of such samples much simpler. PMID:15072582
A publication database for optical long baseline interferometry

NASA Astrophysics Data System (ADS)

Malbet, Fabien; Mella, Guillaume; Lawson, Peter; Taillifet, Esther; Lafrasse, Sylvain

2010-07-01

Optical long baseline interferometry is a technique that has generated almost 850 refereed papers to date. The targets span a large variety of objects from planetary systems to extragalactic studies and all branches of stellar physics. We have created a database hosted by the JMMC and connected to the Optical Long Baseline Interferometry Newsletter (OLBIN) web site using MySQL and a collection of XML or PHP scripts in order to store and classify these publications. Each entry is defined by its ADS bibcode, includes basic ADS informations and metadata. The metadata are specified by tags sorted in categories: interferometric facilities, instrumentation, wavelength of operation, spectral resolution, type of measurement, target type, and paper category, for example. The whole OLBIN publication list has been processed and we present how the database is organized and can be accessed. We use this tool to generate statistical plots of interest for the community in optical long baseline interferometry.

Distributed structure-searchable toxicity (DSSTox) public database network: a proposal.

PubMed

Richard, Ann M; Williams, ClarLynda R

2002-01-29

The ability to assess the potential genotoxicity, carcinogenicity, or other toxicity of pharmaceutical or industrial chemicals based on chemical structure information is a highly coveted and shared goal of varied academic, commercial, and government regulatory groups. These diverse interests often employ different approaches and have different criteria and use for toxicity assessments, but they share a need for unrestricted access to existing public toxicity data linked with chemical structure information. Currently, there exists no central repository of toxicity information, commercial or public, that adequately meets the data requirements for flexible analogue searching, Structure-Activity Relationship (SAR) model development, or building of chemical relational databases (CRD). The distributed structure-searchable toxicity (DSSTox) public database network is being proposed as a community-supported, web-based effort to address these shared needs of the SAR and toxicology communities. The DSSTox project has the following major elements: (1) to adopt and encourage the use of a common standard file format (structure data file (SDF)) for public toxicity databases that includes chemical structure, text and property information, and that can easily be imported into available CRD applications; (2) to implement a distributed source approach, managed by a DSSTox Central Website, that will enable decentralized, free public access to structure-toxicity data files, and that will effectively link knowledgeable toxicity data sources with potential users of these data from other disciplines (such as chemistry, modeling, and computer science); and (3) to engage public/commercial/academic/industry groups in contributing to and expanding this community-wide, public data sharing and distribution effort. The DSSTox project's overall aims are to effect the closer association of chemical structure information with existing toxicity data, and to promote and facilitate structure
75 FR 41180 - Notice of Order: Revisions to Enterprise Public Use Database

Federal Register 2010, 2011, 2012, 2013, 2014

2010-07-15

... Database AGENCY: Federal Housing Finance Agency. ACTION: Notice of order. SUMMARY: Section 1323(a)(1) of.... This responsibility to maintain a public use database (PUDB) for such mortgage data was transferred to... purpose of loan data field in these two databases. 4. Single-family Data Field 27 and Multifamily Data...
[An in vivo DNA adduct database for carcinogens: O6-alkylguanine, O4-alkylthymine and 8-hydroxyguanine].

PubMed

Morimoto, K; Kimura, M; Murata, T; Imai, Y; Ookami, N; Igarashi, T; Kanoh, N; Kaminuma, T; Hayashi, Y

1994-01-01

Many carcinogens react with DNA and form critical DNA adducts, such as O6-alkylguanine (O6-AG), O4-alkylthymine (O4-AT), and 8-hydroxyguanine (8-OHG). This study provides a database that can be used for molecular dosimetry of these DNA adducts. A literature survey on DNA binding in vivo was done by the Dialog search from the MEDLINE database. We propose a Critical Covalent Binding Index (CCBI) for the assessment of in vivo DNA binding level (expressed as micro mol chemical bound per mol G or T/mmol chemical administered per kg body weight). The number of records and compounds in parenthesis of O6-AG, O4-AT, and 8-OHG were 245(13), 54(4), 79(15), respectively. Since the CCBI values for N-nitrosamine in target organ were higher than for non-target organ, they may provide a useful index for estimation of target organ site and carcinogenic potency. As a case example, CCBI values for O4-AT from animal data were applied for diethylnitrosamine human exposure estimation by diethylnitrosamine.
Contamination of sequence databases with adaptor sequences

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yoshikawa, Takeo; Sanders, A.R.; Detera-Wadleigh, S.D.

Because of the exponential increase in the amount of DNA sequences being added to the public databases on a daily basis, it has become imperative to identify sources of contamination rapidly. Previously, contaminations of sequence databases have been reported to alert the scientific community to the problem. These contaminations can be divided into two categories. The first category comprises host sequences that have been difficult for submitters to manage or control. Examples include anomalous sequences derived from Escherichia coli, which are inserted into the chromosomes (and plasmids) of the bacterial hosts. Insertion sequences are highly mobile and are capable ofmore » transposing themselves into plasmids during cloning manipulation. Another example of the first category is the infection with yeast genomic DNA or with bacterial DNA of some commercially available cDNA libraries from Clontech. The second category of database contamination is due to the inadvertent inclusion of nonhost sequences. This category includes incorporation of cloning-vector sequences and multicloning sites in the database submission. M13-derived artifacts have been common, since M13-based vectors have been widely used for subcloning DNA fragments. Recognizing this problem, the National Center for Biotechnology Information (NCBI) started to screen, in April 1994, all sequences directly submitted to GenBank, against a set of vector data retrieved from GenBank by use of key-word searches, such as {open_quotes}vector.{close_quotes} In this report, we present evidence for another sequence artifact that is widespread but that, to our knowledge, has not yet been reported. 11 refs., 1 tab.« less
Documentation for the U.S. Geological Survey Public-Supply Database (PSDB): A database of permitted public-supply wells, surface-water intakes, and systems in the United States

USGS Publications Warehouse

Price, Curtis V.; Maupin, Molly A.

2014-01-01

The purpose of this report is to document the PSDB and explain the methods used to populate and update the data from the SDWIS, State datasets, and map and geospatial imagery. This report describes 3 data tables and 11 domain tables, including field contents, data sources, and relations between tables. Although the PSDB database is not available to the general public, this information should be useful for others who are developing other database systems to store and analyze public-supply system and facility data.
iMETHYL: an integrative database of human DNA methylation, gene expression, and genomic variation.

PubMed

Komaki, Shohei; Shiwa, Yuh; Furukawa, Ryohei; Hachiya, Tsuyoshi; Ohmomo, Hideki; Otomo, Ryo; Satoh, Mamoru; Hitomi, Jiro; Sobue, Kenji; Sasaki, Makoto; Shimizu, Atsushi

2018-01-01

We launched an integrative multi-omics database, iMETHYL (http://imethyl.iwate-megabank.org). iMETHYL provides whole-DNA methylation (~24 million autosomal CpG sites), whole-genome (~9 million single-nucleotide variants), and whole-transcriptome (>14 000 genes) data for CD4 + T-lymphocytes, monocytes, and neutrophils collected from approximately 100 subjects. These data were obtained from whole-genome bisulfite sequencing, whole-genome sequencing, and whole-transcriptome sequencing, making iMETHYL a comprehensive database.
Average probability that a "cold hit" in a DNA database search results in an erroneous attribution.

PubMed

Song, Yun S; Patil, Anand; Murphy, Erin E; Slatkin, Montgomery

2009-01-01

We consider a hypothetical series of cases in which the DNA profile of a crime-scene sample is found to match a known profile in a DNA database (i.e., a "cold hit"), resulting in the identification of a suspect based only on genetic evidence. We show that the average probability that there is another person in the population whose profile matches the crime-scene sample but who is not in the database is approximately 2(N - d)p(A), where N is the number of individuals in the population, d is the number of profiles in the database, and p(A) is the average match probability (AMP) for the population. The AMP is estimated by computing the average of the probabilities that two individuals in the population have the same profile. We show further that if a priori each individual in the population is equally likely to have left the crime-scene sample, then the average probability that the database search attributes the crime-scene sample to a wrong person is (N - d)p(A).
The Moroccan Genetic Disease Database (MGDD): a database for DNA variations related to inherited disorders and disease susceptibility.

PubMed

Charoute, Hicham; Nahili, Halima; Abidi, Omar; Gabi, Khalid; Rouba, Hassan; Fakiri, Malika; Barakat, Abdelhamid

2014-03-01

National and ethnic mutation databases provide comprehensive information about genetic variations reported in a population or an ethnic group. In this paper, we present the Moroccan Genetic Disease Database (MGDD), a catalogue of genetic data related to diseases identified in the Moroccan population. We used the PubMed, Web of Science and Google Scholar databases to identify available articles published until April 2013. The Database is designed and implemented on a three-tier model using Mysql relational database and the PHP programming language. To date, the database contains 425 mutations and 208 polymorphisms found in 301 genes and 259 diseases. Most Mendelian diseases in the Moroccan population follow autosomal recessive mode of inheritance (74.17%) and affect endocrine, nutritional and metabolic physiology. The MGDD database provides reference information for researchers, clinicians and health professionals through a user-friendly Web interface. Its content should be useful to improve researches in human molecular genetics, disease diagnoses and design of association studies. MGDD can be publicly accessed at http://mgdd.pasteur.ma.
Genotyping and interpretation of STR-DNA: Low-template, mixtures and database matches-Twenty years of research and development.

PubMed

Gill, Peter; Haned, Hinda; Bleka, Oyvind; Hansson, Oskar; Dørum, Guro; Egeland, Thore

2015-09-01

The introduction of Short Tandem Repeat (STR) DNA was a revolution within a revolution that transformed forensic DNA profiling into a tool that could be used, for the first time, to create National DNA databases. This transformation would not have been possible without the concurrent development of fluorescent automated sequencers, combined with the ability to multiplex several loci together. Use of the polymerase chain reaction (PCR) increased the sensitivity of the method to enable the analysis of a handful of cells. The first multiplexes were simple: 'the quad', introduced by the defunct UK Forensic Science Service (FSS) in 1994, rapidly followed by a more discriminating 'six-plex' (Second Generation Multiplex) in 1995 that was used to create the world's first national DNA database. The success of the database rapidly outgrew the functionality of the original system - by the year 2000 a new multiplex of ten-loci was introduced to reduce the chance of adventitious matches. The technology was adopted world-wide, albeit with different loci. The political requirement to introduce pan-European databases encouraged standardisation - the development of European Standard Set (ESS) of markers comprising twelve-loci is the latest iteration. Although development has been impressive, the methods used to interpret evidence have lagged behind. For example, the theory to interpret complex DNA profiles (low-level mixtures), had been developed fifteen years ago, but only in the past year or so, are the concepts starting to be widely adopted. A plethora of different models (some commercial and others non-commercial) have appeared. This has led to a confusing 'debate' about the 'best' to use. The different models available are described along with their advantages and disadvantages. A section discusses the development of national DNA databases, along with details of an associated controversy to estimate the strength of evidence of matches. Current methodology is limited to
Building a Faculty Publications Database: A Case Study

ERIC Educational Resources Information Center

Tabaei, Sara; Schaffer, Yitzchak; McMurray, Gregory; Simon, Bashe

2013-01-01

This case study shares the experience of building an in-house faculty publications database that was spearheaded by the Touro College and University System library in 2010. The project began with the intention of contributing to the college by collecting the research accomplishments of our faculty and staff, thereby also increasing library…
The igmspec database of public spectra probing the intergalactic medium

NASA Astrophysics Data System (ADS)

Prochaska, J. X.

2017-04-01

We describe v02 of igmspec, a database of publicly available ultraviolet, optical, and near-infrared spectra that probe the intergalactic medium (IGM). This database, a child of the specdb repository in the specdb github organization, comprises 403 277 unique sources and 434 686 spectra obtained with the world's greatest observatories. All of these data are distributed in a single ≈ 25GB HDF5 file maintained at the University of California Observatories and the University of California, Santa Cruz. The specdb software package includes Python scripts and modules for searching the source catalog and spectral datasets, and software links to the linetools package for spectral analysis. The repository also includes software to generate private spectral datasets that are compliant with International Virtual Observatory Alliance (IVOA) protocols and a Python-based interface for IVOA Simple Spectral Access queries. Future versions of igmspec will ingest other sources (e.g. gamma-ray burst afterglows) and other surveys as they become publicly available. The overall goal is to include every spectrum that effectively probes the IGM. Future databases of specdb may include publicly available galaxy spectra (exgalspec) and published supernovae spectra (snspec). The community is encouraged to join the effort on github: https://github.com/specdb.
The UK National DNA Database: Implementation of the Protection of Freedoms Act 2012.

PubMed

Amankwaa, Aaron Opoku; McCartney, Carole

2018-03-01

In 2008, the European Court of Human Rights, in S and Marper v the United Kingdom, ruled that a retention regime that permits the indefinite retention of DNA records of both convicted and non-convicted ("innocent") individuals is disproportionate. The court noted that there was inadequate evidence to justify the retention of DNA records of the innocent. Since the Marper ruling, the laws governing the taking, use, and retention of forensic DNA in England and Wales have changed with the enactment of the Protection of Freedoms Act 2012 (PoFA). This Act, put briefly, permits the indefinite retention of DNA profiles of most convicted individuals and temporal retention for some first-time convicted minors and innocent individuals on the National DNA Database (NDNAD). The PoFA regime was implemented in October 2013. This paper examines ten post-implementation reports of the NDNAD Strategy Board (3), the NDNAD Ethics Group (3) and the Office of the Biometrics Commissioner (OBC) (4). Overall, the reports highlight a considerable improvement in the performance of the database, with a current match rate of 63.3%. Further, the new regime has strengthened the genetic privacy protection of UK citizens. The OBC reports detail implementation challenges ranging from technical, legal and procedural issues to sufficient understanding of the requirements of PoFA by police forces. Risks highlighted in these reports include the deletion of some "retainable" profiles, which could potentially lead to future crimes going undetected. A further risk is the illegal retention of some profiles from innocent individuals, which may lead to privacy issues and legal challenges. In conclusion, the PoFA regime appears to be working well, however, critical research is still needed to evaluate its overall efficacy compared to other retention regimes. Copyright © 2018 Elsevier B.V. All rights reserved.
An Internet-Accessible DNA Sequence Database for Identifying Fusaria from Human and Animal Infections

USDA-ARS?s Scientific Manuscript database

Because less than one-third of clinically relevant fusaria can be accurately identified to species level using phenotypic data (i.e., morphological species recognition), we constructed a three-locus DNA sequence database to facilitate molecular identification of the 69 Fusarium species associated wi...
Public involvement in pharmacogenomics research: a national survey on public attitudes towards pharmacogenomics research and the willingness to donate DNA samples to a DNA bank in Japan.

PubMed

Kobayashi, Eriko; Satoh, Nobunori

2009-11-01

To assess the attitudes of the Japanese general public towards pharmacogenomics research and a DNA bank for identifying genomic markers associated with ADRs and their willingness to donate DNA samples, we conducted a national survey for 1,103 Japanese adults from the general public, not a patient population. The response rate was 36.8%. The majority of the respondents showed a positive attitude towards pharmacogenomics research (81.0%) and a DNA bank (70.4%). Considering fictitious clinical situations such as taking medications and experiencing ADRs, the willingness to donate DNA samples when experiencing ADRs (61.7%) was higher than when taking medications (45.3%). Older generations were significantly associated with a decreased willingness to donate (OR = 0.45, CI 0.28-0.72 in 50s. OR = 0.49, CI: 0.31-0.77 in 60s). Positive attitudes towards pharmacogenomics research, a DNA bank, blood/bone marrow/organ donation were significantly associated with an increased willingness. However, the respondents had the following concerns regarding a DNA bank: the confidentiality of their personal information, the manner by which research results were utilized and simply the use of their own DNA for research. In order to attain public understanding to overcome these concerns, a process of public awareness should be put into place to emphasize the beneficial aspects of identifying genomic markers associated with ADRs and to address these concerns raised in our study. Further study is needed to assess the willingness of actual patients taking medications in real situations, since the respondents in our study were from the general public, not a patient population, and their willingness was assessed on the condition of assuming that they were patients taking medications.
Identification of functional enolase genes of the silkworm Bombyx mori from public databases with a combination of dry and wet bench processes.

PubMed

Kikuchi, Akira; Nakazato, Takeru; Ito, Katsuhiko; Nojima, Yosui; Yokoyama, Takeshi; Iwabuchi, Kikuo; Bono, Hidemasa; Toyoda, Atsushi; Fujiyama, Asao; Sato, Ryoichi; Tabunoki, Hiroko

2017-01-13

Various insect species have been added to genomic databases over the years. Thus, researchers can easily obtain online genomic information on invertebrates and insects. However, many incorrectly annotated genes are included in these databases, which can prevent the correct interpretation of subsequent functional analyses. To address this problem, we used a combination of dry and wet bench processes to select functional genes from public databases. Enolase is an important glycolytic enzyme in all organisms. We used a combination of dry and wet bench processes to identify functional enolases in the silkworm Bombyx mori (BmEno). First, we detected five annotated enolases from public databases using a Hidden Markov Model (HMM) search, and then through cDNA cloning, Northern blotting, and RNA-seq analysis, we revealed three functional enolases in B. mori: BmEno1, BmEno2, and BmEnoC. BmEno1 contained a conserved key amino acid residue for metal binding and substrate binding in other species. However, BmEno2 and BmEnoC showed a change in this key amino acid. Phylogenetic analysis showed that BmEno2 and BmEnoC were distinct from BmEno1 and other enolases, and were distributed only in lepidopteran clusters. BmEno1 was expressed in all of the tissues used in our study. In contrast, BmEno2 was mainly expressed in the testis with some expression in the ovary and suboesophageal ganglion. BmEnoC was weakly expressed in the testis. Quantitative RT-PCR showed that the mRNA expression of BmEno2 and BmEnoC correlated with testis development; thus, BmEno2 and BmEnoC may be related to lepidopteran-specific spermiogenesis. We identified and characterized three functional enolases from public databases with a combination of dry and wet bench processes in the silkworm B. mori. In addition, we determined that BmEno2 and BmEnoC had species-specific functions. Our strategy could be helpful for the detection of minor genes and functional genes in non-model organisms from public databases.
From metaphor to practices: The introduction of "information engineers" into the first DNA sequence database.

PubMed

García-Sancho, Miguel

2011-01-01

This paper explores the introduction of professional systems engineers and information management practices into the first centralized DNA sequence database, developed at the European Molecular Biology Laboratory (EMBL) during the 1980s. In so doing, it complements the literature on the emergence of an information discourse after World War II and its subsequent influence in biological research. By the careers of the database creators and the computer algorithms they designed, analyzing, from the mid-1960s onwards information in biology gradually shifted from a pervasive metaphor to be embodied in practices and professionals such as those incorporated at the EMBL. I then investigate the reception of these database professionals by the EMBL biological staff, which evolved from initial disregard to necessary collaboration as the relationship between DNA, genes, and proteins turned out to be more complex than expected. The trajectories of the database professionals at the EMBL suggest that the initial subject matter of the historiography of genomics should be the long-standing practices that emerged after World War II and to a large extent originated outside biomedicine and academia. Only after addressing these practices, historians may turn to their further disciplinary assemblage in fields such as bioinformatics or biotechnology.
IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses

DOE PAGES

Paez-Espino, David; Chen, I. -Min A.; Palaniappan, Krishna; ...

2016-10-30

Viruses represent the most abundant life forms on the planet. Recent experimental and computational improvements have led to a dramatic increase in the number of viral genome sequences identified primarily from metagenomic samples. As a result of the expanding catalog of metagenomic viral sequences, there exists a need for a comprehensive computational platform integrating all these sequences with associated metadata and analytical tools. Here we present IMG/VR (https://img.jgi.doe.gov/vr/), the largest publicly available database of 3908 isolate reference DNA viruses with 264 413 computationally identified viral contigs from > 6000 ecologically diverse metagenomic samples. Approximately half of the viral contigs aremore » grouped into genetically distinct quasi-species clusters. Microbial hosts are predicted for 20 000 viral sequences, revealing nine microbial phyla previously unreported to be infected by viruses. Viral sequences can be queried using a variety of associated metadata, including habitat type and geographic location of the samples, or taxonomic classification according to hallmark viral genes. IMG/VR has a user-friendly interface that allows users to interrogate all integrated data and interact by comparingwith external sequences, thus serving as an essential resource in the viral genomics community.« less
IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses

DOE Office of Scientific and Technical Information (OSTI.GOV)

Paez-Espino, David; Chen, I. -Min A.; Palaniappan, Krishna

Viruses represent the most abundant life forms on the planet. Recent experimental and computational improvements have led to a dramatic increase in the number of viral genome sequences identified primarily from metagenomic samples. As a result of the expanding catalog of metagenomic viral sequences, there exists a need for a comprehensive computational platform integrating all these sequences with associated metadata and analytical tools. Here we present IMG/VR (https://img.jgi.doe.gov/vr/), the largest publicly available database of 3908 isolate reference DNA viruses with 264 413 computationally identified viral contigs from > 6000 ecologically diverse metagenomic samples. Approximately half of the viral contigs aremore » grouped into genetically distinct quasi-species clusters. Microbial hosts are predicted for 20 000 viral sequences, revealing nine microbial phyla previously unreported to be infected by viruses. Viral sequences can be queried using a variety of associated metadata, including habitat type and geographic location of the samples, or taxonomic classification according to hallmark viral genes. IMG/VR has a user-friendly interface that allows users to interrogate all integrated data and interact by comparingwith external sequences, thus serving as an essential resource in the viral genomics community.« less
A Large Population Genetic Study of 15 Autosomal Short Tandem Repeat Loci for Establishment of Korean DNA Profile Database

PubMed Central

Yoo, Seong Yeon; Cho, Nam Soo; Park, Myung Jin; Seong, Ki Min; Hwang, Jung Ho; Song, Seok Bean; Han, Myun Soo; Lee, Won Tae; Chung, Ki Wha

2011-01-01

Genotyping of highly polymorphic short tandem repeat (STR) markers is widely used for the genetic identification of individuals in forensic DNA analyses and in paternity disputes. The National DNA Profile Databank recently established by the DNA Identification Act in Korea contains the computerized STR DNA profiles of individuals convicted of crimes. For the establishment of a large autosomal STR loci population database, 1805 samples were obtained at random from Korean individuals and 15 autosomal STR markers were analyzed using the AmpFlSTR Identifiler PCR Amplification kit. For the 15 autosomal STR markers, no deviations from the Hardy-Weinberg equilibrium were observed. The most informative locus in our data set was the D2S1338 with a discrimination power of 0.9699. The combined matching probability was 1.521 × 10-17. This large STR profile dataset including atypical alleles will be important for the establishment of the Korean DNA database and for forensic applications. PMID:21597912
A large population genetic study of 15 autosomal short tandem repeat loci for establishment of Korean DNA profile database.

PubMed

Yoo, Seong Yeon; Cho, Nam Soo; Park, Myung Jin; Seong, Ki Min; Hwang, Jung Ho; Song, Seok Bean; Han, Myun Soo; Lee, Won Tae; Chung, Ki Wha

2011-07-01

Genotyping of highly polymorphic short tandem repeat (STR) markers is widely used for the genetic identification of individuals in forensic DNA analyses and in paternity disputes. The National DNA Profile Databank recently established by the DNA Identification Act in Korea contains the computerized STR DNA profiles of individuals convicted of crimes. For the establishment of a large autosomal STR loci population database, 1805 samples were obtained at random from Korean individuals and 15 autosomal STR markers were analyzed using the AmpFlSTR Identifiler PCR Amplification kit. For the 15 autosomal STR markers, no deviations from the Hardy-Weinberg equilibrium were observed. The most informative locus in our data set was the D2S1338 with a discrimination power of 0.9699. The combined matching probability was 1.521 × 10(-17). This large STR profile dataset including atypical alleles will be important for the establishment of the Korean DNA database and for forensic applications.

MitoAge: a database for comparative analysis of mitochondrial DNA, with a special focus on animal longevity.

PubMed

Toren, Dmitri; Barzilay, Thomer; Tacutu, Robi; Lehmann, Gilad; Muradian, Khachik K; Fraifeld, Vadim E

2016-01-04

Mitochondria are the only organelles in the animal cells that have their own genome. Due to a key role in energy production, generation of damaging factors (ROS, heat), and apoptosis, mitochondria and mtDNA in particular have long been considered one of the major players in the mechanisms of aging, longevity and age-related diseases. The rapidly increasing number of species with fully sequenced mtDNA, together with accumulated data on longevity records, provides a new fascinating basis for comparative analysis of the links between mtDNA features and animal longevity. To facilitate such analyses and to support the scientific community in carrying these out, we developed the MitoAge database containing calculated mtDNA compositional features of the entire mitochondrial genome, mtDNA coding (tRNA, rRNA, protein-coding genes) and non-coding (D-loop) regions, and codon usage/amino acids frequency for each protein-coding gene. MitoAge includes 922 species with fully sequenced mtDNA and maximum lifespan records. The database is available through the MitoAge website (www.mitoage.org or www.mitoage.info), which provides the necessary tools for searching, browsing, comparing and downloading the data sets of interest for selected taxonomic groups across the Kingdom Animalia. The MitoAge website assists in statistical analysis of different features of the mtDNA and their correlative links to longevity. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
mirPub: a database for searching microRNA publications.

PubMed

Vergoulis, Thanasis; Kanellos, Ilias; Kostoulas, Nikos; Georgakilas, Georgios; Sellis, Timos; Hatzigeorgiou, Artemis; Dalamagas, Theodore

2015-05-01

Identifying, amongst millions of publications available in MEDLINE, those that are relevant to specific microRNAs (miRNAs) of interest based on keyword search faces major obstacles. References to miRNA names in the literature often deviate from standard nomenclature for various reasons, since even the official nomenclature evolves. For instance, a single miRNA name may identify two completely different molecules or two different names may refer to the same molecule. mirPub is a database with a powerful and intuitive interface, which facilitates searching for miRNA literature, addressing the aforementioned issues. To provide effective search services, mirPub applies text mining techniques on MEDLINE, integrates data from several curated databases and exploits data from its user community following a crowdsourcing approach. Other key features include an interactive visualization service that illustrates intuitively the evolution of miRNA data, tag clouds summarizing the relevance of publications to particular diseases, cell types or tissues and access to TarBase 6.0 data to oversee genes related to miRNA publications. mirPub is freely available at http://www.microrna.gr/mirpub/. vergoulis@imis.athena-innovation.gr or dalamag@imis.athena-innovation.gr Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
PRODORIC2: the bacterial gene regulation database in 2018

PubMed Central

Dudek, Christian-Alexander; Hartlich, Juliane; Brötje, David; Jahn, Dieter

2018-01-01

Abstract Bacteria adapt to changes in their environment via differential gene expression mediated by DNA binding transcriptional regulators. The PRODORIC2 database hosts one of the largest collections of DNA binding sites for prokaryotic transcription factors. It is the result of the thoroughly redesigned PRODORIC database. PRODORIC2 is more intuitive and user-friendly. Besides significant technical improvements, the new update offers more than 1000 new transcription factor binding sites and 110 new position weight matrices for genome-wide pattern searches with the Virtual Footprint tool. Moreover, binding sites deduced from high-throughput experiments were included. Data for 6 new bacterial species including bacteria of the Rhodobacteraceae family were added. Finally, a comprehensive collection of sigma- and transcription factor data for the nosocomial pathogen Clostridium difficile is now part of the database. PRODORIC2 is publicly available at http://www.prodoric2.de. PMID:29136200
IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses.

PubMed

Paez-Espino, David; Chen, I-Min A; Palaniappan, Krishna; Ratner, Anna; Chu, Ken; Szeto, Ernest; Pillay, Manoj; Huang, Jinghua; Markowitz, Victor M; Nielsen, Torben; Huntemann, Marcel; K Reddy, T B; Pavlopoulos, Georgios A; Sullivan, Matthew B; Campbell, Barbara J; Chen, Feng; McMahon, Katherine; Hallam, Steve J; Denef, Vincent; Cavicchioli, Ricardo; Caffrey, Sean M; Streit, Wolfgang R; Webster, John; Handley, Kim M; Salekdeh, Ghasem H; Tsesmetzis, Nicolas; Setubal, Joao C; Pope, Phillip B; Liu, Wen-Tso; Rivers, Adam R; Ivanova, Natalia N; Kyrpides, Nikos C

2017-01-04

Viruses represent the most abundant life forms on the planet. Recent experimental and computational improvements have led to a dramatic increase in the number of viral genome sequences identified primarily from metagenomic samples. As a result of the expanding catalog of metagenomic viral sequences, there exists a need for a comprehensive computational platform integrating all these sequences with associated metadata and analytical tools. Here we present IMG/VR (https://img.jgi.doe.gov/vr/), the largest publicly available database of 3908 isolate reference DNA viruses with 264 413 computationally identified viral contigs from >6000 ecologically diverse metagenomic samples. Approximately half of the viral contigs are grouped into genetically distinct quasi-species clusters. Microbial hosts are predicted for 20 000 viral sequences, revealing nine microbial phyla previously unreported to be infected by viruses. Viral sequences can be queried using a variety of associated metadata, including habitat type and geographic location of the samples, or taxonomic classification according to hallmark viral genes. IMG/VR has a user-friendly interface that allows users to interrogate all integrated data and interact by comparing with external sequences, thus serving as an essential resource in the viral genomics community. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Recommendations of the DNA Commission of the International Society for Forensic Genetics (ISFG) on quality control of autosomal Short Tandem Repeat allele frequency databasing (STRidER).

PubMed

Bodner, Martin; Bastisch, Ingo; Butler, John M; Fimmers, Rolf; Gill, Peter; Gusmão, Leonor; Morling, Niels; Phillips, Christopher; Prinz, Mechthild; Schneider, Peter M; Parson, Walther

2016-09-01

The statistical evaluation of autosomal Short Tandem Repeat (STR) genotypes is based on allele frequencies. These are empirically determined from sets of randomly selected human samples, compiled into STR databases that have been established in the course of population genetic studies. There is currently no agreed procedure of performing quality control of STR allele frequency databases, and the reliability and accuracy of the data are largely based on the responsibility of the individual contributing research groups. It has been demonstrated with databases of haploid markers (EMPOP for mitochondrial mtDNA, and YHRD for Y-chromosomal loci) that centralized quality control and data curation is essential to minimize error. The concepts employed for quality control involve software-aided likelihood-of-genotype, phylogenetic, and population genetic checks that allow the researchers to compare novel data to established datasets and, thus, maintain the high quality required in forensic genetics. Here, we present STRidER (http://strider.online), a publicly available, centrally curated online allele frequency database and quality control platform for autosomal STRs. STRidER expands on the previously established ENFSI DNA WG STRbASE and applies standard concepts established for haploid and autosomal markers as well as novel tools to reduce error and increase the quality of autosomal STR data. The platform constitutes a significant improvement and innovation for the scientific community, offering autosomal STR data quality control and reliable STR genotype estimates. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Application of a mitochondrial DNA control region frequency database for UK domestic cats.

PubMed

Ottolini, Barbara; Lall, Gurdeep Matharu; Sacchini, Federico; Jobling, Mark A; Wetton, Jon H

2017-03-01

DNA variation in 402bp of the mitochondrial control region flanked by repeat sequences RS2 and RS3 was evaluated by Sanger sequencing in 152 English domestic cats, in order to determine the significance of matching DNA sequences between hairs found with a victim's body and the suspect's pet cat. Whilst 95% of English cats possessed one of the twelve globally widespread mitotypes, four new variants were observed, the most common of which (2% frequency) was shared with the evidential samples. No significant difference in mitotype frequency was seen between 32 individuals from the locality of the crime and 120 additional cats from the rest of England, suggesting a lack of local population structure. However, significant differences were observed in comparison with frequencies in other countries, including the closely neighbouring Netherlands, highlighting the importance of appropriate genetic databases when determining the evidential significance of mitochondrial DNA evidence. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Bioethical Biobanks: Three Concerns in Designing and Using Law Enforcement DNA Identification Databases

DOE Office of Scientific and Technical Information (OSTI.GOV)

D.H. Kaye

2006-10-19

Federal and state law enforcement authorities have amassed large collections of DNA samples and the identifying profiles derived from them. These databases help to identify the guilty and to exonerate the innocent, but as the databanks grow, so do fears about civil liberties. The research reported here discusses three legal and social policy issues that have been raised in regard to these biobanks—the choice of loci to type for identifying individuals, the indefinite retention of DNA samples, and the use of the DNA samples or the identifying profiles for research purposes. It also considers the possible value of the databasesmore » for research into the genetics of human behavior and the ethics of using them for this purpose. It rejects the broad claim that such research is inherently unethical but proposes procedures for ensuring that the value of the proposed research justifies any psychosocial or other risks to the subjects of the research.« less
REBASE--a database for DNA restriction and modification: enzymes, genes and genomes.

PubMed

Roberts, Richard J; Vincze, Tamas; Posfai, Janos; Macelis, Dana

2015-01-01

REBASE is a comprehensive and fully curated database of information about the components of restriction-modification (RM) systems. It contains fully referenced information about recognition and cleavage sites for both restriction enzymes and methyltransferases as well as commercial availability, methylation sensitivity, crystal and sequence data. All genomes that are completely sequenced are analyzed for RM system components, and with the advent of PacBio sequencing, the recognition sequences of DNA methyltransferases (MTases) are appearing rapidly. Thus, Type I and Type III systems can now be characterized in terms of recognition specificity merely by DNA sequencing. The contents of REBASE may be browsed from the web http://rebase.neb.com and selected compilations can be downloaded by FTP (ftp.neb.com). Monthly updates are also available via email. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Novel statistical tools for management of public databases facilitate community-wide replicability and control of false discovery.

PubMed

Rosset, Saharon; Aharoni, Ehud; Neuvirth, Hani

2014-07-01

Issues of publication bias, lack of replicability, and false discovery have long plagued the genetics community. Proper utilization of public and shared data resources presents an opportunity to ameliorate these problems. We present an approach to public database management that we term Quality Preserving Database (QPD). It enables perpetual use of the database for testing statistical hypotheses while controlling false discovery and avoiding publication bias on the one hand, and maintaining testing power on the other hand. We demonstrate it on a use case of a replication server for GWAS findings, underlining its practical utility. We argue that a shift to using QPD in managing current and future biological databases will significantly enhance the community's ability to make efficient and statistically sound use of the available data resources. © 2014 WILEY PERIODICALS, INC.
76 FR 77533 - Notice of Order: Revisions to Enterprise Public Use Database Incorporating High-Cost Single...

Federal Register 2010, 2011, 2012, 2013, 2014

2011-12-13

... FEDERAL HOUSING FINANCE AGENCY [No. 2011-N-13] Notice of Order: Revisions to Enterprise Public Use Database Incorporating High-Cost Single-Family Securitized Loan Data Fields and Technical Data Field..., regarding FHFA's adoption of an Order revising FHFA's Public Use Database matrices to include certain data...
Genomics and Public Health Research: Can the State Allow Access to Genomic Databases?

PubMed Central

Cousineau, J; Girard, N; Monardes, C; Leroux, T; Jean, M Stanton

2012-01-01

Because many diseases are multifactorial disorders, the scientific progress in genomics and genetics should be taken into consideration in public health research. In this context, genomic databases will constitute an important source of information. Consequently, it is important to identify and characterize the State’s role and authority on matters related to public health, in order to verify whether it has access to such databases while engaging in public health genomic research. We first consider the evolution of the concept of public health, as well as its core functions, using a comparative approach (e.g. WHO, PAHO, CDC and the Canadian province of Quebec). Following an analysis of relevant Quebec legislation, the precautionary principle is examined as a possible avenue to justify State access to and use of genomic databases for research purposes. Finally, we consider the Influenza pandemic plans developed by WHO, Canada, and Quebec, as examples of key tools framing public health decision-making process. We observed that State powers in public health, are not, in Quebec, well adapted to the expansion of genomics research. We propose that the scope of the concept of research in public health should be clear and include the following characteristics: a commitment to the health and well-being of the population and to their determinants; the inclusion of both applied research and basic research; and, an appropriate model of governance (authorization, follow-up, consent, etc.). We also suggest that the strategic approach version of the precautionary principle could guide collective choices in these matters. PMID:23113174
Enhancing the GABI-Kat Arabidopsis thaliana T-DNA Insertion Mutant Database by Incorporating Araport11 Annotation.

PubMed

Kleinboelting, Nils; Huep, Gunnar; Weisshaar, Bernd

2017-01-01

SimpleSearch provides access to a database containing information about T-DNA insertion lines of the GABI-Kat collection of Arabidopsis thaliana mutants. These mutants are an important tool for reverse genetics, and GABI-Kat is the second largest collection of such T-DNA insertion mutants. Insertion sites were deduced from flanking sequence tags (FSTs), and the database contains information about mutant plant lines as well as insertion alleles. Here, we describe improvements within the interface (available at http://www.gabi-kat.de/db/genehits.php) and with regard to the database content that have been realized in the last five years. These improvements include the integration of the Araport11 genome sequence annotation data containing the recently updated A. thaliana structural gene descriptions, an updated visualization component that displays groups of insertions with very similar insertion positions, mapped confirmation sequences, and primers. The visualization component provides a quick way to identify insertions of interest, and access to improved data about the exact structure of confirmed insertion alleles. In addition, the database content has been extended by incorporating additional insertion alleles that were detected during the confirmation process, as well as by adding new FSTs that have been produced during continued efforts to complement gaps in FST availability. Finally, the current database content regarding predicted and confirmed insertion alleles as well as primer sequences has been made available as downloadable flat files. © The Author 2016. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists.
Progress towards a Spacecraft-Associated Microbial Meta-database (SAMM)

NASA Astrophysics Data System (ADS)

Mogul, Rakesh; Keagy, Laura; Nava, Argelia; Zerehi, Farah

The microbial inventories within the assembly facilities for spacecraft represent the primary pool of forward contaminants that may compromise life-detection missions. Accordingly, we are constructing a meta-database of these microorganisms for the purpose of building a bioinformatic resource for planetary protection and astrobiology-related endeavors. Using student-led efforts, the meta-database is being constructed from literature reports and is inclusive of both isolated microorganisms and those solely detected through DNA-based techniques. The Spacecraft-Associated Microbial Meta-database (SAMM) currently includes over 800 entries that are organized using 32 meta-tags involving taxonomy, location of isolation (facility and component), category of characterization (culture and/or genetic), types of characterizations (e.g., culture, 16s rDNA, phylochip, FAME, and DNA hybridization), growth conditions, Gram stain, and general physiological traits (e.g., sporulation, extremotolerance, and respiration properties). Interrogations on the database show that the cleanrooms at Kennedy Space Center (KSC) are ~ 2-fold greater in diversity in bacterial genera when compared to the Jet Propulsion Laboratory (JPL), and that bacteria related to water, plant, and human environments are more often associated with the KSC-specific genera. These results are parallel to those reported in the literature, and hence serve as benchmarks demonstrating the bioinformatic potential of this meta-database. The ultimate plans for SAMM include public availability, expansion through crowdsourcing efforts, and potential use as a companion resource to the culture collections assembled by DSMZ and JPL.
Clinical Variant Classification: A Comparison of Public Databases and a Commercial Testing Laboratory.

PubMed

Gradishar, William; Johnson, KariAnne; Brown, Krystal; Mundt, Erin; Manley, Susan

2017-07-01

There is a growing move to consult public databases following receipt of a genetic test result from a clinical laboratory; however, the well-documented limitations of these databases call into question how often clinicians will encounter discordant variant classifications that may introduce uncertainty into patient management. Here, we evaluate discordance in BRCA1 and BRCA2 variant classifications between a single commercial testing laboratory and a public database commonly consulted in clinical practice. BRCA1 and BRCA2 variant classifications were obtained from ClinVar and compared with the classifications from a reference laboratory. Full concordance and discordance were determined for variants whose ClinVar entries were of the same pathogenicity (pathogenic, benign, or uncertain). Variants with conflicting ClinVar classifications were considered partially concordant if ≥1 of the listed classifications agreed with the reference laboratory classification. Four thousand two hundred and fifty unique BRCA1 and BRCA2 variants were available for analysis. Overall, 73.2% of classifications were fully concordant and 12.3% were partially concordant. The remaining 14.5% of variants had discordant classifications, most of which had a definitive classification (pathogenic or benign) from the reference laboratory compared with an uncertain classification in ClinVar (14.0%). Here, we show that discrepant classifications between a public database and single reference laboratory potentially account for 26.7% of variants in BRCA1 and BRCA2 . The time and expertise required of clinicians to research these discordant classifications call into question the practicality of checking all test results against a database and suggest that discordant classifications should be interpreted with these limitations in mind. With the increasing use of clinical genetic testing for hereditary cancer risk, accurate variant classification is vital to ensuring appropriate medical management
DSSTOX WEBSITE LAUNCH: IMPROVING PUBLIC ACCESS TO DATABASES FOR BUILDING STRUCTURE-TOXICITY PREDICTION MODELS

EPA Science Inventory

DSSTox Website Launch: Improving Public Access to Databases for Building Structure-Toxicity Prediction Models
Ann M. Richard
US Environmental Protection Agency, Research Triangle Park, NC, USA

Distributed: Decentralized set of standardized, field-delimited databases,...
dndDB: a database focused on phosphorothioation of the DNA backbone.

PubMed

Ou, Hong-Yu; He, Xinyi; Shao, Yucheng; Tai, Cui; Rajakumar, Kumar; Deng, Zixin

2009-01-01

The Dnd DNA degradation phenotype was first observed during electrophoresis of genomic DNA from Streptomyces lividans more than 20 years ago. It was subsequently shown to be governed by the five-gene dnd cluster. Similar gene clusters have now been found to be widespread among many other distantly related bacteria. Recently the dnd cluster was shown to mediate the incorporation of sulphur into the DNA backbone via a sequence-selective, stereo-specific phosphorothioate modification in Escherichia coli B7A. Intriguingly, to date all identified dnd clusters lie within mobile genetic elements, the vast majority in laterally transferred genomic islands. We organized available data from experimental and bioinformatics analyses about the DNA phosphorothioation phenomenon and associated documentation as a dndDB database. It contains the following detailed information: (i) Dnd phenotype; (ii) dnd gene clusters; (iii) genomic islands harbouring dnd genes; (iv) Dnd proteins and conserved domains. As of 25 December 2008, dndDB contained data corresponding to 24 bacterial species exhibiting the Dnd phenotype reported in the scientific literature. In addition, via in silico analysis, dndDB identified 26 syntenic dnd clusters from 25 species of Eubacteria and Archaea, 25 dnd-bearing genomic islands and one dnd plasmid containing 114 dnd genes. A further 397 other genes coding for proteins with varying levels of similarity to Dnd proteins were also included in dndDB. A broad range of similarity search, sequence alignment and phylogenetic tools are readily accessible to allow for to individualized directions of research focused on dnd genes. dndDB can facilitate efficient investigation of a wide range of aspects relating to dnd DNA modification and other island-encoded functions in host organisms. dndDB version 1.0 is freely available at http://mml.sjtu.edu.cn/dndDB/.
Minors or suspects? A discussion of the legal and ethical issues surrounding the indefinite storage of DNA collected from children aged 10-18 years on the National DNA Database in England and Wales.

PubMed

Mansel, Charlotte; Davies, Sharon

2012-10-01

There are currently over 250,000 children between the ages of 10 and 18 years who have their genetic information stored on the National DNA Database. This paper explores the legal and ethical issues surrounding this controversial subject, with particular focus on juvenile capacity and the potential results of criminalizing young children and adolescents. The implications of the adverse legal judgement of the European Court of Human Rights in S and Marper v UK (2008) and the violation of Article 8 of the Convention are discussed. The authors have considered the requirement to balance the rights of the individual, particularly those of minors, against the need to protect the public and have compared the position in Scotland to that of the rest of the UK. The authors conclude that a more ethically acceptable alternative could be the creation of a separate forensic database for children aged 10-18 years, set up to safeguard the interests of those who have not been convicted of any crime.
Characterizing the genetic structure of a forensic DNA database using a latent variable approach.

PubMed

Kruijver, Maarten

2016-07-01

Several problems in forensic genetics require a representative model of a forensic DNA database. Obtaining an accurate representation of the offender database can be difficult, since databases typically contain groups of persons with unregistered ethnic origins in unknown proportions. We propose to estimate the allele frequencies of the subpopulations comprising the offender database and their proportions from the database itself using a latent variable approach. We present a model for which parameters can be estimated using the expectation maximization (EM) algorithm. This approach does not rely on relatively small and possibly unrepresentative population surveys, but is driven by the actual genetic composition of the database only. We fit the model to a snapshot of the Dutch offender database (2014), which contains close to 180,000 profiles, and find that three subpopulations suffice to describe a large fraction of the heterogeneity in the database. We demonstrate the utility and reliability of the approach with three applications. First, we use the model to predict the number of false leads obtained in database searches. We assess how well the model predicts the number of false leads obtained in mock searches in the Dutch offender database, both for the case of familial searching for first degree relatives of a donor and searching for contributors to three-person mixtures. Second, we study the degree of partial matching between all pairs of profiles in the Dutch database and compare this to what is predicted using the latent variable approach. Third, we use the model to provide evidence to support that the Dutch practice of estimating match probabilities using the Balding-Nichols formula with a native Dutch reference database and θ=0.03 is conservative. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
The Hawaiian Algal Database: a laboratory LIMS and online resource for biodiversity data

PubMed Central

Wang, Norman; Sherwood, Alison R; Kurihara, Akira; Conklin, Kimberly Y; Sauvage, Thomas; Presting, Gernot G

2009-01-01

Background Organization and presentation of biodiversity data is greatly facilitated by databases that are specially designed to allow easy data entry and organized data display. Such databases also have the capacity to serve as Laboratory Information Management Systems (LIMS). The Hawaiian Algal Database was designed to showcase specimens collected from the Hawaiian Archipelago, enabling users around the world to compare their specimens with our photographs and DNA sequence data, and to provide lab personnel with an organizational tool for storing various biodiversity data types. Description We describe the Hawaiian Algal Database, a comprehensive and searchable database containing photographs and micrographs, geo-referenced collecting information, taxonomic checklists and standardized DNA sequence data. All data for individual samples are linked through unique accession numbers. Users can search online for sample information by accession number, numerous levels of taxonomy, or collection site. At the present time the database contains data representing over 2,000 samples of marine, freshwater and terrestrial algae from the Hawaiian Archipelago. These samples are primarily red algae, although other taxa are being added. Conclusion The Hawaiian Algal Database is a digital repository for Hawaiian algal samples and acts as a LIMS for the laboratory. Users can make use of the online search tool to view and download specimen photographs and micrographs, DNA sequences and relevant habitat data, including georeferenced collecting locations. It is publicly available at . PMID:19728892
PubDNA Finder: a web database linking full-text articles to sequences of nucleic acids.

PubMed

García-Remesal, Miguel; Cuevas, Alejandro; Pérez-Rey, David; Martín, Luis; Anguita, Alberto; de la Iglesia, Diana; de la Calle, Guillermo; Crespo, José; Maojo, Víctor

2010-11-01

PubDNA Finder is an online repository that we have created to link PubMed Central manuscripts to the sequences of nucleic acids appearing in them. It extends the search capabilities provided by PubMed Central by enabling researchers to perform advanced searches involving sequences of nucleic acids. This includes, among other features (i) searching for papers mentioning one or more specific sequences of nucleic acids and (ii) retrieving the genetic sequences appearing in different articles. These additional query capabilities are provided by a searchable index that we created by using the full text of the 176 672 papers available at PubMed Central at the time of writing and the sequences of nucleic acids appearing in them. To automatically extract the genetic sequences occurring in each paper, we used an original method we have developed. The database is updated monthly by automatically connecting to the PubMed Central FTP site to retrieve and index new manuscripts. Users can query the database via the web interface provided. PubDNA Finder can be freely accessed at http://servet.dia.fi.upm.es:8080/pubdnafinder

Addition of a breeding database in the Genome Database for Rosaceae.

PubMed

Evans, Kate; Jung, Sook; Lee, Taein; Brutcher, Lisa; Cho, Ilhyung; Peace, Cameron; Main, Dorrie

2013-01-01

Breeding programs produce large datasets that require efficient management systems to keep track of performance, pedigree, geographical and image-based data. With the development of DNA-based screening technologies, more breeding programs perform genotyping in addition to phenotyping for performance evaluation. The integration of breeding data with other genomic and genetic data is instrumental for the refinement of marker-assisted breeding tools, enhances genetic understanding of important crop traits and maximizes access and utility by crop breeders and allied scientists. Development of new infrastructure in the Genome Database for Rosaceae (GDR) was designed and implemented to enable secure and efficient storage, management and analysis of large datasets from the Washington State University apple breeding program and subsequently expanded to fit datasets from other Rosaceae breeders. The infrastructure was built using the software Chado and Drupal, making use of the Natural Diversity module to accommodate large-scale phenotypic and genotypic data. Breeders can search accessions within the GDR to identify individuals with specific trait combinations. Results from Search by Parentage lists individuals with parents in common and results from Individual Variety pages link to all data available on each chosen individual including pedigree, phenotypic and genotypic information. Genotypic data are searchable by markers and alleles; results are linked to other pages in the GDR to enable the user to access tools such as GBrowse and CMap. This breeding database provides users with the opportunity to search datasets in a fully targeted manner and retrieve and compare performance data from multiple selections, years and sites, and to output the data needed for variety release publications and patent applications. The breeding database facilitates efficient program management. Storing publicly available breeding data in a database together with genomic and genetic data will
Accessing the public MIMIC-II intensive care relational database for clinical research.

PubMed

Scott, Daniel J; Lee, Joon; Silva, Ikaro; Park, Shinhyuk; Moody, George B; Celi, Leo A; Mark, Roger G

2013-01-10

The Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) database is a free, public resource for intensive care research. The database was officially released in 2006, and has attracted a growing number of researchers in academia and industry. We present the two major software tools that facilitate accessing the relational database: the web-based QueryBuilder and a downloadable virtual machine (VM) image. QueryBuilder and the MIMIC-II VM have been developed successfully and are freely available to MIMIC-II users. Simple example SQL queries and the resulting data are presented. Clinical studies pertaining to acute kidney injury and prediction of fluid requirements in the intensive care unit are shown as typical examples of research performed with MIMIC-II. In addition, MIMIC-II has also provided data for annual PhysioNet/Computing in Cardiology Challenges, including the 2012 Challenge "Predicting mortality of ICU Patients". QueryBuilder is a web-based tool that provides easy access to MIMIC-II. For more computationally intensive queries, one can locally install a complete copy of MIMIC-II in a VM. Both publicly available tools provide the MIMIC-II research community with convenient querying interfaces and complement the value of the MIMIC-II relational database.
Accessing the public MIMIC-II intensive care relational database for clinical research

PubMed Central

2013-01-01

Background The Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) database is a free, public resource for intensive care research. The database was officially released in 2006, and has attracted a growing number of researchers in academia and industry. We present the two major software tools that facilitate accessing the relational database: the web-based QueryBuilder and a downloadable virtual machine (VM) image. Results QueryBuilder and the MIMIC-II VM have been developed successfully and are freely available to MIMIC-II users. Simple example SQL queries and the resulting data are presented. Clinical studies pertaining to acute kidney injury and prediction of fluid requirements in the intensive care unit are shown as typical examples of research performed with MIMIC-II. In addition, MIMIC-II has also provided data for annual PhysioNet/Computing in Cardiology Challenges, including the 2012 Challenge “Predicting mortality of ICU Patients”. Conclusions QueryBuilder is a web-based tool that provides easy access to MIMIC-II. For more computationally intensive queries, one can locally install a complete copy of MIMIC-II in a VM. Both publicly available tools provide the MIMIC-II research community with convenient querying interfaces and complement the value of the MIMIC-II relational database. PMID:23302652
BioQ: tracing experimental origins in public genomic databases using a novel data provenance model.

PubMed

Saccone, Scott F; Quan, Jiaxi; Jones, Peter L

2012-04-15

Public genomic databases, which are often used to guide genetic studies of human disease, are now being applied to genomic medicine through in silico integrative genomics. These databases, however, often lack tools for systematically determining the experimental origins of the data. We introduce a new data provenance model that we have implemented in a public web application, BioQ, for assessing the reliability of the data by systematically tracing its experimental origins to the original subjects and biologics. BioQ allows investigators to both visualize data provenance as well as explore individual elements of experimental process flow using precise tools for detailed data exploration and documentation. It includes a number of human genetic variation databases such as the HapMap and 1000 Genomes projects. BioQ is freely available to the public at http://bioq.saclab.net.
HEDS - EPA DATABASE SYSTEM FOR PUBLIC ACCESS TO HUMAN EXPOSURE DATA

EPA Science Inventory

Human Exposure Database System (HEDS) is an Internet-based system developed to provide public access to human-exposure-related data from studies conducted by EPA's National Exposure Research Laboratory (NERL). HEDS was designed to work with the EPA Office of Research and Devel...
Publication trend, resource utilization, and impact of the US National Cancer Database

PubMed Central

Su, Chang; Peng, Cuiying; Agbodza, Ena; Bai, Harrison X.; Huang, Yuqian; Karakousis, Giorgos; Zhang, Paul J.; Zhang, Zishu

2018-01-01

Abstract Background: The utilization and impact of the studies published using the National Cancer Database (NCDB) is currently unclear. In this study, we aim to characterize the published studies, and identify relatively unexplored areas for future investigations. Methods: A literature search was performed using PubMed in January 2017 to identify all papers published using NCDB data. Characteristics of the publications were extracted. Citation frequencies were obtained through the Web of Science. Results: Three hundred 2 articles written by 230 first authors met the inclusion criteria. The number of publications grew exponentially since 2013, with 108 articles published in 2016. Articles were published in 86 journals. The majority of the published papers focused on digestive system cancer, while bone and joints, eye and orbit, myeloma, mesothelioma, and Kaposi Sarcoma were never studied. Thirteen institutions in the United States were associated with more than 5 publications. The papers have been cited for a total of 9858 times since the publication of the first paper in 1992. Frequently appearing keywords congregated into 3 clusters: “demographics,” “treatments and survival,” and “statistical analysis method.” Even though the main focuses of the articles captured a extremely wide range, they can be classified into 2 main categories: survival analysis and characterization. Other focuses include database(s) analysis and/or comparison, and hospital reporting. Conclusion: The surging interest in the use of NCDB is accompanied by unequal utilization of resources by individuals and institutions. Certain areas were relatively understudied and should be further explored. PMID:29489679
Large-scale annotation of small-molecule libraries using public databases.

PubMed

Zhou, Yingyao; Zhou, Bin; Chen, Kaisheng; Yan, S Frank; King, Frederick J; Jiang, Shumei; Winzeler, Elizabeth A

2007-01-01

While many large publicly accessible databases provide excellent annotation for biological macromolecules, the same is not true for small chemical compounds. Commercial data sources also fail to encompass an annotation interface for large numbers of compounds and tend to be cost prohibitive to be widely available to biomedical researchers. Therefore, using annotation information for the selection of lead compounds from a modern day high-throughput screening (HTS) campaign presently occurs only under a very limited scale. The recent rapid expansion of the NIH PubChem database provides an opportunity to link existing biological databases with compound catalogs and provides relevant information that potentially could improve the information garnered from large-scale screening efforts. Using the 2.5 million compound collection at the Genomics Institute of the Novartis Research Foundation (GNF) as a model, we determined that approximately 4% of the library contained compounds with potential annotation in such databases as PubChem and the World Drug Index (WDI) as well as related databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) and ChemIDplus. Furthermore, the exact structure match analysis showed 32% of GNF compounds can be linked to third party databases via PubChem. We also showed annotations such as MeSH (medical subject headings) terms can be applied to in-house HTS databases in identifying signature biological inhibition profiles of interest as well as expediting the assay validation process. The automated annotation of thousands of screening hits in batch is becoming feasible and has the potential to play an essential role in the hit-to-lead decision making process.
NCI at Frederick Scientific Library Reintroduces Scientific Publications Database | Poster

Cancer.gov

A 20-year-old database of scientific publications by NCI at Frederick, FNLCR, and affiliated employees has gotten a significant facelift. Maintained by the Scientific Library, the redesigned database—which is linked from each of the Scientific Library’s web pages—offers features that were not available in previous versions, such as additional search limits and non-traditional
Widespread recombination in published animal mtDNA sequences.

PubMed

Tsaousis, A D; Martin, D P; Ladoukakis, E D; Posada, D; Zouros, E

2005-04-01

Mitochondrial DNA (mtDNA) recombination has been observed in several animal species, but there are doubts as to whether it is common or only occurs under special circumstances. Animal mtDNA sequences retrieved from public databases were unambiguously aligned and rigorously tested for evidence of recombination. At least 30 recombination events were detected among 186 alignments examined. Recombinant sequences were found in invertebrates and vertebrates, including primates. It appears that mtDNA recombination may occur regularly in the animal cell but rarely produces new haplotypes because of homoplasmy. Common animal mtDNA recombination would necessitate a reexamination of phylogenetic and biohistorical inference based on the assumption of clonal mtDNA transmission. Recombination may also have an important role in producing and purging mtDNA mutations and thus in mtDNA-based diseases and senescence.
The Brainomics/Localizer database.

PubMed

Papadopoulos Orfanos, Dimitri; Michel, Vincent; Schwartz, Yannick; Pinel, Philippe; Moreno, Antonio; Le Bihan, Denis; Frouin, Vincent

2017-01-01

The Brainomics/Localizer database exposes part of the data collected by the in-house Localizer project, which planned to acquire four types of data from volunteer research subjects: anatomical MRI scans, functional MRI data, behavioral and demographic data, and DNA sampling. Over the years, this local project has been collecting such data from hundreds of subjects. We had selected 94 of these subjects for their complete datasets, including all four types of data, as the basis for a prior publication; the Brainomics/Localizer database publishes the data associated with these 94 subjects. Since regulatory rules prevent us from making genetic data available for download, the database serves only anatomical MRI scans, functional MRI data, behavioral and demographic data. To publish this set of heterogeneous data, we use dedicated software based on the open-source CubicWeb semantic web framework. Through genericity in the data model and flexibility in the display of data (web pages, CSV, JSON, XML), CubicWeb helps us expose these complex datasets in original and efficient ways. Copyright © 2015 Elsevier Inc. All rights reserved.
dbAMEPNI: a database of alanine mutagenic effects for protein–nucleic acid interactions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liu, Ling; Xiong, Yi; Gao, Hongyun

Protein–nucleic acid interactions play essential roles in various biological activities such as gene regulation, transcription, DNA repair and DNA packaging. Understanding the effects of amino acid substitutions on protein–nucleic acid binding affinities can help elucidate the molecular mechanism of protein–nucleic acid recognition. Until now, no comprehensive and updated database of quantitative binding data on alanine mutagenic effects for protein–nucleic acid interactions is publicly accessible. Thus, we developed a new database of Alanine Mutagenic Effects for Protein-Nucleic Acid Interactions (dbAMEPNI). dbAMEPNI is a manually curated, literature-derived database, comprising over 577 alanine mutagenic data with experimentally determined binding affinities for protein–nucleic acidmore » complexes. Here, it contains several important parameters, such as dissociation constant (Kd), Gibbs free energy change (ΔΔG), experimental conditions and structural parameters of mutant residues. In addition, the database provides an extended dataset of 282 single alanine mutations with only qualitative data (or descriptive effects) of thermodynamic information.« less
dbAMEPNI: a database of alanine mutagenic effects for protein–nucleic acid interactions

DOE PAGES

Liu, Ling; Xiong, Yi; Gao, Hongyun; ...

2018-04-02

Protein–nucleic acid interactions play essential roles in various biological activities such as gene regulation, transcription, DNA repair and DNA packaging. Understanding the effects of amino acid substitutions on protein–nucleic acid binding affinities can help elucidate the molecular mechanism of protein–nucleic acid recognition. Until now, no comprehensive and updated database of quantitative binding data on alanine mutagenic effects for protein–nucleic acid interactions is publicly accessible. Thus, we developed a new database of Alanine Mutagenic Effects for Protein-Nucleic Acid Interactions (dbAMEPNI). dbAMEPNI is a manually curated, literature-derived database, comprising over 577 alanine mutagenic data with experimentally determined binding affinities for protein–nucleic acidmore » complexes. Here, it contains several important parameters, such as dissociation constant (Kd), Gibbs free energy change (ΔΔG), experimental conditions and structural parameters of mutant residues. In addition, the database provides an extended dataset of 282 single alanine mutations with only qualitative data (or descriptive effects) of thermodynamic information.« less
DNA variant databases improve test accuracy and phenotype prediction in Alport syndrome.

PubMed

Savige, Judy; Ars, Elisabet; Cotton, Richard G H; Crockett, David; Dagher, Hayat; Deltas, Constantinos; Ding, Jie; Flinter, Frances; Pont-Kingdon, Genevieve; Smaoui, Nizar; Torra, Roser; Storey, Helen

2014-06-01

X-linked Alport syndrome is a form of progressive renal failure caused by pathogenic variants in the COL4A5 gene. More than 700 variants have been described and a further 400 are estimated to be known to individual laboratories but are unpublished. The major genetic testing laboratories for X-linked Alport syndrome worldwide have established a Web-based database for published and unpublished COL4A5 variants ( https://grenada.lumc.nl/LOVD2/COL4A/home.php?select_db=COL4A5 ). This conforms with the recommendations of the Human Variome Project: it uses the Leiden Open Variation Database (LOVD) format, describes variants according to the human reference sequence with standardized nomenclature, indicates likely pathogenicity and associated clinical features, and credits the submitting laboratory. The database includes non-pathogenic and recurrent variants, and is linked to another COL4A5 mutation database and relevant bioinformatics sites. Access is free. Increasing the number of COL4A5 variants in the public domain helps patients, diagnostic laboratories, clinicians, and researchers. The database improves the accuracy and efficiency of genetic testing because its variants are already categorized for pathogenicity. The description of further COL4A5 variants and clinical associations will improve our ability to predict phenotype and our understanding of collagen IV biochemistry. The database for X-linked Alport syndrome represents a model for databases in other inherited renal diseases.
Use of DNA profiles for investigation using a simulated national DNA database: Part II. Statistical and ethical considerations on familial searching.

PubMed

Hicks, T; Taroni, F; Curran, J; Buckleton, J; Castella, V; Ribaux, O

2010-10-01

Familial searching consists of searching for a full profile left at a crime scene in a National DNA Database (NDNAD). In this paper we are interested in the circumstance where no full match is returned, but a partial match is found between a database member's profile and the crime stain. Because close relatives share more of their DNA than unrelated persons, this partial match may indicate that the crime stain was left by a close relative of the person with whom the partial match was found. This approach has successfully solved important crimes in the UK and the USA. In a previous paper, a model, which takes into account substructure and siblings, was used to simulate a NDNAD. In this paper, we have used this model to test the usefulness of familial searching and offer guidelines for pre-assessment of the cases based on the likelihood ratio. Siblings of "persons" present in the simulated Swiss NDNAD were created. These profiles (N=10,000) were used as traces and were then compared to the whole database (N=100,000). The statistical results obtained show that the technique has great potential confirming the findings of previous studies. However, effectiveness of the technique is only one part of the story. Familial searching has juridical and ethical aspects that should not be ignored. In Switzerland for example, there are no specific guidelines to the legality or otherwise of familial searching. This article both presents statistical results, and addresses criminological and civil liberties aspects to take into account risks and benefits of familial searching. Copyright © 2009 Elsevier Ireland Ltd. All rights reserved.
76 FR 60031 - Notice of Order: Revisions to Enterprise Public Use Database Incorporating High-Cost Single...

Federal Register 2010, 2011, 2012, 2013, 2014

2011-09-28

... Database Incorporating High-Cost Single-Family Securitized Loan Data Fields and Technical Data Field... single-family matrix in FHFA's Public Use Database (PUDB) to include data fields for the high-cost single... of loan attributes in FHFA's databases that could be used, singularly or in some combination, to...
Genomic Approach to Understand the Association of DNA Repair with Longevity and Healthy Aging Using Genomic Databases of Oldest-Old Population

PubMed Central

Kim, Hyun Soo

2018-01-01

Aged population is increasing worldwide due to the aging process that is inevitable. Accordingly, longevity and healthy aging have been spotlighted to promote social contribution of aged population. Many studies in the past few decades have reported the process of aging and longevity, emphasizing the importance of maintaining genomic stability in exceptionally long-lived population. Underlying reason of longevity remains unclear due to its complexity involving multiple factors. With advances in sequencing technology and human genome-associated approaches, studies based on population-based genomic studies are increasing. In this review, we summarize recent longevity and healthy aging studies of human population focusing on DNA repair as a major factor in maintaining genome integrity. To keep pace with recent growth in genomic research, aging- and longevity-associated genomic databases are also briefly introduced. To suggest novel approaches to investigate longevity-associated genetic variants related to DNA repair using genomic databases, gene set analysis was conducted, focusing on DNA repair- and longevity-associated genes. Their biological networks were additionally analyzed to grasp major factors containing genetic variants of human longevity and healthy aging in DNA repair mechanisms. In summary, this review emphasizes DNA repair activity in human longevity and suggests approach to conduct DNA repair-associated genomic study on human healthy aging.
Microcomputer Database Management Systems that Interface with Online Public Access Catalogs.

ERIC Educational Resources Information Center

Rice, James

1988-01-01

Describes a study that assessed the availability and use of microcomputer database management interfaces to online public access catalogs. The software capabilities needed to effect such an interface are identified, and available software packages are evaluated by these criteria. A directory of software vendors is provided. (4 notes with…
BioQ: tracing experimental origins in public genomic databases using a novel data provenance model

PubMed Central

Saccone, Scott F.; Quan, Jiaxi; Jones, Peter L.

2012-01-01

Motivation: Public genomic databases, which are often used to guide genetic studies of human disease, are now being applied to genomic medicine through in silico integrative genomics. These databases, however, often lack tools for systematically determining the experimental origins of the data. Results: We introduce a new data provenance model that we have implemented in a public web application, BioQ, for assessing the reliability of the data by systematically tracing its experimental origins to the original subjects and biologics. BioQ allows investigators to both visualize data provenance as well as explore individual elements of experimental process flow using precise tools for detailed data exploration and documentation. It includes a number of human genetic variation databases such as the HapMap and 1000 Genomes projects. Availability and implementation: BioQ is freely available to the public at http://bioq.saclab.net Contact: ssaccone@wustl.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22426342
UniPROBE, update 2015: new tools and content for the online database of protein-binding microarray data on protein-DNA interactions.

PubMed

Hume, Maxwell A; Barrera, Luis A; Gisselbrecht, Stephen S; Bulyk, Martha L

2015-01-01

The Universal PBM Resource for Oligonucleotide Binding Evaluation (UniPROBE) serves as a convenient source of information on published data generated using universal protein-binding microarray (PBM) technology, which provides in vitro data about the relative DNA-binding preferences of transcription factors for all possible sequence variants of a length k ('k-mers'). The database displays important information about the proteins and displays their DNA-binding specificity data in terms of k-mers, position weight matrices and graphical sequence logos. This update to the database documents the growth of UniPROBE since the last update 4 years ago, and introduces a variety of new features and tools, including a new streamlined pipeline that facilitates data deposition by universal PBM data generators in the research community, a tool that generates putative nonbinding (i.e. negative control) DNA sequences for one or more proteins and novel motifs obtained by analyzing the PBM data using the BEEML-PBM algorithm for motif inference. The UniPROBE database is available at http://uniprobe.org. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Characterization and compilation of polymorphic simple sequence repeat (SSR) markers of peanut from public database

PubMed Central

2012-01-01

Background There are several reports describing thousands of SSR markers in the peanut (Arachis hypogaea L.) genome. There is a need to integrate various research reports of peanut DNA polymorphism into a single platform. Further, because of lack of uniformity in the labeling of these markers across the publications, there is some confusion on the identities of many markers. We describe below an effort to develop a central comprehensive database of polymorphic SSR markers in peanut. Findings We compiled 1,343 SSR markers as detecting polymorphism (14.5%) within a total of 9,274 markers. Amongst all polymorphic SSRs examined, we found that AG motif (36.5%) was the most abundant followed by AAG (12.1%), AAT (10.9%), and AT (10.3%).The mean length of SSR repeats in dinucleotide SSRs was significantly longer than that in trinucleotide SSRs. Dinucleotide SSRs showed higher polymorphism frequency for genomic SSRs when compared to trinucleotide SSRs, while for EST-SSRs, the frequency of polymorphic SSRs was higher in trinucleotide SSRs than in dinucleotide SSRs. The correlation of the length of SSR and the frequency of polymorphism revealed that the frequency of polymorphism was decreased as motif repeat number increased. Conclusions The assembled polymorphic SSRs would enhance the density of the existing genetic maps of peanut, which could also be a useful source of DNA markers suitable for high-throughput QTL mapping and marker-assisted selection in peanut improvement and thus would be of value to breeders. PMID:22818284

NCI at Frederick Scientific Library Reintroduces Scientific Publications Database | Poster

Cancer.gov

A 20-year-old database of scientific publications by NCI at Frederick, FNLCR, and affiliated employees has gotten a significant facelift. Maintained by the Scientific Library, the redesigned database—which is linked from each of the Scientific Library’s web pages—offers features that were not available in previous versions, such as additional search limits and non-traditional metrics for scholarly and scientific publishing known as altmetrics.
A public HTLV-1 molecular epidemiology database for sequence management and data mining.

PubMed

Araujo, Thessika Hialla Almeida; Souza-Brito, Leandro Inacio; Libin, Pieter; Deforche, Koen; Edwards, Dustin; de Albuquerque-Junior, Antonio Eduardo; Vandamme, Anne-Mieke; Galvao-Castro, Bernardo; Alcantara, Luiz Carlos Junior

2012-01-01

It is estimated that 15 to 20 million people are infected with the human T-cell lymphotropic virus type 1 (HTLV-1). At present, there are more than 2,000 unique HTLV-1 isolate sequences published. A central database to aggregate sequence information from a range of epidemiological aspects including HTLV-1 infections, pathogenesis, origins, and evolutionary dynamics would be useful to scientists and physicians worldwide. Described here, we have developed a database that collects and annotates sequence data and can be accessed through a user-friendly search interface. The HTLV-1 Molecular Epidemiology Database website is available at http://htlv1db.bahia.fiocruz.br/. All data was obtained from publications available at GenBank or through contact with the authors. The database was developed using Apache Webserver 2.1.6 and SGBD MySQL. The webpage interfaces were developed in HTML and sever-side scripting written in PHP. The HTLV-1 Molecular Epidemiology Database is hosted on the Gonçalo Moniz/FIOCRUZ Research Center server. There are currently 2,457 registered sequences with 2,024 (82.37%) of those sequences representing unique isolates. Of these sequences, 803 (39.67%) contain information about clinical status (TSP/HAM, 17.19%; ATL, 7.41%; asymptomatic, 12.89%; other diseases, 2.17%; and no information, 60.32%). Further, 7.26% of sequences contain information on patient gender while 5.23% of sequences provide the age of the patient. The HTLV-1 Molecular Epidemiology Database retrieves and stores annotated HTLV-1 proviral sequences from clinical, epidemiological, and geographical studies. The collected sequences and related information are now accessible on a publically available and user-friendly website. This open-access database will support clinical research and vaccine development related to viral genotype.
GABI-Kat SimpleSearch: new features of the Arabidopsis thaliana T-DNA mutant database.

PubMed

Kleinboelting, Nils; Huep, Gunnar; Kloetgen, Andreas; Viehoever, Prisca; Weisshaar, Bernd

2012-01-01

T-DNA insertion mutants are very valuable for reverse genetics in Arabidopsis thaliana. Several projects have generated large sequence-indexed collections of T-DNA insertion lines, of which GABI-Kat is the second largest resource worldwide. User access to the collection and its Flanking Sequence Tags (FSTs) is provided by the front end SimpleSearch (http://www.GABI-Kat.de). Several significant improvements have been implemented recently. The database now relies on the TAIRv10 genome sequence and annotation dataset. All FSTs have been newly mapped using an optimized procedure that leads to improved accuracy of insertion site predictions. A fraction of the collection with weak FST yield was re-analysed by generating new FSTs. Along with newly found predictions for older sequences about 20,000 new FSTs were included in the database. Information about groups of FSTs pointing to the same insertion site that is found in several lines but is real only in a single line are included, and many problematic FST-to-line links have been corrected using new wet-lab data. SimpleSearch currently contains data from ~71,000 lines with predicted insertions covering 62.5% of the 27,206 nuclear protein coding genes, and offers insertion allele-specific data from 9545 confirmed lines that are available from the Nottingham Arabidopsis Stock Centre.
The Histone Database: an integrated resource for histones and histone fold-containing proteins

PubMed Central

Mariño-Ramírez, Leonardo; Levine, Kevin M.; Morales, Mario; Zhang, Suiyuan; Moreland, R. Travis; Baxevanis, Andreas D.; Landsman, David

2011-01-01

Eukaryotic chromatin is composed of DNA and protein components—core histones—that act to compactly pack the DNA into nucleosomes, the fundamental building blocks of chromatin. These nucleosomes are connected to adjacent nucleosomes by linker histones. Nucleosomes are highly dynamic and, through various core histone post-translational modifications and incorporation of diverse histone variants, can serve as epigenetic marks to control processes such as gene expression and recombination. The Histone Sequence Database is a curated collection of sequences and structures of histones and non-histone proteins containing histone folds, assembled from major public databases. Here, we report a substantial increase in the number of sequences and taxonomic coverage for histone and histone fold-containing proteins available in the database. Additionally, the database now contains an expanded dataset that includes archaeal histone sequences. The database also provides comprehensive multiple sequence alignments for each of the four core histones (H2A, H2B, H3 and H4), the linker histones (H1/H5) and the archaeal histones. The database also includes current information on solved histone fold-containing structures. The Histone Sequence Database is an inclusive resource for the analysis of chromatin structure and function focused on histones and histone fold-containing proteins. Database URL: The Histone Sequence Database is freely available and can be accessed at http://research.nhgri.nih.gov/histones/. PMID:22025671
Addition of a breeding database in the Genome Database for Rosaceae

PubMed Central

Evans, Kate; Jung, Sook; Lee, Taein; Brutcher, Lisa; Cho, Ilhyung; Peace, Cameron; Main, Dorrie

2013-01-01

Breeding programs produce large datasets that require efficient management systems to keep track of performance, pedigree, geographical and image-based data. With the development of DNA-based screening technologies, more breeding programs perform genotyping in addition to phenotyping for performance evaluation. The integration of breeding data with other genomic and genetic data is instrumental for the refinement of marker-assisted breeding tools, enhances genetic understanding of important crop traits and maximizes access and utility by crop breeders and allied scientists. Development of new infrastructure in the Genome Database for Rosaceae (GDR) was designed and implemented to enable secure and efficient storage, management and analysis of large datasets from the Washington State University apple breeding program and subsequently expanded to fit datasets from other Rosaceae breeders. The infrastructure was built using the software Chado and Drupal, making use of the Natural Diversity module to accommodate large-scale phenotypic and genotypic data. Breeders can search accessions within the GDR to identify individuals with specific trait combinations. Results from Search by Parentage lists individuals with parents in common and results from Individual Variety pages link to all data available on each chosen individual including pedigree, phenotypic and genotypic information. Genotypic data are searchable by markers and alleles; results are linked to other pages in the GDR to enable the user to access tools such as GBrowse and CMap. This breeding database provides users with the opportunity to search datasets in a fully targeted manner and retrieve and compare performance data from multiple selections, years and sites, and to output the data needed for variety release publications and patent applications. The breeding database facilitates efficient program management. Storing publicly available breeding data in a database together with genomic and genetic data will
Coverage and quality: A comparison of Web of Science and Scopus databases for reporting faculty nursing publication metrics.

PubMed

Powell, Kimberly R; Peterson, Shenita R

Web of Science and Scopus are the leading databases of scholarly impact. Recent studies outside the field of nursing report differences in journal coverage and quality. A comparative analysis of nursing publications reported impact. Journal coverage by each database for the field of nursing was compared. Additionally, publications by 2014 nursing faculty were collected in both databases and compared for overall coverage and reported quality, as modeled by Scimajo Journal Rank, peer review status, and MEDLINE inclusion. Individual author impact, modeled by the h-index, was calculated by each database for comparison. Scopus offered significantly higher journal coverage. For 2014 faculty publications, 100% of journals were found in Scopus, Web of Science offered 82%. No significant difference was found in the quality of reported journals. Author h-index was found to be higher in Scopus. When reporting faculty publications and scholarly impact, academic nursing programs may be better represented by Scopus, without compromising journal quality. Programs with strong interdisciplinary work should examine all areas of strength to ensure appropriate coverage. Copyright © 2017 Elsevier Inc. All rights reserved.
Quantitative assessment of the expanding complementarity between public and commercial databases of bioactive compounds.

PubMed

Southan, Christopher; Várkonyi, Péter; Muresan, Sorel

2009-07-06

Since 2004 public cheminformatic databases and their collective functionality for exploring relationships between compounds, protein sequences, literature and assay data have advanced dramatically. In parallel, commercial sources that extract and curate such relationships from journals and patents have also been expanding. This work updates a previous comparative study of databases chosen because of their bioactive content, availability of downloads and facility to select informative subsets. Where they could be calculated, extracted compounds-per-journal article were in the range of 12 to 19 but compound-per-protein counts increased with document numbers. Chemical structure filtration to facilitate standardised comparisons typically reduced source counts by between 5% and 30%. The pair-wise overlaps between 23 databases and subsets were determined, as well as changes between 2006 and 2008. While all compound sets have increased, PubChem has doubled to 14.2 million. The 2008 comparison matrix shows not only overlap but also unique content across all sources. Many of the detailed differences could be attributed to individual strategies for data selection and extraction. While there was a big increase in patent-derived structures entering PubChem since 2006, GVKBIO contains over 0.8 million unique structures from this source. Venn diagrams showed extensive overlap between compounds extracted by independent expert curation from journals by GVKBIO, WOMBAT (both commercial) and BindingDB (public) but each included unique content. In contrast, the approved drug collections from GVKBIO, MDDR (commercial) and DrugBank (public) showed surprisingly low overlap. Aggregating all commercial sources established that while 1 million compounds overlapped with PubChem 1.2 million did not. On the basis of chemical structure content per se public sources have covered an increasing proportion of commercial databases over the last two years. However, commercial products included in
DNA Data Bank of Japan

PubMed Central

Mashima, Jun; Kodama, Yuichi; Fujisawa, Takatomo; Katayama, Toshiaki; Okuda, Yoshihiro; Kaminuma, Eli; Ogasawara, Osamu; Okubo, Kousaku; Nakamura, Yasukazu; Takagi, Toshihisa

2017-01-01

The DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) has been providing public data services for thirty years (since 1987). We are collecting nucleotide sequence data from researchers as a member of the International Nucleotide Sequence Database Collaboration (INSDC, http://www.insdc.org), in collaboration with the US National Center for Biotechnology Information (NCBI) and European Bioinformatics Institute (EBI). The DDBJ Center also services Japanese Genotype-phenotype Archive (JGA), with the National Bioscience Database Center to collect human-subjected data from Japanese researchers. Here, we report our database activities for INSDC and JGA over the past year, and introduce retrieval and analytical services running on our supercomputer system and their recent modifications. Furthermore, with the Database Center for Life Science, the DDBJ Center improves semantic web technologies to integrate and to share biological data, for providing the RDF version of the sequence data. PMID:27924010
Morchella MLST database

USDA-ARS?s Scientific Manuscript database

Welcome to the Morchella MLST database. This dedicated database was set up at the CBS-KNAW Biodiversity Center by Vincent Robert in February 2012, using BioloMICS software (Robert et al., 2011), to facilitate DNA sequence-based identifications of Morchella species via the Internet. The current datab...
Fluorescence- and capillary electrophoresis (CE)-based SSR DNA fingerprinting and a molecular identity database for the Louisiana sugarcane industry

USDA-ARS?s Scientific Manuscript database

A database of Louisiana sugarcane molecular identity has been constructed and is being updated annually using FAM or HEX or NED fluorescence- and capillary electrophoresis (CE)-based microsatellite (SSR) fingerprinting information. The fingerprints are PCR-amplified from leaf DNA samples of current ...
Application of cytochrome b DNA sequences for the authentication of endangered snake species.

PubMed

Wong, Ka-Lok; Wang, Jun; But, Paul Pui-Hay; Shaw, Pang-Chui

2004-01-06

In order to enforce the conservation program and curbing the illegal trading and consumption of endangered snake species, the value of cytochrome b sequence in the authentication of snake species was evaluated. As an illustration, DNA was extracted, selected cytochrome b DNA sequences amplified and sequenced from six snakes commonly consumed in Hong Kong. Cataloging with sequences available in public, a cytochrome b database containing 90 species of snakes was constructed. In this database, sequence homology between snakes ranged from 70.68 to 95.11%. On the other hand, intraspecific variation of three tested snakes was 0-0.98%. Using the database, we were able to determine the identity of six meat samples confiscated by the Agriculture, Fisheries and Conservation Department, HKSAR.
Information technologies in public health management: a database on biocides to improve quality of life.

PubMed

Roman, C; Scripcariu, L; Diaconescu, Rm; Grigoriu, A

2012-01-01

Biocides for prolonging the shelf life of a large variety of materials have been extensively used over the last decades. It has estimated that the worldwide biocide consumption to be about 12.4 billion dollars in 2011, and is expected to increase in 2012. As biocides are substances we get in contact with in our everyday lives, access to this type of information is of paramount importance in order to ensure an appropriate living environment. Consequently, a database where information may be quickly processed, sorted, and easily accessed, according to different search criteria, is the most desirable solution. The main aim of this work was to design and implement a relational database with complete information about biocides used in public health management to improve the quality of life. Design and implementation of a relational database for biocides, by using the software "phpMyAdmin". A database, which allows for an efficient collection, storage, and management of information including chemical properties and applications of a large quantity of biocides, as well as its adequate dissemination into the public health environment. The information contained in the database herein presented promotes an adequate use of biocides, by means of information technologies, which in consequence may help achieve important improvement in our quality of life.
Kazusa Marker DataBase: a database for genomics, genetics, and molecular breeding in plants.

PubMed

Shirasawa, Kenta; Isobe, Sachiko; Tabata, Satoshi; Hirakawa, Hideki

2014-09-01

In order to provide useful genomic information for agronomical plants, we have established a database, the Kazusa Marker DataBase (http://marker.kazusa.or.jp). This database includes information on DNA markers, e.g., SSR and SNP markers, genetic linkage maps, and physical maps, that were developed at the Kazusa DNA Research Institute. Keyword searches for the markers, sequence data used for marker development, and experimental conditions are also available through this database. Currently, 10 plant species have been targeted: tomato (Solanum lycopersicum), pepper (Capsicum annuum), strawberry (Fragaria × ananassa), radish (Raphanus sativus), Lotus japonicus, soybean (Glycine max), peanut (Arachis hypogaea), red clover (Trifolium pratense), white clover (Trifolium repens), and eucalyptus (Eucalyptus camaldulensis). In addition, the number of plant species registered in this database will be increased as our research progresses. The Kazusa Marker DataBase will be a useful tool for both basic and applied sciences, such as genomics, genetics, and molecular breeding in crops.
Comparative Study of Seven Commercial Kits for Human DNA Extraction from Urine Samples Suitable for DNA Biomarker-Based Public Health Studies

PubMed Central

El Bali, Latifa; Diman, Aurélie; Bernard, Alfred; Roosens, Nancy H. C.; De Keersmaecker, Sigrid C. J.

2014-01-01

Human genomic DNA extracted from urine could be an interesting tool for large-scale public health studies involving characterization of genetic variations or DNA biomarkers as a result of the simple and noninvasive collection method. These studies, involving many samples, require a rapid, easy, and standardized extraction protocol. Moreover, for practicability, there is a necessity to collect urine at a moment different from the first void and to store it appropriately until analysis. The present study compared seven commercial kits to select the most appropriate urinary human DNA extraction procedure for epidemiological studies. DNA yield has been determined using different quantification methods: two classical, i.e., NanoDrop and PicoGreen, and two species-specific real-time quantitative (q)PCR assays, as DNA extracted from urine contains, besides human, microbial DNA also, which largely contributes to the total DNA yield. In addition, the kits giving a good yield were also tested for the presence of PCR inhibitors. Further comparisons were performed regarding the sampling time and the storage conditions. Finally, as a proof-of-concept, an important gene related to smoking has been genotyped using the developed tools. We could select one well-performing kit for the human DNA extraction from urine suitable for molecular diagnostic real-time qPCR-based assays targeting genetic variations, applicable to large-scale studies. In addition, successful genotyping was possible using DNA extracted from urine stored at −20°C for several months, and an acceptable yield could also be obtained from urine collected at different moments during the day, which is particularly important for public health studies. PMID:25365790
SkyDOT: a publicly accessible variability database, containing multiple sky surveys and real-time data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Starr, D. L.; Wozniak, P. R.; Vestrand, W. T.

2002-01-01

SkyDOT (Sky Database for Objects in Time-Domain) is a Virtual Observatory currently comprised of data from the RAPTOR, ROTSE I, and OGLE I1 survey projects. This makes it a very large time domain database. In addition, the RAPTOR project provides SkyDOT with real-time variability data as well as stereoscopic information. With its web interface, we believe SkyDOT will be a very useful tool for both astronomers, and the public. Our main task has been to construct an efficient relational database containing all existing data, while handling a real-time inflow of data. We also provide a useful web interface allowing easymore » access to both astronomers and the public. Initially, this server will allow common searches, specific queries, and access to light curves. In the future we will include machine learning classification tools and access to spectral information.« less
NCBI GEO: mining millions of expression profiles--database and tools.

PubMed

Barrett, Tanya; Suzek, Tugba O; Troup, Dennis B; Wilhite, Stephen E; Ngau, Wing-Chi; Ledoux, Pierre; Rudnev, Dmitry; Lash, Alex E; Fujibuchi, Wataru; Edgar, Ron

2005-01-01

The Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) is the largest fully public repository for high-throughput molecular abundance data, primarily gene expression data. The database has a flexible and open design that allows the submission, storage and retrieval of many data types. These data include microarray-based experiments measuring the abundance of mRNA, genomic DNA and protein molecules, as well as non-array-based technologies such as serial analysis of gene expression (SAGE) and mass spectrometry proteomic technology. GEO currently holds over 30,000 submissions representing approximately half a billion individual molecular abundance measurements, for over 100 organisms. Here, we describe recent database developments that facilitate effective mining and visualization of these data. Features are provided to examine data from both experiment- and gene-centric perspectives using user-friendly Web-based interfaces accessible to those without computational or microarray-related analytical expertise. The GEO database is publicly accessible through the World Wide Web at http://www.ncbi.nlm.nih.gov/geo.
Public Perceptions and Expectations of the Forensic Use of DNA: Results of a Preliminary Study

ERIC Educational Resources Information Center

Curtis, Cate

2009-01-01

The forensic use of Deoxyribonucleic Acid (DNA) is demonstrating significant success as a crime-solving tool. However, numerous concerns have been raised regarding the potential for DNA use to contravene cultural, ethical, and legal codes. In this article the expectations and level of knowledge of the New Zealand public of the DNA data-bank and…
A public database of macromolecular diffraction experiments.

PubMed

Grabowski, Marek; Langner, Karol M; Cymborowski, Marcin; Porebski, Przemyslaw J; Sroka, Piotr; Zheng, Heping; Cooper, David R; Zimmerman, Matthew D; Elsliger, Marc André; Burley, Stephen K; Minor, Wladek

2016-11-01

The low reproducibility of published experimental results in many scientific disciplines has recently garnered negative attention in scientific journals and the general media. Public transparency, including the availability of `raw' experimental data, will help to address growing concerns regarding scientific integrity. Macromolecular X-ray crystallography has led the way in requiring the public dissemination of atomic coordinates and a wealth of experimental data, making the field one of the most reproducible in the biological sciences. However, there remains no mandate for public disclosure of the original diffraction data. The Integrated Resource for Reproducibility in Macromolecular Crystallography (IRRMC) has been developed to archive raw data from diffraction experiments and, equally importantly, to provide related metadata. Currently, the database of our resource contains data from 2920 macromolecular diffraction experiments (5767 data sets), accounting for around 3% of all depositions in the Protein Data Bank (PDB), with their corresponding partially curated metadata. IRRMC utilizes distributed storage implemented using a federated architecture of many independent storage servers, which provides both scalability and sustainability. The resource, which is accessible via the web portal at http://www.proteindiffraction.org, can be searched using various criteria. All data are available for unrestricted access and download. The resource serves as a proof of concept and demonstrates the feasibility of archiving raw diffraction data and associated metadata from X-ray crystallographic studies of biological macromolecules. The goal is to expand this resource and include data sets that failed to yield X-ray structures in order to facilitate collaborative efforts that will improve protein structure-determination methods and to ensure the availability of `orphan' data left behind for various reasons by individual investigators and/or extinct structural genomics
Bayesian approach to transforming public gene expression repositories into disease diagnosis databases.

PubMed

Huang, Haiyan; Liu, Chun-Chi; Zhou, Xianghong Jasmine

2010-04-13

The rapid accumulation of gene expression data has offered unprecedented opportunities to study human diseases. The National Center for Biotechnology Information Gene Expression Omnibus is currently the largest database that systematically documents the genome-wide molecular basis of diseases. However, thus far, this resource has been far from fully utilized. This paper describes the first study to transform public gene expression repositories into an automated disease diagnosis database. Particularly, we have developed a systematic framework, including a two-stage Bayesian learning approach, to achieve the diagnosis of one or multiple diseases for a query expression profile along a hierarchical disease taxonomy. Our approach, including standardizing cross-platform gene expression data and heterogeneous disease annotations, allows analyzing both sources of information in a unified probabilistic system. A high level of overall diagnostic accuracy was shown by cross validation. It was also demonstrated that the power of our method can increase significantly with the continued growth of public gene expression repositories. Finally, we showed how our disease diagnosis system can be used to characterize complex phenotypes and to construct a disease-drug connectivity map.
The Hawaiian Freshwater Algal Database (HfwADB): a laboratory LIMS and online biodiversity resource

PubMed Central

2012-01-01

Background Biodiversity databases serve the important role of highlighting species-level diversity from defined geographical regions. Databases that are specially designed to accommodate the types of data gathered during regional surveys are valuable in allowing full data access and display to researchers not directly involved with the project, while serving as a Laboratory Information Management System (LIMS). The Hawaiian Freshwater Algal Database, or HfwADB, was modified from the Hawaiian Algal Database to showcase non-marine algal specimens collected from the Hawaiian Archipelago by accommodating the additional level of organization required for samples including multiple species. Description The Hawaiian Freshwater Algal Database is a comprehensive and searchable database containing photographs and micrographs of samples and collection sites, geo-referenced collecting information, taxonomic data and standardized DNA sequence data. All data for individual samples are linked through unique 10-digit accession numbers (“Isolate Accession”), the first five of which correspond to the collection site (“Environmental Accession”). Users can search online for sample information by accession number, various levels of taxonomy, habitat or collection site. HfwADB is hosted at the University of Hawaii, and was made publicly accessible in October 2011. At the present time the database houses data for over 2,825 samples of non-marine algae from 1,786 collection sites from the Hawaiian Archipelago. These samples include cyanobacteria, red and green algae and diatoms, as well as lesser representation from some other algal lineages. Conclusions HfwADB is a digital repository that acts as a Laboratory Information Management System for Hawaiian non-marine algal data. Users can interact with the repository through the web to view relevant habitat data (including geo-referenced collection locations) and download images of collection sites, specimen photographs and

Publication proportions for registered breast cancer trials: before and following the introduction of the ClinicalTrials.gov results database.

PubMed

Asiimwe, Innocent Gerald; Rumona, Dickson

2016-01-01

To limit selective and incomplete publication of the results of clinical trials, registries including ClinicalTrials.gov were introduced. The ClinicalTrials.gov registry added a results database in 2008 to enable researchers to post the results of their trials as stipulated by the Food and Drug Administration Amendment Act of 2007. This study aimed to determine the direction and magnitude of any change in publication proportions of registered breast cancer trials that occurred since the inception of the ClinicalTrials.gov results database. A cross-sectional study design was employed using ClinicalTrials.gov, a publicly available registry/results database as the primary data source. Registry contents under the subcategories 'Breast Neoplasms' and 'Breast Neoplasms, Male' were downloaded on 1 August 2015. A literature search for included trials was afterwards conducted using MEDLINE and DISCOVER databases to determine publication status of the registered breast cancer trials. Nearly half (168/340) of the listed trials had been published, with a median time to publication of 24 months (Q1 = 14 months, Q3 = 42 months). Only 86 trials were published within 24 months of completion. There was no significant increase in publication proportions of trials that were completed before the introduction of the results database compared to those completed after (OR = 1.00, 95 % CI = .61 to 1.63; adjusted OR = 0.84, 95 % CI = .51 to 1.39). Characteristics associated with publication included trial type (observational versus interventional adjusted OR = .28, 95 % CI = .10 to .74) and completion/termination status (terminated versus completed adjusted OR = .22, 95 % CI = .09 to .51). Less than a half of breast cancer trials registered in ClinicalTrials.gov are published in peer-reviewed journals.
Public involvement in pharmacogenomics research: a national survey on patients' attitudes towards pharmacogenomics research and the willingness to donate DNA samples to a DNA bank in Japan.

PubMed

Kobayashi, Eriko; Sakurada, Tomoya; Ueda, Shiro; Satoh, Nobunori

2011-05-01

To assess the attitude of Japanese patients towards pharmacogenomics research and a DNA bank for identifying genomic markers associated with adverse drug reactions (ADRs) and their willingness to donate DNA samples, we conducted a survey of 550 male and female patients. The majority of the respondents showed a positive attitude towards pharmacogenomics research (87.6%) and a DNA bank (75.1%). The willingness to donate DNA samples when experiencing severe ADRs (55.8%) was higher than when taking medications (40.4%). Positive attitudes towards a DNA bank and organ donation were significantly associated with an increased willingness to donate. Though the level of positive attitude in the patient population was higher than that in the general public in our former study (81.0 and 70.4%, respectively), the level of the willingness of patients to donate was 40.4% when taking medications and 55.8% when experiencing severe ADRs which was lower than that of the general public in our former study (45.3 and 61.7%). The results suggested that the level of true willingness in the patient population was lower than that of the general public considering the fictitious situation presented to the public (to suppose that they were patients receiving medication). It is important to assess the willingness of patients who are true potential donors, not the general public.
A Public Database of Memory and Naive B-Cell Receptor Sequences.

PubMed

DeWitt, William S; Lindau, Paul; Snyder, Thomas M; Sherwood, Anna M; Vignali, Marissa; Carlson, Christopher S; Greenberg, Philip D; Duerkopp, Natalie; Emerson, Ryan O; Robins, Harlan S

2016-01-01

The vast diversity of B-cell receptors (BCR) and secreted antibodies enables the recognition of, and response to, a wide range of epitopes, but this diversity has also limited our understanding of humoral immunity. We present a public database of more than 37 million unique BCR sequences from three healthy adult donors that is many fold deeper than any existing resource, together with a set of online tools designed to facilitate the visualization and analysis of the annotated data. We estimate the clonal diversity of the naive and memory B-cell repertoires of healthy individuals, and provide a set of examples that illustrate the utility of the database, including several views of the basic properties of immunoglobulin heavy chain sequences, such as rearrangement length, subunit usage, and somatic hypermutation positions and dynamics.
Publication trend, resource utilization, and impact of the US National Cancer Database: A systematic review.

PubMed

Su, Chang; Peng, Cuiying; Agbodza, Ena; Bai, Harrison X; Huang, Yuqian; Karakousis, Giorgos; Zhang, Paul J; Zhang, Zishu

2018-03-01

The utilization and impact of the studies published using the National Cancer Database (NCDB) is currently unclear. In this study, we aim to characterize the published studies, and identify relatively unexplored areas for future investigations. A literature search was performed using PubMed in January 2017 to identify all papers published using NCDB data. Characteristics of the publications were extracted. Citation frequencies were obtained through the Web of Science. Three hundred 2 articles written by 230 first authors met the inclusion criteria. The number of publications grew exponentially since 2013, with 108 articles published in 2016. Articles were published in 86 journals. The majority of the published papers focused on digestive system cancer, while bone and joints, eye and orbit, myeloma, mesothelioma, and Kaposi Sarcoma were never studied. Thirteen institutions in the United States were associated with more than 5 publications. The papers have been cited for a total of 9858 times since the publication of the first paper in 1992. Frequently appearing keywords congregated into 3 clusters: "demographics," "treatments and survival," and "statistical analysis method." Even though the main focuses of the articles captured a extremely wide range, they can be classified into 2 main categories: survival analysis and characterization. Other focuses include database(s) analysis and/or comparison, and hospital reporting. The surging interest in the use of NCDB is accompanied by unequal utilization of resources by individuals and institutions. Certain areas were relatively understudied and should be further explored.
GigaTON: an extensive publicly searchable database providing a new reference transcriptome in the pacific oyster Crassostrea gigas.

PubMed

Riviere, Guillaume; Klopp, Christophe; Ibouniyamine, Nabihoudine; Huvet, Arnaud; Boudry, Pierre; Favrel, Pascal

2015-12-02

The Pacific oyster, Crassostrea gigas, is one of the most important aquaculture shellfish resources worldwide. Important efforts have been undertaken towards a better knowledge of its genome and transcriptome, which makes now C. gigas becoming a model organism among lophotrochozoans, the under-described sister clade of ecdysozoans within protostomes. These massive sequencing efforts offer the opportunity to assemble gene expression data and make such resource accessible and exploitable for the scientific community. Therefore, we undertook this assembly into an up-to-date publicly available transcriptome database: the GigaTON (Gigas TranscriptOme pipeliNe) database. We assembled 2204 million sequences obtained from 114 publicly available RNA-seq libraries that were realized using all embryo-larval development stages, adult organs, different environmental stressors including heavy metals, temperature, salinity and exposure to air, which were mostly performed as part of the Crassostrea gigas genome project. This data was analyzed in silico and resulted into 56621 newly assembled contigs that were deposited into a publicly available database, the GigaTON database. This database also provides powerful and user-friendly request tools to browse and retrieve information about annotation, expression level, UTRs, splice and polymorphism, and gene ontology associated to all the contigs into each, and between all libraries. The GigaTON database provides a convenient, potent and versatile interface to browse, retrieve, confront and compare massive transcriptomic information in an extensive range of conditions, tissues and developmental stages in Crassostrea gigas. To our knowledge, the GigaTON database constitutes the most extensive transcriptomic database to date in marine invertebrates, thereby a new reference transcriptome in the oyster, a highly valuable resource to physiologists and evolutionary biologists.
GlycomeDB – integration of open-access carbohydrate structure databases

PubMed Central

Ranzinger, René; Herget, Stephan; Wetter, Thomas; von der Lieth, Claus-Wilhelm

2008-01-01

Background Although carbohydrates are the third major class of biological macromolecules, after proteins and DNA, there is neither a comprehensive database for carbohydrate structures nor an established universal structure encoding scheme for computational purposes. Funding for further development of the Complex Carbohydrate Structure Database (CCSD or CarbBank) ceased in 1997, and since then several initiatives have developed independent databases with partially overlapping foci. For each database, different encoding schemes for residues and sequence topology were designed. Therefore, it is virtually impossible to obtain an overview of all deposited structures or to compare the contents of the various databases. Results We have implemented procedures which download the structures contained in the seven major databases, e.g. GLYCOSCIENCES.de, the Consortium for Functional Glycomics (CFG), the Kyoto Encyclopedia of Genes and Genomes (KEGG) and the Bacterial Carbohydrate Structure Database (BCSDB). We have created a new database called GlycomeDB, containing all structures, their taxonomic annotations and references (IDs) for the original databases. More than 100000 datasets were imported, resulting in more than 33000 unique sequences now encoded in GlycomeDB using the universal format GlycoCT. Inconsistencies were found in all public databases, which were discussed and corrected in multiple feedback rounds with the responsible curators. Conclusion GlycomeDB is a new, publicly available database for carbohydrate sequences with a unified, all-encompassing structure encoding format and NCBI taxonomic referencing. The database is updated weekly and can be downloaded free of charge. The JAVA application GlycoUpdateDB is also available for establishing and updating a local installation of GlycomeDB. With the advent of GlycomeDB, the distributed islands of knowledge in glycomics are now bridged to form a single resource. PMID:18803830
The EpiSLI Database: A Publicly Available Database on Speech and Language

ERIC Educational Resources Information Center

Tomblin, J. Bruce

2010-01-01

Purpose: This article describes a database that was created in the process of conducting a large-scale epidemiologic study of specific language impairment (SLI). As such, this database will be referred to as the EpiSLI database. Children with SLI have unexpected and unexplained difficulties learning and using spoken language. Although there is no…
Development and Application of a Salmonid EST Database and cDNA Microarray: Data Mining and Interspecific Hybridization Characteristics

PubMed Central

Rise, Matthew L.; von Schalburg, Kristian R.; Brown, Gordon D.; Mawer, Melanie A.; Devlin, Robert H.; Kuipers, Nathanael; Busby, Maura; Beetz-Sargent, Marianne; Alberto, Roberto; Gibbs, A. Ross; Hunt, Peter; Shukin, Robert; Zeznik, Jeffrey A.; Nelson, Colleen; Jones, Simon R.M.; Smailus, Duane E.; Jones, Steven J.M.; Schein, Jacqueline E.; Marra, Marco A.; Butterfield, Yaron S.N.; Stott, Jeff M.; Ng, Siemon H.S.; Davidson, William S.; Koop, Ben F.

2004-01-01

We report 80,388 ESTs from 23 Atlantic salmon (Salmo salar) cDNA libraries (61,819 ESTs), 6 rainbow trout (Oncorhynchus mykiss) cDNA libraries (14,544 ESTs), 2 chinook salmon (Oncorhynchus tshawytscha) cDNA libraries (1317 ESTs), 2 sockeye salmon (Oncorhynchus nerka) cDNA libraries (1243 ESTs), and 2 lake whitefish (Coregonus clupeaformis) cDNA libraries (1465 ESTs). The majority of these are 3′ sequences, allowing discrimination between paralogs arising from a recent genome duplication in the salmonid lineage. Sequence assembly reveals 28,710 different S. salar, 8981 O. mykiss, 1085 O. tshawytscha, 520 O. nerka, and 1176 C. clupeaformis putative transcripts. We annotate the submitted portion of our EST database by molecular function. Higher- and lower-molecular-weight fractions of libraries are shown to contain distinct gene sets, and higher rates of gene discovery are associated with higher-molecular weight libraries. Pyloric caecum library group annotations indicate this organ may function in redox control and as a barrier against systemic uptake of xenobiotics. A microarray is described, containing 7356 salmonid elements representing 3557 different cDNAs. Analyses of cross-species hybridizations to this cDNA microarray indicate that this resource may be used for studies involving all salmonids. PMID:14962987
NPIDB: Nucleic acid-Protein Interaction DataBase.

PubMed

Kirsanov, Dmitry D; Zanegina, Olga N; Aksianov, Evgeniy A; Spirin, Sergei A; Karyagina, Anna S; Alexeevski, Andrei V

2013-01-01

The Nucleic acid-Protein Interaction DataBase (http://npidb.belozersky.msu.ru/) contains information derived from structures of DNA-protein and RNA-protein complexes extracted from the Protein Data Bank (3846 complexes in October 2012). It provides a web interface and a set of tools for extracting biologically meaningful characteristics of nucleoprotein complexes. The content of the database is updated weekly. The current version of the Nucleic acid-Protein Interaction DataBase is an upgrade of the version published in 2007. The improvements include a new web interface, new tools for calculation of intermolecular interactions, a classification of SCOP families that contains DNA-binding protein domains and data on conserved water molecules on the DNA-protein interface.
Evaluation of partial 16S ribosomal DNA sequencing for identification of nocardia species by using the MicroSeq 500 system with an expanded database.

PubMed

Cloud, Joann L; Conville, Patricia S; Croft, Ann; Harmsen, Dag; Witebsky, Frank G; Carroll, Karen C

2004-02-01

Identification of clinically significant nocardiae to the species level is important in patient diagnosis and treatment. A study was performed to evaluate Nocardia species identification obtained by partial 16S ribosomal DNA (rDNA) sequencing by the MicroSeq 500 system with an expanded database. The expanded portion of the database was developed from partial 5' 16S rDNA sequences derived from 28 reference strains (from the American Type Culture Collection and the Japanese Collection of Microorganisms). The expanded MicroSeq 500 system was compared to (i). conventional identification obtained from a combination of growth characteristics with biochemical and drug susceptibility tests; (ii). molecular techniques involving restriction enzyme analysis (REA) of portions of the 16S rRNA and 65-kDa heat shock protein genes; and (iii). when necessary, sequencing of a 999-bp fragment of the 16S rRNA gene. An unknown isolate was identified as a particular species if the sequence obtained by partial 16S rDNA sequencing by the expanded MicroSeq 500 system was 99.0% similar to that of the reference strain. Ninety-four nocardiae representing 10 separate species were isolated from patient specimens and examined by using the three different methods. Sequencing of partial 16S rDNA by the expanded MicroSeq 500 system resulted in only 72% agreement with conventional methods for species identification and 90% agreement with the alternative molecular methods. Molecular methods for identification of Nocardia species provide more accurate and rapid results than the conventional methods using biochemical and susceptibility testing. With an expanded database, the MicroSeq 500 system for partial 16S rDNA was able to correctly identify the human pathogens N. brasiliensis, N. cyriacigeorgica, N. farcinica, N. nova, N. otitidiscaviarum, and N. veterana.
Development of a Publicly Available, Comprehensive Database of Fiber and Health Outcomes: Rationale and Methods

PubMed Central

Livingston, Kara A.; Chung, Mei; Sawicki, Caleigh M.; Lyle, Barbara J.; Wang, Ding Ding; Roberts, Susan B.; McKeown, Nicola M.

2016-01-01

Background Dietary fiber is a broad category of compounds historically defined as partially or completely indigestible plant-based carbohydrates and lignin with, more recently, the additional criteria that fibers incorporated into foods as additives should demonstrate functional human health outcomes to receive a fiber classification. Thousands of research studies have been published examining fibers and health outcomes. Objectives (1) Develop a database listing studies testing fiber and physiological health outcomes identified by experts at the Ninth Vahouny Conference; (2) Use evidence mapping methodology to summarize this body of literature. This paper summarizes the rationale, methodology, and resulting database. The database will help both scientists and policy-makers to evaluate evidence linking specific fibers with physiological health outcomes, and identify missing information. Methods To build this database, we conducted a systematic literature search for human intervention studies published in English from 1946 to May 2015. Our search strategy included a broad definition of fiber search terms, as well as search terms for nine physiological health outcomes identified at the Ninth Vahouny Fiber Symposium. Abstracts were screened using a priori defined eligibility criteria and a low threshold for inclusion to minimize the likelihood of rejecting articles of interest. Publications then were reviewed in full text, applying additional a priori defined exclusion criteria. The database was built and published on the Systematic Review Data Repository (SRDR™), a web-based, publicly available application. Conclusions A fiber database was created. This resource will reduce the unnecessary replication of effort in conducting systematic reviews by serving as both a central database archiving PICO (population, intervention, comparator, outcome) data on published studies and as a searchable tool through which this data can be extracted and updated. PMID:27348733
Pathogenicity in POLG syndromes: DNA polymerase gamma pathogenicity prediction server and database.

PubMed

Nurminen, Anssi; Farnum, Gregory A; Kaguni, Laurie S

2017-06-01

DNA polymerase gamma (POLG) is the replicative polymerase responsible for maintaining mitochondrial DNA (mtDNA). Disorders related to its functionality are a major cause of mitochondrial disease. The clinical spectrum of POLG syndromes includes Alpers-Huttenlocher syndrome (AHS), childhood myocerebrohepatopathy spectrum (MCHS), myoclonic epilepsy myopathy sensory ataxia (MEMSA), the ataxia neuropathy spectrum (ANS) and progressive external ophthalmoplegia (PEO). We have collected all publicly available POLG-related patient data and analyzed it using our pathogenic clustering model to provide a new research and clinical tool in the form of an online server. The server evaluates the pathogenicity of both previously reported and novel mutations. There are currently 176 unique point mutations reported and found in mitochondrial patients in the gene encoding the catalytic subunit of POLG, POLG . The mutations are distributed nearly uniformly along the length of the primary amino acid sequence of the gene. Our analysis shows that most of the mutations are recessive, and that the reported dominant mutations cluster within the polymerase active site in the tertiary structure of the POLG enzyme. The POLG Pathogenicity Prediction Server (http://polg.bmb.msu.edu) is targeted at clinicians and scientists studying POLG disorders, and aims to provide the most current available information regarding the pathogenicity of POLG mutations.
DNA Commission of the International Society for Forensic Genetics: revised and extended guidelines for mitochondrial DNA typing.

PubMed

Parson, W; Gusmão, L; Hares, D R; Irwin, J A; Mayr, W R; Morling, N; Pokorak, E; Prinz, M; Salas, A; Schneider, P M; Parsons, T J

2014-11-01

The DNA Commission of the International Society of Forensic Genetics (ISFG) regularly publishes guidelines and recommendations concerning the application of DNA polymorphisms to the question of human identification. Previous recommendations published in 2000 addressed the analysis and interpretation of mitochondrial DNA (mtDNA) in forensic casework. While the foundations set forth in the earlier recommendations still apply, new approaches to the quality control, alignment and nomenclature of mitochondrial sequences, as well as the establishment of mtDNA reference population databases, have been developed. Here, we describe these developments and discuss their application to both mtDNA casework and mtDNA reference population databasing applications. While the generation of mtDNA for forensic casework has always been guided by specific standards, it is now well-established that data of the same quality are required for the mtDNA reference population data used to assess the statistical weight of the evidence. As a result, we introduce guidelines regarding sequence generation, as well as quality control measures based on the known worldwide mtDNA phylogeny, that can be applied to ensure the highest quality population data possible. For both casework and reference population databasing applications, the alignment and nomenclature of haplotypes is revised here and the phylogenetic alignment proffered as acceptable standard. In addition, the interpretation of heteroplasmy in the forensic context is updated, and the utility of alignment-free database searches for unbiased probability estimates is highlighted. Finally, we discuss statistical issues and define minimal standards for mtDNA database searches. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
The Porcelain Crab Transcriptome and PCAD, the Porcelain Crab Microarray and Sequence Database

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tagmount, Abderrahmane; Wang, Mei; Lindquist, Erika

2010-01-27

Background: With the emergence of a completed genome sequence of the freshwater crustacean Daphnia pulex, construction of genomic-scale sequence databases for additional crustacean sequences are important for comparative genomics and annotation. Porcelain crabs, genus Petrolisthes, have been powerful crustacean models for environmental and evolutionary physiology with respect to thermal adaptation and understanding responses of marine organisms to climate change. Here, we present a large-scale EST sequencing and cDNA microarray database project for the porcelain crab Petrolisthes cinctipes. Methodology/Principal Findings: A set of ~;;30K unique sequences (UniSeqs) representing ~;;19K clusters were generated from ~;;98K high quality ESTs from a set ofmore » tissue specific non-normalized and mixed-tissue normalized cDNA libraries from the porcelain crab Petrolisthes cinctipes. Homology for each UniSeq was assessed using BLAST, InterProScan, GO and KEGG database searches. Approximately 66percent of the UniSeqs had homology in at least one of the databases. All EST and UniSeq sequences along with annotation results and coordinated cDNA microarray datasets have been made publicly accessible at the Porcelain Crab Array Database (PCAD), a feature-enriched version of the Stanford and Longhorn Array Databases.Conclusions/Significance: The EST project presented here represents the third largest sequencing effort for any crustacean, and the largest effort for any crab species. Our assembly and clustering results suggest that our porcelain crab EST data set is equally diverse to the much larger EST set generated in the Daphnia pulex genome sequencing project, and thus will be an important resource to the Daphnia research community. Our homology results support the pancrustacea hypothesis and suggest that Malacostraca may be ancestral to Branchiopoda and Hexapoda. Our results also suggest that our cDNA microarrays cover as much of the transcriptome as can reasonably be
Identifying common donors in DNA mixtures, with applications to database searches.

PubMed

Slooten, K

2017-01-01

Several methods exist to compute the likelihood ratio LR(M, g) evaluating the possible contribution of a person of interest with genotype g to a mixed trace M. In this paper we generalize this LR to a likelihood ratio LR(M 1 , M 2 ) involving two possibly mixed traces M 1 and M 2 , where the question is whether there is a donor in common to both traces. In case one of the traces is in fact a single genotype, then this likelihood ratio reduces to the usual LR(M, g). We explain how our method conceptually is a logical consequence of the fact that LR calculations of the form LR(M, g) can be equivalently regarded as a probabilistic deconvolution of the mixture. Based on simulated data, and using a semi-continuous mixture evaluation model, we derive ROC curves of our method applied to various types of mixtures. From these data we conclude that searches for a common donor are often feasible in the sense that a very small false positive rate can be combined with a high probability to detect a common donor if there is one. We also show how database searches comparing all traces to each other can be carried out efficiently, as illustrated by the application of the method to the mixed traces in the Dutch DNA database. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Colony-PCR Is a Rapid Method for DNA Amplification of Hyphomycetes

PubMed Central

Walch, Georg; Knapp, Maria; Rainer, Georg; Peintner, Ursula

2016-01-01

Fungal pure cultures identified with both classical morphological methods and through barcoding sequences are a basic requirement for reliable reference sequences in public databases. Improved techniques for an accelerated DNA barcode reference library construction will result in considerably improved sequence databases covering a wider taxonomic range. Fast, cheap, and reliable methods for obtaining DNA sequences from fungal isolates are, therefore, a valuable tool for the scientific community. Direct colony PCR was already successfully established for yeasts, but has not been evaluated for a wide range of anamorphic soil fungi up to now, and a direct amplification protocol for hyphomycetes without tissue pre-treatment has not been published so far. Here, we present a colony PCR technique directly from fungal hyphae without previous DNA extraction or other prior manipulation. Seven hundred eighty-eight fungal strains from 48 genera were tested with a success rate of 86%. PCR success varied considerably: DNA of fungi belonging to the genera Cladosporium, Geomyces, Fusarium, and Mortierella could be amplified with high success. DNA of soil-borne yeasts was always successfully amplified. Absidia, Mucor, Trichoderma, and Penicillium isolates had noticeably lower PCR success. PMID:29376929
DNA Microarray for Detection of Macrolide Resistance Genes

PubMed Central

Cassone, Marco; D'Andrea, Marco M.; Iannelli, Francesco; Oggioni, Marco R.; Rossolini, Gian Maria; Pozzi, Gianni

2006-01-01

A DNA microarray was developed to detect bacterial genes conferring resistance to macrolides and related antibiotics. A database containing 65 nonredundant genes selected from publicly available DNA sequences was constructed and used to design 100 oligonucleotide probes that could specifically detect and discriminate all 65 genes. Probes were spotted on a glass slide, and the array was reacted with DNA templates extracted from 20 reference strains of eight different bacterial species (Streptococcus pneumoniae, Streptococcus pyogenes, Enterococcus faecalis, Enterococcus faecium, Staphylococcus aureus, Staphylococcus haemolyticus, Escherichia coli, and Bacteroides fragilis) known to harbor 29 different macrolide resistance genes. Hybridization results showed that probes reacted with, and only with, the expected DNA templates and allowed discovery of three unexpected genes, including msr(SA) in B. fragilis, an efflux gene that has not yet been described for gram-negative bacteria. PMID:16723563
Assessing the quality of life history information in publicly available databases.

PubMed

Thorson, James T; Cope, Jason M; Patrick, Wesley S

2014-01-01

Single-species life history parameters are central to ecological research and management, including the fields of macro-ecology, fisheries science, and ecosystem modeling. However, there has been little independent evaluation of the precision and accuracy of the life history values in global and publicly available databases. We therefore develop a novel method based on a Bayesian errors-in-variables model that compares database entries with estimates from local experts, and we illustrate this process by assessing the accuracy and precision of entries in FishBase, one of the largest and oldest life history databases. This model distinguishes biases among seven life history parameters, two types of information available in FishBase (i.e., published values and those estimated from other parameters), and two taxa (i.e., bony and cartilaginous fishes) relative to values from regional experts in the United States, while accounting for additional variance caused by sex- and region-specific life history traits. For published values in FishBase, the model identifies a small positive bias in natural mortality and negative bias in maximum age, perhaps caused by unacknowledged mortality caused by fishing. For life history values calculated by FishBase, the model identified large and inconsistent biases. The model also demonstrates greatest precision for body size parameters, decreased precision for values derived from geographically distant populations, and greatest between-sex differences in age at maturity. We recommend that our bias and precision estimates be used in future errors-in-variables models as a prior on measurement errors. This approach is broadly applicable to global databases of life history traits and, if used, will encourage further development and improvements in these databases.
'Your DNA, Your Say': global survey gathering attitudes toward genomics: design, delivery and methods.

PubMed

Middleton, Anna; Niemiec, Emilia; Prainsack, Barbara; Bobe, Jason; Farley, Lauren; Steed, Claire; Smith, James; Bevan, Paul; Bonhomme, Natasha; Kleiderman, Erika; Thorogood, Adrian; Schickhardt, Christoph; Garattini, Chiara; Vears, Danya; Littler, Katherine; Banner, Natalie; Scott, Erick; Kovalevskaya, Nadezda V; Levin, Elissa; Morley, Katherine I; Howard, Heidi C

2018-06-01

Our international study, 'Your DNA, Your Say', uses film and an online cross-sectional survey to gather public attitudes toward the donation, access and sharing of DNA information. We describe the methodological approach used to create an engaging and bespoke survey, suitable for translation into many different languages. We address some of the particular challenges in designing a survey on the subject of genomics. In order to understand the significance of a genomic result, researchers and clinicians alike use external databases containing DNA and medical information from thousands of people. We ask how publics would like their 'anonymous' data to be used (or not to be used) and whether they are concerned by the potential risks of reidentification; the results will be used to inform policy.
Database extraction strategies for low-template evidence.

PubMed

Bleka, Øyvind; Dørum, Guro; Haned, Hinda; Gill, Peter

2014-03-01

Often in forensic cases, the profile of at least one of the contributors to a DNA evidence sample is unknown and a database search is needed to discover possible perpetrators. In this article we consider two types of search strategies to extract suspects from a database using methods based on probability arguments. The performance of the proposed match scores is demonstrated by carrying out a study of each match score relative to the level of allele drop-out in the crime sample, simulating low-template DNA. The efficiency was measured by random man simulation and we compared the performance using the SGM Plus kit and the ESX 17 kit for the Norwegian population, demonstrating that the latter has greatly enhanced power to discover perpetrators of crime in large national DNA databases. The code for the database extraction strategies will be prepared for release in the R-package forensim. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

Governing Software: Networks, Databases and Algorithmic Power in the Digital Governance of Public Education

ERIC Educational Resources Information Center

Williamson, Ben

2015-01-01

This article examines the emergence of "digital governance" in public education in England. Drawing on and combining concepts from software studies, policy and political studies, it identifies some specific approaches to digital governance facilitated by network-based communications and database-driven information processing software…
Resolving the problem of multiple accessions of the same transcript deposited across various public databases.

PubMed

Weirick, Tyler; John, David; Uchida, Shizuka

2017-03-01

Maintaining the consistency of genomic annotations is an increasingly complex task because of the iterative and dynamic nature of assembly and annotation, growing numbers of biological databases and insufficient integration of annotations across databases. As information exchange among databases is poor, a 'novel' sequence from one reference annotation could be annotated in another. Furthermore, relationships to nearby or overlapping annotated transcripts are even more complicated when using different genome assemblies. To better understand these problems, we surveyed current and previous versions of genomic assemblies and annotations across a number of public databases containing long noncoding RNA. We identified numerous discrepancies of transcripts regarding their genomic locations, transcript lengths and identifiers. Further investigation showed that the positional differences between reference annotations of essentially the same transcript could lead to differences in its measured expression at the RNA level. To aid in resolving these problems, we present the algorithm 'Universal Genomic Accession Hash (UGAHash)' and created an open source web tool to encourage the usage of the UGAHash algorithm. The UGAHash web tool (http://ugahash.uni-frankfurt.de) can be accessed freely without registration. The web tool allows researchers to generate Universal Genomic Accessions for genomic features or to explore annotations deposited in the public databases of the past and present versions. We anticipate that the UGAHash web tool will be a valuable tool to check for the existence of transcripts before judging the newly discovered transcripts as novel. © The Author 2016. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
UniPROBE, update 2011: expanded content and search tools in the online database of protein-binding microarray data on protein-DNA interactions.

PubMed

Robasky, Kimberly; Bulyk, Martha L

2011-01-01

The Universal PBM Resource for Oligonucleotide-Binding Evaluation (UniPROBE) database is a centralized repository of information on the DNA-binding preferences of proteins as determined by universal protein-binding microarray (PBM) technology. Each entry for a protein (or protein complex) in UniPROBE provides the quantitative preferences for all possible nucleotide sequence variants ('words') of length k ('k-mers'), as well as position weight matrix (PWM) and graphical sequence logo representations of the k-mer data. In this update, we describe >130% expansion of the database content, incorporation of a protein BLAST (blastp) tool for finding protein sequence matches in UniPROBE, the introduction of UniPROBE accession numbers and additional database enhancements. The UniPROBE database is available at http://uniprobe.org.
Toward fish and seafood traceability: anchovy species determination in fish products by molecular markers and support through a public domain database.

PubMed

Jérôme, Marc; Martinsohn, Jann Thorsten; Ortega, Delphine; Carreau, Philippe; Verrez-Bagnis, Véronique; Mouchel, Olivier

2008-05-28

Traceability in the fish food sector plays an increasingly important role for consumer protection and confidence building. This is reflected by the introduction of legislation and rules covering traceability on national and international levels. Although traceability through labeling is well established and supported by respective regulations, monitoring and enforcement of these rules are still hampered by the lack of efficient diagnostic tools. We describe protocols using a direct sequencing method based on 212-274-bp diagnostic sequences derived from species-specific mitochondria DNA cytochrome b, 16S rRNA, and cytochrome oxidase subunit I sequences which can efficiently be applied to unambiguously determine even closely related fish species in processed food products labeled "anchovy". Traceability of anchovy-labeled products is supported by the public online database AnchovyID ( http://anchovyid.jrc.ec.europa.eu), which provided data obtained during our study and tools for analytical purposes.
A spatial national health facility database for public health sector planning in Kenya in 2008.

PubMed

Noor, Abdisalan M; Alegana, Victor A; Gething, Peter W; Snow, Robert W

2009-03-06

Efforts to tackle the enormous burden of ill-health in low-income countries are hampered by weak health information infrastructures that do not support appropriate planning and resource allocation. For health information systems to function well, a reliable inventory of health service providers is critical. The spatial referencing of service providers to allow their representation in a geographic information system is vital if the full planning potential of such data is to be realized. A disparate series of contemporary lists of health service providers were used to update a public health facility database of Kenya last compiled in 2003. These new lists were derived primarily through the national distribution of antimalarial and antiretroviral commodities since 2006. A combination of methods, including global positioning systems, was used to map service providers. These spatially-referenced data were combined with high-resolution population maps to analyze disparity in geographic access to public health care. The updated 2008 database contained 5,334 public health facilities (67% ministry of health; 28% mission and nongovernmental organizations; 2% local authorities; and 3% employers and other ministries). This represented an overall increase of 1,862 facilities compared to 2003. Most of the additional facilities belonged to the ministry of health (79%) and the majority were dispensaries (91%). 93% of the health facilities were spatially referenced, 38% using global positioning systems compared to 21% in 2003. 89% of the population was within 5 km Euclidean distance to a public health facility in 2008 compared to 71% in 2003. Over 80% of the population outside 5 km of public health service providers was in the sparsely settled pastoralist areas of the country. We have shown that, with concerted effort, a relatively complete inventory of mapped health services is possible with enormous potential for improving planning. Expansion in public health care in Kenya has
DNA Data Bank of Japan: 30th anniversary.

PubMed

Kodama, Yuichi; Mashima, Jun; Kosuge, Takehide; Kaminuma, Eli; Ogasawara, Osamu; Okubo, Kousaku; Nakamura, Yasukazu; Takagi, Toshihisa

2018-01-04

The DNA Data Bank of Japan (DDBJ) Center (http://www.ddbj.nig.ac.jp) has been providing public data services for 30 years since 1987. We are collecting nucleotide sequence data and associated biological information from researchers as a member of the International Nucleotide Sequence Database Collaboration (INSDC), in collaboration with the US National Center for Biotechnology Information and the European Bioinformatics Institute. The DDBJ Center also services the Japanese Genotype-phenotype Archive (JGA) with the National Bioscience Database Center to collect genotype and phenotype data of human individuals. Here, we outline our database activities for INSDC and JGA over the past year, and introduce submission, retrieval and analysis services running on our supercomputer system and their recent developments. Furthermore, we highlight our responses to the amended Japanese rules for the protection of personal information and the launch of the DDBJ Group Cloud service for sharing pre-publication data among research groups. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Leading-edge forensic DNA analyses and the necessity of including crime scene investigators, police officers and technicians in a DNA elimination database.

PubMed

Lapointe, Martine; Rogic, Anita; Bourgoin, Sarah; Jolicoeur, Christine; Séguin, Diane

2015-11-01

In recent years, sophisticated technology has significantly increased the sensitivity and analytical power of genetic analyses so that very little starting material may now produce viable genetic profiles. This sensitivity however, has also increased the risk of detecting unknown genetic profiles assumed to be that of the perpetrator, yet originate from extraneous sources such as from crime scene workers. These contaminants may mislead investigations, keeping criminal cases active and unresolved for long spans of time. Voluntary submission of DNA samples from crime scene workers is fairly low, therefore we have created a promotional method for our staff elimination database that has resulted in a significant increase in voluntary samples since 2011. Our database enforces privacy safeguards and allows for optional anonymity to all staff members. We also offer information sessions at various police precincts to advise crime scene workers of the importance and success of our staff elimination database. This study, a pioneer in its field, has obtained 327 voluntary submissions from crime scene workers to date, of which 46 individual profiles (14%) have been matched to 58 criminal cases. By implementing our methods and respect for individual privacy, forensic laboratories everywhere may see similar growth and success in explaining unidentified genetic profiles in stagnate criminal cases. Crown Copyright © 2015. Published by Elsevier Ireland Ltd. All rights reserved.
Toward public volume database management: a case study of NOVA, the National Online Volumetric Archive

NASA Astrophysics Data System (ADS)

Fletcher, Alex; Yoo, Terry S.

2004-04-01

Public databases today can be constructed with a wide variety of authoring and management structures. The widespread appeal of Internet search engines suggests that public information be made open and available to common search strategies, making accessible information that would otherwise be hidden by the infrastructure and software interfaces of a traditional database management system. We present the construction and organizational details for managing NOVA, the National Online Volumetric Archive. As an archival effort of the Visible Human Project for supporting medical visualization research, archiving 3D multimodal radiological teaching files, and enhancing medical education with volumetric data, our overall database structure is simplified; archives grow by accruing information, but seldom have to modify, delete, or overwrite stored records. NOVA is being constructed and populated so that it is transparent to the Internet; that is, much of its internal structure is mirrored in HTML allowing internet search engines to investigate, catalog, and link directly to the deep relational structure of the collection index. The key organizational concept for NOVA is the Image Content Group (ICG), an indexing strategy for cataloging incoming data as a set structure rather than by keyword management. These groups are managed through a series of XML files and authoring scripts. We cover the motivation for Image Content Groups, their overall construction, authorship, and management in XML, and the pilot results for creating public data repositories using this strategy.
MethBank 3.0: a database of DNA methylomes across a variety of species.

PubMed

Li, Rujiao; Liang, Fang; Li, Mengwei; Zou, Dong; Sun, Shixiang; Zhao, Yongbing; Zhao, Wenming; Bao, Yiming; Xiao, Jingfa; Zhang, Zhang

2018-01-04

MethBank (http://bigd.big.ac.cn/methbank) is a database that integrates high-quality DNA methylomes across a variety of species and provides an interactive browser for visualization of methylation data. Here, we present an updated implementation of MethBank (version 3.0) by incorporating more DNA methylomes from multiple species and equipping with more enhanced functionalities for data annotation and more friendly web interfaces for data presentation, search and visualization. MethBank 3.0 features large-scale integration of high-quality methylomes, involving 34 consensus reference methylomes derived from a large number of human samples, 336 single-base resolution methylomes from different developmental stages and/or tissues of five plants, and 18 single-base resolution methylomes from gametes and early embryos at multiple stages of two animals. Additionally, it is enhanced by improving the functionalities for data annotation, which accordingly enables systematic identification of methylation sites closely associated with age, sites with constant methylation levels across different ages, differentially methylated promoters, age-specific differentially methylated cytosines/regions, and methylated CpG islands. Moreover, MethBank provides tools to estimate human methylation age online and to identify differentially methylated promoters, respectively. Taken together, MethBank is upgraded with significant improvements and advances over the previous version, which is of great help for deciphering DNA methylation regulatory mechanisms for epigenetic studies. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
GBshape: a genome browser database for DNA shape annotations

PubMed Central

Chiu, Tsu-Pei; Yang, Lin; Zhou, Tianyin; Main, Bradley J.; Parker, Stephen C.J.; Nuzhdin, Sergey V.; Tullius, Thomas D.; Rohs, Remo

2015-01-01

Many regulatory mechanisms require a high degree of specificity in protein-DNA binding. Nucleotide sequence does not provide an answer to the question of why a protein binds only to a small subset of the many putative binding sites in the genome that share the same core motif. Whereas higher-order effects, such as chromatin accessibility, cooperativity and cofactors, have been described, DNA shape recently gained attention as another feature that fine-tunes the DNA binding specificities of some transcription factor families. Our Genome Browser for DNA shape annotations (GBshape; freely available at http://rohslab.cmb.usc.edu/GBshape/) provides minor groove width, propeller twist, roll, helix twist and hydroxyl radical cleavage predictions for the entire genomes of 94 organisms. Additional genomes can easily be added using the GBshape framework. GBshape can be used to visualize DNA shape annotations qualitatively in a genome browser track format, and to download quantitative values of DNA shape features as a function of genomic position at nucleotide resolution. As biological applications, we illustrate the periodicity of DNA shape features that are present in nucleosome-occupied sequences from human, fly and worm, and we demonstrate structural similarities between transcription start sites in the genomes of four Drosophila species. PMID:25326329
A large scale analysis of cDNA in Arabidopsis thaliana: generation of 12,028 non-redundant expressed sequence tags from normalized and size-selected cDNA libraries.

PubMed

Asamizu, E; Nakamura, Y; Sato, S; Tabata, S

2000-06-30

For comprehensive analysis of genes expressed in the model dicotyledonous plant, Arabidopsis thaliana, expressed sequence tags (ESTs) were accumulated. Normalized and size-selected cDNA libraries were constructed from aboveground organs, flower buds, roots, green siliques and liquid-cultured seedlings, respectively, and a total of 14,026 5'-end ESTs and 39,207 3'-end ESTs were obtained. The 3'-end ESTs could be clustered into 12,028 non-redundant groups. Similarity search of the non-redundant ESTs against the public non-redundant protein database indicated that 4816 groups show similarity to genes of known function, 1864 to hypothetical genes, and the remaining 5348 are novel sequences. Gene coverage by the non-redundant ESTs was analyzed using the annotated genomic sequences of approximately 10 Mb on chromosomes 3 and 5. A total of 923 regions were hit by at least one EST, among which only 499 regions were hit by the ESTs deposited in the public database. The result indicates that the EST source generated in this project complements the EST data in the public database and facilitates new gene discovery.
POLLUX: a program for simulated cloning, mutagenesis and database searching of DNA constructs.

PubMed

Dayringer, H E; Sammons, S A

1991-04-01

Computer support for research in biotechnology has developed rapidly and has provided several tools to aid the researcher. This report describes the capabilities of new computer software developed in this laboratory to aid in the documentation and planning of experiments in molecular biology. The program, POLLUX, provides a graphical medium for the entry, edit and manipulation of DNA constructs and a textual format for display and edit of construct descriptive data. Program operation and procedures are designed to mimic the actual laboratory experiments with respect to capability and the order in which they are performed. Flexible control over the content of the computer-generated displays and program facilities is provided by a mouse-driven menu interface. Programmed facilities for mutagenesis, simulated cloning and searching of the database from networked workstations are described.
GBshape: a genome browser database for DNA shape annotations.

PubMed

Chiu, Tsu-Pei; Yang, Lin; Zhou, Tianyin; Main, Bradley J; Parker, Stephen C J; Nuzhdin, Sergey V; Tullius, Thomas D; Rohs, Remo

2015-01-01

Many regulatory mechanisms require a high degree of specificity in protein-DNA binding. Nucleotide sequence does not provide an answer to the question of why a protein binds only to a small subset of the many putative binding sites in the genome that share the same core motif. Whereas higher-order effects, such as chromatin accessibility, cooperativity and cofactors, have been described, DNA shape recently gained attention as another feature that fine-tunes the DNA binding specificities of some transcription factor families. Our Genome Browser for DNA shape annotations (GBshape; freely available at http://rohslab.cmb.usc.edu/GBshape/) provides minor groove width, propeller twist, roll, helix twist and hydroxyl radical cleavage predictions for the entire genomes of 94 organisms. Additional genomes can easily be added using the GBshape framework. GBshape can be used to visualize DNA shape annotations qualitatively in a genome browser track format, and to download quantitative values of DNA shape features as a function of genomic position at nucleotide resolution. As biological applications, we illustrate the periodicity of DNA shape features that are present in nucleosome-occupied sequences from human, fly and worm, and we demonstrate structural similarities between transcription start sites in the genomes of four Drosophila species. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Establishment of an international database for genetic variants in esophageal cancer.

PubMed

Vihinen, Mauno

2016-10-01

The establishment of a database has been suggested in order to collect, organize, and distribute genetic information about esophageal cancer. The World Organization for Specialized Studies on Diseases of the Esophagus and the Human Variome Project will be in charge of a central database of information about esophageal cancer-related variations from publications, databases, and laboratories; in addition to genetic details, clinical parameters will also be included. The aim will be to get all the central players in research, clinical, and commercial laboratories to contribute. The database will follow established recommendations and guidelines. The database will require a team of dedicated curators with different backgrounds. Numerous layers of systematics will be applied to facilitate computational analyses. The data items will be extensively integrated with other information sources. The database will be distributed as open access to ensure exchange of the data with other databases. Variations will be reported in relation to reference sequences on three levels--DNA, RNA, and protein-whenever applicable. In the first phase, the database will concentrate on genetic variations including both somatic and germline variations for susceptibility genes. Additional types of information can be integrated at a later stage. © 2016 New York Academy of Sciences.
Encoding of low-quality DNA profiles as genotype probability matrices for improved profile comparisons, relatedness evaluation and database searches.

PubMed

Ryan, K; Williams, D Gareth; Balding, David J

2016-11-01

Many DNA profiles recovered from crime scene samples are of a quality that does not allow them to be searched against, nor entered into, databases. We propose a method for the comparison of profiles arising from two DNA samples, one or both of which can have multiple donors and be affected by low DNA template or degraded DNA. We compute likelihood ratios to evaluate the hypothesis that the two samples have a common DNA donor, and hypotheses specifying the relatedness of two donors. Our method uses a probability distribution for the genotype of the donor of interest in each sample. This distribution can be obtained from a statistical model, or we can exploit the ability of trained human experts to assess genotype probabilities, thus extracting much information that would be discarded by standard interpretation rules. Our method is compatible with established methods in simple settings, but is more widely applicable and can make better use of information than many current methods for the analysis of mixed-source, low-template DNA profiles. It can accommodate uncertainty arising from relatedness instead of or in addition to uncertainty arising from noisy genotyping. We describe a computer program GPMDNA, available under an open source licence, to calculate LRs using the method presented in this paper. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Molecular Identification and Databases in Fusarium

USDA-ARS?s Scientific Manuscript database

DNA sequence-based methods for identifying pathogenic and mycotoxigenic Fusarium isolates have become the gold standard worldwide. Moreover, fusarial DNA sequence data are increasing rapidly in several web-accessible databases for comparative purposes. Unfortunately, the use of Basic Alignment Sea...
Dfam: a database of repetitive DNA based on profile hidden Markov models.

PubMed

Wheeler, Travis J; Clements, Jody; Eddy, Sean R; Hubley, Robert; Jones, Thomas A; Jurka, Jerzy; Smit, Arian F A; Finn, Robert D

2013-01-01

We present a database of repetitive DNA elements, called Dfam (http://dfam.janelia.org). Many genomes contain a large fraction of repetitive DNA, much of which is made up of remnants of transposable elements (TEs). Accurate annotation of TEs enables research into their biology and can shed light on the evolutionary processes that shape genomes. Identification and masking of TEs can also greatly simplify many downstream genome annotation and sequence analysis tasks. The commonly used TE annotation tools RepeatMasker and Censor depend on sequence homology search tools such as cross_match and BLAST variants, as well as Repbase, a collection of known TE families each represented by a single consensus sequence. Dfam contains entries corresponding to all Repbase TE entries for which instances have been found in the human genome. Each Dfam entry is represented by a profile hidden Markov model, built from alignments generated using RepeatMasker and Repbase. When used in conjunction with the hidden Markov model search tool nhmmer, Dfam produces a 2.9% increase in coverage over consensus sequence search methods on a large human benchmark, while maintaining low false discovery rates, and coverage of the full human genome is 54.5%. The website provides a collection of tools and data views to support improved TE curation and annotation efforts. Dfam is also available for download in flat file format or in the form of MySQL table dumps.
Reusable data in public health data-bases-problems encountered in Danish Children's Database.

PubMed

Høstgaard, Anna Marie; Pape-Haugaard, Louise

2012-01-01

Denmark have unique health informatics databases e.g. "The Children's Database", which since 2009 holds data on all Danish children from birth until 17 years of age. In the current set-up a number of potential sources of errors exist - both technical and human-which means that the data is flawed. This gives rise to erroneous statistics and makes the data unsuitable for research purposes. In order to make the data usable, it is necessary to develop new methods for validating the data generation process at the municipal/regional/national level. In the present ongoing research project, two research areas are combined: Public Health Informatics and Computer Science, and both ethnographic as well as system engineering research methods are used. The project is expected to generate new generic methods and knowledge about electronic data collection and transmission in different social contexts and by different social groups and thus to be of international importance, since this is sparsely documented in the Public Health Informatics perspective. This paper presents the preliminary results, which indicate that health information technology used ought to be subject for redesign, where a thorough insight into the work practices should be point of departure.
The Society of Thoracic Surgeons Congenital Heart Surgery Database Public Reporting Initiative.

PubMed

Jacobs, Jeffrey P

2017-01-01

Three basic principles provide the rationale for the Society of Thoracic Surgeons (STS) Congenital Heart Surgery Database (CHSD) public reporting initiative: (1) Variation in congenital and pediatric cardiac surgical outcomes exist. (2) Patients and their families have the right to know the outcomes of the treatments that they will receive. (3). It is our professional responsibility to share this information with them in a format they can understand. The STS CHSD public reporting initiative facilitates the voluntary transparent public reporting of congenital and pediatric cardiac surgical outcomes using the STS CHSD Mortality Risk Model. The STS CHSD Mortality Risk Model is used to calculate risk-adjusted operative mortality and adjusts for the following variables: age, primary procedure, weight (neonates and infants), prior cardiothoracic operations, non-cardiac congenital anatomic abnormalities, chromosomal abnormalities or syndromes, prematurity (neonates and infants), and preoperative factors (including preoperative/preprocedural mechanical circulatory support [intraaortic balloon pump, ventricular assist device, extracorporeal membrane oxygenation, or cardiopulmonary support], shock [persistent at time of surgery], mechanical ventilation to treat cardiorespiratory failure, renal failure requiring dialysis and/or renal dysfunction, preoperative neurological deficit, and other preoperative factors). Operative mortality is defined in all STS databases as (1) all deaths, regardless of cause, occurring during the hospitalization in which the operation was performed, even if after 30 days (including patients transferred to other acute care facilities); and (2) all deaths, regardless of cause, occurring after discharge from the hospital, but before the end of the 30 th postoperative day. The STS CHSD Mortality Risk Model has good model fit and discrimination with an overall C statistics of 0.875 and 0.858 in the development sample and the validation sample
Non-B DB v2.0: a database of predicted non-B DNA-forming motifs and its associated tools.

PubMed

Cer, Regina Z; Donohue, Duncan E; Mudunuri, Uma S; Temiz, Nuri A; Loss, Michael A; Starner, Nathan J; Halusa, Goran N; Volfovsky, Natalia; Yi, Ming; Luke, Brian T; Bacolla, Albino; Collins, Jack R; Stephens, Robert M

2013-01-01

The non-B DB, available at http://nonb.abcc.ncifcrf.gov, catalogs predicted non-B DNA-forming sequence motifs, including Z-DNA, G-quadruplex, A-phased repeats, inverted repeats, mirror repeats, direct repeats and their corresponding subsets: cruciforms, triplexes and slipped structures, in several genomes. Version 2.0 of the database revises and re-implements the motif discovery algorithms to better align with accepted definitions and thresholds for motifs, expands the non-B DNA-forming motifs coverage by including short tandem repeats and adds key visualization tools to compare motif locations relative to other genomic annotations. Non-B DB v2.0 extends the ability for comparative genomics by including re-annotation of the five organisms reported in non-B DB v1.0, human, chimpanzee, dog, macaque and mouse, and adds seven additional organisms: orangutan, rat, cow, pig, horse, platypus and Arabidopsis thaliana. Additionally, the non-B DB v2.0 provides an overall improved graphical user interface and faster query performance.

Models in Translational Oncology: A Public Resource Database for Preclinical Cancer Research.

PubMed

Galuschka, Claudia; Proynova, Rumyana; Roth, Benjamin; Augustin, Hellmut G; Müller-Decker, Karin

2017-05-15

The devastating diseases of human cancer are mimicked in basic and translational cancer research by a steadily increasing number of tumor models, a situation requiring a platform with standardized reports to share model data. Models in Translational Oncology (MiTO) database was developed as a unique Web platform aiming for a comprehensive overview of preclinical models covering genetically engineered organisms, models of transplantation, chemical/physical induction, or spontaneous development, reviewed here. MiTO serves data entry for metastasis profiles and interventions. Moreover, cell lines and animal lines including tool strains can be recorded. Hyperlinks for connection with other databases and file uploads as supplementary information are supported. Several communication tools are offered to facilitate exchange of information. Notably, intellectual property can be protected prior to publication by inventor-defined accessibility of any given model. Data recall is via a highly configurable keyword search. Genome editing is expected to result in changes of the spectrum of model organisms, a reason to open MiTO for species-independent data. Registered users may deposit own model fact sheets (FS). MiTO experts check them for plausibility. Independently, manually curated FS are provided to principle investigators for revision and publication. Importantly, noneditable versions of reviewed FS can be cited in peer-reviewed journals. Cancer Res; 77(10); 2557-63. ©2017 AACR . ©2017 American Association for Cancer Research.
Exploring public databases to characterize urban flood risks in Amsterdam

NASA Astrophysics Data System (ADS)

Gaitan, Santiago; ten Veldhuis, Marie-claire; van de Giesen, Nick

2015-04-01

Cities worldwide are challenged by increasing urban flood risks. Precise and realistic measures are required to decide upon investment to reduce their impacts. Obvious flooding factors affecting flood risk include sewer systems performance and urban topography. However, currently implemented sewer and topographic models do not provide realistic predictions of local flooding occurrence during heavy rain events. Assessing other factors such as spatially distributed rainfall and socioeconomic characteristics may help to explain probability and impacts of urban flooding. Several public databases were analyzed: complaints about flooding made by citizens, rainfall depths (15 min and 100 Ha spatio-temporal resolution), grids describing number of inhabitants, income, and housing price (1Ha and 25Ha resolution); and buildings age. Data analysis was done using Python and GIS programming, and included spatial indexing of data, cluster analysis, and multivariate regression on the complaints. Complaints were used as a proxy to characterize flooding impacts. The cluster analysis, run for all the variables except the complaints, grouped part of the grid-cells of central Amsterdam into a highly differentiated group, covering 10% of the analyzed area, and accounting for 25% of registered complaints. The configuration of the analyzed variables in central Amsterdam coincides with a high complaint count. Remaining complaints were evenly dispersed along other groups. An adjusted R2 of 0.38 in the multivariate regression suggests that explaining power can improve if additional variables are considered. While rainfall intensity explained 4% of the incidence of complaints, population density and building age significantly explained around 20% each. Data mining of public databases proved to be a valuable tool to identify factors explaining variability in occurrence of urban pluvial flooding, though additional variables must be considered to fully explain flood risk variability.
Tomato Expression Database (TED): a suite of data presentation and analysis tools

PubMed Central

Fei, Zhangjun; Tang, Xuemei; Alba, Rob; Giovannoni, James

2006-01-01

The Tomato Expression Database (TED) includes three integrated components. The Tomato Microarray Data Warehouse serves as a central repository for raw gene expression data derived from the public tomato cDNA microarray. In addition to expression data, TED stores experimental design and array information in compliance with the MIAME guidelines and provides web interfaces for researchers to retrieve data for their own analysis and use. The Tomato Microarray Expression Database contains normalized and processed microarray data for ten time points with nine pair-wise comparisons during fruit development and ripening in a normal tomato variety and nearly isogenic single gene mutants impacting fruit development and ripening. Finally, the Tomato Digital Expression Database contains raw and normalized digital expression (EST abundance) data derived from analysis of the complete public tomato EST collection containing >150 000 ESTs derived from 27 different non-normalized EST libraries. This last component also includes tools for the comparison of tomato and Arabidopsis digital expression data. A set of query interfaces and analysis, and visualization tools have been developed and incorporated into TED, which aid users in identifying and deciphering biologically important information from our datasets. TED can be accessed at . PMID:16381976
Tomato Expression Database (TED): a suite of data presentation and analysis tools.

PubMed

Fei, Zhangjun; Tang, Xuemei; Alba, Rob; Giovannoni, James

2006-01-01

The Tomato Expression Database (TED) includes three integrated components. The Tomato Microarray Data Warehouse serves as a central repository for raw gene expression data derived from the public tomato cDNA microarray. In addition to expression data, TED stores experimental design and array information in compliance with the MIAME guidelines and provides web interfaces for researchers to retrieve data for their own analysis and use. The Tomato Microarray Expression Database contains normalized and processed microarray data for ten time points with nine pair-wise comparisons during fruit development and ripening in a normal tomato variety and nearly isogenic single gene mutants impacting fruit development and ripening. Finally, the Tomato Digital Expression Database contains raw and normalized digital expression (EST abundance) data derived from analysis of the complete public tomato EST collection containing >150,000 ESTs derived from 27 different non-normalized EST libraries. This last component also includes tools for the comparison of tomato and Arabidopsis digital expression data. A set of query interfaces and analysis, and visualization tools have been developed and incorporated into TED, which aid users in identifying and deciphering biologically important information from our datasets. TED can be accessed at http://ted.bti.cornell.edu.
Hawaii bibliographic database

USGS Publications Warehouse

Wright, T.L.; Takahashi, T.J.

1998-01-01

The Hawaii bibliographic database has been created to contain all of the literature, from 1779 to the present, pertinent to the volcanological history of the Hawaiian-Emperor volcanic chain. References are entered in a PC- and Macintosh-compatible EndNote Plus bibliographic database with keywords and abstracts or (if no abstract) with annotations as to content. Keywords emphasize location, discipline, process, identification of new chemical data or age determinations, and type of publication. The database is updated approximately three times a year and is available to upload from an ftp site. The bibliography contained 8460 references at the time this paper was submitted for publication. Use of the database greatly enhances the power and completeness of library searches for anyone interested in Hawaiian volcanism.
Applying Knowledge Discovery in Databases in Public Health Data Set: Challenges and Concerns

PubMed Central

Volrathongchia, Kanittha

2003-01-01

In attempting to apply Knowledge Discovery in Databases (KDD) to generate a predictive model from a health care dataset that is currently available to the public, the first step is to pre-process the data to overcome the challenges of missing data, redundant observations, and records containing inaccurate data. This study will demonstrate how to use simple pre-processing methods to improve the quality of input data. PMID:14728545
NPL-PAD (National Priorities List Publication Assistance Database) for Region 7

EPA Pesticide Factsheets

THIS DATA ASSET NO LONGER ACTIVE: This is metadata documentation for the National Priorities List (NPL) Publication Assistance Databsae (PAD), a Lotus Notes application that holds Region 7's universe of NPL site information such as site description, threats and contaminants, cleanup approach, environmental process, community involvement, site repository, and regional contacts. This database used to be updated annually, at different times for different NPLs, but it is currently no longer being used. This work fell under objectives for EPA's 2003-2008 Strategic Plan (Goal 3) for Land Preservation & Restoration, which are to clean up and reuse contaminated land.
DNA sequence database as a tool to identify decapod crustaceans on the São Paulo coastline.

PubMed

Mantelatto, Fernando L; Terossi, Mariana; Negri, Mariana; Buranelli, Raquel C; Robles, Rafael; Magalhães, Tatiana; Tamburus, Ana Francisca; Rossi, Natália; Miyazaki, Mayara J

2017-09-05

DNA barcoding has emerged as an efficient tool for taxonomy and other biodiversity fields. The vast and speciose group of decapod crustaceans is not an exception in the current scenario and comparing short DNA fragments has enabled researchers to overcome some taxonomic impediments to help broadening knowledge on the diversity of this group of crustaceans. Brazil is considered as an important area in terms of global marine biodiversity and some regions stand out in terms of decapod fauna, such as the São Paulo coastline. Thus, the aim of this study is to obtain sequences of the mitochondrial markers (COI and 16S) for decapod crustaceans distributed at the São Paulo coastline and to test the accuracy of these markers for species identification from this region by comparing our sequences to those already present in the GenBank database. We sampled along almost the 300 km of the São Paulo coastline from estuaries to offshore islands during the development of a multidisciplinary research project that took place for 5 years. All the species were processed to obtain the DNA sequences. The diversity of the decapod fauna on the São Paulo coastline comprises at least 404 species. We were able to collect 256 of those species and sequence of at least one of the target genes from 221. By testing the accuracy of these two DNA markers as a tool for identification, we were able to check our own identifications, including new records in GenBank, spot potential mistakes in GenBank, and detect potential new species.
Aptamer Database

PubMed Central

Lee, Jennifer F.; Hesselberth, Jay R.; Meyers, Lauren Ancel; Ellington, Andrew D.

2004-01-01

The aptamer database is designed to contain comprehensive sequence information on aptamers and unnatural ribozymes that have been generated by in vitro selection methods. Such data are not normally collected in ‘natural’ sequence databases, such as GenBank. Besides serving as a storehouse of sequences that may have diagnostic or therapeutic utility, the database serves as a valuable resource for theoretical biologists who describe and explore fitness landscapes. The database is updated monthly and is publicly available at http://aptamer.icmb.utexas.edu/. PMID:14681367
Toward a public analysis database for LHC new physics searches using M ADA NALYSIS 5

NASA Astrophysics Data System (ADS)

Dumont, B.; Fuks, B.; Kraml, S.; Bein, S.; Chalons, G.; Conte, E.; Kulkarni, S.; Sengupta, D.; Wymant, C.

2015-02-01

We present the implementation, in the MadAnalysis 5 framework, of several ATLAS and CMS searches for supersymmetry in data recorded during the first run of the LHC. We provide extensive details on the validation of our implementations and propose to create a public analysis database within this framework.
Effective biomedical document classification for identifying publications relevant to the mouse Gene Expression Database (GXD).

PubMed

Jiang, Xiangying; Ringwald, Martin; Blake, Judith; Shatkay, Hagit

2017-01-01

The Gene Expression Database (GXD) is a comprehensive online database within the Mouse Genome Informatics resource, aiming to provide available information about endogenous gene expression during mouse development. The information stems primarily from many thousands of biomedical publications that database curators must go through and read. Given the very large number of biomedical papers published each year, automatic document classification plays an important role in biomedical research. Specifically, an effective and efficient document classifier is needed for supporting the GXD annotation workflow. We present here an effective yet relatively simple classification scheme, which uses readily available tools while employing feature selection, aiming to assist curators in identifying publications relevant to GXD. We examine the performance of our method over a large manually curated dataset, consisting of more than 25 000 PubMed abstracts, of which about half are curated as relevant to GXD while the other half as irrelevant to GXD. In addition to text from title-and-abstract, we also consider image captions, an important information source that we integrate into our method. We apply a captions-based classifier to a subset of about 3300 documents, for which the full text of the curated articles is available. The results demonstrate that our proposed approach is robust and effectively addresses the GXD document classification. Moreover, using information obtained from image captions clearly improves performance, compared to title and abstract alone, affirming the utility of image captions as a substantial evidence source for automatically determining the relevance of biomedical publications to a specific subject area. www.informatics.jax.org. © The Author(s) 2017. Published by Oxford University Press.
Privacy protection and public goods: building a genetic database for health research in Newfoundland and Labrador.

PubMed

Kosseim, Patricia; Pullman, Daryl; Perrot-Daley, Astrid; Hodgkinson, Kathy; Street, Catherine; Rahman, Proton

2013-01-01

To provide a legal and ethical analysis of some of the implementation challenges faced by the Population Therapeutics Research Group (PTRG) at Memorial University (Canada), in using genealogical information offered by individuals for its genetics research database. This paper describes the unique historical and genetic characteristics of the Newfoundland and Labrador founder population, which gave rise to the opportunity for PTRG to build the Newfoundland Genealogy Database containing digitized records of all pre-confederation (1949) census records of the Newfoundland founder population. In addition to building the database, PTRG has developed the Heritability Analytics Infrastructure, a data management structure that stores genotype, phenotype, and pedigree information in a single database, and custom linkage software (KINNECT) to perform pedigree linkages on the genealogy database. A newly adopted legal regimen in Newfoundland and Labrador is discussed. It incorporates health privacy legislation with a unique research ethics statute governing the composition and activities of research ethics boards and, for the first time in Canada, elevating the status of national research ethics guidelines into law. The discussion looks at this integration of legal and ethical principles which provides a flexible and seamless framework for balancing the privacy rights and welfare interests of individuals, families, and larger societies in the creation and use of research data infrastructures as public goods. The complementary legal and ethical frameworks that now coexist in Newfoundland and Labrador provide the legislative authority, ethical legitimacy, and practical flexibility needed to find a workable balance between privacy interests and public goods. Such an approach may also be instructive for other jurisdictions as they seek to construct and use biobanks and related research platforms for genetic research.
A meta-data based method for DNA microarray imputation.

PubMed

Jörnsten, Rebecka; Ouyang, Ming; Wang, Hui-Yu

2007-03-29

DNA microarray experiments are conducted in logical sets, such as time course profiling after a treatment is applied to the samples, or comparisons of the samples under two or more conditions. Due to cost and design constraints of spotted cDNA microarray experiments, each logical set commonly includes only a small number of replicates per condition. Despite the vast improvement of the microarray technology in recent years, missing values are prevalent. Intuitively, imputation of missing values is best done using many replicates within the same logical set. In practice, there are few replicates and thus reliable imputation within logical sets is difficult. However, it is in the case of few replicates that the presence of missing values, and how they are imputed, can have the most profound impact on the outcome of downstream analyses (e.g. significance analysis and clustering). This study explores the feasibility of imputation across logical sets, using the vast amount of publicly available microarray data to improve imputation reliability in the small sample size setting. We download all cDNA microarray data of Saccharomyces cerevisiae, Arabidopsis thaliana, and Caenorhabditis elegans from the Stanford Microarray Database. Through cross-validation and simulation, we find that, for all three species, our proposed imputation using data from public databases is far superior to imputation within a logical set, sometimes to an astonishing degree. Furthermore, the imputation root mean square error for significant genes is generally a lot less than that of non-significant ones. Since downstream analysis of significant genes, such as clustering and network analysis, can be very sensitive to small perturbations of estimated gene effects, it is highly recommended that researchers apply reliable data imputation prior to further analysis. Our method can also be applied to cDNA microarray experiments from other species, provided good reference data are available.
SinEx DB: a database for single exon coding sequences in mammalian genomes.

PubMed

Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S

2016-01-01

Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as 'single exon genes' (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas, many public databases of Eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with My SQL Server 5.1.33 and the complete dataset of SEG sequences and their functional predictions are available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches to the in-house SinEx DB of SEGs and (iii) via an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs.Database URL: www.sinex.cl. © The Author(s) 2016. Published by Oxford University Press.
TRedD—A database for tandem repeats over the edit distance

PubMed Central

Sokol, Dina; Atagun, Firat

2010-01-01

A ‘tandem repeat’ in DNA is a sequence of two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats are common in the genomes of both eukaryotic and prokaryotic organisms. They are significant markers for human identity testing, disease diagnosis, sequence homology and population studies. In this article, we describe a new database, TRedD, which contains the tandem repeats found in the human genome. The database is publicly available online, and the software for locating the repeats is also freely available. The definition of tandem repeats used by TRedD is a new and innovative definition based upon the concept of ‘evolutive tandem repeats’. In addition, we have developed a tool, called TandemGraph, to graphically depict the repeats occurring in a sequence. This tool can be coupled with any repeat finding software, and it should greatly facilitate analysis of results. Database URL: http://tandem.sci.brooklyn.cuny.edu/ PMID:20624712
DNA profiles, computer searches, and the Fourth Amendment.

PubMed

Kimel, Catherine W

2013-01-01

Pursuant to federal statutes and to laws in all fifty states, the United States government has assembled a database containing the DNA profiles of over eleven million citizens. Without judicial authorization, the government searches each of these profiles one-hundred thousand times every day, seeking to link database subjects to crimes they are not suspected of committing. Yet, courts and scholars that have addressed DNA databasing have focused their attention almost exclusively on the constitutionality of the government's seizure of the biological samples from which the profiles are generated. This Note fills a gap in the scholarship by examining the Fourth Amendment problems that arise when the government searches its vast DNA database. This Note argues that each attempt to match two DNA profiles constitutes a Fourth Amendment search because each attempted match infringes upon database subjects' expectations of privacy in their biological relationships and physical movements. The Note further argues that database searches are unreasonable as they are currently conducted, and it suggests an adaptation of computer-search procedures to remedy the constitutional deficiency.
A Chronostratigraphic Relational Database Ontology

NASA Astrophysics Data System (ADS)

Platon, E.; Gary, A.; Sikora, P.

2005-12-01

A chronostratigraphic research database was donated by British Petroleum to the Stratigraphy Group at the Energy and Geoscience Institute (EGI), University of Utah. These data consists of over 2,000 measured sections representing over three decades of research into the application of the graphic correlation method. The data are global and includes both microfossil (foraminifera, calcareous nannoplankton, spores, pollen, dinoflagellate cysts, etc) and macrofossil data. The objective of the donation was to make the research data available to the public in order to encourage additional chronostratigraphy studies, specifically regarding graphic correlation. As part of the National Science Foundation's Cyberinfrastructure for the Geosciences (GEON) initiative these data have been made available to the public at http://css.egi.utah.edu. To encourage further research using the graphic correlation method, EGI has developed a software package, StrataPlot that will soon be publicly available from the GEON website as a standalone software download. The EGI chronostratigraphy research database, although relatively large, has many data holes relative to some paleontological disciplines and geographical areas, so the challenge becomes how do we expand the data available for chronostratigrahic studies using graphic correlation. There are several public or soon-to-be public databases available to chronostratigraphic research, but they have their own data structures and modes of presentation. The heterogeneous nature of these database schemas hinders their integration and makes it difficult for the user to retrieve and consolidate potentially valuable chronostratigraphic data. The integration of these data sources would facilitate rapid and comprehensive data searches, thus helping advance studies in chronostratigraphy. The GEON project will host a number of databases within the geology domain, some of which contain biostratigraphic data. Ontologies are being developed to provide
NASA STI Database, Aerospace Database and ARIN coverage of 'space law'

NASA Technical Reports Server (NTRS)

Buchan, Ronald L.

1992-01-01

The space-law coverage provided by the NASA STI Database, the Aerospace Database, and ARIN is briefly described. Particular attention is given to the space law content of the two Databases and of ARIN, the NASA Thesauras space law terminology, space law publication forms, and the availability of the space law literature.
Young people's awareness on biobanking and DNA profiling: results of a questionnaire administered to Italian university students.

PubMed

Tozzo, Pamela; Fassina, Antonio; Caenazzo, Luciana

2017-12-01

Current policy approaches to social and ethical issues surrounding biobanks manifest lack of public information given by researchers and government, despite the evidence that Italian citizens are well informed about technical and other public perspectives of biotechnologies. For this reason, the focus of our survey was to interview our University's students on these aspects. The sample consisted of Padua University students (N = 959), who were administered a questionnaire comprising eight questions covering their knowledge about biobanks, their perception of the related benefits and risks, their willingness to donate samples to a biobank for research purposes, their attitude to having their own DNA profile included in a forensic DNA database, and the reasons behind their answers. The vast majority of the students invited to take part in the survey completed the questionnaire, and the number of participants sufficed to be considered representative of the target population. Despite the respondents' unfamiliarity with the topics explored, suggested by the huge group of respondents answering "I don't know" to the questions regarding Itaian regulation and reality, their answers demonstrate a general agreement to participate in a biobanking scheme for research purposes, as expressed by the 91% of respondents who were reportedly willing to donate their samples. As for the idea of a forensic DNA database, 35% of respondents said they would agree to having their profile included in such a database, even if they were not fully aware of the benefits and risks of such action.This study shows that Italian people with a higher education take a generally positive attitude to the idea of donating biological samples. It contributes to empirical evidence of what Italy's citizens understand about biobanking, and of their willingness to donate samples for research purposes, and also to have their genetic profiles included in a national forensic DNA database. Our findings may have
REDIdb: the RNA editing database.

PubMed

Picardi, Ernesto; Regina, Teresa Maria Rosaria; Brennicke, Axel; Quagliariello, Carla

2007-01-01

The RNA Editing Database (REDIdb) is an interactive, web-based database created and designed with the aim to allocate RNA editing events such as substitutions, insertions and deletions occurring in a wide range of organisms. The database contains both fully and partially sequenced DNA molecules for which editing information is available either by experimental inspection (in vitro) or by computational detection (in silico). Each record of REDIdb is organized in a specific flat-file containing a description of the main characteristics of the entry, a feature table with the editing events and related details and a sequence zone with both the genomic sequence and the corresponding edited transcript. REDIdb is a relational database in which the browsing and identification of editing sites has been simplified by means of two facilities to either graphically display genomic or cDNA sequences or to show the corresponding alignment. In both cases, all editing sites are highlighted in colour and their relative positions are detailed by mousing over. New editing positions can be directly submitted to REDIdb after a user-specific registration to obtain authorized secure access. This first version of REDIdb database stores 9964 editing events and can be freely queried at http://biologia.unical.it/py_script/search.html.

The Fulton County Medical Examiner's experience with the Federal Bureau of Investigation National Missing Person DNA Database Program, 2004-2007.

PubMed

Heninger, Michael; Hanzlick, Randy

2011-03-01

Medical examiners and coroners occasionally encounter unidentified human bodies, which remain unidentified for extended periods. In such cases, when traditional methods of identification have failed or cannot be used, DNA profiling may be used. The Federal Bureau of Investigation has a National Missing Person DNA database (NMPDD) laboratory to which samples may be submitted on such cases and from possible relatives or environments of unidentified decedents. This article describes the experience of the Fulton County Medical Examiner (FCME) in submitting samples to the NMPDD laboratory. A database was established at the FCME to track the submission of samples from unidentified decedents to the NMPDD laboratory for DNA testing along with the results and turnaround times. In December 2004, the FCME inventoried all cases for which samples were available and began to submit them to the NMPDD laboratory for testing. DNA testing and isolation rates, sample type, and turnaround times were tabulated in October 2006 for samples submitted between December 16, 2004 and December 16, 2005. An overall summary of data was also prepared concerning the status of all samples submitted as of April 17, 2007. During the 1-year study period, samples from 77 unidentified decedents were submitted to the laboratory. As of October 2006 (22 months after submission of the first samples and 10 months after submission of the last samples), testing had been completed on 53% of the samples submitted, and 68% of those tested resulted in a mitochondrial DNA profile. Turnaround times ranged from 66 to 557 days, improved with time, and had a mean of 107 days for specimens submitted during the latter part of the study period. As of April 17, 2007, we had submitted samples involving 84 unidentified decedents. Seventy-five percent of the samples have now been tested. Data from the NMPDD laboratory have resulted in 4 identifications by comparison with putative relatives, 4 exclusions, and no cold hits
The problems and promise of DNA barcodes for species diagnosis of primate biomaterials

PubMed Central

Lorenz, Joseph G; Jackson, Whitney E; Beck, Jeanne C; Hanner, Robert

2005-01-01

The Integrated Primate Biomaterials and Information Resource (www.IPBIR.org) provides essential research reagents to the scientific community by establishing, verifying, maintaining, and distributing DNA and RNA derived from primate cell cultures. The IPBIR uses mitochondrial cytochrome c oxidase subunit I sequences to verify the identity of samples for quality control purposes in the accession, cell culture, DNA extraction processes and prior to shipping to end users. As a result, IPBIR is accumulating a database of ‘DNA barcodes’ for many species of primates. However, this quality control process is complicated by taxon specific patterns of ‘universal primer’ failure, as well as the amplification or co-amplification of nuclear pseudogenes of mitochondrial origins. To overcome these difficulties, taxon specific primers have been developed, and reverse transcriptase PCR is utilized to exclude these extraneous sequences from amplification. DNA barcoding of primates has applications to conservation and law enforcement. Depositing barcode sequences in a public database, along with primer sequences, trace files and associated quality scores, makes this species identification technique widely accessible. Reference DNA barcode sequences should be derived from, and linked to, specimens of known provenance in web-accessible collections in order to validate this system of molecular diagnostics. PMID:16214744
Data on publications, structural analyses, and queries used to build and utilize the AlloRep database.

PubMed

Sousa, Filipa L; Parente, Daniel J; Hessman, Jacob A; Chazelle, Allen; Teichmann, Sarah A; Swint-Kruse, Liskin

2016-09-01

The AlloRep database (www.AlloRep.org) (Sousa et al., 2016) [1] compiles extensive sequence, mutagenesis, and structural information for the LacI/GalR family of transcription regulators. Sequence alignments are presented for >3000 proteins in 45 paralog subfamilies and as a subsampled alignment of the whole family. Phenotypic and biochemical data on almost 6000 mutants have been compiled from an exhaustive search of the literature; citations for these data are included herein. These data include information about oligomerization state, stability, DNA binding and allosteric regulation. Protein structural data for 65 proteins are presented as easily-accessible, residue-contact networks. Finally, this article includes example queries to enable the use of the AlloRep database. See the related article, "AlloRep: a repository of sequence, structural and mutagenesis data for the LacI/GalR transcription regulators" (Sousa et al., 2016) [1].
The electric dipole moment of DNA-binding HU protein calculated by the use of an NMR database.

PubMed

Takashima, S; Yamaoka, K

1999-08-30

Electric birefringence measurements indicated the presence of a large permanent dipole moment in HU protein-DNA complex. In order to substantiate this observation, numerical computation of the dipole moment of HU protein homodimer was carried out by using NMR protein databases. The dipole moments of globular proteins have hitherto been calculated with X-ray databases and NMR data have never been used before. The advantages of NMR databases are: (a) NMR data are obtained, unlike X-ray databases, using protein solutions. Accordingly, this method eliminates the bothersome question as to the possible alteration of the protein structure due to the transition from the crystalline state to the solution state. This question is particularly important for proteins such as HU protein which has some degree of internal flexibility; (b) the three-dimensional coordinates of hydrogen atoms in protein molecules can be determined with a sufficient resolution and this enables the N-H as well as C = O bond moments to be calculated. Since the NMR database of HU protein from Bacillus stearothermophilus consists of 25 models, the surface charge as well as the core dipole moments were computed for each of these structures. The results of these calculations show that the net permanent dipole moments of HU protein homodimer is approximately 500-530 D (1 D = 3.33 x 10(-30) Cm) at pH 7.5 and 600-630 D at the isoelectric point (pH 10.5). These permanent dipole moments are unusually large for a small protein of the size of 19.5 kDa. Nevertheless, the result of numerical calculations is compatible with the electro-optical observation, confirming a very large dipole moment in this protein.
DNA Profiling of Convicted Offender Samples for the Combined DNA Index System

ERIC Educational Resources Information Center

Millard, Julie T

2011-01-01

The cornerstone of forensic chemistry is that a perpetrator inevitably leaves trace evidence at a crime scene. One important type of evidence is DNA, which has been instrumental in both the implication and exoneration of thousands of suspects in a wide range of crimes. The Combined DNA Index System (CODIS), a network of DNA databases, provides…
STANDARDIZATION AND STRUCTURAL ANNOTATION OF PUBLIC TOXICITY DATABASES: IMPROVING SAR CAPABILITIES AND LINKAGE TO 'OMICS DATA

EPA Science Inventory

Standardization and structural annotation of public toxicity databases: Improving SAR capabilities and linkage to 'omics data
Ann M. Richard', ClarLynda Williams', Jamie Burch2
'Nat Health & Environ Res Lab, US EPA, RTP, NC 27711; 2EPA/NC Central Univ Student COOP Trainee<...
Recent updates and developments to plant genome size databases

PubMed Central

Garcia, Sònia; Leitch, Ilia J.; Anadon-Rosell, Alba; Canela, Miguel Á.; Gálvez, Francisco; Garnatje, Teresa; Gras, Airy; Hidalgo, Oriane; Johnston, Emmeline; Mas de Xaxars, Gemma; Pellicer, Jaume; Siljak-Yakovlev, Sonja; Vallès, Joan; Vitales, Daniel; Bennett, Michael D.

2014-01-01

Two plant genome size databases have been recently updated and/or extended: the Plant DNA C-values database (http://data.kew.org/cvalues), and GSAD, the Genome Size in Asteraceae database (http://www.asteraceaegenomesize.com). While the first provides information on nuclear DNA contents across land plants and some algal groups, the second is focused on one of the largest and most economically important angiosperm families, Asteraceae. Genome size data have numerous applications: they can be used in comparative studies on genome evolution, or as a tool to appraise the cost of whole-genome sequencing programs. The growing interest in genome size and increasing rate of data accumulation has necessitated the continued update of these databases. Currently, the Plant DNA C-values database (Release 6.0, Dec. 2012) contains data for 8510 species, while GSAD has 1219 species (Release 2.0, June 2013), representing increases of 17 and 51%, respectively, in the number of species with genome size data, compared with previous releases. Here we provide overviews of the most recent releases of each database, and outline new features of GSAD. The latter include (i) a tool to visually compare genome size data between species, (ii) the option to export data and (iii) a webpage containing information about flow cytometry protocols. PMID:24288377
On the level of coverage and citation of publications by mechanicians of the national academy of sciences of Ukraine in the Scopus database

NASA Astrophysics Data System (ADS)

Guz, A. N.; Rushchitsky, J. J.

2009-11-01

The paper analyzes the level of coverage and citation of publications by mechanicians of the National Academy of Sciences of Ukraine (NASU) in the Scopus database. Two groups of mechanicians are considered. One group includes 66 doctors of sciences of the S. P. Timoshenko Institute of Mechanics as representatives of the oldest institute of the NASU. The other group includes 34 members (academicians and corresponding members) of the Division of Mechanics of the NASU as representatives of the authoritative community of mechanicians in Ukraine. The results are presented for each scientist in the form of two indices—the total number of publications accessible in the database as the level of coverage of the scientist's publications in this database and the h-index as the citation level of these publications. This paper may be considered to continue the papers [6-12] published in Prikladnaya Mekhanika (International Applied Mechanics) in 2005-2009
Privacy protection and public goods: building a genetic database for health research in Newfoundland and Labrador

PubMed Central

Pullman, Daryl; Perrot-Daley, Astrid; Hodgkinson, Kathy; Street, Catherine; Rahman, Proton

2013-01-01

Objective To provide a legal and ethical analysis of some of the implementation challenges faced by the Population Therapeutics Research Group (PTRG) at Memorial University (Canada), in using genealogical information offered by individuals for its genetics research database. Materials and methods This paper describes the unique historical and genetic characteristics of the Newfoundland and Labrador founder population, which gave rise to the opportunity for PTRG to build the Newfoundland Genealogy Database containing digitized records of all pre-confederation (1949) census records of the Newfoundland founder population. In addition to building the database, PTRG has developed the Heritability Analytics Infrastructure, a data management structure that stores genotype, phenotype, and pedigree information in a single database, and custom linkage software (KINNECT) to perform pedigree linkages on the genealogy database. Discussion A newly adopted legal regimen in Newfoundland and Labrador is discussed. It incorporates health privacy legislation with a unique research ethics statute governing the composition and activities of research ethics boards and, for the first time in Canada, elevating the status of national research ethics guidelines into law. The discussion looks at this integration of legal and ethical principles which provides a flexible and seamless framework for balancing the privacy rights and welfare interests of individuals, families, and larger societies in the creation and use of research data infrastructures as public goods. Conclusion The complementary legal and ethical frameworks that now coexist in Newfoundland and Labrador provide the legislative authority, ethical legitimacy, and practical flexibility needed to find a workable balance between privacy interests and public goods. Such an approach may also be instructive for other jurisdictions as they seek to construct and use biobanks and related research platforms for genetic research. PMID
Drinking Water Treatability Database (Database)

EPA Science Inventory

The drinking Water Treatability Database (TDB) will provide data taken from the literature on the control of contaminants in drinking water, and will be housed on an interactive, publicly-available USEPA web site. It can be used for identifying effective treatment processes, rec...
Constructing a publically available distracted driving database and research tool.

PubMed

Atchley, Paul; Tran, Ashleigh V; Salehinejad, Mohammad Ali

2017-02-01

The goal of the current work was to create a publicly available visualization tool of distracted driving research, the purpose of which is to allow the public and other stakeholders to empirically inform questions of their choice that may bear on policy discussions. Fifty years of distracted driving research was used to design a comprehensive database of studies that evaluated the effects of distraction on driving performance. Distraction sources (e.g., texting, talking, visual distraction) and performance measures were defined, and the sample of studies were evaluated and categorized by their measures. The final product yielded 342 studies using various methodologies. Across all measures, 1297 found distractions degraded driving performance, 54 found distraction improved driving performance, and 257 found distraction had no effect on driving performance. An analysis of the most common phone distractions (texting and talking) showed that texting almost always results in degraded performance. Aggregate data reveal no difference in performance decrements for hand-held or hands-free phones even though single studies of those variables vary in their outcomes. This project illustrates how scientific research can be made publically available for use by a diverse audience of stakeholders. An important result of this project is that data aggregated along a simple set of characteristics such as whether or not performance is decreased, improved or not affected, can reveal trends in the data that are less clear from any individual study. Copyright © 2016 Elsevier Ltd. All rights reserved.
RefSeq microbial genomes database: new representation and annotation strategy.

PubMed

Tatusova, Tatiana; Ciufo, Stacy; Fedorov, Boris; O'Neill, Kathleen; Tolstoy, Igor

2014-01-01

The source of the microbial genomic sequences in the RefSeq collection is the set of primary sequence records submitted to the International Nucleotide Sequence Database public archives. These can be accessed through the Entrez search and retrieval system at http://www.ncbi.nlm.nih.gov/genome. Next-generation sequencing has enabled researchers to perform genomic sequencing at rates that were unimaginable in the past. Microbial genomes can now be sequenced in a matter of hours, which has led to a significant increase in the number of assembled genomes deposited in the public archives. This huge increase in DNA sequence data presents new challenges for the annotation, analysis and visualization bioinformatics tools. New strategies have been developed for the annotation and representation of reference genomes and sequence variations derived from population studies and clinical outbreaks.
PhytoREF: a reference database of the plastidial 16S rRNA gene of photosynthetic eukaryotes with curated taxonomy.

PubMed

Decelle, Johan; Romac, Sarah; Stern, Rowena F; Bendif, El Mahdi; Zingone, Adriana; Audic, Stéphane; Guiry, Michael D; Guillou, Laure; Tessier, Désiré; Le Gall, Florence; Gourvil, Priscillia; Dos Santos, Adriana L; Probert, Ian; Vaulot, Daniel; de Vargas, Colomban; Christen, Richard

2015-11-01

Photosynthetic eukaryotes have a critical role as the main producers in most ecosystems of the biosphere. The ongoing environmental metabarcoding revolution opens the perspective for holistic ecosystems biological studies of these organisms, in particular the unicellular microalgae that often lack distinctive morphological characters and have complex life cycles. To interpret environmental sequences, metabarcoding necessarily relies on taxonomically curated databases containing reference sequences of the targeted gene (or barcode) from identified organisms. To date, no such reference framework exists for photosynthetic eukaryotes. In this study, we built the PhytoREF database that contains 6490 plastidial 16S rDNA reference sequences that originate from a large diversity of eukaryotes representing all known major photosynthetic lineages. We compiled 3333 amplicon sequences available from public databases and 879 sequences extracted from plastidial genomes, and generated 411 novel sequences from cultured marine microalgal strains belonging to different eukaryotic lineages. A total of 1867 environmental Sanger 16S rDNA sequences were also included in the database. Stringent quality filtering and a phylogeny-based taxonomic classification were applied for each 16S rDNA sequence. The database mainly focuses on marine microalgae, but sequences from land plants (representing half of the PhytoREF sequences) and freshwater taxa were also included to broaden the applicability of PhytoREF to different aquatic and terrestrial habitats. PhytoREF, accessible via a web interface (http://phytoref.fr), is a new resource in molecular ecology to foster the discovery, assessment and monitoring of the diversity of photosynthetic eukaryotes using high-throughput sequencing. © 2015 John Wiley & Sons Ltd.
Database on natural polymorphisms and resistance-related non-synonymous mutations in thymidine kinase and DNA polymerase genes of herpes simplex virus types 1 and 2.

PubMed

Sauerbrei, Andreas; Bohn-Wippert, Kathrin; Kaspar, Marisa; Krumbholz, Andi; Karrasch, Matthias; Zell, Roland

2016-01-01

The use of genotypic resistance testing of herpes simplex virus types 1 and 2 (HSV-1 and HSV-2) is increasing because the rapid availability of results significantly improves the treatment of severe infections, especially in immunocompromised patients. However, an essential precondition is a broad knowledge of natural polymorphisms and resistance-associated mutations in the thymidine kinase (TK) and DNA polymerase (pol) genes, of which the DNA polymerase (Pol) enzyme is targeted by the highly effective antiviral drugs in clinical use. Thus, this review presents a database of all non-synonymous mutations of TK and DNA pol genes of HSV-1 and HSV-2 whose association with resistance or natural gene polymorphism has been clarified by phenotypic and/or functional assays. In addition, the laboratory methods for verifying natural polymorphisms or resistance mutations are summarized. This database can help considerably to facilitate the interpretation of genotypic resistance findings in clinical HSV-1 and HSV-2 strains. © The Author 2015. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
The MAR databases: development and implementation of databases specific for marine metagenomics

PubMed Central

Klemetsen, Terje; Raknes, Inge A; Fu, Juan; Agafonov, Alexander; Balasundaram, Sudhagar V; Tartari, Giacomo; Robertsen, Espen

2018-01-01

Abstract We introduce the marine databases; MarRef, MarDB and MarCat (https://mmp.sfb.uit.no/databases/), which are publicly available resources that promote marine research and innovation. These data resources, which have been implemented in the Marine Metagenomics Portal (MMP) (https://mmp.sfb.uit.no/), are collections of richly annotated and manually curated contextual (metadata) and sequence databases representing three tiers of accuracy. While MarRef is a database for completely sequenced marine prokaryotic genomes, which represent a marine prokaryote reference genome database, MarDB includes all incomplete sequenced prokaryotic genomes regardless level of completeness. The last database, MarCat, represents a gene (protein) catalog of uncultivable (and cultivable) marine genes and proteins derived from marine metagenomics samples. The first versions of MarRef and MarDB contain 612 and 3726 records, respectively. Each record is built up of 106 metadata fields including attributes for sampling, sequencing, assembly and annotation in addition to the organism and taxonomic information. Currently, MarCat contains 1227 records with 55 metadata fields. Ontologies and controlled vocabularies are used in the contextual databases to enhance consistency. The user-friendly web interface lets the visitors browse, filter and search in the contextual databases and perform BLAST searches against the corresponding sequence databases. All contextual and sequence databases are freely accessible and downloadable from https://s1.sfb.uit.no/public/mar/. PMID:29106641
Human Contamination in Public Genome Assemblies.

PubMed

Kryukov, Kirill; Imanishi, Tadashi

2016-01-01

Contamination in genome assembly can lead to wrong or confusing results when using such genome as reference in sequence comparison. Although bacterial contamination is well known, the problem of human-originated contamination received little attention. In this study we surveyed 45,735 available genome assemblies for evidence of human contamination. We used lineage specificity to distinguish between contamination and conservation. We found that 154 genome assemblies contain fragments that with high confidence originate as contamination from human DNA. Majority of contaminating human sequences were present in the reference human genome assembly for over a decade. We recommend that existing contaminated genomes should be revised to remove contaminated sequence, and that new assemblies should be thoroughly checked for presence of human DNA before submitting them to public databases.
Discovery of Possible Gene Relationships through the Application of Self-Organizing Maps to DNA Microarray Databases

PubMed Central

Chavez-Alvarez, Rocio; Chavoya, Arturo; Mendez-Vazquez, Andres

2014-01-01

DNA microarrays and cell cycle synchronization experiments have made possible the study of the mechanisms of cell cycle regulation of Saccharomyces cerevisiae by simultaneously monitoring the expression levels of thousands of genes at specific time points. On the other hand, pattern recognition techniques can contribute to the analysis of such massive measurements, providing a model of gene expression level evolution through the cell cycle process. In this paper, we propose the use of one of such techniques –an unsupervised artificial neural network called a Self-Organizing Map (SOM)–which has been successfully applied to processes involving very noisy signals, classifying and organizing them, and assisting in the discovery of behavior patterns without requiring prior knowledge about the process under analysis. As a test bed for the use of SOMs in finding possible relationships among genes and their possible contribution in some biological processes, we selected 282 S. cerevisiae genes that have been shown through biological experiments to have an activity during the cell cycle. The expression level of these genes was analyzed in five of the most cited time series DNA microarray databases used in the study of the cell cycle of this organism. With the use of SOM, it was possible to find clusters of genes with similar behavior in the five databases along two cell cycles. This result suggested that some of these genes might be biologically related or might have a regulatory relationship, as was corroborated by comparing some of the clusters obtained with SOMs against a previously reported regulatory network that was generated using biological knowledge, such as protein-protein interactions, gene expression levels, metabolism dynamics, promoter binding, and modification, regulation and transport of proteins. The methodology described in this paper could be applied to the study of gene relationships of other biological processes in different organisms. PMID:24699245
NREL: U.S. Life Cycle Inventory Database - About the LCI Database Project

Science.gov Websites

About the LCI Database Project The U.S. Life Cycle Inventory (LCI) Database is a publicly available data collection and analysis methods. Finding consistent and transparent LCI data for life cycle and maintain the database. The 2009 U.S. Life Cycle Inventory (LCI) Data Stakeholder meeting was an
Public Regulatory Databases as a Source of Insight for Neuromodulation Devices Stimulation Parameters

PubMed Central

Kumsa, Doe; Steinke, G. Karl; Molnar, Gregory F.; Hudak, Eric M.; Montague, Fred W.; Kelley, Shawn C.; Untereker, Darrel F.; Shi, Alan; Hahn, Benjamin P.; Condit, Chris; Lee, Hyowon; Bardot, Dawn; Centeno, Jose A.; Krauthamer, Victor; Takmakov, Pavel A.

2017-01-01

Objective The Shannon model is often used to define an expected boundary between non-damaging and damaging modes of electrical neurostimulation. Numerous preclinical studies have been performed by manufacturers of neuromodulation devices using different animal models and a broad range of stimulation parameters while developing devices for clinical use. These studies are mostly absent from peer-reviewed literature, which may lead to this information being overlooked by the scientific community. We aimed to locate summaries of these studies accessible via public regulatory databases and to add them to a body of knowledge available to a broad scientific community. Methods We employed web search terms describing device type, intended use, neural target, therapeutic application, company name, and submission number to identify summaries for premarket approval (PMA) devices and 510(k) devices. We filtered these records to a subset of entries that have sufficient technical information relevant to safety of neurostimulation. Results We identified 13 product codes for 8 types of neuromodulation devices. These led us to devices that have 22 PMAs and 154 510(k)s and six transcripts of public panel meetings. We found one PMA for a brain, peripheral nerve, and spinal cord stimulator and five 510(k) spinal cord stimulators with enough information to plot in Shannon coordinates of charge and charge density per phase. Conclusions Analysis of relevant entries from public regulatory databases reveals use of pig, sheep, monkey, dog, and goat animal models with deep brain, peripheral nerve, muscle and spinal cord electrode placement with a variety of stimulation durations (hours to years); frequencies (10–10,000 Hz) and magnitudes (Shannon k from below zero to 4.47). Data from located entries indicate that a feline cortical model that employs acute stimulation might have limitations for assessing tissue damage in diverse anatomical locations, particularly for peripheral nerve and
Public Regulatory Databases as a Source of Insight for Neuromodulation Devices Stimulation Parameters.

PubMed

Kumsa, Doe; Steinke, G Karl; Molnar, Gregory F; Hudak, Eric M; Montague, Fred W; Kelley, Shawn C; Untereker, Darrel F; Shi, Alan; Hahn, Benjamin P; Condit, Chris; Lee, Hyowon; Bardot, Dawn; Centeno, Jose A; Krauthamer, Victor; Takmakov, Pavel A

2018-02-01

The Shannon model is often used to define an expected boundary between non-damaging and damaging modes of electrical neurostimulation. Numerous preclinical studies have been performed by manufacturers of neuromodulation devices using different animal models and a broad range of stimulation parameters while developing devices for clinical use. These studies are mostly absent from peer-reviewed literature, which may lead to this information being overlooked by the scientific community. We aimed to locate summaries of these studies accessible via public regulatory databases and to add them to a body of knowledge available to a broad scientific community. We employed web search terms describing device type, intended use, neural target, therapeutic application, company name, and submission number to identify summaries for premarket approval (PMA) devices and 510(k) devices. We filtered these records to a subset of entries that have sufficient technical information relevant to safety of neurostimulation. We identified 13 product codes for 8 types of neuromodulation devices. These led us to devices that have 22 PMAs and 154 510(k)s and six transcripts of public panel meetings. We found one PMA for a brain, peripheral nerve, and spinal cord stimulator and five 510(k) spinal cord stimulators with enough information to plot in Shannon coordinates of charge and charge density per phase. Analysis of relevant entries from public regulatory databases reveals use of pig, sheep, monkey, dog, and goat animal models with deep brain, peripheral nerve, muscle and spinal cord electrode placement with a variety of stimulation durations (hours to years); frequencies (10-10,000 Hz) and magnitudes (Shannon k from below zero to 4.47). Data from located entries indicate that a feline cortical model that employs acute stimulation might have limitations for assessing tissue damage in diverse anatomical locations, particularly for peripheral nerve and spinal cord simulation. © 2017

Content based information retrieval in forensic image databases.

PubMed

Geradts, Zeno; Bijhold, Jurrien

2002-03-01

This paper gives an overview of the various available image databases and ways of searching these databases on image contents. The developments in research groups of searching in image databases is evaluated and compared with the forensic databases that exist. Forensic image databases of fingerprints, faces, shoeprints, handwriting, cartridge cases, drugs tablets, and tool marks are described. The developments in these fields appear to be valuable for forensic databases, especially that of the framework in MPEG-7, where the searching in image databases is standardized. In the future, the combination of the databases (also DNA-databases) and possibilities to combine these can result in stronger forensic evidence.
OriDB, the DNA replication origin database updated and extended.

PubMed

Siow, Cheuk C; Nieduszynska, Sian R; Müller, Carolin A; Nieduszynski, Conrad A

2012-01-01

OriDB (http://www.oridb.org/) is a database containing collated genome-wide mapping studies of confirmed and predicted replication origin sites. The original database collated and curated Saccharomyces cerevisiae origin mapping studies. Here, we report that the OriDB database and web site have been revamped to improve user accessibility to curated data sets, to greatly increase the number of curated origin mapping studies, and to include the collation of replication origin sites in the fission yeast Schizosaccharomyces pombe. The revised database structure underlies these improvements and will facilitate further expansion in the future. The updated OriDB for S. cerevisiae is available at http://cerevisiae.oridb.org/ and for S. pombe at http://pombe.oridb.org/.
A framework for organizing cancer-related variations from existing databases, publications and NGS data using a High-performance Integrated Virtual Environment (HIVE).

PubMed

Wu, Tsung-Jung; Shamsaddini, Amirhossein; Pan, Yang; Smith, Krista; Crichton, Daniel J; Simonyan, Vahan; Mazumder, Raja

2014-01-01

Years of sequence feature curation by UniProtKB/Swiss-Prot, PIR-PSD, NCBI-CDD, RefSeq and other database biocurators has led to a rich repository of information on functional sites of genes and proteins. This information along with variation-related annotation can be used to scan human short sequence reads from next-generation sequencing (NGS) pipelines for presence of non-synonymous single-nucleotide variations (nsSNVs) that affect functional sites. This and similar workflows are becoming more important because thousands of NGS data sets are being made available through projects such as The Cancer Genome Atlas (TCGA), and researchers want to evaluate their biomarkers in genomic data. BioMuta, an integrated sequence feature database, provides a framework for automated and manual curation and integration of cancer-related sequence features so that they can be used in NGS analysis pipelines. Sequence feature information in BioMuta is collected from the Catalogue of Somatic Mutations in Cancer (COSMIC), ClinVar, UniProtKB and through biocuration of information available from publications. Additionally, nsSNVs identified through automated analysis of NGS data from TCGA are also included in the database. Because of the petabytes of data and information present in NGS primary repositories, a platform HIVE (High-performance Integrated Virtual Environment) for storing, analyzing, computing and curating NGS data and associated metadata has been developed. Using HIVE, 31 979 nsSNVs were identified in TCGA-derived NGS data from breast cancer patients. All variations identified through this process are stored in a Curated Short Read archive, and the nsSNVs from the tumor samples are included in BioMuta. Currently, BioMuta has 26 cancer types with 13 896 small-scale and 308 986 large-scale study-derived variations. Integration of variation data allows identifications of novel or common nsSNVs that can be prioritized in validation studies. Database URL: BioMuta: http
The MAR databases: development and implementation of databases specific for marine metagenomics.

PubMed

Klemetsen, Terje; Raknes, Inge A; Fu, Juan; Agafonov, Alexander; Balasundaram, Sudhagar V; Tartari, Giacomo; Robertsen, Espen; Willassen, Nils P

2018-01-04

We introduce the marine databases; MarRef, MarDB and MarCat (https://mmp.sfb.uit.no/databases/), which are publicly available resources that promote marine research and innovation. These data resources, which have been implemented in the Marine Metagenomics Portal (MMP) (https://mmp.sfb.uit.no/), are collections of richly annotated and manually curated contextual (metadata) and sequence databases representing three tiers of accuracy. While MarRef is a database for completely sequenced marine prokaryotic genomes, which represent a marine prokaryote reference genome database, MarDB includes all incomplete sequenced prokaryotic genomes regardless level of completeness. The last database, MarCat, represents a gene (protein) catalog of uncultivable (and cultivable) marine genes and proteins derived from marine metagenomics samples. The first versions of MarRef and MarDB contain 612 and 3726 records, respectively. Each record is built up of 106 metadata fields including attributes for sampling, sequencing, assembly and annotation in addition to the organism and taxonomic information. Currently, MarCat contains 1227 records with 55 metadata fields. Ontologies and controlled vocabularies are used in the contextual databases to enhance consistency. The user-friendly web interface lets the visitors browse, filter and search in the contextual databases and perform BLAST searches against the corresponding sequence databases. All contextual and sequence databases are freely accessible and downloadable from https://s1.sfb.uit.no/public/mar/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
21SSD: a new public 21-cm EoR database

NASA Astrophysics Data System (ADS)

Eames, Evan; Semelin, Benoît

2018-05-01

With current efforts inching closer to detecting the 21-cm signal from the Epoch of Reionization (EoR), proper preparation will require publicly available simulated models of the various forms the signal could take. In this work we present a database of such models, available at 21ssd.obspm.fr. The models are created with a fully-coupled radiative hydrodynamic simulation (LICORICE), and are created at high resolution (10243). We also begin to analyse and explore the possible 21-cm EoR signals (with Power Spectra and Pixel Distribution Functions), and study the effects of thermal noise on our ability to recover the signal out to high redshifts. Finally, we begin to explore the concepts of `distance' between different models, which represents a crucial step towards optimising parameter space sampling, training neural networks, and finally extracting parameter values from observations.
Scientific publications about DNA structure-function and PCR technique in Costa Rica: a historic view (1953-2003).

PubMed

Albertazzi, Federico J

2004-09-01

The spreading of knowledge depends on the access to the information and its immediate use. Models are useful to explain specific phenomena. The scientific community accepts some models in Biology after a period of time, once it has evidence to support it. The model of the structure and function of the DNA proposed by Watson & Crick (1953) was not the exception, since a few years later the DNA model was finally accepted. In Costa Rica, DNA function was first mentioned in 1970, in the magazine Biologia Tropical (Tropical Biology Magazine), more than 15 years after its first publication in a scientific journal. An opposite situation occurs with technical innovations. If the efficiency of a new scientific technique is proved in a compelling way, then the acceptance by the community comes swiftly. This was the case of the polymerase chain reaction, or PCR. The first PCR machine in Costa Rica arrived in 1991, only three years after its publication.
Estimating Diversity of Florida Keys Zooplankton Using New Environmental DNA Methods

NASA Astrophysics Data System (ADS)

Djurhuus, A.; Goldsmith, D. B.; Sawaya, N. A.; Breitbart, M.

2016-02-01

Zooplankton are of great importance in marine food webs, where they serve to link the phytoplankton and bacteria with higher trophic levels. Zooplankton are a diverse group containing molluscs, crustaceans, fish larvae and many other taxa. The sheer number of species and often minor morphological distinctions between species makes it challenging and exceptionally time consuming to identify the species composition of marine zooplankton samples. As a part of the Marine Biodiversity Observation Network (MBON) project, we have developed and groundtruthed an alternative, relatively time-efficient method for zooplankton identification using environmental DNA (eDNA). Samples were collected from Molasses reef, Looe Key, and Western Sambo along the Florida Keys from five bi-monthly cruises on board the RV Walton Smith. Samples were collected for environmental DNA (eDNA) by filtering 1 L of water on to a 0.22 µm filter and zooplankton samples were collected using nets with three mesh sizes (64μm, 200μm, and 500μm) to catch different size fractions. Half of zooplankton samples were fixed in 70% ethanol and half in 10% formalin, for DNA extraction and morphological identification, respectively. Individuals representing visually abundant taxa were picked into individual wells for PCR with universal 18S rRNA gene primers and subsequent sequencing to build a reference barcode database for zooplankton species commonly found in the study region. PCR and Illumina MiSeq next generation sequencing was applied to the eDNA extracted from the 0.22 μm filters and sequences were be compared to our local custom database as well as publicly available databases to determine zooplankton community composition. Finally, composition and diversity analyses were performed to compare results obtained with the new eDNA approach to standard morphological classification of zooplankton communities. Results show that the eDNA approach can enable the determination of zooplankton diversity through
Lnc2Meth: a manually curated database of regulatory relationships between long non-coding RNAs and DNA methylation associated with human disease

PubMed Central

Zhi, Hui; Li, Xin; Wang, Peng; Gao, Yue; Gao, Baoqing; Zhou, Dianshuang; Zhang, Yan; Guo, Maoni; Yue, Ming; Shen, Weitao

2018-01-01

Abstract Lnc2Meth (http://www.bio-bigdata.com/Lnc2Meth/), an interactive resource to identify regulatory relationships between human long non-coding RNAs (lncRNAs) and DNA methylation, is not only a manually curated collection and annotation of experimentally supported lncRNAs-DNA methylation associations but also a platform that effectively integrates tools for calculating and identifying the differentially methylated lncRNAs and protein-coding genes (PCGs) in diverse human diseases. The resource provides: (i) advanced search possibilities, e.g. retrieval of the database by searching the lncRNA symbol of interest, DNA methylation patterns, regulatory mechanisms and disease types; (ii) abundant computationally calculated DNA methylation array profiles for the lncRNAs and PCGs; (iii) the prognostic values for each hit transcript calculated from the patients clinical data; (iv) a genome browser to display the DNA methylation landscape of the lncRNA transcripts for a specific type of disease; (v) tools to re-annotate probes to lncRNA loci and identify the differential methylation patterns for lncRNAs and PCGs with user-supplied external datasets; (vi) an R package (LncDM) to complete the differentially methylated lncRNAs identification and visualization with local computers. Lnc2Meth provides a timely and valuable resource that can be applied to significantly expand our understanding of the regulatory relationships between lncRNAs and DNA methylation in various human diseases. PMID:29069510
MGDB: crossing the marker genes of a user microarray with a database of public-microarrays marker genes.

PubMed

Huerta, Mario; Munyi, Marc; Expósito, David; Querol, Enric; Cedano, Juan

2014-06-15

The microarrays performed by scientific teams grow exponentially. These microarray data could be useful for researchers around the world, but unfortunately they are underused. To fully exploit these data, it is necessary (i) to extract these data from a repository of the high-throughput gene expression data like Gene Expression Omnibus (GEO) and (ii) to make the data from different microarrays comparable with tools easy to use for scientists. We have developed these two solutions in our server, implementing a database of microarray marker genes (Marker Genes Data Base). This database contains the marker genes of all GEO microarray datasets and it is updated monthly with the new microarrays from GEO. Thus, researchers can see whether the marker genes of their microarray are marker genes in other microarrays in the database, expanding the analysis of their microarray to the rest of the public microarrays. This solution helps not only to corroborate the conclusions regarding a researcher's microarray but also to identify the phenotype of different subsets of individuals under investigation, to frame the results with microarray experiments from other species, pathologies or tissues, to search for drugs that promote the transition between the studied phenotypes, to detect undesirable side effects of the treatment applied, etc. Thus, the researcher can quickly add relevant information to his/her studies from all of the previous analyses performed in other studies as long as they have been deposited in public repositories. Marker-gene database tool: http://ibb.uab.es/mgdb © The Author 2014. Published by Oxford University Press.
Uncovering Trophic Interactions in Arthropod Predators through DNA Shotgun-Sequencing of Gut Contents

PubMed Central

Paula, Débora P.; Linard, Benjamin; Crampton-Platt, Alex; Srivathsan, Amrita; Timmermans, Martijn J. T. N.; Sujii, Edison R.; Pires, Carmen S. S.; Souza, Lucas M.; Andow, David A.; Vogler, Alfried P.

2016-01-01

Characterizing trophic networks is fundamental to many questions in ecology, but this typically requires painstaking efforts, especially to identify the diet of small generalist predators. Several attempts have been devoted to develop suitable molecular tools to determine predatory trophic interactions through gut content analysis, and the challenge has been to achieve simultaneously high taxonomic breadth and resolution. General and practical methods are still needed, preferably independent of PCR amplification of barcodes, to recover a broader range of interactions. Here we applied shotgun-sequencing of the DNA from arthropod predator gut contents, extracted from four common coccinellid and dermapteran predators co-occurring in an agroecosystem in Brazil. By matching unassembled reads against six DNA reference databases obtained from public databases and newly assembled mitogenomes, and filtering for high overlap length and identity, we identified prey and other foreign DNA in the predator guts. Good taxonomic breadth and resolution was achieved (93% of prey identified to species or genus), but with low recovery of matching reads. Two to nine trophic interactions were found for these predators, some of which were only inferred by the presence of parasitoids and components of the microbiome known to be associated with aphid prey. Intraguild predation was also found, including among closely related ladybird species. Uncertainty arises from the lack of comprehensive reference databases and reliance on low numbers of matching reads accentuating the risk of false positives. We discuss caveats and some future prospects that could improve the use of direct DNA shotgun-sequencing to characterize arthropod trophic networks. PMID:27622637
Access to DNA and protein databases on the Internet.

PubMed

Harper, R

1994-02-01

During the past year, the number of biological databases that can be queried via Internet has dramatically increased. This increase has resulted from the introduction of networking tools, such as Gopher and WAIS, that make it easy for research workers to index databases and make them available for on-line browsing. Biocomputing in the nineties will see the advent of more client/server options for the solution of problems in bioinformatics.
Hospice palliative care article publications: An analysis of the Web of Science database from 1993 to 2013.

PubMed

Chang, Hsiao-Ting; Lin, Ming-Hwai; Chen, Chun-Ku; Hwang, Shinn-Jang; Hwang, I-Hsuan; Chen, Yu-Chun

2016-01-01

Academic publications are important for developing a medical specialty or discipline and improvements of quality of care. As hospice palliative care medicine is a rapidly growing medical specialty in Taiwan, this study aimed to analyze the hospice palliative care-related publications from 1993 through 2013 both worldwide and in Taiwan, by using the Web of Science database. Academic articles published with topics including "hospice", "palliative care", "end of life care", and "terminal care" were retrieved and analyzed from the Web of Science database, which includes documents published in Science Citation Index-Expanded and Social Science Citation Indexed journals from 1993 to 2013. Compound annual growth rates (CAGRs) were calculated to evaluate the trends of publications. There were a total of 27,788 documents published worldwide during the years 1993 to 2013. The top five most prolific countries/areas with published documents were the United States (11,419 documents, 41.09%), England (3620 documents, 13.03%), Canada (2428 documents, 8.74%), Germany (1598 documents, 5.75%), and Australia (1580 documents, 5.69%). Three hundred and ten documents (1.12%) were published from Taiwan, which ranks second among Asian countries (after Japan, with 594 documents, 2.14%) and 16(th) in the world. During this 21-year period, the number of hospice palliative care-related article publications increased rapidly. The worldwide CAGR for hospice palliative care publications during 1993 through 2013 was 12.9%. As for Taiwan, the CAGR for publications during 1999 through 2013 was 19.4%. The majority of these documents were submitted from universities or hospitals affiliated to universities. The number of hospice palliative care-related publications increased rapidly from 1993 to 2013 in the world and in Taiwan; however, the number of publications from Taiwan is still far below those published in several other countries. Further research is needed to identify and try to reduce the
Database Resources of the BIG Data Center in 2018.

PubMed

2018-01-04

The BIG Data Center at Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences provides freely open access to a suite of database resources in support of worldwide research activities in both academia and industry. With the vast amounts of omics data generated at ever-greater scales and rates, the BIG Data Center is continually expanding, updating and enriching its core database resources through big-data integration and value-added curation, including BioCode (a repository archiving bioinformatics tool codes), BioProject (a biological project library), BioSample (a biological sample library), Genome Sequence Archive (GSA, a data repository for archiving raw sequence reads), Genome Warehouse (GWH, a centralized resource housing genome-scale data), Genome Variation Map (GVM, a public repository of genome variations), Gene Expression Nebulas (GEN, a database of gene expression profiles based on RNA-Seq data), Methylation Bank (MethBank, an integrated databank of DNA methylomes), and Science Wikis (a series of biological knowledge wikis for community annotations). In addition, three featured web services are provided, viz., BIG Search (search as a service; a scalable inter-domain text search engine), BIG SSO (single sign-on as a service; a user access control system to gain access to multiple independent systems with a single ID and password) and Gsub (submission as a service; a unified submission service for all relevant resources). All of these resources are publicly accessible through the home page of the BIG Data Center at http://bigd.big.ac.cn. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Database Resources of the BIG Data Center in 2018

PubMed Central

Xu, Xingjian; Hao, Lili; Zhu, Junwei; Tang, Bixia; Zhou, Qing; Song, Fuhai; Chen, Tingting; Zhang, Sisi; Dong, Lili; Lan, Li; Wang, Yanqing; Sang, Jian; Hao, Lili; Liang, Fang; Cao, Jiabao; Liu, Fang; Liu, Lin; Wang, Fan; Ma, Yingke; Xu, Xingjian; Zhang, Lijuan; Chen, Meili; Tian, Dongmei; Li, Cuiping; Dong, Lili; Du, Zhenglin; Yuan, Na; Zeng, Jingyao; Zhang, Zhewen; Wang, Jinyue; Shi, Shuo; Zhang, Yadong; Pan, Mengyu; Tang, Bixia; Zou, Dong; Song, Shuhui; Sang, Jian; Xia, Lin; Wang, Zhennan; Li, Man; Cao, Jiabao; Niu, Guangyi; Zhang, Yang; Sheng, Xin; Lu, Mingming; Wang, Qi; Xiao, Jingfa; Zou, Dong; Wang, Fan; Hao, Lili; Liang, Fang; Li, Mengwei; Sun, Shixiang; Zou, Dong; Li, Rujiao; Yu, Chunlei; Wang, Guangyu; Sang, Jian; Liu, Lin; Li, Mengwei; Li, Man; Niu, Guangyi; Cao, Jiabao; Sun, Shixiang; Xia, Lin; Yin, Hongyan; Zou, Dong; Xu, Xingjian; Ma, Lina; Chen, Huanxin; Sun, Yubin; Yu, Lei; Zhai, Shuang; Sun, Mingyuan; Zhang, Zhang; Zhao, Wenming; Xiao, Jingfa; Bao, Yiming; Song, Shuhui; Hao, Lili; Li, Rujiao; Ma, Lina; Sang, Jian; Wang, Yanqing; Tang, Bixia; Zou, Dong; Wang, Fan

2018-01-01

Abstract The BIG Data Center at Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences provides freely open access to a suite of database resources in support of worldwide research activities in both academia and industry. With the vast amounts of omics data generated at ever-greater scales and rates, the BIG Data Center is continually expanding, updating and enriching its core database resources through big-data integration and value-added curation, including BioCode (a repository archiving bioinformatics tool codes), BioProject (a biological project library), BioSample (a biological sample library), Genome Sequence Archive (GSA, a data repository for archiving raw sequence reads), Genome Warehouse (GWH, a centralized resource housing genome-scale data), Genome Variation Map (GVM, a public repository of genome variations), Gene Expression Nebulas (GEN, a database of gene expression profiles based on RNA-Seq data), Methylation Bank (MethBank, an integrated databank of DNA methylomes), and Science Wikis (a series of biological knowledge wikis for community annotations). In addition, three featured web services are provided, viz., BIG Search (search as a service; a scalable inter-domain text search engine), BIG SSO (single sign-on as a service; a user access control system to gain access to multiple independent systems with a single ID and password) and Gsub (submission as a service; a unified submission service for all relevant resources). All of these resources are publicly accessible through the home page of the BIG Data Center at http://bigd.big.ac.cn. PMID:29036542
A novel approach: chemical relational databases, and the role of the ISSCAN database on assessing chemical carcinogenicity.

PubMed

Benigni, Romualdo; Bossa, Cecilia; Richard, Ann M; Yang, Chihae

2008-01-01

Mutagenicity and carcinogenicity databases are crucial resources for toxicologists and regulators involved in chemicals risk assessment. Until recently, existing public toxicity databases have been constructed primarily as "look-up-tables" of existing data, and most often did not contain chemical structures. Concepts and technologies originated from the structure-activity relationships science have provided powerful tools to create new types of databases, where the effective linkage of chemical toxicity with chemical structure can facilitate and greatly enhance data gathering and hypothesis generation, by permitting: a) exploration across both chemical and biological domains; and b) structure-searchability through the data. This paper reviews the main public databases, together with the progress in the field of chemical relational databases, and presents the ISSCAN database on experimental chemical carcinogens.
JICST Factual Database(1)

NASA Astrophysics Data System (ADS)

Kurosawa, Shinji

The outline of JICST factual database (JOIS-F), which JICST has started from January, 1988, and its online service are described in this paper. First, the author mentions the circumstances from 1973, when its planning was started, to the present, and its relation to "Project by Special Coordination Founds for Promoting Science and Technology". Secondly, databases, which are now under development aiming to start its services from fiscal 1988 or fiscal 1989, of DNA, metallic material intensity, crystal structure, chemical substance regulations, and so forth, are described. Lastly, its online service is briefly explained.
DNA patenting: implications for public health research.

PubMed Central

Dutfield, Graham

2006-01-01

I weigh the arguments for and against the patenting of functional DNA sequences including genes, and find the objections to be compelling. Is an outright ban on DNA patenting the right policy response? Not necessarily. Governments may wish to consider options ranging from patent law reforms to the creation of new rights. There are alternative ways to protect DNA sequences that industry may choose if DNA patenting is restricted or banned. Some of these alternatives may be more harmful than patents. Such unintended consequences of patent bans mean that we should think hard before concluding that prohibition is the only response to legitimate concerns about the appropriateness of patents in the field of human genomics. PMID:16710549
Alignment of high-throughput sequencing data inside in-memory databases.

PubMed

Firnkorn, Daniel; Knaup-Gregori, Petra; Lorenzo Bermejo, Justo; Ganzinger, Matthias

2014-01-01

In times of high-throughput DNA sequencing techniques, performance-capable analysis of DNA sequences is of high importance. Computer supported DNA analysis is still an intensive time-consuming task. In this paper we explore the potential of a new In-Memory database technology by using SAP's High Performance Analytic Appliance (HANA). We focus on read alignment as one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both, HANA and the free database system MySQL, to compare execution time and memory management. To ensure that the results are comparable, MySQL has been running in memory as well, utilizing its integrated memory engine for database table creation. We implemented stored procedures, containing exact and inexact searching of DNA reads within the reference genome GRCh37. Due to technical restrictions in SAP HANA concerning recursion, the inexact matching problem could not be implemented on this platform. Hence, performance analysis between HANA and MySQL was made by comparing the execution time of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL which means, that there is a high potential within the new In-Memory concepts, leading to further developments of DNA analysis procedures in the future.
Does language matter? A case study of epidemiological and public health journals, databases and professional education in French, German and Italian

PubMed Central

Baussano, Iacopo; Brzoska, Patrick; Fedeli, Ugo; Larouche, Claudia; Razum, Oliver; Fung, Isaac C-H

2008-01-01

Epidemiology and public health are usually context-specific. Journals published in different languages and countries play a role both as sources of data and as channels through which evidence is incorporated into local public health practice. Databases in these languages facilitate access to relevant journals, and professional education in these languages facilitates the growth of native expertise in epidemiology and public health. However, as English has become the lingua franca of scientific communication in the era of globalisation, many journals published in non-English languages face the difficult dilemma of either switching to English and competing internationally, or sticking to the native tongue and having a restricted circulation among a local readership. This paper discusses the historical development of epidemiology and the current scene of epidemiological and public health journals, databases and professional education in three Western European languages: French, German and Italian, and examines the dynamics and struggles they have today. PMID:18826570
Aviation Safety Issues Database

NASA Technical Reports Server (NTRS)

Morello, Samuel A.; Ricks, Wendell R.

2009-01-01

The aviation safety issues database was instrumental in the refinement and substantiation of the National Aviation Safety Strategic Plan (NASSP). The issues database is a comprehensive set of issues from an extremely broad base of aviation functions, personnel, and vehicle categories, both nationally and internationally. Several aviation safety stakeholders such as the Commercial Aviation Safety Team (CAST) have already used the database. This broader interest was the genesis to making the database publically accessible and writing this report.

5S ribosomal RNA database Y2K

PubMed Central

Szymanski, Maciej; Barciszewska, Miroslawa Z.; Barciszewski, Jan; Erdmann, Volker A.

2000-01-01

This paper presents the updated version (Y2K) of the database of ribosomal 5S ribonucleic acids (5S rRNA) and their genes (5S rDNA), http://rose.man/poznan. pl/5SData/index.html . This edition of the database contains 1985 primary structures of 5S rRNA and 5S rDNA. They include 60 archaebacterial, 470 eubacterial, 63 plastid, nine mitochondrial and 1383 eukaryotic sequences. The nucleotide sequences of the 5S rRNAs or 5S rDNAs are divided according to the taxonomic position of the source organisms. PMID:10592212
5S ribosomal RNA database Y2K.

PubMed

Szymanski, M; Barciszewska, M Z; Barciszewski, J; Erdmann, V A

2000-01-01

This paper presents the updated version (Y2K) of the database of ribosomal 5S ribonucleic acids (5S rRNA) and their genes (5S rDNA), http://rose.man/poznan.pl/5SData/index.html. This edition of the database contains 1985primary structures of 5S rRNA and 5S rDNA. They include 60 archaebacterial, 470 eubacterial, 63 plastid, nine mitochondrial and 1383 eukaryotic sequences. The nucleotide sequences of the 5S rRNAs or 5S rDNAs are divided according to the taxonomic position of the source organisms.
DISTRIBUTED STRUCTURE-SEARCHABLE TOXICITY (DSSTOX) DATABASE NETWORK: MAKING PUBLIC TOXICITY DATA RESOURCES MORE ACCESSIBLE AND USABLE FOR DATA EXPLORATION AND SAR DEVELOPMENT

EPA Science Inventory

Distributed Structure-Searchable Toxicity (DSSTox) Database Network: Making Public Toxicity Data Resources More Accessible and U sable for Data Exploration and SAR Development

Many sources of public toxicity data are not currently linked to chemical structure, are not ...
The TTSMI database: a catalog of triplex target DNA sites associated with genes and regulatory elements in the human genome.

PubMed

Jenjaroenpun, Piroon; Chew, Chee Siang; Yong, Tai Pang; Choowongkomon, Kiattawee; Thammasorn, Wimada; Kuznetsov, Vladimir A

2015-01-01

A triplex target DNA site (TTS), a stretch of DNA that is composed of polypurines, is able to form a triple-helix (triplex) structure with triplex-forming oligonucleotides (TFOs) and is able to influence the site-specific modulation of gene expression and/or the modification of genomic DNA. The co-localization of a genomic TTS with gene regulatory signals and functional genome structures suggests that TFOs could potentially be exploited in antigene strategies for the therapy of cancers and other genetic diseases. Here, we present the TTS Mapping and Integration (TTSMI; http://ttsmi.bii.a-star.edu.sg) database, which provides a catalog of unique TTS locations in the human genome and tools for analyzing the co-localization of TTSs with genomic regulatory sequences and signals that were identified using next-generation sequencing techniques and/or predicted by computational models. TTSMI was designed as a user-friendly tool that facilitates (i) fast searching/filtering of TTSs using several search terms and criteria associated with sequence stability and specificity, (ii) interactive filtering of TTSs that co-localize with gene regulatory signals and non-B DNA structures, (iii) exploration of dynamic combinations of the biological signals of specific TTSs and (iv) visualization of a TTS simultaneously with diverse annotation tracks via the UCSC genome browser. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
SeedStor: A Germplasm Information Management System and Public Database

PubMed Central

Horler, RSP; Turner, AS; Fretter, P; Ambrose, M

2018-01-01

Abstract SeedStor (https://www.seedstor.ac.uk) acts as the publicly available database for the seed collections held by the Germplasm Resources Unit (GRU) based at the John Innes Centre, Norwich, UK. The GRU is a national capability supported by the Biotechnology and Biological Sciences Research Council (BBSRC). The GRU curates germplasm collections of a range of temperate cereal, legume and Brassica crops and their associated wild relatives, as well as precise genetic stocks, near-isogenic lines and mapping populations. With >35,000 accessions, the GRU forms part of the UK’s plant conservation contribution to the Multilateral System (MLS) of the International Treaty for Plant Genetic Resources for Food and Agriculture (ITPGRFA) for wheat, barley, oat and pea. SeedStor is a fully searchable system that allows our various collections to be browsed species by species through to complicated multipart phenotype criteria-driven queries. The results from these searches can be downloaded for later analysis or used to order germplasm via our shopping cart. The user community for SeedStor is the plant science research community, plant breeders, specialist growers, hobby farmers and amateur gardeners, and educationalists. Furthermore, SeedStor is much more than a database; it has been developed to act internally as a Germplasm Information Management System that allows team members to track and process germplasm requests, determine regeneration priorities, handle cost recovery and Material Transfer Agreement paperwork, manage the Seed Store holdings and easily report on a wide range of the aforementioned tasks. PMID:29228298
The Latin American Social Medicine database

PubMed Central

Eldredge, Jonathan D; Waitzkin, Howard; Buchanan, Holly S; Teal, Janis; Iriart, Celia; Wiley, Kevin; Tregear, Jonathan

2004-01-01

Background Public health practitioners and researchers for many years have been attempting to understand more clearly the links between social conditions and the health of populations. Until recently, most public health professionals in English-speaking countries were unaware that their colleagues in Latin America had developed an entire field of inquiry and practice devoted to making these links more clearly understood. The Latin American Social Medicine (LASM) database finally bridges this previous gap. Description This public health informatics case study describes the key features of a unique information resource intended to improve access to LASM literature and to augment understanding about the social determinants of health. This case study includes both quantitative and qualitative evaluation data. Currently the LASM database at The University of New Mexico brings important information, originally known mostly within professional networks located in Latin American countries to public health professionals worldwide via the Internet. The LASM database uses Spanish, Portuguese, and English language trilingual, structured abstracts to summarize classic and contemporary works. Conclusion This database provides helpful information for public health professionals on the social determinants of health and expands access to LASM. PMID:15627401
Toward unification of taxonomy databases in a distributed computer environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kitakami, Hajime; Tateno, Yoshio; Gojobori, Takashi

1994-12-31

All the taxonomy databases constructed with the DNA databases of the international DNA data banks are powerful electronic dictionaries which aid in biological research by computer. The taxonomy databases are, however not consistently unified with a relational format. If we can achieve consistent unification of the taxonomy databases, it will be useful in comparing many research results, and investigating future research directions from existent research results. In particular, it will be useful in comparing relationships between phylogenetic trees inferred from molecular data and those constructed from morphological data. The goal of the present study is to unify the existent taxonomymore » databases and eliminate inconsistencies (errors) that are present in them. Inconsistencies occur particularly in the restructuring of the existent taxonomy databases, since classification rules for constructing the taxonomy have rapidly changed with biological advancements. A repair system is needed to remove inconsistencies in each data bank and mismatches among data banks. This paper describes a new methodology for removing both inconsistencies and mismatches from the databases on a distributed computer environment. The methodology is implemented in a relational database management system, SYBASE.« less
The Génolevures database.

PubMed

Martin, Tiphaine; Sherman, David J; Durrens, Pascal

2011-01-01

The Génolevures online database (URL: http://www.genolevures.org) stores and provides the data and results obtained by the Génolevures Consortium through several campaigns of genome annotation of the yeasts in the Saccharomycotina subphylum (hemiascomycetes). This database is dedicated to large-scale comparison of these genomes, storing not only the different chromosomal elements detected in the sequences, but also the logical relations between them. The database is divided into a public part, accessible to anyone through Internet, and a private part where the Consortium members make genome annotations with our Magus annotation system; this system is used to annotate several related genomes in parallel. The public database is widely consulted and offers structured data, organized using a REST web site architecture that allows for automated requests. The implementation of the database, as well as its associated tools and methods, is evolving to cope with the influx of genome sequences produced by Next Generation Sequencing (NGS). Copyright © 2011 Académie des sciences. Published by Elsevier SAS. All rights reserved.
Lnc2Meth: a manually curated database of regulatory relationships between long non-coding RNAs and DNA methylation associated with human disease.

PubMed

Zhi, Hui; Li, Xin; Wang, Peng; Gao, Yue; Gao, Baoqing; Zhou, Dianshuang; Zhang, Yan; Guo, Maoni; Yue, Ming; Shen, Weitao; Ning, Shangwei; Jin, Lianhong; Li, Xia

2018-01-04

Lnc2Meth (http://www.bio-bigdata.com/Lnc2Meth/), an interactive resource to identify regulatory relationships between human long non-coding RNAs (lncRNAs) and DNA methylation, is not only a manually curated collection and annotation of experimentally supported lncRNAs-DNA methylation associations but also a platform that effectively integrates tools for calculating and identifying the differentially methylated lncRNAs and protein-coding genes (PCGs) in diverse human diseases. The resource provides: (i) advanced search possibilities, e.g. retrieval of the database by searching the lncRNA symbol of interest, DNA methylation patterns, regulatory mechanisms and disease types; (ii) abundant computationally calculated DNA methylation array profiles for the lncRNAs and PCGs; (iii) the prognostic values for each hit transcript calculated from the patients clinical data; (iv) a genome browser to display the DNA methylation landscape of the lncRNA transcripts for a specific type of disease; (v) tools to re-annotate probes to lncRNA loci and identify the differential methylation patterns for lncRNAs and PCGs with user-supplied external datasets; (vi) an R package (LncDM) to complete the differentially methylated lncRNAs identification and visualization with local computers. Lnc2Meth provides a timely and valuable resource that can be applied to significantly expand our understanding of the regulatory relationships between lncRNAs and DNA methylation in various human diseases. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
DNA barcoding in the media: does coverage of cool science reflect its social context?

PubMed

Geary, Janis; Camicioli, Emma; Bubela, Tania

2016-09-01

Paul Hebert and colleagues first described DNA barcoding in 2003, which led to international efforts to promote and coordinate its use. Since its inception, DNA barcoding has generated considerable media coverage. We analysed whether this coverage reflected both the scientific and social mandates of international barcoding organizations. We searched newspaper databases to identify 900 English-language articles from 2003 to 2013. Coverage of the science of DNA barcoding was highly positive but lacked context for key topics. Coverage omissions pose challenges for public understanding of the science and applications of DNA barcoding; these included coverage of governance structures and issues related to the sharing of genetic resources across national borders. Our analysis provided insight into how barcoding communication efforts have translated into media coverage; more targeted communication efforts may focus media attention on previously omitted, but important topics. Our analysis is timely as the DNA barcoding community works to establish the International Society for the Barcode of Life.
Identification and correction of abnormal, incomplete and mispredicted proteins in public databases.

PubMed

Nagy, Alinda; Hegyi, Hédi; Farkas, Krisztina; Tordai, Hedvig; Kozma, Evelin; Bányai, László; Patthy, László

2008-08-27

Despite significant improvements in computational annotation of genomes, sequences of abnormal, incomplete or incorrectly predicted genes and proteins remain abundant in public databases. Since the majority of incomplete, abnormal or mispredicted entries are not annotated as such, these errors seriously affect the reliability of these databases. Here we describe the MisPred approach that may provide an efficient means for the quality control of databases. The current version of the MisPred approach uses five distinct routines for identifying abnormal, incomplete or mispredicted entries based on the principle that a sequence is likely to be incorrect if some of its features conflict with our current knowledge about protein-coding genes and proteins: (i) conflict between the predicted subcellular localization of proteins and the absence of the corresponding sequence signals; (ii) presence of extracellular and cytoplasmic domains and the absence of transmembrane segments; (iii) co-occurrence of extracellular and nuclear domains; (iv) violation of domain integrity; (v) chimeras encoded by two or more genes located on different chromosomes. Analyses of predicted EnsEMBL protein sequences of nine deuterostome (Homo sapiens, Mus musculus, Rattus norvegicus, Monodelphis domestica, Gallus gallus, Xenopus tropicalis, Fugu rubripes, Danio rerio and Ciona intestinalis) and two protostome species (Caenorhabditis elegans and Drosophila melanogaster) have revealed that the absence of expected signal peptides and violation of domain integrity account for the majority of mispredictions. Analyses of sequences predicted by NCBI's GNOMON annotation pipeline show that the rates of mispredictions are comparable to those of EnsEMBL. Interestingly, even the manually curated UniProtKB/Swiss-Prot dataset is contaminated with mispredicted or abnormal proteins, although to a much lesser extent than UniProtKB/TrEMBL or the EnsEMBL or GNOMON-predicted entries. MisPred works efficiently in
ESTuber db: an online database for Tuber borchii EST sequences.

PubMed

Lazzari, Barbara; Caprera, Andrea; Cosentino, Cristian; Stella, Alessandra; Milanesi, Luciano; Viotti, Angelo

2007-03-08

The ESTuber database (http://www.itb.cnr.it/estuber) includes 3,271 Tuber borchii expressed sequence tags (EST). The dataset consists of 2,389 sequences from an in-house prepared cDNA library from truffle vegetative hyphae, and 882 sequences downloaded from GenBank and representing four libraries from white truffle mycelia and ascocarps at different developmental stages. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts. Data were collected in a MySQL database, which can be queried via a php-based web interface. Sequences included in the ESTuber db were clustered and annotated against three databases: the GenBank nr database, the UniProtKB database and a third in-house prepared database of fungi genomic sequences. An algorithm was implemented to infer statistical classification among Gene Ontology categories from the ontology occurrences deduced from the annotation procedure against the UniProtKB database. Ontologies were also deduced from the annotation of more than 130,000 EST sequences from five filamentous fungi, for intra-species comparison purposes. Further analyses were performed on the ESTuber db dataset, including tandem repeats search and comparison of the putative protein dataset inferred from the EST sequences to the PROSITE database for protein patterns identification. All the analyses were performed both on the complete sequence dataset and on the contig consensus sequences generated by the EST assembly procedure. The resulting web site is a resource of data and links related to truffle expressed genes. The Sequence Report and Contig Report pages are the web interface core structures which, together with the Text search utility and the Blast utility, allow easy access to the data stored in the database.
National Vulnerability Database (NVD)

National Institute of Standards and Technology Data Gateway

National Vulnerability Database (NVD) (Web, free access) NVD is a comprehensive cyber security vulnerability database that integrates all publicly available U.S. Government vulnerability resources and provides references to industry resources. It is based on and synchronized with the CVE vulnerability naming standard.
A Partnership for Public Health: USDA Branded Food Products Database

USDA-ARS?s Scientific Manuscript database

The importance of comprehensive food composition databases is more critical than ever in helping to address global food security. The USDA National Nutrient Database for Standard Reference is the “gold standard” for food composition databases. The presentation will include new developments in stren...
Variations in clinicopathologic characteristics of thyroid cancer among racial ethnic groups: analysis of a large public city hospital and the SEER database.

PubMed

Moo-Young, Tricia A; Panergo, Jessel; Wang, Chih E; Patel, Subhash; Duh, Hong Yan; Winchester, David J; Prinz, Richard A; Fogelfeld, Leon

2013-11-01

Clinicopathologic variables influence the treatment and prognosis of patients with thyroid cancer. A retrospective analysis of public hospital thyroid cancer database and the Surveillance, Epidemiology and End Results 17 database was conducted. Demographic, clinical, and pathologic data were compared across ethnic groups. Within the public hospital database, Hispanics versus non-Hispanic whites were younger and had more lymph node involvement (34% vs 17%, P < .001). Median tumor size was not statistically different across ethnic groups. Similar findings were demonstrated within the Surveillance, Epidemiology and End Results database. African Americans aged <45 years had the largest tumors but were least likely to have lymph node involvement. Asians had the most stage IV disease despite having no differences in tumor size, lymph node involvement, and capsular invasion. There is considerable variability in the clinical presentation of thyroid cancer across ethnic groups. Such disparities persist within an equal-access health care system. These findings suggest that factors beyond socioeconomics may contribute to such differences. Copyright © 2013 Elsevier Inc. All rights reserved.
Database and online map service on unstable rock slopes in Norway - From data perpetuation to public information

NASA Astrophysics Data System (ADS)

Oppikofer, Thierry; Nordahl, Bobo; Bunkholt, Halvor; Nicolaisen, Magnus; Jarna, Alexandra; Iversen, Sverre; Hermanns, Reginald L.; Böhme, Martina; Yugsi Molina, Freddy X.

2015-11-01

The unstable rock slope database is developed and maintained by the Geological Survey of Norway as part of the systematic mapping of unstable rock slopes in Norway. This mapping aims to detect catastrophic rock slope failures before they occur. More than 250 unstable slopes with post-glacial deformation are detected up to now. The main aims of the unstable rock slope database are (1) to serve as a national archive for unstable rock slopes in Norway; (2) to serve for data collection and storage during field mapping; (3) to provide decision-makers with hazard zones and other necessary information on unstable rock slopes for land-use planning and mitigation; and (4) to inform the public through an online map service. The database is organized hierarchically with a main point for each unstable rock slope to which several feature classes and tables are linked. This main point feature class includes several general attributes of the unstable rock slopes, such as site name, general and geological descriptions, executed works, recommendations, technical parameters (volume, lithology, mechanism and others), displacement rates, possible consequences, as well as hazard and risk classification. Feature classes and tables linked to the main feature class include different scenarios of an unstable rock slope, field observation points, sampling points for dating, displacement measurement stations, lineaments, unstable areas, run-out areas, areas affected by secondary effects, along with tables for hazard and risk classification and URL links to further documentation and references. The database on unstable rock slopes in Norway will be publicly consultable through an online map service. Factsheets with key information on unstable rock slopes can be automatically generated and downloaded for each site. Areas of possible rock avalanche run-out and their secondary effects displayed in the online map service, along with hazard and risk assessments, will become important tools for
[Public scientific knowledge distribution in health information, communication and information technology indexed in MEDLINE and LILACS databases].

PubMed

Packer, Abel Laerte; Tardelli, Adalberto Otranto; Castro, Regina Célia Figueiredo

2007-01-01

This study explores the distribution of international, regional and national scientific output in health information and communication, indexed in the MEDLINE and LILACS databases, between 1996 and 2005. A selection of articles was based on the hierarchical structure of Information Science in MeSH vocabulary. Four specific domains were determined: health information, medical informatics, scientific communications on healthcare and healthcare communications. The variables analyzed were: most-covered subjects and journals, author affiliation and publication countries and languages, in both databases. The Information Science category is represented in nearly 5% of MEDLINE and LILACS articles. The four domains under analysis showed a relative annual increase in MEDLINE. The Medical Informatics domain showed the highest number of records in MEDLINE, representing about half of all indexed articles. The importance of Information Science as a whole is more visible in publications from developed countries and the findings indicate the predominance of the United States, with significant growth in scientific output from China and South Korea and, to a lesser extent, Brazil.
PFR²: a curated database of planktonic foraminifera 18S ribosomal DNA as a resource for studies of plankton ecology, biogeography and evolution.

PubMed

Morard, Raphaël; Darling, Kate F; Mahé, Frédéric; Audic, Stéphane; Ujiié, Yurika; Weiner, Agnes K M; André, Aurore; Seears, Heidi A; Wade, Christopher M; Quillévéré, Frédéric; Douady, Christophe J; Escarguel, Gilles; de Garidel-Thoron, Thibault; Siccha, Michael; Kucera, Michal; de Vargas, Colomban

2015-11-01

Planktonic foraminifera (Rhizaria) are ubiquitous marine pelagic protists producing calcareous shells with conspicuous morphology. They play an important role in the marine carbon cycle, and their exceptional fossil record serves as the basis for biochronostratigraphy and past climate reconstructions. A major worldwide sampling effort over the last two decades has resulted in the establishment of multiple large collections of cryopreserved individual planktonic foraminifera samples. Thousands of 18S rDNA partial sequences have been generated, representing all major known morphological taxa across their worldwide oceanic range. This comprehensive data coverage provides an opportunity to assess patterns of molecular ecology and evolution in a holistic way for an entire group of planktonic protists. We combined all available published and unpublished genetic data to build PFR(2), the Planktonic foraminifera Ribosomal Reference database. The first version of the database includes 3322 reference 18S rDNA sequences belonging to 32 of the 47 known morphospecies of extant planktonic foraminifera, collected from 460 oceanic stations. All sequences have been rigorously taxonomically curated using a six-rank annotation system fully resolved to the morphological species level and linked to a series of metadata. The PFR(2) website, available at http://pfr2.sb-roscoff.fr, allows downloading the entire database or specific sections, as well as the identification of new planktonic foraminiferal sequences. Its novel, fully documented curation process integrates advances in morphological and molecular taxonomy. It allows for an increase in its taxonomic resolution and assures that integrity is maintained by including a complete contingency tracking of annotations and assuring that the annotations remain internally consistent. © 2015 John Wiley & Sons Ltd.
Relational databases: a transparent framework for encouraging biology students to think informatically.

PubMed

Rice, Michael; Gladstone, William; Weir, Michael

2004-01-01

We discuss how relational databases constitute an ideal framework for representing and analyzing large-scale genomic data sets in biology. As a case study, we describe a Drosophila splice-site database that we recently developed at Wesleyan University for use in research and teaching. The database stores data about splice sites computed by a custom algorithm using Drosophila cDNA transcripts and genomic DNA and supports a set of procedures for analyzing splice-site sequence space. A generic Web interface permits the execution of the procedures with a variety of parameter settings and also supports custom structured query language queries. Moreover, new analytical procedures can be added by updating special metatables in the database without altering the Web interface. The database provides a powerful setting for students to develop informatic thinking skills.
Relational Databases: A Transparent Framework for Encouraging Biology Students To Think Informatically

PubMed Central

2004-01-01

We discuss how relational databases constitute an ideal framework for representing and analyzing large-scale genomic data sets in biology. As a case study, we describe a Drosophila splice-site database that we recently developed at Wesleyan University for use in research and teaching. The database stores data about splice sites computed by a custom algorithm using Drosophila cDNA transcripts and genomic DNA and supports a set of procedures for analyzing splice-site sequence space. A generic Web interface permits the execution of the procedures with a variety of parameter settings and also supports custom structured query language queries. Moreover, new analytical procedures can be added by updating special metatables in the database without altering the Web interface. The database provides a powerful setting for students to develop informatic thinking skills. PMID:15592597

The Government Finance Database: A Common Resource for Quantitative Research in Public Financial Analysis

PubMed Central

Pierson, Kawika; Hand, Michael L.; Thompson, Fred

2015-01-01

Quantitative public financial management research focused on local governments is limited by the absence of a common database for empirical analysis. While the U.S. Census Bureau distributes government finance data that some scholars have utilized, the arduous process of collecting, interpreting, and organizing the data has led its adoption to be prohibitive and inconsistent. In this article we offer a single, coherent resource that contains all of the government financial data from 1967-2012, uses easy to understand natural-language variable names, and will be extended when new data is available. PMID:26107821
The Government Finance Database: A Common Resource for Quantitative Research in Public Financial Analysis.

PubMed

Pierson, Kawika; Hand, Michael L; Thompson, Fred

2015-01-01

Quantitative public financial management research focused on local governments is limited by the absence of a common database for empirical analysis. While the U.S. Census Bureau distributes government finance data that some scholars have utilized, the arduous process of collecting, interpreting, and organizing the data has led its adoption to be prohibitive and inconsistent. In this article we offer a single, coherent resource that contains all of the government financial data from 1967-2012, uses easy to understand natural-language variable names, and will be extended when new data is available.
SeedStor: A Germplasm Information Management System and Public Database.

PubMed

Horler, R S P; Turner, A S; Fretter, P; Ambrose, M

2018-01-01

SeedStor (https://www.seedstor.ac.uk) acts as the publicly available database for the seed collections held by the Germplasm Resources Unit (GRU) based at the John Innes Centre, Norwich, UK. The GRU is a national capability supported by the Biotechnology and Biological Sciences Research Council (BBSRC). The GRU curates germplasm collections of a range of temperate cereal, legume and Brassica crops and their associated wild relatives, as well as precise genetic stocks, near-isogenic lines and mapping populations. With >35,000 accessions, the GRU forms part of the UK's plant conservation contribution to the Multilateral System (MLS) of the International Treaty for Plant Genetic Resources for Food and Agriculture (ITPGRFA) for wheat, barley, oat and pea. SeedStor is a fully searchable system that allows our various collections to be browsed species by species through to complicated multipart phenotype criteria-driven queries. The results from these searches can be downloaded for later analysis or used to order germplasm via our shopping cart. The user community for SeedStor is the plant science research community, plant breeders, specialist growers, hobby farmers and amateur gardeners, and educationalists. Furthermore, SeedStor is much more than a database; it has been developed to act internally as a Germplasm Information Management System that allows team members to track and process germplasm requests, determine regeneration priorities, handle cost recovery and Material Transfer Agreement paperwork, manage the Seed Store holdings and easily report on a wide range of the aforementioned tasks. © The Author(s) 2017. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists.
Mycobacteriophage genome database.

PubMed

Joseph, Jerrine; Rajendran, Vasanthi; Hassan, Sameer; Kumar, Vanaja

2011-01-01

Mycobacteriophage genome database (MGDB) is an exclusive repository of the 64 completely sequenced mycobacteriophages with annotated information. It is a comprehensive compilation of the various gene parameters captured from several databases pooled together to empower mycobacteriophage researchers. The MGDB (Version No.1.0) comprises of 6086 genes from 64 mycobacteriophages classified into 72 families based on ACLAME database. Manual curation was aided by information available from public databases which was enriched further by analysis. Its web interface allows browsing as well as querying the classification. The main objective is to collect and organize the complexity inherent to mycobacteriophage protein classification in a rational way. The other objective is to browse the existing and new genomes and describe their functional annotation. The database is available for free at http://mpgdb.ibioinformatics.org/mpgdb.php.
USDA Branded Food Products Database, Release 2

USDA-ARS?s Scientific Manuscript database

The USDA Branded Food Products Database is the ongoing result of a Public-Private Partnership (PPP), whose goal is to enhance public health and the sharing of open data by complementing the USDA National Nutrient Database for Standard Reference (SR) with nutrient composition of branded foods and pri...
Sports medicine clinical trial research publications in academic medical journals between 1996 and 2005: an audit of the PubMed MEDLINE database.

PubMed

Nichols, A W

2008-11-01

To identify sports medicine-related clinical trial research articles in the PubMed MEDLINE database published between 1996 and 2005 and conduct a review and analysis of topics of research, experimental designs, journals of publication and the internationality of authorships. Sports medicine research is international in scope with improving study methodology and an evolution of topics. Structured review of articles identified in a search of a large electronic medical database. PubMed MEDLINE database. Sports medicine-related clinical research trials published between 1996 and 2005. Review and analysis of articles that meet inclusion criteria. Articles were examined for study topics, research methods, experimental subject characteristics, journal of publication, lead authors and journal countries of origin and language of publication. The search retrieved 414 articles, of which 379 (345 English language and 34 non-English language) met the inclusion criteria. The number of publications increased steadily during the study period. Randomised clinical trials were the most common study type and the "diagnosis, management and treatment of sports-related injuries and conditions" was the most popular study topic. The knee, ankle/foot and shoulder were the most frequent anatomical sites of study. Soccer players and runners were the favourite study subjects. The American Journal of Sports Medicine had the highest number of publications and shared the greatest international diversity of authorships with the British Journal of Sports Medicine. The USA, Australia, Germany and the UK produced a good number of the lead authorships. In all, 91% of articles and 88% of journals were published in English. Sports medicine-related research is internationally diverse, clinical trial publications are increasing and the sophistication of research design may be improving.
Trends in global acupuncture publications: An analysis of the Web of Science database from 1988 to 2015.

PubMed

Kung, Yen-Ying; Hwang, Shinn-Jang; Li, Tsai-Feng; Ko, Seong-Gyu; Huang, Ching-Wen; Chen, Fang-Pey

2017-08-01

Acupuncture is a rapidly growing medical specialty worldwide. This study aimed to analyze the acupuncture publications from 1988 to 2015 by using the Web of Science (WoS) database. Familiarity with the trend of acupuncture publications will facilitate a better understanding of existing academic research in acupuncture and its applications. Academic articles published focusing on acupuncture were retrieved and analyzed from the WoS database which included articles published in Science Citation Index-Expanded and Social Science Citation Indexed journals from 1988 to 2015. A total of 7450 articles were published in the field of acupuncture during the period of 1988-2015. Annual article publications increased from 109 in 1988 to 670 in 2015. The People's Republic of China (published 2076 articles, 27.9%), USA (published 1638 articles, 22.0%) and South Korea (published 707 articles, 9.5%) were the most abundantly prolific countries. According to the WoS subject categories, 2591 articles (34.8%) were published in the category of Integrative and Complementary Medicine, followed by Neurosciences (1147 articles, 15.4%), and General Internal Medicine (918 articles, 12.3%). Kyung Hee University (South Korea) is the most prolific organization that is the source of acupuncture publications (365 articles, 4.9%). Fields within acupuncture with the most cited articles included mechanism, clinical trials, epidemiology, and a new research method of acupuncture. Publications associated with acupuncture increased rapidly from 1988 to 2015. The different applications of acupuncture were extensive in multiple fields of medicine. It is important to maintain and even nourish a certain quantity and quality of published acupuncture papers, which can play an important role in developing a medical discipline for acupuncture. Copyright © 2017. Published by Elsevier Taiwan LLC.
Bridging Plant and Human Radiation Response and DNA Repair through an In Silico Approach

PubMed Central

Nikitaki, Zacharenia; Pavlopoulou, Athanasia; Holá, Marcela; Donà, Mattia; Michalopoulos, Ioannis; Balestrazzi, Alma; Angelis, Karel J.; Georgakilas, Alexandros G.

2017-01-01

The mechanisms of response to radiation exposure are conserved in plants and animals. The DNA damage response (DDR) pathways are the predominant molecular pathways activated upon exposure to radiation, both in plants and animals. The conserved features of DDR in plants and animals might facilitate interdisciplinary studies that cross traditional boundaries between animal and plant biology in order to expand the collection of biomarkers currently used for radiation exposure monitoring (REM) in environmental and biomedical settings. Genes implicated in trans-kingdom conserved DDR networks often triggered by ionizing radiation (IR) and UV light are deposited into biological databases. In this study, we have applied an innovative approach utilizing data pertinent to plant and human genes from publicly available databases towards the design of a ‘plant radiation biodosimeter’, that is, a plant and DDR gene-based platform that could serve as a REM reliable biomarker for assessing environmental radiation exposure and associated risk. From our analysis, in addition to REM biomarkers, a significant number of genes, both in human and Arabidopsis thaliana, not yet characterized as DDR, are suggested as possible DNA repair players. Last but not least, we provide an example on the applicability of an Arabidopsis thaliana—based plant system monitoring the role of cancer-related DNA repair genes BRCA1, BARD1 and PARP1 in processing DNA lesions. PMID:28587301
Bridging Plant and Human Radiation Response and DNA Repair through an In Silico Approach.

PubMed

Nikitaki, Zacharenia; Pavlopoulou, Athanasia; Holá, Marcela; Donà, Mattia; Michalopoulos, Ioannis; Balestrazzi, Alma; Angelis, Karel J; Georgakilas, Alexandros G

2017-06-06

The mechanisms of response to radiation exposure are conserved in plants and animals. The DNA damage response (DDR) pathways are the predominant molecular pathways activated upon exposure to radiation, both in plants and animals. The conserved features of DDR in plants and animals might facilitate interdisciplinary studies that cross traditional boundaries between animal and plant biology in order to expand the collection of biomarkers currently used for radiation exposure monitoring (REM) in environmental and biomedical settings. Genes implicated in trans-kingdom conserved DDR networks often triggered by ionizing radiation (IR) and UV light are deposited into biological databases. In this study, we have applied an innovative approach utilizing data pertinent to plant and human genes from publicly available databases towards the design of a 'plant radiation biodosimeter', that is, a plant and DDR gene-based platform that could serve as a REM reliable biomarker for assessing environmental radiation exposure and associated risk. From our analysis, in addition to REM biomarkers, a significant number of genes, both in human and Arabidopsis thaliana, not yet characterized as DDR, are suggested as possible DNA repair players. Last but not least, we provide an example on the applicability of an Arabidopsis thaliana- based plant system monitoring the role of cancer-related DNA repair genes BRCA1 , BARD1 and PARP1 in processing DNA lesions.
The radiopurity.org material database

NASA Astrophysics Data System (ADS)

Cooley, J.; Loach, J. C.; Poon, A. W. P.

2018-01-01

The database at http://www.radiopurity.org is the world's largest public database of material radio-purity mea-surements. These measurements are used by members of the low-background physics community to build experiments that search for neutrinos, neutrinoless double-beta decay, WIMP dark matter, and other exciting physics. This paper summarizes the current status and the future plan of this database.
The EMBL nucleotide sequence database

PubMed Central

Stoesser, Guenter; Baker, Wendy; van den Broek, Alexandra; Camon, Evelyn; Garcia-Pastor, Maria; Kanz, Carola; Kulikova, Tamara; Lombard, Vincent; Lopez, Rodrigo; Parkinson, Helen; Redaschi, Nicole; Sterk, Peter; Stoehr, Peter; Tuli, Mary Ann

2001-01-01

The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank at the NCBI (USA). Data is exchanged amongst the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via ftp, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many specialized databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. PMID:11125039
The complex genetics of human insulin-like growth factor 2 are not reflected in public databases.

PubMed

Rotwein, Peter

2018-03-23

Recent advances in genetics present unique opportunities for enhancing knowledge about human physiology and disease susceptibility. Understanding this information at the individual gene level is challenging and requires extracting, collating, and interpreting data from a variety of public gene repositories. Here, I illustrate this challenge by analyzing the gene for human insulin-like growth factor 2 ( IGF2 ) through the lens of several databases. IGF2, a 67-amino acid secreted peptide, is essential for normal prenatal growth and is involved in other physiological and pathophysiological processes in humans. Surprisingly, none of the genetic databases accurately described or completely delineated human IGF2 gene structure or transcript expression, even though all relevant information could be found in the published literature. Although IGF2 shares multiple features with the mouse Igf2 gene, it has several unique properties, including transcription from five promoters. Both genes undergo parental imprinting, with IGF2 / Igf2 being expressed primarily from the paternal chromosome and the adjacent H19 gene from the maternal chromosome. Unlike mouse Igf2 , whose expression declines after birth, human IGF2 remains active throughout life. This characteristic has been attributed to a unique human gene promoter that escapes imprinting, but as shown here, it involves several different promoters with distinct tissue-specific expression patterns. Because new testable hypotheses could lead to critical insights into IGF2 actions in human physiology and disease, it is incumbent that our fundamental understanding is accurate. Similar challenges affecting knowledge of other human genes should promote attempts to critically evaluate, interpret, and correct human genetic data in publicly available databases. © 2018 by The American Society for Biochemistry and Molecular Biology, Inc.
Establishing a database of Canadian feline mitotypes for forensic use.

PubMed

Arcieri, M; Agostinelli, G; Gray, Z; Spadaro, A; Lyons, L A; Webb, K M

2016-05-01

Hair shed by pet animals is often found and collected as evidence from crime scenes. Due to limitations such as small amount and low quality, mitochondrial DNA (mtDNA) is often the only type of DNA that can be used for linking the hair to a potential contributor. mtDNA has lower discriminatory power than nuclear DNA because multiple, unrelated individuals within a population can have the same mtDNA sequence, or mitotype. Therefore, to determine the evidentiary value of a match between crime scene evidence and a suspected contributor, the frequency of the mitotype must be known within the regional population. While mitotype frequencies have been determined for the United States' cat population, the frequencies are unknown for the Canadian cat population. Given the countries' close proximity and similar human settlement patterns, these populations may be homogenous, meaning a single, regional database may be used for estimating cat population mitotype frequencies. Here we determined the mitotype frequencies of the Canadian cat population and compared them to the United States' cat population. The two cat populations are statistically homogenous, however mitotype B6 was found in high frequency in Canada and extremely low frequency in the United States, meaning a single database would not be appropriate for North America. Furthermore, this work calls attention to these local spikes in frequency of otherwise rare mitotypes, instances of which exist around the world and have the potential to misrepresent the evidentiary value of matches compared to a regional database. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
The LISS--a public database of common imaging signs of lung diseases for computer-aided detection and diagnosis research and medical education.

PubMed

Han, Guanghui; Liu, Xiabi; Han, Feifei; Santika, I Nyoman Tenaya; Zhao, Yanfeng; Zhao, Xinming; Zhou, Chunwu

2015-02-01

Lung computed tomography (CT) imaging signs play important roles in the diagnosis of lung diseases. In this paper, we review the significance of CT imaging signs in disease diagnosis and determine the inclusion criterion of CT scans and CT imaging signs of our database. We develop the software of abnormal regions annotation and design the storage scheme of CT images and annotation data. Then, we present a publicly available database of lung CT imaging signs, called LISS for short, which contains 271 CT scans and 677 abnormal regions in them. The 677 abnormal regions are divided into nine categories of common CT imaging signs of lung disease (CISLs). The ground truth of these CISLs regions and the corresponding categories are provided. Furthermore, to make the database publicly available, all private data in CT scans are eliminated or replaced with provisioned values. The main characteristic of our LISS database is that it is developed from a new perspective of CT imaging signs of lung diseases instead of commonly considered lung nodules. Thus, it is promising to apply to computer-aided detection and diagnosis research and medical education.
Beyond the cold hit: measuring the impact of the national DNA data bank on public safety at the city and county level.

PubMed

Gabriel, Matthew; Boland, Cherisse; Holt, Cydne

2010-01-01

Over the past decade, the Combined DNA Index System (CODIS) has increased solvability of violent crimes by linking evidence DNA profiles to known offenders. At present, an in-depth analysis of the United States National DNA Data Bank effort has not assessed the success of this national public safety endeavor. Critics of this effort often focus on laboratory and police investigators unable to provide timely investigative support as a root cause(s) of CODIS' failure to increase public safety. By studying a group of nearly 200 DNA cold hits obtained in SFPD criminal investigations from 2001-2006, three key performance metrics (Significance of Cold Hits, Case Progression & Judicial Resolution, and Potential Reduction of Future Criminal Activity) provide a proper context in which to define the impact of CODIS at the City and County level. Further, the analysis of a recidivist group of cold hit offenders and their past interaction with law enforcement established five noteworthy criminal case resolution trends; these trends signify challenges to CODIS in achieving meaningful case resolutions. CODIS' effectiveness and critical activities to support case resolutions are the responsibility of all criminal justice partners in order to achieve long-lasting public safety within the United States.
Dynamic publication model for neurophysiology databases.

PubMed

Gardner, D; Abato, M; Knuth, K H; DeBellis, R; Erde, S M

2001-08-29

We have implemented a pair of database projects, one serving cortical electrophysiology and the other invertebrate neurones and recordings. The design for each combines aspects of two proven schemes for information interchange. The journal article metaphor determined the type, scope, organization and quantity of data to comprise each submission. Sequence databases encouraged intuitive tools for data viewing, capture, and direct submission by authors. Neurophysiology required transcending these models with new datatypes. Time-series, histogram and bivariate datatypes, including illustration-like wrappers, were selected by their utility to the community of investigators. As interpretation of neurophysiological recordings depends on context supplied by metadata attributes, searches are via visual interfaces to sets of controlled-vocabulary metadata trees. Neurones, for example, can be specified by metadata describing functional and anatomical characteristics. Permanence is advanced by data model and data formats largely independent of contemporary technology or implementation, including Java and the XML standard. All user tools, including dynamic data viewers that serve as a virtual oscilloscope, are Java-based, free, multiplatform, and distributed by our application servers to any contemporary networked computer. Copyright is retained by submitters; viewer displays are dynamic and do not violate copyright of related journal figures. Panels of neurophysiologists view and test schemas and tools, enhancing community support.
There Is a Significant Discrepancy Between "Big Data" Database and Original Research Publications on Hip Arthroscopy Outcomes: A Systematic Review.

PubMed

Sochacki, Kyle R; Jack, Robert A; Safran, Marc R; Nho, Shane J; Harris, Joshua D

2018-06-01

The purpose of this study was to compare (1) major complication, (2) revision, and (3) conversion to arthroplasty rates following hip arthroscopy between database studies and original research peer-reviewed publications. A systematic review was performed using PRISMA guidelines. PubMed, SCOPUS, SportDiscus, and Cochrane Central Register of Controlled Trials were searched for studies that investigated major complication (dislocation, femoral neck fracture, avascular necrosis, fluid extravasation, septic arthritis, death), revision, and hip arthroplasty conversion rates following hip arthroscopy. Major complication, revision, and conversion to hip arthroplasty rates were compared between original research (single- or multicenter therapeutic studies) and database (insurance database using ICD-9/10 and/or current procedural terminology coding terminology) publishing studies. Two hundred seven studies (201 original research publications [15,780 subjects; 54% female] and 6 database studies [20,825 subjects; 60% female]) were analyzed (mean age, 38.2 ± 11.6 years old; mean follow-up, 2.7 ± 2.9 years). The database studies had a significantly higher age (40.6 + 2.8 vs 35.4 ± 11.6), body mass index (27.4 ± 5.6 vs 24.9 ± 3.1), percentage of females (60.1% vs 53.8%), and longer follow-up (3.1 ± 1.6 vs 2.7 ± 3.0) compared with original research (P < .0001 for all). Ninety-seven (0.6%) major complications occurred in the individual studies, and 95 (0.8%) major complications occurred in the database studies (P = .029; relative risk [RR], 1.3). There was a significantly higher rate of femoral neck fracture (0.24% vs 0.03%; P < .0001; RR, 8.0), and hip dislocation (0.17% vs 0.06%; P = .023; RR, 2.2) in the database studies. Reoperations occurred at a significantly higher rate in the database studies (11.1% vs 7.3%; P < .001; RR, 1.5). There was a significantly higher rate of conversion to arthroplasty in the database studies (8.0% vs 3.7%; P < .001; RR, 2.2). Database
Specialized microbial databases for inductive exploration of microbial genome sequences

PubMed Central

Fang, Gang; Ho, Christine; Qiu, Yaowu; Cubas, Virginie; Yu, Zhou; Cabau, Cédric; Cheung, Frankie; Moszer, Ivan; Danchin, Antoine

2005-01-01

Background The enormous amount of genome sequence data asks for user-oriented databases to manage sequences and annotations. Queries must include search tools permitting function identification through exploration of related objects. Methods The GenoList package for collecting and mining microbial genome databases has been rewritten using MySQL as the database management system. Functions that were not available in MySQL, such as nested subquery, have been implemented. Results Inductive reasoning in the study of genomes starts from "islands of knowledge", centered around genes with some known background. With this concept of "neighborhood" in mind, a modified version of the GenoList structure has been used for organizing sequence data from prokaryotic genomes of particular interest in China. GenoChore , a set of 17 specialized end-user-oriented microbial databases (including one instance of Microsporidia, Encephalitozoon cuniculi, a member of Eukarya) has been made publicly available. These databases allow the user to browse genome sequence and annotation data using standard queries. In addition they provide a weekly update of searches against the world-wide protein sequences data libraries, allowing one to monitor annotation updates on genes of interest. Finally, they allow users to search for patterns in DNA or protein sequences, taking into account a clustering of genes into formal operons, as well as providing extra facilities to query sequences using predefined sequence patterns. Conclusion This growing set of specialized microbial databases organize data created by the first Chinese bacterial genome programs (ThermaList, Thermoanaerobacter tencongensis, LeptoList, with two different genomes of Leptospira interrogans and SepiList, Staphylococcus epidermidis) associated to related organisms for comparison. PMID:15698474
DNA barcoding insect–host plant associations

PubMed Central

Jurado-Rivera, José A.; Vogler, Alfried P.; Reid, Chris A.M.; Petitpierre, Eduard; Gómez-Zurita, Jesús

2008-01-01

Short-sequence fragments (‘DNA barcodes’) used widely for plant identification and inventorying remain to be applied to complex biological problems. Host–herbivore interactions are fundamental to coevolutionary relationships of a large proportion of species on the Earth, but their study is frequently hampered by limited or unreliable host records. Here we demonstrate that DNA barcodes can greatly improve this situation as they (i) provide a secure identification of host plant species and (ii) establish the authenticity of the trophic association. Host plants of leaf beetles (subfamily Chrysomelinae) from Australia were identified using the chloroplast trnL(UAA) intron as barcode amplified from beetle DNA extracts. Sequence similarity and phylogenetic analyses provided precise identifications of each host species at tribal, generic and specific levels, depending on the available database coverage in various plant lineages. The 76 species of Chrysomelinae included—more than 10 per cent of the known Australian fauna—feed on 13 plant families, with preference for Australian radiations of Myrtaceae (eucalypts) and Fabaceae (acacias). Phylogenetic analysis of beetles shows general conservation of host association but with rare host shifts between distant plant lineages, including a few cases where barcodes supported two phylogenetically distant host plants. The study demonstrates that plant barcoding is already feasible with the current publicly available data. By sequencing plant barcodes directly from DNA extractions made from herbivorous beetles, strong physical evidence for the host association is provided. Thus, molecular identification using short DNA fragments brings together the detection of species and the analysis of their interactions. PMID:19004756
Database Changes (Post-Publication). ERIC Processing Manual, Section X.

ERIC Educational Resources Information Center

Brandhorst, Ted, Ed.

The purpose of this section is to specify the procedure for making changes to the ERIC database after the data involved have been announced in the abstract journals RIE or CIJE. As a matter of general ERIC policy, a document or journal article is not re-announced or re-entered into the database as a new accession for the purpose of accomplishing a…

Integrating early detection with DNA barcoding: species identification of a non-native monitor lizard (Squamata: Varanidae) carcass in Mississippi, U.S.A.

USGS Publications Warehouse

Reed, Robert N.; Hopken, Matthew W.; Steen, David A.; Falk, Bryan G.; Piaggio, Antoinette J.

2016-01-01

Early detection of invasive species is critical to increasing the probability of successful management. At the primary stage of an invasion, invasive species are easier to control as the population is likely represented by just a few individuals. Detection of these first few individuals can be challenging, particularly if they are cryptic or otherwise characterized by low detectability. The engagement of members of the public may be critical to early detection as there are far more citizen s on the landscape than trained biologists. However, it can be difficult to assess the credibility of public reporting, especially when a diagnostic digital image or a physical specimen in good condition are lacking. DNA barcoding can be used for verification when morphological identification of a specimen is not possible or uncertain (i.e., degraded or partial specimen). DNA barcoding relies on obtaining a DNA sequence from a relatively small fragment of mitochondrial DNA and comparing it to a database of sequences containing a variety of expertly identified species. He rein we report the successful identification of a degraded specimen of a non-native, potentially invasive reptile species (Varanus niloticus) via DNA barcoding, after discovery and reporting by a member of the public.
Ionic Liquids Database- (ILThermo)

National Institute of Standards and Technology Data Gateway

SRD 147 NIST Ionic Liquids Database- (ILThermo) (Web, free access) IUPAC Ionic Liquids Database, ILThermo, is a free web research tool that allows users worldwide to access an up-to-date data collection from the publications on experimental investigations of thermodynamic, and transport properties of ionic liquids as well as binary and ternary mixtures containing ionic liquids.
Establishment of Kawasaki disease database based on metadata standard.

PubMed

Park, Yu Rang; Kim, Jae-Jung; Yoon, Young Jo; Yoon, Young-Kwang; Koo, Ha Yeong; Hong, Young Mi; Jang, Gi Young; Shin, Soo-Yong; Lee, Jong-Keuk

2016-07-01

Kawasaki disease (KD) is a rare disease that occurs predominantly in infants and young children. To identify KD susceptibility genes and to develop a diagnostic test, a specific therapy, or prevention method, collecting KD patients' clinical and genomic data is one of the major issues. For this purpose, Kawasaki Disease Database (KDD) was developed based on the efforts of Korean Kawasaki Disease Genetics Consortium (KKDGC). KDD is a collection of 1292 clinical data and genomic samples of 1283 patients from 13 KKDGC-participating hospitals. Each sample contains the relevant clinical data, genomic DNA and plasma samples isolated from patients' blood, omics data and KD-associated genotype data. Clinical data was collected and saved using the common data elements based on the ISO/IEC 11179 metadata standard. Two genome-wide association study data of total 482 samples and whole exome sequencing data of 12 samples were also collected. In addition, KDD includes the rare cases of KD (16 cases with family history, 46 cases with recurrence, 119 cases with intravenous immunoglobulin non-responsiveness, and 52 cases with coronary artery aneurysm). As the first public database for KD, KDD can significantly facilitate KD studies. All data in KDD can be searchable and downloadable. KDD was implemented in PHP, MySQL and Apache, with all major browsers supported.Database URL: http://www.kawasakidisease.kr. © The Author(s) 2016. Published by Oxford University Press.
Genomic survey and expression analysis of DNA repair genes in the genus Leptospira.

PubMed

Martins-Pinheiro, Marinalva; Schons-Fonseca, Luciane; da Silva, Josefa B; Domingos, Renan H; Momo, Leonardo Hiroyuki Santos; Simões, Ana Carolina Quirino; Ho, Paulo Lee; da Costa, Renata M A

2016-04-01

Leptospirosis is an emerging zoonosis with important economic and public health consequences and is caused by pathogenic leptospires. The genus Leptospira belongs to the order Spirochaetales and comprises saprophytic (L. biflexa), pathogenic (L. interrogans) and host-dependent (L. borgpetersenii) members. Here, we present an in silico search for DNA repair pathways in Leptospira spp. The relevance of such DNA repair pathways was assessed through the identification of mRNA levels of some genes during infection in animal model and after exposition to spleen cells. The search was performed by comparison of available Leptospira spp. genomes in public databases with known DNA repair-related genes. Leptospires exhibit some distinct and unexpected characteristics, for instance the existence of a redundant mechanism for repairing a chemically diverse spectrum of alkylated nucleobases, a new mutS-like gene and a new shorter version of uvrD. Leptospira spp. shares some characteristics from Gram-positive, as the presence of PcrA, two RecQ paralogs and two SSB proteins; the latter is considered a feature shared by naturally competent bacteria. We did not find a significant reduction in the number of DNA repair-related genes in both pathogenic and host-dependent species. Pathogenic leptospires were enriched for genes dedicated to base excision repair and non-homologous end joining. Their evolutionary history reveals a remarkable importance of lateral gene transfer events for the evolution of the genus. Up-regulation of specific DNA repair genes, including components of SOS regulon, during infection in animal model validates the critical role of DNA repair mechanisms for the complex interplay between host/pathogen.
Prevalence of human cell material: DNA and RNA profiling of public and private objects and after activity scenarios.

PubMed

van den Berge, M; Ozcanhan, G; Zijlstra, S; Lindenbergh, A; Sijen, T

2016-03-01

Especially when minute evidentiary traces are analysed, background cell material unrelated to the crime may contribute to detectable levels in the genetic analyses. To gain understanding on the composition of human cell material residing on surfaces contributing to background traces, we performed DNA and mRNA profiling on samplings of various items. Samples were selected by considering events contributing to cell material deposits in exemplary activities (e.g. dragging a person by the trouser ankles), and can be grouped as public objects, private samples, transfer-related samples and washing machine experiments. Results show that high DNA yields do not necessarily relate to an increased number of contributors or to the detection of other cell types than skin. Background cellular material may be found on any type of public or private item. When a major contributor can be deduced in DNA profiles from private items, this can be a different person than the owner of the item. Also when a specific activity is performed and the areas of physical contact are analysed, the "perpetrator" does not necessarily represent the major contributor in the STR profile. Washing machine experiments show that transfer and persistence during laundry is limited for DNA and cell type dependent for RNA. Skin conditions such as the presence of sebum or sweat can promote DNA transfer. Results of this study, which encompasses 549 samples, increase our understanding regarding the prevalence of human cell material in background and activity scenarios. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
An algorithm of discovering signatures from DNA databases on a computer cluster.

PubMed

Lee, Hsiao Ping; Sheu, Tzu-Fang

2014-10-05

Signatures are short sequences that are unique and not similar to any other sequence in a database that can be used as the basis to identify different species. Even though several signature discovery algorithms have been proposed in the past, these algorithms require the entirety of databases to be loaded in the memory, thus restricting the amount of data that they can process. It makes those algorithms unable to process databases with large amounts of data. Also, those algorithms use sequential models and have slower discovery speeds, meaning that the efficiency can be improved. In this research, we are debuting the utilization of a divide-and-conquer strategy in signature discovery and have proposed a parallel signature discovery algorithm on a computer cluster. The algorithm applies the divide-and-conquer strategy to solve the problem posed to the existing algorithms where they are unable to process large databases and uses a parallel computing mechanism to effectively improve the efficiency of signature discovery. Even when run with just the memory of regular personal computers, the algorithm can still process large databases such as the human whole-genome EST database which were previously unable to be processed by the existing algorithms. The algorithm proposed in this research is not limited by the amount of usable memory and can rapidly find signatures in large databases, making it useful in applications such as Next Generation Sequencing and other large database analysis and processing. The implementation of the proposed algorithm is available at http://www.cs.pu.edu.tw/~fang/DDCSDPrograms/DDCSD.htm.
Quality control of EUVE databases

NASA Technical Reports Server (NTRS)

John, L. M.; Drake, J.

1992-01-01

The publicly accessible databases for the Extreme Ultraviolet Explorer include: the EUVE Archive mailserver; the CEA ftp site; the EUVE Guest Observer Mailserver; and the Astronomical Data System node. The EUVE Performance Assurance team is responsible for verifying that these public EUVE databases are working properly, and that the public availability of EUVE data contained therein does not infringe any data rights which may have been assigned. In this poster, we describe the Quality Assurance (QA) procedures we have developed from the approach of QA as a service organization, thus reflecting the overall EUVE philosophy of Quality Assurance integrated into normal operating procedures, rather than imposed as an external, post facto, control mechanism.
Consumer Product Category Database

EPA Pesticide Factsheets

The Chemical and Product Categories database (CPCat) catalogs the use of over 40,000 chemicals and their presence in different consumer products. The chemical use information is compiled from multiple sources while product information is gathered from publicly available Material Safety Data Sheets (MSDS). EPA researchers are evaluating the possibility of expanding the database with additional product and use information.
Standard atomic volumes in double-stranded DNA and packing in protein–DNA interfaces

PubMed Central

Nadassy, Katalin; Tomás-Oliveira, Isabel; Alberts, Ian; Janin, Joël; Wodak, Shoshana J.

2001-01-01

Standard volumes for atoms in double-stranded B-DNA are derived using high resolution crystal structures from the Nucleic Acid Database (NDB) and compared with corresponding values derived from crystal structures of small organic compounds in the Cambridge Structural Database (CSD). Two different methods are used to compute these volumes: the classical Voronoi method, which does not depend on the size of atoms, and the related Radical Planes method which does. Results show that atomic groups buried in the interior of double-stranded DNA are, on average, more tightly packed than in related small molecules in the CSD. The packing efficiency of DNA atoms at the interfaces of 25 high resolution protein–DNA complexes is determined by computing the ratios between the volumes of interfacial DNA atoms and the corresponding standard volumes. These ratios are found to be close to unity, indicating that the DNA atoms at protein–DNA interfaces are as closely packed as in crystals of B-DNA. Analogous volume ratios, computed for buried protein atoms, are also near unity, confirming our earlier conclusions that the packing efficiency of these atoms is similar to that in the protein interior. In addition, we examine the number, volume and solvent occupation of cavities located at the protein–DNA interfaces and compared them with those in the protein interior. Cavities are found to be ubiquitous in the interfaces as well as inside the protein moieties. The frequency of solvent occupation of cavities is however higher in the interfaces, indicating that those are more hydrated than protein interiors. Lastly, we compare our results with those obtained using two different measures of shape complementarity of the analysed interfaces, and find that the correlation between our volume ratios and these measures, as well as between the measures themselves, is weak. Our results indicate that a tightly packed environment made up of DNA, protein and solvent atoms plays a significant role
Comprehensive Thematic T-matrix Reference Database: a 2013-2014 Update

NASA Technical Reports Server (NTRS)

Mishchenko, Michael I.; Zakharova, Nadezhda T.; Khlebtsov, Nikolai G.; Wriedt, Thomas; Videen, Gorden

2014-01-01

This paper is the sixth update to the comprehensive thematic database of peer-reviewedT-matrix publications initiated by us in 2004 and includes relevant publications that have appeared since 2013. It also lists several earlier publications not incorporated in the original database and previous updates.
MitoFish and MiFish Pipeline: A Mitochondrial Genome Database of Fish with an Analysis Pipeline for Environmental DNA Metabarcoding.

PubMed

Sato, Yukuto; Miya, Masaki; Fukunaga, Tsukasa; Sado, Tetsuya; Iwasaki, Wataru

2018-06-01

Fish mitochondrial genome (mitogenome) data form a fundamental basis for revealing vertebrate evolution and hydrosphere ecology. Here, we report recent functional updates of MitoFish, which is a database of fish mitogenomes with a precise annotation pipeline MitoAnnotator. Most importantly, we describe implementation of MiFish pipeline for metabarcoding analysis of fish mitochondrial environmental DNA, which is a fast-emerging and powerful technology in fish studies. MitoFish, MitoAnnotator, and MiFish pipeline constitute a key platform for studies of fish evolution, ecology, and conservation, and are freely available at http://mitofish.aori.u-tokyo.ac.jp/ (last accessed April 7th, 2018).
DNA typing in forensic medicine and in criminal investigations: a current survey.

PubMed

Benecke, M

1997-05-01

Since 1985 DNA typing of biological material has become one of the most powerful tools for personal identification in forensic medicine and in criminal investigations [1-6]. Classical DNA "fingerprinting" is increasingly being replaced by polymerase chain reaction (PCR) based technology which detects very short polymorphic stretches of DNA [7-15]. DNA loci which forensic scientists study do not code for proteins, and they are spread over the whole genome [16, 17]. These loci are neutral, and few provide any information about individuals except for their identity. Minute amounts of biological material are sufficient for DNA typing. Many European countries are beginning to establish databases to store DNA profiles of crime scenes and known offenders. A brief overview is given of past and present DNA typing and the establishment of forensic DNA databases in Europe.
DNA typing in forensic medicine and in criminal investigations: a current survey

NASA Astrophysics Data System (ADS)

Benecke, Mark

Since 1985 DNA typing of biological material has become one of the most powerful tools for personal identification in forensic medicine and in criminal investigations [1-6]. Classical DNA "fingerprinting" is increasingly being replaced by polymerase chain reaction (PCR) based technology which detects very short polymorphic stretches of DNA [7-15]. DNA loci which forensic scientists study do not code for proteins, and they are spread over the whole genome [16, 17]. These loci are neutral, and few provide any information about individuals except for their identity. Minute amounts of biological material are sufficient for DNA typing. Many European countries are beginning to establish databases to store DNA profiles of crime scenes and known offenders. A brief overview is given of past and present DNA typing and the establishment of forensic DNA databases in Europe.
MUBII-TB-DB: a database of mutations associated with antibiotic resistance in Mycobacterium tuberculosis.

PubMed

Flandrois, Jean-Pierre; Lina, Gérard; Dumitrescu, Oana

2014-04-14

Tuberculosis is an infectious bacterial disease caused by Mycobacterium tuberculosis. It remains a major health threat, killing over one million people every year worldwide. An early antibiotic therapy is the basis of the treatment, and the emergence and spread of multidrug and extensively drug-resistant mutant strains raise significant challenges. As these bacteria grow very slowly, drug resistance mutations are currently detected using molecular biology techniques. Resistance mutations are identified by sequencing the resistance-linked genes followed by a comparison with the literature data. The only online database is the TB Drug Resistance Mutation database (TBDReaM database); however, it requires mutation detection before use, and its interrogation is complex due to its loose syntax and grammar. The MUBII-TB-DB database is a simple, highly structured text-based database that contains a set of Mycobacterium tuberculosis mutations (DNA and proteins) occurring at seven loci: rpoB, pncA, katG; mabA(fabG1)-inhA, gyrA, gyrB, and rrs. Resistance mutation data were extracted after the systematic review of MEDLINE referenced publications before March 2013. MUBII analyzes the query sequence obtained by PCR-sequencing using two parallel strategies: i) a BLAST search against a set of previously reconstructed mutated sequences and ii) the alignment of the query sequences (DNA and its protein translation) with the wild-type sequences. The post-treatment includes the extraction of the aligned sequences together with their descriptors (position and nature of mutations). The whole procedure is performed using the internet. The results are graphs (alignments) and text (description of the mutation, therapeutic significance). The system is quick and easy to use, even for technicians without bioinformatics training. MUBII-TB-DB is a structured database of the mutations occurring at seven loci of major therapeutic value in tuberculosis management. Moreover, the system provides
Usefulness of Canadian Public Health Insurance Administrative Databases to Assess Breast and Ovarian Cancer Screening Imaging Technologies for BRCA1/2 Mutation Carriers.

PubMed

Larouche, Geneviève; Chiquette, Jocelyne; Plante, Marie; Pelletier, Sylvie; Simard, Jacques; Dorval, Michel

2016-11-01

In Canada, recommendations for clinical management of hereditary breast and ovarian cancer among individuals carrying a deleterious BRCA1 or BRCA2 mutation have been available since 2007. Eight years later, very little is known about the uptake of screening and risk-reduction measures in this population. Because Canada's public health care system falls under provincial jurisdictions, using provincial health care administrative databases appears a valuable option to assess management of BRCA1/2 mutation carriers. The objective was to explore the usefulness of public health insurance administrative databases in British Columbia, Ontario, and Quebec to assess management after BRCA1/2 genetic testing. Official public health insurance documents were considered potentially useful if they had specific procedure codes, and pertained to procedures performed in the public and private health care systems. All 3 administrative databases have specific procedures codes for mammography and breast ultrasounds. Only Quebec and Ontario have a specific procedure code for breast magnetic resonance imaging. It is impossible to assess, on an individual basis, the frequency of others screening exams, with the exception of CA-125 testing in British Columbia. Screenings done in private practice are excluded from the administrative databases unless covered by special agreements for reimbursement, such as all breast imaging exams in Ontario and mammograms in British Columbia and Quebec. There are no specific procedure codes for risk-reduction surgeries for breast and ovarian cancer. Population-based assessment of breast and ovarian cancer risk management strategies other than mammographic screening, using only administrative data, is currently challenging in the 3 Canadian provinces studied. Copyright © 2016 Canadian Association of Radiologists. Published by Elsevier Inc. All rights reserved.
A Novel Approach: Chemical Relational Databases, and the Role of the ISSCAN Database on Assessing Chemical Carcinogenity

EPA Science Inventory

Mutagenicity and carcinogenicity databases are crucial resources for toxicologists and regulators involved in chemicals risk assessment. Until recently, existing public toxicity databases have been constructed primarily as "look-up-tables" of existing data, and most often did no...
Tree chemistry database (version 1.0)

Treesearch

Linda H. Pardo; Molly Robin-Abbott; Natasha Duarte; Eric K. Miller

2005-01-01

The Tree Chemistry Database is a relational database of C, N, P, K, Ca, Mg, Mn, and Al concentrations in bole bark, bole wood, branches, twigs, and foliage. Compiled from data in 218 articles and publications, the database contains reported nutrient and biomass values for tree species in the Northeastern United States. Nutrient data can be sorted on parameters such as...
Complementary Value of Databases for Discovery of Scholarly Literature: A User Survey of Online Searching for Publications in Art History

ERIC Educational Resources Information Center

Nemeth, Erik

2010-01-01

Discovery of academic literature through Web search engines challenges the traditional role of specialized research databases. Creation of literature outside academic presses and peer-reviewed publications expands the content for scholarly research within a particular field. The resulting body of literature raises the question of whether scholars…
Annual Review of Database Development: 1992.

ERIC Educational Resources Information Center

Basch, Reva

1992-01-01

Reviews recent trends in databases and online systems. Topics discussed include new access points for established databases; acquisitions, consolidations, and competition between vendors; European coverage; international services; online reference materials, including telephone directories; political and legal materials and public records;…
On-Line Database of Vibration-Based Damage Detection Experiments

NASA Technical Reports Server (NTRS)

Pappa, Richard S.; Doebling, Scott W.; Kholwad, Tina D.

2000-01-01

This paper describes a new, on-line bibliographic database of vibration-based damage detection experiments. Publications in the database discuss experiments conducted on actual structures as well as those conducted with simulated data. The database can be searched and sorted in many ways, and it provides photographs of test structures when available. It currently contains 100 publications, which is estimated to be about 5-10% of the number of papers written to date on this subject. Additional entries are forthcoming. This database is available for public use on the Internet at the following address: http://sdbpappa-mac.larc.nasa.gov. Click on the link named "dd_experiments.fp3" and then type "guest" as the password. No user name is required.

JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles

PubMed Central

Portales-Casamar, Elodie; Thongjuea, Supat; Kwon, Andrew T.; Arenillas, David; Zhao, Xiaobei; Valen, Eivind; Yusuf, Dimas; Lenhard, Boris; Wasserman, Wyeth W.; Sandelin, Albin

2010-01-01

JASPAR (http://jaspar.genereg.net) is the leading open-access database of matrix profiles describing the DNA-binding patterns of transcription factors (TFs) and other proteins interacting with DNA in a sequence-specific manner. Its fourth major release is the largest expansion of the core database to date: the database now holds 457 non-redundant, curated profiles. The new entries include the first batch of profiles derived from ChIP-seq and ChIP-chip whole-genome binding experiments, and 177 yeast TF binding profiles. The introduction of a yeast division brings the convenience of JASPAR to an active research community. As binding models are refined by newer data, the JASPAR database now uses versioning of matrices: in this release, 12% of the older models were updated to improved versions. Classification of TF families has been improved by adopting a new DNA-binding domain nomenclature. A curated catalog of mammalian TFs is provided, extending the use of the JASPAR profiles to additional TFs belonging to the same structural family. The changes in the database set the system ready for more rapid acquisition of new high-throughput data sources. Additionally, three new special collections provide matrix profile data produced by recent alternative high-throughput approaches. PMID:19906716
Inclusiveness, Effectiveness and Intrusiveness: Issues in the Developing Uses of DNA Profiling in Support of Criminal Investigations

PubMed Central

2005-01-01

Précis The rapid implementation and continuing expansion of forensic DNA databases around the world has been supported by claims about their effectiveness in criminal investigations and challenged by assertions of the resulting intrusiveness into individual privacy. These two competing perspectives provide the basis for ongoing considerations about the categories of persons who should be subject to nonconsensual DNA sampling and profile retention as well as the uses to which such profiles should be put. This paper uses the example of the current arrangements for forensic DNA databasing in England & Wales to discuss the ways in which the legislative and operational basis for police DNA databasing is reliant upon continuous deliberations over these and other matters by a range of key stakeholders. We also assess the effects of the recent innovative use of DNA databasing for ‘familial searching’ in this jurisdiction in order to show how agreed understandings about the appropriate uses of DNA can become unsettled and reformulated even where their investigative effectiveness is uncontested. We conclude by making some observations about the future of what is recognised to be the largest forensic DNA database in the world. PMID:16240734
Expanding the functional human mitochondrial DNA database by the establishment of primate xenomitochondrial cybrids

PubMed Central

Kenyon, Lesley; Moraes, Carlos T.

1997-01-01

The nuclear and mitochondrial genomes coevolve to optimize approximately 100 different interactions necessary for an efficient ATP-generating system. This coevolution led to a species-specific compatibility between these genomes. We introduced mitochondrial DNA (mtDNA) from different primates into mtDNA-less human cells and selected for growth of cells with a functional oxidative phosphorylation system. mtDNA from common chimpanzee, pigmy chimpanzee, and gorilla were able to restore oxidative phosphorylation in the context of a human nuclear background, whereas mtDNA from orangutan, and species representative of Old-World monkeys, New-World monkeys, and lemurs were not. Oxygen consumption, a sensitive index of respiratory function, showed that mtDNA from chimpanzee, pigmy chimpanzee, and gorilla replaced the human mtDNA and restored respiration to essentially normal levels. Mitochondrial protein synthesis was also unaltered in successful “xenomitochondrial cybrids.” The abrupt failure of mtDNA from primate species that diverged from humans as recently as 8–18 million years ago to functionally replace human mtDNA suggests the presence of one or a few mutations affecting critical nuclear–mitochondrial genome interactions between these species. These cellular systems provide a demonstration of intergenus mtDNA transfer, expand more than 20-fold the number of mtDNA polymorphisms that can be analyzed in a human nuclear background, and provide a novel model for the study of nuclear–mitochondrial interactions. PMID:9256447
40 CFR 1400.13 - Read-only database.

Code of Federal Regulations, 2012 CFR

2012-07-01

... 40 Protection of Environment 34 2012-07-01 2012-07-01 false Read-only database. 1400.13 Section... INFORMATION Other Provisions § 1400.13 Read-only database. The Administrator is authorized to establish... public off-site consequence analysis information by means of a central database under the control of the...
42 CFR 455.436 - Federal database checks.

Code of Federal Regulations, 2012 CFR

2012-10-01

... 42 Public Health 4 2012-10-01 2012-10-01 false Federal database checks. 455.436 Section 455.436....436 Federal database checks. The State Medicaid agency must do all of the following: (a) Confirm the... databases. (b) Check the Social Security Administration's Death Master File, the National Plan and Provider...
42 CFR 455.436 - Federal database checks.

Code of Federal Regulations, 2011 CFR

2011-10-01

... 42 Public Health 4 2011-10-01 2011-10-01 false Federal database checks. 455.436 Section 455.436....436 Federal database checks. The State Medicaid agency must do all of the following: (a) Confirm the... databases. (b) Check the Social Security Administration's Death Master File, the National Plan and Provider...
40 CFR 1400.13 - Read-only database.

Code of Federal Regulations, 2014 CFR

2014-07-01

... 40 Protection of Environment 33 2014-07-01 2014-07-01 false Read-only database. 1400.13 Section... INFORMATION Other Provisions § 1400.13 Read-only database. The Administrator is authorized to establish... public off-site consequence analysis information by means of a central database under the control of the...
42 CFR 455.436 - Federal database checks.

Code of Federal Regulations, 2014 CFR

2014-10-01

... 42 Public Health 4 2014-10-01 2014-10-01 false Federal database checks. 455.436 Section 455.436....436 Federal database checks. The State Medicaid agency must do all of the following: (a) Confirm the... databases. (b) Check the Social Security Administration's Death Master File, the National Plan and Provider...
40 CFR 1400.13 - Read-only database.

Code of Federal Regulations, 2011 CFR

2011-07-01

... 40 Protection of Environment 33 2011-07-01 2011-07-01 false Read-only database. 1400.13 Section... INFORMATION Other Provisions § 1400.13 Read-only database. The Administrator is authorized to establish... public off-site consequence analysis information by means of a central database under the control of the...
42 CFR 455.436 - Federal database checks.

Code of Federal Regulations, 2013 CFR

2013-10-01

... 42 Public Health 4 2013-10-01 2013-10-01 false Federal database checks. 455.436 Section 455.436....436 Federal database checks. The State Medicaid agency must do all of the following: (a) Confirm the... databases. (b) Check the Social Security Administration's Death Master File, the National Plan and Provider...
40 CFR 1400.13 - Read-only database.

Code of Federal Regulations, 2013 CFR

2013-07-01

... 40 Protection of Environment 34 2013-07-01 2013-07-01 false Read-only database. 1400.13 Section... INFORMATION Other Provisions § 1400.13 Read-only database. The Administrator is authorized to establish... public off-site consequence analysis information by means of a central database under the control of the...
Curation accuracy of model organism databases

PubMed Central

Keseler, Ingrid M.; Skrzypek, Marek; Weerasinghe, Deepika; Chen, Albert Y.; Fulcher, Carol; Li, Gene-Wei; Lemmer, Kimberly C.; Mladinich, Katherine M.; Chow, Edmond D.; Sherlock, Gavin; Karp, Peter D.

2014-01-01

Manual extraction of information from the biomedical literature—or biocuration—is the central methodology used to construct many biological databases. For example, the UniProt protein database, the EcoCyc Escherichia coli database and the Candida Genome Database (CGD) are all based on biocuration. Biological databases are used extensively by life science researchers, as online encyclopedias, as aids in the interpretation of new experimental data and as golden standards for the development of new bioinformatics algorithms. Although manual curation has been assumed to be highly accurate, we are aware of only one previous study of biocuration accuracy. We assessed the accuracy of EcoCyc and CGD by manually selecting curated assertions within randomly chosen EcoCyc and CGD gene pages and by then validating that the data found in the referenced publications supported those assertions. A database assertion is considered to be in error if that assertion could not be found in the publication cited for that assertion. We identified 10 errors in the 633 facts that we validated across the two databases, for an overall error rate of 1.58%, and individual error rates of 1.82% for CGD and 1.40% for EcoCyc. These data suggest that manual curation of the experimental literature by Ph.D-level scientists is highly accurate. Database URL: http://ecocyc.org/, http://www.candidagenome.org// PMID:24923819
Genome databases

DOE Office of Scientific and Technical Information (OSTI.GOV)

Courteau, J.

1991-10-11

Since the Genome Project began several years ago, a plethora of databases have been developed or are in the works. They range from the massive Genome Data Base at Johns Hopkins University, the central repository of all gene mapping information, to small databases focusing on single chromosomes or organisms. Some are publicly available, others are essentially private electronic lab notebooks. Still others limit access to a consortium of researchers working on, say, a single human chromosome. An increasing number incorporate sophisticated search and analytical software, while others operate as little more than data lists. In consultation with numerous experts inmore » the field, a list has been compiled of some key genome-related databases. The list was not limited to map and sequence databases but also included the tools investigators use to interpret and elucidate genetic data, such as protein sequence and protein structure databases. Because a major goal of the Genome Project is to map and sequence the genomes of several experimental animals, including E. coli, yeast, fruit fly, nematode, and mouse, the available databases for those organisms are listed as well. The author also includes several databases that are still under development - including some ambitious efforts that go beyond data compilation to create what are being called electronic research communities, enabling many users, rather than just one or a few curators, to add or edit the data and tag it as raw or confirmed.« less
The Giardia genome project database.

PubMed

McArthur, A G; Morrison, H G; Nixon, J E; Passamaneck, N Q; Kim, U; Hinkle, G; Crocker, M K; Holder, M E; Farr, R; Reich, C I; Olsen, G E; Aley, S B; Adam, R D; Gillin, F D; Sogin, M L

2000-08-15

The Giardia genome project database provides an online resource for Giardia lamblia (WB strain, clone C6) genome sequence information. The database includes edited single-pass reads, the results of BLASTX searches, and details of progress towards sequencing the entire 12 million-bp Giardia genome. Pre-sorted BLASTX results can be retrieved based on keyword searches and BLAST searches of the high throughput Giardia data can be initiated from the web site or through NCBI. Descriptions of the genomic DNA libraries, project protocols and summary statistics are also available. Although the Giardia genome project is ongoing, new sequences are made available on a bi-monthly basis to ensure that researchers have access to information that may assist them in the search for genes and their biological function. The current URL of the Giardia genome project database is www.mbl.edu/Giardia.
Distribution and Classification of Serine β-Lactamases in Brazilian Hospital Sewage and Other Environmental Metagenomes Deposited in Public Databases

PubMed Central

Fróes, Adriana M.; da Mota, Fábio F.; Cuadrat, Rafael R. C.; Dávila, Alberto M. R.

2016-01-01

β-lactam is the most used antibiotic class in the clinical area and it acts on blocking the bacteria cell wall synthesis, causing cell death. However, some bacteria have evolved resistance to these antibiotics mainly due the production of enzymes known as β-lactamases. Hospital sewage is an important source of dispersion of multidrug-resistant bacteria in rivers and oceans. In this work, we used next-generation DNA sequencing to explore the diversity and dissemination of serine β-lactamases in two hospital sewage from Rio de Janeiro, Brazil (South Zone, SZ and North Zone, NZ), presenting different profiles, and to compare them with public environmental data available. Also, we propose a Hidden-Markov-Model approach to screen potential serine β-lactamases genes (in public environments samples and generated hospital sewage data), exploring its evolutionary relationships. Due to the high variability in β-lactamases, we used a position-specific scoring matrix search method (RPS-BLAST) against conserved domain database profiles (CDD, Pfam, and COG) followed by visual inspection to detect conserved motifs, to increase the reliability of the results and remove possible false positives. We were able to identify novel β-lactamases from Brazilian hospital sewage and to estimate relative abundance of its types. The highest relative abundance found in SZ was the Class A (50%), while Class D is predominant in NZ (55%). CfxA (65%) and ACC (47%) types were the most abundant genes detected in SZ, while in NZ the most frequent were OXA-10 (32%), CfxA (28%), ACC (21%), CEPA (20%), and FOX (19%). Phylogenetic analysis revealed β-lactamases from Brazilian hospital sewage grouped in the same clade and close to sequences belonging to Firmicutes and Bacteroidetes groups, but distant from potential β-lactamases screened from public environmental data, that grouped closer to β-lactamases of Proteobacteria. Our results demonstrated that HMM-based approach identified homologs of
Distribution and Classification of Serine β-Lactamases in Brazilian Hospital Sewage and Other Environmental Metagenomes Deposited in Public Databases.

PubMed

Fróes, Adriana M; da Mota, Fábio F; Cuadrat, Rafael R C; Dávila, Alberto M R

2016-01-01

β-lactam is the most used antibiotic class in the clinical area and it acts on blocking the bacteria cell wall synthesis, causing cell death. However, some bacteria have evolved resistance to these antibiotics mainly due the production of enzymes known as β-lactamases. Hospital sewage is an important source of dispersion of multidrug-resistant bacteria in rivers and oceans. In this work, we used next-generation DNA sequencing to explore the diversity and dissemination of serine β-lactamases in two hospital sewage from Rio de Janeiro, Brazil (South Zone, SZ and North Zone, NZ), presenting different profiles, and to compare them with public environmental data available. Also, we propose a Hidden-Markov-Model approach to screen potential serine β-lactamases genes (in public environments samples and generated hospital sewage data), exploring its evolutionary relationships. Due to the high variability in β-lactamases, we used a position-specific scoring matrix search method (RPS-BLAST) against conserved domain database profiles (CDD, Pfam, and COG) followed by visual inspection to detect conserved motifs, to increase the reliability of the results and remove possible false positives. We were able to identify novel β-lactamases from Brazilian hospital sewage and to estimate relative abundance of its types. The highest relative abundance found in SZ was the Class A (50%), while Class D is predominant in NZ (55%). CfxA (65%) and ACC (47%) types were the most abundant genes detected in SZ, while in NZ the most frequent were OXA-10 (32%), CfxA (28%), ACC (21%), CEPA (20%), and FOX (19%). Phylogenetic analysis revealed β-lactamases from Brazilian hospital sewage grouped in the same clade and close to sequences belonging to Firmicutes and Bacteroidetes groups, but distant from potential β-lactamases screened from public environmental data, that grouped closer to β-lactamases of Proteobacteria. Our results demonstrated that HMM-based approach identified homologs of
Investigating Evolutionary Questions Using Online Molecular Databases.

ERIC Educational Resources Information Center

Puterbaugh, Mary N.; Burleigh, J. Gordon

2001-01-01

Recommends using online molecular databases as teaching tools to illustrate evolutionary questions and concepts while introducing students to public molecular databases. Provides activities in which students make molecular comparisons between species. (YDS)
Partial DNA sequencing of Douglas-fir cDNAs used in RFLP mapping

Treesearch

K.D. Jermstad; D.L. Bassoni; C.S. Kinlaw; D.B. Neale

1998-01-01

DNA sequences from 87 Douglas-fir (Pseudotsuga menziesii [Mirb.] Franco) cDNA RFLP probes were determined. Sequences were submitted to the GenBank dbEST database and searched for similarity against nucleotide and protein databases using the BLASTn and BLASTx programs. Twenty-one sequences (24%) were assigned putative functions; 18 of which...
OnTheFly: a database of Drosophila melanogaster transcription factors and their binding sites.

PubMed

Shazman, Shula; Lee, Hunjoong; Socol, Yakov; Mann, Richard S; Honig, Barry

2014-01-01

We present OnTheFly (http://bhapp.c2b2.columbia.edu/OnTheFly/index.php), a database comprising a systematic collection of transcription factors (TFs) of Drosophila melanogaster and their DNA-binding sites. TFs predicted in the Drosophila melanogaster genome are annotated and classified and their structures, obtained via experiment or homology models, are provided. All known preferred TF DNA-binding sites obtained from the B1H, DNase I and SELEX methodologies are presented. DNA shape parameters predicted for these sites are obtained from a high throughput server or from crystal structures of protein-DNA complexes where available. An important feature of the database is that all DNA-binding domains and their binding sites are fully annotated in a eukaryote using structural criteria and evolutionary homology. OnTheFly thus provides a comprehensive view of TFs and their binding sites that will be a valuable resource for deciphering non-coding regulatory DNA.
LMSD: LIPID MAPS structure database

PubMed Central

Sud, Manish; Fahy, Eoin; Cotter, Dawn; Brown, Alex; Dennis, Edward A.; Glass, Christopher K.; Merrill, Alfred H.; Murphy, Robert C.; Raetz, Christian R. H.; Russell, David W.; Subramaniam, Shankar

2007-01-01

The LIPID MAPS Structure Database (LMSD) is a relational database encompassing structures and annotations of biologically relevant lipids. Structures of lipids in the database come from four sources: (i) LIPID MAPS Consortium's core laboratories and partners; (ii) lipids identified by LIPID MAPS experiments; (iii) computationally generated structures for appropriate lipid classes; (iv) biologically relevant lipids manually curated from LIPID BANK, LIPIDAT and other public sources. All the lipid structures in LMSD are drawn in a consistent fashion. In addition to a classification-based retrieval of lipids, users can search LMSD using either text-based or structure-based search options. The text-based search implementation supports data retrieval by any combination of these data fields: LIPID MAPS ID, systematic or common name, mass, formula, category, main class, and subclass data fields. The structure-based search, in conjunction with optional data fields, provides the capability to perform a substructure search or exact match for the structure drawn by the user. Search results, in addition to structure and annotations, also include relevant links to external databases. The LMSD is publicly available at PMID:17098933

Introducing a Public Stereoscopic 3D High Dynamic Range (SHDR) Video Database

NASA Astrophysics Data System (ADS)

Banitalebi-Dehkordi, Amin

2017-03-01

High dynamic range (HDR) displays and cameras are paving their ways through the consumer market at a rapid growth rate. Thanks to TV and camera manufacturers, HDR systems are now becoming available commercially to end users. This is taking place only a few years after the blooming of 3D video technologies. MPEG/ITU are also actively working towards the standardization of these technologies. However, preliminary research efforts in these video technologies are hammered by the lack of sufficient experimental data. In this paper, we introduce a Stereoscopic 3D HDR database of videos that is made publicly available to the research community. We explain the procedure taken to capture, calibrate, and post-process the videos. In addition, we provide insights on potential use-cases, challenges, and research opportunities, implied by the combination of higher dynamic range of the HDR aspect, and depth impression of the 3D aspect.
Three Decades of Recombinant DNA.

ERIC Educational Resources Information Center

Palmer, Jackie

1985-01-01

Discusses highlights in the development of genetic engineering, examining techniques with recombinant DNA, legal and ethical issues, GenBank (a national database of nucleic acid sequences), and other topics. (JN)
Significant variance in genetic diversity among populations of Schistosoma haematobium detected using microsatellite DNA loci from a genome-wide database.

PubMed

Glenn, Travis C; Lance, Stacey L; McKee, Anna M; Webster, Bonnie L; Emery, Aidan M; Zerlotini, Adhemar; Oliveira, Guilherme; Rollinson, David; Faircloth, Brant C

2013-10-17

Urogenital schistosomiasis caused by Schistosoma haematobium is widely distributed across Africa and is increasingly being targeted for control. Genome sequences and population genetic parameters can give insight into the potential for population- or species-level drug resistance. Microsatellite DNA loci are genetic markers in wide use by Schistosoma researchers, but there are few primers available for S. haematobium. We sequenced 1,058,114 random DNA fragments from clonal cercariae collected from a snail infected with a single Schistosoma haematobium miracidium. We assembled and aligned the S. haematobium sequences to the genomes of S. mansoni and S. japonicum, identifying microsatellite DNA loci across all three species and designing primers to amplify the loci in S. haematobium. To validate our primers, we screened 32 randomly selected primer pairs with population samples of S. haematobium. We designed >13,790 primer pairs to amplify unique microsatellite loci in S. haematobium, (available at http://www.cebio.org/projetos/schistosoma-haematobium-genome). The three Schistosoma genomes contained similar overall frequencies of microsatellites, but the frequency and length distributions of specific motifs differed among species. We identified 15 primer pairs that amplified consistently and were easily scored. We genotyped these 15 loci in S. haematobium individuals from six locations: Zanzibar had the highest levels of diversity; Malawi, Mauritius, Nigeria, and Senegal were nearly as diverse; but the sample from South Africa was much less diverse. About half of the primers in the database of Schistosoma haematobium microsatellite DNA loci should yield amplifiable and easily scored polymorphic markers, thus providing thousands of potential markers. Sequence conservation among S. haematobium, S. japonicum, and S. mansoni is relatively high, thus it should now be possible to identify markers that are universal among Schistosoma species (i.e., using DNA sequences
24 CFR 902.24 - Database adjustment.

Code of Federal Regulations, 2012 CFR

2012-04-01

... 24 Housing and Urban Development 4 2012-04-01 2012-04-01 false Database adjustment. 902.24 Section 902.24 Housing and Urban Development REGULATIONS RELATING TO HOUSING AND URBAN DEVELOPMENT (CONTINUED... PUBLIC HOUSING ASSESSMENT SYSTEM Physical Condition Indicator § 902.24 Database adjustment. (a...
24 CFR 902.24 - Database adjustment.

Code of Federal Regulations, 2013 CFR

2013-04-01

... 24 Housing and Urban Development 4 2013-04-01 2013-04-01 false Database adjustment. 902.24 Section 902.24 Housing and Urban Development REGULATIONS RELATING TO HOUSING AND URBAN DEVELOPMENT (CONTINUED... PUBLIC HOUSING ASSESSMENT SYSTEM Physical Condition Indicator § 902.24 Database adjustment. (a...
24 CFR 902.24 - Database adjustment.

Code of Federal Regulations, 2011 CFR

2011-04-01

... 24 Housing and Urban Development 4 2011-04-01 2011-04-01 false Database adjustment. 902.24 Section 902.24 Housing and Urban Development REGULATIONS RELATING TO HOUSING AND URBAN DEVELOPMENT (CONTINUED... PUBLIC HOUSING ASSESSMENT SYSTEM Physical Condition Indicator § 902.24 Database adjustment. (a...
24 CFR 902.24 - Database adjustment.

Code of Federal Regulations, 2014 CFR

2014-04-01

... 24 Housing and Urban Development 4 2014-04-01 2014-04-01 false Database adjustment. 902.24 Section 902.24 Housing and Urban Development REGULATIONS RELATING TO HOUSING AND URBAN DEVELOPMENT (CONTINUED... PUBLIC HOUSING ASSESSMENT SYSTEM Physical Condition Indicator § 902.24 Database adjustment. (a...
New taxonomy and old collections: integrating DNA barcoding into the collection curation process.

PubMed

Puillandre, N; Bouchet, P; Boisselier-Dubayle, M-C; Brisset, J; Buge, B; Castelin, M; Chagnoux, S; Christophe, T; Corbari, L; Lambourdière, J; Lozouet, P; Marani, G; Rivasseau, A; Silva, N; Terryn, Y; Tillier, S; Utge, J; Samadi, S

2012-05-01

Because they house large biodiversity collections and are also research centres with sequencing facilities, natural history museums are well placed to develop DNA barcoding best practices. The main difficulty is generally the vouchering system: it must ensure that all data produced remain attached to the corresponding specimen, from the field to publication in articles and online databases. The Museum National d'Histoire Naturelle in Paris is one of the leading laboratories in the Marine Barcode of Life (MarBOL) project, which was used as a pilot programme to include barcode collections for marine molluscs and crustaceans. The system is based on two relational databases. The first one classically records the data (locality and identification) attached to the specimens. In the second one, tissue-clippings, DNA extractions (both preserved in 2D barcode tubes) and PCR data (including primers) are linked to the corresponding specimen. All the steps of the process [sampling event, specimen identification, molecular processing, data submission to Barcode Of Life Database (BOLD) and GenBank] are thus linked together. Furthermore, we have developed several web-based tools to automatically upload data into the system, control the quality of the sequences produced and facilitate the submission to online databases. This work is the result of a joint effort from several teams in the Museum National d'Histoire Naturelle (MNHN), but also from a collaborative network of taxonomists and molecular systematists outside the museum, resulting in the vouchering so far of ∼41,000 sequences and the production of ∼11,000 COI sequences. © 2012 Blackwell Publishing Ltd.
Harnessing mtDNA variation to resolve ambiguity in ‘Redfish’ sold in Europe

PubMed Central

Moore, Lauren; Pampoulie, Christophe; Di Muri, Cristina; Vandamme, Sara; Mariani, Stefano

2017-01-01

Morphology-based identification of North Atlantic Sebastes has long been controversial and misidentification may produce misleading data, with cascading consequences that negatively affect fisheries management and seafood labelling. North Atlantic Sebastes comprises of four species, commonly known as ‘redfish’, but little is known about the number, identity and labelling accuracy of redfish species sold across Europe. We used a molecular approach to identify redfish species from ‘blind’ specimens to evaluate the performance of the Barcode of Life (BOLD) and Genbank databases, as well as carrying out a market product accuracy survey from retailers across Europe. The conventional BOLD approach proved ambiguous, and phylogenetic analysis based on mtDNA control region sequences provided a higher resolution for species identification. By sampling market products from four countries, we found the presence of two species of redfish (S. norvegicus and S. mentella) and one unidentified Pacific rockfish marketed in Europe. Furthermore, public databases revealed the existence of inaccurate reference sequences, likely stemming from species misidentification from previous studies, which currently hinders the efficacy of DNA methods for the identification of Sebastes market samples. PMID:29018597
Using the Proteomics Identifications Database (PRIDE).

PubMed

Martens, Lennart; Jones, Phil; Côté, Richard

2008-03-01

The Proteomics Identifications Database (PRIDE) is a public data repository designed to store, disseminate, and analyze mass spectrometry based proteomics datasets. The PRIDE database can accommodate any level of detailed metadata about the submitted results, which can be queried, explored, viewed, or downloaded via the PRIDE Web interface. The PRIDE database also provides a simple, yet powerful, access control mechanism that fully supports confidential peer-reviewing of data related to a manuscript, ensuring that these results remain invisible to the general public while allowing referees and journal editors anonymized access to the data. This unit describes in detail the functionality that PRIDE provides with regards to searching, viewing, and comparing the available data, as well as different options for submitting data to PRIDE.
The 2015 Nucleic Acids Research Database Issue and molecular biology database collection.

PubMed

Galperin, Michael Y; Rigden, Daniel J; Fernández-Suárez, Xosé M

2015-01-01

The 2015 Nucleic Acids Research Database Issue contains 172 papers that include descriptions of 56 new molecular biology databases, and updates on 115 databases whose descriptions have been previously published in NAR or other journals. Following the classification that has been introduced last year in order to simplify navigation of the entire issue, these articles are divided into eight subject categories. This year's highlights include RNAcentral, an international community portal to various databases on noncoding RNA; ValidatorDB, a validation database for protein structures and their ligands; SASBDB, a primary repository for small-angle scattering data of various macromolecular complexes; MoonProt, a database of 'moonlighting' proteins, and two new databases of protein-protein and other macromolecular complexes, ComPPI and the Complex Portal. This issue also includes an unusually high number of cancer-related databases and other databases dedicated to genomic basics of disease and potential drugs and drug targets. The size of NAR online Molecular Biology Database Collection, http://www.oxfordjournals.org/nar/database/a/, remained approximately the same, following the addition of 74 new resources and removal of 77 obsolete web sites. The entire Database Issue is freely available online on the Nucleic Acids Research web site (http://nar.oxfordjournals.org/). Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.
On-Line Databases in Mexico.

ERIC Educational Resources Information Center

Molina, Enzo

1986-01-01

Use of online bibliographic databases in Mexico is provided through Servicio de Consulta a Bancos de Informacion, a public service that provides information retrieval, document delivery, translation, technical support, and training services. Technical infrastructure is based on a public packet-switching network and institutional users may receive…
Tibetan Magmatism Database

NASA Astrophysics Data System (ADS)

Chapman, James B.; Kapp, Paul

2017-11-01

A database containing previously published geochronologic, geochemical, and isotopic data on Mesozoic to Quaternary igneous rocks in the Himalayan-Tibetan orogenic system are presented. The database is intended to serve as a repository for new and existing igneous rock data and is publicly accessible through a web-based platform that includes an interactive map and data table interface with search, filtering, and download options. To illustrate the utility of the database, the age, location, and ɛHft composition of magmatism from the central Gangdese batholith in the southern Lhasa terrane are compared. The data identify three high-flux events, which peak at 93, 50, and 15 Ma. They are characterized by inboard arc migration and a temporal and spatial shift to more evolved isotopic compositions.
Sequencing of cDNA Clones from the Genetic Map of Tomato (Lycopersicon esculentum)

PubMed Central

Ganal, Martin W.; Czihal, Rosemarie; Hannappel, Ulrich; Kloos, Dorothee-U.; Polley, Andreas; Ling, Hong-Qing

1998-01-01

The dense RFLP linkage map of tomato (Lycopersicon esculentum) contains >300 anonymous cDNA clones. Of those clones, 272 were partially or completely sequenced. The sequences were compared at the DNA and protein level to known genes in databases. For 57% of the clones, a significant match to previously described genes was found. The information will permit the conversion of those markers to STS markers and allow their use in PCR-based mapping experiments. Furthermore, it will facilitate the comparative mapping of genes across distantly related plant species by direct comparison of DNA sequences and map positions. [cDNA sequence data reported in this paper have been submitted to the EMBL database under accession nos. AA824695–AA825005 and the dbEST_Id database under accession nos. 1546519–1546862.] PMID:9724330
A Novel Approach: Chemical Relational Databases, and the ...

EPA Pesticide Factsheets

Mutagenicity and carcinogenicity databases are crucial resources for toxicologists and regulators involved in chemicals risk assessment. Until recently, existing public toxicity databases have been constructed primarily as
International Society of Human and Animal Mycology (ISHAM)-ITS reference DNA barcoding database--the quality controlled standard tool for routine identification of human and animal pathogenic fungi.

PubMed

Irinyi, Laszlo; Serena, Carolina; Garcia-Hermoso, Dea; Arabatzis, Michael; Desnos-Ollivier, Marie; Vu, Duong; Cardinali, Gianluigi; Arthur, Ian; Normand, Anne-Cécile; Giraldo, Alejandra; da Cunha, Keith Cassia; Sandoval-Denis, Marcelo; Hendrickx, Marijke; Nishikaku, Angela Satie; de Azevedo Melo, Analy Salles; Merseguel, Karina Bellinghausen; Khan, Aziza; Parente Rocha, Juliana Alves; Sampaio, Paula; da Silva Briones, Marcelo Ribeiro; e Ferreira, Renata Carmona; de Medeiros Muniz, Mauro; Castañón-Olivares, Laura Rosio; Estrada-Barcenas, Daniel; Cassagne, Carole; Mary, Charles; Duan, Shu Yao; Kong, Fanrong; Sun, Annie Ying; Zeng, Xianyu; Zhao, Zuotao; Gantois, Nausicaa; Botterel, Françoise; Robbertse, Barbara; Schoch, Conrad; Gams, Walter; Ellis, David; Halliday, Catriona; Chen, Sharon; Sorrell, Tania C; Piarroux, Renaud; Colombo, Arnaldo L; Pais, Célia; de Hoog, Sybren; Zancopé-Oliveira, Rosely Maria; Taylor, Maria Lucia; Toriello, Conchita; de Almeida Soares, Célia Maria; Delhaes, Laurence; Stubbe, Dirk; Dromer, Françoise; Ranque, Stéphane; Guarro, Josep; Cano-Lira, Jose F; Robert, Vincent; Velegraki, Aristea; Meyer, Wieland

2015-05-01

Human and animal fungal pathogens are a growing threat worldwide leading to emerging infections and creating new risks for established ones. There is a growing need for a rapid and accurate identification of pathogens to enable early diagnosis and targeted antifungal therapy. Morphological and biochemical identification methods are time-consuming and require trained experts. Alternatively, molecular methods, such as DNA barcoding, a powerful and easy tool for rapid monophasic identification, offer a practical approach for species identification and less demanding in terms of taxonomical expertise. However, its wide-spread use is still limited by a lack of quality-controlled reference databases and the evolving recognition and definition of new fungal species/complexes. An international consortium of medical mycology laboratories was formed aiming to establish a quality controlled ITS database under the umbrella of the ISHAM working group on "DNA barcoding of human and animal pathogenic fungi." A new database, containing 2800 ITS sequences representing 421 fungal species, providing the medical community with a freely accessible tool at http://www.isham.org/ and http://its.mycologylab.org/ to rapidly and reliably identify most agents of mycoses, was established. The generated sequences included in the new database were used to evaluate the variation and overall utility of the ITS region for the identification of pathogenic fungi at intra-and interspecies level. The average intraspecies variation ranged from 0 to 2.25%. This highlighted selected pathogenic fungal species, such as the dermatophytes and emerging yeast, for which additional molecular methods/genetic markers are required for their reliable identification from clinical and veterinary specimens. © The Author 2015. Published by Oxford University Press on behalf of The International Society for Human and Animal Mycology. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
MetPetDB: A database for metamorphic geochemistry

NASA Astrophysics Data System (ADS)

Spear, Frank S.; Hallett, Benjamin; Pyle, Joseph M.; Adalı, Sibel; Szymanski, Boleslaw K.; Waters, Anthony; Linder, Zak; Pearce, Shawn O.; Fyffe, Matthew; Goldfarb, Dennis; Glickenhouse, Nickolas; Buletti, Heather

2009-12-01

We present a data model for the initial implementation of MetPetDB, a geochemical database specific to metamorphic rock samples. The database is designed around the concept of preservation of spatial relationships, at all scales, of chemical analyses and their textural setting. Objects in the database (samples) represent physical rock samples; each sample may contain one or more subsamples with associated geochemical and image data. Samples, subsamples, geochemical data, and images are described with attributes (some required, some optional); these attributes also serve as search delimiters. All data in the database are classified as published (i.e., archived or published data), public or private. Public and published data may be freely searched and downloaded. All private data is owned; permission to view, edit, download and otherwise manipulate private data may be granted only by the data owner; all such editing operations are recorded by the database to create a data version log. The sharing of data permissions among a group of collaborators researching a common sample is done by the sample owner through the project manager. User interaction with MetPetDB is hosted by a web-based platform based upon the Java servlet application programming interface, with the PostgreSQL relational database. The database web portal includes modules that allow the user to interact with the database: registered users may save and download public and published data, upload private data, create projects, and assign permission levels to project collaborators. An Image Viewer module provides for spatial integration of image and geochemical data. A toolkit consisting of plotting and geochemical calculation software for data analysis and a mobile application for viewing the public and published data is being developed. Future issues to address include population of the database, integration with other geochemical databases, development of the analysis toolkit, creation of data models for
DNA fingerprinting in forensics: past, present, future

PubMed Central

2013-01-01

DNA fingerprinting, one of the great discoveries of the late 20th century, has revolutionized forensic investigations. This review briefly recapitulates 30 years of progress in forensic DNA analysis which helps to convict criminals, exonerate the wrongly accused, and identify victims of crime, disasters, and war. Current standard methods based on short tandem repeats (STRs) as well as lineage markers (Y chromosome, mitochondrial DNA) are covered and applications are illustrated by casework examples. Benefits and risks of expanding forensic DNA databases are discussed and we ask what the future holds for forensic DNA fingerprinting. PMID:24245688
Library Instruction and Online Database Searching.

ERIC Educational Resources Information Center

Mercado, Heidi

1999-01-01

Reviews changes in online database searching in academic libraries. Topics include librarians conducting all searches; the advent of end-user searching and the need for user instruction; compact disk technology; online public catalogs; the Internet; full text databases; electronic information literacy; user education and the remote library user;…
Family medicine publications in Taiwan: an analysis of the Web of Science database from 1993 to 2012.

PubMed

Lin, Ming-Hwai; Hwang, Shinn-Jang; Hwang, I-Hsuan; Chen, Yu-Chun

2014-11-01

Academic publications are important for developing a medical specialty or discipline. Since family medicine is a rapidly growing medical specialty in Taiwan, this study aimed to analyze family medicine publications from 1993 to 2012 in Taiwan using the Web of Science database. Published academic articles submitted from departments/institutes of family medicine were retrieved and analyzed from the Web of Science database, which includes articles published in the Science Citation Index-Expanded and Social Science Citation Indexed journals from 1993 to 2012. Among 33,073 published articles submitted from the departments/institutes of family medicine worldwide during the years 1993-2012, 1552 articles (4.69%) were submitted from Taiwan, ranking fourth in the world after the USA, Canada, and Sweden. In total, 1409 articles from Taiwan, excluding meeting abstracts and corrections, were selected for further analyses. During these two decades, family medicine publications increased rapidly. There were 60 articles published during 1993-1997, 180 articles during 1998-2002, 334 articles during 2003-2007, and up to 836 articles during 2008-2012. However, the mean citation number of articles decreased from 19.0 to 17.7, 15.1, and 3.8, and the mean impact factor of published journals decreased from 3.41 to 3.15, 2.78 and 2.82 during the periods 1993-1997, 1998-2002, 2003-2007, and 2008-2012, respectively. Most articles belonged to the subject category of the Medicine, General and Internal category (194 articles, 13.8%), followed by Public Environmental Occupational Health (144 articles, 10.2%), Oncology (126 articles, 9.2%), Endocrinology Metabolism (111 articles, 7.9%), Geriatrics Gerontology (99 articles, 7.0%), and the Gastroenterology Hepatology category (85 articles, 6.0%). However, only six articles (0.4%) were published in the Primary Health Care category. Publications from departments/institutes of family medicine in Taiwan increased rapidly from 1993 to 2012. However

AtlasT4SS: a curated database for type IV secretion systems.

PubMed

Souza, Rangel C; del Rosario Quispe Saji, Guadalupe; Costa, Maiana O C; Netto, Diogo S; Lima, Nicholas C B; Klein, Cecília C; Vasconcelos, Ana Tereza R; Nicolás, Marisa F

2012-08-09

The type IV secretion system (T4SS) can be classified as a large family of macromolecule transporter systems, divided into three recognized sub-families, according to the well-known functions. The major sub-family is the conjugation system, which allows transfer of genetic material, such as a nucleoprotein, via cell contact among bacteria. Also, the conjugation system can transfer genetic material from bacteria to eukaryotic cells; such is the case with the T-DNA transfer of Agrobacterium tumefaciens to host plant cells. The system of effector protein transport constitutes the second sub-family, and the third one corresponds to the DNA uptake/release system. Genome analyses have revealed numerous T4SS in Bacteria and Archaea. The purpose of this work was to organize, classify, and integrate the T4SS data into a single database, called AtlasT4SS - the first public database devoted exclusively to this prokaryotic secretion system. The AtlasT4SS is a manual curated database that describes a large number of proteins related to the type IV secretion system reported so far in Gram-negative and Gram-positive bacteria, as well as in Archaea. The database was created using the RDBMS MySQL and the Catalyst Framework based in the Perl programming language and using the Model-View-Controller (MVC) design pattern for Web. The current version holds a comprehensive collection of 1,617 T4SS proteins from 58 Bacteria (49 Gram-negative and 9 Gram-Positive), one Archaea and 11 plasmids. By applying the bi-directional best hit (BBH) relationship in pairwise genome comparison, it was possible to obtain a core set of 134 clusters of orthologous genes encoding T4SS proteins. In our database we present one way of classifying orthologous groups of T4SSs in a hierarchical classification scheme with three levels. The first level comprises four classes that are based on the organization of genetic determinants, shared homologies, and evolutionary relationships: (i) F-T4SS, (ii) P-T4SS, (iii
Reflections on a decade of research by ASEAN dental faculties: analysis of publications from ISI-WOS databases from 2000 to 2009.

PubMed

Sirisinha, Stitaya; Koontongkaew, Sittichai; Phantumvanit, Prathip; Wittayawuttikul, Ruchareka

2011-05-01

This communication analyzed research publications in dentistry in the Institute of Scientific Information Web of Science databases of 10 dental faculties in the Association of South-East Asian Nations (ASEAN) from 2000 to 2009. The term used for the "all-document types" search was "Faculty of Dentistry/College of Dentistry." Abstracts presented at regional meetings were also included in the analysis. The Times Higher Education System QS World University Rankings showed that universities in the region fare poorly in world university rankings. Only the National University of Singapore and Nanyang Technological University appeared in the top 100 in 2009; 19 universities in the region, including Indonesia, Malaysia, the Philippines, Singapore, and Thailand, appeared in the top 500. Data from the databases showed that research publications by dental institutes in the region fall short of their Asian counterparts. Singapore and Thailand are the most active in dental research of the ASEAN countries. © 2011 Blackwell Publishing Asia Pty Ltd.
Expert searching in public health

PubMed Central

Alpi, Kristine M.

2005-01-01

Objective: The article explores the characteristics of public health information needs and the resources available to address those needs that distinguish it as an area of searching requiring particular expertise. Methods: Public health searching activities from reference questions and literature search requests at a large, urban health department library were reviewed to identify the challenges in finding relevant public health information. Results: The terminology of the information request frequently differed from the vocabularies available in the databases. Searches required the use of multiple databases and/or Web resources with diverse interfaces. Issues of the scope and features of the databases relevant to the search questions were considered. Conclusion: Expert searching in public health differs from other types of expert searching in the subject breadth and technical demands of the databases to be searched, the fluidity and lack of standardization of the vocabulary, and the relative scarcity of high-quality investigations at the appropriate level of geographic specificity. Health sciences librarians require a broad exposure to databases, gray literature, and public health terminology to perform as expert searchers in public health. PMID:15685281
76 FR 10044 - Notice of Proposed Information Collection for Public Comment Public Housing Assessment System...

Federal Register 2010, 2011, 2012, 2013, 2014

2011-02-23

... Database Adjustments AGENCY: Office of the Assistant Secretary for Public and Indian Housing, HUD. ACTION...: Public Housing Assessment System Appeals, Technical Reviews and Database Adjustments. OMB Control Number..., at Sec. 902.24, a database adjustment if certain conditions are present. A technical review of the...
DNA barcoding of medicinal plant material for identification

USDA-ARS?s Scientific Manuscript database

Because of the increasing demand for herbal remedies and for authentication of the source material, it is vital to provide a single database containing information about authentic plant materials and their potential adulterants. The database should provide DNA barcodes for data retrieval and similar...
Comparison of sequencing the D2 region of the large subunit ribosomal RNA gene (MicroSEQ®) versus the internal transcribed spacer (ITS) regions using two public databases for identification of common and uncommon clinically relevant fungal species.

PubMed

Arbefeville, S; Harris, A; Ferrieri, P

2017-09-01

Fungal infections cause considerable morbidity and mortality in immunocompromised patients. Rapid and accurate identification of fungi is essential to guide accurately targeted antifungal therapy. With the advent of molecular methods, clinical laboratories can use new technologies to supplement traditional phenotypic identification of fungi. The aims of the study were to evaluate the sole commercially available MicroSEQ® D2 LSU rDNA Fungal Identification Kit compared to the in-house developed internal transcribed spacer (ITS) regions assay in identifying moulds, using two well-known online public databases to analyze sequenced data. 85 common and uncommon clinically relevant fungi isolated from clinical specimens were sequenced for the D2 region of the large subunit (LSU) of ribosomal RNA (rRNA) gene with the MicroSEQ® Kit and the ITS regions with the in house developed assay. The generated sequenced data were analyzed with the online GenBank and MycoBank public databases. The D2 region of the LSU rRNA gene identified 89.4% or 92.9% of the 85 isolates to the genus level and the full ITS region (f-ITS) 96.5% or 100%, using GenBank or MycoBank, respectively, when compared to the consensus ID. When comparing species-level designations to the consensus ID, D2 region of the LSU rRNA gene aligned with 44.7% (38/85) or 52.9% (45/85) of these isolates in GenBank or MycoBank, respectively. By comparison, f-ITS possessed greater specificity, followed by ITS1, then ITS2 regions using GenBank or MycoBank. Using GenBank or MycoBank, D2 region of the LSU rRNA gene outperformed phenotypic based ID at the genus level. Comparing rates of ID between D2 region of the LSU rRNA gene and the ITS regions in GenBank or MycoBank at the species level against the consensus ID, f-ITS and ITS2 exceeded performance of the D2 region of the LSU rRNA gene, but ITS1 had similar performance to the D2 region of the LSU rRNA gene using MycoBank. Our results indicated that the MicroSEQ® D2 LSU rDNA
RiceFOX: a database of Arabidopsis mutant lines overexpressing rice full-length cDNA that contains a wide range of trait information to facilitate analysis of gene function.

PubMed

Sakurai, Tetsuya; Kondou, Youichi; Akiyama, Kenji; Kurotani, Atsushi; Higuchi, Mieko; Ichikawa, Takanari; Kuroda, Hirofumi; Kusano, Miyako; Mori, Masaki; Saitou, Tsutomu; Sakakibara, Hitoshi; Sugano, Shoji; Suzuki, Makoto; Takahashi, Hideki; Takahashi, Shinya; Takatsuji, Hiroshi; Yokotani, Naoki; Yoshizumi, Takeshi; Saito, Kazuki; Shinozaki, Kazuo; Oda, Kenji; Hirochika, Hirohiko; Matsui, Minami

2011-02-01

Identification of gene function is important not only for basic research but also for applied science, especially with regard to improvements in crop production. For rapid and efficient elucidation of useful traits, we developed a system named FOX hunting (Full-length cDNA Over-eXpressor gene hunting) using full-length cDNAs (fl-cDNAs). A heterologous expression approach provides a solution for the high-throughput characterization of gene functions in agricultural plant species. Since fl-cDNAs contain all the information of functional mRNAs and proteins, we introduced rice fl-cDNAs into Arabidopsis plants for systematic gain-of-function mutation. We generated >30,000 independent Arabidopsis transgenic lines expressing rice fl-cDNAs (rice FOX Arabidopsis mutant lines). These rice FOX Arabidopsis lines were screened systematically for various criteria such as morphology, photosynthesis, UV resistance, element composition, plant hormone profile, metabolite profile/fingerprinting, bacterial resistance, and heat and salt tolerance. The information obtained from these screenings was compiled into a database named 'RiceFOX'. This database contains around 18,000 records of rice FOX Arabidopsis lines and allows users to search against all the observed results, ranging from morphological to invisible traits. The number of searchable items is approximately 100; moreover, the rice FOX Arabidopsis lines can be searched by rice and Arabidopsis gene/protein identifiers, sequence similarity to the introduced rice fl-cDNA and traits. The RiceFOX database is available at http://ricefox.psc.riken.jp/.
RiceFOX: A Database of Arabidopsis Mutant Lines Overexpressing Rice Full-Length cDNA that Contains a Wide Range of Trait Information to Facilitate Analysis of Gene Function

PubMed Central

Sakurai, Tetsuya; Kondou, Youichi; Akiyama, Kenji; Kurotani, Atsushi; Higuchi, Mieko; Ichikawa, Takanari; Kuroda, Hirofumi; Kusano, Miyako; Mori, Masaki; Saitou, Tsutomu; Sakakibara, Hitoshi; Sugano, Shoji; Suzuki, Makoto; Takahashi, Hideki; Takahashi, Shinya; Takatsuji, Hiroshi; Yokotani, Naoki; Yoshizumi, Takeshi; Saito, Kazuki; Shinozaki, Kazuo; Oda, Kenji; Hirochika, Hirohiko; Matsui, Minami

2011-01-01

Identification of gene function is important not only for basic research but also for applied science, especially with regard to improvements in crop production. For rapid and efficient elucidation of useful traits, we developed a system named FOX hunting (Full-length cDNA Over-eXpressor gene hunting) using full-length cDNAs (fl-cDNAs). A heterologous expression approach provides a solution for the high-throughput characterization of gene functions in agricultural plant species. Since fl-cDNAs contain all the information of functional mRNAs and proteins, we introduced rice fl-cDNAs into Arabidopsis plants for systematic gain-of-function mutation. We generated >30,000 independent Arabidopsis transgenic lines expressing rice fl-cDNAs (rice FOX Arabidopsis mutant lines). These rice FOX Arabidopsis lines were screened systematically for various criteria such as morphology, photosynthesis, UV resistance, element composition, plant hormone profile, metabolite profile/fingerprinting, bacterial resistance, and heat and salt tolerance. The information obtained from these screenings was compiled into a database named ‘RiceFOX’. This database contains around 18,000 records of rice FOX Arabidopsis lines and allows users to search against all the observed results, ranging from morphological to invisible traits. The number of searchable items is approximately 100; moreover, the rice FOX Arabidopsis lines can be searched by rice and Arabidopsis gene/protein identifiers, sequence similarity to the introduced rice fl-cDNA and traits. The RiceFOX database is available at http://ricefox.psc.riken.jp/. PMID:21186176
Interactive bibliographical database on color

NASA Astrophysics Data System (ADS)

Caivano, Jose L.

2002-06-01

The paper describes the methodology and results of a project under development, aimed at the elaboration of an interactive bibliographical database on color in all fields of application: philosophy, psychology, semiotics, education, anthropology, physical and natural sciences, biology, medicine, technology, industry, architecture and design, arts, linguistics, geography, history. The project is initially based upon an already developed bibliography, published in different journals, updated in various opportunities, and now available at the Internet, with more than 2,000 entries. The interactive database will amplify that bibliography, incorporating hyperlinks and contents (indexes, abstracts, keywords, introductions, or eventually the complete document), and devising mechanisms for information retrieval. The sources to be included are: books, doctoral dissertations, multimedia publications, reference works. The main arrangement will be chronological, but the design of the database will allow rearrangements or selections by different fields: subject, Decimal Classification System, author, language, country, publisher, etc. A further project is to develop another database, including color-specialized journals or newsletters, and articles on color published in international journals, arranged in this case by journal name and date of publication, but allowing also rearrangements or selections by author, subject and keywords.
DNA in the Criminal Justice System: The DNA Success Story in Perspective.

PubMed

Mapes, Anna A; Kloosterman, Ate D; de Poot, Christianne J

2015-07-01

Current figures on the efficiency of DNA as an investigative tool in criminal investigations only tell part of the story. To get the DNA success story in the right perspective, we examined all forensic reports from serious (N = 116) and high-volume crime cases (N = 2791) over the year 2011 from one police region in the Netherlands. These data show that 38% of analyzed serious crime traces (N = 384) and 17% of analyzed high-volume crime traces (N = 386) did not result in a DNA profile. Turnaround times (from crime scene to DNA report) were 66 days for traces from serious crimes and 44 days for traces from high-volume crimes. Suspects were truly identified through a match with the Offender DNA database of the Netherlands in 3% of the serious crime cases and in 1% of the high-volume crime cases. These data are important for both the forensic laboratory and the professionals in the criminal justice system to further optimize forensic DNA testing as an investigative tool. © 2015 American Academy of Forensic Sciences.
Challenges in the association of human single nucleotide polymorphism mentions with unique database identifiers

PubMed Central

2011-01-01

Background Most information on genomic variations and their associations with phenotypes are covered exclusively in scientific publications rather than in structured databases. These texts commonly describe variations using natural language; database identifiers are seldom mentioned. This complicates the retrieval of variations, associated articles, as well as information extraction, e. g. the search for biological implications. To overcome these challenges, procedures to map textual mentions of variations to database identifiers need to be developed. Results This article describes a workflow for normalization of variation mentions, i.e. the association of them to unique database identifiers. Common pitfalls in the interpretation of single nucleotide polymorphism (SNP) mentions are highlighted and discussed. The developed normalization procedure achieves a precision of 98.1 % and a recall of 67.5% for unambiguous association of variation mentions with dbSNP identifiers on a text corpus based on 296 MEDLINE abstracts containing 527 mentions of SNPs. The annotated corpus is freely available at http://www.scai.fraunhofer.de/snp-normalization-corpus.html. Conclusions Comparable approaches usually focus on variations mentioned on the protein sequence and neglect problems for other SNP mentions. The results presented here indicate that normalizing SNPs described on DNA level is more difficult than the normalization of SNPs described on protein level. The challenges associated with normalization are exemplified with ambiguities and errors, which occur in this corpus. PMID:21992066
NREL: Renewable Resource Data Center - Biomass Resource Publications

Science.gov Websites

Marginal Lands in APEC Economies NREL Publications Database For a comprehensive list of other NREL biomass resource publications, explore NREL's Publications Database. When searching the database, search on "
MortalityPredictors.org: a manually-curated database of published biomarkers of human all-cause mortality.

PubMed

Peto, Maximus V; De la Guardia, Carlos; Winslow, Ksenia; Ho, Andrew; Fortney, Kristen; Morgen, Eric

2017-08-31

Biomarkers of all-cause mortality are of tremendous clinical and research interest. Because of the long potential duration of prospective human lifespan studies, such biomarkers can play a key role in quantifying human aging and quickly evaluating any potential therapies. Decades of research into mortality biomarkers have resulted in numerous associations documented across hundreds of publications. Here, we present MortalityPredictors.org , a manually-curated, publicly accessible database, housing published, statistically-significant relationships between biomarkers and all-cause mortality in population-based or generally healthy samples. To gather the information for this database, we searched PubMed for appropriate research papers and then manually curated relevant data from each paper. We manually curated 1,576 biomarker associations, involving 471 distinct biomarkers. Biomarkers ranged in type from hematologic (red blood cell distribution width) to molecular (DNA methylation changes) to physical (grip strength). Via the web interface, the resulting data can be easily browsed, searched, and downloaded for further analysis. MortalityPredictors.org provides comprehensive results on published biomarkers of human all-cause mortality that can be used to compare biomarkers, facilitate meta-analysis, assist with the experimental design of aging studies, and serve as a central resource for analysis. We hope that it will facilitate future research into human mortality and aging.
MortalityPredictors.org: a manually-curated database of published biomarkers of human all-cause mortality

PubMed Central

Winslow, Ksenia; Ho, Andrew; Fortney, Kristen; Morgen, Eric

2017-01-01

Biomarkers of all-cause mortality are of tremendous clinical and research interest. Because of the long potential duration of prospective human lifespan studies, such biomarkers can play a key role in quantifying human aging and quickly evaluating any potential therapies. Decades of research into mortality biomarkers have resulted in numerous associations documented across hundreds of publications. Here, we present MortalityPredictors.org, a manually-curated, publicly accessible database, housing published, statistically-significant relationships between biomarkers and all-cause mortality in population-based or generally healthy samples. To gather the information for this database, we searched PubMed for appropriate research papers and then manually curated relevant data from each paper. We manually curated 1,576 biomarker associations, involving 471 distinct biomarkers. Biomarkers ranged in type from hematologic (red blood cell distribution width) to molecular (DNA methylation changes) to physical (grip strength). Via the web interface, the resulting data can be easily browsed, searched, and downloaded for further analysis. MortalityPredictors.org provides comprehensive results on published biomarkers of human all-cause mortality that can be used to compare biomarkers, facilitate meta-analysis, assist with the experimental design of aging studies, and serve as a central resource for analysis. We hope that it will facilitate future research into human mortality and aging. PMID:28858850
RPG: the Ribosomal Protein Gene database.

PubMed

Nakao, Akihiro; Yoshihama, Maki; Kenmochi, Naoya

2004-01-01

RPG (http://ribosome.miyazaki-med.ac.jp/) is a new database that provides detailed information about ribosomal protein (RP) genes. It contains data from humans and other organisms, including Drosophila melanogaster, Caenorhabditis elegans, Saccharo myces cerevisiae, Methanococcus jannaschii and Escherichia coli. Users can search the database by gene name and organism. Each record includes sequences (genomic, cDNA and amino acid sequences), intron/exon structures, genomic locations and information about orthologs. In addition, users can view and compare the gene structures of the above organisms and make multiple amino acid sequence alignments. RPG also provides information on small nucleolar RNAs (snoRNAs) that are encoded in the introns of RP genes.
SALAD database: a motif-based database of protein annotations for plant comparative genomics

PubMed Central

Mihara, Motohiro; Itoh, Takeshi; Izawa, Takeshi

2010-01-01

Proteins often have several motifs with distinct evolutionary histories. Proteins with similar motifs have similar biochemical properties and thus related biological functions. We constructed a unique comparative genomics database termed the SALAD database (http://salad.dna.affrc.go.jp/salad/) from plant-genome-based proteome data sets. We extracted evolutionarily conserved motifs by MEME software from 209 529 protein-sequence annotation groups selected by BLASTP from the proteome data sets of 10 species: rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, 3 algae, and yeast. Similarity clustering of each protein group was performed by pairwise scoring of the motif patterns of the sequences. The SALAD database provides a user-friendly graphical viewer that displays a motif pattern diagram linked to the resulting bootstrapped dendrogram for each protein group. Amino-acid-sequence-based and nucleotide-sequence-based phylogenetic trees for motif combination alignment, a logo comparison diagram for each clade in the tree, and a Pfam-domain pattern diagram are also available. We also developed a viewer named ‘SALAD on ARRAYs’ to view arbitrary microarray data sets of paralogous genes linked to the same dendrogram in a window. The SALAD database is a powerful tool for comparing protein sequences and can provide valuable hints for biological analysis. PMID:19854933
SALAD database: a motif-based database of protein annotations for plant comparative genomics.

PubMed

Mihara, Motohiro; Itoh, Takeshi; Izawa, Takeshi

2010-01-01

Proteins often have several motifs with distinct evolutionary histories. Proteins with similar motifs have similar biochemical properties and thus related biological functions. We constructed a unique comparative genomics database termed the SALAD database (http://salad.dna.affrc.go.jp/salad/) from plant-genome-based proteome data sets. We extracted evolutionarily conserved motifs by MEME software from 209,529 protein-sequence annotation groups selected by BLASTP from the proteome data sets of 10 species: rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, 3 algae, and yeast. Similarity clustering of each protein group was performed by pairwise scoring of the motif patterns of the sequences. The SALAD database provides a user-friendly graphical viewer that displays a motif pattern diagram linked to the resulting bootstrapped dendrogram for each protein group. Amino-acid-sequence-based and nucleotide-sequence-based phylogenetic trees for motif combination alignment, a logo comparison diagram for each clade in the tree, and a Pfam-domain pattern diagram are also available. We also developed a viewer named 'SALAD on ARRAYs' to view arbitrary microarray data sets of paralogous genes linked to the same dendrogram in a window. The SALAD database is a powerful tool for comparing protein sequences and can provide valuable hints for biological analysis.
DNA Sequencing apparatus

DOEpatents

Tabor, Stanley; Richardson, Charles C.

1992-01-01

An automated DNA sequencing apparatus having a reactor for providing at least two series of DNA products formed from a single primer and a DNA strand, each DNA product of a series differing in molecular weight and having a chain terminating agent at one end; separating means for separating the DNA products to form a series bands, the intensity of substantially all nearby bands in a different series being different, band reading means for determining the position an This invention was made with government support including a grant from the U.S. Public Health Service, contract number AI-06045. The U.S. government has certain rights in the invention.
FishTraits Database

USGS Publications Warehouse

Angermeier, Paul L.; Frimpong, Emmanuel A.

2009-01-01

The need for integrated and widely accessible sources of species traits data to facilitate studies of ecology, conservation, and management has motivated development of traits databases for various taxa. In spite of the increasing number of traits-based analyses of freshwater fishes in the United States, no consolidated database of traits of this group exists publicly, and much useful information on these species is documented only in obscure sources. The largely inaccessible and unconsolidated traits information makes large-scale analysis involving many fishes and/or traits particularly challenging. FishTraits is a database of >100 traits for 809 (731 native and 78 exotic) fish species found in freshwaters of the conterminous United States, including 37 native families and 145 native genera. The database contains information on four major categories of traits: (1) trophic ecology, (2) body size and reproductive ecology (life history), (3) habitat associations, and (4) salinity and temperature tolerances. Information on geographic distribution and conservation status is also included. Together, we refer to the traits, distribution, and conservation status information as attributes. Descriptions of attributes are available here. Many sources were consulted to compile attributes, including state and regional species accounts and other databases.
The construction of an EST database for Bombyx mori and its application

PubMed Central

Mita, Kazuei; Morimyo, Mitsuoki; Okano, Kazuhiro; Koike, Yoshiko; Nohata, Junko; Kawasaki, Hideki; Kadono-Okuda, Keiko; Yamamoto, Kimiko; Suzuki, Masataka G.; Shimada, Toru; Goldsmith, Marian R.; Maeda, Susumu

2003-01-01

To build a foundation for the complete genome analysis of Bombyx mori, we have constructed an EST database. Because gene expression patterns deeply depend on tissues as well as developmental stages, we analyzed many cDNA libraries prepared from various tissues and different developmental stages to cover the entire set of Bombyx genes. So far, the Bombyx EST database contains 35,000 ESTs from 36 cDNA libraries, which are grouped into ≈11,000 nonredundant ESTs with the average length of 1.25 kb. The comparison with FlyBase suggests that the present EST database, SilkBase, covers >55% of all genes of Bombyx. The fraction of library-specific ESTs in each cDNA library indicates that we have not yet reached saturation, showing the validity of our strategy for constructing an EST database to cover all genes. To tackle the coming saturation problem, we have checked two methods, subtraction and normalization, to increase coverage and decrease the number of housekeeping genes, resulting in a 5–11% increase of library-specific ESTs. The identification of a number of genes and comprehensive cloning of gene families have already emerged from the SilkBase search. Direct links of SilkBase with FlyBase and WormBase provide ready identification of candidate Lepidoptera-specific genes. PMID:14614147

FragIdent--automatic identification and characterisation of cDNA-fragments.

PubMed

Seelow, Dominik; Goehler, Heike; Hoffmann, Katrin

2009-03-02

Many genetic studies and functional assays are based on cDNA fragments. After the generation of cDNA fragments from an mRNA sample, their content is at first unknown and must be assigned by sequencing reactions or hybridisation experiments. Even in characterised libraries, a considerable number of clones are wrongly annotated. Furthermore, mix-ups can happen in the laboratory. It is therefore essential to the relevance of experimental results to confirm or determine the identity of the employed cDNA fragments. However, the manual approach for the characterisation of these fragments using BLAST web interfaces is not suited for larger number of sequences and so far, no user-friendly software is publicly available. Here we present the development of FragIdent, an application for the automatic identification of open reading frames (ORFs) within cDNA-fragments. The software performs BLAST analyses to identify the genes represented by the sequences and suggests primers to complete the sequencing of the whole insert. Gene-specific information as well as the protein domains encoded by the cDNA fragment are retrieved from Internet-based databases and included in the output. The application features an intuitive graphical interface and is designed for researchers without any bioinformatics skills. It is suited for projects comprising up to several hundred different clones. We used FragIdent to identify 84 cDNA clones from a yeast two-hybrid experiment. Furthermore, we identified 131 protein domains within our analysed clones. The source code is freely available from our homepage at http://compbio.charite.de/genetik/FragIdent/.
An open experimental database for exploring inorganic materials.

PubMed

Zakutayev, Andriy; Wunder, Nick; Schwarting, Marcus; Perkins, John D; White, Robert; Munch, Kristin; Tumas, William; Phillips, Caleb

2018-04-03

The use of advanced machine learning algorithms in experimental materials science is limited by the lack of sufficiently large and diverse datasets amenable to data mining. If publicly open, such data resources would also enable materials research by scientists without access to expensive experimental equipment. Here, we report on our progress towards a publicly open High Throughput Experimental Materials (HTEM) Database (htem.nrel.gov). This database currently contains 140,000 sample entries, characterized by structural (100,000), synthetic (80,000), chemical (70,000), and optoelectronic (50,000) properties of inorganic thin film materials, grouped in >4,000 sample entries across >100 materials systems; more than a half of these data are publicly available. This article shows how the HTEM database may enable scientists to explore materials by browsing web-based user interface and an application programming interface. This paper also describes a HTE approach to generating materials data, and discusses the laboratory information management system (LIMS), that underpin HTEM database. Finally, this manuscript illustrates how advanced machine learning algorithms can be adopted to materials science problems using this open data resource.
An open experimental database for exploring inorganic materials

PubMed Central

Zakutayev, Andriy; Wunder, Nick; Schwarting, Marcus; Perkins, John D.; White, Robert; Munch, Kristin; Tumas, William; Phillips, Caleb

2018-01-01

The use of advanced machine learning algorithms in experimental materials science is limited by the lack of sufficiently large and diverse datasets amenable to data mining. If publicly open, such data resources would also enable materials research by scientists without access to expensive experimental equipment. Here, we report on our progress towards a publicly open High Throughput Experimental Materials (HTEM) Database (htem.nrel.gov). This database currently contains 140,000 sample entries, characterized by structural (100,000), synthetic (80,000), chemical (70,000), and optoelectronic (50,000) properties of inorganic thin film materials, grouped in >4,000 sample entries across >100 materials systems; more than a half of these data are publicly available. This article shows how the HTEM database may enable scientists to explore materials by browsing web-based user interface and an application programming interface. This paper also describes a HTE approach to generating materials data, and discusses the laboratory information management system (LIMS), that underpin HTEM database. Finally, this manuscript illustrates how advanced machine learning algorithms can be adopted to materials science problems using this open data resource. PMID:29611842
Navigating from Publications to Astronomical Databases

NASA Astrophysics Data System (ADS)

Ochsenbein, François; Bertout, Claude; Lequeux, James; Genova, Françoise

The implementation of journals on the Web has opened new possibilities for the scientific usage of published results because it is now possible to link published articles to other types of information. The availability of published information in electronic form also allows for new types of content validation, complementary to the referee's validation, and to the layout performed by the publisher. For several years now, authors publishing in A&A are offered the possibility of quoting the astronomical objects they are studying directly in their latex manuscript (via the object macro). Since April 2001, this macro is being translated into an actual link from the article to the SIMBAD database. This experiment is still a prototype, and its various aspects are presented here.
Second-Tier Database for Ecosystem Focus, 2002-2003 Annual Report.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Van Holmes, Chris; Muongchanh, Christine; Anderson, James J.

2003-11-01

The Second-Tier Database for Ecosystem Focus (Contract 00004124) provides direct and timely public access to Columbia Basin environmental, operational, fishery and riverine data resources for federal, state, public and private entities. The Second-Tier Database known as Data Access in Realtime (DART) integrates public data for effective access, consideration and application. DART also provides analysis tools and performance measures helpful in evaluating the condition of Columbia Basin salmonid stocks.
The DNA Data Bank of Japan launches a new resource, the DDBJ Omics Archive of functional genomics experiments.

PubMed

Kodama, Yuichi; Mashima, Jun; Kaminuma, Eli; Gojobori, Takashi; Ogasawara, Osamu; Takagi, Toshihisa; Okubo, Kousaku; Nakamura, Yasukazu

2012-01-01

The DNA Data Bank of Japan (DDBJ; http://www.ddbj.nig.ac.jp) maintains and provides archival, retrieval and analytical resources for biological information. The central DDBJ resource consists of public, open-access nucleotide sequence databases including raw sequence reads, assembly information and functional annotation. Database content is exchanged with EBI and NCBI within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). In 2011, DDBJ launched two new resources: the 'DDBJ Omics Archive' (DOR; http://trace.ddbj.nig.ac.jp/dor) and BioProject (http://trace.ddbj.nig.ac.jp/bioproject). DOR is an archival database of functional genomics data generated by microarray and highly parallel new generation sequencers. Data are exchanged between the ArrayExpress at EBI and DOR in the common MAGE-TAB format. BioProject provides an organizational framework to access metadata about research projects and the data from the projects that are deposited into different databases. In this article, we describe major changes and improvements introduced to the DDBJ services, and the launch of two new resources: DOR and BioProject.
Who owns what? Private ownership and the public interest in recombinant DNA technology in the 1970s.

PubMed

Yi, Doogab

2011-09-01

This essay analyzes how academic institutions, government agencies, and the nascent biotech industry contested the legal ownership of recombinant DNA technology in the name of the public interest. It reconstructs the way a small but influential group of government officials and university research administrators introduced a new framework for the commercialization of academic research in the context of a national debate over scientific research's contributions to American economic prosperity and public health. They claimed that private ownership of inventions arising from public support would provide a powerful means to liberate biomedical discoveries for public benefit. This articulation of the causal link between private ownership and the public interest, it is argued, justified a new set of expectations about the use of research results arising from government or public support, in which commercialization became a new public obligation for academic researchers. By highlighting the broader economic and legal shifts that prompted the reconfiguration of the ownership of public knowledge in late twentieth-century American capitalism, the essay examines the threads of policy-informed legal ideas that came together to affirm private ownership of biomedical knowledge as germane to the public interest in the coming of age of biotechnology and genetic medicine.
Criminal genomic pragmatism: prisoners' representations of DNA technology and biosecurity.

PubMed

Machado, Helena; Silva, Susana

2012-01-01

Within the context of the use of DNA technology in crime investigation, biosecurity is perceived by different stakeholders according to their particular rationalities and interests. Very little is known about prisoners' perceptions and assessments of the uses of DNA technology in solving crime. To propose a conceptual model that serves to analyse and interpret prisoners' representations of DNA technology and biosecurity. A qualitative study using an interpretative approach based on 31 semi-structured tape-recorded interviews was carried out between May and September 2009, involving male inmates in three prisons located in the north of Portugal. The content analysis focused on the following topics: the meanings attributed to DNA and assessments of the risks and benefits of the uses of DNA technology and databasing in forensic applications. DNA was described as a record of identity, an exceptional material, and a powerful biometric identifier. The interviewees believed that DNA can be planted to incriminate suspects. Convicted offenders argued for the need to extend the criteria for the inclusion of DNA profiles in forensic databases and to restrict the removal of profiles. The conceptual model entitled criminal genomic pragmatism allows for an understanding of the views of prison inmates regarding DNA technology and biosecurity.
mtDNA sequence diversity of Hazara ethnic group from Pakistan.

PubMed

Rakha, Allah; Fatima; Peng, Min-Sheng; Adan, Atif; Bi, Rui; Yasmin, Memona; Yao, Yong-Gang

2017-09-01

The present study was undertaken to investigate mitochondrial DNA (mtDNA) control region sequences of Hazaras from Pakistan, so as to generate mtDNA reference database for forensic casework in Pakistan and to analyze phylogenetic relationship of this particular ethnic group with geographically proximal populations. Complete mtDNA control region (nt 16024-576) sequences were generated through Sanger Sequencing for 319 Hazara individuals from Quetta, Baluchistan. The population sample set showed a total of 189 distinct haplotypes, belonging mainly to West Eurasian (51.72%), East & Southeast Asian (29.78%) and South Asian (18.50%) haplogroups. Compared with other populations from Pakistan, the Hazara population had a relatively high haplotype diversity (0.9945) and a lower random match probability (0.0085). The dataset has been incorporated into EMPOP database under accession number EMP00680. The data herein comprises the largest, and likely most thoroughly examined, control region mtDNA dataset from Hazaras of Pakistan. Copyright © 2017 Elsevier B.V. All rights reserved.
IPD—the Immuno Polymorphism Database

PubMed Central

Robinson, James; Halliwell, Jason A.; McWilliam, Hamish; Lopez, Rodrigo; Marsh, Steven G. E.

2013-01-01

The Immuno Polymorphism Database (IPD), http://www.ebi.ac.uk/ipd/ is a set of specialist databases related to the study of polymorphic genes in the immune system. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. IPD currently consists of four databases: IPD-KIR, contains the allelic sequences of killer-cell immunoglobulin-like receptors, IPD-MHC, a database of sequences of the major histocompatibility complex of different species; IPD-HPA, alloantigens expressed only on platelets; and IPD-ESTDAB, which provides access to the European Searchable Tumour Cell-Line Database, a cell bank of immunologically characterized melanoma cell lines. The data is currently available online from the website and FTP directory. This article describes the latest updates and additional tools added to the IPD project. PMID:23180793
Village Green Project: Web-accessible Database

EPA Science Inventory

The purpose of this web-accessible database is for the public to be able to view instantaneous readings from a solar-powered air monitoring station located in a public location (prototype pilot test is outside of a library in Durham County, NC). The data are wirelessly transmitte...
Northern Forest Futures reporting tools and database guide

Treesearch

Patrick D. Miles; Robert J. Huggett; W. Keith Moser

2015-01-01

The Northern Forest Futures database (NFFDB) supports the reporting of both current and projected future forest conditions for the 20 states that make up the U.S. North, an area bounded by Maine, Maryland, Missouri, and Minnesota. The NFFDB database and attendant reporting tools are available to the public as a Microsoft AccessTM database. The...
Comprehensive T-Matrix Reference Database: A 2007-2009 Update

NASA Technical Reports Server (NTRS)

Mishchenko, Michael I.; Zakharova, Nadia T.; Videen, Gorden; Khlebtsov, Nikolai G.; Wriedt, Thomas

2010-01-01

The T-matrix method is among the most versatile, efficient, and widely used theoretical techniques for the numerically exact computation of electromagnetic scattering by homogeneous and composite particles, clusters of particles, discrete random media, and particles in the vicinity of an interface separating two half-spaces with different refractive indices. This paper presents an update to the comprehensive database of T-matrix publications compiled by us previously and includes the publications that appeared since 2007. It also lists several earlier publications not included in the original database.
Comprehensive T-matrix Reference Database: A 2009-2011 Update

NASA Technical Reports Server (NTRS)

Zakharova, Nadezhda T.; Videen, G.; Khlebtsov, Nikolai G.

2012-01-01

The T-matrix method is one of the most versatile and efficient theoretical techniques widely used for the computation of electromagnetic scattering by single and composite particles, discrete random media, and particles in the vicinity of an interface separating two half-spaces with different refractive indices. This paper presents an update to the comprehensive database of peer-reviewed T-matrix publications compiled by us previously and includes the publications that appeared since 2009. It also lists several earlier publications not included in the original database.
DNA fingerprinting, DNA barcoding, and next generation sequencing technology in plants.

PubMed

Sucher, Nikolaus J; Hennell, James R; Carles, Maria C

2012-01-01

DNA fingerprinting of plants has become an invaluable tool in forensic, scientific, and industrial laboratories all over the world. PCR has become part of virtually every variation of the plethora of approaches used for DNA fingerprinting today. DNA sequencing is increasingly used either in combination with or as a replacement for traditional DNA fingerprinting techniques. A prime example is the use of short, standardized regions of the genome as taxon barcodes for biological identification of plants. Rapid advances in "next generation sequencing" (NGS) technology are driving down the cost of sequencing and bringing large-scale sequencing projects into the reach of individual investigators. We present an overview of recent publications that demonstrate the use of "NGS" technology for DNA fingerprinting and DNA barcoding applications.
The PMDB Protein Model Database

PubMed Central

Castrignanò, Tiziana; De Meo, Paolo D'Onorio; Cozzetto, Domenico; Talamo, Ivano Giuseppe; Tramontano, Anna

2006-01-01

The Protein Model Database (PMDB) is a public resource aimed at storing manually built 3D models of proteins. The database is designed to provide access to models published in the scientific literature, together with validating experimental data. It is a relational database and it currently contains >74 000 models for ∼240 proteins. The system is accessible at and allows predictors to submit models along with related supporting evidence and users to download them through a simple and intuitive interface. Users can navigate in the database and retrieve models referring to the same target protein or to different regions of the same protein. Each model is assigned a unique identifier that allows interested users to directly access the data. PMID:16381873
Evaluation and Utilization as a Public Health Tool of a National Molecular Epidemiological Tuberculosis Outbreak Database within the United Kingdom from 1997 to 2001

PubMed Central

Drobniewski, F. A.; Gibson, A.; Ruddy, M.; Yates, M. D.

2003-01-01

The aim of this study was to develop a national model and analyze the value of a molecular epidemiological Mycobacterium tuberculosis DNA fingerprint-outbreak database. Incidents were investigated by the United Kingdom PHLS Mycobacterium Reference Unit (MRU) from June 1997 to December 2001, inclusive. A total of 124 incidents involving 972 tuberculosis cases, including 520 patient cultures from referred incidents and 452 patient cultures related to two population studies, were examined by using restriction fragment length polymorphism IS6110 fingerprinting and rapid epidemiological typing. Investigations were divided into the following three categories, reflecting different operational strategies: retrospective passive analysis, retrospective active analysis, and retrospective prospective analysis. The majority of incidents were in the retrospective passive analysis category, i.e., the individual submitting isolates has a suspicion they may be linked. Outbreaks were examined in schools, hospitals, farms, prisons, and public houses, and laboratory cross-contamination events and unusual clinical presentations were investigated. Retrospective active analysis involved a major outbreak centered on a high school. Contact tracing of a teenager with smear-positive pulmonary tuberculosis matched 14 individuals, including members of his class, and another 60 cases were identified in schools clinically and radiologically and by skin testing. Retrospective prospective analysis involved an outbreak of 94 isoniazid-resistant tuberculosis cases in London, United Kingdom, that began after cases were identified at one hospital in January 2000. Contact tracing and comparison with MRU databases indicated that the earliest matched case had occurred in 1995. Subsequently, the MRU changed to an active prospective analysis targeting linked isoniazid-monoresistant isolates for follow up. The patients were multiethnic, born mainly in the United Kingdom, and included professionals
Evaluation and utilization as a public health tool of a national molecular epidemiological tuberculosis outbreak database within the United Kingdom from 1997 to 2001.

PubMed

Drobniewski, F A; Gibson, A; Ruddy, M; Yates, M D

2003-05-01

The aim of this study was to develop a national model and analyze the value of a molecular epidemiological Mycobacterium tuberculosis DNA fingerprint-outbreak database. Incidents were investigated by the United Kingdom PHLS Mycobacterium Reference Unit (MRU) from June 1997 to December 2001, inclusive. A total of 124 incidents involving 972 tuberculosis cases, including 520 patient cultures from referred incidents and 452 patient cultures related to two population studies, were examined by using restriction fragment length polymorphism IS6110 fingerprinting and rapid epidemiological typing. Investigations were divided into the following three categories, reflecting different operational strategies: retrospective passive analysis, retrospective active analysis, and retrospective prospective analysis. The majority of incidents were in the retrospective passive analysis category, i.e., the individual submitting isolates has a suspicion they may be linked. Outbreaks were examined in schools, hospitals, farms, prisons, and public houses, and laboratory cross-contamination events and unusual clinical presentations were investigated. Retrospective active analysis involved a major outbreak centered on a high school. Contact tracing of a teenager with smear-positive pulmonary tuberculosis matched 14 individuals, including members of his class, and another 60 cases were identified in schools clinically and radiologically and by skin testing. Retrospective prospective analysis involved an outbreak of 94 isoniazid-resistant tuberculosis cases in London, United Kingdom, that began after cases were identified at one hospital in January 2000. Contact tracing and comparison with MRU databases indicated that the earliest matched case had occurred in 1995. Subsequently, the MRU changed to an active prospective analysis targeting linked isoniazid-monoresistant isolates for follow up. The patients were multiethnic, born mainly in the United Kingdom, and included professionals
CoryneRegNet 4.0 – A reference database for corynebacterial gene regulatory networks

PubMed Central

Baumbach, Jan

2007-01-01

Background Detailed information on DNA-binding transcription factors (the key players in the regulation of gene expression) and on transcriptional regulatory interactions of microorganisms deduced from literature-derived knowledge, computer predictions and global DNA microarray hybridization experiments, has opened the way for the genome-wide analysis of transcriptional regulatory networks. The large-scale reconstruction of these networks allows the in silico analysis of cell behavior in response to changing environmental conditions. We previously published CoryneRegNet, an ontology-based data warehouse of corynebacterial transcription factors and regulatory networks. Initially, it was designed to provide methods for the analysis and visualization of the gene regulatory network of Corynebacterium glutamicum. Results Now we introduce CoryneRegNet release 4.0, which integrates data on the gene regulatory networks of 4 corynebacteria, 2 mycobacteria and the model organism Escherichia coli K12. As the previous versions, CoryneRegNet provides a web-based user interface to access the database content, to allow various queries, and to support the reconstruction, analysis and visualization of regulatory networks at different hierarchical levels. In this article, we present the further improved database content of CoryneRegNet along with novel analysis features. The network visualization feature GraphVis now allows the inter-species comparisons of reconstructed gene regulatory networks and the projection of gene expression levels onto that networks. Therefore, we added stimulon data directly into the database, but also provide Web Service access to the DNA microarray analysis platform EMMA. Additionally, CoryneRegNet now provides a SOAP based Web Service server, which can easily be consumed by other bioinformatics software systems. Stimulons (imported from the database, or uploaded by the user) can be analyzed in the context of known transcriptional regulatory networks to
“NaKnowBase”: A Nanomaterials Relational Database

EPA Science Inventory

NaKnowBase is an internal relational database populated with data from peer-reviewed ORD nanomaterials research publications. The database focuses on papers describing the actions of nanomaterials in environmental or biological media including their interactions, transformations...

Database systems for knowledge-based discovery.

PubMed

Jagarlapudi, Sarma A R P; Kishan, K V Radha

2009-01-01

Several database systems have been developed to provide valuable information from the bench chemist to biologist, medical practitioner to pharmaceutical scientist in a structured format. The advent of information technology and computational power enhanced the ability to access large volumes of data in the form of a database where one could do compilation, searching, archiving, analysis, and finally knowledge derivation. Although, data are of variable types the tools used for database creation, searching and retrieval are similar. GVK BIO has been developing databases from publicly available scientific literature in specific areas like medicinal chemistry, clinical research, and mechanism-based toxicity so that the structured databases containing vast data could be used in several areas of research. These databases were classified as reference centric or compound centric depending on the way the database systems were designed. Integration of these databases with knowledge derivation tools would enhance the value of these systems toward better drug design and discovery.
The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A Completed Reference Database of Lung Nodules on CT Scans

DOE Office of Scientific and Technical Information (OSTI.GOV)

NONE

2011-02-15

Purpose: The development of computer-aided diagnostic (CAD) methods for lung nodule detection, classification, and quantitative assessment can be facilitated through a well-characterized repository of computed tomography (CT) scans. The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI) completed such a database, establishing a publicly available reference for the medical imaging research community. Initiated by the National Cancer Institute (NCI), further advanced by the Foundation for the National Institutes of Health (FNIH), and accompanied by the Food and Drug Administration (FDA) through active participation, this public-private partnership demonstrates the success of a consortium founded on a consensus-based process.more » Methods: Seven academic centers and eight medical imaging companies collaborated to identify, address, and resolve challenging organizational, technical, and clinical issues to provide a solid foundation for a robust database. The LIDC/IDRI Database contains 1018 cases, each of which includes images from a clinical thoracic CT scan and an associated XML file that records the results of a two-phase image annotation process performed by four experienced thoracic radiologists. In the initial blinded-read phase, each radiologist independently reviewed each CT scan and marked lesions belonging to one of three categories (''nodule{>=}3 mm,''''nodule<3 mm,'' and ''non-nodule{>=}3 mm''). In the subsequent unblinded-read phase, each radiologist independently reviewed their own marks along with the anonymized marks of the three other radiologists to render a final opinion. The goal of this process was to identify as completely as possible all lung nodules in each CT scan without requiring forced consensus. Results: The Database contains 7371 lesions marked ''nodule'' by at least one radiologist. 2669 of these lesions were marked ''nodule{>=}3 mm'' by at least one radiologist, of which 928 (34.7%) received such marks
ECOTOX database; new additions and future direction

EPA Science Inventory

The ECOTOXicology database (ECOTOX) is a comprehensive, publicly available knowledgebase developed and maintained by ORD/NHEERL. It is used for environmental toxicity data on aquatic life, terrestrial plants and wildlife. Publications are identified for potential applicability af...
Advancements in web-database applications for rabies surveillance.

PubMed

Rees, Erin E; Gendron, Bruno; Lelièvre, Frédérick; Coté, Nathalie; Bélanger, Denise

2011-08-02

Protection of public health from rabies is informed by the analysis of surveillance data from human and animal populations. In Canada, public health, agricultural and wildlife agencies at the provincial and federal level are responsible for rabies disease control, and this has led to multiple agency-specific data repositories. Aggregation of agency-specific data into one database application would enable more comprehensive data analyses and effective communication among participating agencies. In Québec, RageDB was developed to house surveillance data for the raccoon rabies variant, representing the next generation in web-based database applications that provide a key resource for the protection of public health. RageDB incorporates data from, and grants access to, all agencies responsible for the surveillance of raccoon rabies in Québec. Technological advancements of RageDB to rabies surveillance databases include (1) automatic integration of multi-agency data and diagnostic results on a daily basis; (2) a web-based data editing interface that enables authorized users to add, edit and extract data; and (3) an interactive dashboard to help visualize data simply and efficiently, in table, chart, and cartographic formats. Furthermore, RageDB stores data from citizens who voluntarily report sightings of rabies suspect animals. We also discuss how sightings data can indicate public perception to the risk of racoon rabies and thus aid in directing the allocation of disease control resources for protecting public health. RageDB provides an example in the evolution of spatio-temporal database applications for the storage, analysis and communication of disease surveillance data. The database was fast and inexpensive to develop by using open-source technologies, simple and efficient design strategies, and shared web hosting. The database increases communication among agencies collaborating to protect human health from raccoon rabies. Furthermore, health agencies have real
Advancements in web-database applications for rabies surveillance

PubMed Central

2011-01-01

Background Protection of public health from rabies is informed by the analysis of surveillance data from human and animal populations. In Canada, public health, agricultural and wildlife agencies at the provincial and federal level are responsible for rabies disease control, and this has led to multiple agency-specific data repositories. Aggregation of agency-specific data into one database application would enable more comprehensive data analyses and effective communication among participating agencies. In Québec, RageDB was developed to house surveillance data for the raccoon rabies variant, representing the next generation in web-based database applications that provide a key resource for the protection of public health. Results RageDB incorporates data from, and grants access to, all agencies responsible for the surveillance of raccoon rabies in Québec. Technological advancements of RageDB to rabies surveillance databases include 1) automatic integration of multi-agency data and diagnostic results on a daily basis; 2) a web-based data editing interface that enables authorized users to add, edit and extract data; and 3) an interactive dashboard to help visualize data simply and efficiently, in table, chart, and cartographic formats. Furthermore, RageDB stores data from citizens who voluntarily report sightings of rabies suspect animals. We also discuss how sightings data can indicate public perception to the risk of racoon rabies and thus aid in directing the allocation of disease control resources for protecting public health. Conclusions RageDB provides an example in the evolution of spatio-temporal database applications for the storage, analysis and communication of disease surveillance data. The database was fast and inexpensive to develop by using open-source technologies, simple and efficient design strategies, and shared web hosting. The database increases communication among agencies collaborating to protect human health from raccoon rabies
“NaKnowBase”: A Nanomaterials Relational Database

EPA Science Inventory

NaKnowBase is a relational database populated with data from peer-reviewed ORD nanomaterials research publications. The database focuses on papers describing the actions of nanomaterials in environmental or biological media including their interactions, transformations and poten...
32 CFR 338.1 - Ordering DNA issuances.

Code of Federal Regulations, 2010 CFR

2010-07-01

... 32 National Defense 2 2010-07-01 2010-07-01 false Ordering DNA issuances. 338.1 Section 338.1... DOD INFORMATION AVAILABILITY TO THE PUBLIC OF DEFENSE NUCLEAR AGENCY (DNA) INSTRUCTIONS AND CHANGES THERETO § 338.1 Ordering DNA issuances. (a) The DNA issuances published in the DNA indexes are published...
32 CFR 338.1 - Ordering DNA issuances.

Code of Federal Regulations, 2011 CFR

2011-07-01

... 32 National Defense 2 2011-07-01 2011-07-01 false Ordering DNA issuances. 338.1 Section 338.1... DOD INFORMATION AVAILABILITY TO THE PUBLIC OF DEFENSE NUCLEAR AGENCY (DNA) INSTRUCTIONS AND CHANGES THERETO § 338.1 Ordering DNA issuances. (a) The DNA issuances published in the DNA indexes are published...
32 CFR 338.1 - Ordering DNA issuances.

Code of Federal Regulations, 2014 CFR

2014-07-01

... 32 National Defense 2 2014-07-01 2014-07-01 false Ordering DNA issuances. 338.1 Section 338.1... DOD INFORMATION AVAILABILITY TO THE PUBLIC OF DEFENSE NUCLEAR AGENCY (DNA) INSTRUCTIONS AND CHANGES THERETO § 338.1 Ordering DNA issuances. (a) The DNA issuances published in the DNA indexes are published...
32 CFR 338.1 - Ordering DNA issuances.

Code of Federal Regulations, 2012 CFR

2012-07-01

... 32 National Defense 2 2012-07-01 2012-07-01 false Ordering DNA issuances. 338.1 Section 338.1... DOD INFORMATION AVAILABILITY TO THE PUBLIC OF DEFENSE NUCLEAR AGENCY (DNA) INSTRUCTIONS AND CHANGES THERETO § 338.1 Ordering DNA issuances. (a) The DNA issuances published in the DNA indexes are published...
32 CFR 338.1 - Ordering DNA issuances.

Code of Federal Regulations, 2013 CFR

2013-07-01

... 32 National Defense 2 2013-07-01 2013-07-01 false Ordering DNA issuances. 338.1 Section 338.1... DOD INFORMATION AVAILABILITY TO THE PUBLIC OF DEFENSE NUCLEAR AGENCY (DNA) INSTRUCTIONS AND CHANGES THERETO § 338.1 Ordering DNA issuances. (a) The DNA issuances published in the DNA indexes are published...
PoMaMo--a comprehensive database for potato genome data.

PubMed

Meyer, Svenja; Nagel, Axel; Gebhardt, Christiane

2005-01-01

A database for potato genome data (PoMaMo, Potato Maps and More) was established. The database contains molecular maps of all twelve potato chromosomes with about 1000 mapped elements, sequence data, putative gene functions, results from BLAST analysis, SNP and InDel information from different diploid and tetraploid potato genotypes, publication references, links to other public databases like GenBank (http://www.ncbi.nlm.nih.gov/) or SGN (Solanaceae Genomics Network, http://www.sgn.cornell.edu/), etc. Flexible search and data visualization interfaces enable easy access to the data via internet (https://gabi.rzpd.de/PoMaMo.html). The Java servlet tool YAMB (Yet Another Map Browser) was designed to interactively display chromosomal maps. Maps can be zoomed in and out, and detailed information about mapped elements can be obtained by clicking on an element of interest. The GreenCards interface allows a text-based data search by marker-, sequence- or genotype name, by sequence accession number, gene function, BLAST Hit or publication reference. The PoMaMo database is a comprehensive database for different potato genome data, and to date the only database containing SNP and InDel data from diploid and tetraploid potato genotypes.
1-CMDb: A Curated Database of Genomic Variations of the One-Carbon Metabolism Pathway.

PubMed

Bhat, Manoj K; Gadekar, Veerendra P; Jain, Aditya; Paul, Bobby; Rai, Padmalatha S; Satyamoorthy, Kapaettu

2017-01-01

The one-carbon metabolism pathway is vital in maintaining tissue homeostasis by driving the critical reactions of folate and methionine cycles. A myriad of genetic and epigenetic events mark the rate of reactions in a tissue-specific manner. Integration of these to predict and provide personalized health management requires robust computational tools that can process multiomics data. The DNA sequences that may determine the chain of biological events and the endpoint reactions within one-carbon metabolism genes remain to be comprehensively recorded. Hence, we designed the one-carbon metabolism database (1-CMDb) as a platform to interrogate its association with a host of human disorders. DNA sequence and network information of a total of 48 genes were extracted from a literature survey and KEGG pathway that are involved in the one-carbon folate-mediated pathway. The information generated, collected, and compiled for all these genes from the UCSC genome browser included the single nucleotide polymorphisms (SNPs), CpGs, copy number variations (CNVs), and miRNAs, and a comprehensive database was created. Furthermore, a significant correlation analysis was performed for SNPs in the pathway genes. Detailed data of SNPs, CNVs, CpG islands, and miRNAs for 48 folate pathway genes were compiled. The SNPs in CNVs (9670), CpGs (984), and miRNAs (14) were also compiled for all pathway genes. The SIFT score, the prediction and PolyPhen score, as well as the prediction for each of the SNPs were tabulated and represented for folate pathway genes. Also included in the database for folate pathway genes were the links to 124 various phenotypes and disease associations as reported in the literature and from publicly available information. A comprehensive database was generated consisting of genomic elements within and among SNPs, CNVs, CpGs, and miRNAs of one-carbon metabolism pathways to facilitate (a) single source of information and (b) integration into large-genome scale network
Gene expression analysis by cDNA-AFLP highlights a set of new signaling networks and translational control during seed dormancy breaking in Nicotiana plumbaginifolia.

PubMed

Bove, Jérôme; Lucas, Philippe; Godin, Béatrice; Ogé, Laurent; Jullien, Marc; Grappin, Philippe

2005-03-01

Seed dormancy in Nicotiana plumbaginifolia is characterized by an abscisic acid accumulation linked to a pronounced germination delay. Dormancy can be released by 1 year after-ripening treatment. Using a cDNA-amplified fragment length polymorphism (cDNA-AFLP) approach we compared the gene expression patterns of dormant and after-ripened seeds, air-dry or during one day imbibition and analyzed 15,000 cDNA fragments. Among them 1020 were found to be differentially regulated by dormancy. Of 412 sequenced cDNA fragments, 83 were assigned to a known function by search similarities to public databases. The functional categories of the identified dormancy maintenance and breaking responsive genes, give evidence that after-ripening turns in the air-dry seed to a new developmental program that modulates, at the RNA level, components of translational control, signaling networks, transcriptional control and regulated proteolysis.
NASA scientific and technical publications: A catalog of special publications, reference publications, conference publications, and technical papers, 1989

NASA Technical Reports Server (NTRS)

1990-01-01

This catalog lists 190 citations of all NASA Special Publications, NASA Reference Publications, NASA Conference Publications, and NASA Technical Papers that were entered into the NASA scientific and technical information database during accession year 1989. The entries are grouped by subject category. Indexes of subject terms, personal authors, and NASA report numbers are provided.
NASA scientific and technical publications: A catalog of Special Publications, Reference Publications, Conference Publications, and Technical Papers, 1987

NASA Technical Reports Server (NTRS)

1988-01-01

This catalog lists 239 citations of all NASA Special Publications, NASA Reference Publications, NASA Conference Publications, and NASA Technical Papers that were entered in the NASA scientific and technical information database during accession year 1987. The entries are grouped by subject category. Indexes of subject terms, personal authors, and NASA report numbers are provided.
New Zealand's National Landslide Database

NASA Astrophysics Data System (ADS)

Rosser, B.; Dellow, S.; Haubrook, S.; Glassey, P.

2016-12-01

Since 1780, landslides have caused an average of about 3 deaths a year in New Zealand and have cost the economy an average of at least NZ$250M/a (0.1% GDP). To understand the risk posed by landslide hazards to society, a thorough knowledge of where, when and why different types of landslides occur is vital. The main objective for establishing the database was to provide a centralised national-scale, publically available database to collate landslide information that could be used for landslide hazard and risk assessment. Design of a national landslide database for New Zealand required consideration of both existing landslide data stored in a variety of digital formats, and future data, yet to be collected. Pre-existing databases were developed and populated with data reflecting the needs of the landslide or hazard project, and the database structures of the time. Bringing these data into a single unified database required a new structure capable of storing and delivering data at a variety of scales and accuracy and with different attributes. A "unified data model" was developed to enable the database to hold old and new landslide data irrespective of scale and method of capture. The database contains information on landslide locations and where available: 1) the timing of landslides and the events that may have triggered them; 2) the type of landslide movement; 3) the volume and area; 4) the source and debris tail; and 5) the impacts caused by the landslide. Information from a variety of sources including aerial photographs (and other remotely sensed data), field reconnaissance and media accounts has been collated and is presented for each landslide along with metadata describing the data sources and quality. There are currently nearly 19,000 landslide records in the database that include point locations, polygons of landslide source and deposit areas, and linear features. Several large datasets are awaiting upload which will bring the total number of landslides to
ARTI refrigerant database

DOE Office of Scientific and Technical Information (OSTI.GOV)

Calm, J.M.

1998-03-15

The Refrigerant Database is an information system on alternative refrigerants, associated lubricants, and their use in air conditioning and refrigeration. It consolidates and facilitates access to thermophysical properties, compatibility, environmental, safety, application and other information. It provides corresponding information on older refrigerants, to assist manufacturers and those using alternative refrigerants, to make comparisons and determine differences. The underlying purpose is to accelerate phase out of chemical compounds of environmental concern. The database provides bibliographic citations and abstracts for publications that may be useful in research and design of air conditioning and refrigeration equipment. It also references documents addressing compatibility ofmore » refrigerants and lubricants with other materials.« less
"First generation" automated DNA sequencing technology.

PubMed

Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M

2011-10-01

Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines. © 2011 by John Wiley & Sons, Inc.
The CHARA Array Database

NASA Astrophysics Data System (ADS)

Jones, Jeremy; Schaefer, Gail; ten Brummelaar, Theo; Gies, Douglas; Farrington, Christopher

2018-01-01

We are building a searchable database for the CHARA Array data archive. The Array consists of six telescopes linked together as an interferometer, providing sub-milliarcsecond resolution in the optical and near-infrared. The Array enables a variety of scientific studies, including measuring stellar angular diameters, imaging stellar shapes and surface features, mapping the orbits of close binary companions, and resolving circumstellar environments. This database is one component of an NSF/MSIP funded program to provide open access to the CHARA Array to the broader astronomical community. This archive goes back to 2004 and covers all the beam combiners on the Array. We discuss the current status of and future plans for the public database, and give directions on how to access it.

RPG: the Ribosomal Protein Gene database

PubMed Central

Nakao, Akihiro; Yoshihama, Maki; Kenmochi, Naoya

2004-01-01

RPG (http://ribosome.miyazaki-med.ac.jp/) is a new database that provides detailed information about ribosomal protein (RP) genes. It contains data from humans and other organisms, including Drosophila melanogaster, Caenorhabditis elegans, Saccharo myces cerevisiae, Methanococcus jannaschii and Escherichia coli. Users can search the database by gene name and organism. Each record includes sequences (genomic, cDNA and amino acid sequences), intron/exon structures, genomic locations and information about orthologs. In addition, users can view and compare the gene structures of the above organisms and make multiple amino acid sequence alignments. RPG also provides information on small nucleolar RNAs (snoRNAs) that are encoded in the introns of RP genes. PMID:14681386
Ancient and modern environmental DNA

PubMed Central

Pedersen, Mikkel Winther; Overballe-Petersen, Søren; Ermini, Luca; Sarkissian, Clio Der; Haile, James; Hellstrom, Micaela; Spens, Johan; Thomsen, Philip Francis; Bohmann, Kristine; Cappellini, Enrico; Schnell, Ida Bærholm; Wales, Nathan A.; Carøe, Christian; Campos, Paula F.; Schmidt, Astrid M. Z.; Gilbert, M. Thomas P.; Hansen, Anders J.; Orlando, Ludovic; Willerslev, Eske

2015-01-01

DNA obtained from environmental samples such as sediments, ice or water (environmental DNA, eDNA), represents an important source of information on past and present biodiversity. It has revealed an ancient forest in Greenland, extended by several thousand years the survival dates for mainland woolly mammoth in Alaska, and pushed back the dates for spruce survival in Scandinavian ice-free refugia during the last glaciation. More recently, eDNA was used to uncover the past 50 000 years of vegetation history in the Arctic, revealing massive vegetation turnover at the Pleistocene/Holocene transition, with implications for the extinction of megafauna. Furthermore, eDNA can reflect the biodiversity of extant flora and fauna, both qualitatively and quantitatively, allowing detection of rare species. As such, trace studies of plant and vertebrate DNA in the environment have revolutionized our knowledge of biogeography. However, the approach remains marred by biases related to DNA behaviour in environmental settings, incomplete reference databases and false positive results due to contamination. We provide a review of the field. PMID:25487334
The National DNA Data Bank of Canada: a Quebecer perspective.

PubMed

Milot, Emmanuel; Lecomte, Marie M J; Germain, Hugo; Crispino, Frank

2013-11-20

The Canadian National DNA Database was created in 1998 and first used in the mid-2000. Under management by the RCMP, the National DNA Data Bank of Canada offers each year satisfactory reported statistics for its use and efficiency. Built on two indexes (convicted offenders and crime scene indexes), the database not only provides increasing matches to offenders or linked traces to the various police forces of the nation, but offers a memory repository for cold cases. Despite these achievements, the data bank is now facing new challenges that will inevitably defy the way the database is currently used. These arise from the increasing power of detection of DNA traces, the diversity of demands from police investigators and the growth of the bank itself. Examples of new requirements from the database now include familial searches, low-copy-number analyses and the correct interpretation of mixed samples. This paper aims to develop on the original way set in Québec to address some of these challenges. Nevertheless, analytic and technological advances will inevitably lead to the introduction of new technologies in forensic laboratories, such as single cell sequencing, phenotyping, and proteomics. Furthermore, it will not only request a new holistic/global approach of the forensic molecular biology sciences (through academia and a more investigative role in the laboratory), but also new legal developments. Far from being exhaustive, this paper highlights some of the current use of the database, its potential for the future, and opportunity to expand as a result of recent technological developments in molecular biology, including, but not limited to DNA identification.
The National DNA Data Bank of Canada: a Quebecer perspective

PubMed Central

Milot, Emmanuel; Lecomte, Marie M. J.; Germain, Hugo; Crispino, Frank

2013-01-01

The Canadian National DNA Database was created in 1998 and first used in the mid-2000. Under management by the RCMP, the National DNA Data Bank of Canada offers each year satisfactory reported statistics for its use and efficiency. Built on two indexes (convicted offenders and crime scene indexes), the database not only provides increasing matches to offenders or linked traces to the various police forces of the nation, but offers a memory repository for cold cases. Despite these achievements, the data bank is now facing new challenges that will inevitably defy the way the database is currently used. These arise from the increasing power of detection of DNA traces, the diversity of demands from police investigators and the growth of the bank itself. Examples of new requirements from the database now include familial searches, low-copy-number analyses and the correct interpretation of mixed samples. This paper aims to develop on the original way set in Québec to address some of these challenges. Nevertheless, analytic and technological advances will inevitably lead to the introduction of new technologies in forensic laboratories, such as single cell sequencing, phenotyping, and proteomics. Furthermore, it will not only request a new holistic/global approach of the forensic molecular biology sciences (through academia and a more investigative role in the laboratory), but also new legal developments. Far from being exhaustive, this paper highlights some of the current use of the database, its potential for the future, and opportunity to expand as a result of recent technological developments in molecular biology, including, but not limited to DNA identification. PMID:24312124
Ensembl core software resources: storage and programmatic access for DNA sequence and genome annotation.

PubMed

Ruffier, Magali; Kähäri, Andreas; Komorowska, Monika; Keenan, Stephen; Laird, Matthew; Longden, Ian; Proctor, Glenn; Searle, Steve; Staines, Daniel; Taylor, Kieron; Vullo, Alessandro; Yates, Andrew; Zerbino, Daniel; Flicek, Paul

2017-01-01

The Ensembl software resources are a stable infrastructure to store, access and manipulate genome assemblies and their functional annotations. The Ensembl 'Core' database and Application Programming Interface (API) was our first major piece of software infrastructure and remains at the centre of all of our genome resources. Since its initial design more than fifteen years ago, the number of publicly available genomic, transcriptomic and proteomic datasets has grown enormously, accelerated by continuous advances in DNA-sequencing technology. Initially intended to provide annotation for the reference human genome, we have extended our framework to support the genomes of all species as well as richer assembly models. Cross-referenced links to other informatics resources facilitate searching our database with a variety of popular identifiers such as UniProt and RefSeq. Our comprehensive and robust framework storing a large diversity of genome annotations in one location serves as a platform for other groups to generate and maintain their own tailored annotation. We welcome reuse and contributions: our databases and APIs are publicly available, all of our source code is released with a permissive Apache v2.0 licence at http://github.com/Ensembl and we have an active developer mailing list ( http://www.ensembl.org/info/about/contact/index.html ). http://www.ensembl.org. © The Author(s) 2017. Published by Oxford University Press.
Genetic identification of missing persons: DNA analysis of human remains and compromised samples.

PubMed

Alvarez-Cubero, M J; Saiz, M; Martinez-Gonzalez, L J; Alvarez, J C; Eisenberg, A J; Budowle, B; Lorente, J A

2012-01-01

Human identification has made great strides over the past 2 decades due to the advent of DNA typing. Forensic DNA typing provides genetic data from a variety of materials and individuals, and is applied to many important issues that confront society. Part of the success of DNA typing is the generation of DNA databases to help identify missing persons and to develop investigative leads to assist law enforcement. DNA databases house DNA profiles from convicted felons (and in some jurisdictions arrestees), forensic evidence, human remains, and direct and family reference samples of missing persons. These databases are essential tools, which are becoming quite large (for example the US Database contains 10 million profiles). The scientific, governmental and private communities continue to work together to standardize genetic markers for more effective worldwide data sharing, to develop and validate robust DNA typing kits that contain the reagents necessary to type core identity genetic markers, to develop technologies that facilitate a number of analytical processes and to develop policies to make human identity testing more effective. Indeed, DNA typing is integral to resolving a number of serious criminal and civil concerns, such as solving missing person cases and identifying victims of mass disasters and children who may have been victims of human trafficking, and provides information for historical studies. As more refined capabilities are still required, novel approaches are being sought, such as genetic testing by next-generation sequencing, mass spectrometry, chip arrays and pyrosequencing. Single nucleotide polymorphisms offer the potential to analyze severely compromised biological samples, to determine the facial phenotype of decomposed human remains and to predict the bioancestry of individuals, a new focus in analyzing this type of markers. Copyright © 2012 S. Karger AG, Basel.
Correlates of Access to Business Research Databases

ERIC Educational Resources Information Center

Gottfried, John C.

2010-01-01

This study examines potential correlates of business research database access through academic libraries serving top business programs in the United States. Results indicate that greater access to research databases is related to enrollment in graduate business programs, but not to overall enrollment or status as a public or private institution.…
Open Window: When Easily Identifiable Genomes and Traits Are in the Public Domain

PubMed Central

Angrist, Misha

2014-01-01

“One can't be of an enquiring and experimental nature, and still be very sensible.” - Charles Fort [1] As the costs of personal genetic testing “self-quantification” fall, publicly accessible databases housing people's genotypic and phenotypic information are gradually increasing in number and scope. The latest entrant is openSNP, which allows participants to upload their personal genetic/genomic and self-reported phenotypic data. I believe the emergence of such open repositories of human biological data is a natural reflection of inquisitive and digitally literate people's desires to make genomic and phenotypic information more easily available to a community beyond the research establishment. Such unfettered databases hold the promise of contributing mightily to science, science education and medicine. That said, in an age of increasingly widespread governmental and corporate surveillance, we would do well to be mindful that genomic DNA is uniquely identifying. Participants in open biological databases are engaged in a real-time experiment whose outcome is unknown. PMID:24647311
An open experimental database for exploring inorganic materials

DOE PAGES

Zakutayev, Andriy; Wunder, Nick; Schwarting, Marcus; ...

2018-04-03

The use of advanced machine learning algorithms in experimental materials science is limited by the lack of sufficiently large and diverse datasets amenable to data mining. If publicly open, such data resources would also enable materials research by scientists without access to expensive experimental equipment. Here, we report on our progress towards a publicly open High Throughput Experimental Materials (HTEM) Database (htem.nrel.gov). This database currently contains 140,000 sample entries, characterized by structural (100,000), synthetic (80,000), chemical (70,000), and optoelectronic (50,000) properties of inorganic thin film materials, grouped in >4,000 sample entries across >100 materials systems; more than a half ofmore » these data are publicly available. This article shows how the HTEM database may enable scientists to explore materials by browsing web-based user interface and an application programming interface. This paper also describes a HTE approach to generating materials data, and discusses the laboratory information management system (LIMS), that underpin HTEM database. Finally, this manuscript illustrates how advanced machine learning algorithms can be adopted to materials science problems using this open data resource.« less
An open experimental database for exploring inorganic materials

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zakutayev, Andriy; Wunder, Nick; Schwarting, Marcus

The use of advanced machine learning algorithms in experimental materials science is limited by the lack of sufficiently large and diverse datasets amenable to data mining. If publicly open, such data resources would also enable materials research by scientists without access to expensive experimental equipment. Here, we report on our progress towards a publicly open High Throughput Experimental Materials (HTEM) Database (htem.nrel.gov). This database currently contains 140,000 sample entries, characterized by structural (100,000), synthetic (80,000), chemical (70,000), and optoelectronic (50,000) properties of inorganic thin film materials, grouped in >4,000 sample entries across >100 materials systems; more than a half ofmore » these data are publicly available. This article shows how the HTEM database may enable scientists to explore materials by browsing web-based user interface and an application programming interface. This paper also describes a HTE approach to generating materials data, and discusses the laboratory information management system (LIMS), that underpin HTEM database. Finally, this manuscript illustrates how advanced machine learning algorithms can be adopted to materials science problems using this open data resource.« less
Bibliographical database of radiation biological dosimetry and risk assessment: Part 1, through June 1988

DOE Office of Scientific and Technical Information (OSTI.GOV)

Straume, T.; Ricker, Y.; Thut, M.

1988-08-29

This database was constructed to support research in radiation biological dosimetry and risk assessment. Relevant publications were identified through detailed searches of national and international electronic databases and through our personal knowledge of the subject. Publications were numbered and key worded, and referenced in an electronic data-retrieval system that permits quick access through computerized searches on publication number, authors, key words, title, year, and journal name. Photocopies of all publications contained in the database are maintained in a file that is numerically arranged by citation number. This report of the database is provided as a useful reference and overview. Itmore » should be emphasized that the database will grow as new citations are added to it. With that in mind, we arranged this report in order of ascending citation number so that follow-up reports will simply extend this document. The database cite 1212 publications. Publications are from 119 different scientific journals, 27 of these journals are cited at least 5 times. It also contains reference to 42 books and published symposia, and 129 reports. Information relevant to radiation biological dosimetry and risk assessment is widely distributed among the scientific literature, although a few journals clearly dominate. The four journals publishing the largest number of relevant papers are Health Physics, Mutation Research, Radiation Research, and International Journal of Radiation Biology. Publications in Health Physics make up almost 10% of the current database.« less
The future of forensic DNA analysis

PubMed Central

Butler, John M.

2015-01-01

The author's thoughts and opinions on where the field of forensic DNA testing is headed for the next decade are provided in the context of where the field has come over the past 30 years. Similar to the Olympic motto of ‘faster, higher, stronger’, forensic DNA protocols can be expected to become more rapid and sensitive and provide stronger investigative potential. New short tandem repeat (STR) loci have expanded the core set of genetic markers used for human identification in Europe and the USA. Rapid DNA testing is on the verge of enabling new applications. Next-generation sequencing has the potential to provide greater depth of coverage for information on STR alleles. Familial DNA searching has expanded capabilities of DNA databases in parts of the world where it is allowed. Challenges and opportunities that will impact the future of forensic DNA are explored including the need for education and training to improve interpretation of complex DNA profiles. PMID:26101278
A database of charged cosmic rays

NASA Astrophysics Data System (ADS)

Maurin, D.; Melot, F.; Taillet, R.

2014-09-01

Aims: This paper gives a description of a new online database and associated online tools (data selection, data export, plots, etc.) for charged cosmic-ray measurements. The experimental setups (type, flight dates, techniques) from which the data originate are included in the database, along with the references to all relevant publications. Methods: The database relies on the MySQL5 engine. The web pages and queries are based on PHP, AJAX and the jquery, jquery.cluetip, jquery-ui, and table-sorter third-party libraries. Results: In this first release, we restrict ourselves to Galactic cosmic rays with Z ≤ 30 and a kinetic energy per nucleon up to a few tens of TeV/n. This corresponds to more than 200 different sub-experiments (i.e., different experiments, or data from the same experiment flying at different times) in as many publications. Conclusions: We set up a cosmic-ray database (CRDB) and provide tools to sort and visualise the data. New data can be submitted, providing the community with a collaborative tool to archive past and future cosmic-ray measurements. http://lpsc.in2p3.fr/crdb; Contact: crdatabase@lpsc.in2p3.fr
NASA scientific and technical publications: A catalog of special publications, reference publications, conference publications, and technical papers, 1987-1990

NASA Technical Reports Server (NTRS)

1991-01-01

This catalog lists 783 citations of all NASA Special Publications, NASA Reference Publications, NASA Conference Publications, and NASA Technical Papers that were entered into NASA Scientific and Technical Information Database during the year's 1987 through 1990. The entries are grouped by subject category. Indexes of subject terms, personal authors, and NASA report numbers are provided.
A RESTful application programming interface for the PubMLST molecular typing and genome databases

PubMed Central

Bray, James E.; Maiden, Martin C. J.

2017-01-01

Abstract Molecular typing is used to differentiate microorganisms at the subspecies or strain level for epidemiological investigations, infection control, public health and environmental sampling. DNA sequence-based typing methods require authoritative databases that link sequence variants to nomenclature in order to facilitate communication and comparison of identified types in national or global settings. The PubMLST website (https://pubmlst.org/) fulfils this role for over a hundred microorganisms for which it hosts curated molecular sequence typing data, providing sequence and allelic profile definitions for multi-locus sequence typing (MLST) and single-gene typing approaches. In recent years, these have expanded to cover the whole genome with schemes such as core genome MLST (cgMLST) and whole genome MLST (wgMLST) which catalogue the allelic diversity found in hundreds to thousands of genes. These approaches provide a common nomenclature for high-resolution strain characterization and comparison. Molecular typing information is linked to isolate provenance, phenotype, and increasingly genome assemblies, providing a resource for outbreak investigation and research in to population structure, gene association, global epidemiology and vaccine coverage. A Representational State Transfer (REST) Application Programming Interface (API) has been developed for the PubMLST website to make these large quantities of structured molecular typing and whole genome sequence data available for programmatic access by any third party application. The API is an integral component of the Bacterial Isolate Genome Sequence Database (BIGSdb) platform that is used to host PubMLST resources, and exposes all public data within the site. In addition to data browsing, searching and download, the API supports authentication and submission of new data to curator queues. Database URL: http://rest.pubmlst.org/ PMID:29220452
Fun Databases: My Top Ten.

ERIC Educational Resources Information Center

O'Leary, Mick

1992-01-01

Provides reviews of 10 online databases: Consumer Reports; Public Opinion Online; Encyclopedia of Associations; Official Airline Guide Adventure Atlas and Events Calendar; CENDATA; Hollywood Hotline; Fearless Taster; Soap Opera Summaries; and Human Sexuality. (LRW)
Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation.

PubMed

Rognes, Torbjørn

2011-06-01

The Smith-Waterman algorithm for local sequence alignment is more sensitive than heuristic methods for database searching, but also more time-consuming. The fastest approach to parallelisation with SIMD technology has previously been described by Farrar in 2007. The aim of this study was to explore whether further speed could be gained by other approaches to parallelisation. A faster approach and implementation is described and benchmarked. In the new tool SWIPE, residues from sixteen different database sequences are compared in parallel to one query residue. Using a 375 residue query sequence a speed of 106 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon X5650 six-core processor system, which is over six times more rapid than software based on Farrar's 'striped' approach. SWIPE was about 2.5 times faster when the programs used only a single thread. For shorter queries, the increase in speed was larger. SWIPE was about twice as fast as BLAST when using the BLOSUM50 score matrix, while BLAST was about twice as fast as SWIPE for the BLOSUM62 matrix. The software is designed for 64 bit Linux on processors with SSSE3. Source code is available from http://dna.uio.no/swipe/ under the GNU Affero General Public License. Efficient parallelisation using SIMD on standard hardware makes it possible to run Smith-Waterman database searches more than six times faster than before. The approach described here could significantly widen the potential application of Smith-Waterman searches. Other applications that require optimal local alignment scores could also benefit from improved performance.
Second-Tier Database for Ecosystem Focus, 2003-2004 Annual Report.

DOE Office of Scientific and Technical Information (OSTI.GOV)

University of Washington, Columbia Basin Research, DART Project Staff,

2004-12-01

The Second-Tier Database for Ecosystem Focus (Contract 00004124) provides direct and timely public access to Columbia Basin environmental, operational, fishery and riverine data resources for federal, state, public and private entities essential to sound operational and resource management. The database also assists with juvenile and adult mainstem passage modeling supporting federal decisions affecting the operation of the FCRPS. The Second-Tier Database known as Data Access in Real Time (DART) integrates public data for effective access, consideration and application. DART also provides analysis tools and performance measures for evaluating the condition of Columbia Basin salmonid stocks. These services are critical tomore » BPA's implementation of its fish and wildlife responsibilities under the Endangered Species Act (ESA).« less
Value of circulating cell-free DNA analysis as a diagnostic tool for breast cancer: a meta-analysis

PubMed Central

Ma, Xuelei; Zhang, Jing; Hu, Xiuying

2017-01-01

Objectives The aim of this study was to systematically evaluate the diagnostic value of cell free DNA (cfDNA) for breast cancer. Results Among 308 candidate articles, 25 with relevant diagnostic screening qualified for final analysis. The mean sensitivity, specificity and area under the curve (AUC) of SROC plots for 24 studies that distinguished breast cancer patients from healthy controls were 0.70, 0.87, and 0.9314, yielding a DOR of 32.31. When analyzed in subgroups, the 14 quantitative studies produced sensitivity, specificity, AUC, and a DOR of 0.78, 0.83, 0.9116, and 24.40. The 10 qualitative studies produced 0.50, 0.98, 0.9919, and 68.45. For 8 studies that distinguished malignant breast cancer from benign diseases, the specificity, sensitivity, AUC and DOR were 0.75, 0.79, 0.8213, and 9.49. No covariate factors had a significant correlation with relative DOR. Deek's funnel plots indicated an absence of publication bias. Materials and Methods Databases were searched for studies involving the use of cfDNA to diagnose breast cancer. The studies were analyzed to determine sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, diagnostic odds ratio (DOR), and the summary receiver operating characteristic (SROC). Covariates were evaluated for effect on relative DOR. Deek's Funnel plots were generated to measure publication bias. Conclusions Our analysis suggests a promising diagnostic potential of using cfDNA for breast cancer screening, but this diagnostic method is not yet independently sufficient. Further work refining qualitative cfDNA assays will improve the correct diagnosis of breast cancers. PMID:28460452
NASA scientific and technical publications: A catalog of special publications, reference publications, conference publications, and technical papers, 1991-1992

NASA Technical Reports Server (NTRS)

1993-01-01

This catalog lists 458 citations of all NASA Special Publications, NASA Reference Publications, NASA Conference Publications, and NASA Technical Papers that were entered into the NASA Scientific and Technical Information database during accession year 1991 through 1992. The entries are grouped by subject category. Indexes of subject terms, personal authors, and NASA report numbers are provided.

Publications | Hydrogen and Fuel Cells | NREL

Science.gov Websites

, and demonstration activities in hydrogen and fuel cells. NREL Publications Database Access the full library of our publications. Search the database View all NREL publications about hydrogen and fuel cell research. Transportation and Hydrogen Newsletter Get semi-monthly updates on NREL's research, development
PoMaMo—a comprehensive database for potato genome data

PubMed Central

Meyer, Svenja; Nagel, Axel; Gebhardt, Christiane

2005-01-01

A database for potato genome data (PoMaMo, Potato Maps and More) was established. The database contains molecular maps of all twelve potato chromosomes with about 1000 mapped elements, sequence data, putative gene functions, results from BLAST analysis, SNP and InDel information from different diploid and tetraploid potato genotypes, publication references, links to other public databases like GenBank (http://www.ncbi.nlm.nih.gov/) or SGN (Solanaceae Genomics Network, http://www.sgn.cornell.edu/), etc. Flexible search and data visualization interfaces enable easy access to the data via internet (https://gabi.rzpd.de/PoMaMo.html). The Java servlet tool YAMB (Yet Another Map Browser) was designed to interactively display chromosomal maps. Maps can be zoomed in and out, and detailed information about mapped elements can be obtained by clicking on an element of interest. The GreenCards interface allows a text-based data search by marker-, sequence- or genotype name, by sequence accession number, gene function, BLAST Hit or publication reference. The PoMaMo database is a comprehensive database for different potato genome data, and to date the only database containing SNP and InDel data from diploid and tetraploid potato genotypes. PMID:15608284
Human Variome Project Quality Assessment Criteria for Variation Databases.

PubMed

Vihinen, Mauno; Hancock, John M; Maglott, Donna R; Landrum, Melissa J; Schaafsma, Gerard C P; Taschner, Peter

2016-06-01

Numerous databases containing information about DNA, RNA, and protein variations are available. Gene-specific variant databases (locus-specific variation databases, LSDBs) are typically curated and maintained for single genes or groups of genes for a certain disease(s). These databases are widely considered as the most reliable information source for a particular gene/protein/disease, but it should also be made clear they may have widely varying contents, infrastructure, and quality. Quality is very important to evaluate because these databases may affect health decision-making, research, and clinical practice. The Human Variome Project (HVP) established a Working Group for Variant Database Quality Assessment. The basic principle was to develop a simple system that nevertheless provides a good overview of the quality of a database. The HVP quality evaluation criteria that resulted are divided into four main components: data quality, technical quality, accessibility, and timeliness. This report elaborates on the developed quality criteria and how implementation of the quality scheme can be achieved. Examples are provided for the current status of the quality items in two different databases, BTKbase, an LSDB, and ClinVar, a central archive of submissions about variants and their clinical significance. © 2016 WILEY PERIODICALS, INC.
Efficient bibliographic searches on allergy using ISI databases.

PubMed

Sáez Gómez, J M; Annan, J W; Negro Alvarez, J M; Guillen-Grima, F; Bozzola, C M; Ivancevich, J C; Aguinaga Ontoso, E

2008-01-01

The aim of this article is to provide an introduction to using databases from the Thomson ISI Web of Knowledge, with special reference to Citation Indexes as an analysis tool for publications, and also to explain the meaning of the well-known Impact Factor. We present the partially modified new Consultation Interface to enhance information search routines of these databases. It introduces distinctive methods in search bibliography, including the correct application of analysis tools, paying particular attention to Journal Citation Reports and Impact Factor. We finish this article with comment on the consequences of using the Impact Factor as a quality indicator for the assessment of journals and publications, and how to ensure measures for indexing in the Thomson ISI Databases.
Online vs. Print Publications: Users' Opinions.

ERIC Educational Resources Information Center

Wang, Chih

The rapid expansion of online publications has raised some concerns about the use of online databases in comparison with using traditional print publications. To determine the opinions of end users about using Dialog online databases versus their corresponding print versions, three libraries in Atlanta, Georgia--Atlanta-Fulton Public Library,…
Protein Bioinformatics Databases and Resources

PubMed Central

Chen, Chuming; Huang, Hongzhan; Wu, Cathy H.

2017-01-01

Many publicly available data repositories and resources have been developed to support protein related information management, data-driven hypothesis generation and biological knowledge discovery. To help researchers quickly find the appropriate protein related informatics resources, we present a comprehensive review (with categorization and description) of major protein bioinformatics databases in this chapter. We also discuss the challenges and opportunities for developing next-generation protein bioinformatics databases and resources to support data integration and data analytics in the Big Data era. PMID:28150231
Going public: accessing urban data and producing population estimates using the urban FIA database

Treesearch

Chris Edgar; Mark Hatfield

2015-01-01

In this presentation we describe the urban forest inventory database (U-FIADB) and demonstrate how to use the database to produce population estimates. Examples from the recently completed City of Austin inventory will be used to demonstrate the capabilities of the database. We will identify several features of U-FIADB that are different from the FIA database (FIADB)...
DNA barcoding of vouchered xylarium wood specimens of nine endangered Dalbergia species

Treesearch

Min Yu; Lichao Jiao; Juan Guo; Alex C. Wiedenhoeft; Tuo He; Xiaomei Jiang; Yafang Yin

2017-01-01

ITS2+trnH-psbA was the best combination of DNA barcode to resolve the Dalbergia wood species studied. We demonstrate the feasibility of building a DNA barcode reference database using xylarium wood specimens.
The Vocational Guidance Research Database: A Scientometric Approach

ERIC Educational Resources Information Center

Flores-Buils, Raquel; Gil-Beltran, Jose Manuel; Caballer-Miedes, Antonio; Martinez-Martinez, Miguel Angel

2012-01-01

The scientometric study of scientific output through publications in specialized journals cannot be undertaken exclusively with the databases available today. For this reason, the objective of this article is to introduce the "Base de Datos de Investigacion en Orientacion Vocacional" [Vocational Guidance Research Database], based on the…
CD-ROM-aided Databases

NASA Astrophysics Data System (ADS)

Masuyama, Keiichi

CD-ROM has rapidly evolved as a new information medium with large capacity, In the U.S. it is predicted that it will become two hundred billion yen market in three years, and thus CD-ROM is strategic target of database industry. Here in Japan the movement toward its commercialization has been active since this year. Shall CD-ROM bussiness ever conquer information market as an on-disk database or electronic publication? Referring to some cases of the applications in the U.S. the author views marketability and the future trend of this new optical disk medium.
footprintDB: a database of transcription factors with annotated cis elements and binding interfaces.

PubMed

Sebastian, Alvaro; Contreras-Moreira, Bruno

2014-01-15

Traditional and high-throughput techniques for determining transcription factor (TF) binding specificities are generating large volumes of data of uneven quality, which are scattered across individual databases. FootprintDB integrates some of the most comprehensive freely available libraries of curated DNA binding sites and systematically annotates the binding interfaces of the corresponding TFs. The first release contains 2422 unique TF sequences, 10 112 DNA binding sites and 3662 DNA motifs. A survey of the included data sources, organisms and TF families was performed together with proprietary database TRANSFAC, finding that footprintDB has a similar coverage of multicellular organisms, while also containing bacterial regulatory data. A search engine has been designed that drives the prediction of DNA motifs for input TFs, or conversely of TF sequences that might recognize input regulatory sequences, by comparison with database entries. Such predictions can also be extended to a single proteome chosen by the user, and results are ranked in terms of interface similarity. Benchmark experiments with bacterial, plant and human data were performed to measure the predictive power of footprintDB searches, which were able to correctly recover 10, 55 and 90% of the tested sequences, respectively. Correctly predicted TFs had a higher interface similarity than the average, confirming its diagnostic value. Web site implemented in PHP,Perl, MySQL and Apache. Freely available from http://floresta.eead.csic.es/footprintdb.
HTSFinder: Powerful Pipeline of DNA Signature Discovery by Parallel and Distributed Computing

PubMed Central

Karimi, Ramin; Hajdu, Andras

2016-01-01

Comprehensive effort for low-cost sequencing in the past few years has led to the growth of complete genome databases. In parallel with this effort, a strong need, fast and cost-effective methods and applications have been developed to accelerate sequence analysis. Identification is the very first step of this task. Due to the difficulties, high costs, and computational challenges of alignment-based approaches, an alternative universal identification method is highly required. Like an alignment-free approach, DNA signatures have provided new opportunities for the rapid identification of species. In this paper, we present an effective pipeline HTSFinder (high-throughput signature finder) with a corresponding k-mer generator GkmerG (genome k-mers generator). Using this pipeline, we determine the frequency of k-mers from the available complete genome databases for the detection of extensive DNA signatures in a reasonably short time. Our application can detect both unique and common signatures in the arbitrarily selected target and nontarget databases. Hadoop and MapReduce as parallel and distributed computing tools with commodity hardware are used in this pipeline. This approach brings the power of high-performance computing into the ordinary desktop personal computers for discovering DNA signatures in large databases such as bacterial genome. A considerable number of detected unique and common DNA signatures of the target database bring the opportunities to improve the identification process not only for polymerase chain reaction and microarray assays but also for more complex scenarios such as metagenomics and next-generation sequencing analysis. PMID:26884678
HTSFinder: Powerful Pipeline of DNA Signature Discovery by Parallel and Distributed Computing.

PubMed

Karimi, Ramin; Hajdu, Andras

2016-01-01

Comprehensive effort for low-cost sequencing in the past few years has led to the growth of complete genome databases. In parallel with this effort, a strong need, fast and cost-effective methods and applications have been developed to accelerate sequence analysis. Identification is the very first step of this task. Due to the difficulties, high costs, and computational challenges of alignment-based approaches, an alternative universal identification method is highly required. Like an alignment-free approach, DNA signatures have provided new opportunities for the rapid identification of species. In this paper, we present an effective pipeline HTSFinder (high-throughput signature finder) with a corresponding k-mer generator GkmerG (genome k-mers generator). Using this pipeline, we determine the frequency of k-mers from the available complete genome databases for the detection of extensive DNA signatures in a reasonably short time. Our application can detect both unique and common signatures in the arbitrarily selected target and nontarget databases. Hadoop and MapReduce as parallel and distributed computing tools with commodity hardware are used in this pipeline. This approach brings the power of high-performance computing into the ordinary desktop personal computers for discovering DNA signatures in large databases such as bacterial genome. A considerable number of detected unique and common DNA signatures of the target database bring the opportunities to improve the identification process not only for polymerase chain reaction and microarray assays but also for more complex scenarios such as metagenomics and next-generation sequencing analysis.
Release of ToxCastDB and ExpoCastDB databases

EPA Science Inventory

EPA has released two databases - the Toxicity Forecaster database (ToxCastDB) and a database of chemical exposure studies (ExpoCastDB) - that scientists and the public can use to access chemical toxicity and exposure data. ToxCastDB users can search and download data from over 50...
32 CFR 105.15 - Defense Sexual Assault Incident Database (DSAID).

Code of Federal Regulations, 2013 CFR

2013-07-01

... 32 National Defense 1 2013-07-01 2013-07-01 false Defense Sexual Assault Incident Database (DSAID... Sexual Assault Incident Database (DSAID). (a) Purpose. (1) In accordance with section 563 of Public Law... activities. It shall serve as a centralized, case-level database for the collection and maintenance of...
32 CFR 105.15 - Defense Sexual Assault Incident Database (DSAID).

Code of Federal Regulations, 2014 CFR

2014-07-01

... 32 National Defense 1 2014-07-01 2014-07-01 false Defense Sexual Assault Incident Database (DSAID... Sexual Assault Incident Database (DSAID). (a) Purpose. (1) In accordance with section 563 of Public Law... activities. It shall serve as a centralized, case-level database for the collection and maintenance of...
Indigenous species barcode database improves the identification of zooplankton

PubMed Central

Yang, Jianghua; Zhang, Wanwan; Sun, Jingying; Xie, Yuwei; Zhang, Yimin; Burton, G. Allen; Yu, Hongxia

2017-01-01

Incompleteness and inaccuracy of DNA barcode databases is considered an important hindrance to the use of metabarcoding in biodiversity analysis of zooplankton at the species-level. Species barcoding by Sanger sequencing is inefficient for organisms with small body sizes, such as zooplankton. Here mitochondrial cytochrome c oxidase I (COI) fragment barcodes from 910 freshwater zooplankton specimens (87 morphospecies) were recovered by a high-throughput sequencing platform, Ion Torrent PGM. Intraspecific divergence of most zooplanktons was < 5%, except Branchionus leydign (Rotifer, 14.3%), Trichocerca elongate (Rotifer, 11.5%), Lecane bulla (Rotifer, 15.9%), Synchaeta oblonga (Rotifer, 5.95%) and Schmackeria forbesi (Copepod, 6.5%). Metabarcoding data of 28 environmental samples from Lake Tai were annotated by both an indigenous database and NCBI Genbank database. The indigenous database improved the taxonomic assignment of metabarcoding of zooplankton. Most zooplankton (81%) with barcode sequences in the indigenous database were identified by metabarcoding monitoring. Furthermore, the frequency and distribution of zooplankton were also consistent between metabarcoding and morphology identification. Overall, the indigenous database improved the taxonomic assignment of zooplankton. PMID:28977035
FORENSIC DNA BANKING LEGISLATION IN DEVELOPING COUNTRIES: PRIVACY AND CONFIDENTIALITY CONCERNS REGARDING A DRAFT FROM TURKISH LEGISLATION.

PubMed

Ilgili, Önder; Arda, Berna

This paper presents and analyses, in terms of privacy and confidentiality, the Turkish Draft Law on National DNA Database prepared in 2004, and concerning the use of DNA analysis for forensic objectives and identity verification in Turkey. After a short introduction including related concepts, we evaluate the draft law and provide articles about confidentiality. The evaluation reminded us of some important topics at international level for the developing countries. As a result, the need for sophisticated legislations about DNA databases, for solutions to issues related to the education of employees, and the technological dependency to other countries emerged as main challenges in terms of confidentiality for the developing countries. As seen in the Turkish Draft Law on National DNA Database, the protection of the fundamental rights and freedoms requires more care during the legislative efforts.
Development of a Consumer Product Ingredient Database for ...

EPA Pesticide Factsheets

Consumer products are a primary source of chemical exposures, yet little structured information is available on the chemical ingredients of these products and the concentrations at which ingredients are present. To address this data gap, we created a database of chemicals in consumer products using product Material Safety Data Sheets (MSDSs) publicly provided by a large retailer. The resulting database represents 1797 unique chemicals mapped to 8921 consumer products and a hierarchy of 353 consumer product “use categories” within a total of 15 top-level categories. We examine the utility of this database and discuss ways in which it will support (i) exposure screening and prioritization, (ii) generic or framework formulations for several indoor/consumer product exposure modeling initiatives, (iii) candidate chemical selection for monitoring near field exposure from proximal sources, and (iv) as activity tracers or ubiquitous exposure sources using “chemical space” map analyses. Chemicals present at high concentrations and across multiple consumer products and use categories that hold high exposure potential are identified. Our database is publicly available to serve regulators, retailers, manufacturers, and the public for predictive screening of chemicals in new and existing consumer products on the basis of exposure and risk. The National Exposure Research Laboratory’s (NERL’s) Human Exposure and Atmospheric Sciences Division (HEASD) conducts resear
eCOMPAGT integrates mtDNA: import, validation and export of mitochondrial DNA profiles for population genetics, tumour dynamics and genotype-phenotype association studies.

PubMed

Weissensteiner, Hansi; Schönherr, Sebastian; Specht, Günther; Kronenberg, Florian; Brandstätter, Anita

2010-03-09

Mitochondrial DNA (mtDNA) is widely being used for population genetics, forensic DNA fingerprinting and clinical disease association studies. The recent past has uncovered severe problems with mtDNA genotyping, not only due to the genotyping method itself, but mainly to the post-lab transcription, storage and report of mtDNA genotypes. eCOMPAGT, a system to store, administer and connect phenotype data to all kinds of genotype data is now enhanced by the possibility of storing mtDNA profiles and allowing their validation, linking to phenotypes and export as numerous formats. mtDNA profiles can be imported from different sequence evaluation programs, compared between evaluations and their haplogroup affiliations stored. Furthermore, eCOMPAGT has been improved in its sophisticated transparency (support of MySQL and Oracle), security aspects (by using database technology) and the option to import, manage and store genotypes derived from various genotyping methods (SNPlex, TaqMan, and STRs). It is a software solution designed for project management, laboratory work and the evaluation process all-in-one. The extended mtDNA version of eCOMPAGT was designed to enable error-free post-laboratory data handling of human mtDNA profiles. This software is suited for small to medium-sized human genetic, forensic and clinical genetic laboratories. The direct support of MySQL and the improved database security options render eCOMPAGT a powerful tool to build an automated workflow architecture for several genotyping methods. eCOMPAGT is freely available at http://dbis-informatik.uibk.ac.at/ecompagt.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.